The “Cancer Relative Survival Model” represents transitions between three states: no evidence of disease (NED), cancer metastasis (Mets), and cancer death.
We need to calibrate the two transition probabilities in the figure above - p.Mets
, p.DieMets
- because these transition probabilities are unobservable (at least in this example).
To calibrate p.Mets
and p.DieMets
, we will fit the model to observed survival data. To do this, we’ll sample the two parameters randomly, and then measure how well the model-predicted survival matches the observed survival.
We’ll use the following packages:
# calibration functionality
library(lhs)
# visualization
library(plotrix)
library(psych)
The hypothetical calibration target is the proportion of people that survive over time.
load("CRSTargets.RData")
Let’s see what the survival looks like.
plotrix::plotCI(x = CRS.targets$Surv$Time, y = CRS.targets$Surv$value,
ui = CRS.targets$Surv$ub,
li = CRS.targets$Surv$lb,
ylim = c(0, 1),
xlab = "Time", ylab = "Pr Survive")
Note that the estimate of survival has some uncertainty around it, indicated by the error bounds.
If you had another target (stored in CRS.targets$Target2
), you could plot it like this:
plotrix::plotCI(x = CRS.targets$Target2$Time, y = CRS.targets$Target2$value,
ui = CRS.targets$Target2$ub,
li = CRS.targets$Target2$lb,
ylim = c(0, 1),
xlab = "Time", ylab = "Target 2")
The inputs to the function are parameters to be estimated through calibration. The model outputs correspond to the target data.
This creates the function markov_crs()
. The function has one argument, v.params
, which is a vector of parameters. In this case, v.params
is a named vector with elements p.Mets
and p.DieMets
.
source("markov_crs.R")
Let’s check that the model runs. We arbitrarily set some parameter values in v.parms.test
, then run the model.
v.params.test <- c(p.Mets = 0.1, p.DieMets = 0.2)
test_results <- markov_crs(v.params.test)
Within the returned model object (a list), the predicted survival is in a vector called Surv
:
str(test_results)
## List of 1
## $ Surv: Named num [1:59] 0.98 0.946 0.903 0.853 0.801 ...
## ..- attr(*, "names")= chr [1:59] "2" "3" "4" "5" ...
head(test_results$Surv)
## 2 3 4 5 6 7
## 0.9800000 0.9460000 0.9026000 0.8533000 0.8007380 0.7468786
plot(test_results$Surv)
It works!
But let’s compare the outputs to the targets.
plotrix::plotCI(x = CRS.targets$Surv$Time, y = CRS.targets$Surv$value,
ui = CRS.targets$Surv$ub,
li = CRS.targets$Surv$lb,
ylim = c(0, 1),
xlab = "Time", ylab = "Pr Survive")
points(test_results$Surv,
col = "green", pch = 20)
legend("topright", legend = c("Targets", "Outputs"),
col = c("black", "green"),
pch = c(1, 20))
That doesn’t look right! We need to calibrate the unobservable parameters so that the results match the observed data.
For random search with latin hypercube sampling, we need to specify:
First, when sampling random numbers in R
, setting a seed allows you to obtain the same sample each time.
set.seed(1010)
Let’s start with 1000 samples. We’re calibrating 2 parameters, p.Mets
and p.DieMets
.
param.names <- c("p.Mets","p.DieMets")
n.param <- length(param.names)
rs.n.samp <- 1000
The boundaries on the search space are as follows:
# lower bound
lb <- c(p.Mets = 0.04, p.DieMets = 0.04)
# upper bound
ub <- c(p.Mets = 0.16, p.DieMets = 0.16)
Now, we sample. Right away, we rename the columns of m.lhs.unit
so we can keep track of which column is which.
# Sample unit Latin Hypercube
m.lhs.unit <- lhs::randomLHS(rs.n.samp, n.param)
colnames(m.lhs.unit) <- param.names
head(m.lhs.unit)
## p.Mets p.DieMets
## [1,] 0.92829349 0.24442996
## [2,] 0.64837718 0.43424406
## [3,] 0.50388911 0.98957474
## [4,] 0.43746806 0.41460896
## [5,] 0.02729788 0.05204551
## [6,] 0.17821315 0.30577191
Note that this does not yet incorporate the ranges we specified.
plot(x = m.lhs.unit[, param.names[1]],
y = m.lhs.unit[, param.names[2]],
xlab = param.names[1],
ylab = param.names[2])
We have to rescale to the ranges we specified. (Note: we can start with a LHS sample and then transform it to any distribution - in this case, we’re rescaling it to a uniform distribution)
# Rescale to min/max of each parameter
rs.param.samp <- matrix(nrow=rs.n.samp,ncol=n.param)
colnames(rs.param.samp) <- param.names
for (i in 1:n.param){
rs.param.samp[,i] <- qunif(m.lhs.unit[,i],
min = lb[i],
max = ub[i])
}
Now, the parameters are within the range we specified.
# view resulting parameter set samples
psych::pairs.panels(rs.param.samp)
We have 1000 parameter sets, so we’ll have 1000 entries in the goodness-of-fit vector. We’re currently using a single target, so we’ll have one column (i.e., a vector).
# initialize goodness-of-fit vector
rs.GOF <- rep(0, rs.n.samp)
Now, we run the model for each set of input values, and determine how well the model results fit the target data.
In this case, the goodness of fit function is the likelihood of observing the target data, given the model results. We assume a normal likelihood for this example. In many cases, it’s more numerically stable to use the log-likelihood log = TRUE
and sum the individual likelihoods together.
gof_norm_loglike <- function(target_mean, target_sd, model_output){
sum(dnorm(x = target_mean,
mean = model_output,
sd = target_sd,
log = TRUE))
}
Loop through sampled sets of input values
for (j in 1:rs.n.samp){
### Run model for a given parameter set ###
model.res = markov_crs(v.params = rs.param.samp[j, ])
### Calculate goodness-of-fit of model outputs to targets ###
# log likelihood of the model output given the targets
rs.GOF[j] = gof_norm_loglike(model_output = model.res$Surv,
target_mean = CRS.targets$Surv$value,
target_sd = CRS.targets$Surv$se)
}
# Arrange parameter sets in order of fit
rs.calib.res <- cbind(rs.param.samp, rs.GOF)
rs.calib.res <- rs.calib.res[order(-rs.calib.res[,"rs.GOF"]),]
# Examine the top 10 best-fitting sets
rs.calib.res[1:10,]
## p.Mets p.DieMets rs.GOF
## [1,] 0.04782207 0.12645087 156.0153
## [2,] 0.12570106 0.04809467 155.9828
## [3,] 0.13269799 0.04690869 155.7880
## [4,] 0.12781268 0.04708311 155.7230
## [5,] 0.04891700 0.12277125 155.7190
## [6,] 0.04760128 0.13057134 155.6673
## [7,] 0.04657485 0.13111585 155.5720
## [8,] 0.05052389 0.11034383 155.1886
## [9,] 0.12734270 0.04851216 155.1570
## [10,] 0.05011317 0.11813266 155.0989
# Plot the top 100 (top 10%)
plot(x = rs.calib.res[1:100,1], y = rs.calib.res[1:100,2],
xlim=c(lb[1],ub[1]),ylim=c(lb[2],ub[2]),
xlab = colnames(rs.calib.res)[1],ylab = colnames(rs.calib.res)[2])
How does the model look with the best-fitting set?
rs_best_fit_params <- c(rs.calib.res[1, c("p.Mets", "p.DieMets")])
rs_best_fit_model <- markov_crs(rs_best_fit_params)
plotrix::plotCI(x = CRS.targets$Surv$Time, y = CRS.targets$Surv$value,
ui = CRS.targets$Surv$ub,
li = CRS.targets$Surv$lb,
ylim = c(0, 1),
xlab = "Time", ylab = "Pr Survive")
points(x = names(rs_best_fit_model$Surv), y = rs_best_fit_model$Surv,
col = "green",
pch = 20)
legend("topright", legend = c("Targets", "Random Search"),
col = c("black", "green"),
pch = c(1, 20))
We used a normal likelihood as a goodness-of-fit measure above, but there are many other options. For example, we could use a weighted sum of squared errors.
gof_wsse <- function(model_output, target_mean, target_se){
w = 1/(target_se^2)
gof <- (-1) * sum(w*(model_output - target_mean)^2)
return(gof)
}
There is nothing restricting us to a single target, either. If we do use multiple targets, we can combine the multiple goodness-of-fit measures into a single GOF, possibly using weights to indicate the importance of each target.
If we had 2 targets, we would define a GOF matrix with two columns:
# initialize goodness-of-fit matrix
rs.GOF <- matrix(0, nrow = n.samp, ncol = 2)
for (j in 1:n.samp){
### Run model for a given parameter set ###
model.res = markov_crs(v.params = rs.params.samp[j, ])
### Target 1 ###
rs.GOF[j, 1] = gof_norm_loglike(model_output = model.res$Output1,
target_mean = CRS.targets$Target1$value,
target_sd = CRS.targets$Target1$se)
### Target 2 ###
rs.GOF[j, 2] = gof_norm_loglike(model_output = model.res$Output2,
target_mean = CRS.targets$Target2$value,
target_sd = CRS.targets$Target2$se)
}
Then, we would combine the GoFs into a single, multiplying by a v.weights
, a vector of weights (here, equal weights).
# can give different targets different weights
v.weights <- matrix(1, nrow = n.target, ncol = 1)
# matrix multiplication to calculate weight sum of each GOF matrix row
rs.GOF.overall <- c(rs.GOF %*% v.weights)
# Store in GOF matrix with column name "Overall"
rs.GOF <- cbind(rs.GOF, Overall_fit=rs.GOF.overall)
Main differences between Random Search and Nelder-Mead:
Let’s do a Nelder-Mead calibration!
n.init <- 100
nm.params.init <- matrix(nrow=n.init,ncol=n.param)
set.seed(101)
for (i in 1:n.param){
nm.params.init[,i] <- runif(n.init,min=lb[i],max=ub[i])
}
colnames(nm.params.init) <- param.names
head(nm.params.init)
## p.Mets p.DieMets
## [1,] 0.08466381 0.05500232
## [2,] 0.04525898 0.04279920
## [3,] 0.12516208 0.08702335
## [4,] 0.11892285 0.14315183
## [5,] 0.06998269 0.12620014
## [6,] 0.07600658 0.08072740
This function is what is optimized by the optim()
function. As such, it must return a summary measure of goodness of fit - that is, a single number. If we were to use multiple targets, their combination would have to be calculated in this function.
nm_objective = function(v.params){
### Run model for parametr set "v.params" ###
model.res = markov_crs(v.params)
# log likelihood
v.GOF = gof_norm_loglike(target_mean = CRS.targets$Surv$value,
model_output = model.res$Surv,
target_sd = CRS.targets$Surv$se)
return(v.GOF)
}
For each starting point, we maximize the goodness of fit, then save the resulting parameter values:
nm.calib.res <- matrix(nrow = n.init, ncol = n.param+1)
colnames(nm.calib.res) <- c(param.names, "Overall_fit")
for (j in 1:n.init){
fit.nm <- optim(nm.params.init[j,], nm_objective,
control = list(fnscale = -1, # fnscale = -1 switches from minimization to maximization
maxit = 1000),
hessian = T)
nm.calib.res[j,] <- c(fit.nm$par,fit.nm$value)
}
# Arrange parameter sets in order of fit
nm.calib.res <- nm.calib.res[order(-nm.calib.res[,"Overall_fit"]),]
# Examine the top 10 best-fitting sets
nm.calib.res[1:10,]
## p.Mets p.DieMets Overall_fit
## [1,] 0.12439610 0.04810758 156.0328
## [2,] 0.04810757 0.12439632 156.0328
## [3,] 0.12439800 0.04810736 156.0328
## [4,] 0.12439921 0.04810714 156.0328
## [5,] 0.12439998 0.04810701 156.0328
## [6,] 0.12439728 0.04810703 156.0328
## [7,] 0.04810716 0.12439631 156.0328
## [8,] 0.04810842 0.12439197 156.0328
## [9,] 0.12439044 0.04810829 156.0328
## [10,] 0.12439814 0.04810673 156.0328
# Plot the top 10 (top 10%)
plot(nm.calib.res[1:10,1],nm.calib.res[1:10,2],
xlim=c(lb[1],ub[1]),ylim=c(lb[2],ub[2]),
xlab = colnames(nm.calib.res)[1],ylab = colnames(nm.calib.res)[2])
What do you notice about these results?
Let’s see how the results look compared to the targets:
nm_best_fit_params <- c(nm.calib.res[1, c("p.Mets", "p.DieMets")])
nm_best_fit_model <- markov_crs(nm_best_fit_params)
plotrix::plotCI(x = CRS.targets$Surv$Time, y = CRS.targets$Surv$value,
ui = CRS.targets$Surv$ub,
li = CRS.targets$Surv$lb,
ylim = c(0, 1),
xlab = "Time", ylab = "Pr Survive")
points(x = names(nm_best_fit_model$Surv), y = nm_best_fit_model$Surv,
col = "red",
pch = 4,
cex = 1.2)
legend("topright", legend = c("Targets", "Nelder-Mead"),
col = c("black", "red"),
pch = c(1, 2, 4))
We can plot the calibration results on the same axes, to see if the results are similar or different.
plotrix::plotCI(x = CRS.targets$Surv$Time, y = CRS.targets$Surv$value,
ui = CRS.targets$Surv$ub,
li = CRS.targets$Surv$lb,
ylim = c(0, 1),
xlab = "Time", ylab = "Pr Survive")
points(x = names(rs_best_fit_model$Surv), y = rs_best_fit_model$Surv,
col = "green",
pch = 2,
cex = 1.2)
points(x = names(nm_best_fit_model$Surv), y = nm_best_fit_model$Surv,
col = "red",
pch = 4,
cex = 1.2)
legend("topright", legend = c("Targets", "Random Search", "Nelder-Mead"),
col = c("black", "green", "red"),
pch = c(1, 2, 4))
#### CRS Markov model in a function ####
markov_crs <- function(v.params) {
with(as.list(v.params), {
## Markov model parameters
n.t <- 60 # time horizon, number of cycles
v.n <- c("NED", "Mets", "Death") # the 3 states of the model
n.s <- length(v.n) # number of health states
# Transition probabilities
# p.Mets = 0.10 # probability to become sicker when sick
# p.DieMets = 0.05 # hazard ratio of death in sick vs healthy
####### INITIALIZATION ##########################################
# create the cohort trace
m.M <- matrix(NA, nrow = n.t + 1 ,
ncol = n.s,
dimnames = list(0:n.t, v.n)) # create Markov trace (n.t + 1 because R doesn't understand Cycle 0)
m.M[1, ] <- c(1, 0, 0) # initialize Markov trace
# create transition probability matrix for NO treatment
m.P <- matrix(0,
nrow = n.s,
ncol = n.s,
dimnames = list(v.n, v.n))
# fill in the transition probability array
### From NED
m.P["NED", "NED"] <- 1 - (p.Mets)
m.P["NED", "Mets"] <- p.Mets
m.P["NED", "Death"] <- 0 # Not allowed to die from cancer in NED state
### From Mets
m.P["Mets", "NED"] <- 0
m.P["Mets", "Mets"] <- 1 - (p.DieMets)
m.P["Mets", "Death"] <- p.DieMets
### From Death
m.P["Death", "Death"] <- 1
# check rows add up to 1
if (!isTRUE(all.equal(as.numeric(rowSums(m.P)), as.numeric(rep(1, n.s))))) {
stop("This is not a valid transition Matrix")
}
############# PROCESS ###########################################
for (t in 1:n.t){ # throughout the number of cycles
m.M[t + 1, ] <- m.M[t, ] %*% m.P # estimate the Markov trace for cycle the next cycle (t + 1)
}
####### EPIDEMIOLOGICAL OUTPUT ###########################################
#### Overall Survival (OS) ####
v.os <- 1 - m.M[, "Death"] # calculate the overall survival (OS) probability
####### RETURN OUTPUT ###########################################
out <- list(Surv = v.os[-c(1:2)])
return(out)
}
)
}
Alarid-Escudero, Fernando, Richard F. MacLehose, Yadira Peralta, Karen M Kuntz, and Eva A Enns. 2018. “Nonidentifiability in Model Calibration and Implications for Medical Decision Making.” Medical Decision Making 38 (7). University of Minnesota Shcool of Public Health: 810–21. doi:10.1177/0272989X18792283.