
Fundamentals of Model Calibration:

Theory & Practice

ISPOR 17th Annual International Meeting


Washington, DC USA
4 June 2012

Workshop Leaders
Douglas Taylor, MBA
Associate Director, Ironwood Pharmaceuticals Inc, Cambridge, MA
USA
Ankur Pandya, PhD MPH
Graduate Student, Harvard University, Boston, MA USA
David Thompson, PhD
Executive Vice President & Senior Scientist, OptumInsight, Boston,
MA USA


Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 2
Acknowledgements

We would like to thank our colleagues who have contributed much to this
research over the last several years

Kristen Gilmore
Rowan Iskandar
Denise Kruzikas
Kevin Leahy
Vivek Pawar
Milton Weinstein


Workshop Objectives
Discuss rationale for model calibration: in what circumstances is
calibration needed?
Provide overview of model calibration process: selection of inputs,
specifying the objective function, implementing the search process, and
evaluating the calibration results
Describe advanced topics in model calibration, including incorporation
of calibrated inputs into uncertainty analyses
Illustrate concepts through real-world examples


Concept of Model Calibration
Calibration traditionally conceptualized as an important, but not
necessary, step in model validation:
If reliable benchmark data exist, then predictive validity can be
assessed & model calibrated if found to be inaccurate
Otherwise, model cannot be impugned for not being calibrated
Calibration task involves systematic adjustment of model parameter
estimates so that model outputs more accurately reflect external
benchmarks
Calibration requires modeler to assess how model outputs can govern
model inputs, rather than the other way around

[Diagram: Data Sources feed Model Inputs, which run through the Model to produce Model Outputs]


When is calibration needed?


Model validity threatened by spatial variation (eg, if being adapted
from original setting to a foreign one)

[Chart: CHD risk vs. cholesterol level, with separate curves for the US and France]

When is calibration needed?
Model validity threatened by temporal variation (eg, if input data are
old or secular changes have occurred since their collection)

[Chart: CHD risk vs. cholesterol level, with separate curves for the US in 1980 and the US in 2010]


When is calibration needed?


Model validity threatened by heterogeneity (eg, population average
data available, but subgroup data not)

[Chart: CHD risk vs. cholesterol level, with separate curves for US men, US women, and the US average]

Model Calibration Process

[Diagram, iterative cycle: Estimate Model Parameters → Run Model → Assess Results → Adjust Inputs → repeat]

Looks straightforward, but


What criteria do we employ to assess model results?
How do we go about adjusting model inputs?
How do we know when we are done?


Thank You.

Contact Info:
David Thompson, PhD
david.thompson@optum.com
781-518-4034
Fundamentals of Model Calibration: Theory & Practice

Identifying Inputs to Calibrate
Theoretically, any input could be calibrated
But inputs should be related to the problem to justify using calibration:
Adapted from one setting to another
Estimated from heterogeneous populations
Affected by temporal changes in epidemiology or practice patterns
Identifying Calibration Targets
Targets should be based on setting-specific (or otherwise appropriate) data
Model should predict these types of events (age-specific, composite outcomes, etc.)

Goodness of Fit
Assess how well model outputs match observed data
Three potential approaches:
Acceptable windows
Minimizing deviations
Likelihood functions
Acceptable Windows
Compare model-predicted outcomes to established ranges for each endpoint
Suitable when there are multiple endpoints of interest
Easy to implement
Limitation: Does not capture the degree of closeness
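The window check is easy to implement; a minimal sketch in Python, with hypothetical endpoint names and bounds:

```python
# Hypothetical endpoint names and window bounds, for illustration only.
def within_windows(predictions, windows):
    """True only if every predicted endpoint falls inside its acceptable window."""
    return all(lo <= predictions[name] <= hi for name, (lo, hi) in windows.items())

windows = {"incidence": (7.0, 9.5), "mortality": (3.0, 4.5)}
good_fit = within_windows({"incidence": 8.2, "mortality": 4.1}, windows)  # inside both windows
bad_fit = within_windows({"incidence": 10.1, "mortality": 4.1}, windows)  # incidence too high
```

Note the limitation from the slide: the check is pass/fail, so a prediction just inside a window scores the same as one dead-center.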

Acceptable Windows: Example

[Chart, shown in four build steps: model-predicted outcomes for a series of endpoints plotted against an upper bound and a lower bound]

Minimizing Deviation
Summary measure of relative distance of model-produced results from benchmarks
Captures magnitude of goodness of fit
Easy to implement
Weights all endpoints equally, unless a weighting scheme is introduced
Percentage Deviation

Weighted Mean Percentage Deviation = Σᵢ₌₁ⁿ wᵢ |predᵢ − obsᵢ| / obsᵢ

Where:
n = number of endpoints
predᵢ = model-based estimate of the ith endpoint
obsᵢ = data-based target value of the ith endpoint
wᵢ = weight assigned to the ith endpoint
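A direct Python translation of the formula above; the endpoint values and equal weights are illustrative:

```python
def weighted_mean_pct_deviation(pred, obs, weights):
    # Sum over endpoints i of w_i * |pred_i - obs_i| / obs_i
    return sum(w * abs(p - o) / o for p, o, w in zip(pred, obs, weights))

pred = [8.2, 10.4, 6.9]    # model-based estimates (illustrative)
obs = [8.0, 10.0, 7.0]     # data-based target values (illustrative)
weights = [1/3, 1/3, 1/3]  # equal weights unless a weighting scheme is introduced
d = weighted_mean_pct_deviation(pred, obs, weights)
```

Lower values mean a closer fit, and unlike the window approach the measure is sensitive to how far each endpoint misses its target.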

Minimizing Deviation: Example

[Chart, shown in three build steps: model-produced results plotted against a target value]
Likelihood Functions
How likely the model-produced results are in light of the observed outcomes
Incorporates precision of endpoint data
Harder to implement:
Need data on sample sizes
Have to know (or assume) distributions

Likelihood Functions: Example
Assume incidence has a binomial distribution

Pr(K = k) = (n choose k) pᵏ (1 − p)ⁿ⁻ᵏ

Where:
k = # of events observed in model
n = sample size of outcome data
p = # of events observed in outcome data / n
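The binomial likelihood above can be computed directly; a small Python sketch with illustrative numbers (not the ARIC values used later in the deck):

```python
from math import comb

def binomial_likelihood(k_model, n_data, events_data):
    # Pr(K = k) = C(n, k) * p^k * (1 - p)^(n - k), with p estimated from the data.
    p = events_data / n_data
    return comb(n_data, k_model) * p**k_model * (1 - p)**(n_data - k_model)

# Illustrative: the model produces 12 events where the data showed 10 of 100.
L = binomial_likelihood(k_model=12, n_data=100, events_data=10)
```

The closer the model's event count is to the observed count, the larger the likelihood.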
Likelihood Function: Example
n = person-years
k = events
(k / n) × 1000 = incidence (y-axis)

[Chart: target point with k = 23, n = 2,800, incidence ≈ 8.21]

Likelihood Function: Example

[Chart, shown in two build steps: age-specific incidence (per 1,000 person-years) for age groups 45-54, 55-64, 65-74, and 75-84 yrs. ARIC targets (k = 23, n = 2,800 for one age group; k = 287, n = 49,000 for another) are compared with Parameter Set 1 and Parameter Set 2, with per-point likelihoods annotated: k = 28, L = 0.047; k = 14, L = 0.013; k = 240, L = 0.00045; k = 368, L = 0.00000064]

Combining Likelihoods
Multiply likelihoods (if independent)
Sum log-likelihoods
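A quick Python illustration of the equivalence, using the per-point likelihoods from the preceding example slides (0.047, 0.013, 0.00045):

```python
import math

# Per-endpoint likelihoods from the example slides above.
likelihoods = [0.047, 0.013, 0.00045]

product = math.prod(likelihoods)                 # multiply (if independent)
log_sum = sum(math.log(L) for L in likelihoods)  # or sum log-likelihoods

# Equivalent up to floating point; summing log-likelihoods avoids
# numerical underflow when many endpoints are combined.
```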
Summary of Goodness-of-Fit Options

Acceptable Windows:
Easy to implement
Not sensitive to magnitude of deviations

Deviations:
Easy to implement
Captures magnitude of deviations
Weights for multiple endpoints will be subjective

Likelihood-based:
Need specific data
Need to know (or assume) distribution
Gives meaningful goodness-of-fit measures (i.e., likelihoods are probabilities)

Parameter Search Methods
How to adjust inputs during calibration?
Manual adjustment
Random searches
Directed search algorithms
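A random search can be sketched in a few lines; the stand-in model, objective, and parameter bounds below are hypothetical, not the workshop's model:

```python
import random

def run_model(params):
    # Stand-in for a disease model: maps two inputs to one predicted endpoint.
    return 100 * params["rate"] + 5 * params["rr"]

def objective(params, target=10.0):
    # Relative deviation of the model output from a calibration target.
    return abs(run_model(params) - target) / target

def random_search(bounds, n_draws=1000, seed=1):
    # Draw parameter sets uniformly within bounds; keep the best-fitting one.
    rng = random.Random(seed)
    best, best_fit = None, float("inf")
    for _ in range(n_draws):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
        fit = objective(params)
        if fit < best_fit:
            best, best_fit = params, fit
    return best, best_fit

best, fit = random_search({"rate": (0.0, 0.2), "rr": (0.5, 2.0)})
```

Manual adjustment replaces the loop with an analyst's judgment; directed algorithms (e.g., Nelder-Mead, discussed later) use the objective value to steer each new draw instead of sampling blindly.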
Fundamentals of Model Calibration: Theory &
Practice
Advanced Topics


Excel Demonstration
Results of 100 calibrations of a simple model


Advanced Topics
Probabilistic and deterministic sensitivity analysis
for calibrated disease models
Incorporating uncertainty of calibration endpoints in
calibrated oncology models
Identification of and correction for bias introduced from calibrating longitudinal models to cross-sectional data

Probabilistic and deterministic
sensitivity analyses for calibrated
disease models


Why CSA Was Needed

[Scatter plot: incremental cost vs. incremental QALY for a single calibration, plotted against the $50K threshold. Median: $10,500; Mean: $10,600; 95% CI: ($7,800; $13,900)]

Why CSA Was Needed
Sources of uncertainty
Algorithm
Analyst in a manual calibration
Starting seed/search space in a random calibration
Starting simplex in Nelder-Mead calibration
Objective function
Is really quite subjective
Choices include:
Calibration targets
Weighting scheme
Stopping point


CSA Methods
Evaluated algorithm uncertainty by choosing 5
different starting Nelder-Mead simplexes
Evaluated objective function uncertainty by choosing
5 different objective functions
Combined simplexes and weights for a total of 25
different calibrations
Deterministic sensitivity analysis was performed by
examining cost-effectiveness results for each
calibration while holding all other parameters constant
Probabilistic sensitivity analysis was performed by
bootstrapping (with equal probability) the 25
calibrations within a 2nd order Monte Carlo simulation
for other model parameters
CSA Deterministic Results

ICER* by simplex and weight:

           Weight 1   Weight 2   Weight 3   Weight 4   Weight 5
Simplex 1  $8,400     $13,800    $4,400     $11,600    $5,300
Simplex 2  $17,100    $20,800    $7,800     $15,100    $8,100
Simplex 3  $20,500    $11,500    $27,800    $17,300    $10,900
Simplex 4  $20,700    $22,000    $1,500     $8,000     $5,400
Simplex 5  $20,700    $21,000    $39,100    $12,100    $8,900

Median ICER: $12,600
Mean ICER: $14,000
Range: $1,500 to $39,100

*ICER: Incremental Cost-Effectiveness Ratio (cost per QALY gained) for vaccination vs. no vaccination
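The bootstrap step described under CSA Methods can be sketched against this grid; the sketch below simply resamples the 25 ICERs with equal probability, whereas the full analysis resampled the calibrated parameter sets inside a 2nd-order Monte Carlo simulation over the other model parameters:

```python
import random
import statistics

# The 25 ICERs from the deterministic grid above, read row by row (Simplex 1-5).
icers = [8400, 13800, 4400, 11600, 5300,
         17100, 20800, 7800, 15100, 8100,
         20500, 11500, 27800, 17300, 10900,
         20700, 22000, 1500, 8000, 5400,
         20700, 21000, 39100, 12100, 8900]

# Bootstrap the calibrations with equal probability.
rng = random.Random(0)
draws = [rng.choice(icers) for _ in range(10_000)]
mean_icer = statistics.mean(draws)
median_icer = statistics.median(draws)
```

Resampling calibrations (rather than fixing one) is what widens the probabilistic results on the following slides.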

PSA for a Single Calibration

[Scatter plot: incremental cost vs. incremental QALY for one calibration, plotted against the $50K threshold. Median: $10,500; Mean: $10,600; 95% CI: ($7,800; $13,900)]

CSA Probabilistic SA Results

[Scatter plot: incremental cost vs. incremental QALY across the bootstrapped calibrations, plotted against the $50K threshold. Median: $12,600; Mean: $14,000; 95% CI: ($2,700; $29,100)]

Vaccination of age cohorts is compared with no vaccination among the same age cohorts.
Each square represents a calibration and each color represents the PSA around those calibrations.

Representing uncertainty in
calibration targets

Objective
Demonstrate methods for incorporating uncertainties
in calibration targets into sensitivity analyses (PSA)
using an oncology example


Model

[Diagram: three-state Markov model with Non-Progressed, Progressed, and Dead states]

We constructed hypothetical PFS and OS (with censoring) curves for two treatments and a corresponding three-state Markov model
Three transition probabilities for each treatment were calibrated (using Excel Solver) to simultaneously fit the PFS/OS curves, using mean squared deviation as the objective function
Uncertainty in cost-effectiveness results was represented by cost-effectiveness acceptability curves (CEAC) of lifetime costs and quality-adjusted life-years
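A minimal Python sketch of such a model and objective, with hypothetical transition probabilities and cycle counts (the deck itself used Excel Solver for the search, not Python):

```python
def run_markov(p_np_p, p_np_d, p_p_d, n_cycles=24):
    """Trace state occupancy over time; returns PFS and OS curves.

    p_np_p: Non-Progressed -> Progressed, p_np_d: Non-Progressed -> Dead,
    p_p_d: Progressed -> Dead (per-cycle transition probabilities).
    """
    np_, p_, d_ = 1.0, 0.0, 0.0
    pfs, os_ = [np_], [np_ + p_]
    for _ in range(n_cycles):
        np_, p_, d_ = (np_ * (1 - p_np_p - p_np_d),
                       np_ * p_np_p + p_ * (1 - p_p_d),
                       d_ + np_ * p_np_d + p_ * p_p_d)
        pfs.append(np_)       # PFS = still non-progressed
        os_.append(np_ + p_)  # OS = still alive
    return pfs, os_

def mean_squared_deviation(probs, pfs_targets, os_targets):
    # Objective: mean squared deviation of the model curves from target curves.
    pfs, os_ = run_markov(*probs, n_cycles=len(pfs_targets) - 1)
    pairs = list(zip(pfs, pfs_targets)) + list(zip(os_, os_targets))
    return sum((m - t) ** 2 for m, t in pairs) / len(pairs)
```

A search routine (Solver, or any of the earlier parameter search methods) would then minimize `mean_squared_deviation` over the three transition probabilities for each treatment.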
Analysis

We will look at results of three increasingly comprehensive PSAs using second-order Monte Carlo simulation (SMCS):
Conventional PSA by including only probability distributions of costs and utilities
Calibration Parameter PSA, reflecting uncertainty in the target PFS/OS curves, by specifying beta distributions for failure probabilities at each PFS/OS time point, simulating multiple replicates of the PFS/OS data from these distributions, re-estimating and refitting the curves for each replicate, and incorporating the resulting calibrated parameter sets into the SMCS
Calibration Structural PSA, reflecting uncertainty associated with calibration methods, by varying curve-fitting parameters (initial values, constraints, objective function)

Sensitivity analysis process flow

[Flow diagram:
Generate 200 survival curves from trial data reflecting sampling error →
Calibrate model to generate 200 parameter sets →
Bootstrap 200 parameter sets within PSA
In parallel, alternative calibration methods generate 200 parameter sets]
Sample Kaplan-Meier Data

Timepoint        0     4     9    14    19    24
OS   At Risk   100    88    65    47    23     9
     Censored    0     7     9    12    14     7
PFS  At Risk   100    80    48    27    12     3
     Censored    0     6     8     7     7     3

Uncertainty estimates

Life-table estimates are computed by counting the numbers of censored and uncensored observations that fall in time intervals [t_{i-1}, t_i], i = 1, 2, ..., k+1, where t_0 = 0 and t_{k+1} = ∞

The effective sample size in [t_{i-1}, t_i] is n'_i = n_i − w_i/2, where w_i is the number of units censored in the interval

The conditional probability of an event in [t_{i-1}, t_i] is estimated by q_i = d_i / n'_i, where d_i is the number of events in the interval

The estimated standard error is SE(q_i) = sqrt(q_i p_i / n'_i), where p_i = 1 − q_i
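These life-table quantities can be computed from the sample OS data on the earlier slide; the sketch assumes each censored count applies to the interval ending at that timepoint, which is an assumption about the table's layout:

```python
import math

# OS rows from the Sample Kaplan-Meier Data table (timepoints 0, 4, 9, 14, 19, 24).
at_risk  = [100, 88, 65, 47, 23, 9]
censored = [0, 7, 9, 12, 14, 7]

rows = []
for i in range(len(at_risk) - 1):
    n_i = at_risk[i]
    w_i = censored[i + 1]                    # censored during interval i (assumed layout)
    d_i = n_i - w_i - at_risk[i + 1]         # events in the interval
    n_eff = n_i - w_i / 2                    # effective sample size n'_i
    q_i = d_i / n_eff                        # conditional event probability q_i
    se = math.sqrt(q_i * (1 - q_i) / n_eff)  # SE(q_i) = sqrt(q_i * p_i / n'_i)
    rows.append((d_i, q_i, se))
```

The resulting standard errors are what drive the beta distributions used to simulate replicate survival curves in the Calibration Parameter PSA.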

Uncertainty in survival curves and calibration

[Charts: generated OS curves and calibrated OS curves]

Comparison of PSA approaches

CEAC comparison


Calibrating Longitudinal Models to Cross-Sectional Data: The Effect of Temporal Changes in Health Practices
Objective
One set of calibrated transition
probabilities for a cervical cancer model

Problem
Pap smear screening practices changed
over time
Calibration targets reflect current and past
screening patterns
Older women (>65 years): Less screening
when they were young
Younger women: Exposed to higher
screening rates at same ages
Annual screening coverage by age

[Chart: % screened (0% to 70%) by age (10 to 100), with separate curves for women under 65 and 65+]

How did we calibrate?

[Diagram: model inputs and screening methods feed calibration runs whose outputs are compared against SEER targets, under three approaches]

Single-stage model, single-stage calibration
Two-stage model run w/ single-stage calibration
Two-stage model, two-stage calibration
Results
Incidence and mortality rates per 100,000 (age 65+)

SEER target for incidence: 13.41
SEER target for mortality: 7.14

[Chart: incidence and mortality for each approach plotted against the SEER target lines; values shown include 35.68, 15.81, 10.50, 13.41, and 7.32]

Single-stage model / single-stage calibration
Two-stage model run w/ single-stage calibration
Two-stage model / two-stage calibration

Implication
Effects of temporal changes are important
when calibrating longitudinal models to
cross-sectional data
Conclusions
Time is always a limiting factor: with more time, a
better solution can almost always be found
Calibration can affect the interpretation of cost-effectiveness
results
In order to characterize the uncertainty in a
calibrated model:
Results should be reported as a range from different
calibrations
Calibration should be included in probabilistic sensitivity
analyses
Uncertainty in calibration targets should be considered
Adjustments may need to be made to account for
temporal shifts in data
Using a combination of calibration methods is likely
the most efficient way to arrive at good calibrations


DISCUSSION
