Schattauer 2009
Keywords: Neoplasms, survival, period analysis, software tool
Summary
Objective: This paper presents a software package for the R language and environment for statistical computing that computes long-term cancer survival estimates based on the period analysis approach. The period analysis approach provides up-to-date long-term survival estimates of concurrently diagnosed patients, enables early detection of recent changes in the long-term prognosis of cancer patients and provides better survival predictions for recently and currently diagnosed patients than traditional cohort-based approaches.
Methods: Computation of absolute and relative survival estimates (both conditional follow-up-year-specific and cumulative survival estimates) and their standard errors is based on standard actuarial methodology. For
Correspondence to:
Dipl.-Inform. Med. Bernd Holleczek
Saarland Cancer Registry
Präsident-Baltz-Strasse 5
66119 Saarbrücken
Germany
E-mail: b.holleczek@gbe-ekr.saarland.de
Introduction
Long-term survival is a key outcome measure in monitoring cancer control. Survival is typically reported as the proportion of patients still alive a given time span after diagnosis, most often as 5-year or 10-year survival (e.g. [1-4]).
Cohort, Complete and Period Analysis
Traditional cohort-based long-term survival analysis measures the survival of patients who were diagnosed within a defined calendar interval many years ago and who have been followed up completely over a defined time span since then. Thus, survival estimates derived from cohort-based analysis
Methods Inf Med 2/2009
B. Holleczek et al.: periodR - an R Package to Calculate Long-term Cancer Survival Estimates Using Period Analysis
Fig. 1 Data used for estimating 5-year survival by cohort analysis for patients diagnosed in 1996-2000 (dashed frame) and by period analysis for the 2001-2005 period (closed frame). The numbers within the cells indicate the years following diagnosis.
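The data selection shown in Fig. 1 can be sketched in Python on the schematic grid of whole calendar years. The sketch assumes, as in the figure, that follow-up year k of patients diagnosed in year d is attributed to calendar year d + k - 1; the function names are illustrative and not part of periodR, whose input records carry months of diagnosis and end of follow-up (Table 2) rather than whole years only.

```python
def cohort_cells(first_diag: int, last_diag: int, years: int = 5) -> set:
    """Cells (diagnosis year d, follow-up year k) used by a cohort analysis:
    all follow-up years 1..years of patients diagnosed in first_diag..last_diag."""
    return {(d, k) for d in range(first_diag, last_diag + 1) for k in range(1, years + 1)}


def period_cells(perbeg: int, perend: int, years: int = 5) -> set:
    """Cells used by a period analysis: follow-up year k of patients diagnosed
    in year d is included if its calendar year c = d + k - 1 lies in perbeg..perend."""
    return {(c - k + 1, k) for c in range(perbeg, perend + 1) for k in range(1, years + 1)}


cohort = cohort_cells(1996, 2000)
period = period_cells(2001, 2005)

# Diagnosis years contributing to the period estimate (1997 to 2005).
print(sorted({d for d, _ in period}))
# Latest calendar year of follow-up needed by the cohort estimate in this schematic.
print(max(d + k - 1 for d, k in cohort))
# Number of cells shared by both analyses.
print(len(cohort & period))
```

In the period set, first-year survival comes from patients diagnosed in 2001-2005, whereas fifth-year survival comes from patients diagnosed in 1997-2001; in the cohort set, all five years come from patients diagnosed in 1996-2000.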
Table 1 Specification of most relevant arguments of function period
Arguments: data, surv.m, surv.f, perbeg, perend, method
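dm: month of diagnosis (numeric: 1-12, e.g. 6 for June)
dy: year of diagnosis (numeric: 4 digits, e.g. 2000)
diagage: age at diagnosis
fm: month when follow-up ended
fy: year when follow-up ended
vitstat: vital status when follow-up ended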
Function period checks and coerces its arguments before passing the data to subroutines in a dynamically loaded library, which sum up the observed and expected numbers of persons at risk and deaths during the specified calendar period. Missing arguments and type mismatches result in error messages. Next, the function calculates follow-up-year-specific conditional and cumulative absolute and relative survival estimates and their standard errors. Finally, the results are rearranged and returned. The source code includes additional comments and implementation details.
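The actuarial computation described above can be illustrated with a minimal Python sketch. It assumes that the persons at risk, deaths and withdrawals per follow-up year have already been summed, that withdrawals are at risk for half a year on average, that the expected conditional survival of the general population is supplied per follow-up year, and that standard errors follow Greenwood's formula [22], with relative survival standard errors obtained by dividing by expected survival (treated as fixed). The class and function names are hypothetical and do not reproduce periodR's internal code.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List


@dataclass
class FollowUpYear:
    at_risk: float      # persons entering this follow-up year (within the analysed window)
    deaths: float       # observed deaths during the year
    withdrawals: float  # persons whose follow-up ended alive during the year
    expected: float     # expected conditional survival of the general population (assumed input)


def life_table(years: List[FollowUpYear]) -> List[Dict[str, float]]:
    """Actuarial life table with conditional and cumulative observed (absolute)
    and relative survival plus standard errors (sketch, not periodR's code)."""
    rows = []
    cum_obs = 1.0    # cumulative observed survival
    cum_exp = 1.0    # cumulative expected survival
    greenwood = 0.0  # running sum of Greenwood's variance terms
    for y in years:
        n_eff = y.at_risk - y.withdrawals / 2.0  # withdrawals count as half a person at risk
        q = y.deaths / n_eff                     # conditional probability of dying
        p = 1.0 - q                              # conditional observed survival
        se_p = sqrt(p * q / n_eff)               # binomial standard error of p
        cum_obs *= p
        cum_exp *= y.expected
        greenwood += y.deaths / (n_eff * (n_eff - y.deaths))  # undefined if everybody dies
        se_cum = cum_obs * sqrt(greenwood)
        rows.append({
            "cond_obs": p, "se_cond_obs": se_p,
            "cum_obs": cum_obs, "se_cum_obs": se_cum,
            # relative survival: observed divided by expected, expected treated as fixed
            "cond_rel": p / y.expected, "se_cond_rel": se_p / y.expected,
            "cum_rel": cum_obs / cum_exp, "se_cum_rel": se_cum / cum_exp,
        })
    return rows


table = life_table([
    FollowUpYear(at_risk=100, deaths=10, withdrawals=0, expected=0.95),
    FollowUpYear(at_risk=90, deaths=9, withdrawals=0, expected=0.95),
])
for k, row in enumerate(table, start=1):
    print(k, round(row["cum_obs"], 4), round(row["cum_rel"], 4), round(row["se_cum_obs"], 4))
```

Because the cumulative relative estimate is the ratio of cumulative observed to cumulative expected survival, it equals the product of the conditional relative estimates in this sketch.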
Data Requirements
For each observation to be included in the analysis, argument data must contain at least the following variables (variable names in parentheses): gender (sex), month and year of diagnosis (dm and dy), age at diagnosis (diagage), month and year when follow-up ended (fm and fy) and vital status when follow-up ended (vitstat). Variable types and ranges are specified in Table 2.
Records should be excluded if any of the mandatory variables (sex, dm, dy, diagage, fm, fy and vitstat) has missing information.
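The data preparation described above can be sketched in Python. The helper names (complete_records, follow_up_months, follow_up_years_in_period) are hypothetical and do not reproduce periodR's internal routines. The sketch assumes that each record is a dictionary keyed by the variable names of Table 2, that follow-up is measured in whole months, and, as a simplification, that a follow-up year contributes to a period analysis whenever its 12-month span overlaps the calendar window; how partial overlaps are weighted is not described here. The vitstat values in the example records are illustrative only.

```python
from typing import Dict, List, Optional

# Mandatory variables listed in the text (names as used in argument data).
MANDATORY = ("sex", "dm", "dy", "diagage", "fm", "fy", "vitstat")

Record = Dict[str, Optional[int]]


def complete_records(records: List[Record]) -> List[Record]:
    """Keep only records without missing information in any mandatory variable."""
    return [r for r in records if all(r.get(v) is not None for v in MANDATORY)]


def month_index(year: int, month: int) -> int:
    """Number of whole months since January of year 0 (month is 1-12)."""
    return year * 12 + (month - 1)


def follow_up_months(r: Record) -> int:
    """Elapsed months between diagnosis (dm/dy) and end of follow-up (fm/fy)."""
    return month_index(r["fy"], r["fm"]) - month_index(r["dy"], r["dm"])


def follow_up_years_in_period(r: Record, perbeg: int, perend: int, max_years: int = 5) -> List[int]:
    """Follow-up years k (1..max_years) that the patient entered and whose
    12-month span overlaps the calendar years perbeg..perend (assumed rule)."""
    start = month_index(r["dy"], r["dm"])
    end = month_index(r["fy"], r["fm"])
    window_lo = month_index(perbeg, 1)
    window_hi = month_index(perend, 12) + 1  # exclusive upper bound
    years = []
    for k in range(1, max_years + 1):
        k_lo = start + 12 * (k - 1)
        k_hi = k_lo + 12
        if k_lo > end:  # follow-up ended before year k began
            break
        if k_lo < window_hi and k_hi > window_lo:  # year k overlaps the window
            years.append(k)
    return years


records = [
    {"sex": 1, "dm": 6, "dy": 1999, "diagage": 70, "fm": 12, "fy": 2005, "vitstat": 1},
    {"sex": 2, "dm": 1, "dy": 1996, "diagage": 58, "fm": 3, "fy": 1997, "vitstat": 2},
    {"sex": 2, "dm": 4, "dy": 2003, "diagage": 66, "fm": 9, "fy": 2005, "vitstat": None},
]
usable = complete_records(records)  # the third record is dropped (missing vitstat)
for r in usable:
    print(follow_up_months(r), follow_up_years_in_period(r, 2001, 2005))
# 78 [2, 3, 4, 5]
# 14 []
```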
For calculation of relative survival, separate data frames of age-specific survival
probabilities of the underlying population
must be prepared for each sex category. These
data frames consist of coupled vectors con-
Empirical Example
The comparison of age-group-specific 5-year relative survival estimates derived from a conventional cohort-based analysis and from a period analysis is used to illustrate the use of the package. The most important steps of the analysis are presented here; the complete source code is included as directly executable examples in the documentation integrated in the package.
For the analysis, data of 1,963 stomach cancer patients, included in the R package as dataset stomach, are used. The data were provided by the Saarland Cancer Registry (Germany), which covers a population of almost 1.06 million residents [26]. The registry has provided high-quality data on cancer incidence and mortality since 1970 and meets national and international standards in terms of quality and completeness of data [31-33].
The dataset includes records of stomach cancer patients diagnosed between 1996 and 2005 and followed up until the end of 2005. Records of patients aged less than 15 at the time of diagnosis, cancers notified by death certificate only or diagnosed at autopsy, and multiple primary cancers have been excluded. The dataset consists of the variables sex, diagage, dm, dy, fm, fy and vitstat as specified in the section "Data Requirements" and Table 2. Prior to the analysis, the patients are grouped into two age categories at the time of diagnosis, 15-64 and 65 years or older; this information is stored in variable agr of data frame sto.agr (agr = 0 for the younger group and agr = 1 for patients aged 65 or older). Age- and calendar-year-specific survival probabilities of the German population during the calendar years 1996 to 2005
Table 3 Conditional and cumulative relative survival estimates of stomach cancer patients grouped into two age categories (in percent, with standard errors in parentheses) for five years of follow-up, calculated for the cohort of patients diagnosed in 1996-2000 (followed up until the end of 2005) and for the period 2001-2005
                Age at diagnosis 15-64                              Age at diagnosis >= 65
                Cohort 1996-2000          Period 2001-2005          Cohort 1996-2000          Period 2001-2005
Follow-up year  conditional  cumulative   conditional  cumulative   conditional  cumulative   conditional  cumulative
1               62.4 (2.7)   62.4 (2.7)   63.1 (2.8)   63.1 (2.8)   41.6 (2.0)   41.6 (2.0)   49.5 (2.1)   49.5 (2.1)
2               77.9 (3.0)   48.6 (2.8)   78.1 (3.0)   49.2 (2.9)   72.6 (2.9)   31.7 (1.9)   80.8 (2.6)   40.0 (2.2)
3               89.9 (2.5)   43.7 (2.8)   86.7 (2.8)   42.7 (2.9)   84.0 (3.1)   26.6 (1.9)   84.8 (2.9)   33.9 (2.2)
4               97.3 (1.6)   42.5 (2.8)   96.3 (1.8)   41.1 (2.9)   89.8 (3.1)   23.9 (1.9)   90.0 (3.1)   30.5 (2.2)
5               96.5 (1.8)   41.0 (2.8)   96.3 (1.8)   39.6 (2.9)   98.6 (2.6)   23.6 (2.0)   99.4 (2.5)   30.3 (2.3)
Fig. 2 Plotted curves of follow-up-year-specific relative survival of stomach cancer patients aged 65 years or older as produced by the R package. Saarland Cancer Registry, 1996-2000 cohort and 2001-2005 period analysis
Fig. 3 Age-group-specific cumulative relative survival curves of stomach cancer patients as produced by the R package. Saarland Cancer Registry, 2001-2005 period analysis
The cumulative 5-year relative survival estimates for patients aged 15-64 at the time of diagnosis that would be available at the end of 2005 are almost identical for the cohort of patients diagnosed in 1996 to 2000 (41.0 percent) and for the period 2001 to 2005 (39.6 percent). In the group of patients aged 65 years or older, the period analysis reveals a major interim improvement in 5-year relative survival: the estimated cumulative survival of the 1996-2000 cohort is 23.6 percent, compared with 30.3 percent in the period 2001 to 2005. As Table 3 and Figure 2 illustrate, this improvement results primarily from improved survival during the first two years following diagnosis.
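This comparison can be retraced from the published values of Table 3 with a few lines of Python. The sketch assumes that cumulative relative survival is the running product of the conditional estimates, which reproduces the published 5-year values of both period columns up to rounding of the percentages; it then shows the ratio of period to cohort conditional survival per follow-up year for patients aged 65 or older.

```python
# Conditional relative survival (percent) for follow-up years 1-5, copied from Table 3.
period_65plus = [49.5, 80.8, 84.8, 90.0, 99.4]
cohort_65plus = [41.6, 72.6, 84.0, 89.8, 98.6]
period_15_64 = [63.1, 78.1, 86.7, 96.3, 96.3]


def cumulative(conditional_percent):
    """Cumulative survival (percent) as the running product of conditional survival."""
    out, running = [], 1.0
    for c in conditional_percent:
        running *= c / 100.0
        out.append(100.0 * running)
    return out


print(round(cumulative(period_65plus)[-1], 1))  # 30.3, period 2001-2005, aged >= 65
print(round(cumulative(period_15_64)[-1], 1))   # 39.6, period 2001-2005, aged 15-64

# Ratio of period to cohort conditional survival, patients aged >= 65.
ratios = [p / c for p, c in zip(period_65plus, cohort_65plus)]
for k, r in enumerate(ratios, start=1):
    print(f"year {k}: period/cohort = {r:.3f}")
```

The ratios clearly exceed 1 only in follow-up years 1 and 2 and stay close to 1 thereafter, in line with the statement that the improvement stems mainly from the first two years after diagnosis.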
Outlook
In this paper we present periodR, an add-on package for the R environment for statistical computing intended to facilitate the use of period analysis for the estimation of long-term cancer survival. The package may be used to compute up-to-date estimates of absolute and relative survival and hence to report and detect changes in long-term cancer survival as early as possible.
The add-on package extends the range of software currently available for period analysis. The software underwent extensive empirical evaluation and thorough code inspection to ensure its reliability. It will be available in precompiled form for Windows platforms and as a source package for other operating systems on the website of the Saarland Cancer Registry [26].
Implementing period analysis software for the R environment, which is well established and widely used in academic settings, has several benefits. The periodR package will be freely available at no cost and without licensing restrictions. It can be installed easily on all major operating systems, allows computation of different types of survival estimates (cohort-based and period analysis) and includes detailed documentation of all components of the package, as well as directly executable examples using a real cancer dataset.
The software does not support the computation of survival estimates for follow-up intervals of different lengths (e.g. shorter intervals during the first years after diagnosis). As long-term survival estimates derived by a life-table-based method are typically at most marginally affected by the choice of intervals, this limitation may not be serious. The source code of the package is available and may allow further refinements.
We hope that the add-on R package periodR will reach a wide audience among researchers working with data from population-based cancer registries, where access to sophisticated commercial software packages is often limited. Maintenance and further development of the package are intended, along with future improvements of the period analysis methodology (e.g. model-based survival analysis [35]).
Acknowledgments
The software and publication were realized in
the framework of the IMPROVE project with
the support of the Deutsche Krebshilfe (German Cancer Aid), grant no. 70-3166-Br5.
References
1. Berrino F, De Angelis R, Sant M, Rosso S, Bielska-Lasota M, Coebergh JW, et al. Survival for eight major cancers and all cancers combined for European adults diagnosed in 1995-99: results of the EUROCARE-4 study. Lancet Oncol 2007; 8: 773-783.
2. Verdecchia A, Francisci S, Brenner H, Gatta G, Micheli A, Mangone L, et al. Recent cancer survival in Europe: a 2000-02 period analysis of EUROCARE-4 data. Lancet Oncol 2007; 8: 784-796.
3. Coleman MP, Rachet B, Woods LM, Mitry E, Riga M, Cooper N, et al. Trends and socioeconomic inequalities in cancer survival in England and Wales up to 2001. Br J Cancer 2004; 90: 1367-1373.
4. Ellison LF, Gibbons L. Survival from cancer - up-to-date predictions using period analysis. Health Rep 2006; 17: 19-30.
5. Ederer F, Axtell LM, Cutler SJ. The relative survival rate: a statistical methodology. Natl Cancer Inst Monogr 1961; 6: 101-121.
6. Henson DE, Ries LA. The relative survival rate. Cancer 1995; 76: 1687-1688.
7. Parkin DM, Hakulinen T. Cancer registration: principles and methods. Analysis of survival. IARC Sci Publ 1991. pp 159-176.
8. Brenner H, Gefeller O. An alternative approach to monitoring cancer patient survival. Cancer 1996; 78: 2004-2010.
9. Brenner H, Gefeller O. Deriving more up-to-date estimates of long-term patient survival. J Clin Epidemiol 1997; 50: 211-216.
10. Brenner H, Hakulinen T. Up-to-date long-term survival curves of patients with cancer by period analysis. J Clin Oncol 2002; 20: 826-832.
11. Brenner H, Söderman B, Hakulinen T. Use of period analysis for providing more up-to-date estimates of long-term survival rates: empirical evaluation among 370,000 cancer patients in Finland. Int J Epidemiol 2002; 31: 456-462.
12. Talbäck M, Stenbeck M, Rosén M. Up-to-date long-term survival of cancer patients: an evaluation of period analysis on Swedish Cancer Registry data. Eur J Cancer 2004; 40: 1361-1372.
13. Ellison LF. An empirical evaluation of period survival analysis using data from the Canadian Cancer Registry. Ann Epidemiol 2006; 16: 191-196.
14. Brenner H, Gefeller O, Stegmaier C, Ziegler H. More Up-To-Date Monitoring of Long-Term Survival Rates by Cancer Registries: An Empirical Example. Methods Inf Med 2001; 40: 248-252.
15. Aareleid T, Brenner H. Trends in cancer patient survival in Estonia before and after the transition from a Soviet republic to an open-market economy. Int J Cancer 2002; 102: 45-50.
16. Brenner H. Long-term survival rates of cancer patients achieved by the end of the 20th century: a period analysis. Lancet 2002; 360: 1131-1135.
17. Talbäck M, Rosén M, Stenbeck M, Dickman PW. Cancer patient survival in Sweden at the beginning of the third millennium - predictions using period analysis. Cancer Causes Control 2004; 15: 967-976.
18. Brenner H, Stegmaier C, Ziegler H. Long-term survival of cancer patients in Germany achieved by the beginning of the third millenium. Ann Oncol 2005; 16: 981-986.
19. Houterman S, Janssen-Heijnen ML, van de Poll-Franse LV, Brenner H, Coebergh JW. Higher long-term cancer survival rates in southeastern Netherlands using up-to-date period analysis. Ann Oncol 2006; 17: 709-712.
20. Gondos A, Arndt V, Holleczek B, Stegmaier C, Ziegler H, Brenner H. Cancer survival in Germany and the United States at the beginning of the 21st century: an up-to-date comparison by period analysis. Int J Cancer 2007; 121: 395-400.
21. Brenner H, Gefeller O, Hakulinen T. Period analysis for up-to-date cancer survival data: theory, empirical evaluation, computational realisation and applications. Eur J Cancer 2004; 40: 326-335.
22. Greenwood M. The natural duration of cancer. In: Reports on Public Health and Medical Subjects. London: Her Majesty's Stationery Office; 1926. pp 1-26.
23. R Development Core Team. R: A Language and Environment for Statistical Computing. 2007. URL: http://www.r-project.org (26.08.2008 14:00 CET).