
MISSING THE MARK

Why is it so hard to find a test to predict cancer?


BY LIZZIE BUCHEN

On 3 March, two studies appeared online that offered 19 pages of gloomy reading for anyone interested in cancer. They focused on biological molecules, or biomarkers, the presence of which in the blood might be used to detect the earliest glimmers of ovarian cancer — a disease not normally discovered until it has destroyed the ovaries and rotted other parts of the body. The researchers, coordinated by the Early Detection Research Network (EDRN) of the US National Cancer Institute (NCI), had assembled 35 protein biomarkers, including 5 panels of proteins, that had looked the most promising in early studies. They had carried out rigorous testing — screening blood samples from more than 1,000 women — to ask whether these seemingly breakthrough biomarkers were better at identifying women with early ovarian cancer than the one flawed biomarker that had been in use for almost 30 years, CA-125. None of them was [1,2]. “CA-125 remains the ‘best of a bad lot’,” read an accompanying perspective article [3]. “The new candidates have fallen short of expectations.”

Tied in last place for its poor performance among the biomarker panels was one identified by Gil Mor, a cancer biologist at Yale University in New Haven, Connecticut. Mor’s six-protein panel detected ovarian cancer in only 34% of the women who were diagnosed with the disease within a year. (CA-125, by contrast, detected 63%.) Mor’s panel already had a tortured history. A primary research paper behind it had been criticized by other scientists for allegedly using inappropriate statistical calculations and for optimistically concluding that the test would help women before rigorous follow-up studies proved that it could. Yet for four months in 2008, the test was sold to patients by Laboratory Corporation of America (LabCorp) in Burlington, North Carolina, the company that licensed the panel from Yale. LabCorp had marketed the test under the name OvaSure until the US Food and Drug Administration (FDA) intervened and the company pulled it from the market. The panel offered “invaluable object lessons”

428 | NATURE | VOL 471 | 24 MARCH 2011
© 2011 Macmillan Publishers Limited. All rights reserved
FEATURE NEWS

for bringing a test prematurely to the clinic, wrote the authors of the perspective article.

[Figure: Biomarkers might help to detect ovarian tumours (large round masses) early in the disease. Images by Du Cane Medical Imaging Ltd/SPL]

Similar lessons can be found in the stories behind many cancer biomarkers that have sputtered and failed on their way to the clinic. Those tests that are in clinical use — including prostate-specific antigen (PSA) for prostate cancer, mammogram-detected masses for breast cancer and CA-125 — fail to detect all cancers and sometimes ‘detect’ ones that aren’t there. Genomics, proteomics and other such technologies promised to help by finding combinations of markers that are more powerful and cancer-specific than individual ones, but that promise has not been realized. Researchers using such technologies have published studies on thousands of panels, suggesting that they can detect early-stage disease, guide patient treatment and monitor recurrence. But only a tiny number of such tests have reached the clinic — and none for the early detection of cancer, the biggest clinical challenge of all. “Much biomarker research has been done very badly for decades,” believes Lisa McShane, a biostatistician at the NCI in Rockville, Maryland. “Even when it was single markers. Now, as we’re moving up to multiple markers, all our bad habits are coming back to bite us in a big way.”

These habits have been thrown into the spotlight by the EDRN’s study, one of the largest and most systematic validation studies of biomarkers so far. It came just months after a high-profile decision at Duke University in Durham, North Carolina, to suspend clinical trials of a genomics-based biomarker panel designed to direct chemotherapy in patients with breast cancer. A number of scientists had raised concerns about the Duke group’s data and analysis, and the trial was stopped after allegations came to light that the lead researcher, geneticist Anil Potti, had made false claims on his CV. Last September, the Institute of Medicine (IOM), part of the US National Academies, assembled a committee to discuss lessons for developing tests based on ‘omics’ technologies and bringing them to the clinic. “Why don’t we have assays out there, with this enormous promise?” Dan Hayes, a breast-cancer researcher at the University of Michigan in Ann Arbor, asked researchers at the first IOM committee meeting in December 2010. “It’s either because these things just don’t work, or because we’ve used sloppy science to test them.”

It is too early to say whether either of these is true: the field is still young, and faces many challenges. It has drawn in many cancer biologists who are excited by the potential to translate their work to the clinic — but they sometimes lack the expertise or resources needed to pursue translational or clinical work. “A lot of novices came in. They get in without realizing that the problem may be more complex than it appears,” says Eleftherios Diamandis, a clinical biochemist at the University of Toronto in Canada. And although most experts agree that potential biomarkers for early cancer detection should be validated on samples taken before diagnosis — the stage at which the test would be used in the clinic — that is a step that few groups attempt and no biomarker for ovarian cancer has passed, as the EDRN study made clear. “Sometimes the glamour of the technology or the sheer volume of omics data seem to make investigators forget basic scientific principles,” said McShane at the IOM meeting. Mor agrees that the field has faced problems, and that it is important for markers to go through a careful process of design and validation, as he tried to do.

“There’s been an enormous amount of hype and promise,” sums up David Ransohoff, a cancer epidemiologist at the University of North Carolina in Chapel Hill. “But after 10 or 15 years of intense work in these fields, there’s simply not a lot to show for it. It’s important for the whole field to step back and look at what is wrong.”

NATURE.COM: A possible way forward for cancer biomarkers: go.nature.com/icwtue

MAKING A DIFFERENCE

Mor began his career in Israel, where he trained as a clinician at the Hebrew University of Jerusalem. But an experience in the final years of his oncology residency compelled him to change course. A young woman arrived at the hospital with ovarian cancer, a disease that kills some 140,000 women worldwide each year. The oncology team removed the woman’s ovaries and put her through several rounds of chemotherapy, which seemed to be successful. But 18 months later, she was back, her body riddled with tumours, and she soon died. “Chemotherapy didn’t do anything for her,”


Mor recalls. “She was 29. She was a beautiful girl. An impressive girl. A medical student. And I never understood what happened to her.”

Mor decided to leave medicine, which had been unable to save her, for research, which one day might. He earned a PhD studying ovarian cancer at the Weizmann Institute of Science in Rehovot, Israel, before moving to Yale in 1997. He went on to start a programme called Discovery to Cure, aiming to speed cancer research to the clinic. The group began to build a bank of blood and tissue samples, including some from a Yale clinic for women with a high risk of ovarian cancer owing to a family history of the disease. “There was a lot of excitement around that time for finding proteins specific to cancer,” says Mor.

In 2003, David Ward, then a geneticist at Yale, contacted Mor. Ward had co-founded Molecular Staging, a company in New Haven that had developed a ‘high-throughput’ technique for quantifying multiple proteins in the blood using arrays of antibodies [4]. He asked whether he could use Mor’s samples to search for markers of early ovarian cancer.

Mor had never been involved with biomarker research — “I do biology of cancer, not biomarker development,” he says — but he signed up, intrigued by the clinical potential of the technology. Ward had scoured the literature for proteins that had been associated with ovarian-cancer growth and malignancy, and had come up with 169 candidates. Using the protein-quantification technique, Ward’s company screened blood samples in Mor’s tissue bank that came from two groups: women with newly diagnosed ovarian cancer who had been enrolled in Yale’s high-risk clinic, and women who had come to the hospital for routine gynaecological exams. Using additional cancer-patient samples, they whittled the list down to four proteins: leptin, prolactin, osteopontin and insulin-like growth factor II.

Mor worked to develop an algorithm that could automatically classify women as having cancer or not, depending on levels of these four proteins. When the team ran a new set of blood samples through the algorithm, they got astounding results. The test showed a sensitivity of 95% (meaning it correctly detected 95% of the ovarian-cancer cases) and a specificity of 95% (it erroneously classified only 5% of healthy people as having cancer). “I was delighted,” says Mor. On equivalent samples, CA-125 tests typically have a sensitivity of 70–80% and a specificity of around 95%. In May 2005, the findings were published in the Proceedings of the National Academy of Sciences (PNAS), with Ward as a contributing author [5].

Before publication, Mor helped the Yale Office of Cooperative Research to prepare a patent application. “A lot of companies expressed interest in licensing the panel,” says John Puziss, director of technology licensing at Yale. LabCorp licensed the test in 2006, as did Millipore, a biomanufacturing company based in Billerica, Massachusetts. (Mor says that the royalties he and his co-inventors received “were not a significant amount”.)

The test’s promising results had also caught the attention of researchers in the EDRN, who were just putting together their validation study. Up to that point, most biomarkers for detecting early ovarian cancer had only been shown to distinguish patients with diagnosed cancer from healthy controls, but they are intended to detect the disease in women whose cancer is just budding, before symptoms develop. What the field needed was a ‘prospective’ study, run on blood samples from apparently healthy women, to see whether the biomarkers could pinpoint those who would later be diagnosed with ovarian cancer. Such samples, from large numbers of women who are tracked over months or years, are extremely difficult to come by.

PROBLEM DETECTION

The EDRN found what was needed in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial, sponsored and run by the NCI. Between 1992 and 2001, the trial had been collecting blood at regular intervals from 155,000 women and men, and screening them for cancer. By June 2006, 118 of the women had developed ovarian or closely related cancers, and the EDRN researchers were now in a position to use them to evaluate the most promising biomarkers for early detection. Ziding Feng, a biostatistician at the Fred Hutchinson Cancer Research Center (FHCRC) in Seattle, Washington, and coordinator of the EDRN, visited Mor to discuss whether his panel of four proteins could be included in the study.

Mor was already in the process of refining the panel: he had more patient samples, and wanted to add more markers, including CA-125 and the protein macrophage migration inhibitory factor, to make the test more sensitive to cancer. LabCorp had been running his new samples on assay kits manufactured by Millipore. (Ward, meanwhile, had moved to the Nevada Cancer Institute in Las Vegas, and was not involved in data collection or analysis.)

When Mor showed Feng how he was analysing his recent data, Feng was troubled. Mor asked him to go through the new results himself, and Feng agreed to collaborate. “I do not do statistics,” says Mor. “That is not my field.” The researchers also added the six-protein panel to the EDRN’s validation study.

Feng and Gary Longton, another statistician at the FHCRC, developed their own classification algorithms, and found that Mor’s test had a sensitivity of 95% and specificity of 99%. They also calculated the positive predictive value (PPV) of the test — the proportion of patients who the test would diagnose with the disease and do in fact have it. A high PPV means that few people will be misdiagnosed, which is crucial when screening healthy people.

Feng and Longton calculated the PPV at 6.5%, too low for the test to be of much use for screening. But separately, Mor was working with a different figure, of 99.3%. The huge disparity between the two values stemmed from the way that they calculated the figure and factored in the prevalence of ovarian cancer — an important variable in calculating the PPV. Following convention, Feng and Longton calculated the PPV using the accepted prevalence in postmenopausal women, 1 in 2,500 (0.04%). But Mor’s figure was calculated solely from the study population, in which the prevalence was 46%.

“We calculated the PPV based on the population in the study, because we always intended the test for the high-risk population,” says Mor. “If you want to bring the test to the clinic, it has to be calculated based on the population you’re going to study,” he says, noting that other research studies work out the PPV for the study population in this way.

It’s a common mistake, believes McShane, who — like other statisticians — disagrees with Mor’s logic. “I see that a lot, but it is nowhere near the correct thing to do,” she says. Even in high-risk populations — women who are screened every year because of their family history or because they have tested positive for mutations in tumour-suppressor genes BRCA1 or BRCA2 — the prevalence is around 0.5%, far below the 46% in Mor’s study population.

Similar battles over the correct use of statistics litter the cancer-biomarker field, said researchers at the IOM meeting last year. “It’s the type of thing where non-statisticians think statisticians are being uptight about something that’s not going to matter anyway,” says McShane.

Mor prepared a paper reporting the latest work. But when Feng and Longton saw the page proofs, they noticed that the PPV value was reported as 99.3%. They asked Mor to change it to the 6.5% that they had calculated, and to correct a few other typographical errors in the tables. “He agreed, so we signed off,” recalls Feng. But there was a miscommunication: Mor thought that Feng had agreed to the use of the high PPV, and that everyone approved of the final manuscript.

The paper was published online in Clinical Cancer Research [6] in February 2008, and to


Feng’s shock it reported the high PPV. “You can imagine how upset I was when I saw it in the paper,” says Feng.

[Figure: Gil Mor is testing whether a panel of six proteins can detect ovarian cancer in women at high risk. Photo by S. Ogden]

Feng called Mor. “I told him, those are errors, we told you those are not correct.” Feng also contacted the journal, the editor of which asked Mor to submit a correction to fix the PPV and the other typos. Mor agreed, adding the lower PPV as a footnote to the table and in a written correction.

A few weeks later, Feng received an e-mail with unwelcome news from a colleague: LabCorp was preparing to market the panel, and was “hopeful that this test will be available to women by the end of the year”.

“I was shocked,” says Feng. “I had no idea this was coming.” He thought that the markers should be validated further before they went to the clinic. In March 2008, Feng and Mor saw each other at a meeting in Washington DC. “I told him, face to face, you cannot do this,” says Feng. “You have to wait until after the PLCO validation. What you have done is early discovery. If validation does not support your earlier claim, you’re making a significant error.” Mor does not recall this encounter, but says that Feng’s “role was to analyse the data, not to make judgements of a company decision”.

Now, Mor says that if he were preparing the paper again, he would include both the low and high values for the PPV. And he vacillates about whether LabCorp’s decision to offer the test to women before it had undergone more validation studies was the right thing to do. He says he thought that clinical use of the test might be a good way to do further validation. “It’s very difficult to do that on large numbers of patients,” he says. “It’s extremely expensive. The only way to do the study is if LabCorp started distributing the test and enrolling patients.” Mor notes that many tests, such as mammography, have been offered to patients as an aid to diagnosis even while data on the test are being collected. “Was it the right time? I don’t know,” he says.

CRITICAL BACKLASH

On 23 June 2008, LabCorp announced the availability of the OvaSure test, for between US$220 and $240. The press release said that it was being offered to women with a high risk of the disease, and quoted Mor as saying he was “pleased that this test is available to help physicians detect and treat ovarian cancer in its earliest stages”.

Excited chatter about the test spread through patient forums and support groups, but it was soon countered by cautionary tales. Jean McKibben, an ovarian-cancer survivor, rushed to take OvaSure on the first day it was available, and her results showed a 0.00 chance of cancer. A week later, scans showed that her cancer was back. She was crushed. “I wanted this to work so badly,” she wrote on a discussion board.

One week after LabCorp’s announcement, the Society of Gynecologic Oncologists in Chicago, Illinois, released a statement expressing concern about OvaSure, saying that “additional research is needed to validate the test’s effectiveness”. The paper in Clinical Cancer Research was also circulating at the Canary Foundation, a non-profit organization based in Palo Alto, California, that funds research on early cancer detection. Scientists there found other reasons for concern. One member, Nicole Urban, head of the Gynecologic Cancer Research Program at the FHCRC, had found that levels of prolactin, one of the proteins in the panel, are highly sensitive to stress — something very likely to affect women entering the clinic with symptoms of ovarian cancer [7]. After controlling for that, she says, “prolactin gave no signal at all for malignancy. It was useless.” Others pointed out that the high specificity and sensitivity figures reported in the paper’s conclusions, and trumpeted in Yale and OvaSure press releases, were not present in any of the tables or figures. And they bristled at the positive tone of the discussion, which stated that the test “will enhance the potential of treating ovarian cancer in its early stages and therefore, increases the successful treatment of the disease”.

“There were a lot of uncertainties, and evidence of biases,” says Martin McIntosh, who researches markers for early-stage ovarian cancer at the FHCRC, and is a member of the Canary group. “But the narrative only highlighted the best-performing analysis. It didn’t mention caveats.” Members of the Canary group wrote a letter to Clinical Cancer Research, describing some of their complaints. Meanwhile, Feng agreed to co-author a second letter, criticizing the paper even though he was a co-author.

The fuss was already reaching the FDA, which on 7 August 2008 sent a letter to LabCorp saying that the test “has not received adequate clinical validation, and may harm the public health”. A second letter, sent by the FDA on 29 September 2008, alleged that LabCorp did not have the necessary marketing clearance or approval for the test from the FDA. LabCorp replied to the FDA on 20 October, disagreeing with the agency’s assertions, but agreed to pull OvaSure from the market. It did so on 24 October 2008, just one day after Clinical Cancer Research published the critical letters from the Canary Foundation and Feng, as well as a third from the Centers for Disease Control and Prevention (CDC) in Atlanta, Georgia [8–10]. (Millipore continues to market the biomarker panel for use in research, not by patients.)

Mor was surprised by all three letters. In his published response [11], he disputed some of the criticisms and wrote that any concerns about commercialization should be taken up with LabCorp. Stephen Anderson, vice-president of investor relations at LabCorp, says that OvaSure was not marketed as a test for detecting cancer recurrence, which was how some patients used it. He says that LabCorp “continues to believe OvaSure offers a


valuable tool for ovarian-cancer detection in conjunction with other diagnostic techniques”, and that the assay is still in development. The company would not provide further comment.

DOUBTS AND LESSONS

Since then, Mor has worked hard to validate his panel. He and Ward have completed a study on a much larger set of samples including many from women diagnosed in the earliest stages of ovarian cancer [12], and in which LabCorp again ran the assays. The test still performed well at distinguishing the patients from the healthy controls. Mor says he is puzzled by the PLCO trial results, and he hopes that further analysis of the trial data will help to explain why his biomarkers performed so poorly. He continues to express confidence in his panel, saying that the test could be most useful in high-risk populations, and when used regularly — every two to three months — to monitor rising and falling levels of the biomarkers. But the whole experience has made him reluctant to pursue biomarker work much further. “I’m focusing on understanding cancer stem cells,” he says.

Others say that’s just as well. The panel’s poor performance in the PLCO study makes critics question its usefulness in any group, even a high-risk one. McIntosh says that the PLCO study’s damning conclusions should serve as a wake-up call. “The entire field has to cope with this,” he says — including him, given that the most promising biomarkers discovered by his institution also failed to improve on CA-125 in the trial. “It’s hugely disappointing.”

The IOM committee, which is expected to release its results sometime in 2012, may help to find a way forward. At a meeting later this month, the members plan to draw lessons from the biomarker failures, as well as from the few success stories (see ‘The gene collection that could’). One of the most urgent lessons is the need to help researchers validate their biomarkers on appropriate samples before they reach the clinic. Feng says that the EDRN has been collecting its own high-quality tissue reference sets for ovarian, breast, lung, colon, liver and prostate cancers, from people who aren’t yet showing symptoms and those in all stages of the disease. Investigators can apply to test their biomarkers on blinded tissue samples.

Until this type of testing becomes commonplace, there is no way of excluding the possibility that, as Hayes suggested at the IOM meeting, “these things just don’t work” — particularly when it comes to picking up cancer early on. “People keep talking about early-detection biomarkers as if they are a fact, and we only need to find them,” says McIntosh, “when in reality their existence is a hypothesis that needs to be tested.” ■ SEE OUTLOOK P.450

Lizzie Buchen is a freelance writer in San Francisco, California.

CASE STUDY

The gene collection that could

[Figure: Genomic signatures measured in tissue samples can help to classify breast-cancer tumours. Image by Zephyr/SPL]

When asked to name a successful cancer test that is based on multiple genes or proteins, many researchers point to Oncotype DX. The panel, which tests surgically removed breast tumours for the expression level of 21 genes to predict the likelihood of the cancer recurring, was developed by Genomic Health in Redwood City, California, and has been marketed since January 2004. It is used by roughly half the patients in the United States who have the most common type of breast cancer.

“A lot of biomarker research starts with getting some signature and then figuring out how to use it,” says Richard Simon, a biostatistician at the US National Cancer Institute in Rockville, Maryland. But Genomic Health researchers took a different route, sitting down in 2000 with a group of oncologists and patient advocates and asking what question in cancer treatment they should address. Two years later, they nailed it down. At present, the majority of women with the most common type of early breast cancer undergo chemotherapy after surgery, but only 15% are likely to have a recurrence. Was it possible to identify the crucial 15%, and spare the others from chemotherapy? The team set out to find a gene signature that could do the job. “The number one key to their success was starting with a well-defined, clinically relevant question,” says Simon.

The researchers brought in Michael Walker, a statistician then based in Sunnyvale, California, to help design the studies from the outset. Walker says that it rarely works this way, and that often the statistician is only brought in after the data have been collected. By that time, biases and confounding factors may be hard-wired into the data.

The team also put a high priority on using the right tissue samples in their initial studies. They decided early on that they wanted to use tumour tissue that had been fixed in formalin and embedded in paraffin — the way it is prepared by the pathology lab after a tumour is removed — in the clinic and in all clinical trials. Because of this, the team was able to validate the panel using samples that had already been collected in large clinical trials, rather than having to collect samples afresh.

“It’s a poster child for one way to do clinical research,” says David Ransohoff, a cancer epidemiologist at the University of North Carolina in Chapel Hill.

But the test is hardly perfect. In January 2008, a group commissioned by the Centers for Disease Control and Prevention in Atlanta, Georgia, evaluated Oncotype DX. It found that the test results were reproducible and did well at predicting recurrence, but it was unclear whether the test was better than established risk factors, such as age, or standard molecular features of the tumour [13]. Results from a large, independent validation study, called TAILORx, are expected in 2015. L.B.

1. Cramer, D. W. et al. Cancer Prev. Res. 4, 365–374 (2011).
2. Zhu, C. S. et al. Cancer Prev. Res. 4, 375–383 (2011).
3. Mai, P. L., Wentzensen, N. & Greene, M. H. Cancer Prev. Res. 4, 303–306 (2011).
4. Schweitzer, B. et al. Proc. Natl Acad. Sci. USA 97, 10113–10119 (2000).
5. Mor, G. et al. Proc. Natl Acad. Sci. USA 102, 7677–7682 (2005).
6. Visintin, I. et al. Clin. Cancer Res. 14, 1065–1072 (2008).
7. Thorpe, J. D. et al. PLoS ONE 2, e1281 (2007).
8. Greene, M. H., Feng, Z. & Gail, M. H. Clin. Cancer Res. 14, 7574 (2008).
9. McIntosh, M. et al. Clin. Cancer Res. 14, 7574 (2008).
10. Coates, R. J., Kolor, K., Stewart, S. L. & Richardson, L. C. Clin. Cancer Res. 14, 7575–7576 (2008).
11. Mor, G., Schwartz, P. E. & Yu, H. Clin. Cancer Res. 14, 7577–7579 (2008).
12. Mor, G., Symanowski, J., Visintin, I., Birrer, M. & Ward, D. Proc. Am. Assoc. Cancer Res. LB-224 (AACR, 2009).
13. Evaluation of Genomic Applications in Practice and Prevention (EGAPP) Working Group Gen. Med. 11, 66–73 (2009).
