An Emerging Consensus on Grading Recommendations?

Gordon Guyatt* MSc, M.D.


* Department of Medicine, McMaster University, Hamilton, Ontario, Canada; Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada
Gordon Guyatt MSc, M.D. An Emerging Consensus on Grading Recommendations? Chin J Evid-based Med, 2007, 7(1): 1-8.

Clinical practice guidelines: The challenge of grading


Clinical practice guidelines have improved in quality over the past ten years by adhering to a few basic principles, such as conducting thorough systematic reviews of the relevant evidence and grading both the recommendations and the quality of the underlying evidence. The large number of systems for rating the quality of evidence and the strength of recommendations that have emerged is, however, confusing[1]. An international group of guideline developers, systematic reviewers, and clinical epidemiologists, the GRADE working group, has taken on the ambitious task of helping to resolve the confusion among the different systems of rating evidence and recommendations. The group has wide representation from many organizations, including the Agency for Healthcare Research and Quality in the USA, the National Institute for Clinical Excellence for England and Wales, and the World Health Organization. Developing a new uniform rating system is challenging because all systems have limitations and because many organizations have invested a great deal of time and effort in developing their own rating systems and are understandably reluctant to adopt a new one. The GRADE working group first published the results of its work in 2004 in the British Medical Journal[2]. Simpler, clinically oriented descriptions have been published subsequently[3,4]. GRADE has taken care to ensure that its suggested system is simple to use and applicable to a wide variety of clinical recommendations that span the full spectrum of medical specialties and clinical care.

Contact Address: Dr. Gordon Guyatt, McMaster University, Faculty of Health Sciences, Clinical Epidemiology & Biostatistics, Room 2C12, 1200 Main Street West, Hamilton, ON, L8N 3Z5. Tel: (905) 525-9140, x22900; Fax: (905) 524-3841. Email: guyatt@mcmaster.ca www.cjebm.org.cn

Quality of evidence
The GRADE system classifies recommendations into one of two levels, strong and weak, and quality of evidence into one of four levels: high, moderate, low, and very low. Evidence based on randomized controlled trials (RCTs) begins with a top rating on GRADE's 4-level quality of evidence classification (Table 1). GRADE takes into account, however, that not all RCTs are alike, and that five categories of limitations of RCTs may compromise the quality of their evidence (Table 2).

Table 1  Quality of evidence grades and their definitions

Grade      Definition
High       Further research is very unlikely to change our confidence in the estimate of effect.
Moderate   Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low        Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low   Any estimate of effect is very uncertain.

Table 2  Factors in deciding on confidence in estimates of benefits, risks, burden, and costs

Factors that may decrease the quality of evidence based on randomized controlled trials (RCTs)
1) Poor quality of planning and implementation of the available RCTs, suggesting a high likelihood of bias
2) Inconsistency of results
3) Indirectness of evidence
4) Sparse evidence
5) Reporting bias (including publication bias)

Factors that may increase the quality of evidence based on observational studies
1) Large magnitude of effect
2) All plausible confounding would reduce a demonstrated effect
3) Dose-response gradient

1) First, quality decreases if most of the evidence comes from RCTs with serious methodological flaws such as lack of allocation concealment or blinding, large loss to follow-up, or stopping early for benefit. How lack of blinding can influence the grading is exemplified by a recommendation to treat heparin-induced thrombocytopenia (HIT) complicated by thrombosis with danaparoid sodium. The randomized trial evidence for danaparoid use in HIT comes from
an unblinded trial in which the outcome was the clinician's assessment of when the thromboembolism had resolved, a subjective judgement. As a result, an ACCP guideline panel rated the quality of the evidence as moderate rather than high[5].

2) A second reason for downgrading is inconsistency of results. When several RCTs yield widely differing estimates of treatment effect (heterogeneity or variability in results), investigators look for explanations for that heterogeneity. For instance, drugs may have larger relative effects in sicker, or in less sick, populations. When heterogeneity exists but investigators fail to identify a plausible explanation, the quality of the evidence, even from rigorous RCTs, is lower. For example, RCTs of pentoxifylline in patients with intermittent claudication have shown conflicting results that so far defy explanation. Acknowledging the unexplained heterogeneity, a guideline panel rated the quality of the evidence for pentoxifylline as moderate, rather than high[6].

3) Indirectness may compromise the quality of evidence. Evidence is indirect if there are no head-to-head comparisons between therapeutic alternatives. For instance, drug benefit plans or formularies have to choose between funding a number of bisphosphonates, including alendronate and risedronate, for prevention of osteoporotic fractures. Unfortunately, the decision must be based on a comparison of trials evaluating alendronate against placebo and risedronate against placebo, rather than on direct comparisons of alendronate and risedronate. Evidence may also be indirect if the population differs from the one of interest (we are interested in valvular atrial fibrillation, but all RCTs are in non-valvular atrial fibrillation), if the intervention differs (we would like to know about relatively low-dose angiotensin-converting enzyme inhibition, but all trials used higher doses), or if the outcome differs (we would like to know about long-term effectiveness, but all trials have only short follow-up).
As an example of differences in populations, avian flu is a disease caused by the influenza A (H5N1) virus and is associated with a high case fatality (approximately 33% to more than 50% of patients die). Potential exposure to the virus raises the question of chemoprophylaxis. Pharmacological interventions could include the use of antiviral neuraminidase inhibitors such as oseltamivir. Oseltamivir, however, has been studied only in patients with seasonal influenza caused by a different influenza A virus, a quite different patient population.

4) When total sample size is small and outcome events are few, our uncertainty about estimates of benefit and risk increases. For instance, a well-designed and rigorously conducted RCT addressed the use of nadroparin, a low molecular weight heparin, in patients with cerebral venous sinus thrombosis. Of 30 treated patients, three had a poor outcome, as did six of 29 patients in the control group. The investigators' analysis suggests a 38% relative risk reduction (RRR) in poor outcome, but the result was not statistically significant. GRADE continues to debate the appropriate thresholds for decreasing strength of inference: how wide a confidence interval is too wide, and how few events are too few?

5) A final reason for downgrading quality of evidence is a high likelihood of reporting bias. The quality of evidence may be reduced if investigators fail to report studies (typically those that show no effect) or outcomes (typically those that may be harmful or for which no effect was observed), or if other reasons lead to results not being reported. Unfortunately, guideline panels are still required to make guesses about the likelihood of reporting bias. A prototypical situation that should elicit suspicion of reporting bias is when the published evidence consists of a number of small trials, all of which are industry funded[7]. For example, 14 trials of flavonoids in patients with hemorrhoids have shown apparently large benefits, but enrolled a total of only 1,432 patients[8]. The heavy involvement of sponsors in most of these trials raises the question of whether unpublished trials suggesting no benefit exist.

While observational studies (e.g. cohort studies) start with a low quality rating, they may be graded upwards if the magnitude of the treatment effect is very large (e.g., hip replacement for severe hip osteoarthritis), if there is evidence of a dose-response relationship, or if all plausible confounders would decrease the magnitude of the treatment effect (Table 2). For example, a systematic review revealed higher mortality in for-profit than in not-for-profit hospitals[9]. This result was found despite the fact that for-profit hospitals usually admit healthier patients with a higher socio-economic status and have more resources at their disposal. These potential confounders, if anything, favor for-profit hospitals.
If such confounders were taken into account, the magnitude of effect favoring not-for-profit hospitals would be even larger.
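The rating logic described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GRADE working group's own procedure: the function name `rate_quality` and the factor labels are hypothetical, and the rule that each factor moves the rating by exactly one level is an assumption made for simplicity, since the text does not say how far each factor shifts the grade.

```python
from enum import IntEnum


class Quality(IntEnum):
    """The four quality-of-evidence levels of Table 1, ordered low to high."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


# Hypothetical labels for the Table 2 entries (names chosen for illustration).
DOWNGRADE_FACTORS = {
    "study_limitations",   # poor planning/implementation, high likelihood of bias
    "inconsistency",       # unexplained heterogeneity of results
    "indirectness",        # indirect comparison, population, intervention, outcome
    "sparse_evidence",     # small samples, few events
    "reporting_bias",      # including publication bias
}
UPGRADE_FACTORS = {
    "large_effect",
    "confounding_would_reduce_effect",
    "dose_response",
}


def rate_quality(design, downgrades=(), upgrades=()):
    """Return a Quality level for a body of evidence.

    RCT evidence starts at HIGH and observational evidence at LOW,
    as the text describes.
    """
    if design == "rct":
        start = Quality.HIGH
    elif design == "observational":
        start = Quality.LOW
    else:
        raise ValueError("design must be 'rct' or 'observational'")

    downgrades, upgrades = set(downgrades), set(upgrades)
    unknown = (downgrades - DOWNGRADE_FACTORS) | (upgrades - UPGRADE_FACTORS)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    if design == "rct" and upgrades:
        # Table 2 lists the upgrade factors for observational studies only.
        raise ValueError("upgrade factors apply to observational studies")

    # ASSUMPTION: every factor shifts the rating by exactly one level.
    level = int(start) - len(downgrades) + len(upgrades)
    # Keep the result inside the four-level scale.
    level = max(int(Quality.VERY_LOW), min(int(Quality.HIGH), level))
    return Quality(level)


if __name__ == "__main__":
    # Unblinded danaparoid trial: RCT evidence with a study limitation.
    print(rate_quality("rct", downgrades=["study_limitations"]).name)
    # Pentoxifylline: RCT evidence with unexplained inconsistency.
    print(rate_quality("rct", downgrades=["inconsistency"]).name)
```

Both calls return MODERATE, matching the panel ratings quoted above for danaparoid and pentoxifylline. In practice the size of each adjustment is a matter of panel judgement rather than a fixed step.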

Commentary

Strength of Recommendations
As noted, the GRADE system offers 2 levels of recommendations: strong and weak. When an intervention's benefits clearly outweigh its risks and burden, or clearly do not, strong recommendations are warranted. On the other hand, when the tradeoff between benefits and risks is less certain, either because of low quality evidence or because high quality evidence suggests benefits and risks are closely balanced, weak recommendations become appropriate. This 2-level approach is easy to put into practice. For strong recommendations, in which it is clear that benefits far outweigh risks or risks far outweigh benefits, virtually all patients will make the same choice (e.g. aspirin in the setting of acute myocardial infarction). In such instances, physicians can confidently recommend treatment. For weak recommendations, different patients may choose different approaches to treatment. One example is the use of hormone replacement therapy for menopausal hot flashes. Under these circumstances, clinicians must know the evidence and communicate it to their patients, or conduct a detailed inquiry, to ensure that their recommendations are consistent with patients' values and preferences[10].

Determinants of strength of recommendation


Beyond the quality of the evidence, a number of other factors may bear on whether recommendations are strong or weak (Table 3). The choice of adjusted-dose warfarin versus aspirin for prevention of stroke in patients with atrial fibrillation illustrates a number of the factors that influence the strength of a recommendation. A systematic review and meta-analysis found a relative risk reduction (RRR) of 46% in all strokes with warfarin versus aspirin[11]. This large effect supports a strong recommendation for warfarin. Furthermore, the relatively narrow 95% confidence interval (RRR 29% to 57%) suggests that warfarin provides an RRR of at least 29%, and further supports a strong recommendation. At the same time, warfarin is associated with the inevitable burden of keeping dietary intake of vitamin K constant, monitoring the intensity of anticoagulation with blood tests, and living with the increased risk of both minor and major bleeding. Most patients, however, are much more stroke averse than they are bleeding averse[12]. As a result, almost all patients with
high risk of stroke would choose warfarin, suggesting the appropriateness of a strong recommendation.

This last point emphasizes the importance of the patient's baseline risk, sometimes called the control event rate, of the adverse outcome that treatment is designed to avoid (Table 3). Consider a 65-year-old patient with atrial fibrillation and no other risk factors for stroke. This individual's risk of stroke in the next year is approximately 2%. From the relative risk reduction and this baseline risk one can derive the absolute magnitude of the effect (Table 3). Dose-adjusted warfarin can, relative to aspirin, reduce the risk to approximately 1%, for an absolute risk reduction of about 1% (2% - 1%). Some patients who are very stroke averse may consider the downsides of taking warfarin well worth it. Given the relatively narrow confidence interval around this absolute effect, which follows from the confidence interval around the relative risk reduction, one could make a strong recommendation to use warfarin if all patients were equally stroke averse. Some patients are, however, likely to consider the benefit not worth the risks and inconvenience. When, across the range of patient values, fully informed patients are liable to make different choices, editors should offer weak (Grade 2) recommendations.

Table 3 lists the issues that bear on the strength of a recommendation: the methodological quality of the evidence supporting estimates of likely benefit and of likely risk, inconvenience, and costs; the importance of the outcome that treatment prevents; the magnitude of the treatment effect; the precision of the estimate of the treatment effect; the risks associated with therapy; the burdens of therapy; the risk of the target event; costs; and varying values.
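The baseline-risk arithmetic in the warfarin example can be written out as a short Python sketch. The function name `absolute_risk_reduction` is hypothetical. The inputs (a 2% one-year stroke risk and an RRR of 46% with a 95% confidence interval of 29% to 57%) are taken from the text. Applying the confidence limits of the RRR directly to the baseline risk is a simplification that treats the baseline risk as known exactly.

```python
def absolute_risk_reduction(baseline_risk, rrr):
    """Absolute risk reduction = baseline (control) risk x relative risk reduction."""
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be a proportion between 0 and 1")
    if not 0.0 <= rrr <= 1.0:
        raise ValueError("rrr must be a proportion between 0 and 1")
    return baseline_risk * rrr


if __name__ == "__main__":
    baseline = 0.02  # about 2% one-year stroke risk, as in the text
    estimates = [
        ("lower 95% limit", 0.29),
        ("point estimate", 0.46),
        ("upper 95% limit", 0.57),
    ]
    for label, rrr in estimates:
        arr = absolute_risk_reduction(baseline, rrr)
        treated_risk = baseline - arr
        print(f"{label}: RRR {rrr:.0%}, ARR {arr:.2%}, risk on warfarin {treated_risk:.2%}")
```

At the point estimate the absolute risk reduction is a little under one percentage point, which the text rounds to 1%. The same relative effect applied to a higher baseline risk would yield a proportionally larger absolute benefit, which is why baseline risk matters to the strength of the recommendation.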

Table 3  Factors in deciding on a strong or weak recommendation

Issue: Methodological quality of the evidence supporting estimates of likely benefit, and likely risk, inconvenience, and costs
Example: Many high quality randomized trials have demonstrated the benefit of inhaled steroids in asthma, while only case series have examined the utility of pleurodesis in pneumothorax.

Issue: Importance of the outcome that treatment prevents
Example: Preventing post-phlebitic syndrome with thrombolytic therapy in DVT, in contrast to preventing death from PE.

Issue: Magnitude of treatment effect
Example: Clopidogrel versus aspirin leads to a smaller stroke reduction in TIA (8.7% RRR*[13]) than anticoagulation versus placebo in AF (68% RRR).

Issue: Precision of estimate of treatment effect
Example: ASA versus placebo in AF has a wider confidence interval than ASA for stroke prevention in patients with TIA.

Issue: Risks associated with therapy
Example: ASA and clopidogrel in acute coronary syndromes have a higher risk for bleeding than ASA alone.

Issue: Burdens of therapy
Example: Taking adjusted-dose warfarin is associated with a higher burden than taking aspirin; warfarin requires monitoring the intensity of anticoagulation and a relatively constant dietary vitamin K intake.

Issue: Risk of target event
Example: Some surgical patients are at very low risk of post-operative DVT and PE, while other surgical patients have considerably higher rates of DVT and PE.

Issue: Costs
Example: Clopidogrel has a much higher cost in patients with TIA than aspirin.

Issue: Varying values
Example: Most young, healthy people will put a high value on prolonging their lives (and thus incur suffering to do so); the elderly and infirm are likely to vary in the value they place on prolonging their lives (and may vary in the suffering they are ready to experience to do so).

*RRR: relative risk reduction

Conclusion
The GRADE system is rigorous in its methodology, yet practical to use. It is neither too complex nor misleadingly simple. The GRADE group's vision was extremely ambitious: to have a system of evaluating
quality of evidence and grading recommendations that would become the international standard. Given the rapid adoption of GRADE in the international community since the original publication in the BMJ in 2004, it seems that this vision may come true. The Cochrane Collaboration is moving to adopt the GRADE approach to rating methodological quality. The Endocrine Society was the first North American organization to adopt GRADE for its recommendations, while another important organization, the American College of Chest Physicians (ACCP), has adopted a slightly modified version of GRADE. Other North American organizations have followed, including the very prestigious American College of Physicians and the almost equally prestigious American Thoracic Society, the Surviving Sepsis Campaign, and the Ontario Ministry of Health Medical Advisory Secretariat. In what might be the most important advance for GRADE dissemination in North America, the extraordinarily successful electronic medical text UpToDate is formally grading recommendations using the GRADE approach. European organizations that have endorsed GRADE include the European Society of Thoracic Surgery, the Agenzia Sanitaria Regionale in Bologna, Italy, and the German Agency for Quality in Medicine. International groups that have endorsed GRADE include Kidney Disease: Improving Global Outcomes, the Surviving Sepsis Campaign, and the Guidelines International Network. The British Medical Journal is encouraging guidelines submitted to the BMJ to use the GRADE approach.

If the momentum of uptake continues, GRADE will do more than achieve the worthy and important goal of standardizing systems of grading quality of evidence and recommendations for clinical practice. GRADE may facilitate the evolution toward a world in which expert recommendations for front-line clinicians uniformly adhere to principles of evidence management and guideline development that flow from the intellectual movement we call evidence-based medicine.

References
1 Schunemann HJ, et al. Letters, numbers, symbols and words: how to communicate grades of evidence and recommendations. CMAJ, 2003, 169(7): 677-680.
2 Atkins D, et al. Grading quality of evidence and strength of recommendations. BMJ, 2004, 328(7454): 1490.
3 Guyatt G, et al. Grading strength of recommendations and quality of evidence in clinical guidelines: report from an American College of Chest Physicians task force. Chest, 2006, 129(1): 174-181.
4 Schunemann HJ, et al. An official ATS statement: grading the quality of evidence and strength of recommendations in ATS guidelines and recommendations. Am J Respir Crit Care Med, 2006, 174(5): 605-614.
5 Warkentin TE, Greinacher A. Heparin-induced thrombocytopenia: recognition, treatment, and prevention: the Seventh ACCP Conference on Antithrombotic and Thrombolytic Therapy. Chest, 2004, 126(3 Suppl): 311S-337S.
6 Clagett GP, et al. Antithrombotic therapy in peripheral arterial occlusive disease: the Seventh ACCP Conference on Antithrombotic and Thrombolytic Therapy. Chest, 2004, 126(3 Suppl): 609S-626S.
7 Bhandari M, et al. Association between industry funding and statistically significant pro-industry findings in medical and surgical randomized trials. CMAJ, 2004, 170(4): 477-480.
8 Alonso-Coello P, et al. Meta-analysis of flavonoids for the treatment of haemorrhoids. Br J Surg, 2006, 93(8): 909-920.
9 Devereaux PJ, et al. A systematic review and meta-analysis of studies comparing mortality rates of private for-profit and private not-for-profit hospitals. CMAJ, 2002, 166(11): 1399-1406.
10 Charles C, Whelan T, Gafni A. What do we mean by partnership in making decisions about treatment? BMJ, 1999, 319(7212): 780-782.
11 van Walraven C, et al. Oral anticoagulants vs aspirin in nonvalvular atrial fibrillation: an individual patient meta-analysis. JAMA, 2002, 288(19): 2441-2448.
12 Devereaux PJ, et al. Differences between perspectives of physicians and patients on anticoagulation in patients with atrial fibrillation: observational study. BMJ, 2001, 323(7323): 1218-1222.
13 CAPRIE Steering Committee. A randomized, blinded trial of clopidogrel versus aspirin in patients at risk of ischemic events. Lancet, 1996, 348: 1329-1339.

2006-12-20; 2007-02-08
