
Pain 96 (2002) 403–412
www.elsevier.com/locate/pain

Letters to the Editor

Spiritual healing as a therapy for chronic pain: a randomized, clinical trial (Abbot et al., PAIN 2001;91:79–89)

Professor Ernst and his group have presented a well-controlled and cleverly designed study on spiritual healing in chronic pain. While we would certainly agree that the study is valid, we have some serious problems with the presentation of the data, the discussion of the findings and with some general aspects of the planning, which in our eyes should be taken into account when interpreting the results. We would like to point these out and ask you to make our reply publicly available. For simplicity's sake we focus on the first half of the trial, i.e. healing vs. sham healing.

(1) Misprint

There is an obvious and important misprint in the legend of the results in Table 3: "Negative scores represent improvement. Positive scores represent improvement." This typing error does not allow a clear-cut interpretation of the results. We take the sentence to mean that in parts of the data (MPQ, VAS, HAD, MYMOP) negative scores represent improvement, and in the SF-36 positive scores represent improvement, and we proceed on that presupposition.

(2) Inconsistencies in sample sizes and confidence intervals (CIs)

Why are the sample sizes in Table 2 smaller than those in Table 3, and why do they not match the trial flow chart? From the flow chart we gather that at the initial interview (i.e. baseline 1) 60 patients were present and should have given data, and at the start of the trial (i.e. baseline 2) 56 patients gave data. If we exclude the ten withdrawals, we should expect baseline data from at least 46 patients. Yet data from only 40 patients, sometimes 41, are reported. How can the baseline data have a smaller n? Because baseline 2 was used for comparison, Table 2 should have provided the actual means used, not those for a slightly reduced sample size.
The means and CIs do not match those given in the text and apparently used to obtain the results in Table 3.

(3) Inconsistencies regarding the statistics

Where was ANOVA used, or is that just a fancy way of saying that a t-test was done? Where and how was the Newman–Keuls procedure used? If comparisons were done with some sort of repeated-measures ANOVA, should not the CI widths be the same in the first two numerical columns of Table 3 (because in both cases n = 25)? The only reason those widths should differ is if different standard deviations were used. If ANOVA was used, would not the MSE be used to estimate the standard deviation in both cases? Why were two-tailed tests done, when the hypothesized direction was clear from the outset? This reduced the power.

(4) Power

The power calculation that the authors present expected a specific effect of d = 0.8 standard deviations between healing and sham healing. This means that the authors, or the healers, or both expected spiritual healing to be very effective, indeed more effective than nearly any intervention known. Power calculation can be done in various ways. Frequently, power analysis is done rather ad hoc, in order to have a proper catchword in the publication for referees to skip the point in question. In that case some virtual effect is presumed and the sample size to detect precisely this effect is calculated. Sometimes one recalculates effect sizes after the study to see how big an effect could possibly have been discovered with the sample size at hand at reasonable power. Obviously one of these ad hoc procedures was used here. The point in question is: just how reasonable is such an assumption, to expect to find a specific difference of d = 0.8 standard deviations in the first place? After all, chronic pain patients are, by definition, patients who are not well helped by other available, effective treatment options, and placebo effects can be quite impressive in pain patients. Just for the sake of comparison: Morley et al. (1999), in a meta-analysis of the effects of behavior therapy for chronic pain in adults (a comparable disease entity and a loosely comparable intervention), found a median effect size of d = 0.5. Fishbain et al.
(1998), in a meta-analysis of antidepressants in pain disorders, found: "The overall effect size was large (mean = 0.48)." Superio-Cabuslay et al. (1996), in a meta-analysis of patient education and non-steroidal anti-inflammatory drugs (NSAIDs) in arthritis, found effect sizes of d = 0.27, d = 0.38, d = 0.47, d = 0.65 and d = 0.84 for NSAIDs and between d = 0 and d = 0.28 for patient education. The latter has meanwhile become an accepted intervention, because it involves patients and is without

0304-3959/02/$20.00 © 2002 International Association for the Study of Pain. Published by Elsevier Science B.V. All rights reserved.


risks and side-effects. In our own trial, a randomized waiting-list controlled trial of spiritual healing vs. waiting in patients who had no conventional treatment options left, we found what Ernst and colleagues would call a non-specific effect of d = 0.66, which we consider worthwhile (Wiesendanger et al., 2001). We know of no study in the healing literature and of no pilot trial which would justify the a priori assumption of an effect size of the magnitude d = 0.8. In fact, we are not aware of any potent conventional drug or treatment, except acute medication, which would produce such a strong specific treatment effect in chronic pain patients. The most potent drugs in migraine therapy, the triptans, produce specific net effects in 15–31% of patients reaching the outcome of being pain free after 2 h (Diener et al., 2000). So our question is: just how realistic was the assumption of an effect size of d = 0.8 from the outset? Was it really thoughtfully assumed? If so, on what grounds? What are the comparison standards for that? If it was an ad hoc assumption, how can such an assumption pass a rigorous peer review process? If we assume an effect size of d = 0.5, it would have required n = 63 subjects per group to reach a power of 80%. With a sample size of 25 per group, assuming an effect size of d = 0.5, power was 0.42.

(5) Specific and non-specific effects

Intimately related to this question is the discussion of the results against this background. The authors claim, correctly, not to have demonstrated a specific effect. This is not surprising considering the lack of power in this study for small effects. In Table 1 we have recalculated the data, under the provision that we interpret the signs correctly (which is difficult because of the misprint in the legend), and converted them to effect size measures d (standardized differences). Here we only look at study 1 (since trial 2 was obviously negative, if one is to trust the signs at all).
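The sample-size and power figures quoted above (n = 63 per group for 80% power at d = 0.5; power = 0.42 at n = 25) can be reproduced with the usual normal-approximation formulas for a two-sample comparison. This is a sketch for cross-checking only, assuming scipy is available:

```python
from scipy.stats import norm

alpha, power_target = 0.05, 0.80
z_a = norm.ppf(1 - alpha / 2)   # two-sided critical value, ~1.96
z_b = norm.ppf(power_target)    # ~0.84

# Sample size per group needed to detect d = 0.5 with 80% power
d = 0.5
n_per_group = 2 * ((z_a + z_b) / d) ** 2
print(round(n_per_group))       # ~63, matching the figure in the letter

# Achieved power with only n = 25 per group, still assuming d = 0.5
n = 25
ncp = d * (n / 2) ** 0.5        # noncentrality of the two-sample z statistic
achieved_power = norm.cdf(ncp - z_a)
print(round(achieved_power, 2)) # ~0.42
```

The exact t-based calculation gives marginally different values, but the approximation agrees with the letter's figures to the precision quoted.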
We have recalculated standard deviations from the given CIs assuming that a t-distribution with the respective degrees of freedom was used. If, in fact, a normal distribution was used for calculating the CIs, then the effect sizes are slightly smaller. Pre-post effect sizes represent the improvements or aggravations patients experienced regardless of group assignment. Between-group effect sizes are conventional effect size measures and represent the specific difference. For simplicity's and robustness' sake we have used the standard deviation of the control group for calculating d. We abbreviate by giving only the results for the first three scales of the SF-36, the first VAS parameter and the MYMOP primary symptom. Note that the between-group effect size is not the simple difference of the pre-post effect sizes, but is directly calculated. What is immediately visible is that for the main outcome measure, the PRIT of the McGill Pain Questionnaire, patients in both groups experience quite sizeable improvements, and that the difference between the groups, with an effect size of d = 0.29, is not large, but within the range of what would be expected as a specific effect. There are also some quite inconsistent findings, mainly in the SF-36 and the HAD. The SF-36 is known to be an unstable measure for small sample sizes. The PRIT result is corroborated by the MYMOP primary symptom. If we compare the effect sizes found in this study with those reported in the meta-analyses quoted above, we see that they are within the range of effect sizes to be expected for behavioral interventions. Of course, we have to bear in mind that pre-post effect sizes are different from between-group effect sizes as reported in meta-analyses, since the latter represent differences from controls (in the case of behavioral interventions mostly waiting-list controls), whereas pre-post effect sizes represent differences from baseline. But does this matter for patients? Patients are interested in improvements, and improvements were obviously seen in this study.

(6) Small specific effects

Although we doubt that specific effects are all that matter, let us, for the moment, accept that they are and ask: is there really no specific effect? Both in the primary outcome parameter, the PRIT score, and in the secondary, the MYMOP primary symptom, we find similar effect sizes of d = 0.29/0.28.
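The reconstruction of standard deviations from the reported 95% CIs described above can be sketched as follows. The group summaries here are invented placeholders, not values from the paper; scipy is assumed to be available:

```python
from scipy.stats import t

def sd_from_ci(lower, upper, n):
    """Recover the SD of a mean change from its 95% CI,
    assuming the CI was built with a t-distribution on n-1 df."""
    half_width = (upper - lower) / 2
    return half_width * n ** 0.5 / t.ppf(0.975, n - 1)

# Hypothetical group summaries (NOT the trial's actual numbers)
n = 25
mean_healing, ci_healing = -8.0, (-12.0, -4.0)
mean_sham,    ci_sham    = -5.0, (-9.5, -0.5)

# Between-group d, standardized by the control-group SD as in the letter
sd_sham = sd_from_ci(*ci_sham, n)
d_between = (mean_healing - mean_sham) / sd_sham
print(round(d_between, 2))
```

Using `norm.ppf(0.975)` instead of the t critical value yields a slightly larger SD estimate and hence slightly smaller effect sizes, as the letter notes.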
In order to see what that means we convert it to an effect size measure r, roughly corresponding to a correlation coefficient, and find r = 0.14. Using Rosenthal's binomial effect size display (Rosenthal, 1991), this means that 57% in the treated group and 43% in the control group would improve. We doubt that patients would be unwilling to try a method if they had a 57% chance of improvement compared to 43%. The problem is: we cannot be sure that this would really be

Table 1
Effect sizes (standardized mean differences) of outcome measures

  Outcome                       d within (pre-post) healing   d within (pre-post) sham   d between-group
  PRIT                          1.12                          0.83                        0.29
  VAS (average)                 0.21                          0.26                       -0.05
  SF 36 physical functioning    0.62                          0.01                        0.63
  SF 36 role functioning        0.05                          0.23                       -0.19
  SF 36 pain                    0.47                          0.62                       -0.15
  HAD anxiety                   0.18                          0.29                       -0.12
  HAD depression                0.29                          0.58                       -0.29
  MYMOP primary symptom         0.62                          0.34                        0.28


the true figure. It might be smaller or larger; we do not know, because the study was not powered to detect an effect of that magnitude. However, in patients with chronic pain even small effects are sizeable and worth the effort. The authors have not shown a specific effect of the size anticipated. But it was unreasonable in the first place, we contend, to anticipate a specific effect of d = 0.8. They have shown strong non-specific effects, and we hold that these, too, are therapeutically valuable in chronic pain patients. And they have found a small but rather interesting specific effect. Alas, the study was not powered to detect it. The question whether spiritual healing has a specific effect is still open. What is increasingly emerging is the fact that the non-specific effects are sizeable. We wonder whether it is not time for the scientific community to ask how these non-specific effects are produced and how they can be harnessed for effective treatment. Healing seems to be one method of doing this.
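The conversion from d to r and Rosenthal's binomial effect size display used in the letter can be sketched as follows (the equal-group-size formula is assumed):

```python
def d_to_r(d):
    """Convert a standardized mean difference d to a correlation r
    (equal-n formula: r = d / sqrt(d**2 + 4))."""
    return d / (d ** 2 + 4) ** 0.5

def besd(r):
    """Rosenthal's binomial effect size display: the implied 'success'
    rates in the treated and control groups, 0.5 +/- r/2."""
    return 0.5 + r / 2, 0.5 - r / 2

r = d_to_r(0.29)
print(round(r, 2))                                 # 0.14, as quoted
treated, control = besd(r)
print(round(treated * 100), round(control * 100))  # 57 and 43, as quoted
```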

Reply to the Letter to the Editor We thank Walach et al. for their interest in our study and would comment as follows.

1. Misprint

There is indeed a misprint in Table 3. It should, of course, read "negative values represent improvement" for MPQ, VAS, HAD and MYMOP, and "positive values represent improvement" for the SF-36.

2. Inconsistencies in sample sizes and confidence intervals

The sample sizes in Table 2 refer only to those patients who completed two baseline measurements, as the table legend states: "…at two baseline assessments (B1 and B2) measured 3 weeks apart in those patients who completed both assessments". The trial flow chart refers to patients entering or leaving the trial, not to whether or not they completed both baseline assessments. However, all patients had at least one baseline, and it is from this that subsequent changes in outcome measures are calculated. The data in Table 2 are intended only to allow the important comparison of two pre-trial measurements; they are not meant to match subsequent means and CIs.

References
Abbot NC, Harkness EF, Stevinson C, Marshall FP, Conn DA, Ernst E. Spiritual healing as a therapy for chronic pain: a randomized, clinical trial. Pain 2001;91:79–89.
Diener HC, Brune K, Gerber W-D, Pfaffenrath V, Straube A. Therapie der Migräneattacke und Migräneprophylaxe. Empfehlungen der Deutschen Migräne- und Kopfschmerzgesellschaft. Aktuelle Neurologie 2000;27:273–282.
Fishbain DA, Cutler RB, Rosomoff HL, Rosomoff RS. Do antidepressants have an analgesic effect in psychogenic pain and somatoform pain disorder: a meta-analysis. Psychosom Med 1998;60:503–509.
Morley S, Eccleston C, Williams A. Systematic review and meta-analysis of randomized controlled trials of cognitive behaviour therapy for chronic pain in adults, excluding headache. Pain 1999;80:1–13.
Rosenthal R. Meta-analytic procedures for social research. Newbury Park: Sage, 1991.
Superio-Cabuslay E, Ward MM, Lorig KR. Patient education interventions in osteoarthritis and rheumatoid arthritis: a meta-analytic comparison with nonsteroidal antiinflammatory drug treatment. Arthritis Care Res 1996;9:292–301.
Wiesendanger H, Werthmüller L, Reuter K, Walach H. Chronically ill patients treated by spiritual healing improve in quality of life: results of a randomized waiting-list controlled study. J Altern Complement Med 2001;7:45–51.

Harald Walach a,*, George Lewith b, Holger Bösch c, Jessica Utts d
a Department of Environmental Medicine and Hospital Epidemiology, University Hospital Freiburg, Hugstetter Strasse 55, D-79106 Freiburg, Germany
b Department of Complementary Medicine, University Hospital Southampton, Southampton, UK
c Department of Environmental Medicine and Hospital Epidemiology, University Hospital Freiburg, Freiburg, Germany
d Department of Statistics, University of California, Davis, USA

3. Inconsistencies regarding the statistics

The methods section makes it clear that no repeated-measures analysis or complex statistical models were used for these data. One-way analysis of variance (ANOVA) was used. For example, the mean change from baseline at each of the time points for a group was compared with each of the others in one ANOVA. The multiple range test was chosen in preference to another multiple comparisons test since the consensus in the literature is that most of the available tests are equivalent. Why the Bonferroni correction was not used is discussed in the text. The means and 95% CIs shown are those of the actual data in the relevant group, and are not the product of a statistical model. Although the null hypothesis was that healing would have no significant positive effect, a two-tailed test was used following the standard advice (Armitage and Berry, 1987) that a one-sided test should be used only if "it is quite certain that departures in one particular direction will always be ascribed to chance and therefore regarded as non-significant however large they are". This certainty rarely exists, and could not exist for a trial of spiritual healing.
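As an aside on the ANOVA-versus-t-test point raised in the letter: with only two groups, a one-way ANOVA and a two-tailed unpaired (equal-variance) t-test are numerically equivalent, with F = t² and identical p values. This can be verified on made-up data (scipy and numpy assumed available; the group parameters are illustrative only):

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

rng = np.random.default_rng(0)
healing = rng.normal(-8.0, 10.0, 25)  # hypothetical mean changes
sham = rng.normal(-5.0, 10.0, 25)

F, p_anova = f_oneway(healing, sham)
t_stat, p_ttest = ttest_ind(healing, sham)  # pooled-variance t-test

print(np.isclose(F, t_stat ** 2))    # True: F equals t squared
print(np.isclose(p_anova, p_ttest))  # True: identical p values
```

With more than two groups (e.g. change scores at several time points) the two procedures diverge, which is presumably why an ANOVA with a multiple range test was used.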

* Corresponding author. Tel.: +49-761-270-5497; fax: +49-761-270-7224.
E-mail address: walach@ukl.uni-freiburg.de (H. Walach)

PII: S0304-3959(02)00417-1
