
EFFECT SIZE AND STATISTICAL POWER ANALYSIS IN BEHAVIORAL AND EDUCATIONAL RESEARCH

P. Onghena, W. Van den Noortgate, & I. Van Mechelen. PhD seminars, K.U.Leuven, 10 October 2003.

REFERENCES
GENERAL

Abelson, R. P. (1995). Statistics as principled argument. Hillsdale, NJ: Erlbaum.
Abelson, R. P. (1997). On the surprising longevity of flogged horses: Why there is a case for the significance test. Psychological Science, 8, 12-15.
Abelson, R. P. (1997). A retrospective on the significance test ban of 1999 (If there were no significance tests, they would be invented). In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 117-141). Mahwah, NJ: Erlbaum.
American Psychological Association (1994). Publication manual of the American Psychological Association (4th ed.). Washington, DC: Author.
American Psychological Association (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
Berkson, J. (1938). Some difficulties of interpretation encountered in the application of the chi-square test. Journal of the American Statistical Association, 33, 526-536.
Berkson, J. (1942). Tests of significance considered as evidence. Journal of the American Statistical Association, 37, 325-335.
Box, J. F. (1978). R. A. Fisher: The life of a scientist. New York: Wiley.
Carver, R. P. (1993). The case against statistical significance testing, revisited. Journal of Experimental Education, 61, 287-292.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304-1312.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.
Falk, R., & Greenbaum, C. W. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory & Psychology, 5, 75-98.
Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., Beatty, J., & Kruger, L. (1990). The empire of chance: How probability changed science and everyday life. Cambridge, UK: Cambridge University Press.
Hagen, R. L. (1997). In praise of the null hypothesis statistical test. American Psychologist, 52, 15-24.
Harlow, L. L., Mulaik, S. A., & Steiger, J. H. (Eds.). (1997). What if there were no significance tests? Mahwah, NJ: Erlbaum.
Hunter, J. E. (1997). Needed: A ban on the significance test. Psychological Science, 8, 3-7.
Levin, J. R. (1993). Statistical significance testing from three perspectives. Journal of Experimental Education, 61, 378-382.
Loftus, G. R. (1993). A picture is worth a thousand p-values: On the irrelevance of hypothesis testing in the computer age. Behavior Research Methods, Instruments, & Computers, 25, 250-256.
Morrison, D. E., & Henkel, R. E. (Eds.). (1970). The significance test controversy: A reader. Chicago, IL: Aldine.
Morrison, G. R., & Weaver, B. (1995). Exactly how many p values is a picture worth? A commentary on Loftus's plot-plus-error-bar approach. Behavior Research Methods, Instruments, & Computers, 27, 52-56.
Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5, 241-301.
Oakes, M. W. (1986). Statistical inference: A commentary for the social sciences. New York: Wiley.
Schmidt, F. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers. Psychological Methods, 1, 115-129.
Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educational Researcher, 25(2), 26-30.
Wainer, H. (1999). One cheer for null hypothesis significance testing. Psychological Methods, 4, 212-213.

Wilkinson, L., & Task Force on Statistical Inference (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.
Note: Also available online at http://www.apa.org/journals/amp/amp548594.html

EFFECT SIZE

Allison, D. B., & Gorman, B. S. (1993). Calculating effect sizes for meta-analysis: The case of the single case. Behaviour Research and Therapy, 31, 621-631.
Notes: Discuss possible effect size measures for single-case designs.
Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete multivariate analysis: Theory and practice. Cambridge, MA: MIT Press.
Notes: Discuss, e.g., measures of association for nominal data (Goodman-Kruskal tau, uncertainty coefficient).
Cooper, H., & Hedges, L. V. (1994). The handbook of research synthesis. New York: Russell Sage Foundation.
Notes: A handbook for basic and advanced meta-analysis. Each chapter is devoted to a specific step in a meta-analysis, or to a specific issue regarding meta-analysis.
Fleiss, J. L. (1994). Measures of effect size for categorical data. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 245-260). New York: Russell Sage Foundation.
Notes: Gives an overview and discussion of measures for dichotomous data (risk difference, rate ratio, phi coefficient, and odds ratio).
Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3-8.
Notes: This article was the real start of the quantitative integration of the results of several studies, or meta-analysis.
Gleser, L. J., & Olkin, I. (1994). Stochastically dependent effect sizes. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 339-355). New York: Russell Sage Foundation.
Notes: Discuss the combination of multiple effect sizes.
Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational Statistics, 6, 107-128.
Hedges, L. V., & Olkin, I. (1984). Nonparametric estimators of effect size in meta-analysis. Psychological Bulletin, 96, 573-580.
Notes: Give an overview of possible alternatives to the standardized mean difference that are less dependent (or not dependent) on distributional assumptions.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Notes: A handbook for meta-analysis with a statistical focus. The focus is on combining g's, but several other measures are described as well.
Huberty, C. J. (2002). A history of effect size indices. Educational and Psychological Measurement, 62, 227-240.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Notes: Discuss possible threats to effect size measures (especially the product-moment correlation coefficient) and correction factors to avoid misleading effect size measures.
Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746-759.
Kromrey, J. D., & Foster-Johnson, L. (1996). Determining the efficacy of intervention: The use of effect sizes for data analysis in single-subject research. Journal of Experimental Education, 65, 73-93.
Notes: Describe effect size measures for single-case studies, based on differences in the proportion of variance explained.
Liebetrau, A. M. (1983). Measures of association. London: Sage Publications.
Notes: Describes, for instance, measures of association for nominal data (contingency measures) and measures of agreement (kappa and weighted kappa).
Morris, S. B., & DeShon, R. P. (2002). Combining effect size estimates in meta-analysis with repeated measures and independent-groups designs. Psychological Methods, 7, 105-125.
Notes: Discuss the comparability of standardized mean differences (and their precision) calculated for dissimilar designs.
Prentice, D. A., & Miller, D. T. (1992). When small effects are impressive. Psychological Bulletin, 112, 160-164.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). London: Sage Publications.

Notes: A handbook for multilevel analysis. A separate chapter is devoted to meta-analysis, which the authors regard as a specific kind of multilevel analysis.
Ray, J. W., & Shadish, W. R. (1996). How interchangeable are different estimators of effect size? Journal of Consulting and Clinical Psychology, 64, 1316-1325.
Notes: Discuss, e.g., the denominator to be used to estimate one or more d's based on the results of different kinds of ANOVA.
Richardson, J. T. E. (1996). Measures of effect size. Behavior Research Methods, Instruments, & Computers, 28, 12-22.
Notes: Discusses effect size measures for comparing two groups (standardized mean difference and adaptations) or more groups (measures based on the proportion of explained variance, such as eta and the intraclass correlation).
Rosenthal, R. (1991). Meta-analytic procedures for social research (rev. ed.). Newbury Park, CA: Sage.
Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 231-244). New York: Russell Sage Foundation.
Notes: Describes different measures and their relations with each other and with test statistics.
Rosenthal, R., Rosnow, R. L., & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research: A correlational approach. Cambridge: Cambridge University Press.
Notes: Discuss effect size measures and statistical tests for two groups and for contrasts in factorial and repeated measures designs.
Rosenthal, R., & Rubin, D. B. (1982). A simple general purpose display of magnitude of experimental effect. Journal of Educational Psychology, 74, 166-169.
Notes: Present the Binomial Effect Size Display (BESD).
Rosenthal, R., & Rubin, D. B. (1994). The counternull value of an effect size: A new statistic. Psychological Science, 5, 329-334.
Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill.
Notes: Describe and discuss, for instance, measures of association for ordinal data (Spearman's rho, Kendall's tau, Somers' d, the gamma coefficient).
Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. London: Sage.
Notes: Discuss (among other topics) reporting results from multilevel analyses, using standardized regression coefficients, the proportion of variance explained by the model, and the (residual) intraclass correlation coefficient.
Tatsuoka, M. (1993). Effect size. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 461-479). Hillsdale, NJ: Erlbaum.
Notes: An overview of different effect size measures, also for MANOVA.
Van den Noortgate, W., & Onghena, P. (2003). Hierarchical linear models for the quantitative integration of effect sizes in single-case research. Behavior Research Methods, Instruments, & Computers, 35, 1-10.
Notes: Propose to use standardized regression coefficients of multilevel models as effect sizes for single-case studies.
Van den Noortgate, W., & Onghena, P. (in press). Estimating the standardized mean difference in a meta-analysis: Bias, precision, and mean squared error of different weighting methods. Behavior Research Methods, Instruments, & Computers.
Vandenberghe, N. (2002). Effectmaten in pedagogisch onderzoek [Effect size measures in educational research]. Unpublished master's thesis, K.U.Leuven, Leuven.
Notes: Gives an overview of effect size measures proposed in the methodological literature for educational research, investigates whether and in which situations these measures are used in practice, and looks for formulae and methods to convert measures to a common effect size measure.
Wilcox, R. R. (1995). ANOVA: A paradigm for low power and misleading measures of effect size. Review of Educational Research, 65, 51-77.
Notes: Warns of low power and misleading effect sizes when the normality assumption is violated (e.g., due to outliers or heavy tails), and proposes alternative tests and robust effect size measures.

POWER

Bradley, D. R., & Russell, R. L. (1996). Statistical power in complex experimental designs. Behavior Research Methods, Instruments, & Computers, 28, 319-326.
Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.
Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis program. Behavior Research Methods, Instruments, & Computers, 28, 1-11.
Ferron, J., & Onghena, P. (1996). The power of randomization tests for single-case phase designs. Journal of Experimental Education, 64, 231-239.
Goodman, S. N., & Berlin, J. A. (1994). The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Annals of Internal Medicine, 121, 200-206.
Green, S. B. (1991). How many subjects does it take to do a regression analysis? Multivariate Behavioral Research, 26, 499-510.
Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55, 19-24.
Koele, P. (1982). Calculating power in analysis of variance. Psychological Bulletin, 92, 513-516.
Kraemer, H. C., & Thiemann, S. (1987). How many subjects? Statistical power analysis in research. Newbury Park, CA: Sage.
Lipsey, M. W. (1990). Design sensitivity: Statistical power for experimental research. Newbury Park, CA: Sage.
Onghena, P. (1994). The power of randomization tests for single-case designs. Unpublished doctoral dissertation, Faculteit Psychologie en Pedagogische Wetenschappen, Katholieke Universiteit Leuven.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309-316.
Wahlsten, D. (1991). Sample size to detect a planned contrast and a one degree-of-freedom interaction effect. Psychological Bulletin, 110, 587-595.
Wilcox, R. R. (2003). Power: Basics, practical problems, and possible solutions. In J. A. Schinka & W. F. Velicer (Eds.), Handbook of psychology, Vol. 2: Research methods in psychology (pp. 65-85). Hoboken, NJ: Wiley.
Zumbo, B. D., & Hubley, A. M. (1998). A note on misconceptions concerning prospective and retrospective power. The Statistician, 47, 385-388.
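As a small worked illustration of the two quantities this bibliography revolves around, the sketch below (not part of the original handout; the function names, sample sizes, and alpha level are illustrative assumptions, and the power calculation uses a normal approximation rather than the exact noncentral t distribution) computes the standardized mean difference (Cohen's d) and approximate power for a two-sided two-sample test:

```python
# Minimal sketch: Cohen's d and normal-approximation power for a
# two-sided, two-sample test with equal group sizes.
# Assumptions: illustrative function names; normal approximation
# (slightly optimistic relative to the exact noncentral t).
from statistics import NormalDist, mean, stdev


def cohens_d(x, y):
    """Standardized mean difference using the pooled sample SD."""
    nx, ny = len(x), len(y)
    pooled_sd = (((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2)
                 / (nx + ny - 2)) ** 0.5
    return (mean(x) - mean(y)) / pooled_sd


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power to detect effect size d, two-sided z test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    ncp = d * (n_per_group / 2) ** 0.5  # noncentrality parameter
    return 1 - z.cdf(z_crit - ncp) + z.cdf(-z_crit - ncp)


# Cohen's (1988) benchmarks: d = 0.2 (small), 0.5 (medium), 0.8 (large).
# With 64 participants per group, a "medium" effect is detected with
# power close to Cohen's conventional 0.80 target.
print(round(approx_power(0.5, 64), 2))
```

This mirrors the logic behind tables in Cohen (1988) and programs such as GPOWER (Erdfelder, Faul, & Buchner, 1996), which additionally use the exact noncentral distributions.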
