Jack Barnette, PhD, has served as a faculty member at Penn State University, the University of Virginia, the University of Memphis, the University of Alabama at Tuscaloosa, and the University of Iowa, and is now Senior Associate Dean for Academic Affairs and Professor of Biostatistics at the University of Alabama at Birmingham. He has served as an APHA Statistics Council member and Section representative to the APHA Action Board. Presently, he chairs the ASPH biostatistics competency workgroup and is co-chair of the ASPH Biostatistics/Epidemiology Section. He has more than 30 years' experience in teaching, advising students, and applying research, evaluation, and statistical methods to a wide variety of educational and public health projects. He has conducted evaluations of projects funded by CDC, HRSA, SAMHSA, NHLBI, and NIOSH. He serves on three of the ASPH/CDC Preparedness Exemplar Groups: Education and Evaluation Methods, Certificate Programs, and University-based Student Preparedness. He has been conducting research on the use of effect sizes and measures of association for the past eight years and has presented pre-sessions on this topic at the last four AEA annual meetings. He holds a PhD in Educational Research and Development from Ohio State (1972). Offered: Monday, June 11, 9:25–12:45 (20-minute break within); Wednesday, June 13, 8:30–11:50 (20-minute break within).
J. Jackson Barnette, PhD Senior Associate Dean and Professor of Biostatistics School of Public Health University of Alabama at Birmingham
Session 1
Overview of effect size measures, or: When a good p is not quite enough
References
There are many references on this history. I have summarized it as reported in: Kline, R. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. Washington, DC: APA; and Nix, T. & Barnette, J. (1998). The data analysis dilemma: Ban or abandon. A review of null hypothesis significance testing. Research in the Schools, 5(2), 3-14. Carl Huberty (UGA Professor Emeritus) has published extensively on this topic.
Confidence Intervals
Confidence intervals are usually not included in categorizations of effect sizes, but since they are also ways of describing the magnitude of effects, they probably should be included. In fact, we are now seeing a call for confidence intervals around observed standardized effect sizes and measures of strength of association. Why not have confidence intervals of confidence interval limits? No reason we couldn't.
Dr. Jack Barnette Eval Inst 2007 - Effect Size 25
Variance-Accounted-for Statistics
These statistics are very similar (or even identical) to the correlation coefficient (r) and the coefficient of determination (r²). The correlation measures the linear relationship between two variables, and the squared correlation is a measure of the proportion of variance that is common to the two variables.
Variance-Accounted-for Statistics
This same notion is found in the variance-accounted-for statistics. They cover the range from 0, where none of the variation in the dependent variable is attributable to the variation of the treatment variable, to 1, where all of the variation in the dependent variable is attributable to the variation of the treatment variable.
Variance-Accounted-for Statistics
r² is one such statistic. There are several others that we may or may not recognize as variance-accounted-for statistics; Cronbach's alpha and Cohen's kappa are also examples. Others we use in effect size assessment are usually referred to as strength-of-association or strength-of-relationship measures.
Variance-Accounted-for Statistics
The examples we will discuss are: eta-squared (η²), also known as the correlation ratio; omega-squared (ω²); the intraclass correlation (ICC); and Cramér's V for chi-square applications.
The Call(s)
The call in the literature:
1. Examples from publication manuals
2. Examples from journal guidelines
The Source
American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
MORE
At least 24 journals now require effect size reporting
... 11 Journal of Early Intervention 12 Journal of Ed & Psychological Consultation 13 Journal of Experimental Education 14 Journal of Learning Disabilities 15 Journal of Personality Assessment 16 Language Learning 17 Measurement & Evaluation in Counseling & Development ...
More
At least 24 journals now require effect size reporting
... 18 The Professional Educator 19 Reading and Writing 20 Research in the Schools 21 Educational Technology Research and Development 22 Health Psychology (APA) 23 Journal of Educational Psychology (APA) 24 Journal of Experimental Psychology: Applied And the list grows daily
Unresolved Issues
There needs to be greater understanding of what they can do and what they can't do:
Misconceptions of interpretation
Identifying relationships and conversions
Getting a clear understanding of how effect size confidence intervals are computed
Assessing how useful effect size confidence intervals can be
Session 2
Technical Definitions And Equations for Computation
Relative Risk
A way of defining this is:

RR = [A / (A + B)] / [C / (C + D)]

where A, B, C, and D are the cells of the 2 × 2 exposure-by-outcome table and N = A + B + C + D.
The ratio of the probability of some event (such as death or disease) occurring in the group that receives the treatment or is exposed, to the probability of that event occurring in the group that does not receive the treatment or exposure.
Relative Risk
Relative Risk is the ratio of the proportion of those exposed who have the outcome or condition to the proportion of those not exposed who have the outcome or condition:

Relative Risk (RR) = [A / (A + B)] / [C / (C + D)]

If RR is equal to 1.00, the risk is the same in the exposed and unexposed groups.
Relative Risk
If RR is less than one, the exposed group has a lower likelihood of having the condition as compared with the nonexposed group If RR is greater than one, the exposed group has a higher likelihood of having the condition as compared with the nonexposed group
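As a quick illustration, RR can be computed directly from the four cells of the 2 × 2 table. This is a minimal sketch; the function name and the counts in the example are hypothetical, not from the deck:

```python
def relative_risk(a, b, c, d):
    """RR = [A/(A+B)] / [C/(C+D)] for a 2x2 exposure-by-outcome table:
                 outcome   no outcome
    exposed         a          b
    unexposed       c          d
    """
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts: 20/100 exposed vs. 10/100 unexposed have the outcome,
# so the exposed group's risk is twice the unexposed group's.
print(relative_risk(20, 80, 10, 90))  # 2.0
```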
Odds Ratio
The Odds Ratio is a ratio of two odds; we compare the odds from two groups:

Odds Ratio = Odds for Group 1 / Odds for Group 2 = Odds of being exposed in the group with the condition / Odds of being exposed in the group without the condition
Odds Ratio
The Odds Ratio is the ratio of the odds that a subject in the experimental group is exposed to the risk factor to the odds that a subject in the comparison group is exposed to the risk factor:

Odds Ratio (OR) = (A/C) / (B/D) = AD / BC
Confidence Intervals
For both the relative risk and the odds ratio, the distributions around these values are not normal, but the log of each distribution is distributed approximately normally. We find the standard error in log units, determine the lower and upper limits of the log of the value, and then convert them back to the original scale.
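A sketch of that log-scale procedure for the odds ratio, using the usual approximation SE[ln(OR)] = √(1/A + 1/B + 1/C + 1/D); the function name and counts are hypothetical:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR = AD/BC with a 95% CI computed on the log scale, then
    converted back to the original scale."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1/a + 1/b + 1/c + 1/d)   # SE of ln(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
print(round(or_, 2), round(lo, 2), round(hi, 2))  # 2.25 0.99 5.09
```

Note that the interval is not symmetric around the OR; symmetry holds only on the log scale.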
d = (μ₁ − μ₂) / σ, estimated by d = (X̄₁ − X̄₂) / s

Cohen used this as the basis of his research on power; σ and s represent the total-score standard deviation.
g = (X̄₁ − X̄₂) / s_Pooled

This pooled variance term is the same one that would be used in the t test.
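Computed from raw data, the pooled-denominator effect size looks like this (a minimal sketch; the function name and sample data are mine, not the deck's):

```python
import math

def pooled_effect_size(x1, x2):
    """(mean1 - mean2) / s_pooled, where s_pooled comes from the
    n1 + n2 - 2 pooled sum of squares -- the same variance term
    used in the independent-samples t test."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((v - m1) ** 2 for v in x1)
    ss2 = sum((v - m2) ** 2 for v in x2)
    s_pooled = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    return (m1 - m2) / s_pooled

print(round(pooled_effect_size([2, 3, 4, 5, 6], [1, 2, 3, 4, 5]), 4))  # 0.6325
```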
Cohen's Standards
There is always the issue of how large an effect size is meaningful. It is suggested that this be based on other similar findings in the literature and/or on what would be a meaningful difference for pedagogic, treatment, fiscal, or other reasons.
Cohen's Standards
Cohen needed to base his research on power on some effect sizes, so he pretty much arbitrarily chose three values that have since been used extensively as standards for effect sizes: 0.2 is a small effect, 0.5 is a medium effect, 0.8 is a large effect.
Cohen's Standards
Cohen warned about using these in practice, but they seem to have become the default values for a great deal of research. The major problem with this is that effect sizes are influenced by the number of samples and the sample sizes.
Cohen's Standards
Thus, using these same standards across many different research arrangements is not good practice; they may be attained just by chance in several situations. In many cases we don't have estimates of effect sizes for use in power determinations, so these defaults are often used, e.g., "we will be looking for a moderate effect size, so we use d = 0.5."
Checking up on Standards
Here are some results from a Monte Carlo simulation study in which 5,000 standardized effect sizes (comparing means) were generated at random from a unit normal distribution, using 2 through 10 samples with sample sizes from 5 to 100 in steps of 5. Some of the results follow:
n = 25: mean = .41, proportion > .2 = .90, proportion > .5 = .29, proportion > .8 = .03
n = 50: mean = .29, proportion > .2 = .74, proportion > .5 = .06, proportion > .8 = .00
n = 100: mean = .21, proportion > .2 = .49, proportion > .5 = .00, proportion > .8 = .00
Eta-Squared
An η² of 0.25 would indicate that 25% of the total variation is accounted for by the treatment variation.

η² = SS_Treatment / SS_Total
Eta-Squared
Positives: easy to compute, easy to interpret. Negatives: it is more a descriptive than an inferential statistic, it has a tendency to be positively biased, and its chance values are a function of the number and size of samples. BUT IT IS NOW EXTENSIVELY USED AS AN INFERENTIAL STATISTIC in effect size reporting.
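The "easy to compute" point is literal: η² is a one-line ratio of ANOVA sums of squares. Applied here to the two-group ANOVA table that appears later in the deck (SS_Between = 30.817, SS_Total = 420.850); the function name is mine:

```python
def eta_squared(ss_treatment, ss_total):
    """Proportion of total variation attributable to the treatment."""
    return ss_treatment / ss_total

# The deck's two-group ANOVA table: SS_Between = 30.817, SS_Total = 420.850
print(round(eta_squared(30.817, 420.850), 4))  # 0.0732
```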
Partial Eta-Squared
In designs other than one-way ANOVA, the error term for an effect is often no longer MS_Error. In that case, partial η² is computed instead:

Partial η² = SS_Effect / (SS_Effect + SS_Error)
r = d / √(d² + 4)

Converting from d to r (then squaring to get r²) looks pretty easy and straightforward.
d = 2r / √(1 − r²)

DO NOT BE DECEIVED. It is easy to think these conversions are useful beyond that, but they are not. I'll demonstrate this later.
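Both conversion formulas as code; as the caution above says, they assume the simple two-group (K = 2) situation. A minimal sketch with function names of my own:

```python
import math

def d_to_r(d):
    """r = d / sqrt(d^2 + 4); assumes two equal-sized groups (K = 2)."""
    return d / math.sqrt(d * d + 4)

def r_to_d(r):
    """d = 2r / sqrt(1 - r^2); the algebraic inverse of d_to_r."""
    return 2 * r / math.sqrt(1 - r * r)

# Round trip: converting d -> r -> d recovers the original value
print(round(r_to_d(d_to_r(0.8)), 6))  # 0.8
```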
Omega-Squared
ω² is computed using terms from the ANOVA:

ω² = [SS_Between − (K − 1)MS_Error] / [SS_Total + MS_Error]
Omega-Squared
Positives and negatives (pun intended) of ω². Positives: it is an inferential statistic that can be used for predicting population values, it is easily computed, and it removes much of the bias found in η². Negatives: it can have negative values, not just rounding-error negatives but values distinctly below zero. If you get one that is negative, call it zero.
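A sketch of the ANOVA-based computation that also follows the "call it zero" advice, checked against the deck's two-group ANOVA table (SS_Between = 30.817, SS_Total = 420.850, MS_Error = 6.725); the function name is mine:

```python
def omega_squared(ss_between, ss_total, ms_error, k):
    """(SS_Between - (k-1)*MS_Error) / (SS_Total + MS_Error),
    with negative results truncated to zero per the slide's advice."""
    w2 = (ss_between - (k - 1) * ms_error) / (ss_total + ms_error)
    return max(w2, 0.0)

# Deck's two-group example: note the value is smaller than the
# eta-squared of 0.0732 from the same table -- the bias correction at work.
print(round(omega_squared(30.817, 420.850, 6.725, 2), 4))  # 0.0563
```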
Omega-Squared Values
Values associated with levels of ω², according to Kirk (1995). Note these are paired with the same values Cohen gave (in parentheses) for d:
0.010 is a small association (0.2)
0.059 is a medium association (0.5)
0.138 is a large association (0.8)
Can this be true, equating ω² and η²? How can this be? IT CAN'T BE
Intraclass Correlation
Omega-squared is used when the independent variables are fixed. Occasionally, the independent variables may be random in which case the intraclass correlation is used to assess strength of association.
Intraclass Correlation
Values to determine the ICC come from the ANOVA. If group sizes are unequal, use the harmonic mean in place of n.
Intraclass Correlation
It can also be found from:

ICC = (F − 1) / [(n − 1) + F]

or from the ANOVA mean squares:

ICC = (MS_Treatment − MS_Error) / [MS_Treatment + (n − 1)MS_Error]

The ICC is a variance-accounted-for statistic, interpreted in the same way as omega-squared. It also has the same strengths and weaknesses.
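Both routes in code, checked against the deck's later numeric example (MS_Treatment = 84.900, MS_Error = 3.617, n = 5). The two forms are algebraically identical because F = MS_Treatment / MS_Error; function names are mine:

```python
def icc_from_ms(ms_treatment, ms_error, n):
    """ICC from the ANOVA mean squares."""
    return (ms_treatment - ms_error) / (ms_treatment + (n - 1) * ms_error)

def icc_from_f(f, n):
    """ICC = (F - 1) / ((n - 1) + F)."""
    return (f - 1) / ((n - 1) + f)

print(round(icc_from_ms(84.900, 3.617, 5), 4))   # 0.818
print(round(icc_from_f(84.900 / 3.617, 5), 4))   # 0.818
```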
Cramér's V
A measure of strength of association used when the statistical test is based on a chi-square distribution. It is a proportion-of-variance-accounted-for statistic.
Cramér's V
The mean square contingency coefficient (Cramér) is the measure of association for contingency tables:

φ′² = χ² / χ²_Max, estimated as V = √[χ² / (Nq)]

where q = min(number of rows − 1, number of columns − 1). It has a range of 0 to 1 and is an indication of the proportion of common variance for two nominal or ordinal variables, using the chi-square statistic computed for a contingency table.
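In code, with q taken as the smaller table dimension minus one; applied below to the deck's later 3 × 4 example (χ² = 16.221, N = 143). The function name is mine:

```python
import math

def cramers_v(chi2, n, rows, cols):
    """V = sqrt(chi2 / (N * q)), with q = min(rows - 1, cols - 1)."""
    q = min(rows - 1, cols - 1)
    return math.sqrt(chi2 / (n * q))

# Deck's chi-square example: 3 programs x 4 rating levels, N = 143
print(round(cramers_v(16.221, 143, 3, 4), 3))  # 0.238
```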
Computational Examples
We'll take a look at how the common ones are computed using data. Then we'll look at how we determine confidence intervals for some of these using SPSS and SAS.
Session 3
Computational Examples of Commonly Used Effect Size and Association Measures and Available Confidence Intervals
This Session
We will look at examples of computing the common effect size/strength-of-association measures. We'll include confidence intervals for some of these: d, η², and Cramér's V, using SAS macros.
Smithson's Utilities
It is not an easy task to find these confidence intervals. Michael Smithson has provided macros for use in SPSS and SAS programs to help, but a reasonable amount of work is needed to use them. Go to: http://www.anu.edu.au/psychology/people/smithson/details/CIstuff/CI.html
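The idea behind Smithson's macros is to pivot the noncentral t distribution: find the noncentrality parameters for which the observed t sits at the upper and lower tail probabilities, then rescale to d units. A rough sketch assuming SciPy is available; the function name and bracketing interval are my own choices:

```python
import math
from scipy.stats import nct
from scipy.optimize import brentq

def cohens_d_ci(t_obs, n1, n2, conf=0.95):
    """Noncentral-t pivot CI for a two-group standardized effect size."""
    df = n1 + n2 - 2
    alpha = (1 - conf) / 2
    # Find the ncp values that put t_obs at the 1-alpha and alpha tail points
    ncp_lo = brentq(lambda ncp: nct.cdf(t_obs, df, ncp) - (1 - alpha), -10, 10)
    ncp_hi = brentq(lambda ncp: nct.cdf(t_obs, df, ncp) - alpha, -10, 10)
    scale = math.sqrt(1 / n1 + 1 / n2)   # converts ncp units to d units
    return ncp_lo * scale, ncp_hi * scale

lo, hi = cohens_d_ci(2.141, 30, 30)   # the deck's t-test example
print(round(lo, 3), round(hi, 3))     # close to the deck's 0.035 to 1.066
```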
t Test Example
ANOVA Summary Table

Source   df   SS        MS
Bet.      1   30.817    30.817
Error    58   390.033    6.725
Total    59   420.850

Note: t² = F
t Test Example
Cohen's d

F = 4.583, p = 0.037

d = (15.27 − 13.83) / √(SS_Total / df_Total) = 1.44 / 2.671 = 0.539
d Confidence Interval
Notice how large this CI is?
d CI₀.₉₅: 0.035 to 1.066
t Test Example
Glass's g′

g′ = (X̄_Experimental − X̄_Control) / s_Control = (15.27 − 13.83) / 2.001 = 1.44 / 2.001 = 0.720
Type     SD used (denominator)                         df            d
Cohen    Total SD (Total SS/df)                        N − 1         0.539
Hedges   Pooled (Sample 1 + Sample 2 SS, = MS_Error)   n1 + n2 − 2   0.555
Glass    s_Control                                     nc − 1        0.720

Notice why Cohen's d and Hedges's g are so close? Lots of folks compute Hedges's g and call it Cohen's d.
Note that t² = 3.308² = 10.943 = F, as it should be (with a tad of rounding error). But what about the effect size to use?
This will be a negatively biased estimate of the effect size because the correlation of the pre- and post-test scores is not accounted for.
This is the error term for the dependent t, but it is not the SD of the differences, which is 1.434. So, one option would be a g (pooled) approach, which would be:

d = (X̄_post − X̄_pre) / √MS_residual = (10.10 − 8.60) / √1.028 = 1.479

g = (X̄_post − X̄_pre) / s_d
7.32% of variance accounted for. Chance predicted value from Barnette/McLean for η², K = 2, n = 30: η²_chance = 0.5573 · n^(−1.0250) = 0.0171
ω² Confidence Interval

Notice how large this CI is?

ω² CI₀.₉₅: 0 to 0.2214
10.67% of the variance accounted for. The ICC tends to be higher than omega-squared, by a factor of two in many cases, but the difference disappears as omega-squared approaches 1.
CI₀.₉₅ values: 13.37 to 15.56, 11.21 to 12.93, 14.12 to 16.41, 13.84 to 16.10, 12.01 to 13.59, 13.47 to 14.35
Chance predicted value from Barnette/McLean for d, K = 5, n = 30: d_chance = 5.2388 / √149 = 0.4292
Pairwise d Values
One approach that would cut down on the total number needed is to conduct a post-hoc pairwise analysis, such as Tukey's HSD, and compute d only for the pairs that are significantly different. We'll do that for our example.
c = K(K − 1)/2 = 5(5 − 1)/2 = 20/2 = 10
Pairwise d Values
Tukey's HSD would be:

HSD = q(α, J, df_e) · √(MSE / n) = q(0.05, 5, 145) · √(7.400 / 30) = 3.91 × 0.497 = 1.94
Pairwise d Values
Following is a table of the individual d values and their η² equivalents:

Pair   |Diff.|   |d|     η²
2-3    3.20     1.176   0.263
2-4    2.90     1.066   0.227
2-1    2.40     0.882   0.167
3-5    2.47     0.908   0.175
4-5    2.17     0.800   0.142
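The |d| column in the table is just each absolute mean difference divided by √MSE; a minimal sketch (function name mine), verified against the 2-3 pair (3.20 / √7.400 = 1.176):

```python
import math
from itertools import combinations

def pairwise_d(means, ms_error):
    """|d| for every pair of group means, using sqrt(MS_error) as the SD."""
    s = math.sqrt(ms_error)
    return {(i, j): abs(means[i] - means[j]) / s
            for i, j in combinations(range(len(means)), 2)}

# A mean difference of 3.20 with MSE = 7.400 reproduces the table's first row
d = pairwise_d([0.0, 3.20], 7.400)
print(round(d[(0, 1)], 3))  # 1.176
```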
Any pairwise difference of 1.94 or greater would be significant at p < 0.05. These pairs are: 2-3, 2-4, 2-1, 3-5, 4-5.
η² Confidence Interval

Notice how large this CI is?

η² CI₀.₉₅: 0.0649 to 0.2721
Factorial Designs
A factorial design permits us to look at how combinations of treatments might affect the mean scores of groups. Of primary importance is finding out whether combinations of treatments affect the mean scores of subjects exposed to the combinations.
Factorial Designs
Example of a 2 x 3 Factorial Design A pilot study was designed to examine the interaction effects of teaching methods and style of teacher on student achievement. Fifteen teachers classified as "Direct" teachers and 15 classified as "Indirect" teachers were randomly selected from a large pool of teachers. One third of each group was randomly assigned to teach either in a laboratory setting, a lecture setting, or a combination lab/lecture setting. Each teacher taught his/her class for a period of time and then his/her students were measured with the same achievement test. Average class achievement, the dependent variable, was computed for each teacher.
153 Dr. Jack Barnette Eval Inst 2007 - Effect Size 154
Factorial Designs
We will be testing for three effects. Two main effects: the effect of Teacher Type (comparing 7.47 and 7.67), df_A = J − 1 = 2 − 1 = 1, and the effect of Instructional Method (comparing 7.40, 6.10, and 9.20), df_B = K − 1 = 3 − 1 = 2. One interaction effect: the interaction of teacher type and instructional method (comparing the cell means 4.80, 9.20, 8.40, 10.00, 6.10, 9.20 with each other), df_AB = df_A × df_B = 1 × 2 = 2.
Cell means (SD), in the order Lab, Lecture, Combination, row total (the Direct row was not recovered):
Indirect: 10.00 (2.35), 3.00 (1.22), 10.00 (2.24), 7.67 (3.89)
Total: 7.40 (3.53), 6.10 (3.54), 9.20 (1.87), 7.57 (3.24)
Factorial Design Example: Interaction of Teacher Type and Instructional Method on Student Achievement (an interaction plot of the Direct and Indirect lines across the Lab, Lecture, and Combination settings appeared here)

Source                  df    p
Teacher Type             1    0.776
Instructional Method     2    0.005
Interaction              2    0.000
Within Groups (error)   24
Total                   29
Omega-Squared
ω² = [SS_Between − (k − 1)MS_Error] / [SS_Total + MS_Error] = [169.800 − (6 − 1)(3.617)] / [306.367 + 3.617] = 0.5011
Intraclass Correlation
ICC = (MS_Treatment − MS_Error) / [MS_Treatment + (n − 1)MS_Error] = (84.900 − 3.617) / [84.900 + (5 − 1)(3.617)] = 0.8180
Factorial Design
For the interaction, we could do tests for simple effects and compute d and η² for the ones that are significant. If the interaction is not significant, we can do follow-ups for the significant main effects and compute d and η² for those. We'd use the same approach as in the one-way example before.
Randomized Blocks, Simple Repeated Measures

Person   Drug A   Drug B   Drug C   Mean
1        4        7        9        6.7
2        3        9        7        6.3
3        5        3        6        4.7
4        2        5        5        4.0
5        4        3        8        5.0
6        3        7        7        5.7
Mean     3.5      5.7      7.0      5.4
Partial Eta-Squared
We would find partial eta-squared as a measure of strength of association or effect size using:

partial η² = SS_Treatment / (SS_Treatment + SS_Residual)
Chi-Square Example
Contingency Table (Frequencies), N = 143

            Poor   Fair   Good   Excellent
Program A    4      6     17     11
Program B    2      3     28     26
Program C    7     12     15     12

χ²_obs = 16.221, df = 6, p = 0.013
Chi-Square Example
Cramér's V (variance accounted for):

V = √[χ² / (Nq)] = √[16.221 / (143 × 2)] = 0.238

where q = min(3 − 1, 4 − 1) = 2
Session 4
Relationships Among the Common Measures
Relationships
When K = 2, it is very straightforward to convert any one of these to all of the others if the two sample sizes are known: t statistic, p value, d, η², ω², ICC. These are all directly related; the relationships may not be linear, but they are direct (exact predictability).
Converting Values
As indicated, if we know the sample sizes of two groups whose means we want to compare, we can convert one effect size to the other values. There is a perception that the p value and Cohen's d are independent, that they represent different things. In fact, they only represent values on a different metric; they are exactly related.
A Demonstration
I have written a program that, for two groups with known sample sizes n1 and n2, converts any one of t, p, d, η², ω², or ICC to ALL of the others in the set. This software is in development and not available at this time; perhaps it will be later.
Converting Values
I'll use that program to generate some converted values: from p to the other values, and from d to the other values. We'll look at situations where: n1 = 10, n2 = 10; n1 = 30, n2 = 30; n1 = 60, n2 = 60.
Converting Values
The range of p values is 0 to 1. We often use p values such as 0.10, 0.05, 0.01, 0.001, or 0.0001, and we'll add 1.00 and 0.50. The range of |d| values is 0 to some maximum determined by the sample sizes. Common values of d are 0.20, 0.50, and 0.80, and we'll add 0.00, 1.00, and 2.00.
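While the author's own conversion program is not available, the two-group (K = 2) relationships it relies on are standard and easy to sketch: given t and the two sample sizes, everything else follows. This is my own sketch, assuming SciPy for the p value; the numbers check against the deck's t-test example (t ≈ 2.141, n = 30 per group):

```python
import math
from scipy import stats

def convert_from_t(t, n1, n2):
    """Two-group (K = 2) conversions among t, p, d, eta^2, omega^2."""
    df = n1 + n2 - 2
    return {
        "p": 2 * stats.t.sf(abs(t), df),              # two-tailed p value
        "d": t * math.sqrt(1 / n1 + 1 / n2),          # pooled-SD effect size
        "eta2": t**2 / (t**2 + df),                   # eta-squared
        "omega2": (t**2 - 1) / (t**2 + n1 + n2 - 1),  # omega-squared
    }

vals = convert_from_t(2.141, 30, 30)
# Matches the deck's example: p ~ 0.037, eta^2 ~ 0.0732, omega^2 ~ 0.0563
print({k: round(v, 4) for k, v in vals.items()})
```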
(Tables of the converted η² and ω² values for these p and d benchmarks at n = 10, 30, and 60 appeared here; the row labels were lost in extraction. Note the small negative ω² values, such as −0.0526, that appear at the no-effect end of the range.)
Converting Values
These will be the exact values of the other measures when we have the given p and d values for the given sample sizes. There will be no variation. These are NOT independent of each other; there is only a different metric of measurement.
d and η², K = 2, n = 5

(Scatterplot of observed standardized effect size, d from 0 to 0.7 on the x-axis, against eta-squared, 0 to 0.25 on the y-axis, appeared here.)
d and ω², K = 2, n = 30

(Scatterplots of observed standardized effect size against omega-squared, and of eta-squared against omega-squared, appeared here.)
When K > 2
When there are more than two groups, specific relationships hold among η², ω², and ICC. These are close to being linear. A few examples follow.
(Scatterplots of omega-squared, roughly −0.3 to 0.6, against eta-squared, 0 to 0.7, appeared here for K > 2, showing a nearly linear relationship.)
Relationship Summaries
When K = 2, all four are related: not linearly, but specifically. When K > 2, the relationships of d with the other three are not specific. When K > 2, the relationships among eta-squared, omega-squared, and ICC are linear and predictable without error (R² ≈ 1).
CAUTION
When you read that d is related to any of the other three measures, and these relations are used to convert d to η² or η² to d, be very cautious. These commonly used equations are useful ONLY when K = 2 and the groups are reasonably large. You will find this relationship used inappropriately in many cases of meta-analysis and other applications.
Myths Associated
The p value and the effect size, as well as other association measures, are independent
The relationship between d and η² is independent of sample sizes
The relationships of the effect sizes are specific when there are 2 or more samples
Confidence intervals around effect sizes are useful
Contact Information
J. Jackson Barnette, PhD
Senior Associate Dean for Academic Affairs
Professor of Biostatistics
School of Public Health
University of Alabama at Birmingham
barnette@uab.edu
(205) 934-4552
Want More???
An AEA day-long workshop will probably be conducted at the AEA meeting in Baltimore in November. Contact me if you have questions or if I can provide additional information on your use of these measures. Now, on to the workshop evaluation.