
Lecture 2: Correlations

• read: Chapter 4
• practical: Chapter 4

last week

• mean
• variance
• standard deviation
• standard error

$$ s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} \qquad \sigma_{\bar{X}} = \frac{s}{\sqrt{N}} $$
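As a quick check on last week's formulas, here is a minimal Python sketch; the scores are invented purely for illustration:

```python
import math
import statistics

scores = [4, 6, 5, 7, 8, 5, 6, 7]  # hypothetical sample, N = 8
N = len(scores)
mean = statistics.mean(scores)

# sample variance and standard deviation use the N - 1 denominator
variance = sum((x - mean) ** 2 for x in scores) / (N - 1)
sd = math.sqrt(variance)

# standard error of the mean: sd divided by the square root of N
se = sd / math.sqrt(N)

# agrees with the library implementations
assert math.isclose(variance, statistics.variance(scores))
assert math.isclose(sd, statistics.stdev(scores))
```

The hand-rolled versions match `statistics.variance` and `statistics.stdev`, which also divide by N − 1.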

• inferential test statistics = variance explained by the model / variance not explained by the model

the mistakes we can make

• we think we've accounted for more systematic variance than unsystematic
  • i.e. there's a statistically "significant" effect
  • but there isn't: a TYPE I error
    • if our criterion is p < .05 and the null hypothesis is true, then over 100 runs of the experiment we'd expect about 5 to come out significant (p < .05)
• we think there was too much unsystematic variation
  • i.e. there's no statistically significant effect
  • but there is: a TYPE II error
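The Type I error rate can be shown by simulation. A sketch, with one simplifying assumption: it uses a z-test on normally distributed null data (a t-test would be the textbook choice, so the observed rate runs slightly above the nominal .05):

```python
import random
import statistics

random.seed(1)
n, runs, alpha = 30, 2000, 0.05

false_positives = 0
for _ in range(runs):
    # the null hypothesis is true: every sample comes from a mean-zero population
    sample = [random.gauss(0, 1) for _ in range(n)]
    se = statistics.stdev(sample) / n ** 0.5
    z = statistics.mean(sample) / se
    p = 2 * (1 - statistics.NormalDist().cdf(abs(z)))  # two-sided p-value
    if p < alpha:
        false_positives += 1

# close to .05: roughly 5 "significant" results per 100 runs, despite no real effect
rate = false_positives / runs
```

Every "significant" result here is a false positive by construction, since the samples are drawn from a population where nothing is going on.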

minimising mistakes

• effect size
  • how close are the predictions of the model to the observed outcomes?
  • you can correlate the predicted vs. the observed
    • a "small" effect: r = .1
    • a "medium" effect: r = .3
    • a "large" effect: r = .5
  • and so we calculate how much of the variance we have "explained" (and how good our model is!)
    • you'll have to wait until next week, on correlation


minimising mistakes

• statistical power
  • the power of a test is the probability that the test will find an effect, assuming one exists in the population
    • power = 1 − p(Type II error)
  • Cohen suggested we aim for an 80% chance of detecting an effect if one genuinely exists
  • to calculate power:
    • select α (.05), find the effect size (r), enter the no. of participants
    • or, instead, calculate the no. of participants given the anticipated effect size, α, and Cohen's .8 power criterion
      • for a small effect (r = .1), N = 783
      • for a medium effect (r = .3), N = 85
      • for a large effect (r = .5), N = 28
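Those sample sizes can be approximated with the Fisher z transformation. A sketch, not the exact method behind the quoted table: it reproduces 783 and 85 for the small and medium effects, but gives 30 rather than 28 for the large effect, where the approximation is coarser:

```python
import math
from statistics import NormalDist

def sample_size(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a correlation of size r
    (two-sided test), via the Fisher z transformation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha = .05
    z_beta = NormalDist().inv_cdf(power)           # e.g. 0.84 for power = .80
    z_r = math.atanh(r)  # Fisher's z of the anticipated correlation
    return math.ceil(((z_alpha + z_beta) / z_r) ** 2 + 3)
```

Note how steeply N grows as the anticipated effect shrinks: detecting r = .1 needs roughly ten times the participants that r = .3 does.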

correlations

• something a little more powerful than the mean
• we assume that there is a linear relationship between two variables (the linear model: fitting a straight line to our data)

scatterplots

[two slides of example scatterplots]

step 1: covariance

• when one variable deviates from its mean, the other variable deviates from its mean in a similar way
  • does variance in one variable predict variance in the other?

how do we calculate it?

• variance: we multiply the deviation by itself:

$$ s^2 = \frac{\sum (x_i - \bar{x})(x_i - \bar{x})}{N - 1} $$

• covariance: we multiply one deviation by the other:

$$ \mathrm{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1} $$
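The covariance calculation is only a few lines of Python; the adverts/packets numbers are made up, anticipating the example later in the lecture:

```python
# hypothetical data: adverts watched (x) and packets of sweets eaten (y)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

N = len(x)
mean_x = sum(x) / N
mean_y = sum(y) / N

# multiply one deviation by the other, sum, and divide by N - 1
cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (N - 1)
```

A positive covariance means the two variables tend to deviate from their means in the same direction.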

how do we calculate it?

• covariance:

$$ \mathrm{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1} $$

• but the size of the covariance depends on the scales of measurement: the larger the units, the larger the number, so we standardize it
  • cf. z-scores:

$$ z = \frac{X - \bar{X}}{s} $$
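One way to see what standardizing buys us, sketched with invented data: re-express each score as a z-score, and the covariance of the z-scores no longer depends on the units of measurement:

```python
import statistics

x = [1, 2, 3, 4, 5]             # e.g. hours
y = [2, 4, 5, 4, 5]
x_scaled = [xi * 100 for xi in x]  # the same variable in different units

def zscores(v):
    """Standardize: subtract the mean, divide by the standard deviation."""
    m, s = statistics.mean(v), statistics.stdev(v)
    return [(vi - m) / s for vi in v]

def cov(a, b):
    ma, mb = statistics.mean(a), statistics.mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

raw = cov(x, y)                 # changes by a factor of 100 if x is rescaled
raw_scaled = cov(x_scaled, y)
std = cov(zscores(x), zscores(y))             # unchanged (up to rounding)
std_scaled = cov(zscores(x_scaled), zscores(y))
```

The covariance of the z-scores is exactly Pearson's r, which is where the next slide ends up.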

step 2: Pearson's r

• covariance:

$$ \mathrm{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1} $$

• but the size of the covariance depends on the scales of measurement, so we standardize it
• Pearson's r: divide the covariance by the product of the two standard deviations:

$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y} $$
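Putting the two steps together, Pearson's r from scratch; the adverts/packets data is hypothetical:

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance divided by the product of the two sds."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)

adverts = [1, 2, 3, 4, 5]   # hypothetical data
packets = [2, 4, 5, 4, 5]
r = pearson_r(adverts, packets)
```

With this toy data r ≈ .77, so r² = .60: about 60% of the variability in packets is shared with variability in adverts.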

nice things about correlations

$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y} \qquad \text{cf.} \qquad s^2 = \frac{\sum (x_i - \bar{x})(x_i - \bar{x})}{N - 1} $$

• the equation is not unlike that for variance
• the equation forces a result between +1 (they covary perfectly and in the same way), 0 (there is no covariance at all), and −1 (they covary perfectly but in the opposite way)
• r² is a measure of how much variability in one variable can be "explained" by variability in the other

r-squared

$$ r^2 = \left( \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y} \right)^2 $$

• if I know the variance in number of adverts shown, I can predict x% of the variance in packets eaten
• for each unit of variance in adverts shown, we get x units of variance in packets eaten

correlation: a summary

• the correlation is a measure of the strength of the relationship between one variable and another
• hence its use in calculating effect size and power
• Pearson's r is calculated when both variables are on continuous (interval) scales

correlation and causality

• high anxiety correlates with lower exam performance
  • does a state of anxiety cause worse marks? NO!
  • high anxiety correlates with having done less revision
  • less revision correlates with lower exam performance
• correlating 2 variables may miss an important relationship with a 3rd, unmeasured variable
• what causes what?
• correlations do not imply causality!

different types of correlation I

• Pearson's r is for parametric data:
  • both variables normally distributed, on interval scales
  • or, if one variable has just two categories: the t-test!
• Spearman's ρ (rho, r_s)
  • for non-normal data (e.g. ordinal, such as grades)
  • works by ranking the data, and then running Pearson's r on the ranked data
• Kendall's τ (tau)
  • for small datasets with many tied ranks
  • possibly better than Spearman's
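The "rank, then run Pearson's r" recipe for Spearman's ρ can be sketched directly; the grades/scores data is invented, and tied values get the average of the rank positions they occupy:

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)

def ranks(values):
    """Rank the data; tied values share the average of their rank positions."""
    sv = sorted(values)
    return [sv.index(v) + (sv.count(v) + 1) / 2 for v in values]

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranked data."""
    return pearson_r(ranks(x), ranks(y))

grades = [1, 2, 3, 4, 5, 6]            # hypothetical ordinal data
scores = [10, 25, 20, 40, 55, 60]
rho = spearman_rho(grades, scores)
```

Because only ranks enter the calculation, ρ measures monotonic rather than strictly linear association: any perfectly increasing relationship gives ρ = 1, however curved.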

different types of correlation II

• biserial correlation
  • when one variable is dichotomous, but there is an underlying continuum (e.g. pass/fail on an exam)
• point-biserial correlation
  • when one variable is dichotomous, and it is a true dichotomy (e.g. gender)

bivariate vs. partial correlation

• bivariate correlations tell you how much variance is shared (and typically they are calculated between two variables)
• partial correlations tell you how much of the unshared variance is actually shared with a third variable (more or less)

a graphical account of partial correlation

• the (bivariate) correlation is a little like fitting a line to the data points (= simple regression)
• each point's distance from the line (the residual) is the error relative to the model, i.e. it's variance that cannot be explained
• a 3rd variable (e.g. age) might correlate with (i.e. predict) some of that variance

[figure: scatterplot with a fitted line; a third variable, age (values 20-40), predicts some of the residual variance]
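The standard first-order formula computes the x-y correlation with a third variable z "partialled out" from both. A sketch using the anxiety/revision/exam example from earlier; the numbers are invented, so only the mechanics matter here:

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)

def partial_r(x, y, z):
    """Correlation between x and y, controlling for the third variable z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

anxiety  = [80, 70, 60, 50, 40]   # hypothetical data
revision = [5, 8, 15, 21, 25]
exam     = [40, 52, 54, 63, 71]

r_xy = pearson_r(anxiety, exam)            # bivariate anxiety-exam correlation
pr = partial_r(anxiety, exam, revision)    # the same, with revision controlled
```

In this toy dataset the anxiety-exam correlation only weakens slightly once revision is partialled out; with other data it can vanish entirely, which is exactly the third-variable worry from the causality slide.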

finally

• you can ignore the distinction between partial and semi-partial correlations (see Howell, "Statistical Methods for Psychology", if you are interested!)
• next week: regression (incl. multiple regression)
