T-Tests, Anova and Regression: Lorelei Howard and Nick Wright MFD 2008

t-tests, ANOVA and regression
- and their application to the statistical analysis of fMRI data
Lorelei Howard and Nick Wright

MfD 2008
Overview
Why do we need statistics?

P values
T-tests
ANOVA
Why do we need statistics?
To enable us to test experimental hypotheses
H0 = null hypothesis
H1 = experimental hypothesis
In terms of fMRI
Null = no difference in brain activation between these 2 conditions
Exp = there is a difference in brain activation between these 2 conditions
2 types of statistics
Descriptive statistics
e.g. mean and standard deviation (SD)
Inferential statistics
t-tests, ANOVAs and regression
Issues when making inferences
So how do we know whether the effect observed in our sample was genuine?
We don't
Instead we use p values to indicate how likely results like ours would be if there were no genuine effect in the whole population
P values
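P values = the probability of obtaining a result at least as extreme as the one observed by chance alone
i.e. if the null hypothesis were true
The α level is set a priori (usually 0.05)
If p < α then we reject the null hypothesis and accept the experimental hypothesis
This does not mean we are 95% certain that our experimental effect is genuine; it means a result this extreme would occur less than 5% of the time if H0 were true
If, however, p > α then we fail to reject the null hypothesis
This does not prove H0; the experiment may simply have been unable to detect the effect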
Two types of errors
Type I error = false positive
An α level of 0.05 means that, when the null hypothesis is true, there is a 5% risk of making a Type I error
Type II error = false negative
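To make the meaning of the α level concrete, here is a small simulation sketch that is not part of the original slides. It assumes a one-sample z-test with a known standard deviation of 1, used instead of a t-test only so that Python's standard library is enough; the function names are hypothetical. Every simulated experiment draws data for which H0 really is true, so each rejection is a Type I error, and the proportion of rejections should come out close to α = 0.05.

```python
import random
from statistics import NormalDist, fmean

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-tailed p value for H0: population mean == mu0 (sigma assumed known)."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_i_error_rate(alpha=0.05, n=20, n_experiments=10_000, seed=1):
    """Simulate experiments in which H0 is true and return the false-positive rate."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_experiments):
        # Data drawn from N(0, 1), so the null hypothesis (mean = 0) is true
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample) < alpha:
            false_positives += 1  # rejecting a true H0 = Type I error
    return false_positives / n_experiments

rate = type_i_error_rate()
print(f"Proportion of null experiments with p < 0.05: {rate:.3f}")
```

Rerunning with a different seed changes the exact proportion, but it should stay close to 0.05.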

t-tests
Compare two group means
Hypothetical experiment
[Figure: hypothetical experiment plotted against Time]
Q: Does viewing pictures of the Simpson and the Griffin families activate the same brain regions?
Condition 1 = Simpson family faces
Condition 2 = Griffin family faces
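Calculating t
Difference between the means divided by the standard error of the difference between the means
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}, \qquad s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$
where $\bar{x}_1, s_1, n_1$ and $\bar{x}_2, s_2, n_2$ are the mean, standard deviation and number of data points of Group 1 and Group 2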
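As a rough sketch of the formula for t, the hypothetical helper below computes t for two independent groups with Python's standard library. The input values are made up for illustration; `statistics.variance` uses the n - 1 denominator for each group's variance.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(group1, group2):
    """t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)."""
    n1, n2 = len(group1), len(group2)
    # Standard error of the difference between the two means
    se_diff = sqrt(variance(group1) / n1 + variance(group2) / n2)
    return (mean(group1) - mean(group2)) / se_diff

# Made-up signal values for Condition 1 (Simpson faces) and Condition 2 (Griffin faces)
simpson = [5.0, 6.0, 7.0]
griffin = [1.0, 2.0, 3.0]
print(two_sample_t(simpson, griffin))
```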
How do we apply this to fMRI data analysis?
[Figure: fMRI signal plotted against Time]
Degrees of freedom (df)
= number of unconstrained data points
Which in this case = number of data points - 1
Can use the t value and df to find the associated p value
Then compare it to the α level
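The step "use the t value and df to find the associated p value" is usually done with tables or a statistics package (for example `scipy.stats.t.sf`). The sketch below is a standard-library-only alternative under stated assumptions: it numerically integrates Student's t density with Simpson's rule to get a two-tailed p value, then compares it with α. The function names and the example t and df are hypothetical.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids overflow of gamma() for large df
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) for T ~ t(df), integrating the density with Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * area   # the t distribution is symmetric about 0

alpha = 0.05
p = two_tailed_p(2.5, df=10)
print(p, "reject H0" if p < alpha else "fail to reject H0")
```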
Different types of t-test
2-sample t-tests
Related = two related samples, i.e. the same people in both conditions
Independent = two independent samples, i.e. different people in the 2 conditions
One-sample t-tests
Compare the mean of one sample to a given value
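The related (paired) t-test can be viewed as a one-sample t-test that compares each person's difference between conditions with 0. A minimal standard-library sketch of that idea follows; the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0=0.0):
    """t = (sample mean - mu0) / (s / sqrt(n)), with df = n - 1."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))

def paired_t(cond1, cond2):
    """Related-samples t-test: a one-sample t-test on within-person differences."""
    diffs = [a - b for a, b in zip(cond1, cond2)]
    return one_sample_t(diffs, mu0=0.0)

# Made-up scores for the same 4 people in two conditions
cond1 = [3.0, 5.0, 4.0, 6.0]
cond2 = [2.0, 3.0, 3.0, 4.0]
print(paired_t(cond1, cond2), "df =", len(cond1) - 1)
```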
Another approach to group differences
Analysis Of VAriance (ANOVA)
Compares variances rather than comparing means directly
Multiple groups
e.g. different facial expressions
H0 = no differences between group means
H1 = differences between group means (at least one mean differs)
Calculating F
F = the between-group variance divided by the within-group variance
i.e. the model variance / error variance
For F to be significant, the between-group variance should be considerably larger than the within-group variance
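A sketch of the F calculation for a one-way design, assuming the usual mean-square definitions (sum of squares divided by degrees of freedom) for the between-group and within-group variances. The condition names and scores are made up, and the function name is hypothetical.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return F = MS_between / MS_within and its degrees of freedom for a one-way design."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    # Between-group (model) variation: distance of each group mean from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group (error) variation: spread of scores around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, (df_between, df_within)

# Made-up scores for three hypothetical facial-expression conditions
happy, neutral, fearful = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]
f, dfs = one_way_anova_f(happy, neutral, fearful)
print(f, dfs)
```

Here the group means are far apart relative to the spread inside each group, so F is large.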
What can be concluded from a significant ANOVA?
There is a significant difference between the groups
NOT where this difference lies
Finding exactly where the differences lie requires further statistical analyses (e.g. post hoc tests or planned contrasts)
Different types of ANOVA
One-way ANOVA
One factor with more than 2 levels
Factorial ANOVAs
More than 1 factor
Mixed design ANOVAs
Some factors independent (between-subjects), others related (within-subjects)
Conclusions
T-tests assess if two group means differ
significantly
Can compare two samples or one sample
to a given value
ANOVAs compare more than two groups, or handle more complicated designs
They compare variances rather than comparing means directly
Further reading
Howell. Statistical methods for psychologists
Howitt and Cramer. An introduction to statistics in psychology
Huettel. Functional magnetic resonance imaging (especially chapter 12)
Acknowledgements
MfD Slides 2005 2007

PART 2
Correlation
Regression
Relevance to GLM and SPM
Correlation
Strength and direction of the relationship between variables
Scattergrams
[Figure: three scattergrams of Y against X showing a positive correlation, a negative correlation and no correlation]

Describe correlation: covariance
A statistic representing the degree to which 2 variables vary together
Covariance formula
$$ \operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n} $$
cf. variance formula
$$ s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} $$
but the absolute value of cov(x,y) is also a function of the standard deviations of x and y.
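To see why the size of cov(x, y) alone is hard to interpret, here is a small sketch (hypothetical function names, made-up data) that implements the covariance and variance formulas with the n denominator used in these slides. Rescaling x by 10, e.g. by changing its units, multiplies the covariance by 10 even though the relationship itself is unchanged.

```python
from statistics import fmean

def covariance(x, y):
    """cov(x, y) = sum((x_i - mean(x)) * (y_i - mean(y))) / n."""
    mx, my = fmean(x), fmean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)

def variance_n(x):
    """s_x^2 = sum((x_i - mean(x))^2) / n, which is simply cov(x, x)."""
    return covariance(x, x)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 6.0]
x_rescaled = [10 * xi for xi in x]  # same measurements, different units
print(covariance(x, y), covariance(x_rescaled, y))  # 1.75 17.5
```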
Describe correlation: Pearson correlation coefficient (r)
Equation
$$ r_{xy} = \frac{\operatorname{cov}(x, y)}{s_x s_y} $$
where s = standard deviation of the sample
r = -1 (max. negative correlation); r = 0 (no linear relationship); r = 1 (max. positive correlation)
Limitations:
Sensitive to extreme values, e.g. a single outlier
[Figure: scatterplot illustrating the effect of an extreme value on r]
r is an estimate from the sample, but does it represent the population parameter?
Relationship not a prediction.
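The sensitivity of r to extreme values is easy to demonstrate. In this sketch (hypothetical function name, made-up points) four points with no linear relationship give r = 0, and adding one extreme point pushes r to roughly 0.94. Because r divides the covariance by both standard deviations, the 1/n factors cancel and plain sums are enough.

```python
from math import sqrt
from statistics import fmean

def pearson_r(x, y):
    """r = cov(x, y) / (s_x * s_y); the 1/n factors cancel, so sums suffice."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x, y = [1, 2, 3, 4], [1, 2, 2, 1]
print(pearson_r(x, y))                # 0.0: no linear relationship
print(pearson_r(x + [10], y + [10]))  # roughly 0.94 after adding one extreme point
```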
Summary
Correlation
Regression
Relevance to SPM
Regression
Regression: Prediction of one variable
from knowledge of one or more other
variables.
Regression v. correlation: Regression
allows you to predict one variable from the
other (not just say if there is an
association).
Linear regression aims to fit a straight line
to data that for any value of x gives the
best prediction of y.
Best fit line, minimising sum of squared errors
Describing the line as in GCSE maths: y = mx + c
Here, $\hat{y} = bx + a$
$\hat{y}$: predicted value of y
b: slope of regression line
a: intercept
[Figure: data points with the regression line, marking for one point its predicted value $\hat{y}_i$, observed value $y_i$ and residual $\varepsilon_i$]
Residual error ($\varepsilon$): difference between obtained and predicted values of y (i.e. $y - \hat{y}$).
Best fit line (values of b and a) is the one that minimises the sum of squared errors:
$$ SS_{error} = \sum (y - \hat{y})^2 $$
How to minimise SS_error
Minimise $\sum (y - \hat{y})^2$, which is $\sum (y - bx - a)^2$
[Figure: SS_error plotted for each possible regression line; the minimum is where the gradient = 0]
Plotting SS_error for each possible regression line gives a parabola.
Minimum SS_error is at the bottom of the curve, where the gradient is zero, and this can be found with calculus.
Take partial derivatives of $\sum (y - bx - a)^2$ with respect to a and b, set them to 0 (the minimum SS_error) and solve the resulting simultaneous equations, giving the values of a and b:
$$ b = \frac{r s_y}{s_x}, \qquad a = \bar{y} - b\bar{x} $$
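A minimal sketch of the closed-form solution, using made-up data and hypothetical function names: it computes b = r s_y / s_x and a = ȳ - b x̄ with n-denominator standard deviations (matching the covariance formula in these slides), then checks that nudging either parameter away from these values increases SS_error, as the parabola argument predicts.

```python
from statistics import fmean, pstdev

def fit_line(x, y):
    """Least-squares line y_hat = b*x + a, using b = r*s_y/s_x and a = mean(y) - b*mean(x)."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sx, sy = pstdev(x), pstdev(y)  # n-denominator SDs, matching cov(x, y) / n
    r = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n * sx * sy)
    b = r * sy / sx
    a = my - b * mx
    return b, a

def ss_error(x, y, b, a):
    """Sum of squared residuals for the line y_hat = b*x + a."""
    return sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))

# Made-up data
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.2, 2.8, 5.1, 7.2, 8.9]
b, a = fit_line(x, y)
best = ss_error(x, y, b, a)
# Nudging either parameter away from the least-squares values increases SS_error
for db, da in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    print(db, da, ss_error(x, y, b + db, a + da) > best)
```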
How good is the model?
We can calculate the regression line for any data, but how well does it fit the data?
Total variance = predicted variance + error variance
$$ s_y^2 = s_{\hat{y}}^2 + s_{er}^2 $$
Also, it can be shown that r² is the proportion of the variance in y that is explained by our regression model
$$ r^2 = \frac{s_{\hat{y}}^2}{s_y^2} $$
Insert $s_{\hat{y}}^2 = r^2 s_y^2$ into $s_y^2 = s_{\hat{y}}^2 + s_{er}^2$ and rearrange to get:
$$ s_{er}^2 = s_y^2 (1 - r^2) $$
From this we can see that the stronger the correlation, the smaller the error variance, so the better our prediction
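The decomposition of variance and the identity for the error variance can be checked numerically. This sketch (made-up data, hypothetical function name) fits the least-squares line with b = cov(x, y) / s_x², which equals r s_y / s_x, and uses population variances (n denominator) throughout.

```python
from statistics import fmean, pvariance

def variance_partition(x, y):
    """Return s_y^2, s_yhat^2, s_er^2 and r^2 for the least-squares line of y on x."""
    mx, my = fmean(x), fmean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    y_hat = [b * xi + a for xi in x]
    residuals = [yi - yh for yi, yh in zip(y, y_hat)]
    s_y2, s_yhat2, s_er2 = pvariance(y), pvariance(y_hat), pvariance(residuals)
    r2 = s_yhat2 / s_y2  # proportion of variance in y explained by the model
    return s_y2, s_yhat2, s_er2, r2

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
s_y2, s_yhat2, s_er2, r2 = variance_partition(x, y)
print(s_y2, s_yhat2 + s_er2)   # total = predicted + error
print(s_er2, s_y2 * (1 - r2))  # s_er^2 = s_y^2 (1 - r^2)
```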
Is the model significant?
i.e. do we get a significantly better prediction of y from our regression equation than by just predicting the mean?
F-statistic (after some complicated rearranging):
$$ F_{(df_{\hat{y}},\, df_{er})} = \frac{s_{\hat{y}}^2}{s_{er}^2} = \dots = \frac{r^2 (n - 2)}{1 - r^2} $$
where $df_{\hat{y}} = 1$ and $df_{er} = n - 2$
And it follows that:
$$ t_{(n-2)} = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} $$
So all we need to know are r and n!
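Because F and t depend only on r and n, they are easy to compute directly. A minimal sketch with a hypothetical function name and arbitrary example values (r = 0.5, n = 27); it also checks the general relationship F = t² for a model with a single predictor.

```python
from math import sqrt

def f_and_t_from_r(r, n):
    """F(1, n-2) = r^2 (n-2) / (1 - r^2) and t(n-2) = r sqrt(n-2) / sqrt(1 - r^2)."""
    df_error = n - 2
    f = r ** 2 * df_error / (1 - r ** 2)
    t = r * sqrt(df_error) / sqrt(1 - r ** 2)
    return f, t, (1, df_error)

f, t, dfs = f_and_t_from_r(0.5, 27)
print(f, t, t ** 2, dfs)
```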
Summary
Correlation
Regression
Relevance to SPM
General Linear Model
Linear regression is actually a form of the
General Linear Model where the
parameters are b, the slope of the line,
and a, the intercept.
$$ y = bx + a + \varepsilon $$
A General Linear Model is any model that describes the data as a linear combination of explanatory variables (with one variable, a straight line) plus error
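One voxel: The GLM
Our aim: solve the equation for β, which tells us how much of the BOLD signal is explained by X
[Figure: the BOLD time series Y at one voxel written as the design matrix X multiplied by the parameter vector β, plus error ε]
$$ Y = X\beta + \varepsilon $$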
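A minimal sketch of solving Y = Xβ + ε for one voxel by ordinary least squares, i.e. solving the normal equations (XᵀX)β = XᵀY, in plain Python. Assumptions that are not from the slides: a made-up 16-scan block design whose regressor is a simple on/off boxcar (real fMRI models typically convolve this with a haemodynamic response function), a constant column for the baseline, made-up noise, and an X of full column rank. In practice a linear-algebra routine such as `numpy.linalg.lstsq` would do this job.

```python
def ols(X, y):
    """Ordinary least squares: solve (X^T X) beta = X^T y by Gauss-Jordan elimination.

    Assumes X (a list of rows) has full column rank."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    m = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X^T X | X^T y]
    for col in range(p):
        # Partial pivoting: move the row with the largest entry in this column up
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]
        for r in range(p):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][p] for i in range(p)]

# Hypothetical block design: 4 scans "off", 4 scans "on", repeated twice
boxcar = [0, 0, 0, 0, 1, 1, 1, 1] * 2
X = [[float(b), 1.0] for b in boxcar]  # columns: task regressor, constant
noise = [0.1, -0.2, 0.0, 0.1, -0.1, 0.2, 0.0, -0.1] * 2  # made-up error term
Y = [2.0 * b + 5.0 + e for b, e in zip(boxcar, noise)]
beta = ols(X, Y)
print(beta)  # estimates of [task effect, baseline]
```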
Multiple regression
Multiple regression is used to determine the effect of a
number of independent variables, x1, x2, x3 etc., on a
single dependent variable, y
The different x variables are combined in a linear way
and each has its own regression coefficient:
$$ y = b_0 + b_1x_1 + b_2x_2 + \dots + b_nx_n + \varepsilon $$
The b parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y.
i.e. the amount of variance in y that is accounted for by
each x variable after all the other x variables have been
accounted for
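The idea that each b reflects the contribution of its x after the other x variables have been accounted for can be checked numerically with the Frisch-Waugh-Lovell theorem, a standard result that the slides do not mention: with two predictors, b1 from the full model equals the slope of y on the part of x1 left over after regressing x1 on x2. The sketch below uses made-up data and hypothetical function names.

```python
from statistics import fmean

def slope(x, y):
    """Slope of the simple regression of y on x (intercept included): Sxy / Sxx."""
    mx, my = fmean(x), fmean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def residualise(x, z):
    """Residuals of x after regressing it on z, i.e. the part of x not explained by z."""
    b = slope(z, x)
    a = fmean(x) - b * fmean(z)
    return [xi - (b * zi + a) for xi, zi in zip(x, z)]

def two_predictor_ols(x1, x2, y):
    """b0, b1, b2 for y = b0 + b1*x1 + b2*x2 + e, via centred sums of squares and products."""
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((a - m2) ** 2 for a in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (b - my) for a, b in zip(x1, y))
    s2y = sum((a - m2) * (b - my) for a, b in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # non-zero unless x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]         # correlated with x1
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15]  # made-up error term
y = [1 + 2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]

b0, b1, b2 = two_predictor_ols(x1, x2, y)
b1_partial = slope(residualise(x1, x2), y)
print(b1, b1_partial)  # the two estimates of b1 agree
```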
SPM
Linear regression is a GLM that models the effect of one independent variable, x, on one dependent variable, y
Multiple regression models the effect of several independent variables, x1, x2 etc., on one dependent variable, y
Both are types of General Linear Model
This is what SPM does and will be explained soon

Summary
Correlation
Regression
Relevance to SPM
Thanks!
