
William Gosset (aka Student), 1876-1937
Worked in quality control at the Guinness brewery and could not publish under his own name.
The t-test

What can this test tell you?

If there is a statistically significant difference between two means, when:
- The sample size is less than 25.
- The data is normally distributed
t-test

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{(s_1^2/n_1) + (s_2^2/n_2)}} $$

$$ SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$

$\bar{x}_1$ = mean of first sample
$\bar{x}_2$ = mean of second sample
$s_1$ = standard deviation of first sample
$s_2$ = standard deviation of second sample
$n_1$ = number of measurements in first sample
$n_2$ = number of measurements in second sample
Worked example
Does the pH of soil affect seed
germination of a specific plant species?
Group 1: eight pots with soil at pH 5.5
Group 2: eight pots with soil at pH 7.0
50 seeds planted in each pot and the
number that germinated in each pot was
recorded.
What is the null hypothesis (H0)?

H0 = there is no statistically significant difference between the germination success of seeds in two soils of different pH

HA = there is a significant difference between the germination of seeds in two soils of different pH

If the value for t exceeds the critical value (P = 0.05), then you can reject the null hypothesis.
Construct the following table

Pot     Group 1 (pH 5.5)   (x - x̄)²   Group 2 (pH 7.0)   (x - x̄)²
1       38                 1.27        39                 20.25
2       41                 3.52        45                 2.25
3       43                 15.02       41                 6.25
4       39                 0.02        46                 6.25
5       37                 4.52        48                 20.25
6       38                 1.27        39                 20.25
7       41                 3.52        46                 6.25
8       36                 9.77        44                 0.25
Mean    39.125                         43.5
Sum                        38.88                          82.0
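Calculate standard deviation for both groups

Group 1:
$$ SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{38.88}{8 - 1}} = 2.36 $$

Group 2:
$$ SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{82.0}{8 - 1}} = 3.42 $$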
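The standard deviation step can be checked with a short script. This is a minimal sketch using the pot counts from the table; the function name `sample_sd` is illustrative and not part of the original slides.

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)  # the (x - mean)^2 column, summed
    return sqrt(sum_sq / (n - 1))

group1 = [38, 41, 43, 39, 37, 38, 41, 36]  # germinated seeds per pot, pH 5.5
group2 = [39, 45, 41, 46, 48, 39, 46, 44]  # germinated seeds per pot, pH 7.0

sd1 = sample_sd(group1)
sd2 = sample_sd(group2)
print(round(sd1, 2), round(sd2, 2))  # 2.36 3.42
```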
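Using your means and SDs, calculate value for t

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{(s_1^2/n_1) + (s_2^2/n_2)}} $$

$$ t = \frac{39.1 - 43.5}{\sqrt{(2.36^2/8) + (3.42^2/8)}} = \frac{-4.4}{\sqrt{0.696 + 1.462}} $$

t = 2.99 (ignoring the sign)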
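As a cross-check, the same formula can be evaluated directly from the raw pot counts. This is a sketch only; `t_statistic` is a hypothetical helper name. Working from unrounded means and standard deviations gives a value of about -2.98, slightly different from the hand calculation, which uses rounded intermediate values.

```python
from math import sqrt
from statistics import mean, stdev  # stdev is the sample SD (divides by n - 1)

def t_statistic(sample1, sample2):
    """t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2), as in the formula above."""
    m1, m2 = mean(sample1), mean(sample2)
    s1, s2 = stdev(sample1), stdev(sample2)
    n1, n2 = len(sample1), len(sample2)
    return (m1 - m2) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

group1 = [38, 41, 43, 39, 37, 38, 41, 36]  # pH 5.5
group2 = [39, 45, 41, 46, 48, 39, 46, 44]  # pH 7.0

t = t_statistic(group1, group2)
print(round(t, 2))  # -2.98
```

The sign only reflects which group is listed first; it is the size of t that is compared with the critical value.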
Compare our calculated value of t with the relevant critical value in the stats table of critical values

Our value of t = 2.99

Degrees of freedom = n1 + n2 - 2 = 14

D.F.   Critical value (P = 0.05)
14     2.15
15     2.13
16     2.12
17     2.11
18     2.10

Our value for t exceeds the critical value, so we can reject the null hypothesis.

We can conclude that there is a significant difference between the two means, so pH does affect the germination rate for this plant.
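The decision step can be sketched as a table lookup. The dictionary below copies the critical values shown above; the names `CRITICAL_T_P05` and `reject_null` are illustrative assumptions, not part of the slides.

```python
# Critical values of t at P = 0.05, copied from the table above (D.F. -> value)
CRITICAL_T_P05 = {14: 2.15, 15: 2.13, 16: 2.12, 17: 2.11, 18: 2.10}

def reject_null(t, n1, n2, table=CRITICAL_T_P05):
    """Return True if |t| exceeds the critical value for n1 + n2 - 2 degrees of freedom."""
    df = n1 + n2 - 2
    return abs(t) > table[df]

decision = reject_null(2.99, 8, 8)
print(decision)  # True
```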
Spearman's Rank Correlation Coefficient

What you need to know:

- What is a correlation?
- How do we know if the correlation between two variables is statistically significant?
- How do we calculate Spearman's Rank Correlation Coefficient?
Does being good at maths make you better at biology?

Student     Maths exam score   Biology exam score
Alex        57                 83
Bernard     45                 37
Charlotte   72                 41
Demi        78                 85
Eustace     53                 56
Ferdinand   63                 85
Gemma       86                 77
Hector      98                 87
Ivor        59                 70
Jasmine     71                 59

Is there a statistically significant correlation between these two sets of results?
Spearman's Rank Correlation Coefficient: rs

$$ r_s = 1 - \frac{6 \sum d_i^2}{n^3 - n} $$

Where:
n = the number of individuals in the sample
di = difference in the rank of the two measurements made on an individual

rs will be a number between -1 and +1
A negative value for rs implies a negative correlation
A positive value for rs implies a positive correlation
H0 = there is no statistically significant correlation between Measurement 1 and Measurement 2
H0 = there is no statistically significant correlation between Maths scores and Biology scores

HA = there is a statistically significant correlation between Measurement 1 and Measurement 2
HA = there is a statistically significant correlation between Maths scores and Biology scores
Step 1: Rank each set of data (lowest to highest)

Student     Maths exam score   Maths rank   Biology exam score   Biology rank
Alex        57                 3            83                   7
Bernard     45                 1            37                   1
Charlotte   72                 7            41                   2
Demi        78                 8            85                   8.5
Eustace     53                 2            56                   3
Ferdinand   63                 5            85                   8.5
Gemma       86                 9            77                   6
Hector      98                 10           87                   10
Ivor        59                 4            70                   5
Jasmine     71                 6            59                   4

Where two or more scores are tied, each is assigned an average rank (here the two scores of 85 share ranks 8 and 9, so each gets 8.5).
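Ranking with tied scores can be sketched as below. This is one simple way to do it, not necessarily how a statistics package implements it; `average_ranks` is a hypothetical helper name.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1) to highest; tied values share their average rank."""
    ordered = sorted(values)
    ranks = {}
    for v in set(values):
        first = ordered.index(v) + 1        # 1-based position of the first occurrence
        count = ordered.count(v)            # how many times the value is tied
        ranks[v] = first + (count - 1) / 2  # mean of positions first .. first + count - 1
    return [ranks[v] for v in values]

biology = [83, 37, 41, 85, 56, 85, 77, 87, 70, 59]
biology_ranks = average_ranks(biology)
print(biology_ranks)
```

The two scores of 85 occupy positions 8 and 9 in the sorted list, so both receive rank 8.5.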
Step 2: Work out the differences in ranks (maths - biology)
Step 3: Work out the square of the differences
Step 4: Work out the sum of the square of the differences

Student     Maths exam score   Maths rank   Biology exam score   Biology rank   di     di²
Alex        57                 3            83                   7              -4     16
Bernard     45                 1            37                   1              0      0
Charlotte   72                 7            41                   2              5      25
Demi        78                 8            85                   8.5            -0.5   0.25
Eustace     53                 2            56                   3              -1     1
Ferdinand   63                 5            85                   8.5            -3.5   12.25
Gemma       86                 9            77                   6              3      9
Hector      98                 10           87                   10             0      0
Ivor        59                 4            70                   5              -1     1
Jasmine     71                 6            59                   4              2      4
Sum of di²                                                                             68.5
Step 5: Work out the correlation coefficient, rs

n = 10, sum of di² = 68.5

$$ r_s = 1 - \frac{6 \sum d_i^2}{n^3 - n} = 1 - \frac{6(68.5)}{10^3 - 10} = 1 - \frac{6(68.5)}{1000 - 10} = 1 - \frac{411}{990} = 1 - 0.415 = 0.585 $$
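Steps 2 to 5 can be reproduced from the two rank columns. A minimal sketch, with `spearman_rs` as an illustrative function name:

```python
def spearman_rs(ranks_a, ranks_b):
    """rs = 1 - 6 * sum(d^2) / (n^3 - n), where d is the difference in rank per individual."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - (6 * sum_d2) / (n ** 3 - n)

maths_ranks = [3, 1, 7, 8, 2, 5, 9, 10, 4, 6]
biology_ranks = [7, 1, 2, 8.5, 3, 8.5, 6, 10, 5, 4]

rs = spearman_rs(maths_ranks, biology_ranks)
print(round(rs, 3))  # 0.585
```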
Step 6: Compare your calculated value of rs with the relevant critical value in your stats table

For n = 10 and p = 0.05, the critical value of rs is 0.648
Our value of rs is 0.585
Because this is below the critical value, we cannot reject H0
There is no statistically significant correlation between Maths scores and Biology scores
Do people's stress levels increase the closer they live to Mad Geoff's Chaotic Firework Factory?

[Figure: map showing Mad Geoff's Chaotic Firework Factory and Acacia Ave.]

Cortisol is a stress hormone
The more stressed an individual is, the higher their blood cortisol levels will be
Resident           Address           Blood cortisol level (g/ml)
Karl (Caretaker)   2 (The Factory)   13.4
Lillie             8                 22.6
Melanie            10                23.4
Nigel              12                18.6
Olga               12                17.4
Peter              14                16.8
Quentin            16                15.2
Rajesh             16                10.2
Susan              18                9.8
Toni               18                12.6
Uri                18                8.8
Vanessa            20                7.5

H0 = there is no statistically significant correlation between proximity to the fireworks factory and blood cortisol levels
Resident   Address   Address rank   Blood cortisol level (g/ml)   Cortisol rank   di     di²
Karl       2         1              13.4                          6               -5     25
Lillie     8         2              22.6                          11              -9     81
Melanie    10        3              23.4                          12              -9     81
Nigel      12        4.5            18.6                          10              -5.5   30.25
Olga       12        4.5            17.4                          9               -4.5   20.25
Peter      14        6              16.8                          8               -2     4
Quentin    16        7.5            15.2                          7               0.5    0.25
Rajesh     16        7.5            10.2                          4               3.5    12.25
Susan      18        10             9.8                           3               7      49
Toni       18        10             12.6                          5               5      25
Uri        18        10             8.8                           2               8      64
Vanessa    20        12             7.5                           1               11     121
Sum of di²                                                                               513
n = 12, sum of di² = 513

$$ r_s = 1 - \frac{6 \sum d_i^2}{n^3 - n} = 1 - \frac{6(513)}{12^3 - 12} = 1 - \frac{6(513)}{1728 - 12} = 1 - \frac{3078}{1716} = 1 - 1.794 = -0.794 $$
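The whole calculation, from raw addresses and cortisol readings to rs, can be sketched in one go. The helper names are illustrative, and the formula is the simplified one used on these slides; a statistics package may handle ties differently and give a slightly different value.

```python
def average_ranks(values):
    """Rank from lowest to highest; tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def spearman_rs(xs, ys):
    """Rank both variables, then apply rs = 1 - 6 * sum(d^2) / (n^3 - n)."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

address = [2, 8, 10, 12, 12, 14, 16, 16, 18, 18, 18, 20]
cortisol = [13.4, 22.6, 23.4, 18.6, 17.4, 16.8, 15.2, 10.2, 9.8, 12.6, 8.8, 7.5]

rs = spearman_rs(address, cortisol)
print(round(rs, 3))  # -0.794
```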
When comparing rs to the critical value, ignore the sign on rs

For n = 12 and p = 0.05, the critical value of rs is 0.587
Ignoring the sign, our value of rs is 0.794
Because this is above the critical value, we can reject H0
There is a statistically significant correlation between proximity to the fireworks factory and blood cortisol levels
rs is negative so there is a negative correlation: the further one lives from the factory, the lower one's blood cortisol levels
Chi-squared (χ²) test
This test compares measurements relating to the frequency of individuals in defined categories, e.g. the numbers of white and purple flowers in a population of pea plants.
Chi-squared is used to test if the observed frequency fits the frequency you expected or predicted.
How do we calculate the expected frequency?
You might expect the observed frequency of your data to match a specific ratio, e.g. a 3:1 ratio of phenotypes in a genetic cross.
Or you may predict a homogeneous distribution of individuals in an environment, e.g. numbers of daisies counted in quadrats on a field.
Note: In some cases you might expect the observed frequencies to match the expected, in others you might hope for a difference between them.
Example 1: GENETICS

Comparing the observed frequency of different types of maize grains with the expected ratio calculated using a Punnett square.

The photo shows four different phenotypes for maize grain, as follows:
Purple & Smooth (A), Purple & Shrunken (B), Yellow & Smooth (C) and Yellow & Shrunken (D)

The Punnett square below shows the expected ratio of phenotypes from a cross in which each parent produces four types of gamete.

Gametes   PS     Ps     pS     ps
PS        PPSS   PPSs   PpSS   PpSs
Ps        PPSs   PPss   PpSs   Ppss
pS        PpSS   PpSs   ppSS   ppSs
ps        PpSs   Ppss   ppSs   ppss

A:B:C:D = 9:3:3:1
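The 9:3:3:1 ratio can be confirmed by enumerating the 16 cells of the Punnett square. This sketch assumes, as the ratio itself implies, that P (purple) is dominant over p (yellow) and S (smooth) is dominant over s (shrunken); the function name `phenotype` is illustrative.

```python
from collections import Counter
from itertools import product

gametes = ["PS", "Ps", "pS", "ps"]

def phenotype(gamete1, gamete2):
    """Phenotype of the offspring, assuming P and S are the dominant alleles."""
    colour = "Purple" if "P" in (gamete1[0], gamete2[0]) else "Yellow"
    texture = "Smooth" if "S" in (gamete1[1], gamete2[1]) else "Shrunken"
    return f"{colour} & {texture}"

# One entry per cell of the 4 x 4 Punnett square
counts = Counter(phenotype(a, b) for a, b in product(gametes, repeat=2))
for name, count in counts.most_common():
    print(name, count)
```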
What is the null hypothesis (H0)?

H0 = there is no statistically significant difference between the observed frequency of maize grains and the expected frequency (the 9:3:3:1 ratio)

HA = there is a significant difference between the observed frequency of maize grains and the expected frequency

If the value for χ² exceeds the critical value (P = 0.05), then you can reject the null hypothesis.
Calculating χ²

$$ \chi^2 = \sum \frac{(O - E)^2}{E} $$

O = the observed results
E = the expected (or predicted) results
Phenotype   O     E (9:3:3:1)   O - E    (O - E)²   (O - E)²/E
A           271   243.56        27.44    752.82     3.09
B           73    81.19         -8.19    67.04      0.83
C           63    81.19         -18.19   330.79     4.07
D           26    27.06         -1.06    1.13       0.04
Total       433   433                               χ² = 8.03

Expected frequencies are left unrounded (e.g. 433 × 9/16 = 243.56). Rounding them to whole numbers would pull χ² down to about 7.81, almost exactly the critical value, so the rounding matters here.

Compare your calculated value of χ² with the critical value in your stats table

Our value of χ² = 8.03

Degrees of freedom = no. of categories - 1 = 3

D.F.   Critical value (P = 0.05)
1      3.84
2      5.99
3      7.82
4      9.49
5      11.07

Our value for χ² exceeds the critical value, so we can reject the null hypothesis.
There is a significant difference between our expected and observed ratios, i.e. they are a poor fit.
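Expected frequencies for a ratio, and the resulting χ², can be sketched as follows. The function names are illustrative. Computed from unrounded expected frequencies, the statistic comes out near 8.03.

```python
def expected_from_ratio(total, ratio):
    """Split a total count across categories in proportion to the given ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [271, 73, 63, 26]  # phenotypes A, B, C, D
expected = expected_from_ratio(sum(observed), [9, 3, 3, 1])
chi2 = chi_squared(observed, expected)

print([round(e, 2) for e in expected])
print(round(chi2, 2))
```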
Example 2: ECOLOGY
One section of a river was trawled and four
species of fish counted and frequencies
recorded.
The expected frequency assumes equal numbers of
each of the four fish species in the
sample.
What is the null hypothesis (H0)?

H0 = there is no statistically significant difference between the observed frequency of fish species and the expected frequency.

HA = there is a significant difference between the observed frequency of fish and the expected frequency

If the value for χ² exceeds the critical value (P = 0.05), then you can reject the null hypothesis.
Calculating χ²

$$ \chi^2 = \sum \frac{(O - E)^2}{E} $$

O = the observed results
E = the expected (or predicted) results
Species   O    E    O - E   (O - E)²   (O - E)²/E
Rudd      15   10   5       25         2.5
Roach     15   10   5       25         2.5
Dace      4    10   -6      36         3.6
Bream     6    10   -4      16         1.6
Total     40   40                      χ² = 10.2
Compare your calculated value of χ² with the critical value in your table of critical values.

Our value of χ² = 10.2

Degrees of freedom = no. of categories - 1 = 3

D.F.   Critical value (P = 0.05)
1      3.84
2      5.99
3      7.82
4      9.49
5      11.07

Our value for χ² exceeds the critical value, so we can reject the null hypothesis.
There is a significant difference between our expected and observed frequencies of fish species.
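The fish example can be worked end to end in a few lines: equal expected frequencies, the χ² sum, degrees of freedom and the table lookup. A minimal sketch; the names `chi_squared` and `CRITICAL_CHI2_P05` are illustrative, and the critical values are copied from the table above.

```python
# Critical values of chi-squared at P = 0.05, copied from the table above (D.F. -> value)
CRITICAL_CHI2_P05 = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07}

def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [15, 15, 4, 6]                              # Rudd, Roach, Dace, Bream
expected = [sum(observed) / len(observed)] * len(observed)  # equal numbers: 10 each
chi2 = chi_squared(observed, expected)
df = len(observed) - 1                                 # number of categories - 1
reject = chi2 > CRITICAL_CHI2_P05[df]

print(round(chi2, 1), df, reject)  # 10.2 3 True
```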
Example 3: CONTINGENCY TABLES
You can use contingency tables to calculate
expected frequencies when the relationship
between two quantities is being investigated.

In this example we will look at the incidence of colour blindness in both males and females.
What is the null hypothesis (H0)?

H0 = there is no statistically significant difference between the observed frequency of colour blindness in males and females.

HA = there is a significant difference between the observed frequency of colour blindness in males and females

If the value for χ² exceeds the critical value (P = 0.05), then you can reject the null hypothesis.
Observed frequencies   Males   Females
Colour blind           56      14
Not colour blind       754     536

Expected cell frequency = (row total × column total) / n

e.g. the expected frequency for colour blind males:

$$ \frac{(56 + 14) \times (56 + 754)}{1360} \approx 42 $$
Observed:          Males   Females
Colour blind       56      14
Not colour blind   754     536

Expected:          Males   Females
Colour blind       42      28
Not colour blind   768     522
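The expected table follows from the row and column totals of the observed table. A minimal sketch, with `expected_frequencies` as an illustrative function name:

```python
def expected_frequencies(table):
    """Expected cell frequency = row total * column total / n, for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: colour blind, not colour blind. Columns: males, females.
observed = [[56, 14], [754, 536]]
expected = expected_frequencies(observed)
rounded = [[round(e) for e in row] for row in expected]
print(rounded)  # [[42, 28], [768, 522]]
```

The unrounded values are about 41.69, 28.31, 768.31 and 521.69; the table above shows them rounded to whole numbers.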

(O - E)² / E       Males   Females
Colour blind       4.7     7.0
Not colour blind   0.255   0.375

$$ \chi^2 = \sum \frac{(O - E)^2}{E} = 4.7 + 7.0 + 0.255 + 0.375 = 12.33 $$
Compare your calculated value of χ² with the critical value in your table of critical values

Our value of χ² = 12.33

Degrees of freedom = (2 rows - 1) × (2 cols - 1) = 1

D.F.   Critical value (P = 0.05)
1      3.84
2      5.99
3      7.82
4      9.49
5      11.07

Our value for χ² exceeds the critical value, so we can reject the null hypothesis.
There is a significant difference between our expected and observed frequencies.
The fraction of males with colour blindness is greater than that in females. The difference is unlikely to be due to chance alone (P < 0.05).
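The contingency-table calculation can be sketched as a single function. The name `chi_squared_contingency` is illustrative. Because this version keeps the expected frequencies unrounded, it gives about 12.80 rather than 12.33; both values are well above the critical value of 3.84, so the conclusion is the same.

```python
def chi_squared_contingency(table):
    """Return (chi-squared, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # row total * column total / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)           # (rows - 1) * (cols - 1)
    return chi2, df

# Rows: colour blind, not colour blind. Columns: males, females.
chi2, df = chi_squared_contingency([[56, 14], [754, 536]])
print(round(chi2, 2), df)  # 12.8 1
```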
