
Hypothesis Testing

Deng Dan
The Department of Medical Statistics

Definition of Hypothesis Testing

Hypothesis tests are procedures for making rational decisions about the reality of effects.

The decisions are based on probability theory.

An introductory example

A research office wants to check whether beginning 5th-graders in its district score at the national norm level (5.0) in spelling, a subset of the Language Arts standardized test.

A random sample of n = 50 cases is drawn from the population of all beginning 5th-grade students in the Mesa School District.

In the sample of 50 students, the mean is 4.72 and the standard deviation is 1.15.

The question

Information from the sample: n = 50, mean = 4.72, SD = 1.15.

And we know the mean of the population is 5.0.

So there is a difference between the sample statistic and the population parameter.

Cont

The difference between the sample mean of 4.72 and the population mean of 5.0 is 5.0 - 4.72 = 0.28, but we know that if we were to take another sample the difference would not be exactly the same.

It might be greater, or it might be smaller.

What kind of population difference is consistent with this observed value of 4.72?

The source of the difference

Sampling error
The sample was drawn at random from the known population. The sampling distribution is known, so the probability of getting an error of this size can be estimated.

True difference
The difference really exists because the sample was drawn from a different population, whose parameter is not the known value. That is, the two populations differ.

The way we may think

Naturally, we first assume that the sample was drawn from the known population, whose parameter is 5.0.

Because a sample drawn from a population represents that population, we can use the information in the sample to describe the characteristics of the population.

And more

We also reason that this particular sample, which differs by 0.28 from the population mean, was obtained in just one sampling. Under our assumption, an error of this size should therefore be easy to produce.

Thus getting such an error should not be a rare event. The probability of obtaining this sample should not be small.

Cont

If we compute the probability of obtaining the sample, we can make a rational decision based on probability theory.

If the probability is not small, we believe the sampling error is not rare. Getting this sample from this population is reasonable.

Otherwise, if the probability is very small, we believe such a sampling error is rare, and we have to doubt whether the sample was really drawn from this population.

How to get the probability?

The sample statistic (mean 4.72) differs from the parameter (mean 5.0).

According to sampling distribution theory, the sampling error of the mean follows a normal distribution.

So we can compute the probability of drawing such a sample from the population by calculating the area under the curve.

Cont

In this way we can obtain the probability of the sample from sampling distribution theory.

When the standard error of the mean is known, or the sample size is large enough:

$u = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}}$

When the standard error of the mean is estimated:

$t = \frac{\bar{x} - \mu_0}{S_{\bar{x}}}$

Cont

From the u-transformation formula we can get the u value:

$u = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = (4.72 - 5.0)/(1.15/7.07) = -1.72$

With this u value we can check the u table to find the probability (p value).

The two-tailed p value is about 0.085.
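The calculation for the spelling example can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `u_test` is made up for illustration, and the sample SD is used in place of the population SD on the assumption that n = 50 is large enough.

```python
import math

def u_test(sample_mean, mu0, sd, n):
    """Return (u, two-tailed p) for a one-sample u-test (hypothetical helper)."""
    se = sd / math.sqrt(n)                 # standard error of the mean
    u = (sample_mean - mu0) / se           # u-transformation
    # Area under the standard normal curve beyond -|u| and +|u|.
    p = math.erfc(abs(u) / math.sqrt(2))
    return u, p

# Values from the spelling example: n = 50, mean = 4.72, SD = 1.15, norm = 5.0
u, p = u_test(sample_mean=4.72, mu0=5.0, sd=1.15, n=50)
print(f"u = {u:.2f}, two-tailed p = {p:.3f}")
```

`math.erfc` gives the two-tailed normal area directly, so no u table is needed.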

Cont

To decide whether the sample is easy to draw, we first prescribe a special p value, which is commonly 0.05.

If the probability of the event is less than this special value, we conclude that a sampling error this extreme would practically not occur in a single sampling.

So we compare the two p values.

This is an application of the rule of the small probability event.

Review

The small probability value
Habitually, we define $P(A) \le 0.05$ as a small probability value.

The small probability event
If random event A occurs fewer than 5 times in 100 repeated observations, we usually call A a small probability event.

The rule of the small probability event
If random event A is a small probability event, we consider that A will practically not occur in a single random observation.

Steps of hypothesis testing

1. Set up the testing hypotheses and the level of significance
2. Select a suitable method and compute the relevant statistic
3. Compute the p value
4. Make a reasonable decision, which consists of a statistical conclusion and a professional conclusion

The first step of hypothesis testing

Specify the null hypothesis (H0) and the alternative hypothesis (H1).

Choose the significance level. Typically the 0.05 or the 0.01 level is used. It is the small probability value. The probability value computed in step 3 is compared with the significance level chosen in this step.

Choose a one-tailed or a two-tailed test, depending on your professional knowledge.

The second step of hypothesis testing

We calculate a statistic analogous to the parameter specified by the null hypothesis.

The calculations are made assuming that the null hypothesis is true.

Usually the method is named after the statistic it uses.

The third step of hypothesis testing

We calculate the probability value, which is often called the p value.

It is the probability of obtaining a statistic as different or more different from the parameter specified in the null hypothesis as the statistic computed from the data.

The calculations are made assuming that the null hypothesis is true.

If the probability is less than or equal to the significance level, then the null hypothesis is rejected. When H0 is rejected, the outcome is said to be statistically significant.

If the probability is greater than the significance level, then the null hypothesis is not rejected. When H0 is not rejected, the outcome is said to be not statistically significant.

If the outcome is statistically significant, then the null hypothesis is rejected in favor of the alternative hypothesis.

The final step is to describe the result and the statistical conclusion in an understandable way.

Please note: because the inference is based on probability, two kinds of error can occur during the hypothesis testing process.

Definition: The approach is to set up an assumption that there is no contradiction between the believed result and the sample result, and that the difference can therefore be ascribed solely to chance.

It is the null hypothesis that is actually tested, not the research hypothesis.

Suppose the research concerns whether one method of presenting pictorial stimuli leads to better recognition than another. The null hypothesis would most likely be that there is no difference between the methods:
H0: $\mu_1 - \mu_2 = 0$ or $\mu_1 = \mu_2$

Definition: If the null hypothesis is rejected, that is taken as evidence in favor of the research hypothesis, which is called the alternative hypothesis.

In usual practice we do not say that the research hypothesis has been proved, only that it has been supported.

Suppose the research concerns whether one method of presenting pictorial stimuli leads to better recognition than another. The alternative hypothesis would be:
H1: $\mu_1 \neq \mu_2$

Hypothesis testing usually counts the cases in which a statistic would differ as much or more from the parameter in either direction.

A probability computed considering differences in both directions is called a two-tailed probability. A probability computed considering differences in only one direction is called a one-tailed probability.

Probability values for one-tailed tests are one half the value for two-tailed tests, as long as the effect is in the specified direction.
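The relation between one-tailed and two-tailed probabilities can be checked numerically. The sketch below is illustrative only: the function name `tail_probabilities` and the value u = -1.5 are arbitrary choices, and it assumes the statistic follows a standard normal distribution.

```python
from statistics import NormalDist

def tail_probabilities(u):
    """Return (two_tailed, lower_tailed, upper_tailed) p values for a u statistic."""
    std_normal = NormalDist()        # mean 0, SD 1
    lower = std_normal.cdf(u)        # P(U <= u): H1 states "smaller than"
    upper = 1.0 - lower              # P(U >= u): H1 states "larger than"
    two = 2.0 * min(lower, upper)    # differences in both directions are counted
    return two, lower, upper

two, lower, upper = tail_probabilities(-1.5)   # arbitrary illustrative value
print(f"two-tailed = {two:.4f}")
print(f"one-tailed, effect in the specified direction = {lower:.4f}")
print(f"one-tailed, effect in the opposite direction  = {upper:.4f}")
```

When the observed effect lies in the direction named by H1, the one-tailed value is half the two-tailed value. When it lies in the opposite direction, the one-tailed value is larger than 0.5.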

Type I error and type II error

Type I error
A true null hypothesis can be incorrectly rejected. The probability of a type I error is $\alpha$, the significance level. It is directly controlled by the researchers.

Type II error
A false null hypothesis can fail to be rejected. The probability of a type II error is $\beta$, but this value is not directly set by the researchers.

Cont

Statistical decision    True state of null hypothesis
                        H0 true          H0 false
Reject H0               Type I error     Correct
Do not reject H0        Correct          Type II error

Cont

The value of $\alpha$ is inversely related to the value of $\beta$. For example, for a fixed sample size, the smaller the value of $\alpha$, the larger the value of $\beta$.

To reduce both types of error at the same time, we have to enlarge the sample size n. It is the best way.
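The trade-off can be made concrete with a small calculation of $\beta$ for a two-tailed u-test. This is a sketch under stated assumptions: the function `type_ii_error` and all numbers passed to it (hypothesized mean 7.0, true mean 8.0, SD 2.0, sample sizes 16 and 64) are illustrative values chosen here, not results from the slides.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu_true, sigma, n, alpha):
    """Beta of a two-tailed u-test of H0: mu = mu0 when the true mean is mu_true."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)               # e.g. about 1.96 for alpha = 0.05
    shift = (mu_true - mu0) / (sigma / sqrt(n))   # true difference in SE units
    # H0 is wrongly kept when the statistic falls between -crit and +crit.
    return z.cdf(crit - shift) - z.cdf(-crit - shift)

for alpha in (0.10, 0.05, 0.01):
    for n in (16, 64):
        beta = type_ii_error(mu0=7.0, mu_true=8.0, sigma=2.0, n=n, alpha=alpha)
        print(f"alpha = {alpha:.2f}  n = {n:2d}  beta = {beta:.3f}")
```

For a fixed n, lowering alpha raises beta. Raising n lowers beta at every alpha, which is why a larger sample is the way to reduce both errors together.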

t-test

The purpose

To test whether the population the sample comes from is the same as a population whose parameter is known.

For two independent samples, the hypothesis tested is whether the parameters of the two populations from which the samples were drawn are equal.

For paired-design data, it tests whether the parameters of the two populations from which the samples were drawn are equal.

General formula

The formula shown below is used for testing hypotheses about a parameter:

$u = \frac{\text{statistic} - \text{hypothesized value}}{\text{standard error of the statistic}}$

The statistic is an estimate of the parameter in question.
The hypothesized value is the value of the parameter specified in the null hypothesis.
The standard error of the statistic is assumed to be known, and the sampling distribution of the statistic is assumed to be normal, or n is large enough.

Cont

When $\sigma_{\bar{x}}$ is estimated by $s_{\bar{x}}$, the significance test uses the t distribution instead of the normal distribution.

That is, the standard error of the statistic is also estimated:

$t = \frac{\text{statistic} - \text{hypothesized value}}{\text{standard error of the statistic}}$

Common types of t-test

u-tests of $\mu$, $\sigma$ known

t-tests of $\mu$, standard deviation estimated

t-tests of differences between means, dependent means from a related-paired design

t-tests of differences between means, independent means from a randomized design

Suppose the mean score of all 10-year-old children on an anxiety scale is 7.0 and the SD is 2.0 ($\mu = 7.0$, $\sigma = 2.0$).

A researcher was interested in whether 10-year-old children with alcoholic parents had a different mean score on the anxiety scale.

The researcher drew a sample of 16 children, and its mean was 8.1 ($n = 16$, $\bar{x} = 8.1$).

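Steps

Set up the hypotheses and the significance level
H0: $\mu = 7.0$    H1: $\mu \neq 7.0$
$\alpha = 0.05$ (two-tailed)

Compute the suitable statistic
$u = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = (8.1 - 7.0)/(2/4) = 2.2$

Compute the p-value (probability)
Because $u = 2.2 > u_{0.05/2} = 1.96$, the p-value is less than $\alpha$. We can also look up the p-value in the u table: p = 0.028 (0.014 in each tail).

Make a reasonable decision

[Figure: standard normal curve with an area of 0.014 in each tail beyond u = -2.2 and u = 2.2, an area of 0.972 between them, and the critical values -1.96 and 1.96 marked.]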

Since the p-value is less than the significance level, the effect is statistically significant.
Since the effect is significant, the null hypothesis should be rejected.
It is concluded that the mean anxiety score of 10-year-old children with alcoholic parents is higher than the population mean.

How to express

The results might be described in a report as follows:

The mean score of children of alcoholic parents ($\bar{x}_{\text{alcoholic}} = 8.1$) was significantly higher than the population mean ($\mu = 7.0$), u = 2.2 and p = 0.028.

Summary of u-test

Specify the null hypothesis and an alternative hypothesis
Set the significance level (usually 0.05) and specify a two-sided or one-sided test
Calculate the statistic u
Use a u table or a special u value (usually 1.96 two-sided or 1.645 one-sided) to determine the p-value
Make a rational conclusion
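The special u values 1.96 and 1.645 are the points of the standard normal distribution that leave 0.05 in the tails. A minimal sketch of how they can be obtained without a table follows; the helper name `critical_u` is hypothetical.

```python
from statistics import NormalDist

def critical_u(alpha, two_sided=True):
    """Critical u value of the standard normal distribution for a given alpha."""
    tail = alpha / 2 if two_sided else alpha   # area left in one tail
    return NormalDist().inv_cdf(1 - tail)

print(round(critical_u(0.05, two_sided=True), 3))    # 1.96
print(round(critical_u(0.05, two_sided=False), 3))   # 1.645
```

Comparing the absolute value of u with the two-sided critical value gives the same decision as comparing the two-tailed p-value with the significance level.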

Assumptions of u-test

Normal distribution
Scores are independent
$\sigma$ is known
The sample size is large enough ($n \ge 30$)

Since the statistic is u, we call it the u-test.

Suppose a researcher wishes to test whether the mean score of fifth graders on a test of reading achievement in his or her city differs from the national mean of 76 ($\mu = 76$, but $\sigma$ is unknown).

The researcher randomly sampled the scores of 20 students.

The scores are: 72 69 98 87 78 76 78 66 85 97 84 86 88 76 79 82 82 91 69 74 (so $n = 20$, $\bar{x} = 80.85$, $s = 8.87$).

Steps

Set up the hypotheses and the significance level
H0: $\mu = 76$    H1: $\mu \neq 76$
$\alpha = 0.05$ (two-tailed)

Compute the suitable statistic
$t = \frac{\bar{x} - \mu_0}{S_{\bar{x}}} = (80.85 - 76)/(8.87/4.47) = 2.44$

Compute the p-value (probability)
The probability value for t can be determined using the t table. The degrees of freedom for t are n - 1 = 19. From the t table, the two-tailed probability value of a t of 2.44 with 19 df is 0.025.

Make a reasonable decision

Cont

Since the probability value is less than the significance level, the effect is statistically significant.

Since the effect is significant, the null hypothesis should be rejected.

It is concluded that the mean reading achievement score of children in the city in question is higher than the population mean.
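The one-sample t-test on the 20 reading scores can be recomputed from the raw data. The following is a sketch using only the Python standard library. Since the standard library has no t distribution, the p-value is approximated by integrating the t density numerically with Simpson's rule; the helper names `t_pdf` and `two_tailed_p` are made up for this illustration.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p: 1 minus the area between -|t| and +|t| (Simpson's rule)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps                          # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_b = total * h / 3
    return 1 - 2 * area_0_to_b             # the density is symmetric about 0

scores = [72, 69, 98, 87, 78, 76, 78, 66, 85, 97,
          84, 86, 88, 76, 79, 82, 82, 91, 69, 74]
mu0 = 76                                   # national mean under H0
n = len(scores)
se = stdev(scores) / math.sqrt(n)          # estimated standard error of the mean
t = (mean(scores) - mu0) / se
p = two_tailed_p(t, df=n - 1)
print(f"mean = {mean(scores):.2f}, t = {t:.2f}, df = {n - 1}, p = {p:.3f}")
```

`statistics.stdev` divides by n - 1, which matches the sample standard deviation s used in the t formula.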

How to express

The results of this experiment might be described as follows:

The mean reading achievement score of fifth grade children in the sample ($\bar{x} = 80.85$) was significantly higher than the mean reading achievement score nationally ($\mu = 76$), t = 2.44 with 19 df and p = 0.025.

Summary of t-test

Specify the null hypothesis and an alternative hypothesis
Set the significance level (usually 0.05) and specify a two-sided or one-sided test
Calculate the statistic t
Use a t table to determine the p-value from t and df
Make a rational conclusion

Assumptions of t-test

Normal distribution
Scores are independent

Since the statistic is t, we call it the t-test.

The t test used when the scores are not independent is sometimes called a correlated t test and sometimes called a related-paired t test.

When the same subjects are tested in two experimental conditions, scores in the two conditions are not independent, because subjects who score well in one condition tend to score well in the other condition.

What is a related-paired design?

The same subjects are tested in two experimental conditions.

The same subject is tested before being treated with some therapy and tested again after the treatment.

Two very similar subjects (for example twins) are matched as a pair and tested in the two experimental conditions.

An experimenter is interested in whether the time it takes to respond to a visual signal is different from the time it takes to respond to an auditory signal.

Ten subjects are tested with both the visual signal and the auditory signal.

To avoid confounding with practice effects, half are in the auditory condition first and the other half are in the visual task first.

The reaction times (in milliseconds) are shown:

Subject   Visual   Auditory
1         420      380
2         235      230
3         280      300
4         360      260
5         305      295
6         215      190
7         200      200
8         460      410
9         345      330
10        375      380

Compute the difference between the two conditions:

Subject   Visual   Auditory   Vis-Aud
1         420      380        40
2         235      230        5
3         280      300        -20
4         360      260        100
5         305      295        10
6         215      190        25
7         200      200        0
8         460      410        50
9         345      330        15
10        375      380        -5
Mean      319.5    297.5      22

Steps

Set up the hypotheses and the significance level
H0: $\mu_d = 0$    H1: $\mu_d \neq 0$
$\alpha = 0.05$ (two-tailed)

Compute the suitable statistic
$t = \frac{\bar{d} - 0}{S_{\bar{d}}} = \frac{\bar{d}}{S_d/\sqrt{n}} = (22 - 0)/(34.41/3.16) = 2.02$

Compute the p-value (probability)
The probability value for t can be determined using the t table. The degrees of freedom for t are n - 1 = 9. From the t table, the two-tailed probability value of a t of 2.02 with 9 df is greater than 0.05 but less than 0.1 (it is about 0.074).

Make a reasonable decision

Cont

Since the probability value is greater than the significance level, the effect is not statistically significant.

Since the effect is not significant, the null hypothesis is not rejected.

It is concluded that there is not enough evidence that the reaction times in the two conditions differ.
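The paired t statistic can be recomputed directly from the ten pairs of reaction times. This is a minimal sketch: the variable names are chosen here, and the constant 2.262 is the two-tailed 0.05 critical value for 9 df as found in a standard t table.

```python
import math
from statistics import mean, stdev

visual   = [420, 235, 280, 360, 305, 215, 200, 460, 345, 375]
auditory = [380, 230, 300, 260, 295, 190, 200, 410, 330, 380]

# One difference score per subject (visual minus auditory).
diffs = [v - a for v, a in zip(visual, auditory)]
n = len(diffs)
d_bar = mean(diffs)                      # mean difference
s_d = stdev(diffs)                       # SD of the differences (n - 1 divisor)
t = (d_bar - 0) / (s_d / math.sqrt(n))   # H0: mean difference is 0

T_CRIT_9DF = 2.262   # two-tailed 0.05 critical value, 9 df, from a t table
print(f"mean difference = {d_bar}, SD = {s_d:.2f}, t = {t:.2f}, df = {n - 1}")
print("reject H0" if abs(t) > T_CRIT_9DF else "do not reject H0")
```

Working on the difference scores turns the paired problem into a one-sample t-test against zero, which is why the dependence between the two raw scores of a subject does no harm.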

How to express

The results of this experiment might be described as follows:

The mean time to respond to a visual stimulus (mean = 319.5) was longer than the mean time to respond to an auditory stimulus (mean = 297.5). However, this difference was not statistically significant, t = 2.02 with 9 df and p = 0.074.

Summary of paired t-test

Compute a difference score for each subject

Test whether the mean difference score is significantly different from zero

The other steps of the paired t-test are similar to those of the two tests above.

Assumptions of paired t-test

Each subject is sampled independently from each other subject.

The difference scores are normally distributed. If both raw scores are normally distributed, then the difference score will be normally distributed too.

The two raw scores from each subject do not have to be independent of each other.

This section explains how to test the difference between group means for significance.

An experiment was conducted comparing the memory of expert and novice chess players.

The mean number of pieces correctly placed across several chess positions was computed for each subject.

The average number of pieces recalled by each subject in the two groups is listed in the table.

The question is whether the difference between the means of these two groups of subjects is statistically significant.

Novices   Tournament
37.1      45.6
39.1      51.2
40.5      56.4
45.5      58.1
51.3      71.1
52.6      74.9
55.7      75.9
55.9      80.3
57.7      85.3

Steps

Set up the hypotheses
H0: $\mu_1 = \mu_2$    H1: $\mu_1 \neq \mu_2$

Set up the significance level
$\alpha = 0.05$ (two-tailed)

Cont

Compute the pooled standard error (the estimated standard error of the difference between means)

$SS_1 = \sum (X_1 - \bar{X}_1)^2 = \sum X_1^2 - (\sum X_1)^2/n_1 = 507.08$
$SS_2 = \sum (X_2 - \bar{X}_2)^2 = \sum X_2^2 - (\sum X_2)^2/n_2 = 1567.42$
$df_{total} = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2 = 9 + 9 - 2 = 16$
$SS_{total} = SS_1 + SS_2 = 507.08 + 1567.42 = 2074.50$
$S^2_{total} = SS_{total}/df_{total} = 2074.50/16 = 129.66$ (the pooled variance)
$n = 2/(1/n_1 + 1/n_2) = 9$
$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{2 S^2_{total}/n} = \sqrt{2 \times 129.66/9} = 5.368$

Cont

Compute the statistic t:

$t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}} = (48.38 - 66.53)/5.368 = -3.38$, with $df = (n_1 - 1) + (n_2 - 1) = 16$

Cont

The probability value for t can be determined using the t table.

The degrees of freedom for t are $(n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2 = 16$.

From the t table, the two-tailed probability value of a t of 3.38 in absolute value with 16 df is less than 0.05 (it is about 0.004).
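The pooled standard error and the t statistic can be recomputed from the two columns of scores listed above. The sketch below assumes equal population variances, as the pooled test requires. The helper `sum_of_squares` is a name chosen for illustration, and 2.120 is the two-tailed 0.05 critical value for 16 df from a standard t table.

```python
import math
from statistics import mean

novices    = [37.1, 39.1, 40.5, 45.5, 51.3, 52.6, 55.7, 55.9, 57.7]
tournament = [45.6, 51.2, 56.4, 58.1, 71.1, 74.9, 75.9, 80.3, 85.3]

def sum_of_squares(xs):
    """Sum of squared deviations from the group mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

n1, n2 = len(novices), len(tournament)
df = (n1 - 1) + (n2 - 1)
# Pooled variance: both sums of squares divided by the total degrees of freedom.
pooled_var = (sum_of_squares(novices) + sum_of_squares(tournament)) / df
# Standard error of the difference between the two means.
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t = (mean(novices) - mean(tournament)) / se_diff

T_CRIT_16DF = 2.120   # two-tailed 0.05 critical value, 16 df, from a t table
print(f"means = {mean(novices):.2f}, {mean(tournament):.2f}")
print(f"pooled variance = {pooled_var:.2f}, SE = {se_diff:.3f}, t = {t:.2f}, df = {df}")
print("reject H0" if abs(t) > T_CRIT_16DF else "do not reject H0")
```

With equal group sizes, `pooled_var * (1/n1 + 1/n2)` is the same quantity as twice the pooled variance divided by n, so the code follows the slide's formula.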

Cont

Since the probability value is less than the significance level, the effect is statistically significant.

Since the effect is significant, the null hypothesis should be rejected.

It is concluded that the mean memory score for experts is higher than the mean memory score for novices.

How to express

The results of this experiment might be described as follows:

The mean number of pieces recalled by tournament players (mean = 66.53) was significantly higher than the mean number of pieces recalled by novices (mean = 48.38), t = 3.38 with 16 df and p = 0.004.

Summary of two independent data t-test

Specify the null hypothesis and an alternative hypothesis
Set the significance level (usually 0.05) and specify a two-sided or one-sided test
Compute the pooled standard error
Calculate the statistic t
Use a t table to determine the p-value from t and df
Make a rational conclusion

Assumptions of two independent data t-test

The populations are normally distributed
Variances in the two populations should be equal
Scores are independent: each subject provides only one score

Exercises

The scores of a random sample of 8 students on a physical test are given. Test whether the sample mean is significantly different from 65 at the 0.05 level.
60 62 67 69 70 72 75 80

$t = \frac{\bar{x} - \mu_0}{S_{\bar{x}}} = (69.375 - 65)/(6.55/2.83) = 1.89$, df = n - 1 = 7

Cont

An experiment is conducted on the effect of alcohol on perceptual motor ability. Ten subjects are each tested twice, once after having two drinks and once after having two glasses of water. The two tests were on two different days to give the alcohol a chance to wear off.

Half of the subjects were given alcohol first and half were given water first. The scores of the 10 subjects are shown below. The first number for each subject is their performance in the water condition. Higher scores reflect better performance. Test whether alcohol had a significant effect at the 0.05 level.

first time: 16 15 11 20 19 14 13 15 14 16
second time: 13 13 12 16 16 11 10 15 9 16

The paired t-test should be applied:
$t = \frac{\bar{d} - 0}{S_{\bar{d}}} = \frac{\bar{d}}{S_d/\sqrt{n}}$, df = n - 1

Cont

The scores on a vocabulary test of a group of 20-year-old men and a group of 60-year-old men are shown below. Test the difference for significance at the 0.05 level.

20 years old: 27 26 21 24 15 18 17 12 13
60 years old: 26 29 29 29 27 15 20 27

$t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}}$, $df = (n_1 - 1) + (n_2 - 1)$
