
CAS MA 116 Statistics II Summer 2, 2012 Lecture 1.

Lecture 1.3
Men willingly believe what they wish. --Julius Caesar (100-44 BC)

1. Chi-Square Test of Goodness-of-Fit
2. Conditional Probability. Independence. Chi-Square Test: Test of Independence

Five Basic Steps in a Hypothesis Test

Step 1: Determine the null (H0) and alternative (Ha) hypotheses. Note: hypotheses are statements ABOUT population parameters, NOT ABOUT sample statistics.
Step 2: Verify the necessary data conditions (assumptions) and, if they are met, summarize the data into an appropriate test statistic (using an appropriate data summary or sample statistic).
Step 3: Assuming the null hypothesis (H0) is true, find either the rejection region or the p-value.
Step 4: Decide whether or not the result is statistically significant, based either on the rejection region or on the p-value: the probability of getting a test statistic as extreme as or more extreme than (in the direction of Ha) the observed value of the test statistic, assuming H0 is true.
Step 5: Report the conclusion in the context of the problem (question of interest).

One-Sample Hypothesis Test for Population Proportion (Large n)


Test Scenario: Population Proportion
Data: 1 sample
Response: categorical (YES/NO, Success/Failure, True/False)
Explanatory Variable: ________
Population Parameter: $p$
Sample Statistic: $\hat{p} = X/n$, where $X$ is the number of successes in the sample
Example: Is the true proportion of adults who believe in life after death more than 70%?

Large Sample Hypothesis Test for Population Proportion
H0: $p = p_0$
Decision Rule:
Ha: $p > p_0$: reject H0 if $Z \ge Z_{1-\alpha}$
Ha: $p \ne p_0$: reject H0 if $|Z| \ge Z_{1-\alpha/2}$
Ha: $p < p_0$: reject H0 if $Z \le -Z_{1-\alpha}$

Assumptions (large n):
for CI check: $\min[n\hat{p},\ n(1-\hat{p})] \ge 5$
for HT check: $\min[np_0,\ n(1-p_0)] \ge 5$

Test Statistic: $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$, where $\hat{p} = X/n$

Confidence Interval: $\hat{p} \pm Z_{1-\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

Sample size for a margin of error $E$: $n_i = p(1-p)\left(\dfrac{Z_{1-\alpha/2}}{E}\right)^2$
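The large-sample formulas above can be sketched in Python as follows. This is a minimal illustration using only the standard library (`statistics.NormalDist`, Python 3.8+); the helper names and the sample of X = 290 "yes" answers out of n = 400 adults are hypothetical, chosen only to exercise the formulas for the life-after-death question (H0: p = 0.70 against Ha: p > 0.70).

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def one_prop_z_test(x, n, p0, alternative="greater"):
    """Large-sample z test of H0: p = p0 (hypothetical helper)."""
    # HT condition uses p0, not p-hat: min[n*p0, n*(1 - p0)] >= 5
    if min(n * p0, n * (1 - p0)) < 5:
        raise ValueError("large-sample condition fails; use the exact binomial test")
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    if alternative == "greater":      # Ha: p > p0
        p_value = 1 - STD_NORMAL.cdf(z)
    elif alternative == "less":       # Ha: p < p0
        p_value = STD_NORMAL.cdf(z)
    else:                             # Ha: p != p0
        p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value


def one_prop_ci(x, n, alpha=0.05):
    """CI: p-hat +/- Z_{1-alpha/2} * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = x / n
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    half_width = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def sample_size(p, margin, alpha=0.05):
    """n = p(1 - p) * (Z_{1-alpha/2} / E)^2, rounded up to a whole subject."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return ceil(p * (1 - p) * (z_crit / margin) ** 2)


# Hypothetical data: 290 of 400 sampled adults answer YES
z, p_value = one_prop_z_test(290, 400, 0.70, alternative="greater")
print(f"Z = {z:.3f}, p-value = {p_value:.4f}")
print("95% CI:", one_prop_ci(290, 400))
print("n for E = 0.03 with p = 0.5:", sample_size(0.5, 0.03))
```

With these made-up numbers Z is about 1.09, below $Z_{0.95} \approx 1.645$, so H0 would not be rejected at α = 0.05. Using p = 0.5 in the sample-size formula is the conservative choice, since p(1 − p) is largest there.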


One-Sample Hypothesis Test for Population Proportion (small n) (Binomial Test)


Small Sample Hypothesis Test for Population Proportion
H0: $p = p_0$
Required assumption: the data are a random sample from a Binomial population.
Test Statistic: $X$ (observed number of successes), where $X \sim \text{Binomial}(n, p_0)$ under H0.
Decision rule is based on the p-value ($x$ = observed value of $X$):
Ha: $p > p_0$: p-value $= P(X \ge x) = P(X=x) + P(X=x+1) + \dots + P(X=n)$
Ha: $p < p_0$: p-value $= P(X \le x) = P(X=0) + P(X=1) + \dots + P(X=x)$
Ha: $p \ne p_0$: p-value $= \min\{1,\ 2\min[P(X \le x),\ P(X \ge x)]\}$

If the p-value $\le \alpha$, reject H0. If the p-value $> \alpha$, fail to reject H0.
Recall: the probability of exactly $k$ successes in $n$ trials is $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, where $\binom{n}{k} = \dfrac{n!}{k!\,(n-k)!}$.
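The exact test only needs binomial probabilities, so it can be sketched with `math.comb` (Python 3.8+). The helper names and the data (9 successes in n = 10 trials, H0: p = 0.5) are hypothetical. The two-sided rule here doubles the smaller tail, which is one common convention; some software instead sums the probabilities of all outcomes no more likely than the observed one, so two-sided p-values can differ slightly between tools.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test(x, n, p0, alternative="greater"):
    """Exact small-sample test of H0: p = p0, with X ~ Binomial(n, p0)."""
    upper = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))  # P(X >= x)
    lower = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))  # P(X <= x)
    if alternative == "greater":   # Ha: p > p0
        return upper
    if alternative == "less":      # Ha: p < p0
        return lower
    # Ha: p != p0 (double the smaller tail, capped at 1)
    return min(1.0, 2 * min(lower, upper))


# Hypothetical data: 9 successes in n = 10 trials, H0: p = 0.5, Ha: p > 0.5
alpha = 0.05
p_value = binomial_test(9, 10, 0.5, alternative="greater")
print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

Here p-value = P(X ≥ 9) = (10 + 1)/1024 = 11/1024 ≈ 0.0107 ≤ 0.05, so H0 would be rejected.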

Two-Sample Hypothesis Test for Difference in Population Proportions (Large n1, n2)
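Test Scenario: Difference in Population Proportions
Data: 2 independent samples
Response: categorical (YES/NO, Success/Failure, True/False)
Explanatory Variable: group (1, 2), e.g. Gender, Students/Non-Students
Population Parameter: $p_1 - p_2$
Sample Statistic: $\hat{p}_1 - \hat{p}_2$
Example: Is there a significant difference in the proportions of athletes among male and female students at BU?

Large Sample Hypothesis Test for Difference in Population Proportions
H0: $p_1 - p_2 = 0$
Decision Rule:
Ha: $p_1 > p_2$: reject H0 if $Z \ge Z_{1-\alpha}$
Ha: $p_1 \ne p_2$: reject H0 if $|Z| \ge Z_{1-\alpha/2}$
Ha: $p_1 < p_2$: reject H0 if $Z \le -Z_{1-\alpha}$

Assumptions (large $n_1$, $n_2$): $\min[n_1\hat{p}_1,\ n_1(1-\hat{p}_1)] \ge 5$ and $\min[n_2\hat{p}_2,\ n_2(1-\hat{p}_2)] \ge 5$

Test Statistic: $Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$,
where $\hat{p}_1 = X_1/n_1$, $\hat{p}_2 = X_2/n_2$, and the pooled proportion is $\hat{p} = \dfrac{X_1 + X_2}{n_1 + n_2}$

Confidence Interval: $(\hat{p}_1 - \hat{p}_2) \pm Z_{1-\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

Sample size per group: $n_i = \left(\dfrac{Z_{1-\alpha/2}\sqrt{2\bar{p}\bar{q}} + Z_{1-\beta}\sqrt{p_1 q_1 + p_2 q_2}}{ES}\right)^2$,
where $ES = |p_2 - p_1|$ (under Ha), $\bar{p} = \dfrac{p_1 + p_2}{2}$, $\bar{q} = 1 - \bar{p}$, and $q_i = 1 - p_i$.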
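A minimal Python sketch of the two-sample test, interval, and sample-size formula, using only the standard library. The counts (30 of 100 male and 20 of 100 female students being athletes) are hypothetical, and the power of 0.80 used for $Z_{1-\beta}$ is an assumed value; the formula itself only names $Z_{1-\beta}$. Note that the test uses the pooled $\hat{p}$ while the confidence interval uses the unpooled standard error, matching the two formulas above.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Large-sample z test of H0: p1 - p2 = 0 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    if alternative == "greater":     # Ha: p1 > p2
        p_value = 1 - STD_NORMAL.cdf(z)
    elif alternative == "less":      # Ha: p1 < p2
        p_value = STD_NORMAL.cdf(z)
    else:                            # Ha: p1 != p2
        p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value


def two_prop_ci(x1, n1, x2, n2, alpha=0.05):
    """CI for p1 - p2 with the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se


def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """n_i = [Z_{1-a/2} sqrt(2 pbar qbar) + Z_{1-b} sqrt(p1 q1 + p2 q2)]^2 / ES^2."""
    p_bar = (p1 + p2) / 2
    q_bar = 1 - p_bar
    z_a = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_b = STD_NORMAL.inv_cdf(power)   # Z_{1-beta}
    es = abs(p2 - p1)
    numerator = z_a * sqrt(2 * p_bar * q_bar) + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ceil((numerator / es) ** 2)


# Hypothetical counts: 30 of 100 male vs 20 of 100 female students are athletes
z, p_value = two_prop_z_test(30, 100, 20, 100)
print(f"Z = {z:.3f}, two-sided p-value = {p_value:.4f}")
print("95% CI for p1 - p2:", two_prop_ci(30, 100, 20, 100))
print("n per group:", sample_size_per_group(0.30, 0.20))
```

With these illustrative numbers Z is about 1.63, below $Z_{0.975} \approx 1.96$, so the difference would not be significant at α = 0.05; detecting a true difference of 0.30 vs 0.20 with 80% power would need about 294 students per group.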


Chi-Square Tests
Name of the Test: Goodness-of-Fit Test (how well categorical data FIT an expected distribution)
Description: Uses the multinomial distribution: the response is the choice of one category from more than 2 possible categories. H0 makes statements about what the proportions should be for the k response categories:
H0: $p_1 = p_{01},\ p_2 = p_{02},\ \dots,\ p_k = p_{0k}$
Ha: H0 is false
Note: all proportions add up to 1, i.e. $\sum_{i=1}^{k} p_{0i} = 1$.
Examples: Do people park equally often on all 4 levels of a parking garage on rainy days? Do students register equally often for morning, afternoon, and evening sessions?

Name of the Test: Test of Independence (whether two categorical variables can be considered INDEPENDENT)
Description: Consider 1 population and 2 categorical variables. Test whether the two categorical variables appear to be related (dependent) in the population of interest.
H0: [variable 1] and [variable 2] are INDEPENDENT
Ha: [variable 1] and [variable 2] are DEPENDENT
Examples: Are angry people more likely to have heart disease? Are sleep-deprived students more likely to be stressed?

Chi-Square statistic: $\chi^2 = \sum_i \dfrac{(O_i - E_i)^2}{E_i}$; reject H0 if $\chi^2 \ge \chi^2_{df,\,1-\alpha}$,

where Oi is the observed count and Ei is the expected count under the corresponding null hypothesis.


Test of Goodness of Fit


The goodness-of-fit test is used to assess whether one sample fits a specified distribution well. The probabilities for all the categories add up to 1.
Chi-Square Goodness-of-Fit Test
H0: $p_1 = p_{01},\ p_2 = p_{02},\ \dots,\ p_k = p_{0k}$
Test Statistic: $\chi^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$
Decision Rule: reject H0 if $\chi^2 \ge \chi^2_{k-1,\,1-\alpha}$, df = k − 1, where $O_i$ is the observed count and $E_i = n\,p_{0i}$ is the expected count under the corresponding H0.

Example (D'Agostino, Example 7.10): Goodness of Fit for Teen Issues. Volunteers at a teen hotline have been assigned based on the assumption that 40% of all calls are drug related, 25% are sex related (e.g., date rape), 25% are stress related, and 10% concern educational issues. For this investigation, each call is classified into one category based on the primary issue raised by the caller. To test the hypothesis, the following data are collected from 120 randomly selected calls placed to the teen hotline. Based on the data, is the assumption regarding the distribution of topic issues appropriate?

Topic Issue       Drugs   Sex   Stress   Education   Total
Number of calls      52    38       21           9   n = 120

Step 1: Define the parameter of interest and state the hypotheses.
Parameter: $p_i$ = the proportion of calls related to topic $i$ (1 = Drugs, 2 = Sex, 3 = Stress, 4 = Education)
H0: $p_1 = 0.40,\ p_2 = 0.25,\ p_3 = 0.25,\ p_4 = 0.10$
Ha: H0 is false
Significance level: $\alpha = 0.05$

Step 2: Summarize the data into an appropriate test statistic: $\chi^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$, with df = k − 1 = 4 − 1 = 3.


To compute the chi-square statistic, extend the table of observed counts and compute the expected counts:

Topic Issue   Number of calls O_i   Expected count E_i = n·p_0i   (O_i − E_i)²/E_i
Drugs         52                    120 × 0.40 = 48               (52 − 48)²/48 = 0.333
Sex           38                    120 × 0.25 = 30               (38 − 30)²/30 = 2.133
Stress        21                    120 × 0.25 = 30               (21 − 30)²/30 = 2.700
Education     9                     120 × 0.10 = 12               (9 − 12)²/12 = 0.750
Total         n = 120               120

$\chi^2 = 0.333 + 2.133 + 2.700 + 0.750 = 5.917$

Step 3: Assuming H0 is true, define the decision rule: reject H0 if $\chi^2 \ge \chi^2_{3,\,0.95}$. Using Table B.5, $\chi^2_{3,\,0.95} = 7.815$, so reject H0 if $\chi^2 \ge 7.815$.
Step 4: Decide whether or not the result is statistically significant based on the rejection region. Since the observed $\chi^2$ is less than 7.815, the decision is: fail to reject H0.
Step 5: Report the conclusion in the context of the problem (question of interest). Based on a random sample of n = 120 phone calls, there is no significant evidence at the 5% significance level to conclude that the volunteers at the teen hotline have been assigned inappropriately.
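The goodness-of-fit calculation for the teen hotline data can be checked with a few lines of standard-library Python. The helper `chi_square_gof` is a hypothetical name, and the critical value 7.815 is hard-coded from a chi-square table (df = 3, α = 0.05) rather than computed, to avoid depending on a statistics package. Without intermediate rounding the statistic is exactly 71/12 ≈ 5.917.

```python
def chi_square_gof(observed, p0):
    """Chi-square goodness-of-fit statistic with E_i = n * p0_i and df = k - 1."""
    n = sum(observed)
    expected = [n * p for p in p0]
    # Large-sample condition: every expected count should be at least 5
    if min(expected) < 5:
        raise ValueError("some expected counts are below 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected, len(observed) - 1


# Teen hotline data, in the order Drugs, Sex, Stress, Education
observed = [52, 38, 21, 9]
p0 = [0.40, 0.25, 0.25, 0.10]
stat, expected, df = chi_square_gof(observed, p0)

CRITICAL_DF3_ALPHA05 = 7.815  # chi-square table value for df = 3, alpha = 0.05
print("expected:", [round(e, 1) for e in expected])
print(f"chi-square = {stat:.3f}, df = {df}")
print("reject H0" if stat >= CRITICAL_DF3_ALPHA05 else "fail to reject H0")
```

The same function applies to the exercises that follow once their observed counts are filled in; only the table critical value changes with df.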


Exercise: According to the M&M's website, each regular package of Milk Chocolate M&M's should contain 24% blue, 14% brown, 16% green, 20% orange, 13% red, and 14% yellow candies. Count the candies in a package, and use an appropriate test of hypotheses to check whether the observed color counts are consistent with the stated proportion distribution.
Step 1: Define the parameter of interest and state the hypotheses.
H0:________________________________________________
Ha:________________________________________________
α = _____________
Step 2: Summarize the data into an appropriate test statistic. First, create a table of observed and expected counts:

Colors   Blue   Brown   Green   Orange   Red    Yellow
H0       0.24   0.14    0.16    0.20     0.13   0.14
O_i
E_i

Then compute the test statistic:

$\chi^2 =$

df =
Step 3: Assuming H0 is true, define the decision rule:

Step 4: Decide whether or not the result is statistically significant based on the rejection region:
Step 5: Report the conclusion in the context of the problem (question of interest). Based on a random sample of n = ___________, there is __________ significant evidence, at level α = __________, to conclude that_______________________
_________________________________________________________________


Exercise: As part of an ongoing study, men who quit smoking using a variety of methods are being followed for several years. A group of 350 men (n = 350) who quit smoking using a nicotine patch were tracked down 3 years after their quitting date. They were asked whether they had successfully quit or had gone back to smoking. The possible answers and the number of men who gave each answer are shown below.

Outcome                                                        # responses O_i   E_i = n·p_0i   (O_i − E_i)²/E_i
(#1) I haven't smoked since                                    188
(#2) I don't smoke much anymore, but I occasionally light up   35
(#3) I smoke as much now as I did before                       111
(#4) I smoke more now than I did before                        16
Total                                                          350

The researchers wish to test the following hypothesis: H0: $p_1 = 0.60,\ p_2 = 0.10,\ p_3 = 0.25,\ p_4 = 0.05$
Step 1: Define the parameter of interest and state the hypotheses.
Parameter:
H0:________________________________________________
Ha:________________________________________________
$\alpha = 0.05$
Step 2: Summarize the data into an appropriate test statistic: $\chi^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$ with df = k − 1 =

Step 3: Assuming H0 is true, define the decision rule:

Step 4: Decide whether or not the result is statistically significant based on the rejection region:

Step 5: Report the conclusion in the context of the problem (question of interest). Based on a random sample of n = ___________, there is __________ significant evidence, at level α = __________, to conclude that_____________________________________________


Test of Independence
A fundamental question: is there a relationship between the two variables, so that the chance that an individual falls into a particular category of one variable depends on the category they fall into for the other variable? A procedure for assessing the statistical significance of a relationship between categorical variables is the chi-square test of independence.

Chi-Square Test of Independence
H0: [variable 1] and [variable 2] are INDEPENDENT
Test Statistic: $\chi^2 = \sum_{i}\sum_{j} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$; reject H0 if $\chi^2 \ge \chi^2_{df,\,1-\alpha}$, df = (R − 1)(C − 1),
where R and C are the numbers of rows and columns, $O_{ij}$ is the observed count, and $E_{ij} = \dfrac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{\text{grand total}}$ is the expected count under the corresponding H0.

Conditional Probability and Independence


Definition: Recall that conditional probability is the probability of some event A, given the occurrence of some other event B. Conditional probability is written $P(A \mid B)$ and is read "the (conditional) probability of A, given B" or "the probability of A under the condition B". When the event B is known to have occurred in a random experiment, the possible outcomes of the experiment are reduced to B, and hence the probability of A changes from the unconditional probability to the conditional probability given B. The marginal probability is then the unconditional probability P(A) of the event A; that is, the probability of A regardless of whether event B did or did not occur.
The Conditional Probability Rule: $P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$, provided $P(B) > 0$; equivalently, $P(A \text{ and } B) = P(A \mid B)\,P(B)$.

Two random events A and B are (statistically) independent if and only if $P(A \text{ and } B) = P(A)\,P(B)$. Thus, if A and B are independent, their joint probability can be expressed as a simple product of their individual probabilities. Equivalently, for two independent events A and B with nonzero probabilities: $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$. In other words, if A and B are independent, then the conditional probability of A given B is simply the individual (marginal) probability of A alone; likewise, the probability of B given A is simply the probability of B alone.
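The independence rule is what produces the expected-count formula $E_{ij}$ in the chi-square test of independence. A short derivation sketch, assuming (as the test does) that the marginal probabilities are estimated by the observed row and column proportions, with $n$ the grand total:

```latex
\begin{aligned}
P(\text{row } i \text{ and column } j)
  &= P(\text{row } i)\,P(\text{column } j)
  && \text{(independence under } H_0\text{)}\\
  &\approx \frac{\text{row } i \text{ total}}{n}\cdot\frac{\text{column } j \text{ total}}{n}\\
E_{ij} = n \cdot P(\text{row } i \text{ and column } j)
  &\approx \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{n}
\end{aligned}
```

For example, with the CHD data in this lecture (row CHD, column Low anger): $E_{11} = 190 \cdot 3110 / 8474 \approx 69.7$.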


Example: Are angry people more likely to have coronary heart disease? Coronary heart disease (CHD) is a narrowing of the small blood vessels that supply blood and oxygen to the heart; it is also called coronary artery disease. People who get angry easily tend to be more likely to have heart disease. That is the conclusion of a study that followed a random sample of 12,986 people from three locations over about four years. All subjects were free of heart disease at the beginning of the study. The subjects took the Spielberger Trait Anger Scale, which measures how prone a person is to sudden anger. The 8474 people in the sample who had normal blood pressure were classified according to whether or not they had coronary heart disease (CHD) and whether they had low, moderate, or high anger according to the Anger Scale.

CHD * TEMPER Crosstabulation (Count)
                        TEMPER
                        Low anger   Moderate anger   High anger   Total
CHD       CHD                  53              110           27     190
          No CHD             3057             4621          606    8284
Total                        3110             4731          633    8474

1. What proportion of sampled subjects had CHD? (Answer: 0.022)
2. What proportion of High anger subjects had CHD? (Answer: 0.043)
3. What proportion of Moderate anger subjects had CHD? (Answer: 0.023)
4. What proportion of Low anger subjects had CHD? (Answer: 0.017)
5. Do anger classification and coronary heart disease status seem to be independent?
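Questions 1 to 4 apply the conditional probability rule with sample proportions standing in for probabilities, e.g. P(CHD | High anger) = P(CHD and High anger)/P(High anger) = (27/8474)/(633/8474) = 27/633. A minimal standard-library sketch (the dictionary layout is just one convenient choice) reproduces the four answers:

```python
# CHD * TEMPER counts as (CHD, No CHD) for each anger level
counts = {
    "Low anger": (53, 3057),
    "Moderate anger": (110, 4621),
    "High anger": (27, 606),
}

n = sum(chd + no_chd for chd, no_chd in counts.values())  # grand total
total_chd = sum(chd for chd, _ in counts.values())        # CHD row total

# Marginal probability P(CHD), estimated by the sample proportion
p_chd = total_chd / n
print(f"P(CHD) = {p_chd:.3f}")

# Conditional probabilities P(CHD | level) = count(CHD and level) / count(level)
conditional = {
    level: chd / (chd + no_chd) for level, (chd, no_chd) in counts.items()
}
for level, p in conditional.items():
    print(f"P(CHD | {level}) = {p:.3f}")
```

If anger level and CHD were independent, all three conditional proportions would be close to the marginal P(CHD) ≈ 0.022; the High anger value, 0.043, is roughly twice that, which is the kind of gap the chi-square test of independence assesses formally.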


Step 1: State the hypotheses.
H0: __Having CHD is independent of Level of Anger_____________
Ha: _______________________________________________________
Significance level α = ______0.05_______
Step 2: Summarize the data into an appropriate test statistic.

NOTE: All expected cell counts must be ≥ 5. First, compute the expected counts: $E_{ij} = \dfrac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{\text{grand total}}$; for example, $E_{11} = \dfrac{(190)(3110)}{8474} = 69.7$.
CHD * TEMPER Crosstabulation (Count and Expected Count)
                                TEMPER
                                Low anger   Moderate anger   High anger    Total
CHD       CHD      Count               53              110           27      190
                   Expected          69.7            106.1         14.2    190.0
          No CHD   Count             3057             4621          606     8284
                   Expected        3040.3           4624.9        618.8   8284.0
Total              Count             3110             4731          633     8474
                   Expected        3110.0           4731.0        633.0   8474.0

$\chi^2 = 16.077$ with df = (R − 1)(C − 1) = (2 − 1)(3 − 1) = 2.
Step 3: Assuming H0 is true, define the decision rule: reject H0 if $\chi^2 \ge 5.99$.
Step 4: Decide whether or not the result is statistically significant based on the rejection region: ___________________________________________________________
Step 5: Report the conclusion in the context of the problem (question of interest). Based on a random sample of n = __________, there is __________ significant evidence, at level α = __________, to conclude that_______________________
_________________________________________________________________
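The expected counts and the statistic χ² = 16.077 can be reproduced with a short standard-library sketch; `chi_square_independence` is a hypothetical helper name. Because this table has df = 2, the chi-square upper tail has the closed form P(χ² ≥ x) = e^(−x/2), so both the critical value and the p-value are computed exactly here; that shortcut is valid only for df = 2, and other tables would need a chi-square table or a statistics library.

```python
from math import exp, log


def chi_square_independence(table):
    """Chi-square test of independence for an R x C table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # E_ij = (row i total)(column j total) / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, expected, df


# Rows: CHD, No CHD; columns: Low, Moderate, High anger
chd_table = [
    [53, 110, 27],
    [3057, 4621, 606],
]
stat, expected, df = chi_square_independence(chd_table)
print("expected:", [[round(e, 1) for e in row] for row in expected])
print(f"chi-square = {stat:.3f}, df = {df}")

# df = 2 only: upper tail P(X >= x) = exp(-x / 2)
critical = -2 * log(0.05)  # about 5.991
p_value = exp(-stat / 2)
print(f"critical value = {critical:.3f}, p-value = {p_value:.6f}")
print("reject H0" if stat >= critical else "fail to reject H0")
```

The printed expected counts match the crosstabulation above, and the smallest one (14.2) comfortably meets the ≥ 5 condition.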


Exercise: Are sleep-deprived students more likely to be stressed? Using the following data, perform an appropriate test and determine whether there is a relationship between level of stress and lack of sleep among BU students at the 5% significance level.

                Sleep Deprived   Not Sleep Deprived   Total
Stressed                    18                    9
Not Stressed                 7                   12
Total

Step 1: Define the parameter of interest and state the hypotheses.
H0:________________________________________________
Ha:________________________________________________
Step 2: Summarize the data into an appropriate test statistic: $\chi^2 =$ _____________, df = _____________
Step 3: Assuming H0 is true, define the decision rule: reject H0 if $\chi^2 \ge \chi^2_{df,\,1-\alpha}$ = _____________

Step 4: Decide whether or not the result is statistically significant based on the rejection region:
Step 5: Report the conclusion in the context of the problem (question of interest). Based on a random sample of n = ___________, there is __________ significant evidence, at level α = __________, to conclude that_______________________
_________________________________________________________________
