Statistics 111 - Lecture 9: Introduction To Inference

Sampling Distributions for Counts and Proportions
June 10, 2008
Stat 111 - Lecture 9 - Proportions
Administrative Notes
Homework 3 is due on Monday, June 15th
Covers chapters 1-5 in textbook
Exam on Monday, June 15th
Review session on Thursday
Last Class
Focused on models for continuous data: using the sample mean as our estimate of the population mean
Sampling Distribution of the Sample Mean: how does the sample mean change over different samples?
[Figure: a population with parameter $\mu$; Samples 1, 2, 3, ..., each of size n, each yield a sample mean $\bar{x}$. What is the distribution of these values?]
Today's Class
We will now focus on count data: categorical data that takes on only two different values
Success ($Y_i = 1$) or Failure ($Y_i = 0$)
Goal is to estimate the population proportion: p = proportion of $Y_i = 1$ in the population
Examples
Gender: our class has 83 women and 42 men
What is the proportion of women in the Penn student population?
Presidential Election: out of 2000 people sampled, 1150 will vote for McCain in the upcoming election
What proportion of the total population will vote for McCain?
Quality Control: inspection of a sample of 100 microchips from a large shipment shows 10 failures
What is the proportion of failures in all shipments?
Inference for Count Data

Goal for count data is to estimate the population proportion p
From a sample of size n, we can calculate two statistics:
1. sample count Y
2. sample proportion $\hat{p} = Y/n$
Use the sample proportion $\hat{p}$ as our estimate of the population proportion p
Sampling Distribution of the Sample Proportion: how does the sample proportion change over different samples?
[Figure: a population with parameter p; Samples 1, 2, 3, ..., each of size n, each yield a sample proportion $\hat{p}$. What is the distribution of these values?]
The Binomial Setting for Count Data

1. Fixed number n of observations (or trials)
2. Each observation is independent

3. Each observation falls into one of two categories: Success (Y = 1) or Failure (Y = 0)
4. Each observation has the same probability of success: p = P(Y = 1)
Binomial Distribution for Sample Count
Sample count Y (number of $Y_i = 1$ in a sample of size n) has a Binomial distribution
The binomial distribution has two parameters: number of trials n and population proportion p
$P(Y = k) = \binom{n}{k} p^k (1-p)^{n-k}$
Binomial formula accounts for
number of successes: $p^k$
number of failures: $(1-p)^{n-k}$
different orders of successes/failures: $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$
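The binomial formula can be evaluated directly instead of being looked up. The following is a minimal sketch using only the Python standard library; the function name `binomial_pmf` is an illustrative choice, not something from the lecture.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(Y = k) for a Binomial(n, p) count: C(n, k) * p^k * (1 - p)^(n - k)."""
    if not 0 <= k <= n:
        return 0.0  # counts outside 0..n are impossible
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# The probabilities over k = 0, ..., n form a full distribution and sum to 1.
probabilities = [binomial_pmf(k, 4, 0.25) for k in range(5)]
print(probabilities)
print(sum(probabilities))
```

Each entry of `probabilities` is the height of one bar in the binomial probability histogram for n = 4 and p = 0.25.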
Binomial Probability Histogram

Can make histogram out of these probabilities
Can add up bars of the histogram to get any probability we want, e.g. P(Y < 4)
Different values of n and p have different histograms, but Table C in the book has probabilities for many values of n and p
Binomial Table
Example: Genetics
If a couple are both carriers of a certain disease, then their children each have probability 0.25 of being born with the disease
Suppose that the couple has 4 children
P(none of their children have the disease)?
$P(Y = 0) = \frac{4!}{0!\,4!} (0.25)^0 (1 - 0.25)^4 = 0.3164$
P(at least two children have the disease)?
$P(Y \geq 2) = P(Y = 2) + P(Y = 3) + P(Y = 4) = 0.2109 + 0.0469 + 0.0039$ (from table) $= 0.2617$
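The table lookups in the genetics example can be checked numerically. This is a sketch under the same assumptions as the slide (4 independent children, each with probability 0.25); the variable names are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(Y = k) for a Binomial(n, p) count."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_children, p_disease = 4, 0.25

# P(Y = 0): none of the four children have the disease.
p_none = binomial_pmf(0, n_children, p_disease)

# P(Y >= 2): add the bars for 2, 3 and 4 affected children.
p_at_least_two = sum(
    binomial_pmf(k, n_children, p_disease) for k in range(2, n_children + 1)
)

print(round(p_none, 4), round(p_at_least_two, 4))
```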
Example: Quality Control

A worker inspects a sample of n = 20 microchips from a large shipment
The probability of a microchip being faulty is 10% (p = 0.10)
What is the probability that there are fewer than three failures in the sample?
$P(Y < 3) = P(Y = 0) + P(Y = 1) + P(Y = 2) = 0.1216 + 0.2702 + 0.2852$ (from table) $= 0.677$
Sample Proportions
Usually, we are more interested in the sample proportion $\hat{p} = Y/n$ than in the sample count
$P(\hat{p} < k) = P(Y < n \cdot k)$
Example: a worker inspects a sample of 20 microchips from a large shipment, where the probability of a microchip being faulty is 0.1
What is the probability that our sample proportion of faulty chips is less than 0.05?
$P(\hat{p} < 0.05) = P(Y < 0.05 \times 20) = P(Y < 1) = P(Y = 0) = 0.1216$
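A probability statement about the sample proportion is a statement about the sample count in disguise. The sketch below sums the binomial probabilities of every count whose proportion falls under the threshold; `prob_proportion_below` is a hypothetical helper name introduced for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(Y = k) for a Binomial(n, p) count."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_proportion_below(threshold, n, p):
    """P(p_hat < threshold), summed over the counts k with k / n < threshold."""
    return sum(binomial_pmf(k, n, p) for k in range(n + 1) if k / n < threshold)


# Microchip example: n = 20, p = 0.10, sample proportion below 0.05.
# Only k = 0 satisfies k / 20 < 0.05, so this equals P(Y = 0).
p_below = prob_proportion_below(0.05, 20, 0.10)
print(round(p_below, 4))
```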
Mean and Variance of Binomial Counts

If our sample count Y is a random variable with a Binomial distribution, what are the mean and variance of Y across all samples?
Useful since we only observe the value of Y for our sample, but what are the values in other samples?
We can calculate the mean and variance of a Binomial distribution with parameters n and p:
$\mu_Y = np$
$\sigma_Y^2 = np(1-p)$
$\sigma_Y = \sqrt{np(1-p)}$
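The mean and variance formulas can be checked against the definition of expectation by summing over every possible count. This is a sketch only; the function name `binomial_mean_and_variance` and the choice of n = 20, p = 0.10 (the microchip setting) are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(Y = k) for a Binomial(n, p) count."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_mean_and_variance(n, p):
    """Mean and variance of Y computed by summing over all counts 0..n."""
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * prob for k, prob in enumerate(pmf))
    variance = sum((k - mean) ** 2 * prob for k, prob in enumerate(pmf))
    return mean, variance


mean, variance = binomial_mean_and_variance(20, 0.10)
print(mean, 20 * 0.10)             # enumeration vs. n * p
print(variance, 20 * 0.10 * 0.90)  # enumeration vs. n * p * (1 - p)
```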
Mean/Variance of Binomial Proportions

Sample proportion is a linear transformation of the sample count ($\hat{p} = Y/n$)
$\mu_{\hat{p}} = \frac{1}{n} \mu_Y = \frac{1}{n} \cdot np = p$
Mean of the sample proportion is the true probability of success p
$\sigma_{\hat{p}}^2 = \frac{1}{n^2} \mathrm{Var}(Y) = \frac{1}{n^2} \cdot np(1-p) = \frac{p(1-p)}{n}$
Variance of sample proportion decreases as sample size n increases!
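To see the variance shrink with n, the mean and variance of the sample proportion can be computed by enumeration for a few sample sizes. This sketch assumes p = 0.5 and the sizes 10, 40 and 160 purely for illustration; each fourfold increase in n should cut the variance to a quarter.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(Y = k) for a Binomial(n, p) count."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def proportion_mean_and_variance(n, p):
    """Mean and variance of p_hat = Y / n, summing over all counts 0..n."""
    values = [k / n for k in range(n + 1)]
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(v * q for v, q in zip(values, probs))
    variance = sum((v - mean) ** 2 * q for v, q in zip(values, probs))
    return mean, variance


p = 0.5
for n in (10, 40, 160):
    mean, variance = proportion_mean_and_variance(n, p)
    # Last column is the formula p * (1 - p) / n for comparison.
    print(n, mean, variance, p * (1 - p) / n)
```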

Variance over Long-Run

Lower variance with larger sample size means that the sample proportion will tend to be closer to the population proportion in larger samples
[Figure: long-run behaviour of two different coin tossing runs]
Much less likely to get unexpected events in larger samples
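The long-run behaviour of coin tossing can be imitated with a small simulation. This is a sketch, not the code behind the lecture's figure: the seeds, the fair coin (p = 0.5) and the checkpoints are all assumptions made for illustration.

```python
import random


def running_proportions(n_tosses, p=0.5, seed=None):
    """Proportion of heads after each toss of a simulated coin."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for toss in range(1, n_tosses + 1):
        heads += rng.random() < p  # True counts as 1 head
        proportions.append(heads / toss)
    return proportions


# Two independent runs; both should settle near 0.5 as tosses accumulate.
run_a = running_proportions(10_000, seed=1)
run_b = running_proportions(10_000, seed=2)
for checkpoint in (10, 100, 1_000, 10_000):
    print(checkpoint, run_a[checkpoint - 1], run_b[checkpoint - 1])
```

Early in each run the proportion can wander far from 0.5; by 10,000 tosses both runs should sit close to it, which is the pattern the slide describes.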
Binomial Probabilities in Large Samples

In large samples, it is often tedious to calculate probabilities using the binomial distribution
Example: Gallup poll for presidential election
Bush has 49% of the vote in the population. What is the probability that Bush gets a count over 550 in a sample of 1000 people?
$P(Y > 550) = P(Y = 551) + P(Y = 552) + \cdots + P(Y = 1000)$
= 450 terms to look up in the table!
We can instead use the fact that for large samples, the Binomial distribution is closely approximated by the Normal distribution
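Summing 450 terms is tedious by hand but easy for a computer, which gives an exact value to compare with any approximation. The sketch below works on the log scale to avoid very large and very small intermediate numbers; the helper names are illustrative, and this is a cross-check rather than the method used in the lecture.

```python
from math import exp, lgamma, log


def log_binomial_pmf(k, n, p):
    """log P(Y = k) for a Binomial(n, p) count, using log-factorials via lgamma."""
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_comb + k * log(p) + (n - k) * log(1 - p)


def binomial_upper_tail(threshold, n, p):
    """P(Y > threshold): sum the pmf over threshold + 1, ..., n."""
    return sum(exp(log_binomial_pmf(k, n, p)) for k in range(threshold + 1, n + 1))


# Gallup example: n = 1000, p = 0.49, count over 550.
tail = binomial_upper_tail(550, 1000, 0.49)
print(tail)
```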
Normal Approximation to Binomial

If count Y follows a binomial distribution with parameters n and p, then Y approximately follows a Normal distribution with mean and variance:
$\mu_Y = np$ and $\sigma_Y^2 = np(1-p)$
This approximation is only good if n is large enough.
Rule of thumb for large enough: $np \geq 10$ and $n(1-p) \geq 10$
Also works for the sample proportion: $\hat{p} = Y/n$ approximately follows a Normal distribution with mean and variance
$\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}}^2 = \frac{p(1-p)}{n}$
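The rule of thumb and the approximating Normal distribution can be packaged together. A minimal sketch using `statistics.NormalDist` from the Python standard library; `rule_of_thumb_ok` and `normal_approximation` are hypothetical helper names.

```python
from math import sqrt
from statistics import NormalDist


def rule_of_thumb_ok(n, p):
    """True when n*p >= 10 and n*(1-p) >= 10, the slide's 'large enough' check."""
    return n * p >= 10 and n * (1 - p) >= 10


def normal_approximation(n, p):
    """Normal distribution approximating a Binomial(n, p) count Y."""
    if not rule_of_thumb_ok(n, p):
        raise ValueError("n is too small for the Normal approximation")
    return NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))


# Gallup setting: n = 1000, p = 0.49 passes the check.
approx = normal_approximation(1000, 0.49)
print(approx.mean, approx.variance)

# The 20-chip sample with p = 0.10 does not: n * p is only 2.
print(rule_of_thumb_ok(20, 0.10))
```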
Example: Quality Control

Sample of 100 microchips (with the usual 10% of microchips being faulty). What is the probability that there are at least 17 bad chips in our sample?
Using Binomial calculation/table is tedious. Instead use Normal approximation:
Mean = np = 100 × 0.10 = 10
Var = np(1-p) = 100 × 0.10 × 0.90 = 9
$P(Y \geq 17) = P\left(\frac{Y - 10}{3} \geq \frac{17 - 10}{3}\right) = P(Z \geq 2.33) = 1 - P(Z \leq 2.33) = 0.01$ (from table)
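The Normal-table lookup can be reproduced with the standard Normal distribution from the Python standard library. This sketch follows the slide's calculation step by step, with no continuity correction; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.10
mean = n * p                 # 10
sd = sqrt(n * p * (1 - p))   # square root of 9, i.e. 3

# Standardize the cutoff of 17 bad chips, then take the upper tail of Z.
z = (17 - mean) / sd
p_normal = 1 - NormalDist().cdf(z)

print(round(z, 2), round(p_normal, 4))
```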

Example: Gallup Poll

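Bush has 49% of the vote in the population
What is the probability that Bush gets a sample proportion over 0.51 in a sample of size 1000?
Use normal distribution with mean = p = 0.49 and variance = p(1-p)/n = 0.49 × 0.51 / 1000 ≈ 0.00025
$P(\hat{p} > 0.51) = P\left(Z > \frac{0.51 - 0.49}{\sqrt{0.49 \times 0.51 / 1000}}\right) = P(Z > 1.27) = 1 - P(Z \leq 1.27) = 0.102$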
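The same calculation can be done for the sample proportion without a table. In this sketch the Normal distribution is built directly on the proportion scale, with mean p and variance p(1-p)/n; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1000, 0.49
variance = p * (1 - p) / n  # variance of the sample proportion
p_hat_dist = NormalDist(mu=p, sigma=sqrt(variance))

# Upper tail: probability that the sample proportion exceeds 0.51.
z = (0.51 - p) / sqrt(variance)
p_over = 1 - p_hat_dist.cdf(0.51)

print(round(variance, 7), round(z, 2), round(p_over, 3))
```

The result differs from the table value in the third decimal place only because the table rounds z to two decimals before the lookup.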
Why does Normal Approximation work?

Central Limit Theorem: in large samples, the distribution of the sample mean is approx. Normal
Well, our count data takes on two different values: Success ($Y_i = 1$) or Failure ($Y_i = 0$)
The sample proportion is the same as the sample mean for count data!
So, Central Limit Theorem works for sample proportions as well!
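A small simulation illustrates the point: averaging 0/1 outcomes gives the sample proportion, and across many samples those proportions behave roughly like a Normal variable. This is a sketch under assumed settings (n = 400, p = 0.49, 1000 simulated samples, seed 111), none of which come from the lecture.

```python
import random
from math import sqrt


def sample_proportion(n, p, rng):
    """Draw n success/failure outcomes coded 1/0 and return their mean."""
    outcomes = [1 if rng.random() < p else 0 for _ in range(n)]
    return sum(outcomes) / n  # mean of 0/1 data = proportion of successes


rng = random.Random(111)
n, p, replications = 400, 0.49, 1000
sd = sqrt(p * (1 - p) / n)

proportions = [sample_proportion(n, p, rng) for _ in range(replications)]
average = sum(proportions) / replications

# For a Normal distribution roughly 68% of values fall within one standard
# deviation of the mean; the simulated share should land in that neighbourhood.
within_one_sd = sum(abs(x - p) <= sd for x in proportions) / replications
print(average, within_one_sd)
```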

Next Class - Lecture 10

Review session on Wednesday/Thursday
Show up with questions!
