Sie sind auf Seite 1von 29

Inference About A Population

•Population mean
•Population proportion
Inference About A Population…
Population

Sample

Inference

Statistic
Parameter

We will develop techniques to estimate and test three


population parameters:
Population Mean
Population Proportion p
Population SD σ
12.2
Inference With Variance Unknown…
Previously, we looked at estimating and testing the
population mean when the population standard
deviation ( ) was known or given:

But how often do we know the actual population


variance?

Instead, we use the Student t-statistic, given by:


12.3
Example 1
It is likely that in the near future nations will have
to do more to save the environment.

Possible actions include reducing energy use and


recycling.

Currently (2007) most products manufactured from


recycled material are considerably more expensive
than those manufactured from material found in
the earth.
12.4
Example 1
Newspapers are an exception.

It can be profitable to recycle newspaper.

A major expense is the collection from homes. In recent years a


number of companies have gone into the business of collecting
used newspapers from households and recycling them.

A financial analyst for one such company has recently computed


that the firm would make a profit if the mean weekly newspaper
collection from each household exceeded 2.0 pounds.

12.5
Example 1

In a study to determine the feasibility of a


recycling plant, a random sample of 148
households was drawn from a large community,
and the weekly weight of newspapers discarded
for recycling for each household was recorded.

Do these data provide sufficient evidence to


allow the analyst to conclude that a recycling
plant would be profitable?

12.6
IDENTIFY

Example 1
Our objective is to describe the population of the amount of
newspaper discarded per household, which is an interval
variable. Thus the parameter to be tested is the population
mean µ.

We want to know if there is enough evidence to conclude that


the mean is greater than 2. Thus,
H1 : µ > 2

Therefore we set our usual null hypothesis to:


H0 : µ = 2
12.7
IDENTIFY

Example 1
x 
The test statistic is: t
s/ n
ν = n −1

Because the alternative hypothesis is:


H1: µ > 2

the rejection region becomes:


t  t  ,  t .01,148  t .01,150  2.351
12.8
True in reality False in reality

Reject null Type 1 error α Correct decision


1-β =power of the
test.

Accept null Correct decision Type 2 error β


1-α = significance
level.

α=prob of rejecting a correct null. Type 1 error


Pvalue= prob of obtaining the calculated test
statistic.
Decision rule: Reject the null hypothesis if
pvalue< α
R: t-test
>data1=read.csv("C:\\Users\\TOSHIBA\\Desktop\\ex1_week3_onepop_mean.csv")
> newspaper=data1[,1]
> t.test(newspaper,alternative="greater",mu=2)

One Sample t-test

data: newspaper
t = 2.2369, df = 147, p-value = 0.0134
alternative hypothesis: true mean is greater than 2
95 percent confidence interval:
2.046905 Inf
sample estimates:
mean of x
2.180405

t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired =


FALSE, var.equal = FALSE, conf.level = 0.95, ...)
INTERPRET
Example 1
The value of the test statistic is t = 2.24 and its p-value is .
0134.
Our decision will depend on the level of significance α, if
α=0.01, there is not enough evidence to infer that the mean
weight of discarded newspapers is greater than 2.0.
Note that there is some evidence; the p-value is .0134.
However, because we wanted the Type I error to be small we
insisted on a 1% significance level. Thus, we cannot conclude
that the recycling plant would be profitable.
If α=0.05, we will reject the null which means that we accept
the alternative and we conclude that the recycling plant would
be profitable.
12.11
Identifying Factors
Factors that identify the t-test and estimator of
:

12.12
Check Required Conditions
The Student t distribution is robust, which means that if the
population is nonnormal, the results of the t-test and
confidence interval estimate are still valid provided that the
population is “not extremely nonnormal”.

To check this requirement, draw a histogram of the data


and see how “bell shaped” the resulting figure is. If a
histogram is extremely skewed (say in the case of an
exponential distribution), that could be considered
“extremely nonnormal” and hence t-statistics would be not
be valid in this case.
12.13
> hist(newspaper)
Histogram of newspaper

25
20
F re q u e n c y

15
10
5
0

0 1 2 3 4

newspaper
Inference: Population Proportion…
When data are nominal, we count the number of occurrences of
each value and calculate proportions. Thus, the parameter of
interest in describing a population of nominal data is the
population proportion p.

This parameter was based on the binomial experiment.

Recall the use of this statistic:

where p-hat ( ) is the sample proportion: x successes in a


sample size of n items.

12.15
Inference: Population Proportion…
When np and n(1–p) are both greater than 5, the
sampling distribution of is approximately normal
with

mean:

standard deviation:

Hence:
12.16
Inference: Population Proportion
Test statistic for p:

The confidence interval estimator for p is given by:

(both of which require that np>5 and n(1–p)>5)


12.17
Example 2
After the polls close on election day networks compete
to be the first to predict which candidate will win.

The predictions are based on counts in certain


precincts and on exit polls.

Exit polls are conducted by asking random samples of


voters who have just exited from the polling booth
(hence the name) for which candidate they voted.

12.18
Example 2
In American presidential elections the candidate who
receives the most votes in a state receives the state’s
entire Electoral College vote.

In practice, this means that either the Democrat or the


Republican candidate will win.

Suppose that the results of an exit poll in one state were


recorded where 1 = Democrat and 2 = Republican.
Xm12-05*
12.19
Example 2
The polls close at 8:00 P.M.

Can the networks conclude from these data that


the Republican candidate will win the state?

Should the network announce at 8:01 P.M. that


the Republican candidate will win?

12.20
IDENTIFY

Example 2
The problem objective is to describe the population of votes in
the state. The data are nominal because the values are
“Democrat” and “Republican.” Thus the parameter to be
tested is the proportion of votes in the entire state that are for
the Republican candidate. Because we want to determine
whether the network can declare the Republican to be the
winner at 8:01 P.M., the alternative hypothesis is

H1: p > .50

And hence our null hypothesis becomes:


H0: p = .50
12.21
IDENTIFY

Example 2
The test statistic is
p̂  p
z
p(1  p) / n

12.22
>data2=read.csv("C:\\Users\\TOSHIBA\\Desktop\\eslasca\\ex2_week3_onepop
_prop.csv")
> votes=data2[,1]
> x=407
> n=765
> prop.test(x,n,alternative="greater")

1-sample proportions test with continuity correction

data: x out of n, null probability 0.5


X-squared = 3.0118, df = 1, p-value = 0.04133
alternative hypothesis: true p is greater than 0.5
95 percent confidence interval:
0.5016378 1.0000000
sample estimates:
p
0.5320261
INTERPRET

Example 2
At the 5% significance level we reject the null
hypothesis and conclude that there is enough
evidence to infer that the Republican candidate
will win the state.

However, is this the right decision?

12.24
INTERPRET

Example 2
One of the key issues to consider here is the cost
of Type I and Type II errors.

A Type I error occurs if we conclude that the


Republican will win when in fact he has lost.

12.25
INTERPRET

Example 2
Such an error would mean that a network would
announce at 8:01 P.M. that the Republican has
won and then later in the evening would have to
admit to a mistake.

If a particular network were the only one that


made this error it would cast doubt on their
integrity and possibly affect the number of
viewers.
12.26
INTERPRET

Example 2
This is exactly what happened on the evening of the
U. S. presidential elections in November 2000.

Shortly after the polls closed at 8:00 P.M. all the


networks declared that the Democratic candidate
Albert Gore would win in the state of Florida.

A couple of hours later, the networks admitted that


a mistake had been made and the Republican
candidate George W. Bush had won.
12.27
INTERPRET

Example 2
Several hours later they again admitted a mistake and finally
declared the race too close to call.

Fortunately for each network all the networks made the same
mistake.

However, if one network had not done this it would have developed
a better track record, which could have been used in future
advertisements for news shows and likely drawn more viewers.

Considering the costs of Type I and II errors it would have been


better to use a 1%significance level.

12.28
Identifying Factors
Factors that identify the z-test and interval
estimator of p:

12.29

Das könnte Ihnen auch gefallen