Chapter 6
June 12, 2012
Rebecca Slack
1
Relationship between population and sample
Random Number Tables
Randomized Clinical Trials
Estimation of the Mean of a Distribution
Estimation of the Variance of a Distribution
There is a lot of material here. What we don't
finish today, we will pick up on Thursday. Focus
on the variance will be moved to Thursday's
lecture.
2
A parameter that is part
of a model for a
population is called a
population parameter.
We use data to estimate
population parameters.
Any summary found
from the data is a
statistic.
The statistics that
estimate population
parameters are called
sample statistics.
3 Population and Sample
A population is the group we want to study. We
often call the population of interest the reference
population, the target population or the study
population.
Parameters are used to describe a population:
$\mu$ (mean)
$\sigma$ (standard deviation)
$p$ (proportion)
In reality, we never know the values of parameters.
We try to estimate the values of parameters, or we
assume the parameters have specific values and
see if we can find data to support that assumption.
4 Population and Sample
A sample is a subset of the population. A
sample should be representative of the
population.
There are many ways to select a sample
Convenience
Systematic
Random.
Statistics are used to describe a sample:
$\bar{x}$ (mean)
$s$ (standard deviation)
$\hat{p}$ (proportion)
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
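The three sample statistics can be computed directly from data. The sketch below is illustrative only: the function names and the data values are hypothetical, and the standard deviation is assumed to use the usual n - 1 divisor.

```python
import math

def sample_mean(xs):
    """Arithmetic mean of the observations."""
    return sum(xs) / len(xs)

def sample_sd(xs):
    """Sample standard deviation with the n - 1 divisor."""
    xbar = sample_mean(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))

def sample_proportion(flags):
    """Fraction of observations that have the characteristic of interest."""
    return sum(1 for f in flags if f) / len(flags)

# Hypothetical measurements and hypothetical yes/no outcomes.
values = [98, 112, 105, 120, 131, 109]
outcomes = [True, False, True, True]

print(sample_mean(values), sample_sd(values), sample_proportion(outcomes))
```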
The sampling distribution of $\bar{x}$ is the
distribution of values of $\bar{x}$ over all possible
samples with n subjects that could have been
selected from the population.
This requires using your imagination and thinking
hypothetically.
Notation: Let $X_1, \ldots, X_n$ be a random sample
selected from some population with mean $\mu$.
Then $E(\bar{X}) = \mu$.
29 Estimating the Mean
Definition: $\hat{\theta}$, an estimator for $\theta$, is said to be
unbiased if $E(\hat{\theta}) = \theta$.
Since $E(\bar{X}) = \mu$, we see that $\bar{X}$, the sample
mean, is an unbiased estimator of $\mu$, the
population mean.
When the population distribution is symmetric (for
example, normal), the sample median is also an
unbiased estimator of the population mean, $\mu$.
So, why do we use $\bar{X}$ as our estimator of the
population mean, instead of the median?
If the underlying distribution of the population is
normal, then $\bar{X}$ is the unbiased estimator of $\mu$
with the smallest variance.
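A small simulation can make the comparison between the sample mean and the sample median concrete. This is a sketch under stated assumptions: the population is taken to be normal, the values mu = 112 and sigma = 20.6 are borrowed for illustration, and the function name, seed, and repetition count are hypothetical.

```python
import random
import statistics

def simulate_estimators(mu=112.0, sigma=20.6, n=10, reps=5000, seed=1):
    """Draw many normal samples and summarize the mean and median of each."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return {
        "mean_of_means": statistics.fmean(means),
        "mean_of_medians": statistics.fmean(medians),
        "var_of_means": statistics.variance(means),
        "var_of_medians": statistics.variance(medians),
    }

result = simulate_estimators()
print(result)
```

Both averages should land close to mu (both estimators are unbiased here), while the variance of the sample means should be the smaller of the two.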
30
Estimating the Mean
Why is it preferable to estimate a parameter
from larger samples rather than from smaller
samples?
Consider each element in the sample as a
piece of information. It is intuitive that the more
information we have about a parameter, the
better we are able to estimate that parameter.
That is, we can be more precise (smaller
variance) when we have more data.
31 Estimating the Mean
Let $X_1, \ldots, X_n$ be a random sample from a
population with mean $\mu$ and variance $\sigma^2$.
The set of sample means in repeated random
samples of size n from this population has
variance $\sigma^2/n$.
The standard deviation of this set of sample
means is then $\sqrt{\sigma^2/n} = \sigma/\sqrt{n}$ and is referred to
as the standard error of the mean (SEM or
SE).
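The standard error formula can be checked by simulation. The following is a minimal sketch, assuming a normal population; the function names, seed, and repetition count are hypothetical choices, not part of the lecture.

```python
import math
import random
import statistics

def theoretical_se(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def empirical_se(sigma, n, mu=0.0, reps=4000, seed=2):
    """Standard deviation of sample means over repeated random samples."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

for n in (10, 40):
    print(n, theoretical_se(20.6, n), empirical_se(20.6, n))
```

Quadrupling n from 10 to 40 should roughly halve both columns, since the standard error shrinks with the square root of n.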
32 Estimating the Mean
Let $X_1, \ldots, X_n$ be a random sample from a
population with mean $\mu$ and variance $\sigma^2$. Then
for large n,
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
approximately, regardless of the underlying distribution of
$X_1, \ldots, X_n$!
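To see the result at work on a clearly non-normal population, the sketch below draws sample means from an exponential distribution (mean 1, standard deviation 1). The choice of population, n = 50, the seed, and the function name are all assumptions made for illustration.

```python
import math
import random
import statistics

def clt_check(n=50, reps=4000, seed=3):
    """Sample means from a skewed Exponential(rate=1) population."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and sd of the Exponential(rate=1) population
    se = sigma / math.sqrt(n)
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    # Under a normal approximation, about 95% of means fall within 1.96 SE of mu.
    within = sum(1 for m in means if abs(m - mu) < 1.96 * se) / reps
    return statistics.fmean(means), statistics.stdev(means), within

avg, sd, within = clt_check()
print(avg, sd, within)
```

If the normal approximation is reasonable, the average of the means is near 1, their standard deviation is near 1/sqrt(50), and the fraction within 1.96 standard errors is near 0.95.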
$\bar{X} \sim N\left(112, \frac{20.6^2}{10}\right)$
Estimating the Mean
Since $\bar{X} \sim N\left(112, \frac{20.6^2}{10}\right)$ and we want $\Pr(98 < \bar{X} < 126)$,
we first find
$Z_L = \frac{98 - 112}{20.6/\sqrt{10}} = \frac{-14}{6.51} = -2.15$
$Z_R = \frac{126 - 112}{20.6/\sqrt{10}} = \frac{+14}{6.51} = +2.15$
35 Estimating the Mean
Since $\bar{X} \sim N\left(112, \frac{20.6^2}{10}\right)$ we find
$\Pr(98 < \bar{X} < 126) = \Pr(\bar{X} < 126) - \Pr(\bar{X} < 98)$
$= \Pr(Z < +2.15) - \Pr(Z < -2.15)$
$= \Phi(2.15) - \Phi(-2.15)$
$= \Phi(2.15) - [1 - \Phi(2.15)]$
$= 0.9842 - [1 - 0.9842]$
$= 0.9842 - 0.0158$
$= 0.9684$
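The same probability can be computed without a table. This sketch uses Python's `statistics.NormalDist` for the standard normal CDF; the variable names are illustrative, and the numbers (112, 20.6, 10, 98, 126) are those of the example.

```python
import math
from statistics import NormalDist

mu, sigma, n = 112.0, 20.6, 10
se = sigma / math.sqrt(n)          # standard error of the mean

z_left = (98 - mu) / se            # standardized lower limit
z_right = (126 - mu) / se          # standardized upper limit

std_normal = NormalDist()          # N(0, 1)
prob = std_normal.cdf(z_right) - std_normal.cdf(z_left)

print(round(se, 2), round(z_left, 2), round(z_right, 2), round(prob, 4))
```

Because the code does not round z to two decimals before looking up the CDF, the result can differ from the table-based value in the last digit.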
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
38 Estimating the Mean
William S. Gosset
1876-1937
39 Estimating the Mean
[Figure: Student's t distribution with 5 df; horizontal axis t from -6 to 6]
40 Estimating the Mean
[Figure: Student's t distribution with 2, 8, and 32 df; horizontal axis t from -6 to 6]
41 Estimating the Mean
[Figure: Student's t distribution with 2, 8, and 32 df, overlaid with N(0,1); horizontal axis t from -6 to 6]
42 Estimating the Mean
When the sample size is more than 30
(i.e., df $\geq$ 30) the standard normal
distribution is a good approximation to the
t distribution.
$t_{(df \geq 30)} \approx Z \sim N(0, 1)$
43 Estimating the Mean
Recall that the sample mean ($\bar{X}$) is a point
estimate of the population mean ($\mu$).
A confidence interval is an interval estimate.
That is, a confidence interval is a range of
values that we use to estimate some
parameter. We usually construct what we call
95% confidence intervals.
In general, we construct 100% x (1-$\alpha$)
confidence intervals. So, if we want a 95%
CI, then $\alpha$ = 0.05. If we want a 90% CI, then
$\alpha$ = 0.10.
44 Estimating the Mean
[Figure: 95% confidence intervals from repeated samples, plotted against the true mean, $\mu$; axis: Age, 40 to 56]
45 Estimating the Mean
Yes: Of the collection of all 95% confidence
intervals that could be constructed from
repeated random samples of size n, 95% of
them will contain the parameter $\mu$.
Be careful: The probability that the parameter,
$\mu$, is contained in a particular confidence
interval is either 0 or 1, depending on the true
(unknown) value of the parameter, $\mu$.
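The repeated-sampling interpretation can be illustrated by simulation. This is a sketch under assumptions: the population is normal with an illustrative mean of 112 and standard deviation of 20.6, each sample has n = 10, and the critical value 2.262 for 9 df is the tabled value quoted in these slides. The function name, seed, and repetition count are hypothetical.

```python
import math
import random
import statistics

T_9_0975 = 2.262  # t critical value for 9 df and a 95% CI, as quoted in the slides

def coverage(mu=112.0, sigma=20.6, reps=4000, seed=4):
    """Fraction of 95% t intervals (n = 10) that contain the true mean."""
    n = 10  # fixed, because T_9_0975 is only valid for df = 9
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        margin = T_9_0975 * statistics.stdev(sample) / math.sqrt(n)
        if xbar - margin < mu < xbar + margin:
            hits += 1  # this particular interval either contains mu or it does not
    return hits / reps

rate = coverage()
print(rate)
```

Each individual interval either contains the mean or misses it; the 95% describes the long-run fraction of intervals that contain it.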
46 Estimating the Mean
A 100% x (1-$\alpha$) confidence interval (CI) for
the mean $\mu$ of a normal distribution with
an unknown variance is given by:
$\left(\bar{x} - t_{n-1,\,1-\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + t_{n-1,\,1-\alpha/2}\frac{s}{\sqrt{n}}\right)$
47 Estimating the Mean
In this interval, $s/\sqrt{n}$ is the standard error of the mean.
48 Estimating the Mean
The critical value $t_{n-1,\,1-\alpha/2}$ is read from Table 5, p 831.
49 Estimating the Mean
[Figure: Student's t distribution for sample size n = 10 (df = 9); horizontal axis t from -6 to 6; area 0.975 to the left of 2.262 and area 0.025 to the right]
$t_{n-1,\,1-\alpha/2} = t_{9,\,1-0.05/2} = t_{9,\,0.975} = 2.262$
50 Estimating the Mean
When the sample size is more than 30
(i.e., df $\geq$ 30) the standard normal
distribution is a good approximation to the
t distribution.
$t_{(df \geq 30)} \approx Z \sim N(0, 1)$
51 Estimating the Mean
An approximate 100% x (1-$\alpha$) confidence
interval (CI) for the mean $\mu$ of a normal
distribution with an unknown variance is
given by (n > 30):
$\left(\bar{x} - z_{1-\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2}\frac{s}{\sqrt{n}}\right)$
with $z_{1-\alpha/2}$ from Table 3, p 825
52 Estimating the Mean
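The length of a confidence interval is:
$2\,t_{n-1,\,1-\alpha/2}\frac{s}{\sqrt{n}}$ or $2\,z_{1-\alpha/2}\frac{s}{\sqrt{n}}$
Half of this length, $t_{n-1,\,1-\alpha/2}\frac{s}{\sqrt{n}}$ or $z_{1-\alpha/2}\frac{s}{\sqrt{n}}$, is the margin of error.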
53 Estimating the Mean
The length of a confidence interval
is affected by n, s, and $\alpha$.
It decreases as n increases.
It increases as s increases.
It decreases as $\alpha$ increases.
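These three effects can be checked numerically. The sketch below uses the large-sample (z-based) length so that the critical value can be computed with `statistics.NormalDist`; the function name and the input values are hypothetical, chosen with n > 30 so the z approximation applies.

```python
import math
from statistics import NormalDist

def ci_length_z(s, n, alpha):
    """Length of the approximate CI: 2 * z_{1 - alpha/2} * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * z * s / math.sqrt(n)

base = ci_length_z(s=21.7, n=40, alpha=0.05)
print("baseline:      ", base)
print("larger n:      ", ci_length_z(s=21.7, n=100, alpha=0.05))  # shorter
print("larger s:      ", ci_length_z(s=30.0, n=40, alpha=0.05))   # longer
print("larger alpha:  ", ci_length_z(s=21.7, n=40, alpha=0.10))   # shorter
```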
54
Estimating the Mean
$\left(\bar{x} - t_{n-1,\,1-\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + t_{n-1,\,1-\alpha/2}\frac{s}{\sqrt{n}}\right)$
$\bar{x} = 116.9$, $s = 21.7$
$t_{n-1,\,1-\alpha/2} = t_{10-1,\,1-0.05/2} = t_{9,\,0.975} = 2.262$
56 Estimating the Mean
$\left(\bar{x} - t_{n-1,\,1-\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + t_{n-1,\,1-\alpha/2}\frac{s}{\sqrt{n}}\right)$
$= \left(116.9 - 2.262\,\frac{21.7}{\sqrt{10}},\ 116.9 + 2.262\,\frac{21.7}{\sqrt{10}}\right)$
$= (116.9 - 15.5,\ 116.9 + 15.5)$
$= (101.4,\ 132.4)$
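The arithmetic of this example can be reproduced in a few lines. This is a minimal sketch: the function name `t_interval` is hypothetical, and the critical value is passed in by hand (2.262, taken from the table lookup) because the Python standard library has no t quantile function.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """CI for the mean: xbar -/+ t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)  # margin of error
    return xbar - margin, xbar + margin

lower, upper = t_interval(xbar=116.9, s=21.7, n=10, t_crit=2.262)
print(round(lower, 1), round(upper, 1))
```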
57 Estimating the Mean
Most of the slides were adapted from a lecture
created for this course by Mark F. Munsell in 2006.
Several slides were adapted from lectures I
created for BIST 501 at Georgetown University
using the text: Stats, Data, and Models by
DeVeaux, Velleman, and Bock.
58