Piligian, Mount Vernon Presbyterian School
Statistics BVD Chapter 23: Inference About Means
Key Vocabulary Terms - Summary
Student's t distribution: a family of distributions indexed by its degrees of freedom. Commonly known as t-distributions, they are unimodal, symmetric, and bell-shaped, but they have fatter tails than the normal model. As the degrees of freedom increase, t-distributions look more and more like the normal model. t-models correct for the extra variability introduced because we estimate the population standard deviation from the sample standard deviation. Unlike with proportions, knowing the mean of the population doesn't tell us anything about the standard deviation of the population, so when we use this estimate, we introduce extra variability, particularly for small sample sizes. The t-model describes the sampling distribution of sample means more accurately than a normal model does. For large sample sizes, t-models look very close to normal models (or z-models).
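To see the fatter tails and the convergence numerically, here is a minimal Python sketch that evaluates the t density at $x = 3$ for several degrees of freedom and compares it with the standard normal density. The helper `t_pdf` is a hypothetical name written for this example (not a library function); it uses the standard closed-form t density, and the normal model comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t with df degrees of freedom at x (hypothetical helper)."""
    # lgamma keeps the gamma-function ratio from overflowing when df is large
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

normal_at_3 = NormalDist().pdf(3.0)
heights = {df: t_pdf(3.0, df) for df in (2, 5, 30, 1000)}
for df, h in heights.items():
    # a ratio above 1 means the t-model puts more density out in the tail
    print(f"df={df:>4}: t density at 3 = {h:.5f}, ratio to normal = {h / normal_at_3:.2f}")
print(f"normal density at 3 = {normal_at_3:.5f}")
```

The ratio shrinks toward 1 as df grows, which is the "looks more and more like the normal model" behavior described above.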
Degrees of freedom: the number of values in the final calculation of a statistic that are free to vary. For the purposes of our class, the degrees of freedom (df) are calculated as the sample size minus 1: $df = n - 1$.
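A short Python sketch of why one degree of freedom is lost, using a small hypothetical sample: deviations from the sample mean must sum to zero, so once the mean is known the last deviation is determined by the others.

```python
import statistics

data = [4.0, 7.0, 5.0, 9.0, 10.0]  # hypothetical sample, n = 5
mean = statistics.mean(data)
deviations = [x - mean for x in data]

# Deviations from the mean always sum to zero, so once the first n - 1
# are known, the last one is forced: only n - 1 values are free to vary.
forced_last = -sum(deviations[:-1])
print(forced_last, deviations[-1])  # 3.0 3.0

df = len(data) - 1
print("df =", df)  # df = 4

# The sample standard deviation divides by df = n - 1 (statistics.stdev does too)
s = (sum(d * d for d in deviations) / df) ** 0.5
print(round(s, 4), round(statistics.stdev(data), 4))
```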
Sampling distribution model for means: With the appropriate assumptions, the sampling distribution model for means is a t-distribution with $(n - 1)$ degrees of freedom. The test statistic used for inference is $t_{n-1} = \frac{\bar{y} - \mu}{SE(\bar{y})}$. The standard deviation of the sample mean is estimated by the SE, where $SE(\bar{y}) = \frac{s}{\sqrt{n}}$.
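A minimal Python sketch of the SE formula with hypothetical values of $s$ and $n$; `standard_error` is an illustrative name, not a library function.

```python
import math

def standard_error(s: float, n: int) -> float:
    """Estimated SD of the sample mean: SE(ybar) = s / sqrt(n)."""
    return s / math.sqrt(n)

# Hypothetical sample SD of 12 with two different sample sizes
print(standard_error(12.0, 36))   # 2.0
print(standard_error(12.0, 144))  # 1.0: four times the data halves the SE
```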
One-sample t-interval for the mean: When certain assumptions and conditions are met, the confidence interval for the population mean $\mu$ is $\bar{y} \pm t^*_{n-1} \times SE(\bar{y})$, where $t^*_{n-1}$ is the critical value from the t-distribution model with $n - 1$ degrees of freedom corresponding to the particular confidence level that you specify.
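The interval can be sketched in Python with only the standard library. The helpers `t_cdf` and `inv_t` are hypothetical stand-ins (assumptions, not a library API) for a t-table or the calculator's invT: `t_cdf` integrates the t density numerically with Simpson's rule after the substitution $t = \sqrt{df}\tan\theta$, and `inv_t` inverts it by bisection. It assumes $df \ge 1$ and uses hypothetical summary statistics.

```python
import math

def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """P(T <= t) for Student's t with df >= 1, by Simpson's rule (hypothetical helper)."""
    # With t = sqrt(df) * tan(theta), the t density becomes
    # c * cos(theta)**(df - 1) on the finite range [0, atan(|t| / sqrt(df))].
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    upper = math.atan(abs(t) / math.sqrt(df))
    h = upper / steps
    total = 1.0 + math.cos(upper) ** (df - 1)  # Simpson endpoints
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    area = c * total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

def inv_t(p: float, df: float) -> float:
    """Critical value t with P(T <= t) = p, found by bisection (hypothetical helper)."""
    lo, hi = -1000.0, 1000.0  # assumed wide enough for the usual confidence levels
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(ybar: float, s: float, n: int, conf: float = 0.95):
    df = n - 1
    t_star = inv_t(1 - (1 - conf) / 2, df)  # upper critical value t*_{n-1}
    se = s / math.sqrt(n)
    me = t_star * se  # margin of error
    return ybar - me, ybar + me

lo, hi = t_interval(ybar=50.0, s=8.0, n=16)  # hypothetical summary statistics
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
```

Here $SE = 8/\sqrt{16} = 2$ and $t^*_{15} \approx 2.131$, so the interval is roughly $50 \pm 4.26$.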
One-sample t-test for the mean: a test of the null hypothesis $H_0: \mu = \mu_0$ by referring to the following test statistic: $t_{n-1} = \frac{\bar{y} - \mu_0}{SE(\bar{y})}$, where $SE(\bar{y}) = \frac{s}{\sqrt{n}}$.
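A matching sketch of the test statistic and its two-sided p-value, again assuming a hypothetical hand-rolled `t_cdf` (Simpson's rule, $df \ge 1$) in place of a t-table, with made-up summary statistics.

```python
import math

def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """P(T <= t) for Student's t with df >= 1, by Simpson's rule (hypothetical helper)."""
    # Substituting t = sqrt(df) * tan(theta) gives the density c * cos(theta)**(df - 1)
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    upper = math.atan(abs(t) / math.sqrt(df))
    h = upper / steps
    total = 1.0 + math.cos(upper) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    area = c * total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def one_sample_t_test(ybar: float, s: float, n: int, mu0: float):
    se = s / math.sqrt(n)         # SE(ybar) = s / sqrt(n)
    t = (ybar - mu0) / se         # t_{n-1} test statistic
    df = n - 1
    p_two_sided = 2 * (1 - t_cdf(abs(t), df))
    return t, df, p_two_sided

t, df, p = one_sample_t_test(ybar=104.0, s=10.0, n=25, mu0=100.0)  # hypothetical
print(f"t = {t:.2f}, df = {df}, two-sided p = {p:.4f}")
```

For a one-tailed alternative in the direction the data point, the p-value is half the two-sided value, because the t-model is symmetric.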
Note: in the (very, very, very) rare case when you know the population standard deviation ($\sigma$), you can use the Normal model and the z-statistic as your test statistic. Otherwise, use the t-model and the t-statistic as your test statistic. For all practical purposes, you'll never know $\sigma$, so you can't go wrong by using the t-model for inference for means! So just remember: z for proportions, and t for means!
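For the rare known-$\sigma$ case, a minimal sketch of the z-test using `statistics.NormalDist` from Python's standard library; the numbers are hypothetical and `one_sample_z_test` is an illustrative name. With $s$ in place of $\sigma$, the t-model's fatter tails would give a somewhat larger p-value for the same numbers.

```python
import math
from statistics import NormalDist

def one_sample_z_test(ybar: float, sigma: float, n: int, mu0: float):
    """z-test for a mean when the population SD sigma is (rarely) known."""
    z = (ybar - mu0) / (sigma / math.sqrt(n))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

z, p = one_sample_z_test(ybar=104.0, sigma=10.0, n=25, mu0=100.0)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")  # z = 2.00, two-sided p = 0.0455
```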
Assumptions and Conditions to Check for Inferences about Means
Independence Assumption: the data values should be independent. Conditions to check to verify the Independence Assumption:
-Randomization Condition: data come from an SRS or a randomized experiment.
-10% Condition: the sample size is less than 10% of the population. We rarely need to check the 10% condition for means, since our sample sizes are generally smaller than they were for proportions.
Normal Population Assumption: t-models won't work for data that are badly skewed, so we assume the data are from a population that follows a normal model. Conditions to check to verify the Normal Population Assumption:
-Nearly Normal Condition: the data come from a distribution that is unimodal and symmetric. Check by making a histogram. The smaller the sample size, the more important it is that the data are nearly normal.
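If you are working away from a calculator, a rough text histogram is enough to eyeball the Nearly Normal Condition. This sketch (hypothetical data and function name) only draws the picture; deciding whether it looks unimodal and symmetric is still your judgment.

```python
def text_histogram(data, bins=5):
    """Count data into equal-width bins and draw one row of stars per bin."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins or 1.0  # guard against all-equal data
    counts = [0] * bins
    for x in data:
        i = min(int((x - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    rows = [f"{lo + i * width:6.1f} | {'*' * c}" for i, c in enumerate(counts)]
    return counts, rows

sample = [12, 15, 15, 17, 18, 18, 18, 19, 20, 20, 21, 23, 24, 27]  # hypothetical
counts, rows = text_histogram(sample)
print("\n".join(rows))
```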
Key concepts to remember:
Statistical inference for means relies on the same concepts as it does for proportions, but the model is different. We still infer the population mean from the mean of a representative sample, but we use a t-distribution model rather than a normal distribution model.
The ruler for measuring variability in sample means is the standard error ($s$ divided by the square root of $n$). Use this ruler to find the margin of error, to construct confidence intervals, and to conduct hypothesis tests regarding means.
The t-model is a family of models, rather than just one model. The number of degrees of freedom dictates the shape of the t-distribution. As the number of degrees of freedom increases, the t-model converges to the normal model.
TI-84+ Tips:
To find t-model probabilities, use the tcdf function. Just as there is a normalcdf function on your TI-84+ to calculate the probability of getting a range of z-scores using a normal model, there is a tcdf function that does the same thing for t-scores using a t-model. However, you also have to specify the degrees of freedom, df. The syntax is tcdf(lower bound, upper bound, df).
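For checking calculator answers, here is a Python stand-in for tcdf. Both functions are hypothetical sketches, not a library API: `t_cdf` integrates the t density numerically (Simpson's rule after substituting $t = \sqrt{df}\tan\theta$, assuming $df \ge 1$), and `tcdf` mirrors the calculator's argument order.

```python
import math

def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """P(T <= t) for Student's t with df >= 1, by Simpson's rule (hypothetical helper)."""
    # Substituting t = sqrt(df) * tan(theta) gives the density c * cos(theta)**(df - 1)
    # on a finite angle range, so infinite bounds are handled too.
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    upper = math.atan(abs(t) / math.sqrt(df))
    h = upper / steps
    total = 1.0 + math.cos(upper) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    area = c * total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def tcdf(lower: float, upper: float, df: float) -> float:
    """Hypothetical stand-in for tcdf(lower, upper, df): P(lower < T < upper)."""
    return t_cdf(upper, df) - t_cdf(lower, df)

print(round(tcdf(-2.228, 2.228, 10), 3))  # about 0.95
print(tcdf(-math.inf, 0.0, 5))            # about 0.5
```

As on the calculator, a huge lower bound such as `-1e99` also works, since the substitution maps any bound onto a finite angle.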
To find a critical t-value, use the invT function. Again, you need to specify the degrees of freedom, so the syntax is invT(percentile, df). Remember to use the proper percentile to account for data in both tails! (For example, for a 95% confidence interval with ten degrees of freedom, 2.5% is left in each tail, so you would find the critical t-value via invT(0.975, 10).)
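A matching Python stand-in for invT, assuming the same kind of hypothetical Simpson's-rule `t_cdf` and inverting it by bisection. The search range of ±1000 is an assumption that comfortably covers the usual confidence levels.

```python
import math

def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """P(T <= t) for Student's t with df >= 1, by Simpson's rule (hypothetical helper)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    upper = math.atan(abs(t) / math.sqrt(df))  # t = sqrt(df) * tan(theta)
    h = upper / steps
    total = 1.0 + math.cos(upper) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    area = c * total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def inv_t(p: float, df: float) -> float:
    """Hypothetical stand-in for invT(p, df): the t with P(T <= t) = p."""
    lo, hi = -1000.0, 1000.0  # assumed search range
    for _ in range(60):  # each step halves the bracket
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 95% confidence, 10 df: 2.5% in each tail, so ask for the 0.975 percentile
print(round(inv_t(0.975, 10), 3))  # about 2.228
```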
To construct a confidence interval for means, go to STAT-TEST, choose T-Interval, and enter the appropriate values either for the raw data (in a list) or for the summary statistics x̄, Sx, and n (respectively, the sample mean, the sample standard deviation, and the sample size), along with the desired confidence level. Hit Calculate, and the calculator displays the CI! You must interpret the CI, and you must also check the nearly normal condition. Use STATPLOT to create a histogram of the data if you have the raw data.
To perform a hypothesis test for means, go to STAT-TEST, choose T-Test, and enter the appropriate values either for the raw data (in a list) or for the summary statistics μ0, x̄, Sx, and n (the hypothesized mean, the sample mean, the sample standard deviation, and the sample size), along with the type of hypothesis test (two-tailed, one-tailed upper tail, or one-tailed lower tail). Hit Calculate, and the test statistic is calculated, along with the appropriate p-value. Again, it's up to you to interpret the results.
Some pitfalls to avoid:
Don't confuse means with proportions. Sounds simple, but sometimes it isn't. For categorical data, you summarize with counts and calculate a proportion. For quantitative data, you summarize by calculating a sample mean.
Beware of multi-modality, skewed data, and outliers. For multi-modality, see whether splitting the data into separate groups solves the problem; for skewed data, try a re-expression; and for outliers, consider doing the analysis with and without the outliers.
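To do the analysis with and without an outlier, it helps to see how much one point moves the summary statistics. This sketch uses hypothetical data in which 58 is the suspected outlier; `summarize` is an illustrative name.

```python
import math
import statistics

def summarize(data):
    """Return (mean, sample SD, SE) for a list of numbers."""
    s = statistics.stdev(data)
    return statistics.mean(data), s, s / math.sqrt(len(data))

times = [21, 23, 22, 24, 20, 23, 22, 58]  # hypothetical; 58 is the suspected outlier
without = [x for x in times if x != 58]

for label, d in (("with outlier", times), ("without outlier", without)):
    mean, s, se = summarize(d)
    print(f"{label:>16}: n={len(d)}, mean={mean:.2f}, s={s:.2f}, SE={se:.2f}")
```

A single extreme value shifts the mean and inflates $s$ (and so the SE and margin of error), which is why reporting both versions is informative.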
Use t-models for means, and use z-models for proportions. Always!
Don't use inference methods for means when the assumptions aren't true! Beware of multi-modality, skewed data, and data with huge outliers. You may need to remove outliers before conducting your analysis. In any case, always check the nearly normal condition. Using a histogram is the best way. Discuss what you see!
Interpret your confidence interval correctly. The CI is about the mean of the population, not the means of samples, individual data points in the sample, or individual data points in the population. See pages 541-542 in the text. Best interpretation: "I am C% confident that the true mean value of the population is between xx and yy."
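The "C% confident" wording refers to the method: about C% of intervals built this way capture the true mean. A small simulation sketch makes that concrete, assuming a hypothetical normal population with mean 50 and SD 10, samples of size 10, and the table value $t^* = 2.262$ for 95% confidence with 9 df.

```python
import math
import random

random.seed(23)
TRUE_MEAN, TRUE_SD = 50.0, 10.0  # hypothetical normal population
N, REPS = 10, 10_000
T_STAR = 2.262  # table value of t* for 95% confidence with df = N - 1 = 9

hits = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    ybar = math.fsum(sample) / N
    s = math.sqrt(math.fsum((x - ybar) ** 2 for x in sample) / (N - 1))
    me = T_STAR * s / math.sqrt(N)  # margin of error = t* x SE
    # The interval is what varies from sample to sample; the true mean is fixed.
    if ybar - me <= TRUE_MEAN <= ybar + me:
        hits += 1

coverage = hits / REPS
print(f"{coverage:.3f} of the intervals captured the true mean")  # close to 0.95
```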