Stats CH 23 Notes

Gary Piligian
Mount Vernon Presbyterian School
Statistics

BVD Chapter 23 Inference About Means
Key Vocabulary Terms - Summary

Student's t-distribution: a family of distributions indexed by its degrees of freedom.
Commonly known as t-distributions, they are unimodal, symmetric, and
bell-shaped, but they have fatter tails than the normal model. As
the degrees of freedom increase, t-distributions look more and more like
the normal model.

t-models correct for the extra variability introduced because we estimate
the population standard deviation from the sample standard deviation.
Unlike with proportions, knowing the mean of the population doesn't tell us
anything about the standard deviation of the population, so when we use
this estimate, we introduce extra variability, particularly for small sample
sizes. The t-model more accurately describes the sampling distribution
model for sample means than a normal model does. For large sample
sizes, t-models look very close to normal models (or z-models).
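
To see the "fatter tails" claim in numbers, here is a minimal Python sketch (standard library only). It compares the height of the t density with the standard normal density at 3 standard errors from the center. The helper name t_pdf and the chosen df values are illustrative assumptions, not part of the text.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom at t."""
    # Work on the log scale so large df values do not overflow the gamma function.
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))

z = NormalDist()  # standard normal model (mean 0, SD 1)

# Illustrative df values: the ratio shrinks toward 1 as df grows.
for df in (2, 5, 30, 1000):
    ratio = t_pdf(3.0, df) / z.pdf(3.0)
    print(f"df={df:>4}: t density at 3 is {ratio:.2f} times the normal density")
```

The ratio stays above 1 (more area in the tails) and moves toward 1 as the degrees of freedom increase.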

Degrees of freedom: the number of values in the final calculation of a statistic that are
free to vary. For the purposes of our class, the degrees of freedom (df) are
calculated as the sample size minus 1:

$$df = n - 1$$

Sampling distribution model for means: With the appropriate assumptions, the
sampling distribution model for means is a t-distribution with $n - 1$
degrees of freedom. The test statistic used for inference is

$$t = \frac{\bar{y} - \mu}{SE(\bar{y})}$$

The standard deviation of the sample mean $\bar{y}$ is estimated by the standard error (SE), where

$$SE(\bar{y}) = \frac{s}{\sqrt{n}}$$
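
As a rough illustration of the standard error (s divided by the square root of n) and the t-statistic, here is a short Python sketch using only the standard library. The function name t_statistic and the data values are made up for the example.

```python
import math
import statistics

def t_statistic(sample, mu):
    """Return (t, SE, df) for a sample and a hypothesized population mean mu."""
    n = len(sample)
    y_bar = statistics.mean(sample)   # sample mean
    s = statistics.stdev(sample)      # sample SD (divides by n - 1)
    se = s / math.sqrt(n)             # standard error of the mean
    t = (y_bar - mu) / se             # how many SEs the sample mean is from mu
    return t, se, n - 1

# Made-up data, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
t, se, df = t_statistic(sample, mu=4.0)
print(f"t = {t:.3f}, SE = {se:.3f}, df = {df}")
```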

One-sample t-interval for the mean: When certain assumptions and conditions are
met, the confidence interval for the population mean is

$$\bar{y} \pm t^*_{n-1} \times SE(\bar{y})$$

where $t^*_{n-1}$ is the critical value from the t-distribution model with
$n - 1$ degrees of freedom corresponding to the particular confidence
level that you specify.
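
The interval arithmetic can be sketched in Python as follows. This is a minimal example under stated assumptions: the critical value is looked up separately (from a t-table or a calculator) and passed in, the summary statistics are hypothetical, and the function name t_interval is invented for illustration.

```python
import math

def t_interval(y_bar, s, n, t_star):
    """One-sample t-interval: sample mean plus or minus t* times the standard error."""
    se = s / math.sqrt(n)          # standard error of the mean
    margin = t_star * se           # margin of error
    return y_bar - margin, y_bar + margin

# Hypothetical summary statistics: n = 11 gives df = 10,
# and the 95% critical value for df = 10 is about 2.228.
low, high = t_interval(y_bar=50.0, s=6.0, n=11, t_star=2.228)
print(f"95% CI for the population mean: ({low:.2f}, {high:.2f})")
```

Passing the critical value in as an argument keeps the sketch short; the degrees of freedom still have to match the sample size (df = n - 1) when you look it up.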

One-sample t-test for the mean: a test of the null hypothesis $H_0: \mu = \mu_0$ by referring
to the following test statistic:

$$t_{n-1} = \frac{\bar{y} - \mu_0}{SE(\bar{y})}$$

where $SE(\bar{y}) = \frac{s}{\sqrt{n}}$.
Note: in the (very, very, very) rare case when you know the population
standard deviation ($\sigma$), you can use the Normal model with
the z-statistic as your test statistic. Otherwise use the t-model with
the t-statistic as your test statistic. For all practical purposes, you'll
never know $\sigma$, so you can't go wrong by using the t-model for inference for
means! So just remember: z for proportions, and t for means!
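
A one-sample t-test from summary statistics might be sketched like this in Python. The standard library has no t-distribution, so the sketch approximates the t-model's cumulative probability numerically (Simpson's rule); that numerical approach, the helper names t_cdf and one_sample_t_test, and the example numbers are all assumptions made for illustration.

```python
import math

def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) for a t-model with df degrees of freedom.

    Substituting t = sqrt(df) * tan(theta) turns the area under the t density
    into an integral of cos(theta)**(df - 1), which Simpson's rule handles well.
    """
    theta = math.atan(x / math.sqrt(df))
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    h = theta / steps                            # steps must be even for Simpson's rule
    total = 1.0 + math.cos(theta) ** (df - 1)    # the two endpoint terms
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    return 0.5 + const * total * h / 3

def one_sample_t_test(y_bar, s, n, mu0, tail="two"):
    """Return (t, df, p_value); tail is 'two', 'upper', or 'lower'."""
    se = s / math.sqrt(n)
    t = (y_bar - mu0) / se
    df = n - 1
    if tail == "lower":
        p = t_cdf(t, df)
    elif tail == "upper":
        p = 1 - t_cdf(t, df)
    else:
        p = 2 * (1 - t_cdf(abs(t), df))          # both tails
    return t, df, p

# Hypothetical summary statistics, for illustration only.
t, df, p = one_sample_t_test(y_bar=52.0, s=6.0, n=11, mu0=50.0)
print(f"t = {t:.3f} with df = {df}, two-tailed p-value = {p:.3f}")
```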

Assumptions and Conditions to check for Inferences about Means
Independence Assumption: the data values should be independent.
Conditions to check to verify the Independence Assumption:
-Randomization Condition: the data come from an SRS or a randomized experiment.
-10% Condition: the sample size is less than 10% of the population. We rarely
need to check the 10% condition for means, since our sample sizes are
generally smaller than they were for proportions.
Normal Population Assumption: t-models won't work for data that are badly skewed,
so we assume the data are from a population that follows a normal model.
Conditions to check to verify the Normal Population Assumption:
-Nearly Normal Condition: the data come from a distribution that is unimodal
and symmetric. Check by making a histogram. The smaller the sample size, the
more important it is that the data are nearly normal.
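
If you have the raw data on a computer rather than a calculator, a quick text histogram is enough to eyeball the Nearly Normal Condition. The sketch below is one simple way to do it in Python; the function name text_histogram, the bin width, and the data are made up for the example.

```python
from collections import Counter

def text_histogram(data, bin_width):
    """Return a text histogram: one row per bin, one star per data value."""
    counts = Counter(int(x // bin_width) for x in data)   # bin index for each value
    rows = []
    for b in range(min(counts), max(counts) + 1):         # include empty bins (gaps)
        low = b * bin_width
        rows.append(f"{low:>6} to {low + bin_width:<6} | " + "*" * counts[b])
    return "\n".join(rows)

# Made-up data, for illustration only.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(text_histogram(data, bin_width=1))
```

Look for one hump, rough symmetry, and no values sitting far from the rest.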

Key concepts to remember:
Statistical inference for means relies on the same concepts as it does for proportions,
but the model is different. We still infer the population mean from the mean of a
representative sample, but we use a t-distribution model rather than a normal
distribution model.
The ruler for measuring variability in sample means is the standard error (s divided
by the square root of n). Use this ruler to find the margin of error, to construct
confidence intervals and to conduct hypothesis tests regarding means.
The t-model is a family of models, rather than just one model. The number of degrees
of freedom dictates the shape of the t-distribution. As the number of degrees of freedom
increases, the t-model converges to the normal model.

TI-84+ Tips:
To find t-model probabilities, use the tcdf function. Just as there is a normalcdf
function on your TI-84+ to calculate the probability of getting a range of z-scores using
a normal model, there is a tcdf function that does the same thing for t-scores using a
t-model. However, you also have to specify the degrees of freedom df. The syntax is
tcdf(lower bound, upper bound, df).

To find a critical t-value, use the invT function. Again, you need to specify the degrees
of freedom, so the syntax is invT(percentile, df). Remember to use the proper
percentile to account for the area in both tails! For example, for a 95% confidence
interval with ten degrees of freedom, you would find the critical t-value via
invT(0.975, 10).
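
For readers working in Python instead of on a calculator, the two functions above can be imitated roughly as follows. This is only a sketch: the names t_cdf, tcdf, and inv_t are hypothetical helpers (not calculator or library functions), the area is found numerically with Simpson's rule, and the critical value is found by bisection.

```python
import math

def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) for a t-model, using t = sqrt(df) * tan(theta)."""
    theta = math.atan(x / math.sqrt(df))         # stays finite even for x = -1e99
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    h = theta / steps
    total = 1.0 + math.cos(theta) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    return 0.5 + const * total * h / 3

def tcdf(lower, upper, df):
    """Area under the t-model between lower and upper (same argument order as the text)."""
    return t_cdf(upper, df) - t_cdf(lower, df)

def inv_t(area, df, tol=1e-8):
    """t-value with the given area to its left, found by bisection."""
    if not 0 < area < 1:
        raise ValueError("area must be strictly between 0 and 1")
    lo, hi = -1e6, 1e6                           # assumed search range
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t_star = inv_t(0.975, 10)                        # the 95% example from the text
print(f"critical value: {t_star:.3f}")
print(f"area between -t* and t*: {tcdf(-t_star, t_star, 10):.3f}")
```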

To construct a confidence interval for means, go to STAT-TEST, choose T-Interval, and
enter the appropriate values either for the raw data (in a list), or the summary statistics
$\bar{x}$, $S_x$, and $n$ (respectively, the sample mean, the sample standard deviation, and the
sample size) and the desired confidence level. Hit Calculate, and the calculator displays
the CI! You must interpret the CI, and you must also check the nearly normal condition.
Use STATPLOT to create a histogram of the data if you have the raw data.

To perform a hypothesis test for means, go to STAT-TEST, choose T-Test, and enter the
appropriate values either for the raw data (in a list) or the summary statistics
$\mu_0$, $\bar{x}$, $S_x$, and $n$ (the hypothesized mean, the sample mean, the sample standard
deviation, and the sample size), along with the type of hypothesis test (two-tailed,
one-tailed upper tail, or one-tailed lower tail). Hit Calculate and the test statistic is
calculated, along with the appropriate p-value. Again, it's up to you to interpret the results.

Some pitfalls to avoid:
Don't confuse means with proportions. Sounds simple, but sometimes it isn't. For
categorical data, you summarize with counts and calculate a proportion. For quantitative
data, you summarize by calculating a sample mean.
Beware of multi-modality, skewed data, and outliers. For multi-modality, try to see if
separate data groups solve the problem; for skewed data, try a re-expression; and for
outliers, consider doing the analysis with and without the outliers.
Use t-models for means, and use z-models for proportions. Always!
Don't use inference methods for means when the assumptions aren't true! Beware of
multi-modality, skewed data, and data with huge outliers. You may need to remove
outliers before conducting your analysis. In any case, always check the nearly normal
condition. Using a histogram is the best way. Discuss what you see!
Interpret your confidence interval correctly. The CI is about the mean of the population,
not the means of samples, individual data points in the sample, or individual data points in
the population. See pages 541-542 in the text. Best interpretation: "I am C% confident that the
true mean value of the population is between xx and yy."
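
For the outlier pitfall above, "doing the analysis with and without the outliers" can be as simple as computing the summary statistics twice. The Python sketch below shows the idea; the data, the choice of which value to treat as the suspect, and the function name summarize are all made up for illustration.

```python
import math
import statistics

def summarize(data):
    """Return (sample mean, sample SD, standard error of the mean)."""
    n = len(data)
    y_bar = statistics.mean(data)
    s = statistics.stdev(data)
    return y_bar, s, s / math.sqrt(n)

# Made-up data; 45 is the value we suspect is an outlier.
data = [12, 14, 15, 15, 16, 17, 18, 45]
suspect = 45
without = [x for x in data if x != suspect]

for label, values in (("with outlier", data), ("without outlier", without)):
    y_bar, s, se = summarize(values)
    print(f"{label:>15}: n = {len(values)}, mean = {y_bar:.2f}, s = {s:.2f}, SE = {se:.2f}")
```

If the two sets of results lead to different conclusions, report both and discuss the outlier rather than quietly dropping it.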
