Statistics 221 Summary of Material

James McQueen, Autumn 2012

This guide is meant to be a brief summary of important results, including formulas and the occasional definition. It is not meant to cover every topic in the book.
This notation will be used throughout this summary, so I will define it here:
\[ \sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n. \]
Additionally, I will denote $P(A \text{ and } B) = P(A \cap B)$ and $P(A \text{ or } B) = P(A \cup B)$, and let $Z$ denote a $N(0, 1)$ random variable.
Chapter 2: Descriptive Statistics
The Mean: ($\bar{X}$) (also called the average) is a measure of central tendency of the data. It is highly affected by outliers.
\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \]
The Standard Deviation: ($S$) is a measure of spread of the data. $S^2$ is called the variance.
\[ S = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2} \]
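As a quick illustration, the two formulas above can be computed directly. This is a minimal sketch; the function names and the data values are made up for the example and do not come from the course.

```python
import math

def sample_mean(xs):
    """Sample mean: the sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_sd(xs):
    """Sample standard deviation, using the n - 1 divisor."""
    n = len(xs)
    xbar = sample_mean(xs)
    variance = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
print(sample_mean(data))          # 5.0
print(sample_sd(data))            # square root of 32/7

# A single outlier pulls the mean noticeably.
print(sample_mean(data + [100]))
```

Appending one extreme value to the list shows how strongly the mean reacts to outliers.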
Chapter 3: The Normal Distribution
Skewness: A distribution is skewed right if its long tail is on the right; in that case the mean will typically be to the right of the median.
For skewed left, simply reverse the above.
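Z-Score: The Z-score is a measure of distance from the mean, measured in units of standard deviation.
\[ \text{Z-Score} = \frac{x - \mu}{\sigma} \]
If any random variable $X$ has a normal distribution (not necessarily standard normal) with mean $\mu$ and standard deviation $\sigma$, then $\frac{X - \mu}{\sigma}$ has the standard normal distribution and we can calculate the following probabilities:
\[ P(X < a) = P\left(Z < \frac{a - \mu}{\sigma}\right) \quad\text{and}\quad P(X > a) = P\left(Z > \frac{a - \mu}{\sigma}\right) = 1 - P\left(Z < \frac{a - \mu}{\sigma}\right) \]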
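The standardization step can be sketched in code. The example below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the helper names and the numbers (mean 100, standard deviation 15) are invented for illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Distance of x from the mean, in units of standard deviation."""
    return (x - mu) / sigma

def prob_less_than(a, mu, sigma):
    """P(X < a) for a normal X, computed through the standard normal Z."""
    return NormalDist().cdf(z_score(a, mu, sigma))

def prob_greater_than(a, mu, sigma):
    """P(X > a) = 1 - P(X < a)."""
    return 1 - prob_less_than(a, mu, sigma)

# Hypothetical example: X is normal with mean 100 and standard deviation 15.
print(z_score(130, 100, 15))            # 2.0
print(prob_less_than(130, 100, 15))     # about 0.977
print(prob_greater_than(130, 100, 15))  # about 0.023
```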
Chapter 4: Scatterplots and Correlation
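For variables X and Y, each with n observations, the correlation is defined as:
\[ r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{S_X}\right)\left(\frac{Y_i - \bar{Y}}{S_Y}\right) = \frac{1}{S_X S_Y} \left( \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \right) \]
where $S_X$ and $S_Y$ are the standard deviations of X and Y. Correlation is a measure of linear relationship. Remember: correlation does not imply causation!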
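A small sketch of the correlation formula, written as the average product of standardized values. The function name and the data are hypothetical.

```python
import math

def correlation(xs, ys):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    s_x = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    total = sum(((x - xbar) / s_x) * ((y - ybar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

xs = [1, 2, 3, 4, 5]        # made-up data
ys = [2, 4, 5, 4, 5]
print(correlation(xs, ys))  # about 0.775
```

Points that lie exactly on an increasing line give r = 1, and points on a decreasing line give r = -1.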
Chapter 5: Regression
Let X and Y be paired variables with n observations each. Suppose we wish to fit the following linear model:
\[ Y = a + bX \]
If we wish to minimize the squared error of our line, then we calculate a and b as follows:
\[ b = r\,\frac{S_Y}{S_X} \quad\text{and}\quad a = \bar{Y} - b\bar{X} \]
The residuals are defined as $Y_i - \hat{Y}_i$, where $\hat{Y}_i$ is the predicted value of $Y_i$ for the corresponding $X_i$.
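The slope, intercept, and residuals can be computed as in the sketch below. The names `fit_line` and `residuals` and the data are assumptions made for this example.

```python
import math

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b X, using b = r * S_Y / S_X."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    s_x = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    r = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    a = ybar - b * xbar
    return a, b

def residuals(xs, ys, a, b):
    """Observed Y minus predicted Y for each observation."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(a, b)           # roughly 2.2 and 0.6
print(residuals(xs, ys, a, b))
```

The residuals of a least-squares line sum to zero (up to rounding), which is a handy check on the arithmetic.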
Chapter 10: Probability
Let the sample space be S and let A and B be events in the sample space. Let P(A) denote the probability of the event A
occurring. Then the following are always true:
$0 \le P(A) \le 1$
$P(S) = 1$
$P(A \cup B) = P(A) + P(B)$ if A and B are mutually exclusive (disjoint)
$P(A \text{ does not occur}) = P(A^C) = 1 - P(A)$
Chapter 11: Sampling Distributions (Central Limit Theorem)
The Law of Large Numbers: The sample mean $\bar{X}$ approaches the population mean $\mu$ as the sample size increases.
The Central Limit Theorem: When n is large, the sample mean $\bar{X}$ has approximately a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, denoted as $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.
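A simulation can make the Central Limit Theorem concrete. The sketch below draws repeated samples from a skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen here only for illustration) and compares the spread of the sample means with $\sigma/\sqrt{n}$. The function name, the seed, and the sample sizes are all assumptions.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n from an Exponential(1) population
    and return the list of their sample means."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

n = 50
means = simulate_sample_means(n=n, reps=4000)

print(statistics.mean(means))   # should be close to the population mean, 1
print(statistics.stdev(means))  # should be close to sigma / sqrt(n)
print(1 / math.sqrt(n))         # sigma / sqrt(n) with sigma = 1
```

Even though the population is strongly skewed, a histogram of `means` would look roughly bell shaped for a sample size this large.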
Chapter 12: General Probability Rules
For any two events:
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
When two events are mutually exclusive we have that
\[ P(A \cap B) = 0 \]
If we know that A has occurred and we want to know the probability of B also occurring, we call this $P(B|A)$, and it is calculated as:
\[ P(B|A) = \frac{P(A \cap B)}{P(A)} \]
Two events are independent if knowing one tells you nothing about the other, denoted by:
\[ P(B|A) = P(B) \]
In general, we have that:
\[ P(A \cap B) = P(A)P(B|A) \]
Therefore if two events are independent we have:
\[ P(A \cap B) = P(A)P(B) \]
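These rules can be checked on a small sample space. The sketch below uses two fair dice, with events chosen purely for illustration; exact fractions avoid rounding issues.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
S = set(product(range(1, 7), repeat=2))

def P(event):
    """Probability of an event (a subset of S) under equally likely outcomes."""
    return Fraction(len(event), len(S))

A = {o for o in S if o[0] == 6}          # first die shows 6
B = {o for o in S if o[0] + o[1] >= 10}  # sum is at least 10
C = {o for o in S if o[1] % 2 == 0}      # second die is even

p_union = P(A) + P(B) - P(A & B)         # general addition rule
p_b_given_a = P(A & B) / P(A)            # conditional probability

print(p_union)                           # 1/4
print(p_b_given_a, P(B))                 # 1/2 versus 1/6, so A and B are not independent
print(P(A & C) == P(A) * P(C))           # True, so A and C are independent
```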
Chapter 14: Introduction to Inference
Assume we have an SRS from a population with exactly $N(\mu, \sigma^2)$ distribution with known $\sigma$ and we wish to estimate $\mu$.
A C level confidence interval for $\mu$ is calculated by:
\[ \bar{X} \pm z^* \frac{\sigma}{\sqrt{n}} \]
where $z^*$ is the number such that
\[ P(Z > z^*) = \frac{1 - C}{2} = \frac{\alpha}{2} \]
Here are some common values of $z^*$:
C     90%    95%    99%
z*    1.645  1.960  2.576
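The critical value and the interval can be computed rather than looked up. This sketch assumes Python 3.8 or later for `statistics.NormalDist`; the function names and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_star(C):
    """Critical value z* with P(Z > z*) = (1 - C) / 2."""
    return NormalDist().inv_cdf(1 - (1 - C) / 2)

def z_interval(xbar, sigma, n, C):
    """Confidence interval for the mean when sigma is known."""
    margin = z_star(C) * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

for C in (0.90, 0.95, 0.99):
    print(C, round(z_star(C), 3))     # 1.645, 1.96, 2.576

# Hypothetical example: sample mean 50, sigma 10, n 25, 95% confidence.
print(z_interval(50, 10, 25, 0.95))   # roughly (46.08, 53.92)
```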

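When testing the null hypothesis $H_0: \mu = \mu_0$ at significance level $\alpha$, calculate the following Z-statistic:
\[ Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \]
Writing $\hat{Z}$ for the observed value of this statistic, calculate the p-value by finding:
$P(Z > \hat{Z})$ if $H_a: \mu > \mu_0$
$P(Z < \hat{Z})$ if $H_a: \mu < \mu_0$
$2P(Z > |\hat{Z}|)$ if $H_a: \mu \neq \mu_0$
Then if p-value $< \alpha$ we reject the null hypothesis.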
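The test can be sketched as a single function. Note that the two-sided p-value counts both tails, so it is twice the one-tail area beyond the absolute value of the statistic. The function name, the `alternative` labels, and the example numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative):
    """Return (z, p_value). `alternative` is "greater", "less", or "two-sided"."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    cdf = NormalDist().cdf
    if alternative == "greater":
        p = 1 - cdf(z)
    elif alternative == "less":
        p = cdf(z)
    elif alternative == "two-sided":
        p = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p

# Hypothetical example: H0: mu = 50, sample mean 52, sigma 10, n 100.
z, p = z_test(52, 50, 10, 100, "two-sided")
alpha = 0.05
print(z, p)       # z = 2.0, p about 0.0455
print(p < alpha)  # True, so reject H0 at the 5% level
```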
Chapter 15: Interpretation of Inference
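If you want to have a C level confidence interval of the form $(\bar{X} - m, \bar{X} + m)$, where $m$ is the desired margin of error, then you need a sample of size:
\[ n = \left( \frac{z^* \sigma}{m} \right)^2 \]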
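In practice the formula rarely gives a whole number, so the result is rounded up to the next integer. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
import math

def required_sample_size(z_star, sigma, margin):
    """Smallest whole n for which z* * sigma / sqrt(n) is at most `margin`."""
    return math.ceil((z_star * sigma / margin) ** 2)

# Hypothetical example: 95% confidence (z* = 1.960), sigma = 10, margin of error 2.
n = required_sample_size(1.960, 10, 2)
print(n)  # 97, since (1.96 * 10 / 2)^2 = 96.04
```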
Type I Error: Incorrectly rejecting the null hypothesis (probability $\alpha$).
Type II Error: Incorrectly failing to reject the null hypothesis (probability $1 - \text{power}$).
Chapter 17: One sample t-test
Assume we have a simple random sample of data that either comes from an exactly normal distribution, or comes from a symmetric single-peaked distribution with a large sample (n > 35), and we wish to estimate $\mu$.
A C level confidence interval for $\mu$ can be calculated by:
\[ \bar{X} \pm t^* \frac{S}{\sqrt{n}} \]
where $t^*$ is the appropriate value from the C column of the t-table at row $n - 1$ (degrees of freedom).
We can perform a hypothesis test of $H_0: \mu = \mu_0$ by calculating a t-statistic (with $n - 1$ degrees of freedom) as follows:
\[ t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \]
Then find the p-value by locating the two values in row $n - 1$ of the t-distribution table that bound the observed statistic $\hat{t}$, and choosing the appropriate one-sided or two-sided p-value listed.
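The t-statistic and interval are easy to compute once $t^*$ has been read from the table. The sketch below takes $t^*$ as an input because the Python standard library has no t distribution; the function names and data are made up, and 2.365 is the usual table value for 95% confidence with 7 degrees of freedom.

```python
import math
import statistics

def t_statistic(xs, mu0):
    """One-sample t statistic for H0: mu = mu0 (n - 1 degrees of freedom)."""
    n = len(xs)
    return (statistics.mean(xs) - mu0) / (statistics.stdev(xs) / math.sqrt(n))

def t_interval(xs, t_star):
    """Confidence interval for mu; t_star comes from a t table, row n - 1."""
    n = len(xs)
    xbar = statistics.mean(xs)
    margin = t_star * statistics.stdev(xs) / math.sqrt(n)
    return xbar - margin, xbar + margin

data = [2, 4, 4, 4, 5, 5, 7, 9]         # made-up sample, n = 8, so 7 degrees of freedom
print(t_statistic(data, mu0=3))         # about 2.646
print(t_interval(data, t_star=2.365))   # roughly (3.21, 6.79)
```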
Matched Pairs t-test: Say we have matched pair data $(Y_i, Z_i)$ and we wish to test $H_0: \mu_Y - \mu_Z = 0$ or create a confidence interval for $\mu_Y - \mu_Z$. Then let $X_i = Y_i - Z_i$ and continue as above.
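A short sketch of the matched pairs reduction: take the differences, then apply the one-sample t formula to them. The function name and the paired data are hypothetical.

```python
import math
import statistics

def matched_pairs_t(ys, zs):
    """t statistic and degrees of freedom for H0: mu_Y - mu_Z = 0,
    computed on the differences X_i = Y_i - Z_i."""
    diffs = [y - z for y, z in zip(ys, zs)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

ys = [12, 15, 11, 14, 13]  # made-up paired measurements
zs = [10, 14, 10, 11, 12]
t, df = matched_pairs_t(ys, zs)
print(t, df)               # t is about 4.0 with 4 degrees of freedom
```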
Chapter 18: Two sample t-test
Assume we have two independent samples $X_1$ and $X_2$ meeting the assumptions in chapter 17.
Define the Standard Error (SE) as:
\[ SE = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \]
Then we define our C level confidence interval for $\mu_1 - \mu_2$ as:
\[ (\bar{X}_1 - \bar{X}_2) \pm t^* \, SE \]
And we find our t-statistic for the null hypothesis $H_0: \mu_1 = \mu_2$ by:
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{SE} \]
And we continue finding p-values as described in chapter 17, with degrees of freedom equal to $\min(n_1 - 1, n_2 - 1)$.
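The two-sample calculation can be sketched as follows. The degrees of freedom here use the conservative convention of the smaller sample size minus one, which is an assumption of this example; the function name and data are made up.

```python
import math
import statistics

def two_sample_t(x1, x2):
    """Return (t, SE, df) for H0: mu_1 = mu_2 with unpooled variances."""
    n1, n2 = len(x1), len(x2)
    se = math.sqrt(statistics.variance(x1) / n1 + statistics.variance(x2) / n2)
    t = (statistics.mean(x1) - statistics.mean(x2)) / se
    df = min(n1 - 1, n2 - 1)  # conservative degrees of freedom (assumed convention)
    return t, se, df

x1 = [5, 7, 9, 11, 13]  # made-up samples
x2 = [4, 5, 6, 7, 8]
t, se, df = two_sample_t(x1, x2)
print(t, se, df)        # t about 1.897, SE about 1.581, df = 4
```

A confidence interval for the difference in means then follows as the difference of sample means plus or minus $t^*$ times `se`, with $t^*$ read from the t table at row `df`.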
Chapter 19: One sample proportion test
Assume we have a sample of size n with P successes, from which we create the sample proportion:
\[ \hat{p} = \frac{P}{n} \]
And we wish to estimate the true proportion p of a population.
Define our Standard Error as such:
\[ SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]
Then we can calculate a C level confidence interval for p as such (where $z^*$ is found as in chapter 14):
\[ \hat{p} \pm z^* \, SE \]
We can perform a hypothesis test for $H_0: p = p_0$ by computing the Z-statistic (then proceed as in chapter 14):
\[ Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} \]
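A minimal sketch of both one-proportion calculations. Note that the interval uses the sample proportion in its standard error, while the test statistic uses the hypothesized value $p_0$. Function names and numbers are invented for the example.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, C):
    """Confidence interval for p, with SE based on the sample proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z_star = NormalDist().inv_cdf(1 - (1 - C) / 2)
    return p_hat - z_star * se, p_hat + z_star * se

def proportion_z(successes, n, p0):
    """Z statistic for H0: p = p0, with the standard error based on p0."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Hypothetical example: 60 successes in 100 trials.
print(proportion_interval(60, 100, 0.95))  # roughly (0.504, 0.696)
print(proportion_z(60, 100, 0.5))          # about 2.0
```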
Chapter 20: Two sample proportion test
Assume we have two population samples of size $n_1$ and $n_2$, with $P_1$ successes in the first group and $P_2$ successes in the second group, from which we create the sample proportions:
\[ \hat{p}_1 = \frac{P_1}{n_1} \qquad \hat{p}_2 = \frac{P_2}{n_2} \]
And we wish to compare the true proportions $p_1$ and $p_2$.
Define our Standard Error as such:
\[ SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \]
Then we can calculate a C level confidence interval for $p_1 - p_2$ as such (where $z^*$ is found as in chapter 14):
\[ (\hat{p}_1 - \hat{p}_2) \pm z^* \, SE \]
We can perform a hypothesis test for $H_0: p_1 = p_2$ first by defining the pooled proportion:
\[ \hat{p} = \frac{P_1 + P_2}{n_1 + n_2} \]
and then by computing the Z-statistic (then proceed as in chapter 14):
\[ Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
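The sketch below separates the two standard errors: the unpooled one for the confidence interval and the pooled one inside the test statistic. Function names and counts are hypothetical.

```python
import math

def two_proportion_se(s1, n1, s2, n2):
    """Unpooled standard error, used for the confidence interval."""
    p1, p2 = s1 / n1, s2 / n2
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def two_proportion_z(s1, n1, s2, n2):
    """Z statistic for H0: p_1 = p_2, using the pooled proportion."""
    p1, p2 = s1 / n1, s2 / n2
    pooled = (s1 + s2) / (n1 + n2)
    return (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

# Hypothetical example: 60 of 100 successes versus 40 of 100 successes.
print(two_proportion_se(60, 100, 40, 100))  # about 0.0693
print(two_proportion_z(60, 100, 40, 100))   # about 2.83
```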
Chapter 22: Chi-Square Test
Assume we have two categorical variables and we calculate a two-way table of counts. We may wish to test whether or not the two categorical variables are independent, i.e. $H_0$: There is no relationship between the two categorical variables. We can do so by performing a $\chi^2$ (or Chi-Squared) test.
First compute the expected count for each entry in the two-way table by the formula:
\[ \text{Expected Count}_{\text{Cell}} = \left(\frac{\text{Row Total}}{\text{Grand Total}}\right)\left(\frac{\text{Column Total}}{\text{Grand Total}}\right) \times \text{Grand Total} = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}} \]
where Row Total is the total count for the row of the specified cell, Column Total is the total for the specified cell's column, and Grand Total is the total number of observations.
Then calculate the $\chi^2$ statistic, summing over all cells of the table:
\[ \chi^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} \]
You then find the p-value from the $\chi^2$ distribution table with $(r - 1)(c - 1)$ degrees of freedom, where r is the number of rows and c is the number of columns.
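The expected counts and the statistic can be computed from a table stored as a list of rows. This is a sketch with made-up counts and hypothetical function names; the p-value would still be read from a chi-square table.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

observed = [[10, 20],
            [20, 10]]             # made-up 2 x 2 table of counts
print(expected_counts(observed))  # every expected count is 15.0
print(chi_square(observed))       # statistic about 6.67 with 1 degree of freedom
```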
Chapter 23: Inference for Regression
Say we have fit a regression line $Y = a + bX$ as in Chapter 5. If we believe that Y and X come from simple random samples of normally distributed populations, that the true relationship is of the form $Y = \alpha + \beta X$, and that the variance of Y does not vary with X, then we can perform inference for regression.
We define the Standard Error of Regression ($s$) as follows:
\[ s = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2} \]
We define the Standard Error of the estimated parameter b as follows:
\[ SE_b = \frac{s}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = \frac{s}{\sqrt{n-1}\, S_X} \]
We can calculate a C level confidence interval for the true value $\beta$ as such, where $t^*$ is as defined in chapter 17 with $n - 2$ degrees of freedom:
\[ b \pm t^* \, SE_b \]
We can perform a hypothesis test of $H_0: \beta = 0$ by calculating the t-statistic as follows and continuing as in chapter 17:
\[ t = \frac{b}{SE_b} \]
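Define the following Standard Errors for a chosen value $x^*$ of X:
\[ SE_{\hat{\mu}} = s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{X})^2}{\sum (X_i - \bar{X})^2}} \]
\[ SE_{\hat{Y}} = s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{X})^2}{\sum (X_i - \bar{X})^2}} \]
Then we define the C level confidence interval for the mean response $\mu_y$ when $X = x^*$, with $\hat{Y} = a + b x^*$, by:
\[ \hat{Y} \pm t^* \, SE_{\hat{\mu}} \]
And the C level prediction interval for the predicted value $\hat{Y}$ when $X = x^*$ by:
\[ \hat{Y} \pm t^* \, SE_{\hat{Y}} \]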
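The quantities in this chapter can be gathered in one function. This is a sketch under stated assumptions: the function name, the dictionary keys, and the data are hypothetical, and the slope is computed as the sum of cross products over the sum of squared X deviations, which is algebraically the same as $b = r S_Y / S_X$ from Chapter 5.

```python
import math

def regression_inference(xs, ys, x_star):
    """Slope, intercept, and the standard errors used for regression inference."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)  # equals (n - 1) * S_X^2
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    # Standard error of regression, with n - 2 degrees of freedom.
    s = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    se_b = s / math.sqrt(sxx)
    shared = 1 / n + (x_star - xbar) ** 2 / sxx  # term common to both intervals
    return {
        "a": a,
        "b": b,
        "s": s,
        "se_b": se_b,
        "t": b / se_b,                           # statistic for H0: beta = 0
        "y_hat": a + b * x_star,
        "se_mean": s * math.sqrt(shared),        # for the mean response interval
        "se_pred": s * math.sqrt(1 + shared),    # for the prediction interval
    }

xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 4, 5, 4, 5]
result = regression_inference(xs, ys, x_star=3)
print(result)
```

Each interval is then `y_hat` (or `b`) plus or minus $t^*$ times the matching standard error, with $t^*$ read from the t table at $n - 2$ degrees of freedom. The prediction interval is always wider than the interval for the mean response because of the extra 1 under the square root.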