$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n.$$
Additionally, I will denote $P(A \text{ and } B) = P(A \cap B)$ and $P(A \text{ or } B) = P(A \cup B)$, and let $Z$ denote a $N(0, 1)$ random variable.
Chapter 2: Descriptive Statistics
The Mean ($\bar{X}$), also called the average, is a measure of central tendency of the data. It is highly affected by outliers.
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
The Standard Deviation ($S$) is a measure of the spread of the data. $S^2$ is called the variance.
$$S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
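As a minimal sketch, the two formulas above can be checked in Python. The data values are illustrative, and the comparison uses the standard library's `statistics.stdev`, which also divides by $n - 1$.

```python
import math
import statistics


def sample_mean(xs):
    """Sample mean: X-bar = (1/n) * sum of X_i."""
    return sum(xs) / len(xs)


def sample_sd(xs):
    """Sample standard deviation S, dividing by n - 1."""
    n = len(xs)
    xbar = sample_mean(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
print(sample_mean(data))  # 5.0
print(sample_sd(data))
# statistics.stdev uses the same n - 1 divisor, so the results agree.
print(math.isclose(sample_sd(data), statistics.stdev(data)))  # True
```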
Chapter 3: The Normal Distribution
Skewness: A distribution is skewed right if its longer tail is on the right; in that case the mean is usually to the right of the median. For a left-skewed distribution, simply reverse the above.
Z-Score: The z-score is a measure of distance from the mean, measured in units of standard deviation.
$$\text{Z-Score} = \frac{x - \mu}{\sigma}$$
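If a random variable $X$ has a normal distribution with mean $\mu$ and standard deviation $\sigma$ (not necessarily standard normal), then $\frac{X - \mu}{\sigma}$ has the standard normal distribution $N(0, 1)$, and
$$P(X > a) = P\left(Z > \frac{a - \mu}{\sigma}\right) = 1 - P\left(Z < \frac{a - \mu}{\sigma}\right).$$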
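A small sketch of this standardization, assuming Python 3.8+ where `statistics.NormalDist` provides the standard normal CDF in place of a Z table. The values $\mu = 100$, $\sigma = 15$, $a = 130$ are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)


def prob_greater(a, mu, sigma):
    """P(X > a) for X normal with mean mu and sd sigma, by standardizing."""
    z = (a - mu) / sigma      # z-score of a
    return 1 - Z.cdf(z)       # P(Z > z) = 1 - P(Z < z)


# X normal with mean 100 and sd 15: P(X > 130) = P(Z > 2)
p = prob_greater(130, 100, 15)
print(round(p, 4))  # 0.0228
```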
Chapter 4: Scatterplots and Correlation
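For variables $X$ and $Y$, each with $n$ observations, the correlation is defined as:
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{S_X}\right)\left(\frac{Y_i - \bar{Y}}{S_Y}\right) = \frac{1}{S_X S_Y}\left(\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})\right)$$
It is a measure of the strength of the linear relationship. Remember: correlation does not imply causation!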
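The first form of the correlation formula, an average product of z-scores, translates directly into Python. This is an illustrative sketch with made-up data; a perfectly linear $Y$ gives $r = 1$.

```python
import math


def correlation(xs, ys):
    """r = (1/(n-1)) * sum of (z-score of X_i) * (z-score of Y_i)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sx = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    return sum(((x - xbar) / sx) * ((y - ybar) / sy)
               for x, y in zip(xs, ys)) / (n - 1)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correlation(x, y), 4))                       # 0.7746
print(round(correlation(x, [2 * v + 1 for v in x]), 4))  # 1.0 (exact line)
```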
Chapter 5: Regression
Let $X$ and $Y$ be paired observations, with $n$ observations each. Suppose we wish to fit the following linear model:
$$Y = a + bX$$
To minimize the sum of squared errors of our line (least squares), we calculate $a$ and $b$ as follows:
$$b = r\,\frac{S_Y}{S_X} \quad\text{and}\quad a = \bar{Y} - b\bar{X}$$
The residuals are defined as $Y_i - \hat{Y}_i$, where $\hat{Y}_i = a + bX_i$ is the predicted value of $Y_i$ for the corresponding $X_i$.
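A minimal least-squares sketch that follows the formulas literally ($b = r S_Y / S_X$, $a = \bar{Y} - b\bar{X}$) and then computes the residuals. The data are illustrative; note that least-squares residuals sum to zero.

```python
import math


def least_squares(xs, ys):
    """Fit Y = a + bX with b = r * S_Y / S_X and a = Y-bar - b * X-bar."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    r = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    b = r * sy / sx
    a = ybar - b * xbar
    return a, b


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # Y_i - Y-hat_i
print(round(a, 4), round(b, 4))          # 2.2 0.6
print([round(e, 4) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```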
Chapter 10: Probability
Let the sample space be $S$ and let $A$ and $B$ be events in the sample space. Let $P(A)$ denote the probability of the event $A$ occurring. Then the following are always true:
$0 \le P(A) \le 1$
$P(S) = 1$
$P(A \cup B) = P(A) + P(B)$ if $A$ and $B$ are mutually exclusive (disjoint)
$P(A \text{ does not occur}) = P(A^C) = 1 - P(A)$
Chapter 11: Sampling Distributions (Central Limit Theorem)
The Law of Large Numbers: The sample mean $\bar{X}$ approaches the population mean $\mu$ as the sample size increases.
The Central Limit Theorem: When $n$ is large, the sample mean $\bar{X}$ has approximately a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, denoted $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.
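One way to see the Central Limit Theorem is a quick simulation. This sketch assumes a strongly right-skewed population (exponential with rate 1, so $\mu = \sigma = 1$) and uses a fixed seed; the sample size and number of repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(12345)  # fixed seed so the run is reproducible

n = 50       # size of each sample
reps = 2000  # number of samples drawn
# Population: exponential with rate 1 (right-skewed, mu = 1, sigma = 1).
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

print(statistics.fmean(means))  # close to mu = 1
print(statistics.stdev(means))  # close to sigma / sqrt(n), about 0.1414
```

A histogram of `means` would look roughly normal even though the population is skewed.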
Chapter 12: General Probability Rules
For any two events:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
When two events are mutually exclusive, we have
$$P(A \cap B) = 0$$
If we know that $A$ has occurred and we want the probability of $B$ also occurring, we call this $P(B|A)$, and it is calculated (for $P(A) > 0$) as:
$$P(B|A) = \frac{P(A \cap B)}{P(A)}$$
Two events are independent if knowing one tells you nothing about the other, that is:
$$P(B|A) = P(B)$$
In general, we have:
$$P(A \cap B) = P(A)\,P(B|A)$$
Therefore, if two events are independent, we have:
$$P(A \cap B) = P(A)\,P(B)$$
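These rules can be checked exactly on a small equally likely sample space. The sketch below assumes one roll of a fair die, with illustrative events $A$ = "even" and $B$ = "at most 4", which happen to be independent.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # sample space: one roll of a fair die
A = {2, 4, 6}           # event "roll is even"
B = {1, 2, 3, 4}        # event "roll is at most 4"


def P(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))


# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
assert P(A | B) == P(A) + P(B) - P(A & B)
# Complement rule: P(A^C) = 1 - P(A)
assert P(S - A) == 1 - P(A)
# Conditional probability: P(B|A) = P(A and B) / P(A)
p_B_given_A = P(A & B) / P(A)
print(p_B_given_A, P(B))        # 2/3 2/3, so A and B are independent
assert P(A & B) == P(A) * P(B)  # multiplication rule for independent events
```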
Chapter 14: Introduction to Inference
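Assume we have an SRS from a population with an exactly normal distribution with known standard deviation $\sigma$, and we wish to estimate the mean $\mu$.
A level $C$ confidence interval for $\mu$ is calculated by:
$$\bar{X} \pm z^* \frac{\sigma}{\sqrt{n}}$$
where $z^*$ satisfies
$$P(Z > z^*) = \frac{1 - C}{2} = \frac{\alpha}{2}.$$
Here are some common values of $z^*$: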
$$\begin{array}{c|ccc} C & 90\% & 95\% & 99\% \\ \hline z^* & 1.645 & 1.960 & 2.576 \end{array}$$
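To test $H_0: \mu = \mu_0$, compute the test statistic
$$\hat{Z} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
Calculate the p-value by finding:
$P(Z > \hat{Z})$ if $H_a: \mu > \mu_0$
$P(Z < \hat{Z})$ if $H_a: \mu < \mu_0$
$2P(Z > |\hat{Z}|)$ if $H_a: \mu \neq \mu_0$
Then if the p-value $< \alpha$, we reject the null hypothesis.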
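A sketch of the known-$\sigma$ interval and test, assuming Python 3.8+ so that `NormalDist().inv_cdf` can stand in for the $z^*$ table. The function names and the numbers ($\bar{X} = 105$, $\sigma = 15$, $n = 36$, $\mu_0 = 100$) are illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal


def z_star(C):
    """Critical value z* with P(Z > z*) = (1 - C) / 2."""
    return Z.inv_cdf(1 - (1 - C) / 2)


def z_interval(xbar, sigma, n, C):
    """Level-C confidence interval X-bar +/- z* sigma / sqrt(n)."""
    m = z_star(C) * sigma / math.sqrt(n)
    return xbar - m, xbar + m


def z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p-value) for H0: mu = mu0 with sigma known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    if alternative == "greater":   # Ha: mu > mu0
        p = 1 - Z.cdf(z)
    elif alternative == "less":    # Ha: mu < mu0
        p = Z.cdf(z)
    else:                          # Ha: mu != mu0
        p = 2 * (1 - Z.cdf(abs(z)))
    return z, p


print([round(z_star(C), 3) for C in (0.90, 0.95, 0.99)])  # [1.645, 1.96, 2.576]
print(z_interval(105, 15, 36, 0.95))
z, p = z_test(105, 100, 15, 36)
print(round(z, 2), round(p, 4))  # 2.0 0.0455
```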
Chapter 15: Interpretation of Inference
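If you want a level $C$ confidence interval of the form $(\bar{X} - m, \bar{X} + m)$, with margin of error $m$, then you need a sample of size
$$n = \left(\frac{z^* \sigma}{m}\right)^2,$$
rounded up to the next whole number.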
Type I Error: Incorrectly rejecting the null hypothesis when it is true (probability $\alpha$).
Type II Error: Incorrectly failing to reject the null hypothesis when it is false (probability $1 - \text{power}$).
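The sample-size formula is a one-liner once $z^*$ is known. In this sketch $z^*$ comes from `statistics.NormalDist`, and the inputs ($\sigma = 15$, $m = 2$, $C = 95\%$) are illustrative.

```python
import math
from statistics import NormalDist


def sample_size(sigma, m, C):
    """Smallest n with z* sigma / sqrt(n) <= m, i.e. (z* sigma / m)^2 rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - C) / 2)  # z* for confidence level C
    return math.ceil((z * sigma / m) ** 2)


print(sample_size(sigma=15, m=2, C=0.95))  # 217
```

Rounding up (rather than to the nearest integer) guarantees the margin of error is no larger than $m$.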
Chapter 17: One sample t-test
Assume we have a simple random sample from a population that either has an exactly normal distribution, or has a symmetric single-peaked distribution and we have a large sample ($n > 35$), and we wish to estimate $\mu$ when $\sigma$ is unknown.
A level $C$ confidence interval for $\mu$ can be calculated by:
$$\bar{X} \pm t^* \frac{S}{\sqrt{n}}$$
where $t^*$ is the appropriate value from the $C$ column of the $t$ table at row $n - 1$ (degrees of freedom).
We can perform a hypothesis test of $H_0: \mu = \mu_0$ by calculating a $t$ statistic (with $n - 1$ degrees of freedom) as follows:
$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$
Then find the p-value by locating the two values on row $n - 1$ of the $t$ distribution table that bound the observed $t$, and choosing the appropriate one-sided or two-sided p-value listed.
Matched Pairs t-test: Suppose we have matched pair data $(Y_i, Z_i)$ and we wish to test $H_0: \mu_Y - \mu_Z = 0$ or create a confidence interval for $\mu_Y - \mu_Z$. Then let $X_i = Y_i - Z_i$ and continue as above.
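A sketch of the one-sample $t$ procedures applied to matched-pair differences. The Python standard library has no $t$ quantile function, so $t^*$ is passed in by hand; 2.776 is the 95% value for 4 degrees of freedom from a standard $t$ table. The data are illustrative.

```python
import math
import statistics


def one_sample_t(xs, mu0=0.0):
    """Return (t, df) for H0: mu = mu0, with t = (X-bar - mu0) / (S / sqrt(n))."""
    n = len(xs)
    se = statistics.stdev(xs) / math.sqrt(n)
    return (statistics.fmean(xs) - mu0) / se, n - 1


def t_interval(xs, t_star):
    """Interval X-bar +/- t* S / sqrt(n); t_star is read from a t table."""
    n = len(xs)
    m = t_star * statistics.stdev(xs) / math.sqrt(n)
    xbar = statistics.fmean(xs)
    return xbar - m, xbar + m


# Matched pairs (illustrative data): analyze the differences X_i = Y_i - Z_i.
Y = [12, 15, 11, 14, 13]
Z = [10, 14, 11, 12, 10]
diffs = [y - z for y, z in zip(Y, Z)]  # [2, 1, 0, 2, 3]
t, df = one_sample_t(diffs, mu0=0)
print(round(t, 3), df)  # 3.138 4
print(t_interval(diffs, t_star=2.776))  # 95% interval for mu_Y - mu_Z
```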
Chapter 18: Two sample t-test
Assume we have two independent samples $X_1$ and $X_2$ meeting the assumptions in Chapter 17.
Define the Standard Error (SE) as:
$$SE = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
Then we define our level $C$ confidence interval for $\mu_1 - \mu_2$ as:
$$(\bar{X}_1 - \bar{X}_2) \pm t^*\,SE$$
And we find our $t$ statistic for the null hypothesis $H_0: \mu_1 = \mu_2$ by:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{SE}$$
Then we continue finding p-values (and $t^*$) as described in Chapter 17, with degrees of freedom equal to $\min(n_1 - 1, n_2 - 1)$.
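A minimal sketch of the two-sample statistic with the conservative degrees of freedom described above. The data are illustrative, and `statistics.variance` gives the $n - 1$ sample variance $S^2$.

```python
import math
import statistics


def two_sample_t(x1, x2):
    """Return (t, SE, df) for H0: mu1 = mu2, using conservative df."""
    n1, n2 = len(x1), len(x2)
    se = math.sqrt(statistics.variance(x1) / n1 + statistics.variance(x2) / n2)
    t = (statistics.fmean(x1) - statistics.fmean(x2)) / se
    df = min(n1 - 1, n2 - 1)  # conservative degrees of freedom
    return t, se, df


t, se, df = two_sample_t([5, 6, 7, 8, 9], [3, 4, 5, 6])
print(round(t, 3), round(se, 3), df)  # 2.611 0.957 3
```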
Chapter 19: One sample proportion test
Assume we have a sample of size $n$ with $P$ successes, so that we create the sample proportion
$$\hat{p} = \frac{P}{n}$$
and we wish to estimate the true proportion $p$ of a population.
Define our Standard Error as:
$$SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
Then we can calculate a level $C$ confidence interval for $p$ as follows (where $z^*$ is as in Chapter 14):
$$\hat{p} \pm z^*\,SE$$
We can perform a hypothesis test of $H_0: p = p_0$ by computing the z statistic (then proceed as in Chapter 14):
$$\hat{Z} = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$$
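A sketch of the one-proportion interval and test. Note that the interval's SE uses $\hat{p}$ while the test statistic uses $p_0$. The counts (120 successes out of 200, $p_0 = 0.5$) are illustrative.

```python
import math
from statistics import NormalDist


def one_prop(successes, n, p0, C=0.95):
    """Return (level-C CI for p, z statistic for H0: p = p0)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)        # SE based on p-hat
    z_star = NormalDist().inv_cdf(1 - (1 - C) / 2)
    ci = (p_hat - z_star * se, p_hat + z_star * se)
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # SE based on p0
    return ci, z


ci, z = one_prop(120, 200, p0=0.5)
print([round(v, 3) for v in ci], round(z, 3))  # [0.532, 0.668] 2.828
```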
Chapter 20: Two sample proportion test
Assume we have samples from two populations, of sizes $n_1$ and $n_2$, with $P_1$ successes in the first group and $P_2$ successes in the second group, so that we create the proportions:
$$\hat{p}_1 = \frac{P_1}{n_1}, \qquad \hat{p}_2 = \frac{P_2}{n_2}$$
and we wish to compare the true proportions $p_1$ and $p_2$.
Define our Standard Error as:
$$SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
Then we can calculate a level $C$ confidence interval for $p_1 - p_2$ as follows (where $z^*$ is as in Chapter 14):
$$(\hat{p}_1 - \hat{p}_2) \pm z^*\,SE$$
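We can perform a hypothesis test of $H_0: p_1 = p_2$ by first defining the pooled proportion
$$\hat{p} = \frac{P_1 + P_2}{n_1 + n_2}$$
and then computing the z statistic (then proceed as in Chapter 14):
$$\hat{Z} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$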
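A sketch of the two-proportion procedures: the confidence interval uses the unpooled SE, while the test uses the pooled proportion $\hat{p}$. The counts (60 of 100 versus 45 of 100) are illustrative.

```python
import math
from statistics import NormalDist


def two_prop(P1, n1, P2, n2, C=0.95):
    """Return (level-C CI for p1 - p2, z statistic for H0: p1 = p2)."""
    p1, p2 = P1 / n1, P2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled SE for the CI
    z_star = NormalDist().inv_cdf(1 - (1 - C) / 2)
    ci = (p1 - p2 - z_star * se, p1 - p2 + z_star * se)
    p = (P1 + P2) / (n1 + n2)                                 # pooled proportion
    z = (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return ci, z


ci, z = two_prop(60, 100, 45, 100)
print([round(v, 3) for v in ci], round(z, 3))  # [0.013, 0.287] 2.124
```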
Chapter 22: Chi-Square Test
Assume we have two categorical variables and we calculate a two-way table of counts. We may wish to test whether or not the two categorical variables are independent, i.e. $H_0$: there is no relationship between the two categorical variables. We can do so by performing a $\chi^2$ (chi-squared) test.
First, compute the expected count for each cell in the two-way table by the formula:
$$\text{Expected Count} = \left(\frac{\text{Row Total}}{\text{Grand Total}}\right)\left(\frac{\text{Column Total}}{\text{Grand Total}}\right)\text{Grand Total} = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$
where Row Total is the total count for the specified cell's row, Column Total is the total for the specified cell's column, and Grand Total is the total number of observations.
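Then calculate the $\chi^2$ statistic, summing over all cells of the table:
$$\chi^2 = \sum \frac{(\text{Observed Count} - \text{Expected Count})^2}{\text{Expected Count}}$$
Under $H_0$ this statistic has approximately a $\chi^2$ distribution with $(r - 1)(c - 1)$ degrees of freedom, where $r$ and $c$ are the numbers of rows and columns.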
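A minimal sketch of the expected-count and $\chi^2$ computations for any two-way table given as a list of rows. The $2 \times 2$ table of counts is illustrative; converting the statistic to a p-value would still need a $\chi^2$ table.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand  # row total * col total / grand total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


chi2, df = chi_square([[10, 20], [30, 40]])
print(round(chi2, 4), df)  # 0.7937 1
```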
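For inference about the least-squares regression line $Y = a + bX$ (whose slope $b$ estimates the true slope $\beta$), define the regression standard error
$$s = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}$$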
We define the Standard Error of the estimated slope $b$ as follows:
$$SE_b = \frac{s}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}} = \frac{s}{\sqrt{n-1}\,S_X}$$
We can calculate a level $C$ confidence interval for the true slope $\beta$ as follows, where $t^*$ comes from the $t$ table with $n - 2$ degrees of freedom:
$$b \pm t^*\,SE_b$$
We can perform a hypothesis test of $H_0: \beta = 0$ by calculating the $t$ statistic (with $n - 2$ degrees of freedom) as follows and continuing as in Chapter 17:
$$t = \frac{b}{SE_b}$$
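Define the following Standard Errors, where $x^*$ is a specified value of $X$:
$$SE_{\hat{\mu}} = s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}}$$
$$SE_{\hat{Y}} = s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}}$$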
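Then we define the level $C$ confidence interval for the mean response $\mu_y$ when $X = x^*$ by:
$$\hat{Y} \pm t^*\,SE_{\hat{\mu}}$$
and the level $C$ prediction interval for a single predicted value $Y$ when $X = x^*$ by:
$$\hat{Y} \pm t^*\,SE_{\hat{Y}}$$
where $\hat{Y} = a + bx^*$ and $t^*$ uses $n - 2$ degrees of freedom.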
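A sketch that ties the regression inference formulas together. The slope is computed as $\sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$, which is algebraically equal to $r S_Y / S_X$ from Chapter 5. The data are illustrative, and $t^* = 3.182$ is the 95% value for $n - 2 = 3$ degrees of freedom read from a standard $t$ table, since the standard library has no $t$ quantile function.

```python
import math


def regression_inference(xs, ys, x_star):
    """Slope, s, SE_b, t = b / SE_b, Y-hat at x_star, and both interval SEs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)  # sum of (X_i - X-bar)^2
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))  # regression standard error
    se_b = s / math.sqrt(sxx)
    se_mu = s * math.sqrt(1 / n + (x_star - xbar) ** 2 / sxx)     # mean response
    se_y = s * math.sqrt(1 + 1 / n + (x_star - xbar) ** 2 / sxx)  # single prediction
    return {"b": b, "s": s, "se_b": se_b, "t": b / se_b,
            "y_hat": a + b * x_star, "se_mu": se_mu, "se_y": se_y}


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
res = regression_inference(x, y, x_star=3)
t_star = 3.182  # 95% column, df = n - 2 = 3, from a t table
print(round(res["t"], 3))  # 2.121
print(res["y_hat"] - t_star * res["se_mu"], res["y_hat"] + t_star * res["se_mu"])  # CI for mean response
print(res["y_hat"] - t_star * res["se_y"], res["y_hat"] + t_star * res["se_y"])    # prediction interval
```

The prediction interval is always wider than the confidence interval for the mean response, because of the extra $1$ under the square root.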