
Review of Statistics

Basic concepts

Population: the group of interest.

Random variable: the variable or feature of interest about the population. Its value may differ across individuals, which is why it is called a variable. It has a distribution that can be characterized by its mean and variance (or standard deviation).

Random sample: a small subset of the population consisting of n randomly chosen individuals, for whom values of the random variable are collected. The sample is then of size n.

Population                          Random variable   Sample
Students in the U.S.                Heights           Students at UCLA
Professors in the U.S.              Salaries          Professors at UCLA
Students taking Econ 103 at UCLA    Grades            Students enrolled in sections 1C and 1D
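As a rough illustration of drawing a random sample of size n from a population, the sketch below uses only Python's standard library. The population values (heights in cm drawn from a normal distribution), the seed, and the sample size are all assumptions made up for this example.

```python
import random

rng = random.Random(42)  # fixed seed so the example is reproducible

# Hypothetical population: heights (in cm) of 10,000 individuals.
population = [rng.gauss(170.0, 10.0) for _ in range(10_000)]

# Random sample of size n, drawn without replacement.
n = 50
sample = rng.sample(population, n)

print("sample size:", len(sample))
```

Every individual in the population has the same chance of being selected, which is what makes the sample random.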

There are three things to do:
1. Estimation
2. Hypothesis Testing
3. Confidence Intervals

Why? Typically, the distribution of the random variable is not known for the population. Therefore, the moments of the distribution are estimated using random samples.

Estimation
Moments
Let the random variable be denoted by Y. Note: Greek letters represent population quantities.

Population moments
Mean: $\mu_Y = E(Y)$
Variance: $\sigma_Y^2 = E[(Y - \mu_Y)^2]$
Standard deviation: $\sigma_Y = \sqrt{\mathrm{Var}(Y)}$
Covariance: $\sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)]$

Sample analogs
Sample mean: $\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$
Sample variance: $s_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
Sample standard deviation: $s_Y = \sqrt{s_Y^2}$
Sample covariance: $s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$

Also, $\mathrm{Corr}(X, Y) = \rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$
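The sample analogs can be computed directly from their definitions. The following is a minimal sketch in plain Python. The function names and the small data set (heights and weights of five individuals) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math

def sample_mean(ys):
    """Sample mean: (1/n) * sum of Y_i."""
    return sum(ys) / len(ys)

def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    x_bar, y_bar = sample_mean(xs), sample_mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / (len(xs) - 1)

def sample_variance(ys):
    """Sample variance: the sample covariance of Y with itself."""
    return sample_covariance(ys, ys)

def sample_correlation(xs, ys):
    """Sample covariance scaled by the two sample standard deviations."""
    return sample_covariance(xs, ys) / math.sqrt(
        sample_variance(xs) * sample_variance(ys)
    )

# Hypothetical data for five individuals.
heights = [160.0, 165.0, 170.0, 175.0, 180.0]
weights = [55.0, 60.0, 66.0, 70.0, 79.0]

print("mean height:", sample_mean(heights))
print("variance of height:", sample_variance(heights))
print("std. dev. of height:", math.sqrt(sample_variance(heights)))
print("covariance:", sample_covariance(heights, weights))
print("correlation:", sample_correlation(heights, weights))
```

Note the n - 1 divisor in the variance and covariance, which matches the sample formulas rather than the population ones.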

Properties of Expectation and Variance

$E[a] = a$, where a is a constant
$E[aX] = aE[X]$
$E[X + Y] = E[X] + E[Y]$
$E[(aX)^2] = a^2 E[X^2]$
$\mathrm{Var}(a) = 0$
$\mathrm{Var}(aX) = a^2 \mathrm{Var}(X)$
$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\mathrm{Cov}(X, Y)$
$\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) - 2\mathrm{Cov}(X, Y)$
$\mathrm{Var}(aX + bY) = a^2 \mathrm{Var}(X) + b^2 \mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$

Notice that constants multiplied with the random variable are squared when pulled out of the variance.
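The variance rules can be checked numerically, since the same identities hold for sample variances and covariances computed with a common divisor. The sketch below does this for the linear-combination rule and for the difference rule. The data and the constants a and b are arbitrary values assumed for illustration.

```python
def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance with the n - 1 divisor."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / (len(u) - 1)

def var(v):
    """Sample variance, i.e. the covariance of v with itself."""
    return cov(v, v)

# Arbitrary illustrative data and constants.
x = [1.0, 2.0, 4.0, 7.0]
y = [3.0, 1.0, 5.0, 2.0]
a, b = 2.0, -3.0

# Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)
lhs = var([a * xi + b * yi for xi, yi in zip(x, y)])
rhs = a**2 * var(x) + b**2 * var(y) + 2 * a * b * cov(x, y)

# Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y)
lhs_diff = var([xi - yi for xi, yi in zip(x, y)])
rhs_diff = var(x) + var(y) - 2 * cov(x, y)

print(lhs, rhs)
print(lhs_diff, rhs_diff)
```

Each pair of printed values should agree up to floating-point rounding.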

Important property of the Sample Mean


The estimator $\bar{Y}$ is itself a random variable because every time the random sample changes, the value of $\bar{Y}$ will also change. Thus, the sample mean has a distribution with an expectation $E(\bar{Y})$ and a variance $\mathrm{Var}(\bar{Y})$.

$E(\bar{Y}) = \mu_Y$

This implies that $\bar{Y}$ is an unbiased estimator of the population mean.

$\mathrm{Var}(\bar{Y}) = \frac{\sigma_Y^2}{n}$

This implies that as the sample size n increases, the variance of $\bar{Y}$ shrinks. These two properties imply that $\bar{Y}$ is a consistent estimator of the population mean.
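A small simulation can make both properties of the sample mean concrete. The sketch below repeatedly draws samples of size n and looks at the resulting sample means. The normal population with mean 5 and standard deviation 2, the number of replications, and the function name are assumptions made for the example.

```python
import random
import statistics

MU, SIGMA = 5.0, 2.0  # assumed population mean and standard deviation

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` random samples of size n and return their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(reps)
    ]

results = {}
for n in (10, 100):
    means = simulate_sample_means(n, reps=2000)
    results[n] = (statistics.fmean(means), statistics.variance(means))
    print(
        f"n={n}: average of sample means={results[n][0]:.3f}, "
        f"variance of sample means={results[n][1]:.4f}, "
        f"sigma^2/n={SIGMA**2 / n:.4f}"
    )
```

The average of the sample means should sit close to the population mean for both sample sizes, while the variance of the sample means should be close to sigma^2/n and therefore roughly ten times smaller for n = 100 than for n = 10.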

Central Limit Theorem


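As n increases, the distribution of $\bar{Y}$ is approximately given by a normal distribution:

$\bar{Y} \sim N\left(\mu_Y, \frac{\sigma_Y^2}{n}\right)$ (approximately)

$\frac{\bar{Y} - \mu_Y}{\sqrt{\sigma_Y^2 / n}} \sim N(0, 1)$ (approximately)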
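The Central Limit Theorem can be illustrated by simulation with a population that is clearly not normal. The sketch below assumes an exponential population with rate 1 (mean 1, variance 1), standardizes each sample mean, and reports the share of standardized means falling within 1.96 of zero. The distribution, sample size, and replication count are illustrative choices.

```python
import math
import random
import statistics

def standardized_means(n, reps, seed=1):
    """Standardized sample means from an exponential(1) population."""
    mu, sigma = 1.0, 1.0  # mean and std. dev. of the exponential(1) distribution
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        y_bar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        out.append((y_bar - mu) / math.sqrt(sigma**2 / n))
    return out

z = standardized_means(n=200, reps=5000)
share_inside = sum(abs(v) <= 1.96 for v in z) / len(z)

print("mean of standardized means:", statistics.fmean(z))
print("std. dev. of standardized means:", statistics.stdev(z))
print("share within +/-1.96:", share_inside)
```

If the normal approximation is working, the standardized means should have mean near 0, standard deviation near 1, and roughly 95% of them should fall within 1.96 of zero.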

Hypothesis Testing
The aim of hypothesis testing is to test whether the data are consistent with our hypothesis about the value of a parameter of interest. The general procedure is:

1. Specify the null hypothesis. This is a statement about the conjectured value of the true population parameter. For example, $H_0: \mu_Y = 10$. If the null is $H_0: \mu_Y = 0$, the test is referred to as a test of statistical significance.
2. Specify the alternative hypothesis. This is the hypothesis that the null is compared to. It can be two-sided or one-sided. For example, $H_A: \mu_Y \neq 10$ (two-sided), or $H_A: \mu_Y > 10$ or $H_A: \mu_Y < 10$ (one-sided).
3. Specify a significance level $\alpha$, which is the probability of rejecting the null when it is in fact true (Type I error). $\alpha$ is typically 1%, 5% or 10%.
4. Calculate the test statistic and find its distribution under the null (i.e. assuming that the null is true). For example, in large samples, $t_{STAT} = \frac{\bar{Y} - \mu_Y}{\sqrt{s_Y^2 / n}} \sim N(0, 1)$ under the null, where $\mu_Y$ takes the value specified by the null.
5. Determine the critical values from the tables for the distribution obtained above. These mark the boundaries between the acceptance and rejection regions. The rejection region lies in both tails for a two-sided test and in only one tail for a one-sided test.
6. Conclude by rejecting the null or failing to reject the null, depending on whether the value of the test statistic lies in the rejection region or the acceptance region.

Alternatively, one could use the p-value approach. The p-value is the smallest significance level at which the null hypothesis can be rejected. Therefore, if the p-value is less than $\alpha$, we reject the null hypothesis.
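The procedure above can be sketched in a few lines for the two-sided, large-sample case. The function name, the default 5% significance level, and the data (70 made-up observations averaging exactly 10) are assumptions for illustration. The critical value and p-value come from the standard normal distribution in Python's statistics module rather than from printed tables.

```python
import math
from statistics import NormalDist, fmean, variance

def large_sample_test(ys, mu_0, alpha=0.05):
    """Two-sided large-sample test of H0: mu_Y = mu_0.

    Returns (t_stat, critical_value, p_value, reject).
    """
    n = len(ys)
    # Test statistic: (sample mean - hypothesized mean) / standard error.
    t_stat = (fmean(ys) - mu_0) / math.sqrt(variance(ys) / n)
    std_normal = NormalDist()  # N(0, 1)
    # Two-sided test: alpha / 2 in each tail.
    critical_value = std_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - std_normal.cdf(abs(t_stat)))
    return t_stat, critical_value, p_value, abs(t_stat) > critical_value

# Hypothetical data: 70 observations between 8.5 and 11.5.
ys = [10 + 0.5 * ((i % 7) - 3) for i in range(70)]

t_stat, critical_value, p_value, reject = large_sample_test(ys, mu_0=9.0)
print("t statistic:", t_stat)
print("critical value:", critical_value)
print("p-value:", p_value)
print("reject H0:", reject)
```

Comparing the absolute value of the test statistic with the critical value and comparing the p-value with the significance level lead to the same decision.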

Confidence Intervals
A confidence interval gives a range of values that is likely to contain the unknown parameter with a certain probability (the confidence level). A 95% confidence interval for the population mean means that, in repeated samples, intervals constructed this way contain the true population mean 95% of the time. A $(1 - \alpha) \cdot 100\%$ confidence interval is given by

$\left[\bar{Y} - c\sqrt{\frac{s_Y^2}{n}},\ \bar{Y} + c\sqrt{\frac{s_Y^2}{n}}\right]$

where c is the critical value from the N(0,1) tables depending on $\alpha$. Therefore, a 95% confidence interval for $\mu_Y$ is given by

$\left[\bar{Y} - 1.96\sqrt{\frac{s_Y^2}{n}},\ \bar{Y} + 1.96\sqrt{\frac{s_Y^2}{n}}\right]$
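A large-sample confidence interval for the population mean can be computed as in the sketch below. The function name and the data (70 made-up observations averaging exactly 10) are assumptions for illustration. The critical value is taken from the standard normal distribution in Python's statistics module, which gives approximately 1.96 for a 95% level.

```python
import math
from statistics import NormalDist, fmean, variance

def confidence_interval(ys, level=0.95):
    """Large-sample confidence interval for the population mean."""
    n = len(ys)
    alpha = 1 - level
    # Critical value c from N(0, 1), with alpha / 2 in each tail.
    c = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = c * math.sqrt(variance(ys) / n)
    y_bar = fmean(ys)
    return y_bar - half_width, y_bar + half_width

# Hypothetical data: 70 observations between 8.5 and 11.5.
ys = [10 + 0.5 * ((i % 7) - 3) for i in range(70)]

lower, upper = confidence_interval(ys, level=0.95)
print("95% confidence interval:", (lower, upper))

lower_99, upper_99 = confidence_interval(ys, level=0.99)
print("99% confidence interval:", (lower_99, upper_99))
```

A higher confidence level uses a larger critical value, so the 99% interval is wider than the 95% interval for the same data.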
