
Statistics (13/09/2013, 20/09/2013)

A. Descriptive statistics
a. Measures of central tendency
i. Arithmetic mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ for a dataset $\{x_i : i = 1, \dots, n\}$
ii. Median: the middle value in a dataset (if even number of observations: arithmetic mean of the two middle values)
iii. Mode: the value that appears most often in a dataset
b. Measures of dispersion
i. Range: the difference between the largest and smallest values of a dataset
ii. Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
iii. Standard deviation: $s = \sqrt{s^2}$
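The measures above can be computed directly with Python's standard library; the dataset here is invented purely for illustration (note that `statistics.variance` uses the $n-1$ denominator, matching the sample variance above):

```python
# Hypothetical dataset to illustrate the descriptive measures above.
from statistics import mean, median, mode, variance, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]

print(mean(data))             # arithmetic mean
print(median(data))           # even n: average of the two middle values
print(mode(data))             # most frequent value
print(max(data) - min(data))  # range
print(variance(data))         # sample variance (n - 1 denominator)
print(stdev(data))            # sample standard deviation
```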

B. Random variables and probability distributions
a. Random variable
Experiment => possible outcomes => numerical value => random variable
Coin flipping example: $X = 1$ if heads, $X = 0$ if tails.
Types of random variables: Bernoulli (binary), discrete, continuous
Probabilities:
Bernoulli: $P(X = 1) = \theta$; $P(X = 0) = 1 - \theta$
Discrete: if $k$ possible values $\{x_1, \dots, x_k\}$ with respective probabilities $p_1, \dots, p_k$, where $p_j = P(X = x_j)$, then $p_1 + p_2 + \dots + p_k = 1$.

b. Probability density function (pdf)
The pdf summarizes the information concerning the possible outcomes of X and the corresponding probabilities.
Discrete: $f(x_j) = p_j$, $j = 1, \dots, k$, and $f(x) = 0$ for any $x$ not in $\{x_1, \dots, x_k\}$
Continuous: $P(a \le X \le b) = \int_a^b f(x)\,dx$

c. Expected value
E(X) or $\mu_X$: the weighted average of all possible values of X; the weights are determined by the pdf. Sometimes the expected value is called the population mean, especially when we want to emphasize that X represents some variable in a population.
For discrete random variables: if X takes values $\{x_1, \dots, x_k\}$ and $f(x)$ is the pdf of X, then
$E(X) = x_1 f(x_1) + x_2 f(x_2) + \dots + x_k f(x_k) = \sum_{j=1}^{k} x_j f(x_j)$
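The pdf-weighted sum defining E(X) is a one-liner in code; the values and probabilities below are invented for the sketch:

```python
# Expected value of a discrete random variable: E(X) = sum_j x_j * f(x_j).
# Values and probabilities are invented for illustration.
values = [0, 1, 2]
probs = [0.2, 0.5, 0.3]   # f(x_j); must sum to 1

assert abs(sum(probs) - 1.0) < 1e-12
ex = sum(x * p for x, p in zip(values, probs))
print(ex)   # 0*0.2 + 1*0.5 + 2*0.3, i.e. approximately 1.1
```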

C. Inferential statistics
a. Normal distribution
A random variable $X \sim N(\mu, \sigma^2)$ has pdf $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$.

b. Sampling distribution
Population: any well-defined group of subjects. A sample is a subset of individuals chosen from a population. Random sample: each individual has the same probability of being chosen at any stage during the sampling process.
Law of Large Numbers: if we are interested in estimating the population mean $\mu$, we can get arbitrarily close to $\mu$ by choosing a sufficiently large sample.
Central Limit Theorem: the sampling distribution of the sample means approaches a normal distribution as the sample size gets larger, regardless of the shape of the population distribution.
Sampling distribution: mean of the sampling distribution $E(\bar{Y}) = \mu$ and standard error $sd(\bar{Y}) = \sigma/\sqrt{n}$.
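A quick simulation illustrates both results: sample means from a non-normal (uniform) population cluster around the population mean $\mu = 0.5$, with spread close to $\sigma/\sqrt{n}$. The sample size, repetition count, and seed are arbitrary choices for this sketch:

```python
# Sketch: repeated sampling from a Uniform(0, 1) population
# (mu = 0.5, sigma = 1/sqrt(12)) to illustrate the LLN and CLT.
import random
from statistics import mean, stdev

random.seed(0)           # fixed seed so the sketch is reproducible
n = 100                  # sample size (arbitrary)
reps = 2000              # number of repeated samples (arbitrary)
sample_means = [mean(random.random() for _ in range(n)) for _ in range(reps)]

print(mean(sample_means))   # close to mu = 0.5
print(stdev(sample_means))  # close to the standard error sigma/sqrt(n) ~ 0.029
```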

c. Confidence interval
Interval estimator: a random estimator (the endpoints change with different samples): $[\bar{y} - c \cdot se(\bar{y}),\ \bar{y} + c \cdot se(\bar{y})]$, with $se(\bar{y}) = s/\sqrt{n}$.

c: critical value from the $t_{n-1}$ distribution ($n-1$: degrees of freedom; df), obtained from the tables. To construct a 95% confidence interval, let c denote the 97.5th percentile in the $t_{n-1}$ distribution. In other words, c is the value such that 95% of the area in the $t_{n-1}$ distribution is between $-c$ and $c$: $P(-c < T < c) = 0.95$. Probabilistic interpretation: for 95% of all random samples, the constructed confidence interval will contain $\mu$.

For large n, an approximate 95% confidence interval is $[\bar{y} - 1.96 \cdot se(\bar{y}),\ \bar{y} + 1.96 \cdot se(\bar{y})]$.

D. Regression
a. Joint and conditional distributions
Let X and Y be discrete random variables. Then (X,Y) have a joint distribution, fully described by the joint probability density function of (X,Y): $f_{X,Y}(x,y) = P(X = x, Y = y)$, where the right-hand side is the probability that X=x and Y=y.
The conditional distribution of Y given X tells us how X affects Y; this information is summarized by the conditional probability density function, defined by $f_{Y|X}(y|x) = f_{X,Y}(x,y) / f_X(x)$ for all values of x such that $f_X(x) > 0$.
If X and Y are discrete: $f_{Y|X}(y|x) = P(Y = y \mid X = x)$, where the right-hand side is read as the probability that Y=y given that X=x. If X and Y are independent: $f_{Y|X}(y|x) = f_Y(y)$, equivalently $f_{X,Y}(x,y) = f_X(x) f_Y(y)$.
b. Measures of association between variables
Covariance Cov(X,Y) or $\sigma_{XY}$: $\mathrm{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)]$
Correlation coefficient Corr(X,Y) or $\rho_{XY}$: $\mathrm{Corr}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{sd(X)\,sd(Y)} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$
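Sample analogues of the covariance and correlation formulas can be computed directly from their definitions; the two short series here are invented:

```python
# Sample covariance and correlation from the definitions above.
# Toy data, invented for illustration.
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
sx = sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
sy = sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
corr = cov / (sx * sy)   # always between -1 and 1
print(cov, corr)
```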

c. Conditional expectation
E(Y|X=x) or E(Y|x). If Y is a discrete variable taking values $\{y_1, \dots, y_m\}$, then $E(Y|x) = \sum_{j=1}^{m} y_j f_{Y|X}(y_j|x)$.
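Putting the last two definitions together, E(Y|x) can be computed from a joint pmf via $f_{Y|X}(y|x) = f_{X,Y}(x,y)/f_X(x)$. The joint probabilities below are invented for the sketch:

```python
# Conditional expectation E(Y | X = x) from an invented joint pmf,
# using f(y|x) = f(x, y) / fX(x) and E(Y|x) = sum_j y_j * f(y_j | x).
joint = {  # f(x, y) = P(X = x, Y = y); probabilities sum to 1
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.20, (1, 1): 0.40,
}

def cond_exp_y(x):
    fx = sum(p for (xi, _), p in joint.items() if xi == x)  # marginal fX(x)
    return sum(y * p / fx for (xi, y), p in joint.items() if xi == x)

print(cond_exp_y(0))  # 0.30 / 0.40 = 0.75
print(cond_exp_y(1))  # 0.40 / 0.60, about 0.667
```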

d. Causality versus correlation
A causal effect is established ceteris paribus (other relevant factors being fixed).
e. Regression model
Simple linear regression model: $y = \beta_0 + \beta_1 x + u$
u: error term (or disturbance term)
Assumptions:
1. $E(u) = 0$
2. $E(u|x) = E(u)$, so that $E(u|x) = 0$: the zero conditional mean assumption
Estimation method: random sample $\{(x_i, y_i) : i = 1, \dots, n\}$

Assume: $E(u) = 0$ and $\mathrm{Cov}(x, u) = E(xu) = 0$.

Substituting for u:
$E(y - \beta_0 - \beta_1 x) = 0$ (4)
$E[x(y - \beta_0 - \beta_1 x)] = 0$ (5)

Given our sample of data, we choose estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ to solve the sample counterparts of eq. (4) and (5) => method of moments:
$n^{-1} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$ (6)
$n^{-1} \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$ (7)
Using the formula of the arithmetic mean, (6) can be rewritten as
$\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$ (8)

Rewriting (8) in terms of $\hat{\beta}_0$, $\bar{y}$, and $\bar{x}$:
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ (9)



Dropping $n^{-1}$ in (7) and plugging (9) into (7):
$\sum_{i=1}^{n} x_i \left[ y_i - (\bar{y} - \hat{\beta}_1 \bar{x}) - \hat{\beta}_1 x_i \right] = 0$

Rearranging:
$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat{\beta}_1 \sum_{i=1}^{n} x_i (x_i - \bar{x})$

From basic properties of the summation operator:
$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ and $\sum_{i=1}^{n} x_i (x_i - \bar{x}) = \sum_{i=1}^{n} (x_i - \bar{x})^2$
Therefore:
$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$, provided $\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$ (10)

(9) and (10): the Ordinary Least Squares (OLS) estimators.
For any $\hat{\beta}_0$ and $\hat{\beta}_1$, the fitted value for y is $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.
Residual: the difference between the actual $y_i$ and its fitted value: $\hat{u}_i = y_i - \hat{y}_i$.

Choose $\hat{\beta}_0$ and $\hat{\beta}_1$ to make the sum of squared residuals, $\sum_{i=1}^{n} \hat{u}_i^2$, as small as possible => optimization calculus
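The estimators in (9) and (10) can be checked numerically without any optimization library; the data below are invented for illustration:

```python
# OLS slope and intercept computed directly from the method-of-moments
# formulas (9) and (10). Data invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)            # eq. (10)
b0 = ybar - b1 * xbar                               # eq. (9)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(b0, b1)
print(sum(resid))   # residuals sum to ~0 by construction (eq. (6))
```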

Goodness-of-Fit: R-squared
$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$ (total sum of squares), $SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ (explained sum of squares), $SSR = \sum_{i=1}^{n} \hat{u}_i^2$ (residual sum of squares), with $SST = SSE + SSR$.
$R^2 = SSE/SST = 1 - SSR/SST$: the fraction of the sample variation in y that is explained by x.
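The decomposition $R^2 = 1 - SSR/SST$ can be verified on a small invented dataset (the fit is recomputed here so the sketch is self-contained):

```python
# R-squared for a simple OLS fit, computed as 1 - SSR/SST.
# Data invented; the OLS fit is recomputed from eqs. (9) and (10).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)               # total sum of squares
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # residual sum of squares
r2 = 1 - ssr / sst
print(r2)   # close to 1: x explains most of the variation in y
```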
