Economics 420

Introduction to Econometrics
Professor Woodbury
Fall Semester 2015
Section 2: Tu/Th 10:20-11:40, 117 Berkey
Section 3: Tu/Th 12:40-2:00, 117 Berkey

What is econometrics?
The simplest definition is:
The statistical analysis of economic (and related) data
But there are many other definitions
Use of statistical methods to test hypotheses about economic
relationships, estimate actual magnitudes, and use these estimates
to make quantitative predictions of economic events
How we use theory and data from economics, along with the
tools from statistics, to answer "how much" type questions

Estimation of causal effects suggested by economic theory from
observational data

Economics (economic theory) suggests important relationships,
often with policy implications, but rarely suggests quantitative
magnitudes of causal effects
What is the quantitative effect of reducing class size on
student achievement?
How does another year of education change earnings?
What is the price elasticity of demand for cigarettes?
What is the effect on GDP growth of a 1 percentage point
increase in interest rates by the Fed?
What is the effect on housing prices of environmental
improvements?

This course is about using data to measure causal effects

Ideally, we would like an experiment
What would be an experiment to estimate the effect of
class size on standardized test scores?

But almost always we only have observational
(nonexperimental) data
returns to education
cigarette prices
monetary policy

Showing the effect (or causality) of one variable on
another is a challenge
We need to compare an outcome actually observed with a
counterfactual: what would have occurred under some
other set of circumstances

What would student test scores be if average class size were
lower by one student per teacher?

What would your earnings be at age 30 if you had attended a
different university than MSU?

Questions like these may seem unanswerable, but we will see that
econometrics is about finding ways to obtain credible answers to
these questions
In fact, most of the course deals with difficulties arising from using
observational data to estimate causal effects
confounding effects (omitted factors)
simultaneous causality
correlation does not imply causation
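
To make the confounding problem concrete, here is a minimal Python sketch (not part of the course materials, which use Stata). Every number in it is made up for illustration: an assumed true small-class effect of 5 points, and a hypothetical omitted factor called `income` that raises test scores and also makes small classes more likely. Under those assumptions, the observational gap in mean scores overstates the true effect, while random assignment should recover something close to it.

```python
import random

TRUE_EFFECT = 5.0  # assumed effect of a small class on the score (made up)


def simulate_gap(randomized, n=20000, seed=0):
    """Mean score in small-class districts minus mean score in large-class ones."""
    rng = random.Random(seed)
    small_scores, large_scores = [], []
    for _ in range(n):
        income = rng.gauss(0, 1)  # hypothetical omitted factor
        if randomized:
            # Experiment: class size assigned by coin flip, unrelated to income.
            small = rng.random() < 0.5
        else:
            # Observational data: richer districts tend to choose small classes.
            small = income + rng.gauss(0, 1) > 0
        # Scores depend on class size AND on the omitted factor.
        score = 650 + TRUE_EFFECT * small + 10 * income + rng.gauss(0, 5)
        (small_scores if small else large_scores).append(score)
    return (sum(small_scores) / len(small_scores)
            - sum(large_scores) / len(large_scores))


observational_gap = simulate_gap(randomized=False)
experimental_gap = simulate_gap(randomized=True)
print(f"true effect:       {TRUE_EFFECT:.1f}")
print(f"observational gap: {observational_gap:.1f}")
print(f"experimental gap:  {experimental_gap:.1f}")
```

In this sketch the only difference between the two runs is how class size is assigned, so the extra gap in the observational run comes entirely from the omitted factor.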

Structure of economic data (Wooldridge, section 1.3)

All of the following are types of observational data
Cross-section data (snap-shot)
A sample of individuals, households, firms, cities, states,
countries, or a variety of other units, taken at a point in time
Time-series data (often macro)
Observations on a variable (or several variables) over time:
stock prices, money supply, consumer price index, GDP, annual
homicide rates, car sales
Pooled cross sections
Two or more cross sections from the same survey (e.g.,
Current Population Survey) that are combined, or pooled
Panel data (longitudinal)
A time series for each cross-sectional unit: survey the same
individuals (or households) annually over many years, or
collect data on the states of the US over several years

Sources and characteristics of labor market data

American Community Survey
Unit of observation: Household
Frequency: Annual
Type: Cross section
Sample size: ~3,500,000 per year
Exclusions from sample: None
How gathered: Mail survey with follow up

Current Population Survey
Unit of observation: Household
Frequency: Monthly
Type: Cross section (panel aspects)
Sample size: 60,000 households
Exclusions from sample: None
How gathered: Interview in person or phone

Establishment Survey
Unit of observation: Establishment
Frequency: Monthly
Type: Cross section
Sample size: 145,000 establishments
Exclusions from sample: Agriculture and self-employed
How gathered: Payroll records

Employment Cost Index
Unit of observation: Establishment
Frequency: Quarterly
Type: Cross section
Sample size: 10,800 establishments
Exclusions from sample: Agriculture and self-employed
How gathered: Mail survey

Job Openings and Labor Turnover
Unit of observation: Establishment
Frequency: Monthly
Type: Cross section
Sample size: 16,400 establishments
Exclusions from sample: Agriculture and self-employed
How gathered: Phone and mail survey

In this course you will:

Learn methods for estimating causal effects using
observational data
Learn some tools that can be used for other purposes, for
example forecasting using time series data
Focus on applications: theory is used only as needed to
understand the "whys" of the methods
Learn to evaluate the regression analysis of others, which
means you will be able to read and understand empirical
economics papers in other economics courses
Get some hands-on experience with regression analysis in
your problem sets

Experiments and Basic Statistics Intro

Consider an empirical problem: Class size and educational output

Policy question: What is the effect on test scores (or some
other outcome measure) of reducing class size by one student
per class? By 8 students per class?
We must use data to find out (Is there any way to answer this
without data?)

The California Test Score Data Set

All K-6 and K-8 California school districts in 1998-1999 (n = 420)
Variables:
5th grade test scores (Stanford-9 achievement test, combined
math and reading), district average
Student-teacher ratio (STR) = number of students in the
district divided by number of full-time equivalent teachers

Initial look at the data:
(You should already know how to interpret this table)

[Table not recovered from the source]

This table doesn't tell us anything about the relationship between
test scores and the STR.

Do districts with smaller classes have higher test scores?

[Figure: scatterplot of test score v. student-teacher ratio]

What does this figure show?

We need numerical evidence on whether districts with low STRs
have higher test scores, but how?
1. Compare average test scores in districts that have low STRs
with those that have high STRs (estimation)
2. Test the null hypothesis that the mean test scores in the two
types of districts are the same, against the alternative hypothesis
that they differ (hypothesis testing)
3. Estimate an interval for the difference in the mean test scores,
high v. low STR districts (confidence interval)

Initial data analysis: Compare districts with "small" (STR < 20)
and "large" (STR ≥ 20) class sizes:

Class size   Average score ($\bar{Y}$)   Standard deviation ($s_Y$)   n
Small        657.4                       19.4                         238
Large        650.0                       17.9                         182

1. Estimation of $\Delta$ = difference between group means
2. Test the hypothesis that $\Delta = 0$
3. Construct a confidence interval for $\Delta$

1. Estimation

$$\bar{Y}_{small} - \bar{Y}_{large} = \frac{1}{n_{small}} \sum_{i=1}^{n_{small}} Y_i - \frac{1}{n_{large}} \sum_{i=1}^{n_{large}} Y_i = 657.4 - 650.0 = 7.4$$

Is this a large difference in a real-world sense?
Standard deviation across districts = 19.1
Difference between 60th and 75th percentiles of test score
distribution is 667.6 - 659.4 = 8.2
Is this a big enough difference to be important for school
reform discussions, for parents, or for a school committee?
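
One way to judge the "real-world" size of the 7.4-point gap is to express it in units of the cross-district standard deviation and set it beside the percentile gap quoted above. The short Python sketch below does only that arithmetic, using the figures from the slide. The variable names are illustrative, and Python is used here only as a calculator, not as the course's software.

```python
# Figures taken from the slide; variable names are illustrative.
mean_small = 657.4    # average score, STR < 20
mean_large = 650.0    # average score, STR >= 20
sd_districts = 19.1   # standard deviation of scores across all districts
p60, p75 = 659.4, 667.6

diff = mean_small - mean_large
diff_in_sd = diff / sd_districts
percentile_gap = p75 - p60

print(f"difference in means:        {diff:.1f} points")
print(f"as a share of one std dev:  {diff_in_sd:.2f}")
print(f"75th minus 60th percentile: {percentile_gap:.1f} points")
```

On these numbers the gap is a bit under 0.4 of a standard deviation, slightly smaller than the distance between the 60th and 75th percentiles.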

2. Hypothesis testing

Difference-in-means test: compute the t-statistic,

$$t = \frac{\bar{Y}_s - \bar{Y}_l}{\sqrt{\frac{s_s^2}{n_s} + \frac{s_l^2}{n_l}}} = \frac{\bar{Y}_s - \bar{Y}_l}{SE(\bar{Y}_s - \bar{Y}_l)}$$ (remember this?)

where $SE(\bar{Y}_s - \bar{Y}_l)$ is the standard error of $\bar{Y}_s - \bar{Y}_l$, the
subscripts s and l refer to "small" and "large" STR districts, and

$$s_s^2 = \frac{1}{n_s - 1} \sum_{i=1}^{n_s} (Y_i - \bar{Y}_s)^2$$ (etc.)

is the variance of test scores in small districts

Compute the difference-of-means t-statistic:

Size     $\bar{Y}$   $s_Y$   n
small    657.4       19.4    238
large    650.0       17.9    182

$$t = \frac{\bar{Y}_s - \bar{Y}_l}{\sqrt{\frac{s_s^2}{n_s} + \frac{s_l^2}{n_l}}} = \frac{657.4 - 650.0}{\sqrt{\frac{19.4^2}{238} + \frac{17.9^2}{182}}} = \frac{7.4}{1.83} = 4.05$$

|t| > 1.96, so we reject (at the 5% significance level) the null
hypothesis that the two means are the same.
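
The arithmetic in the t-statistic can be checked with a few lines of Python. This is a sketch that uses only the summary statistics from the table. The function name `diff_in_means_t` is made up for illustration, and the course itself does this step in Stata.

```python
import math


def diff_in_means_t(mean_s, sd_s, n_s, mean_l, sd_l, n_l):
    """Return (t, SE) for a difference in means with unequal group variances."""
    se = math.sqrt(sd_s ** 2 / n_s + sd_l ** 2 / n_l)
    return (mean_s - mean_l) / se, se


# Summary statistics from the slide: small (STR < 20) v. large (STR >= 20).
t_stat, se = diff_in_means_t(657.4, 19.4, 238, 650.0, 17.9, 182)
reject_at_5_percent = abs(t_stat) > 1.96

print(f"SE = {se:.2f}, t = {t_stat:.2f}")  # SE = 1.83, t = 4.05
print("reject H0 at the 5% level:", reject_at_5_percent)
```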

3. Confidence interval

A 95% confidence interval for the difference between the means is:

$$(\bar{Y}_s - \bar{Y}_l) \pm 1.96 \times SE(\bar{Y}_s - \bar{Y}_l) = 7.4 \pm 1.96 \times 1.83 = (3.8, 11.0)$$

Two equivalent statements:
1. The 95% confidence interval for $\Delta$ (the difference between
group means) doesn't include 0;
2. The hypothesis that $\Delta = 0$ is rejected at the 5% level.
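
The interval, and the equivalence between "0 lies outside the 95% interval" and "|t| > 1.96", can be verified numerically. The Python sketch below recomputes the standard error from the summary statistics on the earlier slide. The variable names are illustrative, not from the course materials.

```python
import math

# Summary statistics from the earlier slide.
diff = 657.4 - 650.0
se = math.sqrt(19.4 ** 2 / 238 + 17.9 ** 2 / 182)

# 95% confidence interval: estimate plus or minus 1.96 standard errors.
lower = diff - 1.96 * se
upper = diff + 1.96 * se

# The two statements on the slide are the same condition written two ways.
interval_excludes_zero = not (lower <= 0 <= upper)
t_rejects = abs(diff / se) > 1.96

print(f"95% CI: ({lower:.1f}, {upper:.1f})")  # 95% CI: (3.8, 11.0)
print(interval_excludes_zero, t_rejects)
```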

Stata issues

We also looked at the California school district data using
Stata
You can find the data on D2L: the file is named
<caschool.dta>, and the description of the data is in the pdf
file <californiatestscores.pdf>
And here are helpful materials for learning and using Stata:
o Wooldridge, Rudiments of Stata (posted on D2L)
o The University of Wisconsin's Social Science Computing
Cooperative has produced two excellent (very readable)
series of Stata documentation:
Stata for Students at <http://www.ssc.wisc.edu/sscc/pubs/stata_students1.htm>
Stata for Researchers at <http://www.ssc.wisc.edu/sscc/pubs/sfr-intro.htm>
o Princeton University also has an excellent Online Stata
Tutorial at <http://www.princeton.edu/~otorres/Stata/>
o Finally, UCLA has extensive Resources to help you learn
and use Stata at <http://www.ats.ucla.edu/stat/stata/>

Here is what we did in class with the California school
district data
1. Open the dataset by double-clicking on it
2. Look at the data by clicking the Data Browser button
3. Order the variables so the STR and test score variables come
first:

order str testscr

4. Produce descriptive statistics (mean, standard deviation,
minimum, maximum):

sum

5. Generate a dummy variable indicating whether the school
district has a large or small STR:

gen lg = str >= 20

6. Sort the data, then produce descriptive statistics for each of the
two groups of school districts (large and small STR):

sort lg

by lg: sum

7. Perform a t-test for whether the difference in test scores
between large and small STR school districts is significant (in the
statistical sense, as opposed to the policy sense):

ttest testscr, by(lg)
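
For readers who want to see the logic of steps 5 to 7 outside Stata, here is a rough Python sketch using only the standard library. The six (STR, test score) pairs are hypothetical values made up for illustration, not rows from caschool.dta. The t-statistic uses the unequal-variance standard error from the earlier slides. Stata's `ttest` with `by()` assumes equal variances unless the `unequal` option is added, so its output can differ slightly from this formula.

```python
import math
import statistics

# Hypothetical (str, testscr) pairs, made up for illustration only.
districts = [
    (17.0, 660.0), (18.5, 665.0), (19.0, 655.0),
    (20.0, 645.0), (21.0, 650.0), (22.0, 640.0),
]

# Step 5: dummy equal to 1 when STR >= 20 (mirrors `gen lg = str >= 20`).
rows = [(ratio, score, int(ratio >= 20)) for ratio, score in districts]

# Step 6: descriptive statistics for each group (mirrors `by lg: sum`).
groups = {0: [], 1: []}
for _, score, lg in rows:
    groups[lg].append(score)

summary = {}
for lg, scores in groups.items():
    summary[lg] = (statistics.mean(scores), statistics.stdev(scores), len(scores))
    print(f"lg={lg}: mean={summary[lg][0]:.1f}, "
          f"sd={summary[lg][1]:.1f}, n={summary[lg][2]}")

# Step 7: difference-in-means t-statistic with unequal variances.
(mean_s, sd_s, n_s), (mean_l, sd_l, n_l) = summary[0], summary[1]
se = math.sqrt(sd_s ** 2 / n_s + sd_l ** 2 / n_l)
t_stat = (mean_s - mean_l) / se
print(f"difference = {mean_s - mean_l:.1f}, SE = {se:.2f}, t = {t_stat:.2f}")
```

`statistics.stdev` divides by n - 1, matching the sample variance formula on the hypothesis testing slide.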

What comes next ...

The mechanics of estimation, hypothesis testing, and
confidence intervals should be familiar to you
These concepts extend directly to regression and its variants
But before we turn to regression, we will review some of the
underlying theory of estimation, hypothesis testing, and
confidence intervals
o Why do these procedures work, and why use these rather
than others?
We will review the intellectual foundations of statistics and
econometrics

For next time, read:


Wooldridge, Chapter 1
Wooldridge, Appendix A.1
Wooldridge, Appendix B.1 and B.2
Wooldridge, Appendix C.1 and C.2
