1 Intro Rev

Economics 420
Introduction to Econometrics
Professor Woodbury
Fall Semester 2015
Section 2: Tu/Th 10:20-11:40, 117 Berkey
Section 3: Tu/Th 12:40-2:00, 117 Berkey
What is econometrics?
The simplest definition is:
The statistical analysis of economic (and related) data
But there are many other definitions
Use of statistical methods to test hypotheses about economic
relationships, estimate actual magnitudes, and use these estimates
to make quantitative predictions of economic events
How we use theory and data from economics, along with the
tools from statistics, to answer "how much" type questions
Estimation of causal effects suggested by economic theory from
observational data
Economics (economic theory) suggests important relationships,
often with policy implications, but rarely suggests quantitative
magnitudes of causal effects
What is the quantitative effect of reducing class size on
student achievement?
How does another year of education change earnings?
What is the price elasticity of demand for cigarettes?
What is the effect on GDP growth of a 1 percentage point
increase in interest rates by the Fed?
What is the effect on housing prices of environmental
improvements?
This course is about using data to measure causal effects
Ideally, we would like an experiment
What would be an experiment to estimate the effect of
class size on standardized test scores?
But almost always we only have observational
(nonexperimental) data
returns to education
cigarette prices
monetary policy
Showing the effect (or causality) of one variable on
another is a challenge
We need to compare an outcome actually observed with a
counterfactual: what would have occurred under some
other set of circumstances
What would student test scores be if average class size were
lower by one student per teacher?
What would your earnings be at age 30 if you had attended a
different university than MSU?
Questions like these may seem unanswerable, but we will see that
econometrics is about finding ways to obtain credible answers to
these questions
In fact, most of the course deals with difficulties arising from using
observational data to estimate causal effects
confounding effects (omitted factors)
simultaneous causality
correlation does not imply causation
Structure of economic data (Wooldridge, section 1.3)

All of the following are types of observational data
Cross-section data (snap-shot)
A sample of individuals, households, firms, cities, states,
countries, or a variety of other units, taken at a point in time
Time-series data (often macro)
Observations on a variable (or several variables) over time
stock prices, money supply, consumer price index, GDP, annual
homicide rates, car sales
Pooled cross sections

Two or more cross sections from the same survey (e.g.,
Current Population Survey) that are combined, or pooled
Panel data (longitudinal)
A time series for each cross-sectional unit: survey the same
individuals (or households) annually over many years, or
collect data on the states of the US over several years
Sources and characteristics of labor market data

Feature                | American Community Survey  | Current Population Survey     | Establishment Survey          | Employment Cost Index         | Job Openings and Labor Turnover
-----------------------+----------------------------+-------------------------------+-------------------------------+-------------------------------+--------------------------------
Unit of observation    | Household                  | Household                     | Establishment                 | Establishment                 | Establishment
Frequency              | Annual                     | Monthly                       | Monthly                       | Quarterly                     | Monthly
Type                   | Cross section              | Cross section (panel aspects) | Cross section                 | Cross section                 | Cross section
Sample size            | ~3,500,000 per year        | 60,000 households             | 145,000 establishments        | 10,800 establishments         | 16,400 establishments
Exclusions from sample | None                       | None                          | Agriculture and self-employed | Agriculture and self-employed | Agriculture and self-employed
How gathered           | Mail survey with follow up | Interview in person or phone  | Payroll records               | Mail survey                   | Phone and mail survey
In this course you will:
Learn methods for estimating causal effects using
observational data
Learn some tools that can be used for other purposes, for
example forecasting using time series data
Focus on applications; theory is used only as needed to
understand the whys of the methods
Learn to evaluate the regression analysis of others; this
means you will be able to read and understand empirical
economics papers in other economics courses
Get some hands-on experience with regression analysis in
your problem sets
Experiments and Basic Statistics Intro

Consider an empirical problem: Class size and educational output
Policy question: What is the effect on test scores (or some
other outcome measure) of reducing class size by one student
per class? By 8 students per class?
We must use data to find out (Is there any way to answer this
without data?)
The California Test Score Data Set

All K-6 and K-8 California school districts in 1998-1999 (n = 420)
Variables:
5th grade test scores (Stanford-9 achievement test, combined
math and reading), district average
Student-teacher ratio (STR) = number of students in the
district divided by the number of full-time equivalent teachers
Initial look at the data:
(You should already know how to interpret this table)
[Table of summary statistics not reproduced in this copy]
This table doesn't tell us anything about the relationship between
test scores and the STR.
Do districts with smaller classes have higher test scores?
[Figure: Scatterplot of test score v. student-teacher ratio]
What does this figure show?
We need numerical evidence on whether districts with low STRs
have higher test scores, but how?
1. Compare average test scores in districts that have low STRs
with those that have high STRs (estimation)
2. Test the null hypothesis that the mean test scores in the two
types of districts are the same, against the alternative hypothesis
that they differ (hypothesis testing)
3. Estimate an interval for the difference in the mean test scores,
high v. low STR districts (confidence interval)
Initial data analysis: Compare districts with small (STR < 20)
and large (STR ≥ 20) class sizes:

Class size   Average score ($\bar{Y}$)   Standard deviation ($s_Y$)   n
Small        657.4                       19.4                         238
Large        650.0                       17.9                         182

1. Estimation of $\Delta$ = difference between group means
2. Test the hypothesis that $\Delta = 0$
3. Construct a confidence interval for $\Delta$
1. Estimation

$$\bar{Y}_{small} - \bar{Y}_{large} = \frac{1}{n_{small}} \sum_{i=1}^{n_{small}} Y_i - \frac{1}{n_{large}} \sum_{i=1}^{n_{large}} Y_i = 657.4 - 650.0 = 7.4$$

Is this a large difference in a real-world sense?
Standard deviation across districts = 19.1
Difference between 60th and 75th percentiles of test score
distribution is 667.6 - 659.4 = 8.2
Is this a big enough difference to be important for school
reform discussions, for parents, or for a school committee?
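One way to put the 7.4-point difference in context is to express it in units of the cross-district standard deviation. The short Python sketch below only redoes the arithmetic with the numbers on this slide. The variable names are illustrative, and Python is used here simply as a calculator (the course itself uses Stata).

```python
# Numbers taken from the slide; variable names are illustrative.
mean_small = 657.4   # average score, districts with STR < 20
mean_large = 650.0   # average score, districts with STR >= 20
sd_all = 19.1        # standard deviation of scores across all districts
p75, p60 = 667.6, 659.4  # 75th and 60th percentiles of the score distribution

diff = mean_small - mean_large
diff_in_sd = diff / sd_all   # difference measured in standard deviations
pct_gap = p75 - p60          # gap between the two percentiles

print(f"difference in means = {diff:.1f} points")
print(f"as a share of the standard deviation = {diff_in_sd:.2f}")
print(f"75th minus 60th percentile = {pct_gap:.1f} points")
```

Whether a gap of this size matters for policy is a judgment call that the arithmetic alone cannot settle.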
2. Hypothesis testing

Difference-in-means test: compute the t-statistic,

$$t = \frac{\bar{Y}_s - \bar{Y}_l}{\sqrt{\frac{s_s^2}{n_s} + \frac{s_l^2}{n_l}}} = \frac{\bar{Y}_s - \bar{Y}_l}{SE(\bar{Y}_s - \bar{Y}_l)}$$  (remember this?)

where $SE(\bar{Y}_s - \bar{Y}_l)$ is the standard error of $\bar{Y}_s - \bar{Y}_l$, the
subscripts s and l refer to small and large STR districts, and

$$s_s^2 = \frac{1}{n_s - 1} \sum_{i=1}^{n_s} (Y_i - \bar{Y}_s)^2$$  (etc.)

is the sample variance of test scores in small districts
Compute the difference-of-means t-statistic:

Size    $\bar{Y}$   $s_Y$   n
small   657.4       19.4    238
large   650.0       17.9    182

$$t = \frac{\bar{Y}_s - \bar{Y}_l}{\sqrt{\frac{s_s^2}{n_s} + \frac{s_l^2}{n_l}}} = \frac{657.4 - 650.0}{\sqrt{\frac{19.4^2}{238} + \frac{17.9^2}{182}}} = \frac{7.4}{1.83} = 4.05$$

|t| > 1.96, so we reject (at the 5% significance level) the null
hypothesis that the two means are the same.
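The calculation above can be checked from the summary statistics alone. The following is a minimal Python sketch, not part of the in-class Stata session. The function name diff_in_means_t is hypothetical, and the standard error is the unequal-variance form sqrt(s_s^2/n_s + s_l^2/n_l) used on this slide.

```python
import math

def diff_in_means_t(mean_s, sd_s, n_s, mean_l, sd_l, n_l):
    """Return (difference, standard error, t-statistic) for two group means.

    Uses the unequal-variance standard error sqrt(sd_s^2/n_s + sd_l^2/n_l).
    """
    diff = mean_s - mean_l
    se = math.sqrt(sd_s**2 / n_s + sd_l**2 / n_l)
    return diff, se, diff / se

# Summary statistics from the table: small (STR < 20) and large (STR >= 20).
diff, se, t = diff_in_means_t(657.4, 19.4, 238, 650.0, 17.9, 182)

print(f"difference = {diff:.1f}, SE = {se:.2f}, t = {t:.2f}")
if abs(t) > 1.96:
    print("Reject the null of equal means at the 5% level")
else:
    print("Do not reject the null of equal means at the 5% level")
```

Working from group means, standard deviations, and sample sizes is enough here because the t-statistic depends on the data only through those six numbers.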
3. Confidence interval

A 95% confidence interval for the difference between the means is:

$$(\bar{Y}_s - \bar{Y}_l) \pm 1.96 \times SE(\bar{Y}_s - \bar{Y}_l) = 7.4 \pm 1.96 \times 1.83 = (3.8, 11.0)$$

This leads to two equivalent statements:

1. The 95% confidence interval for the difference in means, $\Delta$, doesn't include 0;
2. The hypothesis that $\Delta = 0$ is rejected at the 5% level.
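The interval and the equivalence of the two statements can be verified numerically. This is a small Python sketch under the same assumptions as the slide (normal critical value 1.96, unequal-variance standard error). The function name diff_in_means_ci is hypothetical.

```python
import math

def diff_in_means_ci(mean_s, sd_s, n_s, mean_l, sd_l, n_l, z=1.96):
    """Return (lower, upper, t) for the difference between two group means.

    z is the normal critical value; 1.96 gives a 95% interval.
    """
    diff = mean_s - mean_l
    se = math.sqrt(sd_s**2 / n_s + sd_l**2 / n_l)
    return diff - z * se, diff + z * se, diff / se

lower, upper, t = diff_in_means_ci(657.4, 19.4, 238, 650.0, 17.9, 182)

zero_outside = not (lower <= 0 <= upper)  # statement 1: interval excludes 0
reject_null = abs(t) > 1.96               # statement 2: reject at the 5% level

print(f"95% CI: ({lower:.1f}, {upper:.1f})")
print("interval excludes 0:", zero_outside)
print("reject at 5% level: ", reject_null)
```

The two booleans always agree, because 0 lies outside diff ± 1.96·SE exactly when |diff/SE| exceeds 1.96.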
Stata issues
We also looked at the California school district data using
Stata
You can find the data on D2L; the file is named
<caschool.dta>, and the description of the data is in the pdf
file <californiatestscores.pdf>
And here are helpful materials for learning and using Stata:
o Wooldridge, Rudiments of Stata (posted on D2L)
o The University of Wisconsin's Social Science Computing
Cooperative has produced two excellent (very readable)
series of Stata documentation:
Stata for Students at <http://www.ssc.wisc.edu/sscc/pubs/stata_students1.htm>
Stata for Researchers at <http://www.ssc.wisc.edu/sscc/pubs/sfr-intro.htm>
Princeton University also has an excellent Online Stata
Tutorial at <http://www.princeton.edu/~otorres/Stata/>
Finally, UCLA has extensive Resources to help you learn
and use Stata at <http://www.ats.ucla.edu/stat/stata/>
Here is what we did in class with the California school
district data
1. Open the dataset by double-clicking on it
2. Look at the data by clicking the Data Browser button
3. Order the variables so the STR and test score variables come
first:

order str testscr

4. Produce descriptive statistics (mean, standard deviation,
minimum, maximum):

sum

5. Generate a dummy variable indicating whether the school
district has a large or small STR:

gen lg = str >= 20

6. Sort the data, then produce descriptive statistics for each of the
two groups of school districts (large and small STR):

sort lg
by lg: sum

7. Perform a t-test for whether the difference in test scores
between large and small STR school districts is significant (in the
statistical sense, as opposed to the policy sense):

ttest testscr, by(lg)
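For readers who want to see the logic of steps 5 to 7 outside Stata, here is a rough Python sketch. The (str, testscr) pairs are invented purely for illustration and are NOT rows from caschool.dta; the helper group_stats is hypothetical. The sketch uses the unequal-variance standard error from the earlier slides, whereas Stata's ttest pools the two variances unless the unequal option is added.

```python
import math
import statistics

# Hypothetical (str, testscr) pairs, made up for illustration only.
districts = [
    (17.5, 668.0), (18.2, 661.5), (19.1, 655.0), (19.8, 659.0),
    (20.0, 652.5), (21.3, 648.0), (22.4, 651.0), (23.0, 644.5),
]

# Step 5: dummy equal to 1 when STR >= 20 (mirrors `gen lg = str >= 20`).
rows = [(s, score, int(s >= 20)) for s, score in districts]

# Step 6: mean, standard deviation, and count by group (mirrors `by lg: sum`).
def group_stats(rows, lg):
    scores = [score for _, score, flag in rows if flag == lg]
    return statistics.mean(scores), statistics.stdev(scores), len(scores)

mean_s, sd_s, n_s = group_stats(rows, 0)  # small STR districts
mean_l, sd_l, n_l = group_stats(rows, 1)  # large STR districts

# Step 7: difference-in-means t-statistic with the unequal-variance SE.
se = math.sqrt(sd_s**2 / n_s + sd_l**2 / n_l)
t = (mean_s - mean_l) / se

print(f"small: mean = {mean_s:.3f}, sd = {sd_s:.3f}, n = {n_s}")
print(f"large: mean = {mean_l:.3f}, sd = {sd_l:.3f}, n = {n_l}")
print(f"t = {t:.2f}")
```

With only eight made-up observations the 1.96 critical value is not appropriate; the point is the sequence of steps, not the result.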
What comes next ...
The mechanics of estimation, hypothesis testing, and
confidence intervals should be familiar to you
These concepts extend directly to regression and its variants
But before we turn to regression, we will review some of the
underlying theory of estimation, hypothesis testing, and
confidence intervals
o Why do these procedures work, and why use these rather
than others?
We will review the intellectual foundations of statistics and
econometrics
For next time, read:

Wooldridge, Chapter 1
Wooldridge, Appendix A.1
Wooldridge, Appendix B.1 and B.2
Wooldridge, Appendix C.1 and C.2