Lecture 1: Introduction to Econometrics

University of San Francisco
Department of Economics
Prof. Jesse K. Anttila-Hughes
January 21st, 2014

NYTimes, Jan 29th, 2013

! Today we'll cover:
!! Introduction to econometrics
!! Review of probability

! Reading:
!! Wooldridge Appendixes B and C

! Office Hours:
!! Tuesdays 1-4pm or by appt

! Problem set #1
!! On Blackboard
!! Due next Monday

! Computer assignment #1
!! Get Stata
!! Download the data files from Blackboard
!! Bring your laptop, with Stata installed and the data files

Goals for this class


! Emphasis on formal econometrics:
!! Basic statistics and probability
!! Fundamentals of multivariate OLS regression
!! Regression execution and interpretation
!! Hypothesis testing and standard errors
!! Formal foundation for Econometrics II and III
!! Including some common basic problems and their solutions / lack thereof

! Emphasis on tacit knowledge:
!! Simple connections between research questions and econometrics
!! Basics of performing econometric analysis in Stata
!! Basic good habits for handling and managing data

! By the end of class you should have:
!! Simple, intuitive understandings of what econometrics is and what it can and can't do
!! A beginner's proficiency with Stata
!! Enough econometrics to start appreciating how much there is to learn

Econometrics involves a lot of tacit learning:

Administraterrata
! Class:
!! 6:30-9:15 pm on Tuesdays
!! 14 lectures total, one midterm

! Blackboard primary resource
!! Syllabus, lecture notes, homework, readings, etc.

! Grades:
!! 12 Problem sets (drop lowest 2): 35%
!! Class participation: 10%
!! Midterm exam: 25%
!! Final exam: 30%

! Book:
!! Wooldridge 5th Edition

What we're aiming for

Hsiang et al. 2013

The Nature of Econometrics and Economic Data

Econometrics
! What is econometrics?
!! The statistics used by economists

! Why do we do econometrics?
!! Estimating relationships between economic variables
!! Testing economic theories and hypotheses
!! Forecasting economic variables
!! Evaluating or implementing policies
!! Etc.

! In general, econometrics starts with an economic model (i.e., something you'd learn in a theory class) and then generates testable predictions which come from that model
!! The step of explicitly defining a model is often skipped

What kind of theory models?


! Economic model of crime (Becker (1968))
!! Derives equation for criminal activity based on utility maximization:

$y = f(x_1, x_2, x_3, x_4, x_5, x_6, x_7)$

where $y$ is hours spent in criminal activities, $x_1$ is the returns to criminal activities, $x_2$ is the wage for legal employment, $x_3$ is other income, $x_4$ is the probability of getting caught, $x_5$ is the probability of conviction if caught, $x_6$ is the expected sentence, and $x_7$ is age.

!! Functional form of relationship not specified
!! Equation could have been postulated without economic modeling

What kind of theory models?


! Model of job training and worker productivity
!! What is the effect of additional training on worker productivity?
!! Formal economic theory not really needed to derive equation (but may):

$wage = f(educ, exper, training)$

where $wage$ is the hourly wage, $educ$ is years of formal education, $exper$ is years of workforce experience, and $training$ is weeks spent in job training.

!! Other factors may be relevant, but these are the most important (?)

What kind of econometric models?


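! Econometric model of criminal activity
!! The functional form has to be specified
!! Variables may have to be approximated by other quantities

$crime = \beta_0 + \beta_1 wage + \beta_2 othinc + \beta_3 freqarr + \beta_4 freqconv + \beta_5 avgsen + \beta_6 age + u$

where $crime$ is a measure of criminal activity, $wage$ is the wage for legal employment, $othinc$ is other income, $freqarr$ is the frequency of prior arrests, $freqconv$ is the frequency of conviction, $avgsen$ is the average sentence length after conviction, and $age$ is age. The error term $u$ collects the unobserved determinants of criminal activity, e.g. moral character, wage in criminal activity, family background.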

What kind of econometric models?


! Econometric model of job training and worker productivity

$wage = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 training + u$

where $wage$ is the hourly wage, $educ$ is years of formal education, $exper$ is years of workforce experience, and $training$ is weeks spent in job training. The error term $u$ collects the unobserved determinants of the wage, e.g. innate ability, quality of education, family background.

! Most of econometrics deals with the specification of the error $u$
! Econometric models may be used for hypothesis testing
!! For example, the parameter $\beta_3$ represents the effect of training on wage
!! How large is this effect? Is it different from zero?
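The two questions above (how large is the training effect, and is it different from zero?) can be illustrated with a minimal Python sketch. It is a simplification, not the lecture's method: it regresses wage on training alone, leaving out education and experience, and it uses synthetic data with an assumed true effect of 0.5. The names `ols_slope_with_se` and `TRUE_EFFECT` are hypothetical.

```python
import math
import random


def ols_slope_with_se(x, y):
    """Return (intercept, slope, standard error of slope) for y = b0 + b1*x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = ssr / (n - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    return intercept, slope, se_slope


# Synthetic data (assumption): the true effect of one week of training is 0.5.
rng = random.Random(0)
TRUE_EFFECT = 0.5
training = [rng.randint(0, 10) for _ in range(500)]
wage = [10.0 + TRUE_EFFECT * t + rng.gauss(0, 2) for t in training]

b0, b_train, se_train = ols_slope_with_se(training, wage)
t_stat = b_train / se_train  # t-statistic for H0: the training effect is zero
print(f"estimated effect = {b_train:.3f}, se = {se_train:.3f}, t = {t_stat:.1f}")
```

The estimate answers "how large", and the t-statistic (estimate divided by its standard error) answers "is it different from zero". In practice Stata's `regress` command reports both.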

Types of data
! Econometric analysis requires data
!! And there are many, many different things that count as data

! In general, we distinguish between the four major kinds of economic data in terms of how they interact between units of observation and time
!! Cross-sectional data: multiple units of obs., single time
!! Time series data: single unit of obs., multiple times
!! Pooled cross sections: multiple units of obs., multiple times, but different obs. each time
!! Panel/Longitudinal data: multiple units of observation with multiple time observations for each

! Econometric methods depend on the nature of the data used


!! Use of inappropriate methods may lead to misleading results
!! This will be a big theme in later classes
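The distinction between the four kinds of data comes down to which (unit, time) pairs are observed. A rough Python sketch of that logic follows. The function name `classify_dataset` and the example records are hypothetical, and the rule for telling panels from pooled cross sections (any unit observed in more than one period) is a simplifying assumption.

```python
from collections import defaultdict


def classify_dataset(records):
    """Classify a list of (unit, period) observations into one of the four data types."""
    periods_by_unit = defaultdict(set)
    units_by_period = defaultdict(set)
    for unit, period in records:
        periods_by_unit[unit].add(period)
        units_by_period[period].add(unit)

    n_units = len(periods_by_unit)
    n_periods = len(units_by_period)

    if n_periods == 1:
        return "cross-sectional"
    if n_units == 1:
        return "time series"
    # Several units and several periods: treat it as a panel if any unit is
    # observed again in a later period, otherwise as pooled cross sections.
    if any(len(periods) > 1 for periods in periods_by_unit.values()):
        return "panel"
    return "pooled cross sections"


# Made-up examples, one per data type.
cross_section = [("house_a", 1993), ("house_b", 1993)]
time_series = [("US", 1990), ("US", 1991), ("US", 1992)]
pooled = [("house_a", 1993), ("house_b", 1993), ("house_c", 1995), ("house_d", 1995)]
panel = [("city_1", 1986), ("city_1", 1990), ("city_2", 1986), ("city_2", 1990)]

for name, data in [("cross_section", cross_section), ("time_series", time_series),
                   ("pooled", pooled), ("panel", panel)]:
    print(name, "->", classify_dataset(data))
```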

Types of data
! Cross-sectional data
!! Sample of individuals, households, firms, cities, states, countries, or other units of interest at a given point of time/in a given period
!! Cross-sectional observations must be more or less independent
!! For example, pure random sampling from a population
!! Sometimes pure random sampling is violated, e.g. units refuse to respond in surveys, or sampling is characterized by clustering

Cross sectional data: One observation per unit of obs

! Cross-sectional data set on wages and other characteristics
[Table not reproduced; columns include the observation number, the hourly wage, and indicator variables (1=yes, 0=no).]

Cross sectional data: One observation per unit of obs


! Cross-sectional data on growth rates and country characteristics
[Table not reproduced; columns include the growth rate of real per capita GDP, government consumption as a percentage of GDP, and adult secondary education rates.]

Types of data
! Time series data
!! Observations of a variable or several variables over time
!! For example, stock prices, money supply, consumer price index, gross domestic product, annual homicide rates, automobile sales, ...
!! Time series observations are typically serially correlated
!! Ordering of observations conveys important information
!! Data frequency: daily, weekly, monthly, quarterly, annually, ...
!! Typical features of time series: trends and seasonality
!! Typical applications: applied macroeconomics and finance

Types of data
! Time series data on minimum wages and related variables
[Table not reproduced; columns include the average minimum wage for the given year, the average coverage rate, the unemployment rate, and gross national product.]

Types of data
! Pooled cross sections
!! Two or more cross sections are combined in one data set
!! Cross sections are drawn independently of each other
!! Pooled cross sections are often used to evaluate policy changes
!! Example:
!! Evaluate effect of change in property taxes on house prices
!! Random sample of house prices for the year 1993
!! A new random sample of house prices for the year 1995
!! Compare before/after (1993: before reform, 1995: after reform)
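The before/after comparison in the house price example can be sketched in a few lines of Python. The prices below are synthetic (an assumption, drawn so that the 1995 mean is lower), and the variable names are hypothetical. Because the two samples are drawn independently, the variance of the difference in means is the sum of the two variances of the means.

```python
import math
import random
import statistics

rng = random.Random(1)

# Synthetic house prices in thousands of dollars (assumption, not real data).
prices_1993 = [rng.gauss(200, 30) for _ in range(400)]  # before reform
prices_1995 = [rng.gauss(190, 30) for _ in range(400)]  # after reform, a fresh sample

# Before/after comparison of average prices.
diff = statistics.mean(prices_1995) - statistics.mean(prices_1993)

# Independent samples: add the sampling variances of the two means.
se_diff = math.sqrt(
    statistics.variance(prices_1995) / len(prices_1995)
    + statistics.variance(prices_1993) / len(prices_1993)
)
print(f"after - before = {diff:.1f} (se {se_diff:.1f})")
```

A raw before/after difference attributes everything that changed between 1993 and 1995 to the reform, so it is a starting point rather than a causal estimate.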

Types of data
! Pooled cross sections on housing prices
[Table not reproduced; columns include property tax, size of house in square feet, and number of bathrooms, with observations grouped into before reform and after reform.]

Types of data
! Panel or longitudinal data
!! The same cross-sectional units are followed over time
!! Panel data have both a cross-sectional and a time series dimension
!! Hence, panel data can be used to account for time-invariant unobservables
!! Panel data can be used to model lagged responses
!! Example:
!! City crime statistics; each city is observed in two years
!! Time-invariant unobserved city characteristics may be modeled
!! Effect of police on crime rates may exhibit time lag
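One way to see how panel data account for time-invariant unobservables is a first-difference sketch for the two-year city example. The notation is an assumption, not taken from the slides: $a_i$ is city $i$'s time-invariant unobserved characteristic, $\delta_0$ is a shift common to all cities between the two years, and $u_{it}$ is the remaining error.

```latex
\begin{aligned}
crime_{i,1986} &= \beta_0 + \beta_1\, police_{i,1986} + a_i + u_{i,1986} \\
crime_{i,1990} &= \beta_0 + \delta_0 + \beta_1\, police_{i,1990} + a_i + u_{i,1990} \\
\Delta crime_i &= \delta_0 + \beta_1\, \Delta police_i + \Delta u_i
\end{aligned}
```

Subtracting the 1986 equation from the 1990 equation removes $a_i$, so any fixed city characteristic drops out of the differenced equation, whether or not it is correlated with the number of police.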

Types of data
! Two-year panel data on city crime statistics
[Table not reproduced; each city has two time series observations, with columns including the number of police in 1986 and the number of police in 1990.]

Causal Inference vs. Association


! One of the major aspects of econometrics that distinguishes it from statistics is a very strong emphasis on understanding causal inference
!! Causal inference: evaluating whether a change in one variable (x) will lead to a change in another variable (y) assuming nothing else changes (ceteris paribus)

! Why do we care about causality?
!! A lot of times as econometricians we're explicitly trying to evaluate a policy to enact
!! Thus we want to know what will happen if we change one variable

! More generally, the statistical tools we have can tell us a lot about how two variables covary
!! But correlation doesn't imply causation, and to get to causal inference we generally need to know about how the problem works in real life

! A particular concern is when our two variables x and y are endogenous, or jointly determined
!! That is, x and y influence each other, or there's a third variable Z that affects both
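The third-variable case can be made concrete with a small simulation. This is a sketch on synthetic data, under the assumption that z drives both x and y while x has no effect on y at all. The helper `correlation` is written out by hand to keep the example self-contained.

```python
import math
import random


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(42)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]   # the third variable
x = [zi + rng.gauss(0, 1) for zi in z]    # x depends on z only
y = [zi + rng.gauss(0, 1) for zi in z]    # y depends on z only, never on x

corr_xy = correlation(x, y)

# Strip out z's contribution (known here because we built the data ourselves).
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - zi for yi, zi in zip(y, z)]
corr_resid = correlation(x_resid, y_resid)

print(f"corr(x, y) = {corr_xy:.2f}, after removing z = {corr_resid:.2f}")
```

x and y covary strongly even though changing x would do nothing to y. Once z is accounted for, the association disappears. With real data z is often unobserved, which is what makes the problem hard.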

Example: Medicine (Medieval)

Four Humors Theory (not falsifiable)

Four Humors Empirics (sub-optimal outcomes)

Medicine (Victorian)

Miasma Theory (not falsifiable)

Miasma Empirics (sub-optimal outcomes)

Medicine (Early Modern, 1850s)

Theory: Cholera is a disease transmitted by contaminated water (falsifiable)

Disease Theory Empirics (decent outcomes)

Randomization
! Ironically, given its late arrival to the scientific method, medicine first developed what we may now consider to be the ultimate tool in causal inference in the sciences: the randomized control trial (or RCT)

Endogeneity and random assignment


! Why is random assignment important?
! Consider health_outcome = A*(took_drug) + ε
!! where ε is our error term, took_drug is a binary variable indicating that a patient took a drug, and health_outcome is some measurement of health

! Why do we care that took_drug is randomly assigned?


!! If took_drug is randomly assigned, then is it correlated with anything?
!! If not, who cares?
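A simulation makes the role of random assignment concrete. The sketch below uses synthetic data and assumes a true effect A = 2.0. It also assumes that, without randomization, sicker patients are more likely to take the drug. The unobserved baseline health plays the role of the error term, and all names are hypothetical. The difference in mean outcomes between takers and non-takers equals A plus the gap in the average error term between the two groups, so it recovers A only when that gap is zero.

```python
import random


def diff_in_means(outcome, treated):
    """Mean outcome of the treated minus mean outcome of the untreated."""
    treated_vals = [o for o, t in zip(outcome, treated) if t == 1]
    control_vals = [o for o, t in zip(outcome, treated) if t == 0]
    return sum(treated_vals) / len(treated_vals) - sum(control_vals) / len(control_vals)


rng = random.Random(7)
A = 2.0      # true effect of the drug (assumption)
n = 10000
baseline = [rng.gauss(0, 1) for _ in range(n)]  # unobserved health: the error term

# Self-selection: patients with poor baseline health tend to take the drug.
selected = [1 if b + rng.gauss(0, 1) < 0 else 0 for b in baseline]
# Random assignment: a coin flip, unrelated to baseline health.
randomized = [rng.randint(0, 1) for _ in range(n)]

health_selected = [A * t + b for t, b in zip(selected, baseline)]
health_randomized = [A * t + b for t, b in zip(randomized, baseline)]

naive_estimate = diff_in_means(health_selected, selected)
rct_estimate = diff_in_means(health_randomized, randomized)
print(f"self-selected: {naive_estimate:.2f}, randomized: {rct_estimate:.2f}, truth: {A}")
```

Under self-selection took_drug is correlated with the error term, so the comparison understates the drug's effect. Under random assignment took_drug is uncorrelated with everything else that affects health, and the simple comparison lands close to A.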

Why is endogeneity an issue?


! Randomization allows a researcher to eliminate the possibility that they are arguing for a causal, exogenous interpretation of an endogenous system

! Endogenous means originating from inside the system, in this case taken to mean co-influential
!! Education and earnings
!! Prices of substitute or complementary goods
!! Development and the environment

! Exogenous means originating outside the system
!! Interpreting an endogenous relationship as exogenous means risking interpreting a system with reverse causality as strictly causal

Endogeneity by example: Classroom size and educational achievement

How is the relationship endogenous?

Class size: ways of attacking endogeneity


! Randomly assign students to large or small classes
!! Tennessee STAR

! Find a natural experiment that produces something akin to randomization in class size
!! Maimonides' Rule in Israel

! But these don't always work the way we think


!! Discontinuous class size cutoffs in Chile

Why all this concern about endogeneity?


! Endogeneity is particularly troublesome in the social sciences because humans are self-aware
!! Humans might sort on / select into treatment
!! Natural scientists don't normally have intelligent, reactive data points

! Our understanding of how to deal with endogeneity is relatively new
!! Medicine, which one might argue is halfway between the natural and social sciences, needed to be concerned with endogeneity early on

! Endogeneity is still a concern in the natural sciences, though
!! Correlation does not imply causation
