
Lecture Notes for Econ 101A

David Card

Dept. of Economics
UC Berkeley

The manuscript was typeset by Daniel Nolan in LaTeX. The figures were created in Asymptote, Inkscape, R,
and Excel (the majority in Inkscape). Please address comments/corrections to daniel nolan@msn.com, with "Card
Lecture Notes" in the subject line.
Contents
1 Optimization
1.1 Unconstrained Optimization
1.2 Constrained Optimization
1.3 Appendix
1.3.1 Convexity
1.3.2 SOC in Higher Dimensions
2 Consumer Choice
2.1 Budget Constraint
2.2 Consumer's Objective
2.3 Consumer's Optimum
2.4 Special Problems
3 Two Applications of Indifference Curve Analysis
3.1 Analysis of a Subsidy
3.2 The Consumer Price Index
4 Indirect Utility and the Expenditure Function
4.1 Indirect Utility
4.2 Expenditure Function
5 Comparative Statics of Consumer Choice
5.1 Change in Demand with Respect to Income, Engel Curves
5.2 Change in Demand with Respect to Price
5.3 Graphical Decomposition of a Change in Demand
5.4 Substitution Effect
5.5 Income Effect
6 Slutsky's Equation
6.1 Review
6.2 Slutsky Decomposition
7 Using Market Level Demand Curves
7.1 An Increase in Income
7.2 Tax Incidence
8 Labor Supply
9 Intertemporal Consumption
10 Production and Cost I
10.1 One-Factor Production and Cost Functions
10.1.1 Production Functions
10.1.2 Cost Functions
10.1.3 Connection between MC and MP
10.1.4 Geometry of c, AC, and MC
11 Production and Cost II
11.1 Derivation of the Cost Function
11.2 Marginal Cost
12 Cost Functions and IRFs
12.1 Shephard's Lemma
13 Supply
13.1 Supply Determination
13.2 The Law of Supply
13.3 Changes in Input Prices
14 Input Demand for a Competitive Firm
15 Industry Supply
16 Monopoly I
16.1 Monopolist's Objective
16.2 Comparative Statics
16.3 Monopoly in Two or More Markets
17 Monopoly II
18 Consumer's Surplus
19 Duopoly
19.1 Monopolization
19.2 Duopoly Equilibrium
19.3 Price Setting vs. Quantity Setting
20 Symmetric Cournot Equilibria
20.1 n-Firm Symmetric Cournot Equilibria
20.2 Alternatives to the Cournot Assumption
21 Game Theory I
22 Game Theory II
22.1 Tree Diagrams
22.2 Interpretation
23 Uncertainty I: Income Lotteries
23.1 Review of Basic Statistical Concepts
23.2 Choices Over Uncertain Incomes
24 Uncertainty II: Expected Utility
24.1 Expected Utility
24.2 The Demand for Insurance
25 Uncertainty III: Moral Hazard
25.1 Solution with No Moral Hazard
25.2 A Partial Solution
26 Uncertainty IV: The State-preference Approach and Adverse Selection
26.1 Setup
26.2 Adverse Selection
27 Auctions I: Types of Auctions
27.1 Basic Types of Auction
27.2 Important Results Concerning the Private Values Case
27.3 Bidding in a First-price Auction
28 Auctions II: Winner's Curse
28.1 Appendix: Order Statistics
28.1.1 Uniform Distribution
28.1.2 Normal Distribution
29 Finance I: Capital Asset Pricing Model
29.1 Assumptions
29.2 Conclusion
29.3 Summary
30 Finance II: Efficient Market Hypothesis
30.1 Review
30.2 Efficient Market Hypothesis
31 Public and Near-public Goods
31.1 Optimal Provision of Goods with No-rivalry Characteristics
31.1.1 Case 1: one consumer; $x = t_1/p$
31.1.2 Case 2: two consumers; $x = (t_1 + t_2)/p$
31.1.3 Case 3: n consumers; $x = \left(\sum_{i=1}^{n} t_i\right)/p$
31.2 Appendix: Social Optimum with Ordinary Goods
32 Externalities
32.1 Consumption Externalities
32.1.1 Market Equilibrium
32.1.2 Social Optimum
32.1.3 Market Equilibrium versus Social Optimum
32.1.4 Other Examples
32.2 Production Externalities
33 Empirical Methods in Microeconomics
33.1 Experiments and Counterfactuals
33.1.1 The Self Sufficiency Project (SSP)
33.2 Research Designs Based on Natural Experiments
33.2.1 The Mariel Boatlift
33.3 Natural Experiments with Several Control Groups
33.3.1 The New Jersey Minimum Wage
33.4 The Discontinuity Research Design
Course Description
This is a course in intermediate microeconomics, emphasizing the applications of calculus and linear
algebra to the problems of consumer choice, firm behavior, and market interactions. Students are
presumed to be familiar with multivariate calculus (including e.g. limits, derivatives, integrals) and
with basic statistics (random variables, moments, etc.). The course material will be presented in a
fairly mathematical way and the problem sets and examinations will require you to apply models
and derive results. Students who are concerned about their mathematical ability should consider
Econ 100A.
The basic text is Microeconomic Theory: Basic Principles and Extensions, by Nicholson & Snyder,
which should be available at the campus book store. An alternative, slightly more theoretical
treatment of the same material is Varian's Intermediate Microeconomics: A Modern Approach.
Another, slightly more application-oriented alternative is Perloff's Microeconomics: Theory and
Applications with Calculus. Any of these is a good supplement to the lectures, but the lectures
will be at a somewhat higher level, and will not follow the texts closely.
Problem sets and practice exams will be made available on the course website.
The GSIs will present some additional material in section (for which all students will be responsible)
and also will review the solutions to problem sets, practice exams, and problems from the lectures,
etc.
Weekly problem sets will be assigned most weeks throughout the course. Completed problem sets
are due at the end of the last lecture each week. We will not accept late problem sets. Instead, we
drop your two worst scores. Thus, you can miss up to two problem sets without any penalty. You
are encouraged to work in groups but every student must hand in his or her own version of the
solutions.
Course grades will be determined by a combination of weekly problem sets (20 percent), two
midterm exams (15 percent each), and a final exam (50 percent). The midterm exams will be
held in class.
Lecture Topics
1 Methods of Optimization
2 Consumer Choice
3 Applications of Indifference Curve Analysis, Expenditure Function
4 Comparative Statics, Slutsky's Equation
5 Market Level Demand and Supply
6 Labor Supply
7 Intertemporal Consumption & Savings
8–9 Production & Cost, Shephard's Lemma
10–11 Supply Determination
12 Monopoly and Price Discrimination
13 Consumer/Producer Surplus & Applications
14–15 Duopoly
16–17 Game Theory
18–21 Uncertainty and Insurance Markets
22–23 Auctions
24–25 Finance: CAPM and Efficient Markets
26–27 Public Goods, Externalities
28 Empirical Methods in Microeconomics
1 Optimization
1.1 Unconstrained Optimization
Consider a smooth function $y = f(x)$. How do we go about finding a point $x_0$ such that $y_0 = f(x_0) \ge f(x)$ for any $x$ in $[a, b]$?
Figure 1.1: In this picture $f(x_0) = \max_{a \le x \le b} f(x)$. (Read: $f(x_0)$ is the maximum value of $f(x)$ when $x$
is selected from the interval $[a, b]$.)
What can we say generally? Obviously, if $x_0$ is a potential candidate for a maximizer, then it must
be the case that we can't move around $x_0$ and reach a higher value of $f$. But this means $f'(x_0) = 0$.
Why? Let $0 < h \ll 1$.
If $f'(x) > 0$, then $f(x + h) \approx f(x) + h f'(x) > f(x)$.
If $f'(x) < 0$, then $f(x - h) \approx f(x) - h f'(x) > f(x)$.
In either case a small step raises $f$, so such an $x$ cannot be a maximizer.
This leads us to Rule 1:
If $f(x_0) = \max_{a \le x \le b} f(x)$ and $a < x_0 < b$, then $f'(x_0) = 0$.
This is called the first order necessary condition (FONC) for an interior maximum.
Does $f'(x_0) = 0$ always mean that $x_0$ is a maximizer? Are there maximizers with $f'(x_0) \neq 0$?
Consider the examples illustrated in Figure 1.3.
How can we be certain that we have located a maximum (not a minimum, nor an inflection point)?
We examine the properties of $f'(x)$, which is itself a function of $x$. Take a look at Figure 1.4. As
the function $f'$ crosses $x_0$ from left to right, it goes from positive to negative, i.e. it's decreasing.
On the other hand, as $f'$ crosses $x_1$ from left to right, it goes from negative to positive, i.e. it's
increasing. In general, at a local maximum $f'(x)$ has negative slope, or in other words $f''(x) < 0$,
while at a local minimum $f'(x)$ has positive slope, that is $f''(x) > 0$.
These considerations lead us to Rule 2:
If $f'(x_0) = 0$ and $f''(x_0) < 0$, then $f(x_0)$ is a local maximum.
If $f'(x_0) = 0$ and $f''(x_0) > 0$, then $f(x_0)$ is a local minimum.
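Rules 1 and 2 can be applied mechanically to a concrete function. The following is a minimal sketch, assuming the illustrative function $f(x) = x^3 - 3x$ (chosen here for convenience, not taken from the notes) with derivatives computed by hand; the helper name `classify` is hypothetical.

```python
# Illustrative function (an assumption, not from the notes): f(x) = x**3 - 3*x
def f_prime(x):
    return 3 * x**2 - 3      # f'(x)

def f_double_prime(x):
    return 6 * x             # f''(x)

def classify(x0, tol=1e-9):
    """Apply Rule 1 (FONC) and then Rule 2 (sign of f'') at the point x0."""
    if abs(f_prime(x0)) > tol:
        return "not a critical point"
    curvature = f_double_prime(x0)
    if curvature < 0:
        return "local maximum"
    if curvature > 0:
        return "local minimum"
    return "inconclusive"    # f''(x0) = 0: Rule 2 says nothing

for x0 in (-1.0, 1.0, 0.0):
    print(x0, classify(x0))
```

The two critical points are $x = -1$ and $x = 1$; the point $x = 0$ fails the FONC and is rejected before the second derivative is ever examined.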
Figure 1.2: Notice that Rule 1 also holds for a function of several variables.
Figure 1.3: Exceptions to the converse of Rule 1: (a) $f(x) = x$. Thus $f(b) = \max_{a \le x \le b} f(x)$ even though
$f'(b) = 1 \neq 0$. The maximum occurs on the boundary. (b) $f'(x) = 0$ has two solutions, $x'$ and $x''$,
but neither one is a maximizer over $[a, b]$: $f(x')$ is a local maximum while $f(x'')$ is a minimum.
(c) $f(x) = x^3$. Solving $f'(x) = 0$ gives $x = 0$, which is an inflection point.
Figure 1.4: Properties of $f'(x)$: at a local max $f'$ is decreasing since the slopes of the tangent lines go from positive to
negative. The reverse is true for a local min.
This generalizes to two or more dimensions.
How do we determine whether a local maximum is a global maximum? If $f''(x) < 0$ for all $x$ and
$f'(x_0) = 0$, then $x_0$ is a global maximum. A function $f$ such that $f''(x) < 0$ for all $x$ is called
concave.¹
Figure 1.5: A concave function always lies below any line tangent to its graph.
1.2 Constrained Optimization
Now we consider maximizing a function $f(x_1, x_2)$ subject to (s.t.) some constraint on $x_1$ and $x_2$
which we denote by $g(x_1, x_2) = g_0$. The two important examples of this in economics are:
• In the study of consumer behavior, maximize utility $u(x_1, x_2)$ s.t. the budget constraint
$p_1 x_1 + p_2 x_2 = I$.
• In the study of firm behavior, maximize profit $py - wx$ s.t. the production function $y = f(x)$.
¹ See Appendix 1.3.
How do we go about a graphical analysis of the problem of maximizing $f(x_1, x_2)$ s.t. $g(x_1, x_2) = g_0$?
Figure 1.6: Illustration of the two-step approach described below.
A two-step approach:
1. Plot the contours of the function $g$. E.g. $g(x_1, x_2) = x_1^2 + x_2^2$; $g(x_1, x_2) = k$ is the equation of
a circle with radius $\sqrt{k}$ and center $O = (0, 0)$.
2. Plot the contours of the function $f$. E.g. $f(x_1, x_2) = x_1 x_2$; $f(x_1, x_2) = m$ is the equation of
a hyperbola.
The constrained maximum of the function $f$ occurs where a contour of $f$ is tangent to the contour
of $g$ corresponding to $g_0$. Why? Suppose we add a small amount $dx_1$ to $x_1$ in such a way as to
keep $g(x_1, x_2)$ constant. If so, then we must have a corresponding reduction in $x_2$ such that the
total differential of $g$ is zero, i.e.
$$dg = g_1(x_1, x_2)\,dx_1 + g_2(x_1, x_2)\,dx_2 = 0$$
(where $g_i$ denotes $\partial g/\partial x_i$), which implies
$$\frac{dx_2}{dx_1} = -\frac{g_1(x_1, x_2)}{g_2(x_1, x_2)}$$
If we increase $x_1$ by one unit, we must increase $x_2$ by $-g_1(x_1, x_2)/g_2(x_1, x_2)$ (or, equivalently,
decrease $x_2$ by $g_1(x_1, x_2)/g_2(x_1, x_2)$) in order to keep the value of $g$ constant. The net effect of
such a change in $x_1$ on the value of $f$ is
$$df = f_1(x_1, x_2)\,dx_1 + f_2(x_1, x_2)\,dx_2 = f_1(x_1, x_2)\,dx_1 + f_2(x_1, x_2)\,\frac{dx_2}{dx_1}\,dx_1 = \left[ f_1(x_1, x_2) - f_2(x_1, x_2)\,\frac{g_1(x_1, x_2)}{g_2(x_1, x_2)} \right] dx_1$$
Now in order for $(x_1^0, x_2^0)$ to be a constrained maximum, it must be the case that we cannot increase
$f$ by adding or subtracting a small amount to $x_1$ while keeping the value of $g$ constant. But this
means the above expression is 0 for all $dx_1$, or in other words
$$\frac{f_1(x_1, x_2)}{f_2(x_1, x_2)} = \frac{g_1(x_1, x_2)}{g_2(x_1, x_2)}$$
But this expression says that at $(x_1^0, x_2^0)$, the contours of $f$ and $g$ are tangent, i.e. have the same
slope. Note that this argument applies only if $(x_1^0, x_2^0)$ lies in the interior of the domain, for if $(x_1^0, x_2^0)$
lies on the boundary then we cannot increase or decrease one of $x_1$ or $x_2$.
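The tangency condition can be checked numerically on the two example functions used in the two-step approach, $f(x_1, x_2) = x_1 x_2$ and $g(x_1, x_2) = x_1^2 + x_2^2$. The sketch below is only an illustration under assumed values (the constraint level $k = 2$ and the brute-force search over the positive quarter circle are choices made here, not part of the notes): it locates the constrained maximum by search and then compares $f_1/f_2$ with $g_1/g_2$ at that point.

```python
import math

def f(x1, x2):
    return x1 * x2                      # objective

def constrained_max(k, steps=100_000):
    """Search the quarter circle x1^2 + x2^2 = k (x1, x2 > 0) for the largest f."""
    r = math.sqrt(k)
    best = None
    for i in range(1, steps):
        t = (math.pi / 2) * i / steps   # angle strictly inside (0, pi/2)
        x1, x2 = r * math.cos(t), r * math.sin(t)
        if best is None or f(x1, x2) > best[0]:
            best = (f(x1, x2), x1, x2)
    return best

value, x1, x2 = constrained_max(k=2.0)
ratio_f = x2 / x1    # f1/f2 = x2/x1 for f = x1*x2
ratio_g = x1 / x2    # g1/g2 = (2*x1)/(2*x2) for g = x1^2 + x2^2
print(x1, x2, ratio_f, ratio_g)
```

At the maximizer the two ratios agree (both are approximately 1, with $x_1 \approx x_2 \approx 1$), which is exactly the statement that the contours of $f$ and $g$ share a slope there.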
How do we convert a constrained maximization problem into an unconstrained one? A French
mathematician named Lagrange noted that one gets the right answer by setting up an artificial,
unconstrained maximization problem with an additional variable, $\lambda$:
$$L(x_1, x_2, \lambda) = f(x_1, x_2) - \lambda\,[g(x_1, x_2) - g_0]$$
The FONC for $L$, with respect to $x_1$, $x_2$, and $\lambda$ are:
$$L_1 = f_1(x_1, x_2) - \lambda\,g_1(x_1, x_2) = 0$$
$$L_2 = f_2(x_1, x_2) - \lambda\,g_2(x_1, x_2) = 0$$
$$L_\lambda = -g(x_1, x_2) + g_0 = 0$$
Dividing the first of these by the second gives
$$\frac{f_1(x_1, x_2)}{f_2(x_1, x_2)} = \frac{g_1(x_1, x_2)}{g_2(x_1, x_2)}$$
while the third simply restates the constraint! Thus by writing down the Lagrangian $L$ and setting
its first derivatives equal to zero we get the necessary conditions for a constrained maximum.
We also get a new variable, $\lambda$, called the Lagrange multiplier. How do we interpret $\lambda$? It turns out
that the value of $\lambda$ tells us how much the maximum value of $f$ changes if we relax the constraint by a
small amount. Specifically, suppose we are to maximize $f(x_1, x_2)$ s.t. the constraint $g(x_1, x_2) = g_0$.
Call the solution $(x_1^0, x_2^0)$. Now suppose we relax the constraint and instead maximize $f(x_1, x_2)$ s.t.
$g(x_1, x_2) = g_0 + dg_0$. How do we change our optimal choices of $x_1$ and $x_2$? Suppose we decide to
use more $x_1$, enough to use up the added constraint. Since the total differential of $g$ is
$$dg = g_1(x_1, x_2)\,dx_1 + g_2(x_1, x_2)\,dx_2$$
if we change only $x_1$ (that is, if $dx_2 = 0$), the amount we can change $x_1$ while satisfying the new
constraint is
$$dx_1 = \frac{1}{g_1(x_1, x_2)}\,dg_0$$
The increase in $f$ that accompanies this increase in $x_1$ is
$$df = f_1(x_1, x_2)\,dx_1 = \frac{f_1(x_1, x_2)}{g_1(x_1, x_2)}\,dg_0 = \lambda\,dg_0$$
where the last step uses the first FONC, $f_1 = \lambda g_1$. You are encouraged to check for yourself that if you were to use up the added constraint on $x_2$, $df$
would again be $\lambda\,dg_0$. This suggests another interpretation of the tangency condition: at a maximum,
if we had a bit more constraint, then we would be indifferent as to whether to use it on $x_1$ or $x_2$.
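The interpretation of $\lambda$ as the change in the maximized value of $f$ per unit of relaxed constraint can be verified on a small example. The sketch below assumes the illustrative problem of maximizing $x_1 x_2$ s.t. $x_1^2 + x_2^2 = g_0$ with $x_1, x_2 > 0$, whose tangency condition $x_2/x_1 = x_1/x_2$ gives $x_1 = x_2 = \sqrt{g_0/2}$; the function names and the value $g_0 = 2$ are hypothetical choices.

```python
import math

def solve(g0):
    """Closed-form solution of: max x1*x2 s.t. x1^2 + x2^2 = g0 (x1, x2 > 0).
    Tangency x2/x1 = x1/x2 gives x1 = x2 = sqrt(g0 / 2)."""
    x = math.sqrt(g0 / 2)
    return x, x

def max_value(g0):
    x1, x2 = solve(g0)
    return x1 * x2

g0, dg0 = 2.0, 1e-6
x1, x2 = solve(g0)
lam = x2 / (2 * x1)                                      # lambda = f1/g1 from the first FONC
numerical = (max_value(g0 + dg0) - max_value(g0)) / dg0  # change in max f per unit of g0
print(lam, numerical)
```

Both numbers are approximately 0.5: relaxing the constraint by a small $dg_0$ raises the attainable value of $f$ by about $\lambda\,dg_0$.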
As with unconstrained optimization, there are also second order conditions. These can be expressed
algebraically; however, they amount to the condition that the objective function has contours that
are more convex than the constraint.²
Figure 1.7: (a) Contours of $f$ are more convex than $g(x_1, x_2) = g_0$: SOC satisfied. (b) Contours of $f$ are
linear, less convex than $g(x_1, x_2) = g_0$: SOC not satisfied.
1.3 Appendix
1.3.1 Convexity
A set $S \subseteq \mathbb{R}^2$ is convex if, for every pair of points $u = (u_1, u_2)$ and $v = (v_1, v_2)$ in $S$,
$$\theta \in [0, 1] \implies \theta u + (1 - \theta) v \in S$$
i.e. the line segment joining $u$ and $v$ lies entirely in $S$. A set that is not convex is called concave.
A function $f : [a, b] \to \mathbb{R}$ is called convex if, for every $x_1$ and $x_2$ in $[a, b]$,
$$\theta \in [0, 1] \implies f(\theta x_1 + (1 - \theta) x_2) \le \theta f(x_1) + (1 - \theta) f(x_2)$$
² See Appendix 1.3.
Or, equivalently, $f : [a, b] \to \mathbb{R}$ is convex if the set $S = \{(x, y) \in [a, b] \times \mathbb{R} : y \ge f(x)\}$ is convex. A
function $g : [a, b] \to \mathbb{R}$ is called concave if $-g$ is convex. Let $f$ be twice differentiable. Then
$f$ is convex if $f''(x) > 0$ for all $x$
$f$ is concave if $f''(x) < 0$ for all $x$
Throughout these notes, if $f''(x) > [<]\ g''(x) > [<]\ 0$ on some interval, then we shall think of $f$ as
being more convex [more concave] than $g$.
A function $f : \mathbb{R}^2 \to \mathbb{R}$ is quasi-concave if $S_k = \{(x, y) \in \mathbb{R}^2 : f(x, y) \ge k\}$ is convex for all $k$. (The
sets $S_k$ are called upper contour sets.)
1.3.2 SOC in Higher Dimensions
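Let $f : \mathbb{R}^n \to \mathbb{R}$, i.e. let $z = f(x_1, \ldots, x_n)$, and define the Hessian $H(f)$ to be the matrix
$$H(f) = \begin{pmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}$$
Next, define $H_i(f)$ to be the $i$th leading principal submatrix of $H(f)$, that is, the submatrix comprised of the first $i$
rows and the first $i$ columns of $H(f)$; its determinant $|H_i(f)|$ is the $i$th leading principal minor. For example
$$H_2(f) = \begin{pmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2}
\end{pmatrix}$$
If, at $z_0 = f(x_1^0, \ldots, x_n^0)$, $|H_i(f)| > 0$ for all $i$, then $z_0$ satisfies the SOC for a local minimum. On
the other hand, if $\operatorname{sgn}(|H_i(f)|) = (-1)^i$ for all $i$, then $z_0$ satisfies the SOC for a local maximum.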
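The sign test on the determinants $|H_i(f)|$ is easy to carry out by machine once the Hessian has been evaluated at a critical point. The following is a minimal sketch in plain Python; the function names (`det`, `leading_minors`, `soc`) and the three example Hessians are illustrative assumptions, and the cofactor expansion used here is only sensible for small matrices.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def leading_minors(h):
    """|H_1|, ..., |H_n|: determinants of the top-left i-by-i blocks of h."""
    return [det([row[:i] for row in h[:i]]) for i in range(1, len(h) + 1)]

def soc(h):
    minors = leading_minors(h)
    if all(d > 0 for d in minors):
        return "local minimum"
    # sgn|H_i| = (-1)^i: negative for odd i, positive for even i
    if all((d > 0) if i % 2 == 0 else (d < 0) for i, d in enumerate(minors, start=1)):
        return "local maximum"
    return "SOC not satisfied"

print(soc([[2, 0], [0, 2]]))      # Hessian of x1^2 + x2^2
print(soc([[-2, 1], [1, -2]]))    # Hessian of -x1^2 - x2^2 + x1*x2
print(soc([[0, 1], [1, 0]]))      # Hessian of x1*x2 (a saddle at the origin)
```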
2 Consumer Choice
In this section we apply the methods of optimization of Section 1 to the analysis of consumer choice
subject to a budget constraint. The problem has three elements:
1. Describe the budget constraint.
2. Describe the consumer's objective, i.e. his or her utility.
3. Set up and solve the constrained optimization.
2.1 Budget Constraint
We assume that a consumer must choose among bundles $(x_1, \ldots, x_n)$ of commodities 1 through $n$
that fall within his or her budget. In the case of just two goods $x_1$ and $x_2$ let their prices be $p_1$
and $p_2$, respectively. Let the consumer have income $I$. Then the bundle $(x_1, x_2)$ is affordable iff
$$p_1 x_1 + p_2 x_2 \le I.$$
Figure 2.1: Graphically, the set of affordable bundles (the budget set) is the triangular region bounded
by the coordinate axes and the line $x_2 = -(p_1/p_2)\,x_1 + I/p_2$.
Note the following:
• if all income is spent on $x_1$, the total amount available is $I/p_1$ (and likewise for $x_2$)
• we are implicitly assuming that you cannot buy negative amounts of $x_1$ or $x_2$
• the slope of the budget line (the outer boundary of the budget set) is $-p_1/p_2$
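The budget set and the three observations above translate directly into a few lines of code. This is a small sketch with hypothetical prices and income ($p_1 = 2$, $p_2 = 5$, $I = 100$); the function name `affordable` is an illustrative choice.

```python
def affordable(x1, x2, p1, p2, income):
    """True if the bundle lies in the budget set (non-negative and within income)."""
    return x1 >= 0 and x2 >= 0 and p1 * x1 + p2 * x2 <= income

p1, p2, income = 2.0, 5.0, 100.0             # hypothetical prices and income
print(income / p1, income / p2)               # intercepts: 50.0 units of x1, 20.0 units of x2
print(-p1 / p2)                               # slope of the budget line: -0.4
print(affordable(10, 16, p1, p2, income))     # cost 20 + 80 = 100 -> True
print(affordable(10, 17, p1, p2, income))     # cost 20 + 85 = 105 -> False
print(affordable(-1, 5, p1, p2, income))      # negative quantity -> False
```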
2.2 Consumer's Objective
We seek a simple way of summarizing how the consumer evaluates alternative bundles, say $(x_1^0, x_2^0)$
and $(x_1', x_2')$.
Figure 2.2: If we give up one unit of $x_1$, we save $p_1$, which can be used to purchase $p_1/p_2$ units of $x_2$.
The market trades $x_1$ for $x_2$ at the rate $p_1/p_2$. This ratio represents the relative price of $x_1$
and $x_2$.
Graphically, the device we use is the indifference curve: a curve connecting bundles that are equally
good. Consider the indifference curve through $(x_1^0, x_2^0)$, i.e. the set of bundles that are as good
as $(x_1^0, x_2^0)$.
Now take a look at Figure 2.4. If both $x_1$ and $x_2$ are desirable, then bundles with more $x_1$ and
more $x_2$ must be preferred to $(x_1^0, x_2^0)$. By the same token, $(x_1^0, x_2^0)$ must be preferred to bundles
with less $x_1$ and less $x_2$. This means that indifference curves must have negative slope.
In more advanced treatments of economic theory, indifference curves are derived from a set of
assumptions about how consumers evaluate alternative bundles. Some types of preferences cannot
be represented by indifference curves. The classic example is lexicographic preferences: the
consumer evaluates a bundle $(x_1, x_2)$ first by the amount of $x_1$, then by the amount of $x_2$. If
$x_1^0 > x_1'$, then $(x_1^0, x_2^0)$ is strictly preferred to $(x_1', x_2')$ regardless of $x_2^0$ and $x_2'$. However, if $x_1^0 = x_1'$,
then the consumer compares $x_2^0$ and $x_2'$. (This is the same way alphabetical order works.) As an
exercise, try to graph the indifference curves of a consumer with lexicographic preferences.
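Lexicographic comparison happens to be the rule Python uses to order tuples, so the preference relation just described can be sketched in one line. The function name `lex_prefers` and the sample bundles are hypothetical.

```python
def lex_prefers(a, b):
    """True if bundle a is strictly preferred to bundle b under lexicographic preferences."""
    return a > b          # Python compares tuples first by x1, then by x2

print(lex_prefers((2, 0), (1, 100)))   # True: more x1 wins regardless of x2
print(lex_prefers((1, 5), (1, 3)))     # True: x1 ties, so x2 decides
print(lex_prefers((1, 3), (1, 3)))     # False: identical bundles
```

Note that two distinct bundles are never ranked equally by this rule, which is a useful hint for the graphing exercise.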
Analytically, we represent preferences by a utility function $u(x_1, x_2)$ with domain equal to the set
of possible consumption bundles. We construct $u$ such that higher values are preferred.
Examples:
• $u(x_1, x_2) = x_1 x_2$
• $u(x_1, x_2) = x_1 + x_2$
• $u(x_1, x_2) = \min\{x_1, x_2\}$
Facts:
• The contours of $u$ are the indifference curves.
• The bundles $(x_1^0, x_2^0)$ and $(x_1', x_2')$ lie on the same indifference curve iff $u(x_1^0, x_2^0) = u(x_1', x_2')$.
• Let $h > 0$. If more of $x_1$ is always preferred, then $u(x_1 + h, x_2) > u(x_1, x_2)$, which implies
$u_1(x_1, x_2) > 0$ for every bundle $(x_1, x_2)$. (Likewise for $x_2$.) You are encouraged to verify this
for each of the above examples.
• The slope of the indifference curve through $(x_1, x_2)$, at $(x_1, x_2)$, is $-u_1(x_1, x_2)/u_2(x_1, x_2)$.
We call the absolute value of this ratio the marginal rate of substitution (MRS) because it
is the amount of $x_2$ the consumer would need to compensate for the loss of one unit of $x_1$,
or in other words the amount of $x_2$ needed, per unit of $x_1$ given up, in order to keep utility
constant.
Figure 2.3: How does a consumer decide between $(x_1^0, x_2^0)$ and $(x_1', x_2')$?
Figure 2.4: If both $x_1$ and $x_2$ are desirable, then it follows that indifference curves are downward-sloping.
Figure 2.5: The slope of the indifference curve through $(x_1^0, x_2^0)$ is $-\mathrm{MRS} = -u_1(x_1^0, x_2^0)/u_2(x_1^0, x_2^0)$.
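Examples:
• $u(x_1, x_2) = x_1^{\alpha} x_2^{\beta}$ (Cobb-Douglas)
$$u_1(x_1, x_2) = \alpha x_1^{\alpha - 1} x_2^{\beta}$$
$$u_2(x_1, x_2) = \beta x_1^{\alpha} x_2^{\beta - 1}$$
$$\mathrm{MRS} = \frac{u_1(x_1, x_2)}{u_2(x_1, x_2)} = \frac{\alpha}{\beta}\,\frac{x_2}{x_1}$$
• $u(x_1, x_2) = x_1 + x_2$
$$\mathrm{MRS} = \frac{u_1}{u_2} = 1, \text{ a constant for every bundle } (x_1, x_2)$$
• $u(x_1, x_2) = 2 \log x_1 + x_2$
$$\mathrm{MRS} = \frac{u_1}{u_2} = \frac{2/x_1}{1} = \frac{2}{x_1}, \text{ independent of } x_2$$
As an exercise, graph the indifference curves for these three examples.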
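The three MRS formulas can be checked against finite-difference approximations of $u_1/u_2$. The sketch below is illustrative only: the Cobb-Douglas exponents are written `alpha` and `beta` with hypothetical values 0.3 and 0.7, the evaluation bundle $(4, 6)$ is arbitrary, and the helper `mrs` is not part of the notes.

```python
import math

def mrs(u, x1, x2, h=1e-6):
    """MRS = u1/u2, with both partial derivatives taken by central differences."""
    u1 = (u(x1 + h, x2) - u(x1 - h, x2)) / (2 * h)
    u2 = (u(x1, x2 + h) - u(x1, x2 - h)) / (2 * h)
    return u1 / u2

alpha, beta = 0.3, 0.7    # hypothetical Cobb-Douglas exponents

def cobb_douglas(x1, x2):
    return x1**alpha * x2**beta

def linear(x1, x2):
    return x1 + x2

def quasi_linear(x1, x2):
    return 2 * math.log(x1) + x2

x1, x2 = 4.0, 6.0
print(mrs(cobb_douglas, x1, x2), (alpha / beta) * x2 / x1)   # numerical vs formula
print(mrs(linear, x1, x2), 1.0)
print(mrs(quasi_linear, x1, x2), 2 / x1)
```

Each printed pair should agree to several decimal places; changing `x2` leaves the third pair unchanged, since that MRS does not depend on $x_2$.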
Note: If your utility function is $u(x_1, x_2)$ and mine is $v(x_1, x_2) = a\,u(x_1, x_2) + b$, where $a > 0$, then
we have the same preferences. Why? It can be shown that we have the same indifference curves,
only with different labels. The result holds for $v = f(u)$, where $f$ is a monotonically increasing
function.
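A quick way to see that increasing transformations leave preferences unchanged is to rank a handful of bundles under $u$ and under transformed versions of $u$. The sketch below assumes $u(x_1, x_2) = x_1 x_2$, the affine transformation $3u + 7$, and the logarithm as a second increasing transformation; the bundles are hypothetical.

```python
import math

def u(x1, x2):
    return x1 * x2

def v(x1, x2):
    return 3 * u(x1, x2) + 7          # a = 3 > 0, b = 7 (hypothetical values)

def w(x1, x2):
    return math.log(u(x1, x2))        # another monotonically increasing transformation

bundles = [(1, 1), (2, 3), (4, 1), (3, 3), (0.5, 9)]
rank_u = sorted(bundles, key=lambda b: u(*b))
rank_v = sorted(bundles, key=lambda b: v(*b))
rank_w = sorted(bundles, key=lambda b: w(*b))
print(rank_u)
print(rank_u == rank_v == rank_w)
```

The three rankings coincide: the utility numbers attached to each indifference curve change, but the ordering of the bundles does not.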
You may be familiar with the concept of diminishing marginal rate of substitution (DMRS). Unless
stated otherwise, we shall assume DMRS in most of the examples throughout these notes.
Figure 2.6: (a) DMRS (b) constant MRS (c) increasing MRS
Along an indifference curve (holding utility constant), the MRS decreases with $x_1$. As one obtains
more $x_1$, the less one values an additional unit of $x_1$ in terms of $x_2$. DMRS implies that consumers
always prefer averages. Suppose we have two bundles $(x_1^0, x_2^0)$ and $(x_1', x_2')$ on the same indifference
curve. Then a bundle that is a weighted average of $(x_1^0, x_2^0)$ and $(x_1', x_2')$, e.g. $\theta (x_1^0, x_2^0) + (1 - \theta)(x_1', x_2')$,
where $0 < \theta < 1$, is strictly preferred to either of the original bundles.
Figure 2.7: The dashed line represents the set of all weighted averages of $x^0$ and $x'$, that is, the set
$S = \{\theta x^0 + (1 - \theta) x' : 0 < \theta < 1\}$. Clearly these are strictly preferred to both $x^0$ and $x'$.
Equivalently, the set $S = \{x \in \mathbb{R}^2 : u(x) > u(x^0)\}$ is convex. (One can see this by noting the
shape of the region above the indifference curve.)
It is important to understand that DMRS is not the same as diminishing marginal utility, nor are
the two even related. Given a utility function $u$, the marginal utility of $x_1$ is $u_1$. We say that $u$
exhibits diminishing marginal utility if $u_{11} = (u_1)_1 < 0$. However, the sign of $u_{11}$ says nothing
about the MRS, as the following examples show:
• $u(x_1, x_2) = (x_1^2 + x_2^2)^{1/4}$
$$u_1(x_1, x_2) = \tfrac{1}{2}\,x_1\,(x_1^2 + x_2^2)^{-3/4}$$
$$u_{11}(x_1, x_2) = \tfrac{1}{4}\,(2x_2^2 - x_1^2)\,(x_1^2 + x_2^2)^{-7/4}$$
which is negative wherever $x_1^2 > 2x_2^2$ $\implies$ decreasing marginal utility there, but the indifference
curves are circles, which exhibit increasing MRS.
• $u(x_1, x_2) = x_1^3 x_2^3$
$$u_1(x_1, x_2) = 3x_1^2 x_2^3$$
$$u_{11}(x_1, x_2) = 6x_1 x_2^3 > 0$$
$\implies$ increasing marginal utility but the indifference curves are
hyperbolas, which exhibit DMRS.
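The second example can be checked numerically: along an indifference curve of $u = x_1^3 x_2^3$ the marginal utility of $x_1$ is increasing while the MRS is falling. The sketch below assumes the indifference curve $x_1 x_2 = 1$ and a few sample values of $x_1$; the function names are illustrative.

```python
def mrs_along_curve(x1, level=1.0):
    """MRS of u = x1^3 * x2^3 on the indifference curve x1 * x2 = level.
    Here u1/u2 = x2/x1, and x2 = level / x1 on the curve."""
    x2 = level / x1
    return x2 / x1

def u11(x1, x2):
    return 6 * x1 * x2**3            # second derivative of u with respect to x1

points = [0.5, 1.0, 2.0, 4.0]
print([u11(x1, 1.0 / x1) > 0 for x1 in points])   # marginal utility is increasing everywhere
print([mrs_along_curve(x1) for x1 in points])      # MRS falls as x1 rises: DMRS
```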
2.3 Consumer's Optimum
Analytically, the consumer's problem is to solve
$$\max_{x_1, x_2} u(x_1, x_2) \quad \text{s.t.} \quad p_1 x_1 + p_2 x_2 = I$$
Have a look at Figure 2.8. Clearly, a bundle $(x_1^0, x_2^0)$ is optimal if two things are true:
1. $p_1 x_1^0 + p_2 x_2^0 = I$,
2. $\mathrm{MRS}(x_1^0, x_2^0) = p_1/p_2$.
Figure 2.8: The consumer chooses the bundle that lands her on the highest indifference curve while still
lying on the budget line.
Condition (2), the tangency condition, expresses the simple fact that if $(x_1^0, x_2^0)$ is optimal, then
there are no gains to be made by trading in the market any further. If $\mathrm{MRS} > p_1/p_2$, then the
consumer values $x_1$ more than the market does, in terms of $x_2$, so it would benefit the consumer
to sell $x_2$ and buy more $x_1$, as you can see in Figure 2.9.
Figure 2.9: $\mathrm{MRS} > p_1/p_2$. On the margin, the consumer values $x_1$ more than the market does, in terms
of $x_2$, and there is room for a profitable trade! What happens if $\mathrm{MRS} < p_1/p_2$?
To proceed analytically, let's use the Lagrangian method:
$$L(x_1, x_2, \lambda) = u(x_1, x_2) - \lambda\,(p_1 x_1 + p_2 x_2 - I)$$
$$L_1 = u_1(x_1, x_2) - \lambda p_1 = 0 \qquad (2.1)$$
$$L_2 = u_2(x_1, x_2) - \lambda p_2 = 0 \qquad (2.2)$$
$$L_\lambda = -p_1 x_1 - p_2 x_2 + I = 0 \qquad (2.3)$$
Dividing (2.1) by (2.2) gives the tangency condition
$$\frac{u_1(x_1, x_2)}{u_2(x_1, x_2)} = \frac{p_1}{p_2}$$
Also,
$$\lambda = \frac{u_1(x_1, x_2)}{p_1} = \frac{u_2(x_1, x_2)}{p_2}$$
With an extra dollar to spend one could either
(a) buy $1/p_1$ units of $x_1$ and increase utility by $u_1(x_1, x_2)/p_1 = \lambda$, or
(b) buy $1/p_2$ units of $x_2$ and increase utility by $u_2(x_1, x_2)/p_2 = \lambda$.
For this reason, $\lambda$ is sometimes called the marginal utility of income.
For example, if $u(x_1, x_2) = x_1 x_2$, then $L = x_1 x_2 - \lambda\,(p_1 x_1 + p_2 x_2 - I)$, and the FONC are:
$$L_1 = x_2 - \lambda p_1 = 0$$
$$L_2 = x_1 - \lambda p_2 = 0$$
$$L_\lambda = -p_1 x_1 - p_2 x_2 + I = 0$$
Therefore, $x_1 = \lambda p_2$ and $x_2 = \lambda p_1$. Plugging these results back into (2.3):
$$p_1(\lambda p_2) + p_2(\lambda p_1) = I \implies 2\lambda p_1 p_2 = I \implies \lambda = \frac{I}{2 p_1 p_2} \implies
\begin{cases}
x_1 = x_1(p_1, p_2, I) = I/2p_1, \\
x_2 = x_2(p_1, p_2, I) = I/2p_2
\end{cases}$$
The functions $x_1(p_1, p_2, I)$ and $x_2(p_1, p_2, I)$ are called the demand functions. Notice that $p_1 x_1 = p_2 x_2 = I/2$,
so the consumer spends half his or her income on each good! As an exercise, re-do the
analysis for $U(x_1, x_2) = x_1^{\alpha} x_2^{\beta}$ with different values of $\alpha$ and $\beta$.
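The demand functions derived for $u(x_1, x_2) = x_1 x_2$ can be confirmed by brute force: evaluate utility at many points on the budget line and keep the best one. The sketch below uses hypothetical prices and income ($p_1 = 2$, $p_2 = 5$, $I = 100$), and the grid search is a checking device chosen here, not the method of the notes.

```python
def demand(p1, p2, income):
    """Demand functions for u = x1 * x2: half of income is spent on each good."""
    return income / (2 * p1), income / (2 * p2)

def best_on_budget_line(p1, p2, income, steps=10_000):
    """Brute-force check: maximize x1 * x2 over grid points on the budget line."""
    best = (-1.0, 0.0, 0.0)
    for i in range(steps + 1):
        x1 = (income / p1) * i / steps
        x2 = (income - p1 * x1) / p2      # spend the remaining income on x2
        best = max(best, (x1 * x2, x1, x2))
    return best[1], best[2]

p1, p2, income = 2.0, 5.0, 100.0          # hypothetical prices and income
print(demand(p1, p2, income))              # (25.0, 10.0)
print(best_on_budget_line(p1, p2, income)) # grid search lands on the same bundle
print(income / (2 * p1 * p2))              # lambda, the marginal utility of income: 5.0
```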
2.4 Special Problems
• Preferences do not satisfy DMRS (Figure 2.10). Often, we restrict preferences by requiring the
indifference curves to be convex to the origin. (Functions with this property are called quasi-concave.
A function $u : \mathbb{R}^2 \to \mathbb{R}$ is quasi-concave if the upper contour sets
$S_k = \{(x_1, x_2) \in \mathbb{R}^2 : u(x_1, x_2) \ge k\}$ are convex for all $k$.)
• Even with quasi-concave preferences, i.e. with convex indifference curves, we still can run into
problems (Figure 2.11). Most consumers consume zero units of most goods, so the endpoint
problem is potentially one that economists must deal with. The problem is much worse the
more narrowly goods are defined (e.g. Coke versus Pepsi), and becomes less serious the
more broadly they are defined (e.g. beverages in general). A considerable amount of applied
research regarding consumer demand involves the so-called discrete choice approach, focusing
on whether consumers buy some or none of a given commodity. Daniel McFadden won the
Nobel Prize for his research showing how to link the "buy, don't buy" decision to underlying
utility functions.
Figure 2.10: (a) Indifference curves exhibit constant MRS (CMRS), and there is no bundle with $\mathrm{MRS} = p_1/p_2$. (b)
$\mathrm{MRS} = p_1/p_2$ but this point is not a maximum. What's wrong?
Figure 2.11: Endpoint optima: (a) $\mathrm{MRS} < p_1/p_2$, $(x_1, x_2) = (0, I/p_2)$ (b) $\mathrm{MRS} > p_1/p_2$, $(x_1, x_2) = (I/p_1, 0)$.
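An endpoint optimum is easiest to see with the linear utility function $u(x_1, x_2) = x_1 + x_2$ from Section 2.2, whose MRS equals 1 at every bundle. The sketch below is a minimal illustration with hypothetical prices and income; the function name `linear_utility_choice` is an assumption.

```python
def linear_utility_choice(p1, p2, income):
    """Optimal bundle for u = x1 + x2, whose MRS equals 1 everywhere."""
    if p1 < p2:                       # MRS = 1 > p1/p2: spend everything on x1
        return income / p1, 0.0
    if p1 > p2:                       # MRS = 1 < p1/p2: spend everything on x2
        return 0.0, income / p2
    return None                       # p1 == p2: every bundle on the budget line is optimal

print(linear_utility_choice(2.0, 5.0, 100.0))   # (50.0, 0.0)
print(linear_utility_choice(5.0, 2.0, 100.0))   # (0.0, 50.0)
```

In both cases the tangency condition $\mathrm{MRS} = p_1/p_2$ fails at the chosen bundle, which is why endpoint optima have to be checked separately from the Lagrangian conditions.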
3 Two Applications of Indifference Curve Analysis
We have seen that the consumer's optimum is represented by a tangency between an indifference
curve and the budget constraint. This condition expresses the simple economic idea that the
consumer, on the margin, cannot adjust her consumption bundle to spend the same amount of
money and simultaneously achieve higher utility. Recall that the tangency condition is only true
when the indifference curves exhibit DMRS, and we don't have an endpoint optimum.
3.1 Analysis of a Subsidy
In many economies, certain commodities are subsidized by the government. A subsidy is a negative tax that is usually introduced to aid low income consumers. Economists generally argue that subsidies are inefficient. Why?
Let there be two commodities: food $f$ and other stuff $x$. The price of other stuff is $p_x$, and the price of food is $p_f$. A typical consumer has income $I$ and normal preferences (quasi-concave indifference curves with DMRS). The budget constraint is $p_x x + p_f f = I$. See Figure 3.1.
Figure 3.1: Budget constraints with and without food subsidy. $(x^*, f^*)$ denotes the optimal choice under the subsidy arrangement.
Suppose now that a subsidy of $s$ dollars per unit is introduced on food. The budget constraint becomes $p_x x + (p_f - s) f = I$. If the consumer chooses the bundle $(x^*, f^*)$, then the cost of the subsidy to the government (for this consumer alone) is $s f^*$ dollars. Most economists would argue that you should instead give the consumer $s f^*$ dollars directly and leave the price of food alone. To see this, suppose the lump sum is given to the consumer directly, but she is forced to pay the market, unsubsidized price for food. In this case her budget constraint is
\[ p_x x + p_f f = I + s f^* \tag{3.1} \]
Notice that the bundle $(x^*, f^*)$ satisfies the budget constraint (3.1), since originally
\[ p_x x^* + (p_f - s) f^* = I, \]
which rearranges to $p_x x^* + p_f f^* = I + s f^*$.
In other words, if I give the consumer $s f^*$ dollars she can still afford $(x^*, f^*)$. But she can do even better, as shown in Figure 3.2.
Figure 3.2: The unsubsidized budget constraint corresponding to $I + s f^*$ cuts the original indifference curve and therefore enables the consumer to achieve higher utility.
The reason is that the budget line (3.1), with the lump sum, is flatter than the budget line with the subsidy. They both pass through $(x^*, f^*)$, so the budget line (3.1) cuts through an indifference curve and therefore enables the consumer to choose a bundle with higher utility.
Figure 3.3 illustrates the same point.
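The argument can be checked numerically. The sketch below is only an illustration under assumed functional forms: it takes Cobb-Douglas preferences $u(x, f) = x^{a} f^{1-a}$ and made-up numbers for income, prices, and the subsidy (none of which come from the notes), and compares utility under the per-unit subsidy with utility under a lump sum of equal cost to the government.

```python
def demand(a, income, p_x, p_f):
    """Cobb-Douglas demands for u(x, f) = x**a * f**(1 - a) (assumed form)."""
    return a * income / p_x, (1 - a) * income / p_f


def utility(a, x, f):
    return x ** a * f ** (1 - a)


def compare(a, income, p_x, p_f, s):
    """Utility under a per-unit food subsidy s versus an equal-cost lump sum."""
    x_s, f_s = demand(a, income, p_x, p_f - s)        # consumer pays p_f - s
    cost = s * f_s                                    # government outlay s * f*
    x_l, f_l = demand(a, income + cost, p_x, p_f)     # lump sum, market prices
    return utility(a, x_s, f_s), utility(a, x_l, f_l), cost


# Hypothetical numbers chosen only for illustration.
u_subsidy, u_lump, cost = compare(a=0.5, income=100.0, p_x=1.0, p_f=2.0, s=1.0)
print("utility with subsidy :", u_subsidy)
print("utility with lump sum:", u_lump)
print("cost to government   :", cost)
```

With these assumed numbers the lump sum yields strictly higher utility at the same cost, as the indifference curve argument predicts.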
3.2 The Consumer Price Index
The CPI is a measure of how much it costs today (in today's dollars) to buy a fixed bundle of commodities. We currently use 1982-84 as our reference period, which means the CPI is calculated by finding the cost of the bundle relative to its cost in 1982-84, which is normalized to 100.
Suppose the CPI is 177.5 (which it was in July 2001). That means it now costs 1.775 times as much to purchase the standard bundle as it did on average in 1982-84. If someone earns 1.78 times as much as he did in the early 80s, then he is at least as well off as he was then.
Does your nominal income necessarily have to rise in proportion with the CPI? Suppose that in 1983 you purchased $(x^0, y^0)$ at prices $(p_x^0, p_y^0)$. Your income was $I^0$, and
\[ x^0 p_x^0 + y^0 p_y^0 = I^0 \]
Now suppose that in 2001 prices are $(p_x^0(1+\pi), p_y^0(1+\pi))$. In this case both prices increased at the rate of $\pi$. How much would your income have to increase in order to offset the increase in prices? See Figure 3.4.
Figure 3.3: The labeled quantity equals $s f^*/p_x$, i.e. the subsidy at the initial optimum, measured in units of $x$.
On the other hand, suppose $p_x$ rises by $3\pi/2$ and $p_y$ rises by $\pi/2$, i.e.
\[ p_x = p_x^0\left(1 + \tfrac{3}{2}\pi\right), \qquad p_y = p_y^0\left(1 + \tfrac{1}{2}\pi\right). \]
The increase in the cost of living is represented by the increase in the cost of the reference bundle $(x^0, y^0)$:
\[ p_x^0\left(1 + \tfrac{3}{2}\pi\right)x^0 + p_y^0\left(1 + \tfrac{1}{2}\pi\right)y^0 - p_x^0 x^0 - p_y^0 y^0 = \tfrac{3}{2}\pi\, p_x^0 x^0 + \tfrac{1}{2}\pi\, p_y^0 y^0. \]
If you initially spent half your income on each of $x$ and $y$, then $p_x^0 x^0 = p_y^0 y^0 = I^0/2$, and the increase in the cost of living is
\[ \tfrac{3}{2}\pi \cdot \frac{I^0}{2} + \frac{\pi}{2} \cdot \frac{I^0}{2} = \pi I^0, \]
a proportional increase of $\pi$. But, if your income increases by $\pi$, you are better off!
The reasoning is as follows: If your income increases by enough to allow you to buy $(x^0, y^0)$, your budget is represented by the dashed line. But with that budget, you will not consume $(x^0, y^0)$; you will consume a bundle with more $y$, less $x$, and higher utility. You respond to the change in relative prices by altering your consumption. See Figure 3.5.
The CPI is really a weighted average of prices for a fixed set of purchases. See Table 1 for an example of some of the major categories and their weights. Note the slow growth of apparel prices (usually attributed to the rapid rise in cheap imports) and the very rapid growth in medical prices.
Figure 3.4: If all prices rise by the same factor, the consumer is in fact worse off.
Figure 3.5: If some prices rise more than others, the new budget line (assuming income rises in proportion to CPI) cuts the original indifference curve.
Table 1: Major Purchase Categories in CPI and Corresponding Weights

Category            Weight    Price Index (Dec. 2000)
All                 100.0     174.1
Food & Beverage      16.3     169.5
Housing              39.6     171.6
Apparel               4.7     131.8
Transportation       17.5     155.2
Medical               5.8     264.1
Recreation            6.0     103.7*
Education             2.7     115.4*
Communication         2.7      92.3*
Other Items           4.7     276.2

* Reference period is Dec. 1997, not 1982-84.
The difference between the rate of increase in the average price of the reference bundle and the minimum increase in income necessary in order to maintain the original level of utility is called the substitution bias in the CPI. Note that it depends on two things: how disproportionately prices for different goods are rising, and how convex one's indifference curves are. The more convex the indifference curves, and the more dispersion in relative price increases, the bigger the substitution bias. The Boskin Commission estimates that on average substitution bias was about 0.5% per year in the U.S. over the past couple of decades.
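A small numerical sketch of substitution bias follows. It assumes Cobb-Douglas preferences $u(x, y) = x^{a} y^{1-a}$, for which the minimum cost of reaching utility $u_0$ is $u_0 (p_x/a)^{a} (p_y/(1-a))^{1-a}$, and it uses made-up prices and a 10% base rate of inflation. These choices are illustrative assumptions, not figures from the notes or from the Boskin Commission.

```python
def expenditure(a, p_x, p_y, u0):
    """Minimum cost of reaching u0 when u(x, y) = x**a * y**(1 - a) (assumed form)."""
    b = 1 - a
    return u0 * (p_x / a) ** a * (p_y / b) ** b


# Base period: hypothetical prices and income.
a, income0 = 0.5, 100.0
p0_x, p0_y = 1.0, 1.0
x0, y0 = a * income0 / p0_x, (1 - a) * income0 / p0_y   # base-period bundle
u0 = x0 ** a * y0 ** (1 - a)                            # base-period utility

# New prices: p_x rises by 3*pi/2 and p_y by pi/2, as in the text.
pi = 0.10
p1_x, p1_y = p0_x * (1 + 1.5 * pi), p0_y * (1 + 0.5 * pi)

fixed_bundle_cost = p1_x * x0 + p1_y * y0               # what a fixed-bundle index prices
true_cost = expenditure(a, p1_x, p1_y, u0)              # income needed to stay on u0
substitution_bias = fixed_bundle_cost - true_cost

print("cost of fixed bundle     :", fixed_bundle_cost)
print("cost of original utility :", true_cost)
print("substitution bias        :", substitution_bias)
```

The fixed-bundle cost rises by exactly 10%, while the income needed to stay on the original indifference curve rises by slightly less; the gap is the substitution bias for this assumed consumer.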
There are lots of other, bigger sources of bias in the CPI. One that is hard to measure is quality bias: consumer goods change over time, which makes it hard to hold the reference bundle constant. Some new inventions since the early 80s: CD/DVD players, airbags and anti-lock brakes, the internet, laser printers, portable PCs, cell phones, The X-Files. Roughly speaking, quality changes are handled in the CPI by attempting to subtract the part of any price change that is due to quality, measured at the time the higher quality product is introduced. So, for example, when airbags first became available manufacturers charged about $500 extra for them. Thus, when we compare the price of a new car in 2001 that is equipped with airbags to a similar model in 1990 without airbags, we subtract $500 from the 2001 price before computing the price ratio.
4 Indirect Utility and the Expenditure Function
4.1 Indirect Utility
We characterized the solution to the problem
\[ \max_{x_1, x_2} u(x_1, x_2) \quad \text{s.t.} \quad p_1 x_1 + p_2 x_2 = I \]
as an optimal pair $(x_1^0, x_2^0)$ that satisfies the first order conditions (tangency, budget constraint). Note that $(x_1^0, x_2^0)$ varies with $(p_1, p_2, I)$. We call the optimal choices at a given level of prices and income the demand functions and write:
\[ x_1 = x_1^0(p_1, p_2, I) \]
\[ x_2 = x_2^0(p_1, p_2, I) \]
Note that $p_1 x_1^0(p_1, p_2, I) + p_2 x_2^0(p_1, p_2, I) = I$, so the demand functions satisfy the budget constraint by definition, even as prices vary. This gives rise to restrictions on the demand functions.
The highest level of utility that can be achieved under $(p_1, p_2, I)$ is $u(x_1^0(p_1, p_2, I), x_2^0(p_1, p_2, I))$, which is the utility of the optimal choices under the budget parameters. We define the indirect utility function to be
\[ \begin{aligned} v(p_1, p_2, I) &= \max_{x_1, x_2} u(x_1, x_2) \quad \text{s.t.} \quad p_1 x_1 + p_2 x_2 = I \\ &= u(x_1^0(p_1, p_2, I), x_2^0(p_1, p_2, I)) \end{aligned} \]
It should be clear to the reader that $v$ is decreasing in $p_1$ and $p_2$, and increasing in $I$.
Example: $u(x_1, x_2) = x_1^{\alpha} x_2^{\beta}$, where $\alpha + \beta = 1$. We saw in Section 2.3 that $x_1^0(p_1, p_2, I) = \alpha I/p_1$ and $x_2^0(p_1, p_2, I) = \beta I/p_2$. Note that $x_1^0$ does not depend on $p_2$, and $x_2^0$ does not depend on $p_1$. The indirect utility function is given by
\[ v(p_1, p_2, I) = \alpha^{\alpha} \beta^{\beta} p_1^{-\alpha} p_2^{-\beta} I \]
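As a quick check of the Cobb-Douglas example, the sketch below evaluates the demand functions and the indirect utility function at some made-up prices and income (the numbers are assumptions for illustration) and confirms that the budget is exhausted and that $v$ equals the utility of the demanded bundle.

```python
def demands(alpha, p1, p2, income):
    """Cobb-Douglas demands for u = x1**alpha * x2**beta with alpha + beta = 1."""
    beta = 1 - alpha
    return alpha * income / p1, beta * income / p2


def indirect_utility(alpha, p1, p2, income):
    """v(p1, p2, I) = alpha**alpha * beta**beta * p1**(-alpha) * p2**(-beta) * I."""
    beta = 1 - alpha
    return alpha ** alpha * beta ** beta * p1 ** (-alpha) * p2 ** (-beta) * income


# Hypothetical parameter values.
alpha, p1, p2, income = 0.3, 2.0, 3.0, 120.0
x1, x2 = demands(alpha, p1, p2, income)
spending = p1 * x1 + p2 * x2
direct = x1 ** alpha * x2 ** (1 - alpha)
v = indirect_utility(alpha, p1, p2, income)

print("demands :", x1, x2)
print("spending:", spending)
print("u at the optimum:", direct, " v:", v)
```

Raising either price in the call to `indirect_utility` lowers `v`, and raising `income` increases it, in line with the monotonicity claim above.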
4.2 Expenditure Function
Instead of maximizing utility subject to a budget constraint, one could minimize spending, subject to a utility constraint:
\[ \min_{x_1, x_2} p_1 x_1 + p_2 x_2 \quad \text{s.t.} \quad u(x_1, x_2) = u_0 \]
The Lagrangian is
\[ L(x_1, x_2, \lambda) = p_1 x_1 + p_2 x_2 - \lambda[u(x_1, x_2) - u_0] \]
The FONC are:
\[ p_1 - \lambda u_1(x_1, x_2) = 0 \]
\[ p_2 - \lambda u_2(x_1, x_2) = 0 \]
\[ u(x_1, x_2) = u_0 \]
Note that the first two conditions are equivalent to the tangency condition $p_1/p_2 = u_1/u_2$. Take a look at Figure 4.1. The parallel lines represent iso-cost lines: combinations such that $p_1 x_1 + p_2 x_2$ is constant. These can be thought of as the contours of the objective function. Their slope is $-p_1/p_2$. (Why?)
Figure 4.1: How does the consumer reach $u_0$ with as little income as possible?
The utility maximization (u-max) and expenditure minimization (e-min) problems are called dual problems, since they reverse the objective and the constraint.
What are the solutions to the e-min problem? The choices $(x_1, x_2)$ that minimize spending subject to a utility constraint are like demand functions, with the exception that they take utility, rather than income, as given. We call these compensated demand functions, and denote them as follows:
\[ x_1 = x_1^c(p_1, p_2, u_0) \]
\[ x_2 = x_2^c(p_1, p_2, u_0) \]
Sometimes these are called Hicksian demand functions, after John Hicks, the English economist who discovered them (and shared the 1972 Nobel prize in economics).
Under prices $(p_1, p_2)$, and having chosen $x_1^c, x_2^c$, one spends a total of
\[ p_1 x_1^c(p_1, p_2, u_0) + p_2 x_2^c(p_1, p_2, u_0) \]
We define the expenditure function (analogous to the indirect utility function, for it gives the amount spent assuming one has solved the e-min problem) to be
\[ \begin{aligned} e(p_1, p_2, u_0) &= \min_{x_1, x_2} p_1 x_1 + p_2 x_2 \quad \text{s.t.} \quad u(x_1, x_2) = u_0 \\ &= p_1 x_1^c(p_1, p_2, u_0) + p_2 x_2^c(p_1, p_2, u_0) \end{aligned} \]
Note that $e(p_1, p_2, u_0)$ tells you the minimum amount of money necessary to achieve utility $u_0$ under prices $(p_1, p_2)$.
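Example: $u(x_1, x_2) = x_1^{\alpha} x_2^{\beta}$, where $\alpha + \beta = 1$. The Lagrangian is
\[ L = p_1 x_1 + p_2 x_2 - \lambda(x_1^{\alpha} x_2^{\beta} - u_0) \]
FONC:
\[ \begin{aligned} L_1 &= p_1 - \lambda \alpha x_1^{\alpha - 1} x_2^{\beta} = 0 \\ L_2 &= p_2 - \lambda \beta x_1^{\alpha} x_2^{\beta - 1} = 0 \end{aligned} \;\Longrightarrow\; \frac{p_1}{p_2} = \frac{\alpha}{\beta} \frac{x_2}{x_1} \;\Longrightarrow\; x_2 = \frac{\beta}{\alpha} \frac{p_1}{p_2} x_1 \]
Substituting this into the utility constraint,
\[ x_1^{\alpha} \left( \frac{\beta}{\alpha} \frac{p_1}{p_2} x_1 \right)^{\beta} = u_0 \]
which implies (using $\alpha + \beta = 1$)
\[ x_1 = u_0 \left( \frac{\alpha p_2}{\beta p_1} \right)^{\beta}, \qquad x_2 = u_0 \left( \frac{\beta p_1}{\alpha p_2} \right)^{\alpha} \]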
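The duality between the u-max and e-min problems can be illustrated numerically. The sketch below assumes the Cobb-Douglas form $u = x_1^{\alpha} x_2^{\beta}$ with $\alpha + \beta = 1$ and made-up prices and income: it solves the u-max problem, feeds the resulting utility level into the compensated demands $x_1^c = u_0(\alpha p_2/(\beta p_1))^{\beta}$ and $x_2^c = u_0(\beta p_1/(\alpha p_2))^{\alpha}$, and checks that the e-min solution returns the same bundle and the original income.

```python
def regular_demands(alpha, p1, p2, income):
    """Solution of the u-max problem for Cobb-Douglas utility."""
    return alpha * income / p1, (1 - alpha) * income / p2


def compensated_demands(alpha, p1, p2, u0):
    """Solution of the e-min problem for Cobb-Douglas utility (alpha + beta = 1)."""
    beta = 1 - alpha
    x1 = u0 * (alpha * p2 / (beta * p1)) ** beta
    x2 = u0 * (beta * p1 / (alpha * p2)) ** alpha
    return x1, x2


# Hypothetical parameter values.
alpha, p1, p2, income = 0.3, 2.0, 3.0, 120.0

x1, x2 = regular_demands(alpha, p1, p2, income)
u0 = x1 ** alpha * x2 ** (1 - alpha)            # utility reached in the u-max problem

x1c, x2c = compensated_demands(alpha, p1, p2, u0)
spending = p1 * x1c + p2 * x2c                  # e(p1, p2, u0)

print("u-max bundle:", x1, x2)
print("e-min bundle:", x1c, x2c)
print("e(p1, p2, u0) =", spending, "versus income", income)
```

Under these assumptions the two problems pick the same bundle, and the minimum expenditure needed to reach $u_0$ is exactly the income that produced $u_0$ in the first place.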
5 Comparative Statics of Consumer Choice
In this section we characterize the changes in consumer demands that occur as income and prices vary. Our goal is to describe the consumer's demand functions. Analytically, the demand functions for the goods $x$ and $y$ are a pair of functions
\[ x = x(p_x, p_y, I) \]
\[ y = y(p_x, p_y, I) \]
that describe the consumer's optimal choices of $x$ and $y$, given prices and income. As you can imagine, the nature of these functions is important in a wide variety of applications.
5.1 Change in Demand with Respect to Income, Engel Curves
As income changes, the budget constraint shifts in a parallel fashion: inward if $I$ decreases, outward if $I$ increases.
In commodity space ($xy$-space, or in our case the plane), the tangencies of the budget constraints with higher and higher indifference curves trace out the income expansion path shown in Figure 5.1. For a good $x$, if the quantity of $x$ demanded increases with income, then $x$ is said to be a normal good. For some goods, the quantity demanded falls with income; such goods are called inferior. Analytically, $\partial x/\partial I > 0 \Longrightarrow x$ normal, while $\partial x/\partial I < 0 \Longrightarrow x$ inferior.
Figure 5.1: Fix prices. Then $x(p_x, p_y, I) = x(I)$, and $y(p_x, p_y, I) = y(I)$. The income expansion path is $\{(x(I), y(I)) : I \ge 0\}$.
Figure 5.2: (a) $x$, $y$ normal (b) $x$ normal, $y$ borderline inferior (c) $x$ inferior, $y$ normal
A couple of interesting implications of the budget constraint for changes in $x$ and $y$ with respect to income:
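Using the fact that income is always exhausted,
\[ \begin{aligned} I &= p_x x + p_y y \\ \Longrightarrow \; dI &= p_x\, dx + p_y\, dy \\ \Longrightarrow \; 1 &= p_x \frac{dx}{dI} + p_y \frac{dy}{dI} \end{aligned} \]
so clearly both goods cannot be inferior, for in that case the RHS would be negative.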
Starting from the previous equation,
\[ \frac{x p_x}{I} \cdot \frac{I}{x} \frac{dx}{dI} + \frac{y p_y}{I} \cdot \frac{I}{y} \frac{dy}{dI} = 1 \]
which is equivalent to
\[ s_x e_x + s_y e_y = 1 \]
where $s_x$ and $s_y$ are the expenditure shares (the fraction of income spent on each good), and $e_x$ and $e_y$ are the income elasticities (the percent change in demand $\Delta x/x$ divided by the percent change in income $\Delta I/I$, or, in the limit as $\Delta I \to 0$, $(dx/x)/(dI/I)$). This equation can be summarized as follows: the expenditure-weighted sum of income elasticities is unity.
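The adding-up property can be verified numerically for demands whose income elasticities differ from one. The sketch below uses a Stone-Geary demand system, which is an assumption chosen purely for illustration (it does not appear in the notes), with made-up parameters, and estimates the income elasticities by central finite differences.

```python
def stone_geary_demand(p_x, p_y, income, alpha=0.3, gamma_x=5.0, gamma_y=10.0):
    """Stone-Geary demands (illustrative assumption): subsistence amounts
    gamma_x, gamma_y plus fixed shares of the remaining income."""
    leftover = income - p_x * gamma_x - p_y * gamma_y
    x = gamma_x + alpha * leftover / p_x
    y = gamma_y + (1 - alpha) * leftover / p_y
    return x, y


def shares_and_elasticities(p_x, p_y, income, h=1e-4):
    """Expenditure shares and finite-difference income elasticities."""
    x, y = stone_geary_demand(p_x, p_y, income)
    x_hi, y_hi = stone_geary_demand(p_x, p_y, income + h)
    x_lo, y_lo = stone_geary_demand(p_x, p_y, income - h)
    e_x = (x_hi - x_lo) / (2 * h) * income / x
    e_y = (y_hi - y_lo) / (2 * h) * income / y
    s_x, s_y = p_x * x / income, p_y * y / income
    return s_x, s_y, e_x, e_y


s_x, s_y, e_x, e_y = shares_and_elasticities(p_x=2.0, p_y=1.0, income=100.0)
weighted_sum = s_x * e_x + s_y * e_y
print("shares      :", s_x, s_y)
print("elasticities:", e_x, e_y)
print("s_x*e_x + s_y*e_y =", weighted_sum)
```

In this assumed example one elasticity is below one and the other above one, yet the share-weighted sum is one, as the budget constraint requires.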
The relation between x and I, holding prices constant, is called the Engel curve, and is shown in
Figure 5.3.
The data in Table 2 confirm Engel's Law, that as income increases, the expenditure share of food decreases. The implication is that the income elasticity of food is less than unity. Why? Let $x$ be food. Then $s_x = x p_x/I$ is the expenditure share of food, and
\[ \frac{ds_x}{dI} = \frac{p_x \frac{dx}{dI}}{I} - \frac{1}{I^2} x p_x = \frac{x p_x}{I} \cdot \frac{I}{x} \frac{dx}{dI} \cdot \frac{1}{I} - \frac{1}{I} \cdot \frac{x p_x}{I} = \frac{s_x}{I}(e_x - 1) \]
or
\[ \frac{I}{s_x} \frac{ds_x}{dI} = e_x - 1 \]
Figure 5.3: The Engel curve starts from the origin if $x = 0$ when $I = 0$ (which is a reasonable assumption). The Engel curve has positive slope if $x$ is a normal good.
Figure 5.4: (a) Linear Engel curves: $dx/dI = x/I \Longrightarrow e_x = 1$. (b) Convex Engel curves: $dx/dI > x/I \Longrightarrow e_x > 1$. (c) Concave Engel curves: $dx/dI < x/I \Longrightarrow e_x < 1$.
So, if $e_x < 1$, then food share is declining with income. An alternative proof employs a favorite trick of economists, taking natural logs:
\[ \log s_x = \log x + \log p_x - \log I \]
\[ \frac{d \log s_x}{d \log I} = \frac{d \log x}{d \log I} - 1 \]
or
\[ \frac{I}{s_x} \frac{ds_x}{dI} = e_x - 1 \]
In some contexts, the food share is used as an indicator of welfare. It has been proposed that families in different countries with the same food share are equally well off.
5.2 Change in Demand with Respect to Price
A change in one of the prices causes the budget line to rotate; as it does so, the tangencies with higher and higher indifference curves trace out the price consumption path.
You should be familiar with the demand curve, which is the graph of the demand function $x(p_x) = x(p_x, p_y^0, I^0)$, where $p_y^0$ and $I^0$ are fixed. See Figure 5.6.

Table 2: Food Share of Std. Budget in Various Years

Year       Food Share in Std. Budget* (%)
1935-39    35.4
1952       32.2
1963       25.2
1992       19.6
2000       16.3

* Budget used in calculation of CPI.

Figure 5.5: A rise in $p_x$ is accompanied by a reduction in $x$.
Note that we traditionally plot demand (the dependent variable) on the horizontal axis and the price (the independent variable) on the vertical axis.[3] The negative slope of the demand curve reflects the idea that consumption of a commodity falls as its price increases. However, demand curves are not necessarily downward sloping! We turn now to a decomposition of the change in demand due to a change in price. We show that there are two factors:
1. the curvature of the indifference curves
2. the nature of the income effect on demand
5.3 Graphical Decomposition of a Change in Demand
Suppose $p_x$ increases from $p_x^0$ to $p_x^1$; demand changes from $(x^0, y^0)$ to $(x^1, y^1)$. We can decompose the change from $x^0$ to $x^1$ as follows:
1. First, think of the change in $x$ that arises purely due to the fact that $x$ now costs more. Draw a budget line with slope $-p_x^1/p_y$ that still allows the consumer to reach the indifference curve through $(x^0, y^0)$ (call this indifference curve $u_0$). Note that, since it's steeper than the old budget line, it has a tangency with $u_0$ to the left of $(x^0, y^0)$.[4] This artificial budget constraint is represented by the dashed line in Figure 5.7.
2. Second, move from this intermediate point to the final optimum. Observe that this movement is a movement along an income expansion path, since the intermediate optimum occurs where $u_0$ has a tangency with a budget line with slope $-p_x^1/p_y$.
[3] We owe this convention to Alfred Marshall. As a result of this, steep demand curves are inelastic, whereas flat demand curves are elastic.
Figure 5.6: The reader is presumed to be familiar with the demand curve.
Figure 5.7: The movement from $(x^0, y^0)$ to $(x^*, y^*)$ takes place along the indifference curve.
Analytically,
\[ \Delta x = x^1 - x^0 = (x^1 - x^*) + (x^* - x^0) \]
where $x^*$ denotes the aforementioned intermediate optimum. We refer to the change $(x^* - x^0)$, which holds utility constant, as the substitution effect. We refer to the change $(x^1 - x^*)$ as the income effect. Thus we write
\[ \Delta x = \Delta x^S + \Delta x^I \]
[4] Assuming DMRS.
Figure 5.8: (a) Step 1: Move to new tangency on old indifference curve. (b) Step 2: Move along IEP to new optimum.
5.4 Substitution Effect
The substitution effect represents movement along an indifference curve. It tells you how far to move in order for the indifference curve to be parallel to the new budget line, i.e. in order for the MRS to equal the new price ratio. Obviously, then, if the indifference curves are relatively flat, you have to go a long way before the MRS equals the new price ratio, and the substitution effect is substantial. If the indifference curves are highly convex, the MRS changes rapidly and you do not need to go far: the substitution effect is small. See Figure 5.9.
Figure 5.9: (a) $u_0$ flat $\Longrightarrow$ more substantial substitution effect (b) $u_0$ highly curved $\Longrightarrow$ lesser substitution effect
Note that if $\Delta p_x > 0$, the substitution effect is negative. (Why?) What about the substitution effect of $\Delta p_x$ on $y$?
5.5 Income Effect
Intuitively, one might think the income effect is larger the greater $x^0$, i.e. the greater $x$ was in the first place. If, initially, you consumed very little $x$, the income effect would be relatively small. Take a look at Figure 5.10:
Notice that the intermediate budget constraint almost passes through $(x^0, y^0)$. (It always cuts below, if not by much.)
So, the income effect is approximately proportional to the change in income from the budget line through $(x^0, y^0)$ to the final budget line.
Figure 5.10: The income effect is approximately proportional to the perpendicular distance between the budget lines.
What is the change in income? The final budget constraint limits the consumer to $I$, just as the initial constraint does. Therefore $I = p_x^0 x^0 + p_y y^0$. In order to be able to afford $(x^0, y^0)$ under the new prices, you would need $p_x^1 x^0 + p_y y^0$, or $\Delta I = \Delta p_x x^0$ more than before. For a small change in $p_x$, the intermediate optimum is close to the initial one, so the difference in income from the intermediate constraint to the final one is approximately $\Delta p_x x^0$. (The approximation is exact in the limit $\Delta p_x \to 0$.)
This confirms our intuition: the movement along the income expansion path from the intermediate optimum to the final optimum (the income effect) will be larger, the larger was $x^0$, our initial level of consumption of $x$.
6 Slutsky's Equation
6.1 Review
Expenditure function:
\[ \begin{aligned} e(p_1, p_2, u_0) &= \min_{x_1, x_2} p_1 x_1 + p_2 x_2 \quad \text{s.t.} \quad u(x_1, x_2) = u_0 \\ &= p_1 x_1^c(p_1, p_2, u_0) + p_2 x_2^c(p_1, p_2, u_0) \end{aligned} \]
where $x_1^c$ and $x_2^c$ are the compensated demands, the cheapest choices that enable one to achieve utility level $u_0$ at prices $(p_1, p_2)$.
The Lagrangian for the e-min problem is
\[ L(x_1, x_2, \lambda) = p_1 x_1 + p_2 x_2 - \lambda[u(x_1, x_2) - u_0] \]
The FONC are:
\[ p_1 - \lambda u_1(x_1, x_2) = 0 \]
\[ p_2 - \lambda u_2(x_1, x_2) = 0 \]
\[ u(x_1, x_2) = u_0 \]
As for the derivatives of the expenditure function with respect to prices,
\[ \frac{\partial e(p_1, p_2, u_0)}{\partial p_1} = x_1^c(p_1, p_2, u_0) + p_1 \frac{\partial x_1^c(p_1, p_2, u_0)}{\partial p_1} + p_2 \frac{\partial x_2^c(p_1, p_2, u_0)}{\partial p_1}. \tag{6.1} \]
The reader is presumed to be familiar with the Envelope Theorem, which says the second and third terms on the RHS cancel.
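Proof: Recall that $u(x_1^c(p_1, p_2, u_0), x_2^c(p_1, p_2, u_0)) = u_0$. Differentiate both sides with respect to $p_1$:
\[ u_1 \frac{\partial x_1^c}{\partial p_1} + u_2 \frac{\partial x_2^c}{\partial p_1} = 0 \]
But $u_1 = p_1/\lambda$ and $u_2 = p_2/\lambda$ by the FONC. It follows by substitution that
\[ \frac{p_1}{\lambda} \frac{\partial x_1^c}{\partial p_1} + \frac{p_2}{\lambda} \frac{\partial x_2^c}{\partial p_1} = 0 \]
which means
\[ p_1 \frac{\partial x_1^c}{\partial p_1} + p_2 \frac{\partial x_2^c}{\partial p_1} = 0 \]
Thus we have
\[ \frac{\partial e(p_1, p_2, u_0)}{\partial p_1} = x_1^c(p_1, p_2, u_0) \]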
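The result $\partial e/\partial p_1 = x_1^c$ can be checked numerically. The sketch below assumes Cobb-Douglas utility $u = x_1^{\alpha} x_2^{\beta}$ with $\alpha + \beta = 1$, for which $e(p_1, p_2, u_0) = u_0 (p_1/\alpha)^{\alpha} (p_2/\beta)^{\beta}$ and $x_1^c = u_0 (\alpha p_2/(\beta p_1))^{\beta}$, and compares a central finite-difference derivative of $e$ with the compensated demand. The parameter values are made up for illustration.

```python
def expenditure(alpha, p1, p2, u0):
    """e(p1, p2, u0) for Cobb-Douglas utility with alpha + beta = 1 (assumed form)."""
    beta = 1 - alpha
    return u0 * (p1 / alpha) ** alpha * (p2 / beta) ** beta


def compensated_x1(alpha, p1, p2, u0):
    """Compensated (Hicksian) demand for good 1 under the same assumption."""
    beta = 1 - alpha
    return u0 * (alpha * p2 / (beta * p1)) ** beta


# Hypothetical parameter values.
alpha, p1, p2, u0 = 0.4, 2.0, 5.0, 15.0
h = 1e-5

# Central finite-difference approximation to de/dp1.
de_dp1 = (expenditure(alpha, p1 + h, p2, u0) - expenditure(alpha, p1 - h, p2, u0)) / (2 * h)
x1c = compensated_x1(alpha, p1, p2, u0)

print("numerical de/dp1:", de_dp1)
print("compensated x1  :", x1c)
```

The two numbers agree up to finite-difference error, which is the content of the envelope argument: re-optimizing the bundle contributes nothing to first order.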
There is a story we tell to go along with this. If you initially are minimizing expenditure, and the price of good 1 rises, what do you do? Your first order response is simply to continue buying the old bundle; this increases your spending by $x_1^c \Delta p_1$. That is the first term on the RHS of (6.1). But then you would like to adjust your choices of goods 1 and 2 to reflect the new prices. The adjustments are the second and third terms on the RHS of (6.1). But because your initial choices were optimal (they satisfied the FONC), when you attempt to adjust $x_1$ and $x_2$ you don't save any more.
6.2 Slutsky Decomposition
Now we are ready to analyze what happens to the uncompensated, or regular, demand functions when prices rise/fall. Suppose we start with prices $(p_1^0, p_2^0)$ and income $I^0$. Initially the optimal choices are $x_1^0 = x_1(p_1^0, p_2^0, I^0)$ and $x_2^0 = x_2(p_1^0, p_2^0, I^0)$, where $x_1(\cdot)$ and $x_2(\cdot)$ are the regular demand functions.
We decompose the effect of a change in price $\Delta p_1 = p_1^1 - p_1^0$ as follows:
(a) Starting from $(x_1^0, x_2^0)$, imagine the adjustment you would make if you could remain on the old indifference curve. This would lead you to a new bundle $(x_1^*, x_2^*)$. Since prices have risen this bundle costs more than you were spending before. This move is called the substitution effect of the price increase.
(b) Then, from $(x_1^*, x_2^*)$, imagine the adjustment you would make to get back to the original income level. This would be a move inward along an income expansion path (IEP), and would lead you to $(x_1^1, x_2^1)$. This move is called the income effect of a price increase.
Figure 6.1: A decomposition of the change in demand into its constituent parts: movement along the indifference curve followed by movement inward along an IEP.
Note that the total change in $x_1$ is
\[ \Delta x_1 = x_1^1 - x_1^0 = (x_1^1 - x_1^*) + (x_1^* - x_1^0) = \Delta x_1^I + \Delta x_1^S \]
What are the relative magnitudes of the constituent parts? To begin, observe that $(x_1^0, x_2^0)$ and $(x_1^*, x_2^*)$ are on $u_0$. Now,
\[ x_1^0 = x_1(p_1^0, p_2^0, I^0) = x_1^c(p_1^0, p_2^0, u_0) \tag{6.2} \]
Also,
\[ x_1^* = x_1^c(p_1^1, p_2^0, u_0) \]
so
\[ \Delta x_1^S = x_1^* - x_1^0 = x_1^c(p_1^1, p_2^0, u_0) - x_1^c(p_1^0, p_2^0, u_0) \approx \frac{\partial x_1^c(p_1^0, p_2^0, u_0)}{\partial p_1} \Delta p_1 \]
The substitution effect depends on the rate at which compensated demands change: this is purely a function of the curvature of the indifference curves.
How about the income effect?
\[ \Delta x_1^I = x_1^1 - x_1^* \]
First note that $x_1^1 = x_1(p_1^1, p_2^0, I^0)$: it is the regular demand given $(p_1^1, p_2^0, I^0)$. But what is $x_1^*$? It is the choice one would make with enough income to remain on $u_0$ even at the new prices. How much money would it take? The answer is $e(p_1^1, p_2^0, u_0)$! So,
\[ x_1^* = x_1(p_1^1, p_2^0, e(p_1^1, p_2^0, u_0)) \]
Thus
\[ \Delta x_1^I = x_1(p_1^1, p_2^0, I^0) - x_1(p_1^1, p_2^0, e(p_1^1, p_2^0, u_0)) \approx \frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I} \left( I^0 - e(p_1^1, p_2^0, u_0) \right) \]
So the income effect depends on the income derivative of demand times the change in income $\Delta I = I^0 - e(p_1^1, p_2^0, u_0)$. Note that $\Delta I < 0$ since one would need more than $I^0$ to achieve $U = u_0$ at prices $(p_1^1, p_2^0)$.
But how big is $\Delta I$? We need one last trick. We know that $I^0 = e(p_1^0, p_2^0, u_0)$, so we can write
\[ \begin{aligned} \Delta I &= I^0 - e(p_1^1, p_2^0, u_0) \\ &= e(p_1^0, p_2^0, u_0) - e(p_1^1, p_2^0, u_0) \\ &\approx \frac{\partial e(p_1^0, p_2^0, u_0)}{\partial p_1} (p_1^0 - p_1^1) \\ &= \frac{\partial e(p_1^0, p_2^0, u_0)}{\partial p_1} (-\Delta p_1) \\ &= -\frac{\partial e(p_1^0, p_2^0, u_0)}{\partial p_1} \Delta p_1 \end{aligned} \]
(which is negative for an increase in $p_1$). Finally we have
\[ \begin{aligned} \frac{\partial e(p_1^0, p_2^0, u_0)}{\partial p_1} &= x_1^c(p_1^0, p_2^0, u_0) && \text{by (6.1)} \\ &= x_1^0 && \text{by (6.2)} \end{aligned} \]
and combining the last few results,
\[ \Delta I \approx -x_1^0 \Delta p_1 \]
Note that the size of the income effect depends on the original level of consumption of $x_1$.
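Putting it all together,
\[ \Delta x_1^I = \frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I} \Delta I = -\frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I} x_1^0 \Delta p_1 \]
Thus
\[ \Delta x_1 = \Delta x_1^I + \Delta x_1^S = -\frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I} x_1^0 \Delta p_1 + \frac{\partial x_1^c(p_1^0, p_2^0, u_0)}{\partial p_1} \Delta p_1 \]
or
\[ \frac{\Delta x_1}{\Delta p_1} = -x_1^0 \frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I} + \frac{\partial x_1^c(p_1^0, p_2^0, u_0)}{\partial p_1} \]
Now in the limit $\Delta p_1 \to 0$ the ratio $\Delta x_1/\Delta p_1$ equals the derivative of the regular demand function with respect to $p_1$. We have established:
\[ \frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial p_1} = -x_1^0 \frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I} + \frac{\partial x_1^c(p_1^0, p_2^0, u_0)}{\partial p_1} \]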
This is called Slutsky's equation, after the Russian economist who proved it over 100 years ago.
Slutsky's equation says the derivative of the regular demand function with respect to $p_1$ is a combination of the income and substitution effects. The income effect depends on the derivative of demand with respect to income, times the original level of consumption of $x_1$. The substitution effect depends on the derivative of the compensated demand function.
A useful feature of Slutsky's equation is that it provides a way to recover information about indifference curves from the derivatives of the demand functions with respect to prices and incomes. In principle, we can observe $\partial x_1/\partial p_1$ and $\partial x_1/\partial I$, which would enable us to infer
\[ \frac{\partial x_1^c(p_1^0, p_2^0, u_0)}{\partial p_1} = \frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial p_1} + x_1^0 \frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I} \]
Suppose we get an estimate of $\partial x_1^c/\partial p_1$ that is nearly zero. The indifference curves must therefore be almost Leontief (right angles).
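Slutsky's equation can be verified numerically for a specific utility function. The sketch below assumes Cobb-Douglas utility $u = x_1^{\alpha} x_2^{\beta}$ with $\alpha + \beta = 1$ (regular demand $x_1 = \alpha I/p_1$, compensated demand $x_1^c = u_0(\alpha p_2/(\beta p_1))^{\beta}$) and made-up prices and income, and approximates each derivative by a central finite difference. The helper `central_diff` is a hypothetical name introduced here.

```python
def regular_x1(alpha, p1, p2, income):
    """Regular (uncompensated) demand for good 1, Cobb-Douglas assumption."""
    return alpha * income / p1


def compensated_x1(alpha, p1, p2, u0):
    """Compensated demand for good 1, Cobb-Douglas with alpha + beta = 1."""
    beta = 1 - alpha
    return u0 * (alpha * p2 / (beta * p1)) ** beta


def central_diff(f, z, h=1e-5):
    """Central finite-difference approximation to f'(z)."""
    return (f(z + h) - f(z - h)) / (2 * h)


# Hypothetical starting point (p1^0, p2^0, I^0).
alpha, p1, p2, income = 0.4, 2.0, 5.0, 100.0
x1_0 = regular_x1(alpha, p1, p2, income)
x2_0 = (1 - alpha) * income / p2
u0 = x1_0 ** alpha * x2_0 ** (1 - alpha)

total = central_diff(lambda p: regular_x1(alpha, p, p2, income), p1)       # dx1/dp1
income_deriv = central_diff(lambda m: regular_x1(alpha, p1, p2, m), income)  # dx1/dI
substitution = central_diff(lambda p: compensated_x1(alpha, p, p2, u0), p1)  # dx1c/dp1

income_effect = -x1_0 * income_deriv
print("total effect       :", total)
print("income effect      :", income_effect)
print("substitution effect:", substitution)
print("income + substitution:", income_effect + substitution)
```

Both pieces are negative here because good 1 is normal under the assumed preferences, and they add up to the slope of the regular demand curve.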
7 Using Market Level Demand Curves
Since the demand curve graphs $x = f(p_x, p_y, I)$, if $p_y$ or $I$ changes, the demand curve shifts. For example, if income were to increase by $dI > 0$, then at a given price, demand would increase by $dx = (\partial x/\partial I)\,dI$. For a normal good $\partial x/\partial I > 0$, so the demand curve would shift to the right as in Figure 7.1.
Figure 7.1: A shift in the demand curve due to an increase in $I$, assuming $x$ is a normal good.
If the elasticities of demand are approximately constant, then
\[ d(\log x) = \frac{dx}{x} = \left( \frac{\partial x}{\partial I} \cdot \frac{I}{x} \right) \frac{dI}{I} = e_x \frac{dI}{I} = e_x\, d(\log I) \]
where $e_x$ is the income elasticity of demand for $x$.[5] Similarly, if $p_y$ changes, the demand curve shifts unless $\partial x/\partial p_y = 0$ (as in the case of Cobb-Douglas preferences). If $\partial x/\partial p_y > 0$, an increase in the price of $y$ causes the demand curve to shift to the right.
For the purposes of evaluating the effect of relatively small changes in prices and income, we often assume the demand function has constant elasticities:
\[ \frac{\partial x}{\partial p_x} \cdot \frac{p_x}{x} = \frac{\partial \log x}{\partial \log p_x} = \eta_{xx} \quad \text{(constant)} \]
\[ \frac{\partial x}{\partial p_y} \cdot \frac{p_y}{x} = \frac{\partial \log x}{\partial \log p_y} = \eta_{xy} \quad \text{(constant)} \]
\[ \frac{\partial x}{\partial I} \cdot \frac{I}{x} = \frac{\partial \log x}{\partial \log I} = e_x \quad \text{(constant)} \]
This is equivalent to assuming that the demand function is log-linear:
\[ \log x = \eta_{xx} \log p_x + \eta_{xy} \log p_y + e_x \log I + c \]
where $c$ is a constant. Note that homogeneity implies $\eta_{xx} + \eta_{xy} + e_x = 0$. Put differently, if prices and income all rise by one percent, then $x$ remains constant.[6]
[5] You should be familiar with the concept of elasticity from Econ 1. In particular, you should be able to verify that elasticity is a unitless quantity.
As you recall from introductory economics, the market is constructed by introducing a supply curve of the form $x = S(p_x)$. (See Figure 7.2.) It is usually assumed that supply is upward sloping. (We defer the derivation of market supply curves until later.) For now, we shall assume that elasticity of supply is constant:
\[ \frac{dS(p_x)}{dp_x} \cdot \frac{p_x}{S(p_x)} = \sigma_x \]
where $\sigma_x$ denotes elasticity of supply. We now can combine supply and demand curves to analyze the effects of exogenous shocks to income or other prices. We have
\[ x = S(p_x) = f(p_x, p_y, I) \]
a system of two equations in two unknowns, $p_x$ and $x$ (unit price of $x$ and quantity of $x$, respectively), given income and other prices. This is pictured in Figure 7.3.
Figure 7.2: The reader is presumed to be familiar with the upward sloping supply curve.
7.1 An Increase in Income
Obviously, both $x$ and $p_x$ increase with $I$. But by how much? Take a look at Figure 7.4. Starting at equilibrium, with $x = x^0$ and $p_x = p_x^0$, the changes in demand and supply are:
\[ \frac{\Delta x}{x} = \eta_{xx} \frac{\Delta p_x}{p_x} + e_x \frac{\Delta I}{I} \quad \text{(demand)} \]
\[ \frac{\Delta x}{x} = \sigma_x \frac{\Delta p_x}{p_x} \quad \text{(supply)} \]
[6] A proof would involve recognizing that if $x$ remains constant, then so does $\log x$, and therefore setting the total differential of $\log x$ equal to zero. The details are left to the reader.
Figure 7.3: The market is in equilibrium when the price is such that supply and demand are balanced.
Figure 7.4: How much does $p_x$ increase due to an outward shift in the demand curve?
The proportional changes in supply and demand have to be the same in order to restore equilibrium. Therefore
\[ \eta_{xx} \frac{\Delta p_x}{p_x} + e_x \frac{\Delta I}{I} = \sigma_x \frac{\Delta p_x}{p_x} \]
which implies
\[ \frac{\Delta p_x}{p_x} = \left( \frac{e_x}{\sigma_x - \eta_{xx}} \right) \frac{\Delta I}{I} \]
Note that $\sigma_x > 0$ and $\eta_{xx} < 0$, so $\sigma_x - \eta_{xx}$ is strictly positive. Furthermore,
\[ \frac{\Delta x}{x} = \sigma_x \frac{\Delta p_x}{p_x} = \left( \frac{\sigma_x e_x}{\sigma_x - \eta_{xx}} \right) \frac{\Delta I}{I} \]
For example, suppose the following:
\[ \sigma_x = 0.60 \ \text{(short run)}, \qquad \eta_{xx} = -1.40, \qquad e_x = 0.40 \]
If $\Delta I/I = 0.10$ (10% increase), then
\[ \frac{\Delta p_x}{p_x} = \left( \frac{0.40}{0.60 + 1.40} \right)(0.10) = 0.02, \qquad \frac{\Delta x}{x} = (0.60)(0.02) = 0.012 \]
As an exercise, calculate the effect of a 10% drop in the price of a substitute good (good $y$) on the market for $x$. Use an estimate for the cross-price elasticity between $x$ and $y$ of 0.67 ($\eta_{xy} = 0.67$).
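The worked example above can be reproduced with a few lines of code. This is a minimal sketch of the constant-elasticity formulas; the function name `income_shock` and its argument names are hypothetical, introduced only for this illustration.

```python
def income_shock(supply_elast, own_price_elast, income_elast, income_change):
    """Proportional changes in price and quantity after a proportional change
    in income, under constant elasticities of supply and demand."""
    price_change = income_elast / (supply_elast - own_price_elast) * income_change
    quantity_change = supply_elast * price_change
    return price_change, quantity_change


# Values from the example: supply elasticity 0.60, own-price elasticity -1.40,
# income elasticity 0.40, and a 10% rise in income.
dp, dx = income_shock(supply_elast=0.60, own_price_elast=-1.40,
                      income_elast=0.40, income_change=0.10)
print("proportional change in price   :", round(dp, 4))
print("proportional change in quantity:", round(dx, 4))
```

The same algebra handles the exercise: a change in $p_y$ shifts demand by the cross-price elasticity times $\Delta p_y/p_y$, which plays the role of the income term in the formula.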
7.2 Tax Incidence
If a tax of $t$ dollars per unit is imposed on $x$, it creates a gap between the price that consumers pay and the price that producers receive, of $t$ dollars per unit. You are presumed to be familiar with the diagram shown in Figure 7.5.
Starting from an equilibrium at $(p_x^0, x^0)$, the price received by producers falls to $p_x^1$, the price paid by consumers rises to $p_x^1 + t$, and the quantity falls to $x^1$. Consider the two markets shown in Figure 7.6, each with the same tax. Obviously, the effect of the tax on the prices paid/received by the two sides depends on the relative elasticities of supply and demand. To see this more formally, we proceed based on the assumption that elasticities are roughly constant. Letting $p_x$ denote the price received by producers, the change in supply is
\[ \frac{\Delta x}{x} = \sigma_x \frac{\Delta p_x}{p_x} \]
The change in prices for consumers is $\Delta p_x + t$. Therefore, the change in quantity demanded is
\[ \frac{\Delta x}{x} = \eta_{xx} \left( \frac{\Delta p_x + t}{p_x} \right) \]
Market equilibrium requires that change in demand equals change in supply:
\[ \eta_{xx} \left( \frac{\Delta p_x + t}{p_x} \right) = \sigma_x \frac{\Delta p_x}{p_x} \]
Solving for the equilibrium change in prices, we have
\[ \eta_{xx} \frac{t}{p_x} = \frac{\Delta p_x}{p_x} (\sigma_x - \eta_{xx}) \]
and
\[ \frac{\Delta p_x}{p_x} = \left( \frac{\eta_{xx}}{\sigma_x - \eta_{xx}} \right) \frac{t}{p_x} \]
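where $t/p_x$ is the proportional tax rate. Since $\sigma_x > 0$ and $\eta_{xx} < 0$, $\sigma_x - \eta_{xx}$ is strictly positive, and therefore $\Delta p_x < 0$. With regard to quantity,
\[ \frac{\Delta x}{x} = \sigma_x \frac{\Delta p_x}{p_x} = \left( \frac{\sigma_x \eta_{xx}}{\sigma_x - \eta_{xx}} \right) \frac{t}{p_x} < 0 \]
For producers, the change in price is
\[ \frac{\Delta p_x}{p_x} = \left( \frac{\eta_{xx}}{\sigma_x - \eta_{xx}} \right) \frac{t}{p_x} \]
and for consumers it is
\[ \frac{\Delta p_x + t}{p_x} = \left( \frac{\eta_{xx}}{\sigma_x - \eta_{xx}} \right) \frac{t}{p_x} + \frac{t}{p_x} = \left( \frac{\sigma_x}{\sigma_x - \eta_{xx}} \right) \frac{t}{p_x} > 0 \]
Notice that the ratio of the changes in prices for producers versus consumers is $\eta_{xx}/\sigma_x$. So, if demand is highly inelastic, i.e. $|\eta_{xx}|$ is small (e.g. $\eta_{xx} = -0.1$), and supply is moderately elastic (e.g. $\sigma_x = 1.0$), then producer prices don't fall by much relative to consumer prices. On the other hand, if demand is highly elastic, i.e. if $|\eta_{xx}|$ is big (e.g. $\eta_{xx} = -3.0$), then producer prices are more affected.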
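The incidence formulas are easy to evaluate for the two cases just described. The sketch below is illustrative only: the function name `tax_incidence` is hypothetical, and the 10% tax rate is an assumed value, while the elasticities are the ones quoted in the text.

```python
def tax_incidence(supply_elast, own_price_elast, tax_rate):
    """Proportional changes in producer price, consumer price, and quantity
    for a per-unit tax with proportional rate t/p_x, under constant elasticities."""
    denom = supply_elast - own_price_elast          # strictly positive
    producer = own_price_elast / denom * tax_rate   # negative: producers receive less
    consumer = supply_elast / denom * tax_rate      # positive: consumers pay more
    quantity = supply_elast * producer              # negative: quantity falls
    return producer, consumer, quantity


tax_rate = 0.10  # assumed proportional tax rate t/p_x
inelastic = tax_incidence(supply_elast=1.0, own_price_elast=-0.1, tax_rate=tax_rate)
elastic = tax_incidence(supply_elast=1.0, own_price_elast=-3.0, tax_rate=tax_rate)

print("inelastic demand (producer, consumer, quantity):", inelastic)
print("elastic demand   (producer, consumer, quantity):", elastic)
```

In both cases the consumer change minus the producer change equals the tax rate, so the two numbers show how the wedge is split: mostly onto consumers when demand is inelastic, mostly onto producers when demand is elastic.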
Last we consider the effect of a per unit subsidy of $s$ on the price of $x$. (For example, prior to the recent rise in electricity rates, electricity prices were subsidized throughout most of California.) The change in price received by producers is $\Delta p_x$, whereas the change in price paid by consumers is $\Delta p_x - s$. The proportional changes in quantity are:
\[ \frac{\Delta x}{x} = \eta_{xx} \left( \frac{\Delta p_x - s}{p_x} \right) \quad \text{(demand)} \]
\[ \frac{\Delta x}{x} = \sigma_x \frac{\Delta p_x}{p_x} \quad \text{(supply)} \]
Setting the two equal, we have
\[ \frac{\Delta p_x}{p_x} = \left( \frac{-\eta_{xx}}{\sigma_x - \eta_{xx}} \right) \frac{s}{p_x} > 0 \]
which implies that part of the effect of the subsidy is mitigated by a rise in prices. In fact, the change in price paid by consumers is
\[ \frac{\Delta p_x - s}{p_x} = \left( \frac{-\eta_{xx}}{\sigma_x - \eta_{xx}} \right) \frac{s}{p_x} - \frac{s}{p_x} = \left( \frac{-\sigma_x}{\sigma_x - \eta_{xx}} \right) \frac{s}{p_x} < 0 \]
Note that $\sigma_x/(\sigma_x - \eta_{xx})$ is less than one in absolute value.
Figure 7.5: The new price $p_x^1$ is such that when consumers pay $p_x^1 + t$ and suppliers receive $p_x^1$, equilibrium is restored.
Figure 7.6: (a) Demand inelastic, supply elastic. (b) Demand elastic, supply inelastic.
8 Labor Supply
In this section we consider the choice of how many hours to work by an individual who faces an hourly wage $w > 0$, and also has non-labor income $y$. The individual is assumed to value leisure $\ell$ and consumption of goods $x$, using a utility function $u(x, \ell)$. We assume there is an upper bound $T$ on leisure, and that the sum of leisure and hours of work $h$ is $T$:
\[ \ell + h = T, \quad \text{or} \quad h = T - \ell \]
The graph looks a little unusual since preferences are only defined up to the point where $\ell = T$, as the reader can see in Figure 8.1.
Figure 8.1: The budget constraint for an agent who works for a wage of $w$ per hour and consumes a numeraire good $x$.
The budget constraint is $px = wh + y$ but we shall assume $p = 1$. The consumer's objective is
\[ \max_{x, \ell} u(x, \ell) \quad \text{s.t.} \quad x = w(T - \ell) + y, \quad \text{or} \quad x + w\ell = y + wT \]
Note that if you think of the consumption bundle as $(x, \ell)$, then the budget constraint says the total cost of the bundle has to be $y + wT$, for this is all the income you would have if you bought no leisure. This "full income" depends on $w$, and therein lies the key difference between labor supply and other consumer choice problems: as the price of one good (leisure) rises, the consumer is actually richer. Intuitively this is because a worker is a net seller of leisure: he or she starts at an endowment point $(x, \ell) = (y, T)$. From there he or she can trade with the market by giving up leisure in return for cash, which is then used to purchase goods.
We proceed by the method of Lagrange:
$$L(x, \ell, \lambda) = u(x, \ell) - \lambda(x + w\ell - y - wT)$$
$$L_x = u_x(x, \ell) - \lambda = 0$$
$$L_\ell = u_\ell(x, \ell) - \lambda w = 0$$
$$L_\lambda = -x - w\ell + y + wT = 0$$
The first two FONC imply the usual tangency condition: $u_\ell(x, \ell)/u_x(x, \ell) = w$. The solutions are:
$$x = x(w, y)$$
$$\ell = \ell(w, y)$$
$$h(w, y) = T - \ell(w, y)$$
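To make the solution functions concrete, here is a minimal sketch that assumes a Cobb-Douglas utility $u(x, \ell) = x^a \ell^{1-a}$. That functional form, the parameter `a`, and the numbers are assumptions for illustration; the text leaves $u$ general. For this utility the tangency condition and the budget constraint give $x = a(y + wT)$ and $\ell = (1-a)(y + wT)/w$, with a corner at $\ell = T$ when non-labor income is large.

```python
def labor_supply(w, y, T, a):
    """Optimal (x, leisure, hours) for the assumed utility u(x, l) = x**a * l**(1 - a)."""
    full_income = y + w * T
    leisure = (1 - a) * full_income / w   # interior solution from u_l / u_x = w
    if leisure >= T:                      # corner: all T hours taken as leisure
        return y, T, 0.0
    x = a * full_income
    return x, leisure, T - leisure


x, leisure, h = labor_supply(w=10.0, y=20.0, T=16.0, a=0.5)
mrs = ((1 - 0.5) / 0.5) * x / leisure     # u_l / u_x for this utility function
print(x, leisure, h, mrs)
```

At the interior solution the marginal rate of substitution equals the wage and the bundle costs exactly the full income $y + wT$.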
Now consider the rise in $w$ (from $w^0$ to $w^1$) shown in Figure 8.2. As you can see, the substitution effect causes a drop in $\ell$, or equivalently a rise in $h$. But the income effect works in the opposite direction: as a net seller of leisure the agent is better off and uses some of her extra income to buy more leisure.
Figure 8.2: For this individual the income and substitution effects have opposite signs.
To formally analyze the income and substitution effects we rely on the expenditure function for the labor supply case: this is the amount of non-labor income needed to achieve utility $u^0$, given $w$:
$$e(w, u^0) = \min_{x,\ell}\; x - w(T - \ell) \quad \text{s.t.} \quad u(x, \ell) = u^0$$
$$L(x, \ell, \lambda) = x - w(T - \ell) - \lambda[u(x, \ell) - u^0]$$
$$L_x = 1 - \lambda u_x(x, \ell) = 0$$
$$L_\ell = w - \lambda u_\ell(x, \ell) = 0$$
$$L_\lambda = -u(x, \ell) + u^0 = 0$$
The first two FONC imply the tangency condition: $u_\ell(x, \ell)/u_x(x, \ell) = w$. The solutions are:
$$x = x^c(w, u^0)$$
$$\ell = \ell^c(w, u^0)$$
$$h^c(w, u^0) = T - \ell^c(w, u^0)$$
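The expenditure function is thus
$$e(w, u^0) = x^c(w, u^0) - w[T - \ell^c(w, u^0)] = x^c(w, u^0) - w\,h^c(w, u^0)$$
and
$$\frac{\partial e}{\partial w} = \underbrace{\frac{\partial x^c}{\partial w} - w\frac{\partial h^c}{\partial w}}_{0} - h^c = -h^c$$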
To see that $\partial x^c/\partial w - w\,\partial h^c/\partial w = 0$, we use the same trick as we did in Section 6 when dealing with the usual expenditure function. So, recalling that $(x^c(w, u^0), \ell^c(w, u^0))$ yields utility $u^0$,
$$u(x^c(w, u^0), \ell^c(w, u^0)) = u^0$$
and therefore differentiating both sides,
$$u_x(x^c(w, u^0), \ell^c(w, u^0))\frac{\partial x^c}{\partial w} + u_\ell(x^c(w, u^0), \ell^c(w, u^0))\frac{\partial \ell^c}{\partial w} = 0$$
But $w u_x = u_\ell$ by the tangency condition, and $\partial h^c/\partial w = -\partial \ell^c/\partial w$, hence the desired result. (Again, this is an example of the Envelope Theorem.)
To summarize, we have shown that $\partial e/\partial w = -h^c(w, u^0)$. To understand this, think of your mom when she finds out you got a raise at your summer job: she reduces your allowance by an amount proportional to how much you were working.
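The envelope result $\partial e/\partial w = -h^c$ can be checked numerically. The sketch below again assumes the Cobb-Douglas form $u(x, \ell) = x^a \ell^{1-a}$ (an illustrative assumption, not something the text specifies), for which the compensated choices have closed forms, and compares a finite-difference derivative of $e$ with $-h^c$.

```python
def compensated(w, u0, T, a):
    """Compensated choices (x_c, l_c, h_c) for the assumed utility x**a * l**(1 - a)."""
    l_c = u0 * ((1 - a) / (a * w)) ** a
    x_c = u0 * (a * w / (1 - a)) ** (1 - a)
    return x_c, l_c, T - l_c


def expenditure(w, u0, T, a):
    """Non-labor income needed to reach u0 at wage w: e = x_c - w * h_c."""
    x_c, _, h_c = compensated(w, u0, T, a)
    return x_c - w * h_c


w, u0, T, a, dw = 10.0, 30.0, 16.0, 0.5, 1e-5
# Central finite difference for de/dw.
de_dw = (expenditure(w + dw, u0, T, a) - expenditure(w - dw, u0, T, a)) / (2 * dw)
h_c = compensated(w, u0, T, a)[2]
print(de_dw, -h_c)
```

The two printed numbers should agree up to finite-difference error, even though $x^c$ and $h^c$ both move when $w$ changes.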
Now let's see how leisure choice depends on wages. Assume we start with $(w^0, y^0)$, and that $w$ rises from $w^0$ to $w^1$. The rise in $w$ causes a substitution effect and an income effect:
$$\Delta\ell = \Delta\ell^S + \Delta\ell^I$$
As usual, we can write
$$\Delta\ell^S = \frac{\partial \ell^c}{\partial w}\,\Delta w$$
representing the compensated adjustment to the higher cost of leisure on the indifference curve corresponding to level $u^0$. Also,
$$\Delta\ell^I = \ell(w^1, y^0) - \ell(w^1, y^1)$$
where $y^0$ = original non-labor income, and $y^1 = e(w^1, u^0)$. We use our standard trick of taking first order approximations, based on the expenditure function. First, we can approximate
$$\ell(w^1, y^0) - \ell(w^1, y^1) \approx \frac{\partial \ell(w^1, y^1)}{\partial y}\,(y^0 - y^1)$$
and recognizing that $y^0 = e(w^0, u^0)$,
$$y^0 - y^1 = e(w^0, u^0) - e(w^1, u^0) \approx -\frac{\partial e(w^0, u^0)}{\partial w}\,(\Delta w) = h^c(w^0, u^0)(\Delta w) = h^0 \Delta w$$
So,
$$\Delta\ell^I \approx \frac{\partial \ell(w^1, y^1)}{\partial y}\,h^0 \Delta w$$
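The income effect is proportional to $h^0 \Delta w$: if you had been working more, there would be a bigger positive income effect. Finally, then, we have
$$\Delta\ell = \Delta\ell^S + \Delta\ell^I \approx \frac{\partial \ell^c(w^0, u^0)}{\partial w}\,\Delta w + \frac{\partial \ell(w^1, y^1)}{\partial y}\,h^0 \Delta w$$
Dividing both sides by $\Delta w$, and taking the limit $\Delta w \to 0$,
$$\frac{\partial \ell}{\partial w} = \lim_{\Delta w \to 0}\frac{\Delta \ell}{\Delta w} = \frac{\partial \ell^c(w^0, u^0)}{\partial w} + h^0\,\frac{\partial \ell(w^0, y^0)}{\partial y}$$
This is Slutsky's equation for leisure demand. In terms of hours, recall that $h = T - \ell$, so
$$\frac{\partial h}{\partial w} = -\frac{\partial \ell}{\partial w} \quad \text{and} \quad \frac{\partial h}{\partial y} = -\frac{\partial \ell}{\partial y}$$
and therefore
$$\frac{\partial h}{\partial w} = \frac{\partial h^c(w^0, u^0)}{\partial w} + h^0\,\frac{\partial h(w^0, y^0)}{\partial y}$$
When the wage rises there is a positive substitution effect and a negative income effect on labor supply. Note in particular that when a person gets a raise, he won't necessarily work more.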
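The Slutsky equation for hours can be verified numerically under an assumed utility function. The sketch below uses $u(x, \ell) = x^a \ell^{1-a}$ with hypothetical parameter values (neither is from the text) and computes the total, substitution, and income terms by central finite differences at an interior solution.

```python
T, a = 16.0, 0.5   # hypothetical time endowment and utility parameter


def hours(w, y):
    """Uncompensated hours for the assumed utility x**a * l**(1 - a), interior solution."""
    return T - (1 - a) * (y + w * T) / w


def hours_comp(w, u0):
    """Compensated hours, holding utility at u0."""
    return T - u0 * ((1 - a) / (a * w)) ** a


def utility_at(w, y):
    h = hours(w, y)
    return (y + w * h) ** a * (T - h) ** (1 - a)


w0, y0, d = 10.0, 20.0, 1e-5
u0 = utility_at(w0, y0)
total = (hours(w0 + d, y0) - hours(w0 - d, y0)) / (2 * d)
substitution = (hours_comp(w0 + d, u0) - hours_comp(w0 - d, u0)) / (2 * d)
income = hours(w0, y0) * (hours(w0, y0 + d) - hours(w0, y0 - d)) / (2 * d)
print(total, substitution, income)
```

In this example the substitution term is positive and the income term negative, and for these particular numbers the substitution effect dominates, so hours rise slightly with the wage.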
9 Intertemporal Consumption
The two-period consumption model concerns a consumer whose lifetime spans two periods. In period one the consumer has income $y_1$ and spends $c_1$; in period two the consumer has income $y_2$ and spends $c_2$. The consumer can borrow or lend at a rate of interest equal to $r$.
We express the consumer's budget constraint in terms of period-two dollars. The choice is arbitrary, but this way it ends up simplifying the algebra, for then we basically have two goods with prices $1 + r$ and 1, respectively (rather than 1 and $1/(1 + r)$, which would be the case in period-one dollars). Having $1 + r$ in the numerator, not the denominator, is a big help. Total consumption is limited by total income, so the budget constraint is given by
$$(1 + r)c_1 + c_2 = (1 + r)y_1 + y_2$$
The consumer's objective is to solve
$$\max\; u(c_1, c_2) \quad \text{s.t.} \quad (1 + r)c_1 + c_2 = (1 + r)y_1 + y_2$$
The Lagrangian is
$$L(c_1, c_2, \lambda) = u(c_1, c_2) - \lambda[(1 + r)c_1 + c_2 - (1 + r)y_1 - y_2]$$
and the FONC are
$$L_1 = u_1(c_1, c_2) - \lambda(1 + r) = 0$$
$$L_2 = u_2(c_1, c_2) - \lambda = 0$$
$$L_\lambda = -(1 + r)c_1 - c_2 + (1 + r)y_1 + y_2 = 0$$
These give rise to the tangency condition $u_1/u_2 = 1 + r$ and the budget constraint, as usual. The solutions are functions of $r$, $y_1$, and $y_2$:
$$c_1 = c_1(r, y_1, y_2)$$
$$c_2 = c_2(r, y_1, y_2)$$
These demand functions are a little unusual because they specify not just total available resources, or wealth $w = (1 + r)y_1 + y_2$, but also the composition of $w$. To clarify the effects of a change in $r$ on $c_1$ it is helpful to define two other consumption functions that depend on the interest rate and total wealth (measured in period-two dollars):
$$c_1 = c_1^w(r, w)$$
$$c_2 = c_2^w(r, w)$$
These optimal choice functions are related by:
$$c_1(r, y_1, y_2) = c_1^w(r, (1 + r)y_1 + y_2)$$
$$c_2(r, y_1, y_2) = c_2^w(r, (1 + r)y_1 + y_2)$$
You can see that as we change $r$, the effect on $c_1(r, y_1, y_2)$ depends on both $\partial c_1^w/\partial r$ and $\partial c_1^w/\partial w$.
Now let's define the expenditure function as the minimum cost to reach a given level of utility (again, measured in period-two dollars). Specifically, define $e$ as follows:
$$e(r, u^0) = \min\; (1 + r)c_1 + c_2 \quad \text{s.t.} \quad u(c_1, c_2) = u^0$$
The Lagrangian is
$$L(c_1, c_2, \lambda) = (1 + r)c_1 + c_2 - \lambda[u(c_1, c_2) - u^0]$$
and the FONC are
$$L_1 = 1 + r - \lambda u_1(c_1, c_2) = 0$$
$$L_2 = 1 - \lambda u_2(c_1, c_2) = 0$$
$$L_\lambda = -u(c_1, c_2) + u^0 = 0$$
The solutions are the compensated demand functions $c_1^c(r, u^0)$ and $c_2^c(r, u^0)$. As usual
$$e(r, u^0) = (1 + r)c_1^c(r, u^0) + c_2^c(r, u^0)$$
Differentiating,
$$\frac{\partial e(r, u^0)}{\partial r} = c_1^c(r, u^0) + (1 + r)\frac{\partial c_1^c}{\partial r} + \frac{\partial c_2^c}{\partial r}$$
and (as usual) it is easy to show that $(1 + r)\,\partial c_1^c/\partial r + \partial c_2^c/\partial r = 0$, so
$$\frac{\partial e(r, u^0)}{\partial r} = c_1^c(r, u^0)$$
Thus we have three optimal consumption functions for first period consumption:
$c_1(r, y_1, y_2)$, which depends on $y_1$ and $y_2$
$c_1^w(r, w)$, which depends only on $w$
$c_1^c(r, u^0)$, which depends on utility
We also have two relations connecting the three:
$$c_1(r, y_1, y_2) = c_1^w(r, (1 + r)y_1 + y_2) \tag{9.1}$$
$$c_1^c(r, u^0) = c_1^w(r, e(r, u^0)) \tag{9.2}$$
Now it may seem clear why we defined $c_1^w$: it's the function that links the compensated demand and the demand we ultimately are interested in, $c_1(r, y_1, y_2)$. We can differentiate these two equations with respect to $r$. Starting with (9.1),
$$\frac{\partial c_1(r, y_1, y_2)}{\partial r} = \frac{\partial c_1^w(r, (1 + r)y_1 + y_2)}{\partial r} + y_1\,\frac{\partial c_1^w(r, (1 + r)y_1 + y_2)}{\partial w} \tag{9.3}$$
This means that when you change $r$, the response of the demand for $c_1$ as a function of $(r, y_1, y_2)$ has an income effect, reflecting the fact that as $r$ rises, so does the value of wealth.
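From (9.2) we get an expression like we've seen before:
$$\frac{\partial c_1^c(r, u^0)}{\partial r} = \frac{\partial c_1^w(r, e(r, u^0))}{\partial r} + \frac{\partial c_1^w(r, e(r, u^0))}{\partial w}\cdot\frac{\partial e(r, u^0)}{\partial r} = \frac{\partial c_1^w(r, e(r, u^0))}{\partial r} + \frac{\partial c_1^w(r, e(r, u^0))}{\partial w}\,c_1^c(r, u^0)$$
Rearranging, we get a Slutsky equation for $c_1^w$:
$$\frac{\partial c_1^w(r, e(r, u^0))}{\partial r} = \frac{\partial c_1^c(r, u^0)}{\partial r} - \frac{\partial c_1^w(r, e(r, u^0))}{\partial w}\,c_1^c(r, u^0) = \frac{\partial c_1^c(r, u^0)}{\partial r} - \frac{\partial c_1^w(r, e(r, u^0))}{\partial w}\,c_1(r, y_1, y_2) \tag{9.4}$$
assuming $u^0$ is the level of utility one can achieve with income $(y_1, y_2)$ and interest rate $r$.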
Finally, plugging (9.4) into (9.3),
$$\frac{\partial c_1(r, y_1, y_2)}{\partial r} = \frac{\partial c_1^w(r, (1 + r)y_1 + y_2)}{\partial r} + y_1\,\frac{\partial c_1^w(r, (1 + r)y_1 + y_2)}{\partial w}$$
$$= \frac{\partial c_1^c(r, u^0)}{\partial r} + \frac{\partial c_1^w(r, e(r, u^0))}{\partial w}\,[y_1 - c_1(r, y_1, y_2)]$$
$$= \frac{\partial c_1^c(r, u^0)}{\partial r} + \frac{\partial c_1^w(r, e(r, u^0))}{\partial w}\,s_1(r, y_1, y_2)$$
where $s_1(r, y_1, y_2) = y_1 - c_1(r, y_1, y_2)$ is the optimal level of period-one savings.
The income effect of a rise in $r$ on optimal consumption $c_1(r, y_1, y_2)$ is positive or negative, depending on whether $s_1$ is positive or negative. For a saver, $s_1 > 0$ and a rise in $r$ has a positive income effect (because the consumer is a net supplier of funds to the market, as in the case of labor supply). But for a borrower, $s_1 < 0$ and a rise in $r$ has a negative income effect (because the consumer is a net demander of funds, as in the case of basic commodity demand).
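The saver/borrower distinction can be illustrated with a small numerical sketch. It assumes log utility $u(c_1, c_2) = \log c_1 + \beta \log c_2$, for which $c_1^w(r, w) = w/[(1+\beta)(1+r)]$. The utility function, the discount factor `beta`, and the income profiles are hypothetical choices for illustration. The substitution term is computed from the rearranged relation (9.4) and the income term is $(\partial c_1^w/\partial w)\,s_1$.

```python
beta = 0.9  # hypothetical discount factor in u = log(c1) + beta * log(c2)


def c1_wealth(r, w):
    """First-period consumption as a function of r and period-two wealth w."""
    return w / ((1 + beta) * (1 + r))


def c1(r, y1, y2):
    """Relation (9.1): evaluate c1_wealth at w = (1 + r) * y1 + y2."""
    return c1_wealth(r, (1 + r) * y1 + y2)


def decompose(r, y1, y2, d=1e-6):
    """Return (total, substitution, income) effects of r on c1 via finite differences."""
    w = (1 + r) * y1 + y2
    dcw_dr = (c1_wealth(r + d, w) - c1_wealth(r - d, w)) / (2 * d)
    dcw_dw = (c1_wealth(r, w + d) - c1_wealth(r, w - d)) / (2 * d)
    total = (c1(r + d, y1, y2) - c1(r - d, y1, y2)) / (2 * d)
    substitution = dcw_dr + dcw_dw * c1(r, y1, y2)   # compensated response, from (9.4)
    income = dcw_dw * (y1 - c1(r, y1, y2))           # wealth derivative times savings
    return total, substitution, income


saver = decompose(0.05, y1=100.0, y2=0.0)      # all income arrives in period one
borrower = decompose(0.05, y1=0.0, y2=100.0)   # all income arrives in period two
print(saver)
print(borrower)
```

In both cases the substitution effect lowers $c_1$ when $r$ rises; the income effect is positive for the saver and negative for the borrower.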
10 Production and Cost I
The technology available to a given firm is summarized by its production function. This function gives the quantities of output produced by various combinations of inputs. For example, an airline uses labor inputs, fuel, and machinery (airplanes, loading equipment, etc.) to produce the output "passenger seats". We write $y = f(a, b)$ to signify that with inputs $a$ and $b$, it is possible to produce $y$ units of output.
Examples:
One Input
$y = a^\alpha$
$y = \begin{cases} 0 & a < \bar{a} \\ 1 & a > \bar{a} \end{cases}$
Two Inputs
$y = a^\alpha b^\beta$ (Cobb-Douglas)
$y = \min\{a, b\}$ (Leontief, CRS)
$y = a + b$ (Additive, CRS)
For two or more inputs, production functions are a lot like utility functions. The important difference is that output is measurable and has natural units (e.g. passenger seats). It's as if the indifference curves have numbers attached to them that matter.
A second, less obvious, way to summarize technology is to compute the cost associated with producing a given output level $y$, at fixed prices for the inputs. In principle, if you know the production function, it is easy to find the cost function in two steps:
1. enumerate all possible ways of producing $y$
2. determine the cheapest one, and evaluate its cost
Most of the economic behavior of firms is studied via the cost function. In the next few sections, we demonstrate how to derive the cost function and illustrate the connection between its properties and those of the production function.
10.1 One-Factor Production and Cost Functions
10.1.1 Production Functions
Suppose there is only one input (apart from, perhaps a set-up cost). Then we have a picture
along the lines of Figure 10.1. Note that f(0) = 0 by convention.
Definitions and Facts:
Figure 10.1: A representative production function. Note the S shape.
The marginal product of factor $a$ is the increase in $y$ that accompanies a unit increase in $a$:
$$MP_a = \frac{\partial f(a)}{\partial a} = f'(a)$$
Factor $a$ is said to be useful if $f'(a) > 0$.
The average product of factor $a$ is the ratio of total output to total input of $a$:
$$AP_a = \frac{f(a)}{a}$$
If the MP of factor $a$ is increasing, then $f''(a) > 0$ and we say that there are increasing marginal returns: as the scale of output is expanded, each additional unit of input contributes more. If the MP is decreasing, then $f''(a) < 0$ and we say there are diminishing marginal returns. See Figure 10.2.
Figure 10.2: (a) Increasing marginal returns. (b) Decreasing marginal returns.
If $MP_a > AP_a$, then $AP_a$ is increasing; if $MP_a < AP_a$, then $AP_a$ is decreasing. Think baseball, with AP = career batting average and MP = season batting average. A hitter who has a better-than-average season raises his career average. See Figure 10.3. In general,
$$\frac{dAP_a}{da} = \frac{a f'(a) - f(a)}{a^2} = \frac{1}{a}\left[f'(a) - \frac{f(a)}{a}\right] = \frac{1}{a}\,(MP_a - AP_a)$$
Figure 10.3: At $a = a_1$, $AP = f(a_1)/a_1 < f'(a_1) = MP$, so AP is increasing. At $a = a_2$, the opposite is true.
Examples:
$f(a) = ka$, where $k > 0$ (linear). $AP_a = MP_a = k$.
$f(a) = a^\alpha$, where $0 < \alpha < 1$ (concave). See Figure 10.4.
Figure 10.4: The greater $\alpha$, the less concave the production function, up to $\alpha = 1$.
$f(a) = 9a^2 - a^3$, $a < 6$. See Figure 10.5. For this function we have the following:
$$f'(a) = 18a - 3a^2 \;\Rightarrow\; [f'(a) \ge 0 \iff a \le 6]$$
$$f''(a) = 18 - 6a \;\Rightarrow\; \begin{cases} f''(a) > 0 \iff a < 3 \\ f''(a) < 0 \iff a > 3 \end{cases}$$
Figure 10.5: The production function $f(a) = 9a^2 - a^3$ from the example above.
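The relationship between marginal and average product can be checked directly on the example $f(a) = 9a^2 - a^3$. The short sketch below (function names are just illustrative) evaluates both at a few input levels; average product should peak where it equals marginal product.

```python
def f(a):
    """Production function from the example: f(a) = 9a^2 - a^3."""
    return 9 * a**2 - a**3


def marginal_product(a):
    """f'(a) = 18a - 3a^2."""
    return 18 * a - 3 * a**2


def average_product(a):
    """f(a) / a = 9a - a^2."""
    return f(a) / a


for a in (2.0, 4.5, 5.5):
    print(a, marginal_product(a), average_product(a))
```

For this function $AP_a = 9a - a^2$ is maximized at $a = 4.5$, which is also where $MP_a = AP_a$; below that point MP exceeds AP and AP is rising, and above it the reverse holds.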
10.1.2 Cost Functions
What is the cost function for a one-factor production function? Let $w$ denote the price per unit of factor $a$. Then
$$c(y, w) = \min\; wa \quad \text{s.t.} \quad y = f(a)$$
But $y = f(a)$ implies $a = f^{-1}(y)$ (see footnote 7). Therefore $c(y, w) = w f^{-1}(y)$. See Figure 10.6 for an illustration of this process. If $w$ is fixed, then we often write the cost function as a function of $y$ only: $c(y)$. Define marginal cost $MC(y) = c'(y)$, and average cost $AC(y) = c(y)/y$.
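Examples:
$y = f(a) = ka$ (linear) $\Rightarrow a = y/k$ (linear input requirement function)
$$c(y, w) = w\left(\frac{y}{k}\right) = \frac{1}{k}\,wy \quad \text{(linear in both } y \text{ and } w\text{)}$$
$y = f(a) = \sqrt{a} \Rightarrow a = y^2$ (convex input requirement function)
$$c(y, w) = wy^2 \quad \text{(linear in } w \text{ but convex in } y\text{; see Figure 10.7)}$$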
10.1.3 Connection between MC and MP
Marginal cost is the amount it would cost, at the current level of output, to produce an additional unit. By definition of $MP_a$, one unit of input adds $MP_a = f'(a)$ units of output. It follows that
$1/MP_a = 1/f'(a)$ units of $a$ are needed to produce one unit of $y$
the marginal cost of an additional unit is $MC(y) = w/f'(a)$, when the production function is given by $y = f(a)$
Footnote 7: Assume, for the moment, that $f$ is one-to-one.
Figure 10.6: The graph in (b) is obtained by rotating quadrant II in (a) 90 degrees clockwise.
Alternatively, $c(y) = w f^{-1}(y)$, using as input requirement function $a = f^{-1}(y)$. Thus (see footnote 8)
$$c'(y) = w\,\frac{df^{-1}(y)}{dy} = \frac{w}{f'(a)}$$
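As a quick check of $MC(y) = w/f'(a)$, the sketch below uses the earlier example $f(a) = \sqrt{a}$, so that $c(y) = wy^2$. The input price `w = 3.0` is a hypothetical value. It compares a finite-difference derivative of the cost function with $w$ divided by the marginal product at the input level needed to produce $y$.

```python
import math

w = 3.0  # hypothetical input price


def cost(y):
    """c(y) = w * f^{-1}(y) for f(a) = sqrt(a), so f^{-1}(y) = y**2."""
    return w * y**2


def mc_from_mp(y):
    """Marginal cost computed as w / f'(a) at the input level a that produces y."""
    a = y**2
    mp = 1 / (2 * math.sqrt(a))   # f'(a) for f(a) = sqrt(a)
    return w / mp


y, dy = 2.0, 1e-6
mc_numeric = (cost(y + dy) - cost(y - dy)) / (2 * dy)
print(mc_numeric, mc_from_mp(y))
```

Both routes give $MC(y) = 2wy$ for this technology.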
10.1.4 Geometry of c, AC, and MC
Take a look at Figure 10.8a. Note the following:
when MC < AC, AC is falling
when MC > AC, AC is rising
when AC is at a minimum, AC = MC
Footnote 8: Recall that if $f'(x_0) \neq 0$, then
$$\left.\frac{df^{-1}(y)}{dy}\right|_{y = f(x_0)} = \frac{1}{f'(x_0)}.$$
Figure 10.7: (a) The production function $y = \sqrt{a}$ and (b) the corresponding cost function $c = wy^2$, where $w$ is the per-unit cost of $a$.
We sometimes add a "set up cost" $F$ (also called a fixed cost). The total cost is then
$$c(y) = \text{fixed cost} + \text{variable cost} = F + VC(y)$$
The implications of this model are illustrated in Figure 10.8b.
Figure 10.8: Compare (b) to (a) and note the following: 1. min $AC$ occurs to the right of min $AVC$. Why? 2. $MC$ intersects both $AC$ and $AVC$ at their respective minimums. Why?
11 Production and Cost II
The analysis of production and cost is more interesting when it involves combinations of two or more inputs to produce $y$. The production function is $y = f(a, b)$. As in consumer theory, we begin by thinking about combinations of inputs that produce the same level of output. In the firm case these are called isoquants.
We define the marginal rate of technical substitution (MRTS) as the slope of an isoquant. It indicates how many units of $b$ one would need to add, per unit of $a$ given up, to keep output constant. See Figure 11.1.
Figure 11.1: The marginal rate of technical substitution is analogous to the consumer's MRS. This bears comparison to Figure 2.5.
Formally, suppose $y^0 = f(a^0, b^0)$, and consider varying $a$ and $b$ in such a way that output remains fixed at $y^0$:
$$dy = f_a\,da + f_b\,db = 0$$
which implies
$$-\left(\frac{db}{da}\right)_{y^0} = \frac{f_a(a^0, b^0)}{f_b(a^0, b^0)} = \frac{MP_a}{MP_b}$$
The MRTS is analogous to the marginal rate of substitution (MRS) in consumer theory. When there are two or more inputs, the production function is characterized by both the degree of substitutability between inputs (curvature of isoquants) and the extent to which output expands as inputs are expanded proportionately. The latter gives rise to the idea of returns to scale. Recall that for a production function $y = f(a, b)$, we say $f$ has constant returns to scale (CRS) if
$$f(\theta a, \theta b) = \theta f(a, b), \quad \theta > 0$$
We say that $f$ has decreasing returns to scale (DRS) if
$$f(\theta a, \theta b) < \theta f(a, b), \quad \theta > 1$$
With DRS, if you double both inputs, you get less than twice the output. On the other hand, the same inequality implies that if you reduce inputs by some proportion, your output falls by a smaller proportion. So DRS suggests that smaller firms are necessarily more efficient. Conversely we say that $f$ has increasing returns to scale (IRS) if
$$f(\theta a, \theta b) > \theta f(a, b), \quad \theta > 1$$
Figure 11.2: (a) CRS and (b) DRS. This can be seen by noting the shape of the intersection of the surface with the plane $a = b$ for example.
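Examples:
One Input: $f(a) = a^\alpha$
CRS if $\alpha = 1$
DRS if $\alpha < 1$
IRS if $\alpha > 1$
Cobb-Douglas: $f(a, b) = a^\alpha b^\beta$
CRS if $\alpha + \beta = 1$
DRS if $\alpha + \beta < 1$
IRS if $\alpha + \beta > 1$
As a check, suppose $\alpha + \beta = 1$. Then
$$f(\theta a, \theta b) = (\theta a)^\alpha (\theta b)^\beta = \theta^{\alpha + \beta} a^\alpha b^\beta = \theta f(a, b)$$
Geometrically, returns to scale indicates whether $f$ is concave or convex over the top of a ray emanating from the origin. (See Figure 11.2.)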
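The definitions of CRS, DRS, and IRS translate into a simple numerical test: scale both inputs by $\theta$ and compare output with $\theta$ times the original output. The sketch below does this at a single point and a single $\theta > 1$, so it is a spot check rather than a proof; the helper names and the tolerance are illustrative assumptions.

```python
def returns_to_scale(f, a, b, theta=2.0, tol=1e-9):
    """Classify f at (a, b) by comparing f(theta*a, theta*b) with theta*f(a, b)."""
    scaled, proportional = f(theta * a, theta * b), theta * f(a, b)
    if abs(scaled - proportional) <= tol * max(1.0, abs(proportional)):
        return "CRS"
    return "IRS" if scaled > proportional else "DRS"


def cobb_douglas(alpha, beta):
    """Return the production function f(a, b) = a**alpha * b**beta."""
    return lambda a, b: a**alpha * b**beta


print(returns_to_scale(cobb_douglas(0.3, 0.7), 2.0, 5.0))
print(returns_to_scale(cobb_douglas(0.3, 0.4), 2.0, 5.0))
print(returns_to_scale(cobb_douglas(0.8, 0.7), 2.0, 5.0))
```

For Cobb-Douglas the classification depends only on $\alpha + \beta$, so the same answer comes back at any positive $(a, b)$.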
11.1 Derivation of the Cost Function
Given a production function $f(a, b)$ and prices $w_a$, $w_b$, we can write
$$c(w_a, w_b, y) = \min\; w_a a + w_b b \quad \text{s.t.} \quad f(a, b) \ge y$$
Define $L = w_a a + w_b b - \lambda[f(a, b) - y]$, and proceed by the method of Lagrange:
$$L_a = w_a - \lambda f_a(a, b) = 0$$
$$L_b = w_b - \lambda f_b(a, b) = 0$$
$$L_\lambda = -f(a, b) + y = 0$$
The ratio of the first two FONC gives
$$\frac{w_a}{w_b} = \frac{f_a(a, b)}{f_b(a, b)} = MRTS$$
Geometrically, we find the point of tangency of the constraint $f(a, b) = y$ with the iso-cost lines
$$w_a a + w_b b = \text{const.}$$
See Figure 11.3. Notice the problem is reversed relative to that of a consumer. In the cost problem, you are constrained to an isoquant and have to find the lowest budget, or iso-cost line. In the consumer problem, you are constrained to a budget line and have to find the highest isoquant, or indifference curve.
Figure 11.3: The firm's objective is to minimize cost subject to a given level of output. This is done by moving along an isoquant until the tangency condition is satisfied.
If we consider finding the most inexpensive way to achieve different levels of output given $w_a$ and $w_b$, we trace out the scale expansion path (SEP) shown in Figure 11.4. Note the similarity between a firm's SEP and a consumer's IEP. Geometrically, the shape of the cost function (as a function of $y$) depends on the shape of the production function over the top of the SEP. See Figure 11.5 for an illustration. If the curve over the SEP is S-shaped as in Figure 11.5b we get cost functions of the usual shape.
Figure 11.4: The scale expansion path traces out the optimal input demands as production varies.
Figure 11.5: The shape of the cost function depends on the shape of the production function over the top of the SEP. In other words, if the SEP is given by $g(a, b) = g_0$, then the cost function is shaped like the intersection of $y = f(a, b)$ with $g(a, b) = g_0$, where the latter is promoted to three dimensions.
11.2 Marginal Cost
If we were to produce an additional unit of $y$, we could use input $a$, or input $b$, or both. If we used $a$ only, it would take $1/MP_a$ units of $a$ for a single unit of $y$. The marginal cost is $w_a/MP_a$ (just as in the one-factor case). By symmetry, we could also use $b$ only, at marginal cost of $w_b/MP_b$. But from the FONC
$$\frac{w_a}{w_b} = \frac{MP_a}{MP_b} \;\Rightarrow\; \frac{w_a}{MP_a} = \frac{w_b}{MP_b}$$
So, on the margin, one should be indifferent to expanding output via increases in $a$ or increases in $b$. This reflects the fact that $a$ and $b$ were optimally chosen to begin with. Note also that
$$\lambda = \frac{w_a}{f_a(a, b)} = \frac{w_a}{MP_a} = \frac{w_b}{MP_b}$$
Thus the Lagrange multiplier in the cost-minimization problem gives marginal cost.
Examples:
$f(a, b) = \min\{a, b/k\}$. At a cost minimum we must have $a = b/k = y$, which implies
$$c(w_a, w_b, y) = y(w_a + k w_b)$$
Note that this production function exhibits CRS.
$f(a, b) = a + kb$. These are linear isoquants, with $f_a/f_b = 1/k$. If $w_a/w_b > 1/k$, use only $b$, in which case $y = kb \Rightarrow b = y/k$, and $c(w_a, w_b, y) = w_b y/k$. But if $w_a/w_b < 1/k$, use only $a$, in which case $y = a$, and $c(w_a, w_b, y) = w_a y$. Combining these results, for any $w_a$, $w_b$, we have $c(w_a, w_b, y) = y \min\{w_a, w_b/k\}$.
The previous two examples illustrate what is called the "dual" relationship between cost and production functions. Leontief production functions imply linear cost functions; linear production functions imply Leontief-like cost functions.
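$f(a, b) = a^\alpha b^\beta$. (You may have seen this in a problem set!) The Lagrangian is $L(a, b, \lambda) = w_a a + w_b b - \lambda(a^\alpha b^\beta - y)$.
$$L_a = w_a - \lambda\alpha a^{\alpha - 1} b^\beta = 0$$
$$L_b = w_b - \lambda\beta a^\alpha b^{\beta - 1} = 0$$
$$L_\lambda = -a^\alpha b^\beta + y = 0$$
Using the first two FONC, we have
$$\frac{w_a}{w_b} = \frac{\alpha a^{\alpha - 1} b^\beta}{\beta a^\alpha b^{\beta - 1}} = \frac{\alpha b}{\beta a} \quad \text{or} \quad b = \frac{\beta a w_a}{\alpha w_b}$$
By substitution,
$$a^\alpha b^\beta = a^\alpha \left(\frac{\beta a w_a}{\alpha w_b}\right)^\beta = a^{\alpha + \beta}\left(\frac{\beta w_a}{\alpha w_b}\right)^\beta = y$$
from which we can easily retrieve the input requirement function (IRF) for $a$:
$$a = y^{\frac{1}{\alpha + \beta}} \left(\frac{\alpha}{\beta}\right)^{\frac{\beta}{\alpha + \beta}} w_a^{-\frac{\beta}{\alpha + \beta}}\, w_b^{\frac{\beta}{\alpha + \beta}}$$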
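The IRF for $b$ can be found by substitution, or by symmetry:
$$b = y^{\frac{1}{\alpha + \beta}} \left(\frac{\beta}{\alpha}\right)^{\frac{\alpha}{\alpha + \beta}} w_a^{\frac{\alpha}{\alpha + \beta}}\, w_b^{-\frac{\alpha}{\alpha + \beta}}$$
Finally $c(w_a, w_b, y) = w_a a + w_b b$ when $a$ and $b$ are set to their respective cost-minimizing values, so
$$c(w_a, w_b, y) = y^{\frac{1}{\alpha + \beta}} \left(\frac{\alpha}{\beta}\right)^{\frac{\beta}{\alpha + \beta}} w_a^{\frac{\alpha}{\alpha + \beta}} w_b^{\frac{\beta}{\alpha + \beta}} + y^{\frac{1}{\alpha + \beta}} \left(\frac{\beta}{\alpha}\right)^{\frac{\alpha}{\alpha + \beta}} w_a^{\frac{\alpha}{\alpha + \beta}} w_b^{\frac{\beta}{\alpha + \beta}}$$
$$= y^{\frac{1}{\alpha + \beta}}\, w_a^{\frac{\alpha}{\alpha + \beta}}\, w_b^{\frac{\beta}{\alpha + \beta}} \left[\left(\frac{\alpha}{\beta}\right)^{\frac{\beta}{\alpha + \beta}} + \left(\frac{\beta}{\alpha}\right)^{\frac{\alpha}{\alpha + \beta}}\right]$$
If $\alpha + \beta = 1$ (CRS), this simplifies considerably:
$$c(w_a, w_b, y) = y\, w_a^\alpha w_b^\beta \left[\left(\frac{\alpha}{\beta}\right)^\beta + \left(\frac{\beta}{\alpha}\right)^\alpha\right] = y\, w_a^\alpha w_b^\beta \left(\alpha^{-\alpha}\beta^{-\beta}\right)$$
So with CRS, cost is linear in output. In general the exponent of $y$ in the cost function is $(\alpha + \beta)^{-1}$, so if $\alpha + \beta > 1$, cost is concave in output (IRS), whereas if $\alpha + \beta < 1$, cost is convex in output (DRS).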
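The Cobb-Douglas input requirement functions and cost function can be sanity-checked numerically. The sketch below codes the closed forms, confirms that the chosen inputs actually produce $y$, and compares the cost with another point on the same isoquant. The parameter values are arbitrary illustrations.

```python
def cd_input_demands(y, wa, wb, alpha, beta):
    """Cost-minimizing (a, b) for f(a, b) = a**alpha * b**beta."""
    s = alpha + beta
    a = y ** (1 / s) * (alpha * wb / (beta * wa)) ** (beta / s)
    b = y ** (1 / s) * (beta * wa / (alpha * wb)) ** (alpha / s)
    return a, b


def cd_cost(y, wa, wb, alpha, beta):
    """Minimum cost of producing y at input prices (wa, wb)."""
    a, b = cd_input_demands(y, wa, wb, alpha, beta)
    return wa * a + wb * b


y, wa, wb, alpha, beta = 4.0, 2.0, 3.0, 0.5, 0.5
a_star, b_star = cd_input_demands(y, wa, wb, alpha, beta)
min_cost = cd_cost(y, wa, wb, alpha, beta)

# Another input bundle on the same isoquant, for comparison.
a_alt = 1.1 * a_star
b_alt = (y / a_alt**alpha) ** (1 / beta)
alt_cost = wa * a_alt + wb * b_alt
print(min_cost, alt_cost)
```

With $\alpha + \beta = 1$ as here, the minimum cost also matches the simplified CRS formula $y\,w_a^\alpha w_b^\beta\,\alpha^{-\alpha}\beta^{-\beta}$.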
12 Cost Functions and IRFs
Suppose we are given a production function $f(x_1, x_2)$, and the associated cost function $c(y, w_1, w_2)$. We determine $c$ by solving the cost minimization problem:
$$\min\; w_1 x_1 + w_2 x_2 \quad \text{s.t.} \quad f(x_1, x_2) = y$$
We define the Lagrangian $L = w_1 x_1 + w_2 x_2 - \lambda[f(x_1, x_2) - y]$. The FONC are:
$$L_1 = w_1 - \lambda f_1(x_1, x_2) = 0$$
$$L_2 = w_2 - \lambda f_2(x_1, x_2) = 0$$
$$L_\lambda = -f(x_1, x_2) + y = 0$$
The first two of these imply the tangency condition $w_1/w_2 = f_1/f_2$, while the third is equivalent to the constraint. Solving these two equations in two unknowns we get the IRFs:
$$x_1 = x_1^*(y, w_1, w_2)$$
$$x_2 = x_2^*(y, w_1, w_2)$$
The IRFs are analogous to the consumer's demand functions: they represent the optimal (cost-minimizing) input choices to produce $y$ when input prices are $(w_1, w_2)$. With these we obtain the cost function
$$c(y, w_1, w_2) = w_1 x_1^*(y, w_1, w_2) + w_2 x_2^*(y, w_1, w_2) \tag{12.1}$$
which is simply the cost of the cost-minimizing combination of inputs.
12.1 Shephard's Lemma
It turns out that given $c$, one can recover the IRFs by simple differentiation:
$$x_1^*(y, w_1, w_2) = \frac{\partial c(y, w_1, w_2)}{\partial w_1}$$
At a glance, this appears to be inconsistent with (12.1). Indeed, differentiating (12.1) with respect to $w_1$ gives three terms:
$$\frac{\partial c(y, w_1, w_2)}{\partial w_1} = x_1^*(y, w_1, w_2) + w_1\frac{\partial x_1^*(y, w_1, w_2)}{\partial w_1} + w_2\frac{\partial x_2^*(y, w_1, w_2)}{\partial w_1} \tag{12.2}$$
However, when an input price changes, $x_1^*(y, w_1, w_2)$ and $x_2^*(y, w_1, w_2)$ are constrained to move along an isoquant as in Figure 12.1. In other words, we have
$$f(x_1^*(y, w_1, w_2), x_2^*(y, w_1, w_2)) = y$$
and this holds even as $w_1$ varies, so, differentiating w.r.t. $w_1$:
$$f_1\frac{\partial x_1^*}{\partial w_1} + f_2\frac{\partial x_2^*}{\partial w_1} = 0$$
This means
$$\frac{\partial x_2^*}{\partial w_1} = -\frac{f_1}{f_2}\,\frac{\partial x_1^*}{\partial w_1}$$
So, since $x_1^*$ falls in response to a rise in $w_1$, $x_2^*$ has to rise, and the rates of change are in the ratio $f_1/f_2$. (Note that $x_1^*$ responds to a change in $w_1$ just as a demand function does in consumer theory; the response is like a substitution effect. Since the isoquant exhibits DMRTS, $w_1$ increasing implies $x_1^*$ decreasing.) Substituting this expression into (12.2),
$$\frac{\partial c}{\partial w_1} = x_1^* + \frac{\partial x_1^*}{\partial w_1}\left[w_1 - w_2\frac{f_1}{f_2}\right]$$
But $w_1 - w_2(f_1/f_2) = 0$ by the tangency condition, so the second and third terms on the RHS of (12.2) always cancel, leaving us with Shephard's Lemma.
Shephard's Lemma says that if $w_1$ rises, the first order effect on cost is proportional to the amount of $x_1$ the firm originally was using. Although the optimal choices of $x_1$ and $x_2$ also change, they do so in such a way that $y$ remains constant, and because of the initial tangency condition the movements in the inputs leave cost unchanged.
Figure 12.1: The price of $x_1$ changes, and the firm adjusts $x_1^*$ and $x_2^*$ without affecting production.
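Shephard's Lemma is easy to check numerically for a specific technology. The sketch below assumes a Cobb-Douglas production function $f(x_1, x_2) = x_1^\alpha x_2^\beta$ (used here only as an example, with arbitrary parameter values) and compares a finite-difference derivative of its cost function with respect to $w_1$ against the closed-form input requirement function for $x_1$.

```python
def cd_cost(y, w1, w2, alpha, beta):
    """Cost function for the assumed technology f(x1, x2) = x1**alpha * x2**beta."""
    s = alpha + beta
    k = (alpha / beta) ** (beta / s) + (beta / alpha) ** (alpha / s)
    return k * y ** (1 / s) * w1 ** (alpha / s) * w2 ** (beta / s)


def cd_x1(y, w1, w2, alpha, beta):
    """Cost-minimizing demand for input 1 at output y."""
    s = alpha + beta
    return y ** (1 / s) * (alpha * w2 / (beta * w1)) ** (beta / s)


y, w1, w2, alpha, beta, d = 5.0, 2.0, 3.0, 0.3, 0.4, 1e-6
# Central finite difference for dc/dw1, holding y and w2 fixed.
dc_dw1 = (cd_cost(y, w1 + d, w2, alpha, beta) - cd_cost(y, w1 - d, w2, alpha, beta)) / (2 * d)
print(dc_dw1, cd_x1(y, w1, w2, alpha, beta))
```

The two numbers agree up to finite-difference error, which is the content of the lemma: the indirect effects through $x_1^*$ and $x_2^*$ cancel.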
13 Supply
13.1 Supply Determination
So far we have studied cost, taking output as given. In this lecture, we consider the output or supply decision of individual competitive firms. By competitive, we mean the firm takes the prices of inputs and outputs as exogenous (i.e. beyond the firm's control). For any firm, profit is defined as revenue minus cost. For a competitive firm that uses two inputs, 1 and 2, to produce a single output $y$ with unit price $p$, profit is given by
$$\pi(y) = py - c(y, w_1, w_2)$$
Note that revenue $py$ is linear in output, whereas the cost function is potentially non-linear. Assume the firm selects $y$ so as to maximize profit:
$$\max\; py - c(y, w_1, w_2)$$
FONC:
$$\frac{d\pi}{dy} = p - c_y(y^*, w_1, w_2) = 0$$
or, equivalently, price = marginal cost at $y = y^*$. The SOC for a maximum is
$$\frac{d^2\pi}{dy^2} < 0 \;\Rightarrow\; -c_{yy}(y^*, w_1, w_2) < 0 \;\Rightarrow\; c_{yy}(y^*, w_1, w_2) > 0 \;\Rightarrow\; \text{MC is increasing at } y = y^*$$
The diagram is shown in Figure 13.1a. Note that $y^*$ is a function of $p$ and $w = (w_1, w_2)$. We define the supply function to be $y = y^*(p, w_1, w_2)$. What if $\pi < 0$ at $y^*(p, w)$? See Figure 13.1b.
Figure 13.1: (a) The firm selects $y^*$ such that $MC = p$. (b) $p < AVC \Rightarrow y^* = 0$, and $AVC < p < AC \Rightarrow$ the firm is not turning a profit but it's covering its operating costs, so it may be advised to stay in business and hope for better times.
If $p < AVC$ then $y^* = 0$. The firm is losing on both fixed and variable inputs: the best choice is to shut down.
If $p > AC$, the firm is turning a profit, so $y^*$ is such that $p = MC(y^*)$.
If $AVC < p < AC$, the firm is incurring a loss, but it's covering its operating costs, failing only to cover its fixed costs. The firm may well stay in business and hope for better times.
Figure 13.2 is a useful representation of the firm's optimal choice.
Figure 13.2: The rectangle represents revenue $py^*$ while the area underneath MC represents costs (not including fixed costs). Thus the shaded area represents profits (not including fixed cost payments). Here we are using the fact that $c(y) = \int_0^y MC(s)\,ds + F$.
Observations
If MC is constant (e.g. Cobb-Douglas with $\alpha + \beta = 1$), then, assuming no fixed costs, $p < MC \Rightarrow$ loss $\Rightarrow y^* = 0$, and $p > MC \Rightarrow y^* = \infty$ (infinite profit).
If MC is always decreasing, then supply is undefined, if not zero.
Figure 13.3: At $y^*$ defined by $p = MC(y^*)$, profit is not maximized. Why? Consider a reduction in output. Cost falls by MC and revenue falls by $p$, so $\pi$ actually increases. The SOC are not satisfied since $c_{yy} < 0$.
Examples:
$y = x^a$, $0 < a < 1$ (one input, DRS)
The input requirement function is $x^*(y) = y^{1/a}$, which does not depend on prices. Thus
$$c(w, y) = w x^*(y) + F = w y^{1/a} + F$$
where $F$ = fixed costs, and
$$MC(y) = \frac{w}{a}\, y^{\frac{1 - a}{a}}$$
$$AC(y) = \frac{F}{y} + w y^{\frac{1 - a}{a}}$$
The optimal output supply choice $y^*$ solves $p = MC(y)$, which implies
$$p = \frac{w}{a}\,(y^*)^{\frac{1 - a}{a}}$$
or
$$y^*(p, w) = \left(\frac{ap}{w}\right)^{\frac{a}{1 - a}}$$
Note the following:
$y^*$ is homogeneous of degree zero in $(p, w)$
$y^*$ increases with $p$, decreases with $w$
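$y = x_1^\alpha x_2^\beta$, $\alpha + \beta < 1$ (Cobb-Douglas with DRS)
Recall that
$$c(y, w_1, w_2) = k_1\, w_1^{\frac{\alpha}{\alpha + \beta}}\, w_2^{\frac{\beta}{\alpha + \beta}}\, y^{\frac{1}{\alpha + \beta}}$$
for some $k_1 > 0$. Therefore
$$MC(y) = k_2\, y^{\frac{1 - \alpha - \beta}{\alpha + \beta}}\, w_1^{\frac{\alpha}{\alpha + \beta}}\, w_2^{\frac{\beta}{\alpha + \beta}}$$
for some constant $k_2$. Setting $p = MC$ and solving for $y$ gives
$$y^* = k_3\, p^{\frac{\alpha + \beta}{1 - \alpha - \beta}}\, w_1^{-\frac{\alpha}{1 - \alpha - \beta}}\, w_2^{-\frac{\beta}{1 - \alpha - \beta}}$$
for some constant $k_3$. Or, equivalently,
$$\log y^* = \text{constant} + \frac{\alpha + \beta}{1 - \alpha - \beta}\log p - \frac{\alpha}{1 - \alpha - \beta}\log w_1 - \frac{\beta}{1 - \alpha - \beta}\log w_2$$
Again $y^*$ is homogeneous of degree zero in $(p, w)$, increasing in $p$, and decreasing in $w_1$ and $w_2$.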
As an exercise, prove that for a general cost function, the competitive supply response is homogeneous of degree zero in all prices (input and output). Hint: The cost function is homogeneous of degree one in all input prices.
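The properties of the one-input supply function $y^*(p, w) = (ap/w)^{a/(1-a)}$ can be checked with a few lines of code. The sketch below ignores the shutdown decision (it assumes fixed costs are low enough that the firm operates) and uses illustrative parameter values.

```python
def supply(p, w, a):
    """y*(p, w) = (a*p/w)**(a/(1-a)) for the technology y = x**a, 0 < a < 1."""
    return (a * p / w) ** (a / (1 - a))


def marginal_cost(y, w, a):
    """MC(y) = (w/a) * y**((1-a)/a)."""
    return (w / a) * y ** ((1 - a) / a)


p, w, a = 6.0, 2.0, 0.5
y_star = supply(p, w, a)
print(y_star, marginal_cost(y_star, w, a))   # MC at y* should equal p
print(supply(2 * p, 2 * w, a))               # doubling all prices leaves y* unchanged
```

Raising $p$ alone increases supply, and raising $w$ alone reduces it, as the text notes.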
13.2 The Law of Supply
The Law of Supply states that competitive supply functions are always upward sloping:
$$\frac{\partial y^*}{\partial p} > 0$$
Why? At the optimal level of supply, $p = MC$. But MC is increasing by the SOC, so if $p$ increases, the new optimal level of supply increases, too: we simply move along the MC schedule as in Figure 13.4.
Figure 13.4: Assuming the SOC is satisfied, an increase in $p$ is accompanied by an increase in $y^*$ since the intersection moves upward and to the right.
Formally, $y^*$ is defined as the solution to
$$p - c_y(y^*(p, w_1, w_2), w_1, w_2) = 0. \tag{13.1}$$
This FONC holds even if we move $p$ (or either of $w_1$ or $w_2$ for that matter). Therefore, differentiating both sides of (13.1) w.r.t. $p$,
$$1 - c_{yy}(y^*(p, w_1, w_2), w_1, w_2)\,\frac{\partial y^*}{\partial p} = 0$$
hence
$$\frac{\partial y^*}{\partial p} = \frac{1}{c_{yy}(y^*, w_1, w_2)}.$$
But $c_{yy}(y^*(p, w_1, w_2), w_1, w_2) > 0$ by the SOC, so $\partial y^*/\partial p > 0$!
13.3 Changes in Input Prices
What is the effect of an increase in input prices on the firm's output decisions? An increase in input prices (say $w_1$) is associated with a shift in MC. See Figure 13.5.
In the case where MC rises with $w_1$, we have $\partial y^*/\partial w_1 < 0$. Is this always the case? We shall see in the next section!
Figure 13.5: An increase in $w_1$ causes the MC curve to shift, usually upward, which causes the intersection of $p$ and MC to move inward.
14 Input Demand for a Competitive Firm
In this lecture we describe the determination of input demands for a competitive firm that sells output $y$ at price $p$. Its production function is $y = f(x_1, x_2)$. Inputs 1 and 2 have prices $w_1$ and $w_2$.
The firm's optimal choice of $(x_1, x_2)$ is determined in two steps. First, the firm constructs its cost function $c(y, w_1, w_2)$. This implicitly defines the optimal input demands $x_1$ and $x_2$ for each level of $y$, given input prices.
$$c(y, w_1, w_2) = \min_{x_1, x_2}\; w_1 x_1 + w_2 x_2 \quad \text{s.t.} \quad y = f(x_1, x_2)$$
$$= w_1 x_1^c(y, w_1, w_2) + w_2 x_2^c(y, w_1, w_2)$$
where $x_1^c(y, w_1, w_2)$ and $x_2^c(y, w_1, w_2)$ are the conditional factor demands. The word "conditional" signifies that these input demands depend on the output choice. Note that $x_1^c$ and $x_2^c$ are very much like the compensated demand functions for the consumer. In particular, setting $L = w_1 x_1 + w_2 x_2 + \lambda[y - f(x_1, x_2)]$, we have the following FONC:
$$L_1 = w_1 - \lambda f_1(x_1, x_2) = 0$$
$$L_2 = w_2 - \lambda f_2(x_1, x_2) = 0$$
$$L_\lambda = y - f(x_1, x_2) = 0$$
The ratio of the first two FONC implies that $w_1/w_2 = f_1/f_2$. Recall that $f_1$ is the marginal product of input 1. The ratio $f_1/f_2$ is called the marginal rate of technical substitution (MRTS). This is the firm's equivalent of the consumer's MRS; it gives the slope of an isoquant at $(x_1, x_2)$. So, the first order conditions for the cost-min problem are illustrated in Figure 14.1.
Figure 14.1: Illustration of FOC for cost-min problem.
Recall from Section 12.1 that
$$x_i^c(y, w_1, w_2) = \frac{\partial c(y, w_1, w_2)}{\partial w_i}, \quad i = 1, 2$$
Having determined the cost of producing a given level of output, the next step for the firm is to choose what level of output to produce. It does so by maximizing profit $\pi = py - c(y, w_1, w_2)$:
$$p - c_y = 0 \;\Rightarrow\; p = MC \tag{14.1}$$
$$-c_{yy} < 0 \;\Rightarrow\; \frac{\partial MC}{\partial y} > 0 \tag{14.2}$$
Equation (14.2) means that marginal cost must be rising. See Figure 13.1a. The optimal choice of $y$, given $(p, w_1, w_2)$, is the value $y^*$ such that
$$p = MC(y^*, w_1, w_2)$$
i.e. output is chosen so that price equals marginal cost. Now we are ready to define the firm's unconditional input choices. The firm's unconditional input demands are simply:
$$x_i(p, w_1, w_2) = x_i^c(y^*(p, w_1, w_2), w_1, w_2) \quad (i = 1, 2)$$
In other words, the unconditional input demands are the conditional demands, for the optimal choice of $y$. We can think of the problem of finding optimal input demand choices as one of solving two problems simultaneously: cost-min and $p = MC$.
Figure 14.2: The level of production plays the role of utility in the consumer choice analogy: $w_1$ rises, conditional input demand falls.
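What happens when $w_1$ rises? Since
$$x_1(p, w_1, w_2) = x_1^c(y^*(p, w_1, w_2), w_1, w_2)$$
we have
$$\frac{\partial x_1}{\partial w_1} = \frac{\partial x_1^c}{\partial w_1} + \frac{\partial x_1^c}{\partial y}\,\frac{\partial y^*}{\partial w_1}$$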
The first term is the response of optimal input demand, holding constant $y$. This is called the substitution effect. It is just like the consumer's substitution effect, which is defined as the change in demand, holding constant $u$. Instead of being constrained to move along an indifference curve, the firm is constrained to move along an isoquant as one can see in Figure 14.2.
The second term is called the scale effect. It is somewhat similar to the consumer's income effect, except the analogy can be misleading. It reflects the fact that when $w_1$ rises, the firm's MC curve shifts, so the optimal choice of $y$ shifts. See Figure 14.3.
Figure 14.3: The optimal choice of $y$ shifts due to a change in $w_1$. Assuming input 1 is non-inferior, the shift is upward.
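Recall that if input 1 is non-inferior, then MC shifts upward when $w_1$ rises. Why?
$$\frac{\partial MC}{\partial w_1} = \frac{\partial}{\partial w_1}\left(\frac{\partial c}{\partial y}\right) = \frac{\partial^2 c}{\partial y\,\partial w_1} = \frac{\partial^2 c}{\partial w_1\,\partial y} = \frac{\partial}{\partial y}\left(\frac{\partial c}{\partial w_1}\right) = \frac{\partial x_1^c}{\partial y}$$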
Thus the derivative of MC w.r.t. $w_1$ is the same quantity as the derivative of the conditional
input demand function w.r.t. y. If input 1 is non-inferior, then $\partial x_1^c/\partial y > 0$, so MC shifts upward
whenever $w_1$ rises.
In this case we have the pictures shown in Figure 14.4. When $w_1$ rises, the substitution effect and
scale effect both cause a reduction in demand for input 1.
With an inferior input, when $w_1$ rises, MC shifts downward. (E.g. when shovels rise in price the
marginal cost of holes goes down.) But the scale effect is also negative because, although the rise
in $w_1$ causes the firm to want to increase output, input 1 is inferior, so the expansion in output
reduces demand! See Figure 14.5.
There is another way to look at the problem of input demands, a so-called direct approach.
Suppose the firm simply chose $x_1$ and $x_2$ to maximize
$$\pi = pf(x_1, x_2) - w_1 x_1 - w_2 x_2$$
This is an unconstrained optimization problem, so the FONC are:
$$pf_1(x_1, x_2) - w_1 = 0 \tag{14.3}$$
$$pf_2(x_1, x_2) - w_2 = 0 \tag{14.4}$$
Note that dividing (14.3) by (14.4) returns the tangency condition $f_1/f_2 = w_1/w_2$. Also, the firm
sets $w_1/f_1 = w_2/f_2 = p$. What do these equations mean? If the firm had to increase output, it
could do so by increasing input 1 or input 2. If it used input 1, it would require $1/f_1(x_1, x_2)$ units
of that input to produce an additional unit of output. The marginal cost would be $w_1/f_1(x_1, x_2) = p$. If instead the
firm used input 2, the marginal cost would again be $w_2/f_2(x_1, x_2) = p$.
Looking back at the Lagrangian for the cost-min problem, notice that the FONC are
$$w_1 = \lambda f_1 \implies \lambda = w_1/f_1$$
and
$$w_2 = \lambda f_2 \implies \lambda = w_2/f_2$$
Remember that $\lambda$ is marginal cost. So, when the firm solves the cost-min problem and sets $p = MC = \lambda$,
it achieves the same result as if it had carried out the direct approach. Sometimes one
method is more convenient than the other, that's all.
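The equivalence of the two approaches can be checked numerically. The sketch below is only an illustration: it assumes a hypothetical Cobb-Douglas technology $f(x_1, x_2) = x_1^{A} x_2^{B}$ with $A + B < 1$ (so that marginal cost is rising), and made-up values for p, $w_1$ and $w_2$. None of these choices come from the notes.

```python
# Hypothetical technology f(x1, x2) = x1**A * x2**B with A + B < 1 (assumption).
A, B = 0.3, 0.4
p, w1, w2 = 10.0, 2.0, 1.0   # made-up output price and input prices

def conditional_demands(y):
    """Cost-minimizing inputs for output y (tangency f1/f2 = w1/w2 plus f = y)."""
    ratio = (B * w1) / (A * w2)              # x2 / x1 implied by the tangency condition
    x1 = (y / ratio ** B) ** (1.0 / (A + B))
    return x1, ratio * x1

def cost(y):
    x1, x2 = conditional_demands(y)
    return w1 * x1 + w2 * x2

def marginal_cost(y):
    h = 1e-6 * y                             # relative step for a central difference
    return (cost(y + h) - cost(y - h)) / (2.0 * h)

def optimal_output(lo=1e-6, hi=1e6):
    """Solve p = MC(y) by bisection; MC is increasing because A + B < 1."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if marginal_cost(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def direct_demands():
    """Closed-form solution of p*f1 = w1 and p*f2 = w2 for this technology."""
    y = (p ** (A + B) * (A / w1) ** A * (B / w2) ** B) ** (1.0 / (1.0 - A - B))
    return A * p * y / w1, B * p * y / w2, y

if __name__ == "__main__":
    y_star = optimal_output()
    print("two-step:", conditional_demands(y_star), "output", y_star)
    print("direct:  ", direct_demands())
```

Under these assumptions both routes should report the same input bundle up to numerical tolerance, which is the point made in the text.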
Figure 14.4: (a), (b). The substitution effect (SE) causes $x_1$ to decrease, and the scale effect does too.
Figure 14.5: (a), (b). Once again, despite $x_1$ being an inferior input, the SE causes $x_1$ to decrease, and so does the scale
effect.
15 Industry Supply
The supply curve for an industry consists of the horizontal sum of the supply curves of each
individual firm, as shown in Figure 15.1.
Notice that if firms vary with respect to their costs, at any market price some firms are profiting,
some are on the margin, and others are out of business. A good example of this is the case of oil
wells. Some wells have low variable costs and are always profitable to operate. Others are high-cost,
and are activated only when crude prices are high. We usually call the profits earned by the
infra-marginal suppliers rents. Presumably the lower costs of these firms arise from their control
over a scarce resource.
A competitive market is in equilibrium if the following conditions hold:
1. Each existing firm has p = MC and $\pi \geq 0$.
2. No remaining firm can afford to enter the market.
These ideas are applicable to the case of a single firm with multiple facilities, or plants. For
example, if a firm owns two plants, with MC schedules $MC_1(y_1)$ and $MC_2(y_2)$, then the firm
operates efficiently by viewing the plants as separate suppliers. See Figure 15.2 for an example of
this principle, called the principle of decentralization.
Figure 15.1: When prices reach $p_1$, Firm 1 enters the market; when prices then reach $p_2$, Firm 2 enters
the market (causing the discontinuity in the supply curve), and so on.
Figure 15.2: At prices below $p_1$, the firm is completely inactive; at prices between $p_1$ and $p_2$, only plant
1 is active, while at prices above $p_2$, both plants are active.
16 Monopoly I
16.1 Monopolist's Objective
A monopolist is the sole supplier in a given market. The critical feature of monopolistic behavior
is the fact that a monopolist sets the price, or quantity. Monopolies arise
(a) through exclusive control over resources, e.g. De Beers' monopoly of diamond marketing
(b) through exclusive legal rights, e.g. public utilities, drug companies with patents, etc.
Suppose the demand for output is represented by the function y = D(p). Then we can invert this
to p = p(y), where $p = D^{-1}$ is usually referred to as the inverse demand function. A monopolist's
profit is
$$\pi(w_1, w_2, y) = yp(y) - c(w_1, w_2, y)$$
The FONC for profit maximization is
$$p(y) + yp'(y) - c_y(w_1, w_2, y) = 0 \implies p(y) + yp'(y) = c_y(w_1, w_2, y)$$
The LHS represents marginal revenue $MR(y) = p(y) + yp'(y)$. If demand is downward sloping, as
usual, then $p'(y) < 0$, so $MR(y) < p$. This is the key point about a monopoly. Since a monopolist
controls the market, it cannot treat price as exogenous. Rather, it has to take into account the fact
that a rise in sales will necessarily come at the expense of a reduction in price. Note that there may
be close substitutes for a product. But as long as a firm is the sole supplier of a given product, it
has monopoly power.
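Define the elasticity of demand
$$\epsilon = \frac{\partial y}{\partial p}\cdot\frac{p}{y} = \frac{1}{p'(y)}\cdot\frac{p}{y} \implies p'(y) = \frac{1}{\epsilon}\cdot\frac{p(y)}{y}$$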
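We then have
$$MR(y) = p(y) + yp'(y) = p(y) + y\left(\frac{1}{\epsilon}\right)\left(\frac{p(y)}{y}\right) = p(y)\left(1 + \frac{1}{\epsilon}\right)$$
So, for a monopolist,
$$p(y)\left(1 + \frac{1}{\epsilon}\right) = MC$$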
As the market demand becomes closer and closer to a horizontal line, $\epsilon \to -\infty$, demand becomes
perfectly elastic, and p = MC. In other words, in the limiting case, monopoly becomes perfect
competition.
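The limiting argument can be made concrete with a few numbers. The following sketch simply rearranges the monopolist's condition $p(1 + 1/\epsilon) = MC$ to solve for price; the marginal cost of 10 and the list of elasticities are made-up values used only for illustration.

```python
def monopoly_price(mc, elasticity):
    """Price implied by p * (1 + 1/elasticity) = MC, valid for elasticity < -1."""
    if elasticity >= -1:
        raise ValueError("MR = MC with positive MC requires elasticity < -1")
    return mc / (1.0 + 1.0 / elasticity)

if __name__ == "__main__":
    mc = 10.0  # hypothetical constant marginal cost
    for eps in (-1.5, -2.0, -5.0, -50.0, -5000.0):
        print(f"elasticity = {eps:8.1f}  ->  price = {monopoly_price(mc, eps):8.4f}")
```

As the elasticity grows in absolute value, the printed price approaches marginal cost, in line with the perfectly elastic limit described above.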
The picture associated with monopoly is shown in Figure 16.1.
Observations:
• A monopolist always sets MR = MC. Since $MR = p(1 + 1/\epsilon)$ and $\epsilon < 0$, MR < p. If
$|\epsilon| < 1$, then $1/\epsilon < -1$ and MR is negative. It follows that a monopolist never operates in a
market in which demand is inelastic. Intuitively, if demand were inelastic, one could increase
revenue by raising the price! This is a very powerful result. It says that some markets cannot
be considered monopolies, namely those with measured elasticities of demand less than 1 in
absolute value.
Figure 16.1: The monopolist selects $y^*$ such that MC = MR.
• If the monopolist's MC schedule were the MC schedule of a price taker, or of a set of price takers
(i.e. a competitive industry), then equilibrium would occur at p = MC. This would entail
higher output and lower price, but lower profit to the industry as a whole. See Figure 16.2.
Figure 16.2: The area of the region bounded by $p = p_M$, $y = D(p_M)$, and MC is greater than the area
of the region bounded by $p = p_C$, $y = D(p_C)$, and MC.
• A monopolist does not have a supply schedule per se. First, the monopolist examines the
demand function. Then she establishes the price. There is no schedule of price/quantity
combinations.
• The SOC for profit maximization is $\frac{\partial}{\partial y}(MR - MC) < 0$, or slope of MR < slope of MC. Even
if MC is downward sloping, there may still exist an equilibrium for the monopolist.
16.2 Comparative Statics
See Figure 16.3. Note the following:
Figure 16.3: If MC increases, output falls, assuming MR is negatively sloped, which is usually the case.
• If MC increases (say, because an input becomes more expensive), output will fall, assuming
MR is negatively sloped.
• Factors that shift MR will cause output to increase or decrease along the MC schedule. A
constant elasticity of demand function gives
$$y = A\,p^{\epsilon}\,p_z^{\eta}\,I^{e}$$
where z is another good, I is income, p = price of y, and $p_z$ = price of z. Inverse demand is
given by
$$p = A^{-1/\epsilon}\,y^{1/\epsilon}\,p_z^{-\eta/\epsilon}\,I^{-e/\epsilon}$$
and
$$MR = p\,(1 + 1/\epsilon) = (1 + 1/\epsilon)\,A^{-1/\epsilon}\,y^{1/\epsilon}\,p_z^{-\eta/\epsilon}\,I^{-e/\epsilon}$$
Thus increases in I or $p_z$ shift MR.
Examples:
• Linear: $y = a - bp$, and $c(y) = \alpha + \beta y$. To find inverse demand, note that $p = a/b - y/b$. Let
$a' = a/b$ and $b' = 1/b$. Inverse demand may be written $p = a' - b'y$. Revenue is given by
$yp(y) = a'y - b'y^2 \implies MR(y) = a' - 2b'y$. See Figure 16.4. Equating MC and MR, we
obtain $a' - 2b'y = \beta$, or
$$y^* = \frac{a' - \beta}{2b'}$$
and
$$p^* = a' - b'y^* = \frac{a' + \beta}{2}$$
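• Exponential: $y = ap^{\epsilon}$, $\epsilon < -1$. Inverse demand is given by $p = a'y^{1/\epsilon}$, where $a' = a^{-1/\epsilon}$, and
revenue equals $yp(y) = a'y^{1+1/\epsilon}$, hence we have
$$MR = (1 + 1/\epsilon)\,a'y^{1/\epsilon}$$
Suppose cost also is exponential, i.e. $c(y) = y^{\gamma}$, $\gamma > 0$. This implies $MC = \gamma y^{\gamma - 1}$. Profit
is thus $a'y^{1+1/\epsilon} - y^{\gamma}$, and the FONC is
$$(1 + 1/\epsilon)\,a'y^{1/\epsilon} = \gamma y^{\gamma - 1}$$
The SOC is
$$-\frac{1 + 1/\epsilon}{\epsilon}\,a'y^{1/\epsilon - 1} + \gamma(\gamma - 1)\,y^{\gamma - 2} > 0$$
which is automatically satisfied whenever $\gamma > 1$. Solving the FONC,
$$y = \left[\left(1 + \frac{1}{\epsilon}\right)\frac{a'}{\gamma}\right]^{\frac{\epsilon}{\epsilon(\gamma - 1) - 1}}$$
Note that the optimal choice depends on the parameters of the demand and cost functions.
A change in elasticity of demand causes a shift in the optimal choice of output.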
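The two worked examples can be evaluated numerically. The sketch below is illustrative only: the function names, the parameter names `beta` (constant marginal cost in the linear case) and `gamma` (cost exponent in the constant-elasticity case), and all numerical values are assumptions made for the example, not taken from the notes.

```python
def linear_monopoly(a, b, beta):
    """Demand y = a - b*p with constant marginal cost beta. Returns (y_star, p_star)."""
    a_inv, b_inv = a / b, 1.0 / b            # inverse demand p = a_inv - b_inv * y
    y_star = (a_inv - beta) / (2.0 * b_inv)  # from MR = a_inv - 2*b_inv*y = beta
    p_star = a_inv - b_inv * y_star
    return y_star, p_star

def power_monopoly(a, eps, gamma):
    """Demand y = a * p**eps (eps < -1) with cost c(y) = y**gamma (gamma > 1)."""
    a_inv = a ** (-1.0 / eps)                # inverse demand p = a_inv * y**(1/eps)
    base = (1.0 + 1.0 / eps) * a_inv / gamma
    y_star = base ** (eps / (eps * (gamma - 1.0) - 1.0))
    return y_star, a_inv * y_star ** (1.0 / eps)

def foc_residual(a, eps, gamma, y):
    """MR - MC at output y in the constant-elasticity case; zero at the optimum."""
    a_inv = a ** (-1.0 / eps)
    return (1.0 + 1.0 / eps) * a_inv * y ** (1.0 / eps) - gamma * y ** (gamma - 1.0)

if __name__ == "__main__":
    print(linear_monopoly(a=100.0, b=2.0, beta=10.0))
    y, price = power_monopoly(a=100.0, eps=-2.0, gamma=2.0)
    print(y, price, foc_residual(100.0, -2.0, 2.0, y))
```

Evaluating `foc_residual` at the closed-form output is a quick way to confirm that the formula really solves MR = MC for a given parameter set.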
Figure 16.4: Marginal revenue in the case of linear demand.
16.3 Monopoly in Two or More Markets
Suppose a monopolist has access to two markets.
Market 1: $p_1 = p_1(y_1)$
Market 2: $p_2 = p_2(y_2)$
If trade is restricted between the two markets, then $p_1$ and $p_2$ can differ. The firm's profits are
$$\pi = p_1 y_1 + p_2 y_2 - c(y_1 + y_2)$$
The FONC are
$$p_1 + y_1\frac{\partial p_1}{\partial y_1} - c'(y_1 + y_2) = 0, \text{ or } MR_1 = MC$$
and
$$p_2 + y_2\frac{\partial p_2}{\partial y_2} - c'(y_1 + y_2) = 0, \text{ or } MR_2 = MC$$
Since $MR_1 = p_1(1 + 1/\epsilon_1)$ and $MR_2 = p_2(1 + 1/\epsilon_2)$, and in light of the FONC, our model predicts
that
$$p_1(1 + 1/\epsilon_1) = p_2(1 + 1/\epsilon_2) \implies \frac{p_1}{p_2} = \frac{1 + 1/\epsilon_2}{1 + 1/\epsilon_1} = \frac{\epsilon_1}{\epsilon_2}\left(\frac{1 + \epsilon_2}{1 + \epsilon_1}\right)$$
For example, if $\epsilon_1 = -1.5$ and $\epsilon_2 = -2.5$, then $p_1/p_2 = 1.8$. The monopolist charges more in the
more inelastic market. This is known as price discrimination.
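The price ratio is easy to evaluate for other elasticities. A minimal sketch, assuming constant elasticities in each market (the function name is made up for this example):

```python
def price_ratio(eps1, eps2):
    """p1 / p2 implied by MR1 = MR2 = MC with constant elasticities eps1, eps2 (both < -1)."""
    if eps1 >= -1 or eps2 >= -1:
        raise ValueError("both elasticities must be below -1 for MR to be positive")
    return (1.0 + 1.0 / eps2) / (1.0 + 1.0 / eps1)

if __name__ == "__main__":
    # Market 1 is the less elastic market, so the ratio exceeds one (about 1.8 here).
    print(price_ratio(-1.5, -2.5))
```

Swapping the two arguments gives the reciprocal, and equal elasticities give a ratio of one, so there is no price discrimination without a difference in elasticities.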
17 Monopoly II
We have shown that a monopolist prefers to distinguish between markets and charge more to
customers with more inelastic demand. This phenomenon is called price discrimination. Sellers
have a strong incentive to attempt to separate customers according to their demand elasticities,
and charge discriminatory prices. Consumers, on the other hand, have a strong incentive to imitate
high-elasticity consumers. There are many devices to separate consumers according to elasticity:
• Advance purchase versus regular coach fares on airlines. Here, the airlines discriminate against
customers who book at the last minute (typically business travelers) and charge lower prices
to consumers who are willing to shop around.
• Single tokens versus monthly passes on public transit. Presumably, commuters have more
elastic demand for public transit than out-of-town or occasional passengers.
• Discount coupons. Here, retailers are willing to charge lower prices to consumers who are
better informed while continuing to charge high prices to consumers who can't be bothered
with coupons (and therefore reveal themselves as having inelastic demands).
• After-season sales, special Monday-and-Tuesday-only sales. Again, retailers are attempting
to separate high-elasticity consumers from those who want only up-to-date items at the peak
of their popularity.
In each case, the key to price discrimination is to impose a cost on the low-price consumers (those
with more elastic demands), in order to prevent high-price consumers from masquerading as
low-price consumers. The cost must be too high for low-elasticity consumers, yet not high enough to
discourage others from buying altogether.
As an example, suppose that, across a population, individual demand elasticities are negatively
correlated with wages. Those with the highest wages have the most inelastic demand; those with
the lowest wages have the most elastic demand. A firm can use a queue, or line-up, as follows:
charge a high price with no waiting time, and a low price to those who are willing to line up in a
queue for a while (e.g. the price difference between buying a ticket at a box office versus buying over
the phone). For a consumer with wage w, the full price is given by
$$p^* = \begin{cases} p & \text{if she decides not to wait} \\ p + wt - d & \text{if she waits, where } d = \text{price discount and } t = \text{waiting time} \end{cases}$$
For this individual, if $wt > d$, she bypasses the line and pays p, whereas if $wt < d$, she waits in
line and pays $p - d$. The firm has successfully charged two prices!
Another way to implement price
discrimination is by charging less to those who buy more. Suppose, for example, that there are two
kinds of buyers: (1) low-volume buyers with inelastic demand, and (2) high-volume buyers with
elastic demand. See Figure 17.1. The monopolist can choose $y_0$ between $y_1^*$ and $y_2^*$ and offer a
two-tiered price system: $p_1$/unit for those who buy less than $y_0$, and $p_2$/unit for those who buy
at least $y_0$. Note that we must have $p_1 y_1^* < p_2 y_2^*$, or else the low-volume customers would buy $y_0$
units and discard what they don't need. The ultimate price discrimination strategy would involve
charging a separate price for each unit sold, as in Figure 17.2 (for the first unit sold, charge $p_1$, for
the 20th unit, $p_{20}$, etc.). Notice that in this case the MR of the next unit sold is equal to its price,
since the seller doesn't have to lower prices on the infra-marginal (previously sold) units to sell
an additional unit, which means the MR curve is identical to the demand function. [9] Thus under
perfect price discrimination:
• quantity is equal to its level under perfect competition
• monopolist revenue = area underneath demand curve
Figure 17.1: On the left, low-volume buyers with inelastic demand, and on the right, high-volume buyers
with elastic demand. The monopolist can price discriminate in this case.
Figure 17.2: Ideally, the monopolist would charge a separate price for each unit sold.
Relative to a perfect price discrimination scheme, consumers benefit when all consumers pay the
same price. The savings to all consumers is the shaded area underneath the demand curve, above
the price line in Figure 17.3. This area is sometimes called consumer's surplus (CS). By analogy,
the area over the MC curve and up to the price line is called producer's surplus (PS). We have noted
previously that this area equals revenue less total variable cost, so $PS = \pi + F$. Also, we saw that
in a competitive industry, the supply schedule is simply the combined MC schedule of the firms that
comprise the industry. The area between the supply and demand curves (or MC and demand)
represents the total surplus CS + PS pictured in Figure 17.4. This is consistent because CS and
PS both are measured in dollars. Applied economists often evaluate the effect of a government
intervention in a given market by computing CS + PS + GC, where GC denotes the cost to the
government. The appeal of this exercise is obvious: it assigns a dollar value to the inefficiency that
arises due to monopolization or the imposition of a tax/subsidy. Nonetheless, there is a problem.
Recall that if y = D(p) is the demand function, D indicates how much is purchased when y costs
p/unit. On the other hand, D says nothing regarding demand when the price of the next unit is
p but prices for all previous units are higher. In general, having paid more for the inframarginal
units, the consumers who purchased them have less income with which to purchase additional units.
Higher prices on the inframarginal units have an income effect that is not captured by the ordinary
demand function. In fact, the only case in which CS and PS analysis is completely legitimate is
the one in which demand does not depend on income (the slope of the indifference curve through
$x^1 = (x_1, x_2^1)$, at $x^1$, equals the slope of the indifference curve through $x^2 = (x_1, x_2^2)$, at $x^2$), or
each consumer buys at most one unit of the commodity (so that higher prices for the first, and
only, unit purchased don't lower subsequent demands).
[9] Actually, as you shall see if you continue reading, this is not quite true. The demand function represents the
number of units demanded when all units sell for a given price.
Figure 17.3: The shaded region is sometimes called consumer's surplus.
Figure 17.4: The dark region is referred to as producer's surplus.
In spite of this problem, CS and PS analysis is a good starting point for evaluating the merits of a
market intervention. For example, suppose that a market is in equilibrium at $p = p_0$, $x = x_0$, when
a per-unit tax of t is imposed as in Figure 17.5. Demand falls to $x_1$, and the amount received by
suppliers falls to $p_1$. Tax revenue is $tx_1$. The combined loss in CS and PS, however, exceeds the tax
revenue by an amount equal to the area of the shaded triangle. This excess loss is referred to as
the deadweight loss due to the tax. It provides a rough estimate of the inefficiency brought on by
the tax.
Figure 17.5: The tax scenario pictured is inefficient because the shaded triangle can be thought of as lost,
since it is neither revenue nor savings to any of the parties involved in this market.
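The deadweight-loss accounting can be illustrated with straight-line curves. The sketch below assumes a hypothetical linear inverse demand $p = A - Bx$ and linear inverse supply $p = C + Dx$; these functional forms and all parameter values are illustrative assumptions, not part of the notes.

```python
def tax_incidence(A, B, C, D, t):
    """Linear inverse demand p = A - B*x, inverse supply p = C + D*x, per-unit tax t."""
    x0 = (A - C) / (B + D)            # pre-tax quantity, where demand meets supply
    p0 = A - B * x0                   # pre-tax price
    x1 = (A - C - t) / (B + D)        # post-tax quantity, where demand = supply + t
    p_buyer = A - B * x1              # price paid by consumers
    p_seller = C + D * x1             # price received by suppliers (p_buyer - t)
    loss_cs = (p_buyer - p0) * (x0 + x1) / 2.0    # trapezoid to the left of demand
    loss_ps = (p0 - p_seller) * (x0 + x1) / 2.0   # trapezoid to the left of supply
    revenue = t * x1
    return {
        "x0": x0, "x1": x1, "p_buyer": p_buyer, "p_seller": p_seller,
        "loss_cs": loss_cs, "loss_ps": loss_ps, "revenue": revenue,
        "deadweight": loss_cs + loss_ps - revenue,
    }

if __name__ == "__main__":
    for key, value in tax_incidence(A=10.0, B=1.0, C=2.0, D=1.0, t=2.0).items():
        print(f"{key:>10}: {value:.3f}")
```

With linear curves the deadweight loss reduces to the triangle area $t(x_0 - x_1)/2$, which gives a simple check on the output.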
Exercises:
1. Calculate deadweight loss in terms of elasticities of supply and demand.
2. Prove that CS +PS is a maximum when D intersects MC.
3. Calculate CS + PS when a competitive industry is monopolized.
18 Consumer's Surplus
In Econ 1 you probably were introduced to the concept of consumer's surplus (CS). Consider a
consumer who is choosing between two goods, x and y. Denote by $x(p_x, p_y, I)$ the consumer's
demand for good x, given prices $p_x$ and $p_y$, and income I. Now suppose the price of good x rises
from $p_x^0$ to $p_x^1$. The change in consumer's surplus is the shaded region in Figure 18.1, which can be
written as
$$\Delta CS = \int_{p_x^0}^{p_x^1} x(p_x, p_y, I)\,dp_x$$
As we noted previously, there is a problem with CS: although the vertical height of the inverse
demand function appears to be the most you would be willing to pay for each additional unit,
if someone actually charged you different prices for each unit, your demand would not be given
by the conventional demand curve (since that is derived under the assumption that you pay the
same price for every unit you purchase). There is, however, a measure of welfare that does make
sense; in fact, there are two. Let $u^0$ represent utility at $(p_x^0, p_y, I)$, and let $u^1$ represent utility at
$(p_x^1, p_y, I)$. Note that $u^0 > u^1$ since a rise in prices makes a consumer worse off.
Figure 18.1: The shaded region represents the change in consumer's surplus due to an increase in $p_x$.
Also note that $I = e(p_x^0, p_y, u^0)$. In other words, I is the minimum amount of money needed to
achieve $u^0$ at prices $p_x^0$ and $p_y$. This follows from the fact that the consumer wasn't wasting money
initially.
Likewise, $I = e(p_x^1, p_y, u^1)$. (Make sure you understand why this must be true.)
Consider the quantity
$$EV = e(p_x^1, p_y, u^1) - e(p_x^0, p_y, u^1) = I - e(p_x^0, p_y, u^1)$$
This is the amount of money one would have to take away from our consumer initially, leaving
prices alone, so that he would be indifferent regarding a rise in prices. This is called the equivalent
variation. It can be thought of as the income equivalent of a rise in prices or, more specifically, a
natural means of measuring the effect on welfare of a rise in prices.
Alternatively, consider the quantity
$$CV = e(p_x^1, p_y, u^0) - e(p_x^0, p_y, u^0) = e(p_x^1, p_y, u^0) - I$$
This is the amount of money one would have to provide our consumer in order for him to be as
well off under the new prices as he was initially. This is called the compensating variation. It also
appears to be a plausible measure of the effect on welfare of a rise in prices.
Now we shall use Shephard's Lemma to connect these two quantities to the area underneath
compensated demand curves. Specifically, start from the fact that
$$\frac{\partial}{\partial p_x} e(p_x, p_y, u^0) = x^c(p_x, p_y, u^0)$$
By the Fundamental Theorem of Calculus,
$$e(p_x^1, p_y, u^0) = e(p_x^0, p_y, u^0) + \int_{p_x^0}^{p_x^1} \frac{\partial}{\partial p_x} e(p_x, p_y, u^0)\,dp_x$$
hence we have
$$CV = \int_{p_x^0}^{p_x^1} x^c(p_x, p_y, u^0)\,dp_x$$
which is the area underneath the compensated demand curve from $p_x^0$ to $p_x^1$. See Figure 18.2.
Note that $x^c(p_x^0, p_y, u^0) = x(p_x^0, p_y, I)$, so the regular demand curve, with I, and the compensated
demand curve, with $u^0$, intersect at $(x^0, p_x^0)$. But the regular demand curve is flatter. Why? Recall
that by Slutsky:
$$\frac{\partial x}{\partial p_x} = \frac{\partial x^c}{\partial p_x} - x\frac{\partial x}{\partial I}$$
If x is a normal good, then $\partial x/\partial I > 0$, and a rise in prices causes the regular demand to decrease
faster than the compensated demand because of the income effect, hence it appears flatter. All of
this implies $CV > \Delta CS$ for a normal good.
Figure 18.2: The area of the shaded region is the Compensating Variation.
For the EV,
$$e(p_x^1, p_y, u^1) = e(p_x^0, p_y, u^1) + \int_{p_x^0}^{p_x^1} \frac{\partial}{\partial p_x} e(p_x, p_y, u^1)\,dp_x = e(p_x^0, p_y, u^1) + \int_{p_x^0}^{p_x^1} x^c(p_x, p_y, u^1)\,dp_x$$
so
$$EV = \int_{p_x^0}^{p_x^1} x^c(p_x, p_y, u^1)\,dp_x$$
which is the area underneath the compensated demand curve between $p_x^0$ and $p_x^1$, with $u = u^1$,
which intersects the regular demand curve at $(x^1, p_x^1)$. So we have Figure 18.3 for a normal good x.
We have shown that $CV > \Delta CS > EV$, so you can think of $\Delta CS$ as approximating either one of
these.
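The ranking of the three welfare measures can be checked in a concrete case. The sketch below assumes Cobb-Douglas preferences $u = x^{\alpha} y^{1-\alpha}$, for which the expenditure function and ordinary demand have well-known closed forms; the utility function, the value of `ALPHA`, and the prices and income used are illustrative assumptions rather than anything specified in the notes.

```python
import math

ALPHA = 0.5  # hypothetical budget share of good x under u = x**ALPHA * y**(1 - ALPHA)

def price_index(px, py):
    """Cost of one unit of utility at prices (px, py) for Cobb-Douglas preferences."""
    return (px / ALPHA) ** ALPHA * (py / (1 - ALPHA)) ** (1 - ALPHA)

def expenditure(px, py, u):
    """Minimum spending needed to reach utility u at prices (px, py)."""
    return u * price_index(px, py)

def indirect_utility(px, py, income):
    """Maximum utility attainable with the given income."""
    return income / price_index(px, py)

def welfare_measures(px0, px1, py, income):
    """Return (CV, change in CS, EV) for a rise in the price of x from px0 to px1."""
    u0 = indirect_utility(px0, py, income)
    u1 = indirect_utility(px1, py, income)
    cv = expenditure(px1, py, u0) - income
    ev = income - expenditure(px0, py, u1)
    # Ordinary demand is x = ALPHA * income / px, so its integral over px is a log.
    delta_cs = ALPHA * income * math.log(px1 / px0)
    return cv, delta_cs, ev

if __name__ == "__main__":
    cv, delta_cs, ev = welfare_measures(px0=1.0, px1=2.0, py=1.0, income=100.0)
    print(f"CV = {cv:.2f}, change in CS = {delta_cs:.2f}, EV = {ev:.2f}")
```

Because x is a normal good under these preferences, the three numbers should come out in the order CV, then the change in CS, then EV.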
Figure 18.3: The area of the shaded region is the Equivalent Variation.
19 Duopoly
The simplest market to analyze, in between the two extremes of perfect competition and monopoly,
is one with two suppliers. In particular, suppose there are two suppliers of a homogeneous good,
one that cannot be differentiated by consumers. Let $y_1$ denote the amount supplied by Firm 1,
and $y_2$ the amount supplied by Firm 2, so that the inverse demand function is given by $p(y_1 + y_2)$.
Note that inverse demand is a function of the sum $y_1 + y_2$, reflecting the assumption that the two
outputs are perfect substitutes. We shall assume the following, for simplicity:
• $p(y_1 + y_2) = a - b(y_1 + y_2)$, i.e. linear demand
• MC = c/unit, a constant
The problem facing these firms is simple:
Firm 1: choose $y_1$ so as to maximize $\pi_1(y_1, y_2) = y_1 p(y_1 + y_2) - cy_1$
Firm 2: choose $y_2$ so as to maximize $\pi_2(y_1, y_2) = y_2 p(y_1 + y_2) - cy_2$
Note that Firm 1's objective function depends on Firm 2's choice, and vice versa.
19.1 Monopolization
What would a monopolist do? Suppose a monopolist owned both firms. Then she would choose $y_1$
and $y_2$ as follows:
$$\max_{y_1, y_2}\; y_1 p(y_1 + y_2) + y_2 p(y_1 + y_2) - cy_1 - cy_2 = \max_{y_1 + y_2}\; (y_1 + y_2)p(y_1 + y_2) - c(y_1 + y_2)$$
$$= \max_{y}\; yp(y) - cy$$
$$= \max_{y}\; (a - c)y - by^2$$
The FONC is $(a - c) - 2by = 0$, or
$$y = y_M = \frac{a - c}{2b}$$
where M signifies monopoly. This implies that
$$p = p_M = \frac{a + c}{2}$$
Now suppose $p = p_M$ but the firms have separate ownership groups, each producing $y_M/2$. (This
would constitute a perfect cartel.) Is this an equilibrium? Probably not. For Firm 1,
$$MR_1 = \frac{\partial}{\partial y_1}[y_1 p(y_1 + y_2)]$$
If Firm 1 could increase output with the assurance that Firm 2 would not follow suit, then
$$MR_1 = p(y_1 + y_2) + y_1\frac{\partial p}{\partial y_1} = p_M - \frac{b}{2}y_M = \frac{a + c}{2} - \frac{a - c}{4} = \frac{3c}{4} + \frac{a}{4} > c = MC$$
Thus under a joint monopoly (both firms producing $y_M/2$), each firm has an incentive to cheat.
For the industry as a whole,
$$(y_1 + y_2)p(y_1 + y_2) = a(y_1 + y_2) - b(y_1 + y_2)^2 \implies MR = a - 2b(y_1 + y_2)$$
For an individual firm, however,
$$y_1 p(y_1 + y_2) = y_1[a - b(y_1 + y_2)] \implies MR_1 = a - 2by_1 - by_2 > MR$$
This is a fundamental problem with a cartel: each firm has an incentive to cheat and produce more
if it has any reason to believe the other firm will hold its production constant. The reason is that
when Firm 1 increases output, it considers only how this affects the price of the units it sells; Firm
1 ignores the fact that prices fall for Firm 2 as well. A monopolist, by contrast, takes account of the full
effect of a price change on all units sold.
19.2 Duopoly Equilibrium
How does the duopoly market equilibrate? The answer depends on how much each firm believes the
other will react to a change in the level of output. The simplest assumption is the one we made
above, that Firm 1 does not believe Firm 2 will adjust its output and vice versa. This assumption
was suggested by Cournot, a 19th century French economist. Let's consider Firm 1's optimal choice
in this case. Fix $y_2$. Then Firm 1's objective is
$$\max_{y_1}\; y_1 p(y_1 + y_2) - cy_1 = \max_{y_1}\; y_1[a - b(y_1 + y_2)] - cy_1$$
The FONC is
$$a - by_2 - 2by_1 - c = 0$$
The SOC is not a concern. (Why?) This leads to
$$y_1 = y_1^*(y_2) = \frac{a - c - by_2}{2b}$$
The function $y_1^*$ is called Firm 1's reaction function. It represents the optimal choice by Firm 1, as
a function of Firm 2's level of output, under the Cournot assumption that Firm 2 will not respond
further.
Observations:
• If $y_2 = 0$, then Firm 1 acts as a monopolist: $y_1^*(0) = (a - c)/2b = y_M$.
• If $y_2 \geq (a - c)/b = 2y_M$, then $y_1^*(y_2) = 0$, that is, Firm 1 is driven out of the market.
• The slope of the reaction function is $-1/2$. Every two additional units produced by Firm 2
cause Firm 1 to reduce output by one unit.
Figure 19.1: Firm 1's reaction function, assuming linear demand.
By the same token, there is a reaction function for Firm 2, taking Firm 1's output as given.
Following the same procedure as above,
$$y_2^*(y_1) = \frac{a - c}{2b} - \frac{y_1}{2}$$
If Firm 1 decides upon its output, given Firm 2's output, and Firm 2 does the same, then where
does this process end? Presumably, it ends when Firm 1's choice, taking Firm 2's output as given,
is such that given this level of output, Firm 2 produces the same level of output as Firm 1 thought
it would. Formally,
$$y_1 = y_1^*(y_2^*(y_1))$$
If $y_1$ is an equilibrium choice, then it has the property that when Firm 1 chooses $y_1$, Firm 2
chooses $y_2^*(y_1)$, and the optimal response by Firm 1 is $y_1^*(y_2^*(y_1))$, which leads us back to $y_1$. In
mathematical terms, $y_1$ is called a fixed point of the composition of functions $y_1^* \circ y_2^*$.
Fortunately for us, there is a convenient way of visualizing a Cournot equilibrium. We simply plot
the reaction functions (remembering which is which!) as in Figure 19.2. Equilibrium occurs when
$$y_1 = y_1^*(y_2) = y_1^*(y_2^*(y_1)) = \frac{a - c}{2b} - \frac{1}{2}\left(\frac{a - c}{2b} - \frac{y_1}{2}\right)$$
Solving,
$$y_1 = y_2 = \frac{2}{3}y_M$$
and therefore
$$p = \frac{a + 2c}{3}$$
The details are left to the reader.
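One way to see the fixed point at work is to iterate the two reaction functions until they stop changing. This is only a numerical sketch of the linear-demand case; the parameter values, function names, tolerance, and starting point are arbitrary choices made for illustration.

```python
def best_response(y_other, a, b, c):
    """Cournot reaction function under linear demand; output cannot be negative."""
    return max(0.0, (a - c - b * y_other) / (2.0 * b))

def cournot_fixed_point(a, b, c, y1=0.0, tol=1e-12, max_iter=1000):
    """Iterate y1 -> y1*(y2*(y1)) until the update is smaller than tol."""
    for _ in range(max_iter):
        y2 = best_response(y1, a, b, c)
        y1_next = best_response(y2, a, b, c)
        if abs(y1_next - y1) < tol:
            return y1_next, best_response(y1_next, a, b, c)
        y1 = y1_next
    raise RuntimeError("best-response iteration did not converge")

if __name__ == "__main__":
    a, b, c = 10.0, 1.0, 1.0                 # made-up demand and cost parameters
    y1, y2 = cournot_fixed_point(a, b, c)
    y_m = (a - c) / (2.0 * b)
    print("y1, y2:", y1, y2, " (2/3) y_M:", 2.0 * y_m / 3.0)
    print("price:", a - b * (y1 + y2), " (a + 2c)/3:", (a + 2.0 * c) / 3.0)
```

Each round trip through both reaction functions shrinks the distance to the equilibrium by a factor of four, since each reaction function has slope $-1/2$, so the iteration settles quickly from any starting output.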
Figure 19.2: Cournot equilibrium supply $y_0 = \frac{2}{3}y_M$.
19.3 Price Setting vs. Quantity Setting
The previous section was an analysis of the outcome when two duopolists take each other's output
as given. A similar analysis can be carried out when duopolists set prices. For example, consider
end-to-end railroads that wish to set the rates for freight. Railroad 1 hauls from point A to point B,
at $p_1$/ton, and Railroad 2 hauls from point B to point C, at $p_2$/ton. Demand for transport services
from A to C depends on $p_1 + p_2$. Assume, for the sake of simplicity, that demand is linear:
$$x = a - b(p_1 + p_2)$$
Note that this means the two segments are perfect complements. (They are consumed together, so
demand is a function of $p_1 + p_2$ only.) Let's assume, too, that cost per ton for Railroad 1 is $c_1$, and
cost per ton for Railroad 2 is $c_2$. Suppose a single firm owned both railroads. Then it would choose
a total price p so as to maximize
$$\pi(p) = (a - bp)(p - c_1 - c_2)$$
The FONC implies
$$p = \frac{a + b(c_1 + c_2)}{2b} = p_M$$
where $p_M$ denotes the monopolist's price.
Now suppose the two railroads act as duopolists, each taking the other's price as given. For the
first railroad,
$$\pi_1(p_1, p_2) = [a - b(p_1 + p_2)](p_1 - c_1)$$
The FONC implies
$$p_1^*(p_2) = \frac{a - bp_2 + bc_1}{2b}$$
which looks a lot like the reaction function in the quantity-setting scenario. In particular, the slope
is again $-1/2$. By symmetry, Railroad 2's reaction function is
$$p_2^*(p_1) = \frac{a - bp_1 + bc_2}{2b}$$
In equilibrium, $p_1 = p_1^*(p_2^*(p_1))$. Solving,
$$p_1^* = \frac{a}{3b} + \frac{2c_1 - c_2}{3}$$
and
$$p_2^* = \frac{a}{3b} + \frac{2c_2 - c_1}{3}$$
For price-setting duopolists who sell perfectly complementary products in a market with linear
demand,
$$p_1^* + p_2^* = \frac{2a}{3b} + \frac{c_1 + c_2}{3}$$
Note that
$$p_1^* + p_2^* - p_M = \frac{a - b(c_1 + c_2)}{6b}$$
If the railroads charged a combined $c_1 + c_2$, demand would be $x = a - b(c_1 + c_2) > 0$. Thus the
duopolists actually charge an even higher price than a monopolist. This special result is due to the
perfect complementarity.
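The comparison between the duopoly and monopoly prices can be verified with numbers. In the sketch below, the parameter values and function names are made up, and the equilibrium is found by iterating the two price reaction functions rather than by using the closed-form solution.

```python
def monopoly_total_price(a, b, c1, c2):
    """Total price chosen by a single owner of both railroads."""
    return (a + b * (c1 + c2)) / (2.0 * b)

def duopoly_prices(a, b, c1, c2, iterations=200):
    """Iterate the two price reaction functions, starting from marginal-cost pricing."""
    p1, p2 = c1, c2
    for _ in range(iterations):
        p1 = (a - b * p2 + b * c1) / (2.0 * b)
        p2 = (a - b * p1 + b * c2) / (2.0 * b)
    return p1, p2

if __name__ == "__main__":
    a, b, c1, c2 = 12.0, 1.0, 1.0, 2.0       # made-up demand and cost parameters
    p1, p2 = duopoly_prices(a, b, c1, c2)
    p_m = monopoly_total_price(a, b, c1, c2)
    print("duopoly prices:", p1, p2, "sum:", p1 + p2)
    print("monopoly price:", p_m)
    print("gap:", p1 + p2 - p_m, "formula:", (a - b * (c1 + c2)) / (6.0 * b))
```

With these numbers the combined duopoly price exceeds the monopoly price by the amount given by the gap formula, which illustrates how separate ownership of complementary segments raises the total price.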
20 Symmetric Cournot Equilibria
20.1 n-Firm Symmetric Cournot Equilibria
Duopoly is simple when the two firms are identical and the equilibrium is symmetric, with each firm
producing an equal share of the industry supply. Let us continue to assume linear demand. Recall
that for Firm 1,
$$\pi_1 = y_1 p(y_1 + y_2) - cy_1 = ay_1 - by_1(y_1 + y_2) - cy_1 = (a - by_2 - c)y_1 - by_1^2$$
The FONC is
$$a - by_2 - c = 2by_1$$
Let $y_1 = y_2 = y_0$, since we are in a symmetric equilibrium. Now solving the FONC,
$$y_0 = \frac{a - c}{3b} = \frac{2}{3}y_M$$
The same appeal to symmetry enables us to solve for equilibrium output in a market with n
suppliers. In this case
$$\pi_1 = y_1\,p\left(\sum_{i=1}^{n} y_i\right) - cy_1 = ay_1 - by_1\sum_{i=1}^{n} y_i - cy_1$$
The FONC is
$$a - b\sum_{i=2}^{n} y_i - 2by_1 - c = 0$$
As before, $y_i = y_0$ for all i, so
$$y_0 = \frac{a - c}{b(n + 1)}$$
and
$$p = p_0 = a - b(ny_0) = \left(\frac{1}{n + 1}\right)a + \left(\frac{n}{n + 1}\right)c$$
As the number of firms increases (relative to the size of the market), the symmetric Cournot
equilibrium has each firm supplying less and less, and price converging to the competitive price c.
As a practical matter, the presence of fixed costs often prevents us from having a large number of
firms in a given industry. With fixed costs, there is a social cost to more firms (namely, that total
fixed costs associated with the industry rise) as well as a benefit due to less monopolistic behavior.
In our example, since costs are constant, there is no inefficiency as output per firm falls.
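The convergence of price toward marginal cost can be tabulated directly from the symmetric n-firm formulas. A short sketch with made-up parameter values and a hypothetical function name:

```python
def symmetric_cournot(a, b, c, n):
    """Per-firm output and market price with n identical firms and linear demand."""
    y0 = (a - c) / (b * (n + 1))
    p0 = a - b * n * y0
    return y0, p0

if __name__ == "__main__":
    a, b, c = 10.0, 1.0, 1.0                 # made-up demand and cost parameters
    for n in (1, 2, 5, 50, 1000):
        y0, p0 = symmetric_cournot(a, b, c, n)
        print(f"n = {n:5d}  per-firm output = {y0:8.4f}  price = {p0:8.4f}")
```

Setting n = 1 reproduces the monopoly outcome and n = 2 the duopoly outcome, while large n pushes the price toward c.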
The problem is illustrated by Figure 20.1. In each case, the firm has $c(y) = ky^2/2 + F$, and therefore
$MC = ky$ and $AC = ky/2 + F/y$, which is U-shaped. Minimum AC is achieved by choosing y such
that MC = AC, or $y_E = \sqrt{2F/k}$. (E signifies efficient scale.) In Figure 20.1b, in order to have
three or more firms, p must exceed $p_0$; otherwise, firms would fail to recover their fixed costs. In
some cases p exceeds even the price that a monopolist would charge.
Figure 20.1: (a) Competitive paradigm: min AC is achieved at a small scale relative to the size of the
market. (b) Non-competitive paradigm: min AC is achieved at a level of output that is large
relative to the size of the market.
20.2 Alternatives to the Cournot Assumption
1. Return for a moment to the duopoly model of Section 19, with linear demand and constant
marginal cost. Recall that Firm 1's objective is to maximize
$$\pi_1 = y_1 p(y_1 + y_2) - cy_1$$
Under the Cournot assumption, Firm 1 selects $y_1^*(y_2)$ under the assumption that $y_2$ is fixed.
Suppose, however, that Firm 1 has reason to believe Firm 2 will respond to Firm 1's choice
by setting $y_2 = \psi(y_1)$. What does Firm 1 do in this case? The FONC is
$$p(y_1 + y_2) + y_1 p'(y_1 + y_2)[1 + \psi'(y_1)] - c = 0$$
For example, Firm 2 might announce: "We plan to increase our output in (constant) proportion
to yours." Then
$$\frac{dy_2}{y_2} = \frac{dy_1}{y_1}$$
which implies
$$\psi'(y_1) = \frac{dy_2}{dy_1} = \frac{y_2}{y_1}$$
and the FONC becomes
$$p(y_1 + y_2) + (y_1 + y_2)p'(y_1 + y_2) - c = 0$$
But this should remind you of the FONC for joint profit maximization that we saw in
Section 19.1. Therefore, if each firm announced to the other the rule that
$$\frac{dy_i}{dy_j} = \frac{y_i}{y_j} \quad i, j = 1, 2$$
the two firms maintain the same level of output as a joint monopoly (provided each believes
the other).
2. A second class of alternatives to the Cournot assumption involves a duopoly in which one
firm is savvy and the other one is naive. Suppose, for example, that Firm 2 always takes
$y_1$ as given, i.e. Firm 2 adopts the Cournot assumption. Firm 1, on the other hand, is savvy,
and recognizes that Firm 2 is employing the Cournot reaction function $y_2^*(y_1)$. Firm 1 is said to
be a Stackelberg leader while Firm 2 is a Stackelberg follower. (Stackelberg was an early 20th
century German economist.) It can be shown that (1) the leader does better than the follower,
(2) the leader does better than either firm would in a symmetric Cournot model, and (3) the
follower does worse than either firm would in a symmetric Cournot model. [10]
[10] Condition (3) is redundant since the symmetric Cournot equilibrium is Pareto optimal.
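The three Stackelberg claims can be checked for the linear-demand duopoly of Section 19. The sketch below is illustrative only: the leader's output $(a - c)/(2b)$ is obtained by substituting the follower's Cournot reaction function into the leader's profit and solving the resulting FONC (a derivation not spelled out in the notes), and the parameter values are made up. Exact fractions are used so the comparisons are not affected by rounding.

```python
from fractions import Fraction

def follower_response(y1, a, b, c):
    """Firm 2's Cournot reaction function y2*(y1), truncated at zero."""
    return max(Fraction(0), (a - c - b * y1) / (2 * b))

def profit(y_own, y_other, a, b, c):
    """Profit under linear inverse demand p = a - b*(y_own + y_other), constant MC c."""
    return y_own * (a - b * (y_own + y_other)) - c * y_own

def stackelberg(a, b, c):
    """Leader maximizes y1*[a - b*(y1 + y2*(y1))] - c*y1, whose FONC gives y1 = (a - c)/(2b)."""
    y1 = (a - c) / (2 * b)
    return y1, follower_response(y1, a, b, c)

if __name__ == "__main__":
    a, b, c = Fraction(10), Fraction(1), Fraction(1)   # made-up parameters
    y1, y2 = stackelberg(a, b, c)
    y0 = (a - c) / (3 * b)                             # symmetric Cournot output
    print("leader profit:  ", profit(y1, y2, a, b, c))
    print("follower profit:", profit(y2, y1, a, b, c))
    print("Cournot profit: ", profit(y0, y0, a, b, c))
```

Under these assumptions the leader's profit exceeds the Cournot profit, which in turn exceeds the follower's profit.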
21 Game Theory I
In Sections 19 and 20 we considered a duopoly with linear demand
$$p(y_1 + y_2) = a - b(y_1 + y_2)$$
and constant marginal cost
$$MC_1 = MC_2 = c$$
We identified three possible strategies:
1. Cooperation. Each firm produces $y_M/2$.
$$y_i = \frac{y_M}{2} = \frac{a - c}{4b} \quad i = 1, 2$$
$$p = p_M = \frac{a + c}{2}$$
$$\pi_i = \frac{\pi_M}{2} = \frac{(a - c)^2}{8b} \quad i = 1, 2$$
2. Joint non-cooperation. Each firm produces $y_0 = 2y_M/3$.
$$y_i = \frac{2}{3}y_M = \frac{a - c}{3b} \quad i = 1, 2$$
$$p = p_0 = \frac{a + 2c}{3}$$
$$\pi_i = \pi_0 = \frac{(a - c)^2}{9b} \quad i = 1, 2$$
The situation is jointly non-cooperative in the sense that each firm is acting in its own,
narrowly defined best interest, given what the other firm is producing. Given that Firm 1
produces $y_0$, Firm 2 is advised to produce $y_0$ as well.
3. Cheating given that your competitor is cooperating. For example, if Firm 1 sets $y_1 = y^M/2$,
Firm 2's best response is
\[ y_2^*\left(\frac{y^M}{2}\right) = y^M - \frac{1}{2}\left(\frac{y^M}{2}\right) = \frac{3}{4} y^M \]
which means
\[ p = p^C = \frac{3a + 5c}{8} \]
We have also
\[ \pi_1 = \pi^L = (p^C - c)\,\frac{y^M}{2} = \frac{3(a - c)^2}{32b} \]
\[ \pi_2 = \pi^W = (p^C - c)\,\frac{3y^M}{4} = \frac{9(a - c)^2}{64b} \]
where W stands for winner (in this case cheater!), and L of course stands for loser. Notice
that
\[ \pi^W > \frac{1}{2}\pi^M > \pi^0 > \pi^L \]
Cooperation is better than joint non-cooperation but, given that your competitor is cooperating,
your best response is to cheat.
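The three cases above can be checked numerically. The following is a minimal sketch, assuming the linear demand and constant marginal cost of this section; the function name `duopoly_payoffs` and the parameter values a = 2, b = 1, c = 1 (chosen so that $(a - c)^2/b = 1$) are illustrative assumptions, not part of the notes.

```python
from fractions import Fraction

def duopoly_payoffs(a, b, c):
    """Profits in the three cases, assuming p = a - b*(y1 + y2) and marginal cost c."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    y_m = (a - c) / (2 * b)              # joint-monopoly output y^M

    def profit(y_own, y_other):
        price = a - b * (y_own + y_other)
        return (price - c) * y_own

    def best_response(y_other):
        # Cournot reaction function: maximises (a - b*(y + y_other) - c) * y over y
        return y_m - y_other / 2

    y_coop = y_m / 2                     # cooperative output per firm
    y_cournot = 2 * y_m / 3              # Cournot output per firm
    y_cheat = best_response(y_coop)      # cheat against a cooperating rival

    return {
        "pi_coop": profit(y_coop, y_coop),            # pi^M / 2
        "pi_cournot": profit(y_cournot, y_cournot),   # pi^0
        "pi_winner": profit(y_cheat, y_coop),         # pi^W
        "pi_loser": profit(y_coop, y_cheat),          # pi^L
        "p_cheat": a - b * (y_coop + y_cheat),        # p^C
    }

# Hypothetical parameters with (a - c)^2 / b = 1
result = duopoly_payoffs(a=2, b=1, c=1)
for name, value in result.items():
    print(name, value)
```

Exact fractions are used so that the ordering of the four profit levels can be compared without rounding error.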
We can illustrate the dilemma in a box such as Figure 21.1, showing each firm's actions and the
resulting payoffs as ordered pairs.

                                   Firm 2
                            Cooperate              Don't Cooperate
Firm 1   Cooperate          $\pi^M/2,\ \pi^M/2$    $\pi^L,\ \pi^W$
         Don't Cooperate    $\pi^W,\ \pi^L$        $\pi^0,\ \pi^0$

Figure 21.1: The strategies are listed along the edges of the box. The payoffs are listed in order with
Player 1 first.

E.g. if Firm 1 cooperates and Firm 2 does not, the payoffs are $(\pi^L, \pi^W)$, where the first coordinate
corresponds to Firm 1 and the second to Firm 2. Set $(a - c)^2/b = 1$. Then our game box looks
like Figure 21.2.

                            Player 2
                       $C$             $\bar{C}$
Player 1   $C$         1/8, 1/8        3/32, 9/64
           $\bar{C}$   9/64, 3/32      1/9, 1/9

Figure 21.2: $C$ stands for Cooperate, and $\bar{C}$ for not $C$, or Don't Cooperate.

Given a box like this, we can figure out which strategy each player will adopt.
• Suppose Firm 2 believes Firm 1 will play $C$ (cooperate). Firm 2 then checks the second
coordinate of each entry in row 1: 9/64 > 1/8, so Firm 2 plays $\bar{C}$ (don't cooperate). However,
if Firm 2 believes Firm 1 will play $\bar{C}$, then Firm 2 checks the second coordinate of each entry
in row 2: 1/9 > 3/32, so again Firm 2 plays $\bar{C}$.
• We can evaluate Firm 1's choices the same way, only this time we check columns rather than
rows, and we compare first coordinates. The result is the same: Firm 1 is better off playing
$\bar{C}$, regardless of Firm 2's choice.
Notice that in this game there is always an incentive for each player to choose $\bar{C}$, regardless of what
the other player does. An action that is always the best response is called a dominant strategy. The
game pictured in Figure 21.3 doesn't have a unique dominant strategy. In this game, we say that
$(C, C)$ and $(\bar{C}, \bar{C})$ are Nash equilibria. A Nash equilibrium in a 2-player game is a combination
of strategies $(S, T)$ such that
1. Given that Player 1 has chosen $S$, Player 2's best response is $T$.
2. Given that Player 2 has chosen $T$, Player 1's best response is $S$.
                            Player 2
                       $C$           $\bar{C}$
Player 1   $C$         2, 2          1/2, 3/2      (2 chooses $C$ if 1 plays $C$)
           $\bar{C}$   3/2, 1/2      1, 1          (2 chooses $\bar{C}$ if 1 plays $\bar{C}$)
                       (1 chooses $C$ if 2 plays $C$)   (1 chooses $\bar{C}$ if 2 plays $\bar{C}$)

Figure 21.3: $(C, C)$ and $(\bar{C}, \bar{C})$ are Nash equilibria since given that Player 1 or Player 2 plays $C$, his
opponent's best response is to play $C$, and likewise for $\bar{C}$.

The duopoly game has a unique Nash equilibrium in $(\bar{C}, \bar{C})$. The game in Figure 21.3 has two
Nash equilibria, although one is superior to the other.
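The definition of a Nash equilibrium translates directly into a check of every cell of the game box. The sketch below is one possible way to do this for pure strategies; the function name `pure_nash_equilibria` and the encoding (index 0 for $C$, index 1 for $\bar{C}$) are assumptions made for illustration. The duopoly payoffs take $(a - c)^2/b = 1$, with the loser's payoff computed as the margin 3/8 times the output 1/4.

```python
from fractions import Fraction as F

def pure_nash_equilibria(payoffs):
    """payoffs[r][c] = (payoff to Player 1, payoff to Player 2).

    Player 1 picks the row r, Player 2 picks the column c. A cell is a Nash
    equilibrium if neither player can gain by deviating unilaterally.
    """
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    equilibria = []
    for r in range(n_rows):
        for c in range(n_cols):
            u1, u2 = payoffs[r][c]
            row_is_best = all(u1 >= payoffs[k][c][0] for k in range(n_rows))
            col_is_best = all(u2 >= payoffs[r][k][1] for k in range(n_cols))
            if row_is_best and col_is_best:
                equilibria.append((r, c))
    return equilibria

# Index 0 = cooperate (C), index 1 = don't cooperate (C-bar)
duopoly = [
    [(F(1, 8), F(1, 8)), (F(3, 32), F(9, 64))],
    [(F(9, 64), F(3, 32)), (F(1, 9), F(1, 9))],
]
figure_21_3 = [
    [(2, 2), (F(1, 2), F(3, 2))],
    [(F(3, 2), F(1, 2)), (1, 1)],
]

print(pure_nash_equilibria(duopoly))      # only (C-bar, C-bar)
print(pure_nash_equilibria(figure_21_3))  # (C, C) and (C-bar, C-bar)
```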
You may have seen the duopoly game in disguise. One common version is known as the Prisoner's
Dilemma. Suppose you and a former friend are involved in a legal dispute. You and he will appear
before a judge who will determine who takes custody of the cat you bought together. You can hire
a lawyer, or not. Suppose further that you estimate the probability of winning is 1/2 if neither of
you hires a lawyer, or if you both hire a lawyer. But, if one of you hires a lawyer and the other
one does not, the one who is represented by a lawyer wins with probability 3/4. As we can see by
looking at the box in Figure 21.4, hiring a lawyer is a dominant strategy.

                         ex-Friend
                    No Lawyer     Lawyer
You   No Lawyer     1/2, 1/2      1/4, 3/4
      Lawyer        3/4, 1/4      1/2, 1/2

Figure 21.4: Hiring a lawyer is a dominant strategy in this game.

The problem is lawyers cost money, so your true payoff with a lawyer is lower than the box suggests.
In fact both parties are better off agreeing not to hire lawyers. But this is not a Nash equilibrium.
Figure 21.5 displays real data pertaining to child custody cases in California in the early 1980s.
                           Mother
                      No Lawyer    Lawyer
Father   No Lawyer    75%          86%
         Lawyer       49%          65%

Figure 21.5: Percentage of Mothers Awarded Child Physical Custody in San Mateo and Santa Clara
Counties, California, 1984. Source: Mnookin, Maccoby, Depha, and Albiston (1989)

It may be possible to induce cooperation in a game that is played repeatedly. For example, consider
the following long-term strategy by a participant in a duopoly game:
• If Player 1 sees that the price last time was $p^M$, then she produces $y^M/2$.
• If Player 1 sees that the price last time was $p^C$, she infers that Player 2 cheated, and punishes
Player 2 by producing $y^0$ the next $k$ times, after which she reverts to $y^M/2$.
Questions to consider:
1. Does the threat of punishment stop Player 2 from cheating?
2. Is the threat credible?
To answer Question 1, consider the costs and benefits of cheating in the current period, ignoring
the time value of money (here $\pi^C$ is the cheater's profit, called $\pi^W$ above):
\[ \text{Benefit} = \pi^C - \pi^M/2 = 9/64 - 1/8 = 1/64 \]
\[ \text{Cost} = k(\pi^M/2 - \pi^0) = k/72 \]
Clearly, if $k \ge 2$, the threat will deter cheating! As for Question 2, the punishment is to produce
$y^0$. This is not too crazy, but is it credible? Given that Player 2 has cheated, he could simply
claim that it was an honest mistake and promise not to do it again. Player 1 could then bypass the
punishment (does this sound familiar?) and save herself $k(\pi^M/2 - \pi^0)$ too! So she has a strong
incentive not to follow through on her threat. This is an example of a dynamic inconsistency.
Player 1 would like to commit herself to carrying out the punishment in return for a deviation from
cooperative play but, given that Player 2 has cheated, she hurts herself by doing so and therefore
has an incentive to bail out early.
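The answer to Question 1 is a one-line comparison, sketched below with exact fractions. The payoff values are the ones used in the text with $(a - c)^2/b = 1$; the helper name `deters` is an illustrative assumption, and discounting is ignored as in the text.

```python
from fractions import Fraction
import math

# One-period gain from cheating: pi^C - pi^M/2
benefit = Fraction(9, 64) - Fraction(1, 8)
# Loss in each punishment period: pi^M/2 - pi^0
cost_per_period = Fraction(1, 8) - Fraction(1, 9)

def deters(k):
    """True if k periods of punishment cost the cheater more than the gain."""
    return k * cost_per_period > benefit

# Smallest whole number of punishment periods that deters cheating
min_k = math.floor(benefit / cost_per_period) + 1

print(benefit, cost_per_period, min_k)
```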
22 Game Theory II
22.1 Tree Diagrams
In Section 21 we described a punishment strategy for the repeated Cournot game in which a player
chooses his current level of output based on last period's price: if $p_{t-1} < p^M$, he decides to produce
$y^0$ for the next $k$ periods. Afterwards he reverts to cooperative play, producing $y^M/2$. We showed
that this strategy is effective, provided his competitor believes he actually will execute it. But should
his competitor believe him? The same issue arises in numerous contexts:
• The Cold War. The U.S. threatened to start nuclear war with the USSR if the USSR invaded
Western Europe. Many Europeans themselves believed that even if the USSR invaded, the
U.S. simply would cut its losses.
• Flood relief. The government would like to discourage homeowners from living in flood-prone
areas such as the New Jersey shore. But when a flood strikes, the government inevitably
offers disaster relief.
• Entry deterrence. A grocery store currently is a monopolist in a certain town. Another chain
is considering building a new store to compete with the existing one. The incumbent threatens
to reduce prices if the chain enters the market.

Figure 22.1: The first coordinate is the payoff to Player 1, the incumbent, and the second coordinate is
the payoff to Player 2, the potential entrant. Note that we could replace $(0, 0)$ and $(\pi^0, \pi^0)$
with $(0, -F)$ and $(\pi^0, \pi^0 - F)$, where $F$ denotes the fixed cost of entry.
We can analyze simple dynamic games with the aid of a tree diagram such as the one in Figure 22.1,
which shows each party's possible moves. Consider the entry deterrence game. First, the potential
entrant decides whether to enter. Then, the incumbent decides whether to engage in a price war.
As before,
$\pi^M$ = profit per year for the incumbent without any competitors
$\pi^0$ = profit per year for each firm if entry is followed by Cournot duopoly
In a price war, the incumbent charges $p = c$ and earns no profit. Notice that once the potential
entrant (Player 2) has acted, it's up to the incumbent to decide where to go from there. Suppose
Player 2 has entered. The incumbent (Player 1) has to choose between the top two nodes: $\pi^0 > 0$,
so clearly it doesn't make sense to fight once the competitor has entered. Thus we can conclude
that Player 1 will choose "don't fight."
Player 2 on the other hand looks at the ultimate payoffs to entry. If she enters, she gets $\pi^0$ since
she knows that Player 1 will choose "don't fight," so she always enters.
The method we used to analyze this game is called backward induction. At the last stage, depending
on whose turn it is, we deduce this player's action by comparing his payoffs. Then we back up to the
previous move.
Notice that (enter, don't fight) is the dynamically consistent equilibrium here. (enter, fight) is not
dynamically consistent even though Player 1 threatens to fight, because given that Player 2 has
entered, Player 1 seeks to maximize his payoff and thus doesn't fight.
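Backward induction can be written as a short recursion over the game tree. The sketch below is illustrative only: the tree encoding, the function name `solve`, and the numerical values of $\pi^M$ and $\pi^0$ (taken with $(a - c)^2/b = 1$) are assumptions, and the entrant's payoff from staying out is set to 0 as in the discussion above.

```python
def solve(node):
    """Return (payoffs, path) selected by backward induction.

    A leaf is a tuple (payoff to Player 1, payoff to Player 2).
    A decision node is a dict {"player": index, "moves": {label: subtree}},
    where index 0 is Player 1 and index 1 is Player 2.
    """
    if isinstance(node, tuple):
        return node, []
    mover = node["player"]
    best = None
    for label, child in node["moves"].items():
        payoffs, path = solve(child)          # solve the later stage first
        if best is None or payoffs[mover] > best[0][mover]:
            best = (payoffs, [label] + path)
    return best

pi_m, pi_0 = 1 / 4, 1 / 9   # hypothetical monopoly and Cournot profits

entry_game = {
    "player": 1,                              # Player 2, the potential entrant, moves first
    "moves": {
        "stay out": (pi_m, 0.0),
        "enter": {
            "player": 0,                      # Player 1, the incumbent, moves second
            "moves": {
                "fight": (0.0, 0.0),
                "don't fight": (pi_0, pi_0),
            },
        },
    },
}

payoffs, path = solve(entry_game)
print(path, payoffs)
```

Replacing the leaves with $(0, -F)$ and $(\pi^0, \pi^0 - F)$ shows how a fixed cost of entry changes the entrant's comparison.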
Implications:
• In the Cold War, it was not a credible threat to promise all-out nuclear war if the USSR
invaded Western Europe.
• In hostage situations, it is not a credible threat to claim that you don't negotiate with
terrorists.
• In the entry game above, it is not a credible threat to claim that you will wage a price war if
another supplier enters the market.
• The punishment strategy outlined in Section 21 is not a credible threat.
22.2 Interpretation
The preceding analysis is predicated on players behaving rationally; despite threatening to do
something, once the time comes to make good on the threat, they always do what is in their best
interest, regardless of the events leading up to that time. This idea is encountered quite often in
economics and finance, e.g. "bygones are bygones" and "sunk costs don't count."
Notice that in our entry game, the incumbent (Player 1) would like to be able to commit herself to
behaving irrationally. If the entrant (Player 2) knows that the incumbent will in fact fight, then he
won't enter, especially if there is a fixed cost of entry.
Suppose there is an earlier decision that Player 1 can make to alter the payoffs should Player 2
enter. Here the decision might involve investing in overhead that increases the operating costs for
the incumbent so that the payoffs are
$\pi^M - C$, if Player 2 doesn't enter, and
$\pi^0 - A$, if Player 2 enters and Player 1 doesn't fight.

Figure 22.2: Here the investment decision reduces Player 1's payoffs by $A$ if he doesn't fight in the event
of entry and $C$ if there is no entry.
This is illustrated in Figure 22.2. Will Player 1 invest in this strategy? Again, the answer is found
by backward induction:
• Path 1 (Player 1 doesn't invest). We know that if Player 2 enters, Player 1 won't fight. Player
2 gets $\pi^0$ if he enters and 0 otherwise, so he will enter, which means Player 1 gets $\pi^0$.
• Path 2 (Player 1 does invest). We know that if Player 2 enters, Player 1 will fight if $A > \pi^0$.
Assuming this condition holds, Player 2 knows Player 1 will fight, so Player 2 won't enter,
which means Player 1 gets $\pi^M - C$. This is worthwhile if $C < \pi^M - \pi^0$.
Thus Player 1's payoffs boil down to the following:
• don't invest in entry deterrence, earn $\pi^0$
• invest in entry deterrence, earn $\pi^M - C$
Conclusion: Player 1 may make an investment in overhead provided:
• it reduces the profit from not fighting when Player 2 enters ($A > \pi^0$)
• it isn't too costly when Player 2 doesn't enter ($C < \pi^M - \pi^0$)
The key to entry deterrence is that once the incumbent decides to invest, the decision must affect
his payoffs. He is committing to fighting by changing his payoffs in the latter stage of the game.
There is an extension of this model to the case in which potential entrants don't know whom they're
dealing with. Suppose there are two types of incumbents:
• rational incumbents with payoffs 0 in a price war and $\pi^0$ in a duopoly
• "mad dog" incumbents with payoffs $\pi^*$ in a price war and $\pi^0 - S$ in a duopoly
The possibility that $\pi^* > 0$ reflects the idea that the mad dog likes to fight. $S > 0$ can thus be
thought of as the shame that a mad dog feels for backing down. If $\pi^* > \pi^0 - S$, the mad dog
will fight, and this is the case if the mad dog really enjoys fighting or feels a substantial amount of
shame for backing down.
Suppose there is a fixed cost of entry $F$. The game looks like Figure 22.3. If Player 2 enters and
the incumbent is a mad dog, then a fight ensues and the entrant gets $-F$. If Player 2 enters and
the incumbent is rational, the incumbent doesn't put up a fight and the entrant gets $\pi^0 - F$. Player
2's expected profit[11] is
\[ E[\pi_2] = P(\text{mad dog}) \cdot (-F) + P(\text{rational}) \cdot (\pi^0 - F) \]
As the incumbent, you want to raise the entrant's belief that you are crazy!

Figure 22.3: The potential entrant has no idea which type of incumbent he's dealing with. It behooves
the incumbent to signal that he's crazy!

[11] See Section 23 for a definition of expected value.
23 Uncertainty I: Income Lotteries
In the next four sections we extend the theory of consumer choice to the context of choice under
uncertainty. For simplicity, we deal mainly with uncertainty regarding income. Assuming that
prices are fixed, alternative realizations of random income translate directly into alternative utility
levels. We begin with a brief review of statistics.
23.1 Review of Basic Statistical Concepts
We define the mean, or expected value of a random variable $X$, denoted by $E[X]$ (or sometimes by
$\bar{X}$), to be
\[ E[X] = \sum_{i=1}^{n} p_i x_i \]
where $X$ takes the value $x_i$ with probability $p_i$. The mean is just a weighted average of the
alternative realizations of $X$, with the weights being the probabilities associated with the respective
realizations.
Consider the two random variables $X_1$ and $X_2$ with probability distributions as shown in Figure 23.1.
Note that
\[ E[X_1] = 10 \times 0.1 + 20 \times 0.2 + 30 \times 0.4 + 40 \times 0.2 + 50 \times 0.1 = 30, \]
\[ E[X_2] = 10 \times 0.5 + 50 \times 0.5 = 30, \]
so while these distributions have the same mean, $X_2$ is more dispersed ($X_1$ on the other hand is
more concentrated near its mean).

Figure 23.1: Two different distributions with identical means.
One way to describe the level of dispersion of a random variable is by its variance, denoted $V[X]$:
\[ V[X] = \sum_{i=1}^{n} p_i (x_i - \bar{X})^2. \]
The variance of $X$ is the mean squared difference between $X$ and $\bar{X}$. As an exercise, calculate $V[X_1]$
and $V[X_2]$ above. We say that a random variable $X$ is degenerate if $X = E[X]$ with probability
one, in which case $V[X] = 0$.
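The definitions of the mean and variance can be applied mechanically, as in the following minimal sketch. The representation of a distribution as a list of (value, probability) pairs and the helper names are illustrative assumptions; the probabilities are those used in the calculation of $E[X_1]$ and $E[X_2]$.

```python
from fractions import Fraction as F

def mean(dist):
    """E[X] for a discrete distribution given as (value, probability) pairs."""
    return sum(p * x for x, p in dist)

def variance(dist):
    """V[X]: probability-weighted squared deviation from the mean."""
    m = mean(dist)
    return sum(p * (x - m) ** 2 for x, p in dist)

def affine(dist, a, b):
    """Distribution of a*X + b."""
    return [(a * x + b, p) for x, p in dist]

X1 = [(10, F(1, 10)), (20, F(2, 10)), (30, F(4, 10)), (40, F(2, 10)), (50, F(1, 10))]
X2 = [(10, F(1, 2)), (50, F(1, 2))]

print(mean(X1), variance(X1))
print(mean(X2), variance(X2))
# Numerical check of V[aX + b] = a^2 V[X] for one choice of a and b
print(variance(affine(X1, 3, 7)) == 9 * variance(X1))
```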
We can also consider functions of random variables. If $g$ is a function defined on $\mathbb{R}$, then $Y = g(X)$
is a random variable. We define the mean $E[Y]$ as follows:
\[ E[Y] = E[g(X)] = \sum_{i=1}^{n} p_i g(x_i). \]
If $g$ is linear, i.e. if $g(x) = ax + b$ for some choice of $a$ and $b$, then
\[ E[Y] = \sum_{i=1}^{n} p_i (a x_i + b) = a \sum_{i=1}^{n} p_i x_i + b \underbrace{\sum_{i=1}^{n} p_i}_{=\,1} = aE[X] + b. \]
As an exercise, show that $V[aX + b] = a^2 V[X]$ for any choice of $a$, $b$.
23.2 Choices Over Uncertain Incomes
We now suppose that individuals are asked to make choices between alternative income lotteries.
Each lottery is essentially a probability distribution of income. In ranking two alternative lotteries,
we hold constant income in the absence of either lottery (which in reality could be random).
Let $y$ denote income. In a world without uncertainty individuals always prefer more income to
less, so the following utility functions are all equivalent in the sense that they give rise to the same
ranking of incomes:
\[ u(y) = ay + b, \quad a > 0 \]
\[ u(y) = e^y \]
\[ u(y) = y^3 \]
Since each function is increasing, it indicates a preference for more income. This is all we need, if
all we want to know is how to rank incomes.
On the other hand, suppose we wish to rank income lotteries. For example, consider:

              Payoff    Probability
Lottery 1:    $100      0.5
              $0        0.5
Lottery 2:    $70       0.5
              $30       0.5

In the 1940s John von Neumann and Oskar Morgenstern asked: is there some way of assigning
a utility number to each possible outcome in such a way that we can compare these lotteries by
comparing the expected utilities:
\[ 0.5\,u(100) + 0.5\,u(0) \quad \text{in case of Lottery 1} \]
\[ 0.5\,u(70) + 0.5\,u(30) \quad \text{in case of Lottery 2} \]
The answer is yes (under some assumptions), although we won't prove it. Thus, if preferences
satisfy certain conditions, then there is a utility function (call it an expected utility function)
defined on the set of all possible incomes that we can use to compare both certain incomes (which
is trivially easy anyway) and income lotteries. The idea is that if we get the utility differences
between different incomes just right, then we can use the expected utility criterion to compare
lotteries.
NOTE: Normally we don't care about the gauge of a given utility function. That is, if $u$ is a utility
function, then we regard $v = g(u)$ as equivalent, provided $g$ is a strictly increasing function.
How do you feel about Lottery 1 versus Lottery 2? Chances are, you would take Lottery 2. This
reveals something about the shape of your expected utility function.

Figure 23.2: Concave expected utility function. $u(50) > 0.5u(30) + 0.5u(70) > 0.5u(0) + 0.5u(100)$.

An expected utility function $u$ is always increasing since more money is always better than less
(for an economist anyway). If $u$ is linear, e.g. $u(y) = ay + b$, then
\[ u(0) = b, \quad u(30) = 30a + b, \quad u(70) = 70a + b, \quad u(100) = 100a + b, \]
so clearly $0.5\,u(70) + 0.5\,u(30) = 0.5\,u(0) + 0.5\,u(100)$. This leads to our first result:
If the expected utility function is linear, then lotteries with equal expected values are
considered equally good.
On the other hand, if you prefer Lottery 2, then your expected utility function must be concave as
in Figure 23.2. If you prefer Lottery 1, this reveals that your expected utility function is convex.
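The link between the shape of $u$ and the ranking of the two lotteries can be seen by evaluating both under a linear, a concave and a convex function. This is a sketch only: the three specific functions below are illustrative choices, not the ones used in the notes.

```python
import math

# Each lottery is a list of (payoff, probability) pairs
lottery_1 = [(100, 0.5), (0, 0.5)]
lottery_2 = [(70, 0.5), (30, 0.5)]

def expected_utility(lottery, u):
    """Probability-weighted average of u over the lottery's payoffs."""
    return sum(p * u(y) for y, p in lottery)

# Illustrative utility functions (assumptions, chosen for their curvature)
utilities = {
    "linear": lambda y: 2 * y + 1,
    "concave": math.sqrt,
    "convex": lambda y: y ** 2,
}

for name, u in utilities.items():
    eu1 = expected_utility(lottery_1, u)
    eu2 = expected_utility(lottery_2, u)
    print(f"{name:8s} Lottery 1: {eu1:10.3f}   Lottery 2: {eu2:10.3f}")
```

Both lotteries have an expected payoff of 50, so any difference in the printed rankings comes from curvature alone.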
In general, it is useful to assume that people are risk-averse (gambling is an exception). We say
that a person is risk-averse if he prefers $x$ for sure rather than $x + \varepsilon$, where $\varepsilon$ is a random variable
with $E[\varepsilon] = 0$:
\[ E[u(x)] \ge E[u(x + \varepsilon)] \]
If $u$ is concave, this inequality holds. Why? For any realization of $\varepsilon$, say $\varepsilon = \varepsilon_i$,
\[ u \text{ concave} \implies u(x + \varepsilon_i) \le u(x) + \varepsilon_i u'(x) \]
So, taking expectations over all realizations of $\varepsilon$,
\[ E[u(x + \varepsilon)] \le E[u(x)] + E[\varepsilon u'(x)] = u(x) + u'(x)E[\varepsilon] = u(x) \]
since $E[\varepsilon] = 0$ by assumption.
24 Uncertainty II: Expected Utility
24.1 Expected Utility
In Section 23 we introduced the idea of a special utility function, defined over nonrandom incomes,
with curvature such that a consumer can use it to rank income lotteries. In particular, if an income
lottery is available that pays $y_i$ with probability $p_i$, then it can be compared with any other lottery
based on the expected utility criterion:
\[ E[u(y)] = \sum_i p_i u(y_i) \]
This function is called a von Neumann-Morgenstern utility function (vN-M), or sometimes simply
an expected utility function.
Examples:
• Linear. $u(y) = ay + b$ gives rise to an expected value ranking.
• Power function. $u(y) = y^\alpha$, where $0 < \alpha < 1$. This function is concave, so people with
preferences such as these are risk-averse.
• Exponential. $u(y) = -\exp(-ry)$, where $r > 0$. This function is increasing and concave, and
ranges from $-\infty$ to 0. This particular function is often used in finance because if all income
lotteries are normally distributed, we get a nice ranking: for $y \sim N(\mu, \sigma^2)$, it can be shown
that[12]
\[ E[-\exp(-ry)] = -\exp(-r\mu + r^2\sigma^2/2) = -\exp[-r(\mu - r\sigma^2/2)] \]
Therefore, a lottery with mean $\mu$ and variance $\sigma^2$ is assigned a value based on $\mu - r\sigma^2/2$.
This is nice because, given $\mu$, individuals with higher values of $r$ assign a greater discount to
a lottery with higher risk (variance).
[12] This can be shown by manipulating the moment generating function (MGF) for the normal distribution. The
reader is advised to consult a book on mathematical statistics.

We know that vN-M utility functions are not invariant under arbitrary transformations. If your
vN-M utility function is $u(y) = y$, then you are risk-neutral, and care only about expected values.
If my vN-M utility function is
\[ v(y) = \sqrt{u(y)} \]
then mine is concave ($v(y) = \sqrt{y}$), and therefore I am risk-averse. Thus you and I evaluate lotteries
differently. Expected utility functions are, however, invariant under increasing linear transformations.
In other words, if your vN-M utility function is $u(y)$ and mine is $v(y) = au(y) + b$, where
$a > 0$, then we evaluate lotteries the same way. To see this, consider a pair of lotteries $y_1$ and $y_2$.
Suppose you prefer $y_1$, i.e.
\[ E[u(y_1)] > E[u(y_2)] \]
Then it also is true that
\[ aE[u(y_1)] + b > aE[u(y_2)] + b \iff E[au(y_1) + b] > E[au(y_2) + b] \]
so I too prefer $y_1$. This fact is very useful because it means we can rescale a vN-M utility function
so that the worst income realization (among a given set of lotteries) is assigned the value 0 and
the best one is assigned the value 1. To see this, imagine that we are comparing several lotteries:
the worst outcome is 10,000 and the best outcome is 250,000. Suppose $u(10{,}000) = u_0$ and
$u(250{,}000) = u_1$. Then $v(y) = au(y) + b$, where
\[ a = \frac{1}{u_1 - u_0}, \qquad b = -\frac{u_0}{u_1 - u_0} \]
has $v(10{,}000) = 0$ and $v(250{,}000) = 1$. We have seen already that $v$ evaluates lotteries in the
same way as $u$, so we are better off using $v$ instead.
Figure 24.1: Risk-neutral individual has $p(1{,}000) + (1 - p)(100) = 250$, or $p = 1/6 \approx 0.167$, whereas risk-averse
individual has $p_{250} > 0.167$.
We now are in a position to describe how to derive one's own vN-M utility function. Assume the
best possible outcome among lotteries under consideration is 1,000, and the worst is 100. We wish
to assign utilities to all possible incomes ranging from 100 to 1,000. Begin by setting $u(100) = 0$
and $u(1{,}000) = 1$. For any intermediate income level, e.g. 250, ask yourself:
If I had to choose between 250, and a lottery in which I receive 1,000 with probability
$p$, and 100 with probability $1 - p$, what value of $p$ would make me indifferent?
Call this quantity $p_{250}$. Clearly $0 < p_{250} < 1$. Also, $p_{251} > p_{250}$ (although not by much, probably).
Now simply set $u(250) = p_{250}$. Why does this work? By definition
\[ p_{250}\,u(1{,}000) + (1 - p_{250})\,u(100) = u(250) \]
and we've normalized $u$ so that $u(1{,}000) = 1$ and $u(100) = 0$, hence $u(250) = p_{250}$. Experimental
economists use this idea in the lab to figure out whether a subject is more or less risk-averse. As
Figure 24.1 shows, the more concave one's utility function (i.e. the more risk-averse one is), the
bigger is $p_{250}$, and the better the chances of winning 1,000 have to be in order to forfeit 250 certain.
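The indifference probability follows from the defining equation $p\,u(1{,}000) + (1 - p)\,u(100) = u(y)$, which gives $p = [u(y) - u(100)]/[u(1{,}000) - u(100)]$ for any increasing $u$. The sketch below evaluates this for three illustrative utility functions; the function name and the choice of square-root and logarithmic utility are assumptions for the sake of the example.

```python
import math

def indifference_probability(u, y, worst=100, best=1000):
    """Probability p that makes y for sure as good as (best w.p. p, worst w.p. 1 - p)."""
    return (u(y) - u(worst)) / (u(best) - u(worst))

p_linear = indifference_probability(lambda y: y, 250)   # risk-neutral benchmark
p_sqrt = indifference_probability(math.sqrt, 250)       # mildly concave
p_log = indifference_probability(math.log, 250)         # more concave

print(round(p_linear, 3), round(p_sqrt, 3), round(p_log, 3))
```

Because the formula is itself the normalization $v = au + b$ with $v(100) = 0$ and $v(1{,}000) = 1$, the value returned is also the rescaled utility of $y$.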
24.2 The Demand for Insurance
We now use the expected utility function to show that if you are risk-averse, and you have access to
actuarially fair insurance, then you will insure yourself fully against any risk. For example, suppose
your income is 30,000, and the probability that you will have an accident is $p = 0.05$. In the event of
an accident your medical bills will be 10,000. Your vN-M utility function is $u$. Without insurance
your expected utility of income is
\[ (1 - p)\,u(30{,}000) + p\,u(20{,}000) \]
How does insurance work in a simple world? An insurance contract for 1 unit of coverage is a
promise by the insurance company to pay you 1 if you have an accident, and nothing otherwise. If
the premium, i.e. the cost to you, is $\pi$, then the expected value of the contract to the insurance
company is
\[ (1 - p)\pi + p(\pi - 1) \]
With probability $1 - p$, you pay the premium and nothing happens. With probability $p$, you pay
the premium but there is a claim and therefore a benefit payment of 1. If insurance companies were
risk-neutral, they would compete for business by reducing $\pi$ to the point that
\[ (1 - p)\pi + p(\pi - 1) = 0 \]
This is so-called actuarially fair insurance: coverage of 1 is available for a premium equal to the
probability of a claim, i.e. $\pi = p$.
Suppose you buy $c$ units of coverage at a premium of $\pi$ per unit. Your expected utility is
\[ \phi(c) = (1 - p)\,u(30{,}000 - \pi c) + p\,u(20{,}000 - \pi c + c) \]
where the function $\phi$ captures the value of different levels of coverage. If you choose $c$ so as to
maximize $\phi$, the FONC is
\[ \phi'(c) = -\pi(1 - p)\,u'(30{,}000 - \pi c) + p(1 - \pi)\,u'(20{,}000 - \pi c + c) = 0 \]
The SOC is not a concern since
\[ \phi''(c) = \pi^2(1 - p)\,u''(30{,}000 - \pi c) + p(1 - \pi)^2\,u''(20{,}000 - \pi c + c) \]
is always negative under the assumption that you are risk-averse. (Why? $u$ concave $\implies u'' < 0$.)
Consider the FONC carefully for $\pi = p$. In this case
\[ u'(30{,}000 - pc) = u'(20{,}000 + c(1 - p)) \]
If $u'' < 0$ as usual, then $u'$ is strictly decreasing and therefore one-to-one, so
\[ u'(x) = u'(y) \iff x = y \]
hence
\[ 30{,}000 - pc = 20{,}000 + c(1 - p) \]
or $c = 10{,}000$!
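The full-insurance result can be checked numerically by maximizing expected utility over the coverage level. The sketch below assumes logarithmic utility (any concave $u$ would do) and uses a ternary search, which is valid here because expected utility is concave in $c$; the function names and the search bounds are illustrative assumptions.

```python
import math

def expected_utility(c, premium, p=0.05, income=30_000.0, loss=10_000.0, u=math.log):
    """Expected utility with c units of coverage bought at `premium` per unit."""
    no_accident = income - premium * c
    accident = income - loss - premium * c + c
    return (1 - p) * u(no_accident) + p * u(accident)

def optimal_coverage(premium, lo=0.0, hi=20_000.0, tol=1e-6):
    """Ternary search for the maximiser of a concave function of c on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if expected_utility(m1, premium) < expected_utility(m2, premium):
            lo = m1   # the maximum lies to the right of m1
        else:
            hi = m2   # the maximum lies to the left of m2
    return (lo + hi) / 2

c_fair = optimal_coverage(premium=0.05)   # actuarially fair: premium equals p
print(round(c_fair))
```

With the premium set equal to the accident probability, the search settles on coverage approximately equal to the loss of 10,000.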
Exercises:
1. Redo the analysis of Section 24.2 assuming that if you buy insurance at all, you have to pay
an underwriting fee of $f$. The price per unit of coverage remains $p$ (total cost of $c$ units of
coverage is $pc + f$). Show that there is a number $F$ such that
\[ f \le F \implies \text{you insure yourself fully} \]
\[ f > F \implies \text{you don't buy insurance at all} \]
2. Redo the analysis of Section 24.2 assuming $\pi > p$, i.e. the insurance is not actuarially fair.
25 Uncertainty III: Moral Hazard
One of the most interesting problems in markets with uncertainty is that of moral hazard, the
tendency of economic agents to change their behavior inefficiently upon having entered a contract
of some sort. We owe this term to the insurance industry: a policyholder who fails to exercise due
caution because he is insured is known as a moral hazard. A good example of this is a driver who
rents a car and purchases the full insurance option. Moral hazard can arise in other contexts as
well. For example, it often is argued that welfare systems discourage those who are in the system
from seeking employment. In this section we analyze the demand for insurance when policyholders,
through their own efforts, are capable of influencing the likelihood of an accident. We show that
1. With full insurance, policyholders have no incentive to avoid accidents.
2. A solution to the moral hazard problem involves a deductibility clause.
In particular, a high deductible generally will induce a greater level of preventive care at the cost of
inducing variability in the policyholder's income. Thus there is a tradeoff between insurance and
efficiency.
The model is simple. In each state of the world (accident/no accident) the insured has $y$ initially.
In the event of an accident he loses $\ell$. The insurance company offers to pay $c$ in this event, in return
for a per unit charge of $\pi$ regardless. Expected utility depends on both ultimate wealth and effort $x$
expended on accident prevention. Assume that consumers evaluate income-effort bundles according
to
\[ u(\text{ultimate income}) - d(\text{effort}) \]
where $u$ is an expected utility function and $d$ represents the cost of a concerted effort to avoid an
accident. Assume $d$ is convex, with $d(0) = d'(0) = 0$ as in Figure 25.1.

Figure 25.1: A representative cost-of-effort function.

The probability of an accident is $p(x)$, where $p$ is a decreasing function with $p(0) = 0.5$. We must
have $p(x) > 0$ and $p'(x) < 0$ for all $x > 0$. A consumer who buys $c$ units of coverage and who
expends $x$ units of effort has expected utility
\[ \phi(c, x) = p(x)[u(y - \ell - \pi c + c) - d(x)] + (1 - p(x))[u(y - \pi c) - d(x)] \]
\[ \phantom{\phi(c, x)} = p(x)\,u(y - \ell + c(1 - \pi)) + (1 - p(x))\,u(y - \pi c) - d(x) \]
Notice that since equal effort is expended whether or not there is an accident, we end up subtracting
$d(x)$ from expected utility of income. Suppose the insurance company, through vast experience,
knows $p(x)$, i.e. knows how much effort the insured will expend. If they break even, then
\[ \pi(1 - p(x)) - p(x)(1 - \pi) = 0 \tag{25.1} \]
by the same line of reasoning as in Section 24.2, so they charge $\pi = p(x)$.
The consumer views $\pi$ as exogenous and chooses $c$ and $x$ so as to maximize expected utility. The FONC
are[13]
\[ \phi_c = p(x)(1 - \pi)\,u'(y - \ell + c(1 - \pi)) - \pi(1 - p(x))\,u'(y - \pi c) \ge 0 \tag{25.2} \]
\[ \phi_x = p'(x)[u(y - \ell + c(1 - \pi)) - u(y - \pi c)] - d'(x) = 0 \tag{25.3} \]
Since the insurance company sets $\pi$ such that (25.1) holds, (25.2) may be rewritten as follows:
\[ u'(y - \ell + c(1 - \pi)) - u'(y - \pi c) \ge 0 \tag{25.4} \]
Suppose that equality holds in (25.4), i.e. the insured gets all the coverage he wants. Then, as in
Section 24.2,
\[ y - \ell + c^*(1 - \pi) = y - \pi c^* \implies c^* = \ell \]
But, with full coverage is there any incentive to be cautious? If the insured goes out of his way to
be careful, $p$ falls, and he saves
\[ u(y - \pi c) - u(y - \ell + c(1 - \pi)) \]
in utility. With full coverage the savings are nil: he doesn't reap the benefit of his actions because
the insurance company bears all of the risk. Therefore, if $d'(0) = 0$, the FONC are satisfied with
$x^* = 0$ and $c^* = \ell$, i.e. the insured takes minimal care. Insurance companies expect this and set
premiums accordingly.
This level of care is socially inefficient because the marginal cost of care is 0 when $x = 0$. If the
insured were just a little bit more careful, it would cost next to nothing, yet it would result in fewer
accidents and lower premiums. There is a breakdown in the usual argument about markets leading
to socially efficient outcomes because each consumer views the premium as exogenous even though
ultimately $\pi = p(x)$ since, in the long run, the insurance company understands what is going on.
[13] The inequality in (25.2) reflects the idea that the consumer is rationed.

25.1 Solution with No Moral Hazard
Suppose the consumer recognizes that $\pi = p(x)$ (this would be true if the insurance company could
monitor her behavior). In this case her objective is to maximize
\[ \phi(x, c) = p(x)\,u(y - \ell + c(1 - p(x))) + (1 - p(x))\,u(y - cp(x)) - d(x) \]
The FONC are
\[ \phi_c = p(x)[1 - p(x)]\,u'(y - \ell + c(1 - p(x))) - p(x)[1 - p(x)]\,u'(y - cp(x)) = 0 \implies c = \ell \]
\[ \phi_x = p'(x)[u(y - \ell + c(1 - p(x))) - u(y - cp(x))] - d'(x) - c\,p(x)\,p'(x)\,u'(y - \ell + c(1 - p(x))) - c\,p'(x)[1 - p(x)]\,u'(y - cp(x)) = 0 \tag{25.5} \]
Compare this to (25.3) and note that allowing premiums to vary according to effort gives rise to
extra terms. Now use $y - \ell + c[1 - p(x)] = y - cp(x)$ in (25.5):
\[ -p'(x)\,\ell\,u'(y - cp(x)) = d'(x) \tag{25.6} \]
which has the following interpretation: if the insured expends more effort, the cost is $d'(x)$, the
RHS. On the other hand this reduces the likelihood of an accident by $-p'(x)$, saving $\ell$ times marginal
utility of income $u'(y - cp(x))$, the LHS. The optimal level of caution is such that the marginal costs
and marginal benefits are perfectly balanced. Note that (25.6) usually implies a level of caution
greater than zero, unless $p'(x) = 0$, in which case an increase in effort doesn't reduce the likelihood
of an accident. Notice too that the optimal solution has marginal benefit of accident prevention
equal to $u'(y - cp(x))\,\ell$.
25.2 A Partial Solution
How can we incentivize the insured to expend effort avoiding accidents when her efforts aren't
rewarded with a lower premium? Look at equations (25.4) and (25.3). Assuming $\pi = p$,
\[ u'(y - \ell + c(1 - \pi)) - u'(y - \pi c) \ge 0 \tag{25.7} \]
\[ -p'(x)[u(y - \pi c) - u(y - \ell + c(1 - \pi))] = d'(x) \tag{25.8} \]
Suppose the insurance company refuses to sell full coverage ($c < \ell$). Utility in the accident state is
less than it is otherwise,
\[ u(y - \pi c) > u(y - \ell + c(1 - \pi)) \]
and there is in fact an incentive to avoid accidents. The insured prefers more coverage, because
the LHS of (25.7) is positive, but the insurance company refuses to sell any more. The insurance
company is instituting a deductible that the insured must pay in the event of an accident. The
amount of the deductible, $\ell - c$, influences the amount of care taken by the insured.
Let $a = \ell - c$. Then (25.8) becomes
\[ -p'(x)[u(y - \pi\ell + \pi a) - u(y - \pi\ell - a(1 - \pi))] = d'(x) \]
or
\[ \Delta u = u(y - \pi\ell + \pi a) - u(y - \pi\ell - a(1 - \pi)) = -\frac{d'(x)}{p'(x)} \]
See what the deductible does? It provides the insured with less income in the accident state. Now
we can show that a higher deductible causes the insured to try even harder to avoid an accident.
For example, if
\[ p(x) = \bar{p} - \theta x \]
and
\[ d(x) = x^2 \]
then optimal effort $x^*$ is such that
\[ \Delta u = \frac{2x^*}{\theta} \]
or
\[ x^* = \frac{\theta}{2}\,\Delta u \]
i.e. optimal effort increases with $\Delta u$, which in turn increases with $a$.
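The closing example can be tabulated directly. The sketch below assumes logarithmic utility, a premium of 0.05 that the consumer takes as given, and a slope of 0.01 for the accident probability; these numbers, like the function names, are illustrative assumptions rather than values from the notes.

```python
import math

def utility_gap(a, premium=0.05, income=30_000.0, loss=10_000.0, u=math.log):
    """Utility in the no-accident state minus utility in the accident state,
    for a deductible a (loss minus coverage) and an exogenous premium."""
    no_accident = income - premium * loss + premium * a
    accident = income - premium * loss - a * (1 - premium)
    return u(no_accident) - u(accident)

def optimal_effort(a, theta=0.01, **kwargs):
    """Effort solving the FONC when p(x) falls linearly with slope theta and d(x) = x^2."""
    return theta / 2 * utility_gap(a, **kwargs)

deductibles = [0.0, 1_000.0, 2_000.0, 5_000.0]
efforts = [optimal_effort(a) for a in deductibles]
for a, x in zip(deductibles, efforts):
    print(f"deductible {a:7.0f}   effort {x:.6f}")
```

A zero deductible corresponds to full coverage, where the utility gap and hence the chosen effort are zero.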
26 Uncertainty IV: The State-preference Approach and Adverse Selection
In this section we continue to consider insurance problems with the following characteristics:
• An individual with income $y$ is at risk of losing $\ell$.
• The loss occurs with probability $p$.
• The individual has a vN-M utility function $u(z)$, where $z$ denotes (net, or ultimate) income.
• The individual has access to actuarially fair insurance with a per-unit premium of $\pi = p$.
• The individual's expected utility is
\[ \phi(c) = p\,u(y - \ell + c(1 - \pi)) + (1 - p)\,u(y - \pi c) \]
With $c$ units of coverage the problem is summarized in Table 3. We now introduce a graphical way
of analyzing a problem such as this. The approach we take is called the state-preference approach,
and it applies only if $p$ is fixed. Therefore it is less useful in moral hazard style problems where $p$
varies according to the endogenous variable $x$.

Table 3: A summary of our assumptions regarding a policyholder.
State          Probability    Income                     Utility
accident       $p$            $y - \ell + c(1 - \pi)$    $u(y - \ell + c(1 - \pi))$
no accident    $1 - p$        $y - \pi c$                $u(y - \pi c)$
26.1 Setup
Think of the consumer as choosing a bundle consisting of two goods: income in the accident state
$y_A$ and income in the no-accident state $y_N$. His expected utility is then
\[ v(y_A, y_N) = p\,u(y_A) + (1 - p)\,u(y_N) \]
We can draw indifference curves in $y_A$-$y_N$ space as in Figure 26.1. Note that in this case
\[ MRS = \frac{v_1}{v_2} = \frac{p}{1 - p} \cdot \frac{u'(y_A)}{u'(y_N)} \]
Along the line $y_A = y_N$,
\[ MRS = \frac{p}{1 - p} \]

Figure 26.1: The slope of the indifference curves, on their way through the line $y_A = y_N$, is $-p/(1 - p)$.
The consumer of Table 3 is represented graphically by Figure 26.2. Note that every point on the
line through $(y - \ell, y)$ with slope $-p/(1 - p)$ has the same expected income. Why? Consider varying
$y_A$ and $y_N$ in such a way as to hold expected income constant:
\[ p\,dy_A + (1 - p)\,dy_N = 0 \]
or
\[ \frac{dy_N}{dy_A} = -\frac{p}{1 - p} \]

Figure 26.2: $E$ denotes the consumer's endowment, her incomes in each state without insurance.
The consumer has access to insurance with a per-unit premium of = p, so with c units of coverage
his incomes are
y
A
= y +c(1 )
y
N
= y c
As coverage increases, the consumer moves along a line with slope /(1 ) since each unit of
coverage raises income in the accident state by 1 and reduces income in the no-accident state
by . But = p, so we have Figure 26.3. Recall that on y
A
= y
N
, MRS = p(1 p), so, if the
123
Figure 26.3: The tangency condition is satised by full coverage.
Figure 26.4: Here > p, so the budget line is rotated about E.
consumer can buy as much insurance as he wants, then he will satisfy the tangency condition for a
constrained optimum by choosing c = .
What happens if $\pi > p$? Have a look at Figure 26.4. Starting from the endowment $(y - \ell, y)$, the budget line has slope[14] $\pi/(1 - \pi) > p/(1 - p)$, hence the optimum lies above the line $y_A = y_N$.[15] This means that when insurance companies sell policies with a load factor, consumers buy less than full coverage. Note the following:
• If $\pi$ is too high as in Figure 26.5, the consumer won't buy any coverage.
• If $\pi < p$, the consumer will over-insure as in Figure 26.6.

[14] Steepness, actually.
[15] All indifference curves have slope $-p/(1 - p)$ on the line $y_A = y_N$, and $u$ concave $\implies$ $v$ exhibits DMRS. Verify this!

Figure 26.5: Here $\pi$ is prohibitively high, so the consumer does without insurance altogether.
Figure 26.6: Here $\pi$ is so low that the consumer can afford to provide himself with more income in the accident state than he has initially, so he tends to over-insure.
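The three cases above (fair premium, loaded premium, subsidized premium) can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes logarithmic utility and illustrative numbers ($y = 100$, $\ell = 50$, $p = 0.1$), and the function names are hypothetical. Coverage is searched on a grid from zero up to twice the loss.

```python
import math

def expected_utility(c, y, loss, p, premium, u=math.log):
    """Expected utility with c units of coverage at a given per-unit premium."""
    y_accident = y - loss + c * (1.0 - premium)   # income if the loss occurs
    y_no_accident = y - premium * c               # income if it does not
    return p * u(y_accident) + (1.0 - p) * u(y_no_accident)

def optimal_coverage(y, loss, p, premium, steps=20000):
    """Grid search for the best coverage level in [0, 2 * loss]."""
    best_c = 0.0
    best_eu = expected_utility(0.0, y, loss, p, premium)
    for k in range(1, steps + 1):
        c = 2.0 * loss * k / steps
        eu = expected_utility(c, y, loss, p, premium)
        if eu > best_eu:
            best_c, best_eu = c, eu
    return best_c

if __name__ == "__main__":
    for premium in (0.10, 0.15, 0.30, 0.05):
        print(premium, optimal_coverage(100.0, 50.0, 0.10, premium))
```

Under these assumptions the fair premium gives full coverage, a moderate load gives partial coverage, a large load gives none, and a premium below $p$ pushes coverage above the loss (here up to the arbitrary grid cap).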
26.2 Adverse Selection
We now are ready to consider the case of two types of consumers: high-risk consumers with $p = p_H$ and low-risk consumers with $p = p_L$. We assume that consumers know their own type but that the insurance company cannot tell who's who. Suppose the population is half high-risk, half low-risk, and the average level of risk is therefore given by
$$\bar{p} = \frac{p_H + p_L}{2}$$
If the insurance company were to charge everyone $\pi = \bar{p}$, the low-risk consumers would buy less than full coverage and the high-risk consumers would over-insure (or buy as much as they are allowed, up to $c = \ell$).
How might equilibrium work in this case? One possibility is a signaling equilibrium in which the insurance companies offer two types of policies: one with a low premium $\pi_L = p_L$ that requires a deductible $a$, another with a high premium $\pi_H = p_H$ and no deductible. If $a$ is high enough, the high-risk consumers will self-select and opt for the high-premium policy. The deductible has to be such that they are a little worse off than they would be with a high premium and no deductible, otherwise they would masquerade as low-risk consumers.[16]

Figure 26.7: The low-risk consumer is in blue, and the high-risk consumer is in red.
In this model, which was described first by M. Rothschild and J. Stiglitz (QJE, 1976), the deductible associated with the low-risk contract serves the same purpose as extra education acquired by job seekers in M. Spence's job-market signalling model. The point is that low-risk consumers have a lesser cost of bearing risk (deductible) because they know an accident is relatively unlikely. The presence of the high-risk consumers is a problem for the low-risk consumers: if they can identify themselves credibly, they can achieve higher utility, but the only way to identify themselves credibly is by purchasing a contract that a high-risk consumer would turn down.
Note that R-S equilibrium involves firms and consumers. Firms charge actuarially fair premiums and therefore don't profit in the long run. Consumers of all types are happy to make the appropriate choices.

[16] This should remind the reader of price discrimination.
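To make the self-selection condition concrete, here is a small sketch that searches for the deductible at which a high-risk consumer is just indifferent between the two policies. It rests on assumptions that are not in the notes: log utility, illustrative numbers, and a particular reading of the deductible contract (coverage of $\ell - a$ bought at the low per-unit premium). All function names are hypothetical.

```python
import math

def deductible_contract_eu(p_own, a, y, loss, premium, u=math.log):
    """Expected utility of a consumer with accident probability p_own who
    buys coverage (loss - a) at the given per-unit premium."""
    coverage = loss - a
    y_accident = y - loss + coverage * (1.0 - premium)
    y_no_accident = y - premium * coverage
    return p_own * u(y_accident) + (1.0 - p_own) * u(y_no_accident)

def separating_deductible(y, loss, p_low, p_high, u=math.log, tol=1e-10):
    """Bisection for the deductible that leaves the high-risk type indifferent
    between the low-premium contract and full coverage at premium p_high."""
    target = u(y - p_high * loss)   # certain income under full fair coverage
    lo, hi = 0.0, loss
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if deductible_contract_eu(p_high, mid, y, loss, p_low, u) > target:
            lo = mid   # low-premium contract still tempts the high-risk type
        else:
            hi = mid
    return hi

if __name__ == "__main__":
    a = separating_deductible(100.0, 50.0, 0.1, 0.3)
    print(a, deductible_contract_eu(0.1, a, 100.0, 50.0, 0.1))
```

Any deductible slightly above the returned value makes the high-risk type strictly prefer the high-premium policy, while (in this example) the low-risk type still prefers the deductible contract.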
27 Auctions I: Types of Auctions
Many items are sold by auction, including treasury bills, broadcasting rights, real estate, livestock, fine art, and natural resources (e.g. timber lands and oil fields). Large companies and governments also use procedures that are equivalent to auctions to determine who will supply goods or services in some cases.
In this section and the next we examine how economists model auctions. Although auctions have existed for centuries, the basic theory thereof is quite modern. One good, somewhat advanced reference is Paul Klemperer, "Auction Theory: A Guide to the Literature," Journal of Economic Surveys Vol 13 (3), July 1999, which is available at http://www.nuff.ox.ac.uk/users/klemperer/survey.pdf.
27.1 Basic Types of Auction
There are four basic types of auction for a single good:
1. English Auction. Also known as an ascending bid auction, this probably is the one with which you are most familiar. An auctioneer acts as moderator, and asks for bids from a group of $n$ bidders. If a bidder bids $b_{(n)}$, and no one outbids him, then he wins the auction and pays $b_{(n)}$ in return for the good. Note that the auctioneer may be a computer. (eBay essentially is an English auction arena, although each of the auctions has a time limit, which is unusual for an English auction.)
2. Dutch Auction. Also known as a descending bid auction. The auctioneer calls out a descending sequence of prices, starting from a price that is clearly too high. The first bidder to announce that she is willing to accept the current price, $b_{(n)}$, wins the auction and pays $b_{(n)}$ in return for the good.
3. First-Price Sealed-Bid Auction. Bidders submit written bids. At a certain point the bidding is closed. The auctioneer then selects the highest bid $b_{(n)}$, which is declared the winner. The winner pays $b_{(n)}$.
4. Second-Price Sealed-Bid Auction. Also known as a Vickrey auction. Bidders again submit written bids and, at a certain point, the bidding is closed. The auctioneer then selects the highest bid $b_{(n)}$, which is declared the winner; however, the winner pays the second highest bid $b_{(n-1)}$.
Auction models differ in their assumptions regarding how the value of an item at auction varies from one person to the next, and how much the bidders know about their own potential valuations as well as those of other bidders. The value of the item to bidder $i$ will be denoted $v_i$.
We shall focus on three important cases:
1. Private Values. Each valuation is independent and known only to the bidder.
2. Common Value. $v_i = v$ for all $i$ but $v$ is unknown. (Examples might include an auction to sell the rights to drill for oil in a certain tract of land.)
3. Affiliated Values. $v_i$ varies across bidders but bidders themselves do not know their own valuations with certainty, and the valuations are positively correlated. (Examples might include an auction for a house.)
27.2 Important Results Concerning the Private Values Case
1. A Dutch auction is equivalent to a first-price sealed-bid auction.
In a Dutch auction there is no dynamic choice: one must choose an opt-in price ex ante and, if the price falls to that level, opt in and receive the good for that price. This is the same problem as deciding what bid to submit in a first-price sealed-bid auction. (We defer for the moment the optimal choice of bidding strategy in these auctions.)
2. In an English auction the optimal strategy is to keep bidding until the current highest bid $b$ exceeds your valuation $v_i$. Why?
(a) If $b > v_i$, you are advised to walk away, for otherwise, if you bid $b' > b$, then the eventual winner will pay at least $b' > v_i$.
(b) If $v_i > b$, and you walk away, you leave a surplus $v_i - b > 0$ on the table.
(c) If $b = v_i$, then bid $b + \epsilon$ and, if no one outbids you, break even.
3. In light of (2c), in an English auction the bidder with the highest valuation wins, and pays the second highest valuation (plus a marginal amount needed to surpass the second highest bidder).
4. In a second-price sealed-bid auction, the optimal strategy is to bid your valuation.
Suppose your true value is $v$ and you bid $v - x$, where $x \ge 0$. Suppose the highest bid among all other bidders is $w$.
(a) If $v - x > w$, you win and pay $w$.
(b) If $v - x < w$, you lose and pay nothing.
Your expected surplus,
$$s = (v - w)\,P(v - x > w)$$
is maximized by setting $x = 0$! Raising $x$ only gives up wins in cases where $v - x < w < v$, in which the surplus $v - w$ would have been positive.
Now suppose you bid $v + x$, where $x$ and $w$ are as before.
(a) If $v + x < w$, then you lose and pay nothing.
(b) If $v + x > w$, you win and pay $w$. Your surplus is $v - w$. There are two cases:
$v \ge w \implies$ you want to win
$v < w \implies$ you don't want to win
If you set $x = 0$, you win iff $v > w$, so $x = 0$ is the best choice.
5. Based on (4), in a second-price sealed-bid auction the winner is the bidder with the highest valuation, who then pays a price equal to the second highest valuation.
6. Results (3) and (5) imply that in the private values case an English auction is equivalent to a second-price sealed-bid auction.
7. In the common value case English auctions may be different because a bidder can make inferences based on the identity/size of the remaining pool of bidders.
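Result (4) can be illustrated by simulation. The sketch below is an illustration under assumptions of my own choosing: valuations and rival bids are drawn from Uniform(0, 1), there are four rivals, and the function names are hypothetical. Because the same random draws are reused for every bidding rule, bidding one's value can never do worse than shading the bid down or pushing it up.

```python
import random

def second_price_surplus(bid, value, rival_bids):
    """Surplus of one bidder in a second-price sealed-bid auction."""
    w = max(rival_bids)              # highest competing bid
    return value - w if bid > w else 0.0

def average_surplus(shift, n_rivals=4, trials=20000, seed=0):
    """Average surplus when the bidder always bids value + shift."""
    rng = random.Random(seed)        # same seed, so same draws for every shift
    total = 0.0
    for _ in range(trials):
        value = rng.random()
        rivals = [rng.random() for _ in range(n_rivals)]
        total += second_price_surplus(value + shift, value, rivals)
    return total / trials

if __name__ == "__main__":
    for shift in (-0.2, 0.0, 0.2):
        print(shift, average_surplus(shift))
```

Underbidding loses auctions that would have been profitable, and overbidding wins auctions at a price above the bidder's value; both lower the average.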
27.3 Bidding in a First-price Auction
How does one bid in a first-price auction? Let us start by making a few assumptions:
• The valuations $v_1, \dots, v_n$ are independent and identically distributed (IID), with $P(v_i \le x) = F(x)$ for all $i$. (This is sometimes written $v_1, \dots, v_n \overset{IID}{\sim} F$, where $F$ is the cumulative distribution function, or CDF.)
• Each bidder adopts the same strategy and bids $b_i = B(v_i)$, where $B$ is the bid function.
What does the bid function look like?
• $B$ is increasing, for otherwise the bidder with the highest valuation wouldn't necessarily win.
• $B$ increasing $\implies$ $B$ invertible $\implies$ if $b = B(v)$, then $v = g(b)$, where $g$ denotes the inverse bid function $B^{-1}$.
Assuming each bidder bids according to $B$,
$$P(\text{you win with bid } b) = P(v_j < g(b) \text{ for all other bidders } j) = [F(g(b))]^{n-1}$$
Let $s = s(b, v)$ = expected surplus given bid $b$ and valuation $v$. Then
$$s = \underbrace{(v - b)}_{\text{surplus if win}} \; \underbrace{[F(g(b))]^{n-1}}_{\text{prob of winning}}$$
What is the FONC for $b$?
$$\frac{\partial s}{\partial b} = -[F(g(b))]^{n-1} + (v - b)(n - 1)[F(g(b))]^{n-2}\,\frac{\partial}{\partial b}F(g(b))$$
$$= -[F(g(b))]^{n-1} + (n - 1)(v - b)[F(g(b))]^{n-2} f(g(b))\,g'(b)$$
where $f = F'$ is the probability density function. Setting $\partial s/\partial b = 0$,
$$[F(g(b))]^{n-1} = (n - 1)(v - b)[F(g(b))]^{n-2} f(g(b))\,g'(b)$$
which implies
$$v - b = \frac{F(g(b))}{f(g(b))} \cdot \frac{1}{g'(b)} \cdot \frac{1}{n - 1} \qquad (27.1)$$
Note that since $B' > 0$, so is $g' = 1/B'$, and therefore $v - b > 0$. This means one should always shade his or her bid. Why? If you bid more, then although you win more often you also pay more. Like a monopolist, you must take this into account.
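As an example, suppose $v_1, \dots, v_n \overset{IID}{\sim} \text{Uniform}(0, 1)$, that is, suppose the valuations are independent and identically distributed with
$$F(v) = \begin{cases} 0, & v < 0 \\ v, & 0 \le v \le 1 \\ 1, & v > 1 \end{cases}$$
and therefore
$$f(v) = \begin{cases} 1, & 0 \le v \le 1 \\ 0, & \text{otherwise} \end{cases}$$
In this case (27.1), with $v = g(b)$, says
$$g(b) - b = \frac{1}{n - 1} \cdot \frac{g(b)}{g'(b)}$$
which is a differential equation with solution
$$g(b) = \frac{n}{n - 1}\,b$$
This is easily verifiable:
$$g(b) - b = \frac{n}{n - 1}b - b = \frac{n - (n - 1)}{n - 1}b = \frac{b}{n - 1} = \frac{1}{n - 1} \cdot \frac{\frac{n}{n-1}b}{\frac{n}{n-1}} = \frac{1}{n - 1} \cdot \frac{g(b)}{g'(b)}$$
The bid function is recovered by setting $v = g(b)$ and solving
$$v = \frac{n}{n - 1}\,b$$
for $b = B(v)$,
$$B(v) = \frac{n - 1}{n}\,v$$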
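The uniform example can be checked by brute force: if every rival bids $(n-1)/n$ times a Uniform(0, 1) valuation, a grid search over one's own bid should land on the same rule. The sketch below is only an illustration; the function names and the grid size are my own choices.

```python
def win_probability(b, n):
    """Chance that bid b beats n - 1 rivals who each bid (n - 1) / n times
    an independent Uniform(0, 1) valuation."""
    g = min(1.0, n * b / (n - 1))    # inverse bid function, capped at 1
    return g ** (n - 1)

def best_response(v, n, grid=10000):
    """Grid search for the bid in [0, v] that maximizes expected surplus."""
    best_b, best_s = 0.0, 0.0
    for k in range(grid + 1):
        b = v * k / grid
        s = (v - b) * win_probability(b, n)
        if s > best_s:
            best_b, best_s = b, s
    return best_b

if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, best_response(0.8, n), (n - 1) / n * 0.8)
```

As $n$ grows, the best response approaches the bidder's own valuation: more competition means less shading.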
28 Auctions II: Winner's Curse
In this section we analyze the winner's curse. The winner's curse arises in common value and affiliated values auctions, in which each bidder estimates the value of the item at auction. Bidders with higher guesses, on average, have made a positive error. So bidders must shade their bids to compensate for the fact that when they win, on average it is the result of being overly optimistic.
A very simple example of the winner's curse is the following: a police car is to be sold at auction using a second-price sealed-bid system. Each bidder inspects the car, then the bidding begins.
The true value of the car is $v$, a random variable with mean $\mu$ and variance $\sigma_v^2$. This reflects the idea that over many auctions, the average value of an old police car is $\mu$. But there is variability from one car to the next, captured by $\sigma_v^2$.
Each bidder hires a mechanic to estimate the value of the car. The mechanic reports his or her estimated value $t_i = v + \epsilon_i$, where $\epsilon_i$ is the error in the mechanic's assessment. A bidder doesn't know the values reported to his competitors by their respective mechanics.
We assume $\epsilon_1, \dots, \epsilon_n \overset{IID}{\sim} F_\epsilon$, with mean zero and variance $\sigma_\epsilon^2$. Note that the larger is $\sigma_\epsilon^2$ relative to $\sigma_v^2$, the noisier the mechanics' reports.
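Based on $t_i$, the $i$th bidder estimates the true value of the car. In particular, the bidder forms an estimate
$$y_i = \lambda t_i + (1 - \lambda)\mu$$
The idea behind this is that if $t_i$ is significantly noisy, the bidder should downweight the mechanic's report and assume instead that the average value at auction is more credible. Note that
$$E[y_i] = \lambda E[t_i] + (1 - \lambda)\mu = \lambda E[v + \epsilon_i] + (1 - \lambda)\mu = \lambda\mu + (1 - \lambda)\mu = \mu$$
What is the optimal value of $\lambda$? The forecast error for a given value of $\lambda$ is
$$\eta_i = y_i - v$$
$$= \lambda t_i + (1 - \lambda)\mu - v$$
$$= \lambda(v + \epsilon_i) + (1 - \lambda)\mu - v$$
$$= (1 - \lambda)(\mu - v) + \lambda\epsilon_i$$
The variance of the forecast error, given that $v$ and $\epsilon_i$ are uncorrelated, is
$$V[\eta_i] = V[(1 - \lambda)(\mu - v) + \lambda\epsilon_i]$$
$$= (1 - \lambda)^2 V[v] + \lambda^2 V[\epsilon_i]$$
$$= (1 - \lambda)^2\sigma_v^2 + \lambda^2\sigma_\epsilon^2$$
We choose $\lambda$ to minimize the variance of the forecast error. The FONC is
$$\frac{\partial V[\eta_i]}{\partial\lambda} = -2(1 - \lambda)\sigma_v^2 + 2\lambda\sigma_\epsilon^2 = 0$$
which implies $(1 - \lambda)\sigma_v^2 = \lambda\sigma_\epsilon^2$, or
$$\lambda = \lambda^* = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_\epsilon^2}$$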
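As a quick numerical check of the weighting rule, the sketch below evaluates the forecast-error variance at the candidate weight and at nearby weights. The variances used ($\sigma_v^2 = 4$, $\sigma_\epsilon^2 = 1$) are illustrative assumptions, and the function names are hypothetical.

```python
def forecast_error_variance(lam, var_v, var_eps):
    """Variance of the forecast error for a given weight on the report."""
    return (1.0 - lam) ** 2 * var_v + lam ** 2 * var_eps

def optimal_weight(var_v, var_eps):
    """Signal-to-total-variance ratio."""
    return var_v / (var_v + var_eps)

if __name__ == "__main__":
    lam_star = optimal_weight(4.0, 1.0)
    for lam in (lam_star - 0.1, lam_star, lam_star + 0.1):
        print(round(lam, 2), forecast_error_variance(lam, 4.0, 1.0))
```

With these numbers the weight on the mechanic's report is 0.8, and moving the weight in either direction raises the variance.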
You may have seen this before: $\lambda^*$ is the signal-to-total-variance ratio. If $\sigma_\epsilon^2$ is small, then $\lambda^*$ is nearly one, and the result is a weighted average with more weight on the mechanic's report.
Based on the mechanic's report, plus the optimal choice of $\lambda = \lambda^*$, each bidder now has a good idea as to the value of the car in the current auction. Since it is a second-price auction, one might think each bidder simply should bid
$$y_i^* = \lambda^* t_i + (1 - \lambda^*)\mu$$
But this will give rise to a winner's curse! The highest bidder wins the auction: this is the bidder whose mechanic made the biggest positive error. He pays the amount of the second highest bid, which includes the second biggest positive error. So, even in a second-price auction, one must take into account the fact that on average the second highest bidder also was overly optimistic, and shade his or her bid.
In an auction with 10 bidders, if $\epsilon_1, \dots, \epsilon_n \overset{IID}{\sim} N(0, 1)$, i.e. if the errors are normally distributed with mean zero and variance one,[17] then the expected value of the second highest error $\epsilon_{(n-1)}$ is approximately 1.001. Table 4 lists a few other cases. As you can see, $E[\epsilon_{(n-1)}]$ grows with $n$.

Table 4: Expected Second Highest Error for Various Numbers of Bidders
$n$ | $E[\epsilon_{(n-1)}]$
10 | 1.001
25 | 1.524
35 | 1.692
100 | 2.148

This might cause you to worry a little about eBay: with thousands of bidders on a given item, if you do not know what the item is worth to you, then you ought to shade your bid quite a bit. But what should you bid?
Suppose each bidder shades his or her bid by $k$:
$$y_i = \lambda^* t_i + (1 - \lambda^*)\mu - k$$
$$= \lambda^* v + (1 - \lambda^*)\mu + \lambda^*\epsilon_i - k$$
Let $\delta = E[\epsilon_{(n-1)}]$. (This quantity usually is estimated by computer simulation.[18]) The expected bid of the second-highest bidder is
$$\lambda^* v + (1 - \lambda^*)\mu + \lambda^*\delta - k.$$
So, if everyone sets $k = \lambda^*\delta$, then each bidder should expect to pay
$$E[\lambda^* v + (1 - \lambda^*)\mu] = \mu$$
in the event he wins, which means that on average he pays what the item is worth.
Bottom line: in the second-price auction we have set up, each bidder bids what she believes the item is worth based on her information, minus a discount that is equal to the expected second-highest error in the observed signal. Note that in the real world bidders hire consultants to simulate the auction (by making educated guesses as to $\sigma_\epsilon^2$ and $\sigma_v^2$).
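The effect of the shading rule can be seen in a small simulation of the police-car auction. This is a sketch under assumptions that are not in the notes: normal values and errors, $\mu = 10$, $\sigma_v = 2$, $\sigma_\epsilon = 1$, ten bidders, and 1.001 taken from Table 4 as the expected second-highest standard normal error. The function name is hypothetical.

```python
import random
import statistics

def simulate_average_price(k, n=10, mu=10.0, sd_v=2.0, sd_eps=1.0,
                           auctions=20000, seed=1):
    """Average price paid and average true value over many second-price
    auctions in which every bidder shades the estimate by k."""
    rng = random.Random(seed)
    lam = sd_v ** 2 / (sd_v ** 2 + sd_eps ** 2)   # weight on the report
    prices, values = [], []
    for _ in range(auctions):
        v = rng.gauss(mu, sd_v)                    # true value of this car
        bids = sorted(lam * (v + rng.gauss(0.0, sd_eps)) + (1.0 - lam) * mu - k
                      for _ in range(n))
        prices.append(bids[-2])                    # winner pays second-highest bid
        values.append(v)
    return statistics.mean(prices), statistics.mean(values)

if __name__ == "__main__":
    lam = 4.0 / 5.0
    print("no shading:  ", simulate_average_price(0.0))
    print("with shading:", simulate_average_price(lam * 1.001))
```

Without shading, the average price exceeds the average value by roughly the weight times the expected second-highest error; with the shading rule the two averages line up.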
[17] See Figure 28.1.
[18] See Appendix 28.1.
Figure 28.1: Standard normal distribution with PDF in red and CDF in black.
28.1 Appendix: Order Statistics
Let $X_1, \dots, X_n \overset{IID}{\sim} F_X$, and let $Y_k = X_{(n-k+1)}$ denote the $k$th biggest observation, $1 \le k \le n$. Then
$$F_{Y_2}(x) = P(\text{exactly one observation is greater than } x) + P(\text{all observations are at most } x)$$
$$= n[1 - F_X(x)][F_X(x)]^{n-1} + [F_X(x)]^n$$
$$= n[F_X(x)]^{n-1} - n[F_X(x)]^n + [F_X(x)]^n$$
$$= n[F_X(x)]^{n-1} - (n - 1)[F_X(x)]^n$$
and therefore
$$f_{Y_2}(x) = F'_{Y_2}(x)$$
$$= n(n - 1)[F_X(x)]^{n-2} f_X(x) - n(n - 1)[F_X(x)]^{n-1} f_X(x)$$
$$= n(n - 1)[F_X(x)]^{n-2} f_X(x) S_X(x)$$
where $S_X(x)$ is defined to be $1 - F_X(x)$. Then
$$E[Y_2] = n(n - 1)\int_R x[F_X(x)]^{n-2} f_X(x) S_X(x)\,dx$$
28.1.1 Uniform Distribution
If $X_1, \dots, X_n \overset{IID}{\sim} \text{Uniform}(0, 1)$, then
$$E[Y_2] = n(n - 1)\int_0^1 x^{n-1}(1 - x)\,dx$$
$$= n(n - 1)\left[\frac{1}{n} - \frac{1}{n + 1}\right] = \frac{n - 1}{n + 1}$$
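The closed form for the uniform case is easy to check by simulation. The following sketch is an illustration only; the function name, the number of draws and the seed are arbitrary choices.

```python
import random

def second_highest_uniform_mean(n, trials=20000, seed=2):
    """Monte Carlo estimate of the expected second-largest of n
    independent Uniform(0, 1) draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        draws = sorted(rng.random() for _ in range(n))
        total += draws[-2]           # second-largest observation
    return total / trials

if __name__ == "__main__":
    for n in (2, 4, 10):
        print(n, second_highest_uniform_mean(n), (n - 1) / (n + 1))
```

The simulated averages should sit close to $(n-1)/(n+1)$, up to sampling noise.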
28.1.2 Normal Distribution
If $X_1, \dots, X_n \overset{IID}{\sim} N(0, 1)$, then
$$E[Y_2] = n(n - 1)\int_R x[\Phi(x)]^{n-2}[1 - \Phi(x)]\phi(x)\,dx$$
where $\Phi$ is the standard normal CDF and
$$\phi(x) = \Phi'(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$$
This can be computed by Gaussian quadrature with the following R[19] function:
ESB.gq = function(n, CDF, PDF){
# - computes approx. EV of 2nd biggest observation among n IID draws from dist. "CDF"
# - corresponding density is "PDF"
f = function(x){x*n*(n-1)*(CDF(x))^(n-2)*PDF(x)*(1 - CDF(x))}
# - "f" is density of 2nd biggest observation
integrate(f, -Inf, Inf)$value
}
As a check, we may approximate $E[Y_2]$ for $n = 100$ by simulation. The following R script returns $E[Y_2] \approx 2.148444$, which agrees fairly well with the previous result.
ESB.sim = function(n,B){
# - computes EV of 2nd biggest observation among n IID draws from standard normal
x = 0
for(i in 1:B) x = x + sort(rnorm(n))[n-1]
x/B
}
print(ESB.sim(100,1e06))
[19] R is a programming language for statistical computing that is available free of charge at http://www.r-project.org.
29 Finance I: Capital Asset Pricing Model
In this section we consider the implications of the simple assumption that investors hold only portfolios of assets that are mean-variance efficient. This turns out to have the surprising consequence that the price of a stock, i.e. the price of a share in a publicly traded company, depends on the covariance between the return on the stock and the return on the market as a whole. This result was discovered in the early 1960s by William Sharpe, who shared the Nobel Prize in Economics on the basis of his work. The theoretical model is called the Capital Asset Pricing Model (CAPM).
29.1 Assumptions
We shall assume that investors can choose among a set of assets, $i = 1, \dots, n$. An investor with $x$ to invest selects a portfolio, or in other words a list of the amount invested in each of the possible assets. Denote by $\omega_i$ the share of $x$ in asset $i$. Note that $\omega_i \ge 0$ for all $i$, and $\sum_{i=1}^n \omega_i = 1$. You can think of any vector $\omega \in R^n$ with non-negative elements that add up to one as a portfolio.
We shall assume for simplicity that our investor has a one-period horizon, or holding period. An investment of $1 in asset $i$ will be worth $1 + R_i$ at the end of the holding period. Thus $R_i$ is the proportional return on asset $i$ over the holding period. Note that $R_i \ge -1$ if assets have limited liability, for in that case the worst event would be to lose one's entire investment. $R_1, \dots, R_n$ are random variables. The mean of $R_i$ is $r_i = E[R_i]$, and the variance is $\sigma_i^2$. Asset $i$ is said to be riskier than asset $j$ if $\sigma_i^2 > \sigma_j^2$. We do not assume that the returns on the respective assets are independent. Instead, we assume there to be potential covariances
$$\sigma_{ij}^2 = Cov[R_i, R_j] = E[(R_i - r_i)(R_j - r_j)]$$
Note that the covariance of the return on asset $i$, with itself, is
$$E[(R_i - r_i)^2] = V[R_i] = \sigma_i^2$$
so we shall occasionally write $\sigma_{ii}^2$ instead of $\sigma_i^2$.
Recall[20] the following: Given two random variables $X$ and $Y$, and observations $(X_1, Y_1), \dots, (X_n, Y_n)$, if you are interested in a line of best fit revealing the dependence of $Y$ upon $X$, then you carry out a linear regression of $Y$ on $X$. This procedure returns the least squares estimates $\hat{\alpha}$ and $\hat{\beta}$ in the linear model
$$Y_i = \alpha + \beta X_i + \epsilon_i$$
where $\epsilon_1, \dots, \epsilon_n$ are the residual errors. The optimal coefficients are found by minimizing the residual sum of squares
$$\sum_{i=1}^n \epsilon_i^2 = \sum_{i=1}^n (\alpha + \beta X_i - Y_i)^2$$
This can be done by the same method we used in Section 28 to analyze the winner's curse, and gives
$$\hat{\beta} = \frac{Cov[X, Y]}{V[X]} = \frac{\sigma_{XY}^2}{\sigma_X^2}$$

[20] If the reader has not taken econometrics or statistics, then this may not look familiar.
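Now if the investor with $x$ selects a portfolio $\omega = (\omega_1, \dots, \omega_n)$, then how much will he have at the end of the holding period?
• The total amount invested in asset $i$ is $\omega_i x$.
• At the end of the holding period this investment is worth $\omega_i x(1 + R_i)$.
• The investor now has
$$\sum_{i=1}^n \omega_i x(1 + R_i) = x\left[\sum_{i=1}^n \omega_i + \sum_{i=1}^n \omega_i R_i\right] = x\left[1 + \sum_{i=1}^n \omega_i R_i\right]$$
The return on the portfolio is
$$R = \frac{x\left(1 + \sum_{i=1}^n \omega_i R_i\right) - x}{x} = \sum_{i=1}^n \omega_i R_i$$
which is a weighted average of the returns on the respective assets, with each weight equal to the corresponding share. The expected return is
$$E[R] = E\left[\sum_{i=1}^n \omega_i R_i\right] = \sum_{i=1}^n \omega_i r_i$$
What is the variance of $R$?
$$V[R] = E\left[\left(\sum_{i=1}^n \omega_i R_i - \sum_{i=1}^n \omega_i r_i\right)^2\right]$$
$$= E\left[\left(\sum_{i=1}^n \omega_i (R_i - r_i)\right)^2\right]$$
$$= E\left[\sum_{i=1}^n \sum_{j=1}^n \omega_i\omega_j (R_i - r_i)(R_j - r_j)\right]$$
$$= \sum_{i=1}^n \sum_{j=1}^n \omega_i\omega_j E[(R_i - r_i)(R_j - r_j)]$$
$$= \sum_{i=1}^n \sum_{j=1}^n \omega_i\omega_j\sigma_{ij}^2$$
$$= \sum_{i=1}^n \omega_i^2\sigma_{ii}^2 + \sum_{1 \le i \le n}\;\sum_{\substack{1 \le j \le n \\ j \ne i}} \omega_i\omega_j\sigma_{ij}^2$$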
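The expected return and variance formulas translate directly into code. The sketch below uses hypothetical function names and made-up numbers for two assets; the covariance matrix is entered as a nested list whose $(i, j)$ entry plays the role of $\sigma_{ij}^2$.

```python
def portfolio_mean(weights, means):
    """Expected portfolio return: the share-weighted average of asset means."""
    return sum(w * r for w, r in zip(weights, means))

def portfolio_variance(weights, cov):
    """Portfolio variance: the double sum of w_i * w_j * cov[i][j]."""
    n = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j]
               for i in range(n) for j in range(n))

if __name__ == "__main__":
    weights = [0.5, 0.5]
    means = [0.05, 0.10]
    cov = [[0.04, 0.01],
           [0.01, 0.09]]
    print(portfolio_mean(weights, means), portfolio_variance(weights, cov))
```

In this example the variance (0.0375) is below the variance of either asset held alone, which is the diversification effect that the covariance terms capture.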
The CAPM hinges on two assumptions:
1. There exists a risk-free asset; call it asset 1.
2. The market portfolio (the portfolio consisting of equal amounts of all shares) is mean-variance efficient in the sense that no other portfolio realizes the same expected return but with lesser variance.
29.2 Conclusion
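Consider an efficient portfolio $\omega^P = (\omega_1^P, \dots, \omega_n^P)$, with return $R_P = \sum_{i=1}^n \omega_i^P R_i$, and which has the least possible variance subject to yielding an expected rate of return of $r_P = E[R_P]$. Such a portfolio solves
$$\min_{\omega} \sum_{i=1}^n \sum_{j=1}^n \omega_i\omega_j\sigma_{ij}^2 \quad \text{s.t.} \quad \sum_{i=1}^n \omega_i r_i = r_P \quad \text{and} \quad \sum_{i=1}^n \omega_i = 1$$
The Lagrangian is
$$L(\omega, \lambda, \mu) = \sum_{i=1}^n \sum_{j=1}^n \omega_i\omega_j\sigma_{ij}^2 - \lambda\left[\sum_{i=1}^n \omega_i r_i - r_P\right] - \mu\left[\sum_{i=1}^n \omega_i - 1\right]$$
FONC w.r.t. $\omega$:
$$\frac{\partial L}{\partial \omega_j^P} = 2\sum_{i=1}^n \omega_i^P\sigma_{ij}^2 - \lambda r_j - \mu = 0, \quad 1 \le j \le n \qquad (29.1)$$
In particular, (29.1) holds for asset 1; however, asset 1 is risk-free by assumption, and therefore $\sigma_{1i}^2 = 0$ for all $i$. Thus, for asset 1, (29.1) implies
$$\mu = -\lambda r_1 = -\lambda r,$$
where $r$ denotes the risk-free rate of return. Hence we can rewrite (29.1) as follows: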
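$$2\sum_{i=1}^n \omega_i^P\sigma_{ij}^2 = \lambda(r_j - r), \quad 1 \le j \le n \qquad (29.2)$$
Multiplying by $\omega_j^P$,
$$2\omega_j^P\sum_{i=1}^n \omega_i^P\sigma_{ij}^2 = \lambda\omega_j^P(r_j - r), \quad 1 \le j \le n$$
and summing across all assets,
$$2\sum_{j=1}^n \omega_j^P\sum_{i=1}^n \omega_i^P\sigma_{ij}^2 = \lambda\sum_{j=1}^n \omega_j^P(r_j - r)$$
which, since the shares sum to one, is equivalent to
$$2\sum_{j=1}^n \sum_{i=1}^n \omega_i^P\omega_j^P\sigma_{ij}^2 = \lambda\left[\sum_{j=1}^n \omega_j^P r_j - r\right]$$
or
$$2\sum_{j=1}^n \sum_{i=1}^n \omega_i^P\omega_j^P\sigma_{ij}^2 = \lambda(r_P - r) \qquad (29.3)$$
since $\sum_{j=1}^n \omega_j^P r_j = r_P$. Let $\sigma_P^2 = \sum_{j=1}^n \sum_{i=1}^n \omega_i^P\omega_j^P\sigma_{ij}^2$ denote the minimum variance of the portfolio with expected return $r_P$. Then (29.3) says $2\sigma_P^2 = \lambda(r_P - r)$, or
$$\lambda = \frac{2\sigma_P^2}{r_P - r}$$
Plugging the result back into (29.2),
$$r_j - r = (r_P - r)\,\frac{\sum_{i=1}^n \omega_i^P\sigma_{ij}^2}{\sigma_P^2} \qquad (29.4)$$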
Finally, notice that
$$\sum_{i=1}^n \omega_i^P\sigma_{ij}^2 = E\left[(R_j - r_j)\sum_{i=1}^n \omega_i^P(R_i - r_i)\right] = Cov\left[R_j, \sum_{i=1}^n \omega_i^P R_i\right] = Cov[R_j, R_P]$$
In other words, $\sum_{i=1}^n \omega_i^P\sigma_{ij}^2$ is the covariance of the return on asset $j$, with the return on the efficient portfolio that has expected return $r_P$. We can rewrite (29.4) as follows:
$$r_j - r = (r_P - r)\,\frac{Cov[R_j, R_P]}{\sigma_P^2} \qquad (29.5)$$
Suppose now that (29.5) holds for the return on the market portfolio, $R_M$. Let $\sigma_M^2$ denote $V[R_M]$. Then it follows by (29.5) that
$$r_j = r + \beta(r_M - r)$$
where
$$\beta = \frac{Cov[R_j, R_M]}{\sigma_M^2}$$
is the regression coefficient one would get by carrying out linear regression of the return on asset $j$, on the return on the market portfolio. This regression coefficient is called the asset's beta.
29.3 Summary
An asset's beta measures the amount of systematic risk the asset carries. E.g. if $\beta > 1$, then the asset is expected to outperform the market in good times and perform worse than the market in bad times. This is risky! For an asset with $\beta = 1.5$, in a market with $r = 0.05$ and $r_M = 0.13$, an expected rate of return of
$$0.05 + 1.5 \times (0.13 - 0.05) = 0.17$$
would be considered fair compensation for assuming the risk associated with this asset.
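The beta calculation and the pricing formula can be sketched in a few lines. The return series below are made up for illustration, and the function names are hypothetical; the beta is computed as the sample covariance with the market divided by the sample variance of the market.

```python
def beta(asset_returns, market_returns):
    """Sample regression slope of asset returns on market returns."""
    n = len(market_returns)
    mean_a = sum(asset_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((a - mean_a) * (m - mean_m)
              for a, m in zip(asset_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    return cov / var

def capm_expected_return(risk_free, market_return, b):
    """Expected return implied by the CAPM for an asset with beta b."""
    return risk_free + b * (market_return - risk_free)

if __name__ == "__main__":
    market = [0.01, -0.02, 0.03, 0.00]
    asset = [0.02, -0.04, 0.06, 0.00]    # moves twice as much as the market
    print(beta(asset, market))
    print(capm_expected_return(0.05, 0.13, 1.5))
```

The second line of output reproduces the worked example in the text: a beta of 1.5 with $r = 0.05$ and $r_M = 0.13$ gives an expected return of 0.17.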
30 Finance II: Efficient Market Hypothesis
30.1 Review
In Section 29 we considered the implications of the CAPM assumptions with respect to the return on a risky asset $i$. In particular, if asset $i$ has random return $R_i$ and beta $\beta$ (i.e. if, on average, when $R_M$ is 1% higher/lower than usual, then $R_i$ is $\beta$% higher/lower than usual), then $r_i = E[R_i]$ is required to satisfy
$$r_i = \gamma_0 + \gamma_1\beta \qquad (30.1)$$
where, in the CAPM, $\gamma_0 = r$ is the risk-free rate of return and $\gamma_1 = r_M - r$ is the excess return on the market portfolio.
Researchers in the 1970s and '80s attempted to test the CAPM by estimating betas for classes of assets and checking whether, on average, assets with bigger betas had higher expected returns. This was not always successful. These days, economists interpret the model less literally. Often, they augment the model with additional factors. The original CAPM says that all one needs to know about an asset is its covariance with the market. A more agnostic view is that while beta matters, there may be additional considerations. So it is common to see models such as the following:
$$r_i = \gamma_0 + \gamma_1\beta + \gamma_2\theta$$
where $\theta$ is some other factor. A typical factor is one that reflects how a given asset covaries with a portfolio comprised of small stocks, or with a portfolio made up of bonds as opposed to stocks.
One key use of (30.1) is the determination of how to discount returns on different assets. For example, if we are dealing with an asset that is priced at $P_0$ in the current period, and will sell for $P_1$ in the next, then the expected return on the asset is $(E[P_1] - P_0)/P_0$. If the asset has beta equal to $\beta$, then under the CAPM the expected return is $\gamma_0 + \gamma_1\beta$. Thus we have
$$\frac{E[P_1]}{P_0} - 1 = \gamma_0 + \gamma_1\beta \implies P_0 = \frac{E[P_1]}{1 + \gamma_0 + \gamma_1\beta}$$
This very simple equation has many immediate implications.
30.2 Efficient Market Hypothesis
Consider a very short holding period, e.g. one week. The discounting is negligible, thereby giving
$$P_0 = E[P_1] \qquad (30.2)$$
This means the price today has to be the expected value of the price next week. A stochastic process is simply a sequence of random variables $X_1, X_2, \dots$
Examples:
• The height of the Nile River at a given location on June 1, some year onward.
• The closing price of the S&P 500 on Friday, some week onward.
Roughly speaking, a random walk is a stochastic process with the additional property that
$$E[X_t \mid X_{t-1}, \dots, X_1] = X_{t-1}$$
i.e. the best forecast of the value in the next period is the current value. So people sometimes say that asset prices constitute a random walk.
Equation (30.2) is sometimes called the efficient market hypothesis (EMH). The key insight in this equation is that all the information we have to forecast the value of the asset tomorrow is factored into the current value. As an example, suppose you think a share of Google stock will be worth $x$ in six weeks. Then you should be willing to pay almost $x$ for the stock now (subject to the discount factor only).
Suppose (30.2) holds for a stock. The realized gain from buying the stock today and selling it in the next period is
$$P_1 - P_0 = P_1 - E[P_1]$$
But the deviation of a random variable from its expected value is unpredictable. This means that techniques such as drawing charts, etc. (so-called "technical analysis") cannot work!
Suppose new information is revealed about an asset. Take, for example, news concerning a drug company such as Merck: the news could be regarding a prospective drug that is being evaluated in a randomized clinical trial, or the discovery of side effects associated with an existing drug, or a decision by the FDA, etc. Equation (30.2) says the news, whatever it may be, should cause an instantaneous adjustment of the stock price, up or down. Likewise, news with implications for the entire economy, e.g. the results of a Federal Reserve Open Market Committee meeting, should cause the market as a whole to adjust, up or down, instantaneously, as people adjust their expectations.
This leads to the idea of an event study. If one is trying to evaluate the effect of news on the value of a firm, one looks at the excess returns on the firm's stock:
$$XR_t = \frac{P_t - P_{t-1}}{P_{t-1}} - \frac{M_t - M_{t-1}}{M_{t-1}}$$
where $P_t$ is the value of a share in the firm under consideration, at closing on day $t$, and $M_t$ is the value of the market index at the same time.
The cumulative excess return is the sum of the excess returns over some horizon:
$$CXR_t = \sum_{i=1}^t XR_i$$
where period zero is some time prior to the breaking of the news, e.g. 7-14 days beforehand. One then plots $CXR_t$, $1 \le t \le T$, where $T$ is several days after the breaking of the news. Ideally one should observe random fluctuations before and after the news, with a jump on the day of the news.[21]
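The excess return and cumulative excess return series are straightforward to compute from daily closing values. The sketch below uses hypothetical function names and a made-up price series in which the news arrives on the second day and the market index is flat.

```python
def simple_returns(levels):
    """Period-over-period proportional changes of a series of closing values."""
    return [(later - earlier) / earlier
            for earlier, later in zip(levels, levels[1:])]

def cumulative_excess_returns(prices, index_levels):
    """Running sum of (stock return - market return), starting from day 1."""
    stock = simple_returns(prices)
    market = simple_returns(index_levels)
    cxr, running = [], 0.0
    for r_stock, r_market in zip(stock, market):
        running += r_stock - r_market
        cxr.append(running)
    return cxr

if __name__ == "__main__":
    prices = [100.0, 100.0, 110.0, 110.0]      # closing prices, day 0 onward
    index = [1000.0, 1000.0, 1000.0, 1000.0]   # flat market index
    print(cumulative_excess_returns(prices, index))
```

Plotting the returned list against the day number gives the kind of picture described above: flat before the news, a jump on the event day, and flat afterwards.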
An implication of the EMH is that on average there is no advantage to following the suggestions of advisors (at least, adjusting for the excess risk of the portfolios they recommend). It is widely believed that the high returns reported by some funds in some periods are merely strings of luck. Tables 5-7 (drawn from a paper by Burton Malkiel, "The Efficient Market Hypothesis and Its Critics," Journal of Economic Perspectives, Winter 2003) demonstrate this idea.

[21] A good reference on the topic is A. Craig MacKinlay, "Event Studies in Economics and Finance," Journal of Economic Literature, March 1997.

Figure 30.1: Plot of cumulative abnormal return for earning announcements from event day -20 to event day 20. The abnormal return is calculated using the market model as the normal return measure. Source: MacKinlay (1997).
Table 5: Percentage of Large Capitalization Equity Funds Outperformed by Index Ending 6/30/2002
Comparison | 1 year | 3 years | 5 years | 10 years
S&P 500 vs. Large Cap Equity Funds | 63% | 56% | 70% | 79%
Wilshire 5000 vs. Large Cap Equity Funds | 72% | 64% | 69% | 74%
Note: All large capitalization mutual funds in existence are covered with the exception of sector funds and funds investing in foreign securities.
Source: Lipper Analytic Services.
Table 6: Median Total Returns Ending 12/31/2001
Category | 10 years | 15 years | 20 years
Large Cap Equity Funds | 10.98% | 11.95% | 13.42%
S&P 500 Index | 12.94% | 13.74% | 15.24%
Source: Lipper Analytic Services, Wilshire Associates, Standard & Poor's and The Vanguard Group.
Table 7: Getting Burned by Hot Funds
Fund Name | 1998-1999 Rank | 1998-1999 Average Annual Return | 2000-2001 Rank | 2000-2001 Average Annual Return
Van Wagoner:Emrg Growth | 1 | 105.52 | 1106 | -43.54
Rydex:OTC Fund;Inv | 2 | 93.43 | 1103 | -36.31
TCW Galileo:AGr Eq;Instl | 3 | 92.78 | 1098 | -34.00
RS Inv:Emrg Growth | 4 | 90.19 | 1055 | -26.17
PBHG:Large Cap 20 | 5 | 84.56 | 1078 | -29.03
Janus Olympus Fund | 6 | 77.24 | 1061 | -27.03
Van Kampen Aggr Gro;A | 7 | 76.70 | 1067 | -28.04
Janus Mercury | 8 | 76.31 | 1057 | -26.35
PBHG:Sel Equity | 9 | 76.21 | 1097 | -33.19
WM:Growth;A | 10 | 74.77 | 1046 | -25.82
Berger new Generation;Inv | 11 | 73.31 | 1107 | -45.96
Janus Enterprise | 12 | 72.28 | 1101 | -35.40
Janus Venture | 13 | 72.22 | 1091 | -30.89
Fidelity Aggr Growth | 14 | 70.56 | 1105 | -38.02
Janus Twenty | 15 | 69.09 | 1090 | -30.83
Amer Cent:New Oppty | 16 | 67.64 | 1033 | -24.11
Morg Stan Sm Cap Gro;B | 17 | 66.59 | 1102 | -35.96
Van Kampen Emrg Gro;A | 18 | 65.67 | 1021 | -22.70
TCW Galileo:SC Gro;Instl | 19 | 64.87 | 1099 | -34.77
Black Rock:Md Cap Gro;Instl | 20 | 64.44 | 1009 | -22.18
Average Fund Return | | 76.72 | | -31.52
S&P 500 Return | | 24.75 | | -10.50
Source: Analytic Services and Bogle Research Institute, Valley Forge, PA.
31 Public and Near-public Goods
A pure public good is one such as public radio, with two properties:
1. The amount of the good consumed by one person has no effect on its availability to others. (This is called the no rivalry, or no congestion condition.)
2. A person cannot be prevented from consuming the good. (This is called the non-exclusionary condition.)
Sometimes condition (1) is true while (2) is not. This is arguably the case with intellectual property distributed via the internet. (If I download a song or a software program, my use does not affect anyone else's use.)
Additional examples of near public goods:
• parks and wildlife reserves (although in some cases these can become congested)
• national defense.
There are many goods/services that are widely thought of as public goods yet really aren't, e.g. schools, which are subject to congestion and also are excludable.
31.1 Optimal Provision of Goods with No-rivalry Characteristics
Consider a public good which comes in various amounts. Let $x$ be the amount provided, at a cost of $p$ dollars per unit.
An economy has $n$ consumers, $i = 1, \dots, n$. Consumer $i$ has income $y_i$, and pays a tax $t_i$ toward the purchase of the public good. Additionally, consumer $i$ has utility given by $u^i(c_i, x) = u^i(y_i - t_i, x)$.
31.1.1 Case 1: one consumer; x = t
1
/p.
The objective is
$$\max_{t_1} \; u^1(y_1 - t_1,\; t_1/p)$$
FONC:
$$-u^1_c(y_1 - t_1,\, t_1/p) + \frac{1}{p}\, u^1_x(y_1 - t_1,\, t_1/p) = 0 \;\Longrightarrow\; \frac{u^1_x(y_1 - t_1,\, t_1/p)}{u^1_c(y_1 - t_1,\, t_1/p)} = p$$
i.e. $MRS_1(y_1 - t_1,\, t_1/p) = p$. Recall that $MRS_1$ is consumer 1's willingness to pay for the last unit of the public good $x$, in units of consumption $c$, or dollars.
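The one-consumer condition can be checked numerically. The sketch below assumes a log utility $u(c, x) = \ln c + a \ln x$, which is not in the notes; the taste parameter `a` and all numbers are hypothetical. Under that assumption the FONC has the closed-form solution $t_1 = a y_1/(1 + a)$, and the code confirms that the MRS at that tax equals $p$.

```python
def optimal_tax(y: float, a: float) -> float:
    """Tax that maximizes ln(y - t) + a*ln(t/p); under log utility it does not depend on p."""
    return a * y / (1.0 + a)


def mrs(c: float, x: float, a: float) -> float:
    """MRS = u_x / u_c for the assumed utility u(c, x) = ln(c) + a*ln(x)."""
    u_x = a / x
    u_c = 1.0 / c
    return u_x / u_c


# Hypothetical income, unit cost of the public good, and taste parameter.
y, p, a = 100.0, 4.0, 0.25

t_star = optimal_tax(y, a)       # tax paid by the single consumer
x_star = t_star / p              # amount of the public good purchased
c_star = y - t_star              # private consumption left over

print("tax:", t_star, "public good:", x_star, "MRS:", mrs(c_star, x_star, a))
```

With a different utility function the closed form changes, but the check (MRS evaluated at the optimum equals $p$) stays the same.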
31.1.2 Case 2: two consumers; $x = (t_1 + t_2)/p$.
The objective is:
$$\max_{t_1,\, t_2} \; u^1\big(y_1 - t_1,\; (t_1 + t_2)/p\big) \quad \text{s.t.} \quad u^2\big(y_2 - t_2,\; (t_1 + t_2)/p\big) \ge k_2$$
Why? A social optimum must maximize consumer 1's utility subject to consumer 2's current utility. Such an outcome is called Pareto optimal. (If Pareto optimality fails to hold, then we could reallocate resources with the end result that both consumers are better off.) Varying $k_2$ traces out a full range of potential social optima.
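The Lagrangian is:
$$\mathcal{L}(t_1, t_2, \lambda; k_2) = u^1\big(y_1 - t_1,\; (t_1 + t_2)/p\big) + \lambda \left[ u^2\big(y_2 - t_2,\; (t_1 + t_2)/p\big) - k_2 \right]$$
(In what follows we shall occasionally omit functional dependencies for the sake of notational simplicity.) Let $v_1$ be the maximum value of $u^1$ s.t. $u^2 \ge k_2$. We know by the Envelope Theorem that $\partial v_1/\partial k_2 = \partial \mathcal{L}/\partial k_2 = -\lambda$. Raising $k_2$ tightens the constraint and so lowers $v_1$, so $\lambda > 0$. A higher value of $\lambda$ assigns greater weight to consumer 2's outcome.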
FONC:
$$\mathcal{L}_{t_1} = -u^1_c + \frac{1}{p}\, u^1_x + \frac{\lambda}{p}\, u^2_x = 0$$
$$\mathcal{L}_{t_2} = -\lambda u^2_c + \frac{1}{p}\, u^1_x + \frac{\lambda}{p}\, u^2_x = 0$$
Note that the second and third terms in each of the above equations are the same, so we get
$$u^1_c = \lambda u^2_c,$$
or $\lambda = u^1_c/u^2_c$. The intuition behind this is that the social planner can rearrange taxes on consumers 1 and 2 while keeping $x$ constant. If consumer 1 pays one less tax dollar, his utility increases by $u^1_c$. Likewise, if consumer 2 pays one less tax dollar, her utility increases by $u^2_c$. At the optimum, a gain of one unit in consumer 2's utility corresponds to a gain of $\lambda$ in consumer 1's utility.
The first of the FONC can be rewritten as follows:
$$u^1_c = \frac{1}{p}\, u^1_x + \frac{\lambda}{p}\, u^2_x = \frac{1}{p}\, u^1_x + \frac{u^1_c/u^2_c}{p}\, u^2_x = \frac{1}{p} \left[ u^1_x + u^1_c \left( \frac{u^2_x}{u^2_c} \right) \right]$$
$$\Longrightarrow\; \frac{u^1_x}{u^1_c} + \frac{u^2_x}{u^2_c} = p$$
or
$$MRS_1 + MRS_2 = p$$
This means the optimal choice of $x$ has the property that $p$ equals the aggregate willingness to pay!
31.1.3 Case 3: $n$ consumers; $x = \tau/p$, where $\tau = \sum_{i=1}^{n} t_i$.
The objective is
$$\max_{t_1, \dots, t_n} \; u^1(y_1 - t_1,\; \tau/p) \quad \text{s.t.} \quad \begin{cases} u^2(y_2 - t_2,\; \tau/p) \ge k_2 \\ u^3(y_3 - t_3,\; \tau/p) \ge k_3 \\ \quad \vdots \\ u^n(y_n - t_n,\; \tau/p) \ge k_n \end{cases}$$
This is the $n$-consumer version of Pareto optimality. The optimal choice of taxes is the one that maximizes consumer 1's utility subject to minimum levels of utility for the other $n - 1$ consumers.
The Lagrangian is
$$\mathcal{L} = u^1(y_1 - t_1,\; \tau/p) + \sum_{i=2}^{n} \lambda_i \left[ u^i(y_i - t_i,\; \tau/p) - k_i \right].$$
For convenience define $\lambda_1 = 1$ and $k_1 = 0$. Then
$$\mathcal{L} = \sum_{i=1}^{n} \lambda_i \left[ u^i(y_i - t_i,\; \tau/p) - k_i \right]$$
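FONC:
$$\mathcal{L}_{t_i} = -\lambda_i u^i_c + \frac{1}{p} \sum_{j=1}^{n} \lambda_j u^j_x = 0, \quad 1 \le i \le n$$
Note that the sum is constant with respect to $i$, so we must have
$$\lambda_1 u^1_c = \lambda_2 u^2_c = \cdots = \lambda_n u^n_c$$
In particular, since $\lambda_1 = 1$,
$$u^1_c = \lambda_i u^i_c, \quad 2 \le i \le n$$
and thus
$$\lambda_i = \frac{u^1_c}{u^i_c}, \quad 2 \le i \le n$$
Putting the last result back into the first of the FONC gives
$$u^1_c = \frac{1}{p} \sum_{i=1}^{n} \left( \frac{u^1_c}{u^i_c} \right) u^i_x$$
Dividing by $u^1_c$ and multiplying by $p$, we see that
$$p = \sum_{i=1}^{n} \frac{u^i_x}{u^i_c} = \sum_{i=1}^{n} MRS_i$$
As in the case of two consumers, $p$ equals the aggregate willingness to pay.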
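A small numerical illustration of the condition $p = \sum_i MRS_i$ follows. It is a sketch under an assumption that is not in the notes: quasi-linear utilities $u^i = c_i + a_i \ln x$, for which $MRS_i = a_i/x$ and any interior Pareto optimum maximizes total surplus $\sum_i a_i \ln x - p x$. The taste parameters and the price are hypothetical.

```python
import math


def samuelson_quantity(tastes, p):
    """Quantity x solving sum_i a_i / x = p, i.e. sum of MRS equals marginal cost."""
    return sum(tastes) / p


def sum_of_mrs(tastes, x):
    """Aggregate willingness to pay for one more unit of x when MRS_i = a_i / x."""
    return sum(a / x for a in tastes)


def total_surplus(tastes, p, x):
    """Sum of the consumers' benefits from x minus the cost p*x of providing it."""
    return sum(a * math.log(x) for a in tastes) - p * x


tastes = [1.0, 2.0, 3.0]   # hypothetical a_i for three consumers
p = 2.0                    # hypothetical cost per unit of the public good

x_star = samuelson_quantity(tastes, p)

# Cross-check by brute force: search a grid of candidate quantities.
grid = [k / 100.0 for k in range(1, 1001)]
x_grid = max(grid, key=lambda x: total_surplus(tastes, p, x))

print("Samuelson x:", x_star)
print("sum of MRS at x:", sum_of_mrs(tastes, x_star))
print("grid-search x:", x_grid)
```

The grid search knows nothing about the first-order condition, so agreement between the two quantities is an independent check on the algebra.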
Implications:
• For a non-rivalrous good, the optimal provision of the good has the property that the marginal cost $p$ equals the aggregate willingness to pay. This is called the Samuelson condition because it was derived by the great American economist Paul Samuelson in 1954.
• A simple market mechanism will not necessarily achieve the optimality condition. With non-excludable goods, in fact, it is hard to see why anyone is willing to contribute voluntarily (although people do). Thus, the provision of pure public goods usually is left to political mechanisms.
• With excludable goods such as proprietary software, a per-user fee may be reasonable. Note that the producer receives the sum of the user fees.
• For questions such as how much to invest in wilderness areas, some suggest polling the public and asking how much people would be willing to pay to expand/protect the wilderness versus selling it off. This practice is controversial because it's unclear whether those polled understand the questions, or tell the truth. Moreover, goods such as wilderness areas are valued in a passive way since most people never will experience them first hand. Unlike ordinary consumer goods, there is no observable behavior that can be traced back to a person's willingness to pay. Despite these issues, this method, known as contingent valuation, was used to value the environmental damage (or lost passive use) caused by the Exxon Valdez oil spill.
31.2 Appendix: Social Optimum with Ordinary Goods
You may be wondering how the idea of a social optimum works with ordinary goods. Let's consider the decision of how to allocate an ordinary good $x$. The government collects a tax $t_i$ from the $i$th consumer, and allocates to the consumer $x_i$ units of the good. The budget constraint for the government in this case is $\tau = p\xi$, where $\tau = \sum_{i=1}^{n} t_i$, $\xi = \sum_{i=1}^{n} x_i$, and $p$ is the price of $x$. Assume, as before, that consumer $i$ has income $y_i$ and uses his or her after-tax income to buy $c_i = y_i - t_i$ units of the numeraire good.
The objective is
$$\max_{\substack{t_1, \dots, t_n \\ x_1, \dots, x_n}} \; u^1(y_1 - t_1,\; x_1) \quad \text{s.t.} \quad \begin{cases} u^2(y_2 - t_2,\; x_2) \ge k_2 \\ u^3(y_3 - t_3,\; x_3) \ge k_3 \\ \quad \vdots \\ u^n(y_n - t_n,\; x_n) \ge k_n \\ \tau = p\xi \end{cases}$$
The Lagrangian is
$$\mathcal{L} = u^1(y_1 - t_1,\; x_1) + \sum_{i=2}^{n} \lambda_i \left[ u^i(y_i - t_i,\; x_i) - k_i \right] + \mu(\tau - p\xi)$$
Once again define $\lambda_1 = 1$ and $k_1 = 0$ so that
$$\mathcal{L} = \sum_{i=1}^{n} \lambda_i \left[ u^i(y_i - t_i,\; x_i) - k_i \right] + \mu(\tau - p\xi)$$
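FONC (with $\mu$ the multiplier on the government's budget constraint):
$$\mathcal{L}_{t_i} = -\lambda_i u^i_c + \mu = 0, \quad 1 \le i \le n$$
$$\mathcal{L}_{x_i} = \lambda_i u^i_x - \mu p = 0, \quad 1 \le i \le n$$
The first collection of FONC implies $\mu = \lambda_i u^i_c$ for every $i$. Since $\lambda_1 = 1$, this gives $\mu = u^1_c$ and
$$u^1_c = \lambda_i u^i_c, \quad 2 \le i \le n$$
or
$$\lambda_i = \frac{u^1_c}{u^i_c}, \quad 2 \le i \le n$$
Combining these results with the second collection of FONC gives $u^1_x = \mu p$, or, equivalently, $p = u^1_x/u^1_c = MRS_1$, and
$$\lambda_i u^i_x = \mu p \;\Longrightarrow\; \left( \frac{u^1_c}{u^i_c} \right) u^i_x = u^1_c\, p \;\Longrightarrow\; p = \frac{u^i_x}{u^i_c} = MRS_i, \quad 1 \le i \le n$$
Thus at a social optimum we have
$$MRS_i = p, \quad 1 \le i \le n \qquad (31.1)$$
Note that this is the same condition that would result from opening a market in good $x$, and charging $p$ dollars per unit of $x$. However, in order to reach a particular social optimum we would have to redistribute income via our choice of $t_1, \dots, t_n$.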
It is possible to show the following:
• Any particular social optimum can be achieved by opening a free market in good $x$, and redistributing income via taxes.
• For any given distribution of income, setting all taxes equal to zero achieves one possible Pareto optimum. This may not be the one that people particularly like (it will result in the highest utility for the person with the highest income), but it is nonetheless efficient in the sense that it satisfies (31.1).
32 Externalities
Externalities arise when the consumption or production of a good by one economic agent causes a side effect for others. Examples include air pollution caused by burning fossil fuels, the playing of loud music, etc. Externalities can be positive as well: a classic example is bees, which are needed to pollinate fruit trees!
This section deals primarily with air pollution, which is like a public good to the extent that air quality affects the entire population of an area.
32.1 Consumption Externalities
We shall use an extended version of the model used in our analysis of public goods. Assume that consumers care about three things:
• consumption of a basic, numeraire good $c$
• consumption of a good $x$, with an externality
• the level $z$ of the externality
Think of $x$ as gasoline and $z$ as the amount of smog in the air. Consider an economy with $n$ consumers, $i = 1, \dots, n$. Consumer $i$ has income $y_i$ and with it consumes $c_i$ and $x_i$. The level $z$ of the externality is determined by the total consumption of $x$:
$$z = \alpha\xi$$
where $\xi = \sum_{i=1}^{n} x_i$ and $\alpha$ is the amount of smog produced per gallon of gas used. Let $p$ denote the price (and the marginal cost) of $x$. The utility of consumer $i$ is given by
$$u^i(c_i, x_i, z) = u^i(y_i - p x_i,\; x_i,\; \alpha\xi)$$
We are assuming that $u^i_c > 0$, $u^i_x > 0$, and $u^i_z < 0$, i.e. $z$ is "bad." Notice the similarity between $z$ and the public goods we studied previously: consumer $i$'s consumption of $z$ has no effect on the amount of $z$ available to others.
32.1.1 Market Equilibrium
Consumer 1 takes $p$ as given, and while he realizes $z = \alpha \sum_{i=1}^{n} x_i$, he also takes $x_2, \dots, x_n$ (gas consumption of others) as given. His objective is
$$\max_{x_1} \; u^1(y_1 - p x_1,\; x_1,\; \alpha\xi)$$
FONC:
$$-p\, u^1_c + u^1_x + \alpha u^1_z = 0 \;\Longrightarrow\; \underbrace{\frac{u^1_x}{u^1_c}}_{MRS^1_{(x,c)}} = p - \alpha \frac{u^1_z}{u^1_c}$$
In general, a consumer is advised to set her MRS (for $x$ relative to $c$) equal to $p - \alpha u^i_z/u^i_c$. If $u^i_z < 0$, then $p - \alpha u^i_z/u^i_c > p$, so the consumer acts as if the price of $x$ is actually higher. The price difference $-\alpha u^i_z/u^i_c$ is $\alpha$ (the rate of production of $z$ per unit $x$) times the marginal willingness to pay for clean air, $-u^i_z/u^i_c$.
32.1.2 Social Optimum
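A social planner has to allocate $x$ and collect taxes $t_i$ ($i = 1, \dots, n$) that balance the government's costs: $\tau = p\xi$, where $\tau = \sum_{i=1}^{n} t_i$ and $\xi = \sum_{i=1}^{n} x_i$. As before, we look for Pareto optimal outcomes. The social planner's objective is:
$$\max_{\substack{t_1, \dots, t_n \\ x_1, \dots, x_n}} \; u^1(y_1 - t_1,\; x_1,\; \alpha\xi) \quad \text{s.t.} \quad \begin{cases} u^2(y_2 - t_2,\; x_2,\; \alpha\xi) \ge k_2 \\ \quad \vdots \\ u^n(y_n - t_n,\; x_n,\; \alpha\xi) \ge k_n \\ \tau = p\xi \end{cases}$$
Define $\lambda_1 = 1$, $k_1 = 0$. The Lagrangian is
$$\mathcal{L} = \sum_{i=1}^{n} \lambda_i \left[ u^i(y_i - t_i,\; x_i,\; \alpha\xi) - k_i \right] + \mu(\tau - p\xi)$$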
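FONC (with $\mu$ the multiplier on the budget constraint and $\alpha$ the amount of $z$ produced per unit of $x$):
$$\mathcal{L}_{t_i} = -\lambda_i u^i_c + \mu = 0, \quad 1 \le i \le n \qquad (32.1)$$
$$\mathcal{L}_{x_i} = \lambda_i u^i_x + \alpha \sum_{j=1}^{n} \lambda_j u^j_z - \mu p = 0, \quad 1 \le i \le n \qquad (32.2)$$
Equations (32.1) imply
$$\mu = \lambda_i u^i_c, \quad 1 \le i \le n$$
and in particular
$$\mu = u^1_c \qquad (32.3)$$
As a consequence,
$$\lambda_i = \frac{u^1_c}{u^i_c}, \quad 1 \le i \le n \qquad (32.4)$$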
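Equations (32.2) imply
$$\lambda_i u^i_x = \mu p - \alpha \sum_{j=1}^{n} \lambda_j u^j_z$$
$$\Longrightarrow\; \frac{\lambda_i}{\mu}\, u^i_x = p - \alpha \sum_{j=1}^{n} \frac{\lambda_j}{\mu}\, u^j_z$$
$$\Longrightarrow\; \frac{\lambda_i}{u^1_c}\, u^i_x = p - \alpha \sum_{j=1}^{n} \frac{\lambda_j}{u^1_c}\, u^j_z \quad \text{by (32.3)}$$
$$\Longrightarrow\; \frac{u^1_c/u^i_c}{u^1_c}\, u^i_x = p - \alpha \sum_{j=1}^{n} \frac{u^1_c/u^j_c}{u^1_c}\, u^j_z \quad \text{by (32.4)}$$
$$\Longrightarrow\; \underbrace{\frac{u^i_x}{u^i_c}}_{MRS^i_{(x,c)}} = p - \alpha \sum_{j=1}^{n} \frac{u^j_z}{u^j_c}, \quad 1 \le i \le n$$
This means everyone has to set
$$MRS = p + \theta$$
where $\theta = \alpha\nu$, and $\nu = -\sum_{i=1}^{n} u^i_z/u^i_c$ is the aggregate marginal willingness to pay for clean air.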
32.1.3 Market Equilibrium versus Social Optimum
Market Eq: $MRS^i_{(x,c)} = p - \alpha\, u^i_z/u^i_c$
Social Opt: $MRS^i_{(x,c)} = p - \alpha \sum_{j=1}^{n} u^j_z/u^j_c = p + \theta$
So, in the social optimum, consumer $i$ takes account of the effect of her gas consumption on everyone else, whereas in the market equilibrium she cares only about herself.
The sum $p + \theta$ is the social marginal cost of consuming gas. It exceeds the private cost $p$ if $\alpha$ is non-zero, and if there is some value to clean air (which obviously is the case if $-u^i_z/u^i_c > 0$ for all $i$). In the real world, $\alpha$ is very small but $n$ is very big, so while $-\alpha\, u^i_z/u^i_c$ is negligible, $\theta$ can be significant.
In the 1920s the English economist Arthur C. Pigou figured out that one can correct an externality by taxing the activity that creates it, with a tax $\theta$. We have shown that the optimal Pigouvian tax for a consumption externality that affects the entire population is
$$\theta = \alpha \sum_i \{\text{consumer } i\text{'s willingness to pay for marginal reduction in externality}\}.$$
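To see how a tiny per-person effect can add up to a large wedge, the following sketch compares the market and socially optimal gas consumption under an assumed quasi-linear utility $u^i = c_i + a \ln x_i - d z$ with identical consumers. This functional form and every number are hypothetical, chosen only so that $u^i_c = 1$, $u^i_x = a/x_i$, and $u^i_z = -d$, which makes both conditions solvable by hand.

```python
def market_x(a, p, alpha, d):
    """Per-person consumption when each consumer sets MRS = p + alpha*d,
    i.e. counts only her own exposure to the smog she creates."""
    return a / (p + alpha * d)


def social_x(a, p, alpha, d, n):
    """Per-person consumption when MRS = p + alpha*n*d,
    i.e. the damage to all n consumers is counted."""
    return a / (p + alpha * n * d)


def pigouvian_wedge(alpha, d, n):
    """Aggregate marginal damage per unit of x: alpha times the sum of the n
    individual willingness-to-pay terms, each equal to d under this utility."""
    return alpha * n * d


# Hypothetical parameters: small alpha, large n.
a, p, alpha, d, n = 10.0, 3.0, 0.001, 0.5, 10_000

x_mkt = market_x(a, p, alpha, d)
x_opt = social_x(a, p, alpha, d, n)
theta = pigouvian_wedge(alpha, d, n)

print("market x per person:", x_mkt)
print("socially optimal x per person:", x_opt)
print("private price:", p, "social marginal cost:", p + theta)
```

In this example the individual term $\alpha d$ is negligible next to $p$, while the aggregate term $\alpha n d$ is larger than $p$ itself, which is the pattern the text describes.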
32.1.4 Other Examples
• Taxes for wear and tear on the road. The usual justification for a gas tax, apart from the air pollution effect, is that driving causes the roadways to deteriorate. If the wear and tear caused by a given car is proportional to the car's gas mileage, a Pigouvian tax on gas is sensible.
• Taxes on cigarettes are sometimes justified because they are a tax on second hand smoke.
• Some people have proposed a tax on foods that cause obesity. This is a more complicated case, but the basis of their argument is that health care costs for those over 65 (which is when most costs are incurred) are heavily subsidized through Medicare. Thus, if someone eats too much and as a result winds up with diabetes later in life, this person contributes to the Medicare bill, which we all pay.
32.2 Production Externalities
We will restrict our attention to a very simple example of a production externality. The example is motivated by the electric power industry, which in most places uses coal to create electricity.
Assume there are $n$ plants, $i = 1, \dots, n$. Plant $i$ has cost function $c^i(s_i, y_i)$, where $y_i$ is the amount of electricity (kWh) produced, and $s_i$ is a choice variable representing the choice of factors that affect the amount of SO$_2$ produced. For example, $s_i$ could represent the choice of what type of coal to use (more expensive coal from the Western US, which burns cleaner, versus cheaper coal from the East), or the choice of what kind of scrubber to install. The amount of SO$_2$ emitted by the plant is
$$z_i = y_i\, \phi_i(s_i)$$
where $\phi_i'(s_i) < 0$ and $\phi_i''(s_i) > 0$, i.e. $\phi_i$ is decreasing and convex as in Figure 32.1.
Figure 32.1: $\phi_i'(s_i) < 0$ and $\phi_i''(s_i) > 0$.
Let $\nu$ be the aggregate willingness to pay to avoid SO$_2$ (across the entire population, not only the power industry), and let $p$ be the value of a kWh of electricity. From the point of view of an industry regulator, the objective is to maximize the industry surplus, valuing SO$_2$ at $\nu$ per ton:
$$\max_{\substack{y_1, \dots, y_n \\ s_1, \dots, s_n}} \; \Pi - \nu\zeta$$
where
$$\Pi = \sum_{i=1}^{n} \pi_i = \sum_{i=1}^{n} \left[ p y_i - c^i(s_i, y_i) \right]$$
is the total profit of the industry as a whole, and
$$\zeta = \sum_{i=1}^{n} z_i = \sum_{i=1}^{n} y_i\, \phi_i(s_i)$$
is the total amount of SO$_2$ emitted by the industry. (As an alternative, we could set up the problem by having utility functions for all the local residents, who each use electricity and consume another, numeraire good $c$, and wish to avoid having SO$_2$ in the air. As an exercise, set up the problem this way.)
FONC w.r.t. $y_i$:
$$p - c^i_{y_i} - \nu\, \phi_i(s_i) = 0, \quad 1 \le i \le n \qquad (32.5)$$
This means the output of plant $i$ should be chosen so that
$$c^i_{y_i} + \nu\, \phi_i(s_i) = p$$
The LHS, $c^i_{y_i} + \nu\, \phi_i(s_i)$, is called the marginal social cost of production at plant $i$. The regulator wants to set this equal to $p$, the social value of a kWh of electricity.
FONC w.r.t. $s_i$:
$$-c^i_{s_i} - \nu\, y_i\, \phi_i'(s_i) = 0, \quad 1 \le i \le n \qquad (32.6)$$
Dividing by $y_i$,
$$\underbrace{\frac{1}{y_i}\, c^i_{s_i}}_{\partial AC_i/\partial s_i} = -\nu\, \phi_i'(s_i)$$
The optimal choice is the one for which the marginal increase in average cost offsets the marginal value of the reduced pollution per unit of output. Assuming $AC_i(s_i, y_i)$ is convex in $s_i$ (so that with higher $s_i$, an additional increase in $s_i$ has a bigger effect on $AC_i$) and that $\phi_i$ is decreasing and convex, we have Figure 32.2.
Figure 32.2
How can a regulator get Plant $i$ to choose $s_i^*$?
Method 1 (Pigouvian Tax):
• Tax each plant $\nu$ per ton of SO$_2$ produced.
• Buy electricity at $p$/kWh.
The manager of plant $i$ will then attempt to maximize
$$\pi_i = p y_i - c^i(s_i, y_i) - \nu\, y_i\, \phi_i(s_i)$$
which has FONC equivalent to (32.5) and (32.6) above.
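The plant's problem under the tax can be solved numerically once functional forms are chosen. The sketch below assumes $c(s, y) = y^2/2 + y s^2/2$ (so average cost is convex in $s$) and $\phi(s) = e^{-s}$ (decreasing and convex); neither form is from the notes, and the values of $p$ and $\nu$ are hypothetical. Under these assumptions (32.6) reduces to $s = \nu e^{-s}$, solved here by bisection, and (32.5) then gives $y$ directly.

```python
import math


def solve_s(nu, lo=0.0, hi=50.0, tol=1e-12):
    """Bisection on g(s) = s - nu*exp(-s), which is increasing in s.
    Its root is the abatement choice implied by (32.6) under the assumed forms."""
    def g(s):
        return s - nu * math.exp(-s)

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def plant_choice(p, nu):
    """Return (s, y) satisfying both first-order conditions for the assumed
    cost c(s, y) = y**2/2 + y*s**2/2 and emission rate phi(s) = exp(-s)."""
    s = solve_s(nu)
    # (32.5): p - c_y - nu*phi(s) = 0, with c_y = y + s**2/2.
    y = p - 0.5 * s ** 2 - nu * math.exp(-s)
    return s, y


p, nu = 10.0, 2.0            # hypothetical electricity value and SO2 tax
s_star, y_star = plant_choice(p, nu)

# Residuals of (32.5) and (32.6); both should be numerically zero.
res_y = p - (y_star + 0.5 * s_star ** 2) - nu * math.exp(-s_star)
res_s = -(y_star * s_star) + nu * y_star * math.exp(-s_star)

print("s*:", s_star, "y*:", y_star)
print("residuals:", res_y, res_s)
```

A profit-maximizing manager facing the tax and the regulator maximizing industry surplus solve the same two equations, so the same `plant_choice` output describes both.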
Method 2 (Cap & Trade):
• Distribute among the plants a fixed amount of SO$_2$ emission rights, each of which entitles the bearer to produce a ton of SO$_2$.
• Allow the plants to trade emission rights among themselves.
• Buy electricity at $p$/kWh.
Let $q$ be the value of an emission right, where $q > 0$. A plant manager who owns $k$ emission rights will then attempt to maximize
$$p y_i - c^i(s_i, y_i) + v$$
where $v = kq - q\, y_i\, \phi_i(s_i)$ is the value of the emission rights she can sell on the market (or will have to buy). Notice that if $q = \nu$, the FONC for this plant is equivalent to (32.5) and (32.6). This is how SO$_2$ really is regulated.
Why use Method 2?
• In reality, no one knows what $\nu$ to charge. So instead the regulator looks at the total amount of SO$_2$ emitted at some reference point in time, then issues a somewhat smaller number of emission rights, e.g. 80%. This method ensures that SO$_2$ is reduced by 20% efficiently.
• Firms prefer this method because they get the emission rights free of charge. (Emission rights were distributed in the early 1990s, and plants were allowed to trade them, but the rules forbidding them from exceeding the limits didn't take effect until 1995.)
• It is claimed that enforcement is easier.
33 Empirical Methods in Microeconomics
This section provides the reader with an overview of how microeconomists use real data to test
alternative theories and (in some cases) estimate the relevant parameters of a particular model.
The examples are drawn from my own work in labor economics.
33.1 Experiments and Counterfactuals
Suppose one is interested in testing a prediction of microeconomic theory. To be concrete, we shall consider four examples:
• If single mothers currently on welfare are offered an earnings subsidy, will they work more?
• If the supply of low-skilled workers in a local labor market is increased by an influx of immigrants, will wages of native, low-wage workers fall?
• If the minimum wage is increased, will low-wage employers hire fewer workers?
• If people without health insurance are provided insurance, will they use more health care services? Will they become healthier?
The classical scientific approach to such questions would be to conduct a randomized experiment. In such an experiment, a population whose behavior is to be studied would be randomly divided into two groups: the treatment group, members of which receive the treatment, and the control group, members of which do not. For the welfare question, the population would be single mothers currently on welfare. For the immigrant question, the population would be cities (or other geographic entities such as counties). For the minimum wage question, the population would be employers. For the final question, the population would be the uninsured. Note that some of these experiments seem harder to carry out than others.
Let's assume that one could conduct a randomized experiment on welfare mothers. (In reality, such an experiment was conducted in two Canadian provinces in the mid-90s. We will examine the data shortly.) How would one do this? Presumably, one could tabulate the employment rates of the treatment group $Y_T$ and the control group $Y_C$ some time after the subsidy was in place. One would then calculate the treatment effect
$$\Delta = Y_T - Y_C$$
The idea of a randomized experiment is that in the absence of the treatment, the two groups would have had equal outcomes. Randomization is key: if treatment status really is randomly assigned to the general population, then it is reasonable to expect the two groups to exhibit the same behavior in the absence of treatment. The impact of statistical accidents is minimized by using big groups. The behavior of the control group represents a counterfactual for assessing whether or not the treatment has an effect. If a theory predicts that a subsidy will increase work effort, for example, then we want to test the null hypothesis $H_0\colon \Delta = 0$ versus the alternative hypothesis $H_1\colon \Delta > 0$.
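As a minimal sketch of how such a test might be computed, the code below estimates the treatment effect as a difference in employment rates and forms a z-statistic for the one-sided alternative. The counts are hypothetical (they are not SSP data), and the standard error uses the usual unpooled formula for a difference of two independent proportions, which is one of several reasonable choices.

```python
import math


def treatment_effect(employed_t, n_t, employed_c, n_c):
    """Return (delta, standard error, z) for the difference in employment
    rates between a treatment group and a control group."""
    y_t = employed_t / n_t
    y_c = employed_c / n_c
    delta = y_t - y_c
    # Unpooled standard error of a difference of two independent proportions.
    se = math.sqrt(y_t * (1.0 - y_t) / n_t + y_c * (1.0 - y_c) / n_c)
    return delta, se, delta / se


# Hypothetical counts: 300 of 1000 treated and 250 of 1000 controls employed.
delta, se, z = treatment_effect(300, 1000, 250, 1000)

# One-sided test of H0: delta = 0 against H1: delta > 0 at the 5% level.
reject_h0 = z > 1.645

print("estimated effect:", delta)
print("standard error:", se)
print("z statistic:", z, "reject H0:", reject_h0)
```

Larger groups shrink the standard error, which is the sense in which big groups minimize the impact of statistical accidents.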
A randomized experiment is considered the "gold standard" for scientific evidence. The FDA, for example, requires drug companies to evaluate the efficacy of a new drug by means of a randomized experiment. The high status of randomized experiments is due to several features:
1. Randomization ensures that $Y_C$ is a valid counterfactual. So, except for chance errors, $\Delta$ is truly attributable to the treatment, not to some inherent difference between the two groups.
2. Once the experimental design is determined, the researcher's hands are tied. There is no room for weaseling. (The experimental design is a full description of the population, the sample size, the randomization procedure, the treatment, and the data collection process.)
3. Because of (1) and (2), randomized experiments are easy to understand and therefore have a lot of credibility.
33.1.1 The Self Sufficiency Project (SSP)
SSP is the name of a randomized experiment conducted in Canada during the 90s. Half of a random sample of single mothers who had been on welfare for at least a year was assigned to the treatment group. The other half was assigned to the control group. Members of the control group were eligible to receive their regular welfare benefit, a fixed monthly sum based on the number of children in the home as well as the province (e.g. $712 per month for a mother of one in New Brunswick). Welfare payments are reduced dollar-for-dollar for those who earn over $200 per month. Members of the treatment group were allowed to remain on welfare but were offered an earnings subsidy S = (M − E)/2, where M is a monthly earnings target ($2500/month) and E is actual earnings. So, if a participant earned $650 in a month, she received a subsidy of $925. Participants qualified for the subsidy only if they worked at least 30 hours per week, for up to three years. They also had to receive their first subsidy payment within a year of entering the treatment group or they forfeited all future eligibility. Figure 33.1 shows the monthly budget constraint, and Figure 33.2 shows the fractions of each group on welfare as a function of time, in months, since random assignment, along with a graph of the average employment rate for each group.
Figure 33.1: Monthly budget constraint for members of the treatment group in the SSP.
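The subsidy rule described above can be written as a short function. This is a sketch of the monthly formula only: the function name and arguments are hypothetical, and it does not model the three-year limit or the requirement to start within a year of assignment.

```python
def ssp_subsidy(earnings, weekly_hours, target=2500.0, min_hours=30.0):
    """Monthly SSP earnings subsidy S = (M - E)/2 for a participant who
    works at least `min_hours` per week; zero otherwise."""
    if weekly_hours < min_hours:
        return 0.0                      # full-time work requirement not met
    if earnings >= target:
        return 0.0                      # no subsidy at or above the target M
    return (target - earnings) / 2.0


# The example from the text: earnings of $650 in a month, working full time.
subsidy = ssp_subsidy(650.0, weekly_hours=30.0)
print("subsidy:", subsidy, "total income:", 650.0 + subsidy)
```

Because the subsidy falls by 50 cents for each extra dollar earned, a participant keeps half of any additional earnings, in contrast to the dollar-for-dollar reduction under regular welfare.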
[Figure 33.2, two panels, each plotted against time in months since random assignment: the fraction of each group on IA, and the fraction of each group employed. Each panel shows the control group, the treatment group, and the difference between them.]
Figure 33.2: Source: D. Card and D. Hyslop, "Estimating the Effects of a Time-limited Earnings Subsidy for Welfare Leavers," Econometrica 73, November 2005.
33.2 Research Designs Based on Natural Experiments
Often we cannot carry out an experiment, either because it would cost a lot and be quite invasive (e.g. SSP), or because it would be impractical. How do we proceed in such cases?
One approach is to consider events that occur, and gauge whether an analysis of the event could be interpreted as if the event were a random experiment. A very simple example is a paper I wrote on the Mariel Boatlift. In that paper, I examined the movements in wages and unemployment rates in Miami (where the Marielitos landed), and within a control group composed of four other cities: Tampa, Houston, Atlanta, and Los Angeles. A key difference between a true randomized experiment and a natural experiment is that treatment is not randomly assigned. So it is debatable whether the control group provides a valid counterfactual. For my paper, I examined trends in employment in Miami versus the average of the four other cities throughout the 70s: the two moved in close parallel. (Ironically, the editor of the journal forced me to remove this graph from the published paper!)
In a natural experiment, outcomes may not be exactly the same in both groups, even before the treatment. Let
$$\Delta_0 = Y^0_T - Y^0_C$$
represent the pre-existing gap in the outcome, or measurable quantity, of interest (e.g. average wages), and let
$$\Delta_1 = Y^1_T - Y^1_C$$
represent the gap at some time after the treatment has begun. Then we might want to look at the difference-in-differences
$$DD = \Delta_1 - \Delta_0 = (Y^1_T - Y^0_T) - (Y^1_C - Y^0_C)$$
This is the change in the treatment group relative to the change in the control group. The implicit assumption is that in the absence of treatment, $\Delta_0$ would have remained constant.
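The difference-in-differences formula is easy to compute directly. The sketch below applies it to the Table 8 means of log hourly earnings for blacks, taking 1979 as the pre-period and 1981 as the post-period; that choice of years is an assumption made here for illustration, not a comparison singled out in the notes, and the function name is hypothetical.

```python
def diff_in_diff(y_t0, y_t1, y_c0, y_c1):
    """DD = (change in the treatment group) - (change in the control group)."""
    return (y_t1 - y_t0) - (y_c1 - y_c0)


# Table 8, blacks: mean log hourly earnings in 1979 and 1981.
miami_1979, miami_1981 = 1.59, 1.61              # treatment group (Miami)
comparison_1979, comparison_1981 = 1.74, 1.72    # control group (four cities)

dd = diff_in_diff(miami_1979, miami_1981, comparison_1979, comparison_1981)

# Equivalent form: the gap after minus the gap before.
gap_before = miami_1979 - comparison_1979
gap_after = miami_1981 - comparison_1981

print("DD:", dd)
print("gap after minus gap before:", gap_after - gap_before)
```

Since the outcomes are in logs, the result is read as an approximate proportional change in relative wages. A full analysis would also use the standard errors reported in the table.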
33.2.1 The Mariel Boatlift
In the Boatlift, about 125,000 Cuban immigrants were transported on a flotilla of small boats to Miami, over the period from April 1980 to July of the same year. This represented an increase of about 7% in the Miami labor force, mainly in the ranks of the unskilled. One simple hypothesis is that such an influx would reduce wages for unskilled workers already in Miami. Table 8 shows outcomes for blacks in Miami relative to the comparison cities.
Table 8: Logarithms of Real Hourly Earnings of Workers Age 16–61 in Miami and Four Comparison Cities, 1979–85.

Group             1979   1980   1981   1982   1983   1984   1985
Miami:
  Whites          1.85   1.83   1.85   1.82   1.82   1.82   1.82
                  (.03)  (.03)  (.03)  (.03)  (.03)  (.03)  (.05)
  Blacks          1.59   1.55   1.61   1.48   1.48   1.57   1.60
                  (.03)  (.02)  (.03)  (.03)  (.03)  (.03)  (.04)
  Cubans          1.58   1.54   1.51   1.49   1.49   1.53   1.49
                  (.02)  (.02)  (.02)  (.02)  (.02)  (.03)  (.04)
  Hispanics       1.52   1.54   1.54   1.53   1.48   1.59   1.54
                  (.04)  (.04)  (.05)  (.05)  (.04)  (.04)  (.06)
Comparison Cities:
  Whites          1.93   1.90   1.91   1.91   1.90   1.91   1.92
                  (.01)  (.01)  (.01)  (.01)  (.01)  (.01)  (.01)
  Blacks          1.74   1.70   1.72   1.71   1.69   1.67   1.65
                  (.01)  (.02)  (.02)  (.01)  (.02)  (.02)  (.03)
  Hispanics       1.65   1.63   1.61   1.61   1.58   1.60   1.58
                  (.01)  (.01)  (.01)  (.01)  (.01)  (.01)  (.02)

Note: Entries represent means of log hourly earnings (deflated by the Consumer Price Index, 1980=100) for workers age 16–61 in Miami and four comparison cities: Atlanta, Houston, Los Angeles, and Tampa–St. Petersburg. Standard errors are in parentheses.
Source: D. Card, "The Impact of the Mariel Boatlift on the Miami Labor Market," Industrial and Labor Relations Review, January 1990. Based on samples of employed workers in the outgoing rotation groups of the Current Population Survey in 1979–85. Due to a change in SMSA coding procedures in 1985, the 1985 sample is based on individuals in outgoing rotation groups for January–June of 1985 only.

33.3 Natural Experiments with Several Control Groups
In a natural experiment, one never can be sure the control group provides a valid counterfactual. Sometimes it is possible to do additional checks by using two or more control groups. Then you can construct
$$DD_1 = (Y^1_T - Y^0_T) - (Y^1_{C_1} - Y^0_{C_1})$$
$$DD_2 = (Y^1_T - Y^0_T) - (Y^1_{C_2} - Y^0_{C_2})$$
$$DD_3 = (Y^1_{C_2} - Y^0_{C_2}) - (Y^1_{C_1} - Y^0_{C_1})$$
where $C_1$ refers to control group 1, and $C_2$ refers to control group 2. Ideally it will be the case that $DD_1 = DD_2$, or equivalently, $DD_3 = 0$.
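The three comparisons are linked by the identity $DD_3 = DD_1 - DD_2$, which is why $DD_1 = DD_2$ and $DD_3 = 0$ are the same check. The sketch below verifies this with made-up before/after means; the numbers are purely hypothetical and are not taken from any table in these notes.

```python
def change(before_after):
    """Change in a group's mean outcome, given a (before, after) pair."""
    before, after = before_after
    return after - before


def three_way_dd(treated, control_1, control_2):
    """Return (DD1, DD2, DD3) for one treatment group and two control groups."""
    d_t = change(treated)
    d_c1 = change(control_1)
    d_c2 = change(control_2)
    dd1 = d_t - d_c1      # treatment vs. control group 1
    dd2 = d_t - d_c2      # treatment vs. control group 2
    dd3 = d_c2 - d_c1     # control group 2 vs. control group 1
    return dd1, dd2, dd3


# Hypothetical (before, after) means of some outcome for each group.
treated = (20.0, 21.0)
control_1 = (23.0, 21.0)
control_2 = (22.0, 20.5)

dd1, dd2, dd3 = three_way_dd(treated, control_1, control_2)
print("DD1:", dd1, "DD2:", dd2, "DD3:", dd3)
print("DD1 - DD2:", dd1 - dd2)
```

A value of `dd3` far from zero would signal that the two control groups were on different paths, which casts doubt on at least one of them as a counterfactual.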
33.3.1 The New Jersey Minimum Wage
In April 1992, the minimum wage rose from $4.25 to $5.05 per hour in the state of NJ. Elsewhere, it remained $4.25. The statute that raised the minimum wage had been passed in the fall of the year before, and, in anticipation, Alan Krueger and I developed a survey of fast food restaurants in NJ and PA. We surveyed a set of about 400 restaurants first in February–March of 1992 (just before the increase), and again in late fall. We were extremely careful to track down all the restaurants that were surveyed in the first round. The treatment group consisted of restaurants in NJ whose starting wages were less than $5.00 per hour prior to the increase. There were two control groups: restaurants in PA, and restaurants in NJ that already were paying relatively high wages ($5.00 or more per hour prior to the increase). Table 9 shows the comparisons of employment growth between groups.
Table 9: Average Employment Per Store Before and After the Rise in the NJ Minimum Wage
33.4 The Discontinuity Research Design
Sometimes one cannot find a good natural experiment; it is nonetheless possible to find a good counterfactual by looking at treatments that affect some groups but not other, extremely similar groups. A good example is Medicare. When individuals who have worked for at least 10 years turn 65, they become eligible for free health insurance. (One also is eligible if one's spouse worked 10 years.) This age limit suggests that we compare individuals who are just a few months younger than 65 with those who are a few months older. Figure 33.3 shows the fractions of people with health insurance, by age (measured in quarters). The plots are for two groups: (relatively) more educated whites (over 12 years of education), and less educated minorities (blacks and Hispanics with less than 12 years of education). The idea of the discontinuity design is that the rule that grants free insurance to those who reach their 65th birthday creates an experiment: we think of those just over 65 as the treatment group, and those just under 65 as the control group. There are some potential problems with this idea, depending on the application:
• It may be that other factors, apart from the primary treatment, also change at the same point in time. So it is important to check very carefully that these factors are very similar between groups.
• There may be an age trend in the outcome of interest, so that even without treatment, individuals who are a little over 65 tend to be a little different from those under 65 in a certain respect. This can be checked by looking at the age profile of the outcome of interest.
• If individuals know they soon will be eligible for Medicare, they may act differently when they are just under 65 from the way they would if there were no such rule.
[Figure 33.3: fraction of group insured by age (55 to 75), with actual and predicted series for whites with high education, the overall population, and minorities with low education.]
Figure 33.3: Health insurance coverage rates by age, based on 1992–2001 data from NHIS.
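One common way to estimate the jump at the cutoff while allowing for an age trend is to fit a separate straight line on each side of age 65 and compare the two fitted values at 65. The sketch below does this on synthetic, noise-free data constructed to have a jump of 0.10; the data, the bandwidth of five years, and the function names are all hypothetical, and this is not presented as the procedure behind the figures in these notes.

```python
def ols_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_jump(ages, outcomes, cutoff=65.0, bandwidth=5.0):
    """Estimated discontinuity at the cutoff: difference between the
    right-side and left-side fitted lines evaluated at the cutoff."""
    left = [(a - cutoff, y) for a, y in zip(ages, outcomes)
            if cutoff - bandwidth <= a < cutoff]
    right = [(a - cutoff, y) for a, y in zip(ages, outcomes)
             if cutoff <= a <= cutoff + bandwidth]
    # Ages are centered at the cutoff, so each intercept is the fit at age 65.
    left_at_cutoff, _ = ols_line(*zip(*left))
    right_at_cutoff, _ = ols_line(*zip(*right))
    return right_at_cutoff - left_at_cutoff


# Synthetic quarterly ages 60 to 70 with a mild upward trend and a 0.10 jump at 65.
ages = [60.0 + 0.25 * k for k in range(41)]
coverage = [0.80 + 0.002 * (a - 65.0) + (0.10 if a >= 65.0 else 0.0) for a in ages]

jump = rd_jump(ages, coverage)
print("estimated jump at 65:", jump)
```

Fitting the trend on each side addresses the second concern in the list above: a simple comparison of means just above and just below 65 would mix the jump with the age trend.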
[Figure 33.4, two panels, both plotted against age (55 to 75). First panel, "Percentage Who Did Not Get Medical Care Last Year for Cost Reasons": series for whites with high education, the overall population, and minorities with low education. Second panel, "Florida Outpatient Data, 1997–2002": log of the number of cataract surgeries, with series for whites, Hispanics, and blacks.]
Figure 33.4: Here are plots showing the fractions of individuals belonging to three demographics who report that they did not receive medical care in the last year because they could not afford it, and the number of cataract surgeries by age in Florida. You can see the discontinuities in the cataract data.