
Lecture Notes for Econ 101A

David Card
Dept. of Economics
UC Berkeley
Lecture   Topics                                                               Relevant Sections
1         Optimization Methods                                                 1
2         Consumer Choice                                                      2
3         Applications of Indifference Curve Analysis, Expenditure Functions  3-4
4-5       Comparative Statics, Slutsky's Equation                              5-6
6         Market Level Demand and Supply                                       7
7         Labor Supply                                                         8
8         Intertemporal Consumption & Savings                                  9
9-11      Production & Cost, Shephard's Lemma                                  10-12
12-13     Supply Determination
14        Monopoly
15        Consumer/Producer Surplus & Applications
16-17     Duopoly
18-20     Game Theory
21-24     Uncertainty
25-26     Auctions
27        Public Goods, Externalities
28        Empirical Methods
1 Optimization
1.1 Unconstrained Optimization
Consider a function $y = f(x)$, $a \le x \le b$. How do we go about finding a point $x_0$ such that $y_0 = f(x_0)$ is as big as (or bigger than) $f(x)$ for any $x$ in $[a, b]$?
Figure 1: In this picture $f(x_0) = \max_{a \le x \le b} f(x)$. (Read: $f(x_0)$ is the maximum value of $f(x)$ when $x$ is selected from the interval $[a, b]$.)
What can we say in general? If $x_0$ is a potential candidate for a maximizer, then it must be the case that we can't move away from $x_0$ in either direction and reach a higher value of $f$. But this means $f'(x_0) = 0$.
Why? Let $0 < h \ll 1$.
• If $f'(x) > 0$, then $f(x + h) \approx f(x) + h f'(x) > f(x)$.
• If $f'(x) < 0$, then $f(x - h) \approx f(x) - h f'(x) > f(x)$, since $-h f'(x) > 0$.
This leads us to:
RULE 1 If $f(x_0) = \max_{a \le x \le b} f(x)$ and $a < x_0 < b$, then $f'(x_0) = 0$.
This is called the first order necessary condition (FONC) for an interior maximum.
Does $f'(x_0) = 0$ always mean that $x_0$ is a maximizer? Are there maximizers with $f'(x_0) \neq 0$?
Consider the counterexamples illustrated in Figure 3.
How can we be certain that we have located a maximum (not a minimum, nor an inflection point)?
We examine the properties of $f'(x)$, which is itself a function of $x$. Take a look at Figure 4. As the function $f'$ crosses $x^*$ from left to right, it goes from positive to negative, i.e. it is decreasing. On the other hand, as $f'$ crosses $x^{**}$ from left to right, it goes from negative to positive, i.e. it is increasing. In general, at a local maximum $f'(x)$ has negative slope, or in other words $f''(x) < 0$, while at a local minimum $f'(x)$ has positive slope, that is $f''(x) > 0$.
These considerations lead us to:
RULE 2 If $f'(x_0) = 0$ and $f''(x_0) < 0$, then $f(x_0)$ is a local maximum.
If $f'(x_0) = 0$ and $f''(x_0) > 0$, then $f(x_0)$ is a local minimum.
Figure 2: Notice that Rule 1 also holds for a function of several variables.
(a) In this example $f(x) = x$. Thus $f(b) = \max_{a \le x \le b} f(x)$ even though $f'(b) = 1 \neq 0$. The maximum occurs on the boundary.
(b) In this example $f'(x) = 0$ has two solutions, $x^*$ and $x^{**}$, but neither one is a global maximizer. $f(x^*)$ is a local maximum while $f(x^{**})$ is a minimum.
(c) In this example $f(x) = x^3$. Solving $f'(x) = 0$ gives $x = 0$, which is an inflection point.
Figure 3: Exceptions to the converse of Rule 1.
Figure 4: Properties of $f'(x)$.
This rule generalizes to two or more dimensions.
How do we determine whether a local maximum is a global maximum? If $f''(x) < 0$ for all $x$ and $f'(x_0) = 0$, then $x_0$ is a global maximum. A function $f$ such that $f''(x) < 0$ for all $x$ is called concave (see Appendix 1.3). (See Figure 5.)
Figure 5: A concave function always lies below any line tangent to its graph.
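Rules 1 and 2 can be checked numerically. The sketch below is a minimal illustration, assuming central finite differences for $f'$ and $f''$ (the step sizes, the tolerance, and the helper names are arbitrary choices, not part of the notes). It reproduces the cases of Figure 3: a boundary function with $f' \neq 0$, an interior maximum and minimum, and the inflection point of $x^3$, where the second-order test is silent.

```python
def d1(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def d2(f, x, h=1e-4):
    """Central-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2


def classify_point(f, x0, tol=1e-6):
    """Apply Rule 1 (FONC) and then Rule 2 (SOC) at an interior point x0."""
    if abs(d1(f, x0)) > tol:
        return "not a critical point"
    curvature = d2(f, x0)
    if curvature < -tol:
        return "local maximum"
    if curvature > tol:
        return "local minimum"
    return "second-order test inconclusive"


def cubic(x):
    """f(x) = x^3 - 3x, so f'(x) = 3x^2 - 3 and f''(x) = 6x."""
    return x**3 - 3 * x


print(classify_point(cubic, -1.0), classify_point(cubic, 1.0))
print(classify_point(lambda x: x**3, 0.0))   # Figure 3(c): an inflection point
print(classify_point(lambda x: x, 0.5))      # f'(x) = 1 everywhere, as in Figure 3(a)
```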
1.2 Constrained Optimization
Now we consider maximizing a function $f(x_1, x_2)$ subject to (s.t.) some constraint on $x_1$ and $x_2$, which we denote by $g(x_1, x_2) = g_0$. The two important examples of this in economics are:
• In the study of consumer behavior, maximize utility $U(x_1, x_2)$ s.t. the budget constraint $p_1 x_1 + p_2 x_2 = I$.
• In the study of firm behavior, maximize profit $py - wx$ s.t. the production function $y = f(x)$.
How do we go about a graphical analysis of the problem of maximizing $f(x_1, x_2)$ s.t. $g(x_1, x_2) = g_0$?
Figure 6: Illustration of two-step approach described on p. 5.
A two-step approach:
1. Plot the contours of the function $g$.
E.g. $g(x_1, x_2) = x_1^2 + x_2^2$; $g(x_1, x_2) = k$ is the equation of a circle with radius $\sqrt{k}$ and center $O = (0, 0)$.
2. Plot the contours of the function $f$.
E.g. $f(x_1, x_2) = x_1 x_2$; $f(x_1, x_2) = m$ is the equation of a hyperbola.
The constrained maximum of the function $f$ occurs where a contour of $f$ is tangent to the contour of $g$ corresponding to $g_0$. Why? Suppose we add a small amount $dx_1$ to $x_1$ in such a way as to keep $g(x_1, x_2)$ constant. If so, then we must have a corresponding reduction in $x_2$ such that the total differential of $g$ is zero, i.e.
$$dg = g_{x_1}(x_1, x_2)\,dx_1 + g_{x_2}(x_1, x_2)\,dx_2 = 0,$$
(where $g_{x_i}$ denotes $\partial g / \partial x_i$), which implies
$$\frac{dx_2}{dx_1} = -\frac{g_{x_1}(x_1, x_2)}{g_{x_2}(x_1, x_2)}.$$
If we increase $x_1$ by one unit, we must increase $x_2$ by $-g_{x_1}(x_1, x_2)/g_{x_2}(x_1, x_2)$ or, equivalently, decrease $x_2$ by $g_{x_1}(x_1, x_2)/g_{x_2}(x_1, x_2)$ in order to keep the value of $g$ constant. The net effect of such a change in $x_1$ on the value of $f$ is
$$df = f_{x_1}(x_1, x_2)\,dx_1 + f_{x_2}(x_1, x_2)\,dx_2 = f_{x_1}(x_1, x_2)\,dx_1 + f_{x_2}(x_1, x_2)\left(\frac{dx_2}{dx_1}\right)dx_1 = \left[f_{x_1}(x_1, x_2) - f_{x_2}(x_1, x_2)\,\frac{g_{x_1}(x_1, x_2)}{g_{x_2}(x_1, x_2)}\right]dx_1.$$
Now in order for $(x_1^0, x_2^0)$ to be a constrained maximum, it must be the case that we cannot increase $f$ by adding or subtracting a small amount to $x_1$ while keeping the value of $g$ constant. But this means the above expression is 0 for all $dx_1$, or in other words
$$\frac{f_{x_1}(x_1, x_2)}{f_{x_2}(x_1, x_2)} = \frac{g_{x_1}(x_1, x_2)}{g_{x_2}(x_1, x_2)}.$$
But this expression says that at $(x_1^0, x_2^0)$ the contours of $f$ and $g$ are tangent, i.e. have the same slope. Note that this argument applies only if $(x_1^0, x_2^0)$ lies in the interior of the domain, for if $(x_1^0, x_2^0)$ lies on the boundary then we cannot move $x_1$ or $x_2$ in both directions.
How do we convert a constrained maximization problem into an unconstrained one? A French mathematician named Lagrange noted that one gets the right answer by setting up an artificial, unconstrained maximization problem with an additional variable, $\lambda$:
$$\mathcal{L}(x_1, x_2, \lambda) = f(x_1, x_2) - \lambda\,\big(g(x_1, x_2) - g_0\big).$$
The FONC for $\mathcal{L}$, with respect to $x_1$, $x_2$, and $\lambda$, are:
$$\mathcal{L}_{x_1} = f_{x_1}(x_1, x_2) - \lambda\, g_{x_1}(x_1, x_2) = 0,$$
$$\mathcal{L}_{x_2} = f_{x_2}(x_1, x_2) - \lambda\, g_{x_2}(x_1, x_2) = 0,$$
$$\mathcal{L}_{\lambda} = -g(x_1, x_2) + g_0 = 0.$$
Dividing the first of these by the second gives
$$\frac{f_{x_1}(x_1, x_2)}{f_{x_2}(x_1, x_2)} = \frac{g_{x_1}(x_1, x_2)}{g_{x_2}(x_1, x_2)},$$
while the third simply restates the constraint! Thus by writing down the Lagrangian $\mathcal{L}$ and setting its first derivatives equal to zero we get the necessary conditions for a constrained maximum.
We also get a new variable, $\lambda$, called the Lagrange multiplier. How do we interpret $\lambda$? It turns out that the value of $\lambda$ tells us how much the maximum value of $f$ changes if we relax the constraint by a small amount. Specifically, suppose we are to maximize $f(x_1, x_2)$ s.t. the constraint $g(x_1, x_2) = g_0$. Call the solution $(x_1^0, x_2^0)$. Now suppose we relax the constraint and instead maximize $f(x_1, x_2)$ s.t. $g(x_1, x_2) = g_0 + dg_0$. How do we change our optimal choices of $x_1$ and $x_2$? Suppose we decide to use more $x_1$, enough to use up the added constraint. Since the total differential of $g$ is
$$dg = g_{x_1}(x_1, x_2)\,dx_1 + g_{x_2}(x_1, x_2)\,dx_2,$$
if we change only $x_1$ (that is, if $dx_2 = 0$), the amount we can change $x_1$ while satisfying the new constraint is
$$dx_1 = \frac{1}{g_{x_1}(x_1, x_2)}\,dg_0.$$
The increase in $f$ that accompanies this increase in $x_1$ is
$$df = f_{x_1}(x_1, x_2)\,dx_1 = \frac{f_{x_1}(x_1, x_2)}{g_{x_1}(x_1, x_2)}\,dg_0 = \lambda\,dg_0,$$
where the last step uses the first FONC, $f_{x_1} = \lambda g_{x_1}$. You are encouraged to check for yourself that if you were to use up the added constraint on $x_2$, $df$ would again be $\lambda\,dg_0$. This suggests another interpretation of the tangency condition: at a maximum, if the constraint were relaxed slightly, we would be indifferent as to whether to use the extra room on $x_1$ or on $x_2$.
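As a numerical check of this interpretation of $\lambda$, the sketch below uses the contour example from the two-step approach: maximize $f = x_1 x_2$ s.t. $x_1^2 + x_2^2 = g_0$, restricted to $x_1, x_2 \ge 0$. The value $g_0 = 8$, the grid search, and the helper name are illustrative assumptions. The FONC give $x_2 = 2\lambda x_1$ and $x_1 = 2\lambda x_2$, so $\lambda = 1/2$ and $x_1 = x_2 = \sqrt{g_0/2} = 2$; the code confirms that relaxing the constraint by $dg_0$ raises the maximum by about $\lambda\,dg_0$.

```python
import math


def max_on_circle(g0, steps=20_000):
    """Grid search for the max of f(x1, x2) = x1 * x2 on x1^2 + x2^2 = g0 with x1, x2 >= 0."""
    r = math.sqrt(g0)
    best_val, best_pt = -math.inf, None
    for k in range(steps + 1):
        t = (math.pi / 2) * k / steps               # angle in the first quadrant
        x1, x2 = r * math.cos(t), r * math.sin(t)   # every such point satisfies the constraint
        if x1 * x2 > best_val:
            best_val, best_pt = x1 * x2, (x1, x2)
    return best_val, best_pt


g0 = 8.0
f_star, (x1, x2) = max_on_circle(g0)
lam = x2 / (2 * x1)                       # from the first FONC: x2 - 2*lam*x1 = 0

dg0 = 1e-3                                # relax the constraint slightly
f_star_relaxed, _ = max_on_circle(g0 + dg0)
print(x1, x2, lam)
print((f_star_relaxed - f_star) / dg0)    # approximately lam
```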
As with unconstrained optimization, there are also second order conditions. These can be expressed algebraically; however, they amount to the condition that the objective function has contours that are more convex than the constraint (see Appendix 1.3).
(a) Contours of $f$ are more convex than $g(x_1, x_2) = g_0$. SOC satisfied.
(b) Contours of $f$ are linear, less convex than $g(x_1, x_2) = g_0$. SOC not satisfied.
Figure 7
1.3 Appendix
1.3.1 Convexity
A set $S \subseteq \mathbb{R}^2$ is convex if, for every pair of points $u = (u_1, u_2)$ and $v = (v_1, v_2)$ in $S$,
$$\forall \lambda \in [0, 1]: \quad \lambda u + (1 - \lambda) v \in S,$$
i.e. the line segment joining $u$ and $v$ lies entirely in $S$. A set that is not convex is called non-convex (informally, "concave").
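A function $f : [a, b] \to \mathbb{R}$ is called convex if, for every $x_1$ and $x_2$ in $[a, b]$,
$$\forall \lambda \in [0, 1]: \quad f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2).$$
Or, equivalently, $f : [a, b] \to \mathbb{R}$ is convex if the set $S = \{(x, y) \in [a, b] \times \mathbb{R} : y \ge f(x)\}$ is convex. A function $g : [a, b] \to \mathbb{R}$ is called concave if $-g$ is convex. Let $f$ be twice differentiable. Then
• $f$ is convex $\iff f''(x) \ge 0$ for all $x$ (and $f''(x) > 0$ for all $x$ implies $f$ is strictly convex),
• $f$ is concave $\iff f''(x) \le 0$ for all $x$ (and $f''(x) < 0$ for all $x$ implies $f$ is strictly concave).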
Throughout these notes, if $f''(x) > [<]\ g''(x) > [<]\ 0$ on some interval, then we shall think of $f$ as being more [less] convex [concave] than $g$.
A function $f : \mathbb{R}^2 \to \mathbb{R}$ is quasi-concave if $S_k = \{(x, y) \in \mathbb{R}^2 : f(x, y) \ge k\}$ is convex for all $k$. (The sets $S_k$ are called upper contour sets.)
1.3.2 SOC in Higher Dimensions
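Let $f : \mathbb{R}^n \to \mathbb{R}$, i.e. let $z = f(x_1, \dots, x_n)$, and define the Hessian $H(f)$ to be the matrix
$$H(f) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix}.$$
Next, define $H_i(f)$ to be the $i$th principal minor of $H(f)$, the submatrix comprised of the first $i$ rows and the first $i$ columns of $H(f)$. For example
$$H_2(f) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} \end{pmatrix}.$$
If, at $z_0 = f(x_1^0, \dots, x_n^0)$, $|H_i(f)| > 0$ for all $i$, then $z_0$ satisfies the SOC for a local minimum. On the other hand, if $\operatorname{sgn}(|H_i(f)|) = (-1)^i$ for all $i$, then $z_0$ satisfies the SOC for a local maximum.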
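The sign test on the leading principal minors can be sketched in a few lines. This is a minimal illustration under the assumption that the Hessian has already been evaluated at a point where the first-order conditions hold; the helper names are hypothetical, and the determinant is computed by plain cofactor expansion, which is adequate for the small matrices used here.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]   # drop row 0 and column j
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def leading_minors(h):
    """|H_i| for i = 1..n: determinants of the top-left i x i submatrices."""
    return [det([row[:i] for row in h[:i]]) for i in range(1, len(h) + 1)]


def classify(h):
    """Apply the sign rules for |H_i| stated above."""
    minors = leading_minors(h)
    if all(d > 0 for d in minors):
        return "SOC for a local minimum"
    if all((d > 0) if i % 2 == 0 else (d < 0) for i, d in enumerate(minors, start=1)):
        return "SOC for a local maximum"
    return "neither sign pattern holds"


print(classify([[-2, 1], [1, -2]]))   # Hessian of f = -x1^2 + x1*x2 - x2^2
print(classify([[2, 0], [0, 2]]))     # Hessian of f = x1^2 + x2^2
print(classify([[2, 0], [0, -2]]))    # Hessian of f = x1^2 - x2^2 (a saddle)
```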
2 Consumer Choice
In this section we apply the methods of optimization of Section 1 to the analysis of consumer choice
subject to a budget constraint. The problem has three elements:
1. describe the budget constraint,
2. describe the consumer's objective, i.e. his or her utility,
3. set up and solve the constrained optimization.
2.1 Budget Constraint
We assume that a consumer must choose among bundles $(x_1, \dots, x_n)$ of commodities 1 through $n$ that fall within his or her budget. In the case of just two goods $x_1$ and $x_2$, let their prices be $p_1$ and $p_2$, respectively. Let the consumer have income $I$. Then the bundle $(x_1, x_2)$ is affordable iff
$$p_1 x_1 + p_2 x_2 \le I.$$
Figure 8: Graphically, the set of affordable bundles (the budget set) is the triangular region bounded by the coordinate axes and the line $x_2 = -(p_1/p_2)\,x_1 + I/p_2$.
Note the following:
• if all income is spent on $x_1$, the total amount available is $I/p_1$ (and likewise for $x_2$),
• we are implicitly assuming that you cannot buy negative amounts of $x_1$ or $x_2$,
• the slope of the budget line (the outer boundary of the budget set) is $-p_1/p_2$.
2.2 Consumer's Objective
We seek a simple way of summarizing how the consumer evaluates alternative bundles, say $(x_1^0, x_2^0)$ and $(x_1^*, x_2^*)$.
Figure 9: If we give up one unit of $x_1$, we save $p_1$, which can be used to purchase $p_1/p_2$ units of $x_2$. The market trades $x_1$ for $x_2$ at the rate $p_1/p_2$. This ratio represents the relative price of $x_1$ and $x_2$.
Graphically, the device we use is the indifference curve: a curve connecting bundles that are equally good. Consider the indifference curve through $(x_1^0, x_2^0)$, i.e. the set of bundles that are exactly as good as $(x_1^0, x_2^0)$. (See Figure 10.)
Take a look at Figure 11. If both $x_1$ and $x_2$ are desirable, then bundles with more $x_1$ and more $x_2$ must be preferred to $(x_1^0, x_2^0)$. By the same token, $(x_1^0, x_2^0)$ must be preferred to bundles with less $x_1$ and less $x_2$. This means that indifference curves must have negative slope.
In more advanced treatments of economic theory, indifference curves are derived from a set of assumptions about how consumers evaluate alternative bundles. Some types of preferences cannot be represented by indifference curves. The classic example is lexicographic preferences: the consumer evaluates a bundle $(x_1, x_2)$ first by the amount of $x_1$, then by the amount of $x_2$. If $x_1^0 > x_1^*$, then $(x_1^0, x_2^0)$ is strictly preferred to $(x_1^*, x_2^*)$ regardless of $x_2^0$ and $x_2^*$. However, if $x_1^0 = x_1^*$, then the consumer compares $x_2^0$ and $x_2^*$. (This is the same way alphabetical order works.) As an exercise, try to graph the indifference curves of a consumer with lexicographic preferences.
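Lexicographic preferences are exactly how Python orders tuples, which gives a compact sketch of the rule (the function name is hypothetical). Because any difference in $x_1$, or a tie in $x_1$ with any difference in $x_2$, decides the ranking, two distinct bundles are never indifferent, which is why these preferences have no indifference curves in the usual sense.

```python
def lex_prefers(a, b):
    """True if bundle a = (x1, x2) is strictly preferred to bundle b under lexicographic preferences."""
    # Tuples compare element by element: x1 first, and x2 only to break a tie in x1.
    return tuple(a) > tuple(b)


print(lex_prefers((2.0, 0.0), (1.0, 1000.0)))  # more x1 wins regardless of x2
print(lex_prefers((1.0, 5.0), (1.0, 3.0)))     # tie in x1, so x2 decides
```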
Analytically, we represent preferences by a utility function $U(x_1, x_2)$ with domain equal to the set of possible consumption bundles. We construct $U$ such that higher values are preferred.
Examples:
• $U(x_1, x_2) = x_1 x_2$
• $U(x_1, x_2) = x_1 + x_2$
• $U(x_1, x_2) = \min\{x_1, x_2\}$
Facts:
• The contours of the function $U$ are the indifference curves.
• The bundles $(x_1^0, x_2^0)$ and $(x_1^*, x_2^*)$ lie on the same indifference curve iff $U(x_1^0, x_2^0) = U(x_1^*, x_2^*)$.
• Let $h > 0$. If more of $x_1$ is always preferred, then $U(x_1 + h, x_2) > U(x_1, x_2)$, which implies $U_{x_1}(x_1, x_2) > 0$ for every bundle $(x_1, x_2)$. (Likewise $U_{x_2} > 0$.) You are encouraged to verify this for each of the above examples.
• The slope of the indifference curve through $(x_1, x_2)$, at $(x_1, x_2)$, is $-U_{x_1}(x_1, x_2)/U_{x_2}(x_1, x_2)$. We call the absolute value of this ratio the marginal rate of substitution (MRS) because it is the amount of $x_2$ the consumer would need to compensate for the loss of one unit of $x_1$, or in other words the amount of $x_2$ needed, per unit of $x_1$ given up, in order to keep utility constant.
Figure 10: How does a consumer decide between $(x_1^0, x_2^0)$ and $(x_1^*, x_2^*)$?
Figure 11: If both $x_1$ and $x_2$ are desirable, then it follows that indifference curves are downward-sloping.
Figure 12
Examples:
• $U(x_1, x_2) = x_1^{\alpha} x_2^{\beta}$ (Cobb-Douglas):
$$U_{x_1}(x_1, x_2) = \alpha x_1^{\alpha - 1} x_2^{\beta}, \qquad U_{x_2}(x_1, x_2) = \beta x_1^{\alpha} x_2^{\beta - 1}, \qquad \text{MRS} = \frac{U_{x_1}(x_1, x_2)}{U_{x_2}(x_1, x_2)} = \frac{\alpha}{\beta}\,\frac{x_2}{x_1}.$$
• $U(x_1, x_2) = x_1 + x_2$: $\text{MRS} = U_{x_1}/U_{x_2} = 1$, a constant for every bundle $(x_1, x_2)$.
• $U(x_1, x_2) = 2 \log x_1 + x_2$: $\text{MRS} = U_{x_1}/U_{x_2} = (2/x_1)/1 = 2/x_1$, independent of $x_2$.
As an exercise, graph the indifference curves for these three examples.
Note: If your utility function is $U(x_1, x_2)$ and mine is $V(x_1, x_2) = aU(x_1, x_2) + b$, where $a > 0$, then we have the same preferences. Why? It can be shown that we have the same indifference curves, only with different labels. The result holds for $V = F(U)$, where $F$ is any monotonically increasing function.
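The MRS formulas in the three examples, and the claim that a monotonic transformation leaves the indifference curves (and hence the MRS) unchanged, can be checked numerically. The sketch below approximates $U_{x_1}$ and $U_{x_2}$ by central differences; the exponents $\alpha = 0.3$, $\beta = 0.7$, the evaluation points, and the particular transformation $F(U) = 5\log U + 3$ are illustrative assumptions.

```python
import math


def mrs(u, x1, x2, h=1e-6):
    """MRS = U_x1 / U_x2, with both partial derivatives approximated by central differences."""
    u1 = (u(x1 + h, x2) - u(x1 - h, x2)) / (2 * h)
    u2 = (u(x1, x2 + h) - u(x1, x2 - h)) / (2 * h)
    return u1 / u2


alpha, beta = 0.3, 0.7   # illustrative exponents


def cobb_douglas(x1, x2):
    return x1**alpha * x2**beta


def perfect_substitutes(x1, x2):
    return x1 + x2


def quasilinear(x1, x2):
    return 2 * math.log(x1) + x2


def transformed(x1, x2):
    """A monotonically increasing transformation F(U) = 5*log(U) + 3 of the Cobb-Douglas utility."""
    return 5 * math.log(cobb_douglas(x1, x2)) + 3


print(mrs(cobb_douglas, 2.0, 4.0), (alpha / beta) * 4.0 / 2.0)   # numeric vs. formula
print(mrs(transformed, 2.0, 4.0))                                 # same MRS, relabeled curves
print(mrs(perfect_substitutes, 3.0, 5.0))                         # constant MRS of 1
print(mrs(quasilinear, 4.0, 1.0), mrs(quasilinear, 4.0, 10.0))    # 2/x1, whatever x2 is
```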
You may be familiar with the concept of diminishing marginal rate of substitution (DMRS). Unless
stated otherwise, we shall assume DMRS in most of the examples throughout these notes.
(a) DMRS (b) Constant MRS (c) Increasing MRS
Figure 13
Along an indifference curve (holding utility constant), the MRS decreases with $x_1$: the more $x_1$ one has, the less one values an additional unit of $x_1$ in terms of $x_2$. DMRS implies that consumers always prefer averages. Suppose we have two bundles $(x_1^0, x_2^0)$ and $(x_1^*, x_2^*)$ on the same indifference curve. Then a bundle that is a weighted average of $(x_1^0, x_2^0)$ and $(x_1^*, x_2^*)$, e.g. $\lambda(x_1^0, x_2^0) + (1 - \lambda)(x_1^*, x_2^*)$, where $0 < \lambda < 1$, is strictly preferred to either of the original bundles.
Figure 14: The dashed line represents the set of all weighted averages of $x^0$ and $x^*$, that is, the set $S = \{\lambda x^0 + (1 - \lambda) x^* : 0 < \lambda < 1\}$. Clearly these are strictly preferred to both $x^0$ and $x^*$.
Equivalently, the set $S = \{x \in \mathbb{R}^2 : U(x) > U(x^0)\}$ is convex. (One can see this by noting the shape of the region above the indifference curve.)
It is important to understand that DMRS is not the same as diminishing marginal utility, nor are the two even related. Given a utility function $U$, the marginal utility of $x_1$ is $U_{x_1}$. We say that $U$ exhibits diminishing marginal utility if $U_{x_1 x_1} = (U_{x_1})_{x_1} < 0$. However, the sign of $U_{x_1 x_1}$ says nothing about the MRS, as the following examples show:
• $U(x_1, x_2) = (x_1^2 + x_2^2)^{1/4}$:
$$U_{x_1}(x_1, x_2) = \tfrac{1}{2}\,x_1 (x_1^2 + x_2^2)^{-3/4}, \qquad U_{x_1 x_1}(x_1, x_2) = \tfrac{1}{2}(x_1^2 + x_2^2)^{-7/4}\left(x_2^2 - \tfrac{1}{2}x_1^2\right),$$
which is negative whenever $x_1 > \sqrt{2}\,x_2$. In that region marginal utility is diminishing, but the indifference curves are circles, which exhibit increasing MRS.
• $U(x_1, x_2) = x_1^3 x_2^3$:
$$U_{x_1}(x_1, x_2) = 3x_1^2 x_2^3, \qquad U_{x_1 x_1}(x_1, x_2) = 6x_1 x_2^3 > 0.$$
Marginal utility is increasing, but the indifference curves are hyperbolas, which exhibit DMRS.
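The two examples can also be checked numerically. This sketch uses finite differences; the evaluation points are illustrative choices. It confirms that the circle utility has $U_{x_1 x_1} < 0$ at $(4, 1)$ while its MRS rises along the indifference curve $x_1^2 + x_2^2 = 17$, and that $x_1^3 x_2^3$ has $U_{x_1 x_1} > 0$ while its MRS falls along $x_1 x_2 = 4$.

```python
def mrs(u, x1, x2, h=1e-6):
    """MRS = U_x1 / U_x2 via central differences."""
    u1 = (u(x1 + h, x2) - u(x1 - h, x2)) / (2 * h)
    u2 = (u(x1, x2 + h) - u(x1, x2 - h)) / (2 * h)
    return u1 / u2


def u11(u, x1, x2, h=1e-4):
    """U_x1x1 via a central second difference."""
    return (u(x1 + h, x2) - 2 * u(x1, x2) + u(x1 - h, x2)) / h**2


def u_circle(x1, x2):
    return (x1**2 + x2**2) ** 0.25     # indifference curves are circles


def u_cubic(x1, x2):
    return x1**3 * x2**3               # indifference curves are hyperbolas


# (1, 4) and (4, 1) lie on x1^2 + x2^2 = 17 and on x1 * x2 = 4
print(u11(u_circle, 4.0, 1.0))                          # negative, since x1 > sqrt(2) * x2
print(mrs(u_circle, 1.0, 4.0), mrs(u_circle, 4.0, 1.0))  # MRS rises with x1
print(u11(u_cubic, 1.0, 4.0))                           # positive
print(mrs(u_cubic, 1.0, 4.0), mrs(u_cubic, 4.0, 1.0))    # MRS falls with x1
```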
2.3 Consumer's Optimum
Analytically, the consumer's problem is to solve
$$\max_{x_1, x_2} U(x_1, x_2) \quad \text{s.t.} \quad p_1 x_1 + p_2 x_2 = I.$$
Have a look at Figure 15. Clearly, a bundle $(x_1^0, x_2^0)$ is optimal if two things are true:
1. $p_1 x_1^0 + p_2 x_2^0 = I$,
2. $\text{MRS}(x_1^0, x_2^0) = p_1/p_2$.
Figure 15: The consumer chooses the bundle that lands her on the highest indifference curve while still lying on the budget line.
Condition (2), the tangency condition, expresses the simple fact that if $(x_1^0, x_2^0)$ is optimal, then there are no gains to be made by trading in the market any further. If $\text{MRS} > p_1/p_2$, then the consumer values $x_1$ more than the market does, in terms of $x_2$, so it would benefit the consumer to sell $x_2$ and buy more $x_1$. (See Figure 16.)
Figure 16: $\text{MRS} > p_1/p_2$. On the margin, the consumer values $x_1$ more than the market does, in terms of $x_2$, and there is room for a profitable trade! What happens if $\text{MRS} < p_1/p_2$?
To proceed analytically, let's use the Lagrangian method:
$$\mathcal{L}(x_1, x_2, \lambda) = U(x_1, x_2) - \lambda\,(p_1 x_1 + p_2 x_2 - I)$$
$$\mathcal{L}_{x_1} = U_{x_1}(x_1, x_2) - \lambda p_1 = 0 \qquad (*)$$
$$\mathcal{L}_{x_2} = U_{x_2}(x_1, x_2) - \lambda p_2 = 0 \qquad (**)$$
$$\mathcal{L}_{\lambda} = -p_1 x_1 - p_2 x_2 + I = 0 \qquad (***)$$
Dividing (*) by (**) gives the tangency condition
$$\frac{U_{x_1}(x_1, x_2)}{U_{x_2}(x_1, x_2)} = \frac{p_1}{p_2}.$$
Also,
$$\lambda = \frac{U_{x_1}(x_1, x_2)}{p_1} = \frac{U_{x_2}(x_1, x_2)}{p_2}.$$
With an extra dollar to spend I could either
(a) buy $1/p_1$ units of $x_1$ and increase my utility by $U_{x_1}(x_1, x_2)/p_1 = \lambda$, or
(b) buy $1/p_2$ units of $x_2$ and increase my utility by $U_{x_2}(x_1, x_2)/p_2 = \lambda$.
For this reason, $\lambda$ is sometimes called the marginal utility of income.
For example, if $U(x_1, x_2) = x_1 x_2$, then $\mathcal{L} = x_1 x_2 - \lambda(p_1 x_1 + p_2 x_2 - I)$, and the FONC are:
$$\mathcal{L}_{x_1} = x_2 - \lambda p_1 = 0,$$
$$\mathcal{L}_{x_2} = x_1 - \lambda p_2 = 0,$$
$$\mathcal{L}_{\lambda} = -p_1 x_1 - p_2 x_2 + I = 0.$$
Therefore, $x_1 = \lambda p_2$ and $x_2 = \lambda p_1$. Plugging these results back into the budget constraint,
$$p_1(\lambda p_2) + p_2(\lambda p_1) = I \implies 2\lambda p_1 p_2 = I \implies \lambda = \frac{I}{2 p_1 p_2}$$
$$\implies \quad x_1 = x_1(p_1, p_2, I) = \frac{I}{2p_1}, \qquad x_2 = x_2(p_1, p_2, I) = \frac{I}{2p_2}.$$
The functions $x_1(p_1, p_2, I)$ and $x_2(p_1, p_2, I)$ are called the demand functions. Notice that $p_1 x_1 = p_2 x_2 = I/2$, so the consumer spends half his or her income on each good! As an exercise, re-do the analysis for $U(x_1, x_2) = x_1^{\alpha} x_2^{\beta}$ with different values of $\alpha$ and $\beta$.
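A brute-force check of the Lagrangian solution: the sketch below searches a fine grid of points on the budget line and compares the best one with $x_1 = I/(2p_1)$, $x_2 = I/(2p_2)$. The prices, income, grid size, and helper name are illustrative assumptions. For the exercise, the same search is run on $x_1^{\alpha} x_2^{\beta}$ and compared with the standard Cobb-Douglas answer $x_1 = \alpha I/((\alpha + \beta) p_1)$.

```python
def best_bundle(u, p1, p2, income, steps=100_000):
    """Grid search along the budget line p1*x1 + p2*x2 = income (interior points only)."""
    best_val, best_x1 = float("-inf"), None
    for k in range(1, steps):
        x1 = (income / p1) * k / steps      # a fraction k/steps of the x1-intercept
        x2 = (income - p1 * x1) / p2        # the rest of income goes to x2
        if u(x1, x2) > best_val:
            best_val, best_x1 = u(x1, x2), x1
    return best_x1, (income - p1 * best_x1) / p2


p1, p2, income = 2.0, 5.0, 100.0
x1, x2 = best_bundle(lambda a, b: a * b, p1, p2, income)
print(x1, x2, income / (2 * p1), income / (2 * p2))   # grid vs. closed form

alpha, beta = 0.3, 0.7                                 # illustrative exponents for the exercise
y1, y2 = best_bundle(lambda a, b: a**alpha * b**beta, p1, p2, income)
print(y1, alpha * income / ((alpha + beta) * p1))      # grid vs. closed form
```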
2.4 Special Problems
• Preferences do not satisfy DMRS. (See Figure 17.)
Often, we restrict preferences by requiring the indifference curves to be convex to the origin. (Functions with this property are called quasi-concave. A function $u : \mathbb{R}^2 \to \mathbb{R}$ is quasi-concave if the upper contour sets $S_k = \{(x_1, x_2) \in \mathbb{R}^2 : u(x_1, x_2) \ge k\}$ are convex for all $k$.)
• Even with quasi-concave preferences, i.e. with convex indifference curves, we can still run into problems. (See Figure 18.) Most consumers consume zero units of most goods, so the endpoint problem is potentially one that economists must deal with. The problem is much worse the more narrowly goods are defined (e.g. Coke versus Pepsi), and becomes less serious the more broadly they are defined (e.g. beverages in general). A considerable amount of applied research on consumer demand involves the so-called discrete choice approach, focusing on whether consumers buy some or none of a given commodity. Daniel McFadden won the Nobel Prize for his research showing how to link the "buy, don't buy" decision to underlying utility functions.
(a) Constant MRS. There is no bundle with $\text{MRS} = p_1/p_2$.
(b) At this point $\text{MRS} = p_1/p_2$, but this is not an optimum. What's wrong?
Figure 17
(a) Endpoint optimum. $\text{MRS} < p_1/p_2$, $(x_1, x_2) = (0, I/p_2)$.
(b) Endpoint optimum. $\text{MRS} > p_1/p_2$, $(x_1, x_2) = (I/p_1, 0)$.
Figure 18: Endpoint Optima
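Endpoint optima are easy to produce with the quasilinear utility $U = 2\log x_1 + x_2$ from Section 2.2, whose MRS is $2/x_1$. The tangency bundle has $x_1 = 2p_2/p_1$, which costs $2p_2$; if income is below that, the consumer spends everything on $x_1$ and still has $\text{MRS} = 2p_1/I > p_1/p_2$, as in Figure 18(b). The function below is a hypothetical sketch of that case analysis.

```python
def quasilinear_demand(p1, p2, income):
    """Demand for U = 2*log(x1) + x2, allowing the endpoint optimum x2 = 0."""
    x1_tangency = 2 * p2 / p1               # MRS = 2/x1 equals p1/p2
    if p1 * x1_tangency <= income:          # tangency bundle affordable: interior optimum
        return x1_tangency, (income - p1 * x1_tangency) / p2
    # Otherwise MRS > p1/p2 even with all income on x1: the endpoint of Figure 18(b)
    return income / p1, 0.0


print(quasilinear_demand(1.0, 1.0, 10.0))   # (2.0, 8.0): interior tangency
print(quasilinear_demand(1.0, 1.0, 1.0))    # (1.0, 0.0): endpoint optimum
```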
3 Two Applications of Indifference Curve Analysis
We have seen that the consumer's optimum is represented by a tangency between an indifference curve and the budget constraint. This condition expresses the simple economic idea that the consumer, on the margin, cannot adjust her consumption bundle to spend the same amount of money and simultaneously achieve higher utility. Recall that the tangency condition characterizes the optimum only when the indifference curves exhibit DMRS and we don't have an endpoint optimum.
3.1 Analysis of a Subsidy
In many economies, certain commodities are subsidized by the government. A subsidy is a negative tax that is usually introduced to aid low-income consumers. Economists generally argue that subsidies are inefficient. Why?
Let there be two commodities: food $f$ and other stuff $x$. The price of other stuff is $p_x$, and the price of food is $p_f$. A typical consumer has income $I$ and normal preferences (quasi-concave indifference curves with DMRS). The budget constraint is $p_x x + p_f f = I$. See Figure 19.
Figure 19
Suppose now that a subsidy of $s$ dollars per unit is introduced on food. The budget constraint becomes $p_x x + (p_f - s) f = I$. If the consumer chooses the bundle $(x^*, f^*)$, then the cost of the subsidy to the government (for this consumer alone) is $s f^*$ dollars. Most economists would argue that you should instead give the consumer $s f^*$ dollars directly and leave the price of food alone. To see this, suppose the lump sum is given to the consumer directly, but she is forced to pay the market (unsubsidized) price for food. In this case her budget constraint is
$$p_x x + p_f f = I + s f^*. \qquad (*)$$
Notice that the bundle $(x^*, f^*)$ satisfies the budget constraint (*), since originally
$$p_x x^* + (p_f - s) f^* = I.$$
In other words, if I give the consumer $s f^*$ dollars she can still afford $(x^*, f^*)$. But she can do even better, as shown in Figure 20.
Figure 20
The reason is that the budget line (*), with the lump sum, is flatter than the budget line with the subsidy. They both pass through $(x^*, f^*)$, so the budget line (*) cuts through the indifference curve at that point and therefore enables the consumer to choose a bundle with higher utility.
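A numerical version of the subsidy argument, assuming (for illustration only) the utility $U = x \cdot f$, so that the consumer spends half of income on each good as in Section 2.3. The income, prices, and subsidy are made-up numbers, and the helper name is hypothetical. The subsidized bundle remains affordable under the equivalent lump sum, but the consumer then chooses a different bundle with higher utility.

```python
import math


def demand_half_half(px, pf, income):
    """Demand for U = x * f: half of income on each good (as in Section 2.3)."""
    return income / (2 * px), income / (2 * pf)


income, px, pf, s = 100.0, 1.0, 2.0, 1.0     # illustrative numbers

# Subsidy: the consumer faces the price pf - s for food
x_sub, f_sub = demand_half_half(px, pf - s, income)
cost = s * f_sub                               # government outlay s * f*

# Lump sum of the same cost, with food at the market price
x_ls, f_ls = demand_half_half(px, pf, income + cost)

print(math.isclose(px * x_sub + pf * f_sub, income + cost))  # (x*, f*) is still affordable
print(x_sub * f_sub, x_ls * f_ls)                            # utility: 2500.0 vs 2812.5
```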
Figure 21 shows another way to see the same point.
3.2 The Consumer Price Index
The CPI is a measure of how much it costs today (in today's dollars) to buy a fixed bundle of commodities. We currently use 1982-84 as our reference period, which means the CPI is calculated by finding the cost of the bundle relative to its cost in 1982-84, with the 1982-84 cost normalized to 100.
Suppose the CPI is 177.5 (which it was in July 2001). That means it now costs 1.775 times as much to purchase the standard bundle as it did on average in 1982-84. If someone earns 1.78 times as much as he did in the early 80s, then he is at least as well off as he was then.
Does your nominal income necessarily have to rise in proportion with the CPI? Suppose that in 1983 you purchased $(x_0, y_0)$ at prices $(p_x^0, p_y^0)$. Your income was $I_0$, and
$$x_0 p_x^0 + y_0 p_y^0 = I_0.$$
Figure 21: Note that the distance marked in the figure equals $s f^*/p_x$, i.e. the subsidy at the initial optimum, in terms of $x$.
Now suppose that in 2001 prices are $(p_x^0(1 + \pi), p_y^0(1 + \pi))$. In this case both prices increased at the rate $\pi$. How much would your income have to increase in order to offset the increase in prices? See Figure 22.
On the other hand, suppose $p_x$ rises by $3\pi/2$ and $p_y$ rises by $\pi/2$, i.e.
$$p_x = p_x^0\left(1 + \tfrac{3}{2}\pi\right), \qquad p_y = p_y^0\left(1 + \tfrac{1}{2}\pi\right).$$
The increase in the cost of living is represented by the increase in the cost of the reference bundle $(x_0, y_0)$:
$$p_x^0\left(1 + \tfrac{3}{2}\pi\right)x_0 + p_y^0\left(1 + \tfrac{1}{2}\pi\right)y_0 - p_x^0 x_0 - p_y^0 y_0 = \tfrac{3}{2}\pi\, p_x^0 x_0 + \tfrac{1}{2}\pi\, p_y^0 y_0.$$
If you initially spent half your income on each of $x$ and $y$, then $p_x^0 x_0 = p_y^0 y_0 = I_0/2$, and the increase in the cost of living is
$$\tfrac{3}{2}\pi \cdot \frac{I_0}{2} + \tfrac{1}{2}\pi \cdot \frac{I_0}{2} = \pi I_0,$$
a proportional increase of $\pi$. But if your income increases by the proportion $\pi$, you are better off!
The reasoning is as follows: if your income increases by enough to allow you to buy $(x_0, y_0)$, your budget is represented by the dashed line. But with that budget, you will not consume $(x_0, y_0)$; you will consume a bundle with more $y$, less $x$, and higher utility. You respond to the change in relative prices by altering your consumption. (See Figure 23.)
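The substitution effect behind this argument can be illustrated with numbers. The sketch assumes $U = x \cdot y$ (which makes the consumer spend half of income on each good, the case worked out above), base prices of 1, income 100, and $\pi = 0.10$; all of these are illustrative choices. The cost of the fixed reference bundle rises by exactly $\pi$, while the minimum spending needed to stay on the original indifference curve rises by less. For this particular utility the tangency condition gives $p_x x = p_y y$, so the minimum cost of reaching utility $u$ is $2\sqrt{p_x p_y u}$, which anticipates the expenditure function of Section 4.2.

```python
import math


def min_cost(px, py, u):
    """Minimum spending to reach U = x*y = u (an assumed utility function).

    Tangency gives px*x = py*y, so x = sqrt(u*py/px), y = sqrt(u*px/py), and cost = 2*sqrt(px*py*u).
    """
    return 2 * math.sqrt(px * py * u)


px0, py0, income0 = 1.0, 1.0, 100.0
x0, y0 = income0 / (2 * px0), income0 / (2 * py0)   # half of income on each good
u0 = x0 * y0

rate = 0.10                                          # the inflation rate pi in the text
px1, py1 = px0 * (1 + 1.5 * rate), py0 * (1 + 0.5 * rate)

fixed_bundle_index = (px1 * x0 + py1 * y0) / income0               # cost of (x0, y0) at new prices
true_cost_index = min_cost(px1, py1, u0) / min_cost(px0, py0, u0)  # keeps utility at u0
print(fixed_bundle_index, true_cost_index)
```

The gap between the two printed indices is the substitution bias discussed below for this assumed utility function.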
Figure 22
Figure 23
Table 1: Major Purchase Categories in CPI and Corresponding Weights
Category            Weight (%)   Price Index (Dec. 2000)
All                      100.0   174.1
Food & Beverage           16.3   169.5
Housing                   39.6   171.6
Apparel                    4.7   131.8
Transportation            17.5   155.2
Medical                    5.8   264.1
Recreation*                6.0   103.7
Education*                 2.7   115.4
Communication*             2.7    92.3
Other Items                4.7   276.2
* Reference period is Dec. 1997, not 1982-84.
The CPI is really a weighted average of prices for a fixed set of purchases. See Table 1 for an example of some of the major categories and their weights. Note the slow growth of apparel prices (usually attributed to the rapid rise in cheap imports) and the very rapid growth in medical prices.
The difference between the rate of increase in the average price of the reference bundle and the minimum increase in income necessary to maintain the original level of utility is called the substitution bias in the CPI. Note that it depends on two things: how disproportionately prices for different goods are rising, and how convex one's indifference curves are. The more convex the indifference curves, and the more dispersion in relative price increases, the bigger the substitution bias. The Boskin Commission estimated that on average substitution bias was about 0.5% per year in the U.S. over the past couple of decades.
There are lots of other, bigger sources of bias in the CPI. One that is hard to measure is quality bias: consumer goods change over time, which makes it hard to hold the reference bundle constant. Some new inventions since the early 80s: CD/DVD players, airbags and anti-lock brakes, the internet, laser printers, portable PCs, cell phones, The X-Files. Roughly speaking, quality changes are handled in the CPI by attempting to subtract the part of any price change that is due to quality, measured at the time the higher quality product is introduced. So, for example, when airbags first became available manufacturers charged about 500 dollars extra for them. Thus, when we compare the price of a new car in 2001 that is equipped with airbags to a similar model in 1990 without airbags, we subtract 500 dollars from the 2001 price before computing the price ratio.
4 Indirect Utility and the Expenditure Function
4.1 Indirect Utility
We characterized the solution to the problem
$$\max_{x_1, x_2} U(x_1, x_2) \quad \text{s.t.} \quad p_1 x_1 + p_2 x_2 = I$$
as an optimal pair $(x_1^0, x_2^0)$ that satisfies the first order conditions (tangency, budget constraint).
Note that $(x_1^0, x_2^0)$ varies with $(p_1, p_2, I)$. We call the optimal choices at a given level of prices and income the demand functions and write:
$$x_1 = x_1^0(p_1, p_2, I), \qquad x_2 = x_2^0(p_1, p_2, I).$$
Note that $p_1 x_1^0(p_1, p_2, I) + p_2 x_2^0(p_1, p_2, I) = I$, so the demand functions satisfy the budget constraint by definition, even as prices vary. This gives rise to restrictions on the demand functions.
The highest level of utility that can be achieved under $(p_1, p_2, I)$ is $U(x_1^0(p_1, p_2, I), x_2^0(p_1, p_2, I))$, which is the utility of the optimal choices under the budget parameters. We define the indirect utility function to be
$$v(p_1, p_2, I) = \max_{x_1, x_2} U(x_1, x_2) \ \text{s.t.}\ p_1 x_1 + p_2 x_2 = I \;=\; U\big(x_1^0(p_1, p_2, I),\, x_2^0(p_1, p_2, I)\big).$$
See Appendix ?? for more information about the derivatives of $v$; however, it should be clear to the reader that $v$ is decreasing in $p_1$ and $p_2$, and increasing in $I$.
Example: $U(x_1, x_2) = x_1^{\alpha} x_2^{\beta}$, where $\alpha + \beta = 1$. From Section 2.3 (see the exercise at its end), $x_1^0(p_1, p_2, I) = \alpha I/p_1$ and $x_2^0(p_1, p_2, I) = \beta I/p_2$. Note that $x_1^0$ does not depend on $p_2$, and $x_2^0$ does not depend on $p_1$. The indirect utility function is given by
$$v(p_1, p_2, I) = \left(\frac{\alpha}{p_1}\right)^{\alpha}\left(\frac{\beta}{p_2}\right)^{\beta} I.$$
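A quick consistency check of the Cobb-Douglas indirect utility function: evaluating $U$ at the demand functions should reproduce $v(p_1, p_2, I)$, and $v$ should fall when a price rises and rise with income. The parameter values and helper names are illustrative.

```python
def cd_demand(p1, p2, income, alpha):
    """Cobb-Douglas demands with alpha + beta = 1."""
    beta = 1 - alpha
    return alpha * income / p1, beta * income / p2


def indirect_utility(p1, p2, income, alpha):
    """v(p1, p2, I) = (alpha/p1)^alpha * (beta/p2)^beta * I."""
    beta = 1 - alpha
    return (alpha / p1) ** alpha * (beta / p2) ** beta * income


alpha, p1, p2, income = 0.4, 2.0, 3.0, 60.0    # illustrative values
x1, x2 = cd_demand(p1, p2, income, alpha)
direct = x1**alpha * x2 ** (1 - alpha)          # U evaluated at the demands
print(direct, indirect_utility(p1, p2, income, alpha))
```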
4.2 Expenditure Function
Instead of maximizing utility subject to a budget constraint, one could minimize spending subject to a utility constraint:
$$\min_{x_1, x_2}\ p_1 x_1 + p_2 x_2 \quad \text{s.t.} \quad U(x_1, x_2) = u_0.$$
The Lagrangian is
$$\mathcal{L}(x_1, x_2, \lambda) = p_1 x_1 + p_2 x_2 - \lambda\,\big(U(x_1, x_2) - u_0\big).$$
The FONC are:
$$p_1 - \lambda U_{x_1}(x_1, x_2) = 0,$$
$$p_2 - \lambda U_{x_2}(x_1, x_2) = 0,$$
$$U(x_1, x_2) = u_0.$$
Note that the first two conditions are equivalent to the tangency condition $p_1/p_2 = U_{x_1}/U_{x_2}$. Take a look at Figure 24. The parallel lines represent iso-cost lines: combinations such that $p_1 x_1 + p_2 x_2$ is constant. These can be thought of as the contours of the objective function. Their slope is $-p_1/p_2$. (Why?)
Figure 24
The utility maximization (U-max) and expenditure minimization (E-min) problems are called
dual problems, since they reverse the objective and the constraint.
What are the solutions to the E-min problem? The choices $(x_1, x_2)$ that minimize spending subject to a utility constraint are like demand functions, except that they take utility, rather than income, as given. We call these compensated demand functions, and denote them as follows:
$$x_1 = x_1^c(p_1, p_2, u_0), \qquad x_2 = x_2^c(p_1, p_2, u_0).$$
Sometimes these are called Hicksian demand functions, after John Hicks, the English economist who discovered them (and who shared the 1972 Nobel prize in economics).
Under prices $(p_1, p_2)$, and having chosen $x_1^c, x_2^c$, one spends a total of
$$p_1 x_1^c(p_1, p_2, u_0) + p_2 x_2^c(p_1, p_2, u_0).$$
We define the expenditure function (analogous to the indirect utility function, for it gives the amount spent assuming one has solved the E-min problem) to be
$$e(p_1, p_2, u_0) = \min_{x_1, x_2}\ p_1 x_1 + p_2 x_2 \ \text{s.t.}\ U(x_1, x_2) = u_0 \;=\; p_1 x_1^c(p_1, p_2, u_0) + p_2 x_2^c(p_1, p_2, u_0).$$
Note that $e(p_1, p_2, u_0)$ tells you the minimum amount of money necessary to achieve utility $u_0$ under prices $(p_1, p_2)$.
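Example: $U(x_1, x_2) = x_1^{\alpha} x_2^{\beta}$, where $\alpha + \beta = 1$. The Lagrangian is
$$\mathcal{L} = p_1 x_1 + p_2 x_2 - \lambda\,(x_1^{\alpha} x_2^{\beta} - u_0).$$
FONC:
$$\mathcal{L}_{x_1} = p_1 - \lambda \alpha x_1^{\alpha - 1} x_2^{\beta} = 0,$$
$$\mathcal{L}_{x_2} = p_2 - \lambda \beta x_1^{\alpha} x_2^{\beta - 1} = 0$$
$$\implies \frac{p_1}{p_2} = \frac{\alpha}{\beta}\,\frac{x_2}{x_1} \implies x_2 = \frac{\beta}{\alpha}\,\frac{p_1}{p_2}\,x_1.$$
Substituting this into the utility constraint,
$$x_1^{\alpha}\left(\frac{\beta}{\alpha}\,\frac{p_1}{p_2}\,x_1\right)^{\beta} = u_0,$$
which implies
$$x_1 = u_0\left(\frac{\alpha}{\beta}\,\frac{p_2}{p_1}\right)^{\beta}, \qquad x_2 = u_0\left(\frac{\beta}{\alpha}\,\frac{p_1}{p_2}\right)^{\alpha}.$$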
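The compensated demands above can be checked against the duality between the U-max and E-min problems. Summing $p_1 x_1^c + p_2 x_2^c$ and using $\alpha + \beta = 1$ gives $e(p_1, p_2, u_0) = u_0 (p_1/\alpha)^{\alpha} (p_2/\beta)^{\beta}$. The sketch sets $u_0 = v(p_1, p_2, I)$ with illustrative values ($\alpha = 0.4$, $p_1 = 2$, $p_2 = 3$, $I = 60$) and verifies that the compensated bundle attains $u_0$, that it costs exactly $I$, and that it coincides with the ordinary demand $\alpha I/p_1$. The identity $e(p_1, p_2, v(p_1, p_2, I)) = I$ is a standard duality result that is assumed here rather than derived in these notes; the helper names are hypothetical.

```python
def compensated_demand(p1, p2, u0, alpha):
    """Hicksian demands for U = x1^alpha * x2^beta with alpha + beta = 1 (formulas above)."""
    beta = 1 - alpha
    x1 = u0 * (alpha * p2 / (beta * p1)) ** beta
    x2 = u0 * (beta * p1 / (alpha * p2)) ** alpha
    return x1, x2


def expenditure(p1, p2, u0, alpha):
    """e(p1, p2, u0) = p1 * x1c + p2 * x2c."""
    x1, x2 = compensated_demand(p1, p2, u0, alpha)
    return p1 * x1 + p2 * x2


def indirect_utility(p1, p2, income, alpha):
    beta = 1 - alpha
    return (alpha / p1) ** alpha * (beta / p2) ** beta * income


alpha, p1, p2, income = 0.4, 2.0, 3.0, 60.0
u0 = indirect_utility(p1, p2, income, alpha)
x1c, x2c = compensated_demand(p1, p2, u0, alpha)
print(x1c**alpha * x2c ** (1 - alpha), u0)      # the utility constraint holds
print(expenditure(p1, p2, u0, alpha), income)   # duality: e(p, v(p, I)) = I
print(x1c, alpha * income / p1)                 # Hicksian = ordinary demand at u0 = v(p, I)
```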
5 Comparative Statics of Consumer Choice
In this section we characterize the changes in consumer demands that occur as income and prices vary. Our goal is to describe the consumer's demand functions. Analytically, the demand functions for the goods $x$ and $y$ are a pair of functions
$$x = x(p_x, p_y, I), \qquad y = y(p_x, p_y, I)$$
that describe the consumer's optimal choices of $x$ and $y$, given prices and income. As you can imagine, the nature of these functions is important in a wide variety of applications.
5.1 Change in Demand with Respect to Income, Engel Curves
As income changes, the budget constraint shifts in a parallel fashion: inward if $I$ decreases, outward if $I$ increases.
In commodity space ($xy$-space, or in our case the plane), the tangencies of the budget constraints with higher and higher indifference curves trace out the income expansion path shown in Figure 25.
For a good $x$, if the quantity of $x$ demanded increases with income, then $x$ is said to be a normal good. For some goods, the quantity demanded falls with income; such goods are called inferior. Analytically, $\partial x/\partial I > 0 \iff x$ normal, while $\partial x/\partial I < 0 \iff x$ inferior. (See Figure 26.)
Figure 25
A couple of interesting implications of the budget constraint for changes in $x$ and $y$ with respect to income:
(a) x, y normal (b) x normal, y borderline inferior (c) x inferior, y normal
Figure 26
Using the fact that income is always exhausted,
I = p
x
x +p
y
y
dI = p
x
dx +p
y
dy
1 = p
x
dx
dI
+p
y
dy
dI
,
so clearly both goods cannot be inferior for in that case the RHS would be negative.
Starting from the previous equation,
$$\frac{x p_x}{I}\cdot\frac{I}{x}\frac{dx}{dI} + \frac{y p_y}{I}\cdot\frac{I}{y}\frac{dy}{dI} = 1,$$
which is equivalent to
$$s_x e_x + s_y e_y = 1,$$
where $s_x$ and $s_y$ are the expenditure shares (the fraction of income spent on each good), and $e_x$ and $e_y$ are the income elasticities (the percent change in demand, $\Delta x/x$, divided by the percent change in income, $\Delta I/I$, or, in the limit as $\Delta I \to 0$, $(dx/x)/(dI/I)$). This equation can be summarized as follows: the expenditure-weighted sum of income elasticities is unity.
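As a numerical sanity check of this adding-up condition, the sketch below uses a hypothetical Stone-Geary (linear expenditure system) demand. Neither the functional form nor the parameter values `ALPHA`, `GAMMA_X`, `GAMMA_Y` come from these notes; they are illustrative assumptions. Income elasticities are approximated by central finite differences.

```python
# Sketch: check the adding-up condition s_x*e_x + s_y*e_y = 1.
# Demand system (Stone-Geary / linear expenditure system) and parameter
# values are hypothetical, chosen only for illustration.
ALPHA = 0.3      # marginal budget share of x (assumed)
GAMMA_X = 10.0   # subsistence quantity of x (assumed)
GAMMA_Y = 5.0    # subsistence quantity of y (assumed)


def demand(px, py, income):
    """Return (x, y); the budget p_x*x + p_y*y = I is exhausted by construction."""
    supernumerary = income - px * GAMMA_X - py * GAMMA_Y
    x = GAMMA_X + ALPHA * supernumerary / px
    y = GAMMA_Y + (1 - ALPHA) * supernumerary / py
    return x, y


def income_elasticities(px, py, income, h=1e-6):
    """Central-difference estimates of (e_x, e_y) = (dx/dI * I/x, dy/dI * I/y)."""
    x, y = demand(px, py, income)
    x_hi, y_hi = demand(px, py, income + h)
    x_lo, y_lo = demand(px, py, income - h)
    e_x = (x_hi - x_lo) / (2 * h) * income / x
    e_y = (y_hi - y_lo) / (2 * h) * income / y
    return e_x, e_y


px, py, income = 2.0, 1.0, 100.0
x, y = demand(px, py, income)
s_x, s_y = px * x / income, py * y / income      # expenditure shares
e_x, e_y = income_elasticities(px, py, income)
print(f"s_x={s_x:.3f} e_x={e_x:.3f} s_y={s_y:.3f} e_y={e_y:.3f}")
print("s_x*e_x + s_y*e_y =", s_x * e_x + s_y * e_y)
```

With these assumed numbers, x has $e_x < 1$ and y has $e_y > 1$, but the share-weighted sum still equals 1 up to rounding.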
The relation between x and I, holding prices constant, is called the Engel curve, and is shown in
Figure 27.
The data in Table 2 confirm Engel's Law: as income increases, the expenditure share of food decreases. The implication is that the income elasticity of food is less than unity. Why? Let x be food. Then $s_x = x p_x / I$ is the expenditure share of food, and
$$\frac{ds_x}{dI} = \frac{p_x \frac{dx}{dI}\, I - x p_x}{I^2} = \frac{x p_x}{I}\cdot\frac{I}{x}\frac{dx}{dI}\cdot\frac{1}{I} - \frac{1}{I}\cdot\frac{x p_x}{I} = \frac{s_x}{I}\,(e_x - 1),$$
Figure 27: The Engel curve starts from the origin if x = 0 when I = 0, (which is a reasonable
assumption). The Engel curve has positive slope if x is a normal good.
(a) Linear Engel curves: $dx/dI = x/I \Leftrightarrow e_x = 1$.
(b) Convex Engel curves: $dx/dI > x/I \Leftrightarrow e_x > 1$.
(c) Concave Engel curves: $dx/dI < x/I \Leftrightarrow e_x < 1$.
Figure 28
or
$$\frac{I}{s_x}\frac{ds_x}{dI} = e_x - 1.$$
So, if $e_x < 1$, then the food share is declining with income. An alternative proof employs a favorite trick of economists, taking natural logs:
$$\log s_x = \log x + \log p_x - \log I \;\Rightarrow\; \frac{d\log s_x}{d\log I} = \frac{d\log x}{d\log I} - 1,$$
or
$$\frac{I}{s_x}\frac{ds_x}{dI} = e_x - 1.$$
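The identity $(I/s_x)\,ds_x/dI = e_x - 1$ can be checked numerically. The sketch below treats x as "food" in a hypothetical linear expenditure system. The parameters are illustrative assumptions, not estimates. The food share falls with income exactly when the computed $e_x - 1$ is negative.

```python
# Sketch: Engel's law under a hypothetical linear expenditure system,
# with x playing the role of food. Parameters are illustrative assumptions.
ALPHA, GAMMA_X, GAMMA_Y = 0.3, 10.0, 5.0


def food_demand(px, py, income):
    return GAMMA_X + ALPHA * (income - px * GAMMA_X - py * GAMMA_Y) / px


def food_share(px, py, income):
    return px * food_demand(px, py, income) / income


def check_share_elasticity(px, py, income, h=1e-5):
    """Return ((I/s_x) ds_x/dI, e_x - 1), both by central differences."""
    s = food_share(px, py, income)
    ds_dI = (food_share(px, py, income + h) - food_share(px, py, income - h)) / (2 * h)
    dx_dI = (food_demand(px, py, income + h) - food_demand(px, py, income - h)) / (2 * h)
    e_x = dx_dI * income / food_demand(px, py, income)
    return income / s * ds_dI, e_x - 1


for income in (50.0, 100.0, 200.0):
    lhs, rhs = check_share_elasticity(2.0, 1.0, income)
    print(f"I={income:6.1f}  share={food_share(2.0, 1.0, income):.3f}  "
          f"(I/s)ds/dI={lhs:.4f}  e_x-1={rhs:.4f}")
```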
In some contexts, the food share is used as an indicator of welfare. It has been proposed that families in different countries with the same food share are equally well off.
5.2 Change in Demand with Respect to Price
A change in one of the prices causes the budget line to rotate; as it does so, the tangencies with higher and higher indifference curves trace out the price consumption path.
Table 2: Food Share of Std. Budget in Various Years
Year        Food share in std. budget* (%)
1935-39     35.4
1952        32.2
1963        25.2
1992        19.6
2000        16.3
* Budget used in calculation of CPI.
Figure 29
You should be familiar with the demand curve, which is the graph of the demand function $x(p_x) = x(p_x, p_y^0, I^0)$, where $p_y^0$ and $I^0$ are fixed. (See Figure 30.)
Note that we traditionally plot demand (the dependent variable) on the horizontal axis and the price (the independent variable) on the vertical axis.[3] The negative slope of the demand curve reflects the idea that consumption of a commodity falls as its price increases. However, demand curves are not necessarily downward sloping! We turn now to a decomposition of the change in demand due to a change in price. We show that there are two factors:
1. the curvature of the indifference curves, and
2. the nature of the income effect on demand.
5.3 Graphical Decomposition of a Change in Demand
Suppose $p_x$ increases from $p_x^0$ to $p_x^1$; demand changes from $(x^0, y^0)$ to $(x^1, y^1)$. We can decompose the change from $x^0$ to $x^1$ as follows:
[3] We owe this convention to Alfred Marshall. As a result, steep demand curves are inelastic, whereas flat demand curves are elastic.
Figure 30
1. First, think of the change in x that arises purely from the fact that x now costs more. Draw a budget line with slope $-p_x^1/p_y$ that still allows the consumer to reach the indifference curve through $(x^0, y^0)$ (call this indifference curve $u^0$). Since this line is steeper than the old budget line, its tangency with $u^0$ lies to the left of $(x^0, y^0)$.[4] This artificial budget constraint is represented by the dashed line in Figure 31.
2. Second, move from this intermediate point to the final optimum. This movement is a movement along an income expansion path, because the intermediate optimum occurs where $u^0$ is tangent to a budget line with slope $-p_x^1/p_y$.
Analytically,
$$\Delta x = x^1 - x^0 = (x^1 - x^*) + (x^* - x^0),$$
where $x^*$ denotes the intermediate optimum described above. We refer to the change $(x^* - x^0)$, holding utility constant, as the substitution effect. We refer to the change $(x^1 - x^*)$ as the income effect. Thus we write
$$\Delta x = \Delta x^S + \Delta x^I.$$
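The two-step decomposition can be computed explicitly for a simple case. The sketch below assumes hypothetical Cobb-Douglas preferences $U(x, y) = x^{A} y^{1-A}$ with $A = 0.5$, a doubling of $p_x$, and illustrative values of $p_y$ and I. None of these come from the notes. The compensated demand formula follows from the tangency condition for that utility function.

```python
# Hypothetical Cobb-Douglas preferences U(x, y) = x**A * y**(1 - A).
A = 0.5


def marshallian_x(px, py, income):
    """Regular (uncompensated) demand for x."""
    return A * income / px


def marshallian_y(px, py, income):
    return (1 - A) * income / py


def utility(x, y):
    return x ** A * y ** (1 - A)


def hicksian_x(px, py, u):
    """Compensated demand: the cost-minimizing x that still reaches utility u."""
    return u * (A * py / ((1 - A) * px)) ** (1 - A)


px0, px1, py, income = 1.0, 2.0, 1.0, 100.0   # price of x doubles
x0 = marshallian_x(px0, py, income)
u0 = utility(x0, marshallian_y(px0, py, income))
x_star = hicksian_x(px1, py, u0)   # intermediate optimum on u0
x1 = marshallian_x(px1, py, income)

substitution = x_star - x0        # movement along u0
income_effect = x1 - x_star       # movement along the income expansion path
print(f"x0={x0:.2f} x*={x_star:.2f} x1={x1:.2f}")
print(f"substitution={substitution:.2f} income={income_effect:.2f}")
```

Here x falls from 50 to 25. Roughly 14.6 units of that drop are substitution and roughly 10.4 are income effect, and the two add up to the total change.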
5.4 Substitution Effect
The substitution effect represents movement along an indifference curve. It tells you how far to move in order for the indifference curve to be parallel to the new budget line, i.e. in order for the MRS to equal the new price ratio. Obviously, then, if the indifference curves are relatively flat, you have to go a long way before the MRS equals the new price ratio, and the substitution effect is substantial. If the indifference curves are highly convex, the MRS changes rapidly and you do not need to go far: the substitution effect is small. See Figure 33.
[4] Assuming DMRS (diminishing marginal rate of substitution).
Figure 31
(a) Step 1: Move to new tangency on old indifference curve.
(b) Step 2: Move along IEP to new optimum.
Figure 32
Note that if $\Delta p_x > 0$, the substitution effect is negative. (Why?) What about the substitution effect of $\Delta p_x$ on y?
5.5 Income Effect
Intuitively, one might think the income effect is larger the greater $x^0$, i.e. the greater x was in the first place. If, initially, you consumed very little x, the income effect would be relatively small.
Take a look at Figure 34:
Notice that the intermediate budget constraint almost passes through $(x^0, y^0)$. (It always cuts below, if not by much.)
So, approximately, the income effect is proportional to the change in income represented by the shift from a new-price budget line through $(x^0, y^0)$ to the final budget constraint.
(a) $u^0$ flat: more substantial substitution effect
(b) $u^0$ highly curved: lesser substitution effect
Figure 33
Figure 34
What is the change in income? The final budget constraint limits the consumer to I, just as the initial constraint does. Therefore $I = p_x^0 x^0 + p_y y^0$. In order to be able to afford $(x^0, y^0)$ under the new prices, you would need $p_x^1 x^0 + p_y y^0$, or $\Delta I = \Delta p_x\, x^0$ more than before. For a small change in $p_x$, the intermediate optimum is close to the initial one, so the difference in income from the intermediate constraint to the final one is approximately $\Delta p_x\, x^0$. (The approximation is exact in the limit $\Delta p_x \to 0$.)
This confirms our intuition: the movement along the income expansion path from the intermediate optimum to the final optimum (the income effect) will be larger, the larger was $x^0$, our initial level of consumption of x.
6 Slutsky's Equation
6.1 Review
Expenditure function:
$$e(p_1, p_2, u^0) = \min_{x_1, x_2}\; p_1 x_1 + p_2 x_2 \;\text{ s.t. }\; U(x_1, x_2) = u^0$$
$$e(p_1, p_2, u^0) = p_1 x_1^c(p_1, p_2, u^0) + p_2 x_2^c(p_1, p_2, u^0),$$
where $x_1^c$ and $x_2^c$ are the compensated demands, the cheapest choices that enable one to achieve utility level $u^0$ at prices $(p_1, p_2)$.
The Lagrangian for the E-min problem is
$$\mathcal{L}(x_1, x_2, \lambda) = p_1 x_1 + p_2 x_2 - \lambda\,\big(U(x_1, x_2) - u^0\big).$$
The FONC are:
$$p_1 - \lambda U_{x_1}(x_1, x_2) = 0,\qquad p_2 - \lambda U_{x_2}(x_1, x_2) = 0,\qquad U(x_1, x_2) = u^0.$$
As for the derivatives of the expenditure function with respect to prices,
$$\frac{\partial e(p_1, p_2, u^0)}{\partial p_1} = x_1^c(p_1, p_2, u^0) + p_1 \frac{\partial x_1^c(p_1, p_2, u^0)}{\partial p_1} + p_2 \frac{\partial x_2^c(p_1, p_2, u^0)}{\partial p_1}.$$
In Appendix ??, we discussed the Envelope Theorem, which says the second and third terms on
the RHS cancel.
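Proof: Recall that $U\big(x_1^c(p_1, p_2, u^0), x_2^c(p_1, p_2, u^0)\big) = u^0$. Differentiate both sides with respect to $p_1$:
$$U_{x_1}\frac{\partial x_1^c}{\partial p_1} + U_{x_2}\frac{\partial x_2^c}{\partial p_1} = 0.$$
But $U_{x_1} = p_1/\lambda$ and $U_{x_2} = p_2/\lambda$ by the FONC. It follows by substitution that
$$\frac{p_1}{\lambda}\frac{\partial x_1^c}{\partial p_1} + \frac{p_2}{\lambda}\frac{\partial x_2^c}{\partial p_1} = 0,$$
which means
$$p_1\frac{\partial x_1^c}{\partial p_1} + p_2\frac{\partial x_2^c}{\partial p_1} = 0.$$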
Thus we have
$$\frac{\partial e(p_1, p_2, u^0)}{\partial p_1} = x_1^c(p_1, p_2, u^0).$$
There is a story that goes along with this. Suppose you are initially minimizing expenditure and the price of good 1 rises. What do you do? Your first order response is simply to keep buying the old bundle, which increases your spending by $x_1^c\,\Delta p_1$. That is the first term on the RHS of the expansion of $\partial e/\partial p_1$ above. You would then like to adjust your choices of goods 1 and 2 to reflect the new prices, and these adjustments are the second and third terms on that RHS. Because your initial choices were optimal (they satisfied the FONC), adjusting $x_1$ and $x_2$ saves you nothing further.
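The result $\partial e/\partial p_1 = x_1^c$ can be checked numerically without using any closed form. The sketch below assumes hypothetical preferences $U(x_1, x_2) = \sqrt{x_1 x_2}$ and illustrative values of $p_1$, $p_2$, $u^0$. It minimizes expenditure by ternary search, then compares a finite-difference derivative of the minimized cost with the minimizing $x_1$.

```python
# Sketch: numerically verify de/dp1 = x1^c for hypothetical preferences
# U(x1, x2) = sqrt(x1 * x2). The expenditure function is obtained by
# numerical minimization, not from a closed form.
P1, P2, U0 = 1.0, 4.0, 10.0   # illustrative prices and utility target


def min_expenditure(p1, p2, u0, iters=200):
    """Minimize p1*x1 + p2*x2 s.t. sqrt(x1*x2) = u0.

    On the constraint x2 = u0**2 / x1, so the cost is convex in x1 and a
    ternary search on x1 finds the minimum. Returns (expenditure, x1_c).
    """
    def cost(x1):
        return p1 * x1 + p2 * u0 ** 2 / x1

    lo, hi = 1e-9, 1e6
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cost(m1) < cost(m2):
            hi = m2
        else:
            lo = m1
    x1 = (lo + hi) / 2
    return cost(x1), x1


e0, x1c = min_expenditure(P1, P2, U0)
h = 1e-4
de_dp1 = (min_expenditure(P1 + h, P2, U0)[0] - min_expenditure(P1 - h, P2, U0)[0]) / (2 * h)
print(f"e={e0:.6f}  x1_c={x1c:.6f}  de/dp1={de_dp1:.6f}")
```

The derivative of the minimized cost matches the compensated demand even though the optimizer re-chooses $x_1$ and $x_2$ at each price. This is the envelope argument in numerical form.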
Now we are ready to analyze what happens to the uncompensated, or regular, demand functions when prices rise or fall. Suppose we start with prices $(p_1^0, p_2^0)$ and income $I^0$. Initially the optimal choices are $x_1^0 = x_1(p_1^0, p_2^0, I^0)$ and $x_2^0 = x_2(p_1^0, p_2^0, I^0)$, where $x_1(\cdot)$ and $x_2(\cdot)$ are the regular demand functions.
We decompose the effect of a change in price $\Delta p_1 = p_1^1 - p_1^0$ as follows:
(a) Starting from $(x_1^0, x_2^0)$, imagine the adjustment you would make if you could remain on the old indifference curve. This would lead you to a new bundle $(x_1^*, x_2^*)$. Since prices have risen, this bundle costs more than you were spending before. This move is called the substitution effect of the price increase.
(b) Then, from $(x_1^*, x_2^*)$, imagine the adjustment you would make to get back to the original income level. This would be a move inward along an income expansion path (IEP), and would lead you to $(x_1^1, x_2^1)$. This move is called the income effect of a price increase.
Figure 35
Note that the total change in $x_1$ is
$$\Delta x_1 = x_1^1 - x_1^0 = (x_1^1 - x_1^*) + (x_1^* - x_1^0) = \Delta x_1^I + \Delta x_1^S.$$
What are the relative magnitudes of the constituent parts? To begin, observe that $(x_1^0, x_2^0)$ and $(x_1^*, x_2^*)$ are on $u^0$. Now,
$$x_1^0 = x_1(p_1^0, p_2^0, I^0) = x_1^c(p_1^0, p_2^0, u^0).$$
Also,
$$x_1^* = x_1^c(p_1^1, p_2^0, u^0),$$
so
$$\Delta x_1^S = x_1^* - x_1^0 = x_1^c(p_1^1, p_2^0, u^0) - x_1^c(p_1^0, p_2^0, u^0) \approx \frac{\partial x_1^c(p_1^0, p_2^0, u^0)}{\partial p_1}\,\Delta p_1.$$
The substitution effect depends on the rate at which compensated demands change: this is purely a function of the curvature of the indifference curves.
How about the income effect?
$$\Delta x_1^I = x_1^1 - x_1^*.$$
First note that $x_1^1 = x_1(p_1^1, p_2^0, I^0)$: it is the regular demand given $(p_1^1, p_2^0, I^0)$. But what is $x_1^*$? It is the choice one would make with enough income to remain on $u^0$ even at the new prices. How much money would it take? The answer is $e(p_1^1, p_2^0, u^0)$! So,
$$x_1^* = x_1\big(p_1^1, p_2^0, e(p_1^1, p_2^0, u^0)\big).$$
Thus
$$\Delta x_1^I = x_1(p_1^1, p_2^0, I^0) - x_1\big(p_1^1, p_2^0, e(p_1^1, p_2^0, u^0)\big) \approx \frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I}\,\big(I^0 - e(p_1^1, p_2^0, u^0)\big).$$
So the income effect depends on the income derivative of demand times the change in income $\Delta I = I^0 - e(p_1^1, p_2^0, u^0)$. Note that $\Delta I < 0$, since one would need more than $I^0$ to achieve $U = u^0$ at prices $(p_1^1, p_2^0)$.
But how big is $\Delta I$? We need one last trick. We know that $I^0 = e(p_1^0, p_2^0, u^0)$, so we can write
$$\Delta I = I^0 - e(p_1^1, p_2^0, u^0) = e(p_1^0, p_2^0, u^0) - e(p_1^1, p_2^0, u^0) \approx \frac{\partial e(p_1^0, p_2^0, u^0)}{\partial p_1}\,(p_1^0 - p_1^1) = -\frac{\partial e(p_1^0, p_2^0, u^0)}{\partial p_1}\,\Delta p_1$$
(which is negative for an increase in $p_1$). Finally we have
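$$\frac{\partial e(p_1^0, p_2^0, u^0)}{\partial p_1} = x_1^c(p_1^0, p_2^0, u^0) = x_1^0,$$
where the first equality is the envelope result $\partial e/\partial p_1 = x_1^c$ derived above, and the second is the identity $x_1^0 = x_1^c(p_1^0, p_2^0, u^0)$. Combining the last few results,
$$\Delta I \approx -x_1^0\,\Delta p_1.$$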
Note that the size of the income effect depends on the original level of consumption of $x_1$.
Putting it all together,
$$\Delta x_1^I \approx \frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I}\,\Delta I \approx -\frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I}\,x_1^0\,\Delta p_1.$$
Thus
$$\Delta x_1 = \Delta x_1^I + \Delta x_1^S \approx -\frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I}\,x_1^0\,\Delta p_1 + \frac{\partial x_1^c(p_1^0, p_2^0, u^0)}{\partial p_1}\,\Delta p_1,$$
or
$$\frac{\Delta x_1}{\Delta p_1} \approx -x_1^0\,\frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I} + \frac{\partial x_1^c(p_1^0, p_2^0, u^0)}{\partial p_1}.$$
Now in the limit $\Delta p_1 \to 0$ the ratio $\Delta x_1/\Delta p_1$ equals the derivative of the regular demand function with respect to $p_1$. We have established:
$$\frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial p_1} = -x_1^0\,\frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I} + \frac{\partial x_1^c(p_1^0, p_2^0, u^0)}{\partial p_1}.$$
This is called Slutsky's equation, after the Russian economist who proved it over 100 years ago. Slutsky's equation says the derivative of the regular demand function with respect to $p_1$ is a combination of the income and substitution effects. The income effect depends on the derivative of demand with respect to income, times the original level of consumption of $x_1$. The substitution effect depends on the derivative of the compensated demand function.
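A quick numerical check of Slutsky's equation is shown below. It assumes hypothetical Cobb-Douglas preferences $U = x_1^{A} x_2^{1-A}$ with $A = 0.3$, for which the regular and compensated demands have closed forms. All derivatives are taken by central differences at illustrative prices and income.

```python
A = 0.3  # hypothetical Cobb-Douglas weight on good 1


def x1_regular(p1, p2, income):
    return A * income / p1


def indirect_utility(p1, p2, income):
    return (A * income / p1) ** A * ((1 - A) * income / p2) ** (1 - A)


def x1_compensated(p1, p2, u):
    return u * (A * p2 / ((1 - A) * p1)) ** (1 - A)


def slutsky_check(p1, p2, income, h=1e-6):
    """Return (dx1/dp1, dx1^c/dp1 - x1^0 * dx1/dI), both by central differences."""
    u0 = indirect_utility(p1, p2, income)
    x10 = x1_regular(p1, p2, income)
    dx_dp = (x1_regular(p1 + h, p2, income) - x1_regular(p1 - h, p2, income)) / (2 * h)
    dx_dI = (x1_regular(p1, p2, income + h) - x1_regular(p1, p2, income - h)) / (2 * h)
    dxc_dp = (x1_compensated(p1 + h, p2, u0) - x1_compensated(p1 - h, p2, u0)) / (2 * h)
    return dx_dp, dxc_dp - x10 * dx_dI


lhs, rhs = slutsky_check(p1=2.0, p2=1.0, income=100.0)
print(f"dx1/dp1 = {lhs:.6f}, substitution + income terms = {rhs:.6f}")
```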
A useful feature of Slutsky's equation is that it provides a way to recover information about indifference curves from the derivatives of the demand functions with respect to prices and income. In principle, we can observe $\partial x_1/\partial p_1$ and $\partial x_1/\partial I$, which would enable us to infer
$$\frac{\partial x_1^c(p_1^0, p_2^0, u^0)}{\partial p_1} = \frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial p_1} + x_1^0\,\frac{\partial x_1(p_1^0, p_2^0, I^0)}{\partial I}.$$
Suppose we get an estimate of $\partial x_1^c/\partial p_1$ that is nearly zero. The indifference curves must therefore be almost Leontief (right angles).
7 Using Market Level Demand Curves
Since the demand curve graphs $x = f(p_x, p_y, I)$, if $p_y$ or I changes, the demand curve shifts. For example, if income were to increase by $dI > 0$, then at a given price, demand would increase by $dx = (\partial x/\partial I)\,dI$. For a normal good $\partial x/\partial I > 0$, so the demand curve would shift to the right as in Figure 36.
Figure 36
If the elasticities of demand are approximately constant, then
$$d(\log x) = \frac{dx}{x} = \left(\frac{\partial x}{\partial I}\cdot\frac{I}{x}\right)\frac{dI}{I} = e_x\,\frac{dI}{I} = e_x\,d(\log I),$$
where $e_x$ is the income elasticity of demand for x.[5]
Similarly, if $p_y$ changes, the demand curve shifts unless $\partial x/\partial p_y = 0$ (as in the case of Cobb-Douglas preferences). If $\partial x/\partial p_y > 0$ (x and y are substitutes), an increase in the price of y causes the demand curve for x to shift to the right.
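For the purposes of evaluating the effect of relatively small changes in prices and income, we often assume the demand function has constant elasticities:
$$\frac{\partial x}{\partial p_x}\cdot\frac{p_x}{x} = \frac{\partial \log x}{\partial \log p_x} = \varepsilon_{xx}\ \text{(constant)}$$
$$\frac{\partial x}{\partial p_y}\cdot\frac{p_y}{x} = \frac{\partial \log x}{\partial \log p_y} = \varepsilon_{xy}\ \text{(constant)}$$
$$\frac{\partial x}{\partial I}\cdot\frac{I}{x} = \frac{\partial \log x}{\partial \log I} = e_x\ \text{(constant)}$$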
This is equivalent to assuming that the demand function is log-linear:
$$\log x = \varepsilon_{xx}\log p_x + \varepsilon_{xy}\log p_y + e_x\log I + c,$$
[5] You should be familiar with the concept of elasticity from Econ 1. In particular, you should be able to verify that elasticity is a unitless quantity.
where c is a constant. Note that homogeneity implies $\varepsilon_{xx} + \varepsilon_{xy} + e_x = 0$. Put differently, if prices and income all rise by one percent, then x remains constant.[6]
As you recall from introductory economics, the market is constructed by introducing a supply curve of the form $x = S(p_x)$. (See Figure 37.) It is usually assumed that supply is upward sloping. (We defer the derivation of market supply curves until later.) For now, we shall assume that the elasticity of supply is constant:
$$\frac{dS(p_x)}{dp_x}\cdot\frac{p_x}{S(p_x)} = \eta_x,$$
where $\eta_x$ denotes the elasticity of supply. We now can combine supply and demand curves to analyze the effects of exogenous shocks to income or other prices. We have
$$x = S(p_x) = f(p_x, p_y, I),$$
a system of two equations in two unknowns, $p_x$ and x (unit price of x and quantity of x, respectively), given income and other prices. (See Figure 38.)
Figure 37
7.1 An Increase in Income
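Obviously, if x is a normal good, both x and $p_x$ increase with I. But by how much? Take a look at Figure 39. Starting at equilibrium, with $x = x^0$ and $p_x = p_x^0$, the changes in demand and supply are:
$$\frac{\Delta x}{x} = \varepsilon_{xx}\frac{\Delta p_x}{p_x} + e_x\frac{\Delta I}{I}\quad\text{(demand)}$$
$$\frac{\Delta x}{x} = \eta_x\frac{\Delta p_x}{p_x}\quad\text{(supply)}$$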
[6] A proof would involve recognizing that if x remains constant, then so does log x, and therefore setting the total differential of log x equal to zero. The details are left to the reader.
Figure 38
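The proportional changes in supply and demand have to be the same in order to restore equilibrium. Therefore
$$\varepsilon_{xx}\frac{\Delta p_x}{p_x} + e_x\frac{\Delta I}{I} = \eta_x\frac{\Delta p_x}{p_x},$$
which implies
$$\frac{\Delta p_x}{p_x} = \left(\frac{e_x}{\eta_x - \varepsilon_{xx}}\right)\frac{\Delta I}{I}.$$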
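Note that $\eta_x > 0$ and $\varepsilon_{xx} < 0$, so $\eta_x - \varepsilon_{xx}$ is strictly positive. Furthermore,
$$\frac{\Delta x}{x} = \eta_x\frac{\Delta p_x}{p_x} = \left(\frac{\eta_x e_x}{\eta_x - \varepsilon_{xx}}\right)\frac{\Delta I}{I}.$$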
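For example, suppose the following:
- $\eta_x = 0.60$ (short run)
- $\varepsilon_{xx} = -1.40$
- $e_x = 0.40$
If $\Delta I/I = 0.10$ (a 10% increase), then
$$\frac{\Delta p_x}{p_x} = \left(\frac{0.40}{2.00}\right)(0.10) = 0.02, \qquad \frac{\Delta x}{x} = (0.60)(0.02) = 0.012.$$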
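The arithmetic of this example can be wrapped in a small helper. The function name `income_shock` is hypothetical. It simply evaluates the two formulas above under the constant-elasticity assumption.

```python
def income_shock(eta_x, eps_xx, e_x, dI_over_I):
    """Proportional equilibrium changes (dp_x/p_x, dx/x) after an income shock,
    assuming constant supply elasticity eta_x, own-price elasticity eps_xx,
    and income elasticity e_x."""
    dp_over_p = e_x / (eta_x - eps_xx) * dI_over_I
    dx_over_x = eta_x * dp_over_p          # read off the supply curve
    return dp_over_p, dx_over_x


dp, dx = income_shock(eta_x=0.60, eps_xx=-1.40, e_x=0.40, dI_over_I=0.10)
print(f"price change {dp:.3%}, quantity change {dx:.3%}")
```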
As an exercise, calculate the effect of a 10% drop in the price of a substitute good (good y) on the market for x. Use an estimate for the cross-price elasticity between x and y of 0.67 ($\varepsilon_{xy} = 0.67$).
Figure 39
7.2 Tax Incidence
If a tax of t dollars per unit is imposed on x, it creates a gap of t dollars per unit between the price that consumers pay and the price that producers receive. You are presumed to be familiar with the diagram shown in Figure 40.
Starting from an equilibrium at $(p_x^0, x^0)$, the price received by producers falls to $p_x^1$, the price paid by consumers rises to $p_x^1 + t$, and the quantity falls to $x^1$. Consider the two markets shown in Figure 41, each with the same tax. Obviously, the effect of the tax on the prices paid and received by the two sides depends on the relative elasticities of supply and demand. To see this more formally, we proceed on the assumption that elasticities are roughly constant. Letting $p_x$ denote the price received by producers, the change in supply is
$$\frac{\Delta x}{x} = \eta_x\frac{\Delta p_x}{p_x}.$$
The change in price for consumers is $\Delta p_x + t$. Therefore, the change in quantity demanded is
$$\frac{\Delta x}{x} = \varepsilon_{xx}\left(\frac{\Delta p_x + t}{p_x}\right).$$
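Market equilibrium requires that the change in demand equals the change in supply:
$$\varepsilon_{xx}\left(\frac{\Delta p_x + t}{p_x}\right) = \eta_x\frac{\Delta p_x}{p_x}.$$
Solving for the equilibrium change in prices, we have
$$\varepsilon_{xx}\frac{t}{p_x} = \frac{\Delta p_x}{p_x}\,(\eta_x - \varepsilon_{xx}),$$
and
$$\frac{\Delta p_x}{p_x} = \left(\frac{\varepsilon_{xx}}{\eta_x - \varepsilon_{xx}}\right)\frac{t}{p_x}. \qquad (t/p_x \text{ is the proportional tax rate})$$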
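Since $\eta_x > 0$ and $\varepsilon_{xx} < 0$, $\eta_x - \varepsilon_{xx}$ is strictly positive, and therefore $\Delta p_x < 0$. With regard to quantity,
$$\frac{\Delta x}{x} = \eta_x\frac{\Delta p_x}{p_x} = \left(\frac{\eta_x\varepsilon_{xx}}{\eta_x - \varepsilon_{xx}}\right)\frac{t}{p_x} < 0.$$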
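For producers, the change in price is
$$\frac{\Delta p_x}{p_x} = \left(\frac{\varepsilon_{xx}}{\eta_x - \varepsilon_{xx}}\right)\frac{t}{p_x},$$
and for consumers it is
$$\frac{\Delta p_x + t}{p_x} = \left(\frac{\varepsilon_{xx}}{\eta_x - \varepsilon_{xx}}\right)\frac{t}{p_x} + \frac{t}{p_x} = \left(\frac{\eta_x}{\eta_x - \varepsilon_{xx}}\right)\frac{t}{p_x} > 0.$$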
Notice that the ratio of the changes in prices for producers versus consumers is $\varepsilon_{xx}/\eta_x$. So, if demand is highly inelastic, i.e. $|\varepsilon_{xx}|$ is small (e.g. $\varepsilon_{xx} = -0.1$), and supply is moderately elastic (e.g. $\eta_x = 1.0$), then producer prices don't fall by much relative to the rise in consumer prices. On the other hand, if demand is highly elastic, i.e. if $|\varepsilon_{xx}|$ is big (e.g. $\varepsilon_{xx} = -3.0$), then producer prices are more affected.
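The incidence formulas can be evaluated directly for the two illustrative cases just mentioned. The helper name `tax_incidence` and the 10% proportional tax rate are assumptions made for this sketch. Consumer minus producer price changes always sum to the tax rate, and their ratio is $\varepsilon_{xx}/\eta_x$.

```python
def tax_incidence(eta_x, eps_xx, tax_rate):
    """Proportional changes in producer price, consumer price, and quantity
    from a per-unit tax, under constant elasticities (tax_rate = t / p_x)."""
    denom = eta_x - eps_xx          # strictly positive when eta_x > 0 > eps_xx
    producer = eps_xx / denom * tax_rate
    consumer = eta_x / denom * tax_rate
    quantity = eta_x * producer
    return producer, consumer, quantity


for eps_xx in (-0.1, -3.0):
    prod, cons, qty = tax_incidence(eta_x=1.0, eps_xx=eps_xx, tax_rate=0.10)
    print(f"eps_xx={eps_xx}: producer {prod:+.4f}, consumer {cons:+.4f}, quantity {qty:+.4f}")
```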
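Last we consider the effect of a per unit subsidy of s on the price of x. (For example, prior to the recent rise in electricity rates, electricity prices were subsidized throughout most of California.) The change in price received by producers is $\Delta p_x$, whereas the change in price paid by consumers is $\Delta p_x - s$. The proportional changes in quantity are:
$$\frac{\Delta x}{x} = \varepsilon_{xx}\left(\frac{\Delta p_x - s}{p_x}\right)\quad\text{(demand)}$$
$$\frac{\Delta x}{x} = \eta_x\frac{\Delta p_x}{p_x}\quad\text{(supply)}$$
Setting the two equal, we have
$$\frac{\Delta p_x}{p_x} = \left(\frac{-\varepsilon_{xx}}{\eta_x - \varepsilon_{xx}}\right)\frac{s}{p_x} > 0,$$
which implies that part of the effect of the subsidy is offset by a rise in prices. In fact, the change in price paid by consumers is
$$\frac{\Delta p_x - s}{p_x} = \left(\frac{-\varepsilon_{xx}}{\eta_x - \varepsilon_{xx}}\right)\frac{s}{p_x} - \frac{s}{p_x} = \left(\frac{-\eta_x}{\eta_x - \varepsilon_{xx}}\right)\frac{s}{p_x} < 0.$$
Note that $\eta_x/(\eta_x - \varepsilon_{xx})$ is less than one in absolute value.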
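The subsidy case mirrors the tax case. In the sketch below, the helper name `subsidy_incidence` and the elasticity and subsidy values are illustrative assumptions. It shows producers capturing part of the subsidy, so the consumer price falls by less than $s/p_x$.

```python
def subsidy_incidence(eta_x, eps_xx, subsidy_rate):
    """Proportional changes in producer price and consumer price from a
    per-unit subsidy, with subsidy_rate = s / p_x and constant elasticities."""
    denom = eta_x - eps_xx
    producer = -eps_xx / denom * subsidy_rate   # > 0: producers capture part
    consumer = -eta_x / denom * subsidy_rate    # < 0, but smaller than s/p_x
    return producer, consumer


prod, cons = subsidy_incidence(eta_x=1.0, eps_xx=-0.5, subsidy_rate=0.20)
print(f"producer price {prod:+.4f}, consumer price {cons:+.4f}")
```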
Figure 40
(a) Demand inelastic, supply elastic. (b) Demand elastic, supply inelastic.
Figure 41
8 Labor Supply
In this section we consider the choice of how many hours to work by an individual who faces an hourly wage $w > 0$, and also has non-labor income y. The individual is assumed to value leisure $\ell$ and consumption of goods x, using a utility function $u(x, \ell)$. We assume there is an upper bound T on leisure, and that the sum of leisure and hours of work h is T:
$$\ell + h = T, \quad\text{or}\quad h = T - \ell.$$
The graph looks a little unusual since preferences are only defined up to the point where $\ell = T$. (See Figure 42.)
Figure 42
The budget constraint is $px = wh + y$, but we shall assume $p = 1$. The consumer's objective is
$$\max_{x, \ell}\; u(x, \ell) \;\text{ s.t. }\; x = w(T - \ell) + y, \quad\text{or}\quad x + w\ell = y + wT.$$
Note that if you think of the consumption bundle as $(x, \ell)$, then the budget constraint says the total cost of the bundle has to be $y + wT$, for this is all the income you would have if you bought no leisure. This "full income" depends on w, and therein lies the key difference between labor supply and other consumer choice problems: as the price of one good (leisure) rises, the consumer is actually richer. Intuitively this is because a worker is a net seller of leisure: he or she starts at an endowment point $(x, \ell) = (y, T)$. From there he or she can trade with the market by giving up leisure in return for cash, which is then used to purchase goods.
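We proceed by the method of Lagrange:
$$\mathcal{L}(x, \ell, \lambda) = u(x, \ell) - \lambda\,(x + w\ell - y - wT)$$
$$\frac{\partial\mathcal{L}}{\partial x} = u_x(x, \ell) - \lambda = 0$$
$$\frac{\partial\mathcal{L}}{\partial \ell} = u_\ell(x, \ell) - \lambda w = 0$$
$$\frac{\partial\mathcal{L}}{\partial \lambda} = -x - w\ell + y + wT = 0$$
The first two FONC imply the usual tangency condition: $u_\ell(x, \ell)/u_x(x, \ell) = w$. The solutions are:
$$x = x(w, y),\qquad \ell = \ell(w, y),\qquad h(w, y) = T - \ell(w, y).$$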
Now consider the rise in w (from $w^0$ to $w^1$) shown in Figure 43. As you can see, the substitution effect causes a drop in $\ell$, or equivalently a rise in h. But the income effect works in the opposite direction: as a net seller of leisure the agent is better off and uses some of her extra income to buy more leisure.
Figure 43
To formally analyze the income and substitution effects we rely on the expenditure function for the labor supply case: this is the amount of non-labor income needed to achieve utility $u^0$, given w:
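$$e(w, u^0) = \min_{x, \ell}\; x - w(T - \ell) \;\text{ s.t. }\; u(x, \ell) = u^0$$
$$\mathcal{L}(x, \ell, \lambda) = x - w(T - \ell) - \lambda\,[u(x, \ell) - u^0]$$
$$\frac{\partial\mathcal{L}}{\partial x} = 1 - \lambda u_x(x, \ell) = 0$$
$$\frac{\partial\mathcal{L}}{\partial \ell} = w - \lambda u_\ell(x, \ell) = 0$$
$$\frac{\partial\mathcal{L}}{\partial \lambda} = -u(x, \ell) + u^0 = 0$$
The first two FONC imply the tangency condition: $u_\ell(x, \ell)/u_x(x, \ell) = w$. The solutions are:
$$x = x^c(w, u^0),\qquad \ell = \ell^c(w, u^0),\qquad h^c(w, u^0) = T - \ell^c(w, u^0).$$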
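The expenditure function is thus
$$e(w, u^0) = x^c(w, u^0) - w\,[T - \ell^c(w, u^0)] = x^c(w, u^0) - w\,h^c(w, u^0),$$
and
$$\frac{\partial e}{\partial w} = \underbrace{\frac{\partial x^c}{\partial w} - w\frac{\partial h^c}{\partial w}}_{=\,0} - h^c = -h^c.$$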
To see that $\partial x^c/\partial w - w\,\partial h^c/\partial w = 0$, we use the same trick as we did in Section 6 when dealing with the usual expenditure function. Recalling that $(x^c(w, u^0), \ell^c(w, u^0))$ yields utility $u^0$,
$$u\big(x^c(w, u^0), \ell^c(w, u^0)\big) = u^0,$$
and therefore, differentiating both sides,
$$u_x\big(x^c, \ell^c\big)\frac{\partial x^c}{\partial w} + u_\ell\big(x^c, \ell^c\big)\frac{\partial \ell^c}{\partial w} = 0.$$
But $w u_x = u_\ell$ by the tangency condition, and $\partial h^c/\partial w = -\partial \ell^c/\partial w$; dividing through by $u_x$ gives $\partial x^c/\partial w - w\,\partial h^c/\partial w = 0$, the desired result. (Again, this is an example of the Envelope Theorem.)
To summarize, we have shown that $\partial e/\partial w = -h^c(w, u^0)$. To understand this, think of your mom when she finds out you got a raise at your summer job: she reduces your allowance by an amount proportional to how much you were working.
Now let's see how leisure choice depends on wages. Assume we start with $(w^0, y^0)$, and that w rises from $w^0$ to $w^1$. The rise in w causes a substitution effect and an income effect:
$$\Delta\ell = \Delta\ell^S + \Delta\ell^I.$$
As usual, we can write
$$\Delta\ell^S \approx \frac{\partial \ell^c}{\partial w}\,\Delta w,$$
representing the compensated adjustment to the higher cost of leisure on the indifference curve corresponding to level $u^0$. Also,
$$\Delta\ell^I = \ell(w^1, y^0) - \ell(w^1, y^1),$$
where $y^0$ is the original non-labor income, and $y^1 = e(w^1, u^0)$. We use our standard trick of taking first order approximations, based on the expenditure function. First, we can approximate
$$\ell(w^1, y^0) - \ell(w^1, y^1) \approx \frac{\partial \ell(w^1, y^1)}{\partial y}\,(y^0 - y^1),$$
and, recognizing that $y^0 = e(w^0, u^0)$,
$$y^0 - y^1 = e(w^0, u^0) - e(w^1, u^0) \approx -\frac{\partial e(w^0, u^0)}{\partial w}\,\Delta w = h^c(w^0, u^0)\,\Delta w = h^0\,\Delta w.$$
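So,
$$\Delta\ell^I \approx \frac{\partial \ell(w^1, y^1)}{\partial y}\,h^0\,\Delta w.$$
The income effect is proportional to $h^0\,\Delta w$: if you had been working more, there would be a bigger positive income effect. Finally, then, we have
$$\Delta\ell = \Delta\ell^S + \Delta\ell^I \approx \frac{\partial \ell^c(w^0, u^0)}{\partial w}\,\Delta w + \frac{\partial \ell(w^1, y^1)}{\partial y}\,h^0\,\Delta w.$$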
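Dividing both sides by $\Delta w$, and taking the limit $\Delta w \to 0$,
$$\frac{\partial \ell}{\partial w} = \lim_{\Delta w\to 0}\frac{\Delta\ell}{\Delta w} = \frac{\partial \ell^c(w^0, u^0)}{\partial w} + h^0\,\frac{\partial \ell(w^0, y^0)}{\partial y}.$$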
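This is Slutsky's equation for leisure demand. In terms of hours, recall that $h = T - \ell$, so
$$\frac{\partial h}{\partial w} = -\frac{\partial \ell}{\partial w}\quad\text{and}\quad\frac{\partial h}{\partial y} = -\frac{\partial \ell}{\partial y},$$
and therefore
$$\frac{\partial h}{\partial w} = \frac{\partial h^c(w^0, u^0)}{\partial w} + h^0\,\frac{\partial h(w^0, y^0)}{\partial y}.$$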
When the wage rises there is a positive substitution effect and (if leisure is a normal good) a negative income effect on labor supply. Note in particular that when a person gets a raise, he won't necessarily work more.
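The hours version of Slutsky's equation can be checked numerically. The sketch below assumes hypothetical Cobb-Douglas utility $u(x, \ell) = x^{A}\ell^{1-A}$ with $A = 0.6$ and a time endowment T = 24. These choices are illustrative, not from the notes. It computes total, substitution, and income terms by central differences.

```python
A, T = 0.6, 24.0   # hypothetical utility weight on consumption and time endowment


def hours(w, y):
    """Uncompensated labor supply h(w, y) = T - leisure, with p = 1."""
    full_income = y + w * T
    return T - (1 - A) * full_income / w


def utility_at(w, y):
    """Maximized utility u(x, leisure) = x**A * leisure**(1 - A)."""
    full_income = y + w * T
    x = A * full_income
    leisure = (1 - A) * full_income / w
    return x ** A * leisure ** (1 - A)


def hours_compensated(w, u):
    """Compensated labor supply h^c(w, u): the full expenditure x + w*leisure
    needed to reach u, of which leisure takes a (1 - A) share."""
    full_expenditure = u * w ** (1 - A) / (A ** A * (1 - A) ** (1 - A))
    return T - (1 - A) * full_expenditure / w


def labor_slutsky(w, y, step=1e-6):
    """Return (dh/dw, dh^c/dw, h0 * dh/dy) by central differences."""
    u0 = utility_at(w, y)
    h0 = hours(w, y)
    dh_dw = (hours(w + step, y) - hours(w - step, y)) / (2 * step)
    dh_dy = (hours(w, y + step) - hours(w, y - step)) / (2 * step)
    dhc_dw = (hours_compensated(w + step, u0) - hours_compensated(w - step, u0)) / (2 * step)
    return dh_dw, dhc_dw, h0 * dh_dy


total, substitution, income_eff = labor_slutsky(w=10.0, y=50.0)
print(f"dh/dw={total:.4f}  substitution={substitution:.4f}  income={income_eff:.4f}")
```

With these assumed numbers, the positive substitution term outweighs the negative income term, so hours rise slightly with the wage. Other parameter choices could reverse the sign.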
9 Intertemporal Consumption
The two-period consumption model concerns a consumer whose lifetime spans two periods. In period one the consumer has income $y_1$ and spends $c_1$; in period two the consumer has income $y_2$ and spends $c_2$. The consumer can borrow or lend at a rate of interest equal to r.
We express the consumer's budget constraint in terms of period-two dollars. The choice is arbitrary, but it simplifies the algebra, for then we basically have two goods with prices $1 + r$ and 1, respectively (rather than 1 and $1/(1 + r)$, which would be the case in period-one dollars). Having $1 + r$ in the numerator, not the denominator, is a big help. Total consumption is limited by total income, so the budget constraint is given by
$$(1 + r)c_1 + c_2 = (1 + r)y_1 + y_2.$$
The consumer's objective is to solve
$$\max\; U(c_1, c_2) \;\text{ s.t. }\; (1 + r)c_1 + c_2 = (1 + r)y_1 + y_2.$$
The Lagrangian is
$$\mathcal{L}(c_1, c_2, \lambda) = U(c_1, c_2) - \lambda\,[(1 + r)c_1 + c_2 - (1 + r)y_1 - y_2],$$
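and the FONC are
$$\frac{\partial\mathcal{L}}{\partial c_1} = U_{c_1}(c_1, c_2) - \lambda(1 + r) = 0$$
$$\frac{\partial\mathcal{L}}{\partial c_2} = U_{c_2}(c_1, c_2) - \lambda = 0$$
$$\frac{\partial\mathcal{L}}{\partial \lambda} = -(1 + r)c_1 - c_2 + (1 + r)y_1 + y_2 = 0$$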
These give rise to the tangency condition $U_{c_1}/U_{c_2} = 1 + r$ and the budget constraint, as usual. The solutions are functions of r, $y_1$, and $y_2$:
$$c_1 = c_1(r, y_1, y_2),\qquad c_2 = c_2(r, y_1, y_2).$$
These demand functions are a little unusual because they depend not just on total available resources, or wealth $w = (1 + r)y_1 + y_2$, but also on the composition of w. To clarify the effects of a change in r on $c_1$ it is helpful to define two other consumption functions that depend on the interest rate and total wealth (measured in period-two dollars):
$$c_1 = c_1^w(r, w),\qquad c_2 = c_2^w(r, w).$$
These optimal choice functions are related by:
$$c_1(r, y_1, y_2) = c_1^w\big(r, (1 + r)y_1 + y_2\big),\qquad c_2(r, y_1, y_2) = c_2^w\big(r, (1 + r)y_1 + y_2\big).$$
You can see that as we change r, the effect on $c_1(r, y_1, y_2)$ depends on both $\partial c_1^w/\partial r$ and $\partial c_1^w/\partial w$.
Now let's define the expenditure function as the minimum cost to reach a given level of utility (again, measured in period-two dollars). Specifically, define e as follows:
$$e(r, u^0) = \min\; (1 + r)c_1 + c_2 \;\text{ s.t. }\; U(c_1, c_2) = u^0.$$
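The Lagrangian is
$$\mathcal{L}(c_1, c_2, \lambda) = (1 + r)c_1 + c_2 - \lambda\,[U(c_1, c_2) - u^0],$$
and the FONC are
$$\frac{\partial\mathcal{L}}{\partial c_1} = 1 + r - \lambda U_{c_1}(c_1, c_2) = 0$$
$$\frac{\partial\mathcal{L}}{\partial c_2} = 1 - \lambda U_{c_2}(c_1, c_2) = 0$$
$$\frac{\partial\mathcal{L}}{\partial \lambda} = -U(c_1, c_2) + u^0 = 0$$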
The solutions are the compensated demand functions $c_1^c(r, u^0)$ and $c_2^c(r, u^0)$. As usual,
$$e(r, u^0) = (1 + r)c_1^c(r, u^0) + c_2^c(r, u^0).$$
Differentiating,
$$\frac{\partial e(r, u^0)}{\partial r} = c_1^c(r, u^0) + (1 + r)\frac{\partial c_1^c}{\partial r} + \frac{\partial c_2^c}{\partial r},$$
and (as usual) it is easy to show that $(1 + r)\,\partial c_1^c/\partial r + \partial c_2^c/\partial r = 0$, so
$$\frac{\partial e(r, u^0)}{\partial r} = c_1^c(r, u^0).$$
Thus we have three optimal consumption functions for first period consumption:
- $c_1(r, y_1, y_2)$, which depends on $y_1$ and $y_2$
- $c_1^w(r, w)$, which depends only on w
- $c_1^c(r, u^0)$, which depends on utility
We also have two relations connecting the three:
$$c_1(r, y_1, y_2) = c_1^w\big(r, (1 + r)y_1 + y_2\big)$$
$$c_1^c(r, u^0) = c_1^w\big(r, e(r, u^0)\big)$$
Now it may seem clear why we defined $c_1^w$: it's the function that links the compensated demand and the demand we ultimately are interested in, $c_1(r, y_1, y_2)$. We can differentiate these two equations with respect to r. Starting with the first,
$$\frac{\partial c_1(r, y_1, y_2)}{\partial r} = \frac{\partial c_1^w\big(r, (1 + r)y_1 + y_2\big)}{\partial r} + y_1\,\frac{\partial c_1^w\big(r, (1 + r)y_1 + y_2\big)}{\partial w}. \qquad (*)$$
This means that when you change r, the response of the demand for $c_1$ as a function of $(r, y_1, y_2)$ has an income effect, reflecting the fact that as r rises, so does the value of wealth.
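From the second relation we get an expression like we've seen before:
$$\frac{\partial c_1^c(r, u^0)}{\partial r} = \frac{\partial c_1^w\big(r, e(r, u^0)\big)}{\partial r} + \frac{\partial c_1^w\big(r, e(r, u^0)\big)}{\partial w}\cdot\frac{\partial e(r, u^0)}{\partial r} = \frac{\partial c_1^w\big(r, e(r, u^0)\big)}{\partial r} + \frac{\partial c_1^w\big(r, e(r, u^0)\big)}{\partial w}\,c_1^c(r, u^0).$$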
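Rearranging, we get a Slutsky equation for $c_1^w$:
$$\frac{\partial c_1^w\big(r, e(r, u^0)\big)}{\partial r} = \frac{\partial c_1^c(r, u^0)}{\partial r} - \frac{\partial c_1^w\big(r, e(r, u^0)\big)}{\partial w}\,c_1^c(r, u^0) = \frac{\partial c_1^c(r, u^0)}{\partial r} - \frac{\partial c_1^w\big(r, e(r, u^0)\big)}{\partial w}\,c_1(r, y_1, y_2), \qquad (**)$$
assuming $u^0$ is the level of utility one can achieve with income $(y_1, y_2)$ and interest rate r.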
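Finally, plugging (**) into (*),
$$\frac{\partial c_1(r, y_1, y_2)}{\partial r} = \frac{\partial c_1^w\big(r, (1 + r)y_1 + y_2\big)}{\partial r} + y_1\,\frac{\partial c_1^w\big(r, (1 + r)y_1 + y_2\big)}{\partial w}$$
$$= \frac{\partial c_1^c(r, u^0)}{\partial r} + \frac{\partial c_1^w\big(r, e(r, u^0)\big)}{\partial w}\,\big[y_1 - c_1(r, y_1, y_2)\big]$$
$$= \frac{\partial c_1^c(r, u^0)}{\partial r} + \frac{\partial c_1^w\big(r, e(r, u^0)\big)}{\partial w}\,s_1(r, y_1, y_2),$$
where $s_1(r, y_1, y_2) = y_1 - c_1(r, y_1, y_2)$ is the optimal level of period-one savings.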
The income effect of a rise in r on optimal consumption $c_1(r, y_1, y_2)$ is positive or negative, depending on whether $s_1$ is positive or negative.
For a saver, $s_1 > 0$ and a rise in r has a positive income effect (because the consumer is a net supplier of funds to the market, as in the case of labor supply).
But for a borrower, $s_1 < 0$ and a rise in r has a negative income effect (because the consumer is a net demander of funds, as in the case of basic commodity demand).
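The intertemporal Slutsky decomposition can also be verified numerically. The sketch below assumes hypothetical Cobb-Douglas preferences $U(c_1, c_2) = c_1^{B} c_2^{1-B}$ with $B = 0.5$, an interest rate of 5%, and two illustrative income streams: one that makes the consumer a saver and one that makes her a borrower. The sign of the income term follows the sign of $s_1$.

```python
B = 0.5  # hypothetical weight in U(c1, c2) = c1**B * c2**(1 - B)


def c1_regular(r, y1, y2):
    wealth = (1 + r) * y1 + y2            # measured in period-two dollars
    return B * wealth / (1 + r)


def c1_wealth(r, w):
    return B * w / (1 + r)


def utility_at(r, y1, y2):
    wealth = (1 + r) * y1 + y2
    return (B * wealth / (1 + r)) ** B * ((1 - B) * wealth) ** (1 - B)


def c1_compensated(r, u):
    # e(r, u): cheapest (1 + r)*c1 + c2 reaching u; c1 takes a B share of it.
    e = u * (1 + r) ** B / (B ** B * (1 - B) ** (1 - B))
    return B * e / (1 + r)


def intertemporal_slutsky(r, y1, y2, step=1e-6):
    """Return (dc1/dr, substitution term, income term, savings s1)."""
    u0 = utility_at(r, y1, y2)
    s1 = y1 - c1_regular(r, y1, y2)      # period-one savings
    total = (c1_regular(r + step, y1, y2) - c1_regular(r - step, y1, y2)) / (2 * step)
    subst = (c1_compensated(r + step, u0) - c1_compensated(r - step, u0)) / (2 * step)
    w0 = (1 + r) * y1 + y2
    dc1_dw = (c1_wealth(r, w0 + step) - c1_wealth(r, w0 - step)) / (2 * step)
    return total, subst, dc1_dw * s1, s1


saver = intertemporal_slutsky(0.05, 100.0, 50.0)
borrower = intertemporal_slutsky(0.05, 20.0, 150.0)
print("saver:   ", [round(v, 4) for v in saver])
print("borrower:", [round(v, 4) for v in borrower])
```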
10 Production and Cost I
The technology available to a given firm is summarized by its production function. This function gives the quantities of output produced by various combinations of inputs. For example, an airline uses labor inputs, fuel, and machinery (airplanes, loading equipment, etc.) to produce the output "passenger seats". We write $y = f(a, b)$ to signify that with inputs a and b, it is possible to produce y units of output.
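Examples:
One input:
- $y = a^\alpha$: DRS if $0 < \alpha < 1$, CRS if $\alpha = 1$, IRS if $\alpha > 1$
- $y = 0$ if $a < \bar a$, $y = 1$ if $a > \bar a$
Two inputs:
- $y = a^\alpha b^\beta$ (Cobb-Douglas): DRS if $\alpha + \beta < 1$, CRS if $\alpha + \beta = 1$, IRS if $\alpha + \beta > 1$
- $y = \min\{a, b\}$ (Leontief)
- $y = a + b$ (Additive)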
For two or more inputs, production functions are a lot like utility functions. The important difference is that output is measurable and has natural units (e.g. passenger seats). It's as if the indifference curves have numbers attached to them that matter.
A second, less obvious, way to summarize technology is to compute the cost associated with producing a given output level y, at fixed prices for the inputs. In principle, it is easy to find the cost function if you know the production function, using two steps:
1. find all possible ways of producing y;
2. find the cheapest one, and evaluate its cost.
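The two steps can be carried out by brute force. The sketch below assumes the hypothetical technology $y = \sqrt{ab}$ (Cobb-Douglas with $\alpha = \beta = 1/2$) and illustrative input prices. It enumerates input pairs on a grid along the isoquant and keeps the cheapest. For this technology the exact answer is $C = 2y\sqrt{w_a w_b}$, which gives a check on the search.

```python
import math


def cheapest_input_mix(y, wa, wb, n=100_000):
    """Two-step cost minimization for the hypothetical technology y = sqrt(a*b).

    Step 1: enumerate input pairs (a, b) on the isoquant for output y.
    Step 2: keep the cheapest pair. Returns (cost, a, b).
    """
    best = None
    for k in range(n + 1):
        a = y * math.exp(-5 + 10 * k / n)   # log-spaced grid; range is an arbitrary choice
        b = y ** 2 / a                      # b needed so that sqrt(a*b) = y
        cost = wa * a + wb * b
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best


cost, a, b = cheapest_input_mix(y=10.0, wa=4.0, wb=1.0)
print(f"C(10) ~ {cost:.6f} with a={a:.4f}, b={b:.4f}")
print("closed form 2*y*sqrt(wa*wb) =", 2 * 10.0 * math.sqrt(4.0 * 1.0))
```

Brute-force enumeration only works because there is a single free choice once output is fixed. With more inputs, the tangency conditions developed below are far more practical.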
Most of the economic behavior of firms is studied via the cost function. In the next few sections, we demonstrate how to derive the cost function and illustrate the connection between its properties and those of the production function.
10.1 One-Factor Production and Cost Functions
10.1.1 Production Functions
Suppose there is only one input (apart from, perhaps, a set-up cost). Then we have a picture along the lines of Figure 44. Note that $f(0) = 0$ by convention.
Figure 44
Definitions and Facts:
- The marginal product of factor a is the increase in y that accompanies a unit increase in a:
$$MP_a = \frac{\partial f(a)}{\partial a} = f'(a).$$
- Factor a is said to be useful if $f'(a) > 0$.
- The average product of factor a is the ratio of total output to total input (of a):
$$AP_a = \frac{f(a)}{a}.$$
- If the MP of factor a is increasing, then $f''(a) > 0$ and we say that there are increasing marginal returns: as the scale of output is expanded, each additional unit of input contributes more. If the MP is decreasing, then $f''(a) < 0$ and we say there are diminishing marginal returns. (See Figure 45.)
- If $MP_a > AP_a$, then $AP_a$ is increasing; if $MP_a < AP_a$, then $AP_a$ is decreasing.
Think baseball, with AP = career batting average and MP = season batting average. A hitter who is having a season better than his career to date raises his career average. (See Figure 46.) In general,
$$\frac{dAP_a}{da} = \frac{a f'(a) - f(a)}{a^2} = \frac{1}{a}\left[f'(a) - \frac{f(a)}{a}\right] = \frac{1}{a}\,(MP_a - AP_a).$$
(a) Increasing returns to scale. (b) Decreasing returns to scale.
Figure 45
Figure 46: At $a = a_1$, $AP = f(a_1)/a_1 < f'(a_1) = MP$, so AP is increasing. At $a = a_2$, the opposite is true.
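Examples:
- $f(a) = ka$, where $k > 0$ (linear). $AP_a = MP_a = k$.
- $f(a) = a^\alpha$, where $0 < \alpha < 1$ (concave). (See Figure 47.)
- $f(a) = 9a^2 - a^3$, $a < 6$. See Figure 48. For this function we have the following:
$$f'(a) = 18a - 3a^2 \qquad [f'(a) \ge 0 \Leftrightarrow 0 \le a \le 6]$$
$$f''(a) = 18 - 6a, \qquad f''(a) > 0 \Leftrightarrow a < 3 \;\text{(increasing marginal returns)}, \qquad f''(a) < 0 \Leftrightarrow a > 3 \;\text{(diminishing marginal returns)}$$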
Figure 47
Figure 48: Graph of $y = 9a^2 - a^3$ against a, with $a = 3$ and $a = 6$ marked.
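For the cubic example $f(a) = 9a^2 - a^3$, the sketch below locates the peak of AP on a grid and confirms that MP = AP there. It also checks $dAP_a/da = (MP_a - AP_a)/a$ at one point. The grid spacing of 0.001 and the check point $a = 2$ are arbitrary choices.

```python
# Sketch: MP, AP and the AP-maximizing input for f(a) = 9a^2 - a^3 (0 < a < 6).
def f(a):
    return 9 * a ** 2 - a ** 3


def mp(a):
    return 18 * a - 3 * a ** 2      # f'(a)


def ap(a):
    return f(a) / a


def dap_numeric(a, h=1e-6):
    """Central-difference slope of AP, to compare with (MP - AP)/a."""
    return (ap(a + h) - ap(a - h)) / (2 * h)


grid = [k / 1000 for k in range(1, 6000)]   # a in (0, 6)
a_star = max(grid, key=ap)                   # AP peaks here
print(f"AP max at a={a_star}: AP={ap(a_star)}, MP={mp(a_star)}")
print(f"at a=2: dAP/da={dap_numeric(2.0):.6f}, (MP-AP)/a={(mp(2.0) - ap(2.0)) / 2.0:.6f}")
```

Because $AP_a = 9a - a^2$ here, the peak is at $a = 4.5$, to the right of the inflection point $a = 3$ where MP itself peaks.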
10.1.2 Cost Functions
What is the cost function for a one-factor production function? Let w denote the price per unit of factor a. Then
$$C(y, w) = \min\; wa \;\text{ s.t. }\; y = f(a).$$
But $y = f(a)$ implies $a = f^{-1}(y)$.[7] Therefore $C(y, w) = w f^{-1}(y)$. See Figure 49 for an illustration of this process. If w is fixed, then we often write the cost function as a function of y only: $C(y)$.
- Define marginal cost $MC(y) = C'(y)$, and
- average cost $AC(y) = C(y)/y$.
Examples:
- $y = f(a) = ka$ (linear) $\Rightarrow a = y/k$ (linear input requirement function) $\Rightarrow C(y, w) = w\,(y/k)$; with $k = 2$, $C(y, w) = \tfrac{1}{2}wy$ (linear in both y and w)
[7] Assume, for the moment, that f is one-to-one.
Figure 49: The graph shown in (b) is obtained by rotating quadrant II in (a) 90 degrees clockwise.
- $y = f(a) = \sqrt{a} \Rightarrow a = y^2$ (convex input requirement function) $\Rightarrow C(y, w) = wy^2$ (linear in w but convex in y; see Figure 50)
10.1.3 Connection between MC and MP
Marginal cost is the amount it would cost, at the current level of output, to produce an additional unit. By definition of $MP_a$, one unit of input adds $MP_a = f'(a)$ units of output. It follows that
- $1/MP_a = 1/f'(a)$ units of a are needed to produce one unit of y, and
- the marginal cost of an additional unit is $MC(y) = w/f'(a)$, when the production function is given by $y = f(a)$.
54
(a) (b)
Figure 50
Alternatively, $C(y) = w f^{-1}(y)$, using as input requirement function $a = f^{-1}(y)$. Thus[8]
$$C'(y) = w\,\frac{df^{-1}(y)}{dy} = \frac{w}{f'(a)}.$$
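The identity $C'(y) = w/f'(a)$ can be checked numerically for the example $f(a) = \sqrt{a}$. The sketch below is in R and makes the following assumptions. The price $w = 2$ and the output level $y_0 = 3$ are arbitrary illustrative values. The inverse $f^{-1}$ is obtained by root-finding with `uniroot()` rather than from the closed form $a = y^2$. The cost function is named `cost` only to avoid shadowing R's built-in `C()`.

```r
w <- 2                                  # illustrative factor price
f  <- function(a) sqrt(a)               # production function y = f(a)
fp <- function(a) 0.5 / sqrt(a)         # its derivative f'(a)

# Input requirement function a = f^{-1}(y), found numerically by root-finding
f_inv <- function(y) uniroot(function(a) f(a) - y, c(0, 1e6), tol = 1e-12)$root
cost  <- function(y) w * f_inv(y)       # one-factor cost function C(y) = w f^{-1}(y)

y0 <- 3
h  <- 1e-4
MC_num     <- (cost(y0 + h) - cost(y0 - h)) / (2 * h)  # numerical C'(y0)
MC_formula <- w / fp(f_inv(y0))                         # w / f'(a) at a = f^{-1}(y0)
MC_closed  <- 2 * w * y0                                # C(y) = w y^2 gives C'(y) = 2wy
print(c(numeric = MC_num, via_MP = MC_formula, closed = MC_closed))
```

All three quantities are computed differently but should agree at $2wy_0 = 12$.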
10.1.4 Geometry of C, AC, and MC
Take a look at Figure 51a. Note the following:
- when $MC < AC$, $AC$ is falling;
- when $MC > AC$, $AC$ is rising;
- when $AC$ is at a minimum, $AC = MC$.
We sometimes add a set-up cost $F$ (also called a fixed cost). The total cost is then
$$C(y) = \text{fixed cost} + \text{variable cost} = F + VC(y).$$
The implications of this model are illustrated in Figure 51b.
[8] Recall that if $f'(x_0) \neq 0$, then
$$\left.\frac{df^{-1}(y)}{dy}\right|_{y = f(x_0)} = \frac{1}{f'(x_0)}.$$
Figure 51: panels (a) and (b). Note the following: 1. min AC occurs to the right of min AVC. (Why?) 2. MC intersects both AC and AVC at their respective minimums. (Why?)
11 Production and Cost II: Two-Factor Production and Cost Functions
The analysis of production and cost is more interesting when it involves combinations of two or more inputs to produce $y$. The production function is $y = f(a, b)$. As in consumer theory, we begin by thinking about combinations of inputs that produce the same level of output. In the firm case these are called isoquants.
We define the marginal rate of technical substitution (MRTS) as the absolute value of the slope of an isoquant. It indicates how many units of $b$ one would need to add, per unit of $a$ given up, to keep output constant. See Figure 52.
Figure 52
Formally, suppose $y_0 = f(a_0, b_0)$, and consider varying $a$ and $b$ in such a way that output remains fixed at $y_0$:
$$dy = f_a\,da + f_b\,db = 0,$$
which implies
$$-\left(\frac{db}{da}\right)_{y_0} = \frac{f_a(a_0, b_0)}{f_b(a_0, b_0)} = \frac{MP_a}{MP_b}.$$
The MRTS is analogous to the marginal rate of substitution (MRS) in consumer theory. When there are two or more inputs, the production function is characterized by both the degree of substitutability between inputs (curvature of isoquants) and the extent to which output expands as inputs are expanded proportionately. The latter gives rise to the idea of returns to scale. For a production function $y = f(a, b)$, we say $f$ has constant returns to scale (CRS) if
$$f(\lambda a, \lambda b) = \lambda f(a, b), \quad \lambda > 0.$$
We say that $f$ has decreasing returns to scale (DRS) if
$$f(\lambda a, \lambda b) < \lambda f(a, b), \quad \lambda > 1.$$
With DRS, if you double both inputs, you get less than twice the output. On the other hand, the same inequality implies that if you reduce inputs by some proportion, your output falls by a smaller proportion. So DRS suggests that smaller firms are more efficient. Conversely, we say that $f$ has increasing returns to scale (IRS) if
$$f(\lambda a, \lambda b) > \lambda f(a, b), \quad \lambda > 1.$$
Figure 53: (a) CRS (b) DRS
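Examples:
- One input: $f(a) = a^\alpha$ is CRS if $\alpha = 1$, DRS if $\alpha < 1$, and IRS if $\alpha > 1$.
- Cobb-Douglas: $f(a, b) = a^\alpha b^\beta$ is CRS if $\alpha + \beta = 1$, DRS if $\alpha + \beta < 1$, and IRS if $\alpha + \beta > 1$. As a check,
$$f(\lambda a, \lambda b) = (\lambda a)^\alpha (\lambda b)^\beta = \lambda^{\alpha+\beta} a^\alpha b^\beta = \lambda^{\alpha+\beta} f(a, b).$$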
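The classification can be illustrated numerically for Cobb-Douglas technologies by comparing $f(\lambda a, \lambda b)$ with $\lambda f(a, b)$. The R sketch below makes several assumptions. The bundle $(a, b) = (2, 3)$, the factor $\lambda = 2$, and the three $(\alpha, \beta)$ pairs are arbitrary. A check at a single point illustrates the definition but does not prove it.

```r
cobb_douglas <- function(a, b, alpha, beta) a^alpha * b^beta

# Compare f(lambda a, lambda b) with lambda f(a, b) at one bundle for one lambda > 1
# (a single-point illustration, not a proof)
returns_to_scale <- function(alpha, beta, a = 2, b = 3, lambda = 2) {
  scaled <- cobb_douglas(lambda * a, lambda * b, alpha, beta)
  target <- lambda * cobb_douglas(a, b, alpha, beta)
  if (isTRUE(all.equal(scaled, target))) "CRS" else if (scaled < target) "DRS" else "IRS"
}

params <- list(c(0.3, 0.7), c(0.3, 0.4), c(0.6, 0.7))  # alpha + beta = 1, 0.7, 1.3
rts_labels <- sapply(params, function(p) returns_to_scale(p[1], p[2]))
print(rts_labels)

# The check in the text: f(lambda a, lambda b) = lambda^(alpha + beta) f(a, b)
lhs <- cobb_douglas(2 * 2, 2 * 3, 0.3, 0.4)
rhs <- 2^(0.3 + 0.4) * cobb_douglas(2, 3, 0.3, 0.4)
print(c(lhs = lhs, rhs = rhs))
```

`all.equal()` is used for the CRS case because floating-point rounding makes an exact equality test unreliable.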
Geometrically, returns to scale indicates whether f is concave or convex over the top of a ray
emanating from the origin. (See Figure 53.)
11.1 Derivation of the Cost Function
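Given a production function $f(a, b)$ and prices $w_a$, $w_b$, we can write
$$C(w_a, w_b, y) = \min_{a, b}\ w_a a + w_b b \quad \text{s.t. } f(a, b) \ge y.$$
Define $\mathcal{L} = w_a a + w_b b - \lambda\,(f(a, b) - y)$, and proceed by the method of Lagrange:
$$\begin{aligned}
\frac{\partial \mathcal{L}}{\partial a} &= w_a - \lambda f_a(a, b) = 0\\
\frac{\partial \mathcal{L}}{\partial b} &= w_b - \lambda f_b(a, b) = 0\\
\frac{\partial \mathcal{L}}{\partial \lambda} &= -f(a, b) + y = 0
\end{aligned}$$
The ratio of the first two FONC gives
$$\frac{w_a}{w_b} = \frac{f_a(a, b)}{f_b(a, b)} = MRTS.$$
Geometrically, we find the point of tangency of the constraint $f(a, b) = y$ with the iso-cost lines
$$w_a a + w_b b = c.$$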
See Figure 54. Notice the problem is reversed relative to that of a consumer. In the cost problem, you are constrained to an isoquant and have to find the lowest budget, or iso-cost, line. In the consumer problem, you are constrained to a budget line and have to find the highest isoquant, or indifference curve.
Figure 54
If we consider finding the most inexpensive way to achieve different levels of output given $w_a$ and $w_b$, we trace out the scale expansion path (SEP). (See Figure 55.) Note the similarity between a firm's SEP and a consumer's IEP. Geometrically, the shape of the cost function (as a function of $y$) depends on the shape of the production function over the top of the SEP. (See Figure 56.) If the curve over the SEP is S-shaped (as illustrated), we get cost functions of the usual shape.
Figure 55
Figure 56: panels (a) and (b).
11.2 Marginal Cost
If we were to produce an additional unit of $y$, we could use input $a$, or input $b$, or both. If we used $a$ only, it would take $1/MP_a$ units of $a$ for a single unit of $y$. The marginal cost is $w_a/MP_a$ (just as in the one-factor case). By symmetry, we could also use $b$ only, at marginal cost of $w_b/MP_b$. But from the FONC
$$\frac{w_a}{w_b} = \frac{MP_a}{MP_b} \;\Rightarrow\; \frac{w_a}{MP_a} = \frac{w_b}{MP_b}.$$
So, on the margin, one should be indifferent to expanding output via increases in $a$ or increases in $b$. This reflects the fact that $a$ and $b$ were optimally chosen to begin with. Note also that
$$\lambda = \frac{w_a}{f_a(a, b)} = \frac{w_a}{MP_a} = \frac{w_b}{MP_b}.$$
Thus the Lagrange multiplier in the cost-minimization problem gives marginal cost.
Examples:
- $f(a, b) = \min\{a, b/k\}$. At a cost minimum we must have $a = b/k = y$, which implies
$$C(w_a, w_b, y) = y\,(w_a + k w_b).$$
Note that this production function exhibits CRS.
- $f(a, b) = a + kb$. These are linear isoquants, with $f_a/f_b = 1/k$. If $w_a/w_b > 1/k$, use only $b$, in which case $y = kb \Rightarrow b = y/k$, and $C(w_a, w_b, y) = w_b y/k$. But if $w_a/w_b < 1/k$, use only $a$, in which case $y = a$, and $C(w_a, w_b, y) = w_a y$. Combining these results, for any $w_a$, $w_b$, we have
$$C(w_a, w_b, y) = y \min\{w_a,\ w_b/k\}.$$
The previous two examples illustrate what is called the dual relationship between cost and production functions. Leontief production functions imply linear cost functions; linear production functions imply Leontief-like cost functions.
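The closed form $C = y \min\{w_a, w_b/k\}$ for the linear technology can be confirmed by brute force: walk along the isoquant $a + kb = y$ and take the cheapest point. The R sketch below assumes a grid of 10,001 points and three arbitrary price configurations, all chosen purely for illustration. Because cost is linear along a linear isoquant, the minimum sits at a corner, and both corners are on the grid.

```r
# Linear technology f(a, b) = a + k*b: closed-form cost from the text
cost_linear <- function(wa, wb, y, k) y * min(wa, wb / k)

# Brute force: walk along the isoquant a + k*b = y and take the cheapest point
cost_linear_grid <- function(wa, wb, y, k, n_grid = 10001) {
  a <- seq(0, y, length.out = n_grid)  # candidate amounts of input a
  b <- (y - a) / k                     # input b needed to stay on the isoquant
  min(wa * a + wb * b)                 # lowest total cost among the candidates
}

cases <- data.frame(wa = c(3, 1, 2), wb = c(4, 4, 6), y = 10, k = 2)
cases$closed <- mapply(cost_linear, cases$wa, cases$wb, cases$y, cases$k)
cases$grid   <- mapply(cost_linear_grid, cases$wa, cases$wb, cases$y, cases$k)
print(cases)
```

In the first case $w_a/w_b = 3/4 > 1/k$, so only $b$ is used. In the other two cases $w_a/w_b < 1/k$, so only $a$ is used.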
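- $f(a, b) = a^\alpha b^\beta$. (You may have seen this in a problem set!) The Lagrangian is
$$\mathcal{L}(a, b, \lambda) = w_a a + w_b b - \lambda\,(a^\alpha b^\beta - y).$$
$$\begin{aligned}
\frac{\partial \mathcal{L}}{\partial a} &= w_a - \lambda \alpha a^{\alpha-1} b^\beta = 0\\
\frac{\partial \mathcal{L}}{\partial b} &= w_b - \lambda \beta a^\alpha b^{\beta-1} = 0\\
\frac{\partial \mathcal{L}}{\partial \lambda} &= -a^\alpha b^\beta + y = 0
\end{aligned}$$
Using the first two FONC, we have
$$\frac{w_a}{w_b} = \frac{\alpha a^{\alpha-1} b^\beta}{\beta a^\alpha b^{\beta-1}} = \frac{\alpha b}{\beta a}, \quad \text{or} \quad b = \frac{\beta a w_a}{\alpha w_b}.$$
By substitution,
$$a^\alpha b^\beta = a^\alpha \left(\frac{\beta a w_a}{\alpha w_b}\right)^\beta = a^{\alpha+\beta}\left(\frac{\beta w_a}{\alpha w_b}\right)^\beta = y,$$
from which we can easily retrieve the input requirement function (IRF) for $a$:
$$a = y^{\frac{1}{\alpha+\beta}} \left(\frac{\alpha}{\beta}\right)^{\frac{\beta}{\alpha+\beta}} w_a^{-\frac{\beta}{\alpha+\beta}}\, w_b^{\frac{\beta}{\alpha+\beta}}.$$
The IRF for $b$ can be found by substitution, or by symmetry:
$$b = y^{\frac{1}{\alpha+\beta}} \left(\frac{\beta}{\alpha}\right)^{\frac{\alpha}{\alpha+\beta}} w_a^{\frac{\alpha}{\alpha+\beta}}\, w_b^{-\frac{\alpha}{\alpha+\beta}}.$$
Finally, $C(w_a, w_b, y) = w_a a + w_b b$ when $a$ and $b$ are set to their respective cost-minimizing values, so
$$\begin{aligned}
C(w_a, w_b, y) &= y^{\frac{1}{\alpha+\beta}} \left(\frac{\alpha}{\beta}\right)^{\frac{\beta}{\alpha+\beta}} w_a^{\frac{\alpha}{\alpha+\beta}} w_b^{\frac{\beta}{\alpha+\beta}} + y^{\frac{1}{\alpha+\beta}} \left(\frac{\beta}{\alpha}\right)^{\frac{\alpha}{\alpha+\beta}} w_a^{\frac{\alpha}{\alpha+\beta}} w_b^{\frac{\beta}{\alpha+\beta}}\\
&= y^{\frac{1}{\alpha+\beta}}\, w_a^{\frac{\alpha}{\alpha+\beta}}\, w_b^{\frac{\beta}{\alpha+\beta}} \left[\left(\frac{\alpha}{\beta}\right)^{\frac{\beta}{\alpha+\beta}} + \left(\frac{\beta}{\alpha}\right)^{\frac{\alpha}{\alpha+\beta}}\right].
\end{aligned}$$
If $\alpha + \beta = 1$ (CRS), this simplifies considerably, since then $(\alpha/\beta)^\beta + (\beta/\alpha)^\alpha = \alpha^{-\alpha}\beta^{-\beta}(\alpha + \beta) = \alpha^{-\alpha}\beta^{-\beta}$:
$$C(w_a, w_b, y) = y\, w_a^\alpha w_b^\beta \left[\left(\frac{\alpha}{\beta}\right)^\beta + \left(\frac{\beta}{\alpha}\right)^\alpha\right] = y\, w_a^\alpha w_b^\beta\, \alpha^{-\alpha}\beta^{-\beta}.$$
So with CRS, cost is linear in output. In general the exponent of $y$ in the cost function is $(\alpha + \beta)^{-1}$, so if $\alpha + \beta > 1$, cost is concave in output (IRS), whereas if $\alpha + \beta < 1$, cost is convex in output (DRS).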
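The closed-form Cobb-Douglas cost function can be checked against a direct numerical minimization along the isoquant. The same sketch also checks the Section 11.2 result that the multiplier $\lambda = w_a/MP_a$ equals $dC/dy$. The R code below makes the following assumptions:
- The parameter values ($w_a = 1$, $w_b = 2$, $y = 5$, $\alpha = 0.3$, $\beta = 0.5$) are arbitrary.
- The search runs over $\log a$ in $[-10, 10]$.
- The function names are illustrative.

```r
# Closed-form Cobb-Douglas cost function C(w_a, w_b, y) derived in the text
cd_cost <- function(wa, wb, y, alpha, beta) {
  s <- alpha + beta
  y^(1 / s) * wa^(alpha / s) * wb^(beta / s) *
    ((alpha / beta)^(beta / s) + (beta / alpha)^(alpha / s))
}

# Numerical cost minimization along the isoquant a^alpha * b^beta = y
cd_cost_numeric <- function(wa, wb, y, alpha, beta) {
  b_on_isoquant <- function(a) (y / a^alpha)^(1 / beta)
  # search over log(a) so the bracket spans several orders of magnitude
  cost_of_log_a <- function(la) wa * exp(la) + wb * b_on_isoquant(exp(la))
  optimize(cost_of_log_a, c(-10, 10), tol = 1e-10)$objective
}

wa <- 1; wb <- 2; y <- 5; alpha <- 0.3; beta <- 0.5   # illustrative values
C_closed  <- cd_cost(wa, wb, y, alpha, beta)
C_numeric <- cd_cost_numeric(wa, wb, y, alpha, beta)

# Cost-minimizing inputs from the IRFs, then lambda = w_a / MP_a versus dC/dy
s      <- alpha + beta
a_star <- y^(1 / s) * (alpha / beta)^(beta / s) * wa^(-beta / s) * wb^(beta / s)
b_star <- beta * a_star * wa / (alpha * wb)
lambda <- wa / (alpha * a_star^(alpha - 1) * b_star^beta)
h      <- 1e-5
MC_num <- (cd_cost(wa, wb, y + h, alpha, beta) - cd_cost(wa, wb, y - h, alpha, beta)) / (2 * h)

print(c(closed = C_closed, numeric = C_numeric, from_IRFs = wa * a_star + wb * b_star))
print(c(lambda = lambda, MC = MC_num))
```

The search runs over $\log a$ rather than $a$. Along the isoquant, cost is then a sum of two exponentials in $\log a$, which is convex, so the one-dimensional search has a single minimum to find.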
12 Cost Functions and IRFs
Suppose we are given a production function $f(x_1, x_2)$ and the associated cost function $C(y, w_1, w_2)$. We determine $C$ by solving the cost minimization problem:
$$\min_{x_1, x_2}\ w_1 x_1 + w_2 x_2 \quad \text{s.t. } f(x_1, x_2) = y.$$
We define the Lagrangian $\mathcal{L} = w_1 x_1 + w_2 x_2 - \lambda\,(f(x_1, x_2) - y)$. The FONC are:
$$\begin{aligned}
\frac{\partial \mathcal{L}}{\partial x_1} &= w_1 - \lambda f_{x_1}(x_1, x_2) = 0\\
\frac{\partial \mathcal{L}}{\partial x_2} &= w_2 - \lambda f_{x_2}(x_1, x_2) = 0\\
\frac{\partial \mathcal{L}}{\partial \lambda} &= -f(x_1, x_2) + y = 0
\end{aligned}$$
The first two of these imply the tangency condition $w_1/w_2 = f_{x_1}/f_{x_2}$, while the third is equivalent to the constraint. Solving these two equations in two unknowns we get the IRFs:
$$x_1 = x_1^*(y, w_1, w_2), \qquad x_2 = x_2^*(y, w_1, w_2).$$
The IRFs are analogous to the consumer's demand functions: they represent the optimal (cost-minimizing) input choices to produce $y$ when input prices are $(w_1, w_2)$. With these we obtain the cost function
$$C(y, w_1, w_2) = w_1 x_1^*(y, w_1, w_2) + w_2 x_2^*(y, w_1, w_2), \tag{†}$$
which is simply the cost of the cost-minimizing combination of inputs.
12.1 Shephard's Lemma
It turns out that given $C$, one can recover the IRFs by simple differentiation:
$$x_1^*(y, w_1, w_2) = \frac{\partial C(y, w_1, w_2)}{\partial w_1}. \tag{‡}$$
At a glance, this appears to be inconsistent with (†). Indeed, differentiating (†) with respect to $w_1$ gives three terms:
$$\frac{\partial C(y, w_1, w_2)}{\partial w_1} = x_1^*(y, w_1, w_2) + w_1 \frac{\partial x_1^*(y, w_1, w_2)}{\partial w_1} + w_2 \frac{\partial x_2^*(y, w_1, w_2)}{\partial w_1}. \tag{§}$$
However, when an input price changes, $x_1^*(y, w_1, w_2)$ and $x_2^*(y, w_1, w_2)$ are constrained to move along an isoquant. (See Figure 9.) In other words, we have
$$f\big(x_1^*(y, w_1, w_2),\ x_2^*(y, w_1, w_2)\big) = y,$$
and this holds even as $w_1$ varies, so, differentiating w.r.t. $w_1$:
$$f_{x_1} \frac{\partial x_1^*}{\partial w_1} + f_{x_2} \frac{\partial x_2^*}{\partial w_1} = 0.$$
This means
$$\frac{\partial x_2^*}{\partial w_1} = -\frac{f_{x_1}}{f_{x_2}}\, \frac{\partial x_1^*}{\partial w_1}. \tag{*}$$
So, since $x_1^*$ falls in response to a rise in $w_1$, $x_2^*$ has to rise, and the rates of change are in the ratio $f_{x_1}/f_{x_2}$. (Note that $x_1^*$ responds to a change in $w_1$ just as a demand function does in consumer theory; the response is like a substitution effect. Since the isoquant exhibits DMRTS, $w_1 \uparrow \Rightarrow x_1^* \downarrow$.)
And substituting (*) into (§),
$$\frac{\partial C}{\partial w_1} = x_1^* + \frac{\partial x_1^*}{\partial w_1}\left(w_1 - w_2 \frac{f_{x_1}}{f_{x_2}}\right).$$
But $w_1 - w_2 (f_{x_1}/f_{x_2}) = 0$ by the tangency condition, so the second and third terms on the RHS of (§) always cancel, leaving us with (‡).
Equation (‡) says that if $w_1$ rises, the first-order effect on cost is proportional to the amount of $x_1$ the firm originally was using. Although the optimal choices of $x_1$ and $x_2$ also change, they do so in such a way that $y$ remains constant, and because of the initial tangency condition the movements in the inputs leave cost unchanged (to first order).
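Shephard's lemma can be illustrated with the Cobb-Douglas cost function from Section 11. A finite-difference derivative of $C$ with respect to the price of the first input should reproduce that input's IRF. The R sketch below makes the following assumptions:
- The first input is labeled $a$, with price $w_a$, as in Section 11.
- The parameter values and step size are arbitrary.
- The function names are illustrative.

```r
# Closed-form Cobb-Douglas cost function and IRF for input a (from Section 11)
cd_cost <- function(wa, wb, y, alpha, beta) {
  s <- alpha + beta
  y^(1 / s) * wa^(alpha / s) * wb^(beta / s) *
    ((alpha / beta)^(beta / s) + (beta / alpha)^(alpha / s))
}
cd_irf_a <- function(wa, wb, y, alpha, beta) {
  s <- alpha + beta
  y^(1 / s) * (alpha / beta)^(beta / s) * wa^(-beta / s) * wb^(beta / s)
}

wa <- 1.5; wb <- 2; y <- 4; alpha <- 0.4; beta <- 0.5   # illustrative values
h  <- 1e-6

# Shephard's lemma: dC/dw_a should equal the cost-minimizing amount of input a
dC_dwa <- (cd_cost(wa + h, wb, y, alpha, beta) - cd_cost(wa - h, wb, y, alpha, beta)) / (2 * h)
a_star <- cd_irf_a(wa, wb, y, alpha, beta)
print(c(dC_dwa = dC_dwa, a_star = a_star))
```

The finite difference lets both inputs re-optimize as $w_a$ moves, because `cd_cost` is the minimized cost. It still matches $a^*$, which is exactly the envelope argument made above.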
13 Output (Supply) Determination
So far we have studied cost, taking output as given. In this lecture, we consider the output or supply decision of individual competitive firms. By competitive, we mean the firm takes the prices of inputs and outputs as exogenous (i.e. beyond the firm's control). For any firm, profit is defined as revenue minus cost. For a competitive firm that uses two inputs, 1 and 2, to produce a single output $y$ with unit price $p$, profit is given by
$$\pi(y) = py - C(y, w_1, w_2).$$
Note that revenue $py$ is linear in output, whereas the cost function is potentially non-linear. Assume the firm selects $y$ so as to maximize profit:
$$\max_y\ py - C(y, w_1, w_2).$$
FONC:
$$\frac{d\pi}{dy} = p - C_y(y^*, w_1, w_2) = 0,$$
or, equivalently, price = marginal cost at $y = y^*$. The SOC for a maximum is
$$\frac{d^2\pi}{dy^2} < 0 \iff -C_{yy}(y^*, w_1, w_2) < 0 \iff C_{yy}(y^*, w_1, w_2) > 0 \iff MC \text{ is increasing at } y = y^*.$$
The diagram is shown in Figure 9. Note that $y^*$ is a function of $p$ and $w = (w_1, w_2)$. We define the supply function to be $y = y^*(p, w_1, w_2)$. What if $\pi < 0$ at $y = y^*(p, w)$?
- If $p < AVC$, then $y^* = 0$. The firm is losing on both fixed and variable inputs: the best choice is to shut down.
- If $p > AC$, the firm is turning a profit, so $y^*$ is such that $p = MC(y^*)$.
- If $AVC < p < AC$, the firm is incurring a loss, but it is covering its operating costs, failing only to cover its fixed costs. The firm may well stay in business and hope for better times.
Figure 9 is a useful representation of the firm's optimal choice.
Observations
- If MC is constant (e.g. Cobb-Douglas with $\alpha + \beta = 1$), then, assuming no fixed costs, $p < MC \Rightarrow$ loss $\Rightarrow y^* = 0$, and $p > MC \Rightarrow$ profit grows without bound as $y$ increases $\Rightarrow y^* = \infty$ (infinite profit).
- If MC is always decreasing, then supply is undefined, if not zero. (See Figure 9.)
Examples:
- $y = x^a$, $0 < a < 1$ (one input, DRS). The input requirement function is $x^*(y) = y^{1/a}$, which does not depend on prices. Thus
$$C(w, y) = w x^*(y) + F = w y^{1/a} + F,$$
where $F$ = fixed costs, and
$$MC(y) = \frac{w}{a}\, y^{\frac{1-a}{a}}, \qquad AC(y) = \frac{F}{y} + w y^{\frac{1-a}{a}}.$$
The optimal output supply choice $y^*$ solves $p = MC(y)$, which implies
$$p = \frac{w}{a}\,(y^*)^{\frac{1-a}{a}}, \quad \text{or} \quad y^*(p, w) = \left(\frac{ap}{w}\right)^{\frac{a}{1-a}}.$$
Note the following:
  - $y^*$ is homogeneous of degree zero in $(p, w)$;
  - $y^*$ increases with $p$ and decreases with $w$.
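- $y = x_1^\alpha x_2^\beta$, $\alpha + \beta < 1$ (Cobb-Douglas with DRS). Recall that
$$C(y, w_1, w_2) = k_1\, w_1^{\frac{\alpha}{\alpha+\beta}}\, w_2^{\frac{\beta}{\alpha+\beta}}\, y^{\frac{1}{\alpha+\beta}}$$
for some $k_1 > 0$. Therefore
$$MC(y) = k_2\, y^{\frac{1-\alpha-\beta}{\alpha+\beta}}\, w_1^{\frac{\alpha}{\alpha+\beta}}\, w_2^{\frac{\beta}{\alpha+\beta}}$$
for some constant $k_2$. Setting $p = MC$ and solving for $y$ gives
$$y^* = k_3\, p^{\frac{\alpha+\beta}{1-\alpha-\beta}}\, w_1^{-\frac{\alpha}{1-\alpha-\beta}}\, w_2^{-\frac{\beta}{1-\alpha-\beta}}$$
for some constant $k_3$. Or, equivalently,
$$\log y^* = \text{constant} + \frac{\alpha+\beta}{1-\alpha-\beta}\log p - \frac{\alpha}{1-\alpha-\beta}\log w_1 - \frac{\beta}{1-\alpha-\beta}\log w_2.$$
Again $y^*$ is homogeneous of degree zero in $(p, w)$, increasing in $p$, and decreasing in $w_1$ and $w_2$.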
Exercise: For a general cost function, prove that the competitive supply response is homogeneous of degree zero in all prices (input and output). Hint: The cost function is homogeneous of degree one in all input prices.
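For the one-input DRS example, the closed-form supply function can be compared with a direct numerical profit maximization. The same sketch also illustrates homogeneity of degree zero by scaling all prices by 10. The R code below makes the following assumptions:
- The values $p = 3$, $w = 2$, $a = 0.5$ and the fixed cost $F = 1$ are arbitrary.
- The search interval $[0, 1000]$ is assumed wide enough.
- The function names are illustrative.

```r
# One-input DRS example: y = x^a, so C(w, y) = w * y^(1/a) + F
supply_closed <- function(p, w, a) (a * p / w)^(a / (1 - a))

supply_numeric <- function(p, w, a, F = 0, y_max = 1e3) {
  profit <- function(y) p * y - (w * y^(1 / a) + F)
  optimize(profit, c(0, y_max), maximum = TRUE, tol = 1e-10)$maximum
}

p <- 3; w <- 2; a <- 0.5   # illustrative prices and technology
y_closed  <- supply_closed(p, w, a)            # (0.5 * 3 / 2)^1 = 0.75
y_numeric <- supply_numeric(p, w, a, F = 1)    # a fixed cost does not move the p = MC point
y_scaled  <- supply_closed(10 * p, 10 * w, a)  # homogeneity of degree zero in (p, w)
print(c(closed = y_closed, numeric = y_numeric, scaled = y_scaled))
```

Here profit at $y^* = 0.75$ is positive even with $F = 1$, so the shut-down comparison with AVC does not bind.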
13.1 The Law of Supply
The Law of Supply states that competitive supply functions are always upward sloping:
$$\frac{\partial y^*}{\partial p} > 0.$$
Why? At the optimal level of supply, $p = MC$. But MC is increasing by the SOC, so if $p$ increases, the new optimal level of supply increases, too: we simply move along the MC schedule. (See Figure 9.)
Formally, $y^*$ is defined as the solution to
$$p - C_y\big(y^*(p, w_1, w_2), w_1, w_2\big) = 0. \tag{*}$$
This FONC holds even if we move $p$ (or either of $w_1$ or $w_2$, for that matter). Therefore, differentiating both sides of (*) w.r.t. $p$,
$$1 - C_{yy}\big(y^*(p, w_1, w_2), w_1, w_2\big)\, \frac{\partial y^*}{\partial p} = 0,$$
hence
$$\frac{\partial y^*}{\partial p} = \frac{1}{C_{yy}(y^*, w_1, w_2)}.$$
But $C_{yy}(y^*(p, w_1, w_2), w_1, w_2) > 0$ by the SOC, so $\partial y^*/\partial p > 0$!
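The formula $\partial y^*/\partial p = 1/C_{yy}(y^*)$ can be checked on the one-input example $C(y) = w y^{1/a} + F$. The check compares a finite-difference slope of the closed-form supply function with the reciprocal of $C_{yy}$ at $y^*$. The R sketch below assumes the arbitrary values $p = 3$, $w = 2$, $a = 0.6$, and its function names are illustrative.

```r
supply <- function(p, w, a) (a * p / w)^(a / (1 - a))               # y*(p, w), one-input example
C_yy   <- function(y, w, a) w * (1 / a) * (1 / a - 1) * y^(1 / a - 2) # d^2/dy^2 of w*y^(1/a) + F

p <- 3; w <- 2; a <- 0.6   # illustrative values
h <- 1e-6
dy_dp_num     <- (supply(p + h, w, a) - supply(p - h, w, a)) / (2 * h)
dy_dp_formula <- 1 / C_yy(supply(p, w, a), w, a)
print(c(numeric = dy_dp_num, formula = dy_dp_formula))
```

Both numbers are positive, as the Law of Supply requires, because $C_{yy} > 0$ whenever $0 < a < 1$.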
13.2 Changes in Input Prices
What is the effect of an increase in input prices on the firm's output decisions? An increase in an input price (say $w_1$) is associated with a shift in MC. (See Figure 9.)
In the case where MC rises with $w_1$, we have $\partial y^*/\partial w_1 < 0$. Is this always the case? We shall see in the next section!
14 Uncertainty I: Introduction
In the next four sections we extend the theory of consumer choice to the context of choice under uncertainty. For simplicity, we deal mainly with uncertainty regarding income. Assuming that prices are fixed, alternative realizations of random income translate directly into alternative utility levels. We begin with a brief review of statistics.
14.1 Review of Basic Statistical Concepts
Figure 57: Probability distribution of a random variable $X$. The height of the bar above a point indicates the probability of $X$ landing on that point.
Let $X$ be a random variable that takes the values $x_1, \ldots, x_n$, with probabilities $p_i = P(x_i) = P(X = x_i)$, $1 \le i \le n$. Note that if $x_1, \ldots, x_n$ are exhaustive and mutually exclusive, then $\sum_{i=1}^n p_i = 1$. We draw the probability distribution of $X$ as shown in Figure 57.
We define the mean, or expected value, of $X$, denoted by $E[X]$ (or sometimes by $\bar{X}$), to be $E[X] = \sum_{i=1}^n p_i x_i$. The mean is just a weighted average of the alternative realizations of $X$, with the weights being the probabilities associated with the respective realizations.
Consider the two random variables $X_1$ and $X_2$ with probability distributions as shown in Figure 58. Note that
$$E[X_1] = 10 \times .1 + 20 \times .2 + 30 \times .4 + 40 \times .2 + 50 \times .1 = 30,$$
$$E[X_2] = 10 \times .5 + 50 \times .5 = 30,$$
so while these distributions have the same mean, $X_2$ is more dispersed ($X_1$, on the other hand, is more concentrated near its mean).
Figure 58: Two distributions with identical means.
One way to describe the level of dispersion of a random variable is with its variance, denoted $V[X]$:
$$V[X] = \sum_{i=1}^n p_i\,(x_i - \bar{X})^2.$$
The variance of $X$ is the mean squared difference between $X$ and $\bar{X}$. As an exercise, calculate $V[X_1]$ and $V[X_2]$ above. We say that a random variable $X$ is degenerate if $X = E[X]$ with probability one, in which case $V[X] = 0$.
We can also consider functions of random variables. If $g$ is a function defined on $\mathbb{R}$, then $Y = g(X)$ is a random variable. We define the mean $E[Y]$ as follows:
$$E[Y] = E[g(X)] = \sum_{i=1}^n p_i\, g(x_i).$$
If $g$ is linear, i.e. if $g(x) = ax + b$ for all $x$, then
$$E[Y] = \sum_{i=1}^n p_i (a x_i + b) = a \sum_{i=1}^n p_i x_i + b \underbrace{\sum_{i=1}^n p_i}_{=1} = aE[X] + b.$$
As an exercise, show that $V[aX + b] = a^2 V[X]$ for any choice of $a$, $b$.
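The definitions above translate directly into a few lines of R, which can also serve as a check on the two exercises. The sketch below uses the distributions of $X_1$ and $X_2$ from Figure 58. The constants $a = 3$, $b = -7$ in the linear-transformation check are arbitrary, and the helper names `expect` and `variance` are illustrative.

```r
# Distributions from Figure 58
x1 <- c(10, 20, 30, 40, 50); p1 <- c(0.1, 0.2, 0.4, 0.2, 0.1)
x2 <- c(10, 50);             p2 <- c(0.5, 0.5)

expect   <- function(x, p) sum(p * x)                      # E[X] = sum_i p_i x_i
variance <- function(x, p) sum(p * (x - expect(x, p))^2)   # V[X] = sum_i p_i (x_i - E[X])^2

print(c(E_X1 = expect(x1, p1), E_X2 = expect(x2, p2)))
print(c(V_X1 = variance(x1, p1), V_X2 = variance(x2, p2)))

# Linear transformation Y = aX + b, with a and b chosen arbitrarily
a <- 3; b <- -7
y1 <- a * x1 + b
print(c(E_Y = expect(y1, p1), aEX_plus_b = a * expect(x1, p1) + b))
print(c(V_Y = variance(y1, p1), a2_VX = a^2 * variance(x1, p1)))
```

The larger variance of $X_2$ quantifies the extra dispersion visible in Figure 58.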
14.2 Choices Over Uncertain Incomes
We now suppose that individuals are asked to make choices between alternative income lotteries. Each lottery is essentially a probability distribution over income. In ranking two alternative lotteries, we hold constant income in the absence of either lottery (which in reality could be random).
Let $y$ denote income. In a world without uncertainty individuals always prefer more income to less, so the following utility functions are all equivalent in the sense that they give rise to the same indifference curves:
- $U(y) = ay + b$, $a > 0$
- $U(y) = e^y$
- $U(y) = y^3$
Since each function is increasing, it indicates a preference for more income. This is all we need, if all we want to know is how to rank incomes.
On the other hand, suppose we wish to rank income lotteries. For example, consider:

              Payoff    Probability
  Lottery 1:  \$100     0.5
              \$0       0.5
  Lottery 2:  \$70      0.5
              \$30      0.5
In the 1940s John von Neumann and Oskar Morgenstern asked: is there some way of assigning a utility number to each possible outcome in such a way that we can compare these lotteries by comparing the expected utilities,
- $0.5\,U(100) + 0.5\,U(0)$ in the case of Lottery 1,
- $0.5\,U(70) + 0.5\,U(30)$ in the case of Lottery 2?
The answer is yes (under some assumptions), although we won't prove it. Thus, if preferences satisfy certain conditions, then there is a utility function, call it a von Neumann-Morgenstern (vN-M) utility function, defined on the set of all possible incomes, that we can use to compare both certain incomes (which is trivially easy anyway) and income lotteries. The idea is that if we get the utility differences between different incomes just right, then we can use the expected utility criterion to compare lotteries.
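NOTE: Normally we don't care about the gauge of a given utility function. That is, if $U$ is a utility function, then we regard $V = g(U)$ as equivalent, provided $g$ is a strictly increasing function. A vN-M utility function is more restrictive: only increasing linear transformations $g(U) = cU + d$ with $c > 0$ preserve its ranking of lotteries.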
How do you feel about Lottery 1 versus Lottery 2? Chances are, you would take Lottery 2. This reveals something about the shape of your vN-M utility function.
A vN-M utility function $U$ is always increasing, since more money is always better than less (for an economist, anyway). If $U$ is linear, e.g. $U(y) = ay + b$, then
$$U(0) = b, \quad U(30) = 30a + b, \quad U(70) = 70a + b, \quad U(100) = 100a + b,$$
so clearly $0.5\,U(70) + 0.5\,U(30) = 0.5\,U(0) + 0.5\,U(100)$. This leads to our first result:
If the vN-M utility function is linear, then lotteries with equal expected values (expected incomes) are considered equally good.
On the other hand, if you prefer Lottery 2, then your vN-M utility function must be concave, as shown in Figure 59.
Figure 59: Concave vN-M utility function. $U(50) > 0.5\,U(30) + 0.5\,U(70)$.
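The comparison of the two lotteries can be written out as an expected-utility calculation. The R sketch below assumes two vN-M utility functions for illustration: a linear one, $U(y) = 2y + 1$, and a concave one, $U(y) = \sqrt{y}$. Neither is taken from the notes. Under the linear $U$ both lotteries tie, because both have expected income 50. Under the concave $U$, Lottery 2 ranks higher.

```r
lottery1 <- list(payoff = c(100, 0), prob = c(0.5, 0.5))
lottery2 <- list(payoff = c(70, 30), prob = c(0.5, 0.5))

# Expected utility sum_i p_i U(y_i) of a lottery under vN-M utility U
expected_utility <- function(lottery, U) sum(lottery$prob * U(lottery$payoff))

U_linear  <- function(y) 2 * y + 1   # linear: U(y) = ay + b with a = 2, b = 1
U_concave <- function(y) sqrt(y)     # one example of an increasing, concave U

eu_linear  <- c(L1 = expected_utility(lottery1, U_linear),
                L2 = expected_utility(lottery2, U_linear))
eu_concave <- c(L1 = expected_utility(lottery1, U_concave),
                L2 = expected_utility(lottery2, U_concave))
print(eu_linear)    # equal: both lotteries have expected income 50
print(eu_concave)   # Lottery 2 ranks higher
```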
15 Auctions I
Many items are sold by auction, including treasury bills, broadcasting rights, real estate, livestock, fine art, and natural resources (e.g. timber lands and oil fields). Large companies and governments also use procedures that are equivalent to auctions to determine who will supply goods or services in some cases.
In this section and the next we examine how economists model auctions. Although auctions have existed for centuries, the basic theory thereof is quite modern. One good, somewhat advanced reference is Paul Klemperer, "Auction Theory: A Guide to the Literature," Journal of Economic Surveys Vol 13 (3), July 1999, which is available at http://www.nuff.ox.ac.uk/users/klemperer/survey.pdf.
15.1 Basic Types of Auction
There are four basic types of auction for a single good:
1. English Auction. Also known as an ascending-bid auction, this probably is the one with which you are most familiar. An auctioneer acts as moderator and asks for bids from a group of $n$ bidders. If a bidder bids $b_{(n)}$ and no one outbids him, then he wins the auction and pays $b_{(n)}$ in return for the good. Note that the auctioneer may be a computer. (eBay essentially is an English auction arena, although each of the auctions has a time limit, which is unusual for an English auction.)
2. Dutch Auction. Also known as a descending-bid auction. The auctioneer calls out a descending sequence of prices, starting from a price that is clearly too high. The first bidder to announce that she is willing to accept the current price, $b_{(n)}$, wins the auction and pays $b_{(n)}$ in return for the good.
3. First-Price Sealed-Bid Auction. Bidders submit written bids. At a certain point the bidding is closed. The auctioneer then selects the highest bid $b_{(n)}$, which is declared the winner. The winner pays $b_{(n)}$.
4. Second-Price Sealed-Bid Auction. Also known as a Vickrey auction. Bidders again submit written bids and, at a certain point, the bidding is closed. The auctioneer then selects the highest bid $b_{(n)}$, which is declared the winner; however, the winner pays the second-highest bid $b_{(n-1)}$.
Auction models differ in their assumptions regarding how the value of an item at auction varies from one person to the next, and how much the bidders know about their own potential valuations as well as those of other bidders. The value of the item to bidder $i$ will be denoted $v_i$.
We shall focus on a few important cases:
1. Private Values. Each valuation is independent and known only to the bidder.
2. Common Value. $v_1 = \cdots = v_n = v$, but $v$ is unknown. (Examples might include an auction to sell the rights to drill for oil in a certain tract of land.)
3. Affiliated Values. $v_i$ varies across bidders, but bidders themselves do not know their own valuations with certainty, and the valuations are positively correlated. (Examples might include an auction for a house.)
15.2 Important Results Concerning the Private Values Case
1. A Dutch auction is equivalent to a first-price sealed-bid auction.
In a Dutch auction there is no dynamic choice: one must choose an opt-in price ex ante and, if the price falls to that level, opt in and receive the good for that price. This is the same problem as deciding what bid to submit in a first-price sealed-bid auction. (We defer for the moment the optimal choice of bidding strategy in these auctions.)
2. In an English auction the optimal strategy is to keep bidding until the current highest bid $b$ exceeds your valuation $v_i$. Why?
(a) If $b > v_i$, you are advised to walk away, for otherwise, if you bid $b' > b$ and win, you will pay at least $b' > v_i$.
(b) If $v_i > b$ and you walk away, you leave a surplus $v_i - b > 0$ on the table.
(c) If $b = v_i$, then bid $b + \varepsilon$ and, if no one outbids you, break even.
3. In light of (2c), in an English auction the bidder with the highest valuation wins, and will pay the second-highest valuation (plus a marginal amount needed to surpass the second-highest bidder).
4. In a second-price sealed-bid auction, the optimal strategy is to bid your valuation.
Suppose your true value is $v$ and you bid $v - x$, where $x \ge 0$. Suppose the highest bid among all other bidders is $w$.
(a) If $v - x > w$, you win and pay $w$.
(b) If $v - x < w$, you lose and pay nothing.
Your expected surplus,
$$s = E\big[(v - w)\,\mathbf{1}\{v - x > w\}\big],$$
is maximized by setting $x = 0$! Raising $x$ only forfeits the wins in which $v - x < w < v$, each of which would have yielded a positive surplus $v - w$.
Now suppose you bid $v + x$, where $x$ and $w$ are as before.
(a) If $v + x < w$, then you lose and pay nothing.
(b) If $v + x > w$, you win and pay $w$. Your surplus is $v - w$. There are two cases:
  i. $v \ge w \Rightarrow$ you want to win;
  ii. $v < w \Rightarrow$ you don't want to win.
If you set $x = 0$, you win iff $v > w$, so $x = 0$ is the best choice.
5. Based on (4), in a second-price sealed-bid auction the winner is the bidder with the highest valuation, who then pays a price equal to the second-highest valuation.
6. Results (3) and (5) imply that in the private values case an English auction is equivalent to a second-price sealed-bid auction.
7. In the common value case English auctions may be different, because a bidder can make inferences based on the identity/size of the remaining pool of bidders.
8. How does one bid in a first-price auction?
Let us start by making a few assumptions:
- The valuations $v_1, \ldots, v_n$ are independent and identically distributed (IID), with $P(v_i \le x) = F(x)$ for all $i$. (This is sometimes written $v_1, \ldots, v_n \overset{IID}{\sim} F$, where $F$ is the cumulative distribution function, or CDF.)
- Each bidder adopts the same strategy and bids $b_i = B(v_i)$, where $B$ is the bid function.
What does the bid function look like?
- $B$ is increasing, for otherwise the bidder with the highest valuation wouldn't necessarily win.
- $B$ increasing $\Rightarrow B$ invertible: if $b = B(v)$, then $v = g(b)$, where $g$ denotes the inverse bid function $B^{-1}$.
Assuming each bidder bids according to $B$,
$$P(\text{you win with bid } b) = P(v_j < g(b) \text{ for all other bidders } j) = [F(g(b))]^{n-1}.$$
Let $s = s(b; v)$ = expected surplus given bid $b$ and valuation $v$. Then
$$s = \underbrace{(v - b)}_{\text{surplus if win}}\ \underbrace{[F(g(b))]^{n-1}}_{\text{prob of winning}}.$$
What is the FONC for $b$?
$$\begin{aligned}
\frac{\partial s}{\partial b} &= -[F(g(b))]^{n-1} + (v - b)(n-1)[F(g(b))]^{n-2}\, \frac{\partial}{\partial b} F(g(b))\\
&= -[F(g(b))]^{n-1} + (n-1)(v - b)[F(g(b))]^{n-2}\, f(g(b))\, g'(b),
\end{aligned}$$
where $f = F'$ is the probability density function. Setting $\partial s/\partial b = 0$,
$$[F(g(b))]^{n-1} = (n-1)(v - b)[F(g(b))]^{n-2}\, f(g(b))\, g'(b),$$
which implies
$$v - b = \frac{F(g(b))}{f(g(b))} \cdot \frac{1}{g'(b)} \cdot \frac{1}{n-1}. \tag{*}$$
Note that since $B' > 0$, so is $g' = 1/B'$, and therefore $v - b > 0$. This means one should always shade his or her bid. Why? If you bid more, then although you win more often you also pay more. Like a monopolist, you must take this into account.
As an example, suppose $v_1, \ldots, v_n \overset{IID}{\sim} \text{Uniform}(0, 1)$, that is, suppose the valuations are independent and identically distributed with
$$F(v) = \begin{cases} 0, & v < 0,\\ v, & 0 \le v \le 1,\\ 1, & v > 1,\end{cases}$$
and therefore
$$f(v) = \begin{cases} 1, & 0 \le v \le 1,\\ 0, & \text{otherwise.}\end{cases}$$
(See Figure 60.)
Figure 60: Uniform(0,1) distribution. (a) PDF (b) CDF
In this case (*) says, with $v = g(b)$,
$$g(b) - b = \frac{1}{n-1} \cdot \frac{g(b)}{g'(b)},$$
which is a differential equation with solution
$$g(b) = \frac{n}{n-1}\, b.$$
This is easily verifiable:
$$g(b) - b = \frac{n}{n-1}\,b - b = \frac{n - (n-1)}{n-1}\, b = \frac{b}{n-1} = \frac{1}{n-1} \cdot \frac{\frac{n}{n-1}\,b}{\frac{n}{n-1}} = \frac{1}{n-1} \cdot \frac{g(b)}{g'(b)}.$$
The bid function is recovered by setting $v = g(b)$ and solving
$$v = \frac{n}{n-1}\, b$$
for $b = B(v)$:
$$B(v) = \frac{n-1}{n}\, v.$$
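One way to see that $B(v) = \frac{n-1}{n}v$ is an equilibrium is to fix the rivals' strategy at $B$ and maximize one bidder's expected surplus numerically. The result should reproduce $B(v)$. The R sketch below makes the following assumptions:
- There are $n = 4$ bidders, and the three test valuations are arbitrary.
- The winning probability is capped at 1 with `pmin()`, because the uniform CDF equals 1 above 1.
- The function names are illustrative.

```r
n <- 4   # number of bidders (illustrative)

# If rivals bid B(v) = (n - 1) v / n, a bid b beats a rival whose valuation is below g(b)
g        <- function(b) n / (n - 1) * b
win_prob <- function(b) pmin(g(b), 1)^(n - 1)     # F(g(b))^(n-1), F = Uniform(0, 1) CDF
surplus  <- function(b, v) (v - b) * win_prob(b)  # expected surplus s(b; v)

# Best response of a bidder with valuation v, found by one-dimensional search
best_bid <- function(v) optimize(surplus, c(0, v), v = v, maximum = TRUE, tol = 1e-10)$maximum

vals         <- c(0.2, 0.5, 0.9)
bids_numeric <- sapply(vals, best_bid)
bids_formula <- (n - 1) / n * vals
print(rbind(v = vals, numeric = bids_numeric, formula = bids_formula))
```

The extra argument `v = v` is passed through `optimize()`'s `...` to `surplus`. The search interval $[0, v]$ reflects that bidding above one's valuation can only produce a loss.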
16 Auctions II
In this section we analyze the winner's curse. The winner's curse arises in common value and affiliated values auctions, in which each bidder estimates the value of the item at auction. Bidders with higher guesses have, on average, made a positive error. So bidders must shade their bids to compensate for the fact that when they win, on average it is the result of being overly optimistic.
16.1 Winner's Curse
A very simple example of the winner's curse is the following: a police car is to be sold at auction using a second-price sealed-bid system. Each bidder inspects the car, then the bidding begins. Mathematically, we make the following assumptions:
1. The true value of the car is $V$, a random variable with mean $\mu$ and variance $\sigma_V^2$. This expresses the idea that over many auctions, the average value of an old police car is $\mu$. But there is variability from one car to the next, reflected in $\sigma_V^2$.
2. Each bidder hires a mechanic to estimate the value of the car. The mechanic reports his or her estimated value $T_i = V + \varepsilon_i$, where $\varepsilon_i$ is the error in the mechanic's assessment. A bidder doesn't know the values reported to the competitors by their respective mechanics.
3. We assume $\varepsilon_1, \ldots, \varepsilon_n \overset{IID}{\sim} F_\varepsilon$, with mean zero and variance $\sigma_\varepsilon^2$. Note that the larger is $\sigma_\varepsilon^2$ relative to $\sigma_V^2$, the noisier the mechanics' reports.
4. Based on $T_i$, the $i$th bidder estimates the true value of the car. In particular, the bidder forms an estimate
$$Y_i = \theta T_i + (1 - \theta)\mu.$$
The idea behind this is that if $T_i$ is significantly noisy, the bidder should downweight the mechanic's report and assume instead that the average value at auction is more credible. Note that
$$E[Y_i] = \theta E[T_i] + (1 - \theta)\mu = \theta E[V + \varepsilon_i] + (1 - \theta)\mu = \theta\mu + (1 - \theta)\mu = \mu.$$
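What is the optimal value of $\theta$? The forecast error for a given value of $\theta$ is
$$\begin{aligned}
\eta_i &= Y_i - V\\
&= \theta T_i + (1 - \theta)\mu - V\\
&= \theta(V + \varepsilon_i) + (1 - \theta)\mu - V\\
&= (1 - \theta)(\mu - V) + \theta\varepsilon_i.
\end{aligned}$$
The variance of the forecast error (assuming, as this step requires, that $\varepsilon_i$ is independent of $V$) is
$$\begin{aligned}
V[\eta_i] &= V[(1 - \theta)(\mu - V) + \theta\varepsilon_i]\\
&= (1 - \theta)^2\, V[V] + \theta^2\, V[\varepsilon_i]\\
&= (1 - \theta)^2\sigma_V^2 + \theta^2\sigma_\varepsilon^2.
\end{aligned}$$
We choose $\theta$ to minimize the variance of the forecast error. The FONC is
$$\frac{\partial V[\eta_i]}{\partial \theta} = -2(1 - \theta)\sigma_V^2 + 2\theta\sigma_\varepsilon^2 = 0,$$
which implies $(1 - \theta)\sigma_V^2 = \theta\sigma_\varepsilon^2$, or
$$\theta = \theta^* = \frac{\sigma_V^2}{\sigma_V^2 + \sigma_\varepsilon^2}.$$
You may have seen this before: $\theta^*$ is the signal-to-total-variance ratio. If $\sigma_\varepsilon^2$ is small, then $\theta^*$ is nearly one, and the result is a weighted average with more weight on the mechanic's report.
5. Based on the mechanic's report, plus the optimal choice $\theta = \theta^*$, each bidder now has a good idea as to the value of the car in the current auction. Since it is a second-price auction, one might think each bidder simply should bid
$$Y_i^* = \theta^* T_i + (1 - \theta^*)\mu.$$
But this will give rise to a winner's curse! The highest bidder wins the auction; this is the bidder whose mechanic made the biggest positive error. He pays the amount of the second-highest bid, which includes the second-biggest positive error. So, even in a second-price auction, one must take into account the fact that on average the second-highest bidder also was overly optimistic, and shade his or her bid.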
6. In an auction with 10 bidders, if $\varepsilon_1, \ldots, \varepsilon_n \overset{IID}{\sim} N(0, 1)$, i.e. if the errors are normally distributed with mean zero and variance one,[9] then the expected value of the second-highest error $\varepsilon_{(n-1)}$ is approximately 1.001. Table 3 lists a few other cases. As you can see, $E[\varepsilon_{(n-1)}]$ grows with $n$. This might make you worry a little about eBay. With thousands of bidders on a given item, if you do not know what the item is worth to you, then you ought to shade your bid quite a bit.
Table 3: Expected Second-Highest Error for Various Numbers of Bidders

  n      E[ε_(n-1)]
  10     1.001
  25     1.524
  35     1.692
  100    2.148

7. What should you bid?
Suppose each bidder shades his or her bid by $k$:
$$Y_i = \theta^* T_i + (1 - \theta^*)\mu - k = \theta^* V + (1 - \theta^*)\mu + \theta^*\varepsilon_i - k.$$
Let $\delta = E[\varepsilon_{(n-1)}]$. (This quantity usually is estimated by computer simulation.[10]) The expected bid of the second-highest bidder is
$$\theta^* V + (1 - \theta^*)\mu + \theta^*\delta - k.$$
So, if everyone sets $k = \theta^*\delta$, then each bidder should expect to pay
$$E[\theta^* V + (1 - \theta^*)\mu] = \mu$$
in the event he wins, which means that on average he pays what the item is worth.
Bottom line: in the second-price auction we have set up, each bidder bids what she believes the item is worth based on her information, minus a discount equal to the expected second-highest error in the bidders' estimates, $\theta^*\delta$.
8. In the real world, bidders hire consultants to simulate the auction (by guessing at $\sigma_\varepsilon^2$ and $\sigma_V^2$).
[9] See Figure 61.
[10] See Appendix 16.2.
Figure 61: Standard normal distribution.
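The winner's curse and the proposed correction $k = \theta^*\delta$ can be seen in a small Monte Carlo simulation of the police-car auction. The R sketch below makes the following assumptions:
- The parameters $\mu = 10$ and $\sigma_V = \sigma_\varepsilon = 1$ (so $\theta^* = 0.5$), $n = 10$ bidders, 20,000 auctions, and the random seed are all hypothetical choices.
- The value $\delta \approx 1.001357$ is taken from Table 4 for $n = 10$ standard normal errors.
- Normality of $V$ is assumed only for simulation.
- The function name `overpayment` is illustrative.

```r
set.seed(42)
n_bidders  <- 10
n_auctions <- 20000
mu <- 10; sigma_V <- 1; sigma_eps <- 1            # hypothetical parameters
theta <- sigma_V^2 / (sigma_V^2 + sigma_eps^2)    # theta* (= 0.5 here)
delta <- 1.001357                                 # E[eps_(n-1)] for n = 10 N(0,1) errors (Table 4)

# Average of (price paid - true value) over simulated second-price auctions
# when every bidder bids theta* T_i + (1 - theta*) mu - k
overpayment <- function(k) {
  V    <- rnorm(n_auctions, mu, sigma_V)                          # true value in each auction
  eps  <- matrix(rnorm(n_auctions * n_bidders, 0, sigma_eps), n_auctions, n_bidders)
  bids <- theta * (V + eps) + (1 - theta) * mu - k                # V recycles down each column
  price <- apply(bids, 1, function(r) sort(r, decreasing = TRUE)[2])  # winner pays 2nd-highest bid
  mean(price - V)
}

over_naive  <- overpayment(0)              # bid the estimate Y_i^* without shading
over_shaded <- overpayment(theta * delta)  # shade every bid by k = theta* delta
print(c(naive = over_naive, shaded = over_shaded, theory_naive = theta * delta))
```

Without shading, the winner overpays by roughly $\theta^*\delta \approx 0.5$ on average. With $k = \theta^*\delta$, the average overpayment should be close to zero, up to simulation noise.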
16.2 Appendix: Order Statistics
Let $X_1, \ldots, X_n \overset{IID}{\sim} F_X$, and let $Y_k = X_{(n-k+1)}$ denote the $k$th biggest observation, $1 \le k \le n$. Then
$$\begin{aligned}
F_{Y_2}(x) &= P(\text{exactly one observation is greater than } x) + P(\text{all observations are at most } x)\\
&= n(1 - F_X(x))(F_X(x))^{n-1} + (F_X(x))^n\\
&= n(F_X(x))^{n-1} - n(F_X(x))^n + (F_X(x))^n\\
&= n(F_X(x))^{n-1} - (n-1)(F_X(x))^n,
\end{aligned}$$
and therefore
$$\begin{aligned}
f_{Y_2}(x) &= F_{Y_2}'(x)\\
&= n(n-1)(F_X(x))^{n-2} f_X(x) - n(n-1)(F_X(x))^{n-1} f_X(x)\\
&= n(n-1)(F_X(x))^{n-2} f_X(x)\, S_X(x),
\end{aligned}$$
where $S_X(x)$ is defined to be $1 - F_X(x)$. Let $\mu_{Y_k} = E[Y_k]$. Then
$$\mu_{Y_2} = n(n-1)\int_{\mathbb{R}} x\,(F_X(x))^{n-2} f_X(x)\, S_X(x)\,dx.$$
If $X_1, \ldots, X_n \overset{IID}{\sim} \text{Uniform}(0, 1)$, then
$$\mu_{Y_2} = n(n-1)\int_0^1 x^{n-1}(1 - x)\,dx = n(n-1)\left(\frac{1}{n} - \frac{1}{n+1}\right) = \frac{n-1}{n+1}.$$
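The closed form $(n-1)/(n+1)$ for the uniform case can be checked by integrating $x f_{Y_2}(x)$ numerically. The same helper can then be pointed at the standard normal to compare with Table 4. The R sketch below makes the following assumptions: the helper name `mean_second_largest` is hypothetical, and it differs from `ESBgq` in the appendix only in allowing finite integration limits. The test values $n = 2, 5, 10$ are arbitrary.

```r
# mu_{Y_2} = n(n-1) * integral of x F(x)^(n-2) f(x) (1 - F(x)) dx, evaluated numerically
mean_second_largest <- function(n, CDF, PDF, lower = -Inf, upper = Inf) {
  integrand <- function(x) x * n * (n - 1) * CDF(x)^(n - 2) * PDF(x) * (1 - CDF(x))
  integrate(integrand, lower, upper)$value
}

ns <- c(2, 5, 10)
uniform_quad   <- sapply(ns, mean_second_largest, CDF = punif, PDF = dunif, lower = 0, upper = 1)
uniform_closed <- (ns - 1) / (ns + 1)
print(rbind(n = ns, quadrature = uniform_quad, closed_form = uniform_closed))

# Standard normal case for n = 10, to compare with Table 4
normal_10 <- mean_second_largest(10, pnorm, dnorm)
print(normal_10)
```

For the uniform case the integrand is a polynomial on $[0, 1]$, so the quadrature is essentially exact. For the normal case the accuracy is governed by `integrate()`'s default tolerance.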
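If $X_1, \ldots, X_n \overset{IID}{\sim} N(0, 1)$, then
$$\mu_{Y_2} = n(n-1)\int_{\mathbb{R}} x\,(\Phi(x))^{n-2}(1 - \Phi(x))\,f(x)\,dx,$$
where $\Phi$ is the standard normal CDF and
$$f(x) = \Phi'(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}.$$
This can be computed by numerical quadrature (R's integrate() uses adaptive Gauss-Kronrod quadrature) with the following R function: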
```r
ESBgq = function(n, CDF, PDF){
  # - computes approx. EV of 2nd biggest observation among n IID draws from dist. "CDF"
  # - corresponding density is "PDF"
  f = function(x){x*n*(n-1)*(CDF(x))^(n-2)*PDF(x)*(1 - CDF(x))}
  # - "f" is x times the density of the 2nd biggest observation, i.e. the integrand
  integrate(f, -Inf, Inf)$value
}
# usage, standard normal case: ESBgq(10, pnorm, dnorm)
```
See Table 4 for some additional approximate values of $\mu_{Y_2}$.
As a check, we may approximate $\mu_{Y_2}$ for $n = 100$ by simulation. The following R script returns $\mu_{Y_2} \approx 2.148444$, which agrees fairly well with the previous result.
```r
ESBsim = function(n,B){
  # - computes EV of 2nd biggest observation among n IID draws from standard normal
  # - by averaging the 2nd biggest value over B simulated samples of size n
  x = 0
  for(i in 1:B) x = x + sort(rnorm(n))[n-1]
  x/B
}
print(ESBsim(100,1e06))
```
Table 4: Approximate values of $\mu_{Y_2}$ for several $n$.

  n        μ_{Y_2}
  5        0.495019
  10       1.001357
  25       1.524301
  50       1.854872
  100      2.148145
  250      2.494308
  500      2.732308
  1000     2.954133
  1e+04    3.605654
  1e+05    4.165956
  1e+06    4.664618
17 Introduction to Finance I: Capital Asset Pricing Model
In this section we consider the implications of the simple assumption that investors hold only portfolios of assets that are mean-variance efficient. This turns out to have the surprising consequence that the price of a stock, i.e. the price of a share in a publicly traded company, depends on the covariance between the return on the stock and the return on the market as a whole. This result was discovered in the early 1960s by William Sharpe, who shared the Nobel Prize in Economics on the basis of his work. The theoretical model is called the Capital Asset Pricing Model (CAPM).
17.1 Assumptions
We shall assume that people can invest in a set of assets, $i = 1, \dots, n$. An investor with \$X selects a portfolio, or in other words a list of the amounts invested in each of the possible assets. Denote by $\alpha_i$ the share of $X$ in asset $i$. Note that $\alpha_i \ge 0$ for all $i$, and $\sum_{i=1}^n \alpha_i = 1$. You can think of any vector $\alpha \in \mathbb{R}^n$ with non-negative elements that add up to one as a portfolio.
We shall assume for simplicity that an investor has a one-period horizon, or holding period. An investment of \$1 in asset $i$ will be worth $1 + R_i$ at the end of the holding period. Thus $R_i$ is the proportional return on asset $i$ over the holding period. Note that $R_i \ge -1$ if assets have limited liability, for in that case the worst event would be to lose one's entire investment. $R_1, \dots, R_n$ are random variables.
The mean of $R_i$ is $\mu_{R_i} = E[R_i]$, and the variance is
$$
\sigma_i^2 = V[R_i] = E[(R_i - \mu_{R_i})^2] = E[R_i^2] - \mu_{R_i}^2.
$$
A riskier asset is one with higher variance. We do not assume that the returns on the respective assets are independent. Instead, we allow for nonzero covariances
$$
\sigma_{ij}^2 = \mathrm{Cov}[R_i, R_j] = E[(R_i - \mu_{R_i})(R_j - \mu_{R_j})].
$$
Note that the covariance of the return on asset $i$ with itself is
$$
E[(R_i - \mu_{R_i})^2] = V[R_i] = \sigma_i^2,
$$
so we shall occasionally write $\sigma_{ii}^2$ rather than $\sigma_i^2$.
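(For future reference.) If you have taken econometrics, then you may recall the following. Suppose we have two random variables $X$ and $Y$, and observations $(X_1, Y_1), \dots, (X_n, Y_n)$. If you are interested in a line of best fit explaining how $Y$ may depend on $X$, then you carry out a linear regression of $Y$ on $X$. This procedure returns the least squares estimates $\hat\beta_0$ and $\hat\beta_1$ in the linear model
$$
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad 1 \le i \le n,
$$
where $\varepsilon_1, \dots, \varepsilon_n$ are the residual errors. The optimal coefficients are found by minimizing the residual sum of squares
$$
\sum_{i=1}^n \hat\varepsilon_i^2 = \sum_{i=1}^n \left(\hat Y_i - Y_i\right)^2 = \sum_{i=1}^n \left(\hat\beta_0 + \hat\beta_1 X_i - Y_i\right)^2.
$$
This sum is proportional to an unbiased estimate of $V[\varepsilon_i]$, assuming $V[\varepsilon_1] = \cdots = V[\varepsilon_n] = \sigma^2$, so minimizing it is equivalent to minimizing $\hat\sigma^2$. The minimization can be done with the same methods we used to analyze the winner's curse, and it leads to the formulas
$$
\beta_1 = \frac{\mathrm{Cov}[X, Y]}{V[X]} = \frac{\sigma_{XY}^2}{\sigma_X^2}, \qquad
\beta_0 = E[Y] - \beta_1 E[X] = \mu_Y - \frac{\sigma_{XY}^2}{\sigma_X^2}\,\mu_X.
$$
The estimates $\hat\beta_0$ and $\hat\beta_1$ are obtained by replacing these population moments with their sample counterparts.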
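The sample version of these formulas can be checked against R's least-squares fit `lm`. The five data points below are hypothetical, chosen only for illustration. Note that `cov` and `var` both divide by $n - 1$, so the denominators cancel in the slope.

```r
# Hypothetical data: five (X, Y) pairs
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Slope and intercept from the moment formulas (sample moments)
b1 <- cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)

# Same coefficients from R's least-squares fit
fit <- lm(y ~ x)
print(c(formula_b0 = b0, formula_b1 = b1))
print(coef(fit))
```

The same ratio of a covariance to a variance reappears below as an asset's beta.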
(Back to the problem of the portfolio.) If an investor with \$X selects a portfolio $\alpha = (\alpha_1, \dots, \alpha_n)$, then how much will he have at the end of the holding period?

The total amount invested in asset $i$ is $\alpha_i X$. At the end of the holding period this investment is worth $\alpha_i X (1 + R_i)$. The investor now has
$$
\sum_{i=1}^n \alpha_i X (1 + R_i) = X\left(\sum_{i=1}^n \alpha_i + \sum_{i=1}^n \alpha_i R_i\right) = X\left(1 + \sum_{i=1}^n \alpha_i R_i\right).
$$
The realized proportional return on the portfolio is
$$
R = \frac{X\left(1 + \sum_{i=1}^n \alpha_i R_i\right) - X}{X} = \sum_{i=1}^n \alpha_i R_i,
$$
which is a weighted average of the returns on the respective assets, with each weight equal to the corresponding portfolio share. The expected return is
$$
E[R] = E\left[\sum_{i=1}^n \alpha_i R_i\right] = \sum_{i=1}^n \alpha_i E[R_i] = \sum_{i=1}^n \alpha_i \mu_{R_i}.
$$
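What is the variance of $R$?
$$
\begin{aligned}
V[R] &= E\left[\left(\sum_{i=1}^n \alpha_i R_i - \sum_{i=1}^n \alpha_i \mu_{R_i}\right)^2\right]
= E\left[\left(\sum_{i=1}^n \alpha_i (R_i - \mu_{R_i})\right)^2\right] \\
&= E\left[\left(\sum_{i=1}^n \alpha_i (R_i - \mu_{R_i})\right)\left(\sum_{j=1}^n \alpha_j (R_j - \mu_{R_j})\right)\right] \\
&= E\left[\sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j (R_i - \mu_{R_i})(R_j - \mu_{R_j})\right] \\
&= \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j\, E[(R_i - \mu_{R_i})(R_j - \mu_{R_j})] \\
&= \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j \sigma_{ij}^2
= \sum_{i=1}^n \alpha_i^2 \sigma_{ii}^2 + \sum_{1 \le i \le n} \sum_{\substack{1 \le j \le n \\ j \ne i}} \alpha_i \alpha_j \sigma_{ij}^2.
\end{aligned}
$$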
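The following is a small numerical illustration of the mean and variance formulas. All the inputs are hypothetical: three assets, the first of which has zero variance, with the covariances $\sigma_{ij}^2$ stored in a matrix `Sigma`. The sketch evaluates the double sum literally and compares it with the equivalent matrix quadratic form $\alpha^\top \Sigma \alpha$.

```r
# Hypothetical 3-asset example; asset 1 has zero variance (risk-free)
mu    <- c(0.03, 0.07, 0.10)                 # expected returns mu_{R_i}
Sigma <- matrix(c(0, 0,     0,
                  0, 0.04,  0.012,
                  0, 0.012, 0.09), nrow = 3, byrow = TRUE)  # sigma^2_{ij}
alpha <- c(0.2, 0.5, 0.3)                    # portfolio shares, sum to 1

# E[R] = sum_i alpha_i mu_{R_i}
exp_ret <- sum(alpha * mu)

# V[R] = sum_i sum_j alpha_i alpha_j sigma^2_{ij}, written as a double loop
var_loop <- 0
for (i in seq_along(alpha)) {
  for (j in seq_along(alpha)) {
    var_loop <- var_loop + alpha[i] * alpha[j] * Sigma[i, j]
  }
}

# The same quadratic form in matrix notation, alpha' Sigma alpha
var_matrix <- as.numeric(t(alpha) %*% Sigma %*% alpha)

# Split into own-variance terms (i = j) and covariance terms (i != j)
own   <- sum(alpha^2 * diag(Sigma))
cross <- var_loop - own
print(c(exp_ret = exp_ret, var_loop = var_loop, var_matrix = var_matrix,
        own = own, cross = cross))
```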
The CAPM hinges on two assumptions:
1. There exists a risk-free asset; call it asset 1.
2. The market portfolio is mean-variance efficient, in the sense that no other portfolio realizes the same expected return with lower variance. Here the market portfolio is the portfolio consisting of the same fraction of every outstanding share, so that each asset is held in proportion to its total market value.
17.2 Conclusions
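Consider an efficient portfolio $\alpha^p = (\alpha_1^p, \dots, \alpha_n^p)$ with return $R_p = \sum_{i=1}^n \alpha_i^p R_i$, which has the lowest possible variance subject to yielding an expected rate of return of $\mu_{R_p}$. Such a portfolio solves
$$
\min_{\alpha} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j \sigma_{ij}^2 \quad \text{s.t.} \quad \sum_{i=1}^n \alpha_i \mu_{R_i} = \mu_{R_p} \quad \text{and} \quad \sum_{i=1}^n \alpha_i = 1.
$$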
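The Lagrangian is
$$
L(\alpha, \lambda_1, \lambda_2) = \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j \sigma_{ij}^2 - \lambda_1\left(\sum_{i=1}^n \alpha_i \mu_{R_i} - \mu_{R_p}\right) - \lambda_2\left(\sum_{i=1}^n \alpha_i - 1\right).
$$
FONC w.r.t. $\alpha$:
$$
\frac{\partial L}{\partial \alpha_j^p} = 2 \sum_{i=1}^n \alpha_i^p \sigma_{ij}^2 - \lambda_1 \mu_{R_j} - \lambda_2 = 0, \quad 1 \le j \le n. \qquad (*)
$$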
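In particular, (*) holds for asset 1. However, asset 1 is risk-free by assumption, and therefore $\sigma_{1i}^2 = 0$ for all $i$. Thus, for asset 1, (*) implies
$$
\lambda_2 = -\lambda_1 \mu_{R_1} = -\lambda_1 r,
$$
where $r$ denotes the risk-free rate of return. Hence we can rewrite (*) as follows:
$$
2 \sum_{i=1}^n \alpha_i^p \sigma_{ij}^2 = \lambda_1 (\mu_{R_j} - r), \quad 1 \le j \le n.
$$
Multiplying by $\alpha_j^p$,
$$
2 \alpha_j^p \sum_{i=1}^n \alpha_i^p \sigma_{ij}^2 = \alpha_j^p \lambda_1 (\mu_{R_j} - r), \quad 1 \le j \le n,
$$
and summing across all assets,
$$
2 \sum_{j=1}^n \alpha_j^p \sum_{i=1}^n \alpha_i^p \sigma_{ij}^2 = \sum_{j=1}^n \alpha_j^p \lambda_1 (\mu_{R_j} - r),
$$
which, since $\sum_{j=1}^n \alpha_j^p = 1$, is equivalent to
$$
2 \sum_{j=1}^n \sum_{i=1}^n \alpha_i^p \alpha_j^p \sigma_{ij}^2 = \lambda_1 \left(\sum_{j=1}^n \alpha_j^p \mu_{R_j} - r\right),
$$
or
$$
2 \sum_{j=1}^n \sum_{i=1}^n \alpha_i^p \alpha_j^p \sigma_{ij}^2 = \lambda_1 (\mu_{R_p} - r), \qquad (**)
$$
since $\sum_{j=1}^n \alpha_j^p \mu_{R_j} = \mu_{R_p}$. Let $\sigma_p^2 = \sum_{j=1}^n \sum_{i=1}^n \alpha_i^p \alpha_j^p \sigma_{ij}^2$ denote the minimum variance of the portfolio with expected return $\mu_{R_p}$. Then (**) says $2\sigma_p^2 = \lambda_1 (\mu_{R_p} - r)$, or
$$
\lambda_1 = \frac{2 \sigma_p^2}{\mu_{R_p} - r}.
$$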
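Plugging the result back into the rewritten version of (*),
$$
\mu_{R_j} - r = (\mu_{R_p} - r)\, \frac{\sum_{i=1}^n \alpha_i^p \sigma_{ij}^2}{\sigma_p^2}. \qquad (\dagger)
$$
Finally, notice that
$$
\sum_{i=1}^n \alpha_i^p \sigma_{ij}^2 = E\left[(R_j - \mu_{R_j}) \sum_{i=1}^n \alpha_i^p (R_i - \mu_{R_i})\right] = \mathrm{Cov}\left[R_j, \sum_{i=1}^n \alpha_i^p R_i\right] = \mathrm{Cov}[R_j, R_p].
$$
In other words, $\sum_{i=1}^n \alpha_i^p \sigma_{ij}^2$ is the covariance of the return on asset $j$ with the return on the efficient portfolio that has expected return $\mu_{R_p}$. We can rewrite (†) as follows:
$$
\mu_{R_j} - r = (\mu_{R_p} - r)\, \frac{\mathrm{Cov}[R_j, R_p]}{\sigma_p^2}. \qquad (\ddagger)
$$
Suppose now that the market portfolio, with return $R_m$, is efficient (assumption 2), so that (‡) holds with $p = m$. Let $\mu_{R_m} = E[R_m]$ and $\sigma_m^2 = V[R_m]$. Then it follows by (‡) that
$$
\mu_{R_j} = r + (\mu_{R_m} - r)\, \beta_j,
$$
where
$$
\beta_j = \frac{\mathrm{Cov}[R_j, R_m]}{\sigma_m^2}
$$
is the regression coefficient one would get by carrying out a linear regression of the return on asset $j$ on the return on the market portfolio. This regression coefficient is called the asset's beta.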
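The derivation above can be checked numerically. It imposes only the two equality constraints, so short positions are not ruled out. The FONC together with those two constraints therefore form a linear system in $(\alpha, \lambda_1, \lambda_2)$, which the sketch below solves with `solve`. The three-asset inputs, the target $\mu_{R_p} = 0.08$, and the helper name `efficient_portfolio` are all hypothetical. The script checks that $\mu_{R_j} - r = (\mu_{R_p} - r)\,\mathrm{Cov}[R_j, R_p]/\sigma_p^2$ for every asset and that $\lambda_2 = -\lambda_1 r$.

```r
# Hypothetical inputs: asset 1 is risk-free, so its row/column of Sigma is zero
r     <- 0.03
mu    <- c(r, 0.07, 0.10)
Sigma <- matrix(c(0, 0,     0,
                  0, 0.04,  0.012,
                  0, 0.012, 0.09), nrow = 3, byrow = TRUE)

# Solve the FONC of  min alpha' Sigma alpha  s.t.  mu' alpha = mu_p, sum(alpha) = 1:
#   2 Sigma alpha - lambda1 mu - lambda2 1 = 0, together with the two constraints.
# No non-negativity constraint is imposed, matching the derivation above.
efficient_portfolio <- function(mu, Sigma, mu_p) {
  n   <- length(mu)
  one <- rep(1, n)
  A <- rbind(cbind(2 * Sigma, -mu, -one),
             c(mu, 0, 0),
             c(one, 0, 0))
  b   <- c(rep(0, n), mu_p, 1)
  sol <- solve(A, b)
  list(alpha = sol[1:n], lambda1 = sol[n + 1], lambda2 = sol[n + 2])
}

mu_p <- 0.08
ep   <- efficient_portfolio(mu, Sigma, mu_p)

cov_with_p <- as.vector(Sigma %*% ep$alpha)   # Cov[R_j, R_p] for each asset j
var_p      <- sum(ep$alpha * cov_with_p)      # sigma_p^2 = alpha' Sigma alpha
beta_p     <- cov_with_p / var_p              # regression coefficient of R_j on R_p

# Check: mu_j - r = (mu_p - r) * beta_j for every asset, and lambda2 = -lambda1 * r
print(cbind(excess_return = mu - r, predicted = (mu_p - r) * beta_p))
print(c(lambda2 = ep$lambda2, minus_lambda1_r = -ep$lambda1 * r))
```

The CAPM statement then amounts to assuming that the market portfolio is one of these efficient portfolios.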
18 Introduction to Finance II: Efficient-Market Hypothesis
18.1 Review
In the previous section we considered the implications of the CAPM hypotheses with respect to the return on a risky asset $i$. In particular, suppose asset $i$ has random return $R_i$ and beta $\beta_i$; that is, on average, when $R_m$ is 1% higher/lower than usual, $R_i$ is $\beta_i$% higher/lower. Then $\mu_{R_i}$ satisfies the CAPM equation
$$
\mu_{R_i} = \gamma_0 + \beta_i \gamma_1, \qquad (*)
$$
where, in the strict CAPM, $\gamma_0 = r_f$ is the risk-free rate of return and $\gamma_1 = r_m - r_f$ is the gap between the expected return on the market portfolio and the risk-free rate.
Researchers in the 1970s and 80s attempted to test the CAPM by estimating $\beta$s for sets of assets and checking whether, on average, assets with bigger $\beta$s had higher expected returns. This was not always successful. These days, economists interpret the model less literally and often augment it with additional factors. The strict CAPM says that all one needs to know about an asset is its covariance with the market. A more agnostic view is that while $\beta$ matters, there may be additional considerations. So it is common to see models such as the following:
$$
\mu_{R_i} = \gamma_0 + \beta_i \gamma_1 + \delta_i \gamma_2,
$$
where $\delta_i$ measures the asset's exposure to some other factor. A typical factor is one that reflects how a given asset covaries with a portfolio comprised of small stocks, or with a portfolio made up of bonds as opposed to stocks.
One key use of (*) is to determine how to discount returns on different assets. For example, suppose an asset is priced at $P_0$ in the current period and will sell for $P_1$ in the next. Then the expected return on the asset is $(E[P_1] - P_0)/P_0$. If the asset has beta equal to $\beta$, then under the CAPM the expected return is $\gamma_0 + \beta\gamma_1$. Thus we have
$$
\frac{E[P_1]}{P_0} - 1 = \gamma_0 + \beta \gamma_1 \quad \Longrightarrow \quad P_0 = \frac{E[P_1]}{1 + \gamma_0 + \beta \gamma_1}.
$$
This very simple equation has many immediate implications.
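One immediate implication is that, for a given expected payoff, a higher-beta asset sells for less today. The R sketch below illustrates this using the pricing formula above. The function name `capm_price` and the numbers ($E[P_1] = 110$, $\gamma_0 = 3\%$, $\gamma_1 = 5\%$) are hypothetical.

```r
# Price today of an asset expected to be worth E[P1] next period,
# discounting at the CAPM expected return gamma0 + beta * gamma1.
capm_price <- function(expected_p1, beta, gamma0, gamma1) {
  expected_p1 / (1 + gamma0 + beta * gamma1)
}

# Hypothetical inputs: E[P1] = 110, gamma0 = 3%, gamma1 = 5%
betas  <- c(0, 0.5, 1, 1.5, 2)
prices <- capm_price(110, betas, gamma0 = 0.03, gamma1 = 0.05)
print(round(cbind(beta = betas, price = prices), 2))
```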
18.2 Efficient-Market Hypothesis
Consider a very short holding period, e.g. one week. Over such a short period the discounting is negligible, so that, to a close approximation,
$$
P_0 = E[P_1]. \qquad (**)
$$
This means the price today has to be the expected value of the price next week. A stochastic process is simply a sequence of random variables $Y_1, Y_2, \dots$ Examples:
• The height of the Nile River at a given location on June 1 of each year, starting from some year.
• The closing price of the S&P 500 on each Friday, starting from some week.
Roughly speaking, a random walk is a stochastic process with the additional property that
$$
E[Y_{t+1} \mid \text{information available at } t] = Y_t,
$$
i.e. the best forecast of the value in the next period is the current value. So people sometimes say that asset prices constitute a random walk.
Equation (**) is sometimes called the efficient market hypothesis (EMH). The key insight in this equation is that all the information we have to forecast the value of the asset tomorrow is already factored into the current value. As an example, suppose you think a share of Google stock will be worth \$X in six weeks. Then you should be willing to pay almost \$X for the stock now, less only the discount factor.
Suppose (**) holds for a stock. The realized gain from buying the stock today and selling it in the next period is
$$
P_1 - P_0 = P_1 - E[P_1].
$$
But the deviation of a random variable from its expected value has mean zero and, given today's information, cannot be predicted. This means that techniques such as drawing charts (so-called technical analysis) cannot work!
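The following simulation illustrates why a chart-based rule fails when prices follow a random walk. The model is an assumption made for illustration: log prices are built from independent Gaussian daily shocks, and the seed and shock size are arbitrary. The sketch regresses today's price change on yesterday's and finds essentially no predictive power.

```r
set.seed(1)                                      # arbitrary seed, for reproducibility
n_days    <- 100000
shocks    <- rnorm(n_days, mean = 0, sd = 0.01)  # hypothetical daily news
log_price <- cumsum(shocks)                      # random walk: Y_{t+1} = Y_t + shock

# A simple "technical" rule: use yesterday's change to predict today's change
changes   <- diff(log_price)
yesterday <- head(changes, -1)
today     <- tail(changes, -1)
slope     <- coef(lm(today ~ yesterday))[["yesterday"]]

# Lag-1 autocorrelation of the price changes
rho1 <- acf(changes, lag.max = 1, plot = FALSE)$acf[2]
print(c(slope = slope, lag1_autocorrelation = rho1))
```

Both numbers come out close to zero, up to sampling noise.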
Suppose new information is revealed about an asset. Take, for example, news concerning a drug company such as Merck. The news could be about a prospective drug that is being evaluated in a randomized clinical trial, the discovery of side effects associated with an existing drug, a decision by the FDA, etc. Equation (**) says the news, whatever it may be, should cause an instantaneous adjustment of the stock price, up or down. Likewise, consider news with implications for the entire economy, e.g. the results of a meeting of the Federal Reserve's Open Market Committee. Such news should cause the market as a whole to adjust, up or down, instantaneously, as people adjust their expectations.
This leads to the idea of an event study. If one is trying to evaluate the effect of news on the value of a firm, one looks at the excess returns on the firm's stock:
$$
XR_t = \frac{P_t - P_{t-1}}{P_{t-1}} - \frac{M_t - M_{t-1}}{M_{t-1}},
$$
where $P_t$ is the value of a share in the firm under consideration at closing on day $t$, and $M_t$ is the value of the market index at the same time.
The cumulative excess return is the sum of the excess returns over some horizon:
$$
CXR_t = \sum_{i=1}^t XR_i,
$$
where period zero is some time prior to the breaking of the news, e.g. 7 to 14 days beforehand. One then plots $CXR_t$, $1 \le t \le T$, where $T$ is several days after the breaking of the news. Ideally one should observe random fluctuations before and after the news, with a jump on the day of the news.[11]
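The excess-return and cumulative-excess-return formulas can be computed directly from closing prices. In the R sketch below, the price series are hypothetical, with day 3 playing the role of the news day.

```r
# Hypothetical closing prices for a firm (P) and a market index (M), days 0..5
P <- c(100, 101, 99, 110, 111, 110)
M <- c(1000, 1010, 1000, 1005, 1015, 1010)

# Daily proportional returns (P_t - P_{t-1}) / P_{t-1}, t = 1, ..., T
firm_ret   <- diff(P) / head(P, -1)
market_ret <- diff(M) / head(M, -1)

xr  <- firm_ret - market_ret   # excess returns XR_t
cxr <- cumsum(xr)              # cumulative excess returns CXR_t
print(round(data.frame(day = seq_along(xr), XR = xr, CXR = cxr), 4))
```

Plotting `cxr` against `day` gives the kind of picture described above: small fluctuations, with a jump on the news day.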
Figure 62
An implication of the EMH is that, on average, there is no advantage to following the suggestions of advisors, at least after adjusting for the excess risk of the portfolios they recommend. It is widely believed that the high returns reported by some funds in some periods are merely strings of luck. Tables 5–7 illustrate this idea. They are drawn from a paper by Burton Malkiel ("The Efficient Market Hypothesis and Its Critics," Journal of Economic Perspectives, Winter 2003).
[11] A good reference on the topic is A. Craig MacKinlay, "Event Studies in Economics and Finance," Journal of Economic Literature, March 1997.
Table 5: Percentage of Large Capitalization Equity Funds Outperformed by Index, Periods Ending 6/30/2002

| Index vs. funds | 1 year | 3 years | 5 years | 10 years |
|---|---|---|---|---|
| S&P 500 vs. Large Cap Equity Funds | 63% | 56% | 70% | 79% |
| Wilshire 5000 vs. Large Cap Equity Funds | 72% | 64% | 69% | 74% |

Note: All large capitalization mutual funds in existence are covered with the exception of sector funds and funds investing in foreign securities.
Source: Lipper Analytic Services.
Table 6: Median Total Returns, Periods Ending 12/31/2001

| | 10 years | 15 years | 20 years |
|---|---|---|---|
| Large Cap Equity Funds | 10.98% | 11.95% | 13.42% |
| S&P 500 Index | 12.94% | 13.74% | 15.24% |

Source: Lipper Analytic Services, Wilshire Associates, Standard & Poor's and The Vanguard Group.

Table 7: Getting Burned by Hot Funds

| Fund Name | 1998–1999 Rank | 1998–1999 Average Annual Return (%) | 2000–2001 Rank | 2000–2001 Average Annual Return (%) |
|---|---|---|---|---|
| Van Wagoner:Emrg Growth | 1 | 105.52 | 1106 | -43.54 |
| Rydex:OTC Fund;Inv | 2 | 93.43 | 1103 | -36.31 |
| TCW Galileo:AGr Eq;Instl | 3 | 92.78 | 1098 | -34.00 |
| RS Inv:Emrg Growth | 4 | 90.19 | 1055 | -26.17 |
| PBHG:Large Cap 20 | 5 | 84.56 | 1078 | -29.03 |
| Janus Olympus Fund | 6 | 77.24 | 1061 | -27.03 |
| Van Kampen Aggr Gro;A | 7 | 76.70 | 1067 | -28.04 |
| Janus Mercury | 8 | 76.31 | 1057 | -26.35 |
| PBHG:Sel Equity | 9 | 76.21 | 1097 | -33.19 |
| WM:Growth;A | 10 | 74.77 | 1046 | -25.82 |
| Berger new Generation;Inv | 11 | 73.31 | 1107 | -45.96 |
| Janus Enterprise | 12 | 72.28 | 1101 | -35.40 |
| Janus Venture | 13 | 72.22 | 1091 | -30.89 |
| Fidelity Aggr Growth | 14 | 70.56 | 1105 | -38.02 |
| Janus Twenty | 15 | 69.09 | 1090 | -30.83 |
| Amer Cent:New Oppty | 16 | 67.64 | 1033 | -24.11 |
| Morg Stan Sm Cap Gro;B | 17 | 66.59 | 1102 | -35.96 |
| Van Kampen Emrg Gro;A | 18 | 65.67 | 1021 | -22.70 |
| TCW Galileo:SC Gro;Instl | 19 | 64.87 | 1099 | -34.77 |
| Black Rock:Md Cap Gro;Instl | 20 | 64.44 | 1009 | -22.18 |
| Average Fund Return | | 76.72 | | -31.52 |
| S&P 500 Return | | 24.75 | | -10.50 |

Source: Analytic Services and Bogle Research Institute, Valley Forge, PA.

19 Public and Near-public Goods
A pure public good is one such as public radio, with two properties:
1. The amount of the good consumed by one person has no effect on its availability to others. (This is called the no-rivalry, or no-congestion, condition.)
2. A person cannot be prevented from consuming the good. (This is called the non-exclusionary condition.)
Sometimes condition (1) is true while (2) is not. This is arguably the case with intellectual property distributed via the internet. (If I download a song or a software program, my use does not affect anyone else's use.)
Additional examples of near-public goods:
• parks and wildlife reserves (although in some cases these can become congested),
• national defense.
There are many goods and services that are widely thought of as public goods yet really aren't. Schools are an example: they are subject to congestion and are also excludable.
19.1 Optimal Provision of Goods with No-rivalry Characteristics
Consider a public good which comes in various amounts. Let $x$ be the amount provided, at a cost of $p$ dollars per unit.
An economy has $n$ consumers, $i = 1, \dots, n$. Consumer $i$ has income $y_i$ and pays a tax $t_i$ toward the purchase of the public good. Additionally, consumer $i$ has utility given by $U^i(c_i, x) = U^i(y_i - t_i, x)$.
19.1.1 Case 1: one consumer; $x = t_1/p$.
The objective is
$$
\max_{t_1} U^1(y_1 - t_1,\, t_1/p).
$$
FONC:
$$
-U_c^1(y_1 - t_1, t_1/p) + \frac{1}{p} U_x^1(y_1 - t_1, t_1/p) = 0
\quad \Longrightarrow \quad
\frac{U_x^1(y_1 - t_1, t_1/p)}{U_c^1(y_1 - t_1, t_1/p)} = p,
$$
i.e. $MRS_1(y_1 - t_1, t_1/p) = p$. Recall that $MRS_1$ is consumer 1's willingness to pay for the last unit of the public good $x$, in units of consumption $c$, or dollars.
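To see the one-consumer FONC at work, the R sketch below maximizes utility directly over the tax and then confirms that $MRS_1 = p$ at the solution. It assumes the hypothetical utility $U^1(c, x) = \log c + a \log x$, for which the FONC gives the closed form $t_1 = a y_1/(1 + a)$. The values of $a$, $y_1$ and $p$ are also hypothetical.

```r
# Hypothetical utility U(c, x) = log(c) + a * log(x)
a  <- 0.5
y1 <- 100   # income
p  <- 2     # cost per unit of the public good

utility_of_tax <- function(t1) log(y1 - t1) + a * log(t1 / p)

# Maximize directly over the tax t1 in (0, y1)
t_star <- optimize(utility_of_tax, interval = c(1e-6, y1 - 1e-6),
                   maximum = TRUE, tol = 1e-10)$maximum

# FONC check: MRS = U_x / U_c = (a / x) / (1 / c) should equal p
c_star <- y1 - t_star
x_star <- t_star / p
mrs    <- (a / x_star) * c_star
print(c(t_star = t_star, closed_form = a * y1 / (1 + a), MRS = mrs, p = p))
```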
19.1.2 Case 2: two consumers; $x = (t_1 + t_2)/p$.
The objective is:
$$
\max_{t_1, t_2} U^1\big(y_1 - t_1, (t_1 + t_2)/p\big) \quad \text{s.t.} \quad U^2\big(y_2 - t_2, (t_1 + t_2)/p\big) \ge k_2.
$$
Why? A social optimum must maximize consumer 1's utility subject to consumer 2 attaining at least some given utility level $k_2$. Such an outcome is called Pareto optimal. (If Pareto optimality fails to hold, then we could reallocate resources so as to make at least one consumer better off without making the other worse off.) Varying $k_2$ traces out a full range of potential social optima.
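The Lagrangian is:
$$
L(t_1, t_2, \lambda; k_2) = U^1\big(y_1 - t_1, (t_1 + t_2)/p\big) + \lambda\left[U^2\big(y_2 - t_2, (t_1 + t_2)/p\big) - k_2\right].
$$
(In what follows we shall occasionally omit functional dependencies for the sake of notational simplicity.) Let $V^1$ be the maximum value of $U^1$ s.t. $U^2 \ge k_2$. We know by the Envelope Theorem that $\partial V^1/\partial k_2 = \partial L/\partial k_2 = -\lambda$. Raising the utility floor $k_2$ for consumer 2 can only lower the utility attainable by consumer 1, so $\lambda > 0$ when the constraint binds. A higher value of $\lambda$ assigns greater weight to consumer 2's outcome.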
FONC:
$$
\begin{aligned}
\frac{\partial L}{\partial t_1} &= -U_c^1 + \frac{1}{p} U_x^1 + \frac{\lambda}{p} U_x^2 = 0, \\
\frac{\partial L}{\partial t_2} &= -\lambda U_c^2 + \frac{1}{p} U_x^1 + \frac{\lambda}{p} U_x^2 = 0.
\end{aligned}
$$
Note that the second and third terms in each of the above equations are the same, so we get
$$
U_c^1 = \lambda U_c^2,
$$
or $\lambda = U_c^1 / U_c^2$. The intuition behind this is that the social planner can rearrange taxes on consumers 1 and 2 while keeping $x$ constant. If consumer 1 pays one less tax dollar, his utility increases by $U_c^1$. Likewise, if consumer 2 pays one less tax dollar, her utility increases by $U_c^2$. At the optimum, shifting a tax dollar between the two consumers cannot raise the Lagrangian. Hence a gain of one unit in consumer 2's utility costs $\lambda$ units of consumer 1's utility.
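The first of the FONC can be rewritten as follows:
$$
\begin{aligned}
U_c^1 &= \frac{1}{p} U_x^1 + \frac{\lambda}{p} U_x^2
= \frac{1}{p} U_x^1 + \frac{U_c^1 / U_c^2}{p}\, U_x^2
= \frac{1}{p}\, U_c^1 \left[\frac{U_x^1}{U_c^1} + \frac{U_x^2}{U_c^2}\right] \\
&\Longrightarrow \quad \frac{U_x^1}{U_c^1} + \frac{U_x^2}{U_c^2} = p,
\end{aligned}
$$
or
$$
MRS_1 + MRS_2 = p.
$$
This means the optimal choice of $x$ has the property that $p$ equals the aggregate (marginal) willingness to pay!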
19.1.3 Case 3: $n$ consumers; $x = \tau/p$, where $\tau = \sum_{i=1}^n t_i$.
The objective is
$$
\begin{aligned}
\max_{t_1, \dots, t_n} \; & U^1(y_1 - t_1, \tau/p) \\
\text{s.t.} \; & U^2(y_2 - t_2, \tau/p) \ge k_2, \\
& U^3(y_3 - t_3, \tau/p) \ge k_3, \\
& \quad \vdots \\
& U^n(y_n - t_n, \tau/p) \ge k_n.
\end{aligned}
$$
This is the $n$-consumer version of Pareto optimality. The optimal choice of taxes is the one that maximizes consumer 1's utility subject to minimum levels of utility for the other $n - 1$ consumers.
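The Lagrangian is
$$
L = U^1(y_1 - t_1, \tau/p) + \sum_{i=2}^n \lambda_i \left[U^i(y_i - t_i, \tau/p) - k_i\right].
$$
For convenience define $\lambda_1 = 1$ and $k_1 = 0$. Then
$$
L = \sum_{i=1}^n \lambda_i \left[U^i(y_i - t_i, \tau/p) - k_i\right].
$$
FONC:
$$
\frac{\partial L}{\partial t_i} = -\lambda_i U_c^i + \frac{1}{p} \sum_{j=1}^n \lambda_j U_x^j = 0, \quad 1 \le i \le n.
$$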
Note that the sum is the same for every $i$, so we must have
$$
\lambda_1 U_c^1 = \lambda_2 U_c^2 = \cdots = \lambda_n U_c^n.
$$
In particular,
$$
U_c^1 = \lambda_i U_c^i, \quad 2 \le i \le n,
$$
and thus
$$
\lambda_i = \frac{U_c^1}{U_c^i}, \quad 2 \le i \le n.
$$
Putting the last result back into the first of the FONC gives
$$
U_c^1 = \frac{1}{p} \sum_{i=1}^n \left(\frac{U_c^1}{U_c^i}\right) U_x^i.
$$
Dividing by $U_c^1$ and multiplying by $p$, we see that
$$
p = \sum_{i=1}^n \frac{U_x^i}{U_c^i} = \sum_{i=1}^n MRS_i.
$$
As in the case of two consumers, $p$ equals the aggregate willingness to pay.
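A small numerical example makes the Samuelson condition concrete. The R sketch below assumes hypothetical quasilinear utilities $U^i(c, x) = c + a_i \log x$. For these, $MRS_i = a_i/x$ does not depend on $c$, so the split of taxes does not affect the optimal $x$, and the condition gives $x^* = \sum_i a_i / p$. The sketch solves $\sum_i MRS_i(x) = p$ with `uniroot`. For comparison, it also computes the level at which each consumer's own $MRS$ alone equals $p$.

```r
# Hypothetical quasilinear utilities U^i(c, x) = c + a_i * log(x), so MRS_i = a_i / x
a <- c(3, 5, 2, 6, 4)    # taste parameters for n = 5 consumers
p <- 2                   # marginal cost of the public good

aggregate_mrs <- function(x) sum(a / x)

# Samuelson condition: sum_i MRS_i(x) = p, solved numerically for x
x_samuelson <- uniroot(function(x) aggregate_mrs(x) - p,
                       interval = c(1e-6, 1e6), tol = 1e-12)$root

# For comparison: the level at which each single consumer's MRS equals p
x_one <- a / p
print(c(x_samuelson = x_samuelson, closed_form = sum(a) / p))
print(x_one)
```

Every entry of `x_one` falls well short of the Samuelson level. This is one way to see why provision driven by any single consumer's willingness to pay falls short.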
Implications:
• For a non-rivalrous good, the optimal provision of the good has the property that the marginal cost $p$ equals the aggregate willingness to pay. This is called the Samuelson condition because it was derived by the great American economist Paul Samuelson in 1954.
• A simple market mechanism will not necessarily achieve the optimality condition. With non-excludable goods, in fact, it is hard to see why anyone is willing to contribute voluntarily (although people do). Thus, the provision of pure public goods usually is left to political mechanisms.
• With excludable goods such as proprietary software, a per-user fee may be reasonable. Note that the producer receives the sum of the user fees.
• For questions such as how much to invest in wilderness areas, some suggest polling the public. People would be asked how much they would be willing to pay to expand or protect the wilderness versus selling it off. This practice is controversial because it is unclear whether those polled understand the questions, or tell the truth. Moreover, goods such as wilderness areas are valued in a passive way, since most people will never experience them first hand. Unlike ordinary consumer goods, there is no observable behavior that can be traced back to a person's willingness to pay. Despite these issues, this method, known as contingent valuation, was used to value the environmental damage (or lost passive use) caused by the Exxon Valdez oil spill.
19.2 Appendix: Social Optimum with Ordinary Goods
You may be wondering how the idea of a social optimum works with ordinary goods. Let's consider the decision of how to allocate an ordinary good $x$. The government collects a tax $t_i$ from the $i$th consumer and allocates $x_i$ units of the good to that consumer. The budget constraint for the government in this case is $\tau = p\xi$, where $\tau = \sum_{i=1}^n t_i$, $\xi = \sum_{i=1}^n x_i$, and $p$ is the price of $x$.
Assume, as before, that consumer $i$ has income $y_i$ and uses his or her after-tax income to buy $c_i = y_i - t_i$ units of the numeraire good.
The objective is
$$
\begin{aligned}
\max_{t_1, \dots, t_n,\; x_1, \dots, x_n} \; & U^1(y_1 - t_1, x_1) \\
\text{s.t.} \; & U^i(y_i - t_i, x_i) \ge k_i, \quad 2 \le i \le n, \\
& \tau = p\xi.
\end{aligned}
$$
The Lagrangian is
$$
L = U^1(y_1 - t_1, x_1) + \sum_{i=2}^n \lambda_i \left[U^i(y_i - t_i, x_i) - k_i\right] + \mu(\tau - p\xi).
$$
Once again define $\lambda_1 = 1$ and $k_1 = 0$ so that
$$
L = \sum_{i=1}^n \lambda_i \left[U^i(y_i - t_i, x_i) - k_i\right] + \mu(\tau - p\xi).
$$
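FONC:
$$
\begin{aligned}
\frac{\partial L}{\partial t_i} &= -\lambda_i U_c^i + \mu = 0, \quad 1 \le i \le n, \\
\frac{\partial L}{\partial x_i} &= \lambda_i U_x^i - \mu p = 0, \quad 1 \le i \le n.
\end{aligned}
$$
The first collection of FONC implies $\mu = \lambda_i U_c^i$ for all $i$. Since $\lambda_1 = 1$, this gives $\mu = U_c^1$, and hence
$$
U_c^1 = \lambda_i U_c^i, \quad 2 \le i \le n, \qquad \text{or} \qquad \lambda_i = \frac{U_c^1}{U_c^i}, \quad 2 \le i \le n.
$$
Combining these results with the second collection of FONC gives $U_x^1 = \mu p$, or, equivalently, $p = U_x^1/U_c^1 = MRS_1$, and
$$
\lambda_i U_x^i = \mu p
\;\Longrightarrow\;
\frac{U_c^1}{U_c^i}\, U_x^i = U_c^1 p
\;\Longrightarrow\;
p = \frac{U_x^i}{U_c^i} = MRS_i, \quad 1 \le i \le n.
$$
Thus at a social optimum we have
$$
MRS_i = p, \quad 1 \le i \le n. \qquad (*)
$$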
Note that this is the same condition that would result from opening a market in good $x$ and charging $p$ dollars per unit of $x$. However, in order to reach a particular social optimum we would have to redistribute income via our choice of $t_1, \dots, t_n$.
It is possible to show the following:
• Any particular social optimum can be achieved by opening a free market in good $x$ and redistributing income via taxes.
• For any given distribution of income, setting all taxes equal to zero achieves one possible Pareto optimum. This may not be the one that people particularly like, since it will result in the highest utility for the person with the highest income. It is nonetheless efficient, in the sense that it satisfies (*).
20 Externalities
Externalities arise when the consumption or production of a good by one economic agent causes a side effect for others. Examples include air pollution caused by burning fossil fuels, the playing of loud music, etc. Externalities can be positive as well: a classic example is bees, which are needed to pollinate fruit trees!
This section deals primarily with air pollution, which is like a public good (more precisely, a public bad) to the extent that air quality affects the entire population of an area.
20.1 Consumption Externalities
We shall use an extended version of the model used in our analysis of public goods. Assume that consumers care about three things:
• consumption of a basic, numeraire good $c$,
• consumption of a good $x$, with an externality,
• the level $z$ of the externality.
Think of $x$ as gasoline and $z$ as the amount of smog in the air. Consider an economy with $n$ consumers, $i = 1, \dots, n$. Consumer $i$ has income $y_i$ and with it consumes $c_i$ and $x_i$. The level $z$ of the externality is determined by the total consumption of $x$:
$$
z = \theta\xi,
$$
where $\xi = \sum_{i=1}^n x_i$ and $\theta$ is the amount of smog produced per gallon of gas used. Let $p$ denote the price (and the marginal cost) of $x$. The utility of consumer $i$ is given by
$$
U^i(c_i, x_i, z) = U^i(y_i - p x_i,\, x_i,\, \theta\xi).
$$
We are assuming that $U_c^i > 0$, $U_x^i > 0$, and $U_z^i < 0$, i.e. $z$ is a bad. Notice the similarity between $z$ and the public goods we studied previously: consumer $i$'s consumption of $z$ has no effect on the amount of $z$ available to others.
20.1.1 Market Equilibrium
Consumer 1 takes $p$ as given. He realizes that $z = \theta\sum_{i=1}^n x_i$, but he also takes $x_2, \dots, x_n$ (the gas consumption of others) as given. His objective is
$$
\max_{x_1} U^1(y_1 - p x_1,\, x_1,\, \theta\xi).
$$
FONC:
$$
-p\,U_c^1 + U_x^1 + \theta\,U_z^1 = 0
\quad \Longrightarrow \quad
\underbrace{\frac{U_x^1}{U_c^1}}_{MRS_1(x,\,c)} = p - \theta\,\frac{U_z^1}{U_c^1}.
$$
In general, a consumer sets her MRS (for $x$ relative to $c$) equal to $p - \theta U_z^i/U_c^i$. If $U_z^i < 0$, then $p - \theta U_z^i/U_c^i > p$, so the consumer acts as if the price of $x$ is actually higher. The price difference $-\theta U_z^i/U_c^i$ is the product of two factors: $\theta$, the rate of production of $z$ per unit of $x$, and $-U_z^i/U_c^i$, the consumer's marginal willingness to pay for clean air.
20.1.2 Social Optimum
A social planner has to allocate $x$ and collect taxes $t_i$ ($i = 1, \dots, n$) that balance the government's costs: $\tau = p\xi$, where $\tau = \sum_{i=1}^n t_i$ and $\xi = \sum_{i=1}^n x_i$. As before, we look for Pareto optimal outcomes. The social planner's objective is:
$$
\begin{aligned}
\max_{t_1, \dots, t_n,\; x_1, \dots, x_n} \; & U^1(y_1 - t_1,\, x_1,\, \theta\xi) \\
\text{s.t.} \; & U^i(y_i - t_i,\, x_i,\, \theta\xi) \ge k_i, \quad 2 \le i \le n, \\
& \tau = p\xi.
\end{aligned}
$$
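Define $\lambda_1 = 1$, $k_1 = 0$. The Lagrangian is
$$
L = \sum_{i=1}^n \lambda_i \left[U^i(y_i - t_i,\, x_i,\, \theta\xi) - k_i\right] + \mu(\tau - p\xi).
$$
FONC:
$$
\begin{aligned}
\frac{\partial L}{\partial t_i} &= -\lambda_i U_c^i + \mu = 0, \quad 1 \le i \le n, \qquad (*) \\
\frac{\partial L}{\partial x_i} &= \lambda_i U_x^i + \theta \sum_{j=1}^n \lambda_j U_z^j - \mu p = 0, \quad 1 \le i \le n. \qquad (**)
\end{aligned}
$$
Equations (*) imply
$$
\mu = \lambda_i U_c^i, \quad 1 \le i \le n,
$$
and in particular (since $\lambda_1 = 1$)
$$
\mu = U_c^1. \qquad (\dagger)
$$
As a consequence,
$$
\lambda_i = \frac{U_c^1}{U_c^i}, \quad 1 \le i \le n. \qquad (\ddagger)
$$
Equations (**) imply
$$
\begin{aligned}
\lambda_i U_x^i &= \mu p - \theta \sum_{j=1}^n \lambda_j U_z^j \\
\Longrightarrow \quad \frac{U_c^1}{U_c^i}\, U_x^i &= U_c^1 p - \theta \sum_{j=1}^n \frac{U_c^1}{U_c^j}\, U_z^j \qquad \text{by } (\dagger), (\ddagger) \\
\Longrightarrow \quad \underbrace{\frac{U_x^i}{U_c^i}}_{MRS_i(x,\,c)} &= p - \theta \sum_{j=1}^n \frac{U_z^j}{U_c^j}, \quad 1 \le i \le n.
\end{aligned}
$$
This means everyone has to set
$$
MRS_i = p + \delta,
$$
where $\delta = \theta\Lambda$, and $\Lambda = -\sum_{j=1}^n U_z^j/U_c^j$ is the aggregate marginal willingness to pay for clean air.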
20.1.3 Market Equilibrium versus Social Optimum
Market Eq: $MRS_i(x, c) = p - \theta\, U_z^i/U_c^i$
Social Opt: $MRS_i(x, c) = p - \theta \sum_{j=1}^n U_z^j/U_c^j = p + \delta$
So, in the social optimum, consumer $i$ takes account of the effect of her gas consumption on everyone else, whereas in the market equilibrium she cares only about herself.
The sum $p + \delta$ is the social marginal cost of consuming gas. It exceeds the private cost $p$ if $\theta$ is non-zero and if there is some value to clean air. The latter holds if $-U_z^i/U_c^i \ge 0$ for all $i$, with strict inequality for at least one $i$. In the real world, each consumer's $-U_z^i/U_c^i$ is very small but $n$ is very big. So while any one consumer's term is negligible, $\delta$ can be significant.
In the 1920s the English economist Arthur C. Pigou figured out that one can correct an externality by taxing the activity that creates it, with a tax $\delta$. We have shown that the optimal Pigouvian tax for a consumption externality that affects the entire population is
$$
\delta = \theta \sum_i \{\text{consumer } i\text{'s willingness to pay for a marginal reduction in the externality}\}.
$$
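The following numerical illustration compares the market equilibrium with the social optimum. It assumes hypothetical quasilinear utilities $U^i = c_i + a \log x_i - b z$. Under this assumption $MRS_i = a/x_i$ and each consumer's marginal willingness to pay for clean air is the constant $b$. The parameter values are invented so that $\theta b$ is tiny while $n$ is large. The sketch reports each consumer's own wedge $\theta b$, the aggregate wedge $\delta = \theta n b$, and gas consumption in the two cases.

```r
# Hypothetical quasilinear utility: U^i = c_i + a * log(x_i) - b * z, with z = theta * sum(x)
n     <- 1000    # consumers
a     <- 2       # taste for gasoline
b     <- 0.05    # each consumer's marginal WTP for cleaner air, -U_z / U_c
theta <- 0.01    # smog per gallon
p     <- 3       # price (= marginal cost) of gasoline

# Market equilibrium: MRS_i = a / x_i = p + theta * b  (only own smog counted)
x_market <- a / (p + theta * b)

# Social optimum: MRS_i = p + delta, with delta = theta * (aggregate WTP) = theta * n * b
delta    <- theta * n * b
x_social <- a / (p + delta)

print(c(own_wedge = theta * b, delta = delta,
        x_market = x_market, x_social = x_social))
```

Here each consumer's own wedge is only 0.0005, yet the aggregate wedge $\delta$ is 0.5, about a sixth of the price of gas.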
20.1.4 Other Examples
• Taxes for wear and tear on the road. Apart from the air pollution effect, the usual justification for a gas tax is that driving causes the roadways to deteriorate. If the wear and tear caused by a given car is proportional to the amount of gas it uses, a Pigouvian tax on gas is sensible.
• Taxes on cigarettes are sometimes justified as a tax on second-hand smoke.
• Some people have proposed a tax on foods that cause obesity. This is a more complicated case. The basis of their argument is that health care costs for those over 65 (which is when most costs are incurred) are heavily subsidized through Medicare. Thus, if someone eats too much and as a result winds up with diabetes later in life, this person contributes to the Medicare bill, which we all pay.
20.2 Production Externalities
We will restrict our attention to a very simple example of a production externality. The example
is motivated by the electric power industry, which in most places uses coal to create electricity.
Assume there are $n$ plants, $i = 1, \dots, n$. Plant $i$ has cost function $C_i(s_i, y_i)$, where $y_i$ is the amount of electricity (kWh) produced, and $s_i$ is a choice variable representing the choice of factors that affect the amount of SO$_2$ produced. For example, $s_i$ could represent the choice of what type of coal to use (more expensive coal from the Western US, which burns cleaner, versus cheaper coal from the East), or the choice of what kind of scrubber to install. The amount of SO$_2$ emitted by the plant is
$$
z_i = y_i\, \phi_i(s_i),
$$
where $\phi_i'(s_i) < 0$ and $\phi_i''(s_i) > 0$, i.e. $\phi_i$ is decreasing and convex. (See Figure 63.)
Figure 63
Let $v$ be the aggregate willingness to pay to avoid a ton of SO$_2$, across the entire population (not only the power industry), and let $p$ be the value of a kWh of electricity. From the point of view of an industry regulator, the objective is to maximize the industry surplus, valuing each ton of SO$_2$ at $v$ dollars:
$$
\max_{y_1, \dots, y_n,\; s_1, \dots, s_n} \; \Pi - vZ,
$$
where
$$
\Pi = \sum_{i=1}^n \pi_i = \sum_{i=1}^n \left[p y_i - C_i(s_i, y_i)\right]
$$
is the total profit of the industry as a whole, and
$$
Z = \sum_{i=1}^n z_i = \sum_{i=1}^n y_i\, \phi_i(s_i)
$$
is the total amount of SO$_2$ emitted by the industry. (As an alternative, we could set up the problem by having utility functions for all the local residents, who each use electricity and consume another, numeraire good $c$, and wish to avoid having SO$_2$ in the air. As an exercise, set up the problem this way.)
FONC w.r.t. $y_i$:
$$
p - \frac{\partial C_i}{\partial y_i} - v\, \phi_i(s_i) = 0, \quad 1 \le i \le n.
$$
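This means the output of plant $i$ should be chosen so that
$$
\frac{\partial C_i}{\partial y_i} + v\, \phi_i(s_i) = p.
$$
The LHS, $\partial C_i/\partial y_i + v\,\phi_i(s_i)$, is called the marginal social cost of production at plant $i$. The regulator wants to set this equal to $p$, the social value of a kWh of electricity.
FONC w.r.t. $s_i$:
$$
-\frac{\partial C_i}{\partial s_i} - v\, y_i\, \phi_i'(s_i) = 0, \quad 1 \le i \le n.
$$
Dividing by $y_i$,
$$
\underbrace{\frac{1}{y_i} \frac{\partial C_i}{\partial s_i}}_{\partial AC_i / \partial s_i} = -v\, \phi_i'(s_i).
$$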
The optimal choice is the one for which the marginal increase in average cost offsets the marginal value of the reduced pollution per unit of output. Assume that $AC_i(s_i, y_i)$ is convex in $s_i$, so that at higher $s_i$ an additional increase in $s_i$ has a bigger effect on $AC_i$. Assume also that $\phi_i$ is decreasing and convex. Then we have:
Figure 64
Method 1:
• Tax each plant $v$ per ton of SO$_2$ produced.
• Buy electricity at $p$/kWh.
The manager of plant $i$ will then attempt to maximize
$$
\pi_i = p y_i - C_i(s_i, y_i) - v\, y_i\, \phi_i(s_i),
$$
which has FONC equivalent to the regulator's two conditions above (w.r.t. $y_i$ and $s_i$). This is the idea of a Pigouvian tax.
Method 2 (Cap/Trade):
• Distribute among the plants a fixed amount of SO$_2$ emission rights, each of which entitles the bearer to produce a ton of SO$_2$.
• Allow the plants to trade emission rights among themselves.
• Buy electricity at $p$/kWh.
Let $q > 0$ be the market price of an emission right. A plant manager who owns $k$ emission rights will then attempt to maximize
$$
p y_i - C_i(s_i, y_i) + V_i,
$$
where $V_i = kq - q\, y_i\, \phi_i(s_i)$ is the value of the emission rights she can sell on the market. If $V_i$ is negative, it is the cost of the rights she will have to buy. Notice that if $q = v$, the FONC for this plant are equivalent to the regulator's two conditions above. This is, in essence, how SO$_2$ emissions from U.S. power plants are regulated.
Why use Method 2?
• In reality, no one knows what $v$ to charge. So instead the regulator looks at the total amount of SO$_2$ emitted at some reference point in time, then issues a somewhat smaller number of emission rights, e.g. 80% as many. This method ensures that SO$_2$ is reduced by 20%, and that the reduction is achieved efficiently.
• Firms prefer this method because they get the emission rights free of charge. Emission rights were distributed in the early 1990s, and plants were allowed to trade them. However, the rules forbidding plants from exceeding the limits didn't take effect until 1995.
• It is claimed that enforcement is easier.
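The equivalence of Methods 1 and 2 can be checked numerically for a single plant. Everything in the R sketch below is a hypothetical functional form chosen for illustration: the cost function $C(s, y) = c_0 y^2/2 + y s^2/2$ (whose average cost is convex in $s$), $\phi(s) = e^{-s}$ (decreasing and convex), and the values of $p$, $c_0$, $v$ and the permit endowment $k$. The regulator's FONCs reduce to $s = v e^{-s}$ and $c_0 y + s^2/2 + v e^{-s} = p$. The script compares their solution with the profit-maximizing choices under a tax of $v$ and under permits priced at $q = v$, using R's `optim`.

```r
# Hypothetical plant: C(s, y) = c0 * y^2 / 2 + y * s^2 / 2, SO2 per kWh phi(s) = exp(-s)
p  <- 10    # value of a kWh
c0 <- 2     # curvature of generation cost
v  <- 3     # willingness to pay to avoid a unit of SO2 (also the tax / permit price)
k  <- 5     # permits the plant is endowed with under cap-and-trade

phi  <- function(s) exp(-s)
cost <- function(s, y) c0 * y^2 / 2 + y * s^2 / 2

# Method 1: Pigouvian tax of v per unit of SO2
profit_tax <- function(par, tax = v) {
  s <- par[1]; y <- par[2]
  p * y - cost(s, y) - tax * y * phi(s)
}

# Method 2: permits with price q; the plant's permit position is worth k*q - q*y*phi(s)
profit_permits <- function(par, q = v) {
  s <- par[1]; y <- par[2]
  p * y - cost(s, y) + (k * q - q * y * phi(s))
}

# Regulator's FONCs: s solves s = v * exp(-s); y solves C_y + v * phi(s) = p
s_fonc <- uniroot(function(s) s - v * phi(s), c(0, 10), tol = 1e-12)$root
y_fonc <- (p - s_fonc^2 / 2 - v * phi(s_fonc)) / c0

# optim minimizes by default; fnscale = -1 turns it into maximization
ctrl <- list(fnscale = -1, reltol = 1e-14, maxit = 5000)
opt_tax    <- optim(c(1, 1), profit_tax,     control = ctrl)$par
opt_permit <- optim(c(1, 1), profit_permits, control = ctrl)$par

res <- rbind(fonc = c(s_fonc, y_fonc), tax = opt_tax, permits = opt_permit)
colnames(res) <- c("s", "y")
print(res)
```

The permit objective differs from the taxed one only by the constant $kq$, so the two methods lead the plant to the same $(s, y)$. The endowment $k$ affects who receives the money, not what the plant does.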
21 Empirical Methods in Microeconomics
This section provides the reader with an overview of how microeconomists use real data to test
alternative theories and (in some cases) estimate the relevant parameters of a particular model.
The examples are drawn from my own work in labor economics.
21.1 Experiments and Counterfactuals
Suppose one is interested in testing a prediction of microeconomic theory. To be concrete, we shall consider four examples:
1. If single mothers currently on welfare are offered an earnings subsidy, will they work more?
2. If the supply of low-skilled workers in a local labor market is increased by an influx of immigrants, will wages of native, low-wage workers fall?
3. If the minimum wage is increased, will low-wage employers hire fewer workers?
4. If people without health insurance are provided insurance, will they use more health care services? Will they become healthier?
The classical scientific approach to such questions would be to conduct a randomized experiment. In such an experiment, a population whose behavior is to be studied would be randomly divided into two groups: the treatment group, members of which receive the treatment, and the control group, members of which do not. For the welfare question, the population would be single mothers currently on welfare. For the immigrant question, the population would be cities (or other geographic entities such as counties). For the minimum wage question, the population would be employers. For the final question, the population would be the uninsured. Note that some of these experiments seem harder to carry out than others.
Let's assume that one could conduct a randomized experiment on welfare mothers. (In reality, such an experiment was conducted in two Canadian provinces in the mid-90s. We will examine the data shortly.) How would one do this? Presumably, one could tabulate the employment rates of the treatment group $Y_T$ and the control group $Y_C$ some time after the subsidy was in place. One would then calculate the treatment effect
$$
\Delta = Y_T - Y_C.
$$
The idea of a randomized experiment is that in the absence of the treatment, the two groups would
have had equal outcomes. Randomization is key: if treatment status really is randomly assigned
across the population being studied, then it is reasonable to expect the two groups to exhibit the same
behavior in the absence of treatment. Chance differences between the groups are minimized by using
large groups. The behavior of the control group represents a counterfactual for assessing whether or
not the treatment has an effect. If a theory predicts that a subsidy will increase work effort, for
example, then we want to test the null hypothesis \( H_0: \Delta = 0 \) versus the alternative hypothesis
\( H_1: \Delta > 0 \).
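As a sketch of how \( \Delta \) and the one-sided test might be computed when the outcome is a 0/1 employment indicator, the code below uses the usual large-sample normal approximation for a difference in two proportions. The function name and the employment counts are hypothetical, not SSP results.

```python
import math


def treatment_effect(y_treat, y_control):
    """Estimate Delta = Y_T - Y_C and test H0: Delta = 0 against H1: Delta > 0.

    y_treat, y_control: lists of 0/1 employment indicators (1 = employed).
    Returns (delta, standard error, z statistic, one-sided p-value), using a
    large-sample normal approximation.
    """
    n_t, n_c = len(y_treat), len(y_control)
    p_t = sum(y_treat) / n_t    # employment rate in the treatment group
    p_c = sum(y_control) / n_c  # employment rate in the control group
    delta = p_t - p_c           # estimated treatment effect
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z = delta / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z) for a standard normal Z
    return delta, se, z, p_value


# Hypothetical data: 300 of 1000 treated mothers employed vs. 200 of 1000 controls.
y_t = [1] * 300 + [0] * 700
y_c = [1] * 200 + [0] * 800
delta, se, z, p = treatment_effect(y_t, y_c)
print(f"Delta = {delta:.3f}, SE = {se:.4f}, z = {z:.2f}, one-sided p = {p:.2g}")
```

With these made-up numbers the estimated effect is 0.10 with a standard error of about 0.019, so \( H_0 \) would be rejected in favor of \( H_1 \) at any conventional level.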
A randomized experiment is considered the gold standard for scientific evidence. The FDA, for
example, requires drug companies to evaluate the efficacy of a new drug by means of a randomized
experiment. The high status of randomized experiments is due to several features:
1. Randomization ensures that \( Y_C \) is a valid counterfactual. So, except for chance errors, \( \Delta \) is
truly attributable to the treatment, not to some inherent difference between the two groups.
2. Once the experimental design is determined, the researchers' hands are tied. There is no room
for weaseling. (The experimental design is a full description of the population, the sample
size, the randomization procedure, the treatment, and the data collection process.)
3. Because of (1) and (2), randomized experiments are easy to understand and therefore have a
lot of credibility.
21.1.1 The Self Sufficiency Project (SSP)
SSP is the name of a randomized experiment conducted in Canada during the 90s. Half of a random
sample of single mothers who had been on welfare for at least a year was assigned to the treatment
group. The other half was assigned to the control group. Members of the control group were eligible
to receive their regular welfare benefit, a fixed monthly sum based on the number of children in
the home as well as the province (e.g., $712 per month for a mother of one in New Brunswick).
Welfare payments are reduced dollar-for-dollar for those who earn over $200 per month. Members
of the treatment group were allowed to remain on welfare but were offered an earnings subsidy
\( S = (M - E)/2 \), where \( M \) is a monthly earnings target ($2500/month) and \( E \) is actual earnings.
So, if a participant earned $650 in a month, she received a subsidy of (2500 − 650)/2 = $925. Participants qualified
for the subsidy only if they worked at least 30 hours per week, for up to three years. They also had
to receive their first subsidy payment within a year of entering the treatment group or they forfeited
all future eligibility. Here is a graph of the monthly budget constraint:
Figure 65
Here is a graph showing the fractions of each group on welfare as a function of time, in months, since random assignment,
along with a graph of the average employment rate for each group:
Figure 66: Source: D. Card and D. Hyslop, "Estimating the Effects of a Time-limited Earnings
Subsidy for Welfare Leavers," Econometrica, November 2005.
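The two monthly budget constraints described above can be written as simple functions of earnings. This is a simplified sketch under stated assumptions: taxes are ignored, the New Brunswick figures ($712 benefit, $200 disregard, $2500 target) are used as defaults, the three-year limit and one-year take-up deadline are not modeled, and the rule that a treatment-group member takes whichever option pays more is our own hypothetical simplification. All function names are illustrative.

```python
def welfare_income(earnings, benefit=712.0, disregard=200.0):
    """Monthly income for a welfare recipient (taxes ignored).

    The benefit is reduced dollar-for-dollar for earnings above `disregard`.
    """
    reduction = max(0.0, earnings - disregard)
    return earnings + max(0.0, benefit - reduction)


def ssp_income(earnings, weekly_hours, target=2500.0, min_hours=30.0):
    """Monthly income with the SSP subsidy S = (M - E)/2.

    The subsidy is paid only to those working at least `min_hours` per week,
    and it shrinks to zero as earnings E approach the target M.
    """
    if weekly_hours < min_hours:
        return earnings
    return earnings + max(0.0, (target - earnings) / 2)


def treatment_income(earnings, weekly_hours):
    """Hypothetical rule: a treatment-group member takes whichever option pays more."""
    return max(welfare_income(earnings), ssp_income(earnings, weekly_hours))


# A few points on the two budget constraints for someone working 30 hours a week.
for e in (650, 1000, 1500, 2500):
    print(e, welfare_income(e), treatment_income(e, weekly_hours=30))
```

At $650 of monthly earnings, welfare alone yields $912 (the benefit falls by $450), while the subsidy raises income to $1575, which is the kind of jump at the 30-hour threshold the budget-constraint graph depicts.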
21.2 Research Designs Based on Natural Experiments
Often we cannot carry out an experiment, either because it would be costly and quite invasive
(e.g., SSP), or because it would be impractical. How do we proceed in such cases?
One approach is to consider events that occur, and gauge whether an analysis of the event could be
interpreted as if the event were a randomized experiment. A very simple example is a paper I wrote
on the Mariel Boatlift. In that paper, I examined the movements in wages and unemployment rates
in Miami (where the Marielitos landed) and within a control group composed of four other cities:
Tampa, Houston, Atlanta, and Los Angeles. A key difference between a true randomized experiment
and a natural experiment is that treatment is not randomly assigned. So it is debatable whether
the control group provides a valid counterfactual. For my paper, I examined trends in employment
in Miami versus the average of the four other cities throughout the 70s: the two moved in close
parallel. (Ironically, the editor of the journal forced me to remove this graph from the published
paper!)
In a natural experiment, outcomes may not be exactly the same in both groups, even before the
treatment. Let
\[ \Delta_0 = Y^0_T - Y^0_C \]
represent the pre-existing gap in the outcome (or measurable quantity) of interest (e.g., average
wages), and let
\[ \Delta_1 = Y^1_T - Y^1_C \]
represent the gap at some time after the treatment has begun. Then we might want to look at the
difference-in-differences
\[ DD = \Delta_1 - \Delta_0 = (Y^1_T - Y^0_T) - (Y^1_C - Y^0_C). \]
This is the change in the treatment group relative to the change in the control group. The implicit
assumption is that in the absence of treatment, \( \Delta_0 \) would have remained constant.
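The difference-in-differences formula can be computed directly from four group means. A minimal sketch, where the function name and the means are hypothetical; it also checks that the "change in gaps" and "difference in changes" forms give the same number.

```python
def diff_in_diff(y_t0, y_t1, y_c0, y_c1):
    """Difference-in-differences from four group means.

    y_t0, y_t1: treatment-group mean before (0) and after (1) the treatment.
    y_c0, y_c1: control-group mean before and after.
    """
    gap_before = y_t0 - y_c0       # Delta_0, the pre-existing gap
    gap_after = y_t1 - y_c1        # Delta_1, the gap after treatment begins
    return gap_after - gap_before  # DD = Delta_1 - Delta_0


# Hypothetical means: the treatment group starts 2 units above the control group
# and ends 3 units above it, so DD = 1.
dd = diff_in_diff(y_t0=10.0, y_t1=12.0, y_c0=8.0, y_c1=9.0)
# The same number computed as (change in T) - (change in C):
assert dd == (12.0 - 10.0) - (9.0 - 8.0)
print(dd)  # → 1.0
```

Note that the pre-existing gap of 2 is differenced away; only the change in the gap is attributed to the treatment, which is exactly where the constant-gap assumption enters.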
21.2.1 The Mariel Boatlift
In the Boatlift, about 125,000 Cuban immigrants were transported on a flotilla of small boats to
Miami over the period from April 1980 to July of the same year. This represented an increase of
about 7% in the Miami labor force, mainly in the ranks of the unskilled. One simple hypothesis
is that such an influx would reduce wages for unskilled workers already in Miami. Table 8 shows
outcomes for blacks (and other groups) in Miami relative to the comparison cities.
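To see the difference-in-differences logic on real numbers, the sketch below applies it to the Table 8 means for blacks, treating 1979 as the pre-Boatlift base year. This is our own arithmetic on the published means, not a reproduction of the estimates in the paper. The standard errors assume the parenthesized entries are standard errors and that the four means come from independent samples; both are assumptions.

```python
import math

# Log real hourly earnings of blacks, from Table 8 (means and standard errors).
years = [1979, 1980, 1981, 1982, 1983, 1984, 1985]
miami = [1.59, 1.55, 1.61, 1.48, 1.48, 1.57, 1.60]
miami_se = [.03, .02, .03, .03, .03, .03, .04]
comparison = [1.74, 1.70, 1.72, 1.71, 1.69, 1.67, 1.65]
comparison_se = [.01, .02, .02, .01, .02, .02, .03]


def dd_vs_base(t, base=0):
    """DD for year index t relative to the base year (index 0 = 1979).

    Change in Miami minus change in the comparison cities. The standard error
    assumes the four means are estimated from independent samples.
    """
    dd = (miami[t] - miami[base]) - (comparison[t] - comparison[base])
    se = math.sqrt(miami_se[t] ** 2 + miami_se[base] ** 2
                   + comparison_se[t] ** 2 + comparison_se[base] ** 2)
    return dd, se


for t in range(1, len(years)):
    dd, se = dd_vs_base(t)
    print(f"{years[t]} vs 1979: DD = {dd:+.2f} (SE {se:.2f})")
```

For example, 1981 versus 1979 gives (1.61 − 1.59) − (1.72 − 1.74) = +0.04 with a standard error of about 0.05, so on these numbers there is no clear relative decline in black wages in Miami.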
Table 8: Logarithms of Real Hourly Earnings of Workers Age 16–61 in Miami and Four Comparison
Cities, 1979–85.
Group 1979 1980 1981 1982 1983 1984 1985
Miami:
Whites 1.85 1.83 1.85 1.82 1.82 1.82 1.82
(.03) (.03) (.03) (.03) (.03) (.03) (.05)
Blacks 1.59 1.55 1.61 1.48 1.48 1.57 1.60
(.03) (.02) (.03) (.03) (.03) (.03) (.04)
Cubans 1.58 1.54 1.51 1.49 1.49 1.53 1.49
(.02) (.02) (.02) (.02) (.02) (.03) (.04)
Hispanics 1.52 1.54 1.54 1.53 1.48 1.59 1.54
(.04) (.04) (.05) (.05) (.04) (.04) (.06)
Comparison Cities:
Whites 1.93 1.90 1.91 1.91 1.90 1.91 1.92
(.01) (.01) (.01) (.01) (.01) (.01) (.01)
Blacks 1.74 1.70 1.72 1.71 1.69 1.67 1.65
(.01) (.02) (.02) (.01) (.02) (.02) (.03)
Hispanics 1.65 1.63 1.61 1.61 1.58 1.60 1.58
(.01) (.01) (.01) (.01) (.01) (.01) (.02)
Note: Entries represent means of log hourly earnings (deflated by the Consumer Price Index,
1980 = 100) for workers age 16–61 in Miami and four comparison cities: Atlanta, Houston, Los Angeles,
and Tampa–St. Petersburg.
Source: D. Card, "The Impact of the Mariel Boatlift on the Miami Labor Market," Industrial and
Labor Relations Review, January 1990. Based on samples of employed workers in the outgoing rotation
groups of the Current Population Survey in 1979–85. Due to a change in SMSA coding procedures
in 1985, the 1985 sample is based on individuals in outgoing rotation groups for January–June of 1985
only.
21.3 Natural Experiments with Several Control Groups
In a natural experiment, one can never be sure the control group provides a valid counterfactual.
Sometimes it is possible to do additional checks by using two or more control groups. Then you
can construct
\[ DD_1 = (Y^1_T - Y^0_T) - (Y^1_{C_1} - Y^0_{C_1}), \]
\[ DD_2 = (Y^1_T - Y^0_T) - (Y^1_{C_2} - Y^0_{C_2}), \]
\[ DD_3 = (Y^1_{C_2} - Y^0_{C_2}) - (Y^1_{C_1} - Y^0_{C_1}), \]
where \( C_1 \) refers to control group 1, and \( C_2 \) refers to control group 2. Ideally it will be the case that
\( DD_1 = DD_2 \), or equivalently, \( DD_3 = 0 \).
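The equivalence between \( DD_1 = DD_2 \) and \( DD_3 = 0 \) follows by subtracting the two treatment-group comparisons; the treatment group's own change cancels:

```latex
\begin{aligned}
DD_1 - DD_2
  &= \big[(Y^1_T - Y^0_T) - (Y^1_{C_1} - Y^0_{C_1})\big]
   - \big[(Y^1_T - Y^0_T) - (Y^1_{C_2} - Y^0_{C_2})\big] \\
  &= (Y^1_{C_2} - Y^0_{C_2}) - (Y^1_{C_1} - Y^0_{C_1}) \\
  &= DD_3 .
\end{aligned}
```

So \( DD_3 \) involves only the two control groups: it asks whether they changed by the same amount, which they should if both are valid counterfactuals for the treatment group.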
21.3.1 The New Jersey Minimum Wage
In April 1992, the minimum wage rose from $4.25 to $5.05 per hour in the state of NJ. Elsewhere,
it remained $4.25. The statute that raised the minimum wage had been passed in fall of the year
before, and, in anticipation, Alan Krueger and I developed a survey of fast food restaurants in NJ
and PA. We surveyed a set of about 400 restaurants first in February–March of 1992 (just before
the increase), and again in late fall. We were extremely careful to track down all the restaurants
that were surveyed in the first round. The treatment group consisted of restaurants in NJ whose
starting wages were less than $5.00 per hour prior to the increase. There were two control groups:
restaurants in PA, and restaurants in NJ that already were paying relatively high wages ($5.00
or more per hour prior to the increase). Table 9 shows the comparisons of employment growth
between groups.
Table 9: Average Employment Per Store Before and After the Rise in the NJ Minimum Wage
Columns. Stores by state: (i) PA, (ii) NJ, (iii) Difference, NJ − PA. Stores in NJ by starting wage:
(iv) Wage = $4.25, (v) Wage = $4.26–$4.99, (vi) Wage ≥ $5.00. Differences within NJ: (vii) Low − high,
(viii) Midrange − high.
Variable (i) (ii) (iii) (iv) (v) (vi) (vii) (viii)
1. FTE employment before, all available observations
23.33 20.44 −2.89 19.56 20.08 22.25 −2.69 −2.17
(1.35) (0.51) (1.44) (0.77) (0.84) (1.14) (1.37) (1.41)
2. FTE employment after, all available observations
21.17 21.03 −0.14 20.88 20.96 20.21 0.67 0.75
(0.94) (0.52) (1.07) (1.01) (0.76) (1.03) (1.44) (1.27)
3. Change in mean FTE employment
−2.16 0.59 2.76 1.32 0.87 −2.04 3.36 2.91
(1.25) (0.54) (1.36) (0.95) (0.84) (1.14) (1.48) (1.41)
4. Change in mean FTE employment, balanced sample of stores
−2.28 0.47 2.75 1.21 0.71 −2.16 3.36 2.87
(1.25) (0.48) (1.34) (0.82) (0.69) (1.01) (1.30) (1.22)
5. Change in mean FTE employment, setting FTE at temporarily closed stores to zero
−2.28 0.23 2.51 0.90 0.49 −2.39 3.29 2.88
(1.25) (0.49) (1.35) (0.87) (0.69) (1.02) (1.34) (1.23)
Notes: Standard errors are shown in parentheses. The sample consists of all stores with available data on employment. FTE (full-time-equivalent)
employment counts each part-time worker as half a full-time worker. Employment at six closed stores is set to zero. Employment at four temporarily
closed stores is treated as missing.
Source: Card and Krueger, Myth and Measurement, Princeton University Press, 1995.
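The three difference-in-differences above can be read off row 3 of Table 9, since each entry in that row is already a before/after change. The sketch below takes NJ stores paying $4.25 as the treatment group, PA stores as control group 1, and NJ stores paying $5.00 or more as control group 2; that particular choice of columns is our illustrative assumption, and no standard errors are computed because the covariances between the columns are not reported.

```python
# Change in mean FTE employment per store, row 3 of Table 9.
# PA stores and high-wage NJ stores both lost employment on average.
change = {
    "NJ, wage = $4.25": 1.32,    # treatment group (lowest starting wage)
    "PA": -2.16,                 # control group 1
    "NJ, wage >= $5.00": -2.04,  # control group 2
}

treat = change["NJ, wage = $4.25"]
c1 = change["PA"]
c2 = change["NJ, wage >= $5.00"]

dd1 = treat - c1  # treatment vs. PA stores
dd2 = treat - c2  # treatment vs. high-wage NJ stores (column (vii) of row 3)
dd3 = c2 - c1     # control 2 vs. control 1; ideally close to zero

print(f"DD1 = {dd1:.2f}, DD2 = {dd2:.2f}, DD3 = {dd3:.2f}")
```

This gives \( DD_1 \approx 3.48 \), \( DD_2 \approx 3.36 \) (the low − high entry in the table), and \( DD_3 \approx 0.12 \): the two control groups changed by nearly the same amount, which is the consistency check that multiple control groups make possible.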
21.4 The Discontinuity Research Design
Sometimes one cannot find a good natural experiment; it is nonetheless possible to find a good
counterfactual by looking at treatments that affect some groups but not other, extremely similar
groups. A good example is Medicare. When individuals who have worked for at least 10 years turn
65, they become eligible for free health insurance. (One also is eligible if one's spouse worked 10
years.) This age limit suggests that we compare individuals who are just a few months younger
than 65 with those who are a few months older. Figure 9 shows the fractions of people with health
insurance, by age (measured in quarters). The plots are for two groups: (relatively) more educated
whites (over 12 years of education), and less educated minorities (blacks and Hispanics with less
than 12 years of education).
Figure 67
The idea of the discontinuity design is that the rule that grants free
insurance to those who reach their 65th birthday creates an experiment: we think of those just
over 65 as the treatment group, and those just under 65 as the control group. There are some
potential problems with this idea, depending on the application:
• It may be that other factors, apart from the primary treatment, also change at the same point
in time. So it is important to check very carefully that these factors are very similar between
groups.
• There may be an age trend in the outcome of interest, so that even without treatment,
individuals who are a little over 65 tend to be a little different from those under 65 in a
certain respect. This can be checked by looking at the age profile of the outcome of interest.
• If individuals know they soon will be eligible for Medicare, they may act differently when they
are just under 65 from the way they would if there were no such rule.
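One common way to handle the age-trend problem is to fit a separate straight line on each side of the cutoff and measure the jump between the two fitted values at exactly 65 (a local linear comparison). The sketch below is a minimal version of that idea, not the method used for the figures above; the function names, the 5-year bandwidth, and the coverage data are all hypothetical, and the data are simulated with a built-in age trend and a jump of 0.10 at 65.

```python
def ols_line(xs, ys):
    """Intercept and slope of the least-squares line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rd_jump(ages, outcomes, cutoff=65.0, bandwidth=5.0):
    """Estimated jump in the outcome at `cutoff`.

    Fits separate lines to observations within `bandwidth` years below and
    above the cutoff, with age measured relative to the cutoff, so each
    intercept is the fitted value at the cutoff itself.
    """
    below = [(a - cutoff, y) for a, y in zip(ages, outcomes)
             if cutoff - bandwidth <= a < cutoff]
    above = [(a - cutoff, y) for a, y in zip(ages, outcomes)
             if cutoff <= a <= cutoff + bandwidth]
    a_below, _ = ols_line([x for x, _ in below], [y for _, y in below])
    a_above, _ = ols_line([x for x, _ in above], [y for _, y in above])
    return a_above - a_below


# Simulated coverage rates by age in quarters (55 to 75): a gentle upward age
# trend plus a jump of 0.10 at age 65. These numbers are made up.
ages = [55 + q / 4 for q in range(81)]
coverage = [0.80 + 0.005 * (a - 65) + (0.10 if a >= 65 else 0.0) for a in ages]
jump = rd_jump(ages, coverage)
print(f"estimated jump at 65: {jump:.3f}")
```

The fitted lines recover the 0.10 jump, whereas a raw comparison of mean coverage in the same two 5-year windows would give about 0.126, overstating the effect because of the age trend. The anticipation problem in the last bullet is not addressed by this adjustment.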
Figure 68
Figure 69