
AAECON 5946: Microeconometrics

Nick Kuminoff

Virginia Tech: Fall 2008

Matlab for Microeconometrics: Numerical Optimization


This document provides a brief introduction to numerical optimization in Matlab and highlights
some commands that will be helpful as you begin to use Matlab to estimate nonlinear models. It
is intended to be a self-guided tutorial. If this is your first time using Matlab for numerical
optimization, I recommend that you follow along by opening Matlab on your computer and
entering each command that follows the >> prompt in this document. This will give you an
opportunity to verify the results reported here and experiment on your own. The data used in
these exercises and the m-files that generate the results described below can all be downloaded
from the course webpage.

For a thorough introduction to the mathematics of numerical methods and proofs of convergence
rates, see:
Judd, Ken. 1998. Numerical Methods in Economics. The MIT Press, Cambridge
Massachusetts.
For a less formal, but much more applicable, discussion of how to implement numerical methods
in Matlab, see:
Miranda, Mario J. and Paul L. Fackler. 2002. Applied Computational Economics and
Finance. The MIT Press, Cambridge, Massachusetts.
Finally, if you open Matlab and select Help, then Matlab Help, and then select Contents from
the tab menu, there is an Optimization Toolbox guide that you may find useful.

TABLE OF CONTENTS

1. BASICS & QUICK REFERENCE TO COMMON COMMANDS
2. WRITING AN OBJECTIVE FUNCTION
3. NELDER-MEAD NON-DERIVATIVE SEARCH
4. QUASI-NEWTON WITH NUMERICAL GRADIENT
5. NEWTON-RAPHSON
6. PROGRAMMING TIPS

1. BASICS & QUICK REFERENCE TO COMMON COMMANDS


Here are a few key points to remember.

- All algorithms require you to code your objective function as a separate m-file.

- The order of arguments in your objective function matters!

- Matlab does minimization, not maximization. Thus, if your analytical model is set up as
a maximization problem, you need to multiply your objective function by -1, or apply some
other transformation, so that numerical minimization returns the desired parameter vector.

- You can use different algorithms to minimize the same objective function.

- Use the optimset command to define the stopping criteria for whichever algorithm you
use.

- You can instruct the fminunc algorithm to return the gradient and Hessian evaluated at the
solution, which may be useful for subsequent inference.

- Code and data to generate the results shown here can be downloaded from the course web
page.

The table below contains a quick reference to the most commonly used commands for
unconstrained optimization. Matlab also has a set of algorithms for constrained
optimization. To see these, view the introduction to the optimization toolbox from the
Help menu.

COMMAND        USE

1. UNCONSTRAINED OPTIMIZATION

fzero          Find the 1-dimensional root to an equation
fminsearch     Nelder-Mead simplex algorithm
fminunc        Newton-Raphson and quasi-Newton methods
optimset       Set options for the algorithm you use

2. WRITING AN OBJECTIVE FUNCTION

The first step in solving a nonlinear optimization problem is to write an m-file which defines an
objective function that you seek to minimize. While textbooks often define nonlinear
econometric models as maximization problems, Matlab's numerical algorithms search for
parameter values that minimize a function. This is a trivial inconvenience, since most
maximization problems can be converted to minimization problems simply by multiplying the
objective function by -1.
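For example, suppose the analytical problem is to maximize some function f. The m-file you hand to Matlab simply returns -f. Here is a minimal sketch using a toy one-parameter function with a maximum at beta = 2 (the function negf is purely illustrative and not part of the example that follows):

```matlab
function Q=negf(beta)           % objective to MINIMIZE
f=-(beta-2)^2;                  % toy function, maximized at beta=2
Q=-f;                           % flip the sign for minimization
```

Calling fminsearch('negf',0) then returns (approximately) the maximizer of f, beta = 2.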
To begin, consider the following nonlinear model:

   y = exp(Xβ) + u                                                    (2.1)

Suppose we want to use nonlinear least squares to estimate this model for β. The objective
function could be defined as follows:

   Q(β) = (1/N) Σ_i [ y_i - exp(X_i β) ]^2, where                     (2.2)

   y is Nx1, X is Nxk, β is kx1, and i = 1,...,N.

We can create an m-file named obj.m that evaluates this function by writing the following four
lines of code:

>> function Q=obj(beta,y,X)     % define function                     (2.3)
>> N=size(y,1);                 % # observations                      (2.4)
>> f=y-exp(X*beta);             % evaluate obj for all i              (2.5)
>> Q=(1/N)*sum(f.^2);           % objective function                  (2.6)

Line (2.3) tells Matlab that you are creating a new function named obj that takes beta, y, and X
as its arguments, evaluates them, and returns the scalar Q. Line (2.4) measures the number of
rows in the y vector. Line (2.5) evaluates the argument inside the square brackets in (2.2) for
every observation. Finally, line (2.6) evaluates the objective function in (2.2).
Personally, I find that it is often helpful to precede the function with some descriptive text
explaining what the objective function is doing. My m-file would look something like this:

Example 1: Writing an Objective Function as a Separate m-file


%---------------------------------------------------%
% OBJ is the objective function to an NLS problem   %
%                                                   %
%   function Q=obj(beta,y,X)                        %
%                                                   %
% INPUTS                                            %
%   y:    Nx1 vector of dependent variables         %
%   X:    Nxk matrix of independent variables       %
%   beta: kx1 vector of parameters                  %
% OUTPUTS                                           %
%   Q:    NLS objective function to minimize        %
%---------------------------------------------------%
function Q=obj(beta,y,X)        % define function
N=size(y,1);                    % # observations
f=y-exp(X*beta);                % evaluate obj for all i
Q=(1/N)*sum(f.^2);              % objective function
%---------------------------------------------------%

Now that we have an objective function, we want to tell Matlab to solve for the parameter vector,
β, that minimizes the function. This can be done using a number of different algorithms. The
next three sections describe the most common approaches.

Finally, make sure that your objective function is included in Matlab's path.
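Before handing obj.m to an optimizer, it is also worth calling it once by hand to confirm that it runs and returns a finite scalar (this sanity check is my suggestion, not part of the course files; the zero vector is an arbitrary test point):

```matlab
>> Q=obj(zeros(k,1),y,X)        % evaluate at an arbitrary point
>> isfinite(Q) & isscalar(Q)    % should return 1 (true)
```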

3. NELDER-MEAD NON-DERIVATIVE SEARCH (fminsearch)


The Nelder-Mead simplex algorithm evaluates Q(β) at q+1 points in the q-dimensional parameter
space. These points form a simplex. Initially, the points of the simplex are defined around some
starting value, β^1. Then the simplex moves systematically through the parameter space until it
collapses onto a local minimum. This process is easiest to envision in a two-dimensional
example, where the simplex can be represented by a triangle. The following steps illustrate
the basic idea behind the algorithm:

Step 1:  Evaluate Q at the vertices of the simplex, e.g. Q(B) < Q(C) < Q(A).

Step 2:  Reflect the worst point (A) through the opposite face of the simplex to obtain a
         new point, D.

Step 3A: If Q(D) < Q(B), expand the simplex further in that direction.

Step 3B: If Q(D) > Q(C), contract the simplex by halving the distance to A.

Step 3C: If Q(B) < Q(D) < Q(C), shrink the simplex toward B.

Step 4:  Repeat steps 1-3 until the simplex collapses onto the solution, i.e. until its
         maximum side length falls below some tolerance.
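To watch these procedures in action on a problem with a known answer, you can first try fminsearch on a standard two-parameter test function (the Rosenbrock function below is my illustration, not part of the housing example that follows; with 'Display','iter' Matlab prints the reflect/expand/contract/shrink steps):

```matlab
banana=@(b) 100*(b(2)-b(1)^2)^2+(1-b(1))^2;   % Rosenbrock function, minimum at (1,1)
opt=optimset('Display','iter','TolFun',1e-8,'TolX',1e-8);
b=fminsearch(banana,[-1.2;1],opt)             % should converge near [1;1]
```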

To illustrate how to implement the algorithm, we can use 2753 observations on houses from
Lucas Davis's (AER, 2004) cancer cluster paper:

   y = duration of occupancy (days)

   X = [x1, x2, ..., x7], where

   x1 = dummy variable = 1 for Churchill county
   x2 = lot size of home (acres/10)
   x3 = size of home interior (sqft/1000)
   x4 = (age of home) (years/100)
   x5 = (age of home)^2 (years^2/10000)
   x6 = (lot size of home)^2 (acres^2/1000)
   x7 = constant

For example, we might be interested in how the time between sales differed between Churchill
county and Lyon county. One hypothesis would be that the cancer cluster in Churchill made it
harder for homeowners to sell their homes. In this case, we are primarily interested in recovering
β1, the coefficient on x1.
The following lines of code load the data and prepare it for the estimation:

load optimization.mat           % load data                           (3.1)
N=size(parcel,1);               % # observations                      (3.2)
y=duration;                     % dependent variable                  (3.3)
X=[church house ones(N,1)];     % independent variables               (3.4)
k=size(X,2);                    % # independent variables             (3.5)
[min(X)' median(X)' max(X)']    % look at scaling of data             (3.6)

Importantly, the data should all be scaled to be about the same magnitude. This will prevent the
objective function from being very flat in some dimensions relative to others. Having variables
that are scaled by vastly different orders of magnitude can cause optimization algorithms to
terminate prematurely. Line (3.6) summarizes the range of the data:
Variable      min      median     max
Churchill     0.000    0.000      1.000
acres         0.000    0.023      5.300
sqft          0.204    1.432      4.554
age           0.000    0.090      1.000
age^2         0.000    0.008      1.000
acres^2       0.000    0.000      2.809
constant      1.000    1.000      1.000
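If a summary like line (3.6) reveals columns with very different magnitudes, one simple fix is to rescale each column by a power of ten before estimation, remembering to undo the scaling afterward. This is my illustrative sketch, not part of the course files:

```matlab
s=10.^round(log10(max(abs(X))));   % one scale factor per column
s(s==0)=1;                         % guard against all-zero columns
Xs=X./repmat(s,N,1);               % rescaled data, roughly order 1
```

A coefficient estimated on Xs must then be divided by the corresponding element of s to recover the coefficient on the original variable.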

The Nelder-Mead algorithm can be used to estimate the nonlinear least squares model in (2.2)
based on these data. First, we need to define a set of starting values for the parameters and the
stopping criteria for the algorithm. This is done in the following three lines of code:
>> start=ones(k,1);                                                                             (3.7)
>> opt=optimset('Display','iter','TolFun',1e-8,'TolX',1e-8,'MaxIter',1e+6,'MaxFunEval',1e+6);   (3.8)
>> beta=fminsearch('obj',start,opt,y,X)                                                         (3.9)

Line (3.7) defines a vector of ones as a starting guess for β. Line (3.8) defines the stopping
criteria for the algorithm, as well as other options, as follows:

'Display','iter'     display output from each iteration of the function
'TolFun',1e-8        terminate if |Q(β_it) - Q(β_it-1)| < 1e-8
'TolX',1e-8          terminate if max|β_it - β_it-1| < 1e-8
'MaxIter',1e+6       terminate if # iterations > 1e+6
'MaxFunEval',1e+6    terminate if # evaluations of Q > 1e+6

Whether or not to display the output from each iteration is a matter of personal preference.
However, it is important to make sure the other four criteria are set appropriately. Otherwise,
Matlab's default values for the stopping criteria may cause the function to terminate before it
reaches the true solution. Finally, line (3.9) calls the Nelder-Mead algorithm (aka fminsearch) to
minimize the objective function contained in the file obj.m, using the starting values defined
by start, the options defined by opt, and the data contained in the vector y and the matrix X.
The syntax and order in which you enter the arguments in (3.9) matters! You always need to
enter the name of your objective function first, in single quotes, followed by the starting values,
the options, and finally by the other data you want to pass to your objective function.
Importantly, the order in which these data are entered must also match the order in which they
are defined in your objective function. Notice that the objective function in (2.3) defines the
vector of parameters first, followed by the dependent variable, and finally by the matrix of
covariates. This matches the order in (3.9). For convenience, here are both lines again:

>> function Q=obj(beta,y,X)
>> beta=fminsearch('obj',start,opt,y,X)
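As an aside, newer releases of Matlab also let you pass the objective as an anonymous function handle that fixes the data arguments up front, so only the parameter vector is exposed to the optimizer. I am assuming a release recent enough to support anonymous functions (Matlab 7 or later):

```matlab
>> beta=fminsearch(@(b) obj(b,y,X),start,opt)   % handle-based alternative to (3.9)
```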

So far, we have defined two separate m-files: the general NLS exponential objective function
on lines (2.3)-(2.6), and the script file on lines (3.1)-(3.9), which defines starting values and
other options for the Nelder-Mead algorithm, and then uses the fminsearch command to call the
algorithm to minimize the objective function using some predefined data and the specified
options. After calling the minimization procedure in (3.9), Matlab displays the following output:

 Iteration   Func-count     min f(x)        Procedure

     0            1        3.40011e+007
     1            8        3.40011e+007     initial simplex
     2           10        1.97289e+007     expand
     3           11        1.97289e+007     reflect
     4           12        1.97289e+007     reflect
     .            .             .              .
  1988         3017        709769           reflect
  1989         3018        709769           reflect
  1990         3027        709769           shrink
Optimization terminated:
the current x satisfies the termination criteria using OPTIONS.TolX of 1.000000e-008
and F(X) satisfies the convergence criteria using OPTIONS.TolFun of 1.000000e-008

beta =
-0.0742
-0.2321
0.1298
2.9995
-3.9431
0.3996
6.8009

For each iteration, the results tell you the cumulative number of function evaluations, the current
value of the objective function at the best point on the simplex, and the procedure used to adjust
the simplex. The subsequent termination message tells you that the algorithm converged
according to the criteria defined in (3.8). Finally, the parameter estimates defined by beta
minimize (2.2). As we hypothesized, β1 < 0, perhaps suggesting that, all else constant, houses
in Churchill take longer to sell. The following table explores the sensitivity of this result to the
tolerances on the objective function and the parameter vector.
                 (1)         (2)         (3)         (4)

Churchill      -0.0696     -0.0742     -0.0742     -0.0742
acres          -0.2882     -0.2321     -0.2321     -0.2321
sqft            0.1386      0.1298      0.1298      0.1298
age             3.9862      2.9995      2.9995      2.9995
age^2          -5.8904     -3.9431     -3.9431     -3.9431
acres^2         0.4971      0.3996      0.3996      0.3996
constant        6.7186      6.8009      6.8009      6.8009

Tolerance      1.00E-04    1.00E-05    1.00E-06    1.00E-08
# iterations    859         1931        1961        1990
obj fn value   7.16E+05    7.10E+05    7.10E+05    7.10E+05
converged       yes         yes         yes         yes

Moving from left to right in the table, the tolerances become increasingly precise. In column (1)
we are satisfied with precision in our estimates out to 4 decimal places, while in column (4) we
require precision out to 8 decimal places. As we increase the required precision, the number of
iterations required for convergence increases. Importantly, the parameter estimates change as
well. With the low precision in column (1), the algorithm converges before it
reaches the optimum. This may be because the objective function is relatively flat near the
solution, so that further changes in the parameters lead to very small changes in the function itself.
This has the largest impact on the coefficients for age and age^2. Increasing the precision in
columns (2)-(4) leads to a superior solution. We know it is superior because it has a lower value
for the objective function.
With data that are scaled around unity, it is typical to use tolerances on the parameters
somewhere between 1e-6 and 1e-10. Why not increase the precision to an extremely small
number like 1e-100, just to be safe? There are two reasons. First, Matlab starts to run into
rounding error problems with numbers smaller than 1e-16. Second, decreasing the tolerance can
dramatically increase the time required for convergence. That said, it certainly does not hurt to
try decreasing the tolerance to a very small number like 1e-12 as a robustness check on your
results. However, it will take longer to converge.


4. QUASI-NEWTON WITH NUMERICAL GRADIENT (fminunc)

Let ∇Q(β) = ∂Q(β)/∂β and H(β) = ∂²Q(β)/∂β∂β'.

The quasi-Newton updating formula works as follows:

   β_it+1 = β_it - λ H~^(-1) ∇Q~, where

   H~  = positive definite approximation to H
   ∇Q~ = finite-difference approximation to ∇Q
   λ   = step size based on a line search.

Thus, at a minimum, all you need to provide is the objective function and the data: exactly the
same information that was required to implement the Nelder-Mead algorithm. You do not need
to provide the gradient. Matlab will use the objective function to take a finite-difference
approximation.
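To fix ideas, a bare-bones version of the forward-difference approximation that Matlab constructs internally might look like this. This is only a conceptual sketch of my own; Matlab's actual routine chooses the perturbation size more carefully:

```matlab
function dQ=fdgrad(beta,y,X)
% forward finite-difference approximation to the gradient of obj
h=1e-6;                                 % perturbation size (illustrative)
q=size(beta,1);
dQ=zeros(q,1);
Q0=obj(beta,y,X);                       % objective at current point
for j=1:q
    e=zeros(q,1); e(j)=h;               % perturb dimension j only
    dQ(j)=(obj(beta+e,y,X)-Q0)/h;
end
```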
Using the same objective function from before, we can implement the quasi-Newton algorithm
using the following lines of code:
>> start=ones(k,1);                                                                             (4.1)
>> opt=optimset('Display','iter','TolFun',1e-8,'TolX',1e-8,'MaxIter',1e+6,'MaxFunEval',1e+6);   (4.2)
>> beta=fminunc('obj',start,opt,y,X)                                                            (4.3)

Matlab reports the following results:


                                                        Gradient's
 Iteration   Func-count      f(x)          Step-size    infinity-norm
     0            8       3.40011e+007                   2.97e+008
     1           16       2.59712e+006    3.3629e-009    3.46e+004
     2           72       2.47724e+006    6788.12        2.27e+005
     3          104       2.4772e+006     0.0100767      2.41e+005
     4          152       2.29891e+006    591.122        9.92e+005
     .            .            .              .              .
    92          952       709769          1              62.7
    93          960       709769          1              6.84
    94          968       709769          1              0.211

Optimization terminated: relative infinity-norm of gradient less than
options.TolFun.

beta =
-0.0742
-0.2321
0.1298
2.9995
-3.9430
0.3996
6.8009

It converges to the same results as the Nelder-Mead algorithm. The difference is that it
converged much faster, taking only 94 iterations (and 968 function evaluations) rather than
1990 iterations (and 3027 function evaluations) to reach the same solution.
In many cases, you will want to use the Hessian or the gradient, evaluated at the solution, in
order to calculate standard errors or to perform other inference. You can obtain this information
by altering the command line used to call the algorithm to read:
>> [beta,fval,flag,output,grad,H]=fminunc('obj',start,opt,y,X)                                  (4.4)

In this case, the variables return the following information:

   fval     the value of Q at the solution
   flag     positive (negative) values indicate convergence (problems)
   output   information about the optimization algorithm used
   grad     ∇Q~ evaluated at the solution (finite-difference approximation)
   H        H~ evaluated at the solution (approximate Hessian)

Notice that the gradient and Hessian are both finite-difference approximations. Alternatively, we
can calculate exact expressions if we are willing to code them ourselves.
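For example, after calling (4.4) it is prudent to check a few diagnostics before using grad and H for inference. These checks are my suggestion, not part of the course code:

```matlab
>> flag>0              % positive exit flag indicates convergence
>> norm(grad,inf)      % should be near zero at a minimum
>> min(eig(H))>0       % H should be positive definite at a local minimum
```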


5. NEWTON-RAPHSON WITH ANALYTICAL GRADIENT (fminunc)


If you want to code the gradient and/or Hessian, you need to include it in your objective function
as an additional output. For example, suppose we want to code the gradient of (2.2). It is:

   ∂Q(β)/∂β = -(2/N) Σ_i [ y_i - exp(X_i β) ] exp(X_i β) X_i'         (5.1)
We could create a new objective function that returns the gradient as a second output. Here is
the code for an m-file called obj_grad.m:

%---------------------------------------------------%
% OBJ_GRAD is the objective function to an NLS      %
% problem, plus the gradient                        %
%                                                   %
%   function [Q dQ]=obj_grad(beta,y,X)              %
%                                                   %
% INPUTS                                            %
%   y:    Nx1 vector of dependent variables         %
%   X:    Nxk matrix of independent variables       %
%   beta: kx1 vector of parameters                  %
% OUTPUTS                                           %
%   Q:    NLS objective function to minimize        %
%   dQ:   gradient                                  %
%---------------------------------------------------%
function [Q dQ]=obj_grad(beta,y,X)  % define function          (5.2)
N=size(y,1);                        % # observations           (5.3)
f=y-exp(X*beta);                    % evaluate obj for all i   (5.4)
Q=(1/N)*sum(f.^2);                  % objective function       (5.5)
if nargout==2,                      % gradient requested?      (5.6)
  dQ=(-2/N)*(X'*(f.*exp(X*beta)));  % gradient                 (5.7)
end                                 %                          (5.8)
%---------------------------------------------------%

Line (5.2) specifies that there are two outputs, Q(β) and ∂Q(β)/∂β. Lines (5.3)-(5.5) code the
objective function, just as in the obj.m file. Line (5.6) tells Matlab to evaluate the following
line only if requested. (This saves computational time, since it is not necessary to calculate
∂Q(β)/∂β on every step of the algorithm, during the line search procedure, for example.) Finally,
line (5.7) calculates the gradient.
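Before trusting a hand-coded gradient, it is good practice to compare it against a finite-difference approximation at an arbitrary point. This check is my suggestion, not part of the course files; the test point and step size are arbitrary:

```matlab
b0=0.1*ones(k,1);                       % arbitrary test point
[Q0,dQ]=obj_grad(b0,y,X);               % analytical gradient
h=1e-6; dQfd=zeros(k,1);
for j=1:k
    e=zeros(k,1); e(j)=h;
    dQfd(j)=(obj_grad(b0+e,y,X)-Q0)/h;  % forward difference
end
max(abs(dQ-dQfd))                       % should be close to zero
```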


In order to call this objective function, you need to specifically tell Matlab that you have
provided it with an analytical expression for the gradient. This is done when specifying the
options:

start=ones(k,1);                                                                                              (5.9)
opt=optimset('GradObj','on','Display','iter','TolFun',1e-8,'TolX',1e-8,'MaxIter',1e+6,'MaxFunEval',1e+6);     (5.10)
beta=fminunc('obj_grad',start,opt,y,X)                                                                        (5.11)

Selecting 'GradObj','on' tells Matlab that you have coded the gradient as a second output in
your objective function. Here are the estimation results:

                              Norm of        First-order
 Iteration      f(x)          step           optimality     CG-iterations
     0       3.40011e+007                    2.97e+008
     1       1.3967e+007     0.755648        1.09e+008           1
     2       1.3967e+007     10              1.09e+008           1
     3       1.38358e+007    2.5             4.77e+007           0
     4       6.30737e+006    0.227829        1.7e+007            2
     .           .              .                .               .
    48       709769          0.00423822      27.3                3
    49       709769          0.00151899      75                  3
    50       709769          0.000774996     19.2                3

Optimization terminated: Relative function value changing by less than
OPTIONS.TolFun.
beta =
-0.0742
-0.2321
0.1299
2.9991
-3.9423
0.3995
6.8010

The algorithm converged faster than the quasi-Newton method with a finite-difference gradient
illustrated in the previous section (50 iterations versus 94). This reflects the increased precision
in the gradient, combined with fewer evaluations of the objective function. While the parameter
estimates are nearly identical to those estimated in the previous two sections, they are not
exactly the same. A couple of the parameters differ at the fourth decimal place. Again, this
reflects the increased precision in the gradient. In general, using an analytical gradient instead
of an approximation may have a small impact on the resulting parameter estimates. This
difference may or may not be economically important, depending on the problem. Here, for
convenience, are the differences:


Sensitivity of Parameter Estimates to Gradient Method

            Numerical    Analytical
Churchill    -0.0742      -0.0742
acres        -0.2321      -0.2321
sqft          0.1298       0.1299
age           2.9995       2.9991
age^2        -3.9431      -3.9423
acres^2       0.3996       0.3995
constant      6.8009       6.8010

While we provided an exact analytical gradient, Matlab generated an approximation to the
Hessian. If we want to use the true Newton-Raphson method, we can also provide an analytical
Hessian by extending our objective function to report the Hessian as a third output. As with the
analytical gradient, we would need to add 'Hessian','on' to the options line to let Matlab
know that we want it to use our analytical expression, rather than a positive definite
approximation.
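For this particular NLS problem, differentiating (5.1) once more with respect to β gives

   ∂²Q(β)/∂β∂β' = (2/N) Σ_i [ exp(X_i β)^2 - (y_i - exp(X_i β)) exp(X_i β) ] X_i' X_i,

which could be returned as a third output. The derivation and the m-file below are my own sketch, not part of the course files, so verify them before relying on the results:

```matlab
function [Q dQ ddQ]=obj_hess(beta,y,X)
% NLS objective, analytical gradient, and analytical Hessian (sketch)
N=size(y,1);
e=exp(X*beta);
f=y-e;
Q=(1/N)*sum(f.^2);                              % objective function
if nargout>=2
    dQ=(-2/N)*(X'*(f.*e));                      % gradient, as in (5.7)
end
if nargout==3
    w=e.^2-f.*e;                                % observation-level weights
    ddQ=(2/N)*(X'*(repmat(w,1,size(X,2)).*X));  % Hessian: (2/N)*sum_i w_i*x_i*x_i'
end
```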


6. PROGRAMMING TIPS FOR OPTIMIZATION

- Scale your data. Scaling each variable to be around 1 often helps the algorithm to
converge faster and increases the robustness of results to alternative choices for the
convergence tolerances.

- Try multiple starting values. Particularly if you are having trouble getting an estimator
to converge, try some different starting values. The most common choices are vectors of
ones or zeros, or random numbers. It is also quite easy to write a quasi-grid search that
loops over a large set of starting values and saves the parameter estimates and objective
function value associated with each set of starting values. Remember, R^q is a large
place.

- Try different algorithms. Estimating the model using both the Nelder-Mead algorithm
and a Newton-based method can provide a robustness check on your results. Nelder-Mead
tends to be more robust to nearly discontinuous functions or poorly-scaled
problems. If one does not work, perhaps the other will.

- Punishment. If your estimator is not converging due to some discontinuity or to scaling
problems, this is often manifested by NaNs and/or Infs in your results. In this case, it is
sometimes possible to patch up your objective function by punishing it for returning
NaNs or Infs. Punishment might consist of replacing the objective function with a very
large number (in the context of your problem) if it contains a NaN or Inf. This helps
instruct the algorithm that it is going in the wrong direction.

- Constrained Optimization. If you can use economic reasoning to place bounds on some
of your parameters, you may want to use Matlab's constrained optimization algorithms,
such as fmincon, which essentially implements fminunc in a bounded space.

- Global Optimization. Formal algorithms for global optimization, such as simulated
annealing, dividing rectangles, and genetic algorithms, are sold by Mathworks as part of a
separate toolbox. Frankly, it takes some time to learn how to use global optimization
algorithms because they tend to be characterized by a large number of tuning
parameters, such as the temperature in simulated annealing and mutation rates in genetic
algorithms. However, if you are willing to spend some time learning, you can download
(for free) a genetic algorithm and a dividing rectangles algorithm here:
http://www4.ncsu.edu/unity/users/p/pfackler/www/ECG790C/
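The punishment idea can be implemented with a one-line guard at the end of your objective function. The penalty value 1e10 below is arbitrary; choose something large relative to the scale of your own problem:

```matlab
Q=(1/N)*sum(f.^2);                  % objective function, as in (2.6)
if ~isfinite(Q), Q=1e10; end        % punish NaN/Inf evaluations
```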
