Matlab Optimization methods


Nick Kuminoff

This document provides a brief introduction to numerical optimization in Matlab and highlights some commands that will be helpful as you begin to use Matlab to estimate nonlinear models. It is intended to be a self-guided tutorial. If this is your first time using Matlab for numerical optimization, I recommend that you follow along by opening Matlab on your computer and entering each command that follows the >> prompt in this document. This will give you an opportunity to verify the results reported here and experiment on your own. The data used in these exercises and the m-files that generate the results described below can all be downloaded from the course webpage.

For a thorough introduction to the mathematics of numerical methods and proofs of convergence rates, see:

Judd, Ken. 1998. Numerical Methods in Economics. Cambridge, Massachusetts: The MIT Press.

For a less formal, but much more applicable, discussion of how to implement numerical methods in Matlab, see:

Miranda, Mario J. and Paul L. Fackler. 2002. Applied Computational Economics and Finance. Cambridge, Massachusetts: The MIT Press.

Finally, if you open Matlab and select Help > Matlab Help, and then select Contents from the tab menu, there is an optimization toolbox guide that you may find useful.

TABLE OF CONTENTS

1. BASICS & QUICK REFERENCE TO COMMON COMMANDS
5. NEWTON-RAPHSON

Here are a few key points to remember.

- All algorithms require you to code your objective function as a separate m-file.
- Matlab does minimization, not maximization. Thus, if your analytical model is set up as a maximization problem, you need to multiply your objective function by -1, or apply some other transformation, so that numerical minimization returns the desired parameter vector.
- You can use different algorithms to optimize the same objective function.
- Use the optimset command to define the stopping criteria for whichever algorithm you use.
- You can instruct the fminunc algorithm to return the gradient and Hessian evaluated at the solution, which may be useful for subsequent inference.
- Code and data to generate the results shown here can be downloaded from the course web page.

The table below contains a quick reference to the most commonly used commands for unconstrained optimization. Matlab also has a set of algorithms for constrained optimization. To see these, view the introduction to the optimization toolbox from the Help menu.

1. UNCONSTRAINED OPTIMIZATION

COMMAND       USE
fzero         Find the zero of a single-variable function
fminsearch    Nelder-Mead simplex algorithm
fminunc       Newton-Raphson and quasi-Newton methods
optimset      Set options for the algorithm you use

The first step in solving a nonlinear optimization problem is to write an m-file which defines an objective function that you seek to minimize. While textbooks often define nonlinear econometric models as maximization problems, Matlab's numerical algorithms search for parameter values that minimize a function. This is a trivial inconvenience since most maximization problems can be converted to minimization problems simply by multiplying the objective function by -1.
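For example, suppose you have already coded a log-likelihood function that you want to maximize. A minimal sketch of the conversion (the function name loglik and its arguments are hypothetical, standing in for your own m-file):

function Q=negloglik(theta,y,X)    %  hypothetical wrapper m-file
Q=-loglik(theta,y,X);              %  minimizing Q maximizes the log-likelihood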

To begin, consider the following nonlinear model:

$$y = \exp(X\beta) + u \qquad (2.1)$$

Suppose we want to use nonlinear least squares to estimate this model for $\beta$. The objective function could be defined as follows:

$$Q(\beta) = N^{-1}\sum_{i=1}^{N}\left[y_i - \exp(X_i\beta)\right]^2, \qquad (2.2)$$

where $y$ is $N \times 1$, $X$ is $N \times k$, and $\beta$ is $k \times 1$.

We can create an m-file named obj.m that evaluates this function by writing the following four lines of code:

>> function Q=obj(beta,y,X)      %  define function            (2.3)
>> N=size(y,1);                  %  # observations             (2.4)
>> f=y-exp(X*beta);              %  evaluate obj for all i     (2.5)
>> Q=(1/N)*sum(f.^2);            %  objective function         (2.6)

Line (2.3) tells Matlab that you are creating a new function named obj that takes beta, y, and X as its arguments, evaluates them, and returns the scalar, Q. Line (2.4) measures the number of rows in the y vector. Line (2.5) evaluates the argument inside the square brackets in (2.2) for every observation. Finally, line (2.6) evaluates the objective function in (2.2).

Personally, I find that it is often helpful to precede the function with some descriptive text explaining what the objective function is doing. My m-file would look something like this:

%---------------------------------------------------%
% OBJ is the objective function to an NLS problem   %
%                                                   %
% INPUTS                                            %
%   y:    Nx1 vector of dependent variables         %
%   X:    Nxk matrix of independent variables       %
%   beta: kx1 vector of parameters                  %
%                                                   %
% OUTPUTS                                           %
%   Q:    NLS objective function to minimize        %
%---------------------------------------------------%
function Q=obj(beta,y,X)         %  define function
N=size(y,1);                     %  # observations
f=y-exp(X*beta);                 %  evaluate obj for all i
Q=(1/N)*sum(f.^2);               %  objective function
%---------------------------------------------------%
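Before handing obj.m to an optimization routine, it can be worth evaluating it once at an arbitrary point to confirm that it runs and returns a finite scalar. A minimal check, assuming y and X are already in memory:

>> Q0=obj(zeros(size(X,2),1),y,X)    %  evaluate at beta=0; should be a finite scalar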

Now that we have an objective function, we want to tell Matlab to solve for the parameter vector, $\beta$, that minimizes the function. This can be done using a number of different algorithms. The next three sections describe the most common approaches. Finally, make sure that your objective function is included in Matlab's path.

The Nelder-Mead simplex algorithm evaluates $Q(\beta)$ at $q+1$ points in $q$-dimensional space. These points form a simplex. Initially, the points of the simplex are defined around some starting value, $\beta_1$. Then the simplex moves systematically through the parameter space until it collapses onto a local minimum. This process is easiest to envision in a two-dimensional example, where the simplex can be represented by a triangle. The following steps illustrate the basic idea behind the algorithm:

[Diagrams omitted. Step 1: evaluate $Q(\beta)$ at each vertex of the simplex and rank the vertices, e.g. Q(B) < Q(C) < Q(A). Step 2: reflect the worst vertex (A) through the opposite end of the simplex. Step 3: if the reflected point improves on the best vertex, expand the simplex further in that direction; otherwise contract the simplex by halving the distance to A, or shrink the simplex toward B. Step 4: repeat steps 1-3 until the simplex collapses onto the solution, s.t. max side length < some tolerance.]
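To make the geometry concrete, here is a minimal sketch of a single reflect/expand/contract update in two dimensions. The toy objective and the simplified acceptance rules are my own illustration, not the exact logic inside Matlab's fminsearch:

Q = @(b) sum((b-[1;2]).^2);           %  toy objective (hypothetical)
S = [0 1 0; 0 0 1];                   %  2-D simplex: each column is a vertex
v = [Q(S(:,1)) Q(S(:,2)) Q(S(:,3))];  %  evaluate Q at each vertex
[v,idx] = sort(v); S = S(:,idx);      %  rank vertices: best first, worst last
c = mean(S(:,1:2),2);                 %  centroid of the two better vertices
r = c + (c - S(:,3));                 %  reflect worst vertex through the centroid
if Q(r) < v(1)                        %  big improvement: expand further
    S(:,3) = c + 2*(c - S(:,3));
elseif Q(r) < v(2)                    %  modest improvement: accept reflection
    S(:,3) = r;
else                                  %  no improvement: contract toward centroid
    S(:,3) = (S(:,3) + c)/2;
end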

To illustrate how to implement the algorithm, we can use 2753 observations on houses from Lucas Davis's (AER, 2004) cancer cluster paper:

y  = duration (time between sales)
X  = [x1 x2 x3 x4 x5 x6 x7], where
x1 = Churchill county dummy
x2 = acres
x3 = sqft (square feet)
x4 = age
x5 = age^2
x6 = acres^2
x7 = constant

For example, we might be interested in how the time between sales differed between Churchill county and Lyon county. One hypothesis would be that the cancer cluster in Churchill made it harder for homeowners to sell their homes. In this case, we are primarily interested in recovering $\beta_1$, the coefficient on $x_1$.

The following lines of code load the data and prepare it for the estimation:

load optimization.mat              %  load data                  (3.1)
N=size(parcel,1);                  %  # observations             (3.2)
y=duration;                        %  dependent variable         (3.3)
X=[church house ones(N,1)];        %  independent variables      (3.4)
k=size(X,2);                       %  # independent variables    (3.5)
[min(X)' median(X)' max(X)']       %  look at scaling of data    (3.6)

Importantly, the data should all be scaled to be about the same magnitude. This will prevent the objective function from being very flat in some dimensions relative to others. Having variables that are scaled by vastly different orders of magnitude can cause optimization algorithms to terminate prematurely. Line (3.6) summarizes the range of the data:

Variable      min      median     max
Churchill    0.000     0.000     1.000
acres        0.000     0.023     5.300
sqft         0.204     1.432     4.554
age          0.000     0.090     1.000
age^2        0.000     0.008     1.000
acres^2      0.000     0.000     2.809
constant     1.000     1.000     1.000
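If a column were badly scaled, one simple remedy is to divide each column of X by its maximum absolute value before estimating, and then undo the scaling afterward. A minimal sketch (the variable names scale and Xs are mine; the element-wise division over columns relies on implicit expansion, available since R2016b):

scale = max(abs(X));     %  1-by-k vector of column maxima
Xs = X./scale;           %  each column of Xs now has maximum absolute value 1
%  If beta_s minimizes the objective built from Xs, then because
%  Xs*beta_s = X*(beta_s./scale'), the estimates on the original
%  scale are recovered as beta = beta_s./scale'.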

The Nelder-Mead algorithm can be used to estimate the nonlinear least squares model in (2.2) based on these data. First, we need to define a set of starting values for the parameters and the stopping criteria for the algorithm. This is done in the following three lines of code:

>> start=ones(k,1);                                                                               (3.7)
>> opt=optimset('Display','iter','TolFun',1e-8,'TolX',1e-8,'MaxIter',1e+6,'MaxFunEval',1e+6);    (3.8)
>> beta=fminsearch('obj',start,opt,y,X)                                                          (3.9)

Line (3.7) defines a vector of ones as a starting guess for $\beta$. Line (3.8) defines the stopping criteria for the algorithm, as well as other options, as follows:

'Display','iter'      display output from each iteration
'TolFun',1e-8         terminate if |Q_it - Q_it-1| < 1e-8
'TolX',1e-8           terminate if |beta_it - beta_it-1| < 1e-8
'MaxIter',1e+6        terminate if # iterations > 1e+6
'MaxFunEval',1e+6     terminate if # evaluations of Q > 1e+6

Whether or not to display the output from each iteration is a matter of personal preference. However, it is important to make sure the other four criteria are set appropriately. Otherwise, Matlab's default values for the stopping criteria may cause the function to terminate before it reaches the true solution. Finally, line (3.9) calls the Nelder-Mead algorithm (i.e., fminsearch) to minimize the objective function contained in the file obj.m, using the starting values defined by start, the options defined by opt, and the data contained in the vector y and the matrix X.

The syntax and order in which you enter the arguments in (3.9) matter! You always need to enter the name of your objective function first, in single quotes, followed by the starting values, the options, and finally by the other data you want to pass to your objective function. Importantly, the order in which these data are entered must also match the order in which they are defined in your objective function. Notice that the objective function in (2.3) defines the vector of parameters first, followed by the dependent variable, and finally by the matrix of covariates. This matches the order in (3.9). For convenience, here are both lines again:

>> function Q=obj(beta,y,X)
>> beta=fminsearch('obj',start,opt,y,X)

So far, we have defined two separate m-files: the general NLS exponential objective function on lines (2.3)-(2.6), and the script file on lines (3.1)-(3.9), which defines starting values and other options for the Nelder-Mead algorithm and then uses the fminsearch command to call the algorithm to minimize the objective function using some predefined data and the specified options. After calling the minimization procedure in (3.9), Matlab displays the following output:

 Iteration   Func-count     min f(x)      Procedure
     0            1       3.40011e+007
     1            8       3.40011e+007    initial simplex
     2           10       1.97289e+007    expand
     3           11       1.97289e+007    reflect
     4           12       1.97289e+007    reflect
     .            .            .             .
  1988         3017       709769          reflect
  1989         3018       709769          reflect
  1990         3027       709769          shrink

Optimization terminated:
 the current x satisfies the termination criteria using OPTIONS.TolX of 1.000000e-008
 and F(X) satisfies the convergence criteria using OPTIONS.TolFun of 1.000000e-008

beta =
   -0.0742
   -0.2321
    0.1298
    2.9995
   -3.9431
    0.3996
    6.8009

For each iteration, the results tell you the cumulative number of function evaluations, the current value of the objective function at the best point on the simplex, and the procedure used to adjust the simplex. The subsequent termination message tells you that the algorithm converged according to the criteria defined in (3.8). Finally, the parameter estimates defined by beta minimize (2.2). As we hypothesized, $\hat{\beta}_1 < 0$, perhaps suggesting that all else constant, houses in Churchill take longer to sell. The following table explores the sensitivity of this result to the tolerances on the objective function and the parameter vector.

                  (1)        (2)        (3)        (4)
Churchill       -0.0696    -0.0742    -0.0742    -0.0742
acres           -0.2882    -0.2321    -0.2321    -0.2321
sqft             0.1386     0.1298     0.1298     0.1298
age              3.9862     2.9995     2.9995     2.9995
age^2           -5.8904    -3.9431    -3.9431    -3.9431
acres^2          0.4971     0.3996     0.3996     0.3996
constant         6.7186     6.8009     6.8009     6.8009
Tolerance        1e-4        ...        ...       1e-8
# iterations      859       1931       1961       1990
obj fn value   7.16E+05   7.10E+05   7.10E+05   7.10E+05
converged        yes        yes        yes        yes

Moving from left to right in the table, the tolerances become increasingly precise. In column (1) we are satisfied with precision in our estimates out to 4 decimal places, while in column (4) we require precision out to 8 decimal places. As we increase the required precision, the number of iterations required for convergence increases. Importantly, the parameter estimates change as well. What happens is that with low precision in column (1), the algorithm converges before it reaches the optimum. This may be because the objective function is relatively flat near the solution, so that further changes in parameters lead to very small changes in the function itself. This has the largest impact for the coefficients on age and age^2. Increasing the precision in columns (2)-(4) leads to a superior solution. We know it is superior because it has a lower value for the objective function.

With data that are scaled around unity, it is typical to use tolerances on the parameters somewhere between 1e-6 and 1e-10. Why not increase the precision to an extremely small number like 1e-100, just to be safe? There are two reasons. First, Matlab starts to run into rounding error problems with numbers smaller than 1e-16. Second, tightening the tolerance can dramatically increase the time required for convergence. That said, it certainly does not hurt to try tightening the tolerance to a very small number like 1e-12 as a robustness check on your results. However, it will take longer to converge.


Let $\nabla Q(\beta) = \frac{\partial Q(\beta)}{\partial \beta}$ and $H(\beta) = \frac{\partial^2 Q(\beta)}{\partial \beta \, \partial \beta'}$. The quasi-Newton algorithm updates the parameter vector according to

$$\beta_{it+1} = \beta_{it} - \tilde{H}^{-1}\tilde{\nabla}Q,$$

where $\tilde{H}$ is a positive definite approximation to $H$ and $\tilde{\nabla}Q$ is a finite-difference approximation to $\nabla Q$.

Thus, at a minimum, all you need to provide is the objective function and the data: exactly the same information that was required to implement the Nelder-Mead algorithm. You do not need to provide the gradient. Matlab will use the objective function to take a finite-difference approximation.
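To see what such an approximation involves, here is a minimal sketch of a forward-difference gradient for our objective function. The function name fdgrad and the step size are my choices, not the internals of fminunc:

function dQ=fdgrad(beta,y,X)          %  hypothetical m-file
h=1e-6;                               %  forward-difference step size
k=size(beta,1);
dQ=zeros(k,1);
Q0=obj(beta,y,X);                     %  objective at the current point
for j=1:k
    b=beta;
    b(j)=b(j)+h;                      %  perturb the j-th parameter
    dQ(j)=(obj(b,y,X)-Q0)/h;          %  one-sided difference quotient
end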

Using the same objective function from before, we can implement the quasi-Newton algorithm using the following lines of code:

>> start=ones(k,1);                                                                               (4.1)
>> opt=optimset('Display','iter','TolFun',1e-8,'TolX',1e-8,'MaxIter',1e+6,'MaxFunEval',1e+6);    (4.2)
>> beta=fminunc('obj',start,opt,y,X)                                                             (4.3)

                                                       Gradient's
 Iteration   Func-count       f(x)         Step-size   infinity-norm
     0            8       3.40011e+007                  2.97e+008
     1           16       2.59712e+006   3.3629e-009    3.46e+004
     2           72       2.47724e+006   6788.12        2.27e+005
     3          104       2.4772e+006    0.0100767      2.41e+005
     4          152       2.29891e+006   591.122        9.92e+005
     .            .            .              .             .
    92          952       709769         1              62.7
    93          960       709769         1              6.84
    94          968       709769         1              0.211

Optimization terminated: relative infinity-norm of gradient less than options.TolFun.

beta =
   -0.0742
   -0.2321
    0.1298
    2.9995
   -3.9430
    0.3996
    6.8009

It converges to the same results as the Nelder-Mead algorithm. The difference is speed: it converged about 100 times faster, taking only 94 iterations (versus 1990) to reach the same solution.

In many cases, you will want to use the Hessian or the gradient, evaluated at the solution, in order to calculate standard errors or to perform other inference. You can obtain this information by altering the command line used to call the algorithm to read:

>> [beta,fval,flag,output,grad,H]=fminunc('obj',start,opt,y,X)                                   (4.4)

fval      $Q(\hat{\beta})$
flag      positive (negative) values indicate convergence (problems)
output    information about the optimization algorithm used
grad      $\tilde{\nabla}Q(\hat{\beta})$
H         $\tilde{H}(\hat{\beta})$
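As a quick illustration of how these outputs might be used (a sketch of mine, not part of the original script), one can confirm convergence and inspect the first- and second-order conditions at the solution:

if flag>0                             %  positive flag indicates convergence
    fprintf('converged: Q = %g\n',fval);
end
norm(grad,inf)                        %  should be near zero at a minimum
min(eig(H))                           %  should be positive at a (local) minimum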

Notice that the gradient and Hessian are both finite-difference approximations. Alternatively, we can calculate exact expressions if we are willing to code them ourselves. If you want to code the gradient and/or Hessian, you need to include it in your objective function as an additional output. For example, suppose we want to code the gradient of (2.2). It is:

$$\frac{\partial Q(\beta)}{\partial \beta} = -2N^{-1}\sum_{i=1}^{N}\left[y_i - \exp(X_i\beta)\right]\exp(X_i\beta)X_i' \qquad (5.1)$$

We could create a new objective function that returns the gradient as a second output. Here is the code for an m-file called obj_grad.m:

%---------------------------------------------------%
% OBJ_GRAD is the objective function to an NLS      %
% problem, plus the gradient                        %
%                                                   %
% INPUTS                                            %
%   y:    Nx1 vector of dependent variables         %
%   X:    Nxk matrix of independent variables       %
%   beta: kx1 vector of parameters                  %
%                                                   %
% OUTPUTS                                           %
%   Q:    NLS objective function to minimize        %
%   dQ:   gradient                                  %
%---------------------------------------------------%
function [Q dQ]=obj_grad(beta,y,X)     %  define function          (5.2)
N=size(y,1);                           %  # observations           (5.3)
f=y-exp(X*beta);                       %  evaluate obj for all i   (5.4)
Q=(1/N)*sum(f.^2);                     %  objective function       (5.5)
if nargout==2,                         %  gradient requested?      (5.6)
    dQ=(-2/N)*(X'*(f.*exp(X*beta)));   %  gradient                 (5.7)
end                                    %                           (5.8)
%---------------------------------------------------%

Line (5.2) specifies that there are two outputs, $Q(\beta)$ and $\nabla Q(\beta)$. Lines (5.3)-(5.5) code the objective function, just as in the obj.m file. Line (5.6) tells Matlab to evaluate the following line only if the gradient is requested. (This saves computational time, since it is not necessary to calculate $\nabla Q(\beta)$ on every step of the algorithm; during the line search procedure, for example.) Finally, line (5.8) closes the conditional statement. In order to call this objective function, you need to specifically tell Matlab that you have provided it with an analytical expression for the gradient. This is done when specifying the options:


start=ones(k,1);                                                                                              (5.9)
opt=optimset('GradObj','on','Display','iter','TolFun',1e-8,'TolX',1e-8,'MaxIter',1e+6,'MaxFunEval',1e+6);    (5.10)
beta=fminunc('obj_grad',start,opt,y,X)                                                                       (5.11)

Selecting 'GradObj','on' tells Matlab that you have coded the gradient as a second output in your objective function. Here are the estimation results:

                               Norm of        First-order
 Iteration       f(x)            step          optimality    CG-iterations
     0       3.40011e+007                      2.97e+008
     1       1.3967e+007      0.755648         1.09e+008           1
     2       1.3967e+007      10               1.09e+008           1
     3       1.38358e+007     2.5              4.77e+007           0
     4       6.30737e+006     0.227829         1.7e+007            2
     .            .               .                .               .
    48       709769           0.00423822       27.3                3
    49       709769           0.00151899       75                  3
    50       709769           0.000774996      19.2                3

Optimization terminated: relative function value changing by less than OPTIONS.TolFun.

beta =
   -0.0742
   -0.2321
    0.1299
    2.9991
   -3.9423
    0.3995
    6.8010

The algorithm converged faster than the quasi-Newton method with a finite-difference gradient illustrated in the previous section. This reflects the increased precision in the gradient, combined with fewer evaluations of the objective function. While the parameter estimates are nearly identical to those estimated in the previous two sections, they are not exactly the same. A couple of the parameters differ out at the 4th decimal place. Again, this reflects the increased precision in the gradient. In general, using an analytical gradient, instead of an approximation, may have a small impact on the resulting parameter estimates. This difference may or may not be economically important, depending on the problem. Here, for convenience, are the differences:


             Numerical   Analytical
Churchill     -0.0742     -0.0742
acres         -0.2321     -0.2321
sqft           0.1298      0.1299
age            2.9995      2.9991
age^2         -3.9431     -3.9423
acres^2        0.3996      0.3995
constant       6.8009      6.8010
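When the analytical and numerical results differ like this, it is worth verifying that the hand-coded gradient is correct before trusting it. A minimal sketch of such a check, reusing the forward-difference idea from the previous section (the test point and step size are arbitrary choices of mine):

b0=ones(k,1);                         %  arbitrary test point
[Q0,dQ]=obj_grad(b0,y,X);             %  analytical gradient
h=1e-6; fd=zeros(k,1);
for j=1:k
    b=b0; b(j)=b(j)+h;
    fd(j)=(obj_grad(b,y,X)-Q0)/h;     %  forward-difference approximation
end
max(abs(dQ-fd))                       %  should be small relative to the gradient's scale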

Hessian. If we want to use the true Newton-Raphson method, we can also provide an analytical Hessian by extending our objective function to report the Hessian as a third output. As with the analytical gradient, we would need to add 'Hessian','on' in the options line to let Matlab know that we want it to use our analytical expression, rather than a positive definite approximation.
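Here is a sketch of what that extension might look like. Differentiating (5.1) once more gives $H(\beta) = 2N^{-1}\sum_{i=1}^{N}\exp(X_i\beta)\left[\exp(X_i\beta) - f_i\right]X_i'X_i$ (my derivation, worth verifying independently); the file name obj_hess and the reliance on implicit expansion (R2016b+) are my choices:

function [Q,dQ,H]=obj_hess(beta,y,X)   %  hypothetical m-file
N=size(y,1);
e=exp(X*beta);                         %  fitted exponential term
f=y-e;                                 %  residuals
Q=(1/N)*sum(f.^2);                     %  objective function
if nargout>1
    dQ=(-2/N)*(X'*(f.*e));             %  gradient, as in (5.7)
end
if nargout>2
    w=e.*(e-f);                        %  weights from differentiating (5.1)
    H=(2/N)*(X'*(w.*X));               %  kxk Hessian
end

With 'Hessian','on' added to opt, the call would then become beta=fminunc('obj_hess',start,opt,y,X).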


Scale your data. Scaling each variable to be around 1 often helps the algorithm to converge faster and increases the robustness of results to alternative choices for the convergence tolerances.

Try multiple starting values. Particularly if you are having trouble getting an estimator to converge, try some different starting values. The most common choices are vectors of ones or zeros, or random numbers. It is also quite easy to write a quasi-grid search that loops over a large set of starting values and saves the parameter estimates and objective function value associated with each set of starting values (see the sketch below). Remember, $\mathbb{R}^q$ is a large place.
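A minimal sketch of such a quasi-grid search, assuming the data and options from section 3 are in memory (the number of draws and the range of the random starts are arbitrary choices of mine):

M=50;                                  %  number of random starting values
results=zeros(M,k+1);                  %  store estimates and objective values
for m=1:M
    start=2*rand(k,1)-1;               %  random draw from [-1,1]^k
    [b,fval]=fminsearch('obj',start,opt,y,X);
    results(m,:)=[b' fval];            %  save this run
end
[best,idx]=min(results(:,end));        %  keep the run with the lowest objective
beta=results(idx,1:k)'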

Try different algorithms. Estimating the model using both the Nelder-Mead algorithm and a Newton-based method can provide a robustness check on your results. Nelder-Mead tends to be more robust to nearly discontinuous functions and poorly-scaled problems. If one does not work, perhaps the other will.

Watch for NaNs and Infs. If your objective function has numerical problems, this is often manifested by NaNs and/or Infs in your results. In this case, it is sometimes possible to patch up your objective function by punishing it for returning NaNs or Infs. Punishment might consist of replacing the objective function with a very large number (in the context of your problem) if it contains a NaN or Inf. This helps instruct the algorithm that it is going in the wrong direction. A sketch of this patch appears below.
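A minimal sketch of the punishment idea, added at the end of obj.m (the penalty value 1e+10 is an arbitrary choice of mine; pick something very large in the context of your problem):

if isnan(Q) || isinf(Q)
    Q=1e+10;           %  large penalty steers the algorithm away from this region
end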

Constrained optimization. If you can use economic reasoning to place bounds on some of your parameters, you may want to use Matlab's constrained optimization algorithms, such as fmincon, which essentially implements fminunc in a bounded space. A sketch follows.
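For example, a sketch of bounding every parameter between -10 and 10, in the same legacy calling style used for fminsearch and fminunc above (the bounds are hypothetical; the empty matrices skip the linear and nonlinear constraints):

lb=-10*ones(k,1); ub=10*ones(k,1);     %  hypothetical lower and upper bounds
beta=fmincon('obj',start,[],[],[],[],lb,ub,[],opt,y,X)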

Global optimization. Global algorithms, such as simulated annealing, dividing rectangles, and genetic algorithms, are sold by Mathworks as part of a separate toolbox. Frankly, it takes some time to learn how to use global optimization algorithms because they tend to be characterized by a large number of "tuning parameters" such as the temperature in simulated annealing and mutation rates in genetic algorithms. However, if you are willing to spend some time learning, you can download (for free) a genetic algorithm and a dividing rectangles algorithm here:

http://www4.ncsu.edu/unity/users/p/pfackler/www/ECG790C/
