(i) z = f (x, y) is the height above the xy-plane, or the depth below if z is negative.
(ii) Functions can be drawn as perspective plots (almost always done using a package such as MAPLE),
or as a contour plot. Contour plots are sketches of the level curves f (x, y) = z = const. in the
xy-plane. Contour maps such as the Ordnance Survey maps are examples of functions of height in
terms of x and y coordinates (the grid reference).
(iii) z = c − x^2 − y^2 has a local maximum at x = y = 0. Its contours are circles centred on the origin.
z = c + x^2 + y^2 has a local minimum at x = y = 0. z = x^2 − y^2 is also locally flat at x = y = 0, but
it has a saddle point there. A saddle is like a pass going from one valley to another, between two
mountains.
(i) If z = z(x, y), ∂z/∂x means differentiate z with respect to x treating y as a constant. Similarly ∂z/∂y
means differentiate z with respect to y treating x as a constant.
(ii) Geometrical interpretation: ∂z/∂x is the tangent of the uphill or downhill slope of z as we move in the
x direction holding y constant. Similarly, ∂z/∂y is the slope of z moving in y with x held constant.
(iii) Higher derivatives: ∂²z/∂x² means ∂/∂x(∂z/∂x). Similarly ∂²z/∂y² means ∂/∂y(∂z/∂y). The mixed derivatives are equal,
\[ \frac{\partial^2 z}{\partial x \partial y} = \frac{\partial}{\partial x}\frac{\partial z}{\partial y} = \frac{\partial}{\partial y}\frac{\partial z}{\partial x} = \frac{\partial^2 z}{\partial y \partial x}, \]
Young's theorem.
(iv) Notation: fx means ∂f/∂x, fxx ≡ ∂²f/∂x², fxy = fyx ≡ ∂²f/∂x∂y, etc.
df = fx dx + fy dy + fz dz = 0
since dy = 0 when finding the partial derivative of z with respect to x. Similar formulae work for other
partial derivatives, and since x, y and z appear symmetrically in f (x, y, z) = 0 we can also write
\[ \frac{\partial x}{\partial y} = -\frac{f_y}{f_x} , \]
where now the partial derivative of x with respect to y means that z is held constant (dz = 0).
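This relation is easy to check numerically. The sketch below uses an illustrative implicit function (the unit sphere f = x^2 + y^2 + z^2 − 1 = 0, not an example from the notes) and compares ∂x/∂y = −fy/fx with a finite-difference estimate taken along the surface with z held constant:

```python
import numpy as np

# Illustrative implicit function: the unit sphere f(x, y, z) = 0,
# so fx = 2x and fy = 2y at any point on the surface.
x0, y0, z0 = 0.6, 0.48, 0.64          # satisfies x0^2 + y0^2 + z0^2 = 1

analytic = -(2*y0) / (2*x0)           # dx/dy at constant z, i.e. -fy/fx

# Finite-difference check: holding z fixed, x(y) = sqrt(1 - z0^2 - y^2).
h = 1e-6
x_plus = np.sqrt(1 - z0**2 - (y0 + h)**2)
x_minus = np.sqrt(1 - z0**2 - (y0 - h)**2)
numeric = (x_plus - x_minus) / (2*h)
```

Here the analytic value is −y0/x0 = −0.8, and the central difference agrees to around six decimal places.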
1(F) Change of variables, Jacobian.
If we change variables from x, y to u, v and have formulae of the form u = u(x, y), v = v(x, y), then
\[ \begin{pmatrix} du \\ dv \end{pmatrix} = \begin{pmatrix} u_x & u_y \\ v_x & v_y \end{pmatrix} \begin{pmatrix} dx \\ dy \end{pmatrix}, \quad\text{and}\quad \begin{pmatrix} u_x & u_y \\ v_x & v_y \end{pmatrix} \]
is called the Jacobian matrix of the transformation from (x, y) to (u, v). The inverse transformation x = x(u, v),
y = y(u, v) is given by the inverse of the Jacobian matrix:
\[ \begin{pmatrix} x_u & x_v \\ y_u & y_v \end{pmatrix} = \begin{pmatrix} u_x & u_y \\ v_x & v_y \end{pmatrix}^{-1} . \]
The condition that the inverse transformation exists is that the determinant of the Jacobian matrix is
non-zero. If this is the case, in the neighbourhood of a point (x0 , y0 ) each point (x, y) maps to one and
only one point (u, v) in the uv-plane.
The Jacobian matrix and Jacobian determinant generalise in the natural way to the n × n case. If the
Jacobian determinant is non-zero, all the derivatives in both the transformation and its inverse exist, i.e.
ux , uy , vx etc. as well as xu , xv , yu etc.
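As an illustration (a made-up example, not from the notes), take the map u = x^2 − y^2, v = 2xy, whose inverse is x + iy = √(u + iv). The sketch below checks that the inverse of the Jacobian matrix matches finite-difference derivatives of the inverse map:

```python
import numpy as np

x, y = 1.0, 0.5
u, v = x**2 - y**2, 2*x*y          # forward map u(x, y), v(x, y)

J = np.array([[2*x, -2*y],
              [2*y,  2*x]])        # Jacobian matrix; det = 4(x^2 + y^2) != 0
Jinv = np.linalg.inv(J)            # should equal the Jacobian of the inverse map

def inverse_map(u, v):
    # x + iy is the principal square root of u + iv.
    w = np.sqrt(u + 1j*v)
    return w.real, w.imag

# Finite-difference estimates of x_u and x_v.
h = 1e-6
x_u = (inverse_map(u + h, v)[0] - inverse_map(u - h, v)[0]) / (2*h)
x_v = (inverse_map(u, v + h)[0] - inverse_map(u, v - h)[0]) / (2*h)
```

The first row of Jinv is (x_u, x_v) = (0.4, 0.2) at this point, and the finite differences reproduce it.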
CAJ 25/9/2006
MATH2640 Introduction to Optimisation
(i) ∇f (x, y) = (fx , fy ) is in the direction in which f increases most rapidly. The unit vector in this
direction is n̂ = ∇f (x, y)/|∇f (x, y)|; −∇f (x, y) is in the direction of fastest decrease. Since
df = fx dx + fy dy, if (dx, dy) is perpendicular to ∇f then (dx, dy) · ∇f = 0, so df = 0, i.e. if
we move in the direction perpendicular to ∇f , f is constant. The direction perpendicular to ∇f is
therefore the direction of the contours of constant f .
(ii) For implicitly defined functions, e.g. g(x, y, z) = 0, ∇g = (gx , gy , gz ) is in the direction of the normal
to the surface g(x, y, z) = 0. The unit normal is n̂ = ∇g(x, y, z)/|∇g(x, y, z)|. This can be evaluated at any
point (x0 , y0 , z0 ) lying on the surface g(x, y, z) = 0.
(iii) The tangent plane to the surface g(x, y, z) = 0 at the point x0 = (x0 , y0 , z0 ) is
(x − x0 ) · ∇g = 0,
where ∇g is evaluated at (x0 , y0 , z0 ). If a surface is given by z = f (x, y) then the tangent plane at a
point where fx = 0 and fy = 0 is of the form z = constant.
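A quick numerical illustration (the sphere g = x^2 + y^2 + z^2 − 4 = 0 is an arbitrary choice): the unit normal comes from ∇g, and moving from a surface point along any direction t with t · ∇g = 0 changes g only at second order, exactly as the tangent-plane picture predicts.

```python
import numpy as np

def g(p):
    return p[0]**2 + p[1]**2 + p[2]**2 - 4.0   # sphere of radius 2

p0 = np.array([1.2, 0.0, 1.6])                 # a point with g(p0) = 0
grad = 2.0 * p0                                # ∇g = (2x, 2y, 2z)
nhat = grad / np.linalg.norm(grad)             # unit normal, here (0.6, 0, 0.8)

t = np.array([1.6, 0.0, -1.2])                 # tangent direction: t · ∇g = 0

eps = 1e-4
drift = g(p0 + eps*t)                          # O(eps^2), not O(eps)
```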
(i) Local maxima and minima of f (x, y) occur where the tangent plane is horizontal, so fx = 0 and
fy = 0 there. In n dimensions,
\[ \frac{\partial f}{\partial x_i} = 0, \quad i = 1, 2, \cdots, n, \quad\text{at } x = x_0 , \]
if x0 is a local min or max. These n conditions give n equations for the n unknowns x1 , x2 , · · · xn .
We solve these equations, which are called the first order conditions (FOC), to find the critical points
where maxima and minima occur. The name first order conditions arises from the fact that they
come from the linear terms in the Taylor expansion. There may be more than one solution of the
FOC, i.e. more than one critical point.
(ii) Not all critical points are maxima or minima. For example saddle points can occur as well. To
classify each critical point we must look at the quadratic terms, i.e. look at the Hessian matrix.
(iii) Sometimes, all the quadratic terms may vanish, i.e. every element in H is zero. If this happens, we
have to examine the cubic terms in the Taylor expansion to classify the critical point. In general this
is complicated, but it is practical for some simple special cases.
(i) When the FOC are satisfied, the behaviour near the critical point is given by
\[ f(X + \delta x) = f(X) + \frac{1}{2!}\, \delta x \cdot H \delta x + \cdots . \]
The expression δx · Hδx is called a quadratic form. To study quadratic forms we usually consider
just x · Ax = Q(x) where A is a symmetric matrix.
In the 2-dimensional case,
\[ A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}, \quad\text{and so}\quad Q = ax^2 + 2bxy + cy^2 , \]
with the obvious generalization in the 3 × 3 and n × n cases. Finding the eigenvalues in the 3 × 3 case means solving
a cubic equation, which will have 3 roots giving 3 eigenvalues. Solving cubic equations (or higher
order equations) can be hard, but there are many computer packages which will find the eigenvalues
of an n × n symmetric matrix very quickly.
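For instance, numpy finds the eigenvalues of a symmetric matrix with `np.linalg.eigh` (the matrix below is an arbitrary illustration):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])    # symmetric, so the eigenvalues are real

# eigh is specialised to symmetric matrices; eigenvalues come back
# sorted in ascending order, here 1, 2 and 4.
evals, evecs = np.linalg.eigh(A)
```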
Ax = λx. (3.1)
In general, an n × n matrix has n eigenvalues, though there may be fewer than n in special cases. The vectors x
satisfying this equation are called the eigenvectors. For each eigenvalue λ there is a corresponding eigenvector
x.
(ii) 2 × 2 case. The quadratic form associated with the symmetric matrix
\[ A = \begin{pmatrix} a & b \\ b & c \end{pmatrix} \quad\text{is}\quad Q(x, y) = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} a & b \\ b & c \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = ax^2 + 2bxy + cy^2 . \]
To find the eigenvalues, we write equation (3.1) as (A − λI)x = 0, where I is the unit (identity) matrix.
Then for this to have nontrivial solutions,
\[ \begin{vmatrix} a - \lambda & b \\ b & c - \lambda \end{vmatrix} = 0, \quad\text{so}\quad \lambda^2 - (a + c)\lambda + ac - b^2 = 0 \]
is the quadratic equation whose solutions are the two eigenvalues of the matrix A. These eigenvalues are
always real for a symmetric matrix.
(iii) The same methods work for an n × n symmetric matrix A. Quadratic forms are always associated with
symmetric matrices, where Aij = Aji , and symmetric matrices always have real eigenvalues. If there are
n distinct eigenvalues then the eigenvectors of a symmetric matrix are always orthogonal to each other,
xi · xj = 0 for any i ≠ j, with i and j lying between 1 and n.
(iv) Rules for classifying quadratic forms using the eigenvalue method.
If all eigenvalues are > 0, Q is positive definite (PD)
If all eigenvalues are < 0, Q is negative definite (ND)
If some eigenvalues are > 0 and some are < 0, Q is indefinite (I).
If all eigenvalues are ≥ 0, Q is positive semi-definite (PSD)
If all eigenvalues are ≤ 0, Q is negative semi-definite (NSD)
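These rules translate directly into a short routine. A sketch (the strict/non-strict comparisons mirror the rules above, with a tolerance for floating-point zeros):

```python
import numpy as np

def classify(A, tol=1e-12):
    """Classify the quadratic form x·Ax from the eigenvalues of symmetric A."""
    lam = np.linalg.eigvalsh(A)
    if np.all(lam > tol):
        return "PD"
    if np.all(lam < -tol):
        return "ND"
    if np.any(lam > tol) and np.any(lam < -tol):
        return "I"
    return "PSD" if np.all(lam > -tol) else "NSD"
```

For example, classify(np.diag([1.0, -1.0])) gives "I", the saddle case.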
(v) Normalising eigenvectors
To find the eigenvector corresponding to a particular eigenvalue, solve (A − λI)x = 0 for that value of λ.
You can only solve these equations up to an arbitrary constant k. Example:
\[ A = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix} \]
has eigenvalues λ = 2 and λ = 4. For λ = 2, (A − 2I)x = 0 gives
x1 + x2 = 0, x1 + x2 = 0,
giving (k, −k) as the eigenvector. To normalise this, choose k so that (k, −k) has unit length, 2k^2 = 1, so the normalised
eigenvector is (1/√2, −1/√2). The normalised eigenvector corresponding to λ = 4 is (1/√2, 1/√2).
(vi) Normal forms
When the normalised eigenvectors y(1) , y(2) , · · · , y(n) corresponding to the n eigenvalues λ(1) , λ(2) , · · · λ(n) are found, the quadratic form Q can be written as a sum of squares,
\[ Q = \lambda^{(1)}\bigl(x_1 y_1^{(1)} + x_2 y_2^{(1)} + \cdots\bigr)^2 + \lambda^{(2)}\bigl(x_1 y_1^{(2)} + x_2 y_2^{(2)} + \cdots\bigr)^2 + \cdots , \]
which is said to be the normal form of the quadratic form Q. For the above example,
\[ Q = 3x_1^2 + 2x_1 x_2 + 3x_2^2 = 2\left(\frac{x_1}{\sqrt{2}} - \frac{x_2}{\sqrt{2}}\right)^2 + 4\left(\frac{x_1}{\sqrt{2}} + \frac{x_2}{\sqrt{2}}\right)^2 . \]
Check this by multiplying it out. It's now obvious that positive eigenvalues make Q positive definite.
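The normal form above is an identity in x1 and x2, so it can be checked at random points:

```python
import numpy as np

rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal((2, 100))

Q = 3*x1**2 + 2*x1*x2 + 3*x2**2
normal_form = (2*(x1/np.sqrt(2) - x2/np.sqrt(2))**2
               + 4*(x1/np.sqrt(2) + x2/np.sqrt(2))**2)

max_error = np.max(np.abs(Q - normal_form))   # zero up to rounding
```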
3(B) Unconstrained Optimisation in Economics.
Π = R − C = pQ(x1 , x2 , · · · , xn ) − C(x1 , x2 , · · · , xn ).
To maximise the profit, we find the maximum of Π as a function of the inputs x1 , · · · , xn , leading to
\[ p\,\frac{\partial Q}{\partial x_i} = \frac{\partial C}{\partial x_i}, \quad i = 1, n. \]
Problems where a firm produces several different products can also lead to optimisation problems.
(ii) The discriminating monopolist.
A monopolist produces two products at a rate Q1 and Q2 . The price obtained is not constant, but reduces if
the monopolist floods the market, so p1 = a1 − b1 Q1 , and p2 = a2 − b2 Q2 , are the prices for the two products.
The Cost function is taken as C = c1 Q1 + c2 Q2 , being simply proportional to the quantity produced. The
profit
Π = R − C = p1 Q1 + p2 Q2 − c1 Q1 − c2 Q2 = (a1 − c1 )Q1 + (a2 − c2 )Q2 − b1 Q21 − b2 Q22
so there is a best choice of Q1 and Q2 to maximise profit, when Q1 = (a1 − c1 )/2b1 and Q2 = (a2 − c2 )/2b2 .
Inspection of the Hessian shows that Π has a maximum here provided b1 and b2 are positive. We then also
require a1 ≥ c1 and a2 ≥ c2 . The monopolist then has a profit maximising strategy allowing discrimination
between the amounts of the two products to be produced.
The same approach can be used when the cost function or the pricing is a more complicated function of the
Qi .
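The first order conditions give Q1 = (a1 − c1)/2b1 and Q2 = (a2 − c2)/2b2, which follows directly from differentiating the profit above. With some hypothetical parameter values this is easy to verify against a direct search:

```python
import numpy as np

# Hypothetical parameter values for illustration only.
a1, b1, c1 = 10.0, 1.0, 2.0
a2, b2, c2 = 8.0, 2.0, 2.0

def profit(Q1, Q2):
    return (a1 - c1)*Q1 + (a2 - c2)*Q2 - b1*Q1**2 - b2*Q2**2

Q1s, Q2s = (a1 - c1)/(2*b1), (a2 - c2)/(2*b2)   # first order conditions

# No nearby production plan does better (the Hessian is negative definite
# since b1, b2 > 0).
rng = np.random.default_rng(1)
perturbed = max(profit(Q1s + d1, Q2s + d2)
                for d1, d2 in rng.standard_normal((200, 2)))
```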
(iii) The Cobb-Douglas production function.
A commonly used model in microeconomics is the Cobb-Douglas production function, Q = x1^a x2^b .
The input bundle here is then just a two-dimensional vector (x1 , x2 ), and we assume here the price p is independent
of Q, unlike in the monopolist problem. If the Cost is linear in the input bundle, C = w1 x1 + w2 x2 , then the
Profit
Π = pxa1 xb2 − w1 x1 − w2 x2
and the conditions for a stationary point are
\[ a p x_1^{a-1} x_2^b = w_1 , \qquad b p x_1^a x_2^{b-1} = w_2 , \]
giving positive critical values x∗1 = apQ∗ /w1 and x∗2 = bpQ∗ /w2 , where Q∗ is the production function at the
critical value.
We want to know when this stationary value is a maximum, so we must examine the Hessian; this shows that a maximum requires a + b < 1, i.e. decreasing returns to scale.
If these conditions are satisfied, then the Cobb-Douglas production function gives a profit maximising strategy.
If they are not satisfied, the profit increases indefinitely as x1 and x2 get very large. This is possible economic
behaviour: if there is no profit maximising strategy, all it means is that the bigger the firm the more profit
it makes, which is of course entirely possible.
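A numerical check of the critical-point formulae, with hypothetical values a = b = 1/4 and p = w1 = w2 = 1 (so a + b < 1 and a maximum exists):

```python
import numpy as np

a, b, p = 0.25, 0.25, 1.0
w1, w2 = 1.0, 1.0

def profit(x1, x2):
    return p * x1**a * x2**b - w1*x1 - w2*x2

# By symmetry x1* = x2* = x*; the FOC 0.25 x*^(-1/2) = 1 gives x* = 1/16.
x1s = x2s = 1.0/16.0
Qs = x1s**a * x2s**b                    # production at the critical point

foc_gap = abs(x1s - a*p*Qs/w1)          # the stated formula x1* = apQ*/w1

# A grid search near the critical point finds nothing better.
grid = np.linspace(0.01, 0.2, 60)
best = max(profit(u, w) for u in grid for w in grid)
```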
3(C) Constrained Optimisation: Lagrange multipliers
Normally, economic activity is constrained, e.g. expenditure is constrained by available income. The constraints
are applied to the input bundle, and may take many different forms. Often constraints are in the form of
inequalities, e.g. we can spend up to so much on equipment but no more. However, it is often simpler to solve
problems with equality constraints, e.g. we are allowed to spend 1000 pounds on equipment, and we will spend it
all, so the equipment expenditure x1 = 1000 is an equality constraint. In general, we are trying to maximise the
objective function
f (x1 , x2 , · · · , xn )
subject to the k inequality constraints
gi (x1 , x2 , · · · , xn ) ≤ bi , 1 ≤ i ≤ k,
and the m equality constraints
hi (x1 , x2 , · · · , xn ) = ci , 1 ≤ i ≤ m.
For a single equality constraint h(x) = c, the first order conditions are
\[ \frac{\partial f}{\partial x_i} - \lambda \frac{\partial h}{\partial x_i} = 0, \quad i = 1, n, \qquad h(x) = c, \]
for the n + 1 unknowns x1 , x2 · · · xn , λ. As usual, we must be very careful to get all the solutions of these
n + 1 equations, and not to miss some (or include bogus solutions).
Example:
Maximise f (x, y) = xy subject to h(x, y) = x + 4y − 16 = 0.
Solution: The Lagrangian L = f − λh = xy − λ(x + 4y − 16). The equations we need are
Lx = 0, or y − λ = 0 (A)
Ly = 0, or x − 4λ = 0 (B)
x + 4y = 16. (C)
The only critical point is x = 8, y = 2 and λ = 2. At this point, f = 16, and this is the maximum value
of f subject to this constraint. Strictly speaking, we don't yet know it's a maximum, because we have not
examined the Hessian (see the section on Bordered Hessians below). However, if we look at other values of
x and y that satisfy the constraint, such as x = 0, y = 4, we see they give lower values of f , which strongly
suggests that the critical point is a maximum.
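For this particular example the answer can be confirmed without Lagrange multipliers by substituting the constraint x = 16 − 4y into f = xy and scanning:

```python
import numpy as np

y = np.linspace(0.0, 4.0, 100001)   # grid containing y = 2
f = (16.0 - 4.0*y) * y              # f = xy with x = 16 - 4y

i = np.argmax(f)
y_star = y[i]
x_star = 16.0 - 4.0*y_star
f_star = f[i]                       # should be near the maximum value 16
```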
(ii) The non-degenerate constraint qualification, NDCQ
The above algorithm for finding stationary points normally works well, but the argument breaks down if
∇h = 0 at the critical point. In this case, the constraint surface doesn't have a uniquely defined normal, so we
can't say that ∇f must be parallel to ∇h. We should therefore check that the constraint qualification, usually
called the non-degenerate constraint qualification (NDCQ), holds at each critical point we find, i.e. ∇h ≠ 0
at each critical point.
What happens if it doesn’t hold, i.e. ∇h = 0 at the critical point? Usually the Lagrange equations can’t be
solved.
Example: Maximise f = x subject to h = x^3 + y^2 = 0.
Solution: L = x − λ(x^3 + y^2 ), so Lx = 0 and Ly = 0 give
1 − 3λx^2 = 0, −2λy = 0.
The constrained maximum is at x = y = 0 (any feasible point has x^3 = −y^2 ≤ 0, so x ≤ 0), but the first equation has no solution there; indeed ∇h = (3x^2 , 2y) = 0 at the origin, so the NDCQ fails.
With several equality constraints h1 , h2 , · · · , hm the Lagrangian becomes
L = f − λ1 h1 − λ2 h2 − · · · − λm hm .
Example: maximise f = x + y + z subject to x^2 + y^2 = 1 and z = 2. Here
L = x + y + z − λ1 (x^2 + y^2 ) − λ2 (z − 2),
and the first order conditions are
1 − 2λ1 x = 0, 1 − 2λ1 y = 0, 1 − λ2 = 0, x^2 + y^2 = 1, z = 2.
It is easily seen that the two solutions are x = 1/√2, y = 1/√2, z = 2, λ1 = 1/√2, λ2 = 1 and
x = −1/√2, y = −1/√2, z = 2, λ1 = −1/√2, λ2 = 1.
However, it is much simpler to use z = 2 to eliminate z and just maximise f = x + y + 2 subject to x^2 + y^2 = 1,
which only requires three equations for three unknowns, and gives the same answer more quickly.
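The elimination route is also easy to check numerically: parametrise the constraint as x = cos θ, y = sin θ, so f = cos θ + sin θ + 2, which peaks at θ = π/4 with value 2 + √2:

```python
import numpy as np

theta = np.linspace(0.0, 2.0*np.pi, 1000001)
f = np.cos(theta) + np.sin(theta) + 2.0   # f = x + y + z with z = 2

i = np.argmax(f)
x_star, y_star, f_star = np.cos(theta[i]), np.sin(theta[i]), f[i]
```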
If there are several constraints, the non-degenerate constraint qualification, NDCQ, is that the m
rows of the Jacobian derivative matrix,
\[ \begin{pmatrix} \partial h_1/\partial x_1 & \cdots & \partial h_1/\partial x_n \\ \vdots & & \vdots \\ \partial h_m/\partial x_1 & \cdots & \partial h_m/\partial x_n \end{pmatrix}, \]
are linearly independent, that is, that none of the rows can be expressed as a linear combination of the other
rows. If this is the case, we say the Jacobian derivative matrix is of rank m.
\[ L_{x_i x_j} = \frac{\partial^2 L}{\partial x_i \partial x_j} = \frac{\partial^2}{\partial x_i \partial x_j}\left(f - \lambda_1^* h_1 - \lambda_2^* h_2 - \cdots\right) \]
Now the allowed displacements dx are restricted by the constraints to satisfy dx · ∇hi = 0 for every constraint
i = 1, m.
Replacing dx by x, and ∇hi by its value ui at the stationary point, the problem becomes: classify the quadratic
form
Q = x^T Hx, subject to ui · x = 0, i = 1, m.
In the two-variable case with a single constraint ux + vy = 0, substituting y = −ux/v gives
\[ Q(x) = \frac{x^2}{v^2}\left(av^2 - 2buv + cu^2\right), \]
which is positive definite (minimum) if (av^2 − 2buv + cu^2 ) > 0, and negative definite (maximum) if (av^2 −
2buv + cu^2 ) < 0.
(ii) Bordered Hessians
If we have an n variable Lagrangian with m constraints, then the Bordered Hessian is an (n + m) × (n + m)
matrix constructed as follows:
\[ H_B = \begin{pmatrix} 0 & B \\ B^T & H \end{pmatrix} \]
where the B part has m rows and n columns, with the rows consisting of the constraint vectors ui , i = 1, m,
and the H part is the Hessian based on the Lagrangian. In the case where m = 2, n = 3 and the two
constraints are
u1 x + u2 y + u3 z = 0, v1 x + v2 y + v3 z = 0,
(u1 , u2 , u3 ) being ∇h1 and (v1 , v2 , v3 ) being ∇h2 , both evaluated at the stationary point, the 5 × 5 Bordered
Hessian matrix is
\[ H_B = \begin{pmatrix} 0 & 0 & u_1 & u_2 & u_3 \\ 0 & 0 & v_1 & v_2 & v_3 \\ u_1 & v_1 & L_{xx} & L_{xy} & L_{xz} \\ u_2 & v_2 & L_{xy} & L_{yy} & L_{yz} \\ u_3 & v_3 & L_{xz} & L_{yz} & L_{zz} \end{pmatrix} \]
where
Lxx = fxx − λ∗1 h1xx − λ∗2 h2xx ,
and the other components of the Hessian are defined similarly.
(iii) Rules for classification using Bordered Hessians
We need to classify
Q = xT Hx, subject to ui · x = 0, i = 1, m.
Construct the bordered Hessian as defined above, and evaluate the leading principal minors. We only need
the last (n − m) LPM’s, starting with LP M2m+1 and going up to (and including) LP Mn+m . LP Mn+m is
just the determinant of the whole HB matrix.
(A) Q is positive definite (PD) if sgn(LP Mn+m ) = sgn(det HB ) = (−1)m and the successive LPM’s have the
same sign. Here sgn means the sign of, so e.g. sgn(−2) = −1. This case corresponds to a local minimum.
(B) Q is negative definite (ND) if sgn(LP Mn+m ) = sgn(det HB ) = (−1)n and the successive LPM’s from
LP M2m+1 to LP Mn+m alternate in sign. This case corresponds to a local maximum.
(C) If both (A) and (B) are violated by non-zero LPM’s, then Q is indefinite.
The semi-definite conditions are very involved, and we don’t go into them here.
Example: If n = 2 and m = 1, then n − m = 1 and we only need examine LP M3 . Suppose this is LP M3 = −1.
Then this has the same sign as (−1)^m , and so the constrained Q is positive definite. Check: if Q = x^2 − y^2
and the constraint is y = 0, so u = (0, 1), the bordered Hessian is
\[ H_B = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & -1 \end{pmatrix} \]
which has LP M3 = det HB = −1, and Q(x, 0) = x^2 , which is indeed positive definite.
Example: Q = x^2 − y^2 − z^2 with constraint x = 0, so now u = (1, 0, 0). Now n = 3 and m = 1, so we only need
examine LP M4 and LP M3 .
\[ H_B = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \]
LP M4 = −1, and LP M3 = 1. These alternate in sign, so there is a possibility the constrained quadratic is
negative definite. To test this against criterion (B), note LP M4 = −1, which has the same sign as (−1)^n ,
and so the constrained Q is negative definite. Check: if Q = x^2 − y^2 − z^2 and the constraint is x = 0, then
Q = −y^2 − z^2 , which is clearly negative definite.
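The leading principal minors in these examples can be computed directly with numpy determinants (a sketch for the second example):

```python
import numpy as np

def lpm(A, k):
    """k-th leading principal minor: determinant of the top-left k-by-k block."""
    return np.linalg.det(A[:k, :k])

# Bordered Hessian for Q = x^2 - y^2 - z^2 with constraint x = 0 (n = 3, m = 1).
HB = np.array([[0.0, 1.0,  0.0,  0.0],
               [1.0, 1.0,  0.0,  0.0],
               [0.0, 0.0, -1.0,  0.0],
               [0.0, 0.0,  0.0, -1.0]])

LPM3, LPM4 = lpm(HB, 3), lpm(HB, 4)
# sgn(LPM4) = (-1)^n = -1 and LPM3, LPM4 alternate in sign,
# so the constrained form is negative definite.
```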
(iv) Evaluating determinants
Evaluating 2×2 and 3×3 determinants is straightforward, but evaluating larger determinants can be difficult.
The definition is
\[ \det A = \sum (-1)^h A_{1 r_1} A_{2 r_2} \cdots A_{n r_n} , \]
where the summation is made over all permutations r1 r2 r3 , · · · , rn . A permutation is just the numbers 1 to
n written in a different order, e.g. [1 3 2] is a permutation on the first 3 integers and corresponds to r1 = 1,
r2 = 3 and r3 = 2. Permutations are either odd or even. If it takes an odd number of interchanges to restore
the permutation to its natural order, the permutation is odd. If it takes an even number, the permutation is
even. In the formula for the determinant, h = 1 if the permutation is odd, h = 2 if it is even. So [2 3 4 1] is
an odd permutation, as we must first swap 2 and 1, then 2 and 3, and then 3 and 4, three swaps altogether.
Since three is an odd number of swaps, the permutation is odd.
Other useful facts for evaluating determinants are
(i) Swapping any two rows or any two columns reverses the sign of the determinant.
(ii) Adding any multiple of another row to a row leaves the determinant unchanged. Similarly, adding any
multiple of a column to a different column leaves it unchanged. Using this trick, it is usually simple to
generate zero elements, which makes evaluating the determinant easier.
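The permutation definition can be implemented directly, counting inversions to decide parity, and checked against numpy's determinant. It is far slower than row reduction (n! terms), but it makes the definition concrete:

```python
import numpy as np
from itertools import permutations

def det_by_definition(A):
    """Determinant computed straight from the permutation-sum definition."""
    n = len(A)
    total = 0.0
    for perm in permutations(range(n)):
        # The permutation is odd or even according to its number of inversions.
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        sign = -1.0 if inversions % 2 else 1.0
        term = sign
        for row, col in enumerate(perm):
            term *= A[row][col]
        total += term
    return total

# An arbitrary 4x4 test matrix.
A = np.array([[2.0, 1.0, 0.0, 3.0],
              [1.0, 4.0, 1.0, 0.0],
              [0.0, 1.0, 3.0, 1.0],
              [3.0, 0.0, 1.0, 2.0]])
```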
APPENDIX
How were the tests for the constrained quadratic forms derived?
This appendix can be omitted on first reading, as the most important issue is what the tests are and how to
apply them. However, there is a way to understand where these tests come from in terms of elementary row
and column operations (ERCO’s). These also explain where the leading principal minor (LPM) tests for the
unconstrained quadratic forms come from.
An elementary row and column operation, ERCO, acts on a symmetric matrix and removes a pair of
elements aij and aji from the matrix by subtracting appropriate multiples of a row and a column. To
illustrate, consider the 4 × 4 matrix
\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{12} & a_{22} & a_{23} & a_{24} \\ a_{13} & a_{23} & a_{33} & a_{34} \\ a_{14} & a_{24} & a_{34} & a_{44} \end{pmatrix}. \]
We want to remove the elements a24 from row 2, column 4 and row 4, column 2 using the elements a23
and a32 . Because the matrix is symmetric, it is the same element in both places. To remove a24 from row 2,
column 4, multiply column 3 by a24 /a23 and subtract it from column 4. This gives
\[ A' = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} - \frac{a_{24}}{a_{23}}a_{13} \\ a_{12} & a_{22} & a_{23} & 0 \\ a_{13} & a_{23} & a_{33} & a_{34} - \frac{a_{24}}{a_{23}}a_{33} \\ a_{14} & a_{24} & a_{34} & a_{44} - \frac{a_{24}}{a_{23}}a_{34} \end{pmatrix}. \]
This process can be written as a matrix multiplication A′U = A, or
\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} - \frac{a_{24}}{a_{23}}a_{13} \\ a_{12} & a_{22} & a_{23} & 0 \\ a_{13} & a_{23} & a_{33} & a_{34} - \frac{a_{24}}{a_{23}}a_{33} \\ a_{14} & a_{24} & a_{34} & a_{44} - \frac{a_{24}}{a_{23}}a_{34} \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & \frac{a_{24}}{a_{23}} \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{12} & a_{22} & a_{23} & a_{24} \\ a_{13} & a_{23} & a_{33} & a_{34} \\ a_{14} & a_{24} & a_{34} & a_{44} \end{pmatrix}. \]
The matrix U on the right is a unit upper triangular matrix, because it has zeros below the leading diagonal
and ones along the leading diagonal. Multiply this out yourself to check it really works! To remove the a24
from row 4, column 2, multiply row 3 by a24 /a23 and subtract it from row 4. This is equivalent to the matrix
multiplication U^T Ã = A′ , or
\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & \frac{a_{24}}{a_{23}} & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} - \frac{a_{24}}{a_{23}}a_{13} \\ a_{12} & a_{22} & a_{23} & 0 \\ a_{13} & a_{23} & a_{33} & a_{34} - \frac{a_{24}}{a_{23}}a_{33} \\ a_{14} - \frac{a_{24}}{a_{23}}a_{13} & 0 & a_{34} - \frac{a_{24}}{a_{23}}a_{33} & a_{44} - \frac{2a_{24}a_{34}}{a_{23}} + \frac{a_{24}^2 a_{33}}{a_{23}^2} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} - \frac{a_{24}}{a_{23}}a_{13} \\ a_{12} & a_{22} & a_{23} & 0 \\ a_{13} & a_{23} & a_{33} & a_{34} - \frac{a_{24}}{a_{23}}a_{33} \\ a_{14} & a_{24} & a_{34} & a_{44} - \frac{a_{24}}{a_{23}}a_{34} \end{pmatrix}. \]
So the whole ERCO process is equivalent to writing A = U^T ÃU . The important points are that U is a unit
upper triangular matrix, so U^T is a unit lower triangular matrix, and à has zeros where the a24 elements
were in A.
There are two key properties of ERCO operations.
(i) The product of two unit upper triangular matrices is another unit upper triangular matrix. This means
if we do two successive ERCO operations, we end up with
A = U2T U1T ÃU1 U2 = V T ÃV
for some unit upper triangular matrix V . In fact, we can go on removing pairs of elements as often as we
want, and still end up with A = V^T ÃV .
(ii) Removing a multiple of a row from another row, or a multiple of a column from another column, leaves
the determinant of the matrix unchanged. This is a standard result from the theory of determinants. Indeed, it
also leaves all the leading principal minors, as defined in 2D(ii), unchanged. This result can easily be verified
for a 3 × 3 determinant, and the extension to the n × n case is straightforward. It follows that an ERCO
operation leaves the leading principal minors unchanged, i.e. the LPM’s of A and à are all the same.
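The claim that an ERCO leaves all the leading principal minors unchanged is easy to test numerically on a small symmetric matrix:

```python
import numpy as np

# An arbitrary symmetric 4x4 illustration.
A = np.array([[4.0, 1.0, 2.0, 1.0],
              [1.0, 3.0, 1.0, 2.0],
              [2.0, 1.0, 5.0, 1.0],
              [1.0, 2.0, 1.0, 6.0]])

def lpms(M):
    return [np.linalg.det(M[:k, :k]) for k in range(1, M.shape[0] + 1)]

# ERCO removing the a24/a42 pair using the a23/a32 elements.
r = A[1, 3] / A[1, 2]      # a24 / a23
B = A.copy()
B[:, 3] -= r * B[:, 2]     # column operation
B[3, :] -= r * B[2, :]     # matching row operation
```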
Now we apply this ERCO process to our bordered Hessian. We take the case m = 2, n = 4 so there are two
constraints and four variables,
u1 x1 + u2 x2 + u3 x3 + u4 x4 = 0, v1 x1 + v2 x2 + v3 x3 + v4 x4 = 0,
so that our bordered Hessian is
\[ A = \begin{pmatrix} 0 & 0 & u_1 & u_2 & u_3 & u_4 \\ 0 & 0 & v_1 & v_2 & v_3 & v_4 \\ u_1 & v_1 & a_{11} & a_{12} & a_{13} & a_{14} \\ u_2 & v_2 & a_{12} & a_{22} & a_{23} & a_{24} \\ u_3 & v_3 & a_{13} & a_{23} & a_{33} & a_{34} \\ u_4 & v_4 & a_{14} & a_{24} & a_{34} & a_{44} \end{pmatrix}. \]
First we remove u2 , u3 and u4 by using the ERCO operations based on u1 to write A = U^T ÃU , giving
\[ A = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & u_2/u_1 & 1 & 0 & 0 \\ 0 & 0 & u_3/u_1 & 0 & 1 & 0 \\ 0 & 0 & u_4/u_1 & 0 & 0 & 1 \end{pmatrix} \tilde{A} \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & u_2/u_1 & u_3/u_1 & u_4/u_1 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \]
where the first row and column of à have become (0, 0, u1 , 0, 0, 0), and the remaining vi and aij entries of à are modified correspondingly by the row and column operations.
If u1 happened to be zero, we would need to reorder the variables so that a non-zero element is brought into
the u1 position. Since the constraint has to be non-trivial this must be possible. Note that the U matrix
applied to the vector x = (0, 0, x1 , x2 , x3 , x4 )^T gives
\[ U\mathbf{x} = \bigl(0,\; 0,\; x_1 + u_2 x_2/u_1 + u_3 x_3/u_1 + u_4 x_4/u_1,\; x_2,\; x_3,\; x_4\bigr)^T = \mathbf{x}' . \]
When we apply the first row of à to x′ we get zero because of the constraint, and the same is true of x′^T
applied to the first column of Ã. Now we use the v2 element to remove v1 , v3 and v4 . If m > 2 we would
continue until every row and column of the border just has one non-zero element in it. If v2 happened to be
zero, we would again reorder the variables from x2 to xn to bring a non-zero element into the v2 position. If
the only non-zero element in row 2 of à is v1 , then the second constraint is linearly dependent on the first
constraint, so it's not really a second constraint. In this case the NDCQ won't hold (see 3C(iii)).
The matrix A now has the form U^T BU = A with B of the form
\[ B = \begin{pmatrix} 0 & 0 & u_1 & 0 & 0 & 0 \\ 0 & 0 & 0 & v_2 & 0 & 0 \\ u_1 & 0 & a_{11} & a_{12} & a_{13} & a_{14} \\ 0 & v_2 & a_{12} & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{13} & a_{23} & a_{33} & a_{34} \\ 0 & 0 & a_{14} & a_{24} & a_{34} & a_{44} \end{pmatrix}. \]
At this stage we can check that just as applying the first row of à to x′ gave zero because of the constraint, so
applying the second row of B to U x also gives zero. The next step is to remove all the off-diagonal elements
of the type aij using aij /aii multiples of the ith row and column. This reduces the matrix to the
form U^T CU = A with C of the form
\[ C = \begin{pmatrix} 0 & 0 & u_1 & 0 & 0 & 0 \\ 0 & 0 & 0 & v_2 & 0 & 0 \\ u_1 & 0 & a_{11} & 0 & 0 & 0 \\ 0 & v_2 & 0 & a_{22} & 0 & 0 \\ 0 & 0 & 0 & 0 & a_{33} & 0 \\ 0 & 0 & 0 & 0 & 0 & a_{44} \end{pmatrix}. \]
Now we get the matrix into its final form by removing the a11 and a22 elements using u1 and v2 respectively.
In general, this means that a11 to amm are removed. Then we swap columns 1 and 3, and columns 2 and 4.
Every time we swap a pair of columns we change the sign of the determinant and the principal minors, so this process
multiplies the determinants and minors by a factor (−1)^m . After this swap the matrix now has the form
U^T DU = A with
\[ D = \begin{pmatrix} u_1 & 0 & 0 & 0 & 0 & 0 \\ 0 & v_2 & 0 & 0 & 0 & 0 \\ 0 & 0 & u_1 & 0 & 0 & 0 \\ 0 & 0 & 0 & v_2 & 0 & 0 \\ 0 & 0 & 0 & 0 & a_{33} & 0 \\ 0 & 0 & 0 & 0 & 0 & a_{44} \end{pmatrix}. \]
The determinant of D is simply LP M6′ = u1^2 v2^2 a33 a44 . The next lowest principal minor is LP M5′ = u1^2 v2^2 a33 .
Now the U matrix applied to the vector x = (0, 0, x1 , x2 , x3 , x4 )T gives y = U x. Because of the swap of rows
1 and 3, and 2 and 4, the third and fourth elements of y are zero, and the first element is x1 + x2 u2 /u1 +
x3 u3 /u1 + x4 u4 /u1 which when multiplied by D11 gives zero from the constraint equation. Similarly, D22
times the second element y2 is zero, so that
(i) Two variables, one inequality. To maximise f (x, y) subject to g(x, y) ≤ b we look at the boundary
of the region allowed by the inequality. By drawing a sketch of lines of constant f and constant g
we see that if there is a point where ∇f and ∇g are parallel, then this point will give a maximum
of f for the region g(x, y) ≤ b. It is necessary that ∇f and ∇g are parallel and not anti-parallel, i.e.
there must be a positive λ with ∇f = λ∇g. If this is the case, the inequality constraint is binding.
If ∇f and ∇g point in opposite directions, then we can increase f by going in the direction of ∇f and we
are still in the region g(x, y) ≤ b. The boundary of the constraint region is then of no particular
significance for finding the maximum of f subject to g(x, y) ≤ b. In this case we say the constraint
is not binding, or the constraint is ineffective. The maximum is then found by looking for the
unconstrained maximum of f (x, y).
(ii) Complementary Slackness Condition
We define a Lagrangian L(x, y, λ) = f (x, y)−λg(x, y). If the constraint is binding, then the equations
to be solved are
∂L/∂x = 0, ∂L/∂y = 0, g(x, y) = b.
If the constraint is not binding, then the equations to be solved are
∂f /∂x = 0, ∂f /∂y = 0,
and the constraint is ignored. We can neatly capture both of these sets of equations by writing
∂L/∂x = 0, ∂L/∂y = 0, λ(g(x, y) − b) = 0.
The third equation is called the complementary slackness condition. It can either be solved with
λ ≠ 0, in which case we get the binding constraint conditions, or with λ = 0, in which case the
constraint g(x, y) − b = 0 doesn't have to hold, and the Lagrangian L = f − λg reduces to L = f . So
both cases are taken care of automatically by writing the first order conditions as
∂L/∂x = 0, ∂L/∂y = 0, λ(g(x, y) − b) = 0.
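A minimal sketch of complementary slackness in action, on a hypothetical one-variable problem (maximise f = −(x − a)^2 subject to x ≤ 1): for a = 1/2 the maximum is interior and λ = 0, while for a = 2 the constraint binds, x = 1 and λ > 0.

```python
def solve(a):
    """Maximise -(x - a)^2 subject to x <= 1 using the slackness condition."""
    # Binding branch: x = 1, and L_x = -2(x - a) - lam = 0 gives lam = 2(a - 1).
    lam = 2.0 * (a - 1.0)
    if lam > 0.0:
        return 1.0, lam        # constraint binds, lam > 0
    # Otherwise lam = 0 and we solve the unconstrained problem: x = a.
    return a, 0.0

x_slack, lam_slack = solve(0.5)    # interior maximum: x = 0.5, lam = 0
x_bind, lam_bind = solve(2.0)      # binding case:     x = 1,   lam = 2
```

In both cases λ(x − 1) = 0 holds, which is exactly the slackness condition.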
For example, to maximise f = xy subject to x^2 + y^2 ≤ 1, the Lagrangian is
L = xy − λ(x^2 + y^2 − 1).
These are n + k equations for the n + k unknowns x1 , · · · , xn , λ1 , · · · , λk . However, only solutions which
satisfy the 2k inequalities
λi ≥ 0, gi (x) ≤ bi , i = 1, k,
are admissible. If these don’t hold, it means that the stationary point solution of the first order equalities
is either not in the allowed region given by the constraints, or has a negative Lagrange multiplier, which
is not acceptable for a maximum.
If we have a solution of the first order conditions, we must next check which constraints are binding and
which are not. If any λi = 0, in the solution, then that constraint is not binding. Only constraints with
λi > 0 correspond to binding constraints. To test whether the stationary point is indeed a maximum,
we look at the bordered Hessian constructed from the Lagrangian and the binding constraints. Non-
binding constraints are simply ignored when we consider whether the stationary point is a maximum or
not. We should also check the nondegenerate constraint conditions for the binding constraints, that is
check whether the components of ∇gi form a set of linearly independent vectors.
Example: Maximise f (x, y, z) = xyz subject to x + y + z ≤ 3, −x ≤ 0, −y ≤ 0 and −z ≤ 0. Note that we
have written the conditions that x, y and z are positive in the standard form for a maximisation problem.
Solution:
L = xyz − λ1 (x + y + z − 3) + λ2 x + λ3 y + λ4 z
and the first order equalities give
Lx = yz − λ1 + λ2 = 0, Ly = xz − λ1 + λ3 = 0, Lz = xy − λ1 + λ4 = 0,
λ1 (x + y + z − 3) = 0, λ2 x = 0, λ3 y = 0, λ4 z = 0.
The eight inequalities are that the four λi ≥ 0, and the four constraints already listed. The solutions are
analysed by looking first at whether λ1 = 0 or x + y + z − 3 = 0. Four types of solution are found, (a)
x = y = 0, z = any positive value less than 3, λ1 = 0, (b) x = z = 0, y = any positive value less than
3, λ1 = 0, (c) y = z = 0, x = any positive value less than 3, λ1 = 0, (d) x = y = z = 1, λ1 = 1. All
four solutions have λ2 = λ3 = λ4 = 0. The solution (d) is the most interesting, as it provides the local
and global maximum; note that the domain of available x, y and z is bounded by the constraints. At
solution (d) only the constraint x + y + z = 3 is binding, all the others being not binding and so having
λ2 = λ3 = λ4 = 0. To show (d) is a local maximum we can examine the 4 × 4 bordered Hessian with
m = 1 constraint, x + y + z − 3 = 0 (discounting the non-binding constraints) and n = 3 variables. We get
\[ H_B = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix} \]
and evaluating its leading principal minors shows that solution (d) is indeed a local maximum.
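The classification at solution (d) can be checked numerically with the LPM rules from section 3 (here n = 3 variables and m = 1 binding constraint, so we need LPM3 and LPM4):

```python
import numpy as np

# Bordered Hessian at x = y = z = 1 for f = xyz with the binding
# constraint x + y + z = 3 (border (1,1,1), and L_xy = z = 1 etc.).
HB = np.array([[0.0, 1.0, 1.0, 1.0],
               [1.0, 0.0, 1.0, 1.0],
               [1.0, 1.0, 0.0, 1.0],
               [1.0, 1.0, 1.0, 0.0]])

LPM3 = np.linalg.det(HB[:3, :3])   # +2
LPM4 = np.linalg.det(HB)           # -3
# The signs alternate and sgn(LPM4) = (-1)^n = -1: a local maximum.
```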
h1 (x) = c1 , · · · , hm (x) = cm ,
we use the complementary slackness conditions to provide the equations for the Lagrange multipliers
corresponding to the inequalities, and the usual constraint equations to give the Lagrange multipliers
corresponding to the equality constraints. Thus
\[ L = f - \sum_{i=1}^{k} \lambda_i (g_i - b_i) - \sum_{i=1}^{m} \mu_i (h_i - c_i) \]
L = 2y − x^2 − λ1 (1 − x^2 − y^2 ) − λ2 x − λ3 y.
λ1 (1 − x^2 − y^2 ) = 0, λ2 x = 0, λ3 y = 0.
As usual, all the λi ≥ 0. Solutions are x = y = 0, λ1 = λ2 = 0, λ3 = 2 and x = 1, y = 0, λ1 = 1,
λ2 = 0, λ3 = 2. This second solution is a local and global minimum. Note that this second solution has
two binding constraints, x^2 + y^2 ≤ 1 and y ≥ 0. Since ∇(x^2 + y^2 ) = (2, 0) and ∇y = (0, 1) at the critical
point, we note that these are two linearly independent vectors (not parallel), so the NDCQ are satisfied
here.
4(E) The Kuhn-Tucker method
This is an alternative, and slightly simpler method for dealing with the common case where there are
positivity constraints, that is constraints of the form
xi ≥ 0 or − xi ≤ 0.
This is common in practice, where often variables are only meaningful when positive, e.g. production
rates cannot be negative.
The typical problem is maximise f (x) subject to gi (x) ≤ bi , i = 1, k, and xi ≥ 0, i = 1, n. The standard
method for this problem would be to use the n + k Lagrange multipliers corresponding to the n + k
constraints, and proceed as before. This works, but is rather cumbersome, and the Kuhn-Tucker method
is neater.
We define the Kuhn-Tucker Lagrangian
\[ \bar{L} = f - \sum_{i=1}^{k} \lambda_i (g_i - b_i) \]