Optimization Theory for Large Systems
Optimization Theory for Large Systems - Leon S. Lasdon
1
Linear and Nonlinear Programming
The problem of mathematical programming is that of maximizing or minimizing an objective function f(x1, ···, xn) by choice of the vector x = (x1 ··· xn)′. The variables xi may be allowed to take on any values, whereupon the problem is one of unconstrained minimization, or they may be restricted to take on only certain allowable values, whereupon the problem is constrained. Only problems in which (1) the variables xi can vary continuously within the region of interest and (2) the objective and constraint functions are continuous and differentiable are considered here.
If the problem is constrained, its difficulty depends critically on the nature of the constraints, i.e., linear, nonlinear, etc. We consider first the unconstrained case, then the more difficult, constrained one. The constrained case will be divided into two parts: linear constraints and linear objective function (linear programming) and at least one nonlinear constraint and/or nonlinear objective (nonlinear programming).
1.1 Unconstrained Minimization
Necessary and Sufficient Conditions for an Unconstrained Minimum. The problem here is to maximize or minimize a function of n variables, f(x), with no restrictions on the variables x. Many real-life problems are of this form, where whatever constraints are present do not restrict the optimum. Also, many problems in which the constraints are binding can be converted to unconstrained problems or sequences of such problems. Since the problem of maximizing f(x) is equivalent to that of minimizing −f(x), only the minimization problem is considered.
A point x∗ is said to be a global minimum of f(x) if
f(x∗) ≤ f(x) (1)
for all x. If the strict inequality holds for x ≠ x∗ the minimum is said to be unique. If (1) holds only for all x in some neighborhood of x∗, then x* is said to be a local or relative minimum of f(x), since x∗ is only the best point in the immediate vicinity, not in the whole space.
If f(x) is continuous and has continuous first and second partial derivatives for all x, the necessary conditions for a local minimum are [3]
∂f(x∗)/∂xi = 0,  i = 1, 2, . . . , n (2)
and that the matrix of second partial derivatives evaluated at x* be positive semidefinite. Any point x* satisfying (2) is called a stationary point of f(x). Sufficient conditions for a relative minimum are that the matrix of second partial derivatives of f(x) evaluated at x* be positive definite and (2) hold.
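As a concrete illustration (not from the text), the two conditions can be checked for a simple function; the quadratic f below and the tolerances are my own choices, with the gradient and Hessian worked out by hand:

```python
# Check the conditions at a candidate point for the hypothetical function
# f(x1, x2) = (x1 - 1)^2 + 2*x2^2.

def grad(x1, x2):
    # Partial derivatives of f
    return (2.0 * (x1 - 1.0), 4.0 * x2)

def hessian(x1, x2):
    # Matrix of second partials of f (constant, since f is quadratic)
    return ((2.0, 0.0), (0.0, 4.0))

def is_positive_definite_2x2(h):
    # Sylvester's criterion: both leading principal minors positive
    (a, b), (c, d) = h
    return a > 0 and a * d - b * c > 0

x_star = (1.0, 0.0)
g = grad(*x_star)
stationary = abs(g[0]) < 1e-12 and abs(g[1]) < 1e-12
sufficient = stationary and is_positive_definite_2x2(hessian(*x_star))
print(stationary, sufficient)   # prints: True True
```

Here x∗ = (1, 0) satisfies (2), and the Hessian is positive definite, so the sufficient conditions for a relative minimum hold.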
Numerical Methods for Finding Unconstrained Minima. The most obvious approach to finding the minimum of f(x) is to solve (2). These are a set of n equations, usually nonlinear, in the n unknowns xi. Unfortunately the task of solving large sets of nonlinear equations is very difficult. The function f(x) may be so complex that it is difficult even to write out (2) in closed form. Further, even if (2) could be solved, there would be no guarantee that a given solution was not a maximum, saddle point, etc., rather than a minimum. Thus other approaches must be considered.
Gradient. If f(x) is continuous and differentiable, a number of minimization techniques using the gradient of f(x), written ∇f(x), are available. The gradient is the vector whose ith component is ∂f/∂xi. It points in the direction of maximum rate of increase of f(x) (–∇f points in the direction of greatest decrease). The vector ∇f is, at any point x0, normal to the contour of constant function value passing through x0.
Steepest Descent. The method of steepest descent for finding a local minimum of f(x) proceeds as follows. Start at some initial point x0 and compute ∇f(x0). Take a step in the direction of steepest descent,–∇f(x0), using a step length α0, to obtain a new point x1. Repeat the procedure until some stop criterion is satisfied. This process is described by the relations
x0 given
xi+1 = xi − αi∇f(xi),  i = 0, 1, 2, . . . (3)
where αi > 0. The process will, under very mild restrictions [4] on f(x), converge to at least a local minimum of f(x), if the αi are chosen so that
f(xi+1) < f(xi) (4)
for all i, i.e., if the function is made to decrease at each step. Since the function is initially decreasing in the directions given by −∇f(xi), there always exist αi > 0 such that (4) is satisfied.
Step Length and Optimum Gradients. One way to find αi satisfying (4) is to choose αi to minimize the function
g(α) = f(xi − α∇f(xi)) (5)
Note that xi and ∇f(xi) are known vectors so that the only variable in (5) is α. The adaptation of the method of steepest descent which uses (5), called the method of optimum gradients [4], is described by
x0 given
Choose α = αi by minimizing g(α) in (5),
xi+1 = xi − αi∇f(xi) (6)
Set i = i + 1 and repeat.
Geometrically, αi is chosen by minimizing f(x) along the direction si = −∇f(xi) starting from xi. At a minimum,
dg/dα |α=αi = ∇f(xi + αisi)′si = 0 (7)
so the vector xi + αsi must be tangent to a contour at α = αi, for dg is then zero for small changes dα. Since ∇f(xi+1) is normal to the same contour, successive steps are at right angles to one another. Practical methods for carrying out the one-dimensional minimization are discussed later in this section.
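The optimum gradient iteration can be sketched as a short program; the eccentric quadratic test function and the crude sampled one-dimensional search below are illustrative choices of mine, not from the text:

```python
def f(x):
    # Eccentric quadratic: elliptical contours, so the path zigzags
    return x[0] ** 2 + 10.0 * x[1] ** 2

def grad(x):
    return [2.0 * x[0], 20.0 * x[1]]

def line_min(x, s):
    # Crude stand-in for the one-dimensional search: sample
    # g(alpha) = f(x + alpha*s) on a geometric grid, keep the best alpha
    best_a, best_v = 0.0, f(x)
    a = 1e-4
    while a < 1.0:
        v = f([x[0] + a * s[0], x[1] + a * s[1]])
        if v < best_v:
            best_a, best_v = a, v
        a *= 1.1
    return best_a

x = [5.0, 1.0]
for _ in range(100):
    g = grad(x)
    if g[0] ** 2 + g[1] ** 2 < 1e-18:
        break
    s = [-g[0], -g[1]]            # direction of steepest descent
    a = line_min(x, s)
    if a == 0.0:                  # no sampled step decreased f
        break
    x = [x[0] + a * s[0], x[1] + a * s[1]]

print(x, f(x))
```

Each step satisfies (4), and the iterates zigzag toward the minimum at the origin, slowly for this eccentric function, as discussed next.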
Stop Criteria. Some possible stop criteria are as follows:
1. Since, at a minimum, ∂f/∂xi = 0, stop when
|∂f/∂xj| ≤ ε,  j = 1, . . . , n
or
‖∇f(xi)‖ ≤ ε
2. Stop when the change in function value is less than some limit η, i.e.,
f(xi) − f(xi+1) ≤ η
Others are possible. Criterion 2 appears to be the more dependable of the two, provided it is satisfied for several successive values of i.
Local versus Global Minima. The most that can be guaranteed of this or any other iterative minimization technique is that it will find a local minimum, in general the one nearest the starting point x0. To attempt to find all local minima (and thus the global minimum), the method most used is to repeat the minimization from many different initial points.
Numerical Difficulties. The fact that successive steps of the optimum gradient method are orthogonal leads to very slow convergence for some functions. If the function contours are hyperspheres (circles in two dimensions), the method finds the minimum in one step. However, if the contours are in any way eccentric, an inefficient zigzag behavior results, as shown in Figure 1-1. This occurs because, for eccentric contours, the gradient direction is generally quite different from the direction to the minimum. Many, if not most, of the functions occurring in practical applications are ill-behaved in that their contours are eccentric or nonspherical. Thus more efficient schemes are desirable.
FIGURE 1-1 Zigzagging of the normal steepest-descent method on a function with eccentric contours.
Second-Order Gradient Methods. Recently, a number of minimization techniques have been developed which substantially overcome the above difficulties. What appear to be the best of these will be described in detail. First, however, the logic behind these methods will be explained.
Since the first partial derivatives of a function vanish at the minimum, a Taylor-series expansion about the minimum x∗ yields
f(x) = f(x∗) + ½(x − x∗)′Hf(x∗)(x − x∗) + higher-order terms (8)
where Hf(x*), the matrix of second partials of f evaluated at x*, is positive definite. Thus the function behaves like a pure quadratic in the vicinity of x*. It follows that the only methods which will minimize a general function quickly and efficiently are those which (1) work well on a quadratic and (2) are guaranteed to converge eventually for a general function. All others will be slow, at least in the vicinity of the minimum (see Figure 1-1), and often elsewhere.
Conjugate Directions. General minimization procedures can be designed which will minimize a quadratic function of n variables in n steps [5–7]. Most, if not all, are based on the ideas of conjugate directions.
The general quadratic function can be written
q(x) = a + b′x + ½x′Ax (9)
where A is positive definite and symmetric. Let x∗ minimize q(x). Then
∇q(x∗) = b + Ax∗ = 0 (10)
Given a point x0 and a set of linearly independent directions {s0, s1,..., sn-1}, constants βi can be found such that
x∗ − x0 = β0s0 + β1s1 + ··· + βn−1sn−1 (11)
If the directions si are A-conjugate, i.e., satisfy
si′Asj = 0,  i ≠ j (12)
and none are zero, then the si are easily shown to be linearly independent and the βi can be determined from (11) as follows:
sj′A(x∗ − x0) = β0sj′As0 + β1sj′As1 + ··· + βn−1sj′Asn−1 (13)
Using (12),
sj′A(x∗ − x0) = βjsj′Asj (14)
and, using (10),
βj = sj′A(x∗ − x0)/sj′Asj = −sj′(b + Ax0)/sj′Asj (15)
Now consider an iterative minimization procedure, starting at x0 and successively minimizing q(x) down the directions s0, s1, s2, . . . , sn-1, where these directions satisfy (12). Successive points are then determined by the relations
xi+1 = xi + αisi,  i = 0, 1, . . . , n − 1 (16)
where αi is determined by minimizing f(xi + αsi), as in the optimum gradient method, so that
si′∇q(xi + αisi) = si′∇q(xi+1) = 0 (17)
Using (10) in (17) gives
si′(b + A(xi + αisi)) = 0 (18)
or
αi = −si′(b + Axi)/si′Asi (19)
From (16),
xi = x0 + α0s0 + α1s1 + ··· + αi−1si−1 (20)
so that
si′Axi = si′Ax0 (21)
Thus (19) becomes
αi = −si′(b + Ax0)/si′Asi (22)
which is identical to (15). Hence this sequential process leads, in n steps, to the minimum x*.
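The derivation above can be checked numerically. In this sketch (my own example, not the book's), A-conjugate directions are built from the coordinate vectors by Gram–Schmidt in the inner product u′Av, and n = 3 exact line searches along them reach the minimum of the quadratic:

```python
def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]          # symmetric positive definite
b = [1.0, -2.0, 3.0]

# Build A-conjugate directions from the coordinate vectors by
# Gram-Schmidt with the inner product <u, v> = u'Av
dirs = []
for k in range(3):
    e = [1.0 if i == k else 0.0 for i in range(3)]
    s = e[:]
    for d in dirs:
        coef = dot(d, matvec(A, e)) / dot(d, matvec(A, d))
        s = [si - coef * di for si, di in zip(s, d)]
    dirs.append(s)

# Successive exact minimizations: alpha = -s'(b + Ax)/(s'As)
x = [0.0, 0.0, 0.0]
for s in dirs:
    g = [bi + yi for bi, yi in zip(b, matvec(A, x))]   # gradient b + Ax
    alpha = -dot(s, g) / dot(s, matvec(A, s))
    x = [xi + alpha * si for xi, si in zip(x, s)]

# After n = 3 steps the gradient should vanish: x solves Ax = -b
residual = [bi + yi for bi, yi in zip(b, matvec(A, x))]
print(x, residual)
```

The residual gradient is zero to roundoff after exactly three steps, as the derivation predicts.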
Method of Fletcher and Powell. A method recently presented by Fletcher and Powell [5] is probably the most powerful general procedure now known [8] for finding a local minimum of a general function, f(x). It is designed so that, when applied to a quadratic, it minimizes in n iterations. It does this by generating conjugate directions.
Central to the method is a symmetric positive definite matrix Hi which is updated at each iteration, and which supplies the current direction of motion, si, by multiplying the current gradient vector. An iteration is described by the following:
H0 = any positive definite matrix
si = −Hi∇f(xi)
Choose α = αi by minimizing f(xi + αsi),
xi+1 = xi + αisi
Hi+1 = Hi + Ai + Bi (23)
where the matrices Ai and Bi are defined by
Ai = σiσi′/σi′yi,  Bi = −Hiyiyi′Hi/yi′Hiyi (24)
where σi = xi+1 − xi = αisi and yi = ∇f(xi+1) − ∇f(xi).
Note that the numerators of Ai and Bi are both matrices, while the denominators are scalars. Fletcher and Powell prove the following:
The matrix Hi is positive definite for all i. As a consequence of this, the method will usually converge, since
∇f(xi)′si = −∇f(xi)′Hi∇f(xi) < 0 (25)
i.e., the function f is initially decreasing along the direction si, so that the function can be decreased at each iteration by minimizing down si.
When the method is applied to the quadratic (9), then
The directions si (or equivalently σi) are A-conjugate, thus leading to a minimum in n steps.
The matrix Hi converges to the inverse of the matrix of second partials of the quadratic, i.e., Hn = A⁻¹.
When applied to a general function, Hi tends to the inverse of the matrix of second partials of the function evaluated at the minimum since, as the minimum is approached, the second-order terms in the Taylor-series expansion predominate.
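A minimal sketch of the iteration on a two-variable quadratic follows; the test problem is my own, and an exact quadratic line search (using the known Hessian) stands in for the general one-dimensional minimization. On a quadratic, n = 2 iterations should reach the minimum, illustrating the quadratic-termination property:

```python
def grad(x):
    # Gradient of the hypothetical quadratic
    # f(x) = 2*x0^2 + x1^2 + x0*x1 - 4*x0, minimum at (8/7, -4/7)
    return [4.0 * x[0] + x[1] - 4.0, 2.0 * x[1] + x[0]]

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

A = [[4.0, 1.0], [1.0, 2.0]]      # Hessian of the quadratic
H = [[1.0, 0.0], [0.0, 1.0]]      # H0: any positive definite matrix
x = [0.0, 0.0]

for _ in range(2):                 # n = 2 iterations suffice on a quadratic
    g = grad(x)
    Hg = matvec(H, g)
    s = [-Hg[0], -Hg[1]]                       # direction s = -H grad f
    alpha = -dot(s, g) / dot(s, matvec(A, s))  # exact line minimum (quadratic)
    x_new = [x[0] + alpha * s[0], x[1] + alpha * s[1]]
    g_new = grad(x_new)
    sigma = [alpha * s[0], alpha * s[1]]       # sigma = x_{i+1} - x_i
    y = [g_new[0] - g[0], g_new[1] - g[1]]     # y = change in gradient
    # Update: H <- H + sigma sigma'/(sigma'y) - H y y' H/(y'Hy)
    Hy = matvec(H, y)
    c1 = dot(sigma, y)
    c2 = dot(y, Hy)
    H = [[H[i][j] + sigma[i] * sigma[j] / c1 - Hy[i] * Hy[j] / c2
          for j in range(2)] for i in range(2)]
    x = x_new

print(x)   # lands on the minimum (8/7, -4/7)
```

After the two iterations H has also become the inverse Hessian A⁻¹, consistent with the property stated above.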
FIGURE 1-2 Contours of the Rosenbrock function.
Numerical tests bear out the rapid convergence of this method. Consider, for example, the function
f(x) = 100(x2 − x1²)² + (1 − x1)² (26)
called the Rosenbrock function [9], whose contours are given in Figure 1-2. The minimum is at (1, 1) and the steep curving valley makes minimization difficult. The paths taken by the method of Fletcher and Powell and by the optimum gradient technique are shown in Figures 1-3 and 1-4, respectively. The Fletcher–Powell technique follows the curved valley and minimizes very efficiently.
FIGURE 1-3 Fletcher–Powell method minimizing the Rosenbrock function.
FIGURE 1-4 Optimum gradient method minimizing the Rosenbrock function.
An excellent reference, which derives this algorithm as a member of a class of methods and gives numerical comparisons with other techniques, has been prepared by Pearson [21].
Conjugate Gradient Method. Other conjugate direction minimization techniques also exist. One of these, due to Fletcher and Reeves [6], requires computation of the gradient of f(x) and the storage of only one additional vector, the actual direction of search. The algorithm is
x0 arbitrary
s0 = −∇f(x0)
Choose αi to minimize f(xi + αsi),
xi+1 = xi + αisi
where
si+1 = −∇f(xi+1) + βisi,  βi = ‖∇f(xi+1)‖²/‖∇f(xi)‖²
This method is not quite as efficient [8] as the Fletcher–Powell technique but requires much less storage, a significant advantage when the number of variables, n, is large.
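A sketch of the Fletcher–Reeves iteration on a larger quadratic makes the storage advantage concrete: only the current gradient and direction vectors are kept, never an n × n matrix. The tridiagonal test problem and the exact quadratic line search are my own choices:

```python
n = 20

def matvec(v):
    # A: tridiagonal, 2 on the diagonal, -1 off it (positive definite)
    out = [2.0 * vi for vi in v]
    for i in range(n - 1):
        out[i] -= v[i + 1]
        out[i + 1] -= v[i]
    return out

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

b = [1.0] * n                  # minimize (1/2)x'Ax - b'x, gradient Ax - b
x = [0.0] * n
g = [gi - bi for gi, bi in zip(matvec(x), b)]
s = [-gi for gi in g]          # s0 = -grad f(x0)

for _ in range(5 * n):
    if dot(g, g) < 1e-24:
        break
    As = matvec(s)
    alpha = -dot(s, g) / dot(s, As)                     # exact minimum along s
    x = [xi + alpha * si for xi, si in zip(x, s)]
    g_new = [gi + alpha * ai for gi, ai in zip(g, As)]  # gradient update
    beta = dot(g_new, g_new) / dot(g, g)                # Fletcher-Reeves beta
    s = [-gn + beta * si for gn, si in zip(g_new, s)]   # next direction
    g = g_new

residual = [abs(gi - bi) for gi, bi in zip(matvec(x), b)]
print(max(residual))
```

The final gradient is negligible; the same loop works unchanged for n in the thousands, where storing and updating an n × n matrix H would be costly.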
Minimization without Derivatives. There are also a number of minimization techniques which do not require derivatives. Of these, tests performed thus far indicate that Powell’s method [7] is the most efficient [8]. Each iteration requires n one-dimensional minimizations down n linearly independent directions, s1, s2, ..., sn. As a result of these minimizations a new direction, s, is defined and, if a test is passed, s replaces one of the original directions, sr. The process is usually started from the best estimate of the minimum, x0, with the initial si’s being the coordinate directions.
The procedure is as follows:
1. For r = 1, 2, . . . , n calculate αr so that f(xr−1 + αrsr) is a minimum and define xr = xr−1 + αrsr.
2. Find the integer m, 1 ≤ m ≤ n, so that [f(xm−1) − f(xm)] is a maximum, and define ∆ = f(xm−1) − f(xm).
3. Calculate f3 = f(2xn − x0) and define f1 = f(x0) and f2 = f(xn).
4. If either f3 ≥ f1 and/or (f1 − 2f2 + f3)(f1 − f2 − ∆)² ≥ ½∆(f1 − f3)², use the old directions s1, s2, . . . , sn for the next iteration and use xn for the next x0.
5. Otherwise, defining s = xn − x0, calculate α so that f(xn + αs) is a minimum. Use s1, s2, . . . , sm−1, sm+1, sm+2, . . . , sn, s as the directions for the next iteration and xn + αs for the next x0.
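The core idea — n derivative-free line searches, after which the net displacement s replaces one of the old directions — can be sketched as below. This is my own simplification, not Powell's exact procedure: it always replaces the direction of largest decrease and omits Powell's acceptance test; the test function and the golden-section line search are also my choices:

```python
import math

PHI = (math.sqrt(5.0) - 1.0) / 2.0

def f(x):
    # Hypothetical quadratic with minimum at (1, 2)
    return ((x[0] - 1.0) ** 2 + 10.0 * (x[1] - 2.0) ** 2
            + (x[0] - 1.0) * (x[1] - 2.0))

def line_min(x, s, lo=-4.0, hi=4.0, iters=80):
    # Derivative-free golden-section search for the alpha minimizing
    # f(x + alpha*s); assumes the minimum lies in [lo, hi]
    g = lambda a: f([x[0] + a * s[0], x[1] + a * s[1]])
    a, b = lo, hi
    for _ in range(iters):
        c = b - PHI * (b - a)
        d = a + PHI * (b - a)
        if g(c) < g(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

x0 = [0.0, 0.0]
dirs = [[1.0, 0.0], [0.0, 1.0]]       # start with the coordinate directions

for _ in range(8):
    x = x0[:]
    decreases = []
    for s in dirs:
        before = f(x)
        a = line_min(x, s)
        x = [x[0] + a * s[0], x[1] + a * s[1]]
        decreases.append(before - f(x))
    s_new = [x[0] - x0[0], x[1] - x0[1]]      # net displacement of the cycle
    if max(abs(s_new[0]), abs(s_new[1])) > 1e-12:
        m = decreases.index(max(decreases))   # direction of largest decrease
        dirs.pop(m)
        dirs.append(s_new)                    # s replaces s_m
        a = line_min(x, s_new)
        x = [x[0] + a * s_new[0], x[1] + a * s_new[1]]
    x0 = x

print(x0, f(x0))
```

On this quadratic the displacement directions of successive cycles become conjugate, and the iterates reach the minimum after two cycles, mirroring the conjugate-direction behavior discussed earlier.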
One-Dimensional Minimization. All the methods discussed thus far have searched for a minimum in n-dimensional space by performing one-dimensional minimizations down a set of directions {si}. Thus the efficiency of any such procedure depends critically on the efficiency of the method used to solve the single-dimensional search. Three techniques are presented. The first two use polynomial interpolation, one requiring derivatives, the second only function values. The third, the Fibonacci method, also requires only function values. Unlike the interpolation methods, it does not depend on smoothness of the function being minimized, and may be applied even to discontinuous functions.
For both interpolative procedures, the variables x1,..., xn are scaled so that a unit change in any variable is a significant but not too large percentage change in that variable. For example, if a variable is expected to have a value around 100 units, then a 1-unit change would be considered significant, whereas a 10-unit change would be too large.
Cubic Interpolation. This technique, described in [6], solves in three stages the problem of finding the smallest nonnegative α, α∗, for which the function
g(α) = f(x + αs) (27)
attains a local minimum. It uses the derivative
g′(α) = s′∇f(x + αs) (28)
The first stage normalizes the s vector so that a step size α = 1 is acceptable. The second stage establishes bounds on α*, and the third stage interpolates its value.
STAGE 1. Calculate
with (s)j the jth component of s, and divide each component of s by A. This ensures that s is a reasonable change in x.
STAGE 2. Evaluate g(α) and g′(α) at the points α = 0, 1, 2, 4, . . . , a, b, where b is the first of these values at which either g′ is nonnegative or g has not decreased. It then follows that α∗ is bounded in the interval a < α∗ ≤ b. If g(1) is much greater than g(0), divide the components of s by a factor, e.g., 2 or 3, and repeat this stage.
STAGE 3. A cubic polynomial is now fitted to the four values g(a), g′(a), g(b), g′(b), and its minimum, αe, is taken to be the value for α∗. It is shown in [6] that the cubic has a unique minimum in the interval (a, b) which is given by
αe = b − [(g′(b) + w − z)/(g′(b) − g′(a) + 2w)](b − a) (29)
where
z = 3[g(a) − g(b)]/(b − a) + g′(a) + g′(b) (30)
w = [z² − g′(a)g′(b)]^½ (31)
If neither g(a) nor g(b) is less than g(αe), then αe is accepted as the estimate of α∗. Otherwise, according as g′(αe) is positive or negative, the interpolation is repeated over the subinterval (a, αe) or (αe, b), respectively.
It is interesting that for small values of g′(a) and g′(b), the cubic has the shape that a flat metal spring would assume if fitted to the points, a, b, with slopes g′(a), g′(b).
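The stage-3 fit — a cubic matched to g(a), g′(a), g(b), g′(b), minimized in closed form — can be sketched as a small function using the standard cubic-interpolation formulas; the sample g below (itself a cubic, so the fit is exact) is my own choice:

```python
import math

def cubic_step(a, b, ga, gb, dga, dgb):
    # Minimum on (a, b) of the cubic fitted to g(a), g'(a), g(b), g'(b)
    z = 3.0 * (ga - gb) / (b - a) + dga + dgb
    w = math.sqrt(z * z - dga * dgb)
    return b - (dgb + w - z) / (dgb - dga + 2.0 * w) * (b - a)

g  = lambda a: a ** 3 - 3.0 * a       # sample function, local minimum at 1
dg = lambda a: 3.0 * a ** 2 - 3.0

a, b = 0.0, 2.0                       # bracket: dg(0) < 0, dg(2) > 0
alpha_e = cubic_step(a, b, g(a), g(b), dg(a), dg(b))
print(alpha_e)                        # exact here, since g itself is a cubic
```

For general smooth g the returned αe is only an estimate, and the acceptance test above decides whether a further interpolation over a subinterval is needed.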
Quadratic Interpolation. If derivatives are not available or are difficult to compute, then quadratic interpolation should be used in the one-dimensional minimization. The procedure can again be described in three steps.
STAGE 1. This is the same as stage 1 above.
STAGE 2. Evaluate g(α) at the points α = 0, 1, 2, 4, . . . , a, b, c, where c is the first of these values at which g has increased. Then α∗ is bounded in the interval a < α∗ < c.
Again, if g(1) » g(0), then divide the components of s by a factor, e.g., 2 or 3, and repeat.
STAGE 3. A quadratic polynomial is now fitted to the three values g(a), g(b), g(c), and its minimum, αe, is
αe = b − ½ {(b − a)²[g(b) − g(c)] − (b − c)²[g(b) − g(a)]} / {(b − a)[g(b) − g(c)] − (b − c)[g(b) − g(a)]} (32)
If g(αe) < g(b), then αe is accepted as the estimate of α∗. Otherwise, b is taken as α∗.
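The stage-3 parabola fit can likewise be written down directly; the three-point formula below is the standard one, and the sample g (itself a parabola, so the fit is exact) is my own choice:

```python
def quad_step(a, b, c, ga, gb, gc):
    # Minimum of the parabola through (a, ga), (b, gb), (c, gc)
    num = (b - a) ** 2 * (gb - gc) - (b - c) ** 2 * (gb - ga)
    den = (b - a) * (gb - gc) - (b - c) * (gb - ga)
    return b - 0.5 * num / den

g = lambda a: (a - 1.3) ** 2          # sample parabola, minimum at 1.3
alpha_e = quad_step(0.0, 1.0, 2.0, g(0.0), g(1.0), g(2.0))
print(alpha_e)
```

Only function values enter, which is the point of the method: no derivative of g is ever computed.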
Fibonacci Technique. Unlike the previous technique, the Fibonacci method [20] does not use derivatives. It can thus deal with functions which are not differentiable or even continuous, functions to which the previous techniques could not be applied. It minimizes by assuming that the optimal value of the variable is within some initial interval, called the initial interval of uncertainty. It then systematically reduces this interval by evaluating the function within the interval, thus closing in
on the optimal point. To do this without gradients, one must assume something about the function to be minimized. The Fibonacci technique proceeds on the assumption that this function is unimodal within the initial interval of uncertainty.
Unimodality. Roughly speaking, a unimodal function is one that has only one peak (max or min). More precisely, a function of one variable is said to be unimodal if, given that two values of the variable are on the same side of the optimum, the one nearer the optimum gives the better functional value, i.e., the smaller value in the case of a minimization problem. Mathematically, this is phrased as follows. Let the minimum be at x∗, and define x1 < x2. Then the function f(x) is unimodal if
x2 < x∗ implies f(x1) > f(x2)
and
x1 > x∗ implies f(x2) > f(x1)
This property enables us to reduce any initial interval of uncertainty by function evaluations alone. Consider the normalized interval [0, 1] and two function evaluations (henceforth called experiments) within it, as shown in Figure 1-5. If the function is unimodal and, as shown, f(x1) < f(x2), then the minimizing x cannot lie to the right of x2. Thus that part of the interval can be discarded, and a new smaller interval of uncertainty results, as shown in Figure 1-6. If f(x1) > f(x2), then the interval [0, x1] would have been discarded, while if f(x1) = f(x2), then both [0, x1] and [x2, 1] can be discarded. Moreover, if one of the original experiments remains within this new interval, only one other experiment need be placed within it in order that the process be repeated. The Fibonacci method places the experiments so that one of the original experiments always remains. The method makes use of the sequence of Fibonacci numbers, {Fn}, defined by
F0 = F1 = 1,  Fn = Fn−1 + Fn−2,  n ≥ 2
yielding the sequence 1, 1, 2, 3, 5, 8, 13, . . . . It assumes that n experiments are to be made, and proceeds as follows.
FIGURE 1-5 Initial interval of uncertainty.
FIGURE 1-6 Reduced interval of uncertainty.
Method. Let the initial interval of uncertainty be L0 and define L1 = (Fn−2/Fn)L0. Place the first two experiments L1 units in from the two ends of the interval; each is then (Fn−1/Fn)L0 units from the other end. Discard some part of the interval using the unimodality assumption. There then remains a new smaller interval of uncertainty with one experiment left in it, that experiment being some distance in from (any) one of the ends. Place a new experiment the same distance from the other end and repeat. Stop when n experiments have been performed.
Example. The function to be minimized is shown in Figure 1-7. Let n, the number of experiments to be performed, be 5. Then L1 = (F3/F5)L0 = (3/8)L0, so the first two experiments x1 and x2 are each placed (3/8)L0 in from one end of the interval, i.e., (5/8)L0 from the other, as shown. Discard [x1, L0]. The experiment x2 is (3/8)L0 in from x = 0. Place x3 (3/8)L0 in from x = x1 and discard [0, x3]. Place x4 (1/8)L0 in from x1 and discard [x4, x1]. The experiment x2 is now located in the middle of the remaining interval, [x3, x4], and by past procedure we should place the last experiment right on top of it. Since this would yield no new information, we displace the last experiment, x5, by a small amount, obtaining the final interval of uncertainty [x2, x4].
Note that, after discarding the first interval, [x1, L0], the remaining interval with its one interior experiment is the initial configuration of a Fibonacci search using n = 4 experiments. Thus any section of a Fibonacci search can be viewed as the beginning of a new Fibonacci search with a smaller number of experiments left to be done and a smaller initial interval of uncertainty.
FIGURE 1-7 Minimizing by Fibonacci search.
Minimax Optimality. The Fibonacci technique is an optimal search technique in a particular sense. Consider a three-experiment search in the interval [0, 1] with experiments at x1 = 0.1, x2 = 0.4, x3 = 0.8. Any of x1, x2, or x3 could yield the smallest function value (the possibility of equality is excluded), as shown in Figure 1-8. K is the index of the smallest experiment and l3 is the final interval of uncertainty remaining after three experiments. It is evident that l3 depends both on K and the positioning of the experiments x1, x2, x3. Thus, in the n-experiment case, we may write
ln = ln(K; x1, . . . , xn)
FIGURE 1-8 All possible three experiment outcomes.
An obvious requirement of a good search plan is that it make the final interval of uncertainty as small as possible, and that it do this no matter what (unimodal) function it operates on. Since ln depends on K, thus on the function being minimized, minimizing it would yield a plan good only for a particular function. We can remedy this by defining
Ln = maxK ln(K; x1, . . . , xn)
where Ln is the maximum final interval of uncertainty obtained over all outcomes K. This is independent of K, hence independent of the function being minimized. It is proved [20] that the Fibonacci search minimizes the maximum final interval of uncertainty; i.e., the final interval of uncertainty for the Fibonacci method is given by
Ln = L0/Fn
(assuming a unimodal function). It is a rather conservative criterion, yet leads to very effective search results. Table 1 gives the ratio L0/Ln = Fn versus n. It is seen from this that Fn grows rapidly with n and that the interval L0 is thus reduced quite rapidly.
TABLE 1
Constrained Optimization Problems
Attention is now focused on problems of constrained minimization, i.e., problems in which the variables x1, ..., xn may take on only certain allowable values. Such a situation in two dimensions is shown in Figure 1-9. The unshaded area is the set of allowable values of x1 and x2, henceforth called the constraint set. Its boundaries are the curves x1 = 0, x2 = 0, g1(x) = 0, g2(x) = 0. The constraint set in Figure 1-9 is the set of all points satisfying the corresponding inequalities.
FIGURE 1-9 Constraint set.
A general programming problem may have equality as well as inequality constraints. The equalities often describe the operation of the system under consideration, while the inequalities define limits within which certain physical variables must lie. Thus the general problem may be written
minimize f(x)
subject to
gi(x) ≥ 0,  i = 1, . . . , m
hj(x) = 0,  j = 1, . . . , p
When all functions f, gi, hj are linear, the problem is one of linear programming; if not, then nonlinear programming. The field of linear programming is by far the most fully developed and is considered first. Nonlinear programming problems are considerably more difficult and are considered later.
1.2 Linear Programming
1.2.1 Simplex Method
Geometry of Linear Programs. Consider the problem
subject to
(1)
The constraint set is the unshaded region of Figure 1-10 defined by the intersections of the half-spaces satisfying the linear inequalities. The numbered points are called extreme points of the set. If the constraints are linear there are only a finite number of such points.
FIGURE 1-10 Geometry of a linear program.
Contours of constant value of the objective function, z, are defined by the linear equation
(2)
As c is varied, the line is moved parallel to itself. The maximum value of z is the largest c whose line has at least one point in common with the constraint set. For the figure shown, this occurs for c = 5, at the corresponding optimal values of x1 and x2. Note that the maximum value occurs at an extreme point of the constraint set. If the problem had been to minimize z, the minimum is at the origin, which is again an extreme point. If the objective function had been z = 2x1 + 2x2, the line z = constant would be parallel to the constraint boundary x1 + x2 = 2; the maximum would then occur at the two extreme points of that boundary, one of which is (x1 = 2, x2 = 0), and, in fact, also occurs at all points on the line segment joining these extreme points.
Two additional possibilities exist. If the constraint x1 + x2 ≤ 2 had been removed, the constraint set would have appeared as in Figure 1-11; i.e., the set would have been unbounded. Then max z is also unbounded, since z can be made as large as desired subject to the constraints. Of course, on the opposite extreme, the constraint set could have been empty, as in the case where x1 + x2 ≤ 2 is replaced by x1 + x2 ≤ −1. Thus a linear programming problem may have (1) no solution, (2) an unbounded solution, (3) a finite optimal solution, or (4) an infinite number of optimal solutions. The methods to be developed deal with all these possibilities.
FIGURE 1-11 Unbounded minimum.
The fact that the minimum of a linear programming problem always occurs at an extreme point of the constraint set is the single most important property of linear programs. It is true for any number of variables (i.e., more than two dimensions) and forms the basis for the simplex method for solving linear programs.
Of course, in many dimensions the geometrical ideas used here cannot be visualized and one must characterize extreme points algebraically. This is done in the next two sections, where the problem is placed in standard form and the basic theorems of linear programming are stated.
Standard Form for Linear Programs. A linear programming problem can always be written in the following form. Choose x = (x1, x2, ..., xn) to minimize
z = c1x1 + c2x2 + ··· + cnxn (3)
subject to
ai1x1 + ai2x2 + ··· + ainxn = bi,  i = 1, . . . , m (4)
xj ≥ 0,  j = 1, . . . , n (5)
or, in matrix form,
minimize cx
subject to
Ax = b,  x ≥ 0
where A is an m × n matrix of constants. If any of the equations (4) were redundant, i.e., linear combinations of the others, they could be deleted without changing any solutions of the system. If there are no solutions, or if there is only one, there can be no optimization. Thus the case of greatest interest is where the system of equations (4) is nonredundant and has at least two, hence an infinite number, of solutions. This occurs if and only if rank(A) = m < n.
We assume the above is true in what follows. The problem of linear programming is to first detect whether solutions exist, and, if so, to find one yielding minimum z.
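This rank assumption is easy to check numerically. The sketch below uses NumPy on a small hypothetical constraint matrix (my own example, not one from the text) to verify that the rows are nonredundant and that there are more unknowns than equations:

```python
# Hypothetical 2 x 4 constraint matrix (m = 2 equations, n = 4 unknowns);
# not the book's system, just an illustration of the rank check.
import numpy as np

A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 2.0, 0.0, 1.0]])
m, n = A.shape
rank = np.linalg.matrix_rank(A)

# Nonredundant rows (rank = m) and more unknowns than equations (m < n):
# the system Ax = b then has infinitely many solutions for any b.
print(rank == m and m < n)   # prints True
```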
Note that all the constraints in (4) are equalities and that all variables xj are assumed to be nonnegative. It is necessary to place the problem in this form to solve it most easily (equations are easier to work with here than inequalities). If the original system is not of this form, it may easily be transformed by use of the following devices.
Slack Variables. If a given constraint is an inequality,

ai1x1 + ai2x2 + ··· + ainxn ≤ bi

then define a slack variable xn+i ≥ 0 such that

ai1x1 + ai2x2 + ··· + ainxn + xn+i = bi

and the inequality becomes an equality. Similarly, if the inequality is

ai1x1 + ai2x2 + ··· + ainxn ≥ bi

we write

ai1x1 + ai2x2 + ··· + ainxn − xn+i = bi

Note that the slacks must be nonnegative in order that the original inequalities be satisfied.
Nonnegative Variables. If, in the original formulation of the problem, a given variable, xk, is not constrained to be nonnegative, we write it as the difference of two nonnegative variables, i.e.,

xk = xk′ − xk″,  xk′ ≥ 0, xk″ ≥ 0
This adds more variables to the problem, but, since nonnegativity restrictions actually simplify the solution of linear programs, it is well worth it. After the solution, we transform back to obtain xk.
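These two devices are mechanical enough to automate. The following sketch (my own helper with hypothetical names, not from the text) converts a problem with ≤ constraints and possibly unrestricted variables into standard form by splitting each free variable and appending one slack column per inequality:

```python
import numpy as np

def to_standard_form(c, A_ub, b_ub, free_vars=()):
    """Convert  min c.x  s.t.  A_ub x <= b_ub,  x_j >= 0 for j not in
    free_vars, into standard form  min c'.x'  s.t.  A' x' = b, x' >= 0,
    by splitting each free variable x_k = x_k' - x_k'' and then adding
    one nonnegative slack variable per inequality row."""
    A_ub = np.asarray(A_ub, dtype=float)
    c = np.asarray(c, dtype=float)
    m, n = A_ub.shape
    # Split free variables: each gets a negated companion column.
    neg_cols = [-A_ub[:, k] for k in free_vars]
    neg_costs = [-c[k] for k in free_vars]
    # Slack columns form an identity block; slacks cost nothing.
    A_eq = np.column_stack([A_ub] + neg_cols + [np.eye(m)])
    c_eq = np.concatenate([c, np.array(neg_costs, dtype=float), np.zeros(m)])
    return c_eq, A_eq, np.asarray(b_ub, dtype=float)

# Hypothetical example: min x1 + 2*x2  s.t.  x1 + x2 <= 2, -x1 + x2 <= 1, x >= 0.
c_eq, A_eq, b_eq = to_standard_form([1.0, 2.0],
                                    [[1.0, 1.0], [-1.0, 1.0]],
                                    [2.0, 1.0])
```

Calling the same function with `free_vars=(0,)` would also split x1 into xk′ − xk″, adding one more column, just as described above.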
Example. Transform the following linear program into standard form:
subject to
SOLUTION. Define
Also define slack variables x3 ≥ 0, x4 ≥ 0. Then the problem becomes
subject to
Basic Theorems of Linear Programming. We now proceed to generalize the ideas illustrated earlier from two to n dimensions. Proofs of the following theorems may be found in Gass [1]. First a number of standard definitions are made.
DEFINITION 1. A feasible solution to the linear programming problem is a vector x = (x1, x2, . . . , xn) which satisfies the equations (4) and the nonnegativities (5).
DEFINITION 2. A basis matrix is an m × m nonsingular matrix formed from some m columns of the constraint matrix A. (Note: since rank(A) = m, A contains at least one basis matrix.)
DEFINITION 3. A basic solution to a linear program is the unique vector determined by choosing a basis matrix, setting the n − m variables associated with columns of A not in the basis matrix equal to zero, and solving the resulting square, nonsingular system of equations for the remaining m variables.
DEFINITION 4. A basic feasible solution is a basic solution in which all variables have nonnegative values. [Note: By Definition 3, at most m variables can be positive.]
DEFINITION 5. A nondegenerate basic feasible solution is a basic feasible solution with exactly m positive xi.
DEFINITION 6. An optimal solution is a feasible solution which also minimizes z in (3).
For example, in the system
(6)
obtained from (1) by adding slack variables x3, x4, the matrix
formed from columns 3 and 4 of (6) is nonsingular, and hence is a basis matrix. The corresponding basic solution
is a nondegenerate basic feasible solution. The matrix
formed from columns 1 and 4 of (6) is also a basis matrix. The corresponding basic solution is obtained by setting x2 = x3 = 0 and solving
yielding x1 = −1, x4 = 3. This basic solution is not feasible.
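The calculations above can be reproduced for any choice of basis columns. The helper below (my own sketch, applied to a hypothetical 2 × 4 stand-in since system (6) is not reproduced here) forms the basis matrix from the chosen columns, zeros the remaining variables, and solves for the basic ones, exactly as Definition 3 prescribes:

```python
import numpy as np

def basic_solution(A, b, basis):
    """Basic solution of the equality system A x = b (Definition 3):
    variables outside `basis` are set to zero, and the m basis variables
    solve B x_B = b, where B is formed from the chosen columns of A."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    B = A[:, list(basis)]          # m x m basis matrix (must be nonsingular)
    x = np.zeros(A.shape[1])
    x[list(basis)] = np.linalg.solve(B, b)
    return x

# Stand-in 2 x 4 system (NOT the book's system (6)).
A = [[-1.0, 1.0, 1.0, 0.0],
     [ 1.0, 1.0, 0.0, 1.0]]
b = [1.0, 2.0]

x_slack = basic_solution(A, b, [2, 3])   # basis from the slack columns: feasible
x_other = basic_solution(A, b, [0, 3])   # x1 comes out negative: not feasible
```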
The importance of these definitions is brought out by the following theorems:
THEOREM 1. The objective function, z, assumes its minimum at an extreme point of the constraint set. If it assumes its minimum at more than one extreme point, then it takes on the same value at every point of the line segment joining any two optimal extreme points.
This theorem is a multidimensional generalization of the geometric arguments given previously. By Theorem 1, in searching for a solution, we need only look at extreme points. It is thus of interest to know how to characterize extreme points in many dimensions algebraically. This information is given by the next theorem.
THEOREM 2. A vector x = (x1, . . . , xn) is an extreme point of the constraint set of a linear programming problem if and only if x is a basic feasible solution of the constraints (4)–(5).
Theorem 2 is true in two dimensions, as can be seen from the example of relations (1), whose constraints have been rewritten in equation form in (6).
The (x1, x2) coordinates of the extreme point at x1 = 0, x2 = 1 are given by the (x1, x2) coordinates of the basic feasible solution
The optimal extreme point corresponds to the basic feasible solution
Theorems 1 and 2 imply that, in searching for an optimal solution, we need only consider extreme points, hence only basic feasible solutions. Since a basic feasible solution has at most m of the n variables positive, an upper bound on the number of basic feasible solutions is the number of ways m variables can be selected from a group of n variables, which is

n!/(m!(n − m)!)
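To see why this bound matters, one can carry out the naive search for a small problem: enumerate every candidate basis, discard the singular and infeasible ones, and keep the best objective value. The sketch below does this for a hypothetical 2 × 4 system (not one from the text); the combinatorial loop over all bases is precisely what the simplex method avoids:

```python
import numpy as np
from itertools import combinations

def brute_force_lp(c, A, b, tol=1e-9):
    """Minimize c.x over A x = b, x >= 0 by enumerating all C(n, m)
    choices of m columns: skip singular basis matrices and infeasible
    basic solutions, keep the basic feasible solution with smallest c.x."""
    c = np.asarray(c, dtype=float)
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    m, n = A.shape
    best_x, best_z = None, np.inf
    for basis in combinations(range(n), m):
        B = A[:, basis]
        if abs(np.linalg.det(B)) < tol:
            continue                    # columns dependent: not a basis matrix
        x = np.zeros(n)
        x[list(basis)] = np.linalg.solve(B, b)
        if x.min() < -tol:
            continue                    # basic but not feasible
        z = float(c @ x)
        if z < best_z:
            best_x, best_z = x, z
    return best_x, best_z

# Hypothetical problem: maximize x1 + x2 (i.e., minimize -x1 - x2) subject to
# -x1 + x2 + x3 = 1,  x1 + x2 + x4 = 2,  x >= 0.
x_opt, z_opt = brute_force_lp([-1.0, -1.0, 0.0, 0.0],
                              [[-1.0, 1.0, 1.0, 0.0],
                               [ 1.0, 1.0, 0.0, 1.0]],
                              [1.0, 2.0])
```

In this example two distinct basic feasible solutions attain the same minimum z = −2, illustrating the multiple-optima case of Theorem 1.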
For large n and m this is still a large number. Thus, for large problems, it would be impossible to evaluate z at all extreme points to find the minimum. What is needed is a computational scheme which selects, in an orderly fashion, a sequence of extreme points, each one yielding a lower value of z, until finally the minimum is attained. In this way we consider only a small subset of the set of all possible extreme points. The simplex method, devised by G. B. Dantzig, is such a scheme. This procedure finds an extreme point and determines whether or not it is optimal. If not, it finds a neighboring extreme point at which the value of z is less than or equal to the previous value. The process is iterated. In a finite number of steps (usually between m and 2m) the minimum is found. The simplex method also discovers whether the problem has no finite minimal solution (i.e., min z = −∞) or if it has no feasible solutions (i.e., an empty constraint set). It is a powerful scheme for solving any linear programming problem.
To explain the method, it is necessary to know how to go from one basic feasible solution (b.f.s.) to another, how to identify an optimal b.f.s., and how to find a better b.f.s. from a b.f.s. that is not optimal. We consider these questions in the following two sections. The notation and approach used is that of Dantzig [2].
Systems of Linear Equations and Equivalent Systems. Consider the system of m linear equations in n unknowns
a11x1 + a12x2 + ··· + a1nxn = b1
a21x1 + a22x2 + ··· + a2nxn = b2
. . . . . . . . . . . . . . . . . . . .
am1x1 + am2x2 + ··· + amnxn = bm (7)
A solution to this system is any set of values of the variables x1, . . . , xn which simultaneously satisfies all the equations. The set of all solutions to the system is called its solution set. The system may have one, many, or no solutions. If it has none, the equations are said to be inconsistent, and their solution set is empty.
Equivalent Systems and Elementary Operations. Two systems of equations are said to be equivalent if they have the same solution sets. It is proved in Dantzig [2] that the following operations transform a given linear system into an equivalent system:
Multiplying any equation, Et, by a constant k ≠ 0.
Replacing any equation, Et, by the equation Et + kEi, where Ei is any other equation of the system.
These operations are called elementary row operations. For example, the linear system of equations (6)
may be transformed into an equivalent system by multiplying the first equation by −1 and adding it to the second, yielding
Note that the solution x1 = 0, x3 = 0, x2 = 1, x4 = 2 is a solution of both systems. In fact, any solution of one system is a solution of the other.
Pivoting. A particular sequence of elementary row operations finds special application in linear programming. This sequence is called a pivot operation, defined as follows.
DEFINITION. A pivot operation consists of m elementary operations which replace a linear system by an equivalent system in which a specified variable has a coefficient of unity in one equation and zero elsewhere. The detailed steps are as follows:
Select a term arsxs in row (equation) r, column s, with ars ≠ 0, called the pivot term.
Replace the rth equation by the rth equation multiplied by 1/ars.
For each i = 1, 2, . . . , m except i = r, replace the ith equation, Ei, by Ei − (ais/ars)Er, i.e., by the sum of Ei and the new rth equation (from the previous step) multiplied by −ais.
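A minimal sketch of these steps, operating on an array whose rows hold the coefficients and right-hand side of each equation (this data layout is my own assumption, not the book's notation):

```python
import numpy as np

def pivot(T, r, s):
    """Pivot operation on the array T, whose rows are the equations
    [coefficients | rhs]. First scale row r so the pivot coefficient
    T[r, s] becomes 1, then subtract multiples of the new row r from
    every other row so that column s is zero elsewhere. Modifies T
    in place and returns it."""
    T[r] = T[r] / T[r, s]              # unit coefficient in the pivot row
    for i in range(T.shape[0]):        # eliminate x_s from the other rows
        if i != r:
            T[i] = T[i] - T[i, s] * T[r]
    return T

# Hypothetical 2-equation system [coefficients | rhs]:
#   2*x1 + 1*x2 = 8
#   1*x1 + 3*x2 = 7
T = np.array([[2.0, 1.0, 8.0],
              [1.0, 3.0, 7.0]])
pivot(T, 0, 0)   # pivot on the 2*x1 term
# row 0 becomes [1, 0.5, 4]; row 1 becomes [0, 2.5, 3]
```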
Example. Consider the system
(E1)
(E2)
(E3)
Let us transform to an equivalent system in which x1 is eliminated from all equations but (E1), where it is to have a unit coefficient. Thus choose the term 2x1 as the pivot term. The first operation is to make the coefficient of this term unity, so we divide (E1) by 2, yielding the equivalent system
(E1′)
(E2)
(E3)
The next operation eliminates x1 from (E2); we multiply the new first equation by −1 and add it to (E2), yielding
(E1′)
(E2′)
(E3)
Finally, we eliminate x1 from (E3) by multiplying the new first equation by −3 and adding it to (E3), yielding
Canonical Systems. Assume that the first m columns of the linear system (7) form a basis matrix, B. Multiplying (7) through on the left by B−1 yields a transformed (but equivalent) system in which the coefficients of the variables (x1, . . . , xm) form an identity matrix. Such a system is called canonical and has the form shown in Table 1.
TABLE 1
Canonical System with Basic Variables x1, x2, ···, xm
The variables x1, . . . , xm are associated with the columns of B and are called basic variables. They are also called dependent, since once values are assigned to the nonbasic or independent variables xm+1, . . . , xn, the values of x1, . . . , xm are determined immediately. In particular, if xm+1, . . . , xn are all assigned zero values, we obtain the basic solution

xi = b̄i, i = 1, . . . , m,  xm+1 = ··· = xn = 0

where b̄i is the right-hand side of the ith equation of the canonical system. If

b̄i ≥ 0, i = 1, . . . , m

then this is a basic feasible solution. If, in addition, some b̄i = 0, the basic feasible solution is degenerate.
Instead of actually computing B−1 and multiplying the linear system (7) by it, we can place (7) in canonical form by a sequence of m pivot operations. First, pivot on the term a11x1 if a11 ≠ 0. If a11 = 0 then, since B is nonsingular, there is an element in its first row which is nonzero. Rearranging the columns makes this the (1, 1) element and allows the pivot. Repeating this procedure for the terms a22x2, . . . , ammxm generates the canonical form. Such a form will be used to begin the simplex method.
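The m successive pivots can be sketched directly (this version omits the column-rearrangement step, so it assumes each diagonal pivot is nonzero when its turn comes; the system below is a hypothetical example, not one from the text):

```python
import numpy as np

def canonical_form(A, b):
    """Reduce [A | b] to canonical form with respect to its first m
    columns by m successive pivots on the diagonal terms a11, ..., amm.
    Simplifying assumption: each diagonal pivot is nonzero when reached
    (the book handles a zero pivot by rearranging columns)."""
    T = np.column_stack([np.asarray(A, dtype=float),
                         np.asarray(b, dtype=float)])
    m = T.shape[0]
    for r in range(m):
        T[r] = T[r] / T[r, r]                 # unit pivot coefficient
        for i in range(m):
            if i != r:
                T[i] = T[i] - T[i, r] * T[r]  # zero out the column elsewhere
    return T

# Hypothetical system; after reduction the first two columns form the
# identity, so x1 and x2 are the basic variables and the last column
# holds their values in the corresponding basic solution.
T = canonical_form([[2.0, 1.0, 1.0, 0.0],
                    [1.0, 3.0, 0.0, 1.0]],
                   [8.0, 7.0])
```

Setting the nonbasic variables x3 = x4 = 0 here reads the basic solution straight off the final column, as described above.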
Simplex Algorithm. The simplex method is a two-phase procedure for finding an optimal solution to linear programming problems. Phase 1 finds an initial basic feasible solution if one exists, or gives the information that one does not (in which case the constraints are inconsistent and the problem has no solution). Phase 2 uses this solution as a starting point and either (1) finds a minimizing solution or (2) yields the information that the minimum is unbounded (i.e., −∞). Both phases use the simplex algorithm described here.
In initiating the simplex algorithm, we treat the objective form

z = c1x1 + c2x2 + ··· + cnxn

as just another equation, i.e.,

(−z) + c1x1 + c2x2 + ··· + cnxn = 0 (8)
which we include in the set to form an augmented system of equations. The simplex algorithm is always initiated with this augmented system in canonical form. The basic variables are some m of the x’s, which we renumber