Optimization Theory for Large Systems
Ebook · 1,197 pages · 7 hours


About this ebook

This important text examines the most significant algorithms for optimizing large systems and clarifies the relations between optimization procedures. Much of the data appears as charts and graphs, which will be highly valuable to readers in selecting a method and estimating computer time and cost in problem solving. An initial chapter on linear and nonlinear programming presents all the necessary background for the subjects covered in the rest of the book. The second chapter illustrates how large-scale mathematical programs arise from real-world problems. Appendixes. List of Symbols.
Language: English
Release date: January 17, 2013
ISBN: 9780486143699

    Book preview

    Optimization Theory for Large Systems - Leon S. Lasdon


    1

    Linear and Nonlinear Programming

    The problem of mathematical programming is that of maximizing or minimizing an objective function f(x1 ··· xn) by choice of the vector x = (x1 ··· xn)′. The variables xi may be allowed to take on any values, whereupon the problem is one of unconstrained minimization, or they may be restricted to take on only certain allowable values, whereupon the problem is constrained. Only problems in which (1) the variables xi can vary continuously within the region of interest and (2) the objective and constraint functions are continuous and differentiable are considered here.

    If the problem is constrained, its difficulty depends critically on the nature of the constraints, i.e., linear, nonlinear, etc. We consider first the unconstrained case, then the more difficult, constrained one. The constrained case will be divided into two parts: linear constraints and linear objective function (linear programming) and at least one nonlinear constraint and/or nonlinear objective (nonlinear programming).

    1.1 Unconstrained Minimization

    Necessary and Sufficient Conditions for an Unconstrained Minimum. The problem here is to maximize or minimize a function of n variables, f(x), with no restrictions on the variables x. Many real-life problems are of this form, where whatever constraints are present do not restrict the optimum. Also, many problems in which the constraints are binding can be converted to unconstrained problems or sequences of such problems. Since the problem of maximizing f(x) is equivalent to that of minimizing −f(x), only the minimization problem is considered.

    A point x∗ is said to be a global minimum of f(x) if

    (1) f(x∗) ≤ f(x)

    for all x. If the strict inequality holds for x ≠ x∗, the minimum is said to be unique. If (1) holds only for all x in some neighborhood of x∗, then x∗ is said to be a local or relative minimum of f(x), since x∗ is only the best point in the immediate vicinity, not in the whole space.

    If f(x) is continuous and has continuous first and second partial derivatives for all x, the necessary conditions for a local minimum are [3]

    (2) ∇f(x∗) = 0

    and that the matrix of second partial derivatives evaluated at x∗ be positive semidefinite. Any point x∗ satisfying (2) is called a stationary point of f(x). Sufficient conditions for a relative minimum are that the matrix of second partial derivatives of f(x) evaluated at x∗ be positive definite and that (2) hold.

    Numerical Methods for Finding Unconstrained Minima. The most obvious approach to finding the minimum of f(x) is to solve (2). These are a set of n equations, usually nonlinear, in the n unknowns xi. Unfortunately the task of solving large sets of nonlinear equations is very difficult. The function f(x) may be so complex that it is difficult even to write out (2) in closed form. Further, even if (2) could be solved, there would be no guarantee that a given solution was not a maximum, saddle point, etc., rather than a minimum. Thus other approaches must be considered.

    Gradient. If f(x) is continuous and differentiable, a number of minimization techniques using the gradient of f(x), written ∇f(x), are available. The gradient is the vector whose ith component is ∂f/∂xi. It points in the direction of maximum rate of increase of f(x) (–∇f points in the direction of greatest decrease). The vector ∇f is, at any point x0, normal to the contour of constant function value passing through x0.

    Steepest Descent. The method of steepest descent for finding a local minimum of f(x) proceeds as follows. Start at some initial point x0 and compute ∇f(x0). Take a step in the direction of steepest descent,–∇f(x0), using a step length α0, to obtain a new point x1. Repeat the procedure until some stop criterion is satisfied. This process is described by the relations

    x0 given

    (3) xi+1 = xi − αi∇f(xi),    i = 0, 1, 2, . . .

    where αi > 0. The process will, under very mild restrictions [4] on f(x), converge to at least a local minimum of f(x) if the αi are chosen so that

    (4) f(xi+1) < f(xi)

    for all i, i.e., if the function is made to decrease at each step. Since the function is initially decreasing in the direction −∇f(xi), there always exist αi > 0 such that (4) is satisfied.
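    For readers who want to experiment, the iteration (3)-(4) is short to program. The following Python sketch (the quadratic test function, tolerances, and step-halving rule are illustrative choices, not from the text) shrinks a trial step until the descent condition (4) holds:

        import numpy as np

        def steepest_descent(f, grad, x0, alpha0=1.0, tol=1e-6, max_iter=1000):
            # Iteration (3): x_{i+1} = x_i - alpha_i * grad f(x_i), with alpha_i
            # halved until the descent condition (4), f(x_{i+1}) < f(x_i), holds.
            x = np.asarray(x0, dtype=float)
            for _ in range(max_iter):
                g = grad(x)
                if np.linalg.norm(g) < tol:      # stop: gradient nearly zero
                    break
                alpha = alpha0
                while f(x - alpha * g) >= f(x):  # enforce a decrease at each step
                    alpha *= 0.5
                    if alpha < 1e-12:
                        return x                 # no further decrease possible
                x = x - alpha * g
            return x

        # Illustrative quadratic with eccentric contours (zigzag behavior).
        f = lambda x: x[0]**2 + 10.0 * x[1]**2
        grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
        print(steepest_descent(f, grad, [4.0, 1.0]))   # approaches (0, 0)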

    Step Length and Optimum Gradients. One way to find αi satisfying (4) is to choose αi to minimize the function

    (5) g(α) = f(xi − α∇f(xi))

    Note that xi and ∇f(xi) are known vectors, so the only variable in (5) is α. The adaptation of the method of steepest descent which uses (5), called the method of optimum gradients [4], is described by

    x0 given

    Choose α = αi by minimizing g(α) in (5),

    (6) xi+1 = xi − αi∇f(xi)

    Set i = i + 1 and repeat.

    Geometrically, αi is chosen by minimizing f(x) along the direction si starting from xi. At a minimum,

    (7) dg/dα = ∇f(xi + αsi)′si = 0 at α = αi

    so the line xi + αsi must be tangent to a contour of constant function value at α = αi, for dg is then zero for small changes dα. Since ∇f(xi+1) is normal to the same contour, successive steps are at right angles to one another. Practical methods for carrying out the one-dimensional minimization are discussed later in this section.

    Stop Criteria. Some possible stop criteria are as follows:

    1. Since, at a minimum, ∂f/∂xi = 0, stop when

    |∂f/∂xi| < ε,  i = 1, . . . , n

    or

    ‖∇f‖ < ε

    2. Stop when the change in the function is less than some limit η, i.e.,

    |f(xi+1) − f(xi)| < η

    Others are possible. Criterion 2 appears to be the more dependable of the two, provided it is satisfied for several successive values of i.

    Local versus Global Minima. The most that can be guaranteed of this or any other iterative minimization technique is that it will find a local minimum, in general the one nearest the starting point x0. To attempt to find all local minima (and thus the global minimum), the method most used is to repeat the minimization from many different initial points.

    Numerical Difficulties. The fact that successive steps of the optimum gradient method are orthogonal leads to very slow convergence for some functions. If the function contours are hyperspheres (circles in two dimensions), the method finds the minimum in one step. However, if the contours are in any way eccentric, an inefficient zigzag behavior results, as shown in Figure 1-1. This occurs because, for eccentric contours, the gradient direction is generally quite different from the direction to the minimum. Many, if not most, of the functions occurring in practical applications are ill-behaved in that their contours are eccentric or nonspherical. Thus more efficient schemes are desirable.

    FIGURE 1-1 Minimizing a quadratic with eccentric contours (objective containing the term (x2 − 4)²) by the normal steepest-descent method.

    Second-Order Gradient Methods. Recently, a number of minimization techniques have been developed which substantially overcome the above difficulties. What appear to be the best of these will be described in detail. First, however, the logic behind these methods will be explained.

    Since the first partial derivatives of a function vanish at the minimum, a Taylor-series expansion about the minimum x∗ yields

    (8) f(x) = f(x∗) + ½(x − x∗)′Hf(x∗)(x − x∗) + higher-order terms

    where Hf(x*), the matrix of second partials of f evaluated at x*, is positive definite. Thus the function behaves like a pure quadratic in the vicinity of x*. It follows that the only methods which will minimize a general function quickly and efficiently are those which (1) work well on a quadratic and (2) are guaranteed to converge eventually for a general function. All others will be slow, at least in the vicinity of the minimum (see Figure 1-1), and often elsewhere.

    Conjugate Directions. General minimization procedures can be designed which will minimize a quadratic function of n variables in n steps [5–7]. Most, if not all, are based on the ideas of conjugate directions.

    The general quadratic function can be written

    (9) q(x) = c + b′x + ½x′Ax

    where A is positive definite and symmetric. Let x∗ minimize q(x). Then

    (10) ∇q(x∗) = b + Ax∗ = 0, or x∗ = −A⁻¹b

    Given a point x0 and a set of linearly independent directions {s0, s1,..., sn-1}, constants βi can be found such that

    (11) x∗ − x0 = β0s0 + β1s1 + ··· + βn−1sn−1

    If the directions si are A-conjugate, i.e., satisfy

    (12) si′Asj = 0,  i ≠ j

    and none are zero, then the si are easily shown to be linearly independent and the βi can be determined from (11) as follows:

    (13) sj′A(x∗ − x0) = β0sj′As0 + β1sj′As1 + ··· + βn−1sj′Asn−1

    Using (12),

    (14) sj′A(x∗ − x0) = βjsj′Asj, or βj = sj′A(x∗ − x0) / sj′Asj

    and, using (10),

    (15) βj = −sj′(b + Ax0) / sj′Asj

    Now consider an iterative minimization procedure, starting at x0 and successively minimizing q(x) down the directions s0, s1, s2, . . . , sn-1, where these directions satisfy (12). Successive points are then determined by the relations

    (16) xi+1 = xi + αisi,    i = 0, 1, . . . , n − 1

    where αi is determined by minimizing q(xi + αsi), as in the optimum gradient method, so that

    (17) si′∇q(xi + αisi) = 0

    Using (10) in (17) gives

    (18) si′(b + Axi) + αisi′Asi = 0

    or

    (19) αi = −si′(b + Axi) / si′Asi

    From (16),

    (20) xi − x0 = α0s0 + α1s1 + ··· + αi−1si−1

    so that

    (21) si′A(xi − x0) = 0

    Thus (19) becomes

    (22) αi = −si′(b + Ax0) / si′Asi

    which is identical to (15). Hence this sequential process leads, in n steps, to the minimum x*.
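    The n-step property is easy to verify numerically. In the Python sketch below, the matrix A, the vector b, and the Gram-Schmidt construction of the conjugate directions are illustrative choices, not from the text; the code builds directions satisfying (12), applies the exact steps (19), and after n = 3 one-dimensional minimizations agrees with x∗ = −A⁻¹b from (10):

        import numpy as np

        # Illustrative quadratic (9): q(x) = c + b'x + 0.5 x'Ax, A positive definite.
        A = np.array([[4.0, 1.0, 0.0],
                      [1.0, 3.0, 1.0],
                      [0.0, 1.0, 2.0]])
        b = np.array([-1.0, 2.0, 0.5])
        grad = lambda x: b + A @ x              # gradient of q

        # Build A-conjugate directions (12) by Gram-Schmidt on the coordinate axes.
        dirs = []
        for e in np.eye(3):
            s = e.copy()
            for d in dirs:
                s -= (d @ A @ e) / (d @ A @ d) * d   # remove A-component along d
            dirs.append(s)

        # Minimize down each direction with the exact step (19).
        x = np.zeros(3)
        for s in dirs:
            x = x - (s @ grad(x)) / (s @ A @ s) * s

        print(np.allclose(x, np.linalg.solve(A, -b)))   # True: minimum in n steps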

    Method of Fletcher and Powell. A method recently presented by Fletcher and Powell [5] is probably the most powerful general procedure now known [8] for finding a local minimum of a general function, f(x). It is designed so that, when applied to a quadratic, it minimizes in n iterations. It does this by generating conjugate directions.

    Central to the method is a symmetric positive definite matrix Hi which is updated at each iteration and which supplies the current direction of motion, si = −Hi∇f(xi), obtained by multiplying the current gradient vector by −Hi. An iteration is described by the following:

    H0 = any positive definite matrix

    si = −Hi∇f(xi)

    Choose α = αi by minimizing f(xi + αsi),

    (23) xi+1 = xi + αisi,    Hi+1 = Hi + Ai + Bi

    where the matrices Ai and Bi are defined by

    (24) Ai = σiσi′ / σi′yi,    Bi = −(Hiyi)(Hiyi)′ / yi′Hiyi

    with σi = xi+1 − xi = αisi and yi = ∇f(xi+1) − ∇f(xi).

    Note that the numerators of Ai and Bi are both matrices, while the denominators are scalars. Fletcher and Powell prove the following:

    The matrix Hi is positive definite for all i. As a consequence of this, the method will usually converge, since

    (25) ∇f(xi)′si = −∇f(xi)′Hi∇f(xi) < 0

    i.e., the function f is initially decreasing along the direction si, so that the function can be decreased at each iteration by minimizing down si.

    When the method is applied to the quadratic (9), then

    The directions si (or, equivalently, σi) are A-conjugate, thus leading to a minimum in n steps.

    The matrix Hi converges to the inverse of the matrix of second partials of the quadratic, i.e., Hn = A⁻¹.

    When applied to a general function, Hi tends to the inverse of the matrix of second partials of the function evaluated at the minimum since, as the minimum is approached, the second-order terms in the Taylor-series expansion predominate.

    FIGURE 1-2 Contours of the Rosenbrock function.

    Numerical tests bear out the rapid convergence of this method. Consider, for example, the function

    (26) f(x) = 100(x2 − x1²)² + (1 − x1)²

    called the Rosenbrock function [9], whose contours are given in Figure 1-2. The minimum is at (1, 1) and the steep curving valley makes minimization difficult. The paths taken by the method of Fletcher and Powell and by the optimum gradient technique are shown in Figures 1-3 and 1-4, respectively. The Fletcher–Powell technique follows the curved valley and minimizes very efficiently.

    FIGURE 1-3 Fletcher–Powell method minimizing the Rosenbrock function.

    FIGURE 1-4 Optimum gradient method minimizing the Rosenbrock function.

    An excellent reference, which derives this algorithm as a member of a class of methods and gives numerical comparisons with other techniques, has been prepared by Pearson [21].
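    A compact implementation of (23)-(24) is given below as a Python sketch; the one-dimensional minimization is delegated to SciPy's scalar minimizer, and the starting point (−1.2, 1) is the conventional test point for the Rosenbrock function, not a prescription from the text:

        import numpy as np
        from scipy.optimize import minimize_scalar

        def dfp(f, grad, x0, max_iter=100, tol=1e-8):
            # Fletcher-Powell iteration: s_i = -H_i grad f(x_i); after the move,
            # H is corrected by the two terms A_i and B_i of (24).
            x = np.asarray(x0, dtype=float)
            H = np.eye(len(x))                   # H0: any positive definite matrix
            g = grad(x)
            for _ in range(max_iter):
                if np.linalg.norm(g) < tol:
                    break
                s = -H @ g                       # current direction of motion
                alpha = minimize_scalar(lambda a: f(x + a * s)).x
                sigma = alpha * s                # sigma_i = x_{i+1} - x_i
                x = x + sigma
                g_new = grad(x)
                y = g_new - g                    # change in the gradient
                H = H + np.outer(sigma, sigma) / (sigma @ y) \
                      - np.outer(H @ y, H @ y) / (y @ H @ y)
                g = g_new
            return x

        # Rosenbrock function (26); the minimum is at (1, 1).
        f = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
        grad = lambda x: np.array([
            -400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
             200.0 * (x[1] - x[0]**2)])
        print(dfp(f, grad, [-1.2, 1.0]))         # converges near (1, 1)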

    Conjugate Gradient Method. Other conjugate direction minimization techniques also exist. One of these, due to Fletcher and Reeves [6], requires computation of the gradient of f(x) and the storage of only one additional vector, the actual direction of search. The algorithm is

    x0 arbitrary

    s0 = −∇f(x0)

    Choose αi to minimize f(xi + αsi),

    xi+1 = xi + αisi

    where

    si+1 = −∇f(xi+1) + βisi,    βi = ‖∇f(xi+1)‖² / ‖∇f(xi)‖²

    This method is not quite as efficient [8] as the Fletcher–Powell technique but requires much less storage, a significant advantage when the number of variables, n, is large.
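    The algorithm translates into a few lines of Python. A sketch follows (the stopping tolerance is an arbitrary choice; f and grad may be taken from the Fletcher-Powell example above):

        import numpy as np
        from scipy.optimize import minimize_scalar

        def fletcher_reeves(f, grad, x0, max_iter=200, tol=1e-8):
            # Only the current direction s is stored; beta restores conjugacy.
            x = np.asarray(x0, dtype=float)
            g = grad(x)
            s = -g                               # first step: steepest descent
            for _ in range(max_iter):
                if np.linalg.norm(g) < tol:
                    break
                alpha = minimize_scalar(lambda a: f(x + a * s)).x
                x = x + alpha * s
                g_new = grad(x)
                beta = (g_new @ g_new) / (g @ g) # Fletcher-Reeves formula
                s = -g_new + beta * s            # new search direction
                g = g_new
            return x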

    Minimization without Derivatives. There are also a number of minimization techniques which do not require derivatives. Of these, tests performed thus far indicate that Powell’s method [7] is the most efficient [8]. Each iteration requires n one-dimensional minimizations down n linearly independent directions, s1, s2, ..., sn. As a result of these minimizations a new direction, s, is defined and, if a test is passed, s replaces one of the original directions, sr. The process is usually started from the best estimate of the minimum, x0, with the initial si’s being the coordinate directions.

    The procedure is as follows:

    For r = 1, 2, ..., n calculate αr, so that f(xr-1 + αsr) is a minimum and define xr = xr-1 + αrsr.

    Find the integer m, 1 ≤ m ≤ n, so that [f(xm−1) − f(xm)] is a maximum, and define ∆ = f(xm−1) − f(xm).

    Calculate f3 = f(2xn–x0) and define f1 = f(x0) and f2 = f(xn).

    If either f3 ≥ f1 or

    (f1 − 2f2 + f3)(f1 − f2 − ∆)² ≥ ½∆(f1 − f3)²

    use the old directions s1, s2, . . . , sn for the next iteration and use xn for the next x0.

    Otherwise, defining s = xn–x0, calculate α so that f(xn + αs) is a minimum. Use s1, s2, . . . , sm-1, sm+1, sm+2, . . . , sn, s as the directions for the next iteration and xn + αs for the next x0.
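    The five steps above condense into the following Python sketch (the one-dimensional minimizations are again delegated to SciPy, the iteration cap is arbitrary, and indices are zero-based where the text counts from 1):

        import numpy as np
        from scipy.optimize import minimize_scalar

        def powell(f, x0, n_iter=50):
            x0 = np.asarray(x0, dtype=float)
            n = len(x0)
            S = list(np.eye(n))                  # initial directions: coordinate axes
            for _ in range(n_iter):
                x, f1 = x0.copy(), f(x0)
                drops = []
                for s in S:                      # step 1: minimize down each s_r
                    a = minimize_scalar(lambda a: f(x + a * s)).x
                    x_new = x + a * s
                    drops.append(f(x) - f(x_new))
                    x = x_new
                m = int(np.argmax(drops))        # step 2: largest drop, Delta
                delta = drops[m]
                f2, f3 = f(x), f(2.0 * x - x0)   # step 3
                # step 4: keep the old directions if the test fails
                if f3 >= f1 or ((f1 - 2.0*f2 + f3) * (f1 - f2 - delta)**2
                                >= 0.5 * delta * (f1 - f3)**2):
                    x0 = x
                else:                            # step 5: replace s_m by s = x_n - x_0
                    s = x - x0
                    a = minimize_scalar(lambda a: f(x + a * s)).x
                    S.pop(m)
                    S.append(s)
                    x0 = x + a * s
            return x0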

    One-Dimensional Minimization. All the methods discussed thus far have searched for a minimum in n-dimensional space by performing one-dimensional minimizations down a set of directions {si}. Thus the efficiency of any such procedure depends critically on the efficiency of the method used to solve the single-dimensional search. Three techniques are presented. The first two use polynomial interpolation, one requiring derivatives, the second only function values. The third, the Fibonacci method, also requires only function values. Unlike the interpolation methods, it does not depend on smoothness of the function being minimized, and may be applied even to discontinuous functions.

    For both interpolative procedures, the variables x1,..., xn are scaled so that a unit change in any variable is a significant but not too large percentage change in that variable. For example, if a variable is expected to have a value around 100 units, then a 1-unit change would be considered significant, whereas a 10-unit change would be too large.

    Cubic Interpolation. This technique, described in [6], solves in three stages the problem of finding the smallest nonnegative α, α∗, at which the function

    (27) g(α) = f(x + αs)

    attains a local minimum. It uses the derivative

    (28) g′(α) = s′∇f(x + αs)

    The first stage normalizes the s vector so that a step size α = 1 is acceptable. The second stage establishes bounds on α*, and the third stage interpolates its value.

    STAGE 1. Calculate a normalizing constant A from the components (s)j of s, and divide each component of s by A. This ensures that s is a reasonable change in x.

    STAGE 2. Evaluate g(α) and g′(α) at the points α = 0, 1, 2, 4, . . . , a, b, where b is the first of these values at which either g′ is nonnegative or g has not decreased. It then follows that α∗ is bounded in the interval a < α∗ ≤ b. If g(1) is much greater than g(0), divide the components of s by a factor, e.g., 2 or 3, and repeat this stage.

    STAGE 3. A cubic polynomial is now fitted to the four values g(a), g′(a), g(b), g′(b), and its minimum, αe, is taken to be the value for α∗. It is shown in [6] that the cubic has a unique minimum in the interval (a, b), which is given by

    (29) αe = b − (b − a)[g′(b) + w − z] / [g′(b) − g′(a) + 2w]

    where

    (30) z = 3[g(a) − g(b)]/(b − a) + g′(a) + g′(b)

    (31) w = [z² − g′(a)g′(b)]^1/2

    If neither g(a) nor g(b) is less than g(αe), then αe is accepted as the estimate of α∗. Otherwise, according as g′(αe) is positive or negative, the interpolation is repeated over the subinterval (a, αe) or (αe, b), respectively.

    It is interesting that, for small values of g′(a) and g′(b), the cubic has the shape that a flat metal spring would assume if fitted to the points a, b with slopes g′(a), g′(b).
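    Stage 3 reduces to a few arithmetic operations. In the Python sketch below, the formulas are those reconstructed as (29)-(31), and the test function g(α) = (α − 1)² is purely illustrative:

        import numpy as np

        def cubic_step(a, b, ga, gpa, gb, gpb):
            # Fit a cubic to g(a), g'(a), g(b), g'(b); return its minimum (29).
            z = 3.0 * (ga - gb) / (b - a) + gpa + gpb          # (30)
            w = np.sqrt(z * z - gpa * gpb)                     # (31)
            return b - (b - a) * (gpb + w - z) / (gpb - gpa + 2.0 * w)

        # g(alpha) = (alpha - 1)^2 bracketed on [0, 2]: the minimum is at 1.
        print(cubic_step(0.0, 2.0, 1.0, -2.0, 1.0, 2.0))       # 1.0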

    Quadratic Interpolation. If derivatives are not available or are difficult to compute, then quadratic interpolation should be used in the one-dimensional minimization. The procedure can again be described in three steps.

    STAGE 1. This is the same as stage 1 above.

    STAGE 2. Evaluate g(α) at the points α = 0, 1, 2, 4, . . . , a, b, c, where c is the first of these values at which g has increased. Then α∗ satisfies a < α∗ < c.

    Again, if g(1) ≫ g(0), then divide the components of s by a factor, e.g., 2 or 3, and repeat.

    STAGE 3. A quadratic polynomial is now fitted to the three values g(a), g(b), g(c), and its minimum, αe, is

    (32) αe = ½ · [(b² − c²)g(a) + (c² − a²)g(b) + (a² − b²)g(c)] / [(b − c)g(a) + (c − a)g(b) + (a − b)g(c)]

    If g(αe) < g(b), then αe is accepted as the estimate of α∗. Otherwise, b is taken as α∗.
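    In code, stage 3 is a single expression. The sketch below uses the three-point parabola formula given as (32), with an illustrative g:

        def quadratic_step(a, b, c, ga, gb, gc):
            # Fit a parabola through (a, g(a)), (b, g(b)), (c, g(c)); return
            # its minimum alpha_e, formula (32).
            num = (b*b - c*c) * ga + (c*c - a*a) * gb + (a*a - b*b) * gc
            den = (b - c) * ga + (c - a) * gb + (a - b) * gc
            return 0.5 * num / den

        # g(alpha) = (alpha - 1)^2 sampled at 0, 1, 2: the minimum is at 1.
        print(quadratic_step(0.0, 1.0, 2.0, 1.0, 0.0, 1.0))    # 1.0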

    Fibonacci Technique. Unlike the previous technique, the Fibonacci method [20] does not use derivatives. It can thus deal with functions which are not differentiable or even continuous, functions to which the previous techniques could not be applied. It minimizes by assuming that the optimal value of the variable is within some initial interval, called the initial interval of uncertainty. It then systematically reduces this interval by evaluating the function within the interval, thus closing in on the optimal point. To do this without gradients, one must assume something about the function to be minimized. The Fibonacci technique proceeds on the assumption that this function is unimodal within the initial interval of uncertainty.

    Unimodality. Roughly speaking, a unimodal function is one that has only one peak (max or min). More precisely, a function of one variable is said to be unimodal if, given that two values of the variable are on the same side of the optimum, the one nearer the optimum gives the better functional value, i.e., the smaller value in the case of a minimization problem. Mathematically, this is phrased as follows. Let the minimum be at x*, and define x1 < x2. Then the function f(x) is unimodal if

    x2 < x∗ implies f(x1) > f(x2)

    and

    x1 > x∗ implies f(x1) < f(x2)

    This property enables us to reduce any initial interval of uncertainty by function evaluations alone. Consider the normalized interval [0, 1] and two function evaluations (henceforth called experiments) within it, as shown in Figure 1-5. If the function is unimodal, i.e., has a single minimum, and f(x1) < f(x2) as shown, then the minimizing x cannot lie to the right of x2. Thus that part of the interval can be discarded, and a new smaller interval of uncertainty results, as shown in Figure 1-6. If f(x1) > f(x2), then the interval [0, x1) would be discarded, while if f(x1) = f(x2), then both [0, x1) and (x2, 1] can be discarded. Moreover, if one of the original experiments remains within this new interval, only one other experiment need be placed within it in order that the process be repeated. The Fibonacci method places the experiments so that one of the original experiments always remains. The method makes use of the sequence of Fibonacci numbers, {Fn}, defined by

    F0 = F1 = 1,    Fn = Fn−1 + Fn−2,  n ≥ 2

    yielding the sequence 1, 1, 2, 3, 5, 8, 13, . . . . It assumes that n experiments are to be made, and proceeds as follows.

    FIGURE 1-5 Initial interval of uncertainty.

    FIGURE 1-6 Reduced interval of uncertainty.

    Method. Let the initial interval of uncertainty be L0. Place the first two experiments (Fn−1/Fn)L0 units in from one end of the interval and (Fn−1/Fn)L0 units from the other end. Discard some part of the interval using the unimodality assumption. There then remains a new smaller interval of uncertainty with one experiment left in it, that experiment being some distance in from (any) one of the ends. Place a new experiment the same distance from the other end and repeat. Stop when n experiments have been performed.

    Example. The function to be minimized is shown in Figure 1-7. Let n, the number of experiments to be performed, be 5. Then the first two experiments, x1 and x2, are placed (F4/F5)L0 = (5/8)L0 in from one end and (5/8)L0 in from the other, as shown. Discard [x1, L0]. The experiment x2 is (3/8)L0 in from x = 0. Place x3 (3/8)L0 in from x = x1 and discard [0, x3]. The experiment x2 is now (1/8)L0 in from x3, so place x4 (1/8)L0 in from x1 and discard [x4, x1]. The experiment x2 is now located in the middle of the remaining interval, [x3, x4], and by past procedure we should place the last experiment, x5, right on top of it. Since this would yield no new information, we displace the last experiment by a small amount, obtaining the final interval of uncertainty [x2, x4].

    Note that, after discarding the first interval [x1, L0], what remains is exactly the starting configuration of a Fibonacci search with four experiments on an interval of length (F4/F5)L0. Thus any section of a Fibonacci search can be viewed as the beginning of a new Fibonacci search with a smaller number of experiments left to be done and a smaller initial interval of uncertainty.
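    A Python sketch of the whole procedure follows; the ε-displacement of the final experiment and the test function are illustrative choices:

        def fibonacci_search(f, lo, hi, n, eps=1e-3):
            # n-experiment Fibonacci search on a unimodal f over [lo, hi]; the
            # final interval of uncertainty has length about (hi - lo)/F_n.
            F = [1, 1]
            while len(F) <= n:
                F.append(F[-1] + F[-2])          # 1, 1, 2, 3, 5, 8, 13, ...
            d = F[n - 1] / F[n] * (hi - lo)
            x1, x2 = hi - d, lo + d              # first two experiments
            f1, f2 = f(x1), f(x2)
            for k in range(n - 1, 1, -1):
                if f1 > f2:                      # minimum cannot lie in [lo, x1)
                    lo, x1, f1 = x1, x2, f2
                    x2 = lo + F[k - 1] / F[k] * (hi - lo)
                    if k == 2:                   # last experiment would coincide
                        x2 += eps * (hi - lo)    # with x1: displace it slightly
                    f2 = f(x2)
                else:                            # minimum cannot lie in (x2, hi]
                    hi, x2, f2 = x2, x1, f1
                    x1 = hi - F[k - 1] / F[k] * (hi - lo)
                    if k == 2:
                        x1 -= eps * (hi - lo)
                    f1 = f(x1)
            return (x1, hi) if f1 > f2 else (lo, x2)

        print(fibonacci_search(lambda x: (x - 0.3)**2, 0.0, 1.0, 10))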

    FIGURE 1-7 Minimizing by Fibonacci search.

    Minimax Optimality. The Fibonacci technique is an optimal search technique in a particular sense. Consider a three-experiment search in the interval [0, 1] with experiments at x1 = 0.1, x2 = 0.4, x3 = 0.8. Either x1, x2, or x3 could yield the smallest function value (the possibility of equality is excluded), as shown in Figure 1-8. Here K is the index of the smallest experiment and l3 is the final interval of uncertainty remaining after three experiments. It is evident that l3 depends both on K and on the positioning of the experiments x1, x2, x3. Thus, in the n-experiment case, we may write

    ln = ln(K; x1, . . . , xn)

    FIGURE 1-8 All possible three experiment outcomes.

    An obvious requirement of a good search plan is that it make the final interval of uncertainty as small as possible, and that it do this no matter what (unimodal) function it operates on. Since ln depends on K, and thus on the function being minimized, minimizing it would yield a plan good only for a particular function. We can remedy this by defining

    Ln = max over K of ln(K; x1, . . . , xn)

    where Ln is the maximum final interval of uncertainty obtained over all outcomes K. This is independent of K, hence independent of the function being minimized. It is proved [20] that the Fibonacci search minimizes the maximum final interval of uncertainty; i.e., the final interval of uncertainty for the Fibonacci method is given by

    Ln = L0/Fn

    (assuming a unimodal function). It is a rather conservative criterion, yet leads to very effective search results. Table 1 gives the ratio L0/Ln = Fn versus n. It is seen from this that Fn grows very rapidly with n and that the interval L0 is thus reduced quite rapidly.

    TABLE 1

    Constrained Optimization Problems

    Attention is now focused on problems of constrained minimization, i.e., problems in which the variables x1, . . . , xn may take on only certain allowable values. Such a situation in two dimensions is shown in Figure 1-9. The unshaded area is the set of allowable values of x1 and x2, henceforth called the constraint set. Its boundaries are the curves x1 = 0, x2 = 0, g1(x) = 0, g2(x) = 0. The constraint set in Figure 1-9 is the set of all points satisfying the inequalities x1 ≥ 0, x2 ≥ 0, g1(x) ≥ 0, g2(x) ≥ 0.

    FIGURE 1-9 Constraint set.

    A general programming problem may have equality as well as inequality constraints. The equalities often describe the operation of the system under consideration, while the inequalities define limits within which certain physical variables must lie. Thus the general problem may be written

    minimize f(x)

    subject to

    gi(x) ≥ 0,  i = 1, . . . , m

    hj(x) = 0,  j = 1, . . . , p

    When all functions f, gi, hj are linear, the problem is one of linear programming; if not, then nonlinear programming. The field of linear programming is by far the most fully developed and is considered first. Nonlinear programming problems are considerably more difficult and are considered later.

    1.2 Linear Programming

    1.2.1 Simplex Method

    Geometry of Linear Programs. Consider the problem

    maximize z = x1 + 3x2

    subject to

    (1) −x1 + x2 ≤ 1
        x1 + x2 ≤ 2
        x1 ≥ 0,  x2 ≥ 0

    The constraint set is the unshaded region of Figure 1-10 defined by the intersections of the half-spaces satisfying the linear inequalities. The numbered points are called extreme points of the set. If the constraints are linear there are only a finite number of such points.

    FIGURE 1-10 Geometry of a linear program.

    Contours of constant value of the objective function, z, are defined by the linear equation

    (2) x1 + 3x2 = c

    As c is varied, the line is moved parallel to itself. The maximum value of z is the largest c whose line has at least one point in common with the constraint set. For the figure shown, this occurs for c = 5, and the optimal values are x1 = 1/2, x2 = 3/2. Note that the maximum value occurs at an extreme point of the constraint set. If the problem had been to minimize z, the minimum is at the origin, which is again an extreme point. If the objective function had been z = 2x1 + 2x2, the line z = constant would be parallel to one of the constraint boundaries, x1 + x2 = 2. The maximum then occurs at both (x1 = 1/2, x2 = 3/2) and (x1 = 2, x2 = 0) and, in fact, also occurs at all points on the line segment joining these extreme points.

    Two additional possibilities exist. If the constraint x1 + x2 ≤ 2 had been removed, the constraint set would have appeared as in Figure 1-11; i.e., the set would have been unbounded. Then max z is also unbounded, since z can be made as large as desired subject to the constraints. Of course, on the opposite extreme, the constraint set could have been empty, as in the case where x1 + x2 ≤ 2 is replaced by x1 + x2 ≤ −1. Thus a linear programming problem may have (1) no solution, (2) an unbounded solution, (3) a finite optimal solution, or (4) an infinite number of optimal solutions. The methods to be developed deal with all these possibilities.

    FIGURE 1-11 Unbounded minimum.

    The fact that the minimum of a linear programming problem always occurs at an extreme point of the constraint set is the single most important property of linear programs. It is true for any number of variables (i.e., more than two dimensions) and forms the basis for the simplex method for solving linear programs.

    Of course, in many dimensions the geometrical ideas used here cannot be visualized and one must characterize extreme points algebraically. This is done in the next two sections, where the problem is placed in standard form and the basic theorems of linear programming are stated.

    Standard Form for Linear Programs. A linear programming problem can always be written in the following form. Choose x = (x1, x2, ..., xn) to minimize

    (3) z = c1x1 + c2x2 + ··· + cnxn

    subject to

    (4) ai1x1 + ai2x2 + ··· + ainxn = bi,  i = 1, 2, . . . , m

    (5) xj ≥ 0,  j = 1, 2, . . . , n

    or, in matrix form,

    minimize cx

    subject to

    Ax = b,  x ≥ 0

    where A is an m × n matrix of constants. If any of the equations (4) were redundant, i.e., linear combinations of the others, they could be deleted without changing any solutions of the system. If there are no solutions, or if there is only one, there can be no optimization. Thus the case of greatest interest is where the system of equations (4) is nonredundant and has at least two, hence an infinite number of, solutions. This occurs if and only if

    rank(A) = m < n

    We assume the above is true in what follows. The problem of linear programming is to first detect whether solutions exist, and, if so, to find one yielding minimum z.

    Note that all the constraints in (4) are equalities and that all variables xj are assumed to be nonnegative. It is necessary to place the problem in this form to solve it most easily (equations are easier to work with here than inequalities). If the original system is not of this form, it may easily be transformed by use of the following devices.

    Slack Variables. If a given constraint is an inequality,

    ai1x1 + ··· + ainxn ≤ bi

    then define a slack variable xn+i ≥ 0 such that

    ai1x1 + ··· + ainxn + xn+i = bi

    and the inequality becomes an equality. Similarly, if the inequality is

    ai1x1 + ··· + ainxn ≥ bi

    we write

    ai1x1 + ··· + ainxn − xn+i = bi

    Note that the slacks must be nonnegative in order that the inequalities be satisfied for all xj.

    Nonnegative Variables. If, in the original formulation of the problem, a given variable, xk, is not constrained to be nonnegative, we write it as the difference of two nonnegative variables, i.e.,

    xk = xk⁺ − xk⁻,  xk⁺ ≥ 0,  xk⁻ ≥ 0

    This adds more variables to the problem but, since nonnegativity restrictions actually simplify the solution of linear programs, it is well worth it. After the solution, we transform back to obtain xk.
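    The two devices are mechanical enough to automate. The Python sketch below converts an illustrative two-constraint problem (one ≤ row, one ≥ row, with x1 unrestricted in sign and x2 ≥ 0; all numbers are made up for illustration) into the standard form (3)-(5):

        import numpy as np

        # Original data: minimize c'x subject to A[0]x <= b[0], A[1]x >= b[1].
        c = np.array([1.0, -2.0])
        A = np.array([[1.0,  1.0],
                      [2.0, -1.0]])
        b = np.array([3.0, 1.0])

        # Split the free variable x1 = u - v (u, v >= 0) and append slacks
        # x3, x4 >= 0; standard-form variable order is (u, v, x2, x3, x4).
        c_std = np.array([c[0], -c[0], c[1], 0.0, 0.0])
        A_std = np.array([
            [A[0, 0], -A[0, 0], A[0, 1], 1.0,  0.0],   # <= row: add slack
            [A[1, 0], -A[1, 0], A[1, 1], 0.0, -1.0],   # >= row: subtract slack
        ])
        b_std = b.copy()
        # The problem is now: minimize c_std'y subject to A_std y = b_std, y >= 0.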

    Example. Transform the following linear program into standard form:

    subject to

    SOLUTION. Define

    Also define slack variables x3 ≥ 0, x4 ≥ 0. Then the problem becomes

    subject to

    Basic Theorems of Linear Programming. We now proceed to generalize the ideas illustrated earlier from two to n dimensions. Proofs of the following theorems may be found in Gass [1]. First a number of standard definitions are made.

    DEFINITION 1. A feasible solution to the linear programming problem is a vector x = (x1, x2, . . . , xn) which satisfies the equations (4) and the nonnegativities (5).

    DEFINITION 2. A basis matrix is an m × m nonsingular matrix formed from some m columns of the constraint matrix A. (Note: since rank(A) = m, A contains at least one basis matrix.)

    DEFINITION 3. A basic solution to a linear program is the unique vector determined by choosing a basis matrix, setting the n–m variables associated with columns of A not in the basis matrix equal to zero, and solving the resulting square, nonsingular system of equations for the remaining m variables.

    DEFINITION 4. A basic feasible solution is a basic solution in which all variables have nonnegative values. [Note: By Definition 3, at most m variables can be positive.]

    DEFINITION 5. A nondegenerate basic feasible solution is a basic feasible solution with exactly m positive xi.

    DEFINITION 6. An optimal solution is a feasible solution which also minimizes z in (3).

    For example, in the system

    (6) −x1 + x2 + x3 = 1
        x1 + x2 + x4 = 2

    obtained from (1) by adding slack variables x3, x4, the matrix

    ⎡1 0⎤
    ⎣0 1⎦

    formed from columns 3 and 4 of (6) is nonsingular, and hence is a basis matrix. The corresponding basic solution

    x1 = x2 = 0,  x3 = 1,  x4 = 2

    is a nondegenerate basic feasible solution. The matrix

    ⎡−1 0⎤
    ⎣ 1 1⎦

    formed from columns 1 and 4 of (6) is also a basis matrix. The corresponding basic solution is obtained by setting x2 = x3 = 0 and solving

    −x1 = 1
    x1 + x4 = 2

    yielding x1 = −1, x4 = 3. This basic solution is not feasible.
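    Definitions 2-4 can be checked mechanically. The Python sketch below enumerates every pair of columns of the system (6) as reconstructed above, solves for the associated basic solution, and flags feasibility, reproducing the two cases just discussed:

        import itertools
        import numpy as np

        A = np.array([[-1.0, 1.0, 1.0, 0.0],      # coefficients of system (6)
                      [ 1.0, 1.0, 0.0, 1.0]])
        b = np.array([1.0, 2.0])

        for cols in itertools.combinations(range(4), 2):
            B = A[:, cols]
            if abs(np.linalg.det(B)) < 1e-12:
                continue                           # singular: not a basis matrix
            x = np.zeros(4)
            x[list(cols)] = np.linalg.solve(B, b)  # remaining variables set to zero
            tag = "feasible" if (x >= 0).all() else "infeasible"
            print(cols, x, tag)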

    The importance of these definitions is brought out by the following theorems:

    THEOREM 1. The objective function, z, assumes its minimum at an extreme point of the constraint set. If it assumes its minimum at more than one extreme point, then it takes on the same value at every point of the line segment joining any two optimal extreme points.

    This theorem is a multidimensional generalization of the geometric arguments given previously. By Theorem 1, in searching for a solution, we need only look at extreme points. It is thus of interest to know how to characterize extreme points in many dimensions algebraically. This information is given by the next theorem.

    THEOREM 2. A vector x = (x1, . . . , xn) is an extreme point of the constraint set of a linear programming problem if and only if x is a basic feasible solution of the constraints (4)–(5).

    Theorem 2 is true in two dimensions, as can be seen from the example of relations (1), whose constraints have been rewritten in equation form in (6).

    The (x1, x2) coordinates of the extreme point at x1 = 0, x2 = 1 are given by the first two coordinates of the basic feasible solution

    (x1, x2, x3, x4) = (0, 1, 0, 1)

    The optimal extreme point corresponds to the basic feasible solution

    (x1, x2, x3, x4) = (1/2, 3/2, 0, 0)

    Theorems 1 and 2 imply that, in searching for an optimal solution, we need only consider extreme points, hence only basic feasible solutions. Since a basic feasible solution has at most m of its n variables positive, an upper bound on the number of basic feasible solutions is the number of ways m variables can be selected from a group of n variables, which is

    n! / [m!(n − m)!]

    For large n and m this is still a large number. Thus, for large problems, it would be impossible to evaluate z at all extreme points to find the minimum. What is needed is a computational scheme which selects, in an orderly fashion, a sequence of extreme points, each one yielding a lower value of z, until finally the minimum is attained. In this way we consider only a small subset of the set of all possible extreme points. The simplex method, devised by G. B. Dantzig, is such a scheme. This procedure finds an extreme point and determines whether or not it is optimal. If not, it finds a neighboring extreme point at which the value of z is less than or equal to the previous value. The process is iterated. In a finite number of steps (usually between m and 2m) the minimum is found. The simplex method also discovers whether the problem has no finite minimal solution (i.e., min z =–∞) or if it has no feasible solutions (i.e., an empty constraint set). It is a powerful scheme for solving any linear programming problem.

    To explain the method, it is necessary to know how to go from one basic feasible solution (b.f.s.) to another, how to identify an optimal b.f.s., and how to find a better b.f.s. from a b.f.s. that is not optimal. We consider these questions in the following two sections. The notation and approach used is that of Dantzig [2].

    Systems of Linear Equations and Equivalent Systems. Consider the system of m linear equations in n unknowns

    (7) ai1x1 + ai2x2 + ··· + ainxn = bi,  i = 1, 2, . . . , m

    A solution to this system is any set of variables x1 ··· xn which simultaneously satisfies all equations. The set of all solutions to the system is called its solution set. The system may have one, many, or no solutions. If no solutions, the equations are said to be inconsistent, and their solution set is empty.

    Equivalent Systems and Elementary Operations. Two systems of equations are said to be equivalent if they have the same solution sets. It is proved in Dantzig [2] that the following operations transform a given linear system into an equivalent system:

    Multiplying any equation, Et, by a constant k ≠ 0.

    Replacing any equation, Et, by the equation Et + kEi, where Ei is any other equation of the system.

    These operations are called elementary row operations. For example, the linear system of equations (6)

    may be transformed into an equivalent system by multiplying the first equation by −1 and adding it to the second, yielding

    −x1 + x2 + x3 = 1
    2x1 − x3 + x4 = 1

    Note that the solution x1 = 0, x3 = 0, x2 = 1, x4 = 1 is a solution of both systems. In fact, any solution of one system is a solution of the other.

    Pivoting. A particular sequence of elementary row operations finds special application in linear programming. This sequence is called a pivot operation, defined as follows.

    DEFINITION. A pivot operation consists of m elementary operations which replace a linear system by an equivalent system in which a specified variable has a coefficient of unity in one equation and zero elsewhere. The detailed steps are as follows:

    Select a term arsxs in row (equation) r, column s, with ars ≠ 0, called the pivot term.

    Replace the rth equation by the rth equation multiplied by 1/ars.

    For each i = 1, 2, . . . , m except i = r, replace the ith equation, Ei, by Ei − (ais/ars)Er, i.e., by the sum of Ei and the new rth equation multiplied by −ais.
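    As a Python sketch (operating on a dense array whose last column holds the right-hand sides; entirely illustrative):

        import numpy as np

        def pivot(T, r, s):
            # Pivot on the term in row r, column s (assumed nonzero): scale row r
            # so the coefficient becomes unity, then eliminate variable s from
            # every other row -- exactly steps 2 and 3 above.
            T = T.astype(float).copy()
            T[r] = T[r] / T[r, s]
            for i in range(T.shape[0]):
                if i != r:
                    T[i] = T[i] - T[i, s] * T[r]
            return T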

    Example. Consider the system

    (E1)

    (E2)

    (E3)

    Let us transform to an equivalent system in which x1 is eliminated from all equations but (E1), where it has a unity coefficient. Thus choose the term 2x1 as the pivot term. The first operation is to make the coefficient of this term unity, so we divide (E1) by 2, yielding the equivalent system

    (E1′)

    (E2)

    (E3)

    The next operation eliminates x1 from (E2) by multiplying (E1′) by −1 and adding it to (E2), yielding

    (E1′)

    (E2′)

    (E3)

    Finally, we eliminate x1 from (E3) by multiplying (E1′) by −3 and adding it to (E3).

    Canonical Systems. Assume that the first m columns of the linear system (7) form a basis matrix, B. Multiplying (7) through by B⁻¹ yields a transformed (but equivalent) system in which the coefficients of the variables (x1, . . . , xm) form an identity matrix. Such a system is called canonical and has the form shown in Table 1.

    TABLE 1

    Canonical System with Basic Variables x1, x2, ···, xm

    The variables x1, . . . , xm are associated with the columns of B and are called basic variables. They are also called dependent since, if values are assigned to the nonbasic or independent variables xm+1, . . . , xn, then x1, . . . , xm can be determined immediately. In particular, if xm+1, . . . , xn are all assigned zero values, then we obtain the basic solution

    xi = b̄i,  i = 1, . . . , m,    xm+1 = ··· = xn = 0

    where the b̄i are the right-hand sides of the canonical system. If

    b̄i ≥ 0,  i = 1, . . . , m

    then this is a basic feasible solution. If, in addition, some b̄i = 0, the basic feasible solution is degenerate.

    Instead of actually computing B⁻¹ and multiplying the linear system (7) by it, we can place (7) in canonical form by a sequence of m pivot operations. First, pivot on the term a11x1 if a11 ≠ 0. If a11 = 0 then, since B is nonsingular, there is an element in its first row which is nonzero. Rearranging the columns makes this the (1, 1) element and allows the pivot. Repeating this procedure for the terms a22x2, . . . , ammxm generates the canonical form. Such a form will be used to begin the simplex method.
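    With the pivot routine sketched earlier, the canonicalization is a loop over the diagonal terms. Applied to the system (6) with basic variables x1, x2, it yields the basic solution x1 = 1/2, x2 = 3/2 found before (again a sketch: a practical code would reorder columns when a diagonal pivot element vanishes):

        def canonicalize(T, m):
            # Pivot successively on the terms a_11, a_22, ..., a_mm.
            for k in range(m):
                T = pivot(T, k, k)
            return T

        T = np.array([[-1.0, 1.0, 1.0, 0.0, 1.0],    # system (6) as [A | b]
                      [ 1.0, 1.0, 0.0, 1.0, 2.0]])
        print(canonicalize(T, 2))                    # rows give x1 = 1/2, x2 = 3/2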

    Simplex Algorithm. The simplex method is a two-phase procedure for finding an optimal solution to linear programming problems. Phase 1 finds an initial basic feasible solution if one exists, or gives the information that one does not (in which case the constraints are inconsistent and the problem has no solution). Phase 2 uses this solution as a starting point and either (1) finds a minimizing solution or (2) yields the information that the minimum is unbounded (i.e.,–∞). Both phases use the simplex algorithm described here.

    In initiating the simplex algorithm, we treat the objective form

    z = c1x1 + c2x2 + ··· + cnxn

    as just another equation, i.e.,

    (8) −z + c1x1 + c2x2 + ··· + cnxn = 0

    which we include in the set to form an augmented system of equations. The simplex algorithm is always initiated with this augmented system in canonical form. The basic variables are some m of the x’s, which we renumber x1, x2, . . . , xm.
