
SOLVING NONLINEAR OPTIMIZATION PROBLEMS

BASED ON GENERALIZED CHOQUET INTEGRALS


BY USING SOFT COMPUTING TECHNIQUES

Marie Spilde and Zhenyuan Wang


Department of Mathematics, University of Nebraska at Omaha, Omaha, NE 68182, USA
Email: mspilde@mail.unomaha.edu, zhenyuanwang@mail.unomaha.edu

ABSTRACT: The traditional gradient search fails in optimization problems where the objective function is not differentiable, such as nonlinear multiregressions based on the generalized Choquet integral with respect to signed fuzzy measures. The gradient search can be replaced with an iterative search algorithm in which differences instead of differentials are applied.

KEYWORDS: Fuzzy measures, Choquet integrals, multiregressions, nonlinear optimization, soft computing.
1. INTRODUCTION

No algorithm exists to solve every nonlinear optimization problem. One approach to solving such a problem is to apply a gradient search procedure. In the simplest case of a concave differentiable function in two dimensions (such as y = x^2), the gradient search amounts to taking the derivative of the function to choose the appropriate search direction. The optimal solution lies where the slope is equal to zero. When multiple attributes are involved, there are countless possible directions in which to move, and partial derivatives are used for choosing the best direction [2]. In the event that the objective function is not differentiable, such as in nonlinear multiregressions based on Choquet integrals, a new optimization technique must be determined [5].

A reasonable replacement for the gradient search is an iterative search. The parameters in this iterative search are expressed as elements of a vector (x_1, x_2, x_3, ..., x_n). Differences instead of differentials are applied at each dimension of the vector, which avoids the problem of nondifferentiability altogether.

The iterative search is able to converge in a reasonable amount of time, but it suffers from the problem of potentially converging to a local solution rather than a global solution. The use of a genetic algorithm would provide a global solution, but at the cost of an undesirably slow convergence rate. This work will exploit the advantages of both the genetic algorithm and the iterative search. A genetic algorithm will be used to converge towards the global solution, but for a limited number of generations. Then an iterative search algorithm will be initialized with the results from the genetic algorithm in order to quickly fine-tune the results and converge to the presumed global solution.

In section 2, basic mathematical knowledge is laid as a foundation for the problem. Section 3 presents the genetic algorithm and section 4 the iterative search algorithm. Section 5 offers examples and section 6 provides the conclusion.

2. FUZZY MEASURES, CHOQUET INTEGRAL

Let X be the factor space, composed of the n predictive attributes: X = {x_1, x_2, ..., x_n}. A signed fuzzy measure is a set function µ defined on (X, P(X)) satisfying µ(∅) = 0, where P(X) represents the power set of X. Usually, we can assume that µ(X) = 1.

The data set takes the following form:

    x_1    x_2    ...   x_n    y
    f_11   f_12   ...   f_1n   y_1
    f_21   f_22   ...   f_2n   y_2
    ...    ...    ...   ...    ...
    f_l1   f_l2   ...   f_ln   y_l

where row f_j1 f_j2 ... f_jn y_j is the j-th observation of the predictive attributes x_1, x_2, ..., x_n and y_j is the corresponding value of the objective attribute. The number of observations in the data set is recommended to be at least 5 × 2^n. Each observation can be regarded as a function f_j: X → (−∞, ∞), where f_ji = f_j(x_i).

The nonlinear multiregression discussed in this paper is expressed as y = c + ∫(a + bf) dµ + N(0, σ^2), where c is a constant, a and b are real-valued functions defined on X, f is an observation of x_1, x_2, ..., x_n, and N(0, σ^2) is a normally distributed random perturbation with expectation 0 and variance σ^2. Here (a + bf) = (a + bf_j) for j = 1, 2, ..., l, and (a + bf_j)(x_i) = a_i + b_i f_ji for i = 1, 2, ..., n. The integral is taken to be the generalized Choquet integral, which is defined as

    ∫ f dµ = ∫_{−∞}^{0} [µ(F_α) − µ(X)] dα + ∫_{0}^{∞} µ(F_α) dα,

where not both terms on the right side are infinite and F_α = {x | f(x) ≥ α, x ∈ X} for any α ∈ (−∞, ∞) [4, 5, 6].
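On the finite space X this definition reduces to a finite sum: if the attributes are reordered as x_(1), ..., x_(n) so that f(x_(1)) ≤ ... ≤ f(x_(n)), then ∫ f dµ = f(x_(1)) µ(X) + Σ_{i=2}^{n} [f(x_(i)) − f(x_(i−1))] µ({x_(i), ..., x_(n)}). As a minimal illustration (our own sketch, not the paper's code; class and method names are ours), the following Java fragment evaluates the integral this way, storing the signed fuzzy measure in an array of length 2^n indexed by subset bitmask, the same indexing introduced in section 3:

// A minimal sketch: the discrete generalized Choquet integral of f with
// respect to a signed fuzzy measure mu. mu has length 2^n and is indexed by
// subset bitmask: bit i of the index is 1 iff x_{i+1} belongs to the subset;
// mu[0] = mu(empty set) = 0 is assumed. Names are illustrative only.
public final class ChoquetIntegral {

    public static double integrate(double[] f, double[] mu) {
        int n = f.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Sort indices so that f[order[0]] <= ... <= f[order[n-1]].
        java.util.Arrays.sort(order, (u, v) -> Double.compare(f[u], f[v]));

        int level = (1 << n) - 1;                 // F starts as the whole space X
        double sum = f[order[0]] * mu[level];     // f(x_(1)) * mu(X)
        for (int i = 1; i < n; i++) {
            level &= ~(1 << order[i - 1]);        // now F = {x_(i), ..., x_(n)}
            sum += (f[order[i]] - f[order[i - 1]]) * mu[level];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Toy check, n = 2; mu indexed as {}, {x1}, {x2}, {x1,x2}.
        double[] mu = {0.0, 0.4, 0.6, 1.0};
        // For this additive mu the integral is 1*0.4 + 3*0.6 = 2.2.
        System.out.println(integrate(new double[] {1.0, 3.0}, mu));
    }
}

For an additive measure the integral reduces to the ordinary weighted sum Σ_i f(x_i) µ({x_i}), which the toy check above reproduces.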
Functions a and b are both expressed as vectors, with the constraints that

    a_i ≥ 0 for i = 1, 2, ..., n, with min_{1≤i≤n} a_i = 0;
    −1 ≤ b_i ≤ 1 for i = 1, 2, ..., n, with max_{1≤i≤n} |b_i| = 1.

For convenience, we take µ(X) ≥ 0.
3. GENETIC ALGORITHM

Recently, an algebraic approach to finding the values of the signed fuzzy measure has been introduced [4]. This genetic algorithm is based on that work. The optimization goal of the genetic algorithm is to minimize the regression residual error defined in step 11.

(1) For n, an integer equal to the number of attributes, express k in binary digits as k = k_n k_{n−1} ... k_1 for every k = 0, 1, 2, ..., 2^n − 1. Example: k = 6 as a binary number is 110, so if n = 4, then k_4 = 0, k_3 = 1, k_2 = 1, and k_1 = 0.

(2) Use µ_k to denote µ(A), where A = ∪_{i: k_i = 1} {x_i}, for k = 0, 1, 2, ..., 2^n − 1.

(3) Input integer l, which equals the number of observations in the data set, and input the data set itself.
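Steps 1 and 2 simply index the 2^n subsets of X, and hence the values µ_k, by the binary digits of k. The following small Java illustration (our own code, not part of the paper's program) prints the correspondence:

// Illustration of steps 1-2: each k = 0, 1, ..., 2^n - 1 names the subset A
// that is the union of {x_i : k_i = 1}, so the single index k stands for mu(A).
public class SubsetIndexing {
    public static void main(String[] args) {
        int n = 4;
        for (int k = 0; k < (1 << n); k++) {
            StringBuilder a = new StringBuilder();
            for (int i = 1; i <= n; i++)
                if (((k >> (i - 1)) & 1) == 1)      // k_i, the i-th binary digit of k
                    a.append("x").append(i).append(' ');
            System.out.println("mu_" + k + " = mu({ " + a + "})");
        }
        // For k = 6 (binary 0110): k_4 = 0, k_3 = 1, k_2 = 1, k_1 = 0,
        // so mu_6 = mu({x2, x3}), matching the example in step 1.
    }
}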
(4) Seed the random number generator using a large prime number. Set the value for the following parameters:

λ: The bit length of each gene, which depends on the required precision of the results. For example, λ = 10 means that 2^10 = 1024 binary numbers in [0, 1) can be represented; that means the precision is 1/1024, which is near 10^−3. The default for λ is 10.

p: The population size. It should be a large, positive, even integer. The default is 200.

α and β: The probabilities used in a random switch to control the choice of genetic operators for producing offspring from selected parents. α is the probability of using a three-bit mutation, β is the probability of using a two-point crossover, and 1 − α − β is the probability of using a two-point realignment [4]. They should satisfy the conditions α ≥ 0, β ≥ 0, and α + β ≤ 1. Their defaults are 0.2 and 0.5 respectively. The random switch aligns the intervals [0, α), [α, α + β), and [α + β, 1) (with the defaults, [0, 0.2), [0.2, 0.7), and [0.7, 1)) and then, depending on a random number generated on [0, 1), selects the appropriate genetic operator.

ε and δ: Small positive numbers used in the stopping controller. Their defaults are 10^−10 and 10^−6 respectively.

w: An integer to limit the number of successive generations that have not made significant progress. The default is 10.

(5) Calculate σ̂_y^2 = (1/l) Σ_{j=1}^{l} (y_j − ȳ)^2, where ȳ = (1/l) Σ_{j=1}^{l} y_j. Construct the vector q = (q_1, q_2, ..., q_l) as follows: q_j = y_j, where j = 1, 2, ..., l.

(6) Randomly create the initial population of p chromosomes. Each chromosome has 2n genes, denoted by g_1, g_2, ..., g_n, g_{n+1}, g_{n+2}, ..., g_{2n}, the first n of which represent vector a and the second n of which represent vector b. Each gene represents a rational number in [0, 1). In binary, the numerator of g_i has the form t_λ t_{λ−1} ... t_1, where t_d ∈ {0, 1} and d = λ, λ − 1, ..., 1; the denominator is a one followed by λ zeros. For example, if λ = 10, g_1 = 0101011110/10000000000. Converting g_1 to base 10 results in g_1 = 350/1024.

(7) Variable GC is initialized to p and is used to count how many chromosomes have been generated. Variable WT is a counter that is initialized to zero; it increases when no significant progress has been made from one generation to the next generation. (Once progress is made, WT is reset back to zero.) SE is a variable that represents the saved error; it is initialized to σ̂_y^2.

(8) Decode each chromosome to produce the elements of vectors a and b using the following formulae:

    a_i = (g_i − m(g)) / [(1 − g_i)(1 − m(g))], where m(g) = min_{1≤k≤n} g_k;
    b_i = (2g_{n+i} − 1) / M(g), where M(g) = max_{1≤k≤n} |2g_{n+k} − 1|,

where i = 1, 2, ..., n.
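The decoding of step 8 can be sketched directly; the fragment below uses our own names and takes the gene values as already-converted doubles in [0, 1). By construction, min a_i = 0 and max |b_i| = 1, matching the constraints of section 2:

// Sketch of step 8 (illustrative names): decode the 2n gene values
// g[0..2n-1], each in [0, 1), into the vectors a and b.
public class DecodeStep8 {

    static double[][] decodeChromosome(double[] g, int n) {
        double mg = g[0];                               // m(g), min of the first n genes
        for (int i = 1; i < n; i++) mg = Math.min(mg, g[i]);
        double Mg = 0.0;                                // M(g) = max |2 g_{n+k} - 1|
        for (int i = 0; i < n; i++) Mg = Math.max(Mg, Math.abs(2 * g[n + i] - 1));
        // Mg == 0 (all b-genes equal to 0.5) is a degenerate case; presumably
        // such a chromosome would be rejected and regenerated, much like the
        // rank test of step 9 (our assumption, not stated in the paper).
        double[] a = new double[n], b = new double[n];
        for (int i = 0; i < n; i++) {
            a[i] = (g[i] - mg) / ((1 - g[i]) * (1 - mg)); // so that min a_i = 0
            b[i] = (2 * g[n + i] - 1) / Mg;               // so that max |b_i| = 1
        }
        return new double[][] {a, b};
    }

    public static void main(String[] args) {
        // n = 2: genes g1, g2 encode a; genes g3, g4 encode b.
        double[][] ab = decodeChromosome(new double[] {0.25, 0.5, 0.9, 0.3}, 2);
        System.out.println(java.util.Arrays.toString(ab[0])); // [0.0, 0.666...]
        System.out.println(java.util.Arrays.toString(ab[1])); // [1.0, -0.5]
    }
}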
(9) For each chromosome in the population, construct a matrix Z with dimensions l × 2^n as follows:

    z_j0 = 1;
    z_jk = min_{i: k_i = 1} (a_i + b_i f_ji) − max_{i: k_i = 0} (a_i + b_i f_ji), if this quantity is > 0 or if k = 2^n − 1;
    z_jk = 0, otherwise,

where j = 1, 2, ..., l. If the columns of Z are not linearly independent, this chromosome should be rejected. A new chromosome should be randomly generated to replace the rejected chromosome. The new chromosome should be decoded using step 8, and it should have the property that the newly generated Z matrix has linearly independent columns. GC should be updated to include the number of rejected chromosomes.

(10) For each matrix Z, apply the QR decomposition theorem to find the least squares solution of the system of linear equations Zv = q, where the elements of v represent the unknown variables c, µ_1, µ_2, ..., µ_{2^n−1} [1]. There may be instances where a and b cause matrix Z to contain one or more columns which are all zero (i.e., matrix R in the QR decomposition is singular). When this happens, any columns which contain all zeros should be removed from matrix Z, and the µ values that correspond to the removed columns can be set arbitrarily. The remaining µ values are still determined from the least squares solution of the Z matrix with the columns of zeros removed.

(11) For each chromosome in the population, calculate the regression residual error σ̂^2, which is defined as follows:

    σ̂^2 = (1/l) Σ_{j=1}^{l} [y_j − c − ∫(a + bf_j) dµ]^2 = (1/l) Σ_{j=1}^{l} (y_j − c − Σ_{k=1}^{2^n−1} z_jk µ_k)^2.

The residual error of the r-th chromosome in the population is denoted by σ̂_r^2.

(12) Let m(σ̂^2) = min_{1≤r≤p} σ̂_r^2 and set R = {r | σ̂_r^2 = m(σ̂^2)}. Erase the last history record associated with R, if any, and save m(σ̂^2) and the a, b, c, and µ of all r ∈ R in the current population. Display GC, WT, and m(σ̂^2).

(13) If m(σ̂^2) < ε σ̂_y^2, then go to step 22; the desired precision has been reached. Otherwise, take the next step.

(14) If SE − m(σ̂^2) < δ σ̂_y^2, then WT + 1 → WT and take the next step (i.e., no significant progress was made); otherwise, 0 → WT and go to step 16.

(15) If WT > w, then go to step 22 (i.e., no significant progress has been made in the past w generations); otherwise, take the next step.

(16) The relative goodness of the r-th chromosome in the current population is defined by G_r = m(σ̂^2) / σ̂_r^2, r = 1, 2, ..., p, if m(σ̂^2) > 0.

(17) The probability distribution of the r-th chromosome in the current population is defined by p_r = G_r / Σ_{r=1}^{p} G_r, r = 1, 2, ..., p.

(18) Use the probability distribution {p_r | r = 1, 2, ..., p} and a random switch to select two different chromosomes from the population to use as parents. Use α, β, and a random switch to select a genetic operator to produce two new chromosomes as offspring.

(19) Repeat step 18 p/2 times in total to get a new generation of p chromosomes. GC + p → GC. Save m(σ̂^2) in SE.

(20) For each new chromosome, repeat steps 8-11 to determine a, b, c, µ, and σ̂_r^2.

(21) Use the magnitude of σ̂_r^2 (the smaller the better) to select the p best chromosomes from both generations to form the new population. Then go to step 12.

(22) For all r ∈ R, check the sign of µ_{2^n−1}. In cases where µ_{2^n−1} < 0, perform the following substeps: replace c by c + µ_{2^n−1} · max_{1≤i≤n} a_i, replace a_i by max_{1≤i≤n} a_i − a_i, switch the sign of vector b, and switch the sign of all elements of µ. If WT > w, proceed to the iterative search algorithm in section 4. Otherwise, display s, p, λ, α, β, ε, δ, and w. After deleting any duplicates, display the a, b, c, and µ of the r-th chromosome for all r ∈ R. Stop.
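Taken together, steps 9-11 linearize the regression: the entries z_jk are chosen so that ∫(a + bf_j) dµ = Σ_{k=1}^{2^n−1} z_jk µ_k, and the unknowns v = (c, µ_1, ..., µ_{2^n−1}) are then recovered as the least squares solution of Zv = q. The Java sketch below (our own illustrative code; the QR-based solve of step 10 would come from a numerical linear algebra routine [1]) builds one row of Z:

// Sketch of step 9 (illustrative names): build row j of the l x 2^n matrix Z
// from the decoded vectors a, b and the j-th observation fj. Bit i of k plays
// the role of k_{i+1} in the paper's notation.
public class ZMatrixRow {

    static double[] buildZRow(double[] a, double[] b, double[] fj) {
        int n = a.length, full = (1 << n) - 1;
        double[] z = new double[1 << n];
        z[0] = 1.0;                                   // multiplies the constant c
        for (int k = 1; k <= full; k++) {
            double min = Double.POSITIVE_INFINITY;    // over i with k_i = 1
            double max = Double.NEGATIVE_INFINITY;    // over i with k_i = 0
            for (int i = 0; i < n; i++) {
                double v = a[i] + b[i] * fj[i];
                if (((k >> i) & 1) == 1) min = Math.min(min, v);
                else                     max = Math.max(max, v);
            }
            if (k == full)          z[k] = min;       // no i with k_i = 0 here
            else if (min - max > 0) z[k] = min - max;
            else                    z[k] = 0.0;
        }
        return z;
    }

    public static void main(String[] args) {
        double[] a = {0.0, 0.5}, b = {1.0, 0.5}, fj = {1.0, 3.0};
        // (a + b fj) = (1.0, 2.0): z_1 = 1 - 2 < 0 so 0, z_2 = 2 - 1 = 1,
        // z_3 = min(1, 2) = 1. Prints [1.0, 0.0, 1.0, 1.0].
        System.out.println(java.util.Arrays.toString(buildZRow(a, b, fj)));
    }
}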
4. ITERATIVE SEARCH ALGORITHM

The design of the iterative search algorithm is intended to fine-tune a and b to quickly reduce the error [3]. The optimization goal of the iterative search algorithm is to minimize the root mean square error given in step 3. Only one vector, and only one element inside of that vector, is considered at a time.

Given a vector and a dimension, the general idea is as follows. Take an initial step of length δ in the negative direction and calculate the error associated with the step. Next, a step of length δ is taken in the positive direction (from the original value) and once again the error is calculated. Whichever step direction (positive or negative) had the smaller error determines the direction for future steps. The step length is then iteratively doubled in the chosen direction until the error between iterations grows instead of decreases. Next, the direction is reversed by changing the sign. The step length is then iteratively cut in half until the error between iterations grows instead of decreases. Each time the error grows, the direction is reversed, but the step length continues to be cut in half. Processing is complete for a dimension when the absolute value of the step length is less than δ. The remaining dimensions of the vectors need to be considered in turn. A condensed sketch of this one-dimensional search is given below.
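In the sketch (our own paraphrase, not the paper's code), err.applyAsDouble(x) is assumed to set the current coordinate to x, re-fit c and µ (step 5 of part 1), and return the root mean square error (step 3); clamping to the constraints of part 2 is omitted for brevity, and the bookkeeping is condensed relative to steps 4-14:

import java.util.function.DoubleUnaryOperator;

// Condensed sketch of the per-dimension search (part 1, steps 4-14), with
// illustrative names: probe +/- delta, double the step while the error keeps
// falling, then reverse and halve until |step| < delta.
public final class DimensionSearch {

    static double searchOneDimension(double x0, double delta, DoubleUnaryOperator err) {
        double e0 = err.applyAsDouble(x0);
        double ePlus = err.applyAsDouble(x0 + delta);
        double eMinus = err.applyAsDouble(x0 - delta);
        if (ePlus >= e0 && eMinus >= e0) return x0;       // step 10: neither probe improves
        double adj = (ePlus < eMinus) ? delta : -delta;   // direction with the smaller error
        double x = x0 + adj, e = Math.min(ePlus, eMinus);
        while (true) {                                    // steps 11-12: double while improving
            adj *= 2;
            double eNext = err.applyAsDouble(x + adj);
            if (eNext >= e) { adj = -adj; break; }        // error grew: reverse direction
            x += adj; e = eNext;
        }
        while (Math.abs(adj) >= delta) {                  // steps 13-14: halve, reversing on failure
            adj /= 2;
            double eNext = err.applyAsDouble(x + adj);
            if (eNext < e) { x += adj; e = eNext; }
            else adj = -adj;
        }
        return x;                                         // done when |adj| < delta
    }

    public static void main(String[] args) {
        // Toy check on a smooth error curve with minimum at x = 2.
        DoubleUnaryOperator err = x -> (x - 2) * (x - 2) + 1;
        System.out.println(searchOneDimension(0.0, 1e-6, err)); // approximately 2.0
    }
}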
4.1 Algorithm part 1

(1) Retrieve the a, b, c, and µ of one chromosome from set R.

(2) Calculate ŷ_j = c + ∫(a + bf_j) dµ for all j = 1, 2, ..., l.

(3) Calculate the initial error e_0 = √[(1/l) Σ_{j=1}^{l} (y_j − ŷ_j)^2].

(4) a_1 + δ → a_1. Ensure a_1 does not violate the vector constraints given in algorithm part 2.

(5) Determine the new c and µ values by performing steps 9 and 10 from the genetic algorithm. Calculate ŷ_j for all j = 1, 2, ..., l using the formula in step 2.

(6) Calculate the error, e_{δ+}, using the same formula as in step 3.

(7) a_1 − 2δ → a_1. Ensure a_1 does not violate the vector constraints given in algorithm part 2.

(8) Repeat step 5 and then go to the next step.

(9) Calculate the error, e_{δ−}, using the same formula as in step 3.

(10) If e_{δ+} and e_{δ−} are both ≥ e_0, then 0 → ∆a_1 and 0 → ∆e_{a_1}. Otherwise, set ADJ to either +δ or −δ depending on which caused the smaller error. ADJ represents the adjustment length of the dimension.

(11) 2·ADJ → ADJ. a_1 + ADJ → a_1. Ensure a_1 does not violate the vector constraints given in algorithm part 2. Recalculate the error, e, by repeating steps 5 and then 3.

(12) Repeat step 11 until the error at one iteration is greater than that of the previous iteration. −ADJ → ADJ and proceed to the next step.

(13) (1/2)·ADJ → ADJ. a_1 + ADJ → a_1. Ensure a_1 does not violate the vector constraints given in algorithm part 2. Recalculate the error, e, by repeating steps 5 and then 3.

(14) Repeat step 13 until the error at one iteration is greater than that of the previous iteration or |ADJ| < δ. −ADJ → ADJ. If |ADJ| < δ then proceed to the next step; otherwise go back to step 13.

(15) Set ∆a_1 equal to the change from the original a_1 to the current a_1. Set ∆e_{a_1} equal to the change from the original error to the last calculated error. Set a_1 back to the original a_1.

(16) Repeat steps 4-15 for elements a_2, a_3, ..., a_n of vector a.

(17) Repeat steps 4-15 for each element of vector b.

(18) Let E_a = max_{1≤i≤n} ∆e_{a_i} and E_b = max_{1≤i≤n} ∆e_{b_i}. Define M(∆e) = max{E_a, E_b} and M(∆ab) = max{max_{1≤i≤n} |∆a_i|, max_{1≤i≤n} |∆b_i|}.

(19) If e_0 > 0, M(∆e) > 0, and M(∆ab) ≥ δ, then continue to the next step. Otherwise, proceed to step 25.

(20) a_i + ∆a_i → a_i, b_i + ∆b_i → b_i for i = 1, 2, ..., n. Revise the history record.

(21) Calculate e by repeating steps 5 and 3.

(22) If e > e_0, proceed to the next step. Otherwise set e → e_0 and go back to step 4.

(23) Reduce ∆a_i and ∆b_i by half for i = 1, 2, ..., n. a_i − ∆a_i → a_i and b_i − ∆b_i → b_i for i = 1, 2, ..., n.

(24) Calculate e by repeating steps 5 and 3. Go back to step 22.

(25) Output a, b, c, and µ. Stop.

4.2 Algorithm part 2

(1) For i = 1, 2, ..., n, if a_i < 0 then 0 → a_i.

(2) For i = 1, 2, ..., n, if b_i > 1 then 1 → b_i. If b_i < −1 then −1 → b_i.

5. EXAMPLES

The program has been coded in Java and it runs on a laptop computer. Two examples are presented. The first example involves artificial data while the second involves real data.

5.1 Example 1

An artificial data set with three attributes and forty observations was created to test the program. The following parameters were used in a test of the program: λ = 14, p = 200, ε = 10^−10, δ = 10^−6, and w = 10. The genetic algorithm portion of the program generated 10,000 chromosomes and lowered the error from 3.4631 × 10^−5 to 3.4879 × 10^−6 in 0.4632 minutes. The iterative algorithm lowered the error from 0.4029 to 4.6569 × 10^−4 in an additional 0.0870 minutes. All of the multiregression variables were reasonably reclaimed.
5.2 Example 2

An appropriate multiregression problem is to estimate the gross domestic product of a country (y) by using the worth of the country's fixed assets (x_1), the total number of people in the labor force (x_2), and the number of people who pursued higher education (x_3). Nineteen observations during the last century were taken as the input. The attribute of fixed assets was represented in billions of dollars, and the labor force and higher education attributes were represented in millions of people.

Eight thousand two hundred chromosomes were generated in the genetic algorithm portion of the program. Comparing the error in the first generation with the error in the final generation shows an error reduction of 72%. The iterative portion of the program resulted in an additional 5% improvement. Following 0.22605 minutes of processing, the computer program's output is as follows:

    a_1 = 1.22790
    a_2 = 0
    a_3 = 904.37073
    b_1 = 0.61832
    b_2 = 0.38082
    b_3 = 0.90887
    c = −121838.28702
    µ(∅) = 0.0
    µ({x_1}) = 1.67278
    µ({x_2}) = −16.04154
    µ({x_1, x_2}) = 1.08226
    µ({x_3}) = 136.21430
    µ({x_1, x_3}) = 0.0
    µ({x_2, x_3}) = 129.04767
    µ({x_1, x_2, x_3}) = 130.85585

Next, the objective attribute values in the data set were compared against the estimated objective attribute values calculated from the output above (see Table 1). Though the data size is not sufficiently large, we can still see that the method provided in this paper is efficient for solving nonlinear multiregression problems based on the generalized Choquet integral, as well as some nonlinear optimization problems with a nondifferentiable objective function.

Table 1: Comparisons of the Objective Attribute Values

    Year | Objective Attribute from Data Set (billion dollars) | Estimated Objective Attribute (billion dollars)
    1984 |  322.07 |  325.39
    1985 |  427.5  |  424.85
    1986 |  500.06 |  492.08
    1987 |  603.71 |  614.43
    1988 |  765.76 |  715.83
    1989 |  843.72 |  859.08
    1990 |  897.99 |  915.82
    1991 | 1081.75 | 1076.41
    1992 | 1365.06 | 1361.73
    1993 | 1909.49 | 2010.52
    1994 | 2666.86 | 2566.73
    1995 | 3524.79 | 3532.33
    1996 | 4146.06 | 4232.31
    1997 | 4638.24 | 4543.16
    1998 | 4987.5  | 4988.97
    1999 | 5364.89 | 5395.17
    2000 | 6036.34 | 6025.18
    2001 | 6748.15 | 6758.32
    2002 | 7670    | 7661.64

6. CONCLUSION

Though the complete algorithm shows great promise, much work must still be done to reduce its complexity. One suggestion would be to make the genetic portion of the algorithm adaptive. The most significant bits in the genes would be altered in the initial stages to cause wider genetic variation; in subsequent generations, bits of less significance would be altered.

REFERENCES

[1] J. Demmel, Applied Numerical Linear Algebra, Philadelphia: SIAM, 1997.

[2] F. S. Hillier and G. J. Lieberman, Introduction to Operations Research (Seventh Edition), McGraw Hill, 2001.

[3] J. Wang and Z. Wang, "Using Neural Networks to Determine Sugeno Measures by Statistics," Neural Networks, vol. 10, pp. 183-195, 1997.

[4] Z. Wang, "A new genetic algorithm for nonlinear multiregressions based on generalized Choquet integrals," Proc. FUZZ-IEEE 2003, pp. 819-921, 2003.

[5] Z. Wang and G. J. Klir, Fuzzy Measure Theory, New York: Plenum Press, 1992.

[6] Z. Wang, K.-S. Leung, M.-L. Wong, J. Fang, and K. Xu, "Nonlinear nonnegative multiregressions based on Choquet integrals," International Journal of Approximate Reasoning, vol. 25, pp. 71-87, 2000.
