
A Minimal Algorithm for the Multiple-Choice Knapsack Problem


David Pisinger
Dept. of Computer Science, University of Copenhagen,
Universitetsparken 1, DK-2100 Copenhagen, Denmark.
May, 1994

Abstract
The Multiple-Choice Knapsack Problem is defined as a 0-1 Knapsack Problem
with the addition of disjoint multiple-choice constraints. As for other knapsack
problems, most of the computational effort in the solution of these problems is used
for sorting and reduction. But although O(n) algorithms which solve the linear
Multiple-Choice Knapsack Problem without sorting have been known for more than
a decade, such techniques have not been used in enumerative algorithms.
In this paper we present a simple O(n) partitioning algorithm for deriving the
optimal linear solution, and show how it may be incorporated in a dynamic programming algorithm such that a minimal number of classes are enumerated, sorted
and reduced. Computational experiments indicate that this approach leads to a
very efficient algorithm which outperforms any known algorithm for the problem.
Keywords: Packing; Knapsack Problem; Dynamic Programming; Reduction.

1 Introduction

We are given $k$ classes $N_1, \ldots, N_k$ of items to pack in some knapsack of capacity $c$. Each item $j \in N_i$ has a profit $p_{ij}$ and a weight $w_{ij}$, and the problem is to choose one item from each class such that the profit sum is maximized without the weight sum exceeding $c$.
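The Multiple-Choice Knapsack Problem (MCKP) may thus be formulated as:

$$
\begin{aligned}
\max \quad & z = \sum_{i=1}^{k} \sum_{j \in N_i} p_{ij} x_{ij} \\
\text{subject to} \quad & \sum_{i=1}^{k} \sum_{j \in N_i} w_{ij} x_{ij} \le c, \\
& \sum_{j \in N_i} x_{ij} = 1, \quad i = 1, \ldots, k, \\
& x_{ij} \in \{0, 1\}, \quad i = 1, \ldots, k,\ j \in N_i.
\end{aligned}
\tag{1}
$$

Technical Report 94/25, DIKU, University of Copenhagen, Denmark.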

All coefficients $p_{ij}$, $w_{ij}$, and $c$ are positive integers, and the classes $N_1, \ldots, N_k$ are mutually disjoint, class $N_i$ having size $n_i$. The total number of items is $n = \sum_{i=1}^{k} n_i$.
Negative coefficients pij , wij in (1) may be handled by adding a sufficiently large constant to all items in the corresponding class as well as to c. To avoid unsolvable or trivial
situations we assume that

$$ \sum_{i=1}^{k} \min_{j \in N_i} w_{ij} \le c < \sum_{i=1}^{k} \max_{j \in N_i} w_{ij}. \tag{2} $$

If we relax the integrality constraint $x_{ij} \in \{0, 1\}$ in (1) to $0 \le x_{ij} \le 1$, we obtain the Linear Multiple-Choice Knapsack Problem (LMCKP). If each class has two items, where $(p_{i1}, w_{i1}) = (0, 0)$, $i = 1, \ldots, k$, the problem (1) corresponds to the 0-1 Knapsack Problem (KP). The linear version of KP will be denoted by LKP.
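For checking small instances, formulation (1) can be solved by plain enumeration of all ways of picking one item per class. The Python sketch below is only an illustration and is not part of the report; the function name `mckp_brute_force` and the representation of a class as a list of (weight, profit) pairs are assumptions. Its running time is exponential in k.

```python
from itertools import product

def mckp_brute_force(classes, c):
    """Solve formulation (1) by enumerating every choice of one item per class.

    Hypothetical helper: classes is a list of classes, each a list of
    (weight, profit) pairs. Returns (best profit, chosen items), or None
    if no choice fits into the capacity c.
    """
    best = None
    for choice in product(*classes):          # exactly one item from each class
        weight = sum(w for w, _ in choice)
        profit = sum(p for _, p in choice)
        if weight <= c and (best is None or profit > best[0]):
            best = (profit, choice)
    return best
```

The KP special case mentioned above is obtained by turning every KP item (w, p) into the class [(0, 0), (w, p)].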
MCKP is NP-hard as it contains KP as a special case, but it can be solved in pseudo-polynomial time through dynamic programming (Dudzinski and Walukiewicz 1987). The problem has a wide range of applications: Capital Budgeting (Nauss 1978), Menu Planning (Sinha and Zoltners 1979), transforming nonlinear KP to MCKP (Nauss 1978), determining which components should be linked in series in order to maximize fault tolerance (Sinha and Zoltners 1979), and accelerating ordinary LP/GUB problems by the dual simplex algorithm (Witzgal 1977). Moreover, MCKP appears in the Lagrangian relaxation of several integer programming problems (Fisher 1981).
Several algorithms for MCKP have been presented during the last two decades: e.g.
Nauss (1978), Sinha and Zoltners (1979), Dudzinski and Walukiewicz (1984), and Dyer,
Kayal and Walker (1984). Most of these algorithms start by solving LMCKP in order to obtain an upper bound for the problem. LMCKP is solved in two steps: 1) The LP-dominated items are reduced by sorting the items in each class according to nondecreasing weights, and then applying some dominance criteria to delete unpromising items. 2) The reduced LMCKP is solved by a greedy algorithm.
After these two initial steps, upper bound tests may be used to fix several variables in
each class to their optimal value. The reduced MCKP problem is then solved to optimality
through enumeration (Dudzinski and Walukiewicz 1987).
Developments for KP, however, indicate that MCKP may be solved more easily: Balas and Zemel (1980) proposed for KP to consider only a small subset of the items, the so-called core, where there is a large probability of finding an optimal solution. Such a core can be found in O(n) time through a partitioning algorithm, and since the restricted KP defined on the core items is easy to solve for several classes of data instances, many instances may be solved in linear time (Martello and Toth 1988, Pisinger 1994a).
Although O(n) algorithms for LMCKP have been known for a decade (Zemel 1984, Dyer 1984), making it reasonably easy to derive a core, a similar technique has not been used for MCKP. It has been claimed (Martello and Toth 1990) that the reduction of LP-dominated items was in any case necessary in order to derive upper bounds in a branch-and-bound algorithm. The current paper refutes this conjecture, but several questions had to be addressed:
- Which items or classes should be included in the core?
- How should we derive upper bounds in a branch-and-bound algorithm when LP-dominated items have not been deleted?
- How should a core be derived? Zemel (1984) and Dyer (1984) only give algorithms for solving LMCKP, so some modification is necessary.
- The partitioning algorithms by Zemel and Dyer operate on the dual to LMCKP, making them complicated to implement for practical purposes. Some simplifications, like those by Martello and Toth (1988) or Pisinger (1994a) for KP, seem necessary.
The present paper is a counterpart to a minimal algorithm for KP by Pisinger (1993): A
simple algorithm is used for solving LMCKP, and for deriving an initial feasible solution to
MCKP. Starting from this initial solution we use dynamic programming to solve MCKP,
adding new classes to the core as needed. By this technique we are able to show that a
minimal number of classes are considered in order to solve MCKP to optimality.
This paper is organized in the following way: First, Section 2 gives some basic definitions and shows fundamental properties of MCKP, while Section 3 presents a simple partitioning algorithm for the solution of LMCKP. Next, Section 4 shows how gradients may be used in an expanding core, and presents some logical tests which may be used to fix some variables at their optimal value before a class is added to the core. Section 5 gives a description of the dynamic programming algorithm, and Section 6 shows how we keep track of the solution vector in dynamic programming. Finally, Section 7 gives the main algorithm and proves its minimality, and Section 8 presents computational experiments.

2 Fundamental properties

Definition 1 If two items $r$ and $s$ in the same class $N_i$ satisfy

$$ w_{ir} \le w_{is} \quad \text{and} \quad p_{ir} \ge p_{is}, \tag{3} $$

then we say that item $r$ dominates item $s$.


Proposition 1 Given two items $r, s \in N_i$. If item $r$ dominates item $s$, then an optimal solution to MCKP with $x_{is} = 0$ exists.
Proof Let $x^*$ be an optimal solution to (1) with $x^*_{is} = 1$. Then a solution $x'$ equal to $x^*$ except that $x'_{ir} = 1$, $x'_{is} = 0$ will be feasible, and it will have an objective value at least as good as $x^*$ due to (3).

Figure 1: Item $s$ is LP-dominated by items $r$ and $t$.


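Definition 2 If items $r, s, t \in N_i$ satisfy

$$ w_{ir} \le w_{is} \le w_{it} \quad \text{and} \quad p_{ir} \le p_{is} \le p_{it}, \tag{4} $$

and the projection of the vector $(w_{is} - w_{ir}, p_{is} - p_{ir})$ on the normal $(-(p_{it} - p_{ir}), w_{it} - w_{ir})$ to $(w_{it} - w_{ir}, p_{it} - p_{ir})$ is nonpositive, i.e. if

$$ \det(w_{is} - w_{ir}, p_{is} - p_{ir}, w_{it} - w_{ir}, p_{it} - p_{ir}) = (w_{is} - w_{ir})(p_{it} - p_{ir}) - (p_{is} - p_{ir})(w_{it} - w_{ir}) \ge 0, \tag{5} $$

then we say that item $s$ is LP-dominated by items $r$ and $t$. See Figure 1.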
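Criterion (5) is a single 2×2 determinant, so testing LP-dominance costs constant time. The Python sketch below is illustrative only; the helper names `det` and `lp_dominated` and the representation of items as (weight, profit) pairs are assumptions, not taken from the report.

```python
def det(a, b, c, d):
    """2x2 determinant det(a, b, c, d) = a*d - b*c, as used in (5)."""
    return a * d - b * c

def lp_dominated(r, s, t):
    """True if item s is LP-dominated by items r and t (Definition 2).

    Hypothetical helper: r, s, t are (weight, profit) pairs of the same class.
    """
    (wr, pr), (ws, ps), (wt, pt) = r, s, t
    ordered = wr <= ws <= wt and pr <= ps <= pt                      # condition (4)
    return ordered and det(ws - wr, ps - pr, wt - wr, pt - pr) >= 0  # condition (5)
```

For example, with r = (0, 0) and t = (4, 4), the item s = (2, 1) lies below the segment from r to t and is LP-dominated, while s = (2, 3) lies above it and is not.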

Proposition 2 If two items $r, t \in N_i$ LP-dominate an item $s \in N_i$, then an optimal solution to LMCKP with $x_{is} = 0$ exists.
Proof See Sinha and Zoltners (1979).
As a consequence, we only have to consider the LP-undominated items $R_i$ in the solution of LMCKP. Note that these items form the upper convex boundary of the set $N_i$, as illustrated in Figure 2. The set of LP-undominated items may be found by ordering the items in each class $N_i$ according to increasing weights, and successively testing the items according to criteria (3) and (6). If two items have the same weight and profit, choose an arbitrary one of them.

Figure 2: LP-undominated items $R_i$ (black) form the upper convex boundary of $N_i$.

Now LMCKP may be solved by using the greedy algorithm:
Algorithm 1 Greedy.
1 Find the LP-undominated classes $R_i$ (ordered by increasing weights) for all classes $N_i$, $i = 1, \ldots, k$.
2 Choose the lightest item from each class (i.e. set $x_{i1} = 1$, $x_{ij} = 0$ for $j = 2, \ldots, |R_i|$, $i = 1, \ldots, k$) and define the chosen weight and profit sum as $W = \sum_{i=1}^{k} w_{i1}$, resp. $P = \sum_{i=1}^{k} p_{i1}$.
3 For all items $j \ne 1$ define the slope $\lambda_{ij}$ as

$$ \lambda_{ij} = \frac{p_{ij} - p_{i,j-1}}{w_{ij} - w_{i,j-1}}, \quad i = 1, \ldots, k,\ j = 2, \ldots, |R_i|. \tag{6} $$

This slope is a measure of the profit-to-weight ratio obtained by choosing item $j$ instead of item $j - 1$ in class $R_i$ (Zemel 1980). Clearly a greedy algorithm should choose the most profitable changes first; therefore, order the slopes $\{\lambda_{ij}\}$ in nonincreasing order.
4 Take the next slope $\lambda_{ij}$ from $\{\lambda_{ij}\}$. If $W + w_{ij} - w_{i,j-1} > c$, go to step 5. Otherwise set $x_{ij} = 1$, $x_{i,j-1} = 0$ and update the sums $W = W + w_{ij} - w_{i,j-1}$, $P = P + p_{ij} - p_{i,j-1}$. Repeat step 4.
5 If $W = c$, we have an integer solution and the optimal objective value to LMCKP (and MCKP) is $z = P$. Otherwise let $\lambda_{ij}$ be the slope at which step 4 stopped. We have two fractional variables $x_{ij} = (c - W)/(w_{ij} - w_{i,j-1})$, respectively $x_{i,j-1} = 1 - x_{ij}$, which both belong to the same class. The optimal objective value is

$$ z = P + (c - W)\lambda_{ij}. \tag{7} $$

Although several orderings of $\{\lambda_{ij}\}$ exist in step 3 when more items have the same slope, we will assume in the following definitions that one specific ordering has been chosen. The LP-optimal choices $b_i$ obtained by Algorithm 1 are those variables where $x_{ib_i} = 1$. The class containing two fractional variables in step 5 will be denoted the fractional class $N_a$, and the fractional variables are $x_{ab_a}$, $x_{ab'_a}$ (possibly with $x_{ab'_a} = 0$). An initial feasible solution to MCKP may be constructed by choosing the LP-optimal variables, i.e. setting $x_{ib_i} = 1$ for $i = 1, \ldots, k$ and $x_{ij} = 0$ for $i = 1, \ldots, k$, $j \ne b_i$. This solution will be denoted the break solution, and the corresponding weight and profit sums are $W$ resp. $P$.
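Algorithm 1 can be sketched in Python as follows. This is a hypothetical illustration, not the report's code: each class is assumed to be a list of (weight, profit) pairs, assumption (2) is assumed to hold, and the function names are invented here. The set $R_i$ is built with a monotone-chain scan that applies the dominance test (3) and the determinant test (5), and exact fractions avoid rounding in the slopes (6).

```python
from fractions import Fraction

def lp_undominated(items):
    """Hypothetical helper: R_i, the upper convex boundary of one class by increasing weight."""
    hull = []
    for w, p in sorted(items, key=lambda t: (t[0], -t[1])):
        if hull and p <= hull[-1][1]:
            continue                      # dominated by a lighter item, criterion (3)
        while len(hull) >= 2:
            (wr, pr), (ws, ps) = hull[-2], hull[-1]
            if (ws - wr) * (p - pr) - (ps - pr) * (w - wr) >= 0:
                hull.pop()                # LP-dominated by hull[-2] and (w, p), criterion (5)
            else:
                break
        hull.append((w, p))
    return hull

def greedy_lmckp(classes, c):
    """Sketch of Algorithm 1. Returns (z, a, choice, W, P): the bound (7), the
    fractional class (None if every slope fits), the break solution as indices
    into each R_i, and its weight and profit sums."""
    R = [lp_undominated(items) for items in classes]              # step 1
    choice = [0] * len(R)                                         # step 2: lightest items
    W = sum(r[0][0] for r in R)
    P = sum(r[0][1] for r in R)
    slopes = [(Fraction(r[j][1] - r[j - 1][1], r[j][0] - r[j - 1][0]), i, j)
              for i, r in enumerate(R) for j in range(1, len(r))]  # step 3, slopes (6)
    slopes.sort(key=lambda s: s[0], reverse=True)                 # nonincreasing, stable
    for lam, i, j in slopes:                                      # step 4
        dw = R[i][j][0] - R[i][j - 1][0]
        if W + dw > c:
            return P + (c - W) * lam, i, choice, W, P             # step 5, bound (7)
        choice[i] = j
        W += dw
        P += R[i][j][1] - R[i][j - 1][1]
    return Fraction(P), None, choice, W, P
```

For the classes [(2, 3), (4, 5), (6, 9)] and [(1, 1), (3, 6), (5, 7)] with c = 8, the sketch removes (4, 5) as LP-dominated, makes the first class fractional, and evaluates the bound (7) to 27/2 with break sums W = 5, P = 9.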
Proposition 3 An optimal solution $x^*$ to LMCKP satisfies the following: 1) $x^*$ has at most two fractional variables $x_{ab_a}$ and $x_{ab'_a}$. 2) If $x^*$ has two fractional variables, they must be adjacent variables within the same class $N_a$. 3) If $x^*$ has no fractional variables, then the break solution is an optimal solution to MCKP.
Proof A consequence of Algorithm 1.

Figure 3: Gradients $\lambda^+_i$, $\lambda^-_i$ in class $N_i$.


The presented greedy algorithm demands $O(\sum_{i=1}^{k} n_i \log n_i)$ time for the sorting and determination of LP-undominated classes, which gives the complexity $O(n \log \max_{i=1,\ldots,k} n_i)$. Next, the ordering of slopes is done in $O(n' \log n')$ time, with $n' = \sum_{i=1}^{k} |R_i|$. Thus the overall complexity is $O(n \log n)$. It should be mentioned that when the classes form a KP, Algorithm 1 is exactly the greedy algorithm for LKP, and the objective value (7) corresponds to the Dantzig upper bound for KP (Dantzig 1957).
An optimal solution to MCKP generally corresponds to the break solution, except for a few classes where items other than the LP-optimal choices have been chosen. This property may be illustrated in the following way: Define the positive and negative gradients $\lambda^+_i$ and $\lambda^-_i$ for each class $N_i$, $i \ne a$, as (see Figure 3)

$$ \lambda^+_i = \max_{j \in N_i,\ w_{ij} > w_{ib_i}} \frac{p_{ij} - p_{ib_i}}{w_{ij} - w_{ib_i}}, \quad i = 1, \ldots, k,\ i \ne a, \tag{8} $$

$$ \lambda^-_i = \min_{j \in N_i,\ w_{ij} < w_{ib_i}} \frac{p_{ib_i} - p_{ij}}{w_{ib_i} - w_{ij}}, \quad i = 1, \ldots, k,\ i \ne a, \tag{9} $$

and we set $\lambda^+_i = 0$ (resp. $\lambda^-_i = \infty$) if the set we are maximizing (resp. minimizing) over is empty. The gradients are a measure of the expected gain (resp. loss) per weight unit by choosing a heavier (resp. lighter) item from $N_i$ instead of the LP-optimal choice $b_i$. The gradient of the fractional class $N_a$ is defined as

$$ \lambda_a = \frac{p_{ab'_a} - p_{ab_a}}{w_{ab'_a} - w_{ab_a}}. \tag{10} $$
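The gradients (8) and (9) of a single class need one pass over its items. A minimal Python sketch, assuming a class is a list of (weight, profit) pairs and that `b` is the index of the LP-optimal choice $b_i$ in that list; the function name `gradients` is hypothetical.

```python
import math
from fractions import Fraction

def gradients(items, b):
    """Hypothetical helper: return (lambda_plus, lambda_minus) of one class, (8) and (9).

    items: list of (weight, profit) pairs of class N_i.
    b: index in items of the LP-optimal choice b_i.
    """
    wb, pb = items[b]
    up = [Fraction(p - pb, w - wb) for w, p in items if w > wb]     # heavier items
    down = [Fraction(pb - p, wb - w) for w, p in items if w < wb]   # lighter items
    lam_plus = max(up) if up else 0              # convention for an empty maximum
    lam_minus = min(down) if down else math.inf  # convention for an empty minimum
    return lam_plus, lam_minus
```

For the class [(1, 1), (3, 6), (5, 7)] with $b_i$ = (3, 6), this gives $\lambda^+_i = 1/2$ and $\lambda^-_i = 5/2$.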

In Figure 4 we have ordered the classes according to decreasing $\lambda^+_i$ and show how often the IP-optimal solution to MCKP differs from the LP-optimal choice in each class $N_i$. The figure is the result of 5000 randomly generated data instances ($k = 100$, $n_i = 10$), where we have measured how often the IP-optimal choice $j$ (satisfying $w_{ij} > w_{ib_i}$, since we are considering forward gradients) differs from the LP-optimal choice $b_i$ in each class $N_i$. It is seen that when $\lambda^+_i$ is decreasing, so is the probability that $b_i$ is not the IP-optimal choice. Similarly, in Figure 5 we have ordered the classes according to increasing $\lambda^-_i$ to show how the probability of changes decreases with increasing $\lambda^-_i$.

Figure 4: Frequency of classes $N_i$ where the IP-optimal choice differs from the LP-optimal choice, compared to gradient $\lambda^+_i$ (horizontal axis: classes $N_i$ ordered by decreasing $\lambda^+_i$; vertical axes: $\lambda^+_i$ and frequency of differences in %).


This observation motivates considering only a small number of the classes $N_i$, namely those classes where $\lambda^+_i$ or $\lambda^-_i$ is sufficiently close to $\lambda_a$. Thus at any stage the core is simply a set of classes $\{N_{r_1}, \ldots, N_{r_m}\}$ where $r_1, \ldots, r_m \in \{1, \ldots, k\}$. Initially the core consists of the fractional class $N_a$, and we expand the core as needed, alternately including the next unused class $N_i$ which has the largest $\lambda^+_i$ or the smallest $\lambda^-_i$.
Since a complete enumeration of the core demands considering up to $n_{r_1} n_{r_2} \cdots n_{r_m}$ states, care should be taken before including a new class in the core. We use a simple upper bound test to fix as many variables as possible in the class at their optimal value before it is included in the core. If only one item remains, the class may be fathomed. Otherwise we order the remaining variables by nondecreasing weight and use test (3) to delete dominated items. The remaining class is added to the core and the new choices are enumerated through dynamic programming.

Figure 5: Frequency of classes $N_i$ where the IP-optimal choice differs from the LP-optimal choice, compared to gradient $\lambda^-_i$ (horizontal axis: classes $N_i$ ordered by increasing $\lambda^-_i$; vertical axes: $\lambda^-_i$ and frequency of differences in %).

3 A partitioning algorithm for the LMCKP

Dyer (1984) and Zemel (1984) independently of each other developed O(n) algorithms
for LMCKP. Both algorithms are based on the convexity of the LP-dual problem to (1),
which makes it possible to pair the dual line segments, so that at each iteration at least
1/6 of the line segments are deleted. When the classes form a KP the algorithms reduce
to that of Balas and Zemel (1980) for LKP. As Martello and Toth (1988) modified the
Balas and Zemel algorithm for LKP to a primal approach which is easier to implement,
we will now modify the Dyer and Zemel algorithm for LMCKP in a similar way.
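Assume that $N_a$ is the fractional class and that items $b_a$ and $b'_a$ are the fractional variables in $N_a$, such that $x_{ab_a} + x_{ab'_a} = 1$, possibly with $x_{ab'_a} = 0$. Moreover let $b_i$ be the LP-optimal choice in class $N_i$, $i = 1, \ldots, k$, $i \ne a$. Due to the properties of LMCKP given in Proposition 3, LMCKP may be reformulated as finding the slope

$$ \lambda = \frac{\Delta p}{\Delta w} = \frac{p_{ab_a} - p_{ab'_a}}{w_{ab_a} - w_{ab'_a}}, \tag{11} $$

such that the weight sum of the LP-optimal choices satisfies

$$ \sum_{i \ne a} w_{ib_i} + w_{ab_a} \le c < \sum_{i \ne a} w_{ib_i} + w_{ab'_a}, \tag{12} $$

and

$$ \det(w_{ij}, p_{ij}, \Delta w, \Delta p) \le \det(w_{ib_i}, p_{ib_i}, \Delta w, \Delta p), \quad i = 1, \ldots, k,\ j = 1, \ldots, n_i. \tag{13} $$

Here (12) ensures that $N_a$ is the fractional class, and (13) ensures that each item $b_i \in N_i$ is at the upper convex boundary of the set. Indeed, (12) implies $w_{ab_a} < w_{ab'_a}$, so $\Delta w < 0$ and $\det(w_{ij}, p_{ij}, \Delta w, \Delta p) = w_{ij}\Delta p - p_{ij}\Delta w = |\Delta w|(p_{ij} - \lambda w_{ij})$; thus (13) states that $b_i$ maximizes $p_{ij} - \lambda w_{ij}$ over $N_i$.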
The formulation (11)-(13) allows us to use a partitioning algorithm for finding the optimal slope $\lambda$. In the following algorithm we assume that the classes of items $N_i$ are represented as a list $[N_1, \ldots, N_k]$ and that the items in each class are also represented as a list $[j_1, \ldots, j_{n_i}]$. Elements may be deleted from a list by swapping the deleted element to the end of the list and subsequently decreasing the list's length. Thus at any step $k$ and $n_i$ refer to the current number of elements in the list. The partitioning algorithm looks like this:
Algorithm 2 Partition.
0 Preprocess. For all classes $i = 1, \ldots, k$ let $\alpha_i$ and $\beta_i$ be indices to the items having minimal weight (resp. maximal profit) in $N_i$ (see Figure 6). In case of several items satisfying the criterion, choose the item having the largest profit for $\alpha_i$ and the smallest weight for $\beta_i$.
If $\sum_{i=1}^{k} w_{i\alpha_i} > c$, no solution exists, so stop. If $\sum_{i=1}^{k} w_{i\beta_i} \le c$, we have a trivial solution consisting of the items $\beta_i$, so stop.
Set $W = P = 0$, and remove those items $j \ne \alpha_i$ which have $w_{ij} \ge w_{i\alpha_i}$ and $p_{ij} \le p_{i\alpha_i}$, since these are dominated by item $\alpha_i$. If the class $N_i$ has only one item left, save the LP-optimal choice $b_i = \alpha_i$ and set $W = W + w_{ib_i}$, $P = P + p_{ib_i}$, then delete class $N_i$.
Figure 6: Preprocessing of $N_i$. White nodes are dominated by $\alpha_i$.


1 Choose median. For $M$ randomly chosen classes $N_i$ define the corresponding slope $\lambda_i = \Delta p_i / \Delta w_i = (p_{i\alpha_i} - p_{i\beta_i})/(w_{i\alpha_i} - w_{i\beta_i})$. Using an O(n) median algorithm (Aho et al. 1974), let $\lambda = \Delta p / \Delta w$ be the median of these $M$ slopes.
2 Find the conclusion. For each class $N_i$ find the items which maximize the projection on the normal to $(\Delta w, \Delta p)$, i.e. which maximize the determinant

$$ \det(w_{ij}, p_{ij}, \Delta w, \Delta p) = w_{ij}\Delta p - p_{ij}\Delta w. \tag{14} $$

See Figure 7. We swap these items to the beginning of the list such that they have indices $\{1, \ldots, \ell_i\}$ in class $N_i$.
3 Determine weight sum of conclusion. Let $g_i$, $h_i$ be the lightest (resp. heaviest) item among $\{1, \ldots, \ell_i\}$ in class $N_i$, and let $\underline{W}$ and $\overline{W}$ be the corresponding weight sums. Thus $\underline{W} = W + \sum_{i=1}^{k} w_{ig_i}$ and $\overline{W} = W + \sum_{i=1}^{k} w_{ih_i}$.
4 Check for optimal partitioning. If $\underline{W} \le c \le \overline{W}$, the partitioning at $(\Delta w, \Delta p)$ is optimal. First, choose the lightest items from each class by setting $b_i = g_i$, $W = W + w_{ib_i}$, $P = P + p_{ib_i}$. Then, while $W - w_{ig_i} + w_{ih_i} \le c$, run through the classes where $\ell_i \ne 1$ and choose the heaviest item by setting $b_i = h_i$, $W = W - w_{ig_i} + w_{ib_i}$, $P = P - p_{ig_i} + p_{ib_i}$. The first class where $W - w_{ig_i} + w_{ih_i} > c$ is the fractional class $N_a$, and an optimal objective value to LMCKP is $z_{LMCKP} = P + \lambda(c - W)$. If no fractional class is defined, the LP-solution is also the optimal IP-solution. Stop.

Figure 7: Conclusion of $N_i$.

Figure 8: Partition set $N_i$.
5 Partition. We have one of the following two cases: 1) If $\underline{W} > c$, then the slope $\lambda$ was too small (see Figure 8). For each class $N_i$ choose $\beta_i$ as the lightest item in $\{1, \ldots, \ell_i\}$ and delete items $j \ne \beta_i$ with $w_{ij} \ge w_{i\beta_i}$. 2) If $\overline{W} < c$, then the slope $\lambda = \Delta p / \Delta w$ was too large. For each class $N_i$ choose $\alpha_i$ as the heaviest item in $\{1, \ldots, \ell_i\}$ and delete items $j \ne \alpha_i$ with $p_{ij} \le p_{i\alpha_i}$ (items $j$ with $w_{ij} \le w_{i\alpha_i}$ are too light, and items with $w_{ij} > w_{i\alpha_i}$, $p_{ij} \le p_{i\alpha_i}$ are dominated). If the class $N_i$ has only one item left, save the LP-optimal choice $b_i = \alpha_i$ and set $W = W + w_{ib_i}$, $P = P + p_{ib_i}$, then delete class $N_i$. Go to step 1.
The above algorithm may be further improved by introducing LP-dominance reductions: Each time $\alpha_i$ or $\beta_i$ is changed in a class $N_i$, we use criterion (6) to test whether any items are LP-dominated by $\alpha_i$ and $\beta_i$. In this way each class will only consist of the items close to the upper convex boundary. Computational experiments do, however, indicate that Algorithm 2 does not become considerably faster by the addition of LP-dominance criteria.
Depending on the choice of $M$ in step 1, we obtain different behavior of the algorithm. The best performance is obtained by choosing $\lambda$ as the median of all slopes $\lambda_i$, $i = 1, \ldots, k$ (i.e. choosing $M = k$), but for practical purposes a value of $M$ around 15 works well. Note that in the KP case, Algorithm 2 becomes the partitioning algorithm of Balas and Zemel (1980).
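The following Python sketch illustrates Algorithm 2 with the median taken over all remaining classes ($M = k$). It is a simplified, hypothetical rendering rather than the report's implementation: classes are plain lists of (weight, profit) tuples instead of index lists with swap-deletion, identical items are merged, a full sort replaces the O(n) median algorithm, and the conclusion of step 2 is found by maximizing $p_{ij} - \lambda w_{ij}$, which is equivalent to maximizing (14) because $\Delta w < 0$. Exact fractions avoid rounding errors when comparing items.

```python
from fractions import Fraction

def _lightest(N):
    return min(N, key=lambda t: (t[0], -t[1]))   # alpha_i: min weight, then max profit

def _most_profitable(N):
    return max(N, key=lambda t: (t[1], -t[0]))   # beta_i: max profit, then min weight

def partition_lmckp(classes, c):
    """Hypothetical sketch of Algorithm 2 with M = k.
    Returns (z_LMCKP, b) with b[i] the LP-optimal item of class i, or None."""
    if sum(_lightest(N)[0] for N in classes) > c:
        return None                                          # step 0: unsolvable
    if sum(_most_profitable(N)[0] for N in classes) <= c:
        b = {i: _most_profitable(N) for i, N in enumerate(classes)}
        return Fraction(sum(p for _, p in b.values())), b    # step 0: trivial
    W = P = 0
    b, active = {}, {}
    for i, N in enumerate(classes):
        a = _lightest(N)
        N = [t for t in set(N) if t == a or t[1] > a[1]]     # drop items dominated by alpha_i
        if len(N) == 1:
            b[i] = a
            W += a[0]
            P += a[1]
        else:
            active[i] = N
    while active:
        # step 1: median of the slopes between alpha_i and beta_i
        lams = sorted(Fraction(_lightest(N)[1] - _most_profitable(N)[1],
                               _lightest(N)[0] - _most_profitable(N)[0])
                      for N in active.values())
        lam = lams[(len(lams) - 1) // 2]
        g, h = {}, {}
        for i, N in active.items():
            best = max(p - lam * w for w, p in N)            # step 2: the conclusion
            conclusion = [t for t in N if t[1] - lam * t[0] == best]
            g[i], h[i] = min(conclusion), max(conclusion)    # step 3: lightest, heaviest
        W_lo = W + sum(t[0] for t in g.values())
        W_hi = W + sum(t[0] for t in h.values())
        if W_lo <= c <= W_hi:                                # step 4: optimal partitioning
            for i in active:
                b[i] = g[i]
                W += g[i][0]
                P += g[i][1]
            for i in active:
                if g[i] != h[i]:
                    if W - g[i][0] + h[i][0] > c:            # fractional class N_a
                        return P + lam * (c - W), b
                    b[i] = h[i]
                    W += h[i][0] - g[i][0]
                    P += h[i][1] - g[i][1]
            return Fraction(P), b
        for i in list(active):                               # step 5: partition
            N = active[i]
            if W_lo > c:   # slope too small: keep g_i and lighter items
                N = [t for t in N if t[0] < g[i][0] or t == g[i]]
            else:          # slope too large: keep h_i and more profitable items
                N = [t for t in N if t[1] > h[i][1] or t == h[i]]
            if len(N) == 1:
                b[i] = N[0]
                W += N[0][0]
                P += N[0][1]
                del active[i]
            else:
                active[i] = N
    return Fraction(P), b
```

On three classes [(0, 0), (10, 10)], [(0, 0), (10, 20)], [(0, 0), (10, 30)] with c = 25, the first median slope 2 is too large, two classes are fixed in step 5, and the second iteration returns the LP value 55, which agrees with the greedy Algorithm 1.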
Proposition 4 If we choose $\lambda = \Delta p / \Delta w$ as the exact median of $M$ different slopes $\lambda_i = \Delta p_i / \Delta w_i$ in step 1 of Algorithm 2, at least $M/2$ items are deleted at each iteration.
Proof Since $\lambda$ is the median of the $M$ slopes, we have $\lambda_i \le \lambda$ for $M/2$ classes, so for these classes at least one item $j \ne \beta_i$ exists which maximizes (14). Similarly we have $\lambda_i \ge \lambda$ for $M/2$ classes, so for these classes at least one item $j \ne \alpha_i$ exists which maximizes (14). If $\underline{W} > c$ in step 5, at least $M/2$ items $\{\beta_i\}$ will be deleted. Otherwise, if $\overline{W} < c$, at least $M/2$ items $\{\alpha_i\}$ will be deleted.

Corollary 1 If $M = 1$, at least one item is deleted at each iteration of Algorithm 2, yielding a complexity of $O(n^2)$.
Proposition 5 If $M = k$ and the size of each class $n_i$ is bounded by a constant $C$, Algorithm 2 runs in $O(n)$ time.
Proof Due to Proposition 4, at least $k/2$ items are deleted at each iteration. Since $n_i$ is bounded by $C$, we have $n \le Ck$, so at least $\frac{1}{2C} n$ items are deleted at each iteration, yielding the complexity.

4 Expanding core

Balas and Zemel (1980) proposed for KP to consider only a small subset of the items, the so-called core, where there is a large probability of finding an optimal solution. However, the core cannot be identified a priori, implying that in some cases optimality of the core solution cannot be proved.
Pisinger (1994a) noted that even though the core cannot be identified before KP is solved, it can be identified while the problem is solved by using an expanding core. Moreover, Pisinger developed an algorithm which always uses a minimal core (Pisinger 1993).
We will use the same concept for MCKP, but now the core consists of the smallest possible number of classes $N_i$ such that an optimal solution may be determined and proved. Whereas the core for KP naturally consists of items having a profit-to-weight ratio close to that of the break item, there is no natural way of ordering the classes in MCKP. Instead we use the gradients to identify a core: Define the positive and negative gradients $\lambda^+_i$ and $\lambda^-_i$ for each class $N_i$, $i \ne a$, by (8) and (9). Due to (13) we have that

$$ \lambda^+_i \le \frac{\Delta p}{\Delta w} \le \lambda^-_i. \tag{15} $$

Order the set $L^+ = \{\lambda^+_i\}$ according to nonincreasing values, and $L^- = \{\lambda^-_i\}$ according to nondecreasing values. Initially the core $C$ only consists of the fractional class $N_a$, and then we repeatedly add classes $N_i$ corresponding to the next gradient from the ordered sets $L^+$ and $L^-$. Since each class occurs twice (once in each of the sets $L^+$ and $L^-$), we always check whether the class has already been added to the core.

4.1 Class reduction

Before adding a class $N_i$ to the core $C$ it is appropriate to fathom unpromising items from the class. We check whether each item $j \in N_i$ has an upper bound larger than the currently best solution $z$. For this purpose we use the weak upper bound obtained by relaxing the constraint in (1) on the fractional variables $b_a, b'_a \in N_a$ from $x_{ab_a}, x_{ab'_a} \in \{0, 1\}$ to $x_{ab_a}, x_{ab'_a} \in \mathbb{R}$. The upper bound on item $j \in N_i$ is then

$$ u_{ij} = P - p_{ib_i} + p_{ij} + \lambda(c - W + w_{ib_i} - w_{ij}), \tag{16} $$

and if $u_{ij} < z + 1$ we may fix $x_{ij}$ to 0. Since the bound (16) is evaluated in constant time, the complexity of reducing class $N_i$ is $O(n_i)$.
If the reduced set $N_i$ has only one item left, we fathom the class, since no choices remain to be made. Otherwise we order the items in $N_i$ according to nondecreasing weights and delete dominated items by applying (3). The computational effort is concentrated on the sorting, yielding a complexity of $O(n_i \log n_i)$, where $n_i$ is the size of $N_i$. In Section 8 it will be demonstrated that a large majority of the items may be fixed at their optimal value by the reduction (16), thus significantly decreasing the number of items which need to be sorted.
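Procedure reduceclass can be sketched by combining the bound (16) with the dominance test (3). The following Python sketch is a hypothetical illustration: the function name and argument order are invented, items are assumed to be (weight, profit) pairs, `P` and `W` are the break-solution sums, and `lam` is the slope (11).

```python
from fractions import Fraction

def reduce_class(items, b, P, W, c, lam, z):
    """Hypothetical sketch of reduceclass for one class N_i.

    items: (weight, profit) pairs of N_i; b: index of the LP choice b_i;
    P, W: profit and weight sums of the break solution; lam: slope (11);
    z: value of the current best solution.
    Returns the reduced class R_i sorted by weight, without dominated items.
    """
    wb, pb = items[b]
    # fix x_ij = 0 when the weak upper bound (16) cannot reach z + 1
    kept = [(w, p) for w, p in items
            if P - pb + p + lam * (c - W + wb - w) >= z + 1]
    kept.sort(key=lambda t: (t[0], -t[1]))      # nondecreasing weight
    reduced = []
    for w, p in kept:
        if not reduced or p > reduced[-1][1]:   # otherwise dominated, criterion (3)
            reduced.append((w, p))
    return reduced
```

If the returned list has a single item, the class is fathomed and never enters the core. Filtering with the constant-time bound before sorting is what keeps the $O(n_i \log n_i)$ sorting cost limited to the surviving items.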

5 A dynamic programming algorithm

The core is a set of currently included classes $C = \{N_{r_1}, \ldots, N_{r_m}\}$, so the set of partial vectors in $C$ is given by

$$ Y_C = \{ (y_1, \ldots, y_m) \mid y_i \in \{1, \ldots, n_{r_i}\},\ i = 1, \ldots, m \}, \tag{17} $$

where each variable $y_i$ determines that variable $x_{r_i y_i} = 1$, while the remaining binary variables in $N_{r_i}$ are set to zero. The weight and profit sum of a vector $y^i = (y_1, \ldots, y_m) \in Y_C$ corresponds to the weight and profit sum of the chosen variables $y_{r_i}$ when $N_{r_i} \in C$, and to the LP-optimal choices $b_i$ when $N_{r_i} \notin C$. Thus

$$ \mu_i = \sum_{N_i \in C} w_{iy_i} + \sum_{N_i \notin C} w_{ib_i}, \tag{18} $$

$$ \pi_i = \sum_{N_i \in C} p_{iy_i} + \sum_{N_i \notin C} p_{ib_i}. \tag{19} $$

It is convenient to represent each vector $y^i \in Y_C$ by a state $(\mu_i, \pi_i, v_i)$, where $\mu_i$, $\pi_i$ are given above, and $v_i$ is a (not necessarily complete) representation of $y^i$. According to the principle of optimality (Ibaraki 1987) we may fathom some of these states:
Definition 3 Given two states $(\mu_i, \pi_i, v_i)$ and $(\mu_j, \pi_j, v_j)$. The state $i$ is said to dominate the state $j$ if $\mu_i \le \mu_j$ and $\pi_i \ge \pi_j$.
Proposition 6 If a state $i$ dominates another state $j$, we may fathom the dominated state $j$.
Proof Similar to Proposition 1.
We will keep the set $Y_C = \{(\mu_1, \pi_1, v_1), \ldots, (\mu_m, \pi_m, v_m)\}$ ordered according to increasing weight and profit sums ($\mu_i < \mu_{i+1}$ and $\pi_i < \pi_{i+1}$) in order to easily fathom dominated states.
When a new class $N$ is added to the core $C$, we must enumerate the new choices and delete dominated states. A clever way of doing this is by using a divide and conquer algorithm (Pisinger 1994b), which takes advantage of the ordering of $Y_C$ and $N$.

The idea is to divide $N$ recursively into two equally sized parts $N_A$ and $N_B$ until the hereby obtained sets have size 1. A set $N_A$ of size 1 is trivially multiplied with the set $Y$ simply by adding the remaining item $(w, p)$ to each state in $Y$ in place of the LP-optimal choice of the class, and the product set $Y_A$ will still be ordered. Finally the sets $Y_A$ and $Y_B$ are merged two by two, and dominated states are removed.

Algorithm 3
procedure divide(Y, N, var Y');
{ Determine Y' = Y ⊗ N, where the product is Y' = {(μ'_1, π'_1, v'_1), …, (μ'_{m'}, π'_{m'}, v'_{m'})}, and }
{ the multipliers are Y = {(μ_1, π_1, v_1), …, (μ_m, π_m, v_m)} resp. N = {(w_f, p_f), …, (w_l, p_l)}. }
{ (w_b, p_b) is the LP-optimal choice b of the class N, which is already counted in every state of Y. }
if (f = l) then
  for i := 1 to m do (μ'_i, π'_i, v'_i) := (μ_i − w_b + w_f, π_i − p_b + p_f, v_i ∪ {f}); rof;
  m' := m;
else
  d := ⌊(f + l)/2⌋;
  N_A := {(w_f, p_f), …, (w_d, p_d)};
  N_B := {(w_{d+1}, p_{d+1}), …, (w_l, p_l)};
  divide(Y, N_A, Y_A); divide(Y, N_B, Y_B); conquer(Y_A, Y_B, Y');
fi;

procedure conquer(Y', Y'', var Y);
{ Determine Y = Y' ⊕ Y'', where the sum is Y = {(μ_1, π_1, v_1), …, (μ_m, π_m, v_m)}, and the }
{ addends are Y' = {(μ'_1, π'_1, v'_1), …, (μ'_{m'}, π'_{m'}, v'_{m'})} resp. Y'' = {(μ''_1, π''_1, v''_1), …, (μ''_{m''}, π''_{m''}, v''_{m''})}. }
i := 1; j := 1; k := 1; (μ'_{m'+1}, π'_{m'+1}, v'_{m'+1}) := (∞, ∞, {}); (μ''_{m''+1}, π''_{m''+1}, v''_{m''+1}) := (∞, ∞, {});
if (μ'_1 ≤ μ''_1) then (μ_1, π_1, v_1) := (μ'_1, π'_1, v'_1); i := 2; else (μ_1, π_1, v_1) := (μ''_1, π''_1, v''_1); j := 2; fi;
repeat
  if (μ'_i ≤ μ''_j) then { Choose smallest weight to ensure ordering. }
    if (μ'_i, π'_i, v'_i) is not dominated by (μ_k, π_k, v_k) then
      if (μ'_i, π'_i, v'_i) does not dominate (μ_k, π_k, v_k) then k := k + 1; fi;
      (μ_k, π_k, v_k) := (μ'_i, π'_i, v'_i);
    fi; i := i + 1;
  else
    if (μ''_j, π''_j, v''_j) is not dominated by (μ_k, π_k, v_k) then
      if (μ''_j, π''_j, v''_j) does not dominate (μ_k, π_k, v_k) then k := k + 1; fi;
      (μ_k, π_k, v_k) := (μ''_j, π''_j, v''_j);
    fi; j := j + 1;
  fi;
until (i = m' + 1) and (j = m'' + 1);
m := k;
To extend the core $C$ with a new class $N_i$, simply call divide($Y_C$, $N_i$, $Y_{C \cup \{N_i\}}$) to obtain the new set of states $Y_{C \cup \{N_i\}}$. The algorithm has time complexity $O(m n_i \log_2 n_i)$, where $m$ is the size of $Y_C$ and $n_i$ the size of $N_i$. For most data instances a great majority of the new states are deleted by dominance, such that far fewer than the expected $m n_i$ states are generated. The structure of Algorithm 3 implies that many of the dominated states may be deleted already when the first sets are merged, leading to a considerably faster computation.
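Algorithm 3 translates almost line by line into Python. In this hypothetical sketch, states are (mu, pi, v) tuples kept in lists sorted by weight, and v records every change as a (class, item index) pair; the report instead keeps only the last a changes (Section 6). Index bounds replace the ∞ sentinels of the pseudocode, and the names `conquer` and `divide` follow the procedures above.

```python
def conquer(Y1, Y2):
    """Merge two state lists sorted by weight, deleting dominated states.

    States are (mu, pi, v) tuples; the result has strictly increasing mu and pi.
    """
    out, i, j = [], 0, 0
    while i < len(Y1) or j < len(Y2):
        # choose the state with the smallest weight to keep the ordering
        if j == len(Y2) or (i < len(Y1) and Y1[i][0] <= Y2[j][0]):
            s, i = Y1[i], i + 1
        else:
            s, j = Y2[j], j + 1
        if out and s[1] <= out[-1][1]:
            continue                 # s is dominated by the last kept state
        if out and s[0] == out[-1][0]:
            out[-1] = s              # s dominates the last kept state
        else:
            out.append(s)
    return out

def divide(Y, N, wb, pb, cls, lo=0, hi=None):
    """Y x N for the items N[lo..hi] of class cls, whose LP choice (wb, pb)
    is already counted in every state of Y (hypothetical signature)."""
    if hi is None:
        hi = len(N) - 1
    if lo == hi:
        w, p = N[lo]
        # swapping b for item lo shifts every state equally, so Y stays sorted
        return [(mu - wb + w, pi - pb + p, v + ((cls, lo),)) for mu, pi, v in Y]
    d = (lo + hi) // 2
    return conquer(divide(Y, N, wb, pb, cls, lo, d),
                   divide(Y, N, wb, pb, cls, d + 1, hi))
```

Starting from the single state (3, 4) and adding the classes [(1, 1), (3, 6), (5, 7)] (LP choice (1, 1)) and then [(2, 3), (4, 5), (6, 9)] (LP choice (2, 3)) leaves only the five undominated states (3, 4), (5, 9), (7, 11), (9, 15), (11, 16) out of nine combinations.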

5.1 Reduction of states

Although the number of states in Y C at any time is bounded by the capacity c due to the
dominance criterion (3), the enumeration may be considerably improved by adding some
upper bound tests in order to delete unpromising states.
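Assume that the core $C$ is obtained by adding classes corresponding to the first $m$ gradients from $L^-$ and $L^+$, and that $N_s$ and $N_t$ are the next classes to be added from each set. Since $L^-$ is ordered by nondecreasing and $L^+$ by nonincreasing values, the gradients satisfy

$$ \min_{N_i \notin C} \lambda^-_i \ge \lambda^-_s, \tag{20} $$

$$ \max_{N_i \notin C} \lambda^+_i \le \lambda^+_t. \tag{21} $$

By this assumption we get the following upper bound on a state $i$ given by $(\mu_i, \pi_i, v_i)$:

$$ u(i) = \begin{cases} \pi_i + (c - \mu_i)\lambda^+_t & \text{if } \mu_i \le c, \\ \pi_i + (c - \mu_i)\lambda^-_s & \text{if } \mu_i > c. \end{cases} \tag{22} $$

For convenience we set $\lambda^+_t = 0$ if the set $L^+$ is empty, and $\lambda^-_s = \infty$ if $L^-$ is empty, ensuring that states which cannot be improved further are fathomed.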
The bound (22) may also be used for deriving a global upper bound on MCKP. Since any optimal solution must follow a branch in $Y_C$, the global upper bound corresponds to the upper bound of the most promising branch in $Y_C$. Therefore a global upper bound on MCKP is given by

$$ u_{MCKP} = \max_{y^i \in Y_C} u(i). \tag{23} $$

Since the gradient $\lambda^+_t$ will be decreasing during the solution process and the gradient $\lambda^-_s$ will be increasing, $u_{MCKP}$ will become tighter and tighter as the core is expanded. For a complete core $C = \{N_1, \ldots, N_k\}$ we get $u_{MCKP} = z^*$ for the optimal solution value $z^*$.
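The bound (22), the global bound (23) and the fathoming done by procedure reduceset can be sketched in Python as follows. This is a hypothetical illustration: `lam_plus_t` and `lam_minus_s` stand for $\lambda^+_t$ and $\lambda^-_s$ (use 0 and `math.inf` for empty sets, as above), states are (mu, pi, v) tuples, and only the value of the incumbent is tracked, whereas the report also saves the improving state.

```python
import math

def state_bound(mu, pi, c, lam_plus_t, lam_minus_s):
    """Upper bound (22) on a state with weight sum mu and profit sum pi."""
    if mu <= c:
        return pi + (c - mu) * lam_plus_t      # free capacity gains at most lam_plus_t per unit
    return pi + (c - mu) * lam_minus_s         # excess weight costs at least lam_minus_s per unit

def reduce_set(Y, c, z, lam_plus_t, lam_minus_s):
    """Hypothetical sketch of reduceset: returns (kept states, new z, u_MCKP of (23))."""
    z = max([z] + [pi for mu, pi, _ in Y if mu <= c])    # feasible states may improve z
    bounds = [state_bound(mu, pi, c, lam_plus_t, lam_minus_s) for mu, pi, _ in Y]
    u_mckp = max(bounds, default=-math.inf)
    kept = [s for s, u in zip(Y, bounds) if u >= z + 1]  # fathom states with u(i) < z + 1
    return kept, z, u_mckp
```

With a complete core (`lam_plus_t = 0`, `lam_minus_s = math.inf`), infeasible states get the bound minus infinity, every remaining bound reduces to the profit sum, and `u_mckp` equals the optimal value, as stated after (23).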

6 Finding the solution vector

According to Bellman's classical description of dynamic programming (Bellman 1957), the optimal solution vector $x$ may be found by backtracking through the sets of states, implying that all sets of states should be saved during the solution process. In the computational experiments it is demonstrated that the number of states may be half a million in each iteration, and since the number of classes may be large ($k = 10000$) we would need to store billions of states. Pisinger (1993) proposed to save only the last $a$ changes in the solution vector in each state $(\mu, \pi, v)$. If this information is not sufficient for reconstructing the solution vector, we simply solve a new MCKP problem with a reduced number of variables. This is repeated until the solution vector is completely defined. More precisely, we do the following:
Whenever an improved solution is found during the construction of $Y_C$, we save the corresponding state $(\mu, \pi, v)$. After termination of the algorithm, the solution vector is reconstructed as far as possible. First, all variables are set to the break solution, meaning that $x_{ib_i} = 1$ for $i = 1, \ldots, k$ and $x_{ij} = 0$ for $i = 1, \ldots, k$, $j \ne b_i$. Then we run through the vector $v$ in the following way:
Algorithm 4
procedure definesolution(μ, π, v);
for h := 1 to a do { v = {v_1, …, v_a} are the last a changes in the solution vector. }
  i := v_h.i; { i is the class corresponding to v_h. }
  j := v_h.j; { j is the variable corresponding to v_h. }
  μ := μ − w_ij + w_ib_i; π := π − p_ij + p_ib_i; x_ij := 1; x_ib_i := 0;
rof;
If the backtracked weight and profit sums $\mu$, $\pi$ correspond to the weight and profit sums $W$, $P$ of the break solution, we know that the constructed vector is correct. Otherwise we solve a new MCKP, this time with capacity $c = \mu$, lower bound $z = \pi - 1$, and global upper bound $u = \pi$. The process is repeated until the solution vector $x$ is completely defined. This technique has proved very efficient, since generally only a few iterations are needed. With $a = 10$, a maximum of 4 iterations has been observed for large data instances, but generally the optimal solution vector is found after the first iteration.
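Algorithm 4 can be sketched in Python as below. This is a hypothetical illustration: v is assumed to be a list of (class, item index) pairs, b holds the index of the break-solution item for every class, and the solution vector is returned as a dict from class to chosen item index rather than as binary variables.

```python
def define_solution(mu, pi, v, b, classes):
    """Hypothetical sketch of Algorithm 4: backtrack from a saved state (mu, pi, v).

    v: recorded changes (i, j), meaning item j was chosen in class i.
    b: b[i] is the index of the LP-optimal (break) choice in class i.
    classes: classes[i][j] = (weight, profit).
    Returns x as a dict class -> chosen item index, plus the backtracked sums.
    """
    x = dict(enumerate(b))                 # start from the break solution
    for i, j in v:
        w, p = classes[i][j]
        wb, pb = classes[i][b[i]]
        mu, pi = mu - w + wb, pi - p + pb  # undo the change made in class i
        x[i] = j
    return x, mu, pi
```

If the returned sums equal the break sums $(W, P)$, the vector x is complete; otherwise the report solves a smaller MCKP with capacity $c = \mu$ and lower bound $z = \pi - 1$.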

7 Main algorithm

The previous sections may be summed up to the following main algorithm:


Algorithm 5
procedure mcknap;
  Solve LMCKP through a partitioning algorithm.
  Determine gradients L⁺ = {λ⁺_i} and L⁻ = {λ⁻_i} for i = 1, …, k, i ≠ a.
  Partially sort L⁺ in decreasing order and L⁻ in increasing order.
  z := 0; C := {N_a}; Y_C := reduceclass(N_a); s := 1; t := 1;
  repeat
    reduceset(Y_C); if (Y_C = ∅) then break; fi;
    N_i := L⁻_s; s := s + 1; { Choose next class to be included. }
    if (N_i is not used) then
      R_i := reduceclass(N_i);
      if (|R_i| > 1) then add(Y_C, R_i); fi;
    fi;
    reduceset(Y_C); if (Y_C = ∅) then break; fi;
    N_i := L⁺_t; t := t + 1; { Choose next class to be included. }
    if (N_i is not used) then
      R_i := reduceclass(N_i);
      if (|R_i| > 1) then add(Y_C, R_i); fi;
    fi;
  forever;
  Find the solution vector.
The first step of the algorithm is to solve the LMCKP as sketched in Section 3. Hereby
we obtain the fractional class N a , the break solution {bi } as well as the corresponding
weight and profit sum W and P .

The gradients $\lambda_i^+$ and $\lambda_i^-$ are determined, and the sets $L^+$ and $L^-$ are ordered. Since a complete ordering is not needed initially, we use the partial ordering presented in Pisinger (1994a): using quicksort (Hoare 1962), we always choose the interval containing the largest values (resp. the smallest values for $L^-$) for further partitioning, while the other interval is pushed onto a stack. This continues until the largest (resp. smallest) values have been determined. If more values are needed later in Algorithm 5, we simply pop the next interval from the stack and partition it further.
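The on-demand ordering can be sketched in Python as follows. This is a minimal sketch, not the author's implementation: it uses a Lomuto-style partition for brevity instead of Hoare's original scheme, and the class name `PartialSorter` and its method `pop_next` are hypothetical. The top of the stack always holds the leftmost unfinished interval, so each call partitions only as much as needed to expose the next largest key.

```python
class PartialSorter:
    """Return items in decreasing key order, partitioning only on demand."""

    def __init__(self, items, key=lambda x: x):
        self.a = list(items)
        self.key = key
        # Stack of unsorted intervals [lo, hi]; the top is always the leftmost.
        self.stack = [(0, len(self.a) - 1)] if self.a else []

    def _partition(self, lo, hi):
        # Partition a[lo..hi] around a[hi]: keys >= pivot go to the left.
        pivot = self.key(self.a[hi])
        i = lo
        for j in range(lo, hi):
            if self.key(self.a[j]) >= pivot:
                self.a[i], self.a[j] = self.a[j], self.a[i]
                i += 1
        self.a[i], self.a[hi] = self.a[hi], self.a[i]
        return i

    def pop_next(self):
        """Item with the largest remaining key, or None when exhausted."""
        if not self.stack:
            return None
        lo, hi = self.stack.pop()
        while lo < hi:
            m = self._partition(lo, hi)
            if m < hi:
                self.stack.append((m + 1, hi))  # smaller keys: postpone
            if m > lo:
                self.stack.append((m, m))       # pivot is already in place
                hi = m - 1                      # refine the larger keys only
            else:
                hi = lo                         # pivot at lo is the maximum
        return self.a[lo]
```

For $L^-$, which is needed in increasing order, the same sketch can be used with `key=lambda g: -g`.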
Our initial core is the fractional class $N_a$, which is reduced by procedure reduceclass. Here we apply criterion (16) to fix as many variables as possible at their optimal values. If the reduced class has more than one item left, we sort the items by increasing weight and then apply criterion (3) to remove dominated items. This yields the reduced class $R_a$, which forms the initial set of states $Y_C$.
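The sorting and dominance part of reduceclass can be sketched as below, assuming that criterion (3) is the usual dominance relation (an item is dominated if another item in the same class has weight no larger and profit no smaller). The variable fixing by criterion (16) is omitted, and the function name is hypothetical.

```python
def remove_dominated(items):
    """Sort a class by increasing weight and drop dominated items.

    An item (w, p) is dominated if another item has weight <= w and
    profit >= p. The result has strictly increasing weights and profits.
    """
    # Ties on weight: the most profitable item comes first, so the others drop.
    ordered = sorted(items, key=lambda wp: (wp[0], -wp[1]))
    kept = []
    for w, p in ordered:
        # kept[-1] has the largest profit among all lighter items seen so far
        if not kept or p > kept[-1][1]:
            kept.append((w, p))
    return kept
```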
The set of states $Y_C$ is reduced by procedure reduceset, which applies criterion (22) to fathom unpromising states. Moreover, the procedure checks whether any feasible state ($\mu \le c$) has improved the lower bound $z$, and updates the current best solution in that case.
Now we alternately include classes from $L^+$ and $L^-$, each time reducing the class to see whether it must be added to the core. If the reduced class $R_i$ contains more than one item, it is added to the set of states $Y_C$ by Algorithm 3, indicated by procedure add above.
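Algorithm 3 is not reproduced here, and the following Python sketch is a simpler product-and-filter variant, not necessarily how Algorithm 3 proceeds. It assumes each state still contains the break item $b_i$ of every class outside the core, so adding $N_i$ replaces $b_i$ by each item of $R_i$, after which dominated states are removed. The name `add_class` is hypothetical.

```python
def add_class(states, reduced_class, break_item):
    """Combine the states Y_C with a reduced class R_i.

    states:        (mu, pi) pairs that still contain the break item b_i
    reduced_class: (w_ij, p_ij) items of R_i
    break_item:    (w_{i,b_i}, p_{i,b_i})
    Returns the undominated combined states sorted by increasing weight.
    """
    w_b, p_b = break_item
    combined = [(mu - w_b + w, pi - p_b + p)
                for mu, pi in states for w, p in reduced_class]
    # Sort by weight (larger profit first on ties) and keep only states whose
    # profit strictly exceeds that of every lighter or equally heavy state.
    combined.sort(key=lambda s: (s[0], -s[1]))
    result = []
    for mu, pi in combined:
        if not result or pi > result[-1][1]:
            result.append((mu, pi))
    return result
```

Infeasible states are kept on purpose: a later class may replace its break item by a lighter one.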
The iteration stops when no states remain in $Y_C$, meaning that no improvement can occur. Note that we set $\lambda_t^+ = 0$ when $L^+$ is empty, and $\lambda_s^- = \infty$ when $L^-$ is empty, so the iteration will in any case stop when all classes have been considered.
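To make the control flow of Algorithm 5 concrete, here is a toy Python sketch in which the bound-based reductions (16) and (22) are left out. Without them no state is ever fathomed, so the loop only stops once both orderings are exhausted, and the result is simply an exact dynamic programming solution. The function name is hypothetical, and the break solution `b` and the two class orders are supplied by the caller rather than computed from the LMCKP.

```python
def solve_by_core_expansion(classes, capacity, a, b, order_minus, order_plus):
    """Toy version of the Algorithm 5 loop without bound-based reductions.

    classes:     classes[i] is a list of (w_ij, p_ij) pairs
    a, b:        index of the fractional class and break items b[i]
    order_minus: class indices i != a by increasing lambda_i^-
    order_plus:  class indices i != a by decreasing lambda_i^+
    Returns the optimal profit, or None if no feasible choice exists.
    """

    def undominated(states):
        # Sort by weight (ties: larger profit first); keep strict profit gains.
        kept = []
        for mu, pi in sorted(states, key=lambda s: (s[0], -s[1])):
            if not kept or pi > kept[-1][1]:
                kept.append((mu, pi))
        return kept

    # Weight and profit sums W, P of the break solution.
    W = sum(classes[i][b[i]][0] for i in range(len(classes)))
    P = sum(classes[i][b[i]][1] for i in range(len(classes)))

    # Initial core C = {N_a}: replace b[a] by every item of N_a.
    w_b, p_b = classes[a][b[a]]
    states = undominated([(W - w_b + w, P - p_b + p) for w, p in classes[a]])
    used = {a}

    queues = [list(order_minus), list(order_plus)]
    turn = 0  # 0: take the next class from L^-, 1: from L^+
    while any(queues):
        queue = queues[turn]
        turn = 1 - turn
        if not queue:
            continue
        i = queue.pop(0)
        if i in used:  # every class appears in both orderings
            continue
        used.add(i)
        w_b, p_b = classes[i][b[i]]
        states = undominated([(mu - w_b + w, pi - p_b + p)
                              for mu, pi in states for w, p in classes[i]])

    feasible = [pi for mu, pi in states if mu <= capacity]
    return max(feasible) if feasible else None
```

Since each class occurs in both $L^+$ and $L^-$, the "not used" test is what prevents a class from entering the core twice.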

7.1 Minimality

We claim that Algorithm 5 solves MCKP to optimality with a minimal core and with minimal effort for sorting and reduction. More precisely:
Definition 4. Let $C$ be a core and $Y_C$ the corresponding set of states. We say that the core problem has been solved to optimality if one (or both) of the following cases occur:
1. $Y_C = \emptyset$, meaning that all states were fathomed by an upper bound test.
2. All classes $N_i \notin C$ could be reduced to contain only the LP-optimal choice $b_i$.


Definition 5. MCKP has been solved with a minimal core if the following invariant holds: a class $N_s$ (resp. $N_t$) is only added to the core $C$ if the corresponding core problem could not be solved to optimality, and $N_s$ (resp. $N_t$) has the smallest gradient $\lambda_s^-$ (resp. the largest gradient $\lambda_t^+$).
The definition states that a class $N_s$ (resp. $N_t$) should be added to the core only if this cannot be avoided by any upper bound test, and only after all classes with smaller (resp. larger) gradients have been considered. The definition ensures that if MCKP has been solved to optimality with a minimal core $C$, no smaller core $C' \subset C$ exists. A core $C'$ of smaller size may still exist if $C' \not\subseteq C$ and $C \not\subseteq C'$, but according to our definition such cores are not comparable. Algorithm 5 finds the minimal core (of several possible) which is symmetric around $N_a$.
Definition 6. The effort used for reduction has been minimal if a class $N_i$ is reduced only when the core $C$ could not be solved to optimality, and $N_i$ is the next class to be included according to the rule in Definition 5.
Definition 7. The sorting effort has been minimal if 1) a class $N_i$ is sorted only when the current core $C$ could not be solved to optimality, 2) $N_i$ is the next class to be included according to Definition 5, and 3) only items which have passed the reduction criterion (22) are sorted.
Proposition 7. The presented algorithm solves MCKP with a minimal core, using minimal reduction and sorting effort.
Proof. A consequence of Section 4 (minimal core) and of Algorithm 5 (minimal reduction and sorting effort).

Computational experiments

The presented algorithm has been implemented in ANSI-C, and a complete listing is available from the author on request. The following results were obtained on an HP9000/730 computer.
We consider how the algorithm behaves for different problem sizes, instance types and data ranges. Five types of randomly generated data instances are considered, each tested with data range $R_1 = 1000$ or $R_2 = 10000$ for different numbers of classes $k$ and class sizes $n_i$:
Uncorrelated data instances (uc): In each class we generate $n_i$ items by choosing $w_{ij}$ and $p_{ij}$ randomly in $[1, R]$.
Weakly correlated data instances (wc): In each class, $w_{ij}$ is randomly distributed in $[1, R]$ and $p_{ij}$ is randomly distributed in $[w_{ij} - 10, w_{ij} + 10]$, such that $p_{ij} \ge 1$.

Strongly correlated data instances (sc): For KP these instances are generated with $w_j$ randomly distributed in $[1, R]$ and $p_j = w_j + 10$, and such KP instances are very hard indeed. Generated this way, however, they are trivial for MCKP, since they degenerate to subset-sum data instances. Hard instances for MCKP may instead be constructed by cumulating strongly correlated KP instances: for each class, generate $n_i$ items $(w_j, p_j)$ as for KP, and order them by increasing weight. The data instance for MCKP is then

$$w_{ij} = \sum_{h=1}^{j} w_h, \qquad p_{ij} = \sum_{h=1}^{j} p_h, \qquad j = 1, \ldots, n_i.$$

Such instances have no dominated items, and form an upper convex set.
Subset-sum data instances (ss): $w_{ij}$ randomly distributed in $[1, R]$ and $p_{ij} = w_{ij}$. Such instances are hard since any upper bound will yield $u_{ij} = c$.
Sinha and Zoltners (sz): Sinha and Zoltners (1979) constructed their instances in a special way. For each class, construct $n_i$ items $(w_j, p_j)$ with $w_j$ and $p_j$ randomly distributed in $[1, R]$. Sort the weights and the profits independently in increasing order, and set $w_{ij} = w_j$, $p_{ij} = p_j$, $j = 1, \ldots, n_i$. Note that such data instances have no dominated items.
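The five instance types can be generated as in the following Python sketch. The function names and the use of Python's `random.Random` are assumptions; in particular, the text does not say how $p_{ij} \ge 1$ is enforced for weakly correlated instances, so the sketch simply clips the lower end of the interval at 1.

```python
import random


def generate_class(kind, n_i, R, rng):
    """Return one class of n_i (w_ij, p_ij) items for instance type `kind`."""
    if kind == "uc":  # uncorrelated: w and p independent in [1, R]
        return [(rng.randint(1, R), rng.randint(1, R)) for _ in range(n_i)]
    if kind == "wc":  # weakly correlated: p in [w - 10, w + 10], clipped at 1
        items = []
        for _ in range(n_i):
            w = rng.randint(1, R)
            items.append((w, rng.randint(max(1, w - 10), w + 10)))
        return items
    if kind == "sc":  # cumulated strongly correlated KP items (p_j = w_j + 10)
        items, W, P = [], 0, 0
        for w in sorted(rng.randint(1, R) for _ in range(n_i)):
            W, P = W + w, P + w + 10  # prefix sums over h = 1..j
            items.append((W, P))
        return items
    if kind == "ss":  # subset-sum: p = w
        return [(w, w) for w in [rng.randint(1, R) for _ in range(n_i)]]
    if kind == "sz":  # Sinha-Zoltners: weights and profits sorted independently
        ws = sorted(rng.randint(1, R) for _ in range(n_i))
        ps = sorted(rng.randint(1, R) for _ in range(n_i))
        return list(zip(ws, ps))
    raise ValueError(f"unknown instance type: {kind}")


def generate_instance(kind, k, n_i, R, seed=0):
    """k classes of n_i items each, reproducible through the seed."""
    rng = random.Random(seed)
    return [generate_class(kind, n_i, R, rng) for _ in range(k)]
```

For example, `generate_instance("sc", 1000, 10, 1000)` would produce a strongly correlated instance with $k = 1000$, $n_i = 10$ and $R = R_1$.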
The constant $M$ in Algorithm 2 is chosen as $M = 15$, and for each data instance the capacity $c$ is

$$c = \frac{1}{2} \sum_{i=1}^{k} \left( \min_{j \in N_i} w_{ij} + \max_{j \in N_i} w_{ij} \right). \tag{24}$$
We construct and solve 100 different data instances for each problem type, size and range. The presented results are average values, except for Table IV, which gives maximum values.
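Formula (24) translates directly into code. Since the sum may be odd, this sketch assumes the capacity is rounded down to an integer; the text does not state a rounding rule, and the function name is hypothetical.

```python
def capacity(classes):
    """Capacity (24): half the sum, over all classes, of the smallest plus
    the largest weight in the class (rounded down, an assumption)."""
    total = sum(min(w for w, _ in cls) + max(w for w, _ in cls) for cls in classes)
    return total // 2
```

For two classes with weights {2, 6} and {1, 4, 3}, the sum is (2 + 6) + (1 + 4) = 13, giving $c = 6$.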
First, Table I shows the average core size (measured in classes) for solving MCKP to optimality. For most instances only a few classes need to be considered in the dynamic programming, whereas the strongly correlated data instances demand that almost all classes are considered.
Table II shows how many classes have been tested by criterion (16). When many classes are present, only a few percent of the classes are reduced, meaning that the problem can be solved to optimality without even considering a large majority of the classes. The strongly correlated data instances again demonstrate that almost all classes must be considered.

                      UC            WC            SC            SS            SZ
     k    n_i      R1    R2      R1    R2      R1    R2      R1    R2      R1    R2
    10     10       2     2       8     8       8     9       2     4       6     5
   100     10       8     9      11    16      85    84       2     4      17    17
  1000     10      15    20       7    12     791   775       0     2      18    33
 10000     10      10    28       1    10    7563  7800       0     0      11    33
    10    100       2     3       4     5       8     8       1     2       7     8
   100    100       7    10       3     6      84    95       0     1      15    34
  1000    100       6    17       1     4     839   915       0     0      11    41
    10   1000       1     2       2     2       4     8       0     1       6     9
   100   1000       1     6       0     2      25    82       0     0       9    30

Table I: Final core size (in classes). Average of 100 instances.

                      UC            WC            SC            SS            SZ
     k    n_i      R1    R2      R1    R2      R1    R2      R1    R2      R1    R2
    10     10      52    55      87    88      85    88      23    37      83    82
   100     10      46    63      14    19      87    86       2     4      68    80
  1000     10      20    52       1     1      82    82       0     0      18    70
 10000     10       0    26       0     0      78    80       0     0       0    19
    10    100      42    60      43    55      81    81      10    19      80    90
   100    100      23    56       3     6      84    95       0     1      23    82
  1000    100       1    29       0     0      84    92       0     0       2    21
    10   1000      10    48      16    20      45    79       0    11      66    97
   100   1000       1    20       0     2      25    82       0     0      11    39

Table II: Percentage of all classes which have been tested by the weak upper bound (16). Average of 100 instances.
The efficiency of the weak upper bound (16) is given in Table III. The entries show the percentage of tested items that are reduced. Generally a large majority of the variables are fixed at their optimal values this way.
To illustrate the hardness of the dynamic programming, Table IV reports, for each problem type, the largest set of states $Y_C$ observed over 100 instances. Strongly correlated data instances may result in more than half a million states. This is still far less than the $c$ states which form the guaranteed maximum.
Finally, Table V gives the average computational times. Easy data instances are solved in a fraction of a second. Only the strongly correlated instances demand more computational effort, but they are still solved within 30 minutes.
The above results indicate that the presented algorithm outperforms any known algorithm for MCKP (see Sinha and Zoltners 1979, Armstrong et al. 1983, Dyer, Kayal and Walker 1984), implying that the stated minimality properties lead to drastic reductions in the computational times.

                      UC            WC            SC            SS            SZ
     k    n_i      R1    R2      R1    R2      R1    R2      R1    R2      R1    R2
    10     10      83    84      48    27      45    34       0     0      70    73
   100     10      88    88      62    56      51    51       0     0      86    86
  1000     10      89    90      68    49      53    54       0     0      88    89
 10000     10      86    90      80    68      50    52       0     0      72    90
    10    100      98    98      75    61      84    79       0     0      86    85
   100    100      99    99      87    68      85    85       0     0      93    97
  1000    100      98    99      94    86      84    85       0     0      94    98
    10   1000     100   100      87    58      50    94       0     0      89    90
   100   1000     100   100      94    85      50    94       0     0      93    96

Table III: Percentage of tested items which are reduced. Average of 100 instances.

                      UC            WC            SC            SS            SZ
     k    n_i      R1    R2      R1    R2      R1    R2      R1    R2      R1    R2
    10     10       0     0       1    10       3    24       4    47       0     0
   100     10       0     0       4    52       7    68       4    38       0     0
  1000     10       1     0       4    39      20   194       0    28       2     3
 10000     10       1     4       5    46      84   572       0     0       4    12
    10    100       0     0       4    40       1    10       3    28       1     1
   100    100       0     0       4    40       4    26       3    28       3     3
  1000    100       0     1       3    43      10   106       0     0       4     8
    10   1000       0     0       3    35       3     4       0    30       3    10
   100   1000       0     0       3    36      25     9       0    20       4    31

Table IV: Largest set of states in dynamic programming (in thousands). Maximum of 100 instances.

                         UC                WC                SC                SS                SZ
     k    n_i        R1      R2        R1      R2        R1      R2        R1      R2        R1      R2
    10     10      0.00    0.00      0.01    0.05      0.01    0.09      0.01    0.17      0.00    0.00
   100     10      0.00    0.00      0.02    0.28      0.37    5.16      0.01    0.11      0.01    0.01
  1000     10      0.03    0.03      0.03    0.23      7.30   92.46      0.01    0.09      0.04    0.05
 10000     10      0.25    0.31      0.24    0.42    169.94 1628.57      0.17    0.17      0.33    0.41
    10    100      0.00    0.00      0.03    0.58      0.02    0.19      0.06    1.05      0.01    0.02
   100    100      0.02    0.02      0.03    0.55      0.33    6.93      0.01    0.68      0.05    0.07
  1000    100      0.14    0.17      0.16    0.43      9.57  195.75      0.13    0.13      0.24    0.32
    10   1000      0.02    0.03      0.12    2.75      1.64    0.14      0.02   12.55      0.19    0.74
   100   1000      0.12    0.15      0.18    1.11    173.69    2.97      0.13    0.15      0.41    2.66

Table V: Total computing time in seconds. Average of 100 instances.

Conclusions

We have presented a complete algorithm for the exact solution of the Multiple-Choice Knapsack Problem. To our knowledge, it is the first enumerative algorithm which makes use of the partitioning algorithms by Dyer (1984) and Zemel (1984). To do this, it was necessary to derive new upper bounds based on the positive and negative gradients, and to choose a strategy for which classes should be added to the core.
The algorithm satisfies the minimality properties defined in Section 7.1: it solves MCKP with a minimal core, since classes are only added to the core if the current core problem could not be solved to optimality, and the effort used for sorting and reduction is also minimal according to the stated definitions.
Computational experiments document that the presented algorithm is indeed very
efficient. Even very large data instances are solved in a fraction of a second; only strongly
correlated data instances demand more computational effort. It is our hope that the
appearance of this algorithm will promote the use of the MCKP model, since it is far
more flexible than e.g. a simple KP model.

                        R1                           R2
       k    minknap   mcknap  preproc    minknap   mcknap  preproc
     100       0.00     0.00     0.00       0.00     0.00     0.00
    1000       0.00     0.02     0.01       0.00     0.02     0.01
   10000       0.01     0.13     0.12       0.02     0.21     0.12
  100000       0.10     1.37     1.36       0.16     1.53     1.34

Table VI: Total computing time in seconds for solving 0-1 Knapsack Problems. Uncorrelated data instances. Average of 100 instances.

Appendix A: 0-1 Knapsack Problems


The developed algorithm may equally well be used for solving 0-1 Knapsack Problems, but this naturally introduces some overhead compared to specialized algorithms for the 0-1 Knapsack Problem.
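The text does not spell out how a 0-1 Knapsack Problem is fed to mcknap. One natural mapping, sketched below as an assumption, turns every item into a two-item class {leave out, pack}. Because all MCKP coefficients must be positive, both items of each class are shifted by 1 and the capacity by the number of classes, following the constant-shift argument for non-positive coefficients given in the introduction. The function name is hypothetical.

```python
def kp_to_mckp(weights, profits, capacity):
    """Map a 0-1 KP to an MCKP with strictly positive coefficients.

    Item j becomes the class [(1, 1), (w_j + 1, p_j + 1)]: "leave out" and
    "pack", both shifted by 1. The capacity grows by one unit per class, so
    the optimal MCKP profit exceeds the KP optimum by exactly len(weights).
    """
    classes = [[(1, 1), (w + 1, p + 1)] for w, p in zip(weights, profits)]
    return classes, capacity + len(weights)
```

Each such class contains no dominated items as long as $p_j > 0$, which matches the remark below about the preprocessing overhead.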
Table VI compares the running times of mcknap with those of minknap (Pisinger 1994c). Generally, mcknap spends about 10 times more computational time than minknap. However, the column preproc shows that most of the overhead is spent on preprocessing (sorting and removal of dominated items), where minknap is able to use a faster algorithm, as there are no dominated items in the classes of a 0-1 Knapsack Problem.
In spite of the higher computational times, the developed algorithm shows a stable behavior, even in this extreme case.

References
Aho, A.V., J.E. Hopcroft and J.D. Ullman, The design and analysis of Computer
algorithms, Addison-Wesley, Reading, MA, 1974.
Armstrong, R.D., D.S. Kung, P. Sinha and A.A. Zoltners, A Computational
Study of a Multiple-Choice Knapsack Algorithm, ACM Transactions on Mathematical
Software, 2 (1983), 184-198.
Balas, E. and E. Zemel, An Algorithm for Large Zero-One Knapsack Problems,
Operations Research, 28 (1980), 1130-1154.
Bellman, R.E., Dynamic Programming, Princeton University Press, Princeton, N.J.,
(1957).
Dantzig, G.B., Discrete Variable Extremum Problems, Operations Research, 5 (1957),
266-277.
Dudzinski, K. and S. Walukiewicz, A fast algorithm for the linear multiple-choice knapsack problem, Operations Research Letters, 3 (1984), 205-209.
Dudzinski, K. and S. Walukiewicz, Exact methods for the knapsack problem and its generalizations, European Journal of Operational Research, 28 (1987), 3-21.
Dyer, M.E., An O(n) algorithm for the multiple-choice knapsack linear program, Mathematical Programming, 29 (1984), 57-63.
Dyer, M.E., N. Kayal and J. Walker, A branch and bound algorithm for solving the multiple choice knapsack problem, Journal of Computational and Applied Mathematics, 11 (1984), 231-249.
Fisher, M.L., The Lagrangian Relaxation Method for Solving Integer Programming
Problems, Management Science, 27 (1981), 1-18.
Hoare, C.A.R., Quicksort, Computer Journal, 5, 1 (1962), 10-15.
Ibaraki, T., Enumerative Approaches to Combinatorial Optimization - Part 2, Annals
of Operations Research, 11 (1987).
Martello, S. and P. Toth, A New Algorithm for the 0-1 Knapsack Problem, Management Science, 34 (1988), 633-644.
Martello, S. and P. Toth, Knapsack Problems: Algorithms and Computer Implementations, Wiley, England, 1990.
Nauss, R.M., The 0-1 knapsack problem with multiple choice constraint, European
Journal of Operational Research, 2 (1978), 125-131.
Pisinger, D., On the solution of 0-1 knapsack problems with minimal preprocessing,
Proceedings NOAS93, Trondheim, Norway, June 11-12. (1993).
Pisinger, D., An expanding-core algorithm for the exact 0-1 Knapsack Problem, To
appear in European Journal of Operational Research (1994a).
Pisinger, D., Solving hard knapsack problems, DIKU, University of Copenhagen,
Denmark, Report 94/24 (1994b).
Pisinger, D., A minimal algorithm for the 0-1 Knapsack Problem, DIKU, University
of Copenhagen, Denmark, Report 94/23 (1994c).
Sinha, P. and A.A. Zoltners, The multiple-choice knapsack problem, Operations Research, 27 (1979), 503-515.
Witzgal, C., On One-Row Linear Programs, Applied Mathematics Division, National Bureau of Standards (1977).
Zemel, E., The linear multiple choice knapsack problem, Operations Research, 28 (1980), 1412-1423.
Zemel, E., An O(n) algorithm for the linear multiple choice knapsack problem and related problems, Information Processing Letters, 18 (1984), 123-128.
