Laura Cruz R., Joaquín Pérez O., Irma Y. Hernández B., Nelson Rangel V.,
Norma E. García A., and Victor M. Alvarez H.
1 Introduction
Advances in computer and communication technology facilitate the implementation of Distributed Database Systems. However, commercial Distributed Database Management Systems must be managed by expert professionals, who develop databases for the Web without the support of robust methodologies and design-assistance tools. A distributed system with a poor data distribution design can undergo severe performance degradation.
Few mathematical models have been developed for data-object distribution on the Web; in [1] we proposed one of them, named the DFAR model, an acronym for distribution, fragmentation, allocation, and reallocation of data-objects. The problem modeled by DFAR is NP-hard and has been solved using the exact Branch and Bound method and approximate algorithms such as Threshold Accepting, Tabu Search, and Genetic Algorithms. We have carried out a large number of experiments with them, and no algorithm showed absolute superiority. These results agree with the conjecture of Wolpert's No-Free-Lunch (NFL) Theorem that different algorithms are appropriate for different problems [2].
Hence, we have also been working on developing an automatic method for algorithm selection [3]. In this paper the selection of exact algorithms is approached with an architecture based on heuristic sampling. The proposed architecture makes it possible to choose, from a set of exact algorithms, the most promising one to solve a particular instance of DFAR. The architecture consists mainly of two modules. The Converter module transforms DFAR instances into instances of the classical Constraint Satisfaction Problem (CSP); with a converter, we can take advantage of already existing algorithms for CSP. The Selector module performs a heuristic sampling.
To validate our approach we carried out experiments using exact algorithms based on backtracking search. The Selector module performs a heuristic sampling derived from Knuth's method [6]; in other words, it estimates the efficiency of a particular algorithm by taking random samples of the search tree associated with it.
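The idea behind this sampling can be sketched in a few lines of Python. The sketch below is our own illustration of Knuth's estimator [6] (function and parameter names are ours), not the implementation used in the experiments:

```python
import random

def knuth_estimate(children, root, probes=1000, seed=0):
    """Estimate search-tree size by random probing (Knuth, 1975).

    `children(node)` returns the child nodes. Each probe walks one
    random root-to-leaf path, accumulating 1 + d1 + d1*d2 + ...
    where d_i is the branching factor observed at depth i. The average
    over many probes is an unbiased estimate of the number of nodes in
    the tree, i.e. of the work a backtracking algorithm would do.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(probes):
        estimate, product, node = 1.0, 1.0, root
        while True:
            kids = children(node)
            if not kids:               # leaf reached: probe ends
                break
            product *= len(kids)       # product of branching factors so far
            estimate += product        # estimated node count at this depth
            node = rng.choice(kids)    # descend along one random branch
        total += estimate
    return total / probes
```

For a tree with uniform branching every probe gives the exact node count; for skewed trees the estimate is unbiased but needs many probes to converge.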
This paper is organized as follows: a description of the DFAR and CSP problems is given in Section 2. Section 3 describes three backtracking solution methods. The heuristic sampling method and related work are presented in Section 4. The architecture for selecting exact algorithms is presented in Section 5. The experimental results are shown in Section 6.
2 Optimization Problems
In this section the mathematical formulation of the data distribution problem
modeled by DFAR and the Constraint Satisfaction Problem are described.
\min z = \sum_{k}\sum_{i} f_{ki} \sum_{m}\sum_{j} q_{km} l_{km} c_{ij} x_{mj}
       + c_1 \sum_{i}\sum_{k}\sum_{j} f_{ki} y_{kj}
       + c_2 \sum_{j} w_j
       + \sum_{m}\sum_{i}\sum_{j} a_{mi} c_{ij} d_m x_{mj}    (1)
where
fki = emission frequency of query k from site i;
qkm = usage parameter, qkm = 1 if query k uses attribute m,
otherwise qkm = 0;
lkm = number of packets for transporting attribute m for query k;
cij = communication cost between sites i and j;
c1 = cost for accessing several fragments to satisfy a query;
ykj = indicates if query k accesses one or more attributes located at site j;
c2 = cost for allocating a fragment to a site;
wj = indicates if there exist attributes at site j;
ami = indicates if attribute m was previously located at site i;
dm = number of packets for moving attribute m to another site, if necessary;
xmj = decision variable, xmj = 1 if attribute m is allocated to site j, otherwise xmj = 0.
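As an illustration, the objective z can be evaluated directly from these parameters. The following Python sketch is ours; the encoding (dicts keyed by index tuples, with k = query, i/j = sites, m = attribute) is an assumption for illustration, not part of the model:

```python
def dfar_cost(f, q, l, c, c1, c2, a, d, x, y, w):
    """Evaluate the DFAR objective of Eq. (1) for a candidate allocation x."""
    K = {k for (k, _) in f}          # queries
    S = {i for (_, i) in f}          # sites
    M = {m for (m, _) in x}          # attributes
    # transmission cost of processing every query
    t1 = sum(f[k, i] * q[k, m] * l[k, m] * c[i, j] * x[m, j]
             for k in K for i in S for m in M for j in S)
    # cost for queries that access several fragments
    t2 = c1 * sum(f[k, i] * y[k, j] for i in S for k in K for j in S)
    # cost for allocating fragments to sites
    t3 = c2 * sum(w[j] for j in S)
    # cost of migrating attributes from their previous sites
    t4 = sum(a[m, i] * c[i, j] * d[m] * x[m, j]
             for m in M for i in S for j in S)
    return t1 + t2 + t3 + t4
```

Each of the four terms corresponds, in order, to a summand of Eq. (1).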
The algorithms most commonly used for CSP are based on chronological backtracking; variants can be obtained by adding arc consistency and Branch and Bound techniques.
The main difference among backtracking algorithms is the level of consistency they enforce. The technique most commonly used for domain reduction is Arc Consistency (AC). It verifies consistency on variables that have not yet been instantiated and removes values from their domains, reducing the branching factor. We implemented two functions with different consistency levels: General Arc Consistency (GAC3) and Arc Consistency 4 (AC4). The key data structure used by GAC3 [4] is a stack of variables. The AC4 algorithm [5] uses a stack of variable-value records.
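The stack-of-variables scheme of GAC3 can be sketched as follows. This is our own simplified illustration over a toy binary-constraint encoding (the `(x, y, relation)` triples are an assumption), not the implemented function:

```python
def gac3(domains, constraints):
    """GAC3-style arc consistency sketch.

    `domains` maps each variable to a set of values; `constraints` is a
    list of (x, y, relation) triples where relation(vx, vy) is True when
    the pair of values is consistent. A stack of variables whose domains
    changed (the key data structure of GAC3) drives repeated revision
    until a fixed point is reached.
    """
    stack = list(domains)                    # start with every variable
    while stack:
        var = stack.pop()
        for x, y, rel in constraints:
            if y != var:
                continue
            # keep only values of x supported by some value in y's domain
            supported = {vx for vx in domains[x]
                         if any(rel(vx, vy) for vy in domains[y])}
            if supported != domains[x]:
                domains[x] = supported       # domain shrank: revisit x
                stack.append(x)
    return domains
```

Because revision only removes unsupported values, the result is independent of the order in which variables are popped from the stack.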
3.3 Backtracking Algorithm with Branch and Bound
The backtracking algorithms in Figure 1 solve decision problems like CSP, but not optimization problems like DFAR. The following code is an adaptation of backtracking with B&B, which solves optimization problems.
The algorithm finds an assignment with a cost smaller than a given upper bound (c*). It searches the entire tree; when it finds a feasible solution (line 1), it updates the upper bound. The algorithm verifies consistency on variables that have not yet been instantiated and removes values from their domains (lines 5 to 9), reducing the branching factor with the function FORCE CONSISTENCY. If the partial assignment does not exceed the upper bound (line 11), the algorithm continues at the next depth level (line 12); otherwise it prunes the branch, restores the level, and backtracks.
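The scheme just described can be sketched in Python as follows. This is our own illustration (names and the problem encoding are assumptions, and the consistency-enforcement step FORCE CONSISTENCY is omitted), not the adapted code itself:

```python
import math

def branch_and_bound(variables, domains, cost, feasible):
    """Backtracking with Branch and Bound (illustrative sketch).

    Whenever a feasible complete assignment cheaper than the current
    upper bound c* is found, c* is updated; partial assignments whose
    cost already reaches c* are pruned. `cost` must be non-decreasing
    as more variables are assigned for the pruning to be sound.
    """
    best = {'cost': math.inf, 'solution': None}

    def search(assignment, depth):
        if depth == len(variables):
            if feasible(assignment):               # feasible solution found:
                best['cost'] = cost(assignment)    # update the upper bound c*
                best['solution'] = dict(assignment)
            return
        var = variables[depth]
        for value in domains[var]:
            assignment[var] = value
            # prune branches whose partial cost already reaches c*
            if cost(assignment) < best['cost']:
                search(assignment, depth + 1)      # continue at next depth level
            del assignment[var]                    # restore the level, backtrack

    search({}, 0)
    return best['solution'], best['cost']
```

Unlike plain backtracking, the search does not stop at the first feasible solution; it keeps exploring, tightening c* until the whole tree is pruned or exhausted.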
In this section we review related work on sampling algorithm performance in tree search, and present the proposed heuristic sampling method.
4.1 Related Works
Few studies have identified algorithm dominance regions considering more than one characteristic of the problem, and they neither identify formally and systematically the characteristics that critically affect performance, nor incorporate them explicitly in a performance model. In contrast, in [3] we proposed a methodology to build algorithm performance predictors that incorporate critical characteristics. The relationship between performance and characteristics is learned from historical data using machine learning techniques.
The selection methodology consists of three phases: initial training, prediction, and retraining (Figure 5). The first phase constitutes the kernel of the selection process: starting from a set of historical instances solved with several algorithms, machine learning techniques, in particular clustering and classification, are applied to learn the relationship between performance and problem characteristics. In the prediction phase the learned relationship is applied to select an algorithm for a given instance. The purpose of the retraining phase is to improve the accuracy of the prediction with new experiences.
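A minimal sketch of the training and prediction phases is given below. It is our own illustration, with nearest-neighbor classification standing in for the clustering-and-classification machinery of [3]; all names and the data format are assumptions:

```python
def train(history):
    """Initial training (trivial here): store (characteristics, best
    algorithm) pairs obtained from solved historical instances."""
    return list(history)

def select_algorithm(model, characteristics):
    """Prediction phase sketch: return the algorithm that was best on
    the historical instance with the nearest characteristics (1-NN)."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    _, algorithm = min(model,
                       key=lambda record: sq_dist(record[0], characteristics))
    return algorithm
```

Retraining then amounts to appending newly solved instances to the model and repeating the prediction.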
Some experiments were carried out with the proposed selection architecture for exact algorithms. Three backtracking algorithms with B&B were tested: CB, AC4, and GAC3.
Table 1 presents a small instance set selected from a sample of 100 DFAR instances; columns 2, 3, and 4 contain the numbers of attributes, sites, and queries; column 5 indicates the level of the capacity constraints. Because the DFAR instances were converted to CSP instances, columns 6 and 7 show the number of variables and constraints, respectively. Column 8 shows the optimal solution of the DFAR instances.
Table 1. Example of DFAR instances converted to CSP instances

                     DFAR                        CSP
Instance  Attrib.  Sites  Queries  Constrained  Num. of    Num. of      Optimum
                                   Capacity     Variables  Constraints
I 1       30       20     20       Maximum      600        50           40532
I 2       33       22     22       Maximum      726        55           44585
I 3       33       22     22       Without      726        33           3324
I 4       39       26     26       Medium       1014       64           52691
I 5       39       26     26       Without      1014       39           3928
I 6       42       28     28       Without      1176       42           4230
I 7       51       34     34       Without      1734       51           5137
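The variable and constraint counts in columns 6 and 7 are consistent with a simple encoding, sketched below under our own assumptions (one boolean per attribute-site pair; one constraint per attribute plus, when capacity is constrained to the maximum, one per site). The "Medium" level appears to be encoded differently, so the sketch covers only the "Maximum" and "Without" cases:

```python
def dfar_to_csp(attributes, sites, capacity):
    """Assumed Converter encoding (illustration, not the paper's code).

    One boolean CSP variable per (attribute, site) pair; one
    'exactly one site per attribute' constraint per attribute; and,
    when capacity == 'Maximum', one capacity constraint per site.
    Returns (num_variables, num_constraints) as counted in Table 1.
    """
    num_variables = attributes * sites
    num_constraints = attributes
    if capacity == "Maximum":
        num_constraints += sites
    return num_variables, num_constraints
```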
The SELECTOR algorithm was executed 30 times for each instance using tn = 1 sec, tm = 2 sec, threads = 5, and depth = 0.2. The results are summarized in Table 2. To verify the prediction quality of the SELECTOR, the algorithms CB, GAC3, and AC4 were executed completely in order to determine the real best algorithm for each instance (column 2) and the average time tb to execute it (column 3). Columns 4, 5, and 6 indicate how many times each algorithm was chosen as the most promising one; the average time ts to execute the selected algorithm is in column 7. Column 8 shows the accuracy (percentage of successful selections).
For example, the algorithm CB was the best for solving instance I 1, and its execution time was 9.61 seconds. The selector chose the algorithms AC4, CB, and GAC3 13, 17, and 0 times, respectively, and the selected algorithm executed in 8.90 seconds on average. For this instance an accuracy of 57% was obtained, but in general the selector predicted the right algorithm for 86% of the 100 instances.
In addition, the results of executing the selected algorithms were contrasted with a random selection. The accumulated time of our proposal (4765.51 sec) was smaller than that of the random selection (7273.91 sec); the difference was 2508.4 seconds (41.8 minutes).
Table 2. Results of the SELECTOR compared with the real best algorithm

          Best Algorithm           Selector
Instance  Real   Execution  Selected Algorithm   Selection   Accuracy
          best   time tb    AC4    CB    GAC3    Time ts     %
I 1       CB     9.61       13     17    0       8.90        57
I 2       CB     13.35      6      24    0       6.10        80
I 3       AC4    130.94     21     8     1       3.52        70
I 4       CB     24.25      5      25    0       17.00       83
I 5       CB     23.29      6      24    0       21.65       80
I 6       CB     46.24      7      23    0       24.72       77
I 7       AC4    728.64     20     10    0       4.23        67
References
[1] Pérez, J., Pazos, R.A., Frausto, J., Reyes, G., Santaolaya, R., Fraire, H., Cruz,
L., An Approach for Solving Very Large Scale Instances of the Design Distribution
Problems for Distributed Database Systems, Lecture Notes in Computer Science,
Springer-Verlag, Berlin Heidelberg New York, 2005.
[2] Wolpert, D.H., Macready, W.G., No Free Lunch Theorems for Optimization,
IEEE Transactions on Evolutionary Computation, Vol. 1, pp. 67-82, 1997.
[3] Pérez, J., Pazos, R.A., Frausto, J., Rodríguez, G., Romero, D., Cruz, L., A Sta-
tistical Approach for Algorithm Selection, Lecture Notes in Computer Science, Vol.
3059, Springer-Verlag, Berlin Heidelberg New York, pp. 417-431, 2004.
[4] Kumar, V., Algorithms for Constraint Satisfaction Problems: A Survey, AI Maga-
zine, 1992.
[5] Liu, Z., Algorithms for Constraint Satisfaction Problems, Master's Thesis, University
of Waterloo, Ontario, Canada, 1998.
[6] Knuth, D., Estimating the Efficiency of Backtrack Programs, Mathematics of Com-
putation, 1975.
[7] Purdom, P., Tree Size by Partial Backtracking, SIAM J. Comput., 1978.
[8] Sillito, J., Improvements to and Estimating the Cost of Backtracking Algorithms for
Constraint Satisfaction Problems, Master's Thesis, University of Alberta, Edmonton,
Alberta, 2000.
[9] Lobjois, L., Lemaitre, M., Branch and Bound Algorithm Selection by Performance
Prediction, Proceedings of the Fifteenth National Conference on Artificial Intelli-
gence, Madison, Wisconsin, 1998.