Beruflich Dokumente
Kultur Dokumente
Genetics and population analysis Advance Access publication July 17, 2010
© The Author 2010. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org 2145
LSMs were proposed not only for the study of computer science
optimization problems, but also with the intention of future use in
directed evolution (Corne et al., 2003). In this article, we begin to
Fig. 1. Representation of an exact LSM for the MaxOnes problem where examine, empirically, their potential in this area for the first time.
L = 5 (Corne et al., 2003). Until now, no attempt has been made to model recombination
events using LSMs. This task is not a trivial extension to the
scheme since populating a three dimensional transition matrix (for
development and availability of high-throughput sequencers and/or the fitnesses of two parents and one child) would require a much
microarrays, even more extensive datasets will emerge over the greater number of evaluations on the real landscape. In other words,
coming months and years. This potential development, while the sampling will be at an even greater order of sparseness than with
exciting, is only part of the battle, however. In order to understand the standard LSM. The LSM will also be limited in the algorithms
something useful about the landscapes and the dynamics of evolution it can assess because the sampling algorithm will only sample
on them, appropriate models or abstractions from the raw data must combinations of parents from a range dictated by the selection
be inferred. pressure employed and the properties of the fitness landscape. We
In computer science, in the field of evolutionary computation, attempt to correct for these problems using a simple heuristic,
characterizing fitness landscapes and how their features might described in ‘Methods’ section.
affect evolutionary progress and dynamics is an area of intense The performances of a range of genetic algorithms (including
study (Jones and Forrest, 1995; Kallel et al., 2001; Merz, 2004). those that incorporate recombination) were assessed on both
Grefenstette (1995) and Altenberg (1995) independently developed real landscapes and LSM abstractions of these landscapes. NK-
the idea of modelling/predicting evolutionary progress from fitness landscapes (Kauffman and Levin, 1987) with different levels of
selection pressures and so forth) can then be assessed without the The properties of the landscape can be tuned by varying K. When K is zero
use of any genotypic information. the landscape is smooth resulting in a single peak of the Mount Fuji type.
For more complex landscapes, calculating these transitions or When K is increased the landscape becomes increasingly rugged resulting
even storing a matrix of transitions between all points within S in many fitness peaks, making it more difficult for algorithms with high-
becomes intractable. If this is the case, points within S are pooled selection pressure to find the global optimum. Two NK-landscapes were
together into equally spaced fitness intervals (states) and transition assessed in this study, with K = 2 and K = 3. Both had N = 100.
probabilities are inferred from values generated by a sampling
algorithm run on the real landscape. Using this approach, landscape 2.2 Experimentally-derived DNA–protein interaction
state machines have shown a high degree of accuracy and specificity landscape
in predicting the performance of different evolutionary algorithms The second landscape was obtained from measured experimental data
on a range of optimization problems studied in Computer Science (Rowe et al., 2009). These data comprise every DNA sequence ten bases
(Corne et al., 2003; Rowe et al., 2006). in length and their corresponding affinities with the fluorescent protein
2146
2147
3 RESULTS
3.1 NK-Landscapes
Plots of the performance of each of the algorithms assessed on the
real landscapes and the LSMs are displayed in Figures 3–6; the
corresponding Pearson correlation coefficients are listed in Table 1.
The correlations between the values produced on the LSM and Fig. 5. Displaying plots of differing GA performance with increasing
the real landscapes are high for both NK-landscapes, ranging from mutation rate based on evaluations on the NK-landscape, where N = 100
0.94 to 1.00. While these values are strikingly high, in real terms and K = 3 (λ = 40 000).
they only relate to the general trends in performance of each of the
algorithms (with increasing mutation rates) in isolation. The relative
performance at each mutation rate was ranked for all algorithms and
2148
Algorithm K =2 K =3
2149
Table 2. Pearson correlation coefficients observed between average best Table 3. Pearson correlations representing the difference in performance
fitness values observed on the experimental fitness landscape and those of algorithms with and without crossover on the real landscape and the
predicted using the LSM for different µ, λ algorithms (where λ = 40 000) performance of algorithms with and without crossover on the LSM
The Pearson correlation coefficients of the average best fitness µ NK, K = 2 NK, K = 3 µ Experimental landscape
achieved on the real landscape and the LSM are much lower for
the experimental landscape than for the NK-landscapes (Table 2). 2 0.99 0.99 2 0.98
Correlations between 0.49 and 0.79 are observed for each individual 4 0.98 0.98 5 0.95
2150
Table 5. Pearson correlation coefficients between average best fitness values It has previously been shown that accurate LSMs can be
observed using the NK-landscapes (K = 2, K = 3) and the experimental constructed with far fewer real evaluations and an effective sampling
fitness landscape and those predicted using LSMs sampled at 1/L and 0.2/L algorithm can be beneficial to LSM accuracy (Rowe et al., 2006).
for different µ, λ algorithms (where λ = 40 000) with varying mutation rate In this study, a sampling algorithm with a low mutation rate was
(X denotes crossover)
selected not for its performance on the landscape, but because from
this low mutation rate a much greater range of mutation rates can
NK, K = 2 NK, K = 3 Experimental
be assessed by the LSM through matrix multiplication. This comes
landscape
at a price: higher mutation rate transitions may not be sampled.
Algorithm 1/L 0.2/L 1/L 0.2/L Algorithm 1/L 0.2/L
The predictions made on the performance of algorithms with much
higher mutation rates show only a small loss in accuracy, which
µ=2 0.993 0.990 0.971 0.982 µ=2 0.966 0.927 is remarkable given that any error within the transition matrix is
µ=4 0.992 0.990 0.965 0.976 µ=5 0.989 0.878 multiplied with every matrix multiplication.
µ = 40 0.990 0.989 0.975 0.971 µ = 10 0.959 0.915 The most important finding of this study has been that the simple
µ = 400 0.996 0.994 0.989 0.982 µ = 100 0.891 0.767 structure of the LSM is capable of mimicking the features of a real
µ = 2+X 0.997 0.995 0.980 0.982 µ = 2+X 0.982 0.715 biological landscape derived from real experimental data, despite
µ = 4+X 0.996 0.994 0.982 0.981 µ = 5+X 0.969 0.784 the noise that is inherent to such measurements and the complex
µ = 40+X 0.996 0.995 0.991 0.983 µ = 10+X 0.987 0.837 sequence/structural interactions. It would be interesting to determine
µ = 40+X 0.999 0.997 0.996 0.988 µ = 100+X 0.891 0.787
to what extent LSMs can replicate real biological fitness landscapes,
and whether they can be extended to model non-static and multiple
fitness functions. It may be necessary to modify the structure of
2151
Altenberg,L. (1995) The schema theorem and Price’s theorem. In Foundations of Genetic Merz,P. (2004) Advanced fitness landscape analysis and the performance of memetic
Algorithms. Morgan Kaufmann, San Francisco, pp. 23–49. algorithms. Evol. Comput., 12, 303–325.
Bäck,T. (1995) Generalized convergence models for tournament- and (mu, lambda)- Mitchell,M. et al. (1992) The royal road for genetic algorithms: fitness landscapes and
selection. In Proceedings of the 6th International Conference on Genetic GA performance. In Proceedings of European Conf. on Artificial Life, Varela, F.J.
Algorithms. Morgan Kaufmann Publishers Inc., San Francisco. and Bourgine, P. (eds), MIT Press, Cambridge MA, pp. 245–254.
Barnett,L. (1998) Ruggedness and neutrality - the NKp family of fitness landscapes. In Maynard Smith, J. (1970) Natural selection and the concept of a protein space. Nature,
Alive VI: Sixth International Conference on Artificial Life, MIT-Press, Cambridge, 225, 563–564.
MA, USA, pp. 18–27. Mühlenbein,H. (1992) How genetic algorithms really work: 1. Mutation and
Cervantes,J. and Stephens,C.R. (2009) Limitations of existing mutation rate heuristics Hillclimbing Parallel Problem Solving from Nature II. Elsevier Science Amsterdam,
and how a rank GA overcomes them. IEEE Trans. Evol. Comput., 12, 369–397. 15–25
Corne,D. et al. (2003) Landscape state machines: tools for evolutionary algorithm Naudts,B. and Kallel,L. (2000) A comparison of predictive measures of problem
performance analyses and landscape/algorithm mapping. Lecture Notes Computer difficulty in evolutionary algorithms. IEEE Trans. Evol. Comput., 4, 1–15.
Science, Springer, Berlin, pp. 197–198. Romero,P.A. and Arnold,F.H. (2009) Exploring protein fitness landscapes by directed
Dayhoff,M.O. et al. (1978) A model of evolutionary change in proteins. In Dayhoff, evolution. Nat. Rev. Mol. Cell Biol., 10, 866–877.
M.O. (ed), Atlas of Protein Sequence and Structure. National Biomedical Research Rowe,L.A. et al. (2003) A comparison of directed evolution approaches using the beta-
Foundation, Silver Spring, MD, pp. 345–352. glucuronidase model system. J. Mol. Biol., 332, 851–860.
Drummond,D.A. et al. (2005) Why high-error-rate random mutagenesis libraries are Rowe,W. et al. (2006) Predicting stochastic search algorithm performance using
enriched in functional and improved proteins. J. Mol. Biol., 350, 806–816. landscape state machines. IEEE Congress on Evolutionary Computation (CEC
Ellington,A.D. and Szostak,J.W. (1990) In vitro selection of RNA molecules that bind 2006), Vancouver. IEEE Press, Piscataway NJ, pp. 9849–9856.
specific ligands. Nature, 346, 818–822. Rowe,W. et al. (2010) Analysis of a complete DNA-protein affinity landscape. J. Roy.
Giver,L. et al. (1998) Directed evolution of a thermostable esterase Proc. Natl Acad. Soc. Interface, 7, 397–408.
Sci. USA, 95, 12809–12813. Stemmer,W.P. (1994) DNA shuffling by random fragmentation and reassembly: in
Grefenstette,J.J. (1995) Predictive models using fitness distributions of genetic vitro recombination for molecular evolution. Proc. Natl Acad. Sci. USA, 91,
operators. In Foundations of Genetic Algorithms 3. Morgan Kaufmann, San Mateo, 10747–10751.
2152