
Ravichandran Veerasamy, et al : Validation of QSAR Models-Strategies and Importance 511

International Journal of Drug Design and Discovery


Volume 2 Issue 3 July – September 2011. 511-519

Validation of QSAR Models - Strategies and Importance

Ravichandran Veerasamy1*, Harish Rajak2, Abhishek Jain3, Shalini Sivadasan1, Christapher P. Varghese1 and Ram Kishore Agrawal3

1 Faculty of Pharmacy, AIMST University, Semeling – 08100, Kedah, Malaysia.
2 Institute of Pharmaceutical Sciences, Guru Ghasidas University, Bilaspur-495 009 (CG), India.
3 Department of Pharmaceutical Sciences, Dr. H. S. Gour University, Sagar, India – 470 003.

ABSTRACT: Quantitative Structure-Activity Relationship (QSAR) is based on the hypothesis that changes in molecular
structure reflect changes in the observed response or biological activity. The success of any quantitative structure–activity
relationship model depends on the accuracy of the input data, selection of appropriate descriptors, statistical tools and the
validation of the developed model. Validation is a crucial aspect of QSAR modeling. Validation is the process by which the
reliability and significance of a procedure are established for a specific purpose. Hence in this review we focus on the
importance of validation of QSAR models and different methods of validation.

KEYWORDS: QSAR; Internal validation; External validation; Randomization.

Introduction

A variety of in vitro and computational methods are being proposed for the toxicological assessment of chemicals owing to the increasing requirement for alternative methods to in vivo toxicity testing. More effort will be given to including methods such as in vitro techniques and Quantitative Structure-Activity Relationship (QSAR) models in the regulatory decision-making process in Europe [1,2] under the new Registration, Evaluation, and Authorization of Chemicals (REACH) system [3]. For reasons of practicality, cost effectiveness and animal welfare, it is envisaged that QSAR will play an important role in the assessment of existing chemicals for which further information may be required under the REACH system. It is therefore essential that the QSAR models used produce reliable estimates.

QSAR is a widely accepted predictive and diagnostic process used for finding associations between chemical structures and biological activity. QSAR emerged and evolved in an attempt to fulfill the medicinal chemist's need and desire to predict biological response [4]. The quantitative structure-activity relationship paradigm first found its way into the practice of agrochemistry, pharmaceutical chemistry, toxicology, and eventually most facets of chemistry. The overall goals of QSAR retain their original essence and remain focused on the predictive ability of the approach and its receptiveness to mechanistic interpretation.

QSAR includes all statistical methods by which biological activities (most often expressed as logarithms of equipotent molar activities) are related to structural elements (Free-Wilson analysis), physicochemical properties (Hansch analysis), or fields (three-dimensional QSAR). It is a technique that quantifies the relationship between structure and biological data and is useful for optimizing the groups that modulate the potency of a molecule [5].

The computational methods used in QSAR differ in the complexity of the data they require: there are two-dimensional (2D), three-dimensional (3D) and higher-dimensional methods [6]. 2D-QSAR is insensitive to the conformational arrangement of atoms in space, while 3D-QSAR needs information on the position of the atoms in three spatial dimensions [7]. In 4D-QSAR, a set of automatically docked orientations and conformations is developed for each molecule by genetic algorithms [8-12]. Induced-fit scenarios of ligands upon binding to the active site and solvation models can be thought of as the fifth (protein flexibility) and sixth (entropy) dimensions in 5D- and 6D-QSAR, respectively.

* For correspondence: Tel.: 006-04-4298000 Ext. 1029; Fax: 006-04-4298007. E-mail: phravi75@rediffmail.com


The process of QSAR model development can generally be divided into three stages: data preparation, data analysis and model validation (Scheme 1). These steps represent standard practice in any QSAR modeling, and their implementation is often determined by the researcher's interests, experience and software availability. Obtaining a good-quality QSAR model depends on many factors, such as the quality of the biological data, the choice of descriptors, variable selection, statistical methods and validation [13-18].

Validation is a crucial aspect of any QSAR modeling. It is the process by which the reliability and relevance of a procedure are established for a specific purpose [19]. Formal validation is often one of the most overlooked steps in model development. In the QSAR community, the validation of a model is frequently little more than an assessment of statistical fit and, occasionally, of predictivity using cross-validation techniques. However, it is now accepted that validation is a broader process that includes assessment of issues such as data quality, applicability of the model and mechanistic interpretability in addition to statistical assessment [20]. In this review, we discuss the validation parameters for QSAR models developed by multiple linear regression (MLR) and partial least squares (PLS) regression.

Scheme 1 Various stages of QSAR model development



Importance of validation of QSAR models

QSAR models are useful for various purposes, including the prediction of activities of untested chemicals. The success of drug discovery efforts depends heavily on the use of structure-activity relationship techniques [21]. Over the years, many methods, algorithms and techniques have been developed and applied in QSAR studies [22].

The main challenge in QSAR is to select the group of descriptors that describe the most critical structural and physicochemical features associated with activity. The quality of the biological data and effective descriptor (variable) selection are essential first parts of the QSAR modeling process [23]. The application of QSAR models depends on their statistical significance and predictive ability. Regulatory decisions, and the justification for using a particular QSAR model, depend on the model's ability to predict unknown chemicals with a known degree of certainty [18,24]. A QSAR model can lead to false predictions of biological activity if it is not validated. Validation after model development is therefore the most important part of a QSAR study.

QSAR has clearly matured, although it still has a way to go. Estimating the accuracy of predictions is a critical problem in QSAR modeling [25]. Only in this decade has the validation of QSAR models received considerable attention [14-17, 26-34]. Four tools for assessing the validity of QSAR models [35] are (i) cross-validation, (ii) bootstrapping, (iii) randomization of the response data, and (iv) external validation. Several principles for assessing the validity of QSAR models were proposed at an international workshop held in Setubal (Portugal) and were subsequently modified in 2004 by the OECD Work Programme on QSARs [1,3]. Against this background, this review discusses the different validation methods for QSAR models.

Validation Methods for QSAR Models

Validation methods are needed to establish the predictiveness of a model on unseen data and to help determine the complexity of equation that the amount of data justifies. A QSAR model can be validated using the data that created the model (an internal method) or using a separate data set (an external method) (Scheme 2). Least squares fit (R²), cross-validation (Q²) [36-38], adjusted R² (R²adj), the chi-squared test (χ²), root-mean-squared error (RMSE), bootstrapping and scrambling (Y-randomization) [39,40] are internal methods of validating a model. The best method of validating a model is an external method, such as evaluating the QSAR model on a test set of compounds. These statistical methodologies are used to ensure that the models created are sound and unbiased ("good models"). A poor model can do more harm than good, so confirming that a model is a "good model" is of utmost importance.

Scheme 2 Various stages of QSAR model validation
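
As a minimal sketch of two of the internal measures named above, the following Python snippet computes R² as the squared correlation between observed and predicted activities, and the adjusted R². The adjusted-R² expression is the standard textbook form and is an assumption here, since the review names the measure without stating its formula. The function names and the activity values are illustrative only.

```python
from math import sqrt


def r_squared(y_obs, y_pred):
    """Squared Pearson correlation between observed and predicted activities."""
    n = len(y_obs)
    sum_o, sum_p = sum(y_obs), sum(y_pred)
    sum_op = sum(o * p for o, p in zip(y_obs, y_pred))
    sum_oo = sum(o * o for o in y_obs)
    sum_pp = sum(p * p for p in y_pred)
    r = (n * sum_op - sum_o * sum_p) / sqrt(
        (n * sum_oo - sum_o ** 2) * (n * sum_pp - sum_p ** 2)
    )
    return r * r


def adjusted_r_squared(r2, n_compounds, n_descriptors):
    """Standard adjusted R2 (assumed form): penalizes R2 for model size."""
    return 1.0 - (1.0 - r2) * (n_compounds - 1) / (n_compounds - n_descriptors - 1)


# Made-up activities for six training compounds and a two-descriptor model.
y_obs = [5.1, 5.9, 6.8, 7.2, 8.0, 8.9]
y_pred = [5.3, 5.8, 6.5, 7.4, 8.1, 8.6]

r2 = r_squared(y_obs, y_pred)
r2_adj = adjusted_r_squared(r2, n_compounds=len(y_obs), n_descriptors=2)
print(f"R2 = {r2:.3f}, adjusted R2 = {r2_adj:.3f}")
```

Because the adjustment depends on the number of compounds and descriptors, the gap between the two values grows as descriptors are added to a small training set.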



Internal Validation

Least Squares Fit

The most common internal method of validating the model is least squares fitting. This method of validation is similar to linear regression and yields R² (the squared correlation coefficient) for the comparison between the predicted and experimental activities. An improved method of determining R² is the robust straight-line fit, in which data points far from the central data points (essentially, data points more than a specified number of standard deviations away from the model) are given less weight when calculating R². An alternative is the removal of outliers (compounds from the training set) from the dataset in an attempt to optimize the QSAR model; this is valid only if strict statistical rules are followed. A difference between R² and R²adj of less than 0.3 indicates that the number of descriptors in the QSAR model is acceptable. The number of descriptors is not acceptable if the difference is more than 0.3.

$$R^2 = \left[ \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}} \right]^2$$

Fit of the Model

The fit of a QSAR model can be assessed by chi-squared (χ²) and root-mean-squared error (RMSE). These measures are used to decide whether the model possesses the predictive quality reflected in R². RMSE shows the error between the mean of the experimental values and the predicted activities. The chi-squared value reflects the difference between the experimental and predicted bioactivities:

$$\chi^2 = \sum_{i=1}^{n} \frac{(y_i - \hat{y}_i)^2}{\hat{y}_i}$$

$$\mathrm{RMSE} = \sqrt{\sum_{i=1}^{n} \frac{(\hat{y}_i - y_m)^2}{n - 1}}$$

where y_i and ŷ_i are the experimental and predicted bioactivity of an individual compound in the training set, y_m is the mean of the experimental bioactivities, and n is the number of molecules in the data set being examined. Large chi-squared or RMSE values (≥0.5 and ≥1.0, respectively) reflect the model's poor ability to predict the bioactivities accurately, even when the model has a large R² value (≥0.7). For a good predictive model, the chi-squared and RMSE values should be low (<0.5 and <0.3, respectively). These methods of error checking can also be used to aid in creating models and are especially useful in creating and validating models for nonlinear data sets, such as those created with an Artificial Neural Network (ANN) [41].

However, excellent values of R², χ² and RMSE are not sufficient indicators of model validity. Alternative parameters must therefore be provided to indicate the predictive ability of models. In principle, two reasonable approaches to validation can be envisaged: one based on prediction, and the other based on the fit of the predictor variables to rearranged response variables.

Cross-validation

A common method for internally validating a QSAR model is cross-validation (CV, Q², q², or jack-knifing). The CV process repeats the regression many times on subsets of the data. Usually each molecule is left out once (only), in turn, and R² is computed using the predicted values of the missing molecules. Sometimes more than one molecule is left out at a time (leave-many-out, LMO). CV is often used to determine how large a model can be used for a given data set. A cross-validated R² is usually smaller than the overall R² for a QSAR equation. It is used as a diagnostic tool to evaluate the predictive power of an equation.

CV is used to measure a model's predictive ability and to draw attention to the possibility that a model has been over-fitted. Over-fitting refers to the phenomenon in which a predictive model may describe the relationship between predictors and response well, but subsequently fail to provide valid predictions for new compounds. Over-fitting is usually suspected when the R² value of the original model is significantly larger (25%) than the Q² value (the difference between R² and Q² should not be more than 0.3) [36]. CV values are considered more characteristic of the predictive ability of the model. Thus, CV is considered a measure of goodness of prediction, and not of fit as in the case of R². The process of CV begins with the removal of one compound or a group of compounds, which becomes a temporary test set, from the training set. A CV model is created from the remaining data points using the descriptors from the original model, and is tested on the removed molecules for its ability to predict their bioactivities correctly.

In the leave-one-out (LOO) method of CV, the process of removing a molecule, and creating and validating the model against the individual molecule, is performed for the entire training set. Once complete, the mean is taken of all the Q² values and reported. The data utilized in obtaining Q² is an augmented training set of the compounds (data points) used to determine R². The method of removing one molecule from the training set is considered to be an inconsistent method [37,38]. A more correct method is leave-many-out (LMO), where a group of compounds is selected for validation of the CV model. This method of cross-validation is especially useful if the training set used to create the model is small (≤20 compounds) or if there is no test set.
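
The leave-one-out procedure can be sketched in a few lines of Python. The snippet uses the PRESS-based definition Q² = 1 − PRESS / Σ(y_i − y_m)², where PRESS sums the squared errors of the predictions for the left-out compounds. To stay short it fits a one-descriptor linear model in closed form: the helper `fit_line` is hypothetical and stands in for whatever MLR or PLS fit a real study would use, and the data are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for one descriptor; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(x, y):
    """Leave-one-out cross-validated Q2 = 1 - PRESS / sum((y_i - y_mean)^2)."""
    n = len(y)
    y_mean = sum(y) / n
    press = 0.0
    for i in range(n):
        # Temporarily remove compound i and refit on the remaining compounds.
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        # Predict the left-out compound and accumulate its squared error.
        press += (slope * x[i] + intercept - y[i]) ** 2
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - press / ss_total


# Made-up descriptor values and activities for six compounds.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
print(f"LOO Q2 = {loo_q2(descriptor, activity):.3f}")
```

A leave-many-out variant would remove a group of compounds in each iteration instead of a single one; the rest of the loop stays the same.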

For good predictability, the value of R² − Q² should not exceed 0.3. The equation for CV is

$$Q^2 = 1 - \frac{\mathrm{PRESS}}{\sum_{i=1}^{N} (y_i - y_m)^2}$$

$$\mathrm{PRESS} = \sum_{i=1}^{N} (y_{\mathrm{pred},i} - y_i)^2$$

where y_i is the data value not used to construct the CV model. PRESS is the predictive residual sum of squares.

Beware of Q²

To validate a QSAR model, most researchers apply the LOO or LMO CV procedures. The outcome of this procedure is a cross-validated correlation coefficient R² (Q²). Frequently, Q² is used as a criterion of both the robustness and the predictive ability of the model. Many authors consider a high Q² (for instance, Q² > 0.5) as an indicator, or even as the ultimate proof, of the high predictive power of the QSAR model. They do not test the models for their ability to predict the activity of compounds in an external test set (i.e., compounds that were not used in QSAR model development). There are several examples of recent publications in which the authors claim that their models have high predictive ability without validating them with an external test set [42-46]. Some authors validate their models using only one or two compounds that were not used in QSAR model development [47,48] and still claim that their models are highly predictive. However, it has been found that when a test set with known values of biological activity is available for prediction, there may be no correlation between the predicted and observed activities for the test set [49,14].

Bootstrapping

Bootstrapping is another method of internal validation in which samples are selected randomly from the data set. In the simplest form of bootstrapping, instead of repeatedly analyzing subsets of the data, subsamples of the data are repeatedly analyzed. Each subsample is a random sample with replacement from the full sample. In a typical bootstrap validation, K groups of size n are generated by repeated random selection of n objects from the original data set. Some of these objects can be included in the same random sample several times, whereas other objects may never be selected. The model obtained from the n randomly selected objects is used to predict the target properties for the excluded samples. A high average Q² in the bootstrap validation demonstrates the robustness of the model.

Randomization test (Scrambling model)

The predictive power of the equation is poor when the observations are not sufficiently independent of each other. One way to test for this is by randomization of the dependent variable. This procedure checks that the model is not due to chance. The set of activity values is reassigned randomly to different molecules, and the entire modeling procedure is repeated. This process is repeated many times. If the activity prediction of the random models is comparable to that of the original equation, the set of observations is not sufficient to support the model.

The creation of a scrambled model [39,40] is a unique method of checking the descriptors used in the model, because the bioactivities are randomized so that the new model is created from a bogus data set. The purpose of this method is to test the validity of the original QSAR model and to ensure that the selected descriptors are appropriate. These new models (Scram-models) are created using the same descriptors as the original model, but with changed bioactivities. After each Scram-model is created, validation is performed using the methods mentioned earlier. To ensure that the Scram-models are truly random, the process of changing the bioactivities can be repeated, and the R² and Q² values of each Scram-model are recorded as it is created. Each time the R² and Q² values of the Scram-models are substantially lower, this further supports the soundness of the true QSAR model. The method validates the original QSAR model because the Scram-models are created using the original descriptors and bogus bioactivities. The model would be in question if there were a strong correlation (R² > 0.50) [50] between the randomized bioactivities and the predicted bioactivities, which would indicate that the model is not responsive to the bioactivities.

External validation

Several authors have suggested that the only way to estimate the true predictive power of a QSAR model is to compare the predicted and observed activities of a (sufficiently large) external test set of compounds that were not used in model development [49-53]. The difficulty in external validation is how to select the training and test sets. Roy et al. discuss how this problem can be solved in one of their articles [22].

To estimate the predictive power of a QSAR model, Golbraikh and Tropsha recommended the use of the following statistical characteristics of the test set [14]: (i) the correlation coefficient R between the predicted and observed activities; (ii) the coefficients of determination [54] for regression through the origin (predicted vs. observed activities, r0², and observed vs. predicted activities, r'0²); and (iii) the slopes k and k' of the regression lines through the origin.
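
The test-set statistics recommended by Golbraikh and Tropsha (R, r0², r'0², k and k') can be computed from paired observed and predicted activities, as in the hedged sketch below. The assignment of k to the regression of observed on predicted values, and of r0² to the predicted-versus-observed regression, follows one common convention and is an assumption that should be checked against the original reference [14]. Function and variable names are illustrative, and the data are made up.

```python
from math import sqrt


def golbraikh_tropsha_stats(y_obs, y_pred):
    """Return R, k, k', r0^2 and r'0^2 for a test set (regressions through the origin)."""
    n = len(y_obs)
    mean_o, mean_p = sum(y_obs) / n, sum(y_pred) / n
    ss_o = sum((o - mean_o) ** 2 for o in y_obs)
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(y_obs, y_pred))
    sum_op = sum(o * p for o, p in zip(y_obs, y_pred))

    r = cov / sqrt(ss_o * ss_p)                       # correlation coefficient R
    k = sum_op / sum(p * p for p in y_pred)           # observed = k * predicted
    k_prime = sum_op / sum(o * o for o in y_obs)      # predicted = k' * observed
    # Coefficients of determination for the two zero-intercept regressions.
    r0_sq = 1.0 - sum((p - k_prime * o) ** 2 for o, p in zip(y_obs, y_pred)) / ss_p
    r0_prime_sq = 1.0 - sum((o - k * p) ** 2 for o, p in zip(y_obs, y_pred)) / ss_o
    return {"R": r, "k": k, "k_prime": k_prime,
            "r0_sq": r0_sq, "r0_prime_sq": r0_prime_sq}


# Made-up observed and predicted activities for a four-compound test set.
stats = golbraikh_tropsha_stats([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
for name, value in stats.items():
    print(f"{name} = {value:.4f}")
```

The returned values can then be compared with whatever acceptance thresholds a study adopts for these statistics.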

Golbraikh and Tropsha consider a QSAR model predictive if the following conditions are satisfied [14]:

$$R^2_{\mathrm{pred}} > 0.6,$$

$$\frac{r^2 - r_0^2}{r^2} < 0.1, \quad \frac{r^2 - r_0'^2}{r^2} < 0.1 \quad \text{and}$$

$$0.85 < k < 1.15 \quad \text{or} \quad 0.85 < k' < 1.15.$$

The predictive ability of the selected model is also confirmed by the external R²pred. A value of R²pred greater than 0.6 may be taken as an indicator of good external predictability.

$$R^2_{\mathrm{pred}} = 1 - \frac{\sum_{i=1}^{\mathrm{test}} (y_{\mathrm{exp}} - y_{\mathrm{pred}})^2}{\sum_{i=1}^{\mathrm{test}} (y_{\mathrm{exp}} - \bar{y}_{\mathrm{tr}})^2}$$

where ȳ_tr is the average value of the dependent variable for the training set.

The lack of correlation between Q² and R² was noted in the publications of Kubinyi et al. [49], Novellino et al. [51], Norinder [52], and Golbraikh and Tropsha [14], where they demonstrated that all of the above-mentioned criteria are necessary to adequately assess the predictive ability of a QSAR model. Norinder suggests [52] that the external test set must contain at least five compounds, representing the whole range of both descriptors and activities of the compounds included in the training set.

Recently, the use of internal versus external validation has been a matter of great debate [55]. One group of QSAR workers supports internal validation, while the other group considers that internal validation is not a sufficient test for checking the robustness of models and that external validation must be done. Hawkins et al., the major supporters of internal validation, are of the opinion that cross-validation is able to assess the model fit and to check whether the predictions will carry over to fresh data not used in the model-fitting exercise. They have argued that when the sample size is small, holding a portion of it back for testing is wasteful, and that it is much better to use the "computationally more burdensome" leave-one-out cross-validation [56,57].

A more recent measure of the external predictability of the selected model is r²m, which was proposed by Roy and Paul (2008) [58] and is calculated by the following formula:

$$r_m^2 = r^2 \left(1 - \sqrt{r^2 - r_0^2}\right)$$

where r² is the squared correlation coefficient between observed and predicted values and r0² is the squared correlation coefficient between observed and predicted values with the intercept set to zero. A value of r²m greater than 0.5 may be taken as an indicator of good external predictability.

Unlike external validation parameters (R²pred etc.), the r²m (overall) statistic is not based only on a limited number of test set compounds. It includes predictions for both test set and training set (using LOO predictions) compounds. Thus, this statistic is based on predictions for a comparatively large number of compounds. The r²m (overall) statistic may be advantageous when the test set is considerably small, so that regression-based external validation parameters may be less reliable and highly dependent on individual test set observations. The r²m (overall) statistic may be used to select the best predictive model from among comparable models, where some models show comparatively better internal validation parameters and others show comparatively superior external validation parameters.

Another validation parameter, R²p, for checking the acceptability of the selected model was reported by Roy (2007) [55]. The parameter R²p penalizes the model R² for the difference between the squared mean correlation coefficient (R²r) of the randomized models and the squared correlation coefficient (R²) of the non-randomized model. A value of R²p greater than 0.5 may be taken as an indicator of model acceptability.

$$R_p^2 = R^2 \sqrt{R^2 - R_r^2}$$

The value of r²m (overall) determines whether the predicted activity values for the whole dataset of molecules are really close to the observed activities (best predictive model or not). The value of R²p, on the other hand, determines whether the model obtained is really robust or was obtained as a result of chance only. Hence it can be inferred that a QSAR model can be considered acceptable if the values of r²m (overall) and R²p are equal to or above 0.5 (or at least near 0.5).

The selection of robust and well predictive QSAR models on the basis of R², Q² and R²pred may mislead the search for the ideally predictive model, so it may also be done on the basis of a few other parameters, such as R²CVext, (r² − r0²)/r², (r² − r'0²)/r², k, k', r²m (overall) and R²p.
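
The two Roy parameters can be illustrated with a short Python sketch, assuming the forms r²m = r²(1 − √(r² − r0²)) and R²p = R²·√(R² − R²r), where R²r is the mean R² of the randomized (scrambled) models. Taking the absolute value of each difference is an addition of this sketch, used only so that a slightly negative difference does not break the square root. The function names and all numeric inputs are made up for illustration.

```python
from math import sqrt


def r2_m(r2, r0_sq):
    """r2_m = r2 * (1 - sqrt(|r2 - r0^2|)); the abs() guard is this sketch's assumption."""
    return r2 * (1.0 - sqrt(abs(r2 - r0_sq)))


def r2_p(r2, randomized_r2_values):
    """R2_p = R2 * sqrt(|R2 - R2_r|), with R2_r the mean R2 of the randomized models."""
    r2_r = sum(randomized_r2_values) / len(randomized_r2_values)
    return r2 * sqrt(abs(r2 - r2_r))


# Made-up values: a model with r2 = 0.81 and r0^2 = 0.80, plus the R2 values
# obtained from three hypothetical Y-randomization runs.
rm2 = r2_m(0.81, 0.80)
rp2 = r2_p(0.81, [0.10, 0.20, 0.15])
print(f"r2_m = {rm2:.3f}, R2_p = {rp2:.3f}")
print("acceptable" if rm2 >= 0.5 and rp2 >= 0.5 else "questionable")
```

A model whose randomized runs reach R² values close to the original R² is driven toward R²p = 0, which is how the parameter flags chance correlation.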

When can we accept the developed QSAR model as a reliable and predictive one?

A developed QSAR model can generally be accepted in QSAR (MLR and PLS) studies when it satisfies the following criteria (these are the minimum recommended values for a significant QSAR model; note that these evaluation measures depend on the response measurement scale or unit):

• The correlation coefficient R ≥ 0.8 (for in vivo data).
• The coefficient of determination R² ≥ 0.6.
• The standard deviation s is not much larger than the standard deviation of the biological data.
• The F value indicates that the overall significance level is better than 95%.
• The confidence intervals of all individual regression coefficients prove that they are justified at the 95% significance level.
• The cross-validated R² (Q²) > 0.5.
• R² for the external test set, R²pred > 0.6.
• The randomized R² value should be low compared with R².
• The randomized Q² value should be low compared with Q².
• (r² − r0²)/r² < 0.1 and 0.85 ≤ k ≤ 1.15, or (r² − r'0²)/r² < 0.1 and 0.85 ≤ k' ≤ 1.15 (for the test set).
• r²m (overall) and R²p are ≥ 0.5 (or at least near 0.5).
• In addition, the biological data should cover a range of at least one, two or even more logarithmic units, and should be well distributed over the whole range. The physicochemical parameters should also be spread over a certain range and be more or less evenly distributed.

The equation has to be rejected

• if the above-mentioned statistical measures are not satisfied;
• if the number of variables in the regression equation is unreasonably large;
• if the standard deviation is smaller than the error in the biological data.

Applicability domain

The activity of the entire universe of chemicals cannot be predicted even by a robust and validated QSAR model. The prediction of a modeled response using QSAR is valid only if the compound being predicted is within the applicability domain of the model. The applicability domain is a theoretical region of chemical space, defined by the model descriptors and the modeled response and, thus, by the nature of the training set molecules [59]. Whether a new chemical lies within the applicability domain can be checked using the leverage approach. A compound is considered outside the applicability domain when its leverage value is higher than the critical value of 3p/n, where p is the number of model variables plus 1 and n is the number of objects used to develop the model.

Recently, Jaworska et al. [60] reviewed the methods and criteria for estimating the applicability domain through training set interpolation based on range, distance, geometrical and probability density distribution approaches. A cluster-based approach to modeling the domain of applicability was proposed by Stanforth et al. [61]. It applies an intelligent version of the K-means clustering algorithm, models the training set as a collection of clusters in descriptor space, and assigns a test compound fuzzy membership of each individual cluster, from which an overall distance may be calculated. Guha and Jurs [50] used a classification method that divides the regression residuals from a previously generated model into a good class and a bad class and builds a classifier based on the division. The trained classifier is then used to determine the class of the residual of a new compound.

Dispute on QSAR Validation

Hawkins et al., the major supporters of internal validation, are of the opinion that cross-validation is able to assess the model fit and to check whether the predictions will carry over to fresh data not used in the model-fitting exercise. They have argued that when the sample size is small, holding a portion of it back for testing is wasteful, and that it is much better to use the "computationally more burdensome" leave-one-out cross-validation [56,57].

An inconsistency between internal and external predictivity has been reported in a few QSAR studies [51,52,62]. It was reported that, in general, there is no relationship between internal and external predictivity [63,64]: high internal predictivity may result in low external predictivity and vice versa. In many cases, comparable models are obtained where some models show comparatively better internal validation parameters and others show comparatively superior external validation parameters. This may create a problem in selecting the final model. It is therefore necessary to develop good validation techniques to resolve all of the above-mentioned disputes.

Conclusion

Validation of QSAR models is a very important aspect of understanding the reliability of a model for predicting a new compound not present in the data set. If we consider 1000 reported QSAR models, perhaps only 50 to 60 are really predictive, and it is not certain that even these models obey all the conditions and validation parameters discussed in this article. In our opinion, both internal and external validation strategies are important and, in fact, one should adopt all available validation strategies to check the robustness of the model. Only few
reported QSAR models have followed all the validation characteristics mentioned in this article [65,66].

In conclusion, not only should the above recommendations for the validation of QSAR models by different scientists and researchers be followed, but the chemical space of the training and test sets also has to be analyzed; real outliers, with respect to congeneric character and structural similarity, have to be discovered and eliminated. Even then, prediction by QSAR models remains a risky procedure. We therefore still need proper validation techniques to select good predictive QSAR models.

References

[1] Von, P.C.; Kuhne, R.; Ebert, R.U.; Altenburger, R.; Liess, M.; Schuurmann, G. Chem. Res. Toxicol. 2005, 18, 536-555.
[2] http://europa.eu.int/comm/enterprise/reach/overview.htm
[3] Combes, R.; Balls, M.; Bansil, L. ATLA. 1991, 30, 365-406.
[4] Seidel, J.K.; Schaper, K.J. Chemical structure and biological activity, Verlag Chemie Weinheim: New York, 1979, 1-11.
[5] Hansch, C. In Comprehensive Medicinal Chemistry, Ramsden, C.A. Ed.; Pergamon Press: New York, 1990; Vol. 4, pp 5-8.
[6] Livingstone, D.J. Predicting Chemical Toxicity and Fate. CRC Press LLC: Boca Raton, FL, 2004, 151-170.
[7] Winkler, D.A. Briefing Bioinformat. 2002, 3, 73-86.
[8] Albuquerque, M.G.; Hopfinger, A.J.; Barreiro, E.J.; De Alencastro, R.B. J. Chem. Inf. Comput. Sci. 1998, 38, 925-938.
[9] Santos-Filho, O.; Hopfinger, A. J. Comput.-Aided Mol. Des. 2001, 15, 1-12.
[10] Ravi, M.; Hopfinger, A.; Hormann, R.; Dinan, L. J. Chem. Inf. Comput. Sci. 2001, 41, 1587-1604.
[11] Krasowski; Hong, X.; Hopfinger, A.; Harrison, N. J. Med. Chem. 2002, 45, 3210-3221.
[12] Hong, M.X.; Hopfinger, A. J. Chem. Inf. Comput. Sci. 2003, 43, 324-336.
[13] Tetko, I.V.; Sushko, I.; Pandey, A.K.; Zhu, H.; Tropsha, A.; Papa, E.; Oberg, T.; Todeschini, R.; Fourches, D.; Varnek, A. J. Chem. Inf. Model. 2008, 48, 1733-1738.
[14] Golbraikh, A.; Tropsha, A. J. Mol. Graphics Mod. 2002, 20, 269-276.
[15] Tropsha, A.; Gramatica, P.; Gombar, V.K. QSAR Comb. Sci. 2003, 22, 69-77.
[16] Tong, W.; Xie, Q.; Hong, H.; Shi, L.; Fang, H.; Perkins, R. Environ. Health Perspect. 2004, 112, 1249-1253.
[17] Aptula, A.O.; Jeliazkova, N.G.; Schultz, T.W.; Cronin, M.T.D. QSAR Comb. Sci. 2005, 24, 385-390.
[18] He, L.; Jurs, P.C. J. Mol. Graphics Mod. 2005, 23, 503-523.
[19] Ghafourian, T.; Cronin, M.T.D. SAR QSAR Environ. Res. 2005, 16, 171-190.
[20] Roy, K.; Leonard, J.T. QSAR Comb. Sci. 2006, 25, 235-251.
[21] Kolossov, E.; Stanforth, R. SAR QSAR Environ. Res. 2007, 18, 89-100.
[22] Roy, P.P.; Roy, K. QSAR Comb. Sci. 2008, 27, 302-313.
[23] Walker, J.D.; Jaworska, J.; Comber, J.H.I.; Schultz, T.W.; Dearden, J.C. Environ. Toxicol. Chem. 2003, 22, 1653-1665.
[24] Roy, P.P.; Leonard, J.T.; Roy, K. Chemom. Intell. Lab. Sys. 2008, 90, 31-42.
[25] Tong, W.; Hong, H.; Xie, Q.; Shi, L.; Fang, H.; Perkins, R. Curr. Comput. Aided Drug Des. 2005, 1, 195-205.
[26] He, L.; Jurs, P.C. J. Mol. Graph. Mod. 2005, 23, 503-523.
[27] Ghafourian, T.; Cronin, M.T.D. SAR QSAR Environ. Res. 2005, 16, 171-190.
[28] Balls, M.; Blaauboer, B.J.; Fentem, J.H. ATLA. 1995, 23, 129-147.
[29] McKinney, J.D.; Richard, A.; Waller, C.; Newman, M.C.; Gerberick, F. Toxicol. Sci. 2000, 56, 8-17.
[30] Tong, W.; Welsh, W.J.; Shi, L.; Fang, H.; Perkins, R. Environ. Toxicol. Chem. 2003, 22, 1680-1695.
[31] Worth, A.P.; Cronin, M.T.D. ATLA. 2004, 32, 703-706.
[32] Cronin, M.T.D.; Jaworska, J.S.; Walker, J.D.; Comber, M.H.I.; Watts, C.D.; Worth, A.P. Environ. Health Perspect. 2003, 111, 1391-1401.
[33] Cronin, M.T.D.; Jaworska, J.S.; Walker, J.D.; Comber, M.H.I.; Watts, C.D.; Worth, A.P. Environ. Health Perspect. 2003, 111, 1376-1390.
[34] Worth, A.P.; Leeuwen, C.J.; Hartung, T. SAR QSAR Environ. Res. 2004, 15, 331-343.
[35] Worth, A.P.; Hartung, T.; Leeuwen, C.J. SAR QSAR Environ. Res. 2004, 15, 345-358.
[36] Leach, A.R. Molecular modeling: Principles and applications. Pearson Education Ltd.: Harlow, England, 2001.
[37] Shao, J. J. Am. Stat. Assoc. 1993, 88, 486-494.
[38] Besal, E. J. Math. Chem. 2001, 29, 191-195.
[39] Wold, S.; Ericksson, L. Partial least squares projections to latent structures (PLS) in chemistry. In Encyclopedia of Computational Chemistry, Ragu & Schleyer, P. (ed.), John Wiley & Sons: Chichester, 1998, Vol. 3, 2006-2021.
[40] Yasri, A.; Hartsough, D. J. Chem. Inf. Comput. Sci. 2001, 41, 1218-1227.
[41] Schneider, G.; Wrede, P. Prog. Biophys. Mol. Biol. 1998, 70, 175-222.
[42] Gironés, X.; Gallegos, A.; Ramon, C.D. J. Chem. Inf. Comput. Sci. 2000, 46, 1400-1407.
[43] Bordás, B.; Kőmíves, T.; Szant, Z.; Lopata, A. J. Agric. Food Chem. 2000, 48, 926-931.
[44] Fan, Y.; Shi, L.M.; Kohn, K.W.; Pommier, Y.; Weinstein, J.N. J. Med. Chem. 2001, 44, 3254-3263.
[45] Randic, M.; Basak, S.C. J. Chem. Inf. Comput. Sci. 2000, 40, 899-902.
[46] Suzuki, T.; Ide, K.; Ishida, M.; Shapiro, S. J. Chem. Inf. Comput. Sci. 2001, 41, 718-726.
[47] Recanatini, M.; Cavalli, A.; Belluti, F.; Piazzi, L.; Rampa, A.; Bisi, A.; Gobbi, S.; Valenti, P.; Andrisano, V.; Bartolini, M.; Cavrini, V. J. Med. Chem. 2000, 43, 2007-2018.
[48] Morón, J.A.; Campillo, M.; Perez, V.; Unzeta, M.; Pardo, L. J. Med. Chem. 2000, 43, 1684-1691.
[49] Kubinyi, H.; Hamprecht, F.A.; Mietzner, T. J. Med. Chem. 1998, 41, 2553-2564.
[50] Guha, R.; Jurs, P.C. J. Chem. Inf. Model. 2005, 45, 65-73.
[51] Novellino, E.; Fattorusso, C.; Greco, G. Pharm. Acta Helv. 1995, 70, 149-154.
[52] Norinder, U. J. Chemom. 1996, 10, 95-105.
[53] Zefirov, N.S.; Palyulin, V.A. J. Chem. Inf. Comput. Sci. 2001, 41, 1022-1027.
[54] Sachs, L. Applied Statistics: A Handbook of Techniques, Springer-Verlag: Berlin/New York, 1984.
[55] Roy, K. Expert Opin. Drug Discov. 2007, 2, 1567-1577.
[56] Hawkins, D.M.; Basak, S.C.; Mills, D. J. Chem. Inf. Comput. Sci. 2003, 43, 579-586.
[57] Hawkins, D.M. J. Chem. Inf. Comput. Sci. 2004, 44, 1-12.
[58] Roy, K.; Paul, S. QSAR Comb. Sci. 2008, 28, 406-425.
[59] Atkinson, A.C. Plots, Transformations and Regression. Clarendon Press: Oxford, UK, 1985.
[60] Jaworska, J.; Nikolova-Jeliazkova, N.; Aldenberg, T. Altern. Lab. Anim. 2005, 33, 445-459.
[61] Stanforth, R.W.; Kolossov, E.; Mirkin, B. QSAR Comb. Sci. 2007, 26, 837-844.
[62] Kubinyi, H. A general view on similarity and QSAR studies. In: Computer-Assisted
[63] Lead Finding and Optimization, van de Waterbeemd H, Testa B, Folkers G (Eds), VHChA and VCH, Basel, Weinheim. 1997, 9-28.
[64] Kubinyi, H.; Hamprecht, F.A.; Mietzner, T. J. Med. Chem. 1998, 41, 2553-2564.
[65] Ravichandran, V.; Shalini, S.; Sundram, K.M.; Dhanaraj, S.A. Eur. J. Med. Chem. 2010, 45, 2791-2797.
[66] Roy, P.P.; Paul, S.; Indrani, M.; Roy, K. Molecules. 2009, 14, 1660-1701.
