Apoptosis is known as programmed cell death that plays an important role in tumor biology.
In this study, apoptosis-inducing activity is predicted by using a QSAR modeling approach for a series of 4-anilinoquinozaline derivatives. 2D-QSAR model for the prediction of apoptosis-inducing activity was obtained by applying multiple linear regression giving r2 = 0.8225 and q2 = 0.7626, principal component regression giving r2 = 0.7539 and q2 = 0.6669 and partial least squares giving r2 = 0.8237 and q2 = 0.6224.
QSAR study revealed that alignment-independent descriptors and distance-based topology index are the most important descriptors in predicting apoptosis-inducing activity. 3D-QSAR study was performed using k-nearest neighbor molecular field analysis (kNN-MFA) approach for both electrostatic and steric fields. Three different kNN-MFA 3D-QSAR methods (SW-FB, SA, and GA) were used for the development of models and tested successfully for internal (q2 > 0.62) and external (predictive r2 > 0.52) validation criteria. Thus, 3D-QSAR models showed that electrostatic effects dominantly determine the binding affinities.
The QSAR models developed in this study would be useful for the development of new apoptosis inducer as anticancer agents.
Cancer is a disease of cell characterized by progressive, persistent, abnormal, purposeless, and uncontrolled proliferation of tissues. Currently, cancer is most dominating cause of death in world [1, 2]. Apoptosis is a cellular process, that organisms use for digestion of excessive cells to control cell numbers. Caspases is a family of cysteine proteases, which are produced as zymogenes, plays a vital role in the beginning and ending of apoptosis . Small drug molecules can inhibit or activate caspases . In apoptosis, cell shrinks, deformed, and looses its contacts to neighboring cells that resulted into fragmented cells called "apoptotic bodies" . Human body removes apoptotic bodies without causing any inflammatory response. Process of apoptosis is controlled by a diverse range of cell signals . In humans during cell development, many cells are produced by mitosis in excess which eventually undergo programmed cell death and their by contribute to sculpturing many organs and tissues . Apoptosis is an essential process for the maintenance of tissue homeostasis. During the last decade, a number of compounds are identified to induce apoptosis. Compounds that promote apoptosis are considered as important medicaments for the treatment of cancer. Anticancer efficacy of many therapeutic agents is correlated to their apoptosis inducer ability, so identification of apoptosis-inducing activity plays an important role in discovery and development of potential anticancer agent . Many anticancer drugs like camptothecins, such as topotecan and irinotecan,  and vinca alkaloids, such as vincristine and vinblastine , kill tumors to some extent through induction of apoptosis . Recently, extensive advances are achieved in the field of apoptosis-based therapeutics. Many new drug candidates are currently being developed and most of them are in clinical state as potential apoptosis inducer agents. In an effort for search of new potent apoptosis-inducing agents, we have performed 2D- and 3D-QSAR study on 4-anilinoquinozaline derivatives for quantifying the necessary structural and physicochemical requirements of this series of compounds as potent apoptosis inducer and efficacious anticancer agents.
2. Materials and methods
QSAR is the study of the quantitative relationship between the experimental activity of a set of compounds and their physicochemical properties using statistical methods. The experimental information associated with biological activity, which is used as dependent variables in building a QSAR model. In this study, all computational work (2D- and 3D-QSAR) was performed using Vlife MDS QSAR plus software on a HP computer with Core2 Duo processor and a window XP operating system.
2.1 2D-QSAR modeling and dataset
Apoptosis-inducing activity data EC50 (μM) were taken from the published work . The experimental EC50 values were evaluated by Cai Sui Xiong et al. in a caspase-based HTS assay in human breast cancer cells (T47D). The negative logarithm of the measured EC50 (μM) [pEC50 = -log (EC50)] was used as dependent variable for 2D- and 3D-QSAR analysis and it is listed in Table 1. Since some compounds showed insignificant activity, such compounds were excluded from the dataset. Compounds were sketched using 2D draw application and converted to 3D structures. Energy minimization and geometry optimization were conducted using Merck molecular force field as force field and charge, maximum number of cycles were 1,000, convergence criterion (RMS gradient) was 0.01 and medium's dielectric constant of 1 by batch energy minimization method. Energy-minimized geometry was used for calculation of descriptors, a total of 208 2D descriptors were calculated which encoded different aspects of molecular structure and consists of electronic, thermodynamic, spatial, and structural descriptors, e.g., retention index (chi), atomic valence connectivity index (chiV), path count, chain path count, cluster, path cluster, element count, estate number, semi-empirical, molecular weight, molecular refractivity, logP, and topological index. Various Baumann alignment-independent (AI) descriptors were also calculated.
2.2. Selection of training and test set
The dataset of 32 molecules was divided into training set (24 compounds) and test set (8 compounds) by Sphere Exclusion (SE) method  for multiple linear regression (MLR), principal component regression (PCR), and partial least squares (PLS) model with dissimilarity value of 3.2 using pEC50 activity field as dependent variable and various 2D descriptors as independent variables (Additional file 1). The unicolumn statistics of test and training sets (Table 2) showed the accurate selection of test and training sets, as the maximum of the training set was more than that of the test set and the minimum of the training set was less than or equal to that of the test set.
2.3. Regression analysis
Dataset of 32 molecules was subjected to regression analysis using MLR, PCR, and PLS as model building methods. QSAR models were generated using pEC50 values as the dependent variable and various descriptors values as independent variables. The cross-correlation limit was set at 0.5, number of variables in the final equation at six in MLR and five in PCR and six in PLS, and term selection criteria as r2, F-test 'in,' at 4 and 'out' at 3.99, r2, and F-test. Variance cutoff was set at 0, scaling to autoscaling, and number of random iterations to 10. Statistical measures were used for the evaluation of QSAR models were the number of compounds in regression n, regression coefficient r2, number of descriptors in a model k, F-test (Fisher test value) for statistical significance F, cross-validated correlation coefficient q2, predictive squared correlation coefficients pred_r2, coefficient of correlation of predicted data set pred_r2se and standard error (SE) of estimation r2se and q2se.
2.4. MLR analysis
MLR is a method used for modeling linear relationship between a dependent variable Y (pEC50) and independent variable X (2D descriptors). MLR is based on least squares: the model is fit such that sum-of-squares of differences of observed and a predicted value is minimized. MLR estimates values of regression coefficients (r2) by applying least squares curve fitting method. The model creates a relationship in the form of a straight line (linear) that best approximates all the individual data points. In regression analysis, conditional mean of dependant variable (pEC50) Y depends on (descriptors) X. MLR analysis extends this idea to include more than one independent variable.
Regression equation takes the form
where Y is dependent variable, 'b's are regression coefficients for corresponding 'x's (independent variable), 'c' is a regression constant or intercept [14, 15].
2.5 PCR method
PCR is a data compression method based on the correlation among dependent and independent variables. PCR provides a method for finding structure in datasets. Its aim is to group correlated variables, replacing the original descriptors by new set called principal components (PCs). These PCs uncorrelated and are built as a simple linear combination of original variables. It rotates the data into a new set of axes such that first few axes reflect most of the variations within the data. First PC (PC1) is defined in the direction of maximum variance of the whole dataset. Second PC (PC2) is the direction that describes the maximum variance in orthogonal subspace to PC1. Subsequent components are taken orthogonal to those previously chosen and describe maximum of remaining variance, by plotting the data on new set of axes, it can spot major underlying structures automatically. Value of each point, when rotated to a given axis, is called the PC value. PCA selects a new set of axes for the data. These are selected in decreasing order of variance within the data. Purpose of principal component PCR is the estimation of values of a dependent variable on the basis of selected PCs of independent variables .
2.6 PLS regression method
PLS analysis is a popular regression technique which can be used to relate one or more dependent variable (Y) to several independent (X) variables. PLS relates a matrix Y of dependent variables to a matrix X of molecular structure descriptors. PLS is useful in situations where the number of independent variables exceeds the number of observation, when X data contain colinearties or when N is less than 5 M, where N is number of compound and M is number of dependant variable. PLS creates orthogonal components using existing correlations between independent variables and corresponding outputs while also keeping most of the variance of independent variables. Main aim of PLS regression is to predict the activity (Y) from X and to describe their common structure . PLS is probably the least restrictive of various multivariate extensions of MLR model. PLS is a method for constructing predictive models when factors are many and highly collinear.
2.7. Validation of QSAR model
The best way to evaluate quality of regression model is internal validation of QSAR model. Mostly leave-one-out (LOO) cross validation, one object (one biological activity value) is eliminated from training set and training dataset is divided into subsets (number of subsets = number of data points) of equal size. Model is build using these subsets and dependent variable value of the data point that was not included in the subset is determined, which is a predicted value. Mean of predicted will be same for r2 and LOO q2 (cross-validated correlation coefficient value) since all the data point will be sequentially considered as predicted in LOO subset. Same procedure is repeated after elimination of another object until all objects have been eliminated once. LOO cross validation resulted in three statistically significant models for each regression method . To calculate q2 following equation was used.
where Ypred, Yact, and Ymean are predicted, actual, and mean values of the pIC50, respectively. Σ(Ypred - Yact)2 is the predictive residual error sum of squares (PRESS). Definitive validity of model is examined by mean of external validation also, which evaluates how well equation generalizes. Training set is used to derive an adjustment model that is used after to predict activities of test set members. The predicted power of equations was validated using cross-validated squared correlation coefficient (q2) and by predictive squared correlation coefficients pred_r2 which is used as a diagnostic tool. To calculate the predictive r2 (r2pred) following equation was used.
where YPred(Test) and YTest are predicted and observed activity values, respectively, of test set compounds, and YTraining is the mean activity value of training set. Statistical significance of these models was further supported by 'fitness plot' obtained for each model; this is a plot of experimental versus predicted activity of training and test set compounds and provides an idea about how fit the model was trained and how well it predicts activity of external test set (Figure 1). Nearness of experimental to predicted activity reported in Table 1 also adds to this fact. Contribution charts for all the significant models are presented in Figure 2, which gives percentage contribution of descriptors used in deriving the QSAR models.
2.8. 3D-QSAR modeling and dataset
Dataset of 32 molecules was divided into training (24 compounds) and test (8 compounds) set by SE method having dissimilarities values of 7.9 with pEC50 activity field as dependent variable and various 3D descriptors calculated for the compounds as independent variables (Additional file 2).
2.9. Molecular modeling and alignment
Conformational search was carried out by systemic conformational search method (grid search), which generates all possible conformations, by systematically varying each of the torsion angles of a molecule by some increment, keeping the bond lengths and bond angles fixed and lowest energy conformers were selected. All the compounds were aligned by template-based method. In template-based alignment method, a template structure was defined and used as a basis for alignment of a set of molecules. In this study, all the compounds were aligned against minimum energy conformation of most active compound number using quinazoline ring as template.
2.10. Calculation of field descriptors
Electrostatic and steric field descriptors were calculated with cutoffs of 10.0 kcal/mol for electrostatic and 30.0 kcal/mol for steric, and charge type was selected as by Gasteiger-Marsili . The dielectric constant was set to 1.0, considering distance-dependent dielectric function. Probe setting was carbon atom with charge 1.0. A total of 2,080 field descriptors (1,040 for each electrostatic and steric) were calculated for all the compounds in separate columns. 3D-QSAR analysis was performed after exclusion of all the invariable columns, as they do not contribute to QSAR.
2.11. k-Nearest neighbor molecular field analysis (kNN-MFA)
The kNN methodology relies on a simple distance learning approach whereby an unknown member is classified according to the majority of its kNN in training set. The nearness is measured by an appropriate distance metric. In kNN-MFA method, several models were generated for selected members of training and test sets. Once training and test sets are generated, kNN methodology is applied to descriptors generated over the grid . The steric and electrostatic interaction energies are computed at lattice points of the grid using a methyl probe of charge +1. These interaction energy values are considered for relationship generation and utilized as descriptors to decide nearness between molecules.
2.12. kNN-MFA with stepwise forward-backward (SW-FB) variable selection method
kNN-MFA models were developed using SW-FB method with cross-correlation limit set to 0.5 and term selection criterion as q2. F-test 'in' was set to 4.0, and F-test 'out' to 3.99. As some additional parameters, variance cutoff was set at 2 kcal/mol Å and scaling to autoscaling; additionally, kNN parameter setting was done within the range of 2-5 and prediction method was selected as the distance-based weighted average.
2.13. kNN-MFA with genetic algorithm (GA)
GA first described by Holland , mimics natural evolution and selection. In biological systems, genetic information that determines the individuality of an organism is stored in chromosomes. Chromosomes are replicated and passed onto the next generation with selection criteria depending on fitness.
2.14. kNN-MFA with simulated annealing (SA)
SA is the simulation of a physical process, 'annealing', which involves heating the system to a high temperature and then gradually cooling it down to a preset temperature (e.g., room temperature). During this process, the system samples possible configurations distributed according to the Boltzmann distribution so that at equilibrium, low energy states are the most populated.
3. Results and discussion
3.1. Generation of 2D-QSAR models
Descriptors used in generation of 2D-QSAR models are given in Table 3 with detail description. 2D-QSAR study of 4- anilinoquinozaline derivatives resulted in several QSAR models. Statistically significant QSAR models were selected for discussion.
where n = 24training and 8test , DF = 30, r2 = 0.824, q2 = 0.622, F-test = 31.137, r2 se = 0.435, q2 se = 0.515, pred_r2 = 0.589
In above QSAR models, r2 is a correlation coefficient that has been multiplied by 100 gives explained variance in biological activity. Predictive ability of generated QSAR models was evaluated by q2 employing LOO method. F value reflects ratio of variance explained by models and variance due to error in regression. High F value indicates that model is statistically significant. Low SE of estimation indicted by r2 se and q2 se suggested that models are statistically significant. Predictive ability of QSAR model was also confirmed by external validation of test set compounds denoted by pred_r2, and it was found in agreement with accepted criteria of more than 0.3. Among these three models, PLS has come out with very good results as compare to other two models. Results of PLS analysis showed very good predictive ability as indicted by r2, q2, F-test, and pred_r2 values (Table 4).
3.2. Interpretaion of 2D-QSAR models
MLR (Model-1) and PLS (Model-3) indicate positive contribution of Baumann's  AI topological descriptors T_C_C_4, T_2_F_5, T_2_Cl_1 and negative contribution of chi3Cluster where as PCR model indicates positive contribution of RadiusOfGyration along with SKMostHydrophobic and negative contribution of T_N_O_1 and MomInertiaY.
AI descriptors can be generated considering topology of the molecule, atom type, and bond. For calculation of AI descriptors, every atom in the molecule was assigned at least one and at most three attributes. First attribute is 'T-attribute' to thoroughly characterize topology of the molecule. Second attribute is atom type, atom symbol is used here. Third attribute is assigned to atoms taking part in a double or triple bond. After all atoms have been assigned their respective attributes, selective distance count statistics for all combinations of different attributes are computed. A selective distance count statistic 'XY2' (e.g., TOPO2N3) counts all the fragments between start atom with attribute 'X' (e.g., '2' double-bonded atom) and end atom with attribute 'Y' (e.g., 'N') separated by graph distance 3. Graph distance can be defined as the smallest number of atoms along the path connecting two atoms in molecular structure. In this study, to calculate AI descriptors, we have used following attributes: 2 (double-bonded atom), 3 (triple-bonded atom), C, N, O, S, H, F, Cl, and Br the distance range of 0 to 7. RadiusOfGyration and MomInertiaY are distance-based topological descriptors. Topological indices are numerical values associated with chemical constitutions for the purpose of correlating chemical structure with biological activity. Distance-based topological descriptors are defined by their atom types and topological distance which signifies basic connectivity of atoms in the molecules. SKMostHydrophobic is a thermodynamic descriptor to characterize the hydrophobicity of a molecule. SlogP estimates logP by summing the contribution of atom-weighted solvent accessible surface areas and correction factors. SKMostHydrophobic contributed positively to the PCR model means the group, which increases hydrophobic nature, may cause increase apoptosis inducing activity. An estate contribution descriptor SssCH2E-index, which represents the electro-topological state indices for number of -CH2 group connected with two single bonds, is inversely proportional to the activity.
3.3. Generation and interpretaion of 3D-QSAR models
3D-QSAR modeling was performed using kNN-MFA method that adopts a kNN principle for generating relationships between molecular fields and apoptosis-inducing activity. The kNN-MFA models (4-6) were generated using training set of 24 compounds and 3D-QSAR models were validated using a test set of 8 compounds. The steric (S) and electrostatic (E) descriptors specify the regions, where variation in the structural features of different compounds in training set leads to increase or decrease in activities. The number accompanied by descriptors represents its position in 3D MFA grid. The stepwise forward backward variable selection method resulted in several statistically significant models, of which model-4 is considered as the best one. The model selection criterion is the value of q2, internal predictive ability of model, and that of pred_r2, ability of the model to predict activity of external test set. Model-4 (SW-FB)
Model-4 was considered as 3D-QSAR-model (SW-FB) for the dataset. Model-4 is used for internal predictivity, the value of LOO cross-validation squared correlation coefficient q2 = 0.653 suggested goodness of the prediction. Model having predictive squared correlation coefficient (pred_r2) > 0.52, in agreement with the accepted criteria as more than 0.4 which means 52% predictive power for the external test set (Table 5).
The predicted versus the experimental value for training and test sets are depicted in Figure 3. The best 3D-QSAR (kNN-MFA) models for the prediction of apoptosis-inducing activity were obtained by applying SA giving q2 = 0.693 and pred_r2 = 0.537, and GA giving q2 = 0.622 and pred_r2 = 0.563. 3D-QSAR models (4-6) obtained showed that electrostatic and steric interactions plays major role in determination of apoptosis inducing activity. E_895 and E_805 in model-4, E_509, E_875, and E_147 in model-5 are electrostatic field descriptors, similarly S_526 in model-5 and S_477 and S_664 in model-6 are steric descriptors (Figure 4). Negative value in electrostatic field descriptors indicates that negative electronic potential is required to increase apoptosis-inducing activity and more electronegative groups are preferred in that position, positive range indicates that group that imparting positive electrostatic potential is favorable for apoptosis-inducing activity so less electronegative group is preferred in that region. Similarly, negative range in steric descriptors indicates that negative steric potential is favorable for activity, and less bulky substituents group is preferred in that region, positive value of steric descriptors reveals that positive steric potential is favorable for increase in apoptosis-inducing activity of 4-anilinoquinozalines and more bulky group is preferred in that region.
Apoptosis plays a pivotal role in the cytotoxic activity of most chemotherapeutic drugs. QSAR modeling resulted in identification of common structural features responsible for prediction of apoptosis-inducing activity for 4-anilinoquinozaline derivatives. 2D-QSAR studies revealed that AI descriptors were major contributing descriptors. Descriptor values obtained in this study helped in quantification of the structural features of 4-anilinoquinozaline derivatives. The overall degree of prediction was found to be around 82% in case of MLR and PLS. Among the three 2D-QSAR models (MLR, PCR, and PLS), results of PLS analysis showed significant predictive power and reliability as compare to other two methods. The master grid obtained for the various kNN-MFA models showed positive value in electrostatic field descriptors, which indicates that positive electronic potential is required to increase apoptosis-inducing activity. Negative range in steric descriptors indicates that less bulky group is preferred in that region. 3D-QSAR results suggested the importance of some molecular characteristics, which should significantly affect the binding affinities of compounds. These results provide useful clues for designing novel apoptosis inducer for the treatment of cancer.
Varmus H: The new era in cancer research.Science 2006, 312: 1162–1165. 10.1126/science.1126758
Grivicich I, Regner A, da Rocha AB, Grass LB, Alves PA, Kayser GB, Schwartsmann G, Henriques JA: Irinotecan/5-fluorouracil combination induces alterations in mitochondrial membrane potential and caspases on colon cancer cell lines.Oncol Res 2005, 15: 385–392.
Kolomeichuk SN, Bene A, Upreti M, Dennis RA, Lyle CS, Rajasekaran M, Chambers TC: Induction of apoptosis by vinblastine via c-Jun autoamplification and p53-independent down-regulation of p21WAF1/CIP1.Mol Pharmacol 2008, 73: 128–136. 10.1124/mol.108.039750
Cai SX, Sirisoma N, Pervin A, Zhang H, Jiang S, Willardsen AJ, Anderson MB, Mather G, Pleiman CM, Kasibhatla S, Tseng B, Drewe J: Discovery ofN-(4-Methoxyphenyl)-N,2-dimethylquinazolin-4-amine, a potent apoptosis inducer and efficacious anticancer agent with high blood brain barrier penetration.J Med Chem 2009, 52: 2341–2351. 10.1021/jm801315b
Shen M, LeTiran A, Xiao Y, Golbraikh A, Kohn H, Tropsha A: Quantitative structure-activity relationship analysis of functionalized amino acid anticonvulsant agents using k nearest neighbor and simulated annealing PLS methods.J Med Chem 2002, 4: 2811–2823.
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Vyas, V.K., Ghate, M. & Katariya, H. 2D and 3D-QSAR study on 4-anilinoquinozaline derivatives as potent apoptosis inducer and efficacious anticancer agent.
Org Med Chem Lett1, 13 (2011). https://doi.org/10.1186/2191-2858-1-13