
Analogue-based approaches in anti-cancer compound modelling: the relevance of QSAR models



QSAR is among the most extensively used computational methodologies for analogue-based design. The application of various descriptor classes, such as quantum chemical, molecular mechanics, conceptual density functional theory (DFT)- and docking-based descriptors, for predicting anti-cancer activity is well known. Although in vitro assays for anti-cancer activity are available against many different cell lines, most computational studies target only a small number of cell lines. Hence, statistically robust and extensive QSAR studies against 29 different cancer cell lines, together with a comparative account of the results, have been carried out.


The predictive models were built for 266 compounds with experimental data against 29 different cancer cell lines, employing the smallest possible number of mutually independent descriptors. Robust statistical analysis shows high correlation and cross-validation coefficient values and provides a range of QSAR equations. The comparative performance of each class of descriptors was assessed, and the effect of the number of descriptors (1-10) on the statistical parameters was tested. Charge-based descriptors were found in 20 of the 39 models (approx. 50%), valency-based descriptors in 14 (approx. 36%) and bond order-based descriptors in 11 (approx. 28%). The use of conceptual DFT descriptors does not improve the statistical quality of the models in most cases.


The analysis covered models in which the number of descriptors was increased from 1 to 10; in most cases, three-descriptor models are adequate. The study reveals that quantum chemical descriptors are the most important class of descriptors in modelling this series of compounds, followed by electrostatic, constitutional, geometrical, topological and conceptual DFT descriptors. The best statistical values were obtained for the cell lines in nasopharyngeal cancer (2 cell lines, average R2 = 0.90), followed by those in melanoma (4 cell lines, average R2 = 0.81).


Cancer has long been a serious threat to human health and life and has become the leading disease-related cause of death in the human population [1]. Radiation therapy and surgery are successful only when the cancer is found at an early, localized stage. Chemotherapy, in contrast, is the mainstay in the treatment of malignancies because of its ability to cure widespread or metastatic cancers. Natural products have been the major source of anti-cancer drugs. According to a review on new chemical entities, approximately 74% of anti-cancer drugs were either natural products, natural product-related synthetic compounds or their mimetics [2]. Computational methodologies have emerged as an indispensable tool for any drug discovery program, playing a key role from hit identification to lead optimization. QSPR/QSAR is among the most practical tools used in analogue/ligand-based drug design and has been extensively reviewed for the prediction of various properties such as ADME [3], toxicity [4, 5], carcinogenicity [6], retention time [7], stability [8] and other physicochemical properties, apart from biological activity [9–12]. This theoretical method follows the axiom that the variance in the activities or physicochemical properties of chemical compounds is determined by the variance in their molecular structures [13–15].

Computational methods aid not only in the design and interpretation of hypothesis-driven experiments in the field of cancer research but also in the rapid generation of new hypotheses. QSAR has been widely applied for the activity prediction of diverse series of biological and/or chemical compounds, including anti-cancer drugs [16–21]. A number of quantum chemical descriptors (such as charge, molecular orbital, dipole moment, etc.) and molecular property descriptors (such as steric, hydrophobic coefficient, etc.) have been successfully applied to establish 2D QSAR models for predicting the activities of compounds [22–24]. Density functional theory (DFT)-based descriptors have found immense usefulness in the prediction of the reactivity of atoms and molecules, and their application in the development of QSAR has been recently reviewed [25–30]. QSAR has been instrumental in the development of various popular drugs, as discussed in detail earlier [31].

For a given cancer type, a number of cell lines are available on which in vitro evaluation of biological activity can be performed, but the results of this evaluation vary with the cell line employed for the assay. It therefore becomes difficult for the computational chemist to choose experimental data from the pool of available biological activities for a single scaffold type in order to proceed with analogue-based design. Although in vitro assays for anti-cancer activity are available against many different cell lines, most computational studies target one particular cell line, which may not be a reliable approach. A study that considers all the available experimental data to build predictive models will guide the medicinal chemist to design new and potent compounds more reliably. In addition, analysing the descriptors obtained for models against all the cell lines may suggest the importance of a particular class of descriptor in modelling anti-cancer activity against a cancer type. Such statistically robust and extensive QSAR studies against many different cancer cell lines have not been reported yet. Hence, we performed comprehensive QSAR modelling studies on 266 anti-cancer compounds against 29 different cancer cell lines. Descriptor analysis of all the QSAR models was performed to derive commonality among the various cell lines belonging to a cancer type. The experimental data considered in the study were from in vitro cell line-based assays, and it is difficult to get reliable target-based information from such studies unless they are meticulously validated. Since the aim of the present study was to evaluate the potential of simple 2D-based descriptors in anti-cancer compound modelling, the biological target-related aspects were not considered. This study provides one of the most comprehensive accounts of the structure-activity relationship of a large number of molecules against 29 different cancer cell lines.
Besides developing statistically significant models, this study aims to assess the role and relevance of computationally demanding conceptual DFT descriptors compared with conventional descriptors. The strengths and limitations of QSAR models in treating a complex area such as the development of anti-cancer compounds are important to note, and the present study shows a systematic way of developing and applying QSAR equations effectively. Table 1 shows the names of the scaffolds considered, the different cell lines [32–41], the number of molecules corresponding to each cell line and the target of action or molecular mechanism of the scaffolds.

Table 1 Details of the scaffolds considered in the study and the cell lines against which their anti-cancer activity was reported, along with the number of molecules for each cell line and the molecular target/mechanism of action, where studied.

Results and discussion

Two different schemes were adopted to develop statistically significant QSAR models. In the first scheme, 10 QSAR models were developed for the 10 scaffolds used in this study (i.e. scaffold-based QSAR models), whereas in the second scheme 29 different QSAR models were developed based on the availability of IC50 values against 29 cancer cell lines by combining all the scaffolds (i.e. cell line-based QSAR models). The parent structures of all the scaffolds, with the number of compounds and the names of the cell lines, are shown in Scheme 1.

Scheme 1

The 266 compounds with IC50 values, grouped into scaffolds (S1-S10), with the number of compounds in each scaffold in parentheses and the cell lines against which cytotoxicity values were reported (see Tables S1-S10 in Additional file 1 for the structures of all the compounds and their in vitro IC50 values against the various cell lines).

It is essential to avoid oversimplifying the QSAR modelling process and to employ statistically robust approaches for model development. The best model was selected on the basis of the correlation coefficient values obtained by correlating approximately 300 descriptors (constitutional, geometrical, topological, electrostatic, quantum chemical, etc.) in different combinations. On the one hand, the uniqueness of a compound and its total chemical information cannot be described by very few descriptors; on the other hand, a large number of descriptors obscures interpretation and reduces the statistical robustness and predictive ability of the model. The effect of the number of descriptors on the correlation coefficient values was tested for all the models on the training set by correlating 1-10 descriptors separately; the results are presented in Figure 1a (for cell line-based models) and Figure 1b (for scaffold-based models). We observed that, in various models, three descriptors are sufficient for a good correlation and that using more than three descriptors has only a small effect on the statistical quality of the models in most cases. Although models with more than six descriptors may provide high correlation and cross-validation coefficient values, these values may be artefacts of over-fitting, so such models may not be very useful for further prediction of IC50 values. Before the compounds were divided into training and test sets, three-, four- and five-descriptor models were selected. On comparing the statistical performance of the selected models, the three-descriptor models were found to be optimal, as they provide very acceptable correlation in most cases.

Figure 1

Effect of number of descriptors on the correlation coefficient of (a) cell line-based QSAR models, (b) scaffold-based QSAR models.
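The descriptor-count analysis described above can be illustrated with a minimal sketch. It assumes ordinary least-squares regression with an intercept and a simple greedy forward selection, which is a stand-in and not the heuristic procedure used in the study; the data are synthetic, and all function names (`solve`, `r_squared`, `r2_by_descriptor_count`) are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(columns, y):
    """R2 of a least-squares fit of y on the given descriptor columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[u] * row[v] for row in design) for v in range(p)] for u in range(p)]
    xty = [sum(design[i][u] * y[i] for i in range(n)) for u in range(p)]
    coef = solve(xtx, xty)
    pred = [sum(c * v for c, v in zip(coef, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def r2_by_descriptor_count(descriptors, y, max_count):
    """Greedily add the descriptor that raises R2 most; return the R2 after each addition."""
    chosen, remaining, curve = [], list(range(len(descriptors))), []
    for _ in range(max_count):
        best = max(remaining,
                   key=lambda j: r_squared([descriptors[k] for k in chosen + [j]], y))
        chosen.append(best)
        remaining.remove(best)
        curve.append(r_squared([descriptors[k] for k in chosen], y))
    return chosen, curve

# Synthetic data (assumption): 40 "compounds", 6 random descriptors,
# with the activity depending on descriptors 0, 2 and 4 only.
rng = random.Random(0)
descriptors = [[rng.gauss(0, 1) for _ in range(40)] for _ in range(6)]
activity = [2.0 * descriptors[0][i] - 1.5 * descriptors[2][i] + descriptors[4][i]
            + rng.gauss(0, 0.3) for i in range(40)]
chosen, curve = r2_by_descriptor_count(descriptors, activity, 6)
for count, value in enumerate(curve, start=1):
    print(count, round(value, 3))
```

On data of this kind the training-set R2 rises steeply up to the third descriptor and then flattens, which is the pattern the text describes; because training-set R2 can never decrease when a descriptor is added, a late plateau or small gain is the signal to stop, not a rise as such.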

The compounds in each model were then divided into training and test sets by randomly assigning around 20% of the compounds to the test set. Two independent test sets were constructed to rule out chance correlation (statistical data for the second test set are reported in Additional file 1 Table S83). Both test sets showed similar statistical performance, indicating that the developed models are adequate. The final QSAR models were generated from the training set and were then used to predict the activity of the test set compounds. The low average residuals obtained for both the training and test sets in all the models indicate that the developed models are valuable and can establish the relationship between structure and activity for the various anti-cancer scaffolds used in this study.
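A minimal sketch of such a random division is given below. The 20% test fraction follows the text; the use of two different random seeds to obtain two test sets, the definition of the average residual as a mean absolute difference, and the function names are assumptions made for illustration.

```python
import random

def split_train_test(n_compounds, test_fraction=0.2, seed=0):
    """Randomly assign compound indices to a training set and a test set."""
    indices = list(range(n_compounds))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_compounds * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])

def average_residual(observed, predicted):
    """Mean absolute difference between observed and predicted activities."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

# Two splits from different seeds stand in for the two test sets (assumption);
# splits drawn this way may share some compounds.
train_a, test_a = split_train_test(30, seed=1)
train_b, test_b = split_train_test(30, seed=2)
print(len(train_a), len(test_a))
print(average_residual([5.1, 6.0, 4.2], [5.0, 6.3, 4.0]))
```

Fixing the seed makes each division reproducible, so the same training and test compounds can be reused when models with different descriptor counts are compared.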

To assess and compare the predictive power and stability of the QSAR models, several widely applied statistical parameters are reported: R2, R2cv, s2, F and AE (for details about these parameters, see the footnote to Table 2). Table 2 contains the regression summary for the cell line-based QSAR models along with the regression equations, the names of the cell lines and the types of cancer. Most of the cell line-based QSAR models in which the activity range is broad (M1, M2, M4, M5, M6, M8, M9, M11, M12 and M20) show higher statistical quality (R2 ~ 0.80, R2cv ~ 0.75) and seem valuable for the current class of compounds. The statistical quality of a few other cell line-based models (M10, M15, M19 and M21) is also reasonable (R2 ~ 0.75, R2cv ~ 0.70), and these models can be used for prediction. The statistical quality of models M17, M23 and M26 is lower (R2 ~ 0.60, R2cv ~ 0.50), so extra care is required before using these models for prediction. M29 cannot be used for prediction because of the insignificant statistical results obtained for this model (R2 = 0.46, R2cv = 0.43). The poor result for M29 is probably due to the involvement of 118 compounds and 5 different scaffolds in this model. Increasing the number of descriptors for M29 does not improve the quality of the model much (with 10 descriptors, R2 ~ 0.7), which indicates that the descriptors currently used are not good enough for developing the structure-activity relationship for this model and that additional descriptors need to be tried or developed. However, restricting this model to a single scaffold provides good statistical quality (DU145/S10 in Table 3). The models for which the activity range was narrow (M3, M7, M13, M14, M16, M18, M22, M24, M25, M27 and M28) were moved to the end of Table 2 and will not be very reliable for predictions. Some of these models (M3, M7, M13, M14 and M27) show higher correlation values (R2 ~ 0.80, R2cv ~ 0.75), while the other six models show moderate correlation values (R2 ~ 0.65, R2cv ~ 0.60), although the residuals are low in all 11 models, as expected. The statistical details and descriptor types for the cell line-based QSAR models are depicted in Figure 2a.
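Since the footnote to Table 2 is not reproduced here, the following sketch uses the common textbook definitions of these parameters, which may differ in detail from those used in the study: R2 = 1 - SSres/SStot, s2 = SSres/(n - k - 1), F = (R2/k)/((1 - R2)/(n - k - 1)) for n compounds and k descriptors, AE as the mean absolute residual, and R2cv computed by leave-one-out cross-validation. For brevity the leave-one-out loop is shown for a single-descriptor model; the data and function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for a single descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def regression_summary(observed, predicted, n_descriptors):
    """R2, residual variance s2, Fisher F and average absolute residual AE."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    dof = n - n_descriptors - 1  # residual degrees of freedom
    r2 = 1.0 - ss_res / ss_tot
    return {
        "R2": r2,
        "s2": ss_res / dof,
        "F": (r2 / n_descriptors) / ((1.0 - r2) / dof),
        "AE": sum(abs(o - p) for o, p in zip(observed, predicted)) / n,
    }

def loo_r2(x, y):
    """Leave-one-out R2cv: each compound is predicted by a model fitted without it."""
    n = len(y)
    mean_y = sum(y) / n
    press = 0.0
    for i in range(n):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    return 1.0 - press / sum((v - mean_y) ** 2 for v in y)

# Hypothetical descriptor values and activities for eight compounds.
descriptor = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.9, 2.2]
activity = [4.9, 5.6, 5.4, 6.3, 6.8, 7.1, 8.2, 8.4]
slope, intercept = fit_line(descriptor, activity)
fitted = [slope * v + intercept for v in descriptor]
summary = regression_summary(activity, fitted, n_descriptors=1)
print({k: round(v, 3) for k, v in summary.items()})
print(round(loo_r2(descriptor, activity), 3))
```

For a least-squares model the leave-one-out R2cv cannot exceed the fitted R2, so a large gap between the two is a sign of an unstable model.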

Table 2 Cell lines with type of cancer in parentheses, scaffolds involved, regression summary and number of compounds for the cell line-based QSAR models.
Table 3 Cell lines with type of cancer in parentheses, scaffolds involved, regression summary and number of compounds for the scaffold-based QSAR models developed for the prediction of IC50 values.
Figure 2

Regression summary (correlation coefficient R2, cross-validation coefficient R2cv and average residual AE values) for (a) cell line-based QSAR models, (b) scaffold-based QSAR models.

The regression summary for the scaffold-based QSAR models, along with the regression equations, the names of the cell lines and the types of cancer, is given in Table 3. We observed good statistical quality with higher regression coefficient values in all the scaffold-based QSAR models, probably because a smaller number of compounds and only one scaffold are involved in the development of each of these models. The activity range of the compounds in four models (S1, S2, S5 and S6) is narrow, so these models were moved to the end of Table 3 and will not be very reliable. The models built on compounds with a narrow activity range show lower regression coefficient values than those built on compounds with a broad activity range. All the scaffold-based models with a broad activity range seem reasonable and can be used for prediction. The statistical details and descriptor types for the scaffold-based QSAR models are depicted in Figure 2b.

The observed and predicted activities, with residuals and descriptor values, for all the developed models are presented in Additional file 1 (Tables S12 to S46). Outliers are compounds that do not fit the developed QSAR models. Most of these QSAR models do not have any outlier; in some cases, at most one outlier is present because of the large deviation between its observed and predicted activities. Outliers can occur not only because compounds may act by different mechanisms or interact with the receptor in different binding modes but also because of the intrinsic noise associated with both the original data and the methodology adopted for the construction of the models. Figure 3a,b shows the plots of experimental versus predicted IC50 values for the cell line- and scaffold-based QSAR models, respectively (the plots for the 11 cell line- and 4 scaffold-based models that have a narrow activity range are presented in Figure S1a,b, respectively, of Additional file 1). The average residuals for the test and training set compounds presented in this figure show that the test set compounds are closer to the line than the training set compounds. The applicability of the generated QSAR models was rigorously validated by constructing a second independent test set. As expected, the statistical performance of the second test set is similar to that of the first. The observed and predicted activities, with residuals and descriptor values, for all the developed models for the second test set of compounds are presented in Additional file 1 (Tables S48-S82).

Figure 3

Plot of experimental versus predicted IC50 values, with correlation coefficient, cross-validation coefficient and average residual given separately for the training and test sets of molecules, for (a) cell line-based QSAR models, (b) scaffold-based QSAR models.

In the developed QSAR models, 78 descriptors (42 quantum chemical, 18 electrostatic, 8 constitutional, 7 geometrical and 3 topological) were used in different combinations. Figure 4 shows all 78 descriptors, their types and their occurrence in the models. The inter-correlation of the descriptors appearing in each of the developed models was taken into account, and the descriptors were found to be reasonably orthogonal (see Additional file 1 Table S47 for details). In general, quantum chemical descriptors occurred frequently in the developed QSAR models. Charge-based descriptors (such as Maximum partial charge for a H atom, Minimum net atomic charge for a H atom, Relative positive charged surface area, Maximum net atomic charge for a C atom, etc.) were present in 20 of the 39 models (approx. 50%), thereby accounting for a major proportion of the overall descriptor space. These were followed by valency-based descriptors (such as Minimum valency of a O atom, Minimum valency of a C atom, Average valency of a N atom, Maximum valency of a H atom, etc.), present in 14 models (approx. 36%), and then by bond order-based descriptors (such as Minimum (>0.1) bond order of a H atom, Maximum bond order of a N atom, Average bond order of a C atom, Maximum PI-PI bond order, etc.), present in 11 models (approx. 28%). This indicates the role of charge-based, valency-based and bond order-based descriptors in modelling the present set of compounds. We also tested the conceptual DFT descriptors on all the above models and found that these descriptors are not important for this class of compounds.

Figure 4

Classification of the various descriptors involved in the QSAR models. Numbers in parentheses indicate the number of descriptors in a group, while numbers outside parentheses indicate the occurrence of a particular type of descriptor in the models (see Additional file 1 Table S11 for the details of all the descriptors).
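The orthogonality check on the descriptors of a model can be sketched as a pairwise Pearson correlation screen. The 0.8 cut-off, the descriptor names and the values below are assumptions for illustration only; the source does not state the threshold that was applied.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient between two equally long value lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def correlated_pairs(descriptors, threshold=0.8):
    """Return descriptor-name pairs whose absolute inter-correlation exceeds the threshold."""
    flagged = []
    for p, q in combinations(descriptors, 2):
        r = pearson(descriptors[p], descriptors[q])
        if abs(r) > threshold:
            flagged.append((p, q, round(r, 3)))
    return flagged

# Hypothetical descriptor values for five compounds (not taken from the study).
model_descriptors = {
    "max_partial_charge_H": [0.21, 0.25, 0.19, 0.30, 0.27],
    "min_valency_O": [1.92, 1.95, 1.99, 1.90, 1.97],
    "scaled_charge_H": [0.42, 0.50, 0.38, 0.60, 0.54],  # exactly twice the first descriptor
}
flagged = correlated_pairs(model_descriptors)
print(flagged)
```

A model whose descriptors produce an empty list under such a screen can be regarded as reasonably orthogonal at the chosen cut-off; any flagged pair would call for replacing one of the two descriptors.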

The cell lines considered in the current study correspond to 14 different cancer types (Additional file 1 Table S84). Among them, eight cancer types have experimental data for more than one cell line. The statistical performance of the models was therefore compared across the various types of cancer (see Additional file 1 Table S84 for details). It is interesting to note that nasopharyngeal (R2 = 0.90, R2cv = 0.84), lymphoma (R2 = 0.84, R2cv = 0.75), cervical (R2 = 0.83, R2cv = 0.76), melanoma (R2 = 0.81, R2cv = 0.77), CNS (R2 = 0.81, R2cv = 0.75), fibroblast (R2 = 0.79, R2cv = 0.72) and colon (R2 = 0.77, R2cv = 0.69) types of cancer show better statistical performance (average R2 = 0.82 and average R2cv = 0.75) compared with the other types of cancer (glioblastoma, prostate, breast, lung, blood, ovarian and renal; average R2 = 0.65 and average R2cv = 0.57).
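The group averages quoted above are consistent with a simple unweighted mean over the seven better-performing cancer types, as the short check below shows. The per-type values are copied from the paragraph; that the averaging was unweighted is an inference from the agreement of the numbers, and the function name is hypothetical.

```python
# (R2, R2cv) per cancer type, copied from the paragraph above.
better_group = {
    "nasopharyngeal": (0.90, 0.84),
    "lymphoma": (0.84, 0.75),
    "cervical": (0.83, 0.76),
    "melanoma": (0.81, 0.77),
    "CNS": (0.81, 0.75),
    "fibroblast": (0.79, 0.72),
    "colon": (0.77, 0.69),
}

def group_means(values):
    """Unweighted mean R2 and R2cv over the cancer types in one group."""
    n = len(values)
    mean_r2 = sum(r2 for r2, _ in values.values()) / n
    mean_r2cv = sum(cv for _, cv in values.values()) / n
    return round(mean_r2, 2), round(mean_r2cv, 2)

print(group_means(better_group))  # (0.82, 0.75)
```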


In the present study, we assessed the predictive power of QSAR approaches for modelling anti-cancer compounds. A total of 39 QSAR models, 10 for different scaffolds and 29 for different cell lines, were built for this purpose. Although the analysis covered models in which the number of descriptors was increased from 1 to 10, three-descriptor models are adequate in most cases. The study reveals that quantum chemical descriptors are the most important class of descriptors, followed by electrostatic, constitutional, geometrical, topological and conceptual DFT descriptors. Charge-based descriptors were the most prevalent, followed by valency-based and bond order-based descriptors. Thus, the current study highlights the importance of analogue-based design approaches in modelling anti-cancer compounds. Notably, we did not make any assumptions about the site of interaction or the mechanism of action of these compounds, yet we were able to develop statistically robust models for all experimentally tested compounds, in which the correlation coefficient (R2) and cross-validation coefficient (R2cv) values are high and the average residuals (AE) are low in most cases. The best statistical values were obtained for the cell lines in nasopharyngeal cancer (2 cell lines, average R2 = 0.90), followed by those in melanoma (4 cell lines, average R2 = 0.81).


Details of the scaffolds considered in the study, along with the cell lines against which experimental IC50 values are reported and the number of compounds for each cell line, are given in Table 1. Two different schemes (scaffold- and cell line-based) were followed for performing the QSAR studies. Scaffold-based QSAR studies were carried out based on the availability of compounds in the various scaffolds (S1-S10) collected from ten different studies. The cell line that provided the best regression summary was used for making the scaffold-based QSAR models. See Tables S1-S10 in Additional file 1 for the structures and the corresponding activity values of all the compounds. A total of 266 compounds belonging to 10 different chemical scaffolds were collected along with their anti-cancer activity against 29 cancer cell lines (Scheme 1). All the structures were initially optimized using the semi-empirical AM1 procedure and then subjected to energy evaluations at the B3LYP/6-31G(d) level on the AM1 geometries [42]. Descriptors were obtained from these B3LYP calculations by using the CODESSA [43] program in conjunction with the Gaussian output files. The 300 descriptors obtained using the CODESSA program can be divided into different classes such as constitutional, topological, geometrical, quantum chemical and thermodynamic. These descriptors were calculated for each compound, and non-significant descriptors were identified by the heuristic method and eliminated. The inter-correlation of the descriptors in all the models was tested. Models in which the descriptors were highly inter-correlated were then replaced and refined so that the descriptors employed in a given model are virtually orthogonal to each other.
To find the minimum number of descriptors defining activity, we systematically developed 3-, 4- and 5-descriptor models for all sets of compounds using the heuristic method. Three-descriptor models were found to be fairly satisfactory. All the compounds were then divided into test sets (approx. 20%) and training sets (approx. 80%), with two independent divisions, using the Project Leader application associated with Scigress Explorer [44]. The statistical quality of each model was assessed by various parameters (R2, R2cv, AE, s and F) for both the test and training sets. The QSAR models were validated by examining the prediction of activity on the test set, i.e. R2, R2cv and AE. The effect of the number of descriptors on the correlation coefficient was also examined on the training set of molecules by running the heuristic method with 1-10 descriptors. Two different training and test sets were developed to rule out chance correlation. Scheme 2 illustrates the steps taken for developing the final QSAR models.

Scheme 2

Flowchart of methodology adopted for building and validating QSAR models.
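The elimination of non-significant descriptors in this workflow can be sketched as a simple pre-filter. This is only an illustration of the general idea and not the CODESSA heuristic itself: the three criteria (a value available for every compound, non-negligible variance, and a minimal one-parameter correlation with activity), the thresholds, the function names and the data are all assumptions.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def one_parameter_r2(x, y):
    """Squared Pearson correlation between one descriptor and the activity."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

def prefilter(descriptors, activity, min_variance=1e-8, min_r2=0.05):
    """Keep descriptors that are defined for every compound, vary, and correlate with activity."""
    kept = {}
    for name, values in descriptors.items():
        if any(v is None for v in values):
            continue  # not available for every structure
        if variance(values) < min_variance:
            continue  # (near-)constant across the compound set
        if one_parameter_r2(values, activity) < min_r2:
            continue  # negligible one-parameter correlation with activity
        kept[name] = values
    return kept

# Hypothetical activities and descriptor values for five compounds.
activity = [5.0, 5.5, 6.1, 6.8, 7.2]
candidates = {
    "charge": [0.10, 0.20, 0.30, 0.45, 0.50],
    "n_rings": [3, 3, 3, 3, 3],
    "dipole": [1.2, None, 0.8, 1.1, 0.9],
    "noise": [1.0, 2.0, 1.5, 2.0, 1.0],
}
kept = prefilter(candidates, activity)
print(list(kept))  # ['charge']
```

Descriptors that survive such a screen would then go on to the inter-correlation test and the multi-descriptor model search described above.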



DFT: density functional theory.


  1. 1.

    Gibbs JB: Mechanism-based target identification and drug discovery in cancer research. Science 2000, 287: 1969–1973. 10.1126/science.287.5460.1969

    CAS  Article  Google Scholar 

  2. 2.

    Cragg GM, Grothaus PG, Newman DJ: Impact of natural products on developing new anti-cancer agents. Chem Rev 2009, 109: 3012–3043. 10.1021/cr900019j

    CAS  Article  Google Scholar 

  3. 3.

    Hansch C, Leo A, Mekapati SB, Kurup A: QSAR and ADME. Bioorg Med Chem 2004, 12: 3391–3400. 10.1016/j.bmc.2003.11.037

    CAS  Article  Google Scholar 

  4. 4.

    Cronin MT, Dearden JC: QSAR in toxicology. 2. Prediction of acute mammalian toxicity and interspecies correlations. Quant Struct Act Relat 1995, 14: 117–120. 10.1002/qsar.19950140202

    CAS  Article  Google Scholar 

  5. 5.

    Mwense M, Wang XZ, Buontempo FV, Horan N, Young A, Osborn D: QSAR approach for mixture toxicity prediction using independent latent descriptors and fuzzy membership functions. SAR QSAR Environ Res 2006, 17: 53–73. 10.1080/10659360600562202

    CAS  Article  Google Scholar 

  6. 6.

    Benigni R, Giuliani A: Putting the predictive toxicology challenge into perspective: reflections on the results. Bioinformatics 2003, 19: 1194–1200. 10.1093/bioinformatics/btg099

    CAS  Article  Google Scholar 

  7. 7.

    Zhao M, Li Z, Wu Y, Tang YR, Wang C, Zhang Z, Peng S: Studies on log P, retention time and QSAR of 2-substituted phenylnitronyl nitroxides as free radical scavengers. Eur J Med Chem 2007, 42: 955–965. 10.1016/j.ejmech.2006.12.027

    CAS  Article  Google Scholar 

  8. 8.

    Srivastava HK, Chourasia M, Kumar D, Sastry GN: Comparison of computational methods to model dna minor groove binders. J Chem Inf Model 2011, 51: 558–571. 10.1021/ci100474n

    CAS  Article  Google Scholar 

  9. 9.

    Reddy AS, Pati SP, Kumar PP, Pradeep HN, Sastry GN: Virtual screening in drug discovery--a computational perspective. Curr Protein Pept Sci 2007, 8: 329–351.

    CAS  Article  Google Scholar 

  10. 10.

    Pasha FA, Muddassar M, Cho SJ: Molecular docking and 3D QSAR studies of Chk2 inhibitors. Chem Biol Drug Des 2009, 73: 292–300. 10.1111/j.1747-0285.2009.00773.x

    CAS  Article  Google Scholar 

  11. 11.

    Srivastava HK, Pasha FA, Singh PP: Atomic softness-based QSAR study of testosterone. Int J Quant Chem 2005, 103: 237–245. 10.1002/qua.20506

    CAS  Article  Google Scholar 

  12. 12.

    Srivani P, Sastry GN: Potential choline kinase inhibitors: a molecular modeling study of bis-quinolinium compounds. J Mol Graph Mod 2009, 27: 676–688. 10.1016/j.jmgm.2008.10.010

    CAS  Article  Google Scholar 

  13. 13.

    Schultz TW, Cronin MTD, Walker JD, Aptula AO: Quantitative structure-activity relationships (QSARs) in toxicology: a historical perspective. J Mol Struct 2003, 622: 1–22.

    CAS  Article  Google Scholar 

  14. 14.

    Karcher W, Devillers J, (eds): Kluwer Academic Publishers, Dordrecht, Practical Applications of Quantitative Structure-Activity Relationships (QSAR). Environmental Chemistry and Toxicology 1990, 1–12.

  15. 15.

    Katritzky AR, Petrukhin R, Tatham D, Basak S, Benfenati E: Interpretation of quantitative structure-property and activity relationships. J Chem Inf Comput Sci 2001, 41: 679–685.

    CAS  Article  Google Scholar 

  16. 16.

    Ravindra GK, Achaiah G, Sastry GN: Molecular modeling studies of phenoxy-pyrimidinyl imidazoles as p38 kinase inhibitors using QSAR and docking. Eur J Med Chem 2008, 43: 830–838. 10.1016/j.ejmech.2007.06.009

    CAS  Article  Google Scholar 

  17. 17.

    Janardhan S, Srivani P, Sastry GN: 2D and 3D quantitative structure-activity relationship studies on a series of bis-pyridinium compounds as choline kinase inhibitors. QSAR Combi Sci 2006, 25: 860–872. 10.1002/qsar.200530199

    CAS  Article  Google Scholar 

  18. 18.

    Kumar SH: A comparative QSPR study of alkanes with the help of computational chemistry. Bull Kor Chem Soc 2009, 30: 67–76.

    CAS  Article  Google Scholar 

  19. 19.

    de Jonge MR, Koymans LM, Vinkers HM, Daeyaert FF, Heeres J, Lewi PJ, Janssen PA: Structure based activity prediction of HIV-1 reverse transcriptase inhibitors. J Med Chem 2005, 48: 2176–2183. 10.1021/jm049534r

    CAS  Article  Google Scholar 

  20. 20.

    Miguet L, Zervosen A, Gerards T, Pasha FA, Luxen A, Disteche-Nguyen M, Thomas A: Discovery of new inhibitors of resistant streptococcus pneumoniae penicillin binding protein (PBP) 2x by structure-based virtual screening. J Med Chem 2010, 52: 5926–5936.

    Article  Google Scholar 

  21. 21.

    Liao SY, Chen C, Qian L, Shen Y, Zheng KC: QSAR studies and molecular design of phenanthrene-based tylophorine derivatives with anticancer activity. QSAR Combi Sci 2008, 27: 280–288. 10.1002/qsar.200730028

    CAS  Article  Google Scholar 

  22. 22.

    Sivaprakasam P, Xie A, Doerksen RJ: Probing the physicochemical and structural requirements for glycogen synthase kinase-3α inhibition: 2D-QSAR for 3-anilino-4-phenylmaleimides. Bioo Med Chem 2006, 14: 8210–8218. 10.1016/j.bmc.2006.09.021

    CAS  Article  Google Scholar 

  23. 23.

    Chen JC, Shen Y, Liao SY, Chen LM, Zheng KC: DFT-based QSAR study and molecular design of AHMA derivatives as potent anticancer agents. Int J Quant Chem 2007, 107: 1468–1478. 10.1002/qua.21285

    CAS  Article  Google Scholar 

  24. 24.

    Zhang S, Wei L, Bastow K, Zheng W, Brossi A, Lee KH, Tropsha A: Application of validated QSAR models to database mining: discovery of novel tylophorine derivative as potential anticancer agents. J Comput Aided Mol Des 2007, 21: 97–112. 10.1007/s10822-007-9102-6

    CAS  Article  Google Scholar 

  25. 25.

    Parr RG, Szentpály Lv, Liu S: Electrophilicity index. J Am Chem Soc 1999, 121: 1922–1924. 10.1021/ja983494x

    CAS  Article  Google Scholar 

  26. 26.

    Chermette H: Chemical reactivity indexes in density functional theory. J Comp Chem 1999, 20: 129–154. 10.1002/(SICI)1096-987X(19990115)20:1<129::AID-JCC13>3.0.CO;2-A

    CAS  Article  Google Scholar 

  27. 27.

    Chattaraj PK, Maiti B, Sarkar U: Philicity: a unified treatment of chemical reactivity and selectivity. J Phys Chem A 2003, 107: 4973. 10.1021/jp034707u

    CAS  Article  Google Scholar 

  28. 28.

    Chattaraj PK, Roy DR: Local descriptors around a transition state: a link between chemical bonding and reactivity. J Phys Chem A 2005, 109: 3771. 10.1021/jp051118a

    CAS  Article  Google Scholar 

  29. 29.

    Karelson M, Lobanov VS, Katritzky AR: Quantum-chemical descriptors in QSAR/QSPR studies. Chem Rev 1996, 96: 1027–1043. 10.1021/cr950202r

    CAS  Article  Google Scholar 

  30. 30.

    DeProft F, Geerlings P: Calculation of ionization energies, electron affinities, electronegativities, and hardnesses using density functional methods. J Chem Phys 1997, 106: 3270–3279. 10.1063/1.473796

    CAS  Article  Google Scholar 

  31. 31.

    Ooma F: Molecular modeling and computer aided drug design. Examples of their application in medicinal chemistry. Curr Med Chem 2000, 7: 141–158.

    Article  Google Scholar 

  32. 32.

    Quaquebeke EV, Mahieu T, Dumont P, Dewelle J, Ribaucour F, Simon G, Sauvage S, Gaussin JF, Tuti JE, Yazidi M, Vynckt FV, Mijatovic T, Lefranc F, Darro F, Kiss R: 2,2,2-Trichloro- N -({2-[2-(dimethylamino)ethyl]-1,3-dioxo-2,3-dihydro-1H-benzo[de]isoquinolin-5-yl}carbamoyl)acetamide (UNBS3157), a novel nonhematotoxic naphthalimide derivative with potent antitumor activity. J Med Chem 2007, 50: 4122–4134. 10.1021/jm070315q

    Article  Google Scholar 

  33. 33.

    Qiu XL, Li G, Wu G, Zhu J, Zhou L, Chen PL, Chamberlin AR, Lee WH: Synthesis and biological evaluation of a series of novel inhibitor of Nek2/Hec1 analogues. J Med Chem 2009, 52: 1757–1767. 10.1021/jm8015969

    CAS  Article  Google Scholar 

  34. 34.

    Peterson QP, Hsu DC, Goode DR, Novotny CJ, Totten RK, Hergenrother PJ: Procaspase-3 activation as an anti-cancer strategy: structure-activity relationship of procaspase-activating compound 1 (PAC-1) and its cellular co-localization with caspase-3. J Med Chem 2009, 52: 5721–5731. 10.1021/jm900722z

  35.

    Yang X, Shi Q, Liu Y, Zhao G, Bastow KF, Lin J, Yang S, Yang P, Lee K: Design, synthesis, and mechanistic studies of new 9-substituted phenanthrene-based tylophorine analogues as potent cytotoxic agents. J Med Chem 2009, 52: 5262–5268. 10.1021/jm9009263

  36.

    Shah BL, Kaur B, Gupta P, Kumar A, Sethi VK, Andotra SS, Singh J, Saxena AK, Taneja SC: Structure-activity relationship (SAR) of parthenin analogues with pro-apoptotic activity: development of novel anti-cancer leads. Bioorg Med Chem Lett 2009, 19: 4394–4398. 10.1016/j.bmcl.2009.05.089

  37.

    Lu Y, Wang Z, Li C, Chen J, Dalton JT, Li W, Miller DD: Synthesis, in vitro structure-activity relationship, and in vivo studies of 2-arylthiazolidine-4-carboxylic acid amides as anticancer agents. Bioorg Med Chem 2010, 18: 477–495. 10.1016/j.bmc.2009.12.020

  38.

    Tsou H, MacEwan G, Birnberg G, Grosu G, Bursavich MG, Bard J, Brooijmans N, Toral-Barza L, Hollander I, Mansour TS, Ayral-Kaloustian S, Yu K: Discovery and optimization of 2-(4-substituted-pyrrolo[2,3-b]pyridin-3-yl)methylene-4-hydroxybenzofuran-3(2H)-ones as potent and selective ATP-competitive inhibitors of the mammalian target of rapamycin (mTOR). Bioorg Med Chem Lett 2010, 20: 2321–2325. 10.1016/j.bmcl.2010.01.135

  39.

    Lu Y, Li C, Wang Z, Ross CRII, Chen J, Dalton JT, Li W, Miller DD: Discovery of 4-substituted methoxybenzoyl-aryl-thiazole as novel anticancer agents: synthesis, biological evaluation, and structure-activity relationships. J Med Chem 2009, 52: 1701–1711. 10.1021/jm801449a

  40.

    Jourdan F, Leese MP, Dohle W, Hamel E, Ferrandis E, Newman SP, Purohit A, Reed MJ, Potter BVL: Synthesis, antitubulin, and antiproliferative SAR of analogues of 2-methoxyestradiol-3,17-O,O-bis-sulfamate. J Med Chem 2010, 53: 2942–2951. 10.1021/jm9018806

  41.

    Cinelli MA, Morrell AE, Dexheimer TS, Agama K, Agarwal S, Pommier Y, Cushman M: The structure-activity relationships of A-ring-substituted aromathecin topoisomerase I inhibitors strongly support a camptothecin-like binding mode. Bioorg Med Chem 2010, 18: 5535–5552. 10.1016/j.bmc.2010.06.040

  42.

    Frisch MJ, et al.: Gaussian 03, revision E.01. Gaussian, Inc., Pittsburgh, PA; 2003.

  43.

    Katritzky AR, Lobanov VS, Karelson M: CODESSA 2.0, comprehensive descriptors for structural and statistical analysis. University of Florida; 1994.

  44.

    Scigress Explorer version 7.7; Fujitsu: Tokyo, Japan 2008.



HKS and GNS thank the Department of Science and Technology (DST), New Delhi, for the Fast-Track young scientist and Swarnajayanti fellowships, respectively. Support from CSIR-IICT and NIPER (Hyderabad) is acknowledged.

Author information



Corresponding authors

Correspondence to Hemant Kumar Srivastava or Garikapati Narahari Sastry.

Additional information

Competing interests

The authors declare that they have no competing interests.

Electronic supplementary material

Additional file 1: The additional data file available with the online version of the article contains the following information: (a) structures of all the compounds used in this study (Tables S1-S10); (b) full names of all the descriptors involved in the study (Table S11); (c) the predicted activity and descriptor values for all the models, first test set (Tables S12-S46); (d) inter-correlation analysis of the descriptors (Table S47); (e) the predicted activity and descriptor values for all the models, second test set (Tables S48-S82); (f) regression summary for cell-line-based and scaffold-based QSAR models pertaining to the second test set (Tables S83a and S83b); (g) comparative statistical significance of various cancer types (Table S84); (h) plots of experimental versus predicted IC50 values for the QSAR models where the activity range was narrow, based on cell lines and scaffolds (Figure S1a,b). (DOC 12 MB)


About this article

Cite this article

Bohari, M.H., Srivastava, H.K. & Sastry, G.N. Analogue-based approaches in anti-cancer compound modelling: the relevance of QSAR models. Org Med Chem Lett 1, 3 (2011).



  • Analogue-based design
  • Anti-cancer cell lines
  • Anti-cancer drugs
  • Quantum chemical descriptors
  • QSAR
  • Docking