Description
Computer-aided drug design holds considerable promise for expediting the drug discovery process. A computational model for quantitative structure-activity relationship (QSAR) analysis can help a medicinal chemist make rational decisions when optimizing a drug candidate by providing an understanding of inhibitor (ligand or drug)-protein binding interactions. Used as a predictive tool, a computational model can also expedite the screening of a large chemical library to identify a small number of potential drug candidates, which reduces the time and cost associated with the synthesis and in vivo experimentation phases. In this dissertation, we developed hybrid algorithms for descriptor optimization and model development and applied them in two major studies. The first study focused on the development of computational models for analyzing HIV-1 protease. In this study, we used Random Forest (RF) for descriptor selection and two learning algorithms, linear discriminant analysis (LDA) and logistic regression (LR), for model development. The hybrid RF-LDA and RF-LR models successfully identified important conformational changes in the HIV-1 protease binding pockets that result from mutations, providing insight for the design of novel HIV-1 protease inhibitors. The second study developed computational models for the analysis of HIV-1 integrase. In this study, we developed a novel hybrid evolutionary algorithm that combines differential evolution (DE) and binary particle swarm optimization (BPSO) for descriptor selection. We then developed models using multiple linear regression (MLR), partial least squares (PLS), and extremely randomized trees (ERT). These three models were then used as virtual screening tools to identify six novel, highly active HIV-1 integrase inhibitors from the NCI Open Database, which contains 265,242 compounds.
We also developed evoQSAR, a Python-based library for QSAR modeling that supports the development of statistically significant, robust, predictive models. Hybridizing multiple algorithms yielded superior models that more clearly explain the chemical properties of the compounds in the data set. To the best of our knowledge, neither the hybrid DE-BPSO algorithm for descriptor selection nor the use of the hybrid RF-LDA, hybrid RF-LR, and ERT algorithms for model development has previously been studied for drug design applications.
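
To make the idea of swarm-based descriptor selection concrete, the following is a minimal sketch of plain binary particle swarm optimization (BPSO) using only the Python standard library. It is not the hybrid DE-BPSO algorithm developed in this work, and it does not use or reproduce the evoQSAR API. The function names (`pearson`, `fitness`, `bpso_select`), the parameter values, the synthetic data, and the toy fitness (summed absolute correlation with activity minus a per-descriptor penalty, standing in for a real model-based score) are all assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def fitness(mask, relevance, penalty=0.05):
    """Toy score (assumption): relevance of selected descriptors minus a size penalty."""
    return sum(r - penalty for bit, r in zip(mask, relevance) if bit)


def bpso_select(relevance, n_particles=20, n_iter=50,
                w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Search for a 0/1 descriptor mask that maximizes `fitness` with binary PSO."""
    rng = random.Random(seed)
    dim = len(relevance)
    # Each particle is a bit string: 1 keeps a descriptor, 0 drops it.
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, relevance) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                # Standard PSO velocity update, clamped to [-v_max, v_max].
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))
                vel[i][d] = v
                # Binary step: the sigmoid of the velocity is the probability of a 1 bit.
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i], relevance)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Synthetic example (assumption): activity depends on descriptors 0 and 3 only.
data_rng = random.Random(1)
n_compounds, n_descriptors = 40, 6
X = [[data_rng.gauss(0, 1) for _ in range(n_descriptors)] for _ in range(n_compounds)]
y = [2.0 * row[0] - 1.5 * row[3] + data_rng.gauss(0, 0.5) for row in X]
relevance = [abs(pearson([row[j] for row in X], y)) for j in range(n_descriptors)]

mask, score = bpso_select(relevance)
print("selected descriptor indices:", [j for j, bit in enumerate(mask) if bit])
print("toy fitness:", round(score, 3))
```

In a real QSAR workflow, the toy fitness would presumably be replaced by a cross-validated score from a fitted model such as MLR, PLS, or ERT, so that each candidate mask is judged by the predictive quality of the model built on the selected descriptors.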