Exploring the Chemical Space of CYP17A1 Inhibitors Using Cheminformatics and Machine Learning
Abstract
1. Introduction
2. Results
2.1. Exploratory Data Analysis Visualization
2.2. Chemical Space Visualization via Principal Component Analysis
2.3. QSAR Modeling and Validation for Steroidal Inhibitors
2.4. Murcko Scaffold Analysis and R-Group Decomposition-Based Structure–Activity Relationship for Nonsteroidal Inhibitors
- Scaffold 1 has two core fragments. Fragment 1-1 carries a fluorine substituent, and all molecules in this group are active. Fragment 1-2 generally shows weaker bioactivity than fragment 1-1; the effects of its substituents are listed in Table 5.
- Scaffold 6 only has one core fragment (6-1). All molecules in this scaffold series are strong inhibitors regardless of the substitutions.
- Scaffolds 9 and 10 are steroid-like, with ring A replaced by a seven-membered ring. All molecules in these scaffolds are highly active against CYP17A1 regardless of the substitutions.
- Scaffold 14 only has one core fragment (14-1). Molecules with methyl groups on the R1 or R2 positions are active.
- Scaffold 15 only has one core fragment (15-1). All molecules in this scaffold are potent or active against CYP17A1. All functional groups at the R1 position are sulfonyl groups.
- Scaffolds 16, 17, 18, and 19 share the same cyclic skeleton. All molecules in scaffold 18 are either potent or active against CYP17A1 regardless of the substitution. For the other three scaffolds, compounds with fluorine-containing groups at the R1 or R2 positions are active.
- Scaffold 20 has only one core fragment (20-1). All molecules in this scaffold are potent or active against CYP17A1, and all functional groups on the R1 position contain the carbonyl group.
2.5. QSAR Models of Nonsteroidal Inhibitors Based on Murcko Scaffold Analysis
3. Discussion
4. Materials and Methods
4.1. Data Compilation
4.2. Exploratory Data Analysis of Drug-Likeness Properties
4.3. PCA
4.4. Structure–Activity Landscape Visualization
4.5. QSAR Modeling
4.5.1. Molecular Fingerprints
4.5.2. Feature Selection
4.5.3. QSAR Model Construction
4.5.4. Performance Evaluation and Model Validation
4.5.5. Applicability Domain Determination
4.6. Scaffold Analysis
4.6.1. Murcko Scaffold Visualization
4.6.2. Murcko Scaffold Diversity Analysis
4.6.3. Scaffold Enrichment Factor Calculation
4.7. Reproducible Research
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Results | MW (Nonsteroidal) | MW (Steroidal) | LogP (Nonsteroidal) | LogP (Steroidal) | nHA (Nonsteroidal) | nHA (Steroidal) |
---|---|---|---|---|---|---|
p-value (per property) | 0.17162521 | | 3.01167828 × 10⁻¹⁶ | | 2.35402105 × 10⁻³⁷ | |
min | 255.05 | 272.21 | 1.03 | 3.13 | 1 | 1 |
max | 700.33 | 446.29 | 6.48 | 6.18 | 12 | 5 |
median | 354 | 363.26 | 4.02 | 4.72 | 5 | 3 |
mean | 380.09 | 365.59 | 3.90 | 4.67 | 4.69 | 2.65 |
skewness | 1.36 | 0.56 | −0.18 | −0.06 | 1.14 | 0.55 |
kurtosis | 1.49 | 1.07 | −0.13 | −0.65 | 1.54 | 0.14 |

Results | nHD (Nonsteroidal) | nHD (Steroidal) | nRot (Nonsteroidal) | nRot (Steroidal) | TPSA (Nonsteroidal) | TPSA (Steroidal) |
---|---|---|---|---|---|---|
p-value (per property) | 0.02433225 | | 1.02741519 × 10⁻³⁷ | | 4.7408051 × 10⁻²⁰ | |
min | 0 | 0 | 1 | 0 | 12.89 | 12.89 |
max | 4 | 3 | 13 | 5 | 120.27 | 75.21 |
median | 0 | 1 | 4 | 1 | 49.33 | 34.89 |
mean | 0.65 | 0.68 | 4.15 | 1.49 | 57.78 | 38.69 |
skewness | 1.53 | 0.62 | 0.81 | 1.92 | 0.59 | 0.28 |
kurtosis | 1.63 | −0.17 | 0.34 | 2.51 | −0.13 | 0.52 |
Property | PC1 | PC2 | PC3 |
---|---|---|---|
MW | 0.383 | −0.486 | 0.270 |
LogP | −0.218 | −0.597 | 0.556 |
nHA | 0.535 | −0.056 | −0.195 |
nHD | 0.097 | 0.578 | 0.757 |
nRot | 0.494 | −0.159 | 0.014 |
TPSA | 0.516 | 0.210 | 0.081 |
Cumulative variance (%) | 54.484 | 82.646 | 94.274 |
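
The loadings and cumulative variance in the table above can be read as follows: each principal component is a weighted sum of the six properties, and the cumulative row is a running total of the per-component variance. The sketch below only re-uses the numbers from the table. It assumes the properties were standardized (z-scored) before PCA, which this section does not state, and the function names are hypothetical.

```python
import math

# Loadings copied from the table above (rows: properties, columns: PC1..PC3).
LOADINGS = {
    "MW":   (0.383, -0.486, 0.270),
    "LogP": (-0.218, -0.597, 0.556),
    "nHA":  (0.535, -0.056, -0.195),
    "nHD":  (0.097, 0.578, 0.757),
    "nRot": (0.494, -0.159, 0.014),
    "TPSA": (0.516, 0.210, 0.081),
}
CUMULATIVE_VARIANCE = (54.484, 82.646, 94.274)  # percent, from the table


def per_component_variance(cumulative):
    """Turn cumulative explained variance into per-component shares."""
    shares, previous = [], 0.0
    for value in cumulative:
        shares.append(round(value - previous, 3))
        previous = value
    return shares


def loading_norm(component):
    """Euclidean length of one loading vector (close to 1 for PCA eigenvectors)."""
    return math.sqrt(sum(row[component] ** 2 for row in LOADINGS.values()))


def project(z_scores):
    """Project one standardized compound (dict of z-scores, an assumption) onto PC1..PC3."""
    return tuple(
        sum(LOADINGS[name][k] * z_scores[name] for name in LOADINGS)
        for k in range(3)
    )


if __name__ == "__main__":
    print(per_component_variance(CUMULATIVE_VARIANCE))
    print([round(loading_norm(k), 3) for k in range(3)])
```

Read this way, PC1 is dominated by nHA, TPSA, and nRot, PC2 contrasts nHD against LogP and MW, and PC3 is driven mostly by nHD and LogP.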
Algorithm | Accuracy (Training) | Accuracy (CV) | Accuracy (Test) | MCC (Training) | MCC (CV) | MCC (Test) |
---|---|---|---|---|---|---|
DT | 0.933 | 0.788 | 0.789 | 0.911 | 0.724 | 0.724 |
ET | 0.933 | 0.818 | 0.833 | 0.911 | 0.766 | 0.784 |
RF | 0.933 | 0.816 | 0.844 | 0.912 | 0.762 | 0.798 |
GB | 0.933 | 0.776 | 0.856 | 0.911 | 0.711 | 0.809 |
LGBM | 0.933 | 0.779 | 0.844 | 0.911 | 0.713 | 0.796 |
XGB | 0.933 | 0.779 | 0.833 | 0.912 | 0.713 | 0.783 |
SVC | 0.768 | 0.718 | 0.667 | 0.693 | 0.629 | 0.559 |
MLP | 0.916 | 0.788 | 0.844 | 0.890 | 0.722 | 0.798 |
LR | 0.760 | 0.670 | 0.633 | 0.681 | 0.568 | 0.509 |
KNN | 0.763 | 0.653 | 0.667 | 0.690 | 0.544 | 0.566 |
NB | 0.542 | 0.514 | 0.456 | 0.401 | 0.365 | 0.298 |
GP | 0.908 | 0.796 | 0.811 | 0.879 | 0.736 | 0.751 |
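
For readers who want to see how the two metrics in the table relate, here is a minimal sketch that computes accuracy and the Matthews correlation coefficient (MCC) from a confusion matrix. It uses the multi-class generalization of MCC, which reduces to the familiar binary formula for two classes. Whether the models here were scored in exactly this way is an assumption, and the confusion matrix in the usage example is invented purely for illustration.

```python
import math


def accuracy(confusion):
    """Fraction of correct predictions; confusion[i][j] = true class i, predicted class j."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


def multiclass_mcc(confusion):
    """Multi-class Matthews correlation coefficient from a square confusion matrix."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))
    # Row sums: how often each class truly occurs.
    true_counts = [sum(confusion[k]) for k in range(n_classes)]
    # Column sums: how often each class was predicted.
    pred_counts = [
        sum(confusion[i][k] for i in range(n_classes)) for k in range(n_classes)
    ]
    numerator = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    # Convention: return 0.0 when the coefficient is undefined (zero denominator).
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    toy = [[5, 1, 0], [1, 4, 1], [0, 1, 5]]  # hypothetical 3-class result
    print(round(accuracy(toy), 3), round(multiclass_mcc(toy), 3))
```

Because MCC corrects for the class distribution of both the true and the predicted labels, it is always at or below accuracy in the table and drops faster for the weaker models such as NB.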
Dataset | N | Ns | Nss | Ncsk | Ns/N | Nss/N | Ncsk/N | Ncsk/Ns |
---|---|---|---|---|---|---|---|---|
Complete | 683 | 268 | 162 | 150 | 0.392 | 0.237 | 0.220 | 0.560 |
pIC50 ≥ 7.0 | 351 | 165 | 105 | 91 | 0.470 | 0.299 | 0.259 | 0.552 |
pIC50 < 7.0 | 332 | 151 | 101 | 101 | 0.455 | 0.304 | 0.304 | 0.627 |
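
The ratio columns in the table above are simple quotients of the count columns. The sketch below shows the arithmetic, reading N as the number of molecules, Ns as the number of distinct scaffolds, Nss as the number of singleton scaffolds (scaffolds seen in only one molecule), and Ncsk as the number of cyclic skeletons. These readings are assumptions, since the column definitions are not given in this section, and the helper names and scaffold labels are hypothetical.

```python
from collections import Counter


def scaffold_counts(scaffolds):
    """Return (N, Ns, Nss) for a list holding one scaffold identifier per molecule."""
    counts = Counter(scaffolds)
    n = len(scaffolds)                                # molecules
    ns = len(counts)                                  # distinct scaffolds
    nss = sum(1 for c in counts.values() if c == 1)   # singleton scaffolds
    return n, ns, nss


def diversity_ratios(n, ns, nss, ncsk):
    """Compute the four ratio columns, rounded to three decimals."""
    return {
        "Ns/N": round(ns / n, 3),
        "Nss/N": round(nss / n, 3),
        "Ncsk/N": round(ncsk / n, 3),
        "Ncsk/Ns": round(ncsk / ns, 3),
    }


if __name__ == "__main__":
    # Counts from the "Complete" row of the table above.
    print(diversity_ratios(683, 268, 162, 150))
    # Toy input: four molecules, three scaffolds, two of them singletons.
    print(scaffold_counts(["A", "A", "B", "C"]))
```

Ratios closer to 1 indicate a more diverse set: almost every molecule has its own scaffold. Ratios closer to 0 indicate that many molecules share a few scaffolds.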
Scaffold ID—Core ID | Effect of Substitutions |
---|---|
1-2 | R4 position: a hydroxyl group can increase bioactivity, but nitrogen-, sulfur-, or halogen-containing groups reduce it. |
2-1 | Hydroxyl group on R2 or R3 position can enhance bioactivities. |
3-1, 3-2, 3-3 | R1 position with hydroxyl group yields strong activities; R2 position with fluorine weakens bioactivities. |
4-1 | R3 position with halogens increases bioactivities; R2 position with sulfonamides decreases bioactivities. |
5-1 | R1 position with halogens has negative impacts on activities. |
7-1 | R2 position: sulfonamide is a plus to the activity, but amidine is a minus. |
8-1 | R1 position must carry a bromine substituent for the compound to be bioactive. |
11-1 | R1 position: sulfonyl group contributes positively to bioactivities, while the ketone group contributes negatively. |
11-2 | R1 position: ketone group contributes positively to activities. |
12-1 | R1 or R2 positions: hydroxyl and amide groups increase bioactivities, while the carbamate group reduces activities. The length of the sidechain also contributes negatively to activities. |
13-1, 13-2, 13-3 | Any substitution on the R1 or R2 position will reduce activities. |
14-1 | R1 or R2 positions: compounds with methyl groups are active. |
16-1, 16-2, 17-1, 18-1, 19-1, 19-2, 19-3, 19-5 | R1 or R2 positions: compounds with fluorine-containing groups are active. |
Model | Scaffold | Fingerprint | Algorithm | Accuracy (Training) | Accuracy (CV) | Accuracy (Test) |
---|---|---|---|---|---|---|
II | 1,4 | KRC | ET | 0.978 | 0.842 | 0.783 |
III | 2,12 | KR | RF | 0.987 | 0.771 | 0.706 |
IV | 6,7,11,20 | KRC | XGB | 0.940 | 0.741 | 0.735 |
V | 3 | KRC | RF | 0.977 | 0.907 | 0.970 |
VI | 5 | PC | GP | 0.964 | 0.879 | 0.905 |
VII | 16,17,18,19 | KRC | XGB | 0.929 | 0.854 | 0.888 |
VIII | 13 | PC | RF | 0.960 | 0.920 | 0.913 |
Algorithm | Abbr | Type | Description |
---|---|---|---|
Decision tree | DT | Tree model | Tree-structured decision support tool. Both classification and regression. |
Extra trees | ET | Ensemble learning | Extremely randomized trees. Meta-estimator consisting of a multitude of decision trees. Predictions are conducted by averaging the prediction of trees in regression tasks or using majority voting in classification tasks. Unlike random forests that develop each decision tree from a bootstrap sample of the training set, it fits each decision tree upon the entire training dataset. Both classification and regression. |
Random forest | RF | Ensemble learning | Meta-estimator consisting of a multitude of decision trees, making predictions by averaging the decision tree predictions. Fits each decision tree on a bootstrap sample of the training set. Belongs to the bagging ensemble algorithm. Both classification and regression. |
Gradient boost | GB | Ensemble learning | Boosting ensemble algorithm, the generalization of AdaBoost. A forward-learning ensemble algorithm that obtains predictive results using gradually improved estimations. Both classification and regression. |
LightGBM | LGBM | Ensemble learning | Light gradient-boosting machine. A gradient-boosting algorithm based on decision trees, designed for high training speed and low memory use. Grows trees leaf-wise (vertically) rather than level-wise. Suitable for large datasets. Both classification and regression. |
Extreme gradient boost | XGB | Ensemble learning | Extreme gradient boosting. A tree-based ensemble machine learning algorithm that is a scalable, optimized distributed machine learning system for tree boosting. Both classification and regression. |
Multilayer perceptron | MLP | Artificial neural network | Consists of input and output layers with one or more hidden layers in between. Each node is a neuron that applies an activation function. Trained by backpropagation. Both classification and regression. |
Logistic regression | LR | Linear model | Models the probability of class membership by applying a logistic function to a linear combination of the independent variables. Classification. |
K-nearest neighbor | KNN | Non-parametric | A simple algorithm that stores all available cases and predicts the target of a new case from its most similar stored cases. Both classification and regression. |
Support vector machine | SVM | Kernel function | Support vector machine constructs a hyperplane in multidimensional space to separate different classes. SVM generates hyperplanes in an iterative manner to minimize an error. Both classification and regression. |
Naive Bayes | NB | Naive Bayes | Simple, fast probabilistic classifier based on Bayes' theorem, with the assumption that features are independent. Classification. |
Gaussian process | GP | Non-parametric | Nonparametric Bayesian algorithm that infers a probability distribution over all possible values. Both classification and regression. |
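
The difference between random forest and extra trees described in the table comes down to which rows each tree is trained on, while both combine tree outputs by majority vote in classification. The following is a minimal sketch of just those two ingredients, not of either full algorithm and not of the implementation used in the study. All function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) items with replacement (the random forest setup)."""
    return [rng.choice(rows) for _ in rows]


def full_sample(rows):
    """Use every training row exactly once (the extra trees setup)."""
    return list(rows)


def majority_vote(predictions):
    """Return the most common class label among the per-tree predictions.

    Ties are broken in favour of the label that appears first.
    """
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    training_rows = list(range(10))
    print(sorted(bootstrap_sample(training_rows, rng)))
    print(full_sample(training_rows))
    print(majority_vote(["active", "inactive", "active"]))
```

A bootstrap sample usually repeats some rows and omits others, which is what decorrelates the trees in a random forest. Extra trees instead gets its diversity from randomized split thresholds, which this sketch does not model.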
Model | I | II | III | IV | V | VI | VII | VIII |
---|---|---|---|---|---|---|---|---|
Training | 358 | 89 | 78 | 134 | 129 | 83 | 184 | 275 |
Test | 90 | 23 | 34 | 34 | 33 | 21 | 80 | 69 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yu, T.; Huang, T.; Yu, L.; Nantasenamat, C.; Anuwongcharoen, N.; Piacham, T.; Ren, R.; Chiang, Y.-C. Exploring the Chemical Space of CYP17A1 Inhibitors Using Cheminformatics and Machine Learning. Molecules 2023, 28, 1679. https://doi.org/10.3390/molecules28041679