Molecular Similarity Perception Based on Machine-Learning Models
Abstract
1. Introduction
2. Results and Discussion
2.1. Rational Selection of the Dataset and Human Experts Similarity Assessment
2.2. Building and Validation of the Models
3. Materials and Methods
3.1. The Data Set Used for Human Assessments
3.2. The Franco et al. Training Set
3.3. The 2D Protocol
3.4. The OpenEye 3D Protocol
3.5. Training of Similarity Prediction Models
3.6. Model Performance Evaluation
3.7. Model Implementation
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Model Type | Variables | Fit Ncorrect | Fit ROC AUC | Validation Ncorrect | Validation ROC AUC |
---|---|---|---|---|---|
single-feature | tXT | 81 | 0.920 | 92 | 0.988 |
single-feature | tCS | 70 | 0.845 | 92 | 0.970 |
double-feature | tXT, tCS | 84 | 0.924 | 95 | 0.988 |

Model Type | Variables | Fit Ncorrect | Fit ROC AUC | Validation Ncorrect | Validation ROC AUC |
---|---|---|---|---|---|
single-feature | tXT | 93 | 0.988 | 81 | 0.920 |
single-feature | tCS | 91 | 0.970 | 69 | 0.845 |
double-feature | tXT, tCS | 95 | 0.988 | 81 | 0.916 |

Model | ω0 | ω1 | ω2 |
---|---|---|---|
Equation (2), tXT | −4.860 | 8.449 | - |
Equation (2), tCS | −4.464 | 3.554 | - |
Equation (3) | −5.605 | 5.214 | 2.009 |
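
The coefficients in the table above can be turned into a similarity probability with a few lines of Python. This is a minimal sketch, not the authors' implementation. It assumes that Equations (2) and (3), which are not reproduced in this section, are logistic models of the form p = 1 / (1 + exp(−(ω0 + ω1·x1 + ω2·x2))), and that in Equation (3) ω1 multiplies tXT and ω2 multiplies tCS. The names `COEFFS`, `logistic`, and `similarity_probability` are hypothetical.

```python
import math

# Coefficients copied from the table above, ordered (ω0, ω1[, ω2]).
# ASSUMPTION: in the two-feature model ω1 pairs with tXT and ω2 with tCS.
COEFFS = {
    "tXT": (-4.860, 8.449),
    "tCS": (-4.464, 3.554),
    "tXT+tCS": (-5.605, 5.214, 2.009),
}


def logistic(z: float) -> float:
    """Standard logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def similarity_probability(model: str, *features: float) -> float:
    """Return the probability that a pair is judged similar.

    ASSUMPTION: the model is a logistic regression on the given
    similarity coefficients; the source equations are not shown here.
    """
    intercept, *weights = COEFFS[model]
    if len(features) != len(weights):
        raise ValueError(
            f"model {model!r} expects {len(weights)} feature(s), "
            f"got {len(features)}"
        )
    score = intercept + sum(w * x for w, x in zip(weights, features))
    return logistic(score)


if __name__ == "__main__":
    # Hypothetical similarity values for one pair of molecules.
    print(similarity_probability("tXT", 0.70))
    print(similarity_probability("tXT+tCS", 0.70, 1.20))
```

Under these assumptions, the single-feature tXT model crosses a probability of 0.5 where ω0 + ω1·tXT = 0, that is, at tXT = 4.860 / 8.449.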
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gandini, E.; Marcou, G.; Bonachera, F.; Varnek, A.; Pieraccini, S.; Sironi, M. Molecular Similarity Perception Based on Machine-Learning Models. Int. J. Mol. Sci. 2022, 23, 6114. https://doi.org/10.3390/ijms23116114