Machine Learning for Molecular Modelling in Drug Design

A special issue of Biomolecules (ISSN 2218-273X). This special issue belongs to the section "Bioinformatics and Systems Biology".

Deadline for manuscript submissions: closed (25 September 2018) | Viewed by 42172

Special Issue Editor


Guest Editor
Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France
Interests: structural bioinformatics; cancer pharmaco-omics modelling; biomarker discovery; precision oncology; chemoinformatics; drug discovery informatics; virtual screening; machine learning

Special Issue Information

Dear Colleagues,

Machine Learning (ML) has become a crucial component of early drug discovery. This research area has been fuelled by two main factors. The first is the fast-growing availability of relevant experimental data. Examples of such data are bioactivities of molecules with known chemical structures against non-molecular targets (cell lines, mouse models, etc.), binding affinities of such molecules for macromolecular targets, and X-ray crystal structures of proteins acting as drug targets. This trend has been catalysed by community resources (e.g., ChEMBL, PubChem and the PDB, to name a few) that curate these data sets and facilitate their reuse for predictive modelling. The second factor is easy access to high-quality R and Python implementations of a range of ML algorithms, along with the continuous introduction of new advances (e.g., XGBoost, deep learning or conformal prediction). As a result, an increasing number of data-driven ML models are being proposed and found to be advantageous for identifying new starting points in the drug discovery process.

We invite scientists working in this area to submit original research or review articles for publication in this Special Issue. Topics of interest include (but are not limited to) docking, QSAR, target prediction, virtual screening and lead optimization. Both application and methodology studies are welcome.

Dr. Pedro J. Ballester
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Biomolecules is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Predictive modelling
  • Docking
  • QSAR
  • Virtual screening
  • Lead optimization
  • Target prediction
  • Drug design

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (6 papers)

Editorial

3 pages, 175 KiB  
Editorial
Machine Learning for Molecular Modelling in Drug Design
by Pedro J. Ballester
Biomolecules 2019, 9(6), 216; https://doi.org/10.3390/biom9060216 - 4 Jun 2019
Cited by 30 | Viewed by 4770
Abstract
Machine learning (ML) has become a crucial component of early drug discovery [...]
(This article belongs to the Special Issue Machine Learning for Molecular Modelling in Drug Design)

Research

17 pages, 829 KiB  
Article
Improving Chemical Autoencoder Latent Space and Molecular De Novo Generation Diversity with Heteroencoders
by Esben Jannik Bjerrum and Boris Sattarov
Biomolecules 2018, 8(4), 131; https://doi.org/10.3390/biom8040131 - 30 Oct 2018
Cited by 115 | Viewed by 11290
Abstract
Chemical autoencoders are attractive models as they combine chemical space navigation with possibilities for de novo molecule generation in areas of interest. This enables them to produce focused chemical libraries around a single lead compound for use early in a drug discovery project. Here, it is shown that the choice of chemical representation, such as strings from the simplified molecular-input line-entry system (SMILES), has a large influence on the properties of the latent space. It is further explored to what extent translating between different chemical representations influences the similarity of the latent space to the SMILES strings or circular fingerprints. By employing SMILES enumeration for either the encoder or the decoder, it is found that the decoder has the largest influence on the properties of the latent space. Training a sequence-to-sequence heteroencoder based on recurrent neural networks (RNNs) with long short-term memory (LSTM) cells to predict different enumerated SMILES strings from the same canonical SMILES string gives the largest similarity between latent space distance and molecular similarity measured as circular fingerprint similarity. Using the output from the code layer in quantitative structure–activity relationship (QSAR) models of five molecular datasets shows that heteroencoder-derived vectors markedly outperform autoencoder-derived vectors as well as models built using ECFP4 fingerprints, underlining the increased chemical relevance of the latent space. However, the use of enumeration during training of the decoder leads to a marked increase in the rate at which molecules are decoded to molecules different from those encoded, a tendency that can be counteracted with more complex network architectures.
(This article belongs to the Special Issue Machine Learning for Molecular Modelling in Drug Design)
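The abstract above compares latent-space distances with circular-fingerprint similarity. The snippet below is a minimal sketch of that kind of comparison, not the paper's code. It assumes Tanimoto similarity on fingerprints stored as sets of on-bit indices, Euclidean distance in the latent space and a Pearson correlation as the agreement measure; the function names and the toy fingerprints and latent vectors are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two latent-space vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def similarity_vs_distance(
    fingerprints: Dict[str, Set[int]], latent: Dict[str, Tuple[float, ...]]
) -> float:
    """Correlate pairwise fingerprint similarity with pairwise latent distance.

    A strongly negative value means structurally similar molecules lie close
    together in the latent space.
    """
    sims: List[float] = []
    dists: List[float] = []
    for m1, m2 in combinations(sorted(fingerprints), 2):
        sims.append(tanimoto(fingerprints[m1], fingerprints[m2]))
        dists.append(euclidean(latent[m1], latent[m2]))
    return pearson(sims, dists)


# Hypothetical toy data: fingerprint on-bit indices and 2-D latent vectors.
FPS = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {6, 7, 8, 9}}
LATENT = {"A": (0.0, 0.0), "B": (0.1, 0.0), "C": (1.0, 1.0)}

if __name__ == "__main__":
    print(tanimoto(FPS["A"], FPS["B"]))  # 3 shared bits out of 5 distinct bits
    print(similarity_vs_distance(FPS, LATENT))
```

With real data, the fingerprints would come from a cheminformatics toolkit and the vectors from the trained encoder's code layer; a more negative correlation would indicate a latent space that better reflects fingerprint similarity.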

11 pages, 835 KiB  
Article
Predicting Aromatic Amine Mutagenicity with Confidence: A Case Study Using Conformal Prediction
by Ulf Norinder, Glenn Myatt and Ernst Ahlberg
Biomolecules 2018, 8(3), 85; https://doi.org/10.3390/biom8030085 - 29 Aug 2018
Cited by 20 | Viewed by 4706
Abstract
The occurrence of mutagenicity in primary aromatic amines has been investigated using conformal prediction. The results of the investigation show that it is possible to develop mathematically proven valid models using conformal prediction, and that the existence of uncertain prediction classes, such as "both" (both classes assigned to a compound) and "empty" (no class assigned to a compound), provides the user with additional information on how to use, further develop, and possibly improve future models. The study also indicates that using different sets of fingerprints results in models whose ability to discriminate varies with the set level of acceptable errors.
(This article belongs to the Special Issue Machine Learning for Molecular Modelling in Drug Design)
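The "both" and "empty" outcomes mentioned above arise naturally in conformal classification. The snippet below is a minimal sketch, assuming an inductive, class-conditional (Mondrian) conformal predictor whose nonconformity score is 1 minus the underlying model's predicted probability for a class. That is a common choice but not necessarily the one used in the study; the function names, labels and calibration scores are hypothetical.

```python
from typing import Dict, Sequence, Set


def mondrian_p_value(cal_scores: Sequence[float], test_score: float) -> float:
    """Class-conditional conformal p-value.

    Fraction of same-class calibration compounds that are at least as
    nonconforming as the test compound, with the usual +1 correction.
    """
    at_least_as_strange = sum(1 for s in cal_scores if s >= test_score)
    return (at_least_as_strange + 1) / (len(cal_scores) + 1)


def prediction_set(
    cal: Dict[int, Sequence[float]],
    test_probs: Dict[int, float],
    epsilon: float,
) -> Set[int]:
    """Return every label whose p-value exceeds the significance level epsilon.

    cal maps each label to the nonconformity scores of calibration compounds
    with that true label; test_probs maps each label to the model probability
    for the test compound. Nonconformity = 1 - probability (an assumption).
    """
    region: Set[int] = set()
    for label, prob in test_probs.items():
        score = 1.0 - prob
        if mondrian_p_value(cal[label], score) > epsilon:
            region.add(label)
    return region


# Hypothetical calibration scores (1 - predicted probability of the true class)
# for 9 non-mutagenic (label 0) and 9 mutagenic (label 1) calibration amines.
CAL = {
    0: [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.25, 0.30],
    1: [0.01, 0.04, 0.06, 0.09, 0.11, 0.14, 0.18, 0.22, 0.40],
}

if __name__ == "__main__":
    print(prediction_set(CAL, {0: 0.9, 1: 0.1}, epsilon=0.2))   # single class
    print(prediction_set(CAL, {0: 0.5, 1: 0.5}, epsilon=0.2))   # empty set
    print(prediction_set(CAL, {0: 0.5, 1: 0.5}, epsilon=0.05))  # both classes
```

Lowering ε (tolerating fewer errors) enlarges the prediction sets, which is why the same ambiguous compound moves from an empty set at ε = 0.2 to a "both" set at ε = 0.05 in this toy example.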

22 pages, 7587 KiB  
Article
In Silico HCT116 Human Colon Cancer Cell-Based Models En Route to the Discovery of Lead-Like Anticancer Drugs
by Sara Cruz, Sofia E. Gomes, Pedro M. Borralho, Cecília M. P. Rodrigues, Susana P. Gaudêncio and Florbela Pereira
Biomolecules 2018, 8(3), 56; https://doi.org/10.3390/biom8030056 - 17 Jul 2018
Cited by 27 | Viewed by 6101
Abstract
To discover new inhibitors against the human colon carcinoma HCT116 cell line, two quantitative structure–activity relationship (QSAR) studies using molecular and nuclear magnetic resonance (NMR) descriptors were developed by exploring machine learning techniques and using the half maximal inhibitory concentration (IC50) as the activity measure. In the first approach (A), regression models were developed using a total of 7339 molecules extracted from the ChEMBL and ZINC databases and recent literature. The performance of the regression models was evaluated by internal and external validation; the best model achieved an R² of 0.75 and 0.73 and a root mean square error (RMSE) of 0.66 and 0.69 for the training and test sets, respectively. Given the inherently time-consuming nature of working with natural products (NPs), we conceived a new NP drug hit discovery strategy (approach B) that consists of frontloading samples with 1D NMR descriptors to predict compounds with anticancer activity prior to bioactivity screening for NP discovery. The NMR QSAR classification models were built using 1D NMR data (¹H and ¹³C) as descriptors, from 50 crude extracts, 55 fractions and five pure compounds obtained from actinobacteria isolated from marine sediments collected off the Madeira Archipelago. The overall prediction accuracy of the best model exceeded 63% for both the training and test sets.
(This article belongs to the Special Issue Machine Learning for Molecular Modelling in Drug Design)
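The regression results above are reported as R² and RMSE. The snippet below is a minimal sketch of both metrics, assuming R² denotes the coefficient of determination (1 - SS_res/SS_tot; some QSAR studies instead report the squared Pearson correlation) and that activities are modelled on a logarithmic pIC50 scale, which the abstract does not state. The helper names and the observed/predicted values are hypothetical.

```python
import math
from typing import Sequence


def pic50_from_nanomolar(ic50_nm: float) -> float:
    """Convert an IC50 in nmol/L to pIC50 = -log10(IC50 in mol/L)."""
    return 9.0 - math.log10(ic50_nm)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted pIC50 values for four compounds.
OBSERVED = [5.0, 6.0, 7.0, 8.0]
PREDICTED = [5.5, 5.5, 7.5, 7.5]

if __name__ == "__main__":
    print(round(rmse(OBSERVED, PREDICTED), 3))       # 0.5
    print(round(r_squared(OBSERVED, PREDICTED), 3))  # 0.8
    print(pic50_from_nanomolar(1000.0))              # 1 µM corresponds to pIC50 6
```

Comparing these metrics on the training and test sets, as the abstract does, is a quick check for overfitting: a large gap between the two would be a warning sign.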

22 pages, 1227 KiB  
Article
Pharmaceutical Machine Learning: Virtual High-Throughput Screens Identifying Promising and Economical Small Molecule Inhibitors of Complement Factor C1s
by Jonathan J. Chen, Lyndsey N. Schmucker and Donald P. Visco, Jr.
Biomolecules 2018, 8(2), 24; https://doi.org/10.3390/biom8020024 - 7 May 2018
Cited by 15 | Viewed by 6379
Abstract
When C1 is excessively activated and insufficiently regulated, tissue damage results. This tissue damage further activates the complement system, which acts to remove the damaged tissue, so that a vicious cycle of activation and tissue damage occurs. Current Food and Drug Administration (FDA)-approved treatments include supplemental recombinant C1 inhibitor, but these are extremely costly, and a more economical solution is desired. In our work, we used an existing data set of 136 compounds that had previously been tested for activity against C1. Using these compounds and their activity data, we created models based on principal component analysis, genetic algorithm and support vector machine approaches to characterize activity. The models were then used to virtually screen the 72-million-compound PubChem repository. This first round of virtual high-throughput screening (vHTS) identified many economical and promising inhibitor candidates, a subset of which was tested to validate their biological activity. These results were used to retrain the models and rescreen PubChem in a second round of vHTS. Hit rates were 57% for the first-round vHTS and 50% for the second-round vHTS. Additional structure–property analysis was performed on the active and inactive compounds to identify interesting scaffolds for further investigation.
(This article belongs to the Special Issue Machine Learning for Molecular Modelling in Drug Design)

8 pages, 675 KiB  
Article
The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction
by Hongjian Li, Jiangjun Peng, Yee Leung, Kwong-Sak Leung, Man-Hon Wong, Gang Lu and Pedro J. Ballester
Biomolecules 2018, 8(1), 12; https://doi.org/10.3390/biom8010012 - 14 Mar 2018
Cited by 51 | Viewed by 7053
Abstract
It has recently been claimed that the outstanding performance of machine-learning scoring functions (SFs) is exclusively due to the presence of training complexes whose proteins are highly similar to those in the test set. Here, we revisit this question using 24 similarity-based training sets, a widely used test set, and four SFs. Three of these SFs employ machine learning instead of the classical linear regression approach of the fourth SF (X-Score, which has the best test set performance out of 16 classical SFs). We have found that the random forest (RF)-based RF-Score-v3 outperforms X-Score even when 68% of the most similar proteins are removed from the training set. In addition, unlike X-Score, RF-Score-v3 keeps learning as the training set grows, becoming substantially more predictive than X-Score when the full 1105 complexes are used for training. These results show that machine-learning SFs owe a substantial part of their performance to training on complexes whose proteins are dissimilar to those in the test set, contrary to what has previously been concluded using the same data. Given that a growing amount of structural and interaction data will be available from academic and industrial sources, this performance gap between machine-learning SFs and classical SFs is expected to widen in the future.
(This article belongs to the Special Issue Machine Learning for Molecular Modelling in Drug Design)
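The study above relies on training sets from which the complexes most similar to the test set are progressively removed. The snippet below sketches one way to build such nested training sets, assuming a precomputed pairwise protein similarity in [0, 1] and a simple maximum-similarity cutoff. It is not the authors' exact protocol, which considered both protein structure and sequence similarity; the function names, complex IDs and similarity values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def max_similarity_to_test(
    train_ids: Sequence[str],
    test_ids: Sequence[str],
    sim: Dict[Tuple[str, str], float],
) -> Dict[str, float]:
    """For each training complex, the highest similarity of its protein to any test protein."""
    return {tr: max(sim[(tr, te)] for te in test_ids) for tr in train_ids}


def nested_training_sets(
    max_sim: Dict[str, float], cutoffs: Sequence[float]
) -> Dict[float, List[str]]:
    """Keep only training complexes whose maximum similarity to the test set is <= cutoff.

    Lower cutoffs discard more of the most similar complexes, so the sets are nested.
    """
    return {c: sorted(cid for cid, s in max_sim.items() if s <= c) for c in cutoffs}


# Hypothetical pairwise protein similarities (0 = unrelated, 1 = identical).
TRAIN = ["c1", "c2", "c3", "c4"]
TEST = ["t1", "t2"]
SIM = {
    ("c1", "t1"): 0.95, ("c1", "t2"): 0.30,
    ("c2", "t1"): 0.40, ("c2", "t2"): 0.10,
    ("c3", "t1"): 0.20, ("c3", "t2"): 0.70,
    ("c4", "t1"): 0.20, ("c4", "t2"): 0.15,
}

if __name__ == "__main__":
    max_sim = max_similarity_to_test(TRAIN, TEST, SIM)
    for cutoff, ids in nested_training_sets(max_sim, [1.0, 0.8, 0.5, 0.3]).items():
        print(cutoff, ids)
```

Each nested set would then be used to train a scoring function and evaluate it on the same fixed test set, so that any change in accuracy can be attributed to the removed similar complexes.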