Artificial Intelligence & Deep Learning Approaches for Structural Bioinformatics

A special issue of International Journal of Molecular Sciences (ISSN 1422-0067). This special issue belongs to the section "Molecular Informatics".

Deadline for manuscript submissions: closed (31 March 2021) | Viewed by 59980

Special Issue Editors


Guest Editor
Department of Biological Research on the Red Blood Cells, INTS, INSERM UMR_S 1134, Université de Paris, Université de la Réunion, 75739 Paris, France
Interests: structural bioinformatics; bioinformatics; next-generation sequencing; drug design; deep learning

Guest Editor
Department of Biological Research on the Red Blood Cells, INTS, Univ of Paris, Paris, France
Interests: structural bioinformatics; bioinformatics; deep learning; machine learning; protein structure

Special Issue Information

Dear Colleagues,

Deep learning methods have revolutionized the field of machine learning since their rapid development began in 2012. These methods have shown spectacular performance in fields as varied as digital image recognition and interpretation, natural language processing, and the game of Go. These approaches have made it possible to tackle complex bioinformatics problems, especially in the field of structural bioinformatics. Deep learning methods open up new avenues of research for problems that until recently were considered too complex and for fields that were no longer progressing. Recently, problems as varied as secondary structure prediction, model quality assessment, and structure prediction from co-evolutionary constraints have shown striking improvement thanks to deep learning approaches. In this Special Issue, we propose a broad overview of recent advances in various fields of structural bioinformatics that have benefited from the contributions of deep learning and artificial intelligence. Potential topics include, but are not limited to, the following:

  • Structural Bioinformatics 1: Local structure prediction (secondary structure, local conformation & phi & psi angle, accessibility)
  • Structural Bioinformatics 2: Global structure prediction (Co-evolution contact map)
  • Structural Bioinformatics 3: Structural model assessment
  • Protein/Protein interactions
  • Drug discovery and drug design
  • Protein function and modification predictions
  • Computational protein design
  • Antibody design and prediction

Prof. Dr. Alexandre G. de Brevern
Prof. Jean-Christophe Gelly
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. International Journal of Molecular Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. There is an Article Processing Charge (APC) for publication in this open access journal. For details about the APC please see here. Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Protein structure prediction
  • Methods for 1D Protein Structural predictions
  • Methods for 2D Protein Structural predictions
  • Machine learning
  • Protein-protein and protein-ligand interactions
  • Biotechnology
  • Mining protein data
  • Accelerate AI development
  • Structural analysis of proteins

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (11 papers)

Research

15 pages, 2825 KiB  
Article
Structural Dynamics Predominantly Determine the Adaptability of Proteins to Amino Acid Deletions
by Anupam Banerjee and Ivet Bahar
Int. J. Mol. Sci. 2023, 24(9), 8450; https://doi.org/10.3390/ijms24098450 - 8 May 2023
Cited by 1 | Viewed by 1818
Abstract
The insertion or deletion (indel) of amino acids has a variety of effects on protein function, ranging from disease-forming changes to gaining new functions. Despite their importance, indels have not been systematically characterized towards protein engineering or modification goals. In the present work, we focus on deletions composed of multiple contiguous amino acids (mAA-dels) and their effects on the protein (mutant) folding ability. Our analysis reveals that the mutant retains the native fold when the mAA-del obeys well-defined structural dynamics properties: localization in intrinsically flexible regions, showing low resistance to mechanical stress, and separation from allosteric signaling paths. Motivated by the possibility of distinguishing the features that underlie the adaptability of proteins to mAA-dels, and by the rapid evaluation of these features using elastic network models, we developed a positive-unlabeled learning-based classifier that can be adopted for protein design purposes. Trained on a consolidated set of features, including those reflecting the intrinsic dynamics of the regions where the mAA-dels occur, the new classifier yields a high recall of 84.3% for identifying mAA-dels that are stably tolerated by the protein. The comparative examination of the relative contribution of different features to the prediction reveals the dominant role of structural dynamics in enabling the adaptation of the mutant to mAA-del without disrupting the native fold. Full article

15 pages, 3167 KiB  
Article
UMPred-FRL: A New Approach for Accurate Prediction of Umami Peptides Using Feature Representation Learning
by Phasit Charoenkwan, Chanin Nantasenamat, Md Mehedi Hasan, Mohammad Ali Moni, Balachandran Manavalan and Watshara Shoombuatong
Int. J. Mol. Sci. 2021, 22(23), 13124; https://doi.org/10.3390/ijms222313124 - 4 Dec 2021
Cited by 54 | Viewed by 4145
Abstract
Umami ingredients have been identified as important factors in food seasoning and production. Traditional experimental methods for characterizing peptides exhibiting umami sensory properties (umami peptides) are time-consuming, laborious, and costly. As a result, it is preferable to develop computational tools for the large-scale identification of available sequences in order to identify novel peptides with umami sensory properties. Although a computational tool has been developed for this purpose, its predictive performance is still insufficient. In this study, we use a feature representation learning approach to create a novel machine-learning meta-predictor called UMPred-FRL for improved umami peptide identification. We combined six well-known machine learning algorithms (extremely randomized trees, k-nearest neighbor, logistic regression, partial least squares, random forest, and support vector machine) with seven different feature encodings (amino acid composition, amphiphilic pseudo-amino acid composition, dipeptide composition, composition-transition-distribution, and pseudo-amino acid composition) to develop the final meta-predictor. Extensive experimental results demonstrated that UMPred-FRL was effective and achieved more accurate performance on the benchmark dataset compared to its baseline models, and consistently outperformed the existing method on the independent test dataset. Finally, to aid in the high-throughput identification of umami peptides, the UMPred-FRL web server was established and made freely available online. It is expected that UMPred-FRL will be a powerful tool for the cost-effective large-scale screening of candidate peptides with potential umami sensory properties. Full article
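Two of the feature encodings named in this abstract, amino acid composition and dipeptide composition, are simple enough to sketch. The snippet below is an illustrative reading of those standard encodings, not the UMPred-FRL implementation; the function names and the choice to normalize by peptide length are assumptions made here.

```python
from itertools import product

# The 20 standard amino acids, in a fixed (alphabetical) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide):
    """Return a 20-element vector: the fraction of each standard residue.

    Hypothetical helper for illustration; non-standard letters are ignored
    in the counts but still contribute to the length.
    """
    if not peptide:
        raise ValueError("peptide must not be empty")
    n = len(peptide)
    return [peptide.count(aa) / n for aa in AMINO_ACIDS]


def dipeptide_composition(peptide):
    """Return a 400-element vector: the fraction of each adjacent residue pair."""
    if len(peptide) < 2:
        raise ValueError("peptide must contain at least two residues")
    # Overlapping pairs: positions (0,1), (1,2), ...
    pairs = [peptide[i:i + 2] for i in range(len(peptide) - 1)]
    total = len(pairs)
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    return [pairs.count(k) / total for k in keys]
```

Vectors of this kind would typically be concatenated or fed separately to the classifiers listed in the abstract; how the published meta-predictor combines them is not shown here.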

15 pages, 3732 KiB  
Article
PYTHIA: Deep Learning Approach for Local Protein Conformation Prediction
by Gabriel Cretin, Tatiana Galochkina, Alexandre G. de Brevern and Jean-Christophe Gelly
Int. J. Mol. Sci. 2021, 22(16), 8831; https://doi.org/10.3390/ijms22168831 - 17 Aug 2021
Cited by 6 | Viewed by 3863
Abstract
Protein Blocks (PBs) are a widely used structural alphabet describing local protein backbone conformation in terms of 16 possible conformational states, adopted by five consecutive amino acids. The representation of complex protein 3D structures as 1D PB sequences was previously successfully applied to protein structure alignment and protein structure prediction. In the current study, we present a new model, PYTHIA (predicting any conformation at high accuracy), for the prediction of the protein local conformations in terms of PBs directly from the amino acid sequence. PYTHIA is based on a deep residual inception-inside-inception neural network with convolutional block attention modules, predicting 1 of 16 PB classes from evolutionary information combined with physicochemical properties of individual amino acids. PYTHIA clearly outperforms the LOCUSTRA reference method for all PB classes and demonstrates great performance for PB prediction on particularly challenging proteins from the CASP14 free modelling category. Full article

15 pages, 11367 KiB  
Article
VirtualFlow Ants—Ultra-Large Virtual Screenings with Artificial Intelligence Driven Docking Algorithm Based on Ant Colony Optimization
by Christoph Gorgulla, Süleyman Selim Çınaroğlu, Patrick D. Fischer, Konstantin Fackeldey, Gerhard Wagner and Haribabu Arthanari
Int. J. Mol. Sci. 2021, 22(11), 5807; https://doi.org/10.3390/ijms22115807 - 28 May 2021
Cited by 18 | Viewed by 5040
Abstract
The docking program PLANTS, which is based on the ant colony optimization (ACO) algorithm, has many advanced features for molecular docking. Among them are multiple scoring functions, the possibility to model explicit displaceable water molecules, and the inclusion of experimental constraints. Here, we add support for PLANTS to VirtualFlow (VirtualFlow Ants), which adds a valuable method for primary virtual screenings and rescoring procedures. Furthermore, we have added support for ligand libraries in the MOL2 format, as well as on-the-fly conversion of ligand libraries from the PDBQT format to the MOL2 format, to give VirtualFlow Ants increased flexibility regarding ligand libraries. The on-the-fly conversion is carried out with Open Babel and the program SPORES. We applied VirtualFlow Ants to a test system involving KEAP1 on Google Cloud with up to 128,000 CPUs, and the observed scaling behavior is approximately linear. Furthermore, we have adjusted several central docking parameters of PLANTS (such as the speed parameter or the number of ants) and screened 10 million compounds for each of the 10 resulting docking scenarios. We analyzed their docking scores and average docking times, which are key factors in virtual screenings. The possibility of carrying out ultra-large virtual screening with PLANTS via VirtualFlow Ants opens new avenues in computational drug discovery. Full article

13 pages, 5241 KiB  
Article
DeepDISE: DNA Binding Site Prediction Using a Deep Learning Method
by Samuel Godfrey Hendrix, Kuan Y. Chang, Zeezoo Ryu and Zhong-Ru Xie
Int. J. Mol. Sci. 2021, 22(11), 5510; https://doi.org/10.3390/ijms22115510 - 24 May 2021
Cited by 6 | Viewed by 4722
Abstract
It is essential for future research to develop a new, reliable prediction method of DNA binding sites because DNA binding sites on DNA-binding proteins provide critical clues about protein function and drug discovery. However, the current prediction methods of DNA binding sites have relatively poor accuracy. Using the 3D coordinates and atom types of surface protein atoms as the input, we trained and tested a deep learning model to predict how likely a voxel on the protein surface is to be a DNA-binding site. Based on three different evaluation datasets, the results show that our model not only outperforms several previous methods on two commonly used datasets, but also performs consistently across the three datasets. The visualized prediction outcomes show that the binding sites are also mostly located in correct regions. We successfully built a deep learning model to predict the DNA binding sites on target proteins. It demonstrates that 3D protein structures plus atom-type information on protein surfaces can be used to predict the potential binding sites on a protein. This approach should be further extended to predict the binding sites of other important biological molecules. Full article

15 pages, 577 KiB  
Article
Drug Target Identification with Machine Learning: How to Choose Negative Examples
by Matthieu Najm, Chloé-Agathe Azencott, Benoit Playe and Véronique Stoven
Int. J. Mol. Sci. 2021, 22(10), 5118; https://doi.org/10.3390/ijms22105118 - 12 May 2021
Cited by 11 | Viewed by 4117
Abstract
Identification of the protein targets of hit molecules is essential in the drug discovery process. Target prediction with machine learning algorithms can help accelerate this search, limiting the number of required experiments. However, Drug-Target Interactions databases used for training present high statistical bias, leading to a high number of false positives, thus increasing time and cost of experimental validation campaigns. To minimize the number of false positives among predicted targets, we propose a new scheme for choosing negative examples, so that each protein and each drug appears an equal number of times in positive and negative examples. We artificially reproduce the process of target identification for three specific drugs, and more globally for 200 approved drugs. For the detailed three drug examples, and for the larger set of 200 drugs, training with the proposed scheme for the choice of negative examples improved target prediction results: the average number of false positives among the top ranked predicted targets decreased, and overall, the rank of the true targets was improved. Our method corrects databases’ statistical bias and reduces the number of false positive predictions, and therefore the number of useless experiments potentially undertaken. Full article
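The balancing idea in this abstract (each drug and each protein appearing as often among negatives as among positives) can be illustrated with a small greedy heuristic. This is only a sketch under assumptions made here: the function name is hypothetical, the greedy strategy is not the authors' published procedure, and it can leave some quotas unfilled where an exact matching would succeed.

```python
from collections import Counter


def sample_balanced_negatives(positives):
    """Greedily pick negative (drug, protein) pairs.

    Hypothetical helper: each drug and protein appears in the negatives
    at most as often as it appears in the positives, and no known
    positive pair is ever returned as a negative.
    """
    positives = set(positives)
    drug_quota = Counter(d for d, _ in positives)
    prot_quota = Counter(p for _, p in positives)

    negatives = []
    chosen = set()
    # Handle the drugs with the most positives first; ties broken by name.
    for drug in sorted(drug_quota, key=lambda d: (-drug_quota[d], d)):
        for _ in range(drug_quota[drug]):
            candidates = [
                p for p in prot_quota
                if prot_quota[p] > 0
                and (drug, p) not in positives
                and (drug, p) not in chosen
            ]
            if not candidates:
                break  # quota for this drug cannot be filled greedily
            # Prefer the protein with the largest remaining quota.
            protein = min(candidates, key=lambda p: (-prot_quota[p], p))
            prot_quota[protein] -= 1
            chosen.add((drug, protein))
            negatives.append((drug, protein))
    return negatives
```

On a toy input such as `[("d1", "p1"), ("d2", "p2")]` the heuristic pairs each drug with the other drug's target. For larger databases a randomized or matching-based selection would likely be needed to meet every quota.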

26 pages, 25569 KiB  
Article
Methods for Identifying Microbial Natural Product Compounds that Target Kinetoplastid RNA Structural Motifs by Homology and De Novo Modeled 18S rRNA
by Harrison Ndung’u Mwangi, Edward Kirwa Muge, Peter Waiganjo Wagacha, Albert Ndakala and Francis Jackim Mulaa
Int. J. Mol. Sci. 2021, 22(9), 4493; https://doi.org/10.3390/ijms22094493 - 26 Apr 2021
Cited by 1 | Viewed by 4074
Abstract
The development of novel anti-infectives against Kinetoplastids pathogens targeting proteins is a big problem occasioned by the antigenic variation in these parasites. This is also a global concern due to the zoonosis of these parasites, as they infect both humans and animals. Therefore, we need not only to create novel antibiotics, but also to speed up the development pipeline for these antibiotics. This may be achieved by using novel drug targets for Kinetoplastids drug discovery. In this study, we focused our attention on motifs of rRNA molecules that have been created using homology modeling. The RNA is the most ambiguous biopolymer in the kinetoplastid, which carries many different functions. For instance, tRNAs, rRNAs, and mRNAs are essential for gene expression both in the pro- and eukaryotes. However, all these types of RNAs have sequences with unique 3D structures that are specific for kinetoplastids only and can be used to shut down essential biochemical processes in kinetoplastids only. All these features make RNA very potent targets for antibacterial drug development. Here, we combine in silico methods with both computational biology and structure prediction tools to address our hypothesis. In this study, we outline a systematic approach for identifying kinetoplastid rRNA-ligand interactions and, more specifically, techniques that can be used to identify small molecules that target particular RNA. The high-resolution optimized model structures of these kinetoplastids were generated using RNA 123, where all the stereochemical conflicts were solved and energies minimized to attain the best biological qualities. These models were further analyzed to give their docking assessment reliability. Docking strategies, virtual screening, and fishing approaches successfully recognized novel and myriad macromolecular targets for the myxobacterial natural products with high binding affinities to exploit the unmet therapeutic needs. We demonstrate a sensible exploitation of virtual screening strategies to 18S rRNA using natural products interfaced with classical maximization of their efficacy in pharmacognosy strategies that are well established. Integration of these virtual screening strategies in natural products chemistry and biochemistry research will spur the development of potential interventions to these tropical neglected diseases. Full article

17 pages, 1656 KiB  
Article
DockingApp RF: A State-of-the-Art Novel Scoring Function for Molecular Docking in a User-Friendly Interface to AutoDock Vina
by Gabriele Macari, Daniele Toti, Andrea Pasquadibisceglie and Fabio Polticelli
Int. J. Mol. Sci. 2020, 21(24), 9548; https://doi.org/10.3390/ijms21249548 - 15 Dec 2020
Cited by 19 | Viewed by 4879
Abstract
Motivation: Bringing a new drug to the market is expensive and time-consuming. To cut the costs and time, computer-aided drug design (CADD) approaches have been increasingly included in the drug discovery pipeline. However, although traditional docking tools show a good conformational space sampling ability, they are still unable to produce accurate binding affinity predictions. This work presents a novel scoring function for molecular docking seamlessly integrated into DockingApp, a user-friendly graphical interface for AutoDock Vina. The proposed function is based on a random forest model and a selection of specific features to overcome the existing limits of Vina’s original scoring mechanism. A novel version of DockingApp, named DockingApp RF, has been developed to host the proposed scoring function and to automate the rescoring of AutoDock Vina's output, even for nonexpert users. Results: By coupling intermolecular interaction, solvent accessible surface area features and Vina’s energy terms, DockingApp RF’s new scoring function is able to improve the binding affinity prediction of AutoDock Vina. Furthermore, comparison tests carried out on the CASF-2013 and CASF-2016 datasets demonstrate that DockingApp RF’s performance is comparable to other state-of-the-art machine-learning- and deep-learning-based scoring functions. The new scoring function thus represents a significant advancement in terms of the reliability and effectiveness of docking compared to AutoDock Vina’s scoring function. At the same time, the characteristics that made DockingApp appealing to a wide range of users are retained in this new version and have been complemented with additional features. Full article

Review

19 pages, 1184 KiB  
Review
Recent Applications of Deep Learning Methods on Evolution- and Contact-Based Protein Structure Prediction
by Donghyuk Suh, Jai Woo Lee, Sun Choi and Yoonji Lee
Int. J. Mol. Sci. 2021, 22(11), 6032; https://doi.org/10.3390/ijms22116032 - 2 Jun 2021
Cited by 18 | Viewed by 6248
Abstract
The new advances in deep learning methods have influenced many aspects of scientific research, including the study of the protein system. The prediction of proteins’ 3D structural components is now heavily dependent on machine learning techniques that interpret how protein sequences and their homology govern the inter-residue contacts and structural organization. In particular, methods employing deep neural networks have had a significant impact on the recent CASP13 and CASP14 competitions. Here, we explore the recent applications of deep learning methods in the protein structure prediction area. We also look at the potential opportunities for deep learning methods to identify unknown protein structures and functions to be discovered and help guide drug–target interactions. Although significant problems still need to be addressed, we expect these techniques in the near future to play crucial roles in protein structural bioinformatics as well as in drug discovery. Full article

30 pages, 3308 KiB  
Review
Deep Learning-Based Advances in Protein Structure Prediction
by Subash C. Pakhrin, Bikash Shrestha, Badri Adhikari and Dukka B. KC
Int. J. Mol. Sci. 2021, 22(11), 5553; https://doi.org/10.3390/ijms22115553 - 24 May 2021
Cited by 71 | Viewed by 14286
Abstract
Obtaining an accurate description of protein structure is a fundamental step toward understanding the underpinning of biology. Although recent advances in experimental approaches have greatly enhanced our capabilities to experimentally determine protein structures, the gap between the number of protein sequences and known protein structures is ever increasing. Computational protein structure prediction is one of the ways to fill this gap. Recently, the protein structure prediction field has witnessed a lot of advances due to Deep Learning (DL)-based approaches as evidenced by the success of AlphaFold2 in the most recent Critical Assessment of protein Structure Prediction (CASP14). In this article, we highlight important milestones and progress in the field of protein structure prediction due to DL-based methods as observed in CASP experiments. We describe advances in various steps of the protein structure prediction pipeline, viz. protein contact map prediction, protein distogram prediction, protein real-valued distance prediction, and Quality Assessment/refinement. We also highlight some end-to-end DL-based approaches for protein structure prediction. Additionally, as there have been some recent DL-based advances in protein structure determination using cryo-electron microscopy (cryo-EM), we also highlight some of the important progress in that field. Finally, we provide an outlook and possible future research directions for DL-based approaches in the protein structure prediction arena. Full article

21 pages, 2120 KiB  
Review
Comprehensive Survey and Comparative Assessment of RNA-Binding Residue Predictions with Analysis by RNA Type
by Kui Wang, Gang Hu, Zhonghua Wu, Hong Su, Jianyi Yang and Lukasz Kurgan
Int. J. Mol. Sci. 2020, 21(18), 6879; https://doi.org/10.3390/ijms21186879 - 19 Sep 2020
Cited by 14 | Viewed by 4818
Abstract
With close to 30 sequence-based predictors of RNA-binding residues (RBRs), this comparative survey aims to help with understanding and selection of the appropriate tools. We discuss past reviews on this topic, survey a comprehensive collection of predictors, and comparatively assess six representative methods. We provide a novel and well-designed benchmark dataset and we are the first to report and compare protein-level and datasets-level results, and to contextualize performance to specific types of RNAs. The methods considered here are well-cited and rely on machine learning algorithms on occasion combined with homology-based prediction. Empirical tests reveal that they provide relatively accurate predictions. Virtually all methods perform well for the proteins that interact with rRNAs, some generate accurate predictions for mRNAs, snRNA, SRP and IRES, while proteins that bind tRNAs are predicted poorly. Moreover, except for DRNApred, they confuse DNA and RNA-binding residues. None of the six methods consistently outperforms the others when tested on individual proteins. This variable and complementary protein-level performance suggests that users should not rely on applying just the single best dataset-level predictor. We recommend that future work should focus on the development of approaches that facilitate protein-level selection of accurate predictors and the consensus-based prediction of RBRs. Full article