Protein Structure Prediction with AlphaFold

A special issue of Biomolecules (ISSN 2218-273X). This special issue belongs to the section "Bioinformatics and Systems Biology".

Deadline for manuscript submissions: closed (30 November 2023) | Viewed by 33139

Special Issue Editors


Guest Editor
Toulouse Biotechnology Institute (TBI), CNRS UMR 5504, 135 Avenue de Rangueil, Toulouse, France
Interests: enzymology; chemical biology; protein structure and dynamics; cytochrome P450; AlphaFold2

Guest Editor
Toulouse Biotechnology Institute (TBI), CNRS UMR 5504, 135 Avenue de Rangueil, Toulouse, France
Interests: bioinformatics; structural biology; electron transfer proteins; metabolic engineering; protein function discovery and engineering

Special Issue Information

Dear Colleagues,

AlphaFold, a deep learning AI system developed by DeepMind, has been shown to surpass all previous computational approaches to protein structure prediction. Recent versions have been successfully extended to the modeling of protein complexes, including, in some instances, the potential to address large-scale conformational changes involved in catalysis. The self-sufficient and fully automated end-to-end modeling process differs from previous modeling approaches mainly in its reliance on geometric complementarity rules rather than energy minimization, and in its use of AI-supervised arbitration mechanisms when conflicting structural solutions are generated. DeepMind and EMBL’s European Bioinformatics Institute (EMBL-EBI) have partnered to create the AlphaFold Protein Structure Database (AlphaFold DB), which covers the human proteome and those of 47 other key organisms (mouse, yeast, Arabidopsis, etc.), as well as the manually curated UniProt entries (Swiss-Prot). As of April 2022, AlphaFold DB provides open access to 992,316 protein 3D structure entries, most of them with no experimental structures available.

The objective of this Special Issue of Biomolecules is to publish a collection of original, high-quality papers addressing any aspect of AlphaFold usage, properties and strengths, as well as its limitations or potential improvements, alone or in combination with other experimental or computational approaches. Papers can also address novel biological questions for which AlphaFold-related databases and, more generally, AI-based technologies can provide breakthroughs or novel concepts in structural biology. Particular emphasis may be placed on the structural basis of the dynamic aspects controlling the functions of biological assemblies.

Dr. Philippe Urban
Dr. Denis Pompon
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Biomolecules is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • AlphaFold2
  • AlphaFold strengths and limitations
  • AI-related methods for structural biology
  • computational and experimental complementarities
  • structural bases of molecular assembly dynamics
  • geometric versus energy-based modeling
  • arbitration mechanisms in structural modeling
  • multimer
  • AlphaFold2_advanced

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (9 papers)


Research

20 pages, 3266 KiB  
Article
Impact of Multi-Factor Features on Protein Secondary Structure Prediction
by Benzhi Dong, Zheng Liu, Dali Xu, Chang Hou, Na Niu and Guohua Wang
Biomolecules 2024, 14(9), 1155; https://doi.org/10.3390/biom14091155 - 13 Sep 2024
Viewed by 1034
Abstract
Protein secondary structure prediction (PSSP) plays a crucial role in resolving protein functions and properties. Significant progress has been made in this field in recent years, and the use of a variety of protein-related features, including amino acid sequences, position-specific score matrices (PSSM), amino acid properties, and secondary structure trend factors, to improve prediction accuracy is an important technical route for it. However, a comprehensive evaluation of the impact of these factor features in secondary structure prediction is lacking in the current work. This study quantitatively analyzes the impact of several major factors on secondary structure prediction models using a more explanatory four-class machine learning approach. The applicability of each factor in the different types of methods, the extent to which the different methods work on each factor, and the evaluation of the effect of multi-factor combinations are explored in detail. Through experiments and analyses, it was found that PSSM performs best in methods with strong high-dimensional features and complex feature extraction capabilities, while amino acid sequences, although performing poorly overall, perform relatively well in methods with strong linear processing capabilities. Also, the combination of amino acid properties and trend factors significantly improved the prediction performance. This study provides empirical evidence for future researchers to optimize multi-factor feature combinations and apply them to protein secondary structure prediction models, which is beneficial in further optimizing the use of these factors to enhance the performance of protein secondary structure prediction models. Full article
(This article belongs to the Special Issue Protein Structure Prediction with AlphaFold)

35 pages, 12505 KiB  
Article
Predictive Modeling of Proteins Encoded by a Plant Virus Sheds a New Light on Their Structure and Inherent Multifunctionality
by Brandon G. Roy, Jiyeong Choi and Marc F. Fuchs
Biomolecules 2024, 14(1), 62; https://doi.org/10.3390/biom14010062 - 2 Jan 2024
Cited by 2 | Viewed by 2498
Abstract
Plant virus genomes encode proteins that are involved in replication, encapsidation, cell-to-cell, and long-distance movement, avoidance of host detection, counter-defense, and transmission from host to host, among other functions. Even though the multifunctionality of plant viral proteins is well documented, contemporary functional repertoires of individual proteins are incomplete. However, these can be enhanced by modeling tools. Here, predictive modeling of proteins encoded by the two genomic RNAs, i.e., RNA1 and RNA2, of grapevine fanleaf virus (GFLV) and their satellite RNAs by a suite of protein prediction software confirmed not only previously validated functions (suppressor of RNA silencing [VSR], viral genome-linked protein [VPg], protease [Pro], symptom determinant [Sd], homing protein [HP], movement protein [MP], coat protein [CP], and transmission determinant [Td]) and previously identified putative functions (helicase [Hel] and RNA-dependent RNA polymerase [Pol]), but also predicted novel functions with varying levels of confidence. These include a T3/T7-like RNA polymerase domain for protein 1AVSR, a short-chain reductase for protein 1BHel/VSR, a parathyroid hormone family domain for protein 1EPol/Sd, overlapping domains of unknown function and an ABC transporter domain for protein 2BMP, and DNA topoisomerase domains, transcription factor FBXO25 domain, or DNA Pol subunit cdc27 domain for the satellite RNA protein. Structural predictions for proteins 2AHP/Sd, 2BMP, and 3A? had low confidence, while predictions for proteins 1AVSR, 1BHel*/VSR, 1CVPg, 1DPro, 1EPol*/Sd, and 2CCP/Td retained higher confidence in at least one prediction. This research provided new insights into the structure and functions of GFLV proteins and their satellite protein. Future work is needed to validate these findings. Full article

21 pages, 9044 KiB  
Article
Evaluation of Myocilin Variant Protein Structures Modeled by AlphaFold2
by Tsz Kin Ng, Jie Ji, Qingping Liu, Yao Yao, Wen-Ying Wang, Yingjie Cao, Chong-Bo Chen, Jian-Wei Lin, Geng Dong, Ling-Ping Cen, Chukai Huang and Mingzhi Zhang
Biomolecules 2024, 14(1), 14; https://doi.org/10.3390/biom14010014 - 21 Dec 2023
Cited by 4 | Viewed by 5471
Abstract
Deep neural network-based programs can be applied to protein structure modeling by inputting amino acid sequences. Here, we aimed to evaluate the AlphaFold2-modeled myocilin wild-type and variant protein structures and compare to the experimentally determined protein structures. Molecular dynamic and ligand binding properties of the experimentally determined and AlphaFold2-modeled protein structures were also analyzed. AlphaFold2-modeled myocilin variant protein structures showed high similarities in overall structure to the experimentally determined mutant protein structures, but the orientations and geometries of amino acid side chains were slightly different. The olfactomedin-like domain of the modeled missense variant protein structures showed fewer folding changes than the nonsense variant when compared to the predicted wild-type protein structure. Differences were also observed in molecular dynamics and ligand binding sites between the AlphaFold2-modeled and experimentally determined structures as well as between the wild-type and variant structures. In summary, the folding of the AlphaFold2-modeled MYOC variant protein structures could be similar to that determined by the experiments but with differences in amino acid side chain orientations and geometries. Careful comparisons with experimentally determined structures are needed before the applications of the in silico modeled variant protein structures. Full article

24 pages, 2661 KiB  
Article
AI-Based Homology Modelling of Fatty Acid Transport Protein 1 Using AlphaFold: Structural Elucidation and Molecular Dynamics Exploration
by Ranjitha Acharya, Shilpa S. Shetty, Gollapalli Pavan, Flama Monteiro, Manne Munikumar, Sriram Naresh and Nalilu Suchetha Kumari
Biomolecules 2023, 13(11), 1670; https://doi.org/10.3390/biom13111670 - 20 Nov 2023
Cited by 1 | Viewed by 2020
Abstract
Fatty acid transport protein 1 (FATP1) is an integral transmembrane protein that is involved in facilitating the translocation of long-chain fatty acids (LCFA) across the plasma membrane, thereby orchestrating the importation of LCFA into the cell. FATP1 also functions as an acyl-CoA ligase, catalyzing the ATP-dependent formation of fatty acyl-CoA using LCFA and VLCFA (very-long-chain fatty acids) as substrates. It is expressed in various types of tissues and is involved in the regulation of crucial signalling pathways, thus playing a vital role in numerous physiological and pathological conditions. Structural insight about FATP1 is, thus, extremely important for understanding the mechanism of action of this protein and developing efficient treatments against its anomalous expression and dysregulation, which are often associated with pathological conditions such as breast cancer. As of now, there has been no prior prediction or evaluation of the 3D configuration of the human FATP1 protein, hindering a comprehensive understanding of the distinct functional roles of its individual domains. In our pursuit to unravel the structure of the most commonly expressed isoforms of FATP1, we employed the cutting-edge ALPHAFOLD 2 model for an initial prediction of the entire protein’s structure. This prediction was complemented by molecular dynamics simulations, focusing on the most promising model. We predicted the structure of FATP1 in silico and thoroughly refined and validated it using coarse and molecular dynamics in the absence of the complete crystal structure. Their relative dynamics revealed the different properties of the characteristic FATP1. Full article

14 pages, 4809 KiB  
Article
Prediction of CD44 Structure by Deep Learning-Based Protein Modeling
by Chiara Camponeschi, Benedetta Righino, Davide Pirolli, Alessandro Semeraro, Francesco Ria and Maria Cristina De Rosa
Biomolecules 2023, 13(7), 1047; https://doi.org/10.3390/biom13071047 - 28 Jun 2023
Cited by 3 | Viewed by 2276
Abstract
CD44 is a cell surface glycoprotein transmembrane receptor that is involved in cell–cell and cell–matrix interactions. It crucially associates with several molecules composing the extracellular matrix, the main one of which is hyaluronic acid. It is ubiquitously expressed in various types of cells and is involved in the regulation of important signaling pathways, thus playing a key role in several physiological and pathological processes. Structural information about CD44 is, therefore, fundamental for understanding the mechanism of action of this receptor and developing effective treatments against its aberrant expression and dysregulation frequently associated with pathological conditions. To date, only the structure of the hyaluronan-binding domain (HABD) of CD44 has been experimentally determined. To elucidate the nature of CD44s, the most frequently expressed isoform, we employed the recently developed deep-learning-based tools D-I-TASSER, AlphaFold2, and RoseTTAFold for an initial structural prediction of the full-length receptor, accompanied by molecular dynamics simulations on the most promising model. All three approaches correctly predicted the HABD, with AlphaFold2 outperforming D-I-TASSER and RoseTTAFold in the structural comparison with the crystallographic HABD structure and confidence in predicting the transmembrane helix. Low confidence regions were also predicted, which largely corresponded to the disordered regions of CD44s. These regions allow the receptor to perform its unconventional activity. Full article
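As an illustration of how low-confidence regions such as those mentioned in this abstract can be located, the following is a minimal Python sketch. It is not the authors' workflow. It relies on two facts about AlphaFold output: the per-residue pLDDT score is written to the B-factor column of the PDB file, and AlphaFold DB labels pLDDT below 50 as "very low". The function names, the toy records, and the choice of 50 as the cutoff for reporting are assumptions made for this example.

```python
from typing import Iterable

VERY_LOW_PLDDT = 50.0  # AlphaFold DB labels pLDDT below 50 as "very low"


def make_ca_line(serial: int, res_name: str, res_seq: int, plddt: float) -> str:
    """Build a fixed-column PDB ATOM record for a CA atom (toy coordinates).

    Hypothetical helper, used only to create sample input for this sketch.
    """
    return (
        f"ATOM  {serial:>5d}  CA  {res_name:>3s} A{res_seq:>4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}           C"
    )


def read_plddt(pdb_lines: Iterable[str]) -> dict[int, float]:
    """Map residue number to pLDDT, read from the B-factor field of CA atoms."""
    plddt: dict[int, float] = {}
    for line in pdb_lines:
        # Columns 13-16 hold the atom name, 23-26 the residue number,
        # 61-66 the B-factor (pLDDT in AlphaFold models).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            plddt[int(line[22:26])] = float(line[60:66])
    return plddt


def low_confidence_segments(
    plddt: dict[int, float], threshold: float = VERY_LOW_PLDDT
) -> list[tuple[int, int]]:
    """Group consecutive residues with pLDDT below the threshold into ranges."""
    segments: list[tuple[int, int]] = []
    start = prev = None
    for res in sorted(plddt):
        if plddt[res] < threshold:
            if start is None:
                start = res
            elif res != prev + 1:
                # Gap in residue numbering: close the current range.
                segments.append((start, prev))
                start = res
            prev = res
        elif start is not None:
            segments.append((start, prev))
            start = None
    if start is not None:
        segments.append((start, prev))
    return segments


if __name__ == "__main__":
    toy_scores = [92.1, 88.0, 45.3, 30.2, 71.5, 49.9]  # invented values
    lines = [make_ca_line(i, "ALA", i, p) for i, p in enumerate(toy_scores, start=1)]
    print(low_confidence_segments(read_plddt(lines)))
```

Ranges reported this way are only candidates for disorder; as the abstract notes, they should be compared with independent evidence before being interpreted.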

20 pages, 1084 KiB  
Article
MetaScore: A Novel Machine-Learning-Based Approach to Improve Traditional Scoring Functions for Scoring Protein–Protein Docking Conformations
by Yong Jung, Cunliang Geng, Alexandre M. J. J. Bonvin, Li C. Xue and Vasant G. Honavar
Biomolecules 2023, 13(1), 121; https://doi.org/10.3390/biom13010121 - 6 Jan 2023
Cited by 4 | Viewed by 4135
Abstract
Protein–protein interactions play a ubiquitous role in biological function. Knowledge of the three-dimensional (3D) structures of the complexes they form is essential for understanding the structural basis of those interactions and how they orchestrate key cellular processes. Computational docking has become an indispensable alternative to the expensive and time-consuming experimental approaches for determining the 3D structures of protein complexes. Despite recent progress, identifying near-native models from a large set of conformations sampled by docking—the so-called scoring problem—still has considerable room for improvement. We present MetaScore, a new machine-learning-based approach to improve the scoring of docked conformations. MetaScore utilizes a random forest (RF) classifier trained to distinguish near-native from non-native conformations using their protein–protein interfacial features. The features include physicochemical properties, energy terms, interaction-propensity-based features, geometric properties, interface topology features, evolutionary conservation, and also scores produced by traditional scoring functions (SFs). MetaScore scores docked conformations by simply averaging the score produced by the RF classifier with that produced by any traditional SF. We demonstrate that (i) MetaScore consistently outperforms each of the nine traditional SFs included in this work in terms of success rate and hit rate evaluated over conformations ranked among the top 10; (ii) an ensemble method, MetaScore-Ensemble, that combines 10 variants of MetaScore obtained by combining the RF score with each of the traditional SFs outperforms each of the MetaScore variants. We conclude that the performance of traditional SFs can be improved upon by using machine learning to judiciously leverage protein–protein interfacial features and by using ensemble methods to combine multiple scoring functions. Full article
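The averaging step described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the abstract does not say how scores are put on a common scale, so the min-max rescaling, the assumption that the traditional scoring function is energy-like (lower is better), and all function names below are hypothetical choices made for this example.

```python
from typing import Sequence


def min_max_scale(values: Sequence[float]) -> list[float]:
    """Rescale values to [0, 1]; a constant list maps to all 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def averaged_scores(
    rf_probs: Sequence[float],
    sf_scores: Sequence[float],
    lower_sf_is_better: bool = True,
) -> list[float]:
    """Average a classifier probability with a rescaled traditional score.

    rf_probs: per-conformation probability of being near-native, in [0, 1].
    sf_scores: raw scores from one traditional scoring function.
    """
    scaled = min_max_scale(sf_scores)
    if lower_sf_is_better:
        # Assumption: energy-like score, so flip it to make higher mean better.
        scaled = [1.0 - s for s in scaled]
    return [(p + s) / 2.0 for p, s in zip(rf_probs, scaled)]


def rank_conformations(
    names: Sequence[str],
    rf_probs: Sequence[float],
    sf_scores: Sequence[float],
) -> list[tuple[str, float]]:
    """Return (name, combined score) pairs, best conformation first."""
    combined = averaged_scores(rf_probs, sf_scores)
    return sorted(zip(names, combined), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented values for three docked conformations.
    ranking = rank_conformations(
        ["model_a", "model_b", "model_c"],
        rf_probs=[0.9, 0.2, 0.6],
        sf_scores=[-50.0, -10.0, -30.0],
    )
    for name, score in ranking:
        print(name, round(score, 3))
```

An ensemble in the spirit of the one the abstract describes could repeat this averaging once per traditional scoring function and then combine the resulting scores, although the abstract does not specify how that combination is done.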

32 pages, 16161 KiB  
Article
Use of an Integrated Approach Involving AlphaFold Predictions for the Evolutionary Taxonomy of Duplodnaviria Viruses
by Peter Evseev, Daria Gutnik, Mikhail Shneider and Konstantin Miroshnikov
Biomolecules 2023, 13(1), 110; https://doi.org/10.3390/biom13010110 - 5 Jan 2023
Cited by 7 | Viewed by 3982
Abstract
The evaluation of the evolutionary relationships is exceptionally important for the taxonomy of viruses, which is a rapidly expanding area of research. The classification of viral groups belonging to the realm Duplodnaviria, which include tailed bacteriophages, head-tailed archaeal viruses and herpesviruses, has undergone many changes in recent years and continues to improve. One of the challenging tasks of Duplodnaviria taxonomy is the classification of high-ranked taxa, including families and orders. At the moment, only 17 of 50 families have been assigned to orders. The evaluation of the evolutionary relationships between viruses is complicated by the high level of divergence of viral proteins. However, the development of structure prediction algorithms, including the award-winning AlphaFold, encourages the use of the results of structural predictions to clarify the evolutionary history of viral proteins. In this study, the evolutionary relationships of two conserved viral proteins, the major capsid protein and terminase, representing different viruses, including all classified Duplodnaviria families, have been analysed using AlphaFold modelling. This analysis has been undertaken using structural comparisons and different phylogenetic methods. The results of the analyses mainly indicated the high quality of AlphaFold modelling and the possibility of using the AlphaFold predictions, together with other methods, for the reconstruction of the evolutionary relationships between distant viral groups. Based on the results of this integrated approach, assumptions have been made about refining the taxonomic classification of bacterial and archaeal Duplodnaviria groups, and problems relating to the taxonomic classification of Duplodnaviria have been discussed. Full article

25 pages, 5245 KiB  
Article
Evolutionary Diversity of Dus2 Enzymes Reveals Novel Structural and Functional Features among Members of the RNA Dihydrouridine Synthases Family
by Murielle Lombard, Colbie J. Reed, Ludovic Pecqueur, Bruno Faivre, Sabrine Toubdji, Claudia Sudol, Damien Brégeon, Valérie de Crécy-Lagard and Djemel Hamdane
Biomolecules 2022, 12(12), 1760; https://doi.org/10.3390/biom12121760 - 26 Nov 2022
Cited by 2 | Viewed by 2434
Abstract
Dihydrouridine (D) is an abundant modified base found in the tRNAs of most living organisms and was recently detected in eukaryotic mRNAs. This base confers significant conformational plasticity to RNA molecules. The dihydrouridine biosynthetic reaction is catalyzed by a large family of flavoenzymes, the dihydrouridine synthases (Dus). So far, only bacterial Dus enzymes and their complexes with tRNAs have been structurally characterized. Understanding the structure-function relationships of eukaryotic Dus proteins has been hampered by the paucity of structural data. Here, we combined extensive phylogenetic analysis with high-precision 3D molecular modeling of more than 30 Dus2 enzymes selected along the tree of life to determine the evolutionary molecular basis of D biosynthesis by these enzymes. Dus2 is the eukaryotic enzyme responsible for the synthesis of D20 in tRNAs and is involved in some human cancers and in the detoxification of β-amyloid peptides in Alzheimer’s disease. In addition to the domains forming the canonical structure of all Dus, i.e., the catalytic TIM-barrel domain and the helical domain, both participating in RNA recognition in the bacterial Dus, a majority of Dus2 proteins harbor extensions at both ends. While these are mainly unstructured extensions on the N-terminal side, the C-terminal side extensions can adopt well-defined structures such as helices and beta-sheets or even form additional domains such as zinc finger domains. 3D models of Dus2/tRNA complexes were also generated. This study suggests that eukaryotic Dus2 proteins may have an advantage in tRNA recognition over their bacterial counterparts due to their modularity. Full article

16 pages, 3782 KiB  
Article
The Epigenetic Dimension of Protein Structure Is an Intrinsic Weakness of the AlphaFold Program
by Fodil Azzaz, Nouara Yahi, Henri Chahinian and Jacques Fantini
Biomolecules 2022, 12(10), 1527; https://doi.org/10.3390/biom12101527 - 20 Oct 2022
Cited by 26 | Viewed by 6844
Abstract
One of the most important lessons we have learned from sequencing the human genome is that not all proteins have a 3D structure. In fact, a large part of the human proteome is made up of intrinsically disordered proteins (IDPs) which can adopt multiple structures, and therefore, multiple functions, depending on the ligands with which they interact. Under these conditions, one can wonder about the value of algorithms developed for predicting the structure of proteins, in particular AlphaFold, an AI which claims to have solved the problem of protein structure. In a recent study, we highlighted a particular weakness of AlphaFold for membrane proteins. Based on this observation, we have proposed a paradigm, referred to as “Epigenetic Dimension of Protein Structure” (EDPS), which takes into account all environmental parameters that control the structure of a protein beyond the amino acid sequence (hence “epigenetic”). In this new study, we compare the reliability of the AlphaFold and Robetta algorithms’ predictions for a new set of membrane proteins involved in human pathologies. We found that Robetta was generally more accurate than AlphaFold for ascribing a membrane-compatible topology. Raft lipids (e.g., gangliosides), which control the structural dynamics of membrane protein structure through chaperone effects, were identified as major actors of the EDPS paradigm. We conclude that the epigenetic dimension of a protein structure is an intrinsic weakness of AI-based protein structure prediction, especially AlphaFold, which warrants further development. Full article
