
Computational Analysis for Protein Structure and Interaction

A special issue of Molecules (ISSN 1420-3049). This special issue belongs to the section "Bioorganic Chemistry".

Deadline for manuscript submissions: closed (31 December 2018) | Viewed by 211836

Special Issue Editor


Prof. Dr. Quan Zou
Guest Editor
Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
Interests: bioinformatics; parallel computing; deep learning; protein classification; genome assembly

Special Issue Information

Dear Colleagues,

Protein structure analysis is a central topic in organic chemistry and molecular biology research. The structures of several essential proteins have been reconstructed by cryo-electron microscopy (cryo-EM) and published in Nature and Science. Computational structure analysis and prediction is a key step in 3D structure reconstruction. Machine learning techniques have long been employed for protein secondary and tertiary structure prediction, and their performance seemed to have reached a bottleneck. However, the development of cryo-EM brings new challenges and requirements to computer science. In addition, deep learning, a branch of machine learning, appears to be powerful. Therefore, there is considerable and increasing interest in developing computational methods for protein structure analysis and prediction. Moreover, new structural techniques could also facilitate protein–protein interaction research.

The Guest Editor looks forward to collecting a set of recent advances in these topics, to provide a platform for researchers, and to bridge the gap between computer scientists and structural chemists.

Prof. Dr. Quan Zou
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Molecules is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • protein structure prediction

  • protein–protein interaction network

  • Cryo-EM particle boxing (picking)

  • Cryo-EM image processing

  • machine learning

  • protein disorder region

  • docking

  • protein inter-residue contact prediction

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (39 papers)

Research

15 pages, 2725 KiB  
Article
Misprediction of Structural Disorder in Halophiles
by Rita Pancsa, Denes Kovacs and Peter Tompa
Molecules 2019, 24(3), 479; https://doi.org/10.3390/molecules24030479 - 29 Jan 2019
Cited by 5 | Viewed by 4242
Abstract
Whereas the concept of intrinsic disorder derives from biophysical observations of the lack of structure of proteins or protein regions under native conditions, many of our respective concepts rest on proteome-scale bioinformatics predictions. It is established that most predictors work reliably on proteins commonly encountered, but it is often neglected that we know very little about their performance on proteins of microorganisms that thrive in environments of extreme temperature, pH, or salt concentration, which may cause adaptive sequence composition bias. To address this issue, we predicted structural disorder for the complete proteomes of different extremophile groups by popular prediction methods and compared them to those of the reference mesophilic group. While significant deviations from mesophiles could be explained by a lack or gain of disordered regions in hyperthermophiles and radiotolerants, respectively, we found systematic overprediction in the case of halophiles. Additionally, examples were collected from the Protein Data Bank (PDB) to demonstrate misprediction and to help understand the underlying biophysical principles, i.e., halophilic proteins maintain a highly acidic and hydrophilic surface to avoid aggregation in high salt conditions. Although sparseness of data on disordered proteins from extremophiles precludes the development of dedicated general predictors, we do formulate recommendations for how to address their disorder with current bioinformatics tools.
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)
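The composition bias behind this overprediction can be illustrated with a simple sequence statistic. A minimal Python sketch, not the authors' pipeline: the hypothetical helper `composition_bias` reports the fraction of acidic (Asp, Glu) and basic (Lys, Arg) residues and a crude net charge per residue, ignoring pKa values and histidine. A strongly negative net charge per residue is the kind of signal one would expect from proteins with highly acidic surfaces.

```python
def composition_bias(seq):
    """Acidic fraction, basic fraction and crude net charge per residue (hypothetical helper)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    acidic = sum(seq.count(a) for a in "DE")  # Asp, Glu
    basic = sum(seq.count(b) for b in "KR")   # Lys, Arg (His ignored)
    return {
        "acidic_fraction": acidic / n,
        "basic_fraction": basic / n,
        "net_charge_per_residue": (basic - acidic) / n,
    }

print(composition_bias("MDEEDKLSAE"))
```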

22 pages, 1083 KiB  
Article
Scalable Extraction of Big Macromolecular Data in Azure Data Lake Environment
by Dariusz Mrozek, Tomasz Dąbek and Bożena Małysiak-Mrozek
Molecules 2019, 24(1), 179; https://doi.org/10.3390/molecules24010179 - 5 Jan 2019
Cited by 7 | Viewed by 4241
Abstract
Calculating the structural features of proteins, nucleic acids, and nucleic acid–protein complexes from their geometries, and studying various interactions within these macromolecules, for which high-resolution structures are stored in the Protein Data Bank (PDB), requires parsing and extracting suitable data stored in text files. To perform these operations on a large scale in the face of the growing amount of macromolecular data in public repositories, we propose to perform them in the distributed environment of Azure Data Lake and scale the calculations on the Cloud. In this paper, we present dedicated data extractors for PDB files that can be used in various types of calculations performed over protein and nucleic acid structures in the Azure Data Lake. Results of our tests show that the Cloud storage space occupied by the macromolecular data can be successfully reduced by compressing PDB files without a significant loss of data processing efficiency. Moreover, our experiments show that the calculations can be significantly accelerated by using large sequential files for storing macromolecular data and by parallelizing the calculations and the data extractions that precede them. Finally, the paper shows how all the calculations can be performed declaratively in U-SQL scripts for Data Lake Analytics.
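Extractors like these rely on the fixed-width column layout of PDB text files. The sketch below illustrates that step in plain Python; it is not the authors' U-SQL extractor, and the names `Atom`, `parse_atoms`, and `read_pdb` are hypothetical. Column positions follow the PDB format specification for ATOM/HETATM records, and gzip-compressed files are read transparently, in the spirit of the compressed-storage scenario tested in the paper.

```python
import gzip
from typing import Iterator, NamedTuple

class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float

def parse_atoms(lines) -> Iterator[Atom]:
    """Yield ATOM/HETATM records using the fixed PDB column positions."""
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            yield Atom(
                serial=int(line[6:11]),        # columns 7-11
                name=line[12:16].strip(),      # columns 13-16
                res_name=line[17:20].strip(),  # columns 18-20
                chain=line[21],                # column 22
                res_seq=int(line[22:26]),      # columns 23-26
                x=float(line[30:38]),          # columns 31-38
                y=float(line[38:46]),          # columns 39-46
                z=float(line[46:54]),          # columns 47-54
            )

def read_pdb(path):
    """Read a plain or gzip-compressed PDB file (chosen by the .gz suffix)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return list(parse_atoms(handle))
```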

21 pages, 980 KiB  
Article
Numerical Encodings of Amino Acids in Multivariate Gaussian Modeling of Protein Multiple Sequence Alignments
by Patrice Koehl, Henri Orland and Marc Delarue
Molecules 2019, 24(1), 104; https://doi.org/10.3390/molecules24010104 - 28 Dec 2018
Cited by 1 | Viewed by 3746
Abstract
Residues in proteins that are in close spatial proximity are more prone to covary, as their interactions are likely to be preserved due to structural and evolutionary constraints. If we can detect and quantify such covariation, physical contacts may then be predicted in the structure of a protein solely from the sequences that decorate it. To carry out such predictions, and following the work of others, we have implemented a multivariate Gaussian model to analyze correlation in multiple sequence alignments. We have explored and tested several numerical encodings of amino acids within this model. We have shown that 1D encodings based on amino acid biochemical and biophysical properties, as well as higher-dimensional encodings computed from the principal components of experimentally derived mutation/substitution matrices, do not perform as well as a simple twenty-dimensional encoding in which each amino acid is represented by a vector with a one along its own dimension and zeros elsewhere. The optimum obtained from representations based on substitution matrices is reached by using 10 to 12 principal components; the corresponding performance is lower than that obtained with the 20-dimensional binary encoding. We also highlight the importance of the prior when constructing the multivariate Gaussian model of a multiple sequence alignment.
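The twenty-dimensional binary encoding and the role of the prior can be made concrete in a short NumPy sketch. It assumes gaps are encoded as all-zero vectors, uses a ridge term `lam * I` as a stand-in for the prior (the prior studied in the paper may differ), and scores each residue pair by the Frobenius norm of the corresponding 20 × 20 block of the inverse covariance matrix, a common choice in Gaussian contact-prediction models. Refinements such as sequence reweighting or average-product correction are omitted, and the function names are hypothetical.

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids
Q = len(AA)

def one_hot(msa):
    """Encode an alignment as an N x (L*20) binary matrix; gaps stay all-zero."""
    n_seq, length = len(msa), len(msa[0])
    X = np.zeros((n_seq, length * Q))
    for s, seq in enumerate(msa):
        for i, res in enumerate(seq):
            k = AA.find(res)
            if k >= 0:
                X[s, i * Q + k] = 1.0
    return X

def coupling_scores(msa, lam=0.1):
    """Frobenius norm of each off-diagonal 20x20 block of the regularized precision matrix."""
    X = one_hot(msa)
    C = np.cov(X, rowvar=False, bias=True)
    # lam * I keeps C invertible and plays the role of a simple (assumed) prior
    P = np.linalg.inv(C + lam * np.eye(C.shape[0]))
    length = len(msa[0])
    S = np.zeros((length, length))
    for i in range(length):
        for j in range(i + 1, length):
            block = P[i * Q:(i + 1) * Q, j * Q:(j + 1) * Q]
            S[i, j] = S[j, i] = np.linalg.norm(block)
    return S
```

In a toy alignment where columns 0 and 1 covary perfectly and column 2 varies independently, the score for the pair (0, 1) exceeds that for (0, 2).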

14 pages, 911 KiB  
Article
Recognition of Protein Pupylation Sites by Adopting Resampling Approach
by Tao Li, Yan Chen, Taoying Li and Cangzhi Jia
Molecules 2018, 23(12), 3097; https://doi.org/10.3390/molecules23123097 - 27 Nov 2018
Cited by 4 | Viewed by 2810
Abstract
With the in-depth study of posttranslational modification sites, protein ubiquitination has become a key problem in studying the molecular mechanism of posttranslational modification. Pupylation is a widely used process in which a prokaryotic ubiquitin-like protein (Pup) is attached to a substrate through a series of biochemical reactions. However, experimental methods for identifying pupylation sites are often time-consuming and laborious. This study aims to propose an improved approach for predicting pupylation sites. Firstly, the Pearson correlation coefficient was used to reflect the correlation among different amino acid pairs calculated from the frequency of each amino acid. Then, the multiple types of features were filtered separately by ranking their Pearson correlation coefficients in descending order. Thirdly, to obtain a qualified balanced dataset, the K-means principal component analysis (KPCA) oversampling technique was employed to synthesize new positive samples, and a fuzzy undersampling method was employed to reduce the number of negative samples. Finally, the performance of our method was verified by means of jackknife and 10-fold cross-validation tests. The average results of 10-fold cross-validation showed that the sensitivity (Sn) was 90.53%, specificity (Sp) was 99.8%, accuracy (Acc) was 95.09%, and the Matthews correlation coefficient (MCC) was 0.91. Moreover, an independent test dataset was used to further measure its performance, and the predictions achieved an Acc of 83.75% and an MCC of 0.49, which was superior to previous predictors. The better performance and stability of the proposed method show that it is an effective way to predict pupylation sites.
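The reported Sn, Sp, Acc, and MCC follow the standard confusion-matrix definitions. A minimal sketch with a hypothetical helper name; the counts in the usage line are made up for illustration and are not taken from the paper.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sn = tp / (tp + fn)                       # true positive rate
    sp = tn / (tn + fp)                       # true negative rate
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}

print(binary_metrics(tp=90, tn=95, fp=5, fn=10))  # hypothetical counts
```

On a balanced dataset, such as one produced by the resampling step, accuracy equals the mean of Sn and Sp, since both classes have the same size.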

19 pages, 3783 KiB  
Article
Prediction of GluN2B-CT1290-1310/DAPK1 Interaction by Protein–Peptide Docking and Molecular Dynamics Simulation
by Gao Tu, Tingting Fu, Fengyuan Yang, Lixia Yao, Weiwei Xue and Feng Zhu
Molecules 2018, 23(11), 3018; https://doi.org/10.3390/molecules23113018 - 19 Nov 2018
Cited by 18 | Viewed by 5682
Abstract
The interaction of death-associated protein kinase 1 (DAPK1) with the C-terminus of the 2B subunit (GluN2B) of the N-methyl-D-aspartate receptor (NMDAR) plays a critical role in the pathophysiology of depression and is considered a potential target for the structure-based discovery of new antidepressants. However, the 3D structure of C-terminal residues 1290–1310 of GluN2B (GluN2B-CT1290-1310) remains elusive, and the interaction between GluN2B-CT1290-1310 and DAPK1 is unknown. In this study, the mechanism of interaction between DAPK1 and GluN2B-CT1290-1310 was predicted by computational simulation methods, including protein–peptide docking and molecular dynamics (MD) simulation. Based on the equilibrated MD trajectory, the total binding free energy between GluN2B-CT1290-1310 and DAPK1 was computed by the molecular mechanics generalized Born surface area (MM/GBSA) approach. The simulation results showed that hydrophobic, van der Waals, and electrostatic interactions are responsible for the binding of GluN2B-CT1290-1310 to DAPK1. Moreover, through per-residue free energy decomposition and in silico alanine scanning analysis, hotspot residues at the GluN2B-CT1290-1310/DAPK1 interface were identified. In conclusion, this work predicted the binding mode and quantitatively characterized the protein–peptide interface, which will aid in the discovery of novel drugs targeting the GluN2B-CT1290-1310/DAPK1 interface.
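For reference, the standard MM/GBSA decomposition and the usual definition of the alanine-scanning energy change are written out below. Which terms the authors actually included (for example, whether the entropy term −TΔS was estimated) is not stated in the abstract. In the common single-trajectory protocol, the internal (bonded) term ΔE_int cancels, and a large positive ΔΔG_bind flags a hotspot residue.

```latex
\begin{aligned}
\Delta G_{\text{bind}} &= G_{\text{complex}} - G_{\text{receptor}} - G_{\text{ligand}}
  = \Delta E_{\text{MM}} + \Delta G_{\text{solv}} - T\Delta S, \\
\Delta E_{\text{MM}} &= \Delta E_{\text{int}} + \Delta E_{\text{vdW}} + \Delta E_{\text{ele}}, \\
\Delta G_{\text{solv}} &= \Delta G_{\text{GB}} + \Delta G_{\text{SA}}, \\
\Delta\Delta G_{\text{bind}} &= \Delta G_{\text{bind}}^{\text{mutant}} - \Delta G_{\text{bind}}^{\text{wild type}}.
\end{aligned}
```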

12 pages, 3869 KiB  
Article
Novel Transforming Growth Factor-Beta Receptor 1 Antagonists through a Pharmacophore-Based Virtual Screening Approach
by Junhao Jiang, Hui Zhou, Qihua Jiang, Lili Sun and Ping Deng
Molecules 2018, 23(11), 2824; https://doi.org/10.3390/molecules23112824 - 31 Oct 2018
Cited by 7 | Viewed by 3343
Abstract
As new drugs for the treatment of malignant tumors, transforming growth factor-beta receptor 1 (TGFβR1) antagonists have attracted wide attention. Based on the crystal structure of the TGFβR1-BMS22 complex, pharmacophore model A02, with two hydrogen bond acceptors (HBAs) and four hydrophobic (HYD) features, was constructed. From the common features of active ligands reported in the literature, pharmacophore model B10 was also generated, which has two aromatic ring centers (RAs) and two HYD features. The two models have high sensitivity and specificity for the training set, and they are highly consistent in spatial structure. Combining the two pharmacophore models, two novel skeleton structures with potential activity were selected by virtual screening of the DruglikeDiverse, MiniMaybridge, and ZINC Drug-Like databases. Four compounds (YXY01–YXY04) with potential anti-TGFβR1 activity were designed based on the new skeleton structures. Taking into account Lipinski’s rules and the absorption, distribution, metabolism, excretion, and toxicity (ADMET) and toxicological properties predicted in the study, YXY01–YXY03, which have the novel skeleton, good drug-like properties, and potential activity, were finally identified; they may be safer than BMS22 and may be valuable for further research.
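Lipinski's rule of five, used here as a drug-likeness filter, is easy to express in code. A minimal sketch, assuming the descriptors have already been computed elsewhere (for example by a cheminformatics toolkit); the `Descriptors` type, the helper names, and the example values are hypothetical, and tolerating one violation is a common convention rather than something the abstract specifies.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Descriptors:
    mol_weight: float  # molecular weight in Da
    clogp: float       # calculated octanol-water partition coefficient
    h_donors: int      # hydrogen bond donors
    h_acceptors: int   # hydrogen bond acceptors

def lipinski_violations(d: Descriptors) -> List[str]:
    """Names of the rule-of-five criteria that the molecule breaks."""
    rules = {
        "MW > 500": d.mol_weight > 500,
        "logP > 5": d.clogp > 5,
        "HBD > 5": d.h_donors > 5,
        "HBA > 10": d.h_acceptors > 10,
    }
    return [name for name, broken in rules.items() if broken]

def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    return len(lipinski_violations(d)) <= max_violations

print(passes_rule_of_five(Descriptors(350.4, 2.1, 2, 5)))  # hypothetical values
```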

16 pages, 9103 KiB  
Article
A Central Edge Selection Based Overlapping Community Detection Algorithm for the Detection of Overlapping Structures in Protein–Protein Interaction Networks
by Fang Zhang, Anjun Ma, Zhao Wang, Qin Ma, Bingqiang Liu, Lan Huang and Yan Wang
Molecules 2018, 23(10), 2633; https://doi.org/10.3390/molecules23102633 - 13 Oct 2018
Cited by 14 | Viewed by 3836
Abstract
Overlapping structures in protein–protein interaction networks are very prevalent in different biological processes and reflect the sharing of common functional components. The overlapping community detection (OCD) algorithm based on central node selection (CNS) is a traditional and widely accepted algorithm for OCD in networks. CNS consists mainly of central node selection and a clustering procedure. However, the original CNS does not consider the influence among nodes or the importance of edge division in networks. In this paper, an OCD algorithm based on central edge selection (CES) is proposed for detecting overlapping communities in protein–protein interaction (PPI) networks. Unlike the traditional CNS algorithms for OCD, the proposed algorithm uses community magnetic interference (CMI) to obtain more reasonable central edges during CES, and employs a new distance between a non-central edge and the set of central edges to assign each non-central edge to the correct cluster during the clustering procedure. In addition, CES improves the strategy of overlapping nodes pruning (ONP) to make the division more precise. Experimental results on three benchmark networks and three biological PPI networks of Mus musculus, Escherichia coli, and Saccharomyces cerevisiae show that the CES algorithm performs well.
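Edge-based clustering yields overlapping node communities naturally: once every edge is assigned to a cluster, a node belongs to every cluster that contains one of its edges. The sketch below shows only this final mapping; it does not implement CMI-based central edge selection, the edge-to-set distance, or ONP pruning, and the function names are hypothetical.

```python
from collections import defaultdict

def node_memberships(edge_cluster):
    """Map each node to the set of clusters of its incident edges."""
    members = defaultdict(set)
    for (u, v), cluster in edge_cluster.items():
        members[u].add(cluster)
        members[v].add(cluster)
    return dict(members)

def overlapping_nodes(edge_cluster):
    """Nodes whose incident edges fall into more than one cluster."""
    return sorted(n for n, cs in node_memberships(edge_cluster).items() if len(cs) > 1)

# Two triangles sharing node "C", each triangle's edges assigned to its own cluster
edges = {("A", "B"): 0, ("B", "C"): 0, ("A", "C"): 0,
         ("C", "D"): 1, ("D", "E"): 1, ("C", "E"): 1}
print(overlapping_nodes(edges))
```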

22 pages, 5405 KiB  
Article
In-Silico Prediction and Modeling of the Quorum Sensing LuxS Protein and Inhibition of AI-2 Biosynthesis in Aeromonas hydrophila
by Farman Ali, Zujie Yao, Wanxin Li, Lina Sun, Wenxiong Lin and Xiangmin Lin
Molecules 2018, 23(10), 2627; https://doi.org/10.3390/molecules23102627 - 12 Oct 2018
Cited by 24 | Viewed by 5804
Abstract
luxS is conserved in several bacterial species, including A. hydrophila, which causes infections in prawn, fish, and shrimp and is consequently a great risk to the aquaculture industry and public health. luxS plays a critical role in the biosynthesis of autoinducer-2 (AI-2), which performs wide-ranging functions in bacterial communication, especially in quorum sensing (QS). Predicting the 3D structure of the QS-associated LuxS protein is thus essential to better understand and control A. hydrophila pathogenicity. Here, we predicted the structure of A. hydrophila LuxS and characterized it structurally and functionally with in silico methods. The predicted structure of LuxS provides a framework for developing more complete structural and functional insights and will aid the mitigation of A. hydrophila infection and the development of novel drugs to control infections. In addition to modeling, a suitable inhibitor was identified by high-throughput screening (HTS) against the drug-like subset of the ZINC database, and the inhibitor (−)-dimethyl 2,3-O-isopropylidene-L-tartrate was selected based on the best drug score. Molecular docking studies were performed to determine the best binding affinity between the ligand and either LuxS homologs or the predicted LuxS model for ligand selection. Remarkably, this inhibitor forms favorable interactions with residues Lys23, Val35, Ile76, and Ser90, which were found to play an essential role in the inhibition mechanism. These predictions suggest that the proposed inhibitor may be considered a drug candidate against AI-2 biosynthesis in A. hydrophila. Therefore, (−)-dimethyl 2,3-O-isopropylidene-L-tartrate was studied to confirm its potency in inhibiting AI-2 biosynthesis. The results show that the inhibitor had better efficacy in AI-2 inhibition at a concentration of 40 μM, which was further validated at the protein expression level using Western blotting.
The AI-2 bioluminescence assay showed that decreased AI-2 biosynthesis and downregulation of the LuxS protein play an important role in AI-2 inhibition. Lastly, these experiments were conducted with antibiotic supplementation, via a cocktail of the AI-2 inhibitor plus OXY antibiotics, to determine the possibility of novel cocktail drug treatments for A. hydrophila infection.

17 pages, 1839 KiB  
Article
An Algorithm for Computing Side Chain Conformational Variations of a Protein Tunnel/Channel
by Udeok Seo, Ku-Jin Kim and Beom Sik Kang
Molecules 2018, 23(10), 2459; https://doi.org/10.3390/molecules23102459 - 26 Sep 2018
Viewed by 4400
Abstract
In this paper, a novel method to compute side chain conformational variations for a protein molecule tunnel (or channel) is proposed. From the conformational variations, we compute the flexibly deformed shapes of the initial tunnel and present a way to compute the maximum size of a ligand that can pass through the deformed tunnel. Using two types of graphs, corresponding to amino acids and their side chain rotamers, the suggested algorithm classifies the amino acids and rotamers that may collide. Based on the divide-and-conquer technique, local side chain conformations are computed first, and a global conformation is then generated by combining them. With the exception of certain cases, experimental results show that the algorithm finds up to 327,680 valid side chain conformations from 128 to 1233 conformation candidates within three seconds.
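The divide-and-conquer idea can be sketched as follows, under simplifying assumptions: collisions are given as a precomputed list of clashing rotamer pairs, residues whose rotamers can collide are grouped into connected components, valid local conformations are enumerated per component, and the number of global conformations is the product of the local counts because separate components cannot clash. This is an illustration with hypothetical names, not the authors' algorithm.

```python
from itertools import product
from math import prod

def residue_components(n_rotamers, clashes):
    """Connected components of the residue graph induced by rotamer clashes."""
    adj = {r: set() for r in n_rotamers}
    for (r1, _), (r2, _) in clashes:
        adj[r1].add(r2)
        adj[r2].add(r1)
    seen, comps = set(), []
    for start in n_rotamers:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # iterative depth-first search
            r = stack.pop()
            comp.append(r)
            for nb in adj[r]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(sorted(comp))
    return comps

def local_conformations(comp, n_rotamers, clash_set):
    """All clash-free rotamer assignments within one component."""
    valid = []
    for choice in product(*(range(n_rotamers[r]) for r in comp)):
        picked = list(zip(comp, choice))
        if all(frozenset((a, b)) not in clash_set
               for i, a in enumerate(picked) for b in picked[i + 1:]):
            valid.append(dict(picked))
    return valid

def count_global(n_rotamers, clashes):
    """Global conformations = product of local counts over independent components."""
    clash_set = {frozenset(c) for c in clashes}
    comps = residue_components(n_rotamers, clashes)
    return prod(len(local_conformations(c, n_rotamers, clash_set)) for c in comps)

# Residue A rotamer 0 clashes with residue B rotamer 1; C is independent
print(count_global({"A": 2, "B": 2, "C": 3}, [(("A", 0), ("B", 1))]))
```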

13 pages, 2041 KiB  
Article
A New Method for Recognizing Cytokines Based on Feature Combination and a Support Vector Machine Classifier
by Zhe Yang, Juan Wang, Zhida Zheng and Xin Bai
Molecules 2018, 23(8), 2008; https://doi.org/10.3390/molecules23082008 - 11 Aug 2018
Cited by 6 | Viewed by 3031
Abstract
Research on cytokine recognition is of great significance in the medical field because cytokines benefit the diagnosis and treatment of diseases, but current methods for cytokine recognition have many shortcomings, such as low sensitivity and low F-score. Therefore, this paper proposes a new method based on feature combination. The features are extracted from amino acid composition, physicochemical properties, secondary structure, and evolutionary information. The classifier used in this paper is a support vector machine (SVM). Experiments show that our method outperforms other methods in terms of accuracy, sensitivity, specificity, F-score, and Matthews correlation coefficient.
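Amino acid composition, the first feature group listed above, and feature combination by concatenation can be sketched in a few lines of Python. The helper names are hypothetical; the other feature groups (physicochemical, secondary structure, evolutionary) would be computed separately and concatenated in the same way before training the SVM.

```python
AA = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids

def aac(seq):
    """20-dimensional amino acid composition; non-standard residues are ignored."""
    counts = [seq.count(a) for a in AA]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(AA)

def combine(*feature_vectors):
    """Feature combination by simple concatenation."""
    combined = []
    for vec in feature_vectors:
        combined.extend(vec)
    return combined

# Hypothetical extra feature group appended to the composition vector
features = combine(aac("MKTAYIAKQR"), [0.42, 1.7])
print(len(features))
```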

13 pages, 2474 KiB  
Article
Identifying Phage Virion Proteins by Using Two-Step Feature Selection Methods
by Jiu-Xin Tan, Fu-Ying Dao, Hao Lv, Peng-Mian Feng and Hui Ding
Molecules 2018, 23(8), 2000; https://doi.org/10.3390/molecules23082000 - 10 Aug 2018
Cited by 40 | Viewed by 3945
Abstract
Accurate identification of phage virion proteins is not only a key step in understanding their function but also helps to further understand the lysis mechanism of the bacterial cell. Since traditional experimental methods for identifying phage virion proteins are time-consuming and costly, there is an urgent need for machine learning methods that identify them accurately and efficiently. In this work, a support vector machine (SVM)-based method is proposed that combines multiple sets of optimal g-gap dipeptide compositions. Analysis of variance (ANOVA) and minimal-redundancy-maximal-relevance (mRMR) with incremental feature selection (IFS) were applied to single out the optimal feature set. In the five-fold cross-validation test, the proposed method achieved an overall accuracy of 87.95%. We believe that the proposed method will become an efficient and powerful tool for scientists studying phage virion proteins.
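A g-gap dipeptide composition counts residue pairs separated by g intervening residues (g = 0 gives ordinary dipeptides), yielding a 400-dimensional vector per gap. The sketch below normalizes by the number of pairs, which is one common convention and an assumption here; combining several gaps is done by concatenation. Function names are hypothetical.

```python
from itertools import product

AA = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AA, repeat=2)]  # 400 ordered pairs
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}

def g_gap_dpc(seq, g):
    """Frequencies of residue pairs (seq[i], seq[i + g + 1])."""
    vec = [0.0] * len(DIPEPTIDES)
    n_pairs = len(seq) - g - 1
    if n_pairs <= 0:
        return vec
    for i in range(n_pairs):
        pair = seq[i] + seq[i + g + 1]
        if pair in INDEX:  # skip pairs containing non-standard residues
            vec[INDEX[pair]] += 1
    return [c / n_pairs for c in vec]

def multi_gap_features(seq, gaps):
    """Concatenate g-gap compositions for several gap sizes (400 values per gap)."""
    return [x for g in gaps for x in g_gap_dpc(seq, g)]
```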

17 pages, 2676 KiB  
Article
NTyroSite: Computational Identification of Protein Nitrotyrosine Sites Using Sequence Evolutionary Features
by Md. Mehedi Hasan, Mst. Shamima Khatun, Md. Nurul Haque Mollah, Cao Yong and Guo Dianjing
Molecules 2018, 23(7), 1667; https://doi.org/10.3390/molecules23071667 - 9 Jul 2018
Cited by 38 | Viewed by 6242
Abstract
Nitrotyrosine is a product of tyrosine nitration mediated by reactive nitrogen species. As an indicator of cell damage and inflammation, protein nitrotyrosine serves to reveal biological changes associated with various diseases or oxidative stress. Accurate identification of nitrotyrosine sites provides an important foundation for further elucidating the mechanism of protein nitrotyrosination. However, experimental identification of nitrotyrosine sites through traditional methods is laborious and expensive. In silico prediction of nitrotyrosine sites based on protein sequence information is thus highly desirable. Here, we report a novel predictor, NTyroSite, for accurate prediction of nitrotyrosine sites using sequence evolutionary information. The generated features were optimized using a Wilcoxon rank-sum test. A random forest classifier was then trained using these features to build the predictor. The final NTyroSite predictor achieved an area under the receiver operating characteristic curve (AUC) of 0.904 in a 10-fold cross-validation test. It also significantly outperformed other existing implementations in an independent test. Meanwhile, to better understand our prediction model, the predominant rules and informative features were extracted from the NTyroSite model to explain the prediction results. We expect that the NTyroSite predictor may serve as a useful computational resource for high-throughput nitrotyrosine site prediction. The online interface of the software is publicly available at https://biocomputer.bio.cuhk.edu.hk/NTyroSite/.
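Feature ranking with a Wilcoxon rank-sum test can be sketched without external libraries. This minimal version uses mid-ranks for ties and the normal approximation without tie correction (both simplifying assumptions), then orders features by the absolute z-score separating positive and negative sites. It is not the NTyroSite code, and the function names are hypothetical.

```python
import math

def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def ranksum_z(pos, neg):
    """Normal-approximation z-score of the rank sum of the positive group."""
    n1, n2 = len(pos), len(neg)
    ranks = midranks(list(pos) + list(neg))
    r1 = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (r1 - mean) / sd

def rank_features(X_pos, X_neg):
    """Feature indices ordered from most to least discriminative."""
    scores = []
    for f in range(len(X_pos[0])):
        z = ranksum_z([row[f] for row in X_pos], [row[f] for row in X_neg])
        scores.append((abs(z), f))
    return [f for _, f in sorted(scores, reverse=True)]
```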

16 pages, 1277 KiB  
Article
Feature Selection via Swarm Intelligence for Determining Protein Essentiality
by Ming Fang, Xiujuan Lei, Shi Cheng, Yuhui Shi and Fang-Xiang Wu
Molecules 2018, 23(7), 1569; https://doi.org/10.3390/molecules23071569 - 28 Jun 2018
Cited by 8 | Viewed by 4624
Abstract
Protein essentiality is fundamental to comprehending the function and evolution of genes. The prediction of protein essentiality is pivotal in identifying disease genes and potential drug targets. Since experimental methods require substantial investments of time and funds, it is of great value to predict protein essentiality with high accuracy using computational methods. In this study, we present a novel feature selection method, named the Elite Search mechanism-based Flower Pollination Algorithm (ESFPA), to determine protein essentiality. Unlike other protein essentiality prediction methods, ESFPA uses an improved swarm intelligence–based algorithm for feature selection and selects optimal features for protein essentiality prediction. The first step is to collect numerous features that are highly predictive of essentiality. The second step is to develop a feature selection strategy based on a swarm intelligence algorithm to obtain the optimal feature subset. Furthermore, an elite search mechanism is adopted to further improve the quality of the feature subset. Subsequently, a hybrid classifier is applied to evaluate the essentiality of each protein. Finally, the experimental results show that our method is competitive with some well-known feature selection methods. The proposed method aims to provide a new perspective for protein essentiality determination.
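For readers unfamiliar with the flower pollination algorithm, a bare-bones binary variant for feature selection might look like the following. It is a sketch under stated assumptions, not ESFPA: continuous positions are mapped to feature masks by a sign threshold, global pollination uses a Lévy step generated with Mantegna's method, local pollination mixes two random flowers, `p_switch = 0.8` is a commonly used default, and the elite search mechanism is omitted. The `fitness` callable (for example, cross-validated accuracy of a classifier on the selected features) is supplied by the caller; all names are hypothetical.

```python
import math
import random

def levy_step(beta=1.5, rng=random):
    """Levy-distributed step length via Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    return u / max(abs(v), 1e-12) ** (1 / beta)

def to_mask(x):
    """Sign threshold: a positive coordinate selects the corresponding feature."""
    return tuple(1 if xi > 0 else 0 for xi in x)

def binary_fpa(fitness, n_features, n_flowers=10, n_iter=50, p_switch=0.8, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_flowers)]
    fit = [fitness(to_mask(x)) for x in pop]
    b = max(range(n_flowers), key=fit.__getitem__)
    best, best_fit = pop[b][:], fit[b]
    for _ in range(n_iter):
        for i in range(n_flowers):
            if rng.random() < p_switch:  # global pollination toward the best flower
                cand = [xi + levy_step(rng=rng) * (gi - xi) for xi, gi in zip(pop[i], best)]
            else:  # local pollination between two random flowers
                j, k = rng.sample(range(n_flowers), 2)
                eps = rng.random()
                cand = [xi + eps * (xj - xk) for xi, xj, xk in zip(pop[i], pop[j], pop[k])]
            f = fitness(to_mask(cand))
            if f > fit[i]:  # greedy replacement
                pop[i], fit[i] = cand, f
                if f > best_fit:
                    best, best_fit = cand[:], f
    return to_mask(best), best_fit
```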
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

11 pages, 336 KiB  
Article
Regularized Multi-View Subspace Clustering for Common Modules Across Cancer Stages
by Enli Zhang and Xiaoke Ma
Molecules 2018, 23(5), 1016; https://doi.org/10.3390/molecules23051016 - 26 Apr 2018
Cited by 12 | Viewed by 3573
Abstract
Discovering the common modules that are co-expressed across various stages can lead to an improved understanding of the underlying molecular mechanisms of cancers. There is a shortage of efficient tools for integrative analysis of gene expression and protein interaction networks for discovering common modules associated with cancer progression. To address this issue, we propose a novel regularized multi-view subspace clustering (rMV-spc) algorithm to obtain a representation matrix for each stage and a joint representation matrix that balances the agreement across various stages. To handle the heterogeneity of the data, the protein interaction network is incorporated into the objective of rMV-spc via regularization. Based on the interior point algorithm, we solve the optimization problem to obtain the common modules. Using artificial networks, we demonstrate that the proposed algorithm outperforms state-of-the-art methods in terms of accuracy. Furthermore, rMV-spc discovers common modules in breast cancer networks based on breast cancer data, and these modules serve as biomarkers to predict stages of breast cancer. The proposed model and algorithm effectively integrate heterogeneous data for dynamic modules.

13 pages, 7770 KiB  
Article
Computational Prediction and Analysis of Associations between Small Molecules and Binding-Associated S-Nitrosylation Sites
by Guohua Huang, Jincheng Li and Chenglin Zhao
Molecules 2018, 23(4), 954; https://doi.org/10.3390/molecules23040954 - 19 Apr 2018
Cited by 5 | Viewed by 4445
Abstract
Interactions between drugs and proteins occupy a central position in drug discovery and development. Numerous methods have recently been developed for identifying drug–target interactions, but few have been devoted to finding interactions between post-translationally modified proteins and drugs. We present a machine learning-based method for identifying associations between small molecules and binding-associated S-nitrosylated (SNO-) proteins. Small molecules were encoded by molecular fingerprints, SNO-proteins were encoded by an information entropy-based method, and a random forest was used to train a classifier. Ten-fold and leave-one-out cross-validation achieved areas under the receiver operating characteristic curve of 0.7235 and 0.7490, respectively. Computational analysis of similarity suggested that SNO-proteins associated with the same drug shared statistically significant similarity, and vice versa. This method and these findings are useful for identifying drug–SNO associations and can further facilitate the discovery and development of SNO-associated drugs.
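The abstract does not spell out the information entropy-based encoding of SNO-proteins. As a minimal sketch of the underlying idea only, the Python below computes the Shannon entropy H = -Σ p_i log2 p_i of a sequence's amino acid composition; the function name and the use of whole-sequence composition are assumptions for illustration, not the authors' encoding.

```python
import math
from collections import Counter


def composition_entropy(seq: str) -> float:
    """Shannon entropy, in bits, of the residue composition of a sequence.

    Illustrative assumption: entropy of the whole-sequence composition,
    not necessarily the encoding used in the paper.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    n = len(seq)
    counts = Counter(seq)
    # H = sum over observed residues of p * log2(1 / p), equivalent to -sum p * log2 p
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(composition_entropy("AAAA"))  # 0.0: a single residue type carries no uncertainty
print(composition_entropy("ACDE"))  # 2.0: four equally frequent residues
```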

14 pages, 1125 KiB  
Article
Prediction of Protein-Protein Interactions from Amino Acid Sequences Based on Continuous and Discrete Wavelet Transform Features
by Tao Wang, Liping Li, Yu-An Huang, Hui Zhang, Yahong Ma and Xing Zhou
Molecules 2018, 23(4), 823; https://doi.org/10.3390/molecules23040823 - 4 Apr 2018
Cited by 21 | Viewed by 4471
Abstract
Protein-protein interactions (PPIs) play important roles in various aspects of the structural and functional organization of cells; thus, detecting PPIs is one of the most important issues in current molecular biology. Although much effort has been devoted to using high-throughput techniques to identify protein-protein interactions, the experimental methods are both time-consuming and costly. In addition, they yield high rates of false positive and false negative results. Moreover, most of the proposed computational methods are limited by their reliance on information about protein homology or the interaction marks of the protein partners. In this paper, we report a computational method that uses only information from protein sequences. The main improvements come from a novel protein sequence representation that combines the continuous and discrete wavelet transforms and from adopting a weighted sparse representation-based classifier (WSRC). The proposed method was used to predict PPIs from three different datasets: yeast, human and H. pylori. In addition, we employed the prediction model trained on the yeast PPI dataset to predict the PPIs of six datasets from other species. To further evaluate the performance of the prediction model, we compared WSRC with the state-of-the-art support vector machine classifier. On the yeast, human and H. pylori datasets, we obtained high average prediction accuracies of 97.38%, 98.92% and 93.93%, respectively. In the cross-species experiments, most of the prediction accuracies are over 94%. These promising results show that the proposed method is capable of achieving higher performance in PPI detection.
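To illustrate the discrete wavelet transform step, the sketch below encodes a sequence numerically and applies a multi-level Haar DWT, summarizing the detail coefficients at each level. The Kyte-Doolittle hydropathy encoding, the Haar wavelet, the summary statistics and all function names are assumptions chosen for brevity; the paper combines continuous and discrete transforms and may use different encodings and wavelets.

```python
import math

# Kyte-Doolittle hydropathy values: one possible numeric encoding (an assumption here).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT; odd-length input is padded with its last value."""
    x = list(signal)
    if len(x) % 2:
        x.append(x[-1])
    r2 = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / r2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / r2 for i in range(0, len(x), 2)]
    return approx, detail


def dwt_features(seq, levels=3):
    """Mean and std of the detail coefficients per level, plus the mean of the final approximation."""
    signal = [KD[aa] for aa in seq.upper() if aa in KD]
    if not signal:
        raise ValueError("sequence contains no standard residues")
    feats = []
    for _ in range(levels):
        if len(signal) < 2:
            feats.extend([0.0, 0.0])  # too short to decompose further: pad with zeros
            continue
        signal, detail = haar_dwt(signal)
        mean = sum(detail) / len(detail)
        std = math.sqrt(sum((d - mean) ** 2 for d in detail) / len(detail))
        feats.extend([mean, std])
    feats.append(sum(signal) / len(signal))
    return feats


print(len(dwt_features("MKTAYIAKQRQISFVKSHFSRQ")))  # 7 features for levels=3
```

Because every level contributes two numbers plus the final approximation mean, sequences of any length map to a vector of length 2 × levels + 1, the kind of fixed-size input a classifier such as WSRC or an SVM needs.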

19 pages, 352 KiB  
Article
Representation Learning for Class C G Protein-Coupled Receptors Classification
by Raúl Cruz-Barbosa, Erik-German Ramos-Pérez and Jesús Giraldo
Molecules 2018, 23(3), 690; https://doi.org/10.3390/molecules23030690 - 19 Mar 2018
Cited by 3 | Viewed by 4042
Abstract
G protein-coupled receptors (GPCRs) are integral cell membrane proteins of relevance for pharmacology. The complete tertiary structure, including both extracellular and transmembrane domains, has not been determined for any member of class C GPCRs. An alternative way to work on GPCR structural models is to investigate their functionality through the analysis of their primary structure. In this context, sequence representation is a key factor for GPCR classification, where feature engineering is usually carried out. In this paper, we propose the use of representation learning to acquire the features that best represent the class C GPCR sequences and, at the same time, to obtain a classification model automatically. Deep learning methods in conjunction with amino acid physicochemical property indices are used for this purpose. Experimental results assessed by the classification accuracy, Matthews correlation coefficient and the balanced error rate show that using a hydrophobicity index and a restricted Boltzmann machine (RBM) can achieve performance (accuracy of 92.9%) similar to that reported in the literature. As a second proposal, we combine two or more physicochemical property indices, instead of only one, as the input for a deep architecture in order to add information from the sequences. Experimental results show that combining three hydrophobicity-related indices improves the classification performance of an RBM (accuracy of 94.1%) beyond the results reported in the literature for class C GPCRs without using feature selection methods.
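Accuracy alone can hide poor performance on small subfamilies, which is why the abstract also reports the balanced error rate (BER). A minimal sketch, assuming BER is defined as one minus the unweighted mean of per-class recall (a common definition; the paper's exact formula is not given here). The labels in the usage line are hypothetical toy data, not results from the paper.

```python
from collections import Counter


def accuracy_and_ber(y_true, y_pred):
    """Return (accuracy, balanced error rate) for a multi-class problem.

    BER is taken here as 1 minus the unweighted mean of per-class recall.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    support = Counter(y_true)                      # samples per true class
    hits = Counter(t for t, p in pairs if t == p)  # correct predictions per class
    mean_recall = sum(hits[c] / n for c, n in support.items()) / len(support)
    return accuracy, 1.0 - mean_recall


# Hypothetical, imbalanced toy labels: a classifier that always predicts the majority class
acc, ber = accuracy_and_ber(["mGluR"] * 8 + ["GABA_B"] * 2, ["mGluR"] * 10)
print(acc, ber)  # 0.8 0.5
```

The toy case shows the design motivation: 80% accuracy looks respectable, but a BER of 0.5 reveals that the minority class is never recognized.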

14 pages, 1149 KiB  
Article
RPiRLS: Quantitative Predictions of RNA Interacting with Any Protein of Known Sequence
by Wen-Jun Shen, Wenjuan Cui, Danze Chen, Jieming Zhang and Jianzhen Xu
Molecules 2018, 23(3), 540; https://doi.org/10.3390/molecules23030540 - 28 Feb 2018
Cited by 10 | Viewed by 4067
Abstract
RNA-protein interactions (RPIs) have critical roles in numerous fundamental biological processes, such as post-transcriptional gene regulation, viral assembly, cellular defence and protein synthesis. As the amount of experimental RNA-protein binding data has increased rapidly due to high-throughput sequencing methods, it is now possible to measure and understand RNA-protein interactions by computational methods. In this study, we integrate a sequence-based derived kernel with regularized least squares to perform prediction. The derived kernel exploits the contextual information around an amino acid or a nucleic acid as well as the repetitive conserved motif information. We propose a novel machine learning method, called RPiRLS, to predict the interaction between any RNA and protein of known sequences. For the RPiRLS classifier, each protein sequence comprises up to 20 different amino acids, whereas for the RPiRLS-7G classifier, each protein sequence is represented using a 7-letter reduced alphabet based on physicochemical properties. We evaluated both methods on a number of benchmark data sets and compared their performance with two recently developed, state-of-the-art methods, RPI-Pred and IPMiner. On the non-redundant benchmark test sets extracted from the PRIDB, the RPiRLS method outperformed RPI-Pred and IPMiner in terms of accuracy, specificity and sensitivity. Further, RPiRLS achieved an accuracy of 92% on the prediction of lncRNA-protein interactions. The proposed method can also be extended to construct RNA-protein interaction networks. The RPiRLS web server is freely available at http://bmc.med.stu.edu.cn/RPiRLS.
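The RPiRLS-7G variant represents proteins over a 7-letter reduced alphabet. The sketch below shows how such a reduction and a resulting k-mer frequency vector might be computed. The grouping used here is the seven-class scheme from the conjoint triad method and is an assumption; the exact grouping used by RPiRLS-7G may differ, and the function names are hypothetical.

```python
from itertools import product

# Seven physicochemical groups from the conjoint-triad encoding; assumed here for
# illustration only, RPiRLS-7G's exact grouping may differ.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
REDUCE = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}


def reduce_sequence(seq):
    """Map each residue to its group label '0'..'6'; non-standard residues are skipped."""
    return "".join(REDUCE[aa] for aa in seq.upper() if aa in REDUCE)


def kmer_frequencies(reduced, k=3):
    """Normalized counts of all 7**k k-mers over the reduced alphabet, in a fixed order."""
    keys = ["".join(p) for p in product("0123456", repeat=k)]
    counts = dict.fromkeys(keys, 0)
    for i in range(len(reduced) - k + 1):
        counts[reduced[i:i + k]] += 1
    total = max(1, len(reduced) - k + 1)
    return [counts[key] / total for key in keys]


print(reduce_sequence("MKRDC"))  # 24456
```

Reducing 20 letters to 7 shrinks the 3-mer space from 8,000 to 343 dimensions, which is one plausible reason such alphabets are used with kernel methods on limited training data.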

10 pages, 3168 KiB  
Article
Structural Dynamics of DPP-4 and Its Influence on the Projection of Bioactive Ligands
by Simone Queiroz Pantaleão, Eric Allison Philot, Pedro Túlio De Resende-Lara, Angélica Nakagawa Lima, David Perahia, Maria Atanassova Miteva, Ana Ligia Scott and Kathia Maria Honorio
Molecules 2018, 23(2), 490; https://doi.org/10.3390/molecules23020490 - 23 Feb 2018
Cited by 29 | Viewed by 8995
Abstract
Dipeptidyl peptidase-4 (DPP-4) is a therapeutic target for type II diabetes mellitus. Therefore, it is important to understand the structural aspects of this enzyme and its interaction with drug candidates. This study involved molecular dynamics simulations, normal mode analysis, binding site detection and analysis of molecular interactions to understand the protein dynamics. We identified some DPP-4 functional motions contributing to the exposure of the binding sites, as well as twist movements revealing how the two enzyme chains, defined as chains A (residues 40–767) and B (residues 40–767), are interconnected in their bioactive form. By understanding the enzyme structure, its motions and the regions of its binding sites, it will be possible to contribute to the design of new DPP-4 inhibitors as drug candidates to treat diabetes.

3988 KiB  
Article
Designability of Aromatic Interaction Networks at E. coli Bacterioferritin B-Type Channels
by Yu Zhang, Jinhua Zhou, Maziar S. Ardejani, Xun Li, Fei Wang and Brendan P. Orner
Molecules 2017, 22(12), 2184; https://doi.org/10.3390/molecules22122184 - 8 Dec 2017
Cited by 11 | Viewed by 5613
Abstract
The bacterioferritin from E. coli (BFR), a maxi-ferritin made of 24 subunits, has been utilized as a model to study the fundamentals of protein folding and self-assembly. Through structural and computational analyses, two amino acid residues at the B-site interface of BFR were chosen to investigate the role they play in the self-assembly of nano-cage formation, and the possibility of building aromatic interaction networks at B-type protein–protein interfaces. Three mutants were designed, expressed, purified, and characterized using transmission electron microscopy, size exclusion chromatography, native gel electrophoresis, and temperature-dependent circular dichroism spectroscopy. All of the mutants fold into α-helical structures and possess lowered thermostability. The double mutant D132W/N34W was 12 °C less stable than the wild type, and was also the only mutant for which cage-like nanostructures could not be detected in the dried, surface-immobilized conditions of transmission electron microscopy. Two mutants, N34W and D132W/N34W, formed only dimers in solution, while mutant D132W favored the 24-mer even more robustly than the wild type, suggesting that we were successful in designing proteins with enhanced assembly properties. This investigation into the structure of this important class of proteins could help to understand the self-assembly of proteins in general.

1548 KiB  
Article
Identification of DNA–protein Binding Sites through Multi-Scale Local Average Blocks on Sequence Information
by Cong Shen, Yijie Ding, Jijun Tang, Jian Song and Fei Guo
Molecules 2017, 22(12), 2079; https://doi.org/10.3390/molecules22122079 - 28 Nov 2017
Cited by 33 | Viewed by 4908
Abstract
DNA–protein interactions play pivotal roles in diverse biological processes and are paramount for cell metabolism, and identifying them computationally is a prudent way to reduce the cost of in vitro and in vivo experiments. A variety of state-of-the-art methods have been developed to improve the accuracy of DNA–protein binding site prediction. Nevertheless, structure-based approaches are limited when 3D information is unavailable, and their predictive performance can still be improved. In this paper, we propose a competitive method called the Multi-scale Local Average Blocks (MLAB) algorithm to address this issue. Unlike structure-based approaches, MLAB not only extracts local evolutionary information from primary sequences, but also uses predicted solvent accessibility. Moreover, the DNA–protein binding site predictor is built on an ensemble weighted sparse representation model with random under-sampling. To evaluate the performance of MLAB, we conduct comprehensive experiments on DNA–protein binding site prediction. MLAB gives Matthews correlation coefficient (MCC) values of 0.392, 0.315, 0.439 and 0.245 on the PDNA-543, PDNA-41, PDNA-316 and PDNA-52 datasets, respectively. Compared with other outstanding methods, MLAB shows clear advantages: its MCC is higher by at least 0.053, 0.015 and 0.064 on the PDNA-543, PDNA-41 and PDNA-316 datasets, respectively.

1888 KiB  
Article
Drug-Target Interaction Prediction through Label Propagation with Linear Neighborhood Information
by Wen Zhang, Yanlin Chen and Dingfang Li
Molecules 2017, 22(12), 2056; https://doi.org/10.3390/molecules22122056 - 25 Nov 2017
Cited by 75 | Viewed by 6991
Abstract
Interactions between drugs and target proteins provide important information for drug discovery. Currently, experiments have identified only a small number of drug-target interactions. Therefore, the development of computational methods for drug-target interaction prediction is an urgent task of theoretical interest and practical significance. In this paper, we propose a label propagation method with linear neighborhood information (LPLNI) for predicting unobserved drug-target interactions. First, we calculate drug-drug linear neighborhood similarity in the feature spaces by considering how to reconstruct data points from their neighbors. Then, we take these similarities as the manifold of drugs and assume that the manifold is unchanged in the interaction space. Finally, we predict unobserved interactions between known drugs and targets using drug-drug linear neighborhood similarity and known drug-target interactions. The experiments show that LPLNI can utilize only known drug-target interactions to make high-accuracy predictions on four benchmark datasets. Furthermore, we consider incorporating chemical structures into LPLNI models. Experimental results demonstrate that the model with integrated information (LPLNI-II) produces improved performance, better than other state-of-the-art methods. The known drug-target interactions are an important information source for computational predictions. The usefulness of the proposed method is demonstrated by cross-validation and a case study.
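The propagation step can be illustrated with the standard iterative label propagation update F ← αPF + (1 − α)Y, where P is a row-normalized drug-drug similarity matrix and Y holds known interactions. This is a generic sketch, not the authors' code: the linear neighborhood similarity itself is assumed to be precomputed and passed in as W, and alpha, iters and tol are illustrative defaults. The toy matrices are hypothetical.

```python
def label_propagation(W, Y, alpha=0.5, iters=100, tol=1e-9):
    """Iterate F <- alpha * P @ F + (1 - alpha) * Y until the largest change is below tol.

    W: n x n drug-drug similarity (assumed precomputed), Y: n x m known interactions (0/1).
    """
    n, m = len(Y), len(Y[0])
    # Row-normalize W so each row sums to 1; all-zero rows stay zero.
    P = []
    for row in W:
        s = sum(row)
        P.append([w / s for w in row] if s else [0.0] * n)
    F = [[float(v) for v in r] for r in Y]
    for _ in range(iters):
        new = [[alpha * sum(P[i][k] * F[k][j] for k in range(n)) + (1 - alpha) * Y[i][j]
                for j in range(m)] for i in range(n)]
        diff = max(abs(new[i][j] - F[i][j]) for i in range(n) for j in range(m))
        F = new
        if diff < tol:
            break
    return F


W = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # hypothetical: drugs 0 and 1 are neighbours
Y = [[1, 0], [0, 0], [0, 1]]          # drug 0 hits target 0, drug 2 hits target 1
F = label_propagation(W, Y)
print(round(F[1][0], 3))  # 0.333: drug 1 inherits a score for target 0 from drug 0
```

Because α < 1 and P is row-stochastic (or zero), the iteration is a contraction and converges; in the toy case drug 1 receives a nonzero score only for the target of its neighbour.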

1888 KiB  
Article
Predict the Relationship between Gene and Large Yellow Croaker’s Economic Traits
by Xiangxiang Zeng, Shuting Jin, Jing Jiang, Kunhuang Han, Xiaoping Min and Xiangrong Liu
Molecules 2017, 22(11), 1978; https://doi.org/10.3390/molecules22111978 - 16 Nov 2017
Cited by 6 | Viewed by 3999
Abstract
The importance of a gene’s impact on traits is well appreciated. Gene expression affects the growth, immunity, reproduction and environmental resistance of some fish, and in turn the economic performance of fish-related businesses. Studying the connection between genes and traits can help elucidate the growth of fishes. Thus far, no curated database of large yellow croaker (Larimichthys crocea) genes exists. Identifying the genes related to the growth efficiency of fish would have a huge impact on research. For example, the protein encoded by the IFIH1 gene is associated with the immune response to viral infection, which affects the survival rate of large yellow croakers. Thus, we collected data from the published literature and combined them with a biological genetic database related to the large yellow croaker. Based on these data, we can predict new gene–trait associations that have not yet been discovered. This work will contribute to research on the growth of large yellow croakers.

2023 KiB  
Article
Glypre: In Silico Prediction of Protein Glycation Sites by Fusing Multiple Features and Support Vector Machine
by Xiaowei Zhao, Xiaosa Zhao, Lingling Bao, Yonggang Zhang, Jiangyan Dai and Minghao Yin
Molecules 2017, 22(11), 1891; https://doi.org/10.3390/molecules22111891 - 3 Nov 2017
Cited by 17 | Viewed by 4845
Abstract
Glycation is a non-enzymatic process, occurring inside or outside the host body, in which a sugar molecule attaches to a protein or lipid molecule. It is an important form of post-translational modification (PTM) that impairs the function and changes the characteristics of proteins, so identifying glycation sites may provide useful guidelines for understanding various biological functions of proteins. In this study, we proposed an accurate prediction tool, named Glypre, for lysine glycation. First, we used multiple informative features to encode the peptides. These features included the position scoring function, secondary structure, AAindex, and the composition of k-spaced amino acid pairs. Second, the distribution of distinctive features of the residues surrounding the glycation and non-glycation sites was statistically analysed. Third, based on the distribution of these features, we developed a new predictor by using different optimal window sizes for different properties and a two-step feature selection method, which applied the maximum relevance minimum redundancy method followed by a greedy feature selection procedure. Under 10-fold cross-validation, Glypre achieved a sensitivity of 57.47%, a specificity of 90.78%, an accuracy of 79.68%, an area under the receiver-operating characteristic (ROC) curve (AUC) of 0.86, and a Matthews correlation coefficient (MCC) of 0.52. The detailed analysis results showed that our predictor may play a complementary role to other existing methods for identifying protein lysine glycation. The source code and datasets of Glypre are available in the Supplementary File.
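For reference, the four threshold metrics reported for Glypre follow from confusion-matrix counts as sketched below, with MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The counts in the usage line are hypothetical and are not Glypre's results; the function name is illustrative.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sn = tp / (tp + fn)                      # recall on positive (glycation) sites
    sp = tn / (tn + fp)                      # recall on negative sites
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return sn, sp, acc, mcc


# Hypothetical counts for illustration only
sn, sp, acc, mcc = binary_metrics(tp=50, fn=50, tn=90, fp=10)
print(sn, sp, acc, round(mcc, 3))  # 0.5 0.9 0.7 0.436
```

MCC is useful alongside accuracy here because glycation datasets are typically imbalanced, and MCC only approaches 1 when both classes are predicted well.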

612 KiB  
Article
ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network
by Renzhi Cao, Colton Freitas, Leong Chan, Miao Sun, Haiqing Jiang and Zhangxin Chen
Molecules 2017, 22(10), 1732; https://doi.org/10.3390/molecules22101732 - 17 Oct 2017
Cited by 154 | Viewed by 10295
Abstract
With the development of next generation sequencing techniques, it is fast and cheap to determine protein sequences but relatively slow and expensive to extract useful information from protein sequences because of the limitations of traditional biological experimental techniques. Protein function prediction has been a long-standing challenge in filling the gap between the huge number of protein sequences and their known functions. In this paper, we propose a novel method that converts the protein function prediction problem into a language translation problem, from a newly proposed protein sequence language, “ProLan”, to a protein function language, “GOLan”, and build a neural machine translation model based on recurrent neural networks to translate “ProLan” into “GOLan”. We blindly tested our method by participating in the third Critical Assessment of Function Annotation (CAFA 3) in 2016, and also evaluated the performance of our method on selected proteins whose function was released after the CAFA competition. The good performance on the training and testing datasets demonstrates that our newly proposed method is a promising direction for protein function prediction. In summary, we propose, for the first time, a method that converts the protein function prediction problem into a language translation problem and applies a neural machine translation model to protein function prediction.

824 KiB  
Article
Systematic Identification of Machine-Learning Models Aimed to Classify Critical Residues for Protein Function from Protein Structure
by Ricardo Corral-Corral, Jesús A. Beltrán, Carlos A. Brizuela and Gabriel Del Rio
Molecules 2017, 22(10), 1673; https://doi.org/10.3390/molecules22101673 - 9 Oct 2017
Cited by 9 | Viewed by 4716
Abstract
Protein structure and protein function should be related, yet the nature of this relationship remains unresolved. Mapping the critical residues for protein function with protein structure features represents an opportunity to explore this relationship, yet two important limitations have precluded a proper analysis of the structure-function relationship of proteins: (i) the lack of a formal definition of what critical residues are and (ii) the lack of a systematic evaluation of methods and protein structure features. To address this problem, here we introduce an index to quantify the protein-function criticality of a residue based on experimental data, and a strategy aimed at optimizing both descriptors of protein structure (physicochemical and centrality descriptors) and machine learning algorithms, to minimize the error in the classification of critical residues. We observed that both physicochemical and centrality descriptors of residues effectively relate protein structure and protein function, and that physicochemical descriptors better describe critical residues. We also show that critical residues are better classified when residue criticality is considered as a binary attribute (i.e., residues are considered critical or not critical). Using this binary annotation for critical residues, eight models rendered accurate and non-overlapping classification of critical residues, confirming the multi-factorial character of the structure-function relationship of proteins.

3639 KiB  
Article
Molecular Dynamic Simulation of Space and Earth-Grown Crystal Structures of Thermostable T1 Lipase Geobacillus zalihae Revealed a Better Structure
by Siti Nor Hasmah Ishak, Sayangku Nor Ariati Mohamad Aris, Khairul Bariyyah Abd Halim, Mohd Shukuri Mohamad Ali, Thean Chor Leow, Nor Hafizah Ahmad Kamarudin, Malihe Masomian and Raja Noor Zaliha Raja Abd Rahman
Molecules 2017, 22(10), 1574; https://doi.org/10.3390/molecules22101574 - 25 Sep 2017
Cited by 26 | Viewed by 7733
Abstract
Reduced sedimentation and convection in a microgravity environment provide well-suited conditions for growing high-quality protein crystals. Thermostable T1 lipase from the bacterium Geobacillus zalihae has been crystallized using the counter diffusion method under space and earth conditions. A preliminary study of both structures using the YASARA molecular modeling program showed differences in the number of hydrogen bonds, ionic interactions, and conformation. The space-grown crystal structure contains more hydrogen bonds than the earth-grown crystal structure. A molecular dynamics simulation study was used to provide insight into the fluctuations and conformational changes of both T1 lipase structures. Analysis of the root mean square deviation (RMSD), radius of gyration, and root mean square fluctuation (RMSF) showed that the space-grown structure is more stable than the earth-grown structure. The space-grown structure also showed more hydrogen bonds and ionic interactions than the earth-grown structure. Further analysis revealed that the space-grown structure has long-lived interactions; hence, it is considered the more stable structure. This study describes the conformational dynamics of T1 lipase crystal structures grown in space and on earth.
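The three trajectory measures named in the abstract have simple definitions. A minimal sketch, assuming the frames have already been superimposed on a reference (MD packages normally perform this fitting step) and that all atoms are weighted equally; production analyses usually use mass weighting and dedicated tools, and the function names here are hypothetical.

```python
import math


def rmsd(a, b):
    """RMSD between two equally sized, already superimposed lists of (x, y, z) coordinates."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and equal in size")
    sq = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2 for p, q in zip(a, b))
    return math.sqrt(sq / len(a))


def radius_of_gyration(coords):
    """Unweighted radius of gyration: RMS distance of atoms from their centroid."""
    n = len(coords)
    c = [sum(p[i] for p in coords) / n for i in range(3)]
    return math.sqrt(sum(sum((p[i] - c[i]) ** 2 for i in range(3)) for p in coords) / n)


def rmsf(frames):
    """Per-atom RMSF over a trajectory given as a list of aligned frames."""
    t, n = len(frames), len(frames[0])
    mean = [[sum(f[a][i] for f in frames) / t for i in range(3)] for a in range(n)]
    return [math.sqrt(sum(sum((f[a][i] - mean[a][i]) ** 2 for i in range(3)) for f in frames) / t)
            for a in range(n)]


print(rmsd([(0, 0, 0)], [(3, 4, 0)]))  # 5.0
```

RMSD and radius of gyration are computed per frame to track global drift and compactness over time, whereas RMSF is averaged over time per atom (or residue) to locate flexible regions.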

807 KiB  
Article
Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods
by Kaiyang Qu, Ke Han, Song Wu, Guohua Wang and Leyi Wei
Molecules 2017, 22(10), 1602; https://doi.org/10.3390/molecules22101602 - 22 Sep 2017
Cited by 32 | Viewed by 4559
Abstract
DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The current main prediction method is based on machine learning, and its accuracy mainly depends on the feature extraction method. Therefore, using an efficient feature representation method is important to enhance the classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method, which combines three feature representation methods, namely, K-Skip-N-Grams, Information theory, and Sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. A support vector machine is used as the classifier. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors obtained from the combination of all three feature extraction methods show the best performance in 10-fold cross-validation, both without dimensionality reduction and with dimensionality reduction by max-relevance-max-distance. Moreover, the reduced mixed feature method performs better than the non-reduced mixed feature technique. The feature vectors that combine SSF and K-Skip-N-Grams show the best performance on the test set. Overall, mixed features outperform single features.
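The abstract does not define K-Skip-N-Grams. One common reading, sketched below as an assumption, counts n-grams whose consecutive residues are separated by a gap of g positions for every g from 0 to k, and reports the pooled frequencies; the paper's exact variant may differ, and the function name is hypothetical.

```python
from collections import Counter


def k_skip_n_grams(seq, n=2, k=1):
    """Pooled frequencies of n-grams whose consecutive residues are g positions apart, g = 0..k."""
    seq = seq.upper()
    counts = Counter()
    for gap in range(k + 1):
        step = gap + 1               # distance between consecutive picked residues
        span = (n - 1) * step + 1    # window length covered by one n-gram
        for start in range(len(seq) - span + 1):
            counts[seq[start:start + span:step]] += 1
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


print(sorted(k_skip_n_grams("ACDE", n=2, k=1)))  # ['AC', 'AD', 'CD', 'CE', 'DE']
```

A fixed-length feature vector for an SVM would index these frequencies by all 20**n possible grams, filling zeros for grams that do not occur.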
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)
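In general terms, K-Skip-N-Grams features count residue n-grams whose members may be separated by up to k skipped positions. The sketch below is only an illustration, assuming n = 2 (residue pairs), the 20 standard amino acids, and normalization by the total pair count; the name `k_skip_bigram_features` is hypothetical, and the paper's exact definition and parameter values may differ.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs in a fixed order, so vectors are comparable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def k_skip_bigram_features(seq, k):
    """Normalized counts of residue pairs separated by 0..k skipped residues.

    Assumption: n = 2 only; the paper may use other n and k values.
    """
    counts = [0] * len(PAIRS)
    total = 0
    for i, first in enumerate(seq):
        # gap = number of residues skipped between the two members of a pair
        for gap in range(k + 1):
            j = i + gap + 1
            if j >= len(seq):
                break
            pair = first + seq[j]
            if pair in PAIR_INDEX:  # ignore non-standard residues such as X
                counts[PAIR_INDEX[pair]] += 1
                total += 1
    return [c / total if total else 0.0 for c in counts]
```

With k = 0 the vector reduces to ordinary dipeptide composition; larger k values add longer-range residue pairs at the cost of a denser, less specific profile.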

Article
Integrative Pathway Analysis of Genes and Metabolites Reveals Metabolism Abnormal Subpathway Regions and Modules in Esophageal Squamous Cell Carcinoma
by Chunquan Li, Qiuyu Wang, Jiquan Ma, Shengshu Shi, Xin Chen, Haixiu Yang and Junwei Han
Molecules 2017, 22(10), 1599; https://doi.org/10.3390/molecules22101599 - 22 Sep 2017
Cited by 12 | Viewed by 5931
Abstract
Aberrant metabolism is one of the main driving forces in the initiation and development of esophageal squamous cell carcinoma (ESCC). Both genes and metabolites play important roles in metabolic pathways. Integrative pathway analysis of both genes and metabolites will thus help to interpret the underlying biological phenomena. Here, we performed integrative pathway analysis of gene and metabolite profiles by analyzing six gene expression profiles and seven metabolite profiles of ESCC. Multiple known and novel subpathways associated with ESCC, such as ‘beta-Alanine metabolism’, were identified through the combined use of differential genes, differential metabolites, and their positional importance in pathways. Furthermore, a global ESCC-Related Metabolic (ERM) network was constructed, and 31 modules were identified by clustering analysis of the ERM network. We found that the three modules located in the central region of the ERM network, especially the core region of Module_1, consisted primarily of aldehyde dehydrogenase (ALDH) superfamily members, which contribute to the development of ESCC. In Module_4, pyruvate and the genes and metabolites in its adjacent region clustered together and formed a core region within the module. Several prognostic genes, including GPT, ALDH1B1, ABAT, WBSCR22 and MDH1, appeared in the three central modules of the network, suggesting that they may serve as potential prognostic markers in ESCC.
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Article
EPuL: An Enhanced Positive-Unlabeled Learning Algorithm for the Prediction of Pupylation Sites
by Xuanguo Nan, Lingling Bao, Xiaosa Zhao, Xiaowei Zhao, Arun Kumar Sangaiah, Gai-Ge Wang and Zhiqiang Ma
Molecules 2017, 22(9), 1463; https://doi.org/10.3390/molecules22091463 - 5 Sep 2017
Cited by 26 | Viewed by 4467
Abstract
Protein pupylation is a type of post-translational modification that plays a crucial role in the cellular functions of bacteria. To gain better insight into the mechanisms underlying pupylation, an important first step is to identify pupylation sites. To date, several computational methods have been established for predicting pupylation sites; most of them artificially construct negative samples from verified pupylated proteins to train their classifiers. However, if this process is not done properly, it can dramatically affect the performance of the final predictor. In this work, unlike previous computational methods, we propose an enhanced positive-unlabeled learning algorithm (EPuL) for the pupylation site prediction problem, which uses only positive and unlabeled samples. First, we separate the training dataset into a positive dataset and an unlabeled dataset, which contains the remaining non-annotated lysine residues. Then, the EPuL algorithm is used to select a reliable initial negative dataset and then to iteratively pick out the non-pupylation sites. In 10-fold cross-validation, the proposed method achieved an accuracy of 90.24%, an Area Under Curve (AUC) of 0.93 and an MCC of 0.81. A user-friendly web server for predicting pupylation sites was developed and made freely available at http://59.73.198.144:8080/EPuL
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)
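The two-stage idea described above (select reliable negatives from the unlabeled set, then iteratively expand them) is shared by many positive-unlabeled learning schemes. The sketch below is not EPuL itself: it is a generic centroid-based two-step PU procedure, assuming numeric feature vectors, Euclidean distance, and a hypothetical `initial_fraction` of unlabeled samples taken as the first reliable negatives.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def two_step_pu(positives, unlabeled, initial_fraction=0.2, max_iter=10):
    """Return sorted indices of unlabeled samples treated as negatives.

    Generic two-step PU sketch (hypothetical parameters), not EPuL.
    """
    pos_c = centroid(positives)
    # Stage 1: unlabeled samples farthest from the positive centroid
    # form the initial reliable-negative set.
    order = sorted(range(len(unlabeled)),
                   key=lambda i: math.dist(unlabeled[i], pos_c),
                   reverse=True)
    n_init = max(1, int(len(unlabeled) * initial_fraction))
    negatives = set(order[:n_init])
    # Stage 2: repeatedly move unlabeled samples that lie closer to the
    # negative centroid than to the positive centroid into the negative set.
    for _ in range(max_iter):
        neg_c = centroid([unlabeled[i] for i in negatives])
        new = {
            i for i in range(len(unlabeled))
            if i not in negatives
            and math.dist(unlabeled[i], neg_c) < math.dist(unlabeled[i], pos_c)
        }
        if not new:
            break
        negatives |= new
    return sorted(negatives)
```

Samples that never move into the negative set remain unlabeled; a final classifier would then be trained on the positives versus the selected negatives.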

Article
Detection of Interactions between Proteins by Using Legendre Moments Descriptor to Extract Discriminatory Information Embedded in PSSM
by Yan-Bin Wang, Zhu-Hong You, Li-Ping Li, Yu-An Huang and Hai-Cheng Yi
Molecules 2017, 22(8), 1366; https://doi.org/10.3390/molecules22081366 - 18 Aug 2017
Cited by 28 | Viewed by 4870
Abstract
Protein-protein interactions (PPIs) play a major part in most cellular processes. Although a great deal of research has been devoted to detecting PPIs through high-throughput technologies, these methods are expensive and cumbersome. Compared with traditional experimental methods, computational methods have attracted much attention because of their good performance in detecting PPIs. In this work, a novel computational method named PCVM-LM is proposed, which combines the probabilistic classification vector machine (PCVM) model and Legendre moments (LMs) to predict PPIs from amino acid sequences. The improvement mainly comes from using LMs to extract discriminatory information embedded in the position-specific scoring matrix (PSSM), combined with the PCVM classifier to perform the prediction. The proposed method was evaluated on Yeast and Helicobacter pylori datasets with five-fold cross-validation. It achieved high average accuracies of 96.37% and 93.48%, respectively, which are much better than those of other well-known methods. To further evaluate the proposed method, we also compared it with the state-of-the-art support vector machine (SVM) classifier and other existing methods on the same datasets. The comparison results show that our method outperforms the SVM-based method and the other existing methods. These promising results show the reliability and effectiveness of the proposed method, which can be a useful decision support tool for protein research.
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)
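Legendre moments treat a matrix such as an L × 20 PSSM as a two-dimensional image on [-1, 1] × [-1, 1] and project it onto products of Legendre polynomials: λ_pq = (2p+1)(2q+1)/4 ∬ P_p(x) P_q(y) f(x, y) dx dy. The sketch below is a minimal illustration, assuming a midpoint-rule discretization and moments up to a hypothetical total order `max_order`; the normalization and orders used in PCVM-LM may differ.

```python
def legendre(n, x):
    """Legendre polynomial P_n(x) via Bonnet's recursion."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, x
    for k in range(1, n):
        # (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
        prev, cur = cur, ((2 * k + 1) * x * cur - k * prev) / (k + 1)
    return cur


def legendre_moments(matrix, max_order):
    """Discrete Legendre moments lambda_pq for all p + q <= max_order.

    Assumption: cell centers are mapped onto [-1, 1] (midpoint rule).
    """
    rows, cols = len(matrix), len(matrix[0])
    xs = [-1 + (2 * i + 1) / rows for i in range(rows)]
    ys = [-1 + (2 * j + 1) / cols for j in range(cols)]
    dx, dy = 2 / rows, 2 / cols
    moments = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            s = sum(legendre(p, xs[i]) * legendre(q, ys[j]) * matrix[i][j]
                    for i in range(rows) for j in range(cols))
            moments[(p, q)] = (2 * p + 1) * (2 * q + 1) / 4 * s * dx * dy
    return moments
```

Because the moments are computed on a fixed [-1, 1] grid, proteins of different lengths L yield feature vectors of the same size.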

Article
Predicting and Interpreting the Structure of Type IV Pilus of Electricigens by Molecular Dynamics Simulations
by Chuanjun Shu, Ke Xiao, Changchang Cao, Dewu Ding and Xiao Sun
Molecules 2017, 22(8), 1342; https://doi.org/10.3390/molecules22081342 - 12 Aug 2017
Cited by 7 | Viewed by 4964
Abstract
Nanowires that transfer electrons to extracellular acceptors are important in organic matter degradation and nutrient cycling in the environment. Geobacter pili, which belong to the Type IV pilus group, are regarded as nanowire-like biological structures. However, determining the structure of pili remains challenging due to the insolubility of monomers, the presence of surface appendages, the heterogeneity of the assembly, and the low resolution of electron microscopy techniques. Our previous study provided a method to predict structures for Type IV pili. In this work, we improved on that method by using molecular dynamics simulations to optimize the structures of Neisseria gonorrhoeae (GC), Neisseria meningitidis and Geobacter uraniireducens pili. Comparison of the predicted structures of the GC and Neisseria meningitidis pili with their native structures showed that the proposed method could predict Type IV pilus structures successfully. According to the predicted structures, the structural basis for conductivity in G. uraniireducens pili was attributed to the three N-terminal aromatic amino acids. The aromatics were interspersed within regions of charged amino acids, which may influence the configuration of the aromatic contacts and the rate of electron transfer. These results will supplement experimental research into the mechanism of long-range electron transport along the pili of electricigens.
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Article
Neighbor Affinity-Based Core-Attachment Method to Detect Protein Complexes in Dynamic PPI Networks
by Xiujuan Lei and Jing Liang
Molecules 2017, 22(7), 1223; https://doi.org/10.3390/molecules22071223 - 24 Jul 2017
Cited by 7 | Viewed by 5804
Abstract
Protein complexes play significant roles in cellular processes. Identifying protein complexes from protein-protein interaction (PPI) networks is an effective strategy for understanding biological processes and cellular functions. A number of methods have recently been proposed to detect protein complexes. However, most of them predict protein complexes from static PPI networks and usually overlook the inherent dynamics and topological properties of protein complexes. In this paper, we propose a novel method, called NABCAM (Neighbor Affinity-Based Core-Attachment Method), to identify protein complexes from dynamic PPI networks. First, the centrality score of every protein is calculated, and the proteins with the highest centrality scores are regarded as seed proteins. Second, the seed proteins are expanded into complex cores by calculating the similarity between each seed protein and its neighboring proteins. Third, attachments are appended to their corresponding complex cores by comparing the affinity among neighbors inside the core against that outside the core. Finally, filtering is carried out to obtain the final clustering result. Results on the DIP database show that NABCAM predicts protein complexes effectively in comparison with other state-of-the-art methods. Moreover, many of the protein complexes predicted by our method are biologically significant.
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)
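The four steps listed above (seed selection, core expansion, attachment, filtering) can be illustrated on a single network snapshot. This toy sketch is not the published NABCAM algorithm: it assumes degree as the centrality score, Jaccard similarity of closed neighborhoods as the seed-to-neighbor similarity, a simple "more partners inside the core than outside" rule as the affinity test, and hypothetical thresholds. It also ignores the dynamic, time-resolved aspect of the PPI network.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def core_attachment_complexes(adj, seed_fraction=0.2, sim_threshold=0.3,
                              min_size=3):
    """Toy core-attachment clustering on one PPI network snapshot.

    adj maps each protein to the set of its interaction partners.
    All parameters are hypothetical illustration values.
    """
    # Step 1: centrality (here simply degree); top-ranked proteins are seeds.
    ranked = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    seeds = ranked[:max(1, int(len(ranked) * seed_fraction))]
    complexes = []
    for seed in seeds:
        # Step 2: expand the seed into a core with neighbors whose closed
        # neighborhoods are similar enough to the seed's.
        seed_nbhd = adj[seed] | {seed}
        core = {seed} | {u for u in adj[seed]
                         if jaccard(adj[u] | {u}, seed_nbhd) >= sim_threshold}
        # Step 3: attach outside proteins with more partners inside the
        # core than outside it.
        frontier = set().union(*(adj[c] for c in core)) - core
        attachments = {v for v in frontier
                       if len(adj[v] & core) > len(adj[v] - core)}
        complexes.append(frozenset(core | attachments))
    # Step 4: filter out small and duplicate complexes.
    result = []
    for c in complexes:
        if len(c) >= min_size and c not in result:
            result.append(c)
    return result
```

Applying the same procedure to each time-point subnetwork of a dynamic PPI network, and merging the results, would be one way to extend this toy version.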

Article
Prediction of Drug–Target Interaction Networks from the Integration of Protein Sequences and Drug Chemical Structures
by Fan-Rong Meng, Zhu-Hong You, Xing Chen, Yong Zhou and Ji-Yong An
Molecules 2017, 22(7), 1119; https://doi.org/10.3390/molecules22071119 - 5 Jul 2017
Cited by 62 | Viewed by 8626
Abstract
Knowledge of drug–target interactions (DTIs) plays an important role in discovering new drug candidates. Unfortunately, experimental methods for predicting DTIs have unavoidable shortcomings, including being time-consuming and expensive. This motivated us to develop an effective computational method to predict DTIs based on protein sequence. In this paper, we propose a novel computational approach based on protein sequence, namely PDTPS (Predicting Drug Targets with Protein Sequence), to predict DTIs. The PDTPS method combines Bi-gram probabilities (BIGP), Position Specific Scoring Matrix (PSSM), and Principal Component Analysis (PCA) with a Relevance Vector Machine (RVM). To evaluate the predictive capacity of PDTPS, experiments were carried out on enzyme, ion channel, GPCR, and nuclear receptor datasets using five-fold cross-validation. PDTPS achieved average accuracies of 97.73%, 93.12%, 86.78%, and 87.78% on the enzyme, ion channel, GPCR and nuclear receptor datasets, respectively. The experimental results show that our method has good prediction performance. To further evaluate PDTPS, we compared it with the state-of-the-art support vector machine (SVM) classifier on the enzyme and ion channel datasets, and with other existing methods on all four datasets. The promising comparison results further demonstrate the efficiency and robustness of the proposed PDTPS method, making it a useful tool for predicting DTIs as well as for other bioinformatics tasks.
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)
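In its commonly used form, the bi-gram probability feature of an L × 20 PSSM P is B(m, n) = Σ_{i=1}^{L-1} P(i, m) · P(i+1, n), which yields a 20 × 20 = 400-dimensional vector. The sketch below is a minimal illustration, assuming logistic normalization of raw PSSM scores; the PCA and RVM stages of PDTPS are not shown, and both function names are hypothetical.

```python
import math


def sigmoid_normalize(pssm):
    """Map raw PSSM scores into (0, 1) with the logistic function (assumed)."""
    return [[1 / (1 + math.exp(-s)) for s in row] for row in pssm]


def bigram_probabilities(pssm):
    """400-dimensional bi-gram feature vector from an L x 20 PSSM.

    B[m][n] sums P[i][m] * P[i + 1][n] over consecutive positions i.
    """
    cols = len(pssm[0])
    feats = [[0.0] * cols for _ in range(cols)]
    for i in range(len(pssm) - 1):
        for m in range(cols):
            for n in range(cols):
                feats[m][n] += pssm[i][m] * pssm[i + 1][n]
    # Flatten row by row: index m * 20 + n holds B[m][n].
    return [v for row in feats for v in row]
```

The sum over consecutive positions makes the output length independent of L, which is what allows proteins of different lengths to share one feature space.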

Article
High-Performance Prediction of Human Estrogen Receptor Agonists Based on Chemical Structures
by Yuki Asako and Yoshihiro Uesawa
Molecules 2017, 22(4), 675; https://doi.org/10.3390/molecules22040675 - 23 Apr 2017
Cited by 8 | Viewed by 5477
Abstract
Many agonists of the estrogen receptor are known to disrupt endocrine function. We have developed a computational model that predicts agonists for the estrogen receptor ligand-binding domain in an assay system. Our model was entered into the Tox21 Data Challenge 2014, a computational toxicology competition organized by the National Center for Advancing Translational Sciences. This competition aims to find high-performance predictive models for various adverse-outcome pathways, including the estrogen receptor. Our predictive model, which is based on the random forest method, delivered the best performance in its competition category. In the current study, the predictive performance of the random forest models was improved by strictly adjusting the hyperparameters to avoid overfitting. The random forest models were optimized using 4000 descriptors applied simultaneously to 10,000 activity assay results for the estrogen receptor ligand-binding domain, which were measured and compiled by Tox21. Given the correlation between our model’s results and the challenge results, we consider that our model currently has the highest predictive power for agonist activity at the estrogen receptor ligand-binding domain. Furthermore, analysis of the optimized model revealed some important features of the agonists, such as the number of hydroxyl groups in the molecules.
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Review

15 pages
Review
Machine Learning Approaches for Protein–Protein Interaction Hot Spot Prediction: Progress and Comparative Assessment
by Siyu Liu, Chuyao Liu and Lei Deng
Molecules 2018, 23(10), 2535; https://doi.org/10.3390/molecules23102535 - 4 Oct 2018
Cited by 66 | Viewed by 9762
Abstract
Hot spots are the subset of interface residues that account for most of the binding free energy, and they play essential roles in the stability of protein binding. Effectively identifying which interface residues of protein–protein complexes form the hot spots is critical for understanding the principles of protein interactions and has broad application prospects in protein design and drug development. Experimental methods such as alanine scanning mutagenesis are labor-intensive and time-consuming, and experimentally measured hot spots are currently very limited. Hence, computational approaches to predicting hot spots are becoming increasingly important. Here, we describe the basic concepts and recent advances of machine learning applications in inferring protein–protein interaction hot spots, and assess the performance of widely used features, machine learning algorithms, and existing state-of-the-art approaches. We also discuss the challenges and future directions in the prediction of hot spots.
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)
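Hot spots are conventionally defined from alanine-scanning data as residues whose mutation to alanine raises the binding free energy by at least about 2.0 kcal/mol, and predictors are then scored against those labels. The sketch below is a minimal illustration, assuming that 2.0 kcal/mol threshold and precision, recall and F1 as evaluation metrics; the residue identifiers and ΔΔG values in the test are made up.

```python
HOT_SPOT_THRESHOLD = 2.0  # kcal/mol; common convention, assumed here


def label_hot_spots(ddg):
    """Map residue id -> True if its alanine-scanning ddG meets the threshold."""
    return {res: value >= HOT_SPOT_THRESHOLD for res, value in ddg.items()}


def precision_recall_f1(truth, predicted):
    """Score predicted hot-spot labels against experimentally derived ones.

    Residues missing from `predicted` are treated as predicted non-hot-spots.
    """
    tp = sum(1 for r in truth if truth[r] and predicted.get(r, False))
    fp = sum(1 for r in truth if not truth[r] and predicted.get(r, False))
    fn = sum(1 for r in truth if truth[r] and not predicted.get(r, False))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Because hot spots are a small minority of interface residues, metrics such as F1 are usually more informative than plain accuracy for this task.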

15 pages
Review
Prediction Methods of Herbal Compounds in Chinese Medicinal Herbs
by Ke Han, Lei Zhang, Miao Wang, Rui Zhang, Chunyu Wang and Chengzhi Zhang
Molecules 2018, 23(9), 2303; https://doi.org/10.3390/molecules23092303 - 10 Sep 2018
Cited by 28 | Viewed by 6434
Abstract
Chinese herbal medicine has recently gained worldwide attention, and its curative mechanisms are being compared with those of Western medicine at the molecular level. However, the treatment mechanisms of most Chinese herbal medicines are still unclear. To integrate compounds from Chinese herbal medicine into modern medicine, methods for predicting their drug-like properties are particularly important. A growing number of compounds derived from Chinese herbs are now widely used as drug-like candidates, and discovering potentially active compounds in related Chinese herbs is an important route to drug development for pharmaceutical companies. Methods for predicting the drug-like properties of Chinese herbal compounds include virtual screening, pharmacophore modeling and machine learning. In this paper, we focus on methods for predicting the medicinal properties of Chinese herbal medicines. We analyze the advantages and disadvantages of these three methods and then introduce the specific steps of the virtual screening method. Finally, we discuss the prospects for the joint application of multiple methods.
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Review
Recent Advances in Conotoxin Classification by Using Machine Learning Methods
by Fu-Ying Dao, Hui Yang, Zhen-Dong Su, Wuritu Yang, Yun Wu, Ding Hui, Wei Chen, Hua Tang and Hao Lin
Molecules 2017, 22(7), 1057; https://doi.org/10.3390/molecules22071057 - 25 Jun 2017
Cited by 54 | Viewed by 9687
Abstract
Conotoxins are disulfide-rich small peptides and invaluable agents that target ion channels and neuronal receptors. Conotoxins have been shown to be potent pharmaceuticals for treating a range of diseases, such as Alzheimer’s disease, Parkinson’s disease, and epilepsy. In addition, conotoxins are ideal molecular templates for developing new drug lead compounds and play important roles in neurobiological research. Thus, accurate identification of conotoxin types will provide key clues for biological research and clinical medicine. Generally, conotoxin types are confirmed once their sequence, structure, and function have been experimentally validated. However, acquiring structure and function information through biochemical experiments is time-consuming and costly. Therefore, it is important to develop computational tools that efficiently and effectively recognize conotoxin types based on sequence information. In this work, we review the current progress in the computational identification of conotoxins in the following aspects: (i) construction of benchmark datasets; (ii) strategies for extracting sequence features; (iii) feature selection techniques; (iv) machine learning methods for classifying conotoxins; (v) the results obtained by these methods and the published tools; and (vi) future perspectives on conotoxin classification. This paper provides a basis for in-depth study of conotoxins and drug therapy research.
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Other

Technical Note
3DCONS-DB: A Database of Position-Specific Scoring Matrices in Protein Structures
by Ruben Sanchez-Garcia, Carlos Oscar Sanchez Sorzano, Jose Maria Carazo and Joan Segura
Molecules 2017, 22(12), 2230; https://doi.org/10.3390/molecules22122230 - 15 Dec 2017
Cited by 4 | Viewed by 4173
Abstract
Many studies have used position-specific scoring matrix (PSSM) profiles to characterize residues in protein structures and to predict a broad range of protein features. Moreover, PSSM profiles of Protein Data Bank (PDB) entries have been recalculated in many works for different purposes. Although the computational cost of calculating a single PSSM profile is affordable, many statistical studies and machine learning-based methods use thousands of profiles to achieve their goals, which substantially increases the computational cost. In this work, we present a new database compiling PSSM profiles for the proteins of the PDB. Currently, the database contains 333,532 protein chain profiles involving 123,135 different PDB entries.
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)