TPGPred: A Mixed-Feature-Driven Approach for Identifying Thermophilic Proteins Based on GradientBoosting
Abstract
1. Introduction
2. Results
2.1. Collection of a Comprehensive Thermophilic and Non-Thermophilic Protein Dataset
2.2. Construction of RNN Network Architecture
2.3. Analyzing Different Machine Learning Models with String Feature Engineering
2.4. Analyzing Different Machine Learning Models with Protein Descriptor Feature Engineering
2.5. Selection of Optimal Machine Learning Models
2.6. Standardization and Oversampling of Models
2.7. Cross-Validation Analysis
2.8. Analyzing the Impact of Feature Selection on Models
2.9. Performance of the TPGPred Model on Actual Thermophilic Proteins
2.10. Feature Importance Analysis
3. Discussion
4. Materials and Methods
4.1. Construction of Thermophilic and Non-Thermophilic Protein Datasets
4.2. String Feature-Engineering Construction
4.3. Feature Descriptors
4.4. Feature Standardization
4.5. Feature Selection
4.6. Random Ten-Fold Cross-Validation
4.7. Performance Evaluation
4.8. Feature Importance Analysis
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Saghatelyan, A.; Panosyan, H.; Birkeland, N.-K. The Genus Thermus: A Brief History of Cosmopolitan Extreme Thermophiles: Diversity, Distribution, Biotechnological Potential and Applications. In Microbial Communities and Their Interactions in the Extreme Environment; Springer: Singapore, 2021; pp. 141–175.
2. Vieille, C.; Zeikus, G.J. Hyperthermophilic enzymes: Sources, uses, and molecular mechanisms for thermostability. Microbiol. Mol. Biol. Rev. 2001, 65, 1–43.
3. Williams, R.A.D. The genus Thermus. In Thermophilic Bacteria; CRC Press: Boca Raton, FL, USA, 2021; pp. 51–62.
4. Littlechild, J.A. Enzymes from extreme environments and their industrial applications. Front. Bioeng. Biotechnol. 2015, 3, 161.
5. Satyanarayana, T.; Littlechild, J.; Kawarabayasi, Y. Thermophilic microbes in environmental and industrial biotechnology. In Biotechnology of Thermophiles; Springer Science & Business Media: Dordrecht, The Netherlands, 2013.
6. Zhao, C.; Zheng, T.; Feng, Y.; Wang, X.; Zhang, L.; Hu, Q.; Chen, J.; Wu, F.; Chen, G.-Q. Engineered Halomonas spp. for production of l-Lysine and cadaverine. Bioresour. Technol. 2022, 349, 126865.
7. Zhao, L.; Ye, J.; Fu, J.; Chen, G.-Q. Engineering peptidoglycan degradation related genes of Bacillus subtilis for better fermentation processes. Bioresour. Technol. 2018, 248, 238–247.
8. Varghese, J.; Georrge, J.J. Structural features and industrial uses of thermostable proteins. In Recent Trends in Science and Technology-2020; America Publications: New York, NY, USA, 2020; pp. 181–189.
9. Zhu, D.; Adebisi, W.A.; Ahmad, F.; Sethupathy, S.; Danso, B.; Sun, J. Recent development of extremophilic bacteria and their application in biorefinery. Front. Bioeng. Biotechnol. 2020, 8, 483.
10. Min, S.; Lee, B.; Yoon, S. Deep learning in bioinformatics. Brief. Bioinform. 2017, 18, 851–869.
11. Wang, X.-F.; Lu, F.; Du, Z.-Y.; Li, Q.-M. Prediction model of thermophilic protein based on Stacking Method. Curr. Bioinform. 2021, 16, 1328–1340.
12. Wang, X.-F.; Gao, P.; Liu, Y.-F.; Li, H.-F.; Lu, F. Predicting thermophilic proteins by machine learning. Curr. Bioinform. 2020, 15, 493–502.
13. Feng, C.; Ma, Z.; Yang, D.; Li, X.; Zhang, J.; Li, Y. A method for prediction of thermophilic protein based on reduced amino acids and mixed features. Front. Bioeng. Biotechnol. 2020, 8, 285.
14. Meng, C.; Ju, Y.; Shi, H. TMPpred: A support vector machine-based thermophilic protein identifier. Anal. Biochem. 2022, 645, 114625.
15. Tang, H.; Cao, R.-Z.; Wang, W.; Liu, T.-S.; Wang, L.-M.; He, C.-M. A two-step discriminated method to identify thermophilic proteins. Int. J. Biomath. 2017, 10, 1750050.
16. Guo, Z.; Wang, P.; Liu, Z.; Zhao, Y. Discrimination of thermophilic proteins and non-thermophilic proteins using feature dimension reduction. Front. Bioeng. Biotechnol. 2020, 8, 584807.
17. Charoenkwan, P.; Chotpatiwetchkul, W.; Lee, V.S.; Nantasenamat, C.; Shoombuatong, W. A novel sequence-based predictor for identifying and characterizing thermophilic proteins using estimated propensity scores of dipeptides. Sci. Rep. 2021, 11, 23782.
18. Charoenkwan, P.; Schaduangrat, N.; Moni, M.A.; Manavalan, B.; Shoombuatong, W. SAPPHIRE: A stacking-based ensemble learning framework for accurate prediction of thermophilic proteins. Comput. Biol. Med. 2022, 146, 105704.
19. Zhao, J.; Yan, W.; Yang, Y. DeepTP: A deep learning model for thermophilic protein prediction. Int. J. Mol. Sci. 2023, 24, 2217.
20. Ahmed, Z.; Zulfiqar, H.; Khan, A.A.; Gul, I.; Dao, F.-Y.; Zhang, Z.-Y.; Yu, X.-L.; Tang, L. iThermo: A sequence-based model for identifying thermophilic proteins using a multi-feature fusion strategy. Front. Microbiol. 2022, 13, 790063.
21. Pei, H.; Li, J.; Ma, S.; Jiang, J.; Li, M.; Zou, Q.; Lv, Z. Identification of thermophilic proteins based on sequence-based bidirectional representations from transformer-embedding features. Appl. Sci. 2023, 13, 2858.
22. Li, M.; Wang, H.; Yang, Z.; Zhang, L.; Zhu, Y. DeepTM: A deep learning algorithm for prediction of melting temperature of thermophilic proteins directly from sequences. Comput. Struct. Biotechnol. J. 2023, 21, 5544–5560.
23. Ahmed, Z.; Zulfiqar, H.; Tang, L.; Lin, H. A statistical analysis of the sequence and structure of thermophilic and non-thermophilic proteins. Int. J. Mol. Sci. 2022, 23, 10116.
24. Shastry, K.A.; Sanjay, H.A. Machine learning for bioinformatics. In Statistical Modelling and Machine Learning Principles for Bioinformatics Techniques, Tools, and Applications; Springer: Singapore, 2020; pp. 25–39.
25. Millham, R.; Agbehadji, I.E.; Yang, H. Parameter tuning onto recurrent neural network and long short-term memory (RNN-LSTM) network for feature selection in classification of high-dimensional bioinformatics datasets. In Bio-Inspired Algorithms for Data Streaming and Visualization, Big Data Management, and Fog Computing; Springer: Singapore, 2021; pp. 21–42.
26. Cava, F.; Hidalgo, A.; Berenguer, J. Thermus thermophilus as biological model. Extremophiles 2009, 13, 213–231.
27. Tripathi, C.; Mishra, H.; Khurana, H.; Dwivedi, V.; Kamra, K.; Negi, R.K.; Lal, R. Complete genome analysis of Thermus parvatiensis and comparative genomics of Thermus spp. provide insights into genetic variability and evolution of natural competence as strategic survival attributes. Front. Microbiol. 2017, 8, 1410.
28. Babák, L.; Šupinová, P.; Burdychová, R. Growth models of Thermus aquaticus and Thermus scotoductus. Acta Univ. Agric. Silvic. Mendel. Brun. 2012, 60, 19–26.
29. Vajna, B.; Kanizsai, S.; Keki, Z.; Marialigeti, K.; Schumann, P.; Toth, E.M. Thermus composti sp. nov., isolated from oyster mushroom compost. Int. J. Syst. Evol. Microbiol. 2012, 62 Pt 7, 1486–1490.
30. da Costa, M.S.; Rainey, F.A. Thermaceae fam. nov. In Bergey’s Manual of Systematics of Archaea and Bacteria; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2015; p. 1.
31. Bjornsdottir, S.H.; Petursdottir, S.K.; Hreggvidsson, G.O.; Skirnisdottir, S.; Hjorleifsdottir, S.; Arnfinnsson, J.; Kristjansson, J.K. Thermus islandicus sp. nov., a mixotrophic sulfur-oxidizing bacterium isolated from the Torfajokull geothermal area. Int. J. Syst. Evol. Microbiol. 2009, 59, 2962–2966.
32. Zhang, H.; Schumacher, M.A. Structures of partition protein ParA with nonspecific DNA and ParB effector reveal molecular insights into principles governing Walker-box DNA segregation. Genes Dev. 2017, 31, 481–492.
33. Dunbar, K.L.; Melby, J.O.; Mitchell, D.A. YcaO domains use ATP to activate amide backbones during peptide cyclodehydrations. Nat. Chem. Biol. 2012, 8, 569–575.
34. Xu, S.; Huang, H.; Chen, S.; Muhammad, Z.U.A.; Wei, W.; Xie, W.; Jiang, H.; Hou, S. Recovery of 1887 metagenome-assembled genomes from the South China Sea. Sci. Data 2024, 11, 197.
35. Liu, J.; Zhang, Y.; Liu, J.; Zhong, H.; Williams, B.T.; Zheng, Y.; Curson, A.R.J.; Sun, C.; Sun, H.; Song, D. Bacterial dimethylsulfoniopropionate biosynthesis in the East China Sea. Microorganisms 2021, 9, 657.
36. Cui, Y.; Cheng, B.; Meng, Y.; Li, C.; Yin, H.; Xu, P.; Yang, C. Expression and functional analysis of two NhaD type antiporters from the halotolerant and alkaliphilic Halomonas sp. Y2. Extremophiles 2016, 20, 631–639.
37. Fakhirruddin, F.; Amid, A.; Salim, W.W.A.W.; Azmi, A.S. Electricity Generation in Microbial Fuel Cell (MFC) by Bacterium Isolated from Rice Paddy Field Soil; EDP Sciences: Les Ulis, France, 2018; p. 02036.
38. Silva-Solar, S.; Viver, T.; Wang, Y.; Orellana, L.H.; Knittel, K.; Amann, R. Acidimicrobiia, the actinomycetota of coastal marine sediments: Abundance, taxonomy and genomic potential. Syst. Appl. Microbiol. 2024, 47, 126555.
39. Kim, K.K.; Lee, K.C.; Oh, H.-M.; Lee, J.-S. Microbacterium aquimaris sp. nov., isolated from seawater. Int. J. Syst. Evol. Microbiol. 2008, 58, 1616–1620.
40. Hassija, V.; Chamola, V.; Mahapatra, A.; Singal, A.; Goel, D.; Huang, K.; Scardapane, S.; Spinelli, I.; Mahmud, M.; Hussain, A. Interpreting black-box models: A review on explainable artificial intelligence. Cogn. Comput. 2024, 16, 45–74.
41. Mangalathu, S.; Hwang, S.-H.; Jeon, J.-S. Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach. Eng. Struct. 2020, 219, 110927.
42. Sayers, E.W.; Bolton, E.E.; Brister, J.R.; Canese, K.; Chan, J.; Comeau, D.C.; Connor, R.; Funk, K.; Kelly, C.; Kim, S. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2022, 50, D20–D26.
43. Walker, A.S.; Clardy, J. A machine learning bioinformatics method to predict biological activity from biosynthetic gene clusters. J. Chem. Inf. Model. 2021, 61, 2560–2571.
44. Chen, Z.; Zhao, P.; Li, F.; Leier, A.; Marquez-Lago, T.T.; Wang, Y.; Webb, G.I.; Smith, A.I.; Daly, R.J.; Chou, K.-C. iFeature: A python package and web server for features extraction and selection from protein and peptide sequences. Bioinformatics 2018, 34, 2499–2502.
45. Chen, Z.; Zhao, P.; Li, F.; Marquez-Lago, T.T.; Leier, A.; Revote, J.; Zhu, Y.; Powell, D.R.; Akutsu, T.; Webb, G.I. iLearn: An integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data. Brief. Bioinform. 2020, 21, 1047–1057.
46. Chen, Z.; Zhao, P.; Li, C.; Li, F.; Xiang, D.; Chen, Y.-Z.; Akutsu, T.; Daly, R.J.; Webb, G.I.; Zhao, Q. iLearnPlus: A comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization. Nucleic Acids Res. 2021, 49, e60.
47. Aksoy, S.; Haralick, R.M. Feature normalization and likelihood-based similarity measures for image retrieval. Pattern Recognit. Lett. 2001, 22, 563–582.
48. DeVore, G.R. Computing the Z score and centiles for cross-sectional analysis: A practical approach. J. Ultrasound Med. 2017, 36, 459–473.
49. Fernández, A.; Garcia, S.; Herrera, F.; Chawla, N.V. SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary. J. Artif. Intell. Res. 2018, 61, 863–905.
50. Rupapara, V.; Rustam, F.; Ishaq, A.; Lee, E.; Ashraf, I. Chi-square and PCA based feature selection for diabetes detection with ensemble classifier. Intell. Autom. Soft Comput. 2023, 36, 1931–1949.
51. Yan, C.; Zhang, J.; Kang, X.; Gong, Z.; Wang, J.; Zhang, G. Comparison and evaluation of the combinations of feature selection and classifier on microarray data. In Proceedings of the 2021 IEEE 6th International Conference on Big Data Analytics (ICBDA), Xiamen, China, 5–8 March 2021; pp. 133–137.
52. Batina, L.; Gierlichs, B.; Prouff, E.; Rivain, M.; Standaert, F.-X.; Veyrat-Charvillon, N. Mutual information analysis: A comprehensive study. J. Cryptol. 2011, 24, 269–291.
53. Powell, A.; Bates, D.; Van Wyk, C.; de Abreu, D. A Cross-Comparison of Feature Selection Algorithms on Multiple Cyber Security Data-Sets; Intrusion Detection on Cyber Security Data-Sets: Cape Town, South Africa, 2019; pp. 196–207.
54. Sejuti, Z.A.; Islam, M.S. A hybrid CNN–KNN approach for identification of COVID-19 with 5-fold cross validation. Sens. Int. 2023, 4, 100229.
55. Hammad, A.; Elshaer, M.; Tang, X. Identification of potential biomarkers with colorectal cancer based on bioinformatics analysis and machine learning. Math. Biosci. Eng. 2021, 18, 8997–9015.
56. Wen, P.; Xu, Q.; Yang, Z.; He, Y.; Huang, Q. Exploring the algorithm-dependent generalization of auprc optimization with list stability. Adv. Neural Inf. Process. Syst. 2022, 35, 28335–28349.
Application | Protein Classification | Source | Name | Quantity |
---|---|---|---|---|
Training | Thermophilic protein | NCBI | Thermus aquaticus | 2213 |
Training | Non-thermophilic protein | Sequencing in our laboratory and NCBI | Halomonas sp. TD01 | 3439 |
Testing | Thermophilic protein | NCBI | Thermus aquaticus | 245 |
Testing | Non-thermophilic protein | NCBI | Halomonas sp. TD01 | 382 |
Model | Recall | Precision | Accuracy | F1-Score | MCC | Specificity | AUROC | AUPRC |
---|---|---|---|---|---|---|---|---|
Bagging_CountVectorizer | 0.808163 | 0.838983 | 0.864434 | 0.823285 | 0.713715 | 0.900524 | 0.918549 | 0.89672 |
Bagging_CountVectorizer_n_grams | 0.906122 | 0.560606 | 0.685805 | 0.692668 | 0.455802 | 0.544503 | 0.881766 | 0.858188 |
Bagging_HashingVectorizer | 0.893878 | 0.706452 | 0.813397 | 0.789189 | 0.639855 | 0.76178 | 0.931152 | 0.926136 |
Bagging_LDA | 0.938776 | 0.651558 | 0.779904 | 0.769231 | 0.606718 | 0.67801 | 0.931141 | 0.91374 |
Bagging_LSA | 0.816327 | 0.829876 | 0.862839 | 0.823045 | 0.711143 | 0.89267 | 0.927717 | 0.900711 |
Bagging_PCA | 0.918367 | 0.650289 | 0.77512 | 0.761421 | 0.590257 | 0.683246 | 0.926648 | 0.913162 |
Bagging_t-SNE | 0.767347 | 0.767347 | 0.818182 | 0.767347 | 0.618132 | 0.850785 | 0.884945 | 0.837632 |
Bagging_Word2Vec | 0.869388 | 0.8875 | 0.905901 | 0.878351 | 0.801754 | 0.929319 | 0.962731 | 0.957469 |
Bagging_FastText | 0.518367 | 0.686486 | 0.719298 | 0.590698 | 0.392133 | 0.848168 | 0.754541 | 0.710745 |
Bagging_Doc2Vec | 0.865306 | 0.9098712 | 0.913875 | 0.8870292 | 0.8181855 | 0.9450261 | 0.9500641 | 0.9469430 |
Bagging_BERT | 0.963265 | 0.482618 | 0.582137 | 0.643052 | 0.354431 | 0.337696 | 0.862159 | 0.834961 |
GradientBoosting_CountVectorizer | 0.995918 | 0.458647 | 0.539075 | 0.628057 | 0.329304 | 0.246073 | 0.937974 | 0.913477 |
GradientBoosting_CountVectorizer_n_grams | 1 | 0.492958 | 0.598086 | 0.660377 | 0.409586 | 0.340314 | 0.955786 | 0.94162 |
GradientBoosting_HashingVectorizer | 0.918367 | 0.9740259 | 0.9585326 | 0.9453781 | 0.9130311 | 0.9842931 | 0.9851907 | 0.9846906 |
GradientBoosting_LDA | 0.963265 | 0.670455 | 0.800638 | 0.79062 | 0.648572 | 0.696335 | 0.965028 | 0.957619 |
GradientBoosting_LSA | 0.955102 | 0.735849 | 0.848485 | 0.831261 | 0.717516 | 0.780105 | 0.96584 | 0.954213 |
GradientBoosting_PCA | 0.971429 | 0.636364 | 0.77193 | 0.768982 | 0.612042 | 0.643979 | 0.969366 | 0.962006 |
GradientBoosting_t-SNE | 0.967347 | 0.642276 | 0.776715 | 0.771987 | 0.61651 | 0.65445 | 0.96848 | 0.964892 |
GradientBoosting_Word2Vec | 0.877551 | 0.972851 | 0.942584 | 0.922747 | 0.880206 | 0.984293 | 0.977743 | 0.976425 |
GradientBoosting_FastText | 0.918367 | 0.865385 | 0.912281 | 0.891089 | 0.818778 | 0.908377 | 0.966054 | 0.955491 |
GradientBoosting_Doc2Vec | 0.946939 | 0.781145 | 0.875598 | 0.856089 | 0.759064 | 0.829843 | 0.968768 | 0.965108 |
GradientBoosting_BERT | 1 | 0.427574 | 0.476874 | 0.599022 | 0.24585 | 0.141361 | 0.942141 | 0.924382 |
RandomForest_CountVectorizer | 0.865306 | 0.625369 | 0.744817 | 0.726027 | 0.521699 | 0.667539 | 0.842366 | 0.766364 |
RandomForest_CountVectorizer_n_grams | 0.897959 | 0.572917 | 0.698565 | 0.699523 | 0.469337 | 0.570681 | 0.852719 | 0.801474 |
RandomForest_HashingVectorizer | 0.795918 | 0.709091 | 0.792663 | 0.75 | 0.576688 | 0.790576 | 0.870595 | 0.821572 |
RandomForest_LDA | 0.844898 | 0.877119 | 0.893142 | 0.860707 | 0.774437 | 0.924084 | 0.939331 | 0.926095 |
RandomForest_LSA | 0.934694 | 0.636111 | 0.76555 | 0.757025 | 0.583922 | 0.657068 | 0.930516 | 0.916145 |
RandomForest_PCA | 0.767347 | 0.780083 | 0.824561 | 0.773663 | 0.630506 | 0.861257 | 0.890736 | 0.864668 |
RandomForest_t-SNE | 0.832653 | 0.766917 | 0.835726 | 0.798434 | 0.661792 | 0.837696 | 0.906454 | 0.892166 |
RandomForest_Word2Vec | 0.734694 | 0.947368 | 0.880383 | 0.827586 | 0.752223 | 0.973822 | 0.935271 | 0.930686 |
RandomForest_FastText | 0.8 | 0.933333 | 0.899522 | 0.861538 | 0.789153 | 0.963351 | 0.951469 | 0.945652 |
RandomForest_Doc2Vec | 0.873469 | 0.895397 | 0.910686 | 0.884298 | 0.811754 | 0.934555 | 0.958051 | 0.952242 |
RandomForest_BERT | 0.906122 | 0.556391 | 0.681021 | 0.689441 | 0.449098 | 0.536649 | 0.84478 | 0.77856 |
ScikitRNN_CountVectorizer | 0.971429 | 0.387622 | 0.389155 | 0.554133 | −0.04405 | 0.015707 | 0.761064 | 0.703102 |
ScikitRNN_CountVectorizer_n_grams | 0.995918 | 0.400657 | 0.416268 | 0.571429 | 0.118107 | 0.044503 | 0.726114 | 0.611873 |
ScikitRNN_HashingVectorizer | 1 | 0.391374 | 0.392344 | 0.562572 | 0.032008 | 0.002618 | 0.752821 | 0.690301 |
ScikitRNN_LDA | 0.979592 | 0.38961 | 0.392344 | 0.557491 | −0.01747 | 0.015707 | 0.726157 | 0.653844 |
ScikitRNN_LSA | 1 | 0.39075 | 0.39075 | 0.561927 | 0 | 0 | 0.611241 | 0.516229 |
ScikitRNN_PCA | 1 | 0.39075 | 0.39075 | 0.561927 | 0 | 0 | 0.726979 | 0.653761 |
ScikitRNN_t-SNE | 0.832653 | 0.392308 | 0.430622 | 0.533333 | 0.00704 | 0.172775 | 0.573234 | 0.452664 |
ScikitRNN_Word2Vec | 0.910204 | 0.401802 | 0.435407 | 0.5575 | 0.06289 | 0.13089 | 0.632001 | 0.5229 |
ScikitRNN_FastText | 0.946939 | 0.392555 | 0.406699 | 0.555024 | 0.014992 | 0.060209 | 0.61107 | 0.495664 |
ScikitRNN_Doc2Vec | 0.995918 | 0.3904 | 0.39075 | 0.56092 | −0.01267 | 0.002618 | 0.67785 | 0.589583 |
ScikitRNN_BERT | 0.987755 | 0.391586 | 0.395534 | 0.560834 | 0.014201 | 0.015707 | 0.633289 | 0.506693 |
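The threshold-based columns of the table above (Recall, Precision, Accuracy, F1-Score, MCC, Specificity) follow from a confusion matrix, and the short sketch below shows one way to compute them. It is an illustration, not the authors' evaluation code. The helper name `binary_metrics` is hypothetical, the class sizes (245 thermophilic, 382 non-thermophilic) come from the testing rows of the dataset table, and the confusion counts used for the Bagging_t-SNE row are back-calculated here from the reported rates rather than taken from the paper. AUROC and AUPRC need ranked prediction scores and are not covered.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Threshold-based metrics from confusion-matrix counts (hypothetical helper)."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    # MCC is conventionally set to 0 when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "recall": recall,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
        "specificity": specificity,
    }


# Test-set class sizes from the dataset table: 245 thermophilic, 382 non-thermophilic.
# A classifier that labels every protein as thermophilic:
all_positive = binary_metrics(tp=245, fn=0, fp=382, tn=0)

# Counts inferred (an assumption) from the Bagging_t-SNE rates in the table.
bagging_tsne = binary_metrics(tp=188, fn=57, fp=57, tn=325)

for name, result in [("all_positive", all_positive), ("bagging_tsne", bagging_tsne)]:
    print(name, {key: round(value, 6) for key, value in result.items()})
```

Under these assumptions, the all-positive case gives recall 1, specificity 0, MCC 0, and accuracy and precision of 245/627 ≈ 0.39075, which are the values listed for ScikitRNN_LSA and ScikitRNN_PCA. This suggests those two models assigned every test protein to the thermophilic class, so their high recall is not evidence of discriminative power.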
Source | LOCUS (Version) | True Type | Predicted Type | Reference |
---|---|---|---|---|
Thermus thermophilus | WP_143586044.1 | TP | TP | [26] |
Thermus parvatiensis | WP_008631403.1 | TP | TP | [27] |
Thermus scotoductus | WP_172960035.1 | TP | TP | [28] |
Thermus composti | WP_188845765.1 | TP | TP | [29] |
Thermaceae | WP_318773468.1 | TP | TP | [30] |
Thermus islandicus | WP_245540704.1 | TP | TP | [31] |
Meiothermus sp. | WP_314136165.1 | TP | TP | [32] |
Vreelandella neptunia | WP_133729827.1 | NTP | NTP | [33] |
Pseudomonadota bacterium | MEC9020573.1 | NTP | NTP | [34] |
Gammaproteobacteria bacterium | MBR9902729.1 | NTP | NTP | [35] |
Halovibrio variabilis | WP_146875438.1 | NTP | NTP | [36] |
Natronocella acetinitrilica | WP_253485102.1 | NTP | NTP | [37] |
Actinomycetota bacterium | MDQ2621912.1 | NTP | NTP | [38] |
Microbacterium aquimaris | WP_322602619.1 | NTP | NTP | [39] |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhao, C.; Yan, S.; Li, J. TPGPred: A Mixed-Feature-Driven Approach for Identifying Thermophilic Proteins Based on GradientBoosting. Int. J. Mol. Sci. 2024, 25, 11866. https://doi.org/10.3390/ijms252211866