Machine Learning Model Based on Insulin Resistance Metagenes Underpins Genetic Basis of Type 2 Diabetes
Abstract
1. Introduction
2. Materials and Methods
2.1. Selection of Microarray Datasets and Meta-Analysis
2.2. Enrichment Analysis of Metagenes
2.3. Selection of Key Metagenes and Their Biological Validation for T2D
2.4. GWAS Evidence for Involvement of Key Metagenes in T2D
2.5. Construction of Machine Learning Models and Their Performance Evaluation
3. Results
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
S. No. | GEO Series | Tissue | Place of Study | Disease Phenotype: Insulin Resistance (IR) | Disease Phenotype: Normal Glucose Tolerance (NGT) |
---|---|---|---|---|---|
1 | GSE6798 | Skeletal muscle | Department of Hematology, Roskilde Hospital, Roskilde, Denmark | 16 | 13 |
2 | GSE15773 | Subcutaneous adipose tissue (SAT) and visceral adipose tissue (VAT) | Department of Molecular Medicine, University of Massachusetts, Worcester, MA, USA | 4 (SAT), 5 (VAT) | 5 (SAT), 5 (VAT) |
3 | GSE20950 | Subcutaneous adipose tissue (SAT) and visceral adipose tissue (VAT) | Department of Molecular Medicine, University of Massachusetts, Worcester, MA, USA | 9 (SAT), 10 (VAT) | 10 (SAT), 10 (VAT) |
4 | GSE22309 | Skeletal muscle | Department of Biostatistics, University of Alabama, Birmingham, AL, USA | 20 | 20 |
6 | GSE26637 | Subcutaneous adipose tissue | Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland | 5 (Fasting), 5 (Hyperinsulinemia) | 5 (Fasting), 5 (Hyperinsulinemia) |
7 | GSE34526 | Granulosa cells | Department of Zoology, University of Delhi, Delhi, India | 16 (PCOS) | 12 (PCOS) |
8 | GSE36297 | Vastus lateralis muscle | Department of Hematology, Roskilde Hospital, Roskilde, Denmark | 6 | 10 |
9 | GSE64567 | Fasted subcutaneous abdominal adipose tissue (FAT) | Department of Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX, USA | 38 | 26 |
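The table above gives, for each GEO series, the number of insulin-resistant (IR) and normal-glucose-tolerant (NGT) samples. This section does not state how the per-dataset comparisons were combined, so the following is only a minimal sketch of one common option for microarray meta-analysis, assuming a per-gene standardized mean difference (Hedges' g) in each dataset pooled with fixed-effect inverse-variance weights. The function names `hedges_g` and `pool_fixed_effect` are hypothetical. The sketch shows how the group sizes in the table drive the weights: a large series such as GSE64567 (38 IR vs 26 NGT) yields a smaller variance, and therefore more weight, than a small one such as GSE15773.

```python
import math


def hedges_g(mean_ir, mean_ngt, sd_ir, sd_ngt, n_ir, n_ngt):
    """Return (g, variance) for IR vs NGT within one dataset (Hedges' g)."""
    df = n_ir + n_ngt - 2
    # Pooled within-group standard deviation
    sd_pooled = math.sqrt(((n_ir - 1) * sd_ir**2 + (n_ngt - 1) * sd_ngt**2) / df)
    d = (mean_ir - mean_ngt) / sd_pooled
    # Small-sample bias correction factor J
    j = 1 - 3 / (4 * df - 1)
    var_d = (n_ir + n_ngt) / (n_ir * n_ngt) + d**2 / (2 * (n_ir + n_ngt))
    return j * d, j**2 * var_d


def pool_fixed_effect(studies):
    """Inverse-variance fixed-effect pooling of (g, variance) pairs."""
    weights = [1 / var for _, var in studies]
    pooled = sum(w * g for w, (g, _) in zip(weights, studies)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, se, weights


# Group sizes (n_ir, n_ngt) come from the dataset table; the means and SDs
# are placeholder values for one hypothetical gene, not study data.
studies = [
    hedges_g(8.4, 8.0, 0.6, 0.5, 16, 13),  # GSE6798
    hedges_g(7.9, 7.7, 0.7, 0.6, 20, 20),  # GSE22309
    hedges_g(9.1, 8.8, 0.5, 0.6, 38, 26),  # GSE64567
]
pooled, se, weights = pool_fixed_effect(studies)
print(f"pooled g = {pooled:.3f} (SE {se:.3f}); weights = {[round(w, 1) for w in weights]}")
```

A random-effects model (for example, DerSimonian-Laird) would be the usual alternative when the datasets, which span muscle, adipose tissue, and granulosa cells, are expected to be heterogeneous.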
S. No. | Enrichment FDR | Genes in List | Total Genes | Functional Category |
---|---|---|---|---|
1 | 4.4E-03 | 2 | 8 | FTO Obesity Variant Mechanism |
2 | 4.4E-03 | 4 | 103 | Electron Transport Chain |
3 | 1.1E-02 | 3 | 66 | Folate Metabolism |
4 | 1.7E-02 | 3 | 85 | Selenium Micronutrient Network |
5 | 2.8E-02 | 3 | 110 | DNA Damage Response (only ATM dependent) |
6 | 2.9E-02 | 2 | 38 | Amyotrophic lateral sclerosis (ALS) |
7 | 3.0E-02 | 2 | 50 | Vitamin B12 Metabolism |
8 | 3.0E-02 | 3 | 132 | Angiopoietin Like Protein 8 Regulatory Pathway |
9 | 3.0E-02 | 2 | 52 | Translation Factors |
10 | 3.0E-02 | 3 | 155 | Myometrial Relaxation and Contraction Pathways |
11 | 3.0E-02 | 3 | 160 | Insulin Signaling |
12 | 3.0E-02 | 2 | 45 | ATM Signaling Network in Development and Disease |
13 | 3.0E-02 | 3 | 159 | Epithelial to mesenchymal transition in colorectal cancer |
14 | 3.0E-02 | 2 | 60 | Oxidative phosphorylation |
15 | 3.1E-02 | 2 | 66 | AGE/RAGE pathway |
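The enrichment table reports, for each pathway, how many metagenes fall in it ("Genes in List"), the pathway size ("Total Genes"), and an FDR. The section does not say which test produced these values; a common choice for this kind of over-representation analysis is a one-sided hypergeometric test per pathway followed by Benjamini-Hochberg correction across all pathways tested. The sketch below assumes that procedure. `BACKGROUND` and `METAGENE_LIST_SIZE` are hypothetical placeholders, so its output is not expected to reproduce the FDR column.

```python
from math import comb


def hypergeom_sf(k, background, pathway_size, list_size):
    """P(X >= k): chance of at least k pathway genes among list_size genes
    drawn without replacement from a background of `background` genes."""
    upper = min(pathway_size, list_size)
    total = comb(background, list_size)
    hits = sum(
        comb(pathway_size, i) * comb(background - pathway_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# (Genes in List, Total Genes) for three rows of the enrichment table.
# BACKGROUND and METAGENE_LIST_SIZE are hypothetical; the section does not report them.
# Illustrative only: in practice BH is applied across every pathway tested.
BACKGROUND = 20000
METAGENE_LIST_SIZE = 30
rows = {
    "FTO Obesity Variant Mechanism": (2, 8),
    "Electron Transport Chain": (4, 103),
    "Insulin Signaling": (3, 160),
}
raw = {name: hypergeom_sf(k, BACKGROUND, size, METAGENE_LIST_SIZE) for name, (k, size) in rows.items()}
fdr = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for name in rows:
    print(f"{name}: p = {raw[name]:.2e}, BH FDR = {fdr[name]:.2e}")
```

The monotone step in the Benjamini-Hochberg procedure is why several adjacent pathways can share an identical FDR.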
Evaluation Indicators | LASSO | SVM | XGBoost | Random Forest | ANN |
---|---|---|---|---|---|
False Positive Rate (Type I error) | 0.22 | 0.20 | 0.08 | 0.30 | 0.04 |
False Negative Rate (Type II error) | 0.27 | 0.29 | 0.21 | 0.19 | 0.05 |
True Negative Rate (Specificity) | 0.77 | 0.79 | 0.91 | 0.69 | 0.95 |
Negative Predictive Value | 0.69 | 0.68 | 0.81 | 0.73 | 0.95 |
False Discovery Rate | 0.19 | 0.18 | 0.09 | 0.23 | 0.051 |
True Positive Rate (Recall, Sensitivity) | 0.72 | 0.70 | 0.78 | 0.80 | 0.94 |
Positive Predictive Value (Precision) | 0.80 | 0.81 | 0.90 | 0.76 | 0.94 |
Accuracy | 0.74 | 0.74 | 0.85 | 0.75 | 0.95 |
F1 Score | 0.76 | 0.75 | 0.84 | 0.78 | 0.94 |
Matthews Correlation Coefficient (MCC) | 0.49 | 0.50 | 0.71 | 0.50 | 0.9 |
ROC AUC Score | 0.75 | 0.75 | 0.85 | 0.75 | 0.95 |
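Every indicator in the evaluation table can be derived from a model's four confusion-matrix counts (true/false positives and negatives), which is why complementary rows sum to approximately 1 (for example, False Positive Rate plus Specificity, or False Discovery Rate plus Precision). The counts themselves are not reported in this section, so the minimal sketch below uses hypothetical counts, and the function name `classification_metrics` is illustrative. One further observation, inferred from the table rather than stated in it: each model's ROC AUC matches the mean of its sensitivity and specificity within rounding (e.g., XGBoost: (0.78 + 0.91)/2 ≈ 0.85), which is what the ROC area reduces to when it is computed from hard class predictions rather than continuous scores.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Derive the evaluation-table indicators from confusion-matrix counts.
    Assumes no denominator is zero."""
    tpr = tp / (tp + fn)  # sensitivity / recall
    tnr = tn / (tn + fp)  # specificity
    ppv = tp / (tp + fp)  # precision
    npv = tn / (tn + fn)
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "FPR (Type I error)": 1 - tnr,
        "FNR (Type II error)": 1 - tpr,
        "TNR (specificity)": tnr,
        "NPV": npv,
        "FDR": 1 - ppv,
        "TPR (recall, sensitivity)": tpr,
        "PPV (precision)": ppv,
        "Accuracy": (tp + tn) / (tp + fp + tn + fn),
        "F1": 2 * ppv * tpr / (ppv + tpr),
        "MCC": (tp * tn - fp * fn) / mcc_denom,
        # With hard 0/1 predictions the ROC curve has a single interior point
        # (FPR, TPR), so its area reduces to (TPR + TNR) / 2.
        "ROC AUC (hard labels)": (tpr + tnr) / 2,
    }


# Hypothetical counts for illustration only; not taken from the study.
for name, value in classification_metrics(tp=40, fp=10, tn=45, fn=5).items():
    print(f"{name}: {value:.3f}")
```

Reporting the underlying counts per model, alongside AUC computed from predicted probabilities, would let readers recompute every row of the table.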
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Saxena, A.; Mathur, N.; Pathak, P.; Tiwari, P.; Mathur, S.K. Machine Learning Model Based on Insulin Resistance Metagenes Underpins Genetic Basis of Type 2 Diabetes. Biomolecules 2023, 13, 432. https://doi.org/10.3390/biom13030432