An Artificial Intelligence Prediction Model of Insulin Sensitivity, Insulin Resistance, and Diabetes Using Genes Obtained through Differential Expression
Abstract
1. Introduction
2. Materials and Methods
- Deep learning multilayer perceptron (MLP). An MLP is a fully connected feed-forward network built from individual neurons. The input layer has one input per predictor variable, in this case 60. The intermediate (hidden) layers extract features from the data; in this case, two intermediate layers were defined with 64 nodes. The output layer has one output per category to predict, in this case three. The activation function of the last layer was “softmax”, which converts a vector of values into a probability distribution. The loss function was “categorical crossentropy”, and accuracy was the evaluation metric [24];
- k-nearest neighbors (kNN). The kNN algorithm starts from a training dataset of examples that have been classified into several categories, as labeled by a nominal variable. The test dataset contains unlabeled examples that otherwise have the same features as the training data. For each record in the test dataset, kNN identifies the k records in the training data that are “nearest” in similarity, where k is an integer specified in advance. The unlabeled test instance is then assigned the class held by the majority of its k nearest neighbors [25]. Euclidean distance was used, and the k value was 7;
- Artificial neural network (ANN). An ANN uses a network of artificial neurons, or nodes, to solve learning problems. Two neural network models were used. In the first, the input data were the 60 genes obtained in the ADO process, and two hidden layers were used, the first with 30 nodes and the second with 20 nodes. In the second, the input variables were the seven components whose eigenvalues were greater than or close to 1 in the principal component analysis (PCA) carried out on the 60 genes. This second model also used two hidden layers, with five and three nodes, respectively. In both cases, the activation function was the logistic function (a common choice because it is differentiable, which gradient-based training requires), and the training algorithm was “backpropagation” [25];
- Support vector machine (SVM). An SVM can be imagined as a surface that defines a boundary between data points, which represent examples plotted in multidimensional space according to their feature values. The goal of an SVM is to create a flat boundary, called a hyperplane, that leads to fairly homogeneous partitions of data on either side. When the data are not linearly separable, it is necessary to use kernels (similarity functions) and to specify a cost parameter C, which sets the penalty for misclassified points in the cost function being minimized. The most popular kernels are the linear and the Gaussian [25]. In this analysis, the SVM technique was applied twice: first with a linear kernel (vanilladot option) and then with a Gaussian kernel (rbfdot option). In both cases, the parameter C took the value 1;
- Random forest (RF). This technique combines versatility and power in a single machine learning approach. Because each tree in the ensemble uses only a small, random portion of the full feature set, random forests can handle extremely large datasets, where the so-called “curse of dimensionality” might cause other models to fail. At the same time, its error rates for most learning tasks are on par with nearly any other method. Individuals were selected at random with replacement, thus forming different data sets. A decision tree was then built on each data set, so that different trees were obtained. When each tree was built, a random subset of variables was considered at each node, and the tree was allowed to grow without pruning. New data were then predicted by majority vote, with an observation classified as positive if the majority of trees predicted it as positive [25]. In this analysis, the random forest included 500 trees and tested seven variables at each split;
- Random forest by fivefold cross-validation (RF-5CV). The technique is RF, but in this case the dataset was split into five folds. Four folds were used as the training data set, and the remaining fold was used for testing. This process was repeated so that each of the five folds served once as the test set. This random forest model had 500 trees and tested two variables at each split.
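
A few of the building blocks described in the list above can be sketched in a handful of lines. The following is a minimal, illustrative Python sketch, not the authors' code (the options named above, such as vanilladot and rbfdot, suggest the analysis itself was run with R packages). It shows the softmax output and categorical cross-entropy loss mentioned for the MLP, a kNN majority vote with Euclidean distance and k = 7, and a fivefold split in which each fold serves once as the test set. All function names and the toy data are hypothetical, and `math.dist` assumes Python 3.8 or later.

```python
import math
from collections import Counter


def softmax(scores):
    """Turn a vector of raw scores into a probability distribution."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


def knn_predict(train_x, train_y, query, k=7):
    """Majority vote among the k training rows nearest to `query` (Euclidean)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_idx, test_idx) pairs; each fold is the test set once."""
    folds = [list(range(f, n_samples, n_folds)) for f in range(n_folds)]
    for f in range(n_folds):
        train = [i for g in range(n_folds) if g != f for i in folds[g]]
        yield train, folds[f]


# Hypothetical toy data: two features per sample, three classes
# labeled with the group abbreviations used in this article.
train_x = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
           (5.0, 5.1), (5.2, 4.9), (4.8, 5.3), (5.1, 5.0),
           (9.9, 0.1), (10.2, 0.0), (10.0, 0.3), (9.8, 0.2)]
train_y = ["IS"] * 4 + ["IR"] * 4 + ["DB"] * 4

label = knn_predict(train_x, train_y, (5.0, 5.0), k=7)
probs = softmax([2.0, 1.0, 0.1])
loss = categorical_crossentropy([1, 0, 0], probs)
splits = list(k_fold_indices(len(train_x), n_folds=5))
```

With k = 7 and only four samples per class in this toy set, the vote always includes neighbors from other classes, which illustrates why k is usually chosen with the class sizes in mind.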
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Comparison | Process |
---|---|
DB vs. IR Over-representation | spleen development |
 | embryonic heart tube left/right pattern formation |
 | left/right pattern formation |
DB vs. IR Under-representation | mRNA catabolic process |
 | RNA catabolic process |
 | translation |
DB vs. IS Over-representation | mRNA metabolic process |
 | regulation of translation |
 | posttranscriptional regulation of gene expression |
DB vs. IS Under-representation | extracellular matrix organization |
 | extracellular structure organization |
 | external encapsulating structure organization |
IR vs. IS Over-representation | translational initiation |
 | SRP-dependent cotranslational protein targeting to membrane |
 | nuclear-transcribed mRNA catabolic process, nonsense-mediated decay |
IR vs. IS Under-representation | external encapsulating structure organization |
 | urogenital system development |
 | extracellular matrix organization |
References
- Mediavilla Bravo, J.J.; Alonso Fernández, M.; Moreno Moreno, A.; Carramiñana Barrera, F. Guías Clínicas Diabetes Mellitus 2015. EUROMEDICE, Ediciones Médicas, S.L. Available online: https://2016.jornadasdiabetes.com/docs/Guia_Diabetes_Semergen.pdf (accessed on 19 November 2023).
- Servicio Canario de la Salud. Estrategia de Abordaje de la Diabetes Mellitus en Canarias 2021; Servicio Canario de la Salud: Las Palmas de Gran Canaria, Spain, 2021; ISBN 978-84-16878-26-0.
- Williams, R.; Colagiuri, A.R.; Aschner Montoya, B. Atlas de la Diabetes de la FID. Fed. Int. Diabetes Suvi Karuranga Belma Malanda Pouya Saeedi Paraskevi Salpea. 2019. Available online: https://www.diabetesatlas.org/upload/resources/material/20200302_133352_2406-IDF-ATLAS-SPAN-BOOK.pdf (accessed on 19 November 2023).
- Gheibi, S.; Singh, T.; da Cunha, J.P.M.C.M.; Fex, M.; Mulder, H. Insulin/glucose-responsive cells derived from induced pluripotent stem cells: Disease modeling and treatment of diabetes. Cells 2020, 9, 2465.
- Wu, X.; Wang, J.; Cui, X.; Maianu, L.; Rhees, B.; Rosinski, J.; So, W.V.; Willi, S.M.; Osier, M.V.; Hill, H.S. The effect of insulin on expression of genes and biochemical pathways in human skeletal muscle. Endocrine 2007, 31, 5–17.
- Virkamäki, A.; Ueki, K.; Kahn, C.R. Protein–protein interaction in insulin signaling and the molecular mechanisms of insulin resistance. J. Clin. Investig. 1999, 103, 931–943.
- Pawson, T.; Scott, J.D. Signaling through scaffold, anchoring, and adaptor proteins. Science 1997, 278, 2075–2080.
- DeFronzo, R.A. The triumvirate: β-cell, muscle, liver: A collusion responsible for NIDDM. Diabetes 1988, 37, 667–687.
- DeFronzo, R.A.; Jacot, E.; Jequier, E.; Maeder, E.; Wahren, J.; Felber, J.P. The effect of insulin on the disposal of intravenous glucose: Results from indirect calorimetry and hepatic and femoral venous catheterization. Diabetes 1981, 30, 1000–1007.
- Sanz, R.G.; Sánchez-Pla, A. Statistical Analysis of Microarray Data. In Microarray Bioinformatics; Springer: Berlin/Heidelberg, Germany, 2019; pp. 87–121.
- Sánchez-Pla, A.; Gonzalo Sanz, R. Análisis de Datos Ómicos. Available online: https://github.com/ASPteaching/Analisis_de_datos_omicos-Materiales_para_un_curso (accessed on 19 November 2023).
- Carlson, M. hgu95av2.db: Affymetrix Human Genome U95 Set Annotation Data (chip hgu95av2). R package version 3.2.3; 2016.
- R Core Team. R: A Language and Environment for Statistical Computing; 2021. Available online: https://www.R-project.org (accessed on 19 November 2023).
- Gentleman, R.C.; Carey, V.J.; Bates, D.M.; Bolstad, B.; Dettling, M.; Dudoit, S.; Ellis, B.; Gautier, L.; Ge, Y.; Gentry, J.; et al. Bioconductor: Open software development for computational biology and bioinformatics. Genome Biol. 2004, 5, R80.
- Irizarry, R.A.; Hobbs, B.; Collin, F.; Beazer-Barclay, Y.D.; Antonellis, K.J.; Scherf, U.; Speed, T.P. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 2003, 4, 249–264.
- Hackstadt, A.J.; Hess, A.M. Filtering for increased power for microarray data analysis. BMC Bioinform. 2009, 10, 11.
- Chrominski, K.; Tkacz, M. Comparison of high-level microarray analysis methods in the context of result consistency. PLoS ONE 2015, 10, e0128845.
- Ritchie, M.E.; Phipson, B.; Wu, D.I.; Hu, Y.; Law, C.W.; Shi, W.; Smyth, G.K. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015, 43, e47.
- Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 1995, 57, 289–300.
- Falcon, S.; Gentleman, R. Using GOstats to test gene lists for GO term association. Bioinformatics 2007, 23, 257–258.
- Yu, G.; He, Q.-Y. ReactomePA: An R/Bioconductor package for reactome pathway analysis and visualization. Mol. Biosyst. 2016, 12, 477–479.
- Sammut, C.; Webb, G.I. Encyclopedia of Machine Learning; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2011.
- Fernández-Delgado, M.; Cernadas, E.; Barro, S.; Amorim, D. Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 2014, 15, 3133–3181.
- Allaire, J.J.; Chollet, F. keras: R Interface to 'Keras'. R package version 2; 2019. Available online: https://cran.r-project.org/web/packages/keras/index.html (accessed on 19 November 2023).
- Lantz, B. Machine Learning with R: Expert Techniques for Predictive Modeling; Packt Publishing Ltd.: Birmingham, UK, 2019.
- Byeon, H. Exploring the risk factors of impaired fasting glucose in middle-aged population living in South Korean communities by using categorical boosting machine. Front. Endocrinol. 2022, 13, 1013162.
- Hathaway, Q.A.; Roth, S.M.; Pinti, M.V.; Sprando, D.C.; Kunovac, A.; Durr, A.J.; Cook, C.C.; Fink, G.K.; Cheuvront, T.B.; Grossman, J.H. Machine-learning to stratify diabetic patients using novel cardiac biomarkers and integrative genomics. Cardiovasc. Diabetol. 2019, 18, 78.
- Tonyan, Z.N.; Nasykhova, Y.A.; Danilova, M.M.; Barbitoff, Y.A.; Changalidi, A.I.; Mikhailova, A.A.; Glotov, A.S. Overview of Transcriptomic Research on Type 2 Diabetes: Challenges and Perspectives. Genes 2022, 13, 1176.
- Bury, J.J.; Chambers, A.; Heath, P.R.; Ince, P.G.; Shaw, P.J.; Matthews, F.E.; Brayne, C.; Simpson, J.E.; Wharton, S.B. Type 2 diabetes mellitus-associated transcriptome alterations in cortical neurones and associated neurovascular unit cells in the ageing brain. Acta Neuropathol. Commun. 2021, 9, 5.
- Kedziora, S.M.; Obermayer, B.; Sugulle, M.; Herse, F.; Kräker, K.; Haase, N.; Langmia, I.M.; Müller, D.N.; Staff, A.C.; Beule, D. Placental transcriptome profiling in subtypes of diabetic pregnancies is strongly confounded by fetal sex. Int. J. Mol. Sci. 2022, 23, 15388.
N | DB vs. IR | DB vs. IS | IR vs. IS |
---|---|---|---|
1 | RAB11B | PCBD1 | SAFB |
2 | TASOR | PCGF1 | TNFAIP1 |
3 | FAP | ATP1A3 | NFIC |
4 | NEAT1 | PRKAR2A | RAB31 |
5 | LUM | ALOX12 | CR1 |
6 | VGLL1 | ATP5ME | NFATC1 |
7 | IKZF1 | SLC22A6 | RHOBTB2 |
8 | ACSL4 | GSPT1 | PLD3 |
9 | MPDZ | ACOX1 | NUP188 |
10 | XCL2 | TFR2 | RPS2 |
11 | ACTL6A | SETBP1 | RSU1 |
12 | CACNA1G | EDA | BPTF |
13 | EPHX1 | ATP5MC1 | PIN1P1 |
14 | KRT14 | PRRC2C | MPP2 |
15 | ARHGAP12 | PPP2R5E | ZNF473 |
16 | OGT | ATP5MC3 | H4C3 |
17 | NEDD4L | EXOC6B | APOA1 |
18 | RAB11A | ZNF133 | ATP6V1H |
19 | CDC27 | MAP4 | MAD2L1BP |
20 | PDE4A | SDCBP | TBC1D22A |
DB vs. IS | IR vs. IS |
---|---|
Signaling by ROBO receptors | Eukaryotic Translation Initiation |
Regulation of expression of SLITs and ROBOs | Cap-dependent Translation Initiation |
Eukaryotic Translation Initiation | GTP hydrolysis and joining of the 60S ribosomal subunit |
Cap-dependent Translation Initiation | L13a-mediated translational silencing of Ceruloplasmin expression |
L13a-mediated translational silencing of Ceruloplasmin expression | Regulation of expression of SLITs and ROBOs |
Technique | Accuracy (a) * | Accuracy (b) * | Sens. | Spec. | PPV | NPV |
---|---|---|---|---|---|---|
MLP | 95.42 | 96.31 | 97.65 | 93.94 | 96.58 | 95.8 |
kNN | 85.51 | 90.65 | 98.49 | 76.88 | 88.2 | 96.68 |
ANN | 88.34 | 92.96 | 96.04 | 87.57 | 93.13 | 92.65 |
ANN-PCA | 89.01 | 91.99 | 93.32 | 89.65 | 94.05 | 88.44 |
SVM-radial | 89.55 | 93.06 | 99.54 | 81.69 | 90.51 | 99.02 |
SVM-linear | 90.99 | 94.53 | 97.09 | 90.04 | 94.47 | 94.64 |
RF | 80.97 | 89.57 | 95.14 | 79.8 | 89.2 | 90.35 |
RF-5CV | 81.92 | 90.29 | 96.11 | 80.05 | 89.44 | 92.14 |
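
For readers less familiar with the column headings in the performance table, the sketch below shows how accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) are conventionally derived from a binary confusion matrix. It is a minimal Python illustration under an assumption: the text here does not state how the three-class problem was reduced to these binary measures, so a simple positive-versus-negative split is assumed. The function name and the counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity, PPV and NPV as percentages."""
    return {
        "accuracy": 100 * (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": 100 * tp / (tp + fn),  # true positives among actual positives
        "specificity": 100 * tn / (tn + fp),  # true negatives among actual negatives
        "ppv": 100 * tp / (tp + fp),          # true positives among predicted positives
        "npv": 100 * tn / (tn + fn),          # true negatives among predicted negatives
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
m = binary_metrics(tp=90, fp=10, tn=80, fn=20)
```

Sensitivity and specificity are conditioned on the true class, while PPV and NPV are conditioned on the predicted class, so the two pairs can diverge when the classes are unbalanced.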
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
González-Martín, J.M.; Torres-Mata, L.B.; Cazorla-Rivero, S.; Fernández-Santana, C.; Gómez-Bentolila, E.; Clavo, B.; Rodríguez-Esparragón, F. An Artificial Intelligence Prediction Model of Insulin Sensitivity, Insulin Resistance, and Diabetes Using Genes Obtained through Differential Expression. Genes 2023, 14, 2119. https://doi.org/10.3390/genes14122119