Improving Machine Learning Diabetes Prediction Models for the Utmost Clinical Effectiveness
Abstract
1. Introduction
2. Materials and Methods
2.1. Privacy Protection
2.2. Data Collection
2.3. Variables Used to Develop the Diabetes Prediction Model
2.4. Construction and Validation of the DM Prediction Models
3. Results
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

Variable | All (n = 30,703) | Incident diabetes (n = 1,214) | No diabetes (n = 29,489) |
---|---|---|---|
Age, years | 45 ± 10 | 51 ± 10 | 45 ± 10 |
Sex, female, n (%) | 13,578 (44.2) | 323 (26.6) | 13,255 (44.9) |
BMI, kg/m² | 23.4 ± 3.2 | 25.8 ± 3.4 | 23.3 ± 3.1 |
Body fat, % | 26.1 ± 6.2 | 27.8 ± 6.3 | 26.0 ± 6.1 |
Waist-hip ratio | 0.88 ± 0.12 | 0.93 ± 0.08 | 0.88 ± 0.12 |
SBP, mmHg | 119 ± 14 | 127 ± 14 | 118 ± 14 |
DBP, mmHg | 73 ± 10 | 78 ± 11 | 73 ± 10 |
Laboratory findings | | | |
FBG, mg/dL | 91 ± 12 | 109 ± 21 | 90 ± 10 |
HbA1c, % | 5.4 ± 0.3 | 6.0 ± 0.3 | 5.4 ± 0.3 |
Total cholesterol, mg/dL | 195 ± 34 | 200 ± 38 | 195 ± 33 |
Triglyceride, mg/dL | 112 ± 80 | 162 ± 109 | 110 ± 78 |
HDL-C, mg/dL | 54 ± 13 | 47 ± 11 | 54 ± 13 |
LDL-C, mg/dL | 118 ± 30 | 122 ± 34 | 118 ± 30 |
AST, IU/L | 23 ± 13 | 28 ± 15 | 23 ± 13 |
ALT, IU/L | 25 ± 21 | 36 ± 27 | 25 ± 21 |
γ-GTP, IU/L | 34 ± 39 | 54 ± 50 | 34 ± 38 |
Amylase, IU/L | 86 ± 29 | 79 ± 30 | 87 ± 29 |
BUN, mg/dL | 13 ± 4 | 14 ± 4 | 13 ± 4 |
Creatinine, mg/dL | 0.9 ± 0.2 | 0.9 ± 0.4 | 0.9 ± 0.2 |
Sodium, mEq/L | 142 ± 2 | 142 ± 2 | 142 ± 2 |
Potassium, mEq/L | 4.2 ± 0.3 | 4.2 ± 0.3 | 4.2 ± 0.3 |
Calcium, mg/dL | 9.1 ± 0.4 | 9.2 ± 0.4 | 9.1 ± 0.4 |
Phosphate, mg/dL | 3.5 ± 0.5 | 3.5 ± 0.5 | 3.5 ± 0.5 |
Uric acid, mg/dL | 5.4 ± 1.4 | 5.9 ± 1.4 | 5.4 ± 1.4 |
TSH, µIU/mL | 2.24 ± 4.97 | 2.25 ± 3.42 | 2.24 ± 5.03 |
Hemoglobin, g/dL | 14.4 ± 1.6 | 14.9 ± 1.5 | 14.7 ± 1.6 |
Pulmonary function test | | | |
FVC, % | 93.8 ± 11.0 | 90.9 ± 11.1 | 92.9 ± 10.9 |
FEV1, % | 98.6 ± 12.8 | 97.7 ± 13.3 | 98.7 ± 12.8 |
FEV1/FVC, % | 82.4 ± 6.7 | 80.7 ± 6.1 | 82.4 ± 6.8 |
FEF25–75, % | 100.1 ± 26.9 | 97.9 ± 27.8 | 100.2 ± 26.8 |
PEFR, L/s | 99.1 ± 17.5 | 99.5 ± 18.3 | 99.1 ± 17.5 |
Personal history | | | |
Hypertension, n (%) | 3320 (11.0) | 319 (26.5) | 3001 (10.3) |
Cardiovascular diseases, n (%) | 449 (1.5) | 55 (4.6) | 394 (1.4) |
Cerebrovascular diseases, n (%) | 371 (1.2) | 20 (1.7) | 351 (1.2) |
Family history | | | |
Hypertension, n (%) | 12,797 (42.1) | 577 (47.8) | 12,220 (41.9) |
Diabetes, n (%) | 8927 (29.4) | 604 (50.0) | 8323 (28.5) |
Cardiovascular diseases, n (%) | 5529 (18.2) | 252 (20.9) | 5277 (18.1) |
Cerebrovascular diseases, n (%) | 6907 (22.7) | 338 (28.0) | 6569 (22.5) |
Follow-up, years | 3.9 ± 2.5 | 5.4 ± 2.4 | 3.9 ± 2.5 |
Checkups, n | 3.7 ± 2.0 | 4.7 ± 2.3 | 3.7 ± 2.0 |

Model | Imputation | AUROC | Sensitivity | Specificity | PPV | NPV | Accuracy |
---|---|---|---|---|---|---|---|
Logistic regression | MI by Median | 0.547 | 0.964 | 0.529 | 0.996 | 0.098 | 0.960 |
Decision tree | MI by Median | 0.590 | 0.967 | 0.582 | 0.994 | 0.186 | 0.962 |
Decision tree | No MI | 0.598 | 0.968 | 0.556 | 0.993 | 0.204 | 0.962 |
Random Forest | MI by Median | 0.524 | 0.962 | 0.696 | 0.999 | 0.050 | 0.961 |
Random Forest | No MI | 0.530 | 0.962 | 0.756 | 0.999 | 0.061 | 0.962 |
XGBoost | MI by Median | 0.616 | 0.969 | 0.646 | 0.994 | 0.237 | 0.964 |
XGBoost | No MI | 0.623 | 0.970 | 0.690 | 0.995 | 0.250 | 0.966 |
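
The "MI by Median" rows above presumably refer to median imputation of missing values before training, while "No MI" leaves missing entries in place (XGBoost can handle missing inputs natively). The following is a minimal sketch of column-wise median imputation under that reading; the function name `impute_median` and the toy rows are hypothetical and are not taken from the study.

```python
from statistics import median


def impute_median(rows):
    """Replace None entries with the median of the observed values in the same column.

    Hypothetical helper for illustration only; `rows` is a list of equal-length
    lists in which None marks a missing measurement.
    """
    n_cols = len(rows[0])
    # Median of the observed (non-missing) values, one per column.
    medians = [
        median(row[col] for row in rows if row[col] is not None)
        for col in range(n_cols)
    ]
    # Build a new table, leaving the input rows untouched.
    return [
        [medians[col] if value is None else value for col, value in enumerate(row)]
        for row in rows
    ]


# Toy data (made up): columns are fasting glucose (mg/dL) and HbA1c (%).
toy_rows = [
    [91.0, 5.4],
    [None, 6.0],
    [109.0, None],
    [90.0, 5.3],
]
imputed = impute_median(toy_rows)
print(imputed)
```

In practice the medians would be computed on the training split only and then reused for the validation split, so that no information leaks from held-out data.
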

Prediction period | AUROC | Sensitivity (threshold 0.5) | Specificity (threshold 0.5) | PPV (threshold 0.5) | NPV (threshold 0.5) | Accuracy (threshold 0.5) | Sensitivity (threshold 0.25) | Specificity (threshold 0.25) | PPV (threshold 0.25) | NPV (threshold 0.25) | Accuracy (threshold 0.25) |
---|---|---|---|---|---|---|---|---|---|---|---|
2 year | 0.949 | 0.090 | 0.999 | 0.571 | 0.985 | 0.984 | 0.326 | 0.992 | 0.387 | 0.989 | 0.981 |
3 year | 0.953 | 0.190 | 0.996 | 0.600 | 0.974 | 0.970 | 0.465 | 0.984 | 0.489 | 0.982 | 0.967 |
4 year | 0.945 | 0.332 | 0.989 | 0.643 | 0.962 | 0.953 | 0.626 | 0.969 | 0.536 | 0.978 | 0.950 |
5 year | 0.948 | 0.377 | 0.985 | 0.692 | 0.947 | 0.936 | 0.686 | 0.960 | 0.599 | 0.972 | 0.938 |
6 year | 0.945 | 0.506 | 0.982 | 0.804 | 0.932 | 0.922 | 0.741 | 0.948 | 0.674 | 0.962 | 0.922 |
7 year | 0.937 | 0.578 | 0.974 | 0.846 | 0.905 | 0.897 | 0.785 | 0.926 | 0.720 | 0.947 | 0.898 |
8 year | 0.940 | 0.671 | 0.970 | 0.925 | 0.841 | 0.863 | 0.856 | 0.888 | 0.809 | 0.917 | 0.876 |
9 year | 0.955 | 0.818 | 0.917 | 0.969 | 0.615 | 0.842 | 0.934 | 0.792 | 0.934 | 0.792 | 0.900 |
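
The table above reports the same model at two decision thresholds: lowering the cutoff from 0.5 to 0.25 raises sensitivity at the cost of specificity, while AUROC is threshold-independent. The sketch below illustrates that mechanism using the standard confusion-matrix definitions; the function name `confusion_metrics` and the toy labels and probabilities are hypothetical and do not reproduce the study's data or code.

```python
def confusion_metrics(y_true, y_prob, threshold):
    """Compute sensitivity, specificity, PPV, NPV and accuracy at one threshold.

    y_true holds 0/1 labels (1 = incident diabetes); y_prob holds predicted
    probabilities. A case is called positive when its probability >= threshold.
    """
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Toy data (made up): 4 cases, 6 non-cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.3, 0.35, 0.2, 0.1, 0.1, 0.05, 0.05]

at_050 = confusion_metrics(y_true, y_prob, threshold=0.5)
at_025 = confusion_metrics(y_true, y_prob, threshold=0.25)
print("threshold 0.50:", at_050)
print("threshold 0.25:", at_025)
```

On this toy data the lower threshold recovers the two cases missed at 0.5 while introducing one false positive, the same direction of trade-off seen across the rows of the table.
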

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Shin, J.; Lee, J.; Ko, T.; Lee, K.; Choi, Y.; Kim, H.-S. Improving Machine Learning Diabetes Prediction Models for the Utmost Clinical Effectiveness. J. Pers. Med. 2022, 12, 1899. https://doi.org/10.3390/jpm12111899