Predicting the 2-Year Risk of Progression from Prediabetes to Diabetes Using Machine Learning among Chinese Elderly Adults
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Design and Participants
2.2. Data Collection
2.3. Definition of Outcome
2.4. Feature Selection
2.5. Machine Learning Model Development and Evaluation
2.6. Statistical Analysis
3. Results
3.1. Baseline Characteristics of Data Sets Used for the Analysis
3.2. Performance Comparison between Different Machine Learning Models
3.2.1. 1-Year Forecast Period
3.2.2. 2-Year Forecast Period
3.3. Analysis of Feature Importance
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Variables | 2019: Without DM (n = 10,231) | 2019: DM (n = 1778) | p Value | 2020: Without DM (n = 8595) | 2020: DM (n = 3414) | p Value |
---|---|---|---|---|---|---|
Age (years) | 72.06 ± 5.10 | 72.17 ± 5.22 | 0.393 | 72.08 ± 5.13 | 72.06 ± 5.10 | 0.813 |
Gender, n (%) | | | <0.001 | | | 0.018 |
Male | 4536 (83.86) | 873 (16.14) | | 3813 (70.49) | 1596 (29.51) | |
Female | 5695 (86.29) | 905 (13.71) | | 4782 (72.45) | 1818 (27.55) | |
Education, n (%) | | | <0.001 | | | <0.001 |
≤Primary school | 6485 (86.98) | 971 (13.02) | | 5529 (74.16) | 1927 (25.84) | |
Middle school | 1990 (82.10) | 434 (17.90) | | 1615 (66.63) | 809 (33.37) | |
High school | 931 (82.10) | 203 (17.90) | | 764 (67.37) | 370 (32.63) | |
≥University | 825 (82.91) | 170 (17.09) | | 687 (69.05) | 308 (30.95) | |
Marital status, n (%) | | | 0.383 | | | 0.897 |
Married | 7762 (84.96) | 1374 (15.04) | | 6541 (71.60) | 2595 (28.40) | |
Divorced | 57 (87.69) | 8 (12.31) | | 44 (67.69) | 21 (32.31) | |
Widowed | 2331 (85.76) | 387 (14.24) | | 1947 (71.63) | 771 (28.37) | |
Unmarried | 81 (90.00) | 9 (10.00) | | 63 (70.00) | 27 (30.00) | |
Hypertension, n (%) | | | <0.001 | | | <0.001 |
No | 4893 (87.02) | 730 (12.98) | | 4153 (73.86) | 1470 (26.14) | |
Yes | 5338 (83.59) | 1048 (16.41) | | 4442 (69.56) | 1944 (30.44) | |
Myocardial infarction, n (%) | | | 0.463 | | | 0.298 |
No | 10,177 (85.18) | 1771 (14.82) | | 8555 (71.60) | 3393 (28.40) | |
Yes | 54 (88.52) | 7 (11.48) | | 40 (65.57) | 21 (34.43) | |
Coronary heart disease, n (%) | | | 0.841 | | | 0.144 |
No | 9632 (85.22) | 1670 (14.78) | | 8106 (71.72) | 3196 (28.28) | |
Yes | 599 (84.72) | 108 (15.28) | | 489 (69.17) | 218 (30.83) | |
Angina pectoris, n (%) | | | 0.828 | | | 0.437 |
No | 10,187 (85.19) | 1771 (14.81) | | 8556 (71.55) | 3402 (28.45) | |
Yes | 44 (86.27) | 7 (13.73) | | 39 (76.47) | 12 (23.53) | |
Fatty liver, n (%) | | | 0.315 | | | 0.055 |
No | 9979 (85.25) | 1727 (14.75) | | 8393 (71.70) | 3313 (28.30) | |
Yes | 252 (83.17) | 51 (16.83) | | 202 (66.67) | 101 (33.33) | |
Exercise, n (%) | | | 0.587 | | | 0.455 |
No | 3942 (85.42) | 673 (14.58) | | 3321 (71.96) | 1294 (28.04) | |
Yes | 6289 (85.06) | 1105 (14.94) | | 5274 (71.33) | 2120 (28.67) | |
Smoking, n (%) | | | 0.705 | | | 0.883 |
No | 8804 (85.15) | 1536 (14.85) | | 7403 (71.60) | 2937 (28.40) | |
Yes | 1427 (85.50) | 242 (14.50) | | 1192 (71.42) | 477 (28.58) | |
Drinking, n (%) | | | 0.295 | | | 0.212 |
No | 8544 (85.35) | 1467 (14.65) | | 7188 (71.80) | 2823 (28.20) | |
Yes | 1687 (84.43) | 311 (15.57) | | 1407 (70.42) | 591 (29.58) | |
BMI (kg/m²) | 24.56 ± 3.41 | 25.48 ± 3.31 | <0.001 | 24.44 ± 3.42 | 25.34 ± 3.33 | <0.001 |
WC (cm) | 85.57 ± 9.73 | 88.65 ± 9.23 | <0.001 | 85.20 ± 9.68 | 88.10 ± 9.52 | <0.001 |
SBP (mmHg) | 139.17 ± 19.48 | 140.30 ± 18.70 | 0.023 | 138.96 ± 19.52 | 140.28 ± 18.96 | <0.001 |
DBP (mmHg) | 80.99 ± 11.09 | 81.57 ± 10.64 | 0.041 | 80.86 ± 11.11 | 81.60 ± 10.79 | 0.001 |
FPG (mmol/L) | 6.42 ± 0.24 | 6.54 ± 0.26 | <0.001 | 6.40 ± 0.24 | 6.52 ± 0.26 | <0.001 |
TC (mmol/L) | 5.05 ± 1.05 | 4.99 ± 1.03 | 0.021 | 5.06 ± 1.05 | 5.00 ± 1.03 | 0.007 |
TG (mmol/L) | 1.32 (0.96) | 1.50 (1.06) | <0.001 | 1.30 (0.93) | 1.48 (1.05) | <0.001 |
HDL-C (mmol/L) | 1.39 ± 0.40 | 1.34 ± 0.51 | <0.001 | 1.40 ± 0.40 | 1.33 ± 0.44 | <0.001 |
LDL-C (mmol/L) | 2.80 ± 0.92 | 2.76 ± 1.02 | 0.147 | 2.78 ± 0.93 | 2.81 ± 0.95 | 0.231 |
ALT (U/L) | 18.00 (11.00) | 19.10 (12.00) | <0.001 | 18.00 (10.90) | 19.00 (12.00) | <0.001 |
AST (U/L) | 22.00 (8.30) | 22.00 (9.90) | 0.797 | 22.00 (8.30) | 22.30 (9.50) | 0.034 |
TBil (μmol/L) | 12.80 (6.80) | 13.10 (6.20) | 0.131 | 12.80 (6.90) | 12.90 (6.50) | 0.931 |
Scr (μmol/L) | 77.50 (29.00) | 76.90 (29.00) | 0.107 | 78.00 (28.00) | 76.00 (29.00) | <0.001 |
BUN (mmol/L) | 5.80 (2.26) | 5.70 (2.05) | 0.002 | 5.83 (2.29) | 5.67 (2.07) | <0.001 |
SUA (μmol/L) | 332.93 ± 99.46 | 347.54 ± 95.61 | <0.001 | 333.43 ± 99.44 | 344.32 ± 97.40 | <0.001 |
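The percentages reported for the categorical variables in the table above appear to be row percentages: within each category, the share of participants without and with DM sums to 100. The sketch below recomputes them for one row (Female, 2019) as a consistency check. The function name `row_percentages` is a hypothetical helper introduced here for illustration and is not part of the original analysis.

```python
from typing import Tuple


def row_percentages(without_dm: int, dm: int) -> Tuple[float, float]:
    """Return the (without DM, DM) shares of one category, in percent.

    Hypothetical helper: each share is the count divided by the row total,
    rounded to two decimals as in the table.
    """
    total = without_dm + dm
    if total == 0:
        raise ValueError("row has no participants")
    return (round(100 * without_dm / total, 2), round(100 * dm / total, 2))


# Counts taken from the "Female" row of the 2019 columns.
female_2019 = row_percentages(5695, 905)
print(female_2019)
```

The same check can be applied to any other categorical row by passing its two counts.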
Metrics | LR | DT | RF | XGBoost |
---|---|---|---|---|
1-year forecast period | | | | |
Sensitivity | 0.5559 | 0.5213 | 0.5824 | 0.6569 |
Specificity | 0.6876 | 0.7004 | 0.6807 | 0.5972 |
Accuracy | 0.6669 | 0.6724 | 0.6653 | 0.6066 |
F1 score | 0.3432 | 0.3325 | 0.3527 | 0.3433 |
2-year forecast period | | | | |
Sensitivity | 0.6232 | 0.5580 | 0.5754 | 0.6130 |
Specificity | 0.6016 | 0.6612 | 0.6647 | 0.6443 |
Accuracy | 0.6078 | 0.6316 | 0.6391 | 0.6353 |
F1 score | 0.4772 | 0.4653 | 0.4780 | 0.4913 |
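For readers who want to see how the four metrics in the table above relate to one another, the following is a minimal sketch that derives them from confusion-matrix counts. The class `ConfusionMatrix`, the function `accuracy_from_rates`, and the example counts are hypothetical and chosen only for illustration; they are not the study's data. The last lines assume that the evaluation set has the same 1-year DM proportion as the full 2019 sample (1778 of 12,009), which this excerpt does not state, so the result is only expected to approximate the reported LR accuracy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts of a binary classifier's outcomes (hypothetical container)."""
    tp: int  # progressed to DM, predicted DM
    fp: int  # did not progress, predicted DM
    tn: int  # did not progress, predicted no DM
    fn: int  # progressed to DM, predicted no DM

    def sensitivity(self) -> float:
        # Share of true DM cases that the model flags.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of non-DM cases that the model clears.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def f1(self) -> float:
        # Harmonic mean of precision and sensitivity, written in count form.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


def accuracy_from_rates(sens: float, spec: float, prevalence: float) -> float:
    """Accuracy is the prevalence-weighted mix of sensitivity and specificity."""
    return sens * prevalence + spec * (1 - prevalence)


# Illustrative counts only (not taken from the study).
cm = ConfusionMatrix(tp=60, fp=200, tn=700, fn=40)
print(cm.sensitivity(), cm.specificity(), cm.accuracy(), cm.f1())

# Assumption: the evaluation set shares the full-sample 1-year DM proportion.
prevalence_1y = 1778 / 12009
approx_lr_accuracy = accuracy_from_rates(0.5559, 0.6876, prevalence_1y)
print(round(approx_lr_accuracy, 4))
```

Because DM cases are the minority class, accuracy is dominated by specificity, while the F1 score depends on how many of the flagged participants actually progress. This is one reason the F1 values in the table are much lower than the accuracy values, and why they are higher for the 2-year period, where the DM proportion is larger.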
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, Q.; Zhou, Q.; He, Y.; Zou, J.; Guo, Y.; Yan, Y. Predicting the 2-Year Risk of Progression from Prediabetes to Diabetes Using Machine Learning among Chinese Elderly Adults. J. Pers. Med. 2022, 12, 1055. https://doi.org/10.3390/jpm12071055