All-Cause Mortality Prediction in Subjects with Diabetes Mellitus Using a Machine Learning Model and Shapley Values
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Collection
2.2. Choice of Parameters
2.3. Machine Learning Model
2.4. Imputation and Imbalance Treatment
2.5. Feature Importance in the Model
2.6. Outcome
2.7. Statistical Analysis
3. Results
3.1. Study Population
3.2. Outcome
3.3. Imputation Rate
3.4. Machine Learning Model and Feature Importance
4. Discussion
4.1. Study Population
4.2. Machine Learning Model and Feature Importance
5. Study Limitations
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Demographic and Clinical | Biological Data | DM Related |
---|---|---|
Gender (male/female); Age (years); Height (cm); Current weight (kg); Maximum weight (kg); Current BMI (kg/m²); Maximum BMI (kg/m²); Abdominal circumference (cm); Hip circumference (cm); Systolic blood pressure (mmHg); Diastolic blood pressure (mmHg); Heart rate (bpm); Arterial hypertension (yes/no); Atrial fibrillation (yes/no) | Uric acid (mg/dL); ALT (U/L); AST (U/L); GGT (U/L); Serum amylase (U/L); Creatinine (mg/dL); eGFR (mL/min/1.73 m²); Urea (mg/dL); Alkaline phosphatase (U/L); Total cholesterol (mg/dL); HDL cholesterol (mg/dL); LDL cholesterol (mg/dL); Triglycerides (mg/dL); Hemoglobin (g/dL); Platelet count (number/µL); Glycated hemoglobin (%); Maximum glycemia (mg/dL) | Type of DM (type 1/2); Duration of DM (years); Duration of insulin treatment (years); DM medication: |
Parameter | Unit | All (n = 1969) | T1DM (n = 255) | T2DM (n = 1714) | p Value * |
---|---|---|---|---|---|
Anthropometric data | |||||
Age | years | 59 ± 13 | 37 ± 13 | 62 ± 10 | <0.01 |
Current weight | kg | 81 ± 20 | 68 ± 14 | 83 ± 20 | <0.01 |
Maximum weight | kg | 91 ± 21 | 78 ± 18 | 93 ± 21 | <0.01 |
Current BMI | kg/m² | 29 ± 7 | 24 ± 4 | 30 ± 7 | <0.01 |
Maximum BMI | kg/m² | 33 ± 7 | 27 ± 5 | 34 ± 6 | <0.01 |
Abdominal circumference | cm | 102 ± 16 | 85 ± 12 | 105 ± 15 | <0.01 |
Hip circumference | cm | 105 ± 13 | 96 ± 9 | 106 ± 12 | <0.01 |
DM characteristics | |||||
Duration of DM | years | 10.0 ± 8.5 | 13 ± 11 | 10 ± 8 | <0.01 |
Duration of insulin treatment | years | 4.4 ± 7.1 | 13 ± 11 | 4 ± 5 | <0.01 |
HbA1c | % | 10.1 ± 2.6 | 10.1 ± 3 | 10.0 ± 2.5 | <0.01 |
Maximum glycemia | mg/dL | 309 ± 98 | 341 ± 98 | 305 ± 98 | <0.01 |
SBP | mmHg | 135 ± 19 | 124 ± 18 | 137 ± 19 | <0.01 |
DBP | mmHg | 80 ± 12 | 77 ± 11 | 80 ± 12 | <0.01 |
Heart rate | bpm | 83 ± 14 | 87 ± 14 | 83 ± 13 | ns |
Laboratory data | |||||
Hemoglobin | g/dL | 13.5 ± 1.9 | 13.6 ± 2.0 | 13.5 ± 1.8 | <0.01 |
Platelet count | number/µL | 221,000 ± 87,000 | 234,800 ± 78,000 | 219,200 ± 88,000 | <0.01 |
Creatinine | mg/dL | 1.0 ± 2.0 | 1.0 ± 1.8 | 1.0 ± 2.0 | <0.01 |
Urea | mg/dL | 44 ± 25 | 35 ± 19 | 45 ± 26 | <0.01 |
eGFR | mL/min/1.73 m² | 82 ± 25 | 100 ± 24 | 79 ± 25 | <0.01 |
Uric acid | mg/dL | 5.1 ± 1.8 | 4.3 ± 1.7 | 7.8 ± 1.8 | <0.01 |
Alkaline phosphatase | U/L | 80 ± 44 | 86 ± 53 | 79 ± 42 | <0.01 |
GGT | U/L | 80 ± 181 | 56 ± 165 | 84 ± 183 | <0.01 |
AST | U/L | 31 ± 27 | 27 ± 27 | 31 ± 27 | <0.01 |
ALT | U/L | 36 ± 33 | 28 ± 27 | 37 ± 34 | <0.01 |
Total cholesterol | mg/dL | 190 ± 61 | 190 ± 56 | 190 ± 62 | ns |
HDL cholesterol | mg/dL | 41 ± 14 | 50 ± 18 | 40 ± 13 | <0.01 |
LDL cholesterol | mg/dL | 111 ± 45 | 113 ± 42 | 111 ± 46 | <0.01 |
Triglycerides | mg/dL | 201 ± 280 | 144 ± 140 | 210 ± 300 | <0.01 |
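The "All" column above can be reconstructed approximately from the two subgroup columns. The overall mean is the n-weighted average of the T1DM and T2DM means. The overall SD combines the within-group spread with the gap between the two subgroup means. The short Python sketch below shows this for the age row. It is an illustrative check and not part of the authors' analysis, and the helper name `combine_groups` is hypothetical. Because the tabulated values are rounded, the reconstructed figures agree with the table only to within rounding.

```python
import math


def combine_groups(n1, mean1, sd1, n2, mean2, sd2):
    """Hypothetical helper (not from the paper): merge two subgroup summaries
    (n, mean, sample SD) into one overall summary.

    Total sum of squares = within-group part + between-group part, where the
    between-group part is n1 * n2 / (n1 + n2) * (mean1 - mean2) ** 2.
    """
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n                 # n-weighted mean
    within = (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2   # spread inside each group
    between = n1 * n2 / n * (mean1 - mean2) ** 2         # spread between group means
    sd = math.sqrt((within + between) / (n - 1))         # overall sample SD
    return n, mean, sd


# Age row: T1DM 37 ± 13 years (n = 255), T2DM 62 ± 10 years (n = 1714).
n, mean, sd = combine_groups(255, 37, 13, 1714, 62, 10)
print(f"n = {n}, age = {mean:.0f} ± {sd:.0f} years")
# prints: n = 1969, age = 59 ± 13 years
```

The between-group term explains why the overall age SD is larger than the T2DM SD alone: the mean ages of the two subgroups differ by 25 years.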
Parameter | Unit | All (n = 1969) | Alive (n = 1560) | Deceased (n = 409) | p Value * |
---|---|---|---|---|---|
Anthropometric data | |||||
Age | years | 59 ± 14 | 57 ± 14 | 67 ± 11 | <0.001 |
Current weight | kg | 81 ± 20 | 81 ± 19 | 79 ± 21 | ns |
Maximum weight | kg | 91 ± 21 | 91 ± 21 | 90 ± 22 | ns |
Current BMI | kg/m² | 29 ± 7 | 29 ± 7 | 29 ± 7 | ns |
Maximum BMI | kg/m² | 33 ± 7 | 33 ± 7 | 33 ± 7 | ns |
Abdominal circumference | cm | 102 ± 16 | 102 ± 16 | 104 ± 17 | ns |
Hip circumference | cm | 105 ± 13 | 105 ± 12 | 105 ± 14 | ns |
DM characteristics | |||||
Duration of DM | years | 10.0 ± 8.5 | 9.5 ± 8.3 | 11.4 ± 9.1 | <0.001 |
Duration of insulin treatment | years | 4.4 ± 7.1 | 4.3 ± 7.1 | 4.3 ± 7.4 | ns |
HbA1c | % | 10.1 ± 2.6 | 10.0 ± 2.5 | 10.2 ± 2.9 | ns |
Maximum glycemia | mg/dL | 309 ± 98 | 302 ± 95 | 333 ± 105 | <0.001 |
SBP | mmHg | 135 ± 19 | 135 ± 19 | 136 ± 21 | ns |
DBP | mmHg | 80 ± 12 | 80 ± 11 | 78 ± 13 | <0.001 |
Heart rate | bpm | 83 ± 14 | 84 ± 13 | 83 ± 15 | ns |
Laboratory data | |||||
Hemoglobin | g/dL | 13.5 ± 1.9 | 13.7 ± 1.7 | 12.7 ± 2.2 | <0.001 |
Platelet count | number/µL | 221,000 ± 87,000 | 222,000 ± 75,000 | 218,000 ± 121,000 | ns |
Creatinine | mg/dL | 1.0 ± 2.0 | 0.9 ± 0.4 | 1.4 ± 4.2 | <0.001 |
Urea | mg/dL | 44 ± 25 | 40 ± 21 | 56 ± 35 | <0.001 |
eGFR | mL/min/1.73 m² | 82 ± 25 | 85.5 ± 24.2 | 68.4 ± 27.6 | <0.001 |
Uric acid | mg/dL | 5.1 ± 1.8 | 5.0 ± 1.9 | 5.5 ± 2.0 | <0.001 |
Alkaline phosphatase | U/L | 80 ± 44 | 78 ± 39 | 87 ± 57 | <0.001 |
GGT | U/L | 80 ± 181 | 72 ± 150 | 111 ± 268 | <0.001 |
AST | U/L | 31 ± 27 | 29 ± 23 | 36 ± 38 | <0.001 |
ALT | U/L | 36 ± 33 | 36 ± 30 | 38 ± 44 | ns |
Total cholesterol | mg/dL | 190 ± 61 | 192 ± 60 | 180 ± 66 | <0.001 |
HDL cholesterol | mg/dL | 41 ± 14 | 41 ± 13 | 38 ± 15 | <0.001 |
LDL cholesterol | mg/dL | 111 ± 45 | 113 ± 45 | 103 ± 44 | <0.001 |
Triglycerides | mg/dL | 201 ± 280 | 202 ± 290 | 199 ± 273 | ns |
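The test behind the p values (marked *) is defined in a footnote that is not reproduced here. For illustration only, assume an unequal-variance (Welch) two-sample t-test. Under that assumption, a p value can be approximated from the mean, SD and n given in each row. The sketch below does this in Python using only the standard library. With several hundred degrees of freedom the t distribution is close to normal, so the two-sided p value is taken from the normal distribution via `math.erfc`. The helper name `welch_from_summary` is hypothetical. The authors may have used a different test, so exact agreement with the reported p values should not be expected.

```python
import math


def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Hypothetical helper (not from the paper): Welch's t statistic, the
    Welch-Satterthwaite degrees of freedom, and a two-sided p value from a
    normal approximation (assumed adequate because df is large here)."""
    v1 = sd1 ** 2 / n1  # squared standard error of the group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of the group 2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p_approx = math.erfc(abs(t) / math.sqrt(2))  # = 2 * (1 - Phi(|t|))
    return t, df, p_approx


# Alive (n = 1560) vs deceased (n = 409); summaries copied from the table above.
rows = {
    "Age": (57, 14, 1560, 67, 11, 409),
    "HbA1c": (10.0, 2.5, 1560, 10.2, 2.9, 409),
}
for name, stats in rows.items():
    t, df, p = welch_from_summary(*stats)
    print(f"{name}: t = {t:.2f}, df ≈ {df:.0f}, p ≈ {p:.3g}")
```

With these summaries, the age comparison gives |t| above 15, so p is far below 0.001. The HbA1c comparison gives p of roughly 0.2. Both results agree with the "<0.001" and "ns" entries for those rows.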
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Mirea, O.; Oghli, M.G.; Neagoe, O.; Berceanu, M.; Țieranu, E.; Moraru, L.; Raicea, V.; Donoiu, I. All-Cause Mortality Prediction in Subjects with Diabetes Mellitus Using a Machine Learning Model and Shapley Values. Diabetology 2025, 6, 5. https://doi.org/10.3390/diabetology6010005
Mirea O, Oghli MG, Neagoe O, Berceanu M, Țieranu E, Moraru L, Raicea V, Donoiu I. All-Cause Mortality Prediction in Subjects with Diabetes Mellitus Using a Machine Learning Model and Shapley Values. Diabetology. 2025; 6(1):5. https://doi.org/10.3390/diabetology6010005
Chicago/Turabian Style: Mirea, Oana, Mostafa Ghelich Oghli, Oana Neagoe, Mihaela Berceanu, Eugen Țieranu, Liviu Moraru, Victor Raicea, and Ionuț Donoiu. 2025. "All-Cause Mortality Prediction in Subjects with Diabetes Mellitus Using a Machine Learning Model and Shapley Values" Diabetology 6, no. 1: 5. https://doi.org/10.3390/diabetology6010005
APA Style: Mirea, O., Oghli, M. G., Neagoe, O., Berceanu, M., Țieranu, E., Moraru, L., Raicea, V., & Donoiu, I. (2025). All-Cause Mortality Prediction in Subjects with Diabetes Mellitus Using a Machine Learning Model and Shapley Values. Diabetology, 6(1), 5. https://doi.org/10.3390/diabetology6010005