Extracting New Temporal Features to Improve the Interpretability of Undiagnosed Type 2 Diabetes Mellitus Prediction Models
Abstract
1. Introduction
2. Materials and Methods
2.1. Data
2.2. Experimental Setup
3. Results
3.1. Feature Extraction Using LogicRegression Approach
3.2. Performance Evaluation
4. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Name | Description | Value |
---|---|---|
Age | Age of the patient | Numeric |
Gender | Gender of the patient | Male, Female |
BMI | Body Mass Index of the patient | Numeric |
Blood_pressure | Blood pressure of the patient | Numeric |
WC | Waist circumference of the patient | Numeric |
Heart_beat | Heart beat of the patient | Numeric |
Body_weight | Body weight of the patient | Numeric |
Body_height | Body height of the patient | Numeric |
Smoking_status | Smoking status of the patient | Non-smoker, Smoker, Ex-smoker, Passive smoker |
Eating_habits | Assessment of eating habits | Adequate, Satisfactory, Inadequate |
Drinking_status | Drinking status | Abstinent, Less risky drinking, Risky, Harmful, Addictive |
SDH | Social determinants of health | Not threatened, Medium threatened, Threatened |
PAS | Physical activity status | Sufficient, Borderline, Insufficient |
Stress | Level of stress | Not threatened, Threatened |
RD | Risk of depression | No significant risk of depression, Risk of depression |
Q18 | How often do you usually eat vegetables? | Never, 4-6 times a week, 1x a day, More than 1x a day |
Q16 | How many meals do you eat on average per day? | 2 or less, 3 to 5, 6 or more |
Q2 | Are you physically active for at least 30 min/day? | Yes, No |
Q3 | Do you take medication to lower your blood pressure? | Yes, No |
Q30 | Do you have a habit of salting dishes at the table? | Yes, No |
Q32 | On average, which type of fat do you use most in food preparation or as a spread? | Vegetable oils, Cream, Butter, Lard, Hard margarines, Soft margarines, High-fat spreads, Low-fat spreads, Chocolate spread, Peanut butter, Pate, Cream Spread, Mayonnaise |
Q4 | Have you ever had your blood sugar measured? | Yes, No |
Q43 | How many times in a typical week do you engage in vigorous physical activity for at least 25 minutes each time to the point where you are breathing and sweating? | 0 or 1 times per week, 2 times per week, 3 or more times per week |
Q44 | How many times in a typical week do you engage in moderate physical activity for at least 30 minutes each time, to the extent that you breathe a little faster and warm up? | 0 or 1 times per week, 2 to 4 times per week, 5 or more times per week |
Q47 | How often have you drunk drinks containing alcohol in the last 12 months? | Never, Once a month or less, 2 to 4 times a month, 2 to 3 times a week, 4 or more times a week |
Q48 | In the last 12 months, how many measures of a drink containing alcohol did you usually have when you were drinking? | Zero to 1 measure, 2 measures, 3 or 4 measures, 5 or 6 measures, 7 or more |
Q49 | In the last 12 months, how often have you had 6 or more measures on one occasion (men) or 4 or more measures on one occasion (women)? | Never, Less than once a month, 1 to 3 times a month, 1 to 3 times a week, Daily or almost daily |
Q51 | In the last 12 months, how often have you needed an alcoholic drink in the morning to recover from excessive drinking the day before? | Never, Less than once a month, 1 to 3 times a month, 1 to 3 times a week, Daily or almost daily |
Q57 | How often do you feel tense, stressed or under a lot of pressure? | Never, Rarely, Occasionally, Often, Every day |
Q58 | How do you manage the tensions, stresses and pressures you experience in your life? | Easily, Able to, Able to with more effort, With great difficulty, Cannot |
Q59 | How often in the past 2 weeks have you felt little interest and satisfaction in the things you do? | Not at all, A few days, More than half the days, Almost every day |
Q6 | Does anyone in your family have diabetes? | No, Outer family, Inner family |
Q60 | How often have you felt down, depressed, or hopeless in the past 2 weeks? | Not at all, A few days, More than half the days, Almost every day |
Q69 | Please indicate the last school you attended. | Primary school incomplete, Primary school, 2 or 3-year vocational school, 4-year secondary school or gymnasium, Graduate, Postgraduate |
Q70 | What is your current employment status? | Employed, Self-employed, Unemployed, Student, Retired, Disabled pensioner, Permanently disabled, Housewife |
Q71 | How do you get through the month based on your income? | Good, Occasional problems, I have problems |
ATC_A02BC01 | Omeprazole | Binary (0,1) |
ATC_A02BC02 | Pantoprazole | Binary (0,1) |
ATC_A11CC05 | Colecalciferol | Binary (0,1) |
ATC_B01AC06 | Acetylsalicylic acid | Binary (0,1) |
ATC_C03BA11 | Indapamide | Binary (0,1) |
ATC_C07AB07 | Bisoprolol | Binary (0,1) |
ATC_C09AA04 | Perindopril | Binary (0,1) |
ATC_D01AC01 | Clotrimazole | Binary (0,1) |
ATC_D01AE15 | Terbinafine | Binary (0,1) |
ATC_D07AC13 | Mometasone | Binary (0,1) |
ATC_G04BD09 | Trospium | Binary (0,1) |
ATC_J01CA04 | Amoxicillin | Binary (0,1) |
ATC_J01CE10 | Benzathine phenoxymethylpenicillin | Binary (0,1) |
ATC_J01CR02 | Amoxicillin and beta-lactamase inhibitor | Binary (0,1) |
ATC_J01EE01 | Sulfadiazine and trimethoprim | Binary (0,1) |
ATC_J01FA10 | Azithromycin | Binary (0,1) |
ATC_M01AB05 | Diclofenac | Binary (0,1) |
ATC_M01AE01 | Ibuprofen | Binary (0,1) |
ATC_M01AE02 | Naproxen | Binary (0,1) |
ATC_N02AJ13 | Tramadol and paracetamol | Binary (0,1) |
ATC_N02BB02 | Metamizole sodium | Binary (0,1) |
ATC_N02BE01 | Paracetamol | Binary (0,1) |
ATC_N05BA08 | Bromazepam | Binary (0,1) |
ATC_N05BA12 | Alprazolam | Binary (0,1) |
ATC_N05CF02 | Zolpidem | Binary (0,1) |
ATC_R01AD09 | Mometasone | Binary (0,1) |
ATC_R03AC02 | Salbutamol | Binary (0,1) |
ATC_R03AL01 | Fenoterol and ipratropium bromide | Binary (0,1) |
ATC_R06AE07 | Cetirizine | Binary (0,1) |
ATC_R06AX13 | Loratadine | Binary (0,1) |
ATC_S01AA12 | Tobramycin | Binary (0,1) |

Original Feature Name | Description | FPGL ≤ 6.1 mmol/L (75.29%, n = 2349) | FPGL > 6.1 mmol/L (24.71%, n = 771) |
---|---|---|---|
Age [mean (standard deviation, SD)] | Age in years | 56.07 (SD = 13.2) | 61.77 (SD = 10.98) |
Gender_M [%(n)] | Percentage of males | 37.16 (n = 873) | 54.47 (n = 420) |
BMI [mean (SD)] | Body mass index | 28.89 (SD = 5.39) | 32.16 (SD = 13.21) |
WC [mean (SD)] | Waist circumference in cm | 96.25 (SD = 13.89) | 103.48 (SD = 13.8) |
Active_30_min (Q2) [%(n)] | Active at least 30 minutes a day? | 64.88 (n = 1524) | 52.27 (n = 403) |
Medication (Q3) [%(n)] | Blood pressure medication? | 40.19 (n = 944) | 60.18 (n = 464) |
High_BS (Q4) [%(n)] | Ever measured high blood sugar? | 7.32 (n = 172) | 47.47 (n = 366) |
Grocer (Q18) [%(n)] | Eats vegetables/fruit daily? | 90.59 (n = 2128) | 78.99 (n = 609) |
Diab_fam (Q6) [%(n)] | Diabetes in family? | 69.65 (n = 1636) | 61.74 (n = 476) |
FPGL [mean (SD)] | Fasting plasma glucose level | 5.26 (SD = 0.44) | 6.74 (SD = 0.8) |
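
The percentages in the table above follow directly from the counts and the two group sizes. As a quick consistency check, the sketch below recomputes them; the names `GROUP_SIZES`, `COUNTS`, and `group_percentages` are illustrative and not taken from the study.

```python
# Group sizes taken from the column headers of the table above.
GROUP_SIZES = {"low": 2349, "high": 771}  # FPGL <= 6.1 and FPGL > 6.1 mmol/L

# (count in low-FPGL group, count in high-FPGL group), copied from the table.
COUNTS = {
    "Gender_M": (873, 420),
    "Active_30_min": (1524, 403),
    "Medication": (944, 464),
    "High_BS": (172, 366),
    "Grocer": (2128, 609),
    "Diab_fam": (1636, 476),
}


def percent(n, total):
    """Return n as a percentage of total, rounded to two decimals."""
    return round(100.0 * n / total, 2)


def group_percentages(counts=COUNTS, sizes=GROUP_SIZES):
    """Recompute the %(n) entries for both FPGL groups."""
    return {
        name: (percent(low, sizes["low"]), percent(high, sizes["high"]))
        for name, (low, high) in counts.items()
    }


if __name__ == "__main__":
    for name, (low_pct, high_pct) in group_percentages().items():
        print(f"{name}: {low_pct} vs {high_pct}")
```

The same helper reproduces the group shares in the header, since the two groups together contain 2349 + 771 = 3120 patients.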

Feature | Rule | Description |
---|---|---|
L1 | (ATC_J01EE01 or (not Q41)) | Prescribed sulfadiazine and trimethoprim, or never measured high blood sugar. |
L2 | Q51 | Seldom eats fruit and vegetables. |
L3 | ((ATC_M01AE02 and ATC_J01CE10) or (not SE)) | Prescribed both naproxen and benzathine phenoxymethylpenicillin, or not socially endangered. |
L4 | Q494 | Daily consumption of alcohol in the last 12 months. |
L5 | (MSE or ATC_D01AE15) | Medium socially endangered or prescribed antifungals for dermatological use. |
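
The rules in the table above can be read as Boolean expressions over binary indicators. The following is a minimal sketch of that reading, assuming every name in the Rule column (for example `Q41`, `SE`, `MSE`) is available as a 0/1 value for each patient; the function `logic_features` and the dictionary input are hypothetical and are not the study's own code.

```python
def logic_features(record):
    """Evaluate the five logic rules for one patient.

    `record` maps indicator names to 0/1 values (assumption: all
    indicators used in the rules are already binary-encoded).
    """
    b = {name: bool(value) for name, value in record.items()}
    rules = {
        "L1": b["ATC_J01EE01"] or (not b["Q41"]),
        "L2": b["Q51"],
        "L3": (b["ATC_M01AE02"] and b["ATC_J01CE10"]) or (not b["SE"]),
        "L4": b["Q494"],
        "L5": b["MSE"] or b["ATC_D01AE15"],
    }
    # Return 0/1 so the new features match the encoding of the inputs.
    return {name: int(value) for name, value in rules.items()}


if __name__ == "__main__":
    # Hypothetical patient with every indicator set to 0.
    patient = dict.fromkeys(
        ["ATC_J01EE01", "Q41", "Q51", "ATC_M01AE02", "ATC_J01CE10",
         "SE", "Q494", "MSE", "ATC_D01AE15"],
        0,
    )
    print(logic_features(patient))
```

For a patient with every indicator set to 0, L1 and L3 evaluate to 1 because each contains a negated term, while L2, L4, and L5 evaluate to 0.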

Feature | Freq | Description |
---|---|---|
−Gender | 100 | Gender |
+Blood_pressure | 100 | Blood pressure |
+Heart_beat | 100 | Heart beat |
+Age | 100 | Age |
+BMI | 100 | Body mass index |
+WC | 100 | Waist circumference |
−L1 | 100 | Logic feature 1 |
−L2 | 100 | Logic feature 2 |
−L3 | 100 | Logic feature 3 |
−Body_height | 99 | Body height |
+Body_weight | 83 | Body weight |

Feature | Freq | Description |
---|---|---|
−L3 | 100 | Logic feature 3 |
−L4 | 100 | Logic feature 4 |
+L5 | 100 | Logic feature 5 |
+Blood_pressure | 100 | Blood pressure |
+WC | 100 | Waist circumference in cm |
+Heart_beat | 100 | Heart beat |
+Age | 100 | Age in years |
+Q45 | 100 | Ever measured high blood sugar? Yes |
−Gender | 100 | Gender |
+Q32 | 93 | Using drug(s) for lowering blood pressure |
+Body_weight | 87 | Body weight |
−Non_smoker | 87 | Non-smoker |
+L2 | 79 | Logic feature 2 |
−Q321 | 78 | Most often used oil is vegetable oil |
−Non_drinker | 77 | No alcohol consumption |
−Q583 | 75 | Handle stress with hardship |
+Q62 | 74 | Parent, brother, or sister have diabetes |
+BMI | 69 | Body Mass Index |
−Q161 | 63 | 2 meals per day on average |
−Q301 | 51 | No habit of using salt at the table |
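
The two Feature/Freq tables above pair each feature with a sign and a frequency. The sketch below shows one way such a summary could be produced, under the assumption (not confirmed by the surviving text) that Freq is the percentage of repeated model fits in which a feature received a nonzero coefficient and that the prefix is the prevailing sign of that coefficient. The function `selection_summary` and its input format are hypothetical.

```python
from collections import Counter


def selection_summary(coef_runs, tol=0.0):
    """Summarise how often each feature is selected across repeated fits.

    `coef_runs` is a list of dicts, one per fit, mapping feature name to
    its fitted coefficient. A feature counts as selected in a fit when
    the absolute coefficient exceeds `tol`.
    """
    n_runs = len(coef_runs)
    selected = Counter()   # number of fits in which the feature is nonzero
    sign_sum = Counter()   # +1 / -1 votes for the coefficient direction
    for coefs in coef_runs:
        for name, value in coefs.items():
            if abs(value) > tol:
                selected[name] += 1
                sign_sum[name] += 1 if value > 0 else -1
    rows = []
    for name, count in selected.items():
        sign = "+" if sign_sum[name] >= 0 else "-"
        rows.append((sign + name, round(100 * count / n_runs)))
    # Most frequently selected features first, ties broken alphabetically.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


if __name__ == "__main__":
    # Two made-up fits, for illustration only.
    runs = [
        {"Age": 0.5, "Gender": -0.2, "BMI": 0.0},
        {"Age": 0.3, "Gender": -0.1, "BMI": 0.4},
    ]
    for feature, freq in selection_summary(runs):
        print(feature, freq)
```

With coefficients collected from 100 fits, the second element of each row would correspond to a value on the 0 to 100 scale used in the Freq column.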
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kocbek, S.; Kocbek, P.; Gosak, L.; Fijačko, N.; Štiglic, G. Extracting New Temporal Features to Improve the Interpretability of Undiagnosed Type 2 Diabetes Mellitus Prediction Models. J. Pers. Med. 2022, 12, 368. https://doi.org/10.3390/jpm12030368