Predictive Attributes for Developing Long COVID—A Study Using Machine Learning and Real-World Data from Primary Care Physicians in Germany
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Set
2.2. Study Population
2.3. Feature Preparation
2.4. Training
2.5. Feature Importance
3. Results
3.1. Model Performance
3.2. Feature Importance
3.2.1. SARS-CoV-2 Variants
3.2.2. Sociodemographic and Practice Effects, and General Diagnosis and Medication Counts
3.2.3. ICD-10 Classes
3.2.4. ATC Classes
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kessler, R.; Philipp, J.; Wilfer, J.; Kostev, K. Predictive Attributes for Developing Long COVID—A Study Using Machine Learning and Real-World Data from Primary Care Physicians in Germany. J. Clin. Med. 2023, 12, 3511. https://doi.org/10.3390/jcm12103511