Artificial Intelligence to Improve Antibiotic Prescribing: A Systematic Review
Abstract
1. Introduction
2. Materials and Methods
2.1. Eligibility Criteria
- Participants and setting: participants were human patients, without restrictions on their characteristics (e.g., gender, age, weight, morbidities). No restrictions were applied for setting (i.e., primary, secondary or tertiary care), timing of publication or language of the included studies. The “no restriction” approach to participants, setting and language was applied to ensure that all relevant studies were retrieved;
- Study design: the studies included in this review were cohort studies or any observational study that examined the potential or actual use of AI, machine learning or data analytics to improve antibiotic prescribing or consumption in human patients;
- Outcome measures:
  - The relative reduction of antibiotic prescriptions (primary outcome);
  - The prediction of inappropriate antibiotic prescriptions;
  - The relative reduction in re-consultations of patients, irrespective of the reason for re-consultation (infection recurrence or worsening of the patient’s condition).
2.2. Data Sources and Search Strategy
- Data sources: the search strategy was applied to Scopus, OVID, ScienceDirect, EMBASE, Web of Science, IEEE Xplore and the Association for Computing Machinery (ACM);
- Search strategy: the search strategy, which can be accessed in [41], was based on three different concepts: (1) artificial intelligence, (2) prescriptions, and (3) population, and was customised to each database, depending on the special filters of these databases. The search for studies stopped in April 2022, and the snowball search (i.e., screening the references of the included studies for additional eligible studies) was conducted in August 2022.
2.3. Study Records
2.3.1. Study Selection
2.3.2. Data Extraction and Management
- Study information, for example, publication type, country, name of publication outlet, authors, year of publication, title, aim of study and funding source;
- Population, for example, total number of participants, cohorts, sample size calculation, methods of recruitment, age group, gender, race/ethnicity, illness, comorbidities and inclusion/exclusion criteria of patients;
- Methods and setting, for example, design of the study, data source, setting, start and end dates, machine learning methods used, training and test sets, predictors, data overfitting, evaluation and validation of performance and handling of missing data;
- Outcomes, such as outcome(s) name(s) and definition and, if applicable, unit of measurement and scales;
- Results, limitations and key conclusions.
2.3.3. Assessment of Risk of Bias
2.3.4. Data Synthesis
3. Results
3.1. Included Studies
3.1.1. Study Information and Population
3.1.2. Methods and Settings
- Study design and setting: four studies [43,44,46,47] were retrospective, and one [45] was prospective (see Supplementary Materials Table S1). The setting was secondary care in four studies [43,44,45,47] and primary care in one study [46]. The data sources in four studies [43,44,45,47] were the electronic health records (EHRs) of the hospitals, while in the study by Yelin et al., the data on community- and retirement-home-acquired urinary tract infections (UTIs) were obtained from Maccabi Healthcare Services (MHS), the second largest healthcare maintenance organisation in Israel [46]. All five studies [43,44,45,46,47] used supervised machine learning algorithms [28];
- Machine learning models in included studies:
  - Predicted (outcome) variables: the primary outcome (relative reduction in antibiotic prescriptions) was reported in one study [43]; however, it was reported as a “proportion of recommendations for second-line antibiotics” (i.e., a reduction in the use of second-line antibiotics) (see Table 1 and Supplementary Materials Table S1). The second outcome (prediction of inappropriate antibiotic prescriptions) was reported in all five studies [43,44,45,46,47]; however, it was defined differently in each of them (Supplementary Materials Table S1). In the study by Kanjilal et al., it was reported as a “proportion of recommendations for inappropriate antibiotic therapy” [43], while the study by Lee et al. reported a prediction of “antibiotic prescription errors” [44]. In the study by Beaudoin et al., it was reported as a prediction of “inappropriate prescriptions of piperacillin–tazobactam” [45]. The outcome in the study by Yelin et al. was a prediction of “mismatched treatments” (i.e., when the sample is resistant to the prescribed antibiotic) [46], and in the study by Oonsivilai et al., the outcome was a prediction of “susceptibility to antibiotics” [47]. None of the five studies [43,44,45,46,47] reported the third outcome (i.e., the relative reduction in re-consultations of patients);
  - Predictors: predictors in the included studies were categorised as either “bedside” or “non-bedside” to distinguish between the data a clinician had access to at diagnosis and prescription of the (empiric) treatment and the data that only became available after laboratory or other investigations [49]. Furthermore, predictors were divided into 10 groups: labs, antibiotics, demographics, geographical, temporal, socioeconomic conditions, gender-related, comorbidities, vital signs and medical history of patients (see Supplementary Materials Table S3). All groups of predictors used in the five studies [43,44,45,46,47] belonged to the “bedside” category, except for the labs group, which belonged to the “non-bedside” category. Since laboratory results inform treatment choice, their inclusion in the development of the ML models yields predictions that are more accurate than those of clinicians, who did not have this information at the time of prescribing;
  - Training and test datasets: a train/test split is used to validate the performance of an ML model on data it was not trained on [50]. Three studies [43,46,47] reported a train/test split, while two studies [44,45] did not report this information. In the study by Kanjilal et al., the split consisted of a training dataset of 10,053 patients (11,865 specimens) and a test dataset of 3629 patients (3941 specimens) [43], while in the study by Yelin et al., all data collected from 1 July 2007 to 30 June 2016 were treated as the training set, and data collected from 1 July 2016 to 30 June 2017 as the test set [46]. In the remaining study, Oonsivilai et al. used an 80%/20% train/test split [47] (both splitting approaches are sketched in code after this list);
  - Machine learning models: all models [43,44,45,46,47] are supervised machine learning models [28]. Three studies [43,46,47] used logistic regression and decision trees, while random forest models were used in two studies [43,47]. Gradient-boosted decision trees (GBDTs) were used in [46,47], in addition to linear, radial and polynomial support vector machines (SVMs) and K-nearest neighbours (KNNs) in [47]. Two studies [44,45] used different models: an advanced rule-based deep neural network (ARDNN) and a supervised learning module (i.e., temporal induction of classification models (TIM)) (see the model-comparison sketch after this list);
  - Method to avoid data overfitting: data overfitting occurs when an ML model perfectly fits the training data but fails to generalise to test data [51], for which regularisation can be a solution [52]. Regularisation adds a penalty on the algorithm’s smoothness or complexity to avoid overfitting and improve the generalisation of the algorithm [53]. In the studies by Kanjilal et al. [43] and Oonsivilai et al. [47], regularisation was used to avoid data overfitting. In the study by Beaudoin et al. [45], the “J-measure”, defined as a measure of the goodness-of-fit of a rule [54], was used to reduce overfitting (a formulation of the J-measure is given after this list), and in the study by Yelin et al., the model’s performance on the test set was contrasted with that on the training set to identify data overfitting [46]. The study by Lee et al. did not report on data overfitting [44];
  - Handling missing data: two studies [43,45] did not report how missing data were handled. In the study by Lee et al., 2.45% of missing height and weight data were predicted, other empty data records were deleted, and data outliers were treated as missing values [44]. In the study by Yelin et al., missing antibiotic-resistance data were treated as “not available” and the corresponding samples were dropped from the models [46]. In the study by Oonsivilai et al., missing data for binary variables were considered “negative”; however, no further explanation for these negative values was provided [47] (these three strategies are illustrated in the missing-data sketch after this list);
  - Evaluation of models’ performance: the predictive ability of an algorithm is assessed by the accuracy of the predicted outcomes in the test dataset [55]. Different measures are used to evaluate predictive performance, for example, the area under the receiver operating characteristic curve (AUROC) [56], accuracy, sensitivity (i.e., recall), specificity, precision and F1 score [57] (computed in the metrics sketch after this list);
  - Findings of included studies: all five studies [43,44,45,46,47] used ML models to predict their outcomes. They included both bedside and non-bedside predictors extracted from EHRs or from the information systems of healthcare organisations, so the ML models drew on complete patient information for their development. Regarding the primary outcome, the study by Kanjilal et al. reported that their algorithm was able to make a recommendation for an antibiotic in 99% of the specimens and chose ciprofloxacin or levofloxacin for 11% of the specimens, relative to 34% in the case of clinicians (a 67% reduction) [43].
3.1.3. Assessing the Risk of Bias
4. Discussion
4.1. Summary of Main Findings
4.2. Using ML Models to Improve Antibiotic Prescribing
- Problem selection and definition;
- Data collection/curating datasets;
- ML development;
- Evaluation of ML models;
- Assessment of impacts;
- Deployment and monitoring [59].
More on ML Models and Generalisation
- Reproducing bias in ML models: bias can lead to a lack of generalisation in an ML model [66]. If an ML model is trained on a dataset that was generated by a biased process, the output of the model may also be biased (i.e., bias reproduction), which is a real challenge when using healthcare data sources such as EHRs [67]. This type of bias is called “algorithmic bias” [66]. Other sources of bias in an ML model may be “data-driven” (for example, bias due to ethnicity or socioeconomic status) or “human”, in which the persons who develop the ML system reflect their personal biases [66]. Two of the included studies [43,46] reported biases in their data. In the study by Kanjilal et al. [43], the authors reported data bias because most of their data were for Caucasian patients; to minimise biased predictions, they used a national criterion adopted for uncomplicated urinary tract infections. The second bias they reported was the prediction of non-susceptibility more often in environments where ABR prevalence is higher than in the training data; the authors therefore assessed temporality by using longitudinal data and addressed confounding by indication [43]. In the study by Yelin et al. [46], the authors reported several aspects of their data that could introduce bias into their results: the later use of a purchased antibiotic (biasing towards a higher odds ratio for purchases before infection), antibiotics obtained outside the information system from which the data were extracted (biasing towards a lower odds ratio for drug purchases), UTIs treated empirically without culture (biasing towards measuring more resistant samples), elective culture testing after treatment failure (biasing towards measuring more resistant samples, in addition to a strong association between drug purchases and resistance) and the dependence of elective cultures on demographics (producing associations between demographics and resistance). However, the authors did not report any means to avoid these biases; they reported that their models were still able to predict resistance well and that their recommendation algorithms reduced the chance of prescribing an antibiotic for a resistant infection [46]. This implies that there was still a chance of reproducing these biases in the predictions. The three remaining studies [44,45,47] did not report on bias, leaving open the possibility of bias reproduction in their predictions;
- Robustness of ML models: the robustness of an ML model is how well the trained model withstands adversarial examples [68]. It considers the sensitivity of the model (i.e., how sensitive the model’s output is to a change in its input) [69]. In addition, robustness has been used to derive generalisation in supervised learning ML models [70]. All five studies [43,44,45,46,47] used different performance measures, which help in understanding the sensitivity of their models and, thus, how robust they are. They reported measures such as recall (i.e., sensitivity), F1 score and AUROC, which contribute to indicating how sensitive their models were to changes in inputs. In addition, in the study by Kanjilal et al. [43], the authors reported carrying out a sensitivity analysis with several false negative rates for each antibiotic to translate the output of their models into susceptibility phenotypes, and in the study by Beaudoin et al. [45], the authors reported selecting a distance threshold (below which a prescription for piperacillin–tazobactam was classified as inappropriate) based on previous experimentation (i.e., sensitivity analysis). A minimal threshold-sensitivity sketch follows.
4.3. ML Models Versus AB Prescribing Clinicians
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Van Boeckel, T.P.; Gandra, S.; Ashok, A.; Caudron, Q.; Grenfell, B.T.; Levin, S.A.; Laxminarayan, R. Global antibiotic consumption 2000 to 2010: An analysis of national pharmaceutical sales data. Lancet Infect. Dis. 2014, 14, 742–750.
2. Founou, R.C.; Founou, L.L.; Essack, S.Y. Clinical and economic impact of antibiotic resistance in developing countries: A systematic review and meta-analysis. PLoS ONE 2017, 12, e0189621.
3. Dadgostar, P. Antimicrobial Resistance: Implications and Costs. Infect. Drug Resist. 2019, 12, 3903–3910.
4. Zaman, S.B.; Hussain, M.A.; Nye, R.; Mehta, V.; Mamun, K.T.; Hossain, N. A Review on Antibiotic Resistance: Alarm Bells are Ringing. Cureus 2017, 9, e1403.
5. Bell, B.G.; Schellevis, F.; Stobberingh, E.; Goossens, H.; Pringle, M. A systematic review and meta-analysis of the effects of antibiotic consumption on antibiotic resistance. BMC Infect. Dis. 2014, 14, 13.
6. Costelloe, C.; Metcalfe, C.; Lovering, A.; Mant, D.; Hay, A.D. Effect of antibiotic prescribing in primary care on antimicrobial resistance in individual patients: Systematic review and meta-analysis. BMJ 2010, 340, c2096.
7. Bakhit, M.; Hoffmann, T.; Scott, A.M.; Beller, E.; Rathbone, J.; Del Mar, C. Resistance decay in individuals after antibiotic exposure in primary care: A systematic review and meta-analysis. BMC Med. 2018, 16, 126.
8. Holmes, A.H.; Moore, L.S.; Sundsfjord, A.; Steinbakk, M.; Regmi, S.; Karkey, A.; Guerin, P.J.; Piddock, L.J.V. Understanding the mechanisms and drivers of antimicrobial resistance. Lancet 2016, 387, 176–187.
9. Goossens, H.; Ferech, M.; Vander Stichele, R.; Elseviers, M. Outpatient antibiotic use in Europe and association with resistance: A cross-national database study. Lancet 2005, 365, 579–587.
10. Prestinaci, F.; Pezzotti, P.; Pantosti, A. Antimicrobial resistance: A global multifaceted phenomenon. Pathog. Glob. Health 2015, 109, 309–318.
11. ECDC. Surveillance of Antimicrobial Resistance in Europe 2018; European Centre for Disease Prevention and Control: Stockholm, Sweden, 2019.
12. Shrestha, P.; Cooper, B.S.; Coast, J.; Oppong, R.; Do Thi Thuy, N.; Phodha, T.; Celhay, O.; Guerin, P.J.; Werheim, H.; Lubell, Y. Enumerating the economic cost of antimicrobial resistance per antibiotic consumed to inform the evaluation of interventions affecting their use. Antimicrob. Resist. Infect. Control 2018, 7, 98.
13. World Health Organization (WHO). Ten Threats to Global Health in 2019; WHO: Geneva, Switzerland, 2019. Available online: https://www.who.int/news-room/spotlight/ten-threats-to-global-health-in-2019 (accessed on 15 December 2022).
14. Antimicrobial Resistance Collaborators. Global burden of bacterial antimicrobial resistance in 2019: A systematic analysis. Lancet 2022, 399, 629–655.
15. O’Neill, J. Review on Antimicrobial Resistance. 2014. Available online: http://amr-review.org/ (accessed on 15 December 2022).
16. Dyar, O.J.; Huttner, B.; Schouten, J.; Pulcini, C. What is antimicrobial stewardship? Clin. Microbiol. Infect. 2017, 23, 793–798.
17. Fishman, N.; Society for Healthcare Epidemiology of America; Infectious Diseases Society of America; Pediatric Infectious Diseases Society. Policy statement on antimicrobial stewardship by the Society for Healthcare Epidemiology of America (SHEA), the Infectious Diseases Society of America (IDSA), and the Pediatric Infectious Diseases Society (PIDS). Infect. Control Hosp. Epidemiol. 2012, 33, 322–327.
18. Hunt, D.L.; Haynes, R.B.; Hanna, S.E.; Smith, K. Effects of computer-based clinical decision support systems on physician performance and patient outcomes: A systematic review. JAMA 1998, 280, 1339–1346.
19. Charani, E.; Castro-Sánchez, E.; Holmes, A. The Role of Behavior Change in Antimicrobial Stewardship. Infect. Dis. Clin. N. Am. 2014, 28, 169–175.
20. Nabovati, E.; Jeddi, F.R.; Farrahi, R.; Anvari, S. Information technology interventions to improve antibiotic prescribing for patients with acute respiratory infection: A systematic review. Clin. Microbiol. Infect. 2021, 27, 838–845.
21. Skodvin, B.; Aase, K.; Charani, E.; Holmes, A.; Smith, I. An antimicrobial stewardship program initiative: A qualitative study on prescribing practices among hospital doctors. Antimicrob. Resist. Infect. Control 2015, 4, 24.
22. Barlam, T.F.; Cosgrove, S.E.; Abbo, L.M.; MacDougall, C.; Schuetz, A.N.; Septimus, E.J.; Srinivasan, A.; Dellit, T.H.; Falck-Ytter, Y.T.; Fishman, N.O.; et al. Implementing an Antibiotic Stewardship Program: Guidelines by the Infectious Diseases Society of America and the Society for Healthcare Epidemiology of America. Clin. Infect. Dis. 2016, 62, e51–e77.
23. Rawson, T.M.; Moore, L.S.P.; Hernandez, B.; Charani, E.; Castro-Sanchez, E.; Herrero, P.; Hayhoe, B.; Hope, W.; Georgiou, P.; Holmes, A.H. A systematic review of clinical decision support systems for antimicrobial management: Are we failing to investigate these interventions appropriately? Clin. Microbiol. Infect. 2017, 23, 524–532.
24. Goldenberg, S.L.; Nir, G.; Salcudean, S.E. A new era: Artificial intelligence and machine learning in prostate cancer. Nat. Rev. Urol. 2019, 16, 391–403.
25. Foster, K.R.; Koprowski, R.; Skufca, J.D. Machine learning, medical diagnosis, and biomedical engineering research–Commentary. BioMed. Eng. OnLine 2014, 13, 94.
26. Bini, S.A. Artificial intelligence, machine learning, deep learning, and cognitive computing: What do these terms mean and how will they impact health care? J. Arthroplast. 2018, 33, 2358–2361.
27. Naylor, C.D. On the Prospects for a (Deep) Learning Health Care System. JAMA 2018, 320, 1099–1100.
28. Hamet, P.; Tremblay, J. Artificial intelligence in medicine. Metabolism 2017, 69, S36–S40.
29. Panch, T.; Szolovits, P.; Atun, R. Artificial intelligence, machine learning and health systems. J. Glob. Health 2018, 8, 020303.
30. Jiang, T.; Gradus, J.L.; Rosellini, A.J. Supervised Machine Learning: A Brief Primer. Behav. Ther. 2020, 51, 675–687.
31. Mahesh, B. Machine Learning Algorithms–A Review. Int. J. Sci. Res. (IJSR) 2019, 9, 381–386.
32. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
33. Lv, J.; Deng, S.; Zhang, L. A review of artificial intelligence applications for antimicrobial resistance. Biosaf. Health 2021, 3, 22–31.
34. Friedman, J.H. Stochastic gradient boosting. Comput. Stat. Data Anal. 2002, 38, 367–378.
35. Rodríguez-González, A.; Zanin, M.; Menasalvas-Ruiz, E. Public Health and Epidemiology Informatics: Can Artificial Intelligence Help Future Global Challenges? An Overview of Antimicrobial Resistance and Impact of Climate Change in Disease Epidemiology. Yearb. Med. Inform. 2019, 28, 224–231.
36. Pogorelc, B.; Bosnić, Z.; Gams, M. Automatic recognition of gait-related health problems in the elderly using machine learning. Multimed. Tools Appl. 2012, 58, 333–354.
37. Shehab, M.; Abualigah, L.; Shambour, Q.; Abu-Hashem, M.; Shambour, M.K.Y.; Alsalibi, A.I.; Gandomi, A. Machine learning in medical applications: A review of state-of-the-art methods. Comput. Biol. Med. 2022, 145, 105458.
38. Gharaibeh, M.; Alzu’bi, D.; Abdullah, M.; Hmeidi, I.; Al Nasar, M.R.; Abualigah, L.; Gandomi, A.H. Radiology imaging scans for early diagnosis of kidney tumors: A review of data analytics-based machine learning and deep learning approaches. Big Data Cogn. Comput. 2022, 6, 29.
39. Malik, P.A.; Pathania, M.; Rathaur, V.K. Overview of artificial intelligence in medicine. J. Fam. Med. Prim. Care 2019, 8, 2328–2331.
40. Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G. Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. BMJ 2009, 339, b2535.
41. Amin, D.; Garzón-Orjuela, N.; Garcia Pereira, A.; Parveen, S.; Vornhagen, H.; Vellinga, A. Search Strategy–Artificial Intelligence to Improve Antimicrobial Prescribing–A Protocol for a Systematic Review; Figshare: Iasi, Romania, 2022.
42. NIH. Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies. 2018. Available online: https://www.nhlbi.nih.gov/healthtopics/study-quality-assessment-tools (accessed on 15 December 2022).
43. Kanjilal, S.; Oberst, M.; Boominathan, S.; Zhou, H.; Hooper, D.C.; Sontag, D. A decision algorithm to promote outpatient antimicrobial stewardship for uncomplicated urinary tract infection. Sci. Transl. Med. 2020, 12, eaay5067.
44. Lee, S.; Shin, J.; Kim, H.S.; Lee, M.J.; Yoon, J.M.; Lee, S.; Kim, Y.; Kim, J.-Y.; Lee, S. Hybrid Method Incorporating a Rule-Based Approach and Deep Learning for Prescription Error Prediction. Drug Saf. 2022, 45, 27–35.
45. Beaudoin, M.; Kabanza, F.; Nault, V.; Valiquette, L. Evaluation of a machine learning capability for a clinical decision support system to enhance antimicrobial stewardship programs. Artif. Intell. Med. 2016, 68, 29–36.
46. Yelin, I.; Snitser, O.; Novich, G.; Katz, R.; Tal, O.; Parizade, M.; Chodick, G.; Koren, G.; Shalev, V.; Kishony, R. Personal clinical history predicts antibiotic resistance of urinary tract infections. Nat. Med. 2019, 25, 1143–1152.
47. Oonsivilai, M.; Mo, Y.; Luangasanatip, N.; Lubell, Y.; Miliya, T.; Tan, P.; Loeuk, L.; Turner, P.; Cooper, B.S. Using machine learning to guide targeted and locally-tailored empiric antibiotic prescribing in a children’s hospital in Cambodia. Wellcome Open Res. 2018, 3, 131.
48. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
49. Quinn, M.; Forman, J.; Harrod, M.; Winter, S.; Fowler, K.E.; Krein, S.L.; Gupta, A.; Saint, S.; Singh, H.; Chopra, V. Electronic health records, communication, and data sharing: Challenges and opportunities for improving the diagnostic process. Diagnosis 2019, 6, 241–248.
50. Vabalas, A.; Gowen, E.; Poliakoff, E.; Casson, A.J. Machine learning algorithm validation with a limited sample size. PLoS ONE 2019, 14, e0224365.
51. Okser, S.; Pahikkala, T.; Aittokallio, T. Genetic variants and their interactions in disease risk prediction—Machine learning and network perspectives. BioData Min. 2013, 6, 5.
52. Moradi, R.; Berangi, R.; Minaei, B. A survey of regularisation strategies for deep models. Artif. Intell. Rev. 2020, 53, 3947–3986.
53. Tian, Y.; Zhang, Y. A comprehensive survey on regularisation strategies in machine learning. Inf. Fusion 2022, 80, 146–166.
54. Bramer, M. Using J-Pruning to Reduce Overfitting in Classification Trees. Knowl. Based Syst. 2002, 15, 301–308.
55. Jin, H.; Ling, C.X. Using AUC and accuracy in evaluating learning algorithms. IEEE Trans. Knowl. Data Eng. 2005, 17, 299–310.
56. Ferri, C.; Hernández-Orallo, J.; Modroiu, R. An experimental comparison of performance measures for classification. Pattern Recognit. Lett. 2009, 30, 27–38.
57. Luque, A.; Carrasco, A.; Martín, A.; de las Heras, A. The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognit. 2019, 91, 216–231.
58. Fanelli, U.; Pappalardo, M.; Chinè, V.; Gismondi, P.; Neglia, C.; Argentiero, A.; Calderaro, A.; Prati, A.; Esposito, S. Role of Artificial Intelligence in Fighting Antimicrobial Resistance in Pediatrics. Antibiotics 2020, 9, 767.
59. Chen, P.-H.C.; Liu, Y.; Peng, L. How to develop machine learning models for healthcare. Nat. Mater. 2019, 18, 410–414.
60. Finlayson, S.G.; Beam, A.L.; van Smeden, M. Machine Learning and Statistics in Clinical Research Articles—Moving Past the False Dichotomy. JAMA Pediatr. 2023, 177, 448–450.
61. Vuttipittayamongkol, P.; Elyan, E.; Petrovski, A. On the class overlap problem in imbalanced data classification. Knowl. Based Syst. 2021, 212, 106631.
62. Vuttipittayamongkol, P.; Elyan, E. Neighbourhood-based undersampling approach for handling imbalanced and overlapped data. Inf. Sci. 2020, 509, 47–70.
63. Elyan, E.; Moreno-Garcia, C.F.; Jayne, C. CDSMOTE: Class decomposition and synthetic minority class oversampling technique for imbalanced-data classification. Neural Comput. Appl. 2021, 33, 2839–2851.
64. Elyan, E.; Gaber, M.M. A fine-grained Random Forests using class decomposition: An application to medical diagnosis. Neural Comput. Appl. 2016, 27, 2279–2288.
65. Elyan, E.; Hussain, A.; Sheikh, A.; Elmanama, A.A.; Vuttipittayamongkol, P.; Hijazi, K. Antimicrobial Resistance and Machine Learning: Challenges and Opportunities. IEEE Access 2022, 10, 31561–31577.
66. Norori, N.; Hu, Q.; Aellen, F.M.; Faraci, F.D.; Tzovara, A. Addressing bias in big data and AI for health care: A call for open science. Patterns 2021, 2, 100347.
67. Parikh, R.B.; Teeple, S.; Navathe, A.S. Addressing Bias in Artificial Intelligence in Health Care. JAMA 2019, 322, 2377–2378.
68. Bai, T.; Luo, J.; Zhao, J.; Wen, B.; Wang, Q. Recent advances in adversarial training for adversarial robustness. arXiv 2021, arXiv:2102.01356.
69. McCarthy, A.; Ghadafi, E.; Andriotis, P.; Legg, P. Functionality-Preserving Adversarial Machine Learning for Robust Classification in Cybersecurity and Intrusion Detection Domains: A Survey. J. Cybersecur. Priv. 2022, 2, 154–190.
70. Bellet, A.; Habrard, A. Robustness and generalisation for metric learning. Neurocomputing 2015, 151, 259–267.
71. Waring, J.; Lindvall, C.; Umeton, R. Automated machine learning: Review of the state-of-the-art and opportunities for healthcare. Artif. Intell. Med. 2020, 104, 101822.
Title | Time Frame | Predicted (Outcome) Variable | Predictors’ Groups | Training/Test Sets | Machine Learning Models Used | Ways to Avoid Data Overfitting | Handling Missing Data | Evaluation of Models’ Performance | Results |
---|---|---|---|---|---|---|---|---|---|
A decision algorithm to promote outpatient antimicrobial stewardship for uncomplicated urinary tract infection [43]. | 10 years (1 January 2007–31 December 2016). | The proportion of recommendations for second-line antibiotics and the proportion of recommendations for inappropriate antibiotic therapy. | Labs, antibiotics, demographics, geographical, comorbidities and medical history of patients. | Training dataset (n = 10,053 patients, and thus 11,865 specimens) and test dataset (n = 3629 patients, and thus 3941 specimens). | Logistic regression, decision trees, random forest models. | Regularisation was used in the logistic regression model. | No information. | AUROCs for nitrofurantoin and TMP-SMX were poor (0.56 and 0.59). For ciprofloxacin and levofloxacin, the AUROCs were poor as well (0.64). | The ML model was able to make a recommendation for an antibiotic in 99% of the specimens, and chose ciprofloxacin or levofloxacin for 11% of the specimens, relative to 34% in the case of clinicians (a 67% reduction). Furthermore, the model’s recommendation resulted in an inappropriate antibiotic therapy (i.e., second-line antibiotics) in 10% of the specimens relative to 12% in the case of clinicians (18% reduction). |
A hybrid method incorporating a rule-based approach and deep learning for prescription error prediction [44]. | 1 year (1 January–31 December 2018). | Antibiotic prescription errors. | Labs, medical history of patients and antibiotics. | No information. | An advanced rule-based deep neural network (ARDNN). | No information. | 2.45% of height and weight missing data were predicted, and other empty data records were deleted. Data outliers were treated as missing values. | The performance was evaluated with a precision of 73%, recall of 81% and F1 score of 77%. | Out of 15,407 prescriptions by clinicians, there were 179 prescription errors. A validated prediction model for prescription errors correctly detected 145 prescription errors out of the 179 errors, implying a precision of 81%, recall of 73% and F1 score of 77%. |
Evaluation of a machine learning capability for a clinical decision support system to enhance antimicrobial stewardship programs [45]. | 11 months (phase one: 1 February–30 November 2012; phase two: 18 November–20 December 2013). | Inappropriate prescriptions of piperacillin–tazobactam. | Labs, demographics, geographical and vital signs. | No information. | A supervised learning module (temporal induction of classification models (TIM), which combines instance-based learning and rule induction). | The J-measure was used for measuring the improvement of a rule (i.e., the higher the information content of a rule reflected by the high J-measure, the higher the predictive accuracy of the model). | No information. | The overall system achieved a precision of 74%, recall of 96% and accuracy of 79%. | 44 learned rules were extracted to identify inappropriate piperacillin–tazobactam prescriptions. When tested against the data set, they were able to identify inappropriate prescriptions with a precision of 66%, recall of 64% and accuracy of 71%. |
Personal clinical history predicts antibiotic resistance of urinary tract infections [46]. | 10 years (1 July 2007–30 June 2017). | Mismatched treatment. | Labs, antibiotics, demographics, geographical, temporal and gender-related. | Training dataset: all data collected from 1 July 2007 to 30 June 2016; test dataset: all data collected from 1 July 2016 to 30 June 2017. | Logistic regression and gradient-boosted decision trees (GBDTs). | The model performance on the test set was contrasted with the model performance on the training set to identify data overfitting. | Missing data for resistance measurements were defined as not available (N/A), and such samples were not used in the models. | AUROC ranged from acceptable at 0.70 (amoxicillin-CA) to excellent at 0.83 (ciprofloxacin). | The unconstrained algorithm resulted in a predicted mismatch treatment of 5% (42% lower than the mismatch treatment of 8.5% in the case of clinicians’ prescriptions), and the constrained resulted in a predicted mismatch of 6%. |
Using machine learning to guide targeted and locally tailored empiric antibiotic prescribing in a children’s hospital in Cambodia [47]. | 3 years (from February 2013 to January 2016). | Susceptibility to antibiotics. | Labs, antibiotics, demographics, temporal, socioeconomic conditions and medical history of patients. | The dataset was split 80% versus 20% for training and test datasets. | Logistic regression, decision trees, random forests, boosted decision trees, linear support vector machines (SVM), polynomial SVMs, radial SVMs and K-nearest neighbours. | Regularisation was used in the logistic regression model; however, no details about the rest of the ML models were used. | Missing data for the binary predictors were treated as being “negative”. | AUROC of the random forest method was excellent at 0.80 for ceftriaxone, acceptable at 0.74 for ampicillin and gentamicin and 0.71 for Gram-stain. | The random forest method had the best predictive performance in predicting susceptibility to antibiotics (such as ceftriaxone, ampicillin and gentamicin, and Gram-stain), which will be used to guide appropriate antibiotic therapy. In addition, the authors reported the AUROC values of different models rather than the ML models’ results. |
Criteria | A Decision Algorithm to Promote Outpatient Antimicrobial Stewardship for Uncomplicated Urinary Tract Infection [43] | Hybrid Method Incorporating a Rule-Based Approach and Deep Learning for Prescription Error Prediction [44] | Evaluation of a Machine Learning Capability for a Clinical Decision Support System to Enhance Antimicrobial Stewardship Programs [45] | Personal Clinical History Predicts Antibiotic Resistance of Urinary Tract Infections [46] | Using Machine Learning to Guide Targeted and Locally Tailored Empiric Antibiotic Prescribing in a Children’s Hospital in Cambodia [47] |
---|---|---|---|---|---|
1. Was the research question or objective in this paper clearly stated? | Yes | Yes | Yes | Yes | Yes |
2. Was the study population clearly specified and defined? | Yes | Yes | Yes | Yes | Yes |
3. Was the participation rate of eligible persons at least 50%? | NA | NA | NA | NA | NA |
4. Were all the subjects selected or recruited from the same or similar populations (including the same time period)? Were inclusion and exclusion criteria for being in the study prespecified and applied uniformly to all participants? | Yes | Yes | Yes | Yes | Yes |
5. Was a sample size justification, power description, or variance and effect estimates provided? | NA | NA | NA | NA | NA |
6. For the analyses in this paper, were the exposure(s) of interest measured prior to the outcome(s) being measured? | Yes | Yes | Yes | Yes | Yes |
7. Was the timeframe sufficient so that one could reasonably expect to see an association between exposure and outcome if it existed? | Yes | Yes | Yes | Yes | Yes |
8. For exposures that can vary in amount or level, did the study examine different levels of the exposure as related to the outcome (e.g., categories of exposure or exposure measured as a continuous variable)? | NA | NA | NA | NA | NA |
9. Were the exposure measures (independent variables) clearly defined, valid, reliable, and implemented consistently across all study participants? | Yes | Yes | Yes | Yes | Yes |
10. Was the exposure(s) assessed more than once over time? | Yes | Yes | Yes | Yes | Yes |
11. Were the outcome measures (dependent variables) clearly defined, valid, reliable, and implemented consistently across all study participants? | Yes | Yes | Yes | Yes | Yes |
12. Were the outcome assessors blinded to the exposure status of participants? | NR | NR | NR | NR | NR |
13. Was the loss to follow-up after baseline 20% or less? | NA | NA | NA | NA | NA |
14. Were key potential confounding variables measured and adjusted statistically for their impact on the relationship between exposure(s) and outcome(s)? | NR | NR | NR | NR | NR |
Quality Rating | Fair (57.1%) | Fair (57.1%) | Fair (57.1%) | Fair (57.1%) | Fair (57.1%) |
Studies | Phase (1): Problem Selection and Definition | Phase (2): Data Collection/Curating Datasets | Phase (3): ML Development | Phase (4): Evaluation of the ML Model | Phase (5): Assessment of Impact | Phase (6): Deployment and Monitoring |
---|---|---|---|---|---|---|
A decision algorithm to promote outpatient antimicrobial stewardship for uncomplicated urinary tract infection [43]. | Yes | Yes, information about the dataset and training/test split is provided. However, no justification was provided for the train/test split. | Logistic regression was chosen based on a comparison with decision tree and random forest models; however, no justification was provided on why these three models were chosen. | Yes | No information | No information |
Hybrid Method Incorporating a Rule-Based Approach and Deep Learning for Prescription Error Prediction [44]. | Yes | Yes, information about the dataset is provided. However, no information was provided for the train/test split. | No justification was provided for choosing the ML model; however, the authors reported that the rules they used with their model were based on consultation with a clinician. | Yes | No information | No information |
Evaluation of a machine learning capability for a clinical decision support system to enhance antimicrobial stewardship programs [45]. | Yes | Yes, information about the dataset is provided. However, no information was provided for the train/test split. | No justification was provided for choosing the ML model. | Yes | No information | No information |
Personal clinical history predicts antibiotic resistance of urinary tract infections [46]. | Yes | Yes, information about the dataset and training/test split is provided. However, no justification was provided for the train/test split. | No justification was provided for choosing the ML model. | Yes | No information | No information |
Using machine learning to guide targeted and locally tailored empiric antibiotic prescribing in a children’s hospital in Cambodia [47]. | Yes | Yes, information about the dataset and training/test split is provided. However, no justification was provided for the train/test split. | No justification was provided for choosing the ML model. | Yes | No information | No information |