Evaluation of the Performance of an Artificial Intelligence (AI) Algorithm in Detecting Thoracic Pathologies on Chest Radiographs
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Design
2.2. Algorithm Development
2.2.1. Algorithm Design and Function
2.2.2. Transfer Learning and Databases
2.2.3. Dataset
2.2.4. Algorithm Training
2.2.5. Threshold Definition and Utility of the Validation Set
2.3. Ground Truth Labelling for Thoracic Abnormalities
2.4. Sample-Size Calculation
2.5. Clinical Validation Study
2.6. Evaluation of AI Standalone Performance
3. Results
3.1. Clinical Validation Study
3.2. AI Standalone Performance Evaluation
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Section/Topic | Item | D/V | Checklist Item | Page
---|---|---|---|---
Title and abstract | | | |
Title | 1 | D; V | Identify the study as developing and/or validating a multivariable prediction model, the target population, and the outcome to be predicted. | 1
Abstract | 2 | V | Provide a summary of objectives, study design, setting, participants, sample size, predictors, outcome, statistical analysis, results, and conclusions. | 1
Introduction | | | |
Background and objectives | 3a | V | Explain the medical context (including whether diagnostic or prognostic) and rationale for developing or validating the multivariable prediction model, including references to existing models. | 1, 2
 | 3b | V | Specify the objectives, including whether the study describes the development or validation of the model or both. | 2
Methods | | | |
Source of data | 4a | V | Describe the study design or source of data (e.g., randomized trial, cohort, or registry data), separately for the development and validation data sets, if applicable. | 2
 | 4b | V | Specify the key study dates, including the start of accrual; end of accrual; and, if applicable, end of follow-up. | 6, 7
Participants | 5a | V | Specify key elements of the study setting (e.g., primary care, secondary care, and general population) including the number and location of centers. | 6, 7
 | 5b | V | Describe eligibility criteria for participants. | n/a
 | 5c | V | Give details of treatments received, if relevant. | n/a
Outcome | 6a | V | Clearly define the outcome that is predicted by the prediction model, including how and when assessed. | 6, 7
 | 6b | V | Report any actions to blind assessment of the outcome to be predicted. | n/a
Predictors | 7a | V | Clearly define all predictors used in developing or validating the multivariable prediction model, including how and when they were measured. | 6, 7
 | 7b | V | Report any actions to blind assessment of predictors for the outcome and other predictors. | n/a
Sample size | 8 | V | Explain how the study size was arrived at. | 6
Missing data | 9 | V | Describe how missing data were handled (e.g., complete-case analysis, single imputation, and multiple imputation) with details of any imputation method. | 6
Statistical analysis methods | 10a | V | Describe how predictors were handled in the analyses. | 7
 | 10b | V | Specify the type of model, all model-building procedures (including any predictor selection), and methods for internal validation. | 3, 4, 5, 6, 7
 | 10c | V | For validation, describe how the predictions were calculated. | 7
 | 10d | V | Specify all measures used to assess model performance and, if relevant, to compare multiple models. | 7
 | 10e | V | Describe any model updating (e.g., recalibration) arising from the validation, if conducted. | 7
Risk groups | 11 | V | Provide details on how risk groups were created, if conducted. | n/a
Development vs. validation | 12 | V | For validation, identify any differences from the development data in setting, eligibility criteria, outcome, and predictors. | n/a
Results | | | |
Participants | 13a | V | Describe the flow of participants through the study, including the number of participants with and without the outcome and, if applicable, a summary of the follow-up time. A diagram may be helpful. | 3
 | 13b | V | Describe the characteristics of the participants (basic demographics, clinical features, and available predictors), including the number of participants with missing data for predictors and outcome. | 7, 8
 | 13c | V | For validation, show a comparison with the development data of the distribution of important variables (demographics, predictors, and outcome). | n/a
Model development | 14a | V | Specify the number of participants and outcome events in each analysis. | 7, 8
 | 14b | V | If done, report the unadjusted association between each candidate predictor and outcome. | n/a
Model specification | 15a | V | Present the full prediction model to allow predictions for individuals (i.e., all regression coefficients and model intercept or baseline survival at a given time point). | n/a
 | 15b | V | Explain how to use the prediction model. | 
Model performance | 16 | V | Report performance measures (with CIs) for the prediction model. | 8, 9
Model updating | 17 | V | If done, report the results from any model updating (i.e., model specification and model performance). | n/a
Discussion | | | |
Limitations | 18 | V | Discuss any limitations of the study (such as a nonrepresentative sample, few events per predictor, and missing data). | 12
Interpretation | 19a | V | For validation, discuss the results with reference to performance in the development data and any other validation data. | n/a
 | 19b | V | Give an overall interpretation of the results, considering objectives, limitations, results from similar studies, and other relevant evidence. | 11, 12
Implications | 20 | V | Discuss the potential clinical use of the model and its implications for future research. | 12
Other information | | | |
Supplementary information | 21 | V | Provide information about the availability of supplementary resources, such as the study protocol, Web calculator, and data sets. | 12, 13
Funding | 22 | V | Give the source of funding and the role of the funders for the present study. | 12
References
- Corne, J. Chest X-ray Made Easy; Churchill Livingstone: London, UK, 1997.
- Singh, R.; Kalra, M.K.; Nitiwarangkul, C.; Patti, J.A.; Homayounieh, F.; Padole, A.; Rao, P.; Putha, P.; Muse, V.V.; Sharma, A.; et al. Deep learning in chest radiography: Detection of findings and presence of change. PLoS ONE 2018, 13, e0204155.
- Bruls, R.J.M.; Kwee, R.M. Workload for radiologists during on-call hours: Dramatic increase in the past 15 years. Insights Imaging 2020, 11, 121.
- Islam, M.N.; Inan, T.T.; Rafi, S.; Akter, S.S.; Sarker, I.H.; Islam, A.K.M.N. A Systematic Review on the Use of AI and ML for Fighting the COVID-19 Pandemic. IEEE Trans. Artif. Intell. 2020, 1, 258–270.
- Kufel, J.; Bargieł-Łączek, K.; Kocot, S.; Koźlik, M.; Bartnikowska, W.; Janik, M.; Czogalik, Ł.; Dudek, P.; Magiera, M.; Lis, A.; et al. What Is Machine Learning, Artificial Neural Networks and Deep Learning?—Examples of Practical Applications in Medicine. Diagnostics 2023, 13, 2582.
- Laino, M.E.; Ammirabile, A.; Posa, A.; Cancian, P.; Shalaby, S.; Savevski, V.; Neri, E. The Applications of Artificial Intelligence in Chest Imaging of COVID-19 Patients: A Literature Review. Diagnostics 2021, 11, 1317.
- Castiglioni, I.; Ippolito, D.; Interlenghi, M.; Monti, C.B.; Salvatore, C.; Schiaffino, S.; Polidori, A.; Gandola, D.; Messa, C.; Sardanelli, F. Machine learning applied on chest X-ray can aid in the diagnosis of COVID-19: A first experience from Lombardy, Italy. Eur. Radiol. Exp. 2021, 5, 7.
- Padash, S.; Mohebbian, M.R.; Adams, S.J.; Henderson, R.D.E.; Babyn, P. Pediatric chest radiograph interpretation: How far has artificial intelligence come? A systematic literature review. Pediatr. Radiol. 2022, 52, 1568–1580.
- Kufel, J.; Bargieł, K.; Koźlik, M.; Czogalik, Ł.; Dudek, P.; Jaworski, A.; Cebula, M.; Gruszczyńska, K. Application of artificial intelligence in diagnosing COVID-19 disease symptoms on chest X-rays: A systematic review. Int. J. Med. Sci. 2022, 19, 1743–1752.
- Nam, J.G.; Park, S.; Hwang, E.J.; Lee, J.H.; Jin, K.-N.; Lim, K.Y.; Vu, T.H.; Sohn, J.H.; Hwang, S.; Goo, J.M.; et al. Development and Validation of Deep Learning-based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs. Radiology 2018, 290, 218–228.
- Ajmera, P.; Pant, R.; Seth, J.; Ghuwalewala, S.; Kathuria, S.; Rathi, S.; Patil, S.; Edara, M.; Saini, M.; Raj, P.; et al. Deep-learning-based automatic detection of pulmonary nodules from chest radiographs. medRxiv 2022, 1–18.
- Qin, Z.Z.; Sander, M.S.; Rai, B.; Titahong, C.N.; Sudrungrot, S.; Laah, S.N.; Adhikari, L.M.; Carter, E.J.; Puri, L.; Codlin, A.J.; et al. Using artificial intelligence to read chest radiographs for tuberculosis detection: A multi-site evaluation of the diagnostic accuracy of three deep learning systems. Sci. Rep. 2019, 9, 15000.
- Nafisah, S.I.; Muhammad, G. Tuberculosis detection in chest radiograph using convolutional neural network architecture and explainable artificial intelligence. Neural Comput. Appl. 2024, 36, 111–131.
- Park, S.; Lee, S.M.; Kim, N.; Choe, J.; Cho, Y.; Do, K.-H.; Seo, J.B. Application of deep learning–based computer-aided detection system: Detecting pneumothorax on chest radiograph after biopsy. Eur. Radiol. 2019, 29, 5341–5348.
- Putha, P.; Tadepalli, M.; Reddy, B.; Raj, T.; Chiramal, J.A.; Govil, S.; Sinha, N.; Ks, M.; Reddivari, S.; Jagirdar, A.; et al. Can Artificial Intelligence Reliably Report Chest X-rays?: Radiologist Validation of an Algorithm trained on 2.3 Million X-rays. arXiv 2018, arXiv:1807.07455. Available online: https://arxiv.org/abs/1807.07455v2 (accessed on 15 February 2024).
- Pham, H.H.; Nguyen, H.Q.; Lam, K.; Le, L.T.; Nguyen, D.B.; Nguyen, H.T.; Le, T.T.; Nguyen, T.V.; Dao, M.; Vu, V. An Accurate and Explainable Deep Learning System Improves Interobserver Agreement in the Interpretation of Chest Radiograph. medRxiv 2021.
- Jin, K.N.; Kim, E.Y.; Kim, Y.J.; Lee, G.P.; Kim, H.; Oh, S.; Kim, Y.S.; Han, J.H.; Cho, Y.J. Diagnostic effect of artificial intelligence solution for referable thoracic abnormalities on chest radiography: A multicenter respiratory outpatient diagnostic cohort study. Eur. Radiol. 2022, 32, 3469–3479.
- Govindarajan, A.; Govindarajan, A.; Tanamala, S.; Chattoraj, S.; Reddy, B.; Agrawal, R.; Iyer, D.; Srivastava, A.; Kumar, P.; Putha, P. Role of an Automated Deep Learning Algorithm for Reliable Screening of Abnormality in Chest Radiographs: A Prospective Multicenter Quality Improvement Study. Diagnostics 2022, 12, 2724.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015. Available online: https://ora.ox.ac.uk/objects/uuid:60713f18-a6d1-4d97-8f45-b60ad8aebbce (accessed on 3 February 2024).
- Nguyen, N.H.; Nguyen, H.Q.; Nguyen, N.T.; Nguyen, T.V.; Pham, H.H.; Nguyen, T.N.-M. Deployment and validation of an AI system for detecting abnormal chest radiographs in clinical settings. Front. Digit. Health 2022, 4, 890759.
- Collins, G.S.; Reitsma, J.B.; Altman, D.G.; Moons, K.G. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): The TRIPOD statement. Ann. Intern. Med. 2015, 162, 55–63.
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv 2015, arXiv:1502.03167. Available online: https://arxiv.org/abs/1502.03167v3 (accessed on 18 May 2024).
- Lin, T.-Y.; Maire, M.; Belongie, S.; Bourdev, L.; Girshick, R.; Hays, J.; Perona, P.; Ramanan, D.; Zitnick, C.L.; Dollár, P. Microsoft COCO: Common Objects in Context. arXiv 2014, arXiv:1405.0312. Available online: https://arxiv.org/abs/1405.0312v3 (accessed on 18 May 2024).
- Irvin, J.; Rajpurkar, P.; Ko, M.; Yu, Y.; Ciurea-Ilcus, S.; Chute, C.; Marklund, H.; Haghgoo, B.; Ball, R.; Shpanskaya, K.; et al. CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison. arXiv 2019, arXiv:1901.07031. Available online: https://arxiv.org/abs/1901.07031v1 (accessed on 10 February 2024).
- Hwang, E.J.; Nam, J.G.; Lim, W.H.; Park, S.J.; Jeong, Y.S.; Kang, J.H.; Hong, E.K.; Kim, T.M.; Goo, J.M.; Park, S.; et al. Deep Learning for Chest Radiograph Diagnosis in the Emergency Department. Radiology 2019, 293, 573–580.
Metric | Readers Alone (without AI Aid): Mean | SD | Var | CI (95%) | Rayvolve Users (with AI Aid): Mean | SD | Var | CI (95%) | t-Test p-Value
---|---|---|---|---|---|---|---|---|---
Time (s) | 22.9 | 2.3 | 5.4 | 1.503 | 14.7 | 1.3 | 1.7 | 0.849 | <0.001 *
AUC | 0.759 | 0.07 | 0.01 | 0.046 | 0.88 | 0.05 | 0.01 | 0.033 | <0.001 *
Sensitivity | 0.769 | 0.02 | 0.00 | 0.013 | 0.857 | 0.02 | 0.00 | 0.013 | <0.001 *
Specificity | 0.946 | 0.01 | 0.00 | 0.007 | 0.974 | 0.01 | 0.00 | 0.007 | <0.001 *
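The CI (95%) column in the table above is consistent with the normal-approximation half-width 1.96·SD/√n. This is an inference from the tabulated numbers, not something stated in this excerpt; with the reported SDs, every CI value is reproduced exactly when n = 9 readers is assumed. A minimal sketch under that assumption:

```python
import math


def ci95_half_width(sd: float, n: int) -> float:
    """95% CI half-width for a mean, normal approximation: 1.96 * SD / sqrt(n)."""
    return 1.96 * sd / math.sqrt(n)


# Assumed n = 9 readers (inferred from the table, not stated in this excerpt).
N_READERS = 9

# Reading time, readers alone: SD = 2.3 -> tabulated half-width 1.503
print(round(ci95_half_width(2.3, N_READERS), 3))   # 1.503
# Reading time, AI-aided: SD = 1.3 -> tabulated half-width 0.849
print(round(ci95_half_width(1.3, N_READERS), 3))   # 0.849
# AUC, readers alone: SD = 0.07 -> tabulated half-width 0.046
print(round(ci95_half_width(0.07, N_READERS), 3))  # 0.046
```

The full interval for a metric would then be mean ± half-width, e.g. 22.9 ± 1.503 s for unaided reading time.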
Anomaly | TP | FN | FP | TN
---|---|---|---|---
Pleural effusion | 486 | 14 | 130 | 870
Consolidation | 465 | 35 | 114 | 886
Cardiomegaly | 489 | 11 | 161 | 839
Nodules | 494 | 6 | 205 | 795
Pneumothorax | 488 | 12 | 188 | 812
APE | 471 | 29 | 141 | 859
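The standalone performance metrics reported in the next table follow directly from these confusion-matrix counts. A minimal sketch (not the authors' code) that reproduces the pleural-effusion row:

```python
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Pleural effusion row: TP=486, FN=14, FP=130, TN=870
m = diagnostic_metrics(486, 14, 130, 870)
print(round(m["sensitivity"], 3))  # 0.972
print(round(m["specificity"], 3))  # 0.87
print(round(m["ppv"], 3))          # 0.789
print(round(m["npv"], 3))          # 0.984
```

The same function applied to the other rows yields the per-anomaly sensitivities, specificities, PPVs, and NPVs tabulated below (AUC additionally requires the underlying scores, which are not given here).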
Anomaly | Sensitivity | Specificity | NPV | PPV | AUC (95% CI)
---|---|---|---|---|---
Pleural effusion | 0.972 | 0.870 | 0.984 | 0.789 | 0.9362 (0.9237–0.9477)
Consolidation | 0.930 | 0.886 | 0.962 | 0.803 | 0.9161 (0.9024–0.9292)
Cardiomegaly | 0.978 | 0.839 | 0.987 | 0.752 | 0.9464 (0.9361–0.9558)
Nodules | 0.988 | 0.795 | 0.9925 | 0.707 | 0.9582 (0.9497–0.9659)
Pneumothorax | 0.976 | 0.812 | 0.9854 | 0.7219 | 0.9457 (0.9355–0.9549)
APE | 0.942 | 0.859 | 0.9673 | 0.7696 | 0.9123 (0.8981–0.9256)
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Bettinger, H.; Lenczner, G.; Guigui, J.; Rotenberg, L.; Zerbib, E.; Attia, A.; Vidal, J.; Beaumel, P. Evaluation of the Performance of an Artificial Intelligence (AI) Algorithm in Detecting Thoracic Pathologies on Chest Radiographs. Diagnostics 2024, 14, 1183. https://doi.org/10.3390/diagnostics14111183