Explainable Machine Learning for Lung Cancer Screening Models
Abstract
:1. Introduction
2. Models for Lung Cancer Screening
- corresponds to the baseline survival without being diagnosed with lung cancer beyond 1-year, its estimate is 0.996229;
- corresponds to the baseline overall survival beyond 1 year, its estimate is 0.9917663;
- X is the matrix of values of variables for a given person;
- are the values of i-th variable;
- are coefficients estimated in the model, see Table 2.
3. Dataset for Model Comparisons
4. Results
4.1. Dataset-Level Explanations
4.2. Individual-Level Explanations
5. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix A.1. Dataset-Level Explanations
Appendix A.2. Individual-Level Explanations
References
- O’Neil, C. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy; Crown Publishing Group: New York, NY, USA, 2016. [Google Scholar]
- European Commission. On Artificial Intelligence—A European Approach to Excellence and Trust; European Commission: Luxembourg, 2020. [Google Scholar]
- EU Expert Group. Ethics Guidelines for Trustworthy AI; EU Expert Group: Brussels, Belgium, 2019. [Google Scholar]
- Dickson, B. Inside DARPA’s Effort to Create Explainable Artificial Intelligence; DARPA: Arlington, VA, USA, 2019. [Google Scholar]
- Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2017; pp. 4765–4774. [Google Scholar]
- Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar]
- Gosiewska, A.; Biecek, P. Do Not Trust Additive Explanations. arXiv 2019, arXiv:1903.11420. [Google Scholar]
- Biecek, P. DALEX: Explainers for Complex Predictive Models in R. J. Mach. Learn. Res. 2018, 19, 1–5. [Google Scholar]
- Wexler, J.; Pushkarna, M.; Bolukbasi, T.; Wattenberg, M.; Viégas, F.; Wilson, J. (Eds.) The What-If Tool: Interactive Probing of Machine Learning Models; Institute of Electrical and Electronics Engineers (IEEE): Washington, DC, USA, 2019. [Google Scholar]
- Nori, H.; Jenkins, S.; Koch, P.; Caruana, R. InterpretML: A Unified Framework for Machine Learning Interpretability. arXiv 2019, arXiv:1909.09223. [Google Scholar]
- Thorsen-Meyer, H.C.; Nielsen, A.B.; Nielsen, A.P.; Kaas-Hansen, B.S.; Toft, P.; Schierbeck, J.; Strøm, T.; Chmura, P.J.; Heimann, M.; Dybdahl, L.; et al. Dynamic and explainable machine learning prediction of mortality in patients in the intensive care unit: A retrospective study of high-frequency data in electronic patient records. Lancet Digit. Health 2020, 2, e179–e191. [Google Scholar] [CrossRef]
- Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67. [Google Scholar] [CrossRef]
- Hyland, S.; Faltys, M.; Hüser, M.; Lyu, X.; Gumbsch, T.; Esteban, C.; Bock, C.; Horn, M.; Moor, M.; Rieck, B.; et al. Early Prediction of Circulatory Failure in the Intensive Care Unit Using Machine Learning. Nat. Med. 2020, 26, 364–373. [Google Scholar] [CrossRef]
- Lundberg, S.M.; Nair, B.; Vavilala, M.S.; Horibe, M.; Eisses, M.J.; Adams, T.; Liston, D.E.; Low, D.K.W.; Newman, S.F.; Kim, J.; et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2018, 2, 749–760. [Google Scholar] [CrossRef]
- Singh, A.; Sengupta, S.; Lakshminarayanan, V. Explainable Deep Learning Models in Medical Image Analysis. J. Imaging 2020, 6, 52. [Google Scholar] [CrossRef]
- Holzinger, A.; Biemann, C.; Pattichis, C.S.; Kell, D.B. What do we need to build explainable AI systems for the medical domain? arXiv 2017, arXiv:1712.09923. [Google Scholar]
- Xie, Y.; Chen, M.; Kao, D.; Gao, G.; Chen, X. CheXplain: Enabling Physicians to Explore and Understand Data-Driven, AI-Enabled Medical Imaging Analysis. In Proceedings of the CHI’20: CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 25–30 April 2020; Association for Computing Machinery: New York, NY, USA, 2020. [Google Scholar] [CrossRef]
- Lauritsen, S.M.; Kristensen, M.R.B.; Olsen, M.V.; Larsen, M.S.; Lauritsen, K.M.; Jørgensen, M.J.; Lange, J.; Thiesson, B. Explainable artificial intelligence model to predict acute critical illness from electronic health records. arXiv 2019, arXiv:1912.01266. [Google Scholar] [CrossRef]
- Paul, R.; Schabath, M.; Gillies, R.; Hall, L.; Goldgof, D. Convolutional Neural Network ensembles for accurate lung nodule malignancy prediction 2 years in the future. Comput. Biol. Med. 2020, 122, 103882. [Google Scholar] [CrossRef] [PubMed]
- Xi, J.; Zhao, W.; Yuan, J.E.; Cao, B.; Zhao, L. Multi-resolution classification of exhaled aerosol images to detect obstructive lung diseases in small airways. Comput. Biol. Med. 2017, 87, 57–69. [Google Scholar] [CrossRef] [PubMed]
- Li, W.; Jia, Z.; Xie, D.; Chen, K.; Cui, J.; Liu, H. Recognizing lung cancer using a homemade e-nose: A comprehensive study. Comput. Biol. Med. 2020, 120, 103706. [Google Scholar] [CrossRef] [PubMed]
- National Lung Screening Trial Research Team. Reduced Lung-Cancer Mortality with Low-Dose Computed Tomographic Screening. N. Engl. J. Med. 2011, 365, 395–409. [Google Scholar] [CrossRef] [Green Version]
- De Koning, H.J.; van der Aalst, C.M.; de Jong, P.A.; Scholten, E.T.; Nackaerts, K.; Heuvelmans, M.A.; Lammers, J.W.J.; Weenink, C.; Yousaf-Khan, U.; Horeweg, N.; et al. Reduced Lung-Cancer Mortality with Volume CT Screening in a Randomized Trial. N. Engl. J. Med. 2020, 382, 503–513. [Google Scholar] [CrossRef]
- Raghu, V.K.; Zhao, W.; Pu, J.; Leader, J.K.; Wang, R.; Herman, J.; Yuan, J.M.; Benos, P.V.; Wilson, D.O. Feasibility of lung cancer prediction from low-dose CT scan and smoking factors using causal models. Thorax 2019, 74, 643–649. [Google Scholar] [CrossRef] [Green Version]
- Tammemägi, M.C. Selecting lung cancer screenees using risk prediction models—Where do we go from here. Transl. Lung Cancer Res. 2018, 7, 243. [Google Scholar] [CrossRef]
- Bach, P.B.; Kattan, M.W.; Thornquist, M.D.; Kris, M.G.; Tate, R.C.; Barnett, M.J.; Hsieh, L.J.; Begg, C.B. Variations in Lung Cancer Risk Among Smokers. J. Natl. Cancer Inst. 2003, 95, 470–478. [Google Scholar] [CrossRef] [Green Version]
- Tammemägi, M.C.; Katki, H.A.; Hocking, W.G.; Church, T.R.; Caporaso, N.; Kvale, P.A.; Chaturvedi, A.K.; Silvestri, G.A.; Riley, T.L.; Commins, J.; et al. Selection Criteria for Lung-Cancer Screening. N. Engl. J. Med. 2013, 368, 728–736. [Google Scholar] [CrossRef] [Green Version]
- Katki, H.; Kovalchik, S.; Cheung, C.B.L.; Chaturvedi, A. Development and Validation of Risk Models to Select Ever-Smokers for CT Lung Cancer Screening. JAMA 2016, 315, 2300–2311. [Google Scholar] [CrossRef]
- R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020. [Google Scholar]
- Cheung, L.C.; Kovalchik, S.A.; Katki, H.A. lcmodels: Predictions from Lung Cancer Models. R Package Version 4.0.0. 2019. Available online: https://dceg.cancer.gov/tools/risk-assessment/lcmodels/lcmodels-manual.pdf (accessed on 6 November 2021).
- Katki, H.; Kovalchik, S.; Petito, L.; Cheung, L.; Jacobs, E.; Jemal, A.; Berg, C.; Chaturvedi, A. Implications of nine risk prediction models for selecting ever-smokers for computed tomography lung cancer screening. Ann. Intern. Med. 2018, 169, 10–19. [Google Scholar] [CrossRef] [PubMed]
- Miller, T. Explanation in Artificial Intelligence: Insights from the Social Sciences. arXiv 2018, arXiv:1706.07269. [Google Scholar] [CrossRef]
- Biecek, P.; Burzykowski, T. Explanatory Model Analysis; Chapman and Hall/CRC: New York, NY, USA, 2021. [Google Scholar]
- Pękala, K.; Biecek, P. triplot: Explaining Correlated Features in Machine Learning Models. R Package. 2020. Available online: https://cran.r-project.org/web/packages/triplot/triplot.pdf (accessed on 6 November 2021).
- Apley, D. ALEPlot: Accumulated Local Effects (ALE) Plots and Partial Dependence (PD) Plots. R Package. 2018. Available online: https://cran.r-project.org/web/packages/ALEPlot/ALEPlot.pdf (accessed on 6 November 2021).
- Fisher, A.; Rudin, C.; Dominici, F. All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously. J. Mach. Learn. Res. 2019, 20, 1–81. [Google Scholar]
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer Series in Statistics; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
- Molnar, C. Interpretable Machine Learning. A Guide for Making Black Box Models Explainable; Leanpub: Victoria, BC, Canada, 2018. [Google Scholar]
- Staniak, M.; Biecek, P. Explanations of Model Predictions with live and breakDown Packages. R J. 2018, 10, 395–409. [Google Scholar] [CrossRef] [Green Version]
- Siddhartha, M.; Maity, P.; Nath, R. Explanatory Artificial Intelligence (XAI) in the prediction of post-operative life expectancy in lung cancer patients. Int. J. Sci. Res. 2019, 8, 112. [Google Scholar]
- Kobylińska, K.; Mikołajczyk, T.; Adamek, M.; Orłowski, T.; Biecek, P. Explainable Machine Learning for Modeling of Early Postoperative Mortality in Lung Cancer. In Artificial Intelligence in Medicine: Knowledge Representation and Transparent and Explainable Systems; Marcos, M., Juarez, J.M., Lenz, R., Nalepa, G.J., Nowaczyk, S., Peleg, M., Stefanowski, J., Stiglic, G., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 161–174. [Google Scholar]
- Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2000, 29, 1189–1232. [Google Scholar] [CrossRef]
LCRAT Model | BACH Model | PLCO Model | |
---|---|---|---|
Age | + | + | + |
Gender | + | + | |
Race/ethnicity | + | + | |
Education | + | + | |
BMI | + | + | |
Smoking status | + | ||
Quitted smoking (in years) | + | + | + |
Years smoked | + | + | + |
Cigarettes per day | + | + | + |
Pack-years | + | ||
Prior cancer | + | ||
Lung disease | + | + | |
Asbestos exposure | + | ||
Any relatives with LC | + | + | |
Number of relatives with LC | + | ||
Total number of variables | 12 | 6 | 11 |
Prediction of Lung Cancer risk | 5 years | 10 years | 6 years |
Variable | Expression | Coefficient |
---|---|---|
intercept | −9.7960571 | |
age | 0.070322812 | |
age | −0.00009382122 | |
age | 0.00018282661 | |
age | −0.000089005389 | |
female | −0.05827261 | |
qtyears | −0.085684793 | |
qtyears | 0.0065499693 | |
qtyears | −0.0068305845 | |
qtyears | 0.00028061519 | |
smkyears | 0.11425297 | |
smkyears | −0.000080091477 | |
smkyears | 0.00017069483 | |
smkyears | −0.000090603358 | |
cpd | 0.060818386 | |
cpd | −0.00014652216 | |
cpd | 0.00018486938 | |
cpd | −0.000038347226 | |
asbestos | 0.2153936 |
Variable | Description | Mean | sd | Median |
---|---|---|---|---|
age | age at diagnosis | 63.02 | 8.66 | 63 |
smkyears | smoking years | 21.59 | 18.90 | 25 |
qtyears | quit smoking time | 1.70 | 5.03 | 0 |
cpd | cigarettes per day | 13.02 | 11.84 | 20 |
Variable | Description | Frequencies |
---|---|---|
female | being a female | No 0: 22,219 (64.6%) |
Yes 1: 12174 (35.4%) | ||
race | ethnicity | White 0: 34,393 (100.0%) |
emp | emphysema | No 0: 34,284 (99.7%) |
Yes 1: 109 (0.3%) | ||
fam.lung.trend | number of first degree relatives | No relatives 0: 33,332 (96.9%) |
with lung cancer | 1 or more 1: 1061 (3.1%) | |
asb | asbestos exposure | No 0: 34,223 (99.5%) |
Yes 1: 170 (0.5%) |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kobylińska, K.; Orłowski, T.; Adamek, M.; Biecek, P. Explainable Machine Learning for Lung Cancer Screening Models. Appl. Sci. 2022, 12, 1926. https://doi.org/10.3390/app12041926
Kobylińska K, Orłowski T, Adamek M, Biecek P. Explainable Machine Learning for Lung Cancer Screening Models. Applied Sciences. 2022; 12(4):1926. https://doi.org/10.3390/app12041926
Chicago/Turabian StyleKobylińska, Katarzyna, Tadeusz Orłowski, Mariusz Adamek, and Przemysław Biecek. 2022. "Explainable Machine Learning for Lung Cancer Screening Models" Applied Sciences 12, no. 4: 1926. https://doi.org/10.3390/app12041926
APA StyleKobylińska, K., Orłowski, T., Adamek, M., & Biecek, P. (2022). Explainable Machine Learning for Lung Cancer Screening Models. Applied Sciences, 12(4), 1926. https://doi.org/10.3390/app12041926