COVID-Transformer: Interpretable COVID-19 Detection Using Vision Transformer for Healthcare
Abstract
1. Introduction
- Due to the lack of large public data sets, we collected and merged three standard data sets (https://data.mendeley.com/datasets/9xkhgts2s6 (accessed on 1 May 2021)), (https://data.mendeley.com/datasets/8h65ywd2jr/3 (accessed on 1 May 2021)), (https://www.kaggle.com/endiqq/largest-covid19-dataset (accessed on 1 May 2021)) to form a 30 K-image chest X-ray COVID-19 data set for multi-class classification and a 20 K-image data set for binary classification. Both data sets contain an equal number of images in each class, making them the largest balanced open-source data sets for imaging-based COVID-19 detection, which can help the research community train more accurate and generalizable models in the future.
- We implemented a model based on the Vision Transformer (ViT) architecture on both data sets and achieved a state-of-the-art overall accuracy of 98.4% in distinguishing COVID-19-positive from normal X-rays, and an accuracy of 92.4% in distinguishing COVID-19 from pneumonia and normal X-ray images.
- For evaluation, we fine-tuned multiple state-of-the-art baseline models widely used in the literature, such as Inception-V3, ResNet-V2, EfficientNet-B0, MobileNet-V2, VGG-16, Xception, and DenseNet-121, on both data sets and compared them with our proposed model on multiple standard metrics.
- For better model interpretability and ease of diagnosis, we created Grad-CAM-based visualizations of COVID-19 progression in the lungs, which assist the diagnostic process in healthcare.
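At the input of a ViT, each X-ray is split into fixed-size non-overlapping patches that are flattened into token vectors before the transformer layers. A minimal NumPy sketch of this patch-embedding step follows; the function name and shapes are illustrative, not taken from the paper's code, and the learned linear projection and position embeddings of the real architecture are omitted.

```python
import numpy as np

def image_to_patches(image, patch=16):
    """Split an (H, W, C) image into flattened non-overlapping patches,
    as done at the input of a Vision Transformer."""
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    # Crop to a multiple of the patch size, then rearrange into
    # (num_patches, patch*patch*channels).
    patches = (image[:rows * patch, :cols * patch]
               .reshape(rows, patch, cols, patch, c)
               .swapaxes(1, 2)
               .reshape(rows * cols, patch * patch * c))
    return patches
```

For a 224 × 224 × 3 input with 16 × 16 patches, this yields 196 tokens of dimension 768, matching the standard ViT-Base configuration.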
2. Model Architecture and Pipeline
2.1. Architecture
2.2. Fine-Tuning Procedure
2.3. Model Training Mechanism
1. Accuracy: The most common performance metric in any classification problem. For multi-class classification, we used categorical accuracy, which represents the average accuracy over the three classes of chest X-ray images. For binary classification, we used binary accuracy, which measures how often the predicted label matches the true label for a chest X-ray image.
2. AUC score: The area under the ROC curve (AUC) shows how well predictions are ranked across the classes and how well the model can distinguish between them. It aggregates performance across all feasible classification thresholds. It has been shown in the literature that the AUC score is a more robust measure of a classifier's ability than accuracy [43].
3. Precision: The number of true positives divided by the number of true positives plus the number of false positives.
4. Recall: The number of true positives divided by the number of true positives plus the number of false negatives.
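The metric definitions above can be sketched directly from confusion-matrix counts. The following self-contained Python function is illustrative only (its name and signature are not from the paper's code):

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for a binary task
    from the true-positive, false-positive, and false-negative counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

The same counts per class, averaged, give the macro versions of these metrics used in multi-class evaluation.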
3. Experimental Results and Discussion
3.1. Data Set
3.2. Preprocessing
1. Resize: As neural network models have a fixed-size input layer, all images must be scaled to the same size. Therefore, we resized all images in the data set to 224 × 224 pixels.
2. Interpolation: A few images in the data set are smaller than 224 × 224. When upscaling them, the new pixel values must be estimated carefully to retain quality; this process is termed "interpolation". For our pipeline we used nearest-neighbor interpolation, in which the output pixel value is taken from the pixel closest to the supplied input coordinates. This approach is straightforward to implement, and no spurious values are introduced in the result [47].
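Nearest-neighbor interpolation can be sketched in a few lines of pure Python: each output pixel maps back to the nearest source pixel, so no new intensity values are created. This is an illustrative sketch, not the paper's implementation (which in practice would use a library resize routine):

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) with
    nearest-neighbor interpolation."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[min(in_h - 1, int(r * in_h / out_h))]
              [min(in_w - 1, int(c * in_w / out_w))]
         for c in range(out_w)]
        for r in range(out_h)
    ]
```

Upscaling a 2 × 2 image to 4 × 4 with this function simply replicates each source pixel into a 2 × 2 block, which illustrates why the output contains only original intensity values.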
3.3. Data Augmentation
3.4. Testing Environment
3.5. Model Evaluation
3.6. Ablation Experiments
3.7. Comparison with Baseline Models
3.8. Grad-Cam Visualization
4. Case Study in Medical Services
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- World-Health-Organization. COVID-19 Weekly Epidemiological Update. Available online: https://www.who.int/ (accessed on 16 October 2021).
- Lang, T. Plug COVID-19 research gaps in detection, prevention and care. Nature 2020, 583, 333–334. [Google Scholar] [CrossRef]
- Yang, P.; Wang, X. COVID-19: A new challenge for human beings. Cell. Mol. Immunol. 2020, 17, 555–557. [Google Scholar] [CrossRef] [Green Version]
- Laajaj, R.; De Los Rios, C.; Sarmiento-Barbieri, I.; Aristizabal, D.; Behrentz, E.; Bernal, R.; Buitrago, G.; Cucunubá, Z.; de la Hoz, F.; Gaviria, A.; et al. COVID-19 spread, detection, and dynamics in Bogota, Colombia. Nat. Commun. 2021, 12, 1–8. [Google Scholar] [CrossRef]
- Vepa, A.; Saleem, A.; Rakhshan, K.; Daneshkhah, A.; Sedighi, T.; Shohaimi, S.; Omar, A.; Salari, N.; Chatrabgoun, O.; Dharmaraj, D.; et al. Using Machine Learning Algorithms to Develop a Clinical Decision-Making Tool for COVID-19 Inpatients. Int. J. Environ. Res. Public Health 2021, 18, 6228. [Google Scholar] [CrossRef]
- Ghibu, S.; Juncan, A.M.; Rus, L.L.; Frum, A.; Dobrea, C.M.; Chiş, A.A.; Gligor, F.G.; Morgovan, C. The Particularities of Pharmaceutical Care in Improving Public Health Service During the COVID-19 Pandemic. Int. J. Environ. Res. Public Health 2021, 18, 9776. [Google Scholar] [CrossRef]
- Xu, T. Psychological Distress of International Students during the COVID-19 Pandemic in China: Multidimensional Effects of External Environment, Individuals’ Behavior, and Their Values. Int. J. Environ. Res. Public Health 2021, 18, 9758. [Google Scholar] [CrossRef]
- Cass, A.L.; Slining, M.M.; Carson, C.; Cassidy, J.; Epright, M.C.; Gilchrist, A.E.; Peterson, K.; Wheeler, J.F. Risk Management of COVID-19 in the Residential Educational Setting: Lessons Learned and Implications for Moving Forward. Int. J. Environ. Res. Public Health 2021, 18, 9743. [Google Scholar] [CrossRef]
- Ting, D.S.W.; Carin, L.; Dzau, V.; Wong, T.Y. Digital technology and COVID-19. Nat. Med. 2020, 26, 459–461. [Google Scholar] [CrossRef] [Green Version]
- Nayak, S.R.; Nayak, D.R.; Sinha, U.; Arora, V.; Pachori, R.B. Application of deep learning techniques for detection of COVID-19 cases using chest X-ray images: A comprehensive study. Biomed. Signal Process. Control 2021, 64, 102365. [Google Scholar] [CrossRef]
- Cozzi, A.; Schiaffino, S.; Arpaia, F.; Della Pepa, G.; Tritella, S.; Bertolotti, P.; Menicagli, L.; Monaco, C.G.; Carbonaro, L.A.; Spairani, R.; et al. Chest X-ray in the COVID-19 pandemic: Radiologists’ real-world reader performance. Eur. J. Radiol. 2020, 132, 109272. [Google Scholar] [CrossRef]
- Zhou, S.K.; Greenspan, H.; Davatzikos, C.; Duncan, J.S.; Van Ginneken, B.; Madabhushi, A.; Prince, J.L.; Rueckert, D.; Summers, R.M. A Review of Deep Learning in Medical Imaging: Imaging Traits, Technology Trends, Case Studies With Progress Highlights, and Future Promises. Proc. IEEE 2021, 109, 820–838. [Google Scholar] [CrossRef]
- Mittal, H.; Pandey, A.C.; Pal, R.; Tripathi, A. A new clustering method for the diagnosis of CoVID19 using medical images. Appl. Intell. 2021, 51, 2988–3011. [Google Scholar] [CrossRef]
- Xu, R.; Cao, X.; Wang, Y.; Chen, Y.W.; Ye, X.; Lin, L.; Zhu, W.; Chen, C.; Xu, F.; Zhou, Y.; et al. Unsupervised Detection of Pulmonary Opacities for Computer-Aided Diagnosis of COVID-19 on CT Images. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10 January 2021; pp. 9007–9014. [Google Scholar] [CrossRef]
- Al-antari, M.A.; Hua, C.H.; Bang, J.; Lee, S. Fast deep learning computer-aided diagnosis of COVID-19 based on digital chest X-ray images. Appl. Intell. 2021, 51, 2890–2907. [Google Scholar] [CrossRef]
- Saiz, F.A.; Barandiaran, I. COVID-19 detection in chest X-ray images using a deep learning approach. Int. J. Interact. Multimed. Artif. Intell. 2020, 1, in press. [Google Scholar] [CrossRef]
- Aslan, M.F.; Unlersen, M.F.; Sabanci, K.; Durdu, A. CNN-based transfer learning—BiLSTM network: A novel approach for COVID-19 infection detection. Appl. Soft Comput. 2021, 98, 106912. [Google Scholar] [CrossRef] [PubMed]
- Marques, G.; Agarwal, D.; de la Torre Díez, I. Automated medical diagnosis of COVID-19 through EfficientNet convolutional neural network. Appl. Soft Comput. 2020, 96, 106691. [Google Scholar] [CrossRef]
- Demir, F. DeepCoroNet: A deep LSTM approach for automated detection of COVID-19 cases from chest X-ray images. Appl. Soft Comput. 2021, 103, 107160. [Google Scholar] [CrossRef] [PubMed]
- Mukherjee, H.; Ghosh, S.; Dhar, A.; Obaidullah, S.M.; Santosh, K.; Roy, K. Deep neural network to detect COVID-19: One architecture for both CT Scans and Chest X-rays. Appl. Intell. 2020, 51, 2777–2789. [Google Scholar] [CrossRef]
- Li, D.; Fu, Z.; Xu, J. Stacked-autoencoder-based model for COVID-19 diagnosis on CT images. Appl. Intell. 2020, 51, 2805–2817. [Google Scholar] [CrossRef]
- Chakraborty, M.; Dhavale, S.V.; Ingole, J. Corona-Nidaan: Lightweight deep convolutional neural network for chest X-ray based COVID-19 infection detection. Appl. Intell. 2021, 51, 3026–3043. [Google Scholar] [CrossRef]
- Perumal, V.; Narayanan, V.; Rajasekar, S.J.S. Detection of COVID-19 using CXR and CT images using Transfer Learning and Haralick features. Appl. Intell. 2021, 51, 341–358. [Google Scholar] [CrossRef]
- Khan, A.I.; Shah, J.L.; Bhat, M.M. CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest X-ray images. Comput. Methods Prog. Biomed. 2020, 196, 105581. [Google Scholar] [CrossRef]
- Oh, Y.; Park, S.; Ye, J.C. Deep learning covid-19 features on cxr using limited training data sets. IEEE Trans. Med. Imaging 2020, 39, 2688–2700. [Google Scholar] [CrossRef]
- Mishra, M.; Parashar, V.; Shimpi, R. Development and evaluation of an AI System for early detection of Covid-19 pneumonia using X-ray (Student Consortium). In Proceedings of the 2020 IEEE Sixth International Conference on Multimedia Big Data (BigMM), New Delhi, India, 24 September 2020; pp. 292–296. [Google Scholar]
- Sitaula, C.; Hossain, M.B. Attention-based VGG-16 model for COVID-19 chest X-ray image classification. Appl. Intell. 2020, 51, 2850–2863. [Google Scholar] [CrossRef]
- Shankar, K.; Perumal, E.; Díaz, V.G.; Tiwari, P.; Gupta, D.; Saudagar, A.K.J.; Muhammad, K. An optimal cascaded recurrent neural network for intelligent COVID-19 detection using Chest X-ray images. Appl. Soft Comput. 2021, 113, 107878. [Google Scholar] [CrossRef] [PubMed]
- Wu, X.; Wang, Z.; Hu, S. Recognizing COVID-19 positive: Through CT images. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6 November 2020; pp. 4572–4577. [Google Scholar] [CrossRef]
- Luz, E.; Silva, P.; Silva, R.; Silva, L.; Guimarães, J.; Miozzo, G.; Moreira, G.; Menotti, D. Towards an effective and efficient deep learning model for COVID-19 patterns detection in X-ray images. Res. Biomed. Eng. 2021, 1–14. [Google Scholar] [CrossRef]
- Pham, T.D. A comprehensive study on classification of COVID-19 on computed tomography with pretrained convolutional neural networks. Sci. Rep. 2020, 10, 1–8. [Google Scholar] [CrossRef]
- Wang, B.; Xie, Q.; Pei, J.; Tiwari, P.; Li, Z. Pre-trained Language Models in Biomedical Domain: A Survey from Multiscale Perspective. arXiv 2021, arXiv:2110.05006. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20 June 2009; pp. 248–255. [Google Scholar]
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning; PMLR: Lille, France, 2015; pp. 448–456. [Google Scholar]
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2019. Available online: https://arxiv.org/abs/1810.04805 (accessed on 5 October 2021).
- Hendrycks, D.; Gimpel, K. Gaussian error linear units (gelus). arXiv 2016, arXiv:1606.08415. [Google Scholar]
- Cortes, C.; Mohri, M.; Rostamizadeh, A. L2 regularization for learning kernels. arXiv 2012, arXiv:1205.2653. [Google Scholar]
- Müller, R.; Kornblith, S.; Hinton, G. When does label smoothing help? arXiv 2019, arXiv:1906.02629. [Google Scholar]
- Chollet, F. Keras: The python deep learning library. 2018. Available online: https://ui.adsabs.harvard.edu/abs/2018ascl.soft06022C/abstract (accessed on 5 October 2021).
- Ling, C.X.; Huang, J.; Zhang, H. AUC: A statistically consistent and more discriminating measure than accuracy. IJCAI 2003, 3, 519–524. [Google Scholar]
- El-Shafai, W.; Abd El-Samie, F. Extensive COVID-19 X-ray and CT Chest Images Dataset. Mendeley Data 2020, 3, 384. [Google Scholar]
- Sait, U.; Lal, K.G.; Prajapati, S.; Bhaumik, R.; Kumar, T.; Sanjana, S.; Bhalla, K. Curated Dataset for COVID-19 Posterior-Anterior Chest Radiography Images (X-rays). Mendeley Data 2020. [Google Scholar] [CrossRef]
- Qi, X.; Brown, L.G.; Foran, D.J.; Nosher, J.; Hacihaliloglu, I. Chest X-ray image phase features for improved diagnosis of COVID-19 using convolutional neural network. Int. J. Comput. Assist. Radiol. Surg. 2021, 16, 197–206. [Google Scholar] [CrossRef]
- Devaraj, S.J. Emerging Paradigms in Transform-Based Medical Image Compression for Telemedicine Environment. In Telemedicine Technologies; Elsevier: Amsterdam, The Netherlands, 2019; pp. 15–29. [Google Scholar]
- Hussain, Z.; Gimenez, F.; Yi, D.; Rubin, D. Differential data augmentation techniques for medical imaging classification tasks. In AMIA Annual Symposium Proceedings; American Medical Informatics Association: Bethesda, MD, USA, 2017; Volume 2017, p. 979. [Google Scholar]
- Muhammad, K.; Khan, S.; Del Ser, J.; de Albuquerque, V.H.C. Deep learning for multigrade brain tumor classification in smart healthcare systems: A prospective survey. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 507–522. [Google Scholar] [CrossRef]
- Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. Available online: https://www.tensorflow.org/ (accessed on 10 May 2021).
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22 October 2017; pp. 618–626. [Google Scholar]
Task | Accuracy | Precision | Recall | F1 Score | AUC |
---|---|---|---|---|---|
Binary-class | 0.98 | 0.97 | 0.97 | 0.97 | 0.99 |
Multi-class | 0.92 | 0.93 | 0.89 | 0.91 | 0.98 |
Model | Accuracy | Precision | Recall | F1 Score | AUC |
---|---|---|---|---|---|
Inception-V3 [31] | 0.90 | 0.89 | 0.91 | 0.89 | 0.92 |
EfficientNet-B0 [30] | 0.89 | 0.88 | 0.89 | 0.88 | 0.92 |
MobileNet-V2 [31] | 0.90 | 0.90 | 0.89 | 0.90 | 0.92 |
ResNet-V2 [29,31] | 0.88 | 0.87 | 0.86 | 0.86 | 0.93 |
VGG-16 [23,27,29,31] | 0.87 | 0.87 | 0.85 | 0.86 | 0.90 |
Xception [24,31] | 0.90 | 0.92 | 0.87 | 0.90 | 0.93 |
DenseNet-121 [25,29,31] | 0.88 | 0.90 | 0.85 | 0.87 | 0.92 |
COVID-Transformer (Ours) | 0.92 | 0.93 | 0.89 | 0.91 | 0.98 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Shome, D.; Kar, T.; Mohanty, S.N.; Tiwari, P.; Muhammad, K.; AlTameem, A.; Zhang, Y.; Saudagar, A.K.J. COVID-Transformer: Interpretable COVID-19 Detection Using Vision Transformer for Healthcare. Int. J. Environ. Res. Public Health 2021, 18, 11086. https://doi.org/10.3390/ijerph182111086