Identifying Malignant Breast Ultrasound Images Using ViT-Patch
Abstract
1. Introduction
2. Methods
2.1. Standard ViT for Malignant Identification
2.2. Improved ViT-Patch Architecture
3. Experiments
3.1. Dataset
3.2. Experimental Setting
3.3. Experimental Results
3.4. Learned Features for ViT vs. ViT-Patch
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Correction Statement
References
1. Zheng, W.; Yang, B.; Xiao, Y.; Tian, J.; Liu, S.; Yin, L. Low-Dose CT Image Post-Processing Based on Learn-Type Sparse Transform. Sensors 2022, 22, 2883.
2. Nikolaev, A.V.; de Jong, L.; Weijers, G.; Groenhuis, V.; Mann, R.M.; Siepel, F.J.; Maris, B.M.; Stramigioli, S.; Hansen, H.H.G.; de Korte, C.L. Quantitative Evaluation of an Automated Cone-Based Breast Ultrasound Scanner for MRI–3D US Image Fusion. IEEE Trans. Med. Imaging 2021, 40, 1229–1239.
3. Xu, S.; Yang, B.; Xu, C.; Tian, J.; Liu, Y.; Yin, L.; Liu, S.; Zheng, W.; Liu, C. Sparse Angle CBCT Reconstruction Based on Guided Image Filtering. Front. Oncol. 2022, 12, 832037.
4. Brosch, T.; Tam, R. Manifold Learning of Brain MRIs by Deep Learning. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI), Nagoya, Japan, 22–26 September 2013.
5. Plis, S.M.; Hjelm, D.R.; Salakhutdinov, R.; Calhoun, V.D. Deep learning for neuroimaging: A validation study. arXiv 2013, arXiv:1312.5847.
6. Cao, Z.; Duan, L.; Yang, G.; Yue, T.; Chen, Q.; Fu, H.; Xu, Y. Breast Tumor Detection in Ultrasound Images Using Deep Learning. In Proceedings of the Patch-Based Techniques in Medical Imaging; Wu, G., Munsell, B.C., Zhan, Y., Bai, W., Sanroma, G., Coupé, P., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 121–128.
7. Hatamizadeh, A.; Tang, Y.; Nath, V.; Yang, D.; Myronenko, A.; Landman, B.; Roth, H.R.; Xu, D. UNETR: Transformers for 3D Medical Image Segmentation. In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 4–8 January 2022; pp. 1748–1758.
8. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762.
9. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
10. Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021.
11. Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions. arXiv 2021, arXiv:2102.12122.
12. Song, L.; Liu, G.; Ma, M. TD-Net: Unsupervised medical image registration network based on Transformer and CNN. Appl. Intell. 2022, 52, 18201–18209.
13. Wu, Y.; Qi, S.; Sun, Y.; Xia, S.; Yao, Y.; Qian, W. A vision transformer for emphysema classification using CT images. Phys. Med. Biol. 2021, 66, 245016.
14. Gao, X.; Qian, Y.; Gao, A. COVID-VIT: Classification of COVID-19 from CT chest images based on vision transformer models. arXiv 2021, arXiv:2107.01682.
15. Gao, Y.; Zhou, M.; Metaxas, D. UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation. arXiv 2021, arXiv:2107.00781.
16. Peiris, H.; Hayat, M.; Chen, Z.; Egan, G.; Harandi, M. A Robust Volumetric Transformer for Accurate 3D Tumor Segmentation. arXiv 2021, arXiv:2111.13300.
17. Yan, X.; Tang, H.; Sun, S.; Ma, H.; Kong, D.; Xie, X. AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation. arXiv 2021, arXiv:2110.10403.
18. Gheflati, B.; Rivaz, H. Vision Transformer for Classification of Breast Ultrasound Images. arXiv 2021, arXiv:2110.14731.
19. Shamshad, F.; Khan, S.; Waqas Zamir, S.; Haris Khan, M.; Hayat, M.; Shahbaz Khan, F.; Fu, H. Transformers in Medical Imaging: A Survey. arXiv 2022, arXiv:2201.09873.
20. Tu, Z.; Talebi, H.; Zhang, H.; Yang, F.; Milanfar, P.; Bovik, A.; Li, Y. MaxViT: Multi-Axis Vision Transformer. arXiv 2022, arXiv:2204.01697.
21. Yi, Y.; Zhao, H.; Hu, Z.; Peng, J. A local–global transformer for distributed monitoring of multi-unit nonlinear processes. J. Process Control 2023, 122, 13–26.
22. Yuan, L.; Chen, Y.; Wang, T.; Yu, W.; Shi, Y.; Jiang, Z.; Tay, F.E.; Feng, J.; Yan, S. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet. arXiv 2021, arXiv:2101.11986.
23. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. arXiv 2021, arXiv:2103.14030.
24. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805.
25. Wang, Q.; Li, B.; Xiao, T.; Zhu, J.; Li, C.; Wong, D.F.; Chao, L.S. Learning Deep Transformer Models for Machine Translation. arXiv 2019, arXiv:1906.01787.
26. Baevski, A.; Auli, M. Adaptive Input Representations for Neural Language Modeling. arXiv 2018, arXiv:1809.10853.
27. Al-Dhabyani, W.; Gomaa, M.; Khaled, H.; Fahmy, A. Dataset of breast ultrasound images. Data Brief 2020, 28, 104863.
28. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
29. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
30. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. arXiv 2015, arXiv:1512.00567.
31. Liu, S.; Deng, W. Very deep convolutional neural network based image classification using small training sample size. In Proceedings of the 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Kuala Lumpur, Malaysia, 3–6 November 2015; pp. 730–734.
32. Han, K.; Xiao, A.; Wu, E.; Guo, J.; Xu, C.; Wang, Y. Transformer in Transformer. arXiv 2021, arXiv:2103.00112.
33. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv 2015, arXiv:1506.01497.
| Characteristics | Benign | Malignant |
|---|---|---|
| Shape | Regular | Irregular |
| Margins | Smooth | Irregular |
| Model | ACC (%) | SEN (%) | Params |
|---|---|---|---|
| ResNet50 [29] | 83.1 | 48.5 | - |
| InceptionV3 [30] | 86.4 | 54.5 | - |
| VGG16 [31] | 88.1 | 66.7 | - |
| Swin-B [23] | 86.4 | 60.6 | 88 M |
| TNT-B [32] | 83.1 | 63.6 | 65.6 M |
| T2T-ViT-24 [22] | 84.7 | 51.5 | 64.1 M |
| PVT-Large [11] | 85.6 | 60.6 | 61.4 M |
| ViT / ViT-Patch (16) | 85.6 / 89.0 | 66.7 / 69.7 | 88.46 M / 90.82 M |
| ViT / ViT-Patch (32) | 84.7 / 89.8 | 60.6 / 72.7 | 89.89 M / 92.26 M |
| ViT / ViT-Patch (64) | 80.5 / 85.6 | 57.6 / 54.5 | 96.89 M / 99.25 M |
| ViT / ViT-Patch (128) | 79.7 / 78.8 | 54.5 / 54.5 | 125.18 M / 127.54 M |
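For readers unfamiliar with the column abbreviations: ACC is overall classification accuracy and SEN is sensitivity (recall on the malignant class), the metric a screening tool most needs to keep high. Below is a minimal sketch of how these two numbers are computed from binary predictions; the function name `acc_sen` and the toy labels are illustrative, not taken from the paper.

```python
import numpy as np

def acc_sen(y_true, y_pred):
    """Accuracy and sensitivity for binary labels (1 = malignant, 0 = benign)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)             # malignant correctly flagged
    tn = np.sum(~y_true & ~y_pred)           # benign correctly passed
    fn = np.sum(y_true & ~y_pred)            # malignant missed
    acc = (tp + tn) / y_true.size
    sen = tp / (tp + fn) if (tp + fn) > 0 else 0.0  # recall on the malignant class
    return acc, sen

# Toy example: 10 test images, 4 of them malignant.
acc, sen = acc_sen([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                   [1, 1, 0, 1, 0, 0, 1, 0, 0, 0])
print(f"ACC = {acc:.1%}, SEN = {sen:.1%}")  # ACC = 80.0%, SEN = 75.0%
```

Read against the table, the gains are cheap: at the two smaller patch settings (16 and 32), ViT-Patch improves both ACC and SEN over the plain ViT for roughly 2.4 M extra parameters.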
| Model | DICE (%) |
|---|---|
| ViT-Patch (16) | 62.0 |
| ViT-Patch (32) | 61.5 |
| ViT-Patch (64) | 63.1 |
| ViT-Patch (128) | 68.7 |
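The DICE column presumably scores the model's predicted tumor region against the ground-truth lesion mask: twice the overlap divided by the total foreground of the two masks. A minimal sketch under that assumption follows; the mask shapes and the `eps` smoothing term are illustrative choices, not the paper's implementation.

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.sum(pred & target)           # shared foreground pixels
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# Toy example: 4x4 masks with 4 and 3 foreground pixels, 2 of them shared.
a = np.zeros((4, 4), dtype=bool); a[1:3, 1:3] = True
b = np.zeros((4, 4), dtype=bool); b[1:3, 1] = True; b[3, 1] = True
print(f"DICE = {dice(a, b):.3f}")  # 2*2 / (4+3) ≈ 0.571
```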