MMAformer: Multiscale Modality-Aware Transformer for Medical Image Segmentation
Abstract
1. Introduction
- In this paper, we propose a four-stage Transformer framework for processing parallel multimodal medical images, built around a cross-modality downsampling (CMD) module. CMD modules select the modality types of interest to the model at different scales, enhancing the Transformer network's modality awareness for multimodal medical images;
- We design a multimodality gated aggregation (MGA) block that combines a dual-attention mechanism with multi-gated clustering, efficiently enhancing and integrating spatial, channel, and modal features across imaging modalities (a minimal, hypothetical sketch of this gated, modality-aware fusion idea follows this list);
- We conduct extensive experiments on several datasets to validate the segmentation accuracy of the proposed model. For example, on the BraTS2021 dataset we raise the average Dice score from 90.66% to 91.53% over the previously best-performing method, an absolute gain of 0.87 percentage points (roughly 1%). Additional experiments in the experimental section illustrate the model's convergence stability.
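The CMD and MGA modules themselves are specified in Sections 3.3 and 3.4 and are not reproduced here. The snippet below is only a minimal PyTorch-style sketch of the general pattern they share: per-modality 3D features are downsampled with a strided convolution, a gate learned from globally pooled descriptors scores each modality at that scale, and a gated sum fuses the modalities. All class names, shapes, and hyperparameters (ModalityGate, GatedDownsample, the four-modality example input) are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of modality-aware downsampling with gated fusion.
# This is NOT the authors' implementation; names and shapes are illustrative only.
import torch
import torch.nn as nn


class ModalityGate(nn.Module):
    """Scores each modality from its globally pooled features (cf. the paper's
    'Global Average Pooling' step); the softmax makes modalities compete."""

    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(channels, channels // 2),
            nn.GELU(),
            nn.Linear(channels // 2, 1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, M, C, D, H, W) -> gates of shape (B, M, 1, 1, 1, 1)
        pooled = feats.mean(dim=(-3, -2, -1))             # global average pooling -> (B, M, C)
        gates = torch.softmax(self.score(pooled), dim=1)  # one weight per modality
        return gates.view(*gates.shape[:2], 1, 1, 1, 1)


class GatedDownsample(nn.Module):
    """One scale: a strided 3D conv applied per modality, then a gated sum across modalities."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.down = nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.gate = ModalityGate(out_ch)

    def forward(self, x: torch.Tensor):
        # x: (B, M, C, D, H, W); the same conv downsamples every modality
        b, m, c, d, h, w = x.shape
        y = self.down(x.reshape(b * m, c, d, h, w))
        y = y.view(b, m, -1, d // 2, h // 2, w // 2)
        g = self.gate(y)               # modality awareness: which modality matters at this scale
        fused = (g * y).sum(dim=1)     # gated aggregation -> (B, out_ch, D/2, H/2, W/2)
        return y, fused


if __name__ == "__main__":
    x = torch.randn(1, 4, 16, 32, 32, 32)   # e.g. 4 MRI modalities: T1, T1ce, T2, FLAIR
    per_modality, fused = GatedDownsample(in_ch=16, out_ch=32)(x)
    print(per_modality.shape, fused.shape)  # (1, 4, 32, 16, 16, 16), (1, 32, 16, 16, 16)
```

The actual CMD and MGA blocks additionally use dual attention and multi-gated clustering (Section 3.4); the sketch only conveys the downsample-then-gate pattern at a single scale.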
2. Related Work
2.1. CNN-Based Segmentation Networks
2.2. Transformers-Based Segmentation Networks
3. Methods
3.1. Dataset and Pre-Processing
3.2. Encoder
Global Average Pooling
3.3. Cross-Modality Downsampling
3.4. Multimodality Gated Aggregating
3.4.1. Cluster Spatial and Channel Information
3.4.2. Cluster Intermodal Information
3.5. Loss Function
3.6. Evaluation Metrics
4. Experimental Setup
4.1. Implementation Details
4.2. Experiments
4.2.1. Experiment 1—Comparison on the BraTS2020 Dataset
4.2.2. Experiment 2—Comparison on the BraTS2021 Dataset
4.2.3. Experiment 3—Ablation Study on BraTS2021
4.2.4. Experiment 4—Comparison of Model Convergence Stability During Training
5. Results and Discussion
Limitation
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Balwant, M.K. A Review on Convolutional Neural Networks for Brain Tumor Segmentation: Methods, Datasets, Libraries, and Future Directions. IRBM 2022, 43, 521–537. [Google Scholar] [CrossRef]
- Ostrom, Q.T.; Patil, N.; Cioffi, G.; Waite, K.; Kruchko, C.; Barnholtz-Sloan, J.S. CBTRUS statistical report: Primary brain and central nervous system tumors diagnosed in the United States in 2013–2017. Neuro-Oncology 2020, 22 (Suppl. 1), iv1–iv96. [Google Scholar] [CrossRef] [PubMed]
- Bakas, S.; Akbari, H.; Sotiras, A.; Bilello, M.; Rozycki, M.; Kirby, J.S.; Freymann, J.B.; Farahani, K.; Davatzikos, C. Advancing the cancer genome atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci. Data 2017, 4, 170117. [Google Scholar] [CrossRef] [PubMed]
- Bakas, S.; Reyes, M.; Jakab, A.; Bauer, S.; Rempfler, M.; Crimi, A.; Shinohara, R.T.; Berger, C.; Ha, S.M.; Rozycki, M.; et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BraTS challenge. arXiv 2018. [Google Scholar] [CrossRef]
- Menze, B.H.; Jakab, A.; Bauer, S.; Kalpathy-Cramer, J.; Farahani, K.; Kirby, J.; Burren, Y.; Porz, N.; Slotboom, J.; Wiest, R.; et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 2015, 34, 1993–2024. [Google Scholar] [CrossRef]
- Baid, U.; Ghodasara, S.; Mohan, S.; Bilello, M.; Calabrese, E.; Colak, E.; Farahani, K.; Kalpathy-Cramer, J.; Kitamura, F.C.; Pati, S.; et al. The rsna-asnr-miccai brats 2021 benchmark on brain tumor segmentation and radiogenomic classification. arXiv 2021. [Google Scholar] [CrossRef]
- Xie, H.; Yang, D.; Sun, N.; Chen, Z.; Zhang, Y. Automated pulmonary nodule detection in CT images using deep convolutional neural networks. Pattern Recognit. 2019, 85, 109–119. [Google Scholar] [CrossRef]
- Pak, M.; Kim, S. A review of deep learning in image recognition. In Proceedings of the 2017 4th International Conference on Computer Applications and Information Processing Technology (CAIPT), Kuta Bali, Indonesia, 8–10 August 2017; pp. 1–3. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015; pp. 1–14. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar] [CrossRef]
- Du, G.; Cao, X.; Liang, J.; Chen, X.; Zhan, Y. Medical Image Segmentation based on U-Net: A Review. J. Imaging Sci. Technol. 2020, 64, 1–12. [Google Scholar] [CrossRef]
- Qamar, S.; Jin, H.; Zheng, R.; Ahmad, P.; Usama, M. A variant form of 3D-UNet for infant brain segmentation. Future Gener. Comput. Syst. 2020, 108, 613–623. [Google Scholar] [CrossRef]
- Wang, R.; Lei, T.; Cui, R.; Zhang, B.; Meng, H.; Nandi, A.K. Medical image segmentation using deep learning: A survey. IET Image Process. 2022, 16, 1243–1267. [Google Scholar] [CrossRef]
- Wu, W.; Gao, L.; Duan, H.; Huang, G.; Ye, X.; Nie, S. Segmentation of pulmonary nodules in CT images based on 3D-UNET combined with three-dimensional conditional random field optimization. Med. Phys. 2020, 47, 4054–4063. [Google Scholar] [CrossRef] [PubMed]
- Milletari, F.; Navab, N.; Ahmadi, S.-A. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 565–571. [Google Scholar] [CrossRef]
- Guan, X.; Yang, G.; Ye, J.; Yang, W.; Xu, X.; Jiang, W.; Lai, X. 3D AGSE-VNet: An automatic brain tumor MRI data segmentation framework. BMC Med. Imaging 2022, 22, 6. [Google Scholar] [CrossRef] [PubMed]
- Vaswani, A.; Shazeer, N.M.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010. [Google Scholar] [CrossRef]
- Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in vision: A survey. ACM Comput. Surv. (CSUR) 2022, 54, 1–41. [Google Scholar] [CrossRef]
- Chen, J.; Lu, Y.; Yu, Q.T. TransUNet: Transformers make strong encoders for medical image segmentation. arXiv 2021. [Google Scholar] [CrossRef]
- Selvi, S.; Vishvaksenan, A.; Rajasekar, E. Cold metal transfer (CMT) technology-An overview. Def. Technol. 2018, 14, 28–44. [Google Scholar] [CrossRef]
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-like pure transformer for medical image segmentation. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 18 February 2023. [Google Scholar] [CrossRef]
- Wang, W.; Chen, C.; Ding, M.; Yu, H.; Zha, S.; Li, J. TransBTS: Multimodal brain tumor segmentation using transformer. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Strasbourg, France, 21 September 2021. [Google Scholar] [CrossRef]
- Wu, Z.; Liu, Z.; Lin, J.; Lin, Y.; Han, S. Lite Transformer with long-short range attention. arXiv 2020. [Google Scholar] [CrossRef]
- Yu, W.H.; Luo, M.; Zhou, P.; Si, C.Y.; Zhou, Y.C.; Wang, X.C.; Feng, J.S.; Yan, S.C. Metaformer is actually what you need for vision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10819–10829. [Google Scholar] [CrossRef]
- Hatamizadeh, A.; Tang, Y.C.; Nath, V.; Yang, D.; Myronenko, A.; Landman, B. Unetr: Transformers for 3d medical image segmentation. In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 4–8 January 2022; pp. 574–584. [Google Scholar] [CrossRef]
- Hatamizadeh, A.; Nath, V.; Tang, Y.C.; Yang, D.; Roth, H.R.; Xu, D.G. Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images. In International MICCAI Brainlesion Workshop; Springer International Publishing: Cham, Switzerland, 2021; pp. 272–284. [Google Scholar] [CrossRef]
- Xing, Z.H.; Yu, L.Q.; Wan, L.; Han, T.; Zhu, L. NestedFormer: Nested modality-aware transformer for brain tumor segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer Nature: Cham, Switzerland, 2022; pp. 140–150. [Google Scholar] [CrossRef]
- Zhang, Y.; He, N.J.; Yang, J.W.; Li, Y.X.; Dong, W.; Huang, Y.W.; Zhang, Y.; He, Z.Q.; Zheng, Y.F. mmformer: Multimodal medical transformer for incomplete multimodal learning of brain tumor segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer Nature: Cham, Switzerland, 2022; pp. 107–117. [Google Scholar] [CrossRef]
- Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, 17–21 October 2016; pp. 424–432. [Google Scholar] [CrossRef]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef]
- Isensee, F.; Jaeger, P.F.; Kohl, S.A.A.; Petersen, J.; Maier-Hein, K.H. nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 2021, 18, 203–211. [Google Scholar] [CrossRef]
- Wang, L.B.; Li, R.; Zhang, C.; Fang, S.H.; Duan, C.X.; Meng, X.L.; Peter, M.A. UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery. ISPRS J. Photogramm. Remote Sens. 2022, 190, 196–214. [Google Scholar] [CrossRef]
- Dolz, J.; Gopinath, K.; Yuan, J.; Lombaert, H.; Desrosiers, C.; Ayed, I.B. HyperDense-Net: A hyper-densely connected CNN for multi-modal image segmentation. IEEE Trans. Med. Imaging 2018, 38, 1116–1126. [Google Scholar] [CrossRef] [PubMed]
- Cardoso, M.J.; Li, W.; Brown, R.; Ma, N.; Kerfoot, E.; Wang, Y.; Murrey, B.; Myronenko, A.; Zhao, C.; Yang, D.; et al. Monai: An open-source framework for deep learning in healthcare. arXiv 2022. [Google Scholar] [CrossRef]
- Nam, H.; Ha, J.W.; Kim, J. Dual attention networks for multimodal reasoning and matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 299–307. [Google Scholar] [CrossRef]
- Zhao, R.; Qian, B.; Zhang, X.; Li, Y.; Wei, R.; Liu, Y.; Pan, Y. Rethinking dice loss for medical image segmentation. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020. [Google Scholar] [CrossRef]
- Zhang, Z.; Sabuncu, M. Generalized cross entropy loss for training deep neural networks with noisy labels. Adv. Neural Inf. Process. Syst. 2018, 31, 8792–8802. [Google Scholar] [CrossRef]
- Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017. [Google Scholar] [CrossRef]
- Pham, Q.D.; Nguyen-Truong, H.; Phuong, N.N.; Nguyen, K.N.; Nguyen, C.D.; Bui, T.; Truong, S.Q. Segtransvae: Hybrid cnn-transformer with regularization for medical image segmentation. In Proceedings of the 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), Kolkata, India, 28–31 March 2022. [Google Scholar] [CrossRef]
- Liang, J.; Yang, C.; Zeng, L. 3D PSwinBTS: An efficient transformer-based Unet using 3D parallel shifted windows for brain tumor segmentation. Digit. Signal Process. 2022, 131, 103784. [Google Scholar] [CrossRef]
- Lin, J.; Lu, C.; Chen, H.; Lin, H.; Zhao, B.; Shi, Z.; Qiu, B.; Pan, X.; Xu, Z.; Huang, B. CKD-TransBTS: Clinical knowledge-driven hybrid transformer with modality-correlated cross-attention for brain tumor segmentation. IEEE Trans. Med. Imaging 2023, 42, 2451–2461. [Google Scholar] [CrossRef]
| Methods | Params (M) ↓ | FLOPs (G) ↓ | WT Dice (%) ↑ | WT HD95 ↓ | TC Dice (%) ↑ | TC HD95 ↓ | ET Dice (%) ↑ | ET HD95 ↓ | Avg Dice (%) ↑ | Avg HD95 ↓ |
|---|---|---|---|---|---|---|---|---|---|---|
| 3D-UNet [12] | 5.75 | 1449.59 | 88.2 | 5.113 | 83.0 | 6.604 | 78.2 | 6.715 | 83.1 | 6.144 |
| SegResNet [31] | 18.79 | 185.23 | 90.3 | 4.578 | 84.5 | 5.667 | 79.6 | 7.064 | 84.8 | 5.763 |
| nnUNet [32] | 5.75 | 1449.59 | 90.7 | 6.94 | 84.8 | 5.069 | 81.4 | 5.851 | 85.6 | 5.953 |
| SwinUNet (2D) [21] | 27.17 | 357.49 | 87.2 | 6.752 | 80.9 | 8.071 | 74.4 | 10.644 | 80.8 | 8.489 |
| TransBTS [22] | 32.99 | 333 | 91.0 | 4.141 | 85.5 | 5.894 | 79.1 | 5.463 | 85.2 | 5.166 |
| UNETR [25] | 92.58 | 41.19 | 89.9 | 4.314 | 84.2 | 5.843 | 78.8 | 5.598 | 84.3 | 5.251 |
| SwinUNETR [26] | 62.5 | 295 | 92.0 | 4.907 | 85.3 | 7.218 | 80.5 | 11.419 | 85.9 | 7.848 |
| MMAformer (ours) | 18.76 | 104.6 | 91.3 | 3.873 | 86.6 | 4.559 | 81.1 | 5.890 | 86.3 | 4.774 |
| Methods | WT Dice (%) | TC Dice (%) | ET Dice (%) | Avg Dice (%) | Year |
|---|---|---|---|---|---|
| DynUnet [32] | 92.88 | 89.71 | 85.81 | 89.46 | 2021 |
| SegTransVAE [40] | 92.54 | 89.99 | 86.22 | 89.58 | 2022 |
| SwinUNETR [26] | 92.73 | 89.98 | 86.81 | 89.84 | 2022 |
| PSwinBTS [41] | 93.62 | 90.43 | 88.25 | 90.76 | 2022 |
| CKD-TransBTS [42] | 93.33 | 90.16 | 88.50 | 90.66 | 2023 |
| MMAformer (ours) | 93.58 | 92.21 | 88.78 | 91.53 | 2024 |
| Methods | Training Dice (%) | Validation Dice (%) | Testing Dice (%) |
|---|---|---|---|
| U-Net | 89.68 | 89.83 | 89.10 |
| Baseline | 90.23 | 90.48 | 90.35 |
| Baseline + CMD | 90.66 | 90.88 | 90.80 |
| Baseline + CMD + MGA | 91.56 | 91.60 | 91.53 |
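The comparison and ablation tables above report Dice (in %) and the 95th-percentile Hausdorff distance (HD95). MONAI is cited in the references (Cardoso et al.), and its built-in metric classes are one common way to compute these scores; the hedged sketch below illustrates that usage with toy tensors, and does not reproduce the authors' evaluation pipeline.

```python
# Illustrative Dice / HD95 computation with MONAI's metric classes
# (a sketch under assumed tensor shapes, not the authors' evaluation code).
import torch
from monai.metrics import DiceMetric, HausdorffDistanceMetric

dice_metric = DiceMetric(include_background=False, reduction="mean")
hd95_metric = HausdorffDistanceMetric(include_background=False, percentile=95, reduction="mean")

# Toy one-hot volumes: (batch, channels, D, H, W); channel 0 = background,
# channel 1 = one tumor sub-region (e.g. WT, TC, or ET after post-processing).
pred = torch.zeros(1, 2, 8, 8, 8)
label = torch.zeros(1, 2, 8, 8, 8)
pred[:, 1, 2:6, 2:6, 2:6] = 1    # predicted 4x4x4 cube
label[:, 1, 2:7, 2:6, 2:6] = 1   # ground truth extends one voxel further
pred[:, 0] = 1 - pred[:, 1]
label[:, 0] = 1 - label[:, 1]

dice_metric(y_pred=pred, y=label)   # accumulate per-case scores
hd95_metric(y_pred=pred, y=label)
print(f"Dice: {dice_metric.aggregate().item():.4f}")   # overlap in [0, 1]; tables report it in %
print(f"HD95: {hd95_metric.aggregate().item():.4f}")   # boundary distance (voxel units here)
dice_metric.reset()
hd95_metric.reset()
```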