DualTrans: A Novel Glioma Segmentation Framework Based on a Dual-Path Encoder Network and Multi-View Dynamic Fusion Model
Abstract
1. Introduction
- (a)
- A dual-path encoder architecture based on a CNN and an improved Swin Transformer is proposed. The convolutional branch extracts rich local features and local dependencies, while the improved Swin Transformer branch learns global dependencies for global modeling; the two feature streams are then fused and upsampled to produce the segmentation result. Deeply integrating CNN and Transformer leverages the strengths of both frameworks and effectively improves the recognition of brain-tumor boundaries. Finally, as the depth of the Swin Transformer increases, the amplitude differences of cross-layer activations grow significantly, mainly because the outputs of the residual units are added directly to the main branch. This instability in large Swin Transformer models is addressed by normalizing the activations of each residual branch before merging them back into the main branch, improving training stability through a structural modification (a sketch of this normalization is given after this list).
- (b)
- A new positional encoding module is proposed. By adding a trainable parameter over the local window (M × M × M) and incorporating positional information into self-attention training, the Swin Transformer encoder obtains rich positional information, which helps to improve the segmentation accuracy of the brain-tumor model, especially the recognition of brain-tumor boundary regions. Here, M denotes the local window size used during Swin Transformer training, and r denotes the relative positional offset within that window (see the sketch after this list).
- (c)
- For benchmark validation, this study used the publicly available BraTS 2021 [6,7,8] and BraTS 2019 [9] datasets, provided by the organizers of MICCAI (International Conference on Medical Image Computing and Computer-Assisted Intervention) as part of the BraTS challenge. The experimental results on BraTS 2021 and BraTS 2019 demonstrate the effectiveness of the model and a further improvement in brain-tumor segmentation accuracy.
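The following is a minimal PyTorch-style sketch of the residual-branch normalization described in contribution (a): each residual branch (attention and MLP) is passed through LayerNorm before being added back to the main path, which bounds the growth of cross-layer activation amplitudes in deep models. The class name, module layout, and hyperparameters are illustrative assumptions, not the exact DualTrans implementation.

```python
import torch.nn as nn

class ResPostNormBlock(nn.Module):
    """Illustrative Transformer block in which each residual branch is
    normalized *before* being merged back into the main path, so the
    amplitude of cross-layer activations stays bounded as depth grows."""

    def __init__(self, dim, num_heads, mlp_ratio=4.0):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)   # normalizes the attention branch output
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )
        self.norm2 = nn.LayerNorm(dim)   # normalizes the MLP branch output

    def forward(self, x):
        # The residual branch output is normalized before the addition,
        # instead of normalizing the main path first (pre-norm).
        x = x + self.norm1(self.attn(x, x, x, need_weights=False)[0])
        x = x + self.norm2(self.mlp(x))
        return x
```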
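Similarly, the sketch below illustrates one way to realize the trainable positional parameter of contribution (b): a learnable relative-position bias table, indexed by the 3D offset r between every pair of voxels inside an M × M × M window, is added to the attention logits. Variable names, tensor layout, and the exact indexing scheme are assumptions for illustration rather than the published implementation.

```python
import torch
import torch.nn as nn

class WindowAttention3D(nn.Module):
    """Window self-attention with a trainable relative-position bias table
    for an M x M x M local window (illustrative sketch)."""

    def __init__(self, dim, num_heads, window_size):  # window_size = M
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

        M = window_size
        # One trainable bias per head for each possible 3D relative offset r
        # in (-M+1 .. M-1) along depth, height and width.
        self.rel_bias = nn.Parameter(torch.zeros((2 * M - 1) ** 3, num_heads))

        # Precompute, for every pair of voxels in the window, the index of
        # their relative offset into the bias table.
        coords = torch.stack(torch.meshgrid(
            torch.arange(M), torch.arange(M), torch.arange(M),
            indexing="ij")).flatten(1)                      # 3 x M^3
        rel = coords[:, :, None] - coords[:, None, :]       # 3 x M^3 x M^3
        rel = rel.permute(1, 2, 0) + (M - 1)                # shift offsets to >= 0
        index = (rel[..., 0] * (2 * M - 1) ** 2
                 + rel[..., 1] * (2 * M - 1) + rel[..., 2])
        self.register_buffer("rel_index", index)            # M^3 x M^3

    def forward(self, x):                                    # x: (B, M^3, dim)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)                 # each (B, heads, N, d)
        attn = (q @ k.transpose(-2, -1)) * self.scale        # (B, heads, N, N)
        bias = self.rel_bias[self.rel_index].permute(2, 0, 1)  # (heads, N, N)
        attn = (attn + bias).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```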
2. Related Work
2.1. Swin Transformer
2.2. Position Embedding
3. Materials and Methods
3.1. Overall Architecture
3.2. Network Encoder
3.3. Network Decoder
4. Experimental Results
4.1. Data and Evaluation Indicators
- (a)
- The background region was cropped according to the proportion of tumor and non-tumor regions, and 128 × 128 × 128 patches were extracted.
- (b)
- Random scaling is applied with a scaling factor in [0.8, 1.2] and a probability of 0.15.
- (c)
- Gaussian N(0, 0.01) noise is added.
- (d)
- Gaussian smoothing is performed with α ∈ [0.5, 1.15].
- (e)
- Random flipping along the axial, coronal, and sagittal planes is applied with a probability of 0.5 for each plane (a minimal sketch of this augmentation pipeline follows this list).
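The following NumPy/SciPy sketch covers steps (b)-(e) above with the probabilities and ranges stated in the text. It assumes that N(0, 0.01) denotes the noise variance, that the smoothing parameter is the Gaussian kernel width, and it omits the re-cropping or padding back to 128 × 128 × 128 that would follow scaling in practice; the function name and sampling details are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def augment(volume, rng=np.random.default_rng()):
    """Apply the stochastic augmentations described above to a 3D patch."""
    # (b) random scaling with a factor in [0.8, 1.2], probability 0.15
    if rng.random() < 0.15:
        volume = zoom(volume, rng.uniform(0.8, 1.2), order=1)

    # (c) additive Gaussian noise N(0, 0.01); 0.01 assumed to be the variance
    volume = volume + rng.normal(0.0, np.sqrt(0.01), size=volume.shape)

    # (d) Gaussian smoothing with a kernel width drawn from [0.5, 1.15]
    volume = gaussian_filter(volume, sigma=rng.uniform(0.5, 1.15))

    # (e) random flips along the axial, coronal, and sagittal axes, p = 0.5 each
    for axis in range(3):
        if rng.random() < 0.5:
            volume = np.flip(volume, axis=axis)
    return volume
```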
- (a)
- The Dice coefficient is a commonly used index for measuring the segmentation accuracy of brain tumors. It measures the accuracy of the model by computing the overlap between the model prediction and the ground-truth label: the closer the Dice coefficient is to one, the higher the overlap and the better the performance (the standard definition is given after this list).
- (b)
- The 95% Hausdorff distance represents the surface distance between the prediction and the ground truth, taken as the 95th percentile of the point-to-set distances. Unlike the Dice coefficient, which is sensitive to the interior filling of the mask, the Hausdorff distance is sensitive to the segmentation boundary of the brain tumor and can therefore effectively assess the boundaries of the enhancing-tumor and tumor-core regions. It measures the distance between two subsets of the space, where d denotes the distance from each voxel in one set to the closest voxel with the same label in the other set, X denotes the ground-truth voxel labels, and Y denotes the predicted voxel labels (see the formula after this list).
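In the standard form consistent with the description in (a), the Dice coefficient is

$$ \mathrm{Dice}(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|}, $$

where X and Y denote the sets of ground-truth and predicted voxels of a given tumor sub-region.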
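The Hausdorff distance described in (b) is commonly written as

$$ \mathrm{HD}(X, Y) = \max\Big\{ \max_{x \in X} \min_{y \in Y} d(x, y),\ \max_{y \in Y} \min_{x \in X} d(x, y) \Big\}, $$

where d(x, y) is the Euclidean distance between voxels x and y; the 95% variant (HD95) replaces the outer maxima with the 95th percentile of the directed distances, which makes the metric robust to a small number of outlier voxels.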
4.2. Main Results
4.3. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25. Available online: https://papers.nips.cc/paper_files/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html (accessed on 17 April 2024).
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer International Publishing: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
- Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, 20 September 2018; Proceedings 4. Springer International Publishing: Berlin/Heidelberg, Germany, 2018; pp. 3–11.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Baid, U.; Ghodasara, S.; Mohan, S.; Bilello, M.; Calabrese, E.; Colak, E.; Bakas, S. The RSNA-ASNR-MICCAI BraTS 2021 Benchmark on Brain Tumor Segmentation and Radiogenomic Classification. arXiv 2021, arXiv:2107.02314.
- Menze, B.H.; Jakab, A.; Bauer, S.; Kalpathy-Cramer, J.; Farahani, K.; Kirby, J.; Van Leemput, K. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Trans. Med. Imaging 2015, 34, 1993–2024.
- Bakas, S.; Akbari, H.; Sotiras, A.; Bilello, M.; Rozycki, M.; Kirby, J.S.; Davatzikos, C. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci. Data 2017, 4, 170117.
- BraTS Challenge Organizers. BraTS2019 Challenge Dataset [Dataset]. 2019. Available online: https://www.med.upenn.edu/cbica/brats-2019/ (accessed on 17 April 2024).
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. Available online: https://proceedings.neurips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html (accessed on 17 April 2024).
- Jiao, J.; Cheng, X.; Chen, W.; Yin, X.; Shi, H.; Yang, K. Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers. arXiv 2024, arXiv:2401.16700.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022.
- Liu, Z.; Hu, H.; Lin, Y.; Yao, Z.; Xie, Z.; Wei, Y.; Ning, J.; Cao, Y.; Zhang, Z.; Dong, L.; et al. Swin transformer v2: Scaling up capacity and resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12009–12019.
- Shaw, P.; Uszkoreit, J.; Vaswani, A. Self-attention with relative position representations. arXiv 2018, arXiv:1803.02155.
- Huang, C.Z.A.; Vaswani, A.; Uszkoreit, J.; Shazeer, N.; Hawthorne, C.; Dai, A.M.; Eck, D. Music transformer: Generating music with long-term structure. arXiv 2018, arXiv:1809.04281.
- Chu, X.; Tian, Z.; Zhang, B.; Wang, X.; Shen, C. Conditional Positional Encodings for Vision Transformers. arXiv 2021, arXiv:2102.10882.
- Su, J.; Ahmed, M.; Lu, Y.; Pan, S.; Bo, W.; Liu, Y. Roformer: Enhanced transformer with rotary position embedding. Neurocomputing 2024, 568, 127063.
- Ramachandran, P.; Parmar, N.; Vaswani, A.; Bello, I.; Levskaya, A.; Shlens, J. Stand-alone self-attention in vision models. Adv. Neural Inf. Process. Syst. 2019, 32. Available online: https://proceedings.neurips.cc/paper_files/paper/2019/file/3416a75f4cea9109507cacd8e2f2aefc-Paper.pdf (accessed on 17 April 2024).
- Kamnitsas, K.; Ledig, C.; Newcombe, V.F.; Simpson, J.P.; Kane, A.D.; Menon, D.K.; Glocker, B. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 2017, 36, 61–78.
- Isensee, F.; Jäger, P.F.; Full, P.M.; Vollmuth, P.; Maier-Hein, K.H. nnU-Net for brain tumor segmentation. In Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 6th International Workshop, BrainLes 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, 4 October 2020; Revised Selected Papers, Part II 6. Springer International Publishing: Berlin/Heidelberg, Germany, 2021; pp. 118–132.
- Luu, H.M.; Park, S.H. Extending nn-UNet for brain tumor segmentation. In Proceedings of the International MICCAI Brainlesion Workshop, Virtual, 27 September 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 173–186.
- Wang, W.; Chen, C.; Ding, M.; Yu, H.; Zha, S.; Li, J. TransBTS: Multimodal brain tumor segmentation using transformer. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, 27 September–1 October 2021; Proceedings, Part I 24. Springer International Publishing: Berlin/Heidelberg, Germany, 2021; pp. 109–119.
- Xu, X.; Zhao, W.; Zhao, J. Brain tumor segmentation using attention-based network in 3D MRI images. In Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 5th International Workshop, BrainLes 2019, Held in Conjunction with MICCAI 2019, Shenzhen, China, 17 October 2019; Revised Selected Papers, Part II 5. Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 3–13.
- Zhao, G.; Zhang, J.; Xia, Y. Improving brain tumor segmentation in multi-sequence MR images using cross-sequence MR image generation. In Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 5th International Workshop, BrainLes 2019, Held in Conjunction with MICCAI 2019, Shenzhen, China, 17 October 2019; Revised Selected Papers, Part II 5. Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 27–36.
- Jiang, Z.; Ding, C.; Liu, M.; Tao, D. Two-stage cascaded u-net: 1st place solution to BraTS challenge 2019 segmentation task. In Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 5th International Workshop, BrainLes 2019, Held in Conjunction with MICCAI 2019, Shenzhen, China, 17 October 2019; Revised Selected Papers, Part I 5. Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 231–241.
- Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, 17–21 October 2016; Proceedings, Part II 19. Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 424–432.
- Hatamizadeh, A.; Nath, V.; Tang, Y.; Yang, D.; Roth, H.R.; Xu, D. Swin UNETR: Swin transformers for semantic segmentation of brain tumors in MRI images. In Proceedings of the International MICCAI Brainlesion Workshop, Virtual, 27 September 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 272–284.
- Pei, L.; Vidyaratne, L.; Monibor Rahman, M.; Shboul, Z.A.; Iftekharuddin, K.M. Multimodal brain tumor segmentation and survival prediction using hybrid machine learning. In Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 5th International Workshop, BrainLes 2019, Held in Conjunction with MICCAI 2019, Shenzhen, China, 17 October 2019; Revised Selected Papers, Part II 5. Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 73–81.
- Peiris, H.; Chen, Z.; Egan, G.; Harandi, M. Reciprocal adversarial learning for brain tumor segmentation: A solution to BraTS challenge 2021 segmentation task. In Proceedings of the International MICCAI Brainlesion Workshop, Virtual, 27 September 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 171–181.
- Jia, Q.; Shu, H. BiTr-Unet: A CNN-transformer combined network for MRI brain tumor segmentation. In Proceedings of the International MICCAI Brainlesion Workshop, Virtual, 27 September 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 3–14.
- Yuan, Y. Evaluating scale attention network for automatic brain tumor segmentation with large multi-parametric MRI database. In Proceedings of the International MICCAI Brainlesion Workshop, Virtual, 27 September 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 42–53.
- Pawar, K.; Zhong, S.; Goonatillake, D.S.; Egan, G.; Chen, Z. Orthogonal-Nets: A Large Ensemble of 2D Neural Networks for 3D Brain Tumor Segmentation. In Proceedings of the International MICCAI Brainlesion Workshop, Virtual, 27 September 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 54–67.
- Cai, X.; Lou, S.; Shuai, M.; An, Z. Feature learning by attention and ensemble with 3D U-Net to glioma tumor segmentation. In Proceedings of the International MICCAI Brainlesion Workshop, Virtual, 27 September 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 68–79.
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-like pure transformer for medical image segmentation. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer Nature Switzerland: Cham, Switzerland, 2022; pp. 205–218.
| Positional Encoding Type | Advantages | Limitations |
|---|---|---|
| Absolute Positional Encoding | Provides precise location information vital for accurate segmentation. | 1. Lacks flexibility and struggles with deformations or transformations in the image. 2. Susceptible to changes in image orientation or scale. |
| Relative Positional Encoding | 1. Accommodates image deformations, crucial in tumor-growth monitoring. 2. Enhances the model's ability to handle variations in tumor shape and size. | Demands complex computations to determine relative positions accurately. |
| Conditional Positional Encoding | Adapts to the context of the image, offering superior segmentation accuracy. | Requires sophisticated architectures and training strategies. |
| Rotary Positional Encoding | Addresses rotational variations, essential in handling different imaging angles. | Might not be as relevant in standard brain-tumor segmentation, where rotations are minimal. |
| DualTrans Positional Encoding (ours) | 1. Positional offset information is learned during training, providing more precise segmentation accuracy. 2. Captures intricate spatial relationships within the image. | The dynamic learning strategy is prone to overfitting. |
| Model | Dice ET (%) | Dice TC (%) | Dice WT (%) | HD95 ET (mm) | HD95 TC (mm) | HD95 WT (mm) |
|---|---|---|---|---|---|---|
| TransBTS [22] | 78.36 | 81.41 | 88.89 | 5.91 | 7.58 | 7.60 |
| Attention-based [23] | 75.9 | 80.7 | 89.3 | 4.19 | 7.66 | 6.96 |
| Cross-Sequence [24] | 78.09 | 84.32 | 90.81 | 2.88 | 5.74 | 5.27 |
| Two-Stage Cascaded [25] | 80.21 | 86.47 | 90.94 | 3.16 | 5.43 | 4.26 |
| 3D-UNet [26] | 70.86 | 72.48 | 87.38 | 5.06 | 8.71 | 9.43 |
| Swin UNETR [27] | 85.2 | 86.9 | 90.8 | 8.78 | 5.62 | 6.23 |
| Pei et al. [28] | 81.33 | 84.08 | 88.62 | 4.21 | 8.02 | 5.46 |
| DualTrans (ours) | 85.63 | 87.2 | 91.42 | 8.32 | 5.64 | 7.21 |
| Model | Dice ET (%) | Dice TC (%) | Dice WT (%) | HD95 ET (mm) | HD95 TC (mm) | HD95 WT (mm) |
|---|---|---|---|---|---|---|
| Swin UNETR [27] | 85.8 | 88.5 | 92.6 | 6.02 | 3.77 | 5.83 |
| Reciprocal Adversarial [29] | 81.38 | 85.63 | 90.77 | 21.83 | 8.56 | 5.37 |
| Qiran Jia et al. [30] | 81.87 | 84.34 | 90.97 | 17.85 | 16.69 | 4.51 |
| Yuan et al. [31] | 84.79 | 86.55 | 92.65 | 12.75 | 11.19 | 3.67 |
| Orthogonal-Nets [32] | 83.2 | 84.99 | 91.38 | 20.97 | 9.81 | 5.43 |
| Attention and Ensemble [33] | 83.79 | 86.47 | 91.99 | 6.39 | 7.81 | 3.86 |
| Swin–Unet [34] | 85.37 | 87.26 | 92.08 | 14.32 | 9.80 | 11.28 |
| Extending-nnUNet [21] | 84.51 | 87.81 | 92.75 | 20.73 | 7.62 | 3.47 |
| DualTrans (ours) | 86.23 | 88.12 | 92.83 | 6.37 | 3.64 | 4.51 |
| Model | Dice ET (%) | Dice TC (%) | Dice WT (%) |
|---|---|---|---|
| 3D-UNet | 70.86 | 72.48 | 87.38 |
| DualTrans (ours) | 86.23 | 88.12 | 92.83 |
| Model | Dice ET (%) | Dice TC (%) | Dice WT (%) |
|---|---|---|---|
| DualTrans (no position) | 85.38 | 87.01 | 90.46 |
| DualTrans (rel. position) | 86.06 | 87.98 | 92.65 |
| DualTrans (DualTrans position) | 86.23 | 88.12 | 92.83 |