A Novel Tongue Coating Segmentation Method Based on Improved TransUNet
Abstract
1. Introduction
- We introduce TransUNet to the task of tongue coating segmentation. The skip connection structure of UNet integrates semantic information from the high-level features captured by the transformer with spatial information from the low-level features of the encoder. This yields complete and continuous segmentation of the tongue coating from the tongue body, addressing the tongue coating segmentation problem in intelligent tongue diagnosis.
- We design improved subtraction feature pyramid (SFP) and visual regional enhancer (VRE) modules. SFP reduces redundant information in low-level encoder features and focuses on local spatial details. VRE enriches spatial detail in low-level features and narrows the gap between high-level and low-level features so that they fuse more effectively.
- Comparative and ablation experiments show that, on the same dataset, our model achieves better overall performance than UNet, UNet++, and SegNet, three models commonly used for medical image segmentation. It also copes better with irregular tongue coatings, such as those with tooth marks, cracks, or peeling, which have unclear boundaries, irregular shapes, and irregular distribution.
2. Materials and Methods
2.1. Subtraction Feature Pyramid
2.2. Visual Regional Enhancer
2.3. Loss
3. Results
3.1. Datasets
3.1.1. Sources
3.1.2. Labeling
3.2. Evaluation Metrics
- Accuracy: The proportion of all pixels that the model classifies correctly.
- Precision: The ratio of the number of tongue coating pixels correctly predicted by the model to the total number of pixels the model predicts as tongue coating.
- Recall: The ratio of the number of tongue coating pixels correctly predicted by the model to the actual number of tongue coating pixels.
- Dice: Precision reflects the model's ability to exclude non-tongue-coating areas; the higher the precision, the fewer non-coating pixels are mislabeled as coating. Recall reflects the model's ability to recognize the tongue coating area; the higher the recall, the fewer coating pixels are missed. Dice is the harmonic mean of the two, so a high Dice indicates that the model does well on both.
- IoU: The ratio of the intersection to the union of the predicted segmentation and the ground-truth segmentation.
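The five metrics above can be computed from the pixel-wise confusion counts of a predicted mask and a ground-truth mask. The following is a minimal sketch using NumPy, not the authors' evaluation code; the function name `segmentation_metrics`, the convention that 1 marks tongue coating, and the choice to return 0.0 for an empty denominator are assumptions made for illustration.

```python
import numpy as np


def segmentation_metrics(pred, truth):
    """Pixel-wise metrics for one pair of binary masks (1 = tongue coating).

    Hypothetical helper for illustration; not taken from the paper.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)

    tp = int(np.sum(pred & truth))    # coating predicted as coating
    fp = int(np.sum(pred & ~truth))   # non-coating predicted as coating
    fn = int(np.sum(~pred & truth))   # coating predicted as non-coating
    tn = int(np.sum(~pred & ~truth))  # non-coating predicted as non-coating

    def ratio(num, den):
        # Assumption: report 0.0 when the denominator is empty.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        # Harmonic mean of precision and recall, written in terms of counts.
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
    }


# Toy 4x4 example: 3 true positives, 1 false positive, 1 false negative.
truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])
pred = np.array([[1, 1, 1, 0],
                 [1, 0, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])

metrics = segmentation_metrics(pred, truth)
print(metrics)
```

For a single mask pair, Dice and IoU are tied by IoU = Dice / (2 - Dice). Values averaged over many images, as reported results usually are, need not satisfy this identity exactly.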
3.3. Implementation Details
3.4. Comparative Study
3.5. Ablation Study
3.5.1. SFP
3.5.2. VRE
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Model | Accuracy | Precision | Dice | Recall | IoU |
---|---|---|---|---|---|
Our model | 96.36% | 96.26% | 96.76% | 97.43% | 93.81% |
UNet | 93.32% | 92.12% | 94.20% | 96.95% | 89.31% |
UNet++ | 95.73% | 94.39% | 95.90% | 97.04% | 92.36% |
SegNet | 95.50% | 95.54% | 95.95% | 96.62% | 92.35% |
Model | Accuracy | Precision | Dice | Recall | IoU |
---|---|---|---|---|---|
Our model | 0.003757 | 0.008130 | 0.002809 | 0.004388 | 0.004859 |
UNet | 0.021674 | 0.043071 | 0.018303 | 0.016150 | 0.030058 |
UNet++ | 0.003167 | 0.010994 | 0.004059 | 0.005874 | 0.006905 |
SegNet | 0.005874 | 0.017500 | 0.006594 | 0.007607 | 0.010757 |
Configuration | Accuracy | Precision | Dice | Recall | IoU |
---|---|---|---|---|---|
TransUNet | 95.78% | 95.51% | 96.31% | 97.27% | 93.04% |
+SFP | 96.25% | 95.99% | 96.61% | 97.44% | 93.56% |
+VRE | 96.36% | 96.26% | 96.76% | 97.43% | 93.81% |
Configuration | Accuracy | Precision | Dice | Recall | IoU |
---|---|---|---|---|---|
ResNet-50 (5 levels) | 96.32% | 96.17% | 96.70% | 97.41% | 93.71% |
ResNet-50 (4 levels) | 96.36% | 96.26% | 96.76% | 97.43% | 93.81% |
Configuration | Accuracy | Precision | Dice | Recall | IoU |
---|---|---|---|---|---|
TransUNet | 95.78% | 95.51% | 96.31% | 97.27% | 93.04% |
+SFP(⊕) | 96.23% | 96.49% | 96.60% | 96.91% | 93.54% |
+SFP(⊖) | 96.25% | 95.99% | 96.61% | 97.44% | 93.56% |
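The ⊕ and ⊖ rows above compare two ways of fusing features in SFP. As a rough illustration only, the toy sketch below contrasts element-wise addition with an element-wise absolute difference between a fine feature map and an upsampled coarse one. The absolute-difference form of ⊖, the nearest-neighbour upsampling, and all function names are assumptions made for this example; they are not taken from the authors' implementation.

```python
import numpy as np


def upsample_nearest(feature, factor):
    """Nearest-neighbour upsampling of a 2-D map by an integer factor."""
    return np.repeat(np.repeat(feature, factor, axis=0), factor, axis=1)


def fuse_add(fine, coarse):
    """Addition fusion: content shared by both levels is reinforced."""
    factor = fine.shape[0] // coarse.shape[0]
    return fine + upsample_nearest(coarse, factor)


def fuse_subtract(fine, coarse):
    """Subtraction fusion (assumed |a - b|): shared content cancels out,
    leaving only the positions where the two levels disagree."""
    factor = fine.shape[0] // coarse.shape[0]
    return np.abs(fine - upsample_nearest(coarse, factor))


# Hypothetical single-channel maps: the coarse level sees one active region.
coarse = np.array([[1.0, 0.0],
                   [0.0, 0.0]])

# The fine level sees the same region plus one local detail (a hole at [1, 1]).
fine = upsample_nearest(coarse, 2).copy()
fine[1, 1] = 0.0

added = fuse_add(fine, coarse)
diff = fuse_subtract(fine, coarse)

print(added)  # shared region dominates the response
print(diff)   # only the local detail at [1, 1] remains
```

In this toy case, addition is dominated by the region both levels already agree on, while subtraction keeps only the local detail that the fine level adds. This is one way to read the claim that SFP reduces redundant information in low-level features.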
Configuration | Accuracy | Precision | Dice | Recall | IoU |
---|---|---|---|---|---|
TransUNet+SFP | 96.25% | 95.99% | 96.61% | 97.44% | 93.56% |
+LVC | 96.33% | 96.47% | 96.70% | 97.10% | 93.71% |
+VRE (parallel) | 96.36% | 96.21% | 96.74% | 97.45% | 93.79% |
+VRE (series) | 96.36% | 96.26% | 96.76% | 97.43% | 93.81% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wu, J.; Li, Z.; Cai, Y.; Liang, H.; Zhou, L.; Chen, M.; Guan, J. A Novel Tongue Coating Segmentation Method Based on Improved TransUNet. Sensors 2024, 24, 4455. https://doi.org/10.3390/s24144455