PANDA: A Polarized Attention Network for Enhanced Unsupervised Domain Adaptation in Semantic Segmentation
Abstract
1. Introduction
2. Related Work
2.1. Semantic Segmentation
2.2. UDA
2.3. Attention Mechanism
3. Proposed Approaches
3.1. Overview
3.2. Self-Training for UDA
3.2.1. Source-Domain Training
3.2.2. Target-Domain Training
3.2.3. Cross-Domain Training
3.3. Polarized Attention Network
4. Experiments
4.1. Implementation Details
4.2. Variation in PSA Layout Combinations
4.3. Comparison of Different Convolution Modules
4.4. Comparison with State-of-the-Art Methods
4.5. Ablation Study
4.6. Failure Case Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
UDA | Unsupervised domain adaptation |
PANDA | Polarized Attention Network Domain Adaptation |
PSA | Polarized Self-Attention |
mIoU | Mean intersection over union |
GTA | Grand Theft Auto |
GAN | Generative adversarial network |
HRDA | High-Resolution Domain-Adaptive |
MIC | Masked Image Consistency |
EA | Efficient attention |
SENet | Squeeze-and-excitation network |
GCNet | Global Context Network |
CBAM | Convolutional Block Attention Module |
EMA | Exponential moving average |
PAN | Polarized Attention Network |
MiT | Mix Transformer |
RCS | Rare Class Sampling |
FD | Feature distance |
GAP | Global average pooling |
DACS | Domain Adaptation via Cross-domain Mixed Sampling |
ASPP | Atrous Spatial Pyramid Pooling |
CSA | Channel-only Self-attention |
SSA | Spatial-only Self-attention |
Conv | Convolution |
SW | Sidewalk |
Build. | Building |
TL | Traffic light |
TS | Traffic sign |
Veg. | Vegetation |
M.Bike | Motorcycle |
SYN | SYNTHIA |
CS | Cityscapes |
References
- Toldo, M.; Maracani, A.; Michieli, U.; Zanuttigh, P. Unsupervised domain adaptation in semantic segmentation: A review. Technologies 2020, 8, 35. [Google Scholar] [CrossRef]
- Liu, H.; Liu, F.; Fan, X.; Huang, D. Polarized self-attention: Towards high-quality pixel-wise regression. arXiv 2021, arXiv:2107.00782. [Google Scholar]
- Zhou, T.; Zhang, F.; Chang, B.; Wang, W.; Yuan, Y.; Konukoglu, E.; Cremers, D. Image Segmentation in Foundation Model Era: A Survey. arXiv 2024, arXiv:2408.12957. [Google Scholar]
- Wang, K.; Liew, J.H.; Zou, Y.; Zhou, D.; Feng, J. PANet: Few-shot image semantic segmentation with prototype alignment. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9197–9206. [Google Scholar]
- Zhou, T.; Wang, W.; Konukoglu, E.; Van Gool, L. Rethinking semantic segmentation: A prototype view. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 2582–2593. [Google Scholar]
- Zhou, T.; Wang, W. Cross-image pixel contrasting for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5398–5412. [Google Scholar] [CrossRef] [PubMed]
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223. [Google Scholar]
- Sakaridis, C.; Dai, D.; Van Gool, L. ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Virtual, 11–17 October 2021; pp. 10765–10775. [Google Scholar]
- Chen, L.; Chen, H.; Wei, Z.; Jin, X.; Tan, X.; Jin, Y.; Chen, E. Reusing the task-specific classifier as a discriminator: Discriminator-free adversarial domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 7181–7190. [Google Scholar]
- Wang, Q.; Meng, F.; Breckon, T.P. Data augmentation with norm-AE and selective pseudo-labelling for unsupervised domain adaptation. Neural Netw. 2023, 161, 614–625. [Google Scholar] [CrossRef] [PubMed]
- Zhu, J.; Bai, H.; Wang, L. Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 3561–3571. [Google Scholar]
- Singh, I.P.; Ghorbel, E.; Kacem, A.; Rathinam, A.; Aouada, D. Discriminator-free unsupervised domain adaptation for multi-label image classification. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2024; pp. 3936–3945. [Google Scholar]
- Mattolin, G.; Zanella, L.; Ricci, E.; Wang, Y. ConfMix: Unsupervised Domain Adaptation for Object Detection via Confidence-Based Mixing. In Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; pp. 423–433. [Google Scholar]
- Kennerley, M.; Wang, J.G.; Veeravalli, B.; Tan, R.T. 2pcnet: Two-phase consistency training for day-to-night unsupervised domain adaptive object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–23 June 2023; pp. 11484–11493. [Google Scholar]
- VS, V.; Oza, P.; Patel, V.M. Towards online domain adaptive object detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 2–7 January 2023; pp. 478–488. [Google Scholar]
- VS, V.; Oza, P.; Patel, V.M. Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 3520–3530. [Google Scholar]
- Pu, B.; Wang, L.; Yang, J.; He, G.; Dong, X.; Li, S.; Tan, Y.; Chen, M.; Jin, Z.; Li, K.; et al. M3-UDA: A New Benchmark for Unsupervised Domain Adaptive Fetal Cardiac Structure Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 11621–11630. [Google Scholar]
- Hoyer, L.; Dai, D.; Van Gool, L. DAFormer: Improving network architectures and training strategies for domain-adaptive semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 9924–9935. [Google Scholar]
- Hoyer, L.; Dai, D.; Van Gool, L. HRDA: Context-aware high-resolution domain-adaptive semantic segmentation. In Proceedings of the Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; pp. 372–391. [Google Scholar]
- Hoyer, L.; Dai, D.; Wang, H.; Van Gool, L. MIC: Masked image consistency for context-enhanced domain adaptation. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 11721–11732. [Google Scholar]
- Chen, M.; Zheng, Z.; Yang, Y.; Chua, T.S. PiPa: Pixel- and patch-wise self-supervised learning for domain adaptive semantic segmentation. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 1905–1914. [Google Scholar]
- Zhao, X.; Mithun, N.C.; Rajvanshi, A.; Chiu, H.P.; Samarasekera, S. Unsupervised domain adaptation for semantic segmentation with pseudo label self-refinement. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2024; pp. 2399–2409. [Google Scholar]
- Long, M.; Zhu, H.; Wang, J.; Jordan, M.I. Deep transfer learning with joint adaptation networks. In Proceedings of the International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; pp. 2208–2217. [Google Scholar]
- Sun, B.; Saenko, K. Deep coral: Correlation alignment for deep domain adaptation. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 443–450. [Google Scholar]
- Hoffman, J.; Tzeng, E.; Park, T.; Zhu, J.Y.; Isola, P.; Saenko, K.; Efros, A.; Darrell, T. CyCADA: Cycle-consistent adversarial domain adaptation. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 1989–1998. [Google Scholar]
- Vu, T.H.; Jain, H.; Bucher, M.; Cord, M.; Pérez, P. ADVENT: Adversarial entropy minimization for domain adaptation in semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2517–2526. [Google Scholar]
- Zou, Y.; Yu, Z.; Kumar, B.; Wang, J. Unsupervised domain adaptation for semantic segmentation via class-balanced self-training. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 289–305. [Google Scholar]
- Zhang, P.; Zhang, B.; Zhang, T.; Chen, D.; Wang, Y.; Wen, F. Prototypical pseudo label denoising and target structure learning for domain adaptive semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12414–12424. [Google Scholar]
- Shen, Z.; Zhang, M.; Zhao, H.; Yi, S.; Li, H. Efficient attention: Attention with linear complexities. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 3531–3539. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019; pp. 1971–1980. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Yu, Q.; Wei, W.; Pan, Z.; He, J.; Wang, S.; Hong, D. GPF-Net: Graph-polarized fusion network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–22. [Google Scholar] [CrossRef]
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
- Olsson, V.; Tranheden, W.; Pinto, J.; Svensson, L. ClassMix: Segmentation-based data augmentation for semi-supervised learning. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 4–8 January 2021; pp. 1369–1378. [Google Scholar]
- Tranheden, W.; Olsson, V.; Pinto, J.; Svensson, L. DACS: Domain adaptation via cross-domain mixed sampling. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 1379–1389. [Google Scholar]
- Tarvainen, A.; Valpola, H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; pp. 1195–1204. [Google Scholar]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
- Richter, S.R.; Vineet, V.; Roth, S.; Koltun, V. Playing for data: Ground truth from computer games. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 102–118. [Google Scholar]
- Ros, G.; Sellart, L.; Materzynska, J.; Vazquez, D.; Lopez, A.M. The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3234–3243. [Google Scholar]
- Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
Method | GTA→Cityscapes (mIoU) | SYNTHIA→Cityscapes (mIoU) |
---|---|---|
MIC (baseline) | 75.9 | 67.3 |
SSA→CSA | 76.0 | 67.1 |
CSA→SSA | 76.2 | 68.1 |
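To make the CSA→SSA ordering compared above concrete, the following PyTorch sketch shows a simplified channel-only (CSA) and spatial-only (SSA) self-attention pair applied sequentially. It is a minimal illustration based on the polarized self-attention design of Liu et al. [2], not the exact module used in PANDA; all class and variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ChannelOnlySelfAttention(nn.Module):
    """Channel-only self-attention (CSA): re-weights channels via a spatially pooled query."""

    def __init__(self, channels: int):
        super().__init__()
        self.wq = nn.Conv2d(channels, 1, kernel_size=1)               # query: C -> 1
        self.wv = nn.Conv2d(channels, channels // 2, kernel_size=1)   # value: C -> C/2
        self.wz = nn.Conv2d(channels // 2, channels, kernel_size=1)   # restore: C/2 -> C
        self.norm = nn.LayerNorm([channels, 1, 1])
        self.softmax = nn.Softmax(dim=-1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.softmax(self.wq(x).view(b, 1, h * w))                # (B, 1, HW), softmax over space
        v = self.wv(x).view(b, c // 2, h * w)                         # (B, C/2, HW)
        z = torch.bmm(v, q.transpose(1, 2)).view(b, c // 2, 1, 1)     # (B, C/2, 1, 1)
        attn = self.sigmoid(self.norm(self.wz(z)))                    # (B, C, 1, 1) channel weights
        return x * attn


class SpatialOnlySelfAttention(nn.Module):
    """Spatial-only self-attention (SSA): re-weights pixels via a channel-pooled query."""

    def __init__(self, channels: int):
        super().__init__()
        self.wq = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.wv = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(1)                           # global average pooling (GAP)
        self.softmax = nn.Softmax(dim=-1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.softmax(self.pool(self.wq(x)).view(b, 1, c // 2))    # (B, 1, C/2), softmax over channels
        v = self.wv(x).view(b, c // 2, h * w)                         # (B, C/2, HW)
        attn = self.sigmoid(torch.bmm(q, v).view(b, 1, h, w))         # (B, 1, H, W) spatial weights
        return x * attn


class SequentialPSA(nn.Module):
    """CSA followed by SSA, i.e. the CSA→SSA ordering from the table above."""

    def __init__(self, channels: int):
        super().__init__()
        self.csa = ChannelOnlySelfAttention(channels)
        self.ssa = SpatialOnlySelfAttention(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.ssa(self.csa(x))
```

Swapping the two calls in `SequentialPSA.forward` gives the SSA→CSA variant reported in the first data row.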
Method | GTA→Cityscapes (mIoU) | SYNTHIA→Cityscapes (mIoU) |
---|---|---|
MIC [20] | 75.9 | 67.3 |
PANDA(P) | 76.1 | 67.8 |
PANDA(S) | 76.2 | 68.1 |
PANDA(P-P) | 76.0 | 68.5 |
PANDA(S-S) | 76.0 | 67.7 |
PANDA(S-P) | 76.1 | 68.0 |
PANDA(P-S) | 76.1 | 68.7 |
Method | Point-Depth Conv | Standard Conv | GTA→Cityscapes (mIoU) | SYNTHIA→Cityscapes (mIoU) |
---|---|---|---|---|
MIC [20] | - | - | 75.9 | 67.3 |
PANDA | ✔ | - | 75.9 | 67.6 |
PANDA | - | ✔ | 76.1 | 68.7 |
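For context on the comparison above, the snippet below sketches the assumed meaning of the two options: a pointwise 1×1 projection followed by a depthwise convolution ("Point-Depth Conv") versus a single dense standard convolution. This is an illustrative reading of the table, not the authors' implementation, and the helper names are hypothetical.

```python
import torch.nn as nn


def standard_conv(c_in: int, c_out: int, k: int = 3) -> nn.Module:
    # One dense k x k kernel mixing all input channels at every position.
    return nn.Conv2d(c_in, c_out, kernel_size=k, padding=k // 2)


def point_depth_conv(c_in: int, c_out: int, k: int = 3) -> nn.Module:
    # Pointwise 1 x 1 channel projection followed by a per-channel (depthwise)
    # k x k convolution: far fewer parameters than the dense kernel above.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=1),                                  # pointwise
        nn.Conv2d(c_out, c_out, kernel_size=k, padding=k // 2, groups=c_out),   # depthwise
    )
```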
Method | Road | SW | Build. | Wall | Fence | Pole | TL | TS | Veg. | Terrain | Sky | Person | Rider | Car | Truck | Bus | Train | M.Bike | Bike | mIoU |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GTA→Cityscapes | ||||||||||||||||||||
ADVENT [26] | 89.4 | 33.1 | 81.0 | 26.6 | 26.8 | 27.2 | 33.5 | 24.7 | 83.9 | 36.7 | 78.8 | 58.7 | 30.5 | 84.8 | 38.5 | 44.5 | 1.7 | 31.6 | 32.4 | 45.5 |
DACS [36] | 89.9 | 39.7 | 87.9 | 30.7 | 39.5 | 38.5 | 46.4 | 52.8 | 88.0 | 44.0 | 88.8 | 67.2 | 35.8 | 84.5 | 45.7 | 50.2 | 0.0 | 27.3 | 34.0 | 52.1 |
ProDA [28] | 87.8 | 56.0 | 79.7 | 46.3 | 44.8 | 45.6 | 53.5 | 53.5 | 88.6 | 45.2 | 82.1 | 70.7 | 39.2 | 88.8 | 45.5 | 59.4 | 1.0 | 48.9 | 56.4 | 57.5 |
DAFormer [18] | 95.7 | 70.2 | 89.4 | 53.5 | 48.1 | 49.6 | 55.8 | 59.4 | 89.9 | 47.9 | 92.5 | 72.2 | 44.7 | 92.3 | 74.5 | 78.2 | 65.1 | 55.9 | 61.8 | 68.3 |
HRDA [19] | 96.4 | 74.4 | 91.0 | 61.6 | 51.5 | 57.1 | 63.9 | 69.3 | 91.3 | 48.4 | 94.2 | 79.0 | 52.9 | 93.9 | 84.1 | 85.7 | 75.9 | 63.9 | 67.5 | 73.8 |
MIC [20] | 97.4 | 80.1 | 91.7 | 61.2 | 56.9 | 59.7 | 66.0 | 71.3 | 91.7 | 51.4 | 94.3 | 79.8 | 56.1 | 94.6 | 85.4 | 90.3 | 80.4 | 64.5 | 68.5 | 75.9 |
PANDA | 97.3 | 79.0 | 91.8 | 63.4 | 58.4 | 62.0 | 66.6 | 73.4 | 91.4 | 52.7 | 93.3 | 80.8 | 58.0 | 94.4 | 85.7 | 86.6 | 80.2 | 64.3 | 66.9 | 76.1 |
SYNTHIA→Cityscapes | ||||||||||||||||||||
ADVENT [26] | 85.6 | 42.2 | 79.7 | 8.7 | 0.4 | 25.9 | 5.4 | 8.1 | 80.4 | - | 84.1 | 57.9 | 23.8 | 73.3 | - | 36.4 | - | 14.2 | 33.0 | 41.2 |
DACS [36] | 80.6 | 25.1 | 81.9 | 21.5 | 2.9 | 37.2 | 22.7 | 24.0 | 83.7 | - | 90.8 | 67.6 | 38.3 | 82.9 | - | 38.9 | - | 28.5 | 47.6 | 48.3 |
ProDA [28] | 87.8 | 45.7 | 84.6 | 37.1 | 0.6 | 44.0 | 54.6 | 37.0 | 88.1 | - | 84.4 | 74.2 | 24.3 | 88.2 | - | 51.1 | - | 40.5 | 45.6 | 55.5 |
DAFormer [18] | 84.5 | 40.7 | 88.4 | 41.5 | 6.5 | 50.0 | 55.0 | 54.6 | 86.0 | - | 89.8 | 73.2 | 48.2 | 87.2 | - | 53.2 | - | 53.9 | 61.7 | 60.9 |
HRDA [19] | 85.2 | 47.7 | 88.8 | 49.5 | 4.8 | 57.2 | 65.7 | 60.9 | 85.3 | - | 92.9 | 79.4 | 52.8 | 89.0 | - | 64.7 | - | 63.9 | 64.9 | 65.8 |
MIC [20] | 86.6 | 50.5 | 89.3 | 47.9 | 7.8 | 59.4 | 66.7 | 63.4 | 87.1 | - | 94.6 | 81.0 | 58.9 | 90.1 | - | 61.9 | - | 67.1 | 64.3 | 67.3 |
PANDA | 91.7 | 59.8 | 89.1 | 45.0 | 9.4 | 61.8 | 68.1 | 62.3 | 88.5 | - | 94.4 | 80.9 | 58.5 | 90.2 | - | 67.3 | - | 67.8 | 63.8 | 68.7 |
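All per-class entries above are IoU scores in %, and the last column is their mean (mIoU): 19 classes for GTA→Cityscapes and the 16 shared classes for SYNTHIA→Cityscapes (terrain, truck, and train are excluded, shown as "-"). As a reference only, and not the authors' evaluation code, a generic mIoU computation from a confusion matrix can be sketched as follows.

```python
import numpy as np


def mean_iou(conf: np.ndarray) -> float:
    """mIoU from a (K x K) confusion matrix where conf[i, j] counts pixels
    of ground-truth class i predicted as class j."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp                     # predicted as the class, but wrong
    fn = conf.sum(axis=1) - tp                     # belongs to the class, but missed
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)  # ignore absent classes
    return float(np.nanmean(iou))
```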
Method | Channel Resolution | Spatial Resolution | Non-Linearity | GTA→CS (mIoU) | SYN→CS (mIoU) | Param (M) | Memory (GB) | Throughput (img/s) |
---|---|---|---|---|---|---|---|---|
MIC [20] | - | - | - | 75.9 | 67.3 | 85.69 | 22.60 | 0.98 |
+EA [29] | ≪C | ≪min(W, H) | Softmax | - | - | 103.5 | 30.88 | - |
+SENet [30] | C/4 | - | ReLU + Sigmoid | 75.6 | 68.1 | 95.65 | 25.51 | 0.65 |
+GCNet [31] | C/4 | - | ReLU + Softmax | 74.8 | 67.9 | 96.18 | 25.55 | 0.73 |
+CBAM [32] | C/16 | [W,H] | Sigmoid | 75.6 | 68.4 | 95.39 | 26.02 | 0.70 |
PANDA | C/2 | [W,H] | Sigmoid + Softmax | 76.1 | 68.7 | 99.33 | 26.34 | 0.64 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).