Latent Graph Attention for Spatial Context in Light-Weight Networks: Multi-Domain Applications in Visual Perception Tasks
Abstract
1. Introduction
- We present Latent Graph Attention (LGA), a graph network-based module, to incorporate global context into existing CNN architectures (a minimal sketch of such a module follows this list).
- LGA is computationally inexpensive and provides a speedup over previous methods; its time complexity scales linearly with the number of connected nodes in the graph.
- For stable and responsive learning, we introduce an LGA-specific contrastive loss term that amplifies the discrimination between foreground and background and thereby helps to learn suitable weights for each edge of the graph (an illustrative sketch of such a loss also follows the list).
- We experimentally demonstrate that our LGA module, when plugged into a small-scale architecture, boosts performance with only a minimal additional computational load. This empowers the development of efficient and compact architectures for edge devices.
- We experimentally demonstrate the efficacy of LGA over a variety of hard image-to-image translation tasks, including the segmentation of transparent objects, image dehazing, and optical flow estimation.
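To make the first two contributions concrete, the block below is a minimal PyTorch sketch of the general pattern they describe, not the paper's reference implementation. The class name `LatentGraphAttention`, the `latent_size` node grid, and the 1×1 projections into and out of the latent graph space are illustrative assumptions. Features are pooled to a small set of latent graph nodes, attention is restricted to the `num_neighbors` strongest edges per node (so propagation cost grows linearly with the number of connected nodes), and the aggregated context is fused back into the full-resolution feature map.

```python
# Illustrative sketch only: names (LatentGraphAttention, num_neighbors)
# and the exact projections are assumptions, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentGraphAttention(nn.Module):
    def __init__(self, channels: int, latent_size: int = 16, num_neighbors: int = 8):
        super().__init__()
        self.latent_size = latent_size      # nodes form a latent_size x latent_size grid
        self.num_neighbors = num_neighbors  # k: kept edges per node -> propagation cost O(N*k)
        self.to_nodes = nn.Conv2d(channels, channels, kernel_size=1)
        self.from_nodes = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Downsample the feature map to a small grid of graph nodes.
        nodes = F.adaptive_avg_pool2d(self.to_nodes(x), self.latent_size)
        nodes = nodes.flatten(2).transpose(1, 2)              # (B, N, C), N = latent_size**2
        # Pairwise similarity between nodes gives the edge logits.
        sim = torch.matmul(nodes, nodes.transpose(1, 2)) / c ** 0.5   # (B, N, N)
        # Keep only the k strongest edges per node before the softmax.
        topk, idx = sim.topk(self.num_neighbors, dim=-1)
        attn = torch.full_like(sim, float("-inf")).scatter_(-1, idx, topk)
        attn = attn.softmax(dim=-1)
        context = torch.matmul(attn, nodes)                   # propagate along kept edges
        # Reshape back to a grid, upsample, and fuse residually with the input.
        context = context.transpose(1, 2).reshape(b, c, self.latent_size, self.latent_size)
        context = F.interpolate(context, size=(h, w), mode="bilinear", align_corners=False)
        return x + self.from_nodes(context)
```

Under these assumptions, such a block would typically be inserted between encoder stages of a compact backbone, e.g. `y = LatentGraphAttention(channels=128)(features)`, which matches the "plug into a small-scale architecture" usage described above.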
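The contrastive loss term from the third contribution can be sketched in the same spirit. The function below assumes node embeddings taken from the attention block and a binary foreground mask; the node-level label pooling, the `margin`, and the pull/push formulation are assumptions for illustration, not the paper's exact loss.

```python
# Illustrative contrastive term over graph-node embeddings: pulls
# same-class (foreground/foreground or background/background) nodes
# together and pushes cross-class pairs apart. The paper's exact loss
# may differ; `margin` and the label pooling are assumptions.
import torch
import torch.nn.functional as F

def foreground_background_contrastive_loss(
    nodes: torch.Tensor,    # (B, N, C) node embeddings from the attention block
    mask: torch.Tensor,     # (B, 1, H, W) binary foreground mask
    latent_size: int = 16,
    margin: float = 0.5,
) -> torch.Tensor:
    # Pool the mask to one soft label per node, then binarize.
    labels = F.adaptive_avg_pool2d(mask.float(), latent_size).flatten(2).transpose(1, 2)
    labels = (labels > 0.5).float().squeeze(-1)               # (B, N)
    z = F.normalize(nodes, dim=-1)
    sim = torch.matmul(z, z.transpose(1, 2))                  # cosine similarity, (B, N, N)
    same = (labels.unsqueeze(2) == labels.unsqueeze(1)).float()
    # Same-class pairs: drive similarity up. Cross-class pairs: keep it below margin.
    pull = ((1.0 - sim) * same).sum() / same.sum().clamp(min=1.0)
    push = (F.relu(sim - margin) * (1.0 - same)).sum() / (1.0 - same).sum().clamp(min=1.0)
    return pull + push
```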
2. Related Work
3. The Proposed Approach
3.1. Latent Graph Attention (LGA)
3.2. Other Features of LGA
4. Experiments
4.1. Training Process
4.2. Training Data
4.3. Segmentation
4.4. Dehazing
4.5. Optical Flow Estimation
4.6. Ablation Study
4.6.1. Effect of Incorporating LGA
4.6.2. Original SqueezeNet with LGA or CCNet
4.6.3. Effects of Divergence Loss and Group Convolution
4.6.4. Ablation on BPPNet for Dehazing Problem
4.6.5. Ablation for Number of LGA Layers
4.6.6. Performance and Efficiency Comparison of LGA vs. CCNet
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Wani, M.; Batchelor, B. Edge-region-based segmentation of range images. IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 314–319. [Google Scholar] [CrossRef]
- He, K.; Sun, J.; Tang, X. Single Image Haze Removal Using Dark Channel Prior. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 1956–1963. [Google Scholar] [CrossRef]
- Liu, W.; Rabinovich, A.; Berg, A. ParseNet: Looking Wider to See Better. In Proceedings of the International Conference on Learning Representations Workshops, San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
- Wang, S.; Lokhande, V.; Singh, M.; Kording, K.; Yarkony, J. End-to-end Training of CNN-CRF via Differentiable Dual-Decomposition. arXiv 2019, arXiv:1912.02937. [Google Scholar]
- Wang, Y.; Zhou, Q.; Liu, J.; Xiong, J.; Gao, G.; Wu, X.; Latecki, L.J. Lednet: A lightweight encoder-decoder network for real-time semantic segmentation. In Proceedings of the IEEE International Conference on Image Processing, Taipei, Taiwan, 22–25 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1860–1864. [Google Scholar]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
- Ancuti, C.; Ancuti, C.O.; Timofte, R.; De Vleeschouwer, C. I-HAZE: A Dehazing Benchmark with Real Hazy and Haze-Free Indoor Images. In Proceedings of the Advanced Concepts for Intelligent Vision Systems; Blanc-Talon, J., Helbert, D., Philips, W., Popescu, D., Scheunders, P., Eds.; Springer International Publishing: Berlin/Heidelberg, Germany, 2018; pp. 620–631. [Google Scholar]
- Huang, Z.; Wang, X.; Wei, Y.; Huang, L.; Shi, H.; Liu, W.; Huang, T.S. CCNet: Criss-Cross Attention for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 6896–6908. [Google Scholar] [CrossRef] [PubMed]
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
- Jin, L.; Xie, J.; Pan, B.; Luo, G. Generalized Phase Retrieval Model Based on Physics-Inspired Network for Holographic Metasurface. Prog. Electromagn. Res. 2023, 178, 103–110. [Google Scholar]
- Zhang, X.; Xu, H.; Mo, H.; Tan, J.; Yang, C.; Wang, L.; Ren, W. Dcnas: Densely connected neural architecture search for semantic image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13956–13967. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 116–131. [Google Scholar]
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. arXiv 2017, arXiv:1707.01083. [Google Scholar]
- Kong, L.; Yang, J. MDFlow: Unsupervised Optical Flow Learning by Reliable Mutual Knowledge Distillation. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 677–688. [Google Scholar] [CrossRef]
- Lu, Y.; Chen, Y.; Zhao, D.; Chen, J. Graph-FCN for image semantic segmentation. In Proceedings of the International Symposium on Neural Networks; Springer: Berlin/Heidelberg, Germany, 2019; pp. 97–105. [Google Scholar]
- Ye, X.B.; Guan, Q.; Luo, W.; Fang, L.; Lai, Z.R.; Wang, J. Molecular substructure graph attention network for molecular property identification in drug discovery. Pattern Recognit. 2022, 128, 108659. [Google Scholar] [CrossRef]
- Zhou, H.; Yang, Y.; Luo, T.; Zhang, J.; Li, S. A unified deep sparse graph attention network for scene graph generation. Pattern Recognit. 2022, 123, 108367. [Google Scholar] [CrossRef]
- Sun, T.; Zhang, G.; Yang, W.; Xue, J.H.; Wang, G. Trosd: A new rgb-d dataset for transparent and reflective object segmentation in practice. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 5721–5733. [Google Scholar] [CrossRef]
- Yu, R.; Ren, W.; Zhao, M.; Wang, J.; Wu, D.; Xie, Y. Transparent objects segmentation based on polarization imaging and deep learning. Opt. Commun. 2024, 555, 130246. [Google Scholar] [CrossRef]
- Banerjee, S.; Hati, A.; Chaudhuri, S.; Velmurugan, R. Image co-segmentation using graph convolution neural network. In Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing, Hyderabad, India, 18–22 December 2018; pp. 1–9. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Singh, A.; Bhave, A.; Prasad, D.K. Single image dehazing for a variety of haze scenarios using back projected pyramid network. In Proceedings of the European Conference on Computer Vision Workshop, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 166–181. [Google Scholar]
- Xie, E.; Wang, W.; Wang, W.; Sun, P.; Xu, H.; Liang, D.; Luo, P. Segmenting transparent object in the wild with transformer. arXiv 2021, arXiv:2101.08461. [Google Scholar]
- Liu, L.; Zhang, J.; He, R.; Liu, Y.; Wang, Y.; Tai, Y.; Luo, D.; Wang, C.; Li, J.; Huang, F. Learning by analogy: Reliable supervision from transformations for unsupervised optical flow estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6489–6498. [Google Scholar]
- Liu, M.; Yin, H. Feature pyramid encoding network for real-time semantic segmentation. arXiv 2019, arXiv:1909.08599. [Google Scholar]
- Mehta, S.; Rastegari, M.; Shapiro, L.; Hajishirzi, H. Espnetv2: A light-weight, power efficient, and general purpose convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9190–9200. [Google Scholar]
- Paszke, A.; Chaurasia, A.; Kim, S.; Culurciello, E. Enet: A deep neural network architecture for real-time semantic segmentation. arXiv 2016, arXiv:1606.02147. [Google Scholar]
- Li, G.; Yun, I.; Kim, J.; Kim, J. Dabnet: Depth-wise asymmetric bottleneck for real-time semantic segmentation. In Proceedings of the British Machine Vision Conference, Cardiff, UK, 9–12 September 2019. [Google Scholar]
- Zhao, H.; Qi, X.; Shen, X.; Shi, J.; Jia, J. Icnet for real-time semantic segmentation on high-resolution images. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 405–420. [Google Scholar]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Wang, Y.; Zhou, Q.; Xiong, J.; Wu, X.; Jin, X. Esnet: An efficient symmetric network for real-time semantic segmentation. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision, Xi’an, China, 8–11 November 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 41–52. [Google Scholar]
- Iandola, F.N. SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters. 2016. Available online: https://github.com/forresti/SqueezeNet (accessed on 20 October 2024).
- He, K.; Sun, J.; Tang, X. Single Image Haze Removal Using Dark Channel Prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353. [Google Scholar] [CrossRef] [PubMed]
- Zhu, Q.; Mai, J.; Shao, L. A fast single image haze removal algorithm using color attenuation prior. IEEE Trans. Image Process. 2015, 24, 3522–3533. [Google Scholar] [PubMed]
- Ren, W.; Liu, S.; Zhang, H.; Pan, J.; Cao, X.; Yang, M.H. Single image dehazing via multi-scale convolutional neural networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 154–169. [Google Scholar]
- Berman, D.; Treibitz, T.; Avidan, S. Non-local image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1674–1682. [Google Scholar]
- Li, B.; Peng, X.; Wang, Z.; Xu, J.; Feng, D. AOD-Net: All-In-One Dehazing Network. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017. [Google Scholar]
- Butler, D.J.; Wulff, J.; Stanley, G.B.; Black, M.J. A naturalistic open source movie for optical flow evaluation. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 611–625. [Google Scholar]
- Jonschkowski, R.; Stone, A.; Barron, J.T.; Gordon, A.; Konolige, K.; Angelova, A. What matters in unsupervised optical flow. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 557–572. [Google Scholar]
- Kong, L.; Shen, C.; Yang, J. Fastflownet: A lightweight network for fast optical flow estimation. In Proceedings of the IEEE International Conference on Robotics and Automation, Xi’an, China, 30 May–5 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 10310–10316. [Google Scholar]
- Im, W.; Kim, T.K.; Yoon, S.E. Unsupervised learning of optical flow with deep feature similarity. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 172–188. [Google Scholar]
| MODEL | mIoU (%) ↑ | Accuracy (%) ↑ | Parameters (×10⁶) ↓ | FLOPS (×10⁹) ↓ |
|---|---|---|---|---|
| *Efficient/real-time small architectures* | | | | |
| FPENet [27] | 10.1 | 70.3 | 0.5 | 0.8 |
| ESPNet-v2 [28] | 12.3 | 73.0 | 3.5 | 0.8 |
| ENet [29] | 23.4 | 78.2 | 0.4 | 2.1 |
| DABNet [30] | 15.3 | 77.4 | 0.8 | 5.2 |
| LEDNet [5] | 30.3 | 72.9 | 1.1 | 19.6 |
| ICNet [31] | 23.4 | 78.2 | 7.8 | 10.6 |
| MobileNet-v2 [32] | 17.6 | 77.6 | 3.3 | 29.3 |
| ESNet [33] | 43.6 | 45.5 | 1.7 | 27.3 |
| *LGA incorporation into small architectures* | | | | |
| Shuffle [15] with LGA | 44.5 | 78.7 | 0.4 | 3.5 |
| Squeeze [34] with LGA | 44.6 | 79.6 | 1.1 | 13.5 |
| *Large architectures* | | | | |
| ViT [13] | 29.6 | 67.8 | 171.6 | 176.7 |
| PSPNet (Res34) [10] | 43.2 | 82.8 | 21.5 | 19.3 |
| PSPNet (Res50) [10] | 43.2 | 83.2 | 24.4 | 24.0 |
| DeepLab [6] | 59.1 | 89.6 | 39.6 | 328.0 |
| MODEL | SSIM ↑ | PSNR (dB) ↑ |
|---|---|---|
| Input (hazy image) | 0.7302 | 13.80 |
| He et al. [35] | 0.7516 | 14.43 |
| Zhu et al. [36] | 0.6065 | 12.24 |
| Ren et al. [37] | 0.7545 | 15.22 |
| Berman et al. [38] | 0.6537 | 14.12 |
| Li et al. [39] | 0.7323 | 13.98 |
| BPPNet-reduced | 0.8482 | 18.89 |
| BPPNet-reduced with LGA | 0.8663 | 20.17 |
| MODEL | Final: EPE All ↓ | Final: EPE Matched ↓ | Final: EPE Unmatched ↓ | Clean: EPE All ↓ | Clean: EPE Matched ↓ | Clean: EPE Unmatched ↓ |
|---|---|---|---|---|---|---|
| UFlow [41] | 6.498 | 3.078 | 34.398 | 5.205 | 2.036 | 31.058 |
| FastFlowNet [42] | 6.080 | 2.942 | 31.692 | 4.886 | 1.789 | 30.182 |
| MDFlow-fast [16] | 5.994 | 2.770 | 32.283 | 4.733 | 1.673 | 29.718 |
| UnsupSimFlow [43] | 6.916 | 3.017 | 38.702 | 5.926 | 2.159 | 36.655 |
| ARFlow [26] | 5.889 | 2.734 | 31.602 | 4.782 | 1.908 | 28.261 |
| LGA | 5.502 | 2.604 | 29.142 | 4.109 | 1.597 | 24.626 |
| MODEL | mIoU (%) ↑ | Accuracy (%) ↑ | Extra Parameters (×10³) ↓ | Extra FLOPS (×10⁶) ↓ |
|---|---|---|---|---|
| CCNet [8] | 42.8 | 79.6 | 2686 | 5652 |
| LGA | 45.8 | 81.8 | 132 | 140 |
| LGA small | 44.6 | 79.6 | 17 | 22 |
| MODEL | Variation | mIoU (%) | Parameters (×10⁶) | FLOPS (×10⁹) |
|---|---|---|---|---|
| Squeeze | No LGA | 41.5 | 1.018 | 13.140 |
| Squeeze | LGA with SC and divergence loss | 45.8 | 1.150 | 13.280 |
| Squeeze | LGA with GC and divergence loss | 44.6 | 1.035 | 13.162 |
| Squeeze | LGA with SC, no divergence loss | 43.6 | | |
| Squeeze | LGA with GC, no divergence loss | 42.3 | | |
| Shuffle | No LGA | 36.9 | 0.395 | 3.42 |
| Shuffle | LGA with SC and divergence loss | 44.6 | 0.506 | 3.69 |
| Shuffle | LGA with GC and divergence loss | 44.5 | 0.412 | 3.5 |
| Shuffle | LGA with SC, no divergence loss | 43.0 | | |
| Shuffle | LGA with GC, no divergence loss | 41.6 | | |

SC: standard convolution; GC: group convolution (Section 4.6.3).
| | Original | Reduced | Reduced with LGA |
|---|---|---|---|
| SSIM | 0.8994 | 0.8482 | 0.8663 |
| PSNR (dB) | 22.56 | 18.89 | 20.17 |
| Parameters (×10⁶) | 8.851 | 0.685 | 0.687 |
| FLOPS (×10⁹) | 348.49 | 87.44 | 87.78 |
| No. of LGA Layers | 0 | 1 | 2 | 4 | 8 |
|---|---|---|---|---|---|
| mIoU (%) | 36.9 | 43.1 | 43.9 | 44.5 | 41.1 |
| MODEL | mIoU ↑ | Accuracy ↑ | Extra Params (×10³): Conv. Channel Resizing ↓ | Extra Params (×10³): Attention Module ↓ | Extra Params (×10³): Total ↓ | Extra FLOPS (×10⁶): Conv. Channel Resizing ↓ | Extra FLOPS (×10⁶): Attention Module, Information Propagation ↓ | Extra FLOPS (×10⁶): Attention Module, Other Conv. Operations ↓ | Extra FLOPS (×10⁶): Total ↓ |
|---|---|---|---|---|---|---|---|---|---|
| CCNet [8] | 42.8 | 79.6 | 2359 | 327 | 2686 | 4832 | 150 | 670 | 5652 |
| LGA | 45.8 | 81.8 | 66 | 67 | 132 | 67 | 5 | 68 | 140 |
| LGA small | 44.6 | 79.6 | 8 | 9 | 17 | 8 | 5 | 9 | 22 |