High-Performance Binocular Disparity Prediction Algorithm for Edge Computing
Abstract
1. Introduction
- The three-dimensional convolutions in the matching cost aggregation stage are replaced by equivalent one- and two-dimensional convolutions according to the low-rank approximation principle, and the way these one- and two-dimensional convolutions act on the output is derived and verified, which greatly reduces the number of network weights (a sketch of this factorization is given after this list).
- In terms of disparity accuracy, an activation function with pixel-level modeling capability is used to optimize gradient propagation in the disparity computing network after compression and low-rank approximation, improving its performance. On an edge computing device, this activation function requires only one convolution layer and one max operation (see the second sketch after this list).
- The matching cost volume is regularized using unimodal cost volume filtering and a confidence estimation network whose parameters are updated by an independent loss function only during training; this reduces video memory usage at inference time and alleviates the mismatch between the disparity matching cost distribution and the true distribution (see the third sketch after this list).
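The following minimal PyTorch sketch illustrates the low-rank factorization named in the first contribution: a k × k × k cost-aggregation convolution is approximated by a spatial (1 × k × k) convolution followed by a (k × 1 × 1) convolution along the disparity dimension. The channel counts and layer layout are illustrative assumptions, not the exact EC-P3D configuration.

```python
import torch
import torch.nn as nn

class PseudoConv3d(nn.Module):
    """Approximates a k x k x k Conv3d with a (1, k, k) spatial convolution
    followed by a (k, 1, 1) convolution along the disparity dimension."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        p = k // 2
        # 2D convolution acting in the spatial (H, W) directions
        self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, k, k),
                                 padding=(0, p, p), bias=False)
        # 1D convolution acting in the disparity (D) direction
        self.disparity = nn.Conv3d(out_ch, out_ch, kernel_size=(k, 1, 1),
                                   padding=(p, 0, 0), bias=False)

    def forward(self, x):  # x: (N, C, D, H, W) matching cost volume
        return self.disparity(self.spatial(x))
```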
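A second sketch shows how an activation with pixel-wise modeling capability can be built from one convolution layer and one max operation, as stated in the second contribution. The depthwise 3 × 3 convolution used here is an assumption and may differ from the exact WReLU formulation.

```python
import torch
import torch.nn as nn

class PixelwiseMaxActivation(nn.Module):
    """One convolution plus one max: the convolution produces a learned,
    spatially varying response that competes with the identity branch."""
    def __init__(self, channels, k=3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=k,
                              padding=k // 2, groups=channels, bias=True)

    def forward(self, x):
        # Per-pixel maximum between the input and its learned local response.
        return torch.max(x, self.conv(x))
```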
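Finally, a sketch of confidence-adaptive unimodal cost-volume supervision in the spirit of AcfNet: a predicted per-pixel confidence sets the sharpness of a unimodal target distribution centred on the ground-truth disparity, and the matching cost volume is supervised against this target during training only. The scale constants and the cross-entropy form are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def unimodal_target(d_gt, conf, max_disp, s=1.0, eps=0.5):
    """d_gt: (N, H, W) ground-truth disparity; conf: (N, H, W) in [0, 1].
    Returns an (N, max_disp, H, W) unimodal distribution over disparities;
    low confidence widens the peak, high confidence sharpens it."""
    sigma = s * (1.0 - conf) + eps
    d = torch.arange(max_disp, device=d_gt.device, dtype=d_gt.dtype).view(1, -1, 1, 1)
    logits = -torch.abs(d - d_gt.unsqueeze(1)) / sigma.unsqueeze(1)
    return F.softmax(logits, dim=1)

def cost_volume_loss(cost_volume, d_gt, conf, max_disp):
    """Cross-entropy between the predicted matching-cost distribution and the
    confidence-adaptive unimodal target; applied only in the training stage."""
    log_p = F.log_softmax(cost_volume, dim=1)   # (N, max_disp, H, W)
    target = unimodal_target(d_gt, conf, max_disp)
    return -(target * log_p).sum(dim=1).mean()
```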
2. Related Works
2.1. Disparity Estimation
2.2. Three-Dimensional Convolution and Its Optimization
3. Methods
3.1. Overview
3.2. Pseudo 3D Convolution
3.3. WReLU Activation Function with Pixel-Level Modeling Capability
3.4. EC-P3D Module and Transposed EC-P3D Module
3.5. Adaptive Unimodal Cost Volume Filtering
3.6. Loss Function of Multi-Module Fusion Training
4. Experiments
4.1. Datasets
4.2. Training Details
4.3. Metrics
4.4. Results and Evaluation
4.4.1. Qualitative Analysis and Comparison of Algorithm Performance
- (1) In the error heat maps, the output of the proposed algorithm is cooler overall, indicating higher overall accuracy than the compared algorithms.
- (2) The output disparity map is dense, and the disparity varies continuously within semantically coherent regions. Although compression and low-rank approximation are applied so that the network can run on edge terminals, the quality of the disparity output is preserved.
- (3) Owing to the regularization of the matching cost volume, the proposed algorithm achieves sharper and clearer boundaries in edge areas than the compared algorithms. As shown in the black box in Figure 9, it represents fine structures and edges better than the other networks. The confidence network outputs shown in Figures 10 and 11 not only indicate the reliability of the predicted disparity at each pixel but also reflect how likely the corresponding scene point is to lie on an edge, in an occluded region, or within a fine structure.
4.4.2. Quantitative Analysis and Comparison of Algorithm Performance
4.4.3. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Hu, K.; Wang, T.; Shen, C.; Weng, C.; Zhou, F.; Xia, M.; Weng, L. Overview of underwater 3D reconstruction technology based on optical images. J. Mar. Sci. Eng. 2023, 11, 949. [Google Scholar] [CrossRef]
- Janai, J.; Güney, F.; Behl, A.; Geiger, A. Computer vision for autonomous vehicles: Problems, datasets and state of the art. Found. Trends Comput. Graph. Vis. 2020, 12, 1–308. [Google Scholar] [CrossRef]
- Schmid, K.; Tomic, T.; Ruess, F.; Hirschmüller, H.; Suppa, M. Stereo vision based indoor/outdoor navigation for flying robots. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 3–7 November 2013; pp. 3955–3962. [Google Scholar]
- Zenati, N.; Zerhouni, N. Dense stereo matching with application to augmented reality. In Proceedings of the 2007 IEEE International Conference on Signal Processing and Communications, Dubai, United Arab Emirates, 24–27 November 2007; pp. 1503–1506. [Google Scholar]
- Liu, F.; Qiao, R.; Chen, G.; Gong, G.; Lu, H. CASSANN-v2: A high-performance CNN accelerator architecture with on-chip memory self-adaptive tuning. IEICE Electron. Express 2022, 19, 20220124. [Google Scholar] [CrossRef]
- Žbontar, J.; LeCun, Y. Stereo matching by training a convolutional neural network to compare image patches. J. Mach. Learn. Res. 2016, 17, 1–32. [Google Scholar]
- Guney, F.; Geiger, A. Displets: Resolving stereo ambiguities using object knowledge. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 4165–4175. [Google Scholar]
- Pang, J.; Sun, W.; Ren, J.S.; Yang, C.; Yan, Q. Cascade residual learning: A two-stage convolutional neural network for stereo matching. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 887–895. [Google Scholar]
- Mayer, N.; Ilg, E.; Hausser, P.; Fischer, P.; Cremers, D.; Dosovitskiy, A.; Brox, T. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4040–4048. [Google Scholar]
- Eigen, D.; Puhrsch, C.; Fergus, R. Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inf. Process. Syst. 2014, 27, 2366–2374. [Google Scholar]
- Laina, I.; Rupprecht, C.; Belagiannis, V.; Tombari, F.; Navab, N. Deeper depth prediction with fully convolutional residual networks. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 239–248. [Google Scholar]
- Scharstein, D.; Szeliski, R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 2002, 47, 7–42. [Google Scholar] [CrossRef]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Žbontar, J.; LeCun, Y. Computing the stereo matching cost with a convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1592–1599. [Google Scholar]
- Kendall, A.; Martirosyan, H.; Dasgupta, S.; Henry, P.; Kennedy, R.; Bachrach, A.; Bry, A. End-to-end learning of geometry and context for deep stereo regression. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 66–75. [Google Scholar]
- Chang, J.R.; Chen, Y.S. Pyramid stereo matching network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5410–5418. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
- Teed, Z.; Deng, J. Raft: Recurrent all-pairs field transforms for optical flow. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part II 16. pp. 402–419. [Google Scholar]
- Tankovich, V.; Hane, C.; Zhang, Y.; Kowdle, A.; Fanello, S.; Bouaziz, S. Hitnet: Hierarchical iterative tile refinement network for real-time stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14362–14372. [Google Scholar]
- Ji, S.; Xu, W.; Yang, M.; Yu, K. 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 221–231. [Google Scholar] [CrossRef]
- Tran, D.; Bourdev, L.; Fergus, R.; Torresani, L.; Paluri, M. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 4489–4497. [Google Scholar]
- Hara, K.; Kataoka, H.; Satoh, Y. Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6546–6555. [Google Scholar]
- Fan, H.; Niu, X.; Liu, Q.; Luk, W. F-C3D: FPGA-based 3-dimensional convolutional neural network. In Proceedings of the 2017 27th International Conference on Field Programmable Logic and Applications (FPL), Ghent, Belgium, 4–8 September 2017; pp. 1–4. [Google Scholar]
- Qiu, Z.; Yao, T.; Mei, T. Learning spatio-temporal representation with pseudo-3D residual networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5533–5541. [Google Scholar]
- Liu, Y.; Guo, X.; Tan, K.; Gong, G.; Lu, H. Novel activation function with pixelwise modeling capacity for lightweight neural network design. Concurr. Comput. Pract. Exp. 2021, 35, e6350. [Google Scholar] [CrossRef]
- Menze, M.; Geiger, A. Object scene flow for autonomous vehicles. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3061–3070. [Google Scholar]
Name | Parameters | Computational Complexity |
---|---|---|
Standard 3D convolution | C_in · C_out · k^3 | O(C_in · C_out · k^3 · D·H·W) |
2D convolution in the spatial direction | C_in · C_out · k^2 | O(C_in · C_out · k^2 · D·H·W) |
1D convolution in the temporal direction | C_in · C_out · k | O(C_in · C_out · k · D·H·W) |
Spatial + temporal | C_in · C_out · (k^2 + k) | O(C_in · C_out · (k^2 + k) · D·H·W) |
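As an illustrative reading of the table (the channel count and kernel size below are assumptions, not the paper's exact configuration): for k = 3 with C_in = C_out = 32, a standard 3D convolution needs 32 × 32 × 3^3 = 27,648 weights per layer, while the spatial (1 × 3 × 3) plus temporal (3 × 1 × 1) factorization needs 32 × 32 × (3^2 + 3) = 12,288 weights, roughly a 2.25× reduction; the number of multiply-accumulate operations per output element shrinks by the same factor.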
Method | EPE | Three-Pixel Error (%) | Parameters | Running Time |
---|---|---|---|---|
PSMNet | 1.09 | 4.35 | 5.2 M | 0.50 s |
AnyNet | 3.19 | 6.20 | 0.04 M | 97.3 ms |
DeepPruner | 0.86 | 2.15 | N/A | 182 ms |
AANet | 0.87 | 2.55 | N/A | 62 ms |
AcfNet | 0.86 | 1.89 | 5.6 M | 0.48 s |
GC-Net | 2.51 | 2.87 | 3.5 M | 0.95 s |
Proposed method | 0.77 | 1.41 | 1.7 M | 0.48 s |
Configuration | EPE | Three-Pixel Error (%) | Parameters | Running Time |
---|---|---|---|---|
PSMNet infrastructure | 1.090 | 4.346 | 5.22 M | 500 ms |
Network compression added | 1.135 | 4.864 | 3.84 M | 452.4 ms |
EC-P3D module added | 1.187 | 5.124 | 1.72 M | 476.2 ms |
WReLU added | 1.073 | 3.573 | 1.74 M | 483.1 ms |
Cost volume regularization added | 0.770 | 1.41 | 1.74 M | 483.1 ms |
Configuration | EPE (Stage 0 / Stage 1 / Stage 2) | Three-Pixel Error (%) | Parameters | FPS |
---|---|---|---|---|
AnyNet infrastructure | 5.44 / 4.88 / 4.51 | 7.25 | 34,629 | 88.1 |
EC-P3D module added | 5.79 / 5.12 / 4.74 | 7.62 | 22,683 | 92.5 |
WReLU added | 5.11 / 4.63 / 4.11 | 6.85 | 22,683 | 91.6 |
Network | Method | Top-1 Accuracy | Top-5 Accuracy | Parameters |
---|---|---|---|---|
SqueezeNet 1.0 | Original | 57.50% | 80.30% | 1.25 M |
SqueezeNet 1.0 | This design | 64.55% | 85.09% | 1.25 M |
SqueezeNet 1.1 | Original | 57.10% | 80.30% | 1.24 M |
SqueezeNet 1.1 | This design | 64.08% | 84.98% | 1.24 M |
SqNxt-23 | Original | 57.80% | 80.90% | 0.72 M |
SqNxt-23 | This design | 65.15% | 86.33% | 0.77 M |
MobileNetV2 | Original | 71.88% | 90.29% | 3.51 M |
MobileNetV2 | This design | 73.80% | 91.64% | 3.59 M |
ShuffleNetV2x05 | Original | 58.62% | 81.14% | 1.37 M |
ShuffleNetV2x05 | This design | 62.16% | 83.45% | 1.37 M |