Depth Map Super-Resolution Based on Semi-Couple Deformable Convolution Networks
Abstract
1. Introduction
2. Related Work
2.1. Benchmark Datasets
2.2. GDSR Algorithm
2.2.1. Traditional GDSR Method
2.2.2. Deep Learning GDSR Methods
2.3. Comparison with Existing Methods
3. Real-Sense Dataset
- (i)
- (ii) A depth completion model [53] is employed to restore the collected images directly.
- (i) Poses of people against different backgrounds.
- (ii) Various models, such as chairs and backpacks.
- (iii) Densely intertwined plants, such as flowers.
4. Methods
4.1. Problem Formula
- (i) Learning a suitable threshold function to select the edges that are beneficial to the task;
- (ii) Modifying the model to adaptively select the value of .

The following section presents a novel SCD-Net model that addresses these issues and prevents excessive RGB texture migration in cross-modal image processing.
4.2. Overall Network Architecture
- (i) A set of high-resolution (HR) RGB images and low-resolution depth maps (LRDMs) is taken as input. The semi-coupled residual module extracts the shared and private features from the source images.
- (ii) For each element in the feature map, a set of matching weights and offsets is learned, enabling the filter to extract information beneficial to the task from different images.
- (iii) The obtained weights and offsets are multiplied and concatenated with the original LRDM to obtain an HR feature map.
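Steps (ii) and (iii) can be illustrated with a minimal sketch of per-pixel filtering with weights and offsets, in the spirit of deformable kernel filtering. This is not the SCD-Net implementation: the function names `bilinear_sample` and `deformable_filter`, the nested-list layout, and the number of sampling points per pixel are assumptions made for illustration, and the weights and offsets are fixed by hand here, whereas in the network they would be predicted from the extracted features.

```python
import math


def bilinear_sample(img, y, x):
    """Sample a 2-D list `img` at a fractional (y, x), clamping to the border."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
    bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def deformable_filter(depth, weights, offsets):
    """Per-pixel weighted sum of depth samples taken at offset positions.

    depth:   H x W list of floats (hypothetical upsampled depth map)
    weights: H x W x K list, one weight per sampling point (assumed layout)
    offsets: H x W x K list of (dy, dx) displacements (assumed layout)
    """
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for wk, (dy, dx) in zip(weights[i][j], offsets[i][j]):
                # Each sampling point reads the depth map at a shifted,
                # possibly fractional, location.
                acc += wk * bilinear_sample(depth, i + dy, j + dx)
            out[i][j] = acc
    return out


# Toy example: two sampling points per pixel, hand-picked values.
depth = [[1.0, 2.0], [3.0, 4.0]]
weights = [[[0.5, 0.5] for _ in range(2)] for _ in range(2)]
offsets = [[[(0.0, 0.0), (0.0, 0.5)] for _ in range(2)] for _ in range(2)]
print(deformable_filter(depth, weights, offsets))  # [[1.25, 2.0], [3.25, 4.0]]
```

Because the offsets are fractional, the sampling positions are not tied to a regular grid, which is what lets such a filter gather depth values from the most informative neighbours rather than from a fixed window.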
4.2.1. Semi-Couple Feature Extractor
4.2.2. Deformable Kernel for Guided Edge
4.2.3. Training Loss
5. Experiments
5.1. Setup
5.2. Experimental Details
5.3. Comparison with Other Methods
5.4. Ablation Study
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Chatzopoulos, D.; Bermejo, C.; Huang, Z.; Hui, P. Mobile augmented reality survey: From where we are to where we go. IEEE Access 2017, 5, 6917–6950.
- Shotton, J.; Girshick, R.; Fitzgibbon, A.; Sharp, T.; Cook, M.; Finocchio, M.; Moore, R.; Kohli, P.; Criminisi, A.; Kipman, A.; et al. Efficient human pose estimation from single depth images. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 2821–2840.
- Rasouli, A.; Tsotsos, J.K. Autonomous vehicles that interact with pedestrians: A survey of theory and practice. IEEE Trans. Intell. Transp. Syst. 2019, 21, 900–918.
- DeSouza, G.N.; Kak, A.C. Vision for mobile robot navigation: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 237–267.
- He, K.; Sun, J.; Tang, X. Guided image filtering. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 1397–1409.
- Riegler, G.; Rüther, M.; Bischof, H. ATGV-Net: Accurate depth super-resolution. In Proceedings of the Computer Vision–ECCV 2016 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part III 14. Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 268–284.
- Liu, B.; Chen, K.; Peng, S.L.; Zhao, M. Adaptive Aggregate Stereo Matching Network with Depth Map Super-Resolution. Sensors 2022, 22, 4548.
- Liu, M.Y.; Tuzel, O.; Taguchi, Y. Joint geodesic upsampling of depth images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 169–176.
- Min, D.; Lu, J.; Do, M.N. Depth video enhancement based on weighted mode filtering. IEEE Trans. Image Process. 2011, 21, 1176–1190.
- Lu, J.; Forsyth, D. Sparse depth super resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2245–2253.
- Lu, J.; Shi, K.; Min, D.; Lin, L.; Do, M.N. Cross-based local multipoint filtering. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; IEEE: New York, NY, USA, 2012; pp. 430–437.
- Diebel, J.; Thrun, S. An application of Markov random fields to range sensing. In Advances in Neural Information Processing Systems, Proceedings of the 18th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 5–8 December 2005; MIT Press: Cambridge, MA, USA, 2005; pp. 291–298.
- Li, Y.; Min, D.; Do, M.N.; Lu, J. Fast guided global interpolation for depth and motion. In Proceedings of the Computer Vision–ECCV 2016 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part III 14. Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 717–733.
- Ferstl, D.; Reinbacher, C.; Ranftl, R.; Rüther, M.; Bischof, H. Image guided depth upsampling using anisotropic total generalized variation. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 993–1000.
- Park, J.; Kim, H.; Tai, Y.W.; Brown, M.S.; Kweon, I. High quality depth map upsampling for 3D-TOF cameras. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; IEEE: New York, NY, USA, 2011; pp. 1623–1630.
- Yang, J.; Ye, X.; Li, K.; Hou, C.; Wang, Y. Color-guided depth recovery from RGB-D data using an adaptive autoregressive model. IEEE Trans. Image Process. 2014, 23, 3443–3458.
- Xie, J.; Feris, R.S.; Yu, S.S.; Sun, M.T. Joint super resolution and denoising from a single depth image. IEEE Trans. Multimed. 2015, 17, 1525–1537.
- Xie, J.; Feris, R.S.; Sun, M.T. Edge-guided single depth image super resolution. IEEE Trans. Image Process. 2015, 25, 428–438.
- Gu, S.; Zuo, W.; Guo, S.; Chen, Y.; Chen, C.; Zhang, L. Learning dynamic guidance for depth image enhancement. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3769–3778.
- Kiechle, M.; Hawe, S.; Kleinsteuber, M. A joint intensity and depth co-sparse analysis model for depth map super-resolution. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 1545–1552.
- Guo, C.; Li, C.; Guo, J.; Cong, R.; Fu, H.; Han, P. Hierarchical features driven residual learning for depth map super-resolution. IEEE Trans. Image Process. 2018, 28, 2545–2557.
- Hui, T.W.; Loy, C.C.; Tang, X. Depth map super-resolution by deep multi-scale guidance. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 353–369.
- Ye, X.; Sun, B.; Wang, Z.; Yang, J.; Xu, R.; Li, H.; Li, B. PMBANet: Progressive multi-branch aggregation network for scene depth super-resolution. IEEE Trans. Image Process. 2020, 29, 7427–7442.
- Kim, B.; Ponce, J.; Ham, B. Deformable kernel networks for joint image filtering. Int. J. Comput. Vis. 2021, 129, 579–600.
- Silberman, N.; Hoiem, D.; Kohli, P.; Fergus, R. Indoor segmentation and support inference from RGBD images. In Proceedings of the Computer Vision–ECCV 2012 12th European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Proceedings, Part V 12. Springer Berlin Heidelberg: Berlin/Heidelberg, Germany, 2012; pp. 746–760.
- Tang, J.; Chen, X.; Zeng, G. Joint implicit image function for guided depth super-resolution. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual, 20–24 October 2021; pp. 4390–4399.
- Nguyen, H.T.; Worring, M.; Van Den Boomgaard, R. Watersnakes: Energy-driven watershed segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 330–342.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722.
- Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773.
- Hirschmuller, H.; Scharstein, D. Evaluation of cost functions for stereo matching. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; IEEE: New York, NY, USA, 2007; pp. 1–8.
- Lu, S.; Ren, X.; Liu, F. Depth enhancement via low-rank matrix completion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 3390–3397.
- Mayer, N.; Ilg, E.; Hausser, P.; Fischer, P.; Cremers, D.; Dosovitskiy, A.; Brox, T. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4040–4048.
- Butler, D.J.; Wulff, J.; Stanley, G.B.; Black, M.J. A naturalistic open source movie for optical flow evaluation. In Proceedings of the Computer Vision–ECCV 2012 12th European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Proceedings, Part VI 12. Springer Berlin Heidelberg: Berlin/Heidelberg, Germany, 2012; pp. 611–625.
- Peris, M.; Martull, S.; Maki, A.; Ohkawa, Y.; Fukui, K. Towards a simulation driven stereo vision system. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), Tsukuba, Japan, 11–15 November 2012; IEEE: New York, NY, USA, 2012; pp. 1038–1042.
- Kopf, J.; Cohen, M.F.; Lischinski, D.; Uyttendaele, M. Joint bilateral upsampling. ACM Trans. Graph. (ToG) 2007, 26, 96-es.
- Tomasi, C.; Manduchi, R. Bilateral filtering for gray and color images. In Proceedings of the Sixth International Conference on Computer Vision (IEEE Cat. No. 98CH36271), Bombay, India, 7 January 1998; IEEE: New York, NY, USA, 1998; pp. 839–846.
- Lo, K.H.; Wang, Y.C.F.; Hua, K.L. Joint trilateral filtering for depth map super-resolution. In Proceedings of the 2013 Visual Communications and Image Processing (VCIP), Kuching, Malaysia, 17–20 November 2013; IEEE: New York, NY, USA, 2013; pp. 1–6.
- Li, Y.; Xue, T.; Sun, L.; Liu, J. Joint example-based depth map super-resolution. In Proceedings of the 2012 IEEE International Conference on Multimedia and Expo, Melbourne, VIC, Australia, 9–13 July 2012; IEEE: New York, NY, USA, 2012; pp. 152–157.
- Kasetkasem, T.; Arora, M.K.; Varshney, P.K. Super-resolution land cover mapping using a Markov random field based approach. Remote Sens. Environ. 2005, 96, 302–314.
- Strong, D.; Chan, T. Edge-preserving and scale-dependent properties of total variation regularization. Inverse Probl. 2003, 19, S165.
- Saputro, D.R.S.; Widyaningsih, P. Limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method for the parameter estimation on geographically weighted ordinal logistic regression model (GWOLR). In Proceedings of the AIP Conference Proceedings, Yogyakarta, Indonesia, 15–16 May 2017; AIP Publishing: Melville, NY, USA, 2017; p. 1868.
- Bi, H.; Zhang, B.; Zhu, X.X.; Hong, W.; Sun, J.; Wu, Y. L1-regularization-based SAR imaging and CFAR detection via complex approximated message passing. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3426–3440.
- Lee, D.; Seung, H.S. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, Proceedings of the 13th International Conference on Neural Information Processing Systems, Denver, CO, USA, 1 January 2000; MIT Press: Cambridge, MA, USA, 2000; pp. 556–562.
- Tosic, I.; Olshausen, B.A.; Culpepper, B.J. Learning sparse representations of depth. IEEE J. Sel. Top. Signal Process. 2011, 5, 941–952.
- Zhang, K.; Gao, X.; Tao, D.; Li, X. Multi-scale dictionary for single image super-resolution. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; IEEE: New York, NY, USA, 2012; pp. 1114–1121.
- Wang, S.; Zhang, L.; Liang, Y.; Pan, Q. Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; IEEE: New York, NY, USA, 2012; pp. 2216–2223.
- Li, Y.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep joint image filtering. In Proceedings of the Computer Vision–ECCV 2016 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part IV 14. Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 154–169.
- Tang, Q.; Cong, R.; Sheng, R.; He, L.; Zhang, D.; Zhao, Y.; Kwong, S. BridgeNet: A joint learning network of depth map super-resolution and monocular depth estimation. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual, 20–24 October 2021; pp. 2148–2157.
- He, L.; Zhu, H.; Li, F.; Bai, H.; Cong, R.; Zhang, C.; Lin, C.; Liu, M.; Zhao, Y. Towards fast and accurate real-world depth super-resolution: Benchmark dataset and baseline. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9229–9238.
- Wu, H.; Zheng, S.; Zhang, J.; Huang, K. Fast end-to-end trainable guided filter. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1838–1847.
- Zhao, Z.; Zhang, J.; Xu, S.; Lin, Z.; Pfister, H. Discrete cosine transform network for guided depth map super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5697–5707.
- Levin, A.; Lischinski, D.; Weiss, Y. Colorization using optimization. In ACM SIGGRAPH 2004 Papers; Association for Computing Machinery: New York, NY, USA, 2004; pp. 689–694.
- Jeon, J.; Lee, S. Reconstruction-based pairwise depth dataset for depth image enhancement using CNN. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 422–438.
- Li, Y.; Huang, J.B.; Ahuja, N.; Yang, M.H. Joint image filtering with deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 1909–1923.
- Su, H.; Jampani, V.; Sun, D.; Gallo, O.; Learned-Miller, E.; Kautz, J. Pixel-adaptive convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11166–11175.
- Deng, X.; Dragotti, P.L. Deep convolutional neural network for multi-modal image restoration and fusion. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3333–3348.

Directions \ Methods | Filter-Based | Optimizer-Based | Learning-Based | Deep Learning
---|---|---|---|---
Computational speed | Fast | Slowest | Fast | Fastest
Accuracy | Lowest | Low | High | Highest
Interpretability | Strong | Strong | Poor | Poorest

Methods | Middlebury ×4 | Middlebury ×8 | Middlebury ×16 | NYU v2 ×4 | NYU v2 ×8 | NYU v2 ×16 | Lu ×4 | Lu ×8 | Lu ×16
---|---|---|---|---|---|---|---|---|---
Bicubic | 4.44 | 7.58 | 11.87 | 8.16 | 14.22 | 22.32 | 5.07 | 9.22 | 14.27
DJF [47] | 1.68 | 3.24 | 5.62 | 2.80 | 5.33 | 9.46 | 1.65 | 3.96 | 6.75
DJFR [54] | 1.32 | 3.19 | 5.57 | 2.38 | 4.94 | 9.18 | 1.15 | 3.57 | 6.77
PAC [55] | 1.32 | 2.62 | 4.58 | 1.89 | 3.33 | 6.78 | 1.20 | 2.33 | 5.19
CUNet [56] | 1.10 | 2.17 | 4.33 | 1.92 | 3.70 | 6.78 | 0.91 | 2.23 | 4.99
DKN [24] | 1.08 | 2.17 | 4.50 | 1.86 | 3.58 | 6.96 | 0.82 | 2.10 | 5.05
Ours | 1.13 | 2.13 | 4.39 | 1.68 | 3.45 | 6.88 | 0.86 | 1.92 | 4.88

Number of Kernel-Group i (j = 3)

Scale \ i | 8 | 16 | 32 | 64 | 128
---|---|---|---|---|---
×4 | 1.9146 | 1.8432 | 1.8247 | 1.6816 | 1.6729
×8 | 4.1318 | 3.8965 | 3.8424 | 3.4579 | 3.4413
×16 | 8.3753 | 7.8421 | 7.6025 | 6.8872 | 6.9037

Number of Kernel-Group j (i = 64)

Scale \ j | 2 | 3 | 4 | 5 | 6
---|---|---|---|---|---
×4 | 2.0121 | 1.6816 | 1.6732 | 1.6553 | 1.6599
×8 | 3.8937 | 3.4579 | 3.4599 | 3.4603 | 3.4623
×16 | 7.5524 | 6.8872 | 6.8551 | 6.8324 | 6.8121

RMSE | DJF | DJFR | DKN | SCD | SCD_R
---|---|---|---|---|---
×4 | 2.66 | 2.23 | 1.75 | 1.58 | 1.47
×8 | 5.22 | 4.61 | 3.35 | 3.22 | 3.01
×16 | 9.11 | 8.89 | 6.56 | 6.34 | 6.03
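
For readers unfamiliar with the metric, the root-mean-square error (RMSE) reported above can be computed as in the following minimal sketch. The function name `rmse` and the nested-list representation of depth maps are assumptions for illustration; the depth units and any per-dataset evaluation protocol used for the tables are not specified here and are not reproduced.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between two equally sized 2-D depth maps."""
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("pred and target must have the same shape")
    squared_errors = [
        (p - t) ** 2
        for p_row, t_row in zip(pred, target)
        for p, t in zip(p_row, t_row)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy example with a single erroneous pixel (error of 2 over 4 pixels).
prediction = [[1.0, 2.0], [3.0, 4.0]]
ground_truth = [[1.0, 2.0], [3.0, 6.0]]
print(rmse(prediction, ground_truth))  # 1.0
```

Lower values indicate that the super-resolved depth map is closer to the ground truth, which is how the rows of the tables should be read.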
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, B.; Chen, K.; Peng, S.-L.; Zhao, M. Depth Map Super-Resolution Based on Semi-Couple Deformable Convolution Networks. Mathematics 2023, 11, 4556. https://doi.org/10.3390/math11214556