Channel Interaction and Transformer Depth Estimation Network: Robust Self-Supervised Depth Estimation Under Varied Weather Conditions
Abstract
1. Introduction
- This paper trains on a combination of original, exposure-adjusted, and GAN-enhanced images to strengthen the model’s ability to generalize across different weather scenarios.
- Two key modules are integrated: the Channel Interaction Module improves scene comprehension by aggregating information from the preceding five frames, while the Multi-Scale Fusion Module optimizes the network’s use of features at multiple levels.
- This paper introduces a mask when computing the reconstruction loss of the enhanced images, reducing the depth estimation bias caused by data augmentation, and proposes an enhanced consistency loss that keeps the depth predicted from the original and enhanced inputs consistent (see the sketch after this list).
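To make the last two points concrete, the following PyTorch sketch shows one way a masked reconstruction term and a depth-consistency term between the original and enhanced views could be written. The tensor names, the mask rule, and the stop-gradient on the original depth are illustrative assumptions, not the paper’s exact formulation.

```python
import torch

def masked_reconstruction_loss(photo_err_enh: torch.Tensor,
                               photo_err_orig: torch.Tensor) -> torch.Tensor:
    # Keep only pixels where the enhanced image reconstructs at least as well
    # as the original, so augmentation artifacts do not bias the depth network.
    # (Illustrative mask rule; the paper's mask may be defined differently.)
    mask = (photo_err_enh <= photo_err_orig).float()
    return (mask * photo_err_enh).sum() / mask.sum().clamp(min=1.0)

def enhanced_consistency_loss(depth_orig: torch.Tensor,
                              depth_enh: torch.Tensor) -> torch.Tensor:
    # Penalize differences between the depth maps predicted from the original
    # and the enhanced image; detaching the original keeps it a stable target.
    return torch.abs(depth_enh - depth_orig.detach()).mean()
```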
2. Related Work
2.1. Supervised Depth Estimation
2.2. Self-Supervised Depth Estimation
3. Method
3.1. Data Augmentation for Simulating Weather Variability
3.2. Self-Supervised Depth Estimation Network
3.2.1. Channel Interaction Enhanced Depth Encoder
3.2.2. Multi-Scale Feature Fusion Depth Decoder
3.3. Loss Function with Enhanced Consistency Loss
Algorithm 1 CIT-Depth: Depth Estimation Network
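As a rough outline of how the components above interact during training, the Python sketch below shows a single self-supervised training step that feeds both the original and a weather-augmented view through the depth network. Every name (depth_net, pose_net, augment, losses, optimizer) is a placeholder; this is a hedged reading of the pipeline, not a reproduction of the authors’ Algorithm 1.

```python
def training_step(batch, depth_net, pose_net, augment, losses, optimizer):
    target, sources = batch["target"], batch["sources"]    # adjacent video frames
    target_aug = augment(target)                           # exposure / GAN-style weather augmentation

    depth = depth_net(target)                              # depth from the original image
    depth_aug = depth_net(target_aug)                      # depth from the augmented image
    poses = [pose_net(target, s) for s in sources]         # relative camera poses

    # Combine photometric reconstruction, masked reconstruction on the
    # augmented view, depth consistency, and edge-aware smoothness terms.
    loss = (losses.reconstruction(target, sources, depth, poses)
            + losses.masked_reconstruction(target_aug, sources, depth_aug, poses)
            + losses.consistency(depth, depth_aug)
            + losses.smoothness(depth, target))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```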
4. Experiments
4.1. Implementation Details
4.2. Data Augmentation with GAN-Generated Images
4.3. Results on KITTI
Method | Data | Resolution | Abs Rel | Sq Rel | RMSE | RMSE log | δ < 1.25 | δ < 1.25² | δ < 1.25³
---|---|---|---|---|---|---|---|---|---
SfMLearner [10] | M | 640 × 192 | 0.183 | 1.595 | 6.709 | 0.270 | 0.734 | 0.902 | 0.959 |
Monodepth2-Res18 [11] | M | 640 × 192 | 0.115 | 0.903 | 4.863 | 0.193 | 0.877 | 0.959 | 0.981 |
Monodepth2-Res50 [11] | M | 640 × 192 | 0.110 | 0.830 | 4.642 | 0.189 | 0.882 | 0.961 | 0.981 |
SGDepth [12] | M + Se | 640 × 192 | 0.113 | 0.832 | 4.691 | 0.190 | 0.880 | 0.961 | 0.981 |
PackNet-SfM [23] | M | 640 × 192 | 0.111 | 0.784 | 4.601 | 0.189 | 0.877 | 0.960 | 0.982 |
HR-Depth [38] | M | 640 × 192 | 0.109 | 0.791 | 4.633 | 0.186 | 0.884 | 0.961 | 0.983 |
ADAADepth [39] | M | 640 × 192 | 0.111 | 0.815 | 4.684 | 0.187 | 0.883 | 0.961 | 0.982 |
BRNet [40] | M | 640 × 192 | 0.105 | 0.699 | 4.465 | 0.180 | 0.888 | 0.963 | 0.984 |
CADepth [41] | M | 640 × 192 | 0.105 | 0.765 | 4.535 | 0.181 | 0.891 | 0.964 | 0.983 |
DIFFNet [42] | M | 640 × 192 | 0.102 | 0.750 | 4.447 | 0.179 | 0.896 | 0.965 | 0.983 |
MonoViT [32] | M | 640 × 192 | 0.099 | 0.708 | 4.374 | 0.175 | 0.900 | 0.967 | 0.984 |
CIT-Depth (Ours) | M | 640 × 192 | 0.097 | 0.655 | 4.214 | 0.172 | 0.902 | 0.968 | 0.984 |
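For reference, the error columns (lower is better) and the δ accuracy columns (higher is better) in the tables follow the standard monocular depth evaluation protocol. Below is a minimal NumPy sketch of these metrics, assuming predicted and ground-truth depth arrays that have already been median-scaled and restricted to valid pixels:

```python
import numpy as np

def depth_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Standard error and accuracy metrics over valid pixels."""
    thresh = np.maximum(gt / pred, pred / gt)
    return {
        "abs_rel": np.mean(np.abs(pred - gt) / gt),
        "sq_rel": np.mean((pred - gt) ** 2 / gt),
        "rmse": np.sqrt(np.mean((pred - gt) ** 2)),
        "rmse_log": np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2)),
        "a1": np.mean(thresh < 1.25),
        "a2": np.mean(thresh < 1.25 ** 2),
        "a3": np.mean(thresh < 1.25 ** 3),
    }
```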
4.4. Results on Make3D
4.5. Results on DrivingStereo
4.6. Results on Foggy CityScape
4.7. Results on NuScene-Night
Method | Abs Rel | Sq Rel | RMSE | RMSE log | δ < 1.25 | δ < 1.25² | δ < 1.25³
---|---|---|---|---|---|---|---
Monodepth2 [11] | 0.398 | 6.210 | 14.571 | 0.568 | 0.378 | 0.650 | 0.794 |
HR-Depth [38] | 0.460 | 6.635 | 15.028 | 0.622 | 0.305 | 0.570 | 0.749 |
CADepth [41] | 0.421 | 5.950 | 14.504 | 0.593 | 0.311 | 0.613 | 0.776 |
DIFFNet [42] | 0.344 | 4.853 | 13.154 | 0.491 | 0.440 | 0.710 | 0.838 |
ADDS-Depth [48] | 0.321 | 4.594 | 12.909 | 0.479 | 0.466 | 0.711 | 0.840 |
MonoViT [32] | 0.313 | 4.144 | 12.255 | 0.456 | 0.484 | 0.736 | 0.858 |
CIT-Depth (Ours) | 0.307 | 4.080 | 11.591 | 0.431 | 0.539 | 0.781 | 0.863 |
4.8. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Afshar, M.F.; Shirmohammadi, Z.; Ghahramani, S.A.A.G.; Noorparvar, A.; Hemmatyar, A.M.A. An Efficient Approach to Monocular Depth Estimation for Autonomous Vehicle Perception Systems. Sustainability 2023, 15, 8897.
2. Ebner, L.; Billings, G.; Williams, S. Metrically scaled monocular depth estimation through sparse priors for underwater robots. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 3751–3757.
3. Jia, Q.; Chang, L.; Qiang, B.; Zhang, S.; Xie, W.; Yang, X.; Sun, Y.; Yang, M. Real-time 3D reconstruction method based on monocular vision. Sensors 2021, 21, 5909.
4. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2017; pp. 6000–6010.
5. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680.
6. Eigen, D.; Puhrsch, C.; Fergus, R. Depth map prediction from a single image using a multi-scale deep network. In Proceedings of the International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2014; pp. 2366–2374.
7. Laina, I.; Rupprecht, C.; Belagiannis, V.; Tombari, F.; Navab, N. Deeper depth prediction with fully convolutional residual networks. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 239–248.
8. Fu, H.; Gong, M.; Wang, C.; Batmanghelich, K.; Tao, D. Deep ordinal regression network for monocular depth estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2002–2011.
9. Bhat, S.F.; Alhashim, I.; Wonka, P. AdaBins: Depth estimation using adaptive bins. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 4009–4018.
10. Zhou, T.; Brown, M.; Snavely, N.; Lowe, D.G. Unsupervised learning of depth and ego-motion from video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1851–1858.
11. Godard, C.; Mac Aodha, O.; Firman, M.; Brostow, G.J. Digging into self-supervised monocular depth estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 3828–3838.
12. Klingner, M.; Termöhlen, J.-A.; Mikolajczyk, J.; Fingscheidt, T. Self-supervised monocular depth estimation: Solving the dynamic object problem by semantic guidance. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; Springer: Cham, Switzerland, 2020; pp. 582–600.
13. Godard, C.; Mac Aodha, O.; Brostow, G.J. Unsupervised monocular depth estimation with left-right consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 270–279.
14. Garg, R.; Vijay Kumar, B.G.; Carneiro, G.; Reid, I. Unsupervised CNN for single view depth estimation: Geometry to the rescue. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 740–756.
15. Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3349–3364.
16. Wang, H.; Zhu, Y.; Adam, H.; Yuille, A.; Chen, L.-C. MaX-DeepLab: End-to-end panoptic segmentation with mask transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 5463–5474.
17. Zhao, C.; Tang, Y.; Sun, Q. Unsupervised monocular depth estimation in highly complex environments. IEEE Trans. Emerg. Top. Comput. Intell. 2022, 6, 1237–1246.
18. Gasperini, S.; Morbitzer, N.; Jung, H.; Navab, N.; Tombari, F. Robust monocular depth estimation under challenging conditions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 8177–8186.
19. Spencer, J.; Bowden, R.; Hadfield, S. DeFeat-Net: General monocular depth via simultaneous unsupervised representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 14402–14413.
20. Choi, J.; Jung, D.; Lee, D.; Kim, C. SAFENet: Self-supervised monocular depth estimation with semantic-aware feature extraction. In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 6–12 December 2020.
21. Wang, K.; Zhang, Z.; Yan, Z.; Li, X.; Xu, B.; Li, J.; Yang, J. Regularizing nighttime weirdness: Efficient self-supervised monocular depth estimation in the dark. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 16055–16064.
22. Yin, Z.; Shi, J. GeoNet: Unsupervised learning of dense depth, optical flow and camera pose. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1983–1992.
23. Guizilini, V.; Ambrus, R.; Pillai, S.; Raventos, A.; Gaidon, A. 3D packing for self-supervised monocular depth estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 2485–2494.
24. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 3–7 May 2021.
25. Varma, A.; Chawla, H.; Zonooz, B.; Arani, E. Transformers in self-supervised monocular depth estimation with unknown camera intrinsics. arXiv 2022, arXiv:2202.03131.
26. Lasinger, K.; Ranftl, R.; Schindler, K.; Koltun, V. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. arXiv 2019, arXiv:1907.01341.
27. Li, Z.; Chen, Z.; Liu, X.; Jiang, J. DepthFormer: Exploiting long-range correlation and local information for accurate monocular depth estimation. Mach. Intell. Res. 2023, 20, 837–854.
28. Hwang, S.-J.; Park, S.-J.; Baek, J.-H.; Kim, B. Self-supervised monocular depth estimation using hybrid transformer encoder. IEEE Sens. J. 2022, 22, 18762–18770.
29. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
30. Pizzati, F.; Cerri, P.; De Charette, R. CoMoGAN: Continuous model-guided image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 14288–14298.
31. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
32. Zhao, C.; Zhang, Y.; Poggi, M.; Tosi, F.; Guo, X.; Zhu, Z.; Huang, G.; Tang, Y.; Mattoccia, S. MonoViT: Self-supervised monocular depth estimation with a vision transformer. In Proceedings of the 2022 International Conference on 3D Vision (3DV), Prague, Czech Republic, 17–19 October 2022; pp. 668–678.
33. Lee, Y.; Kim, J.; Willette, J.; Hwang, S.J. MPViT: Multi-path vision transformer for dense prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 7287–7296.
34. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
35. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
36. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101.
37. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
38. Lyu, X.; Liu, L.; Wang, M.; Kong, X.; Liu, L.; Liu, Y.; Chen, X.; Yuan, Y. HR-Depth: High resolution self-supervised monocular depth estimation. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, No. 3, pp. 2294–2301.
39. Kaushik, V.; Jindgar, K.; Lall, B. ADAADepth: Adapting data augmentation and attention for self-supervised monocular depth estimation. IEEE Robot. Autom. Lett. 2021, 6, 7791–7798.
40. Han, W.; Yin, J.; Jin, X.; Dai, X.; Shen, J. BRNet: Exploring comprehensive features for monocular depth estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–28 August 2022; pp. 586–602.
41. Yan, J.; Zhao, H.; Bu, P.; Jin, Y. Channel-wise attention-based network for self-supervised monocular depth estimation. In Proceedings of the 2021 International Conference on 3D Vision (3DV), Virtual, 18–21 October 2021; pp. 464–473.
42. Zhou, H.; Greenwood, D.; Taylor, S. Self-supervised monocular depth estimation with internal feature fusion. In Proceedings of the British Machine Vision Conference (BMVC), London, UK, 22–25 November 2021.
43. Saxena, A.; Sun, M.; Ng, A.Y. Make3D: Learning 3D scene structure from a single still image. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 31, 824–840.
44. Yang, G.; Song, X.; Huang, C.; Deng, Z.; Shi, J.; Zhou, B. DrivingStereo: A large-scale dataset for stereo matching in autonomous driving scenarios. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 899–908.
45. Sakaridis, C.; Dai, D.; Van Gool, L. Semantic foggy scene understanding with synthetic data. Int. J. Comput. Vis. 2018, 126, 973–992.
46. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
47. Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11621–11631.
48. Liu, L.; Song, X.; Wang, M.; Liu, Y.; Zhang, L. Self-supervised monocular depth estimation for all day images using domain separation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 12737–12746.
49. Maddern, W.; Pascoe, G.; Linegar, C.; Newman, P. 1 year, 1000 km: The Oxford RobotCar dataset. Int. J. Robot. Res. 2017, 36, 3–15.
Results on Make3D:

Method | Abs Rel | Sq Rel | RMSE | RMSE log
---|---|---|---|---
Monodepth2 [11] | 0.321 | 3.377 | 7.417 | 0.164 |
HR-Depth [38] | 0.315 | 3.208 | 7.031 | 0.155 |
CADepth [41] | 0.318 | 3.223 | 7.151 | 0.158 |
DIFFNet [42] | 0.299 | 2.910 | 6.760 | 0.153 |
MonoViT [32] | 0.286 | 2.759 | 6.625 | 0.147 |
CIT-Depth (Ours) | 0.275 | 2.639 | 6.402 | 0.144 |
Results on DrivingStereo (per weather domain):

Domain | Method | Abs Rel | Sq Rel | RMSE | RMSE log | δ < 1.25 | δ < 1.25² | δ < 1.25³
---|---|---|---|---|---|---|---|---
foggy | Monodepth2 [11] | 0.143 | 1.954 | 9.818 | 0.218 | 0.812 | 0.936 | 0.974
foggy | HR-Depth [38] | 0.132 | 1.826 | 9.587 | 0.198 | 0.826 | 0.949 | 0.982
foggy | CADepth [41] | 0.141 | 1.779 | 9.450 | 0.208 | 0.811 | 0.945 | 0.981
foggy | DIFFNet [42] | 0.126 | 1.562 | 8.724 | 0.189 | 0.839 | 0.956 | 0.985
foggy | MonoViT [32] | 0.106 | 1.155 | 7.256 | 0.161 | 0.871 | 0.970 | 0.990
foggy | CIT-Depth (Ours) | 0.104 | 1.147 | 7.213 | 0.159 | 0.880 | 0.975 | 0.992
cloudy | Monodepth2 [11] | 0.155 | 1.902 | 6.977 | 0.209 | 0.812 | 0.943 | 0.979
cloudy | HR-Depth [38] | 0.148 | 1.657 | 6.659 | 0.204 | 0.815 | 0.945 | 0.981
cloudy | CADepth [41] | 0.148 | 1.805 | 6.712 | 0.205 | 0.829 | 0.947 | 0.981
cloudy | DIFFNet [42] | 0.140 | 1.572 | 6.298 | 0.192 | 0.837 | 0.950 | 0.983
cloudy | MonoViT [32] | 0.135 | 1.469 | 6.095 | 0.183 | 0.857 | 0.955 | 0.985
cloudy | CIT-Depth (Ours) | 0.133 | 1.456 | 5.912 | 0.180 | 0.860 | 0.957 | 0.986
rainy | Monodepth2 [11] | 0.240 | 3.339 | 11.042 | 0.301 | 0.590 | 0.587 | 0.953
rainy | HR-Depth [38] | 0.222 | 2.962 | 10.495 | 0.281 | 0.631 | 0.869 | 0.959
rainy | CADepth [41] | 0.226 | 3.015 | 10.825 | 0.287 | 0.629 | 0.851 | 0.956
rainy | DIFFNet [42] | 0.192 | 2.411 | 9.626 | 0.246 | 0.677 | 0.914 | 0.969
rainy | MonoViT [32] | 0.174 | 2.132 | 9.490 | 0.231 | 0.728 | 0.928 | 0.976
rainy | CIT-Depth (Ours) | 0.170 | 2.015 | 9.023 | 0.220 | 0.736 | 0.935 | 0.979
sunny | Monodepth2 [11] | 0.178 | 2.105 | 8.209 | 0.240 | 0.782 | 0.925 | 0.968
sunny | HR-Depth [38] | 0.164 | 1.839 | 7.890 | 0.227 | 0.794 | 0.936 | 0.975
sunny | CADepth [41] | 0.162 | 1.755 | 7.689 | 0.221 | 0.801 | 0.936 | 0.974
sunny | DIFFNet [42] | 0.150 | 1.616 | 7.580 | 0.210 | 0.812 | 0.940 | 0.978
sunny | MonoViT [32] | 0.142 | 1.457 | 7.007 | 0.199 | 0.832 | 0.948 | 0.981
sunny | CIT-Depth (Ours) | 0.143 | 1.459 | 7.009 | 0.199 | 0.833 | 0.949 | 0.981
Results on Foggy CityScape:

Method | Abs Rel | Sq Rel | RMSE | RMSE log | δ < 1.25 | δ < 1.25² | δ < 1.25³
---|---|---|---|---|---|---|---
Monodepth2 [11] | 0.208 | 3.095 | 12.449 | 0.337 | 0.656 | 0.842 | 0.917 |
HR-Depth [38] | 0.213 | 3.015 | 12.267 | 0.336 | 0.642 | 0.841 | 0.920 |
CADepth [41] | 0.207 | 2.738 | 11.550 | 0.318 | 0.650 | 0.856 | 0.933 |
DIFFNet [42] | 0.187 | 2.583 | 11.337 | 0.304 | 0.689 | 0.867 | 0.937 |
MonoViT [32] | 0.155 | 1.873 | 9.585 | 0.244 | 0.771 | 0.910 | 0.967 |
CIT-Depth (Ours) | 0.151 | 1.689 | 8.092 | 0.229 | 0.798 | 0.939 | 0.970 |
Ablation study (CIM: Channel Interaction Module; MSFM: Multi-Scale Fusion Module):

Dataset | Model Configuration | Abs Rel | Sq Rel | RMSE | RMSE log | δ < 1.25 | δ < 1.25² | δ < 1.25³
---|---|---|---|---|---|---|---|---
KITTI | Baseline | 0.099 | 0.701 | 4.409 | 0.177 | 0.896 | 0.965 | 0.983
KITTI | +CIM | 0.098 | 0.689 | 4.298 | 0.174 | 0.898 | 0.967 | 0.984
KITTI | +MSFM | 0.098 | 0.698 | 4.341 | 0.175 | 0.899 | 0.967 | 0.984
KITTI | +CIM+MSFM | 0.097 | 0.655 | 4.214 | 0.172 | 0.902 | 0.968 | 0.984
DrivingStereo (foggy) | Baseline | 0.108 | 1.201 | 7.310 | 0.165 | 0.865 | 0.967 | 0.990
DrivingStereo (foggy) | +CIM | 0.105 | 1.155 | 7.259 | 0.163 | 0.869 | 0.969 | 0.991
DrivingStereo (foggy) | +MSFM | 0.105 | 1.150 | 7.255 | 0.162 | 0.871 | 0.970 | 0.991
DrivingStereo (foggy) | +CIM+MSFM | 0.104 | 1.147 | 7.213 | 0.159 | 0.880 | 0.975 | 0.992
DrivingStereo (rainy) | Baseline | 0.175 | 2.159 | 9.492 | 0.232 | 0.725 | 0.926 | 0.976
DrivingStereo (rainy) | +CIM | 0.172 | 2.122 | 9.251 | 0.225 | 0.732 | 0.931 | 0.978
DrivingStereo (rainy) | +MSFM | 0.173 | 2.138 | 9.319 | 0.230 | 0.729 | 0.929 | 0.978
DrivingStereo (rainy) | +CIM+MSFM | 0.170 | 2.015 | 9.023 | 0.220 | 0.736 | 0.935 | 0.979
Foggy CityScape | Baseline | 0.156 | 1.877 | 9.592 | 0.247 | 0.770 | 0.909 | 0.966
Foggy CityScape | +CIM | 0.152 | 1.711 | 8.301 | 0.235 | 0.785 | 0.928 | 0.968
Foggy CityScape | +MSFM | 0.154 | 1.851 | 9.204 | 0.240 | 0.778 | 0.916 | 0.968
Foggy CityScape | +CIM+MSFM | 0.151 | 1.689 | 8.092 | 0.229 | 0.798 | 0.939 | 0.970
NuScene-Night | Baseline | 0.315 | 4.148 | 12.260 | 0.457 | 0.482 | 0.739 | 0.859
NuScene-Night | +CIM | 0.310 | 4.098 | 11.803 | 0.440 | 0.519 | 0.769 | 0.862
NuScene-Night | +MSFM | 0.312 | 4.121 | 12.009 | 0.454 | 0.501 | 0.752 | 0.860
NuScene-Night | +CIM+MSFM | 0.307 | 4.080 | 11.591 | 0.431 | 0.539 | 0.781 | 0.863
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, J.; Guo, Z.; Ping, P.; Zhang, H.; Shi, Q. Channel Interaction and Transformer Depth Estimation Network: Robust Self-Supervised Depth Estimation Under Varied Weather Conditions. Sustainability 2024, 16, 9131. https://doi.org/10.3390/su16209131
Liu J, Guo Z, Ping P, Zhang H, Shi Q. Channel Interaction and Transformer Depth Estimation Network: Robust Self-Supervised Depth Estimation Under Varied Weather Conditions. Sustainability. 2024; 16(20):9131. https://doi.org/10.3390/su16209131
Chicago/Turabian Style: Liu, Jianqiang, Zhengyu Guo, Peng Ping, Hao Zhang, and Quan Shi. 2024. "Channel Interaction and Transformer Depth Estimation Network: Robust Self-Supervised Depth Estimation Under Varied Weather Conditions" Sustainability 16, no. 20: 9131. https://doi.org/10.3390/su16209131
APA Style: Liu, J., Guo, Z., Ping, P., Zhang, H., & Shi, Q. (2024). Channel Interaction and Transformer Depth Estimation Network: Robust Self-Supervised Depth Estimation Under Varied Weather Conditions. Sustainability, 16(20), 9131. https://doi.org/10.3390/su16209131