DCLTV: An Improved Dual-Condition Diffusion Model for Laser-Visible Image Translation
Abstract
1. Introduction
- We designed a laser-visible image acquisition system and constructed a dataset of one-to-one paired laser-visible images, suitable for both supervised and unsupervised methods.
- An improved diffusion model, DCLTV, is proposed to tackle the laser-visible image translation challenge. The model constrains the otherwise unconstrained generation of diffusion models through dual conditions: a Brownian bridge strategy and interpolation-based conditional injection (a minimal sketch of the bridge process is given after this list). To the best of our knowledge, this is the first time a diffusion model has been explored for laser-visible image translation.
- Compared with five representative baseline models, the proposed DCLTV achieves the best performance in both quantitative and qualitative experiments, demonstrating its effectiveness in laser-visible image translation.
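The Brownian bridge strategy pins the diffusion chain at both endpoints, so sampling starts from the laser image rather than from pure noise. Below is a minimal PyTorch sketch of the bridge forward process in the style of BBDM (Li et al., CVPR 2023), on which DCLTV builds; the schedule m_t = t/T, the scale s, and all tensor names are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def brownian_bridge_forward(x0, y, t, T, s=1.0):
    """Sample x_t from a Brownian bridge forward process q(x_t | x_0, y).

    The marginal mean interpolates between the visible image x0 (at t = 0)
    and the laser image y (at t = T), so the chain is pinned at both ends
    instead of terminating in pure Gaussian noise.

    x0, y : (B, C, H, W) tensors in the same value range
    t     : (B,) integer timesteps in [0, T]
    s     : assumed scale of the maximum bridge variance
    """
    m_t = (t.float() / T).view(-1, 1, 1, 1)    # mixing coefficient in [0, 1]
    delta_t = 2.0 * s * m_t * (1.0 - m_t)      # bridge variance, zero at both endpoints
    eps = torch.randn_like(x0)                 # standard Gaussian noise
    x_t = (1.0 - m_t) * x0 + m_t * y + delta_t.sqrt() * eps
    return x_t, eps
```

During training, the network would predict the noise (or bridge residual) from x_t, t, and the injected condition; at inference, sampling starts from x_T = y and runs the learned reverse process back to a visible-domain estimate of x_0.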
2. Laser-Visible Image Dataset
2.1. Dataset Acquisition
- Diversity. The dataset should cover as wide a variety of scenes and target categories as possible so that neural network models can better learn real-world color and edge features.
- Multiple angles. Given that an object may present radically different appearances when photographed from diverse angles, it is essential to capture images of the scenes and targets from multiple perspectives, which can make the dataset more comprehensive.
- Matching. In order to satisfy the requirements of supervised and unsupervised methods and to accurately evaluate the models, laser images and visible images need to be acquired in a paired fashion, meaning that they must be captured in the same field of view and under the same environmental conditions.
2.2. Dataset Processing
2.3. Instances of the Laser-Visible Image Dataset
3. Methods
3.1. Overall Framework and Diffusion Process of DCLTV
3.2. Bridge Principle
3.3. Diffusion Process for DCLTV
3.3.1. Forward Diffusion Process
3.3.2. Reverse Diffusion Process
3.4. Interpolation-Based Conditional Injection
3.5. Loss Function
3.6. Self-Attention Mechanism
4. Experiments and Results
4.1. Experimental Setup
4.1.1. Experimental Environment and Hyperparameters
4.1.2. Evaluation Metrics
4.1.3. Baseline Models
4.2. Comparative Results
4.2.1. Qualitative Comparison
4.2.2. Quantitative Comparison
4.3. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Deng, Q.-L.; Huang, Z.-L.; Tsai, C.-C.; Lin, C.-W. HardGAN: A Haze-Aware Representation Distillation GAN for Single Image Dehazing. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part VI. Springer: Berlin/Heidelberg, Germany, 2020; pp. 722–738.
- Liu, J.; Zhang, W.; Tang, Y.; Tang, J.; Wu, G. Residual Feature Aggregation Network for Image Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020.
- Eboli, T.; Sun, J.; Ponce, J. End-to-End Interpretable Learning of Non-Blind Image Deblurring. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XVII. Springer: Berlin/Heidelberg, Germany, 2020; pp. 314–331.
- Liu, H.-Y.; Wan, Z.-Y.; Huang, W.; Song, Y.-B.; Han, X.-T.; Liao, J. PD-GAN: Probabilistic Diverse GAN for Image Inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 9371–9381.
- Shao, X.-N.; Zhang, W.-D. SPatchGAN: A Statistical Feature Based Discriminator for Unsupervised Image-to-Image Translation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 6546–6555.
- Huang, Q.; Zheng, Z.-L.; Hu, X.-Q.; Sun, L.; Li, Q.-L. Bridging the Gap Between Label- and Reference-Based Synthesis in Multi-Attribute Image-to-Image Translation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 14628–14637.
- Wan, Z.-Y.; Zhang, J.-B.; Chen, D.-D.; Liao, J. High-Fidelity Pluralistic Image Completion with Transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 4692–4701.
- Pang, Y.-X.; Lin, J.-X.; Qin, T.; Chen, Z.-B. Image-to-Image Translation: Methods and Applications. IEEE Trans. Multimed. 2022, 24, 3859–3881.
- Chen, Z.; Zhang, Y.-L.; Gu, J.-J.; Kong, L.-H.; Yang, X.-K.; Yu, F. Dual Aggregation Transformer for Image Super-Resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 12312–12321.
- Sun, Z.-J.; Fu, X.-Y.; Huang, L.-Z.; Liu, A.-P.; Zha, Z.-J. Motion Aware Event Representation-Driven Image Deblurring. In Proceedings of the Computer Vision—ECCV 2024: 18th European Conference, Milan, Italy, 29 September–4 October 2024; Proceedings, Part XLVI. Springer: Berlin/Heidelberg, Germany, 2024; pp. 418–435.
- Isola, P.; Zhu, J.-Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5967–5976.
- Cao, J.; Hou, L.-X.; Yang, M.-H.; He, R.; Sun, Z.-N. ReMix: Towards Image-to-Image Translation with Limited Data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021.
- Kim, G.; Kang, K.; Kim, S.; Lee, H.; Kim, S.; Kim, J.; Baek, S.-H.; Cho, S. BigColor: Colorization Using a Generative Color Prior for Natural Images. In Proceedings of the Computer Vision—ECCV, Tel Aviv, Israel, 23–27 October 2022; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Springer Nature: Cham, Switzerland, 2022; pp. 350–366.
- Ji, X.-Z.; Jiang, B.-Y.; Luo, D.-H.; Tao, G.-P.; Chu, W.-Q.; Xie, Z.; Wang, C.; Tai, Y. ColorFormer: Image Colorization via Color Memory Assisted Hybrid-Attention Transformer. In Proceedings of the Computer Vision—ECCV, Tel Aviv, Israel, 23–27 October 2022; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Springer Nature: Cham, Switzerland, 2022; pp. 20–36.
- Weng, S.; Sun, J.-M.; Li, Y.; Li, S.; Shi, B.-X. CT2: Colorization Transformer via Color Tokens. In Proceedings of the Computer Vision—ECCV, Tel Aviv, Israel, 23–27 October 2022; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Springer Nature: Cham, Switzerland, 2022; pp. 1–16.
- Kang, X.-Y.; Yang, T.; Ouyang, W.-Q.; Ren, P.-R.; Li, L.-Z.; Xie, X.-S. DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 328–338.
- He, Y.-W.; Jin, X.; Jiang, Q.; Cheng, Z.-E.; Wang, P.-M.; Zhou, W. LKAT-GAN: A GAN for Thermal Infrared Image Colorization Based on Large Kernel and AttentionUNet-Transformer. IEEE Trans. Consum. Electron. 2023, 69, 478–489.
- Zhai, H.; Chen, M.; Yang, X.-X.; Kang, G.-S. Multi-Scale HSV Color Feature Embedding for High-Fidelity NIR-to-RGB Spectrum Translation. arXiv 2024, arXiv:2404.16685.
- Shen, K.-Q.; Vivone, G.; Yang, X.-Y.; Lolli, S.; Schmitt, M. A Benchmarking Protocol for SAR Colorization: From Regression to Deep Learning Approaches. Neural Netw. 2024, 169, 698–712.
- Luan, Q.; Wen, F.; Cohen-Or, D.; Liang, L.; Xu, Y.-Q.; Shum, H.-Y. Natural Image Colorization. In Proceedings of the Rendering Techniques, Grenoble, France, 25–27 June 2007; Kautz, J., Pattanaik, S., Eds.; The Eurographics Association: Grenoble, France, 2007.
- Heu, J.-H.; Hyun, D.-Y.; Kim, C.-S.; Lee, S.-U. Image and Video Colorization Based on Prioritized Source Propagation. In Proceedings of the 2009 16th IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, 7–10 November 2009; pp. 465–468.
- Gupta, R.K.; Chia, A.Y.-S.; Rajan, D.; Ng, E.S.; Huang, Z.-Y. Image Colorization Using Similar Images. In Proceedings of the 20th ACM International Conference on Multimedia, Nara, Japan, 29 October–2 November 2012; ACM: New York, NY, USA, 2012; pp. 369–378.
- Hamam, T.; Dordek, Y.; Cohen, D. Single-Band Infrared Texture-Based Image Colorization. In Proceedings of the 2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel, Eilat, Israel, 14–17 November 2012; pp. 1–5.
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680.
- Brock, A.; Donahue, J.; Simonyan, K. Large Scale GAN Training for High Fidelity Natural Image Synthesis. arXiv 2018, arXiv:1809.11096.
- Qin, M.-Y.; Fan, Y.-C.; Guo, H.-C.; Wang, M.-Q. Application of Improved CycleGAN in Laser-Visible Face Image Translation. Sensors 2022, 22, 4057.
- Qin, M.-Y.; Fan, Y.-C.; Guo, H.-C.; Zhang, L.-X. Laser-Visible Face Image Translation and Recognition Based on CycleGAN and Spectral Normalization. Sensors 2023, 23, 3765.
- Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2242–2251.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates Inc.: Red Hook, NY, USA, 2017; Volume 30.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations, Virtual Event, Austria, 3–7 May 2021.
- Kim, S.; Baek, J.; Park, J.; Kim, G.; Kim, S. InstaFormer: Instance-Aware Image-to-Image Translation With Transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 18321–18331.
- Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv 2022, arXiv:2204.06125.
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10674–10685.
- Saharia, C.; Chan, W.; Saxena, S.; Li, L.; Whang, J.; Denton, E.L.; Ghasemipour, K.; Gontijo Lopes, R.; Karagol Ayan, B.; Salimans, T.; et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. Adv. Neural Inf. Process. Syst. 2022, 35, 36479–36494.
- Sohl-Dickstein, J.; Weiss, E.A.; Maheswaranathan, N.; Ganguli, S. Deep Unsupervised Learning Using Nonequilibrium Thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; PMLR 37, pp. 2256–2265.
- Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020; Curran Associates Inc.: Red Hook, NY, USA, 2020.
- Chen, H.-W.; Xu, Y.-S.; Hong, M.-F.; Tsai, Y.-M.; Kuo, H.-K.; Lee, C.-Y. Cascaded Local Implicit Transformer for Arbitrary-Scale Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 18257–18267.
- Wang, H.-J.; Chen, J.-Y.; Zheng, Y.-Q.; Zeng, T.-Y. Navigating Beyond Dropout: An Intriguing Solution towards Generalizable Image Super Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 25532–25543.
- Lugmayr, A.; Danelljan, M.; Romero, A.; Yu, F.; Timofte, R.; Van Gool, L. RePaint: Inpainting Using Denoising Diffusion Probabilistic Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11461–11471.
- Ni, M.-H.; Li, X.-M.; Zuo, W.-M. NUWA-LIP: Language-Guided Image Inpainting With Defect-Free VQGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 14183–14192.
- Zabari, N.; Azulay, A.; Gorkor, A.; Halperin, T.; Fried, O. Diffusing Colors: Image Colorization with Text Guided Diffusion. In Proceedings of the SIGGRAPH Asia 2023 Conference Papers, Sydney, NSW, Australia, 12–15 December 2023; Association for Computing Machinery: New York, NY, USA, 2023.
- Cong, X.-Y.; Wu, Y.; Chen, Q.-F.; Lei, C.-Y. Automatic Controllable Colorization via Imagination. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 2609–2619.
- Kumar, T.; Brennan, R.; Mileo, A.; Bendechache, M. Image Data Augmentation Approaches: A Comprehensive Survey and Future Directions. IEEE Access 2024, 12, 187536–187571.
- Li, B.; Xue, K.-T.; Liu, B.; Lai, Y.-K. BBDM: Image-to-Image Translation With Brownian Bridge Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 1952–1961.
- Song, Y.; Sohl-Dickstein, J.; Kingma, D.P.; Kumar, A.; Ermon, S.; Poole, B. Score-Based Generative Modeling through Stochastic Differential Equations. In Proceedings of the International Conference on Learning Representations, Virtual Event, Austria, 3–7 May 2021.
- Nichol, A.Q.; Dhariwal, P. Improved Denoising Diffusion Probabilistic Models. In Proceedings of the 38th International Conference on Machine Learning, Virtual Event, 18–24 July 2021; Meila, M., Zhang, T., Eds.; PMLR. Volume 139, pp. 8162–8171.
- Saharia, C.; Chan, W.; Chang, H.; Lee, C.; Ho, J.; Salimans, T.; Fleet, D.; Norouzi, M. Palette: Image-to-Image Diffusion Models. In Proceedings of the ACM SIGGRAPH 2022 Conference Proceedings, Vancouver, BC, Canada, 7–11 August 2022; Association for Computing Machinery: New York, NY, USA, 2022.
- Choi, J.; Kim, S.; Jeong, Y.; Gwon, Y.; Yoon, S. ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 14367–14376.
- Saharia, C.; Ho, J.; Chan, W.; Salimans, T.; Fleet, D.J.; Norouzi, M. Image Super-Resolution via Iterative Refinement. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 4713–4726.
- Liu, X.-H.; Park, D.H.; Azadi, S.; Zhang, G.; Chopikyan, A.; Hu, Y.; Shi, H.; Rohrbach, A.; Darrell, T. More Control for Free! Image Synthesis With Semantic Diffusion Guidance. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; pp. 289–299.
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
- Song, J.-M.; Meng, C.-L.; Ermon, S. Denoising Diffusion Implicit Models. In Proceedings of the International Conference on Learning Representations, Virtual Event, Austria, 3–7 May 2021.
- Fréchet, M.M. Sur Quelques Points Du Calcul Fonctionnel. Rendiconti del Circolo Matematico di Palermo (1884–1940) 1906, 22, 1–72.
Hardware and Software | Parameters |
---|---|
Operating System | Ubuntu 22.04 |
CPU | Intel(R) Xeon(R) Gold 5218R |
GPU | NVIDIA RTX A6000 |
Video Memory | 48 GB |
Deep Learning Framework | PyTorch 2.1.0 |
Model | FID ↓ | LPIPS ↓ |
---|---|---|
Pix2pix | 283.812 | 0.486 |
BigColor | 210.154 | 0.535 |
CT2 | 216.339 | 0.506 |
ColorFormer | 183.969 | 0.549 |
DDColor | 217.325 | 0.520 |
DCLTV | 154.740 | 0.379 |
Model | FID ↓ | LPIPS ↓ |
---|---|---|
DCLTV(1C) | 180.068 | 0.426 |
DCLTV(2C) | 164.809 | 0.381 |
DCLTV(2CwithSAttn) | 154.740 | 0.379 |
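For context on the two metrics reported in the tables above, the following is a minimal evaluation sketch assuming the torchmetrics and lpips packages; the batch variables (real_uint8, fake_uint8, real_float, fake_float) are placeholders for loaded image tensors, not names from the paper's code.

```python
import torch
import lpips
from torchmetrics.image.fid import FrechetInceptionDistance

device = "cuda" if torch.cuda.is_available() else "cpu"

# FID compares Inception feature statistics of the real and generated
# image sets; torchmetrics expects uint8 tensors of shape (B, 3, H, W).
fid = FrechetInceptionDistance(feature=2048).to(device)
fid.update(real_uint8.to(device), real=True)    # ground-truth visible images
fid.update(fake_uint8.to(device), real=False)   # translated images
print("FID:", fid.compute().item())

# LPIPS is a paired perceptual distance computed per image pair;
# the lpips package expects float tensors scaled to [-1, 1].
lpips_fn = lpips.LPIPS(net="alex").to(device)
with torch.no_grad():
    d = lpips_fn(fake_float.to(device), real_float.to(device))
print("LPIPS:", d.mean().item())
```

Lower values are better for both metrics, matching the ↓ markers in the table headers.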