A Training-Free Latent Diffusion Style Transfer Method
Abstract
1. Introduction
2. Related Work
3. Training-Free Approach
3.1. Diffusion Models
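The backbone here is a latent diffusion model, and the reference list includes DDIM (Song et al.), the deterministic sampler typically used by training-free diffusion editing methods. For orientation only, the sketch below shows a single deterministic DDIM denoising step in PyTorch; `eps_model` (the noise-prediction U-Net) and `alphas_cumprod` (the cumulative noise schedule) are placeholder names, not identifiers from this paper's implementation.

```python
import torch

@torch.no_grad()
def ddim_step(x_t, t, t_prev, eps_model, alphas_cumprod):
    """One deterministic DDIM update from timestep t to t_prev (eta = 0)."""
    a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
    eps = eps_model(x_t, t)                                   # predicted noise
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()     # predicted clean latent
    return a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps
```

Run in the opposite direction (from small t to large t), the same update performs DDIM inversion, which is how training-free pipelines usually bring content and style images into the diffusion latent space.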
3.2. Detailed Work
3.2.1. Normalization Feature Mapping
- (1) Instance Normalization
- (2) Cross-Attention (a combined sketch of both operations follows this list)
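Section 3.2.1 maps content features onto style statistics using instance normalization and cross-attention. The sketch below illustrates both building blocks under common assumptions: AdaIN-style mean/std alignment (Huang and Belongie) for the normalization step, and scaled dot-product attention with content-derived queries and style-derived keys/values for the cross-attention step. The learned projection layers of the frozen U-Net are omitted, and tensor shapes are illustrative rather than the authors' exact configuration.

```python
import torch

def instance_norm_mapping(content, style, eps=1e-5):
    """AdaIN-style feature mapping: normalize the content feature map per
    channel, then re-scale it with the style feature statistics.
    content, style: tensors of shape (B, C, H, W)."""
    c_mu = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mu = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mu) / c_std + s_mu

def cross_attention(q_content, kv_style):
    """Scaled dot-product attention with content queries and style keys/values.
    q_content: (B, N, D) content tokens; kv_style: (B, M, D) style tokens."""
    d = q_content.shape[-1]
    attn = torch.softmax(q_content @ kv_style.transpose(1, 2) / d ** 0.5, dim=-1)
    return attn @ kv_style
```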
3.2.2. SimAM Attention Fusion
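The fusion stage relies on SimAM (Yang et al., cited in the references), a parameter-free attention module, so no extra weights need to be trained. The sketch below follows the energy-based formulation from the SimAM paper; where in the denoising U-Net the re-weighted features are injected is specific to this work and is not shown here.

```python
import torch

def simam(x, e_lambda=1e-4):
    """Parameter-free SimAM attention (Yang et al., 2021).
    x: feature map of shape (B, C, H, W)."""
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)   # squared deviation at each position
    v = d.sum(dim=(2, 3), keepdim=True) / n              # per-channel variance estimate
    e_inv = d / (4 * (v + e_lambda)) + 0.5               # inverse of the neuron energy
    return x * torch.sigmoid(e_inv)                      # re-weight the feature map
```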
4. Experiments and Results Analysis
4.1. Dataset and Experimental Setup
4.2. Style Arbitrariness Experiment
4.3. Comparative Experiment
4.3.1. Quantitative Comparison
- Minimum Time: Our model achieves a minimum inference time of 58.00 s, faster than DiffuseIT (145.00 s), InST (150.00 s), and DiffStyle (64.00 s); it therefore has the fastest best-case time among all compared models.
- Maximum Time: Our model’s maximum inference time is 63.00 s, below the maxima of DiffuseIT (153.00 s), InST (157.00 s), and DiffStyle (70.00 s), so even in the worst case it remains the most efficient.
- Standard Deviation: At 1.80 s, our model’s standard deviation is the lowest of the group (DiffuseIT 2.50 s, InST 2.80 s, DiffStyle 2.00 s), indicating more consistent and predictable inference times. A sketch of how these summary statistics are computed follows this list.
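These figures are summary statistics over repeated runs on the same hardware. The hypothetical snippet below shows how such per-image timings could be collected and summarized; `stylize` stands in for any of the compared pipelines and is not a function from this paper's released code.

```python
import statistics
import time

def summarize_inference_times(stylize, pairs):
    """Time stylize(content, style) for each (content, style) pair and
    return the summary statistics reported above (in seconds)."""
    times = []
    for content, style in pairs:
        start = time.perf_counter()
        stylize(content, style)
        times.append(time.perf_counter() - start)
    return {
        "average": statistics.mean(times),
        "minimum": min(times),
        "maximum": max(times),
        "std_dev": statistics.stdev(times),  # requires at least two timings
    }
```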
4.3.2. Qualitative Comparison
4.4. Ablation Study
4.5. Achieving Harmony between Style and Content
4.6. Impact of Attention Injection Step
4.7. Discussion and Explanation
5. Conclusions and Future Work
- Expanding Application Areas: We will investigate how our method can be adapted to fields such as medical imaging, video processing, and other domains where style transfer or latent diffusion techniques may be beneficial.
- Case Studies: We will conduct case studies and provide concrete examples to showcase the practical utility and versatility of our approach in different real-world scenarios.
- Benchmarking: We plan to benchmark our method against existing solutions in these new application areas to evaluate its effectiveness and advantages in practical settings.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Gatys, L.A.; Ecker, A.S.; Bethge, M. A Neural Algorithm of Artistic Style. arXiv 2015, arXiv:1508.06576. [Google Scholar] [CrossRef]
- Everaert, M.N.; Bocchio, M.; Arpa, S.; Süsstrunk, S.; Achanta, R. Diffusion in style. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 2251–2261. [Google Scholar]
- Wang, Z.; Zhao, L.; Xing, W. Stylediffusion: Controllable disentangled style transfer via diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 7677–7689. [Google Scholar]
- Zhang, Y.; Huang, N.; Tang, F.; Huang, H.; Ma, C.; Dong, W.; Xu, C. Inversion-based style transfer with diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 10146–10156. [Google Scholar]
- Chung, J.; Hyun, S.; Heo, J.P. Style injection in diffusion: A training-free approach for adapting large-scale diffusion models for style transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 8795–8805. [Google Scholar]
- Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. SimAM: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Online, 18–24 July 2021; pp. 11863–11874. [Google Scholar]
- Wright, M.; Ommer, B. ArtFID: Quantitative evaluation of neural style transfer. In Proceedings of the DAGM German Conference on Pattern Recognition, Konstanz, Germany, 27–30 September 2022; pp. 560–576. [Google Scholar]
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; p. 30. [Google Scholar]
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
- Naeem, M.F.; Oh, S.J.; Uh, Y.; Choi, Y.; Yoo, J. Reliable fidelity and diversity metrics for generative models. In Proceedings of the 37th International Conference on Machine Learning, Online, 13–18 July 2020; pp. 7176–7185. [Google Scholar]
- Banar, N.; Sabatelli, M.; Geurts, P.; Daelemans, W.; Kestemont, M. Transfer learning with style transfer between the photorealistic and artistic domain. In Proceedings of the IS&T International Symposium on Electronic Imaging 2021, Computer Vision and Image Analysis of Art 2021, Online, 11–28 January 2021. [Google Scholar]
- Li, H.; Wan, X.X. Image style transfer algorithm under deep convolutional neural network. In Proceedings of the Computer Engineering and Applications, Guangzhou, China, 18–20 March 2020; pp. 176–183. [Google Scholar]
- Chen, C.J. Chinese Painting Style Transfer Based on Convolutional Neural Network; Hangzhou Dianzi University: Hangzhou, China, 2021. [Google Scholar] [CrossRef]
- Li, S.; Xu, X.; Nie, L.; Chua, T.S. Laplacian-steered neural style transfer. In Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, CA, USA, 23–27 October 2017; pp. 1716–1724. [Google Scholar]
- Risser, E.; Wilmot, P.; Barnes, C. Stable and controllable neural texture synthesis and style transfer using histogram losses. arXiv 2017, arXiv:1701.08893. [Google Scholar]
- Dumoulin, V.; Shlens, J.; Kudlur, M. A learned representation for artistic style. arXiv 2016, arXiv:1610.07629. [Google Scholar]
- Chen, D.; Yuan, L.; Liao, J.; Yu, N.; Hua, G. Stylebank: An explicit representation for neural image style transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1897–1906. [Google Scholar]
- Chen, T.Q.; Schmidt, M. Fast patch-based style transfer of arbitrary style. arXiv 2016, arXiv:1612.04337. [Google Scholar]
- Huang, X.; Belongie, S. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1501–1510. [Google Scholar]
- Zhang, Y.; Tang, F.; Dong, W.; Huang, H.; Ma, C.; Lee, T.Y.; Xu, C. Domain enhanced arbitrary image style transfer via contrastive learning. In Proceedings of the ACM SIGGRAPH 2022 Conference Proceedings, Vancouver, BC, Canada, 7–11 August 2022; pp. 1–8. [Google Scholar]
- Liu, S.; Ye, J.; Wang, X. Any-to-any style transfer: Making picasso and da vinci collaborate. arXiv 2023, arXiv:2304.09728. [Google Scholar]
- Li, Y.; Fang, C.; Yang, J.; Wang, Z.; Lu, X.; Yang, M.H. Universal style transfer via feature transforms. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; p. 30. [Google Scholar]
- Liu, S.; Lin, T.; He, D.; Li, F.; Wang, M.; Li, X.; Sun, Z.; Li, Q.; Ding, E. AdaAttN: Revisit attention mechanism in arbitrary neural style transfer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 6649–6658. [Google Scholar]
- Zhu, Z.X.; Mao, Y.S.; Cai, K.W. Image style transfer method for industrial inspection. In Proceedings of the Computer Engineering and Applications, Hangzhou, China, 7–9 April 2023; pp. 234–241. [Google Scholar]
- Han, J.; Shoeiby, M.; Petersson, L.; Armin, M.A. Dual contrastive learning for unsupervised image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 746–755. [Google Scholar]
- Nichol, A.; Dhariwal, P.; Ramesh, A.; Shyam, P.; Mishkin, P.; McGrew, B.; Sutskever, I.; Chen, M. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv 2021, arXiv:2112.10741. [Google Scholar]
- Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical text-conditional image generation with clip latents. arXiv 2022, arXiv:2204.06125. [Google Scholar]
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10684–10695. [Google Scholar]
- Avrahami, O.; Lischinski, D.; Fried, O. Blended diffusion for text-driven editing of natural images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 18208–18218. [Google Scholar]
- Brooks, T.; Holynski, A.; Efros, A.A. Instructpix2pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 18392–18402. [Google Scholar]
- Cao, M.; Wang, X.; Qi, Z.; Shan, Y.; Qie, X.; Zheng, Y. Masactrl: Tuning-free mutual self-attention control for consistent image synthesis and editing. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 22560–22570. [Google Scholar]
- Couairon, G.; Verbeek, J.; Schwenk, H.; Cord, M. DiffEdit: Diffusion-based semantic image editing with mask guidance. arXiv 2022, arXiv:2210.11427. [Google Scholar]
- Hertz, A.; Mokady, R.; Tenenbaum, J.; Aberman, K.; Pritch, Y.; Cohen-Or, D. Prompt-to-prompt image editing with cross attention control. arXiv 2022, arXiv:2208.01626. [Google Scholar]
- Wu, C.H.; De la Torre, F. A latent space of stochastic diffusion models for zero-shot image editing and guidance. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 7378–7387. [Google Scholar]
- Zhang, Z.; Han, L.; Ghosh, A.; Metaxas, D.N.; Ren, J. Sine: Single image editing with text-to-image diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 6027–6037. [Google Scholar]
- Qi, T.; Fang, S.; Wu, Y.; Xie, H.; Liu, J.; Chen, L.; Zhang, Y. DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 8693–8702. [Google Scholar]
- Jeong, J.; Kwon, M.; Uh, Y. Training-free style transfer emerges from h-space in diffusion models. arXiv 2023, arXiv:2303.15403. [Google Scholar]
- Lin, H.; Cheng, X.; Wu, X.; Shen, D.; Wang, Z.; Song, Q.; Yuan, W. Cat: Cross attention in vision transformer. In Proceedings of the 2022 IEEE International Conference on Multimedia and Expo, Taipei, Taiwan, 18–22 July 2022; pp. 1–6. [Google Scholar]
- Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Instance normalization: The missing ingredient for fast stylization. arXiv 2016, arXiv:1607.08022. [Google Scholar]
- Song, J.; Meng, C.; Ermon, S. Denoising diffusion implicit models. arXiv 2020, arXiv:2010.02502. [Google Scholar]
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Lawrence Zitnick, C.; Dollár, P. Microsoft coco: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; pp. 740–755. [Google Scholar]
- Tan, W.R.; Chan, C.S.; Aguirre, H.E.; Tanaka, K. Improved ArtGAN for conditional synthesis of natural image and artwork. IEEE Trans. Image Process. 2018, 28, 394–409. [Google Scholar] [CrossRef] [PubMed]
- Deng, Y.; Tang, F.; Dong, W.; Ma, C.; Pan, X.; Wang, L.; Xu, C. StyTr2: Image style transfer with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11326–11336. [Google Scholar]
- Kwon, G.; Ye, J.C. Diffusion-based image translation using disentangled style and content representation. arXiv 2022, arXiv:2209.15264. [Google Scholar]
- Deng, Y.; Tang, F.; Dong, W.; Sun, W.; Huang, F.; Xu, C. Arbitrary style transfer via multi-adaptation network. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 2719–2727. [Google Scholar]
Quantitative comparison with baseline style transfer methods (⬇: lower is better).

| Metrics | Ours | AdaIN | StyTR2 | MAST | CAST | DiffuseIT | StyleID |
|---|---|---|---|---|---|---|---|
| ArtFID ⬇ | 28.124 | 30.933 | 30.720 | 31.282 | 34.685 | 40.721 | 28.801 |
| FID ⬇ | 17.891 | 18.242 | 18.890 | 18.199 | 20.395 | 23.065 | 18.131 |
| LPIPS ⬇ | 0.4705 | 0.6076 | 0.5445 | 0.6293 | 0.6212 | 0.6921 | 0.5055 |
| CFSD ⬇ | 0.2156 | 0.3155 | 0.3011 | 0.3043 | 0.2918 | 0.3428 | 0.2281 |
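As a reading aid for the table above, ArtFID (Wright and Ommer, cited in the references) combines content preservation (LPIPS) and style fidelity (FID) into a single score, with lower values better for all four metrics:

```latex
\mathrm{ArtFID} = \bigl(1 + \mathrm{LPIPS}\bigr)\cdot\bigl(1 + \mathrm{FID}\bigr)
% e.g. the StyleID column: (1 + 0.5055)\cdot(1 + 18.131) \approx 28.80
```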
Inference-time comparison with other diffusion-based methods.

| Metrics | DiffuseIT | InST | DiffStyle | Ours | Unit |
|---|---|---|---|---|---|
| Average time | 149.02 | 153.55 | 66.89 | 60.59 | second |
| Minimum time | 145.00 | 150.00 | 64.00 | 58.00 | second |
| Maximum time | 153.00 | 157.00 | 70.00 | 63.00 | second |
| Standard deviation | 2.50 | 2.80 | 2.00 | 1.80 | second |