Deep Layer Aggregation Architectures for Photorealistic Universal Style Transfer
Abstract
1. Introduction
1.1. Style Transfer—An Overview
- Photo editing and art: neural style transfer is already used in photo-editing software for generating license-free artwork, art design, fashion design, and more [2].
- Virtual reality: neural style transfer can generate visually striking virtual reality scenes and environments much faster than conventional methods.
1.2. Neural Style Transfer (NST)
- Texture synthesis: extracts the main stylistic characteristics from the input images.
- Image reconstruction: renders the synthesized content in the desired style while preserving high-level features.
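As a concrete illustration of this two-step idea, universal style transfer methods such as the feature-transform approach of Li et al. (cited in the references) match second-order feature statistics between the encoded content and style images before decoding. The following is a minimal numpy sketch of a whitening-coloring transform (WCT); the function name and shapes are illustrative, and a real implementation would operate on VGG feature maps inside the encoder-decoder pipeline:

```python
import numpy as np

def wct(content_feat, style_feat, eps=1e-5):
    """Whitening-coloring transform: re-colors content features with
    the style features' covariance (after Li et al., 2017)."""
    C, H, W = content_feat.shape
    fc = content_feat.reshape(C, -1)
    fs = style_feat.reshape(C, -1)
    fc = fc - fc.mean(axis=1, keepdims=True)
    ms = fs.mean(axis=1, keepdims=True)
    fs = fs - ms
    # Whitening: remove the content features' own correlations.
    Ec, wc, _ = np.linalg.svd(fc @ fc.T / (fc.shape[1] - 1) + eps * np.eye(C))
    whitened = Ec @ np.diag(wc ** -0.5) @ Ec.T @ fc
    # Coloring: impose the style features' covariance and mean.
    Es, ws, _ = np.linalg.svd(fs @ fs.T / (fs.shape[1] - 1) + eps * np.eye(C))
    colored = Es @ np.diag(ws ** 0.5) @ Es.T @ whitened
    return (colored + ms).reshape(C, H, W)
```

After the transform, the output features carry (approximately) the style features' covariance while retaining the content features' spatial arrangement, which the decoder then turns back into an image.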
1.2.1. Encoder
1.2.2. Decoder
1.3. Skip Connections and Deep Layer Aggregation
1.4. Motivation for Using Deep Layer Aggregation Architectures
2. Materials and Methods
2.1. Deep Layer Aggregation Decoders
2.1.1. Architectural Design
2.1.2. Training
2.1.3. Expectations
- Improved generalization: some aggregations might adapt better to certain styles; this is best captured by the FID score.
- Better output quality: some aggregations might be better suited to capturing different aspects of the style; this is best captured by the reconstruction error (a lower value means the output image is closer to the input image).
- More flexibility: some aggregations might be better suited to capturing color and spatial patterns; this is best captured by the total variation. Since spatial coherence should come only from the input image and color only from the style image, a very low total variation is a sign of content loss when transferring complex images such as those in our dataset.
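The total variation used above can be computed directly from differences between neighboring pixels. A minimal sketch of the anisotropic discrete form is shown below (the definition in Condat's paper, cited in the references, differs in details such as normalization and the isotropic variant):

```python
import numpy as np

def total_variation(img):
    """Anisotropic discrete total variation of a 2D image: the sum of
    absolute differences between vertically and horizontally adjacent
    pixels. Low TV means a smooth, low-detail image."""
    dv = np.abs(np.diff(img, axis=0)).sum()  # vertical neighbor differences
    dh = np.abs(np.diff(img, axis=1)).sum()  # horizontal neighbor differences
    return float(dv + dh)
```

A constant image has TV = 0, which is why a very low TV on a complex scene suggests that fine content detail has been lost.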
3. Results
3.1. Objective (Quantitative) Comparison
3.2. Subjective Comparison
3.3. Comparison between Expected Results and Measured Results
3.4. Computational Time Comparison
4. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- An, J.; Xiong, H.; Luo, J.; Huan, J.; Ma, J. Fast universal style transfer for artistic and photorealistic rendering. arXiv 2019, arXiv:1907.03118.
- Sohaliya, G.; Sharma, K. An Evolution of Style Transfer from Artistic to Photorealistic: A Review. In Proceedings of the 2021 Asian Conference on Innovation in Technology (ASIANCON), Pune, India, 27–29 August 2021; pp. 1–7.
- Hsu, F. NeuralStyleTransfer Project; University of Illinois at Urbana–Champaign. Available online: https://github.com/Francis-Hsu/NeuralStyleTransfer (accessed on 10 November 2022).
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. Available online: https://www.deeplearningbook.org/ (accessed on 10 November 2022).
- An, J.; Xiong, H.; Huan, J.; Luo, J. Ultrafast Photorealistic Style Transfer via Neural Architecture Search. arXiv 2020, arXiv:1912.02398v2.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015. Available online: https://arxiv.org/abs/1409.1556 (accessed on 29 March 2023).
- Yu, F.; Wang, D.; Shelhamer, E.; Darrell, T. Deep Layer Aggregation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2403–2412.
- Adaloglou, N. Intuitive Explanation of Skip Connections in Deep Learning. The AI Summer, 2020. Available online: https://theaisummer.com/skip-connections (accessed on 10 November 2022).
- Jing, Y.; Yang, Y.; Feng, Z.; Ye, J.; Yu, Y.; Song, M. Neural style transfer: A review. IEEE Trans. Vis. Comput. Graph. 2019, 26, 3365–3385.
- Li, Y.; Fang, C.; Yang, J.; Wang, Z.; Lu, X.; Yang, M.-H. Universal style transfer via feature transforms. arXiv 2017, arXiv:1705.08086.
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. arXiv 2017, arXiv:1706.08500.
- Condat, L. Discrete total variation: New definition and minimization. SIAM J. Imaging Sci. 2017, 10, 1258–1290.
- Hernandez-Cruz, N.; Cato, D.; Favela, J. Neural Style Transfer as Data Augmentation for Improving COVID-19 Diagnosis Classification. SN Comput. Sci. 2021, 2, 410.
- Ma, C.; Ji, Z.; Gao, M. Neural Style Transfer Improves 3D Cardiovascular MR Image Segmentation on Inconsistent Data. arXiv 2019, arXiv:1909.09716.
- Benitez-Garcia, G.; Takahashi, H.; Yanai, K. Material Translation Based on Neural Style Transfer with Ideal Style Image Retrieval. Sensors 2022, 22, 7317.
Architecture | Short Description of Architecture | Expected Improved Aspect | Metric for Improved Aspect
---|---|---|---
IDA | Iteratively refines the output image by adding details at multiple scales; should gradually improve image quality by adding fine details and texture. Decreased generalization might occur due to overfitting. | Improved quality; increased flexibility; decreased generalization | Reconstruction error; total variation; FID score
TSA | Organizes aggregations in a tree structure, with each node capturing a different aspect of the style. Decreased generalization might occur due to overfitting. | (Potentially) improved quality; decreased generalization | Reconstruction error; FID score
RA | Reentrant aggregation recursively combines feature maps from different layers of the convolutional neural network, allowing more detailed, fine-grained control over the style transfer process (by maintaining a spatial memory). | (Potentially) improved quality; increased flexibility | Reconstruction error; total variation
HDA | Combines multiple layers hierarchically to produce the output image; should capture both local and global features of the style and content. | Improved quality; improved generalization | Reconstruction error; FID score
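The four aggregation schemes above differ mainly in how intermediate feature maps are merged inside the decoder. As a rough, hypothetical illustration of the iterative scheme (IDA), the sketch below merges feature maps from coarse to fine; the `upsample2x` and `aggregate` helpers are illustrative stand-ins, since the actual decoders use learned convolutional aggregation nodes (as in Yu et al.'s Deep Layer Aggregation):

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def aggregate(a, b):
    # Hypothetical aggregation node: a plain average of two aligned
    # maps; a real node would apply a learned conv + nonlinearity.
    return 0.5 * (a + b)

def ida(features):
    """Iterative deep aggregation: starting from the finest feature
    map, progressively fold in coarser maps, upsampling each one to
    the working resolution before merging."""
    out = features[0]  # finest-resolution map first
    for f in features[1:]:
        while f.shape[1] < out.shape[1]:
            f = upsample2x(f)
        out = aggregate(out, f)
    return out
```

Tree-structured (TSA) and hierarchical (HDA) variants would instead merge pairs of maps bottom-up in a tree, re-using intermediate aggregation results.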
Architecture | Epoch | Reconstruction Error (Lower Is Better) | Total Variation Score (TV) (Very Low Is Bad) | Fréchet Inception Distance Score (FID) (Lower Is Better)
---|---|---|---|---
PhotoNet | 1 | 102,211.791 | 8189.258 | 163.79
PhotoNet | 2 | 98,190.635 | 8325.533 | 160.23
PhotoNet | 3 | 94,087.596 | 7271.771 | 160.28
PhotoNet | 4 | 108,593.92 | 7393.16 | 161.92
PhotoNet | 5 | 102,166.251 | 6957.632 | 163.71
IDA | 1 | 115,927.932 | 4205.513 | 161.45
IDA | 2 | 93,623.708 | 4375.632 | 159.38
IDA | 3 | 105,055.555 | 4618.789 | 160.53
IDA | 4 | 92,252.028 | 4340.921 | 158.28
IDA | 5 | 89,256.107 | 4291.648 | 160.04
TSA | 1 | 101,494.835 | 5037.395 | 162.96
TSA | 2 | 112,259.242 | 4896.852 | 160.98
TSA | 3 | 107,383.347 | 5072.647 | 161.65
TSA | 4 | 91,525.225 | 5243.896 | 160.3
TSA | 5 | 117,641.887 | 5089.479 | 160.32
RA | 1 | 73,907.531 | 7244.274 | 164.87
RA | 2 | 131,479.38 | 8140.81 | 163.04
RA | 3 | 113,357.705 | 7048.334 | 160.78
RA | 4 | 106,816.984 | 7184.867 | 160.99
RA | 5 | 90,576.84 | 7026.428 | 160.85
HDA | 1 | 124,712.988 | 4810.022 | 162.31
HDA | 2 | 108,710.381 | 4815.858 | 161.48
HDA | 3 | 113,971.76 | 4791.003 | 160.49
HDA | 4 | 97,956.429 | 5077.516 | 163.18
HDA | 5 | 99,175.09 | 4891.987 | 161.03
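The FID values in the table above compare the distributions of Inception-network activations for generated and reference images: each set of activations is fitted with a Gaussian, and the Fréchet distance between the two Gaussians is reported (Heusel et al., cited in the references). A numpy-only sketch of the closed-form distance follows; the real metric additionally requires extracting activations with an Inception-v3 network, which is omitted here:

```python
import numpy as np

def _sqrtm_psd(a):
    # Matrix square root of a symmetric positive semi-definite matrix
    # via eigendecomposition (clipping tiny negative eigenvalues).
    w, v = np.linalg.eigh(a)
    return v @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ v.T

def fid(mu1, cov1, mu2, cov2):
    """Fréchet distance between two Gaussians N(mu1, cov1), N(mu2, cov2):
    ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^(1/2))."""
    s1 = _sqrtm_psd(cov1)
    # (cov1 cov2)^(1/2) has the same trace as (s1 cov2 s1)^(1/2),
    # and the latter is symmetric PSD, so numpy's eigh suffices.
    covmean = _sqrtm_psd(s1 @ cov2 @ s1)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2)
                 - 2.0 * np.trace(covmean))
```

Identical distributions give FID = 0, and larger values indicate that the generated images' activation statistics drift further from the reference set, which is why FID is used here as a proxy for generalization.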
Architecture | PhotoNet | IDA | TSA | RA | HDA
---|---|---|---|---|---
Preference | 22.35% | 7% | 12.95% | 41.8% | 15.9%
Architecture | Expected Impact | Metric for Improved Aspect | Result Matches the Expected Result?
---|---|---|---
IDA | Improved quality | Reconstruction error | Yes, but marginally better than PhotoNet
IDA | Increased flexibility | Total variation | Low value, could be bad: check subjective results
IDA | Decreased generalization | FID score | No, but marginally better than PhotoNet
TSA | (Potentially) improved quality | Reconstruction error | Yes, but marginally better than PhotoNet
TSA | Decreased generalization | FID score | No, but similar to PhotoNet
RA | (Potentially) improved quality | Reconstruction error | Yes, improved
RA | Increased flexibility | Total variation | Similar to PhotoNet
HDA | (Potentially) improved quality | Reconstruction error | No, but marginally worse than PhotoNet
HDA | Improved generalization | FID score | No, but similar to PhotoNet
Architecture | PhotoNet | IDA | TSA | RA | HDA
---|---|---|---|---|---
Inference (s) | 1.02 | 1.34 | 1.65 | 1.45 | 1.45
Training (min) | 128 | 135 | 142 | 135 | 136
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Dediu, M.; Vasile, C.-E.; Bîră, C. Deep Layer Aggregation Architectures for Photorealistic Universal Style Transfer. Sensors 2023, 23, 4528. https://doi.org/10.3390/s23094528