Sound-to-Imagination: An Exploratory Study on Cross-Modal Translation Using Diverse Audiovisual Data
Abstract
1. Introduction
2. Related Work
3. Inherent Challenges in S2I Translation Processes
3.1. Computational Imagination
3.2. Computational Creativity and Divergence/Convergence Methods
3.3. Addressing the Problem
4. Sound-to-Image Translator
4.1. Overview
4.2. Network Architecture
4.3. Training Methodology
Algorithm 1 Pseudo-code for the training of the translator’s GAN.

Inputs: batch size; number of training iterations of the generator; number of generator training iterations per discriminator training; minimum RC-score value; maximum RC-score value; adversarial loss scale factor.

For each training iteration:
1. Sample a batch from the stored data and obtain the corresponding embeddings from the frozen audio encoder.
2. Sample a further batch from the stored data.
3. If a discriminator update is due (according to the generator-iterations-per-discriminator-training schedule and the RC-score bounds), train the discriminator, with the generator frozen, to minimize the discriminator loss.
4. Train the generator, with the discriminator frozen, to minimize the generator loss.
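As a rough illustration of the loop structure described above, a minimal PyTorch-style sketch might look as follows. The function and parameter names (`disc_every`, `rc_min`, `rc_max`, `adv_scale`), the binary cross-entropy losses, and the stand-in RC-score are illustrative assumptions for readability; they do not reproduce the authors’ exact formulation.

```python
# A minimal, illustrative sketch of an adversarial training loop with the
# structure of Algorithm 1. Module names, loss choices, and the rc placeholder
# are assumptions; they do not reproduce the paper's exact losses.
import torch
import torch.nn.functional as F


def train_translator_gan(generator, discriminator, audio_encoder, loader,
                         n_iters=10_000, disc_every=1,
                         rc_min=0.2, rc_max=0.8, adv_scale=1.0, device="cpu"):
    g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
    audio_encoder.eval()          # the audio encoder stays frozen throughout
    batches = iter(loader)
    rc = 0.5                      # running score used to gate discriminator updates

    for it in range(n_iters):
        try:
            audio, real_images = next(batches)
        except StopIteration:     # restart the loader when it is exhausted
            batches = iter(loader)
            audio, real_images = next(batches)
        audio, real_images = audio.to(device), real_images.to(device)

        with torch.no_grad():     # embeddings come from the frozen audio encoder
            emb = audio_encoder(audio)

        # Discriminator step: only on its schedule and while rc stays inside the band.
        if it % disc_every == 0 and rc_min <= rc <= rc_max:
            fake = generator(emb).detach()            # generator frozen for this step
            d_real = discriminator(real_images, emb)
            d_fake = discriminator(fake, emb)
            d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
                      + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
            d_opt.zero_grad()
            d_loss.backward()
            d_opt.step()
            rc = torch.sigmoid(d_real).mean().item()  # crude stand-in for an RC-score

        # Generator step: the discriminator is not updated here.
        fake = generator(emb)
        d_out = discriminator(fake, emb)
        g_loss = adv_scale * F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
        g_opt.zero_grad()
        g_loss.backward()
        g_opt.step()
```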
5. Experiments
5.1. Data Used
5.2. Preliminary Exploration and Training Details
5.2.1. Generator’s Architecture Evolution
5.2.2. Models’ Generalization
5.2.3. Networks Activation
5.2.4. Data Balancing and Networks Initialization
5.2.5. Algorithm Hyperparameters and Technology Stack
5.3. Informativity Classifiers
5.4. S2I Translation Results
5.4.1. Quantitative Evaluation
5.4.2. Qualitative Evaluation
- Defective—Images that may even be informative, but depict elements with inaccurate color, luminosity, size, and/or position.
- Incomplete—Images that may be informative to a certain degree, but lack some essential part of the element of interest or, despite showing a coherent surrounding scene, omit the sound-emitting source.
- Artifactual—Non-informative images that are rather abstract, consisting basically of unrecognizable forms.
- Implausible—Images that may occasionally be informative, but that contain awkward or unlikely elements.
- Surreal—Images that may present some degree of informativity but have a curious or fantastical appearance.
- Creepy—Images that may be partially informative, but that portray parts of living beings in a harrowing way, or that contain ghostly or alien-looking elements. These images could sometimes also be considered defective or surreal, depending on the case.
- Multi-informative—These images depict elements from two or more sound classes.
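For annotation purposes, the categories above could be encoded as a simple enumeration when labeling generated images. This is an illustrative sketch only; the `ImageIssue` class and the annotation format are assumptions, not tooling from the study.

```python
from enum import Enum


class ImageIssue(Enum):
    """Qualitative categories for labeling problematic generated images."""
    DEFECTIVE = "inaccurate color, luminosity, size, or position"
    INCOMPLETE = "essential part or sound-emitting source missing"
    ARTIFACTUAL = "abstract, unrecognizable forms"
    IMPLAUSIBLE = "awkward or unlikely elements"
    SURREAL = "curious or fantastical appearance"
    CREEPY = "harrowing, ghostly, or alien-looking elements"
    MULTI_INFORMATIVE = "elements from two or more sound classes"


# Example: tag one generated sample with one or more categories.
annotation = {"sample_id": 42, "issues": [ImageIssue.INCOMPLETE, ImageIssue.SURREAL]}
```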
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| Sound Class | No. of Original Video Tracks (max. 10 s) | No. of 1 s Video Segments (Training) | No. of 1 s Video Segments (Validation) | No. of 1 s Video Segments (Test) |
|---|---|---|---|---|
| Baby cry | 2059 | 9789 | 1115 | 1365 |
| Dog | 2785 | 9789 | 1115 | 1365 |
| Fireworks | 3115 | 9789 | 1115 | 1365 |
| Rail transport | 3259 | 9789 | 1115 | 1365 |
| Water flowing | 2924 | 9789 | 1115 | 1365 |
| Total | 14,142 | 48,945 | 5575 | 6825 |