Particle-Velocity-Based Mixed-Source Sound Field Translation for Binaural Reproduction
Abstract
1. Introduction
2. Problem Formulation
3. Particle-Velocity-Based Mixed-Source Expansion
3.1. Spherical Harmonic Representation of Particle Velocity
3.2. Mixed-Source Model
3.3. Particle-Velocity-Based Expansion
3.3.1. Least Squares Solution
3.3.2. Sparse Solution
4. Sound Field Translation and Synthesis for Binaural Reproduction
5. Simulation Study
5.1. Simulation Setup and Criteria
5.2. Simulation Results
5.2.1. Reproduced Pressure Field
5.2.2. Frequency
5.2.3. Translated Position
5.3. Discussion
- The particle-velocity-based expansion is derived directly from the sound-pressure-based expansion by the gradient calculation. Therefore, the least squares solution of the particle-velocity-based expansion is mathematically parallel to the closed-form solution of the sound-pressure-based expansion, both of which aim to reconstruct the original truncated recording by distributing energy throughout all virtual sources and inherit the spatial artifacts caused by the truncated measurement. This leads to poor reproduction outside the receiver region.
- Sparse solutions can improve reproduction outside the receiver region because most sound fields can be reproduced accurately by a single virtual source or a few virtual sources, and thus exhibit sparsity in space. Therefore, provided that the desired sound field is sparse in space, the sparse solutions can be used to relax the restriction mentioned in Section 2. Note that, for the sparse solution, the region of accurate translation in the multiple-source scenario is smaller than that in the single-source scenario. In addition, the sparse solution is less applicable to highly reverberant fields, where the sparsity assumption does not hold.
- Particle velocity contains the direction information of a sound field. For the sparse solutions, we can achieve more accurate sound field reproduction by controlling particle velocity than by controlling sound pressure. In other words, the particle-velocity-based solution outperforms the sound-pressure-based solution for sound field reproduction when there are a limited number of sources. Furthermore, particle velocity reflects the interaural time difference (ITD); the velocity vector is therefore directly related to the localization predictor for human perception at low frequencies. Particle velocity is also one of the quantities that determine the sound intensity vector (the localization predictor at high frequencies), which reflects another human localization cue, the interaural level difference (ILD). Therefore, the sparse solution of the particle-velocity-based expansion is expected to provide enhanced perceptual immersion for the listener.
- Sparse solutions extrapolate the sound field at translated positions beyond the receiver region from the sound field within the receiver region, using the mixed-source expansion in which multiple virtual sources estimate the original source. Therefore, the more virtual sources are used, the smaller the error of the mixed-source expansion, but the higher the computational cost of the system. In addition, the particle-velocity-based solutions are expected to be more computationally expensive than the sound-pressure-based solutions because of the extra spherical harmonic decomposition of particle velocity.
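The contrast drawn above between a least squares solution, which spreads energy over all virtual sources, and a sparse solution, which concentrates it in a few, can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration: the random complex matrix `A` merely stands in for an expansion matrix that maps virtual-source weights to measured coefficients, the sizes `n_obs` and `n_src` are arbitrary, and the iteratively reweighted least squares (IRLS) loop is a generic equality-constrained variant, not necessarily the formulation used in Section 3.3.2.

```python
import numpy as np


def min_norm_solution(A, b):
    """Minimum l2-norm weights for the underdetermined system A x = b."""
    return np.linalg.pinv(A) @ b


def irls_sparse_solution(A, b, p=1.0, n_iter=50, eps=1e-8):
    """Generic IRLS sketch: approximately minimize ||x||_p subject to A x = b."""
    x = min_norm_solution(A, b)  # start from the least squares weights
    for _ in range(n_iter):
        # Inverse weights: entries that are already small are penalized more.
        q = (np.abs(x) ** 2 + eps) ** (1.0 - p / 2.0)
        AQ = A * q  # equivalent to A @ diag(q)
        # Closed-form solution of the weighted minimum-norm problem.
        x = q * (A.conj().T @ np.linalg.solve(AQ @ A.conj().T, b))
    return x


# Hypothetical toy problem: 8 observations, 40 candidate virtual sources,
# and a single active source (index 17) generating the observations.
rng = np.random.default_rng(0)
n_obs, n_src = 8, 40
A = rng.standard_normal((n_obs, n_src)) + 1j * rng.standard_normal((n_obs, n_src))
x_true = np.zeros(n_src, dtype=complex)
x_true[17] = 1.0
b = A @ x_true

x_ls = min_norm_solution(A, b)
x_sp = irls_sparse_solution(A, b)

print("l1 norm, least squares:", np.sum(np.abs(x_ls)))
print("l1 norm, sparse (IRLS):", np.sum(np.abs(x_sp)))
print("strongest virtual source (IRLS):", int(np.argmax(np.abs(x_sp))))
```

Both weight vectors reproduce the observations `b`, so they are indistinguishable at the measurement points; they differ in how they assign energy to the virtual sources, which is what governs behaviour away from those points.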
6. Experimental Verification
6.1. Experimental Methodology
- Reference/Hidden reference: The ground truth, which is obtained directly from filtering the theoretical point source signal with the HRTFs.
- Anchor: Signals of the truncated recording at the origin, which are simulated using the pressure coefficients up to order N in (34). No translation is applied; the signals are only filtered by the HRTFs.
- P-CFS: Signals reconstructed using the closed-form solution of the sound-pressure-based expansion and rendered by the HRTFs.
- V-LSS: Signals reconstructed using the least squares solution of the particle-velocity-based expansion and rendered by the HRTFs.
- P-SS: Signals reconstructed using the sparse solution of the sound-pressure-based expansion and rendered by the HRTFs.
- V-SS: Signals reconstructed using the sparse solution of the particle-velocity-based expansion and rendered by the HRTFs.
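Every condition listed above ends with the same step: rendering through HRTFs. A minimal time-domain sketch of that step is given below, assuming the HRTFs are available as a pair of head-related impulse responses. The function names and the toy impulse responses are hypothetical placeholders, not data from the HRTF database used in the experiment.

```python
def convolve(signal, kernel):
    """Direct-form linear convolution of two real sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(kernel):
            out[i + j] += s * h
    return out


def render_binaural(source, hrir_left, hrir_right):
    """Filter one source signal with a left/right impulse response pair."""
    return convolve(source, hrir_left), convolve(source, hrir_right)


# Toy impulse responses (assumed for illustration): the right ear receives
# the sound three samples later and at half the amplitude of the left ear.
hrir_left = [1.0, 0.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.0, 0.5]
source = [1.0, -1.0, 0.5]

left, right = render_binaural(source, hrir_left, hrir_right)
print(left)
print(right)
```

In practice the impulse responses would be selected or interpolated for the direction of each source relative to the listener, and a fast convolution would replace the direct double loop.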
6.2. Experimental Results
6.3. Discussion
- Statistically, the least squares solution of the particle-velocity-based expansion is equivalent to the closed-form solution of the sound-pressure-based expansion, whereas the sparse solution of the particle-velocity-based expansion provides better perceptual performance than the sparse solution of the sound-pressure-based expansion at a random translated position. The experimental results are consistent with the discussion in Section 5.3.
- Using particle velocity as the design criterion yields a more precise reconstruction of the interaural phase difference than using sound pressure. The interaural phase difference is of utmost importance for direction perception, so this contributes to improved localization of the reconstructed sound when there are only a limited number of sources (the sparse solution). In addition to direction perception, audio quality can also be improved by controlling particle velocity for the sparse solution, which may be due to the more accurate reproduction of the sound field.
- There are some limitations to the experimental verification. We only evaluate the single-source scenario in the perceptual experiment, where the sound field exhibits sparsity. The perceptual performance may degrade in the multiple-source scenario or in highly reverberant sound fields. In addition, the source signal is a single type of music, so the performance of the proposed method for other types of audio remains unexplored. Note also that the equipment used (e.g., headphones and sound card) and the listening environment may differ between subjects because the experiment was conducted remotely.
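The interaural phase difference referred to in this discussion can be made concrete with a short sketch. It estimates the phase difference between two ear signals at one frequency with a single-bin discrete Fourier transform and converts it to a time difference through the standard relation between phase and delay, IPD = 2πf·ITD. The sample rate, tone frequency, and delay are assumed values chosen so that the analysis window spans a whole number of periods; this is an illustration of the quantity, not the evaluation procedure used in the experiment.

```python
import cmath
import math


def single_bin_dft(x, freq_hz, fs_hz):
    """DFT of x evaluated at one frequency only."""
    return sum(v * cmath.exp(-2j * math.pi * freq_hz * n / fs_hz)
               for n, v in enumerate(x))


def interaural_phase_difference(left, right, freq_hz, fs_hz):
    """Phase of the left ear signal relative to the right one, in radians."""
    spec_left = single_bin_dft(left, freq_hz, fs_hz)
    spec_right = single_bin_dft(right, freq_hz, fs_hz)
    return cmath.phase(spec_left * spec_right.conjugate())


# Assumed toy signals: a steady 500 Hz tone that reaches the right ear
# 12 samples (0.25 ms at 48 kHz) after it reaches the left ear.
fs_hz = 48000.0
freq_hz = 500.0
delay_samples = 12
n_samples = 960  # exactly 10 periods of the tone

left = [math.cos(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]
right = [math.cos(2 * math.pi * freq_hz * (n - delay_samples) / fs_hz)
         for n in range(n_samples)]

ipd = interaural_phase_difference(left, right, freq_hz, fs_hz)
itd = ipd / (2 * math.pi * freq_hz)  # valid while |ipd| < pi (no phase wrapping)
print("IPD (rad):", ipd)
print("ITD (s):", itd)
```

The conversion from phase to time is unambiguous only while the phase difference stays below π in magnitude, which is one reason phase-based cues dominate at low frequencies.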
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
- The following abbreviations are used in this manuscript:
AR/VR: Augmented Reality/Virtual Reality
MUSHRA: Multiple Stimulus with Hidden Reference and Anchor
HRTF: Head-related transfer function
LASSO: Least absolute shrinkage and selection operator
IRLS: Iteratively reweighted least squares
ITD: Interaural time difference
ILD: Interaural level difference
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zuo, H.; Birnie, L.I.; Samarasinghe, P.N.; Abhayapala, T.D.; Tourbabin, V. Particle-Velocity-Based Mixed-Source Sound Field Translation for Binaural Reproduction. Appl. Sci. 2023, 13, 6449. https://doi.org/10.3390/app13116449