Investigation of Machine Learning Model Flexibility for Automatic Application of Reverberation Effect on Audio Signal
Abstract
1. Introduction
2. Materials and Methods
2.1. Preparation of the Dataset
2.2. Deep Recurrent Neural Networks for Reverberated Signal Modeling
- In the same frequency band as used for training, but with the input samples replaced by previously unseen ones;
- In two adjacent frequency bands, with the model trained on the middle band and tested on the bands on either side of it;
- In all frequency bands, with the octave divided into 12 parts. This was done in two ways: first, a separate model was trained to predict each frequency band; second, the input data from each frequency band were fed separately to a single model trained on only one frequency band, which predicted the reverberated signal for every band.
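The three test scenarios above can be made concrete with a small helper that enumerates the bands and the (train band, test band) pairs each scenario uses. This is a sketch under stated assumptions, not the authors' code: the 12 parts of the octave are taken to be geometrically equal (a constant frequency ratio of 2^(1/12) between band edges), the lower octave edge `f_low` is a free parameter because it is not given here, and `split_octave`, `neighbour_bands`, `evaluation_plan` and `reference_band` are hypothetical names.

```python
def split_octave(f_low, n_parts=12):
    """Split the octave [f_low, 2 * f_low] into n_parts bands of equal log-width.

    Returns (lower_edge, upper_edge) tuples in the same unit as f_low.
    Geometric spacing is an assumption made for this sketch.
    """
    if f_low <= 0:
        raise ValueError("f_low must be positive")
    ratio = 2.0 ** (1.0 / n_parts)  # constant ratio between successive band edges
    return [(f_low * ratio ** k, f_low * ratio ** (k + 1)) for k in range(n_parts)]


def neighbour_bands(k, n_parts=12):
    """1-based indices of the bands directly below and above band k, if they exist."""
    return [j for j in (k - 1, k + 1) if 1 <= j <= n_parts]


def evaluation_plan(reference_band, n_parts=12):
    """(train_band, test_band) pairs per scenario; reference_band is hypothetical."""
    return {
        # Scenario 1: same band as in training, previously unseen input samples.
        "same_band": [(reference_band, reference_band)],
        # Scenario 2: model trained on the middle band, tested on its neighbours.
        "adjacent": [(reference_band, j) for j in neighbour_bands(reference_band, n_parts)],
        # Scenario 3a: a separate model per band, each tested on its own band.
        "per_band_models": [(b, b) for b in range(1, n_parts + 1)],
        # Scenario 3b: one model trained on a single band, fed inputs from every band.
        "single_model": [(reference_band, b) for b in range(1, n_parts + 1)],
    }


bands = split_octave(1000.0)  # f_low = 1000 Hz is an arbitrary example value
plan = evaluation_plan(reference_band=7)
```

Geometric spacing keeps every sub-band the same width on a logarithmic frequency axis; if the bands were instead spaced linearly in Hz, only `split_octave` would need to change.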
3. Results
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
RNN | Layer Size | SSE (×10³) | RSS [min max] | R-Squared | RMSE [min max] |
---|---|---|---|---|---|
LSTM | 10 + 20 | 15.85 | [0.24 65.29] | −0.13 | [3.36 × 1.46] |
LSTM | 20 + 20 | 3.67 | [0.23 37.39] | 0.74 | [2.24 × 0.66] |
LSTM | 20 + 40 | 1.36 | [0.19 26.03] | 0.90 | [1.57 × 0.56] |
BiLSTM | 10 + 20 | 6.43 | [0.15 43.44] | 0.54 | [1.18 × 0.81] |
BiLSTM | 20 + 20 | 2.71 | [0.17 22.16] | 0.81 | [7.09 × 0.37] |
BiLSTM | 20 + 40 | 2.16 | [0.17 20.38] | 0.85 | [2.06 × 0.38] |
GRU | 10 + 20 | 5.05 | [0.32 29.97] | 0.64 | [1.30 × 0.60] |
GRU | 20 + 20 | 2.45 | [0.21 34.03] | 0.83 | [6.37 × 0.60] |
GRU | 20 + 40 | 3.31 | [0.18 32.59] | 0.77 | [7.99 × 0.66] |

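The Layer Size column above can be read as the hidden-unit counts of two stacked recurrent layers (for example, 20 + 40). The sketch below illustrates that arrangement for the GRU variant in plain Python, under several assumptions: one input value (audio sample) per time step, one predicted output per time step, random untrained weights, and hypothetical names (`GRULayer`, `StackedGRU`). It shows the data flow only, not the authors' training setup; an LSTM would swap in a different cell, and a BiLSTM would add a second pass over the time-reversed sequence.

```python
import math
import random


def matvec(w, v):
    """Dense matrix-vector product with w stored as a list of rows."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRULayer:
    """One GRU layer: update gate z, reset gate r, candidate state n."""

    def __init__(self, n_in, n_hidden, rng):
        k = 1.0 / math.sqrt(n_hidden)

        def init(rows, cols):
            return [[rng.uniform(-k, k) for _ in range(cols)] for _ in range(rows)]

        self.n_hidden = n_hidden
        self.w_x = {g: init(n_hidden, n_in) for g in "zrn"}      # input weights
        self.w_h = {g: init(n_hidden, n_hidden) for g in "zrn"}  # recurrent weights
        self.bias = {g: [0.0] * n_hidden for g in "zrn"}

    def run(self, xs):
        """Return the hidden state after each input vector in the sequence xs."""
        h = [0.0] * self.n_hidden
        states = []
        for x in xs:
            ax = {g: matvec(self.w_x[g], x) for g in "zrn"}
            ah = {g: matvec(self.w_h[g], h) for g in "zrn"}
            z = [sigmoid(a + c + b) for a, c, b in zip(ax["z"], ah["z"], self.bias["z"])]
            r = [sigmoid(a + c + b) for a, c, b in zip(ax["r"], ah["r"], self.bias["r"])]
            # The reset gate scales the recurrent contribution to the candidate state.
            n = [math.tanh(a + ri * c + b)
                 for a, ri, c, b in zip(ax["n"], r, ah["n"], self.bias["n"])]
            # New state: convex mix of candidate and previous state, weighted by z.
            h = [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
            states.append(h)
        return states


class StackedGRU:
    """Stacked GRU layers (e.g. 20 + 40 units) with a linear read-out per time step."""

    def __init__(self, n_in, sizes=(20, 40), n_out=1, seed=0):
        rng = random.Random(seed)
        self.layers = []
        prev = n_in
        for size in sizes:
            self.layers.append(GRULayer(prev, size, rng))
            prev = size
        k = 1.0 / math.sqrt(prev)
        self.w_out = [[rng.uniform(-k, k) for _ in range(prev)] for _ in range(n_out)]

    def predict(self, xs):
        seq = xs
        for layer in self.layers:
            seq = layer.run(seq)  # each layer consumes the previous layer's states
        return [matvec(self.w_out, h) for h in seq]


model = StackedGRU(n_in=1, sizes=(20, 40))
y = model.predict([[0.1], [0.0], [-0.2], [0.05]])  # one output per time step
```

Changing `sizes` to `(10, 20)` or `(20, 20)` gives the other configurations listed in the table; the number of trainable weights grows roughly with the square of the hidden sizes because of the recurrent matrices.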
RNN | Bin Number | SSE (×10⁴) | RSS (Mean) | R-Squared | RMSE (Mean) |
---|---|---|---|---|---|
LSTM | 1 | 30.5 | 9.59 | 0.63 | 0.1621 |
LSTM | 2 | 52.6 | 12.54 | 0.37 | 0.2272 |
LSTM | 3 | 75.6 | 14.45 | 0.27 | 0.2703 |
LSTM | 4 | 84.3 | 14.74 | 0.27 | 0.2793 |
LSTM | 5 | 65.4 | 14.10 | 0.32 | 0.2545 |
LSTM | 6 | 17.6 | 9.16 | 0.57 | 0.1337 |
LSTM | 7 | 1.72 | 2.95 | 0.92 | 0.0191 |
LSTM | 8 | 12.8 | 4.78 | 0.70 | 0.0523 |
LSTM | 9 | 106 | 5.69 | 0.33 | 0.0881 |
LSTM | 10 | 129 | 6.92 | 0.27 | 0.1177 |
LSTM | 11 | 117 | 6.55 | 0.35 | 0.1060 |
LSTM | 12 | 147 | 6.99 | 0.39 | 0.1091 |
BiLSTM | 1 | 15.9 | 8.48 | 0.81 | 0.1788 |
BiLSTM | 2 | 13.0 | 9.71 | 0.84 | 0.2240 |
BiLSTM | 3 | 15.1 | 10.55 | 0.85 | 0.2614 |
BiLSTM | 4 | 18.3 | 10.71 | 0.84 | 0.2749 |
BiLSTM | 5 | 15.7 | 10.77 | 0.84 | 0.2709 |
BiLSTM | 6 | 8.1 | 7.28 | 0.80 | 0.1507 |
BiLSTM | 7 | 1.10 | 1.94 | 0.95 | 0.0145 |
BiLSTM | 8 | 10.4 | 4.22 | 0.76 | 0.0489 |
BiLSTM | 9 | 86.8 | 5.09 | 0.45 | 0.0821 |
BiLSTM | 10 | 104 | 4.76 | 0.41 | 0.1321 |
BiLSTM | 11 | 103 | 4.84 | 0.43 | 0.1218 |
BiLSTM | 12 | 114 | 6.28 | 0.52 | 0.0817 |

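The result tables report a pooled SSE, RSS and RMSE either as a mean or as a [min max] range, and R-squared. The sketch below shows one way these quantities relate, assuming (this is an interpretation, not stated in the tables) that RSS and RMSE are computed per test sequence and then summarized, that SSE is the sum of the per-sequence RSS values, and that R-squared is 1 − SSE/SST over all pooled target samples. Under that definition R-squared becomes negative whenever the model's squared error exceeds that of simply predicting the mean, which is how a value such as −0.13 can arise. The function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(targets, predictions):
    """Pooled SSE and R-squared plus per-sequence RSS and RMSE summaries.

    targets, predictions: lists of equally long sequences of samples.
    """
    if len(targets) != len(predictions) or not targets:
        raise ValueError("need the same, non-zero number of target and predicted sequences")
    rss, rmse, pooled = [], [], []
    for t, p in zip(targets, predictions):
        if len(t) != len(p) or not t:
            raise ValueError("each sequence pair must be non-empty and of equal length")
        r = sum((ti - pi) ** 2 for ti, pi in zip(t, p))  # residual sum of squares
        rss.append(r)
        rmse.append(math.sqrt(r / len(t)))
        pooled.extend(t)
    sse = sum(rss)  # pooled sum of squared errors over all sequences
    mean_t = sum(pooled) / len(pooled)
    sst = sum((ti - mean_t) ** 2 for ti in pooled)  # total sum of squares
    if sst == 0:
        raise ValueError("R-squared is undefined for constant targets")
    return {
        "SSE": sse,
        "RSS_mean": sse / len(rss),
        "RSS_min_max": (min(rss), max(rss)),
        "R2": 1.0 - sse / sst,  # negative when worse than predicting the mean
        "RMSE_mean": sum(rmse) / len(rmse),
        "RMSE_min_max": (min(rmse), max(rmse)),
    }


good = regression_metrics([[1, 2, 3, 4]], [[1, 2, 3, 5]])  # R2 of 0.8
bad = regression_metrics([[1, 2, 3, 4]], [[4, 3, 2, 1]])   # R2 of -3.0
```

To compare with the SSE columns, divide the returned `SSE` by 10³ or 10⁴, matching the scale given in each column header.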
R-Squared | 5 bin | 8 bin | 11 bin | 14 bin | 17 bin | 20 bin | 23 bin | 26 bin |
---|---|---|---|---|---|---|---|---|
Bin at the Left | 0.72 | 0.64 | 0.68 | 0.90 | 0.61 | 0.13 | 0.76 | 0.85 |
Bin at the Center | 0.78 | 0.84 | 0.87 | 0.98 | 0.95 | 0.95 | 0.87 | 0.88 |
Bin at the Right | 0.65 | 0.74 | 0.71 | 0.79 | 0.30 | 0.62 | 0.75 | 0.87 |

RMSE (Mean) | 5 bin | 8 bin | 11 bin | 14 bin | 17 bin | 20 bin | 23 bin | 26 bin |
---|---|---|---|---|---|---|---|---|
Bin at the Left | 0.0971 | 0.1375 | 0.1491 | 0.0680 | 0.0742 | 0.3782 | 0.0746 | 0.0678 |
Bin at the Center | 0.0777 | 0.0680 | 0.0687 | 0.0182 | 0.0189 | 0.0260 | 0.0367 | 0.0484 |
Bin at the Right | 0.1644 | 0.1100 | 0.1335 | 0.1206 | 0.1737 | 0.2157 | 0.0940 | 0.0531 |

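A quick arithmetic summary of the two adjacent-band tables above averages each row across the eight tested bins and checks, bin by bin, whether the center bin (presumably the band the model was trained on) scores the highest R-squared of the three. The values are transcribed directly from the tables; `row_means` and `center_wins` are hypothetical helper names.

```python
# R-squared and mean RMSE transcribed from the two tables above
# (columns: bins 5, 8, 11, 14, 17, 20, 23, 26).
BINS = [5, 8, 11, 14, 17, 20, 23, 26]
R2 = {
    "left":   [0.72, 0.64, 0.68, 0.90, 0.61, 0.13, 0.76, 0.85],
    "center": [0.78, 0.84, 0.87, 0.98, 0.95, 0.95, 0.87, 0.88],
    "right":  [0.65, 0.74, 0.71, 0.79, 0.30, 0.62, 0.75, 0.87],
}
RMSE = {
    "left":   [0.0971, 0.1375, 0.1491, 0.0680, 0.0742, 0.3782, 0.0746, 0.0678],
    "center": [0.0777, 0.0680, 0.0687, 0.0182, 0.0189, 0.0260, 0.0367, 0.0484],
    "right":  [0.1644, 0.1100, 0.1335, 0.1206, 0.1737, 0.2157, 0.0940, 0.0531],
}


def row_means(table):
    """Mean of each row (left / center / right) across all tested bins."""
    return {pos: sum(vals) / len(vals) for pos, vals in table.items()}


def center_wins(r2):
    """Bins where the center bin has an R-squared at least as high as both neighbours."""
    return [b for b, l, c, r in zip(BINS, r2["left"], r2["center"], r2["right"])
            if c >= l and c >= r]


r2_means = row_means(R2)      # approx. left 0.661, center 0.890, right 0.679
rmse_means = row_means(RMSE)  # approx. left 0.131, center 0.045, right 0.133
```

On these numbers the center bin has the highest R-squared in all eight columns, and its mean RMSE is roughly a third of that of either neighbour.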
RNN | Bin Number | SSE (×10³) | RSS (Mean) | R-Squared | RMSE (Mean) |
---|---|---|---|---|---|
LSTM | 1 | 30.42 | 3.59 | 0.96 | 0.0227 |
LSTM | 2 | 70.52 | 4.41 | 0.91 | 0.0325 |
LSTM | 3 | 59.89 | 4.89 | 0.94 | 0.0354 |
LSTM | 4 | 110.75 | 5.62 | 0.90 | 0.0483 |
LSTM | 5 | 117.16 | 5.53 | 0.88 | 0.0452 |
LSTM | 6 | 38.58 | 4.55 | 0.91 | 0.0311 |
LSTM | 7 | 17.16 | 2.95 | 0.92 | 0.0191 |
LSTM | 8 | 19.62 | 2.93 | 0.95 | 0.0194 |
LSTM | 9 | 23.79 | 2.18 | 0.98 | 0.0113 |
LSTM | 10 | 245.23 | 1.94 | 0.86 | 0.0176 |
LSTM | 11 | 315.59 | 8.44 | 0.82 | 0.1642 |
LSTM | 12 | 16.87 | 2.13 | 0.99 | 0.0134 |
BiLSTM | 1 | 38.78 | 3.05 | 0.95 | 0.0278 |
BiLSTM | 2 | 62.71 | 3.30 | 0.92 | 0.0249 |
BiLSTM | 3 | 29.26 | 3.46 | 0.97 | 0.0236 |
BiLSTM | 4 | 36.86 | 3.94 | 0.97 | 0.0268 |
BiLSTM | 5 | 47.86 | 3.82 | 0.95 | 0.0283 |
BiLSTM | 6 | 19.48 | 3.19 | 0.95 | 0.0222 |
BiLSTM | 7 | 10.99 | 1.94 | 0.95 | 0.0145 |
BiLSTM | 8 | 10.95 | 1.92 | 0.97 | 0.0126 |
BiLSTM | 9 | 10.47 | 1.55 | 0.99 | 0.0099 |
BiLSTM | 10 | 36.03 | 1.14 | 0.98 | 0.0097 |
BiLSTM | 11 | 39.34 | 1.25 | 0.98 | 0.0096 |
BiLSTM | 12 | 4.25 | 1.57 | 1.00 | 0.0092 |
GRU | 1 | 32.56 | 3.69 | 0.96 | 0.0241 |
GRU | 2 | 68.70 | 4.73 | 0.92 | 0.0365 |
GRU | 3 | 86.63 | 5.16 | 0.92 | 0.0430 |
GRU | 4 | 145.46 | 5.60 | 0.87 | 0.0527 |
GRU | 5 | 73.08 | 5.49 | 0.92 | 0.0409 |
GRU | 6 | 89.77 | 4.85 | 0.78 | 0.0405 |
GRU | 7 | 95.12 | 3.90 | 0.58 | 0.0418 |
GRU | 8 | 24.23 | 2.95 | 0.94 | 0.0216 |
GRU | 9 | 57.30 | 2.43 | 0.96 | 0.0164 |
GRU | 10 | N/A | 32.44 | N/A | 0.6615 |
GRU | 11 | N/A | 31.48 | N/A | 0.6381 |
GRU | 12 | 124.86 | 2.69 | 0.95 | 0.0210 |

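The per-bin R-squared values in the table above can be condensed into one mean and one worst case per architecture. The small sketch below does this with the values transcribed from the table, skipping the two GRU bins reported as N/A rather than treating them as zero; `R2_PER_BIN` and `summarize` are hypothetical names.

```python
# R-squared per bin (bins 1 to 12) transcribed from the table above; None marks N/A.
R2_PER_BIN = {
    "LSTM":   [0.96, 0.91, 0.94, 0.90, 0.88, 0.91, 0.92, 0.95, 0.98, 0.86, 0.82, 0.99],
    "BiLSTM": [0.95, 0.92, 0.97, 0.97, 0.95, 0.95, 0.95, 0.97, 0.99, 0.98, 0.98, 1.00],
    "GRU":    [0.96, 0.92, 0.92, 0.87, 0.92, 0.78, 0.58, 0.94, 0.96, None, None, 0.95],
}


def summarize(values):
    """Mean and worst R-squared over bins that have a value, plus the missing bins."""
    present = [(b, v) for b, v in enumerate(values, start=1) if v is not None]
    missing = [b for b, v in enumerate(values, start=1) if v is None]
    worst_bin, worst = min(present, key=lambda item: item[1])
    mean = sum(v for _, v in present) / len(present)
    return {"mean": mean, "worst_bin": worst_bin, "worst": worst, "missing": missing}


summary = {rnn: summarize(vals) for rnn, vals in R2_PER_BIN.items()}
```

Skipping the N/A bins makes the GRU mean (0.88 over 10 bins) look better than it would if those two failed bins were counted, so the `missing` list is kept next to the mean.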
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tamulionis, M.; Sledevič, T.; Serackis, A. Investigation of Machine Learning Model Flexibility for Automatic Application of Reverberation Effect on Audio Signal. Appl. Sci. 2023, 13, 5604. https://doi.org/10.3390/app13095604
Chicago/Turabian Style: Tamulionis, Mantas, Tomyslav Sledevič, and Artūras Serackis. 2023. "Investigation of Machine Learning Model Flexibility for Automatic Application of Reverberation Effect on Audio Signal" Applied Sciences 13, no. 9: 5604. https://doi.org/10.3390/app13095604