Computation of Psycho-Acoustic Annoyance Using Deep Neural Networks
Abstract
1. Introduction
2. Psycho-Acoustic Annoyance Model
2.1. Zwicker’s Psycho-Acoustic Annoyance Model
- Psycho-Acoustic Annoyance (PA) is a perceptual attribute that allows an objective quantification of noise annoyance from the physical characteristics of the signal, based on the mean values of L, S, R, and F (a common closed form is sketched after this list).
- Loudness (L) is a perceptual measure of the effect of the energy content of sound on the ear (intensity sensation), measured in Sones on a linear scale and standardized in ISO 532B. The computation of L is based on the Specific Loudness $N'(z)$, i.e., the L contribution of the z-th critical band, measured in Sone/Bark. The total L is the result of accumulating all contributions across the different bands, weighted by their specific bandwidth $\Delta z$.
- Sharpness (S) is a measure of the sensory unpleasantness of a sound caused by its high-frequency components. It is measured in acum on a linear scale.
- Roughness (R) describes the perception of sound fluctuation even when L or $L_{eq}$ (the equivalent continuous sound level) remains unchanged. It analyzes the effects of different degrees of frequency modulation (around 70 Hz) in each critical band. The basic unit of R is the Asper. In its computation, $g(z_i)$ is an arbitrary weighting function for each ERB, $m_i$ is the modulation depth of the i-th ERB, and $k_{i,j}$ is the cross-correlation between the envelopes of the ERBs with indexes $i$ and $j$.
- Fluctuation Strength (F) describes how strongly or weakly sounds fluctuate. It depends on the frequency and depth of the L fluctuations (around 4 Hz) in each ERB. It is measured in Vacil.
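For reference, the classical closed form of Zwicker's PA estimator (cf. Fastl & Zwicker) is sketched below in LaTeX. Note that it is usually written in terms of the percentile loudness $N_5$, whereas the model above is stated in terms of mean metric values, so the exact weights used here may differ:

```latex
% Total loudness as the integral of specific loudness over the critical-band rate:
N = \int_{0}^{24\,\mathrm{Bark}} N'(z)\,\mathrm{d}z
% Zwicker's psycho-acoustic annoyance estimator (Fastl & Zwicker):
PA = N_5 \left( 1 + \sqrt{w_S^2 + w_{FR}^2} \right)
w_S = \left( S - 1.75 \right) \cdot 0.25 \cdot \log_{10}\!\left( N_5 + 10 \right)
      \quad \text{for } S > 1.75
w_{FR} = \frac{2.18}{N_5^{0.4}} \left( 0.4\,F + 0.6\,R \right)
```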
2.2. Signal Processing and Computing Time
- PC-1: Intel(R) Core i7-7700 CPU @ 3.6 GHz (x64); 16 GB RAM; NVIDIA GTX 1060, 6 GB.
- PC-2: 2 × Intel(R) Xeon(R) Silver @ 2.2 GHz (x64); 96 GB RAM; NVIDIA GTX 1080, 8 GB.
- Raspberry Pi 3B: Broadcom BCM2837 SoC (CPU + GPU), Cortex-A53 (ARMv8) x64 @ 1.2 GHz; 1 GB RAM.
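As one hedged illustration of how the per-metric processing times reported in the timing table could be gathered (the paper does not publish its benchmarking code; `compute_loudness` and the other metric functions named below are hypothetical stand-ins for the actual L, S, R, and F implementations):

```python
import time
import numpy as np

def avg_time(metric_fn, frame, runs=100):
    """Average wall-clock time (s) of one psycho-acoustic metric over `runs` calls."""
    t0 = time.perf_counter()
    for _ in range(runs):
        metric_fn(frame)
    return (time.perf_counter() - t0) / runs

frame = np.random.randn(16000)  # one 16,000-sample audio frame, as used by the CNN
# compute_loudness, compute_sharpness, compute_roughness and compute_fluctuation
# are hypothetical names; substitute the actual metric implementations:
# for name, fn in [("L", compute_loudness), ("S", compute_sharpness),
#                  ("R", compute_roughness), ("F", compute_fluctuation)]:
#     print(f"{name}: {avg_time(fn, frame):.4f} s")
```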
3. Materials and Methods
3.1. Datasets
3.2. Regression CNN Design and Training
- Convolutional layer, which applies sliding convolutional filters to the input.
- Batch Normalization layer, which normalizes each input channel across a mini-batch to speed up the training process.
- Rectified Linear Unit (ReLU) layer, which sets any negative input value to zero.
- Max-pooling layer, which down-samples the input by dividing it into equally sized pooling regions and computing the maximum of each region (these building blocks are combined in the model sketch after this list).
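A minimal PyTorch sketch of a network following the architecture table later in the document is given below. The kernel sizes, filter counts, and strides follow that table; the padding values and the use of `nn.LazyLinear` are our assumptions, since the paper does not state how the layer lengths are kept consistent between stages:

```python
import torch
import torch.nn as nn

def conv_stage(in_ch, out_ch, kernel, stride, pad):
    # One stage from the architecture table: conv -> batch norm -> ReLU -> max pool
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, kernel_size=kernel, stride=stride, padding=pad),
        nn.BatchNorm1d(out_ch),
        nn.ReLU(),
        nn.MaxPool1d(kernel_size=2, stride=2),
    )

class PARegressionCNN(nn.Module):
    """Regression CNN mapping a 16,000-sample audio frame to [L, S, R, F, PA]."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # (in_ch, filters, kernel, stride) follow the table;
            # the padding values are assumptions, not taken from the paper
            conv_stage(1, 10, 512, 10, 251),   # S1
            conv_stage(10, 20, 256, 5, 126),   # S2
            conv_stage(20, 40, 128, 2, 63),    # S3
            conv_stage(40, 60, 64, 2, 31),     # S4
            conv_stage(60, 80, 32, 1, 16),     # S5
        )
        self.head = nn.Sequential(
            nn.Dropout(p=0.3),    # "Dropout 30%"
            nn.Flatten(),
            nn.LazyLinear(5),     # fully connected layer with 5 regression outputs
        )

    def forward(self, x):         # x: (batch, 1, 16000)
        return self.head(self.features(x))

model = PARegressionCNN()
print(model(torch.randn(8, 1, 16000)).shape)  # torch.Size([8, 5])
```

The regression output layer of the table corresponds here to training this model with a mean-squared-error loss rather than to a separate module.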
- Solver: SGDM
- Momentum: 0.9000
- InitialLearnRate: 1.0000 × 10
- L2Regularization: 1.0000 × 10
- GradientThresholdMethod: L2 Norm
- MaxEpochs: 70
- MiniBatchSize: 128
- ValidationData: 11,830 elements
- ValidationFrequency: 1107
- Shuffle: Every epoch
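A hedged PyTorch equivalent of these training options is sketched below. The exponents of the learning rate and L2 factor were lost in the list above, so the values used here are placeholders, and the gradient-clipping threshold is likewise an assumption:

```python
import torch
from torch.utils.data import DataLoader

LEARNING_RATE = 1e-2  # placeholder: the printed exponent is missing above
L2_REG = 1e-4         # placeholder: the printed exponent is missing above

def train(model, train_set, device="cpu"):
    # MiniBatchSize 128, shuffled every epoch
    loader = DataLoader(train_set, batch_size=128, shuffle=True)
    # SGDM with momentum 0.9; L2 regularization applied as weight decay
    opt = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE,
                          momentum=0.9, weight_decay=L2_REG)
    criterion = torch.nn.MSELoss()  # regression loss on the 5 target metrics
    model.to(device).train()
    for epoch in range(70):  # MaxEpochs 70
        for x, y in loader:
            opt.zero_grad()
            loss = criterion(model(x.to(device)), y.to(device))
            loss.backward()
            # clip gradients by their global L2 norm (threshold is an assumption)
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            opt.step()
```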
4. Evaluation and Results
4.1. Training Results and Accuracy Test
4.2. Direct Calculation vs. CNN Prediction in Terms of Computation Time
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Cobos, M.; Perez-Solano, J.; Felici-Castell, S.; Segura Garcia, J.; Navarro, J.M. Cumulative-Sum-Based Localization of Sound Events in Low-Cost Wireless Acoustic Sensor Networks. IEEE/ACM Trans. Audio Speech Lang. Process. 2014, 22, 1792–1802.
- Cobos, M.; Antonacci, F.; Alexandridis, A.; Mouchtaris, A.; Lee, B. A Survey of Sound Source Localization Methods in Wireless Acoustic Sensor Networks. Wirel. Commun. Mob. Comput. 2017, 2017, 3956282.
- Noriega-Linares, J.E.; Rodriguez-Mayol, A.; Cobos, M.; Segura Garcia, J.; Felici-Castell, S.; Navarro, J.M. A Wireless Acoustic Array System for Binaural Loudness Evaluation in Cities. IEEE Sens. J. 2017, 17, 7043–7052.
- Cobos, M.; Perez-Solano, J.J.; Berger, L.T. Acoustic-based technologies for ambient assisted living. In Introduction to Smart eHealth and eCare Technologies; Merilampi, S.; Sirkka, A., Eds.; Taylor & Francis Group: Boca Raton, FL, USA, 2016; Chapter 9; pp. 159–177.
- International Organization for Standardization (ISO). ISO 1996-1:2016, Acoustics—Description, Measurement and Assessment of Environmental Noise—Part 1: Basic Quantities and Assessment Procedures; Technical Report; ISO: Geneva, Switzerland, 2016.
- Li, B.; Tao, S.; Dawson, R. Evaluation and analysis of traffic noise from the main urban roads in Beijing. Appl. Acoust. 2002, 63, 1137–1142.
- NoiseMap Ltd. Noise Map, Environmental Noise Mapping Software. 2018. Available online: http://www.londonnoisemap.com (accessed on 19 June 2019).
- Segura-Garcia, J.; Felici-Castell, S.; Perez-Solano, J.J.; Cobos, M.; Navarro, J.M. Low-Cost Alternatives for Urban Noise Nuisance Monitoring Using Wireless Sensor Networks. IEEE Sens. J. 2015, 15, 836–844.
- Fastl, H.; Zwicker, E. Psychoacoustics: Facts and Models; Springer: Berlin/Heidelberg, Germany, 2006; pp. 327–329.
- Maffei, L.; Masullo, M.; Toma, R.A.; Ciaburro, G.; Firat, H.B. Awaking the awareness of the movida noise on residents: Measurements, experiments and modelling. In Proceedings of the 48th Inter-Noise, Madrid, Spain, 16–19 June 2019.
- International Organization for Standardization (ISO). ISO 12913-1:2014, Acoustics—Soundscape—Part 1: Definition and Conceptual Framework; Technical Report; ISO: Geneva, Switzerland, 2014.
- Zanella, A.; Bui, N.; Castellani, A.; Vangelista, L.; Zorzi, M. Internet of Things for Smart Cities. IEEE Internet Things J. 2014, 1, 22–32.
- Segura-Garcia, J.; Perez-Solano, J.J.; Cobos, M.; Navarro, E.; Felici-Castell, S.; Soriano, A.; Montes, F. Spatial Statistical Analysis of Urban Noise Data from a WASN Gathered by an IoT System: Application to a Small City. Appl. Sci. 2016, 6, 380.
- International Organization for Standardization (ISO). ISO/TS 12913-2:2018, Acoustics—Soundscape—Part 2: Data Collection and Reporting Requirements; Technical Report; ISO: Geneva, Switzerland, 2018.
- Salamon, J.; Jacoby, C.; Bello, J.P. A Dataset and Taxonomy for Urban Sound Research. In Proceedings of the 22nd ACM International Conference on Multimedia (MM ’14), Orlando, FL, USA, 3–7 November 2014; ACM: New York, NY, USA, 2014; pp. 1041–1044.
- Gelfand, S.A. Hearing: An Introduction to Psychological and Physiological Acoustics; CRC Press: Boca Raton, FL, USA, 2017.
- Terhardt, E.; Stoll, G.; Seewann, M. Algorithm for extraction of pitch and pitch salience from complex tonal signals. J. Acoust. Soc. Am. 1982, 71, 679–688.
- Lingsong, H.; Crocker, M.J.; Ran, Z. FFT-based complex critical band filter bank and time-varying loudness, fluctuation strength and roughness. In Proceedings of the International Congress on Sound and Vibration 2007 (ICSV14), Cairns, Australia, 9–12 July 2007.
- Pastor-Aparicio, A.; Lopez-Ballester, J.; Segura-Garcia, J.; Felici-Castell, S.; Cobos, M.; Fayos-Jordan, R.; Perez-Solano, J. Real time implementation for psycho-acoustic annoyance monitoring on wireless acoustic sensor networks. In Proceedings of the 48th Inter-Noise, Madrid, Spain, 16–19 June 2019.
- Belloch, J.A.; Badía, J.M.; Igual, F.D.; Cobos, M. Practical Considerations for Acoustic Source Localization in the IoT Era: Platforms, Energy Efficiency and Performance. IEEE Internet Things J. 2019, 6, 5068–5079.
- Aytar, Y.; Vondrick, C.; Torralba, A. SoundNet: Learning Sound Representations from Unlabeled Video. In Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS’16), Barcelona, Spain, 5–10 December 2016; Curran Associates Inc.: Red Hook, NY, USA, 2016; pp. 892–900.
- Giannakopoulos, T.; Perantonis, S. Recognition of Urban Sound Events using Deep Context-Aware Feature Extractors and Handcrafted Features. In Proceedings of the IFIP International Conference on Artificial Intelligence Applications and Innovations, Rhodes, Greece, 25–27 May 2018.
- Martin-Morato, I.; Mesaros, A.; Heittola, T.; Virtanen, T.; Cobos, M.; Ferri, F.J. Sound Event Envelope Estimation in Polyphonic Mixtures. In Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019), Brighton, UK, 12–17 May 2019; pp. 935–939.
- Grais, E.; Umut Sen, M.; Erdogan, H. Deep neural networks for single channel source separation. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014.
- Vera-Diaz, J.; Pizarro, D.; Macias-Guarasa, J. Towards End-to-End Acoustic Localization Using Deep Learning: From Audio Signals to Source Position Coordinates. Sensors 2018, 18, 3418.
| Platform | L | S | R | F | Total |
|---|---|---|---|---|---|
| Avg. t PC (s) | 0.0561 | 0.0001 | 0.5152 | 0.6429 | 1.2143 |
| RPi3B t (s) | 0.0610 | 0.0001 | 0.8520 | 0.7423 | 1.6554 |
| Layer | Size | Filters | Stride |
|---|---|---|---|
| Input | 16,000 × 1 | | |
| Convolutional S1 | 512 × 1 | 10 | 10 |
| Batch Norm. S1 | | | |
| ReLU S1 | | | |
| Max Pool S1 | 2 × 1 | | 2 |
| Convolutional S2 | 256 × 1 | 20 | 5 |
| Batch Norm. S2 | | | |
| ReLU S2 | | | |
| Max Pool S2 | 2 × 1 | | 2 |
| Convolutional S3 | 128 × 1 | 40 | 2 |
| Batch Norm. S3 | | | |
| ReLU S3 | | | |
| Max Pool S3 | 2 × 1 | | 2 |
| Convolutional S4 | 64 × 1 | 60 | 2 |
| Batch Norm. S4 | | | |
| ReLU S4 | | | |
| Max Pool S4 | 2 × 1 | | 2 |
| Convolutional S5 | 32 × 1 | 80 | 1 |
| Batch Norm. S5 | | | |
| ReLU S5 | | | |
| Max Pool S5 | 2 × 1 | | 2 |
| Dropout 30% | | | |
| Fully Connected | 1 × 5 | | |
| Regression Output | 1 × 5 | | |
| No. of Conv. Stages | PA Prediction RMSE |
|---|---|
| 2 stages | 12.981 |
| 3 stages | 4.512 |
| 4 stages | 3.039 |
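The RMSE figures above are presumably the standard root-mean-square error between the target and predicted PA values over the test set, e.g.:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between target and predicted PA values."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```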
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).