New Advances in Audio Signal Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Acoustics and Vibrations".

Deadline for manuscript submissions: closed (30 November 2023) | Viewed by 23058

Printed Edition Available!
A printed edition of this Special Issue is available here.

Special Issue Editors


Guest Editor
Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
Interests: machine learning; deep learning; speech processing; speech emotion recognition; automatic speech diagnosis systems

Guest Editor
Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
Interests: machine learning; speech processing; artificial intelligence; deep learning

Special Issue Information

Dear Colleagues,

Recent increases in computational power have heavily impacted the field of data analysis and processing, especially with the diffusion of artificial intelligence (AI) and deep learning. Audio signals have always carried a huge amount of information and have thus received a great boost in terms of processing and inference options. This is reflected in the fast improvements to classically difficult tasks such as speech recognition and sound classification, while opening up entirely new fields of machine learning-empowered audio analysis, such as affective computing.

Therefore, many of the new perspectives in audio signal processing involve the design and use of AI techniques that take audio as an input, as in classification and recognition tasks, or that aim to enhance audio signals, as in algorithmic denoising or equalization. More generally, however, automation is the key aspect of the current business and academic fields involving signals-as-data. Fast, automatic solutions for the parsing, segmentation, labelling, and processing of audio data therefore remain an active topic. Additionally, despite the data-driven nature of AI, domain-specific knowledge can enhance any solution, which makes signal processing instrumental in phases such as data augmentation or preparation.

Nevertheless, “classical” signal processing approaches based on new or partially unexplored domains or techniques are obviously also a valuable topic when taking into account the heavy use of acoustic features in many fields, including AI.

Prof. Dr. Giovanni Costantini
Dr. Daniele Casali
Guest Editors

(Dr. Valerio Cesarini will assist us in organizing this Special Issue)

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • audio signal processing
  • audio analysis
  • audio classification
  • speech recognition
  • machine learning
  • artificial intelligence

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (10 papers)

Research

17 pages, 1460 KiB  
Article
A Scalogram-Based CNN Approach for Audio Classification in Construction Sites
by Michele Scarpiniti, Raffaele Parisi and Yong-Cheol Lee
Appl. Sci. 2024, 14(1), 90; https://doi.org/10.3390/app14010090 - 21 Dec 2023
Cited by 2 | Viewed by 2154
Abstract
The automatic monitoring of activities in construction sites through the proper use of acoustic signals is a recent field of research that is currently in continuous evolution. In particular, the use of techniques based on Convolutional Neural Networks (CNNs) working on the spectrogram of the signal or its mel-scale variants was demonstrated to be quite successful. Nevertheless, the spectrogram has some limitations, which are due to the intrinsic trade-off between temporal and spectral resolutions. In order to overcome these limitations, in this paper, we propose employing the scalogram as a proper time–frequency representation of the audio signal. The scalogram is defined as the square modulus of the Continuous Wavelet Transform (CWT) and is known as a powerful tool for analyzing real-world signals. Experimental results, obtained on real-world sounds recorded in construction sites, have demonstrated the effectiveness of the proposed approach, which is able to clearly outperform most state-of-the-art solutions.
(This article belongs to the Special Issue New Advances in Audio Signal Processing)
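
The definition quoted in the abstract above (scalogram = squared modulus of the CWT) can be illustrated with a minimal sketch. It is not the authors' pipeline: the complex Morlet wavelet, the parameter `w0`, the truncation at four scales, and the function names are all assumptions chosen for illustration, and the direct summation is slow compared with a library implementation.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet wavelet (small admissibility correction term omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def scalogram(x, fs, freqs, w0=6.0):
    """Return |CWT|^2 as a list of rows, one row per analysis frequency."""
    dt = 1.0 / fs
    rows = []
    for f in freqs:
        scale = w0 / (2.0 * math.pi * f)          # scale whose centre frequency is f
        half = int(math.ceil(4.0 * scale * fs))   # truncate the wavelet at +/- 4 scales
        offsets = range(-half, half + 1)
        # Conjugated, scaled wavelet sampled on the signal grid.
        kernel = [morlet(k * dt / scale, w0).conjugate() / math.sqrt(scale)
                  for k in offsets]
        row = []
        for b in range(len(x)):
            acc = 0j
            for k, psi in zip(offsets, kernel):
                n = b + k
                if 0 <= n < len(x):               # zero padding at the borders
                    acc += x[n] * psi
            row.append(abs(acc * dt) ** 2)        # squared modulus of the CWT
        rows.append(row)
    return rows
```

For a pure tone, the row whose analysis frequency matches the tone carries most of the energy, which is the behaviour a CNN would see as a bright horizontal band in the scalogram image.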

18 pages, 5556 KiB  
Article
A Stable Sound Field Control Method for a Personal Audio System
by Song Wang and Cong Zhang
Appl. Sci. 2023, 13(22), 12209; https://doi.org/10.3390/app132212209 - 10 Nov 2023
Viewed by 1204
Abstract
A personal audio system has a wide application prospect in people’s lives, which can be implemented by sound field control technology. However, the current sound field control technology is mainly based on sound pressure or its improvement, ignoring another physical property of sound: particle velocity, which is not conducive to the stability of the entire reconstruction system. To address the problem, a sound field method is constructed in this paper, which minimizes the reconstruction error in the bright zone, minimizes the loudspeaker array effort in the reconstruction system, and at the same time controls the particle velocity and sound pressure of the dark zone. Five unevenly placed loudspeakers were used as the initial setup for the computer simulation experiment. Simulation results suggest that the proposed method is better than the PM (pressure matching) and EDPM (eigen decomposition pseudoinverse method) methods in the bright zone in an acoustic contrast index, the ACC (acoustic contrast control) method in a reconstruction error index, and the ACC, PM, and EDPM methods in the bright zone in a loudspeaker array effort index. The average array effort of the proposed method is the smallest, which is about 9.4790, 8.0712, and 4.8176 dB less than that of the ACC method, the PM method in the bright zone, and the EDPM method in the bright zone, respectively, so the proposed method can produce the most stable reconstruction system when the loudspeaker system is not evenly placed. The results of computer experiments demonstrate the performance of the proposed method, and suggest that compared with traditional methods, the proposed method can achieve more balanced results in the three indexes of acoustic contrast, reconstruction error, and loudspeaker array effort on the whole.
(This article belongs to the Special Issue New Advances in Audio Signal Processing)

26 pages, 7723 KiB  
Article
A Feasibility Study for a Hand-Held Acoustic Imaging Camera
by Danilo Greco
Appl. Sci. 2023, 13(19), 11110; https://doi.org/10.3390/app131911110 - 9 Oct 2023
Viewed by 1716
Abstract
Acoustic imaging systems construct spatial maps of sound sources and have potential in various applications, but large, cumbersome form factors limit their adoption. This paper investigates methodologies to miniaturize acoustic camera systems for improved mobility. Our approach optimizes planar microphone array design to achieve directional sensing capabilities on significantly reduced footprints compared to benchmarks. The current prototype utilizes a 128-microphone, 50 × 50 cm² array with beamforming algorithms to visualize acoustic fields in real time, but its stationary bulk hampers portability. We propose minimizing the physical aperture by carefully selecting microphone positions and quantities with tailored spatial filter synthesis. This irregular array geometry concentrates sensitivity toward target directions while avoiding aliasing artefacts. Simulations demonstrate a 32-element, ≈20 × 20 cm² array optimized this way can outperform the previous array in directivity and noise suppression in a sub-range of frequencies below 4 kHz, supporting a 4× surface factor reduction with acceptable trade-offs. Ongoing work involves building and testing miniature arrays to validate performance predictions and address hardware challenges. The improved mobility of compact acoustic cameras could expand applications in car monitoring, urban noise mapping and other industrial fields limited by current large systems.
(This article belongs to the Special Issue New Advances in Audio Signal Processing)

20 pages, 2713 KiB  
Article
OneBitPitch (OBP): Ultra-High-Speed Pitch Detection Algorithm Based on One-Bit Quantization and Modified Autocorrelation
by Davide Coccoluto, Valerio Cesarini and Giovanni Costantini
Appl. Sci. 2023, 13(14), 8191; https://doi.org/10.3390/app13148191 - 14 Jul 2023
Cited by 2 | Viewed by 3935
Abstract
This paper presents a novel, high-speed, and low-complexity algorithm for pitch (F0) detection, along with a new dataset for testing and a comparison of some of the most effective existing techniques. The algorithm, called OneBitPitch (OBP), is based on a modified autocorrelation function applied to a single-bit signal for fast computation. The focus is explicitly on speed for real-time pitch detection applications. A testing procedure is proposed using a proprietary synthetic dataset (SYNTHPITCH) against three of the most widely used algorithms: YIN, SWIPE (Sawtooth Inspired Pitch Estimator) and NLS (Nonlinear-Least Squares-based). The results show that OBP is 9 times faster than the fastest of its alternatives, and 50 times faster than a gold standard like SWIPE, with a mean elapsed time of 4.6 ms, or 0.046 × realtime. OBP is slightly less accurate for high-precision landmarks and noisy signals, but its performance in terms of acceptable error (<2%) is comparable to YIN and SWIPE. NLS emerges as the most accurate, but it is not flexible, being dependent on the input and requiring prior setup. OBP proves robust to octave errors while providing acceptable accuracies at ultra-high speeds, with a structure suited to FPGA (Field-Programmable Gate Array) implementations.
(This article belongs to the Special Issue New Advances in Audio Signal Processing)
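
To show why a single-bit signal makes autocorrelation cheap, here is a minimal sketch of a plain autocorrelation on a sign-quantized signal. It is not the OBP algorithm, whose modifications are not described in the abstract; the function names, the search range `fmin`/`fmax`, and the agreement-fraction score are assumptions made for illustration.

```python
import math


def one_bit(x):
    """Quantize to one bit per sample: 1 for non-negative, 0 for negative."""
    return [1 if v >= 0.0 else 0 for v in x]


def one_bit_autocorr(bits, lag):
    """Fraction of samples whose bit agrees with the bit `lag` samples later.

    With one-bit data the usual multiply-accumulate reduces to counting
    matching bits (an XNOR followed by a population count in hardware).
    """
    n = len(bits) - lag
    agree = sum(1 for i in range(n) if bits[i] == bits[i + lag])
    return agree / n


def estimate_f0(x, fs, fmin=50.0, fmax=500.0):
    """Pick the lag in [fs/fmax, fs/fmin] with the highest bit agreement."""
    bits = one_bit(x)
    lo = int(fs / fmax)
    hi = int(fs / fmin)
    # max() keeps the first (shortest) lag on ties, which favours the
    # fundamental period over its integer multiples.
    best_lag = max(range(lo, hi + 1), key=lambda lag: one_bit_autocorr(bits, lag))
    return fs / best_lag
```

Because the inner loop only compares bits, it maps naturally onto bitwise logic, which is consistent with the abstract's remark about FPGA suitability.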

12 pages, 1840 KiB  
Article
Building Ensemble of Resnet for Dolphin Whistle Detection
by Loris Nanni, Daniela Cuza and Sheryl Brahnam
Appl. Sci. 2023, 13(14), 8029; https://doi.org/10.3390/app13148029 - 10 Jul 2023
Cited by 3 | Viewed by 1476
Abstract
Ecoacoustics is arguably the best method for monitoring marine environments, but analyzing and interpreting acoustic data has traditionally demanded substantial human supervision and resources. These bottlenecks can be addressed by harnessing contemporary methods for automated audio signal analysis. This paper focuses on the problem of assessing dolphin whistles using state-of-the-art deep learning methods. Our system utilizes a fusion of various resnet50 networks integrated with data augmentation (DA) techniques applied not to the training data but to the test set. We also present training speeds and classification results when DA is applied to the training set. Through extensive experiments conducted on a publicly available benchmark, our findings demonstrate that our ensemble yields significant performance enhancements across several commonly used metrics. For example, our approach obtained an accuracy of 0.949 compared to 0.923, the best reported in the literature. We also provide training and testing sets that other researchers can use for comparison purposes, as well as all the MATLAB/PyTorch source code used in this study.
(This article belongs to the Special Issue New Advances in Audio Signal Processing)
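
The idea of applying augmentation to the test set and fusing several networks can be sketched as follows. The abstract does not state the fusion rule, so plain averaging of class scores is assumed here; `models` and `augmentations` are hypothetical callables standing in for trained networks and test-time transforms.

```python
def tta_predict(models, augmentations, sample):
    """Average class scores over every (model, augmentation) pair.

    `models` map an input to a list of class scores; `augmentations`
    map an input to a transformed input. Both are placeholders.
    """
    scores = []
    for model in models:
        for augment in augmentations:
            scores.append(model(augment(sample)))
    n_classes = len(scores[0])
    # Sum-rule fusion: mean score per class across all predictions.
    return [sum(s[c] for s in scores) / len(scores) for c in range(n_classes)]
```

The predicted class would then be the index of the largest averaged score.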

28 pages, 1814 KiB  
Article
Weakly Supervised U-Net with Limited Upsampling for Sound Event Detection
by Sangwon Lee, Hyemi Kim and Gil-Jin Jang
Appl. Sci. 2023, 13(11), 6822; https://doi.org/10.3390/app13116822 - 4 Jun 2023
Cited by 3 | Viewed by 1438
Abstract
Sound event detection (SED) is the task of finding the identities of sound events, as well as their onset and offset timings from audio recordings. When complete timing information is not available in the training data, but only the event identities are known, SED should be solved by weakly supervised learning. The conventional U-Net with global weighted rank pooling (GWRP) has shown a decent performance, but extensive computation is demanded. We propose a novel U-Net with limited upsampling (LUU-Net) and global threshold average pooling (GTAP) to reduce the model size, as well as the computational overhead. The expansion along the frequency axis in the U-Net decoder was minimized, so that the output map sizes were reduced by 40% at the convolutional layers and 12.5% at the fully connected layers without SED performance degradation. The experimental results on a mixed dataset of DCASE 2018 Tasks 1 and 2 showed that our limited upsampling U-Net (LUU-Net) with GTAP was about 23% faster in training and achieved 0.644 in audio tagging and 0.531 in weakly supervised SED tasks in terms of F1 scores, while U-Net with GWRP showed 0.629 and 0.492, respectively. The major contribution of the proposed LUU-Net is the reduction in the computation time with the SED performance being maintained or improved. The other proposed method, GTAP, further improved the training time reduction and provides versatility for various audio mixing conditions by adjusting a single hyperparameter.
(This article belongs to the Special Issue New Advances in Audio Signal Processing)
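
For readers unfamiliar with the baseline pooling named in the abstract, the sketch below shows global weighted rank pooling as it is commonly defined: the values of a class activation map are sorted in descending order and averaged with geometrically decaying weights. The decay parameter `d` and the function name are illustrative, and the authors' GTAP is not reproduced because the abstract does not define it.

```python
def gwrp(values, d):
    """Global weighted rank pooling over a flattened activation map.

    Values are ranked from largest to smallest and weighted by d**rank.
    d = 0 reduces to max pooling and d = 1 to average pooling.
    """
    ranked = sorted(values, reverse=True)
    weights = [d ** j for j in range(len(ranked))]
    return sum(w * v for w, v in zip(weights, ranked)) / sum(weights)
```

The sort over every time-frequency map is one plausible source of the computational cost that the abstract attributes to the GWRP baseline.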

27 pages, 8292 KiB  
Article
Vacuum Cleaner Noise Annoyance: An Investigation of Psychoacoustic Parameters, Effect of Test Methodology, and Interaction Effect between Loudness and Sharpness
by Serkan Atamer and Mehmet Ercan Altinsoy
Appl. Sci. 2023, 13(10), 6136; https://doi.org/10.3390/app13106136 - 17 May 2023
Viewed by 2921
Abstract
The first aim of this paper was to determine the variability in the signal characteristics and psychoacoustic data of canister-type vacuum cleaners. Fifteen vacuum cleaners with different sound power levels, provided by the manufacturers, were selected as test units to calculate their acoustic and psychoacoustic parameters. The selection of the devices was based on an even distribution of the reported sound power levels. The investigated variability in the acoustic and psychoacoustic parameters on different vacuum cleaners was discussed to derive the common characteristics of canister-type vacuum cleaner noise. The derived common characteristics were compared with those in the available literature on the noise generation mechanisms of vacuum cleaners. Based on these characteristics, prototypical vacuum cleaner noise was defined. The second aim of this paper was to understand the annoyance perception of vacuum cleaner noise. Annoyance assessments were obtained from two sets of listening experiments. The first listening experiment was conducted to find the correlates of annoyance evaluations. Loudness, sharpness and tonal components at lower and higher frequencies were found to be dominant correlates of vacuum cleaner noise annoyance estimations. In the second listening experiment, a possible interaction between loudness and sharpness was investigated in different listening test methods. The selected loudness and sharpness values for this experiment were consistent with the observed ranges in the first part. No significant interaction between loudness and sharpness was observed, although each separately correlated significantly positively with annoyance.
(This article belongs to the Special Issue New Advances in Audio Signal Processing)

12 pages, 1419 KiB  
Article
Investigation of Machine Learning Model Flexibility for Automatic Application of Reverberation Effect on Audio Signal
by Mantas Tamulionis, Tomyslav Sledevič and Artūras Serackis
Appl. Sci. 2023, 13(9), 5604; https://doi.org/10.3390/app13095604 - 1 May 2023
Viewed by 1405
Abstract
This paper discusses an algorithm that attempts to automatically calculate the effect of room reverberation by training a mathematical model based on a recurrent neural network on anechoic and reverberant sound samples. Modelling the room impulse response (RIR) recorded at a 44.1 kHz sampling rate using a system identification-based approach in the time domain, even with deep learning models, is prohibitively complex and it is almost impossible to automatically learn the parameters of the model for a reverberation time longer than 1 s. Therefore, this paper presents a method to model a reverberated audio signal in the frequency domain. To reduce complexity, the spectrum is analyzed on a logarithmic scale, based on the subjective characteristics of human hearing, by calculating 10 octaves in the range 20–20,000 Hz and dividing each octave by 1/3 or 1/12 of the bandwidth. This maintains equal resolution at high, mid, and low frequencies. The study examines three different recurrent network structures: LSTM, BiLSTM, and GRU, comparing the different sizes of the two hidden layers. The experimental study was carried out to compare the modelling when each octave of the spectrum is divided into a different number of bands, as well as to assess the feasibility of using a single model to predict the spectrum of a reverberated audio in adjacent frequency bands. The paper also presents and describes in detail a new RIR dataset that, although synthetic, is calibrated with recorded impulses.
(This article belongs to the Special Issue New Advances in Audio Signal Processing)
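
The logarithmic band layout described in the abstract can be sketched as follows, reading "1/3 or 1/12" as one-third-octave and one-twelfth-octave bands; that reading, the 20 Hz starting edge, and the function name are assumptions. Note that ten full octaves above 20 Hz end at 20,480 Hz, slightly above the 20,000 Hz quoted in the abstract.

```python
def octave_band_edges(f_low=20.0, octaves=10, fraction=3):
    """Edges of fractional-octave bands on a logarithmic frequency axis.

    Each octave is split into `fraction` bands, so successive edges
    differ by a constant ratio of 2 ** (1 / fraction).
    """
    n_bands = octaves * fraction
    return [f_low * 2.0 ** (k / fraction) for k in range(n_bands + 1)]
```

With `fraction=3` this gives 30 bands and with `fraction=12` it gives 120, so the network input size per frame differs by a factor of four between the two settings compared in the study.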

13 pages, 2301 KiB  
Article
COVID-19 Detection Model with Acoustic Features from Cough Sound and Its Application
by Sera Kim, Ji-Young Baek and Seok-Pil Lee
Appl. Sci. 2023, 13(4), 2378; https://doi.org/10.3390/app13042378 - 13 Feb 2023
Cited by 7 | Viewed by 2316
Abstract
Contrary to expectations that the coronavirus pandemic would terminate quickly, the number of people infected with the virus did not decrease worldwide and coronavirus-related deaths continue to occur every day. The standard COVID-19 diagnostic test technique used today, PCR testing, requires professional staff and equipment, which is expensive and takes a long time to produce test results. In this paper, we propose a feature set consisting of four features: MFCC, Δ²-MFCC, Δ-MFCC, and spectral contrast as a feature set optimized for the diagnosis of COVID-19, and apply it to a model that combines ResNet-50 and DNN. Crowdsourcing datasets from Cambridge, Coswara, and COUGHVID are used as the cough sound data for our study. Through direct listening and inspection of the dataset, audio recordings that contained only cough sounds were collected and used for training. The model was trained and tested using cough sound features extracted from crowdsourced cough data and had a sensitivity and specificity of 0.95 and 0.96, respectively.
(This article belongs to the Special Issue New Advances in Audio Signal Processing)
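
The two metrics reported at the end of the abstract follow the standard definitions: sensitivity is the fraction of positive cases detected, and specificity is the fraction of negative cases correctly rejected. A minimal sketch, with labels encoded as 1 (positive) and 0 (negative) by assumption:

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    return tp / (tp + fn), tn / (tn + fp)
```

Reporting both values matters for a screening application, since a model can reach high accuracy on an imbalanced dataset while missing most positive cases.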

21 pages, 6990 KiB  
Article
Experimental Assessment of the Acoustic Performance of Nozzles Designed for Clean Agent Fire Suppression
by Marco Strianese, Nicolò Torricelli, Luca Tarozzi and Paolo E. Santangelo
Appl. Sci. 2023, 13(1), 186; https://doi.org/10.3390/app13010186 - 23 Dec 2022
Cited by 2 | Viewed by 2006
Abstract
Discharge through nozzles used in gas-based fire protection of data centers may generate noise that causes the performance of hard drives to decay considerably; silent nozzles are employed to limit this harmful effect. This work focuses on proposing an experimental methodology to assess the impact of sound emitted by gaseous jets by comparing various nozzles under several operating conditions, together with relating that impact to design parameters. A setup was developed and repeatability of the experiments was evaluated; standard and silent nozzles were tested regarding the discharge of inert gases and halocarbon compounds. The ability of silent nozzles to contain the emitted noise (generally below the 110 dB reference threshold) was proven effective; a relationship between Reynolds number and peak noise level is suggested to support the reported increase in noise maxima as released flow rate increases. Hard drives with lower speed were the most affected. Spectral analysis was conducted, with sound at the higher frequency range causing performance decay even if lower than the acknowledged threshold. Independence of emitted noise from the selected clean agent was also observed in terms of released volumetric flow rate, yet the denser the fluid, the lower the generated noise under the same released mass flow rate.
(This article belongs to the Special Issue New Advances in Audio Signal Processing)