Audio Signal Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Acoustics and Vibrations".

Deadline for manuscript submissions: closed (15 March 2016) | Viewed by 224592

Printed Edition Available!
A printed edition of this Special Issue is available.

Special Issue Editor


Prof. Dr. Vesa Välimäki
Guest Editor
Department of Signal Processing and Acoustics, School of Electrical Engineering, Aalto University, P.O. Box 13000, FI-00076 Aalto, Espoo, Finland
Interests: acoustic signal processing; audio signal processing; audio systems; music technology

Special Issue Information

Dear Colleagues,

Audio signal processing is a highly active research field where digital signal processing theory meets human sound perception and real-time programming requirements. It has a wide range of applications in computers, gaming, and music technology, to name a few of the largest areas. Successful applications include, for example, perceptual audio coding, digital music synthesizers, and music recognition software. The fact that music is now often listened to through headphones on a mobile device leads to new problems related to background noise control and signal enhancement. Developments in processor technology, such as parallel computing, are changing the way signal-processing algorithms are designed for audio.

In this Special Issue we want to address recent advances in the following topics:

- Audio signal analysis
- Music information retrieval
- Enhancement and restoration of audio
- Audio equalization and filtering
- Audio effects processing
- Sound synthesis and modeling
- Audio coding
- Sound capture and noise control
- Sound source separation
- Room acoustics and spatial audio
- Signal processing for headphones and loudspeakers
- High-performance computing in audio

Submissions are invited for both original research and review articles. Additionally, invited papers based on excellent contributions to recent conferences in this field will be included in this Special Issue. It is hoped that this collection of high-quality works in audio signal processing will serve as an inspiration for future research in this field.

Prof. Dr. Vesa Välimäki
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • audio signal analysis
  • music information retrieval
  • enhancement and restoration of audio
  • audio equalization and filtering
  • audio effects processing
  • sound synthesis and modeling
  • audio coding
  • sound capture and noise control
  • sound source separation
  • room acoustics and spatial audio
  • signal processing for headphones and loudspeakers
  • high-performance computing in audio

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (20 papers)


Research


Article
Sinusoidal Parameter Estimation Using Quadratic Interpolation around Power-Scaled Magnitude Spectrum Peaks
by Kurt James Werner and François Georges Germain
Appl. Sci. 2016, 6(10), 306; https://doi.org/10.3390/app6100306 - 21 Oct 2016
Cited by 11 | Viewed by 7687
Abstract
The magnitude of the Discrete Fourier Transform (DFT) of a discrete-time signal has a limited frequency definition. Quadratic interpolation over the three DFT samples surrounding magnitude peaks improves the estimation of parameters (frequency and amplitude) of resolved sinusoids beyond that limit. Interpolating on a rescaled magnitude spectrum using a logarithmic scale has been shown to improve those estimates. In this article, we show how to heuristically tune a power scaling parameter to outperform linear and logarithmic scaling at an equivalent computational cost. Although this power scaling factor is computed heuristically rather than analytically, it is shown to depend in a structured way on window parameters. Invariance properties of this family of estimators are studied and the existence of a bias due to noise is shown. Compared with two state-of-the-art estimators, an optimized power scaling yields a lower systematic bias and a lower mean-squared error in noisy conditions for ten out of twelve common windowing functions. Full article
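
To illustrate the underlying technique, here is a minimal NumPy sketch of quadratic (QIFFT-style) interpolation over a power-scaled magnitude peak; the function name, the Hann window and the exponent value p are illustrative placeholders, not the paper's tuned estimator.

```python
import numpy as np

def qi_peak_estimate(signal, fs, p=0.23):
    """Refine the frequency/amplitude of the strongest sinusoid by fitting a
    parabola through three power-scaled DFT magnitude samples around the peak."""
    n = len(signal)
    w = np.hanning(n)
    mag = np.abs(np.fft.rfft(signal * w))
    k = int(np.argmax(mag[1:-1])) + 1                # coarse peak bin (avoid edges)
    a, b, c = mag[k - 1] ** p, mag[k] ** p, mag[k + 1] ** p
    d = 0.5 * (a - c) / (a - 2 * b + c)              # fractional bin offset
    peak_scaled = b - 0.25 * (a - c) * d             # interpolated scaled magnitude
    freq = (k + d) * fs / n                          # refined frequency in Hz
    amp = peak_scaled ** (1.0 / p) / (w.sum() / 2)   # undo scaling and window gain
    return freq, amp
```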

Article
Passive Guaranteed Simulation of Analog Audio Circuits: A Port-Hamiltonian Approach
by Antoine Falaize and Thomas Hélie
Appl. Sci. 2016, 6(10), 273; https://doi.org/10.3390/app6100273 - 24 Sep 2016
Cited by 57 | Viewed by 7648
Abstract
We present a method that generates passive-guaranteed stable simulations of analog audio circuits from electronic schematics for real-time use. On one hand, this method is based on a continuous-time power-balanced state-space representation structured into its energy-storing parts, dissipative parts, and external sources. On the other hand, a numerical scheme is especially designed to preserve this structure and the power balance. These state-space structures define the class of port-Hamiltonian systems. The derivation of this structured system associated with the electronic circuit is achieved by an automated analysis of the interconnection network combined with a dictionary of models for each elementary component. The numerical scheme is based on the combination of finite differences applied on the state (with respect to the time variable) and on the total energy (with respect to the state). This combination provides a discrete-time version of the power balance. This set of algorithms is valid in both the linear and nonlinear cases. Finally, three applications of increasing complexity are given: a diode clipper, a common-emitter bipolar-junction transistor amplifier, and a wah pedal. The results are compared to offline simulations obtained from a popular circuit simulator. Full article
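
As a toy illustration of the structure-preserving idea (not the paper's automated netlist-to-model pipeline), the sketch below simulates a hand-written series RLC circuit in port-Hamiltonian form with a midpoint discretization; since the Hamiltonian is quadratic, the discrete gradient reduces to the midpoint gradient and the stored energy cannot increase. All component values are arbitrary.

```python
import numpy as np

# dx/dt = (J - R) * grad H(x), with x = [inductor flux, capacitor charge]
# and H(x) = x1^2 / (2L) + x2^2 / (2C), i.e. grad H(x) = Q x.
L_, C_, Rs = 1e-3, 1e-6, 5.0                 # inductance, capacitance, series resistance
Q = np.diag([1.0 / L_, 1.0 / C_])
J = np.array([[0.0, -1.0], [1.0, 0.0]])      # lossless interconnection
R = np.diag([Rs, 0.0])                       # dissipation
fs = 48000.0
h = 1.0 / fs
A = (J - R) @ Q
M_impl = np.eye(2) - 0.5 * h * A             # midpoint rule: implicit part
M_expl = np.eye(2) + 0.5 * h * A             # midpoint rule: explicit part

x = np.array([1e-4, 0.0])                    # initial state
energy = []
for _ in range(1000):
    x = np.linalg.solve(M_impl, M_expl @ x)  # one power-balanced step
    energy.append(0.5 * x @ Q @ x)           # never increases while Rs >= 0
```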

Article
Adaptive Wavelet Threshold Denoising Method for Machinery Sound Based on Improved Fruit Fly Optimization Algorithm
by Jing Xu, Zhongbin Wang, Chao Tan, Lei Si, Lin Zhang and Xinhua Liu
Appl. Sci. 2016, 6(7), 199; https://doi.org/10.3390/app6070199 - 6 Jul 2016
Cited by 35 | Viewed by 7947
Abstract
As the sound signal of a machine contains abundant information and is easy to measure, acoustic-based monitoring or diagnosis systems exhibit obvious superiority, especially in some extreme conditions. However, the sound collected directly in an industrial field is always polluted by noise. In order to eliminate noise components from machinery sound, a wavelet threshold denoising method optimized by an improved fruit fly optimization algorithm (WTD-IFOA) is proposed in this paper. The sound is first decomposed by the wavelet transform (WT) to obtain the coefficients of each level. As the wavelet threshold functions proposed by Donoho are discontinuous, many modified functions with continuous first- and second-order derivatives have been presented to realize adaptive denoising. However, the function-based denoising process is time-consuming, and it is difficult to find optimal thresholds. To overcome these problems, the fruit fly optimization algorithm (FOA) was introduced into the process. Moreover, to avoid falling into local extremes, an improved fly distance range obeying a normal distribution was proposed on the basis of the original FOA. Then, the sound signal of a motor was recorded in a soundproof laboratory, and Gaussian white noise was added to the signal. The simulation results illustrated the effectiveness and superiority of the proposed approach by a comprehensive comparison among five typical methods. Finally, an industrial application on a shearer at a coal mining working face was performed to demonstrate the practical effect. Full article
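
For reference, a plain wavelet-threshold denoiser (soft thresholding with the universal threshold) looks roughly like the sketch below; the paper instead searches the thresholds with the improved fruit fly optimization algorithm. The PyWavelets package, the wavelet choice and the scale factor k are assumptions of this sketch.

```python
import numpy as np
import pywt

def wavelet_denoise(x, wavelet="db8", level=4, k=1.0):
    """Decompose, soft-threshold the detail coefficients, reconstruct."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745      # noise estimate from finest details
    thr = k * sigma * np.sqrt(2.0 * np.log(len(x)))     # universal threshold
    kept = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(kept, wavelet)[: len(x)]
```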

Article
Eluding the Physical Constraints in a Nonlinear Interaction Sound Synthesis Model for Gesture Guidance
by Etienne Thoret, Mitsuko Aramaki, Charles Gondre, Sølvi Ystad and Richard Kronland-Martinet
Appl. Sci. 2016, 6(7), 192; https://doi.org/10.3390/app6070192 - 30 Jun 2016
Cited by 5 | Viewed by 5329
Abstract
In this paper, a flexible control strategy for a synthesis model dedicated to nonlinear friction phenomena is proposed. This model makes it possible to synthesize different types of sound sources, such as creaky doors, singing glasses, squeaking wet plates or bowed strings. Based on the perceptual stance that a sound is perceived as the result of an action on an object, we propose a genuine source/filter synthesis approach that eludes the physical constraints induced by the coupling between the interacting objects. This approach makes it possible to independently control and freely combine the action and the object. Different implementations and applications related to computer animation, gesture learning for rehabilitation and expert gestures are presented at the end of this paper. Full article

Article
Modal Processor Effects Inspired by Hammond Tonewheel Organs
by Kurt James Werner and Jonathan S. Abel
Appl. Sci. 2016, 6(7), 185; https://doi.org/10.3390/app6070185 - 28 Jun 2016
Cited by 5 | Viewed by 8589
Abstract
In this design study, we introduce a novel class of digital audio effects that extend the recently introduced modal processor approach to artificial reverberation and effects processing. These pitch and distortion processing effects mimic the design and sonics of a classic additive-synthesis-based electromechanical musical instrument, the Hammond tonewheel organ. As a reverb effect, the modal processor simulates a room response as the sum of resonant filter responses. This architecture provides precise, interactive control over the frequency, damping, and complex amplitude of each mode. Into this framework, we introduce two types of processing effects: pitch effects inspired by the Hammond organ’s equal tempered “tonewheels”, “drawbar” tone controls, vibrato/chorus circuit, and distortion effects inspired by the pseudo-sinusoidal shape of its tonewheels and electromagnetic pickup distortion. The result is an effects processor that imprints the Hammond organ’s sonics onto any audio input. Full article
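
The generic modal architecture the effects build on can be sketched as a parallel bank of second-order resonators, each with its own frequency, decay time and gain (the Hammond-inspired pitch and distortion stages are not reproduced here); function and parameter names are illustrative.

```python
import numpy as np
from scipy.signal import lfilter

def modal_filterbank(x, fs, freqs, t60s, gains):
    """Sum the outputs of one two-pole resonator per mode."""
    y = np.zeros(len(x))
    for f, t60, g in zip(freqs, t60s, gains):
        r = 10.0 ** (-3.0 / (t60 * fs))            # pole radius from the mode's decay time
        theta = 2.0 * np.pi * f / fs               # pole angle from the mode's frequency
        b = [g * (1.0 - r)]                        # rough gain normalisation
        a = [1.0, -2.0 * r * np.cos(theta), r * r]
        y += lfilter(b, a, x)
    return y
```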

Article
Metrics for Polyphonic Sound Event Detection
by Annamaria Mesaros, Toni Heittola and Tuomas Virtanen
Appl. Sci. 2016, 6(6), 162; https://doi.org/10.3390/app6060162 - 25 May 2016
Cited by 412 | Viewed by 22437
Abstract
This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources active simultaneously. The system output in this case contains overlapping events, marked as multiple sounds detected as being active at the same time. The polyphonic system output requires a suitable procedure for evaluation against a reference. Metrics from neighboring fields such as speech recognition and speaker diarization can be used, but they need to be partially redefined to deal with the overlapping events. We present a review of the most common metrics in the field and the way they are adapted and interpreted in the polyphonic case. We discuss segment-based and event-based definitions of each metric and explain the consequences of instance-based and class-based averaging using a case study. In parallel, we provide a toolbox containing implementations of presented metrics. Full article
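
A simplified segment-based precision/recall/F-score, one of the metric families the paper reviews, can be computed directly from binary activity matrices as below (reference and estimate of shape segments × classes); the function name and matrix layout are assumptions of this sketch, and the paper's accompanying toolbox implements the full set of metrics.

```python
import numpy as np

def segment_based_scores(ref, est):
    """Micro-averaged precision, recall and F-score over all segments and classes."""
    tp = np.logical_and(ref == 1, est == 1).sum()
    fp = np.logical_and(ref == 0, est == 1).sum()
    fn = np.logical_and(ref == 1, est == 0).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score
```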

Article
Chord Recognition Based on Temporal Correlation Support Vector Machine
by Zhongyang Rao, Xin Guan and Jianfu Teng
Appl. Sci. 2016, 6(5), 157; https://doi.org/10.3390/app6050157 - 19 May 2016
Cited by 8 | Viewed by 7862
Abstract
In this paper, we propose a method called temporal correlation support vector machine (TCSVM) for automatic major-minor chord recognition in audio music. We first use robust principal component analysis to separate the singing voice from the music to reduce the influence of the singing voice and consider the temporal correlations of the chord features. Using robust principal component analysis, we expect the low-rank component of the spectrogram matrix to contain the musical accompaniment and the sparse component to contain the vocal signals. Then, we extract a new logarithmic pitch class profile (LPCP) feature called enhanced LPCP from the low-rank part. To exploit the temporal correlation among the LPCP features of chords, we propose an improved support vector machine algorithm called TCSVM. We perform this study using the MIREX’09 (Music Information Retrieval Evaluation eXchange) Audio Chord Estimation dataset. Furthermore, we conduct comprehensive experiments using different pitch class profile feature vectors to examine the performance of TCSVM. The results of our method are comparable to the state-of-the-art methods that entered the MIREX in 2013 and 2014 for the MIREX’09 Audio Chord Estimation task dataset. Full article
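
As background, a basic (non-enhanced) pitch class profile can be computed by folding short-time spectral energy onto the 12 pitch classes, as sketched below; the paper's enhanced LPCP additionally removes the vocal component with robust PCA before folding, and the frequency range and reference frequency here are arbitrary choices.

```python
import numpy as np

def pitch_class_profile(frame, fs, f_ref=440.0):
    """Fold rFFT bin energies of one frame onto 12 pitch classes (relative to A)."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    pcp = np.zeros(12)
    for f, e in zip(freqs[1:], spec[1:]):                 # skip the DC bin
        if 55.0 <= f <= 5000.0:
            pcp[int(round(12 * np.log2(f / f_ref))) % 12] += e
    return pcp / (pcp.sum() + 1e-12)                      # normalise to unit sum
```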

Article
Two-Polarisation Physical Model of Bowed Strings with Nonlinear Contact and Friction Forces, and Application to Gesture-Based Sound Synthesis
by Charlotte Desvages and Stefan Bilbao
Appl. Sci. 2016, 6(5), 135; https://doi.org/10.3390/app6050135 - 10 May 2016
Cited by 20 | Viewed by 10445
Abstract
Recent bowed string sound synthesis has relied on physical modelling techniques; the achievable realism and flexibility of gestural control are appealing, and the heavier computational cost becomes less significant as technology improves. A bowed string sound synthesis algorithm is designed, by simulating two-polarisation string motion, discretising the partial differential equations governing the string’s behaviour with the finite difference method. A globally energy balanced scheme is used, as a guarantee of numerical stability under highly nonlinear conditions. In one polarisation, a nonlinear contact model is used for the normal forces exerted by the dynamic bow hair, left hand fingers, and fingerboard. In the other polarisation, a force-velocity friction curve is used for the resulting tangential forces. The scheme update requires the solution of two nonlinear vector equations. The dynamic input parameters allow for simulating a wide range of gestures; some typical bow and left hand gestures are presented, along with synthetic sound and video demonstrations. Full article
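
The finite-difference backbone of such a model is the familiar explicit update for the 1D wave equation, shown below for an ideal string with fixed ends; the paper's scheme adds stiffness, loss, two polarisations, the nonlinear bow, finger and fingerboard forces, and an energy-balanced treatment on top of this skeleton. The numbers here are placeholders.

```python
import numpy as np

fs = 44100.0
c, length = 200.0, 0.65                    # transverse wave speed (m/s), string length (m)
k = 1.0 / fs                               # time step
h = c * k                                  # grid spacing at the stability limit
N = int(length / h)                        # number of grid intervals
lam2 = (c * k / h) ** 2                    # squared Courant number (here 1)

u = np.zeros(N + 1)
u_prev = np.zeros(N + 1)
u[N // 3] = 1e-3                           # crude initial displacement
out = []
for _ in range(2000):
    u_next = np.zeros(N + 1)               # fixed ends stay at zero
    u_next[1:-1] = (2 * u[1:-1] - u_prev[1:-1]
                    + lam2 * (u[2:] - 2 * u[1:-1] + u[:-2]))
    u_prev, u = u, u_next
    out.append(u[N // 4])                  # displacement read at an output point
```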

Article
Dynamical Systems for Audio Synthesis: Embracing Nonlinearities and Delay-Free Loops
by David Medine
Appl. Sci. 2016, 6(5), 134; https://doi.org/10.3390/app6050134 - 10 May 2016
Cited by 7 | Viewed by 5677
Abstract
Many systems featuring nonlinearities and delay-free loops are of interest in digital audio, particularly in virtual analog and physical modeling applications. Many of these systems can be posed as systems of implicitly related ordinary differential equations. Provided each equation in the network is itself an explicit one, straightforward numerical solvers may be employed to compute the output of such systems without resorting to linearization or matrix inversions for every parameter change. This is a cheap and effective means for synthesizing delay-free, nonlinear systems without resorting to large lookup tables, iterative methods, or the insertion of fictitious delay, and is therefore suitable for real-time applications. Several examples are shown to illustrate the efficacy of this approach. Full article
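
A generic example of this style of synthesis (not one of the paper's specific systems) is to step a nonlinear oscillator ODE directly at the audio rate with a simple explicit solver, as in the sketch below using a Van der Pol oscillator; all names and constants are illustrative.

```python
import numpy as np

def vdp_tone(mu=5.0, f0=110.0, fs=48000, n=48000):
    """Audio-rate semi-implicit Euler integration of a Van der Pol oscillator."""
    w0 = 2.0 * np.pi * f0
    dt = 1.0 / fs
    x, v = 1e-3, 0.0
    out = np.empty(n)
    for i in range(n):
        accel = mu * w0 * (1.0 - x * x) * v - w0 * w0 * x   # nonlinear damping + restoring force
        v += dt * accel
        x += dt * v                                          # position update with the new velocity
        out[i] = x
    return out / np.max(np.abs(out))
```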

Article
Psychoacoustic Approaches for Harmonic Music Mixing
by Roman B. Gebhardt, Matthew E. P. Davies and Bernhard U. Seeber
Appl. Sci. 2016, 6(5), 123; https://doi.org/10.3390/app6050123 - 3 May 2016
Cited by 10 | Viewed by 7595
Abstract
The practice of harmonic mixing is a technique used by DJs for the beat-synchronous and harmonic alignment of two or more pieces of music. In this paper, we present a new harmonic mixing method based on psychoacoustic principles. Unlike existing commercial DJ-mixing software, which determines compatible matches between songs via key estimation and harmonic relationships in the circle of fifths, our approach is built around the measurement of musical consonance. Given two tracks, we first extract a set of partials using a sinusoidal model and average this information over sixteenth note temporal frames. By scaling the partials of one track over ±6 semitones (in 1/8th semitone steps), we determine the pitch-shift that maximizes the consonance of the resulting mix. For this, we measure the consonance between all combinations of dyads within each frame according to psychoacoustic models of roughness and pitch commonality. To evaluate our method, we conducted a listening test where short musical excerpts were mixed together under different pitch shifts and rated according to consonance and pleasantness. Results demonstrate that sensory roughness computed from a small number of partials in each of the musical audio signals constitutes a reliable indicator to yield maximum perceptual consonance and pleasantness ratings by musically-trained listeners. Full article
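
The core search can be sketched as follows: estimate partials for both tracks, then scan pitch shifts of one track over ±6 semitones in 1/8th-semitone steps and keep the shift that minimises summed pairwise roughness. The roughness model below is a commonly quoted Sethares-style fit to the Plomp-Levelt curve with approximate constants, not necessarily the exact roughness or pitch-commonality model used in the paper.

```python
import numpy as np

def pair_roughness(f1, a1, f2, a2, b1=3.5, b2=5.75, s1=0.0207, s2=18.96):
    """Sensory roughness contributed by one pair of partials (approximate constants)."""
    fmin, fmax = min(f1, f2), max(f1, f2)
    s = 0.24 / (s1 * fmin + s2)
    x = s * (fmax - fmin)
    return min(a1, a2) * (np.exp(-b1 * x) - np.exp(-b2 * x))

def best_pitch_shift(partials_a, partials_b):
    """Return the semitone shift of track B that minimises roughness against track A."""
    shifts = np.arange(-6.0, 6.0 + 0.125, 0.125)
    totals = []
    for st in shifts:
        ratio = 2.0 ** (st / 12.0)
        totals.append(sum(pair_roughness(fa, aa, fb * ratio, ab)
                          for fa, aa in partials_a for fb, ab in partials_b))
    return shifts[int(np.argmin(totals))]
```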

Article
Full-Band Quasi-Harmonic Analysis and Synthesis of Musical Instrument Sounds with Adaptive Sinusoids
by Marcelo Caetano, George P. Kafentzis, Athanasios Mouchtaris and Yannis Stylianou
Appl. Sci. 2016, 6(5), 127; https://doi.org/10.3390/app6050127 - 2 May 2016
Cited by 8 | Viewed by 7500
Abstract
Sinusoids are widely used to represent the oscillatory modes of musical instrument sounds in both analysis and synthesis. However, musical instrument sounds feature transients and instrumental noise that are poorly modeled with quasi-stationary sinusoids, requiring spectral decomposition and further dedicated modeling. In this work, we propose a full-band representation that fits sinusoids across the entire spectrum. We use the extended adaptive Quasi-Harmonic Model (eaQHM) to iteratively estimate amplitude- and frequency-modulated (AM–FM) sinusoids able to capture challenging features such as sharp attacks, transients, and instrumental noise. We use the signal-to-reconstruction-error ratio (SRER) as the objective measure for the analysis and synthesis of 89 musical instrument sounds from different instrumental families. We compare against quasi-stationary sinusoids and exponentially damped sinusoids. First, we show that the SRER increases with adaptation in eaQHM. Then, we show that full-band modeling with eaQHM captures partials at the higher frequency end of the spectrum that are neglected by spectral decomposition. Finally, we demonstrate that a frame size equal to three periods of the fundamental frequency results in the highest SRER with AM–FM sinusoids from eaQHM. A listening test confirmed that the musical instrument sounds resynthesized from full-band analysis with eaQHM are virtually perceptually indistinguishable from the original recordings. Full article
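
The objective measure used throughout the paper, the signal-to-reconstruction-error ratio, is straightforward to compute; one common definition (signal energy over residual energy, in dB) is sketched below.

```python
import numpy as np

def srer_db(original, resynthesis):
    """Signal-to-reconstruction-error ratio in dB for equal-length signals."""
    residual = original - resynthesis
    return 20.0 * np.log10(np.linalg.norm(original) / np.linalg.norm(residual))
```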

Article
Augmenting Environmental Interaction in Audio Feedback Systems
by Seunghun Kim, Graham Wakefield and Juhan Nam
Appl. Sci. 2016, 6(5), 125; https://doi.org/10.3390/app6050125 - 28 Apr 2016
Cited by 2 | Viewed by 5939
Abstract
Audio feedback is defined as a positive feedback of acoustic signals where an audio input and output form a loop, and may be utilized artistically. This article presents new context-based controls over audio feedback, leading to the generation of desired sonic behaviors by enriching the influence of existing acoustic information such as room response and ambient noise. This ecological approach to audio feedback emphasizes mutual sonic interaction between signal processing and the acoustic environment. Mappings from analyses of the received signal to signal-processing parameters are designed to emphasize this specificity as an aesthetic goal. Our feedback system presents four types of mappings: approximate analyses of room reverberation to tempo-scale characteristics, ambient noise to amplitude and two different approximations of resonances to timbre. These mappings are validated computationally and evaluated experimentally in different acoustic conditions. Full article

Article
Blockwise Frequency Domain Active Noise Controller Over Distributed Networks
by Christian Antoñanzas, Miguel Ferrer, Maria De Diego and Alberto Gonzalez
Appl. Sci. 2016, 6(5), 124; https://doi.org/10.3390/app6050124 - 28 Apr 2016
Cited by 14 | Viewed by 4620
Abstract
This work presents a practical active noise control system composed of distributed and collaborative acoustic nodes. To this end, experimental tests have been carried out in a listening room with acoustic nodes equipped with loudspeakers and microphones. The communication among the nodes is simulated by software. We have considered a distributed algorithm based on the Filtered-x Least Mean Square (FxLMS) method that introduces collaboration between nodes following an incremental strategy. For improving the processing efficiency in practical scenarios where data acquisition systems work by blocks of samples, the frequency-domain partitioned block technique has been used. Implementation aspects such as computational complexity, processing time of the network and convergence of the algorithm have been analyzed. Experimental results show that, without constraints in the network communications, the proposed distributed algorithm achieves the same performance as the centralized version. The performance of the proposed algorithm over a network with a given communication delay is also included. Full article
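
For context, the single-channel, sample-by-sample FxLMS update that the distributed, block-wise frequency-domain algorithm builds on is sketched below; the secondary-path model is assumed known and shorter than the control filter, and all names and step sizes are illustrative.

```python
import numpy as np

def fxlms(reference, disturbance, sec_path, L=64, mu=1e-3):
    """Adapt a control filter w so the secondary source cancels the disturbance
    at the error microphone (assumes len(sec_path) <= L)."""
    w = np.zeros(L)                               # adaptive control filter
    x_buf = np.zeros(L)                           # reference history (newest first)
    xf_buf = np.zeros(L)                          # filtered-reference history
    y_buf = np.zeros(len(sec_path))               # control-signal history
    e = np.empty(len(reference))
    for n in range(len(reference)):
        x_buf = np.roll(x_buf, 1); x_buf[0] = reference[n]
        y = w @ x_buf                             # anti-noise sample
        y_buf = np.roll(y_buf, 1); y_buf[0] = y
        e[n] = disturbance[n] + sec_path @ y_buf  # residual at the error microphone
        xf = sec_path @ x_buf[: len(sec_path)]    # reference filtered by the sec.-path model
        xf_buf = np.roll(xf_buf, 1); xf_buf[0] = xf
        w -= mu * e[n] * xf_buf                   # filtered-x LMS update
    return w, e
```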

Article
Influence of the Quality of Consumer Headphones in the Perception of Spatial Audio
by Pablo Gutierrez-Parera and Jose J. Lopez
Appl. Sci. 2016, 6(4), 117; https://doi.org/10.3390/app6040117 - 22 Apr 2016
Cited by 5 | Viewed by 8666
Abstract
High-quality headphones can generate a realistic sound immersion when reproducing binaural recordings. However, most people commonly use consumer headphones of inferior quality, such as the ones provided with smartphones or music players. Factors such as a weak frequency response, distortion, and the sensitivity disparity between the left and right transducers could be among the degrading factors. In this work, we study how these factors affect spatial perception. For this purpose, a series of perceptual tests was carried out with a virtual headphone listening test methodology. The first experiment focuses on the analysis of how the disparity of sensitivity between the two transducers affects the final result. The second test studies the influence of the frequency response, relating quality and spatial impression. The third test analyzes the effects of distortion using a Volterra kernels scheme for the simulation of the distortion using convolutions. Finally, the fourth tries to relate the quality of the frequency response with the accuracy of azimuth localization. The conclusions of the experiments are: the disparity between both transducers can affect the localization of the source; the perception of quality and spatial impression has a high correlation; the distortion produced by the range of headphones tested at a fixed level does not affect the perception of binaural sound; and some frequency bands have an important role in front-back confusions. Full article

Article
Semantically Controlled Adaptive Equalisation in Reduced Dimensionality Parameter Space
by Spyridon Stasis, Ryan Stables and Jason Hockman
Appl. Sci. 2016, 6(4), 116; https://doi.org/10.3390/app6040116 - 20 Apr 2016
Cited by 17 | Viewed by 6365
Abstract
Equalisation is one of the most commonly-used tools in sound production, allowing users to control the gains of different frequency components in an audio signal. In this paper we present a model for mapping a set of equalisation parameters to a reduced dimensionality space. The purpose of this approach is to allow a user to interact with the system in an intuitive way through both the reduction of the number of parameters and the elimination of technical knowledge required to creatively equalise the input audio. The proposed model represents 13 equaliser parameters on a two-dimensional plane, which is trained with data extracted from a semantic equalisation plug-in, using the timbral adjectives warm and bright. We also include a parameter weighting stage in order to scale the input parameters to spectral features of the audio signal, making the system adaptive. To maximise the efficacy of the model, we evaluate a variety of dimensionality reduction and regression techniques, assessing the performance of both parameter reconstruction and structural preservation in low-dimensional space. After selecting an appropriate model based on the evaluation criteria, we conclude by subjectively evaluating the system using listening tests. Full article
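
A stripped-down version of the mapping stage, using PCA as a stand-in for the dimensionality-reduction techniques the paper compares (and omitting the adaptive parameter weighting), could look like this; the training data here are random placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
eq_settings = rng.normal(size=(500, 13))         # placeholder for plug-in training data

model = PCA(n_components=2).fit(eq_settings)     # learn a 2-D control plane
plane_coords = model.transform(eq_settings)      # where each training setting lands
user_point = np.array([[0.5, -1.0]])             # a point the user picks on the plane
eq_params = model.inverse_transform(user_point)  # back to 13 equaliser parameters
```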

Article
Frequency-Dependent Amplitude Panning for the Stereophonic Image Enhancement of Audio Recorded Using Two Closely Spaced Microphones
by Chan Jun Chun and Hong Kook Kim
Appl. Sci. 2016, 6(2), 39; https://doi.org/10.3390/app6020039 - 1 Feb 2016
Cited by 5 | Viewed by 5551
Abstract
In this paper, we propose a new frequency-dependent amplitude panning method for stereophonic image enhancement applied to a sound source recorded using two closely spaced omni-directional microphones. The ability to detect the direction of such a sound source is limited due to weak spatial information, such as the inter-channel time difference (ICTD) and inter-channel level difference (ICLD). Moreover, when sound sources are recorded in a convolutive or a real room environment, the detection of sources is affected by reverberation effects. Thus, the proposed method first tries to estimate the source direction depending on the frequency using azimuth-frequency analysis. Then, a frequency-dependent amplitude panning technique is proposed to enhance the stereophonic image by modifying the stereophonic law of sines. To demonstrate the effectiveness of the proposed method, we compare its performance with that of a conventional method based on the beamforming technique in terms of directivity pattern, perceived direction, and quality degradation under three different recording conditions (anechoic, convolutive, and real reverberant). The comparison shows that the proposed method gives us better stereophonic images in a stereo loudspeaker reproduction than the conventional method without any annoying effects. Full article
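
The repanning step relies on the stereophonic law of sines, sin(theta)/sin(theta0) = (gL - gR)/(gL + gR); a per-bin gain computation consistent with that law (without the paper's modification of the law or the azimuth-frequency estimation stage) is sketched below.

```python
import numpy as np

def sine_law_gains(azimuth_deg, base_angle_deg=30.0):
    """Left/right gains for one frequency bin, normalised to unit power."""
    ratio = np.sin(np.radians(azimuth_deg)) / np.sin(np.radians(base_angle_deg))
    gl, gr = 1.0 + ratio, 1.0 - ratio          # any pair with (gl - gr)/(gl + gr) = ratio
    norm = np.sqrt(gl * gl + gr * gr)
    return gl / norm, gr / norm
```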

Article
Auralization of Accelerating Passenger Cars Using Spectral Modeling Synthesis
by Reto Pieren, Thomas Bütler and Kurt Heutschi
Appl. Sci. 2016, 6(1), 5; https://doi.org/10.3390/app6010005 - 24 Dec 2015
Cited by 44 | Viewed by 9307
Abstract
While the technique of auralization has been in use for quite some time in architectural acoustics, its application to environmental noise has been discovered only recently. With road traffic noise being the dominant noise source in most countries, particular interest lies in the synthesis of realistic pass-by sounds. This article describes an auralizator for pass-bys of accelerating passenger cars. The key element is a synthesizer that simulates the acoustical emission of different vehicles, driving on different surfaces, under different operating conditions. Audio signals for the emitted tire noise, as well as the propulsion noise, are generated using spectral modeling synthesis, which gives complete control of the signal characteristics. The propulsion sound is synthesized as a function of instantaneous engine speed, engine load and emission angle, whereas the tire sound is created as a function of vehicle speed and emission angle. The sound propagation is simulated by applying a series of time-variant digital filters. To obtain the corresponding steering parameters of the synthesizer, controlled experiments were carried out. The tire noise parameters were determined from coast-by measurements of passenger cars with idling engines. To obtain the propulsion noise parameters, measurements at different engine speeds, engine loads and emission angles were performed using a chassis dynamometer. The article shows how the synthesizer parameters are calculated from the measured data using audio signal processing. Full article
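
The deterministic part of such a synthesizer boils down to additive synthesis of engine orders that track the instantaneous engine speed, as in the sketch below; real amplitude tables would also depend on engine load and emission angle, and the four-cylinder assumption, amplitude values and rpm trajectory here are placeholders.

```python
import numpy as np

def propulsion_harmonics(rpm_track, amps, fs=44100):
    """Additive synthesis of engine-order partials following an rpm trajectory."""
    f0 = rpm_track / 60.0 * 2.0                  # firing frequency of a 4-cyl, 4-stroke engine
    phase = 2.0 * np.pi * np.cumsum(f0) / fs     # integrate frequency to phase
    out = np.zeros(len(rpm_track))
    for h, a in enumerate(amps, start=1):
        out += a * np.sin(h * phase)             # one partial per engine order
    return out

rpm = np.linspace(1500, 4500, 3 * 44100)         # run-up over three seconds
sound = propulsion_harmonics(rpm, amps=[1.0, 0.6, 0.4, 0.25])
```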

Review


Review
A Review of Physical and Perceptual Feature Extraction Techniques for Speech, Music and Environmental Sounds
by Francesc Alías, Joan Claudi Socoró and Xavier Sevillano
Appl. Sci. 2016, 6(5), 143; https://doi.org/10.3390/app6050143 - 12 May 2016
Cited by 169 | Viewed by 19686
Abstract
Endowing machines with sensing capabilities similar to those of humans is a prevalent quest in engineering and computer science. In the pursuit of making computers sense their surroundings, a huge effort has been made to allow machines and computers to acquire, process, analyze and understand their environment in a human-like way. Focusing on the sense of hearing, the ability of computers to sense their acoustic environment as humans do goes by the name of machine hearing. To achieve this ambitious aim, the representation of the audio signal is of paramount importance. In this paper, we present an up-to-date review of the most relevant audio feature extraction techniques developed to analyze the most usual audio signals: speech, music and environmental sounds. Besides revisiting classic approaches for completeness, we include the latest advances in the field based on new domains of analysis together with novel bio-inspired proposals. These approaches are described following a taxonomy that organizes them according to their physical or perceptual basis, subsequently divided by the domain of computation (time, frequency, wavelet, image-based, cepstral, or other domains). The description of the approaches is accompanied by recent examples of their application to machine hearing related problems. Full article
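
Two classic physical descriptors of the kind this review covers, a spectral one (the spectral centroid) and a temporal one (the zero-crossing rate), can be computed frame by frame as sketched below; the function name and the frame and hop sizes are arbitrary choices.

```python
import numpy as np

def basic_features(x, fs, frame=1024, hop=512):
    """Per-frame spectral centroid (Hz) and zero-crossing rate."""
    w = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, 1.0 / fs)
    feats = []
    for start in range(0, len(x) - frame, hop):
        seg = x[start:start + frame]
        mag = np.abs(np.fft.rfft(seg * w))
        centroid = float((freqs * mag).sum() / (mag.sum() + 1e-12))
        zcr = float(np.mean(np.abs(np.diff(np.sign(seg))) > 0))
        feats.append((centroid, zcr))
    return np.array(feats)
```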

Review
All About Audio Equalization: Solutions and Frontiers
by Vesa Välimäki and Joshua D. Reiss
Appl. Sci. 2016, 6(5), 129; https://doi.org/10.3390/app6050129 - 6 May 2016
Cited by 111 | Viewed by 32303
Abstract
Audio equalization is a vast and active research area. The extent of research means that one often cannot identify the preferred technique for a particular problem. This review paper bridges those gaps, systematically providing a deep understanding of the problems and approaches in audio equalization, their relative merits and applications. Digital signal processing techniques for modifying the spectral balance in audio signals and applications of these techniques are reviewed, ranging from classic equalizers to emerging designs based on new advances in signal processing and machine learning. Emphasis is placed on putting the range of approaches within a common mathematical and conceptual framework. The application areas discussed herein are diverse, and include well-defined, solvable problems of filter design subject to constraints, as well as newly emerging challenges that touch on problems in semantics, perception and human-computer interaction. Case studies are given in order to illustrate key concepts and how they are applied in practice. We also recommend preferred signal processing approaches for important audio equalization problems. Finally, we discuss current challenges and the uncharted frontiers in this field. The source code for methods discussed in this paper is made available at https://code.soundsoftware.ac.uk/projects/allaboutaudioeq. Full article
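
As a concrete example of the classic parametric designs such a review starts from, a second-order peaking equaliser in the familiar audio-EQ-cookbook form can be implemented as below; the function and parameter names are illustrative, not taken from the paper or its code repository.

```python
import numpy as np
from scipy.signal import lfilter

def peaking_eq(x, fs, f0, gain_db, Q=1.0):
    """Boost or cut around f0 by gain_db, with bandwidth set by Q."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * Q)
    b = np.array([1.0 + alpha * A, -2.0 * np.cos(w0), 1.0 - alpha * A])
    a = np.array([1.0 + alpha / A, -2.0 * np.cos(w0), 1.0 - alpha / A])
    return lfilter(b / a[0], a / a[0], x)   # normalise by a0 before filtering
```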

Review
A Review of Time-Scale Modification of Music Signals
by Jonathan Driedger and Meinard Müller
Appl. Sci. 2016, 6(2), 57; https://doi.org/10.3390/app6020057 - 18 Feb 2016
Cited by 61 | Viewed by 27435
Abstract
Time-scale modification (TSM) is the task of speeding up or slowing down an audio signal’s playback speed without changing its pitch. In digital music production, TSM has become an indispensable tool, which is nowadays integrated in a wide range of music production software. Music signals are diverse—they comprise harmonic, percussive, and transient components, among others. Because of this wide range of acoustic and musical characteristics, there is no single TSM method that can cope with all kinds of audio signals equally well. Our main objective is to foster a better understanding of the capabilities and limitations of TSM procedures. To this end, we review fundamental TSM methods, discuss typical challenges, and indicate potential solutions that combine different strategies. In particular, we discuss a fusion approach that involves recent techniques for harmonic-percussive separation along with time-domain and frequency-domain TSM procedures. Full article
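
The simplest of the procedures such a review covers, plain overlap-add TSM, is sketched below; its artifacts on harmonic and transient material are exactly what WSOLA, the phase vocoder and the harmonic-percussive fusion approach discussed in the paper aim to avoid. Frame and hop sizes are arbitrary choices.

```python
import numpy as np

def ola_tsm(x, alpha, frame=2048, hop_out=512):
    """Stretch x by factor alpha (>1 slows down) with windowed overlap-add."""
    hop_in = int(round(hop_out / alpha))          # analysis hop derived from the stretch factor
    w = np.hanning(frame)
    n_frames = (len(x) - frame) // hop_in
    y = np.zeros(n_frames * hop_out + frame)
    norm = np.zeros_like(y)
    for m in range(n_frames):
        seg = x[m * hop_in: m * hop_in + frame] * w
        y[m * hop_out: m * hop_out + frame] += seg
        norm[m * hop_out: m * hop_out + frame] += w
    return y / np.maximum(norm, 1e-8)             # compensate the overlapped window gain
```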
