1. Introduction
The human auditory system can sense and process sound information directly related to spatial structure parameters and, on this basis, assess the sound quality of the spatial sound field. Sandvad [1] found that listeners can accurately identify the room from which a sound comes among a large number of room pictures, based on binaural signals of the indoor sound field played back via acoustic devices. McGrath [2] found that both listeners with normal vision (with both eyes covered) and blind listeners can identify the size of an indoor space through their own voices and other occasional sounds in the room. Blind listeners can complete such hearing tasks more quickly and accurately [3]. Therefore, in-depth research on the impact of structural parameters on auditory perception is of great significance: it can both explore the mechanism of human spatial hearing and test the effectiveness of auditory virtual simulation technology.
In principle, such subjective evaluation should be conducted by playing sound samples in different rooms with variable structural parameters, such as volume, shape, and wall absorption coefficient. In practice, however, this is inefficient and time-consuming because of the demands it places on the experimental environment. Virtual sound synthesis technology and acoustic simulation software have therefore emerged to meet this need: they can simulate the acoustic signals generated at different receiving points in a room. Subjective experiments can then be conducted on the simulated signals to explore how changes in structural parameters affect auditory perception.
With the help of digital signal processing and binaural technology, which reproduces the sound received at the listener’s eardrums, acoustic signals generated in real rooms can be synthesized without experimental measurement. Psychoacoustic experiments in complex auditory environments, such as concert halls, can therefore be conducted by simulation. Simulating the auditory scene not only makes rapid switching between different types of spatial sound fields easy, but also allows the acoustic signals to be acquired and played back efficiently. It also reduces the demands on the listeners’ auditory memory. Currently, binaural technology is primarily used in virtual reality to restore a real listening experience [4,5,6,7]. To better recover the real listening experience in virtual hearing, other studies have used the Head-Related Transfer Function (HRTF), which describes the filtering effect of the listener’s body surface as a sound propagates from the source to the eardrum in free space, to restore the spatialization cues of virtual sounds [8,9,10].
Building on the above-mentioned research, this study also uses binaural technology to simulate binaural signals generated in closed spaces with different structural parameters. Specifically, the software ODEON 8.0 is first used to obtain the room impulse responses at different receiving points in rooms with variable structural parameters. Then, the binaural signals at those receiving points are synthesized using binaural technology, and the subjective experiment is conducted with headphones, a power amplifier, and a computer.
Many studies have relied on subjective experiments. Hyung [11] investigated the effects of diffusing elements on sound field diffuseness through subjective evaluations using binaural impulse responses (BIRs) in the seating areas of a surround hall; the subjective experiment was used to quantitatively evaluate the combination of acoustic parameters and degree of diffusion. Burno [12] conducted a subjective audio quality evaluation experiment to assess the performance of a precompensation algorithm for mitigating perceptible linear and nonlinear distortion in audio signals, using a degradation category scale to quantify the sound quality. Some researchers introduced the anchor semantic differential (ASD) method to evaluate the subjective annoyance of noises, and modeled the subjective evaluation results with nonlinear methods to predict sound quality [13]. Miguel [14] used subjective experiments to establish a methodology for defining valid evaluation scales for different collectives and determining evaluation criteria related to the overall assessment of music hall acoustics; the experiment was conducted on-site, and attributes such as reverberant, resounding, and balanced were evaluated by grading. Lee [15] conducted two subjective experiments to evaluate floor vibrations induced by walking barefoot in heavyweight buildings: in one, subjects rated the intensity of the vibrations and the acceptability and serviceability of the floors while walking across them; in the other, they rated the floor vibrations while a walker passed by. Kurt [16] conducted subjective evaluations to compare the driving experience in real cars and in a virtual environment, quantifying the comparison with the semantic differential method. Fotis [17] explored a method for auralizing a car pass-by in a street using a wave-based acoustic prediction; a subjective evaluation based on pair comparison was carried out to determine the maximum spacing between discrete source positions that still produces a perceptually continuous car pass-by auralization.
Our study aims to evaluate how auditory perception changes across eight simulated closed spaces with variable structural parameters. The pair comparison method, by which the preferred stimulus in each pair can be selected, best matches this purpose and is therefore adopted in our subjective experiment.
Neither the virtual sound synthesis technology nor the experimental method used in this study is new. The main contribution of this research lies instead in analyzing how changes in structural parameters affect the preference-based auditory perception of three typical types of musical sound, for acoustic environments with high auditory perception requirements such as concert halls. Eight closed-space models with different volumes, shapes, and wall absorption coefficients are simulated with ODEON 8.0. Then, 96 simulated binaural signals in those models are synthesized with binaural technology. Using the pair comparison method in the subjective experiment, the structural parameters that yield preferable auditory perception of the three test musical sounds are identified. The conclusions can serve as a reference for designing concert halls with a better listening experience.
2. Feasibility of ODEON-Based Auditory Scene Simulation in Closed Spaces
ODEON [18,19] is architectural acoustic simulation software based on geometric acoustics, developed at the Technical University of Denmark. The software uses a hybrid method [20] that combines the virtual sound source method [21,22], the sound ray tracing method [23,24,25], and the second source method [26]. It can calculate the internal sound field of various types of closed spaces and simulate noise control measures and their effects. Many scholars [27,28,29,30] have studied and confirmed the credibility of the software in simulating indoor sound fields. In this paper, we use hearing experiments to study the feasibility of ODEON 8.0 in simulating simplified closed-field auditory scenes.
First, the binaural signals in two real rooms are collected with a spherical sound source and a dummy head (see Table 1 for details). Then, ODEON 8.0 is used to build indoor sound field models of the two rooms and simulate the corresponding binaural room impulse responses. After that, simulated binaural signals are obtained by means of virtual auditory technology, that is, by simulating the transmission of sound waves to both ears while retaining the temporal and spatial information. Finally, a hearing experiment is designed and carried out to compare the degree of consistency in auditory perception between the binaural signals recorded in the real rooms and those simulated with ODEON 8.0. In this way, the feasibility of the software for research on the auditory perception of spatial sound fields is tested.
2.1. Binaural Signal Collection in Real Rooms
In two ordinary rooms with different spatial sizes (Figure 1), the original signal (the Windows 7 system startup sound, lasting ~3 s) is played with a spherical sound source and the binaural signals are collected with a dummy head. The structural parameters of the two rooms are shown in Table 2, where L, W, and H represent the length, width, and height of the closed space, respectively, S is the internal surface area, and V is the volume. The midpoint of the short side of the room’s floor is selected as the coordinate origin to establish a Cartesian coordinate system, in which the vertex coordinates of the windows are as follows. Room A: (2.45, −3.25, 1.00), (2.45, −3.25, 2.40), (0.75, −3.25, 2.40), (0.75, −3.25, 1.00); Room B: (5.70, 0.50, 1.00), (5.70, 0.50, 2.40), (5.70, −1.00, 2.40), (5.70, −1.00, 1.00).
In Room A, the spherical sound sources are placed at two different positions denoted as $P_{A1}$ (3.60, 0, 1.50) and $P_{A2}$ (1.60, −2.25, 1.50), and the corresponding two dummy heads are set at receiving points $R_{A1}$ (1.60, 1.55, 1.13) and $R_{A2}$ (5.70, −1.55, 1.13). In Room B, the position coordinates of the sound sources and the receiving points are $P_{B1}$ (2.00, 0, 1.50), $P_{B2}$ (4.00, −0.30, 1.50), $R_{B1}$ (1.00, −1.00, 1.13), and $R_{B2}$ (5.00, 1.00, 1.13). The binaural signals from the two sound sources to the two receiving points in Room A are collected and recorded as $P_{A1}R_{A1}$, $P_{A1}R_{A2}$, $P_{A2}R_{A1}$, and $P_{A2}R_{A2}$. Likewise, the binaural signals in Room B are recorded as $P_{B1}R_{B1}$, $P_{B1}R_{B2}$, $P_{B2}R_{B1}$, and $P_{B2}R_{B2}$, respectively. The room windows remain closed during the signal collection. The positions of $P_{A1}$, $P_{A2}$, $R_{A1}$, $R_{A2}$, $P_{B1}$, $P_{B2}$, $R_{B1}$, and $R_{B2}$ are indicated in Figure 2.
2.2. ODEON-Based Binaural Auditory Simulation in Virtual Rooms
With ODEON 8.0, we draw a rectangular room model, set the sound absorption coefficients (concrete and glass), select the sound sources and receiving points (with the same position coordinates as in the real rooms), establish the sound field models of Room A and Room B (shown in Figure 3) by means of the sound ray tracing method (5000 sound rays), and calculate the eight binaural impulse responses in the virtual rooms (the response duration is 1 s for all points). The reverberation times are 0.16 s for simulated Room A and 0.2 s for simulated Room B. The original signal in Section 2.1 is convolved with each of the eight impulse responses. Finally, the convolved left-ear and right-ear signals are played back through properly balanced and adjusted headphones to produce a “virtual hearing” that corresponds to each receiving point in the real rooms. Before the listening test, all the headphones are connected to the power amplifier and set to the same output level, chosen so that the experimenter finds it comfortable.
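The convolution step described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors’ processing chain: the function names `convolve` and `binauralize` and the toy sample values are hypothetical, and a practical implementation would apply an FFT-based routine to the full 1 s impulse responses.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binauralize(dry, brir_left, brir_right):
    """Return (left, right) ear signals for one source-receiver pair."""
    return convolve(dry, brir_left), convolve(dry, brir_right)


# Hypothetical toy data (assumption, not from the paper):
# a 4-sample "dry" signal and two 3-tap binaural impulse responses.
dry = [1.0, 0.5, 0.0, -0.5]
brir_left = [1.0, 0.0, 0.25]   # direct sound plus one weak reflection
brir_right = [0.0, 0.8, 0.0]   # one-sample delay, attenuated

left, right = binauralize(dry, brir_left, brir_right)
print(left)
print(right)
```

Each output channel has length len(dry) + len(brir) − 1, so the reverberant tail of the impulse response extends the played-back signal beyond the original duration.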
2.3. Evaluation Experiment of Auditory Perception Consistency between Real and Simulated Binaural Signals
For the auditory experiment, 20 listeners (undergraduate, postgraduate, and doctoral students majoring in acoustics, aged between 20 and 30) who are familiar with subjective listening test procedures from previous experience are recruited, and the binaural signals are played back in pairs (“real” and “simulated”) through Sennheiser HD280 Pro headphones. The listeners are not told how the stimuli in each pair differ; they are only instructed to judge whether each pair of sounds is aurally distinguishable (that is, whether the two stimuli sound different or the same). Based on the experimental results, the degree of aural consistency between the “real” and “simulated” binaural signals is evaluated to test the feasibility and accuracy of using ODEON 8.0 for indoor sound field auditory simulation research.
2.4. Experimental Results
The evaluation results are summarized in Figure 4. The red bars denote the number of listeners who judged the “real” and “simulated” binaural signals to be the same; the black bars denote the number of listeners who judged them to be different. The longer the bar, the more often that response was selected.
The “real” and “simulated” binaural signals are largely aurally consistent, with no obvious difference in auditory perception. At a significance level of α = 0.05, a t-test is employed to examine whether the numbers of “same” and “different” responses differ significantly. The result (p = 0.0126 < α) indicates that they do, with “same” being selected significantly more often. Therefore, ODEON 8.0 can create a virtual auditory environment with high fidelity, and it is feasible to use ODEON 8.0 to simulate indoor sound fields in closed-space auditory simulations and related research.
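As a rough illustration of how “same” and “different” counts can be compared, the sketch below uses an exact two-sided sign (binomial) test. This is a swapped-in technique chosen because it needs only the standard library; it is not the t-test reported above, and the function name `sign_test_p` and the example counts are hypothetical.

```python
from math import comb


def sign_test_p(n_same, n_different):
    """Exact two-sided binomial test of H0: P(same) = P(different) = 0.5."""
    n = n_same + n_different
    k = max(n_same, n_different)
    # Probability of a split at least as lopsided as the observed one,
    # counted in one tail and doubled because the null is symmetric.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical counts for one stimulus pair (assumption, not the paper's data).
p_value = sign_test_p(n_same=8, n_different=2)
print(p_value)
```

A small p-value here would indicate that one response dominates; whether the dominant response is “same” or “different” must still be read from the counts themselves.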
3. Auditory Perception Study of Sound Field in Closed Spaces with Different Structural Parameters
Following the method described in Section 2.2, the closed-space models are established with ODEON 8.0, and the simulated binaural signals are synthesized to qualitatively study the impact of the structural parameters of closed spaces on auditory perception.
3.1. Experiment Details
3.1.1. Subjects
There were 20 participants in this subjective experiment, including 11 males and nine females, aged between 20 and 30. The participants were undergraduates, postgraduates, and doctoral students majoring in acoustics with normal hearing and subjective hearing test experience. All participants in this experiment were remunerated for their participation.
3.1.2. Binaural Signal Simulation
In this experiment, three kinds of musical sounds are selected for spatial auditory perception research: flute (wind instrument), violin (stringed instrument), and symphony (multiple instruments). The flute music used in the listening test is published by Sony Music; the violin and symphony music are published by the China National Symphony Orchestra. None of the three signals contains any added audio effects, so they are suitable as original signals.
To examine the influence of closed-space volume on auditory perception, three cuboid space models with volumes of 24 m³, 96 m³, and 192 m³ (see Table 3 for specific dimensions) and thick pit sand mortar polished brick walls are established, denoted a, b, and c. The positions of the sound source and receiving points in each space are shown in Table 4. The room sizes are chosen arbitrarily, with reference to the sizes of the real rooms in Section 2.
To examine the influence of the inner surface absorption coefficient on auditory perception, four cuboid space models with a volume of 192 m³ are constructed, with wall surfaces of thick pit sand mortar polished brick (whose absorption coefficient is a function of frequency), α = 0.00 (total reflection), α = 0.50, and α = 1.00 (total sound absorption), denoted c, d, e, and f, respectively. The positions of the sound source and receiving points in each space are the same as for model c in Table 4.
To examine the influence of space shape on auditory perception, cuboid, semicylinder, and semisphere closed-space models are established and denoted b, g, and h, respectively (see Table 5 for specific dimensions). The space volume is assumed to be 96 m³, and the wall surfaces are all thick pit sand mortar polished brick. The positions of the sound source and receiving points in each space are shown in Table 6.
In total, eight closed-space models with different volumes, inner surface absorption coefficients, and space shapes are established. After the above experimental parameters are determined, the research-type calculation level is selected, the number of sound rays is set to 5000 (the maximum number of rays, giving a more accurate calculation result), and the binaural impulse response length is set to 1000 ms. Then, the indoor binaural impulse responses of the eight closed-space models are calculated. Next, the three original signals (flute, violin, and symphony, each 10 s long) are convolved with those binaural impulse responses to synthesize the binaural signals at different positions within the different closed spaces, i.e., 8 (closed spaces) × 4 (receiving points) × 3 (original signals) = 96 signals.
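The stimulus count above can be checked by enumerating every combination of model, receiving point, and source signal. The sketch below assumes a hypothetical labelling scheme (`a-R1-flute` and so on); only the model letters, the four receiving point indices, and the three music types come from the text.

```python
from itertools import product

MODELS = list("abcdefgh")                 # eight closed-space models
RECEIVERS = [1, 2, 3, 4]                  # receiving point indices
SOURCES = ["flute", "violin", "symphony"]

# One label per synthesized binaural signal (label format is an assumption).
stimuli = [f"{m}-R{r}-{s}" for m, r, s in product(MODELS, RECEIVERS, SOURCES)]

print(len(stimuli))   # number of binaural signals
print(stimuli[:3])
```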
3.2. Experimental Process
This experiment adopts the pairwise comparison method, and the stimuli are presented to the subject through headphones. Participants are asked to choose the preferred audio in each pair of stimuli presented. Receiving points in different closed spaces that have the same coordinate index are grouped together for comparison with each other. At each index, each of the musical sounds has a total of 34 pairs of stimuli, as shown in Table 7. Therefore, the experiment includes 34 × 4 (receiving point indices) × 3 (original signals) = 408 pairs of stimulus signals.
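To make the pairwise comparison procedure concrete, the sketch below reproduces the pair count and shows one simple way of turning forced-choice judgments into a preference score per model (the share of presentations in which a model was chosen). The scoring rule, the function name `preference_scores`, and the three example judgments are assumptions for illustration; the paper does not specify how its scores were computed.

```python
from collections import Counter

PAIRS_PER_INDEX = 34   # pairs per receiving point index and music type (Table 7)
N_INDICES = 4          # receiving point indices
N_SOURCES = 3          # flute, violin, symphony
total_pairs = PAIRS_PER_INDEX * N_INDICES * N_SOURCES


def preference_scores(judgments):
    """Map each model to the fraction of its presentations in which it was chosen.

    `judgments` is an iterable of (first, second, chosen) model labels.
    """
    wins, shown = Counter(), Counter()
    for first, second, chosen in judgments:
        if chosen not in (first, second):
            raise ValueError("chosen model must be one of the pair")
        shown[first] += 1
        shown[second] += 1
        wins[chosen] += 1
    return {model: wins[model] / shown[model] for model in shown}


# Hypothetical judgments from one subject (assumption, not the paper's data).
example = [("a", "b", "a"), ("a", "c", "a"), ("b", "c", "c")]
scores = preference_scores(example)
print(total_pairs, scores)
```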
In the experiment, the experimenter plays the stimulus pairs on the computer. One stimulus signal is played three times in succession, and, after an interval of 2 s, the other is also played three times in succession. After each pair of stimuli is played, the subject has 5 s to select the one they prefer and find more comfortable, and marks it on the questionnaire. Then, the next pair of stimuli is played, until all 34 pairs at the receiving point have been played. After 10 min of rest, the stimulus pairs of another type of musical sound are played at the same receiving point, until the stimuli of all three musical sounds have been played. After that, a half-hour rest is scheduled to relieve the subjects’ hearing fatigue, which could otherwise bias the experimental results. Then, in the same way, the stimulus pairs of all three musical sounds are played at the next receiving point. The auditory experiment is finished once the stimuli of all four receiving points have been played.
In this experiment, the test order of the receiving points, of the different musical sounds, and of the stimulus pairs for the same receiving point and musical sound is randomized with a Latin square scheme to eliminate any bias in the experimental results that the test order might cause.
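One common way to build such an ordering is a cyclic Latin square, in which every condition appears exactly once in each row (one subject’s sequence) and once in each column (one serial position). The sketch below is an assumption about the construction, since the text does not say which Latin square was used; `latin_square` is a hypothetical helper name.

```python
def latin_square(conditions):
    """Cyclic Latin square: row i is the condition list rotated left by i."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


# Presentation orders for the four receiving point indices.
orders = latin_square([1, 2, 3, 4])
for subject_group, order in enumerate(orders, start=1):
    print(subject_group, order)
```

With 20 subjects and four orders, each row could be assigned to five subjects so that every receiving point is heard equally often in every serial position.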
The auditory experiment in the current study is performed only once, and the subjects are required to make only one judgment for each pair of stimulus signals. The whole experiment lasts about 6.5 h.
3.3. Experimental Results
3.3.1. Summary of Results
First, an analysis of variance (at a significance level of α = 0.05) of the evaluation data given by all participants shows that the effect (the difference among the test stimuli) is statistically significant (p = 0.0095 < 0.05). The test stimuli thus have a significant effect on the participants’ judgments, and the listening test is valid. All the evaluation data given by the subjects are then subjected to cluster analysis to judge the validity of the evaluation results (see Figure 5). In this tree diagram, the y-axis denotes the distance between classes in the cluster analysis, that is, the difference between classes; the greater the distance, the greater the difference between one subject’s evaluation results and those of the other subjects.
According to the results of the cluster analysis, the evaluation data of subjects No. 13 and No. 9 differ significantly from those of the other subjects and are therefore excluded. The remaining data are treated as valid for the subjective evaluation experiment.
For the three cuboid closed spaces, with volumes of 24 m³, 96 m³, and 192 m³ (each with thick pit sand mortar polished brick walls), the absolute variations of the corresponding space volume are $|V_b - V_a|$ = 72 m³, $|V_c - V_b|$ = 96 m³, and $|V_c - V_a|$ = 168 m³, respectively, where $V_a$, $V_b$, and $V_c$ denote the volumes of models a, b, and c. The subjective test results of spatial auditory perception changes caused by volume are shown in Figure 6.
It can be seen from Figure 6 that, for the three different musical sounds, when the space volume changes, the receiving point position has no obvious influence on hearing. When the volume changes, the auditory perception of the flute music and the violin music changes: in general, the smaller the volume, the better the auditory perception of flute and violin music. For the symphony, the auditory perception does not change noticeably across the different space volumes.
For the four closed spaces with the same volume and shape, with absorption coefficients of 0, 0.5, and 1 (cases d, e, and f, respectively) and an inner surface of thick pit sand mortar polished brick (whose absorption coefficient is a function of frequency; case c), the sound absorptions are 0 m², 104 m², 208 m², and 31.98 m², respectively (where α of space c is calculated according to GB/T 3947-1996). The absolute variations of the spatial sound absorption relative to case c are $|A_d - A_c|$ = 31.98 m², $|A_e - A_c|$ = 72.02 m², and $|A_f - A_c|$ = 176.02 m², where $A$ denotes the sound absorption of the indicated case. The subjective test results of the changes in spatial auditory perception caused by the inner surface absorption coefficient are shown in Figure 7.
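The absorption figures above are consistent with the usual relation A = αS for a uniform surface. The sketch below reproduces them under the assumption that the inner surface area is S = 208 m², which is inferred from the stated absorption of 208 m² at α = 1 rather than given directly; the variable names are hypothetical.

```python
SURFACE_AREA = 208.0   # m^2, inferred from A = 208 m^2 at alpha = 1.00 (assumption)
A_BRICK = 31.98        # m^2, stated absorption of the brick-walled case c

# Sound absorption A = alpha * S for the three uniform-coefficient cases.
absorption = {
    "d": 0.00 * SURFACE_AREA,
    "e": 0.50 * SURFACE_AREA,
    "f": 1.00 * SURFACE_AREA,
}

# Absolute variation of each case relative to case c.
variation = {case: round(abs(a - A_BRICK), 2) for case, a in absorption.items()}

# Equivalent mean absorption coefficient of case c (derived, not stated in the text).
mean_alpha_c = A_BRICK / SURFACE_AREA

print(absorption)
print(variation)
print(round(mean_alpha_c, 3))
```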
It can be seen from Figure 7 that, when the inner surface absorption coefficient changes, the receiving point position has no obvious influence on the spatial hearing of the three musical sounds. In the extreme case of α = 0, the hearing effect is the worst for all three musical sounds. For flute music, the hearing effect is best when α = 1. For violin music, the hearing effect is relatively good for both α = 0.5 and α = 1, with no obvious difference between them. For symphony music, there is no obvious difference in hearing effect among the three cases of α = 0.5, α = 1, and the thick pit sand mortar polished brick wall surface.
For the cuboid, cylinder, and hemisphere closed spaces with a volume of approximately 96 m³, the changes in auditory perception are shown in Figure 8. As can be seen from Figure 8, for the flute music, the receiving point position has a marked impact on auditory perception in the closed spaces of different shapes. At receiving point 1, the hemispherical space has the best auditory perception, and the other two shapes have similar auditory perception. At receiving point 2, the hemispherical space still has the best auditory perception, and the rectangular and cylindrical spaces are again comparable. At receiving point 3, the auditory perception is best in the rectangular space, while the other two shapes are comparable. At receiving point 4, the hemispherical space has the best auditory perception, and the cylindrical space has the worst. For violin music, the hemispherical closed space has the best auditory perception at all receiving points, while the rectangular closed space has a better overall auditory perception than the cylindrical one. For symphonic music, the shape of the closed space has no obvious impact on the auditory perception at any receiving point.
3.3.2. Analysis of Variance
The listening test in the current research follows a two-factor interactive design. Specifically, when the effect of space volume is considered, the inner surface absorption coefficient and space shape are held constant, while the volume and the receiving point position are varied. When the effect of the inner surface absorption coefficient is considered, the space volume and shape are held constant, while the absorption coefficient and the receiving point position are varied. When the effect of space shape is considered, the space volume and the absorption coefficient are held constant, while the shape and the receiving point position are varied. Consequently, at a significance level of α = 0.05, a two-factor analysis of variance without replication is conducted to test whether the volume, inner surface absorption coefficient, and shape of the closed space, and the position of the receiving point, have significant impacts on auditory perception. The results of the analysis are shown in Table 8.
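A two-factor analysis of variance without replication partitions the total variation of a models × receiving points table into a row effect, a column effect, and a residual. The sketch below computes the two F statistics in plain Python; the function name and the score table are hypothetical and are not the data behind Table 8. The p-values would come from the F distribution with the returned degrees of freedom, via a statistics library or an F table.

```python
def two_way_anova_no_replication(table):
    """Return F statistics for row and column effects of an r x c table."""
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]

    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols   # residual (row x column) term

    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "F_rows": (ss_rows / df_rows) / ms_error,
        "F_cols": (ss_cols / df_cols) / ms_error,
        "df": (df_rows, df_cols, df_error),
    }


# Hypothetical preference scores (assumption): rows are three models,
# columns are the four receiving points.
scores = [
    [5, 3, 4, 4],
    [2, 4, 3, 3],
    [2, 2, 2, 2],
]
result = two_way_anova_no_replication(scores)
print(result)
```

In this toy table the row (model) effect dominates and the column (receiving point) effect vanishes, which mirrors only the qualitative pattern described in the text.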
The results of the variance analysis show that the change of the inner surface absorption coefficient has a significant impact on the auditory perception of all three test musical sounds, which is consistent with our reasoning. The change of the volume of the closed space has a significant impact on the auditory perception of flute and violin music, and the change of the closed-space shape has a significant impact on the auditory perception of violin music. In addition, the change in receiving point position has no significant impact on the auditory perception of any of the three test musical sounds.
At a significance level of α = 0.05, a post-hoc test of the above ANOVA is conducted to explore whether there are significant differences among the observed samples, where the evaluation results for each simulated model are taken as one sample. The results of the post-hoc test are shown in Table 9. It can be seen from Table 9 that the auditory perception of flute and violin music differs significantly among most simulated models. For flute music, only models b and h show no significant difference. For violin music, there is no significant difference between models b and c. For symphony music, however, there is no significant difference when the space volume or shape changes, except between models b and h. When the absorption coefficient changes, the auditory perception of all the test music types changes significantly.
3.4. Discussion
When the position of the receiving point is not considered, the auditory perception changes caused by the volume, inner surface absorption coefficient, and shape are shown in Figure 9.
It can be seen from Figure 9 that, for the flute music, when the volume changes, the cabin with a volume of 24 m³ achieves the best auditory perception. When the wall absorption coefficient changes, the cabins with the thick pit sand mortar polished brick wall surface and the fully absorbing wall surface give better auditory perception. When the shape changes, the hemispherical cabin gives better auditory perception. For violin music, when the volume changes, the cabin with a volume of 24 m³ has better auditory perception. When the absorption coefficient changes, the cabins with a sound absorption coefficient of 0.5 and with fully absorbing walls have better auditory perception. When the shape changes, the hemispherical cabin has better auditory perception. For the symphonic music, when the volume changes, there is no significant difference among the cabins. When the wall sound absorption coefficient changes, the cabin with totally reflecting walls has the worst auditory perception, while the other cabins show no significant difference. When the shape changes, there is no significant difference among the cabins.
It can be concluded from the above observations that, among the simulated cabin environments of this experiment, flute and violin music are best suited to a hemispherical cabin with a small volume and a relatively large wall sound absorption coefficient, whereas the symphony has no special requirements for the playing environment other than avoiding totally reflecting walls. In other words, a hemispherical cabin with a small volume and relatively absorptive walls achieves better auditory perception for single-instrument music such as the flute and violin, while multi-instrument music such as the symphony has no specific requirement apart from avoiding the extreme case of totally reflecting walls.