Comparative Analysis of HRTFs Measurement Using In-Ear Microphones
Round 1
Reviewer 1 Report
The manuscript describes measurements of head-related transfer functions (HRTFs). The HRTF is a measure of the difference between the sound spectrum in the free field near the external ear and the spectrum at the tympanic membrane. HRTFs have previously been measured by many investigators, so the manuscript could be clearer about what new information the study was intended to provide. The manuscript would also be strengthened by more technical detail. Some examples are listed below.
Specific comments:
lines 35-27, "The HRTFs are strongly individual because they depends [sic] on the shape of head, pinnae, and torso that are different for each human being. HRTFs can be measured through head and torso simulators, which standardize the body dimensions ...": If taken literally, the second sentence in this sequence contradicts the first. That is, if the HRTF is unique to an individual (it is), then it can't be measured using standardized dimensions. Perhaps the second sentence could be rephrased to indicate that a mannequin with standardized dimensions can be used to generate a standardized HRTF, or an estimated HRTF. To be clear, I'm only suggesting a change in the language here (and also in the abstract, lines 4-5); obviously the authors understand the issue because they did provide a clearer description later in the paragraph.
line 41: the word 'pitch' is potentially ambiguous. Most readers will understand the context, but it might be helpful to clarify that the sentence is not referring to the perceptual attribute of sound associated with fundamental frequency.
Fig. 2: These panels appear to be taken from the manufacturers' information sheets; if so, that should be indicated.
line 105: More information about the sweep signal should be provided. Also, my impression is that some properties of the acoustic system may have been assumed rather than confirmed by measurement: for example, that the system was linear and that the loudspeaker frequency response was flat. That seems less than ideal in a study that is entirely acoustic, but whatever was or was not done should be reported in the manuscript.
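For reference, the details usually reported for a swept-sine measurement are the sweep type, start and stop frequencies, duration, sample rate, level, and how the impulse response was obtained from the recording. The manuscript's actual procedure is not stated in this report, so the following is only a sketch under assumptions: it uses an exponential sine sweep with a time-reversed, amplitude-compensated inverse filter (the approach commonly attributed to Farina), and the function names, parameter values, and simulated "system" are all hypothetical.

```python
import numpy as np


def exponential_sweep(f1, f2, duration, fs):
    """Exponential sine sweep from f1 to f2 Hz and its inverse filter (Farina-style)."""
    t = np.arange(int(round(duration * fs))) / fs
    L = duration / np.log(f2 / f1)  # time constant of the exponential frequency rise
    sweep = np.sin(2 * np.pi * f1 * L * (np.exp(t / L) - 1.0))
    # Time-reversed sweep whose amplitude falls by 6 dB per octave as it descends
    # in frequency, compensating the extra energy the sweep spends at low frequencies.
    inverse = sweep[::-1] * np.exp(-t / L)
    return sweep, inverse


def fft_convolve(a, b):
    """Linear convolution via zero-padded real FFTs."""
    n = len(a) + len(b) - 1
    return np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)


def measure_ir(recorded, sweep, inverse):
    """Deconvolve a recorded sweep; a direct path with zero delay peaks at index len(sweep) - 1."""
    norm = np.max(np.abs(fft_convolve(sweep, inverse)))  # pulse height for a unit system
    return fft_convolve(recorded, inverse) / norm


# Usage: simulate a hypothetical linear system (100-sample delay, gain 0.5).
fs = 48_000
sweep, inverse = exponential_sweep(20.0, 20_000.0, 1.0, fs)
h = np.zeros(256)
h[100] = 0.5
recorded = np.convolve(sweep, h)
ir = measure_ir(recorded, sweep, inverse)
peak = int(np.argmax(np.abs(ir)))
print(peak - (len(sweep) - 1), round(float(ir[peak]), 3))  # recovered delay and gain
```

With an exponential sweep, harmonic distortion products of a nonlinear system appear as separate pulses ahead of the main impulse response, so inspecting that region is one way to check the linearity assumption raised above. A fade-in/fade-out window, omitted here for brevity, is normally applied to reduce ripple at the band edges.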
Figure 2c: The peak at 3 kHz is as expected for a measurement at the position of the tympanic membrane, which would be affected by the resonance of the model's ear canal. But in Figs. 5 and 9, it appears that the ear canal is occluded, and that the measurements are actually made in and around the concha. Whether that is the case or not, it should be stated explicitly. But if it is the case, doesn't that compromise the comparisons, by leaving out the contribution of the external auditory meatus?
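The expectation of a peak near 3 kHz can be checked with an idealised estimate: treating the ear canal as a uniform tube closed at the eardrum and open at the concha, its first resonance is the quarter-wave frequency f = c / (4L). The canal length of about 2.5 cm used below is an assumed typical adult value, not a figure from the manuscript, and this simple model ignores the end correction and the concha, both of which lower the real peak somewhat. If the canal is occluded during measurement, this resonance cannot contribute to the measured response.

```python
SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C (assumed)


def quarter_wave_resonance(length_m, c=SPEED_OF_SOUND):
    """First resonance (Hz) of a uniform tube closed at one end and open at the other."""
    return c / (4.0 * length_m)


# Hypothetical range of adult ear-canal lengths around the assumed 2.5 cm value.
for length_cm in (2.3, 2.5, 2.8):
    print(f"{length_cm} cm -> {quarter_wave_resonance(length_cm / 100):.0f} Hz")
```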
Minor non-standard usage, easily corrected.
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Reviewer 2 Report
The manuscript addresses the comparison of various Head-Related Transfer Function (HRTF) measurements obtained using in-ear microphones. HRTFs play a crucial role in immersive audio because they model the acoustic path between a sound source and a listener's ears. The contribution of the article consists of two main experiments. In the first experiment, the authors compare HRTFs acquired with two different in-ear microphones, namely the Knowles FG-23329-D65 and the Sennheiser MKE2-EW Gold, with HRTFs measured using a head and torso simulator (Brüel and Kjær Type 4128C), which serves as the reference. The authors investigate the influence of microphone positions and head orientations on the results. The second experiment compares the HRTFs of five real subjects with those obtained using the head and torso simulator. The results show that different microphone placements primarily affect HRTF measurements at high frequencies, while larger differences can be attributed to the differing quality of the two in-ear microphones. Additionally, the tests on real subjects reveal significant variability in HRTFs among individuals.
The paper is well-written, and while the contribution may not be groundbreaking, it remains interesting as it provides insights into measuring HRTFs using in-ear microphones. However, before publication, certain aspects regarding the results need clarification.
The introduction should be expanded to include information about the utilization of HRTFs and their main applications, along with references to relevant works in the field (e.g., Rafaely et al. "Spatial audio signal processing for binaural reproduction of recorded acoustic scenes – review and challenges" or Cobos et al. - "An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction").
In Section 4, in addition to the log-spectral distance (LSD) formula and its formal explanation, the reader would benefit from an explanation of what the LSD metric is intended to capture and why it was chosen.
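To make concrete what an LSD captures, one common definition is the root-mean-square, over frequency bins, of the dB difference between two magnitude responses. The manuscript's exact formula (frequency range, bin spacing, any weighting) is not reproduced in this report, so the sketch below assumes that common form; the function name, band limits, and example responses are hypothetical.

```python
import numpy as np


def log_spectral_distance(h_test, h_ref, fs, f_lo=20.0, f_hi=20_000.0, n_fft=None):
    """RMS over frequency of the dB difference between two magnitude responses.

    LSD = sqrt( mean_k [ 20*log10(|H_test(f_k)| / |H_ref(f_k)|) ]^2 ),
    evaluated only on FFT bins with f_lo <= f_k <= f_hi.
    """
    n_fft = n_fft or max(len(h_test), len(h_ref))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    eps = 1e-12  # guards against log of zero at spectral nulls
    mag_test = np.abs(np.fft.rfft(h_test, n_fft))[band] + eps
    mag_ref = np.abs(np.fft.rfft(h_ref, n_fft))[band] + eps
    diff_db = 20.0 * np.log10(mag_test / mag_ref)
    return float(np.sqrt(np.mean(diff_db ** 2)))


fs = 48_000
h_ref = np.zeros(256)
h_ref[0] = 1.0                                   # hypothetical flat reference (unit impulse)
h_gain = 0.5 * h_ref                             # same shape, about 6 dB quieter everywhere
h_lowpass = np.convolve(h_ref, [0.5, 0.5])[:256]  # attenuation grows with frequency

print(log_spectral_distance(h_gain, h_ref, fs))  # about 6.02 dB
print(log_spectral_distance(h_lowpass, h_ref, fs, f_hi=10_000))
print(log_spectral_distance(h_lowpass, h_ref, fs, f_hi=20_000))
```

Because the LSD averages squared dB differences over the chosen band, its value depends on the band limits: in the example, the same low-pass deviation yields a larger LSD when the band extends to 20 kHz than when it stops at 10 kHz. LSD values computed over different bandwidths are therefore not directly comparable.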
The discussion of the first experiment, particularly the part concerning LSD values, could be improved with more detailed explanations. Specifically, lines 176 to 184 could be rephrased to include a more thorough commentary on Figure 8 and an analysis of the influence of microphone positions on LSD values (Table 1).
The plots comparing HRTFs measured with different microphones and positions are somewhat difficult to interpret because they are crowded. I suggest adding a grid behind each plot, including reference points, to make the results easier to read. Furthermore, the HRTFs measured with the Knowles microphone are plotted up to 20 kHz, although Figure 2a indicates that its response is limited to 10 kHz. For consistency, it may be safer to show the HRTFs only up to 10 kHz.
Throughout the manuscript, particularly in the conclusions, too much emphasis appears to be placed on the differences in results caused by the distinct characteristics of the microphones used for the measurements. In my opinion, the primary aim of this work was to investigate the influence of microphone position, head orientation, and real subjects on HRTF measurements. Consequently, I believe the analysis lacks guidelines or "rules of thumb" for microphone positioning, which would be more relevant to readers.
Finally, as suggested by the authors in the conclusion section, future expansion of this research could greatly benefit from validation through subjective tests. Incorporating such tests would enhance the contribution and value of the study.
Minor comments:
- Line 79: deconvolution methods the adaptive filtering techniques -> deconvolution methods and adaptive filtering techniques.
- Line 150: the reference to Figure 5 appears to be incorrect; perhaps Figure 6 was intended.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
The authors have addressed the comments raised by the reviewer.