1. Introduction
Vacuum cleaner noise is among the most annoying household appliance noises in our living environments. For users, it creates an unpleasant feeling, causes fatigue and can even cause anger after long usage. For passive listeners, it makes it almost impossible to concentrate on a task or to continue verbal communication. Vacuum cleaner noise in indoor spaces can reach 70–80 dB(A), which makes normal speech almost inaudible. Research on vacuum cleaner noise can be divided into two main categories. The first group of studies focuses on the noise generation mechanisms of a vacuum cleaner and possible design changes for noise reduction. In contrast, the second group of studies focuses on the noise annoyance evaluation of vacuum cleaners, trying to understand the dominant correlates and developing sound quality models. These two groups of studies need to be combined: it is important to understand how specific noise components affect annoyance in order to effectively reduce the total annoyance perception.
In their recent publication, Yoshido and Hatta [1] investigated the level of discomfort created by vacuum cleaner noise for active and passive listening conditions. Under active listening conditions, participants used the vacuum cleaners themselves, whereas under passive listening conditions, participants listened to recorded noise from the same vacuum cleaners. The main point of this study was to understand the difference between robot vacuum cleaners and conventional vacuum cleaners. Under the passive listening conditions, no distracting task was given to the participants. The results showed that the levels of discomfort were significantly higher under the passive listening conditions. Kumar et al. [2] focused on experimental assessments of annoyance with noise from three vacuum cleaners and the correlations between psychoacoustic parameters and annoyance evaluations. Annoyance index values for the three vacuum cleaners were calculated based on the model suggested by Altinsoy et al. [3]. They pointed out that loudness has a critical, significant effect on vacuum cleaner annoyance. Altinsoy [4] investigated the main signal characteristics of vacuum cleaners, and no difference was observed between annoyance ratings from single microphone recordings and artificial head recordings. Additionally, it was found that loudness, sharpness, roughness, tonality and articulation index values can be used to model the annoyance ratings of participants. In particular, a free interview conducted with participants at the end of the listening experiments showed that most of the subjects claimed vacuum cleaner noise to be highly annoying when it disturbed communication. Based on this information, the articulation index values were also included in the developed model of vacuum cleaner noise annoyance.
Martin et al. [5] compared the operating noise of two vacuum cleaners (with and without silent technology) in different usage scenarios (different floor coverings and different powers) with regard to the overall user experience (UX). They found that the subjective ratings depended on both the usage scenario and the vacuum cleaner model. In terms of UX, users rated the vacuum cleaner without noise reduction significantly worse than the vacuum cleaner with noise reduction. Furthermore, the rating of annoyance at low power consumption was different for different floor coverings.
Companies may be reluctant to invest in better sound quality since it is difficult to quantify the profit return from a better-sounding device. Takada et al. [6] investigated this issue by measuring the participants’ willingness to purchase a product based on its noise. They suggested that an improvement in sound quality, especially in conditions where participants were able to listen to the product noise, increased the commercial value of the better-sounding device and increased the participants’ willingness to buy the product. In another study by Takada [7], a similar approach was also applied to vehicle-door-closing sounds. In addition, two experiments on customer product selection based on acoustic performance were conducted in the same publication, relating to vacuum cleaners and hair dryers. All these experiments show that a design that adequately addresses product sound increases the willingness to buy that product.
In another study, Ih et al. [8] focused on the annoyance estimations of vacuum cleaners and derived a prediction model for annoyance. One example of vacuum cleaner noise was recorded, and some frequency ranges were classified in terms of their importance. By increasing and decreasing the levels of these defined frequencies and using the orthogonal array technique, they designed listening tests with the aim of developing an annoyance index for vacuum cleaners. In addition to annoyance, the effects of defined frequency bands on the “performance”, “loudness” and “sharpness” of the vacuum cleaners were also investigated in the listening tests. The study concluded with an artificial neural network model that was developed for the prediction of vacuum cleaner annoyance. Lyon [9] also used vacuum cleaners as an example in his work to explain the main stages of product sound quality analyses. Different components of vacuum cleaner noise were modified, and a listening test was created using a central composite design, so that a smaller number of stimuli could be used, rather than a full factorial design. The sounds of vacuum cleaner components (motor sound, suction fan noise, airflow noise and rotating brush noise) were changed, and it was found that the acceptability function of vacuum cleaner noise was dependent mainly on the airflow noise and motor noise components. It was explained that an equal reduction of 5 dB in both noise sources was required to obtain an optimally acceptable design.
In their study, Rukat et al. [10] presented a comparison of the acoustic parameters of a vacuum cleaner on different surfaces. They performed various measurements of vacuum cleaners in different arrangements, taking into account that vacuum cleaners can be classified as devices with extensive sound sources. They found that the noise emitted by vacuum cleaners depends on the type of surface used and the arrangements of the device (canister and suction nozzle). They also concluded that it is sufficient to parameterize the acoustic performance of the device with single values, where it would be more feasible for the well-being of the end user to report the most unfavorable working conditions.
In addition to perceptual studies, other studies have focused on understanding the noise generation mechanisms of vacuum cleaners. A detailed acoustic characterization of a wet-type vacuum cleaner was conducted in the publication by Buratti et al. [11]. They explained that the total emitted noise is the sum of several contributions, such as aerodynamic noise, and mechanical and electromagnetic components. The mechanical and electromagnetic components generate rotational discrete tonal noise and the aerodynamic noise generates broadband noise.
In a series of three publications [12, 13, 14], Cudina and Prezelj explained the noise generation mechanisms of a vacuum cleaner in detail. These highly detailed publications showed the complexity of the generated noise and its components. The first publication provided an overview of the noise components that can be found in vacuum cleaner noise and how the mechanical and electromagnetic portions create tonal and broadband noise characteristics. Moreover, the consideration of the performance and noise characteristics at the same time offered insight into the inconsistency between the desired suction power and the noise level. The second publication of this series focused on the aerodynamic portion of the noise and the effect of blower geometry on different flow rates. It concluded that vaned diffusers have more disadvantages than advantages and need to be omitted to reduce noise. The third and final publication of this series explained structure-borne noise. The researchers also suggested possible improvements for manufacturers to decrease structure-borne noise in vacuum cleaners.
Novakovic et al. [15] designed a new centrifugal impeller to improve the noise quality of vacuum cleaners. The aim was to increase the perceived noise quality and not only to reduce the overall noise level. The optimization process was based on two different general noise exposure models. They finalized their impeller design with triangular flow channels. In listening tests, they found that it was possible to make a user-oriented design change based on the psychoacoustic findings.
Brungart and Lauchle [16] performed sound power level measurements on a handheld vacuum cleaner to identify the main components of the noise. After analyzing the noise, they implemented modifications on the fan casing and the blade distribution, which changed the blade pass frequency. They evaluated the modifications in terms of their preference in jury testing, especially considering the magnitude of the tonal components in the overall noise. Brungart et al. [17] investigated the effect of modifications on fans and motors on an upright vacuum cleaner in another publication. They found that prominent tonal noise is created by an interaction between the electric motor cooling fan and the surrounding gussets and posts. They removed these elements in an alternating fashion such that the first blade passing frequency of the electric motor cooling fan was eliminated.
Teoh et al. [18] made modifications to a canister vacuum cleaner to reduce its noise. They pointed out that the noise of a canister vacuum cleaner consists of the blade passing noise generated by the motor and the aerodynamically induced airborne noise. Two different noise reduction methods were used: the introduction of sound insulation panels made of porous expanded polypropylene and honeycomb noise filters. After these modifications, the total noise level was reduced by 7.4 dB(A), with a reduction in suction power of only 0.93%.
This study focuses on understanding the general sound characteristics of vacuum cleaners and their annoyance perception. There are many different brands and types of vacuum cleaners with different designs in the market. The differences in design result in differences in noise characteristics: some of the devices are loud, whereas some of them have higher sharpness values. Some of the devices have distinct tonal components, whereas some of the designs are free from tonality. Then, the main question is what kind of canister-type vacuum cleaners should be selected and recorded to investigate, as much as possible, the variability in noise that can be observed, so that the variability in the market can be properly represented? What is the generic vacuum cleaner noise, and how much variability can there be between different models? The goal is to select proper samples from the market such that these selected samples can represent the variability in noise.
To reach this goal, canister-type vacuum cleaners are selected such that the samples represent the variability in noise observed among canister vacuum cleaners.
First, the basic characteristics of vacuum cleaner noise are provided for the selected examples. Then, the ranges of calculated psychoacoustic parameters for selected vacuum cleaners are presented. Variability in the acoustic and psychoacoustic parameters on different vacuum cleaners is discussed to derive common characteristics of canister-type vacuum cleaner noise. This variability is then related to the available information on the noise generation mechanisms of vacuum cleaners in the literature. This observed variability in noise samples in the market is used to set up listening experiments and their ranges.
Afterward, two sets of listening tests are conducted in this study. The first listening test is an exploratory test to understand the main correlates of vacuum cleaner annoyance. Based on the results obtained from this test, a second set of listening tests is conducted to investigate the possible interaction effect between loudness and sharpness using a factorial design with different testing methodologies.
4. Listening Test 2 (Comparison of Different Test Methods and Loudness vs. Sharpness Factorial Design)
The second listening test was performed to investigate the possible interaction between loudness and sharpness. Stimuli were generated in the form of a factorial design, with selected loudness and sharpness values. One sample of vacuum cleaner noise without tonal components was selected, and its loudness and sharpness values were systematically changed by filtering. Four different sub-tests were conducted to investigate this possible interaction. In addition, three different experimental methods were used in these sub-tests to investigate the possible bias due to the experimental method. Nine participants were asked to rate the annoyance of the vacuum cleaner noise signals with these three experimental methods. Finally, the results from the different test methodologies were compared and a repeated measures ANOVA was conducted to investigate the possible loudness–sharpness interaction.
4.1. Stimuli, Subjects and Test Methods
Listening test 1 showed that loudness and sharpness have a significant effect on annoyance perception. However, it was not clear from this experiment if there was an interaction effect between these two parameters, i.e., higher-frequency content might influence the annoyance estimations as a function of the overall loudness of the stimulus. Multicollinearity is an important problem in statistical modeling that could lead to redundant input parameters in developed quality models. Moreover, the mathematical definitions of loudness and sharpness have a strong correlation from a purely acoustical point of view [40]. These two facts are particularly critical in sound quality evaluations of vacuum cleaners, where the loudness and sharpness play important roles. First, it can be interpreted from verbal descriptions of the participants that, when the stimuli are louder, the stimuli with stronger high-frequency content are perceived as more annoying. However, we know from the definition that an increase in higher frequencies increases loudness as well as sharpness but at a different rate of change. For this reason, it is necessary to investigate whether there is an interaction effect between these two parameters.
To analyze this possible interaction effect, a series of listening tests was designed, in each of which the loudness and sharpness values were varied in the context of a factorial design. Listening test 2 was divided into four parts. Part 1 included a slider scale experiment with a 3 × 3 factorial experiment design for loudness and sharpness. Part 2 included a magnitude estimation test with a similar 3 × 3 factorial design. Part 3 of the listening test had the same 3 × 3 factorial design, but this time, a random access method was used. In Part 4, the factorial design was changed to 5 × 3 for loudness and sharpness, and the random access method was used.
Moreover, in addition to the possible loudness–sharpness interaction effect, this section also compares the different test methods to discuss the advantages and shortcomings in factorial design experiments. Mainly, for Parts 1, 2 and 3, the slider scale, magnitude estimation and random access methods were applied for the same stimuli under the same reproduction conditions. Finally, in Part 4, the variability in the loudness was extended in both directions so that the possible interaction effects can also be observable in the quieter and louder stimuli.
For Parts 1, 2 and 3, a 3 × 3 factorial design was used for the loudness and sharpness values. For Part 4, a 5 × 3 factorial experimental design was used for loudness and sharpness values. The only difference in Part 4 was that the maximum and minimum values of the loudness values were extended. The values used for each factorial design can be found in Table 7. Here, stimuli 4–12 were used for the 3 × 3 design (numbers with an asterisk), and stimuli 1, 2, 3, 13, 14 and 15 were added for the 5 × 3 design. The loudness values, calculated according to the standard ISO 532-1 [22], were selected to be approximately 16 sone, 20 sone and 25 sone for the 3 × 3 design. These values were selected such that they are in the limits measured for each vacuum cleaner given in Table 3. For the 5 × 3 factorial design, the loudness values were extended to 13, 16, 20, 25 and 30 sone. Meanwhile, the sharpness values, calculated according to the calculation method of Aures ([25] with [22]), were selected as 2.4, 2.9 and 3.3 acum.
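The two factorial designs described above can be enumerated in a few lines. The following is a minimal sketch, not the authors' procedure: the loudness and sharpness levels are taken from the text, while the stimulus numbering (loudness-major, sharpness varying fastest) is an assumption chosen so that stimuli 4–12 form the 3 × 3 subset; the function name `build_design` is hypothetical.

```python
from itertools import product

# Levels stated in the text (nominal targets; actual stimuli were "approximately" these).
LOUDNESS_SONE = [13, 16, 20, 25, 30]   # 5 x 3 design (Part 4)
SHARPNESS_ACUM = [2.4, 2.9, 3.3]       # shared by both designs


def build_design(loudness_levels, sharpness_levels):
    """Return (stimulus_number, loudness, sharpness) for every factorial cell.

    Assumption: stimuli are numbered loudness-major, starting at 1.
    """
    cells = product(loudness_levels, sharpness_levels)
    return [(n, loud, sharp) for n, (loud, sharp) in enumerate(cells, start=1)]


full_design = build_design(LOUDNESS_SONE, SHARPNESS_ACUM)            # 15 stimuli
core_design = [row for row in full_design if 4 <= row[0] <= 12]      # 3 x 3 subset

for number, loud, sharp in full_design:
    marker = "*" if 4 <= number <= 12 else " "
    print(f"{number:2d}{marker} {loud:2d} sone  {sharp} acum")
```

Under this assumed numbering, the starred rows cover 16, 20 and 25 sone, which matches the description of the 3 × 3 design.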
To obtain vacuum cleaner noise with different sharpness values, a parametric IIR low-pass filter was applied to a selected vacuum cleaner recording. The cutoff frequency of the low-pass filter was set to 4000 Hz. Around this particular frequency, vacuum cleaner noise decreases, and this decrease is different for different vacuum cleaners. Three different parametric low-pass filters with three different Q values were used, so the spectral slope seen in the FFT was different for each filter. Therefore, it was possible to obtain vacuum cleaner noise with different high-frequency components and thus different sharpness values. Since changing the high-frequency content affects the overall loudness of the sound, the overall level was slightly shifted for each filter case so that the same loudness values were obtained. One example is shown in Figure 15. Here, three stimuli have the same loudness but different sharpness values.
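To illustrate how the Q value of a low-pass filter changes the high-frequency content, the following sketch evaluates a second-order (biquad) low-pass filter at a few frequencies. It rests on several assumptions: the text does not specify the filter structure, so a standard biquad design (the widely used "Audio EQ Cookbook" form) is substituted here; the sampling rate and the three Q values are hypothetical, and only the 4000 Hz cutoff comes from the text. The subsequent level shift to equalize loudness would need an ISO 532-1 loudness model and is not shown.

```python
import cmath
import math

FS_HZ = 48000.0               # assumed sampling rate (not given in the text)
CUTOFF_HZ = 4000.0            # cutoff frequency stated in the text
Q_VALUES = [0.5, 0.707, 1.0]  # hypothetical Q values, for illustration only


def lowpass_biquad(cutoff_hz, q, fs_hz):
    """Return normalized (b, a) coefficients of a second-order low-pass filter."""
    w0 = 2.0 * math.pi * cutoff_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    b = [(1.0 - cos_w0) / 2.0, 1.0 - cos_w0, (1.0 - cos_w0) / 2.0]
    a = [1.0 + alpha, -2.0 * cos_w0, 1.0 - alpha]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def gain_db(b, a, freq_hz, fs_hz):
    """Magnitude response in dB of the filter (b, a) at one frequency."""
    z_inv = cmath.exp(-1j * 2.0 * math.pi * freq_hz / fs_hz)
    num = b[0] + b[1] * z_inv + b[2] * z_inv * z_inv
    den = a[0] + a[1] * z_inv + a[2] * z_inv * z_inv
    return 20.0 * math.log10(abs(num / den))


for q in Q_VALUES:
    b, a = lowpass_biquad(CUTOFF_HZ, q, FS_HZ)
    gains = [round(gain_db(b, a, f, FS_HZ), 1) for f in (1000.0, 4000.0, 8000.0)]
    print(f"Q={q}: gain at 1, 4 and 8 kHz = {gains} dB")
```

In this filter form, a higher Q leaves more energy around and above the cutoff, which is one plausible way to obtain stimuli with different sharpness values from the same recording.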
To generate the stimuli in this listening experiment, one original stimulus was taken as the basis. This original sound was selected such that the signal had no tonal components, a loudness of 20 sone (ISO 532-1) and a sharpness of 3.13 acum (Aures). Both loudness and sharpness values lie in the middle of the observed loudness and sharpness ranges. Intentionally, a stimulus without a tonal component was selected to eliminate any possible bias originating from the tonal component in this listening experiment.
Three different test methodologies were compared for factorial design experiments 1, 2 and 3 (Figure 16). For these three experiments, the slider scale, magnitude estimation and random access methods were used. The slider scale experiment (Figure 16, part a) used a quasi-continuous rating slider with verbal anchors (from 0 to 100 with a step size of 1) with equidistant neighboring categories (not at all, slightly, moderately, very or extremely), as in listening test 1. Participants used this slider to rate the annoyance of the given stimuli. The appearance of the stimuli was randomized for each participant, and participants were not allowed to navigate back and to change their evaluations for previous stimuli.
In the magnitude estimation experiments (Figure 16, part b), an anchor stimulus and a defined annoyance value for that particular anchor stimulus were used. Participants were then asked to rate the annoyance of a particular stimulus relative to the anchor stimulus. The reference value for annoyance was set to 100 for the anchor stimulus. Participants could listen to the two given sounds as many times as necessary and they gave their ratings by entering a number in the free space below the play button. The order of the stimuli was also random, as in the slider scale experiment. This random order was different for each participant, and participants could not go back and change their ratings.
Lastly, the random access method (Figure 16, part c) used a user interface where all of the stimuli were presented to the participant simultaneously. At any time, the participant could click the play buttons in any order to listen to the stimuli, could compare them in pairs and could change their previously established response. They could drag and drop the playback icons to the field, which contained the same verbal anchors as in the slider scale experiment. The position of the playback icon (i.e., stimulus) on the y-axis was taken as the rating of a participant.
The main difference between the random access method and the slider scale method is that, in the random access method, participants can always replay all stimuli, can change their decisions and have a better sense of control over their evaluations. However, the number of stimuli in such experiments is rather limited. Firstly, the user interface does not have enough space for an unlimited number of playback icons, and secondly, participants reported that, as the number of stimuli increased, it became more difficult to make a decision. When the “evaluation field” was filled (when a participant moved all the playback icons to their correct locations), participants clicked “evaluate” to submit the results.
Similar to listening test 1, the question was “How do you evaluate the annoyance?”, and participants were given the categories “not at all”, “slightly”, “moderately”, “very” or “extremely”. For the magnitude estimation test, the question was changed to “How do you evaluate the annoyance of signal A, compared to the signal B?” For the magnitude estimation procedure, stimulus 1 (lowest loudness and sharpness values) was used as the anchor stimulus.
Part 1 used a slider scale evaluation, Part 2 used a magnitude estimation, and Parts 3 and 4 used the random access method. Nine subjects participated in all parts of the experiment. The experiments were conducted in a soundproof audiometric booth. Three participants were female, and six participants were male. The age of the participants ranged from 25 to 38 years, with a mean of 31.6 and a standard deviation of 3.9. None of the participants reported having a known hearing problem. In each of these experiments, participants were given instructions similar to those in listening test 1.
4.2. Results
The results of Parts 1, 2 and 3 can be seen together in Figure 17. For Part 1 and Part 3, the average annoyance evaluations and standard deviations were calculated from the individual ratings of participants. The results of Part 2 are shown on a second axis in the same graph. The main reason for this visualization was that the evaluations from a magnitude estimation are ratio-scaled quantities. The evaluations of each participant were linearized by taking the log10 of each value. Since the first stimulus was an anchor stimulus with a reference value of 100, participants evaluated this signal as 100, which was shown in a linearized way as the number “2” in this figure. Similarly, if a participant rated a stimulus as “200” (two times more annoying than the anchor stimulus), this was represented approximately as 2.30 on this graph.
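The log10 linearization described above can be checked with a short sketch. The ratings below are hypothetical and only illustrate the mapping; the function name `linearize` is not from the original study.

```python
import math

ANCHOR_VALUE = 100  # reference annoyance assigned to the anchor stimulus (from the text)


def linearize(ratings):
    """Map ratio-scaled magnitude estimates to a log10 scale.

    Ratings must be positive; log10 is undefined for zero or negative values.
    """
    return [math.log10(r) for r in ratings]


# Hypothetical ratings: the anchor, twice as annoying, half as annoying.
ratings = [ANCHOR_VALUE, 200, 50]
print([round(v, 2) for v in linearize(ratings)])  # prints [2.0, 2.3, 1.7]
```

On this scale, equal ratios of annoyance correspond to equal distances: doubling adds about 0.30 and halving subtracts about 0.30.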
A similar trend was observed between the three different methodologies, whereas the ‘drops’ between the different loudness levels (stimuli 3 to 4 and stimuli 6 to 7) were more obvious in the magnitude estimation. The slider scale and the random access methods had similar trends. These evaluation methods used the same scale and eventually showed similar standard deviations. In both cases, participants had a limited response scale, where they had to provide answers between the predefined numbers (i.e., 0–100), which reflect the categorizations with verbal anchors. Depending on the number of stimuli used for a listening experiment in annoyance evaluations, both methods can be used interchangeably. However, for an experiment with a large number of stimuli, the random access method can be disadvantageous for a participant, since it might be overwhelming to place many sound samples on the evaluation surface at the same time. From a similar perspective, access to all stimuli encourages a participant to play back every possible pair, which might lead the multiple-stimulus evaluation method to become a pairwise evaluation method. In contrast, in the slider scale method, where participants evaluated a single stimulus in each round, they usually reported that they were not sure at the beginning of the test, so they wanted to change their previous evaluations depending on the newly available stimulus. The slider scale method does not provide participants an opportunity to go “back” and “correct” their response. However, with a proper training session and randomization of the order of the stimuli for each participant, we can eliminate this possible bias, which we call “beginning bias”. Eventually, for the case with nine stimuli, both methods showed similar tendencies.
However, a magnitude estimation has its own advantages and disadvantages. The main disadvantage of using a magnitude estimation in annoyance evaluations is the question itself. The main feedback from the participants was that they could not estimate “what was two/three times more/less annoying”. These estimations are more suitable for evaluating better scalable quantities, for example, “two times longer” or “three times larger surface area”. For a line with a given length, participants can better “estimate” the length of a second line; however, the same approach is not always clear for participants of annoyance evaluations. The second disadvantage can be seen in the selection of the anchor stimulus. Here, stimulus 1, which had the lowest loudness and sharpness values, was selected as the anchor stimulus. Each comparison that depends on this particular stimulus can generate different biases [41]. However, a more detailed investigation of every possible pairwise comparison of the data can be provided using a magnitude estimation. For example, a comparison of pairs 1–3 and 1–4 showed that participants could tolerate a louder stimulus (stimulus 4) better than a stimulus with the same loudness but relatively high sharpness (stimulus 3). However, the standard deviation of stimulus 3 makes this inference relatively difficult. In contrast, it was not possible to state a similar trend between stimuli 3 and 4 for Part 1 and Part 3.
The individual results for each part are shown in Figure 18, Figure 19, Figure 20 and Figure 21. In these four figures, the annoyance estimations of each loudness level and sharpness level are averaged over the number of subjects, and the error bars represent the 95% confidence intervals. In all four results, curves representing the different sharpness levels were almost parallel to each other, indicating no interaction between these two quantities. The rate of change of annoyance with changing sharpness was not different at each loudness level. In addition, a repeated measures ANOVA was performed for all tests. There was a statistically significant main effect of loudness in all parts. The same significant effect was observed for sharpness. However, the interaction effect between loudness and sharpness was not significant in any of the cases. The results of the repeated measures ANOVA can be found in Table 8.
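As an illustration of the analysis named above, the following is a minimal sketch of the F ratios in a two-way repeated measures ANOVA (both factors within subjects). It is not the authors' analysis code: the data are hypothetical (two subjects, a 2 × 2 design built without any interaction in the cell means), p-values are omitted because they need an F distribution that the standard library does not provide, and no sphericity correction is applied.

```python
def mean(values):
    return sum(values) / len(values)


def rm_anova_2way(y):
    """F ratios for a two-way repeated measures design.

    y[s][i][j] is the rating of subject s at loudness level i, sharpness level j.
    Each effect is tested against its own effect-by-subject error term.
    """
    n, a, b = len(y), len(y[0]), len(y[0][0])
    S, A, B = range(n), range(a), range(b)
    grand = mean([y[s][i][j] for s in S for i in A for j in B])
    m_a = [mean([y[s][i][j] for s in S for j in B]) for i in A]
    m_b = [mean([y[s][i][j] for s in S for i in A]) for j in B]
    m_s = [mean([y[s][i][j] for i in A for j in B]) for s in S]
    m_ab = [[mean([y[s][i][j] for s in S]) for j in B] for i in A]
    m_as = [[mean(y[s][i]) for i in A] for s in S]
    m_bs = [[mean([y[s][i][j] for i in A]) for j in B] for s in S]

    ss_a = n * b * sum((m - grand) ** 2 for m in m_a)
    ss_b = n * a * sum((m - grand) ** 2 for m in m_b)
    ss_s = a * b * sum((m - grand) ** 2 for m in m_s)
    ss_ab = n * sum((m_ab[i][j] - m_a[i] - m_b[j] + grand) ** 2 for i in A for j in B)
    ss_as = b * sum((m_as[s][i] - m_a[i] - m_s[s] + grand) ** 2 for s in S for i in A)
    ss_bs = a * sum((m_bs[s][j] - m_b[j] - m_s[s] + grand) ** 2 for s in S for j in B)
    ss_total = sum((y[s][i][j] - grand) ** 2 for s in S for i in A for j in B)
    ss_abs = ss_total - ss_a - ss_b - ss_s - ss_ab - ss_as - ss_bs

    df_a, df_b, df_s = a - 1, b - 1, n - 1
    return {
        "loudness": (ss_a / df_a) / (ss_as / (df_a * df_s)),
        "sharpness": (ss_b / df_b) / (ss_bs / (df_b * df_s)),
        "interaction": (ss_ab / (df_a * df_b)) / (ss_abs / (df_a * df_b * df_s)),
    }


# Hypothetical ratings: 2 subjects x 2 loudness levels x 2 sharpness levels.
ratings = [
    [[50, 52], [58, 61]],
    [[50, 56], [62, 67]],
]
f_ratios = rm_anova_2way(ratings)
print({name: round(value, 2) for name, value in f_ratios.items()})
```

Because the cell means of this toy data set are exactly additive (parallel lines), the interaction F ratio is zero while both main effects remain large, mirroring the pattern of parallel curves reported above.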
Finally, a comparison between Part 3 and Part 4 is shown in Figure 22. The three panels of this figure show the three sharpness levels used in both experiments. Part 4 has five loudness levels, whereas Part 3 has three loudness levels. In each panel of this figure, it can be seen that the slopes are almost the same in both experiments. This means that changing the loudness at each sharpness level results in an equal change in annoyance for both experiments. In the right panel, it is possible to see that the absolute annoyance evaluations for Part 3 are higher than those for Part 4. It appears that the participants scaled their evaluations for the maximum loudness and sharpness levels to fit within the given evaluation space (from 0 to 100).
5. Summary and Discussion
This study included a wide range of vacuum cleaner recordings selected from the market in a controlled manner. In particular, the sound power levels of the devices according to the manufacturers show a fine distribution among the observed ranges in the market, and the distribution is not stacked or concentrated on specific dB(A) values.
The recordings showed the variability in acoustic and psychoacoustic parameters and their ranges among the selected devices. Based on this observed variability, it was possible to derive the common characteristics of canister-type vacuum cleaner noise. These common characteristics were then compared with those in the literature on vacuum cleaner noise generation mechanisms. The observed variability was comparable with that reported in the literature. The measured ranges can be considered the limits of acoustic and psychoacoustic values available in the market. Ultimately, it was possible to define prototypical vacuum cleaner noise. This prototypical vacuum cleaner noise provided insight into the possible level ranges: frequency content and tonal content (i.e., frequency and intensity, respectively). Any reader working on vacuum cleaner noise can compare a measurement with the ranges defined in this study to verify that these limits are adequate for representing the entire vacuum cleaner population. If new values emerge, either due to a new sampling method (e.g., selection of different vacuum cleaners) or a new technological advancement (e.g., decreasing levels), then it is possible to extend and improve this study toward a more inclusive approach across the vacuum cleaner noise annoyance studies available in the literature. In that manner, it should be possible to obtain reproducible results between different research groups working on the sound quality of vacuum cleaners. Furthermore, the definition of prototypical vacuum cleaner noise can help future studies make parametric modifications of the defined noise and investigate the influence of salient noise characteristics on annoyance ratings.
The recording condition is a static operating condition of a vacuum cleaner, and this must be taken into account because normal working conditions can change the emitted sound. However, this effect is rather random, and due to this complexity, it is not possible to generate a comparison baseline for different vacuum cleaners.
Prototypical vacuum cleaner noise can be explained as follows: vacuum cleaner noise is quasi-stationary and has an A-weighted level that increases up to about 500 Hz, where the highest level is mostly reached. In this range, most vacuum cleaners have a tone at approximately 100 Hz, which varies in amplitude depending on the device. At frequencies higher than 500 Hz, A-weighted vacuum cleaner noise tends to decrease, with the range changing depending on whether hard flooring or carpet is used. After 5 kHz, the rate of decrease in the A-weighted levels usually increases. The noise levels fall below a threshold at about 10 kHz. In this defined range, different vacuum cleaners show different levels, although the main structure remains essentially the same. Among the defined frequency ranges and their intensities, vacuum cleaners have many tonal components lying at different frequencies. However, the tones can be roughly categorized as being concentrated in three regions: the first region is around 100 Hz, the second region is approximately 200–800 Hz, and the last region is approximately 1000–10,000 Hz. These values are calculated based on the tonality standard DIN 45681 [39] and the hearing model of Sottek [29]. Additionally, the ranges of the psychoacoustical metrics calculated in this study are given in Table 3, so any further study of vacuum cleaner sound quality can verify the reliability of these values, based on whether a new recording’s values are inside or outside of these defined ranges, keeping the recording conditions in mind.
In the second part of this study, the main correlates of the annoyance evaluations of vacuum cleaner noise were obtained in two listening tests. The first listening test included original and modified vacuum cleaner noise samples. The main correlates of the annoyance evaluations were found in this listening experiment. The second listening experiment was divided into four parts, and each part was designed in a full factorial experiment (between loudness and sharpness) with different experimental methods and ranges. The possible interaction between loudness and sharpness was investigated in these experiments.
The first listening test showed that the overall loudness, sharpness and especially tonal components at lower and higher frequencies play crucial roles in annoyance perception. The correlations between these parameters and annoyance were found to be significant. The correlation coefficients were 0.963 for loudness, 0.763 for sharpness, 0.354 for tonality at low frequencies and 0.454 for tonality at high frequencies.
However, there is a relatively strong correlation between loudness and sharpness (0.778) and a moderate correlation between loudness and hearing model tonality (0.493). Although the strong correlation between loudness and sharpness might hint at a degree of multicollinearity, sharpness was taken into account for two reasons. Firstly, based on the range of differences at high frequencies observed in
Figure 8, it makes sense to include sharpness as a parameter: it is possible to have the same loudness values and different sharpness values. Secondly, the color of the broadband, noise-like vacuum cleaner sound changes significantly when its high-frequency content changes. An expert listening to the recordings can readily hear these characteristics: different vacuum cleaners have different band-noise characteristics with different amounts of high-frequency content. Moreover, changing the high-frequency content of vacuum cleaner noise is achievable by applying sound-absorbing materials at the air exit and other slits, as can be observed in some of the “low noise” vacuum cleaners on the market. Thus, variation in sharpness can be achieved by means of noise reduction techniques, as described in the literature referred to in this study.
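To give a feel for how much multicollinearity a correlation of 0.778 implies, the following sketch computes the variance inflation factor (VIF) for a hypothetical linear model with only two predictors, where VIF = 1/(1 − r²). This is an illustration and not an analysis reported in this study; the function name is an assumption, and the commonly quoted cut-offs of about 5 or 10 are rules of thumb.

```python
def vif_two_predictors(r):
    """VIF of either predictor in a two-predictor linear model.

    r is the Pearson correlation between the two predictors.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


# Loudness-sharpness correlation reported in the text
print(round(vif_two_predictors(0.778), 2))  # 2.53
```

Under this two-predictor assumption, the resulting value stays below the usual rule-of-thumb cut-offs, which is in line with retaining sharpness as a separate parameter.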
In the first three parts of the second experiment (Part 1, Part 2 and Part 3), a 3 × 3 factorial design was used with different experimental methods, and the significance of the loudness–sharpness interaction was tested using a repeated measures ANOVA. In the last part, the loudness range was extended in a 5 × 3 factorial design so that the investigated range of loudness was close to the range observed in the market research and the interaction could be investigated for louder and quieter stimuli. In all four parts, no significant interaction between loudness and sharpness was found.
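The structure of these designs can be sketched as follows. The code enumerates the stimuli of a full factorial design and computes a descriptive interaction term for a table of cell means (cell minus row mean minus column mean plus grand mean). It is not the repeated measures ANOVA itself, which additionally tests this term against the variability between subjects; the function names and level labels are assumptions for illustration.

```python
from itertools import product


def full_factorial(loudness_levels, sharpness_levels):
    """All loudness x sharpness combinations, one per stimulus."""
    return list(product(loudness_levels, sharpness_levels))


def interaction_residuals(cell_means):
    """cell_means[i][j] = mean rating at loudness level i, sharpness level j.

    Returns cell - row mean - column mean + grand mean for every cell.
    All values are zero when the two factors act purely additively.
    """
    n_rows, n_cols = len(cell_means), len(cell_means[0])
    row_means = [sum(row) / n_cols for row in cell_means]
    col_means = [sum(row[j] for row in cell_means) / n_rows for j in range(n_cols)]
    grand = sum(row_means) / n_rows
    return [
        [cell_means[i][j] - row_means[i] - col_means[j] + grand for j in range(n_cols)]
        for i in range(n_rows)
    ]
```

For example, `full_factorial(range(3), range(3))` yields 9 stimuli and `full_factorial(range(5), range(3))` yields 15, matching the 3 × 3 and 5 × 3 designs.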
In Parts 1, 2 and 3, three different experimental methods were compared with each other using the same stimuli, same subjects and same playback conditions. The investigated methods were slider scale (Part 1), magnitude estimation (Part 2) and random access (Part 3). As expected, the slider scale and random access methods showed quite high similarity, whereas the magnitude estimation method showed clear distinctions for loudness level changes, although statistical significance was not observed when the entire database was considered.
This different behavior in the magnitude estimation test may be explained by the logarithmic bias [
41] since the stimulus with the lowest loudness and sharpness values was used as the anchor stimulus. In future studies, this effect could be further investigated using another anchor stimulus, such as the other extreme of the stimulus pool (the loudest and sharpest stimulus) or a stimulus right in the middle. In addition, after the magnitude estimation tests, participants usually commented that evaluations such as “two times more annoying” were rather complicated for them, compared with using the available scale with verbal anchors.
The results found in this study are similar to those of the cited studies on this topic [
2,
4,
8]. It was found that loudness and sharpness were strongly correlated with annoyance. In addition to these two parameters, the cited studies found high correlations between annoyance ratings and both roughness and fluctuation strength. Furthermore, tone-to-noise ratio was strongly correlated with the annoyance ratings. However, the cited studies do not report the correlations among the input parameters, so it is difficult to say whether the reported high correlations have direct psychoacoustic significance or whether the dominant effect of loudness is reflected in other input parameters due to multicollinearity. Apart from that, a direct comparison with the other cited papers is not possible because they differ in content and methodology. Yoshida’s [
1] work is a special case concerning listening attitude (active and passive listening), and since only upright cordless vacuum cleaners were used, a direct comparison was not feasible. Additionally, Lyon [
9] used a different approach, where the variation in signal characteristics was obtained via real mechanical modification of the device. The results were therefore reported in terms of these mechanical modifications rather than psychoacoustic parameters, so a direct comparison was not possible.
Finally, it is important to point out the potential limitations of this study. Although the selection of vacuum cleaners using maximum power and sound power levels was justified, any sampling is inherently subject to error. As with any other study of sound quality, a different selection of stimuli could lead to different correlations. However, this limitation was minimized by including additional synthetic stimuli. Sound recordings were made under anechoic conditions. It is reasonable to assume that the actual auditory effect might be different if the same devices were operated under normal room conditions. However, room acoustic conditions vary arbitrarily and therefore cannot serve as a reasonable basis for comparing different devices. The correlations obtained in the first listening test are only valid for the applied test method. As can be seen from the results of listening experiment 2, different test methods can also lead to different results. In addition, the significance and effect size obtained in listening experiment 2 could be different in an experiment with more participants.
As future work, the effect of tonality should also be investigated in a factorial design, allowing for a full-factorial design between all the major correlates of annoyance found in listening test 1. However, this could involve many input parameters with many levels, resulting in too many stimuli to be feasible in a single experiment. Therefore, an experimental method should be defined that allows for separate experiments and their combined evaluation with as few biases or errors as possible.
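The growth in the number of stimuli can be illustrated with a short calculation. The number of levels assumed for the two tonality factors below is hypothetical; only the 5 × 3 loudness–sharpness design is taken from this study.

```python
from math import prod


def stimulus_count(levels_per_factor):
    """Number of stimuli in a full factorial design (product of level counts)."""
    return prod(levels_per_factor.values())


# 5 x 3 from the text; three levels per tonality factor is an assumption.
design = {"loudness": 5, "sharpness": 3, "tonality_low": 3, "tonality_high": 3}
print(stimulus_count(design))  # 135
```

Even with only three levels per added factor, the design grows from 15 to 135 stimuli, before any repetitions are counted.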