1. Introduction
In today’s world, smart phones have become integral of part of day-to-day life [
1,
2,
3,
4]. Unlike computers, smartphones have applications that run only on a respective Mobile OS’s (like Android, iOS, Windows Fire Flower, etc.) with sensors for various functions [
5,
6]. However, these devices combines the features of a personal computer and mobile device such as Calling, Texting, etc.; with those of other popular digital mobile devices like personal digital assistants (PDAs), such as media player, video games, GPS navigation, digital camera, digital video camera and event calendar.
In most smart phones, to make the user aware of application functions, several notifications such as ringtones, message tones, app alerts, etc., are employed. These alerts are usually designed as sound notifications to be audible enough for the user to recognize the situation. A device that is placed in a noisy atmosphere requires loud alerts to get the user’s attention. In case the user is in a quiet environment and the device makes the same sound alert, it will be distressing to the user—as well as to the people nearby. Manual intervention by the user is required to adjust the volume by using the volume rocker buttons or by using touch user interface that may not be possible by the user in all situations. Although several sensors are available in the smartphone, automatic volume control based on the ambient environment does not currently exist in most devices.
Several researchers have focused on developing applications for detection of the ambient noise in different environments. In [
7], authors developed a system to improve the success of robotic speech presentations for users/listeners in various environments. The system senses a change in its aural relationship with the user and infers the need for an adjustment in its presentation from the user’s listening perspective. The system uses multiple ways to find the correct level of volume by collecting and collaborating all the resources. On the other hand, a new system was developed to investigate the source and context of the noise [
8]. By finding the most influential noise in the environment, a grid was created to localize the result. Finally, the system creates a sound map with the help of other noise sources. Further, the system can then extrapolate from the models to the shape of the auditory scene as a whole. Robust recognition of continuous speech by a mobile robot [
9] was proposed to perform automatic speech recognition with speakers. A microphone array is utilized with geometric source separation (GSS) along with a post-filter to reduce the interference from other sources. Using this concept, all the sounds in an environment can be easily identified. This approach is specifically designed for multiple source recognition and yet no attempt was made to improve speech recognition of a single source in the presence of environment noise.
There has been a growing interest in the field of human–computer/ human–machine interaction. In [
10], authors proposed a system combining various text-to-speech engines to develop a flexible text to speech engine. The new interaction method works exemplarily, but it does not know the context of location or situation it is placed. To overcome this problem, ‘HARK’ robot audition system [
11] was developed. The system consists of sound modules, namely source localization, sound partition and automatic speech recognition of separated speech signals that work on any mobile with different microphone configuration. This recognizes the quality of the speaker with the help of speakers mounted on it and can identify the direction and intensity of the sound.
Automatic sound level adjustment has always been a necessity for various reasons. In [
12], a system for automatic volume control was developed. In most cases, coworkers in a working environment gets disturbed due to loud sound that is generated when two user communicate accidentally. To overcome this, mobile remote presence (MRP) system [
13] was developed to identify the impacts of the sidetone operation in the audio–visual context for annoyance reduction. Automatic volume control for preserving intelligibility [
14] was developed to overcome the noise levels of the surroundings that change abruptly or can be transient. Due to repeated occurrences, it is difficult to alter the volume of an audio device by humans to adjust for the changing noise levels. Therefore, a system is necessary to avoid this problem and to analyze the background noise using the microphone in real time. It compensates only for fluctuating noise in the frequency and time domains that interferes with the intelligibility of speech.
Auditory perspective taking [
15] is a concept where the system imagines about being in a new environment or place and predicts what they can hear and how it will affect their general comprehension. With this knowledge of another’s auditory perspective, user can adapt to his auditory output to overcome ambient and environmental differences and ensure the quality of the conversation. Robot self-noise (or ego noise), echoes, and reverberation are common interferences that factor in this problem. A prototype robotic interface that uses perspective taking is developed [
15] to estimate the effectiveness of its own speech presentation and takes steps to improve intelligibility for human listeners. Here, the robot produces communication more smartly by context awareness and adjusts the volume accordingly. This success confirms the importance of these emerging roadmaps in user interface design, and the developed prototype shows the wide range of skills available to a perspective taking of the robotic interface. Adaptive volume control is one of the important features in mobile application. A new reinforcement learning based application named as RE–IN [
16] was developed for android platform for automatic volume control and screen brightness adjustment based on the surrounding noise while playing games. However, the developed application mainly relies on the ambient noise to adjust the volume and do not consider the human activity and state as it was mainly designed for gaming.
Mobile phone user community is required to be aware of sensors function and ability. For this purpose, machine learning has been employed to bridge the gap between the sensor developers [
17] and mobile phone users. The knowledge base technique gives a clear picture about the functions of mobile sensors and its durability in various smart phones and provides the user with the required knowledge to select the best sensor for mobile applications. Researchers have shown that mobile sensor sampling algorithm [
18] performs best in terms of energy saving as well as mobile area coverage by considering the input under normal walking performed by the human. This algorithm utilizes velocity of human as an important feature for mobile sensing in order to overcome the overlaps caused by spatial regularities. These works highlight the need for development of applications that rely on user activity and user state in order improve the performance of the device and provide comfort to the user.
Various studies show that on average a person spends more time on mobile phones rather than sleeping. A statistical study [
19] conducted on hand–phone activities by humans and its popularity among youngsters shows that, the amount of time spent by an average person using mobile phone is very high. If further concludes that the mobile device needs to be smart in order to reduce user interaction as much as possible.
To overcome the problem of automatic volume adjustment in a smart phone, in this study, an innovative ambient noise activity control method is developed using integrated multi-modal sensing approach. As current generation smartphones usually employs several types of sensors for various applications [
20,
21,
22,
23,
24], these sensors can be utilized to identify human activity and ambient environment. The proposed system will adjust the volume according to its environment, based on the decibel level that is measured through the microphone. Moreover, the software is developed as an application and will be useful in public places, industrial sites, transportation, hospitals, business meetings, libraries etc.
Figure 1 depicts the stream of multisensors employed. The prime objective of this application is to adjust the sound level according to the environment without user intervention. This has the following advantages:
Eliminates the user need for pressing the volume button;
Identify a suitable volume level automatically for dynamic environment;
Makes the mobile device more context-aware for volume adjustment.
2. Methods
For the development of the proposed system, the Android platform was chosen due to its market share and popularity.
Figure 2 presents the proposed framework with multisensors. When the user presses the image button, the program checks for the first-time execution. If so, the range selection and the calibration module will be executed correspondingly. Else, the time module will declare the program to execute in the given time. Thereafter the activity recognition modules recognize the activity of the user and the decibel module will measure the ambient decibel level accordingly. Finally, using these inputs, the sound level selector module adjusts the sound level according to the ambient setting. In what follows, the key steps in
Figure 2 are organized as subsections accordingly.
2.1. Device-Calibration Algorithm (Algorithm 1)
Every device has its own hardware configuration. To make the program compatible with various devices, calibration must be performed to the device’s microphones. First, a sample value is measured in a calm place (measured in decibels). According to the noise level chart [
15], a calm place should have less than 15 dB of sound. A test value is pre–decidedly stored in the program. By comparing the test value with the sample volume, the calibration value is derived. Further, the sound level is measured in decibels (dB).
Algorithm 1: Device-Calibration |
Step 1: get current decibel value Step 2: if decibel value < 15 or decibel value = 15 Calibration value = 0 else if decibel value > 15 Calibration value = decibel value- 15 Step 3: return calibration |
Figure 3 shows the execution of calibration process and
Figure 4 depicts the dialog box used to select the minimum and maximum volume. The dialog box contains two number picker and two buttons. The volume level is classified into five levels, namely, very low, low, medium, high and very high.
2.2. Grade-Selection Algorithm (Algorithm 2)
During the initial usage, the user needs to specify the range of volume (minimum and maximum values) according to the user’s requirement in the respective number pickers. The first number picker is used to specify the minimum volume whereas the second number picker is used to specify the highest volume level the user can bear. The grade is selected as follows:
where
n is the number of levels,
Min_V and
Max_V are the user selected numbers in number picker 1 and picker 2, respectively.
Therefore, the first number picker ranges from 1–15 and this range is already predefined in the mobile application. Moreover, the user has to select a volume level greater than the minimum level so that there will be no mathematical error in the identification of the level.
Algorithm 2: Grade-Selection |
Input: integer values from number pickers Output: the size of the level
- Step 1:
Open dialogue with number picker 1 and number Picker 2 - Step 2:
Get the selected number from number picker 1 and store as min - Step 3:
get a selected number from number picker 2 and store as max - Step 4:
if min is greater than max Display an error saying “minimum volume can’t be higher than maximum” - Step 4.1:
else Total = maximum – minimum - Step 5:
level = total/5
|
In this algorithm, the time unit checks whether the program runs at a specified time given by the user. From the time picker, the starting and the ending time of the program are received. The values from the time picker are in 12 h format. To receive the current time, the system calls the calendar method. This method returns the current time of the device in 24 h format. To match this data, Typecasting is employed to convert different data types into integer (minutes).
The running time for the program is indicated by From_time and To_time. This To _time is derived from the second time picker module since it is in the 12 h format (HH/MM/ (am_pm)). If the selected time is in the form of (HH/MM/pm) the following formula is used.
where
P is 12*60. Otherwise, if the time specified in (HH/MM/am), then this is used for typecasting.
The following equation is used if the current time is typecast to minutes.
The program will execute if the current time is between From_time and To_time. If the current _time is greater than From _time and lesser than To _time, the program quits itself with a message "time up".
The user provides the start time and stop time as inputs in 12 h format as shown in
Figure 5.
2.3. Activity-Recognition Algorithm (Algorithm 3)
The activity-recognition algorithm identifies the current activity of the user. The activities can be classified into simple and complex activities. Activities such as walking, walking on stairs, jogging, jumping, driving a motorcycle, driving a car or holding a mobile in hand and placing the mobile on table are known as simple activities. Commonly used sensors in mobile phones are accelerometer [
25,
26], gyroscope [
27], magnetometer [
28,
29], proximity sensor, light sensor [
30], barometer [
31,
32], thermometer, air humidity sensor [
33,
34], etc. The user activity can be recognized with the help of these sensors. Accelerometer sensor measures the acceleration in any axis. However, by moving the device in any axis, the sensor data will spike. The device’s orientation along its three-axis can be sensed by the gyroscope sensor [
35]. The acceleration can be measured along three spatial axis at any time instant as
The unit of x, y and z are in g-force [
36]. In the simple walking activity, if the xi value of Acci spikes for a certain time with short intervals, the device recognizes that the user is moving forward step-by-step. Similarly, if the device is idle there will not be any spikes in any Acci value and, but this occurs under two cases
Case 1: acci = < 0,0,0 > &gyro = < 180,0,0 >
If the device is idle and tilted 1800, it can be inferred that the device is kept on a table.
Case 2: acci = <0,0,0 > &gyro = < (90–180),0,0 >
If the device is idle and tilted 900 to 1800 degree, based on the test set we assume that the user is holding the device in hand. This activity recognition module is designed to identify the following simple activities like STILL—the device is placed anywhere outside; TILTING—user holds the device; and OTHERS—represents any other detected activity. Google activity recognition API is employed for user activity identification.
Algorithm 3: Activity-Recognition Algorithm |
Input: crisp values from sensors Output: Detected activity
- Step 1:
Get ACTIVITY_RECOGNITION permission from the User If permission granted go to step 2. Else go to decibel module - Step 2:
Connect with Google Activity Recognition API - Step 3:
Run method request Activity Updates - Step 4:
Check if a confident level of detected Activity. STILL = 100%, Put detected_activity = STILL - Step 4.1:
Else if confident level of detected Activity. TILTING = 100%, Put detected_activity= TILTING - Step 4.2:
else if detectedActivity = OTHERS Put detected_activity= OTHERS - Step 5:
return detected_activity.
|
2.4. Decibel-Detection Algorithm (Algorithm 4)
Using the microphone, a short burst of audio is recorded and saved in the ”*.avi” format. Then the “.avi” file is analyzed to identify the maximum amplitude (hertz per second) that ranges between 20 Hz to 2 kHz in a typical mobile phone. The decibel unit runs continuously to find the current decibel for each second when the decibel module is executed. For each iteration, 500 ms of audio is recorded and analyzed to compute the current decibel value. This decibel value is then sent to the volume selector unit.
Algorithm 4: Decibel-Detection Algorithm |
Input: short bursts of microphone recording Output: Decibel level
- Step 1:
Create and declare variable decibel and amplitude - Step 2:
Create a self-looping thread with 500 ms interval - Step 3:
Start media recorder with a microphone as source - Step 4:
For each iteration get amplitude from the media recorder - Step 5:
Decibel = Log10 (amplitude/32000) - Step 6:
Decibel = decibel value—calibration value - Step 7:
return decibel
|
2.5. Volume-Selection Algorithm (Algorithm 5)
Sound output in android is classified into these major categories
Alert sound;
Notification sound;
Media volume.
This selector unit adjusts the volume of these sounds according to the crisp values of activity recognition and decibel unit. If the device is in the user’s hand and active; it concludes that the user already has the attention of the device. At this time, the device need not make an alert noise for notification or ringtone. The value TILTING indicates that the device is in the user’s hand and active. In case of detected activity is STILL (the device is idle and resting on a place) then the device needs to make an alert sound to get the user’s attention. Therefore, when the system is in STILL state or another state, the system should make a sound. The volume of this sound must be moderated according to the ambient setting.
The decibel value from the decibel unit is used to decide the volume level of the device. If the decibel value ranges from 10 to 20, it indicates that the device is in a relatively quiet place, the device can get the user’s attention by using minimum volume. The volume escalates as the decibel level increases in the surrounding. If device is placed in a noisy or crowded environment, then the decibel volume will be high and hence the device needs to increase the sound level in order to get user’s attention. This unit will adjust the notification, ringtone and media volume at the same time if the device is in STILL or OTHER state. However, in TILTING state, only the media volume is adjusted.
The volume selector is executed every 10–20 s based on the activity. If the activity is identified as STILL, the volume selector gets the average decibel for 20 s and then it changes the volume level accordingly. For any other activity other than TILTING, the volume selector is executed for every 10 s. The volume selection is identified by the following equation:
where
and
G is the level and ‘i’ is the position.
Algorithm 5: Volume-Selection Algorithm |
Input: detected activity, the decibel level Output: volume level selection If detected activity is TILTING{ set(ringtone, Notification) = 0; for each I = 0 to 20 ( sum<− sum+decibel ) Average<−sum/20 Switch (average) Case 0–10: media volume < − Case 11–20: media volume< − Case 21–30: media volume < − Case 31–40: media volume < − Case >40: media volume <− } Else {if detected activity is STILL( put N < −20 ) Else (put N < −10) for each I = 0 to N sum = sum + decibel Average = sum/N end Switch (average) { Case 0–10: All volume< − Case 11–20: All volume< − Case 21–30: All volume < − Case 31–40: All volume < − Case > 40: All volume < − }} |
3. Results and Discussion
For experimental evaluation, five mobile phones from different makers—Samsung galaxy j2, Moto, HTC, Sony and Blackberry—were employed. The developed algorithms were embedded within the software in the form of a mobile application. The software relied on the inbuilt hardware available in the mobile phones for activity detection and volume detection. The application was tested in various environments and in different user activity states. The calibration was performed in a calm environment at various locations such as the living room, residential area and library.
Figure 6. depicts the outcome of the calibration process in three different environments where noise levels are different. The calibrated value shows small variations in the living room and library due to low noise level across different devices. The volume level ranges from 0 to 15 and the user has to choose between two values—minimum and maximum. In total there can be 120 combinations. The first one must be less than others, and these values are used to find the value of grade as follows:
The proposed system considers sound in five different levels, namely very low, low, medium, high and very high. As sound level varies for different values of user inputs, appropriate grade is assigned.
Figure 7 shows the volume level value for three sets of min and max values like Set 1 = (1, 15), Set 2 = (2, 13) and Set 3 = (3, 12). The volume selector unit selects the volume level according to the decibel level and detected activity. It is executed for every 10 s or 10-decibel averages according to the detected activity. To evaluate the program, various time averages are considered for analysis by considering different activities. For tilting and still activities, the decibel level of 20 is chosen. For every 10 s, the volume selector is executed.
Figure 8 represents the decibel selector that is executed for every 10 samples in an open park environment. From this, it is clear that the ambient volume level changes frequently and causes annoyance to the user.
Figure 9 depicts the volume level selection for 20 samples average. Although the volume change is minimum, there is a higher chance of user listening to the media at an unsuitable volume level.
Figure 10 shows the volume level selection for 30 samples average. The frequency of volume changed is low and the chance of user listening to media at correct volume is high. Based on the analysis, 20 samples are chosen to be more appropriate for the tilting activity. For other activities, the number of samples chosen is 10 since the activity is unknown. To overcome this problem, the volume selector is executed every 5 s.
Tests are also conducted to verify the activity recognition based on the activity detection API where the input is obtained from the gyro sensor in the device. Four activities (still, tilting, walking and running) are performed for validation to verify if the software detects the correct activities.
Figure 11 shows the sensors outputs in three axes x, y and z-axis for different activities. The activity detection API relies on the variations in the sensor outputs to identify these activities as shown in
Table 1 Although several activity states can be included in the algorithm, for illustration only four activities are chosen. More number of activities may also be included for better understanding about the activity recognition such as climbing, falling, running, etc. In some cases, complex activities like driving can be also considered to test the algorithms. When the detected activity is TILTING, the ringtone and notification is set as zero and only the media sound level is changed.
Figure 12 shows the tilting state activity where the ringtone and the notification volume are set as zero. According to the algorithm, all sound levels should be selected minimum. When the detected activity is still or other; volume is changed according to the surrounding volume level.
Figure 13,
Figure 14 and
Figure 15 represents various average decibels levels. When the average sound level is considered low and the detected activity is still, the volume level selector algorithm increases the sound level by one grade. In case, if the surrounding sound level average is above 25, the sound level selector algorithm selects the higher level. As the ambient sound level increases, the device sound level is increased in order to make the device sound audible for the user. The Maximum sound level is selected when the decibel level of the surrounding environment goes above 50. A Decibel level above 50 represents that the ambient sound level is noisy for the user. Therefore, the device selects the highest possible volume level to grab the user attention in the case of notification or ringtone. Taking into account the different manufactures of the mobile phones together with their characteristics, the decibel value for a user may differ. Hence user feedback and selections of minimum and maximum volume levels plays in key role in the identification of a comfortable volume level. The comparison of the proposed system with existing systems is tabulated in
Table 1.
From the comparative analysis, it is clear that the proposed system offers various features when compared to the existing methods. Moreover, the developed algorithm provides greater accuracy to detect the activity and sound level in real time.
4. Discussion
In this study, different algorithms are developed in the proposed system—Grade-selection algorithm for computing execution time; Activity-recognition algorithm that fuses various sensors for identification of user activity; Decibel-selection algorithm for effective sound level measurement in both device and surrounding environment; Volume selection for adjusting the volume automatically based on the ambient noise level. This proposed application was tested at various locations with five different sound levels. Based on the tests conducted with several mobile devices, the sound level in the device was regulated to be within the comfort level of the users. Moreover, the system performed well for low noise level and high noise levels as shown in
Figure 8. When the application identifies the unknown user activity, the time interval is reduced to five seconds. For noise levels greater than 15 decibels, the volume selector module automatically regulates the volume to the user comfortable level.
As compared to the manual systems [
36,
38] where the user intervention is required, this system performs better even in commercial areas with higher noise levels. Most of the existing methods [
13,
16] focus on a specific application and are not applicable to various environments and user activities. In comparison, the proposed work relies on user activity and ambient noise levels from various environments for automatic volume control. More types of activity can be included to further improve the application scope and performance that adapts more to the user lifestyle.
Different people may have different motion intensities and individual comfort levels. In the present scenario, the application only relies on the type of the activity and not the intensity level. The effect of user intensity level on the volume is yet to be analyzed and will be a topic for our future research.
The proposed system also does not rely on GPS sensor data and google map location to identify the environment. Instead, with the help of sensors it identifies the noise level and identifies the activity and thereby automatically adjusts the sound level accordingly according to its dynamic state. Future work will rely on including GPS location data to further improve the application scope and performance.