1. Introduction
At large musical events, an audio mixer is used to shape the sound by manually adjusting volume and equalizer knobs. However, reaching the desired result precisely with manual knobs takes a lot of time. An audio mixer is an electronic device for combining and adjusting many different audio signals [1]. It can not only modify the spectrum and amplitude of the input signals, but also combine separate audio channels into the output audio signal. Compared with traditional analog mixers [2], digital mixers [3] can apply a variety of effects at the same time. They offer cleaner sound quality, a higher signal-to-noise ratio, noise suppression, and a smaller device size, but they are more complicated to operate.
In general, a sound engineer, who is responsible for the public address (PA) system, must deal with all requests from performers, musicians, and listeners during a live band performance. Because different instruments are played, performers have different, individual adjustment needs. The traditional audio mixer has multiple input channels, and the audio input passes through the mixer to be played via speakers [4]. The sound engineer often has to listen to the audio effects on the spot and then make adjustments immediately. It is not easy for an inexperienced person to control every detailed operation and sound effect. To adjust the mixer system effectively, the integration of platform functions has gradually attracted attention. For example, Jaloudi’s [5] integrated wireless and wired microphone system transmits the audio input via a wired cable to the RCA connector on the mixing console. In addition, Yao [6] developed an interface that makes it easy to adjust the sound effect in an application; the audio output is transmitted to a Class-D amplifier via a wired cable. Nevertheless, it is still hard work, even for an experienced sound engineer, to remotely control an audio mixer [7] and to make the auditory effects more impressive. Therefore, some approaches have proposed flexible ways to handle manual adjustment and control [1,8] on an audio mixer, or to reduce the difficulty of operating the mixer.
The proposed development platform provides remote audio control and speech recognition. It connects the application (APP) on a mobile phone to the mixer via Bluetooth, so the sound engineer can walk around in front of the stage while testing and promptly adjust the parameter settings to achieve the best audio output. The proposed smart system platform is a prototype design that integrates various modules. The DSP kernel of the mixer is the KT0707 [9], and the microcontroller unit (MCU) kernel of the data transfer controller (DTC) is implemented with the ATmega328P [10]. The self-developed APP uses the SpeechRecognizer library [11] to combine speech recognition and remote-control technologies: the speech is uploaded to the Google cloud server, and the service returns the recognized text as string data, which helps maximize the accuracy of speech control in the APP.
The problems encountered with the audio mixer and the proposed solutions are listed in Table 1. The proposed work addresses the drawbacks of the traditional audio mixer, i.e., labor-intensive, non-remote control [2,3,4,5,6,7], imprecise mixer knob tuning [2,3,4,5,7], and a dense layout of the operation interface [4,5,8]. Further detail is given in Section 2. For comparison with products on the market, smart speakers with a similar price and similar functions [12,13,14] are used in the experimental results. Although these smart speakers also provide an application with remote control and speech recognition functions, their equalizer and reverberation cannot be tuned appropriately, so the user cannot achieve the best sound effect.
2. Methods
The proposed smart system platform is divided into three parts: (1) digital mixer adjustment (DSP, KT0707), (2) the DTC, and (3) the Android APP design. Figure 1 shows the block diagram of the proposed system platform design.
To satisfy the performers’ requests, the KT0707 was chosen as the DSP of the proposed design after several practical tests. The KT0707 can simulate the performance of a small-scale outdoor band: three audio input channels are assigned to the keyboard, bass, and guitar, and a wireless microphone input channel is assigned to the vocals. The audio signals from the wireless microphone and the three-channel audio sources are transmitted continuously to the mixer’s analog-to-digital converter (ADC) over radio frequency. All audio input processed by the audio mixer can be played via speakers. The DSP kernel of the audio mixer has strong digital processing capability and can handle many control commands requested by the user in the APP, such as “volume tuning”, “volume up by 10%”, and “adjustment of high/middle/low frequency”. The DTC receives the control command packets from the APP via Bluetooth and then forwards the commands to the DSP over I2C. In addition to the UI, the APP provides speech recognition and semantic recognition functions, which convert speech into text and interpret it. The recognized semantic contents are mapped to the corresponding control codes, and all packets containing the control codes are transmitted to the DTC via Bluetooth.
2.1. Proposed Data Transfer Controller Design
2.1.1. Hardware Design
The ATmega328P serves as the MCU kernel of the DTC circuit. The AMS1117-3.3V and AMS1117-5V [15] voltage regulators supply power to the Bluetooth 5.0 module, the KT0707 (DSP), and the MCU. A 16 MHz quartz oscillator provides the MCU clock. The SPI library functions of the Arduino platform are used to burn the compiled program onto the MCU [16]. A Bluetooth communication module [17] with a UART interface is used in the proposed design. On the printed circuit board (PCB), copper is retained in the unused area and connected to ground to strengthen the stability of the Bluetooth transmission. The Bluetooth module is placed along the edge of the PCB to achieve better transmission. Figure 2 shows the proposed DTC circuit layout, which has a size of 4.8 × 4.2 cm².
2.1.2. Firmware Design
The DTC mainly serves as a communication bridge between the DSP and the APP on the mobile phone. The firmware for transmission and communication is programmed according to the different protocols involved. The I2C library [18] is programmed according to the timing diagram provided for the KT0707, as shown in Figure 3. A self-defined Bluetooth packet format (SBPF), shown in Figure 4, is designed based on the Bluetooth 5.0 transmission protocol [19].
Figure 4a shows the SBPF format, which has a length of 11 bytes: one byte marks the left edge of the block, one byte holds the packet length, two bytes store the register address, four bytes store the register value, two bytes hold the CRC, and the last byte marks the right edge of the block. In addition, the proposed SBPF adopts a CRC (CRC-16-CCITT) [20] and an ACK mechanism to improve the accuracy and integrity of the transmitted data.
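The 11-byte layout described above can be illustrated with a minimal Python sketch. Several details are assumptions, not taken from the article: the edge byte values (`LEFT_EDGE`, `RIGHT_EDGE`), big-endian byte order, a length field that holds the total packet length, the CRC variant (polynomial 0x1021 with initial value 0xFFFF), and CRC coverage of the length, address, and value fields. The function names are hypothetical.

```python
import struct

# Hypothetical framing bytes: the article does not give the edge values.
LEFT_EDGE = 0x7B
RIGHT_EDGE = 0x7D
PACKET_LEN = 11  # total SBPF length stated in the article


def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    """Bitwise CRC-16-CCITT (polynomial 0x1021, MSB first, assumed init 0xFFFF)."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def build_packet(reg_addr: int, reg_value: int) -> bytes:
    """Pack edge, length, 2-byte address, 4-byte value, 2-byte CRC, edge."""
    # 1 + 2 + 4 = 7 bytes; ">" selects big-endian with no padding (assumed order).
    body = struct.pack(">BHI", PACKET_LEN, reg_addr, reg_value)
    crc = crc16_ccitt(body)  # assumed coverage: length, address, and value
    return bytes([LEFT_EDGE]) + body + struct.pack(">H", crc) + bytes([RIGHT_EDGE])


def parse_packet(packet: bytes):
    """Return (reg_addr, reg_value), or None if the framing or CRC is invalid."""
    if len(packet) != PACKET_LEN:
        return None
    if packet[0] != LEFT_EDGE or packet[-1] != RIGHT_EDGE:
        return None
    body = packet[1:8]
    (crc,) = struct.unpack(">H", packet[8:10])
    if crc16_ccitt(body) != crc:
        return None
    _, reg_addr, reg_value = struct.unpack(">BHI", body)
    return reg_addr, reg_value
```

For example, `parse_packet(build_packet(0x0010, 50))` returns `(16, 50)`, while a packet with any single flipped bit in the body is rejected by the CRC check. The register address `0x0010` is a placeholder, not a KT0707 register.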
Figure 4b provides further details of the communication protocol. The MCU acts as the slave terminal, and the proposed APP is the master terminal. In practice, a dedicated library is created to handle the proposed SBPF. The library has three main functions, “General_Tx”, “General_Rx”, and “Interrupt_Process”, which together perform the following actions: (1) packaging and scheduling, (2) filtering and identifying received data, and (3) time scheduling and retransmission. To confirm the integrity of packet transmission, an ACK mechanism is deployed in the proposed design. If the ACK is successfully received by the MCU, the current packet has been transmitted correctly; otherwise, the current packet is retransmitted. However, if the maximum number of retransmissions is exceeded, the current packet is dropped to avoid delaying subsequent packets.
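The ACK and retransmission behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the article's library: the retry limit `MAX_RETRIES`, the function name `send_with_ack`, and the `transmit` and `wait_for_ack` callbacks (standing in for the Bluetooth link and its timeout handling) are all hypothetical.

```python
from typing import Callable

MAX_RETRIES = 3  # assumed limit; the article does not state the value


def send_with_ack(packet: bytes,
                  transmit: Callable[[bytes], None],
                  wait_for_ack: Callable[[], bool],
                  max_retries: int = MAX_RETRIES) -> bool:
    """Send a packet and retransmit until an ACK arrives or retries run out.

    Returns True if the packet was acknowledged, False if it was dropped.
    """
    attempts = 1 + max_retries  # first transmission plus retransmissions
    for _ in range(attempts):
        transmit(packet)
        if wait_for_ack():  # True when an ACK arrives before the timeout
            return True
    # Retry limit exceeded: drop the packet so later packets are not delayed.
    return False
```

Returning a boolean instead of raising an error keeps the caller free to continue with the next queued packet, which matches the stated goal of not delaying subsequent packets.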
The control commands of the DSP, including the register names, values, and features, are partially listed in Table 2. The proposed smart system platform mainly provides the following functions: (1) volume control switches for four input sources, (2) equalization (EQ; low, middle, and high frequency) switches and gain adjustments for four input sources, and (3) reverberation processing for two input sources.
The reverberation adjustment is divided into two parts: (1) the reverberation room size, which simulates the size of the space in which the sound is produced, with a smaller value representing a smaller room; and (2) the reverberation level, which sets how strongly the reverberation is applied, with a larger value giving a stronger sense of space. The EQ-frequency control adjusts the center frequency of the gain, and its value ranges from 15 Hz to 20,600 Hz. The EQ-gain control adjusts the gain centered on that frequency, and its value ranges from −24 dB to 12 dB.
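As a small illustration of the ranges stated above, the following Python sketch clamps EQ settings to the valid interval and maps a slider position onto the gain range. The linear slider mapping, the 100-step slider, and the function names are assumptions for illustration; the article does not describe how slider positions or register values are encoded.

```python
# Ranges stated in the article.
EQ_FREQ_MIN_HZ, EQ_FREQ_MAX_HZ = 15, 20600
EQ_GAIN_MIN_DB, EQ_GAIN_MAX_DB = -24.0, 12.0


def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def slider_to_gain_db(position: int, steps: int = 100) -> float:
    """Map a slider position in 0..steps linearly onto the EQ gain range (assumed mapping)."""
    position = clamp(position, 0, steps)
    return EQ_GAIN_MIN_DB + (EQ_GAIN_MAX_DB - EQ_GAIN_MIN_DB) * position / steps


def validate_eq(freq_hz, gain_db):
    """Return the (frequency, gain) pair clamped to the supported ranges."""
    return (clamp(freq_hz, EQ_FREQ_MIN_HZ, EQ_FREQ_MAX_HZ),
            clamp(gain_db, EQ_GAIN_MIN_DB, EQ_GAIN_MAX_DB))
```

With this mapping, the slider midpoint corresponds to −6 dB, because the gain range is not symmetric around 0 dB.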
2.2. Proposed APP Design
The APP design is the core of the proposed system. The complete APP functional block diagram is shown in Figure 5. Because the APP is multi-threaded, UI interaction and speech recognition can be executed simultaneously. In contrast, the MCU uses a single-thread model, which executes only one event at a time to prevent memory conflicts. In addition to the above-mentioned functions, the APP uses Bluetooth Low Energy (BLE) technology for its connection and transmission functions, which reduces the communication delay with the MCU and the power consumption of the smart system platform. The proposed APP also includes speech recognition, semantic analysis, and storage of the DSP instruction sets.
In this work, the speech recognition technology provided on the Android platform by Google SpeechRecognizer [21] is employed to realize speech control. It converts speech into text with high accuracy and can operate in various noisy environments. The recognized control command is converted into text, the function and the scale in the text are analyzed by semantic analysis, and the resulting packets are then transmitted to the DSP to tune the mixer. For example, if the recognized control command is “turn on the volume”, the APP determines that the volume is the target and that its switch should be turned on. The pseudo code of speech recognition is shown in Figure 6.
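The semantic analysis step, which extracts the function and the scale from the recognized text, can be sketched in Python as simple keyword matching. This is an illustrative stand-in, not the pseudo code of Figure 6: the vocabulary in `TARGETS` and `ACTIONS`, the `Command` type, and the function name `parse_command` are hypothetical, and the real APP is written in Java on top of the Android SpeechRecognizer output.

```python
import re
from typing import NamedTuple, Optional


class Command(NamedTuple):
    target: str            # function to tune, e.g. "volume"
    action: str            # "on", "off", "up", or "down"
    amount: Optional[int]  # scale in percent, if one was spoken


# Hypothetical vocabulary; the article does not list the full command set.
TARGETS = ("volume", "reverb", "equalizer")
ACTIONS = {"turn on": "on", "turn off": "off", "up": "up", "down": "down"}


def parse_command(text: str) -> Optional[Command]:
    """Extract the target, action, and optional percentage from recognized text."""
    text = text.lower()
    target = next((t for t in TARGETS if t in text), None)
    if target is None:
        return None
    action = None
    for phrase, code in ACTIONS.items():
        # Word boundaries avoid matching "up" inside longer words.
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            action = code
            break
    if action is None:
        return None
    match = re.search(r"(\d+)\s*(?:%|percent)", text)
    amount = int(match.group(1)) if match else None
    return Command(target, action, amount)
```

For instance, `parse_command("turn on the volume")` yields a command with target `"volume"` and action `"on"`, and `parse_command("volume up by 10%")` additionally carries the amount `10`. Mapping the parsed command to a DSP register address and value would follow as a separate step.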
The main SBPF control program on the APP, named BEPAPP, was developed in-house for the proposed system. The program on the MCU receives, analyzes, and responds to the packets. C++ is used on the MCU, and Java is used for the APP. Because the MCU follows a single-thread model, it relies on an interrupt mechanism, whereas the Android system can run multiple threads, which improves overall performance.
The flowcharts of the proposed smart digital mixer system, covering speech recognition, the UI, and Bluetooth, are shown in Figure 7. The right flowchart in Figure 7 describes the Bluetooth transmission process. In the APP, “scan to peripheral devices” can be triggered, and a Bluetooth connection to the audio mixer can be created. Once the connection has been created successfully, the user can enter the main operation interface. The middle flowchart in Figure 7 describes the user interface process, in which the control command can be obtained according to the speech recognition. The component ID is detected and triggered according to the control command. The left flowchart in Figure 7 describes the speech recognition process. Speech recognition is triggered once the voice is received by the APP, and the content of the speech is converted to text. Next, semantic analysis is applied to obtain the corresponding control command. Finally, the control command is transmitted to the MCU via SBPF over Bluetooth.
3. Realization Results, Comparison, and Discussion
For the interface of the APP, ConstraintLayout [22], provided by the Android platform, is used as the base layout method. The operation process of the APP is introduced in Figure 8. The initial screen of the interface is shown in Figure 8a. Bluetooth scanning is activated when the “start scanning” button is pressed, and all devices that can be connected are shown in the list. Once the Bluetooth device is connected, the screen in Figure 8b is shown. Pressing the “setting” button opens the main screen of the control functions, which is shown in Figure 8d. The tuning button for the main volume is located in the middle of the screen. The volume control screen is shown in Figure 8c. Its slider bars control the volume of the wired/wireless microphone, the volume of the external sound source, the reverberation level, and the room size. The equalizer control screen is shown in Figure 8e. Its slider bars control the gain of the high, medium, and low frequency bands of the frequency response of the input audio. The hardware framework of the proposed system is shown in Figure 9. The user can operate the mixer system from the APP. Once the devices are connected via Bluetooth, audio tuning can be performed using voice control or touch control. The proposed design overcomes the drawbacks of the traditional audio mixer system, turning it into a smart system with intuitive operation and rapid response.
Table 3 summarizes the comparison of seven items across common commercial smart speaker products [7,8,12,13,14]. As price is one of the factors considered, the total cost of the proposed design was USD 398.95, which included a JBL PartyBox 300 speaker (USD 379.95) and the smart digital mixer system (USD 19).
The proposed low-cost smart digital mixer system includes an application (APP), a data transfer controller (DTC), and a digital signal processor (DSP). The DSP can be connected to diverse speaker output devices with a 3.5 mm AUX cable, so it is not limited to the JBL PartyBox 300 speaker used in this article. In most products on the market, the functions are bundled with the speaker, and speakers of different grades have different sound quality. Practical testing with wired headphones showed high sound quality with no noticeable distortion. The output sound quality depends on the digital signal processing, the speakers, and the Class-D audio amplifier employed. In general, a Class-D audio amplifier costs about USD 1 to 50, depending on its output power and number of output channels. The main contribution of the proposed work is a DSP platform design that can adjust the audio mixer through APP control with speech recognition and deliver the resulting sound to alternative output devices. The JBL PartyBox 300 speaker is adopted in the experiments, and its cost is also included in Table 3. Although some companies develop APPs for their own products to achieve remote control [12,13,14], the proposed system still contains all the functions of a smart speaker, and users can adjust the equalizer and reverberation effects according to their preference. Compared with an analog mixer, which can make only one adjustment at a time, the digital mixer can make multiple adjustments at once. Therefore, the proposed digital mixer system offers more mobility than the analog mixer. Overall, the proposed system has the advantages of low cost and full functionality. It also provides high flexibility to connect with multiple types of Bluetooth speakers.
To verify the accuracy of speech recognition in the APP, 14 spoken control commands were created in the proposed system. Thirteen eligible respondents (R1–R13), aged between 20 and 30, were invited to test the speech recognition function. Every control command was tested three to five times, for a total of 770 tests. The testing environments were an office and band rehearsal studios. The test results are shown in Figure 10. The accuracy reached 90% for nine of the respondents, and the average accuracy across all respondents reached 92.3%. The results show that speech recognition achieved high accuracy even though the testing environments had ambient noise.
4. Conclusions
Overall, the proposed smart digital mixer system can control the devices remotely via speech in the APP. This convenient design can reduce the workload of sound engineers and the rehearsal time, especially at large-scale musical events. Thirteen respondents were invited to test the speech recognition in the APP, with over 770 tests in total. The results show that the average accuracy across all respondents reached 92.3%. Speech recognition tolerates low noise levels, but recognition errors still occur in environments with ambient noise. Based on previous experience, headphones with active noise cancellation can be used to mitigate this problem. Moreover, this low-cost digital system design could potentially become an alternative to commercial products in the future.