Article

An Analog MP3 Compression Psychoacoustic Model Implemented on a Field-Programmable Analog Array

by Lenno Liu, Jennifer Hasler * and Pranav Mathews
Electrical and Computer Engineering (ECE), Georgia Institute of Technology, Atlanta, GA 30332-250, USA
* Author to whom correspondence should be addressed.
Electronics 2024, 13(14), 2691; https://doi.org/10.3390/electronics13142691
Submission received: 15 May 2024 / Revised: 29 June 2024 / Accepted: 6 July 2024 / Published: 10 July 2024

Abstract

This effort describes a low-power analog MP3 psychoacoustic model designed, implemented, and demonstrated using a large-scale Field-Programmable Analog Array (FPAA) to allocate bits for MP3 compression. Although MP3 encoding is usually assumed to be a digital algorithm, this effort develops an analog version of the algorithm. This block could enable low-power real-time MP3 encoding of microphone input signals for higher-quality acoustic transmissions, significantly improving the audio quality of the compressed transmitted signal from battery-powered devices. An exponentially spaced filterbank enables a low-power representation consistent with human hearing transduction. An analog psychoacoustic model uses signal masking to determine channel bit rates. These designs would enable a fully integrated MP3 encoder when implementing the straightforward aspects of the MDCT.

1. Introducing Psychoacoustic MP3 Compression

Lossy audio compression algorithms take advantage of how humans perceive sound to achieve high compression ratios. MP3 audio coding in particular uses a psychoacoustic/perceptual model [1] to decrease bit allocation to regions that are less likely to be heard by humans and increase bit allocation to regions that would be audible (Figure 1). Using a psychoacoustic model results in an ≈10× decrease in the transmitted bit rate without a perceptible loss in audio quality for most individuals. This is why algorithms like MP3 are favored over the earlier MPEG1/MPEG2 that do not use a psychoacoustic model [2], even though computing the psychoacoustic model requires significantly more computation.
The psychoacoustic model (Figure 1) determines what signals to mask through the non-uniform dependence of human auditory perception on signal frequency (20 Hz–20 kHz) and sound pressure level (SPL) (up to 120 dB [3]) [1,4]. For example, a 65 dB SPL signal at 100 Hz is perceived at roughly the same loudness as a 40 dB signal at 1 kHz [5]. Neighboring signals can also affect each other through simultaneous, or spectral, masking, which causes large input signals to suppress the ear's ability to sense smaller signals at nearby frequencies. For example, a strong 1 kHz signal can render a weak signal at 1.2 kHz inaudible. A psychoacoustic model allocates bits based on both the ear's frequency response and its spectral masking.
All MP1, MP2, and MP3 encoders process the audio signal in 1152-sample frames (26 ms at 44.1 kSPS, with a 576-sample increment per frame) through 32 parallel, linearly spaced filters (≈600 Hz bandwidth for each filter) between 20 Hz and 20 kHz. Each output from the filterbank is then passed through a Modified Discrete Cosine Transform (MDCT) (Figure 1). The MDCT is similar to a Discrete Cosine Transform (DCT) but differs in that it smooths through overlapping frames. These outputs pass through quantization, Huffman coding, and bitstream formatting for the final MP1, MP2, and MP3 output (Figure 2). Real-time MP3 decoding is possible on most microprocessors (μP) [6].
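The frame and subband figures quoted above follow from simple arithmetic. The short Python sketch below reproduces them; the variable names are illustrative, and the subband width assumes the 20 Hz to 20 kHz range is divided evenly among the 32 filters.

```python
# Frame timing and subband width for the framing described in the text.
SAMPLE_RATE_HZ = 44_100        # 44.1 kSPS
FRAME_SAMPLES = 1152           # samples per frame
NUM_SUBBANDS = 32              # linearly spaced filters
BAND_LOW_HZ, BAND_HIGH_HZ = 20.0, 20_000.0

frame_ms = 1e3 * FRAME_SAMPLES / SAMPLE_RATE_HZ
# Assumption: the quoted range is split evenly across the filters.
subband_bw_hz = (BAND_HIGH_HZ - BAND_LOW_HZ) / NUM_SUBBANDS

print(f"frame duration: {frame_ms:.2f} ms")
print(f"subband bandwidth: {subband_bw_hz:.0f} Hz")
```

Both values round to the 26 ms and ≈600 Hz figures used in the text.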
Custom hardware implementing these common MP1, MP2, and MP3 encoding blocks has been demonstrated experimentally. A 350 nm CMOS chip demonstrated processing with 165 mW of power (6.4 × 6.7 mm² die) [7,8], and a 90 nm CMOS chip showed 50 mW–100 mW in simulation [9,10,11,12,13,14]. Some commercial chips also efficiently perform the filterbank + MDCT computation (VS1063) [15].
However, experimental hardware implementations of psychoacoustic models (Figure 1) are typically not published. The psychoacoustic model requires much more computation than decoding the MP3 result and more than the MP3 initial filtering [16,17]. Typical digital implementations take a 1024-point FFT of the 1152 samples at a given frame (at least a half overlap with the previous frame) to create a linearly spaced frequency spectrum. From this spectrum, the psychoacoustic model determines the bit allocation through MDCT scaling and masking computations, which can be conceptualized as a spatial high-pass filtering of the frequency spectrum, but this process is computationally expensive. The connection between the psychoacoustic model and mammalian hearing [18] includes considering a bio-inspired filterbank on the front end (an LPF cochlear cascade (e.g., [19]) as a DSP model [18]), masking functions, and the resulting bit allocation.
This effort shows the first experimental implementation of a low-power analog MP3 psychoacoustic model (Figure 2) using a large-scale Field-Programmable Analog Array (FPAA) [20] for determining audio bit-rate compression. After an overview of the SoC FPAA (Section 2), this discussion considers the exponentially spaced filterbank (Section 3) and the analog psychoacoustic model (Section 4) leading to the digital output encoding for the bit-rate determination (Section 5). This work focuses on implementing an analog psychoacoustic model on an FPAA; the MDCT algorithm can utilize previous low-power analog DCT computation [21,22]. An early version of an analog MP3 perceptual model was previously proposed although never explicitly demonstrated [23]; this effort develops and demonstrates a full analog MP3 model on the SoC FPAA.

2. SoC FPAA Overview

This section overviews the SoC FPAA for experimentally demonstrating the analog psychoacoustic model (Figure 2). The SoC FPAA is a reconfigurable, programmable platform that can be used in place of an IC, for example in analog computing or signal processing [24]. The SoC FPAA contains an array of Computational Analog Blocks (CABs, 98 CABs in the SoC FPAA [25]), among other components, where each CAB contains various analog circuit components. In the SoC FPAA tool chain, a design starts from an Xcos graphical description and is compiled, targeted, and potentially measured through Scilab tools operating in an Ubuntu VM [24]. Efficient designs pack most to all computation into a single CAB.
Floating-Gate (FG) devices enable programmability on this standard CMOS IC, where the programmed values are retained (within 1–100 μV) for the chip lifetime and are directly reprogrammed through hot-electron injection and electron-tunneling processes on the IC [20,25]. FG devices make up the routing fabric, enabling efficient computation-in-routing, and are also present in all programmable analog circuit elements. FG devices make capacitors effective circuit elements even at low frequencies, and therefore capacitances can either affect the circuit operation (e.g., gate-to-drain overlap capacitance) or be used for the circuit operation.
Each CAB contains 4 Transconductance Amplifiers (TAs) with FG transistors setting the bias current (2 TAs with FG input transistors), 4 selectable capacitors, 4 transmission gates (T-gates), 2 nFETs, 2 pFETs, 2 nFET current mirrors, and 990 FG pFET routing elements. The TA element output current ($I_{out}$) for a programmable bias current ($I_{bias}$) and differential inputs ($V^{+}$, $V^{-}$) is modeled as
$$I_{out} = I_{bias} \tanh\left(\frac{\kappa\,(V^{+} - V^{-})}{2 U_T}\right)$$
where $\kappa$ is the gate-to-surface-potential coupling, and $U_T$ is the thermal voltage (kT/q ≈ 25.8 mV at 300 K). The FG-TAs are the two TA elements that have FG pFETs instead of pFETs for the input differential transistor pair. The transconductance ($G_m$) of the TA's maximum linear region is
$$\mathrm{TA}:\; G_m = \frac{\kappa I_{bias}}{2 U_T}, \qquad \mathrm{FG\text{-}TA}:\; G_m = \frac{C_{in}}{C_T}\,\frac{\kappa I_{bias}}{2 U_T}$$
where $C_{in}/C_T$ is the ratio of the input capacitance ($C_1$) to the total pFET FG gate capacitance ($C_T$). Each capacitor bank contains three selectable parallel capacitors (64 fF, 128 fF, and 256 fF), allowing 7 different capacitance values for each bank ranging from 64 fF to 448 fF. Local CAB routing capacitances are ≈160 fF per local routing line [26].
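As a rough numerical illustration of the TA model and the capacitor bank described above, the following Python sketch evaluates the tanh transfer curve, its small-signal transconductance, and the seven selectable bank values. The function names are hypothetical, and κ = 0.84 is an assumed value borrowed from the effective κ reported later for the masking circuit, not a measured TA parameter.

```python
import math

U_T = 0.0258  # thermal voltage at 300 K (V), as quoted in the text


def ta_output_current(v_plus, v_minus, i_bias, kappa=0.84):
    """TA output current: I_bias * tanh(kappa * (V+ - V-) / (2 * U_T))."""
    return i_bias * math.tanh(kappa * (v_plus - v_minus) / (2.0 * U_T))


def ta_gm(i_bias, kappa=0.84, cap_ratio=1.0):
    """Transconductance at the center of the linear region.

    cap_ratio is C_in / C_T for an FG-TA and 1.0 for a plain TA.
    """
    return cap_ratio * kappa * i_bias / (2.0 * U_T)


def capacitor_bank_values_fF(units=(64, 128, 256)):
    """All non-empty parallel combinations of the three bank capacitors."""
    values = set()
    for mask in range(1, 2 ** len(units)):
        values.add(sum(u for bit, u in enumerate(units) if (mask >> bit) & 1))
    return sorted(values)


print(capacitor_bank_values_fF())
print(f"G_m at 1 nA bias: {ta_gm(1e-9):.3e} S")
```

The numerical slope of `ta_output_current` around zero differential input matches `ta_gm`, which is a quick consistency check between the two expressions.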

3. Exponential Spectrum Analyzer

The MP3 psychoacoustic model requires a time-dependent frequency spectrum of the incoming signal. This exponential spectrum analyzer is implemented as a bank of eight parallel bandpass filters ($C^4$ topology) and amplitude detectors tuned in the analog telephone band (≈1 decade) of 300 Hz to 4 kHz [3]. The $C^4$ bandpass filter circuit (Figure 2) uses two TAs, an input capacitor $C_1$, and a feedthrough capacitor $C_2$ [27]. An FG-TA enables a greater output signal linear range. A bandpass bank of eight elements between 300 Hz and 3 kHz is tuned through the bias currents of the two TAs, the feedforward TA ($I_{bias}$) and the feedback FG-TA ($I_{fb}$), to achieve a nearly identical response shape with an exponentially shifted center frequency (Figure 3, 100 mV$_{pp}$).
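To make the exponential spacing concrete, the sketch below lists eight center frequencies with a constant ratio between neighbors. It assumes the first and last filters sit exactly at 300 Hz and 3 kHz; the programmed center frequencies on the FPAA may differ, and the function name is illustrative.

```python
def exponential_centers(f_low_hz=300.0, f_high_hz=3000.0, n_bands=8):
    """Center frequencies with a constant ratio between neighboring bands."""
    ratio = (f_high_hz / f_low_hz) ** (1.0 / (n_bands - 1))
    return [f_low_hz * ratio ** k for k in range(n_bands)]


centers = exponential_centers()
for k, f in enumerate(centers, start=1):
    print(f"tap {k}: {f:7.1f} Hz")
```

With these assumptions, each tap is about 1.39 times the frequency of the previous one, so one decade is covered by seven equal steps on a logarithmic axis.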
The center frequency ($f_{center}$), bandpass gain ($A_v$), and resonance ($Q$) are functions of the programmed TA currents (Figure 3) ($I_{bias}$ and $I_{fb}$) and selected capacitance ratios for $C_1$ and $C_{out}$. $C_1$ is held at a ratio of two, and a small $C_2$ (parasitic capacitance) enables a gain greater than one for these structures. $C_{out}$ varies by a ratio $m$ from its unit capacitance. Using these formulations [27],
$$A_v = \frac{C_1}{C_2}\,\frac{1}{1 + \dfrac{C_{out}}{C_2}\,\dfrac{2 U_T}{\kappa V_L}\,\dfrac{I_{fb}}{I_{bias}}}$$
$$Q = \frac{\sqrt{C_T C_{out}}}{C_2 \sqrt{\dfrac{\kappa V_L}{2 U_T}\,\dfrac{I_{bias}}{I_{fb}}} + C_o \sqrt{\dfrac{2 U_T}{\kappa V_L}\,\dfrac{I_{fb}}{I_{bias}}}}$$
$$f_{center} = \frac{1}{2\pi}\sqrt{\frac{\kappa I_{bias} I_{fb}}{2 U_T V_L C_{out} C_T}} \propto \sqrt{I_{bias} I_{fb} / m},$$
where $C_T$ is the total capacitance on the input terminal, and the FG-TA linear range, $V_L$ = 670 mV, arises from choosing a low capacitive input coupling.
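The tuning relationship can be explored numerically. The sketch below assumes the center-frequency expression reads f_center = (1/2π)·sqrt(κ·I_bias·I_fb / (2·U_T·V_L·C_out·C_T)); κ = 0.84 and the 500 fF capacitances are placeholder values chosen only for illustration, not values taken from the measured filters.

```python
import math

U_T = 0.0258   # thermal voltage (V)
V_L = 0.670    # FG-TA linear range (V), from the text


def c4_center_frequency_hz(i_bias, i_fb, c_out, c_t, kappa=0.84):
    """Center frequency under the assumed reading of the expression above."""
    omega_sq = kappa * i_bias * i_fb / (2.0 * U_T * V_L * c_out * c_t)
    return math.sqrt(omega_sq) / (2.0 * math.pi)


# Placeholder operating point: 1 nA bias currents, 500 fF capacitances.
f0 = c4_center_frequency_hz(1e-9, 1e-9, 500e-15, 500e-15)
f_double_currents = c4_center_frequency_hz(2e-9, 2e-9, 500e-15, 500e-15)
f_quad_cout = c4_center_frequency_hz(1e-9, 1e-9, 4 * 500e-15, 500e-15)

print(f"f0 = {f0:.0f} Hz")
print(f"both currents doubled: x{f_double_currents / f0:.2f}")
print(f"C_out scaled by m = 4: x{f_quad_cout / f0:.2f}")
```

Under this reading, scaling both currents by a factor shifts the center frequency by the same factor, which is why an exponential spacing of the taps maps to an exponential spacing of the programmed currents.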
Amplitude detection uses a minimum envelope detector built from a common-drain amplifier biased with a low current (Figure 2). The asymmetric common-drain response gives a rapid following for decreasing signals (e.g., the local signal minimum), and a slow response (slew response) for increasing signals, typical of diode-capacitor amplitude detector circuits. For a 1 nA bias current, the slew rate on the estimated 550 fF load is 180 mV/ms, and lower programmed currents give a slew rate of 16 mV/ms. The amplitude detection circuit is integrated into the same CAB as the bandpass filter circuit. Using a TA in the feedback increases the signal sensitivity of the amplitude detector from $U_T/\kappa$ by the gain of the amplifier. The FG-TA enables the programming of the input offsets to a desired level.
The circuit finds the minimum envelope of the bandpass filter output for a combined frequency response (Figure 3b), similar to the front-end of other analog acoustic signal processing (e.g., [28]). The combined frequency response appears shifted due to a temperature increase during the measurement. These measurements are taken for a constant system gate voltage, holding the FG voltage fixed, which creates a significant temperature dependence. A temperature-stable deployed version would use an FG bootstrap circuit to set the global gate voltage in a way that dramatically reduces the temperature dependence of the FG bias current [29].

4. Computing the Psychoacoustic Model

The outputs of the exponential spectrum analyzer are taken by the analog psychoacoustic model, where the spectrum is continuously modified to compute the simultaneous (or spectral) masking (Section 4.1), and the modified analog spectrum is then converted to a digital representation (Section 4.2).

4.1. Spectral Masking Computed as a Spatial HPF

Spectral masking over a spatially distributed (e.g., parallel processing and outputs) time-varying spectrum can be modeled primarily as a spatial high-pass filter (HPF) that translates efficiently to a low-gain Winner-Take-All (WTA) circuit (e.g., [30]) with diffusor coupling between nodes (Figure 4). An HPF on the spectrum would suppress a smaller amplitude signal that is near a large signal amplitude. The HPF space constant is programmed to model these masking functions. More exponential spectrum analyzer stages would give more refined masking dynamics. The masking HPF circuit (Figure 4) does not require the higher gain typical of WTA circuits (e.g., [30]) and, as a result, directly takes an input voltage vector ($V_{in,n}$) that gives a vector current output ($I_n$). Each node has a programmed reference current ($I_{ref}$) to bias the input transistors at that node and to account for mismatch at each node; in practice, the input and reference current transistors may also be used together as an input or a reference. The total output current is the sum of all the bias currents; some proportion of the bias current will be at each output, effectively giving a relative bit rate for each band over a tunable maximum current.
Diffusor coupling between transistors results in a large-scale linear resistive spreading network, often resulting in a spatial low-pass filter (LPF) [31]. The circuit takes the difference between the vector of input voltages ($V_{in,n}$) and the smoothed, low-pass filtered version of the inputs ($V_n$) through the input pFET transistors. Spatial HPFs are computed through the subtraction of a signal and its spatially low-pass filtered version, as in early retinal modeling [19] and in diffusor retinal modeling [31]. Expanding around a common input bias voltage ($V_{in,0}$) and a common node bias voltage ($V_0$) yields a linearized model with a common source conductance ($g_s$), transconductance ($g_m = \kappa g_s$), and coupling resistance ($R = (U_T / I_0) \exp((\kappa V_h - V_0)/U_T)$), which illustrates the spatial filtering. The current that each node delivers into the diffusor shows the second-order low-pass filtering,
$$I_n = -\frac{1}{R}\left(V_{n-1} - 2 V_n + V_{n+1}\right) \approx -\frac{h^2}{R}\,\frac{\partial^2 V(x)}{\partial x^2}$$
$$-\frac{h^2}{g_m R}\,\frac{\partial^2 V(x)}{\partial x^2} + \frac{1}{\kappa} V(x) = V_{in}(x),$$
where $V(x)$ is the continuous spatial approximation of $V_n$, and $V(x + h) = V_{n+1}$. This second-order low-pass filtering creates a local average of the amplitude activity. This LPF becomes an HPF by subtracting the LPF result from the input signal,
$$I_n = g_m V_{in,n} - g_s V_n,$$
effectively strengthening the larger signals and attenuating close smaller signals. This spatial filtering block modeling the perceptual model operation sits between the amplitude detection and the I2F blocks (Figure 2). Although the spatial diffusor LPF is large-signal linear, larger input voltages result in currents that are exponential in the difference between gate and source voltage, further strengthening those large signals as one would want for an MP3 bit allocation. When $R \to \infty$, sweeping two neighboring node inputs ($V_{in,n}$ and $V_{in,n+1}$) shows the matching of their transfer output ($V_n$ and $V_{n+1}$) characteristics (Figure 5), with the resulting slope being their effective $\kappa$ (=0.84). The FG diffusor-based [31,32] LPF can be tuned for particular frequency bands to have more or less LPF effect, effectively changing the psychoacoustic model where required.
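The linearized masking network can be sketched numerically. The Python example below is a small-signal illustration only, not the measured circuit: it assumes each node obeys g_s·V_n + Σ(V_n − V_neighbor)/R = g_m·V_in,n with open ends on the chain, solves the resulting tridiagonal system, and forms the output current I_n = g_m·V_in,n − g_s·V_n. The normalized values g_s = 1 and R = 1 and the function name are assumptions.

```python
def diffusor_hpf(v_in, kappa=0.84, g_s=1.0, r=1.0):
    """Linearized masking network; returns (node voltages, output currents)."""
    n = len(v_in)
    g_m = kappa * g_s
    g_r = 1.0 / r
    # Tridiagonal KCL system with open (no-flux) ends of the chain.
    diag = [g_s + g_r * ((k > 0) + (k < n - 1)) for k in range(n)]
    off = -g_r                      # coupling to each neighbor
    rhs = [g_m * v for v in v_in]
    # Thomas algorithm: forward elimination, then back substitution.
    for k in range(1, n):
        w = off / diag[k - 1]
        diag[k] -= w * off
        rhs[k] -= w * rhs[k - 1]
    v = [0.0] * n
    v[-1] = rhs[-1] / diag[-1]
    for k in range(n - 2, -1, -1):
        v[k] = (rhs[k] - off * v[k + 1]) / diag[k]
    i_out = [g_m * vi - g_s * vk for vi, vk in zip(v_in, v)]
    return v, i_out


# Small-signal inputs: a strong band (index 3) next to a weak band (index 4).
v_in = [0.0, 0.0, 0.0, 1.0, 0.2, 0.0, 0.0, 0.0]
v_nodes, i_masked = diffusor_hpf(v_in)
_, i_uncoupled = diffusor_hpf(v_in, r=1e12)
print([round(i, 3) for i in i_masked])
```

In this sketch the strong band receives a positive share of the current while its weaker neighbor is pushed negative relative to the uncoupled case, and the outputs sum to zero, consistent with the total output current being fixed by the bias currents.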

4.2. Current-to-Frequency Converter (I2F)

The spectral masking output current is converted to a digital signal (Figure 5 and Figure 6) through a current-to-frequency (I2F) digital oscillation (Figure 2) similar to an integrate-and-fire neuron [19] or ΣΔ modulator. Some digital MP3 implementations take advantage of the shaped noise floor of a ΣΔ converter as part of their psychoacoustic model [33]. Integrating the current on the input capacitor ($C_{L2}$) increases the capacitor voltage until it reaches a threshold $V_{ref}$, where the comparator turns on an additional discharge path (through a T-gate switch) to rapidly discharge the input capacitor. The capacitive coupling (through $C_f$) introduces enough hysteresis to prevent unwanted high-frequency oscillation while ensuring discharge of the input capacitor. Assuming the discharge current (set by the FG reference through the current mirror) is significantly larger than the I2F input current ($I_{in}$), which is the spectral masking output current, the oscillation frequency ($f_{out}$) for a constant $I_{in}$ is [19]
$$f_{out} = \frac{I_{in}}{C_{L2}\,\dfrac{C_f V_{dd}}{C_{L2} + C_f}} = A\, I_{in}.$$
The discharge current from the storage capacitor ($C_f$) is enabled by the drain of the nFET mirror, which tends to initialize that voltage near GND with a significant charge to discharge $C_{L2}$, further making the I2F a linear function of the input current. As $C_f$ is a parasitic capacitance for this implementation, the $C_f$ mismatch between different CABs results in a different factor $A$ (set by $C_f$) between the input current and the output frequency. Using an explicit capacitance would greatly reduce this mismatch (to <1%). Output frequencies greater than 80 kHz were measured from the I2F converter, which is more than sufficient in a 26 ms MP3 frame to give 8 bits (≈10 kHz) of precision or higher. The masking function and I2F converter operated with a ln(·) relationship between the input voltage, output current, and output spike rate for input voltages between 0.3 V and 1.8 V for a single stage (Figure 5) as well as for multiple parallel stages (Figure 6). The I2F allowed for an estimate of the input voltages.
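The link between I2F output frequency and counting precision can be checked with a few lines of Python. This sketch assumes precision is estimated as the base-2 logarithm of the number of pulses counted in one 26 ms frame; the function name is illustrative.

```python
import math

FRAME_S = 0.026  # MP3 frame duration (s), from the text


def counting_bits(f_out_hz, window_s=FRAME_S):
    """Bits of resolution from counting pulses over one window (assumed model)."""
    counts = f_out_hz * window_s
    return math.log2(counts) if counts >= 1.0 else 0.0


for f_hz in (10e3, 80e3):
    print(f"{f_hz / 1e3:.0f} kHz -> {counting_bits(f_hz):.1f} bits per frame")
```

Under this assumption, roughly 10 kHz yields about 8 bits per frame, and the measured 80 kHz range leaves several bits of headroom.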
These signals were measured through FPAA I/O pins for characterizing this system. When measuring these output voltages with external pins (Figure 6), a TA buffer was used to reduce the impact of external chip I/O coupling between these nodes and to avoid premature spiking caused by signals feeding through $C_{fb}$ and raising $V_{in}$ above $V_{ref}$. The buffer significantly reduced but did not eliminate the issue (Figure 6). This was entirely an instrumentation issue, as no significant coupling (e.g., [26]) was measured within FPAA CABs or when routing between FPAA CABs. In practice, these lines would stay on the FPAA, and further, these lines would be digitized to binary levels for the following digital stages.
For our external instrumentation (not required for internal computation), the output used edge-encoded digital sampling, which was compared against an LPF average of the signal to generate a robust binary value. For the characterization, the signal was counted over a finite window to extract a frequency. The frequency range for this measurement spanned 1 kHz to 20 kHz, resulting in roughly 8–10 bit precision for these characterized measurements. The particular coupling between input nodes determines the strength of the high-pass filter modeling the spectral masking. An FG element sets $R$ in the diffusor element and is programmed to $I_{prog,R}$ during programming. The I2F allows for a characterization of the WTA coupling (Figure 7), including that a shift of $V_{in,0}$ (1.5 V and 1.0 V in Figure 7, where $V_{CM}$ is $V_{in,0}$) changes the operating range, as predicted in Figure 5. If $I_{prog,R}$ is programmed too low relative to the $V_{in,0}$ used, the effective $R$ will be so large that one will not see any coupling between these stages. One can see these effects by taking different levels of coupling ($I_{prog,R}$) and subtracting the uncoupled output response from these values to see the coupling effect (Figure 7).

5. Digital Output Encoding for Bit-Rate Determination

The combination of the exponential spectrum analyzer and psychoacoustic model results in a strong response at one center frequency with an inhibited response at nearby filter taps (Figure 8). The measurement is effectively a frequency response with a sine wave input and resulting output voltage. For the two-tap system (Figure 8), as the frequency increases above 1.2 kHz, the psychoacoustic model for tap 5 begins to win, suppressing the tap 4 output. For frequencies above 2 kHz, the system returns to the uncoupled response. These dynamics are preserved for different input signal amplitudes. As the number of taps is increased from two to four to eight, the responses become sharper due to the psychoacoustic model effect (Figure 8 and Figure 9). Each of the taps takes its turn winning, suppressing the other taps. Different spectral signals have different predictable results for this eight-tap psychoacoustic model encoder (Figure 9). Sine, triangle, and square wave inputs (754 Hz fundamental frequency) show the strong response for the fundamental frequency, suppression of nearby bands, and different band responses for the third harmonic component (Figure 10).
One can estimate an entire MP3 encoder for the bandwidth of a wired phone application (8–10 kSPS, 3.4–4 kHz bandwidth) that includes the digital bandpass filtering and quantization phase as well as the perceptual model (Figure 11). An eight-parallel-tap analog psychoacoustic model would be sufficient for this approach (Figure 11b), consuming roughly 2.5 μW for this function. The power consumption (without the instrumentation buffers, Figure 11b) is primarily from the I2F blocks (86%) that might not be required for a full MP3 implementation. The feedforward computation (also on an SoC FPAA [20]) would require a seven-tap delay line (one CAB per delay line tap, 25 nW for seven taps, and a 100 μs delay [34]), and an 8 × 8 Vector-Matrix Multiplication (VMM) to complete the digital filter (2–4 CABs, roughly 50 nW at 10 kSPS [35]). The feedforward filterbank results in 650 Hz bandwidth outputs, requiring amplitude detection on each of the outputs, where eight outputs fully cover the application space.
The quantization requires matching eight exponentially spaced values to eight linearly spaced relative values to determine the relative bit selection (Figure 12). Each linearly spaced band is a weighted linear combination of the exponential bands; the question is determining that precalculated weighting value for each of the outputs (a matrix transformation between input bands and output bands). The $M$ exponential bands ($m$ values) have a normalized transfer function $H_m(s)$ that has energy in the $N$ linearly spaced bands ($n$ values) determined by the matrix $B$ as
$$B_{mn} = \int_{j 2\pi f_{lin,n-1}}^{j 2\pi f_{lin,n}} \frac{H_m(s)}{\int_{j 2\pi f_{min}}^{j 2\pi f_{max}} |H_m(s_1)|\, ds_1}\, ds$$
where $f_{max}$ is the maximum input frequency, $f_{min}$ is the minimum input frequency, and $f_{lin,n} = f_{min} + (f_{max} - f_{min})\, n / M$. The computation is another 8 × 8 VMM operation operating at 650 Hz or slower (5 nW or less power).
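A numerical sketch of the band-mapping matrix may help. The Python example below is illustrative only: it assumes each exponential band has a second-order bandpass magnitude response with Q = 4, uses |H_m| in the numerator so that each row is a real, normalized weight vector, integrates over frequency with a midpoint rule (the 2π factor cancels in the ratio), and takes 200 Hz to 4 kHz as the band edges. None of these choices come from the measured filters.

```python
import math


def bandpass_mag(f_hz, f0_hz, q=4.0):
    """Unity-peak magnitude of a second-order bandpass (assumed filter shape)."""
    x = f_hz / f0_hz
    return 1.0 / math.sqrt(1.0 + (q * (x - 1.0 / x)) ** 2)


def band_matrix(centers_hz, f_min_hz, f_max_hz, n_lin, steps=200):
    """B[m][n]: share of exponential band m that falls in linear band n."""
    width = (f_max_hz - f_min_hz) / n_lin
    df = width / steps
    matrix = []
    for f0 in centers_hz:
        row = []
        for n in range(n_lin):
            lo = f_min_hz + n * width
            # Midpoint-rule integral of |H_m| over linear band n.
            row.append(sum(bandpass_mag(lo + (k + 0.5) * df, f0)
                           for k in range(steps)) * df)
        total = sum(row)            # integral over the full input range
        matrix.append([value / total for value in row])
    return matrix


centers = [300.0 * 10 ** (k / 7) for k in range(8)]   # assumed tap frequencies
B = band_matrix(centers, 200.0, 4000.0, 8)
for row in B:
    print(" ".join(f"{value:.2f}" for value in row))
```

Because the weights are precalculated, applying the matrix at run time is a single 8 × 8 vector-matrix multiplication on the band amplitudes.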
This entire MP3 compression would require roughly 27–31 CABs, still less than one-third of the CAB components. The system feedforward power is ≈80 nW with a psychoacoustic model requiring 2.5 μW, or 340 nW without the I2F converters. A parallel MDCT (36 × 18) could be enabled on the SoC FPAA (if space would permit) using 17 additional ladder filter blocks at each output biased at lower currents (75 nW), and slower 36 × 18 VMM and amplitude detector operations (143 nW). The power could be roughly 25–100 μW for nearly efficient digital computations, significantly lower than previous custom implementations [7,8,9,10,11,12,13,15]. The remainder of the MP3 compression could be directly computed through the MSP430 processor.
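The budget figures above can be cross-checked with simple arithmetic. This Python sketch only re-adds the numbers quoted in the text (the dictionary labels are illustrative), using the upper CAB estimate of 31 and the 5 nW upper bound for the band-mapping VMM.

```python
# Feedforward path power (nW), as quoted in the text.
feedforward_nw = {
    "seven-tap delay line": 25.0,
    "8x8 VMM (filter)": 50.0,
    "8x8 VMM (band mapping, upper bound)": 5.0,
}
feedforward_total_nw = sum(feedforward_nw.values())

# Psychoacoustic model: 2.5 uW total, 86% of it in the I2F blocks.
model_total_nw = 2500.0
i2f_fraction = 0.86
model_without_i2f_nw = model_total_nw * (1.0 - i2f_fraction)

# CAB usage against the 98 CABs on the SoC FPAA.
cab_fraction = 31 / 98

print(f"feedforward: {feedforward_total_nw:.0f} nW")
print(f"model without I2F: {model_without_i2f_nw:.0f} nW")
print(f"CAB usage: {100 * cab_fraction:.0f}%")
```

The 86% figure gives about 350 nW for the model without the I2F blocks, in line with the quoted 340 nW, and 31 CABs is just under one-third of the array.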
The equivalent digital computation would require FFTs covering the entire frequency range to get the exponential bandpass samples and then finishing the psychoacoustic masking from these outputs for the bit allocation. The model for comparison requires a 128-pt FFT every 10 ms to cover the 200 Hz to 4 kHz bandwidth (at least 10 kSPS) to have a few bands for computing the final amplitude computation. Using a 64-pt input buffer overlap, the number of computations going from the sampled input signal to the resulting exponentially spaced eight bands with 10–20 ms sample rates would be roughly 2 MMAC(/s), resulting in 200 μW for only optimal single-precision calculations. The resulting computations for the soft WTA and resulting bit computation result in an order of magnitude fewer MAC operations than this initial subbanding computation. This digital analysis provides a typical solution for the psychoacoustic model as MP3 accelerators focus on the MDCT computation [7,8,9,10,11,12,13,15], and commercial model algorithm implementations are rarely published. This analysis also provides a baseline to compare the resulting analog implementation, particularly an analog implementation on a configurable device; a custom IC solution would have improved energy efficiency.
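For a sense of scale, the digital estimate above implies a particular energy per multiply-accumulate. The sketch below derives it from the quoted 200 μW and 2 MMAC/s and sets it beside the 2.5 μW analog model figure; this is a rough comparison of quoted numbers, not a like-for-like benchmark.

```python
digital_power_w = 200e-6       # digital subbanding estimate from the text
mac_rate_per_s = 2e6           # roughly 2 MMAC/s
analog_model_power_w = 2.5e-6  # analog psychoacoustic model, including I2F

energy_per_mac_pj = 1e12 * digital_power_w / mac_rate_per_s
power_ratio = digital_power_w / analog_model_power_w

print(f"implied digital energy: {energy_per_mac_pj:.0f} pJ/MAC")
print(f"digital / analog power: {power_ratio:.0f}x")
```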

6. Summary and Discussion

A low-power analog MP3 psychoacoustic model was designed, implemented, and demonstrated using an FPAA for determining audio bit-rate compression. An exponentially spaced filterbank (Section 3) enabled a low-power representation consistent with human hearing transduction. An analog psychoacoustic model (Section 4) used signal masking to determine channel bit rates. Combined with a hardware implementation of the MDCT, these designs would enable a fully integrated MP3 encoder. This block could enable low-power real-time MP3 encoding of a microphone input signal for higher-quality acoustic transmissions, significantly improving the audio quality of the compressed transmitted signal from a battery-powered smartphone.
This effort is the first analog perceptual model implementation, and even digital implementations of a perceptual model algorithm are rarely discussed. The psychoacoustic model is assumed to account for the largest share of the encoder complexity [36], particularly depending on the amount of mammalian hearing modeling [18]. One computational estimate for a simple perceptual model encoder was that it took over 76% of the entire required computation, where the input FFT was not part of the required computation [37], which is a limited form of this analog implementation [38]. In another case, the psychoacoustic model required one of two DSP cores and did not include the initial FFT computation [39]. One perceptual model hardware design required 500k gates for a synthesized design (clocked at 1 MHz) for the perceptual algorithm. A 10 fF load capacitance per gate (typical of the likely 180 nm CMOS IC process) would result in 15 mW of algorithm power if the IC had been placed, routed, fabricated, and measured [40]. Much of the comparison must therefore be made at the architecture level, where analog computation offers a low-power, real-time approach at a complexity similar to that of digital computation.
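The 15 mW estimate is consistent with a standard dynamic-power calculation, P = N·C·V²·f. The sketch below reproduces it under two assumptions that are not stated in the text: a 1.8 V supply (typical of 180 nm CMOS) and every gate switching once per clock cycle.

```python
NUM_GATES = 500_000        # synthesized gate count from the text
LOAD_CAP_F = 10e-15        # 10 fF per gate
CLOCK_HZ = 1e6             # 1 MHz clock
V_DD = 1.8                 # assumed supply voltage for a 180 nm process
ACTIVITY = 1.0             # assumed: every gate toggles each cycle

dynamic_power_w = ACTIVITY * NUM_GATES * LOAD_CAP_F * V_DD ** 2 * CLOCK_HZ
print(f"estimated dynamic power: {1e3 * dynamic_power_w:.1f} mW")
```

A lower activity factor would reduce the estimate proportionally, so the figure is best read as an upper-end estimate for this gate count and clock rate.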
A low-power MP3 encoder that takes a microphone or other analog input (no ADC required) and generates the required digital outputs enables MP3 encoding in multiple low-power applications. Although analog phone line quality is 64 kbps (8 kSPS, 8-bit μ-law), typical cell phone transmission ranges between 4.75 kbits/s and 13 kbits/s (kbps = kbits per second), with increasing pressure to decrease these data rates. MP3 compression could significantly increase the audio quality of these signals.
This experimental demonstration showed a very low-power MP3 implementation suitable for a range of embedded devices, enabling higher-quality audio to be transmitted for a given bit rate. Microphone front-end amplifiers often require more power than the 2–4 μW required for a full MP3 encoder model, so the encoder could become a widely used edge-device algorithm for compressing audio signals sent to remote locations. All forms of edge devices could utilize this ultra-low-power algorithm. Other sensor signals can be compressed, although the perceptual model may or may not match the needed information from that environment. With scaled-down FPAA devices [41], one expects to have smaller CABs, as well as more MP3-favorable CABs, making an MP3 encoder one of multiple algorithms running on such an FPAA. The MP3 encoder would be a small percentage of the total number of CABs. These elements could be synthesized as a custom module using analog standard cells (e.g., [42]), resulting in a small custom module for an edge device. Further, this MP3 design would be possible in any FPAA device; this particular FPAA device was fabricated with a 350 nm CMOS process, although FPAA devices have been and will be fabricated with much smaller IC processes (40 nm, 28 nm, and 16 nm CMOS).

Author Contributions

Conceptualization, J.H. and L.L.; methodology, L.L, J.H. and P.M.; formal analysis, L.L. and J.H.; writing—original draft, J.H., L.L. and P.M.; writing—review and editing, J.H., P.M. and L.L.; supervision, J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors thank David Anderson for multiple discussions over many years, including identifying this question as an important signal processing question.

Conflicts of Interest

Jennifer Hasler and Pranav Mathews declare no conflicts of interest. Lenno Liu performed all of the technical work at Georgia Tech with no conflicts of interest. After graduation, Lenno Liu was employed by Texas Instruments, Dallas.

References

  1. Jayant, N.; Johnston, J.J.; Safranek, R. Signal compression based on models of human perception. Proc. IEEE 1993, 81, 1385–1422. [Google Scholar] [CrossRef]
  2. McCandless, M. The MP3 revolution. IEEE Intell. Syst. Their Appl. 1999, 14, 8–9. [Google Scholar] [CrossRef]
  3. Leach, W.M. Basic Principles of Sound. In Introduction to Electroacoustics and Audio Amplifier Design; Kendall Hunt Publishing Company: Dubuque, IA, USA, 2003; pp. 1–12. [Google Scholar]
  4. Musmann, H. Genesis of the MP3 audio coding standard. IEEE Trans. Consum. Electron. 2006, 52, 1043–1049. [Google Scholar] [CrossRef]
  5. Standard ISO 226:2003; Acoustics–Normal Equal-Loudness-Level Contours. International Organization for Standardization: Geneva, Switzerland, 2003.
  6. Bazyar, M.; Sudirman, R. A Robust Data Embedding Method for MPEG Layer III Audio Steganography. Int. J. Secur. Its Appl. 2015, 9, 317–327. [Google Scholar] [CrossRef]
  7. Hong, S.; Park, B.; Song, Y.; See, H.; Kim, J.; Lee, H.; Kim, D.; Song, M. A full accuracy MPEG1 audio layer 3 (MP3) decoder with internal data converters. In Proceedings of the IEEE 2000 Custom Integrated Circuits Conference, Orlando, FL, USA, 21–24 May 2000; pp. 563–566. [Google Scholar]
  8. Hong, S.; Kim, D.; Song, M. A low power full accuracy MPEG1 audio layer III (MP3) decoder with on-chip data converters. IEEE Trans. Consum. Electron. 2000, 46, 903–906. [Google Scholar] [CrossRef]
  9. Tsai, T.H.; Wang, C.K.; Liu, C.N. Low power techniques for MP3 audio decoder using subband cut-off approach. In Proceedings of the IEEE Workshop on Signal Processing Systems Design and Implementation, Suzhou, China, 28–30 May 2005. [Google Scholar]
  10. Dai, X.; Wagh, M.D. An MDCT Hardware Accelerator for MP3 Audio. In Proceedings of the 2008 Symposium on Application Specific Processors, Anaheim, CA, USA, 8–9 June 2008; pp. 121–125. [Google Scholar]
  11. Malik, P.; Ufnal, M.; Luczyk, A.W.; Balaz, M.; Pleskacz, W.A. MDCT/IMDCT low power implementations in 90 nm CMOS technology for MP3 audio. In Proceedings of the 2009 12th International Symposium on Design and Diagnostics of ElectronicCircuits & Systems, Liberec, Czech Republic, 15–17 April 2009; pp. 144–147. [Google Scholar]
  12. Kim, H.S.; Kim, S.H.; Chung, K.S.; Han, T.H. Low power implementation of MDCT/IMDCT for MP3 audio decoder. In Proceedings of the 2010 International SoC Design Conference, Las Vegas, NV, USA, 27–29 September 2010; pp. 143–146. [Google Scholar]
  13. Jeong, H.; Kim, J.; kyung Cho, W. Low-power multiplierless DCT architecture using image correlation. IEEE Trans. Consum. Electron. 2004, 50, 262–267. [Google Scholar] [CrossRef]
  14. Birkl, B.; Hooser, B.; Janssens, M.; Lenke, F.; Vorisek, V. Design integration, DFT, and verification methodology for an MPEG 1/2 audio layer 3 (MP3) SoC device. In Proceedings of the IEEE 2002 Custom Integrated Circuits Conference (Cat. No. 02CH37285), Orlando, FL, USA, 15 May 2002; pp. 303–306.
  15. VLSI Solution. MP3/OGG Vorbis Encoder and Audio Codec Circuit; Version 1.31; VLSI Solution: Tampere, Finland, 2017.
  16. Brandenburg, K.; Bosi, M. Overview of MPEG Audio: Current and Future Standards for Low Bit-Rate Audio Coding. J. Audio Eng. Soc. 1997, 45, 4–21.
  17. Brandenburg, K. Low bitrate audio coding-state-of-the-art, challenges and future directions. In Proceedings of the International Conference on Signal Processing, Beijing, China, 21–25 August 2000; Volume 1, pp. 1–4.
  18. Baumgarte, F. Improved audio coding using a psychoacoustic model based on a cochlear filter bank. IEEE Trans. Speech Audio Process. 2002, 10, 495–503.
  19. Mead, C. Analog VLSI and Neural Systems; Addison-Wesley: Wokingham, UK, 1989.
  20. Hasler, J. Large-Scale Field Programmable Analog Arrays. Proc. IEEE 2020, 108, 1283–1302.
  21. Suh, S.; Basu, A.; Schlottmann, C.; Hasler; Barry, J.R. Low-Power Discrete Fourier Transform for OFDM: A Programmable Analog Approach. IEEE TCAS I 2011, 58, 290–298.
  22. García Moreno, D.; Del Barrio, A.A.; Botella, G.; Hasler, J. A Cluster of FPAAs to Recognize Images Using Neural Networks. IEEE Trans. Circuits Syst. II Express Briefs 2021, 68, 3391–3395.
  23. Twigg, C.; Hasler. A Large-Scale Reconfigurable Analog Signal Processor (RASP). In Proceedings of the IEEE CICC, San Jose, CA, USA, 10–13 September 2006.
  24. Hasler, J.O.; Natarajan, A. An Open-Source ToolSet for FPAA Design. WOSET 2020, 5, 2020.
  25. George, S.; Kim, S.; Shah, S.; Hasler, J.; Collins, M.; Adil, F.; Wunderlich, R.; Nease, S.; Ramakrishnan, S. A Programmable and Configurable Mixed-Mode FPAA SoC. IEEE Trans. VLSI 2016, 24, 2253–2261.
  26. Hasler, J.; Kim, S.; Adil, F. Scaling Floating-Gate Devices Predicting Behavior for Programmable and Configurable Circuits and Systems. J. Low Power Electron. Appl. 2016, 6, 13.
  27. Graham, D.W.; Hasler; Chawla, R.; Smith, P.D. A Low-Power Programmable Bandpass Filter Section for Higher Order Filter Applications. IEEE Trans. Circuits Syst. I Regul. Pap. 2007, 54, 1165–1176.
  28. Shah, S.; Hasler, J. Low power speech detector on a FPAA. In Proceedings of the 2017 IEEE International Symposium on Circuits and Systems (ISCAS), Baltimore, MD, USA, 28–31 May 2017; pp. 1–4.
  29. Shah, S.; Toreyin, H.; Hasler, J.; Natarajan, A. Temperature Sensitivity and Compensation on a Reconfigurable Platform. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2018, 26, 604–607.
  30. Lazzaro, J.; Ryckebusch, S.; Mahowald, M.; Mead, C.A. Winner-Take-All Networks of O(N) Complexity. In Proceedings of the Advances in Neural Information Processing Systems; Touretzky, D., Ed.; Morgan-Kaufmann: Burlington, MA, USA, 1988; Volume 1.
  31. Boahen, K.A.; Andreou, A.G. A Contrast Sensitive Silicon Retina with Reciprocal Synapses. In Proceedings of the Advances in Neural Information Processing Systems; Moody, J., Hanson, S., Lippmann, R., Eds.; Morgan-Kaufmann: Burlington, MA, USA, 1991; Volume 4.
  32. Smith, P.; Hasler. A programmable diffuser circuit based on floating-gate devices. In Proceedings of the IEEE MWSCAS, Tulsa, OK, USA, 4–7 August 2002; Volume 1, pp. 1–291.
  33. Dunn, C.; Sandler, M. Psychoacoustically Optimal Sigma-Delta Modulation. J. Audio Eng. Soc. 1997, 45, 212–223.
  34. Hasler, J.; Shah, S. An SoC FPAA Based Programmable, Ladder-Filter Based, Linear-Phase Analog Filter. IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 68, 592–602.
  35. Schlottmann, C.; Hasler. A highly dense, low power, programmable analog vector-matrix multiplier: The FPAA implementation. IEEE J. Emerg. CAS 2011, 1, 403–411.
  36. Kurniawati, E.; Lau, C.; Premkumar, B.; Absar, J.; George, S. New implementation techniques of an efficient MPEG advanced audio coder. IEEE Trans. Consum. Electron. 2004, 50, 655–665.
  37. Tsai, T.H.; Huang, S.W.; Chen, L.G. Design of a low power psycho-acoustic model co-processor for MPEG-2/4 AAC LC stereo encoder. In Proceedings of the 2003 IEEE International Symposium on Circuits and Systems (ISCAS), Bangkok, Thailand, 25–28 May 2003; Volume 2.
  38. Tsai, T.H.; Huang, S.W.; Wang, Y.W. Architecture design of MDCT-based psychoacoustic model co-processor in MPEG advanced audio coding. In Proceedings of the 2004 IEEE International Symposium on Circuits and Systems (ISCAS), Vancouver, BC, Canada, 23–26 May 2004; Volume 2, pp. 2–761.
  39. Kim, S.Y.; Oh, H.O.; Lee, K.S.; Kim, K.S.; Youn, D.H.; Lee, J.Y. A real-time implementation of the MPEG-2 audio encoder. IEEE Trans. Consum. Electron. 1997, 43, 593–597.
  40. Tsai, T.H.; Huang, S.W.; Wang, Y.W. An accelerated psychoacoustic model chip design for MPEG 2/4 AAC audio. In Proceedings of the Cellular Neural Networks and Their Applications, Hsinchu, Taiwan, 28–30 May 2005; pp. 245–248.
  41. Hasler, J. The Rise of SoC FPAA Devices. In Proceedings of the CICC, Newport Beach, CA, USA, 24–27 April 2022.
  42. Ige, A.; Yang, L.; Yang, H.; Hasler, J.; Hao, C. Analog System High-level Synthesis for Energy-Efficient Reconfigurable Computing. J. Low Power Electron. Appl. 2023; in Press.
Figure 1. This effort builds a low-power analog MP3 psychoacoustic/perceptual model on a large-scale Field-Programmable Analog Array (FPAA), in particular on an SoC FPAA.
Figure 2. Top-level picture of the MP3 encoder and the resulting analog psychoacoustic/perceptual model. The analog psychoacoustic system arises from four core blocks ($C^4$ bandpass filter, amplitude detector, local Winner-Take-All (WTA), and current-to-frequency (I2F) blocks) compiled into two unique CABs. The full system is a parallelization of these two CAB blocks. The first stages are modeled by an exponential spectrum analyzer that consists of 8 parallel $C^4$ bandpass filter and amplitude detector blocks (Section 3). Filtered signals from the spectrum analyzer are fed into a local Winner-Take-All (WTA) circuit, which does the bulk of our perceptual modeling (Section 4.1). A current-to-frequency converter (I2F), made of a $\Sigma\Delta$ modulator or an integrate-and-fire neuron, is then used to convert the analog masking currents from the WTA into a digital signal (Section 4.2). A single parallel thread of this system takes two unique CABs on the FPAA. Bias currents for the psychoacoustic model and ADC are low: 1 nA for the WTA bias and 100 nA for the comparator. The quantization, Huffman coding, and bitstream formatting processes have been simplified into the “Encoder” block.
Figure 3. (a) Bandpass filterbank ($C^4$) frequency response of a tuned 8-channel filterbank to a 0.1 $V_{pp}$ input signal. Each $C^4$ filter was compiled and programmed to set CABs, and manually tuned by adjusting capacitances and bias currents. The tuning compensated for the mismatch between CABs to achieve a precise center frequency, gain, and Q factor for each filter. The $C^4$ bandpass filters’ $Q \approx 3$. (b) Eight-channel bandpass and min detect response for a 0.5 $V_{pp}$ input signal. (c) Tuned $C^4$ filterbank parameters. $C_1$ selected two capacitors for all 8 stages; $C_{out}$ varied its capacitance selection with the center frequency. These components were all in the same FPAA column.
Figure 4. Circuit diagram and linearized (small-signal) representation of the low-gain, local Winner-Take-All (WTA) circuit, which performs a spatial high-pass filter. The FG pFETs are devices found in the routing fabric.
Figure 5. Measurements for two neighboring programmed elements with R for an input sweep ($V_{in,n}$ and $V_{in,n+1}$), including $V_n$ and $V_{n+1}$ for $dist_{bias}$ = 1 nA, effectively a pFET source follower’s characteristics, as well as the I2F output at these two locations, showing a wide log-linear region in the voltage-to-frequency conversion from 0.3 V to 1.8 V with no output nor cross-dependence.
Figure 6. Typical output waveforms for four programmed I2F converters (different DC voltage levels). The small spikes in each output are due to parasitic capacitive coupling.
Figure 7. Response of two cross-coupled WTA blocks with I2F conversion at two different $V_{in,0}$ values (1.5 V and 1.0 V). All plots show the effect of $I_{prog,R}$ at three different programmed coupling values (100 nA, 1 μA, and 10 μA). The operating range for $V_{in,0}$ = 1.5 V is ≈−0.45 V to 0.45 V, and for $V_{in,0}$ = 1.0 V, it is ≈−0.9 V to 0.9 V. Normalized frequency: differences in $V_{in}$ caused a differential shift in $f_{out}$, which was normalized to $V_{in,1} - V_{in,2} = \Delta V = 0$. Output frequency − uncoupled output frequency (normalized): the intrinsic $f_{out}(V_{in})$ slope was fitted and removed to better observe the coupling behavior. For $V_{in,0}$ = 1.5 V, one sees little coupling effect for $I_{prog,R}$ = 100 nA, and for $V_{in,0}$ = 1.0 V, one sees little coupling effect for $I_{prog,R}$ = 100 nA and 1 μA.
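The normalization and slope removal described in the Figure 7 caption can be sketched as follows. This is only an illustration under stated assumptions: the fit is taken to be a straight line on a linear frequency axis, the function names (`fit_line`, `remove_intrinsic_slope`) are hypothetical, and the sweep values are synthetic placeholders, not measured data.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def remove_intrinsic_slope(delta_v, f_coupled, f_uncoupled):
    """Normalize both sweeps to their value at delta_v = 0, fit a line to the
    uncoupled sweep, and subtract that line from the coupled sweep."""
    i0 = min(range(len(delta_v)), key=lambda i: abs(delta_v[i]))
    coupled_norm = [f / f_coupled[i0] for f in f_coupled]
    uncoupled_norm = [f / f_uncoupled[i0] for f in f_uncoupled]
    slope, intercept = fit_line(delta_v, uncoupled_norm)
    return [c - (slope * dv + intercept) for dv, c in zip(delta_v, coupled_norm)]


# Synthetic placeholder sweep (assumption, not data from the paper).
delta_v = [i / 10 for i in range(-4, 5)]                  # -0.4 V ... 0.4 V
f_uncoupled = [1000.0 * (1 + 0.5 * dv) for dv in delta_v]  # intrinsic slope only
f_coupled = [f + 100.0 * dv * abs(dv) for f, dv in zip(f_uncoupled, delta_v)]

residual = remove_intrinsic_slope(delta_v, f_coupled, f_uncoupled)
for dv, r in zip(delta_v, residual):
    print(f"dV = {dv:+.1f} V  residual = {r:+.4f}")
```

With the intrinsic slope subtracted, what remains is the part of the response that differs from the uncoupled sweep, which is the quantity the lower panels of the figure are described as showing.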
Figure 8. Two- and four-tap input frequency, output I2F measurements with diffuser coupling, $I_{prog,R}$, programmed to 7 μA. The two-tap measurements were taken for two different input amplitudes (0.1 V and 0.25 V) using the frequency programmed for the 4th and 5th taps. The four-tap measurements were taken for two different input amplitudes (0.1 V and 0.2 V) using the frequency programmed for the 4th through 7th taps.
Figure 9. Measurements of the full parallel, 8-channel psychoacoustic model through to the I2F output. The input amplitude, $V_{in}$, was 0.5 V, $I_{prog,R}$ was 4 μA, and $V_{in,0}$ = 1.5 V. The results show the I2F frequency output for a sinusoidal input sweep.
Figure 10. Tap inputs for a 754 Hz sine wave, triangle wave, and square wave. The triangle and square wave signals have significant third-harmonic components. The 8-parallel-tap filterbank spans from 300 Hz to beyond 3 kHz.
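As a rough check on the third-harmonic remark in the Figure 10 caption, the sketch below computes the ideal Fourier-series ratio of third harmonic to fundamental for each waveform and finds which tap lies nearest to 754 Hz and to its third harmonic. The tap centers are an assumption (8 centers spaced geometrically from 300 Hz to 3 kHz); the tuned center frequencies are those of Figure 3c and are not reproduced here. Function names are hypothetical.

```python
import math


def exponential_centers(f_low=300.0, f_high=3000.0, taps=8):
    """Assumed tap centers: geometric spacing between f_low and f_high."""
    ratio = (f_high / f_low) ** (1.0 / (taps - 1))
    return [f_low * ratio ** k for k in range(taps)]


def harmonic_amplitude(shape, k):
    """Magnitude of the k-th Fourier harmonic of an ideal unit-amplitude wave."""
    if shape == "sine":
        return 1.0 if k == 1 else 0.0
    if k % 2 == 0:
        return 0.0  # ideal square and triangle waves contain odd harmonics only
    if shape == "square":
        return 4.0 / (math.pi * k)
    if shape == "triangle":
        return 8.0 / (math.pi ** 2 * k ** 2)
    raise ValueError(f"unknown shape: {shape}")


def nearest_tap(freq, centers):
    """Index of the closest center on a log-frequency axis."""
    return min(range(len(centers)), key=lambda i: abs(math.log(freq / centers[i])))


centers = exponential_centers()
f0 = 754.0
for shape in ("sine", "triangle", "square"):
    ratio = harmonic_amplitude(shape, 3) / harmonic_amplitude(shape, 1)
    print(f"{shape:8s} third/fundamental = {ratio:.3f}")
print("fundamental nearest to tap", nearest_tap(f0, centers) + 1)
print("third harmonic nearest to tap", nearest_tap(3 * f0, centers) + 1)
```

For ideal waveforms the third harmonic is one third of the fundamental for a square wave and one ninth for a triangle wave, so both should excite a higher tap in addition to the tap nearest the fundamental, while a pure sine should not.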
Figure 11. Proposed MP3 system. (a) Block diagram for the MP3 encoder system for acoustic compression of a phone line signal (8–10 kSPS, 3.4–4 kHz bandwidth). (b) Psychoacoustic model bias current consumption of each block in the implemented 8-parallel-tap system ($V_{dd}$ = 2.5 V). Buffers were used for instrumenting the output measurements to I/O pins (not necessary for on-chip MP3) as well as isolating the CAB components; buffers are not necessary between CAB components. The consumption of these buffers is eliminated with entirely on-chip processing. A 32-parallel-tap system for a 20 kHz bandwidth would scale to require 10 μW.
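A back-of-the-envelope view of how bias current translates into static power as the tap count grows can be sketched as below. It uses only the two bias currents named in the Figure 2 caption (1 nA WTA bias, 100 nA comparator bias) and the 2.5 V supply from Figure 11, and it assumes one of each per tap with power taken as current times supply voltage. The filter and amplitude-detector currents listed in Figure 11b are not included, so this is not the authors' calculation of the 10 μW figure.

```python
VDD = 2.5                  # supply voltage in volts (Figure 11 caption)
WTA_BIAS = 1e-9            # WTA bias current in amperes (Figure 2 caption)
COMPARATOR_BIAS = 100e-9   # comparator bias current in amperes (Figure 2 caption)


def static_power(bias_currents, vdd, taps):
    """Static power in watts, assuming every tap draws each listed bias current."""
    return taps * vdd * sum(bias_currents)


p8 = static_power([WTA_BIAS, COMPARATOR_BIAS], VDD, taps=8)
p32 = static_power([WTA_BIAS, COMPARATOR_BIAS], VDD, taps=32)
print(f"8 taps:  {p8 * 1e6:.2f} uW")
print(f"32 taps: {p32 * 1e6:.2f} uW")
```

Under these assumptions the comparator bias dominates, and the cost grows linearly with the number of parallel taps.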
Figure 12. Plot and values for calculated exponential-to-linear converter constants, B, for 8 perceptual model signals (m) to 8 DCT model values (n).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, L.; Hasler, J.; Mathews, P. An Analog MP3 Compression Psychoacoustic Model Implemented on a Field-Programmable Analog Array. Electronics 2024, 13, 2691. https://doi.org/10.3390/electronics13142691