1. Introduction
With the deepening of controlled nuclear fusion research, the power supply system faces stricter requirements for real-time performance and filtering quality [1]. Among the critical subsystems of the Experimental Advanced Superconducting Tokamak (EAST), the fusion magnet power system plays a key role in supporting plasma generation, confinement, and control. However, because the fusion magnet power supply operates at high power, the surrounding magnetic field environment is complex, and the current feedback signal is often contaminated with noise, mostly white noise. Filtering the current feedback signal is therefore indispensable, and a fast and effective real-time filtering algorithm for current feedback signals is essential for the safety of the fusion magnet system.
Filtering algorithms have been studied extensively. The traditional method, based on the discrete Fourier transform, is simple to implement [2,3] and is usually used for spectrum analysis and harmonic removal. However, it is better suited to stationary signals and does not work well for time-varying signals. The short-time Fourier transform (STFT) [4] compensates for this deficiency by using time–frequency windows to localize instantaneous features. Nevertheless, its fixed resolution restricts its usefulness. In recent studies, researchers have proposed combining STFT with convolutional neural networks [5], normative methods [6], and ensemble learning to process noisy signals for intelligent island detection. Despite their effectiveness in extracting signal features, machine learning-based methods have many network parameters and require large amounts of training data. The authors of [7] used empirical mode decomposition (EMD) for multiresolution filtering, which does not use external basis functions for the decomposition but instead uses the time-scale difference of the local extremum points. However, this method suffers from problems such as mode mixing and endpoint effects. While computationally complex filtering algorithms filter effectively, they have poor real-time performance and risk introducing phase lag, which can lead to critical problems such as loss of plasma control and emergency failure of the power supply. Computationally simple filtering algorithms, on the other hand, filter poorly and cannot be adopted. Overall, the aforementioned methods suffer from poor real-time performance, ineffective filtering of nonstationary signals, or significant phase lags that impair tracking performance.
The current of the fusion magnet power supply is controlled to follow the plasma control command and has typical nonstationary signal characteristics. The wavelet transform method is considered the most desirable for this task, as it is faster, more reliable, and more efficient than other filtering algorithms. This advantage stems mainly from its multiresolution characteristics, which are suitable for processing nonstationary signals. Moreover, wavelet transform theory is mature and reliable, and the analysis of approximate signals using wavelet bases has long been applied in plasma science [8,9]. Additionally, the discrete wavelet transform is simple to calculate and easy to apply, and with a proper choice of wavelet basis and decomposition depth it can filter effectively. Existing wavelet denoising methods include the modulus maxima method [10], the coefficient correlation method [11], and the threshold denoising method [12]; the most widely used is wavelet threshold denoising, which includes hard and soft thresholding. However, soft thresholding involves numerous nonlinear calculations [13], and its computation time increases significantly with deeper wavelet decomposition, compromising real-time performance. Besides the complexity of threshold calculation, both hard and soft thresholding suffer from boundary effects during wavelet decomposition and reconstruction, which degrade the filtering effect. In [14], a left–right symmetric extension method is proposed to diminish the boundary effect, and hard-threshold wavelet denoising is adopted to filter the converter feedback signal. Nonetheless, the filtering effect of this method is limited by its short sliding window, and its real-time performance is insufficient when the sliding window is lengthened. In summary, current wavelet-based filtering algorithms suffer from complex soft-threshold calculation, obvious boundary effects, and suitability for short sliding windows only.
In this study, an optimized real-time wavelet filtering algorithm with a hardware implementation is proposed. The proposed method eliminates the boundary effect by back propagation and updating, adopts a pipelined design, and introduces a wavelet loopback structure to improve the wavelet transform calculation process. It thereby balances real-time performance and filtering effect to meet the demands of current signal filtering. The main contributions of this paper are as follows:
In response to the boundary effect problem of the discrete wavelet transform, this paper proposes an updated half-edge extension method to eliminate errors caused by boundary extension.
A hardware-based wavelet loop structure is designed to achieve fast filtering of long sliding window data and implemented on an FPGA.
A pipelining method is adopted to improve the wavelet decomposition and reconstruction process and optimize the transmission of signals between different decomposition layers, further enhancing real-time performance.
The effectiveness of the proposed method was demonstrated experimentally on the fusion magnet power system of the Experimental Advanced Superconducting Tokamak (EAST) in terms of RMSE, SNR, and filtering time.
The remainder of this paper is structured as follows. Section 2 provides a detailed introduction to the selection of wavelet decomposition scales and wavelet bases, and specifically describes the updated half-edge extension method. Section 3 introduces the wavelet loop structure and the pipelining process of the wavelet transform. Section 4 demonstrates the real-time performance and effectiveness of the proposed algorithm. Section 5 discusses the limitations of the proposed method. Finally, Section 6 presents the conclusion.
2. Filtering Algorithm Design
According to the requirements of the controller, the filtering algorithm must execute on the microsecond (µs) level. For the EAST PF system controller, the control period is 100 µs and the execution time of the control software is 40 µs. Allowing for a time margin, the real-time filtering program must complete within 10 µs. Most importantly, however, these time requirements must be met without sacrificing the filter’s effectiveness.
Since the useful signal output by the power supply is continuous in the time domain, it correlates well with wavelets generated by translation and dilation, so a closely matching waveform is easy to find in the wavelet domain. In contrast, electromagnetic noise has no continuity in the time domain, so the wavelet coefficients it generates remain strongly random and similar waveforms are hard to find [15]. Therefore, after wavelet decomposition, the wavelet coefficients corresponding to the useful signal are larger, while those corresponding to the electromagnetic noise are smaller. Based on this analysis, the forced-denoising wavelet transform method is used, which discards the detail coefficients and retains the approximation coefficients.
2.1. Wavelet Transform Principle
The wavelet transform is a time–frequency analysis method developed in recent decades to analyze the local variation of signals in time series, and it is widely used in signal processing, image denoising, digital watermarking, etc. Mallat proposed a fast algorithm for tower-type multiscale decomposition and reconstruction of signals [16], one of the many fast algorithms for the wavelet transform. It is extensively used because of its low computational cost and excellent transform effect. It outputs high-frequency and low-frequency components, called detail coefficients D and approximation coefficients A. The coefficients of the next level are obtained from the approximation coefficients of the previous level, forming a tower shape, as shown in Figure 1. In this paper, only the approximation coefficients of the wavelet decomposition are retained, which halves the computational effort compared with the conventional wavelet transform.
The wavelet coefficients of the discrete wavelet transform are given in Equation (1).
$$A_0[k] = x[k], \qquad A_j[k] = \sum_{n} h[n-2k]\,A_{j-1}[n], \qquad D_j[k] = \sum_{n} g[n-2k]\,A_{j-1}[n] \tag{1}$$
where $g$ and $h$ represent the high-pass and low-pass orthogonal filters, respectively, and $x[k]$ represents the data sequence at discrete time $k$. $A_j$ and $D_j$ represent the approximation coefficients and detail coefficients obtained from the decomposition at the $j$-th layer.
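The decomposition and forced denoising described above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's implementation: it assumes the standard db2 scaling-filter taps, uses periodic wrap-around at the window edges purely to keep the example short (the paper uses its own extension method), and the function names `approx_step`, `synth_step`, and `forced_denoise` are hypothetical.

```python
import math

# Standard db2 scaling (low-pass) filter taps, computed in closed form.
_S3 = math.sqrt(3.0)
_DEN = 4.0 * math.sqrt(2.0)
H = [(1 + _S3) / _DEN, (3 + _S3) / _DEN, (3 - _S3) / _DEN, (1 - _S3) / _DEN]


def approx_step(x):
    """One decomposition level: low-pass filter and keep every second output.

    Only the approximation branch is computed; the detail branch is
    skipped because forced denoising discards it anyway.
    Periodic indexing is an assumption made for brevity.
    """
    n = len(x)
    return [sum(H[i] * x[(2 * k + i) % n] for i in range(len(H)))
            for k in range(n // 2)]


def synth_step(a):
    """One reconstruction level with all detail coefficients set to zero."""
    n = 2 * len(a)
    x = [0.0] * n
    for k, ak in enumerate(a):
        for i, hi in enumerate(H):
            x[(2 * k + i) % n] += hi * ak
    return x


def forced_denoise(x, levels=3):
    """Decompose `levels` times, drop the details, and reconstruct.

    len(x) must be divisible by 2**levels.
    """
    a = list(x)
    for _ in range(levels):
        a = approx_step(a)
    for _ in range(levels):
        a = synth_step(a)
    return a
```

Calling `forced_denoise(window, levels=3)` on a 32-sample window returns 32 filtered samples. Because the detail branch is never evaluated, each level performs half of the filtering work of a full two-branch transform.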
2.2. Selection of Wavelet Bases and Decomposition Layers
One of the problems in the practical application of the wavelet transform is the selection of the wavelet basis and the number of decomposition layers. Different wavelet bases and decomposition layers produce different results for the same problem and strongly affect the filtering effect. The fusion magnet power system’s signal has low-frequency and high-intensity characteristics. Therefore, to improve the denoising effect of real-time filtering, it is necessary to select a wavelet with short support and fast processing speed. Daubechies wavelets offer good orthogonality and compact support, which suit this scenario well [17,18]. The remaining question is which Daubechies wavelet to adopt. In this paper, simulation experiments were conducted using MATLAB R2011b (MathWorks) in Hefei, China, with a sampling frequency of 10 kHz and 1000 sampling points. A sinusoidal signal with a 50 Hz fundamental frequency was mixed with white noise at an SNR of 5 dB and decomposed into three layers using db1, db2, db3, and db4 in turn. Other Daubechies wavelets have longer filters, so they were not considered. The Daubechies wavelet type was selected by comparing the signal-to-noise ratio (SNR) and root-mean-square error (RMSE) while also considering the computational load. SNR and RMSE are defined as follows.
$$\mathrm{SNR} = 10\log_{10}\frac{\sum_{n=1}^{N} s(n)^2}{\sum_{n=1}^{N}\left[s(n)-\hat{s}(n)\right]^2}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left[s(n)-\hat{s}(n)\right]^2}$$
where $s$ is the original signal, $\hat{s}$ is the denoised signal, and $N$ is the signal length.
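For readers who want to reproduce this kind of comparison, the two metrics and the test signal can be sketched as follows. The sketch assumes the commonly used forms of SNR (signal energy over residual-error energy, in dB) and RMSE, and it assumes the white noise is Gaussian; the helper names `rmse`, `snr_db`, and `noisy_sine` are illustrative and do not come from the paper.

```python
import math
import random


def rmse(s, s_hat):
    """Root-mean-square error between the original and the denoised signal."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, s_hat)) / len(s))


def snr_db(s, s_hat):
    """SNR in dB: energy of the original signal over energy of the residual."""
    signal_energy = sum(a * a for a in s)
    error_energy = sum((a - b) ** 2 for a, b in zip(s, s_hat))
    return 10.0 * math.log10(signal_energy / error_energy)


def noisy_sine(fs=10_000.0, n=1000, f0=50.0, snr_target_db=5.0, seed=0):
    """Return (clean, noisy): a 50 Hz sine sampled at fs plus white noise.

    The noise standard deviation is chosen so that the expected SNR
    equals snr_target_db. Gaussian noise is an assumption of this sketch.
    """
    rng = random.Random(seed)
    clean = [math.sin(2.0 * math.pi * f0 * k / fs) for k in range(n)]
    signal_power = sum(v * v for v in clean) / n
    noise_std = math.sqrt(signal_power / 10.0 ** (snr_target_db / 10.0))
    noisy = [v + rng.gauss(0.0, noise_std) for v in clean]
    return clean, noisy
```

A candidate wavelet can then be scored by filtering `noisy` and passing the result, together with `clean`, to `snr_db` and `rmse`.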
As shown in Table 1, db2 and db4 filter better than the other wavelets. However, the db4 filter has eight taps, twice as many as the four-tap db2 filter, so the db4 wavelet requires considerably more computation. Taking this into account, the db2 wavelet was used as the wavelet basis for the experiment. According to the experimental results in Table 2, three decomposition layers were selected in this paper. Although four layers are more effective, the additional computational effort is substantial.
2.3. Boundary Extension Method
Real-time wavelet transform data are of unbounded length, and a sliding window is commonly used in engineering to handle such data. Only a finite-length window of data is processed at any point in time, so boundary extension problems are inevitable. For real-time wavelet filters, the boundary effect is especially obvious. Traditional wavelet boundary extension methods include zero extension, periodic extension, and symmetric extension. Zero extension is simple in form and easy to apply, but the padded data are meaningless, so it is not discussed here. Periodic extension destroys the consistency of the signal boundaries [19,20]. Traditional symmetric extension mirrors the boundaries of the signal, as shown in Figure 2. The mirroring operation makes the signal symmetric at the boundary and eliminates sharp discontinuities at the boundary connections, which is of practical significance [21]. The authors of [14] propose a left symmetric extension during wavelet decomposition and a right symmetric extension during wavelet reconstruction, as shown in Figure 3, which ensures both boundary continuity and a better filtering effect. However, whichever of these boundary extension methods is used, a boundary effect arises at the left boundary when the data of consecutive sliding windows are spliced, causing errors in the waveform at the junction; the different extension methods only reduce this error. Therefore, this paper proposes an updated half-edge extension method. As shown in Figure 4, unlike the left–right symmetric extension method in [14], the original preceding data are used as the extension values during decomposition. In this way, boundary effects at the left boundary are avoided.
When this method is applied to the sliding window problem, an error also arises at the window connection, i.e., where the right boundary of the previous window meets the newly updated wavelet-denoised data. As shown in Figure 5, assuming that the window slides by 8 units each time, the wavelet reconstruction is updated by 8 units of data. However, the right boundary data of the previous window are not calculated from collected data, and connecting them directly to the data of the next window inevitably produces a boundary effect. Therefore, the right boundary data are updated every time the window slides. This method not only inherits the advantages of symmetric extension but also eliminates the left boundary error of the real-time sliding window.
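The difference between mirroring the window and reusing real history on the left edge can be illustrated with a short Python sketch. It covers only the left-boundary part of the idea; the right-boundary update on each slide is not modeled. The function names and the extension length are hypothetical choices for illustration.

```python
def symmetric_left_extension(window, ext_len):
    """Conventional approach: mirror the first ext_len samples of the window."""
    return window[:ext_len][::-1] + window


def history_left_extension(history, window, ext_len):
    """Sketch of the left half of the proposed idea.

    Instead of synthetic mirrored values, the ext_len samples that were
    actually acquired just before the window are prepended.
    """
    return history[len(history) - ext_len:] + window


# Example on a ramp, where mirrored values clearly differ from real history.
stream = list(range(20))
history, window = stream[:8], stream[8:16]
mirrored = symmetric_left_extension(window, 3)
with_history = history_left_extension(history, window, 3)
```

On this ramp, `mirrored` starts with `10, 9, 8`, which reverses the trend of the signal, whereas `with_history` starts with `5, 6, 7`, the true preceding samples. Since the extension values are real data, the filter output near the left edge matches what a transform over the continuous stream would produce.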
The simulation comparison of four boundary extension methods in MATLAB R2011b is shown in Figure 6, and the performance of the four methods is judged by the signal-to-noise ratio (SNR) and the root-mean-square error (RMSE). Trapezoidal waves are most often used for fusion magnet power testing. In this paper, white noise at SNRs of 5 dB, 10 dB, and 15 dB and a power of 36 dBW was superimposed on a trapezoidal waveform. The window length was set to 32 as a trade-off between real-time performance and filtering effect. Table 3 compares the extension methods for superimposed 5 dB noise, and the remaining results are shown below.
As shown in Figure 6, the larger the superimposed noise, i.e., the smaller the SNR of the added noise, the more pronounced the filtering effect. The extension method in this paper has the smallest RMSE and the largest SNR under different noise intensities compared with the conventional extension methods. Compared with the left–right symmetric extension method of [14], the proposed method performs better because it updates the data at the window connection, which eliminates the error at the boundary connection.
3. Hardware Accelerated Architecture Design
The wavelet transform has been widely used in engineering because of its excellent filtering effect. DSPs or computers are routinely used as its application platforms. Although a DSP is strong in floating-point operation, it is weak in parallel execution and cannot satisfy the real-time requirement when the calculation volume is large and the deadline is tight. Computers have excellent computational performance, but their disadvantages are also significant, including large size, high cost, and poor suitability for embedded deployment. FPGAs are often used as hardware acceleration platforms for various algorithms because of their high performance, low latency, and reconfigurability [22,23,24]. To further enhance the real-time performance of the algorithm, an FPGA is used as the hardware acceleration platform for the filtering algorithm, and a parallelized and pipelined hardware acceleration structure for the wavelet transform algorithm is studied. The overall time consumption of the filtering algorithm is effectively reduced compared with the traditional serial execution method. In addition, based on the characteristics of the wavelet transform and the signal, a wavelet loopback transmission structure is proposed to avoid repeated calculations, as shown in Figure 7. The overall time consumption of the FPGA-based scheme is effectively reduced compared with the traditional DSP and computer implementation schemes.
The most important calculation in implementing the Mallat algorithm is the convolution of the original data with the filter. As seen from Equation (1), the essence of the discrete convolution is that the data are multiplied by the filter coefficients and then summed. Therefore, the multiplication and addition operations are split and executed in different cycles, and multiple data sets are multiplied or added in parallel within one cycle. As shown in Figure 8, only one operation is performed in each pipeline cycle, which effectively reduces the path delay and increases the circuit operating frequency. Since the filter coefficients to be convolved are all fractional numbers, they cannot be stored directly in the FPGA. The fixed-point conversion commonly used in engineering is adopted in this paper. The four filter coefficients in Figure 8 are all integers after 15-bit fixed-point conversion.
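The fixed-point conversion and the split multiply-then-add operation can be modeled in software as follows. This is a behavioral sketch, not the Verilog design: it assumes that "15-bit" means scaling by 2^15 (15 fractional bits), uses the standard db2 low-pass taps, and the names `FRAC_BITS`, `DB2_LOWPASS_Q15`, and `fixed_mac` are hypothetical.

```python
import math

FRAC_BITS = 15            # assumption: 15 fractional bits
SCALE = 1 << FRAC_BITS    # 2**15 = 32768

# Standard db2 low-pass taps in floating point.
_S3 = math.sqrt(3.0)
_DEN = 4.0 * math.sqrt(2.0)
DB2_LOWPASS = [(1 + _S3) / _DEN, (3 + _S3) / _DEN,
               (3 - _S3) / _DEN, (1 - _S3) / _DEN]

# Integer coefficients as they could be stored in the FPGA.
DB2_LOWPASS_Q15 = [round(c * SCALE) for c in DB2_LOWPASS]


def fixed_mac(samples, coeffs_q=DB2_LOWPASS_Q15):
    """Integer multiply-accumulate for one filter output.

    Stage 1 forms all products (in hardware these run in parallel),
    stage 2 adds them, and the final shift removes the 2**15 scaling.
    """
    products = [c * x for c, x in zip(coeffs_q, samples)]  # multiply stage
    acc = sum(products)                                    # add stage
    return acc >> FRAC_BITS                                # rescale
```

With integer input samples, `fixed_mac` uses only integer multiplications, additions, and one shift, which map directly onto FPGA multipliers and adders. Separating the multiply and add stages mirrors the pipelining idea of performing one operation per cycle.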
The entire wavelet loopback structure consists of a wavelet transform module, a write-back control module, and a window data FIFO for caching data. The wavelet transform module consists of two parts: decomposition and reconstruction. These two parts are sequentially dependent and cannot be processed in parallel, so this paper uses a pipeline structure. Both decomposition and reconstruction have three layers, as shown in Figure 9. Because upsampling and downsampling operations occur between layers, the results of the current layer are cached in an interlevel FIFO. When the acquisition module sends the acquired data, the wavelet transform module applies wavelet forced denoising to them and then sends them to the window data FIFO as a data stream, where the window data are assembled and part of them is updated.
When the update is completed and the FIFO is full, the write-back control module sends all the filtered window data in the FIFO to the next-level data processing module. While sending, the write-back module intercepts the window data, retains part of them, as shown in Figure 10, and writes them back to the FIFO in parallel. During this period, the FIFO performs read and write operations simultaneously. This method adds no extra data transfer time and avoids recomputing data for the next window, thus realizing real-time filtering of long sliding window data.
A short sliding window is sufficient if the collected signals are used for signal feedback. When the collected signals are used for fault diagnosis and performance testing, weak signal features must be extracted more reliably, so a longer sliding window is often needed. To verify that the method in this paper can satisfy the fault diagnosis scenario, the experiment uses a window with a total length of 512 original data points and a step size of 20 and processes the data in sliding-window form. Consequently, two consecutive wavelet transforms involve a large amount of repeated calculation. The wavelet loopback structure proposed in this paper writes most of the data back to the FIFO in parallel while sending the wavelet-transformed data stream to the downstream digital signal processing stage through the write-back control module. Each time, only the wavelet transform values of the 20 new original data points and of the data affected by the previous boundary extension need to be calculated, as shown in Figure 10. This structure effectively avoids repeated wavelet transform calculation, reduces data storage time, and improves the real-time performance of the wavelet filtering algorithm.
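The write-back idea can be modeled in software as a window update that reuses the overlapping filtered values. This sketch is behavioral only and is not the FPGA design: the function name `slide` and the `refreshed_tail` parameter are hypothetical, and the number of boundary-affected values is left to the caller because it depends on details of the extension method.

```python
WINDOW = 512  # total window length used in the experiment
STEP = 20     # number of new samples per slide


def slide(prev_filtered, new_filtered, refreshed_tail=()):
    """Build the next filtered window without recomputing the overlap.

    prev_filtered  : filtered values of the previous window
    new_filtered   : filtered values for the newly acquired samples
    refreshed_tail : recomputed values that replace the end of the
                     retained part (those affected by the previous
                     boundary extension)
    """
    step = len(new_filtered)
    kept = list(prev_filtered[step:])       # write-back: drop the oldest values
    if refreshed_tail:
        kept[-len(refreshed_tail):] = list(refreshed_tail)
    return kept + list(new_filtered)


# With a 512-sample window and a step of 20, 492 values are carried over.
reused_fraction = (WINDOW - STEP) / WINDOW
```

Under these numbers, roughly 96% of each window overlaps the previous one, so a scheme that recomputes only the new and boundary-affected values does a small fraction of the work of a full 512-point transform.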
4. Experimental Verification
To verify the proposed hardware acceleration scheme, this paper builds an FPGA-based hardware verification platform using the board shown in Figure 11, whose main chip is a Xilinx xc7z020clg400-2. The whole filtering program is written in Verilog and developed on the Vivado platform, and the resources used are shown in Table 4.
In this paper, actual data collected from the EAST fusion magnet power supply are filtered and processed, as shown in Figure 12.
The algorithm’s effectiveness is verified by filtering actual collected current signals of different amplitudes. Table 5 and Table 6 show that the method in this paper effectively improves the signal-to-noise ratio of the current signal while reducing its root-mean-square error. Figure 12 shows the filtering performance of the proposed method at 2000 A, 4000 A, and 8000 A, and the experimental results demonstrate its filtering effect.
In this paper, Xilinx’s ILA IP is used to monitor the FPGA’s internal signal waveforms in real time and thereby verify the filtering algorithm’s real-time performance. The computation time is shown in Figure 13. With the proposed method, the wavelet transform of one 20-sample step takes 163 × 10 ns = 1.63 µs (clocked at 100 MHz), and the wavelet transform of a single full window of length 512 takes 678 × 10 ns = 6.78 µs (clocked at 100 MHz).
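The timing figures follow from the cycle count divided by the clock frequency, and both can be checked against the 10 µs filtering budget stated in Section 2. The symbols $t_{\text{step}}$ and $t_{\text{window}}$ are labels introduced here for convenience.

```latex
\begin{align*}
t_{\text{step}}   &= \frac{163}{100\,\text{MHz}} = 163 \times 10\,\text{ns} = 1.63\,\mu\text{s}, \\
t_{\text{window}} &= \frac{678}{100\,\text{MHz}} = 678 \times 10\,\text{ns} = 6.78\,\mu\text{s}, \\
t_{\text{step}}   &< t_{\text{window}} < 10\,\mu\text{s}.
\end{align*}
```

Both the incremental update and the full-window transform therefore fit within the budget, with the incremental update leaving the larger margin.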
Compared with the scheme in [14], the proposed method achieves similar performance at a lower clock frequency, which benefits embedded deployment while supporting windows of different lengths. The high real-time performance of [14] relies mainly on the very high clock frequency of the industrial control computer; as shown in Table 7, its main frequency is up to 2536 MHz. However, its deployment cost is high, and it is suitable only for short sliding window data. For real-time filtering of long sliding window data, the method in this paper takes less time and has higher real-time performance than the method in [14], as shown in Table 8. In addition, the proposed loopback structure reduces the time by 67.4%, which satisfies the power system time requirements.