1. Introduction
Capacitors (a single capacitor or several capacitors connected in series or in parallel) are widely used in the dc-link, filtering, and snubber circuits of power converters. Although modern capacitor technology has greatly improved, capacitors are still reported to be among the components with the highest failure rates in the field operation of power converters [1,2]. When a capacitor reaches the end of its life, the power converter will malfunction, which normally leads to a systematic converter inspection and analysis. After the degraded capacitor is identified, maintenance can be carried out for replacement. Even for a capacitor bank with only a single degraded capacitor, it is recommended that the entire bank be replaced as a whole to ensure the overall converter performance and reliability [3,4]. This post-failure maintenance approach is straightforward, but it is costly and unsafe because the replacement is normally carried out only after the converter has performed abnormally or failed. Reliability requirements are even more stringent in aerospace, the energy industry, and emerging modern transportation systems, including electric vehicles, high-speed railway trains, and Maglev trains. Therefore, implementing reliable capacitor health status evaluation or monitoring to ensure reliable field operation and preventive maintenance is of significant importance [5].
Figure 1 shows a simplified equivalent model of a capacitor along with its frequency characteristics. The capacitor impedance Z is divided into three regions dominated by C, ESR, and the equivalent series inductance (ESL), respectively. The ESL has a strong influence on the switching behavior of power converters [6]. Most state-of-the-art capacitor monitoring methods are based on the two typical indicators, namely C and ESR.
The health status of a capacitor can be evaluated by comparing its estimated or measured capacitance/ESR values with the original values. The end-of-life criteria vary greatly with the specific capacitor type. For example, the widely accepted end-of-life criterion for the aluminum electrolytic capacitor (AEC) is a 20% reduction of the original C, or a doubling of the original ESR [7]. However, there are practical difficulties in evaluating the health status of a capacitor with these widely accepted criteria. Firstly, the capacitance and ESR are highly temperature- and frequency-dependent [8,9,10], which might lead to inappropriate maintenance decisions. Secondly, the estimation approach must remain reliable under actual capacitor aging, measurement noise, environmental disturbances, load variations, and so on.
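As a minimal illustration of how this end-of-life rule could be checked in software, the sketch below applies the 20%-C / 2x-ESR criterion to a pair of estimated values; the helper name and the example numbers are assumptions for illustration only (the numbers loosely match the original capacitor parameters used later in this paper).

```python
def is_end_of_life(c_est, esr_est, c_rated, esr_rated,
                   c_drop=0.20, esr_factor=2.0):
    """Apply the widely used AEC end-of-life criterion:
    C reduced by 20% of its original value, or ESR doubled.
    Both estimates should refer to the same temperature and
    frequency as the rated (original) values."""
    c_failed = c_est <= (1.0 - c_drop) * c_rated
    esr_failed = esr_est >= esr_factor * esr_rated
    return c_failed or esr_failed

# Example: a capacitor rated 1800 uF / 31.5 mOhm, with estimates inside the healthy range
print(is_end_of_life(c_est=1500e-6, esr_est=58e-3,
                     c_rated=1800e-6, esr_rated=31.5e-3))  # False -> not yet end of life
```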
Over the past decade, many efforts have been made by academia and industry to develop C and ESR estimation methods. According to their methodology and implementation [3], they can be classified into three types: capacitor ripple current sensor based methods, circuit model based methods, and data-driven advanced-algorithm methods. In the first category, the methods in [11,12,13] adopt a classic current sensor (e.g., a resistor or Hall-effect sensor) to calculate the ESR, while the methods in [14,15] are based on a PCB Rogowski coil sensor. These methods can be implemented online. The methods in [12,13] estimate the ESR at the switching frequency, while those in [11,14,15] estimate it over a high-frequency range. However, the accuracy of these methods in ESR estimation is comparatively low. An alternative approach is to estimate C and ESR offline, which can provide results over the full frequency range. This can be achieved by externally injecting a desired current or voltage signal at a certain frequency into the capacitor [16]. Different algorithms [17,18,19] are then utilized to extract the relationship between the input current and the output voltage of an experimental circuit composed of a signal generator, a power amplifier, and an oscilloscope. These methods require additional hardware or software, which increases cost and complexity. In addition, the approaches in [17,18,19] require the capacitor to be removed from the converter. Therefore, using the sensors already present for converter control or protection is highly preferred. The voltage ripple or current information of a capacitor can be obtained indirectly from the circuit structure and operation, which is the idea of the second category. Several representative methods are reported in [10,20,21,22,23]. Most of these approaches can achieve online monitoring of capacitors without external signal injection. However, they depend heavily on the converter structure, which limits their applications. A few methods [24,25,26,27,28,29] have recently been proposed that inject an external signal for C or ESR estimation in various applications, which requires special operation of the converter to generate the signal during normal operation. Recently, efforts have been made continuously on the third category [29,30,31,32,33], where the power converter is treated as a black box and only its terminal information is utilized for capacitance estimation.
The motivation of this paper is to lift the application limits of the prior traditional methods and further reduce the requirement for extra sensors and hardware circuits. The estimation is based on intelligent algorithms, such as support vector regression (SVR) and the artificial neural network (ANN). With fewer hidden layers and lower sensitivity to noise, the ANN is an excellent non-deep-learning model and is commonly chosen as the baseline for deep learning models [34,35]. The pioneering research in [31,32,33] uses the 300 Hz voltage ripple to build a good mapping for estimating C at 300 Hz, but the influence of the ESR during actual capacitor aging is ignored. In other words, the actual capacitor degradation process, in which both C and ESR degrade, is not considered in the training. Moreover, only a single best fitting of the ANN is given, which might be an occasional good result in a challenging fitting task.
Among the aforementioned approaches, estimation of C or ESR is preferably performed at room temperature, where healthy capacitors differ greatly from aged ones, so that the influence of temperature can be avoided [9,10]. In most of these approaches, C is preferably estimated at low frequency (30 Hz in [26,29], 120 Hz in [25], 100 Hz in [22]), as shown in Figure 2b, while ESR is estimated in the high-frequency range above 1 kHz (mostly at the switching frequency [10,21,23,27]). However, according to [8,9], the difference between the ESR of a degraded capacitor and that of a healthy one is much larger at 120 Hz than above 1 kHz, which has been further proven in [24,28]. Therefore, the low-frequency ESR is also an appealing life indicator, with the additional advantage that manufacturers also provide important reference data at low frequency.
In this paper, using the reliability-critical Maglev chopper as the application background, a two-input ANN is explored for dc-link capacitor C/ESR estimation. The proposed method only uses inputs sensed by the voltage/current sensors already installed in the Maglev chopper. The impact of load variation on the ANN is investigated, as well as the actual capacitor aging. In the experiments, the proposed ANN aims at building an aggressive mapping between the capacitance/ESR at 120 Hz and the inputs (voltage ripple at 5 kHz and average levitation current), whose strong nonlinearity and complexity cannot be described or suggested by linear circuit equations. Therefore, cross validation must be applied to avoid over-fitting or under-fitting in the learning process and to ensure the stability of the ANN.
2. Maglev Chopper
Electromagnetic suspension (EMS) Maglev trains are very successful in commercial applications [36,37]. There are two types of EMS Maglev trains. One adopts a short primary linear induction motor (SLIM) installed on the two sides of the rail, as shown in Figure 2, and the other uses a long primary linear synchronous motor (LSM) [36]. The former type targets a top speed of 100 km/h, while the latter can operate up to 200 km/h. Their levitation systems stay the same [31]. The right half of Figure 2a shows the cross-sectional schematic of the EMS-SLIM Maglev system. It can be seen that the primary side of the SLIM is installed on the bogie and the secondary side (reaction plate) is installed on top of the F-track. The actual side view of the bogie is shown in the upper left corner of Figure 2a. Each carriage of the Maglev train contains five bogies, and each bogie consists of two suspension solenoid modules located on both sides of the F-track. As shown in Figure 2a, a suspension control box controls the two electromagnets in each module simultaneously. The mutual mechanical coupling between the paired electromagnet modules is decoupled by the bogie. This allows the control of one solenoid module to be independent of the other modules, which makes the suspension control much easier. The air-gap and acceleration sensors, together with an additional current sensor, feed the air gap, acceleration, and levitation current information back to the control processor, where the reference air gap is set, as shown on the right of Figure 2a. The suspension control processor then adjusts the duty cycle of the Maglev chopper, producing the desired levitation current. This levitation current is injected into the coils of the electromagnets so that a suspension force is generated for stable levitation of the Maglev train. The main circuit of one Maglev chopper is shown in Figure 2b. The chopper consists of two IGBT modules, each of which contains only one active switch (VQ1 or VQ2). The dc-link capacitor, an AEC, is mainly used to stabilize the dc voltage.
A typical mission profile of the Maglev chopper, based on the first domestic commercial Maglev line in China, is shown in Figure 3. A single one-way trip lasts about 20 min. Initially, the train is levitated, so the levitation current rapidly increases to its rated value. During the movement of the Maglev, the current ripples result from the dynamic suspension control. At around 600 s, the train is still levitated but stops at a station. During the stop, the levitation current is almost constant, as seen in Figure 3. Finally, at around 1100 s, the train lands at the terminal station and the current drops to zero. The train normally waits around 10 min for the instruction to move again and return to its starting station. During the waiting period, the levitation current stays at zero. Thus, the period when the train is levitated but not moving, and the levitation current is stable, provides a convenient slot for C/ESR estimation.
3. ANN for C and ESR Estimation
An ANN is capable of both adequately approximating any complex nonlinear relationship and adapting to unknown systems. Therefore, an intelligent solution for capacitor condition monitoring can be achieved by finding an accurate mapping between the input data available from the converter and the targeted output (i.e., C or ESR). After the training is complete, such a mapping becomes a monitor: it continuously receives the actual input data of the Maglev chopper and estimates C or ESR.
Temperature effects during the service life of electrolytic capacitors cause their electrolyte to evaporate, leading to deterioration [6]. In [38], electrolyte evaporation is closely examined, and it is found that the reduction in C and the increase in ESR actually proceed at different tempos. A typical illustration of capacitor degradation is shown in Figure 4, which reflects the C and ESR variations of most AECs. The ESR is reported to change much earlier and faster than C in the aging process.
In [31,32,33], different ANNs were first proposed and discussed for C estimation. However, ESR estimation is not discussed, and the effect of the ESR on C estimation is ignored; the actual capacitor aging process, in which both C and ESR vary, is not considered. In particular, the ESR in dc-dc converters contributes a large proportion of the output ac voltage ripple [21]. The proposed ANN must therefore reflect the actual degradation process of a capacitor to comprehensively determine the capacitor health status. In this way, the ANN becomes more robust and produces more reliable estimates.
The two-input ANN used here is shown in Figure 5. The target is the estimation of C or ESR. The inputs of the ANN are chosen to be the average levitation current and the voltage ripple on the dc-link capacitor. Within the Maglev chopper, a voltage sensor already monitors the dc-link voltage for protection purposes, and a current sensor monitors the levitation current for suspension control. Hence, no extra sensors are required for the proposed ANN.
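As a minimal sketch of how the two ANN inputs could be formed from the existing sensor data, the snippet below averages the levitation-current samples and takes the peak-to-peak dc-link voltage over one acquisition window as the ripple; the exact ripple definition, the helper name, and the numeric values in the example are assumptions made only for illustration.

```python
import numpy as np

def ann_inputs(vdc_samples, i_lev_samples):
    """Derive the two ANN inputs from one acquisition window:
    - dc-link voltage ripple (taken here as the peak-to-peak value,
      an assumption of this sketch)
    - average levitation current (mean of the current samples)."""
    v_ripple = float(np.max(vdc_samples) - np.min(vdc_samples))
    i_avg = float(np.mean(i_lev_samples))
    return np.array([v_ripple, i_avg])

# Example with synthetic waveforms (hypothetical dc-link level and ripple)
t = np.linspace(0.0, 0.01, 1000)
vdc = 330.0 + 0.5 * np.sin(2 * np.pi * 5e3 * t)   # small 5 kHz ripple component
i_lev = 8.0 + 0.2 * np.sin(2 * np.pi * 5e3 * t)   # ~8 A levitation current
print(ann_inputs(vdc, i_lev))
```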
3.1. ANN Structure
The neural network model used in this paper is a simple ANN whose structure consists of three layers, i.e., an input layer, a hidden layer, and an output layer, as shown in Figure 5. The input layer has no computational function; its main role is to hold the data fed to the ANN. The hidden layer has a computational function and performs the forward propagation of the data from the input layer. The hidden layer is interconnected with the input layer and the output layer through connections with different weights, thus establishing a nonlinear relationship between the input layer and the output layer. The main function of the output layer is to output the target parameter and to calculate the loss function for error back-propagation, thus updating the weights. The C or ESR is normalized and serves as the output of the ANN. Usually, a single hidden layer with multiple neurons is able to fit any nonlinear function [39]. The reasons for not using multiple hidden layers (i.e., deep learning) in this paper are the following [39,40,41].
Multiple hidden layers mean multiple layers of neurons, and therefore more neurons in the network structure. More neurons make the network training slow, and it becomes difficult to remove residual noise during training.
For some training cases, the curve fitting becomes over-specialized, which reduces the ability of the neural network to estimate new inputs rather than just the training inputs.
Multiple hidden layers increase the risk of converging to local optima; eventually, a locally optimal neural network is trained, which produces inaccurate outputs when making predictions.
Therefore, a single hidden layer is used in this paper.
Next, the mathematical definitions of the ANN are given. As shown in Figure 5, the key element of any ANN is the neuron. The inputs can be represented as a vector $\mathbf{I}_1 = (i_1, i_2)$, where $i_1$ and $i_2$ are the values of the first and second input dimensions, respectively. A weight is associated with each connected pair of neurons. Hence, the weights connected to the first neuron (i.e., neuron 1) in the hidden layer can be represented as a weight vector $\mathbf{W}_1 = (w_{11}, w_{21})$, where $w_{11}$ and $w_{21}$ are the weights associated with the connections between the inputs and neuron 1. The first mapping process, from the inputs to the first neuron in the hidden layer, can be written as:

\[ z_1 = \mathbf{W}_1 \cdot \mathbf{I}_1 = w_{11} i_1 + w_{21} i_2 \tag{1} \]

where $z_1$ denotes the result of the inner product of the inputs and the weights.
A neuron contains a threshold value that is used to regulate its action potential. To mimic this, an activation function, here denoted by the sigmoid nonlinear function $f(\cdot)$ (the activation function can be chosen in various ways, as detailed below), follows the first mapping process, which has the following form:

\[ h_1 = f(z_1 - \theta_1) = \frac{1}{1 + e^{-(z_1 - \theta_1)}} \tag{2} \]

where $h_1$ is the output of neuron 1 and $\theta_1$ is its threshold.
The other mapping processes, from the inputs to the other hidden neurons, follow the same form as Equations (1) and (2).
Similarly, the mapping result $\hat{y}$ from the hidden layer to the output layer can be written as:

\[ \hat{y} = \sum_{j=1}^{m} v_j h_j - \theta_o \tag{3} \]

where $m$ is the number of hidden neurons, $v_j$ is the weight connecting hidden neuron $j$ to the output neuron, $h_j$ is the output of hidden neuron $j$, and $\theta_o$ is the threshold of the output neuron.
During the training process, in order to obtain the optimal weight parameters and thus minimize the difference between the predicted output of the neural network and the actual result, it is necessary to define a loss function and specify a corresponding optimization algorithm. In this task, the loss function is the mean square error $E$ between the estimated values $\hat{y}_k$ and the ground-truth labels $y_k$:

\[ E = \frac{1}{N} \sum_{k=1}^{N} \left(\hat{y}_k - y_k\right)^2 \tag{4} \]

where $N$ is the number of training samples.
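The following numpy sketch mirrors Equations (1)-(4) for the two-input, single-hidden-layer network: inner products, a thresholded sigmoid activation, a linear output, and the MSE loss. The weight values are random placeholders, and the plain sigmoid is used here only for illustration (the simulations in Section 4 use the 'tansig' activation).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(inputs, W_hidden, theta_hidden, w_out, theta_out):
    """Two-input, single-hidden-layer network of Section 3.1.
    inputs:        shape (2,)   -> (i1, i2)
    W_hidden:      shape (m, 2) -> row j holds the weights of hidden neuron j
    theta_hidden:  shape (m,)   -> thresholds of the hidden neurons
    w_out, theta_out: output-layer weights and threshold"""
    z = W_hidden @ inputs                  # Eq. (1): inner products
    h = sigmoid(z - theta_hidden)          # Eq. (2): activation with threshold
    return w_out @ h - theta_out           # Eq. (3): linear output

def mse(y_est, y_true):
    """Eq. (4): mean square error between estimates and labels."""
    y_est, y_true = np.asarray(y_est), np.asarray(y_true)
    return float(np.mean((y_est - y_true) ** 2))

# Tiny usage example with placeholder weights (m = 3 hidden neurons)
rng = np.random.default_rng(0)
m = 3
y = forward(np.array([0.4, 0.6]), rng.normal(size=(m, 2)),
            rng.normal(size=m), rng.normal(size=m), 0.1)
print(y, mse([y], [0.5]))
```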
As shown in Figure 5, the training process is a typical supervised training, with the output C/ESR known. The process relies on an adaptive optimization algorithm used for error minimization. The purpose of training is to find the best parameters that fit the inputs to the output, so that a corresponding mapping composed of the hidden neurons can be found. The optimization algorithm used in this training is Bayesian regularization [31]. It is a very effective global optimization algorithm: compared with the grid search algorithm, it needs fewer iterations (it is more time-efficient) and its granularity can be very fine. In addition, the training algorithm prevents overfitting by stopping the iterations before the model fully converges on the training dataset.
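The sketch below is only a rough Python analogue of this training step, using scikit-learn's MLPRegressor with a tanh ('tansig'-like) hidden layer and an L2 weight penalty in place of MATLAB's 'trainbr'; it does not implement true Bayesian regularization, and the helper name and all hyperparameter values are placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_estimator(X, y, n_hidden=10):
    """Rough stand-in for the training step: a single-hidden-layer network
    with a tanh hidden activation and an L2 penalty on the weights.
    This is NOT Bayesian regularization; it only mimics its regularizing
    effect for illustration."""
    net = MLPRegressor(hidden_layer_sizes=(n_hidden,),
                       activation='tanh',
                       solver='lbfgs',
                       alpha=1e-3,          # L2 regularization strength (placeholder)
                       max_iter=100000)
    net.fit(X, y)
    return net

# Usage with synthetic data: 2 inputs -> 1 normalized target
rng = np.random.default_rng(1)
X = rng.uniform(size=(363, 2))
y = 0.5 * X[:, 0] + 0.3 * np.tanh(X[:, 1])
model = train_estimator(X, y)
print(model.score(X, y))   # coefficient of determination on the training data
```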
3.2. Size of the ANN Hidden Layer
The neurons in the hidden layer of a neural network convert the input-layer data into data that can be used by the output layer. An insufficient number of neurons can cause underfitting, resulting in a network that is incapable of expressing the task. An excessive number of neurons means that the ANN requires more data for training, which greatly increases the computational workload and severely slows down the network training. It may also negatively affect the generalization properties of the ANN when the number of neurons becomes very large [40,41,42]. These effects limit the applicability of the ANN in online settings. Essentially, there should be a trade-off between the number of neurons in the hidden layer and the computational cost and generalization efficiency of the neural network. Several techniques exist in the literature [42] to determine the optimal number of neurons in the hidden layer. In [43], it is suggested that a sufficient number of hidden neurons m (N_C for C estimation and N_ESR for ESR estimation) can be determined from the number of neurons in the output layer and the number of samples in the training set. However, when such a method for calculating the optimal number of hidden neurons is inapplicable, the trial-and-error method, the most primitive method in existing studies, is used; it usually yields a result close to the optimal number of neurons. In fact, in most applications, the user keeps changing the number of hidden neurons during the training process until training produces a neural network with the optimal number of neurons.
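A minimal sketch of this trial-and-error search is given below: candidate hidden-layer sizes (here 3 to 18, matching the range discussed in Section 4) are each trained once and compared on a held-out validation set. It reuses the hypothetical train_estimator helper from the earlier training sketch; the split ratio and seed are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def select_hidden_size(X, y, candidates=range(3, 19), seed=0):
    """Trial-and-error search over the hidden-neuron count:
    train one network per candidate size and keep the size with
    the smallest validation MSE."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.1,
                                                random_state=seed)
    best_m, best_err = None, np.inf
    for m in candidates:
        net = train_estimator(X_tr, y_tr, n_hidden=m)  # helper from the previous sketch
        err = np.mean((net.predict(X_val) - y_val) ** 2)
        if err < best_err:
            best_m, best_err = m, err
    return best_m, best_err

# Example: best_m, best_err = select_hidden_size(X, y)
```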
3.3. Size of the Training Data
Every artificial neural network has to go through a training phase before its performance is evaluated in the testing phase. After determining the optimal number of neurons in the network, it is necessary to select the number of training samples so that the ANN exhibits acceptable generalization capability in the testing phase. In [34], Vapnik and Chervonenkis defined a parameter called the Vapnik-Chervonenkis dimension ($d_{VC}$) as a metric of the ANN generalization ability. If the number of training samples exceeds $d_{VC}$, the error in the testing phase can be kept within a limited range. Regardless of the number of layers of the artificial neural network and the type of activation function it uses, for an ANN with a given number of weights and nodes, bounds on $d_{VC}$ can be derived in terms of the number of weights, the number of nodes, and the dimension of the input data [34]. A useful rule of thumb is that $d_{VC}$ should be around one tenth of the number of training samples.
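The back-of-the-envelope sketch below illustrates this rule of thumb, using the number of adjustable parameters (weights plus biases) of the two-input, single-hidden-layer network as a rough proxy for $d_{VC}$; this proxy and the helper names are assumptions made only for illustration.

```python
def parameter_count(n_inputs, n_hidden, n_outputs=1):
    """Weights plus biases of a single-hidden-layer network."""
    return (n_inputs * n_hidden + n_hidden) + (n_hidden * n_outputs + n_outputs)

def suggested_training_samples(n_inputs, n_hidden, factor=10):
    """Rule of thumb from Section 3.3: training samples ~ 10 x d_VC.
    The parameter count is used here as a rough proxy for d_VC,
    which is only an order-of-magnitude approximation."""
    return factor * parameter_count(n_inputs, n_hidden)

print(parameter_count(2, 10))             # 41 adjustable parameters
print(suggested_training_samples(2, 10))  # ~410 samples suggested
```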
4. Simulation Results
A typical simulation model of a standard Maglev chopper was developed in Matlab. The circuit simulation parameters are presented in Table 1.
The activation function for the hidden layer is chosen to be 'tansig', and the output activation is 'purelin' in Matlab. The training algorithm is 'trainbr'. The maximum number of training steps is set to 100,000, the minimum training error is set to 1.0 × 10⁻⁸, and the learning rate is 0.01. Since the condition-monitoring problem is essentially a regression and estimation problem, the methods in this paper use known data and curve fitting to establish the relationship between inputs and outputs. Three metrics, $R^2$, PE, and MAPE, are used to evaluate the regression performance:

\[ R^2 = 1 - \frac{\sum_{k}\left(y_k - \hat{y}_k\right)^2}{\sum_{k}\left(y_k - \bar{y}\right)^2} \]

\[ \mathrm{PE} = \frac{\hat{y}_k - y_k}{y_k} \times 100\% \]

\[ \mathrm{MAPE} = \frac{1}{N}\sum_{k=1}^{N}\left|\frac{\hat{y}_k - y_k}{y_k}\right| \times 100\% \]
where $y_k$ and $\bar{y}$ are the actual value and the average of C or ESR, and $\hat{y}_k$ is the estimated value. The coefficient of determination $R^2$ lies between 0 and 1; when it is close to 1, a strong correlation between inputs and outputs is achieved. The percentage error (PE) and the mean absolute percentage error (MAPE) assess the percentage error of the estimated results.
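A minimal numpy implementation of the three metrics, following the standard definitions of $R^2$, PE, and MAPE given above, is shown below; the function name and the example values are placeholders.

```python
import numpy as np

def regression_metrics(y_true, y_est):
    """R^2, per-sample percentage error (PE), and MAPE for C/ESR estimation."""
    y_true, y_est = np.asarray(y_true, float), np.asarray(y_est, float)
    ss_res = np.sum((y_true - y_est) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    pe = (y_est - y_true) / y_true * 100.0   # signed percentage error per sample
    mape = np.mean(np.abs(pe))               # mean absolute percentage error
    return r2, pe, mape

# Example: ESR values in milliohms with estimates close to the actual values
r2, pe, mape = regression_metrics([31.5, 40.0, 63.0], [31.6, 39.9, 63.1])
print(r2, pe, mape)
```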
In [31,32,33], the ESR's effect on the estimation of C using the ANN approach is ignored. In the actual aging of a dc-link capacitor, as shown in Figure 4, C decreases while ESR increases, and the rate of change of the ESR is much larger. Co remains 1800 μF and ESRo remains 31.5 mΩ. The variation range for C is 80% to 100% of Co, while that for ESR is 100% to 200% of ESRo. The tuning step for C is 2% of Co, while that for ESR is 10% of ESRo. The nominal duty cycle is 0.6, which corresponds to a nominal current of 8 A. Considering the possible variation of the levitation current when the train stands still but is levitated, duty cycles of 60%, 61%, and 62% are chosen for training. This consideration acts as a load perturbation to increase the robustness of the ANN and reinforce its immunity to noise. In total, 363 data samples are generated by Matlab simulation. For the two-input ANN, the desired number of neurons is about 3 to 18, which gives an important reference for determining the number of neurons. This is intended to be a more challenging fitting task than that in the previous research [31,32,33].
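The sketch below enumerates the 363 training conditions described above (11 C values x 11 ESR values x 3 duty cycles); running the chopper model for each condition to obtain the voltage ripple and average current is left to the circuit simulator and is not shown here.

```python
import numpy as np
from itertools import product

C_O, ESR_O = 1800e-6, 31.5e-3                    # original values from Section 4

c_grid = C_O * np.arange(0.80, 1.001, 0.02)      # 80% .. 100% in 2% steps -> 11 values
esr_grid = ESR_O * np.arange(1.00, 2.001, 0.10)  # 100% .. 200% in 10% steps -> 11 values
duty_grid = [0.60, 0.61, 0.62]                   # load perturbations used for training

conditions = list(product(c_grid, esr_grid, duty_grid))
print(len(conditions))   # 11 * 11 * 3 = 363 simulation cases
# Each (C, ESR, duty) case is then simulated in the chopper model to obtain
# the ANN inputs (voltage ripple, average levitation current) and the labels.
```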
To find the best fitting, the 363 data samples are randomly divided into 10 subsets: 8 subsets are used for training, 1 for testing, and 1 for validation. After this single division of the 363 samples, the training can start. When the number of neurons N_ESR is increased from 1 up to 10, the overall R for ESR estimation surprisingly reaches 0.99999 for a single fitting, as shown in Figure 6a. Similarly, when the number of neurons N_C is increased from 1 up to 10, the overall R for C estimation reaches 0.99972, as shown in Figure 7a.
However, the mapping obtained from a single data division cannot prove that it actually reveals the real correlation between the inputs and C. The mapping found might only reflect a local optimum, indicating merely an occasionally good estimation. Therefore, cross validation must be applied. It is achieved by randomly dividing the 363 data samples ten times, giving ten different data combinations for training, validation, and testing. Correspondingly, 10 different trainings are performed.
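A sketch of this cross-validation procedure is shown below: the data are randomly re-divided ten times and one network is trained per division, so that the spread of the test errors can be examined. For simplicity, the 8/1/1 training/validation/test division is reduced here to a plain train/test split, and the hypothetical train_estimator helper from the earlier sketch is reused.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

def repeated_random_fittings(X, y, n_rounds=10, n_hidden=10):
    """Ten random train/test divisions; one ANN trained per division.
    The spread of the test MAPEs indicates whether a single good fitting
    is genuine or merely coincidental."""
    splitter = ShuffleSplit(n_splits=n_rounds, test_size=0.2, random_state=0)
    mapes = []
    for train_idx, test_idx in splitter.split(X):
        net = train_estimator(X[train_idx], y[train_idx], n_hidden=n_hidden)
        y_hat = net.predict(X[test_idx])
        mapes.append(np.mean(np.abs((y_hat - y[test_idx]) / y[test_idx])) * 100.0)
    return np.array(mapes)

# Example: mapes = repeated_random_fittings(X, y); print(mapes.mean(), mapes.std())
```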
Under cross validation, the proposed ANN shows stable performance in both C and ESR estimation; however, the ESR estimation is more stable than the C estimation. The MAPE for ESR estimation, as shown in Figure 6b, is only 0.01%, and across the different fittings the PEs under different data samples are distributed in a very small range of ±0.15%. The result of C estimation is shown in Figure 7b: its MAPE is 0.16%, and the PEs under different estimations are distributed in a larger range of ±0.6%. Further increasing N_C does not help reduce the MAPE or the errors of the 10 random fittings.
6. Discussion
Both C and ESR estimation using the two-input ANN in Maglev choppers are discussed in this paper. Different from the previous data-driven methods [30,31,32,33], the degradation effect of the ESR on C estimation is considered; therefore, the actual capacitor aging process is integrated into the training. Moreover, an aggressive fitting is tested with the proposed ANN. In the simulations, the fittings for ESR and C estimation are almost equally good, although the stability of the C estimation is slightly worse. The experimental results, however, illustrate that ESR estimation using the proposed two-input ANN outperforms C estimation. The fitting for ESR estimation is very successful at both 120 Hz and 5 kHz. For the single best fitting, both C and ESR estimation achieve excellent results, i.e., a very high R close to 1. C estimation at 120 Hz needs many more neurons in the hidden layer to achieve a satisfactory single fitting, so C estimation takes more time and effort to train. In terms of stability, however, C estimation shows some problems. The mapping found by the ANN might be a local optimum, and there is a high possibility that a good fitting is merely coincidental: by changing the training and testing data sets, the estimation errors differ greatly. This indicates that, in practical applications, the proposed ANN might occasionally provide inaccurate C estimates.
In comparison with prior data-driven approaches, ESR estimation by the proposed two-input ANN is much more stable. After ten random selections of training and test data, all trained mappings can accurately estimate the ESR, with MAPEs of around 1%. Moreover, only 10 neurons are needed for ESR estimation, which indicates a quick training process. One reason for the large difference in C estimation by the ANN is that the capacitance in simulation does not change with frequency as it does in practice; therefore, in the experiment, the nonlinearity of the C estimation is strengthened. Another reason is that the voltage ripple caused by the ESR variation is much larger in a high-frequency dc-dc converter than that caused by C, so the fitting for ESR estimation is comparatively easier. The influence of temperature on the ESR or C estimation is not discussed due to the page limit and deserves future research.
Therefore, future online capacitor monitoring can use the two-input ANN for ESR estimation at 120 Hz or 5 kHz, depending on the availability of the original values. The ten-neuron ANN generates a very simple target function composed of several polynomial terms. The mapping obtained by ANN training can easily be implemented on a typical DSP control board; in other words, the function can be integrated into the Maglev suspension control. The online capacitor health condition monitoring can then be carried out each time the Maglev train is levitated stably before being put into motion.