2.1. Analysis of the Results of Conventional Single-Directional Aging Experiments
The analytic model of the IGBT is the analytical relationship between the sensitive factor in the IGBT and the number of failures (i.e., the lifetime). The aging data is fitted correspondingly, and the expression is intuitive. In order to describe the relationship between the cumulative failure rate
F and the number of experiments (i.e., the number of cycles), it is necessary to select an appropriate life distribution. If the distribution model is properly selected, the
F under various working conditions could be obtained with more accuracy [
15]. Common life distribution models include normal distribution, lognormal distribution, Weibull distribution and log-Weibull distribution. The choosing of distribution model is the key to switching device fault reconfiguration.
The normal distribution, also known as the Gauss distribution, is the most common and widely used distribution of the probability distribution of all random phenomena. It can be used to describe many natural phenomena and various physical properties. In practical applications, there are many experimental data that can be fitted with a normal distribution as its approximate distribution [
16]. The lognormal distribution is a typical model that is derived from a normal distribution. When the logarithm of the expiration time “
ln (
t)” obeys the normal distribution,
t obeys logarithmic normal distribution. Logarithmic transformation can reduce the larger number to a smaller number. Lognormal distribution is widely used in fatigue life research [
17]. The Weibull distribution is a continuous distribution that is widely used in reliability, and its advantage lies in its strong adaptability to various types of test data. It is widely used in reliability engineering, especially for the distribution of wear and tear failure of electromechanical products. Since it can easily infer its distribution parameters using probability values, it is widely used in data processing of various life tests [
15]. The log-Weibull distribution is a special correlation distribution of the Weibull distribution which is obtained by taking the logarithm of
x and obeying the Weibull distribution. The effect is similar to the lognormal distribution, and the larger number can be reduced by logarithmic transformation [
18].
Table 1 is the equation for the distribution function and density function of the four distributions.
Because the design life of IGBTs is generally more than 20 years, aging data is generally obtained by means of accelerated fatigue test (AFT) [
19].
In
Figure 2, G
1 is auxiliary GTO,
I1 is a fatigue current and
I2 is used to calculate
Tj. R and C form an absorption branch. If G
1 is turned on,
I1 flows through the IGBT part. If G
1 is turned off, G
2 provides the freewheeling route of
I1. C
1 and C
2 produce
I1 and
I2, respectively. During the experiment, aging current
I1 only flows through IGBT. So, it can be called SAFT.
During AFT, the ambient temperature is approximately 25 °C.
I2 is 1.5 A, which is no more than 0.1% of the DUT rated current (1500 A) as a rule of thumb. The cycle time is 12 s with a duty cycle of 48%. During the AFT, DUT is placed in the actual TC power module and is then connected to the test platform. In AFT,
F is calculated from the observed
UCES [
7], as shown in Equation (3).
In Equation (3), UCES0 is the initial saturation voltage of a new DUT, and UCES is the current value. When UCES rises to 1.2 times that of UCES0, F is defined to be 1.
With the data from SAFT, the corresponding
F is calculated. Four different distribution models are fitted by the LSM method. As for the normal distribution, it gives:
As for the lognormal distribution, it gives:
As for the Weibull distribution, it gives:
As for the Logarithmic Weibull distribution, it gives:
In order to select the most suitable one of the four distributions, we plot the curve of the experimental data and four distributions, and the density functions are also plotted. The results are shown in
Figure 3 and
Figure 4. It should be noted that in
Figure 3 and
Figure 4, the curve of the Weibull distribution coincides with the curve of the Logarithmic Weibull distribution. In
Figure 3 and
Figure 4, the horizontal axis is the number of cycle times, and the vertical axis is cumulative failure rate, which is calculated according to DUT saturation voltage drop [
13,
17].
From
Figure 4, it seems that the Weibull distribution is closer to the practical fatigue process (hence the fault transition process) compared with the normal distribution, because the latter one shows a larger error. What is more, the lognormal distribution is basically consistent with normal distribution; hence the same fitting accuracy is obtained. However, the transient component of failure density in the case of the lognormal distribution function is too high; therefore, it is not suitable to describe the failure density of high-power IGBTs. The logarithmic Weibull distribution has the same characterization accuracy as the Weibull distribution, however its calculation process is much more complicated. Finally, the Weibull distribution is the most suitable one.
2.2. Effect of FWD Actions on Device Aging
It is not enough to consider IGBT only in the fatigue model. In the device, current can flow in two directions, as shown in
Figure 5, where C and E are the external terminals of the device. In
Figure 5a, current flows through IGBT. In
Figure 5b, current flows through FWD. In AFT, the DUT is heated by the test current flowing through it. However, conventional AFT only offers the test current in one direction, i.e., the direction shown in
Figure 5a [
13,
14,
15], considering that IGBT is more fragile. In conventional AFT (hence SAFT, single-directional AFT), the aging of IGBT is equal to the aging of DUT. However, the current through FWD also generates heat, hence FWD actions also cause DUT to age. SAFT should be improved to bidirectional AFT (BAFT) where the test current flows through both the IGBT and FWD. The data that is obtained by BAFT accurately shows the aging effects caused by IGBTs and FWD. Such an effect is shown more clearly in junction temperature variation, as shown in the simulation of the TC operation below.
Figure 6 shows TC topology, where Q
11~Q
32 are IGBTs, D
11~D
32 are FWDs, QB, RB and DB constitute the brake energy consumption branch. Q
11~Q
32 and D
11~D
32 are usually packaged in six modules onto the heat sink. QB and DB is not used frequency, therefore only the fatigue of Q
11~Q
32 and D
11~D
32 are considered here.
The internal temperature of the IGBTs and FWDs under the given operating conditions was obtained in an electro-thermal simulation of the TC model. In the simulation, the EMU went through two operating cycles. The simulation results are shown in
Figure 7. The horizontal axis in
Figure 7 is time, and the vertical axis is junction temperature. The switching device used is 1500 A/3300V, the switching frequency is 57 Hz~1000 Hz and the ambient temperature is 25 °C.
In
Figure 7, the maximum variation of FWD junction temperature (Δ
Tj2) is 56.3 °C, and that of IGBT (Δ
Tj1) is 39.7 °C. Junction temperature variation severely affects device lifetime [
20,
21]. According to the experimental data of LESIT [
22] and CIPS08 [
23], the junction temperature change and the average junction temperature are the key factors affecting the aging process.
Table 2 shows the maximum values of Δ
Tj2 andΔ
Tj1 under different load conditions [
24]. Δ
Tj2 and Δ
Tj1 are affected by various factors, such as power loss of IGBT and FWD, heat dissipation condition and thermal resistance. In
Table 2, Δ
Tj2 and Δ
Tj1 increase as the vehicle load increases (which appears in the form of load current, i.e., the fatigue current), which means that there is certain one-to-one correspondence between Δ
Tj1, Δ
Tj2 and the load current.
In
Figure 7, it should be noted that Δ
Tj1 is higher in the traction phase than Δ
Tj2 and lower in the braking phase. What is more, Δ
Tj2 grows much faster than Δ
Tj1 in the high-speed braking phase. This stems from the fact that more TC power output is required in the high-speed braking phase than in the traction phase, and such braking power mainly flows through the FWD instead of the IGBT [
6]. Therefore, the role of FWD should not be ignored during the fatigue of the switching device.
As stated above, the IGBT ages faster with the FWD actions, i.e., the fatigue process is accelerated. If the aging state of the DUT is still represented by the aging state of the IGBT (as is the case in the existing references [
2,
3,
4]), this acceleration effect of the FWD action must be considered.
2.3. BAFT and the Analysis of BAFT Results
The fatigue model reveals the relationship between the main factors affecting fatigue accumulation and the degree of fatigue. The load condition of the vehicle directly affects the fatigue of the switching device. The TC current output reflects the load level and this current flow completely through the switching device. Therefore, there is a one-to-one correspondence between the TC current output and the fatigue level of the switching device, and there is a one-to-one correspondence between the device current and the fatigue level and between the device current and the device lifetime. The correspondence is shown in Equation (8).
Nf is the number of cycles the device can withstand before failure. We denote Nf as the lifetime of the device; Ieq is the equivalent current amplitude of the device.
The acceleration effect of the FWD action means that the life of the device is shortened, as shown in Equation (9).
In Equation (7), Nf2 is the practical lifetime with FWD action, while Nf1 is the lifetime of only considering IGBT action; n is a coefficient indicating an acceleration effect.
However,
n is difficult to obtain through experimental testing, so other ways need to be considered to express the aging acceleration effect of FWD on power modules. For better applicability, we present the Equation (8) in its current form, as shown in Equation (10).
In Equation (10), iG is the current of the IGBT, iD is the current of FWD and μ is the acceleration factor. The fatigue of semiconductor devices is caused by the accumulation of thermal stress effects, so the current is in the form of RMS (Root-Mean-Square), and there is a square operation.
In order to obtain
μ, we propose a new AFT (BAFT) with bi-directional fatigue current. The BAFT test platform is shown in
Figure 8.
In
Figure 8, the device functions are similar to those in the single-directional accelerated fatigue test platform of
Figure 2, except that when G
2 and G
4 are turned on,
I1 flows through the FWD portion inside the DUT. In BAFT, the number of cycles of IGBT and FWD is equal, so only half of the total number of actions is considered.
For comparison, a bi-directional experiment was performed at
I1 of 1500 A. Based on the data obtained from the bi-directional experiment, the cumulative failure rate curve of BAFT is shown together with the cumulative failure rate curve of SAFT at 1500 A in
Figure 9. In
Figure 9, the horizontal axis is the number of cycle times and the vertical axis is cumulative failure rate. It can be seen that the bi-directional experiment significantly accelerates the aging of the entire switching device.
Since the fatigue process of the switching device follows the Weibull distribution, the Weibull distribution is fitted by the least squares method according to the bi-directional aging experimental data under 1500 A, and the mathematical analysis relationship between the BAFT and the number of cycles
x can be fitted to:
The result curve of SAFT is fitted as:
The mean of the Weibull distribution:
In Equation (13), Γ is a gamma function, and λ and k are parameters of the Weibull distribution.
To derive
μ, we define an intermediate variable
m which is derived from the quotient between the mean of
FB (
x) and
FS (
x), as shown in Equation (14).
In Equation (14), λ1 = 283769, k1 = 15.05, λ2 = 317254, k2 = 12.98. Using these values, m can be calculated as 1.1125.
From the difference between BAFT and SAFT, it is given that:
Consider iG = iD = 1500A and m = 1.1125. Equation (10) shows that FWD accelerates the aging speed by 23.77%.
2.4. Fault Model Fitted by Weighted LSM
However, it is not enough to analyze the experimental data of bi-directional aging under a single current. In order to obtain a more consistent distribution function with the practical trend of failure rate, this paper will analyze the bi-directional aging under different current values. Take the bi-directional 900A experiment as an example. The data collected from the experiment are plotted in
Figure 10. In
Figure 10, the horizontal axis is the number of cycle times and the vertical axis is cumulative failure rate.
It should be noted that when the test current decreases, the aging speed decreases sharply [
25]. It is too difficult to obtain complete aging data through experiments. Therefore, only the part with a failure rate of 0.4 is used for analysis.
Through the previous analysis, Weibull distribution is selected to fit the failure rate distribution. The least squares method can be used to estimate the parameters of non-linear regression analysis. The traditional methods of least squares are direct search method, lattice search method, Gauss-Newton method, Newton-Raphson method, etc. [
26]. Direct search method and lattice search method are inefficient and seldom used in practical engineering [
27]. Gauss-Newton method is also called linearization method. Newton-Raphson method is an improvement of Gauss-Newton method. However, in some cases, these two methods may require much iteration to converge, and in a few cases, they may not converge at all [
26].
To a certain extent, the problem of the above method can be avoided by using a genetic algorithm for least squares. GA (genetic algorithm) is an adaptive stochastic search method that is based on natural selection and biogenetic theory [
28]. Firstly, the solution of the problem to be solved is regarded as a number of the individuals in the population. The individual in the population is coded and expressed as a string. Then, according to the fitness of the individuals, the selection, crossing and variation operations between the individuals are simulated to generate new solutions. In the process of population cyclic evolution, individuals gradually evolve to the state of the approximate optimal solution; therefore, the optimal solution of the problem can be obtained. The flow chart is shown in
Figure 11.
The most important part of GA is fitness function. When GA simulates the selection of nature, the adaptation of each generation of population to conditions corresponds to a necessary parameter, which determines how much probability the individuals in this population participate in the heredity. The function of evaluating this probability is called fitness function. The suitability of fitness function selection will affect the whole GA. Selecting a different fitness function will result in different convergence and accuracy of the whole algorithm [
29].
Aiming at the fitting of Weibull distribution in this paper, the purpose of our fitting is to find the appropriate values of λ and k, so we take λ and k as variables in the initial population.
The fitness function selected is:
In Equation (16), xi is the number of experimental cycles and Yi is the failure rate corresponding to the number i experimental cycle in the actual experimental data.
It should be noted that GA is to solve the fitness function to a minimum value, and the corresponding variables of the minimum value as the output of the optimal solution. The concept of LSM is also used in the construction of fitness function. The purpose of the fitness function is to minimize the absolute value of the difference between the output value of the function to be fitted and the actual data. After summing the squares of the difference between each output value and the practical data, we can calculate a function that is closest to the curve that was formed by the practical data.
Because of the existence of crossing and variation operations, GA has randomness in the process of finding the optimal solution, which can avoid the algorithm falling into local optimum to a certain extent. The search for the optimal solution by GA is general. The fitting degree of λ and k fitted by GA should be the closest to all data points, and the fitted function should be the closest to the actual curve.
Taking
λ and
k as input variables and Equation (16) as fitness function, the algorithm is simulated by the GA toolbox of MATLAB. After 200 generations of the calculation, the optimal value is
λ = 925830,
k = 5.7629. The relationship curve between the optimal solution of each generation and the number of iterations of the algorithm is shown in
Figure 12. In
Figure 12, the horizontal axis is the number of generation and the vertical axis is fitness value. It can be seen that the algorithm converges basically after about 10 generations.
Using GA to calculate this can get better fitting accuracy and convergence, however when facing a large number of data in IGBT acceleration test, GA takes too much time, which is the biggest problem of genetic algorithm. Therefore, we introduce a weighted least squares method.
The weighted algorithm is helpful to select the most important region of a set of data. Because the aging process is too slow and the data that are obtained are less when the current is small, in order to describe the average level of bi-directional aging of IGBT under low current (900 A), the LSM method of weighted residual is introduced in this paper. The LSM method of weighted residuals can conveniently determine which interval in the whole definition domain is the most concerned.
The standard LSM is to form a hyper plane in two dimensions (single independent variable, single dependent variable or SISO) by some functions. The sum of the distances between the hyper plane and the points in the sample space is the smallest. This hyper plane can be defined as
; the equation at any point on the extended hyper plane can be defined as:
In Equation (17), C is the set of sample spaces.
In weighted LSM, we define residual as the sum of the distances between points and the hyper planes.
In the improved LSM, the definition
where
is defined as the quotient between the number of abscissa
of sample point
and the total number of sample points.
is usually determined by experiment or experience. For IGBT aging failure rate fitting, it is actually the resolution of junction temperature fluctuation (the relative relationship between failure rate and junction temperature fluctuation) or the time resolution (the relationship between failure rate and time).
If defined ,
This is actually the definition of product in vector space.
Note: Here A can be given according to the normal distribution.
According to the weighted LSM theory, we select different weights and obtain three curves, as shown in
Figure 13. In
Figure 13, the horizontal axis is the number of cycle times and the vertical axis is cumulative failure rate.
Curve 1 is obtained by maximizing the weight of the interval [7 × 105, 8 × 105] of experiments’ number, parameter λ = 868359, k = 8.99, mean value calculated by Equation (5), E1 = 822222.
Curve 2 is obtained by maximizing the weight of interval [4 × 105, 7 × 105]. The parameters λ = 918264, k = 5.96, E2 = 851580.
Curve 3 is obtained by maximizing the weight of interval [1 × 105, 5 × 105]. The parameters λ = 987945, k = 4.02, E3 = 895730. Compared with the experimental data curves, the three curves fit the original data most accurately in the interval with the largest weights.
From the point of view of the relationship between failure rate and time, in fact, what we need to focus on is not only the accuracy of characterization of failure rate in the whole life cycle of products, however also the accuracy of the most product failure processes. Accurately grasping the change of the latter is more conducive to accurately predicting the state of aging damage of components. According to this idea, observing the experimental data curve under 900 A, the failure rate rises rapidly in the interval where the number of tests is [4.5 × 105, 7.5 × 105], so we think that curve 2 fits the aging situation under 900 A most accurately. By fitting the data with this weighted least squares method, the function corresponding to the above curve 2 is the closest to the actual distribution function.
In order to validate the distribution function fitted by weighted LSM, we compare the fitting results with those of GA. The function curve obtained is compared with the curve that was fitted by weighted LSM as shown in
Figure 14, where the horizontal axis is the number of cycle times and the vertical axis is cumulative failure rate.
According to
Figure 14, the curve of the function that was obtained by GA and curve 2 almost coincides in the interval of cycle number [4 × 10
5, 8 × 10
5], which proves that the weighted LSM can more accurately fit the distribution function. Compared with the GA, the computational complexity of weighted LSM is greatly reduced and the computation time is shortened by 27% compared with the GA. So, the weighted LSM is more suitable for fitting the distribution.
2.5. Analytical Fatigue Model
Different BAFTs are based on different
I1, as shown in
Figure 15. The horizontal axis in
Figure 15 is the number of cycle times, and the vertical axis is cumulative failure rate. Under
I1 at 1200 A and 900 A, the mathematical model between cumulative failure rate
F and cycle number
x is as follows:
Obtained from Equation (12)
In Equations (21)–(23), FB1500(x), FB1200(x) and FB900(x) are the cumulative failure efficiencies of I1 at 1500A, 1200A and 900A, respectively, and their mean values are 304890, 479170 and 851580. The corresponding equipment service life Nf can be expressed in terms of the mean value.
In addition, the ambient temperature
Ta should also be considered [
30,
31]. According to Equation (9), the
Nf expression considering
Ta is obtained.
In Equation (24), g is specific functions.
Finally, after the fitting calculation, Equation (9) is expressed as:
In Equation (25),
k is an Arrhenius coefficient. According to Arrhenius’s law (a temperature increase of 10 °C means that the service life is shortened by half),
k is calculated to be 60.65. The experimental and fitting curves are shown in
Figure 16.
Considering the RMS calculation scheme, the amplitude is
times of the effective value, so:
The data obtained in the experiment and the curve that was obtained by mathematical fitting is shown in
Figure 16. The vertical axis in
Figure 16 is the cycle counts that a DUT could survive before failure. The horizontal axis is the amplitude value of the test current. As shown in
Figure 16, the obtained curve conforms to the obtained data very well.
By Equation (27), when Ieq and Ta are known, the service life or failure rate of the IGBT can be predicted. Equation (27) has been subjected to actual field data. The example we consider here is a metro line application in Shenzhen, a southern-China city, with the annual temperature average of 23 °C. The Ieq has been recorded to be 489 A in a single year; therefore, Nf is calculated to be 2,197,460. Considering that a train normally operates 204 cycles per day and normally 300 days per year, the train is capable of operating around 36 years under the same working conditions.
According to statistics offered by metro operating agency, the initial failure rate of a new module is 157 FIT. Given that they hold that the failure rate rise of 200% means that the switching devices in a batch have failed, the statistical service life of the applied power module in the specific metro line is around 47 years (offered by the agency). This means that the approach that was proposed has a prediction error of around 23%, which is within project tolerance (normally 20%~30%). It should be noted that the lifetime predicted by our model in Equation (27) is more likely to be true and more convincing, because a metro train is designed to operate for no more than 30 years, and the 47-year lifetime prediction does not take into consideration the acceleration of the aging process, as is shown by a bathtub curve.