1. Introduction
Power electronics technology has advanced tremendously as an essential component of the power conversion and control processes that meet specific demands. Metal–Oxide–Semiconductor Field-Effect Transistors (MOSFETs) and Insulated Gate Bipolar Transistors (IGBTs) are used in a wide spectrum of power electronics systems, with applications ranging from automotive and locomotive systems to avionics, aerospace, and other space missions, as well as safety-critical operations. The proper functioning and health monitoring of these devices are of utmost importance for avoiding downtime, ensuring human safety, and preventing catastrophic failures [1].
Discrete power devices and modules in power electronic systems are exposed to various operating and environmental stresses, including electrical loading, thermal stress, mechanical vibration, humidity, and chemical stress. These stresses cause degradation or catastrophic failure, which affects the proper functioning and hence the lifetime and reliability of electrical and electronic products and systems. This has been the main driving force behind improving the reliability of power devices. In particular, safety- and mission-critical applications such as automotive, locomotive, and aircraft systems require near-zero defect or failure tolerance. Despite efforts that have significantly reduced the failure rates of power devices, reliability remains a central concern in many application areas [2]. The demand for uninterrupted and consistent power delivery, system availability, and safety in these critical applications has driven the need for advanced reliability assessment techniques and accurate prognostics [3,4]. Such techniques help to optimize design parameters and quality characteristics, which ultimately helps to minimize failure tolerances and establish maintenance strategies [5,6].
Prognostics and health management (PHM) is an engineering method that enables the diagnostics and prognostics of products and systems based on in situ health monitoring, offline degradation patterns, and the identification of failure modes and mechanisms. The failure modes of power MOSFETs can be grouped into package structure failures and precursor drift failures, and each failure mode can be caused by one or more failure mechanisms. According to the literature, the common failure mechanisms in power devices are either chip-related (intrinsic) or package-related (extrinsic) [1,5].
Chip-related failures include body diode and gate oxide degradation, while package-related failures include bond wire failures (lift-off or fracture) and solder layer degradation due to wear-out (degradation) or overstress [2,7]. Package-related failures are caused by thermomechanical stress arising from the mismatch in the coefficients of thermal expansion of the multilayer materials, which occurs when power MOSFETs are subjected to thermal cycling stress. Because the reliability of power semiconductor devices is strongly affected by these failure mechanisms, lifetime prediction has become a critical means of avoiding catastrophic failures through early warning measures. Lifetime prediction requires an understanding of the dominant failure modes and mechanisms; consequently, the first and most important task in the lifetime prediction of power devices is the extraction of failure precursor data. The dominant failure precursor parameters then guide the selection of an appropriate prognostic method [1]. The main failure precursors commonly used for power MOSFETs are the on-state resistance (Rds(on)), threshold voltage (Vth), and junction temperature (Tj), while the collector–emitter voltage (VCE) and collector–emitter current (ICE) [8,9,10,11] are used for IGBTs.
Studies on PHM for power devices can be divided into three categories: physics-based, data-driven, and fusion prognostic approaches. Physics-based models, also known as model-based approaches, employ mathematical models or equations developed from first principles of the damage or degradation mechanisms observed in experiments. These models use knowledge of the system’s or product’s lifecycle loading and failure mechanisms to perform reliability modeling and lifetime assessment [12,13]. For example, model-based approaches estimate the lifetime of power devices from junction temperature swings and the number of cycles to failure using empirical or analytical lifetime models such as the Coffin–Manson, Bayerer, and Norris–Landzberg models [1,14].
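To make the lifetime-model idea concrete, the following Python sketch evaluates a Coffin–Manson power law in the junction temperature swing ΔTj, extended with an Arrhenius term in the mean junction temperature Tm (the form used, for example, in the LESIT model). The coefficients `a`, `n`, and `ea_ev` are hypothetical placeholders, not values fitted to the devices in this study; in practice they are obtained by fitting power cycling data.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def cycles_to_failure(delta_tj, tj_mean_k, a=3.0e5, n=5.0, ea_ev=0.6):
    """Coffin-Manson law with an Arrhenius term: N_f = a * dTj^(-n) * exp(Ea / (kB * Tm)).

    delta_tj    -- junction temperature swing in K (same numeric value in degC)
    tj_mean_k   -- mean junction temperature in kelvin
    a, n, ea_ev -- hypothetical fit parameters (placeholders, not fitted values)
    """
    if delta_tj <= 0 or tj_mean_k <= 0:
        raise ValueError("temperature swing and mean temperature must be positive")
    return a * delta_tj ** (-n) * math.exp(ea_ev / (K_B_EV * tj_mean_k))


# Compare the three swings used in this study at an assumed mean temperature of 350 K.
for swing in (45.0, 100.0, 110.0):
    print(f"dTj = {swing:5.1f} K -> N_f ~ {cycles_to_failure(swing, 350.0):.3e} cycles")
```

Because the swing enters as a power law, the ratio of lifetimes at two swings depends only on `n`: with n = 5, doubling ΔTj at the same mean temperature reduces the predicted number of cycles to failure by a factor of 32.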
In model-based approaches, filtering algorithms such as Kalman filtering (KF) and particle filtering (PF), as well as Bayesian methods, are usually employed to estimate model parameters recursively from measured data. Dusmez et al. [15] used the KF algorithm for the lifetime estimation of thermally stressed MOSFETs under the assumption of a linear system with Gaussian noise, which is often unrealistic. Patil et al. [9] used the PF algorithm for the lifetime prediction of IGBTs with collector–emitter current data. Wu et al. [16] applied an improved PF algorithm to a thermally overstressed MOSFET dataset. Although model-based prognostics for power MOSFETs have made some progress, physical models of degradation evolution can be inaccurate because future operating conditions are usually uncertain. In addition, it is challenging to formulate a physical model that describes the degradation of power MOSFETs operating in complex and dynamic systems. Moreover, the popular parameter estimation methods have their own drawbacks: PF suffers from particle degeneracy, and KF assumes a linear system with Gaussian noise.
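As an illustration of the filtering idea and of the linear-Gaussian assumption criticized above, the following Python sketch tracks a drifting precursor such as Rds(on) with a linear Kalman filter whose state is a level plus a constant drift rate, then extrapolates the filtered trend to a threshold. This is a minimal sketch, not the model of Dusmez et al. [15]; the noise variances `q` and `r`, the helper `steps_to_threshold`, and the synthetic data are hypothetical.

```python
import numpy as np


def kalman_track(measurements, q=1e-6, r=1e-2, dt=1.0):
    """Track a drifting precursor with a linear level-plus-slope Kalman filter.

    measurements -- 1-D sequence of noisy readings (e.g., Rds(on) in mOhm)
    q, r         -- hypothetical process / measurement noise variances
    dt           -- sampling interval in measurement steps
    Returns an array of filtered states [level, slope], one row per measurement.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])   # level grows by slope * dt each step
    H = np.array([[1.0, 0.0]])              # only the level is observed
    Q = q * np.eye(2)                       # process noise covariance (Gaussian)
    R = np.array([[r]])                     # measurement noise covariance (Gaussian)
    x = np.array([float(measurements[0]), 0.0])  # start at first reading, zero slope
    P = np.eye(2)                           # initial state uncertainty
    states = []
    for z in measurements:
        # Predict: propagate the state and its covariance through the linear model.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update: correct the prediction with the new measurement.
        innovation = z - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ innovation
        P = (np.eye(2) - K @ H) @ P
        states.append(x.copy())
    return np.array(states)


def steps_to_threshold(level, slope, threshold):
    """Linear extrapolation of the filtered trend to a failure threshold."""
    if slope <= 0:
        return float("inf")  # no upward drift: threshold never reached under this model
    return max(0.0, (threshold - level) / slope)


rng = np.random.default_rng(1)
steps = np.arange(300)
readings = 37.0 + 0.01 * steps + rng.normal(0.0, 0.05, steps.size)  # synthetic data
level, slope = kalman_track(readings)[-1]
print(f"level {level:.2f}, slope {slope:.4f} per step, "
      f"steps to 43.45: {steps_to_threshold(level, slope, 43.45):.0f}")
```

The straight-line extrapolation is only as good as the linear model behind it; a degradation curve that accelerates near end of life would violate exactly this assumption.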
On the other hand, data-driven approaches implement algorithms that recognize patterns in large amounts of experimental or simulation data to derive empirical degradation models. These methods do not require an explicit mathematical model to describe the evolution of power MOSFET degradation. Recently, artificial intelligence (AI) and machine learning algorithms have benefited the power electronics sector because of their immense potential in anomaly detection, diagnostics, and the prognostics of semiconductor device degradation [17]. Because machine learning predictive algorithms are trained directly on raw performance indicators or precursors, they make very few assumptions about the underlying principles governing the lifetime of power devices. Pugalenthi et al. [18] employed a feed-forward neural network to model the reliability of power converters using run-to-failure data. Although such networks can predict the future lifetime, vanilla feed-forward neural networks are not designed for sequential data and are therefore less accurate. Zhao et al. [19] derived composite precursor parameters and applied a genetic programming algorithm to the lifetime prediction of power devices. Finally, fusion prognostic approaches, which leverage the advantages of both data-driven and model-based methods, can also be employed in lifetime prediction.
In general, classical model-based approaches and statistical and filtering algorithms have limitations in handling the dynamic degradation behavior of failure precursors (performance data) in power devices. Although there are some studies on lifetime prediction, few address the prognostics of power MOSFETs based on degradation data collected under realistic working conditions, and few use advanced prognostic approaches. Recently, recurrent neural networks (RNNs) have been used to overcome these shortcomings. The notion of a time step in RNNs makes them suitable for sequential learning [20] in tasks including natural language processing, machine translation, image captioning, and genomic analysis [21]. Nevertheless, RNNs suffer from exploding or vanishing gradients, which led to the introduction of the long short-term memory (LSTM) algorithm. LSTM networks are a popular RNN variant whose long-term memory capacity addresses these shortcomings of traditional RNNs. Owing to their ability to capture long-term degradation dependencies, deep learning algorithms such as LSTM and the gated recurrent unit (GRU) have been widely used for time-series prediction in LEDs, batteries, and other applications [22,23].
In this paper, a prognostic method based on LSTM and its variant, GRU, is developed to estimate the lifetime of power devices. The study focuses on the long-term lifetime prediction of power MOSFETs using failure precursor data. An accelerated aging test based on power cycling is conducted to gather electrical failure precursor parameters from the power device test samples. In this power cycling test, the devices are supplied with electrical power and switched ON and OFF for set periods, which mimics real application conditions. The on-state resistance degradation precursor is used as long-term time-series data to train and validate the proposed LSTM and GRU models. The failure precursor data are partitioned into training and testing sets for the proposed neural network structures to prevent information leakage in the developed model.
The remainder of this paper is organized as follows: Section 2 describes the prognostic model developed based on the deep learning algorithms RNN, LSTM, and GRU. The experimental design and setup employed to gather the accelerated aging degradation data are described in Section 3. In Section 4, the experimental results and the details of the data analysis are presented and discussed, and conclusions are drawn in Section 5.
2. Description of the Proposed Theory and Methodology
In this section, an overview of the proposed methodologies will be discussed. The proposed models and algorithms for modeling the degradation of power MOSFET device test samples and deep learning-based reliability assessment and lifetime prediction are introduced. The prognostic model developed based on the LSTM and GRU algorithms as well as the model regularization and prediction accuracy metrics for the performance degradation of power MOSFET test samples are presented.
Figure 1 shows an overview of the experimental design and setup for data collection, the prognostic model with LSTM/GRU networks, and the long-term lifetime prediction and evaluation methodology.
There are a few considerations that have to be addressed in the application of LSTM for the long-term lifetime prediction of power MOSFETs. The first consideration is an appropriate choice of training algorithms that are suitable for deep learning, as some of the classical stochastic gradient descent- or batch gradient descent-based training algorithms tend to be slower to converge to optimal solutions. The second issue is handling the problem of overfitting, as it is one of the drawbacks of deep neural networks in general and the LSTM algorithm in particular.
Thus, a prognostic model based on the LSTM and GRU algorithms for long-term lifetime prediction is developed to address these challenges by incorporating three elements: the LSTM/GRU network architectures, network parameter training using the Adam optimization method, and dropout to prevent the overfitting problems encountered in deep learning. Adaptive moment estimation (Adam) is an efficient optimization algorithm that uses estimates of the first and second moments of the gradient to adapt the learning rate for each neural network weight. It is often used instead of the traditional stochastic gradient descent procedure because it requires little memory and combines the advantages of its predecessors, the adaptive gradient algorithm (AdaGrad) and root mean square propagation (RMSProp) [24].
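The moment-based adaptation described above can be written in a few lines. The following NumPy sketch implements the standard Adam update with bias correction; the defaults `beta1 = 0.9`, `beta2 = 0.999`, and `eps = 1e-8` are the commonly used values, and `lr = 0.001` matches the default learning rate quoted in Section 4.1. The quadratic toy loss is purely illustrative and is not part of the paper's training pipeline.

```python
import numpy as np


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a parameter array w.

    m, v -- running estimates of the first and second moments of the gradient
    t    -- 1-based iteration counter used for bias correction
    """
    m = beta1 * m + (1.0 - beta1) * grad          # first moment (mean of gradients)
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # second moment (uncentred variance)
    m_hat = m / (1.0 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)                # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)   # per-weight adaptive step
    return w, m, v


# Minimize the toy loss f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = np.array(0.0), 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2.0 * (w - 3.0), m, v, t, lr=0.1)
print(float(w))  # close to 3.0
```

Dividing the averaged gradient by the square root of the averaged squared gradient makes the step size roughly independent of the gradient's scale, which is why a single default learning rate works reasonably well across layers.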
2.1. An Overview of RNN, LSTM, and GRU
As computing power grows, machine learning algorithms have become an essential part of many industrial and business sectors. The application of machine learning algorithms has shown tremendous growth in the diagnostics, prognostics, anomaly detection, and general reliability assessment of power electronic components and systems. This is because machine learning algorithms are able to overcome the drawbacks of traditional model-based and statistical approaches in handling uncertainties in terms of noise factors, unknown failure mechanisms, as well as dynamic environmental and loading conditions [17].
2.1.1. Standard Recurrent Neural Networks (RNN) Architecture
In a classical multilayer perceptron (MLP), also known as a feed-forward neural network, information flows in one direction only (i.e., forward propagation), which makes it unsuitable for time-series data. Recurrent neural networks are variants of neural networks designed for sequence learning: the hidden state computed at the previous time step is fed back as an input at the current time step and used to predict the next output [20]. This cell architecture gives RNNs a state, and thus a memory, that captures information from previous time steps. A typical architecture of an RNN unrolled in time for the estimation of performance parameters (such as Rds(on)) of power MOSFETs is shown in Figure 2.
As shown in Figure 2, the recurrent neuron receives not only the input $x_t$ at the current time step $t$ but also the hidden state $h_{t-1}$ from the previous time step $t-1$. An output $y_t$ is then generated at the output layer along with an updated hidden state $h_t$. This process can be described mathematically as follows:

$$h_t = \phi\left(W_{xh}\,x_t + W_{hh}\,h_{t-1} + b_h\right) \tag{1}$$

$$y_t = \phi\left(W_{hy}\,h_t + b_y\right) \tag{2}$$

where $\phi$ is the activation function; $W_{xh}$, $W_{hh}$, and $W_{hy}$ are the weight matrices between the input and hidden layers, between the hidden layer and itself at adjacent time steps, and between the hidden and output layers, respectively; and the vectors $b_h$ and $b_y$ are bias parameters added at the hidden and output layers that enable the nodes to learn an offset [20]. In theory, RNNs can handle long-term dependencies. In practice, however, they can look back only about 5–10 time steps [21], because they suffer from vanishing or exploding gradients during the back-propagation of error signals. LSTMs are specifically designed to overcome this problem and are suitable for sequential time-series modeling, as discussed in the next section.
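The recurrence can be made concrete with a short NumPy forward pass. This sketch assumes tanh as the hidden activation and a linear output layer (a common choice for regression targets such as Rds(on)); the layer sizes, random weights, and input window are illustrative only and do not correspond to a trained model.

```python
import numpy as np


def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Unroll a simple RNN over a sequence.

    xs -- array of shape (T, n_in), one input vector per time step
    The hidden update uses tanh; the output layer is linear (an assumed choice).
    Returns (outputs of shape (T, n_out), final hidden state).
    """
    h = np.zeros(W_hh.shape[0])  # initial hidden state h_0 = 0
    ys = []
    for x_t in xs:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)  # new hidden state from x_t and h_{t-1}
        ys.append(W_hy @ h + b_y)                 # output at time t
    return np.array(ys), h


rng = np.random.default_rng(0)
n_in, n_hidden, n_out, T = 1, 8, 1, 20
params = dict(
    W_xh=rng.normal(0.0, 0.3, (n_hidden, n_in)),
    W_hh=rng.normal(0.0, 0.3, (n_hidden, n_hidden)),
    W_hy=rng.normal(0.0, 0.3, (n_out, n_hidden)),
    b_h=np.zeros(n_hidden),
    b_y=np.zeros(n_out),
)
xs = np.linspace(0.0, 1.0, T).reshape(T, n_in)  # e.g., a min-max scaled precursor window
ys, h_T = rnn_forward(xs, **params)
print(ys.shape, h_T.shape)
```

Because the same `W_hh` is applied at every step, gradients propagated back through many steps are repeatedly multiplied by it, which is the source of the vanishing or exploding gradients mentioned above.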
2.1.2. Long Short-Term Memory (LSTM) Algorithm Architecture
Long short-term memory (LSTM) is a special variant of RNN used in deep learning. Introduced by Hochreiter and Schmidhuber [25], the LSTM is a powerful algorithm for overcoming the exploding and vanishing gradient problems observed in simple RNNs. As a result, LSTMs are well suited to the analysis of sequential time-series data with long-term dependencies. The network architecture of an LSTM cell is shown in Figure 3a. An LSTM cell has three gates with sigmoid ($\sigma$) activation functions, namely, the input ($i_t$), forget ($f_t$), and output ($o_t$) gates, which control the flow of information through the cell. These gates are driven by the current input $x_t$ and by the hidden state $h_{t-1}$ from the previous time step. The cell state $c_t$ (also called the internal state or memory) carries information from one time step to the next.
The mathematical formulation of each gate and the operation of the LSTM algorithm are described below. The first step is to decide which information from the previous cell state to keep and which to forget. This is performed by the forget gate, whose sigmoid function takes the inputs $x_t$ and $h_{t-1}$ and outputs values between 0 and 1, where 0 means discarding everything and 1 means preserving everything from the previous cell state:

$$f_t = \sigma\left(W_f\,x_t + U_f\,h_{t-1} + b_f\right) \tag{3}$$
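The second step is to decide what new information to store in the cell state. The input gate layer, with a sigmoid function, determines which values to update, and a tanh layer creates the candidate information $\tilde{c}_t$ that can be added to the new cell state:

$$i_t = \sigma\left(W_i\,x_t + U_i\,h_{t-1} + b_i\right) \tag{4}$$

$$\tilde{c}_t = \tanh\left(W_c\,x_t + U_c\,h_{t-1} + b_c\right) \tag{5}$$

Combining Equations (3) to (5), the old cell state $c_{t-1}$ is updated to the current cell state (cell memory) $c_t$ as follows:

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{6}$$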
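Third, the output gate layer uses a sigmoid function to determine which information to output, $o_t$, and the output hidden state $h_t$ is computed as follows:

$$o_t = \sigma\left(W_o\,x_t + U_o\,h_{t-1} + b_o\right) \tag{7}$$

$$h_t = o_t \odot \tanh\left(c_t\right) \tag{8}$$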
In these equations, $W_i$, $U_i$; $W_o$, $U_o$; and $W_f$, $U_f$ are the weight matrices of the input, output, and forget gate layers, respectively; $b_i$, $b_o$, and $b_f$ are the corresponding biases; and $W_c$, $U_c$, and $b_c$ are the weights and bias of the candidate cell state. The symbol $\odot$ denotes the element-wise product, also known as the Hadamard product. In addition, $\sigma$ and $\tanh$ are the sigmoid and hyperbolic tangent activation functions, which map values to the ranges (0, 1) and (−1, 1), respectively. They are calculated as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{9}$$

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{10}$$
The standard LSTM and GRU algorithms have bidirectional variants, Bi-LSTM and Bi-GRU, which process a sequence in both the forward and backward directions. These bidirectional variants are not implemented in this paper.
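A single LSTM time step, following the forget-gate, input-gate, candidate, cell-update, and output-gate computations of Section 2.1.2, can be sketched in NumPy as shown below. The parameter dictionary layout and the random initialization are illustrative assumptions; deep learning frameworks typically store the four weight blocks concatenated in one matrix, but the arithmetic is the same.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm_params(n_in, n_hidden, seed=0):
    """Illustrative random W_g (n_hidden x n_in), U_g (n_hidden x n_hidden) and zero b_g
    for g in f (forget), i (input), c (candidate), o (output)."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "fico":
        p[f"W_{g}"] = rng.normal(0.0, 0.1, (n_hidden, n_in))
        p[f"U_{g}"] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        p[f"b_{g}"] = np.zeros(n_hidden)
    return p


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; returns the new hidden state and cell state."""
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])       # forget gate
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])       # input gate
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])   # candidate state
    c_t = f_t * c_prev + i_t * c_tilde    # cell update; * is the Hadamard product
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])       # output gate
    h_t = o_t * np.tanh(c_t)              # new hidden state
    return h_t, c_t


p = init_lstm_params(n_in=1, n_hidden=4)
h, c = np.zeros(4), np.zeros(4)
for value in np.linspace(0.0, 1.0, 10):  # e.g., a scaled Rds(on) window
    h, c = lstm_step(np.array([value]), h, c, p)
print(h, c)
```

The additive cell update (`f_t * c_prev + i_t * c_tilde`) is what lets gradients flow across many steps without being squashed by a nonlinearity at every step, in contrast to the simple RNN.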
2.1.3. Gated Recurrent Unit (GRU)
The gated recurrent unit (GRU) is a variant of the LSTM with a simplified cell structure. It was first introduced by Cho et al. [26] for machine translation and has since been widely adopted in prognostic applications for degradation modeling [22]. GRUs are considered less computationally expensive because they use fewer training parameters, which results in faster convergence and makes them a good choice for smaller datasets. LSTMs, on the other hand, may work better for larger datasets, as they retain more temporal information. The GRU cell has only two gates (a reset gate and an update gate) and no separate output gate, as shown in Figure 3b, and it is described by the following equations:

$$r_t = \sigma\left(W_r\,x_t + U_r\,h_{t-1} + b_r\right) \tag{11}$$

$$z_t = \sigma\left(W_z\,x_t + U_z\,h_{t-1} + b_z\right) \tag{12}$$

$$\tilde{h}_t = \tanh\left(W_h\,x_t + U_h\left(r_t \odot h_{t-1}\right) + b_h\right) \tag{13}$$

$$h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{14}$$

Here, $r_t$ and $z_t$ are the reset gate and update gate, respectively; $x_t$ is the current input to the cell at time $t$; $h_{t-1}$ denotes the previous output of the hidden layer; and $\sigma$ is the sigmoid function. $\tilde{h}_t$ is the candidate state, computed with the tanh function, which is used to update the current state $h_t$. The network architectures of the deep RNN, LSTM, and GRU models are shown in Figure 4.
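The GRU update described above can be sketched in NumPy in the same style. The sketch uses the convention in which the update gate weights the candidate state, h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t; some references, including Cho et al., swap the roles of z_t and 1 − z_t, which does not change what the model can represent. The parameter layout and initialization are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru_params(n_in, n_hidden, seed=0):
    """Illustrative random weights for the reset (r), update (z), and candidate (h) blocks."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "rzh":
        p[f"W_{g}"] = rng.normal(0.0, 0.1, (n_hidden, n_in))
        p[f"U_{g}"] = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        p[f"b_{g}"] = np.zeros(n_hidden)
    return p


def gru_step(x_t, h_prev, p):
    """One GRU time step: reset gate, update gate, candidate state, interpolation."""
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])               # reset gate
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])               # update gate
    h_tilde = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r_t * h_prev) + p["b_h"])   # candidate
    return (1.0 - z_t) * h_prev + z_t * h_tilde  # blend previous state and candidate


p = init_gru_params(n_in=1, n_hidden=4)
h = np.zeros(4)
for value in np.linspace(0.0, 1.0, 10):  # e.g., a scaled Rds(on) window
    h = gru_step(np.array([value]), h, p)
print(h)
```

Compared with the LSTM step, there is no separate cell state and only three weight blocks instead of four, which is where the GRU's lower parameter count comes from.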
2.2. Training Algorithm for LSTM and Variants
A proper selection of a training algorithm is an important part of learning with neural networks. Gradient descent, which iteratively updates the network weights in the direction that reduces the loss function, has been the most widely used optimization algorithm. LSTM networks have also been trained with adaptive versions of gradient descent such as root mean square propagation (RMSProp) and the adaptive gradient algorithm (AdaGrad). AdaGrad scales the step size of each parameter by the accumulated sum of its past squared gradients, whereas RMSProp replaces this sum with an exponentially decaying average of the squared gradients, so that the step size does not shrink monotonically. In this paper, Adam (adaptive moment estimation) optimization is used, as it combines the advantages of both RMSProp and AdaGrad and is one of the most popular optimization algorithms in machine learning.
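The difference between the two adaptive schemes is easiest to see with a constant gradient. In the following NumPy sketch (standard textbook update rules; the learning rate `0.01` and decay `rho = 0.9` are illustrative choices), both optimizers receive the gradient g = 1 for 100 steps: AdaGrad's step shrinks as 1/√t because it divides by a running sum, whereas RMSProp's step settles at roughly the learning rate because it divides by a decaying average.

```python
import numpy as np


def adagrad_step(w, g, acc, lr=0.01, eps=1e-8):
    """AdaGrad: divide by the square root of the running SUM of squared gradients."""
    acc = acc + g ** 2
    return w - lr * g / (np.sqrt(acc) + eps), acc


def rmsprop_step(w, g, avg, lr=0.01, rho=0.9, eps=1e-8):
    """RMSProp: divide by the square root of a decaying AVERAGE of squared gradients."""
    avg = rho * avg + (1.0 - rho) * g ** 2
    return w - lr * g / (np.sqrt(avg) + eps), avg


# Feed both optimizers the same constant gradient g = 1 for 100 steps.
w_ada, acc, w_rms, avg = 0.0, 0.0, 0.0, 0.0
for _ in range(100):
    new_ada, acc = adagrad_step(w_ada, 1.0, acc)
    new_rms, avg = rmsprop_step(w_rms, 1.0, avg)
    step_ada, step_rms = w_ada - new_ada, w_rms - new_rms  # size of the latest step
    w_ada, w_rms = new_ada, new_rms

print(f"last AdaGrad step: {step_ada:.5f}, last RMSProp step: {step_rms:.5f}")
```

After 100 steps the AdaGrad step is about 0.001 (one tenth of the learning rate), while the RMSProp step remains about 0.01. Adam keeps RMSProp's decaying average for the second moment and adds a decaying average of the gradient itself.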
2.3. Model Regularization to Overcome Overfitting
One of the common drawbacks of training neural networks such as LSTM is overfitting. The accuracy of an overfitted model is doubtful because the model tries to capture all the noise and outliers in the dataset. In general, overfitting occurs when the LSTM or GRU model learns the details together with the noise in the training data and tries to fit every data point, so that the fitted curve may not correspond to the patterns in new data. To overcome overfitting, regularization is employed, which discourages the learning of an overly complex or flexible model. Three main types of regularization have been used to prevent overfitting: Lasso (L1), Ridge (L2), and dropout regularization. A combination of L1 and L2, known as Elastic Net regularization (L1/L2 regularization), can also be applied. L1 regularization adds the sum of the absolute values of the network weights to the loss function, whereas L2 regularization adds the sum of the squares of the weights. More recently, dropout regularization [27] has been found to be effective in preventing overfitting in neural networks and, as a result, has been used in the LSTM-based prognostics of batteries [28].
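The three regularizers can be sketched in NumPy as follows. The L1, L2, and Elastic Net terms are the penalties added to the training loss, with hypothetical strength parameters `lam`, `lam1`, and `lam2`; dropout is shown in its common "inverted" form, which rescales the surviving activations during training so that nothing needs to change at inference. The dropout rate of 0.2 in the example matches the value reported in the conclusions; everything else is illustrative.

```python
import numpy as np


def l1_penalty(weights, lam):
    """Lasso (L1) term: lam * sum of absolute weight values."""
    return lam * sum(np.abs(w).sum() for w in weights)


def l2_penalty(weights, lam):
    """Ridge (L2) term: lam * sum of squared weight values."""
    return lam * sum((w ** 2).sum() for w in weights)


def elastic_net_penalty(weights, lam1, lam2):
    """Elastic Net: a weighted combination of the L1 and L2 terms."""
    return l1_penalty(weights, lam1) + l2_penalty(weights, lam2)


def inverted_dropout(a, rate, rng, training=True):
    """Zero each activation with probability `rate` during training and rescale the
    survivors by 1 / (1 - rate), so that no rescaling is needed at inference time."""
    if not training or rate == 0.0:
        return a
    keep = rng.random(a.shape) >= rate  # keep each unit with probability 1 - rate
    return a * keep / (1.0 - rate)


weights = [np.array([1.0, -2.0]), np.array([[3.0]])]
print(l1_penalty(weights, 0.1), l2_penalty(weights, 0.1))
rng = np.random.default_rng(0)
print(inverted_dropout(np.ones(8), 0.2, rng))
```

L1 tends to drive individual weights to exactly zero, L2 shrinks all weights smoothly, and dropout instead prevents units from co-adapting by randomly removing them during training.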
2.4. Prediction of Long-Term Lifetime
Because of the thermal, electrical, mechanical, and humidity stresses to which discrete devices and modules are exposed, their failure mechanisms lead to degradation or device failure. Degradation-type failure mechanisms commonly dominate, and in the power electronics industry, a power MOSFET is considered to have failed when its on-state resistance (Rds(on)) has increased by 17% [1] relative to its initial (pristine) value. Other performance parameters, such as the threshold voltage (Vth) and drain current (Id), are also used as failure precursors, with failure thresholds of a 20% change [7] and five times (5×) [5] their initial values, respectively [29]. Thus, for power MOSFETs, the remaining useful life (RUL) can be estimated from the time at which a given failure precursor crosses its failure threshold:
$$\mathrm{RUL} = T_{EOL} - t_p$$

where $T_{EOL}$ is the end-of-life time, at which Rds(on) has increased by about 17% from its pristine-state value, with the on-state resistance considered the main failure precursor (degradation parameter), and $t_p$ is the time at which the prediction starts, determined by the portion of the degradation trend data used for training (30%, 50%, or 70%).
The initial on-state resistance values in the pristine state are 37.137 mΩ for DUT #1, 36.733 mΩ for DUT #5, and 37.231 mΩ for DUT #9; the corresponding failure thresholds (a 17% increase) are 43.45 mΩ, 42.98 mΩ, and 43.56 mΩ. However, the measured degradation data show that the failure precursor values of the devices have not yet crossed their failure thresholds, so the RUL cannot be evaluated directly. The prediction error metrics (MAPE, MSE, and RMSE) of the predicted degradation trajectory are therefore used to evaluate the long-term lifetime prediction.
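The 17% criterion and the RUL definition can be expressed in a few lines of Python. The pristine Rds(on) values are taken from the text; the `end_of_life` helper and the trajectory used with it are hypothetical illustrations, since the measured devices have not yet crossed their thresholds.

```python
def failure_threshold(r_initial_mohm, increase=0.17):
    """Rds(on) failure threshold: a fractional increase over the pristine value."""
    return r_initial_mohm * (1.0 + increase)


def end_of_life(cycles, rdson_mohm, threshold):
    """First cycle at which Rds(on) reaches the threshold (T_EOL), or None if never reached."""
    for cycle, r in zip(cycles, rdson_mohm):
        if r >= threshold:
            return cycle
    return None


def remaining_useful_life(t_eol, t_p):
    """RUL = T_EOL - t_p, both expressed in power cycles."""
    return t_eol - t_p


pristine = {"DUT #1": 37.137, "DUT #5": 36.733, "DUT #9": 37.231}  # mOhm, from the text
for dut, r0 in pristine.items():
    print(f"{dut}: threshold = {failure_threshold(r0):.2f} mOhm")
```

With a measured or predicted Rds(on) trajectory, `end_of_life` returns T_EOL, and `remaining_useful_life(T_EOL, t_p)` gives the RUL for a prediction starting point t_p.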
4. Results and Discussion
In this section, the on-state resistance degradation data of three samples representing different power cycling conditions are analyzed to validate the proposed algorithms. All tests were implemented in Python 3.9 on a laptop equipped with an AMD Ryzen 7 5800H processor (16 MB cache, up to 4.4 GHz), 16 GB of DDR4-3200 memory, and an NVIDIA GeForce RTX 3050 4 GB graphics card. All the algorithms compared in this paper, namely simple RNN, LSTM, and GRU, share the same model configuration and differ only in the type of recurrent layer, which ensures a fair comparison of the prediction results.
The proposed LSTM/GRU predictive algorithms are trained on the training set and optimized according to their performance on the validation set. The performance of each model at the different prediction starting points is evaluated using only testing data that were not included in the training process, to prevent information leakage. Both one-step-ahead and multi-step-ahead prediction of long-term power MOSFET degradation data are possible. One-step-ahead prediction is mainly suited to online failure prediction, but it requires the latest precursor measurement at every step, and such measurements cannot easily be obtained while the device is operating. In contrast, long-term (multi-step) prediction of the degradation trend has shown that LSTMs and their variants can reflect future degradation phases [32]. Thus, multi-step prediction is used in the proposed LSTM algorithm and compared with the simple RNN and GRU methods.
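One common way to obtain a multi-step-ahead forecast from a one-step model is the recursive strategy: each prediction is appended to the input window and used to predict the next value. The text does not state whether a recursive or a direct (multi-output) strategy was used, so the following Python sketch is only one possible implementation; `linear_extrapolation` is a hypothetical stand-in for a trained RNN, LSTM, or GRU.

```python
import numpy as np


def recursive_forecast(predict_one, history, window, n_steps):
    """Multi-step-ahead forecast by feeding each prediction back as the next input.

    predict_one -- callable mapping a 1-D window of length `window` to the next value
                   (a trained RNN/LSTM/GRU would play this role; here it is a stand-in)
    history     -- observed values up to the prediction starting point t_p
    """
    if len(history) < window:
        raise ValueError("history must contain at least `window` values")
    buffer = list(history[-window:])
    predictions = []
    for _ in range(n_steps):
        y_next = float(predict_one(np.asarray(buffer[-window:])))
        predictions.append(y_next)
        buffer.append(y_next)  # the forecast becomes part of the next input window
    return np.array(predictions)


def linear_extrapolation(w):
    """Hypothetical stand-in model: extend the line through the last two points."""
    return 2.0 * w[-1] - w[-2]


forecast = recursive_forecast(linear_extrapolation, [1.0, 2.0, 3.0], window=2, n_steps=3)
print(forecast)  # [4. 5. 6.]
```

Because later predictions are built on earlier ones, errors can accumulate over the horizon, which is consistent with the larger errors reported later for the longer prediction horizons.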
4.1. Data Preprocessing, Parameter Setting, and Model Formation for the Models
The degradation data from power MOSFETs 1, 5, and 9 (one sample selected from each test condition) are used to validate the proposed algorithms. To make a fair comparison, the model parameters are kept the same for all algorithms. The RNN, LSTM, and GRU models are configured as follows: two hidden layers with 128 and 64 neurons, one output layer, the Adam optimizer with its default learning rate of 0.001, mean squared error as the training loss to be minimized, 100 epochs, and a batch size of 16, as shown in Table 3.
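The relative cost of the three models can be estimated by counting trainable parameters with the standard formulas below, assuming a univariate input (one Rds(on) value per time step) and the Table 3 layout of 128 and 64 recurrent units followed by a single dense output neuron. The input dimension of 1 is an assumption, and dropout layers add no parameters.

```python
def simple_rnn_params(n_in, units):
    # Input weights + recurrent weights + one bias vector.
    return units * (units + n_in + 1)


def lstm_params(n_in, units):
    # Four blocks (forget, input, candidate, output), each with its own W, U, and b.
    return 4 * units * (units + n_in + 1)


def gru_params(n_in, units):
    # Three blocks (reset, update, candidate) with one bias each. Keras' default
    # reset_after=True uses two bias vectors per block: 3 * units * (units + n_in + 2).
    return 3 * units * (units + n_in + 1)


def stacked_params(layer_params, n_in=1, hidden=(128, 64), n_out=1):
    """Total trainable parameters of stacked recurrent layers plus a dense output layer."""
    total, prev = 0, n_in
    for units in hidden:
        total += layer_params(prev, units)
        prev = units
    return total + prev * n_out + n_out  # dense layer weights + bias


for name, fn in [("RNN", simple_rnn_params), ("LSTM", lstm_params), ("GRU", gru_params)]:
    print(f"{name}: {stacked_params(fn):,} parameters")
```

Under these assumptions the simple RNN has 29,057 parameters, the GRU 87,041, and the LSTM 116,033, which is in line with the convergence-speed ordering (RNN, then GRU, then LSTM) reported in this paper.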
4.2. Implementation of RNN/ LSTM/GRU Models and Prediction Results
After the model parameters are set, each model is trained on the training data, and the future lifetime of the power MOSFETs is predicted from three different prediction starting points (cycles): the on-state resistance data of the first 24,000 cycles (i.e., 30% of the full degradation path), 40,000 cycles (i.e., 50% of the full lifetime), and 57,600 cycles (i.e., 70% of the full lifetime). The training and testing partition of the degradation data is shown in Figure 9.
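A leakage-free partition of a degradation trace can be sketched in NumPy as follows: the series is split chronologically at the prediction starting point (no shuffling), any scaler is fitted on the training part only, and supervised input/target pairs are formed with a sliding window. The section does not specify the scaling method or window length, so min-max scaling, the window length, and the toy series are assumptions.

```python
import numpy as np


def chronological_split(series, train_fraction):
    """Split a time series at t_p without shuffling: past -> training, future -> testing."""
    series = np.asarray(series, dtype=float)
    n_train = int(round(len(series) * train_fraction))
    return series[:n_train], series[n_train:]


def fit_minmax(train):
    """Min-max scaler fitted on the training part only, so test data cannot leak into it."""
    lo, hi = float(train.min()), float(train.max())
    span = hi - lo if hi > lo else 1.0
    return lambda x: (np.asarray(x, dtype=float) - lo) / span


def make_windows(series, window):
    """Supervised pairs: each input is `window` consecutive values, the target the next one."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = np.asarray(series[window:])
    return X, y


series = np.arange(10.0)  # hypothetical stand-in for an Rds(on) trace
for frac in (0.3, 0.5, 0.7):
    train, test = chronological_split(series, frac)
    scale = fit_minmax(train)
    X_train, y_train = make_windows(scale(train), window=2)
    print(frac, len(train), len(test), X_train.shape)
```

Scaled test values may exceed 1 because the scaler never sees the future; that is the intended behavior, since fitting on the full series would leak information about the testing period into training.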
The long-term lifetime prediction of power MOSFETs with a starting point of 40,000 cycles is carried out first, followed by 24,000 cycles and, finally, 57,600 cycles. This procedure makes it possible to compare the prediction performance of the algorithms over multiple training and testing datasets. After the proposed algorithms are trained with 40,000 cycles of on-state resistance degradation data, the model losses show that RNN converges faster than LSTM and GRU, whereas the training and testing losses at the end of training are smaller for the LSTM and GRU models, as depicted in Figure 10.
The multi-step-ahead prediction results for the on-state resistance degradation data of power MOSFET 1 at a starting point of 40,000 cycles (50% training and 50% testing split) using the RNN, LSTM, and GRU algorithms are shown in Figure 11.
The prediction accuracy of LSTM and GRU is superior to that of the simple RNN algorithm, with MAPEs of 0.90% (LSTM) and 0.78% (GRU) compared with 1.72% (RNN). The prediction metrics at 40,000 cycles, where the first 50% of the data are used for model training, and at 24,000 and 54,000 cycles of training data are shown in Figure 12. The MAPE results confirm that LSTM and GRU outperform RNN. Similarly, the distribution of the prediction error between the estimated and testing data is plotted in Figure 11e and confirms the prediction metrics, with a wide error distribution for RNN and narrow distributions for LSTM and GRU.
The difference in performance between the proposed prognostic algorithms for MOSFET 1 can be easily observed by comparing the MAPE, MSE, and RMSE of the estimates at the different starting points of the measured data, as shown in Table 4.
In addition, the long-term lifetime prediction of the MOSFET 5 and 9 test samples, which come from two different junction temperature swing conditions, is performed from a starting point of 40,000 cycles. The prediction results, along with the distribution of the model prediction error, are shown in Figure 13a–d and Figure 14a–d. In the prediction process, the training loss of RNN converged faster than that of LSTM and GRU, which may be attributed to its simpler cell architecture, as shown in Figure 15 and Figure 16. Similarly, the on-state resistance prediction error of the RNN, LSTM, and GRU prediction models, computed between the predicted and testing data from the 40,000-cycle starting point, is shown in Figure 13e.
4.3. Discussion Based on Lifetime Prediction Metrics and Model Robustness
Once the long-term lifetime has been predicted from the degradation data, the next step is to evaluate the model prediction error. For a regression problem such as this, the prediction error is mainly assessed using MAPE, MSE, and RMSE. The robustness of the proposed method is tested using different proportions of training data (i.e., different numbers of measurement cycles of the failure precursor). Here, the long-term lifetime of two test samples (MOSFETs 9 and 5) is predicted from starting points of 24,000 and 54,000 cycles, corresponding to 30% and 70% training data (and 70% and 30% testing data), respectively, to demonstrate the robustness of the proposed algorithms.
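The three error metrics can be computed with NumPy as sketched below. MAPE is expressed in percent and requires non-zero measured values (always true for Rds(on)); MSE is in squared units of the data; RMSE returns to the data's units. The measured and predicted values in the example are hypothetical, not results from this study.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))


def mse(y_true, y_pred):
    """Mean squared error, in squared units of the data (e.g., mOhm^2)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the data (e.g., mOhm)."""
    return np.sqrt(mse(y_true, y_pred))


measured = [40.0, 42.0, 44.0]     # hypothetical test-set Rds(on) values, mOhm
predicted = [40.4, 41.58, 44.44]  # hypothetical model predictions, mOhm
print(mape(measured, predicted), mse(measured, predicted), rmse(measured, predicted))
```

Here every prediction is off by exactly 1% of the measured value, so the MAPE is 1.0%, while the MSE and RMSE also reflect the absolute size of the errors in mΩ.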
Figure 17 presents the prediction plots and the distribution of the prediction error for the MOSFET 9 degradation data at a starting point of 24,000 cycles. The prediction accuracy decreases for all three models because less training data are used. The RNN model, however, is less affected by the smaller dataset, since its capability is already limited by its shorter memory compared with the LSTM and GRU algorithms.
Similarly, the prediction plots and the distribution of the prediction errors for the MOSFET 5 degradation data at a starting point of 54,000 cycles are presented in Figure 18. The results show that GRU and LSTM outperform the simple RNN model. In addition, the overall prediction accuracy of all three models increases when more on-state resistance degradation data are used for training, compared with the smaller training sets of 24,000 (30%) and 40,000 (50%) cycles.
Although prediction plots are displayed only for the randomly selected test samples 1, 5, and 9 due to space limitations, all prediction results are given in Table 4, Table 5, Table 6 and Table 7. The model performance metrics (MAPE, MSE, and RMSE) of the long-term lifetime prediction for MOSFETs 1, 5, and 9 at a starting point of 24,000 cycles (30% of the full lifetime) are shown in Table 5.
As shown in Table 6, the MAPE values at a starting point of 40,000 cycles are 1.72%, 0.90%, and 0.78% for MOSFET 1; 0.94%, 0.60%, and 0.60% for MOSFET 5; and 1.05%, 0.91%, and 0.78% for MOSFET 9 using the RNN-, LSTM-, and GRU-based models, respectively. The corresponding MSE and RMSE results are given in the same table. These results show that the LSTM- and GRU-based prognostic models perform better than the simple RNN-based model. In addition, the prediction error decreases compared with the previous scenario, in which only 24,000 cycles of training data were used.
Lastly, the MAPE, MSE, and RMSE prediction error metrics at a starting point of 54,000 cycles for MOSFETs 1, 5, and 9 are given in Table 7. For MOSFET 9, the long-term lifetime prediction yields MAPEs of 1.07%, 0.89%, and 0.65% with RNN, LSTM, and GRU, respectively. These results show that the prediction accuracy of the models increases as they receive more training data compared with the 24,000- and 40,000-cycle starting points, and that the GRU- and LSTM-based models predict better than the RNN model.
Notably, the prediction errors obtained with more training data (70%; Table 7) are smaller, and the predictions closer to the measured values, than those obtained with less training data (30%; Table 5). This is expected, since prediction uncertainty grows with a longer prediction horizon and with less training data. Overall, the prediction metrics of the proposed LSTM and GRU methods show accurate and precise long-term estimation, demonstrating reliable multi-step-ahead prediction of power MOSFET degradation precursor parameters.
In general, the long-term lifetime prognostics results showed that the proposed algorithms are suitable for dealing with failure precursor degradation analysis problems for power MOSFETs. The LSTM and GRU performed better compared to the simple RNN model for long-term lifetime predictions. It is also worth noting that the convergence speed of RNN and GRU is faster than that of LSTM in model training, which is attributed to the simpler internal cell structure of RNN followed by GRU relative to LSTM networks.
5. Conclusions
In this study, data-driven deep learning algorithms based on LSTM and GRU are used to predict the future degradation pattern and, hence, the lifetime of power MOSFET devices. As one of the dominant performance parameters, the on-state resistance failure precursor data of these devices are considered in the implementation of the proposed algorithms. To demonstrate the proposed LSTM and GRU models, on-state resistance data from an accelerated degradation test based on power cycling were collected at junction temperature swings of 45 °C, 100 °C, and 110 °C. The adaptive moment estimation (Adam) optimizer is used to update the network weights. Dropout regularization with a rate of 0.2 is employed to prevent overfitting, and a learning rate of 0.0001 is used during training of the constructed neural network models.
The accuracy of prognostic models based on the RNN, LSTM, and GRU algorithms is evaluated using the MAPE, MSE, and RMSE prediction metrics. The prediction results based on the proposed LSTM and GRU showed an accurate and precise lifetime prediction compared to the classic RNN algorithm. It is also worth noting that the convergence speed of RNN and GRU is faster than that of LSTM in model training, which is attributed to the simpler cell structures. The robustness of the proposed approaches is verified by using 30% and 70% of the measured data for model training in addition to the 50% training and testing setup, which shows the adaptability of the model for power device degradation trends. In general, the LSTM and GRU models are found to be effective for degradation assessment and long-term lifetime predictions for power devices based on failure precursor data. With an online data acquisition system, prognostic models can be employed in the condition monitoring of power devices.