1. Introduction
The metal oxide varistor (MOV) is a nonlinear resistance element with zinc oxide (ZnO) as its main component. Owing to its high surge current tolerance and very large current absorption capacity, it is the key component in surge protective devices (SPDs). It is now widely used in surge protection for railway communication systems, and its performance directly affects the safety of the whole system [1]. In service, environmental stresses such as high voltage and high temperature make varistors prone to deterioration, so they often fail to reach their rated working life. At present, SPDs are generally maintained periodically: staff regularly inspect whether the housing is damaged or whether the thermal fuse has blown. This approach is inefficient and detects only SPDs that have already failed completely; SPDs that are about to fail, or have only just failed, usually go unnoticed. With the advancement of intelligent railways, effective and accurate online evaluation of the remaining surge capacity is needed to prevent accidents caused by SPD failure and to improve lightning protection operation and maintenance. As electronic technology has developed, monitoring methods have matured, and online life prediction for SPDs is feasible under existing technical conditions [2,3,4,5,6]. At present, the breakdown voltage $U_{1\mathrm{mA}}$ and the leakage current $I_{\mathrm{Leakage}}$ are often used to judge the degree of deterioration in varistors. However, these two indicators reflect only the working state of the varistor rather than its degree of deterioration, and they lag in indicating varistor failure [7,8].
The nonlinear coefficient is an important index of the working performance of a varistor. It is a calculated parameter that describes how steeply the current rises with the voltage once the voltage exceeds a threshold. The greater its value, the more effectively the varistor protects the circuit. A strong correlation has been shown between the nonlinear coefficient and the degree of deterioration in a varistor. When the varistor works normally, the coefficient generally lies in the range 20–100 [9]. Yi Guilai et al. [10] found that the nonlinear coefficient of a varistor changes regularly under impact current. Through a varistor impact experiment, Yang Zhongjiang et al. [11] found that the nonlinear coefficient decreases as the varistor degrades. In recent years, some studies have explored the factors affecting the nonlinear coefficient. He Jinliang et al. [12] found that the formation of a liquid phase during sintering significantly influences the nonlinear coefficient, which is related to the chemical reaction between the liquid phase and the mixed elements at the grain boundary. Nahm et al. [13] found that the nonlinear coefficient depends strongly on the morphology of the grain boundary. Many researchers have also worked on improving the nonlinear coefficient of varistors [14,15,16,17,18], but few studies have used the nonlinear coefficient as the main parameter for constructing a lifetime model.
In engineering practice, there is often a need to judge the deterioration of varistors automatically and accurately. With the continuous development of the computer industry and the maturity of artificial neural network technology, this goal has become achievable. The current trend in neural networks is towards deeper and more complex structures, which fit well when large amounts of data are available. However, lightning strikes under natural conditions are difficult to predict, so the acquisition cycle is long and it is extremely difficult to obtain the key parameters of the deterioration process over the full life cycle of a varistor. Even in a laboratory setting, collecting data for a full life cycle requires several hours of repeated surge impacts, and each experiment ends with the complete failure of a varistor, which makes experiments costly. With limited sample data, overly large network models overfit severely, so a model of appropriate scale is needed.
By establishing recurrent (feedback) links between neurons, the recurrent neural network (RNN) can mine the temporal information in data and adapts well to time-series tasks. On this basis, LSTM constructs a gate structure that effectively mitigates the gradient problems of RNNs, most notably vanishing gradients, and it fits well on small-scale data. BiLSTM is an improved LSTM with higher time-series prediction accuracy, which has been widely used in text recognition, semantic recognition and other areas [19,20,21]. This model has good application prospects in predicting the life of a varistor.
In this paper, life prediction for varistors is realized by building a BiLSTM-based life prediction model from the change in the nonlinear coefficient during varistor deterioration. The paper is structured as follows. Section 1 is the introduction. Section 2 presents the life prediction method for varistors, including parameter selection, data enhancement and model construction. Section 3 verifies the model experimentally. Section 4 concludes the paper.
2. Materials and Methods
2.1. Electrical Parameters of the Varistor
In research, breakdown voltage, leakage current and nonlinear coefficient are common electrical parameters used to measure the performance of a varistor. They can be measured directly by professional equipment. Breakdown voltage refers to the voltage across the varistor when 1 mA of direct current flows through it. Leakage current refers to the current flowing through the varistor at a specified temperature and maximum direct current voltage.
Figure 1 shows how these three electrical parameters change with the number of impacts in a measured experiment. Because their ranges differ greatly, the data have been normalized so that the trends of the three parameters can be compared more intuitively.
It can be seen that the leakage current curve shows no correlation with the number of impacts. The leakage current remains low until the varistor is completely damaged and then increases sharply. Therefore, it can only reflect the working state of the varistor and not its deterioration. The breakdown voltage and the nonlinear coefficient are both negatively correlated with the number of impacts, but the change in the breakdown voltage clearly lags behind the change in the nonlinear coefficient. Even after the 35th impact, when the varistor is close to damage, the breakdown voltage is still more than 70% of its peak value, so it cannot track the deterioration of the varistor in time. Therefore, the nonlinear coefficient is chosen as the electrical parameter used to construct the life prediction model.
Nonlinearity is an inherent characteristic of the MOV. When the terminal voltage is below a certain threshold, the current through the varistor is approximately zero. When the terminal voltage exceeds this threshold, the current through the varistor increases sharply with the terminal voltage. The volt–ampere characteristic curve is shown in Figure 2.
Within a certain range above the terminal voltage threshold, the volt–ampere characteristic curve in logarithmic coordinates is approximately a straight line, which can be expressed as
$$\lg I = A \lg U + b,$$
where lg I is the vertical coordinate in Figure 2, lg U is the horizontal coordinate in Figure 2, and A is the slope of the straight line, i.e.,
$$A = \tan\theta,$$
where θ is the angle at which the approximate line intersects the horizontal axis, and b is the intercept of the straight line.
Let b = A lg C, where C is the material parameter of the varistor, i.e.,
$$I = (CU)^{A}.$$
It can be seen that the volt–ampere characteristic of the varistor in this range is determined by the parameters A and C. A is the nonlinear coefficient, which is an electrical parameter describing the nonlinear degree of MOV. The larger the A value, the steeper the volt–ampere characteristic curve in this voltage range, and the stronger the absorption ability of the varistor for large currents.
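Because A is the slope of the lg I versus lg U line, it can be estimated from any two measured points on the approximately linear part of the curve. The following is a minimal sketch of that calculation; the function name and the sample readings are hypothetical and are not taken from the experiments in this paper.

```python
import math


def nonlinear_coefficient(u1, i1, u2, i2):
    """Slope of the lg(I) versus lg(U) line through two measured points."""
    if min(u1, i1, u2, i2) <= 0:
        raise ValueError("voltages and currents must be positive")
    if u1 == u2:
        raise ValueError("the two voltages must differ")
    return (math.log10(i2) - math.log10(i1)) / (math.log10(u2) - math.log10(u1))


# Hypothetical readings: the current rises tenfold for a 10% rise in voltage.
a = nonlinear_coefficient(u1=180.0, i1=1e-3, u2=198.0, i2=1e-2)
print(round(a, 1))
```

With these illustrative readings the estimate is roughly 24, which falls inside the 20–100 range quoted for a healthy varistor.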
After multiple surge impacts, the varistor deteriorates: ions on both sides of the forward and reverse bias barriers migrate, which lowers the varistor voltage and degrades the volt–ampere characteristics, as shown in Figure 3. At the same time, the nonlinear coefficient A decreases, and with it the current absorption ability of the varistor. When A has decreased to a certain extent, the varistor fails and can no longer perform its protective function. This paper focuses on the trend of change in the nonlinear coefficient during deterioration and establishes a life prediction model based on experimental data.
Figure 4 shows the variation in the nonlinear coefficients of three varistors of the same model but from different batches in the surge impact experiment. The nonlinear coefficient is clearly correlated with the number of impacts, but the relationship differs significantly from one varistor to another. Therefore, it is difficult to obtain a reliable model with generalization ability through nonlinear fitting and other traditional statistical methods.
2.2. BiLSTM Network
The long short-term memory network (LSTM) is a special recurrent neural network (RNN) proposed by Hochreiter and Schmidhuber in 1997 [22]. Through its network design, it addresses the long-term dependency problem in time-series learning, and it is widely used in engineering practice.
The structure of a single-layer LSTM network is shown in Figure 5. $x_t$ is the input at step t, and $y_t$ is the output at step t. The time-series information is passed from one step to the next through two transmission states: the cell state C and the hidden state h. For example, at step t, the states $C_{t-1}$ and $h_{t-1}$ obtained at step t − 1 take part in the calculation, and the new states $C_t$ and $h_t$ are calculated and passed on to step t + 1.
The LSTM network contains three gate structures (the forget gate, the input gate and the output gate) and four intermediate state parameters: $f_t$, $i_t$, $\tilde{C}_t$ and $o_t$.
The forget gate controls the influence of historical information on the current step. Its state parameter $f_t$, computed with the sigmoid activation function σ, represents the degree to which the cell state information from the previous step is forgotten. When it equals 1, the previous cell state information is completely retained; when it equals 0, that information is completely forgotten.
The input gate controls the new information written into the cell state.
The output gate controls the information output from the cell state.
The weight matrix W and the bias b of each gate are learned from the training data by supervised learning, with the loss function guiding the updates.
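For reference, the gates described above are commonly written as follows. This is the standard textbook formulation, given here as an assumption; the subscripted weights $W_f, W_i, W_C, W_o$, the biases $b_f, b_i, b_C, b_o$ and the concatenation $[h_{t-1}, x_t]$ are notation chosen for this sketch and may differ from the notation of the original equations.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)}\\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) && \text{(candidate cell state)}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell state update)}\\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)}\\
h_t &= o_t \odot \tanh\left(C_t\right) && \text{(hidden state)}
\end{aligned}
```

In this form, $f_t = 1$ keeps $C_{t-1}$ unchanged in the update and $f_t = 0$ discards it, which matches the description of the forget gate.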
Bi-directional long short-term memory (BiLSTM) is an LSTM variant built on the bidirectional recurrent architecture proposed by Mike Schuster and Kuldip K. Paliwal [23]. A single BiLSTM layer is composed of two LSTM layers, one processing the sequence forward and one processing it in reverse, as shown in Figure 6. Together, the outputs of these two LSTM layers form the BiLSTM output. Compared with the LSTM structure, the BiLSTM structure adds the transmission of reverse information and improves accuracy by considering the relationship of the present moment to both the preceding and the following moments.
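The forward and reverse passes can be illustrated with a deliberately tiny example. The sketch below assumes a scalar input and a scalar hidden state, uses the standard LSTM update, and picks arbitrary hand-written weights; all names and values are hypothetical, and it is not the network trained in this paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step; each entry of params is (w_h, w_x, bias)."""
    def pre(name):
        w_h, w_x, b = params[name]
        return w_h * h_prev + w_x * x + b

    f = sigmoid(pre("f"))    # forget gate
    i = sigmoid(pre("i"))    # input gate
    g = math.tanh(pre("g"))  # candidate cell state
    o = sigmoid(pre("o"))    # output gate
    c = f * c_prev + i * g   # new cell state
    h = o * math.tanh(c)     # new hidden state
    return h, c


def run_lstm(xs, params):
    """Run the cell over a sequence and return the hidden state at each step."""
    h, c = 0.0, 0.0
    hidden = []
    for x in xs:
        h, c = lstm_step(x, h, c, params)
        hidden.append(h)
    return hidden


def run_bilstm(xs, fwd_params, bwd_params):
    """Pair the forward hidden state with the reverse hidden state at each step."""
    forward = run_lstm(xs, fwd_params)
    backward = run_lstm(xs[::-1], bwd_params)[::-1]
    return list(zip(forward, backward))


# Arbitrary illustrative weights (hypothetical, untrained).
fwd = {"f": (0.5, 0.1, 0.0), "i": (0.4, 0.3, 0.0), "g": (0.2, 0.9, 0.0), "o": (0.3, 0.2, 0.0)}
bwd = {"f": (0.6, 0.2, 0.1), "i": (0.1, 0.5, 0.0), "g": (0.3, 0.7, 0.0), "o": (0.2, 0.4, 0.1)}
outputs = run_bilstm([1.0, 0.8, 0.5, 0.1], fwd, bwd)
```

Each element of `outputs` combines what the forward layer has seen up to that step with what the reverse layer has seen from the end of the sequence back to that step.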
2.3. Life Prediction Model
A flow chart for the proposed life prediction model is shown in Figure 7. The model consists of five steps: data collection, data enhancement, data preprocessing, network training and prediction.
A specified surge current is applied to the varistor at a fixed time interval to simulate its actual working conditions. One experiment covers a varistor from full health, through repeated impacts, to complete damage. During the experiment, the nonlinear coefficient is calculated directly by the experimental equipment. The nonlinear coefficients recorded for the varistor in one experiment form one set of data A.
The nonlinear coefficient data obtained from the surge impact test provide the original data for establishing a life prediction model for a varistor. However, because the test cycle is long and the cost is high, only a small amount of data is available, which may affect the prediction performance of the model. To further improve the accuracy and generalization ability of the network, this paper uses data enhancement to improve the sample quality. The experimental data are discrete and follow a regular time sequence, and common methods such as flipping, clipping and interpolation would break this regularity. Noise addition is therefore selected for data enhancement.
Gaussian white noise with an SNR of 50 is added to the original data to enhance the data in a ratio of 1:1. Some of the generated data are shown in Figure 8.
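The noise-addition step can be sketched as follows. The sketch makes two assumptions that the text does not state: the SNR is interpreted in decibels, and the signal power is taken as the mean square of the series. The function name and the sample series are hypothetical.

```python
import math
import random


def add_white_noise(signal, snr_db, rng):
    """Return a copy of signal with zero-mean Gaussian noise at the given SNR (dB)."""
    signal_power = sum(v * v for v in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]


# Hypothetical nonlinear-coefficient series for one experiment (40 impacts).
original = [40.0 - 0.5 * k for k in range(40)]
rng = random.Random(0)  # fixed seed so that the sketch is reproducible

# 1:1 enhancement: one noisy copy is generated for each original series.
enhanced = [original, add_white_noise(original, snr_db=50.0, rng=rng)]
```

Because the noise is added point by point, the order of the samples is untouched and the time-sequence regularity of the series is preserved.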
In the degradation process of different varistors in the impact test, the variation curves of the nonlinear coefficients inevitably differ. To give the life model better generalization and transferability, the nonlinear coefficients are normalized to the range 0–1 to obtain the life parameter $\tilde{A}$, i.e.
$$\tilde{A} = \frac{A - A_{\min}}{A_{\max} - A_{\min}},$$
where $A_{\max}$ is the maximum value in this set of data, namely, the nonlinear coefficient in the full health state, and $A_{\min}$ is the minimum value in this set of data, that is, the nonlinear coefficient after complete damage.
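A minimal sketch of this 0–1 normalization is given below; the function name and the three sample values are hypothetical.

```python
def normalize_life(a_values):
    """Min-max normalize one experiment's nonlinear coefficients to [0, 1]."""
    a_max = max(a_values)  # nonlinear coefficient in the full health state
    a_min = min(a_values)  # nonlinear coefficient after complete damage
    if a_max == a_min:
        raise ValueError("the series must contain at least two distinct values")
    return [(a - a_min) / (a_max - a_min) for a in a_values]


life = normalize_life([40.0, 30.0, 20.0])
print(life)  # [1.0, 0.5, 0.0]
```

Each experiment is normalized with its own maximum and minimum, so a life parameter of 1 always means full health and 0 always means complete damage, whatever the absolute values for a given varistor.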
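The normalized data set $\tilde{A}$ of length N is divided and reconstructed: n consecutive values $\tilde{A}_i, \tilde{A}_{i+1}, \ldots, \tilde{A}_{i+n-1}$ are taken as one sample, and the next value $\tilde{A}_{i+n}$ serves as the supervision label of that sample, as shown in Figure 9, i.e.
$$X = \begin{bmatrix} \tilde{A}_1 & \tilde{A}_2 & \cdots & \tilde{A}_n \\ \tilde{A}_2 & \tilde{A}_3 & \cdots & \tilde{A}_{n+1} \\ \vdots & \vdots & & \vdots \\ \tilde{A}_{N-n} & \tilde{A}_{N-n+1} & \cdots & \tilde{A}_{N-1} \end{bmatrix}, \qquad Y = \begin{bmatrix} \tilde{A}_{n+1} \\ \tilde{A}_{n+2} \\ \vdots \\ \tilde{A}_{N} \end{bmatrix},$$
where X is the sample set and Y is the sample label set. X and Y constitute a data set, and each experiment generates one data set.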
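The division into samples and labels amounts to a sliding window over the normalized series. The sketch below uses zero-based indexing; the function name and the example series are hypothetical.

```python
def make_samples(series, n):
    """Split a series into length-n samples X and next-value labels Y."""
    if n < 1 or n >= len(series):
        raise ValueError("n must satisfy 1 <= n < len(series)")
    count = len(series) - n  # number of (sample, label) pairs
    xs = [series[i:i + n] for i in range(count)]
    ys = [series[i + n] for i in range(count)]
    return xs, ys


# Hypothetical normalized life parameters for one short experiment.
X, Y = make_samples([1.0, 0.9, 0.7, 0.4, 0.1, 0.0], n=4)
print(X)  # [[1.0, 0.9, 0.7, 0.4], [0.9, 0.7, 0.4, 0.1]]
print(Y)  # [0.1, 0.0]
```

A series of length N yields N − n pairs, so a longer sample length n leaves fewer training pairs from the same experiment.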
The network includes several BiLSTM layers, a fully connected layer and a regression layer. The overall structure is shown in Figure 10. The normalized and divided data set is taken as the input of the network. The output of the network is the value predicted by the lifetime model.
3. Experiment and Analysis
3.1. Experimental Setup and Data Processing
The experimental samples are LKD 22S181K R03 ZnO varistors with a breakdown voltage of $U_{1\mathrm{mA}}$ = 180 V, a nominal discharge current of $I_{n}$ = 12 kA and a maximum discharge current of $I_{\mathrm{Max}}$ = 25 kA. Varistors of this type from the same batch were selected for the surge impact test. The experimental equipment, shown in Figure 11, comprises a test bench, a surge generator and a data acquisition terminal.
The test sample is impacted with a current of the 8/20 μs waveform stipulated by national standards and railway industry standards. The theoretical impact waveform and the actual impact waveform of the experimental device are shown in Figure 12.
The interval between surge impacts is 5 min, and the state parameters of the varistor are recorded until it is completely damaged. Complete damage often takes more than 40 impacts. After complete damage, the appearance changes clearly: one corner of the epoxy resin layer bursts and granular black powder appears, as shown in Figure 13.
After each impact test, the value of the nonlinear coefficient is read from the experimental terminal and recorded. The nonlinear coefficients of each varistor from its initial state to complete failure are recorded as one data set. The data set is processed as described in Section 2.3 and becomes the input to the lifetime prediction model.
3.2. Prediction Performance of Life Model under Different Hyper-Parameters
In the life prediction model proposed in this paper, besides the trainable network parameters there are many hyper-parameters, among which the training sample length n and the number of BiLSTM layers have the greatest impact on the model. To examine the influence of different hyper-parameters on the life model, the root mean square error (RMSE) is used to evaluate the prediction accuracy of the model, namely:
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^{2}},$$
where $\hat{y}_i$ is the value predicted by the model, $y_i$ is the corresponding observed value and m is the number of test samples.
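The RMSE metric can be computed in a few lines. The following is a minimal sketch with hypothetical values, not the evaluation code used for the experiments.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# One prediction out of four is off by 1, so the RMSE is sqrt(1/4).
error = rmse([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
print(error)  # 0.5
```

Because the life parameter is normalized to the range 0–1, RMSE values obtained from different experiments are directly comparable.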
The data set is divided into a training set and a test set in a ratio of 80% to 20%. The training set is used to train the life model under each preset combination of hyper-parameters in turn. The test data comprise test set 1 and test set 2. The data in test set 1 come from the same batch of varistors of the same model as the training set, while the data in test set 2 come from a different batch.
Figure 14 shows statistics for the prediction errors in the life curves of test set 1 and test set 2 generated by the models under different hyper-parameters after 100 independent training runs. The prediction accuracy on test set 2 reflects the generalization performance of the model.
The statistics for the 100 independent test errors under each hyper-parameter setting show that the model is stable and reliable. For test set 1, networks with one and two BiLSTM layers perform similarly, but performance drops with three layers. For test set 2, a single layer performs best. For a given number of layers, a training sample length of two performs better on test set 1 but worse on test set 2. With sample lengths of three and four, the model performs in a relatively balanced way on both test sets, and a length of four gives the most stable performance. If the sample length increases further, the accuracy of the model declines. Clearly, stacking multiple BiLSTM layers in this model does not improve prediction accuracy and consumes more computing resources. A moderate increase in the sample length helps stabilize the model, but an excessive length leads to overfitting and a loss of generalization and transferability.
3.3. Comparison between Predicted Curve and Observed Curve
In the final model, the number of BiLSTM layers is set to one and the sample length to four. Other hyper-parameters are selected according to empirical formulas. When the limited-length experimental data are input into the varistor life model, the model converges quickly, as shown in Figure 15. The predicted life curve is obtained and compared with the observed life curve, as shown in Figure 16. The results show that the predicted trend is smooth and that the predicted value agrees with the observed value at the key node where the life parameter drops. The local error occurs mainly at the nodes after the varistor is damaged, and the overall error meets expectations. The output of this model can therefore be used to predict the lifetime of varistors.
3.4. Performance Comparison of Different Structures
In addition to the LSTM and BiLSTM structures, this paper also considers GRU, a lightweight LSTM-like structure, given the small scale of the data.
The gated recurrent unit (GRU) is a simplified LSTM model. Compared with LSTM, GRU uses a single update gate in place of the forget gate and the input gate, which reduces the number of parameters by about one quarter (three weight blocks instead of four). In addition, GRU omits the separate cell state of LSTM, so it places less emphasis on long-term memory and focuses more on the association between the previous moment and the present moment. The GRU structure also removes the second nonlinear function applied at the output. Therefore, GRU tends to have a higher computational efficiency and stronger generalization on small samples and simply structured data.
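For comparison with the LSTM structure, a GRU step is commonly written as follows. This is the standard formulation, given here as an assumption; the symbols $z_t$ (update gate), $r_t$ (reset gate) and the subscripted weights and biases are notation chosen for this sketch, and some references swap the roles of $z_t$ and $1 - z_t$ in the last line.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z\,[h_{t-1}, x_t] + b_z\right) && \text{(update gate)}\\
r_t &= \sigma\left(W_r\,[h_{t-1}, x_t] + b_r\right) && \text{(reset gate)}\\
\tilde{h}_t &= \tanh\left(W_h\,[r_t \odot h_{t-1}, x_t] + b_h\right) && \text{(candidate state)}\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(hidden state)}
\end{aligned}
```

There is no separate cell state, and the hidden state is output directly without a further tanh, which are the two simplifications described above.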
In this paper, the three structures are tested on the original data and the enhanced data, and the test results are shown in Table 1. To compare the computational efficiency of the different algorithms more clearly, training was run for 100 epochs. In practice, five to ten epochs already achieve very good results and take less than two seconds.
The experimental results show that the three structures require similar training times. However, the BiLSTM structure achieves a higher prediction accuracy than the other structures on both test set 1 and test set 2.
4. Conclusions
In this paper, a life prediction method based on BiLSTM is proposed to study the cumulative effect of the degradation caused by surge impacts on varistors. Compared with existing life prediction methods, this method uses the nonlinear coefficient to construct the life parameter, which better characterizes the degradation of varistors under surge impact, and uses a BiLSTM network to build a life model that makes effective use of historical data to judge the life of varistors. Experiments show that the method has high prediction accuracy, a simple structure, a short network training time and low computational cost. The model also generalizes and transfers well, and has application prospects in predicting the life of varistors. In addition, the method may be applicable to other small-scale time series forecasting scenarios.
Subsequent studies will further explore changes in the lifetime of varistors under varying surges and build more accurate life models with the help of more complex networks.