1. Introduction
In the era of Big Data, fault diagnosis plays an important role in industrial production [1, 2, 3]. In practical environments, equipment works under complex operating conditions and strong noise. Its core components (such as bearings, gears, and motors) occasionally fail, and some faults are difficult to locate using traditional methods, so intelligent fault diagnosis is essential to ensuring safety and reliability [4, 5, 6, 7, 8, 9].
Convolutional Neural Network (CNN) [10, 11, 12, 13, 14, 15], one of the representative deep learning models, is increasingly applied to fault diagnosis tasks owing to its strengths in deep feature learning and nonlinear classification. Zhang et al. [
16] proposed a fault diagnosis method that uses multiple parallel convolutional layers to extract rich and complementary fault features effectively and then transforms the one-dimensional signal into a two-dimensional signal by continuous wavelet transform. Huo et al. [17] extended a convolutional neural network with transfer learning, which can adaptively convert one-dimensional vibration signals into two-dimensional matrices and reduce the data distribution distance between the source and target domains. Wang et al. [18] proposed a new multi-sensor information fusion method that arranges the time-domain signals into a rectangular two-dimensional matrix and then uses an improved 2D-CNN to classify the signals. Wen et al. [19] developed a new CNN based on LeNet-5 and converted signals into 2D images through a conversion method, which makes the features easy to capture. S. et al. [20] applied a one-dimensional convolutional neural network to motor fault diagnosis and proposed fusing feature extraction and classification into a single learning body. Zhao et al. [21] proposed a normalized CNN for diagnosing rolling bearings with different fault severities and fault directions under data imbalance and variable conditions. Abdeljaber et al. [22] presented a novel, fast and accurate structural damage detection system using 1D Convolutional Neural Networks (CNNs), which has an inherently adaptive design that fuses the feature extraction and classification blocks into a single, compact learning body. Dibaj et al. [23] proposed a fault diagnosis approach for rotating machinery based on variational mode decomposition and CNN. Janssens et al. [24] proposed a deep learning model for condition monitoring using CNN and showed that the feature learning method was significantly better than the feature extraction method in fault diagnosis of rotating machines. In [25], a CNN is proposed to classify gearbox faults, and the features learned by the CNN filters are visualized to understand the physical fault diagnosis phenomena. Cheng et al. [26] proposed an intelligent fault diagnosis method for rotating machinery based on a new continuous wavelet transform local binary convolutional neural network. Scholars in the field of fault diagnosis have thus made extensive use of convolutional neural networks to improve diagnostic accuracy and have achieved excellent results. Although these studies verify the efficiency of CNN in fault diagnosis, the following two problems remain.
First, converting vibration signals into spectrograms requires a large amount of computation. Second, spectrograms are resized into small images before model training to reduce computation time, and some signal characteristics are lost during this compression. Therefore, the proposed method operates directly on one-dimensional data [27, 28, 29]. In addition, the traditional 1D-CNN uses pooling layers to enlarge the receptive field. However, average or max pooling over a region is insufficient to capture the importance of the individual features within it. Although strided convolution can learn from neighboring features, it fails to adaptively model the importance of features during downsampling and limits shift-invariance, because it focuses on only one fixed location within each sliding window and discards the rest [30].
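The difference between these downsampling schemes can be illustrated with a toy sketch. It is not taken from the cited works: the input values and window size are arbitrary, and `strided_pick` models only the subsampling step of a strided convolution (equivalent to a kernel of size 1), which is a simplifying assumption.

```python
def max_pool(x, size):
    """Non-overlapping max pooling: keep the largest value in each window."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def avg_pool(x, size):
    """Non-overlapping average pooling: every value in the window gets equal weight."""
    return [sum(x[i:i + size]) / size for i in range(0, len(x) - size + 1, size)]


def strided_pick(x, size):
    """Subsampling step of a strided operation: only the first position of each window is read."""
    return [x[i] for i in range(0, len(x) - size + 1, size)]


x = [0, 0, 9, 0, 1, 2, 3, 4]      # arbitrary toy signal with one impulse (value 9)
shifted = [0] + x[:-1]            # the same signal delayed by one sample

pooled_max = max_pool(x, 4)               # [9, 4]
pooled_avg = avg_pool(x, 4)               # [2.25, 2.5]
picked = strided_pick(x, 4)               # [0, 1]
picked_shifted = strided_pick(shifted, 4) # [0, 0]
```

In this toy case the impulse never reaches the strided output, and average pooling dilutes it with its neighbors, whereas max pooling keeps it but ignores everything else in the window.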
Since equipment is usually affected by load changes and environmental noise, some scholars use dilated convolution to replace the pooling layer in standard convolution. This retains the temporal relationship of the original signals and captures larger-scale feature information, which helps improve the feature learning ability of neural networks. Su et al. [31] used a dilated convolution deep belief network dynamic multilayer perceptron for bearing fault recognition under variable operating conditions. Zhao et al. [32] proposed a novel transfer learning framework based on a Deep Multi-Scale Convolutional Neural Network (MSCNN), in which dilated convolution and global average pooling were introduced to realize intelligent fault diagnosis of rolling bearings. Han et al. [33] used dilated convolutions to construct a Novel Multi-scale Dilated Convolutional Neural Network (NMDCNN) to enlarge the field-of-view coverage; piecewise cross-entropy was also used to balance the misclassification cost between healthy and faulty samples. Chu et al. [34] proposed a novel multi-scale convolution model based on Multi-Dilation Rates And Multi-Attention Mechanism (MDRMA-MSCM) for mechanical fault diagnosis. Wang et al. [35] proposed a Cascade Convolutional Neural Network (C-CNN) for fault diagnosis, in which a cascade structure was built to avoid information loss. In summary, dilated convolution has achieved significant results in the field of fault diagnosis, and its advantages are as follows:
- (1)
Dilated convolution replaces the pooling layer operation in feature processing. This not only avoids reducing the receptive field of the network but also fully preserves the temporal relationship of the original signal, which is of great significance for mining the domain-invariant characteristics of the signal.
- (2)
The use of large convolution kernels in standard convolution increases the amount of computation. Compared with larger convolution kernels, dilated convolution requires significantly less computation for a comparable receptive field, which helps the model process fault features more accurately.
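The two advantages listed above can be checked with a short calculation. The sketch below compares the receptive field and kernel weight count of a stack of stride-1 dilated 1D convolutions with those of standard convolutions. The kernel size of 3 and the dilation rates 1, 2, 4 and 8 are illustrative assumptions and are not the settings of the ECNN model.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 1D convolutions, one layer per dilation rate."""
    rf = 1
    for d in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation samples.
        rf += (kernel_size - 1) * d
    return rf


def kernel_weights(kernel_size, num_layers):
    """Kernel weights per (input channel, output channel) pair, ignoring biases."""
    return kernel_size * num_layers


dilations = [1, 2, 4, 8]                          # assumed rates, for illustration only
rf_dilated = receptive_field(3, dilations)        # 1 + 2 * (1 + 2 + 4 + 8) = 31
rf_standard = receptive_field(3, [1, 1, 1, 1])    # 1 + 2 * 4 = 9

# A single standard layer needs a kernel as wide as the receptive field itself.
weights_dilated = kernel_weights(3, len(dilations))   # 12
weights_large_kernel = kernel_weights(rf_dilated, 1)  # 31
```

Under these assumptions, four dilated layers cover 31 samples with 12 weights per channel pair, while four standard layers of the same kernel size cover only 9 samples and a single large kernel needs 31 weights.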
However, dilated convolution remains a black box: it assigns the same weight to different feature channels and cannot adaptively adjust the weights according to the importance of each channel. In particular, without an attention mechanism, the exceptional contribution of informative data segments is ignored.
In recent years, the attention mechanism has been shown experimentally to solve this problem. It is inspired by the way the human brain uses limited computation to obtain high-value information while processing information. Based on this idea, it has been applied in many deep learning methods. When extracting the features of the original vibration signals, the attention mechanism can dynamically increase the weight of significant feature channels, thus improving diagnostic accuracy. Zhang et al. [
36] proposed a fault diagnosis method that realizes spatiotemporal feature fusion, in which the features of the vibration signal are fused using attention weights. Li et al. [37] constructed a rolling bearing fault diagnosis model that combines a Dual-stage Attention-based Recurrent Neural Network (DA-RNN) and a Convolutional Block Attention Module (CBAM). Yang et al. [38] developed a method based on multilayer bidirectional gated recurrent units with an attention mechanism to assess the interpretability of neural networks in fault diagnosis. Zhang et al. [39] proposed a method based on a Hybrid Attention improved Residual Network (HA-ResNet) to diagnose faults in wind turbine gearboxes. This method highlights the essential frequency bands of wavelet coefficients and the fault features of convolution channels. Cao et al. [40] built a deep domain-adaptive multi-task learning model, Y-Net, to enable domain-adaptive diagnosis of faults in planetary gearboxes. Squeeze and Excitation Residual (SE-Res) modules are used to reduce the redundancy of the model and improve the separability of deep features.
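The idea of channel re-weighting described above can be sketched as a squeeze-and-excitation style block. This is a generic illustration of channel attention in plain Python, not the ResNet-FCF block or any of the cited models; the layer sizes and the hand-picked parameters `w1`, `b1`, `w2` and `b2` are hypothetical and would normally be learned.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(vec, weights, biases):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, biases)]


def se_channel_attention(features, w1, b1, w2, b2):
    """Squeeze-and-excitation style re-weighting of a 1D feature map.

    features: list of channels, each a list of values along the time axis.
    Returns the per-channel weights and the re-scaled feature map.
    """
    # Squeeze: global average pooling over time for each channel.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid output.
    hidden = [max(0.0, h) for h in dense(squeezed, w1, b1)]
    weights = [sigmoid(z) for z in dense(hidden, w2, b2)]
    # Scale: multiply every channel by its weight in (0, 1).
    scaled = [[weights[c] * x for x in ch] for c, ch in enumerate(features)]
    return weights, scaled


# Toy input with 3 channels and hypothetical parameters.
features = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], [2.0, 4.0, 2.0, 4.0]]
w1, b1 = [[1.0, 0.0, 1.0]], [0.0]                  # 3 channels -> 1 hidden unit
w2, b2 = [[0.5], [-0.5], [1.0]], [0.0, 0.0, 0.0]   # 1 hidden unit -> 3 weights
weights, scaled = se_channel_attention(features, w1, b1, w2, b2)
```

With these toy parameters the third channel receives the largest weight and the second the smallest, so channels are no longer treated equally as they are in a plain convolution.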
Attention mechanisms can address the inability of dilated convolution to adaptively weight informative data segments. At the same time, introducing dilated convolution into the attention mechanism gives the model a larger receptive field and lets it extract more features. Inspired by dilated convolution and the attention mechanism, this study proposes an ECNN model. The model employs a novel pyramidal dilated convolution to extract more valuable features, and the Residual Network Feature Calibration and Fusion (ResNet-FCF) block is used to assign different weights to different feature channels. The contributions of this article are as follows:
- (1)
In this model, the pyramidal dilated convolution is designed to extract the features of data segments, which greatly enlarges the receptive field and captures more features.
- (2)
The ResNet-FCF block is designed as a novel attention network architecture. Building on the local interaction learning scheme, ResNet-FCF introduces a residual network with one-dimensional convolution for global cross-channel interaction. The module assigns precise weights to the informative data segments.
- (3)
The model achieves good diagnostic performance on three practical examples.
The rest of this paper is organized as follows. Section 2 briefly introduces the basic theory of convolutional neural networks and residual networks, and then presents the ECNN model and a novel fault diagnosis method. In Section 3, the experimental results are analyzed and discussed. Finally, the conclusions are presented in Section 4.
3. Results
The experiments in this section were run on the Windows 10 operating system with 16 GB of Random Access Memory (RAM) and a GeForce GTX 1060 graphics processing unit. The ECNN test model was built and trained with the Keras framework. Keras is an open-source artificial neural network library written in Python, which can be used as a high-level application interface for TensorFlow, Microsoft CNTK and Theano to design, debug, evaluate, apply and visualize deep learning models. Three examples are given to verify that the ECNN fault diagnosis model can extract a large number of data features at an equivalent computational cost and assign large weights to essential features and small weights to irrelevant ones. Three data sets are used to verify its effectiveness: the Case Western Reserve University (CWRU) rolling bearing data set, the National Aeronautics and Space Administration (NASA) rolling bearing data set, and the Permanent Magnet Synchronous Motor (PMSM) data set from our lab.
3.1. The CWRU Rolling Bearing Example
The CWRU bearing data set is widely recognized as a standard benchmark for fault diagnosis. To compare the proposed method objectively with others, this paper uses the CWRU bearing data set for algorithm verification. The drive-end data are selected. The bearing has three types of fault: inner ring, outer ring and rolling element. The depth and location of each type of fault differ, as shown in Table 1. Four hundred samples are collected for each condition. The load in this experiment is variable, and the data are divided into HP1, HP2 and HP3 load data. The experimental data contain 16 conditions. The signals are shown in Figure 6.
3.1.1. Model Structure
In the experiment, six traditional methods and four convolutional neural networks were selected for the comparison. The conventional method extracts 11 time-domain (TD) features and 13 frequency-domain (FD) features from the samples and then classifies them using BPNN and SVM. More details about the eleven TD features and thirteen FD features can be found in [46, 47], respectively. CNN, dilated convolution and SE (Squeeze-and-Excitation) CNN are taken as the comparative deep convolutional neural network models. The first layer of the ECNN model processes the data in one-dimensional form, and different dilation rates are then set for the other eight layers. This replaces the pooling layer in the standard CNN, so more valuable features can be extracted while the receptive field increases. The designed ResNet-FCF block assigns different weights to the feature channels after each layer. The model parameters of the deep learning network ECNN are shown in Table 2.
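For readers unfamiliar with the hand-crafted features used by the traditional baselines, the following sketch computes a few time-domain statistics that are common in vibration analysis. The exact eleven TD features are defined in the cited reference and are not listed in this section, so the selection below (mean, RMS, peak, crest factor, skewness, kurtosis) is an assumption for illustration only.

```python
import math


def time_domain_features(signal):
    """Return a dictionary of common time-domain statistics for one sample."""
    n = len(signal)
    mean = sum(signal) / n
    variance = sum((x - mean) ** 2 for x in signal) / n
    if variance == 0:
        raise ValueError("constant signal: skewness and kurtosis are undefined")
    std = math.sqrt(variance)
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    return {
        "mean": mean,
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms,   # sensitive to impulsive faults
        "skewness": sum((x - mean) ** 3 for x in signal) / (n * std ** 3),
        "kurtosis": sum((x - mean) ** 4 for x in signal) / (n * variance ** 2),
    }


# Sanity check on a unit-amplitude sine spanning 16 full periods.
sine = [math.sin(2 * math.pi * k / 64) for k in range(1024)]
feats = time_domain_features(sine)
```

For a pure sine the RMS is 1/sqrt(2) and the kurtosis is 1.5, which gives a quick way to check the implementation before applying it to measured signals. The resulting feature vector would then be passed to a classifier such as BPNN or SVM.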
3.1.2. Diagnosis Results of Different Methods
To obtain more accurate results, the experiment was divided into ten groups. The results show that the performance of BPNN and SVM depends largely on feature extraction. When sensitive features are selected as input, their diagnostic accuracies improve further to 79.63% and 87.14%, respectively, but this selection is expensive and time-consuming. The accuracy of the proposed method is 99.12%, which is higher than that of the other nine methods, as shown in Table 3 and Figure 7.
3.1.3. Visualization of the Fault Diagnosis Process
The ECNN used for fault diagnosis has nine layers. Each layer first uses a different dilation rate for feature extraction and then uses the ResNet-FCF block to assign appropriate weights to the extracted features. The informative data segments thereby receive more attention, and their corresponding channel weights increase. This paper uses the t-SNE dimensionality reduction technique to visualize the features in each layer. The diagram of the structure is shown in Figure 8.
3.1.4. Weight Comparison of ECNN and SECNN
To explore the role of ResNet-FCF in every convolutional layer of ECNN, we visualize the weights generated by each convolutional layer and then compare the effects of ResNet-FCF and SE blocks at each layer in detail. As shown in
Figure 9, the results indicate that when different types of samples are input, the weight generated by the fault diagnosis model is more related to the category information. The combination of local cross-channel and global cross-channel interactions in the ResNet-FCF block significantly improves the ability of channel attention.
3.1.5. The Influence of Hyperparameters
The structural parameters of the model are determined by experiments. Two key parameters, k (convolution kernel size) and C (number of channels), are selected through cross-validation, as shown in Figure 10. The 3D hyperparameter plot shows that the accuracy of the model is highest when the convolution kernel size is sixty-four and the number of channels is three.
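A cross-validated search over two hyperparameters can be organized as in the sketch below. The fold construction, the candidate grids and the `toy_score` function are all hypothetical: `toy_score` is a stand-in for training the model on the training indices and measuring accuracy on the validation indices, and its values are unrelated to the accuracies reported in this paper.

```python
from itertools import product


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs for contiguous, near-equal folds."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def grid_search(kernel_sizes, channel_counts, n_samples, n_folds, evaluate):
    """Average `evaluate` over the folds for every (k, C) pair and return the best."""
    scores = {}
    for k, c in product(kernel_sizes, channel_counts):
        fold_scores = [evaluate(k, c, train, val)
                       for train, val in k_fold_indices(n_samples, n_folds)]
        scores[(k, c)] = sum(fold_scores) / len(fold_scores)
    best = max(scores, key=scores.get)
    return best, scores[best], scores


def toy_score(k, c, train_idx, val_idx):
    # Placeholder score surface, NOT a measured accuracy.
    return -((k - 32) ** 2) - (c - 2) ** 2


best, best_score, scores = grid_search([16, 32, 64], [1, 2, 3], 10, 3, toy_score)
```

The `scores` dictionary holds one averaged value per (k, C) pair, which is the kind of data a 3D hyperparameter surface is plotted from.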
3.1.6. The Influence of Different Segmentation Ratios
In the experiment, as the number of training samples increases, the accuracy and the experiment time of each network structure also increase. As seen in Table 4, the ECNN fault diagnosis model achieves the best results under the different sample segmentation ratios.
3.1.7. The Influence of Different SNR
To further examine the performance of the model, various levels of white noise were added in the experiments with both the traditional learning models and the deep learning models. The results show that the traditional methods are less noise-resistant than CNN. The proposed model is also more robust than the other deep learning networks. Table 5 shows the accuracy of each model under noise conditions, and Figure 11 shows the noise-resistance curves.
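Noise of a prescribed level is commonly generated from the definition SNR(dB) = 10 log10(P_signal / P_noise). The sketch below shows one way to do this; it assumes zero-mean Gaussian white noise and a per-sample power estimate, neither of which is stated in this section, and the function names are illustrative.

```python
import math
import random


def add_white_noise(signal, snr_db, rng=None):
    """Return `signal` plus Gaussian white noise scaled to the target SNR in dB."""
    if rng is None:
        rng = random.Random()
    signal_power = sum(x * x for x in signal) / len(signal)
    # Invert SNR(dB) = 10 * log10(P_signal / P_noise) to get the noise power.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB of `noisy` relative to the known clean signal."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Toy usage: a sine wave corrupted at 0 dB (noise power equal to signal power).
clean = [math.sin(2 * math.pi * k / 64) for k in range(4096)]
noisy = add_white_noise(clean, snr_db=0, rng=random.Random(0))
snr_check = measured_snr_db(clean, noisy)
```

Because the noise is random, the measured SNR only approximates the target, and the approximation improves with the sample length.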
3.1.8. The Influence of Variable Load
To analyze the generalization ability of these models, each method is trained on data from some loads and tested on data from another load. The experimental results are shown in Figure 12 and Table 6.
3.2. The NASA Rolling Bearing Example
In this section, the NASA data set is further used to demonstrate the superiority of the proposed method in feature extraction and in weighting the contributions of different data segments. Over the whole data collection period, the data set covers four cases: normal, inner ring failure, outer ring failure and rolling element failure. The test duration is thirty-five days for the normal, inner ring and rolling element cases and thirty days for the outer ring case. The data used in this experiment were collected after frequent failures, and the data of the last five days were taken as samples, as shown in Figure 13 and Table 7.
This experiment obtained 300 training samples and 100 testing samples for each condition, and the length of each sample was set to 1024. Ten experiments were conducted for each of the four deep learning networks. The experimental results are shown in Figure 14 and Table 8. Compared with other deep learning methods, the proposed method has obvious advantages and can accurately identify different types of faults.
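The sample construction described above (fixed-length samples split into training and testing sets) can be sketched as follows. Whether the windows overlap and how samples are assigned to the two sets are not stated here, so the non-overlapping default and the simple head/tail split are assumptions.

```python
def segment_signal(signal, sample_length=1024, step=None):
    """Cut a long 1D record into windows of `sample_length` points.

    `step` defaults to `sample_length`, i.e. non-overlapping windows (an assumption).
    """
    if step is None:
        step = sample_length
    last_start = len(signal) - sample_length
    return [signal[i:i + sample_length] for i in range(0, last_start + 1, step)]


def split_train_test(samples, n_train=300, n_test=100):
    """Take the first `n_train` samples for training and the next `n_test` for testing."""
    if len(samples) < n_train + n_test:
        raise ValueError("not enough samples for the requested split")
    return samples[:n_train], samples[n_train:n_train + n_test]


# Toy usage with a short synthetic record and small sizes.
record = list(range(40))
samples = segment_signal(record, sample_length=8)
train, test = split_train_test(samples, n_train=3, n_test=1)
```

With the default arguments, one condition needs a record of at least 400 x 1024 points to yield 300 training and 100 testing samples without overlap.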
The following conclusions can be drawn: different changes in load have different influences on signals. Compared with other deep learning methods, the ECNN fault diagnosis model has a high average accuracy rate that reaches 97.19% under variable working conditions. The results show that the model has a strong generalization ability and domain adaptability.
3.3. The Example of PMSM
A further study on the electrical system of a Permanent Magnet Synchronous Motor (PMSM) is carried out to verify the transferability and extensibility of the ECNN-based method.
The experimental platform comprises the following parts: upper computer, electromechanical actuator and its controller, sensor, data acquisition system, loading system and power supply system. The structure of the experimental platform is shown in
Figure 15.
3.3.1. The Faults of Motor
To guarantee the safety of the test and prevent irreversible damage to the motor, this study mainly simulates the inter-turn short circuit fault of the motor, as shown in Figure 16. The inter-turn short circuit fault is mainly reflected in a change in the winding resistance. Therefore, resistors are connected externally in series with the three-phase windings of the motor such that two phases have the same resistance, while the third phase has either no series resistor or a resistor of a different value, which simulates three-phase winding asymmetry. At the same time, the three-phase current and speed signals of the motor under different fault depths are collected.
As shown in Figure 16, number 1 represents the output bus of the controller; numbers 2, 3 and 4 represent the output signals of the three current sensors; number 5 represents the 5 V DC power supply; numbers 6, 7 and 8 are the current sensors used to measure the a-, b- and c-phase currents, respectively; numbers 9, 10 and 11 are the resistors connected in series with phases a, b and c, respectively; and number 12 represents the input bus at the motor end.
3.3.2. Influence of Short Circuit on Three-Phase Current of Motor
Inter-turn short circuit faults of 1%, 2%, 4%, 8% and 16% were injected into a PMSM stator winding. As shown in Figure 17, panels (a), (b), (c), (d), (e) and (f) are the three-phase current curves of the motor under different working conditions. The red line is the A-phase current curve, the purple line is the B-phase current curve, and the blue line is the C-phase current curve.
3.3.3. Verification of Different Deep Learning Algorithms
Four deep learning models were each trained ten times to obtain the average accuracy, as shown in Figure 18. The results show that ECNN improves the ability to diagnose small faults and recognizes inter-turn short circuit faults of the same type but different depths well. In the experiment, the accuracy of CNN is 93.38%, while the accuracy of the optimized DCNN is improved to 96.61%. The recognition accuracy of SECNN is 97.25%, and the accuracy of ECNN is the highest, reaching 99.87%. The deep network ECNN has great advantages in feature extraction and state recognition. The confusion matrices of the four deep learning models are shown in Figure 19.
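For reference, a confusion matrix and the overall accuracy derived from it can be computed as in the sketch below. The labels are toy values chosen for illustration and are not results from this study; rows are taken as true classes and columns as predicted classes, which is a convention assumed here.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[t][p] counts samples of true class t that were predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def mean_accuracy(run_accuracies):
    """Average accuracy over repeated training runs."""
    return sum(run_accuracies) / len(run_accuracies)


# Toy labels for three classes (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
acc = accuracy(cm)
```

Off-diagonal entries show which fault depths are confused with one another, which the single accuracy figure cannot reveal.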