Combined ResNet Attention Multi-Head Net (CRAMNet): A Novel Approach to Fault Diagnosis of Rolling Bearings Using Acoustic Radiation Signals and Advanced Deep Learning Techniques

Xu, Xiaozheng; Li, Ying; Ding, Xuebao

doi:10.3390/app14188431

by Xiaozheng Xu ¹,*, Ying Li ¹ and Xuebao Ding ²

¹ School of Art and Design College, Shenyang Ligong University, Shenyang 110180, China
² School of Design & Manufacturing Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(18), 8431; https://doi.org/10.3390/app14188431

Submission received: 14 August 2024 / Revised: 15 September 2024 / Accepted: 17 September 2024 / Published: 19 September 2024

(This article belongs to the Section Mechanical Engineering)

Abstract

The fault diagnosis of rolling bearing acoustic radiation signals holds significant importance in industrial equipment maintenance. It effectively prevents equipment failures and downtime, ensuring the smooth operation of the production process. Compared with traditional vibration signals, acoustic radiation signals have the advantage of non-contact measurement. They can diagnose faults in special conditions where sensors cannot be installed and provide more comprehensive equipment status information. Therefore, to extract the fault characteristic information of rolling bearings from complex acoustic signals, this paper proposes an advanced deep learning model combining Gramian Angular Field (GAF), ResNet1D, ResNet2D, and multi-head attention mechanism, named CRAMNet (Combined ResNet Attention Multi-Head Net), to diagnose the faults of rolling bearing acoustic radiation signals. Firstly, this method includes converting one-dimensional signals into GAF images and performing data standardization and segmentation. Then, the method utilizes ResNet1D to extract features from one-dimensional signals and ResNet2D to extract features from GAF images. Further, it combines the multi-head attention mechanism to enhance feature representation and capture dependencies between different channels. Finally, this paper compares the proposed method with several traditional models (including CNN, LSTM, DenseNet, and CNN-Transformers). Experimental results show that the proposed method performs outstandingly in terms of accuracy and robustness. The combination of residual networks and multi-head attention mechanism in the model significantly enhances its ability to accurately diagnose rolling bearing faults, proving the superiority of the algorithm.

Keywords:

rolling bearings; acoustic signal; ResNet; multi-head self-attention mechanism; fault diagnosis

1. Introduction

Rolling bearings play a crucial role in mechanical systems. As a core component of mechanical systems, the primary function of rolling bearings is to support rotating shafts and reduce friction, thereby ensuring stable and efficient operation of the machinery. Their applications are wide-ranging, covering various fields from small electric motors to large industrial machinery. The performance of rolling bearings directly affects the operational efficiency, reliability, and lifespan of equipment [1,2,3,4]. Therefore, the fault diagnosis and maintenance of rolling bearings are of great significance for ensuring the normal operation of mechanical systems and extending the service life of equipment.

Data-driven deep learning methods have shown significant potential in modern industrial fault diagnosis, with the aim of enhancing the reliability and maintenance efficiency of equipment [5,6]. Traditional fault diagnosis methods typically rely on experts to manually extract features. In contrast, deep learning models, through training on a large amount of historical data, can achieve improved diagnostic accuracy. Not only can they automatically extract important features from raw data, but they can also handle large-scale, high-dimensional, and complex time series data. This offers the potential for them to adapt to different industrial conditions and maintain high diagnostic performance. Therefore, data-driven deep learning methods are increasingly being explored in modern industrial fault diagnosis [7,8].

Among the many deep learning models, ResNet has gained widespread application due to its ability to address the vanishing gradient problem. ResNet introduces residual blocks and skip connections, effectively solving the vanishing gradient issue in deep neural networks. Residual connections simplify the optimization process of deep networks, reducing the difficulty of training deep networks and making them easier to converge. Currently, a large number of researchers are making significant contributions in this field. Gu et al. [9] proposed an optimized squeeze-and-excitation ResNet model-based method for enhanced fault diagnosis of rolling bearings. This method combines grid search, support vector regression, ensemble empirical mode decomposition, and low-rank multi-mode fusion to effectively process vibration fusion signals. Karan et al. [10] proposed an enhanced Inception-ResNet-V2 model based on Constant-Q non-stationary Gabor transform for early classification of rolling bearing faults in induction motors. The results show that this model has a relatively high classification accuracy under low-load and full-load conditions. Yao et al. [11] proposed a double-stream residual network and improved capsule network (DMRCN) for bearing fault diagnosis based on transfer learning. Experimental results indicate that DMRCN outperforms other deep transfer learning methods in fault diagnosis under different conditions. Hu et al. [12] combined improved selective kernel networks with enhanced Inception-ResNet-v2 to propose an integrated deep neural network. This model designs a new three-branch network, incorporating separable convolution, to achieve adaptive convolution kernel selection and efficient feature extraction. Experimental results demonstrate the superior performance of this method in fault diagnosis.

While many studies focus on vibration signals for fault diagnosis of rolling bearings, traditional acoustic radiation techniques have also been extensively researched and applied. In the field of rolling bearing fault diagnosis, acoustic radiation signals, compared to traditional vibration signal measurement methods, achieve non-contact measurement. This feature allows for detection without direct contact with the equipment surface, which is especially suitable for high-speed rotating components, hazardous environments, or areas that are difficult to access, overcoming the bottleneck of limited vibration sensor installation positions. Furthermore, acoustic radiation signals have shown prominence in early fault detection. They exhibit high sensitivity to early-stage weak signals of micro-cracks or wear, enabling the capture and diagnosis of faults at their initial stages, thus achieving preventive maintenance. This further enhances the application potential of acoustic radiation signals in rolling bearing fault diagnosis. Acoustic radiation techniques have a long history in fault diagnosis and have proven effective in various industrial applications [13,14,15].

Relevant scholars have conducted the following research. Liu et al. [16] proposed a new framework for bearing fault diagnosis based on a word segmentation network. This framework uses signal segmentation methods, fault pattern mapping, and density peak clustering methods to comprehensively model AE signals, enhancing fault detection capabilities under different rotational speed conditions. Kim et al. [17] proposed a method for bearing fault diagnosis using acoustic radiation signals. By using normalized bearing characteristic components (NBCCs) as inputs to CNNs and by combining with Grad-CAM, they achieved feature extraction and visual interpretation. Li et al. [18] proposed a novel bearing fault diagnosis algorithm using acoustic signals. This method is based on deep extreme learning machine (SNP-DELM) and the efficient fault diagnosis of bearings using motor current signals. Huang et al. [19] investigated the use of time-domain filtering and frequency-domain filtering, combined with a two-step Wiener filtering method, to achieve bearing fault signal enhancement and effective noise control. They used the parameter-adaptive MOMEMDA method to analyze the output of bearing fault signals, achieving bearing fault diagnosis and fault location identification. Although acoustic signals have certain advantages in non-contact measurement and early fault detection, extracting fault characteristic information of rolling bearings from complex noise signals remains a challenging issue. Therefore, this paper proposes a method to transform acoustic signals into one-dimensional and two-dimensional representations, combining them for feature fusion using ResNet. By integrating features from one-dimensional signals and two-dimensional time–frequency diagrams, this approach fully utilizes the complementarity of these two signal representations. 
One-dimensional signals can provide local temporal features, while two-dimensional time–frequency diagrams can provide global frequency characteristics. To further enhance the feature extraction and fault diagnosis capabilities of the model, this paper introduces the multi-head attention mechanism.

The multi-head self-attention mechanism, as an important component of deep learning models, has gained wide application in the fields of natural language processing and computer vision [20,21,22]. Its development began with the Transformer model, which effectively addresses the long-distance dependency issues in sequential data processing using the attention mechanism. This significantly improves computational efficiency, making it suitable for large-scale data processing and complex model training.

In recent years, the multi-head self-attention mechanism has shown a broad application prospect in the field of mechanical fault diagnosis. It can simultaneously focus on different feature channels, extracting more detailed and rich fault features, thereby improving diagnostic accuracy. When analyzing time series data, the multi-head self-attention mechanism can capture long-distance dependencies within the data, aiding in the identification of early fault signals. Yao et al. [23] proposed a two-level ABD system for noise reduction and anti-noise fault diagnosis. This system uses adaptive multi-head self-attention mechanisms to suppress background noise in multiple channels. By spatially following the remaining noise through the multi-head self-attention mechanism, the system estimates the noise level of the signal samples, ultimately improving diagnostic performance in noisy environments. Experimental results have verified the effectiveness of this system. Li et al. [24] proposed a modular rolling bearing fault diagnosis method based on capsule networks and multi-head self-attention mechanisms. By combining the translational invariance of capsule networks with the embedding attention mechanism of compressors, the completeness of information is ensured. The zero-shot training method introduced achieves cross-domain and imbalanced sample fault diagnosis. Experimental results demonstrate that this method effectively addresses the issues of cross-domain diagnosis and sample imbalance. Hou et al. [25] proposed a fault diagnosis method based on extended convolution, bidirectional gated recurrent units, and multi-head attention mechanisms for achieving rapid and accurate fault diagnosis. This model achieved high prediction accuracy on two different datasets with varying levels of Gaussian noise, indicating its potential in practical applications of early fault diagnosis of bearings. Yu et al. [26] proposed a multi-head attention-based encoder network combined with a dynamic alarm threshold method for fault detection in gearboxes. Field data validated the effectiveness of this method, significantly reducing false alarm rates and improving detection accuracy.

Therefore, to further enhance the model’s feature extraction and fault diagnosis capabilities, this paper introduces the multi-head self-attention mechanism. By simultaneously focusing on different parts of the input, capturing complex dependencies and global features, the model’s adaptability and diagnostic accuracy under different working conditions are improved. In summary, the main contributions of this paper are as follows:

(1) Proposing a method that combines one-dimensional and two-dimensional transformations of acoustic signals for feature fusion using ResNet.
(2) Introducing the multi-head self-attention mechanism to enhance the model’s feature extraction capability and diagnostic performance.
(3) Demonstrating through experiments the superiority of this method in bearing fault diagnosis, showcasing its high adaptability and reliability under different working conditions.

Through this study, a novel and effective method is provided for rolling bearing fault diagnosis, with significant theoretical significance and application value. The rest of this paper is organized as follows. Section 2 details the proposed theoretical methods. Section 3 presents the experimental validation of the proposed method. Finally, Section 4 concludes the paper.

2. Methods

2.1. ResNet

Residual Networks (ResNets) are a pioneering architecture in the field of deep neural networks, designed to tackle the degradation problem that arises when training very deep networks. The fundamental innovation in a ResNet is the introduction of residual blocks, which incorporate identity mappings to enable the learning of residual functions instead of direct mappings. Its composition is shown in Figure 1.

The core idea of a residual block can be expressed as

y = F(x, \{W_i\}) + x \qquad (1)

where x is the input, y is the output, \{W_i\} denotes the parameters of the residual function, and F(x, \{W_i\}) is the residual function to be learned, typically composed of several neural network layers. For a two-layer residual block, the specific form is

F(x, \{W_i\}) = W_2 \sigma(W_1 x) \qquad (2)

where W_1 and W_2 are the weight matrices of the two layers and \sigma is the activation function.
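Equations (1) and (2) can be illustrated with a few lines of plain Python. This is only a minimal sketch: it assumes ReLU as the activation σ, uses fully connected weights rather than the convolutional layers of an actual ResNet, and the 2×2 weight values are hypothetical toy numbers, not parameters from this paper.

```python
def relu(v):
    """Element-wise ReLU, assumed here as the activation sigma."""
    return [max(0.0, a) for a in v]

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def residual_block(x, W1, W2):
    """Compute y = W2 * sigma(W1 * x) + x, as in Equations (1) and (2)."""
    fx = matvec(W2, relu(matvec(W1, x)))   # residual branch F(x, {W_i})
    if len(fx) != len(x):
        raise ValueError("identity shortcut needs matching input/output sizes")
    return [f + a for f, a in zip(fx, x)]  # skip connection adds the input back

# Toy example with hypothetical weights (illustration only).
x = [1.0, -2.0]
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[0.5, 0.0], [0.0, 0.5]]
y = residual_block(x, W1, W2)

# With all-zero weights the residual branch vanishes and the block
# reduces to the identity mapping.
zero = [[0.0, 0.0], [0.0, 0.0]]
y_identity = residual_block(x, zero, zero)
```

The second call shows why residual blocks are easy to optimize: when the residual branch contributes nothing, the block simply passes its input through unchanged.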

A complete ResNet is constructed by stacking multiple residual blocks. The forward propagation in ResNet is described by the following equation:

y_l = F(y_{l-1}, \{W_i\}) + y_{l-1} \qquad (3)

where y_l is the output of the l-th layer, F(y_{l-1}, \{W_i\}) is the residual function of the l-th layer, and y_{l-1} is the output of the (l − 1)-th layer.
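Unrolling the recursion in Equation (3) shows why skip connections ease gradient flow. The following is a sketch under assumed notation that is not used elsewhere in this paper: W_l denotes the parameters of the l-th block, L the total number of stacked blocks, y_0 the network input, and I the identity matrix.

```latex
\begin{aligned}
y_L &= y_{L-1} + F(y_{L-1}, W_L) \\
    &= y_0 + \sum_{l=1}^{L} F(y_{l-1}, W_l), \\
\frac{\partial y_L}{\partial y_0}
    &= I + \frac{\partial}{\partial y_0} \sum_{l=1}^{L} F(y_{l-1}, W_l).
\end{aligned}
```

The identity term I gives the gradient a direct path from the output back to the input, so it does not have to pass through every weight layer, which is how residual connections mitigate the vanishing gradient problem.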

2.2. GAF

The Gramian Angular Field (GAF) encodes time series data as images, which can then be processed by image-based deep learning algorithms. It transforms a one-dimensional time series into a two-dimensional image by representing the angular relationship between each pair of time points. The Gramian Angular Summation Field (GASF) is one of the two types of GAF; it captures these relationships through the cosine of the sum of the angles of each pair of time points. The calculation process is shown in Figure 2. The formula for the GASF is as follows:

GASF_{i,j} = \cos(\phi_i + \phi_j) \qquad (4)

where \phi denotes the polar coordinate angles of the normalized time series, and \phi_i and \phi_j are the angles corresponding to time points i and j, respectively.

The polar coordinate transformation for a time series value x_i is defined as

\phi_i = \arccos\left(\frac{x_i - \min(x)}{\max(x) - \min(x)}\right) \qquad (5)
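The GASF encoding can be sketched in plain Python as follows. The sketch follows the "summation" description in the text, i.e. the standard GASF definition in which each image entry is the cosine of the summed angles, together with the min-max normalization of Equation (5). The function name `gasf` and the three-point input are illustrative assumptions, not measured data or the authors' implementation.

```python
import math

def gasf(series):
    """Gramian Angular Summation Field of a 1-D sequence.

    Rescales the values to [0, 1] as in Equation (5), maps each value to an
    angle with arccos, and fills G[i][j] = cos(phi_i + phi_j).
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max normalized")
    # Clamping guards against round-off pushing values just outside [0, 1].
    scaled = [min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in series]
    phi = [math.acos(s) for s in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]

signal = [0.0, 0.5, 1.0]   # toy input, not measured data
image = gasf(signal)       # 3 x 3 symmetric matrix
```

A series of length n yields an n × n image, so in practice the signal is usually shortened or aggregated before encoding to keep the image size manageable.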

2.3. Multi-Head Attention

The theoretical formula for the multi-head attention mechanism can be summarized as follows:

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(head_1, head_2, \ldots, head_h) W_O \qquad (6)

where Q, K, and V are the query, key, and value matrices, respectively; h is the number of heads; and W_O \in R^{(h \cdot d_k) \times d} is a linear transformation matrix. Each head is computed as

head_i = \mathrm{Attention}(Q W_Q^i, K W_K^i, V W_V^i) \qquad (7)

The attention function is defined as

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^T}{\sqrt{d_k}}\right) V \qquad (8)

where d_k is the dimension of the key vectors. Its composition is shown in Figure 3.
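Equations (6)–(8) can be traced step by step in a small dependency-free sketch. The helper names, the two-position toy input, and the hand-picked projection matrices are all hypothetical. A practical model would use learned projections and a tensor library such as PyTorch.

```python
import math

def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Equation (8): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)

def multi_head(Q, K, V, heads, W_O):
    """Equations (6) and (7); heads is a list of (W_Q, W_K, W_V), one per head."""
    outputs = [attention(matmul(Q, Wq), matmul(K, Wk), matmul(V, Wv))
               for Wq, Wk, Wv in heads]
    # Concatenate the per-head outputs feature-wise for each position.
    concat = [[v for out in outputs for v in out[i]] for i in range(len(Q))]
    return matmul(concat, W_O)

# Toy self-attention: 2 positions, d = 2, h = 2 heads with d_k = 1 each.
X = [[1.0, 0.0], [0.0, 1.0]]
pick_first = [[1.0], [0.0]]     # hypothetical projection keeping feature 0
pick_second = [[0.0], [1.0]]    # hypothetical projection keeping feature 1
heads = [(pick_first,) * 3, (pick_second,) * 3]
W_O = [[1.0, 0.0], [0.0, 1.0]]  # identity output projection, shape (h*d_k, d)
out = multi_head(X, X, X, heads, W_O)
```

Each head attends within its own projected subspace, and the output projection W_O mixes the concatenated head outputs back to the model dimension d.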

2.4. CRAMNet

By integrating the above modules, this paper combines GAF, ResNet1D, ResNet2D, and the multi-head self-attention mechanism to propose the CRAMNet network model. The specific technical roadmap of the CRAMNet is shown in Figure 4.

3. Results

3.1. Experiment Introduction

In this study, the experimental data are collected from a self-built test rig. Figure 5 shows the configuration of the experimental test system, which is used to verify the feasibility of the proposed method. The entire system consists of a frequency converter, motor, shaft, test bearing, data acquisition device (model: PAK MKII SC42, B&K, Copenhagen, Denmark), array microphone circular board, and array microphone sensors (model: BSWA MPA436, sensitivity: 15.5 mV/Pa, sampling frequency: 16,384 Hz). During the experiment, the motor drives the rotating shaft and the test bearing to rotate, and the array microphones collect the acoustic signals generated during the operation of the bearing in real-time. These signals are transmitted to the data acquisition device and then to the computer for analysis. The array microphones, which consist of multiple sensors, accurately locate the sound source and have high resolution and sensitivity, capable of capturing weak acoustic signals. This is extremely important for detecting early faults and subtle anomalies in bearings.

The specific parameters of the test bearing are shown in Table 1. During the experiment, the motor speed was set to 1500 rpm. The experimental samples were extracted using a sliding window method, with a window length of 256 and a sample data length of 2048. Each operating condition contributed 400 samples, of which 300 were used to train the model and 100 to test it. The operating conditions of the rolling bearing include normal, inner ring fault, outer ring fault, rolling element fault, cage fault, and composite fault. Detailed data are shown in Table 2.
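The sample extraction described above can be sketched as follows. The text does not spell out how the two lengths relate, so this sketch assumes that each sample holds 2048 points and that the window advances by 256 points per step. The function names and the placeholder record are hypothetical, and the simple ordered split is only one possible way to obtain 300 training and 100 test samples.

```python
def sliding_windows(signal, sample_len=2048, step=256):
    """Cut overlapping samples of length sample_len, advancing by step points.

    Assumption: 2048 is the sample length and 256 the step between samples.
    """
    last_start = len(signal) - sample_len
    return [signal[s:s + sample_len] for s in range(0, last_start + 1, step)]

def split_train_test(samples, n_train=300, n_test=100):
    """Take the first n_train samples for training and the next n_test for testing."""
    if len(samples) < n_train + n_test:
        raise ValueError("not enough samples for the requested split")
    return samples[:n_train], samples[n_train:n_train + n_test]

# Placeholder record just long enough for 400 samples: 2048 + 399 * 256 points.
record = list(range(2048 + 399 * 256))
samples = sliding_windows(record)
train, test = split_train_test(samples)
```

Under these assumptions a record of 104,192 points per operating condition is enough to produce the 400 samples reported above.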

In the experiment, five bearing states were prepared: normal (N), inner race fault (IF), rolling element fault (RF), outer ring fault (OF), and cage fault (CF), as shown in Figure 6. Each fault was introduced through precise mechanical damage in a laboratory environment. The authenticity of each fault type was further validated by analyzing the vibration signals with frequency-domain and time-domain feature extraction techniques. This ensured the effectiveness of the dataset for fault diagnosis tasks.

In the neural network training process, to improve the model’s training accuracy, data augmentation was performed using rotation, translation, scaling, and cropping methods during the generation of two-dimensional image data. During the training process, dropout and batch normalization techniques were used to accelerate convergence and prevent overfitting. The dropout rate was set to 0.4, the optimizer chosen was Adam with a learning rate of 0.001, and the loss function was set to cross-entropy loss. The training batch size was set to 32, and the number of epochs was set to 100. The network was implemented using Python version 3.10.6, PyTorch version 1.11.0+cu113, CUDA version 11.3, and the CUDA device name was NVIDIA GeForce RTX 3080. Table 3 shows the detailed network parameters.

3.2. Analysis Results

The training and testing process and the loss value of the proposed algorithm are illustrated in Figure 7. The blue line depicts the accuracy over the training cycles: it starts from a relatively low value, rises rapidly in the first few cycles, and then stabilizes at approximately 100%. The green line represents the loss value over the same cycles. Around the 10th cycle, it reaches an extremely low point and remains close to zero for the remainder of the training period. The swift increase in accuracy and the simultaneous decrease in loss during the initial phase underscore the model’s efficient learning capability, and the stability across subsequent cycles indicates a robust training process.

To demonstrate the rationality of the proposed algorithm, an ablation analysis of its modules was conducted. The following evaluation metrics were selected: F1-Score, Matthews correlation coefficient (MCC), sensitivity, specificity, accuracy, and precision. Each module was evaluated using these metrics, which are defined as follows:

F1 = 2 \times \frac{Precision \times Sensitivity}{Precision + Sensitivity} \qquad (9)

MCC = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (10)

Sensitivity = \frac{TP}{TP + FN} \qquad (11)

Specificity = \frac{TN}{TN + FP} \qquad (12)

Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \qquad (13)

Precision = \frac{TP}{TP + FP} \qquad (14)

where TP stands for True Positive, TN stands for True Negative, FP stands for False Positive, and FN stands for False Negative.

From Table 4, it can be seen that the sensitivity and precision of ResNet1D are both above 0.85, while its specificity and MCC are lower, indicating that this model has shortcomings in identifying negative samples and that its overall classification performance is still lacking. ResNet2D shows significant improvement, but its overall metric scores are still not ideal. After combining one-dimensional and two-dimensional features, the combined model outperforms both ResNet1D and ResNet2D: accuracy and specificity increase to 0.92 and 0.91, respectively. Introducing a 4-head attention mechanism improves the metrics further: the F1-Score reaches 0.95, the MCC reaches 0.92, and specificity reaches 0.94, indicating a significant improvement in the recognition of negative samples. When the number of attention heads is increased to eight, all metrics approach their maximum, with sensitivity and specificity reaching 0.98 and 0.97, respectively. With a 12-head attention mechanism, the model achieves an average score of 1.00 across all metrics, meaning that it identifies the samples of every class without any misclassification. This shows that the method reaches its best performance on the current dataset with 12 heads, supporting its effectiveness and reliability in fault diagnosis tasks.

Figure 8 shows the t-SNE dimensionality reduction results of features extracted by different modules. Each point represents a sample, with different colors indicating different classes. In Figure 8a,b, the ResNet1D module demonstrates a certain level of discriminative ability in feature extraction, while the ResNet2D module exhibits a higher capacity in capturing spatial features. The distribution of features extracted by the multi-head attention mechanism after t-SNE dimensionality reduction is shown in Figure 8c, indicating that the attention mechanism performs excellently in feature extraction and classification, effectively capturing the relevant information in the data. From Figure 8d–f, although there is some discrimination of different classes in the two-dimensional space, there is still some overlap. This suggests that the features extracted by the FC1 layer have relatively weak discriminative power, requiring further feature extraction and processing. Compared to the FC1 layer, the FC2 layer shows improved clustering effects in the two-dimensional space, with more dispersed distribution and less overlap of different classes, indicating that the FC2 layer performs better in feature extraction and discrimination. The features of the FC3 layer exhibit more distinct clustering effects in the two-dimensional space, with clearer distribution of different classes. This suggests that as the network depth increases, the model’s feature extraction ability gradually enhances, allowing for better differentiation of different classes.

Overall, the feature extraction effects vary at different stages, with deeper models and the multi-head attention mechanism demonstrating superior performance in feature extraction and classification.

3.3. Comparison Results with Other Methods

To verify the classification superiority of the proposed CRAMNet model, we designed a series of experiments to compare it with several commonly used classification methods. These methods include Convolutional Neural Network (CNN), Long Short-Term Memory Network (LSTM), Dense Convolutional Network (DenseNet), and Convolutional Neural Network combined with Transformers (CNN-Transformers). The specific comparison methods are shown in Table 5. The experimental training and test results of several methods are shown in Figure 9.

From Figure 9a,b of the training stage, it can be seen that the training accuracy of the CNN and LSTM rises rapidly and then stabilizes at a high level. The training accuracy and loss of DenseNet follow trends similar to those of the CNN, but from a lower starting point. The training accuracy and loss of the CNN-Transformer fluctuate greatly, which may be caused by training instability due to data augmentation. The training accuracy of CRAMNet rises rapidly in the early stage and ultimately reaches nearly 100%, and its training loss decreases significantly, showing good training behavior. From Figure 9c,d of the testing stage, it can be seen that the testing accuracy and loss of the CNN, LSTM, and DenseNet are generally poor, and their testing loss decreases slowly. The testing accuracy and loss of the CNN-Transformer fluctuate greatly, showing a certain instability. The testing accuracy of CRAMNet remains at a high level throughout the training process and gradually increases, while its testing loss decreases rapidly and stays low, showing good generalization ability. In summary, CRAMNet performs well in both the training and testing stages, with fast convergence, high accuracy, and low loss. Compared with the other methods, CRAMNet demonstrates better performance and robustness.

The confusion matrices of the compared methods are shown in Figure 10 and reveal varying classification performance among the models. Notably, the CRAMNet model (MultiHeadAttention_12Head) achieves 100% accuracy across all categories, significantly surpassing the other models. Conversely, the LSTM model exhibits the poorest performance, with more misclassifications, particularly for categories 1 and 2. The CNN-Transformer model performs well overall but struggles with categories 2 and 6. Similarly, the DenseNet model performs strongly on most categories but shows more misclassifications in categories 1, 2, and 6. Lastly, although the CNN model achieves high accuracy in several categories, it underperforms notably in categories 1, 2, 5, and 6.

The comparison results of 10 experiments with the different models are shown in Figure 11. The average test accuracy of the CNN model is close to 80%, and its error bar shows that the accuracy fluctuates to some extent. The average test accuracy of the LSTM model is about 60%, so this model does not perform well on the current task. The test accuracy of the DenseNet model is slightly higher than that of the LSTM, at nearly 70%. The CNN-Transformer model reaches an average test accuracy of about 80% with small errors, which suggests that the data augmentation strategy contributes noticeably to model performance. CRAMNet performs best, with an average test accuracy of nearly 100% and a small error range. Compared with the other models, CRAMNet not only leads in average test accuracy but also shows better robustness and stability, making it more suitable for practical engineering applications.

For quantitative analysis, this paper uses accuracy, recall, precision, and F1-Score as evaluation metrics for the bearing fault data. Each method was trained 10 times, with 100 epochs per run, and evaluated on the same test set. The average test results are shown in Table 6.

As can be seen from Table 6, CRAMNet outperforms the other comparative methods in all metrics, particularly in precision and F1-Score. This demonstrates that by combining the features of one-dimensional signals and GAF images, and utilizing the multi-head self-attention mechanism, the proposed method can more effectively capture the key features in the bearing fault data, thereby improving classification performance. In the experiments, the classic CNN and LSTM models performed well when processing one-dimensional signal data, while DenseNet achieved relatively good results on GAF image data due to its advantages in handling image data. However, the combined model proposed in this paper significantly improved classification performance by integrating one-dimensional signal and two-dimensional image features and incorporating the multi-head self-attention mechanism.

In terms of computation time, the CNN has a relatively simple structure and therefore runs fastest. LSTM does not parallelize as effectively on GPUs as the CNN, and its computation time is relatively long, reaching up to 45 s. Although DenseNet can efficiently reuse features, its network is deeper, leading to higher computational complexity and longer processing time. CNN-Transformers have a slightly higher computational complexity than traditional CNNs, but the parallel-friendliness of Transformers places them between the CNN and LSTM in terms of efficiency. CRAMNet includes a multi-head attention mechanism, which improves classification performance but also slightly increases computational complexity, so its processing time may be slightly longer than that of the CNN. In future research, the network architecture should be further optimized. In summary, CRAMNet has important application potential in fault diagnosis tasks.

4. Discussion and Conclusions

Fault diagnosis of rolling bearing acoustic emission signals is of great importance in the maintenance of industrial equipment as it can effectively prevent equipment failures and downtime, ensuring the smooth operation of the production process. However, extracting fault characteristics of rolling bearings from complex acoustic signals is a challenging task. This paper proposes a comprehensive method called CRAMNet to address this challenge. CRAMNet combines GAF, ResNet1D, ResNet2D, and the multi-head self-attention mechanism to achieve efficient fault diagnosis of rolling bearing acoustic radiation signals. The conclusions drawn are as follows:

(1) Experimental results show that CRAMNet reaches and maintains nearly 100% accuracy during training. Its fast convergence and high stability during training further demonstrate its suitability for diagnosing rolling bearing faults.
(2) The experimental results demonstrate that CRAMNet excels in terms of precision and recall. Compared with several traditional models (including CNN, LSTM, DenseNet, and CNN-Transformers), CRAMNet performs best in all evaluation metrics, achieving an accuracy of up to 100%. Specifically, the combination of residual networks and the multi-head self-attention mechanism significantly enhances its fault diagnosis capability for rolling bearings, demonstrating the effectiveness of this method.
(3) The research findings not only provide an effective tool for the fault diagnosis of rolling bearing acoustic radiation signals but also offer new ideas and methods for the condition monitoring and fault diagnosis of other industrial equipment. Future research can further optimize the model structure and integrate more advanced algorithms to improve fault diagnosis accuracy and efficiency, and explore its application potential in other types of equipment.

Author Contributions

Conceptualization, X.X. and Y.L.; methodology, X.X.; software, X.X. and Y.L.; validation, X.X. and Y.L.; formal analysis, X.X. and Y.L.; investigation, X.X. and Y.L.; resources, X.X. and Y.L.; data curation, X.D.; writing—original draft preparation, X.X.; writing—review and editing, X.X. and X.D.; visualization, X.D.; supervision, X.D.; project administration, X.X. and Y.L.; funding acquisition, X.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (No. 52275119).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The detailed data supporting the results of this study are available from the corresponding authors upon request.

Acknowledgments

Thanks to Bai of Shenyang Jianzhu University for providing the experimental bench.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Residual network module.

Figure 2. One-dimensional signals are converted into two-dimensional signals using the GAF.
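
The conversion named in the Figure 2 caption can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the summation variant of the GAF (GASF), and it assumes that the 2048-point signal is reduced to 32 points by piecewise aggregate averaging before encoding, since Table 3 lists a (1, 32, 32) image input but the reduction step is not specified here. The function names `rescale`, `paa`, and `gasf` are hypothetical.

```python
import math

def rescale(series):
    """Min-max rescale a series to [-1, 1] so that arccos is defined."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in series]

def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of equal-length chunks.

    Assumed downsampling step; any samples beyond n_segments * size are ignored.
    """
    size = len(series) // n_segments
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(n_segments)]

def gasf(series):
    """Gramian Angular Summation Field: G[i][j] = cos(phi_i + phi_j)."""
    x = rescale(series)
    # Clamp guards against rounding slightly outside [-1, 1].
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in x]
    return [[math.cos(a + b) for b in phi] for a in phi]

# Synthetic 2048-point signal standing in for one acoustic sample.
signal = [math.sin(2 * math.pi * 5 * t / 2048) for t in range(2048)]
image = gasf(paa(signal, 32))
print(len(image), len(image[0]))  # 32 32
```

The resulting matrix is symmetric, and its diagonal encodes the rescaled amplitudes through cos(2φ) = 2x² − 1, which is why the image preserves the temporal order of the original signal along the diagonal.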

Figure 3. Multi-head attention mechanism module.

Figure 4. Overall technical flowchart of the CRAMNet.

Figure 5. Test bench for bearing faults.

Figure 6. Failure modes of different bearing components.

Figure 7. Loss and accuracy of the CRAMNet training process.

Figure 8. Visualized results during training: (a) ResNet1D features, (b) ResNet2D features, (c) attention features, (d) FC1 features, (e) FC2 features, and (f) FC3 features.

Figure 9. Comparison of the results of different methods: (a) train accuracy of different models, (b) train loss of different models, (c) test accuracy of different models, and (d) test loss of different models.

Figure 10. Confusion matrix results for different models: (a) CNN; (b) LSTM; (c) DenseNet; (d) CNN-Transformers; and (e) CRAMNet.

Figure 11. Comparison of the results of 10 experiments with different models.

Table 1. Structural parameters of rolling bearing.

Structural Parameters	Parameter Values	Structural Parameters	Parameter Values
Bearing type	UC205	Contact angle	0°
Outside diameter	52 mm	Number of rolling elements	9
Bore diameter	25 mm	Width	34.1 mm

Table 2. Detailed dataset information.

Fault Types	Working Speed (r/min)	Training/Testing Sample Size	Label
Normal (N)	1500	300/100	0
Outer ring fault (OF)	1500	300/100	1
Inner ring fault (IF)	1500	300/100	2
Rolling element fault (RF)	1500	300/100	3
Cage fault (CF)	1500	300/100	4
Combined inner and outer fault (IO)	1500	300/100	5
Combined rolling and outer fault (RO)	1500	300/100	6
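
Table 2 gives 300 training and 100 testing samples per class, and Table 3 lists a one-dimensional input of 2048 points. The sketch below shows one way to obtain such samples from a single recording per class. It is an illustration under stated assumptions: non-overlapping windows, per-window z-score standardization, and a chronological 300/100 split are all assumed, and the synthetic `record` stands in for real acoustic data.

```python
import math

WINDOW = 2048                 # input length listed in Table 3
N_TRAIN, N_TEST = 300, 100    # per-class sample counts from Table 2

def segment(signal, length=WINDOW):
    """Split a signal into non-overlapping windows, dropping any incomplete tail."""
    count = len(signal) // length
    return [signal[i * length:(i + 1) * length] for i in range(count)]

def standardize(window):
    """Z-score a window: zero mean and unit (population) standard deviation."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0.0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]

# Synthetic stand-in for the recording of one fault class.
record = [math.sin(0.01 * t) + 0.1 * math.cos(0.37 * t)
          for t in range((N_TRAIN + N_TEST) * WINDOW)]

windows = [standardize(w) for w in segment(record)]
train, test = windows[:N_TRAIN], windows[N_TRAIN:N_TRAIN + N_TEST]
print(len(train), len(test), len(train[0]))  # 300 100 2048
```

Repeating this for the seven labels in Table 2 would give 2100 training and 700 testing samples in total.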

Table 3. Detailed network parameters.

Layer Name	Input Shape	Output Shape	Kernel Size
ResNet1D
Conv1D	(1, 2048)	(64, 1024)	7
ResBlock1D-1 Conv1	(64, 1024)	(128, 1024)	3
ResBlock1D-1 Conv2	(128, 1024)	(128, 1024)	3
ResBlock1D-1 pooling	(64, 1024)	(128, 1024)	1
ResBlock1D-2 Conv1	(128, 1024)	(128, 1024)	3
ResBlock1D-2 Conv2	(128, 1024)	(128, 1024)	3
AdaptiveAvgPool1D	(128, 1024)	(128, 1)	-
Flatten	(128, 1)	(128,)	-
ResNet2D
Conv2D	(1, 32, 32)	(64, 16, 16)	7 × 7
ResBlock2D-1 Conv1	(64, 16, 16)	(128, 16, 16)	3 × 3
ResBlock2D-1 Conv2	(128, 16, 16)	(128, 16, 16)	3 × 3
ResBlock2D-1 pooling	(64, 16, 16)	(128, 16, 16)	1 × 1
ResBlock2D-2 Conv1	(128, 16, 16)	(128, 16, 16)	3 × 3
ResBlock2D-2 Conv2	(128, 16, 16)	(256, 16, 16)	3 × 3
AdaptiveAvgPool2D	(256, 16, 16)	(256, 1, 1)	-
Flatten	(256, 1, 1)	(256,)	-
Multi-Head Attention
MultiHeadAttention	(256,)	(256,)	4–16
CombinedModel
ResNet1D	(1, 2048)	(128,)	-
ResNet2D	(1, 32, 32)	(128,)	-
Linear (fc1)	(128,)	(128,)	-
Linear (fc2)	(256,)	(128,)	-
MultiHeadAttention	(256,)	(256,)	-
Linear (fc3)	(256,)	7	-
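
The spatial sizes in Table 3 can be checked with the standard convolution output-size formula, out = floor((in + 2·padding − kernel) / stride) + 1. Table 3 lists only kernel sizes, so the stride and padding values below are assumptions chosen to reproduce the listed shapes; `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# Stem convolutions: kernel 7 with assumed stride 2 and padding 3 halves the size.
print(conv_out(2048, kernel=7, stride=2, padding=3))  # 1024 (Conv1D)
print(conv_out(32, kernel=7, stride=2, padding=3))    # 16   (Conv2D, per axis)

# Residual-block convolutions: kernel 3 with assumed stride 1 and padding 1 keeps the size.
print(conv_out(1024, kernel=3, stride=1, padding=1))  # 1024
print(conv_out(16, kernel=3, stride=1, padding=1))    # 16

# Kernel-1 layers with stride 1 and no padding also keep the size.
print(conv_out(1024, kernel=1))                       # 1024
```

Under these assumptions, only the stem convolution changes the spatial size; the residual blocks change the channel count alone, which matches the shapes listed in the table.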

Table 4. Comparison of different modules.

	Model	F1-Score	MCC	Sensitivity	Specificity	Accuracy	Precision
1	ResNet1D	0.85	0.8	0.87	0.83	0.84	0.86
2	ResNet2D	0.91	0.88	0.92	0.89	0.9	0.91
3	CombinedModel	0.93	0.9	0.94	0.91	0.92	0.93
4	CombinedModel + MultiHeadAttention_4Head	0.95	0.92	0.96	0.94	0.95	0.95
5	CombinedModel + MultiHeadAttention_8Head	0.98	0.96	0.98	0.97	0.97	0.97
6	CombinedModel + MultiHeadAttention_12Head	1.0	1.0	1.0	1.0	1.0	1.0

Table 5. Structure and parameter setting of comparison methods.

Models	Parameters
CNN	2*[Conv-Pool]-fc1-fc2-Classifier
LSTM	3*lstm-fc1-fc2-Classifier
DenseNet	3*dense-Classifier
CNN-Transformers	2*[Conv-Pool]-fc1-fc2-transforms-Classifier

Table 6. Numerical comparison results of different methods.

Method	Accuracy	Recall	Precision	F1-Score	MCC	Sensitivity	Specificity	Time (Seconds per 100 Iterations on GPU)
CNN	0.724	0.724	0.729	0.724	0.681	0.726	0.715	25 s
LSTM	0.576	0.576	0.576	0.576	0.559	0.576	0.561	45 s
DenseNet	0.679	0.679	0.679	0.679	0.661	0.681	0.673	40 s
CNN-Transformers	0.889	0.889	0.889	0.889	0.870	0.887	0.872	30 s
CRAMNet	1.0	1.0	1.0	1.0	1.0	1.0	1.0	35 s
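
Several of the metrics in Table 6 can be derived from a multi-class confusion matrix such as those in Figure 10. The sketch below is illustrative only: it assumes macro averaging over classes and the standard multi-class generalization of the Matthews correlation coefficient, neither of which is stated here, and the 3-class matrix is made-up example data, not a result from the paper. It does not attempt to reproduce the separate Sensitivity and Specificity columns.

```python
import math

def metrics(cm):
    """Compute summary metrics from cm[i][j] = count of true class i predicted as j."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    true_counts = [sum(cm[i]) for i in range(k)]
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]

    recall = [cm[i][i] / true_counts[i] if true_counts[i] else 0.0 for i in range(k)]
    precision = [cm[i][i] / pred_counts[i] if pred_counts[i] else 0.0 for i in range(k)]
    f1 = [2 * p * r / (p + r) if p + r else 0.0 for p, r in zip(precision, recall)]

    # Multi-class MCC computed from the totals of the confusion matrix.
    num = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    den = math.sqrt((total ** 2 - sum(p * p for p in pred_counts))
                    * (total ** 2 - sum(t * t for t in true_counts)))
    return {
        "accuracy": correct / total,
        "recall": sum(recall) / k,
        "precision": sum(precision) / k,
        "f1": sum(f1) / k,
        "mcc": num / den if den else 0.0,
    }

# Made-up 3-class example with 100 test samples per class.
example = [[90, 10, 0],
           [5, 85, 10],
           [0, 15, 85]]
for name, value in metrics(example).items():
    print(f"{name}: {value:.3f}")
```

With the same number of test samples in every class, as in Table 2, macro-averaged recall equals accuracy, which is consistent with the identical Accuracy and Recall columns in Table 6.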

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
