Hybrid CNN-BiLSTM-MHSA Model for Accurate Fault Diagnosis of Rotor Motor Bearings
Abstract
1. Introduction
- A CNN-BiLSTM-MHSA-based fault diagnosis model was developed for rotor motors, combining the strengths of CNN, BiLSTM, and MHSA to extract spatial, temporal, and attention-based features for accurate fault detection.
- Advanced signal processing techniques, including Fast Fourier Transform (FFT) and Butterworth band-stop filters, were applied to enhance the quality of vibration data, improving feature extraction and the robustness of the fault diagnosis model under varying operational conditions.
- A comprehensive diagnostic framework was proposed to classify bearing faults by severity. It achieved 99.33% accuracy on the test set, outperforming standalone CNN (93.33%) and LSTM (62.00%) models on the same classification task.
- The model’s performance was validated using experimental data from a rotor motor fault simulation test bench, demonstrating its effectiveness and reliability in real-world applications.
2. Theoretical Basis
2.1. CNN
- The convolutional layer applies a filter (or convolution kernel) to the input data, performing weighted summation over local regions to extract relevant features. This operation is defined by Equation (1):
- The activation function introduces nonlinearity into the network, enhancing its ability to model complex patterns. The Rectified Linear Unit (ReLU) is one of the most commonly used activation functions. It is defined as shown in Equation (2):
- The pooling layer reduces the spatial dimensions of the feature maps, which in turn lowers the computational cost. The common pooling operations are max pooling and average pooling, given in Equation (3):
- Located at the end of the network, the fully connected layer maps the features extracted from the convolutional and pooling layers to the final output classification. The formula of the fully connected layer is expressed as follows in Equation (4):
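Equations (1) to (4) referenced in the list above are not reproduced in this copy. As a sketch only, the standard textbook forms of the four operations are given below. The symbols are assumptions rather than the authors' notation: $x$ is the input sequence, $w^{(k)}$ and $b^{(k)}$ are the weights and bias of the $k$-th kernel of length $M$, $R_j$ is the $j$-th pooling window over activations $a_i$, and $\mathbf{W}$, $\mathbf{b}$ are the weights and bias of the fully connected layer.

```latex
\begin{align}
y_{i}^{(k)} &= \sum_{m=0}^{M-1} w_{m}^{(k)}\, x_{i+m} + b^{(k)} \tag{1}\\
\mathrm{ReLU}(x) &= \max(0,\, x) \tag{2}\\
p_{j}^{\max} &= \max_{i \in R_j} a_i, \qquad
p_{j}^{\mathrm{avg}} = \frac{1}{|R_j|} \sum_{i \in R_j} a_i \tag{3}\\
\mathbf{y} &= \mathbf{W}\mathbf{x} + \mathbf{b} \tag{4}
\end{align}
```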
2.2. BiLSTM
- Cell state: The core component of an LSTM network is the cell state, responsible for storing and transferring information, enabling the network to retain long-term dependencies and excel in processing sequential data.
- Forget gate: The forget gate determines which information to discard from the cell state. Using a sigmoid activation function, it evaluates each cell’s state value to decide how much to retain, allowing the network to flexibly adapt its memory. The forget gate operation is defined by Equation (5):
- Input gate: The input gate contains a sigmoid layer that selects values to update the cell state and a tanh layer that generates candidate values for addition. These functions are defined by Equations (6) and (7):
- Cell state update: The cell state is updated by multiplying the previous cell state by the forget gate output and then adding the candidate values scaled by the input gate output. This keeps useful long-term information while admitting new information. The update is defined by Equation (8):
- Hidden state: The hidden state is the output of the LSTM unit, containing information from the current time step, which is passed to the next time step to ensure coherence and consistency of the data.
- Output gate: The output gate determines the next hidden state, which is the actual output of the LSTM unit, so that the output reflects the key information of the current time step. The output gate and the resulting hidden state are defined by Equations (9) and (10):
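Equations (5) to (10) referenced in the list above are not reproduced in this copy. The standard LSTM formulation that matches the descriptions is sketched below. The notation is an assumption: $x_t$ is the input at time step $t$, $h_{t-1}$ the previous hidden state, $C_t$ the cell state, $\sigma$ the sigmoid function, $\odot$ element-wise multiplication, and $W_{\ast}$, $b_{\ast}$ the weights and biases of each gate.

```latex
\begin{align}
f_t &= \sigma\!\left(W_f\,[h_{t-1},\, x_t] + b_f\right) \tag{5}\\
i_t &= \sigma\!\left(W_i\,[h_{t-1},\, x_t] + b_i\right) \tag{6}\\
\tilde{C}_t &= \tanh\!\left(W_C\,[h_{t-1},\, x_t] + b_C\right) \tag{7}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{8}\\
o_t &= \sigma\!\left(W_o\,[h_{t-1},\, x_t] + b_o\right) \tag{9}\\
h_t &= o_t \odot \tanh\!\left(C_t\right) \tag{10}
\end{align}
```

A BiLSTM runs one such LSTM forward and one backward over the sequence and concatenates the two hidden states at each time step.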
2.3. MHSA
2.4. The Proposed Diagnostic Model
3. Experimental Verification
3.1. Experimental Design and Data Acquisition
3.1.1. Experimental Overview
3.1.2. Laboratory Equipment and Test Instruments
3.1.3. Bearing Failure Implantation
3.1.4. Test Procedure
3.2. The Preprocessing Process of Raw Collected Data
3.2.1. Band-Stop Filters
- The filter order is set to 4. A higher order gives a steeper roll-off at the band edges, so the filter attenuates the targeted band more effectively.
- The stop band spans 5000 Hz to 10,000 Hz, a range in which noise and interference are particularly strong and therefore need to be suppressed.
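The two settings above can be illustrated with SciPy. This is a minimal sketch, not the authors' implementation: the sampling rate `fs`, the zero-phase filtering via `sosfiltfilt`, and the synthetic test signal are all assumptions introduced for the example. Only the order (4) and the 5000 Hz to 10,000 Hz stop band come from the text.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandstop_filter(signal, fs, low_hz=5000.0, high_hz=10000.0, order=4):
    """Apply a Butterworth band-stop filter to a 1-D signal.

    Note: for band-stop designs SciPy returns a filter of order 2 * order,
    and sosfiltfilt runs it forward and backward (zero phase shift).
    """
    sos = butter(order, [low_hz, high_hz], btype="bandstop", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)


# Assumed sampling rate in Hz; it must exceed twice the upper band edge.
fs = 25600.0
t = np.arange(0, 1.0, 1.0 / fs)

# Synthetic signal: a 1 kHz component to keep plus a 7 kHz component to remove.
clean = np.sin(2 * np.pi * 1000.0 * t)
interference = 0.5 * np.sin(2 * np.pi * 7000.0 * t)
filtered = bandstop_filter(clean + interference, fs)
```

After filtering, `filtered` should follow `clean` closely away from the signal edges, because the 7 kHz component lies inside the stop band while the 1 kHz component lies well outside it.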
3.2.2. FFT
3.3. Model Design and Construction
4. Model Verification and Performance Analysis
4.1. Efficient Data Selection for Enhanced Model Training and Generalization
4.1.1. Advantages of Selective Data Extraction
4.1.2. Dataset Design
4.2. Training
4.2.1. Training Mechanism and Model Optimization
- The gradients held by the optimizer are reset to zero so that gradients from the previous batch do not accumulate;
- The model generates predictions for the input data, computes the loss by comparing predictions with actual labels, and performs backpropagation;
- The computed gradients are used by the optimizer to update the model weights, improving its performance.
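The three steps above can be sketched without any deep learning framework. The example below is illustrative only: the tiny softmax classifier stands in for the full network, plain gradient descent stands in for Adam, and the random data, class names, and function names are hypothetical. In a framework such as PyTorch the same steps usually correspond to `optimizer.zero_grad()`, `loss.backward()`, and `optimizer.step()`.

```python
import numpy as np


class SoftmaxClassifier:
    """Tiny linear classifier used as a stand-in for the real model."""

    def __init__(self, n_features, n_classes):
        self.W = np.zeros((n_features, n_classes))
        self.grad_W = np.zeros_like(self.W)

    def zero_grad(self):
        # Clear the gradient buffer so earlier batches do not accumulate.
        self.grad_W[:] = 0.0

    def forward(self, x):
        logits = x @ self.W
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        exp = np.exp(logits)
        return exp / exp.sum(axis=1, keepdims=True)

    def backward(self, x, probs, labels):
        # Gradient of the mean cross-entropy loss with respect to W.
        onehot = np.eye(self.W.shape[1])[labels]
        self.grad_W += x.T @ (probs - onehot) / len(labels)


def cross_entropy(probs, labels):
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))


def train_step(model, x, labels, lr):
    model.zero_grad()                     # step 1: reset gradients
    probs = model.forward(x)              # step 2: predict,
    loss = cross_entropy(probs, labels)   #         compute the loss,
    model.backward(x, probs, labels)      #         and backpropagate
    model.W -= lr * model.grad_W          # step 3: update the weights
    return loss


rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))              # one batch of 16 hypothetical samples
labels = rng.integers(0, 5, size=16)      # five classes, as in the paper
model = SoftmaxClassifier(n_features=8, n_classes=5)
losses = [train_step(model, x, labels, lr=0.1) for _ in range(20)]
```

With zero-initialised weights the first loss equals ln 5, the cross-entropy of a uniform guess over five classes, and it decreases over the following steps.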
4.2.2. Adaptive Learning Rate Adjustment
4.2.3. Key Optimization Strategies and Regularization
4.3. Comparative Analysis of Diagnosis Results and Algorithms
- Precision measures the proportion of samples predicted to be positive that are actually positive. It is calculated using Equation (25):
- Recall evaluates the proportion of true-positive samples correctly identified by the model. It is calculated using Equation (26):
- The F1 score is the harmonic mean of precision and recall, balancing the two metrics. It is computed using Equation (27):
- RMSE represents the square root of the average squared differences between predicted and true values. A lower RMSE indicates a better fit between the predicted and actual values. RMSE is calculated using Equation (28):
- MSE is the mean of the squared deviations between the predicted and true values, reflecting the degree of model error accumulation. MSE is calculated using Equation (29):
- MAE is the mean of the absolute differences between predicted and true values. A smaller MAE signifies a smaller deviation between predicted and true values. MAE is calculated using Equation (30):
- MAPE is the mean of the absolute differences between predicted and true values, each expressed as a percentage of the true value. MAPE is calculated using Equation (31):
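The metric definitions above can be written out in a few lines of Python. This is a sketch under stated assumptions: TP, FP, and FN denote true positives, false positives, and false negatives, class labels are assumed to be non-zero integers so that MAPE is defined, and the function names are hypothetical. The counts in the example are not reported in the paper; they are chosen only because they reproduce the percentages listed for the two inner-ring classes.

```python
import math


def precision(tp, fp):
    """Share of predicted positives that are truly positive: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that were found: TP / (TP + FN)."""
    return tp / (tp + fn)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def regression_errors(y_true, y_pred):
    """Return RMSE, MSE, MAE and MAPE (in percent) for two equal-length lists."""
    n = len(y_true)
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    return {"rmse": math.sqrt(mse), "mse": mse, "mae": mae, "mape": mape}


# Hypothetical counts: one minor inner-ring sample predicted as severe inner ring.
minor_inner_recall = recall(tp=23, fn=1)
severe_inner_precision = precision(tp=33, fp=1)
minor_inner_f1 = f1_score(1.0, minor_inner_recall)
severe_inner_f1 = f1_score(severe_inner_precision, 1.0)

# Hypothetical labels 1..5 with a single misclassification.
errs = regression_errors([1, 2, 3, 4, 5], [1, 2, 3, 4, 4])
```

Under these assumed counts the recall and precision come out at about 95.83% and 97.06%, and the F1 scores at about 97.87% and 98.51%.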
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
UAV | Unmanned Aerial Vehicle |
CNN | Convolutional Neural Network |
LSTM | Long Short-Term Memory |
BiLSTM | Bidirectional Long Short-Term Memory |
MHSA | Multi-Head Self-Attention |
ARAE | Attention Recurrent Autoencoder |
WD | Wavelet Denoising |
ESC | Electronic Speed Controller |
SVM | Support Vector Machine |
DL | Deep Learning |
AI | Artificial Intelligence |
ECNN | Efficient Convolutional Neural Network |
EH-SBW | Electro-Hydraulic Steer-By-Wire |
1DCNN | One-Dimensional Convolutional Neural Network |
LTFM | Lightweight Time-Focused Model |
WDRU | Weighted Diminish Recurrent Unit |
FAN | Feature Aggregation Network |
FTSVD | Flexible Tensor Singular Value Decomposition |
TDR | Trajectory Dimension Ratio |
TRPCA | Tensor Robust Principal Component Analysis |
GE-LRTLM | Graph-Embedded Low-Rank Tensor Learning Machine |
RNN | Recurrent Neural Network |
NLP | Natural Language Processing |
FFT | Fast Fourier Transform |
ReLU | Rectified Linear Unit |
DFT | Discrete Fourier Transform |
CPU | Central Processing Unit |
GPU | Graphics Processing Unit |
FC | Fully Connected |
t-SNE | t-distributed Stochastic Neighbor Embedding |
Prec | Precision |
Rec | Recall |
TP | True Positive |
FP | False Positive |
FN | False Negative |
RMSE | Root Mean Square Error |
MAE | Mean Absolute Error |
MSE | Mean Square Error |
MAPE | Mean Absolute Percentage Error |

Equipment | Model Number | Parameters | Brand |
---|---|---|---|
Acceleration sensors | BH5011 | Sensitivity: 10 mV/g; amplitude range: ±500 g; frequency range: 0–13 kHz | BH |
Data acquisition processor | AC5000 | Resolution: 16-bit; input channels: 32; maximum sampling rate: 102.4 kSPS | BH |
Deep-groove ball bearings | 6907 | Rolling element diameter: 5.2 mm; pitch diameter: 45 mm; inside diameter: 35 mm; outside diameter: 55 mm; width: 10 mm; number of rolling elements: 13 | NSK |

Fault Type | Minor Fault (Width mm × Depth mm) | Severe Fault (Width mm × Depth mm) |
---|---|---|
Outer ring failure | 0.5 × 0.5 | 0.5 × 1 |
Inner ring failure | 0.5 × 0.5 | 0.5 × 1 |

Layer Name | Input Size | Kernel Size | Stride | Number of Output Channels | Output Size |
---|---|---|---|---|---|
Conv1 | (16, 1, 10000, 1) | (3, 1) | (1, 1) | 64 | (16, 64, 9998, 1) |
Pool1 | (16, 64, 9998, 1) | (2, 1) | (2, 1) | — | (16, 64, 4999, 1) |
Conv2 | (16, 64, 4999, 1) | (3, 1) | (1, 1) | 128 | (16, 128, 4997, 1) |
Pool2 | (16, 128, 4997, 1) | (2, 1) | (2, 1) | — | (16, 128, 2498, 1) |
Conv3 | (16, 128, 2498, 1) | (3, 1) | (1, 1) | 256 | (16, 256, 2496, 1) |
Pool3 | (16, 256, 2496, 1) | (2, 1) | (2, 1) | — | (16, 256, 1248, 1) |
Flatten | (16, 256, 1248, 1) | — | — | — | (16, 319488) |
Dropout | (16, 319488) | — | — | — | (16, 319488) |
BiLSTM | (16, 1248, 256) | — | — | 256 | (16, 1248, 256) |
MHSA | (16, 1248, 256) | — | — | 256 | (16, 1248, 256) |
FC | (16, 256) | — | — | 5 | (16, 5) |
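The sequence lengths in the layer table can be checked with the usual output-size formulas for convolution and pooling. The sketch below assumes no padding, which is inferred from the table (10,000 samples become 9998 after a kernel of size 3) rather than stated in the text. The helper names are hypothetical.

```python
def conv_out(length, kernel=3, stride=1, padding=0):
    """Output length of a 1-D convolution along the time axis."""
    return (length + 2 * padding - kernel) // stride + 1


def pool_out(length, kernel=2, stride=2):
    """Output length of a 1-D pooling layer along the time axis."""
    return (length - kernel) // stride + 1


length = 10000                       # input samples per segment
channels_per_block = [64, 128, 256]  # output channels of Conv1..Conv3
trace = []
for channels in channels_per_block:
    length = conv_out(length)
    trace.append(("conv", channels, length))
    length = pool_out(length)
    trace.append(("pool", channels, length))

# Flattening the last feature map gives channels * remaining time steps.
flattened = channels_per_block[-1] * length
```

The trace reproduces the lengths 9998, 4999, 4997, 2498, 2496, and 1248 from the table, and the flattened size is 256 × 1248 = 319,488.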

Parameter Name | Parameter Value | Description |
---|---|---|
Learning Rate | 0.0005 | Controls the magnitude of each update to the model parameters. |
Batch Size | 16 | The number of samples used in each training iteration. |
Number of Epochs | 50 | Maximum number of training epochs (training may end earlier because of early stopping). |
Patience | 5 | Early stopping parameter: training stops if the loss does not improve for 5 consecutive epochs. |
Loss Function | Cross-entropy Loss | Loss function for multi-class classification problems. |
Optimizer | Adam | Optimizer with adaptive learning rates. |
Learning Rate Scheduler | Step decay applied to the optimizer; step size = 10; γ = 0.1 | The learning rate is multiplied by 0.1 every 10 epochs. |
Dropout | 50% | The dropout ratio applied in the model to reduce overfitting. |
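The scheduler and early stopping settings in the table can be made concrete with a short sketch. The function and class names are hypothetical, the loss values are invented for illustration, and which loss is monitored (training or validation) is an assumption, since the table only refers to epoch losses.

```python
def step_lr(epoch, base_lr=0.0005, step_size=10, gamma=0.1):
    """Learning rate at a given epoch under step decay."""
    return base_lr * gamma ** (epoch // step_size)


class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, loss):
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Invented loss curve: the best value is reached at epoch 2.
epoch_losses = [0.9, 0.5, 0.4, 0.41, 0.42, 0.40, 0.45, 0.43, 0.6]
stopper = EarlyStopping(patience=5)
stopped_at = None
for epoch, loss in enumerate(epoch_losses):
    if stopper.should_stop(loss):
        stopped_at = epoch
        break
```

With these values the learning rate is 0.0005 for epochs 0 to 9, 0.00005 for epochs 10 to 19, and so on. Training stops at epoch 7, five epochs after the last improvement.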

Indicator | Minor Failure of the Inner Ring | Severe Failure of the Inner Ring | Minor Failure of the Outer Ring | Severe Failure of the Outer Ring | Normal |
---|---|---|---|---|---|
Precision (%) | 100 | 97.0588 | 100 | 100 | 100 |
Recall (%) | 95.8333 | 100 | 100 | 100 | 100 |
F1 score (%) | 97.8723 | 98.5075 | 100 | 100 | 100 |

Model | RMSE | MSE | MAE | MAPE | Accuracy |
---|---|---|---|---|---|
CNN | 0.43 | 0.19 | 0.11 | 3.42% | 93.33% |
LSTM | 1.26 | 1.58 | 0.69 | 33.20% | 62.00% |
ARAE | 1.00 | 1.00 | 0.53 | 26.24% | 66.00% |
CNN-LSTM | 0.18 | 0.03 | 0.02 | 1.37% | 98.87% |
LTFM-Net | 0.18 | 0.03 | 0.03 | 4.17% | 96.67% |
WDCNN-LSTM | 0.20 | 0.04 | 0.04 | 4.76% | 96.00% |
CNN-BiLSTM-MHSA | 0.08 | 0.01 | 0.01 | 0.79% | 99.33% |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cite as
Yang, Z.; Li, W.; Yuan, F.; Zhi, H.; Guo, M.; Xin, B.; Gao, Z. Hybrid CNN-BiLSTM-MHSA Model for Accurate Fault Diagnosis of Rotor Motor Bearings. Mathematics 2025, 13, 334. https://doi.org/10.3390/math13030334