1. Introduction
As the foundation of global logistics, the shipping industry is responsible for the majority of international trade transportation. Ensuring the safe and efficient operation of ships is of paramount importance [1,2,3]. Ship engines, as the primary source of power for maritime vessels, are responsible for propelling ships forward. The performance and operational status of these engines directly impact the safety, economy, and reliability of ships [4,5,6]. In particular, in modernized, large-scale ocean-going vessels, the engine is subjected to high loads and a complex operational environment over extended periods. Any fault may result in the disruption of the ship’s operations or even lead to significant safety incidents. Among the numerous types of engine faults, misfire faults are particularly prevalent and hazardous [7,8].
A misfire fault occurs when a cylinder within an engine fails to ignite the fuel mixture at the designated point in the combustion cycle. Cylinder misfire results in inadequate power output and diminished fuel efficiency and may even precipitate a series of chain reactions, including increased engine vibration and elevated emissions [9]. If such faults are not identified and rectified in a timely manner, they may cause additional wear or damage to engine components and potentially lead to more severe safety incidents [10]. For instance, a power system fault resulting from an engine misfire during an ocean voyage may cause the loss of propulsion and control, placing the vessel in a perilous situation. Furthermore, a misfire fault increases the operating cost of a ship, as it not only elevates fuel consumption but may also necessitate a broader range of equipment maintenance and repair.
In light of these considerations, the diagnosis of malfunctions in marine propulsion systems, with particular emphasis on the early identification and characterization of misfire faults, assumes paramount importance. Early detection and treatment of misfire faults can prevent minor issues from developing into significant accidents, assist ship operators in reducing operational risks, minimize unnecessary economic losses, and ensure the safety of the ship and its personnel [11,12]. Fault diagnosis technology extracts fault characteristics and determines the type and location of a fault through the analysis of equipment operation data. It has been widely applied to wind turbines, aviation engines, railroad locomotives, and other industrial equipment, with the fault diagnosis of ship engines representing a significant area of interest within this field. With the advancement of marine engine technology, particularly the increased complexity of electronic control and fuel injection systems, traditional diagnostic methods have proven inadequate for the increasingly complex fault modes and signal characteristics [13].
In practice, traditional methods for diagnosing marine engine misfire faults typically rely on the experience and intuition of the operator or on simple alarms from the engine control system. These methods are clearly inadequate. First, manual diagnosis depends on the expertise of the operator, which may prove challenging in complex navigational conditions, potentially leading to delays in identifying the root cause of the fault. Second, the alarm system of the engine control system is typically only capable of detecting significant faults that have already occurred; it lacks the sensitivity and early warning capability needed to detect incipient faults, which makes it challenging to identify potential issues in a timely manner. Furthermore, traditional methods often prove inadequate for diagnosing elusive or subtle misfire faults. The growing complexity and nonlinear characteristics of ship engine signals have rendered traditional signal processing methods, which rely on rules and statistical analysis, insufficient for modern ships that require efficient and accurate fault diagnosis [14,15,16]. For example, Han [17] proposed the AGap slope as a novel approach to misfire detection. By comparing the inter-cylinder slope difference between the teeth of the same cylinder in two adjacent cycles, the AGap slope can effectively eliminate the inter-cylinder slope error. Experimental results demonstrate that the method exhibits an average misfire detection rate of 90.2% across a range of test conditions, reaching 93% to 98% within the 1500 to 4000 rpm range; however, the detection rate is reduced when the engine load is close to neutral or the speed exceeds 4000 rpm. Wang et al. [18] proposed a diagnostic strategy with an adaptive threshold algorithm. The algorithm is based on an angular domain identification method that determines the misfire-sensitive region in real time through relative scatter analysis; the time unit is then computed based on this analysis. The time unit change value of each operational cylinder is used to compute a weighted average, thereby constructing a misfire feature signal as an analytical object. Real-vehicle validation demonstrates that the novel strategy can adjust the diagnostic threshold in real time, enhancing the real-time diagnosis of a misfire (89% improvement) and increasing the feature signal amplitude by over 25% following cylinder filtering in the continuous misfire mode. Moreover, the method can detect various misfire types across the full spectrum of operating conditions, obviating the need to establish discrete thresholds for different vehicle driving states and operational scenarios; this reduces the calibration workload and the impact of vehicle dispersion. Sharib et al. [19] proposed the use of RedLeo Pro V8 software to simulate input data for monitoring and controlling the engine system. This method effectively distinguishes between normal and abnormal signals, with the signals designed through an adaptive system to reduce noise and improve diagnostic accuracy. The final results demonstrate the efficacy of the proposed method in feature extraction and selection, rendering it an effective approach for engine troubleshooting. Syta et al. [20] analyzed the vibration signals of the Rotax 912 ULS aircraft engine to detect misfires in individual cylinders. A linear metric was developed to describe the vibration level based on power amplitude spectral values at two selected frequencies, and a nonlinear metric was calculated from the periodicity of engine operation. The results demonstrate that both metrics are effective in detecting misfires in diverse cylinder configurations and that their combination enables the identification of faulty cylinders. Jafari et al. [21] employed an acoustic emission sensor to detect misfires in a multi-cylinder diesel engine. The angular periodic modulation (cyclic bursts) in the signal power was highlighted by squared envelope spectral processing of the acoustic emission signal, demonstrating the effectiveness of combining sensor technology with signal processing for misfire detection in a six-cylinder diesel engine. Kang et al. [22] proposed an efficient method for detecting and monitoring engine misfires that focuses on small speed changes of the crankshaft, simulating five engine states (one normal ignition and four misfires) in the experiment. The results show that the 6f component is the largest under normal conditions, but with the occurrence of a misfire, the f component increases gradually. 3D FFT modes with the ratios f/2f and 3f/6f show a greater separation between the misfire state and the normal state. However, all of these methods have certain limitations.
In order to overcome the limitations of traditional signal processing methods, machine learning techniques have gradually been introduced into the field of ship engine fault diagnosis [23]. In contrast to conventional methodologies, machine learning enables the automatic discovery of features through a data-driven approach, eliminating the need for manually designed feature extraction and significantly enhancing the automation and precision of fault diagnosis [24,25]. For instance, Syta et al. [26] put forth a methodology for the detection and identification of misfires in aviation internal combustion engines through the analysis of vibration time series. This approach employs a machine learning classification model to discern the operational states of the engine, and the findings indicated that the use of nonlinear metrics achieved high classification accuracy even with a reduced number of samples. Singh [27] put forth a novel approach to identifying misfires through the assessment of radiated sound quality metrics in the vicinity of the cylinder block or exhaust pipe. This method was rigorously tested on a four-stroke, four-cylinder engine across a range of load and speed conditions. Sound quality metrics, including noise, roughness, and fluctuation intensity, were predicted by a support vector machine classifier with an accuracy of 94%; in comparison, conventional methods based on vibration signals and sound pressure levels exhibited prediction accuracies of 82% and 85%, respectively. This suggests that misfire detection based on sound quality is more accurate and independent of engine speed and torque. Unlike conventional methods, the new method does not require direct contact with engine components, is computationally rapid, has a broad range of applicability, and can readily be implemented under the hood or near the exhaust pipe via acoustic sensors. Mulay et al. [28] employed piezoelectric accelerometers to obtain cylinder vibration signals for detecting misfires and analyzing the specific vibration modes that occur at the time of misfire. Twelve statistical features were extracted; useful features were filtered by the J48 decision tree algorithm and classified using regression classification and IBk classification. The performance of the classifiers was then compared, and an effective misfire detection algorithm was proposed by integrating the classifiers through voting. While the aforementioned machine learning techniques exhibit commendable classification capabilities in certain marine engine fault diagnosis scenarios, they are susceptible to excessive computational complexity when confronted with voluminous data or intricate signals.
As data size and model complexity have increased, artificial neural networks (ANNs) have emerged as a prominent area of research in the field of fault diagnosis. ANNs emulate the workings of neurons in the human brain, forming multilayer networks that can automatically extract high-dimensional features from data and perform complex nonlinear mapping. Compared to traditional machine learning, ANNs offer enhanced flexibility in fault diagnosis and the ability to handle more complex signal features. For example, Liu et al. [29] proposed a novel misfire detection model for turbocharged diesel engines using ANNs. The model was implemented in the MATLAB/Neural Network Toolbox environment and experimentally investigated on a V6 turbocharged diesel engine. Preliminary results demonstrated that the model successfully detected misfires in the majority of cases, although some misdetections were observed and the mean-square error was relatively high. However, by incorporating the within-cycle engine speed variations into the training data, the model ultimately achieved fully accurate detection, thereby providing a new method for accurately detecting misfires in turbocharged diesel engines. Jafarian et al. [30] investigated misfire faults in internal combustion engines, with a particular focus on the analysis of signal variations captured using different sensors. The engine faults were subjected to experimental analysis; a Fast Fourier Transform (FFT) was employed for signal transformation and feature extraction, and ANNs were used in the fault classification stage. By measuring the performance metrics of the ANNs and comparing them with the results of similar studies in the related literature, the results demonstrate the efficacy of incorporating vibration signals into the analysis of internal combustion engine faults. However, traditional shallow neural networks are susceptible to converging on local optima due to their shallow structure, exhibiting suboptimal training efficiency and poor generalization when confronted with voluminous and complex datasets.
To address these challenges, deep learning methodologies have been extensively employed, particularly with the advent of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), along with their enhanced long short-term memory (LSTM) and bidirectional long short-term memory (BiLSTM) variants. These developments have considerably elevated the sophistication of diagnostic techniques for ship engine faults [31,32]. For instance, Zhang et al. [33] investigated a misfire detection methodology based on CNNs, utilizing experimental data from a six-cylinder inline diesel engine for network training and testing to identify misfire patterns in one and two cylinders. The results demonstrate that the CNN can accurately detect complete misfires in one or two cylinders under steady-state conditions, with a detection accuracy exceeding 96% in the case of partial misfires in one cylinder when fuel injection is reduced to half the normal amount. Furthermore, under non-steady-state conditions, such as acceleration or deceleration, the CNN performs satisfactorily within a limited acceleration range; however, its efficacy declines when the absolute acceleration of the engine speed surpasses 100 r/min/s. Venkatesh et al. [34] put forth a methodology for identifying misfires in internal combustion engines through the application of transfer learning techniques. Vibration signals are first gathered from the upper portion of the engine and then presented as input to the deep learning algorithm. Pre-trained networks (e.g., AlexNet, VGG-16, GoogLeNet, and ResNet-50) are employed to identify misfire states, and the effects of hyper-parameters (e.g., batch size, solver, learning rate, and training-to-test ratio) are investigated. Xu et al. [35] proposed a domain-adversarial wide-kernel convolutional neural network (DAWDCNN) for diesel engine misfire fault diagnosis, with the aim of addressing the impact of diesel engine noise variations and stochasticity on the performance of existing diagnostic methods. The DAWDCNN demonstrates superior generalization performance in 11 noisy domain adaptation tasks relative to the conventional staged domain-adversarial training approach, and its mean accuracy on the four datasets surpasses that of random forests, long short-term memory networks, and other comparable techniques. Wang et al. [36] proposed a novel approach based on long short-term memory recurrent neural networks (LSTM RNNs) for the detection of diesel engine misfires. The findings indicate that the LSTM RNN-based algorithm can overcome the inherent limitations of traditional methods. The network, which takes a fixed segment of raw rotational speed signals as input and uses misfire or no-fault labels as outputs, demonstrated notable accuracy in diagnosing misfires.
However, convolutional neural networks (CNNs) are designed for the extraction of static image features, particularly those related to spatial dimensions, and are less effective when processing time-series information because of their limited capacity for modeling temporal dependencies [37,38]. Recurrent neural networks (RNNs) and their enhanced variants, such as long short-term memory (LSTM) and bidirectional LSTM (BiLSTM), are optimized for time-series data analysis: they excel at capturing temporal dependencies but are less adept at extracting spatial features from images [39,40].
To address the deficiencies of CNNs in capturing time-series dependencies and the limitations of RNNs in extracting spatial features from images, a novel intelligent diagnostic model for marine dual-fuel engine misfire, combining ResNet18 with BiLSTM, is proposed, aiming to improve the accuracy and real-time performance of fault diagnosis. In contrast to traditional fault diagnosis techniques, this approach employs the continuous wavelet transform (CWT) to transform the one-dimensional instantaneous rotational speed signal into a two-dimensional time-frequency image, thereby preserving the time-frequency characteristics of the signal. The two-dimensional image data are fed into the network to extract high-dimensional feature representations through deep convolutional layers. These are then passed to a bidirectional long short-term memory (BiLSTM) network for temporal processing, which captures the dynamically changing characteristics of the signal. This method not only extracts the deep features of fault signals from the images but also processes the time-dependent information in the signals through the BiLSTM network, thereby achieving more accurate fault identification.
The principal contributions of this study can be summarized in three aspects.
(1) An intelligent diagnostic framework combining continuous wavelet transform (CWT) and deep learning models is proposed. This framework utilizes the continuous wavelet transform (CWT) to convert the instantaneous rotational speed signal of a ship’s engine from a one-dimensional time series to a two-dimensional time-frequency image. Additionally, it captures the time-frequency features of the misfire fault signal through multiscale decomposition. This framework effectively addresses the limitations of traditional signal processing methods in capturing non-smooth signals, providing a more comprehensive input for subsequent deep learning models.
(2) An intelligent diagnostic model (ResNet–BiLSTM) is constructed by fusing ResNet18 and BiLSTM. The ResNet18 model serves as a feature extractor, enabling comprehensive mining of local spatial features in the time-frequency image. The BiLSTM network, on the other hand, is capable of capturing temporal dependencies in the signal. The fusion model enables the dual learning of time-frequency features and timing information, thereby markedly enhancing the detection capability for misfire faults.
(3) A series of comparative experiments were conducted to evaluate the performance of the proposed ResNet–BiLSTM model in comparison with fusion models (AlexNet–BiLSTM, VGG11–BiLSTM) and existing methods (AlexNet–LSTM, VGG–LSTM). The results demonstrated that the ResNet–BiLSTM model exhibited superior comprehensive performance, outperforming the other models.
The remaining sections are organized as follows: Section 2 introduces the fundamental principles of the relevant theories. Section 3 describes the implementation process of the intelligent fault diagnosis method based on the improved ResNet–BiLSTM fusion model. Section 4 provides a comprehensive account of the data collection process and the construction of the dataset. Section 5 presents a comparative analysis of the different models and their respective outcomes. Section 6 offers a summary of the conclusions and suggests directions for future research.
2. Basic Theory
2.1. Continuous Wavelet Transform
The continuous wavelet transform (CWT) is a powerful tool for analyzing signals at multiple scales. It provides a joint representation of a signal in time and frequency by convolving the signal with a set of wavelet functions at different scales and positions [41]. This method is particularly effective in analyzing non-stationary signals and transient phenomena and is applicable to a variety of engineering and scientific fields, including the fault diagnosis of instantaneous rotational speed signals from ship dual-fuel engines.
The core of the CWT lies in the selection of wavelet functions. In contrast to the Fourier transform, the wavelet transform employs basis functions that are localized, finitely supported waveforms (wavelets) that can be adjusted in both time and frequency. The CWT can be expressed as follows [42]:

$$ W_x(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) \mathrm{d}t \tag{1} $$

In Equation (1), $x(t)$ represents the original input signal, $\psi(t)$ is the wavelet basis function (mother wavelet), $\psi^{*}(\cdot)$ denotes the complex conjugate of the wavelet function, $a$ is the scale factor (controlling the width of the wavelet), $b$ is the time translation factor (controlling the position of the mother wavelet), and $1/\sqrt{|a|}$ is a normalization factor ensuring that the transformation maintains the same energy across different scales.
In this study, the Morse wavelet is employed as the mother wavelet function, and the Fourier transform of the generalized Morse wavelet is:

$$ \Psi_{\beta,\gamma}(\omega) = U(\omega)\, a_{\beta,\gamma}\, \omega^{\beta} e^{-\omega^{\gamma}} \tag{2} $$

In Equation (2), $U(\omega)$ represents the unit step function, $a_{\beta,\gamma}$ is a normalizing constant, $\omega$ is the frequency parameter that controls the frequency of the wavelet function, $\beta$ is viewed as a decay or compactness parameter, and $\gamma$ characterizes the symmetry of the Morse wavelet.
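As an illustration, the transform above can be approximated numerically by correlating the signal with scaled, conjugated wavelets. The sketch below is a minimal NumPy implementation; it substitutes the complex Morlet wavelet for the Morse wavelet used in this study (the generalized Morse wavelet is less commonly available outside MATLAB), and all function names are illustrative.

```python
import numpy as np

def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet (admissibility correction omitted for w0 >= 5)."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t - t ** 2 / 2)

def cwt(x, scales, dt=1.0):
    """Naive CWT: correlate the signal with the scaled, conjugated wavelet at each scale."""
    n = len(x)
    t = (np.arange(n) - n // 2) * dt
    out = np.empty((len(scales), n), dtype=complex)
    for i, a in enumerate(scales):
        psi = morlet(t / a) / np.sqrt(a)                          # scaled, normalized wavelet
        out[i] = np.convolve(x, np.conj(psi)[::-1], mode="same")  # correlation via convolution
    return out * dt

# Example: a 0.05 Hz tone standing in for a rotational speed component (dt = 1 s)
x = np.sin(2 * np.pi * 0.05 * np.arange(512))
scales = np.arange(1, 41, dtype=float)
scalogram = np.abs(cwt(x, scales))   # 2D time-frequency image, shape (40, 512)
```

The magnitude array `scalogram` is what would be rendered as the two-dimensional time-frequency image fed to the network; for the Morlet wavelet, the dominant scale for a tone of frequency $f$ falls near $w_0 / (2\pi f)$.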
2.2. Structure of the ResNet Network Model
The ResNet (residual network) is a deep convolutional neural network (CNN) architecture initially proposed by Kaiming He and colleagues [43]. The fundamental innovation of ResNet is the introduction of residual learning, which markedly enhances the training efficacy and performance of deep networks.
In the context of intelligent fault diagnosis for engine misfires, ResNet demonstrates the capacity to process complex time-series data and high-dimensional feature data. The deep convolutional operations effectively extract key features from the data, thereby improving diagnostic accuracy. Moreover, engine fault samples are frequently scarce in comparison to normal samples, and ResNet’s residual learning mechanism is particularly adept at addressing class imbalance issues, enabling the model to effectively learn features from the limited fault samples. The residual block represents the fundamental unit of ResNet. It comprises two principal components: one or more convolutional layers and a shortcut connection.
The fundamental configuration of a residual block is illustrated in Figure 1. As shown in the diagram, the sole distinction between the two types of residual blocks is the manner in which the shortcut connection is implemented: in one case, the shortcut passes through a convolutional layer to adjust the number of channels (the dashed line on the right side of Figure 1b), whereas in the other, it is connected directly without adjusting the number of channels (the solid line on the right side of Figure 1a).
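As a concrete sketch, the two shortcut variants can be written in PyTorch roughly as follows. This is a minimal rendition of a ResNet18-style basic block; the layer sizes in the example are illustrative, not the exact configuration used in this study.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Residual block: two 3x3 convolutions plus a shortcut connection.

    The shortcut is an identity when input and output shapes match
    (solid-line variant), or a 1x1 convolution that adjusts channels and
    resolution when they differ (dashed-line variant).
    """
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        if stride != 1 or in_ch != out_ch:
            # projection shortcut: 1x1 conv adjusts the number of channels
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))  # residual addition
```

Stacking four stages of such blocks (two blocks per stage) after an initial 7x7 convolution yields the ResNet18 backbone.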
2.3. Structure of the BiLSTM Network Model
The LSTM (long short-term memory) network was developed to address the “vanishing gradient” and “exploding gradient” problems commonly encountered in traditional recurrent neural networks (RNNs). This is achieved through the introduction of specialized memory cells and three gate structures: the forget gate, the input gate, and the output gate, which regulate the flow of long- and short-term information. The LSTM effectively selects which time-step information to retain or discard, thereby overcoming the limitations of traditional RNNs in capturing long-term dependencies [44]. The overall framework of the LSTM model is illustrated in Figure 2.
The operation of the LSTM can be described as a process of filtering information within the cell state. The network discards superfluous, outdated information and incorporates new information based on the current input and the hidden state from the preceding time step, enabling it to retain pertinent data for subsequent time steps. First, the forget gate determines which components of the cell state should be discarded based on the preceding hidden state and the current input. Next, the input gate determines which new information will be incorporated into the cell state. Finally, the output gate regulates which data from the current cell state will be used for the final output and updates the hidden state. The coordinated operation of these gates enables the LSTM to efficiently retain, update, and output information at each time step, ensuring that its hidden state reflects long-term dependencies.
The detailed computation process of the LSTM is illustrated in Figure 3, where (a), (b), (c), and (d), respectively, show the computation processes for the forget gate $f_t$, the input gate $i_t$, the current cell state $C_t$, and the output gate $o_t$.
The computation formula for the forget gate is given by

$$ f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{3} $$

The input gate is calculated as

$$ i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{4} $$

The formula for the cell state at the current moment is

$$ C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{5} $$

The output gate is given by

$$ o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(C_t) \tag{6} $$

In Equations (3)–(6), $\sigma$ represents the sigmoid activation function, while $W_f$, $W_i$, and $W_o$ correspond to the weight matrices for the forget gate, input gate, and output gate, respectively. The bias terms $b_f$, $b_i$, and $b_o$ correspond to the forget gate, input gate, and output gate, respectively; $\tilde{C}_t$ is the candidate cell state, with weight matrix $W_C$ and bias $b_C$. The symbol $[h_{t-1}, x_t]$ denotes the concatenation of two vectors into a longer vector, while the symbol $\odot$ represents element-wise multiplication. Through these computational processes, the output and state updates of each layer of the LSTM can be obtained.
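A single LSTM time step following the gate computations above can be sketched in NumPy as follows. This is a toy illustration with randomly initialized weights; the dictionary-of-matrices layout is a simplification of how deep learning frameworks actually pack the parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step: forget, input, and output gates plus cell-state update."""
    z = np.concatenate([h_prev, x_t])       # concatenated vector [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])        # forget gate
    i = sigmoid(W["i"] @ z + b["i"])        # input gate
    c_hat = np.tanh(W["c"] @ z + b["c"])    # candidate cell state
    c = f * c_prev + i * c_hat              # new cell state (element-wise products)
    o = sigmoid(W["o"] @ z + b["o"])        # output gate
    h = o * np.tanh(c)                      # new hidden state
    return h, c

# Toy dimensions: 3 input features, 4 hidden units
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.standard_normal((n_hid, n_hid + n_in)) * 0.1 for k in "fico"}
b = {k: np.zeros(n_hid) for k in "fico"}
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```

Because the output gate and the $\tanh$ nonlinearity are both bounded, every component of the hidden state $h_t$ stays strictly inside $(-1, 1)$.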
Bidirectional long short-term memory (BiLSTM) networks are a variant of the LSTM. A limitation of the traditional LSTM is that it processes a time series in only one direction, from past to future, and therefore cannot exploit future context. In contrast, a BiLSTM network incorporates an additional backward LSTM layer, enabling the simultaneous extraction of information from both the past and future directions of the time series. This dual-direction capability makes the BiLSTM more effective at capturing long-term dependency information, and thus particularly suitable for tasks with strong temporal dependencies.
In the context of fault diagnosis for engine misfires in ships, the instantaneous rotational speed signal of the engine represents a typical time-series dataset that contains dynamically changing fault patterns. The bidirectional structure of BiLSTM enables the capture of changing trends in the signal both before and after, thereby enhancing sensitivity to fault features and improving diagnostic accuracy.
Figure 4 depicts the architecture of the bidirectional long short-term memory (BiLSTM) model. In contrast to the traditional unidirectional LSTM, the BiLSTM consists of two LSTM networks: one LSTM (the upper part) processes the input sequence in the forward direction (from time step $1$ to $T$), while the other LSTM (the lower part) processes the input sequence in the reverse direction (from time step $T$ to $1$).
As illustrated, the input vector at each time step (e.g., Vector 1, Vector 2, Vector 3) is fed simultaneously into both the forward and backward LSTMs. The forward LSTM generates the hidden states $\vec{h}_1$, $\vec{h}_2$, and $\vec{h}_3$, while the backward LSTM generates the hidden states $\overleftarrow{h}_1$, $\overleftarrow{h}_2$, and $\overleftarrow{h}_3$. At each time step, the outputs from the forward and backward LSTMs are concatenated (e.g., $[\vec{h}_1, \overleftarrow{h}_1]$, $[\vec{h}_2, \overleftarrow{h}_2]$, $[\vec{h}_3, \overleftarrow{h}_3]$) to form a complete output vector for that time step.
This structure enables BiLSTM to simultaneously utilize both past and future information in the sequence, thereby allowing it to capture complex temporal dependencies within time-series data. In the case of complex signals with time dependencies, such as the instantaneous rotational speed of a ship’s dual-fuel engine, the BiLSTM model is better able to learn the dynamic variation features in the signal, thereby improving the accuracy of fault diagnosis. The combination of forward and backward LSTMs allows the model to focus on the changes in fault signals prior to the current moment while also incorporating information from future time steps, thereby facilitating more accurate fault detection and identification.
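In PyTorch terms, this forward/backward concatenation is exactly what `bidirectional=True` produces: the per-step output feature size is twice the hidden size. The sizes below are arbitrary, chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# One bidirectional LSTM layer: 16 input features, 32 hidden units per direction
bilstm = nn.LSTM(input_size=16, hidden_size=32, bidirectional=True, batch_first=True)

x = torch.randn(8, 50, 16)       # (batch, time steps, features)
out, (h_n, c_n) = bilstm(x)
# out concatenates the forward and backward hidden states at every time step,
# so its last dimension is 2 * 32 = 64; h_n holds one final state per direction.
```

In a fused model, `out` (or its last time step) is what a downstream classifier head would consume.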
5. Results and Discussion
To validate the effectiveness of the proposed model, cylinder misfire experiments were conducted on the dual-fuel engine described in Section 4.1, and the samples constructed in Section 4.2.3 were used as model inputs for comparative experiments with a variety of models. To comprehensively assess the superiority of the combined model, several models were selected for comparative analysis, including both single models and fusion models. The single models consist of classic CNN architectures (LeNet-5, AlexNet, VGG11, ResNet18) as well as BiLSTM. The fusion models include AlexNet–BiLSTM and VGG11–BiLSTM, along with existing methods such as AlexNet–LSTM and VGG–LSTM.
The model parameters are set as follows: the batch size is 32, the number of epochs is 100, SGD is selected as the optimizer, cross-entropy is used as the loss function, and the learning rate is set to 0.001. The deep learning framework is PyTorch (version 2.4.0), and the programming language is Python. The hardware configuration is as follows: the central processing unit is a 12th-generation Intel Core i5-12400F, and the graphics processing unit is an NVIDIA GeForce RTX 3060 Ti with 8 GB of memory.
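For reference, the training configuration above corresponds to roughly the following PyTorch setup. The model here is a placeholder stand-in, not the ResNet–BiLSTM network itself, and the input image shape and class count are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Placeholder classifier standing in for the ResNet-BiLSTM fusion model
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 4))
criterion = nn.CrossEntropyLoss()                          # cross-entropy loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # SGD, learning rate 0.001

# One optimization step on a dummy batch of 32 time-frequency images
images = torch.randn(32, 3, 64, 64)
labels = torch.randint(0, 4, (32,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

Repeating this step over all batches for 100 epochs reproduces the training regime described above.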
In particular, SGD is selected as the optimizer for the model for several reasons.
(1) From the convergence stability perspective, SGD provides a more stable convergence path during model training, especially at smaller learning rates. In comparison to adaptive methods (e.g., Adam, RMSprop), SGD enables the model to gradually approach the local optimal solution without over-tuning, thus avoiding unnecessary oscillations on complex datasets.
(2) Regarding generalization ability, SGD is often considered to generalize better. Adaptive optimizers (e.g., Adam) dynamically adjust the learning rate, which may result in the model overfitting the training data. In contrast, SGD can more effectively control the generalization behavior of the model and perform more robustly on the validation and test sets.
(3) In terms of training efficiency, SGD combined with a smaller learning rate (e.g., 0.001) is suitable for long-term training (100 epochs) and can gradually approach the global optimum through continuous updating. Furthermore, for image classification tasks, SGD is typically able to effectively utilize the feature extraction capabilities of deep networks during training, thereby helping the network learn features at different levels more effectively.
(4) In terms of resource requirements, on the given hardware configuration (an NVIDIA GeForce RTX 3060 Ti graphics card and 8 GB of memory), SGD achieves good performance with constrained computational resources while minimizing computational overhead, making it a practical choice for training deep learning models. In contrast, while the Adam optimizer can facilitate convergence in certain instances, it typically requires more memory and computational resources, particularly when dealing with larger datasets. Consequently, in resource-constrained environments, SGD outperforms Adam in computational efficiency and memory consumption, making it well suited for long or large-scale training tasks.
In summary, SGD, with its stable convergence and excellent generalization ability, is more suitable for the long-term training requirements of this study.
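The contrast between the two update rules can be made concrete. The sketch below is not the paper's training code; it implements the vanilla SGD and Adam updates in plain Python on a toy one-parameter objective, using the learning rate adopted in this study (0.001):

```python
def sgd_step(w, grad, lr=0.001):
    # Vanilla SGD: fixed step along the negative gradient.
    return w - lr * grad

def adam_step(w, grad, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: per-parameter step scaled by bias-corrected running
    # estimates of the first and second gradient moments.
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (v_hat ** 0.5 + eps), (m, v, t)

# Toy objective f(w) = (w - 3)^2 with gradient 2(w - 3).
w_sgd, w_adam, state = 0.0, 0.0, (0.0, 0.0, 0)
for _ in range(5000):
    w_sgd = sgd_step(w_sgd, 2 * (w_sgd - 3))
    w_adam, state = adam_step(w_adam, 2 * (w_adam - 3), state)
```

On this smooth objective SGD contracts monotonically toward the optimum, whereas Adam takes near-constant steps early on and can overshoot slightly before settling, a small-scale analogue of the oscillation behavior discussed above.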
5.1. Fault Diagnosis Based on Classic CNN Model
In order to identify the most appropriate model for comparative analysis, this study employed several classic convolutional neural network (CNN) models, including LeNet-5, AlexNet, VGG11, and ResNet18, to conduct fault diagnosis experiments on the same dataset. At the outset, the training and validation samples were input into each network model for training. Following iterative optimization, the test samples were used to evaluate model performance. This approach allows an effective comparison of different CNN models on the fault diagnosis task, thereby facilitating selection of the optimal model. The training results of the four classic CNN models are presented in
Figure 9, which illustrates the training accuracy in
Figure 9a, training loss in
Figure 9b, validation accuracy in
Figure 9c, and validation loss in
Figure 9d. These graphs permit visual observation of each model's performance during training and validation, thereby enabling an assessment of their effectiveness in the fault diagnosis task.
A review of the performance of the four classic CNN models during the training process, as illustrated in the accompanying figure, reveals the following observations:
(1) Training accuracy: As the number of training epochs increases, the training accuracy of all models gradually improves. The ResNet18 model exhibits the highest accuracy, approaching 1.0, suggesting that it performs best on the training set. VGG11 and AlexNet also demonstrate robust performance with high accuracy, though slightly below that of ResNet18. LeNet-5 performs comparatively poorly, with a gradual increase in accuracy that does not reach a high level by the end of training.
(2) Similarly, as the number of epochs increases, the training loss of all models decreases, which aligns with the trend of rising accuracy. The decrease in loss for ResNet18 is the most rapid, with the loss value dropping and stabilizing at an early stage of the process, indicating a smooth optimization. VGG11 and AlexNet exhibit a slower decline in loss, with the final values slightly above that of ResNet18. LeNet-5’s loss value decreases rapidly in the initial stages but remains at a higher level by the end.
(3) With regard to the validation set accuracy, the ResNet18 model demonstrates the most optimal performance on the validation set, with its accuracy approaching 1.0 at an early stage, thereby exhibiting excellent generalization capabilities. Subsequently, VGG11 and AlexNet demonstrate a gradual increase in validation accuracy, stabilizing in the later epochs but still lower than ResNet18. In contrast, LeNet-5 exhibits relatively low validation accuracy, reaching a peak of approximately 0.8, which suggests limited generalization capability.
(4) From the validation set loss, ResNet18 exhibits the most stable validation loss, reaching a minimum at an early stage, which reflects its robust optimization performance. VGG11 and AlexNet display some fluctuations in the initial stages, but their losses stabilize as the training progresses. LeNet-5’s validation loss initially decreases at a gradual rate, followed by an increase, and remains relatively high, indicating the potential for underfitting on the validation set.
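The comparison protocol behind these observations (train every candidate on the same data, record per-epoch validation accuracy, then select the best model) can be sketched as follows. This is an illustrative skeleton, not the authors' code; the stand-in "models" here are plain callables, and `train_fn` is a hypothetical hook for one optimization pass:

```python
def evaluate(model, data):
    # Fraction of (input, label) pairs the model classifies correctly.
    correct = sum(1 for x, y in data if model(x) == y)
    return correct / len(data)

def compare_models(models, train_fn, val_data, epochs=100):
    # Train each candidate, track validation accuracy per epoch,
    # and return the name of the model with the best final accuracy.
    history = {}
    for name, model in models.items():
        accs = []
        for epoch in range(epochs):
            train_fn(model, epoch)          # one optimization pass
            accs.append(evaluate(model, val_data))
        history[name] = accs
    best = max(history, key=lambda n: history[n][-1])
    return best, history

# Toy demonstration with stand-in "models" (plain callables):
val_data = [(i, i % 2) for i in range(10)]
candidates = {"good": lambda x: x % 2, "weak": lambda x: 0}
best, history = compare_models(candidates, train_fn=lambda m, e: None,
                               val_data=val_data, epochs=3)
```

In the study itself the candidates would be the trained LeNet-5, AlexNet, VGG11, and ResNet18 networks rather than these toy callables.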
To validate the generalization capability of the aforementioned classic CNN models and eliminate the influence of randomness on the results, each model was subjected to 10 independent repeat experiments. The specific steps were as follows: while maintaining the model hyperparameters constant, 50% of each fault class was randomly selected as a new test set for the independent evaluation of the models. The mean, standard deviation, and measurement time for each model over 10 trials are presented in
Table 4, with the results illustrated in
Figure 10. The results demonstrate that ResNet18 not only exhibits the shortest measurement time and the smallest standard deviation in comparison to other classic CNN models, but it also displays superior accuracy performance.
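The repeat-evaluation procedure described above (10 independent draws of 50% of each fault class, then the mean and standard deviation of accuracy) might be implemented as in the sketch below; `samples_by_class` and the callable `model` are hypothetical placeholders for the study's dataset and trained network:

```python
import random
import statistics

def stratified_half(samples_by_class, rng):
    # Randomly keep 50% of each fault class so the test set stays balanced.
    subset = []
    for cls, samples in samples_by_class.items():
        k = len(samples) // 2
        subset.extend((s, cls) for s in rng.sample(samples, k))
    return subset

def repeat_evaluate(model, samples_by_class, runs=10, seed=0):
    # Independent repeat trials; report mean and standard deviation.
    rng = random.Random(seed)
    accs = []
    for _ in range(runs):
        subset = stratified_half(samples_by_class, rng)
        correct = sum(1 for x, y in subset if model(x) == y)
        accs.append(correct / len(subset))
    return statistics.mean(accs), statistics.stdev(accs)

# Toy check: a perfect classifier on two balanced classes.
data = {0: list(range(0, 20, 2)), 1: list(range(1, 20, 2))}
mean_acc, sd_acc = repeat_evaluate(lambda x: x % 2, data)
```

Fixing the seed makes the 10 draws reproducible while still varying the sample composition between trials.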
Moreover,
Figure 11 depicts the four most optimal training outcomes for the ResNet18 model.
Figure 11a–d illustrate the training accuracy, training loss, validation accuracy, and validation loss, respectively. The presented graphs offer a more detailed representation of the model’s performance, showcasing the consistency and efficacy of ResNet18 across various training processes.
As illustrated in
Figure 11, while ResNet18 demonstrates remarkable proficiency on the training set, it displays considerable variability on the validation set, particularly during the initial stages of training. This suggests that ResNet18 is unable to fully account for the temporal dependencies inherent in the rotational speed data. The model is unable to effectively process time-series information, which has a detrimental impact on its ability to generalize on the validation set.
5.2. Fault Diagnosis Based on BiLSTM Model
To assess the efficacy of models designed to process time-series data, fault diagnosis experiments were conducted utilizing BiLSTM models (single-layer, double-layer, and triple-layer) on the identical dataset. In this experiment, the single-layer BiLSTM model is represented by BiLSTM1, the double-layer BiLSTM model is represented by BiLSTM2, and the triple-layer BiLSTM model is represented by BiLSTM3. The number of hidden nodes for all three BiLSTM models was set to 128. The training results of these models are illustrated in
Figure 12, where plots (a), (b), (c), and (d) represent the training accuracy, training loss, validation accuracy, and validation loss, respectively. A summary of the diagnosis results is provided in
Table 5.
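A minimal PyTorch sketch of the double-layer BiLSTM classifier with 128 hidden nodes described above is given below. The input feature size, sequence length, and the assumption of 7 output classes (normal state plus six cylinder misfires) are illustrative; this is not the authors' exact implementation:

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    """Stacked BiLSTM over a feature sequence; the last time step's
    outputs (forward and backward directions) feed a linear classifier."""
    def __init__(self, input_size, num_classes, hidden=128, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden, num_layers=layers,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)  # forward + backward

    def forward(self, x):            # x: (batch, seq_len, input_size)
        out, _ = self.lstm(x)        # (batch, seq_len, 2 * hidden)
        return self.fc(out[:, -1])   # last time step -> class logits

# BiLSTM2 analogue: two layers, 128 hidden nodes per direction.
model = BiLSTMClassifier(input_size=64, num_classes=7, layers=2)
logits = model(torch.randn(4, 20, 64))
```

Changing `layers` to 1 or 3 yields the BiLSTM1 and BiLSTM3 variants compared in this experiment.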
As illustrated in
Figure 12 and
Table 5, the double-layer BiLSTM model outperforms the single-layer and triple-layer models, achieving the highest training accuracy (88.35%) and validation accuracy (79.89%). Its loss values are also markedly lower than those of the single-layer and triple-layer models, suggesting that it captures the data features more effectively during training, while its training time remains within a reasonable range. The double-layer structure allows deeper extraction of temporal information than the single-layer model, providing stronger modeling capability, while avoiding the potential overfitting observed in the triple-layer model; its moderate complexity strikes a balance between feature learning and generalization.
To assess the generalization capacity of the double-layer BiLSTM model, performance evaluations were conducted using the test set. The specific procedure was as follows: the trained model was subjected to 10 independent experiments using the test set. For each test, 50% of the samples from each category were randomly selected to form a new test set, ensuring category balance. The resulting test results are presented in
Figure 13 and
Table 6. In Figure 13, the bars represent accuracy and the line represents test time. The average accuracy across the ten independent experiments was 79.99%, with a standard error of 1.01, indicating that the model's performance remains relatively stable across different combinations of test samples.
The analysis of BiLSTM models with varying numbers of layers revealed that, while the double-layer BiLSTM model exhibited relatively superior performance, the training process still encounters convergence issues. This suggests that the model’s capacity to extract features remains constrained, particularly when confronted with intricate time-series data. While the double-layer BiLSTM structure offers improvements in capturing temporal dependencies, it may still encounter difficulties in optimizing when faced with more intricate signal patterns, potentially resulting in suboptimal performance.
5.3. Fault Diagnosis Based on Fusion Model
In light of the preceding analysis, this study selected the three classic CNN models that demonstrated robust performance (AlexNet, VGG11, and ResNet18) and combined each with a double-layer bidirectional long short-term memory (BiLSTM) unit to construct three fusion models. The CNN components extract powerful features from the input data, effectively capturing spatial characteristics, while BiLSTM networks are particularly adept at handling sequential data and capturing long-range dependencies. Combining the two therefore leverages the strengths of CNNs in feature extraction while using the BiLSTM to model dynamic changes in the time series. In the AlexNet–BiLSTM and VGG11–BiLSTM models, the feature extraction components use the AlexNet and VGG11 networks, respectively; the extracted features are adjusted using Permute and Reshape operations to ensure compatibility with the input format of the BiLSTM. The training results of the three fusion models are illustrated in
Figure 14, where graphs (a), (b), (c), and (d) represent training set accuracy, training set loss, validation set accuracy, and validation set loss, respectively. The diagnostic results are shown in
Table 7.
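The Permute/Reshape adaptation between the CNN feature extractor and the BiLSTM can be sketched as follows (PyTorch assumed). The two-layer convolutional backbone here is a toy stand-in for ResNet18/VGG11/AlexNet, and the class count of 7 is illustrative:

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """Fusion sketch: a CNN backbone produces a spatial feature map,
    which is permuted and reshaped into a sequence of feature vectors
    for a double-layer BiLSTM."""
    def __init__(self, num_classes=7, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(   # toy backbone, not ResNet18
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.bilstm = nn.LSTM(64, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                # x: (B, 3, H, W)
        f = self.backbone(x)             # (B, C, H', W')
        b, c, h, w = f.shape
        # Permute channels last, then flatten the spatial grid into a
        # sequence of H'*W' feature vectors, each of length C.
        seq = f.permute(0, 2, 3, 1).reshape(b, h * w, c)
        out, _ = self.bilstm(seq)
        return self.fc(out[:, -1])       # class logits

model = CNNBiLSTM()
logits = model(torch.randn(2, 3, 32, 32))
```

Swapping the backbone for a pretrained ResNet18 trunk (with the BiLSTM input size matched to its channel count) would give the ResNet–BiLSTM variant studied here.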
The figure above illustrates the performance of the three different fusion models during the training process, thereby revealing the following conclusions.
(1) Training set accuracy: As the number of epochs increases, the ResNet–BiLSTM model shows a rapid rise in training accuracy, reaching approximately 1.0 after about 40 epochs with a relatively stable curve. The VGG11–BiLSTM model also quickly attains an accuracy approaching 1.0, following a trajectory close to that of ResNet–BiLSTM, though its convergence is slightly slower. In contrast, the AlexNet–BiLSTM model improves more gradually; while it ultimately approaches 1.0, its growth rate lags the other two models, with notable delays during the mid-training phase.
(2) With the increase in epochs, the loss curves for ResNet–BiLSTM and VGG11–BiLSTM decrease rapidly, approaching zero after 50 epochs, indicating that these models experience a swift reduction in loss during training. In contrast, the loss curves for AlexNet–BiLSTM display a slower decline, with significantly higher values in the initial stages compared to the other two models. While it also approaches zero after 100 epochs, this reduction occurs at a slower pace.
(3) The validation accuracy curve for ResNet–BiLSTM exhibits the most optimal performance, displaying a rapid increase and stabilizing near 1.0 after 30 epochs, with a relatively smooth curve and minimal fluctuations. Similarly, the VGG11–BiLSTM model also exhibits a rapid increase in accuracy, approaching 1.0. However, it displays greater fluctuations between certain epochs, indicating slightly lower stability. In contrast, the performance of AlexNet–BiLSTM on the validation set is inferior, with a curve that fluctuates significantly and maintains a relatively low accuracy.
(4) Regarding the validation set loss, ResNet–BiLSTM demonstrates a rapid decrease, maintaining a low level throughout the latter stages of training with minimal fluctuations. VGG11–BiLSTM also exhibits a rapid decrease in validation loss, but it experiences considerable fluctuations between some epochs. AlexNet–BiLSTM presents the poorest performance in terms of validation loss, with relatively high values and substantial volatility.
The table above presents the diagnostic results of the three fusion models. As the data show, ResNet–BiLSTM performs best on both the training and validation sets, achieving the lowest training and validation losses (0.0016 and 0.0364, respectively) and the highest validation accuracy (99.08%). ResNet–BiLSTM does require a longer training period, owing to its deeper feature extraction network and the additional computational overhead of the BiLSTM when processing temporal features. In contrast, AlexNet–BiLSTM and VGG11–BiLSTM train faster but exhibit inferior accuracy and loss. This suggests that although ResNet–BiLSTM demands more computational resources and time, its notable performance gain makes the trade-off worthwhile for the high-accuracy requirements of practical applications.
In conclusion, the ResNet–BiLSTM model best integrates the residual architecture of ResNet18 with the temporal dependency modeling of BiLSTM, achieving the most comprehensive performance among the three models. Its training accuracy approaches 1.0 rapidly, and on the validation set it attains an accuracy of 99.08% and the lowest validation loss with minimal fluctuations, demonstrating robust generalization and stability. Its training time of 2.949 h indicates that the model maintains high precision at a reasonable computational cost, making it well suited to applications where both training time and model performance are of paramount importance; it is therefore the model selected for this study. VGG11–BiLSTM also performs well on the training set and achieves a validation accuracy of 97.64%, but its validation loss and accuracy fluctuate significantly during certain epochs, indicating slightly lower stability. Its marginally shorter training time does not translate into a performance advantage, and the larger fluctuations in its validation loss suggest weaker generalization and stability than ResNet–BiLSTM. AlexNet–BiLSTM ultimately reaches a high training accuracy but improves more slowly and performs comparatively poorly on the validation set, with a validation accuracy of only 95.78%; its higher validation loss and noticeable fluctuations indicate insufficient generalization. Despite the shortest training time, at just 1.521 h, its performance falls significantly short of the other two models.
To assess the model’s capacity for generalization, a new test set was constructed by randomly selecting 50% of the samples from each class, thus ensuring a fair and unbiased evaluation. Subsequently, the model’s performance was evaluated through 10 independent experiments, with the objective of ensuring the reliability of the results and eliminating the effects of randomness. The results of the experiment are presented in
Figure 15 and
Table 8. The mean test accuracy was 99.30%, with a standard deviation of 0.08, indicating that the model exhibits remarkable stability across diverse test sets. The average test time was 13.56 s, with a standard deviation of 0.07 s, thereby demonstrating consistency and efficiency in the model’s operational performance. These results indicate that the proposed model not only achieves high diagnostic accuracy but also exhibits excellent stability and consistency in terms of testing time and performance fluctuations, thereby further confirming the model’s generalization ability and reliability in practical applications.
5.4. Comparative Experimental Analysis of Different Models
To enhance the reliability of the proposed model’s performance assessment, this study selected the AlexNet–LSTM model referenced in [
45] and the VGG–LSTM model referenced in [
46] for diagnosing engine misfire faults. The aforementioned models were then compared with the proposed ResNet–BiLSTM model based on a number of criteria, including accuracy, loss values, training time, and the number of parameters. The training results are illustrated in
Figure 16, wherein figures (a), (b), (c), and (d) represent the training set accuracy, training set loss, validation set accuracy, and validation set loss, respectively. The specific diagnostic parameter results are presented in
Table 9 for the reader’s convenience. It is noteworthy that the diagnostic outcomes presented herein encompass comparative results for all models discussed in this study.
As illustrated in
Figure 16, the performance of three distinct models during the training phase is depicted. From the figure, the following observations can be made.
(1) Training accuracy: As the number of training epochs increases, the training accuracy of the ResNet–BiLSTM model remains consistently high throughout the training process, ultimately approaching 1.0, demonstrating effective fitting and robust learning on the training set. The rising trend is notably smooth, with minimal fluctuations, indicating rapid and stable convergence. While the other two models also reach a training accuracy of approximately 1.0, neither converges as rapidly as the proposed model.
(2) As the number of training epochs increases, the training loss of the ResNet–BiLSTM model rapidly decreases in the initial stages, eventually stabilizing at a value close to zero. This trend indicates the model’s effective fitting to the training data, reflecting a strong learning capability with a low loss value. The overall trend is characterized by a smooth trajectory with minimal fluctuations, which indicates excellent convergence. In contrast, the AlexNet–LSTM model demonstrates a relatively modest reduction in training loss, ultimately reaching a value of approximately 0.017. This indicates that, despite achieving a reasonable level of accuracy on the training set, the model’s fitting ability is not as strong as that of the ResNet–BiLSTM model. The final loss of the VGG–LSTM model is numerically lower than that of AlexNet–LSTM, but its convergence rate is slow.
(3) In terms of validation accuracy, the ResNet–BiLSTM model shows a rapid increase during the initial stages of training (the first 30 epochs), subsequently stabilizing near 1.0. This suggests that the model captures data features highly effectively and learns useful representations quickly; throughout most of training, the validation accuracy remains high, demonstrating robust generalization. In contrast, the AlexNet–LSTM model improves gradually, ultimately reaching approximately 0.96. While this final accuracy is relatively high, the slower improvement indicates that its learning is not as rapid as that of ResNet–BiLSTM, and the limited gains during certain phases (such as the first 30 epochs) point to weaker feature extraction capabilities. The VGG–LSTM model records the lowest validation accuracy throughout training, peaking at approximately 0.93; in the first 40 epochs its accuracy fluctuates significantly, suggesting an unstable learning process possibly affected by overfitting or underfitting.
(4) The ResNet–BiLSTM model exhibited the lowest validation loss, ultimately converging to approximately 0.036. This indicates that the model performs with great stability and efficacy on the validation set, exhibiting minimal fluctuations throughout the training process. This demonstrates the model’s capacity for adaptability to the validation data. In contrast, the AlexNet–LSTM model exhibits a validation loss of approximately 0.116, which is considerably higher than that observed in the ResNet–BiLSTM model. This indicates that the model’s performance on the validation set is suboptimal and exhibits some degree of overfitting. Although the fluctuations in validation loss are minimal, they remain higher than those of the ResNet–BiLSTM model, indicating a deficiency in generalization capability compared to ResNet–BiLSTM. The VGG–LSTM model exhibits the highest final validation loss, approximately 0.204, and experiences considerable fluctuations in the early stages of training, suggesting an unstable performance on the validation set. Although the training loss is relatively low, the validation loss indicates that the model is unable to generalize effectively, suggesting that overfitting may be a risk.
As illustrated in
Table 9, the diagnostic results of the various models are summarized. From the table, the following observations can be made:
The ResNet–BiLSTM model demonstrated the most optimal overall performance, attaining the highest accuracy on both the training and validation sets (99.97% and 99.08%, respectively) and the lowest loss (0.0016 and 0.0364, respectively). Moreover, it demonstrates remarkable stability and generalization capability. Despite the increased computational complexity associated with a larger number of parameters and longer training time, the model’s superior performance justifies the additional computational resources required.
In comparison, among the single models, ResNet18 demonstrates the most optimal performance, while the other models exhibit overall performance inferior to ResNet18. Nevertheless, the accuracy and loss of ResNet18 on the validation set suggest that it has limited capacity for handling sequential data, particularly in dynamic environments where the model’s adaptability may be constrained. Moreover, the relatively extended training period does not confer a notable advantage in efficiency over the ResNet–BiLSTM.
Among the fusion models, AlexNet–LSTM exhibits relatively good overall performance, effectively combining the feature extraction ability of AlexNet with the time-series learning ability of LSTM. However, AlexNet–LSTM demonstrates significantly lower accuracy than ResNet–BiLSTM in both the validation and test sets, indicating that it may encounter greater challenges in addressing complex time-series data, particularly in terms of the model’s generalization ability and robustness.
In conclusion, the ResNet–BiLSTM model, which demonstrated superior performance in training and validation, as well as notable advantages in generalization capability and model stability, was identified as the optimal model in this study.
As illustrated in
Figure 17, the confusion matrices of the four models—ResNet18, AlexNet–LSTM, VGG–LSTM, and ResNet–BiLSTM—further substantiate the exceptional performance of the ResNet–BiLSTM model. The model exhibits high accuracy in identifying diverse fault types in the classification task. It is noteworthy that the classification accuracy for the normal state reached 100%, which is indicative of exceptional performance. The classification accuracy for cylinder 6 misfire is 99.50%, while the accuracies for cylinders 2, 4, and 5 misfires are all 99.25%. The accuracy for cylinder 3 misfire is 99.00%. In comparison, the classification accuracy for cylinder 1 misfire is slightly lower, at 98.25%. These findings suggest that the ResNet–BiLSTM model exhibits a high degree of discernibility and reliability in addressing complex fault types associated with diverse cylinder misfires.
In comparison, while ResNet18 performs optimally as a standalone model, its accuracy in diagnosing cylinder 4 misfire is notably inferior to that of the ResNet–BiLSTM ensemble model, thereby underscoring its limitations in certain specific classification tasks. Furthermore, the diagnostic accuracy of ResNet18 is inferior to that of ResNet–BiLSTM when applied to other cylinder misfires, thus reinforcing the superiority of the ensemble model. While the classification performance of the AlexNet–LSTM model is superior to that of VGG–LSTM, it nevertheless falls short of the proposed model in terms of overall performance.
In summary, the ResNet–BiLSTM model, which integrates the exceptional feature extraction capabilities of ResNet18 with the advantages of BiLSTM in processing sequential information, not only outperforms the standalone ResNet18 model and the compared models in classification accuracy but also demonstrates superior generalization ability. This is particularly evident in the context of intelligent fault diagnosis for engine misfires, where it demonstrates superior stability and efficiency in diagnostic performance.
The preceding analysis demonstrates that the ResNet–BiLSTM model exhibits superior overall performance compared to the other models. This paper will conduct a more detailed evaluation of the ResNet–BiLSTM model to gain further insights into its performance in practical applications. Specifically, this section provides an in-depth analysis of the model based on precision, recall, and F1-score, as mentioned in reference [
47]. The formulas for these metrics are as follows:

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F1 = 2 × Precision × Recall / (Precision + Recall)

where TP represents the true positives, the number of samples that the model correctly predicts as positive; FP denotes the false positives, the number of samples that the model incorrectly predicts as positive; and FN signifies the false negatives, the number of samples that the model incorrectly predicts as negative.
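Given these definitions, the three metrics follow directly from the per-class confusion-matrix counts; a minimal sketch:

```python
def prf1(tp, fp, fn):
    # Precision, recall, and F1 for one class from its TP/FP/FN counts.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example with made-up counts: 8 true positives, 2 false positives,
# no false negatives.
p, r, f = prf1(8, 2, 0)
```

Computed per fault class and then averaged, these values correspond to the per-category results reported in Table 10.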
Table 10 presents results that are consistent with those shown in
Figure 17, indicating that the model performs well across all categories. While there are minor shortcomings in precision and recall for specific categories (such as cylinders 1 and 2), the overall performance remains satisfactory. This provides evidence that the model is effective and accurate in classification tasks.
6. Conclusions
In order to achieve intelligent diagnosis of ship dual-fuel engine misfire faults, this paper proposes a ResNet–BiLSTM model that integrates ResNet with BiLSTM. This model fuses the robust local feature extraction capabilities of deep residual networks (ResNets) with the benefits of bidirectional long short-term memory (BiLSTM) networks in processing time series data, markedly enhancing the precision of identifying intricate fault patterns and augmenting diagnostic efficacy. The principal conclusions are as follows:
(1) By employing sensor-collected instantaneous rotational speed data from the engine, this study utilized a sliding window technique for data augmentation, which not only markedly increased the sample size but also simulated the operating states of the engine at disparate moments, thereby enhancing the model’s adaptability to various operating conditions. Subsequently, a continuous wavelet transform (CWT) was applied to convert the one-dimensional time series data into two-dimensional graphical data. This approach permits the concurrent examination of signals in both the time and frequency domains, thereby disclosing spectral characteristics obscured within the time series. Furthermore, the incorporation of image data augmented the diversity of data representation, enabling the model to comprehend and learn the characteristics of engine misfires from multiple scales and perspectives, thereby achieving exemplary performance in fault diagnosis tasks.
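The sliding-window augmentation summarized in point (1) might look like the sketch below. The window length and step are hypothetical values, and each resulting window would subsequently be converted to a 2-D scalogram via a continuous wavelet transform (e.g., with `pywt.cwt`) before being fed to the network:

```python
import numpy as np

def sliding_windows(signal, win_len, step):
    # Overlapping segments of the instantaneous rotational speed signal;
    # overlap (step < win_len) multiplies the number of training samples.
    n = (len(signal) - win_len) // step + 1
    return np.stack([signal[i * step : i * step + win_len]
                     for i in range(n)])

# Toy signal of 100 samples -> 7 overlapping windows of length 40.
windows = sliding_windows(np.arange(100.0), win_len=40, step=10)
```

Each row is one augmented sample; consecutive rows are shifted by `step` points, simulating the engine's operating state at different moments.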
(2) By employing image data, a series of convolutional neural network (CNN) and recurrent neural network (RNN) models were developed, encompassing LeNet-5, AlexNet, VGG11, ResNet18, and bidirectional long short-term memory (BiLSTM) networks. The integration and comparative analysis of different CNNs with a double-layer BiLSTM model revealed that the ResNet–BiLSTM model, which combines ResNet with BiLSTM, demonstrated superior performance across various performance metrics. In particular, the ResNet–BiLSTM model demonstrates significantly lower loss values on both the training and validation sets in comparison to other fusion models, which indicates its superior capacity for data fitting. Moreover, this model demonstrates superior classification accuracy and exceptional generalization ability, outperforming other fusion models in these respects.
(3) A comprehensive performance analysis was conducted on the proposed ResNet–BiLSTM model in comparison to the existing AlexNet–LSTM and VGG–LSTM models, with a particular emphasis on key metrics, including accuracy, loss value, training time, and parameter count. The findings demonstrate that despite the ResNet–BiLSTM model exhibiting a greater number of parameters and a longer training period in comparison to the other two models, it attains a more rapid convergence, higher accuracy, and lower loss values, thereby exhibiting markedly superior overall performance. From a practical standpoint, the ResNet–BiLSTM model is particularly well-suited to tasks that necessitate high precision and model performance, given its exceptional accuracy and stability.
Moreover, a comprehensive assessment of the model was conducted using pivotal metrics, including the confusion matrix, precision, recall, and F1-score. The findings demonstrate that the ResNet–BiLSTM model markedly outperforms existing techniques in terms of fault diagnosis accuracy. Even in instances where fault categories are difficult to differentiate, the ResNet–BiLSTM model demonstrates an exceptional capacity for classification. The model exhibits remarkable precision in the majority of categories, underscoring its robust capacity to accurately identify positive samples.
Furthermore, the preliminary results demonstrate that the methodology proposed in this paper is not only applicable to the diagnosis of misfires in marine dual-fuel engines but also has the potential for extension to fault diagnosis tasks in engines with varying cylinder numbers (12 or 16) and various models, including diesel and gas engines. Other similar diagnostic tasks can be realized by appropriately adjusting the model parameters and structure. This method demonstrates robust fault recognition capabilities across diverse internal combustion engine types, showcasing remarkable generality and adaptability. It offers novel insights into fault detection in other internal combustion engines, further enhancing the practical applicability of the research.
Although the ResNet–BiLSTM model has been shown to perform well in intelligent fault diagnosis of ship dual-fuel engine misfires, there is still scope for further improvement. Further optimization opportunities may be identified in the following areas:
(1) It is recommended that the dataset be expanded and diversified. While the current data preprocessing and augmentation methods have effectively enhanced model performance, the scale and diversity of the dataset remain limited. Expanding the dataset, particularly by incorporating data from a greater variety of operating conditions and fault types, could enhance the model’s generalization ability and robustness.
(2) The model structure may be further optimized. Although the ResNet–BiLSTM model effectively combines the strengths of ResNet and BiLSTM, there is still scope for refining its structure to enhance feature extraction and classification performance, thereby improving its ability to recognize complex fault patterns.
(3) The integration of data from diverse sensors through multimodal learning approaches can facilitate the consolidation of information from disparate data sources, thereby enhancing the accuracy and reliability of fault diagnosis and reducing diagnostic errors attributable to the limitations of a single data source.