1. Introduction
With advances in marine equipment automation and the increasing complexity of machinery, modern marine systems are becoming larger, less dependent on human intervention, and more intelligent. Consequently, the demand for reliability in marine machinery and equipment is also rising. Diesel engines have long been used as power sources in the maritime industry, and their reliability is crucial to the safe and efficient operation of vessels. However, owing to prolonged exposure to high temperatures and pressures, diesel engines inevitably experience performance degradation and even failures. Therefore, to ensure navigation safety and minimize economic losses, real-time performance monitoring and fault diagnosis are necessary. In contrast to traditional fault monitoring methods, such as alarm-based monitoring or manual inspections, intelligent fault monitoring methods can process large amounts of collected data effectively and accurately, and they provide more reliable diagnostic results by reducing human intervention. Indeed, data-driven intelligent fault diagnosis has emerged as a promising research topic in the maritime industry [1]. Unlike model-based fault diagnostic techniques, this approach does not require a significant amount of prior knowledge. Instead, it uses abundant real-time measurement data to construct fault models for predicting the health condition of diesel engines. Machine learning and deep learning, as branches of data-driven methodology, have made remarkable achievements in intelligent fault diagnosis for machinery, because they can automatically learn useful information and features from historical monitoring data. Lazakis et al. [2] used an artificial neural network (ANN) model to predict the condition of marine mechanical systems. Zeng et al. [3] proposed a defense strategy for detecting and mitigating False Data Injection Attacks (FDIA) in ship DC microgrids, employing an ANN model to identify and restore corrupted data. Their experimental results demonstrated the effectiveness of the method in detecting and mitigating FDIA, with errors of less than 0.5 V. Qi et al. [4] proposed a deep learning model-based cabin auxiliary device detector, and the experimental results verified the effectiveness of the model. Wang et al. [5] introduced a stochastic convolutional neural network (CNN) structure for the health monitoring of marine diesel engines. Their objective was to use the convolution and pooling operations of the CNN architecture to automatically extract discriminative features from vibration signals. Han et al. [6] used a CNN to detect and isolate propeller faults in dynamically positioned marine vessels. Kim et al. [7] used a one-dimensional CNN model to analyze the vibration data of ship auxiliary equipment for fault diagnosis. Ftoutou et al. [8] used an unsupervised fuzzy clustering approach for the time–frequency signal analysis of diesel engines, and the algorithm was shown experimentally to have a high fault detection rate. Diez Olivan et al. [9] put forward a comprehensive evaluation framework for fault diagnosis across various states. This framework incorporated outlier detection and state characterization using the K-means clustering algorithm. Additionally, fuzzy models of distances were employed for pattern recognition to accomplish fault diagnosis.
The above literature shows that both machine learning and deep learning have achieved good results in intelligent fault diagnosis for machinery. However, these methods rest on two idealized assumptions: (1) the samples in the training dataset (source domain) have the same distribution as the samples in the testing dataset (target domain); (2) a large amount of labeled data is available during the training phase. Owing to continuously changing operating conditions and dynamic working environments, these assumptions are not satisfied in real industrial scenarios [10,11]. Such distribution shifts can cause a model trained under one set of conditions to perform poorly when applied to different conditions [12].
Transfer learning offers a promising paradigm for reusing knowledge acquired from labeled data in the source domain to identify the health status of unlabeled data in the target domain. Domain adaptation (DA), an active and widely used subfield of transfer learning, has attracted significant attention as a way to improve the performance of intelligent fault diagnosis in cross-domain scenarios. In intelligent fault diagnosis, two families of domain adaptation methods are prominent: robust regularization-based methods [13,14,15,16,17,18] and domain adversarial-based methods [19,20,21]. In the first family, discrepancy measures such as the maximum mean discrepancy (MMD) [22], the Wasserstein distance [23], and CORAL [24] are embedded into the objective function, allowing the network to learn domain-invariant features during training. The second family designs an adversarial framework with a discriminator to blur domain distinctions, thereby capturing transferable features [25].
Transfer learning techniques have been extensively applied in industrial fault diagnosis with some success. Nonetheless, prevailing methods often assume an identical label space for the source and target domains, i.e., Cs = Ct, where Cs and Ct denote the source and target label spaces. In industrial applications, collecting a complete dataset is often challenging. As a result, a more realistic and challenging scenario arises in which the target domain contains only a subset of the classes in the source domain, i.e., Ct ⊆ Cs. This scenario is referred to as partial transfer fault diagnosis (PTFD). In the partial transfer learning problem, outlier data, which belong to classes that exist in the source domain but not in the target domain, have a negative impact on the transfer process.
Research on partial transfer problems is emerging in intelligent fault diagnosis. Zhao et al. [25] introduced a weighted adversarial comparison approach that mitigates the impact of irrelevant source samples and reduces discrepancies in the marginal distributions for partial domain fault diagnosis. Jiao et al. [26] presented the multi-weighted domain adversarial network (MWDAN) to address the challenges of PTFD. This approach incorporates class-level and instance-level weighting mechanisms simultaneously, enabling discrimination of the label space and providing a quantitative measure of the transferability of data samples. Zhao et al. [27] introduced the multi-discriminator deep weighted adversarial network (MDWAN), which incorporates a weight function that quantifies the contribution of source domain samples to both the domain discriminators and the classifiers.
Undoubtedly, several transfer methods highlighted above emphasize the utilization of weighting techniques to selectively transfer knowledge from the source domain. However, it is essential to acknowledge the potential risks associated with relying solely on classifiers or discriminators to exclude outliers. This precautionary approach is necessitated by the fact that variations in domain distributions, arising from alterations in operating conditions and environmental noise, may lead to erroneous estimations of the target label space. Therefore, it is imperative to consider these uncertainties when implementing knowledge transfer techniques.
In this paper, the authors propose a Multi-scale and Multi-view domain adversarial network (MMDAN) designed for partial domain adaptation, where the target domain has fewer categories than the source domain. To address the strong non-linear relationships in marine diesel engine data, a depthwise separable convolutional (DSC) neural network model is used for feature extraction. This model captures complex patterns while reducing computation through separable convolutions. To enhance domain adaptation, an auxiliary domain discriminator learning strategy is introduced. This strategy helps identify and filter outlier source samples, promoting the positive transfer of shared samples: the auxiliary domain discriminator measures similarity weights between the source and target domains, which facilitates effective knowledge transfer. Furthermore, two classifiers with different viewpoints are used to predict the same sample. This approach aims to map the original data to a more suitable feature space, ultimately improving the quality and discriminative ability of the extracted features. In summary, MMDAN combines depthwise separable convolutional feature extraction, auxiliary domain discriminator learning, and multiple classifiers to improve the efficiency and effectiveness of knowledge transfer and feature extraction in partial domain adaptation.
This paper introduces several key ideas and contributions:
(1) A novel network, MMDAN, is introduced to address the partial set domain adaptation (PSDA) problem in marine diesel engine fault diagnosis. MMDAN removes the requirement that the source and target domains share an identical label space.
(2) A DSC-based multi-scale feature extraction method is proposed as part of MMDAN. The method replaces traditional convolutional layers with DSC layers, which not only extract useful information effectively but also reduce the number of parameters and the computational effort. A residual connection layer further improves extraction accuracy.
(3) A multi-view classifier strategy is presented in which the same sample is predicted by two classifiers with different weights, and therefore from distinct perspectives. Agreement between the two predictions confirms the applicability of the features extracted by the shared feature extractor.
(4) An auxiliary discriminator learning strategy is proposed, in which an additional discriminator quantifies the transferability of source domain samples. This auxiliary discriminator helps identify and filter outlier samples, reducing distribution differences between the source and target domains. The PSDA approach incorporating the auxiliary discriminator demonstrates improved diagnostic performance, even when the test data are contaminated with noise.
These contributions collectively enhance the effectiveness and robustness of fault diagnosis in marine diesel engines, particularly in scenarios involving partial set domain adaptation and noisy test data. The remainder of the paper is organized as follows:
Section 2 gives an overview of transfer learning theory. This section covers fundamental concepts and methodologies related to transfer learning that form the basis for the proposed approach.
Section 3 presents the key components and steps of the proposed framework for fault detection and diagnosis. It describes the overall architecture of the MMDAN network, including the multi-scale feature extraction method, the multi-view classifier strategy, and the auxiliary discriminator learning strategy.
Section 4 evaluates the proposed model and compares its performance with that of existing approaches on relevant datasets and experiments, in order to assess the effectiveness of the MMDAN network in addressing the PSDA problem in marine diesel engine fault diagnosis.
Finally, Section 5 concludes the paper by summarizing the contributions and implications of the research.
3. Proposed Method
In the PSDA problem, the source and target label spaces are asymmetric: the target label space is a subset of the source label space. As a result, the source label space can be naturally divided into two subsets: the shared space (which is identical to the target label space) and the outlier space (the source classes that are absent from the target label space).
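The split of the source label space can be illustrated with a few lines of Python. This is only a sketch: the function name `split_label_space` and the fault class names are hypothetical and do not come from the paper.

```python
def split_label_space(source_labels, target_labels):
    """Split the source label space into shared and outlier subsets.

    Assumes the PSDA setting, i.e. the target label space is a subset
    of the source label space.
    """
    source, target = set(source_labels), set(target_labels)
    if not target <= source:
        raise ValueError("PSDA assumes the target label space is a subset "
                         "of the source label space")
    shared = source & target    # classes present in both domains
    outlier = source - target   # classes present only in the source domain
    return shared, outlier


# Hypothetical fault classes, used for illustration only.
shared, outlier = split_label_space(
    ["normal", "fault_a", "fault_b", "fault_c"],
    ["normal", "fault_b"],
)
```

In practice the target labels are unknown during training, which is why the method has to estimate the shared classes from data rather than compute them directly as done here.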
In this paper, a Multi-scale and Multi-view domain adversarial network-based partial set fault detection algorithm for industrial processes is proposed, and the network structure is shown in Figure 2. Specifically, MMDAN consists of a Multi-scale Feature Extractor G, a Multi-view Classifier C, an Auxiliary Domain Discriminator D1, and a Domain Discriminator D2, and the parameters are shown in Figure 3.
3.1. Feature Extractor Structure G
The feature extractor G employs a multi-scale feature extraction module, consisting of two DSC layers with distinct weights, to extract multi-view features from two domains. This approach replaces conventional convolutions with DSCs, which have a smaller parameter count. Consequently, the model complexity is effectively reduced, leading to shorter training times. Furthermore, the integration of DSC blocks with different parameters enables the multi-scale feature extraction strategy. This strategy facilitates the extraction of multi-scale features that encompass a richer set of diagnostic information, along with complementary insights, by combining the outputs of these DSC blocks.
As illustrated in Figure 4, consider a data segment comprising K channels and a convolution kernel of width Dm and height Dn. The depthwise convolution applies one Dm × Dn kernel per channel, so this part has Dm × Dn × K parameters. The subsequent point-by-point (pointwise) convolution uses kernels of size 1 × 1 × K; if N feature maps are required, it has K × 1 × 1 × N parameters. The total number of DSC parameters is therefore PDSC = Dm × Dn × K + K × 1 × 1 × N, whereas an ordinary convolution requires Dm × Dn × K × N parameters. The ratio between the two is given by the following equation (the cost of additions is ignored, since it is much smaller than that of multiplications):

$$\frac{P_{DSC}}{P_{Conv}} = \frac{D_m \times D_n \times K + K \times N}{D_m \times D_n \times K \times N} = \frac{1}{N} + \frac{1}{D_m \times D_n} \quad (1)$$
As demonstrated by Equation (1), the computational efficiency of DSC exceeds that of traditional CNNs. This is primarily due to the reduction in the number of parameters computed, while still maintaining satisfactory prediction accuracy. Thus, the key advantage of DSC, as an optimized variant of CNN, lies in its superior computational efficiency and its ability to effectively reduce the parameter count within adversarial networks.
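The parameter comparison can be checked numerically. The sketch below simply evaluates the counts stated above; the function names are illustrative, bias terms are ignored, and the example sizes (a 3 × 3 kernel, 16 input channels, 32 feature maps) are assumptions chosen for the demonstration rather than values from the paper.

```python
def conv_params(dm, dn, k, n):
    """Parameters of an ordinary convolution: Dm x Dn x K x N."""
    return dm * dn * k * n


def dsc_params(dm, dn, k, n):
    """Parameters of a DSC layer: depthwise part plus pointwise part."""
    depthwise = dm * dn * k   # one Dm x Dn kernel per input channel
    pointwise = k * n         # N kernels of size 1 x 1 x K
    return depthwise + pointwise


def dsc_ratio(dm, dn, k, n):
    """Ratio P_DSC / P_Conv, which equals 1/N + 1/(Dm * Dn)."""
    return dsc_params(dm, dn, k, n) / conv_params(dm, dn, k, n)


# Assumed example sizes: 656 DSC parameters versus 4608 ordinary ones.
ratio = dsc_ratio(3, 3, 16, 32)
```

For these sizes the DSC layer needs roughly 14% of the parameters of the ordinary convolution, and the saving grows with the kernel size and the number of feature maps.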
Meanwhile, to suppress overfitting, max pooling and average pooling layers are used to reduce the feature dimension, and an attention layer helps to eliminate redundant information. A residual connection is designed to prevent gradient degradation and information loss during training and to facilitate feature extraction at different levels. The feature f is given by Equation (5), in which x is the input vector of the last pooling layer, together with a weight matrix and a bias vector.
3.2. Auxiliary Domain Discriminator D1
In the context of PSDA, the source and target domains have different label spaces, with Ct ⊆ Cs, which makes direct adversarial training problematic. The outlier classes in the source domain have no corresponding classes in the target domain, and their presence degrades test performance. Consequently, it is vital to disregard these outlier classes during domain adaptation. This is difficult, however, because the target domain data are unlabeled, so it is not known in advance which faults are present in the target domain. A logical approach is to align the samples of the shared classes between the two domains while disregarding the outlier classes during domain adaptation. To achieve this, a similarity learning strategy is proposed. This strategy aims to quantify the contribution of each source class, to help the domain discriminator D1 estimate the similarity between target samples and source samples, and to assign a similarity weight to each target sample. Additionally, a second adversarial game is introduced between the domain discriminator D2 and the feature generator G to explore the transferable features. Because the source outlier classes do not appear in the target label space, the target data are expected to differ significantly from the outlier class data. Therefore, the probability of assigning the target data to the outlier class should be minimized. Accordingly, the error in domain label classification can serve as a reliable indicator of similarity. In this study, the similarity of the target sample is defined by Equation (6) for further analysis and evaluation.
In Equation (6), the label term denotes the ground-truth label of the target domain. In addition, min–max normalization is used to express the relative similarity of different target samples. The normalized value is the similarity weight assigned to the j-th target sample, and ε denotes a small positive value. This formulation ensures that the similarity weights assigned to the shared classes surpass those assigned to the outlier classes. Given the limited contribution of outlier class samples, domain adaptation can be selectively executed solely on the shared classes between the domains, thereby avoiding negative transfer and promoting positive transfer. It should be noted that the auxiliary domain discriminator D1 does not take part in adversarial training with the feature generator. Consequently, the loss function of D1 exclusively updates the parameters of D1.
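As a rough illustration of the normalization step, the sketch below rescales raw similarity scores to relative weights. It is not the paper's exact formula: placing ε in the denominator to avoid division by zero is an assumption, and the name `min_max_normalize` and the example scores are hypothetical.

```python
def min_max_normalize(similarities, eps=1e-8):
    """Rescale raw similarity scores to relative weights in [0, 1).

    ASSUMPTION: eps is added to the denominator for numerical safety;
    the source only states that a small positive value is used.
    """
    lowest, highest = min(similarities), max(similarities)
    return [(s - lowest) / (highest - lowest + eps) for s in similarities]


# Hypothetical raw similarity scores for three target samples.
weights = min_max_normalize([0.2, 0.5, 0.8])
```

With this form the least similar sample receives a weight of zero and the most similar one a weight just below one, so only the relative ordering of the raw scores matters.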
Each sample carries a predefined domain label, one for the i-th source sample and one for the j-th target sample. In adversarial learning, the similarity scores of the target samples undergo min–max normalization, and the resulting similarity weights are assigned to the target samples in order to minimize the disparity between the domain distributions. The objective of weighted adversarial learning is defined accordingly, with the contribution of each target sample to the domain loss scaled by its similarity weight.
D1 distinguishes the importance of different failure classes and extracts quality domain-invariant representations specifically for the shared classes. In the case of samples originating from outlier source classes, their respective loss functions are assigned relatively smaller weights, thereby minimizing their impact during training. On the other hand, for samples originating from cross-domain shared classes, a higher weight is assigned to the target loss function. The intention behind this is to mitigate domain bias and facilitate a better alignment between the source and target domains for the shared labeled samples. By doing so, a more effective matching between the two domains is achieved. The domain discriminator D2 plays a minimax game with the feature generator G to explore transferable features.
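One plausible form of the weighted adversarial objective is a binary cross-entropy domain loss in which each target term is scaled by its similarity weight. The sketch below is an assumption and not the paper's equation: the domain label convention (source = 1, target = 0), the function name, and the example probabilities are all hypothetical.

```python
import math


def weighted_domain_loss(source_probs, target_probs, target_weights, tiny=1e-12):
    """Weighted binary cross-entropy for a domain discriminator.

    source_probs[i]: predicted probability that source sample i is a source sample.
    target_probs[j]: predicted probability that target sample j is a source sample.
    target_weights[j]: similarity weight of target sample j.
    ASSUMPTION: domain labels are 1 for source and 0 for target.
    """
    source_term = -sum(math.log(p + tiny) for p in source_probs) / len(source_probs)
    target_term = -sum(
        w * math.log(1.0 - p + tiny)
        for w, p in zip(target_weights, target_probs)
    ) / len(target_probs)
    return source_term + target_term


# A target sample with zero weight does not influence the loss.
loss_a = weighted_domain_loss([0.9, 0.8], [0.2, 0.6], [1.0, 0.0])
loss_b = weighted_domain_loss([0.9, 0.8], [0.2, 0.99], [1.0, 0.0])
```

In adversarial training the discriminator would minimize this quantity while the feature generator works against it, which is typically implemented with a gradient reversal layer.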
3.3. Multi-View Classifiers C
Within this section, a novel multi-view predictive adversarial network is presented, encompassing two modules with distinct views within the classifier component. While the samples from the source domain possess sufficient labels, those from the target domain lack such annotations. Consequently, the enhancement of accuracy for unlabeled target domain samples through gradient descent becomes unfeasible. In order to bolster accuracy within the target domain, two classifiers are devised, each adopting a unique perspective on a given sample. The weights of these classifiers are constrained to guarantee their divergent viewpoints. Specifically, y1 and y2 denote the categories with the highest probability of prediction. It is posited that samples lacking labels can be accurately predicted when both classifiers concur on the prediction, i.e., y1 = y2.
In the case of the source domain, where a substantial number of labels are available, the network can be trained using traditional supervised learning methods. The commonly employed metric for such training is the cross-entropy loss, which can be expressed as follows:
where the loss term denotes the consistent prediction loss, the sample count is the number of source domain data, and C1 and C2 denote the two different classifiers.
Under the guidance of source domain supervision, the shared feature extractor and the two distinct classifiers are updated iteratively until achieving accurate classification. This is accomplished through the alignment of multi-view predictions and auxiliary domain adversarial training.
To enhance the generalization capability of the shared feature extractor and the two classifiers in the target domain, the mean squared error (MSE) loss is employed as an indicator of prediction consistency. In this loss, n is the number of source domain data, and the two prediction terms are the outputs of the two different classifiers.
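The consistency term can be sketched as the mean squared difference between the class-probability vectors that the two classifiers produce for the same samples. The averaging over classes as well as samples is an assumption, and the function name `consistency_mse` is illustrative.

```python
def consistency_mse(preds_1, preds_2):
    """Mean squared error between the outputs of two classifiers.

    preds_1, preds_2: lists of per-sample class-probability lists,
    one from each classifier, in the same sample order.
    ASSUMPTION: the squared error is averaged over classes and samples.
    """
    if len(preds_1) != len(preds_2):
        raise ValueError("both classifiers must predict the same samples")
    total = 0.0
    for p1, p2 in zip(preds_1, preds_2):
        total += sum((a - b) ** 2 for a, b in zip(p1, p2)) / len(p1)
    return total / len(preds_1)


# Identical predictions give zero loss; opposite one-hot predictions give 1.
agree = consistency_mse([[0.7, 0.3]], [[0.7, 0.3]])
disagree = consistency_mse([[1.0, 0.0]], [[0.0, 1.0]])
```

A small value indicates that both views agree on a sample, which is the condition under which the prediction for an unlabeled sample is considered reliable.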
To ensure the accurate prediction of target samples, we aim for classifiers C1 and C2 to classify samples from different viewpoints. Therefore, we constrain the weights of C1 and C2 to be distinct, allowing the two classifiers to predict the same sample in the shared feature space from different perspectives. In the cost function, we include a penalty term on W1 and W2, which represent the weights of the fully connected layers in C1 and C2, respectively. This term constitutes the multi-view constraint loss (Equation (12)).
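The exact penalty term is not recoverable from the text here. One choice that is common in the multi-view and co-training literature is the sum of the absolute entries of W1ᵀW2, which is zero when the weight columns of the two classifiers are mutually orthogonal. The sketch below implements that assumed form in plain Python; the function name and the example matrices are hypothetical.

```python
def multi_view_penalty(w1, w2):
    """Sum of absolute entries of W1^T W2.

    w1, w2: weight matrices given as lists of rows, each of shape
    (feature_dim, num_classes).
    ASSUMPTION: this orthogonality-style penalty stands in for the
    paper's multi-view constraint term.
    """
    feature_dim = len(w1)
    if feature_dim != len(w2):
        raise ValueError("both weight matrices need the same feature dimension")
    total = 0.0
    for a in range(len(w1[0])):
        for b in range(len(w2[0])):
            dot = sum(w1[r][a] * w2[r][b] for r in range(feature_dim))
            total += abs(dot)
    return total


# Orthogonal weights incur no penalty; identical weights are penalized.
orthogonal = multi_view_penalty([[1.0], [0.0]], [[0.0], [1.0]])
identical = multi_view_penalty([[1.0], [0.0]], [[1.0], [0.0]])
```

Minimizing such a term pushes the two classifiers to rely on different directions of the shared feature space, which is the stated purpose of the constraint.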
3.4. Overall Optimization Objective
In summary, the overall objective of the proposed method, given in Equation (13), combines the loss terms defined above, and the network parameters are trained to optimize it. Here, α, β, γ, and δ are the coefficients of the respective loss terms. The optimization uses the stochastic gradient descent (SGD) algorithm and incorporates a gradient reversal layer (GRL) during backpropagation.
The complete diagnostic procedure of the proposed MMDAN can be summarized as follows.
Step 1: Before using the samples from the source and target domains as inputs to the network, the data undergo preprocessing, which typically includes normalization to ensure the data are within a specific range. This normalization step helps to standardize the input data and ensure that they have similar scales or distributions. Additionally, the samples from the source and target domains are divided into separate sets for training and testing. This division is crucial for evaluating the performance of the MMDAN model accurately and avoiding overfitting.
Step 2: The MMDAN model performs feature extraction using Equation (5), incorporates the auxiliary discriminator D1 to highlight the importance of source domain samples, calculates the losses of D1 and D2 using Equations (8) and (9), respectively, and employs multiple classifiers with different views to predict the classes from the shared features extracted by G.
Step 3: The optimization objective, as defined by Equation (13), combines the various loss terms and incorporates Equation (12) to promote the incorporation of different views from the two classifiers during the training process. This optimization objective guides the learning process of the MMDAN model towards achieving better performance and adaptation in the target domain.
Step 4: After the model training is complete, the samples from the target domain test set are fed into the shared feature extractor, and the average of the predictions from the two classifiers is taken to generate the final prediction result for these samples.
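The inference rule in Step 4 can be sketched as follows: the class probabilities of the two classifiers are averaged and the most probable class is returned. The function name `fuse_predictions` and the example probabilities are illustrative assumptions.

```python
def fuse_predictions(probs_1, probs_2):
    """Average the class probabilities of two classifiers for one sample.

    Returns the index of the most probable class and the averaged
    probability vector.
    """
    if len(probs_1) != len(probs_2):
        raise ValueError("both classifiers must output the same number of classes")
    averaged = [(a + b) / 2 for a, b in zip(probs_1, probs_2)]
    label = max(range(len(averaged)), key=averaged.__getitem__)
    return label, averaged


# Hypothetical outputs of C1 and C2 for a single target test sample.
label, averaged = fuse_predictions([0.6, 0.4], [0.2, 0.8])
```

Here the two classifiers disagree on their top class, and averaging resolves the disagreement in favor of the class with the higher combined probability.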