1. Introduction
The sucker rod pump system (SRPS) is widely employed in the field of oil exploitation [1,2]. Unfortunately, due to continuous operation and harsh working environments, some components risk degradation and inevitable failure, e.g., traveling valve leakage, parting rod, etc. [3]. The common approaches to monitoring SRPSs rely on the dynamometer card (DC), which is measured by the load sensor installed on the “horse head” [4]. These DC-based methods inevitably suffer from high maintenance costs and low detection frequency, resulting in poor real-time diagnostic performance for SRPSs.
Since the whole SRPS is driven by an electric motor, motor power curves (MPCs) offer advantages in accessibility and real-time performance and have drawn increasing attention for the diagnosis of faults in the SRPS. The authors of [5,6] distilled features from MPCs and diagnosed faults with improved hidden conditional random fields. A condition-monitoring system based on motor power was proposed in [7]. Although these methods have achieved outstanding performance, their successful application relies on abundant labeled data, an assumption that does not always hold because MPC-based research is still in its initial stages. Some research seeks to expand MPC data by transforming the readily available DCs [8,9]. However, these approaches assume that the transformed and actual MPCs share an identical distribution, an assumption that does not hold due to the inevitable idealization and simplification involved in the transformation [10]. The resulting distribution discrepancy, in turn, causes serious performance degradation in diagnosis [11]. Therefore, this paper explores how to leverage readily available DCs to achieve MPC-based diagnosis in the presence of this distribution discrepancy.
Domain adaptation (DA) is well known as an efficient approach to mitigating domain discrepancy by extracting domain-invariant features [12,13,14]. Traditional DA employs a Maximum Mean Discrepancy (MMD) term as the discrepancy penalty in order to extract domain-invariant features [15,16,17,18]. Another strand of research explores a domain discriminator to align the distributions in an adversarial manner. The authors of [19,20] used a one-dimensional convolutional neural network (1-D CNN) and double task-specific classifiers to learn domain-invariant features for fault diagnosis. A domain discriminator and MMD were exploited together via ensemble learning for feature extraction in [21,22]; these works further extended the adversarial network to the joint adaptation network to mitigate the distribution discrepancies in both the label and feature spaces.
In the absence of target labels, the aforementioned approaches have been demonstrated to be very effective for intelligent diagnostics. Nevertheless, these methodologies restrict the two domains to sharing an identical label space, which does not always hold in actual industrial applications [23]. A more general scenario is that some of the categories from the source domain do not appear in the target domain; this is referred to as partial domain adaptation (PDA) [24]. Following the objective of standard DA methodologies, the outlier categories of the source domain are also forced to align with the target domain, which in turn causes negative transfer and misclassification [25]. A promising solution is to assign different weights to the instances belonging to different categories. The authors of [26,27] designed a multi-class adversarial loss to align instances in the shared label space. In [28], two attention matrices were constructed to guide the model to diminish the distribution discrepancy as well as avoid negative transfer. In [29], a multi-discriminator architecture was presented to pair instances with the same machine conditions. The authors of [30] designed class-level and instance-level weights to alleviate the domain shift problem.
The aforementioned state-of-the-art PDA methodologies mainly focus on marginal distribution alignment in the shared label space. In this paper, we propose a conditional distribution-level weighting strategy and integrate it with a class-level weighting strategy into an adversarial approach to further mitigate negative transfer when the categories of the actual MPCs do not cover the categories of the readily available DCs. The proposed weighting methodology comprehensively evaluates the weights of the source-domain samples, further reducing the instance weights of the outlier categories while highlighting the instances of the shared categories. Moreover, a 1-D CNN is adopted as the backbone of the feature generator network to extract domain-invariant features from the time-series data. The main contributions of this paper can be summarized as follows:
We construct an MPC dataset containing six categories of working conditions by converting DC instances with a mathematical model. The rationality of the converted MPCs is verified by comparison with actual MPCs.
We propose to incorporate a class-level and conditional distribution-level weighting learning strategy into the adversarial domain adaptation to narrow down the discrepancies between converted and actual MPCs.
Extensive experiments are carried out on the MPCs collected by self-developed portable devices in the practical application scenario. The results demonstrate that the proposed methodology outperforms five other state-of-the-art methods in terms of diagnostic accuracy and distribution alignment.
The remaining parts of this article are organized as follows. The mathematical model for converting DCs to MPCs is presented in Section 2. Section 3 describes the proposed diagnostic methodology. Section 4 shows the effectiveness of the proposed method through experimental verification. Finally, Section 5 concludes this article.
3. Diagnostic Methodology Based on Converted Motor Power Curves
To address the above dilemma, a novel partial DA-based methodology is proposed in this section to narrow the distribution discrepancy between the converted and collected MPCs. Overall, the methodology is built upon an adversarial DA architecture. Specifically, it employs conditional distribution-level and class-level weights to prevent the negative adaptation caused by label space mismatch.
3.1. Problem Formulation
To clearly formulate the fault diagnostic problem, several symbols and concepts are first introduced. The MPCs converted from labeled DCs are denoted as the source domain $\mathcal{D}_s=\{(x_i^s,y_i^s)\}_{i=1}^{n_s}$, containing $n_s$ instances associated with $C_s$ categories of working conditions. The collected MPCs are denoted as the target domain $\mathcal{D}_t=\{x_j^t\}_{j=1}^{n_t}$, containing $n_t$ unlabeled samples with $C_t$ categories, where $C_t<C_s$. The source label space $\mathcal{Y}_s$ can be divided into the shared label space $\mathcal{Y}_t$ and the outlier label space $\mathcal{Y}_s\backslash\mathcal{Y}_t$.
Due to the distribution discrepancy ($P_s(x)\neq P_t(x)$) caused by the mathematical modeling error, the diagnostic model trained on $\mathcal{D}_s$ is usually not tailored for $\mathcal{D}_t$. Typical DA-based methodologies aim at exploring domain-invariant features to bridge the distribution discrepancy. Nevertheless, the negative transfer effect caused by the outlier label space $\mathcal{Y}_s\backslash\mathcal{Y}_t$ seriously degrades their performance. The goal of this section is to design a partial DA-based diagnostic methodology that discriminates the outlier categories and leverages the knowledge learned from $\mathcal{D}_s$ to promote the diagnosis of the unlabeled $\mathcal{D}_t$.
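As a concrete illustration of this label-space setup, the following minimal Python sketch encodes the six source working conditions used in this paper and a target domain that lacks the gas-locking condition, mirroring the dataset described in Section 4.1; the numeric class indices are assumptions made only for illustration.

```python
# Illustration (not from the paper's code) of the partial-DA label spaces:
# six source categories of converted MPCs and a target domain of actual
# MPCs that lacks the gas-locking condition, as in Section 4.1.
SOURCE_CLASSES = {
    0: "normal",
    1: "traveling valve leakage",
    2: "insufficient liquid supply",
    3: "gas-affected",
    4: "gas locking",
    5: "parting rod",
}
TARGET_CLASSES = {k: v for k, v in SOURCE_CLASSES.items() if v != "gas locking"}

shared_label_space = set(TARGET_CLASSES)                        # categories in both domains
outlier_label_space = set(SOURCE_CLASSES) - shared_label_space  # source-only categories

print(sorted(shared_label_space))    # [0, 1, 2, 3, 5]
print(sorted(outlier_label_space))   # [4]
```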
3.2. Network Architecture
The network architecture of the proposed diagnostic methodology is illustrated in Figure 3. Overall, the methodology consists of a feature generator network $G_f$ parameterized by $\theta_f$, a label classifier $G_y$ parameterized by $\theta_y$, a domain classifier $G_d$ parameterized by $\theta_d$, a class-level attention matrix $w^c$, and a conditional distribution-level attention matrix $w^d$. To benefit from its excellent nonlinear characterization capability for time-series signals, a 1-D CNN is selected as the backbone of $G_f$. Concretely, $G_f$ is implemented with three 1-D convolutional layers and one fully connected (FC) layer. $G_y$ and $G_d$ consist of one and two FC layers, respectively. Moreover, batch normalization (BN) and dropout are adopted to improve the generalization ability. The detailed parameters and architecture are given in Figure 4.
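A rough PyTorch sketch of this three-part architecture is given below. The channel counts, kernel sizes, and feature dimensions are notional assumptions; the actual hyperparameters are the ones listed in Figure 4 and are not reproduced here.

```python
import torch
import torch.nn as nn

# Sketch of the architecture described above: 1-D CNN feature generator G_f,
# label classifier G_y, and domain classifier G_d. Layer sizes are illustrative.

class FeatureGenerator(nn.Module):          # G_f: three 1-D conv blocks + one FC layer
    def __init__(self, in_channels=1, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=9, padding=4), nn.BatchNorm1d(16),
            nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=9, padding=4), nn.BatchNorm1d(32),
            nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=9, padding=4), nn.BatchNorm1d(64),
            nn.ReLU(), nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(64, feat_dim), nn.ReLU(), nn.Dropout(0.5))

    def forward(self, x):                    # x: (batch, 1, signal_length)
        return self.fc(self.conv(x))         # (batch, feat_dim)

class LabelClassifier(nn.Module):            # G_y: one FC layer producing class logits
    def __init__(self, feat_dim=128, num_classes=6):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, f):
        return self.fc(f)

class DomainClassifier(nn.Module):           # G_d: two FC layers, source vs. target logits
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Dropout(0.5),
                                nn.Linear(hidden, 2))

    def forward(self, f):
        return self.fc(f)
```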
3.2.1. Domain Adversarial Learning
Inspired by the generative adversarial network (GAN), DA-based methodologies bridge the gap between the two domains so that the classifier trained on the source domain generalizes well to the target domain. In the traditional DA setting, $G_d$ is constructed to distinguish the features of the source and target domains, whereas $G_f$ is trained in a min-max adversarial game to make the discriminator wrongly recognize the domain of its input. The instances of $\mathcal{D}_s$ and $\mathcal{D}_t$ are fed into $G_f$ to extract domain-invariant features. Then, the features of the source domain are sent to $G_y$ to ensure that they contain category information. Moreover, the features of the source and target domains are contrasted by the domain classifier $G_d$ to avoid retaining domain-specific characteristics. Training such a DA-based methodology is equivalent to solving the following optimization problem:
$$
\min_{\theta_f,\theta_y}\max_{\theta_d}\;
\frac{1}{n_s}\sum_{x_i\in\mathcal{D}_s} L\big(G_y(G_f(x_i)),y_i\big)
-\frac{\lambda}{n_s+n_t}\sum_{x_i\in\mathcal{D}_s\cup\mathcal{D}_t} L\big(G_d(G_f(x_i)),d_i\big),
$$
where $L(\cdot,\cdot)$ denotes the cross-entropy loss function, $d_i$ denotes the domain label, and $\lambda$ is a regularization parameter. As demonstrated by the first term, $G_f$ and $G_y$ are updated to minimize the label loss of the source samples to avoid losing label-related information during feature extraction. In the second term, $G_f$ is trained to maximize the domain loss so that the extracted features are as irrelevant to the domain as possible, while $G_d$ is trained to minimize it so that the two domains can still be told apart, forming the adversarial game.
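In practice, this min-max optimization is commonly realized with the gradient reversal layer (GRL) used later in Section 3.3. The following PyTorch sketch shows a generic GRL in the style of DANN [32]; it is an illustrative implementation rather than the authors' exact code.

```python
import torch
from torch.autograd import Function

# Generic gradient reversal layer (GRL): identity in the forward pass,
# gradient multiplied by -lambda in the backward pass. Inserting it between
# G_f and G_d lets one backward pass minimize the domain loss w.r.t. theta_d
# while maximizing it w.r.t. theta_f, i.e., the min-max game above.
class GradReverse(Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Usage in the adversarial branch (G_f, G_d as in the earlier sketch):
#   features = G_f(x)
#   domain_logits = G_d(grad_reverse(features, lambd))
#   domain_loss = torch.nn.functional.cross_entropy(domain_logits, domain_labels)
```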
3.2.2. Weighting Learning Strategy
Theoretically, the domain shift can be diminished by optimizing the above objective. However, as intuitively illustrated in Figure 2, pure DA-based methods are prone to performance degeneration, or even misdiagnosis, because of the outlier classes. The proposed weighting learning strategy modifies the above methodology by incorporating weights that indicate the shared and outlier categories; it focuses on instances from the shared label space $\mathcal{Y}_t$ and downweights instances from the outlier label space $\mathcal{Y}_s\backslash\mathcal{Y}_t$. The proposed weighting strategy is expected to benefit pure DA approaches from two perspectives: class-level weighting $w^c$ and conditional distribution-level weighting $w^d$. Both represent weights of the source instances.
The output of $G_y$ provides a category probability distribution for the input features. Since the label space of $\mathcal{D}_t$ is disjoint from the outlier label space $\mathcal{Y}_s\backslash\mathcal{Y}_t$, the predicted probabilities of the target features on the outlier categories are relatively small. Therefore, we calculate the output probabilities of all the target instance features as follows:
$$
\hat{y}_i^t = G_y\big(G_f(x_i^t)\big),\quad i=1,\dots,n_t,
$$
where $\hat{y}_{i,j}^t$ denotes the output probability of the $i$th instance in $\mathcal{D}_t$ being assigned to the $j$th category. To mitigate the influence of randomness and a few mistakes, the average of the label predictions over all target data is used as the class-level weight:
$$
w^c = \frac{1}{n_t}\sum_{i=1}^{n_t}\hat{y}_i^t.
$$
The class-level weight is further normalized to reflect the relative importance of the classes. The weights associated with the outlier classes are expected to be much smaller than those of the shared classes, mainly because the target samples are significantly dissimilar to the samples belonging to the outlier classes.
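A short sketch of this class-level weighting step is given below; it reuses the `G_f` and `G_y` modules from the earlier sketch, and the max-normalization is one common choice in partial DA rather than necessarily the paper's exact normalization.

```python
import torch
import torch.nn.functional as F

# Class-level weight w^c: average the softmax predictions of G_y over all
# unlabeled target instances, then normalize. Outlier categories, to which
# target samples are rarely assigned, end up with small weights.
@torch.no_grad()
def class_level_weight(G_f, G_y, target_loader, device="cpu"):
    probs = []
    for x_t in target_loader:                      # batches of unlabeled target MPCs
        probs.append(F.softmax(G_y(G_f(x_t.to(device))), dim=1))
    w_c = torch.cat(probs, dim=0).mean(dim=0)      # shape: (num_classes,)
    return w_c / w_c.max()                         # normalized class-level weight
```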
In contrast to $w^c$, $w^d$ is a distribution-level weight estimated from the conditional distribution discrepancy. Inspired by the MMD index, which has been widely applied in transfer learning owing to its superior capacity for characterizing distribution similarity, this paper exploits the MMD as the metric to measure the distribution discrepancy of each category:
$$
d_k = \left\|\frac{1}{n_s^k}\sum_{x_i\in\mathcal{D}_s^k}\phi\big(G_f(x_i)\big)
-\frac{1}{n_t^k}\sum_{x_j\in\mathcal{D}_t^k}\phi\big(G_f(x_j)\big)\right\|_{\mathcal{H}}^2,
$$
where $\mathcal{H}$ represents the Reproducing Kernel Hilbert Space, $\phi(\cdot)$ is the corresponding feature mapping, and $\mathcal{D}_s^k$ and $\mathcal{D}_t^k$ denote the source instances of the $k$th category and the target instances assigned to the $k$th category by the label classifier, respectively. It is reasonable to expect that the $d_k$ corresponding to the shared categories is much smaller than that of the outlier categories. Therefore, on the basis of the MMD metric, $w^d$ is formulated so that the weight of each category is inversely related to its $d_k$, assigning larger weights to the categories with smaller conditional discrepancy.
Similar to $w^c$, $w^d$ also needs to be normalized. Benefiting from $w^c$ and $w^d$, more attention is paid to the shared working conditions, and the negative transfer caused by the outliers is mitigated.
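The sketch below illustrates one way to compute such a conditional distribution-level weight with a Gaussian-kernel MMD estimate; the pseudo-label assignment, the inverse mapping $1/(1+d_k)$, and the max-normalization are illustrative assumptions consistent with the description above, not the paper's exact formulas.

```python
import torch

def rbf_mmd2(fs, ft, sigma=1.0):
    """Biased squared-MMD estimate between two feature sets with an RBF kernel."""
    def k(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    return k(fs, fs).mean() + k(ft, ft).mean() - 2 * k(fs, ft).mean()

@torch.no_grad()
def conditional_weight(src_feats, src_labels, tgt_feats, tgt_pseudo, num_classes):
    # w^d: per-category MMD between source features and target features that
    # the label classifier assigns to that category (pseudo-labels).
    w_d = torch.zeros(num_classes)
    for c in range(num_classes):
        fs = src_feats[src_labels == c]
        ft = tgt_feats[tgt_pseudo == c]
        if len(fs) == 0 or len(ft) == 0:           # no matched target samples: likely outlier
            w_d[c] = 0.0
            continue
        w_d[c] = 1.0 / (1.0 + rbf_mmd2(fs, ft))    # smaller discrepancy -> larger weight
    return w_d / w_d.max().clamp(min=1e-8)         # normalized, analogous to w^c
```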
3.3. Overall Objective and Training
In the training process, the proposed diagnostic methodology diminishes the distribution discrepancy between the source and target domains. Meanwhile, in order to avoid the negative transfer caused by the source outlier instances, the methodology applies the trained classifier and the conditional MMD index to estimate the weights of the source data. According to Equation (7), the final objective loss function is summarized as
$$
\min_{\theta_f,\theta_y}\max_{\theta_d}\;
\frac{1}{n_s}\sum_{x_i\in\mathcal{D}_s} w^c_{y_i} w^d_{y_i}\, L_y\big(G_y(G_f(x_i)),y_i\big)
-\frac{\lambda}{n_s+n_t}\sum_{x_i\in\mathcal{D}_s\cup\mathcal{D}_t} L_d\big(G_d(G_f(x_i)),d_i\big),
$$
where $L_y$ and $L_d$ indicate the loss functions of the label classifier and the domain classifier, respectively. Generally, minimizing $L_y$ encourages the classifier to produce vectors with one dominant element denoting the label of the sample. This, in turn, enhances the performance of the feature extractor and helps to learn more transferable features for classification. Moreover, $w^c$ and $w^d$ are incorporated to highlight the importance of the samples belonging to the shared classes. Meanwhile, the domain classifier is trained to minimize $L_d$, while $G_f$ is driven to learn domain-invariant feature representations that confuse $G_d$. What is more, a gradient reversal layer (GRL) [32] is placed before the domain classifier to reverse the gradient of $L_d$ by multiplying it by a negative factor. The network is updated by employing the adaptive moment estimation (Adam) optimizer with the learning rate $\eta$ set to 0.001. The parameters $\theta_f$, $\theta_y$, and $\theta_d$ are updated simultaneously at each step as
$$
\theta_f \leftarrow \theta_f - \eta\left(\frac{\partial L_y}{\partial \theta_f} - \lambda\frac{\partial L_d}{\partial \theta_f}\right),\qquad
\theta_y \leftarrow \theta_y - \eta\frac{\partial L_y}{\partial \theta_y},\qquad
\theta_d \leftarrow \theta_d - \eta\frac{\partial L_d}{\partial \theta_d}.
$$
4. Industrial Experiments
In this section, we validate the performance of the proposed methodology on a set of DCs and MPCs collected from wells with the same mechanical parameters. Firstly, the MPCs converted from the DCs are evaluated through mechanistic analysis, comparison with actual MPCs, and several diagnostic experiments. We then compare the proposed diagnostic methodology with several other popular approaches to demonstrate the effectiveness of the improvements in partial transfer scenarios.
4.1. Data Collection
The experimental platform is illustrated in Figure 5. The DCs are collected by the load and displacement sensors installed on the “horse head” over a long period of time. The voltage and current of the inverter are collected and then processed by the ATT7022B chip to obtain the MPCs. The obtained MPCs are analyzed and stored in the intelligent automatic metering apparatus, after which the real-time diagnostic results and historical data can be viewed through a cell phone app.
After long-term practice, 300 groups of MPCs are collected from seven oil wells with the same mechanical parameters, as shown in Table 1. In particular, it should be noted that these 300 instances do not contain the gas-locking working condition; however, some DCs of gas locking are available for conversion. All simulations are implemented in the MATLAB and PyTorch frameworks and conducted on a workstation with a Core i7-9700K CPU @ 3.60 GHz and a GTX 2080 Ti GPU with 11 GB of memory.
4.2. Validation of the Converted Motor Power Curves
Combined with the mechanism characteristics, the analysis of the converted and actual MPCs under different working conditions is summarized as follows:
Normal working condition: Since the well is sufficiently filled, the loads of the upstroke and downstroke are relatively balanced under the influence of the crank and counterbalance weight, resulting in similar peaks for the upstroke and the downstroke in the MPC.
Traveling valve leakage: Due to the leakage, the oil in the sucker rod leaks into the pump during the upstroke, delaying the pressure increase and the opening of the standing valve. Therefore, the power of the upstroke is lower than under the normal working condition, and the first peak shifts to the left.
Insufficient liquid supply: Due to the insufficient supply capacity of the oil layer, the oil cannot fill the chamber during the upstroke. During the downstroke, the oil in the sucker rod falls quickly into the chamber when the traveling valve opens, which reduces the system load. The load increases rapidly when the plunger hits the liquid interface, resulting in double peaks in the power curve. The average value of the MPC is usually lower than the normal power.
Gas-affected: Similar to the insufficient liquid supply condition, the superabundant dissolved gas prevents the oil from completely filling the chamber, resulting in lower average power during the downstroke. In contrast to the double peaks of the insufficient liquid supply condition, the superabundant gas acts as a buffer when the traveling valve opens, so the power changes relatively smoothly, without a rapid increase.
Gas locking: This is a special case of the gas-affected condition. The gas in the chamber makes the pressure insufficient to open the standing and traveling valves, so the oil cannot be adequately discharged. Since no oil is lifted to the surface, the motor power curve takes negative values during the downstroke owing to the gravity of the oil in the sucker rod.
Parting rod: The motor load is mainly caused by the crank and the weight of the rod above the breakpoint. During the upstroke, the energy stored in the crank exceeds what is required to lift the remaining rod, resulting in apparent negative power in the MPC.
As illustrated in the above figures, the characteristics embodied in the converted MPCs under different working conditions conform to the mechanism analysis, and the converted and actual MPCs have similar trends.
To further validate the converted MPCs quantitatively, we select 100 DC samples for each working condition, convert them to MPCs, and use them to diagnose the 300 instances of actual MPCs. We conduct experiments with two diagnostic frameworks based on a CNN and a 1-D CNN to verify the effectiveness of the converted MPCs from the image and time-series perspectives, respectively. The diagnostic results are presented in Figure 12; both approaches achieve overall satisfactory performance. In particular, the 1-D CNN outperforms the CNN in terms of diagnostic accuracy, implying that the time-series-based approach is more suitable for MPCs than analyzing the curves as images. Nevertheless, these conventional diagnosis methods do not satisfy industrial demands owing to the distribution discrepancies. What is more, they also do not remove the interference of the outlier classes.
4.3. Diagnosis Based on Partial Domain Adaptation
In this section, the proposed PDA-based diagnostic method is employed to minimize the distribution discrepancy across domains in practical application scenarios. Firstly, we investigate the convergence performance of the proposed method to ensure that the improved portions do not undermine its overall stability.
Figure 13 plots the loss and accuracy curves with respect to the training iterations. From these results, we can observe that the classifier loss decreases rapidly and converges to 0. The adversarial loss oscillates back and forth, which demonstrates that the domain classifier and the label classifier progress together and maintain a relative balance. What is more, the accuracy curves ultimately approach 1, which further illustrates that the method mitigates negative transfer efficiently and stably.
Furthermore, we conduct an ablation study to examine the contribution of each component of the weighting scheme. By removing parts of the weighting strategy from the model, we investigate four variants of the proposed method: the DA-based method without the weighting learning strategy, the DA-based method with only class-level weighting, the DA-based method with only conditional distribution-level weighting, and the proposed method containing both class-level and conditional distribution-level weighting. The results shown in Figure 14 demonstrate that (1) both the class-level weighting and the conditional distribution-level weighting are essential for the good performance of the proposed method, and (2) although the conditional distribution-level weighting alone is inferior to the class-level weighting, it further enhances the performance when combined with it.
For a more comprehensive evaluation, we compare the proposed methodology with representative DA and PDA baselines, including a 1-D CNN, DANN [19], CIDA [33], WATN [26], and MWDAN [30]. We show the comparison results in Table 2 and provide visual insights into the distribution discrepancy of the distilled features with t-distributed stochastic neighbor embedding (t-SNE) [34] in Figure 15. In particular, the reported results are the average of five rounds of experiments with random data splitting in each round. The training data and test data for each algorithm are identical.
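A typical recipe for producing such t-SNE feature visualizations is sketched below with scikit-learn and matplotlib; it reuses the `G_f` module from the earlier sketches, and the perplexity and plotting details are illustrative assumptions rather than the paper's exact settings.

```python
import torch
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Embed the features extracted by G_f for both domains into 2-D with t-SNE
# and color them by class, marking source and target samples differently.
@torch.no_grad()
def plot_tsne(G_f, x_s, y_s, x_t, y_t_pred, path="tsne.png"):
    feats = torch.cat([G_f(x_s), G_f(x_t)], dim=0).cpu().numpy()
    emb = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(feats)
    n_s = len(x_s)
    plt.figure(figsize=(5, 5))
    plt.scatter(emb[:n_s, 0], emb[:n_s, 1], c=y_s.cpu().numpy(),
                marker="o", s=10, label="source")
    plt.scatter(emb[n_s:, 0], emb[n_s:, 1], c=y_t_pred.cpu().numpy(),
                marker="x", s=10, label="target")
    plt.legend()
    plt.savefig(path, dpi=300)
```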
From Table 2 and Figure 15, several observations can be made. From Figure 15a, we observe that the source samples under the various working conditions are aliased together and are thus difficult to distinguish. As shown in Figure 15b, the samples under different working conditions show a tendency to gradually become separable, but the aliasing phenomenon is still serious. As Figure 15c demonstrates, with the help of domain adaptation, the samples from the same category are clustered, while the samples from different categories are separated. Nevertheless, we find that many target samples are aligned with the samples of the source-only (outlier) categories, resulting in low overall accuracy. With the help of the class-level weighting strategy in CIDA and the weighting discriminant network in WATN, the distribution alignment is improved significantly, and fewer target samples are close to the outlier category in Figure 15d,e. As shown in Figure 15f, by using both a class-level weighting strategy and a weighting discriminant network, MWDAN further separates the categories and achieves satisfactory classification performance in both the source and target domains. This shows that the superposition of these two weighting strategies can achieve better results. Comparing Figure 15f,g, we observe that the outlier classes are better identified and less classification confusion occurs with the help of the proposed methodology. This reveals that the proposed conditional distribution-level weighting strategy outperforms the weighting discriminant network and cooperates better with the class-level weighting strategy to inhibit the negative effect of irrelevant instances. Overall, these results illustrate the superior performance of the proposed methodology in handling practical PDA problems in SRPSs.
5. Conclusions
This article proposes a PDA-based diagnostic methodology that exploits readily available DCs to carry out the otherwise intractable unsupervised MPC-based diagnosis of the SRPS. The proposed methodology constructs a mathematical model to convert DCs to MPCs under six working conditions. Meanwhile, a novel adversarial domain adaptation method is adopted to diminish the distribution discrepancy between the converted and actual MPCs as well as to avoid negative transfer. In particular, we superpose the class-level and conditional distribution-level weights to mitigate the negative transfer caused by the inconsistent label spaces of the MPCs and DCs. We conduct several fault diagnosis experiments on a set of actual MPCs collected by self-developed devices. The MPCs converted by the proposed model exhibit the same trends as the actual power curves, and the 1-D CNN model trained with the converted power achieves satisfactory accuracy on the actual MPC dataset. When leveraging the labeled converted MPCs to diagnose the actual unlabeled MPCs, the proposed PDA-based methodology achieves higher accuracy than the other relevant methods.
Although this diagnostic methodology achieves notable progress in the MPC-based diagnosis of SRPSs, additional work is worth exploring in the future: for example, the mathematical model of the motor and gearbox needs further refinement, and recurrent architectures, such as RNNs or LSTMs, could be attempted as the backbone of the feature generator network. Moreover, more experiments need to be conducted on other oil wells to make the model more generalizable.