1. Introduction
As an agricultural powerhouse, China feeds nearly 20% of the world’s population with only 9% of the world’s arable land [
1]. Against this backdrop, the level of agricultural mechanization in China has continuously improved alongside the rapid development of the national industrial level [
2]. Agricultural mechanization plays a crucial role in promoting agricultural modernization and sustainable development, making intelligent fault-diagnosis research in agricultural mechanization especially important [
3]. Agricultural machinery is widely used in various stages of modern agricultural production, including sowing, fertilization, tillage, and harvesting [
4]. Permanent magnet synchronous motors (PMSMs) are key power components in agricultural machinery. Owing to their high power density, high efficiency, and excellent control performance, they are widely applied in seeders, harvesters, electric tractors, spraying equipment, and tillage machinery [
5]. However, the harsh working environments faced by agricultural machinery—such as extreme temperatures, humidity, corrosion, and dust—along with complex and variable operating conditions—including multimodal vibrations and shocks, fluctuating loads, high loads, frequent starts and stops, and overload operations—pose significant challenges to the durability and safe operation of PMSMs, leading to faults [
6]. Common faults in PMSMs include short circuit faults, mechanical faults, and permanent magnet failures. Among these, the inter-turn short circuit (ITSC) fault is one of the most common short circuit faults in PMSMs [
7]. These faults not only pose significant hazards but are also difficult to detect. An ITSC fault creates a new closed loop at the short circuit point, generating a large fault current. This not only disrupts the magnetic field distribution in the air gap, increasing motor vibrations, but also produces a considerable amount of heat, further threatening the insulation of nearby windings. If not detected and addressed promptly, the fault can worsen, potentially causing the motor and agricultural machinery to lose control and resulting in serious losses [
8]. Early diagnosis of ITSC faults not only helps protect equipment and improve production efficiency but also reduces costs and ensures safety, making it highly significant.
Existing fault-diagnosis approaches mainly focus on model-based methods and data-driven approaches [
9]. Model-based methods are developed from the analysis of signal features in different domains, primarily time-domain, frequency-domain, and time-frequency-domain features. These methods demonstrate high accuracy when agricultural machinery operates under stable conditions; however, their effectiveness is often limited under dynamic operating conditions. Data-driven methods are typically used for more complex agricultural machinery fault-diagnosis problems. They rely on machine learning (ML) algorithms to achieve fault recognition and classification. Commonly used algorithms include artificial neural networks (ANNs), random forests (RFs), extreme learning machines (ELMs), and support vector machines (SVMs). Nevertheless, these methods share a common drawback: their effectiveness largely depends on the quality of the extracted fault features, which are typically extracted manually. This process requires substantial domain expertise, involves a degree of subjectivity, and can be time-consuming.
Deep learning models possess the self-learning capability of distributed features, enabling them to automatically abstract and extract the relationships and hierarchical structures among vast amounts of data [
10]. In the application of fault diagnosis, deep learning can achieve end-to-end feature extraction and fault classification, effectively overcoming the drawbacks of the aforementioned methods [
11]. Xu et al. addressed identifying and diagnosing faults in a tractor’s transmission system [
12]. They proposed a fault-diagnosis model that combines transformer networks and time generative adversarial networks (Time GANs), achieving high accuracy in fault diagnosis. Xie et al. applied deep learning to the fault diagnosis of rolling bearings in agricultural machinery [
1]. Their method employs energy spectrum and singular value decomposition for noise reduction of the vibration signals and then combines ResNet and Vision Transformer to achieve fault diagnosis of the bearings. Lee et al. proposed a method for diagnosing ITSC faults using current signals and rotational speed signals. This approach employs a recurrent neural network (RNN) for fault-feature extraction and incorporates an attention mechanism to assess the severity of ITSC faults [
13]. Zhu et al. proposed an intelligent fault-diagnosis method based on principal component analysis (PCA) and deep belief networks (DBNs), which, according to experimental results, is more effective and easier to implement compared to other approaches [
14]. Zhu et al. applied a novel capsule network for bearing fault diagnosis, integrating deep learning and short-time Fourier transform (STFT) to convert one-dimensional signals into time-frequency maps, with validation results demonstrating that this model outperforms traditional methods in terms of generalization ability [
15]. Husari et al. proposed a hybrid model architecture for diagnosing early ITSC faults, which uses current as input, employs a CNN for feature extraction, and utilizes long short-term memory (LSTM) networks and gated recurrent units (GRUs) for fault diagnosis and severity identification, achieving an accuracy exceeding 97%, thereby outperforming single deep learning models [
16,
17].
From the above analysis, it is evident that although deep learning-based methods for ITSC fault diagnosis have shown promising results, such as fault recognition accuracy exceeding 97%, an examination of the confusion matrix on the validation set reveals that the model still faces issues related to false alarms and missed detections [
18]. This indicates that there is still significant room for improvement in the model’s feature-extraction capability. In motor fault diagnosis, a single type of signal often fails to capture all the characteristic information about the motor’s operating condition. Sufficiently rich features are a necessary condition for achieving high recognition accuracy in fault diagnosis [
19]. Recent studies have shown that diagnostic systems using multi-sensor resources and sensor fusion technologies can provide superior and more robust diagnostic results. Based on this analysis, to further improve the accuracy of ITSC fault diagnosis and reduce the model’s false alarm and missed detection rates, this paper proposes a CNN model based on multi-source data fusion for identifying the severity of ITSC faults. This method employs both current and vibration signals for feature learning to enhance the richness of the fault-feature space and uses the fused features for subsequent fault diagnosis to improve the accuracy of the model’s fault classification. The main contributions of this paper are summarized as follows:
- (1)
An indicator suitable for early-stage ITSC fault-severity analysis is derived from the equivalent circuit. This indicator cannot be directly used for fault diagnosis of ITSC, but it can serve as a guide for setting the fault severity during the experimental process.
- (2)
A feature-level multi-source data-fusion algorithm based on CNNs is proposed. This algorithm employs Bayesian optimization for model hyperparameter tuning, fusing current and vibration signal features to enhance the richness of the fault-feature space, thereby improving the accuracy of ITSC fault-severity identification.
- (3)
A signal synchronization method is proposed to construct a synchronized signal dataset for current and vibration signals. By calculating the maximum cross-correlation of synchronized signals collected by two devices, synchronization of the current and vibration signals acquired by both devices is achieved.
- (4)
The effectiveness of the proposed multi-source data-fusion algorithm is validated through experiments. The experimental results indicate that compared to three other methods, the proposed algorithm not only demonstrates higher training efficiency but also superior model performance, highlighting its advantages.
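As a rough illustration of the synchronization idea in contribution (3), the sketch below estimates the sample offset between two recordings of a shared reference signal by maximizing their cross-correlation. It assumes that both acquisition devices record the same reference waveform at the same sampling rate; the function names `best_lag` and `align` and the toy data are hypothetical and are not taken from the paper.

```python
def best_lag(ref, sig, max_lag):
    """Return the lag (in samples) of `sig` relative to `ref` that maximizes
    the cross-correlation, searching lags in [-max_lag, max_lag]."""
    def xcorr(lag):
        total = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(sig):
                total += r * sig[m]
        return total
    return max(range(-max_lag, max_lag + 1), key=xcorr)


def align(ref, sig, lag):
    """Trim the leading samples of whichever recording started earlier,
    then cut both recordings to a common length."""
    if lag >= 0:
        sig = sig[lag:]
    else:
        ref = ref[-lag:]
    n = min(len(ref), len(sig))
    return ref[:n], sig[:n]


# Toy data (hypothetical): the second device sees the same pulse 2 samples later.
ref = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
sig = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]
lag = best_lag(ref, sig, max_lag=4)
ref_aligned, sig_aligned = align(ref, sig, lag)
```

Once the lag is known, the same trimming can be applied to the current and vibration channels recorded by the respective devices.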
The remainder of this paper is arranged as follows.
Section 2 analyzes the mechanism of ITSC faults.
Section 3 introduces the details of the proposed algorithm.
Section 4 describes the experimental equipment used and the settings required for simulating fault tests. In
Section 5, the experimental results are presented to demonstrate the superiority of the proposed algorithm. Finally,
Section 6 summarizes this article.
2. ITSC Fault in PMSMs
Estimating ITSC faults is crucial to ensure the safe operation of PMSMs, primarily because it enhances the safety and reliability of the motor, reduces potential safety hazards such as fires, and effectively protects the equipment from damage, thus avoiding costly repairs [
20]. Previous research has lacked effective indicators specifically designed for the early diagnosis of ITSC faults. This paper presents an equivalent circuit model for ITSC faults based on the winding coil structure, and on this basis, derives an indicator to guide the setting of early ITSC fault severity in experiments.
The winding of a PMSM is typically composed of multiple coils arranged in series or parallel. This paper primarily studies the winding structure of multiple coils in series. To ensure the uniformity of the air gap magnetic field, reduce harmonic content, and improve the efficiency and performance of the motor, the winding of the PMSM generally adopts a distributed winding structure. The coils are wound into appropriate shapes and installed in two stator slots with a certain spacing between them. When a coil in a certain slot experiences an ITSC fault, the related turns in the corresponding slot are also affected by the ITSC fault, as shown in
Figure 1.
Figure 1 shows a cross-sectional view illustrating the winding structure of an 8-pole, 36-slot PMSM with an ITSC fault. Each turn of wire within the slot is uniquely marked in the form of Pc-t. For example, A1–4 indicates that the fourth turn of the first coil in the A winding has experienced an ITSC fault. The red section in
Figure 1 indicates the location of the ITSC fault, while the corresponding enlarged view shows the position and number of the shorted wire turns within the slot.
The ITSC fault in the winding changes the structure of the faulty phase winding in a PMSM. At the shorted point, the faulty phase winding is divided into a faulty part and a healthy part. Parameters related to the faulty phase winding, such as resistance, inductance, and magnetic flux, change accordingly. Additionally, the shorted point creates a new closed loop; if the resistance at that point is low enough, the fault current can increase significantly and generate a large amount of heat. If this heat is not dissipated in time, it may further damage the adjacent insulation, exacerbating the severity of the fault. Changes in the winding structure and the presence of fault current can lead to an imbalance in the air-gap magnetic field and introduce higher harmonic components, thereby exacerbating the unbalanced magnetic pull between the stator and rotor, which in turn increases motor vibrations [
21,
22,
23]. Additionally, the disruption of the current balance among the windings results in larger torque fluctuations, further intensifying the motor’s vibration [
24].
Therefore, ITSC faults not only cause changes in the three-phase current but also exacerbate motor vibrations and alter their vibration characteristics [
25]. This means that the fault features are reflected in both the current and vibration signals. The characteristic information contained in different signals varies, and their sensitivity to the motor’s operating conditions is also different [
26,
27]. To enrich the extracted feature space and improve the accuracy of ITSC fault identification, this study employs both current and vibration signals for feature learning, thereby increasing the diversity of the feature space. The fused features are then used for subsequent fault diagnosis to enhance the accuracy of the model’s fault classification.
Assuming that the first coil of winding A experiences an ITSC fault, the schematic of the equivalent circuit model is shown in Figure 2. From the figure, it can be observed that after the fault occurs, the faulty phase winding is divided into two parts: the yellow part represents the shorted section, while the green part represents the remaining healthy section. The red part indicates the newly formed closed loop at the short circuit point and its fault current. In the fault current loop, the fault phase current ia is divided into two components: one is the current if flowing through the fault resistance Rf, and the other is the current ia − if flowing through the shorted winding.
Let Ns be the number of turns that are shorted in the A-phase winding, Nt be the number of turns in each coil, and Nc be the number of coils in each phase winding. The proportion of the shorted turns relative to a single coil and relative to the whole phase winding can be expressed as follows:
$$\eta = \frac{N_s}{N_t}, \qquad \mu = \frac{N_s}{N_c N_t} = \frac{\eta}{N_c} \tag{1}$$
where η represents the proportion of the shorted turns in the coil relative to the total number of turns in that coil, and μ represents the proportion of the shorted turns in the faulty phase relative to the total number of turns in that phase winding. Based on the above analysis, the equivalent circuit model is derived as follows:
where
In Equation (2), Raf, Rah, Rf, Rb, and Rc indicate the resistance of the shorted part of the faulty phase winding A, the resistance of the remaining healthy part, the fault resistance at the shorted point, the resistance of phase winding B, and the resistance of phase winding C, respectively. The currents ia, ib, ic, and if are those flowing through the A, B, and C phase windings and through the fault resistance at the shorted point, respectively. Lah, Laf, Mahf, Mahb, Mafb, Mahc, and Mafc denote the self-inductance of the healthy portion and of the shorted portion of the faulted phase winding A, the mutual inductance between these two portions, the mutual inductances between each of these two portions and phase B, and the mutual inductances between each of these two portions and phase C, respectively. Ψfah, Ψfaf, Ψfb, and Ψfc represent the permanent magnet flux linkage of the healthy portion of the faulted phase winding A, the permanent magnet flux linkage of the faulted portion, the permanent magnet flux linkage of the phase B winding, and the permanent magnet flux linkage of the phase C winding, respectively.
The resistance and permanent magnet flux linkage are directly proportional to the number of turns in the winding; therefore, the various parts of the faulted phase winding can be expressed as:
$$R_{af} = \mu R_a, \quad R_{ah} = (1 - \mu) R_a, \quad \Psi_{faf} = \mu \Psi_f, \quad \Psi_{fah} = (1 - \mu) \Psi_f \tag{3}$$
where Ra and Ψf stand for the phase resistance and permanent magnet flux linkage of the phase A winding in the healthy state.
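To make the turn-ratio definitions concrete, the following minimal sketch computes η and μ from the turn counts and splits a turn-proportional phase quantity (such as Ra or Ψf) into its shorted and healthy parts. The function names and the numerical values are hypothetical; only the proportionality to the number of turns is taken from the text.

```python
def short_ratios(n_s, n_t, n_c):
    """Return (eta, mu): shorted turns relative to one coil and to the phase."""
    eta = n_s / n_t            # share of the faulty coil that is shorted
    mu = n_s / (n_t * n_c)     # share of the whole phase winding that is shorted
    return eta, mu


def split_by_turns(value, mu):
    """Split a turn-proportional phase quantity into (shorted, healthy) parts."""
    return mu * value, (1.0 - mu) * value


# Hypothetical motor: 4 shorted turns, 20 turns per coil, 6 coils per phase.
eta, mu = short_ratios(n_s=4, n_t=20, n_c=6)
r_af, r_ah = split_by_turns(1.2, mu)   # assumed healthy phase resistance of 1.2 ohm
```

Because only faults within a single coil are considered, `n_s` should not exceed `n_t`, so η stays within [0, 1] and μ within [0, 1/Nc].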
Since the focus of the study is on the early fault diagnosis of ITSCs, only the case of faults occurring within a single coil is considered. To streamline the analysis, it is also assumed that the ITSC fault affects the other phase windings symmetrically. The mutual inductance between coils within the same phase winding is ignored [
28]. The relationship between the inductance of different parts of the winding and the degree of the shorted ratio can be expressed as [
29]:
According to Kirchhoff’s current law, the sum of currents flowing into the same node is zero; thus, we can conclude the following:
$$i_a + i_b + i_c = 0 \tag{5}$$
By substituting Equations (3) and (4) into (2) and combining with Equation (5), the expression for the fault current can be derived as follows:
From Equation (4), it can be calculated that μ(Laf + Mahf) − Laf = 0. In the early stages of an ITSC fault, the amplitude of vn can be considered negligible compared to that of va; thus, va ≈ va − vn. Let va = Va sin(ωt). An approximate expression for the fault current amplitude can then be derived as follows:
It is widely recognized that the voltage amplitude of the stator winding in a PMSM has a positive correlation with the motor speed [
30]. Then, Equation (7) can be represented as follows:
where K represents a constant coefficient that can be considered a known quantity, while ωr denotes the mechanical speed of the rotor.
In Equation (8), K is a known quantity, and Ra is an inherent parameter of the motor that can also be regarded as known. The remaining parameters directly affect the fault current if. The parameters μ and Rf are both related to the severity of the ITSC fault, whereas ωr is independent of the fault severity. Dividing both sides of Equation (8) by ωr leaves only the parameters related to the severity of the ITSC fault on the right side, expressed as follows:
The right side of the equation contains only known quantities and parameters related to the severity of the ITSC fault. We define the right-side expression as the fault-severity indicator, denoted by the symbol FI. When the motor is in a healthy state, this indicator is 0; conversely, when the faulted phase winding of the motor is completely shorted and the fault resistance at the shorted point is 0, the indicator becomes infinite. The left side of the equation represents the ratio of the magnitude of the fault current to the mechanical speed of the rotor. From the derivation process, it can be seen that this indicator is only applicable for analyzing the fault severity in the early stages of ITSC faults, during which the indicator is generally unaffected by speed. It increases as the fault resistance Rf decreases or the shorted ratio μ increases, and vice versa. In practice, it is difficult to measure Rf and μ directly during the motor’s operation. Therefore, this indicator is not suitable for the direct estimation of ITSC fault severity. However, it can be used as a fault-severity indicator in experiments, guiding the setting of fault levels for ITSC faults.
3. Proposed Algorithm
3.1. Fault-Diagnosis Methods Based on Multi-Source Data Fusion
Multi-source data fusion, also known as information fusion, is a technology that enables the automated processing of integrated information. This technology originated in the military domain and has gradually been widely applied in the civilian sector after years of development. Today, it has achieved rapid advancements in various fields, including robot control, autonomous driving, and fault diagnosis [
31]. The application of data fusion in fault diagnosis relies on the research object’s ability to collect information from multiple types of sensors. Additionally, to achieve a comprehensive analysis and accurate assessment of the fault status, it is necessary to utilize various signal processing techniques to obtain a rich feature space. By combining different intelligent algorithms, multi-level data fusion can be achieved, leading to the final assessment results.
According to the different levels of data fusion, data-fusion methods can be classified into three categories: data-level fusion, feature-level fusion, and decision-level fusion. When selecting different fusion levels, it is necessary to consider the balance between fusion performance and implementation cost [
11].
Data-level fusion, also known as pixel-level fusion, refers to the direct integration of raw data, which maximally preserves the original information and exhibits superior fusion performance. Xia et al. proposed a CNN-based fault-diagnosis method for rotating machinery that combines sensor fusion with spatiotemporal information to achieve automatic feature extraction [
32]. Chen et al. proposed a fault-diagnosis method for gearboxes based on DCNN, which integrates the raw data of vertical and horizontal vibration signals to achieve automatic feature extraction [
33]. However, data-level fusion models do not possess the ability to correct errors, and their performance is poor when the sensor types are different or when there are significant differences in magnitude.
Feature-level fusion can achieve a more refined integration of information through the dimensionality reduction of the data. Compared to data-level fusion, feature-level fusion offers higher robustness, less information redundancy, greater flexibility, and better real-time performance, allowing for a more comprehensive representation of the data’s characteristics. Azamfar et al. proposed a novel two-dimensional CNN architecture that integrates features extracted from multiple current sensors to monitor gearbox faults under different operating conditions and speeds [
34]. Parai et al. proposed a feature-level fusion method for circuit fault diagnosis, which employs wavelet analysis for fault-feature extraction from multiple signals and uses PCA for feature fusion. Ultimately, circuit fault-type diagnosis is achieved based on a support vector machine [
9]. In recent years, the combination of feature-level fusion and deep learning models has gradually become a common approach to achieve better diagnostic results.
Decision-level fusion typically occurs after each model or sensor completes its processing independently, followed by a decision merger. This approach has a high degree of information integration and lower computational complexity, making it suitable for various signal sources or data types. Common decision-level fusion methods include the Dempster–Shafer (D-S) theory, decision tree fusion, and weighted voting. However, the complexity of decision-level fusion methods is relatively high, with significant loss of original data and various challenges in choosing fusion strategies, which is why they are less frequently used in conjunction with deep learning models.
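For comparison with the feature-level approach adopted later, weighted voting, one of the decision-level methods named above, can be sketched as follows. This is only an illustrative example: the two per-signal classifiers, their class probabilities, and the weights are hypothetical and are not part of the proposed algorithm.

```python
def weighted_vote(probabilities, weights):
    """Fuse per-model class probabilities by a normalized weighted average.

    probabilities: one list of class probabilities per model
    weights:       one non-negative weight per model
    Returns (index of the winning class, fused probabilities).
    """
    n_classes = len(probabilities[0])
    total = sum(weights)
    fused = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=fused.__getitem__)
    return label, fused


# Hypothetical outputs of a current-based and a vibration-based classifier
# over three fault-severity classes; the vibration model is trusted twice as much.
current_probs = [0.6, 0.3, 0.1]
vibration_probs = [0.2, 0.7, 0.1]
label, fused = weighted_vote([current_probs, vibration_probs], [1.0, 2.0])
```

Each model decides independently here, so any information shared between the raw current and vibration signals is lost before fusion, which is the drawback the paragraph above points out.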
Based on the advantages and disadvantages of different levels of multi-source data fusion described earlier, and in conjunction with the content of this study, a multi-source data-fusion algorithm based on a CNN model has been proposed. This algorithm is applied to the identification of the severity of ITSC faults, aiming to enhance feature learning from various types of sensor information. By enriching the feature space and integrating multi-faceted fault characteristics, the algorithm seeks to improve the diagnostic accuracy of ITSC fault identification under various complex operating conditions.
3.2. Convolutional Neural Networks
A CNN is a significant type of deep neural network that can achieve end-to-end fault classification by automatically extracting local features [
35]. Compared to traditional artificial neural networks, a CNN has convolutional layers that feature “local connectivity” and “weight sharing”, which substantially decreases the number of parameters in the model and lowers the training difficulty [
36]. CNN models typically consist of multiple hidden layers that can automatically extract various features from the input signals. The lower hidden layers focus on learning the basic characteristics of the input signals, while the higher layers abstract and re-extract these basic features to form more complex high-dimensional features, resulting in more accurate classification [
37].
The convolutional layer typically needs to work in conjunction with functional layers such as pooling layers, normalization layers, activation layers, and dropout layers to enhance the feature-extraction capability of the convolutional module. A typical structure of a convolutional module is shown in
Figure 3a. In this module, the convolutional layer filters out redundant information from the input signals through convolution operations, reinforcing important features related to fault classification. The pooling layer usually follows the convolutional layer to perform feature dimensionality reduction while maintaining the translation invariance of the features. The activation function accelerates the convergence of the model and helps mitigate the vanishing gradient problem to some extent. The dropout layer is primarily used to prevent overfitting, thereby improving the model’s generalization ability.
In this paper, to maintain the high resolution of the extracted fault features, reduce downsampling operations, and enhance the model’s ability to capture multi-level complex information, a dilated CNN is employed. Its structural diagram is illustrated in
Figure 3b, and the expression for dilated convolution is shown in Equation (10) [
38].
$$F(x) = (S *_d f)(x) = \sum_{i=0}^{k-1} f(i) \cdot S_{x - d \cdot i} \tag{10}$$
where F(x) stands for the dilated convolution operation. The input signal S ∈ R^n is convolved using the operator *, with x indicating the specific element of the input signal involved in the convolution. The dilation factor is denoted by d, while f: {0, 1, …, k − 1} → R represents the set of weight values applied during the convolution. The parameter k defines the size of the weight matrix, and x − d·i indicates the i-th element of the input signal undergoing the convolution operation.
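The dilated convolution described above can be illustrated with a short one-dimensional sketch. It sums the weights f(i) against the input elements at positions x − d·i; treating positions before the start of the signal as zero is an assumption made here for simplicity, and the function name `dilated_conv` is hypothetical.

```python
def dilated_conv(signal, weights, d):
    """1-D dilated convolution: out[x] = sum_i weights[i] * signal[x - d*i].

    Positions with x - d*i < 0 are treated as zero (assumed padding).
    """
    k = len(weights)
    out = []
    for x in range(len(signal)):
        acc = 0.0
        for i in range(k):
            j = x - d * i
            if j >= 0:
                acc += weights[i] * signal[j]
        out.append(acc)
    return out


signal = [1, 2, 3, 4, 5]
dense = dilated_conv(signal, [1, 1], d=1)    # neighbours 1 sample apart
dilated = dilated_conv(signal, [1, 1], d=2)  # neighbours 2 samples apart
```

With k weights and dilation d, each output spans (k − 1)·d + 1 input samples, so the receptive field grows with d without adding weights or downsampling.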
3.3. The Attention Mechanism
Due to their significant impact on deep learning models, attention mechanisms have garnered widespread attention in recent years. The core of this mechanism lies in adjusting weights to guide the model in filtering out redundant information that is irrelevant to the task, thereby focusing attention on the features that are more critical for achieving the task objectives [
35]. In CNN models, common attention mechanisms include channel attention, spatial attention, and hybrid attention. This paper employs a channel attention mechanism to adjust the weights of input features extracted from different signals under varying fault levels and operating conditions, reallocating the model’s attention and enhancing the contribution of each channel feature to improve the performance of the ITSC fault-diagnosis model.
The typical channel attention mechanism is known as SENet (Squeeze and Excitation Network), and its core structure mainly includes three steps: squeeze, excitation, and scaling, as illustrated in
Figure 4 [
39]. During the training process of the fault-diagnosis model, the feature weights extracted by the model are adjusted and redistributed after undergoing the squeeze and excitation operations of SENet. The weights of features that are relevant and sensitive to fault level classification are enhanced, while irrelevant features are suppressed or weakened. The core three steps of SENet can be expressed as follows:
In SENet, the first step involves applying a squeeze operation to the information from each input channel, which sets the stage for adjusting the weights of the different channels. Assuming the input feature map has dimensions W × H × C, where × denotes scalar multiplication, this operation converts the input features into global features of size 1 × 1 × C through global average pooling. In Equation (11), Fsq denotes the squeeze operation, uc represents the entire input feature map, and Zc refers to the global features obtained after the squeeze operation.
The second step is the excitation operation, which aims to capture the relationships between the features input from different channels. The key step of the excitation operation is to input the global features of size 1 × 1 × C obtained from the squeeze operation into a fully connected layer of dimension C ÷ r × C, where r represents the scaling factor, primarily used to reduce the number of channels, which in turn lowers the computational complexity of the model. The output from the previous step is passed through a ReLU activation layer to a second fully connected layer, where the number of feature channels is restored. Subsequently, the output feature information is normalized to the range of (0, 1) through a Sigmoid activation layer, completing the readjustment of the weights for the relevant fault features across different channels. In Equation (11), S represents the adjusted weights of the fault features for each channel, Fex denotes the excitation operation, while W1 and W2 represent the operations of the two fully connected layers. σ and δ correspond to the ReLU activation layer and the Sigmoid activation layer, respectively.
The third step is the scaling operation, which multiplies the fault features in each channel by the redistributed weights to recalibrate the fault features and complete the overall adjustment of the attention mechanism. In Equation (11), the output of this step is the feature map after channel attention adjustment, and Fscale denotes the scaling operation.
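The three steps described above can be sketched in plain Python as follows. This is a minimal illustration rather than the paper's implementation: each channel is stored as a flattened list of its W × H values, biases are omitted, and the weight matrices `w1` and `w2` are hypothetical values chosen by hand for a reduction factor r = 2.

```python
import math


def se_block(channels, w1, w2):
    """Squeeze-and-excitation over C channels.

    channels: list of C lists, each a flattened W*H feature map
    w1:       (C // r) x C weights of the first fully connected layer
    w2:       C x (C // r) weights of the second fully connected layer
    Returns (channel weights s, rescaled channels).
    """
    # Squeeze: global average pooling gives one value per channel.
    z = [sum(ch) / len(ch) for ch in channels]
    # Excitation: FC (reduce) -> ReLU -> FC (restore) -> Sigmoid.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
         for row in w2]
    # Scale: multiply every value of a channel by that channel's weight.
    scaled = [[sc * v for v in ch] for sc, ch in zip(s, channels)]
    return s, scaled


channels = [[1.0, 3.0], [2.0, 2.0], [0.0, 0.0], [4.0, 0.0]]   # C = 4
w1 = [[0.5, 0.0, 0.0, 0.0], [0.0, -1.0, 0.0, 0.0]]            # 2 x 4
w2 = [[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [2.0, 0.0]]        # 4 x 2
s, scaled = se_block(channels, w1, w2)
```

In a trained network the two weight matrices are learned together with the convolutional layers, so the channel weights adapt to the fault level and operating condition of each input.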
3.4. Bayesian Optimization Algorithm
The CNN-based ITSC fault-diagnosis model features a flexible and variable structure, requiring numerous hyperparameters. Different combinations of hyperparameters can significantly impact the model’s training efficiency and final validation accuracy. To achieve a robust ITSC fault-diagnosis model, it is essential to optimize various hyperparameter combinations to identify the best set. Common optimization algorithms include grid search, random search, and Bayesian optimization. Random search and grid search are methods of random enumeration and exhaustive search, respectively, which can lead to a degree of blindness in the optimization process [
40]. This may result in a significant waste of computational resources on unsuitable hyperparameter combinations. Consequently, under limited computational resources, these two methods often struggle to yield satisfactory results without extensive prior experience. In contrast to these two algorithms, Bayesian optimization is a global optimization technique that employs a sequential search process [
41]. It effectively utilizes the prior information of known data points to autonomously adjust its optimization strategy. Therefore, this paper adopts Bayesian optimization to optimize the hyperparameters of the ITSC fault-diagnosis model.
Bayesian optimization is a global optimization strategy named after Bayes’ theorem (as shown in Equation (12)), which is used in its framework. This algorithm constructs a probabilistic model of the objective function to effectively select the most promising evaluation points in each iteration, making it particularly suitable for scenarios where the objective function is expensive or difficult to evaluate.
$$p(f \mid D_{1:t}) = \frac{p(D_{1:t} \mid f)\, p(f)}{p(D_{1:t})} \tag{12}$$
where f represents the objective function, which is typically difficult to express directly in functional form; in the fault-diagnosis model, it reflects the overall performance of the model. D1:t denotes the sample points of the hyperparameter combinations to be optimized, with the number of sample points being t.
The process of hyperparameter tuning for the ITSC fault-diagnosis model based on Bayesian optimization mainly includes the following key steps:
- (1)
Initialization
Select several initial values as the starting points for Bayesian optimization based on the given hyperparameter combinations and the value ranges for each hyperparameter.
- (2)
Constructing the probabilistic model
Establish a probabilistic model for the objective function using a Gaussian process model based on the evaluated sample points. This model can predict the objective function values and their corresponding uncertainties for untested points.
- (3)
Selecting evaluation points
Consider both the predicted values and uncertainties of the model and select the next set of promising evaluation points from the probabilistic model using the expected improvement acquisition function.
- (4)
Objective function evaluation
Evaluate the objective function at the newly obtained sample points and compute the function values.
- (5)
Updating the probabilistic model
Combine the results from the new evaluation points with previous data to update the probabilistic model, thereby enhancing its accuracy. Repeat steps (3) to (5) until the specified termination criteria are met. Finally, select the hyperparameter combination corresponding to the model with the best performance as the output result.
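The loop formed by these steps can be sketched for a single hyperparameter as follows. This is a toy illustration under several assumptions that do not come from the paper: the Gaussian process uses a fixed RBF kernel with hand-picked length scale and noise, the search space is a small list of candidate values, and the objective is a cheap stand-in for the validation accuracy of a trained model. All function names are hypothetical.

```python
import math
import random


def rbf(a, b, length=0.15):
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 1e-12))


def expected_improvement(mean, std, best):
    """Expected improvement over `best` when maximizing the objective."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayes_opt(objective, candidates, n_init=3, n_iter=10, seed=0):
    rng = random.Random(seed)
    xs = rng.sample(candidates, n_init)            # step (1): initial points
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        remaining = [c for c in candidates if c not in xs]
        if not remaining:
            break
        best = max(ys)
        # steps (2)-(3): fit the GP on all data and pick the point with highest EI
        x_next = max(remaining, key=lambda c: expected_improvement(
            *gp_posterior(xs, ys, c), best))
        xs.append(x_next)
        ys.append(objective(x_next))               # steps (4)-(5): evaluate, update data
    i = max(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]


# Stand-in objective (assumption): peaks at 0.3, mimicking a validation score.
candidates = [i / 20 for i in range(21)]
toy_score = lambda x: -(x - 0.3) ** 2
x_best, y_best = bayes_opt(toy_score, candidates)
```

In the setting of this paper, each call to the objective would correspond to training the CNN with one hyperparameter combination and returning its accuracy, which is why limiting the number of evaluations matters.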
The hyperparameter tuning process of the ITSC fault-diagnosis model is illustrated in Figure 5. As shown in the figure, the hyperparameter adjustment for the entire fault-diagnosis model is divided into two parts. One part is the construction and training of the fault-diagnosis model, as indicated in the black box. When the model meets the termination criteria for training, the final testing accuracy is passed to the Bayesian optimization process. The other part is hyperparameter optimization based on the Bayesian optimization algorithm, as depicted in the green box. Here, Bayesian optimization is responsible for finding the optimal hyperparameter combination, which mainly involves initializing and then optimizing the hyperparameter combinations. Model training and hyperparameter optimization alternate until the termination criteria for optimization are met. Ultimately, the optimal hyperparameter combination and its validation accuracy are selected as the output results.
3.5. Multi-Source Data-Fusion Algorithm Based on Bayesian Optimization and a CNN
The above analysis indicates that the occurrence of ITSC faults not only affects the three-phase currents of the windings but also alters the distribution of the air gap magnetic field. This leads to the generation of unbalanced radial magnetic forces within the motor, which exacerbates the production of vibration signals and alters the vibration characteristics of the motor. Therefore, this paper proposes a fault-diagnosis method for ITSCs based on the fusion of three-phase current signals and vibration signals. This method utilizes both three-phase current signals and vibration signals as sources for fault diagnosis. Using a CNN model, it extracts features from the current and vibration signals separately and then performs a feature-level fusion of the two types of signals. The channel attention mechanism is employed to adjust the weights of the fault features from different signals and channels. Finally, to improve the training efficiency of the model and enhance its performance, Bayesian optimization is used to fine-tune the training hyperparameters of the model.
The flowchart of the entire process is shown in Figure 6, which can be divided into five specific steps.
- (1)
Data collection. Experiments are conducted with varying degrees of ITSC faults, collecting three-phase current signals and vibration signals simultaneously under different operating conditions.
- (2)
Dataset preparation. The collected current and vibration signals undergo data synchronization, followed by a series of preprocessing steps including filtering, normalization, downsampling, data slicing, and grouping. Ultimately, the processed current and vibration signals are organized into a dataset suitable for a multi-source data-fusion model.
- (3)
Model construction and initialization. This study employs a multi-stream high-level feature fusion model based on CNNs. The two types of signals are fed into different CNN branches for feature extraction, and then the extracted features are fused at the high level of the network. Fault-severity recognition is achieved through fully connected layers and classification layers. The model’s training hyperparameters are initialized using Bayesian optimization.
- (4)
Model training and optimization. The dataset constructed in Step 2 and the multi-source data-fusion model built in Step 3 are used for training. The performance of the trained model is evaluated using a test set, and the Bayesian optimization algorithm updates the hyperparameter combinations based on the results, repeating the process until the optimization iterations or model performance reaches the termination criteria.
- (5)
Output results. When the model reaches the optimization termination condition, the best ITSC model is selected for output, which includes the optimal hyperparameter combinations and the corresponding accuracy results for ITSC fault diagnosis.
Figure 6.
Flowchart of ITSC fault-diagnosis algorithm based on multi-source data fusion.
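The feature-level fusion with channel attention described in this section can be illustrated with a small NumPy sketch. It assumes a squeeze-and-excitation style attention block, which is one common form of channel attention and is not confirmed by the text as the exact variant used here. The feature maps, their shapes, and the weight matrices `w1` and `w2` are hypothetical placeholders standing in for the outputs of the two trained CNN branches.

```python
import numpy as np

def channel_attention(features, w1, w2):
    """Reweight (channels, length) features with squeeze-and-excitation style weights."""
    squeezed = features.mean(axis=1)                  # global average pool per channel
    hidden = np.maximum(0.0, w1 @ squeezed)           # bottleneck layer with ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid: one weight per channel
    return features * weights[:, None], weights

def fuse(current_feat, vibration_feat, w1, w2):
    """Feature-level fusion: concatenate branch outputs along the channel axis."""
    stacked = np.concatenate([current_feat, vibration_feat], axis=0)
    return channel_attention(stacked, w1, w2)

rng = np.random.default_rng(0)
current_feat = rng.normal(size=(8, 32))     # hypothetical output of the current branch
vibration_feat = rng.normal(size=(8, 32))   # hypothetical output of the vibration branch
w1 = 0.1 * rng.normal(size=(4, 16))         # untrained placeholder weights (reduction 4)
w2 = 0.1 * rng.normal(size=(16, 4))

fused, weights = fuse(current_feat, vibration_feat, w1, w2)
```

Because the weights are computed per channel after concatenation, the network can emphasize or suppress features from either signal source, which is the role the text assigns to the attention mechanism.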
4. Experimental Setup and Data Description
To validate the effectiveness of the ITSC fault-diagnosis algorithm based on the fusion of current and vibration signal fault characteristics, as well as the Bayesian-optimized ITSC fault-diagnosis model, a simulation test for ITSC faults in a PMSM was conducted. The experiment was carried out on the test bench for the dual-drive motor, as shown in Figure 7. The main test equipment included the motor under test and its controller, a dynamometer, torque and speed sensors, vibration sensors, current sensors, and a data acquisition system. The motor under test was an 8-pole, 36-slot PMSM with a star-connected winding structure. The specific parameters are listed in Table 1.
The torque sensor used in the experiment was the HCNJ-101, with a measurement accuracy of ±0.1%; the current sensor was the ETA-5301B, with a measurement accuracy of 3% RD; the vibration sensor was the KS78B100 from MMF Germany, with an IEPE interface, a sensitivity of 100 mV/g and an accuracy of 2% RD. To prevent low-frequency interference during data acquisition, the sampling frequency for the current signal was set to 1 MHz. Since the maximum sampling frequency for the vibration signal acquisition using the NI cRIO9068 was also 1 MHz, different devices were used for data collection. The data acquisition device for the current signal was an oscilloscope with a sampling frequency of 1 MHz, while the vibration signal was acquired using the NI9401 card within the NI cRIO9068, at a sampling frequency of 10.24 kHz.
To ensure synchronization between the data acquisition devices, a signal generator was used to create a frequency-sweep signal with a period of 8 s, sweeping from 20 Hz to 2 kHz, with a voltage amplitude of ±2 V. The waveform of the synchronization signal is shown in Figure 8. During the data acquisition process, both the current and vibration data acquisition devices received the synchronization signal from the signal generator simultaneously. The oscilloscope had a sampling frequency of 1 MHz, while the NI9223 acquisition card used by the NI cRIO9068 had a sampling frequency of 100 kHz.
In conducting simulation tests of ITSC faults in motors, two key factors need to be fully considered: on one hand, the establishment of early ITSC faults with varying degrees of severity; on the other hand, the operational conditions of the motor should be as comprehensive as possible. Based on the previous analysis, the severity of ITSC faults is determined by the number of shorted turns and the fault resistance. To simulate different degrees of ITSC faults, the test motor has been modified accordingly, as shown in Figure 9. Figure 9a displays the tested faulty motor, with terminals on both sides representing the lead wires of windings with different turn counts, allowing for varying degrees of short circuit simulation through paired connections. Figure 9b shows the fault resistance and its heat dissipation device; the fault resistance can be replaced to simulate different levels of insulation damage. The combination of these two parameters facilitates the simulation of varying degrees of ITSC faults. Figure 9c illustrates the temperature measurement device, which continuously monitors the temperature of the faulty motor and its fault resistance throughout the experiment to prevent damage due to excessive temperature.
The operational conditions for the motor during the fault simulation tests are presented in Table 2. To simulate the operating conditions of agricultural machinery during acceleration, deceleration, and constant speed, a total of 8 constant speed scenarios and 2 acceleration and deceleration scenarios are included. The settings for variable speed conditions are shown in Figure 10.
After completing the signal acquisition, a series of data preprocessing operations were required, including signal synchronization, filtering, normalization, downsampling, slicing, and grouping, to organize the current and vibration signals into a dataset suitable for multi-source data-fusion models.
The oscilloscope can record only 10 s of signal at a time, while the cRIO9068 can record for longer. Therefore, during data acquisition, the cRIO9068 is started first, followed by the oscilloscope; at the end, the oscilloscope is stopped first, followed by the cRIO9068. Data synchronization between the two devices uses the time segment covered by the oscilloscope recording to slice the corresponding data recorded by the cRIO9068. Throughout this process, the synchronization signals recorded by both devices serve as timestamps to determine the start and end times of the signals. To maximize the length of the synchronized signal, the synchronization signal from the oscilloscope is downsampled to 100 kHz and denoted as y, while the synchronization signal from the cRIO9068 is denoted as x. The expression for the cross-correlation of the two signals is as follows:
$$R_{xy}(m) = \sum_{n=0}^{N-1} x(n+m)\,y(n) \tag{13}$$
where R_xy(m) represents the cross-correlation index between the two signals, m denotes the offset of the signal in the NI cRIO9068, with a range of [0, N − 1], and N is the length of signal x. As m takes on different values, a series of cross-correlation indices is obtained for signal x relative to signal y after shifting m points. Since both signals derive from the same synchronization signal produced by the signal generator, the offset at which the cross-correlation index reaches its maximum is the offset at which the two signals coincide. This offset is the starting point for truncating signal x using signal y, while the endpoint is determined by the number of sampling points contained in signal y.
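The alignment described above can be sketched with NumPy. This is an illustrative example under stated assumptions rather than the processing code used in the experiment: the sweep is assumed to be linear in frequency, the sampling rate and record lengths are scaled down so the example runs quickly, and `true_offset` is a hypothetical start position chosen for the demonstration.

```python
import numpy as np

def sweep(t, f0=20.0, f1=2000.0, period=8.0, amplitude=2.0):
    """Repeating frequency sweep from f0 to f1 (a linear sweep is assumed)."""
    tau = np.mod(t, period)
    rate = (f1 - f0) / period
    return amplitude * np.sin(2.0 * np.pi * (f0 * tau + 0.5 * rate * tau ** 2))

def find_offset(x, y):
    """Offset m that maximizes sum_n x[n + m] * y[n], for len(x) >= len(y)."""
    corr = np.correlate(x, y, mode="valid")
    return int(np.argmax(corr))

fs = 10_000                        # scaled-down sampling rate for a fast demo
rng = np.random.default_rng(0)
t_long = np.arange(3 * fs) / fs    # long record, standing in for the cRIO9068

# Long record x and short record y, each with its own measurement noise.
x = sweep(t_long) + 0.05 * rng.normal(size=t_long.size)
true_offset = 7_321                # hypothetical start of the short record
y = sweep(t_long[true_offset:true_offset + fs]) + 0.05 * rng.normal(size=fs)

m = find_offset(x, y)
x_aligned = x[m:m + len(y)]        # truncate x: start at m, keep len(y) points
```

A frequency sweep is well suited to this purpose because its autocorrelation has a single sharp peak, so the maximum of the cross-correlation identifies the offset unambiguously even in the presence of noise.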
Figure 11 illustrates the results of signal synchronization achieved through cross-correlation analysis of the synchronization signals collected by the two devices. To make the comparison easier, the synchronization signals, which have equal amplitude, have been slightly offset in the diagram. Signal x is the synchronization signal recorded by the cRIO9068, while signal y is the downsampled synchronization signal captured by the oscilloscope. Signal x′ is obtained by truncating signal x using signal y, and signal y′ is identical to signal y. It can be observed from the figure that signal x is significantly longer than signal y, which matches the experimental setup, where the cRIO9068 was started earlier and stopped later than the oscilloscope. The starting point of signal x′ is determined by the offset of maximum cross-correlation between signals x and y, while the endpoint of signal x′ is determined by the length of signal y. The locally enlarged area in the figure shows the positions of the frequency-sweep cycle transitions for the four signals. The transition positions are consistent across all four signals, and the synchronization error is within one sampling period, indicating that the synchronization algorithm is effective.
From the above analysis, it can be seen that the synchronization signal collected by the oscilloscope is not changed during the synchronization process. The fundamental purpose of synchronizing the two devices is therefore to use the synchronization signal as a timestamp to extract the corresponding time range from the cRIO9068 data, thereby discarding data that is out of sync. Because the two devices use different sampling frequencies, a time synchronization error remains when the synchronization signal is used as a timestamp for data extraction. A schematic of this error is shown in Figure 12.
In Figure 12, the horizontal axis (X) represents the sampling time, and the vertical axis (Y) represents the signal amplitude. Assuming the point (6.7 × 10^−5, 3) in the figure is the calculated starting point of the synchronized signals from the two devices, the synchronization error between the synchronization signal and the vibration signal is t1, while the synchronization error between the synchronization signal and the current signal is t2. The algorithm ensures that the synchronization error between the two signals is within one sampling period, so the upper limit of the synchronization error between the current signal and the vibration signal is the sampling period of the vibration signal, which is 9.77 × 10^−5 s, just under 0.1 ms.
During data acquisition, each group of data is collected for 10 s, and the time length of each data slice when constructing the dataset is 0.2 s. Therefore, this maximum time error represents a very small proportion of each data slice’s time length, and its impact on the sample set constructed for signal synchronization in this paper can be considered negligible.
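As a worked check of this claim, the bound follows directly from the vibration sampling frequency of 10.24 kHz and the 0.2 s slice length. The symbols T_v, f_v, and T_slice are introduced here only for illustration.

```latex
T_v = \frac{1}{f_v} = \frac{1}{10\,240\ \text{Hz}} \approx 9.77 \times 10^{-5}\ \text{s},
\qquad
\frac{T_v}{T_{\text{slice}}} = \frac{9.77 \times 10^{-5}\ \text{s}}{0.2\ \text{s}} \approx 4.9 \times 10^{-4} \approx 0.05\%
```

Equivalently, the worst-case misalignment is one vibration sample out of the 2048 samples in a slice.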
After synchronizing the current and vibration signals, the subsequent data preprocessing operations include filtering, normalization, downsampling, slicing, and grouping. The resulting synchronized signal dataset is described in Table 3. The dataset consists of synchronized current and vibration signals and contains 17 fault-severity labels. Among these, “HL” represents data collected under normal motor conditions, while “A*R*” indicates data collected under different ITSC fault severities. The severity of an ITSC fault is set by varying the combination of shorted turn ratio and fault resistance. Different shorted turn ratios are created by connecting two specific points among the multiple lead wires, each corresponding to a different turn of the first coil in phase A. The fault resistance, on the other hand, can take any value from 0 to 1 MΩ when an ITSC fault occurs [15]. If the fault resistance is extremely high, the fault current flowing through it is minimal, and the motor is effectively in a healthy state. Conversely, if the fault resistance is too low, the fault current is large enough to potentially cause irreversible damage to the test rig [7]. Since this paper focuses on diagnosing early-stage ITSC faults, the fault resistance is classified into three cases based on the above analysis. In the first case, the fault resistance is significantly greater than the impedance of the shorted wire. In the second, the fault resistance is slightly larger than the impedance of the shorted wire, resulting in a higher fault current and a more noticeable impact on the motor. In the third, the fault resistance is close to the impedance of the shorted wire, leading to a large fault current that could damage the experimental setup if the test is prolonged. Based on these considerations and experimental experience, only the ITSC scenario involving a single coil is considered when determining the number of shorted turns. The fault resistances are then set within the range of 0.1 Ω to 5 Ω, ensuring that the fault currents are clearly detectable while preventing irreversible damage to the experimental setup. The fault severities are arranged in ascending order according to Equation (9).
Under each fault-severity label, both types of signals contain 1200 data samples, with 840 samples designated for training and 360 for testing.
Figure 13 displays the waveform of the current signal from a single sample in the constructed dataset, along with the corresponding vibration signal for that period. The horizontal axis represents the number of sampling points, and the vertical axis represents the normalized waveform amplitude. It is evident from the figure that each set of current signals has 3000 sampling points, while the vibration signals consist of 2048 sampling points. The time duration for both sets of signals matches the specified data slicing duration, which is 0.2 s.
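The normalization and slicing steps that produce samples of this shape can be sketched as follows. The example makes several assumptions that should be read as illustrative only: min-max scaling to [−1, 1] is used because the exact normalization method is not stated, the current sampling rate of 15 kHz after downsampling is inferred from 3000 points per 0.2 s slice, and the sine waveforms are placeholders for the recorded signals.

```python
import numpy as np

def normalize(signal):
    """Min-max scale to [-1, 1] (assumed; the exact method is not stated)."""
    lo, hi = signal.min(), signal.max()
    return 2.0 * (signal - lo) / (hi - lo) - 1.0

def slice_signal(signal, points_per_slice):
    """Cut a 1-D record into non-overlapping slices, dropping any remainder."""
    n_slices = len(signal) // points_per_slice
    return signal[:n_slices * points_per_slice].reshape(n_slices, points_per_slice)

record_s, slice_s = 10.0, 0.2
fs_vibration = 10_240        # Hz, stated vibration sampling frequency
fs_current = 15_000          # Hz, inferred from 3000 points per 0.2 s slice

t_c = np.arange(round(record_s * fs_current)) / fs_current
t_v = np.arange(round(record_s * fs_vibration)) / fs_vibration
current = normalize(np.sin(2.0 * np.pi * 50.0 * t_c))       # placeholder waveform
vibration = normalize(np.sin(2.0 * np.pi * 120.0 * t_v))    # placeholder waveform

current_slices = slice_signal(current, round(slice_s * fs_current))
vibration_slices = slice_signal(vibration, round(slice_s * fs_vibration))
```

Under these assumptions, each 10 s record yields 50 paired slices, with row i of `current_slices` and row i of `vibration_slices` covering the same 0.2 s interval.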
6. Conclusions
In this article, a novel multi-source data-fusion algorithm was proposed for ITSC fault diagnosis. The results indicate that, compared to using only current signals, the ITSC fault-diagnosis model that combines current and vibration signals for feature fusion performs better in terms of validation accuracy, error loss, F1 score, and feature learning capability. The following conclusions can be drawn. First, an indicator suitable for early-stage ITSC fault-severity analysis is derived from the equivalent circuit. Second, a feature-level multi-source data-fusion algorithm based on a CNN has been introduced. This algorithm uses Bayesian optimization for hyperparameter tuning and integrates current and vibration signal features, enriching the fault-feature space and thereby improving the accuracy of ITSC fault-severity identification. Third, to synchronize the multi-source signals during experiments, a signal synchronization method has been proposed to construct a synchronized dataset of current and vibration signals. By finding the maximum cross-correlation of the synchronization signals collected from the two devices, the current and vibration signals were successfully synchronized. Finally, the experimental results indicate that, among the four comparative models, the proposed MS model achieved the highest final validation accuracy, close to 98.99%, with the smallest error loss, below 0.04. In the F1 scores for the 17 classification tasks, the MS model outperformed the others in 14 cases, demonstrating the strongest feature learning capability, which validates the effectiveness of the proposed multi-source data-fusion model.
A limitation of the proposed algorithm is that it focuses on improving the accuracy of identifying the severity of early ITSC faults under experimental conditions, without addressing issues such as signal interference and insufficient or imbalanced sample sizes that may arise in real-world applications. To enhance the algorithm’s performance in practical scenarios, the next step will be to investigate methods for overcoming these challenges.