1. Introduction
Since the start of the 21st century, the world has been changing rapidly under global ecological and climate shifts and continued population growth [
1]. Environmental protection and food security have garnered increasing attention, creating an urgent need for safe, intelligent, and sustainable solutions [
2]. Agricultural mechanization plays a vital role in advancing agricultural modernization and sustainable development, making intelligent fault diagnosis research of paramount importance [
3]. Agricultural machinery is extensively utilized in all aspects of modern agricultural production, including tillage, fertilization, sowing, and harvesting. Given their vital role in the production process, the efficient operation of these machines directly impacts both the efficiency and yield of agricultural output [
4]. Motors are key power components in agricultural machinery, being responsible for providing stable torque and achieving efficient energy conversion, thereby enhancing mechanical efficiency and reducing energy loss, which makes them more environmentally friendly [
5]. Permanent magnet synchronous motors (PMSMs) are widely used in agricultural mechanization due to their excellent torque control performance, high power density, and high efficiency, coupled with China’s natural advantage in rare earth resources [
5]. They are utilized in equipment such as electric tractors, seeders, harvesters, spraying equipment, and tillage machinery, significantly enhancing the intelligence and automation levels of agricultural production. However, the operating conditions and environment of agricultural machinery pose challenges to the safe operation of PMSMs [
6]. The faults of PMSMs can generally be categorized into mechanical faults, permanent magnet faults, and electrical faults [
7]. Mechanical faults primarily refer to failures caused by damage to mechanical components, such as bearings, rotors, and shafts, with common fault types including eccentricity and bearing failures. Permanent magnet faults refer to irreversible partial or total demagnetization of the permanent magnets fixed on the rotor, which is unique to PMSMs and can be caused by various factors. Electrical faults usually occur due to damage to the stator windings, and the main fault types include open-circuit winding faults, inter-turn short-circuit (ITSC) faults, phase-to-phase short-circuit faults, and winding ground faults. Due to the limited installation space and the high power density requirements of PMSMs in agricultural machinery, the winding design becomes highly compact, which poses significant challenges for the heat dissipation of the motor windings. Furthermore, the operating environment of agricultural machinery is particularly harsh and variable, including exposure to dust, high temperatures, high humidity, complex modal vibrations, as well as frequent instantaneous overloads and fluctuating loads [
8]. These factors make ITSC faults one of the most common failures in PMSMs [
9]. These faults generate significant fault currents within the shorted turns, which not only affect the distribution of the air-gap magnetic field and exacerbate motor vibrations but also cause excessive heat generation in the affected windings. If these issues are not detected and addressed in time, they can lead to a rapid increase in the stator winding temperature, damaging the insulation of nearby windings and further worsening the fault condition [
10]. This may even result in a loss of control over the motor and agricultural machinery, leading to catastrophic accidents and significant economic losses. Therefore, it is crucial to diagnose and address ITSC faults in their early stage.
Traditional fault diagnosis methods that rely heavily on regular maintenance and experience not only fail to provide early warnings but are also inefficient and costly [
11]. Thanks to the advancements in computer technology and sensor technology, intelligent fault diagnosis methods have received widespread attention and application in recent years [
12,
13]. Jiang et al. implemented fault diagnosis for the rolling bearings of a combine harvester using an improved variational mode decomposition (VMD) and machine learning method, with experimental results demonstrating the superiority of this approach [
14]. Parvin proposed a transformer neural network (TNN) model for diagnosing the severity of ITSC faults [
15]. By employing a multihead attention mechanism, this algorithm enables the model to concentrate on specific aspects of the input signals, achieving an experimental accuracy exceeding 96%. Li et al. used the correlation coefficient of permutation entropy as an evaluation index, combining random forest algorithms with support vector machines to identify the engine state of a tractor [
16]. Their experiments show that this algorithm has good recognition accuracy under small sample conditions. Fan et al. implemented a sparse classification framework for the composite fault diagnosis of tractor bearings, utilizing adaptive feature dictionary learning to automatically extract fault features, which improved the accuracy of fault state identification under heavy noise conditions [
17]. Lee et al. proposed an ITSC fault diagnosis model by combining an attention mechanism with a recurrent neural network (RNN) to realize the fault severity estimation [
18]. Xu et al. used Time Generative Adversarial Networks (Time GANs) for data augmentation to overcome the issue of limited fault samples and combined it with transformers to perform fault diagnosis of tractor transmission systems [
19].
Despite significant achievements in research on fault diagnosis using machine learning algorithms, there has been limited study on early fault diagnosis, let alone the early diagnosis of ITSC faults [
20]. In ITSC faults, early fault diagnosis is crucial, as overcurrent and overheating can lead to more severe issues. The existing ITSC fault models inadequately consider the impact of the coil structure within the winding on the fault model, failing to accurately reflect the relationship between winding parameters and fault severity [
21]. Moreover, the three-phase current signals utilized are generally lengthy one-dimensional signals that are highly susceptible to electromagnetic interference and can change with varying operating conditions [
22]. Consequently, accurately diagnosing ITSC faults requires the extraction of more profound and higher-dimensional features from the collected current signals, particularly when dealing with signals under dynamic operating conditions. This necessitates that the deep learning models employed have sufficient network depth and complexity [
23]. However, tests indicate that when the depth of the model increases to a certain extent, its performance tends to saturate and then rapidly decline, which is different from overfitting [
24]. Simply deepening the network therefore leads to a performance degradation problem. Additionally, the automatic tuning of hyperparameters for the network model is another pressing problem. The network architecture and training hyperparameters in the aforementioned studies largely rely on manual tuning based on experience, which can consume a significant amount of time and computational resources, even for experienced practitioners.
To address the aforementioned issues, this paper proposes an improved ITSC fault diagnosis model based on Bayesian optimization. The primary contributions of this paper are outlined as follows:
- (1) By conducting a mechanism analysis of PMSMs, this study investigates the relationship between the parameters of different winding components and the severity of ITSC faults. It proposes a fault model for ITSCs that considers the winding coil structure, as well as indicators that can be used to guide the setting of the severity of ITSC faults.
- (2) A well-crafted deep learning network is proposed, which incorporates residual structures, multi-scale structures, and channel attention mechanisms. This network utilizes dilated convolutions for signal feature extraction, employs residual structures to enhance learning efficiency, and leverages multi-scale structures to enrich the scale of extracted features. Finally, the channel attention mechanism is used to adjust the weight of effective features in fault recognition, thereby improving the accuracy of fault severity identification.
- (3) The Bayesian optimization algorithm is employed to address the tuning of hyperparameters for the fault diagnosis, enabling the automatic optimization of the model’s hyperparameters. Building upon the automatic optimization of model training hyperparameters using Bayesian optimization, the network’s feature extraction layers are divided into a three-layer architecture, integrating three improved CNN structures to achieve automatic optimization of the model architecture hyperparameters.
- (4) The effectiveness of the proposed fault diagnosis method was evaluated through simulated ITSC fault tests conducted under both constant and dynamic operating conditions. By comparing it with five other fault diagnosis models of different structures, the advantages of the proposed method were validated.
The remainder of this paper is structured as follows:
Section 2 presents the ITSC model that considers the winding coil structure and derives an index that can be used to set ITSC fault parameters.
Section 3 introduces the proposed algorithm model along with the structure and components of each part.
Section 4 describes the experimental equipment used and the settings required for simulating fault tests, as well as detailing the generated dataset. In
Section 5, the fault diagnosis model proposed in this paper is compared with five other models of different structures, with experimental results demonstrating the effectiveness and superiority of the proposed algorithm. Finally,
Section 6 summarizes the work presented in this paper and discusses future improvements.
2. ITSC Fault in PMSMs
The estimation of ITSC faults is critically important for two main reasons. On one hand, these faults are very difficult to detect in their early stages [
25]. On the other hand, an ITSC fault can lead to overcurrent and overheating, which can cause more severe issues [
26]. In previous research, no index is particularly suitable for the estimation of an early-stage ITSC fault. In this paper, an equivalent circuit model is proposed, and an index is derived to guide the setting of the ITSC fault severity in experiments.
Currently, the winding structure of a PMSM mostly uses distributed winding arrangements. The coils are wound into appropriate shapes and distributed across two stator slots with a specific pitch. When an ITSC fault occurs in a few turns of the coil within a particular slot, the wires within the corresponding slot will also be shorted, as shown in
Figure 1a.
Figure 1a is a cross-sectional view of a PMSM with 8 poles and 36 slots. Every turn of the wire within the slot is labeled as Pc-t. For example, A1-3 denotes the 3rd turn wire of the first coil within winding phase A. The red section of the stator winding in the figure indicates the location where the ITSC fault happens, and the corresponding enlarged view shows the labels of the wires involved in the short circuit. Assuming an ITSC fault occurs in the first coil of winding phase A, the schematic diagram of the equivalent circuit model is shown in
Figure 1b. From the figure, it can be seen that after the fault occurs, the faulty phase winding will be divided into two parts. One part is the shorted section, and the other is the remaining healthy section. Additionally, the winding of the shorted section will form a new closed loop at the point of the shorted wires. When the current of phase A winding flows through the newly formed closed loop, it divides into the current
if passing through the fault resistance
Rf and the current (
ia–
if) passing through the shorted winding. Let
Nc be the number of coils in each phase winding,
Nt be the number of turns per coil, and
Ns be the number of turns shorted in the case of an ITSC fault. The shorted degree of the winding can be expressed as:
$$\mu = \frac{N_s}{N_c N_t} \tag{1}$$
where μ indicates the proportion of shorted turns in the fault phase winding relative to the total number of turns in that phase winding. Based on the above analysis, the description of the equivalent circuit model is as follows:
In the formula, Rah, Raf, and Rf represent the resistance of the remaining healthy portion, the resistance of the shorted portion, and the fault resistance at the shorted point in fault phase winding A, respectively. ia, ib, and ic represent the currents flowing through phase windings A, B, and C, respectively. van, vbn, and vcn represent the voltages of the three-phase windings with respect to the neutral point. if represents the current flowing through the fault resistance. Lbb and Lcc denote the self-inductances of phase windings B and C, respectively. Lah and Laf denote the self-inductances of the remaining healthy portion and the shorted portion of fault phase winding A, respectively. Mbc indicates the mutual inductance between phase windings B and C. Mahf indicates the mutual inductance between the two portions of fault phase winding A. Mahb and Mafb represent the mutual inductances between the two portions of fault phase winding A and phase winding B, respectively. Mahc and Mafc denote the mutual inductances between the two portions of fault phase winding A and phase winding C, respectively. efah and efaf represent the electromotive forces induced by the permanent magnet in the remaining healthy part and the shorted part of phase winding A, respectively. efb and efc represent the electromotive forces induced by the permanent magnet in phase winding B and phase winding C, respectively. Ψfah and Ψfaf represent the permanent magnet flux linkages of the healthy portion and the shorted portion of fault phase winding A, respectively. Ψfb and Ψfc represent the flux linkages of phase winding B and phase winding C induced by the permanent magnet, respectively.
Determining the parameters in the fault model is a crucial step for modeling and studying different fault states of the motor. The resistances of the healthy portion and the shorted portion are proportional to the share of each part in the total number of turns of the faulted phase winding. The expressions are as follows:
$$R_{ah} = (1 - \mu) R_a, \qquad R_{af} = \mu R_a \tag{3}$$
where Ra stands for the resistance of phase winding A when there is no ITSC fault.
The flux linkage of the permanent magnet in the winding is proportional to the number of turns of the winding. The flux linkages of the healthy portion and the shorted portion of the fault phase winding are represented as follows:
$$\Psi_{fah} = (1 - \mu) \Psi_f, \qquad \Psi_{faf} = \mu \Psi_f \tag{4}$$
where Ψf stands for the flux linkage of the permanent magnet in phase winding A when there is no ITSC fault.
In the fault model of ITSCs in PMSMs, determining the parameters for the stator winding’s self-inductance and mutual inductance is the most complex part. This complexity arises from the changes in the magnetic field caused by the presence of the ITSC fault. The stator winding of a motor is typically composed of multiple coils, as shown in
Figure 2. For each coil within a given phase winding, it is necessary to separately discuss the coil’s self-inductance, the mutual inductance between this coil and other coils within the same phase winding, and the mutual inductance between this coil and different coils in the rest of the phase windings.
When studying the mutual inductance between a coil of a given phase winding and another phase winding, if the fault occurs only within a single coil, the mutual inductances between the two portions of the fault coil and the other phase winding are described by the following equations:
where
M stands for the mutual inductance between the given phase winding and another phase winding.
Mip stands for the mutual inductance between the
i-th coil within the given phase winding and another phase winding.
Mafp represents the mutual inductance between the shorted wires within the fault coil and another phase winding, while
Mahp represents the mutual inductance between the remaining unshorted wires of the fault phase winding and another phase winding.
When the fault occurs in more than one coil, assuming that the ITSC fault occurs in the first
n coils, where
n ≥ 2, and the first
n − 1 coils are also shorted, the mutual inductances between the two portions of the fault phase winding and another phase winding are described by Equation (6). The meanings of the parameters are consistent with those described earlier.
When studying the self-inductance and mutual inductance relationships between coils in a phase winding, since each phase winding is composed of multiple coils connected in series, and assuming a symmetrical distribution of stator winding coils, the self-inductance of each coil is essentially consistent. However, the mutual inductance between coils is related to their relative positions. Let
Lbob denote the self-inductance of a single coil in the phase winding, and
Mij denote the mutual inductance between two coils in the same winding, which depends on their relative positions, as described by Expression (7).
Here,
i,
j,
k, and
l represent the positions of each coil in the A-phase winding.
Based on the above analysis, the inductance of each coil can be described as:
where
represents the mutual inductance between the chosen coil and the remaining coils in the same winding.
L denotes the self-inductance of the phase winding. Without loss of generality, assume that an ITSC fault occurs on the first coil of phase winding A, and neglect the leakage inductance between the wires within the coil. The inductances of the coils in the fault phase winding then satisfy the following relationship:
where
Lbobf represents the self-inductance of the shorted wires within the fault coil.
Lbobh represents the self-inductance of the unshorted wires within the fault coil.
Mbobf stands for the mutual inductance between the shorted wires and unshorted wires within the fault coil. The mutual inductances between the two portions of the fault coil and the other remaining coils within the fault winding satisfy the relationship:
where
Mbobf represents the mutual inductance between the shorted wires within the fault coil and the other remaining coils within the fault winding.
Mbobh represents the mutual inductance between the unshorted wires within the fault coil and the other remaining coils within the fault winding.
Based on the above analysis, the inductance of each portion in the fault winding can be described as:
Substituting Equations (3)~(11) into Equation (2), the resistance, inductance, and back electromotive force in the voltage balance equation under ITSC fault conditions can be described as:
Here, efa represents the electromotive force induced by the permanent magnet in phase winding A, and Ψfa represents the flux linkage of phase winding A induced by the permanent magnet.
Since the analyzed stator winding is Y-connected, it follows from Kirchhoff’s Current Law that:
$$i_a + i_b + i_c = 0 \tag{13}$$
From Equations (2), (12), and (13), the expression for the fault current can be derived as:
Let d1 = μRa + Rf − μ²Ra, d2 = μ(Laf + Mahf) − Laf, and van = va − vn; then, the above equation can be rewritten as:
Since the focus of the study is on the early stage of ITSC faults, the amplitude of voltage vn is much smaller than that of va, so va − vn ≈ va. Assuming va = Va sin(ωt), the analytical solution of Equation (15) can be described as:
At the early stage of an ITSC fault, the fault usually occurs in a single coil, there are fewer shorted turns of wires, and the fault resistance at the shorted point is relatively large. Therefore, d1 > 0, d2 < 0, and |d1| >> |d2|. As a result, d1/d2 tends towards −∞, and d2/d1 tends towards 0. Thus, from Equation (16), the approximate expression for the current amplitude can be obtained:
According to reference [
27], it is known that the amplitude of the three-phase voltage in the stator winding of the PMSM is positively correlated with the motor speed. Therefore, Equation (17) can be rewritten as:
where ωr represents the mechanical speed of the PMSM and K represents a known coefficient. By analyzing the above equation, it can be seen that the resistance of the fault phase winding Ra can be regarded as a known quantity in the equation, and the remaining parameters μ, Rf, and ωr can directly affect the amplitude of the fault current if. However, among these parameters, μ and Rf are related to the severity of the ITSC fault, while ωr is not. If ωr is excluded from Equation (18), an expression related only to the shorted degree μ and the fault resistance Rf will be derived:
where
FI stands for the severity index of the ITSC fault. When the tested motor is in a healthy state, this index is 0. When the winding of a certain phase of the motor is completely shorted and the fault resistance is 0, this index becomes infinite.
In the early stages of an ITSC fault, this index is essentially unaffected by speed and increases as the fault resistance Rf decreases or the shorted degree μ increases, and vice versa. Each fault severity can be considered as a combination of different Rf and μ values. In actual motor operation, however, the fault resistance Rf and the shorted degree μ are difficult to measure directly, so this index is not suitable for estimating the severity of an ITSC fault on a running motor. It can nevertheless be used in experiments as a reference index to guide the setting of ITSC fault severity.
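The qualitative behaviour of the severity index can be explored numerically. The sketch below is illustrative only: the closed form `mu / (R_f + mu * (1 - mu) * R_a)` is an assumption, chosen because its denominator equals d1 and because it reproduces the limits stated above (0 for a healthy motor, unbounded for a fully shorted phase with zero fault resistance). It is not quoted from the paper's expression for FI, and the phase resistance `R_A` is a hypothetical value.

```python
import math

R_A = 0.5  # hypothetical phase resistance in ohms (assumption, not from Table 1)


def severity_index(mu, r_f, r_a=R_A):
    """ASSUMED form of the severity index: mu / d1, with
    d1 = mu*Ra + Rf - mu^2*Ra as defined in the text."""
    d1 = mu * r_a + r_f - mu ** 2 * r_a
    if d1 == 0.0:
        # Healthy motor (mu = 0) gives 0; a fully shorted phase with Rf = 0 diverges.
        return 0.0 if mu == 0.0 else math.inf
    return mu / d1


if __name__ == "__main__":
    for mu in (0.0, 0.05, 0.10):
        for r_f in (5.0, 1.0, 0.1):
            print(f"mu={mu:.2f}  Rf={r_f:>4} ohm  FI={severity_index(mu, r_f):.4f}")
```

Under this assumed form the index grows monotonically with μ and falls monotonically with Rf, which matches the trend described in the text.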
4. Experimental Setup and Data Description
To verify the validity of the proposed Bayesian optimization-based improvement algorithm for the ITSC fault diagnosis model, experiments are carried out on a PMSM. The setup consists of a simulated fault motor and its controller, an auxiliary test motor and its controller, data acquisition equipment, etc., as shown in
Figure 10. The fault motor is an 8-pole, 36-slot PMSM, with the windings configured in a star connection, featuring 108 turns of wire per phase. The specific parameters of the faulty motor are shown in
Table 1. The fault motor simulates different severities of ITSC faults by combining different fault resistances and shorted ratios. To prevent damage to the fault resistor, a cooling device is required for heat dissipation during the experiment. Temperature monitoring of the entire setup is conducted during the experiment to prevent overheating and damage. The fault resistor and its cooling device are shown in
Figure 10c, the fault motor and its shorted winding point terminals are shown in
Figure 10b, and the temperature measurement device is shown in
Figure 10d.
A fault motor simulation test was conducted using a test bench to replicate 17 different fault states of a PMSM exhibiting ITSC faults. This includes one healthy state and sixteen distinct fault conditions. The severity of the ITSC faults is determined by combinations of shorted degrees and fault resistances. The shorted degrees are defined as 5 turns, 9 turns, 11 turns, and 15 turns, totaling four categories. The fault resistances are set at 5 Ω, 1 Ω, 0.5 Ω, and 0.1 Ω, also totaling four categories, resulting in 16 fault levels. Considering the healthy state of the motor as having a fault level of 0, the experimental data encompass a total of 17 fault severities. To simulate the motor’s operating conditions during agricultural machinery acceleration, deceleration, and constant speed driving, 8 constant speed scenarios and 2 variable speed scenarios were established during the bench test, as detailed in
Table 2.
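The 17 fault states described above can be enumerated programmatically. The following sketch uses only values stated in the text (108 turns per phase, four shorted-turn counts, four fault resistances); the dictionary layout and function name are illustrative choices, not part of the original test procedure.

```python
from itertools import product

TURNS_PER_PHASE = 108                      # stated for the tested motor
SHORTED_TURNS = [5, 9, 11, 15]             # four shorted degrees
FAULT_RESISTANCES_OHM = [5.0, 1.0, 0.5, 0.1]


def build_fault_states():
    """Return the healthy state plus all 4 x 4 fault combinations."""
    states = [{"shorted_turns": 0, "mu": 0.0, "r_f": None}]  # healthy motor
    for n_s, r_f in product(SHORTED_TURNS, FAULT_RESISTANCES_OHM):
        # Shorted degree: shorted turns over total turns in the phase winding.
        states.append({"shorted_turns": n_s, "mu": n_s / TURNS_PER_PHASE, "r_f": r_f})
    return states


states = build_fault_states()

if __name__ == "__main__":
    for s in states:
        print(f"turns={s['shorted_turns']:>2}  mu={s['mu']:.4f}  Rf={s['r_f']}")
```

With these settings the shorted degree ranges from 5/108 to 15/108, i.e. roughly 4.6% to 13.9% of the phase winding.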
It can be seen from
Table 2 that there are 10 different operating conditions in the test process, each generated by combinations of five speeds and two torques. The two load torques are both constant, while among the five speeds, four are constant and one represents an acceleration and deceleration condition. The dynamic speed variation ranges from 850 rpm to 1550 rpm and then back to 850 rpm, as shown in
Figure 11. For each distinct fault condition of the motor, ITSC fault tests are conducted under the aforementioned 10 conditions. The Yokogawa DL850EA oscilloscope is used to record the three-phase current, with a sampling frequency of 1 MHz. The data sampling duration for the fault motor under each operating condition is 10 s. The entire data collection process employs a field-oriented control (FOC) strategy using the VFD037C23A inverter, operating at the switching frequency of 15 kHz, with the auxiliary test motor using speed closed-loop control and the tested motor using current closed-loop control.
During the experiment, due to the absence of hardware filtering, a relatively high sampling frequency of 1 MHz was chosen to avoid signal aliasing caused by interference and other factors during data acquisition. If the raw data were directly used for dataset construction, it would impose a significant challenge on computer hardware resources and severely impact the training speed. The goal of this study is to use deep learning models to extract low-frequency features from the acquired experimental data that are useful for classifying the severity of ITSC faults. Therefore, during data preprocessing, the acquired data are first filtered and then down-sampled to retain low-frequency features while reducing the memory usage of the dataset. A zero-phase low-pass filter is applied to the data, and the down-sampled sampling frequency is set to 15 kHz, matching the switching frequency of the controller. To facilitate the comparison of data under different fault severities and operating conditions and to accelerate the convergence of the deep learning model, the acquired data are normalized to the range of [−1, 1]. To aid in training the deep learning model, the down-sampled three-phase current data are divided into equal-length data slices, each containing sufficient feature information. The length of each data slice is set to 3000 sampling points, which ensures that, at the lowest operating speed, the three-phase current signal collected over one cycle of the motor’s rotation is captured in each slice.
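The preprocessing chain described above (zero-phase low-pass filtering, down-sampling from 1 MHz to 15 kHz, normalization to [−1, 1], slicing into 3000-point segments) can be sketched with NumPy as follows. Several details are assumptions made for illustration and are not specified in the text: the filter is a windowed-sinc FIR with a hypothetical 3 kHz cutoff, down-sampling uses linear interpolation, normalization is a global min-max over the three phases, and the input is a synthetic three-phase signal.

```python
import numpy as np

FS_RAW = 1_000_000     # Hz, acquisition rate stated in the text
FS_TARGET = 15_000     # Hz, down-sampled rate stated in the text
SLICE_LEN = 3000       # samples per slice, i.e. 0.2 s at 15 kHz
CUTOFF_HZ = 3_000      # hypothetical cutoff (assumption)


def zero_phase_lowpass(x, cutoff_hz, fs, num_taps=501):
    """Symmetric windowed-sinc FIR applied with a centred convolution,
    which introduces no phase shift."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs * n) * np.hamming(num_taps)
    h /= h.sum()                       # unity gain at DC
    return np.convolve(x, h, mode="same")


def resample_linear(x, fs_in, fs_out):
    """Down-sample by linear interpolation onto the new time grid."""
    t_in = np.arange(len(x)) / fs_in
    n_out = int(round(len(x) / fs_in * fs_out))
    return np.interp(np.arange(n_out) / fs_out, t_in, x)


def normalize(x):
    """Global min-max scaling to [-1, 1] (assumed normalization scheme)."""
    lo, hi = x.min(), x.max()
    return 2 * (x - lo) / (hi - lo) - 1


def slice_signal(x, slice_len):
    """(3, N) -> (n_slices, 3, slice_len); an incomplete tail is dropped."""
    n = x.shape[1] // slice_len
    return x[:, : n * slice_len].reshape(x.shape[0], n, slice_len).transpose(1, 0, 2)


def preprocess(raw):
    filtered = np.stack([zero_phase_lowpass(ch, CUTOFF_HZ, FS_RAW) for ch in raw])
    down = np.stack([resample_linear(ch, FS_RAW, FS_TARGET) for ch in filtered])
    return slice_signal(normalize(down), SLICE_LEN)


# Synthetic 0.2 s three-phase current with 15 kHz switching ripple (illustrative).
t = np.arange(200_000) / FS_RAW
raw = np.stack([2.0 * np.sin(2 * np.pi * 60 * t - k * 2 * np.pi / 3)
                + 0.2 * np.sin(2 * np.pi * 15_000 * t) for k in range(3)])
slices = preprocess(raw)
```

For the full 10 s recordings a polyphase or FFT-based filter would be considerably faster than direct convolution; the direct form is used here only to keep the zero-phase property easy to see.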
The labels of the data slices correspond to their fault severity, as shown in
Table 3. In the labels, “HL” denotes the data collected under healthy motor conditions, while “A*R*” stands for the data collected under different combinations of fault resistors and shorted ratios. “A2”, “A4”, “A5”, and “A6” represent shorted turns of 5, 9, 11, and 15, respectively. “R5”, “R1”, “R0.5”, and “R0.1” indicate fault resistances of 5 Ω, 1 Ω, 0.5 Ω, and 0.1 Ω, respectively. The fault severities in
Table 3 are arranged in ascending order of the fault severity index FI derived from Equation (18). The sampled data were organized into datasets according to different fault severities, ensuring that the amount of data for each condition under a specific fault severity was equal and the quantities of data corresponding to each fault severity were also equal. For each fault level, the number of data samples is set at 1200, with 360 samples randomly selected for testing, leaving 840 samples for training, resulting in a ratio of 3:7. Ultimately, all training samples form the training set, while all testing samples comprise the validation set.
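The labeling scheme and the per-level split can be sketched as follows. The label codes and sample counts come from the text; the random seed, the index-based representation of samples, and the label order (which is not the severity order of Table 3) are illustrative assumptions.

```python
import random

TURN_CODES = {"A2": 5, "A4": 9, "A5": 11, "A6": 15}            # label -> shorted turns
RES_CODES = {"R5": 5.0, "R1": 1.0, "R0.5": 0.5, "R0.1": 0.1}   # label -> ohms
SAMPLES_PER_LEVEL = 1200
TEST_PER_LEVEL = 360


def all_labels():
    """'HL' for the healthy motor plus the 16 'A*R*' fault labels."""
    return ["HL"] + [a + r for a in TURN_CODES for r in RES_CODES]


def split_indices(labels, seed=0):
    """Randomly pick 360 test samples per label; the other 840 go to training."""
    rng = random.Random(seed)
    train, test = [], []
    for label in labels:
        idx = list(range(SAMPLES_PER_LEVEL))
        rng.shuffle(idx)
        test += [(label, i) for i in idx[:TEST_PER_LEVEL]]
        train += [(label, i) for i in idx[TEST_PER_LEVEL:]]
    return train, test


labels = all_labels()
train_set, test_set = split_indices(labels)
```

Splitting per label keeps the 3:7 ratio identical for every fault severity, so the class balance described in the text is preserved in both subsets.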
The comparison of the data before and after preprocessing is shown in
Figure 12. In each figure, the left side displays the original three-phase current signal, while the right side shows the three-phase current after data preprocessing.
Figure 12a illustrates the three-phase current under healthy conditions at a speed of 150 rpm and a torque of 3.0 Nm.
Figure 12b depicts the three-phase current of a faulty motor with the fault label “A5R0.1”, collected under dynamic speed conditions at a torque of 3.0 Nm.
5. Results and Comparisons
After completing the data preprocessing and dataset construction, the proposed Bayesian optimization-based ITSC fault diagnosis model is used to analyze the three-phase current signals. The whole procedure is carried out offline. The hyperparameter combinations to be optimized and their search space are shown in
Table 4.
Among them,
Linit represents the initial learning rate of the entire model,
G1 represents the gradient optimization coefficient of the Adam optimizer,
L2R represents the
L2 regularization coefficient,
P represents the probability of dropout; all of the above hyperparameters are real-valued. The depths of the three convolutional layers are denoted by
d1,
d2, and
d3, and the numbers of convolutional kernels for each layer are represented by
w1,
w2, and
w3. Both the number of convolutional kernels and the depth of the convolutional layers are integer types. The size of the convolutional kernels is set to a fixed value of 1 × 3, the dilation rate is set to 2, the learning rate decay step size is set to 20, and the decay factor is set to 0.1. “Transform” indicates whether the hyperparameters are searched on a logarithmic scale during the search process in the set space. Based on experience, the maximum number of iterations for Bayesian optimization is set to 60, with 40 training epochs per iteration. The values of the hyperparameters for the optimal combination obtained are shown in
Table 4, and the corresponding schematic diagram of the optimal model architecture is presented in
Figure 13.
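To make the search procedure concrete, the sketch below runs a minimal Bayesian optimization loop over a single hyperparameter, the initial learning rate, searched on a logarithmic scale. The surrogate (a Gaussian process with an RBF kernel), the acquisition function (expected improvement), the search bounds, the reduced budget of 12 evaluations, and the toy objective standing in for "train for 40 epochs and return the validation loss" are all assumptions for illustration; the text does not state which surrogate or acquisition function was used.

```python
import math
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, jitter=1e-4):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_oo = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    k_oq = rbf_kernel(x_obs, x_query)
    solved = np.linalg.solve(k_oo, k_oq)
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_oq * solved, axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))


_erf = np.vectorize(math.erf)


def expected_improvement(mean, std, best):
    """Expected improvement for a minimization problem."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def toy_validation_loss(learning_rate):
    """Hypothetical stand-in for a training run; minimum at 1e-3."""
    return (math.log10(learning_rate) + 3.0) ** 2


def bayes_opt(objective, log10_low=-5.0, log10_high=-1.0, budget=12):
    grid = np.linspace(0.0, 1.0, 201)          # candidates on the log scale

    def to_lr(u):
        return 10.0 ** (log10_low + u * (log10_high - log10_low))

    xs = [0.0, 1.0]                             # start from the two bounds
    ys = [objective(to_lr(u)) for u in xs]
    while len(xs) < budget:
        x_obs, y_obs = np.array(xs), np.array(ys)
        offset = y_obs.mean()                   # centre targets for the zero-mean GP
        mean, std = gp_posterior(x_obs, y_obs - offset, grid)
        ei = expected_improvement(mean, std, y_obs.min() - offset)
        u_next = float(grid[int(np.argmax(ei))])
        xs.append(u_next)
        ys.append(objective(to_lr(u_next)))
    best = int(np.argmin(ys))
    return to_lr(xs[best]), ys[best], len(ys)


best_lr, best_loss, n_evals = bayes_opt(toy_validation_loss)
```

In the setting described in the text, the same loop would run over all ten hyperparameters jointly, with integer rounding for the layer depths and kernel counts and a budget of 60 iterations.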
To verify the advantages of the proposed improved CNN architecture and to compare the performance improvements brought by different combinations of enhancements, several models are constructed: a conventional CNN model without any architecture enhancement (CNN); a conventional CNN model with the residual network structure (Res); a CNN model that combines residual and multi-scale structures (MK-Res); and a CNN model that combines residual structures and the channel attention mechanism (SE-Res). The architecture hyperparameters of the feature layers for these four models are set consistently with the proposed improved CNN model. The training hyperparameters for the four models were obtained through hyperparameter tuning using Bayesian optimization. The error loss and validation accuracy of the five models were recorded over the training epochs, and the results are compared in
Figure 14.
Figure 14a and
Table 5 compare the test accuracy trends of the five models as training epochs progress. Compared to the CNN model, all four improved models exhibit varying extents of enhancement in the final test accuracy. The final test accuracy of the CNN model is 96.16%. The final test accuracy of the Res model is 97.35%, an improvement of 1.19 percentage points over the CNN model. The MK-Res model achieves a final test accuracy of 98.06%, an improvement of 1.90 percentage points. The SE-Res model has a final test accuracy of 97.47%, an increase of 1.31 percentage points. The proposed model reaches a final test accuracy of 98.25%, an improvement of 2.09 percentage points over the CNN model.
It is equally important to note that the feature extraction layers of all five models are consistent, with the differences between the models lying in the use of various improved architectures within the feature extraction layers. From the final results, it is evident that the residual network structure, multi-scale network structure, and channel attention mechanism all contribute to varying degrees of performance improvement, with the combination of all three achieving the most significant enhancement. Based on the principles of these improved architectures, the channel attention mechanism is able to discard irrelevant parameters during training, thus not only improving the model’s performance but also accelerating the overall convergence speed. The residual network structure helps the model train more effectively and improves recognition accuracy. The multi-scale network architecture enriches the scale of the extracted fault features, enhancing the diversity of the fault feature space, which, in turn, boosts the model’s final recognition accuracy. From the final results, it can be seen that for complex tasks such as ITSC fault severity recognition, the multi-scale architecture has the greatest impact on the model’s performance, followed by the channel attention mechanism, with the combined use of all three yielding the best results.
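How the three enhancements fit together in a single feature-extraction block can be illustrated with a small NumPy forward pass. This is a sketch under stated assumptions rather than the architecture of Figure 13: the multi-scale structure is modeled as two parallel 1 × 3 dilated convolutions with hypothetical dilation rates 1 and 2, the channel attention follows the usual squeeze-and-excitation pattern with a hypothetical reduction ratio of 4, and all weights are random.

```python
import numpy as np


def dilated_conv1d(x, w, dilation):
    """x: (C_in, T), w: (C_out, C_in, K); zero padding keeps the length T."""
    c_out, _, k = w.shape
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    t = x.shape[1]
    y = np.zeros((c_out, t))
    for j in range(k):
        # Tap j reads the input shifted by j * dilation samples.
        y += w[:, :, j] @ xp[:, j * dilation : j * dilation + t]
    return y


def channel_attention(x, w_down, w_up):
    """Squeeze (mean over time), excite (two dense layers), rescale channels."""
    squeezed = x.mean(axis=1)
    hidden = np.maximum(w_down @ squeezed, 0.0)
    weights = 1.0 / (1.0 + np.exp(-(w_up @ hidden)))   # one weight per channel in (0, 1)
    return x * weights[:, None], weights


def residual_multiscale_block(x, branch_weights, w_down, w_up, dilations=(1, 2)):
    branches = [dilated_conv1d(x, w, d) for w, d in zip(branch_weights, dilations)]
    feats = np.concatenate(branches, axis=0)            # multi-scale features
    feats, attn = channel_attention(feats, w_down, w_up)
    return np.maximum(x + feats, 0.0), attn              # skip connection + ReLU


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 64))                         # 8 channels, 64 time steps
branch_weights = [0.1 * rng.standard_normal((4, 8, 3)) for _ in range(2)]
w_down = 0.1 * rng.standard_normal((2, 8))               # reduction ratio 4
w_up = 0.1 * rng.standard_normal((8, 2))
out, attn = residual_multiscale_block(x, branch_weights, w_down, w_up)
```

With all convolution weights set to zero the block reduces to a ReLU of its input, which is the property that lets residual structures avoid the degradation seen in plain deep networks.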
Figure 14b and
Table 5 show the loss trends of the five models as training epochs progress. The final losses of the four improved models are all lower than that of the CNN model. Among them, the proposed model has the smallest loss and exhibits the best generalization capability, followed by the MK-Res and SE-Res models. The Res model has the highest loss among the four improved models.
To accurately assess the performance of the proposed model across different severity labels, three metrics are introduced for comprehensive evaluation: recall (
r), precision (
p), and
F1 score. In large datasets, there exists a tradeoff between recall and precision. The
F1 score takes into account both recall and precision, thereby providing a more holistic representation of the algorithm’s performance. The specific definitions of these evaluation metrics are presented in Equation (28):
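For reference, the standard per-label definitions of these three metrics are written out below. They are consistent with the row and column ratios described for the confusion matrices later in this section. The symbols TP, FP, and FN (true positives, false positives, and false negatives for a given label) are assumed here, and the notation in Equation (28) may differ.

```latex
r = \frac{TP}{TP + FN}, \qquad
p = \frac{TP}{TP + FP}, \qquad
F_1 = \frac{2\,p\,r}{p + r}
```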
To comprehensively compare the performance of the proposed model with that of the four other models, the confusion matrices of the five models on the test dataset are shown in
Figure 15,
Figure 16,
Figure 17,
Figure 18 and
Figure 19. The row labels on the left of each confusion matrix represent the actual severity of the ITSC faults in the test dataset, divided into 17 classes and arranged in ascending order of the fault severity calculated using Equation (18). According to the definitions of precision and recall, the precision for each label is the ratio of the number of samples on the diagonal to the total number of samples in that column, as shown in the row vector at the bottom of the confusion matrix. The recall for each label is the ratio of the number of samples on the diagonal to the total number of samples in that row, as shown in the column vector on the right side of the confusion matrix. The classification accuracy of each model is the ratio of the number of correctly classified samples on the diagonal to the total number of samples in the test dataset.
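The row and column ratios described above can be sketched in a few lines of Python. The function name `per_class_metrics` and the 3-class matrix are hypothetical and serve only as illustration; the matrices in the paper have 17 classes, and the numbers below are not taken from them.

```python
def per_class_metrics(cm):
    """Per-label precision, recall, F1 and overall accuracy.

    cm[i][j] is the number of samples with true label i predicted as label j,
    so rows are actual labels and columns are predicted labels.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    metrics = []
    for k in range(n):
        row_total = sum(cm[k])                        # samples whose true label is k
        col_total = sum(cm[i][k] for i in range(n))   # samples predicted as k
        recall = cm[k][k] / row_total if row_total else 0.0
        precision = cm[k][k] / col_total if col_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics, correct / total


# Hypothetical 3-class confusion matrix (illustrative values only).
cm = [
    [48, 2, 0],
    [3, 45, 2],
    [0, 1, 49],
]
metrics, accuracy = per_class_metrics(cm)
for label, (p, r, f1) in enumerate(metrics):
    print(f"label {label}: p={p:.3f} r={r:.3f} F1={f1:.3f}")
print(f"accuracy={accuracy:.3f}")
```

In this layout, a low recall for a label corresponds to missed detections in its row, and a low precision corresponds to false alarms in its column.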
From the figures, it can be observed that, compared to the confusion matrix of the CNN model, the four improved ITSC fault diagnosis models show a significant reduction in the number of misclassified samples. All four improved models exhibit varying degrees of improvement in terms of “false alarms” and “missed detections”, although there remains room for further enhancement.
To further compare the performance of the five models across different fault severity labels, the
F1 scores and overall test accuracy of each model are calculated from the precision and recall values in the test-set confusion matrices. The comparison results are shown in
Table 6. From the table, it can be seen that while the four improved ITSC fault diagnosis models show varying degrees of improvement in the overall test accuracy compared to the CNN model, the
F1 scores for individual fault severity labels reveal mixed performance among the five models. The four improved models show clear advantages on labels associated with less severe faults, with significant increases in
F1 scores. Among the 17 fault severity classes, the proposed model achieves the highest F1 score in 12, the best result among the five models.
To reduce the impact of randomness, each experiment was repeated five times; the diagnostic results were averaged and their standard deviation was calculated. Additionally, the time taken by each model to recognize the test set in each experiment was recorded, and the average recognition time per data slice was computed, as shown in
Table 7. From
Table 7, it can be seen that the proposed model not only achieves the highest average accuracy of 98.20% but also has the smallest standard deviation of 0.105%, indicating both good accuracy and stability. The complexity of the deep learning model is represented by the total number of adjustable parameters, including weights and biases, as shown in
Table 7. The table shows that each improvement added to the model increases its complexity, and the proposed model, which incorporates the most improvements, has the highest complexity. The average recognition time reflects the model’s data processing speed. The proposed model has the longest average recognition time, 1.14 ms, but this is still far shorter than the 0.2 s sampling time per data slice, so the model can process each slice well within the sampling interval.
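The stability and timing figures can be reproduced with a short calculation. In the sketch below, the five run accuracies are hypothetical placeholders rather than the paper's data, and using the sample standard deviation is an assumption, since the text does not state which estimator was used. Only the 1.14 ms recognition time and the 0.2 s sampling time come from the text.

```python
from statistics import mean, stdev


def summarize_runs(accuracies):
    """Mean and sample standard deviation of accuracies from repeated runs."""
    return mean(accuracies), stdev(accuracies)


def real_time_margin(recognition_time_s, sampling_time_s):
    """How many times shorter the recognition time is than the sampling time."""
    return sampling_time_s / recognition_time_s


# Hypothetical accuracies (%) from five repeated runs; not the paper's data.
runs = [97.0, 97.4, 97.2, 97.1, 97.3]
avg, sd = summarize_runs(runs)

# Values quoted in the text: 1.14 ms per data slice versus a 0.2 s slice.
margin = real_time_margin(1.14e-3, 0.2)

print(f"mean={avg:.2f}%  std={sd:.3f}%  margin={margin:.0f}x")
```

With the quoted values, recognition takes roughly 1/175 of the sampling interval, which leaves ample headroom for processing each slice.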
Through a comprehensive analysis of the five ITSC fault diagnosis models, it is evident that the proposed model exhibits the best performance in terms of the final test accuracy and stability. In the
F1 scores across the 17 fault severity labels, the proposed model also demonstrates the best overall performance, making it the best-performing model among the five. To validate the feature learning capability of the proposed model, the t-distributed stochastic neighbor embedding (t-SNE) algorithm was used to visualize the features from the final output layer of the ITSC fault diagnosis model, and the results were compared with those of the other four models. The two-dimensional visualization results are shown in
Figure 20. From the figure, it can be observed that the feature map contains 17 colors, each corresponding to a specific fault severity label, with each point representing a data sample.
Figure 20a shows the feature distribution of the input layers of each model. It is apparent that the feature distribution of the input data is chaotic, with significant overlap among samples of different colors, making it difficult to discern the fault severity of the corresponding samples in the dataset based solely on the input data.
Figure 20b–f display the feature distribution maps of the classification layers for the CNN model, Res model, MK-Res model, SE-Res model, and the proposed model, respectively. These figures show that, after feature extraction by each model, samples with the same ITSC fault severity label cluster well within their class. The proposed ITSC fault diagnosis model has the fewest misclassified sample points among the five models. In addition, its boundaries between different ITSC fault labels are clearer and its clusters lie farther apart, giving better separation between categories. Thus, the proposed model demonstrates superior feature learning and discrimination capabilities.