4.1. Simulated Current Data
The classifier development using simulated current data was conducted separately for the FFTs of the three-phase currents, here referred to as IA, IB, and IC. In addition, statistical features computed from the FFTs of the three-phase currents (dataset Ifeat_200Hz) were used to develop classifiers.
The results of the current-based classifiers are shown in Table 4. With the raw FFTs of individual phase currents as the input, the CB classifier achieved 99.2–100.0% BAC with a standard deviation of 0–1.9% in nested CV with only 30% of the samples used in the training. These CB classifiers had a BAC of 99.6–100.0% on the holdout test dataset, already showing excellent generalization performance on unseen data with a small number of training samples. The corresponding LR classifier, on the other hand, had a BAC of 72.9–86.7% with a standard deviation of 16.3–20.3% in nested CV when 30% of the samples were used. Still, these LR classifiers had 92.9–100.0% BAC on the holdout test dataset. However, the nested CV score of LR with raw FFT input increased when the number of samples was increased to 50%, and even more so with 70% of the samples, where the BAC was 98.3–99.7% and the standard deviation decreased to 0.7–3.3%. The results with FFT-based data demonstrate that the standard deviation of BACnCV decreases with the LR model when more samples are used to develop the model. With the CB model, the standard deviation is relatively low already with the lowest number of samples. The results with feature-based data demonstrate, on the other hand, that the standard deviation decreases with both model types when more samples are used to develop the model. This suggests that the models with high variance were developed with an insufficient amount of data.
There are several reasons why the nested CV score can be lower than the corresponding score on the holdout test dataset. The nested CV score is based on evaluating each sample in the development dataset, i.e., the majority of the whole dataset, whereas the holdout test dataset is only a minor part of it. Thus, the nested CV provides a better estimate of how the model works on data that has not been used in the model development. On the other hand, fewer samples are available for training a model within the outer loop of the nested CV procedure than for training the final model, which can affect the prediction accuracy.
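To make the evaluation setup concrete, the sketch below shows a generic nested CV loop using scikit-learn; the synthetic data, the parameter grid, and the fold counts are illustrative placeholders rather than the configuration used in this study.

```python
# Minimal nested CV sketch: the inner loop tunes hyperparameters, the outer
# loop estimates generalization on data never seen during tuning.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=200, n_features=50, random_state=0)

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Inner loop: hyperparameter search (here only the regularization strength C).
search = GridSearchCV(
    LogisticRegression(max_iter=5000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="balanced_accuracy",
    cv=inner_cv,
)

# Outer loop: each fold trains a freshly tuned model on the remaining folds,
# so the reported score reflects data not used in model selection.
scores = cross_val_score(search, X, y, scoring="balanced_accuracy", cv=outer_cv)
print(f"nested CV BAC: {scores.mean():.3f} +/- {scores.std():.3f}")
```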
Table 4 also shows that the performance of the CB classifier trained on the FFTs of individual phase currents remained approximately the same when the number of samples used in the development was increased, although the training time measured as real time increases. However, using the feature-based input Ifeat_200Hz to train the CB classifiers requires 70% of the samples to be used in the development to reach almost as high a BAC in nested CV (96.0% ± 4.7%) and on the holdout test dataset (98.9%). Still, one should note that the model development is more than three times faster with the feature-based dataset compared with raw FFT data, as the number of inputs is lower. In addition, the lower number of inputs reduces the computation time of the model itself when it is used to make predictions.
In contrast to the CB classifier, the LR classifier performed better with the feature-based input than with the raw FFT input. The development of the LR classifier took only 0.4 min with the feature-based input, regardless of the number of samples, which is approximately 47 times faster than the corresponding CB models. Compared with the development of LR and CB models using raw FFT data, the feature-based LR model was, respectively, 22–46 and 114–144 times faster to train, depending on the number of samples used. From the application point of view, the best choice among these options would be to develop an LR model that takes features computed from FFT data as input, as that model is both fast to train and achieves 100% BAC in nested CV and on the holdout test dataset. This LR model extrapolates well, as the holdout test dataset included lower and higher load points than the development dataset.
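As a rough illustration of this feature-based approach, the sketch below computes a few statistics from the FFT magnitude of a phase current and feeds them to an LR classifier; the synthetic signal, the chosen statistics, and the hyperparameters are placeholders, not the exact Ifeat_200Hz feature set used in the study.

```python
# Minimal sketch of the feature-based LR approach on simulated phase currents.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
fs, duration = 1000, 2.0                 # sampling rate [Hz], signal length [s]
t = np.arange(0, duration, 1 / fs)

def spectrum_features(current):
    """FFT magnitude of one phase current reduced to a few statistics."""
    mag = np.abs(np.fft.rfft(current))
    return [mag.mean(), mag.std(), mag.max(), np.median(mag)]

# Dummy dataset: noisy 50 Hz phase currents with random labels.
X = np.array([
    spectrum_features(np.sin(2 * np.pi * 50 * t) + 0.1 * rng.standard_normal(t.size))
    for _ in range(100)
])
y = rng.integers(0, 2, size=100)         # 0 = healthy, 1 = broken rotor bar

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:3]))
```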
4.2. Simulated Vibration Velocity Data
The simulated vibration velocity FFT data was used to form two datasets, namely vv_5000Hz and vv_feat_5000Hz. The former contains unprocessed FFT data (the vibration spectrum), and the latter contains only statistical features computed from the FFT data. Using the simulated vibration spectrum as the input for the classifiers, high BACs are obtained with both classifiers, as shown in Table 5. With raw FFT data, the LR classifier achieved 98.3% BAC with a standard deviation of 3.7% in nested CV and 100% BAC on the holdout test dataset using only 30% of the samples. Improvement was nevertheless obtained when 70% of the samples were used to train the LR model, as the standard deviation of BAC in nested CV decreased to zero while the BAC remained at 100.0% both in nested CV and on the holdout test dataset. With the CB classifier trained on raw FFT data, 70% of the samples were required to obtain 97.2% BAC in nested CV, but the standard deviation was still higher than with the LR classifiers. The nested CV BAC of the feature-based LR and CB classifiers increased only slightly (from 81.1% to 85.2% and from 81.7% to 84.8%, respectively) when the number of samples was increased from 30% to 70%. As with the simulated current data, the standard deviations decrease with the simulated vibration data when more samples are used in the model development.
With simulated vibration velocity data, the feature-based LR and CB models were, respectively, 5–20 and 7–10 times faster to develop compared with the pure FFT-based classifiers. The time required to develop the CB classifiers remained approximately the same regardless of the number of samples used in the training. The development of feature-based CB classifiers was approximately eight times faster compared with raw FFT. Based on these results, it can be concluded that the LR model trained with the FFT of vibration velocity data works the best in this case, and the model extrapolates well, as 100% BAC was obtained on the holdout test dataset that included lower and higher load points than the development dataset. Although its development time is higher compared with that of the same model trained on the feature-based input, it is still reasonable.
These results suggest that the extracted statistical features fail to capture all the relevant information from the raw FFT vibration velocity data, whereas with the simulated current data, the features led to better results. The results in Section 4.1 demonstrate that with the simulated current data, the number of samples has a greater effect on the accuracy of the LR classifier than on that of the CB model. With CB, a BAC of 100% was obtained in nested CV and on the holdout test dataset already with the lowest number of samples used, whereas the LR model required the highest number of samples tested to achieve the same. However, with the LR model, the feature-based approach was not only the most accurate but also the fastest to develop and one of the fastest at making predictions. With the simulated vibration velocity data, on the other hand, the feature-based approach did not yield as high accuracies as the FFT-based approach. Still, a BAC of 100% was obtained in nested CV and on the holdout test dataset with the vibration velocity spectrum as the input for the LR model, although at a higher computational cost than the best current-based model. In general, the input feature set had a more dominant effect on the accuracy and the computational efficiency than the number of training samples.
4.3. Measured Vibration Acceleration Data
Four different sets of features (av_200Hz, av_feat_200Hz, av_feat_f1, and av_feat_f123) were separately formed from the signals of six accelerometers and used to develop LR and CB classifiers to identify a BRB in the IM. The acquisition of the measurement data was discussed in Section 3.2.2. Vibration acceleration sensors were mounted on the drive-end and non-drive-end shields of the IM in the vertical, horizontal, and axial directions. In this section, these sensors are referred to as DEhor, DEvert, DEax, NDEhor, NDEvert, and NDEax. Classifiers were also trained using the frequency-wise sum of the fast Fourier-transformed vibration signals of the six sensors and with the statistical frequency-domain features computed from the FFT data, as discussed in Section 3.3.2. The model development procedure was repeated five times, as discussed in Section 3.5, and the results shown in Table 6 are the average values obtained from these five repetitions. The best signal source for each dataset was selected based on BACw, as described in Section 3.5. The best signal sources are shown in bold in Table 6.
The best BACw score, 90.1%, was obtained with the LR classifier trained on FFT data (av_200Hz) computed from the sensor DEhor signal. However, the BACw for the LR classifier trained on the feature-based av_feat_f123 dataset was almost as high (87.3%), while the computation time required to develop the feature-based classifier was approximately 1.5 times shorter than that of the FFT-based classifier. The slightly longer development time of the FFT-based classifier is caused not only by the higher number of input variables but also by the different feature engineering options in the hyperparameter optimization, which were discussed in Section 3.5. In particular, having the ROCKET transformation as one option to process the data caused slightly longer computation times with the FFT-based datasets.
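For context, the sketch below shows how an FFT sequence can be transformed with ROCKET before fitting an LR classifier, assuming the sktime implementation of the Rocket transformer; the data shapes, kernel count, and classifier settings are illustrative and not the study's exact hyperparameter-optimized pipeline.

```python
# ROCKET + LR sketch on dummy FFT magnitude spectra.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sktime.transformations.panel.rocket import Rocket

rng = np.random.default_rng(0)
# Stand-in for FFT spectra: 100 instances, 1 channel, 400 frequency bins.
X_train = rng.random((100, 1, 400))
y_train = rng.integers(0, 2, size=100)   # 0 = healthy, 1 = broken rotor bar
X_test = rng.random((10, 1, 400))

# ROCKET applies many random convolutional kernels and pools their outputs
# into a fixed-length feature vector per instance.
rocket = Rocket(num_kernels=1000, random_state=0)
X_train_t = rocket.fit_transform(X_train)

clf = LogisticRegression(max_iter=5000).fit(X_train_t, y_train)
print(clf.predict(rocket.transform(X_test)))
```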
The highest BACw score for the CB classifier was 86.2%, which was obtained with three of the four input types (excluding av_feat_f1). Although the highest BACtest with CB was obtained with av_feat_f1 and DEhor, the corresponding nested CV BAC was only 79.0% ± 10.8%. The possible reason for such a result was discussed in Section 4.1.
In this case, the standard deviations of BACnCV were 7.4–13.4% with the LR model and 7.6–15.3% with the CB model. However, the standard deviations of the models trained with a specific input are relatively close to each other regardless of the sensor, i.e., the measurement direction. This suggests that there might be some samples in the dataset with low information value, i.e., samples that are challenging to learn from and to classify. This could be confirmed by examining the individual samples one by one and checking whether samples from some specific operation area are systematically misclassified. In such a case, obtaining more data for development could help, as the results in Section 4.1 and Section 4.2 demonstrate.
The development time of the CB classifiers was in the range of 3.3–7.3 min with the feature-based approach and 67.8 min with the FFT data. The CB model was faster to train than the LR model with the feature-based datasets av_feat_200Hz and av_feat_f1, but a bit slower with the av_feat_f123 dataset. However, with the FFT-based dataset av_200Hz, the LR model was almost seven times faster to develop than the CB model, suggesting that with these datasets, the LR model scales better to a higher number of input features than the CB model. One must keep in mind that the number of training samples is constant in each of the experiments shown in this section.
When the raw FFT data were used as input for either classifier, the optimization algorithm found that applying the ROCKET transformation to the FFT data results in a smaller logistic loss. An analysis of the LR model hyperparameters shows that the inverse of the regularization strength, C, obtained higher values with the feature-based datasets than with the raw FFT data. This is logical, as the raw FFT data contain many more variables than the feature datasets, and thus stronger regularization is needed to prevent overfitting. Overfitting is especially a problem when the number of features is higher than the number of samples. With L2 regularization applied, the coefficients of irrelevant features are driven closer to zero than without regularization, which means that the regularized model does not respond as strongly to changes in these features.
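The effect of C can be illustrated with a short sketch, assuming scikit-learn's LogisticRegression on synthetic data with many irrelevant inputs; a smaller C (stronger L2 regularization) shrinks the average coefficient magnitude.

```python
# Smaller C means stronger L2 regularization, which shrinks the coefficients
# of irrelevant inputs toward zero (illustrative data, not the study's).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 20 informative features plus many irrelevant ones, mimicking a wide FFT input.
X, y = make_classification(n_samples=300, n_features=200, n_informative=20,
                           n_redundant=0, random_state=0)

for C in (0.01, 1.0, 100.0):
    model = LogisticRegression(penalty="l2", C=C, max_iter=5000).fit(X, y)
    print(f"C={C:>6}: mean |coef| = {np.abs(model.coef_).mean():.4f}")
```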
The average computation times required to develop the classifiers and the corresponding BACw with different input features are visualized in Figure 12. It summarizes the discussed findings and demonstrates that while the feature-based datasets lead to short development times with both classifiers, the maximum weighted BAC obtained with them stays below 88%. However, Figure 12 also shows that the LR model scales better in terms of development time and can detect the bar failures more accurately than the CB classifier.
The computation times required to make predictions (i.e., the model run time) with the FFT- and feature-based classifiers, together with the corresponding BACw, are visualized in Figure 13. It shows that the FFT-based classifiers are slower at predicting the bar failures than the corresponding feature-based models. To analyze the reasons behind this, Table 7 shows a breakdown of the total computation time required for predicting with these classifiers, including the time required by the data processing as well as the time required to run the actual model to obtain a prediction.
With raw FFT data, the data processing step takes approximately the same amount of time with both classifiers. However, with the LR model, the actual prediction is obtained in a significantly shorter time than with the CB model: it is 481 times faster. Both classifiers trained with raw FFT data make use of the ROCKET transformation, which makes their data processing time longer compared with the feature-based approach. This suggests that the LR model scales better not only in terms of development time when the number of features increases but also in terms of the computation time required to make predictions. The feature-based LR model has a more than four times faster data processing pipeline and computes the actual prediction almost ten times faster than the corresponding CB model. In total, the feature-based LR model is over five times faster at computing a prediction than CB, but their accuracy is similar.
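The breakdown in Table 7 corresponds to timing the two stages of a prediction separately; the sketch below illustrates the idea with a placeholder feature function and model, not the study's pipeline.

```python
# Timing the data processing step and the model run time of one prediction.
import time
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
model = LogisticRegression(max_iter=1000).fit(rng.random((200, 5)),
                                              rng.integers(0, 2, size=200))

spectrum = rng.random(4000)  # stand-in for one FFT magnitude spectrum

def extract_features(fft_mag):
    """Placeholder statistical features computed from an FFT spectrum."""
    return np.array([[fft_mag.mean(), fft_mag.std(), fft_mag.max(),
                      fft_mag.min(), np.median(fft_mag)]])

t0 = time.perf_counter()
features = extract_features(spectrum)   # data processing step
t1 = time.perf_counter()
label = model.predict(features)         # actual model run time
t2 = time.perf_counter()
print(f"processing: {t1 - t0:.6f} s, model: {t2 - t1:.6f} s, label: {label[0]}")
```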
Even though the raw FFT-based LR classifier achieved the highest accuracy in this study, Figure 12 and Figure 13 show the importance of feature engineering. The feature-based classifiers are not only significantly faster to train but also faster to use in operation, and thus it may be beneficial to study a more extensive extraction of statistical features. While the most accurate model (i.e., the FFT-based LR model) can make approximately 17 predictions per second, the feature-based LR model reaches a speed of over 900 predictions per second. Each of the developed models is computationally fast enough to be used for real-time fault monitoring during operation. Naturally, depending on the hardware used (e.g., in edge computation), the computation time of the slowest models might limit the frequency of analyzing the bar condition, which should be considered when selecting the methods.
Figure 14 and Figure 15 show the classifications computed on the holdout test dataset with the best feature-based LR and CB models, respectively. In both, the x-axis and y-axis indicate the operation point of the machine (i.e., the rotation speed and load, respectively), while the color of the markers shows whether the classification was correct or not. For most of the operation points, there were four measurements available in the holdout test dataset: two with a BRB and two with a healthy rotor bar.
Figure 14 shows that the LR model classifies all but two samples correctly. This LR model was trained using features computed from the frequency-wise sum of the six FFTs of the measured vibration acceleration signals (the av_feat_f123 dataset). The first misclassification is at the operation point where the speed is 1500 RPM with zero load, in which case one of the two samples with a healthy rotor bar is classified as broken. At this operation point, the model is extrapolating, as the lowest load included in the model development data was 5%. The challenge in the zero-load condition might be caused by the fact that when the load is low, the slip is low too, which in turn means that the side-bands in the vibration spectrum that are characteristic of the rotor bar failure are closer to the harmonic frequencies than at high slip values [17]. The second wrongly classified sample is at a speed of 900 RPM with a 60% load, in which case a BRB is classified as healthy.
The CB classifier, which was trained using features computed from the FFTs of the measured vibration acceleration sensor DEvert (the av_feat_f123 dataset), failed to correctly classify seven out of the 67 samples in the holdout test dataset, as shown in Figure 15. As with the LR model, with CB a broken bar was also detected as healthy at a 60% load and 900 RPM. Two of the misclassified samples represented extrapolating operation points with a load of 100% and a speed of 900 RPM, where BRBs were classified as healthy. The same misclassification was made for samples with a load of 20% at 900 RPM, and at the same load level but at a speed of 1500 RPM, healthy bars were classified as broken. Since the raw FFT-based LR model classified these operation points correctly, it might be that the difference between the faulty and healthy cases is not so clear in the FFT frequency response, and hence the few selected statistical features fail to capture it, whereas the FFT-based LR model is sensitive enough to recognize the difference. Regardless of the model type, interpreting the classifiers is challenging, as various feature transformations are applied to the input data (ROCKET applied to the FFT data or various methods applied to the statistical features).
The results demonstrate that no specific measurement direction is significantly better than any other regarding how accurately the bar failure can be detected. Interestingly, for each dataset there is still a visible pattern regarding the best and worst measurement directions, as they are the same for both classifiers. For example, with raw FFT data, the horizontal measurement direction resulted on average in slightly higher accuracy than the other directions, whereas the vertical direction was slightly worse than the others. The horizontal direction is also a slightly better option with the av_feat_f1 dataset. With the av_feat_200Hz dataset, the vertical measurement direction is more accurate than the other directions. The frequency-wise sum of the FFTs computed from all signals was found to be the best with the av_feat_f123 dataset, with a minor margin over the individual signals. However, it requires all six measurements to be available for monitoring. Based on these findings, it seems that the most accurate rotor bar failure detection can be obtained with an LR classifier trained with the raw FFT data of vibration acceleration measured in the horizontal direction, with the FFT data transformed using the ROCKET algorithm.
The experiments presented in this section included two additional input feature sets in which domain knowledge was utilized to compute the statistical features of the FFT only within a narrow frequency range around the first or the first three harmonic frequencies, rather than from the whole FFT sequence. Computing the features around the first three harmonic frequencies resulted in almost as high accuracy as was achieved with the FFT-based input data, but with a 96 times shorter development time with the LR model, which demonstrates the potential of the feature-based approach even though only five features were extracted from each of the narrow frequency ranges. Focusing the analysis on the relevant frequency ranges reduces the amount of noise and the number of redundant or irrelevant input features, which might be one reason for the lower standard deviations in the nested CV scores with the feature-based datasets. This highlights the importance of feature engineering. Still, the highest BACw score was obtained by using the data transformed with the fast Fourier and ROCKET methods to train an LR model. In this study, the LR model performed overall slightly better than the CB model when both the accuracy and the computational efficiency are considered.
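As an illustration of this narrow-band approach, the following sketch computes a handful of statistics around the first three harmonics of the supply frequency; the 50 Hz supply frequency, the ±5 Hz band width, and the chosen statistics are assumptions made for the example, not the exact feature definitions of Section 3.3.2.

```python
# Extract simple statistics from narrow bands around the first three harmonics.
import numpy as np
from scipy.stats import kurtosis, skew

def narrow_band_features(freqs, fft_mag, supply_freq=50.0, band=5.0, n_harmonics=3):
    """Return five statistics per band centred on the first n_harmonics harmonics."""
    features = []
    for k in range(1, n_harmonics + 1):
        centre = k * supply_freq
        mask = (freqs >= centre - band) & (freqs <= centre + band)
        band_mag = fft_mag[mask]
        features.extend([band_mag.mean(), band_mag.std(), band_mag.max(),
                         skew(band_mag), kurtosis(band_mag)])
    return np.array(features)

# Example with a dummy spectrum sampled at 1 Hz resolution up to 200 Hz.
freqs = np.arange(0.0, 200.0, 1.0)
fft_mag = np.random.default_rng(0).random(freqs.size)
print(narrow_band_features(freqs, fft_mag).shape)  # (15,) = 5 statistics x 3 bands
```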