1. Introduction
Electric machines are a vital component of major industries, such as the manufacturing, mining, agriculture, energy and transport sectors. It is fair to say that all of these sectors are currently undergoing major growth and technological innovation. These machines are typically required to operate under harsh environmental conditions and demanding drive cycles, which give rise to premature degradation and catastrophic failure modes [1,2]. It is imperative for the future viability and sustainability of these sectors that we have efficient, robust and highly reliable electric machines. Sudden catastrophic machine breakdown results in acute manufacturing downtime, dramatic reductions in productivity, and health and safety concerns. Moreover, performing critical maintenance is both labour-intensive and costly, and faults can be difficult for maintenance teams to diagnose and troubleshoot [3,4,5].
Broadly, electric machines can be broken down into the bearings, stator, rotor and other elements as shown in
Figure 1a. The statistics of failure for three classes, low voltage, medium voltage and high voltage, are presented in
Figure 1b. Bearings are the dominant failure mode at low and medium voltages, followed closely in the latter class by stator fault modes. The final class, high voltage, is dominated by stator fault modes, as seen in
Figure 1b. Research studies have shown bearings to be responsible for up to
of low-voltage electric machine breakdowns and up to
of all rotating machine failures [
2,
6,
7,
8,
9].
Condition-based monitoring (CbM) has received considerable attention in recent years [2,10,11,12]; hence, a rich literature now exists. The detection and diagnosis of fault modes have received the most attention, with mature industrial technologies available. More recently, advanced methods of prognosis have been investigated, which focus on the more challenging problem of predicting the Remaining Useful Life (RUL) of machines or their sub-components [13,14,15]. This is illustrated in Figure 1c. Knowing the RUL of a component ensures that the state of health of a machine is known and that suitable maintenance can be performed at the optimal times, so that the maximum usable life is achieved without the threat of total machine breakdown [16,17,18,19].
Typically, machine-learning (ML) methods for CbM focus on three key aspects: signal sensing modalities, feature engineering and supervised learning algorithms. Sensing modalities for CbM methods that have been explored in recent years include acoustic emission (noise) [5,20,21,22], air gap torque monitoring [23,24], motor current signal analysis (MCSA) [25,26], oil and debris monitoring [27,28], thermography [29,30], speed fluctuation monitoring [31], induced voltage, magnetic flux monitoring and vibration signals [4,5,11,16].
The Short-Time Fourier Transform (STFT) has been used extensively to extract useful time-frequency features, which have been reported to achieve high levels of classification accuracy [32,33,34,35]. Envelope Analysis (EA) has also been used extensively for prognostic and diagnostic purposes, as the method is simple yet versatile, making it applicable to many different types of mechanical fault monitoring processes [21,22,36,37,38]. Other time-frequency feature extraction methods that have shown good promise for bearing prognostic use cases are Wavelet Transformation (WT) [39,40,41,42] and Empirical Mode Decomposition (EMD) [43,44,45,46,47,48], both of which were reported to achieve highly accurate performance scores.
In the area of supervised learning, two of the most versatile, simple and effective ML techniques that have been used extensively for bearing condition monitoring are k-Nearest Neighbour (k-NN) [49,50,51] and Support Vector Machines (SVM) [52,53,54,55]. More recently, advanced deep-learning techniques have been applied for prognostic purposes; these combine multiple traditional methods with neural networks to perform RUL classification [40,45,48,56,57,58,59].
In [59], an RUL prediction method was proposed based on a long short-term memory (LSTM) neural network framework and deep features, which were learned adaptively from the two health states. Benali [48] proposed a method to characterize and classify seven different bearing classes using statistical features, EMD energy entropy and an artificial neural network (ANN). Li [56] used a combination of two supervised ML techniques, a regression model and a multi-layer ANN, to predict the RUL of rolling element bearings.
In this paper, a novel ML method for RUL classification is proposed, using non-linear signal processing techniques to perform feature engineering based on STFT and EA with one-third octave band feature compression. The rationale for using Fourier- and EA-based feature extraction came from a detailed vibration signal analysis conducted on the bearing signals from the dataset. This analysis motivated the non-linear compression of the multidimensional feature space using octave bands, as the prognostic information signatures are highly concentrated in the lower portion of the spectra.
These features form part of the ML method recipes alongside twelve different supervised learning algorithms based on k-NN and SVM, which were compared to determine the optimal choice for RUL classification. SVM and k-NN algorithms were chosen because of their robustness for supervised learning problems, in particular problems with datasets of limited size, a constraint that greatly limits the suitability of more advanced deep-learning approaches, e.g., ANN and LSTM. The work also highlights the importance of using non-linear wear-state models to track the degradation severity levels; this has been shown to greatly improve the overall performance of the ML classifiers for this RUL task. The time-frequency analysis conducted on the vibration signals also motivated the investigation of non-linear wear-state models, as bearing degradation typically does not follow linear trends.
The remainder of this paper is organised as follows. Section 2 presents a graphical and statistical analysis of the vibration signals. The proposed ML method is detailed in Section 3. The experimental procedure is described in Section 4. In Section 5, the results from the proposed CbM system are presented. Section 6 discusses the major trends and findings from the results and statistical analysis, together with the limitations of this work. Finally, Section 7 concludes the work and highlights future research avenues to explore.
4. Experimental Procedure
This section describes the experimental procedure, which can be summarised under three main strands: the ML method recipes, the round robin framework and the performance metrics.
4.1. ML Method Recipes
The experimental procedure for this research involved varying the following parameters, as described in the previous section: (a) feature extraction using either the STFT of the discrete-time domain signal or the envelope of the vibration time-series data; (b) feature selection using the full spectra from 0 to 12,800 Hz, linear bands or one-third octave bands as feature vectors; (c) the wear-state classification model using either linear, , or non-linear, , temporal class boundaries; and (d) model training and testing using an SVM or k-NN approach. For the SVM, six kernel function options were applied to map the input features to a higher-dimensional feature space: linear, quadratic, cubic, and fine, medium and coarse Gaussian. For the k-NN algorithm, six variants for determining the target class were investigated: fine, medium, coarse, cosine, cubic and weighted.
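As an illustration of the one-third octave band option in (b), the following minimal sketch compresses a power spectrum into band energies. It is not the implementation used in this work: the base-2 band definition, the 1 kHz reference frequency, the lower cut-off `f_min` (a logarithmic band layout cannot start at 0 Hz) and the function names are all assumptions made for the example.

```python
import math

def third_octave_bands(f_min, f_max, f_ref=1000.0):
    """Return contiguous (low, high) band edges in Hz covering [f_min, f_max)."""
    # Band n is centred on f_ref * 2**(n/3); its lower edge is f_ref * 2**((2n - 1)/6).
    n = math.floor(3 * math.log2(f_min / f_ref))
    bounds = []
    while True:
        edge = f_ref * 2.0 ** ((2 * n - 1) / 6.0)
        bounds.append(edge)
        if edge >= f_max:
            break
        n += 1
    bands = []
    for low, high in zip(bounds, bounds[1:]):
        if high > f_min:  # skip a band lying entirely below f_min
            bands.append((max(low, f_min), min(high, f_max)))
    return bands

def compress_spectrum(freqs, power, bands):
    """Sum the spectral power of every bin that falls inside each band."""
    return [
        sum(p for f, p in zip(freqs, power) if low <= f < high)
        for low, high in bands
    ]

# Hypothetical spectrum: 10 Hz bin spacing up to just below 12,800 Hz.
freqs = [10.0 + 10.0 * i for i in range(1279)]
power = [1.0] * len(freqs)
bands = third_octave_bands(10.0, 12800.0)
features = compress_spectrum(freqs, power, bands)
print(len(freqs), "bins ->", len(features), "band features")
```

Because the band width grows with frequency, the low-frequency part of the spectrum is represented by many narrow bands and the high-frequency part by a few wide ones, which is the emphasis on the lower portion of the spectra that motivates this feature set.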
4.2. Round Robin Framework
All seven bearing test cases were incorporated into each RUL estimation process in a round robin framework that seeks to make maximum use of the dataset as well as mitigate problems relating to over-fitting. The experimentation process involved allocating six bearing signals as training datasets to teach the ML algorithms, while the seventh bearing test case was used for testing. Once RUL estimations had been obtained for the out-of-sample test signal, that bearing was added back to the in-sample training pool, and the next sequential bearing was transferred to the testing pool. The ML prediction model was then retrained, and this was iterated until RUL estimations had been obtained for all seven bearing test cases.
Using this framework to train and test each prediction algorithm greatly reduces the risk of over-fitting, as performance is only ever measured on out-of-sample test signals: in every case, the model training data consists of signals from bearings other than the one under test. This gives a realistic indication of how the models would perform in a real-world application, where the monitored bearing differs from those used to train the models.
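The round robin procedure described above amounts to a leave-one-bearing-out loop, which can be sketched as follows. The data layout (a dictionary mapping a bearing identifier to its feature vectors and labels) and the `fit` and `predict` callables are hypothetical placeholders for whichever SVM or k-NN model is being evaluated.

```python
def round_robin_splits(bearing_ids):
    """Yield (training_ids, test_id) pairs, holding out one bearing at a time."""
    for held_out in bearing_ids:
        yield [b for b in bearing_ids if b != held_out], held_out

def run_round_robin(data, fit, predict):
    """data maps bearing id -> (feature_vectors, labels); returns predictions per bearing."""
    results = {}
    for train_ids, test_id in round_robin_splits(list(data)):
        # Pool the six in-sample bearings into one training set.
        train_x = [x for b in train_ids for x in data[b][0]]
        train_y = [y for b in train_ids for y in data[b][1]]
        model = fit(train_x, train_y)  # retrain from scratch on every iteration
        results[test_id] = predict(model, data[test_id][0])  # out-of-sample only
    return results

# Toy usage with a trivial "most frequent class" model (illustrative only).
def fit_majority(xs, ys):
    return max(set(ys), key=ys.count)

def predict_majority(model, xs):
    return [model for _ in xs]

toy = {b: ([[0.0], [1.0]], [1, 1]) for b in range(1, 8)}
print(run_round_robin(toy, fit_majority, predict_majority))
```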
The classification process involves dealing with a multi-class and multi-label classification model, which comprises five temporal wear-state classes to be estimated for thousands of consecutive time samples. A moving-average (MA) filtering technique is incorporated to smooth out any undesirable whipsawing or erratic transitioning between the five temporal wear-state classes. The MA technique takes a window of nine discrete-time samples, consisting of the current target prediction, the previous four predictions and the following four predictions. The mode of the nine predicted values is then assigned to the current target sample. In a real-time application, these nine samples span a 40 s time period, which is negligible over a bearing’s lifetime, typically years for a real system.
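The nine-sample smoothing step can be sketched as a sliding mode filter. The text does not state how the first and last four samples or tied modes are handled, so the choices below (a truncated window at the sequence edges, and keeping the current prediction when it is among the tied modes) are assumptions made for this example.

```python
from collections import Counter

def mode_filter(predictions, half_width=4):
    """Replace each prediction by the mode of a (2 * half_width + 1)-sample window."""
    smoothed = []
    n = len(predictions)
    for i in range(n):
        # Assumption: the window is truncated at the start and end of the sequence.
        window = predictions[max(0, i - half_width):min(n, i + half_width + 1)]
        counts = Counter(window)
        top = max(counts.values())
        modes = sorted(c for c, k in counts.items() if k == top)
        # Assumption: on a tie, keep the current prediction if it is one of the modes.
        smoothed.append(predictions[i] if predictions[i] in modes else modes[0])
    return smoothed

# A single erratic jump to class 3 inside a run of class 1 is removed.
print(mode_filter([1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1]))
```

A genuine, sustained transition between classes is left in place, since the new class becomes the window mode once it occupies at least five of the nine samples.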
4.3. Performance Metrics
The performance of each ML approach investigated was analysed by computing the Jaccard Index [74,75], Equation (8), and multiplying by a factor of 100 to obtain a percentage accuracy value.
where
z represents the true class of a time-sample and
represents the class prediction from the ML algorithm.
The Mean Absolute Error (MAE) was calculated for each of the classification models and feature selection options [76,77]. Using Equation (9), the absolute error between the predicted target instances and the real expected values was calculated for each time-sample. This resulted in an integer in the range 0 to 4, as we are dealing with five wear-state classes: a correct prediction has zero error, and the largest possible error is a prediction four classes away from the expected class. The errors for the individual time-samples were summed and divided by the total number of test samples to calculate the MAE. These MAE values for each classification model were then normalised by dividing each MAE result by 4 and are compared in the tables below.
where
M represents the total number of target instances to be classified,
z represents the true class of a target instance and
represents the class prediction from the ML algorithm.
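The MAE calculation and its normalisation, as described in the text, can be sketched as follows. The function and argument names are illustrative; classes are assumed to be encoded as the integers 1 to 5.

```python
def normalised_mae(true_classes, predicted_classes, n_classes=5):
    """Return (MAE, normalised MAE) for integer wear-state classes."""
    if not true_classes or len(true_classes) != len(predicted_classes):
        raise ValueError("inputs must be non-empty and of equal length")
    # Per-sample absolute class error: 0 when correct, at most n_classes - 1.
    total = sum(abs(z - z_hat) for z, z_hat in zip(true_classes, predicted_classes))
    mae = total / len(true_classes)
    return mae, mae / (n_classes - 1)  # dividing by 4 for five classes

# Errors of 0, 0, 1, 0 and 4 classes over five samples.
print(normalised_mae([1, 2, 3, 4, 5], [1, 2, 4, 4, 1]))  # (1.0, 0.25)
```

Unlike an accuracy score, this metric penalises a prediction that is four classes away four times as heavily as one that lands in the neighbouring class.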
5. Results
This section presents the results obtained from the proposed ML framework for RUL classification.
5.1. Linear Wear-State Classification Approach
The linear wear-state classification accuracy and MAE results achieved using the STFT features extracted from the discrete-time signals are presented in
Table 4. In the case of the SVM classification method, the lowest accuracy results were recorded at
for the one-third octave band feature set using a fine Gaussian kernel function. The highest classification performance achieved was
using the one-third octave band feature selection and a coarse Gaussian kernelling method.
For the k-NN classification method, the lowest classification performance recorded was for the linear band feature set using fine k-NN. The highest classification accuracy achieved was by the one-third octave band features with cosine k-NN. The MAE results indicate that the coarse Gaussian kernel using the one-third octave band features was also the best-performing model with the lowest MAE score. The lowest error scores of were achieved by both the cubic and cosine k-NN models.
The results of the experiment using the signal envelope-derived features and linear wear-state classification are presented in
Table 5. In the case of the SVM model, the lowest performance was recorded at
accuracy for the one-third octave band FFT features using a fine Gaussian SVM classifier. The highest classification performance, on the other hand, was achieved at
, for the one-third octave band summed envelope features using a Linear SVM kernel function.
For the k-NN classification results, the lowest performance was recorded at accuracy for the squared one-third octave band features from a fine k-NN classifier. The highest classification accuracy was achieved at for the squared linear band features using a coarse k-NN classifier. The MAE results indicate that the linear, coarse and medium Gaussian kernel functions using the summed one-third octave band features were the best-performing SVM models, with the lowest MAE score. The coarse k-NN model was also the best k-NN option in terms of MAE, making it the most accurate in both the Jaccard Index and MAE aspects.
5.2. Non-Linear Wear-State Classification Approach
The results achieved using non-linear wear-state classes and STFT features are presented in
Table 6. For the SVM experiments, the lowest performance recorded was
for the linear band features using a cubic kernel function SVM model. The highest performance of
was achieved by the one-third octave band features using a medium Gaussian kernel function.
In the case of the k-NN experiments, the lowest performance recorded was from the linear frequency band features with the Fine k-NN. The highest performance accuracy of was achieved using the one-third octave band features with Coarse k-NN. This was also the highest classification performance achieved overall for the STFT analysis feature study.
The normalised MAE results for the STFT spectral features using SVM classification models indicate that the Medium and Coarse Gaussian kernel functions using the one-third octave band features were also the best-performing models, with the lowest normalised MAE score. Similarly, for the k-NN experiments, the one-third octave band features using Coarse k-NN achieved the lowest error score.
The non-linear wear-state results for the Signal-Envelope-derived features using are presented in
Table 7. For the SVM models, the lowest performance recorded was
for the squared one-third octave band features using a cubic kernel function, whereas the highest performance of
was achieved for the summed one-third octave band envelope features using a linear SVM kernel function. In the case of the
k-NN classification results, the lowest performance recorded was
for the squared one-third octave band features using a Fine
k-NN classifier.
The highest performance accuracy was for the one-third octave band FFT features using a cosine k-NN classifier. This was also the highest classification performance achieved overall for the Envelope Analysis k-NN study and, importantly, for all of the experimental studies reported in this work. The normalised MAE results using the signal envelope-derived spectral features with SVM classification models indicate that the Linear kernel function using the one-third octave band features was the best-performing model with the lowest MAE score of .
The Cosine k-NN model proved to be the best option for the non-linear temporal classes, with an error score of , which was also the lowest overall error score across all four experimental studies presented. These best MAE values therefore correspond to the best classification accuracy achieved, which is not unexpected.
6. Discussion
The results presented in
Section 5 highlight the best- and worst-performing supervised ML approaches for both the linear and the non-linear wear-state classes that were investigated. The SVM-based algorithm approach that employed one-third octave band-based features was found to yield the best performance for the linear wear-state classes. This was the case for both STFT- and EA-based features, achieving scores of J =
with MAE =
for
using SVM (Coarse G) and J =
with MAE =
for
using SVM (Linear), respectively, as shown in
Table 4 and
Table 5, for the linear wear-state classification.
Again, for the non-linear wear-state classification the SVM (Medium G) based algorithm performed extremely well using the
STFT features by achieving scores of J =
with MAE =
as shown in
Table 6. Using the same octave band feature set, the
k-NN (Cubic) approach had comparable performance coming in at slightly less accuracy at J =
with the same MAE =
, as shown in
Table 6. In the case of the EA features for non-linear wear-state classification, the SVM (Linear) achieved J =
with MAE =
, as shown in
Table 7. However, the
k-NN (Cosine) had superior performance using the spectral-based EA features,
, achieving J =
with MAE =
. This was the best performance achieved for all the ML approaches over this entire investigation, with the best classification accuracy and the lowest MAE.
To analyse and interpret the results more closely, confusion matrices are presented for the best-performing approaches for both the linear and non-linear wear-state classification approaches investigated. Accordingly, these confusion matrices correspond to the approaches that have their values highlighted in bold font in
Table 4,
Table 5,
Table 6 and
Table 7 as discussed previously. At the class level, these confusion matrices enable the classification results to be examined, and they allow the MAE to be quantified and better appreciated—for instance, regarding how many samples from class 1 were incorrectly classified as class 5.
The confusion matrices shown in Figure 7 allow us to see the classification performance for each class by observing the percentage score along the diagonal. It is noted that the vast majority of misclassifications are predictions of a neighbouring class, which is significant. While this information was captured collectively for the entire class set by the MAE metric, the confusion matrices offer the granularity to identify which specific classes were the most challenging to estimate.
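To make the link between the confusion matrices and the MAE concrete, the sketch below builds a row-normalised confusion matrix (in percent) and measures what share of the misclassifications fall into a neighbouring class. The orientation (rows as true classes, columns as predicted classes), the integer class encoding 1 to 5 and the toy labels are assumptions made for this example and are not taken from the figures.

```python
def confusion_matrix_percent(true_classes, predicted_classes, n_classes=5):
    """Row-normalised confusion matrix: rows are true classes, columns are predictions."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for z, z_hat in zip(true_classes, predicted_classes):
        counts[z - 1][z_hat - 1] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([100.0 * c / total if total else 0.0 for c in row])
    return matrix

def neighbour_error_share(true_classes, predicted_classes):
    """Fraction of misclassified samples that are exactly one class away."""
    errors = [abs(z - z_hat)
              for z, z_hat in zip(true_classes, predicted_classes) if z != z_hat]
    if not errors:
        return 0.0
    return sum(1 for e in errors if e == 1) / len(errors)

true = [1, 1, 1, 1, 2, 2, 3, 3, 4, 5]
pred = [1, 1, 1, 2, 2, 3, 3, 3, 5, 1]
cm = confusion_matrix_percent(true, pred)
print([cm[i][i] for i in range(5)])       # per-class accuracy on the diagonal
print(neighbour_error_share(true, pred))  # 0.75
```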
All of the ML recipes presented performed very well on class 1, the max and min range being
to
in comparison to the max and min range for class 5 of
to
. The ML recipe with the best performance overall at
with MAE of
for the linear wear-state classification was that of the
features with an SVM (Linear) algorithm; this is shown in matrix (c) of
Figure 7. This particular ML recipe achieved the highest classification for class 4 at
and was second best in classes 1, 3 and 5, which ultimately led to it achieving the best overall score.
Similarly, as shown in
Figure 8, the classification performance for class 1 for the non-linear wear-state classes was good; however, the range was wider, with max and min values of
and
, respectively. In class 5, by contrast, the max and min range was from
to
. The best performance overall at
with MAE of
for the non-linear wear-state classification was the ML recipe that comprised the
features with
k-NN (Coarse) algorithm; this is shown in matrix (d) of
Figure 8. This ML recipe showed strong performance in class 1, average performance in classes 2 and 3, and poor performance in class 5.
However, the performance in class 4 at
significantly outperformed the other ML recipes shown, and this led to it scoring the best overall. These trends in individual class performance can also be viewed in Figure 9, although without the benefit of visualising where the incorrectly classified test cases have been predicted. These points, along with a mean value, correspond to the diagonal values of the confusion matrices in Figure 7 and Figure 8. The high performance of class 1 for both the linear and non-linear wear-state class options is identifiable, as is the decreasing trend as the classes approach the failure stage of the bearings under test.
Prior work by Sutrisno et al. [
78], Singleton et al. [
79] and Lei et al. [
80], presented ML methods that achieved percentage accuracy scores of
,
and
, respectively, using the PRONOSTIA bearing dataset. However, these proposed methods utilised a framework where only bearings S. 01 and S. 02 were used for training the algorithm, and the remaining five bearings were used for testing. The round-robin experimental framework used in this paper reports the mean percentage accuracy over all seven bearing signals, whereas the prior work only reported the mean over five signals. Furthermore, the MAE performance metric was used here to ascertain the severity of the misclassifications.
Feature subset compression using the non-linear one-third octave-based filtering performed very well for both the linear and non-linear wear-state models. This can be attributed to placing a higher emphasis on the lower portion of the spectra for feature extraction. From a feature-engineering perspective, this was shown to offer more valuable diagnostic trend information for characterising the health condition of the bearings under test [11].
Importantly, this reduces the multivariate dimensionality [70,71] of the feature space more effectively than linear filtering, as demonstrated by the superior classification performance. Moreover, a non-linear wear-state model is more suitable for classification because the ageing mechanisms typically follow a non-linear exponential trend; hence, significantly higher performance is achieved. Clearly, a trade-off exists: if class 1 is too large with respect to the others, the subsequent classes become too short in time and the degradation progresses rapidly between them, which diminishes the suitability of the RUL framework for taking timely action, such as equipment maintenance and critical parts replacement. Our non-linear exponential model described in Equation (6) strikes a suitable balance and was found to work extremely well in this proposed approach.
The highest overall classification scores were achieved using the Cosine k-NN classifier. This was achieved across all seven bearing test cases using the round robin framework and, hence, demonstrates how the proposed ML methodology performs on unseen raw vibration signals. However, k-NN and SVM algorithms rely heavily on the depth of the training data, as is common in supervised learning, which makes them prone to producing over-fitted prediction models when data are limited. While this paper introduced a valuable and robust approach for RUL estimation, future work might investigate applying the proposed method to larger datasets.
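For readers unfamiliar with the cosine variant of k-NN, the following minimal sketch classifies a feature vector by majority vote among its nearest neighbours under the cosine distance. The neighbourhood size `k`, the tie-breaking rule and the toy feature vectors are assumptions for illustration; they are not the settings or data used in this work.

```python
import math
from collections import Counter

def cosine_distance(a, b):
    """One minus the cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an all-zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)

def cosine_knn_predict(train_x, train_y, query, k=10):
    """Majority vote among the k training vectors closest to the query."""
    ranked = sorted(
        ((cosine_distance(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in ranked[:k])
    top = max(votes.values())
    # Assumption: ties are resolved in favour of the lowest class label.
    return min(label for label, count in votes.items() if count == top)

# Toy two-band feature vectors: energy mostly in band 1 (class 1) or band 2 (class 5).
train_x = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [0.1, 3.0]]
train_y = [1, 1, 5, 5]
print(cosine_knn_predict(train_x, train_y, [5.0, 0.2], k=3))  # 1
```

Because the cosine distance depends only on the direction of the feature vector, it compares the shape of the band-energy distribution rather than its overall magnitude.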
The dataset used in this research was limited in terms of the total number of specimens aged and captured using vibration signals. In addition, the level of accelerated ageing is perhaps too rapid, which led to a high proportion of abrupt failure modes occurring approximately of the time. These have a completely different degradation trend to the gradual ageing mechanisms; hence, this places limits on testing the true efficacy of the ML frameworks and recipes due to model over-fitting. If the datasets were significantly improved by increasing the number of specimens and reducing the level of accelerated ageing, this would offer the potential to explore ML approaches that employ advanced deep learning using neural networks.
To test the versatility and robustness of the proposed ML method recipes on different bearing types and sizes under different speed and load conditions, future work could explore vibration data gathered from research testbeds where the shaft speed changes. This will require extensive experimental campaigns to create more advanced datasets that better reflect typical real-world operating conditions.
7. Conclusions
Traditionally, condition-based monitoring (CbM) of electric and rotating machines has focused heavily on two primary areas, the detection and the diagnosis of fault modes. More recently, research efforts have investigated the more challenging area of prognosis to determine the remaining useful life (RUL) of the machine under test. This paper introduced a valuable machine learning (ML) approach to estimate the RUL of rolling element bearings, which are a core component of rotating machines.
The proposed ML recipes and approaches comprise signal processing techniques and ML algorithms applied to real-world vibration signals, which were acquired from the outer race of bearings degraded over time using an accelerated ageing test-rig. The paper reports the results for linear and non-linear wear-state models using novel feature engineering derived from Short-Time Fourier Transform (STFT) and Envelope Analysis (EA) representations. In addition, two different classification algorithm approaches, k-Nearest Neighbour (k-NN) and Support Vector Machines (SVM), were investigated and compared.
This work achieved classification accuracy results of up to with a mean absolute error (MAE) of , which demonstrates the method’s efficacy for performing the task of RUL. This ultimately offers a robust and low complexity approach that is highly valuable for advanced predictive maintenance purposes in industry.