1. Introduction
Since the rise of machines and consequent industrial revolutions, rotating machines have become an integral and inevitable asset within virtually all industrial setups, irrespective of the sector. This is mainly due to their versatility and ability to adapt to the incredibly harsh operating environments that prevail in most industries. Components such as electric motors [1, 2, 3, 4] and gearboxes are integral to the functioning of numerous industrial rotating systems and are often envisaged to be robust enough to withstand highly dynamic operations.
However, despite their often-resilient designs and configurations, gearboxes still experience failures, some of which have direct and devastating consequences for income, the environment, and human safety. Although the value of gearboxes has never been in doubt, their criticality to alternative energy systems (especially their contribution to failure rates) has intensified the scrutiny they have received in recent times. For instance, the study by Spinato et al. [5] highlighted that wind turbine (WT) gearboxes account for the highest mean-time-to-repair among all onshore sub-assemblies. Similarly, Gray and Watson [6] highlighted that as much as 33% of the total operational downtime of energy generation systems can be attributed to their gearboxes. The findings from these studies [5, 6] have also been corroborated by regional surveys and other technical reports. For instance, 3-year WT operational data from offshore wind farms in the Netherlands indicated that gearbox failures can account for more than 55% of total annual downtime, which sometimes corresponds to over 52% of energy not served [7]. Feng et al. [8] conducted a similar survey for selected wind farms in the United Kingdom and reported nearly identical findings.
In general, the incessant failure of gearboxes in industrial systems is often attributed to inaccurate estimation of actual operating loads, unforeseen changes to loading conditions during operations, faulty component design, and/or inaccurate repair/replace decisions [9]. Traditionally, maintenance interventions (mainly repair and replace) served as remedies to such failures, since little consideration was given to downtime in the past. In contrast, modern-day operations are very lean and mainly customer-oriented, owing to fierce global competition. This is perhaps the reason for the surge in the popularity of predictive and condition-based maintenance (CBM) strategies [10, 11, 12], whereby the condition of industrial assets dictates the frequency of maintenance interventions. Just as gearboxes have become indispensable within most industrial operations, vibration monitoring (VM) [13, 14, 15] is arguably one of the most widely applied CBM techniques, owing to the established fact that all structures (static or rotating) exhibit their own peculiar dynamic characteristics. The fundamental premise of VM is to understand and track the trend of these characteristics for individual critical assets, so as to detect deviations at incipient stages before the occurrence of catastrophic failures. Despite the huge successes recorded with well-established VM techniques in the time [16, 17], frequency [18], and time–frequency [19] domains, the rigour often associated with the individualised analysis of large volumes of data acquired from each measurement location on a typical rotating machine can prolong decision-making, which may lead to fatal consequences when dealing with safety-critical systems. To further compound this problem, most modern-day industrial rotating machines are multi-component (e.g., gears, bearings, drive belts, rotors, electric motors, and couplings) and multi-state (e.g., speed and load variations), thereby requiring a holistic approach to VM.
In order to alleviate this limitation and better optimise conventional rotating machine VM approaches, research endeavours over the past few decades have explored mechanisms through which VM information can be harmonised into single but representative frameworks. Such approaches are generally referred to as data fusion, information fusion, or hybridisation. In general, data fusion involves the combination of data and information from different sources to obtain enhanced accuracy that may not be achievable from a single source. This approach has the potential to overcome some of the real-life problems that have plagued the use of VM for incipient fault detection and diagnosis. Data fusion can occur at different stages of a typical faults classification process, with the sensor, feature, and decision levels being the most common. At the sensor or data acquisition level, data from various commensurate sensors are directly fused, after which the most representative features are used to take decisions [20, 21, 22, 23, 24]. At the feature level, individual sensors are used to acquire data separately, and the required feature vectors are extracted from each. The feature vectors are then fused together to form the basis for operational state classification [25, 26, 27]. Finally, the decision-level fusion approach harmonises the decisions that have been separately drawn from the feature vectors of individual sensors [28, 29].
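The difference between feature-level and decision-level fusion described above can be illustrated with a minimal Python sketch. The feature values, class labels, and the majority-vote rule are hypothetical and chosen purely for illustration; they are not taken from the cited studies, which may combine features and decisions differently.

```python
from collections import Counter


def fuse_features(feature_vectors):
    """Feature-level fusion: concatenate per-sensor feature vectors into one vector."""
    fused = []
    for vector in feature_vectors:
        fused.extend(vector)
    return fused


def fuse_decisions(decisions):
    """Decision-level fusion: majority vote over per-sensor class labels.

    Ties are resolved in favour of the label that appears first (an assumption).
    """
    counts = Counter(decisions)
    best = max(counts.values())
    for label in decisions:
        if counts[label] == best:
            return label


# Hypothetical features extracted separately from two sensors.
sensor_a = [0.12, 0.40]
sensor_b = [0.09, 0.55, 0.31]
fused_vector = fuse_features([sensor_a, sensor_b])

# Hypothetical decisions drawn separately from three sensors.
fused_label = fuse_decisions(["faulty", "healthy", "faulty"])
```

In the feature-level case a single classifier would then operate on `fused_vector`, whereas in the decision-level case each sensor has already been classified before the vote.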
Generally, machinery fault diagnosis approaches that embed machine learning techniques within them usually entail two key phases, namely feature extraction and fault identification. The feature vectors generated during the feature extraction phase are usually applied as inputs in the chosen machine learning technique(s) for the fault identification phase [30]. The fundamental essence of fault identification is to distinguish healthy from faulty machine conditions, based on the extracted features, which is also equivalent to the mapping of information from the feature space to the fault space [30]. Classifiers and statistical learning methods, such as k-nearest neighbour (k-NN) algorithms, Bayesian classifier, support vector machine (SVM), and artificial neural network (ANN) have been widely utilised in structural health monitoring systems of rotating machinery [30].
Kalman filtering [31], weighted average [32], algebraic functions [33], Bayesian estimators, nonlinear system fusion, and adaptive observers [34] are some of the earlier and well-established traditional sensor fusion approaches. Despite the usefulness of the academic research advancements achieved through these techniques, their mathematical intrusiveness could be a reason for their limited application to real-life rotating machine faults classification. For instance, although Bayesian estimators are popular, the processing frameworks for nonlinear dynamic models often entail multi-dimensional integrals that are analytically intractable, thereby leading to estimation difficulties. In addition, the generated outputs are sometimes generic and associated with multimodality, asymmetries, and discontinuities [35]. With regard to Kalman filtering, linearised transformations are only proficient when error propagation can be estimated using a linear function; otherwise, the linearised outputs are negatively impacted, leading to complete divergence [36]. Adaptive observers can be very powerful and accurate when dealing with continuous time domain signals, but their effectiveness dwindles under low control and switching frequency, which is often an attribute of the nonlinearities of several rotating machine components, especially gear systems [37]. Moreover, each of these classes of techniques [31, 32, 33, 34] is often focussed on single-stage fusion and applied to a single machine component, thereby increasing computational complexity and, in turn, further complicating an already tedious faults classification process. On the contrary, the recently developed composite coherent spectrum (CCS) [38] and poly-coherent composite spectrum (pCCS) [38, 39, 40, 41] significantly reduce the computational stages associated with data fusion, since they embed both sensor-level and feature-level fusion into a single framework.
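As a point of reference for the traditional approaches listed above, the following minimal sketch shows the measurement update of a scalar, linear Kalman filter that fuses a prior estimate with one sensor reading. The numerical values are hypothetical, only the linear case is covered, and the sketch does not reproduce any of the cited implementations.

```python
def kalman_update(x_prior, p_prior, z, r):
    """One scalar Kalman measurement update.

    x_prior, p_prior: prior estimate and its variance
    z, r: new measurement and its noise variance
    """
    gain = p_prior / (p_prior + r)
    x_post = x_prior + gain * (z - x_prior)
    p_post = (1.0 - gain) * p_prior
    return x_post, p_post


# Hypothetical values: prior 10.0 (variance 4.0), measurement 12.0 (variance 4.0).
x_post, p_post = kalman_update(10.0, 4.0, 12.0, 4.0)
```

With equal variances the update reduces to a plain average of the two values, which is the link to the weighted-average fusion also mentioned above. The divergence issue noted in the text arises when such linear updates are applied to linearised versions of strongly nonlinear models.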
While previous studies on CCS [38] and pCCS [38, 39, 40, 41] have yielded encouraging outcomes, they have only been applied to rotor-related machine faults (mainly misalignment, rub, crack, and bow), which do not adequately represent the multi-component configuration of most modern-day rotating machines. Besides this restriction of scope, the diagnosis approach in the founding CCS and pCCS studies is manually implemented, which could be unrealistic for routine condition monitoring of multi-component industrial rotating machines, which often entails high-frequency sampling at multiple locations. The automated framework applied here was recently presented by Yunusa-Kaltungo and Cao [42] to help address the laborious nature of manual faults classification with CCS. The framework encompassed initial feature generation through CCS data fusion, data dimensionality reduction via principal component analysis (PCA), and subsequent faults classification using several machine learning techniques, including ANN, SVM, and k-NN, among others. The outcomes of the automated approach are usually a set of visualised decision maps representing individually simulated scenarios, which simplify diagnosis and minimise diagnosis-related downtime by allowing asset operators to easily identify anomalies at their incipient stages without necessarily possessing vibration monitoring expertise. Additionally, comparisons with other VM techniques indicated that CCS has significant advantages as a feature extraction method, owing to its ability to greatly reduce the potential complexities that are sometimes associated with machine learning input datasets. The results obtained from the study [42] were encouraging and showed that ANN was most compatible with CCS. However, the study was also limited by its application to only rotor-related faults.
This study extends previous works on CCS and pCCS in two main ways. Firstly, it establishes a framework by which the diagnosis of multiple classes of rotating machine faults can be automated through machine learning algorithms. Secondly, it builds upon the encouraging results obtained from the preceding part of this approach, which was limited to well-known rotor-related faults [42], by incorporating gearbox fault detection into a single framework. Hence, the extension provided here considers an entirely different class of rotating machine components, namely the gearbox, so as to complement earlier findings and ascertain robustness. Additionally, the proposed approach is primarily based on tools and features that are universally established across academia and industry (especially the amplitude spectrum), thereby easing the transfer of theoretical knowledge into practice.
To accomplish this, the paper initially compares the proficiency of its approach to that of earlier related approaches in Section 2, after which a brief theoretical overview of the proposed framework is provided in Section 3 such that the current paper can be fully comprehended without the need to consult earlier articles. Section 4 provides full details of the experimental designs, with particular emphasis on the experimental rig configuration, types of machine operating conditions simulated, technical specifications of instruments, and signal processing parameters. Previous studies [42] have already recommended several rotor fault detection features, but it is uncertain whether all such features will adequately support the computational effectiveness of the current study. Therefore, in Section 5, the performance of relevant features is initially examined, after which the most influential features are identified. In Section 6, the results of faults classification based on the proposed approach are presented, together with explanations of the implications of the findings for VM of rotating machines (in this case, gearbox faults). Section 7 provides the validation of the applied method with independent public datasets. Finally, Section 8 concludes the study and highlights possible future directions.
2. Comparison with Closely Related Works
The study of fault diagnosis in rotating machines is well-established and continues to generate useful but sometimes closely related outputs, which makes it imperative to compare and contrast them in order to identify niche areas. Therefore, this section presents a comparative analysis between closely related approaches to fault diagnosis of rotating machines and the current study, so as to clearly highlight areas of potential interface, overlap, variation, limitation, and superiority. In order to better show the advantages of the proposed method and point out future research directions, Table 1 provides comparisons with other recent studies in a similar area. The main criteria used for comparison are the data types, classification algorithm(s), application of data fusion, and fault classes considered.
Yunusa-Kaltungo et al. developed CCS [38] and pCCS [38, 39, 40, 41], which significantly rationalised the computational stages associated with fault diagnosis through data fusion by embedding both sensor- and feature-level fusion into a single framework. However, their application has been limited to rotor-related machine faults and entails manual classification, which increases the downtime related to fault diagnosis. This limitation led to the proposal of an automated framework [42] that still used features generated via CCS but further involved data dimensionality reduction by PCA and eventual machine learning-based faults classification. The outcomes were very encouraging, especially since the study [42] exposed the compatibility of ANN with CCS, but the study was again confined to rotor-related faults and was therefore not all-encompassing.
Cao et al. [43] developed a deep transfer learning approach based on a convolutional neural network (CNN) algorithm, and their study advocated the suitability of the approach for deep feature extraction and gear fault diagnosis. Similarly, Shao et al. [44] developed a CNN-based deep transfer learning framework for mechanical fault diagnosis and classification, while Soualhi et al. [45] proposed a health indicator fed into an adaptive neuro-fuzzy inference system (ANFIS) to detect the state of health of a typical system and then diagnose sources of anomalies. The data collected through this method are electrical signals (mainly current signals) rather than mechanical signals such as vibrations, which offers non-invasive benefits, since the current, voltage, or power sensors that are already integrated into the control systems of electrical machines can be used. Azamfar et al. [46] developed a novel multi-sensor data fusion methodology based on 2-D CNN for gearbox fault diagnosis using motor current signature analysis. Zhang et al. [47] proposed a novel unsupervised learning algorithm named fast intrinsic component filtering (FICF) for the fault diagnosis of rotating machinery. These studies have no doubt enhanced the knowledge around the fault diagnosis of rotating machines. However, they are limited either by their focus on a single class of faults (e.g., rotor faults, gear faults, or bearing faults alone), which implies that alternative approaches will need to be considered for other fault classes, thereby increasing rigour and downtime, or by the computational intensiveness of CNN-based approaches. Table 1 provides more targeted merits, demerits, and coverage of individual study classes.
6. Classification Results and Their Implications
Having established the most influential features for both shaft/rotor and gear faults in Section 5, the next stage of the analysis involves reducing the dimensionality of such features using PCA as well as examining the abilities of different combinations of features to retain the highest variability. The implementation of PCA was based on the theories described in Equations (7)–(10). The four classes of features considered for this study are rotor/shaft only (α1–α5), GMFs only (β1–β5), combined rotor/shaft and GMFs (α1–α5) + (β1–β5), and SE-GMFs (γ1–γ5) features. The content distributions for 10 PCs were compared for all classes of features at all speeds as shown in Table 4. Since it is well established that the most significant information will usually reside within the first few PCs, the performance of combined PC1–2 and PC1–3 was compared, where it can be seen that PC1–3 held slightly superior information, which implies that it holds the potential to offer the most distinctive classification for all scenarios.
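The comparison between PC1–2 and PC1–3 amounts to comparing cumulative explained variance. The sketch below shows that calculation in Python; the eigenvalues are hypothetical placeholders and are not the values reported in Table 4.

```python
def cumulative_explained_variance(eigenvalues):
    """Cumulative percentage of variance explained by the leading PCs.

    eigenvalues: PCA eigenvalues (variance carried by each PC), in any order.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative, running = [], 0.0
    for value in ordered:
        running += value
        cumulative.append(100.0 * running / total)
    return cumulative


# Hypothetical eigenvalues for 10 PCs (illustrative only).
eigenvalues = [5.0, 2.0, 1.0, 0.5, 0.5, 0.4, 0.3, 0.1, 0.1, 0.1]
cumulative = cumulative_explained_variance(eigenvalues)
pc12 = cumulative[1]  # variance retained by PC1 and PC2 together
pc13 = cumulative[2]  # variance retained by PC1, PC2, and PC3 together
```

Because each additional PC can only add variance, `pc13` is never smaller than `pc12`; the practical question is whether the gain justifies the extra input dimension.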
However, owing to the higher data requirements for the PC1–3 combination and the correspondingly higher computational burden, the performance of the PC1–2 combination was additionally explored for comparative purposes as shown in Figure 7. As anticipated, the PC1–3 combination offered the best separation between the clusters that represent all machine conditions (Figure 7a–c), but the performances of several PC1–2 combinations were also encouraging, especially those that involved (α1–α5) + (β1–β5) and SE-GMFs (γ1–γ5) features in Figure 7e,f and Figure 7m–o, respectively. However, despite the good intercluster separations achieved with PCA, its manual approach makes it unsustainable for routine diagnosis of rotating machines, whereby huge amounts of data related to highly dynamic scenarios are involved. Based on this perceived limitation, there is a need to apply approaches that possess self-learning capabilities with minimal human intervention. One such approach is ANN, whose proficiency with the current framework has already been established with several rotor-related faults at various machine speeds.
The current study aims to consolidate as well as extend the robustness of the approach by investigating an entirely novel class of faults with regard to a CCS-based data fusion approach. The classification problem is defined as classifying the data into 3 classes (BC, SF, and MF) based on the selected features. To achieve this, 3 ANN architectures were examined for PC1–3 and PC1–2 combinations for all cases at all speeds. For the PC1–3 combination, the ANN architectures had 3–10–3, 3–20–3, and 3–30–3 configurations for ANN1, ANN2, and ANN3, respectively. For the PC1–2 combinations, however, 2–10–3, 2–20–3, and 2–30–3 configurations were respectively applied for ANN1, ANN2, and ANN3. In order to ascertain the performance without PCA, ANN4 was computed without PCA and its outcome was also used for comparison (i.e., 10–30–3 for (α1–α5) + (β1–β5) and 5–30–3 for (γ1–γ5)). It is vital to note that the three numbers in each ANN configuration (3–10–3, 3–20–3, 3–30–3, 2–10–3, 2–20–3, 2–30–3, 10–30–3, and 5–30–3) respectively represent the number of inputs, the number of neurons in the hidden layer, and the number of outputs. The analysis was conducted based on a 70–15–15 random split of features extracted from measured VM data for training, validation, and testing, respectively. The PCA step described in Section 3.2.1 was then applied to 85% of the datasets (i.e., combined training and validation datasets), after which 15% of the datasets were extracted from the 85% and used for validation. Subsequently, the classification steps described in Section 3.2.2 were applied to the testing datasets. The transfer function adopted here is the sigmoid symmetric transfer function. Since the ANN is of the back-propagation type, scaled conjugate gradient (SCG) was used as the learning algorithm as well as for overfitting avoidance. Table 5, Table 6 and Table 7 provide full details of the configurations and performance at all speeds.
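The 70–15–15 random split, and the fact that PCA is fitted only on the combined training and validation samples, can be sketched as follows. The sample count, the seed, and the function name `split_indices` are assumptions made for illustration and do not come from the study.

```python
import random


def split_indices(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly partition sample indices into training, validation, and testing sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains is held out for testing
    return train, val, test


# Hypothetical dataset of 100 feature vectors.
train_idx, val_idx, test_idx = split_indices(100, seed=42)

# PCA would be fitted on these samples only (85% of the data); the resulting
# projection is then applied unchanged to the held-out testing samples.
pca_fit_idx = train_idx + val_idx
```

Keeping the testing indices out of `pca_fit_idx` ensures that the projection itself carries no information from the samples used to report accuracy.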
There are 2 aspects to evaluating the performance of ANNs: one is the accuracy of fitting and the other is whether overfitting occurs. As shown in Table 5, Table 6 and Table 7, the results of different ANN architectures are very similar for the same scenarios (i.e., same speeds and same sets of features). With respect to the input features, at 21 Hz the accuracy of ANN with inputs of PC1–3 for (α1–α5) + (β1–β5) was significantly better than that of PC1–2 for (α1–α5) + (β1–β5) and PC1–2 for (γ1–γ5). However, PC1–2 for (γ1–γ5) has the best classification results at the other 2 speeds. The ANN computed based on inputs without PCA yielded similar results overall, except that it performed better than PC1–2 for both (α1–α5) + (β1–β5) and (γ1–γ5) at 21 Hz. This was because the percentages of variance explained by PC1–2 at 21 Hz were relatively small (i.e., 45.577 or 3% and 73.442 or 8%). In general, there was no significant difference in the accuracies of the ANNs trained with these 3 sets of features as inputs at the same speeds. Further evidence on the reason for not using (α1–α5) + (β1–β5) + (γ1–γ5) as a feature in this study is provided in Table A1 within Appendix A. In order to demonstrate the rationale behind using ANN as the machine learning classifier in this study, the classification accuracy of ANN was compared to those obtained from three other machine learning classifiers, namely, k-NN (k = 10), naïve Bayes, and linear SVM as shown in Table 8. The comparisons were based on two input feature types. Figure 8 shows that k = 10 for k-NN had the overall best results in a range of k from 1 to 15 for all considered scenarios. Therefore, k = 10 was chosen for the comparisons in Table 8. The results indicate that the ANN method outperformed all other classifiers for every scenario considered in this study.
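The selection of k for the k-NN baseline can be sketched as a simple sweep over candidate values. The two-dimensional points below are hypothetical stand-ins for PC1–2 scores of the BC, SF, and MF classes, and the sweep covers only k = 1, 3, 5 because the toy training set has nine points; the study itself examined k from 1 to 15.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_x)


# Hypothetical, well-separated clusters (not measured data).
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
           (2.0, 2.0), (2.1, 1.9), (1.9, 2.2),
           (4.0, 0.0), (4.1, 0.2), (3.9, 0.1)]
train_y = ["BC"] * 3 + ["SF"] * 3 + ["MF"] * 3
test_x = [(0.1, 0.1), (2.0, 2.1), (4.0, 0.1)]
test_y = ["BC", "SF", "MF"]

scores = {k: accuracy(train_x, train_y, test_x, test_y, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
```

On real data the accuracies would differ across k, and the value with the best overall score across scenarios would be retained, as was done with k = 10 here.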
In order to ensure good classification effects, overfitting must be avoided. Since the decision boundary of the classifier trained by the input sets with 3 dimensions or above is reasonably hyperplane in nature, it is difficult to visualise the decision rules in a 2-dimensional map. Thus, the difficulty of direct observation on whether there is an overfitting problem in an ANN with high-dimensional inputs could yield challenges in practice. On the contrary, the decision rules of ANNs trained by 2-dimensional input sets can be easily displayed. Based on this premise, it is fair to assume that 2-dimensional training input sets with PCA are advantageous when the variations in accuracy are minimal.
It is well known that overfitting is an immense threat to the abilities of machine learning algorithms to accurately detect and classify new data, owing to the incorporation of extrinsic details during the training process. In this study, it is envisaged that the application of SCG as a training algorithm will help mitigate potential problems. During individual trainings, the initial values of neurons will be reset randomly, with a corresponding random redivision of the data into 3 distinct groups for training, validation, and testing. This approach implies that training multiple times with a single input set will produce different results with slightly different decision boundaries.
Figure 9 shows the decision rules of ANNs trained by PC1–2 for (α1–α5) + (β1–β5) and (γ1–γ5) at different speeds (i.e., typical results after a single round of training). The input datasets here correspond to (d–f) and (m–o) in Figure 7. The number of neurons in the hidden layer is considered as a variable for controlling potential overfitting problems. The emergence of complex boundary curves and narrow or slender envelope areas within decision regions are likely indications of overfitting. For instance, the steep curvature of the decision boundary between the SF and MF regions in Figure 9c, as well as the visibly elongated sharp strip at the lower end of the MF region in Figure 9i, indicates that ANN3 could be associated with overfitting problems. With reference to Figure 9, decision maps generated from more neurons tend to be associated with overfitting problems. Therefore, since there are only 2 input values and 3 output classes in this classification, 10 neurons are adjudged sufficient for optimised, reasonably comprehensive, and complete classification of the cases considered in this study (i.e., increasing the number of neurons may not lead to better results). However, as more fault types emerge, it may be necessary to increase the number of neurons to boost the accuracy of classification. In general, the results presented here show that the initially proposed automated fault diagnosis framework is capable of identifying and classifying common gearbox faults using very simple and well-known features such as the amplitudes of rotor-related and GMF harmonics. This provides good encouragement that the approach may be suitable for integrating rotor and gearbox fault diagnosis into a single framework in the near future.
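The link between hidden-layer size and model complexity, and the way a 2-input decision map is produced, can be sketched as below. This is a minimal illustration with randomly initialised (untrained) weights: SCG training is not shown, the use of tanh for every layer is an assumption standing in for the symmetric sigmoid transfer function, and configurations are written with ASCII hyphens (for example "2-10-3").

```python
import math
import random


def parameter_count(config):
    """Number of weights and biases in a fully connected network such as '2-10-3'."""
    sizes = [int(s) for s in config.split("-")]
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes, sizes[1:]))


def init_network(sizes, seed=0):
    """Random weights per neuron; the last entry of each neuron is its bias."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]


def forward(network, inputs):
    """Propagate inputs through every layer using tanh activations."""
    activations = list(inputs)
    for layer in network:
        activations = [math.tanh(sum(w * a for w, a in zip(neuron, activations)) + neuron[-1])
                       for neuron in layer]
    return activations


def predict(network, inputs):
    """Index of the output neuron with the largest activation."""
    outputs = forward(network, inputs)
    return outputs.index(max(outputs))


def decision_map(network, xs, ys):
    """Predicted class index at every (x, y) grid point; rows follow ys."""
    return [[predict(network, [x, y]) for x in xs] for y in ys]


net = init_network([2, 10, 3], seed=1)   # untrained 2-10-3 network
grid = [i / 2 - 1 for i in range(5)]     # -1.0, -0.5, 0.0, 0.5, 1.0
dmap = decision_map(net, grid, grid)
counts = {c: parameter_count(c) for c in ("2-10-3", "2-20-3", "2-30-3")}
```

The parameter counts grow from 63 to 183 as the hidden layer goes from 10 to 30 neurons, which gives the larger networks more freedom to bend the decision boundary around individual training points.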
7. Validation Dataset
In order to further examine the effectiveness of the applied method for classifying independent datasets, the study obtained publicly available gearbox fault datasets provided in an earlier study by Shao et al. [44] for validation. According to Shao et al. [44], the validation gearbox datasets were acquired from a drivetrain dynamic simulator, whereby two kinds of working conditions (i.e., rotating speed and load) were experimentally simulated. The rotating speed and load configurations were set to 20 Hz–0 V and 30 Hz–2 V. Vibration data were collected using 6 accelerometers mounted at 2 measuring positions. Position one (P1) datasets were acquired from the planetary gearbox measurement location in three directions (i.e., x, y, and z). Similarly, Position two (P2) datasets were acquired in three directions (i.e., x, y, and z) but from a parallel gearbox.
The different types of faults for both gearboxes are shown in Table 9. The datasets contain five different working conditions (i.e., four fault types and one healthy). Hence, the fault diagnosis here is based on a 5-class classification task. For each of the scenarios, 10 VM datasets were acquired for approximately 200 s. During spectrum and CCS calculation, the signal processing parameters used were a 5120 Hz sampling frequency, 80% segment overlap, 0.5 Hz frequency resolution, 249 segment averages, 10,240 FT data points, and a Hanning window. For CCS computation, two forms of data fusion were considered. The first fused the data from all six accelerometers mounted at the two measurement locations into a single spectrum (i.e., P1xyz+P2xyz). The second fused the data from the two accelerometers that had the same orientation (i.e., P1x+P2x, P1y+P2y, and P1z+P2z). ANN1 (2–10–5) was used as a classifier, and the PC1–2 of shaft harmonic features (α1–α5) was used as input. The analysis was also conducted based on a 70–15–15 random split of data for training, validation, and testing, respectively. Based on the linear space generated by the application of PCA to the training and validation datasets, a linear transform was then implemented on the testing datasets.
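The relationship between the quoted signal processing parameters, and the idea of reading harmonic amplitudes from a windowed spectrum, can be sketched as follows. This is an illustration on a synthetic signal using an ordinary single-segment amplitude spectrum; it is not the CCS computation, segment averaging is omitted, and treating α1–α5 as the amplitudes of the first five shaft-speed harmonics is an assumption made for the example.

```python
import cmath
import math


def frequency_resolution(fs, n_fft):
    """Spacing between spectral lines for a given sampling rate and FT length."""
    return fs / n_fft


def harmonic_amplitude(segment, fs, freq):
    """Amplitude of the spectral line at `freq` for one Hann-windowed segment."""
    n = len(segment)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    acc = sum(w * x * cmath.exp(-2j * math.pi * freq * i / fs)
              for i, (w, x) in enumerate(zip(window, segment)))
    return 2 * abs(acc) / sum(window)  # corrects for the window's amplitude loss


def shaft_harmonic_features(segment, fs, shaft_speed, n_harmonics=5):
    """Amplitudes at 1x to n_harmonics x shaft speed (assumed meaning of the features)."""
    return [harmonic_amplitude(segment, fs, h * shaft_speed)
            for h in range(1, n_harmonics + 1)]


fs, n_fft = 5120, 10240            # values quoted in the text
hop = round(n_fft * (1 - 0.80))    # samples between segment starts at 80% overlap

# Synthetic signal: 20 Hz shaft speed with a weaker second harmonic.
segment = [1.0 * math.sin(2 * math.pi * 20 * i / fs)
           + 0.5 * math.sin(2 * math.pi * 40 * i / fs) for i in range(n_fft)]
features = shaft_harmonic_features(segment, fs, shaft_speed=20)
```

With these parameters the spectral lines are 0.5 Hz apart, so every shaft harmonic of a 20 Hz or 30 Hz machine falls exactly on a line and its amplitude can be read without interpolation.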
The classification problem is defined as classifying the data into 5 classes (Health, Chipped, Miss, Root, and Surface) based on the selected features. Table 10, Table 11, and Figure 10 show the results of the validation, where it can be observed that the applied approach effectively classifies all the considered validation datasets, thereby confirming its robustness. It was also observed that the outcomes obtained by integrating all six accelerometers are better than those obtained when only two accelerometers were used.