Article

Evaluation of One-Class Classifiers for Fault Detection: Mahalanobis Classifiers and the Mahalanobis–Taguchi System

by Seul-Gi Kim, Donghyun Park and Jae-Yoon Jung
1 ICT Digital Sector, SK Inc. C&C, Seongnam-si 13558, Korea
2 Department of Big Data Analytics, Kyung Hee University, Yongin-si 17104, Korea
3 Department of Industrial and Management Systems Engineering, Kyung Hee University, Yongin-si 17104, Korea
* Author to whom correspondence should be addressed.
Processes 2021, 9(8), 1450; https://doi.org/10.3390/pr9081450
Submission received: 1 July 2021 / Revised: 4 August 2021 / Accepted: 16 August 2021 / Published: 20 August 2021
(This article belongs to the Section Sustainable Processes)

Abstract

Today, real-time fault detection and predictive maintenance based on sensor data are being actively introduced in areas such as manufacturing, aircraft, and power system monitoring. Many faults in motors and other rotating machinery, such as industrial robots, aircraft engines, and wind turbines, can be diagnosed by analyzing signal data such as vibration and noise. In this study, to detect failures based on vibration data, preprocessing was performed using signal processing techniques such as the Hamming window and the cepstrum transform. Ten statistical condition indicators were then extracted to train the machine learning models. Specifically, two types of Mahalanobis distance (MD)-based one-class classification methods, the MD classifier and the Mahalanobis–Taguchi system, were evaluated for detecting faults in rotating machinery. Their fault-detection performance was evaluated on data with different imbalance ratios by comparison with binary classification models, including classical and imbalanced (cost-sensitive) versions of support vector machine and random forest algorithms. The experimental results showed that the MD-based classifiers become more effective than binary classifiers when there are far fewer defect data than normal data, which is common in real-world industrial fields.

1. Introduction

Recently, the manufacturing industry has shown much interest in smart manufacturing as a way to improve productivity and competitiveness. Smart manufacturing is realized using advanced technologies such as the Internet of Things (IoT), artificial intelligence, and big data analytics [1]. Increasingly complex facilities in manufacturing systems need to be monitored and maintained in more sophisticated ways. To this end, prognostics and health management (PHM) technology can diagnose or predict faults by detecting and analyzing the condition of facilities using IoT, machine learning, and big data analytics.
In particular, rotating machinery such as industrial motors, aircraft engines, and wind turbines plays a crucial role in the automation of manufacturing systems, so the fault detection of rotating machines has a decisive influence on system productivity. Many problems in rotating machines stem mainly from defects in bearings or gearboxes, or from shaft misalignment. The failure of a rotating machine that transmits power to various facilities results in great economic loss due to performance degradation or shutdown of the system.
Rotating parts such as bearings often generate abnormal signals when problems develop, so abnormal conditions can be diagnosed by investigating the signal data. These signals require appropriate preprocessing based on various signal processing techniques, which turn the raw signals into meaningful information that can be analyzed accurately and easily.
In this study, vibration data generated from rotating machines were preprocessed by applying appropriate signal processing techniques, and a fault-detection method was developed that can diagnose abnormalities of equipment parts in real time. Vibration data of normal and fault conditions were collected, and data standardization was performed so that the signals could be compared on the same scale. Thereafter, the Hamming window technique was applied to segment the vibration signal, and the cepstrum technique was adopted to enhance the inherent characteristics of the signal by eliminating noise. After preprocessing, 10 statistical condition indicators (SCIs), such as root mean square (RMS) and peak-to-peak, were extracted for training the machine learning models. The extracted features were finally used to detect abnormal states with Mahalanobis distance (MD)-based one-class classification methods.
The MD-based one-class classification methods construct the Mahalanobis space (MS), represented by the MD, using only normal signal data, and then determine whether a new signal sample belongs to the MS or not. In contrast, typical binary classification methods such as support vector machines (SVM) and random forests (RF) need both normal and abnormal data to train models for detecting abnormal conditions of the system [2,3,4,5]. Unfortunately, in practical industrial systems, the amount of fault data that can be collected is extremely small. For this reason, it is often difficult to apply typical two-class (i.e., binary) classification techniques to construct fault-detection models in real-life industrial systems. We therefore aimed to analyze the advantages and disadvantages of one-class classification techniques that consider the data distribution. In particular, two MD-based classification methods were evaluated: the Mahalanobis distance classifier (MDC), which uses the MS based on MD to detect outliers, and the Mahalanobis–Taguchi system (MTS), which additionally adopts Taguchi techniques to choose and use only the key factors among all the variables.
The performance of the two MD-based classifiers was compared with binary classification methods and their imbalanced classification versions. All models were evaluated on the same test data set after being trained with different imbalance ratios (IRs) of normal to abnormal data in the training data set.
The remainder of this paper is structured as follows. In Section 2, we introduce related studies on MD-based classification. In Section 3, we present the fault-detection framework based on vibration data, describing the signal processing methods, data preprocessing, and fault-diagnosis classification models. In Section 4, we compare the performance of one-class classifiers and binary classifiers according to different IRs of the same training data set. Finally, we conclude the paper with future work in Section 5.

2. Related Work

The MDC defines a normal group and constructs the MS using data from that group [6]. A new sample is classified according to how far away it is from the pre-trained MS. Meanwhile, Taguchi proposed the MTS by combining the MD-based classification method with the Taguchi method [7]. The Taguchi method is used to extract only the effective variables with a large influence on MD estimation. The MTS has been applied effectively in many fields, such as diagnosis, pattern recognition, speech recognition, and optimization [8,9].
The MTS is generally used for multivariate analysis, and various studies have compared its performance with other multivariate analysis techniques. For large samples, the performance of the techniques is similar, while one study found the MTS superior for small samples [10]. Moreover, the MTS still has the limitation of choosing optimal factors among all the variables [8,11], so some studies have integrated feature selection methods into MTS, such as the genetic algorithm (GA) [12], particle swarm optimization [13], and ant colony optimization [14]. In particular, to improve the MTS process, Cheng et al. developed the two-stage Mahalanobis classification system (MCS) [15] and the integrated MCS (IMCS) [16]. In this paper, we focus on the traditional MDC and MTS methods as one-class classifiers and compare their performance with binary classifiers under varying imbalance ratios in detecting faults of rotating machines based on preprocessed vibration data.
Meanwhile, in actual industrial fields, there are few well-designed data sets with adequate quantities of both positive and negative samples. Therefore, many researchers have studied how to solve the imbalanced data set problem. According to [17], the number of published papers studying imbalanced learning has been increasing since 2006; 118 papers were published in 2016, about 17 times the number in 2006.
There are also analytical studies that diagnose faults of rotating machines using MD-based classification techniques. Nader et al. [18] used kernel whitening normalization and kernel principal component analysis (KPCA) to obtain the MD and showed that these techniques can be good choices when the training samples are few or only one class is available. Wei et al. [19] suggested a novel kernel Mahalanobis ellipsoidal learning machine for one-class classification. Bartkowiak and Zimroz [20] used three methods, Parzen kernel density, mixture of Gaussians, and support vector data description (SVDD), after calculating the MD to find outliers for the diagnosis of gearboxes.

3. Fault Detection Based on Vibration Data

3.1. Framework

The procedure for developing a fault-detection model that can classify normal and abnormal data is shown in Figure 1. First, the vibration data of normal and abnormal states are collected for analysis. The collected vibration data are then subjected to the windowing process, in which a long continuous signal is divided into blocks using the Hamming window function and the values are attenuated toward 0 near the boundary of each window frame. The original signal is then separated from the noise and denoised by the cepstrum transform. In this research, 10 SCIs, such as mean, peak-to-peak, and RMS, were used as features for the classification models. These indicators are often used to represent features of time series data in bearing fault-detection problems [21,22,23,24].
The preprocessed data were split into training and test sets to evaluate the MD-based classification methods. Using the training sets, two MD-based classification models, MDC and MTS, were constructed as one-class classifiers. They were evaluated against two representative binary classification methods, SVM and RF, and their imbalanced classification versions, cost-sensitive SVM and cost-sensitive RF. Finally, the performances of the developed models were compared on the same test sets in terms of several classification performance measures.

3.2. Data Description

In this study, we used the vibration signal data of ball bearings provided by the Bearing Data Center of Case Western Reserve University [25]. The vibration data were collected using accelerometers attached to the rotating machine. The data set contained 12,000 digital signal values per second at 1750 RPM. Each sample consisted of 12,000 continuous vibration values, and the classes consisted of a normal state and three abnormal (fault) states: 'Ball', 'Inner race', and 'Outer race'.
In this experiment, we prepared four training data sets with different imbalance ratios (IRs) to compare the performance of one-class classifiers and binary classifiers, mimicking real-life industrial fields where fault data are extremely rare. The IR, which measures the imbalance of binary data, was calculated as in Equation (1). The composition of the training data sets according to IR is shown in Table 1. The test set consisted of 25 samples: 10 normal and 15 abnormal (five for each of the three failure types).
$$\mathrm{IR} = \frac{\#\ \text{of majority data}}{\#\ \text{of minority data}} = \frac{\#\ \text{of normal data}}{\#\ \text{of abnormal data}} \quad (1)$$

3.3. Signal Processing and Data Preprocessing

In this subsection, we describe the signal processing techniques applied. Signal processing means processing digitized signals with algorithms that modify or improve the signal for a specific purpose. In this research, signal processing steps such as standardization, Hamming windowing, cepstrum transformation, and statistical indicator extraction were performed to prepare the inputs for training the fault-detection models.

3.3.1. Standardization

First, to make the collected vibration data comparable on the same scale, standardization was performed using Equation (2). Here, $x_i$ is the vibration value at time $i$ in a signal, and $z_i$ is the standardized value of $x_i$. $\bar{x}$ and $s$ denote the mean and standard deviation of the vibration values $x_1, \ldots, x_N$, respectively, and $N$ is the number of vibration values.
$$z_i = \frac{x_i - \bar{x}}{s}, \qquad i = 1, \ldots, N \quad (2)$$
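For illustration, a minimal sketch of this standardization (assuming NumPy; the function name is ours, not from the paper):

```python
import numpy as np

def standardize(x):
    """Standardize one vibration signal block as in Equation (2):
    subtract the mean and divide by the sample standard deviation."""
    return (x - x.mean()) / x.std(ddof=1)  # ddof=1 gives the sample std s
```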

3.3.2. Hamming Window and Cepstrum

The vibration data used in this study were arbitrarily divided from a continuous vibration signal. Arbitrary cutting of a time series can introduce discontinuities, that is, leakage errors. To remove the leakage, the Hamming window function was applied before the fast Fourier transform (FFT). The window function attenuates the signal toward 0 near the boundaries of the window frame. By applying the Hamming window, the periodicity of the signal can be ensured and a more accurate spectrum can be obtained from the FFT. The window is multiplied with the original signal as in Equation (3), where the windowed signal $g_i$ is the product of the window function $h(i)$ and the input signal $x_i$. Figure 2 shows the signal before and after applying the Hamming window function.
$$g_i = h(i) \cdot x_i \quad (3)$$
$$h(i) = 0.54 - 0.46 \cos\!\left(\frac{2\pi i}{N-1}\right) \quad (4)$$
The cepstrum transform enhances the inherent characteristics of the original signal because it can separate the original signal, that is, the formant, from the noise, as depicted in Figure 3. The spectrum $X(f)$ in the frequency domain is obtained by applying the FFT to the time-domain signal $x(t)$; it is then squared and log-transformed, giving $\log |X(f)|^2$. Finally, the inverse FFT is applied to obtain the cepstrum.
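The following sketch combines the two steps above for one signal block. It assumes NumPy; the explicit window formula mirrors Equation (4) (equivalently, np.hamming(n)), and the small epsilon guarding log(0) is our addition, not from the paper.

```python
import numpy as np

def hamming_cepstrum(x):
    """Apply the Hamming window (Equation (4)) to a signal block and
    return its real cepstrum: IFFT(log |FFT(windowed signal)|^2)."""
    n = len(x)
    i = np.arange(n)
    h = 0.54 - 0.46 * np.cos(2 * np.pi * i / (n - 1))  # Equation (4)
    g = h * x                                          # Equation (3)
    log_power = np.log(np.abs(np.fft.fft(g)) ** 2 + 1e-12)
    return np.real(np.fft.ifft(log_power))
```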

3.3.3. Extraction of Statistical Condition Indicators

SCIs are often used to effectively reflect the characteristics of vibration data that have undergone signal processing [21,22,23,24]. Ten SCIs (mean, peak-to-peak, RMS, standard deviation, skewness, kurtosis, crest factor, shape factor, margin factor, and impulse factor) were extracted from the processed vibration data; Table 2 shows the formula of each indicator. Although the occurrence of faults could be detected simply by observing changes in these indicator values, here the 10 SCIs were used as input features for the classification models.
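A sketch of the feature extraction (assuming NumPy). Following Table 2, $x_p$ is the peak-to-peak value; we assume the mean absolute value in the denominators of the shape, margin, and impulse factors, a common convention, since the plain mean of a standardized signal is near zero.

```python
import numpy as np

def extract_scis(x):
    """Compute the 10 statistical condition indicators of Table 2
    for one preprocessed vibration block."""
    n = len(x)
    mean, std = x.mean(), x.std(ddof=1)
    rms = np.sqrt(np.mean(x ** 2))
    xp = x.max() - x.min()           # peak-to-peak
    abs_mean = np.abs(x).mean()      # assumed denominator for SF/MF/IF
    return {
        "mean": mean,
        "std": std,
        "skewness": np.sum((x - mean) ** 3) / ((n - 1) * std ** 3),
        "kurtosis": np.sum((x - mean) ** 4) / ((n - 1) * std ** 4),
        "peak_to_peak": xp,
        "rms": rms,
        "crest_factor": xp / rms,
        "shape_factor": rms / abs_mean,
        "margin_factor": xp / np.mean(np.sqrt(np.abs(x))) ** 2,
        "impulse_factor": xp / abs_mean,
    }
```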

3.4. Fault Detection Using Mahalanobis Distance

To detect faults using the preprocessed data, we first used the MDC, an MD-based classification technique. It uses the MD as a comprehensive measure and constructs the MS from the MDs of the normal signal group; the MD value of a new signal is then used to distinguish between the normal and abnormal groups. The MTS additionally uses the Taguchi method to select only the important variables that have a major effect on the MD value and then proceeds with the same procedure as the MDC using only these variables. The overall method consists of four steps: Steps 1 and 2 form the classification procedure of the MDC, and Steps 3 and 4, which include the Taguchi method, are the additional procedure for the MTS.

3.4.1. Step 1: Constructing the MS with Normal Data

First, the normal and abnormal groups are distinguished from each other. The MS is constructed using the normal data of the data set, denoted as shown in Table 3. The MS is a multi-dimensional unit space characterized by the MDs of the normal group. The MD is calculated in the three steps below.
1.
Standardization of normal data
The mean of the pth feature, $\bar{x}_p$, and its standard deviation, $s_p$, are first calculated from the feature data of the normal group, $X_p = (x_{pj})$ for $j = 1, \ldots, n$. The pth feature value of the jth sample, $x_{pj}$, is standardized to $z_{pj}$ as follows:
$$z_{pj} = \frac{x_{pj} - \bar{x}_p}{s_p}, \qquad \text{for } p = 1, \ldots, k;\ j = 1, \ldots, n \quad (5)$$
2.
Calculation of the correlation matrix
The correlation matrix $R$ of the standardized data of the normal group is obtained. The correlation coefficient $r_{pq}$ between two variables in $R$ is calculated as follows.
$$r_{pq} = \frac{\sum_{j=1}^{n} (z_{pj} - \bar{z}_p)(z_{qj} - \bar{z}_q)}{\sqrt{\sum_{j=1}^{n} (z_{pj} - \bar{z}_p)^2}\ \sqrt{\sum_{j=1}^{n} (z_{qj} - \bar{z}_q)^2}}, \qquad \text{for } p, q = 1, \ldots, k \quad (6)$$
3.
Calculation of the MD of normal data
The MD of the jth normal sample, $MD_j$, is calculated as in Equation (7). $MD_j$ is often called the scaled Mahalanobis distance since it is divided by the number of variables, $k$.
$$MD_j = \frac{1}{k}\, Z_j^{T} R^{-1} Z_j, \qquad \text{for } j = 1, \ldots, n \quad (7)$$
where $Z_j = (z_{1j}, \ldots, z_{kj})^T$ is the standardized vector of the jth sample and $R^{-1}$ is the inverse of the correlation matrix. If the normal data are collected well, their MD values will be close to 1, since the average of $MD_j$ is statistically 1. The MS constructed from the MD values in this way is called a unit space.
In this study, we prepared four training data sets with different IRs, as presented in Table 1, each of which therefore constructs a different MS from its normal data. Table A1 in Appendix A shows the SCI values of the 20 normal samples in the training data that were used to construct the MS, and Table A2 shows the standardized SCI values. From the standardized values, the correlation matrix was calculated as shown in Table 4. Finally, the MD values of the normal data were calculated, as presented in Table 5a; they were distributed around 1, as expected, since the transformation to MD makes the resulting distribution have a mean of 1.
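A minimal sketch of Step 1 (assuming NumPy; function and variable names are ours). The pseudo-inverse guards against a near-singular correlation matrix, since some SCIs in Table 4 are correlated almost perfectly:

```python
import numpy as np

def build_mahalanobis_space(X_normal):
    """Step 1: construct the Mahalanobis space from normal data only.
    X_normal is an (n samples x k features) array of SCI values."""
    mean = X_normal.mean(axis=0)
    std = X_normal.std(axis=0, ddof=1)
    Z = (X_normal - mean) / std                  # Equation (5)
    R = np.corrcoef(Z, rowvar=False)             # Equation (6)
    R_inv = np.linalg.pinv(R)                    # pseudo-inverse for stability
    k = X_normal.shape[1]
    # Scaled MD of each normal sample, Equation (7); mean should be near 1
    md = np.einsum("ij,jk,ik->i", Z, R_inv, Z) / k
    return mean, std, R_inv, md
```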

3.4.2. Step 2: MD Calculation of Abnormal Data and Validation of MS

To check the validity of the MS derived in Step 1, the MD values of the abnormal data in the training data set were tested. The mean $\bar{x}_p$, standard deviation $s_p$, and correlation matrix $R$ obtained from the normal data in Step 1 were used again to calculate the MDs of the abnormal data. If the MS is properly constructed from the normal data, the MD values of the abnormal data will be much larger than the mean value of the normal group (i.e., 1).
An MD-based classifier for abnormality detection decides that a new sample is abnormal if its MD is greater than a predefined threshold. The threshold can be set by comparing the MD values of normal and abnormal data when enough abnormal data are available. However, this is not appropriate in one-class classification problems, which assume a small number of abnormal data. In that case, we set the MD threshold based on the chi-square value at a specific confidence interval (e.g., CI = 99%), because the MD is known to follow the chi-square distribution whose degrees of freedom (df) equal the number of variables [26].
1.
Standardization of abnormal data
Abnormal data were prepared and denoted as shown in Table 6. An abnormal value $y_{pj}$ was standardized to $w_{pj}$ using the mean $\bar{x}_p$ and standard deviation $s_p$ of the normal data.
$$w_{pj} = \frac{y_{pj} - \bar{x}_p}{s_p}, \qquad \text{for } p = 1, \ldots, k;\ j = 1, \ldots, n \quad (8)$$
2.
Calculation of the MD of abnormal data
The MD values of the abnormal data were calculated using the correlation matrix $R$ of the normal data, where $W_j = (w_{1j}, \ldots, w_{kj})^T$ is the standardized vector of the jth abnormal sample.
$$MD_j = \frac{1}{k}\, W_j^{T} R^{-1} W_j, \qquad \text{for } j = 1, \ldots, n \quad (9)$$
3.
Validation of the MS
The MD values of the abnormal data are shown in Table 5b. The minimum MD value was 2269.5, and all the distances were very far from the normal mean of 1. Therefore, it can be said that the MS of the normal data was constructed successfully.
Now, the MS validated in Step 2 can be used as the MDC. Using the validated MS, the MDC classifies a new sample by comparing its MD with a specific threshold. Suppose that the threshold is set to the chi-square value $\chi^2_{(10,\,0.01)} = 23.2$ for CI = 99% and df = 10, since the number of variables is 10. If the MD value of a new sample is greater than the threshold, the sample is classified as abnormal; otherwise, it is classified as normal.
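A sketch of this decision rule (assuming NumPy and SciPy, and reusing the statistics from the Step 1 sketch above). We assume the quantity compared with the chi-square threshold is the unscaled quadratic form, i.e., the scaled MD of Equation (7) times $k$, which matches the example values such as $MD_7 = 26.843$ versus 23.2:

```python
import numpy as np
from scipy.stats import chi2

def mdc_predict(X_new, mean, std, R_inv, ci=0.99):
    """Classify new samples with the MDC: MD above the chi-square
    threshold (df = number of features) means abnormal."""
    k = len(mean)
    W = (X_new - mean) / std                    # Equation (8)
    md = np.einsum("ij,jk,ik->i", W, R_inv, W)  # unscaled quadratic form
    threshold = chi2.ppf(ci, df=k)              # ~23.2 for k = 10, CI = 99%
    return md > threshold, md                   # True means abnormal
```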

3.4.3. Step 3: Important Variable Selection (Taguchi Method of MTS)

In Steps 3 and 4, the MTS extracts key variables through the Taguchi method and carries out the classification procedure by calculating the MD values in the same manner as Steps 1 and 2.
In addition to the classification procedure of the MDC, the MTS removes the variables that have little or no effect on the MD values and keeps only the key variables. By constructing the MS using only the key variables, the system becomes easier to interpret, and the classification performance can also improve. In the MTS, the Taguchi method is adopted for selecting the key variables [27]. The Taguchi method uses the signal-to-noise ratio (SN ratio) as a criterion for determining the degree of influence on the MD values. In quality engineering, the SN ratio is a measure of system robustness; in the MTS, however, it is used as a measure for selecting the parameters that are important for pattern recognition. The formula of the SN ratio for the larger-the-better characteristic is as follows.
$$SNR = -10 \log\!\left(\frac{1}{k} \sum_{j=1}^{k} \frac{1}{MD_j}\right) \quad (10)$$
To calculate the SN ratio, the experiment was planned with an appropriate two-level orthogonal array, chosen so that it has at least as many columns as the number of variables in the experiment. Since the number of variables in this study was 10, we conducted the experiment with $L_{12}(2^{11})$, the smallest suitable two-level orthogonal array. Specifically, we used the Plackett–Burman $L_{12}(2^{11})$ design, as presented in Table 7, so that the interaction effects among features are uniformly diluted across all the columns of the array, as suggested by Taguchi [27].
Level 1 of the orthogonal array means that the corresponding variable was used, and level 2 means that it was not. As shown in Table 7, the MD values of the normal and abnormal groups were calculated for the 12 experimental conditions, one per row, and the SN ratio was then calculated using Equation (10).
Next, we calculated the gain of the SN ratio for each variable as the difference between the average SN ratio over runs where the variable was used and the average over runs where it was not:
$$\text{Gain} = \overline{SNR}_{L1} - \overline{SNR}_{L2} \quad (11)$$
Table 8 shows the gain of the SN ratio for each feature. A feature with a negative gain is excluded from the MTS feature set, since its significance is low; a feature with a positive gain is selected as a key variable with a significant effect on the MD calculation. As shown in Table 8, seven features were selected as the key variables of the MTS, excluding the peak-to-peak, root mean square, and crest factor features.
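A sketch of this screening step (assuming NumPy; the design matrix and per-run SN ratios are taken as given, e.g., from Table 7):

```python
import numpy as np

def snr_gains(design, snr):
    """Gain of the SN ratio per feature (Equation (11)): the mean SNR
    over runs where the feature is used (level 1) minus the mean SNR
    over runs where it is omitted (level 2).
    design: (12 runs x features) array of levels 1/2; snr: array of 12 values."""
    return np.array([snr[col == 1].mean() - snr[col == 2].mean()
                     for col in design.T])

# Features with positive gain are kept as the key variables of MTS:
# selected = [f for f, g in zip(feature_names, snr_gains(L12, snr)) if g > 0]
```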

3.4.4. Step 4: Fault Detection Using MTS

Now, a new MS was constructed using the seven features selected in Step 3, and the classification procedures of Steps 1 and 2 were conducted again. The classification threshold was adjusted to the chi-square value $\chi^2_{(7,\,0.01)} = 18.5$ for CI = 99% and df = 7, because the number of selected features was seven.

3.5. Fault Detection Based on Machine Learning Methods

In this subsection, we briefly describe how normal and abnormal states are classified by binary classification machine learning methods. The models were developed using both normal and abnormal data and then used to decide whether a new sample was in a normal or abnormal state.
The four data sets shown in Table 1 were used for training two machine learning algorithms, SVM and RF, which are known to show convincing classification performance. In addition to conventional SVM and RF, their imbalanced classification versions were also tested, because three of the four training data sets contained only a small number of abnormal (fault) data, similar to real-life industrial conditions. Specifically, cost-sensitive SVM (CS_SVM) and cost-sensitive RF (CS_RF) were used as the imbalanced classification algorithms; they adjust the class weights to improve training on the minority class. Parameter tuning of the four machine learning methods was performed by grid search under 3-fold cross-validation.
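A sketch of the cost-sensitive baselines (assuming scikit-learn; the parameter grids are illustrative, not the paper's exact settings). Setting class_weight="balanced" reweights classes inversely to their frequencies, which is one standard way to realize cost-sensitive learning:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Cost-sensitive SVM and RF, tuned by grid search with 3-fold CV
cs_svm = GridSearchCV(SVC(class_weight="balanced"),
                      param_grid={"C": [0.1, 0.3, 0.5, 1.0]}, cv=3)
cs_rf = GridSearchCV(RandomForestClassifier(class_weight="balanced"),
                     param_grid={"n_estimators": [100, 300]}, cv=3)
# cs_svm.fit(X_train, y_train); cs_rf.fit(X_train, y_train)
```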

4. Experiments

4.1. Classifiers and Datasets

To evaluate the proposed method, the classification performances using two MD-based one-class classification methods, MDC and MTS, were compared with those of four binary classification machine learning methods, which included classical versions of SVM and RF and their imbalanced classification versions, CS_SVM and CS_RF.
As shown in Table 1, the training data sets were constructed with different imbalance ratios to investigate how the performance of the binary classification methods changes. Note that MDC and MTS use only the normal data for training because they are one-class classification methods. Twenty-five test samples (10 normal and 15 abnormal) were used to compare the performance of all the classification models.

4.2. Experimental Results

As described in Section 3, the one-class classifiers MDC and MTS classify a new sample based on a predefined threshold. We assumed that the one-class classifiers have no knowledge of abnormal data, so the threshold was set according to a confidence interval. In this research, the MDC used 10 features, so its threshold was set to $\chi^2_{(10,\,0.1)} = 16.0$ for CI = 90%, $\chi^2_{(10,\,0.05)} = 18.3$ for CI = 95%, or $\chi^2_{(10,\,0.01)} = 23.2$ for CI = 99%. The MTS used only the seven important features, so its threshold was $\chi^2_{(7,\,0.1)} = 12.0$ for CI = 90%, $\chi^2_{(7,\,0.05)} = 14.1$ for CI = 95%, or $\chi^2_{(7,\,0.01)} = 18.5$ for CI = 99%.
Table 9 shows the MD values of the normal and abnormal test data calculated by MDC and MTS. All normal samples except sample #7 were classified as normal by both MDC and MTS because their MDs were below the threshold. However, MDC misclassified sample #7 as abnormal because $MD_7 = 26.843$ was slightly greater than $\chi^2_{(10,\,0.01)} = 23.2$.
The classification performances of MDC and MTS were compared with those of the balanced and imbalanced binary classifiers based on SVM and RF in terms of four measures: accuracy, balanced accuracy, F-score, and G-mean. The last three measures are often used for imbalanced classification.
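A sketch of the four measures (assuming scikit-learn, with labels encoded as 1 = abnormal, the positive class, and 0 = normal):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             f1_score, recall_score)

def evaluate(y_true, y_pred):
    """Accuracy, balanced accuracy, F-score, and G-mean for one classifier."""
    recall = recall_score(y_true, y_pred)                   # sensitivity
    specificity = recall_score(y_true, y_pred, pos_label=0)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        "f_score": f1_score(y_true, y_pred),
        "g_mean": np.sqrt(recall * specificity),  # geometric mean of the two
    }
```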
As shown in Table 10 and Figure 4, MTS had perfect accuracy, while MDC had an F-score of 0.968 and a G-mean of 0.949 because normal sample #7 was misclassified. Note that MTS and MDC always had the same performance regardless of which of the four training sets was used, since they used only the 20 normal data.
In the case of IR = 1.0, all four machine learning-based classifiers also showed perfect performance, since there were enough abnormal data in the training set. However, as the number of abnormal data in the training sets became smaller, i.e., as the IR grew, the overall classification performance decreased. For IR = 2.222 and IR = 3.333, CS_SVM performed similarly to MDC and worse than MTS, while SVM, RF, and CS_RF performed worse still. When IR reached 6.667, all the binary classification methods performed much worse than MDC and MTS.
Comparing the two MD-based classifiers, MTS performed better than MDC. Moreover, MTS can be considered robust, since it worked with a smaller set of significant features in our experiments. Accordingly, the model can easily be interpreted in real-life industrial systems using only the small number of important SCIs.

5. Conclusions and Future Work

In this study, we evaluated two MD-based one-class classification methods, MDC and MTS, for fault detection of rotating machines using vibration data. To prepare the vibration data for analysis, they were preprocessed with signal processing techniques such as the Hamming window and the cepstrum transformation. Moreover, 10 SCIs, such as mean, standard deviation, peak-to-peak, and RMS, were extracted and used as input variables for model training. To obtain meaningful results for real-life industrial fields, where fault (abnormal) data are very scarce compared with normal data, MDC and MTS were compared with binary classification methods trained on data sets with different IRs.
We focused on one-class classification methods using MD because they do not need any abnormal data to train their models. The two MD-based classifiers were compared with balanced and imbalanced binary classification algorithms based on SVM and RF. In the experiments, the classification performance of the binary classification models degraded severely as the number of abnormal data in the training set decreased. As a result, MDC and MTS showed much better performance than the binary classifiers when the amount of abnormal training data was small.
These experiments are significant because most industrial systems in real fields rarely yield fault data: operators often stop the system before a fault occurs, and even when fault data can be collected, doing so takes a long time or a high cost. This means that one-class classifiers are generally more useful in terms of cost, time, and effort, provided they work with acceptable performance.
In addition, between the MD-based classifiers, the MTS, which selects only key variables through the Taguchi method, can be especially useful in an actual operating environment, since a small number of features is easily interpretable as well as fast and convenient to apply. In our experiments, MTS was robust enough to show better performance than MDC.
As future work, one can test signal processing techniques other than the Hamming window and the cepstrum transformation; the classification performance may be improved by applying signal processing techniques suited to the characteristics of the data. In addition, improved MTS methods such as MCS [15] and IMCS [16] have been developed, as described in Section 2, and they can be evaluated against the traditional MD classifiers and the machine learning methods as well. The performance may also be enhanced by applying recently developed deep learning algorithms and other parameter optimization techniques. Finally, from the viewpoint of imbalanced data, one can use sampling-based techniques such as SMOTE [28] and ADASYN [29] in addition to cost-sensitive learning algorithms.

Author Contributions

Conceptualization, S.-G.K. and J.-Y.J.; methodology, S.-G.K. and J.-Y.J.; software, S.-G.K. and D.P.; validation, D.P. and J.-Y.J.; data curation, S.-G.K. and D.P.; writing—original draft preparation, S.-G.K.; writing—review and editing, J.-Y.J.; visualization, S.-G.K. and J.-Y.J.; supervision, J.-Y.J.; project administration, J.-Y.J.; funding acquisition, J.-Y.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by National Research Foundation of Korea (NRF) grants funded by the Korean government (MSIT) (Nos. 2017H1D8A2031138 and 2019R1F1A1064125) and by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korean government (MOTIE) (Advanced Training Program for Smart Factory, No. N0002429).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. SCI values extracted from the normal data in the training set.

| No. | X1 ($x_\mu$) | X2 ($x_{std}$) | X3 ($x_{ske}$) | X4 ($x_{kt}$) | X5 ($x_p$) | X6 ($x_{RMS}$) | X7 ($x_{CF}$) | X8 ($x_{SF}$) | X9 ($x_{MF}$) | X10 ($x_{IF}$) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0022 | 0.1110 | 88.2268 | 8450.7533 | 11.0849 | 0.1110 | 99.8530 | 51.5367 | 35,304.5702 | 5146.0979 |
| 2 | 0.0022 | 0.1088 | 85.1839 | 7926.0778 | 10.6683 | 0.1088 | 98.0390 | 49.3359 | 33,338.4755 | 4836.8438 |
| 3 | 0.0022 | 0.1071 | 82.8757 | 7521.9721 | 10.3422 | 0.1071 | 96.5281 | 48.3374 | 31,608.3525 | 4665.9227 |
| 4 | 0.0022 | 0.1080 | 84.0284 | 7719.7090 | 10.5124 | 0.1081 | 97.2737 | 49.2069 | 32,280.4954 | 4786.5324 |
| 5 | 0.0022 | 0.1067 | 85.6983 | 8014.0618 | 10.4954 | 0.1067 | 98.3520 | 49.5043 | 32,362.5771 | 4868.8510 |
| 6 | 0.0021 | 0.1035 | 82.9297 | 7525.5047 | 9.9935 | 0.1035 | 96.5338 | 48.3207 | 30,518.9838 | 4664.5823 |
| 7 | 0.0022 | 0.1053 | 83.3340 | 7602.2303 | 10.2032 | 0.1054 | 96.8374 | 48.0261 | 30,964.2083 | 4650.7214 |
| 8 | 0.0022 | 0.1073 | 84.6176 | 7826.8295 | 10.4786 | 0.1073 | 97.6769 | 48.8701 | 32,159.0700 | 4773.4776 |
| 9 | 0.0022 | 0.1092 | 84.7386 | 7847.4618 | 10.6801 | 0.1093 | 97.7501 | 49.4621 | 33,158.9821 | 4834.9262 |
| 10 | 0.0022 | 0.1080 | 84.3215 | 7775.5606 | 10.5262 | 0.1080 | 97.4880 | 48.7086 | 31,886.4665 | 4748.5030 |
| 11 | 0.0022 | 0.1059 | 83.2398 | 7585.7523 | 10.2491 | 0.1059 | 96.7739 | 48.3551 | 31,415.7581 | 4679.5124 |
| 12 | 0.0022 | 0.1070 | 82.3364 | 7425.0616 | 10.2932 | 0.1071 | 96.1474 | 48.1050 | 31,339.2463 | 4625.1750 |
| 13 | 0.0022 | 0.1092 | 85.1613 | 7917.7262 | 10.7028 | 0.1092 | 98.0027 | 49.7627 | 32,885.3843 | 4876.8778 |
| 14 | 0.0022 | 0.1065 | 83.1586 | 7568.6743 | 10.3045 | 0.1066 | 96.7040 | 48.6730 | 31,968.7986 | 4706.8751 |
| 15 | 0.0022 | 0.1083 | 86.4437 | 8143.7671 | 10.7034 | 0.1083 | 98.8077 | 49.8898 | 33,144.6951 | 4929.4946 |
| 16 | 0.0022 | 0.1076 | 82.4792 | 7446.4014 | 10.3577 | 0.1076 | 96.2233 | 48.7696 | 31,740.4204 | 4692.7704 |
| 17 | 0.0022 | 0.1081 | 83.0734 | 7556.5331 | 10.4562 | 0.1082 | 96.6625 | 48.4062 | 31,988.0596 | 4679.0603 |
| 18 | 0.0022 | 0.1092 | 85.5365 | 7985.3777 | 10.7279 | 0.1092 | 98.2489 | 49.7376 | 33,212.4547 | 4886.6603 |
| 19 | 0.0022 | 0.1075 | 85.6612 | 8005.2812 | 10.5669 | 0.1075 | 98.3177 | 49.5240 | 32,216.5177 | 4869.0815 |
| 20 | 0.0022 | 0.1070 | 82.7790 | 7503.3918 | 10.3252 | 0.1070 | 96.4533 | 48.2209 | 31,383.3289 | 4651.0622 |
Table A2. Standardized SCI values of the normal data in the training set.

| No. | Z1 | Z2 | Z3 | Z4 | Z5 | Z6 | Z7 | Z8 | Z9 | Z10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | −1.6567 | 2.1780 | 2.6094 | 2.5953 | 2.5595 | 2.1774 | 2.5130 | 3.0407 | 2.9609 | 2.9180 |
| 2 | 0.4734 | 0.7818 | 0.5919 | 0.6026 | 0.7863 | 0.7819 | 0.6287 | 0.3629 | 1.0589 | 0.4621 |
| 3 | 0.9235 | −0.2833 | −0.9385 | −0.9321 | −0.6018 | −0.2830 | −0.9406 | −0.8520 | −0.6148 | −0.8952 |
| 4 | 0.0854 | 0.3070 | −0.1742 | −0.1812 | 0.1225 | 0.3070 | −0.1662 | 0.2059 | 0.0355 | 0.0626 |
| 5 | −1.5919 | −0.5556 | 0.9329 | 0.9368 | 0.0501 | −0.5561 | 0.9539 | 0.5679 | 0.1149 | 0.7163 |
| 6 | −2.1366 | −2.5826 | −0.9027 | −0.9187 | −2.0860 | −2.5831 | −0.9347 | −0.8723 | −1.6686 | −0.9058 |
| 7 | −0.0113 | −1.4129 | −0.6346 | −0.6273 | −1.1934 | −1.4128 | −0.6194 | −1.2308 | −1.2379 | −1.0159 |
| 8 | 0.0416 | −0.1961 | 0.2164 | 0.2257 | −0.0211 | −0.1961 | 0.2527 | −0.2039 | −0.0820 | −0.0411 |
| 9 | 0.6099 | 1.0627 | 0.2966 | 0.3040 | 0.8364 | 1.0628 | 0.3287 | 0.5165 | 0.8853 | 0.4469 |
| 10 | 0.9320 | 0.2460 | 0.0201 | 0.0310 | 0.1815 | 0.2463 | 0.0564 | −0.4004 | −0.3457 | −0.2394 |
| 11 | −0.1638 | −1.0676 | −0.6971 | −0.6899 | −0.9981 | −1.0676 | −0.6853 | −0.8305 | −0.8011 | −0.7873 |
| 12 | 1.2926 | −0.3375 | −1.2961 | −1.3002 | −0.8102 | −0.3371 | −1.3360 | −1.1347 | −0.8751 | −1.2188 |
| 13 | 0.0177 | 1.0312 | 0.5769 | 0.5709 | 0.9331 | 1.0311 | 0.5911 | 0.8822 | 0.6206 | 0.7800 |
| 14 | −0.2039 | −0.6550 | −0.7509 | −0.7548 | −0.7625 | −0.6550 | −0.7579 | −0.4436 | −0.2661 | −0.5700 |
| 15 | −0.9441 | 0.4700 | 1.4272 | 1.4294 | 0.9357 | 0.4696 | 1.4272 | 1.0369 | 0.8715 | 1.1979 |
| 16 | 0.5359 | 0.0347 | −1.2014 | −1.2192 | −0.5360 | 0.0349 | −1.2573 | −0.3261 | −0.4870 | −0.6820 |
| 17 | 1.6720 | 0.3712 | −0.8074 | −0.8009 | −0.1168 | 0.3717 | −0.8010 | −0.7683 | −0.2474 | −0.7909 |
| 18 | 0.0483 | 1.0196 | 0.8257 | 0.8278 | 1.0399 | 1.0195 | 0.8468 | 0.8516 | 0.9370 | 0.8577 |
| 19 | −0.9893 | −0.0694 | 0.9083 | 0.9034 | 0.3547 | −0.0697 | 0.9182 | 0.5917 | −0.0264 | 0.7181 |
| 20 | 1.0654 | −0.3423 | −1.0026 | −1.0027 | −0.6739 | −0.3419 | −1.0183 | −0.9938 | −0.8324 | −1.0132 |

References

1. Lee, S.H.; Yoon, B.D. Industry 4.0 and direction of failure prediction and health management technology (PHM). Trans. Korean Soc. Noise Vibr. Eng. 2015, 25, 22–28.
2. Park, D.; Kim, S.; An, Y.; Jung, J.-Y. LiReD: A light-weight real-time fault detection system for edge computing using LSTM recurrent neural networks. Sensors 2018, 18, 2110.
3. Park, Y.-J.; Fan, S.-K.; Hsu, C.-Y. A review on fault detection and process diagnostics in industrial processes. Processes 2020, 8, 1123.
4. Fan, S.-K.S.; Hsu, C.-Y.; Tsai, D.-M.; He, F.; Cheng, C.-C. Data-driven approach for fault detection and diagnostic in semiconductor manufacturing. IEEE Trans. Autom. Sci. Eng. 2020, 17, 1925–1936.
5. Lv, Q.; Yu, X.; Ma, H.; Ye, J.; Wu, W.; Wang, X. Applications of machine learning to reciprocating compressor fault diagnosis: A review. Processes 2021, 9, 909.
6. Xiang, S.; Nie, F.; Zhang, C. Learning a Mahalanobis distance metric for data clustering and classification. Pattern Recognit. 2008, 41, 3600–3612.
7. Taguchi, G.; Jugulum, R. The Mahalanobis Taguchi Strategy: A Pattern Technology System; John Wiley and Sons: Hoboken, NJ, USA, 2002.
8. Woodall, W.H.; Koudelik, R.; Tsui, K.L.; Kim, S.B.; Stoumbos, Z.G.; Carvounis, C.P. A review and analysis of the Mahalanobis–Taguchi system. Technometrics 2003, 45, 1–15.
9. Cheng, L.; Yaghoubi, V.; van Paepegem, W.; Kersemans, M. On the influence of reference Mahalanobis distance space for quality classification of complex metal parts using vibrations. Appl. Sci. 2020, 10, 8620.
10. Wang, H.C.; Chiu, C.C.; Su, C.T. Data classification using the Mahalanobis Taguchi system. J. Chin. Inst. Ind. Eng. 2004, 21, 606–618.
11. Pal, A.; Maiti, J. Development of a hybrid methodology for dimensionality reduction in Mahalanobis–Taguchi system using Mahalanobis distance and binary particle swarm optimization. Expert Syst. Appl. 2010, 37, 1286–1293.
12. Reséndiz, E.; Rull-Flores, C.A. Mahalanobis–Taguchi system applied to variable selection in automotive pedals components using Gompertz binary particle swarm optimization. Expert Syst. Appl. 2013, 40, 2361–2365.
13. Reséndiz, E.; Moncayo-Martínez, L.A.; Solís, G. Binary ant colony optimization applied to variable screening in the Mahalanobis–Taguchi system. Expert Syst. Appl. 2013, 40, 634–637.
14. El-Banna, M. A novel approach for classifying imbalance welding data: Mahalanobis genetic algorithm (MGA). Int. J. Adv. Manuf. Technol. 2015, 77, 407–425.
15. Cheng, L.; Yaghoubi, V.; van Paepegem, W.; Kersemans, M. Mahalanobis classification system (MCS) integrated with binary particle swarm optimization for robust quality classification of complex metallic turbine blades. Mech. Syst. Signal Process. 2021, 146, 107060.
16. Cheng, L.; Yaghoubi, V.; van Paepegem, W.; Kersemans, M. Quality inspection of complex-shaped metal parts by vibrations and an integrated Mahalanobis classification system. Struct. Health Monit. 2020, in press.
17. Haixiang, G.; Yijing, L.; Shang, J.; Mingyun, G.; Yuanyue, H.; Bing, G. Learning from class-imbalanced data: Review of methods and applications. Expert Syst. Appl. 2017, 73, 220–239.
18. Nader, P.; Honeine, P.; Beauseroy, P. Mahalanobis-based one-class classification. In Proceedings of the 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Reims, France, 21–24 September 2014; pp. 1–6.
19. Wei, X.K.; Huang, G.B.; Li, Y.H. Mahalanobis ellipsoidal learning machine for one class classification. In Proceedings of the 2007 International Conference on Machine Learning and Cybernetics, Hong Kong, China, 19–22 August 2007; Volume 6, pp. 3528–3533.
20. Bartkowiak, A.; Zimroz, R. Outliers analysis and one class classification approach for planetary gearbox diagnosis. J. Phys. Conf. Ser. 2011, 305, 012031.
21. Wang, Z.; Wang, Z.; Tao, L.; Ma, J. Fault diagnosis for bearing based on Mahalanobis–Taguchi system. In Proceedings of the IEEE 2012 Prognostics and System Health Management Conference (PHM-2012 Beijing), Beijing, China, 23–25 May 2012; pp. 1–5.
22. Wang, X.; Zheng, Y.; Zhao, Z.; Wang, J. Bearing fault diagnosis based on statistical locally linear embedding. Sensors 2015, 15, 16225–16247.
23. Hui, K.H.; Ooi, C.S.; Lim, M.H.; Leong, M.S.; Al-Obaidi, S.M. An improved wrapper-based feature selection method for machinery fault diagnosis. PLoS ONE 2017, 12, e0189143.
24. Cao, R.; Yuan, J. Selection strategy of vibration feature target under centrifugal pumps cavitation. Appl. Sci. 2020, 10, 8190.
25. Loparo, K.A. Bearings Vibration Data Set. The Case Western Reserve University Bearing Data Center. Available online: https://csegroups.case.edu/bearingdatacenter/ (accessed on 1 July 2021).
26. Brereton, R.G. The chi squared and multinormal distributions. J. Chemom. 2015, 29, 9–12.
27. Mori, T. Appendix D–Q&A: The MT (Mahalanobis-Taguchi) System and Pattern Recognition. In Taguchi Methods: Benefits, Impacts, Mathematics, Statistics, and Applications; Mori, T., Ed.; ASME Press: New York, NY, USA, 2011; pp. 1035–1036.
28. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
29. He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE World Congress on Computational Intelligence, Hong Kong, China, 1–6 June 2008; pp. 1322–1328.
Figure 1. Procedure for fault-detection evaluation in this research.
Figure 2. The vibration data (a) before and (b) after applying the Hamming window function.
Figure 3. Cepstrum transformation process.
Figure 4. Performance comparison among classifiers according to IRs. (a) F-score; (b) G-mean.
Table 1. Data set configuration according to the imbalance ratio (IR). MDC and MTS use only normal data for training, while binary classification methods use both normal and abnormal data.

| Dataset | IR | # of Normal | # of Abnormal (Fault Types) |
|---|---|---|---|
| Training Set TS 1 | 1.000 | 20 | 20 (Ball 7, Inner 7, Outer 6) |
| Training Set TS 2 | 2.222 | 20 | 9 (Ball 3, Inner 3, Outer 3) |
| Training Set TS 3 | 3.333 | 20 | 6 (Ball 2, Inner 2, Outer 2) |
| Training Set TS 4 | 6.667 | 20 | 3 (Ball 1, Inner 1, Outer 1) |
| Test Set | 0.667 | 10 | 15 (Ball 5, Inner 5, Outer 5) |
Table 2. List of statistical condition indicators (SCIs).

| Indicator | Formula | Indicator | Formula |
|---|---|---|---|
| mean | $x_\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ | root mean squared | $x_{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}$ |
| standard deviation | $x_{std} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2}$ | crest factor | $x_{CF} = \frac{x_p}{x_{RMS}}$ |
| skewness | $x_{ske} = \frac{\sum_{i=1}^{N}(x_i-\bar{x})^3}{(N-1)\,x_{std}^3}$ | shape factor | $x_{SF} = \frac{x_{RMS}}{\bar{x}}$ |
| kurtosis | $x_{kt} = \frac{\sum_{i=1}^{N}(x_i-\bar{x})^4}{(N-1)\,x_{std}^4}$ | margin factor | $x_{MF} = \frac{x_p}{\left(\frac{1}{N}\sum_{i=1}^{N}\sqrt{|x_i|}\right)^2}$ |
| peak-to-peak | $x_p = \max(x_i) - \min(x_i)$ | impulse factor | $x_{IF} = \frac{x_p}{\bar{x}}$ |
Table 3. Data schema of normal data.

| No. | X1 ($x_\mu$) | X2 ($x_{std}$) | X3 ($x_{ske}$) | X4 ($x_{kt}$) | … | Xk ($x_{IF}$) |
|---|---|---|---|---|---|---|
| 1 | $x_{11}$ | $x_{21}$ | $x_{31}$ | $x_{41}$ | … | $x_{k1}$ |
| 2 | $x_{12}$ | $x_{22}$ | $x_{32}$ | $x_{42}$ | … | $x_{k2}$ |
| 3 | $x_{13}$ | $x_{23}$ | $x_{33}$ | $x_{43}$ | … | $x_{k3}$ |
| ⋮ | | | | | | |
| n | $x_{1n}$ | $x_{2n}$ | $x_{3n}$ | $x_{4n}$ | … | $x_{kn}$ |
| mean | $\bar{x}_1$ | $\bar{x}_2$ | $\bar{x}_3$ | $\bar{x}_4$ | … | $\bar{x}_k$ |
| std. | $s_1$ | $s_2$ | $s_3$ | $s_4$ | … | $s_k$ |
Table 4. Correlation matrix between standardized SCIs for the normal data in the training set.

| | Z1 | Z2 | Z3 | Z4 | Z5 | Z6 | Z7 | Z8 | Z9 | Z10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Z1 | 1.0000 | 0.1827 | −0.5465 | −0.5428 | −0.1220 | 0.1830 | −0.5381 | −0.4997 | −0.2303 | −0.5233 |
| Z2 | 0.1827 | 1.0000 | 0.6533 | 0.6536 | 0.9413 | 1.0000 | 0.6510 | 0.7603 | 0.8918 | 0.7319 |
| Z3 | −0.5465 | 0.6533 | 1.0000 | 1.0000 | 0.8704 | 0.6531 | 0.9994 | 0.9366 | 0.8712 | 0.9758 |
| Z4 | −0.5428 | 0.6536 | 1.0000 | 1.0000 | 0.8707 | 0.6534 | 0.9996 | 0.9345 | 0.8708 | 0.9745 |
| Z5 | −0.1220 | 0.9413 | 0.8704 | 0.8707 | 1.0000 | 0.9412 | 0.8690 | 0.9098 | 0.9675 | 0.9096 |
| Z6 | 0.1830 | 1.0000 | 0.6531 | 0.6534 | 0.9412 | 1.0000 | 0.6508 | 0.7601 | 0.8917 | 0.7317 |
| Z7 | −0.5381 | 0.6510 | 0.9994 | 0.9996 | 0.8690 | 0.6508 | 1.0000 | 0.9288 | 0.8662 | 0.9708 |
| Z8 | −0.4997 | 0.7603 | 0.9366 | 0.9345 | 0.9098 | 0.7601 | 0.9288 | 1.0000 | 0.9381 | 0.9905 |
| Z9 | −0.2303 | 0.8918 | 0.8712 | 0.8708 | 0.9675 | 0.8917 | 0.8662 | 0.9381 | 1.0000 | 0.9274 |
| Z10 | −0.5233 | 0.7319 | 0.9758 | 0.9745 | 0.9096 | 0.7317 | 0.9708 | 0.9905 | 0.9274 | 1.0000 |

(Z1–Z10 correspond to $x_\mu$, $x_{std}$, $x_{ske}$, $x_{kt}$, $x_p$, $x_{RMS}$, $x_{CF}$, $x_{SF}$, $x_{MF}$, $x_{IF}$.)
Table 5. Mahalanobis distances of training data.

(a) MD values of 20 normal data.

| j | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| MDj | 1.880 | 1.145 | 0.672 | 0.644 | 1.280 | 1.624 | 1.694 | 0.311 | 0.852 | 1.548 |

| j | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| MDj | 0.644 | 0.932 | 0.833 | 0.984 | 1.042 | 1.759 | 0.529 | 0.466 | 0.801 | 0.360 |

(b) MD values of 20 abnormal data.

| j | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| MDj | 36,679.5 | 6969.2 | 7641.3 | 1,423,858.3 | 2269.5 | 175,239.4 | 444,333.3 | 21,216.3 | 759,203 | 10,207.1 |

| j | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| MDj | 112,695.9 | 97,902.2 | 384,358.2 | 50,910.3 | 68,695.8 | 39,944.1 | 168,569.9 | 33,557.8 | 37,939.4 | 151,305.5 |
Table 6. Data schema of abnormal group data.

| No. | X1 ($x_\mu$) | X2 ($x_{std}$) | X3 ($x_{ske}$) | X4 ($x_{kt}$) | … | Xk |
|---|---|---|---|---|---|---|
| 1 | $y_{11}$ | $y_{21}$ | $y_{31}$ | $y_{41}$ | … | $y_{k1}$ |
| 2 | $y_{12}$ | $y_{22}$ | $y_{32}$ | $y_{42}$ | … | $y_{k2}$ |
| 3 | $y_{13}$ | $y_{23}$ | $y_{33}$ | $y_{43}$ | … | $y_{k3}$ |
| ⋮ | | | | | | |
| n | $y_{1n}$ | $y_{2n}$ | $y_{3n}$ | $y_{4n}$ | … | $y_{kn}$ |
Table 7. SN ratios with $L_{12}(2^{11})$.

| No. | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | X11 (dummy) | SNR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | −1.124 |
| 2 | 2 | 1 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | 1 | 2 | −1.929 |
| 3 | 2 | 2 | 1 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | 1 | −3.201 |
| 4 | 1 | 2 | 2 | 1 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | −1.793 |
| 5 | 2 | 1 | 2 | 2 | 1 | 2 | 1 | 1 | 1 | 2 | 2 | −2.190 |
| 6 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | 1 | 1 | 1 | 2 | −1.659 |
| 7 | 2 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | 1 | 1 | 1 | −2.119 |
| 8 | 1 | 2 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | 1 | 1 | −1.808 |
| 9 | 1 | 1 | 2 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | 1 | −2.178 |
| 10 | 1 | 1 | 1 | 2 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | −1.487 |
| 11 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | 1 | 2 | 2 | 1 | −1.631 |
| 12 | 1 | 2 | 1 | 1 | 1 | 2 | 2 | 2 | 1 | 2 | 2 | −1.358 |

(X1–X10 correspond to $x_\mu$, $x_{std}$, $x_{ske}$, $x_{kt}$, $x_p$, $x_{RMS}$, $x_{CF}$, $x_{SF}$, $x_{MF}$, $x_{IF}$; X11 is an unassigned dummy column.)
Table 8. Gain of SN ratio for each feature.

| Level | X1 ($x_\mu$) | X2 ($x_{std}$) | X3 ($x_{ske}$) | X4 ($x_{kt}$) | X5 ($x_p$) | X6 ($x_{RMS}$) | X7 ($x_{CF}$) | X8 ($x_{SF}$) | X9 ($x_{MF}$) | X10 ($x_{IF}$) |
|---|---|---|---|---|---|---|---|---|---|---|
| L1 | −1.625 | −1.757 | −1.743 | −1.659 | −1.935 | −1.981 | −1.986 | −1.701 | −1.771 | −1.688 |
| L2 | −2.122 | −1.990 | −2.003 | −2.087 | −1.811 | −1.765 | −1.761 | −2.046 | −1.975 | −2.058 |
| Gain | 0.497 | 0.233 | 0.260 | 0.428 | −0.124 | −0.216 | −0.225 | 0.345 | 0.204 | 0.370 |
| +/− | + | + | + | + | − | − | − | + | + | + |
Table 9. Mahalanobis distances of test data.

(a) MD values of 10 normal test data.

| Classifier \ j | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| MDC | 4.569 | 1.712 | 3.498 | 2.282 | 1.165 | 1.130 | 26.843 | 1.667 | 1.020 | 1.611 |
| MTS | 2.437 | 1.250 | 2.268 | 2.537 | 0.990 | 0.574 | 8.030 | 0.824 | 1.225 | 1.670 |

(b) MD values of 15 abnormal test data.

| Classifier \ j | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| MDC | 722,066.6 | 33,487.3 | 636,322.9 | 21,249.4 | 1,099,766.7 | 38,561.8 | 115,765.9 | 24,139.0 |
| MTS | 79,053.5 | 2804.4 | 85,754.0 | 1707.4 | 105,073.8 | 9267.4 | 7808.5 | 2743.2 |

| Classifier \ j | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|
| MDC | 24,254.3 | 21,776.8 | 254,774.4 | 268,589.7 | 2,895,042.0 | 865,706.8 | 1,994,251.8 |
| MTS | 3591.8 | 1424.0 | 20,894.4 | 19,864.0 | 244,433.8 | 107,917.9 | 170,577.2 |
Table 10. Performance for the test set of classifiers trained with different training sets.

| Training Set | IR (# Normal: # Abnormal) | Classifier | Parameter | Accuracy | Balanced Accuracy | Recall | Precision | F-Score | G-Mean |
|---|---|---|---|---|---|---|---|---|---|
| Any set | (20:0) | MDC | n/a | 0.960 | 0.960 | 1.000 | 0.938 | 0.968 | 0.949 |
| | | MTS | n/a | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| TS1 | 1.000 (20:20) | SVM | C = 0.5 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | RF | n = 100 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | CS_SVM | C = 0.5 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | CS_RF | n = 100 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| TS2 | 2.222 (20:9) | SVM | C = 1.0 | 0.920 | 0.933 | 0.866 | 1.000 | 0.928 | 0.930 |
| | | RF | n = 100 | 0.880 | 0.900 | 0.800 | 1.000 | 0.888 | 0.894 |
| | | CS_SVM | C = 0.1 | 0.960 | 0.966 | 0.933 | 1.000 | 0.965 | 0.966 |
| | | CS_RF | n = 100 | 0.880 | 0.900 | 0.800 | 1.000 | 0.888 | 0.894 |
| TS3 | 3.333 (20:6) | SVM | C = 0.5 | 0.880 | 0.900 | 0.800 | 1.000 | 0.888 | 0.894 |
| | | RF | n = 100 | 0.880 | 0.900 | 0.800 | 1.000 | 0.888 | 0.894 |
| | | CS_SVM | C = 0.3 | 0.920 | 0.933 | 0.866 | 1.000 | 0.928 | 0.930 |
| | | CS_RF | n = 300 | 0.880 | 0.900 | 0.800 | 1.000 | 0.888 | 0.894 |
| TS4 | 6.667 (20:3) | SVM | C = 0.1 | 0.560 | 0.633 | 0.266 | 1.000 | 0.421 | 0.516 |
| | | RF | n = 100 | 0.560 | 0.633 | 0.266 | 1.000 | 0.421 | 0.516 |
| | | CS_SVM | C = 0.1 | 0.560 | 0.633 | 0.266 | 1.000 | 0.421 | 0.516 |
| | | CS_RF | n = 100 | 0.560 | 0.633 | 0.266 | 1.000 | 0.421 | 0.516 |
