1. Introduction
The rolling bearing is a core component of the coal mine main fan system, and its health has a key impact on the operation of the main fan. The main fan of a coal mine operates in a harsh environment, over a long service period, and under complex working conditions, so the rolling bearing at the drive end is highly prone to failures of the inner ring, outer ring, cage, and rolling elements. Fault diagnosis of rolling bearings is difficult because the vibration signals collected by acceleration sensors are non-stationary, non-linear, and time-varying, and conventional feature extraction methods achieve only a low fault recognition rate on such signals. In addition, fault vibration data are typically small-sample data, and the resulting class imbalance is difficult to overcome. Therefore, developing an effective feature extraction method for non-stationary vibration signals, together with a machine learning fault diagnosis method suited to small training samples, is of great significance for fault diagnosis of the rolling bearings of coal mine main fans [1].
Affected by external and internal factors, the main fan of a coal mine experiences various types of failure during operation, and failures caused by horizontal and vertical vibration of the drive-end rolling bearing account for more than 70% of them [2]. Moreover, vibration signals of the drive-end rolling bearing are readily obtainable, and the sensors are easy to install. Therefore, the vibration signal is selected as the source of the feature vector that characterizes the health of the rolling bearing. Traditional analysis and processing of main fan vibration signals mostly rely on Fourier transform analysis, wavelet analysis, and similar methods, which cannot accurately capture the non-stationary, short-duration characteristics of the vibration signal. Wavelet analysis, in particular, further decomposes only the low-frequency part of the signal at each level and does not decompose the high-frequency part [3]. Wavelet packet analysis is an effective time-frequency analysis method that has been applied successfully in both signal analysis and image processing [4]. It overcomes these shortcomings of traditional methods: it passes the vibration signal through a series of filters with different center frequencies but the same bandwidth, decomposes the signal more finely in the time-frequency plane, and thereby improves the resolution of the high-frequency part of the signal [5]. In recent years, wavelet packet analysis has been widely used for fault diagnosis of aviation equipment, wind turbines, and power units. Many researchers have combined wavelet packet analysis with diagnostic methods such as support vector machines, generalized regression neural networks optimized by the bat algorithm, and higher-order cumulants to diagnose faults in rolling bearings, analog circuits, refrigerators, and other mechanical equipment [6,7,8,9].
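To make the equal-bandwidth filter bank concrete, the sketch below computes the nominal sub-bands of a level-j wavelet packet decomposition. It assumes ideal filters: a level-j tree splits the band [0, fs/2] of a signal sampled at fs into 2^j sub-bands of equal width fs/2^(j+1), listed here in frequency order. Real wavelet filters overlap near the band edges, so these boundaries are nominal, and the function name `wavelet_packet_bands` is hypothetical.

```python
def wavelet_packet_bands(fs: float, level: int) -> list[tuple[float, float]]:
    """Nominal (ideal-filter) frequency bands of the terminal nodes of a wavelet packet tree.

    Every terminal node covers a band of the same width, fs / 2**(level + 1),
    and together the bands tile [0, fs/2] in frequency order.
    """
    n_bands = 2 ** level
    width = (fs / 2) / n_bands
    return [(k * width, (k + 1) * width) for k in range(n_bands)]


if __name__ == "__main__":
    # A 12 kHz signal decomposed to three levels gives eight equal-width bands.
    for k, (lo, hi) in enumerate(wavelet_packet_bands(12_000, 3)):
        print(f"band {k}: {lo:.0f}-{hi:.0f} Hz")
```

With fs = 12 kHz, the sampling rate used in Section 4, a three-level decomposition yields eight bands, each 750 Hz wide.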
In the field of intelligent diagnosis, neural networks require a large number of typical fault samples to train their models, which limits the engineering application of most neural network models to fault diagnosis. The deep forest algorithm can overcome the high cost of labeling fault samples and the difficulty of collecting fault data, achieving efficient fault diagnosis with small training samples [10]. Neural networks also depend on the choice of hyper-parameters, which are mostly tuned manually [11,12]. Fault diagnosis based on the deep forest algorithm is an efficient machine learning approach: it uses random forests as base feature extractors, applies bootstrap aggregation (bagging) to learn from the data, and uses k-fold cross-validation to control the expansion and validation of cascade forest layers, which reduces the influence of hyper-parameters on the model [13,14]. At present, the deep forest algorithm is widely used to analyze time series data and image data, with fruitful results [15]. It can process real-time monitoring data and historical environmental weather data to realize online fault diagnosis of photovoltaic modules [16]. It can also extract multi-level amplitude features and multi-level dense scale-invariant feature transform (SIFT) features to realize fault recognition in a multi-feature mode [17,18,19]. In addition, some scholars have improved the deep forest model with other machine learning methods to address the long length of single-sample vibration signals from mechanical equipment and the high data processing cost of the deep forest model, realizing fault diagnosis of mechanical equipment with small training samples [20]. The bearing vibration signal samples of coal mine main fans contain many sample points, mostly 1000–4000 data points per sample. Deep learning algorithms have difficulty processing such data and often fail to achieve accurate diagnosis because of their high computational complexity and imperfect feature learning [21]. Therefore, to diagnose faults of coal mine main fan bearings effectively, the deep forest algorithm needs to be improved.
To address the shortcomings of the intelligent fault diagnosis methods described above, and drawing on the advantages of wavelet packet analysis for feature extraction, we propose a fault diagnosis method for the rolling bearings of coal mine main fans based on wavelet packet analysis and the deep forest algorithm. First, we collected vibration signals with different spectral information from the rolling bearing at the drive end of the main fan. Then, we applied wavelet packet feature extraction to decompose the vibration signals and obtain effective feature parameters that characterize the fault state. Next, these feature parameters were used as the input of the deep forest algorithm to train a diagnosis model based on the wavelet packet and deep forest. Finally, we verified the effectiveness of the method for diagnosing faults of the drive-end rolling bearing of the coal mine main fan under small training samples.
3. Fault Diagnosis Process of Coal Mine Main Fan Bearing Based on Wavelet Packet–Deep Forest Algorithm
In the field of coal mine main fan fault diagnosis, the deep forest algorithm has a strong feature learning capability. However, to better characterize the fault state of coal mine main fan bearings, researchers often select vibration data samples with more than 1000 data points, and the deep forest algorithm suffers from high computational complexity and low diagnostic accuracy on such long samples. Therefore, to improve the diagnostic accuracy of the deep forest algorithm, a wavelet packet feature extraction method is used to process the bearing fault data samples, and a diagnosis method based on the wavelet packet–deep forest algorithm is introduced. The method comprises three steps, namely construction of the sample dataset, wavelet packet feature extraction, and deep forest fault diagnosis, as shown in Figure 6.
- (1)
Construction of the sample dataset. At a fixed sampling frequency, acceleration sensors are used to acquire vibration monitoring data for five typical health states of the drive-end rolling bearing of the coal mine main fan: normal state, inner ring fault, outer ring fault, cage fault, and rolling element fault. The vibration signal dataset is then constructed from these monitoring data.
- (2)
Feature extraction based on wavelet packet decomposition. First, an appropriate number of decomposition levels is selected, and wavelet packet decomposition is applied to the vibration signal to obtain the tree-structured wavelet packet coefficients. Then, the coefficients of each terminal node are reconstructed to obtain the sub-band signals. Next, the wavelet packet energy features are extracted to obtain the energy distribution across the sub-bands. Finally, the sub-band energies under different health states are analyzed to obtain feature parameters that characterize the health state of the main fan bearing, and the fault feature vector set is constructed.
- (3)
Construction of the fault diagnosis model. After wavelet packet feature extraction, the fault sample dataset is divided into a training set and a test set, and the feature vectors are normalized to reduce the computational effort of the diagnostic model. The normalized feature vectors are input into the multi-granularity scanning structure of the model to obtain transformed feature vectors, which are then input into the cascade forest structure to obtain class vectors. The training set is used to train the deep forest fault diagnosis model. Finally, the test set is input into the trained model, the health status of the bearing is determined from the output of the deep forest, and the diagnostic performance of the model is verified.
4. Empirical Analysis
4.1. Construction of a Sample Dataset
As shown in Table 1, data from the main ventilation fan monitoring and supervision system of a mining company in Shanxi in 2021 were selected as the bearing fault diagnosis samples. The coal mine is a typical high-gas mine. Its ventilation layout is of the central parallel type, and ventilation is by mechanical exhaust. The mine has three vertical shafts (the main shaft, the auxiliary shaft, and the return shaft); the main and auxiliary shafts are used for air intake, and the return shaft is used for air return. The design air volume for the easy ventilation period is 12,900 m³/min, with a negative pressure of 2059 Pa and an equivalent orifice of 6.21 m². The design air volume for the difficult ventilation period is 12,780 m³/min, with a negative pressure of 2178 Pa and an equivalent orifice of 5.43 m². The return shaft is equipped with two FBCDZ-8-No34 (2 × 800 kW) counter-rotating axial flow main ventilation fans with an impeller diameter of 3.4 m; one fan is used for production and the other serves as a backup. The fans on site are shown in Figure 7. The main fan is equipped with sensors for gas, wind speed, negative pressure, and other quantities, enabling uninterrupted real-time monitoring of its operating condition.
Using an acceleration sensor with a sampling frequency of 12,000 Hz, we acquired monitoring data for five health states of the main fan rolling bearing: the normal state of the drive-end bearing, inner ring fault, outer ring fault, rolling element fault, and cage fault. Diagrams of the four fault states are shown in Figure 8.
During the initial commissioning phase of the main ventilator system, site personnel adjust the fan speed through a frequency converter and keep the fan running at a stable speed. Because the main ventilator serves a high-gas mine in a difficult ventilation period, site personnel set the inverter frequency according to the on-site air volume, negative pressure, gas concentration, and other conditions. The inverter then adjusts the speed based on the sensor data to keep the main ventilator running under stable working conditions. Under these conditions, the motor speed during sampling was 1797 r/min, approximately 1800 r/min. In addition, the data collected at the mine recorded only the fault type, not the fault size. Therefore, the experimental data comprise one normal state and four fault states, with the label codes shown in Table 1. Taking 2000 consecutive vibration data points as one sample yields 200 normal state samples, 100 inner ring fault samples, 100 outer ring fault samples, 100 rolling element fault samples, and 200 cage fault samples (700 in total), each labeled with its code. The samples are split between the test set and the training set at a ratio of 1:4.
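The sample construction above (2000-point samples, test:train = 1:4) can be sketched as follows. This is a minimal illustration under assumptions the text does not state: samples are non-overlapping windows, and the split is drawn at random within each health state (stratified). The helper names `segment_signal` and `stratified_split` are hypothetical.

```python
import numpy as np


def segment_signal(signal: np.ndarray, seg_len: int = 2000) -> np.ndarray:
    """Cut a 1-D record into non-overlapping samples of `seg_len` consecutive points."""
    n_seg = len(signal) // seg_len                 # an incomplete tail is discarded
    return np.asarray(signal)[: n_seg * seg_len].reshape(n_seg, seg_len)


def stratified_split(labels: np.ndarray, test_ratio: float = 0.2, seed: int = 0):
    """Return (train_idx, test_idx), drawing the same test fraction from every class."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_test = int(round(test_ratio * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)


if __name__ == "__main__":
    # One second of a synthetic 12 kHz record gives six 2000-point samples.
    record = np.random.default_rng(0).standard_normal(12_000)
    print(segment_signal(record).shape)            # (6, 2000)

    # Per-state sample counts as reported in the text (700 samples in total).
    counts = {"normal": 200, "inner_ring": 100, "outer_ring": 100,
              "rolling_element": 100, "cage": 200}
    labels = np.concatenate([np.full(n, state) for state, n in counts.items()])
    train_idx, test_idx = stratified_split(labels)
    print(len(train_idx), len(test_idx))           # 560 140
```

A test fraction of 0.2 corresponds to the stated 1:4 ratio and reproduces the 560/140 split used in Section 4.4.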
4.2. Feature Extraction Based on Wavelet Packet Decomposition
Before feature extraction, the raw vibration signals are first examined with conventional time- and frequency-domain analysis. The wavelet packet feature extraction method is then applied to decompose the vibration signals and obtain feature parameters that effectively characterize them.
4.2.1. Experimental Signal Analysis
After the collected rolling bearing fault signals were processed, the data from the first second of each signal were taken as samples. Time- and frequency-domain analysis was then performed on these samples, giving the time-domain waveforms and amplitude spectra of the original bearing vibration signals shown in Figure 9 and Figure 10.
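The amplitude spectra in Figure 10 are of the standard FFT kind. A minimal NumPy sketch, assuming a single-sided amplitude normalization and using a synthetic 30 Hz tone (the shaft rate at about 1800 r/min) plus noise in place of the field data, which are not available here; `amplitude_spectrum` is a hypothetical helper name.

```python
import numpy as np


def amplitude_spectrum(x: np.ndarray, fs: float):
    """Single-sided amplitude spectrum of a real signal via the FFT."""
    n = len(x)
    spec = np.abs(np.fft.rfft(x)) / n
    spec[1:] *= 2                       # fold the negative frequencies onto the positive side
    if n % 2 == 0:
        spec[-1] /= 2                   # the Nyquist bin has no mirror image
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    return freqs, spec


if __name__ == "__main__":
    fs = 12_000                          # sampling frequency from the text (Hz)
    t = np.arange(fs) / fs               # one second of data -> 1 Hz frequency resolution
    noise = 0.05 * np.random.default_rng(0).standard_normal(fs)
    x = 0.5 * np.sin(2 * np.pi * 30 * t) + noise
    freqs, spec = amplitude_spectrum(x, fs)
    k = int(np.argmax(spec[1:])) + 1     # skip the DC bin
    print(f"dominant component: {freqs[k]:.1f} Hz, amplitude {spec[k]:.2f}")
```

Taking one second of data at 12 kHz gives a 1 Hz bin spacing, which bounds how precisely peaks such as those in Figure 10 can be located.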
In the time domain, the vibration amplitude of the bearing in the cage fault state ranged between −0.2 g and 0.2 g and was fairly uniform. In the rolling element fault state, the amplitude ranged between −0.4 g and 0.4 g, smaller than in the inner ring and outer ring fault states, and the signal was uniform, possibly because this state is more strongly affected by noise and carries less fault information. In the inner ring fault state, the amplitude ranged between −1 g and 1.5 g, smaller than in the outer ring fault state. The vibration in the outer ring fault state was stronger, with amplitude between −3 g and 3 g and a continuous train of shock impulses. In the normal state, the vibration was relatively stable, with a small amplitude between −0.2 g and 0.2 g and a uniform signal.
As Figure 10 shows, the characteristic frequencies of the five health states differed in the amplitude spectra. Characteristic frequencies of 12.09 Hz, 12.09 Hz, 32.42 Hz, 39.27 Hz, and 38.94 Hz appeared in the cage fault, rolling element fault, inner ring fault, outer ring fault, and normal states, respectively, and were very close to the theoretically calculated characteristic frequencies. However, the characteristic frequencies of some states were close or even identical (for example, 12.09 Hz for both the cage and rolling element fault states), so the bearing fault type cannot be identified by conventional vibration signal analysis alone. Therefore, this paper uses wavelet packet analysis to decompose the vibration signal and obtain feature vectors with clear energy differences, and then combines these features with a fault diagnosis method to diagnose rolling bearing faults.
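The comparison with "theoretically calculated" characteristic frequencies relies on the classical kinematic bearing formulas, which the text does not reproduce. They are sketched below assuming a rotating inner ring, a stationary outer ring, and rolling without slip. The bearing geometry in the example (9 rolling elements, 8 mm element diameter, 40 mm pitch diameter, zero contact angle) is purely hypothetical because the bearing model is not given, so the printed values are illustrative only.

```python
import math


def bearing_defect_frequencies(shaft_hz: float, n_elements: int, d: float, D: float,
                               contact_deg: float = 0.0) -> dict:
    """Classical kinematic fault frequencies (Hz) of a rolling bearing.

    d: rolling element diameter, D: pitch diameter (same units), contact_deg: contact angle.
    """
    r = (d / D) * math.cos(math.radians(contact_deg))
    return {
        "FTF (cage)": shaft_hz / 2 * (1 - r),
        "BPFO (outer ring)": n_elements * shaft_hz / 2 * (1 - r),
        "BPFI (inner ring)": n_elements * shaft_hz / 2 * (1 + r),
        "BSF (rolling element spin)": D / (2 * d) * shaft_hz * (1 - r ** 2),
    }


if __name__ == "__main__":
    # Shaft speed from the text (1797 r/min); the geometry below is hypothetical.
    freqs = bearing_defect_frequencies(1797 / 60, n_elements=9, d=8.0, D=40.0)
    for name, f in freqs.items():
        print(f"{name}: {f:.2f} Hz")
```

Two identities are useful sanity checks: BPFO equals the number of rolling elements times FTF, and BPFO + BPFI equals the number of rolling elements times the shaft frequency.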
The time-frequency diagrams of the normal, outer ring fault, inner ring fault, rolling element fault, and cage fault states show that the amplitude differences among the fan bearing vibration signals are not obvious. Apart from the dominant frequency, the other frequency components in the normal state are lower than in the other health states, and the energy in the time-frequency diagrams differs significantly across frequency bands for different health states. Therefore, the wavelet packet method is chosen in this paper to extract the characteristic components of the vibration signals in different health states and thereby characterize the health state of the bearing.
4.2.2. Wavelet Packet Feature Extraction
A three-level wavelet packet decomposition was carried out with the Meyer wavelet to obtain the fault feature vectors of the five health states of the rolling bearing: normal, inner ring fault, outer ring fault, rolling element fault, and cage fault. Part of the data is shown in Table 2.
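The energy features can be sketched as follows. The text does not give the energy formula; a common definition, assumed here, takes the energy of each level-3 terminal node as the sum of its squared coefficients, which for an orthogonal wavelet (ignoring boundary effects) equals the energy of the corresponding reconstructed sub-band signal. To keep the sketch dependency-free, it swaps the Meyer wavelet used in the text for the Haar wavelet; with PyWavelets installed, a discrete Meyer decomposition could instead start from `pywt.WaveletPacket(x, wavelet='dmey', maxlevel=3)`. The helper names are hypothetical.

```python
import numpy as np


def haar_split(x: np.ndarray):
    """One orthonormal Haar analysis step: (low-pass, high-pass), each downsampled by 2."""
    pairs = x.reshape(-1, 2)
    low = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    high = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return low, high


def wavelet_packet_energy(x: np.ndarray, level: int = 3) -> np.ndarray:
    """Energies of the 2**level terminal wavelet packet nodes, in frequency order."""
    if len(x) % (2 ** level):
        raise ValueError("signal length must be divisible by 2**level")
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        # Split every node; children are appended in natural (Paley) order.
        nodes = [band for node in nodes for band in haar_split(node)]
    # Natural order is not frequency order: frequency position k holds node gray(k).
    order = [k ^ (k >> 1) for k in range(len(nodes))]
    return np.array([np.sum(nodes[i] ** 2) for i in order])


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal(2000)   # stand-in for one 2000-point sample
    E = wavelet_packet_energy(x)                          # eight sub-band energies
    print(E.round(2))
    print(E.sum().round(2), np.sum(x ** 2).round(2))      # equal: the transform is orthonormal
    print((E / E.sum()).round(3))                         # relative energy distribution
```

Dividing the energies by their sum gives a relative energy distribution, which is one way to produce a histogram like the one discussed below.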
Because the value ranges of the feature parameters differ greatly across health states, we normalized the samples used to train the deep forest fault diagnosis model to eliminate the influence of these differences. The normalized training samples are shown in Table 3.
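The normalization scheme is not specified in the text. A minimal sketch assuming per-feature min-max scaling to [0, 1], with the statistics fitted on the training set and reused for the test set so that no test information leaks into training; the function names are hypothetical.

```python
import numpy as np


def fit_min_max(X_train: np.ndarray):
    """Per-feature minimum and range, estimated on the training samples only."""
    x_min = X_train.min(axis=0)
    x_range = X_train.max(axis=0) - x_min
    x_range = np.where(x_range == 0, 1.0, x_range)   # constant features: avoid division by zero
    return x_min, x_range


def apply_min_max(X: np.ndarray, x_min: np.ndarray, x_range: np.ndarray) -> np.ndarray:
    """Map features to [0, 1] with training statistics; test values may fall slightly outside."""
    return (X - x_min) / x_range


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for 560 training and 140 test eight-dimensional energy vectors.
    X_train = rng.uniform(0, 50, size=(560, 8))
    X_test = rng.uniform(0, 50, size=(140, 8))
    x_min, x_range = fit_min_max(X_train)
    Z_train = apply_min_max(X_train, x_min, x_range)
    Z_test = apply_min_max(X_test, x_min, x_range)
    print(Z_train.min(axis=0).round(3), Z_train.max(axis=0).round(3))
```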
Analyzing the normalized wavelet packet energy feature vectors of the rolling bearing gives the energy histogram shown in Figure 11.
Wavelet packet feature extraction yields a feature vector E that characterizes the fault state of the main fan bearing, and E is used as the data sample for verifying the fault diagnosis method. With three decomposition levels, E consists of the energies of the eight level-3 sub-bands, $E = [E_{3,0}, E_{3,1}, \ldots, E_{3,7}]$. The extracted feature vector E is taken as the input feature vector x of the deep forest, so that $x = [x_1, x_2, \ldots, x_8] = E$.
4.3. Fault Diagnosis of Rolling Bearings at Driving End of Main Fan
Following the fault diagnosis process described above, we diagnosed faults of the drive-end rolling bearing of the coal mine main fan. Because the rolling bearing has five health states, the diagnosis is a five-class classification problem.
First, we constructed the multi-granularity scanning structure. The eight-dimensional energy feature vector obtained by wavelet packet decomposition was input into this structure, and sliding windows of sizes 2, 3, and 4 were used to scan the original features of the sequence data. Taking the size-2 window as an example, the multi-granularity scanning process is shown in Figure 12.
For the sequence data of this five-class problem, the original input feature has eight dimensions. With a step size of 1, scanning the input with a window of size 2 yields seven two-dimensional feature vectors. These seven vectors are input into random forest A and completely random forest B, and each forest outputs a five-dimensional class vector for each of the seven windows. Finally, the class vectors produced by both forests are concatenated into a 70-dimensional (7 × 2 × 5) transformed feature vector corresponding to the original eight-dimensional input.
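The window slicing and the 70/60/50 dimension arithmetic can be sketched with NumPy. This is an illustration, not the authors' code: `scan_transform` assumes the forests are already fitted and expose a scikit-learn-style `predict_proba`, trained on window instances that inherit their sample's label. A completely random forest is often approximated by scikit-learn's `ExtraTreesClassifier(max_features=1)`. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def scan_windows(x: np.ndarray, window: int, step: int = 1) -> np.ndarray:
    """All length-`window` slices of a 1-D feature vector, taken every `step` positions."""
    return sliding_window_view(x, window)[::step]


def transformed_dim(n_features: int, window: int, n_forests: int = 2,
                    n_classes: int = 5, step: int = 1) -> int:
    """Length of the concatenated class vectors produced by one scanning granularity."""
    n_windows = (n_features - window) // step + 1
    return n_windows * n_forests * n_classes


def scan_transform(X: np.ndarray, window: int, forests, step: int = 1) -> np.ndarray:
    """Replace each sample by its forests' class vectors over all of its windows."""
    n = X.shape[0]
    inst = np.stack([scan_windows(x, window, step) for x in X])   # (n, n_windows, window)
    flat = inst.reshape(-1, window)                                # one row per window instance
    # Each forest yields (n * n_windows, n_classes); regroup the rows per sample.
    per_forest = [f.predict_proba(flat).reshape(n, -1) for f in forests]
    return np.hstack(per_forest)          # (n, n_forests * n_windows * n_classes)


if __name__ == "__main__":
    x = np.arange(8.0)                    # stand-in eight-dimensional energy vector
    print(scan_windows(x, 2))             # seven windows, from [0, 1] to [6, 7]
    print([transformed_dim(8, w) for w in (2, 3, 4)])   # [70, 60, 50]
```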
Then, the cascade forest structure was constructed, as shown in Figure 13. Each level of the cascade consists of multiple random forests, each forest contains multiple decision trees, and each tree outputs its result as a class vector. The class vectors of the trees in a forest are averaged, and, following the maximum principle of the voting mechanism, the class with the largest mean probability is taken as the prediction. For the rolling bearing, the cascade forest solves a five-class problem, so each random forest outputs a five-dimensional class vector of occurrence probabilities, obtained by averaging the outputs of its decision trees. The class with the highest mean probability is selected as the final output of the model.
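The averaging-and-maximum rule can be illustrated in a few lines of NumPy. The tree outputs below are made-up numbers for illustration; the second example reads the result set s reported in Section 4.4 as one averaged class vector. The function names are hypothetical.

```python
import numpy as np


def forest_class_vector(tree_probs: np.ndarray) -> np.ndarray:
    """One forest's class vector: the mean of its trees' class-probability vectors.

    tree_probs has shape (n_trees, n_classes).
    """
    return tree_probs.mean(axis=0)


def vote(forest_vectors: np.ndarray) -> tuple[int, float]:
    """Average several forests' class vectors and pick the most probable class."""
    mean = np.mean(forest_vectors, axis=0)
    k = int(np.argmax(mean))
    return k, float(mean[k])


if __name__ == "__main__":
    # Three hypothetical trees voting on the five health states.
    trees = np.array([[0.1, 0.6, 0.2, 0.1, 0.0],
                      [0.2, 0.5, 0.2, 0.0, 0.1],
                      [0.3, 0.4, 0.2, 0.1, 0.0]])
    print(forest_class_vector(trees).round(3))
    # The result set s from Section 4.4, as a single averaged class vector.
    s = np.array([[0.3886, 0.5444, 0.0609, 0.0042, 0.0019]])
    print(vote(s))                       # (1, 0.5444)
```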
Finally, the deep-forest-based fault diagnosis model for the rolling bearing of the main ventilator was constructed, as shown in Figure 14.
As shown in Figure 14, the original input of the deep forest model for main fan rolling bearing fault diagnosis is an eight-dimensional feature vector. To improve the feature diversity of the model, we used three sliding windows of different sizes in multi-granularity scanning, covering two-, three-, and four-dimensional feature windows. Scanning a sample with the 2D, 3D, and 4D windows produces seven, six, and five window instances, respectively. These instances were used to train Random Forest A1 and Completely Random Forest A2, Random Forest B1 and Completely Random Forest B2, and Random Forest C1 and Completely Random Forest C2, respectively, which generated 70-dimensional, 60-dimensional, and 50-dimensional transformed feature vectors.
Next, we trained the cascade forest on the outputs of multi-granularity scanning. The 70-dimensional transformed feature vectors obtained with the two-dimensional sliding window were input into the first cascade level for training, yielding 20-dimensional augmented feature vectors. These 20-dimensional augmented vectors were concatenated with the 70-dimensional feature vectors from the two-dimensional window to form 90-dimensional feature vectors, which served as the input to the second cascade level. The output of the second level was concatenated with the 60-dimensional feature vectors from the three-dimensional window to form the input of the third level, and the output of the third level was concatenated with the 50-dimensional feature vectors from the four-dimensional window to form the input of the next level. This process was repeated until a satisfactory classification result was obtained, at which point training ended. Finally, we averaged the output class vectors to obtain the probability distribution over the classes and took the class with the maximum probability as the final output, which gives both the diagnosed fault type and its probability of occurrence.
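The level-by-level procedure can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: each level holds four forests (two random forests and two completely random forests, the latter approximated by `ExtraTreesClassifier(max_features=1)`), which matches the 20-dimensional augmented vector for five classes; training-set class vectors are taken out-of-fold with 3-fold cross-validation; and growth stops when cross-validated accuracy stops improving, standing in for the text's "satisfactory classification result". The feeding order (70, 70, 60, 50, then cycling) follows the description above. Labels are assumed to be integers 0..K-1, and all function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict


def make_level(seed: int, n_trees: int):
    """Four forests per level: two random forests and two approximately completely random ones."""
    return [
        RandomForestClassifier(n_estimators=n_trees, random_state=seed),
        RandomForestClassifier(n_estimators=n_trees, random_state=seed + 1),
        # max_features=1: a random feature and a random threshold per split.
        ExtraTreesClassifier(n_estimators=n_trees, max_features=1, random_state=seed + 2),
        ExtraTreesClassifier(n_estimators=n_trees, max_features=1, random_state=seed + 3),
    ]


def granularity(level: int, n_gran: int) -> int:
    """Scanning output feeding a level: 0 for the first two levels, then 1, 2, 0, 1, ..."""
    return 0 if level == 0 else (level - 1) % n_gran


def fit_cascade(scan_feats, y, max_levels=6, cv=3, n_trees=100):
    """Grow cascade levels until k-fold accuracy stops improving.

    scan_feats: [(n, 70), (n, 60), (n, 50)] transformed vectors from multi-granularity scanning.
    """
    levels, best_acc, aug = [], -np.inf, None
    for i in range(max_levels):
        base = scan_feats[granularity(i, len(scan_feats))]
        X = base if aug is None else np.hstack([aug, base])
        forests = make_level(seed=10 * i, n_trees=n_trees)
        # Out-of-fold class vectors: each sample is scored by forests that did not see it.
        probs = [cross_val_predict(f, X, y, cv=cv, method="predict_proba") for f in forests]
        acc = np.mean(np.argmax(np.mean(probs, axis=0), axis=1) == y)
        if acc <= best_acc:                     # no improvement: stop growing
            break
        for f in forests:
            f.fit(X, y)
        levels.append(forests)
        best_acc, aug = acc, np.hstack(probs)   # 4 forests x 5 classes = 20 dims
    return levels


def predict_cascade(levels, scan_feats):
    """Pass features through the fitted levels; return (labels, mean class vectors)."""
    aug = None
    for i, forests in enumerate(levels):
        base = scan_feats[granularity(i, len(scan_feats))]
        X = base if aug is None else np.hstack([aug, base])
        probs = [f.predict_proba(X) for f in forests]
        aug = np.hstack(probs)
    mean = np.mean(probs, axis=0)               # average the last level's class vectors
    return np.argmax(mean, axis=1), mean
```

Given the three scanning outputs for the training set (hypothetically `F70`, `F60`, `F50`), `fit_cascade([F70, F60, F50], y_train)` grows the cascade, and `predict_cascade` applied to the corresponding test-set outputs returns the diagnosed class and its probability vector for each sample.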
4.4. Analysis of Results
We used 560 samples to train the fault diagnosis model and the remaining 140 samples as the test set, and carried out the fault diagnosis experiments in Python to verify the effectiveness of the proposed method. One set of diagnosis results was selected for analysis; an example of the result set is shown in Figure 15.
After training, we took the outputs of the four forests for one test sample as an example and obtained the result set s = {0: 38.86%, 1: 54.44%, 2: 6.09%, 3: 0.42%, 4: 0.19%}. According to the maximum-probability principle of the voting mechanism, the diagnosis for this example is a Y1-type rolling bearing fault, that is, an inner ring fault, with a probability of occurrence of 54.44%.
4.5. Comparison Experiments
To verify the effectiveness of the proposed method, we selected the random forest algorithm, the support vector machine (SVM), and the back-propagation (BP) neural network as comparison algorithms and trained each of them on the rolling bearing data following its own standard procedure. Each algorithm was trained on the fault diagnosis training set and verified on the test set. The results are shown in Table 4.
Comparing the fault recognition rates of the four methods shows that, when diagnosing the same five health states, the model combining wavelet packet feature extraction with the deep forest algorithm performed significantly better than the random forest algorithm, SVM, and BP network. The average diagnostic accuracy of the proposed method reached 98.5%. In addition, when designing the fault diagnosis experiments, we found that the parameters of the deep forest required little adjustment, whereas the random forest algorithm, SVM, and BP network required parameter tuning to improve accuracy; the deep forest therefore saved training time to a certain extent.
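A baseline comparison of the kind summarized in Table 4 could be run as follows. This is a sketch: the hyper-parameters are placeholders because the text does not report them, the BP network is approximated by scikit-learn's `MLPClassifier` (a feed-forward network trained by back-propagation, with one hidden layer assumed), and the inputs are assumed to be the same normalized eight-dimensional wavelet packet energy features used for the deep forest. The data in the example are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def compare_baselines(X_train, y_train, X_test, y_test, seed: int = 0) -> dict:
    """Test-set accuracy of the three comparison classifiers (placeholder hyper-parameters)."""
    models = {
        "Random forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "SVM (RBF kernel)": SVC(kernel="rbf", C=1.0, gamma="scale"),
        # BP network: feed-forward net trained by back-propagation; one hidden layer assumed.
        "BP network": MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=seed),
    }
    return {name: float(np.mean(model.fit(X_train, y_train).predict(X_test) == y_test))
            for name, model in models.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in: five classes, eight normalized features, 4:1 train/test split.
    y = np.repeat(np.arange(5), 40)
    X = rng.normal(scale=0.3, size=(200, 8)) + y[:, None] / 4
    idx = rng.permutation(200)
    train, test = idx[:160], idx[160:]
    for name, acc in compare_baselines(X[train], y[train], X[test], y[test]).items():
        print(f"{name}: {acc:.3f}")
```

Fixing the random seeds and using one shared split for every model keeps the comparison fair; averaging over several random splits would give a more stable estimate than a single run.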