1. Introduction
As a critical component of intelligent manufacturing, intelligent mechanical fault diagnosis has become an essential part of “Made in China 2025” [
1]. In mechanical processing, cutting is the most important means of manufacturing. At present, research in this field mainly focuses on tool cutting parameter optimization [
2,
3] and tool wear condition monitoring [
4,
5]. Real-time monitoring of the tool wear state is an essential part of the computerized numerical control (CNC) machining process in a manufacturing workshop. The wear state of a tool is affected by the processing procedures, workpiece materials, cutting parameters, and other factors, so the whole system exhibits strong nonlinearity and uncertainty. Tool wear directly degrades the surface roughness and machining accuracy of the workpiece and seriously affects the overall stability and processing efficiency of the CNC machining equipment. Therefore, the technology of tool condition monitoring (TCM) is of great significance for ensuring the quality of processing and realizing continuous automatic processing [
6,
7,
8,
9].
TCM methods are divided into direct measurement methods and indirect measurement methods. Direct measurement methods include resistance measurement methods, optical measurement methods, discharge current measurement methods, ray measurement methods, and computer image processing methods. These methods obtain the tool wear state directly, but because of the coolant and other disturbances in the production process, they cannot detect the tool wear state in real time during machining and are therefore rarely used in actual industrial production [
10]. Indirect measurement methods include the cutting force measurement method, acoustic emission method, mechanical power measurement method, vibration signal method, and multi-information fusion detection [
11,
12,
13,
14,
15]. Indirect measurement methods can acquire signals in real time through a sensor during tool cutting. After data processing and feature extraction, hidden Markov model (HMM), fuzzy neural network (FNN), back propagation neural network (BPNN), support vector machine (SVM), and other machine learning (ML) models can be used to monitor tool wear [
16,
17,
18]. For example, Zhang Xiang et al. took micro-milling tool wear identification as the research object and established an HMM of tool wear. Eight optimal cutting force features were selected by Fisher’s linear discriminant as the HMM training input vectors. The method can identify the micro-milling tool’s wear state with an accuracy rate of 85% [
16]. X. Li et al. proposed an FNN designed and developed for machinery prognostic monitoring. The FNN is basically a multi-layered fuzzy-rule-based neural network that integrates a fuzzy logic inference into a neural network structure. This method is helpful to accelerate the learning process of the complex conventional neural network structure, and the accuracy in prediction and rate of convergence are better than those of similar ML models [
17]. Liao Zhirong et al. proposed a tool wear condition monitoring system based on acoustic emission technology. By analysing representative acoustic signals, the energy ratios of six different frequency bands are selected from the time–frequency domain and used as classification features to determine the amount of tool wear. With the SVM as the classifier, the method ultimately achieves an accuracy rate of 93.3% [
18]. The traditional ML model adopts shallow learning. Its performance is affected by the unstable quality of manually extracted features, and a random initialization of the weights can easily cause the objective function to converge to a local minimum. When the number of layers is too large, the residuals are lost as they propagate through the layers, leading to gradient diffusion. At the same time, traditional ML is limited by its inability to capture long-distance dependencies in sequential input signals. Deep learning (DL) can effectively avoid these problems.
DL was first introduced into machine learning (ML) in 1986 and then used in an artificial neural network (ANN) [
19] in 2000. DL uses multi-level non-linear information to process low-level features to form more abstract high-level representations for supervised or unsupervised feature learning, representation, classification, and pattern recognition [
20]. The DL model is an “end-to-end learning” model, which does not require complex pre-processing of the original data, making the construction of the model more concise (
Figure 1). At present, the DL method has emerged in the industrial field. DL models represented by the convolutional neural network (CNN) have been gradually applied to the study of tool wear condition monitoring and have achieved specific results [
21,
22,
23]. For example, Zhang Cunji et al. proposed transforming the vibration signal of a tool in the process of machining into an energy spectrum by a wavelet packet transform (WPT) and inputting the spectrum into a CNN to extract the features automatically and classify them accurately [
21]. German Terrazas et al. proposed using Gramian angular summation fields (GASF) to automatically convert the large number of continuous force signals generated by cutting tools in a high-speed milling process into two-dimensional images, which are input into a CNN to obtain the tool wear status [
22]. Cao Dali et al. proposed the construction of a DenseNet using the dense connection, which adaptively extracts hidden high-dimensional features from original time series signals. The results showed that deepening the network layers is helpful for improving the accuracy of the tool wear monitoring model [
23]. The above methods adopt DL to extract features adaptively, which largely overcomes the shortcomings of manual signal feature extraction. However, the convolutional neural network (CNN) they use relies too heavily on high-dimensional feature extraction. With too many convolutional layers, the network is prone to gradient dispersion; with too few, it cannot grasp the global features. In addition, these models do not take into account a critical property of the data, namely the correlation between the timing signal samples generated during tool processing.
Therefore, this paper proposes a method for real-time monitoring of a tool wear state based on a CNN and bidirectional long short-term memory (BiLSTM) network model with an attention mechanism (CABLSTM). The sensor acquires the signals generated during tool processing in real time, which are directly fed into the CNN for parallel local feature extraction and then into the BiLSTM network for feature extraction of the long-distance dependence information. The attention mechanism is used to calculate the network weights and distribute them reasonably. Finally, the signal feature information with different weights is sent to a Softmax classifier to classify the tool wear status, avoiding the complexity and limitation caused by a manual feature extraction. This method can meet the real-time and accuracy requirements of tool monitoring in actual industrial production.
The remainder of this paper is organized as follows.
Section 2 presents the CABLSTM algorithm.
Section 3 presents the monitoring process of tool wear.
Section 4 presents the experimental results of the tool wear condition monitoring.
Section 5 concludes the article.
4. Experimental
4.1. Experimental Design
A real-time monitoring system for the tool wear state includes a condition monitoring facility and a data analysis facility. The condition monitoring facility included the basic equipment used to process the workpiece, the equipment to collect the vibration signals generated during the processing, and the equipment to measure the value of tool wear. The data analysis facility included high-performance computers and DL platforms for analyzing and processing the data and for classifying and reporting the tool wear status in real time.
4.1.1. Condition Monitoring
The experimental platform of this paper was provided by the Engineering Training Center of Guizhou University. A high-precision CNC vertical milling machine (Model: VM600) was used to mill the workpiece. No coolant was added during milling. The workpiece was milled steel (S136). The milling tool was a cemented carbide four-edge milling cutter whose surface was covered with layers of a titanium aluminum nitride coating. The diameter of the tool was 6 mm, the rake angle was 4°, the clearance angle was 8°, and the helix angle was 30°. The cutting parameters of the milling experiment are shown in
Table 2.
In the experiment, three accelerometers (Model: INV9822; Range: ±50 g) were magnetically mounted on the machine tool fixture in the x, y, and z
directions for real-time acquisition of the original vibration signals generated during tool machining. A high-precision digital acquisition instrument (model: INV3018CT) from the Beijing Oriental Institute of Vibration and Noise was used to process the real-time signals and transmit them to a computer. The sampling frequency of the signal was 20 kHz, 200 mm of milling in each direction of the tool was recorded as a milling stroke, and each tool was milled for 330 strokes. After each milling stroke, the milling cutter was removed from the milling machine and photographed. A pre-calibrated high-precision digital microscope (EVDM-101) was used for the measurement, the optical magnification was 0.7×–4.5×, the electronic magnification was 35×–235×, and the measuring accuracy was 0.1 μm. During the measurement process, the position of the wear zone of the minor flank surface of the milling cutter, which was the most easily worn, was selected as the measurement position, and the same reference line was taken as the standard to ensure that the position remains unchanged during the measurement. The wear value (VBmax) was calculated by subtracting the current cutting edge length from the initial length of the cutting edge of the milling cutter. The real-time monitoring experimental device of the tool wear state is shown in
Figure 7.
4.1.2. Data Analysis
The DL hardware platform of the experiment was a high-performance server with an Intel Xeon E5-2650 processor (2.3 GHz), 256 GB of memory, and an NVIDIA GeForce TITAN X graphics processing unit (GPU). The software platform used the Ubuntu 16.04.4 operating system, with Keras as the front-end of the deep learning framework and TensorFlow as the back-end for data analysis.
The milling operation was carried out with four milling cutters (C1, C2, C3, and C4). Each milling cutter performed 330 milling strokes, giving 1320 original signal samples in total. The data of three milling cutters (C1, C2, and C3) were used for the training set and verification set of the model, and the data of one milling cutter (C4) were used for the test set. The training set was used to fit the model to the data samples, the verification set was used to adjust the hyperparameters and make a preliminary evaluation of the model, and the test set was used to evaluate the generalization ability of the final model. In the DL training process, a sufficient number of samples was needed to improve the learning quality of the neural network. The data samples of the original processed signals were long sequences of periodic timing signals. According to the principle of signal sampling, 100,000 consecutive points were taken from each sample and, after data normalization, cut into 50 short timing sequences with a length of 2000, which were used as the model input to reduce the computational intensity of the network training. At the same time, this data expansion enlarged the experimental data set relative to its original size, improved the robustness of the network, and reduced the risk of overfitting.
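The segmentation step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `segment_and_normalize` is hypothetical, the windows are assumed to be non-overlapping (50 × 2000 = 100,000), and per-window z-score normalization is an assumption, since the text does not state which normalization was used.

```python
import math


def segment_and_normalize(signal, window=2000, n_windows=50):
    """Cut a 1-D signal into non-overlapping windows and z-score each window.

    Assumptions (not stated in the paper): windows do not overlap and each
    window is normalized to zero mean and unit variance on its own.
    """
    needed = window * n_windows
    if len(signal) < needed:
        raise ValueError(f"need at least {needed} points, got {len(signal)}")
    segments = []
    for k in range(n_windows):
        chunk = signal[k * window:(k + 1) * window]
        mean = sum(chunk) / window
        var = sum((x - mean) ** 2 for x in chunk) / window
        std = math.sqrt(var) or 1.0  # guard against a constant window
        segments.append([(x - mean) / std for x in chunk])
    return segments


# Synthetic stand-in for one axis of one milling stroke (100,000 points).
raw = [math.sin(0.01 * i) for i in range(100_000)]
segments = segment_and_normalize(raw)
print(len(segments), len(segments[0]))  # 50 2000
```

In the three-axis case, the same cut would be applied to each of the three channels so that every short sequence keeps the channels aligned in time.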
The processing conditions in the experiment had the following characteristics: (1) finish milling with a small back engagement was performed; (2) the workpiece was milled steel (S136) with high hardness after heat treatment; and (3) the experiment needed to produce the tool data set quickly and accurately. This paper referred to references [
33,
34,
35] and the measurement methods of milling tool wear in the 2010 prognostics and health management (PHM) competition. The following method was used as the blunting criterion for the milling cutter in this experiment: The maximum value (VBmax) of the wear zone of the minor flank surface of the milling cutter was selected as the quantified value reflecting the wear state. It was specified that failure of the milling cutter occurred when the wear value of the milling cutter was greater than 0.13 mm. The wear process of the milling cutters (C1, C2, C3, and C4) is shown in
Figure 8.
Each sample contained three-dimensional vibration signals and the wear values of the four rear blades. To prevent mutual interference of the different blade wear values, the maximum wear value of the four blades was selected as the label of the milling stroke. The wear state of the tool was divided into initial wear, normal wear, and rapid wear, and the boundaries between these states were defined according to the actual wear curve of each milling cutter. The tool wear degree was thus divided into three types of label data, which were converted to one-hot encoding to facilitate the classification of the final tool wear state. The classification of the final tool wear state is shown in
Table 3.
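The labeling rule above can be sketched as follows. This is only an illustration: the function name `stroke_label` and the two boundary values passed to it are hypothetical placeholders, because the paper derives the state boundaries from each cutter's measured wear curve (Table 3) rather than from fixed thresholds.

```python
STATES = ("initial wear", "normal wear", "rapid wear")


def stroke_label(blade_wear_mm, initial_end_mm, normal_end_mm):
    """Return (VBmax, one-hot label) for one milling stroke.

    blade_wear_mm: wear values of the four blades, in mm.
    initial_end_mm, normal_end_mm: hypothetical state boundaries; in the
    paper they come from the actual wear curve of each cutter.
    """
    vb_max = max(blade_wear_mm)  # the largest blade wear labels the stroke
    if vb_max < initial_end_mm:
        index = 0
    elif vb_max < normal_end_mm:
        index = 1
    else:
        index = 2
    one_hot = [0, 0, 0]
    one_hot[index] = 1
    return vb_max, one_hot


# Placeholder boundaries of 0.05 mm and 0.10 mm, chosen only for the demo.
vb_max, label = stroke_label([0.02, 0.03, 0.06, 0.04], 0.05, 0.10)
print(vb_max, label, STATES[label.index(1)])  # 0.06 [0, 1, 0] normal wear
```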
4.2. Comparison of the Experimental Results of the Deep Learning Model
The original signal generated by the milling process was sampled and then sent to the DL neural network model. The model adaptively extracted the high-dimensional features implied in the time-series signal and computed the error between its actual output value and the real value. The Adam algorithm was used to reduce this error, and the network weights were continuously updated so that the actual output value of the model approached the real value. To further verify the performance of the proposed algorithm, we implemented the bearing fault diagnosis algorithm of the CNN model in [
25] and the turbofan engine life prediction algorithm of the BiLSTM model in [
26]. These two models were compared with our proposed CLSTM, CBLSTM, and CABLSTM networks. All five models used the same training parameters, which are shown in
Table 4.
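For readers unfamiliar with the optimizer, the Adam update used during training can be illustrated on a one-dimensional toy problem. This is a generic sketch of the standard Adam rule, not the authors' training code; the function name `adam_minimize`, the quadratic objective, and all hyperparameter values here are assumptions chosen for the demonstration and are unrelated to the settings in Table 4.

```python
def adam_minimize(grad, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a scalar function with the standard Adam update rule.

    grad: function returning the gradient at w.
    All hyperparameter defaults are illustrative assumptions.
    """
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (v_hat ** 0.5 + eps)   # parameter update
    return w


# Toy objective f(w) = (w - 3)^2, whose gradient is 2 (w - 3).
w_star = adam_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

In the network, the same rule is applied to every weight, with the gradient obtained by back-propagating the classification loss.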
After the training and verification of the DL neural network, different loss function values and accuracies were obtained. The loss function values of the CNN [
25], BiLSTM [
26], CLSTM, CBLSTM, and CABLSTM models and the accuracy of the verification set are shown in
Figure 9,
Figure 10,
Figure 11,
Figure 12 and
Figure 13, where the x axis represents the number of iterations over the milling data set, and the double y axis represents the loss function value and the model verification accuracy.
It can be concluded from the figures that the loss function value on the training set decreased as the number of iterations increased and finally stabilized. The loss function value on the verification set fluctuated periodically, with a large amplitude for the CLSTM model. The CNN, BiLSTM, CBLSTM, and CABLSTM models were relatively stable: the overall trend of the loss function was decreasing and finally converging, there was no gradient explosion or dispersion, and the networks converged quickly. The accuracy rates of the CNN and BiLSTM models on the verification set were 87.57% and 86.36%, respectively, which is comparatively low. This result indicates that an individual DL network could predict the tool wear state but could not capture the deeper features hidden in the tool vibration signal because of its limited capacity. The network models proposed in this paper were superior to the CNN and BiLSTM networks because their structure was relatively deep, which is conducive to mining deeper features. First, the CNN was used to extract the local features of the timing signals, which could effectively filter the noise in the original signal. At the same time, the length of the timing signal was reduced, which made it easier for the subsequent network to learn the temporal dependencies of the signal and improved the prediction ability of the model.
Among the network models proposed in this paper, the CABLSTM model had the best performance, which was superior to that of the CLSTM and CBLSTM models, and achieved high prediction accuracy. The initial prediction accuracy of the CLSTM model was relatively low. After 65 iterations, its accuracy on the verification set was basically stable above 96%, and the accuracy was 96.42% after 100 iterations. The CBLSTM model used a two-way LSTM network to access past and future information; that is, it could extract timing signal features from both the forward and reverse directions and thus obtain richer feature information. After 42 iterations, its accuracy on the verification set was basically stable above 96%, and the accuracy was 97.04% after 100 iterations. The CABLSTM model introduced the attention mechanism on the basis of CBLSTM, which selectively filtered out the key information from a large amount of information and focused on it, reducing the loss of key feature information in long sequences. After 35 iterations, its accuracy on the verification set was basically stable above 96%, the accuracy was 97.50% after 100 iterations, the loss function value reached 0.0651, and the network stability was higher. The loss function and the accuracy of the verification set and test set are shown in
Table 5.
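The idea of weighting the BiLSTM outputs by their importance can be illustrated with a generic attention pooling step. This is a minimal sketch under stated assumptions: it uses simple dot-product scoring against a hypothetical learned `query` vector, which may differ from the scoring function actually used in the CABLSTM model, and the function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step by its similarity to `query` and sum.

    hidden_states: list of per-time-step feature vectors (e.g. BiLSTM outputs).
    query: hypothetical learned vector; dot-product scoring is an assumption.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


# Three time steps with two features each; the third step scores highest.
weights, context = attention_pool([[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]], [1.0, 0.0])
```

The resulting `context` vector is what would be passed on to the Softmax classifier, so time steps with larger weights contribute more to the final decision.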
The data of the milling cutter (C4) were selected as the test set of the DL network model to evaluate the generalization ability of the final model. The total number of test samples was 330, including 23 initial wear samples, 232 normal wear samples, and 75 rapid wear samples. The samples were randomly fed into the trained DL network model. The CABLSTM model had high precision and recall. The F1-score reaches its optimum value at 1 (perfect precision and recall) and its worst value at 0. The F1-score in this paper was 0.9697. The evaluation indices of the CABLSTM model are shown in
Table 6. The test results show that the CABLSTM model proposed in this paper had a strong generalization ability. Although its test time was longer than that of some of the comparison models, the algorithm found a good balance between time and precision.
It can be concluded from the confusion matrix that the CABLSTM model proposed in this paper completed the inspection of the milling cutter (C4) with an accuracy of 96.97%. The predicted results of normal wear were more accurate. There were some deviations for initial wear and rapid wear, but the deviations were within a reasonable range. The incorrect prediction results mainly occurred in the transition stages between wear states. This is because the tool was in the normal wear state for a long time during the machining process, so the amount of data that could be learned by the model was relatively large and the features were relatively distinct; in contrast, the tool had only short periods of initial wear and rapid wear, so the amount of data that could be obtained was insufficient. The confusion matrix of the wear test results of the tool test set is shown in
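The evaluation indices mentioned above can be computed from a confusion matrix as sketched below. The matrix in the example is a made-up toy, not the paper's result (those are reported in Table 6 and Figure 14), and the function names are illustrative. How the paper aggregates per-class scores into a single F1-score is not stated, so only per-class values are shown.

```python
def per_class_metrics(cm):
    """Return a list of (precision, recall, F1) tuples, one per class.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[r][k] for r in range(n)) - tp  # predicted k, truly other
        fn = sum(cm[k]) - tp                       # truly k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append((precision, recall, f1))
    return metrics


def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


# Toy 3-class matrix (initial, normal, rapid wear); values are invented.
toy_cm = [[8, 2, 0],
          [1, 18, 1],
          [0, 1, 9]]
metrics = per_class_metrics(toy_cm)
acc = accuracy(toy_cm)  # 35 correct out of 40
```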
Figure 14.
When the real-time monitoring system of the tool wear state was working, the acceleration sensors fed a three-axis vibration signal of length 2000 to the CABLSTM monitoring model. The model performed a forward calculation to identify the current tool wear state, thereby achieving real-time monitoring of the tool wear state.
4.3. Comparison of Deep Learning and Machine Learning
To further validate the feasibility of the proposed model, a comparative experiment was designed with alternative ML models. The same data set used for DL was used in the experiment. More specifically, the commonly used models in traditional tool wear value detection approaches, including the BPNN, the SVM, the HMM, and the FNN, were compared with the CABLSTM model proposed in this paper. The wavelet threshold denoising method was used to perform noise reduction processing on the original signal collected by the acceleration sensor. The data features of the time domain, frequency domain, and time-frequency domain were extracted, and the specific extraction method is shown in
Table 7. Pearson’s correlation coefficient (PCC) was used to reflect the correlation between the feature and the wear value, and the feature with a correlation coefficient greater than 0.9 was selected as the extraction object to achieve a feature dimensionality reduction. The extracted features were used as the input of the ML model.
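The correlation-based feature selection can be sketched as follows. This is an illustration only: the feature names and values are invented for the demo (the actual features are those listed in Table 7), the function names are hypothetical, and the comparison uses the signed coefficient because the text says "greater than 0.9"; whether the authors used the absolute value is not stated.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / (norm_x * norm_y)


def select_features(features, wear, threshold=0.9):
    """Keep the features whose correlation with the wear value exceeds threshold."""
    return [name for name, values in features.items()
            if pearson(values, wear) > threshold]


# Invented example: five strokes, two candidate features.
wear = [0.02, 0.04, 0.06, 0.08, 0.10]
features = {
    "rms": [1.0, 2.1, 2.9, 4.2, 5.0],       # rises almost linearly with wear
    "kurtosis": [3.0, 2.0, 3.5, 2.5, 3.1],  # weakly related to wear
}
selected = select_features(features, wear)
print(selected)  # ['rms']
```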
It can be concluded from
Table 7 that the accuracy of traditional ML models varied greatly, which was due to the instability of the manually extracted features; the construction of the model also had an impact on the prediction results. The DL model proposed in this paper could achieve ideal results without data pre-processing by adaptively extracting hidden high-dimensional features from the tool processing signals and by using a reasonable network depth. Its prediction accuracy was significantly higher than that of the BPNN, SVM, and HMM. The prediction accuracy of the FNN, however, reached 94.24% because the FNN used a neural network to learn the rules of the fuzzy system. According to the input and output learning samples, the design parameters of the fuzzy system were automatically designed and adjusted to realize the self-learning and adaptive functions of the fuzzy system. Compared with the other ML models, the FNN demonstrated a great improvement in performance. The test time per sample of the CABLSTM model could reach 6 ms, which could meet the requirements of real-time tool wear monitoring in industrial production. The accuracy of ML and DL prediction is shown in
Table 8.
5. Conclusions
In this paper, we proposed the application of a CNN and RNN fusion to real-time monitoring of the tool wear state and modified the network parameters and structure according to the characteristics of vibration signals to monitor the tool wear degree in real time. The prediction accuracy of the CABLSTM reached 96.97%. In the pre-processing stage, the wear state of the tool was defined according to the actual wear curve, which was used to determine the wear degree of the tool and improve the accuracy of the data label classification. At the same time, the data expansion method was employed to enlarge the experimental data set beyond its original size, which improved the robustness of the algorithm. A one-dimensional CNN was used to extract the local features, and abundant high-dimensional features were extracted from the original signal, which avoided the limitation of the traditional manual feature extraction, better characterized the hidden tool wear state information in the original signal, and shortened the network model training time. The attention mechanism was introduced into the improved CBLSTM network model, which effectively improved the recognition accuracy and generalization performance of the real-time monitoring. The experimental results show that the CABLSTM model had certain advantages in the real-time monitoring of tool wear and could meet the industrial requirements in terms of recognition accuracy and recognition speed.
In actual manufacturing, the processing procedures and site conditions are often complicated and variable, and many features can reflect the wear state of a tool. In this paper, the original signal collected by the acceleration sensor was used as the tool wear monitoring index, so the method was restricted by the training data volume and the processing method and might not meet the requirements of arbitrary working conditions. In future work, multi-source data fusion technology and DL theory will be used to further study the information characterizing the wear state of the tool, improve the proposed method, and extend the method to industrial monitoring.