1. Introduction
Many studies have investigated the machining results of end milling. The feed rate, attributes of the workpiece materials, cutting speed, depth of cut, cutting tools, and machine rigidity all affect the surface quality and dimensional accuracy of the parts. However, due to the complexity of the cutting process, the ideal cutting conditions can only be achieved in laboratories or in theoretical analysis, which makes it difficult to build an effective prediction model for real-world milling. We must obtain the assumed parameters of the model through many experiments and use various optimization techniques to improve the model in order to set the cutting conditions [1].
ANNs have long been used to optimize cutting processes, such as tool wear monitoring and surface roughness prediction. Das et al. used the back propagation algorithm for training the neural network of turning carbide inserts, and the system showed potential for successful tool wear monitoring [2]. Chien et al. developed a predictive model for the machinability of 304 stainless steel with ANNs to predict the surface roughness of the workpiece, the cutting force, and the tool life. It was shown that the errors of the surface roughness, the cutting force, and the tool life were 4.4, 5.3, and 4.2%, respectively [3]. Karabulut et al. used ANNs and variance analysis results to predict the surface roughness values of compacted graphite iron after a face milling process [4]. The results showed a strong correlation between the lead angle, chip thickness, and surface quality. The surface roughness values were improved with the increasing lead angle value.
During the early development of this technology, the detection and control of cutting forces were expected to optimize the milling results. Tsai et al. employed an accelerometer and a proximity sensor in the milling process and collected vibration and rotation data [5,6,7,8,9,10]. The spindle speed, feed rate, depth of cut, and vibration average per revolution (VAPR) were used as input parameters to develop a backpropagation-based artificial neural network (ANN) model to predict the surface roughness. The proposed ANN model had a very high accuracy rate (96–99%) in predicting surface roughness, which indicated that an ANN can make accurate real-time predictions of surface roughness during end milling. Alique et al. established a versatile neural network model with a single hidden layer [6]. Input parameters, such as the feed rate and depth of cut, were applied to predict the average cutting force under different conditions. The model could be used for monitoring, adaptive control, and the real-time prediction of surface roughness and cutting tool vibration. Cus et al. predicted the cutting force of a ball nose cutter using a three-layer ANN [7]. The cutting speed, feed rate, radial and axial depth of cut, and cutter diameter were selected as the machining parameters to predict the components of the cutting force during the milling process, yielding an accuracy rate of ±4%. Kadirgama et al. employed an ANN to predict the cutting force for milling 618 stainless steel [8]. The cutting speed, feed rate, axial depth of cut, and radial depth of cut were the input parameters, and the cutting force was the output. The range of error was approximately 12%, which was considered acceptable. According to the literature, through data training, ANN models can predict the cutting force during milling under different cutting conditions. Nevertheless, at this stage of development, the models still had limited applications outside these experimental environments and conditions.
In recent years, advancements in sensor technology have improved the transmission method and data size of signals. Signal data can be captured, recorded, and transmitted back in real time during the machining process, so that substantial machining data can be obtained. In addition, big data analysis has become possible because of recent improvements in computing and data storage. Artificial neural algorithms developed using deep learning can extract features from data. If the machining data retrieved by sensors can be analyzed using ANNs, effective prediction models can be established.
A wireless sensory tool holder can be applied to machine tools in which the loads must be dynamically monitored for real-time monitoring and process recording. Ye used the sensory tool holder system for analysis during the rough machining of turbine blades and improved the planning process to shorten the processing time [11]. Chen et al. employed a sensory tool holder to measure the cutting force when milling thin-walled parts [12]. The cutting force was used as the load to determine the elastic deformation of the parts, and the volume error was offset by the deformation data so as to correct the processing path. This method successfully increased the machining accuracy and efficiency. Lu et al. collected signals in the machining process with a wireless sensory tool holder and extracted their features [13]. The deep forest algorithm was applied to estimate the surface quality. The accuracy of the monitoring model reached 99.54% for the training sets and 90.91% for the validation sets. This approach ensured the surface quality and increased the machining efficiency. The use of wireless sensory tool holders in machining could be expanded in the future.
With the development of deep learning, various ANN models have been established, the most common of which are multilayer perceptron (MP), deep neural network (DNN), convolutional neural network (CNN), recurrent neural network (RNN), and long short-term memory (LSTM). The different connection and transmission methods of the models produce different analytical results. Accordingly, we input the same machining signals into different models for analysis to compare their prediction accuracies.
Although ANNs have been widely used to predict the effects of cutting parameters on machining results, most studies have used the machining conditions as the input. Few studies have extracted real-time machining signals as the input for ANNs, and few have discussed the differences between models based on cutting analysis. In the present study, we employed a sensory tool holder to collect cutting force signals during machining and converted the signals through Fourier transform for feature processing. Subsequently, the data were input into three different ANNs (DNN, CNN, and LSTM) for training. The goal of this training was to predict the surface roughness and dimensional accuracy after machining. After the training was completed, the data not used for training were used for testing to determine the training effects and prediction error rates of the models. Finally, by comparing the prediction accuracy and analytical efficiency of these three models, we aimed to identify a model with a high accuracy (with a percentage error of prediction below 10%) and the shortest computing time. The identified model can facilitate real-time surface roughness and machining accuracy prediction.
In the following, the second section introduces the methods and instruments used in this study, including the experimental operation process and the setting of the mathematical model. The third section presents the data training results and error analysis. The fourth section is the conclusion.
2. Materials and Methods
2.1. Artificial Neural Network (ANN)
Machine learning is an approach used to realize artificial intelligence. By using algorithms, machine learning can replace the previous methods by discovering rules and forming judgements after repeated experiments. Deep learning, a branch of machine learning, was initially a stagnant field due to insufficient computational resources and efficiency. With recent improvements in hardware, particularly the emergence of high-quality graphics processing units and the rise of big data, deep learning has become the mainstream method of machine learning. An ANN is a type of mathematical, biomimetic neural network model and is the basis of the current deep learning models. Composed of artificial neurons, it contains an input layer, hidden layers, and an output layer. Data and signals can be stored or learned by such models. Each neuron computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function to produce its output value. The most commonly applied activation functions are the sigmoid function, the rectified linear unit (ReLU), and the hyperbolic tangent function. To construct an ANN, the parameters are set manually. Users should determine the appropriate number of neurons and layers in the model according to their requirements and obtain the correct weights through repeated training. The numerous different neural networks developed up to the present day have produced satisfactory results in fields such as machine vision, speech recognition, natural language processing, and biomedicine.
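As an illustration of the neuron calculation and the three activation functions named above, the following minimal Python sketch may help. The function names (`neuron`, `sigmoid`, `relu`) and the numerical values are illustrative assumptions and are not taken from the program used in this study.

```python
import math


def sigmoid(z):
    """Sigmoid activation: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to 0."""
    return max(0.0, z)


def neuron(inputs, weights, bias, activation):
    """Single artificial neuron (illustrative helper, not the study's code).

    Computes the weighted sum of the inputs, adds the bias, and applies
    the chosen activation function.
    """
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


# Hypothetical inputs and weights, chosen only for demonstration.
x = [1.0, 2.0]
w = [0.5, -0.25]
out_sigmoid = neuron(x, w, 0.0, sigmoid)
out_relu = neuron(x, w, 0.0, relu)
out_tanh = neuron(x, w, 0.0, math.tanh)
```

Here the weighted sum is zero, so the three activations return their values at the origin; in a trained network the weights and bias would be set by backpropagation rather than by hand.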
ANN models can use various types of deep learning architectures, including MP, DNN, CNN, RNN, and LSTM. Different models have been used to predict machining results and achieve machine adaptive control. Lai et al. proposed a hybrid recurrent neural network (HRNN) model on the basis of a diagonal recurrent neural network [14]. The constant force control applied during machining can be used to verify the effectiveness of the model through simulations and tests. Huang developed a new intelligent neural fuzzy system to assess surface roughness in an end milling operation [15]. The model implemented the neural-assisted method to generate the fuzzy IF–THEN rules and obtain higher accuracy in surface roughness prediction. Huang et al. adopted a holistic local LSTM model (HLLSTM) to capture data features and retrieved diachronic machining signals from a triaxial accelerometer for training and testing in order to establish a deep-learning-based tool wear prediction system [16]. The results of the HLLSTM model were compared with those of a CNN and an LSTM model, and the HLLSTM model was shown to have a more satisfactory performance. Huang et al. proposed a deep convolutional neural network (DCNN) based on multi-domain feature fusion to predict tool wear [17]. The performance of the prediction method was experimentally validated using a three-flute ball nose tungsten carbide cutter for dry milling using a high-speed CNC machine tool. Chan et al. also conducted tool wear prediction with an HLLSTM model [18]. The model could reduce the average error of the actual tool wear values and accurately predict tool wear.
2.2. Experiment Procedure
In this study, we applied a full factorial design to determine the cutting parameters and employed a five-axis machine for milling. The milling machine was a 5-axis machining center CT-350, manufactured by Tongtai Inc., Kaohsiung, Taiwan, equipped with a numerical controller (Siemens 840Dsl). The axes of the machine are shown in Figure 1. The workpieces were 80 mm × 80 mm × 60 mm SUS304 stainless steel hexahedrons. Table 1 lists the mechanical properties and chemical composition of SUS304. In the cutting process, the four sides in the XY plane of the hexahedron were milled using the side edge, and the cutting depth was in the Z direction. The processing path was generated using Siemens NX, as displayed in Figure 2 [19,20]. A Ø 10 mm tungsten steel end mill from Chin Ming Precision Tools Co., Tainan, Taiwan, was used for the side milling. The specifications of the tool are presented in Table 2. During the machining process, a sensory tool holder (Pro-micron GmbH & Co. KG, Kaufbeuren, Germany) was used to collect the cutting force signals. The specifications of the sensory tool holder are presented in Table 3. It could measure the axial cutting force, cutting torque, and the bending moment in the X-Y direction and send data to a computer wirelessly. We wrote a neural network program in Python and then extracted features from the captured cutting force data. The development environment was Colaboratory, which is a product from Google Research and is free to use. Finally, we imported the features into the program for the model training and prediction.
To differentiate the surface roughness and dimensional accuracy after processing, we selected the cutting speed, feed per tooth, axial depth of cut, and radial depth of cut as the four factors and set three levels for each factor to conduct a full factorial experiment. The machining parameters are shown in Table 4. A total of 81 tests were designed, and each was conducted twice, resulting in 162 datasets. We conducted side milling on cubic workpieces. Face milling was first applied to the surface. Each workpiece had eight machined surfaces, including four upper and four lower. After the machining was completed, a measuring instrument (Hommel-etamic T8000) was employed to measure the surface roughness, as displayed in Figure 3. Each machined surface was measured three times to obtain an average value. A TESA-hite Magna 400 height gauge was used to measure the machined surfaces and the datum surface. The positions of three points were measured to obtain the mismatch and average values in order to obtain the machining dimension error, as illustrated in Figure 4.
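The size of the full factorial design described above (four factors, three levels, two replicates) can be checked with a short Python sketch. The factor names and the level labels 1 to 3 are placeholders assumed for illustration; the actual level values are those listed in Table 4 and are not reproduced here.

```python
from itertools import product

# Placeholder factor names; the real level values are given in Table 4.
FACTORS = ("cutting_speed", "feed_per_tooth", "axial_depth", "radial_depth")
LEVELS = (1, 2, 3)   # level labels only, not physical values
REPLICATES = 2       # each test was conducted twice


def full_factorial(factors, levels, replicates):
    """Enumerate every level combination and repeat each one `replicates` times."""
    combos = list(product(levels, repeat=len(factors)))
    runs = [
        dict(zip(factors, combo), replicate=r)
        for combo in combos
        for r in range(1, replicates + 1)
    ]
    return combos, runs


combos, runs = full_factorial(FACTORS, LEVELS, REPLICATES)
n_tests = len(combos)    # 3**4 level combinations
n_datasets = len(runs)   # combinations times replicates
```

With three levels for each of four factors, the enumeration yields 3^4 = 81 combinations and, with two replicates, 162 runs, matching the counts stated in the text.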
2.3. Signal Preprocessing
The sensory tool holder could collect three types of cutting force signals (i.e., tension, torque, and the bending moment). We adopted the bending moment as the basis for the side milling evaluation (Figure 5). We observed that the bending moment increased to 9–12 N·m after the tool came into contact with the workpiece during the side milling process. After the tool left the workpiece, the bending moment decreased. The sample interval of the tool holder was 0.0004 s, which corresponds to a sampling frequency of 2500 Hz. During machining, 2 s of signal were captured, so 5000 samples were retrieved for each dataset. To achieve a satisfactory training result, we conducted feature extraction before inputting the data into the ANN models for training. The purpose of feature extraction was to obtain essential and meaningful features from the raw data and increase the analytical efficiency. The bending moment signals were converted from the time domain to the frequency domain using a Fourier transform technique. The bandwidth of the original signals was 2500 Hz, whereas the effective bandwidth of the signals after fast Fourier transform was 1250 Hz. Therefore, the number of data points in each dataset was half of the original: 2500. Figure 6 illustrates the idling frequency for a 6000 RPM spindle speed. A peak is observable at 100 Hz. The other three recorded peaks in Figure 6 refer to the natural frequency of the sensory tool holder. Figure 7 displays the spectrum during cutting. The cutting speed is 70 m/min, and the spindle speed is 2228 RPM. A peak is observable at 37 Hz. In addition to the rotation frequency, its harmonics also showed peaks at 74 Hz, 112 Hz, and 186 Hz.
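The preprocessing step above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the program used in this study: the test signal is synthetic (a 37 Hz component plus a weaker 74 Hz harmonic), the helper names `spindle_rpm` and `amplitude_spectrum` are hypothetical, and dropping the final (Nyquist) bin to obtain exactly 2500 points is one plausible way to arrive at the dataset size given in the text.

```python
import math

import numpy as np

FS = 2500.0                      # sampling frequency in Hz (0.0004 s interval)
DURATION = 2.0                   # seconds of signal per dataset
N_SAMPLES = int(FS * DURATION)   # 5000 samples


def spindle_rpm(cutting_speed_m_min, tool_diameter_mm):
    """Spindle speed from cutting speed: n = 1000 * v / (pi * D)."""
    return 1000.0 * cutting_speed_m_min / (math.pi * tool_diameter_mm)


def amplitude_spectrum(signal, fs):
    """One-sided amplitude spectrum of a real signal.

    The Nyquist bin is dropped (an assumption) so that a 5000-sample
    signal yields 2500 frequency points covering 0 Hz up to just
    below 1250 Hz.
    """
    n = len(signal)
    amplitudes = np.abs(np.fft.rfft(signal)) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    half = n // 2
    return freqs[:half], amplitudes[:half]


# Synthetic stand-in for a bending moment signal (not measured data).
t = np.arange(N_SAMPLES) / FS
signal = np.sin(2 * np.pi * 37.0 * t) + 0.5 * np.sin(2 * np.pi * 74.0 * t)

freqs, amplitudes = amplitude_spectrum(signal, FS)
peak_hz = float(freqs[np.argmax(amplitudes)])

rpm = spindle_rpm(70.0, 10.0)    # 70 m/min with a 10 mm end mill
rotation_hz = rpm / 60.0         # rotation frequency in Hz
```

For a cutting speed of 70 m/min and a Ø 10 mm tool, the spindle speed formula gives about 2228 RPM, that is, a rotation frequency of roughly 37 Hz, which is consistent with the first peak reported for Figure 7.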
2.4. Modeling Set-Up
Before inputting the data into the ANNs for training, the parameters of each ANN were set. Parameters common to all the ANNs were the stride (here, the number of training steps), learning rate, and batch size. The learning rate affects the number of strides required, so that a lower learning rate requires more strides during training. In this study, the learning rate was set to 0.00015. The mean-square error (MSE) was used as the loss function to determine the training result, and the root mean-square error (RMSE) was further used to evaluate the test results. The training revealed that after a certain number of strides, the loss function and RMSE no longer changed significantly. After observing the convergence of the models, we set the stride to 1000 and the batch size to half of the training data.
In total, 162 experimental data points were obtained. The training methods could be divided into training on all the data collectively and training on the classified data in turn. When training the classified data, we divided the data into three sets according to the three levels of the four factors (i.e., the cutting speed, feed per tooth, axial depth of cut, and radial depth of cut). Each set had 54 data points. The training result for each set of data was determined using MSE as the loss function, and four data points were randomly selected to test for accuracy with RMSE. Convergence was achieved after three tests, and the mean absolute percentage error was calculated.
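The three error measures named above (MSE, RMSE, and the mean absolute percentage error) and the random selection of four test points from a 54-point set can be sketched as follows. The function names, the fixed random seed, and the example values are assumptions made for illustration only; they are not measurements or code from this study.

```python
import math
import random


def mse(y_true, y_pred):
    """Mean-square error, used here as the training loss."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean-square error, used to evaluate the test results."""
    return math.sqrt(mse(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def split_holdout(n_points, n_test, seed=0):
    """Randomly hold out `n_test` indices for testing; the rest are for training.

    The seed is a hypothetical choice that only makes the example repeatable.
    """
    rng = random.Random(seed)
    test_idx = sorted(rng.sample(range(n_points), n_test))
    held_out = set(test_idx)
    train_idx = [i for i in range(n_points) if i not in held_out]
    return train_idx, test_idx


# Made-up target and prediction values, for demonstration only.
y_true = [1.0, 2.0, 4.0]
y_pred = [1.1, 1.8, 4.4]
example_mse = mse(y_true, y_pred)
example_rmse = rmse(y_true, y_pred)
example_mape = mape(y_true, y_pred)

train_idx, test_idx = split_holdout(54, 4)
```

In this made-up example every prediction is off by 10% of the true value, so the mean absolute percentage error sits exactly at the 10% threshold that the study uses as its accuracy target.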
2.4.1. Convolutional Neural Network (CNN)
A CNN is a type of deep learning model that is mainly used for image recognition. It can effectively conduct feature identification and learning, as well as data analysis, minimizing the data size. It comprises an input layer, multiple hidden layers, and an output layer. The hidden layers are composed of convolutional layers, pooling layers, and fully connected layers. The convolutional layers are mainly responsible for feature extraction and can achieve superior spatial feature learning. The pooling layers filter feature data and retain the essential features to downsize the data, effectively reducing the difficulty of the training.
The input data for this study were obtained in time series order. Time series data are suitable for storage as a one-dimensional matrix. Thus, for the CNN model, we set the number of input channels to one and the input data format as a 1-by-2500 matrix. To keep the model size reasonable, the model contained one convolutional layer and one pooling layer, as presented in Figure 8. The convolutional kernel size was 500, and the stride (step of each movement of the convolution kernel) of the kernel was 300. The zero padding was 200. The pooling window (depth slice) size was 3, and its stride was 2. Finally, we set the number of output channels to 16.
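The effect of these settings on the data size can be estimated with the standard output-length formula for one-dimensional convolution and pooling, floor((L + 2P − K) / S) + 1. The sketch below assumes that the zero padding of 200 is applied on both sides of the input and that the pooling layer uses no padding; these are assumptions, since the text does not state them, and `output_length` is a hypothetical helper.

```python
def output_length(length, kernel, stride, padding=0):
    """Output length of a 1-D convolution or pooling layer.

    Standard formula: floor((L + 2P - K) / S) + 1, with symmetric padding P.
    """
    return (length + 2 * padding - kernel) // stride + 1


INPUT_LENGTH = 2500    # frequency-domain points per dataset
OUT_CHANNELS = 16      # number of output channels

# Convolutional layer: kernel 500, stride 300, zero padding 200 (assumed per side).
conv_len = output_length(INPUT_LENGTH, kernel=500, stride=300, padding=200)

# Pooling layer: window 3, stride 2, no padding (assumed).
pool_len = output_length(conv_len, kernel=3, stride=2)

# Number of values passed on after flattening all channels.
flattened = OUT_CHANNELS * pool_len
```

Under these assumptions the 2500-point input is reduced to 9 positions by the convolution and to 4 positions by the pooling, so 16 channels carry 64 values in total, which illustrates how strongly the large kernel and stride downsize the data.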
2.4.2. Deep Neural Network (DNN)
A DNN, as the name implies, is a neural network with multiple hidden layers. Each layer contains many neurons, and each neuron transmits its weighted output to the neurons in the next layer. Users must set appropriate parameters according to the project requirements. The activation function of the architecture is mainly used for nonlinear conversion, and the loss function is used to estimate the difference between the predicted and actual values. The model can be divided into two parts, namely the forward-propagation and backpropagation networks.
Since each input dataset contained 2500 data points, the number of input nodes of the DNN model was 2500. We had to adjust the number of hidden layers and the number of neurons to obtain the desired convergence. A model with more hidden layers is more complex and may cause overfitting. To reduce the complexity of the model and avoid overfitting, the number of hidden layers can be reduced. The configuration of the hidden layers and nodes in the DNN model after the convergence analysis is displayed in Figure 9.
2.4.3. Long Short-Term Memory (LSTM)
An LSTM network is a special type of RNN. It was developed to solve the problem of vanishing and exploding gradients during training. Gradients vanish or explode because they are multiplied repeatedly during backpropagation through time, so they shrink toward zero or become excessively large. An LSTM network addresses this problem with a gating mechanism consisting of an input gate, a forget gate, and an output gate, which enables backpropagation to identify time series correlations in the data.
Our LSTM model had an LSTM layer and an output layer. The number of input nodes was 500, and the number of nodes in the LSTM layer was 32. Before the data were input, they were dimensionally transformed and rearranged into the form required by the LSTM model. MSE was used as the loss function, and RMSE was used to evaluate the training results.
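One way to read the dimensional transformation mentioned above is that each 2500-point dataset is rearranged into a sequence of 5 steps with 500 features each, which would match the stated 500 input nodes. This reading is an assumption, as the text does not give the sequence length. The sketch below shows such a reshape with NumPy and counts the trainable parameters of a 32-unit LSTM layer using the common convention of four gate blocks with one bias vector each (some libraries use two bias vectors and therefore report a slightly larger count).

```python
import numpy as np

N_POINTS = 2500    # frequency-domain points per dataset
N_FEATURES = 500   # input nodes of the LSTM model
N_UNITS = 32       # nodes in the LSTM layer

# Assumed layout: consecutive 500-point blocks become successive time steps.
n_steps = N_POINTS // N_FEATURES

# Placeholder data standing in for one preprocessed dataset.
dataset = np.arange(N_POINTS, dtype=np.float32)
sequence = dataset.reshape(n_steps, N_FEATURES)


def lstm_param_count(n_features, n_units):
    """Trainable parameters of one LSTM layer (single-bias convention).

    Each of the four blocks (input gate, forget gate, cell candidate,
    output gate) has input weights, recurrent weights, and a bias.
    """
    return 4 * (n_units * (n_features + n_units) + n_units)


params = lstm_param_count(N_FEATURES, N_UNITS)
```

Under this assumed layout the LSTM layer sees 5 time steps per dataset and holds 68,224 trainable parameters, which gives a rough sense of the model size.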