4.1. Evaluation Index and Deep Learning Model Parameters
The dataset comprised 900 recordings in total: five types of metal objects were measured at three sensor-to-object distances (20 cm, 30 cm, and 40 cm) and three sensor movement speeds (1 m/s, 3 m/s, and 5 m/s), with 20 measurements for each combination. The data were split into training and test sets at an 8:2 ratio. The time slot (window length) of the input time series data for training and prediction was set to 64. The trained models were compared and evaluated using the accuracy index of Equation (3).
Accuracy is computed by dividing the number of correct predictions by the total number of predictions.
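The accuracy index described above can be illustrated with a minimal sketch. The function name and the example labels are hypothetical and serve only to show the calculation; they are not taken from the experimental data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("labels and predictions must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 0 = no object, 1..5 = metal object type.
labels = [0, 0, 3, 1, 5, 0, 2, 4]
predictions = [0, 0, 3, 1, 4, 0, 2, 2]
print(accuracy(labels, predictions))  # 6 of 8 predictions are correct
```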
Table 3 lists the number of parameters and the inference time for each RNN layer type. Compared with a CNN model that processes images, the RNN models require fewer parameters and offer faster inference. Four layer types were evaluated: LSTM, LSTM-Bidirectional, GRU, and GRU-Bidirectional. Other layer types, such as the embedding layer, were not used because they are intended for natural language processing or are better suited to other purposes.
Overall, the unidirectional LSTM and GRU models have more parameters than the bidirectional models, and their inference is relatively slow. The GRU-Bidirectional model has the fewest parameters at every depth and the fastest inference time. Inference time is lowest when the time slot of the input data is 64. At that setting, the slowest model, the nine-layer LSTM, can run in real time at sensor data acquisition frequencies of up to about 43,000 Hz, and the fastest model, the one-layer GRU-Bidirectional, up to about 92,000 Hz.
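One plausible way to relate inference time to a maximum real-time acquisition frequency is sketched below. It assumes that the model performs one inference per non-overlapping window of time-slot samples, so that real-time operation requires the inference to finish before the next window has been acquired. This assumption, the function names, and the example inference time are illustrative only and are not taken from Table 3.

```python
def max_realtime_rate_hz(time_slot, inference_time_s):
    """Highest sampling rate at which one inference per window keeps up.

    Assumes non-overlapping windows: a window of `time_slot` samples takes
    time_slot / rate seconds to acquire, which must be at least as long as
    the inference time.
    """
    if time_slot <= 0 or inference_time_s <= 0:
        raise ValueError("time_slot and inference_time_s must be positive")
    return time_slot / inference_time_s


def is_realtime(sample_rate_hz, time_slot, inference_time_s):
    """True if the sensor rate does not exceed the sustainable rate."""
    return sample_rate_hz <= max_realtime_rate_hz(time_slot, inference_time_s)


# Hypothetical inference time of 2 ms for a window of 64 samples.
limit = max_realtime_rate_hz(time_slot=64, inference_time_s=0.002)
print(f"sustainable up to about {limit:.0f} Hz")
print(is_realtime(sample_rate_hz=200, time_slot=64, inference_time_s=0.002))
```

Under this assumption, a sensor sampling at a few hundred hertz leaves a large margin for any model whose sustainable rate is in the tens of thousands of hertz.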
4.2. Performance Comparison Based on the RNN Layer Type
Figures 11 to 15 depict the loss and accuracy convergence curves for each RNN layer type and depth when the distance between the sensor and the object is 20 cm. All RNN models converge in a similar manner, and the GRU model exhibited the best performance. When the network was too shallow, training did not proceed smoothly, and the loss and accuracy were unstable at the beginning of training. After convergence, the predictions tended to overfit the training data. Deeper networks converged faster and showed greater stability and higher accuracy.
In the learning loss and accuracy convergence graph, the LSTM layer, when set to a single layer depth, exhibits optimal performance. This is indicative of the balance between model complexity and its ability to learn the underlying features of the data. Notably, with a shallow model depth, we observed increased fluctuations in the loss value, decreased accuracy, and slower convergence. This behavior suggests that a model with insufficient depth might struggle to capture the intricate features of the signal. Conversely, as the depth increased, the GRU-Bidirectional model outperformed others, demonstrating rapid convergence and superior performance. Such observations underline the significance of model depth and architecture in determining the learning capabilities of RNNs.
It is also worth noting that the detection models generally exhibited faster convergence and higher accuracy compared to classification models. This could be attributed to the inherent challenges associated with multi-class classification tasks, especially when dealing with intricate signal patterns.
Table 4 offers a comprehensive performance comparison across the test set, factoring in the varying layer types and depths of the RNN. It sheds light on two key performance metrics: detection, which determines the presence or absence of a metal object, and classification, which recognizes and categorizes among five distinct metal models. This table serves as a testament to the varying capabilities of different RNN configurations and provides insights into their respective strengths and limitations.
Moreover, it is crucial to emphasize that while certain RNN configurations might excel in one aspect, they might not necessarily be the best fit for other tasks. For instance, while GRU-Bidirectional models might converge faster and demonstrate lower loss values, they might require more computational resources. Such trade-offs should be considered when selecting an appropriate model for specific applications.
With a single layer, none of the four model types trained effectively: most signals were predicted as 0, and the resulting accuracy was close to the null accuracy. The LSTM-Bidirectional model achieved the highest classification (recognition) accuracy, 95.93%, and the LSTM model achieved the highest detection accuracy, 98.09%. The classification accuracy of the LSTM model, however, was relatively low at 87.44%, whereas the detection accuracy of the LSTM-Bidirectional model was 97.9%, only 0.19 percentage points below that of the LSTM model. The LSTM-Bidirectional model therefore performs well in both detection and recognition and is suitable for practical use. The next best model is the GRU-Bidirectional model, with detection and recognition rates of 97.6% and 95.51%, respectively. Because the difference in accuracy is small and the GRU-Bidirectional model has the fastest inference, it is a reasonable choice for applications that require higher speed.
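The null accuracy mentioned above is the accuracy obtained by always predicting the most frequent class. A minimal sketch follows; the function name and the label distribution are hypothetical and do not reproduce the class balance of the actual dataset.

```python
from collections import Counter


def null_accuracy(labels):
    """Return the majority class and the accuracy of always predicting it."""
    if not labels:
        raise ValueError("labels must not be empty")
    majority_label, majority_count = Counter(labels).most_common(1)[0]
    return majority_label, majority_count / len(labels)


# Hypothetical label set in which the static class 0 dominates.
example_labels = [0] * 7 + [1, 2, 3]
label, baseline = null_accuracy(example_labels)
print(label, baseline)
```

A model that predicts 0 for nearly every signal scores close to this baseline, which is why an accuracy near the null accuracy indicates that little was learned.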
Table 5 presents a performance comparison based on the type and depth of each layer of the RNN. The distance between the sensor and the object is 20 cm. The detection performance of confirming the presence or absence of a metal object and the performance of classifying and recognizing five types of metal models were compared and analyzed.
Table 6 presents a performance comparison for each RNN layer type and depth and each sensing speed (1 m/s, 3 m/s, 5 m/s) at a sensor-to-object distance of 20 cm. The detection performance of confirming the presence or absence of a metal object and the performance of classifying and recognizing five types of metal models were compared and analyzed.
Table 7 presents a performance comparison based on the type and depth of each layer of the RNN; the distance between the sensor and the object is 30 cm. The detection performance of confirming the presence or absence of a metal object and the performance of classifying and recognizing five types of metal models were compared and analyzed.
Table 8 presents a performance comparison for each RNN layer type and depth and each sensing speed (1 m/s, 3 m/s, 5 m/s) at a sensor-to-object distance of 30 cm. The detection performance of confirming the presence or absence of a metal object and the performance of classifying and recognizing five types of metal models were compared and analyzed.
Table 9 presents a performance comparison based on the type and depth of each layer of the RNN; the distance between the sensor and the object is 40 cm. The detection performance of confirming the presence or absence of a metal object and the performance of classifying and recognizing five types of metal models were compared and analyzed.
Table 10 presents a performance comparison for each RNN layer type and depth and each sensing speed (1 m/s, 3 m/s, 5 m/s) at a sensor-to-object distance of 40 cm. The detection performance of confirming the presence or absence of a metal object and the performance of classifying and recognizing five types of metal models were compared and analyzed.
At a distance of 20 cm, the nine-layer LSTM model exhibited the best detection performance, and the nine-layer LSTM-Bidirectional model exhibited the best classification and recognition performance. The same pattern held at 30 cm and 40 cm: the nine-layer LSTM model gave the highest detection performance, and the nine-layer LSTM-Bidirectional model gave the highest classification and recognition performance. In general, shallower models performed worse than deeper ones because they could not cope with the irregularity of the sequence data or learn the correlation between preceding and following signals. Conversely, as a model becomes deeper and the distance between its input and output ends grows, the correlation between the input data is weakened during feature extraction. Our verification tests confirmed that all models are suitable for real-time detection and classification.
The overall performance was observed to be excellent for a distance and speed of 40 cm and 5 m/s, respectively.
Detection performance was better with the forward (unidirectional) LSTM and GRU models. Forward learning appears to be more advantageous for detection because detection treats similar patterns as a single signal rather than distinguishing each of them. For recognition and classification, the LSTM- and GRU-Bidirectional models, which additionally learn from the sequence data in reverse order, outperformed the forward LSTM and GRU models without an increase in the number of parameters. The detection performance of the bidirectional models was also high. Accordingly, the bidirectional models showed better overall performance and were deemed suitable for real-time data processing.
Figure 16 compares accuracy across the time series units (time slots) used for training the deep learning model. The time slot was varied from 10 to 2000 samples. The signal acquisition frequency of the sensor used in this paper was about 200 Hz, and the time slot for optimal learning was analyzed accordingly. The shorter the time slot, the closer the training results were to the null accuracy of the dataset. Accuracy began to converge at a time slot of about 60 and had converged by a time slot of 300. When the time slot was increased beyond the sampling rate of the sensor, the ratio of null data in the training data increased. Null data denotes a signal in a static state, and most of the signals were in the static state. As the time slot increased, the proportion of the dynamic signal decreased, resulting in lower accuracy. The inference cost and inference time of the deep learning model also increased with the time slot. Therefore, a time slot of 60 to 300 was advantageous for real-time inference.
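The effect of the time slot on the share of static samples can be illustrated with a small sketch. At a sampling rate of about 200 Hz, a time slot of 60 to 300 samples corresponds to roughly 0.3 to 1.5 s of signal. The sketch assumes per-sample labels (0 for static), non-overlapping windows, and a hypothetical 40-sample event; none of these details are specified in this section.

```python
def make_windows(samples, time_slot):
    """Cut a sequence into consecutive non-overlapping windows.

    A trailing partial window is dropped.
    """
    last_start = len(samples) - time_slot
    return [samples[i:i + time_slot] for i in range(0, last_start + 1, time_slot)]


def static_fraction(window_labels):
    """Share of samples in a window that carry the static label 0."""
    return sum(1 for label in window_labels if label == 0) / len(window_labels)


def mean_static_fraction(labels, time_slot):
    """Average static share over the windows that contain an event."""
    fractions = [
        static_fraction(window)
        for window in make_windows(labels, time_slot)
        if any(label != 0 for label in window)
    ]
    return sum(fractions) / len(fractions) if fractions else 1.0


# Hypothetical recording: 1000 samples with one 40-sample event.
per_sample_labels = [0] * 512 + [1] * 40 + [0] * 448
for slot in (64, 300, 1000):
    share = mean_static_fraction(per_sample_labels, slot)
    print(f"time slot {slot}: static share {share:.3f}")
```

In this toy recording, the static share of the event-bearing windows grows with the time slot, which mirrors the reported drop in accuracy for very long time slots.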
Table 11 compares the accuracy of the single prediction method and the overlap prediction method. In the single prediction method, each piece of data is predicted once and then skipped without being processed again, which caused a large number of errors. The overlap prediction method used in this paper predicts part of the previously predicted data again together with the current and next pieces of data and readjusts the prediction results through the probability distribution. This minimized the error, and the overlapping sequence predictions achieved high accuracy.
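The difference between the two methods can be sketched as follows. This is only one possible reading of the overlap method: it assumes that windows slide with a stride smaller than the time slot and that the class probabilities of all windows covering a sample are averaged before the final label is chosen. The exact readjustment rule is not given in this section, and `toy_model` is a hypothetical stand-in for the trained RNN.

```python
def overlap_predict(signal, predict_proba, time_slot, stride, num_classes):
    """Label each sample from the averaged probabilities of covering windows.

    With stride == time_slot every sample is predicted once (single
    prediction); with stride < time_slot the windows overlap.
    """
    n = len(signal)
    sums = [[0.0] * num_classes for _ in range(n)]
    counts = [0] * n
    for start in range(0, n - time_slot + 1, stride):
        probs = predict_proba(signal[start:start + time_slot])
        for i in range(start, start + time_slot):
            for c in range(num_classes):
                sums[i][c] += probs[c]
            counts[i] += 1
    labels = []
    for i in range(n):
        if counts[i] == 0:
            labels.append(None)  # sample not covered by any window
        else:
            avg = [s / counts[i] for s in sums[i]]
            labels.append(max(range(num_classes), key=avg.__getitem__))
    return labels


def toy_model(window):
    """Hypothetical classifier: P(object) = share of large-amplitude samples."""
    p = sum(1 for x in window if abs(x) > 0.5) / len(window)
    return [1.0 - p, p]


toy_signal = [0.0] * 8 + [1.0] * 4 + [0.0] * 8
single = overlap_predict(toy_signal, toy_model, time_slot=4, stride=4, num_classes=2)
overlap = overlap_predict(toy_signal, toy_model, time_slot=4, stride=2, num_classes=2)
print(single)
print(overlap)
```

Averaging is used here only because it is the simplest way to combine overlapping predictions; a weighted scheme would fit the same interface.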