5.1.1. Network Architecture
The structure of the hybrid network model MCRL, which combines the base neural network models MLP, CNN, RNN, and LSTM with a linear attention mechanism, is shown in Figure 5.
The optimizer's learning rate is set by a decay function that reduces it by 10% every 1000 steps. The input vector size is , which is initially divided into four branches and processed through linear transformations.
The first branch utilizes the ReLU linear attention mechanism to process data, which are then fed into the MLP. Initially, the two-dimensional output vector is flattened using TensorFlow’s flatten module into a one-dimensional vector of size . This vector is then passed through four hidden layers with 128, 64, 32, and 16 nodes, each followed by a ReLU activation function, finally transforming the output vector to a size of . It then passes through a hidden layer with four nodes and a Softmax activation function to produce the output.
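The ReLU linear attention step that opens each branch can be sketched as follows. The text does not give its exact formulation, so this sketch assumes the common kernelized form with feature map φ(x) = ReLU(x), in which each output row is φ(q_i)·(Σ_j φ(k_j)ᵀ v_j) divided by φ(q_i)·Σ_j φ(k_j). The function name, the `eps` term, and the list-of-lists layout are illustrative assumptions, not the authors' implementation.

```python
def relu(row):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in row]


def relu_linear_attention(Q, K, V, eps=1e-6):
    """Kernelized (linear) attention with a ReLU feature map.

    Q, K: n x d lists of lists; V: n x m list of lists.
    Assumed form (not taken from the paper):
        out_i = phi(q_i) . (sum_j phi(k_j)^T v_j) / (phi(q_i) . sum_j phi(k_j) + eps)
    """
    phi_q = [relu(r) for r in Q]
    phi_k = [relu(r) for r in K]
    n, d, m = len(K), len(K[0]), len(V[0])

    # d x m key-value summary, computed once and shared by every query row.
    kv = [[sum(phi_k[j][a] * V[j][b] for j in range(n)) for b in range(m)]
          for a in range(d)]
    # Length-d sum of the mapped keys, used for normalization.
    k_sum = [sum(phi_k[j][a] for j in range(n)) for a in range(d)]

    out = []
    for q in phi_q:
        norm = sum(q[a] * k_sum[a] for a in range(d)) + eps
        out.append([sum(q[a] * kv[a][b] for a in range(d)) / norm
                    for b in range(m)])
    return out
```

Because the key-value summary is built once, the cost grows linearly with the sequence length rather than quadratically, which is the usual motivation for this family of attention mechanisms.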
The second branch downsamples the data, converting the two-dimensional output vector to a size of . After processing with the ReLU linear attention mechanism, the data are fused with the MLP output and fed into the CNN unit. It first passes through a convolutional layer with 32 kernels, followed by a max-pooling layer, resulting in an output vector of size . This is followed by a dropout layer with a default probability of 0.2, then flattened into a vector. The vector is then passed to a hidden layer with 512 nodes using ReLU activation and finally through a hidden layer with four nodes using Softmax activation to produce the output.
The third branch downsamples the data, converting the two-dimensional output vector to a size of . After processing with the ReLU linear attention mechanism, the data are fused with the CNN output and fed into the RNN unit. It passes through three hidden layers with 128, 64, and 64 nodes, with the third layer returning only the last timestep output, resulting in a vector. Each layer is followed by a Tanh activation function. Finally, it passes through a hidden layer with four nodes using Softmax activation to produce the output.
The fourth branch downsamples the data, converting the two-dimensional output vector to a size of . After processing with the ReLU linear attention mechanism, the data are fused with the RNN output and fed into the LSTM unit. It passes through three hidden layers with 64, 32, and 32 nodes, with the third layer returning only the last timestep output, resulting in a vector. Each layer is followed by a Tanh activation function. Finally, it passes through a hidden layer with four nodes using Softmax activation to produce the output.
The outputs of the four branches are finally fused and output through a linear transformation.
5.1.2. Experimental Results
We used datasets comprising the machining G-code for aerospace gear, pentagram-shaped bosses, and maple leaf models. The MCRL model was employed to classify paths within these datasets. The experiments were run on an NVIDIA RTX 4070 GPU.
Table 2 presents the hyperparameter settings for all experiments. This table includes the initial learning rate, learning rate decay rate, decay steps, batch size, training iterations before convergence, and total training time (in minutes).
The hyperparameters were set as follows:
Learning Rate: This parameter was set to either 0.0001 or 0.0005, depending on the dataset. The learning rates for the gear and pentagram datasets were 0.0001, while the rates for the maple leaf datasets were 0.0005.
Decay Rate: The decay rate varied between datasets, ranging from 0.8 to 0.9. The decay rates for the gear datasets were higher at 0.9, indicating a slower reduction in the learning rate over time compared to the 0.8 rate for the pentagram and maple leaf datasets.
Decay Steps: The decay steps were either 100 or 500. The gear and maple leaf datasets used a larger number of decay steps (500), implying a lower frequency of learning rate decay application, while the pentagram datasets employed 100 decay steps.
Batch Size: The batch sizes were set to either 32 or 64. The gear and maple leaf datasets used a batch size of 64, which generally provides a more stable gradient estimate, whereas the pentagram dataset used a smaller batch size of 32.
Iterations: The number of iterations required varied significantly between datasets. The gear dataset required the most iterations, totaling 497, while the pentagram dataset required 203. The maple leaf datasets required 234 and 340 iterations, respectively.
Time: The training times for each dataset also varied. The gear dataset required the longest training time, at 212 min, while the pentagram dataset required 169 min, and the maple leaf dataset required the shortest time of 40 min.
These configurations highlight the tailored approach needed to optimize model performance across different datasets, demonstrating the variability of training parameters essential for achieving effective learning outcomes.
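To make the interaction between the learning rate, decay rate, and decay steps concrete, the schedule can be sketched as an exponential decay of the form lr(step) = lr0 × rate^(step / decay_steps). This is the same form used by TensorFlow's `ExponentialDecay` schedule, but whether the authors used that class, and whether the decay was continuous or staircase, is an assumption. The function name and the `CONFIGS` dictionary are illustrative; the numeric values are those listed above.

```python
def decayed_lr(initial_lr, decay_rate, decay_steps, step, staircase=False):
    """Exponentially decayed learning rate at a given training step.

    With staircase=True the rate drops in discrete jumps every
    `decay_steps` steps; otherwise it decays continuously.
    """
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent


# (initial learning rate, decay rate, decay steps) per dataset, as listed above.
CONFIGS = {
    "gear": (1e-4, 0.9, 500),
    "pentagram": (1e-4, 0.8, 100),
    "maple_leaf": (5e-4, 0.8, 500),
}

for name, (lr0, rate, steps) in CONFIGS.items():
    schedule = [decayed_lr(lr0, rate, steps, s) for s in (0, steps, 2 * steps)]
    print(name, schedule)
```

A higher decay rate combined with more decay steps (the gear configuration) keeps the learning rate close to its initial value for longer, whereas the pentagram configuration shrinks it by 20% every 100 steps.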
2. Performance Metrics
We evaluated the performance of the MCRL model using metrics such as accuracy, loss, precision, recall, F1 score, and AUC.
Accuracy: This metric represents the proportion of correctly predicted samples out of the total number of samples,
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where:
TP (True Positives): The number of samples that are truly positive and predicted as positive.
TN (True Negatives): The number of samples that are truly negative and predicted as negative.
FP (False Positives): The number of samples that are truly negative but predicted as positive.
FN (False Negatives): The number of samples that are truly positive but predicted as negative.
Loss: This metric measures the disparity between the model’s predictions and the actual labels. In this experiment, the multiclass cross-entropy was used as the loss function.
$$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} y_{ij} \log(p_{ij})$$
where:
$N$ is the total number of samples.
$C$ is the total number of classes.
$y_{ij}$ is the true label of the i-th sample, where $y_{ij} = 1$ if the true label of the i-th sample is class j, and 0 otherwise.
$p_{ij}$ is the predicted probability that the i-th sample belongs to class j.
$\log(p_{ij})$ is the natural logarithm of the predicted probability of class j.
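A minimal sketch of the multiclass cross-entropy described above, assuming one-hot true labels and averaging over the N samples. The function name and the `eps` clamp (which avoids taking the logarithm of zero) are illustrative additions rather than details from the experiments.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean multiclass cross-entropy.

    y_true: N x C one-hot labels; y_pred: N x C predicted probabilities.
    """
    total = 0.0
    for label_row, prob_row in zip(y_true, y_pred):
        for y, p in zip(label_row, prob_row):
            if y:  # only the true class contributes for one-hot labels
                total -= y * math.log(max(p, eps))
    return total / len(y_true)


# Two samples, four classes (matching the four-node Softmax output).
labels = [[1, 0, 0, 0], [0, 1, 0, 0]]
probs = [[0.5, 0.2, 0.2, 0.1], [0.25, 0.25, 0.25, 0.25]]
print(cross_entropy(labels, probs))
```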
Precision: This metric indicates the proportion of correctly predicted positive samples out of all samples predicted as positive.
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
High precision signifies that the model accurately predicts positive classes, with few false positives.
Recall: This metric measures the proportion of correctly predicted positive samples out of all actual positive samples.
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
High recall indicates that the model can identify a majority of the actual positive samples.
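F1 score: This metric measures the harmonic mean of precision and recall, providing a comprehensive measure of the model’s performance.
$$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$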
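The count-based metrics above can be computed from the four confusion counts as in the sketch below. The function name and the example counts are hypothetical. For the four-class task, these quantities would typically be computed per class (one class treated as positive, the rest as negative) and then averaged; the averaging scheme is not stated here, so none is assumed.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
print(classification_metrics(tp=90, tn=85, fp=10, fn=15))
```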
AUC: The area under the ROC (Receiver Operating Characteristic) curve quantifies the model’s capability to distinguish between classes.
$$\mathrm{AUC} = \int_{0}^{1} \mathrm{TPR} \, d(\mathrm{FPR})$$
where:
FPR: False Positive Rate
TPR: True Positive Rate
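The AUC can be approximated numerically by sweeping a decision threshold over the predicted scores, recording the (FPR, TPR) point at each step, and integrating with the trapezoidal rule. The sketch below assumes binary labels (1 for positive, 0 for negative) and distinct scores; the function names and the example data are illustrative.

```python
def roc_points(labels, scores):
    """(FPR, TPR) points obtained by lowering the threshold one sample at a time."""
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc_trapezoid(roc_points(labels, scores)))
```

The result equals the fraction of positive-negative pairs in which the positive sample receives the higher score, which is a convenient sanity check for small examples.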
In this study, we compared the MCRL model with three other prominent models: ConvMixer (2022) [29], ConvNeXt (2022) [30], and MaxViT (2022) [31].
Table 3 presents the performance metrics of the various models across different datasets.
In assessing model performance across various datasets, our model (MCRL) exhibited outstanding results in multiple tests. A thorough analysis of the experimental outcomes from the gear, pentagram, and maple leaf datasets provided an in-depth understanding of the performance variations between MCRL and the other models across different datasets. These findings are crucial for evaluating the overall effectiveness of the models and their adaptability to various application contexts. The following section provides a detailed analysis of the performance of the different models on each dataset.
The results of the models on the gear dataset can be summarized as follows:
The MCRL model achieved the best performance on the gear dataset, with an accuracy of 94.75%, a loss of 14.26, a precision of 96.23%, a recall of 93.52%, an F1 score of 94.85%, and an AUC of 97.71%. These results underscore MCRL’s superior predictive performance on this dataset and its ability to effectively balance various metrics.
The ConvMixer and ConvNeXt models also performed relatively well, with accuracies of 92.22% and 93.92%, respectively. While their other metrics were comparable, they still fell short compared to MCRL.
MaxViT slightly lagged behind ConvNeXt in overall performance but maintained high precision and F1 score.
The MLP, CNN, RNN, and LSTM models showed mediocre performance, particularly the MLP and RNN models, which fell below 90% across all metrics, highlighting their limitations on this dataset.
The results of the models on the pentagram dataset can be summarized as follows:
On the pentagram dataset, MCRL again excelled with an accuracy of 94.98%, a loss of 13.55, a precision of 96.47%, a recall of 93.35%, an F1 score of 95.63%, and an AUC of 97.58%.
The ConvMixer and ConvNeXt models performed well on the pentagram dataset, although they slightly lagged behind MCRL in recall and F1 score, with values around 93% and 92%, respectively.
MaxViT’s performance was fairly balanced, but it fell short in accuracy and AUC.
Other models, such as MLP, CNN, and RNN, showed relatively poor performance, with RNN notably underperforming, achieving an accuracy of only 86.34% and significant deficiencies in recall and F1 score.
The results of the models on the maple leaf dataset can be summarized as follows:
On the maple leaf dataset, the MCRL model achieved the highest accuracy of 96.32%, the lowest loss of 9.81, and precision and F1 scores of 96.52% and 96.35%, respectively, with an AUC of 98.66%. These results highlight MCRL’s exceptional performance in path classification tasks.
The ConvMixer and ConvNeXt models followed closely, with accuracies of 95.97% and 95.27%, and F1 scores exceeding 95%, although they still fell slightly short of MCRL.
MaxViT also showed stable performance, but its AUC was somewhat lower at 98.00%.
Traditional models, such as MLP, CNN, and RNN, generally fell short of the advanced models mentioned above, with RNN showing particularly poor performance on this dataset, achieving an accuracy of only 85.63%.
Overall, the MCRL model exhibited exceptional performance across all four datasets, particularly demonstrating the best comprehensive metrics on the maple leaf dataset. Other models, such as ConvMixer, ConvNeXt, and MaxViT, also performed admirably, making them well suited for tasks requiring high accuracy and precision. In contrast, traditional models like MLP, CNN, RNN, and LSTM performed relatively poorly on these datasets, with RNN, in particular, struggling to match the performance of more advanced models in complex path classification tasks.
Accuracy and loss are the two most crucial evaluation metrics, as they provide a clear reflection of the model’s performance and optimization status.
Figure 6 and Figure 7 illustrate the accuracy and loss of various models across different datasets.
Figure 6 compares the performance of various models, including MCRL, ConvMixer, ConvNeXt, MaxViT, MLP, CNN, RNN, and LSTM, across each dataset. The x-axis represents the number of training epochs, ranging from 0 to 500, while the y-axis displays the accuracy, which ranges approximately from 0.72 to 0.96.
Across all datasets, the MCRL model (depicted by the red line) consistently demonstrated superior performance, achieving the highest accuracy early in training and maintaining this advantage throughout subsequent epochs. ConvMixer, ConvNeXt, and MaxViT also performed admirably, although their accuracies were generally slightly lower than that of MCRL. In contrast, traditional models, such as MLP, CNN, RNN, and LSTM (represented by lighter colors), typically exhibited lower accuracy and slower convergence.
This comparative chart highlights MCRL’s exceptional performance across various datasets, illustrating its robustness and efficacy in different scenarios. The convergence patterns indicate that while most models stabilized after around 100 training epochs, MCRL excelled in both convergence speed and final accuracy, particularly on the gear and maple leaf datasets.
Figure 7a presents the loss curves of different models on the gear dataset as the training epochs progressed. It is evident that the MCRL model (red curve) converged more rapidly than the other models, achieving the lowest final loss value. Other models, such as ConvMixer, ConvNeXt, MaxViT, MLP, CNN, RNN, and LSTM, also exhibited a downward trend in loss during early training, but their final loss values were higher than those of MCRL.
In Figure 7b, it is evident that the loss values for all models decreased as the number of training epochs increased. The MCRL model once again demonstrated the fastest convergence rate and the lowest final loss value, with other models trailing behind. Notably, MaxViT and LSTM exhibited a slower rate of loss reduction and ended with higher loss values compared to the other models.
In Figure 7c, it is evident that the MCRL model continued to exhibit the best convergence speed and the lowest final loss value. The performance of other models mirrored the trends observed in the previous datasets: a rapid initial decrease in loss followed by stabilization, but with final loss values higher than those of MCRL.
Overall, the MCRL model consistently demonstrated superior convergence speed and lower final loss values across all datasets, outperforming the other models.
3. Inference Time
Inference time refers to the duration a model requires to process input data and produce output predictions. It is a critical metric that directly impacts resource efficiency in deployment scenarios. Quantization-aware training (QAT) can reduce inference time by simulating the effects of quantization during training, so that the network’s weights and activations can be lowered from 32-bit floating-point numbers to 8-bit integers. The lower precision reduces memory usage, which accelerates data transfer and lowers memory bandwidth demands. Consequently, this experiment employed the QAT strategy to reduce the model’s inference time.
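The core operation that QAT simulates during training is a quantize-then-dequantize round trip, often called fake quantization. The sketch below assumes symmetric per-tensor quantization to signed 8-bit integers; the scheme, the function name, and the example weights are illustrative assumptions, since the quantization configuration used in the experiments is not specified.

```python
def fake_quantize(values, num_bits=8):
    """Round-trip floats through a symmetric signed integer grid.

    Returns (integer codes, dequantized floats, scale). The dequantized
    values are what a QAT forward pass would see in place of the
    original 32-bit weights or activations.
    """
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    dequantized = [c * scale for c in codes]
    return codes, dequantized, scale


# Hypothetical weights, for illustration only.
weights = [-1.0, 0.0, 0.3, 1.0]
codes, approx, scale = fake_quantize(weights)
print(codes, approx, scale)
```

Each stored value then occupies one byte instead of four, and the rounding error per value is bounded by roughly half the scale, which is the precision loss the network learns to tolerate during training.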
Table 4 presents the inference times (in seconds) of the proposed model alongside those of ConvMixer, ConvNeXt, MaxViT, MLP, CNN, RNN, and LSTM.
The observations of the inference time results are summarized as follows:
MCRL: The inference times across various datasets ranged from 3.1 to 3.9 s.
MCRL with QAT: This variant of MCRL utilized quantization-aware training (QAT) technology, which simulates the effects of reduced precision during training, lowering network weights and activations from 32-bit floating-point numbers to a minimum of 8-bit integers. This reduction in precision decreased memory usage and accelerated data transfer, thereby reducing inference time compared to the standard MCRL model. For instance, on the gear dataset, the inference time for MCRL with QAT was 2.8 s, while the standard MCRL took 3.2 s. Similar improvements were observed across all datasets.
ConvMixer (2022), ConvNeXt (2022), MaxViT (2022): These models, introduced in 2022, achieved inference times comparable to MCRL but showed variability across datasets, with ConvMixer and RNN exhibiting slightly higher inference times.
MLP, CNN, RNN, LSTM: These traditional models generally had higher inference times, with MLP showing the highest inference times across all datasets, particularly on the pentagram and maple leaf datasets.
In summary, MCRL with QAT emerged as a strong contender among the evaluated models, optimizing inference time while maintaining performance.
Taking the paths in the gear dataset as an example, four curves were randomly selected based on categories, and their classification accuracy within the network was examined. The green lines represent correct classifications, while the red lines indicate misclassifications, as shown in Table 5.
The MCRL model classified all four selected curves correctly. In contrast, the ConvMixer, ConvNeXt, MaxViT, MLP, CNN, RNN, and LSTM models each misclassified one of the selected curves.
In summary, MCRL combines the strengths of MLP, CNN, RNN, and LSTM, significantly enhancing the overall performance of the model and making it the most effective deep learning network model for the given task.