All experiments were carried out in Python 3.11.2 on a Windows system with an Intel(R) Core(TM) i7-3770 CPU at 3.40 GHz and 16 GB of RAM.
4.1. Performance Evaluation
The experimental results were evaluated using several criteria: precision, sensitivity, specificity, and F-measure. A comparison was made between the proposed method (Ensemble Classifier Algorithm Stacking Process) and three state-of-the-art techniques: Support Vector Machine (SVM), Extreme Learning Machine (ELM), and Convolutional Neural Network (CNN).
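As a rough illustration of such a stacking process, the sketch below assembles a stacked ensemble with scikit-learn. The base learners (an SVM and a random forest) and the logistic-regression meta-learner are illustrative assumptions; they are not necessarily the exact components of the proposed method.

```python
# Illustrative stacking-ensemble sketch: SVM and random-forest base learners
# whose out-of-fold predictions are combined by a logistic-regression
# meta-learner. The specific components are assumptions for illustration.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def build_stacking_classifier():
    base_learners = [
        ("svm", SVC(kernel="rbf", probability=True)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ]
    # The meta-learner is trained on the base learners' cross-validated predictions.
    return StackingClassifier(estimators=base_learners,
                              final_estimator=LogisticRegression(max_iter=1000),
                              cv=5)

# Usage: model = build_stacking_classifier().fit(X_train, y_train)
#        accuracy = model.score(X_test, y_test)
```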
Table 1 presents the statistically significant characteristics identified for a network connection between two users. The false negative rate, accuracy (Acc), true positive rate (TPR), and F-measure were the performance metrics considered for the supervision applications. Botnets such as Neris, Rbot, Virut, Menti, and Sogou were considered for this purpose. These quantities are defined as follows:
TP indicates a correct prediction of a botnet attack.
TN indicates a correct prediction of normal data.
FP indicates normal data incorrectly classified as a botnet attack.
FN indicates a botnet attack incorrectly classified as normal data.
Accuracy (Acc): the ratio of all correctly classified samples (both botnet and normal) to the total number of samples, i.e., Acc = (TP + TN) / (TP + TN + FP + FN).
Sensitivity (Sen), which is determined according to Equation (18), is the proportion of positive (botnet) cases that are correctly identified: Sen = TP / (TP + FN).
Specificity (Spec) is the proportion of negative (normal) cases that are correctly identified, and it is calculated using Equation (19): Spec = TN / (TN + FP).
The F-measure in Equation (20) merges precision and recall into a single metric: F-measure = 2 × (Precision × Recall) / (Precision + Recall), where Precision = TP / (TP + FP) and Recall equals the sensitivity.
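For concreteness, the following minimal sketch computes these four metrics from binary predictions (1 = botnet, 0 = normal). It simply restates the definitions above and is not the evaluation code used in the experiments.

```python
# Minimal metric computation from binary labels/predictions (1 = botnet, 0 = normal).
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def evaluate(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)          # true positive rate (recall), Eq. (18)
    specificity = tn / (tn + fp)          # true negative rate, Eq. (19)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (20)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f_measure": f_measure}

# Example: evaluate([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```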
The accuracy comparison of the existing CNN, ELM, and SVM algorithms with the PROPOSED algorithm is shown in Table 2, and the same comparison is illustrated in Figure 4, where the number of datasets analyzed is displayed on the X-axis and the corresponding accuracy percentages on the Y-axis. The PROPOSED method achieved an accuracy of 94.08%, 3.68% better than the existing ELM, CNN, and SVM methods, which achieved 91.6%, 92.56%, and 92.56%, respectively (Table 2).
Table 3 compares the sensitivity achieved by the proposed approach with existing methods such as CNN, ELM, and SVM.
Moreover, Figure 6 visually represents the sensitivity comparison between the PROPOSED approach and the CNN, ELM, and SVM methods, with the number of datasets analyzed on the X-axis and the sensitivity percentages on the Y-axis. The PROPOSED approach scored a sensitivity of 86.5%, outperforming ELM by 3.44%, CNN by 3.3%, and SVM by 1.44%. In contrast, Table 4 shows the specificity comparison between the PROPOSED and existing algorithms, where the CNN, ELM, and SVM techniques achieved 83.14%, 83.8%, and 85.14%, respectively.
The existing CNN, ELM, and SVM approaches are compared with the newly proposed method using the F-measure, and the results are shown in Table 5. Figure 7 displays the F-measure percentages on the Y-axis and the analyzed datasets on the X-axis. The PROPOSED method achieved the highest F-measure of 86.6%, outperforming ELM by 3.18%, CNN by 3.02%, and SVM by 1.86%, whereas the existing CNN, ELM, and SVM methods scored 83.42%, 83.58%, and 84.74%, respectively.
Figure 8 compares the F-measure percentages obtained by the newly proposed method and the existing CNN, ELM, and SVM methods, where the X-axis shows the number of datasets analyzed and the Y-axis shows the F-measure percentages. The PROPOSED method again outperforms ELM, CNN, and SVM, achieving an F-measure of 78.24%, which is 3% higher than ELM, 2.4% higher than CNN, and 1.02% higher than SVM; the existing CNN, ELM, and SVM methods attain F-measure percentages of 75.34%, 76.54%, and 77.32%, respectively. Additionally,
Table 6 compares various parameters between the proposed method and the existing ELM, CNN, and SVM algorithms. The use of mathematical techniques in optimization is a practical approach that is widely applied across many fields of engineering to handle diverse challenges. Developing intelligent optimization techniques suitable for real-world engineering problems has long been an essential area of research because of the complexity and nonlinearity of these problems. Swarm intelligence is a class of heuristic search algorithms derived from genetic studies or from observing the social behavior of small creatures such as ants and bees. In swarm intelligence algorithms, individuals are represented by particles that move and interact with each other according to a set of rules. This approach is particularly useful for solving nonconvex, nonlinear, or non-differentiable optimization problems where the organizational structure of the method is not critical. Particle swarm optimization and genetic algorithms are two commonly used examples of such methods.
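As a concrete example of the swarm-intelligence idea outlined above, the following sketch implements a basic particle swarm optimization loop for a generic continuous objective. It is a textbook-style illustration under assumed parameter values (inertia and acceleration coefficients), not the optimizer used in the proposed system.

```python
import random

def pso(objective, dim, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Minimize `objective` over [-5, 5]^dim with a basic particle swarm loop."""
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best so far

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Inertia plus attraction towards personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Example: minimize the sphere function in three dimensions.
# best_x, best_f = pso(lambda x: sum(v * v for v in x), dim=3)
```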
The process of adjusting hyperparameters in the PROPOSED model is challenging but significant. It involves selecting the appropriate activation and optimization functions and determining the optimal model structure. In addition, diagnostic procedures for detecting overfitting and underfitting must be in place, and early stopping functionality and the batch/epoch configuration should be established during the learning process. Underfitting occurs when a neural network has not been trained for a sufficient amount of time, or the training set is not large enough to establish the proper relationship between input and output variables, while overfitting happens when the model is fitted too tightly to a small number of data points, rendering it applicable only to the original dataset. To prevent overfitting, the dropout regularization method randomly drops some of the neurons' connections while training the model. This study utilized the Optuna open-source platform to optimize the PROPOSED model's performance and produce more accurate predictions for Android malware. Various PROPOSED model structures were designed by Optuna, and their hyperparameters were adjusted to assess their prediction performance. The PROPOSED model was trained on the dataset for 2000 epochs with internal parameters updated after every 25 records, and Optuna was set to perform PROPOSED model optimization and evaluation 50 times. The impact of each hyperparameter on the PROPOSED system's objective value (prediction accuracy) is depicted in Figure 5.
It is observed that the dropout factor at the input nodes has the most significant impact (44%) on the model's best objective value (prediction accuracy). The learning rate (lr) parameter has a 21% effect on the model's best prediction. The number of units at the input nodes and the total number of layers contribute 20% and 15%, respectively, to the model's best value.
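A minimal Optuna sketch consistent with the setup described above is given below: 50 trials over the dropout rate, learning rate, number of units, and number of layers, with prediction accuracy as the objective. The synthetic data, search ranges, network details, and early-stopping settings are assumptions for illustration only; only the hyperparameter names, the 2000-epoch/25-record training regime, and the trial count come from the text.

```python
import numpy as np
import optuna
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic stand-in data; in the paper the extracted malware feature set is used.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(800, 20)), rng.integers(0, 2, 800)
X_val, y_val = rng.normal(size=(200, 20)), rng.integers(0, 2, 200)

def objective(trial):
    # Hyperparameters named in the text; the search ranges are assumptions.
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    units = trial.suggest_int("units", 16, 256)
    n_layers = trial.suggest_int("n_layers", 1, 4)

    model = keras.Sequential([layers.Input(shape=(X_train.shape[1],)),
                              layers.Dropout(dropout)])
    for _ in range(n_layers):
        model.add(layers.Dense(units, activation="softsign"))
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer=keras.optimizers.Adamax(learning_rate=lr),
                  loss="binary_crossentropy", metrics=["accuracy"])

    # Training regime from the text: 2000 epochs, parameter updates every 25 records.
    model.fit(X_train, y_train, epochs=2000, batch_size=25,
              validation_data=(X_val, y_val), verbose=0,
              callbacks=[keras.callbacks.EarlyStopping(patience=20,
                                                       restore_best_weights=True)])
    return model.evaluate(X_val, y_val, verbose=0)[1]  # validation accuracy

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)  # 50 optimization trials, as stated above
print(study.best_params)
print(optuna.importance.get_param_importances(study))  # cf. Figure 5
```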
Figure 9 also provides a useful plot demonstrating the relationship between the different hyperparameter values and the best forecast result. This chart clarifies how the value of each hyperparameter is related to the PROPOSED model’s optimal value.
Figure 10 displays the empirical distributions of the PROPOSED model. According to this performance analysis, the PROPOSED model outperformed the shallow classifiers assessed in Section 4.2 when predicting Android malware.
Figure 11 presents the remaining performance measures for the PROPOSED system.
Figure 12a shows precision curves for learning and forecasting phases over the various epochs, whereas
Figure 12b shows the cross-entropy loss. Finally,
Figure 12c provides the model’s confusion matrix. According to the evaluation results described above, the PROPOSED system with the best prediction results has a four-layer architecture, including two hidden layers.
Figure 13 depicts the model’s abstract architectural perspective, and
Table 7 summarizes each layer's parameters. The optimized model uses the Adamax optimizer, binary cross-entropy loss, and the Softsign activation function.
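A sketch of how such a four-layer model (two hidden layers) could be assembled with the stated Adamax optimizer, binary cross-entropy loss, and Softsign activation is given below. The layer widths, dropout rate, and sigmoid output layer are placeholder assumptions, since Table 7's exact parameters are not reproduced here.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_optimized_model(n_features, units=64, dropout=0.2):
    """Four layers in total (input, two hidden, output); `units` and `dropout`
    are placeholder values, not the exact parameters of Table 7."""
    model = keras.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dropout(dropout),                      # dropout at the input nodes
        layers.Dense(units, activation="softsign"),   # hidden layer 1
        layers.Dense(units, activation="softsign"),   # hidden layer 2
        layers.Dense(1, activation="sigmoid"),        # output layer (assumed)
    ])
    model.compile(optimizer=keras.optimizers.Adamax(),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage sketch (epoch and batch values as described earlier; data loading omitted):
# model = build_optimized_model(n_features=X_train.shape[1])
# model.fit(X_train, y_train, epochs=2000, batch_size=25, validation_split=0.2,
#           callbacks=[keras.callbacks.EarlyStopping(patience=20)])
```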
Figure 14 displays the classifier's optimization history over the epochs. In particular, the model's optimized prediction accuracy increased by 2% over the XGBoost model, reaching 86%.
The proposed method consistently demonstrates superior performance, particularly in terms of accuracy, compared to ELMs, CNNs, and SVMs across various numbers of packets. With accuracy values ranging from 92.3% to 97.3%, the proposed method outperforms ELMs by 2 to 4 percentage points, CNNs by 0.4 to 4 percentage points, and SVMs by 0.3 to 2.9 percentage points. These results highlight the effectiveness of the proposed method in achieving higher accuracy rates, making it a promising approach for packet analysis tasks.
For 250 packets:
Proposed accuracy: 97.3%
ELMs accuracy: 95%
CNNs accuracy: 95.6%
SVMs accuracy: 97%
In this case, the proposed method outperforms ELMs by 2.3 percentage points, CNNs by 1.7 percentage points, and SVMs by 0.3 percentage points in terms of accuracy.
4.3. Discussion
The experimental results of the proposed Ensemble Classifier Algorithm Stacking Process were evaluated using various criteria such as precision, sensitivity, specificity, and F-measure. A comparison was made between the proposed method and three state-of-the-art techniques: Support Vector Machine (SVM), Extreme Learning Machine (ELM), and Convolutional Neural Network (CNN). The evaluation focused on the performance metrics relevant to botnet attack prediction, including the false negative rate, accuracy (Acc), true positive rate (TPR), and F-measure. The accuracy comparison of the four algorithms is presented in
Table 2. The proposed method achieved an accuracy score of 94.08%, outperforming the existing ELM, CNN, and SVM methods that achieved 91.6%, 92.56%, and 92.56%, respectively. This indicates the effectiveness of the proposed method in accurately identifying and categorizing botnet attacks.
Furthermore, the sensitivity evaluation in
Table 3 demonstrates that the proposed approach achieved a sensitivity of 86.5%, surpassing the ELM by 3.44%, CNN by 3.3%, and SVM by 1.44%. The comparison in
Table 4 shows that the proposed method achieved a specificity of 82.4%, while CNN, ELM, and SVM achieved 83.14%, 83.8%, and 85.14%, respectively. To assess the overall performance, the F-measure was used as a combined metric of precision and recall. The proposed method achieved the highest F-measure of 86.6%, surpassing ELM by 3.18%, CNN by 3.02%, and SVM by 1.86%, while the existing CNN, ELM, and SVM methods achieved F-measures of 83.42%, 83.58%, and 84.74%, respectively. These results highlight the superiority of the proposed method in accurately predicting and categorizing botnet attacks. The optimization of hyperparameters in the proposed method was crucial for achieving the best prediction accuracy. Factors such as the number of layers, hidden layers, learning rate, dropout, optimizers, activation functions, and loss functions were considered. The Optuna open-source platform was utilized to optimize the performance of the proposed method. Through this optimization, the proposed model architecture with four layers, including two hidden layers, achieved the best prediction results. The Adamax optimizer, binary cross-entropy loss, and the Softsign activation function were identified as the optimal choices for the model.
Furthermore, the importance of the input features was analyzed using the SHAP (Shapley Additive Explanations) framework. The analysis revealed the significant impact of certain features on the system's classification output; one feature in particular had the most influence, with low values increasing the probability of an application being classified as malware. The Intent.RECEIVE and class.DEFAULT features also exhibited notable influence on the system's output. Finally, the experimental results and evaluations demonstrate the effectiveness of the proposed Ensemble Classifier Algorithm Stacking Process in accurately predicting and categorizing botnet attacks. The proposed method outperformed the existing state-of-the-art techniques in terms of accuracy, sensitivity, specificity, and F-measure. The optimization of hyperparameters and the analysis of input feature importance further enhanced the performance and interpretability of the proposed method. These findings highlight the potential of the proposed method for effective botnet detection and classification in real-world applications. Future research can focus on expanding the evaluation to larger datasets and exploring the generalizability of the proposed method to other types of cybersecurity threats.
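As a hedged illustration of the SHAP analysis mentioned in the discussion above, the sketch below computes and plots feature attributions with the shap library. The trained classifier and test feature matrix are placeholders, and only the feature names Intent.RECEIVE and class.DEFAULT come from the text.

```python
import shap

# `model` is a placeholder for the trained classifier and `X_test` for the test
# feature matrix (one column per feature, e.g. Intent.RECEIVE, class.DEFAULT).
# Explain the probability of the malware class with a model-agnostic explainer.
explainer = shap.Explainer(lambda X: model.predict_proba(X)[:, 1], X_test)
shap_values = explainer(X_test)

# Beeswarm plot: ranks features by their impact on the classification output and
# shows whether low or high feature values push predictions towards malware.
shap.plots.beeswarm(shap_values)
```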