1. Introduction
Gradient-based optimization methods have been widely applied to linear, differentiable, and continuous problems [1]. However, practical problems are increasingly nonlinear, non-differentiable, and discontinuous, rendering gradient-based methods ineffective. In contrast, intelligent algorithms, which have developed rapidly in recent years, can effectively solve such practical problems, although they lack rigorous mathematical derivations.
In the twenty-first century, advances in technology have also driven further research on intelligent algorithms, and a series of new metaheuristic algorithms have been derived. For example, Kaveh et al. [2] proposed an efficient hybrid method based on the Harris Hawk Optimizer (HHO) and the imperialist competitive algorithm; HHO has an efficient exploitation strategy but performs poorly in the search for optimal solutions, which the imperialist competitive algorithm compensates for. Song et al. [3] identified the deficiency in the global search capability of HHO and proposed a persistent-trigonometric-differences mechanism to improve it; in addition, they improved the energy factor of the original algorithm to better balance exploration and exploitation, and they applied the result to parameter extraction for photovoltaic models. Zhong et al. [4] proposed an integrated-learning Harris Hawk optimization algorithm with a terminal replacement mechanism, combining a comprehensive learning strategy with HHO to improve the convergence of the optimizer.
Feature selection (FS) is an important data-preprocessing method in machine learning, but it is NP-hard [5], with a search space of $2^n$ for $n$ features, which motivates the use of approximation algorithms to obtain near-optimal solutions as well as metaheuristic algorithms [6].
Inbarani et al. [7] proposed a hybrid FS method for disease detection by combining a particle-swarm-based wrapper method with rough set theory. Zhang et al. [8] introduced variational operators to improve the differential evolution algorithm and applied it to FS. Tubishat et al. [9] applied opposition-based learning and local search to the salp swarm algorithm (SSA) for FS. Taradeh et al. [10] improved the gravitational search algorithm (GSA) with a genetic algorithm (GA) for more effective FS. Mafarja et al. [11] used tournament and roulette-wheel selection to enhance the exploration and exploitation of the whale optimization algorithm (WOA). Unfortunately, most of these algorithms cannot achieve an effective balance between exploration and exploitation, which means they cannot achieve a good trade-off between efficiency (search time) and outcome (feature subset length) on the FS problem. This paper applies the improved HHO to the FS problem because the optimization mechanism of HHO closely matches the FS process: each Harris hawk can be regarded as a data sample containing much redundant information, the number of data features corresponds to the dimension of the hawks' hunting space, and selecting data features is equivalent to the hawks' hunting process. The improved algorithm can achieve a better balance between exploration and exploitation. Next, we detail the principle of HHO, related applications, and an overview of our proposed improvement strategy.
The Harris hawk is a raptor living in southern Arizona in the United States. Heidari et al. [12] proposed HHO in 2019 after observing the cooperative hunting behavior of Harris hawks. The algorithm has exploration, transition, and exploitation phases. Due to its simple principle, few hyperparameters, and fast convergence, HHO is widely used in global optimization, machine learning, medical bioinformatics, natural language processing, and image encryption. Ridha et al. [13] combined HHO with a flower pollination algorithm to reliably and accurately identify photovoltaic model parameters. Jia et al. [14] introduced dynamic optimization and a variational mechanism to improve HHO and applied it to satellite image segmentation. Kamboj et al. [15] proposed hHHO-SCA, which combines HHO with the sine cosine algorithm (SCA) for nonconvex and nonlinear problems. Yanan et al. [16] embedded SSA in HHO to improve its search capability and applied it to FS. Chiwen et al. [17] non-linearized the escape energy factor through chaotic perturbation to balance the exploitation and exploration of HHO. Bui et al. [18] combined HHO with artificial neural networks to optimize their performance for landslide susceptibility analysis. Roy et al. [19] applied HHO to model order reduction to obtain more effective and robust reduced-order models.
These applications demonstrate the wide use of HHO to solve practical problems. However, as with traditional intelligent algorithms, HHO has some drawbacks: each phase of its search process is too homogeneous. HHO mainly relies on the escape energy of the prey, whose range is [−2, 2], to control the hunting process. When the absolute value of the escape energy is greater than or equal to 1, HHO performs the exploration phase, and when it is less than 1, HHO selects one of four exploitation strategies. This means that in the early stage the algorithm only explores and in the late stage it only exploits, which easily leads to premature convergence. Therefore, HHO can be improved through its update mechanism, the addition of new operators, or combination with other algorithms. Improvements to the update mechanism mainly aim to balance exploration and exploitation [20,21,22], for example through dynamic adjustment of the escape energy factor. Adding new operators can enhance local or global search capability [23,24]. Researchers have also tried to combine HHO with other intelligent algorithms [25,26,27].
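To make this switching rule concrete, the following minimal Python sketch shows how the escape energy typically gates the two phases; the linear decay formula follows the original HHO paper [12], while the function and variable names are our own illustrative choices.

```python
import numpy as np

def hho_phase(t, T, rng=np.random.default_rng()):
    """Illustrative sketch of HHO's escape-energy switch (not the full algorithm).

    E0 is drawn from [-1, 1] and decays linearly over the run, so the
    escape energy E stays within [-2, 2], as described above.
    """
    E0 = rng.uniform(-1.0, 1.0)           # initial escape energy of the prey
    E = 2.0 * E0 * (1.0 - t / T)          # linearly decaying escape energy
    if abs(E) >= 1.0:
        return "exploration"              # early iterations: global search
    return "exploitation"                 # |E| < 1: one of four besiege strategies

# Early iterations mostly explore, late iterations only exploit:
print([hho_phase(t, T=100) for t in (1, 25, 50, 75, 100)])
```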
We propose an efficient improved greedy HHO (IGHHO), starting from the update mechanism and the incorporation of new operators, and replacing the exploitation strategy of HHO with a hybrid differential and greedy strategy. A new conversion strategy lets the algorithm switch more flexibly between exploitation and search: the current particle is compared with the previous global optimum, and if it is better, the algorithm next executes the exploitation strategy; otherwise, it executes the search strategy (a minimal sketch of this rule is given after the contribution list below). The performance of IGHHO was tested on the CEC2017 test set [28], which contains 29 benchmark functions. The ranking method and Wilcoxon rank-sum test results indicate significant improvement, and results on the FS problem show that the algorithm has advantages in practical applications. The major contributions of this study are as follows:
A novel, efficient IGHHO is proposed for global optimization and feature selection.
The proposed IGHHO switches flexibly between exploration and exploitation and has strong exploitation capability.
The performance of IGHHO is better than that of other state-of-the-art optimization techniques.
IGHHO is applied to feature selection, and its performance is verified on open datasets.
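As referenced above, the following is a minimal sketch of the proposed conversion strategy, assuming minimization; the function name and interface are ours, and the concrete search and exploitation operators are those detailed in Section 3.

```python
def next_phase(current_fitness: float, global_best_fitness: float) -> str:
    """Sketch of IGHHO's conversion strategy for one hawk (minimization assumed).

    A hawk that improves on the previous global optimum exploits the
    promising region next; otherwise it returns to searching.
    """
    if current_fitness < global_best_fitness:  # better than the previous global optimum
        return "exploitation"
    return "exploration"
```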
The remainder of this article is organized as follows. Section 2 describes the principle of HHO and its model, Section 3 discusses the details of IGHHO and its motivation, Section 4 describes the benchmark function experiments, Section 5 applies the method to FS, and Section 6 presents our conclusions and future research directions.
5. Application to FS
FS is an integral part of improving classification performance: removing irrelevant and redundant features also speeds up computation [47]. Wrapper methods based on swarm intelligence algorithms are widely used due to their simplicity and ease of implementation. Such a method treats the model as a black box [48], evaluates the feature subset using classifiers or other learning models, and continuously improves its quality. On this basis, we apply IGHHO to the FS problem.
5.1. Model Description
We evaluate candidate feature subsets with a K-Nearest Neighbor (KNN) classifier. Considering the impact of the data imbalance problem on feature selection [49,50], we designed the objective function by weighting the second-order (squared) classification error rate and the length of the feature subset:

$$\mathrm{Fitness} = \lambda \, E + (1 - \lambda) \, \frac{|S|}{|D|},$$

where $|S|$ is the length of the selected feature subset, $|D|$ is the total number of features in the dataset, $\lambda$ is a factor that balances the classification error rate against the length of the feature subset, and $E$ is the class-wise squared error rate

$$E = \frac{1}{n} \sum_{k=1}^{n} \left( 1 - \frac{c_k}{m_k} \right)^2,$$

where $n$ is the number of problem classes, $c_k$ is the number of correctly classified instances in class $k$, and $m_k$ is the number of all instances in class $k$. To better classify classes with few instances, we use the square of the classification error rate to penalize poorly performing classes.
The reason for this design is that some classes in the dataset have very few instances while others have very many. For example, in a binary classification problem with 10 instances, class A has only one instance while class B has nine. The classifier can easily achieve 90% classification accuracy by simply assigning all instances to class B, which looks efficient but would perform poorly on real-world problems. Furthermore, considering only the classification error rate would cause the selected feature subset to contain more redundant features, greatly increasing the algorithm's computational complexity, especially for high-dimensional problems. Therefore, we include the size of the feature subset in the objective function, so as to minimize the ratio of the number of selected features to that of all features.
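As an illustration, the following minimal Python sketch evaluates this objective for a given feature mask. The symbols follow the formulas above, while the weight value, helper names, and the use of scikit-learn's KNN are our assumptions rather than the paper's implementation.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def class_balanced_squared_error(y_true, y_pred):
    """Mean of squared per-class error rates (the E term above)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    squared_errors = []
    for k in np.unique(y_true):
        in_class = (y_true == k)
        error_k = 1.0 - np.mean(y_pred[in_class] == y_true[in_class])  # error rate in class k
        squared_errors.append(error_k ** 2)        # squared, to penalize poorly classified classes
    return float(np.mean(squared_errors))

def fitness(feature_mask, X_train, y_train, X_val, y_val, lam=0.99, n_neighbors=5):
    """Weighted objective lam * E + (1 - lam) * |S| / |D| (smaller is better); lam is an assumed value."""
    feature_mask = np.asarray(feature_mask)
    selected = np.flatnonzero(feature_mask)
    if selected.size == 0:                          # an empty subset is invalid
        return np.inf
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    knn.fit(X_train[:, selected], y_train)
    error = class_balanced_squared_error(y_val, knn.predict(X_val[:, selected]))
    return lam * error + (1.0 - lam) * selected.size / feature_mask.size
```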
5.2. Dataset Descriptions
To verify the performance of IGHHO in FS, we tested it on 13 publicly available datasets from the UCI Machine Learning Repository [51]. As shown in Table 6, these datasets cover low, medium, and high dimensionality; the high-dimensional datasets have few samples and are highly imbalanced, which makes FS more difficult.
5.3. Experimental Configuration and Parameter Settings
Since some datasets have few samples, we used five-fold cross-validation, dividing each dataset into five parts and taking four for training and one for testing. Only the training set was used for FS, and the test set was input to the KNN model to evaluate the FS performance.
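A minimal sketch of this protocol follows; the stratified split is our choice here, and `run_fs` is a placeholder for the IGHHO-based feature-selection step rather than the paper's code.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

def cross_validated_fs_accuracy(X, y, run_fs, n_neighbors=5):
    """Five-fold protocol: feature selection sees only the training folds,
    and a KNN scores the held-out fold using the selected features."""
    accuracies = []
    for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True).split(X, y):
        mask = run_fs(X[train_idx], y[train_idx])   # FS runs on training data only
        knn = KNeighborsClassifier(n_neighbors=n_neighbors)
        knn.fit(X[train_idx][:, mask], y[train_idx])
        accuracies.append(knn.score(X[test_idx][:, mask], y[test_idx]))
    return float(np.mean(accuracies))
```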
We compared our FS method to wrapper methods based on advanced metaheuristics: HHO [12], EPO [52], SSA [6], SMA [44], BOA [45], WOA [11], and ALO [46]. For a fair comparison, all algorithms used the same configuration: a population size of 10, a maximum of 50 iterations, 20 runs averaged for each result, and the KNN parameter set to 5.
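The shared settings can be summarized in the following sketch (a plain configuration dictionary; the key names are ours).

```python
# Common configuration applied to IGHHO and every comparison algorithm (Section 5.3).
CONFIG = {
    "population_size": 10,    # search agents per algorithm
    "max_iterations": 50,     # iteration budget per run
    "independent_runs": 20,   # results are averaged over these runs
    "knn_neighbors": 5,       # k for the wrapper's KNN evaluator
    "cv_folds": 5,            # five-fold cross-validation
}
```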
5.4. Experimental Results and Analysis
Experimental results comparing IGHHO with the other algorithms on the FS problem are presented in Table 7. In terms of overall classification accuracy, the proposed IGHHO ranks in the top three on all datasets; ranks first on Zoo, Waveform_noise, Lung, Sonar, Isolet, Leukemia, Arcene, and Colon; and even achieves 100% classification accuracy on Leukemia. In contrast, EPO ranked first on only three datasets, and its classification accuracy there was only 0.57% better than IGHHO's. SMA ranked first on only two datasets, Clean1 and Colon; notably, on Clean1, IGHHO's classification accuracy was only 0.1% below first place, while on Colon, IGHHO tied with SMA for first place. ALO ranked first only on CNAE, where IGHHO was only 1.57% behind. HHO, SSA, BOA, and WOA did not achieve first place on any dataset. In terms of average rank, our improved algorithm ranks 1.58, clearly ahead of the second place at 2.88 and the third place at 3.69. All of this indicates that our improved algorithm dominates the comparison algorithms.
On the other hand, Figure 6 gives a box plot analysis of the classification accuracy of IGHHO and the comparison algorithms. The average classification accuracy of IGHHO is nearly 90%, much higher than that of the comparison algorithms, and its best and worst classification accuracies are also superior. Combined with the above analysis, the IGHHO algorithm proposed in this paper has advantages over the traditional optimization algorithms in improving the classification accuracy of feature selection.
As shown in Figure 7, Figure 8, Figure 9 and Figure 10, which compare the average size of the feature subsets selected by IGHHO and the comparison algorithms, IGHHO achieves the shortest feature subset length on six datasets, EPO on five, and SMA on two. Looking at the data in detail, IGHHO achieved a feature subset nearly 32% shorter than the second place on Wine and 43% shorter overall than the other comparison algorithms; nearly 43% shorter than the second place on Zoo and 45% shorter overall; nearly 65% shorter than the second place on Lung and 75% shorter overall; nearly 82% shorter than the second place on Colon and 94% shorter overall; and nearly 88% shorter than the second place on Leukemia and 92% shorter overall. On Waveform_noise, IGHHO did not achieve an outstanding feature subset length; on Sonar, IGHHO achieved about the same result as EPO, while both had a significant advantage over the other comparison algorithms (a reduction of nearly 45%); on Hill Valley, IGHHO achieved second place; on Clean1, IGHHO achieved second place with an average reduction of 61% compared to the 4th, 5th, 6th, 7th, and 8th places; and on Madelon, IGHHO ranks third with a large gap to first place, but it still performs well compared to the other comparison algorithms (a 34% reduction in feature subset length). Finally, on Arcene, a dataset with a very large number of features, IGHHO ranks second with a significant advantage (a reduction of almost 86%). Overall, the IGHHO algorithm proposed in this paper has advantages over the comparison algorithms in terms of the length of the selected feature subset.
Table 8 compares the average computation time of IGHHO and the other algorithms on the FS problem, from which it is clear that SMA is the fastest, while IGHHO shows a slight overall advantage. Comparing IGHHO with HHO, IGHHO is almost twice as fast as HHO on the medium-dimensional datasets Clean1, Madelon, Isolet, and CNAE and ranks in the top three, a competitive advantage over the other comparison algorithms; it is about three times faster than HHO on the high-dimensional datasets Colon, Leukemia, and Arcene and ranks first (second on Colon), a significant advantage over the other comparison algorithms. IGHHO's good runtime performance is not surprising, because we introduced a differential perturbation and greedy strategy in the exploitation stage of the algorithm, which allows it to explore unknown regions while retaining high-intensity exploitation capability in late iterations. This greatly improves the efficiency of the algorithm, and the advantage is especially prominent when dealing with high-dimensional problems. In summary, the improvement of the HHO algorithm is successful.