1. Introduction
Choosing a meta-heuristic to solve a particular optimization problem is a non-trivial task. Without detailed prior information on which algorithm to use, several algorithms must be evaluated in order to find out which is more suitable for a given problem. Moreover, due to the non-deterministic nature of these algorithms, this process needs to be repeated several times. Hyper-heuristics, which are heuristics to choose heuristics, have been proposed as a means to both simplify and improve algorithm selection or configuration for optimization problems [1,2]. The idea is, through automation of the heuristic search, to provide effective and reusable cross-domain search methodologies that are applicable to problems with different characteristics from various domains without requiring much expert involvement [3].
Hyper-heuristics (HHs) employ learning methods by using feedback from the search process. Based on the source of this feedback, HHs can be classified as online or offline. A hyper-heuristic employs online learning if learning takes place while the algorithm is solving an instance of a problem. It is offline if the knowledge is gathered, in the form of rules or programs, from a set of training instances that hopefully generalize to solving unseen instances [1,2].
Hyper-heuristics can also be classified as selection or generation methodologies [4,5]. Selection hyper-heuristics, at the high level, control and mix low-level (meta)heuristics (LLHs), automatically deciding which one(s) to apply to the candidate solution(s) at each decision point of the iterative search process [4]. On the other hand, generation methodologies produce new heuristics, or heuristic parts, using pre-defined components.
Much of the previous research focuses on online selection hyper-heuristics. The interest in such algorithms has been growing in recent years, but the majority of research in this area has been limited to single-objective optimization [6].
The majority of the research in the HH literature focuses on treating operators, such as crossover, mutation, and differential evolution operators, as LLHs [2,6]. In [7], Cowling et al. proposed the Choice Function, an equation that ranks heuristics based on their performance, in this case measured by the fitness function of a single-objective problem, and their computational time.
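In its commonly cited general form (the notation below is ours and slightly simplified), the Choice Function scores each low-level heuristic $h_j$ as
$F(h_j) = \alpha f_1(h_j) + \beta f_2(h_k, h_j) + \delta f_3(h_j)$,
where $f_1$ reflects the recent individual performance of $h_j$, $f_2$ reflects its performance when applied immediately after the previously selected heuristic $h_k$, $f_3$ is the elapsed time since $h_j$ was last called, and $\alpha$, $\beta$, and $\delta$ are weights; the heuristic with the highest score is applied next.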
In [8], Li et al. introduced FRRMAB, a variation of the original Multi-Armed Bandit (MAB) [9] in which the Fitness-Rate-Rank (FRR) was proposed as the reward assignment. In this hyper-heuristic, the set of LLHs to be chosen from consists of Differential Evolution (DE) operators. Also selecting among DE operators, Gonçalves et al. [10] proposed a hyper-heuristic based on the Choice Function to control a set of five DE operators. Results showed this hyper-heuristic outperforming the standard MOEA (using a single operator) when solving ten unconstrained benchmark functions with two and three objectives. In [11], the authors applied a similar approach using several versions of MAB instead of a Choice Function; in this case, the CEC 2009 benchmark [12] was employed for performance evaluation. In [13], Almeida et al. evaluated three different versions of MAB by applying them to the permutation flow shop problem.
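For illustration, the following is a minimal sketch of a UCB-style bandit selector in the spirit of FRRMAB; the credit values (FRR), the usage counts, and the scaling constant C are placeholder assumptions, and the sliding-window, decay-based credit assignment of the original method [8] is omitted.

```python
import math

def select_operator(frr, counts, c=5.0):
    """Pick the operator maximizing a UCB-style score: the FRR credit
    (exploitation) plus an exploration bonus for rarely used operators."""
    # Any operator not yet tried is selected first.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    scores = [frr[i] + c * math.sqrt(2.0 * math.log(total) / counts[i])
              for i in range(len(frr))]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical usage: four DE operators with assumed credits and usage counts.
print(select_operator(frr=[0.42, 0.31, 0.18, 0.09], counts=[12, 9, 7, 5]))
```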
In [14], Guizzo et al. designed a hyper-heuristic to solve the Class Integration Test Order Problem [15], a software engineering problem in which the nodes of a graph, representing the classes to be tested, have to be visited. This hyper-heuristic was built using a Choice Function [6] and a Multi-Armed Bandit [9] to select an LLH from a set of nine to operate together with a fixed MOEA, in this case the NSGA-II algorithm. This set was built by combining different crossover and mutation operators. The evaluation of LLHs was performed based on the dominance relationship between parent solutions and their offspring. In [16], this approach was tested considering SPEA2 as the fixed MOEA. In [17], de Carvalho tackled the same problem by creating a hyper-heuristic based on FRRMAB [8] and considering the same set of LLHs. Among all these versions, the Choice Function applied together with NSGA-II (the version in [14]) obtained the best results.
Only a few studies focus on online learning hyper-heuristics selecting/mixing multi-objective evolutionary algorithms (MOEAs) for solving multi-objective optimization problems. Among them, Maashi et al. [6] proposed the Choice Function Hyper-Heuristic (HHCF), an interesting approach that employs different quality indicators to evaluate a set of LLHs. In this approach, each LLH executes for a number of generations, and the resulting population is then evaluated according to m different quality indicators, generating a table of n × m values, where n is the number of LLHs and m is the number of quality indicators. Following that, a second table containing the rankings is generated, which is used as input to a mathematical choice function responsible for determining which LLH should execute next. This approach was tested on the Walking Fish Group (WFG) benchmark [18] and on a single real-world problem, the Crashworthiness problem [19]. The authors used the algorithms NSGAII [20], SPEA2 [21], and MOGA [22] as LLHs, and compared their results to AMALGAM [23] and to all single MOEAs. The experimental results indicated the success of HHCF in outperforming them all. In [24], Li et al. replaced MOGA by IBEA [25] and evaluated the approach on another real-world problem, Wind Farm layout optimization [26]. Although this hyper-heuristic yielded good results, the use of a two-level ranking approach was not properly justified and there was no theoretical background for it.
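To make the two-level idea concrete, the sketch below gives one plausible reading of the scheme (our illustration, not the exact HHCF formulation [6]): each of the n LLHs is scored by m quality indicators, each indicator column is converted into ranks, and the LLH with the best combined rank is chosen to execute next.

```python
def rank_column(values, larger_is_better):
    """Rank one indicator column: 1 = best, n = worst."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=larger_is_better)
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks

def choose_llh(indicator_table, maximize_flags):
    """indicator_table[i][j]: value of indicator j for LLH i (an n x m table).
    Builds the second (ranking) table and returns the LLH with the smallest
    rank sum."""
    n, m = len(indicator_table), len(indicator_table[0])
    rank_table = [[0] * m for _ in range(n)]
    for j in range(m):
        column = [indicator_table[i][j] for i in range(n)]
        for i, r in enumerate(rank_column(column, maximize_flags[j])):
            rank_table[i][j] = r
    return min(range(n), key=lambda i: sum(rank_table[i]))

# Hypothetical values: three LLHs scored by Hypervolume (maximize) and Spread (minimize).
table = [[0.71, 0.40], [0.69, 0.52], [0.75, 0.35]]
print(choose_llh(table, maximize_flags=[True, False]))
```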
The following studies have employed well-established computational methods within the design of new hyper-heuristics. Li et al. [3] proposed MOHH-RILA, an online selection hyper-heuristic that treats the selection problem as a learning automaton, where each action of the automaton is the selection of an LLH from a set of MOEAs. As reward and penalty signals, they used the Hypervolume and Spread indicators (the latter in case of ties). Thus, this study can be classified as a Reinforcement Learning based online selection hyper-heuristic. The authors employed IBEA, SPEA2, and NSGAII as LLHs to find solutions for the WFG and DTLZ [27] benchmarks and for variants of the Crashworthiness problem. The results showed that this approach outperformed HHCF, making it the state-of-the-art online selection hyper-heuristic for multi-objective optimization.
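As an illustration of the learning automaton idea, the update below uses a generic linear reward-penalty scheme (our sketch, not necessarily the exact update rule of [3]); the Hypervolume-based feedback decides whether the executed LLH is rewarded or penalized.

```python
def update_probabilities(probs, chosen, rewarded, a=0.1, b=0.05):
    """Linear reward-penalty update for a learning automaton.
    probs: current LLH selection probabilities (sum to 1).
    chosen: index of the LLH that was just executed.
    rewarded: True if it improved the feedback indicator (e.g., Hypervolume)."""
    k = len(probs)
    new = [0.0] * k
    for i in range(k):
        if rewarded:
            new[i] = probs[i] + a * (1.0 - probs[i]) if i == chosen else (1.0 - a) * probs[i]
        else:
            new[i] = (1.0 - b) * probs[i] if i == chosen else b / (k - 1) + (1.0 - b) * probs[i]
    return new

# Hypothetical usage with three LLHs (e.g., IBEA, SPEA2, NSGAII).
print(update_probabilities([1 / 3, 1 / 3, 1 / 3], chosen=0, rewarded=True))
```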
de Carvalho et al. [28,29] proposed the Multi-Objective Agent-Based Hyper-Heuristic (MOABHH), an online selection hyper-heuristic that grants the best-performing LLH a greater share of the new solutions/offspring. They designed this HH as a multi-agent system by treating all LLHs and quality indicators as agents, and employed Social Choice Theory [30] voting methods in order to summarize the quality indicators' preferences for the participation assignment mechanism. In the first study [28], they employed NSGAII, IBEA, and SPEA2 as LLHs and applied MOABHH to the WFG suite. GDE3 [31] was additionally included in the LLH set in [32]. In the latter study [29], they evaluated how MOABHH performs on real-world problems, including the Crashworthiness, Water [33], Car Side Impact [34], and Machining [35] problems. In [36], the authors evaluated the proposed approach with different voting criteria on these four real-world problems and on the multi-objective traveling salesperson problem [37]. The proposed HH outperformed each individual MOEA and a random-choice hyper-heuristic; however, MOABHH was not compared to any of the other state-of-the-art hyper-heuristics.
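To illustrate how voting can summarize the indicators' preferences (a plain Borda count here; MOABHH can also employ other Social Choice Theory methods [30]), each quality indicator ranks the LLHs and the LLH gathering the most points receives the largest share of the offspring generation.

```python
def borda_count(preferences, n_candidates):
    """preferences: one ranking per voter (quality indicator), each a list of
    LLH indices ordered from best to worst. Returns each LLH's Borda score."""
    scores = [0] * n_candidates
    for ranking in preferences:
        for position, candidate in enumerate(ranking):
            scores[candidate] += n_candidates - 1 - position
    return scores

# Hypothetical example: three indicators ranking four LLHs (indices 0..3).
prefs = [[0, 2, 1, 3],   # e.g., how a Hypervolume "agent" ranks the LLHs
         [2, 0, 3, 1],   # e.g., an IGD+ "agent"
         [0, 1, 2, 3]]   # e.g., a Spread "agent"
print(borda_count(prefs, n_candidates=4))  # highest score -> larger offspring share
```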
Hence, the goal of this paper is to perform a thorough investigation of four reportedly top-ranking hyper-heuristics across an extensive set of problems: (i) MOABHH, which achieved promising results in a previous study; (ii) two variants of MOHH-RILA (HHLA and HHRL); and (iii) HHCF.
We used eighteen real-world multi-objective problems in total: four that were already used in [32], ten problems from the Black Box Optimization Competition [38], and four other bi-objective problems: Four Bar Truss [39], Golinski [40], Quagliarella [41], and Poloni [42]. We also increased the number of LLHs used by all HHs to five: NSGAII, IBEA, SPEA2, GDE3, and mIBEA [43]. As a consequence, the task of selecting an LLH at each decision point was harder for all tested HHs.
This paper focuses on comparing online selection hyper-heuristics specialized in selecting multi-objective evolutionary algorithms. For this reason, hyper-heuristics focused on selecting/generating parts of algorithms, such as heuristics (i.e., crossover and mutation), are not considered. This is also the case for micro-genetic algorithms [44], which likewise focus on selecting parts of evolutionary algorithms. Other approaches, such as parameter control [45], which focuses on reducing the effort of setting up parameters by adjusting them automatically, are also not considered since they are not directly related to hyper-heuristics.
The rest of the paper is organized as follows: Section 2 describes the studied hyper-heuristics and succinctly describes the MOEAs employed as low-level heuristics. Section 3 describes our methodology for the performed experiments. In Section 4, we present and discuss the obtained results. Finally, we present our conclusions and further work in Section 5.
3. Methodology
We set up the four studied hyper-heuristics to control five LLHs (GDE3, IBEA, NSGAII, SPEA2, and mIBEA) and solve the eighteen continuous real-world optimization problems presented in Table 1. Different configurations were employed for these problems. In particular, the number of generations and the population size were slightly different for P17–P18. This was due to the high computational effort demanded by these applications, which would take almost three months of experiments using the same setup considered for problems P01–P16.
Table 2 presents all the parameters used in our experiments. For P01–P16, we set the HHLA and HHRL parameters according to [3], MOABHH according to [32], and HHCF according to [6]. For P17 and P18, the parameters of MOABHH, HHLA, and HHRL were defined empirically.
All the original hyper-heuristics were designed to work with population-based genetic algorithms and have specific procedures to manipulate the current population of solutions. In this comparison, we add two other MOEAs (mIBEA and GDE3), which follow the same model as the other three; therefore, including them does not demand deep changes in the compared hyper-heuristics, changes that would effectively define new hyper-heuristics. Thus, algorithms such as MOEA/D [76] and SMPSO [55] cannot be considered without proposing four new algorithms.
For the genetic algorithms, we followed [3,6]; for GDE3, we followed [32]. We analyzed the performance by comparing the hyper-heuristic results with the results of the five single meta-heuristics. All MOEAs used are implemented in jMetal 5.7 [71]. Each experiment is repeated 30 times.
We employed the Hypervolume and IGD+ averages obtained from the 30 executions. First, for each problem, we joined all results obtained by all algorithms, found the nadir point (necessary for the Hypervolume calculation), and took the non-dominated set in order to generate the known Pareto Front (necessary for the IGD+ calculation). Then, we calculated the quality indicator averages and compared them using the Kruskal–Wallis statistical test with a confidence level of 95%. To perform this, we first identify which algorithm has the best average according to the quality indicator. Then, all the other algorithms are compared to the best one, generating a set of p-values. We consider an algorithm statistically tied with the best one when the corresponding p-value is greater than the significance level, which in this case is 0.05.
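The following sketch illustrates this comparison procedure (our illustration using SciPy; the algorithm names and indicator samples are placeholders): for each problem, the algorithm with the best indicator average is identified, and every other algorithm is compared against it with a Kruskal–Wallis test at the 0.05 significance level.

```python
from scipy.stats import kruskal

def compare_to_best(results, larger_is_better=True, alpha=0.05):
    """results: dict mapping algorithm name -> list of indicator values (30 runs).
    Returns the best algorithm and the algorithms statistically tied with it."""
    means = {name: sum(v) / len(v) for name, v in results.items()}
    best = max(means, key=means.get) if larger_is_better else min(means, key=means.get)
    tied = {best}
    for name, values in results.items():
        if name == best:
            continue
        _, p_value = kruskal(results[best], values)
        if p_value > alpha:        # cannot reject equality -> statistical tie
            tied.add(name)
    return best, tied

# Hypothetical Hypervolume samples for two algorithms on one problem.
data = {"MOABHH": [0.71, 0.72, 0.70, 0.73, 0.71],
        "SPEA2": [0.69, 0.70, 0.68, 0.70, 0.69]}
print(compare_to_best(data, larger_is_better=True))
```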
4. Experimental Results
In this section, we present the empirical results and our analysis.
Tables 3 and 5 present, respectively, the averages for Hypervolume and IGD+. For each problem, the algorithms' results were submitted to a Kruskal–Wallis statistical test in order to answer the following hypothesis:
Hypothesis 1. Considering a quality indicator (Hypervolume or IGD+), is the output of a given algorithm equivalent to that of the algorithm with the best output?
In these tables, we highlight (in grey) the algorithm with the best average output; moreover, considering the statistical test outcomes, we show in bold those algorithms whose outputs are equivalent to the best one.
4.1. Hypervolume Analysis
Table 3 presents Hypervolume averages. From the experiments, we could conclude the following:
If we compare the results obtained just by the five MOEAs, we can see that: (i) GDE3 obtained the highest values on five problems (P01, P04, P14, P16, and P17); (ii) IBEA was the best on four problems (P02, P05, P07, and P13); (iii) NSGAII had the best average on four problems (P06, P10, P15, and P18); (iv) SPEA2 had the best average on five problems (P03, P08, P09, P11, and P12); and (v) mIBEA did not obtain the best results in any experiment.
If we consider just the four hyper-heuristics, we have: (i) MOABHH has the highest averages on nine problems (P01, P04, P07, P08, P10, P12, and P15–P17); (ii) HHLA has its best results on three problems (P05, P14, and P18); (iii) HHRL performs better than the others on five problems (P02, P03, P09, P11, and P13); and (iv) HHCF is the best algorithm only on P06.
Finally, if we compare the results obtained by all nine algorithms, GDE3 has the highest Hypervolume values on four problems (P04, P14, P16, and P17), IBEA has better averages on three problems (P05, P07, and P13), NSGAII has the best results on three problems (P06, P10, and P18), SPEA2 on three problems (P08, P09, and P11), MOABHH has better averages on three problems (P01, P12, and P15), and HHRL on two of them (P02 and P03). Moreover, mIBEA (among the MOEAs), HHLA, and HHCF (both among the HHs) did not excel in any problem.
It is also interesting to compare the results obtained by the HHs with those of the individual MOEAs.
Table 4 presents a summary of the statistical pairwise comparison between HHs and individual MOEAs considering Hypervolume averages. An x in a certain cell means that the corresponding HH could not achieve a result as good as the one obtained by a particular MOEA, considering the Hypervolume indicator. First of all, we can notice that all HHs were outperformed in problems P17 and P18. Moreover, MOABHH could not achieve statistically tied results with the best algorithms in three other problems (P10, P11, and P14), HHLA was beaten in six other problems (P04, P06, P08, P09, P10, and P12), HHRL could not achieve the best result in two other problems (P06 and P10), and HHCF did not obtain good results in six other problems (P05, P07, P08, P09, P11, and P16).
4.2. IGD+ Analysis
Table 5 presents IGD+ averages. From the experiments, we could conclude the following:
If we compare the results obtained just by the five MOEAs, we can see that: (i) GDE3 has better averages on five problems (P01, P04, P14, P16, and P17); (ii) IBEA excelled in four problems (P07, P11, P13, and P15); (iii) NSGAII obtained good results in two problems (P06 and P18); (iv) SPEA2 obtained the best IGD+ values for five problems (P02 and P08–P11); and (v) mIBEA obtained the best results in two problems (P03 and P05).
If we consider just the four hyper-heuristics, we have: (i) MOABHH excels in seven problems (P01, P07, P08, P10, P12, P14, and P17); (ii) HHLA performs better in four problems (P05, P11, P15, and P18); (iii) HHRL performs better in six problems (P02–P04, P09, P11, and P13); and (iv) HHCF obtains good results in two problems (P06 and P16).
Finally, if we compare the results obtained by all nine algorithms, GDE3 is the best algorithm on three problems (P04, P16, and P17), IBEA performs better on three problems (P07, P11, and P15), NSGAII on two problems (P06 and P18), SPEA2 excels on four problems (P02, P08, P09, and P11), mIBEA only on P05, MOABHH performs better on four problems (P01, P10, P12, and P14), and HHRL on two problems (P03 and P13). Moreover, HHLA and HHCF (both among the HHs) did not excel in any problem.
Finally, as done in Section 4.1, it is also interesting to compare the results obtained by the HHs with those of the individual MOEAs.
Table 6 presents a summary of the statistical pairwise comparison between HHs and individual MOEAs considering IGD+ averages. An x in a certain cell means that the corresponding HH could not achieve a result as good as the one obtained by a particular MOEA, considering the IGD+ indicator.
First of all, we can notice that all HHs were outperformed in problems P15, P17, and P18. Moreover, MOABHH could not achieve statistically tied results with the best algorithms in three other problems (P03, P08, and P11), HHLA was also beaten in three other problems (P04, P06, and P08), HHRL could not achieve the best result in four other problems (P06, P07, P14, and P16), and HHCF did not obtain good results in six other problems (P05, P07, P08, P10, P11, and P13).
4.3. Hyper-Heuristics Analysis
4.4. Generality Analysis
In order to perform a cross-domain evaluation of the algorithms, we followed [77] and generated the average and aligned Friedman rankings considering both Hypervolume and IGD+ values. These rankings consider the position each algorithm takes on each problem. We also concatenated the Hypervolume (Table 3) and IGD+ (Table 5) tables, both with 18 rows of data, in order to create a new table of mixed quality indicators with 36 rows of data. As the best Hypervolume and IGD+ values are, respectively, the highest and the lowest ones, we used inverted Hypervolume values instead, so that lower values are better for both indicators.
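As an illustration, the average ranking can be computed as sketched below (our simplification, ignoring ties; the aligned Friedman ranking and the associated statistical test in [77] involve additional steps): each problem ranks the algorithms by indicator value, and the ranks are then averaged across problems.

```python
def average_ranking(table):
    """table[p][a]: indicator value (lower is better) of algorithm a on problem p.
    Returns the average rank of each algorithm across all problems."""
    n_problems, n_algorithms = len(table), len(table[0])
    totals = [0.0] * n_algorithms
    for row in table:
        order = sorted(range(n_algorithms), key=lambda a: row[a])
        for rank, a in enumerate(order, start=1):
            totals[a] += rank
    return [t / n_problems for t in totals]

# Hypothetical IGD+ values (lower is better) for three algorithms on two problems.
print(average_ranking([[0.02, 0.05, 0.03],
                       [0.04, 0.01, 0.02]]))
```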
Table 8 presents the statistical evaluation. In this table, smaller values indicate better performance.
Considering just the MOEA results, SPEA2 is the best single MOEA according to almost all statistical scores. The second best is GDE3, while IBEA and mIBEA are the worst-performing MOEAs according to this cross-domain analysis, with mIBEA performing only slightly better than IBEA.
Considering all of the nine studied algorithms, MOABHH is the algorithm that performs best from a cross-domain perspective, as highlighted (in grey) in Table 8. HHRL is the second best, except when considering the mixed average ranking values (5th column of the table): in this specific case, HHLA is the second-best algorithm.
Another interesting result is that an HH does not always obtain better results than the LLHs it is composed of; one can notice that, in general, SPEA2 and GDE3 obtained better statistical values than both HHLA and HHCF.
4.5. Discussion
The cross-domain tests illustrated that mIBEA is one of the worst-performing algorithms for this group of problems, while GDE3 is one of the top-performing ones. Thus, the inclusion of mIBEA and GDE3 in the LLH pool increases the difficulty, for the hyper-heuristics, of choosing the best algorithm. In previous studies, mainly three algorithms were considered in the LLH pool: IBEA (one of the worst algorithms in our study), SPEA2 (the best one in this study), and NSGAII.
MOABHH and HHRL performed quite well, showing clearly superior results when compared to HHLA and HHCF. Both MOABHH and HHRL removed the poorly performing LLHs at the beginning of the search, letting the best LLH run for more time than the others, whereas HHLA and HHCF kept trying poorly performing LLHs. HHRL removed the poorly performing LLHs right away, without giving them a proper chance during its initialization process, even though those LLHs might have performed well in the later stages of the search. MOABHH, on the other hand, kept all the algorithms running in parallel while removing a percentage of the offspring generation from the worse-performing ones. This increases MOABHH's capability of exploring the search space and reduces the chance of removing from the LLH pool, at the beginning of the execution, an LLH with potentially good performance in the later stages.
5. Conclusions
In this study, we investigated four reportedly top-performing online selection hyper-heuristics across eighteen real-world optimization problems. The hyper-heuristics controlled and mixed a set of five low-level MOEAs to produce improved trade-off solutions. The performance of the algorithms was also evaluated with a larger set of MOEAs than in previous studies.
To the best of the authors’ knowledge, this work is the first one that addresses the use of real-world problem instances for cross-domain performance evaluation of hyper-heuristics. In particular, we addressed in this paper the following issues: (i) an evaluation of four state-of-the-art online hyper-heuristics (MOABHH, HHRL, HHLA, and HHCF) using exclusively real-world problems; (ii) a harder selection task for these four hyper-heuristics by increasing the number of low-level heuristics used: in our work, we used five LLHs, whereas, in previous publications, the number of LLHs used was three or four; (iii) a cross-domain test bed formed by eighteen real-world optimization problems (presented in Table 1) in order to evaluate multi-objective hyper-heuristics, which gives a more realistic overview of their performance.
As expected, the empirical results showed that individual MOEAs deliver different performances on different problems, making those real-world problem instances very useful for cross-domain performance evaluation of hyper-heuristics. Moreover, our results showed that hyper-heuristics have a better cross-domain performance than single meta-heuristics. This means that, when a new multi-objective optimization problem must be solved, hyper-heuristics are excellent candidates, reducing the user’s effort of repeatedly running several meta-heuristics with different parameter settings in order to obtain a solution. In particular, MOABHH turned out to be the algorithm delivering the best overall cross-domain performance, beating the other state-of-the-art hyper-heuristics with respect to the two quality indicators: IGD+ and Hypervolume.
As future work, these hyper-heuristics will be studied across various applications in the discrete multi-objective optimization domain, such as Search-Based Software Engineering problems [78], and we will explore the potential of these algorithms to help tackle the current pandemic, in terms of diagnosis and treatment [79] and drug discovery [80]. Many-objective problems also pose a challenge for researchers and practitioners, and some many-objective approaches have recently been proposed in the literature. Another research direction would be to study the performance of the top-performing online learning selection hyper-heuristics across problems with more than eight objectives, varying their LLHs.