1. Introduction
The maximum diversity problem (MDP) represents a significant challenge in the field of optimization, which involves the selection of a subset
M, including elements of maximum diversity, from a greater set
N [
1]. This diversity is calculated as the sum of diversities between all possible pairs of elements belonging to
M, typically defined through a distance metric based on the specific attributes of the elements belonging to
N.
The concept of maximizing diversity was first introduced in 1977 [
2]. This groundbreaking work proposed and evaluated a heuristic, using data from a genetic resource selection program. The formulation of the MDP was subsequently refined and presented in mathematical terms in 1993, affirming the problem’s NP-hard status [
3].
The MDP has a wide range of practical applications across various fields, including resource management, project group formation [
4,
5], biological diversity conservation [
6], and the establishment of ecological preserves [
7]. This multitude of applications highlights the MDP’s versatility and its broad relevance.
Central to the MDP is a diversity index that captures the dissimilarity or distance between two elements. Common metrics—such as Euclidean, Manhattan, and Hamming distances—are often used, depending on the specific attributes and nature of the elements in set
N [
8]. Given the NP-hard nature of the MDP [
3], it is vital to employ algorithms that can deliver effective solutions within a feasible computational time.
Metaheuristics, which guide heuristics towards feasible solutions to specific problems, are widely used for obtaining effective solutions. Among them, we can mention the evolutionary algorithms, which are particularly prominent [
9], with genetic algorithms standing out [
10,
11].
A plethora of research addressing the MDP has been published since its inception, employing a host of methodologies. This research includes various metaheuristics, such as the greedy randomized adaptive search procedure [
12,
13,
14,
15,
16], tabu search [
16,
17,
18,
19], variable neighborhood search [
20,
21], and neural networks [
22]. A comprehensive comparative analysis of all the heuristics and metaheuristics proposed for the MDP up to 2012 was compiled in [
23]. More recently, a review focusing on discrete diversity and dispersion maximization from an operational research perspective illuminated the latest developments and approaches in this area [
24].
Among the various metaheuristic methodologies, the biased random-key genetic algorithm (BRKGA) [
25] has garnered interest, due to its effectiveness in solving discrete and global optimization problems. The BRKGA is an extension of the random-key genetic algorithm [
26], incorporating modifications to the parent selection process for crossover and implementation. Notably, the application of the BRKGA has been adapted to address the MDP [
27,
28,
29].
The main objective of the present investigation is to propose a new approach, based on the BRKGA, for tackling the MDP. While existing algorithms offer viable solutions, they often face challenges, regarding computational efficiency and the balance between exploration and exploitation. Specifically, these algorithms may struggle with large problem instances, or they may not consistently produce high-quality solutions, which are limitations. Our BRKGA with local search (BRKGALS) aims to overcome these limitations, by integrating a metaheuristic that uses local search to refine the solutions produced by the algorithm, thereby enhancing the solution quality while maintaining computational feasibility. The motivation behind our approach is to contribute to ongoing efforts to find more efficient and effective solutions for the MDP. In this process, we conduct an extensive computational study, using a wide range of instances from MDPLib, a highly regarded and commonly used library for the MDP. It is important to note that while MDPLib is specific to the MDP, other general optimization problem libraries, such as OR-Library and MIPLib, may also contain instances of diversity problems. However, none of these resources provide the MDP-focused dataset that MDPLib offers. In our study, we compare the performance of the proposed algorithm to the best-known solutions in the existing literature, demonstrating its competitiveness and potential for practical applications.
The subsequent sections of this article are organized as follows.
Section 2 provides definitions and discusses the state-of-the-art of the MDP. In
Section 3 and
Section 4, we discuss the proposed approach.
Section 5 presents the computational results, whereas in
Section 6, our conclusions are stated.
2. The Maximum Diversity Problem
To define the problem, consider a given set N containing n elements—that is, —where each element is characterized by a set of attributes . The MDP can be described as an integer quadratic programming problem, considering that , for all , is a binary variable equal to one if and only if element i is in the solution, that is, this element is selected to be a member of subset M, and zero otherwise. Additionally, is an array that stores the diversity indices between each pair of elements of N, that is, this array is filled such that and , for all .
Based on the earlier notations, the mathematical programming model for the MDP is stated as
In the formulation described in (
1)–(
3), the quadratic objective function stated in (
1) aims to maximize the sum of diversities of a subset
M with
elements taken from
N, with the equality presented in (
2) requiring that exactly
m elements are in a feasible solution, and the expression given in (
3) indicates that the variables of the problem must be binary.
The diversity to be maximized concerns the numerical value that corresponds to the degree of difference between the attributes of elements
i and
j, which can be measured, for example, by the distance between them, calculated using a certain metric. If the Euclidean distance is employed to calculate the diversity index,
is computed as
where
is the value of attribute
for element
and
denotes the diversity index between elements
i and
j, for
.
3. The BRKGA Metaheuristic
The BRKGA is a metaheuristic for solving discrete and global optimization problems, as proposed in [
25]. The BRKGA is a variation of the random-key genetic algorithm (RKGA) developed in [
26]. Both algorithms represent solutions as vectors of random keys, where each key is a real number generated in the continuous range
. Also, both algorithms are of general purpose and the aspects of the problem to be addressed are brought out through a special purpose algorithm called a decoder, which maps a vector of random keys into a solution of the problem, and calculates the objective function. The distinction between the functioning of BRKGA, RKGA and the obtaining of their solutions for optimizing the problem being addressed are stated as indicated below.
The BRKGA considers a population with p individuals, each represented by a chromosome. A chromosome is formed by genes, and has a fitness value corresponding to the objective function value relative to the problem being solved. A chromosome is a vector of random keys, where each allele, representing the value contained in a gene, is randomly generated in the continuous range . In the initialization phase of the BRKGA (similar to the RKGA), an initial population is created, and the fitness of each chromosome is calculated by the decoder. The population is then divided into elite individuals with the best fitness, and non-elite individuals, where . The algorithm evolves its individuals over a number of iterations called generations. There are three mechanisms in the process of evolution:
- (i)
Elitism: The BRKGA, like the RKGA, copies all elite individuals from the qth generation to the th generation, which ensures that there is never a deterioration in the best solution from one generation to the next.
- (ii)
Mutation: A small number of mutant individuals, namely, is introduced into the population of the next generation. A mutant is a new vector of random keys, which follows the same creation process as the individuals in the initial population.
- (iii)
Crossover: The crossover operation produces the remaining individuals in the population for the th generation.
The BRKGA differs from the RKGA in the way that parents are selected for mating. In the RKGA, two parents are randomly selected from the entire population to produce offspring. In the BRKGA, a biased selection of parents is performed, with each new individual being generated by combining a parent selected from the elite individuals, and a second parent being selected from the non-elite individuals (or the entire population, in some cases). Thus, for each allele of a chromosome with n genes (that is, for ), the ith allele of the descendant will be the ith allele of the elite parent with probability , and the ith allele of the non-elite parent with probability . The fitness of the generated chromosomes is then calculated, and a new iteration is started.
4. The Proposed BRKGA for the MDP
Several application programming interfaces (APIs) are available for the implementation of the BRKGA. An open-source API implemented in C++ was proposed in [
30]. This API can be secured from
http://mauricio.resende.info/src/brkgaAPI/ (accessed on 2 July 2023) and enables the parallelization of certain operations, such as solution decoding and independent evolution of multiple populations, with inter-population solution exchange. For the present study, we choose to use the aforementioned API [
30] due to its capabilities, which is fitted to our needs. Alternative APIs exist like the JAMES framework (a Java-based platform that is a distributed evolutionary algorithm implemented in Python) may be considered for other specific use cases.
We propose an implementation of the BRKGA tailored to the MDP, utilizing the chosen API. The proposed decoding algorithm transforms a method’s solution into an MDP solution, by sorting the vector of random keys of
n positions, and selecting the first
m indices to form the solution. As illustrated in
Figure 1 provides a decoding example showing the transformation process of a random key vector into an MDP solution. In this example, the vector of random keys corresponds to a new auxiliary vector constructed with sorted keys, preserving the indices. In the example,
and the MDP solution becomes
.
The solution set
M (
), obtained from the decoding process, is evaluated using the objective function given by
A key problem with the BRKGA is its inherent lack of guarantee that the solutions generated during the evolution process correspond to local optima. To address this problem, we perform a local search, at the end of each generation, to attain the best solution found so far.
To maintain the integrity of the method’s operation, the solution derived from the local search is not included in the subsequent generation of the BRKGA. Instead, this is stored separately as the best solution, if its objective function value surpasses that of the current best solution. In our implementation, the local search algorithm, based on [
12], is modified into an improvement-first version. We consider two elements,
and
, and calculate
, which represents the improvement due to swapping
i and
j. The search terminates if
, for every
i and
j. Otherwise, for each pair of elements
i and
j, if
, we perform the swap, and then restarts the local search using the resultant solution.
The distinguishing characteristic of the BRKGA, when compared to the RKGA, lies in the bias introduced in the parent selection process for crossover. This bias drives the evolution process towards solutions with better fitness values, thereby leading to a more effective exploration and exploitation of the search space. The diagram in
Figure 2 provides an overview of the entire process followed in our proposed BRKGA for MDP.
5. Computational Results
Our proposed algorithm, denoted by the BRKGALS, was implemented using the C++ language, and compiled in GCC, version 8.2.0, using the -fopenmp option to enable parallel API processing. The tests were performed on a computer Intel i7-4770 with 3.4 GHz and eight cores for processing, and with 16 GB of RAM, running Ubuntu 18.10.
Our experimental setup to evaluate the performance of the BRKGALS was carefully chosen. We used 100 instances of medium and large sets from MDPLib, known as the MDG dataset, in which arrays of real numbers were randomly generated within the range 0–10 of a uniform distribution. The instances fell into three categories:
MDG-a: A set of 40 matrices was generated as in [
16], 20 of which had dimensions
n = 500 and
, while the remaining 20 had dimensions
and
.
MDG-b: Mirroring MDG-a, once again a set of 40 matrices was generated according to [
16], with 20 matrices of
, and the other 20 matrices featuring dimensions
.
MDG-c: A set of 20 matrices was generated in [
23], with
and
.
For evaluation purposes, the results obtained by the BRKGALS were compared to the best results known in the literature [
31], and to two implementations of the BRKGA for the MDP: BRKGAAM [
27] and BRKGAPDM [
28]. According to the experiments conducted in [
28], we used the maximum number of generations without improvement,
, and the maximum processing time,
s (approximately 2 h), as the stopping criteria for the BRKGALS. In addition, we adopted the same (tuned) indicators, as reported in
Table 1.
To ascertain the robustness and performance of the proposed algorithm, we ran the BRKGALS 10 times for each instance, using different seeds of random numbers. The results obtained are reported in
Table 2 and
Table 3, for instances of type MDG-a with
and
, and for instances of
and
, respectively. For each instance identified in the first column, the second column presents the percentage difference given by
where
was the best solution obtained from 10 runs of the proposed algorithm, and
was the best solution known in the literature, that is, the target presented in the fifth columns of
Table 2 and
Table 3. The third (BRKGAM) and fourth (BRKAPDM) columns of these tables reproduce the percentage differences presented in [
27,
28], respectively.
In our analysis of the results, we observe that the proposed algorithm demonstrated superior average performance compared to the others. The mean deviations for the BRKGALS, BRKGAM, and BRKGAPDM methods with larger instances, that is,
and
, were 0.33, 0.50, and 1.19, respectively. Regarding the computational time, the MDG-a instances, that is,
and
, had an average execution time of about 82 s. As expected, for larger instances, the average time was greater and approximately 7200 s, with the maximum time being determined as the stopping criteria.
Figure 3 and
Figure 4 further complement these findings, showcasing the performance of the algorithms over time. In the figures, the symbols + and X are the values achieved by the BRKGAPDM and BRKGAM methods, respectively, as reported in [
27,
28].
In the other instances, the proposed algorithm preserved its ability to obtain higher-quality solutions for the MDP, in comparison to those found in the literature.
Table 4 and
Table 5 present the averages of the percentage differences of the proposed algorithm and those in the literature for medium-sized and large-sized instances of the type MDG-b and MDG-c, respectively. In
Table 5, the comparison does not include BRKGAAM, as we did not have execution information for this set of instances. Regarding the computational time, the MDG-b instances
and
had an average execution time of approximately 131 s. Larger instances, that is, MDG-b with
n = 2000 and
, and MDG-c with
, presented an average time of approximately 7200
s, which was the maximum time determined as stopping criterion. With these results, we can conclude that our proposed algorithm, the BRKGALS, demonstrated robust and superior performance compared to the other algorithms, providing a promising alternative for handling instances of the MDG problem, that is, from MDPLib.
6. Conclusions
In this study, we introduced the BRKGALS, a hybrid algorithm that amalgamates the BRKGA metaheuristic with a local search approach, to solve the MDP. The experimental results provided strong evidence of the BRKGALS’s superior performance and efficacy compared to other algorithms presented in the literature.
The computational experiments carried out on the MDG dataset, comprising both medium and large instances, highlighted the robustness and high-quality solution capacity of the BRKGALS. This new algorithm consistently surpassed existing algorithms, such as the BRKGAPDM and BRKGAM, achieving smaller deviations from the best-known solutions.
For MDG-a instances with and , the BRKGALS recorded average percentage differences between 0.00% and 0.14%, when compared to the best-known solutions. For larger MDG-a instances, that is, and , the proposed algorithm demonstrated remarkable performance, with average percentage differences spanning 0.23% to 0.48%. These outcomes validate the consistency of the BRKGALS in generating high-quality solutions, even for larger instances.
The average percentage differences for MDG-b and MDG-c instances, presented in
Table 4 and
Table 5, respectively, further illustrated the supremacy of the BRKGALS in deriving high-quality solutions across varying instance sizes and characteristics. The achieved average percentage differences varied from 0.04% to 0.32% for MDG-b instances, and from 0.20% to 0.31% for MDG-c instances, showcasing the reliability and efficacy of the BRKGALS.
In summation, the experimental results unequivocally position the BRKGALS as a preeminent algorithm for the MDP, in terms of solution quality. Its impressive ability to generate solutions proximate to the optimal ones validates the practical applicability of the hybrid approach that combines the BRKGA with local search. Additionally, our findings suggest several exciting prospects for future research, including exploration of parameter tuning techniques, code optimization, and investigating the integration of local search solutions into the BRKGA metaheuristic.
While our study utilized randomly generated datasets from MDPLib, the potential application of the BRKGALS in real-world scenarios is a promising avenue for further investigation. Note that evaluating its performance against techniques other than the BRKGA could offer a broader understanding of its effectiveness in resolving the MDP.
In conclusion, the BRKGALS emerges as a potent and efficacious algorithm, with which to tackle the maximum diversity problem’s challenges, and its exceptional performance paves the way for new advancements in this research domain.
Author Contributions
Conceptualization, G.S., A.L., R.O., V.L., J.F.-Z. and C.C.; data curation, G.S. and A.L.; investigation, G.S., A.L., R.O., V.L., J.F.-Z. and C.C.; methodology, G.S., A.L., R.O., V.L., J.F.-Z. and C.C.; software, G.S., A.L. and R.O.; validation, G.S., A.L., R.O., V.L., J.F.-Z. and C.C.; writing—original draft preparation, G.S., A.L., R.O. and J.F.-Z. writing—review and editing, V.L. and C.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research was partially supported by the National Council for Scientific and Technological Development (CNPq) through grant 303192/2022-4 (R.O.), and Comissão de Aperfeiçoamento de Pessoal do Nível Superior (CAPES), from the Brazilian government; by FONDECYT, grant number 1200525 (V.L.), from the National Agency for Research and Development (ANID) of the Chilean government under the Ministry of Science and Technology, Knowledge, and Innovation; and by Portuguese funds through the CMAT - Research Centre of Mathematics of University of Minho, references UIDB/00013/2020, UIDP/00013/2020 (C.C.).
Data Availability Statement
The data and codes are available upon request to the authors.
Acknowledgments
The authors warmly thank the editors and reviewers for their helpful comments, which contributed to an improved version of our article.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Gomez, J.F.; Panadero, J.; Tordecilla, R.D.; Castaneda, J.; Juan, A.A. A multi-start biased-randomized algorithm for the capacitated dispersion problem. Mathematics 2022, 10, 2405. [Google Scholar] [CrossRef]
- Glover, F.; Hersh, G.; McMillan, C. Selecting Subsets of Maximum Diversity; Technical Report; University of Colorado at Boulder: Boulder, CO, USA, 1977. [Google Scholar]
- Kuo, C.C.; Glover, F.; Dhir, K. Analyzing and modeling the maximum diversity problem by zero-one programming. Decis. Sci. 1993, 24, 1171–1185. [Google Scholar] [CrossRef]
- Weitz, R.; Lakshminarayanan, S. An empirical comparison of heuristic methods for creating maximally diverse group. J. Oper. Res. Soc. 1998, 49, 635–646. [Google Scholar] [CrossRef]
- Dhir, K.; Glover, F.; Kuo, C.C. Optimizing Diversity for Engineering Management. In Proceedings of the IEEE International Engineering Management Conference, Dayton North, OH, USA, 17–19 October 1994. [Google Scholar]
- Agrafiotis, D.K. Stochastic algorithms for maximizing molecular diversity. J. Chem. Inf. Comput. Sci. 1997, 37, 841–851. [Google Scholar] [CrossRef]
- Unkel, W.C. Natural diversity and national forest. Nat. Areas J. 1985, 5, 8–13. [Google Scholar]
- Kochenberger, G.; Glover, F. Diversity Data Mining; Technical Report; The University of Mississipi: Oxford, MS, USA, 1999. [Google Scholar]
- Fogel, D.B. An introduction to simulated evolutionary optimization. IEEE Trans. Neural Netw. 1994, 5, 3–14. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Reinhold, V.N. Handbook of Genetic Algorithms; Lawrence Davis: New York, NY, USA, 1991. [Google Scholar]
- Roach, L.; Gao, X. Graphical local genetic algorithm for high-dimensional log-linear models. Mathematics 1993, 11, 2514. [Google Scholar] [CrossRef]
- Ghosh, J.B. Computational aspects of maximum diversity problem. Oper. Res. Lett. 1996, 19, 175–181. [Google Scholar] [CrossRef] [Green Version]
- Andrade, P.M.F. Um GRASP para o Problema da Maximização da Diversidade. Master’s Thesis, Universidade Federal Fluminense, Rio de Janeiro, Brazil, 2003. [Google Scholar]
- Silva, G.C.; Ochi, L.S.; Martins, S.L. Experimental comparison of greedy randomized adaptive search procedures for the maximum diversity problem. In Experimental and Efficient Algorithms, Third International Workshop; Springer: Berlin, Germany, 2004; Volume 3059, pp. 498–512. [Google Scholar]
- Silva, G.C.; Andrade, M.R.; Ochi, L.S.; Martins, S.L.; Plastino, A. New heuristics for the maximum diversity problem. J. Heuristics 2007, 13, 315–336. [Google Scholar] [CrossRef] [Green Version]
- Duarte, A.; Martí, R. Tabu search and GRASP for the maximum diversity problem. Eur. J. Oper. Res. 2007, 178, 71–84. [Google Scholar] [CrossRef]
- Palubeckis, G. Iterated tabu search for the maximum diversity problem. Appl. Math. Comput. 2007, 189, 37–383. [Google Scholar] [CrossRef]
- Aringhieri, R.; Cordone, R.; Melzani, Y. Tabu search versus GRASP for the maximum diversity problem. 4OR 2008, 6, 45–60. [Google Scholar] [CrossRef]
- Wang, Y.; Hao, J.K.; Glover, F.; Lü, Z. A tabu search based memetic algorithm for the maximum diversity problem. Eng. Appl. Artif. Intell. 2014, 27, 103–114. [Google Scholar] [CrossRef] [Green Version]
- Brimberg, J.; Mladenović, N.; Urošević, D.; Ngai, E. Variable neighborhood search for the heaviest K-subgraph. Comput. Oper. Res. 2009, 36, 2885–2891. [Google Scholar] [CrossRef]
- Aringhieri, R.; Cordone, R. Comparing local search metaheuristics for the maximum diversity problem. J. Oper. Res. Soc. 2011, 62, 266–280. [Google Scholar] [CrossRef]
- Wang, J.; Zhou, Y.; Yin, J.; Zhang, Y. Competitive Hopfield network combined with estimation of distribution for maximum diversity problems. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2009, 39, 1048–1066. [Google Scholar] [CrossRef]
- Martí, R.; Gallego, M.; Duarte, A.; Pardo, E.G. Heuristics and metaheuristics for the maximum diversity problem. J. Heuristics 2013, 19, 591–615. [Google Scholar] [CrossRef]
- Martí, R.; Martínez-Gavara, A.; Pérez-Peló, S.; Sánchez-Oro, J. A review on discrete diversity and dispersion maximization from an OR perspective. Eur. J. Oper. Res. 2022, 299, 795–813. [Google Scholar] [CrossRef]
- Gonçalves, J.F.; Resende, M.G.C. Biased random-key genetic algorithms for combinatorial optimization. J. Heuristics 2011, 17, 487–525. [Google Scholar] [CrossRef]
- Bean, J.C. Genetic algorithms and random keys for sequencing and optimization. ORSA J. Comput. 1994, 6, 154–160. [Google Scholar] [CrossRef]
- Moro, M.A. Meta-Heurísticas GRASP e BRKGA Aplicadas ao Problema da Diversidade máxima. Master’s Thesis, Universidade Estadual de Campinas, Campinas, SP, Brazil, 2017. [Google Scholar]
- Brito, J.; Fadel, A.; Semaan, G. Algoritmo heuríStico Aplicado ao Problema da Diversidade Máxima; Anais do LSBPO, GALOÁ: Rio de Janeiro, RJ, Brazil, 2018. [Google Scholar]
- Liu, X.; Chen, J.; Wang, M.; Wang, Y.; Su, Z.; Lü, Z. A two-phase tabu search based evolutionary algorithm for the maximum diversity problem. Discret. Optim. 2022, 44, 100613. [Google Scholar] [CrossRef]
- Toso, R.; Resende, M. brkgaAPI: A C++ Application Programming Interface for Biased Random-Key Genetic Algorithms; Technical Report; AT&T Labs Research: Florham Park, NJ, USA, 2012. [Google Scholar]
- Martı, R.; Duarte, A.; Martınez-Gavara, A.; Sánchez-Oro, J. The MDPLIB 2.0 Library of Benchmark Instances for Diversity Problems. 2021. Available online: https://www.uv.es/rmarti/paper/mdp.html (accessed on 2 July 2023).
| Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).