1. Introduction
In data classification, the most common methods used are clustering methods based on statistical theory [
1] such as naive Bayes [
2] and artificial neural networks (ANNs) [
3]. These methods can use large-scale imbalanced data as clustering targets [
4]. However, with the development of large data networks, electronic data integration, imbalanced data analysis, large text databases, quantum data decoding and encoding analysis, and other data types in recent years, the efficient processing of data has become the foremost subject of current data classification and processing. The imbalanced data classification system studied in this paper belongs to this subject. When the imbalanced data is classified and processed, integration and classification of the data has gradually become the mainstream method for processing imbalanced data but this method usually results in a higher computational cost [
5]. In view of the cause, we have identified a meaningful conclusion—the efficient clustering of imbalanced data processing tends to increase the processing cost.
For the processing of imbalanced data, there are some mature algorithms such as sampling methods [
6,
7], cost-sensitive algorithms [
8,
9] and one-class classification [
8,
9]. However, there are some problems in that the processing of imbalanced data is mainly confined to the level of computer algorithm analysis and big data analysis, and lacks the function of efficient integration, which can result in privacy or data breaches, resulting in significant losses to data owners. Distributed and compatible construction of evolutionary algorithm structures is an innovative idea, which was proposed by Santucci [
10]. Centroid classification algorithm will provide new ideas for us to explore and solve imbalanced data classification problems. Mikalef proposed a new idea about data processing. Application level and resource level of big data can improve investment efficiency and business performance, which idea was proposed by Patrick Mikalef and it is applied preferentially to business [
11]. In addition to the above algorithms, there are other scholars who have performed research on imbalanced data processing. To seek parameter optimization to reduce the cost before training the model, Thai-Nghe proposed a fault-tolerant recognition algorithm that sets the artificial cost rate of data processing to ultrahigh parameters [
12]. In the integration of imbalanced data, most experts focus on how to improve the efficient integration and the integration of imbalanced data, such as the boosting data analysis method [
13] mentioned by Freund and the bagging data integration analysis algorithm [
14] proposed by Breiman. Sun proposed a modular data analysis integration strategy that classifies and integrates multiple imbalanced data and analyzed data using modular strategies [
15]. Chawla proposed a data integration strategy combining the (Synthetic Minority Oversampling Technique) SMOTE algorithm and boosting algorithm used in the sampling process [
16]. Lachiche [
17] proposed a new effective recursive algorithm combining
and
Bayesian structural systems, improving the difference between computer systems and artificial data in low-dimensional levels.
For generalized imbalanced data processing, the introduction of the above algorithms is limited to the specific algorithm program of data integration and the results of data classification. However, for the number of feature samples and the feature categories about the imbalanced data, the above algorithms may cause higher data classification costs and classification errors. Overall, in the previous research methods, most of them are integrated strategies under a single integrated classifier structure model such as support vector machine, decision tree analysis and the naive Bayesian method [
18]. These methods determine the integration effect based on data performance as a classification criterion, which is not conducive to the continuous integration and classification of imbalanced datasets. To solve the problem of high-efficiency classification of imbalanced data, this paper establishes a differential evolution integration algorithm with global convergence ability [
19,
20] based on the screening criterion of the mixed penalty function (differential evolution integration algorithm based on mixed penalty function screening criteria—the DE-MPFSC algorithm), which provides a useful method for solving the high-efficiency integration and classification of imbalanced data.
A: Source of The Idea
The idea of the DE-MPFSC algorithm is derived from the following three excellent overview papers. The purpose of the first paper [
21] is to summarize and organize the comprehensive foundation and the recent proposals on parameter adaptation about differential evolution (DE), which inspired us that the optimal parameters analysis should be carried out before the algorithm design. The second paper [
22], presents the basic structure, parameter analysis and the latest applications of evolutionary algorithms. It describes the related principles of various evolutionary algorithms in detail. Comparing the performance of DE algorithm is an effective way to further improve the understanding of evolutionary algorithm. The third paper [
23] introduces the various structures and main variants of the DE algorithm. The applications of the DE algorithm in the fields of commerce, agriculture and economics was emphasized in the literature. The theoretical design in this paper [
23] is consistent with the ideas about the topological structure of the divided regions [
20]. Because the population is so sensitive to the spatial structure of the search area during the evolution process, we carefully considered this idea and proposed a screening criterion based on region segmentation. However, when we reviewed their contributions, we had one common finding: for the variant of DE, when we test the algorithm, we implement it based on a procedural evaluation system of test functions and dimensions. However, the evaluation system ignores the important influence of convergence speed and convergence precision on individual evolution. Based on this finding, we systematically discuss the influence of convergence precision and convergence speed on the evolution process based on the structural indicators of the evolutionary algorithm. Finally, we propose an important idea: judge whether the evolutionary algorithm is better than before based on the degree of entanglement between convergence precision and convergence speed, then, further verify the scientificity of the idea proposed by us through numerical experiments.
In terms of improvements in the DE algorithm, in a multimodal interactive environment, balancing the parameters
F and
of the evolutionary algorithm is an important idea. In multi-objective parameter design and adaptive parameter analysis, the idea of self-adaptation was proposed by Janez Brest, namely (Self-Adapting Parameter Setting in Differential Evolution)JDE [
21,
24] and applied in the field of computer and machine learning, which can be also applied to large-scale problems [
25]. The dynamic parameter design idea of DE-MPFSC algorithm is based on this research and numerical experiments are also analyzed in detail. The Opposition Based Differential Evolution (OBDE), proposed by Rahnamayan et al. (2008) [
22,
26], mainly emphasizes the impact of the evolution speed on the DE algorithm. The OBDE algorithm can enhance the adaptability of the algorithm by improving the learning efficiency, which idea has a priority role in solving multi-objective problems. Our inspiration for the idea of decision space segmentation in this paper comes from it, which is innovative for improving the decision structure of evolutionary algorithms. The (Differential Evolution with Global and Local neighborhoods) DEGL algorithm [
23,
27] has made some improvements in the individual optimization space, emphasizing the influence of topological structure and spatial neighborhood on the evolutionary algorithm and highlighting the role of the idea of spatial segmentation, which can guide us to think deeply about the search area of the evolutionary algorithm [
28,
29]. The new model will be widely used in technological improvement. In Reference [
20], we independently proposed the spatial topology concept of the DE algorithm and gave relevant proofs. We independently proposed the spatial topology concept of the DE evolution algorithm and gave relevant proofs. Topology concept is a priority direction in expanding search area. Qin and Suganthan proposed a new version of adaptive differential evolution algorithm [
22,
30]. Through the adaptation of mutation factor
F and cross probability
, the (Self-Adaptive Differential Evolution) SADE algorithm can adjust the individual search precision and search speed from a micro perspective, which will fundamentally change the search pattern of individuals in the spatial neighborhood and will promote the healthy development of the DE algorithm and provide useful help for the dynamic adjustment strategy of the DE-MPFSC algorithm. In addition, the Modified DE algorithm (MDE) was proposed by Zou et al. [
21,
31]. The main idea of MDE is to mix the adaptive mechanism of mean and Gaussian distribution to improve the ability of parameters F and CR to update themselves. The Fitness Adaptation Differential Evolution (FADE) was proposed by Ghosh et al. [
21,
32], the strategy selection mechanism of the main idea is that individual selection behaviors can be randomly generated during the evolution process. When individual populations evolve in spatial neighborhoods, the tentative strategy of selecting individuals is also an effective way to expand the diversity of the population. The spatial structure of individual evolution is not completely random. The evolutionary pattern of individuals will be affected by mutations, resulting in structural deviations of individuals, which is not conducive to large-scale population evolution [
33], which was proposed by Caraffini. The structured properties of evolutionary algorithms is one of the important issues we will explore in the future. Especially in terms of the convergence and stability of evolutionary algorithms, this effect is significant. To this end, we should explore new ideas to reduce structural deviations. We roughly reviewed the variants of the DE algorithm and related research work from recent years. Based on the above viewpoints, we have developed an improved model of the DE algorithm and proposed the DE-MPFSC algorithm, which opens up new idea for further research on the DE algorithm and related applications.
B: Technical Route
a. According to the idea of region segmentation, propose the region screening criteria for individuals in the evolution process;
b. Construct the operation mechanism of the DE-MPFSC algorithm, including the introduction of the dynamic mutation factor F and the dynamic crossover probability ;
c. According to the structural evaluation system, the idea about entanglement degree of the convergence speed and convergence precision of the algorithm is proposed;
d. Compare the four classic DE algorithms JDE, OBDE, DEGL and SADE to further verify the advantages of the DE-MPFSC algorithm.
The first advantage of the DE-MPFSC algorithm is that it can classify and extract the original imbalanced data, generate an imbalanced data point searching area and balanced data point searching area, further redistribute the data searching unit. The second advantage is that it can clarify the progressive boundaries of imbalanced points and balanced points of the classified conditions. When the mixed penalty function is at a progressive boundary, DE algorithms only search inside or outside all data points and cannot cross data boundaries, which is not conducive to decreasing the time for data integration and classification. However, the DE-MPFSC algorithm can optimize the data structure by using the mixed penalty function. The third advantage can improve the accuracy and global optimization of imbalanced data classification and purification. The fourth advantage can analyze many different types of data structures. Because of the DE-MPFSC algorithm having self-adaptability, it can use the global convergence of heuristic algorithms to greatly shorten the search time of the searching area. The DE-MPFSC algorithm expands the searching space, improving the integration efficiency of imbalanced data, which results in wide adaptability. The full-text structure is arranged as follows:
Part one: The first and second chapters introduce the current situation of imbalanced data integration and classification, the advantage analysis of the DE-MPFSC algorithm and the form and normalization of the internal and external penalty functions.
Part two: The third chapter is mainly the processing of the constraint conditions for the mixed penalty function, the formal construction of the DE-MPFSC algorithm and property analysis of the mixed penalty function screening criteria.
Part three: The forth chapter is mainly the theoretical analysis of the DE-MPFSC algorithm. The validity and convergence of the DE-MPFSC algorithm are analyzed from the mathematical point of view.
Part four: In the fifth chapter, we creatively introduce the entanglement degree and entanglement degree error to compare the performance of the DE-MPFSC by numerical experiments.
Part five: The sixth chapter mainly establishes verification indicators of the Classification Accuracy , Adjusted Rand Index , Normalized Mutual Information datasets (due to the overall structure of this paper, we list the full name of the datasets in this section. The relevant definition of the datastes is in the data test chapter and the reference is attached). We analyze and verify the theoretical nature of the DE-MPFSC algorithm through the imbalanced data set (Unified Communications Irvine Machine Learning Repository, namely UCI) UCI machine learning data set;
Part six: Gives relevant conclusions.
3. Differential Evolution Integration Algorithm Based on Mixed Penalty Function Screening Criteria
3.1. Mixed Penalty Function
Simply using internal or external penalty function can transform the constraint problem into an unconstrained problem, which reduces the computational difficulty. However, there are two major defects. First, the position of the effective solution under the constraint conditions often has swing effects when the values of the respective penalty factors are different. In other words, if the optimal point lies on the constraint boundary, the optimal point will not be searched by using the two algorithms, but the iterative sequence generated by the two algorithms will be infinitely close to the optimal point. Second, the higher penalty factors are not to easy to be controlled. For example, for the external penalty function, the penalty factor is too large, which causes the iterative points to rotate around the optimal point and be mistaken for the optimal point, which causes incorrect solutions; if the penalty factor is too small, the condition number of matrix of the penalty function term often becomes too large, so that it is difficult for the iterative sequence to converge to the optimal point. To balance the optimal defects of the two methods, we will establish a mixed penalty function as follows:
or
where
,
,
,
is a monotone decreasing penalty factor sequence.
is the probability of fuzzy iterative points (that is, when the penalty factor reaches the approximate level of the optimal point by iterating, the closer the optimal point is, the bigger the degree of of the iterative point approximating the optimal point. Then, when
, the degree of approximation between
and
will be in a fuzzy state), which can adjust different kinds of penalty functions.
3.2. The Screening Criteria of the Mixed Penalty Function
When solving nonlinear constrained optimization problems
, to effectively balance the swing problem generated by the equality penalty and inequality constraints. Let the iterative point be
after
k iterations, and the accuracy of the iterative solution
be a sufficiently small real value
. According to the brief screening rules of the effective solutions proposed by Deb [
41], we establish the screening criteria of the mixed penalty function based on the internal penalty function (this screening criterion was established to relax the spatial search area in order to facilitate cross-regional search of the objective function), the external penalty function and their normalization, as follows:
When the two effective solutions and are inside the feasible domain under the constraint conditions, the effective solution with a smaller internal penalty function value is considered the optimal approximate solution and then , satisfies the formula ;
When the two effective solutions and appear in the form of certain solutions and are located at the boundary of the feasible domain under the constraint conditions, we take the mean value of the two effective solutions as an optimal approximate solution. Then, , ;
When the two effective solutions and appearing in the form of uncertain solutions are located at the boundary of the feasible domain under the constraint conditions, let be an internal iterative solution, then all internal iterative solutions constitute a positive monotonic decreasing sequence within the feasible domain. In addition, let be an external iterative solution, then all external iterative solutions constitute a positive monotonic increasing sequence outside the feasible domain. At the same time, let , , then we take the mean value of the two effective solutions as an optimal approximate solution. Further, there is the following: , ;
When the two effective solutions and are outside the feasible domain under constraint conditions, the effective solution with the larger exterior penalty function value will be considered as the optimal approximate solution and then , will satisfy the formula ;
When the solution of the constraint is an invalid solution, in order to ensure the global nature of the algorithm search, we will perform the domain expansion operation based on the probability of the two invalid solutions: Firstly, the invalid solution is taken as the center and the original space search area is dimension expanded with the preset precision as the radius, so that the constraint condition is relaxed and the invalid solution is validated.
The screening criterion possesses the following properties:
The internal penalty function or the external penalty function can only detect the inside of the feasible domain or the outside of the feasible domain; they are unable to detect the feasibility of the area by crossing the regional boundary, which can divide the feasible domain into two parts, interior and exterior. However, by adding the probability condition based on the penalty function, we can balance the approximate optimal solution according to the approximate degree of the iterative points and the global optimal point . In addition, the criterion (2) or (3) plays a balancing role in the optimal conditions, avoiding the phenomenon of the solution that cannot move in the interior and exterior of the feasible domain.
When the iterative points completely fall in the interior or exterior of the feasible domain, the criteria (1) or (4) has the characteristics of being able to flexibly select the optimal area, which cannot only avoid the loss of the approximate global optimal point caused by a single optimal area but also ensure searching the iterative points in the whole area. Further, it can avoid the occurrence of an approximate solution with less precision due to the singularity of the solution.
The criterion (1) or (4) adopts the selection strategy in probability, which has better optimal effect for the strong constraint optimization problem [
42]. This is essentially an intensive strategy for elite retention strategies, so it can be applied to stronger constraint optimization problems.
The internal penalty function tends to select the iterative points that are far from the feasible domain boundary and the external penalty function tends to select the iterative points that are closer to the feasible domain boundary , both of which are not conducive to the global convergence of the optimal point. However, the criterion can effectively avoid the dispersion distribution of iterative points and can find the global optimal point faster in the condition of the induction of the differential evolution algorithm.
3.3. The Processing of the Constraint Conditions
In the constraint optimization problems shown by
, there are two kinds of constraints: equality constraints and inequality constraints. Inequality constraints tend to expand the searching range, which is beneficial to global searching and can reduce the probability of fault tolerance. However, The equality constraints present a one-dimensional linear region, which can narrow the optimal range and is not conducive to global optimization. For this reason, we conduct high-dimension processing on the equality constraints:
where
is a sufficiently small real-valued slack variable, which increases the searching range by expanding the dimension, which is beneficial to global searching and global convergence. However, the phenomenon will reduce the feasible domain searching ratio, which is not beneficial to finding the global convergent point
. To solve the problem, we adopt the method of the self-adaptive slack variable to deal with the equality constraints [
43].
where
is the proportion of feasible solutions satisfying the slack variable in the current population,
,
is the maximum value that violates the effective solution in the initial population.
3.4. Implementation of Differential Evolution Integrated Algorithm Based on the Screening Criterion of the Mixed Penalty Function
In this paper, the mixed penalty function screening criterion is applied to the DE algorithm, the validity and the convergence of the algorithm are analyzed by the Markov model. Finally, we obtain the differential evolution integrated algorithm based on mixed function screening criteria (DE-MPFSC algorithm).
In the DE-MPFSC algorithm, let the space dimension be
, the population individuals are a set of
,
is an
n-dim decision variable,
is the
n-dim individual iterative step variable. The initial population is evenly distributed in the
n-dim space constraints. The mutation operator, crossover operator and selection operator
algorithm can act on the
n-dim space constraints and work on all individuals in the initial population to improve their searching ability, where the searching step calculation formula of the DE-MPFSC algorithm is as follows:
where
t is the number of iterations, and
and
are the self-adaptive learning rate of the population individuals. To ensure the accuracy of the calculation, we calculate according to the method proposed by Schwefel [
44]:
and
.
and
are all real uniform Gaussian distributions with a mean of 0 and a variance of 1. Then, we select individuals in the algorithm according to the mixed penalty function screening criteria and keep the outstanding individuals of
parent individuals and
progeny individuals to the next generation. The DE-MPFSC algorithm steps are as follows:
STEP 1: Initializing variables: According to the number of the population and the number of individuals, let , then generate the initial population within parents individuals and progeny individuals and each of the individuals corresponds with .
STEP 2: Calculating the fitness function value of the initial variable: Calculate the fitness function values of parent individuals and record the maximum value and minimum value. At the same time, calculate the fitness function values at each iterative points inside and outside the feasible domain under the condition of the initial conditions and records the maximum value and minimum value.
STEP 3: According to Formulas , and , generate corresponding mutation individuals.
STEP 4: Calculating the fitness function value of corresponding mutation individuals and calculating the probability of the fitness function value of the iterative points of different regions based on screening criterion of the mixed penalty function.
STEP 5: Conducting selection and crossover operations for corresponding mutation individuals according to Formulas and , generating progeny individuals and calculating their fitness function values and recording the iterative points that accord with the accuracy of the problem.
STEP 6: Consisting of the mutation individuals and the progeny individuals generated by the selection and crossover operations into a new individuals. According to the screening criteria, choose individuals of t generation as generation parents individuals and record the iterative points that accord with the accuracy of the problem.
STEP 7: Generating a series of iterative points , in two parts inside and outside the feasible domain.
STEP 8: According to the iterative points sequence , generated by step 7, the former are arranged in a monotonic decreasing sequence and the latter is arranged in a monotonic increasing sequence.
STEP 9: Calculating according to the probability formula of the screening criteria.
STEP 10: Substituting the result of step 9 into Formulas or , calculating the fitness function values and iterative points and judging whether the accuracy is satisfied.
STEP 11: If the iterative points satisfy the screening criteria and accuracy, then terminate the algorithm flows, otherwise let , return step 3.
5. Entanglement and Numerical Test of DE-MPFSC Algorithm Stability
We focus on the effect of the DE-MPFSC algorithm in imbalanced data integration, rather than focusing on the advantages of this algorithm over other algorithms. To test the advantages of the DE-MPFSC in processing the imbalanced data and to ensure the high property of the DE-MPFSC algorithm in the imbalanced data integration process compared to the traditional DE algorithm, we will conduct test experiments on the stability and effectiveness of the DE-MPFSC algorithm. Therefore, based on the topology structure between the convergence rate and convergence precision of the individuals and the quantum relationship [
20], including the entanglement phenomenon of individual convergence speed and convergence precision in the evolution process, we introduce the concept of entanglement degree
and the entanglement degree error
, analyze the stability of DE-MPFSC algorithm and DE algorithm from system theory.
5.1. Entanglement Degree and Entanglement Degree Error
Lemma 4 ([
20])
. Let be a continuous differentiable function defined in a complete normed linear space and , is a complete normed linear space. is a generalized n dimension complete normed linear subspace and forms an open cover of , then or . If , then the following formula, The lemma is detailed in Reference [
20], which theoretically shows that the convergence accuracy and convergence speed of the evolutionary individual in the iterative process represent an uncertain quantum relationship. The lemma is important for analyzing the distribution of the optimal point of the segmentation regions. The feasible points distributed in different regions force the internal and external penalty functions to search in different segmentation regions by the jumping function of probability. Let the region have two local optimal feasible points
and
, which are distributed to satisfy
and
,
, where
t is the number of individual iterations.
The individual entanglement degree we define is
, which is the numerical ratio between the convergence speed and convergence precision of individuals during population evolution. The higher the entanglement degree, the better the convergence stability of the algorithm or the faster convergence of the algorithm and the higher convergence precision. The entanglement degree error
is the correlation of the entanglement effect, which is the efficiency of the entanglement decided by
. The smaller the entanglement degree error, the smaller the fluctuation of the algorithm in the iterative process. Where,
is the individual convergence speed,
is the individual convergence precision,
is the entanglement coefficient,
is the discrete degree between entangled individuals. About the entangled relationship between convergence precision and convergence speed, we can refer to the topological geometric relationship described by the author of Reference [
20]. The effect of entanglement degree clearly shows the efficiency of algorithm convergence in the geometry.
5.2. Entangled Image and Analysis of DE-MPFSC Algorithm
Since data integration is closely related to data size and data dimension, we first reduces dimensionality of high-dimensional data before performing data experiments. Let the number of population individuals is 1000. The experiment is divided into 5 times on average, and the number of individuals per test is gradually increased from 200. Test runs an average of 5 times. The initial variation factor and initial crossover probability of the algorithm are determined by the literature [
19,
20]. The initial mutation factor and the initial crossover probability are respectively set to
. In the following experiments,
are dynamically adjusted according to the DE-MPFSC algorithm. The
Figure 1,
Figure 2,
Figure 3,
Figure 4,
Figure 5,
Figure 6,
Figure 7,
Figure 8,
Figure 9 and
Figure 10 below shows the numerical image of the DE-MPFSC algorithm.
First, we analyze the stability and effectiveness of the DE-MPFSC algorithm from the number of population individuals. When the number of individuals are 200, 400, 600, 800, 1000, the DE algorithm entanglement degree about convergence precision and convergence speed are divided into two branches. However, the convergence speed and convergence precision are higher discrete and they do not converge to the entanglement center point along the convergence curve. The error test rates are 0.1, 0.65, 0.65, 0.55, 1, the correct test rate is 0 and the detection efficiency is 0, which shows that when the population individuals are 200, 400, 600, 800, 1000, the stability of the DE algorithm is not high and it is easy to generate fluctuations. Compared with the DE algorithm, the entanglement degree of the DE-MPFSC algorithm about convergence precision and convergence speed are divided into two branches. The convergence speed and convergence precision are lower discrete and they converge to the entanglement center point along the convergence curve. The error test rates are 0.25, 0.05, 0.05, 0.05, 0.05, the correct test rate are 0.75, 0.95, 0.95, 0.95, 0.95 and the detection efficiency are 3, 19, 19, 19, 19, which shows that when the population individuals are 200, 400, 600, 800, 1000, the stability of the DE-MPFSC algorithm is higher than DE algorithm.
In particular, when the number of individuals is 1000, the convergence speed and convergence precision are slightly lower than that of 800 individuals but the distribution is still scattered. They generate a more dramatic marginal differentiation phenomenon on branches and they do not converge to the entanglement center point along the convergence curve. The error test rate is 1, the correct test rate is 0 and the detection efficiency is 0. The situation must not be used as a basis for effective convergence, which shows that when the population individuals is 1000, the stability of the DE algorithm cannot be accepted and it is easy to generate fluctuations. Compared with the DE algorithm, the entanglement degree of the DE-MPFSC algorithm is divided into two branches. The convergence speed and convergence precision are lower discrete, which can further weak marginal differentiation phenomenon on branches and they converge to the entanglement center point along the convergence curve.
In summary, the stability of the DE-MPFSC algorithm is higher than that of the DE algorithm.
5.3. Numerical Test and Analysis of DE-MPFSC Algorithm
Since the data integration is closely related to the data size and data dimension, we set the population individuals to 1000, which is divided into 5 experiments and the average operation is 5 times for each experiment. According to the recommendations of Professor Storn R and Professor Price in Reference [
19], the initial mutation factor and initial crossover probability are set to
.
Table 1,
Table 2,
Table 3,
Table 4 and
Table 5 below shows the numerical effects of (Differential Evolution Algorithm Based on Mixed Penalty Function Screening Criteria)DE-MPFSC algorithm and (Differential Evolution)DE algorithm.
The above five data sheets in
Table 1,
Table 2,
Table 3,
Table 4 and
Table 5 are data analysis of the entanglement degree
, entanglement degree error
and error rate
about convergence precision and convergence speed. We experimented with 200 individuals as the cardinal number, repeating five times per experiment. We record the entanglement degree
, entanglement degree error
and error rate
about DE algorithm and DE-MPFSC algorithm in real time. At the same time, we record the dynamic mutation factor
F and the dynamic crossover probability
of the DE-MPFSC algorithm and analyze the data tables one by one.
In summary, when the number of individuals is 200, 400, 600, 800 or 1000, the range of entanglement error rate of the DE algorithm and the DE-MPFSC algorithm are respectively and , which shows that the entanglement degree of the DE-MPFSC algorithm is significantly higher than that of the DE algorithm, and the stability of the DE-MPFSC algorithm is better. The range of entanglement degree of the DE algorithm and the DE-MPFSC algorithm are respectively and , which shows that the entanglement degree of the DE-MPFSC algorithm is significantly higher than that of the DE algorithm, and the fluctuation of the DE-MPFSC algorithm is lower. The entanglement error ranges of the DE algorithm and the DE-MPFSC algorithm are respectively and , which shows that the entanglement degree of the DE-MPFSC algorithm is significantly higher than that of the DE algorithm, and the DE-MPFSC convergence of the algorithm is stronger. In addition, there IS an additional discovery during the experiment that the optimal dynamic mutation factor and the optimal dynamic crossover probability of the DE-MPFSC algorithm are and respectively.
In summary, the convergence, stability and effectiveness of the DE-MPFSC algorithm are higher than the DE algorithm, which has obvious advantages for the integration of individuals.
5.4. Test and Analysis of DE-MPFSC Algorithm about Several Classic DE Algorithms
5.4.1. Entanglement Degree Test of DE-MPFSC Algorithm about Several Classic DE Algorithms
We choose four classic DE improved algorithms JDE, OBDE, DEGL and SADE for related tests. Through entanglement degree test and data analysis, we can verify the related performance of the DE-MPFSC algorithm and its superiority in data integration. From the previous experiments, we can know that the DE-MPFSC is better than the DE in stability, effectiveness and convergence. At the same time, the optimal dynamic mutation factor
and the optimal crossover probability
of the DE-MPFSC algorithm were obtained through data experiments. In the following test experiments, we use them as the initial mutation factor and the initial cross probability to better compare with other classic DE algorithms. Because the function of data integration is closely related to the data size and dimension, the algorithm in this paper sets the number of individuals to 1000 and are divided into 5 experiments. The number of individuals gradually increases in each test and the test runs an average of 4 times. We record relevant experimental indicators: dynamic mutation probability
F, dynamic cross probability
, entanglement degree
, entanglement error
, error rate
, average individual loss efficiency.
Figure 11,
Figure 12,
Figure 13,
Figure 14 and
Figure 15 below shows a numerical results of the experiment.
We analyze the above
Figure 11,
Figure 12,
Figure 13,
Figure 14 and
Figure 15. First we analyze from the entanglement degree. In the process of the DE-MPFSC algorithm, the convergence speed and convergence precision of population individuals gradually stabilized as the population increased. The density and integration of the two entangled images gradually increase and, at the same time, the edge differentiation phenomenon is further weakened on each branch, which make the entangled branch converge to the center of the entanglement along the convergence curve. Compared with the DE-MPFSC algorithm, the entanglement phenomenon of the convergence speed and convergence precision of the JDE algorithm is less obvious. When the number of individuals is 200, 600 and 1000, the entangled image is highly dispersed, which causes a severe edge differentiation on the entangled branches. Although, when the number of individuals is 400 and 600, the state of entanglement is better than that of 200, 600 and 800, the continuity of individual evolution is not reached as a whole, it does not converge to the center of entanglement along the convergence curve. JDE’s entangled image shows irregular fluctuations. The OBDE algorithm is slightly better than the JDE algorithm in fluctuation, when the number of individuals is 200 and 1000, the entangled branches are severely differentiated and the dispersed degree is higher than that of JDE, which shows that the OBDE algorithm is not suitable. When the number of individuals is 400, 600 and 800, the dispersed degree of the entangled branches of the OBDE algorithm is lower but the edge dispersion of the entangled branches is more obvious than the DE-MPFSC algorithm. OBDE and JDE algorithms also do not show evolutionary continuity. The performance of the DEGL algorithm on evolutionary fluctuation is weaker than JDE and OBDE. When the number of individuals is 200 and 1000, the entangled branches are highly dispersed and do not converge to the entangled center. The DEGL is similar to the JDE algorithm and the OBDE algorithm. The individual convergence speed and convergence accuracy completely deviate from the entanglement center. The number of entangled branches integrated with 400 and 800 individuals is much higher than that with 200 and 1000 individuals. Although DEGL converges to the center of entanglement along the convergence curve, the edge differentiation of entangled branches is still not resolved. The continuity, volatility and convergence of the SADE algorithm are the closest to the DE-MPFSC algorithm but, unfortunately, a discrete state occurs when the number of individuals is 200, which makes the entanglement center distributed on the fitted line rather than an entanglement center. When the number of individuals is 400, 600, 800 and 1000, the edge dispersion of the entangled branches is weakened but it is not used for large-scale evolution.
Then we analyze from the correct test rate of entangled branches. The entanglement degree of the DE-MPFSC algorithm is divided into two branches about convergence precision and convergence speed. From the image, the convergence speed and convergence precision are lower discrete and they converge to the entanglement center point along the convergence curve. The detection efficiency are 3, 19, 19, 19, 19, which shows that when the population individuals are 200, 400, 600, 800, 1000, the stability of the DE-MPFSC algorithm is higher. Due to the fluctuation of the JDE algorithm, its correct test rate is 0, and its error test rates are 0.4, 0.4, 0.95, 0.6 and 0.95, respectively. JDE’s higher error test rate is easy to eliminate optimal individuals. The OBDE algorithm is weaker than the JDE algorithm in fluctuation. Its correct test rate is 0 but its error test rates are 1, 1, 0.4, 0.4, 1, which are higher than that of the JDE algorithm. When the number is 1000, the error test rate suddenly rebounds. The DEGL algorithm has a non-zero in correct test rate, which at least shows that there is a certain ability to adapt to individual mutations. But when the number of individuals is 200 and 1000, the error test rate of entangled branches is 1 and 0.95, which is still too high. When the number of individuals is 400 and 800, the correct test rate of entangled branches declines. The instability of this entanglement behavior is still not conducive to the evolution of mutation individuals. The correct test rate of the SADE algorithm are 0, 0, 0.35, 0.38, 0.35, and the error test rates are 0.4, 0.15, 0.15, 0.2, 0.15, which are much better than the JDE, OBDE and DEGL algorithms. Stability and continuity should be more conducive to the evolutionary behavior of large-scale individuals. However, compared to the DE-MPFSC algorithm, because of the correct test rate of the DEGL algorithm being too low, the detection of mutation individuals is not adaptive.
Finally, from the individual loss curve, we can know that the fluctuation of the JDE is higher than that of DE-MPFSC, OBDE, DEGL and SADE, indicating that during the individual evolution process, the population individual loss degree is higher. Especially for large-scale evolutionary populations, the risk of mutation individuals being lost during evolution will greatly increase. The individual loss degree of OBDE and DEGL rebounded in the later stages of evolution. The loss degree of mutation individuals showed an increasing trend relative to DE-MPFSC, JDE and SADE. The individual loss of SADE and DE-MPFSC is similar but the individual loss of SADE is higher than the latter. Because the individual losses of JDE, OBDE, DEGL and SADE are higher than DE-MPFSC, the advantages of the latter are obvious in stability. By analyzing the error rate curve, we can know that when DE-MPFSC is processing large-scale individuals (based on 200 individuals), the error rate of individual loss is lower than other algorithms.
5.4.2. Numerical Test of DE-MPFSC Algorithm about Several Classic DE Algorithms
We analyze the test results of the above data
Table 6. We already know that when the number of individuals is 200, 400, 600, 800, 1000, the entanglement error range of DE-MPFSC is
, The entanglement range of DE-MPFSC is
, and the entanglement error range of DE-MPFSC is
. When the number of individuals is 200, the entanglement error rate ranges of JDE, OBDE, DEGL and SADE are
,
,
,
, compared with DE-MPFSC, the former has a higher entanglement error rate, which shows that the entanglement degree of DE-MPFSC (since we have done a detailed analysis of the entanglement effect of the DE-MPFSC algorithm in the case of the same number of individuals, we will not repeat it here. For all comparative data, see
Section 5.3) is significantly higher than that of JDE, OBDE, DEGL and SADE, and the algorithm has good stability and effectiveness. We also found a special case, when the number of individuals is 200, the entanglement degree and entanglement degree error of JDE, OBDE, DEGL are infinite or non-existent. The reason for the case is that the entangled branches are too discrete or the edges of the entangled branches are too severely differentiated. Obviously, their entanglement efficiency is significantly lower than DE-MPFSC. The entanglement range and entanglement error of SADE are
and
,
, which is also lower than the entanglement effect of DE-MPFSC, which shows that the degree of entanglement of DE-MPFSC is significantly higher than that of JDE, OBDE, DEGL and SADE. The algorithm has small fluctuations and good convergence. When the number of individuals is 400, the entanglement error rate ranges of JDE, OBDE, DEGL and SADE are
,
,
,
, compared with DE-MPFSC, the entanglement error rate is higher, which shows that the degree of entanglement of DE-MPFSC is significantly higher than that of JDE, OBDE, DEGL and SADE, and the algorithm is better stability and effectiveness. The entanglement ranges of JDE, OBDE, DEGL and SADE are
,
,
,
, which shows that the degree of entanglement of the DE-MPFSC is significantly higher than that of JDE, OBDE, DEGL, SADE, and the DE-MPFSC algorithm has less volatility. The entanglement degree error ranges of JDE, OBDE, DEGL and SADE are
,
,
,
, which shows that the degree of entanglement of DE-MPFSC is significantly higher than that of JDE, OBDE, DEGL and SADE, and the algorithm has strong convergence. When the number of individuals is 600, 800, 1000, the comparison between DE-MPFSC and JDE, OBDE, DEGL and SADE is similar, and the data has been marked in the
Table 6. On the whole, the effectiveness, stability, and convergence of DE-MPFSC are higher than JDE, OBDE, DEGL and SADE on average.
Then, we analyze from the state of entangled branches. When the number of individuals is 200, 400 or 1000, the entanglement degree and entanglement degree error of JDE do not exist, and the entanglement branches are in a discrete state. As the number of individuals increases, the entanglement effect appears fluctuating. When the number of individuals is 200 or 1000, the entanglement degree and entanglement degree error of OBDE do not exist, and the entangled branches are in a discrete state. When the number of individuals is 200, 400, 600 or 800, the integration of entangled branches gradually increases. However, the increase of OBDE is weaker than that of DE-MPFSC. The entanglement of the former shows a coexistence state of gradualism and fluctuation. When the number of individuals is 200 or 1000, the entanglement degree and entanglement degree error of DEGL do not exist, and the entangled branches are in a discrete state. When the number of individuals is 400 and 800, the integration of entangled branches begins to appear and it reaches the best state when the number of individuals is 600. The entanglement of DEGL shows coexistence of symmetry and fluctuation. The case is not conducive to the stable development of population evolution. When the number of individuals gradually increases, the entanglement degree and entanglement degree error of SADE are always integrated and increase with the number of individuals. However, due to the dispersion of the edges of the entangled branches, the entanglement increases slowly. Overall, DE-MPFSC has better stability and effectiveness than JDE, OBDE, DEGL and SADE.