2.1. SVM System and SVR
SVM (Support Vector Machine) is a supervised learning algorithm that was first proposed by Cortes and Vapnik [14] in 1995. It is based on the statistical-learning concept of the VC (Vapnik–Chervonenkis) dimension and improves the generalization performance of learning machines by seeking structural risk minimization [15]. The VC dimension of a set of indicator functions $Q(z, \alpha)$, $\alpha \in \Lambda$, is the maximum number $h$ of vectors $z_1, \ldots, z_h$ that can be separated into two classes in all $2^h$ possible ways using functions of the set. If for any number $h$ there exists a set of $h$ vectors that can be shattered by the set of functions, then the VC dimension is equal to infinity [16]. The VC dimension measures the capacity of a hypothesis space; capacity is a measure of complexity that quantifies the expressive power, richness or flexibility of a set of functions by assessing how wiggly its members can be. Notably, SVM can obtain good statistical laws under the premise of limited samples [17]. SVM is divided into support vector classification (SVC) and SVR, which are used to solve data classification and data regression problems, respectively. This paper uses an SVR model for vessel trajectory prediction.
A set of sample data $\{(x_i, y_i)\}_{i=1}^{n}$ is given for prediction, where $x_i$ are the input variables and $y_i$ are the output variables. The core of the SVR model is the kernel function. By introducing a kernel function, SVM subtly handles the inner product operation in the high-dimensional feature space, which solves the problem of nonlinear classification and prediction. Commonly used kernel functions mainly include linear kernel functions, polynomial kernel functions, the Radial Basis Function (RBF) and sigmoid kernel functions. Among them, the RBF is the most widely used kernel function and can realize nonlinear mapping [18,19]. It performs well for both large and small samples and has fewer parameters than polynomial kernel functions. In this study, the RBF is employed as the kernel function for the SVR model [20]:
$K(x_i, x_j) = \langle \varphi(x_i), \varphi(x_j) \rangle = \exp(-g \|x_i - x_j\|^2)$ (1)
where $K(x_i, x_j)$ stands for the kernel function; $x_i$ and $x_j$ are vectors in the initial low-dimensional feature space; $g$ is the kernel function parameter; $\varphi(\cdot)$ is the nonlinear mapping function, which maps the input data into the high-dimensional feature space; and the operator $\langle \cdot, \cdot \rangle$ is the inner product symbol. Through the RBF kernel mapping, each sample point $(x_i, y_i)$ is fitted as closely as possible by the following SVR linear model:
$f(x) = \langle w, \varphi(x) \rangle + b$ (2)
where $w$ is the weight vector, lying in the high-dimensional feature space; and $b$ is the bias, with $b \in \mathbb{R}$.
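For illustration, Equation (1) can be written as a few lines of NumPy; the function name rbf_kernel and the sample values below are ours, not from the study:

```python
import numpy as np

def rbf_kernel(x_i, x_j, g):
    """RBF kernel K(x_i, x_j) = exp(-g * ||x_i - x_j||^2), as in Equation (1)."""
    return np.exp(-g * np.sum((x_i - x_j) ** 2))

# Example: two 2-D sample points with kernel parameter g = 0.5
x_i = np.array([1.0, 2.0])
x_j = np.array([1.5, 1.0])
print(rbf_kernel(x_i, x_j, g=0.5))  # value in (0, 1]; equals 1 when x_i == x_j
```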
SVR takes an $\varepsilon$-insensitive function as its loss function. First, a constant $\varepsilon$ is defined for the specific problem, where $\varepsilon > 0$. For a sample $(x_i, y_i)$, if $|f(x_i) - y_i| \le \varepsilon$, there is no loss; otherwise, the corresponding loss is $|f(x_i) - y_i| - \varepsilon$. That is, the $\varepsilon$-insensitive loss function can be written as:
$L_\varepsilon(f(x_i), y_i) = \max\{0, |f(x_i) - y_i| - \varepsilon\}$ (3)
As shown in Figure 1, an interval band with $f(x)$ as the center and a width of $2\varepsilon$ is established. If a training sample falls into the interval band, the prediction is considered correct.
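A minimal sketch of the $\varepsilon$-insensitive loss in Equation (3), with illustrative residuals; the function name is ours:

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero loss inside the 2*eps band around the prediction, linear outside."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

# Residuals of 0.05 and 0.3 with eps = 0.1: the first falls inside the band
print(eps_insensitive_loss(np.array([1.05, 1.3]), np.array([1.0, 1.0]), eps=0.1))
# -> [0.  0.2]
```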
In order to minimize the loss function, based on the criterion of structural risk minimization [21], the constraints are determined and the objective function is optimized, constructing the following optimization criterion:
$\min_{w, b, \xi, \xi^*} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$ (4)
$\text{s.t.} \quad y_i - \langle w, \varphi(x_i) \rangle - b \le \varepsilon + \xi_i, \quad \langle w, \varphi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0, \ i = 1, \ldots, n$ (5)
This is the convex quadratic programming form of the original SVR problem, where $C$ is the penalty factor, with $C > 0$, which aims to find a compromise between generalization performance and training error; $\varepsilon$ is the maximum tolerance value; and $\xi_i$ and $\xi_i^*$ are slack variables introduced to avoid over-training. Combining these variables allows the model a certain degree of fault tolerance [22].
The Lagrange function can be obtained by introducing Lagrange multipliers:
$L = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) - \sum_{i=1}^{n} (\mu_i \xi_i + \mu_i^* \xi_i^*) - \sum_{i=1}^{n} \alpha_i (\varepsilon + \xi_i - y_i + f(x_i)) - \sum_{i=1}^{n} \alpha_i^* (\varepsilon + \xi_i^* + y_i - f(x_i))$ (6)
where $\alpha_i$, $\alpha_i^*$, $\mu_i$ and $\mu_i^*$ are Lagrange multipliers and $\alpha_i, \alpha_i^*, \mu_i, \mu_i^* \ge 0$.
The partial derivatives of $L$ with respect to $w$, $b$, $\xi_i$ and $\xi_i^*$ are computed separately. Letting the partial derivatives be equal to zero [23] yields:
$w = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) \varphi(x_i), \quad \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0, \quad C = \alpha_i + \mu_i, \quad C = \alpha_i^* + \mu_i^*$ (7)
Substituting Equation (7) into Equation (6) gives the SVR dual problem as:
$\max_{\alpha, \alpha^*} \ -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*)$
$\text{s.t.} \quad \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0, \quad 0 \le \alpha_i, \alpha_i^* \le C$ (8)
The above process must satisfy the KKT conditions [24], that is:
$\alpha_i (\varepsilon + \xi_i - y_i + f(x_i)) = 0, \quad \alpha_i^* (\varepsilon + \xi_i^* + y_i - f(x_i)) = 0, \quad (C - \alpha_i)\xi_i = 0, \quad (C - \alpha_i^*)\xi_i^* = 0$ (9)
Select a component $\alpha_j$ of $\alpha$ in the open interval $(0, C)$ to calculate the parameter $b$:
$b = y_j - \varepsilon - \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) K(x_i, x_j)$ (10)
Finally, substituting the values of $w$ and $b$ into formula (2) gives:
$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) K(x_i, x) + b$ (11)
which is called the decision function of SVR.
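In practice, the dual problem and decision function above are solved by off-the-shelf libraries. The following scikit-learn sketch shows how $C$, $\varepsilon$ and $g$ (gamma in scikit-learn) enter an RBF-kernel SVR; the toy data and parameter values are illustrative, not those used in this study:

```python
import numpy as np
from sklearn.svm import SVR

# Toy regression data: y = sin(x) with noise
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 100).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(100)

# gamma plays the role of g in Equation (1); C and epsilon as in Equations (4)-(5)
model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=0.5)
model.fit(X, y)
print(model.predict(X[:5]))
```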
2.2. ACDE Algorithm
Differential evolution (DE) is a stochastic parallel optimization algorithm [25,26] based on population differences, proposed by Storn and Price in 1997 [27] on the basis of evolutionary ideas such as the genetic algorithm. DE solves optimization problems by imitating the heuristic swarm intelligence generated by competition and cooperation among organisms [28]; the algorithm has strong usability, global optimization capability and stability [29] and is widely used in clustering optimization calculations, neural network optimization, constrained optimization calculations and filter design. However, the DE algorithm also has two defects: (1) the fixed scaling factor hinders the full search performance of the algorithm when performing mutation operations [30]; (2) in the later stage of optimization, the population diversity is reduced and the algorithm falls easily into local optima [31,32]. The first defect was largely addressed over a decade ago by Brest et al. [33], who assign a separate scaling factor to each individual so that it can be automatically adjusted according to two thresholds during the run of the algorithm; however, this inevitably introduces some randomness and blindness [34].
To solve these problems, this study uses an improved DE algorithm, ACDE, which improves upon DE in two aspects. First, in the mutation operation stage, the adaptive scaling factor changes with the number of iterations, thus addressing the insufficient searching ability caused by a fixed scaling factor. In the standard differential evolution algorithm, increasing the scaling factor expands the search range and speeds up individual updates in the population, while reducing the scaling factor enhances the local search ability and improves the convergence of the algorithm. Based on this, in the early stage of evolution, the search range can be expanded by increasing the scaling factor; in the later stage, the scaling factor is reduced to speed up convergence. At the same time, in order to avoid destroying the genetic structure of dominant individuals, the scaling factor of individuals with higher fitness should be reduced, while, to promote the improvement of the structure of disadvantaged individuals, the scaling factor of individuals with lower fitness should be increased. Second, chaotic motion has the characteristics of sensitivity, randomness, regularity and ergodicity and can traverse the entire state space without repetition within a specified range. Based on chaos theory, a chaotic fine search strategy is proposed to improve the local search efficiency of the algorithm and avoid premature convergence of the population.
For any optimization problem,
$\min f(X), \quad X = (x_1, x_2, \ldots, x_N), \quad x_j^{\min} \le x_j \le x_j^{\max}, \ j = 1, 2, \ldots, N$ (12)
where $N$ is the dimension of the solution space and $x_j^{\max}$ and $x_j^{\min}$ represent the upper and lower bounds of the range of $x_j$ values, respectively. The specific process of the ACDE algorithm is as follows:
Step 1: Population initialization.
D individuals satisfying the constraint condition are randomly generated in the N-dimensional solution space; the i-th individual of the 0-th generation population is denoted as:
$X_i^0 = (x_{i,1}^0, x_{i,2}^0, \ldots, x_{i,N}^0), \quad i = 1, 2, \ldots, D$ (13)
At the same time, the maximum number of iterations is assumed to be M. The value of the j-th dimension of the i-th individual is calculated according to the following formula:
$x_{i,j}^0 = x_j^{\min} + rand(0,1) \cdot (x_j^{\max} - x_j^{\min}), \quad j = 1, 2, \ldots, N$ (14)
where $rand(0,1)$ represents a random number in the interval [0, 1] that obeys a uniform distribution.
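A minimal NumPy sketch of the initialization in Equation (14), assuming bound vectors x_min and x_max of length N:

```python
import numpy as np

def init_population(D, N, x_min, x_max, rng):
    """Equation (14): uniform random individuals inside the box [x_min, x_max]."""
    return x_min + rng.random((D, N)) * (x_max - x_min)

rng = np.random.default_rng(0)
pop = init_population(D=20, N=3, x_min=np.array([0.0, 0.0, 0.0]),
                      x_max=np.array([100.0, 1.0, 10.0]), rng=rng)
print(pop.shape)  # (20, 3)
```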
Step 2: Chaos initialization [13]. Chaotic motion has the characteristics of randomness, regularity and ergodicity and can traverse the entire state space without repetition within a specified range. Generally speaking, the chaotic variable is first mapped to the value interval of the optimization variable and then the chaotic variable is used for optimization. Here, we use the logistic mapping commonly used in chaotic systems, with the basic expression as follows:
$z_{n+1} = \mu z_n (1 - z_n)$ (15)
where $z_n \in (0, 1)$; $n$ is a natural number; and $\mu$ is the control parameter, with $0 < \mu \le 4$; when $\mu = 4$, the system is completely chaotic.
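The logistic map of Equation (15) can be iterated in a few lines; the starting value and step count below are arbitrary:

```python
def logistic_map(z0, mu=4.0, steps=5):
    """Iterate z_{n+1} = mu * z_n * (1 - z_n) starting from z0 in (0, 1)."""
    z = z0
    trajectory = [z]
    for _ in range(steps):
        z = mu * z * (1.0 - z)
        trajectory.append(z)
    return trajectory

print(logistic_map(0.3))  # with mu = 4 the sequence wanders chaotically over (0, 1)
```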
Step 3: Mutation. Different from the genetic algorithm, the DE algorithm realizes individual mutation by differential means; that is, two different individuals in the population are selected and their scaled vector difference is synthesized with the individual to be mutated [35]. The k-th generation population generated by k iterations is denoted as:
$X_i^k = (x_{i,1}^k, x_{i,2}^k, \ldots, x_{i,N}^k), \quad i = 1, 2, \ldots, D$ (16)
Three mutually different individuals $X_{r1}^k$, $X_{r2}^k$ and $X_{r3}^k$ ($r1 \ne r2 \ne r3 \ne i$) are randomly selected from the population and the mutation operations are performed according to the following equations:
$v_{i,j}^{k+1} = x_{r1,j}^k + F_k (x_{r2,j}^k - x_{r3,j}^k)$ (17)
$F_k = F_{\min} + (F_{\max} - F_{\min}) \dfrac{f_{sub} - f_{best}}{f_{worst} - f_{best}}$ (18)
where $F_k$ is the adaptive scaling factor, which controls the influence of the difference vector and changes as the number of iterations k changes, with $F_{\min} \le F_k \le F_{\max}$; $F_{\max}$ and $F_{\min}$ are the upper and lower limits of the scaling factor, respectively; and $f_{best}$, $f_{sub}$ and $f_{worst}$ are the optimal, suboptimal and worst fitness values, respectively, of the randomly selected sub-population of the current k-th generation. Through the mutation operations, an intermediate individual is created:
$V_i^{k+1} = (v_{i,1}^{k+1}, v_{i,2}^{k+1}, \ldots, v_{i,N}^{k+1})$
In the process of evolution, in order to ensure the validity of the solution, it is necessary to judge whether the “genes” in the “chromosome” satisfy the boundary conditions. If the boundary condition is not met, the “gene” is generated in the same way as the initial population, that is, it is regenerated by a random method.
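A sketch of the mutation step with boundary repair, under our reading of Equations (17) and (18): the three sampled individuals are ranked by fitness to obtain f_best, f_sub and f_worst and out-of-bound genes are regenerated randomly as described above. The 0.5 fallback for a zero fitness spread is our assumption:

```python
import numpy as np

def mutate(pop, fitness, F_min, F_max, x_min, x_max, rng):
    """DE mutation with an adaptive scaling factor per Equations (17)-(18)."""
    D, N = pop.shape
    mutants = np.empty_like(pop)
    for i in range(D):
        # Sample three distinct individuals, all different from i
        r1, r2, r3 = rng.choice([j for j in range(D) if j != i], 3, replace=False)
        trio = sorted((r1, r2, r3), key=lambda r: fitness[r])  # best first (min problem)
        f_best, f_sub, f_worst = (fitness[r] for r in trio)
        # Equation (18): scale F by the relative fitness spread of the trio
        span = f_worst - f_best
        F_k = F_min + (F_max - F_min) * ((f_sub - f_best) / span if span > 0 else 0.5)
        # Equation (17): scaled difference mutation
        v = pop[r1] + F_k * (pop[r2] - pop[r3])
        # Out-of-bound genes are regenerated uniformly, as for the initial population
        bad = (v < x_min) | (v > x_max)
        v[bad] = x_min[bad] + rng.random(bad.sum()) * (x_max[bad] - x_min[bad])
        mutants[i] = v
    return mutants
```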
Step 4: Crossover. The inter-individual crossover operation is performed on the k-th generation population $X_i^k$ and the intermediate individual $V_i^{k+1}$ produced by the mutation operations [36]. The specific method is as follows:
$u_{i,j}^{k+1} = \begin{cases} v_{i,j}^{k+1}, & \text{if } rand(0,1) \le cr \text{ or } j = j_{rand} \\ x_{i,j}^k, & \text{otherwise} \end{cases}$ (19)
where $cr$ is the probability of crossover, with $0 \le cr \le 1$; and $j_{rand}$ is a random integer on the interval [1, 2, 3, ..., N], which ensures that the trial individual $U_i^{k+1}$ differs from $X_i^k$ in at least one dimension.
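The binomial crossover of Equation (19) in vectorized NumPy form; j_rand forces at least one mutated gene per trial individual:

```python
import numpy as np

def crossover(pop, mutants, cr, rng):
    """Equation (19): binomial crossover between parents and mutant vectors."""
    D, N = pop.shape
    mask = rng.random((D, N)) <= cr
    j_rand = rng.integers(0, N, size=D)      # one forced position per individual
    mask[np.arange(D), j_rand] = True
    return np.where(mask, mutants, pop)
```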
Step 5: Selection. Using the principle of greedy choice, the better individuals [37] are selected to form the next generation population according to the following evaluation function:
$X_i^{k+1} = \begin{cases} U_i^{k+1}, & \text{if } f(U_i^{k+1}) < f(X_i^k) \\ X_i^k, & \text{otherwise} \end{cases}$ (20)
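A sketch of the greedy selection in Equation (20), assuming a minimization problem (lower fitness is better), as with the MSE fitness used later:

```python
import numpy as np

def select(pop, trials, fitness, eval_fn):
    """Equation (20): keep a trial vector only where it improves the fitness."""
    trial_fitness = np.array([eval_fn(t) for t in trials])
    improved = trial_fitness < fitness
    new_pop = np.where(improved[:, None], trials, pop)
    new_fitness = np.where(improved, trial_fitness, fitness)
    return new_pop, new_fitness
```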
Step 6: Chaotic fine search. Letting $X^* = (x_1^*, x_2^*, \ldots, x_N^*)$ denote the best individual of the current generation, a chaotic fine search is performed on the decision variables: each $x_j^*$ is mapped to a chaotic variable $z_j$ according to the following formula:
$z_j = \dfrac{x_j^* - x_j^{\min}}{x_j^{\max} - x_j^{\min}}$ (21)
The chaotic variables are converted back into decision variables by the inverse mapping:
$x_j = x_j^{\min} + z_j (x_j^{\max} - x_j^{\min})$ (22)
The next-generation iterative chaotic variables are calculated according to Equation (15) and the chaotic variables are converted into decision variables according to Equation (22). It is then determined whether the chaotic search has reached its maximum number of iterations; if not, the iteration continues.
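A sketch of the chaotic fine search in Equations (21) and (22). Applying it to the current best individual and keeping only improvements is our interpretation of the description above; n_chaos and the clipping of z away from the fixed points of the logistic map are illustrative choices:

```python
import numpy as np

def chaotic_fine_search(best_x, eval_fn, x_min, x_max, n_chaos=20, mu=4.0):
    """Map the best solution into chaos space (Eq. 21), iterate the logistic
    map (Eq. 15), map back to decision space (Eq. 22) and keep improvements."""
    z = (best_x - x_min) / (x_max - x_min)           # Equation (21)
    z = np.clip(z, 0.01, 0.99)                       # keep z inside the chaotic range
    best_f = eval_fn(best_x)
    for _ in range(n_chaos):
        z = mu * z * (1.0 - z)                       # Equation (15)
        candidate = x_min + z * (x_max - x_min)      # Equation (22)
        f = eval_fn(candidate)
        if f < best_f:                               # greedy acceptance
            best_x, best_f = candidate, f
    return best_x, best_f
```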
2.3. ACDE-SVR Prediction Model
One important factor affecting the prediction accuracy of the SVR model is the selection of the model parameters. Based on its strong stability, superior global optimization performance and high convergence speed, this study uses the ACDE algorithm to optimize the parameters of the SVR trajectory prediction model, namely the penalty factor $C$, the insensitive factor $\varepsilon$ and the kernel function parameter $g$.
Figure 2 is the flow diagram of the ACDE-SVR algorithm. The steps of the ACDE-SVR algorithm are as follows:
Step 1: The trajectory prediction sample is input after de-noising and the iteration counter k is initialized. The ACDE parameters, including the population size D, the maximum number of iterations M, the adaptive scaling factor limits and the crossover probability cr, are set according to the actual situation.
Step 2: The optimal range of parameters is set and the population is initialized randomly and uniformly.
Step 3: The SVR algorithm is used to make a prediction for each individual. The MSE (mean square error) between the predicted values $\hat{y}_i$ and real values $y_i$ is calculated according to $MSE = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2$. The MSE is used as the fitness function [38]. The fitness value, the individual extremum of each individual, the global extremum and the global extremum point are recorded.
Step 4: If the number of iterations is less than the maximum number of iterations, that is, $k < M$, the algorithm proceeds to the next step. Otherwise, it proceeds to step 7 and outputs the global extreme point as the searched optimal parameters $(C, \varepsilon, g)$.
Step 5: According to Equations (17)–(22), mutation, crossover, selection and chaotic fine search operations are carried out to generate new populations. The fitness of the individuals within the population, the individual extremum, the global extremum and the global extremum point are calculated.
Step 6: The iteration counter is set to $k = k + 1$ and the algorithm returns to step 4.
Step 7: The SVR prediction model is built based on the calculated parameters.
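To make step 3 concrete, a sketch of the MSE fitness for one candidate parameter triple (C, ε, g), using scikit-learn's SVR; the hold-out split and toy data are illustrative, not necessarily the scheme used in this study:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def svr_fitness(params, X, y):
    """Fitness of one ACDE individual: validation MSE of an SVR with (C, eps, g)."""
    C, eps, g = params
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
    model = SVR(kernel="rbf", C=C, epsilon=eps, gamma=g)
    model.fit(X_tr, y_tr)
    return mean_squared_error(y_val, model.predict(X_val))

# Toy example: each ACDE individual is one (C, eps, g) triple to be minimized
rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = X[:, 0] + 0.1 * rng.standard_normal(100)
print(svr_fitness((10.0, 0.05, 1.0), X, y))
```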
2.4. Setting of Parameters in ACDE Algorithm
For the ACDE algorithm, the parameters should be reasonably set to obtain better parameter optimization results. The main parameters of the ACDE algorithm are the scaling factor F, the population size D, the maximum iteration number M and the crossover probability cr. Among them, the selection of the scaling factor, which is related to the number of iterations k, is introduced in Section 2.2. The bases for setting the remaining three parameters are as follows:
(1) Population size D. According to experience, D should be 5 to 10 times the dimension of the problem but not less than 4; otherwise, the mutation operation will not be possible. Generally speaking, as D increases, the population diversity will gradually increase along with the probability of reaching an optimal solution. However, this will also greatly increase the calculation time. In general, D ranges from 20 to 50.
(2) Maximum number of iterations M. This parameter is the termination condition of the evolutionary operation. A larger M increases the running time of the program and yields a more accurate optimal solution. However, when M is too large, the accuracy of the optimal solution no longer increases correspondingly, because the population has by then reached an extremely homogeneous state and the algorithm cannot jump out of the local optimal solution [39].
(3) Crossover probability cr. The value range of this parameter is [0, 1]; when the objective function is separable, values in (0.0, 0.2) are recommended [40]. If cr is large, population convergence accelerates but premature convergence is prone to occur.
2.5. ACDE Algorithm Superiority Verification
In order to verify the global optimization performance of the proposed ACDE algorithm, it is compared with a current state-of-the-art differential evolution algorithm, the LSHADE algorithm [41]. The ACDE and LSHADE algorithms are evaluated using the set of problems presented in the CEC2017 competition on single-objective bound-constrained real-parameter optimization. This benchmark contains 30 test functions with multiple characteristics.
T is the dimensionality of the problem and the functions are tested with T = 50. In short, functions 1–3 are unimodal, functions 4–10 are multimodal, functions 11–20 are hybrid functions and functions 21–30 are composition functions. More details can be found in Reference [42].
The parameter values of the ACDE algorithm are set as follows: the population size D is set to 50, the maximum iteration number M is set to 10,000 × T, the upper and lower limits of the scaling factor are 1.2 and 0.3, respectively, and the crossover probability cr is set to 0.9. The parameters of the LSHADE algorithm are set as in Reference [41].
The two algorithms each perform 51 simulation runs on the 30 benchmark problems. The statistical results of the comparison between the ACDE and LSHADE algorithms are summarized in Table 1, which includes the mean and standard deviation of the error. The best results are marked in bold for all problems. As can be seen from Table 1, the ACDE algorithm is significantly better than or comparable to the LSHADE algorithm on 18 functions, while it is less accurate than the LSHADE algorithm on 12 functions. In general, the optimization performance of the ACDE algorithm is slightly better than that of the LSHADE algorithm.