1. Introduction
With the progress of intelligent control technology, the development of underwater vehicles has entered a new stage. The exploration and exploitation of marine mineral resources, the investigation of marine topography, and military applications all rely on underwater vehicles [1]. Trajectory tracking is one of the key technologies in the field of underwater vehicles and is a prerequisite for completing specified tasks [2]. Therefore, research on trajectory tracking control technology is particularly important.
At present, the main methods for underwater vehicle trajectory tracking control are proportional-integral-derivative (PID) control, fuzzy control, backstepping control, sliding mode control, etc. In [3], a variable integral PID controller based on a disturbance observer was designed to realize the heading control of an underwater vehicle. In [4], good trajectory tracking results were achieved for underactuated underwater vehicles using terminal sliding mode control. In [5], a bio-inspired backstepping control method and a three-dimensional trajectory tracking controller were proposed for deep-diving control. However, most of these methods do not consider constraints on the system state and input, which easily leads to thrust saturation in practice. In contrast, model predictive control (MPC) can handle various constraints explicitly and offers great flexibility in formulating control problems. These features have made MPC an active research topic in trajectory tracking control [6].
However, the coupled, nonlinear, time-varying dynamics of underwater vehicles make the design of MPC very challenging. The nonlinear effects of hydrodynamic damping, Coriolis, and centripetal forces cause two main difficulties when nonlinear model predictive control (NMPC) is applied to an underwater vehicle.
Firstly, it is difficult to establish an accurate model of the underwater vehicle. Environmental disturbances such as ocean currents, load variation and changes in hydrodynamic parameters all affect the dynamic model of the underwater vehicle. In addition, the commonly used dynamic model established by the Newton–Euler equation is a parametric model, which cannot be corrected adaptively online [7], so it is difficult to ensure accuracy and applicability when the underwater vehicle operates in a complex environment. Neural networks are widely used in underwater vehicle modeling [8,9,10] because of their good nonlinear approximation ability and adaptive learning capability. In [11], a radial basis function neural network (RBFNN) is used to approximate the uncertain interference and the model uncertainty of an autonomous underwater vehicle (AUV) in order to suppress the influence of parameter perturbation. In [12], an adaptive sliding mode control strategy based on RBFNN was proposed to solve the heading control problem of an AUV.
Secondly, it is difficult to guarantee the real-time performance of the optimization process. The real-time performance of MPC is affected by the complexity of the dynamic model and by the rolling optimization algorithm [13]. At present, linear model predictive control (LMPC) is often used to simplify the nonlinear model in order to meet real-time requirements. In [14], LMPC is combined with sonar images to achieve dynamic target tracking. In [15], the complex six-degrees-of-freedom mathematical model of the underwater vehicle is linearized, and smooth tracking between trajectory points is achieved. In [16], linear robust MPC is applied to ensure the stability of surface ships in disturbed environments. Linearization plays an important role in the analysis and design of NMPC. However, frequent ocean current disturbances and the complexity and nonlinearity of the dynamic model considerably degrade the control performance of linearized controllers. When the curvature of the reference trajectory changes greatly, LMPC is prone to overshoot [17]. In addition, it is difficult to guarantee the feasibility of the optimization problem at each sampling time because the linearization method changes the model online. Therefore, establishing an accurate nonlinear model while ensuring real-time performance is particularly important in the development of NMPC.
In this paper, a trajectory tracking control architecture combining NMPC and RBFNN is studied. In the off-line phase, random step signals are used as the excitation signal to obtain the state response of the underwater vehicle, which is taken as the model training sample set. The improved adaptive Levenberg–Marquardt-error surface compensation (IALM-ESC) algorithm is applied to the system identification of the underwater vehicle, so that fewer network nodes are needed to reflect the dynamic characteristics. In the real-time control phase, the system output collected in each sampling period contains the information of external interference, and the network parameters are updated online according to the error between the system output and the network prediction output. At the same time, an improved adaptive gray wolf optimization (AGWO) algorithm is proposed to improve the NMPC optimization performance and ensure real-time control. The main contributions of this paper are as follows:
(1) The structure and parameters of the radial basis function neural network (RBFNN) are determined by the IALM-ESC algorithm. Compared with the traditional gradient descent (GD) method, the applied algorithm improves both the convergence speed and the accuracy reached at convergence.
(2) In the real-time control stage, the neural network parameters are adjusted and updated online according to the prediction error, which improves the adaptive ability of the controller in the complex underwater environment.
(3) Based on the traditional gray wolf optimization (GWO) algorithm, adaptive weights and worst-case crossover are added to improve the global search ability and convergence speed, so as to ensure the real-time performance of NMPC.
The rest of this paper is arranged as follows: Section 2 introduces the problem studied in this paper, including the kinematics and dynamics models of underwater vehicles. Section 3 introduces the design process of the controller. Section 4 presents the related simulation experiments, including the results of model identification, optimization and tracking control. The main results and future work are described in Section 5.
2. Problem Description
This paper studies the trajectory tracking control of underwater vehicles in the horizontal plane, where the surge, sway and yaw motions are considered.
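The kinematics model of underwater vehicles in the horizontal plane is shown in (1), which describes the transformation between the motion coordinate system and the inertial coordinate system:

$$\dot{\boldsymbol{\eta}} = \mathbf{J}(\psi)\,\boldsymbol{\nu} \tag{1}$$

where $\mathbf{J}(\psi)$ is the coordinate transformation matrix, which is defined as:

$$\mathbf{J}(\psi) = \begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{2}$$

where $\psi$ is the heading angle.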
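The coordinate transformation described above can be illustrated with a minimal Python sketch. It assumes the standard horizontal-plane rotation about the vertical axis; the function names `rotation_matrix` and `body_to_inertial_rates` are illustrative and do not come from the paper.

```python
import numpy as np

def rotation_matrix(psi: float) -> np.ndarray:
    """Horizontal-plane transformation matrix for heading angle psi (rad)."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def body_to_inertial_rates(psi: float, nu) -> np.ndarray:
    """Map body-frame velocities [surge, sway, yaw rate] to inertial-frame rates."""
    return rotation_matrix(psi) @ np.asarray(nu, dtype=float)

# A vehicle heading along the inertial Y axis (psi = 90 deg) with pure surge
# velocity moves along Y; the yaw rate is unaffected by the rotation.
rates = body_to_inertial_rates(np.pi / 2, [1.0, 0.0, 0.2])
```

Because the matrix is orthogonal, the inverse mapping (inertial rates to body velocities) is obtained with its transpose.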
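The dynamic model of underwater vehicles is represented as [18]:

$$\mathbf{M}\dot{\boldsymbol{\nu}} + \mathbf{C}(\boldsymbol{\nu})\boldsymbol{\nu} + \mathbf{D}(\boldsymbol{\nu})\boldsymbol{\nu} + \mathbf{g}(\boldsymbol{\eta}) = \boldsymbol{\tau} \tag{3}$$

where $\boldsymbol{\nu}$ is the vector composed of surge velocity, sway velocity and yaw angular velocity in the motion coordinate system; $\boldsymbol{\eta}$ is the vector composed of X-direction position, Y-direction position and heading angle in the inertial coordinate system; $\mathbf{M}$ is the inertia matrix; $\mathbf{C}(\boldsymbol{\nu})$ is the centripetal force and Coriolis force matrix; $\mathbf{D}(\boldsymbol{\nu})$ is the damping matrix; $\mathbf{g}(\boldsymbol{\eta})$ is the restoring force vector; and $\boldsymbol{\tau}$ is the force and moment vector acting on the three degrees of freedom.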
By combining kinematics and dynamics equations, the model of the underwater vehicle’s system can be established as:
where
, and
.
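To make the combination of kinematics and dynamics concrete, the following Python sketch integrates a three-degrees-of-freedom horizontal-plane model with a forward Euler step. It is only an illustration under stated assumptions: the inertia and damping values are hypothetical, the inertia matrix is taken as diagonal, damping is linear, and the restoring force is neglected in the horizontal plane. None of these values are taken from the paper.

```python
import numpy as np

# Hypothetical parameters (assumptions, not the vehicle studied in the paper).
M = np.diag([25.0, 35.0, 3.0])   # inertia including added mass
D = np.diag([8.0, 12.0, 1.5])    # linear damping

def coriolis(nu: np.ndarray) -> np.ndarray:
    """Coriolis/centripetal matrix for a diagonal inertia matrix."""
    u, v, _ = nu
    m11, m22 = M[0, 0], M[1, 1]
    return np.array([[0.0, 0.0, -m22 * v],
                     [0.0, 0.0,  m11 * u],
                     [m22 * v, -m11 * u, 0.0]])

def step(eta, nu, tau, dt=0.05):
    """One Euler step of the combined kinematics and dynamics."""
    psi = eta[2]
    c, s = np.cos(psi), np.sin(psi)
    J = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    nu_dot = np.linalg.solve(M, tau - coriolis(nu) @ nu - D @ nu)
    eta_dot = J @ nu
    return eta + dt * eta_dot, nu + dt * nu_dot

# Constant surge thrust: the surge velocity settles where thrust balances damping.
eta, nu = np.zeros(3), np.zeros(3)
for _ in range(2000):
    eta, nu = step(eta, nu, np.array([8.0, 0.0, 0.0]))
```

In a data-driven setting such a simulator would typically serve only to generate excitation and response data; the controller itself relies on the identified network model.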
The system behavior in MPC needs to be described by a predictive model. However, for underwater vehicles, the randomness of the ocean current direction and velocity has a great impact on the dynamic characteristics. Together with the strong nonlinearity and strong coupling of the underwater vehicle system, this makes it difficult to establish an accurate system model, which affects the trajectory tracking performance. Therefore, in this paper, the RBFNN identification method is used to identify the dynamic model of underwater vehicles. On the one hand, it can improve the accuracy of model identification; on the other hand, it can modify the network parameters online and suppress the effects of external interference and environmental changes, thereby achieving adaptive control.
The trajectory tracking control process of the underwater vehicles studied in this paper is shown in Figure 1.
As shown in
Figure 1, the reference trajectory
defined in inertial coordinate system is composed of
N discrete trajectory points from a given initial state. Suppose the trajectory point at a certain time is
. For trajectory tracking control, the reference trajectory should satisfy the physical characteristics constraints and the kinematics equation of the underwater vehicles [
19]. That is:
According to (5), the real-time state of the underwater vehicle is obtained as:
In this paper, a radial basis function neural network-based nonlinear model predictive control (RBF-NMPC) method is designed to give the corresponding control law at each sampling time, so that the real state of the underwater vehicle coincides with the reference trajectory. An external disturbance changes the state of the underwater vehicle, so the state information of the underwater vehicle contains the information of the external interference. The model accuracy can therefore be improved by adjusting the network parameters. This adaptive ability of the model enables the controller to produce the correct control law, so that the underwater vehicle remains on the reference trajectory when external disturbances are present.
3. Controller Design
The principle of the RBF-NMPC constructed in this paper is shown in Figure 2. The specific implementation process is as follows:
First, the dynamic model of the underwater vehicle is identified off-line from the input and output data by using the IALM-ESC algorithm, so that the RBFNN captures the essential dynamics of the underwater vehicle.
Second, in the real-time control stage, in order to improve the accuracy of the NMPC, the error between the system output and the network prediction output is used to adjust the RBFNN parameters. The adjusted network is used to predict the state quantity in the future.
Finally, in the NMPC optimization stage, the objective function is optimized by the proposed AGWO algorithm, and the control sequence is obtained.
3.1. RBFNN Training
RBFNN is a local approximation neural network with a simple structure. By using Gaussian kernel functions, it can approximate any continuous nonlinear function on a compact set to arbitrary precision. Suppose that there are
neurons in the hidden layer and
is the input of the RBFNN, then the output can be expressed as:
where
is the vector value of the center point of the
hidden layer neuron,
is the width vector of the Gaussian kernel function of the
hidden layer neuron, and
is the weight of the
hidden layer neuron.
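The forward computation of a Gaussian RBF network can be sketched in Python as follows. This is a minimal sketch under assumptions: each hidden neuron is given its own center vector and per-dimension width vector, the Gaussian exponent uses the common factor 1/2, and no bias term is included. The exact kernel form used in the paper may differ, and the names `rbf_forward`, `centers`, `widths` and `weights` are illustrative.

```python
import numpy as np

def rbf_forward(x, centers, widths, weights):
    """Evaluate a Gaussian RBF network.

    x:       input vector, shape (d,)
    centers: hidden-neuron centers, shape (m, d)
    widths:  per-dimension kernel widths, shape (m, d)  (assumed form)
    weights: output weights, shape (m,) or (m, n_out)
    Returns the network output and the hidden-layer activations.
    """
    z = (np.asarray(x, dtype=float) - centers) / widths
    phi = np.exp(-0.5 * np.sum(z ** 2, axis=1))
    return phi @ weights, phi

centers = np.array([[0.0, 0.0], [1.0, 1.0]])
widths = np.ones((2, 2))
weights = np.array([2.0, 3.0])
y, phi = rbf_forward([0.0, 0.0], centers, widths, weights)
```

An input located exactly on a center fully activates that neuron (activation 1), while the contribution of the other neurons decays with distance, which is the local approximation property mentioned above.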
When constructing an RBFNN, two problems must be solved: structure identification and parameter estimation. Structure identification determines the number of nodes in the hidden layer of the network, and parameter estimation finds a set of network parameters (centers, widths and weights) that minimizes the sample error function (root mean square error):
where
p is the current sample,
n is the total number of samples,
is the expected output of the sample, and
y is the output of RBFNN.
The IALM-ESC applied in this paper is an incremental network construction algorithm. Starting with zero network nodes, the maximum of the error surface is compensated by adding network nodes at the peak or valley of the error surface. The parameters of each new network node are adjusted by the following update rules [20]:
where
is a quasi-Hessian matrix,
is the gradient vector,
is the adaptive damping coefficient, and its adjustment rule is as:
where
is the constant.
and
are the sum of sub matrix
and sub vector
of all samples, respectively, and there is:
where,
where,
is the row vector of Jacobian matrix, which is described as:
According to the update rule of the gradient descent learning algorithm, the elements of row vector of Jacobian matrix can be expressed as:
After adjusting all the parameters, if the root mean square error between the predicted value and the actual value of the RBFNN does not reach the target value, the network nodes will continue to be added at the maximum error, and the node parameters will be trained until the target value is met.
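The Levenberg–Marquardt style update with a quasi-Hessian matrix and gradient vector accumulated sample by sample can be sketched as follows. This is a hedged illustration, not the IALM-ESC algorithm itself: it fits only the weight and center of a single one-dimensional Gaussian unit with a fixed width, and it replaces the adaptive damping rule of the paper with the classic accept/reject rule (divide the damping by 10 after a successful step, multiply by 10 otherwise). All function names are hypothetical.

```python
import numpy as np

def model(theta, x):
    """Single Gaussian unit with weight w and center c (width fixed, assumed)."""
    w, c = theta
    return w * np.exp(-(x - c) ** 2)

def jacobian_row(theta, x_p):
    """Derivatives of the sample error (output minus target) w.r.t. [w, c]."""
    w, c = theta
    phi = np.exp(-(x_p - c) ** 2)
    return np.array([phi, w * phi * 2.0 * (x_p - c)])

def rmse(theta, x, d):
    return float(np.sqrt(np.mean((model(theta, x) - d) ** 2)))

def lm_fit(theta, x, d, mu=1e-2, n_iter=50):
    theta = np.asarray(theta, dtype=float)
    err = rmse(theta, x, d)
    for _ in range(n_iter):
        Q = np.zeros((theta.size, theta.size))  # quasi-Hessian: sum of j_p^T j_p
        g = np.zeros(theta.size)                # gradient: sum of j_p^T e_p
        for x_p, d_p in zip(x, d):
            j_p = jacobian_row(theta, x_p)
            e_p = model(theta, x_p) - d_p
            Q += np.outer(j_p, j_p)
            g += j_p * e_p
        candidate = theta - np.linalg.solve(Q + mu * np.eye(theta.size), g)
        cand_err = rmse(candidate, x, d)
        if cand_err < err:   # accept the step and reduce the damping
            theta, err, mu = candidate, cand_err, mu / 10.0
        else:                # reject the step and increase the damping
            mu *= 10.0
    return theta, err

x = np.linspace(-2.0, 2.0, 41)
d = 1.5 * np.exp(-(x - 0.5) ** 2)
theta0 = np.array([1.0, 0.0])
theta_fit, final_err = lm_fit(theta0, x, d)
```

Accumulating the sub-matrices per sample avoids storing the full Jacobian, which is the main practical motivation for the quasi-Hessian formulation.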
In the off-line phase, the model of the underwater vehicle is initially established. However, in the real-time control stage, the prediction model should also be adjustable so that it can adapt to the unknown underwater environment. The underwater interference is decomposed into an interference force on each degree of freedom, which causes the state of the underwater vehicle to change. Therefore, the state information of the underwater vehicle can be considered to contain the information of the external interference. The model accuracy can then be improved by adjusting the network parameters according to the error between the system output and the state predicted by the model. This idea is widely used in existing adaptive neural network controllers for underwater vehicles [21,22]. In this paper, adaptive gradient descent is used to adjust the network parameters in the real-time control phase to minimize the error between the MPC model and the system output. In each sampling period, after the new sample data are collected, the network parameters are updated once by (15). As the running time increases, the network parameters are gradually adjusted to adapt to the changes of the external environment.
where
is the adaptive learning rate, which is equal to the adaptive damping coefficient in (10), and
is the gradient vector, which is composed of the elements in (14).
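A single online correction step of the kind described above can be sketched in Python. The sketch makes simplifying assumptions: only the output weights are adapted (the paper adjusts the network parameters more generally), the learning rate is a fixed number rather than the adaptive coefficient, and the function name `online_weight_update` is hypothetical.

```python
import numpy as np

def online_weight_update(weights, phi, y_measured, learning_rate):
    """One gradient-descent step on the squared prediction error.

    phi is the vector of hidden-layer activations for the current sample.
    Returns the updated weights and the prediction error before the update.
    """
    error = phi @ weights - y_measured   # prediction minus measurement
    grad = error * phi                   # gradient of 0.5 * error**2 w.r.t. weights
    return weights - learning_rate * grad, error

weights = np.zeros(2)
phi = np.array([1.0, 0.5])
weights, e_before = online_weight_update(weights, phi, 1.0, 0.5)
_, e_after = online_weight_update(weights, phi, 1.0, 0.5)
```

Repeating the step on the same sample shrinks the prediction error, which mirrors how one update per sampling period gradually adapts the model to the measured behavior.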
3.2. Objective Function and Constraints
The velocity and angular velocity of each degree of freedom of underwater vehicle in the future can be estimated according to the established RBFNN prediction model. The expression is as follows:
where
is the prediction horizon,
is the prediction output at the sampling time
k.
is the input, and it will change with the prediction horizon.
and
are the order of the output and input, respectively. Then the position and attitude
of the underwater vehicle in the prediction horizon are calculated by the kinematics equation, and all the state variables
of the underwater vehicle in the prediction horizon are obtained.
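The recursive multi-step prediction described above, in which each predicted output is fed back as an input to the next prediction, can be sketched as follows. The sketch assumes a generic one-step model that maps the most recent outputs and inputs to the next output; the buffer lengths play the role of the output and input orders. The function names and the timing convention (the newest input is applied before predicting) are assumptions for illustration.

```python
from collections import deque
import numpy as np

def rollout(one_step_model, y_hist, u_hist, u_future):
    """Predict over the horizon by feeding predictions back into the model.

    y_hist:   past outputs, oldest first (length = output order)
    u_hist:   past inputs, oldest first (length = input order)
    u_future: candidate inputs over the prediction horizon
    """
    y_buf = deque(y_hist, maxlen=len(y_hist))
    u_buf = deque(u_hist, maxlen=len(u_hist))
    predictions = []
    for u in u_future:
        u_buf.append(u)                     # the oldest input drops out
        y_next = one_step_model(np.array(list(y_buf)), np.array(list(u_buf)))
        predictions.append(y_next)
        y_buf.append(y_next)                # the prediction becomes history
    return np.array(predictions)

def toy_model(y, u):
    """Hypothetical stand-in for the trained network."""
    return 0.5 * y[-1] + u[-1]

preds = rollout(toy_model, [0.0, 1.0], [0.0, 0.0], [1.0, 1.0, 1.0])
```

Since prediction errors are fed back, they can accumulate over the horizon, which is one reason the online correction of the network matters for long horizons.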
In this paper, a quadratic objective function is minimized as the optimization performance index at time k [23]. The expression is as follows:
where
is the difference between reference trajectory and model prediction.
is the control increment.
and
represent the prediction horizon and control horizon.
Q and
λ are corresponding weighting matrices.
and
are the upper and lower bounds of
.
and
are the upper and lower bounds of
.
and
are the upper and lower bounds of
, respectively.
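A quadratic tracking cost with a control-increment penalty, together with simple box constraints, can be sketched in Python as below. This is an illustration under assumptions: the constraints are enforced by clipping the control increments and the resulting control inputs, state bounds are not handled, and the names `nmpc_cost` and `apply_input_constraints` are hypothetical.

```python
import numpy as np

def nmpc_cost(ref, pred, du, Q, lam):
    """Sum of weighted tracking errors and weighted control increments.

    ref, pred: arrays of shape (prediction horizon, n_states)
    du:        array of shape (control horizon, n_inputs)
    """
    e = ref - pred
    tracking = np.einsum('ki,ij,kj->', e, Q, e)
    effort = np.einsum('ki,ij,kj->', du, lam, du)
    return float(tracking + effort)

def apply_input_constraints(u_prev, du, du_min, du_max, u_min, u_max):
    """Clip increments, accumulate them from u_prev, then clip the inputs."""
    du = np.clip(du, du_min, du_max)
    return np.clip(u_prev + np.cumsum(du, axis=0), u_min, u_max)

ref = np.array([[1.0, 0.0], [1.0, 0.0]])
pred = np.zeros((2, 2))
du = np.array([[1.0], [-1.0]])
cost = nmpc_cost(ref, pred, du, np.eye(2), 0.5 * np.eye(1))
u_seq = apply_input_constraints(np.array([0.0]), np.array([[2.0], [2.0]]),
                                -1.0, 1.0, -1.5, 1.5)
```

With a population-based optimizer, clipping each candidate in this way keeps every evaluated control sequence within the actuator limits, which is one simple way to handle constraints without a dedicated constrained solver.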
3.3. AGWO Algorithm
For solving NMPC problems, the traditional gradient descent method is limited in its computational capability [24]. Bio-inspired heuristic optimization algorithms have been shown to have strong potential for complex NMPC problems [25]. In [26], a modified GWO and the Moth-Flame Optimization were proposed to improve the performance when applied as an NMPC solver. The GWO algorithm simulates the predatory behavior of a gray wolf pack and performs optimization through the cooperation mechanism of the pack [27]. In the GWO algorithm, the three wolves with the best fitness (the best solutions $\alpha$, $\beta$ and $\delta$) guide the other wolves in searching for the target. The remaining wolves (candidate solutions) are defined as $\omega$, and they update their positions around $\alpha$, $\beta$ and $\delta$. The distance between the individual and the prey is shown in (18), and its position update is shown in (19).
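$$\mathbf{D} = \left| \mathbf{C} \cdot \mathbf{X}_p(t) - \mathbf{X}(t) \right| \tag{18}$$

$$\mathbf{X}(t+1) = \mathbf{X}_p(t) - \mathbf{A} \cdot \mathbf{D} \tag{19}$$

where $t$ is the current iteration number, $\mathbf{A}$ and $\mathbf{C}$ are the coefficient vectors, and $\mathbf{X}_p$ and $\mathbf{X}$ are the position vectors of the prey and the gray wolf, respectively, with $\mathbf{A} = 2a\,\mathbf{r}_1 - a$ and $\mathbf{C} = 2\,\mathbf{r}_2$. Here $a$ is the convergence factor, which decreases linearly from 2 to 0 with the number of iterations, and the components of $\mathbf{r}_1$ and $\mathbf{r}_2$ are random numbers in [0, 1].
The location update of each search factor is expressed as:

$$\mathbf{D}_\alpha = \left| \mathbf{C}_1 \cdot \mathbf{X}_\alpha - \mathbf{X} \right|, \quad \mathbf{D}_\beta = \left| \mathbf{C}_2 \cdot \mathbf{X}_\beta - \mathbf{X} \right|, \quad \mathbf{D}_\delta = \left| \mathbf{C}_3 \cdot \mathbf{X}_\delta - \mathbf{X} \right|$$

$$\mathbf{X}_1 = \mathbf{X}_\alpha - \mathbf{A}_1 \cdot \mathbf{D}_\alpha, \quad \mathbf{X}_2 = \mathbf{X}_\beta - \mathbf{A}_2 \cdot \mathbf{D}_\beta, \quad \mathbf{X}_3 = \mathbf{X}_\delta - \mathbf{A}_3 \cdot \mathbf{D}_\delta$$

$$\mathbf{X}(t+1) = \frac{\mathbf{X}_1 + \mathbf{X}_2 + \mathbf{X}_3}{3} \tag{20}$$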
With the increase in the number of iterations t, the GWO algorithm finally finds the optimal solution by encircling the prey. Although the GWO algorithm has been widely used, it converges slowly and is easily trapped in local minima [28]. To improve the performance of the GWO algorithm, an adaptive strategy is applied in this paper. An adaptive weight is added to Equation (20) to speed up convergence. In addition, the ability to jump out of local optima is improved by applying crossover to the worst set of each iteration.
In general, wolf $\alpha$ has the highest command in the pack. However, in some special cases, $\beta$ and $\delta$ can also command the wolves temporarily. Therefore, the position of the wolf pack must be updated according to different weights. However, a fixed weight is not conducive to the regeneration of the population. Therefore, an adaptive weighted position updating method is applied, as shown in (21).
where
,
and
are the adaptive weights in each iteration, which can be expressed as:
where,
is the standard normal distribution;
is the average value of weight update, which can be expressed as:
where
c is a constant, which is set to 0.1 in this paper.
is a file which stores better weight than the last iteration. In the initialization phase, the stored weights
are all 1.
In addition, inspired by the differential evolution algorithm, the idea of worst-case crossover is proposed in order to improve the global search performance of the GWO algorithm. The set of search factors with poor fitness in each iteration exchanges information with $\alpha$, $\beta$ and $\delta$, so as to increase the diversity of the population. The ability to jump out of local optima can be improved by this crossover. Let the number of search factors in the poor set be K. The information exchange formula for each search factor in this set is expressed as follows:
where
is a random number between [0, 3], and
d is the dimension of search factor.
Through the above improvements, the proposed AGWO algorithm improves both the convergence speed and the ability to find the global optimum when compared with the traditional GWO algorithm.
The AGWO algorithm proposed in this paper is applied to the optimization problem shown in (17), and the first element of the optimized control sequence is taken as the optimal control law. It is then applied to the underwater vehicle system for real-time control. The proposed RBF-NMPC procedure is summarized in Algorithm 1:
Algorithm 1 RBF-NMPC
1: Develop the RBFNN predictive model offline using offline data;
2: Initialize the parameters of RBF-NMPC;
3: for k = 1 to N do
4:   Sample the plant output;
5:   Update the parameters of the RBFNN to adapt to the real environment;
6:   Calculate the prediction outputs;
7:   while the current iteration count is less than the maximum number of iterations do
8:     Compute the control signal by AGWO;
9:     Increment the iteration count;
10:  end while
11:  Send the control signal to the underwater vehicle;
12: end for
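The receding-horizon structure of Algorithm 1 can be illustrated with a short Python skeleton. Everything concrete in it is an assumption made for the sake of a runnable example: the plant and the prediction model are a hypothetical scalar linear system, the optional `adapt` hook stands in for the online network update, and a coarse grid search over constant input sequences stands in for AGWO.

```python
import numpy as np

def receding_horizon(plant_step, predict, optimize, x0, reference, horizon, adapt=None):
    """Generic loop: sample, adapt the model, optimize, apply the first input."""
    x, visited = x0, []
    for k in range(len(reference) - horizon):
        if adapt is not None:
            adapt(x)                                   # online model correction
        ref_window = reference[k + 1:k + 1 + horizon]
        def cost(u_seq):
            return float(np.sum((predict(x, u_seq) - ref_window) ** 2))
        u_seq = optimize(cost, horizon)                # stand-in for AGWO
        x = plant_step(x, u_seq[0])                    # apply only the first input
        visited.append(x)
    return np.array(visited)

def toy_plant(x, u):
    """Hypothetical scalar plant, also used as the prediction model."""
    return 0.9 * x + 0.5 * u

def toy_predict(x, u_seq):
    preds = []
    for u in u_seq:
        x = toy_plant(x, u)
        preds.append(x)
    return np.array(preds)

def grid_search(cost, horizon):
    """Deterministic placeholder optimizer over constant input sequences."""
    candidates = [np.full(horizon, u) for u in np.linspace(-1.0, 1.0, 41)]
    return min(candidates, key=cost)

trajectory = receding_horizon(toy_plant, toy_predict, grid_search,
                              0.0, np.ones(40), 3)
```

Because only the first input of each optimized sequence is applied before the problem is solved again, the time available per sampling period bounds the number of optimizer iterations, which is why the convergence speed of the optimizer matters for real-time control.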