1. Introduction
Long ago, Leonhard Euler spoke about the optimal arrangement of everything in the world: “For since the fabric of the universe is most perfect and the work of a most wise Creator, nothing at all takes place in the universe in which some rule of maximum or minimum does not appear.” Striving for optimality is natural in every sphere.
To move an autonomous robot optimally to a given target position, engineers currently follow a standard procedure: they first solve the optimal control problem to obtain the optimal trajectory, and then solve the additional problem of moving the robot along that trajectory. In most cases, the following approach is used to move the robot along a path. Initially, the object is made stable relative to a certain point in the state space. Then, the stability points are positioned along the desired path, and the object is moved along the trajectory by switching from one point to the next [1,2,3,4,5,6,7]. The existing methods differ in how they solve the control synthesis problem that ensures stability relative to an equilibrium point in the state space, and in where they place these stability points.
Often, to ensure stability, the model of the control object is linearized about a certain point in the state space. Then, for the linear model of the object, a linear feedback control is found that places the eigenvalues of the closed-loop system matrix in the left half of the complex plane. Sometimes, to improve the quality of stabilization, control channels, or components of the control vector, are defined that affect the movement of the object along a specific coordinate axis of the state space. Controllers, as a rule PI controllers, are then inserted into these channels, with coefficients adjusted according to the specified control quality criterion [3,4]. In some cases, analytical or semi-analytical methods are used to solve the control synthesis problem and build nonlinear stable control systems [5,7]. However, a stability property of the nonlinear model that is established through linearization is generally preserved only in the vicinity of the stable equilibrium point.
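The eigenvalue condition described above can be illustrated with a minimal Python sketch. The double-integrator model, the feedback gains and all function names are assumptions chosen for illustration; they are not taken from the cited works.

```python
import cmath

def closed_loop_matrix(A, B, K):
    """Return A - B K for a 2x2 matrix A, a column B and a row K (plain lists)."""
    return [[A[i][j] - B[i] * K[j] for j in range(2)] for i in range(2)]

def eigenvalues_2x2(M):
    """Eigenvalues of a 2x2 matrix as roots of s^2 - trace*s + det = 0."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return ((trace + disc) / 2.0, (trace - disc) / 2.0)

def is_stable(M):
    """True if every eigenvalue lies strictly in the left half-plane."""
    return all(ev.real < 0.0 for ev in eigenvalues_2x2(M))

# Hypothetical linearized model: x1' = x2, x2' = u, with feedback u = -K x.
A = [[0.0, 1.0],
     [0.0, 0.0]]
B = [0.0, 1.0]
K = [2.0, 3.0]  # hand-picked gains (assumption)

M = closed_loop_matrix(A, B, K)
print(eigenvalues_2x2(M), is_stable(M))
```

With these assumed gains the closed-loop characteristic polynomial is s^2 + 3s + 2, so both eigenvalues are real and negative; with zero gains the same check fails.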
The main drawback of moving the control object along stable points placed on the trajectory is that, even if this trajectory is obtained as a solution of the optimal control problem [8], the movement itself will never be optimal. To ensure optimality, the object must move along the trajectory at a certain speed, but as it approaches a stable equilibrium point, its speed tends to zero.
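The slowdown near a stable point can be seen in a small numerical sketch. It assumes a scalar, first-order stabilized model x' = gain * (x_star - x) as a stand-in for a closed-loop object; this model and the parameter values are illustrative assumptions only.

```python
def speeds_toward_point(x0, x_star, gain=1.0, dt=0.01, steps=500):
    """Euler-integrate x' = gain * (x_star - x) and record |x'| at each step."""
    x = x0
    speeds = []
    for _ in range(steps):
        v = gain * (x_star - x)   # velocity is proportional to the remaining distance
        speeds.append(abs(v))
        x += dt * v
    return x, speeds

x_final, speeds = speeds_toward_point(x0=0.0, x_star=1.0)
# The speed shrinks monotonically as the object approaches the stable point.
print(speeds[0], speeds[-1])
```

Under this assumed model the object never passes through a stable point at a prescribed nonzero speed, which is the effect the paragraph above describes.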
The optimal control problem generally does not require ensuring the stability of the control object. The construction of a stabilization system that provides the stability of the object relative to some equilibrium point in the state space is carried out by the researcher to achieve predictable behavior of the control object in the vicinity of a given trajectory.
The optimal control problem in the classical formulation is solved for a control object without any stabilization system; therefore, the resulting optimal control and optimal trajectory will not be optimal for the same object once a stabilization system is introduced. It follows that the classical formulation of the optimal control problem [9] is incomplete, in that its solution cannot be directly implemented in the real object, since this leads to an open-loop control. An open-loop control system is very sensitive to small disturbances, which are always possible in real conditions, since no model describes the control object exactly. To achieve optimal control in a real object, it is necessary to build a feedback control system that provides some additional properties, for example stability relative to the trajectory or to points on this trajectory. The authors of [10] proposed an extended formulation of the optimal control problem, which imposes additional requirements on the optimal trajectory: the optimal trajectory must have a non-empty neighborhood with the property of attraction. Meeting these requirements allows the solution of the optimal control problem to be implemented directly in the real control object.
In [11,12], an approach to solving the extended optimal control problem on the basis of synthesized control is presented. This approach yields a solution of the optimal control problem in the class of practically implementable control functions. According to this approach, the control synthesis problem is solved first, so that the control object becomes stable relative to some equilibrium point in the state space. In the second stage, the optimal control problem is solved by determining the optimal positions of the stable equilibrium point. Switching stable points after a constant time interval moves the control object from the initial state to the terminal one optimally according to the given quality criterion. The optimal positions of the stable equilibrium points can be far from the optimal trajectory in the state space; therefore, the control object does not slow down. Studies of synthesized control in various optimal control problems have shown that such control is not sensitive to perturbations and can be directly implemented in a real object [13,14].
In synthesized control, the optimal control problem is solved for a control object that already includes a stabilization system. Another advantage of synthesized control is that the position of the stable point does not change during the time interval; that is, the optimal control function is sought in the class of piecewise constant functions, which simplifies the search for the optimal solution.
It is possible that piecewise constant control in the synthesized approach yields several optimal solutions with practically the same value of the quality criterion. This circumstance prompted the idea of finding, among all almost-optimal solutions, one that is less sensitive to perturbations. This approach is called adaptive synthesized control.
In this work, the principle of adaptive synthesized control is proposed in Section 2, and methods for solving the problem are discussed in Section 3. Section 4 presents a computational experiment in which adaptive synthesized control is used to solve the optimal control problem for the spatial motion of a quadcopter.
2. Adaptive Synthesized Control
Consider the principle of adaptive synthesized control for solving the optimal control problem in its extended formulation [10].
Initially, the control synthesis problem is solved to make the control object stable relative to some point in the state space. In this problem, the mathematical model of the control object is given as a system of ordinary differential equations

$$\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{u}), \qquad (1)$$

where $\mathbf{x}$ is a state vector, $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{u}$ is a control vector, $\mathbf{u} \in \mathrm{U} \subseteq \mathbb{R}^m$, and $\mathrm{U}$ is a compact set that determines restrictions on the control vector.
The domain of admissible initial states is given
To solve the problem numerically, the initial domain (2) is taken in the form of a finite number of points in the state space:
Sometimes, it is convenient to set one initial state and deviations from it:
where
is a given initial state,
is a deviations vector,
, ⊙ is Hadamard product of vectors,
is a binary code of the number
j. In this case
.
The stabilization point as a terminal state is given by
It is necessary to find a control function in the form
where
, such that it minimizes the quality criterion
where
is the time of achieving the terminal state (
5) from the initial state
,
is determined by an equation
is a particular solution from initial state
,
, of the differential Equation (
1) with an inserted control function (
6)
is a given accuracy for hitting to terminal state (
5),
is a given maximal time for control process,
p is a weight coefficient.
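The terminal-time rule described above (the first moment the state comes within the given accuracy of the terminal state, capped by the maximal control time) can be sketched as follows. The Euclidean norm, the sampled-trajectory representation and all names (`terminal_time`, `eps`, `t_plus`) are assumptions for illustration; the exact norm and formula used in the source are not reproduced here.

```python
import math

def terminal_time(times, states, x_star, eps, t_plus):
    """Return the first sampled time at which the state is within eps of x_star.

    If the accuracy eps is not reached before the maximal time t_plus,
    t_plus itself is returned.
    """
    for t, x in zip(times, states):
        if t > t_plus:
            break                       # the control process is cut off at t_plus
        if math.dist(x, x_star) <= eps:  # Euclidean distance (assumed norm)
            return t
    return t_plus

# Hypothetical sampled trajectory approaching the origin.
times = [0.0, 1.0, 2.0, 3.0]
states = [(2.0, 0.0), (1.0, 0.0), (0.05, 0.0), (0.0, 0.0)]
print(terminal_time(times, states, (0.0, 0.0), eps=0.1, t_plus=10.0))  # prints 2.0
```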
Further, using the principles of synthesized optimal control, the following optimal control problem is considered. The model of the control object in the form (9) is used
where the terminal state vector (5) is changed into the new unknown vector
, which will be a control vector in the considered optimal control problem.
In accordance with the classical formulation of the optimal control problem, the initial state of the object (10) is given
In engineering practice, there can be some deviations in the initial position; therefore, in adaptive synthesized control, instead of one initial state (11), the set of initial states defined by Equation (4) is used. The vector of initial deviations defines the level of disturbances.
The goal of control is defined by achievement of the terminal state
The quality criterion is given
where
is a terminal time,
is not given but is limited,
,
is a given limit time of control process.
According to the principle of synthesized control, it is necessary to choose time interval
and to search for optimal constant values of the control vector
for each interval
where
M is a number of intervals
So the system (10) with the found optimal constant values of the control vector (14) in the right-hand side of the differential equations has a particular solution that reaches the terminal state (12) from the given initial state (11) with an optimal value of the quality criterion (13).
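The switching of stable points at a constant time interval can be sketched in Python as follows. The scalar stabilized model x' = gain * (x_star - x), the Euler integration and the function names are assumptions for illustration; in the source, the stabilized object is the closed-loop system obtained at the synthesis stage.

```python
def active_point(t, points, delta_t):
    """Stable point in force at time t: points[i] on [i*delta_t, (i+1)*delta_t)."""
    i = min(int(t // delta_t), len(points) - 1)
    return points[i]

def simulate(x0, points, delta_t, gain=2.0, dt=0.001):
    """Euler-integrate the assumed stabilized model with piecewise constant x_star."""
    steps_per_interval = int(round(delta_t / dt))
    x = x0
    trajectory = [x]
    for x_star in points:                 # one constant stable point per interval
        for _ in range(steps_per_interval):
            x += dt * gain * (x_star - x)
            trajectory.append(x)
    return trajectory

# Hypothetical stable points: the first lies well beyond the final target,
# so the object keeps a high speed instead of settling at an intermediate point.
trajectory = simulate(x0=0.0, points=[3.0, 1.0], delta_t=1.0)
print(trajectory[-1])
```

Because each stable point is held constant over its interval, the search reduces to choosing a finite list of vectors, one per interval.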
Algorithmically, in the second stage of the adaptive synthesized control approach, the optimal values of the vector
are found as a result of the optimization task with the following quality criterion, which takes into account the given grid according to the initial conditions:
where
K is number of initial states,
is determined by Equation (8).
3. Methods of Solving
As described in the previous section, the approach based on the principle of adaptive synthesized optimal control consists of two stages.
To implement the first stage of the approach under consideration, that is, to solve the control synthesis problem (1)–(9), any known method can be used. For linear systems, for example, methods of modal control [15] can be applied, as well as analytical methods such as backstepping [16,17] or synthesis based on a Lyapunov function [18]. In practice, stability is ensured by linearizing the model (1) at the terminal state and placing PI or PID controllers in the control channels [19,20]. All known analytical and technical methods have their limitations, which mostly depend on the type of model used to describe the control object. A mathematical formulation of the stabilization problem as a control synthesis problem is needed in order to apply numerical methods and obtain a feedback control function automatically. Today, modern numerical methods of machine learning can be applied to solve the synthesis problem for nonlinear dynamic objects of varying complexity [21]. Among the different machine learning techniques, only symbolic regression allows searching both for the structure of the needed mathematical function and for its parameters. In our case, the needed function is a control function, so in the present paper machine learning by symbolic regression [22,23] is used.
Methods of symbolic regression search for the mathematical expression of the desired function in the encoded form. These methods differ in the form of this code. The search for solutions is performed in the space of codes by a special genetic algorithm.
Let us demonstrate the main features of symbolic regression on the example of the network operator method (NOP), which was used in this work in the computational experiment. To code a mathematical expression NOP uses an alphabet of elementary functions:
- Functions without arguments, or parameters and variables of the mathematical expression
- Functions with one argument
- Functions with two arguments
Any elementary function is coded by two digits: the first is the number of arguments, and the second is the function number in the corresponding set. These digits are written as indexes of the elements in the introduced sets of the alphabet (17)–(19). The set of functions with one argument must include the identity function. Functions with two arguments should be commutative and associative and should have a unit element.
NOP encodes a mathematical expression in the form of a directed graph. Source nodes of the NOP-graph are associated with functions without arguments, while the other nodes are associated with functions with two arguments. Arcs of the NOP-graph are associated with functions with one argument. If a node of the NOP-graph has only one input arc, then the second argument of the two-argument function associated with this node is its unit element.
Let us define the following alphabet of elementary functions:
With this alphabet the following mathematical expressions can be encoded in the form of NOP:
The NOP-graphs of these mathematical expressions are presented in Figure 1. The nodes of the graph are numbered. Inside each node there is either the number of a binary operation or an element of the set of variables and parameters
, and the arcs of the graph indicate the numbers of unary operations.
In the computer memory, the NOP-graphs are presented in the form of integer matrices.
Since the NOP-nodes are numbered so that the number of the node from which an arc leaves is less than the number of the node that the arc enters, the NOP-matrix is upper triangular. Every row of the matrix corresponds to a node of the graph. Rows with zeros on the main diagonal correspond to source nodes of the graph. The other elements on the main diagonal are the numbers of functions with two arguments. Non-zero elements above the main diagonal are the numbers of functions with one argument.
NOP-matrices for the mathematical expressions (21) have the following forms:
To calculate a mathematical expression from its NOP-matrix, the vector of nodes is determined first. The number of components of the vector of nodes equals the number of nodes in the graph. The initial vector of nodes contains variables and parameters in the positions that correspond to source nodes, and the unit elements of the corresponding functions with two arguments in the other positions. Then, every row of the matrix is checked. If an element of the matrix is not zero, the corresponding element of the vector of nodes is changed. To calculate the mathematical expression from the NOP-matrix, the following equation is used:
where
is a unit element for function with two arguments
,
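The evaluation procedure described above can be sketched in Python. The alphabet below (identity, negation and sine as one-argument functions; addition and multiplication as two-argument functions) and the example matrix are hypothetical and do not reproduce the paper's own sets or expressions. The update rule, in which a non-zero element in row i and column j feeds the unary image of node i into the binary function of node j, is our reading of the description.

```python
import math

# Hypothetical alphabet; function numbers start at 1, and 0 means "no function".
UNARY = {1: lambda z: z,        # identity
         2: lambda z: -z,       # negation
         3: math.sin}
BINARY = {1: (lambda a, b: a + b, 0.0),   # (function, unit element)
          2: (lambda a, b: a * b, 1.0)}

def evaluate_nop(psi, sources):
    """Evaluate an upper-triangular NOP-matrix psi; return the vector of nodes."""
    n = len(psi)
    src = iter(sources)
    # Source nodes (zero on the diagonal) take variables or parameters;
    # the other nodes start from the unit element of their binary function.
    z = [next(src) if psi[i][i] == 0 else BINARY[psi[i][i]][1] for i in range(n)]
    for i in range(n - 1):
        for j in range(i + 1, n):
            if psi[i][j] != 0:            # arc i -> j; node j must not be a source
                op = BINARY[psi[j][j]][0]
                z[j] = op(z[j], UNARY[psi[i][j]](z[i]))
    return z

# Nodes 0 and 1 are sources (x and q); node 2 multiplies; node 3 adds.
psi = [[0, 0, 1, 0],
       [0, 0, 1, 1],
       [0, 0, 2, 2],
       [0, 0, 0, 1]]
print(evaluate_nop(psi, [2.0, 3.0])[-1])  # q - x*q with x = 2, q = 3: prints -3.0
```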
Consider an example of calculating the second mathematical expression in (21) from its NOP-matrix
.
The initial vector of nodes is
Further, all rows of the matrix
are checked and non-zero elements are found.
The last mathematical expression coincides with the needed mathematical expression for
(
21).
This is how mathematical expressions are coded in the NOP method. To search for an optimal mathematical expression in some task, the NOP method applies the principle of small variations of a basic solution. According to this principle, one possible solution is encoded in the form of the NOP-matrix
. This solution is the basic solution and it is set by a researcher as a good solution. Other possible solutions are presented in the form of sets of small-variation vectors. A small variation vector consists of four integer numbers
where
is a type of small variation,
is a row number of the NOP-matrix,
is a column number of NOP-matrix,
is a new value of an NOP-matrix element. There are four types of small variations:
is an exchange of the function with one argument, if
, then
;
is an exchange of the function with two arguments, if
, then
;
is an insertion of the additional function with one argument, if
, then
;
is an elimination of the function with one argument, if
and
,
,
and
,
, then
.
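The four types of small variations can be sketched as one Python function. The numeric type codes 0 to 3, the zero-based indexing, and the precise guard for elimination (the source node keeps another outgoing arc and the target node keeps another incoming arc) are assumptions made for illustration, since the exact conditions are not reproduced here.

```python
def apply_variation(psi, w):
    """Return a copy of the NOP-matrix psi changed by the variation w = (kind, i, j, value)."""
    kind, i, j, value = w
    n = len(psi)
    new = [row[:] for row in psi]          # never modify the basic solution in place
    if kind == 0 and i < j and new[i][j] != 0:
        new[i][j] = value                  # exchange a one-argument function
    elif kind == 1 and new[i][i] != 0:
        new[i][i] = value                  # exchange a two-argument function
    elif kind == 2 and i < j and new[i][j] == 0 and new[j][j] != 0:
        new[i][j] = value                  # insert an additional one-argument function
    elif kind == 3 and i < j and new[i][j] != 0:
        other_out = any(new[i][c] != 0 for c in range(i + 1, n) if c != j)
        other_in = any(new[r][j] != 0 for r in range(j) if r != i)
        if other_out and other_in:         # keep both nodes connected (assumed guard)
            new[i][j] = 0                  # eliminate a one-argument function
    return new

def apply_variations(psi, variations):
    """Apply an ordered set of small variation vectors to a basic solution."""
    for w in variations:
        psi = apply_variation(psi, w)
    return psi

# Hypothetical basic solution and variation set.
basic = [[0, 0, 1, 0],
         [0, 0, 1, 1],
         [0, 0, 2, 2],
         [0, 0, 0, 1]]
print(apply_variations(basic, [(0, 0, 2, 3), (2, 0, 3, 1), (3, 1, 2, 0)]))
```

A variation whose condition is not met leaves the matrix unchanged, so every variation vector maps a valid NOP-matrix to a valid NOP-matrix under these assumed guards.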
The initial population includes
H possible solutions. Each possible solution
except the basic solution is encoded in the form of the set of small variation vectors
where
d is a depth of variations, which is set as a parameter of the algorithm.
The NOP-matrix of a possible solution is determined after application of all small variations to the basic solution
Here, the small variation vector is written as a mathematical operator acting on the matrix.
During the search process sometimes the basic solution is replaced by the current best possible solution. This process is called a change of an epoch.
Consider an example of applying small variations to the NOP-matrix
. Let
and there are three following small variation vectors:
After application of these small variation vectors to the NOP-matrix
, the following NOP-matrix is obtained:
This NOP-matrix corresponds to the following mathematical expression:
A genetic algorithm is used as the search engine. To perform the main genetic operation of crossover, two possible solutions are selected randomly
A crossover point is selected randomly
. Two new possible solutions are obtained as the result of exchanging elements of the selected possible solutions after the crossover point:
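The crossover described above can be sketched as a one-point exchange over two ordered sets of small variation vectors. The range of the crossover point (at least one element is kept from each parent) and the function name are assumptions for illustration.

```python
import random

def crossover(parent_a, parent_b, rng=None):
    """One-point crossover of two equally long lists of small variation vectors."""
    rng = rng or random.Random()
    d = len(parent_a)
    if d != len(parent_b) or d < 2:
        raise ValueError("parents must have the same depth d >= 2")
    point = rng.randrange(1, d)                     # crossover point (assumed range)
    child_a = parent_a[:point] + parent_b[point:]   # tails are exchanged
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b, point

# Hypothetical parents with depth of variations d = 3.
a = [(0, 0, 2, 3), (1, 2, 2, 1), (2, 0, 3, 1)]
b = [(3, 1, 2, 0), (0, 1, 3, 2), (1, 3, 3, 2)]
child_a, child_b, point = crossover(a, b, random.Random(0))
print(point, child_a, child_b)
```

Because only variation vectors are exchanged, each child is decoded in the same way as its parents, by applying its vectors to the basic solution.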
The second stage of the synthesized approach under consideration is to solve the optimal control problem by determining the optimal positions of the equilibrium points. Studies have shown that evolutionary algorithms cope well with complex optimal control problems with phase constraints. Good results were demonstrated [24] by such algorithms as the genetic algorithm (GA) [25], the particle swarm optimization (PSO) algorithm [26,27,28], the grey wolf optimizer (GWO) algorithm [29], and a hybrid algorithm [24] that maintains one population of possible solutions and applies the three evolutionary transformations of GA, PSO and GWO, selected randomly.