1. Introduction
Robots can walk in a variety of ways. At present, the movement forms can be roughly divided into wheeled [
1], tracked [
2], wheel-foot compound [
3], snake-like [
4], bionic legged [
5], and so on. Compared with other types of robots, bionic-legged robots are characterized by discontinuous support because their leg structure is similar to that of tetrapods. When combined with hydraulic drive, which has a high power-to-weight ratio, such a robot not only adapts well to unknown and unstructured environments but can also cross obstacles. Therefore, this type of robot is particularly suitable for use in complex field environments.
The leg controller serves as the bottom-level controller of this kind of robot, and each leg of the robot has several degrees of freedom controlled by highly integrated valve cylinders, also known as the hydraulic drive unit (HDU) [
6,
7]. Because the HDU serves as the bottom-level controller of each leg, its control performance directly affects the control strategy and performance of the robot. Commonly, HDU bottom-level control methods can be divided into position control and force control. Based on bottom-level control, the control methods of the leg can be extended to compliance control, contact force control, and so on. The above methods are not only applied in electrically driven robots such as Scara [
8] and Stewart [
9], but they can also be applied to robots such as Bigdog [
10], Hydraulic quadrupedal (HyQ) [
11], Light Weight Robot (LWR) [
12], and Atlas [
13].
This paper mainly investigates the performance of the HDU in position control. The position control system of the HDU is a high-order nonlinear system. Designing a superior control method requires a very detailed understanding of the characteristics of the controlled system. Establishing a mathematical model involves analyzing the controlled system, and an accurate mathematical model can faithfully reflect the dynamic characteristics of the system, fully represent the actual system in simulation research, and shorten the design cycle of the control method. High-performance intelligent control methods suitable for low-order nonlinear systems can also be applied to it. However, in order to ensure the control stability and reliability of the whole machine, such control methods are not often used in engineering practice. The traditional control method is simple to implement and its effect is obvious. Furthermore, changes in its control parameters directly reflect the system characteristics, which can be used for a preliminary analysis of the system performance. Thus, the HDU position control system is still based on traditional PID control.
A neural network is a computational model that comprehensively simulates the human brain neural network in terms of structure, mechanism, and function [
14,
15,
16]. By virtue of its complex nonlinear network structure and efficient iterative learning performance, it has obvious advantages over other nonlinear optimization methods. Some studies have shown that neural networks can fit arbitrary nonlinear functions. Swic presented a machine learning-based automated approach for controlling the machining of low-rigidity shafts, in which three models of hybrid controllers based on different types of neural networks and genetic algorithms were developed [
17]. Rego addressed the problem of finding a control Lyapunov function that keeps the system stable, and proposed the use of reinforcement learning with two neural networks based on Lyapunov stability theory [
18]. Nobahari developed a nonlinear controller based on convolutional neural networks to control different plants, assuming that prior knowledge of the plants is very limited and that only the sensory input–output data history of the plants is available [
19]. Wang studied the hysteresis nonlinear characteristics of piezoelectric actuators and proposed a novel hybrid modeling method based on long short-term memory (LSTM) and nonlinear autoregressive with external input (NARX) neural networks [
20].
The neural network is used to learn the relationship between the control parameters and the control performance under different working conditions and to find the optimal control parameters for the current working conditions, which can improve the control accuracy of the system under various working conditions and eliminate manual parameter adjustment. Compared with variable value PID based on manual parameter adjustment, the method based on neural networks can output continuously varying parameters according to the working conditions, thereby improving the control accuracy. In addition, the neural network method is not restricted to the specific number of conditions in the expert table. Thus, extending the applicable scope of the expert table in this way is of great significance for engineering applications.
The structure and contributions of this paper are as follows: in
Section 2, a mathematical model is established for the HDU position control system. In the model, many factors are carefully considered, such as servo valve nonlinearity, flow-pressure nonlinearity, and load characteristics. In
Section 3, aiming at the inconvenience of variable value PID based on manual parameter adjustment in engineering practice, a method that employs double-layer back propagation (BP) neural networks to learn the law of the PID control parameters is proposed, and the simulation results are shown; this is the main contribution of our paper. In
Section 4, experimental research is carried out on the HDU performance test platform.
3. Adaptive PID Parameter Control Method Based on a Double-Layer BP Neural Network
3.1. Learning Strategy Design
Neurons are the basic unit of neural networks and their main function is to simulate the functional characteristics of biological neurons [
21,
22,
23]. Considering that the input of the neural network in this paper comes from the sensor data of the control system, a Tanh activation function, one of the Sigmoid-type activation functions (the family generally referred to simply as the Sigmoid activation function), was selected as the activation function of the neurons.
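For reference, the Tanh function is a shifted and rescaled logistic function, which is why both are commonly grouped as S-shaped activations. The symbols x and σ below are notation introduced here for illustration and do not come from the paper.

```latex
\tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)} = 2\,\sigma(2x) - 1,
\qquad
\sigma(x) = \frac{1}{1 + \exp(-x)}
```

Tanh maps its input to the interval (−1, 1) and is centered at zero, whereas the logistic function maps to (0, 1).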
In order to make the system automatically output the optimal control parameters according to the working conditions, an appropriate neural network structure must be designed first. If the neural network is too simple, the fitting accuracy will be reduced; if it is too complex, convergence will be slow, and the generalization ability of the neural network may even be reduced. Therefore, it is very important to design a neural network with an appropriate structure. Learning strategies must then be designed to enable the neural network to learn effectively, including the learning objects of the neural network, the selection of samples, the initial processing of samples, and the iterative learning methods. In this section, a parameter learner based on a double-layer BP neural network is designed, which can realize automatic parameter learning. The overall learning strategy is shown in
Figure 3, and the details are explained in the following sections.
3.2. Generation of Learning Samples
Samples are a very important part of neural network learning problems and are the source from which effective information is learned. The sample data in this paper were obtained from simulation of the HDU position control system or from experimental collection. The data contained random interference generated by the system itself, and the range of each variable was also different, so the data had to be processed before being used for learning. The sample data used in this section had to meet the following conditions:
- (1)
The samples should cover as wide a range of working conditions and control parameters as possible, and the performance indexes under the corresponding working conditions should be obtained through experiment or simulation, so that the neural network can learn the characteristics of the control system and improve the adaptive ability of the control method.
- (2)
The sample should be universal. The hydraulic system is a highly nonlinear time-varying system, and the dynamic characteristics of the system change with the different external conditions. Collection of data should be carried out after the hydraulic system has been started up and run stably under good heat dissipation conditions.
- (3)
The data interval of each variable in the sample should be as consistent as possible, which is beneficial for improving the convergence speed and stability of neural networks.
According to the above conditions and principles, a plan of learning data for the PID position control system of the HDU was designed in this section. By generating the input signals and change signals of the control parameters, then importing them into the control model, automatic data acquisition was realized.
In order to prove the effectiveness of the proposed learning strategy, a subset of the overall working conditions of the HDU was selected for verification to reduce unnecessary work, and the control parameter range was then simplified based on the simulation results of the PID control system shown in
Section 2. The working conditions and control parameters finally determined in this section are shown in
Table 2.
The final working conditions are generated by the permutation and combination of sinusoidal frequency, sinusoidal amplitude, and P gain in
Table 2. There are eight groups of sinusoidal frequency, 15 groups of P gain, and 10 groups of sinusoidal amplitude, giving 1200 working conditions in total. In order to avoid mutual influence between two adjacent working conditions, each working condition runs for two cycles, with an overall sampling time of approximately 1632 s. Moreover, the mean of the absolute value of the control deviation over the last cycle is taken as the basis for evaluating the control performance.
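The enumeration of working conditions and the performance index described above can be sketched as follows. Only the group counts (8, 10, 15) and the two-cycle rule come from the text; the numeric ranges, variable names, and function names are hypothetical placeholders and are not the values of Table 2.

```python
import itertools
import numpy as np

# Hypothetical grids: only the counts (8, 10, 15) come from the text.
# The numeric ranges are placeholders, NOT the values of Table 2.
FREQS_HZ = np.linspace(0.4, 2.0, 8)
AMPS_MM = np.linspace(1.0, 5.0, 10)
P_GAINS = np.linspace(5.0, 75.0, 15)
CYCLES_PER_CONDITION = 2


def working_conditions():
    """Return every (frequency, amplitude, P gain) combination."""
    return list(itertools.product(FREQS_HZ, AMPS_MM, P_GAINS))


def total_run_time(conditions, cycles=CYCLES_PER_CONDITION):
    """Total time in seconds if each condition runs `cycles` full sine periods."""
    return sum(cycles / f for f, _, _ in conditions)


def mean_abs_deviation_last_cycle(t, e, freq_hz):
    """Mean |e| over the final sine period of one condition's record."""
    t = np.asarray(t, dtype=float)
    e = np.asarray(e, dtype=float)
    period = 1.0 / freq_hz
    last = t >= t[-1] - period  # samples belonging to the last cycle
    return float(np.mean(np.abs(e[last])))


conditions = working_conditions()
print(len(conditions))  # number of combinations: 8 * 10 * 15
```

Discarding the first cycle keeps the start-up transient of each condition out of the performance index, which is the stated purpose of running two cycles.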
The desired input signals in the simulation are shown in
Figure 4. Because of the long sampling time, the sinusoidal curves at different frequencies appear relatively dense in the figure.
The P gain of controller in the simulation is shown in
Figure 5.
The working condition parameters are the sinusoidal frequency and amplitude of the input signal, the control parameter is the P gain of the PID control method, and the performance index of the system is the mean of the absolute value of the control deviation. It can be seen in
Table 2 that these three kinds of variables differ by an order of magnitude, which is not beneficial to the learning of the neural network. Therefore, the variables should be appropriately transformed so that their intervals lie roughly between 0 and 1. Accordingly, “data after processing” in the following sections refers to the data after normalization.
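One common way to bring variables of different magnitudes into the interval from 0 to 1 is min-max scaling. The sketch below assumes this transformation; the paper does not state which normalization was used, and the sample values and function names are hypothetical.

```python
import numpy as np


def fit_min_max(data):
    """Return per-column (minimum, span) used to scale columns to [0, 1]."""
    data = np.asarray(data, dtype=float)
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    span[span == 0.0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return lo, span


def normalize(data, lo, span):
    """Scale each column to [0, 1] using the fitted minimum and span."""
    return (np.asarray(data, dtype=float) - lo) / span


def denormalize(scaled, lo, span):
    """Invert `normalize`, e.g. to turn a network output back into a P gain."""
    return np.asarray(scaled, dtype=float) * span + lo


# Hypothetical raw samples: [frequency (Hz), amplitude (mm), P gain, mean |e| (mm)]
raw = np.array([
    [0.5, 1.0, 10.0, 0.020],
    [1.0, 3.0, 40.0, 0.008],
    [2.0, 5.0, 70.0, 0.015],
])
lo, span = fit_min_max(raw)
scaled = normalize(raw, lo, span)
```

The same `lo` and `span` fitted on the training samples should be reused when new working conditions are fed to the trained network.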
3.3. Performance Fitting of Control System
In Section 3.2, the mean of control deviation e under different working conditions and control parameters was obtained through simulation. In this section, neural network 1 is used to fit the relationship among the working condition parameters, the control parameters, and the mean of control deviation e. Neural network 1 can then be used to calculate the mean of control deviation e for different control parameters under each working condition. The parameters giving the minimum mean of control deviation e under each working condition are selected, which completes the optimization of the control parameters.
- (1)
Input and output of the neural network
Neural network 1 was designed as follows. The input of the neural network is a three-dimensional vector consisting of the sinusoidal frequency of the input signal, the sinusoidal amplitude of the input signal, and the P gain; the output is the mean of control deviation e for the corresponding set of parameters.
- (2)
Selection of the loss function
The loss function is the index used to evaluate the model fitting effect, and the goal of neural network learning is to make the loss function as small as possible. The input and output variables of the neural network are continuous values, and the mean square error function is adopted. Its expression is as follows:
- (3)
Determination of the neural network structural parameters
The neural network has three layers in total: the input layer, one hidden layer, and the output layer. The hidden layer contains 13 neurons, and the activation function is Sigmoid. The overall structure of neural network 1 is shown in Figure 6. The sinusoidal input signals and control parameters are listed in Table 2. The output of the neural network is the mean of control deviation e between the input signals and output signals of the HDU position control system.
- (4)
Training of neural network 1
The input of neural network 1 after data processing is shown in
Figure 7.
The output of neural network 1 after data processing is shown in
Figure 8.
The processed data are fed into the neural network for learning until the gradient is less than 10^-6 or the mean square deviation is less than 10^-4.
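A minimal sketch of a network with the structure described for neural network 1 (three inputs, 13 hidden neurons, one output) is given below. It takes the hidden activation to be Tanh, as stated in Section 3.1, and uses the mean square error, that is, the mean of the squared differences between network output and target. The linear output neuron, the plain full-batch gradient descent, the learning rate, and the synthetic training data are assumptions made for illustration; the paper does not specify the training algorithm here.

```python
import numpy as np


class OneHiddenLayerNet:
    """n_in -> n_hidden (tanh) -> 1 (linear) regression network."""

    def __init__(self, n_in=3, n_hidden=13, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
        self.b2 = np.zeros(1)

    def predict(self, X):
        return np.tanh(X @ self.W1 + self.b1) @ self.W2 + self.b2

    def fit(self, X, y, lr=0.05, max_epochs=5000, grad_tol=1e-6, mse_tol=1e-4):
        """Full-batch gradient descent (assumed optimizer); returns the MSE history."""
        y = np.asarray(y, dtype=float).reshape(-1, 1)
        n = len(X)
        history = []
        for _ in range(max_epochs):
            H = np.tanh(X @ self.W1 + self.b1)   # hidden activations
            err = H @ self.W2 + self.b2 - y      # prediction error
            mse = float(np.mean(err ** 2))
            history.append(mse)
            # Back-propagate the gradient of the MSE loss.
            d_out = 2.0 * err / n
            gW2 = H.T @ d_out
            gb2 = d_out.sum(axis=0)
            d_hid = (d_out @ self.W2.T) * (1.0 - H ** 2)
            gW1 = X.T @ d_hid
            gb1 = d_hid.sum(axis=0)
            grad_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in (gW1, gb1, gW2, gb2)))
            # Stop on either criterion named in the text.
            if mse < mse_tol or grad_norm < grad_tol:
                break
            self.W1 -= lr * gW1
            self.b1 -= lr * gb1
            self.W2 -= lr * gW2
            self.b2 -= lr * gb2
        return history


# Synthetic stand-in for normalized (frequency, amplitude, P gain) -> mean |e| samples.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, (200, 3))
y = 0.5 * X[:, 0] * X[:, 1] + 0.3 * (X[:, 2] - 0.5) ** 2
net = OneHiddenLayerNet()
history = net.fit(X, y)
```

The epoch limit is a safeguard added in this sketch; with a slow optimizer such as plain gradient descent the two tolerances may not be reached within it.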
3.4. Optimization of the Control Parameters
For a specific working condition, the sinusoidal frequency and amplitude of the input signal are fixed. Taking the control parameters as the independent variables, the mapping established by neural network 1 as the function, and the mean of control deviation e as the dependent variable, the relationship between the control performance and the control parameters under this working condition can be obtained. There is an obvious regularity between the control performance and the control parameters, so the control parameters with better control performance can be read from the resulting curves. The optimal control parameters under each working condition are selected according to certain rules, and neural network 2 is used to learn the relationship between the working condition parameters and the selected control parameters. After learning, neural network 2 is used to change the control parameters adaptively according to the working conditions, thereby realizing adaptive control. The specific learning model was designed as follows:
- (1)
Selection of the neural network input and output
The purpose of neural network 2 is to calculate the control parameters that meet the rules under different working conditions. Therefore, the inputs of the neural network are the sinusoidal frequency and amplitude of the input signals, which are generated by permutation and combination over a sinusoidal frequency range of 0.4~2 Hz and a sinusoidal amplitude range of 1~5 mm, at intervals of 0.01 Hz and 0.05 mm, respectively. The outputs of the neural network are the selected control parameters, which could control the model in
Figure 2 instead of the PID. The overall structure of neural network 2 is shown in
Figure 9.
- (2)
Rules of parameter selection
The control parameters with the minimum mean of control deviation e are selected to form the output sample of the neural network.
- (3)
Training of neural network 2
Neural network 2 consists of three layers, including one hidden layer with 10 neurons. The activation function is Sigmoid, and the loss function is the mean square error.
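The construction of the training set for neural network 2 can be sketched as a sweep over the candidate control parameters for each working condition. The frequency and amplitude grids follow the ranges and intervals stated above; the candidate P gain grid, the function names, and the `toy_surrogate` that stands in for the trained neural network 1 are hypothetical.

```python
import numpy as np


def best_gain_per_condition(predict_deviation, freqs, amps, gains):
    """For each (frequency, amplitude), pick the gain with the smallest predicted deviation.

    predict_deviation(freq, amp, gains) -> array of predicted mean |e|, one per gain.
    Returns (inputs, targets): the working conditions and the selected gains.
    """
    inputs, targets = [], []
    for f in freqs:
        for a in amps:
            predicted = np.asarray(predict_deviation(f, a, gains))
            inputs.append((f, a))
            targets.append(gains[int(np.argmin(predicted))])
    return np.array(inputs), np.array(targets)


def toy_surrogate(freq, amp, gains):
    """Stand-in for trained neural network 1 (purely illustrative, not the HDU)."""
    sweet_spot = 20.0 + 15.0 * freq + 2.0 * amp
    return 0.01 + 1e-5 * (gains - sweet_spot) ** 2


freqs = np.linspace(0.4, 2.0, 161)     # 0.4~2 Hz at 0.01 Hz intervals
amps = np.linspace(1.0, 5.0, 81)       # 1~5 mm at 0.05 mm intervals
gains = np.linspace(5.0, 100.0, 191)   # hypothetical candidate P gains
X2, y2 = best_gain_per_condition(toy_surrogate, freqs, amps, gains)
```

Here `X2` and `y2` play the role of the input and desired output samples of neural network 2; in practice they would be normalized before training.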
3.5. Simulation
Neural network 2, after training, was applied to the HDU position control system. Then, the updated schematic diagram of the HDU position control system is shown in
Figure 10.
When the working condition parameters changed, neural network 2 automatically adjusted the control parameters according to the working condition to realize adaptive control. Based on the MATLAB/Simulink model of the system established in
Section 2, a MATLAB function module was introduced in this section to perform the neural network 2 calculation, and its results were output to the PID control model.
In the simulation, the initial position of the hydraulic cylinder piston was 25 mm, the P gain was the output of neural network 2, the I gain was 2, and the D gain was 0. The simulation working conditions are shown in
Table 3.
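The idea of a PID controller whose P gain is supplied by a scheduler can be sketched as follows. The I gain of 2, the D gain of 0, and the 25 mm initial position come from the text; the 1 ms step size, the `hypothetical_schedule` that stands in for the trained neural network 2, and the toy integrator plant are assumptions and do not represent the HDU model of Section 2.

```python
import math


class ScheduledPID:
    """Discrete PID whose P gain is looked up from the working condition."""

    def __init__(self, gain_schedule, ki=2.0, kd=0.0, dt=0.001):
        self.gain_schedule = gain_schedule  # callable: (freq_hz, amp_mm) -> P gain
        self.ki, self.kd, self.dt = ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, reference, measurement, freq_hz, amp_mm):
        """Return the control output for one sample."""
        error = reference - measurement
        kp = self.gain_schedule(freq_hz, amp_mm)
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return kp * error + self.ki * self.integral + self.kd * derivative


def hypothetical_schedule(freq_hz, amp_mm):
    """Placeholder for the trained neural network 2."""
    return 20.0 + 10.0 * freq_hz + 1.0 * amp_mm


def simulate(freq_hz=1.0, amp_mm=2.0, duration_s=2.0, dt=0.001):
    """Track 25 mm + sine with a toy integrator plant (x' = u); returns max |e| in the last cycle."""
    pid = ScheduledPID(hypothetical_schedule, dt=dt)
    x = 25.0  # initial piston position in mm
    max_abs_error = 0.0
    for k in range(int(round(duration_s / dt))):
        t = k * dt
        ref = 25.0 + amp_mm * math.sin(2.0 * math.pi * freq_hz * t)
        if t >= duration_s - 1.0 / freq_hz:  # score only the last cycle
            max_abs_error = max(max_abs_error, abs(ref - x))
        u = pid.step(ref, x, freq_hz, amp_mm)
        x += u * dt  # explicit Euler step of the toy plant
    return max_abs_error
```

Calling `simulate()` with different frequencies and amplitudes exercises the schedule in the same way the working conditions of Table 3 exercise neural network 2.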
The ideal control deviation (reference signal) was 0, meaning that there is no control deviation between the input and the output. The comparison curves with constant value and variable value PID are shown in Figure 11 (adaptive PID control based on a neural network is abbreviated as neural network PID, and control deviation e as deviation e).
The control deviation of the adaptive PID control system based on the neural network (the blue curves in
Figure 11) is shown in
Table 4 (maximal relative deviation is equal to the ratio of the maximum deviation to the sinusoidal amplitude).
According to the simulation results, under the three working conditions, the maximum relative deviation of the adaptive PID method based on a neural network decreased by an average of 31.3% compared with that of the constant value PID and increased by 7.87% compared with that of the variable value PID. The deviation of the adaptive PID method based on a neural network was thus greatly reduced compared with the constant value PID and approached the effect of the manually adjusted PID control parameters, maintaining good control performance under multiple working conditions. Due to space limitations, additional simulation results are not included in this paper.
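The maximal relative deviation defined above, and one possible way of averaging its change across working conditions, can be written as a short sketch. The paper does not state whether the reported averages are relative changes or differences in percentage points; the relative-change reading is assumed here, and all numbers and function names are hypothetical.

```python
def max_relative_deviation(deviation_mm, amplitude_mm):
    """Maximum |e| divided by the sinusoidal amplitude."""
    return max(abs(e) for e in deviation_mm) / amplitude_mm


def mean_relative_change_percent(baseline, candidate):
    """Average of (candidate - baseline) / baseline over working conditions, in percent.

    Negative values mean the candidate has a smaller deviation than the baseline.
    """
    changes = [(c - b) / b for b, c in zip(baseline, candidate)]
    return 100.0 * sum(changes) / len(changes)


# Hypothetical maximal relative deviations for three working conditions.
constant_pid = [0.10, 0.20, 0.40]
neural_pid = [0.09, 0.10, 0.30]
change = mean_relative_change_percent(constant_pid, neural_pid)
```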
4. Experiments
4.1. Introduction to the Experimental System
The experiments in this study were carried out on the HDU performance test platform. The platform is mainly composed of two HDUs, which are installed at the top. The HDU on the left adopts position closed-loop control, while the HDU on the right adopts force closed-loop control. In the experiment, the HDU on the left was used for the performance test of the relevant control algorithm, and the HDU on the right performed zero-force servo control. In each experiment, the working conditions of the left and right HDUs were the same. A photo of the experimental platform is shown in
Figure 12a.
The controller used in the experiments is the dSPACE MicroLabBox semi-physical simulation platform, shown in
Figure 12b. MicroLabBox is supported by a comprehensive dSPACE software package, including the real-time interface (RTI) for Simulink (MathWorks, Natick, MA, USA) for model-based I/O integration and the experiment software ControlDesk, which provides access to the real-time application during run time by means of graphical instruments.
After the control algorithm was built in MATLAB/Simulink, automatic code generation was used to produce the target C code that could be recognized by the controller. Compared with manual C coding, combining MATLAB/Simulink with the code generator makes it possible to design and test control algorithms quickly, avoids the complexity of writing the underlying C code, and speeds up the controller implementation stage. In the experiment, the data sampling frequency was 1 kHz.
Figure 13 is the schematic diagram of the experimental signal input and data acquisition.
4.2. Collection of Learning Samples
As a joint actuator of robots, the HDU is the key to determining the motion performance of robots. According to the movement of the robot during trotting, pacing, and other gaits, the proposed sampling range of experimental learning samples is shown in
Table 5.
The final working conditions were obtained by permutation and combination of the values in the table, giving a total of 324 groups of working conditions. In order to avoid mutual influence between adjacent conditions, each group of working conditions ran for three cycles, and the mean of control deviation over the last two cycles was taken as the performance index. The generated system input signal sequence is shown in
Figure 14 and
Figure 15, and the signal acquisition interface is shown in
Figure 16.
4.3. Optimization of the Control Parameters
The samples obtained in
Section 4.2 were used to learn the relationship among the working conditions parameters, the control parameters, and the control performance, and the neural network structure and data processing methods used were the same as those in
Section 3.3. The training performance of the neural network is shown in
Figure 17.
It can be seen that after the neural network learning was completed, the mean square error reached the order of magnitude of 10^-4, so the network estimated the control performance well and laid a foundation for the subsequent calculation of the control parameters.
The control performance index of the HDU was set as follows: the maximum of control deviation e should not exceed 5% of the sinusoidal amplitude. Based on the obtained neural network, the corresponding system performance under different working conditions and the control parameters were calculated, and the control parameters required to meet the control performance requirements were selected. The working condition parameters were taken as the input of neural network 2, and the selected control parameters were taken as the desired output of neural network 2. The sinusoidal frequency of the input signal was 0.5–2 Hz and the amplitude was 5–15 mm, and the input signals were generated by permutation and combination at intervals of 0.01 Hz and 0.05 mm, respectively.
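The 5% requirement can be turned into a selection rule in several ways, and the text does not say which admissible parameter is chosen. The sketch below assumes that the smallest admissible P gain is taken, with a fallback to the gain of smallest predicted deviation when none is admissible; this tie-breaking rule, the function name, and the example numbers are all hypothetical.

```python
import numpy as np


def select_gain(predicted_max_dev, gains, amplitude_mm, limit_ratio=0.05):
    """Pick a gain whose predicted maximum deviation is within limit_ratio * amplitude.

    Assumption: among admissible gains the smallest one is chosen; if none is
    admissible, the gain with the smallest predicted deviation is returned.
    """
    predicted_max_dev = np.asarray(predicted_max_dev, dtype=float)
    gains = np.asarray(gains, dtype=float)
    admissible = predicted_max_dev <= limit_ratio * amplitude_mm
    if admissible.any():
        return float(gains[admissible].min())
    return float(gains[np.argmin(predicted_max_dev)])


# Hypothetical predictions (mm) for four candidate gains.
candidate_gains = [10.0, 20.0, 30.0, 40.0]
predicted = [0.90, 0.60, 0.45, 0.40]
gain_large_amp = select_gain(predicted, candidate_gains, amplitude_mm=10.0)
gain_small_amp = select_gain(predicted, candidate_gains, amplitude_mm=5.0)
```

A rule of this kind accepts any parameter that meets the requirement instead of the one with the smallest deviation, so the resulting deviation can be larger than that of a well-tuned fixed gain under some working conditions.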
The neural network structure and data processing methods used were the same as those used in
Section 3.4. The learning performance of neural network 2 is shown in
Figure 18.
It can be seen that the neural network converged rapidly, and the mean square error reached the order of magnitude of 10^-1 after learning, which meets the accuracy requirements for control parameter adjustment.
4.4. Experiment of Adaptive PID Control Based on a Neural Network
In order to verify the performance of the adaptive PID control based on a neural network, an experiment was carried out on the performance test platform of the HDU under the working conditions shown in
Table 3, and the control performance of the system under different working conditions was tested.
The initial position of the piston of the HDU was 25 mm, and the oil source pressure of the system was 5 MPa. The working conditions were input into the adaptive PID control system based on the neural network, and a deviation curve was obtained, which was compared with the deviation curve of the PID control with constant and variable values, as shown in
Figure 19.
The control deviation of the adaptive PID method based on the neural network (the blue curves in
Figure 19) is shown in
Table 6.
As shown in
Figure 19 and
Table 6, due to the setting of the parameter selection rules, the control deviation of the adaptive PID method was slightly larger than that of the constant value PID under working condition 1, but it improved greatly over that of the constant value PID method under the other two working conditions. The maximum relative deviation over the three working conditions was reduced by 22.13% on average compared with that of the constant value PID method, which is close to the deviation level of the variable value PID method. On the whole, the control accuracy of the adaptive PID method based on a neural network lay between that of the constant value PID method and that of the variable value PID method: it was slightly worse than the variable value PID method but better than the constant value PID method. The proposed method therefore has good adaptability and can maintain better control accuracy under various working conditions.
With the method proposed in this paper, more parameter information corresponding to the working conditions can be learned, and the same research idea can be extended to other control systems with similar structures. Moreover, building on this double-layer BP neural network, other machine learning methods such as deep deterministic policy gradient (DDPG) could be investigated.