1. Introduction
With the development of robot and control technology, various robots are widely used in industry. Different applications impose specific requirements on robot systems, such as rapidity, robustness, and safety [1,2,3]. However, among all the indices required by applications, the controllability of the robot in a fault state has become the most critical factor. Fault identification is the precondition for achieving this goal, which motivates our research [4,5,6].
Robot systems cannot work without the support of various sensors and actuators, and their development demands miniaturization and multi-functionality. The rapid progress of sensors, material science, and micro-electro-mechanical technology allows modern robot joint modules, such as the hollow motor, servo driver, harmonic reducer, brake, and encoder, to be integrated within a limited space [7]. Sensors and actuators are key components of the robot system, but their working environment is complex, with electromagnetic interference, vibration, and other disturbances that affect the output of the sensors and, in turn, the actuators. Moreover, the variable load on manipulators is also a challenge for system state feedback or estimation. All of the above factors make fault diagnosis of robot system sensors and actuators an urgent task [8].
Sensor and actuator malfunctions are the main causes of robot system failure, so diagnosis of the sensors and actuators is very important. In order to improve the reliability of robot joints and realize fault detection and fault-tolerant control of robot systems, researchers have focused on these problems for many years, and many practical fault diagnosis methods have been proposed. In [9], redundant sensors are installed on the robot joint, and fuzzy rules are designed to adaptively adjust the threshold of the fault signal for fault diagnosis. In [10], for a six-degree-of-freedom robot joint system, low-cost MEMS magnetic, angular velocity, and gravity sensors are used to estimate the joint angle of a rotating manipulator. In [11], a discrete-time framework for fault diagnosis of robot joint sensors, force or torque sensors, and actuators is proposed: redundant sensors are mounted on the robot joint, the feedback values from the redundant sensors and the estimates computed by two isolation observers are fed into the fault decision system, and the redundant-sensor data provide information for a group of diagnostic observers to detect, isolate, and identify faults of the joint actuators and force or torque sensors.
However, there may be another consideration when using redundant sensors for fault diagnosis. A robot fault diagnosis system based on redundant sensors not only increases structural complexity, but also increases the hardware cost of the system. In addition, redundant sensors also increase the probability of a sensor fault when the running time of a robot system approaches the sensor’s life cycle.
In order to overcome the shortcomings of using redundant sensors for fault diagnosis, observers have been widely used, and many novel theories can be applied to design state observers for robot fault diagnosis. A robot fault diagnosis method that uses fuzzy logic to evaluate residuals is proposed in [12]. Fuzzy logic applied to robot fault diagnosis does not require any redundant sensors, but it relies on the fault model of the robot system. The sliding mode method also appears throughout robot fault diagnosis. Daniele uses a second-order sliding mode observer for fault detection and isolation of the rigid manipulator of the COMAU robot, designing the input law of the observer with the suboptimal second-order sliding mode algorithm, so that a single fault on a specific brake or a specific sensor of the manipulator can be detected [13]. Since the high-order sliding mode observer can detect possible faults in specific parts of the robot system, the sliding mode method has been greatly extended [14]. The observer design methods mentioned above are only typical representatives; many other methods can be used for robot fault diagnosis, such as the output feedback method [15], the nonlinear disturbance observer [16], and the feedback-linearization disturbance observer design method [17]. As is well known, the difficulty of observer-based robot fault diagnosis lies in the gain design process [18].
Machine learning offers an effective solution to the above problems caused by redundant sensors and observers. Typical methods include, but are not limited to, the genetic algorithm [19], support vector machine [20], cluster analysis [21], and neural network [22]. Among them, the neural network is widely used in the field of fault diagnosis because of its superior nonlinear fitting ability. Traditional fault diagnosis methods realize feature extraction manually, so prior knowledge about fault information is needed, which increases the difficulty of analyzing the results. Neural networks, especially deep learning methods, can learn representations and patterns hierarchically from the input data and realize effective feature extraction, so deep learning is able to model complex working conditions and output accurate predictions. Several typical deep learning methods have been successfully applied to fault diagnosis [23,24,25,26], including autoencoders [27], deep belief networks [28], and CNNs [29]. An autoencoder and feature ensemble method is applied to actuator fault diagnosis [30]. Furthermore, a one-layer autoencoder-based neural network has proven effective in the task of fault classification [31]. The deep belief network model has been successfully applied to fault diagnosis of actuators using vibration signals [32]. A one-dimensional CNN has been used to analyze raw time-series data of the actuator and proved successful in diagnosing fault states [33], and a new CNN architecture based on LeNet-5 has been used to process a bearing data set [34].
Considering that the outputs of the sensor and the actuator are similar when faults occur, ordinary neural network fault diagnosis methods cannot exactly tell the difference between them. In this paper, the DCNN is used to diagnose sensor and actuator faults of robot joints. The DCNN can extract features from the input data and realize fault classification by increasing the depth of the network. In addition, the flexible selection of the convolution kernel width makes it an efficient way to deal with classification problems with weak characteristics. In practice there may be many types of sensors and actuators; our research mainly focuses on fault diagnosis of the position sensors of the robot joint and the torque sensors of the actuator. The robot joint is forced to move along a sinusoidal trajectory under the control of the actuator, and the position sensor feeds back the corresponding signals under different sensor states. The position sensor and the torque sensor are denoted simply by the sensor and the actuator in the remainder of the text. The main contributions of this paper are as follows.
- (1)
This paper presents a fused sensor and actuator fault diagnosis model, in which sensor and actuator faults are expressed in a single formulation while the different faults can still be distinguished.
- (2)
This paper proposes a DCNN fault diagnosis method. The architecture contains several convolution blocks, and the kernel depth varies from block to block, which helps to extract features from the time domain of the input data.
- (3)
Experiments with other neural network fault diagnosis methods, namely SVM, ANN, CNN, and LTMN, are conducted and compared with the DCNN.
The rest of this paper is organized as follows: Section 2 introduces the basic structure of the DCNN; Section 3 introduces the neural network training method based on the deep CNN; simulation experiments are conducted and their results compared in Section 4; and the paper is concluded at the end.
3. Sensor and Actuator Fault Diagnosis Framework Using DCNN
The proposed robot joint sensor and actuator fault diagnosis framework based on the DCNN is shown in Figure 8. The whole fault diagnosis framework can be divided into data fusion, multi-level feature extraction, neuron dropout, full connection, and diagnosis output parts. There are six "Blocks" in the framework, and each block consists of a convolution layer, batch normalization layer, activation layer, and pooling layer.
The convolution layer parameters of Block1 are [32 * 1, 8, 4], where 32 * 1 is the dimension of the one-dimensional convolution kernel, 8 is the convolution depth, and 4 is the sliding step. The convolution parameters of Block2 to Block6 are [3 * 1, 16, 1], [3 * 1, 32, 1], [3 * 1, 64, 1], [3 * 1, 128, 1], and [3 * 1, 128, 1]. The pooling parameters of Block1 are [2 * 1, 8, 2], where 2 * 1 is the dimension of the pooling kernel, 8 is the pooling depth, and 2 is the sliding step. The pooling layer parameters from Block2 to Block6 are [2 * 1, 16, 2], [2 * 1, 32, 2], [2 * 1, 64, 2], [2 * 1, 128, 2], and [2 * 1, 128, 2]. The pooled output of Block6 is used as the input of the dropout layer, and the dropout rate is set to 50% to mitigate overfitting of the model. The number of neurons in the fully connected layer is 100, and they are connected to 10 categories.
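For illustration, the six-block architecture described above can be sketched in Keras (the library used in Section 4). This is a minimal sketch, not the authors' implementation: the input length of 1024 points, the padding mode, and the variable names are assumptions, and only the layer parameters listed above are taken from the text.

```python
# Minimal Keras sketch of the six-block DCNN described above (illustrative only).
from tensorflow.keras import layers, models

def conv_block(x, filters, kernel_size, conv_stride):
    """Convolution -> batch normalization -> ReLU activation -> max pooling."""
    x = layers.Conv1D(filters, kernel_size, strides=conv_stride, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.MaxPooling1D(pool_size=2, strides=2)(x)
    return x

inputs = layers.Input(shape=(1024, 1))           # one-dimensional fused fault signal (length assumed)
x = conv_block(inputs, filters=8, kernel_size=32, conv_stride=4)      # Block1: [32*1, 8, 4]
for filters in (16, 32, 64, 128, 128):                                # Block2..Block6: [3*1, depth, 1]
    x = conv_block(x, filters=filters, kernel_size=3, conv_stride=1)
x = layers.Dropout(0.5)(x)                       # 50% dropout on the pooled output of Block6
x = layers.Flatten()(x)
x = layers.Dense(100, activation="relu")(x)      # fully connected layer with 100 neurons
outputs = layers.Dense(10, activation="softmax")(x)   # 10 fault categories
model = models.Model(inputs, outputs)
```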
3.1. Data Fusion
This paper addresses the fault diagnosis problem of the robot sensor and actuator, so the input data contain several kinds of fault information. Therefore, this paper adopts a data fusion scheme to unify the sensor fault and actuator fault in one expression, so that the output of the fused model contains both sensor and actuator fault characteristics. The first step is to establish the mathematical models of the sensor and the actuator, respectively. According to the laws of mechanics, the mathematical model of the actuator is as follows.
where the state variables denote the angular position and angular velocity of the joint.
From Equation (9), the torque equation of the actuator can be obtained as follows:
τ = D(q)q̈ + C(q, q̇)q̇ + G(q),
where τ is the torque vector with dimension n; D(q) and C(q, q̇) are square matrices of dimension n, denoting the inertia matrix and the Coriolis force matrix, respectively; and G(q) is the gravity moment vector. ρ ∊ [0,1] is the effective factor of the actuator, and ρ = 0 means the actuator is completely broken. fa is the bias term, and its value is positively correlated with the degree of actuator damage. The actuator faults with different combinations of ρ and fa are listed in Table 1.
It can be seen from Equation (9) that the output variable y of the system S1 is the state variable x multiplied by the coefficient matrix E. Thus, the robot sensor fault can be directly expressed by the output equation as follows:
y = λEx + fb,
where λ ∊ [0,1] is the effective factor of the sensor, and λ = 0 means the sensor no longer works. fb is the bias term and is positively correlated with the degree of sensor damage. The sensor faults with different combinations of λ and fb are listed in Table 2.
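To make the two fault models concrete, the following sketch injects faults into nominal signals using the common multiplicative-plus-bias form implied by the effective factors and bias terms above. It is an illustration under that assumed form, not the paper's code; the function names and example values are hypothetical.

```python
# Illustrative fault injection for the effective-factor/bias fault models (assumed form).
import numpy as np

def faulty_actuator(tau, rho=1.0, f_a=0.0):
    """rho in [0, 1] is the actuator effective factor (0 = completely broken); f_a is the bias term."""
    return rho * np.asarray(tau) + f_a

def faulty_sensor(y, lam=1.0, f_b=0.0):
    """lam in [0, 1] is the sensor effective factor (0 = sensor no longer works); f_b is the bias term."""
    return lam * np.asarray(y) + f_b

# Example: 30% loss of actuator effectiveness with a small bias, and a dead sensor
# (lam = 0) that only returns its bias value; the sinusoidal signals are placeholders.
t = np.linspace(0.0, 8.0, 8000)                    # 8 s at 1000 Hz, as in Section 4
tau_fault = faulty_actuator(np.sin(np.pi * t), rho=0.7, f_a=0.05)
y_fault = faulty_sensor(np.sin(np.pi * t), lam=0.0, f_b=0.1)
```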
From the robot joint sensor fault model and the actuator fault model, it can be seen that the two faults affect the system in different ways. However, through model derivation and transformation, the sensor fault can be converted into an actuator fault through a first-order filter [38], which simplifies the model. The sensor and actuator fault data are fused according to the following formula:
Fault = a·ΔSensor + b·Actuator,
where Fault denotes the required fault data set, ΔSensor represents the difference between the sensor output and its setting, Actuator is the output of the actuator, and a and b are the sensor and actuator fault coefficients, respectively. Equation (12) thus unifies the two kinds of faults from the sensor and actuator, which helps to obtain the training data sets.
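A minimal sketch of this fusion step is shown below, assuming the weighted-sum form of Equation (12); the function name and the coefficient values are illustrative.

```python
# Fused fault signal per Equation (12): a weighted combination of the sensor residual
# (sensor output minus its setting) and the actuator output.
import numpy as np

def fuse_fault_data(sensor_output, sensor_setting, actuator_output, a=1.0, b=1.0):
    """a and b are the sensor and actuator fault coefficients (placeholder values)."""
    delta_sensor = np.asarray(sensor_output) - np.asarray(sensor_setting)
    return a * delta_sensor + b * np.asarray(actuator_output)

# Example usage with the faulty signals sketched above.
fused = fuse_fault_data(y_fault, np.sin(np.pi * t), tau_fault, a=1.0, b=1.0)
```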
The robot sensor and actuator mixed faults studied in this paper are listed in Table 3. For ease of use, F1 to F10 are used to represent the different fault labels in the following description.
3.2. Training of Model and Diagnosis
The fault diagnosis model must be well trained before it can be used for fault diagnosis. The basic training process of the DCNN-based fault diagnosis model proposed in this paper is as follows.
Fused data containing single or mixed faults of the sensor and actuator are input into Block1 (refer to Figure 8). Convolution layer 1 uses kernel 1 to carry out the convolution operations. The result of the convolution operation is input into the batch normalization module, where the extracted features are standardized so that they conform to a standard normal distribution. The activation function is then used to activate the neurons. The activation function used here is the ReLU function; its piecewise linear form avoids the saturation effect of the Sigmoid and Tanh functions.
Finally, the activated features are input into the pooling layer. The pooling method used here is max pooling, which extracts the maximum features. The above operations complete the main steps of Block1. The output of Block1 is propagated to Block2, and the steps conducted in Block1 are repeated until all six blocks have finished their corresponding operations.
After feature extraction is finished, the data flow into the fully connected layer. However, in order to speed up the training process and alleviate overfitting, a dropout layer is introduced to deactivate some neurons with a certain probability; the remaining neurons enter the fully connected layer, and finally the fault diagnosis model outputs the predicted labels.
In the training stage of the model, updating the network weight parameters is the key step. Figure 9 gives the flowchart of the parameter updating process, in which the network outputs are evaluated with the loss function to determine the direction of the parameter update. After the model is well trained, real-time data from the robot sensor and actuator can be fed into the fault diagnosis model to realize fault diagnosis.
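As a hedged sketch of this training stage, the DCNN sketched in Section 3 can be compiled and fitted in Keras as follows. The cross-entropy loss, the optimizer call, and the epoch and batch-size values are assumptions for illustration (the learning-rate schedule of Section 4.2 is simplified here); `x_train`, `y_train`, `x_val`, and `y_val` stand for the fused, labelled data sets.

```python
# Illustrative training loop for the DCNN sketch: the loss function evaluates the
# network outputs and drives the weight update, as in Figure 9.
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(),                        # optimizer choice follows Section 4.2
              loss="sparse_categorical_crossentropy",  # assumed loss for the 10 fault labels
              metrics=["accuracy"])

history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=50, batch_size=64)          # placeholder epoch/batch settings
```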
4. Experiment and Analysis
The required data are based on the multi-joint collaborative robot AUBO i3, which has six degrees of freedom, six revolute joints, and a maximum working radius of 625 mm. Our Matlab model is constructed based on this platform. There are several position sensors and one actuator for each joint, so it is necessary to monitor their working states. Experiments using different fault diagnosis methods are conducted to show the effectiveness of the proposed DCNN. A PC with an i7-10510U 1.8 GHz processor and 16 GB of RAM is used, running PyCharm with a Python 3.6 interpreter. All the algorithms are implemented in Keras with TensorFlow as its backend.
The basic architecture of the experiments is shown in Figure 10, where the fused data are expanded through the data set enhancement method, and then several neural network fault diagnosis methods are investigated and their results compared.
4.1. Data Sets Enhancement
Considering that the number of fault samples in a real robot joint system is limited, the data set enhancement method is used in this paper to expand the fault sample data sets and improve the generalization ability of the model. In order to expand the acquired robot joint fault data, a sliding sampling data set enhancement method is proposed, and its schematic diagram is shown in Figure 11.
The system data over a period are obtained, and a sample data segment with N1 points is needed for a single training input. Assuming that the length of the obtained data is N, the network can then be trained N/N1 times according to the above method. In order to improve the utilization of the data, the start point of the second data segment is shifted back by h relative to the first one, and the remaining segments are obtained in the same way. The difference in the number of samples before and after the data set enhancement method is applied is shown in the following equation.
Obviously, when the sliding step size is small, which corresponds to a large h, more data samples can be obtained, which meets the demand for data sets in the training process. In our research, h is set to 29, which yields plenty of data samples for model training and validation.
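The sliding-sampling enhancement can be sketched as below. The window length and the stride value are placeholders, since the exact relation between h and the window stride is not reproduced here; only the 8000-point record length comes from Section 4.

```python
# Sliding sampling: overlapping windows of length window_len are cut from a long
# record, each shifted by `stride` points, so far more training samples are obtained
# than with non-overlapping segmentation.
import numpy as np

def sliding_windows(signal, window_len, stride):
    signal = np.asarray(signal)
    n_windows = (len(signal) - window_len) // stride + 1
    return np.stack([signal[i * stride: i * stride + window_len]
                     for i in range(n_windows)])

record = np.random.randn(8000)                  # stand-in for one 8 s, 1000 Hz record
samples = sliding_windows(record, window_len=1024, stride=29)   # illustrative window; 29 mirrors h
```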
A robot fault model has been constructed in MATLAB/Simulink according to Equation (12), and the fault data needed in this study are obtained from it. In the data acquisition process, the sampling rate is set to 1000 Hz and the sampling time is 8 s, so 8000 sample points are obtained for each condition, as shown in Figure 12. With the aforementioned data set enhancement method, 2000 samples are obtained for each condition. The 2000 samples are divided into training, verification, and testing subsets in proportions of 70%, 20%, and 10%, respectively, which are used in the later model training and verification process. It should be noted that the input data set is one-dimensional, unlike conventional two-dimensional image data.
From Figure 12, it is clear that some of the fault types can be easily distinguished, such as F1 and F3 or F2 and F9, while others cannot, such as F3 and F4. When an "F3" fault happens, a diagnosis output of "F4" may cause adverse effects, so achieving high prediction accuracy is important.
4.2. Hyper Parameters of DCNN
The hyperparameters of the neural network include the learning rate, the regularization parameters, and the number of iterations. These hyperparameters control the values of the weight coefficients. According to existing research, the hyperparameters of deep learning algorithms affect not only the performance of the algorithm itself but also the expressive ability of the trained model. This paper follows the advice of Bengio [39]: the hyperparameters are set according to whether they increase or decrease the capacity of the model.
In the process of parameter updating, an exponentially decaying learning rate is adopted. At first, a large learning rate is used to approach the optimal solution quickly, and then the learning rate is gradually reduced to keep the model stable in the later stage of training. The initial learning rate η0 is set to 0.2, and the decay rate α is set to 0.99, with the decay applied every round. The expression for the decayed learning rate is as follows:
η = η0 · α^(H/L),
where η denotes the exponentially decayed learning rate, H stands for the number of the current round, L represents the number of rounds after which the decay is executed once, and Batch_k is the number of iterations. When a complete data set passes through the neural network once and returns, this process is called an Epoch.
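A small sketch of this schedule is given below, assuming the usual staircase form of exponential decay; η0 and α follow the definitions above, while the example values of H and L are illustrative.

```python
# Exponentially decayed learning rate: lr = lr0 * alpha ** (H // L), where lr0 is the
# initial rate (0.2), alpha the decay rate (0.99), H the current round, and L the
# number of rounds between decays (assumed staircase form).
def exponential_decay_lr(lr0, alpha, H, L):
    return lr0 * alpha ** (H // L)

lr = exponential_decay_lr(lr0=0.2, alpha=0.99, H=500, L=100)   # H and L are placeholder values
```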
In order to alleviate the overfitting of the neural network, the l2 regularization method is used in this paper. Regularization introduces a model complexity term into the loss function and suppresses the noise in the training data set by penalizing the weights of the neural network. The expression of the loss function is as follows:
Loss_all = Loss + REGULARIZER · Loss(w),
where Loss_all represents the loss function over all parameters of the model, REGULARIZER is the regularization weight, w generally refers to the weights in the forward propagation of the neural network, and Loss(w) is the result of the l2 regularization of the parameter w.
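The regularized loss can be sketched as follows; the value of REGULARIZER is a placeholder, and the penalty uses TensorFlow's standard l2 term.

```python
# Total loss = data loss + REGULARIZER * l2 penalty over the network weights.
import tensorflow as tf

REGULARIZER = 1e-4                                   # placeholder regularization weight

def loss_all(data_loss, weights):
    # tf.nn.l2_loss(w) returns sum(w ** 2) / 2 for each weight tensor
    l2_term = tf.add_n([tf.nn.l2_loss(w) for w in weights])
    return data_loss + REGULARIZER * l2_term
```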
The Adam optimization algorithm is used [40], and the weight updating procedure is as follows.
Step 1: Give the iteration step size ε.
Step 2: Set the decay rates for the moment estimation, ρ1 and ρ2.
Step 3: Determine the convergence threshold δ.
Step 4: Initialize the network weights θ and the 1st and 2nd moment variables s = 0 and r = 0, and set t = 0.
Step 5: Set the simulation time step to 0.0001.
Step 6: Collect a small batch of m samples {x(1), x(2), …, x(m)} from the training set, with corresponding targets y(i).
Step 7: Calculate the gradient g, and update the biased first moment estimate s and the biased second moment estimate r.
Step 8: Correct the deviations of the first moment and the second moment to obtain the bias-corrected estimates.
Step 9: Calculate the incremental weight update Δθ, and apply it to the network weights θ.
Step 10: If the convergence threshold of Step 3 is not met, return to Step 6; otherwise, end the iterative process.
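A NumPy sketch of one Adam update, corresponding to Steps 7 to 9 above, is given below; the default step size and decay rates are the commonly used values from the Adam literature, not values reported in the paper.

```python
# One Adam update: biased moment estimates, bias correction, and weight update.
import numpy as np

def adam_step(theta, grad, s, r, t, eps=0.001, rho1=0.9, rho2=0.999, delta=1e-8):
    """t is the 1-based iteration counter; s and r are the 1st and 2nd moment variables."""
    s = rho1 * s + (1 - rho1) * grad                         # biased first moment estimate
    r = rho2 * r + (1 - rho2) * grad ** 2                    # biased second moment estimate
    s_hat = s / (1 - rho1 ** t)                              # bias-corrected first moment
    r_hat = r / (1 - rho2 ** t)                              # bias-corrected second moment
    theta = theta - eps * s_hat / (np.sqrt(r_hat) + delta)   # incremental weight update
    return theta, s, r
```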
4.3. Simulation and Results
In order to verify the feasibility and effectiveness of the DCNN used in this paper for robot joint sensor and actuator fault diagnosis, the ANN, SVM, CNN, and LTMN methods are studied for comparative analysis and verification. The diagnosis accuracy of the different networks is shown in Figure 13.
As shown in Table 3, there are nine fault states for the sensor and actuator. For each fault state F1 to F9, 2000 samples are obtained and then divided into training, testing, and verification subsets. The accuracy of the different diagnosis methods is summarized in Figure 13. It can be seen that the accuracy of fault recognition using the DCNN is significantly improved compared with the ANN and SVM, and the average accuracy of the DCNN in the training process is over 99%. However, it is worth noting that the lowest accuracy of all five methods appears for F5. This can be explained by the fact that F5 is a mixed fault of the sensor and actuator, and Figure 12 shows that there is no obvious difference between the fault curves of the sensor and actuator, so the fault diagnosis models cannot effectively tell them apart.
The confusion matrix of the DCNN for robot sensor and actuator fault diagnosis is shown in Figure 14. It can be seen that none of the fault types is recognized with 100% accuracy, and the confusion matrix shows that the waveforms of the misjudged categories are very similar to each other.
Further experimental research is conducted to compare the fault diagnosis performance of the CNN, LTMN, and DCNN on robot joint sensor and actuator faults, and the accuracy and loss curves of each diagnosis method on the training and testing sets are drawn with the help of TensorFlow.
The three of the five fault diagnosis methods with an accuracy over 90% are investigated further. From Figure 15, it can be seen that the LTMN and DCNN achieve a fault recognition rate of 100% on both the training and testing data sets, and there is no gap between the training and testing loss functions, which demonstrates the robustness of the LTMN and DCNN.
The model training time and accuracy are shown in Figure 16. The DCNN needs less training time but still achieves the maximum diagnosis accuracy, which demonstrates the high performance of the proposed DCNN fault diagnosis method.
Since the initial values of the neural network are given randomly, in order to eliminate the influence of accidental factors, this paper trains the model 10 times in a cross manner, with the network initialized with random values each time. The accuracies of the CNN, LTMN, and DCNN are shown in Figure 17.
As can be seen from Figure 17, the recognition accuracy of the DCNN in each experiment is more than 99%, with a lowest accuracy of 99.6%. Thus, the initial values of the model have little effect on the fault diagnosis accuracy, which further demonstrates the robustness of the DCNN.