1. Introduction
With the rapid development of sensing and communication technologies, modern engineering systems are increasingly networked and distributed [1]. Furthermore, large-scale distributed systems such as power grids and vehicle platoons are generally interconnected, physically or informationally [2,3,4,5]. Such systems are therefore referred to as distributed interconnected systems, which are composed of several subsystems in different locations linked through coupling mechanisms. However, the increasing size and complexity of distributed interconnected systems make faults more likely. Moreover, because of the interconnections, fault diagnosis for distributed interconnected systems is challenging: an incipient fault occurring in any subsystem can propagate from one subsystem to another and even result in the collapse of the whole system. Research on fault diagnosis and fault-tolerant control for distributed interconnected systems is therefore receiving remarkable attention [6,7,8,9].
For the most part, fault diagnosis approaches for distributed interconnected systems can be divided, according to the information used by the diagnostic units, into three categories: centralized, decentralized, and distributed fault diagnosis [10]. The centralized approach employs a single diagnostic unit that collects the information of the whole system and then conducts fault diagnosis for all subsystems. In [11], an interconnected system with disconnected interconnections and packet dropouts was augmented into a switched system, and a centralized robust fault detection filter was then designed. The centralized approach requires heavy computation and communication and is not easy to expand, so it is not suitable for large-scale distributed interconnected systems. In the decentralized approach, each subsystem is equipped with a local diagnostic unit that diagnoses its own faults using only its own information; how the interconnections among subsystems are handled is therefore of great importance. The conventional idea is to regard interconnections as external disturbances and to design robust local observers that make the local residuals insensitive to them. In [12], a decentralized sensor fault isolation approach was investigated for a class of large-scale interconnected nonlinear systems: known reference signals were used to estimate the interconnections, and the maximum impact of the estimation error on the local residual was then assessed to set an adaptive threshold. Because there are no interactions among subsystems, the decentralized approach does not need to consider information transmission or security issues. However, treating interconnections as disturbances, or estimating them from prior information, is conservative and can result in a low fault detection rate.
In the distributed fault diagnosis approach, the local diagnostic unit of each subsystem can not only use information about the subsystem itself but also interact with other subsystems. In other words, the distributed approach can improve fault diagnosis performance through partial information exchange, which captures the interconnection characteristics, and it has thus received much more attention [13,14,15,16,17]. Fault isolation is the key to fault diagnosis for distributed interconnected systems, since the interconnections can lead to fault propagation. In [15], fault isolation for a class of fuzzy interconnected systems was considered for the first time under the framework of interval observers: piecewise interval observers were constructed to characterize the unknown interconnections among subsystems, and the residual intervals were then used to realize fault isolation. In [16], fault isolation was achieved through a decoupling method. The unknown input, consisting of faults in the other subsystems and disturbances in the whole system, was partitioned into a decoupled part and a non-decoupled part, and a bank of finite-frequency unknown input observers was constructed. Moreover, a set of linear matrix inequalities was used to ensure that the generated residual was sensitive to the fault while remaining robust against the unknown input. Although the method offers a degree of design freedom, it demands adequate computation capacity and resources. In addition, information exchange among subsystems can exacerbate fault propagation, especially for sensor faults. In [17], the problem of fault isolation for sensor faults was studied: the influence of local and propagated sensor faults on the residuals was analyzed to realize distributed fault isolation for multiple sensor faults in interconnected systems. However, one of the main concerns in some distributed systems is to minimize the number of measurements shared among subsystems in order to reduce the communication cost. To this end, different strategies, including the fault-driven minimal structurally overdetermined set strategy, the minimal hitting set strategy, the equation-based strategy, and a set of fault-driven minimal structurally overdetermined sets strategies, have been explored [8,18,19,20]. In contrast to the above, broadcast communication is used for information exchange in our work, and the means of communication is not the focus.
Compared with fault diagnosis, less research effort has been devoted to fault-tolerant control for distributed interconnected systems. The traditional fault-tolerant control approach is to estimate the faults, or the changes of the subsystems caused by the faults, approximately via adaptive or neural network methods, and then to design distributed or decentralized local fault-tolerant control laws that compensate for the faults, so that the subsystems or the whole system can recover an acceptable performance [21,22,23]. In these schemes, fault-tolerant control only regulates the controllers of the faulty subsystems, so it is called independent fault-tolerant control. From the perspective of the overall distributed interconnected system, another approach, called cooperative fault-tolerant control, makes full use of the subsystems and the cooperative effect of their coupling mechanisms to ensure the performance of the faulty system. In [24], a novel fault-tolerant control scheme for switched and interconnected nonlinear systems was designed to guarantee the stability of the state based on “fault-tolerant control Lyapunov–Barrier functions”. In [25], the cycle-small-gain theorem was utilized to ensure the closed-loop stability of interconnected systems, and a fault-tolerant control scheme that considered both rigid and flexible component faults was proposed. However, the use of the small gain theorem generally leads to a conservative result, and the fault-tolerant objective is only to guarantee the stability of faulty systems. To the best of our knowledge, most investigations on fault-tolerant control for distributed interconnected systems are limited to basic stability analysis, whereas other dynamic and static properties have not been covered in great detail.
Inspired by the above considerations, a distributed fault diagnosis and cooperative fault-tolerant control design framework for distributed interconnected systems is proposed in this paper. Specifically, the contributions of this paper are as follows:
A novel fault diagnosis framework, which is mainly composed of fault detection observers and fault isolation observers, is developed for a general class of distributed interconnected systems with actuator faults. The state estimation information is transmitted via broadcast communication, and several decision logic schemes based on the residuals are carried out in the cloud processing unit to achieve fault detection, isolation, and estimation, so that the problem of fault propagation can be addressed as well;
A cooperative fault-tolerant control scheme, where LQR controllers for the healthy subsystems and a cooperative fault-tolerant controller for the faulty subsystem are utilized respectively, is also proposed to guarantee the stability and performance of the whole system;
Different from the conventional isolation decision logic, the adaptive method is employed to estimate the fault and the fault estimation information is used to modify the residuals. In this way, the subsystem with an actuator fault can be located where the residual value is less than the threshold rather than exceeding the threshold as usual.
This paper is organized as follows. In Section 2, the framework of distributed fault diagnosis and cooperative fault-tolerant control is introduced briefly, followed by the corresponding design objective. Section 3 presents the main results, including the design of the fault detection observer, the fault isolation observer, and the cooperative fault-tolerant controller. Section 4 is dedicated to the simulation of intelligent unmanned vehicle platooning to demonstrate the applicability and effectiveness of the proposed design scheme. Finally, some conclusions and possible future research directions are presented in Section 5.
2. Problem Description
The design framework of distributed fault diagnosis and cooperative fault-tolerant control for distributed interconnected systems is depicted in Figure 1 and mainly includes the monitoring and control units (MCUs) and the cloud processing unit. The whole distributed interconnected system consists of
subsystems and is modeled as
where
denotes the state vector, with
the
ith subsystem state.
denotes the input vector, with
the
ith subsystem input.
denotes the output vector, with
the
ith subsystem output.
.
represents the actuator failure to be isolated, with
the
ith actuator failure.
stands for the process noise, with
the
ith subsystem process noise.
denotes the measurement noise, with
the
ith subsystem measurement noise.
,
,
,
,
and
in Equation (1) can be decomposed into
The
ith subsystem can be further given as
where
is the set of all subsystems and
is the set of subsystems other than the
ith subsystem.
and
represent the
jth (
) subsystem state and
lth subsystem input. Note that
denotes the actuator failure and, in general,
.
Each subsystem is equipped with an MCU, which consists of the following components:
- (1)
A fault detection observer (FDO), which is governed by
where
and
are the state estimations of the
ith and
jth subsystem respectively.
represents the output estimation of the
ith subsystem.
is the detection observer gain and
stands for the residual of the
ith subsystem generated by the FDO.
- (2)
A fault isolation observer (FIO), which is activated when there is an alarm provided by the corresponding FDO and can be described by
where
and
is the state estimation of subsystem given by the FIO.
is the isolation observer gain of the
ith subsystem.
represents the corresponding output estimation and
is the residual of the
ith subsystem generated by the FIO.
stands for the fault estimation and
is the weighting matrix.
- (3)
A controller, which can keep the faulty system stable and is constructed as
where
is a positive definite matrix, and
is the local optimal gain determined from a standard LQR Riccati equation.
is the real state estimation of
and is provided by the cloud processing unit.
represents the cooperative compensation vector from other subsystems in the faulty case.
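To illustrate how an FDO of this kind produces residuals, the following minimal sketch simulates two coupled scalar subsystems, each with a Luenberger-type observer that uses its neighbour's broadcast state estimate. The observer structure, all numerical values (coupling coefficients, gain, fault size), and the function name `simulate_fdo` are assumptions made for illustration only; they are not the observer equations or parameters of this paper.

```python
def simulate_fdo(t_end=10.0, dt=0.001, t_fault=5.0, fault=1.0, gain=2.0):
    """Euler simulation of two coupled scalar subsystems and their observers.

    Subsystem 1 receives an additive actuator fault from t_fault onward.
    Returns the time grid and the residual histories of both subsystems.
    """
    # Hypothetical dynamics: diagonal terms are local, off-diagonal are couplings.
    a11, a12, a21, a22 = -1.0, 0.5, 0.5, -1.0
    x1 = x2 = 0.0      # true states
    xh1 = xh2 = 0.0    # observer estimates (same initial condition, no noise)
    times, r1_hist, r2_hist = [], [], []
    steps = int(round(t_end / dt))
    for k in range(steps):
        t = k * dt
        u1 = u2 = 1.0                          # constant inputs
        f1 = fault if t >= t_fault else 0.0    # actuator fault in subsystem 1
        # Outputs equal the states here (output matrix assumed to be 1).
        r1 = x1 - xh1                          # residual of subsystem 1
        r2 = x2 - xh2                          # residual of subsystem 2
        times.append(t)
        r1_hist.append(r1)
        r2_hist.append(r2)
        # Plant derivatives.
        dx1 = a11 * x1 + a12 * x2 + u1 + f1
        dx2 = a21 * x1 + a22 * x2 + u2
        # Observer derivatives: each observer uses the neighbour's estimate.
        dxh1 = a11 * xh1 + a12 * xh2 + u1 + gain * r1
        dxh2 = a21 * xh1 + a22 * xh2 + u2 + gain * r2
        x1 += dt * dx1
        x2 += dt * dx2
        xh1 += dt * dxh1
        xh2 += dt * dxh2
    return times, r1_hist, r2_hist
```

In this toy setting both residuals are zero before the fault and settle at nonzero values afterwards (12/35 for subsystem 1 and 2/35 for subsystem 2 with the default values), which mirrors how a single fault can raise alarms in more than one subsystem once state estimates are shared.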
The processing flow of the cloud processing unit is shown in detail in Figure 2. The cloud processing unit is responsible for receiving, processing, and broadcasting information. It mainly performs three functions: (i) obtaining the state estimations
and
from the monitoring unit; (ii) accomplishing fault detection and isolation based on the residual signals and spreading results; and (iii) providing the corresponding state estimation to the control unit.
Based on this, the fault detection and isolation schemes are given in Figure 3. The conventional fault detection observer is used to detect whether a fault occurs, and a residual value exceeding the threshold indicates that there is a fault in the process. Meanwhile, the fault isolation observer based on the adaptive fault estimation method is combined with an unconventional isolation decision logic to achieve fault isolation. Specifically, the adaptive method is employed to estimate the fault, and the fault estimation information is used to modify the residuals. In this way, the subsystem with an actuator fault can be located where the residual value is less than the threshold rather than exceeding the threshold as usual. It is noteworthy that the isolation decision logic in this paper is contrary to the detection decision logic and different from the conventional method [26].
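The two decision logics can be summarized in a short sketch. It assumes that each residual has already been reduced to a scalar evaluation value and that a threshold is available per subsystem; the function names, the data layout, and the rule of accepting a hypothesis only when all of its isolation residuals stay below the threshold are illustrative assumptions, not the exact logic tables of this paper.

```python
def detection_alarms(residual_evals, thresholds):
    """Detection logic: alarm when a residual evaluation exceeds its threshold."""
    return [j > th for j, th in zip(residual_evals, thresholds)]


def candidate_fault_set(alarms):
    """Return the 1-based indices of the subsystems whose FDO raised an alarm."""
    return [i for i, alarm in enumerate(alarms, start=1) if alarm]


def isolate_fault(fio_evals, thresholds):
    """Isolation logic, reversed with respect to detection.

    fio_evals maps a hypothesised faulty subsystem c to a dict
    {subsystem index: FIO residual evaluation obtained under hypothesis c}.
    thresholds maps a subsystem index to its isolation threshold.
    Hypothesis c is accepted when all its evaluations stay below threshold.
    Returns the unique accepted candidate, or None if the result is ambiguous.
    """
    accepted = [
        c
        for c, evals in fio_evals.items()
        if all(j < thresholds[i] for i, j in evals.items())
    ]
    return accepted[0] if len(accepted) == 1 else None
```

For example, with made-up evaluation values, alarms in subsystems 1 and 2 give the candidate set `[1, 2]`, and `isolate_fault({1: {1: 0.03, 2: 0.05}, 2: {1: 0.6, 2: 0.8}}, {1: 0.2, 2: 0.2})` selects subsystem 1, because only that hypothesis keeps the modified residuals below the threshold.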
In this paper, the design objective is to locate the fault accurately and achieve cooperative fault-tolerant control. Hence, this paper studies the design of a novel fault diagnosis framework, together with the detection and isolation observer gains, the controller gain, and the cooperative compensation vector.
Remark 1. It is worthwhile to note that only the single fault case is considered in this paper. Meanwhile, a cooperative controller with fault-tolerant ability is introduced to keep the faulty subsystem stable, and LQR controllers are employed so that the healthy subsystems, which may be affected by the faulty subsystem, can remain stable.
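As a reminder of how a local LQR gain follows from a Riccati equation, the sketch below solves the scalar continuous-time case in closed form. The scalar model, the weights `q` and `r`, and the function name `scalar_lqr` are assumptions chosen for illustration; the subsystems in this paper are matrix-valued and their Riccati equation is not reproduced here.

```python
import math


def scalar_lqr(a, b, q, r):
    """LQR for dx/dt = a*x + b*u with cost integral of q*x^2 + r*u^2.

    Solves the scalar algebraic Riccati equation
        2*a*p - (b**2 / r) * p**2 + q = 0
    for its positive root p and returns (p, k), where u = -k*x.
    """
    if b == 0 or q <= 0 or r <= 0:
        raise ValueError("need b != 0, q > 0 and r > 0")
    root = math.sqrt(a * a + b * b * q / r)
    p = r * (a + root) / (b * b)   # positive (stabilizing) solution
    k = b * p / r                  # optimal state-feedback gain
    return p, k
```

With this gain the closed-loop pole is `a - b*k = -sqrt(a**2 + b**2 * q / r)`, which is negative for any `a`, so even an open-loop unstable scalar subsystem is stabilized.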
Remark 2. The subsystems in Figure 1 are physically interconnected. From the mathematical viewpoint, the physical interconnection can be seen from the state matrix: if an off-diagonal block of this matrix is not equal to zero, the corresponding ith subsystem and jth subsystem are physically interconnected. In addition, the monitoring units in Figure 1 are informationally interconnected. To be specific, the monitoring units acquire the state estimation information from other interconnected subsystems through broadcast communication, and all subsystems can use this information once it has been broadcast.
Remark 3. Fault isolation is achieved by making use of the adaptive fault estimation observer, which is not applicable to sensor faults. This is the reason we do not consider sensor faults. If the fault isolation observer based on the adaptive fault estimation method is replaced by a suitable sensor fault estimation observer, the problem of sensor fault diagnosis can be considered.
Remark 4. Similar to what has been given in [27], the necessary conditions for the existence of the observer are two observability conditions, which can be guaranteed by the corresponding PBH rank criteria.
4. Simulation Example
In this section, the proposed fault diagnosis and cooperative fault-tolerant control scheme is applied to the simplified model of intelligent unmanned vehicle platooning [29], which is shown in Figure 4. A desired separation distance
between adjacent vehicles, and a desired average velocity
should be assigned under normal operating conditions. Furthermore, the variable
(
) represents the deviation from the desired separation distance while the variable
(
) represents the deviation from the desired velocity.
is the real separation distance between the
ith and
vehicle at time
, and
is the real velocity of the
ith vehicle at time
. Therefore, the state vector and output vector in Equation (1) are
and
respectively, where
and
.
The motion of each vehicle is first characterized by differential equations derived from Newton’s second law, and the state-space representation of the four-vehicle platoon is then obtained by expanding the nonlinear term in a Taylor series. The system matrices of the four-vehicle platoon are given as follows:
According to Theorem 1, the four parameters
,
,
, and
are computed as 1.1339, 1.1339, 1.1339, and 1.0003 respectively, and the detection observer gains are as follows:
Then, by solving the condition in Theorem 2, we can obtain the
performance levels
,
, and the isolation observer parameters
Further, by solving Riccati Equation (9), the local optimal gains can be obtained as:
In the simulation, the process and measurement noise are assumed as
. Meanwhile, a fault has occurred in the 1st vehicle and is chosen as
The simulation results of fault detection are depicted in Figure 5a–d. It can be observed that the 1st and 2nd vehicles generate alarm signals, whereas the 3rd and 4th vehicles do not. Thus, the fault detection logic table can be listed as Table 1.
This indicates that the faulty vehicle is among the alarming vehicles, and that the 3rd and 4th vehicles are healthy because they generate no alarm signals. Hence, the fault set is defined as {the 1st vehicle, the 2nd vehicle}. The next step is to determine whether the faulty vehicle is the 1st vehicle or the 2nd vehicle. For this purpose, fault isolation for the 1st vehicle and the 2nd vehicle is carried out in turn.
The simulation results of fault isolation for the 1st vehicle and the 2nd vehicle are shown in Figure 6 and Figure 7, respectively. It can be seen from Figure 6a,b that the residual assessment values of the 1st and 2nd vehicles are both less than the threshold. However, the residual assessment values of the 1st and 2nd vehicles both exceed the threshold in Figure 7a,b after the occurrence of the fault. Based on this, the fault isolation logic table can be listed as Table 2.
Combining the simulation results and the fault isolation logic, it can be found that the faulty vehicle is the 1st vehicle.
Meanwhile, it can also be seen from Figure 8 that the fault estimate of the 1st vehicle tracks the fault value rapidly and accurately.
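The tracking behaviour of an adaptive fault estimator can be reproduced on a toy example. The sketch below assumes a scalar subsystem with a constant additive actuator fault and a simple adaptive law in which the fault estimate is driven by the residual; the model, the gains `obs_gain` and `adapt_gain`, and the function name are hypothetical and are not the isolation observer parameters obtained in this section.

```python
def simulate_adaptive_estimator(t_end=10.0, dt=0.001, t_fault=2.0,
                                fault=0.5, obs_gain=4.0, adapt_gain=25.0):
    """Euler simulation of a scalar observer with adaptive fault estimation.

    Plant:     dx/dt  = -x + u + f,  y = x
    Observer:  dxh/dt = -xh + u + fh + obs_gain * (y - xh)
    Adaption:  dfh/dt = adapt_gain * (y - xh)
    Returns a list of (time, fault, fault_estimate, residual) samples.
    """
    x = xh = fh = 0.0
    history = []
    steps = int(round(t_end / dt))
    for k in range(steps):
        t = k * dt
        u = 1.0
        f = fault if t >= t_fault else 0.0   # constant fault after t_fault
        r = x - xh                           # residual (output error)
        history.append((t, f, fh, r))
        dx = -x + u + f
        dxh = -xh + u + fh + obs_gain * r    # fault estimate enters the observer
        dfh = adapt_gain * r                 # residual-driven adaptive law
        x += dt * dx
        xh += dt * dxh
        fh += dt * dfh
    return history
```

In this sketch the residual deviates briefly after the fault occurs and then returns towards zero as the estimate converges to the fault value. This is the mechanism behind the reversed isolation logic: once the fault is correctly compensated in the observer of the faulty subsystem, its residual falls back below the threshold.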
To guarantee the stability of the whole intelligent unmanned vehicle platoon, a cooperative controller with fault-tolerant ability is applied to the 1st vehicle and LQR controllers are used for the other three vehicles; the fault-tolerant results are shown in Figure 9. The malfunction of the 1st vehicle causes fault propagation within the platoon, so that the displacement curves of the other three vehicles are no longer parallel with each other for a period of time. However, the displacement curves of the four vehicles become parallel again under the action of cooperative fault-tolerant control, which demonstrates the effectiveness of the fault-tolerant control scheme proposed in this paper.