1. Introduction
Urban drainage networks (UDNs) collect both urban wastewater and stormwater runoff and carry them to wastewater treatment plants (WWTPs) for treatment before discharge into the environment, constituting a combined urban drainage system (CUDS). During periods of heavy rain, the resulting mixed flow can overload the urban system and produce overflows (combined system overflows, CSOs) that can be harmful to the environment. To avoid CSOs, current UDNs have retention systems capable of storing the water that reaches the network in times of intense rain and later releasing the stored volume at lower flow rates suitable for treatment by the WWTPs. Adequate real-time control (RTC) of the volume of water stored in the tanks can significantly improve the operation of the network and minimize the impact of CSOs [1,2,3,4].
Among the techniques used for optimal control of these systems, those that use a simplified model of the process to predict its behavior stand out. This is the principle of Model Predictive Control (MPC) [5]. MPC is a control methodology that uses a prediction model of the process to calculate the manipulated variables over a future horizon so as to optimize a given cost function. It has been implemented successfully for several decades and has also been applied to UDNs with great success [6,7,8,9].
On the other hand, urban water treatment systems (UWS), which integrate both UDNs and WWTPs, are highly interconnected, and their proper functioning depends on the reliability of the equipment used: sensors (flowmeters, level sensors), actuators (pumping stations, gates, valves) and communication systems. The environmental conditions surrounding this equipment can cause it to deteriorate and malfunction. For this reason, it is necessary to develop Fault-Tolerant Control Systems (FTCS) that maintain safe and efficient operation. A Fault-Tolerant Controller (FTC) is one that can achieve the control objectives even in the presence of faults, possibly with reduced system performance [10,11,12]. Fault-tolerant control takes advantage of the physical and analytical redundancies of the system to sustain its performance when an element malfunctions. Furthermore, rapid detection and identification of a fault can help avoid serious and even dangerous breakdowns.
Generally, FTCS can be classified into two types: passive (PFTCS) and active (AFTCS). AFTCS react to faults in system components by actively reconfiguring the control actions so that stability is maintained and performance remains acceptable, even if degraded [11,13]. Normally, an AFTCS consists of four subsystems: (1) a reconfigurable controller, (2) a fault detection and diagnosis (FDD) scheme, (3) a controller reconfiguration mechanism, and (4) a command/reference governor.
Existing FDD approaches can be broadly classified into two categories: (1) model-based schemes and (2) data-driven (model-free) schemes [14,15]. Data-driven schemes are divided mainly into two approaches: multivariate statistical process control (MSPC) methods and machine learning (ML) methodologies. Among the former, the most widely applied method is principal component analysis (PCA) [16,17,18,19,20]. The latter comprise machine learning and artificial intelligence techniques [21,22,23,24,25]. Furthermore, deep learning strategies have become increasingly popular for handling complex nonlinearities and can be used for modeling, control, or management of WWTPs, as can be seen in [26,27,28,29,30,31]; however, very few studies have addressed fault detection in sewer networks.
For sewer systems, some methods are based on data analysis [32]. There are also methods that combine closed-circuit television (CCTV) inspections with artificial intelligence to classify defects automatically [33]. Others are based on state estimation, using, for example, a Luenberger observer [34], or on determining the normal operating ranges of sensors and actuators [35,36]. The controller used is often an MPC, which results in a fault-tolerant model predictive control (FTCMPC) [35,36]. However, no work has been found in which the PCA technique is applied to fault detection and diagnosis in a sewer network.
Because sewer networks involve complex physical and chemical processes, changing operating conditions and nonlinear behavior, a data-driven technique such as PCA, which does not rely on a detailed process model, could be applied successfully to this process.
The benchmark considered as a case study is described in [37]. The main difficulty of this system is the high variability of the disturbances (the flows collected in each area) that affect the process. The realistic data set integrated in the benchmark includes intervals of several weeks in which rainfall is very low, so the control system, even when working properly, has little influence on the performance of the system. In such periods, a fault in any sensor or actuator, even a large one, would be virtually undetectable, although it would also have little impact on the system.
Something similar occurs in the opposite situation. If very heavy and repeated rainfall saturates the sewer network, the control system cannot prevent overflows, which can be substantial even when the controller works properly. A fault in any equipment then has little additional impact on the system, and its detection and classification are therefore more difficult.
These same reasons lead us to think that when any type of fault occurs, if it is not very significant, it will most likely go unnoticed. Benchmark simulation tests demonstrate this.
Consequently, it is advisable to focus on intermediate situations, i.e., situations after times of moderately high-intensity rainfall, or at longer time intervals when medium-intensity rainfall occurs, but with more continuity. It has been found that it is in these cases that the MPC controller is most useful in reducing overflows at different points in the network and in keeping the inflow to the treatment plant closer to its nominal value.
The main contribution of this work is the development of a real-time online FTMPC for the UDN system considered. It consists of three subsystems: (1) a fault detection system based on an adaptive online PCA technique with a moving data window, capable of providing real-time fault monitoring of the sewerage system despite its dynamically changing properties; (2) a fault diagnosis system, which classifies the detected fault through statistical calculations that identify the variable deviating the most from its normal behavior; and (3) a system for reconfiguring the MPC controller that takes advantage of its constraint-handling capability to try to maintain control over the whole plant while minimizing the effects of the fault. Several case studies with different disturbance profiles are analyzed. The results are compared with the behavior of the system without control, with the normal MPC control algorithm, and with different fault situations without reconfiguration of the system.
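The detection idea outlined above can be illustrated with a minimal sketch of a moving-window PCA monitor. It is only an illustration under stated assumptions, not the implementation used in this work: the function names, the window size, the number of retained components and the refit-at-every-sample policy are all hypothetical choices, and the calculation of the Q threshold is omitted.

```python
import numpy as np

def fit_pca(window, n_components):
    """Fit a PCA model on a window of data (rows = samples, columns = variables)."""
    mean = window.mean(axis=0)
    std = window.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant variables
    z = (window - mean) / std
    # The right singular vectors of the scaled data are the principal directions.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[:n_components].T  # shape: (n_variables, n_components)
    return mean, std, loadings

def q_statistic(sample, mean, std, loadings):
    """Q statistic (squared prediction error) of one sample for a PCA model."""
    z = (sample - mean) / std
    residual = z - loadings @ (loadings.T @ z)
    return float(residual @ residual)

def moving_window_q(data, window_size, n_components):
    """Q statistic of each new sample against a model refitted on the
    preceding `window_size` samples.

    Assumption: the samples in the window represent normal operation. A real
    adaptive scheme would have to keep faulty samples out of the window.
    """
    q_values = []
    for k in range(window_size, len(data)):
        model = fit_pca(data[k - window_size:k], n_components)
        q_values.append(q_statistic(data[k], *model))
    return np.array(q_values)
```

A sample that breaks the correlation structure learned from the window produces a large Q value, which is then compared with the threshold to raise an alarm.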
This article is structured as follows. After this introduction, the mathematical algorithms used, i.e., MPC and PCA, are described. The fault detection, diagnosis and reconfiguration methodology is then detailed. The following section presents a case study in which this methodology is applied: first the sewer system is described, and then the MPC control algorithm and the FTC system applied to it. Finally, the results obtained in each case are shown, followed by the conclusions of the work.
5. Results and Discussion
Three scenarios, extracted from the time series of the benchmark, have been considered. They were chosen because their flow variations are more significant, for the reasons explained in the introduction. The first scenario provides the training data for the neural network that will be used to generate normal online operating data. The second and third scenarios serve to evaluate the fault detection and diagnosis system, as well as the reconfiguration.
To perform the simulation tests, the weights of the MPC cost function in Equation (17) (the non-null elements of matrices Q(k) and R) have been adjusted and are shown in Table 1.
The model system parameters are shown in Table 2.
5.1. Training and Validation of the Neural Network
Figure 4 represents the inlet flows to the sewer network collected in each of the catchment areas considered, due to precipitation and wastewater, over a period of 10 days (scenario 1). It shows the training data profile of the neural network, extracted from the benchmark.
Then, to validate the trained network, the data provided by the system and by the neural network are compared using the input data of scenarios 2 and 3, shown in Figure 5 and Figure 6, respectively.
Some results of the evaluation of the trained network are given below. Figure 7 and Figure 8 show the reservoir levels provided by the system and by the neural network under normal operating conditions with input data from scenario 2. Similarly, Figure 9 and Figure 10 depict the same levels with input data from scenario 3.
It is found that the results provided by the network in both cases largely match those generated by the system, so the network can be used to generate normal operating data based on the disturbances affecting the system.
5.2. Fault Detection and Diagnosis Tests
As described in Section 4.3, the faults under study are:
- Faults in level sensors: the sensor gain is reduced to 10% of its nominal value.
- Faults in actuators: the gate is blocked at 20% of its total opening.
These faults are introduced on the second, fifth or eighth day of a 10-day simulation interval, for both scenarios 2 and 3.
It was first verified that, in the absence of faults, the detection system does not declare any fault. The alarm rate is 9.57% for scenario 2 and 11.13% for scenario 3, but since 20 consecutive alarms are required to declare a fault, none is detected.
The following figures show the calculated Q threshold and the Q statistic in the absence of faults for scenarios 2 and 3:
Both Figure 11 and Figure 12 show that the Q statistic sometimes exceeds the calculated threshold, but no fault is detected because 20 consecutive alarms are required.
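The rule of 20 consecutive alarms can be sketched as follows. The function names are hypothetical and the Q values and thresholds are assumed to be available as plain sequences; the sketch only illustrates how an alarm rate of around 10% can coexist with no declared fault when the alarms are isolated.

```python
def first_detection(q_values, thresholds, n_consecutive=20):
    """Return the sample index at which a fault is declared, or None.

    A sample raises an alarm when its Q statistic exceeds the threshold.
    A fault is declared once `n_consecutive` alarms occur in a row.
    """
    run = 0
    for k, (q, limit) in enumerate(zip(q_values, thresholds)):
        run = run + 1 if q > limit else 0  # any normal sample resets the run
        if run >= n_consecutive:
            return k
    return None

def alarm_rate(q_values, thresholds):
    """Percentage of samples whose Q statistic exceeds the threshold."""
    alarms = sum(q > limit for q, limit in zip(q_values, thresholds))
    return 100.0 * alarms / len(q_values)
```

Requiring a run of alarms trades detection delay for robustness: a fault cannot be declared earlier than 20 samples after its onset, but scattered threshold crossings caused by disturbances are ignored.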
The results obtained in different fault situations are shown in Table 3 and Table 4 for scenarios 2 and 3. The tables show the detection results (detection instant) and the diagnosis for the type of fault considered (fault variable: h_i is the level of tank i; u_i is the output flow rate of tank i). The correct diagnosis is highlighted in green:
In terms of fault detection, both scenarios show that all faults are detected relatively quickly (almost all before the next day). It must be considered, as already mentioned, that these are large faults. In the tests performed with less significant faults, detection was considerably delayed with respect to the time of fault generation, and there were some cases in which the fault was not detected at all.
Regarding fault classification, success varies with the proximity of the disturbances to the instant at which the fault is generated, as well as with their magnitude and frequency. As these characteristics are highly variable, success in diagnosing the fault is also variable. Further research is needed to improve fault classification.
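One common way of identifying the variable that deviates the most from its normal behavior is to split the Q statistic into per-variable contributions. The sketch below is an assumption about how such a diagnosis could be carried out, not necessarily the statistical calculation used in this work; the function names and the variable labels are hypothetical.

```python
import numpy as np

def q_contributions(z, loadings):
    """Per-variable contributions to the Q statistic of a scaled sample `z`.

    `loadings` is an (n_variables, n_components) matrix with orthonormal
    columns. The contributions are the squared residuals and sum to Q.
    """
    residual = z - loadings @ (loadings.T @ z)
    return residual ** 2

def diagnose(z, loadings, names):
    """Name of the variable with the largest contribution to Q."""
    contributions = q_contributions(z, loadings)
    return names[int(np.argmax(contributions))]
```

Contributions of this kind are known to smear a fault over correlated variables, which is consistent with the variable diagnosis success reported here under strong disturbances.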
5.3. Fault Detection, Diagnosis and MPC Reconfiguration Tests
In this section, to assess the performance of the MPC reconfiguration, comparative results of the control system are shown for four cases:
- Case 1: sewer network without control, that is, with all the gates always open.
- Case 2: sewer network controlled with MPC in the absence of faults.
- Case 3: sewer network controlled with MPC in the presence of a certain fault.
- Case 4: sewer network controlled with reconfigured MPC (FTMPC). Once the fault is correctly detected and identified, the controller is reconfigured to improve system performance compared to the previous case.
In each case, scenarios 2 and 3 are considered. Two of the most representative faults have been selected:
- Fault in the tank 1 level sensor, whose gain is reduced to 10% of its normal value.
- Fault in the tank 3 gate, which is assumed to be blocked at 20% of its total opening.
Furthermore, to better evaluate the effect of the fault and the reconfiguration of the system, it will be assumed that, in all cases, the fault is generated on the second day of the 10-day simulation period considered for each scenario.
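As an illustration of how the constraint-handling capability of MPC can absorb an actuator fault, the following sketch fixes the bounds of a blocked gate so that the optimizer treats it as a known input. It rests on assumptions that are not taken from this work: the decision variables are taken to be normalized gate openings in [0, 1] (the manipulated variables here are tank outflow rates, so this is only an analogy), and the function name and arguments are hypothetical.

```python
def reconfigure_bounds(lower, upper, faulty_gate, blocked_opening):
    """Return new per-gate bounds after gate `faulty_gate` is found blocked.

    Both bounds of the blocked gate are set to its stuck opening, so the
    optimizer can no longer plan moves that the gate cannot execute.
    The input lists are left unchanged.
    """
    new_lower, new_upper = list(lower), list(upper)
    new_lower[faulty_gate] = blocked_opening
    new_upper[faulty_gate] = blocked_opening
    return new_lower, new_upper

# Hypothetical usage: gate of tank 3 (index 2) blocked at 20% opening.
lower, upper = reconfigure_bounds([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], 2, 0.2)
```

With the bounds updated, the remaining healthy gates are the only degrees of freedom left to the optimizer, which is why a reconfigured controller may need greater control effort.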
The performance evaluation criteria are the same as those detailed in [40]. In summary, these criteria are the number of overflows (N_ov), the duration of overflow (T_ov) in days, the volume overflowed (V_ov) in m³, the degree of utilization of the WWTP (G_u) in %, and the smoothness in the application of control signals (S) in m³/d.
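The three overflow indices can be illustrated with a short sketch. The exact definitions are those of [40]; the interpretation below (an overflow event is a maximal run of samples with positive overflow flow, sampled at a fixed step) is an assumption made only for illustration, and the function name is hypothetical. The indices G_u and S are not covered.

```python
def overflow_indices(overflow_flow, dt_days):
    """Return (N_ov, T_ov in days, V_ov in m^3) for one overflow point.

    `overflow_flow` is a sequence of overflow flow rates in m^3/d sampled
    every `dt_days` days. Assumed definitions: an event starts whenever the
    flow becomes positive after a sample without overflow.
    """
    n_events = 0
    duration = 0.0
    volume = 0.0
    previously_overflowing = False
    for q in overflow_flow:
        overflowing = q > 0.0
        if overflowing:
            duration += dt_days
            volume += q * dt_days  # rectangular integration of the flow
            if not previously_overflowing:
                n_events += 1
        previously_overflowing = overflowing
    return n_events, duration, volume
```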
5.3.1. Scenario 2 Results
Figure 13 shows the Q threshold and the Q statistic calculated online for fault detection. Detection occurs when Q exceeds the threshold 20 consecutive times.
The following table provides the comparative data of system performance, including the normal operating situation, the fault without reconfiguration, and the fault with controller reconfiguration (FTMPC). Note that, because of the sewer configuration, the volume overflowed from tank 1, V_ov,1, returns to the network; for this reason, it is not added to V_ov in any of the tables [40].
For simplicity, the main indices considered are V_ov, G_u and S. As can be seen in Table 5, the normal MPC controller offers the best performance: the total overflow and the smoothness index of the control actions are the lowest, and the degree of utilization of the WWTP is the highest. The MPC with the fault considered shows reduced performance, with all indices worsening, but it still performs much better than the no-control case: G_u is 57.92% vs. 53.96%, and V_ov is 3.8003 × 10⁴ vs. 6.8473 × 10⁴ m³.
Finally, comparing the MPC with a fault against the FTCMPC, the latter improves system performance: the total overflow volume is reduced from 3.8003 × 10⁴ to 3.4520 × 10⁴ m³ and the degree of WWTP utilization increases from 57.92% to 58.55%, although S is worse because the system needs greater control effort. Therefore, this reconfiguration strategy improves system performance when this fault occurs.
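To put these figures in relative terms, the reported values can be converted into percentage changes. The short calculation below uses only the numbers quoted in this comparison; the helper name is hypothetical.

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before

# Overflow volume, MPC with fault vs. FTCMPC (m^3): about -9.2 %
vov_reconfigured = relative_change(3.8003e4, 3.4520e4)

# Overflow volume, no control vs. MPC with fault (m^3): about -44.5 %
vov_faulty_mpc = relative_change(6.8473e4, 3.8003e4)

# WWTP utilization gain from reconfiguration, in percentage points
gu_gain = 58.55 - 57.92
```

Reconfiguration therefore recovers roughly 9% of the overflow volume and 0.63 percentage points of WWTP utilization, whereas a faulty MPC still overflows about 44.5% less volume than the uncontrolled network.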
Figure 14 shows the Q threshold calculated online and the value of the Q statistic for fault detection. Detection occurs when Q exceeds the threshold 20 consecutive times.
Table 6 provides the comparative data of system performance in each case for scenario 2.
Considering the main indices (V_ov, G_u and S), the discussion is similar to that of the previous case, with the normal MPC controller giving the best performance. The MPC with the fault considered shows reduced performance, but it still performs much better than the no-control case: G_u is 58.49% vs. 53.96%, and V_ov is 3.5235 × 10⁴ vs. 6.8473 × 10⁴ m³. Finally, comparing the MPC with a fault against the FTCMPC, the latter improves system performance slightly: the total overflow volume is reduced from 3.5235 × 10⁴ to 3.5162 × 10⁴ m³ and the degree of WWTP utilization increases from 58.49% to 58.52%, but S is worse because the system needs greater control effort.
5.3.2. Scenario 3 Results
Fault in the tank 1 level sensor: alarm percentage before fault detection: 2.44%. Detection instant: 2.26 days. MPC reconfiguration is performed in the same way as in Section 5.3.1 for the tank 1 level sensor.
Figure 15 shows the Q threshold and the value of the Q statistic calculated online. Detection occurs when Q exceeds the threshold 20 consecutive times.
Table 7 provides the comparative data of system performance in every case. These results lead to the same conclusions as those for scenario 2:
Regarding the main indices (V_ov, G_u and S), the discussion is similar to that of the previous cases, with the normal MPC controller giving the best performance. The MPC with the fault considered shows reduced performance, but it still performs better than the no-control case: G_u is 67.28% vs. 61.67%, and V_ov is 4.77 × 10⁴ vs. 8.74 × 10⁴ m³. Furthermore, comparing the MPC with a fault against the FTCMPC, the latter improves system performance slightly: the total overflow volume is reduced from 4.77 × 10⁴ to 4.41 × 10⁴ m³ and the degree of WWTP utilization increases from 67.28% to 68.81%, but S is worse because the system needs greater control effort.
Fault in the tank 3 gate: alarm percentage before fault detection: 3.05%. Detection instant: 2.271 days. MPC reconfiguration is performed in the same way as in Section 5.3.1 for the tank 3 gate.
Figure 16 shows the Q threshold calculated online and the value of the Q statistic for fault detection, from which the detection instant can be read. Detection occurs when Q exceeds the threshold 20 consecutive times.
Table 8 presents the comparative data of system performance in each case. The main indices show the same behavior as in the previous cases. For instance, comparing the MPC with a fault against the FTCMPC, G_u is 67.28% vs. 68.81% and V_ov is 4.6066 × 10⁴ vs. 4.5926 × 10⁴ m³. Therefore, FTCMPC improves the performance of the system.
6. Conclusions
In this paper, a methodology for fault detection and diagnosis in certain types of sensors and actuators of a wastewater sewer network, based on an adaptive PCA technique, has been presented and analyzed. Because the system is subject to strong and highly variable disturbances, only large faults have been detected and classified: low-intensity faults have little effect on system performance and are therefore more difficult to detect. Even so, the detection algorithm has detected faults in different elements and in different scenarios relatively quickly and reliably. The diagnosis of the detected faults is very difficult, not only because of the disturbances in the system but also because the set points of the flow regulators are constantly being recalculated; the results obtained can therefore be improved, and future work will continue in this direction. In the cases in which both detection and diagnosis were successful, the MPC reconfiguration strategies improved system performance with respect to the faulty system without controller reconfiguration. Moreover, the structure of MPC itself facilitates reconfiguration when a fault occurs, for instance, by adding a new constraint to the optimization problem. MPC reconfiguration is therefore usually easy to implement, and many systems combine the FTC and MPC strategies in an FTCMPC. Finally, although this FTCMPC controller has been designed for a sewer system, it can be easily adapted to other types of processes that present the same difficulties.