1. Introduction
Modern control systems are organized in a hierarchical structure. The process under control is situated at the lowest level of this hierarchy. The next level is the instrumentation layer, which enables the upper levels to communicate with the process. The regulatory control layer is organized around basic univariate loops that most typically utilize PID-based control algorithms; the ubiquitous PID controllers represent more than 95% of the utilized algorithms [1,2]. Although widely used, the performance achievable with the PID-based control law is limited. More sophisticated controllers, such as multivariate, non-linear, predictive, adaptive or soft-computing-based algorithms, fall under the general category of advanced process control (APC). APC algorithms can be used in conjunction with PID-based algorithms to extend the capabilities and effectiveness of conventional PID-only loops. The majority of model predictive control (MPC) implementations utilize this hybrid approach, incorporating a PID controller in conjunction with an MPC algorithm, although in some cases MPC plays the role of the regulatory control without any downstream PID loops. The supervisory level typically incorporates a superset of the capabilities of the regulatory level, such as process optimization, economic planning and long-term scheduling.
Industrial control systems frequently do not perform effectively [3] due to many reasons, such as inadequate supervision, process non-stationarity, instrumentation failures, incorrect design, poor tuning, changing operating points, lack of engineering expertise, disturbances, process and/or measurement noise, or human factors [4].
Efforts by plant personnel to remedy these issues are often ad hoc and typically prove ineffective. Scientists and practitioners continually strive to develop automatic and autonomous solutions to these problems. Control performance assessment (CPA) efforts started with simple univariate PID-based loop assessments: a comprehensive report was published by Åström [5] in 1967 for a pulp and paper plant, using process variable standard deviation benchmarking. Control assessment solutions have since evolved in many directions and have provided mature approaches, measures and procedures to the industry.
A search of the literature yields various, and often overly complicated, methods and approaches to CPA. We believe that taking an industrial perspective simplifies the problem to a search for a pragmatic methodology that can be applied to industrial control systems. Simplicity is key to the effectiveness and success of any CPA methodology, and it is achieved by reducing the a priori knowledge required to employ a selected approach. Methods that do not require such specific knowledge are simpler to evaluate, tend to deliver clear and interpretable results, and do not disturb the process with externally injected inputs. A further categorization of methods is data-driven vs. model-based: data-driven methods only require operational plant data, whereas model-based approaches always require some initial assumptions, such as the model type or structure. The selected methodology must also be robust, i.e., independent of the existing loop characteristics as well as the statistical properties of the assessed variables. The ultimate goal is to measure the internal control quality in a manner that is not affected by noise, disturbances or other plant influences of any origin [6].
This paper focuses on a model-free approach based on the control error signal. The advantage of the control error over other control loop signals is that it provides continuous information on the control quality and is independent of the process operating point or setpoint changes. Other control loop signals, such as the process variable, do not exhibit the same independence from operating point and setpoint changes.
This paper introduces a methodology for the continuous, online assessment of control performance. The majority of existing CPA approaches incorporate a one-time assessment with limited capability for continuous, online monitoring [7]. This paper addresses the online case and thereby extends single-shot assessment methodologies.
Naturally, the performance of the control loop depends on the operating point and on whether the process is operating in steady-state or transient conditions. The rate of change during transient periods affects the control performance. Moreover, the performance is affected by other external and internal issues, such as disturbances, non-stationarity, human operator interventions, process faults and many others. In general, CPA is complex, requires the assessment of potentially many indicators and may call for a hybrid approach.
The following CPA indexes are considered: the mean square error (MSE) and mean absolute error (MAE); the moments of the normal distribution, i.e., mean, standard deviation, skewness and kurtosis; the robust median estimator; and the factors of the α-stable distribution, namely the stability exponent, skewness and scaling. These indices were evaluated on data from a real plant (an industrial power generation unit) recorded over 20 days of normal operation. The plant's leading variables are confronted with the respective disturbances, the underlying relationships are presented, and signal normalization mechanisms are considered.
The following sections begin with an examination of related work, followed by a description of the reference plant used in the analysis. The data collected from the plant and utilized in the analysis as well as the performance indicators are described. The results are presented along with the derived conclusions and open issues for further research.
2. Related Work
It is well known that the effectiveness of individual control loops results in more efficient, reliable and profitable operations. This is true regardless of the industry. Despite this, many industrial control loops exhibit poor performance that may result from improper design and/or tuning. It is this industry fact that is the motivation for the methods and techniques for assessing the performance and quality of an individual control loop.
As mentioned earlier, there are two fundamental methodologies for this problem. One methodology is model-based and utilizes a model for the benchmarking evaluation process [8]. The second methodology is data-driven and only utilizes data from the normal operation of the process. Some of the methods require specific experiments, while others do not. As mentioned earlier, the success of any given CPA methodology depends largely on simplicity, i.e., the required a priori knowledge and the steps needed to obtain this information. The following paragraphs briefly describe some of the factors affecting simplicity [6,9]:
Methods necessitating an experiment [7,8,10,11]:
measures that use a setpoint step response: overshoot, undershoot, rise, peak and settling time, decay ratio, offset, peak value;
indexes that require a disturbance step response: idle index, area index, output index, R-index.
Model-based methods [12,13,14]:
minimum variance Harris index, control performance index or variance benchmarking methods (a minimal sketch of the Harris index follows this overview);
all types of model-based measures derived from closed-loop identification, such as the aggressive/oscillatory and sluggishness indexes;
frequency-domain methods, starting from the classical Nyquist, Bode or Nichols charts with phase and gain margins, through further tools such as the Fourier transform, sensitivity function, singular spectrum analysis, the reference-to-disturbance ratio index, etc.;
alternative indexes using neural networks or support vector machines.
Data-driven methods:
integral time measures: mean square error, integral absolute error, integral (square) time absolute value, integral of square time derivative of the control input, amplitude index, and total squared variation;
correlation measures, such as the relative damping index or the oscillation detection index;
statistical factors for various probabilistic distribution functions or variance band indexes;
benchmarking methods;
alternative indexes using orthogonal functions, wavelets, persistence measures, entropy, multi-fractals or fractional-order.
Hybrid or mixed approaches [20,21]:
fusion of CPA measures with sensor combinations, or exponentially weighted moving averages calculated for various indexes;
pattern recognition and graphic visualization methods;
case-specific business key performance indicators, such as the number of alarms or human interventions, the time in manual mode and other currency-based indexes.
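As referenced in the model-based list above, the minimum-variance (Harris) index is perhaps the most widely used of these benchmarks. The following is a minimal, illustrative sketch, assuming the loop time delay is known and approximating the closed-loop output with an AR model; the function name and defaults are ours, not taken from any of the cited works.

import numpy as np

def harris_index(y, delay, ar_order=20):
    """Estimate eta = sigma^2_mv / sigma^2_y; values near 1 suggest
    performance close to the minimum-variance benchmark."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                      # work with deviations
    d, p, N = int(delay), int(ar_order), len(y)
    t0 = d + p - 1                        # first sample with a full regressor
    # Regress y[t] on y[t-d], ..., y[t-d-p+1]: the d-step-ahead prediction
    # residual approximates the feedback-invariant (minimum-variance) part.
    X = np.column_stack([y[t0 - d - k : N - d - k] for k in range(p)])
    target = y[t0:]
    theta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ theta
    return resid.var() / target.var()

Applied to a control error series with, e.g., a 5-sample delay, eta = harris_index(e, delay=5) returns a value in (0, 1]; low values indicate substantial room for improvement relative to the minimum-variance benchmark.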
The research reported in the literature describes common industrial CPA applications [10], e.g., pulp and paper plants and other process industries, mechanical engineering systems, or heating, ventilation and air conditioning units. Industrial applications introduce a new context of non-linearities, complexities, non-stationarities and unknown properties, with frequent outliers that produce heavy tails in the time-series distributions of the data.
The problem of monitoring the quality of regulation forms the basis for the optimization task, widely described in the literature [22,23].
3. Environment
This study was carried out on a new 1000 MW coal-fired power unit. We analysed a control loop maintaining negative pressure in the combustion chamber, known in the industry as furnace draft control.
3.1. Control Structure
The flue gas fan (the loop MV, manipulated variable) is driven by a dedicated control structure (see Figure 1) comprising a PID controller and associated disturbance decoupling components. The goal is to control the boiler vacuum (negative pressure), which is the loop CV (controlled variable). The vacuum setpoint is constant, and the purpose of the control system is to adapt the fan operation to the changing operating conditions of the power unit and to compensate for disturbances. Due to the nature of the measured variables, their values are subject to constant change, as the measurement is not smooth. Therefore, smoothing modules are inserted into the control structure.
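The smoothing modules are not specified in detail here; a minimal sketch of the kind of filter typically used for this purpose is a first-order (exponential) low-pass filter, shown below with an assumed, illustrative time constant rather than the plant's actual configuration.

def smooth(samples, dt=1.0, time_constant=10.0):
    """Exponentially smooth a signal sampled every dt seconds."""
    alpha = dt / (time_constant + dt)   # discrete first-order filter gain
    out, state = [], None
    for x in samples:
        # The first sample initializes the filter; later samples are blended in.
        state = x if state is None else state + alpha * (x - state)
        out.append(state)
    return out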
3.2. Data
Data recorded over 20 days in 2021 were analysed. During this period there was one major failure as well as a preventative replacement of additional equipment; the repairs were carried out in March 2021. Four days in January (before the repair) and 16 days between August and November (after the repair) were analysed. In selecting the data, an attempt was made to cover a wide power range. Data were collected with a sampling period of 1 s. It is important to note that the control system was tuned before January 2021 and the settings were only modified after the failure occurred, so the control quality during the first four days of the study should be similar. Some modifications were made to the control system after the failure and the equipment replacement in March. During August–November 2021 the tuning parameters were not modified further, so the expected control quality during those 16 days should likewise be similar.
4. Control Quality Factors and Leading Signals
The quality indicators relate to specific features of the system response y(t) and the control error e(t): step-response characteristics (such as overshoot and settling time), frequency-domain properties, integral error indices and statistical properties of the error signal.
In most cases, the analytical description of the plant and the control system is not known. The plant is usually very complex and time-variant, its behaviour shifting with the operating point and with degradation due to wear and tear of the equipment. Furthermore, the process is often dependent on the operating conditions determined by the technological parameters and external conditions. All these factors introduce a very high uncertainty into the behaviour of the plant, which makes it impossible to apply stability testing techniques and difficult to apply a frequency response approach. The overshoot/undershoot test criteria are applicable in systems where the setpoint is changed, whereas in continuous processes it is typically desired to keep the process values at a selected, unchanging value. This work considers such processes and is based on integral indices and the statistical properties of the control error and the control signal.
4.1. Integral Indices and Statistical Properties Used as Control Quality Metrics
The statistical metrics used in this paper are based on the control error, calculated as the difference between the setpoint (STPT) and the process variable (PV). Data were collected with 1 s resolution, and the metrics were calculated for one hour (3600 samples) and for one day (86,400 samples). The following metrics were used in this study: the mean square error (MSE) and mean absolute error (MAE); the mean, standard deviation, skewness and kurtosis; the robust median estimator; and the parameters of the α-stable distribution (stability exponent α, skewness β and scale γ).
Metric interpretation is presented with the results in Section 5.
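A minimal sketch of how these windowed metrics can be computed is given below; it assumes the error series is a plain array at 1 s resolution, and the helper name is ours. The α-stable parameters are estimated here with SciPy's levy_stable.fit, whose maximum-likelihood fit can be slow on long windows.

import numpy as np
from scipy import stats

def windowed_indexes(error, window=3600):
    """Compute the CPA indexes over consecutive windows
    (window=3600 for hourly, 86_400 for daily at 1 s sampling)."""
    results = []
    for start in range(0, len(error) - window + 1, window):
        e = np.asarray(error[start:start + window], dtype=float)
        idx = {
            "MSE": np.mean(e ** 2),
            "MAE": np.mean(np.abs(e)),
            "mean": np.mean(e),
            "std": np.std(e, ddof=1),
            "skewness": stats.skew(e),
            "kurtosis": stats.kurtosis(e, fisher=False),  # normal -> 3
            "median": np.median(e),
        }
        # alpha-stable parameters: stability alpha, skewness beta, scale gamma.
        alpha, beta, loc, scale = stats.levy_stable.fit(e)
        idx.update({"alpha": alpha, "beta": beta, "gamma": scale})
        results.append(idx)
    return results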
4.2. Leading Signals and Disturbances
For a power-generating unit, the signal that defines the operating point is the power level produced by the unit; associated with the power level are the resulting fuel, air and steam flows. The system is multidimensional, and the values of these variables are closely related to each other. Changes in the signal values result from the need to constantly adjust the controls to keep the process variables at their set values and, additionally, from changes in the setpoint for power production. The reference load is the leading signal, and all other signals can be interpreted as disturbances. In the case of a power unit, the air, fuel and steam flows can be considered as disturbances; disturbances in these values propagate further and result in larger control errors.
5. Results
The quality of control is difficult to determine, as the error levels obtained depend on the specifics of the process. In processes where gas flow is maintained, the values change rapidly and fluctuate. In the process studied, it is not physically possible to stabilize the vacuum exactly at the setpoint, as oscillations are a typical feature of this control loop. This shows that an absolute performance benchmark cannot be applied; the quality of control must be measured relative to the process conditions. A certain level of nominal indicator values should be taken as the correct state (good quality), and deviations from this level should be interpreted either as a deterioration or as an improvement in the quality of regulation. Indicators calculated for individual days and hours are presented in Figures 2–7 and are used to assess the quality of regulation.
The best results according to the criteria of absolute error, squared error and standard deviation were obtained on days 9 and 10; the worst results were collected on days 1 and 18. The largest differences were observed in the hourly data for the mean squared error, while the daily metrics were more stable. In the case of the mean absolute error and the standard deviation, the differences were 2–3-fold, while for the mean squared error they were about 4-fold.
The differences between the results cannot be explained by changes in the equipment because, as described earlier, four days from January (before the repair) and 16 days from August to November (after the maintenance) were analysed; the variation in quality must therefore stem from varying process conditions. Consequently, the index values must be normalized to make the assessment objective.
The median and mean of the control error gave good results in the case of limitations, i.e., when the process value stayed constantly above or below the setpoint and the error did not change sign; we used them to illustrate the problems of setpoint tracking when the system operates in saturation. In a situation where the process oscillates around the setpoint, however, both the median and the mean can take the ideal value of 0.0 and thus fail to reflect large oscillations and low control quality.
Skewness illustrates problems with the symmetry of the distribution. Right (positive) skewness indicates that there are more observations below and fewer above the mean. Skewness (both positive and negative) is associated with tails in the time-series distribution and indicates rare but significant over- or undershooting. In the example shown, the skewness values are low, which indicates a symmetrical distribution; the highest values were recorded for days 6 and 7, and even these remained small. Skewness, like the mean and median, is a good indicator of whether the control action works within its limits.
The kurtosis shows the concentration of the data. In the example above, the kurtosis value was greater than 3 for each day, indicating tails heavier than those of the normal distribution, i.e., a relatively small number of large outliers in the measurements. Similar to skewness, kurtosis can be a good indicator of saturation conditions, when there are problems with keeping the process values at the setpoint (e.g., due to saturation of the manipulated variables).
The parameters of the α-stable density function (the stability exponent α, skewness β and scale γ) confirm that the distribution is close to normal (α close to 2), with low skewness (the β factor). The scale γ shows the scatter of the data in the same manner as the mean square error or standard deviation. These parameters determine the control quality in the most intuitive way, but their values differ significantly from day to day. Changes in the power signal cause variations in the operating point, which introduces noise into the whole facility. By analysing the power waveforms and the errors in the vacuum maintenance system, a certain correlation of trends is observed. The noise and control signal characteristics were therefore analysed, and the sum of the signal changes for each hour and for each day was counted. The best fit to the control error is shown by the fuel change signal calculated per day, as shown in Figure 8.
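A brief sketch of this activity measure is given below, assuming that the "sum of the signal changes" denotes the total absolute sample-to-sample change within a window; the function name is ours.

import numpy as np

def sum_of_changes(signal, window):
    """Total absolute sample-to-sample change per window, e.g. window=3600
    (one hour) or window=86_400 (one day) at 1 s resolution."""
    steps = np.abs(np.diff(np.asarray(signal, dtype=float)))
    n = len(steps) // window            # number of complete windows
    return steps[:n * window].reshape(n, window).sum(axis=1)

# e.g. daily activity of the fuel signal: sum_of_changes(fuel, 86_400)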
The results over the 20 days for the calculated mean absolute error (MAE) and mean squared error (MSE) are shown in Table 1. The max/min ratio shows that the same metric can vary by almost 2 times for the MAE and by more than 4 times for the MSE. A simple normalization was proposed as a first step to reduce this variation, and both the MAE and the MSE were normalized. For each day, the MAE and MSE were divided by the sum of the fuel signal changes from that day (denoted here $S_d$) and multiplied by the average value of the fuel signal changes counted over the 20 days ($\bar{S}$); see Equation (10).
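Written out under the description above, Equation (10) plausibly takes the following form; the symbols $S_d$ (the sum of fuel signal changes on day $d$) and $\bar{S}$ (its 20-day average) are our assumed notation:

\[
\mathrm{MAE}^{\mathrm{norm}}_d = \mathrm{MAE}_d \cdot \frac{\bar{S}}{S_d},
\qquad
\bar{S} = \frac{1}{20} \sum_{d=1}^{20} S_d .
\]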
Additionally, the normalized MSE is used as an analogous metric, Equation (11). Table 1 shows the minimum, maximum and standard deviation over the 20 days, calculated for the normalized indexes $\mathrm{MAE}^{\mathrm{norm}}$ and $\mathrm{MSE}^{\mathrm{norm}}$.
The normalized metrics show less variability across individual days: the max/min ratio is reduced to 1.5924 for $\mathrm{MAE}^{\mathrm{norm}}$ and to 1.9882 for $\mathrm{MSE}^{\mathrm{norm}}$. The results for each day are presented in Figure 9.
Based on these early results, it can be concluded that normalization allows for more reliable monitoring of the regulation quality under variable operating conditions. The study used data recorded over 20 days and computed the average over that period. When running the proposed method in an online mode, the averaging window should be long enough for the obtained average to be stable. The window length over which the average is calculated is a tuning parameter, and no single value is optimal for all cases; it depends on the nature of the leading variables and on how the process is run. The presented approach requires verification on other control loops and on more data collected from the plant.
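One way to run the proposed normalization online is sketched below under our own assumptions: the fixed 20-day average $\bar{S}$ is replaced by a rolling mean over the most recent daily sums, and the class and parameter names are hypothetical.

from collections import deque
import numpy as np

class OnlineNormalizer:
    """Rolling-window stand-in for the fixed 20-day average."""
    def __init__(self, window_days=20):
        self.history = deque(maxlen=window_days)  # recent daily S_d values

    def update(self, mae_day, s_day):
        """Normalize one day's MAE given that day's sum of fuel changes S_d."""
        self.history.append(s_day)
        s_avg = np.mean(self.history)             # rolling estimate of S-bar
        return mae_day * s_avg / s_day

The window_days parameter corresponds directly to the tuning parameter discussed above.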
6. Conclusions
This study concludes that the values of the leading signals and disturbances should be considered when assessing the quality of a control loop. In steady state, the process variables are maintained much better than in conditions where the operating point is subject to constant change. Therefore, the indicator should be normalized with respect to the leading variables and disturbances in order to obtain more objective results. In the case of a power generation unit, the main leading variable is the power setpoint at which the unit operates; the power setpoint determines the fuel, air and steam flows, and these variables were also considered in the control quality assessment.
Disturbances in process values are mainly due to energy imbalances in the system, which most often occur during changes in the operating point. Therefore, it is necessary to select leading signals, i.e., physical quantities that reflect the energy changes of the plant and, at the same time, determine the operating point. Furthermore, normalization can help prevent such external causes from leading to inaccurate conclusions about a decline in control quality.
Methods based on the characteristics of the control signals and the error (residual) value were used in the control quality assessment. The relationship between the control performance and the sum of the disturbances and leading variables was investigated. Basic metrics, such as the mean error, mean absolute error and mean squared error, were used to analyse the control error. In addition, statistical measures in the form of higher-order statistical moments were applied, exploiting the differences in how they are estimated.
This research has shown that defining a fully universal metric that allows control quality to be monitored both under steady-state conditions and during operating point changes is a difficult task. Normalization helps to obtain more stable results, but there is room for improvement. The problem of how to normalize is essentially a modelling problem, where the essence lies in the choice of the model structure and the identification of the relevant information-carrying signals. Therefore, further research might employ classical regression models with more of the available signals, as well as mechanisms from the areas of data mining and machine learning.
Research has also confirmed, as with the transfer entropy approach, that it is not possible to analyse only one factor; many variables are involved in a proper analysis. Certainly, one should focus not only on the method itself and its properties but also on the data. A helpful solution is data decomposition, and thus the analysis of signal components: the first step is to identify noises and oscillations and to analyse the statistical properties of the residue. Various factors should be considered before selecting or adopting an appropriate methodology.
The preceding analysis serves as motivation for further investigation in this area. Addressing this research challenge will require a distinct approach, as simulations alone are insufficient; a comprehensive, comparative analysis should be performed on several industrial examples, taking the various complexities involved into account. While this task is more challenging and demanding, it provides an opportunity to gain deeper insight into the problem and, ideally, to arrive at a viable solution.