1. Introduction
The Brazilian electrical matrix has good energy source diversity, particularly renewable sources. Hydroelectric energy is the primary source supplying the National Integrated System (SIN, in the Brazilian acronym), despite the growth of solar and wind generation in recent years [1]. As most of the Brazilian power grid is supplied by energy sources affected by the rainfall regime and other climate variations, thermoelectric power plants (TPPs) powered by fossil fuels must be activated to ensure the stabilization and flexibility of the SIN energy supply [2].
The National System Operator (ONS, in the Brazilian acronym) and the Electric Energy Trading Chamber (CCEE, in the Brazilian acronym) implemented the Very Short-Term Hydrothermal Dispatch Model (DESSEM, in the Brazilian acronym). This model monitors each power-generating unit individually in real time and schedules each operation at half-hourly granularity over a one-week forecast horizon to improve the planning efficiency of the Brazilian hydrothermal system and the optimal generation per plant [3,4]. Power generation agents are therefore responsible for ensuring TPP availability on a daily dispatch schedule, which has made it increasingly challenging to guarantee the emergency energy generation proposed by DESSEM [5,6].
The Brazilian electrical energy supply system often requires fast changes in TPP power output and uninterrupted operation for long periods. Thus, the overall reliability of TPP engines powered by diesel/HFO can be compromised by a lack of ideal operational maintenance [7]. One way to support the energy-generation monitoring process is to use intelligent tools based on data provided in real time [8,9,10]. Although most TPPs have monitoring platforms that capture data from various sensors to track power generation system health, secondary data-based decision aid tools are yet to be fully explored [11,12]. In this sense, time series forecasting techniques are an attractive alternative for analyzing possible very short-term generation scenarios and assisting TPPs’ operation and maintenance planning, especially at diesel/HFO TPPs, which usually operate on an availability basis and require quick actions to start the engine-driven generators and power generation.
Prediction models focused on observing energy generation patterns have been explored in several types of systems [13,14,15,16,17,18]. Typically, linear forecasting models are used as a benchmark in this analysis due to their ease of implementation and the simplicity of determining their coefficients. Among those traditionally used are the statistical models of the Box & Jenkins methodology [19,20]. However, although widely discussed in the literature, coefficient calculation is often performed with statistical tools such as maximum likelihood estimators, which do not always reach the models’ optimal values. An alternative to this problem is bio-inspired computational intelligence techniques such as particle swarm optimization (PSO) [21,22,23].
Alternatively, given that these models can only assimilate linear series characteristics, while real-world problems present both linear and nonlinear patterns that can be correlated with past values [24], several machine learning techniques are being used to predict more complex behaviors. Among them, artificial neural networks (ANN) such as multilayer perceptron (MLP) networks are quite efficient in capturing the nonlinearity caused by sudden variations, as well as in observing the influence of a plant’s internal parameters on output parameters [11,13,14,25,26]. Similarly, boosting techniques based on decision trees (DT), such as eXtreme Gradient Boosting (XGBoost), have lately been explored in classification and regression problems and can also be applied to time series forecasting to capture complex patterns [27,28].
Different time series forecasting techniques have recently been developed to perform increasingly accurate observations. TPP energy generation, fuel consumption, and gas emission prediction are discussed in a wide range of works that apply the Box & Jenkins methodology as the primary analysis tool for short- and very short-term horizons. The adjustment capacity of autoregressive integrated moving average (ARIMA) and infinite impulse response (IIR) recursive filter models was studied by Siqueira et al. [29]. Higher generalization was achieved using the genetic algorithm (GA), PSO, differential evolution (DE), CLONALG, and Opt-aiNet metaheuristics compared with non-recursive coefficient adjustment methods. Simulations showed the superiority of recursive models over the autoregressive (AR) model in all scenarios. This research did not determine which algorithm was the most suitable for the problem, since DE, GA, and PSO all achieved minimum mean-squared error values during the tests. This improvement can also be seen in the study of Rusli et al. [30], who used ARIMA-PSO modeling to predict TPP coal consumption for fuel stock planning. The ARIMA-PSO model increased daily, weekly, and monthly coal need prediction precision by reducing the mean absolute percentage error (MAPE) by 4.34%, 4.91%, and 6.17%, respectively, compared with the ARIMA model. However, despite applying different optimizers for linear forecasting model adjustment, Siqueira et al. [29] and Rusli et al. [30] did not directly compare the metaheuristic application with nonlinear forecasting techniques, which can achieve more accurate results depending on the behavior of the analyzed series.
PSO and GA bio-inspired algorithms can also be found in the study developed by Xu et al. [6], who analyzed coal-fired TPP load dispatch. From historical data, it was possible to: determine key performance indicators through the grey correlation method, forecast each unit’s specific consumption with a hybrid PSO and support vector machine (SVM) regression model, and determine the optimal dispatch distribution among the generators with GA. The hybrid model proposed by Xu et al. [6] achieved a maximum coal saving of 7.94 g/kWh compared with power grid control through automatic generation control (AGC). Although this work used continuously measured supervision information system data from an actual coal-fired TPP, the analyzed model did not investigate each parameter’s temporal influence on coal consumption prediction. For diesel/HFO TPP unit electricity cost predictions, multivariate models based on auxiliary information, such as multiple linear regression (MLR) with analysis of each parameter’s temporal influence, can outperform univariate models if the most significant inputs are selected to avoid multicollinearity [31]. The time series regression model proposed by Weerasinghe and Jayasundara [31] achieved a drastic error metric reduction, with at least a 73.69% improvement in root mean square error (RMSE) and 78.47% in mean absolute error (MAE) compared with the univariate ARIMA model. This significant improvement ensured by the multivariate model can be related to the strong influence of exogenous variables on unit electricity cost values. However, given the mechanically stationary nature of diesel/HFO engines, applying univariate forecasting models to investigate parameters such as fuel consumption can be a satisfactory alternative for maintenance management.
Nonlinear forecasting techniques are commonly used to capture more complex patterns than linear models [32]. The application of machine learning-based predictors can be seen in the work of Tuttle et al. [33] for NOx gas prediction, whose ensemble modeling considers exchangeable-weight ANNs with hyper-parameter optimization through the GA metaheuristic, together with PSO to dynamically optimize system combustion parameters. Additionally, other well-established machine learning methods, such as SVM, random forest (RF), and kernel partial least squares (KPLS), were directly compared with the hybrid ANN modeling to determine the most suitable technique. The application of ANNs in the closed-loop operation of combustion optimization systems for NOx emission prediction proved superior due to the ability to accurately predict NOx emission rates not evaluated in the training set. Additionally, Tuttle et al. [33] achieved a 22.5% reduction in the NOx emission rate when applying the developed system in real time for a year and a half in a coal-fired TPP, which reinforces the capacity of artificial neural networks to capture typically nonlinear patterns. Analyses of internal combustion engine monitoring parameters have also been carried out lately [34]. Singh et al. [35] studied a direct injection engine powered by different biodiesel proportions, injection times, and air/fuel ratios. This analysis used MLP to predict the system’s thermal efficiency and emission levels. Harris Hawks (HH) and whale optimization algorithm (WOA) metaheuristics were used to determine the most significant ANN inputs. However, these techniques were not compared with well-established methods of determining ANN hyper-parameters, such as grid search or random search, which can be computationally more efficient when working with large amounts of data and more robust models. Additionally, it is worth mentioning the work of Yıldırım et al. [36], in which the engine response to different types of hydrogen-enriched gas fuel was analyzed using ANN and support vector regression (SVR) techniques. This work compared the vibration, noise, CO, CO2, and NOx estimates produced by ANN and SVR models with experimental results collected from a small diesel engine. From this comparison, the authors concluded that the ANN allowed a lower overall MAPE, achieving an average difference of 53.70% compared with the SVR application. It was possible to observe the ANN’s potential to generalize parameters inherent to diesel engine operation, which can be extended to investigate its performance on large-scale systems.
Machine learning algorithms based on DT have also been recently explored in regression problems for heavy mechanical equipment as alternative models for capturing nonlinear patterns [27,28,37,38,39]. These models have architectures with better result interpretability and usually significantly reduced computational adjustment costs due to their parallel and distributed processing capacity [37]. Papandreou and Ziakopoulos applied regression models such as multivariate polynomial regression (MPR), ANN, and the XGBoost decision tree boosting method to determine the fuel consumption of large-scale crude oil carriers based on information collected by sensors and route meteorological conditions [27]. The XGBoost fuel consumption prediction exhibited better error metrics, achieving 86.14% custom accuracy compared to ANN (73.40%) and MPR (74.05%). Furthermore, the ANN took approximately 40 times longer than XGBoost to generate an equal number of models during training. This work developed a methodology to apply the popular XGBoost machine learning algorithm to fuel consumption prediction, achieving better results than well-known techniques such as ANN.
Fuel consumption regression for open-pit mining trucks based on external parameters was performed by Wang et al. [37] using different machine learning algorithms such as K-Nearest Neighbors (KNN), SVR, ANN, RF, and XGBoost. It was possible to determine the key features for predicting mining trucks’ fuel consumption. Although the SVR and XGBoost algorithms were the most effective models in pattern abstraction, the XGBoost approach was computationally faster, with R² and MAPE of 0.93 and 8.78%, respectively. Additionally, Hu et al. [38] developed a hybrid GRU-XGB estimator to forecast CO, CO2, HC, and NOx gas emissions. This model initially used a double-layer gated recurrent unit (GRU) ANN responsible for analyzing historical emission data and transforming it into an encoded feature. This parameter was then fed into the XGBoost estimator together with other external factors, such as ambient temperature and humidity, speed, acceleration, and road conditions, to generate gas emission predictions for two bus lines. The GRU-XGB model captured more complex emission prediction patterns than the individually applied models, with an average MAPE of 3.84% for diesel-powered buses. Additionally, the XGBoost algorithm ensured feature importance analysis for each external parameter. Nevertheless, since the XGBoost modeling was applied in this work as a regressor relating the GRU-encoded attribute (generated from past emission values) to real-time exogenous variables, it was then possible to explore the XGBoost estimator’s generalization potential for a variable of interest with temporal characteristics.
Although different time series forecasting techniques are applied to coal and natural gas TPP systems and to various small/medium engine applications, their implementation on the large diesel/HFO engines used for electric power generation has not yet been well explored. Therefore, this work investigates linear (AR and ARIMA-PSO) and nonlinear (MLP and XGBoost) forecasting methods and identifies their ability to capture patterns during several operations of a large Brazilian TPP engine-driven generator. The identification and estimation of the ARIMA model parameters were optimized using PSO modules, given the improvement reported in the literature with this metaheuristic for such applications. The AR model, estimated by the Yule-Walker equations, was used as a benchmark. For the analysis of nonlinear pattern capture, autoregressive neural networks (NAR) based on the MLP architecture were applied due to their high capacity to capture more complex nonlinear patterns. The XGBoost algorithm, adapted for forecasting problems, was directly compared with the other applied models as a lower computational cost alternative to perform this task.
This work is structured as follows: Section 1 provides an introduction discussing the application of time series forecasting techniques and related works; Section 2 presents the applied Box & Jenkins and PSO methods for linear predictions and the NAR and XGBoost fundamentals for nonlinear predictions; Section 3 covers data collection and pre-processing, AR and ARIMA-PSO model estimation, the NAR and XGBoost model learning process, and the considered evaluation metrics; Section 4 applies univariate model prediction to the consumption of each fuel; and Section 5 presents conclusions about model performance.
3. Material and Methods
This research evaluated univariate prediction models applied to the fuel consumption series of one of the 17 Wärtsilä 20V46F engines available at the Energética Suape II S.A. TPP, powered by diesel oil/HFO, with a total installed capacity of 381.2 MW and located in Pernambuco, Brazil. The analysis goal was to investigate the generalization capacity of univariate forecast models applied to fuel consumption series, based on training and data collection covering 6 months of operation.
3.1. Data Acquisition, Cleaning and Normalization
The data collection step captured fuel consumption information read by a flowmeter sensor placed before the engine-driven generator intake branch. Figure 2 shows the physical data acquisition and control system used in this analysis, consisting of a flowmeter (Figure 2a) and a local control panel (Figure 2b). From the monitoring platform, it was possible to obtain fuel consumption data in kilograms per hour every 7 to 10 s of operation for the plant engine with the highest number of accumulated hours during the analyzed period. Additionally, it is essential to consider that during the selected timeframe (31 May 2021 to 4 December 2021), this TPP had a higher dispatch level due to the Brazilian energy and rainfall scenario at the time.
A total of 2,053,542 fuel consumption observations were collected for this study. After data collection, non-captured values and outliers caused by possible errors in the data acquisition step were identified. We found 35,815 nonconforming samples and replaced them using linear interpolation between the nearest available neighboring values. To guarantee temporal uniformity, the fuel consumption series was resized into groups of signals by taking the average of the minute under analysis, since no considerable changes were observed when smaller temporal granularities were analyzed.
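The cleaning and resampling step can be summarized in a short sketch. The snippet below is a minimal illustration, assuming the raw readings are loaded into a pandas DataFrame with a datetime index and a hypothetical `fuel_kg_h` column; the outlier bounds are placeholders, not the plant’s actual limits.

```python
import numpy as np
import pandas as pd

def clean_and_resample(raw: pd.DataFrame) -> pd.Series:
    s = raw["fuel_kg_h"].copy()
    # Mark non-captured readings and gross outliers as missing
    # (the 35,815 nonconforming samples mentioned above).
    s[(s < 0) | (s > 10_000)] = np.nan
    # Linear interpolation between the nearest available neighbors.
    s = s.interpolate(method="linear", limit_direction="both")
    # Resample the 7-10 s readings to 1-min averages for temporal uniformity.
    return s.resample("1min").mean()
```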
After data cleaning, normalization was carried out to guarantee pattern recognition and forecasting model convergence. It is noteworthy that, thanks to the decision tree architecture, XGBoost predictors did not need data normalization before learning, and the same applies to statistical models based on the Box & Jenkins methodology. The normalization process used the z-score method and was based on the fuel consumption series values $x_t$ taken every minute $t$, with mean $\mu$ and standard deviation $\sigma$: $z_t = (x_t - \mu)/\sigma$.
At the end of this stage, a set of 190,391 fuel consumption samples referring to 188 days was considered, totaling 54 complete operations. As machine learning-based nonlinear prediction techniques require sets of supervised signals during the model’s learning period, the collected observations were divided into three distinct sets of operations, as sketched below. The first and most extensive set, called the training set, consisted of 36 operations used for forecast model refinement. The second set, called the validation set, consisted of 8 operations unknown during model adjustment, used to monitor possible overfitting and to select the nonlinear models’ hyper-parameters. The last set, called the testing set, consisted of 10 operations on which the linear and nonlinear prediction models were compared. It is worth mentioning that the statistical models do not require a validation step during model adjustment, so their fitting was based on the first 44 operations.
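As a minimal sketch of this operation-wise split, assuming each engine run has already been segmented into its own series (a hypothetical `operations` list ordered in time):

```python
def split_operations(operations):
    """Split the 54 complete operations into the three subsets."""
    train = operations[:36]   # forecast model refinement
    val = operations[36:44]   # overfitting monitoring / hyper-parameter choice
    test = operations[44:]    # final comparison of linear and nonlinear models
    return train, val, test
```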
Figure 3 shows the fuel consumption dataset split. It is possible to see that the operations carried out during this period varied considerably depending on the requested dispatch order, ranging from very short operations (277 min) to operations lasting several days (19,803 min).
3.2. Linear Modeling
For AR and ARIMA linear modeling, it was first necessary to perform a stationarity analysis of the fuel consumption series. For this task, the autocorrelation function (ACF) and partial autocorrelation function (PACF) were used as an initial analysis to verify correlations between lags and detect possible patterns. The ADF unit root test and the KPSS stationarity test were also used [52,53]. These tests were applied simultaneously, although the ADF test is commonly used individually, to verify series stationarity with respect to a deterministic trend potentially present at the beginning and end of each operation. For nonstationary patterns suggested by the statistical tests at a 5% significance level, a new series was created by taking the difference d of the analyzed time series. The input of this subsystem was the preprocessed fuel consumption time series $x_t$, and the output was a stationary time series and its statistical results. The statistical tests available in the statsmodels library in Python were used for this analysis.
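A minimal sketch of this check with statsmodels (the functions below are the library’s; `series` is assumed to be the preprocessed 1-min consumption):

```python
from statsmodels.tsa.stattools import adfuller, kpss

def check_stationarity(series, alpha=0.05):
    adf_p = adfuller(series, autolag="AIC")[1]               # null: unit root
    kpss_p = kpss(series, regression="c", nlags="auto")[1]   # null: stationary
    # Returns (ADF suggests stationary, KPSS suggests stationary).
    return adf_p < alpha, kpss_p > alpha

# With contradictory conclusions, difference once (d = 1) and test again, e.g.:
# check_stationarity(series.diff().dropna())
```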
After the stationarity analysis, 15 AR-type models were adjusted to identify the best model order p considering the analyzed error metrics. Model coefficients were estimated on the learning set using the Yule-Walker equations shown in Equation (2). The input of this sub-step was the obtained stationary series, while the outputs were the 15 fitted AR models and their predictions on the learning subset partition.
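As an illustration, the statsmodels Yule-Walker estimator can fit the 15 candidate orders; this is a sketch under the assumption that `train` holds the (demeaned) learning series:

```python
import numpy as np
from statsmodels.regression.linear_model import yule_walker

def fit_ar_models(train, max_order=15):
    models = {}
    for p in range(1, max_order + 1):
        rho, sigma = yule_walker(train, order=p, method="mle")
        models[p] = rho  # AR coefficients phi_1 .. phi_p
    return models

def ar_one_step(history, rho):
    # One-step-ahead prediction from the last p observations.
    p = len(rho)
    return float(np.dot(rho, history[-p:][::-1]))
```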
To identify the influence of adding moving average components in the linear approach, a univariate ARMA(p, q) model was adjusted using two PSO modules, as previously shown [29,54]. These modules are responsible for identifying the model order (p, q) through a PSO search (module 1) and estimating the respective coefficients ($\phi$, $\theta$) through another PSO search (module 2), minimizing the total prediction error on the learning subset. The first module performed the discrete search mentioned in Equation (10), since the orders (p, q) are discrete values in the search space. For the second module, the search for coefficients was done in a continuous space through Equation (8).
Table 1 shows the considered ARMA and PSO search parameter configuration ranges. Each search module terminated after reaching the total number of iterations or when there was no prediction error improvement for several consecutive iterations. The input of this subsystem was the stationary series. The outputs were the identified optimal orders (p, q) and estimated coefficients ($\phi$, $\theta$), the swarm learning curves, and an adjusted ARMA model with its predictions on the learning subset.
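The continuous coefficient search (module 2) can be pictured with a compact PSO sketch; the inertia and acceleration values below are illustrative defaults rather than the Table 1 settings, and the discrete module 1 follows the same update rule with positions rounded to integer orders in [1, 15]:

```python
import numpy as np

rng = np.random.default_rng(0)

def pso(cost, dim, bounds, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Minimize `cost` over a continuous box-constrained search space."""
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_cost = x.copy(), np.array([cost(p) for p in x])
    g = pbest[pbest_cost.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        c = np.array([cost(p) for p in x])
        better = c < pbest_cost
        pbest[better], pbest_cost[better] = x[better], c[better]
        g = pbest[pbest_cost.argmin()].copy()
    return g, float(pbest_cost.min())

def arma_rmse(series, p, q, coef):
    """One-step in-sample RMSE of an ARMA(p, q) with coefficients (phi, theta)."""
    phi, theta = coef[:p], coef[p:p + q]
    errors = np.zeros(len(series))
    for t in range(max(p, q), len(series)):
        pred = phi @ series[t - p:t][::-1] + theta @ errors[t - q:t][::-1]
        errors[t] = series[t] - pred
    return float(np.sqrt(np.mean(errors[max(p, q):] ** 2)))

# Example for a fixed candidate order (p, q) = (4, 6):
# coef, rmse = pso(lambda c: arma_rmse(series, 4, 6, c), dim=10, bounds=(-1.0, 1.0))
```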
3.3. Nonlinear Modeling
For the nonlinear approach, it was first necessary to transform the fuel consumption dataset into the tabular input required by the nonlinear machine learning models. Once transformed, NAR and XGBoost models were used to perform nonlinear predictions. To optimize model performance, a hyper-parameter adjustment step was carried out, including the determination of the k input delays used to make predictions. This step applied a randomized search with 50 hyper-parameter combinations for each model, seeking the best model performance according to the RMSE metric. Each combination used different hyper-parameter values within pre-established limits (Table 2).
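A minimal sketch of the tabular (lag-matrix) transformation, in which each row holds the k previous minutes and the target is the next-minute consumption:

```python
import numpy as np

def make_supervised(series: np.ndarray, k: int):
    """Turn a 1-D series into (X, y) pairs with k lagged inputs per row."""
    X = np.stack([series[i:len(series) - k + i] for i in range(k)], axis=1)
    y = series[k:]
    return X, y  # row j: [x_j, ..., x_{j+k-1}] -> target x_{j+k}
```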
Models were evaluated through the cross-validation methodology according to the data division presented in Section 3.1. Furthermore, to mitigate problems related to the stochasticity of machine learning methods, each hyper-parameter combination was repeated 10 times, totaling 500 adjusted models during this analysis.
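A sketch of this randomized search with repetitions is shown below; the sampled parameters and ranges are placeholders standing in for the Table 2 limits, and `fit_and_score` is a hypothetical callable that trains one model and returns a validation RMSE:

```python
import random
import numpy as np

def randomized_search(fit_and_score, n_combinations=50, n_repeats=10, seed=42):
    random.seed(seed)
    results = []
    for _ in range(n_combinations):
        params = {
            "k": random.randint(2, 10),                  # input delay (assumed range)
            "learning_rate": 10 ** random.uniform(-3, -1),
        }
        scores = [fit_and_score(params) for _ in range(n_repeats)]
        results.append((np.mean(scores), np.std(scores), params))
    # Rank by mean validation RMSE, breaking ties by its spread.
    return sorted(results, key=lambda r: (r[0], r[1]))
```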
To define the best randomized search combinations, we analyzed the average and standard deviation of each metric set. The NAR and XGBoost models with the best randomized search performance were selected and retrained 30 times, and the model with the smallest validation subset error metrics was chosen. The input of the nonlinear modeling was the preprocessed fuel consumption time series, while the outputs were the two adjusted models (NAR and XGBoost) and their respective predictions on the learning set.
Figure 4 describes the univariate modeling step scheme.
3.4. Model Evaluation Metrics
Regarding error metrics, we used the root mean squared error (RMSE) (Equation (22)) and the mean absolute error (MAE) (Equation (23)) to evaluate the performances:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \quad (22)$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \quad (23)$$

where n represents the total number of predicted values, $\hat{y}_i$ are the output values of the prediction model, and $y_i$ are the actual values of the evaluated time series. Both metrics are widely used in forecasting tasks, the former being more sensitive to outliers, while the latter provides the average distance between predicted and actual values.
The traditional coefficient of determination ($R^2$), defined in Equation (24) as $R^2 = 1 - \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 / \sum_{i=1}^{n}(y_i - \bar{y})^2$, was also used to indicate the prediction adherence compared to an adjusted regression. This metric varies between 0 and 1, and higher coefficients indicate better overall model performance.
The last metric utilized in this work, presented by Papandreou and Ziakopoulos [27], is a simple one used for regression and classification evaluation called custom accuracy (CA). This performance metric, defined in Equation (25), is calculated considering an acceptable error margin to distinguish whether predictions are accurate; here, we considered a 5% custom threshold acceptable for performance evaluation. In summary, this metric counts how many predictions $n_{acc}$ were considered accurate within the custom threshold relative to the total number of predictions $n$ generated for this analysis, as indicated in Equation (26): $CA = n_{acc}/n$. Even though its value varies between 0 and 1, like that of $R^2$, it has a different performance evaluation principle.
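A minimal sketch of the four metrics with NumPy (`y` actual, `y_hat` predicted; the 5% relative margin is one way to implement the custom threshold described above):

```python
import numpy as np

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))

def r2(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def custom_accuracy(y, y_hat, threshold=0.05):
    # Fraction of predictions within the relative error margin.
    return float(np.mean(np.abs(y - y_hat) <= threshold * np.abs(y)))
```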
For the final model evaluation analysis, we used the Wilcoxon Signed-Rank statistical test to evaluate statistical differences between linear and nonlinear models.
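As a sketch, this paired comparison can be run with scipy, pairing each test operation’s metric for two models:

```python
from scipy.stats import wilcoxon

def significantly_different(metric_model_a, metric_model_b, alpha=0.05):
    """Wilcoxon Signed-Rank test on paired per-operation error metrics."""
    stat, p_value = wilcoxon(metric_model_a, metric_model_b)
    return p_value < alpha, p_value
```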
4. Results
This section analyzes and discusses the adjustment results of the selected models, their prediction behavior on the test set, and their main error metrics.
4.1. Stationarity Analysis and AR and ARIMA Models Adjustment
Initially, the ACF and PACF of the original fuel consumption series were analyzed, as shown in Figure 5, to understand pattern correlations.
It is possible to see that more distant lags have a smaller magnitude than closer values, which indicates the stationary nature of the time series. Since the ACF shows a smooth decay profile as delays increase, the series is suggested to be stationary, with p autoregressive orders indicated by the truncation after the significantly relevant partial autocorrelation values presented in the PACF graph. The low influence of moving averages for linear model construction on the original series can also be observed, since there is no drastic drop in the ACF after a specific delay q. Furthermore, evaluating the PACF of the original series, it is possible to visualize mainly the presence of the first three orders p with purely autoregressive behavior, AR(p). However, higher orders may have a significant influence on model building, as PACF values increase again at longer delays.
ADF and KPSS tests were performed to confirm the fuel consumption series stationarity hypothesis. The ADF test did not detect the presence of a unit root, suggesting the series was stationary over the learning set at a 95% confidence level. However, the KPSS test for stationarity around a deterministic trend suggested the series was non-stationary around the mean, also at a 95% confidence level, possibly due to fuel consumption behavior changes during engine start-up and shutdown, as well as sudden drops in fuel consumption during regular operation. It is noteworthy that while the null hypothesis of the ADF test indicates series non-stationarity, the null hypothesis of the KPSS test suggests stationarity of the time series around the mean. Due to these contradictory conclusions, we decided to carry out a differentiation (d = 1) and repeat the stationarity analysis on the differentiated series, also presented in Figure 5.
Unlike the original series, the ACF and PACF of the differentiated series suggested the presence of both autoregressive and moving average behaviors, since there was a drastic drop in both functions after a set of orders p and q. ADF and KPSS statistical tests on the differentiated series indicated a stationary nature, also at a 95% confidence level, making it clear that any deterministic trends were removed along with the differentiated series mean. The ACF truncation at low orders, together with a sharp decay in the PACF, can also suggest a purely moving average characteristic for the final model adjustment, which will consider the search for identification and estimation parameters through the previously presented PSO modules. Two observations can then be made from the analysis of both functions: visually identifying orders can be a complex task, and it is possible to adjust more than one model that guarantees the presence of white noise residuals, which is an indication of linear pattern generalization when using this type of approach.
Error metrics referring to the adjustments of the 15 different AR(p) models on the learning set, considering the first 15 orders p estimated through the Yule-Walker equations, are shown in Figure 6. These model adjustments were also evaluated on the differentiated series, which obtained relatively worse metrics, indicating a better predictive capacity of this family of models for the original fuel consumption series. Considering the set of error metrics used, the order p = 3 was chosen for the final AR model, since it balances performance and complexity and is the most parsimonious among the generated models. It is noteworthy that higher-order models can better predict perturbations in the time series, directly reducing the RMSE metric, but they become more sensitive to small changes that penalize the MAE evaluation. Increasing the order did not influence the R² and CA behaviors.
Regarding the ARIMA(p, d, q) model family application, the identification of orders (p, q) and estimation of coefficients ($\phi$, $\theta$) by the PSO modules found an ARIMA(4,1,6) model, with convergence curves shown in Figure 7. It is possible to observe that the order selection module achieved significantly faster convergence than the coefficient estimation module. This process lasted about 50 iterations without any decrease in the objective function; however, we decided to present fewer iterations to facilitate learning curve visualization. This behavior is expected, since the discrete order search space is limited to few possibilities (orders 1~15 in both cases), while the coefficients’ continuous search space makes optimization a more complex procedure. Additionally, although the first module required only six iterations to find the optimal solution, the adjustment process had a computational cost of 13.73 h, considerably longer than the AR model estimation, which took only a few minutes to compute all 15 combinations. It is worth mentioning that, depending on the orders (p, q), the estimation of the coefficients ($\phi$, $\theta$) varied in duration from 30 min to 2 h for higher orders in both cases.
Despite providing a more complex model than the AR model, with more coefficients to be identified and estimated, the addition of moving averages did not show a significant performance difference in the learning set error metrics (RMSE = 132.72; MAE = 28.30; R² = 0.940; CA = 0.959). However, since the operations exhibited severe fluctuations in the fuel consumption series that do not present linear patterns, the addition of moving average orders may have caused extra errors by considering q previous errors to make prediction corrections in samples right after sudden changes. Despite the error metric deterioration due to this corrective ARIMA behavior, this sensitivity helped predict other patterns, improving the metric set on the training set.
4.2. NAR and XGBoost Models Adjustment
For the nonlinear univariate NAR and XGBoost prediction models, the RMSE values obtained in the randomized search on the validation set depend considerably on the combination of hyper-parameters used (Figure 8). The five best adjustments were highlighted in color to facilitate the performance visualization of each combination. Combinations 10 (XGBoost) and 36 (NAR), highlighted in green, were selected for the final model fit because they had the lowest RMSE means and lower bounds. Table 3 lists the selected hyper-parameters for each analyzed nonlinear model. It is worth mentioning that combinations 11, 12, 21, 28, 30, 31, 38, 40, 42, and 49 of the univariate XGBoost model randomized search exhibited much higher error metrics than the others and are not shown, to facilitate the visualization of the remaining combinations.
XGBoost error metrics have the same scale as the fuel consumption series, since the input data normalization process is unnecessary due to the models’ residual-based construction. Additionally, for the analyzed validation set, the XGBoost error metrics were much lower than the previously mentioned AR and ARIMA model metrics, suggesting better model adherence during training. The relevance of each temporal step after the final model adjustment is shown in Figure 9, which indicates the proportion in which each attribute was used for DT building. It is possible to notice that the fuel consumption at the immediately preceding minute plays an important role in composing the next-minute prediction. We can also observe decreasing lag importance in the XGBoost model trees as further delays are considered, a behavior close to the previously presented ACF profile. Furthermore, the two most distant temporal steps showed similar importance, suggesting that new model inputs would bring no benefit (see the sketch after this paragraph). The execution time of the 50 randomized search combinations was 26.62 h, much higher than the AR model adjustment but only about twice as long as the ARIMA-PSO adjustment through the order identification and coefficient estimation modules.
The NAR randomized search RMSE metrics refer to normalized fuel consumption prediction values, since this practice is necessary for good model convergence. All neural network simulations considered 500 epochs supported by an early stopping technique, which interrupts weight adjustment after no RMSE improvement is detected on the validation set for 150 consecutive iterations, reducing the computational effort of the analyzed combinations. NAR modeling lasted 231.51 h, a computational cost 8.7 times greater than the XGBoost modeling, making it the slowest univariate model optimized through this methodology.
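A minimal sketch of this training setup, assuming a TensorFlow/Keras MLP; the layer sizes are placeholders rather than the Table 3 values:

```python
import tensorflow as tf

def build_and_train_nar(X_train, y_train, X_val, y_val, k):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(k,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    # 500 epochs with early stopping after 150 epochs without improvement.
    stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=150, restore_best_weights=True)
    model.fit(X_train, y_train, validation_data=(X_val, y_val),
              epochs=500, batch_size=32, callbacks=[stop], verbose=0)
    return model
```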
The final NAR model convergence profile on the training and validation sets is depicted in Figure 10. Initially, it is possible to observe that both learning curves presented very noisy convergence that tended to decrease in variation during the learning phase. The presence of noisy patterns in the small, randomly selected training batches explains this behavior and contributes to better model generalizability: the stochasticity of Adam’s batches forces the neural network out of local optima during the training epochs. The decrease in curve perturbation at the end of the epochs may be related to learning rate smoothing due to Adam’s exponential decay coefficients. Despite this behavior, the training and validation sets converged without overfitting characteristics during learning.
Notably, the magnitude of the RMSE calculated on the validation set (orange) during the learning period is smaller than that of the training error itself (blue). There are several possible explanations for this type of behavior, the main ones being:
The eight validation set operations may have simpler patterns to capture, while the neural network’s generalization capability was built on the more complex patterns seen during training;
Among the 36 training operations, there may be random noise unrelated to the engine’s operation that was not observed during model validation;
The L2 regularization process is applied only to the training set and increases the cost function value on this occasion to avoid overfitting. The cost function applied to the validation set is calculated from an unregularized RMSE, which may have resulted in lower error magnitudes;
The dropout regularization technique penalizes model variance by randomly freezing neurons in a specific layer during training and unfreezing them during validation; however, this technique was not applied in the present study.
After the hyper-parameter selection phase, fuel consumption predictions were made on the testing set to better understand each model’s characteristics during the final 10 operations. The NAR and XGBoost models were retrained 30 times to avoid misadjustments due to the algorithms’ stochasticity.
4.3. Univariate Prediction Analysis by Engine Operation
Initially, it is useful to consider that the duration of operations in the test set varies drastically, ranging from 277 to 16,333 min, which directly reflects on the forecast models’ performance and the fuel consumption prediction profiles. Error metrics obtained during this analysis are illustrated in Figure 11. The linear models (AR and ARIMA-PSO) had similar error metric behavior on the test set. Although the AR model exhibited a slightly higher RMSE in operations 5, 6, 7, and 8, it had a lower MAE in 9 of the 10 operations. However, the NAR and XGBoost models performed better in all test operations except operation 4. There is a general behavioral similarity among the machine learning models; however, the NAR model provided predictions closer to the actual fuel consumption series, which directly implied a lower MAE than almost all other models. The short duration of operations 4 and 7 is directly reflected in the peaks of the RMSE and MAE metrics and the reductions of the R² and CA metrics.
Despite relatively close error metrics, the operation prediction profiles differ from each other. It was possible to observe that the AR and ARIMA-PSO models were better at capturing full-load operation patterns when there were no large fuel consumption fluctuations. However, as shown in Figure 12, the NAR and XGBoost models were better at predicting sudden variations due to process stochasticity. Initially, it was possible to notice that no model captured the sudden drop patterns observed in the prediction vs. actual fuel consumption curves. These patterns are related to the closure of the fuel valve that guarantees the reuse of micro fuel leaks. Leaked fuel is stored in an external reservoir positioned right after the flowmeter; due to this positioning, the reused fuel flow is not captured by the sensor for 1 to 3 min, until all the leaked fuel is consumed. This sharp drop pattern is unpredictable by univariate models, as they only consider past fuel consumption observations to predict future ones.
Each model’s sensitivity in readjusting after sudden drop episodes is another interesting pattern to be analyzed. As the AR and ARIMA-PSO models make predictions based on a linear combination of past values, the immediately subsequent predictions are affected by abrupt oscillations. This characteristic generates error propagation for a few minutes, which increases prediction error, mainly in longer and more complex operations. This error propagation can be observed in Figure 12 through the arrow in the prediction vs. actual consumption graphs and the prediction error vs. prediction graphs. Furthermore, these characteristics generate punctual errors of up to 2000 kg/h. However, when the frequency and magnitude of prediction errors are analyzed during regular operation, in which fuel consumption varies only between 4350 and 4650 kg/h, it is possible to observe that the AR and ARIMA-PSO linear models generate predictions whose errors have normal distributions close to zero (−0.03 ± 3.55 and 0.03 ± 3.78 kg/h, respectively).
Different patterns were observed when the NAR and XGBoost models were applied. Although these models were not able to capture the sudden drops related to fuel supply valve closure, they accurately predicted fuel consumption in the minutes immediately after these events most of the time, reducing error propagation. The error decrease after sudden drops is evidenced in Figure 12 by the disappearance of the previously mentioned arrow, mainly in the predictions made by the NAR model. Furthermore, it is possible to verify that the NAR model generates relatively worse predictions than the AR and ARIMA-PSO linear models during regular engine operation at full load. Additionally, NAR predictions tended to overestimate the fuel consumption series, with a displaced normal error distribution of 1.66 ± 5.20 kg/h, while XGBoost predictions during conventional operation underestimated fuel consumption, with a normal error distribution of −4.03 ± 7.42 kg/h. This characteristic reinforces the assumption that the XGBoost model has a higher sensitivity to slight variations, despite the use of optimized regularizers determined through the randomized search.
Operation and model prediction characteristics can be seen in Figure 13. These behaviors are related to different operation phases, such as engine start-up (ramp) and shutdown, sudden peaks and drops, and nonlinear oscillations inherent to engine operation. While the AR and ARIMA-PSO models underestimated fuel consumption during the start-up process and overestimated future values at engine shutdown, the NAR and XGBoost models provided predictions closer to the actual series values. It is noteworthy that, among the linear models, ARIMA-PSO made predictions closer to the real fuel consumption series, benefiting from the use of moving averages at this operation stage. The higher XGBoost sensitivity was confirmed when small fluctuations happened at full load, which resulted in an MAE penalty. This sensitivity may stem from the model’s learned ability to predict sudden variations, which affected its capacity to predict the simpler patterns inherent to engine operation. The linear models’ behavior in predicting sudden drops confirmed the prediction error propagation and the greater ARIMA-PSO instability due to the addition of moving averages. The nonlinear models correctly predicted the fuel valve opening and achieved faster prediction stability in these scenarios.
We performed a statistical analysis to evaluate possible differences between the applied models on the test metrics. For this purpose, we used the Wilcoxon Signed-Rank test to verify whether there was any statistically significant difference between error median values, considering a confidence level of 95%. It was not possible to reject the null hypothesis of equal central tendency between the AR and ARIMA-PSO models for the RMSE and R² metrics, nor between the NAR and XGBoost models for the RMSE, R², and CA metrics. However, we observed a statistically significant difference between the AR and ARIMA-PSO models regarding the MAE and CA error metrics. This difference can be explained by the addition of moving averages to the ARIMA-PSO model, which reduced its stability under sudden oscillations during operations, directly affecting the MAE and CA metrics.
5. Conclusions
The ability of univariate linear AR models adjusted by the Yule-Walker equations, and of ARIMA models identified and estimated by the PSO modules, to capture patterns in engine fuel consumption was investigated. The presented results indicated that, despite the 13.73 h increase in computational time related to applying the PSO modules to adjust the ARIMA model, there were no improvements in the training set compared to the AR(3) model. There was a significant MAE degradation of 13.11% due to the addition of moving averages to the ARIMA(4,1,6) model, which caused prediction instabilities right after sudden fluctuations during operation. The test set metric behavior and error profile were statistically similar to those of the analytically adjusted AR model, with the same performance behavior observed during training: a 1.24% RMSE improvement and a 6.45% MAE degradation. This characteristic suggests that the diesel/HFO engine fuel consumption series may be purely autoregressive. Overall AR and ARIMA-PSO performance during full-load operation presented normal error distributions of −0.03 ± 3.55 and 0.03 ± 3.78 kg/h, respectively. Additionally, although these models did not efficiently capture consumption variations related to disturbances and periods of power output modification (with the ARIMA-PSO model being more accurate for the latter task), they were superior to the machine learning models in assimilating linear patterns during regular operation.
Simultaneously, the performance of the nonlinear NAR and XGBoost models adjusted through randomized searches was investigated. As expected, the machine learning models’ computational costs were considerably higher than the linear prediction model adjustments. During the learning stage, training the 50 randomized XGBoost models took 26.62 h (approximately twice as long as the ARIMA-PSO fitting), while the NAR model training took 231.51 h (8.7 times longer than the XGBoost training). The XGBoost feature importance of each temporal step considered during DT construction was evaluated, and a predominance of 81.5% was found for the three most recent inputs compared to the two later temporal steps. These models ensured considerably better error metrics in the test operations when compared to the applied linear models. The NAR and XGBoost models had statistically similar RMSE performances, with an average improvement of 26.59% over the ARIMA-PSO model, the best linear model for this error metric. Furthermore, the NAR model had a 42.37% MAE improvement over the AR model, while the XGBoost model achieved a 30.30% improvement over the same statistical model. Predictions based on the ANN showed RMSE, R², and CA metrics similar to the XGBoost technique; however, there was a clear difference in MAE behavior between these models, with the ANN-based model having an MAE 21.96% lower than that of the boosting model. It is worth mentioning that these results did not consider the performance of operation 4, due to the high variability of its error metrics related to its very short duration. Additionally, it was possible to notice that although the NAR and XGBoost models can assimilate nonlinear patterns related to engine start-up, shutdown, and sudden fluctuations in fuel consumption, the XGBoost predictor exhibited higher prediction sensitivity during full-load operation despite the use of regularizers during model construction, which explains its higher MAE in relation to the NAR model. Overall NAR and XGBoost performance during full-load operation presented normal error distributions of 1.66 ± 5.20 and −4.03 ± 7.42 kg/h, respectively. Therefore, it was possible to observe better adherence of the NAR and ARIMA-PSO univariate models to the engines’ fuel consumption series.
Finally, very short-term predictions of the fuel consumption series of a large TPP diesel/HFO engine can be accomplished with univariate forecasting models. The application of machine learning methods can improve real-time data-driven system monitoring, despite the high computational cost inherent to model training. However, the statistical methods proposed by Box & Jenkins also present satisfactory performance for the analyzed fuel consumption series. The methodology applied in this work can be adapted to other parameters inherent to engine operation analysis, such as SOx and NOx emission episodes at TPPs. Since these series typically have emission peak patterns that are not effectively predicted by linear models, NAR and XGBoost can be investigated for pollution episode prediction due to their capacity to capture nonlinear patterns and sudden oscillations. However, each emission series’ behavior must be analyzed, and hybrid machine learning models can be developed to obtain more accurate predictions. Overall, nonlinear forecasting techniques have already been applied to improve emission rates in coal-fired TPPs over long periods and can also be explored for diesel/HFO engine emission series. Additionally, it is necessary to verify whether the addition of exogenous variables related to the engine’s operating state and combustion quality can benefit emission level pattern recognition in a multivariate forecasting approach.
Therefore, it is possible to notice that no analyzed univariate forecasting model could best generalize all the described fuel consumption series patterns. Given each model’s high performance specificity in capturing different operation patterns, such as full load, start-up, shutdown, and sudden oscillations, we propose the development of new adaptive prediction algorithms that can identify momentary operation regions and, based on this information, select the historically best-suited forecasting model for the analyzed behavior. Machine learning techniques based on temporal window clustering and forecasting model classification based on training errors could be applied. Additionally, different time series forecasting techniques, such as recurrent neural networks (RNN) and support vector machines (SVM), can be analyzed. Beyond that, future works can analyze the application of multivariate time series forecasting techniques to fuel consumption prediction; for that matter, the addition of a new feature selection step can be investigated through linear and nonlinear causality analysis. It is also worth investigating each model’s performance across different fuel consumption series temporal granularities and input delays during the data pre-processing step.