Article

Selection of Temporal Lags for Predicting Riverflow Series from Hydroelectric Plants Using Variable Selection Methods

by
Hugo Siqueira
1,
Mariana Macedo
2,
Yara de Souza Tadano
3,
Thiago Antonini Alves
4,
Sergio L. Stevan, Jr.
1,
Domingos S. Oliveira, Jr.
5,
Manoel H.N. Marinho
6,
Paulo S.G. de Mattos Neto
5,
 João F. L. de Oliveira
6,
Ivette Luna
7,
Marcos de Almeida Leone Filho
8,
Leonie Asfora Sarubbo
9,10 and
Attilio Converti
11,*
1
Department of Electronics, Federal University of Technology–Parana (UTFPR), Ponta Grossa (PR) 84017-220, Brazil
2
BioComplex Lab, Department of Computer Science, University of Exeter, Exeter EX4 4PY, UK
3
Department of Mathematics, Federal University of Technology–Parana (UTFPR), Ponta Grossa (PR) 84017-220, Brazil
4
Department of Mechanical Engineering, Federal University of Technology–Parana (UTFPR), Ponta Grossa (PR) 84017-220, Brazil
5
Departamento de Sistemas de Computação, Centro de Informática, Universidade Federal de Pernambuco (UFPE), Recife (PE) 50670-901, Brazil
6
Polytechnic School of Pernambuco, University of Pernambuco (UPE), Recife (PE) 50720-001, Brazil
7
Department of Economic Theory, Institute of Economics, State University of Campinas (UNICAMP), Campinas (SP) 13083-857, Brazil
8
Venidera Pesquisa e Desenvolvimento, Campinas 13070-173, Brazil
9
Department of Biotechnology, Catholic University of Pernambuco (UNICAP), Recife (PE) 50050-900, Brazil
10
Advanced Institute of Technology and Innovation (IATI), Recife (PE) 50070-280, Brazil
11
Department of Civil, Chemical and Environmental Engineering, University of Genoa (UNIGE), 16145 Genoa, Italy
*
Author to whom correspondence should be addressed.
Energies 2020, 13(16), 4236; https://doi.org/10.3390/en13164236
Submission received: 14 July 2020 / Revised: 10 August 2020 / Accepted: 13 August 2020 / Published: 16 August 2020

Abstract:
The forecasting of monthly seasonal streamflow time series is an important issue for countries where hydroelectric plants contribute significantly to electric power generation. The main step in the planning of the electric sector's operation is to predict such series to anticipate behaviors and issues. In general, several proposals in the literature focus only on determining the best forecasting models. However, the correct selection of input variables is an essential step for forecasting accuracy; in a univariate model, these inputs are the lags of the time series to forecast. This task can be solved by variable selection methods, since the performance of the predictors is directly related to this stage. In the present study, we investigate the performances of linear and non-linear filters, wrappers, and bio-inspired metaheuristics, totaling ten approaches. The addressed predictors are the extreme learning machine neural networks, representing the non-linear approaches, and the autoregressive linear models, from the Box and Jenkins methodology. The computational results regarding five series from hydroelectric plants indicate that the wrapper methodology is adequate for the non-linear method, whereas the linear approaches are better adjusted using filters.


1. Introduction

Forecasting monthly seasonal streamflow series is an essential task for countries where power generation relies on hydroelectric plants. This step is directly related to energetic planning, water availability, and pricing strategies. The last Hydropower Status Report from the International Hydropower Association reported that 4306 TWh of electricity was generated in the world by hydroelectric plants in 2019, corresponding to a 2.5% annual increase [1]. This amount represents the single most significant contribution from a renewable energy source in history. The report addresses data from 13,000 stations in 150 countries. The leaders in hydropower installed capacity are China (356.40 GW), Brazil (109.06 GW), United States (102.75 GW), and Canada (81.39 GW). These figures also provide an idea about the productive chain regarding such a power source. It is clear that correct water management can bring substantial benefits, avoiding waste of water or money [2].
Many researchers have addressed streamflow series tasks for different countries such as China, Canada, Ecuador, Iraq, Mozambique, United States, Serbia, Norway, Turkey, Sri-Lanka, and Brazil [2,3,4,5,6,7,8,9,10]. It highlights the importance of the problem to the global economy. These series present a particular seasonal component due to the periods of rainfall along the year, being non-stationary series [11,12]. However, most of such studies focus on determining just the best predictor, disregarding some other essential steps of the whole process.
Identifying a system is a task influenced by factors such as prior knowledge of its characteristics, complexity, presence of noise, and the performance metrics to be used [13,14]. The inputs must represent the dynamics of the system, which helps in choosing a forecasting model that is appropriate to the problem and is a fundamental step to obtain an efficient model for time series forecasting [15]. The structure of the models is also defined by the number of inputs. For example, in the case of neural networks, the number of inputs impacts the determination of their structure, since the more inputs there are, the more complex the neural network will be and the more costly its training, without any guarantee of a performance improvement [16]. Additionally, the number of inputs influences the surface of the cost function, which tends to have more local minima [17,18].
Among the potential benefits of input selection, we can mention easier visualization and data understanding, order reduction, lower memory requirements, and reductions in training time and computational effort [17].
Variable selection (VS) methods attempt to identify a subset of inputs that assist in forecasting, pattern recognition, and data regression, playing a significant role in the accuracy of the forecasting methods. Additionally, such approaches tend to simplify the final model, improve the stability of responses, and eliminate redundant inputs [19]. Harrell [20] stated that VS could be subjective because such methods use a conceptual understanding of the dependent variable to select the independent variables. Often, a small number of inputs is recommended for prediction purposes [21]. Guyon and Elisseeff [17] classified the selection methods as filters, wrappers, and embedded. We can also mention a newer category, the bio-inspired metaheuristics for optimization.
Many research fields have addressed the importance of variable selection to improve the accuracy of models in different contexts, as presented in the works below:
  • Li et al. [22]—to increase the estimation of growing stem volume of pine using optical images;
  • Bonah et al. [23]—to quantitative tracking of foodborne pathogens;
  • Xiong et al. [24]—to increase near-infrared spectroscopy quality;
  • Speiser et al. [25]—an extensive investigation with 311 datasets to compare several random forest VS methods for classification;
  • Rendall et al. [26]—extensive comparison of large scale data driven prediction methods based on VS and machine learning;
  • Marcjasz et al. [27]—to electricity price forecasting;
  • Santi et al. [28]—to predict mathematics scores of students;
  • Karim et al. [29]—to predict post-operative outcomes of cardiac surgery patients;
  • Kim and Kang [30]—to faulty wafer detection in semiconductor manufacturing;
  • Furmańczyk and Rejchel [31]—to high-dimensional binary classification problems;
  • Fouad and Loáiciga [5]—to predict percentile flows using inflow duration curve and regression models;
  • Ata Tutkun and Kayhan Atilgan [32]—investigated VS models in Cox regression, a multivariate model;
  • Mehmood et al. [33]—compared several VS approaches in partial least-squares regression tasks;
  • McGee and Yaffee [34]—provided a study on short multivariate time series and many variations of Least Absolute Shrinkage and Selection Operator (LASSO) for VS;
  • Seo [35]—discussed the VS problem together with outlier detection, due to each input affecting the regression task;
  • Dong et al. [36]—to wind power generation prediction;
  • Sigauke et al. [37]—presented a probabilistic hourly load forecasting framework based on additive quantile regression models;
  • Wang et al. [38]—to short-term wind speed forecasting;
  • Taormina and Chau [39]—to rainfall-runoff modeling;
  • Taormina et al. [40]—to river flow forecasting;
  • Cui and Jiang [41]—to chaotic time series prediction;
  • Silva et al. [42]—to predict the price of sugarcane derivatives;
  • Siqueira et al. [12,43,44,45]—applied the partial autocorrelation function linear filter to streamflow series forecasting;
  • Siqueira et al. [2]—use of VS methods, such as wrappers and filters to predict streamflow series; and
  • Kachba et al. [46]—application of wrapper and non-linear filters to estimate the impact of air pollution on human health.
Despite these methodologies not being universal or equally useful in all fields, the studies above depict the importance of variable selection for data processing in many different contexts, methods, and tasks. In streamflow series forecasting from hydroelectric plants, this is even more relevant due to the high magnitude of the energy generated. An increase of a single percentage point in the accuracy of such predictions represents an enormous amount of electric power. However, most of the literature focuses on the definition of the best forecasting model, neglecting further investigation of the impact of adopting distinct VS approaches.
To fill this gap, we analyzed the use of ten VS methods in the autoregressive (AR) model from the Box and Jenkins methodology and the extreme learning machines neural network. The addressed VS approaches are linear filters (two manners of using the partial autocorrelation function, PACF), non-linear filters (three mutual information-based methods), wrappers (considering three evaluating metrics), and bio-inspired metaheuristics (genetic algorithm and particle swarm optimization).
The remainder of this study is organized as follows: Section 2 presents the main content of the variable selection procedure, as well as the main stages of the seasonal streamflow series forecasting; Section 3 discusses the filters; Section 4 the wrapper; Section 5 the bio-inspired metaheuristics—genetic algorithm and particle swarm optimization (PSO); Section 6 the case study, computational results, and the performance analysis; and Section 7 presents the conclusions.

2. Variable Selection

The variable selection methodologies can use information available a priori, through empirical tests of trial and error, or some information criterion. Puma-Villanueva et al. [47] describe a simple example of how the general process works. Consider set V that represents the space of input variables, here limited to 3. Thus, we define the vector of inputs V = [v1,v2,v3], with which it is possible to form ($2^3 - 1 = 7$) subsets of inputs, as depicted in Table 1.
The selection methods' role is to define which of these subsets is the most appropriate to represent the information in the data, possibly in contrast to the adoption of all inputs. In this case, selecting variables means choosing the subset that allows the best forecast of future values of a time series, that is, selecting the vector $\bar{V}_k$ among the possible combinations of the variables of $\bar{V}_l$, such that k ≤ l. This set represents the dependence structure of a stochastic process over time. In Table 1, the goal of the VS methods is to choose one of the seven possibilities.
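For illustration, the short sketch below (hypothetical, not part of the original study) enumerates the seven candidate subsets of Table 1 in Python:

```python
# Hypothetical illustration: enumerate the 2^3 - 1 = 7 non-empty input subsets
# of V = [v1, v2, v3] that a variable selection method must choose among.
from itertools import combinations

V = ["v1", "v2", "v3"]
subsets = [list(c) for r in range(1, len(V) + 1) for c in combinations(V, r)]
print(len(subsets))  # 7
for s in subsets:
    print(s)         # ['v1'], ['v2'], ['v3'], ['v1', 'v2'], ..., ['v1', 'v2', 'v3']
```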
Yu and Liu [48] present some criteria that relate to this procedure:
  • Relevance: the concept associated with the importance a given variable may have to the problem, since the information it contains will be the basis of the selection process. The relevance is strong or weak depending on how much its removal degrades the performance of the predictor;
  • Redundancy: two or more variables are redundant if their observed values are highly correlated or dependent. The level of this correlation reveals the degree of redundancy; and
  • Optimality: a so-called optimal subset of input variables is when there is no other subset that produces better results.
However, these characteristics, if combined, have no direct implication. For example, a relevant variable does not mean that the optimal subset contains it. Likewise, the inputs that belong to the optimal subset are not necessarily appropriate [47]. Guyon and Elisseeff [17] classify the VS models into embedded, wrappers, and filters, each of them having its own advantages. The particularities for a given problem indicate which method is most appropriate.
It is essential to point out the difference between variable and feature selection. The notion of feature is linked to the idea of a set of inputs formed from a combination of the original variables or the extraction of some essential characteristics. An example is principal component analysis (PCA), which linearly combines the inputs [49]. Thus, there is a set of new variables, which lie in a new space. Variable selection differs from this since, in this case, the subset is formed by those original variables that do not undergo any type of transformation.

Variable Selection in Streamflow Series Forecasting

The predictions regarding the monthly seasonal streamflow series follow the stages shown in Figure 1 (adapted from [2]):
The first stage is data acquisition. Sensors are spread to measure the volume of water that passes through a transversal section of the rivers. Such water is what moves the turbines of the hydroelectric plants. Due to the nature of the rivers, which contain many recesses along their way, the incidence of measurement uncertainty is inevitable.
The second stage is the pre-processing, which can be summarized in two steps: (a) the application of the deseasonalization procedure to remove the inherent seasonal component present in this kind of series. The transformed data are stationary series, with zero mean and variance equal to one. For linear models of the Box and Jenkins methodology, this step is mandatory, but some investigations have shown that the performance of non-linear methods also increases with this procedure; (b) the input selection step, to determine the lags that lead to the best performances of the predictors. The input selection step is the focus of this investigation.
The next stage is the definition of the forecasting model. Undoubtedly, this is the most usual theme addressed in streamflow series forecasting. Clearly, the definition of the adequate predictor is crucial regarding the accuracy of the estimation of future samples.
The last stage is post-processing, which involves three procedures. After making predictions, the output responses of the predictors are in the deseasonalized domain. Therefore, we reverse the deseasonalization to allow a performance evaluation in the real domain. Then, a statistical test is applied to verify if the results found by the various predictors are distinct, even if they present different numeric values. Finally, the last step is performance analysis.
Although the specialized literature on these series focuses on the forecasting model, a specific investigation of the input selection process is necessary due to the impact it has on the performance of such models. Determining the most suitable set of lags may lead to distinct conclusions. Thus, this theme must be discussed.

3. Filters

The filter selection method is based only on the available data and does not depend on the predictor model. The variables are chosen through linear or non-linear correlation measures between the observations. The main advantage of this method is its generality, as it is not necessary to synthesize the predictor, which tends to make it computationally efficient [17].
However, as this is a previous step, the optimal set of inputs may not be selected, since there is no interaction with the forecasting model. Metrics based on dependency between samples can be useful, but insufficient to ensure that the chosen set is the best possible. Therefore, we recommend using it for problems with a large amount of available data because if the criterion of optimality is not met, the computational cost should be worthwhile.
Figure 2 shows the scheme of the filter type method. Some of the inputs contained in the vector $u_t$ will belong to a reduced vector $\bar{u}_t$ of smaller or equal dimension. The predictions are performed with $\bar{u}_t$.

3.1. Partial Autocorrelation Function

The partial autocorrelation function (PACF) is a widely used filter to identify the order of linear models [11]. The definition of the partial correlation coefficient is directly related to autoregressive models (AR) and the Yule–Walker equations.
The partial autocorrelation coefficient of order k, denoted by $\varphi_{kk}$, is the last coefficient of an AR(k) model adjusted for a time series $x_t$. This means that, for an AR process of order p, $\varphi_{kk}$ is different from zero for k less than or equal to p, and equal to zero for k > p. Based on this assumption and using the Yule–Walker equations, the relationship between the autocorrelation estimates of a time series obeys the set of equations described in (1):
$$\rho_j = \varphi_{k1}\rho_{j-1} + \varphi_{k2}\rho_{j-2} + \cdots + \varphi_{k(k-1)}\rho_{j-k+1} + \varphi_{kk}\rho_{j-k}, \quad j = 1, 2, \ldots, k \tag{1}$$
or, in a matrix form, we have (2) and (3):
$$P_{kp} = \begin{bmatrix} 1 & \rho_1 & \cdots & \rho_{p-1} \\ \rho_1 & 1 & \cdots & \rho_{p-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p-1} & \rho_{p-2} & \cdots & 1 \end{bmatrix}, \quad \rho_{kp} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_p \end{bmatrix}, \quad \Phi_{kp} = \begin{bmatrix} \varphi_{k1} \\ \varphi_{k2} \\ \vdots \\ \varphi_{kp} \end{bmatrix}, \tag{2}$$
$$\Phi_{kp} = P_{kp}^{-1}\,\rho_{kp}, \tag{3}$$
in which the $\rho$ are the autocorrelation coefficients [50].
Thus, AR(p) models of order p = 1, 2, ..., k must be adjusted to find $\varphi_{kk}$. Expressions (4) and (5) show the coefficients of the first two AR models:

AR(1): $\rho_1 = \varphi_{11}\rho_0$
$$P_{1p} = [\rho_0], \quad \rho_{1p} = [\rho_1], \quad \Phi_{1p} = [\varphi_{11}], \tag{4}$$
with $\varphi_{11} = \rho_1$ being the first partial autocorrelation coefficient. Similarly, we have (5):

AR(2): $\begin{cases} \rho_1 = \varphi_{21}\rho_0 + \varphi_{22}\rho_1 \\ \rho_2 = \varphi_{21}\rho_1 + \varphi_{22}\rho_0 \end{cases}$
$$P_{2p} = \begin{bmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{bmatrix}, \quad \rho_{2p} = \begin{bmatrix} \rho_1 \\ \rho_2 \end{bmatrix}, \quad \Phi_{2p} = \begin{bmatrix} \varphi_{21} \\ \varphi_{22} \end{bmatrix}, \tag{5}$$
Isolating $\varphi_{21}$ and equating the equations, we have $\varphi_{22} = (\rho_2 - \rho_1^2)/(1 - \rho_1^2)$, the value admitted as the second partial autocorrelation coefficient.
It is noteworthy that the autocorrelation coefficients $\rho_p$ are problem-dependent. The PACF of a series can thus be estimated through successive adjustments of autoregressive models, determining the most appropriate orders of an AR model. In practice, the AR(1) is adjusted, from which we estimate the coefficient $\varphi_{11}$. Next, we adjust the AR(2) and obtain $\varphi_{21}$ and $\varphi_{22}$, the latter being of interest. We continue these systematic steps until the required order k is adjusted, from which we obtain the desired $\varphi_{kk}$.
For a time series, the highest order p is sought such that all estimates $\varphi_{kk}$ for k > p are not significant. The order of the model is the value corresponding to the selected entries; that is, if the coefficients $\varphi_{11}$ and $\varphi_{55}$ are selected, lags 1 and 5 are part of the subset of inputs.
Quenouille [51] showed that, for an AR(p) process, the coefficients $\varphi_{kk}$ estimated for orders greater than p have a Gaussian distribution with mean equal to zero and variance $VAR[\varphi_{kk}] \approx 1/N$, N being the number of samples. Thus, a confidence threshold of $2/\sqrt{N}$ based on the standard deviation is adopted, and an estimate is considered different from zero when $|\varphi_{kk}|$ exceeds this limit.
However, the method can select non-consecutive delays as model inputs. For example, if V = [v1,v2,v3,v4], it can select V = [v1, v4], which means that $\varphi_{11}$ and $\varphi_{44}$ were significant, while $\varphi_{22}$ and $\varphi_{33}$ were not. For hydrologic time series, Stedinger [52] states that it makes no sense for a given sample to be related to non-consecutive delays, and that non-consecutive delays selected by the PACF have no physical meaning in a hydrological system, proposing the suppression of these entries. According to this work, some historical series have an autocorrelation structure relative both to the time between observations and to the observed period.
Taking as an example an AR(6) model, if the PACF defines that only the inputs weighted by the coefficients $\varphi_{11}$ and $\varphi_{44}$ are significant, the latter is considered spurious and must be discarded. This means that intermediate values should not be considered. Siqueira et al. [2] used bootstrapping techniques to evaluate the best order of periodic autoregressive models and reached the same conclusion as Stedinger, with similar orders for the streamflow series.
Figure 3 is related to the calculation of the PACF of the monthly streamflow series from Furnas hydroelectric plant, located in Brazil. The data used are from January. The horizontal line is the confidence threshold calculated as a function of the standard deviation. Note that, with 12 delays, the method selected lags 1, 5, and 7. If Stedinger’s proposal is taken into account, only delay 1 is chosen.
The technique is implemented for the selection of inputs in linear simulation models by the Brazilian National Electric System Operator [12].
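As an illustration of this procedure, the sketch below (a minimal Python example, not the code used in the paper; the function and variable names are ours) estimates each $\varphi_{kk}$ by successively solving the Yule–Walker equations and keeps the lags whose coefficients exceed the $2/\sqrt{N}$ threshold:

```python
# Minimal sketch: PACF-based lag selection with the 2/sqrt(N) significance threshold.
# Assumes `series` is a 1-D array of (deseasonalized) monthly samples.
import numpy as np

def pacf_lags(series, max_lag=12):
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Sample autocorrelations rho_0, ..., rho_max_lag
    acov = np.array([np.sum(x[:n - k] * x[k:]) / n for k in range(max_lag + 1)])
    rho = acov / acov[0]
    threshold = 2.0 / np.sqrt(n)
    selected = []
    for k in range(1, max_lag + 1):
        # Yule-Walker system of Eqs. (2)-(3) for an AR(k) model
        P = np.array([[rho[abs(i - j)] for j in range(k)] for i in range(k)])
        phi = np.linalg.solve(P, rho[1:k + 1])
        if abs(phi[-1]) > threshold:   # phi_kk significantly different from zero
            selected.append(k)
    return selected

# Hypothetical usage: pacf_lags(january_flows, max_lag=12) -> e.g., [1, 5, 7]
```

Under Stedinger's proposal, only the leading consecutive lags of the returned list would be retained.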

3.2. Mutual Information

The dependency between two variables is an essential step for selecting model inputs. In this context, a reliable criterion belonging to the scope of information theory can be used, i.e. the mutual information (MI) [53]. The MI is a metric that provides a measure of the degree of dependence between variables; it reflects the amount of information that links them. This filter can be applied as a criterion to select the inputs of forecasting models.
The definition of MI between two random variables can be interpreted as a measure of proximity between the joint probability distribution of the variables x and y, and the product of their marginal distributions. Mathematically, we have (6):
$$MI = \iint f_{xy}(x,y)\,\log\!\left(\frac{f_{xy}(x,y)}{f_x(x)\,f_y(y)}\right)dx\,dy, \tag{6}$$
in which $f_{xy}(x,y)$ is the joint probability density function (PDF), and $f_x(x)$ and $f_y(y)$ are the respective marginal density functions. The MI criterion presents zero as a result for independent variables, and greater than zero otherwise. If a representative sample of the data is available, one can estimate (6) using (7) [54]:
$$MI = \frac{1}{N}\sum_{i=1}^{N}\log\!\left[\frac{f_{xy}(x_i,y_i)}{f_x(x_i)\,f_y(y_i)}\right], \tag{7}$$
where $(x_i, y_i)$ is the i-th pair of data from the sample with size N, being i = 1, 2, ..., N.
The difficulty in this case is to estimate the probabilities since the distributions are often unknown in practice. Additionally, these estimates may require a large amount of data, which is not always available.
There are several ways to estimate PDFs in the literature. In this work, we use a non-parametric approach based on kernel functions, the absolute distance, or city-block type, which have already been applied in streamflow series forecasting [54]. The choice for this proposal is justified in terms of computational simplicity and absence of data distribution type assumption.
Consider the input and output dataset $[X_k, y_k]$, with k = 1, 2, ..., N. The approximation of the probability density of a one-dimensional variable x via non-parametric kernel estimators is given by (8):
$$\hat{f}_x = \frac{1}{N\lambda}\sum_{i=1}^{N}K\!\left[\frac{x - x_i}{\lambda}\right] = \frac{1}{N}\sum_{i=1}^{N}K_\lambda(x - x_i), \tag{8}$$
in which $K_\lambda(\cdot)$ is the kernel function, and $\lambda$ the bandwidth or dispersion parameter.
Therefore, the marginal approximate probability density function of x is given by (9):
$$\hat{f}_x(x) = \frac{1}{N(2\lambda)^p}\sum_{i=1}^{N}\exp\!\left[-\frac{1}{\lambda}\sum_{j=1}^{p}|x_j - x_{ij}|\right], \tag{9}$$
with p being the dimension of x.
Equation (9) arises from (8) as a case adapted to multidimensional x, and using the city-block function. The parameter λ is calculated by (10) [55]:
$$\lambda = \left(\frac{4}{p+2}\right)^{\frac{1}{p+4}} N^{-\frac{1}{p+4}}. \tag{10}$$
Finally, the joint probability of (x, y), the latter being a one-dimensional output, is as in (11) [56]:
$$\hat{f}_{xy}(x,y) = \frac{1}{N\lambda^{p+1}}\sum_{i=1}^{N}K\!\left(\frac{x - x_i}{\lambda}\right)K\!\left(\frac{y - y_i}{\lambda}\right), \tag{11}$$
or as in (12)
$$\hat{f}_{xy}(x,y) = \frac{1}{N\lambda^{p+1}}\sum_{i=1}^{N}\exp\!\left(-\frac{1}{\lambda}s_i\right). \tag{12}$$
Therefore, $s_i$ is calculated by (13):
$$s_i = \sum_{j=1}^{p}|x_j - x_{ij}| + |y - y_i|. \tag{13}$$
An example that shows the approximation capability of this proposal is its use to build a bi-variable Gaussian distribution function. This function is defined by (14):
$$f_{xy} = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\,\exp(-\Gamma), \tag{14}$$
where:
$$\Gamma = \frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}\right], \tag{15}$$
with x and y being the variables used, $\mu_x$ and $\mu_y$ their respective means, $\sigma_x$ and $\sigma_y$ the standard deviations, and $\rho$ the correlation coefficient between them.
To exemplify the approximation capability using the city-block function, we generate 2000 samples of x and y with normal distribution and zero mean. With Equation (14), it is possible to plot fxy, represented graphically in Figure 4a together with its diagram in contour lines (Figure 4b). In parallel, we present the approximations using the city-block kernel function of (13) in Figure 4c,d. The proximity of the curves is clear, which illustrates how this function approximates the distributions. The correlation coefficient between them is 0.9932, although the circles are not perfectly concentric.
The final step in applying the MI filter is to define a confidence threshold that determines whether an input belongs to the selected subset of explanatory variables. One possibility is to establish a minimum value for the MI and reject entries with a lower MI score. Another option is to use a bootstrapping or resampling technique to test the hypothesis of independence between x and y. For this, several reordered versions of x with respect to y are built, in which the independent variable is shuffled, and a vector of MI values is obtained. If the original MI surpasses the threshold derived from this vector at a given level of significance α, x and y are considered dependent, and x is identified as an input variable [57]. This work adopts the latter approach.
The example in Figure 5 refers to the calculation of the MI coefficients related to Furnas hydroelectric plant streamflow series. As in Figure 3, the samples are from January. In this case, we adopted p = 100 sequences and α = 5%, and the respective MI values calculated. For 12 lags initially considered as possible explanatory variables, the selected ones are lags 1, 8, and 9.
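A minimal sketch of this filter, assuming the city-block kernel of Eqs. (8)–(13) and the shuffling-based significance test described above, is given below (illustrative Python; names such as `mutual_information` are ours, not the paper's code):

```python
# Minimal sketch of the MI filter with city-block kernels and a shuffling test.
import numpy as np

def _lambda(n, p):
    # Bandwidth of Eq. (10)
    return (4.0 / (p + 2)) ** (1.0 / (p + 4)) * n ** (-1.0 / (p + 4))

def _city_block_pdf(points, data, lam):
    # Kernel density estimate with the city-block (absolute distance) kernel
    n, p = data.shape
    dist = np.abs(points[:, None, :] - data[None, :, :]).sum(axis=2)
    return np.exp(-dist / lam).sum(axis=1) / (n * (2.0 * lam) ** p)

def mutual_information(x, y):
    # Sample-based MI of Eq. (7) between a candidate lag x and the target y
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n = len(x)
    fx = _city_block_pdf(x, x, _lambda(n, 1))
    fy = _city_block_pdf(y, y, _lambda(n, 1))
    xy = np.hstack([x, y])
    fxy = _city_block_pdf(xy, xy, _lambda(n, 2))
    return float(np.mean(np.log(fxy / (fx * fy))))

def mi_is_significant(x, y, n_shuffles=100, alpha=0.05, seed=0):
    # Bootstrap threshold: reorder x to destroy the dependence and compare
    rng = np.random.default_rng(seed)
    mi_obs = mutual_information(x, y)
    mi_null = [mutual_information(rng.permutation(np.ravel(x)), y)
               for _ in range(n_shuffles)]
    return mi_obs > np.quantile(mi_null, 1.0 - alpha)
```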

3.3. Partial Mutual Information

The MI criterion appears advantageous due to the non-parametric nature on identifying explanatory variables, regardless of the model’s nature to adjust afterward. Notwithstanding, two or more explanatory variables may be highly correlated; thus the choice of those variables would insert redundancy and an unnecessary increase in the complexity of the model. It may occur because the criterion does not perform a joint evaluation of the whole set of potential input variables. One way to deal with this is the proposal of [58], who reformulates the MI into what is known as the partial mutual information criterion (PMI). The PMI criterion measures the mutual information between the independent variable x and the dependent variable y, conditioned to a set of inputs z previously selected.
Consider that this set z exists. Next, it is necessary to extract the influence of this set from the other potential inputs not yet evaluated, to calculate their real contribution, i.e., the contribution different from the one already given by z. Thus, following Luna et al. [54] and Sharma [58], Equation (7) can be reformulated as (16):
$$PMI = \frac{1}{N}\sum_{i=1}^{N}\log_e\!\left[\frac{f_{x'y'}(x'_i, y'_i)}{f_{x'}(x'_i)\,f_{y'}(y'_i)}\right], \tag{16}$$
where $x' = x - E(x|z)$ and $y' = y - E(y|z)$.
Here, $x'$ and $y'$ denote the residuals of x and y, respectively, after removing the conditional expectations given z, $E(x|z)$ and $E(y|z)$. With this transformation, $x'$ and $y'$ can be interpreted as the remaining information in both variables: what in x is not already contained in z, and what in y has not yet been explained.
Several approaches have been proposed for estimating these conditional expectations [58,59,60]. For simplicity, we opted for the non-parametric approach based on kernel regressions, using the Nadaraya–Watson estimator [61]. According to this, given two variables a and b, the conditional expectation $E(b|a)$ is estimated by (17):
$$\hat{r}(a) = \sum_{i=1}^{N} w_{\lambda_a}(a, a_i)\, b_i, \tag{17}$$
where:
$$w_{\lambda_a}(a, a_i) = \frac{K_{\lambda_a}(a - a_i)}{\sum_{j=1}^{N} K_{\lambda_a}(a - a_j)}, \tag{18}$$
with $K_{\lambda_a}(a - a_i)$ denoting the kernel function for variable a. As before, we use the city-block function for this purpose.
Therefore, input selection in this case follows an iterative process. At the first iteration, MI scores are calculated for every potential input variable previously defined. The first input selected is the one with the highest MI score, as long as its significance is validated, initiating z. In the following steps, PMI scores are calculated for all remaining potential input variables, updating z at each iteration, until the highest PMI score is no longer statistically significant. The bootstrapping technique is once more used to verify the significance of the PMI scores at a 5% level.
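The iterative selection can be sketched as follows (illustrative Python, reusing the `mutual_information`, `mi_is_significant`, and `_lambda` helpers from the previous sketch; the Nadaraya–Watson regression of Eqs. (17)–(18) provides the conditional expectations):

```python
# Minimal sketch of the iterative PMI selection, reusing mutual_information,
# mi_is_significant and _lambda from the MI sketch above.
import numpy as np

def nadaraya_watson(z, b):
    # E(b | z) estimated with city-block kernel weights, Eqs. (17)-(18)
    n, p = z.shape
    lam = _lambda(n, p)
    w = np.exp(-np.abs(z[:, None, :] - z[None, :, :]).sum(axis=2) / lam)
    w = w / w.sum(axis=1, keepdims=True)
    return w @ b

def pmi_selection(candidates, y, n_shuffles=100, alpha=0.05):
    """candidates: dict {lag: 1-D array}; y: 1-D target. Returns the selected lags."""
    selected, remaining = [], dict(candidates)
    while remaining:
        best_lag, best_score, best_pair = None, -np.inf, None
        for lag, x in remaining.items():
            if selected:
                z = np.column_stack([candidates[l] for l in selected])
                x_res = x - nadaraya_watson(z, x)     # x' = x - E(x|z)
                y_res = y - nadaraya_watson(z, y)     # y' = y - E(y|z)
            else:
                x_res, y_res = x, y                   # first iteration: plain MI
            score = mutual_information(x_res, y_res)
            if score > best_score:
                best_lag, best_score, best_pair = lag, score, (x_res, y_res)
        # Stop when the highest PMI is no longer significant (bootstrap, level alpha)
        if not mi_is_significant(best_pair[0], best_pair[1], n_shuffles, alpha):
            break
        selected.append(best_lag)
        del remaining[best_lag]
    return selected
```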

3.4. Normalization of Maximum Relevance and Minimum Common Redundancy Mutual Information

Some studies use the principles of mutual information, extending it in different directions to increase the filters' selection capability. The work by Che et al. [62] proposed to expand the MI using the maximum relevance and minimum common redundancy (MRMCR) between the inputs of a model (lags). This framework intends to determine the best set of inputs, controlling the redundancy between them.
The first step of this method is to calculate the common redundancy to evaluate the inputs’ common information. Following, one must apply the normalization of maximum relevance and minimum common redundancy (N-MRMCR-MI). The result presents values in the interval [0,1].
Let S be the subset of chosen inputs, and T the complementary non-selected group. Then, calculate the common mutual information (CI), using (19):
$$CI(x_i, S, y) = \max_{x_j \in S}\left\{\frac{MI(x_i, x_j)}{\max\left[MI(x_i, x_j),\, MI(x_i, y),\, MI(x_j, y)\right]}\;\min\left[MI(x_i, y),\, MI(x_j, y)\right]\right\}, \tag{19}$$
where i is the index of the variables in T, j is the index of the inputs in S, and MI is the mutual information (see (7)).
The complete application of N-MRMCR-MI procedure is according to the following stages [62]:
(1) Initialization: let $T = (x_1, x_2, \ldots, x_p)$ be the full set of inputs, and $S = \varnothing$ (empty);
(2) First input selection: calculate $F(x_i)$ using (20) for all i = 1, 2, …, p, and select the best input $x_{i^*}$ applying (21):
$$F(x_i) = \frac{MI(x_i, y)}{MI(y, y)}, \tag{20}$$
$$x_{i^*} = \arg\max_{i = 1, 2, \ldots, p}\{F(x_i)\}; \tag{21}$$
(3) Update the groups: $T = T \setminus \{x_{i^*}\}$ and $S = \{x_{i^*}\}$;
(4) Greedy selection: repeat steps 1 and 2 until the desired number of features is determined;
(5) Determine the N-MRMCR-MI considering the output variable using (22):
$$F(x_i) = \frac{MI(x_i, y)}{MI(y, y)} - \frac{CI(x_i, S, y)}{MI(y, y)}; \tag{22}$$
(6) Update $T = T \setminus \{x_{i^*}\}$ and $S = S \cup \{x_{i^*}\}$;
(7) Output the subset S.
In this work, we again address the bootstrapping to calculate the confidence level [57], and the city-block functions.
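A compact sketch of this greedy procedure, under our reconstruction of Eqs. (19)–(22) and reusing the `mutual_information` helper sketched earlier, could look as follows (illustrative only):

```python
# Minimal sketch of the N-MRMCR-MI greedy selection, reusing mutual_information.
import numpy as np

def n_mrmcr_mi(candidates, y, n_select):
    """candidates: dict {lag: 1-D array}; y: target series; n_select: subset size."""
    mi_yy = mutual_information(y, y)
    T, S = dict(candidates), []
    while T and len(S) < n_select:
        scores = {}
        for i, xi in T.items():
            relevance = mutual_information(xi, y) / mi_yy              # Eq. (20)
            if S:
                # Common redundancy with the already selected inputs, Eq. (19)
                ci = max(
                    mutual_information(xi, candidates[j])
                    / max(mutual_information(xi, candidates[j]),
                          mutual_information(xi, y),
                          mutual_information(candidates[j], y))
                    * min(mutual_information(xi, y),
                          mutual_information(candidates[j], y))
                    for j in S
                )
                scores[i] = relevance - ci / mi_yy                      # Eq. (22)
            else:
                scores[i] = relevance                                   # first input
        best = max(scores, key=scores.get)
        S.append(best)
        del T[best]
    return S
```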

4. Wrappers

In the wrapper approach, the central aspect is the interaction between the variable selection mechanism and the forecasting model [63]. Once the model has already been adjusted, the wrapper will evaluate, through some performance criteria, each of the subsets to solve it [47]. However, the computational cost involved is high, as the model needs to be adjusted for each candidate subset [64]. The literature recommends using this method for cases where the number of samples is reduced [17]. Figure 6 shows the scheme of the method.
As shown in Figure 6, the operation of the wrapper is as follows: firstly, the set of inputs $u_t$ is divided into smaller subsets $\bar{u}_t$; next, the predictor is trained and executed for each subset; after the forecasting stage, we calculate a performance score from the evaluator block for each subset. The subset with the best value is used.
It is possible not to use the last predictor shown in Figure 6, if the result of each assessment is stored. However, we presented the selection as a separate task of the forecasting process for the sake of simplicity of understanding.

4.1. Progressive Selection

The computational cost of performing an exhaustive search over all possible subsets can be impractical even for a relatively small problem, since the number of candidate subsets grows exponentially with the number of inputs. A proposal to overcome this problem is the wrapper using the progressive selection method. This methodology establishes a manner of building subsets of entries by considering each one individually.
The procedure starts with an empty subset, and each variable is compared with all others. The one presenting the best performance, as measured by the evaluation function, is selected, either because it improves the result or because it causes the least deterioration of this value. After the first entry is chosen and fixed in the subset, the others are evaluated to become the second entry. We repeat this procedure until all V variables have been evaluated. The final subset is the one with the best overall result.
Figure 7 presents this idea, considering the Emborcação series, with the adjustment of an extreme learning machine (ELM) neural network with a fixed number of 20 neurons in the hidden layer and a maximum of 10 delays as input. As one can note, we selected three entries, in this order: 4, 10, and 6, since this was the combination that yielded the lowest mean squared error. In this case, it is noticeable that the selected entries are not consecutive and that increasing the number of inputs does not necessarily improve performance.
The number of subsets formed in this case is equal to the number V of inputs, and the number of times the predictor needs to be trained obeys $V(V+1)/2$. In the example above, 55 ELMs were adjusted, since V = 10.
It is also interesting to observe the behavior of the mean squared error (see Section 4.2) between iterations 2 and 4 in Figure 7. When adding the lag v6, the error decreased, improving the value of the objective function to be minimized. However, with the insertion of the variable v3, the error increased. This behavior occurs because the search may fall into local minima, which can be circumvented at later iterations [47].
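A minimal sketch of the progressive selection loop is shown below (illustrative Python; `train_and_evaluate` is a placeholder for training the predictor on a candidate subset and returning the evaluator score, e.g., the validation MSE):

```python
# Minimal sketch of the progressive (forward) selection wrapper of Section 4.1.
def progressive_selection(n_vars, train_and_evaluate):
    """train_and_evaluate(cols) trains the predictor on the lag indices in `cols`
    and returns its error; the loop performs V(V+1)/2 fits in total."""
    selected, history = [], []
    for _ in range(n_vars):
        candidates = [c for c in range(n_vars) if c not in selected]
        scores = {c: train_and_evaluate(selected + [c]) for c in candidates}
        best = min(scores, key=scores.get)        # lowest error enters the subset
        selected.append(best)
        history.append((list(selected), scores[best]))
    best_subset, _ = min(history, key=lambda item: item[1])
    return best_subset                            # prefix with the best overall score
```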

4.2. Evaluation Functions

After discussing how the wrapper method works, it is necessary to define a criterion for assessing the quality of the forecasting obtained with the previously defined subsets. This step corresponds to the Evaluator block in Figure 6.
The most straightforward criterion is to use some error metric that, for each adjusted set, shows the average of the differences between the desired and the predicted data. The proposal used by Puma-Villanueva et al. [47] is the mean absolute error (MAE), given by (23):
$$MAE = \frac{1}{N_s}\sum_{t=1}^{N_s}|x_t - \hat{x}_t|, \tag{23}$$
where $x_t$ is the desired sample at time t, $\hat{x}_t$ is the prediction, and $N_s$ is the number of predicted samples.
Another possibility is to address the mean squared error (MSE), the most common metric used as a cost function in the training of neural networks, and in the estimation of AR model parameters. This metric is defined by (24):
$$MSE = \frac{1}{N_s}\sum_{t=1}^{N_s}(x_t - \hat{x}_t)^2. \tag{24}$$
Note that these criteria only consider the final result of the adjustment, regardless of the number of inputs. However, there are other evaluation functions that seek to penalize the number of entries in order to select parsimonious subsets. Widely used criteria of this kind are based on information measures [17].
Schwarz [65] proposed the Bayesian Information Criterion (BIC). It is based on linear correlation metrics, and is linked to the optimal orders of the forecasting model, as defined in (25):
$$BIC = N\log_e(\hat{\sigma}_a^2) + p\log_e(N), \tag{25}$$
where N is the number of observations, p is the order or number of model inputs, and $\hat{\sigma}_a^2$ is the estimated variance of the white noise (or residue).
Thus, the wrapper chooses the set of inputs with the lowest BIC value. Similarly, the Akaike Information Criterion (AIC) [66] is defined by (26):
$$AIC = N\ln(\hat{\sigma}_a^2) + 2p. \tag{26}$$
In both cases, it is clear that there is a penalty concerning the number of entries, so that the inclusion of more inputs depends not only on the performance improvement but also on how much the model grows. Thus, the selected subset must be efficient and parsimonious. The difference between the criteria lies in the fact that the BIC penalizes the inclusion more strongly than the AIC: the last term of the BIC (25) multiplies the number of model entries by the natural logarithm of the number of observations, while the AIC (26) multiplies it by 2.
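The evaluation functions of Eqs. (23)–(26) can be sketched as follows (illustrative Python; `residuals` denotes the fitting errors of a candidate model and `p` its number of inputs):

```python
# Minimal sketch of the evaluation functions of Eqs. (23)-(26).
import numpy as np

def mae(x, x_hat):
    return np.mean(np.abs(np.asarray(x) - np.asarray(x_hat)))        # Eq. (23)

def mse(x, x_hat):
    return np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2)         # Eq. (24)

def bic(residuals, p):
    n = len(residuals)
    return n * np.log(np.var(residuals)) + p * np.log(n)             # Eq. (25)

def aic(residuals, p):
    n = len(residuals)
    return n * np.log(np.var(residuals)) + 2 * p                     # Eq. (26)
```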

5. Bio-Inspired Metaheuristics

Bio-inspired metaheuristics for optimization have been widely applied in VS tasks, especially in their binary versions. The genetic algorithm (GA) belongs to the field of evolutionary computation because it is inspired by Darwinian natural selection. Another class of bio-inspired methods is the swarm-based approaches, from which many algorithms were created. The main characteristic of this class is the inspiration in the collective behavior of groups of animals. Its primary representative is particle swarm optimization (PSO) [67]. Still, many other algorithms can be cited, such as artificial bee colony, cat swarm optimization, fish school search, and ant colony optimization, among others [68,69].
In this section, we briefly explain the two metaheuristics widely used in the literature for optimization: genetic algorithm and particle swarm optimization. Both techniques simulate multiple agents that evolve/adapt depending on the environment to find better solutions to one fitness function. The flexibility, robustness, and scalability are key advantages of applying metaheuristics to real problems.

5.1. Genetic Algorithm

The genetic algorithm (GA) was introduced by John Holland [70] and is inspired by the theory of natural evolution. Individuals that adapt better to the environmental circumstances perpetuate their genes, producing offspring that are more likely to carry useful genes. Therefore, the GA has a population of individuals that passes through three processes, selection, crossover, and mutation, to reproduce a better population until an a priori stopping criterion is reached. In the binary version, each individual of the population is represented by a vector of binary values.
In summary, the GA starts by generating an initial (generally random) population of individuals and evaluating the fitness of their genes. Until the population converges, the three processes are executed, and the fitness function is updated whenever the genes are updated. The first process, selection, chooses the individuals that will reproduce new individuals. Then, in the following procedure, crossover, the adopted parents (two) mix their genes to create a new offspring. Thirdly, the mutation process randomly selects genes to be changed. In general, each process establishes the number of individuals that will be updated. The algorithm is usually greedy, only allowing new individuals to replace current ones that they outperform.
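A minimal sketch of a binary GA for lag selection is given below (illustrative Python; `fitness` is assumed to return the evaluator score, lower being better, of a 0/1 mask over the candidate lags; the specific operators and parameter values are ours, not the paper's):

```python
# Minimal sketch of a binary GA for lag selection; greedy replacement of the worst.
import numpy as np

def binary_ga(fitness, n_bits, pop_size=20, n_gen=200, p_cross=0.8, p_mut=0.05, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_bits))
    scores = np.array([fitness(ind) for ind in pop])
    for _ in range(n_gen):
        # Selection: two binary tournaments pick the parents
        i = rng.choice(pop_size, 2, replace=False)
        j = rng.choice(pop_size, 2, replace=False)
        p1 = pop[i[np.argmin(scores[i])]]
        p2 = pop[j[np.argmin(scores[j])]]
        # Crossover: one-point recombination of the parents' genes
        child = p1.copy()
        if rng.random() < p_cross:
            cut = rng.integers(1, n_bits)
            child = np.concatenate([p1[:cut], p2[cut:]])
        # Mutation: flip randomly chosen bits
        mask = rng.random(n_bits) < p_mut
        child[mask] = 1 - child[mask]
        # Greedy replacement: keep the child only if it beats the worst individual
        c_score = fitness(child)
        worst = np.argmax(scores)
        if c_score < scores[worst]:
            pop[worst], scores[worst] = child, c_score
    return pop[np.argmin(scores)]   # best 0/1 mask over the candidate lags
```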

5.2. Particle Swarm Optimization

Kennedy and Eberhart [67] developed particle swarm optimization (PSO). Two years later, they also published the discrete, or binary, version of PSO [71]. The PSO algorithm mimics the behavior of a flock of birds in which each bird is a candidate solution. Each candidate solution i is represented by a position $x_i$ and a velocity $v_i$. In the binary version, the position is represented by binary values, while the velocity takes continuous values between 0 and 1.
In summary, PSO starts by generating an initial (generally random) population of birds and evaluating their respective fitness functions. Then, until an a priori stopping condition is reached, all the birds update their velocity and position. The update of the position (flipping the bit, in binary optimization) is performed each time a random value is higher (or smaller) than a transformation function of the velocity, $F(v)$, such as the sigmoid or tangent sigmoid. Using the sigmoid function, the new position is updated by (27):
$$x_i^{t+1} = \begin{cases} 0, & \text{if } rand() \geq F(v_i^{t+1}) \\ 1, & \text{if } rand() < F(v_i^{t+1}) \end{cases} \tag{27}$$
Moreover, the new velocity is calculated based on the current velocity and the displacements toward the personal ($pbest_i^t$) and global ($gbest_i^t$) best positions. The $pbest_i^t$ is updated every time a particle finds a better position, and the $gbest_i^t$ is the best position among the particle's neighbors (defined a priori by the topology). The parameters $c_1$ and $c_2$ are a priori constants that define how altruistic or selfish each particle is, and $r_1$ and $r_2$ are random values between 0 and 1. Equation (28) shows the velocity update:
$$v_i^{t+1} = w\,v_i^t + c_1 r_1(pbest_i^t - x_i^t) + c_2 r_2(gbest_i^t - x_i^t). \tag{28}$$
Until the algorithm converges, each particle updates its velocity and position. When the position is updated, the fitness function is also re-evaluated.
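A minimal sketch of the binary PSO update of Eqs. (27) and (28) is given below (illustrative Python, using the sigmoid as the transformation F(v); `fitness` plays the same role as in the GA sketch, and the parameter values are ours):

```python
# Minimal sketch of binary PSO for lag selection (Eqs. (27)-(28)), sigmoid F(v).
import numpy as np

def binary_pso(fitness, n_bits, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=(n_particles, n_bits)).astype(float)   # positions (0/1)
    v = rng.uniform(-1, 1, size=(n_particles, n_bits))                 # velocities
    pbest = x.copy()
    pbest_val = np.array([fitness(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)      # Eq. (28)
        prob = 1.0 / (1.0 + np.exp(-v))                                # F(v): sigmoid
        x = (rng.random(x.shape) < prob).astype(float)                 # Eq. (27)
        vals = np.array([fitness(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest.astype(int)   # best 0/1 mask over the candidate lags
```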

6. Case Study

In this section, we summarize the computational results of the linear model and the extreme learning machine neural networks using the variable selection techniques discussed: filters, wrappers, and bio-inspired metaheuristics. As discussed in Section 1, seasonal streamflow series forecasting is essential for countries that rely on hydroelectric plants for power generation. In the Brazilian case, 70% of the electric energy is generated by hydroelectric plants [72]. Additionally, this task is vital to optimize the energetic planning [43,44,45,73].
First, it is paramount to mention the adopted assumptions. We performed the simulations with a maximum of 6 delays, following the literature [2,11,54]. We addressed two forecasting approaches: the use of one predictor to the whole series, and the monthly approach, in which we adjusted 12 different models, one for each month of the year.
Several investigations have shown that monthly streamflows present a seasonal behavior throughout the year, being non-stationary series. Linear models cannot be directly applied, making it necessary to remove the seasonal component. In this work, we adopted the deseasonalization procedure to transform the series into stationary ones, with zero mean and variance equal to one [2]. The process is reversed before the performance analysis. Equation (29) describes the deseasonalization procedure:
$$z_{i,m} = \frac{s_{i,m} - \hat{\mu}_m}{\hat{\sigma}_m}, \tag{29}$$
where $s_n$ is the original series formed by the samples $s_{i,m}$, which is transformed into a new series $z_n$; $\hat{\mu}_m$ is the monthly mean; $\hat{\sigma}_m$ is the monthly standard deviation; and m = 1, 2, ..., 12 is the month.
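A minimal sketch of the deseasonalization of Eq. (29) and its inverse is shown below (illustrative Python; `flows` is assumed to be a 1-D array of monthly samples starting in January):

```python
# Minimal sketch of the deseasonalization of Eq. (29) and its inverse.
import numpy as np

def deseasonalize(flows):
    flows = np.asarray(flows, dtype=float)
    months = np.arange(len(flows)) % 12               # 0 = January, ..., 11 = December
    mu = np.array([flows[months == m].mean() for m in range(12)])
    sigma = np.array([flows[months == m].std() for m in range(12)])
    z = (flows - mu[months]) / sigma[months]          # Eq. (29)
    return z, mu, sigma

def reseasonalize(z, mu, sigma):
    months = np.arange(len(z)) % 12
    return z * sigma[months] + mu[months]             # inverse transform (post-processing)
```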
The series addressed are related to five important Brazilian hydroelectric plants: Furnas, Emborcação, Sobradinho, Agua Vermelha, and Passo Real. The datasets are available from 1931 to 2015, totaling 85 years or 1020 monthly samples. Each sample refers to the streamflow in m³/s. These data are public, being available on the website of the National Operator of the Electric System (ONS) [74]. We separated each series into three groups:
  • Training, from January 1st, 1931 to December 31st, 1995 (780 samples);
  • Validation, from January 1st, 1996 to December 31st, 2005 (120 samples); and
  • Test, from January 1st, 2006 to December 31st, 2015 (120 samples).
The mean and standard deviation of all series are available in Table 2. Note their distinct statistical and consequent hydrological behavior, enabling a broader analysis of the results.

6.1. Predictors

In this section, we briefly describe the forecasting models used in this work. We consider as predictors two methods: the autoregressive linear model (AR) from the Box and Jenkins methodology [50], and the extreme learning machines neural network (ELM), as the nonlinear representative. Note that when using the approach with 12 predictors, the linear method is called the Periodic Autoregressive model (PAR).
The AR approach linearly weights p past values $u_t = [u_{t-1}, u_{t-2}, \ldots, u_{t-p}]$ of a time series to provide a future response $y_t$. Considering that the values of vector u are stationary, (30) explicates such a process:
$$y_t = \varphi_1 u_{t-1} + \varphi_2 u_{t-2} + \cdots + \varphi_p u_{t-p} + a_t, \tag{30}$$
where $\varphi_1, \ldots, \varphi_p$ are the free coefficients of the model and $a_t$ is the residual (white noise) term.
A significant advantage of this method is the possibility of calculating its coefficients using a closed-form approach, the Yule–Walker equations. This means that, using the same set of inputs, the model always converges to the same output. This method guarantees the minimum MSE between the output and the desired response.
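A minimal sketch of fitting an AR(p) model through the Yule–Walker equations and producing a one-step-ahead forecast as in Eq. (30) is given below (illustrative Python; the series is assumed to be already deseasonalized):

```python
# Minimal sketch: AR(p) fitting via the Yule-Walker equations and one-step forecast.
# Assumes `series` is already deseasonalized (zero mean), as in Eq. (29).
import numpy as np

def fit_ar_yule_walker(series, p):
    x = np.asarray(series, dtype=float)
    n = len(x)
    acov = np.array([np.sum(x[:n - k] * x[k:]) / n for k in range(p + 1)])
    rho = acov / acov[0]
    P = np.array([[rho[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(P, rho[1:p + 1])               # phi_1, ..., phi_p

def predict_one_step(series, phi):
    # y_t = phi_1*u_{t-1} + ... + phi_p*u_{t-p}, as in Eq. (30)
    p = len(phi)
    recent = np.asarray(series, dtype=float)[-p:][::-1]   # u_{t-1}, ..., u_{t-p}
    return float(np.dot(phi, recent))
```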
The standard AR considers just one model to predict all values of the time series. However, it is possible to extend the AR to series that present variations in their structure [75], using the periodic autoregressive (PAR) model. According to Hippel and McLeod [76], some historical series, such as hydrological ones with seasonal behavior, present an autocorrelation structure linked to both the time delay between observations and the observed period. In this sense, we can adopt one predictor for each month, which is the core of the PAR model. For monthly streamflow forecasting, we use 12 predictors, each one adjusted to predict the samples of each month [11].
The second forecasting model addressed is the extreme learning machine (ELM). ELMs are feedforward neural networks like the traditional multilayer perceptron (MLP), with only one intermediate layer. However, the training process differentiates them since the weights of the neurons in the hidden layer are randomly and independently determined. The training process does not adjust the weights of this layer, but only those of the output. The optimal values of the weights are typically calculated analytically since the training involves solving a linear regression problem [77]. Thus, there is no need to calculate derivatives, back-propagate error signals, or use iterative algorithms, which reduces the computational cost involved in the training process.
Bartlett [78] obtained an important theoretical result. The author proved that controlling the norm of synaptic weights is more relevant in terms of the generalization capability of a neural model than controlling the number of neurons in the middle layer. This leads to important evidence that an improvement occurs when the parameter vector has a minimum norm, so the effective number of neurons in the intermediate layer will be defined by the configuration of the weights of the output layer.
Given this statement, ELM presents a guarantee of good generalization effectively given by the weights of the output layer, while the weights of the intermediate layer can be defined at random. Because of this, the network's training becomes linear in relation to the adjustable parameters. The Moore–Penrose generalized inverse operator is the most important candidate in the literature for solving this problem [79,80].
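A minimal sketch of ELM training as described above is given below (illustrative Python; the class name and the use of the hyperbolic tangent activation are our choices, not necessarily those of the paper):

```python
# Minimal sketch of an ELM: random hidden weights, output weights obtained
# analytically with the Moore-Penrose pseudoinverse (least-squares solution).
import numpy as np

class ELM:
    def __init__(self, n_inputs, n_hidden=20, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.uniform(-1, 1, size=(n_inputs, n_hidden))  # fixed at random
        self.b = rng.uniform(-1, 1, size=n_hidden)               # hidden biases
        self.beta = None                                          # output weights

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ y   # minimum-norm least-squares solution
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```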
In this work, we address the neural network following the same premises of the AR and PAR models: just one ELM for the complete series (annual approach), and 12 ELMs, one adjusted for each month. We highlight that such method was chosen because it presented good results in monthly seasonal streamflow series forecasting, overcoming other neural models [2,12,43,44,45,80].

6.2. Computational Results

This investigation aims to analyze the quality of the predictions regarding the use of the aforementioned variable selection techniques: filters, wrappers, and bio-inspired metaheuristics. We consider as predictor the autoregressive model (AR), periodic autoregressive model (PAR) [50], and the extreme learning machines neural network (ELM) considering the annual and monthly approaches.
The purpose of these simulations is to find the input selection model that is more suitable for linear and non-linear methodologies. Note that the wrappers for the AR and PAR models take into account the assessment of the fit on the training set, while the training error of the ELMs is not as important, because we are interested in the smallest generalization error.
The maximum number of inputs or delays allowed is six, as models of higher orders increase the possibility of negative auto-regressive coefficients [11]. Siqueira et al. [2,12] and Stedinger [52] defend this premise.
The computational results regarding the mean squared error (MSE) and mean absolute error (MAE) in the real and deseasonalized (MSEd and MAEd) domains for one step ahead are in Table 3, Table 4 and Table 5. The acronym "Lf" means the linear filter approach based on the partial autocorrelation function (the traditional PACF and Stedinger's approach, PACF-Sted.), "Nf" corresponds to the non-linear filters developed using the mutual information principle (MI, PMI, and N-MRMCR-MI), "WR" to the wrapper method considering as evaluation function the BIC, AIC, and MSE, and "M" to the metaheuristics, GA and PSO.
In the AR (Table 3) and PAR (Table 4) cases, we presented the error for training and test sets, while for the single and monthly ELM (Table 5), we just show the errors for the test, because we are interested in analyzing the generalization capability of such response. To the ELM, the results are an average of 30 simulations. The best performances in the test set regarding the MSE in the real domain are highlighted using shades of gray. Additionally, in Appendix A we explicate the inputs selected for all case studies in Table A1, Table A2, Table A3, Table A4 and Table A5.
We also applied Friedman's test to evaluate whether the results are significantly distinct [57]. As expected, for the cases in which the same set of inputs is selected, there is no statistical difference between the results for the same model. In almost all the other cases, the p-values achieved were close to zero, indicating that changing the inputs leads to different conclusions. We discuss the exceptions below.
The critical analysis regarding the results achieved by the AR model reveals interesting behaviors (Table 3). For Furnas and Emborcação time series, there was no perfect correspondence between the best performance regarding the MSE and MAE in the real and deseasonalized domains’ errors. In such cases, we assumed the best predictor with the smallest MSE in the real space, following the premises already stated in previous works [2,12,80].
The general analysis showed other relevant issues: for training or test, at least two variable selection methods led to the same performances since they selected the same subset of inputs, except for the Agua Vermelha training set. As the AR optimized by the Yule–Walker equations presents a closed-form solution, the same input vector necessarily leads to the same output responses. This behavior can occur since just one model is adjusted, and the number of inputs is limited to six.
Likewise, these draws happened for the same class of variable selection method. Note, for example, that for Emborcação, Sobradinho, and Passo Real, the smallest training error in the MSE sense was related to the non-linear filters, while for Furnas it was related to the wrappers based on BIC and AIC.
However, we observed an intriguing behavior: the best performances were related to distinct variable selection methods for the training and test sets (Table 3). Following some of the literature regarding monthly seasonal streamflow series forecasting [2,12], one should state the best variable selection method based on the error in the test set. However, the analysis of the training set reveals the search capability of the methods. Indeed, the ideal behavior would be the same VS approach for both. The training set was better adjusted by the MI filters in four cases, and by the wrapper (BIC and AIC) in one. The PACF best fitted the test sets four of five times, and the MI filters, once.
Analyzing the inputs selected for the AR model in Table A1, Table A2, Table A3, Table A4 and Table A5 in Appendix A, one can note that the models that best fitted the training set selected six lags for Emborcação, Sobradinho, and Passo Real, and two entries for Furnas and Agua Vermelha. Considering the test set, the PACF approaches, in general, selected three or four inputs. As expected, the MI methods usually chose more lags than the PACF, since they detect non-linear relations. As the linear approach achieved four of the five best results, we can affirm that including all inputs in the AR model may tend toward a configuration with less generalization capability.
Although the bio-inspired metaheuristics present an elevated search capability, for the AR model neither PSO nor GA achieved any of the best performances (Table 3).
Unlike the AR case, the PAR model's results presented a draw just for the training set of Emborcação (Table 4). This happened because both the GA and the wrapper-MSE found the same set of inputs (see Table A2). Considering the simultaneous optimization of 12 models, totaling up to 72 free parameters, it is more likely that the variable selection methods achieve distinct configurations.
For the PAR model (Table 4), the best VS method presented homogeneity regarding the four error metrics. On the other hand, we noted again that the best performance for training did not correspond to that of the test set. In the training set, the GA stood out, achieving the smallest errors for Emborcação and Furnas, and, together with the wrapper-MSE, for Agua Vermelha. For the other scenarios, we highlight the N-MRMCR-MI. In the test set, only filters reached the smallest errors: PACF-Sted. (Furnas and Agua Vermelha), N-MRMCR-MI (Emborcação), and MI for the others.
Table A1, Table A2, Table A3, Table A4 and Table A5 reveal that the GA and the wrapper-MSE often selected five or six inputs for all months. In the comparison of the wrapper methodologies, a pattern can be noticed, since the BIC selected fewer inputs than the AIC. In practice, we observed the influence of each type of penalty regarding the insertion of new entries.
The results achieved by the ELM considering the annual approach, summarized in Table 5, must be discussed considering not just the numerical values of the errors, but also the statistical difference between them. Additionally, it is important to highlight that when applying neural networks, there is no interest in evaluating the training error, because we are looking for the configuration to achieve the highest generalization capability. Therefore, we discuss just the error in the test set.
In the annual models, we once again noted some ties for Furnas and Emborcação. Although Table 5 presents distinct numerical values, the p-value obtained for the Friedman test was higher than 0.05, as expected, since the set of selected inputs was the same (see Table A1, Table A2, Table A3, Table A4 and Table A5). As the initialization of the weights of an ELM is random, the outputs are distinct, but the results are close. This is the reason we must run the algorithm at least 30 times.
For Furnas, the wrappers (AIC and MSE) and the non-linear filters (MI and PMI) presented the best results, selecting the same entries. For Emborcação, further discussion is needed. Except for the wrapper-BIC, all methods led to statistically equivalent performances, according to the Friedman test. However, one can find four distinct sets of inputs in Table A2. This can happen because the approximation capability of a neural network is high, these models being universal approximators. Regarding the inputs, note that lags 1 and 4 belong to all subsets.
For the other series, the AIC stood out for Agua Vermelha and Sobradinho, while the PACF and PACF-Sted., for Passo Real (same set of inputs). The AIC method selected two inputs for all cases.
As observed for the linear prediction models, the MI methods tended to select more lags than the linear filters (see Table A1, Table A2, Table A3, Table A4 and Table A5). The PMI selected fewer inputs than its non-linear counterparts. In general, it seems clear that inserting too many inputs does not necessarily lead to an increase in performance, especially in the test set.
The monthly prediction using the ELM, similar to the linear case, showed error values following a standard pattern. For all scenarios, the wrapper-MSE found the best performances, although the smallest MAE for Sobradinho is related to the AIC. In a few cases, the best error metrics converge to the same predictor. Except for the tie in Emborcação, the metaheuristics did not achieve the best errors.

6.3. Discussion

Table 6 displays how many times each VS method led to the best performance, according to the models depicted in Table 3, Table 4 and Table 5: AR (training and test), PAR (training and test), annual ELM (test), and monthly ELM (test). Due to the statistical similarity between some of the best results, verified by the Friedman test, we indicate in parentheses the number of additional VS methods that achieved similar performance for each predictor. For example, for the AR in the training set, the PMI reached the best result four times: once alone and three times together with two more VS methods. The number of times an approach was the single best is highlighted in bold and underlined.
The general analysis of all results allows some interesting observations. In the annual approaches, we noted many draws, unlike in the monthly approach. This is plausible, since only one predictor is adjusted. The monthly models presented more scattered performances due to the number of models fitted for each series (see Table 3, Table 4 and Table 5).
In most cases, the PACF and PACF-Sted. presented the same set of inputs (see Appendix A, Table A1, Table A2, Table A3, Table A4 and Table A5). This indicates that, for hydrologic series, the hypothesis of dependency on consecutive delays makes sense. These methods stood out in the test set of the AR and PAR models (Table 6).
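A minimal sketch of this linear-filter idea is shown below: lags whose partial autocorrelation exceeds the classical 95% bound (±1.96/√N) are retained. It relies on the PACF estimate from statsmodels and the standard Quenouille bound; the Stedinger correction for periodic hydrological series used in this work is not reproduced here.

```python
import numpy as np
from statsmodels.tsa.stattools import pacf

def select_lags_pacf(series, max_lag=6, z=1.96):
    """Return the lags whose PACF falls outside the +-z/sqrt(N) band."""
    values = pacf(series, nlags=max_lag)          # values[0] is lag 0, always 1
    bound = z / np.sqrt(len(series))
    return [lag for lag in range(1, max_lag + 1) if abs(values[lag]) > bound]

# Hypothetical usage with a synthetic AR(2)-like series.
rng = np.random.default_rng(1)
x = np.zeros(600)
for t in range(2, 600):
    x[t] = 0.6 * x[t - 1] + 0.2 * x[t - 2] + rng.normal()
print(select_lags_pacf(x, max_lag=6))             # typically selects lags 1 and 2
```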
Considering the non-linear filters for the annual models, these approaches selected all six lags as inputs, except for Agua Vermelha. In the monthly cases, among the MI-based methods, the PMI tended to select fewer inputs, while the N-MRMCR-MI selected more, except for Agua Vermelha (see Table A1, Table A2, Table A3, Table A4 and Table A5). The MI-based approaches are data-hungry, since the estimation of the probability density function (PDF) improves as more data become available. That is why such approaches tend to behave better in the annual models than in the monthly ones. The AR was better adjusted to the training set using MI. Additionally, the N-MRMCR-MI won alone twice (Table 6).
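As an illustration of the non-linear filter principle, the snippet below ranks candidate lags by their estimated mutual information with the one-step-ahead target using the nearest-neighbour estimator from scikit-learn; it is a sketch of the idea only, not a reproduction of the MI, PMI, or N-MRMCR-MI criteria adopted in this study.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def rank_lags_by_mi(series, max_lag=6):
    """Estimate MI between each candidate lag and the next value of the series."""
    X = np.column_stack([series[max_lag - l:-l] for l in range(1, max_lag + 1)])
    y = series[max_lag:]
    mi = mutual_info_regression(X, y, random_state=0)
    return sorted(zip(range(1, max_lag + 1), mi), key=lambda kv: -kv[1])

# Hypothetical usage: the longer the series, the more reliable the MI estimates,
# which is why the annual (undivided) data favour these filters.
rng = np.random.default_rng(2)
x = np.sin(2 * np.pi * np.arange(720) / 12) + 0.3 * rng.normal(size=720)
for lag, score in rank_lags_by_mi(x):
    print(f"lag {lag}: MI = {score:.3f}")
```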
Regarding the wrapper methodology, as expected, the MSE-based criterion selected more inputs than the BIC and AIC, since there is no penalty for introducing new lags. The BIC selected fewer entries than the AIC, due to its stronger penalty function (see Table A1, Table A2, Table A3, Table A4 and Table A5). However, the BIC did not outperform the others by itself, unlike the AIC, which was the winner twice for the annual ELM model. In summary, the wrapper approach stood out for the neural networks (Table 6).
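A compact sketch of the wrapper idea follows: lags are added greedily to a least-squares AR predictor, and each candidate subset is scored with AIC or BIC, so a new lag is kept only if the error reduction pays for its penalty. The Gaussian-likelihood form of the criteria is assumed, and the routine is a simplified stand-in for the progressive selection procedure used in the paper.

```python
import numpy as np

def information_criterion(rss, n, k, kind="AIC"):
    """Gaussian AIC/BIC for a model with k parameters and residual sum of squares rss."""
    penalty = 2 * k if kind == "AIC" else k * np.log(n)
    return n * np.log(rss / n) + penalty

def wrapper_forward(series, max_lag=6, kind="AIC"):
    """Greedy forward selection of lags for a least-squares AR predictor."""
    y = series[max_lag:]
    cols = {l: series[max_lag - l:-l] for l in range(1, max_lag + 1)}
    selected, best_score = [], np.inf
    improved = True
    while improved:
        improved = False
        for cand in (l for l in cols if l not in selected):
            trial = selected + [cand]
            X = np.column_stack([np.ones_like(y)] + [cols[l] for l in trial])
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            rss = np.sum((y - X @ beta) ** 2)
            score = information_criterion(rss, len(y), X.shape[1], kind)
            if score < best_score:
                best_score, best_cand, improved = score, cand, True
        if improved:
            selected.append(best_cand)
    return sorted(selected)
```

With its stronger log(n) penalty, the BIC variant stops adding lags earlier than the AIC one, which is consistent with the smaller subsets observed in the appendix tables.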
Regarding the metaheuristics, only for the PAR in the training set did the GA achieve most of the best performances. These are powerful methods for binary optimization problems such as variable selection, and we believe this approach could be competitive [68,69]. On the other hand, their computational cost was higher than that of the wrappers (Table 6). We can also notice that, for four of the five series, the monthly ELM achieved the best overall performances (Table 3, Table 4 and Table 5). This is particularly relevant since linear models are still massively used in the literature [2,12].
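To illustrate the binary encoding used by the metaheuristics, the sketch below evolves 6-bit chromosomes (one bit per candidate lag) with a simple genetic algorithm whose fitness is the validation MSE of a least-squares AR predictor. The population size, operators, and fitness function are illustrative choices, not the GA or PSO settings adopted in this work.

```python
import numpy as np

rng = np.random.default_rng(3)

def fitness(bits, series, max_lag=6):
    """Validation MSE of an AR predictor using the lags flagged in bits."""
    lags = [l for l in range(1, max_lag + 1) if bits[l - 1]]
    if not lags:
        return np.inf
    y = series[max_lag:]
    X = np.column_stack([np.ones_like(y)] + [series[max_lag - l:-l] for l in lags])
    split = int(0.7 * len(y))
    beta, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
    return np.mean((y[split:] - X[split:] @ beta) ** 2)

def binary_ga(series, max_lag=6, pop_size=20, generations=30, p_mut=0.1):
    pop = rng.integers(0, 2, size=(pop_size, max_lag))
    for _ in range(generations):
        scores = np.array([fitness(ind, series, max_lag) for ind in pop])
        parents = pop[np.argsort(scores)[: pop_size // 2]]   # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, max_lag)                    # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child[rng.random(max_lag) < p_mut] ^= 1           # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])
    best = pop[np.argmin([fitness(ind, series, max_lag) for ind in pop])]
    return [l for l in range(1, max_lag + 1) if best[l - 1]]
```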
Finally, we emphasize that variable selection is a complex problem, still little explored in the context of monthly seasonal streamflow forecasting. The variety among the responses demonstrates the task's inherent difficulties, which have elevated economic and environmental impacts. As can be seen, the models' efficiency is greatly influenced by the choice of lags. In Figure 8, we present the best predictions achieved by the winner in each scenario: the AR for Furnas and the monthly ELM for the other cases.

7. Conclusions

This work performed an extensive investigation of variable selection methods to determine the best subsets of inputs to increase accuracy in monthly seasonal streamflow series forecasting tasks. Most of the specialized literature focuses on finding the best predictor, but the selection of the inputs is an essential step for forecasting.
We addressed wrappers, linear and non-linear filters, and bio-inspired metaheuristics. The wrapper methodology can evaluate the quality of an input subset under several criteria, including:
  • Mean square error (MSE);
  • Bayesian information criterion (BIC); and
  • Akaike information criterion (AIC).
The linear filters used were the:
  • Partial autocorrelation function (PACF); and
  • PACF using the Stedinger [52] approach for hydrological series.
The non-linear filters addressed were the:
  • Mutual information (MI);
  • Partial mutual information (PMI); and
  • Normalization of maximum relevance and minimum common redundancy mutual information (N-MRMCR-MI).
Regarding the metaheuristics, we used binary versions of the:
  • Particle swarm optimization (PSO); and
  • Genetic algorithm (GA).
We performed computational tests with predictions for five monthly series related to hydroelectric plants, using the autoregressive linear model (AR) and the extreme learning machine neural network (ELM) as predictors. We also addressed two forecasting approaches: the use of a single predictor for the whole series, and the use of 12 predictors, each one adjusted to a specific month.
The main findings of this investigation are:
  • The selected lags were very diverse depending on the method, especially for the monthly case;
  • For the annual approaches, some draws could be found;
  • The linear models perform better with filters;
  • The wrapper is the best choice for the neural network; and
  • Regarding the forecasting methods, the monthly ELM achieved the best error values.
These findings are especially important for countries where power generation comes mainly from hydroelectric plants, since this is the most important renewable power source in the world. Additionally, such an investigation can contribute to energy planning, water availability assessment, and pricing strategies for the power production chains.
Future works can consider these approaches for other series of renewable inputs for power generation, such as wind power [36,38]. Other problems related to energy generation, such as the estimation of methane production and biogas efficiency [81], could also be treated and simulated using a similar methodology. Furthermore, other metaheuristics can be addressed, since a vast repertoire of possibilities has been developed in recent years, such as differential evolution and the artificial bee colony, among others.

Author Contributions

Conceptualization: H.S., S.L.S.J., Y.d.S.T., T.A.A., M.H.N.M., and I.L.; methodology: H.S., S.L.S.J., Y.d.S.T., and T.A.A.; software: H.S., I.L., D.S.O.J., and M.H.N.M.; validation: H.S., P.S.G.d.M.N., J.F.L.d.O., and M.H.N.M.; formal analysis: P.S.G.d.M.N., M.H.N.M., and J.F.L.d.O.; investigation: Y.d.S.T., and T.A.A.; resources: M.H.N.M; writing—original draft preparation: H.S., Y.d.S.T., M.H.N.M., I.L., L.A.S., and A.C.; writing—review and editing: M.d.A.L.F., L.A.S., T.A.A., and A.C.; illustrations: T.A.A.; visualization: P.S.G.d.M.N., T.A.A., and J.F.L.d.O.; supervision: L.A.S., A.C., and M.H.N.M; project administration: M.H.N.M; funding acquisition: H.S. and M.H.N.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work received funding and technical support from AES and Associated Companies (of the CPFL, Brookfield, and Global group) as part of the ANEEL PD-0610-1004/2015 project, “IRIS - Integration of intermittent renewables: A simulation model of the operation of the Brazilian electrical system to support planning, operation, commercialization, and regulation”, which is part of an R and D program regulated by ANEEL, Brazil. The authors also thank the Advanced Institute of Technology and Innovation (IATI) for its support, the National Institute of Meteorology (INMET) for providing the data, and the Coordination for the Improvement of Higher Education Personnel - Brazil (CAPES)—Financing Code 001, for support.

Acknowledgments

This work was partially supported by the Brazilian agencies CAPES and FACEPE. The authors also thank the Brazilian National Council for Scientific and Technological Development (CNPq), process number 40558/2018-5, and Araucaria Foundation, process number #51497, for their financial support.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In this appendix, we present in Table A1, Table A2, Table A3, Table A4 and Table A5 the inputs selected by each VS methodology for the five monthly seasonal streamflow series from Brazilian hydroelectric plants. Each cell shows the number of selected inputs followed, in parentheses, by the corresponding lags. The best performances in the test set, related to Table 3, Table 4 and Table 5, are highlighted in bold; for the training set, the best results are in bold italics.
Table A1. Furnas.
Model | Month | BIC | AIC | W-MSE | PACF | PACF-Sted. | MI | PMI | N-MRMCR-MI | GA | PSO
PARJ1(1)1(1)5(1,3,4,5,6)1(1)1(1)1(1)1(1)3(1,5,6)5(1,3,4,5,6)5(12,3,4,6)
F1(1)1(1)6(1,2,3,4,5,6)2(1,6)1(1)2(1,2)1(1)6(1,2,3,4,5,6)6(1,2,3,4,5,6)3(2,4,6)
M1(1)2(1,2)6(1,2,3,4,5,6)2(1,6)1(1)2(1,2)1(1)5(1,2,3,4,5,5)6(1,2,3,4,5,6)2(1,5)
A2(1,2)3(1,2,3)6(1,2,3,4,5,6)2(1,2)2(1,2)4(1,2,3,4)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)3(1,3,5)
M4(1,2,3,4)4(1,2,3,4)6(1,2,3,4,5,6)3(1,2,3)3(1,2,3)5(1,2,3,4,5)2(1,3)4(1,3,4,6)6(1,2,3,4,5,6)3(1,2,3)
J2(1,2)2(1,2)6(1,2,3,4,5,6)2(1,2)2(1,2)6(1,2,3,4,5,6)3(1,2,5)6(1,2,3,4,5,6)6(1,2,3,4,5,6)2(1,4)
J2(1,2)2(1,2)6(1,2,3,4,5,6)2(1,2)2(1,2)6(1,2,3,4,5,6)4(1,2,4,5)6(1,2,3,4,5,6)6(1,2,3,4,5,6)5(1,2,4,5,6)
A1(1)4(1,2,3,4)6(1,2,3,4,5,6)1(1)1(1)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)4(1,3,5,6)
S4(1,2,3,4)4(1,2,3,4)6(1,2,3,4,5,6)4(1,2,3,4)4(1,2,3,4)6(1,2,3,4,5,6)1(1)6(1,2,3,4,5,6)6(1,2,3,4,5,6)4(1,2,3,6)
O2(3,4)3(3,4,6)6(1,2,3,4,5,6)4(1,2,3,4)4(1,2,3,4)6(1,2,3,4,5,6)1(2)6(1,2,3,4,5,6)5(1,2,3,4,6)4(1,3,4,6)
N1(1)2(1,2)6(1,2,3,4,5,6)2(1,5)1(1)6(1,2,3,4,5,6)2(1,2)6(1,2,3,4,5,6)5(1,2,3,4,5)4(1,2,4,5)
D2(1,2)2(1,2)6(1,2,3,4,5,6)3(1,2,6)2(1,2)4(1,2,3,4)1(1)6(1,2,3,4,5,6)6(1,2,3,4,5,6)2(4,5)
ELMJ1(2)1(2)1(2)1(1)1(1)1(1)1(1)3(1,5,6)2(1,4)1(1)
F1(1)1(1)1(1)2(1,6)1(1)2(1,2)1(1)6(1,2,3,4,5,6)2(1,4)5(1,3,4,5,6)
M1(1)1(1)2(3,6)2(1,6)1(1)2(1,2)1(1)5(1,2,3,4,5,5)5(1,2,3,4,5)3(1,3,4)
A1(5)1(5)1(5)2(1,2)2(1,2)4(1,2,3,4)6(1,2,3,4,5,6)6(1,2,3,4,5,6)2(1,2)4(1,2,4,5)
M1(2)1(2)1(2)3(1,2,3)3(1,2,3)5(1,2,3,4,5)2(1,3)4(1,3,4,6)2(2,3)3(1,2,3)
J1(1)1(1)1(1)2(1,2)2(1,2)6(1,2,3,4,5,6)3(1,2,5)6(1,2,3,4,5,6)1(1)1(1)
J1(1)1(1)1(1)2(1,2)2(1,2)6(1,2,3,4,5,6)4(1,2,4,5)6(1,2,3,4,5,6)2(1,6)1(1)
A1(1)1(1)1(1)1(1)1(1)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)4(1,2,3,5)1(3)
S1(1)1(1)1(1)4(1,2,3,4)4(1,2,3,4)6(1,2,3,4,5,6)1(1)6(1,2,3,4,5,6)1(1)2(1,3)
O1(1)1(1)1(1)4(1,2,3,4)4(1,2,3,4)6(1,2,3,4,5,6)1(2)6(1,2,3,4,5,6)1(1)3(1,2,6)
N1(5)1(5)2(2,5)2(1,5)1(1)6(1,2,3,4,5,6)2(1,2)6(1,2,3,4,5,6)1(1)3(2,3,5)
D1(2)1(2)1(2)3(1,2,6)2(1,2)4(1,2,3,4)1(1)6(1,2,3,4,5,6)1(2)2(2,6)
AR2(1,2)2(1,2)4(1,2,3,5)3(1,2,3)3(1,2,3)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)4(1,2,3,5)4(1,2,3,5)
ELM1(1)2(1,4)6(1,2,3,4,5,6)3(1,2,3)3(1,2,3)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)4(1,2,3,5)5(1,2,4,5,6)
Table A2. Emborcação.
Model | Month | BIC | AIC | W-MSE | PACF | PACF-Sted. | MI | PMI | N-MRMCR-MI | GA | PSO
PARJ1(1)1(1)6(1,2,3,4,5,6)2(1,6)1(1)1(6)1(6)1(1)6(1,2,3,4,5,6)3(4,5,6)
F1(1)1(1)6(1,2,3,4,5,6)1(1)1(1)1(1)1(1)1(1)6(1,2,3,4,5,6)4(1,2,4,5)
M1(1)1(1)6(1,2,3,4,5,6)1(1)1(1)2(1,2)1(1)1(1)6(1,2,3,4,5,6)2(3,4)
A3(1,2,5)3(1,2,5)6(1,2,3,4,5,6)3(1,2,5)2(1,2)3(1,2,3)2(1,2)2(1,2)6(1,2,3,4,5,6)1(4)
M4(1,2,3,5)4(1,2,3,5)6(1,2,3,4,5,6)2(1,3) 1(1)5(1,2,3,4,5)6(1,2,3,4,5,6)3(1,2,3)6(1,2,3,4,5,6)2(5,6)
J3(1,3,5)4(1,2,3,5)5(1,2,3,5,6)1(1)1(1)5(1,2,3,4,5)3(1,2,5)2(1,2)5(1,2,3,5,6)2(1,2)
J3(1,2,6)3(1,2,6)3(1,2,6)1(1)1(1)6(1,2,3,4,5,6)2(1,2)2(1,2)3(1,2,6)1(1)
A1(1)1(1)1(1)2(1,3)1(1)6(1,2,3,4,5,6)2(1,2)1(1)1(1)2(1,2)
S1(1)1(1)2(1,2)2(1,3)1(1)6(1,2,3,4,5,6)2(1,2)4(1,2,3,4)2(1,2)2(1,2)
O1(1)1(1)3(1,2,3)4(1,3,4,6)1(1)6(1,2,3,4,5,6)1(1)4(1,2,3,4)3(1,2,3)1(2)
N2(1,2)2(1,2)4(1,2,3,4)3(1,2,5)2(1,2)1(1)1(1)1(1)4(1,2,3,4)2(4,5)
D1(1)1(1)5(1,2,3,4,5)3(1,5,6)1(1)2(1,2)2(1,6)2(1,2)5(1,2,3,4,5)3(1,2,3)
ELMJ1(2)1(2)3(1,2,6)2(1,6)1(1)1(6)1(6)1(1)4(1,2,3,5)3(4,5,6)
F1(1)1(1)2(1,5)1(1)1(1)1(1)1(1)1(1)2(1,4)2(1,3)
M1(1)1(1)2(2,3)1(1)1(1)2(1,2)1(1)1(1)1(2)5(1,2,4,5,6)
A1(5)1(5)1(5)3(1,2,5)2(1,2)3(1,2,3)2(1,2)2(1,2)1(3)1(5)
M1(2)1(2)1(2)2(1,3)1(1)5(1,2,3,4,5)6(1,2,3,4,5,6)3(1,2,3)2(1,4)2(1,5)
J1(1)1(1)3(1,5,6)1(1)1(1)5(1,2,3,4,5)3(1,2,5)2(1,2)2(1,5)5(1,2,3,4,5)
J1(1)1(1)2(1,2)1(1)1(1)6(1,2,3,4,5,6)2(1,2)2(1,2)2(1,6)4(1,3,4,6)
A1(1)1(1)3(1,4,5)2(1,3)1(1)6(1,2,3,4,5,6)2(1,2)1(1)2(1,6)4(1,2,3,5)
S1(1)1(1)3(1,5,6)2(1,3)1(1)6(1,2,3,4,5,6)2(1,2)4(1,2,3,4)5(1,2,4,5,6)3(2,5,6)
O1(1)1(1)3(1,3,4)4(1,3,4,6)1(1)6(1,2,3,4,5,6)1(1)4(1,2,3,4)5(1,2,3,4,5)4(1,2,4,6)
N1(5)1(5)3(1,4,5)3(1,2,5)2(1,2)1(1)1(1)1(1)2(1,3)3(1,2,3)
D1(2)1(2)2(2,5)3(1,5,6)1(1)2(1,2)2(1,6)2(1,2)3(1,2,5)4(2,4,5,6)
AR2(1,2)2(1,2)2(1,2)4(1,2,3,4)4(1,2,3,4)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)2(1,2)2(1,2)
ELM1(1)2(1,4)6(1,2,3,4,5,6)4(1,2,3,4)4(1,2,3,4)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)4(1,2,4,5,6)
Table A3. Sobradinho.
Model | Month | BIC | AIC | W-MSE | PACF | PACF-Sted. | MI | PMI | N-MRMCR-MI | GA | PSO
PARJ1(1)1(1)6(1,2,3,4,5,6)1(1)1(1)3(1,2,3)1(1)4(1,4,5,6)6(1,2,3,4,5,6)2(3,6)
F3(1,3,5)3(1,3,5)6(1,2,3,4,5,6)3(1,4,5)1(1)4(1,2,4,5)1(1)6(1,2,3,4,5,6)6(1,2,3,4,5,6)2(1,4)
M1(1)1(1)5(1,2,3,4,6)1(1)1(1)2(1,2)1(1)6(1,2,3,4,5,6)5(1,2,3,4,6)3(1,2,5)
A1(1)1(1)6(1,2,3,4,5,6)1(1)1(1)5(1,2,3,4,6)1(5)6(1,2,3,4,5,6)6(1,2,3,4,5,6)3(2,4,6)
M1(1)2(1,3)6(1,2,3,4,5,6)1(1)1(1)4(1,2,3,4)2(1,3)5(1,2,4,5,6)6(1,2,3,4,5,6)3(1,3,4)
J5(1,2,3,5,6)5(1,2,3,5,6)6(1,2,3,4,5,6)3(1,2,3)3(1,2,3)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)
J5(1,2,3,4,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)1(1)1(1)6(1,2,3,4,5,6)2(1,2)6(1,2,3,4,5,6)5(1,2,3,5,6)2(1,3)
A1(1)1(1)5(1,2,3,4,5)2(1,2)2(1,2)6(1,2,3,4,5,6)3(1,5,4)6(1,2,3,4,5,6)5(1,2,3,4,5)4(1,2,3,5)
S1(1)1(1)6(1,2,3,4,5,6)2(1,2)2(1,2)6(1,2,3,4,5,6)2(1,2)6(1,2,3,4,5,6)6(1,2,3,4,5,6)5(1,2,3,4,5)
O1(1)1(1)4(1,2,3,4)2(1,2)2(1,2)6(1,2,3,4,5,6)3(1,2,6)6(1,2,3,4,5,6)4(1,2,3,4)6(1,2,3,4,5,6)
N1(1)2(1,3)2(1,3)1(1)1(1)5(1,2,3,4,5)1(1)6(1,2,3,4,5,6)2(1,3)1(1)
D1(1)1(1)5(1,2,3,4,5)2(1,6)1(1)2(1,2)1(1)6(1,2,3,4,5,6)5(1,2,3,4,5)1(6)
ELMJ1(2)1(2)1(3)1(1)1(1)3(1,2,3)1(1)4(1,4,5,6)1(3)3(1,3,5)
F1(1)1(1)3(1,2,6)3(1,4,5)1(1)4(1,2,4,5)1(1)6(1,2,3,4,5,6)2(1,6)4(1,2,5,6)
M1(1)1(1)2(1,3)1(1)1(1)2(1,2)1(1)6(1,2,3,4,5,6)3(1,5,6)3(1,4,6)
A1(5)1(5)3(1,2,4)1(1)1(1)5(1,2,3,4,6)1(5)6(1,2,3,4,5,6)2(1,6)5(1,2,3,5,6)
M1(2)1(2)1(1)1(1)1(1)4(1,2,3,4)2(1,3)5(1,2,4,5,6)2(1,3)1(1)
J1(1)1(1)2(1,4)3(1,2,3)3(1,2,3)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)5(1,2,4,5,6)3(1,3,6)
J1(1)1(1)1(1)1(1)1(1)6(1,2,3,4,5,6)2(1,2)6(1,2,3,4,5,6)1(1)2(1,4)
A1(1)1(1)1(1)2(1,2)2(1,2)6(1,2,3,4,5,6)3(1,5,4)6(1,2,3,4,5,6)1(1)2(1,2)
S1(1)1(1)2(1,5)2(1,2)2(1,2)6(1,2,3,4,5,6)2(1,2)6(1,2,3,4,5,6)2(1,2)3(1,4,5)
O1(1)1(1)5(1,2,4,5,6)2(1,2)2(1,2)6(1,2,3,4,5,6)3(1,2,6)6(1,2,3,4,5,6)1(3)4(1,4,5,6)
N1(5)1(5)2(1,3)1(1)1(1)5(1,2,3,4,5)1(1)6(1,2,3,4,5,6)1(3)4(1,4,5,6)
D1(2)1(2)4(2,3,5,6)2(1,6)1(1)2(1,2)1(1)6(1,2,3,4,5,6)1(2)5(1,2,3,4,5)
AR2(1,3)2(1,3)2(1,3)2(1,3)3(1,3,4)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)2(1,3)2(1,3)
ELM1(1)2(1,4)6(1,2,3,4,5,6)1(1)3(1,3,4)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)2(1,4)
Table A4. Agua Vermelha.
Model | Month | BIC | AIC | W-MSE | PACF | PACF-Sted. | MI | PMI | N-MRMCR-MI | GA | PSO
PARJ1(1)1(1)6(1,2,3,4,5,6)2(1,5)1(1)3(1,5,6)1(1)1(1)5(1,3,4,5,6)3(3,4,6)
F1(1)1(1)6(1,2,3,4,5,6)2(1,6)1(1)1(1)1(1)1(1)6(1,2,3,4,5,6)2(5,6)
M1(1)1(1)6(1,2,3,4,5,6)2(1,6)1(1)2(1,2)1(1)1(1)6(1,2,3,4,5,6)3(3,5,6)
A3(1,2,6)3(1,2,6)4(1,2,5,6)2(1,2)2(1,2)3(1,2,4)1(1)2(1,2)4(1,2,5,6)2(2,3)
M4(1,2,3,5)4(1,2,3,5)5(1,2,3,4,5)3(1,2,3)3(1,2,3)5(1,2,3,4,5)2(1,2)3(1,2,3)5(1,2,3,4,5)3(2,5,6)
J1(1)1(1)5(1,2,4,5,6)1(1) 1(1)5(1,2,3,4,5)6(1,2,3,4,5,6)2(1,2)5(1,2,4,5,6)4(2,3,4,5)
J3(1,2,3)3(1,2,3)6(1,2,3,4,5,6)2(1,2)2(1,2)6(1,2,3,4,5,6)2(1,2)2(1,2)6(1,2,3,4,5,6)1(3)
A1(1)5(1,2,3,4,6)6(1,2,3,4,5,6)1(1)1(1)6(1,2,3,4,5,6)6(1,2,3,4,5,6)1(1)6(1,2,3,4,5,6)5(2,3,4,5,6)
S2(1,3)2(1,3)6(1,2,3,4,5,6)2(1,2)2(1,2)6(1,2,3,4,5,6)1(1)4(1,2,3,4)6(1,2,3,4,5,6)1(1)
O2(1,3)3(1,3,6)6(1,2,3,4,5,6)3(1,2,3)3(1,2,3)6(1,2,3,4,5,6)6(1,2,3,4,5,6)4(1,2,3,4)6(1,2,3,4,5,6)1(1)
N1(1)2(1,2)6(1,2,3,4,5,6)1(1)1(1)6(1,2,3,4,5,6)1(1)1(1)6(1,2,3,4,5,6)4(1,2,5,6)
D1(1)1(1)6(1,2,3,4,5,6)1(1)1(1)5(1,2,3,4,5)1(1)2(1,2)6(1,2,3,4,5,6)1(4)
ELMJ1(2)1(2)2(1,6)2(1,5)1(1)3(1,5,6)1(1)1(1)5(1,2,3,4,6)4(1,2,4,6)
F1(1)1(1)1(1)2(1,6)1(1)1(1)1(1)1(1)4(1,2,3,6)4(1,2,4,5)
M1(1)1(1)2(1,2)2(1,6)1(1)2(1,2)1(1)1(1)2(1,2)6(1,2,3,4,5,6)
A1(5)1(5)2(1,2)2(1,2)2(1,2)3(1,2,4)1(1)2(1,2)2(1,2)4(1,2,3,5)
M1(2)1(2)1(1)3(1,2,3)3(1,2,3)5(1,2,3,4,5)2(1,2)3(1,2,3)4(1,2,4,6)4(2,3,5,6)
J1(1)1(1)3(1,2,6)1(1) 1(1)5(1,2,3,4,5)6(1,2,3,4,5,6)2(1,2)2(2,5)3(1,2,4)
J1(1)1(1)2(1,2)2(1,2)2(1,2)6(1,2,3,4,5,6)2(1,2)2(1,2)2(1,2)3(1,2,5)
A1(1)1(1)3(1,2,5)1(1)1(1)6(1,2,3,4,5,6)6(1,2,3,4,5,6)1(1)5(1,2,3,4,5)2(1,2)
S1(1)1(1)2(4,6)2(1,2)2(1,2)6(1,2,3,4,5,6)1(1)4(1,2,3,4)2(2,6)2(5,6)
O1(1)1(1)2(1,6)3(1,2,3)3(1,2,3)6(1,2,3,4,5,6)6(1,2,3,4,5,6)4(1,2,3,4)3(1,3,4)3(1,2,5)
N1(5)1(5)1(6)1(1)1(1)6(1,2,3,4,5,6)1(1)1(1)2(2,5)4(2,3,5,6)
D1(2)1(2)3(2,3,4)1(1)1(1)5(1,2,3,4,5)1(1)2(1,2)1(2)2(2,6)
AR2(1,2)2(1,2)2(1,2)3(1,2,3)3(1,2,3)6(1,2,3,4,5,6)2(1,3)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)
ELM1(1)2(1,4)6(1,2,3,4,5,6)3(1,2,3)3(1,2,3)6(1,2,3,4,5,6)2(1,3)6(1,2,3,4,5,6)6(1,2,3,4,5,6)4(1,2,3,4)
Table A5. Passo Real.
Model | Month | BIC | AIC | W-MSE | PACF | PACF-Sted. | MI | PMI | N-MRMCR-MI | GA | PSO
PARJ2(1,2)2(1,2)6(1,2,3,4,5,6)3(1,5,6)1(1)3(1,2,4)1(1)6(1,2,3,4,5,6)6(1,2,3,4,5,6)5(1,2,4,5,6)
F2(1,3)3(1,3,5)5(1,3,4,5,6)1(1)1(1)3(1,2,3)1(1)6(1,2,3,4,5,6)5(1,3,4,5,6)3(2,5,6)
M1(1)2(1,6)5(1,2,3,4,6)2(1,2)2(1,2)6(1,2,3,4,5,6)1(1)6(1,2,3,4,5,6)5(1,2,3,4,6)2(2,5)
A1(2)1(2)6(1,2,3,4,5,6)3(1,2,4)2(1,2)5(1,2,3,4,5)3(1,2,3)6(1,2,3,4,5,6)6(1,2,3,4,5,6)3(1,5,6)
M1(1)2(1,5)4(1,3,5,6)5(1,2,3,4,6)4(1,2,3,4)6(1,2,3,4,5,6)2(1,6)6(1,2,3,4,5,6)4(1,3,5,6)2(2,3)
J1(1)2(1,2)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)4(1,2,3,5)3(1,2,5)6(1,2,3,4,5,6)6(1,2,3,4,5,6)3(1,3,5)
J3(1,2,4)3(1,2,4)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)5(1,2,3,4,6)2(1,3)6(1,2,3,4,5,6)6(1,2,3,4,5,6)4(3,4,5,6)
A1(1)3(1,2,5)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)5(1,2,3,4,5)2(1,5)6(1,2,3,4,5,6)6(1,2,3,4,5,6)3(2,3,6)
S1(1)1(1)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)2(1,3)2(1,3)6(1,2,3,4,5,6)6(1,2,3,4,5,6)3(2,4,5)
O1(1)3(1,2,4)5(1,2,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)2(1,4)2(1,4)6(1,2,3,4,5,6)5(1,2,4,5,6)2(3,4)
N3(1,2,3)4(1,2,3,5)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)2(1,3)2(1,3)6(1,2,3,4,5,6)6(1,2,3,4,5,6)2(5,6)
D1(1)2(1,4)6(1,2,3,4,5,6)5(1,2,3,4,6)4(1,2,3,4)3(1,2,4)1(1)6(1,2,3,4,5,6)5(1,3,4,5,6)1(1)
ELMJ1(2)1(2)1(2)3(1,5,6)1(1)3(1,2,4)1(1)6(1,2,3,4,5,6)2(1,4)3(1,3,4)
F1(1)1(1)1(1)1(1)1(1)3(1,2,3)1(1)6(1,2,3,4,5,6)2(1,2)5(1,2,3,4,6)
M1(1)1(1)2(3,6)2(1,2)2(1,2)6(1,2,3,4,5,6)1(1)6(1,2,3,4,5,6)2(1,2) 3(1,3,6)
A1(5)1(5)1(5)3(1,2,4)2(1,2)5(1,2,3,4,5)3(1,2,3)6(1,2,3,4,5,6)2(1,2)3(1,3,4)
M1(2)1(2)1(2)5(1,2,3,4,6)4(1,2,3,4)6(1,2,3,4,5,6)2(1,6)6(1,2,3,4,5,6)1(4)4(1,2,3,6)
J1(1)1(1)1(1)6(1,2,3,4,5,6)6(1,2,3,4,5,6)4(1,2,3,5)3(1,2,5)6(1,2,3,4,5,6)2(2,5)1(1)
J1(1)1(1)1(1)6(1,2,3,4,5,6)6(1,2,3,4,5,6)5(1,2,3,4,6)2(1,3)6(1,2,3,4,5,6)3(2,3,6)2(1,5)
A1(1)1(1)1(1)6(1,2,3,4,5,6)6(1,2,3,4,5,6)5(1,2,3,4,5)2(1,5)6(1,2,3,4,5,6)3(3,4,5)3(2,3,5)
S1(1)1(1)1(1)6(1,2,3,4,5,6)6(1,2,3,4,5,6)2(1,3)2(1,3)6(1,2,3,4,5,6)4(1,2,4,5)3(2,3,5)
O1(1)1(1)1(1)6(1,2,3,4,5,6)6(1,2,3,4,5,6)2(1,4)2(1,4)6(1,2,3,4,5,6)3(1,4,5)3(2,4,6)
N1(5)1(5)2(2,5)6(1,2,3,4,5,6)6(1,2,3,4,5,6)2(1,3)2(1,3)6(1,2,3,4,5,6)3(1,2,3)1(1)
D1(2)1(2)1(2)5(1,2,3,4,6)4(1,2,3,4)3(1,2,4)1(1)6(1,2,3,4,5,6)3(1,4,5)2(1,3)
AR2(1,2)2(1,2)2(1,2)3(1,2,3)3(1,2,3)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)2(1,2)2(1,2)
ELM1(1)2(1,4)6(1,2,3,4,5,6)3(1,2,3)3(1,2,3)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)6(1,2,3,4,5,6)4(1,2,3,6)

References

  1. IHA—International Hydropower Association. Hydropower Status Report: Sector Trends and Insights. 2020. Available online: https://www.hydropower.org/publications/2020-hydropower-status-report (accessed on 15 May 2020).
  2. Siqueira, H.V.; Boccato, L.; Luna, I.; Attux, R.; Lyra, C. Performance analysis of unorganized machines in streamflow forecasting of Brazilian plants. Appl. Soft Comput. 2018, 68, 494–506. [Google Scholar] [CrossRef]
  3. Zhu, S.; Zhou, J.; Ye, L.; Meng, C. Streamflow estimation by support vector machine coupled with different methods of time series decomposition in the upper reaches of Yangtze River, China. Environ. Earth Sci. 2016, 75, 531. [Google Scholar] [CrossRef]
  4. Dilini, W.; Attygalle, D.; Hansen, L.L.; Nandalal, K.W. Ensemble Forecast for monthly Reservoir Inflow; A Dynamic Neural Network Approach. In Proceedings of the 4th Annual International Conference on Operations Research and Statistics (ORS 2016), Global Science and Technology Forum, Singapore, 18–19 January 2016; pp. 84–90. [Google Scholar]
  5. Fouad, G.; Loáiciga, H.A. Independent variable selection for regression modeling of the flow duration curve for ungauged basins in the United States. J. Hydrol. 2020, 587, 124975. [Google Scholar] [CrossRef]
  6. Arsenault, R.; Côté, P. Analysis of the effects of biases in ensemble streamflow prediction (ESP) forecasts on electricity production in hydropower reservoir management. Hydrol. Earth Syst. Sci. 2019, 23, 2735–2750. [Google Scholar] [CrossRef] [Green Version]
  7. Stojković, M.; Kostić, S.; Prohaska, S.; Plavsic, J.; Tripković, V. A new approach for trend assessment of annual streamflows: A case study of hydropower plants in Serbia. Water Resour. Manag. 2017, 31, 1089–1103. [Google Scholar] [CrossRef]
  8. Hailegeorgis, T.T.; Alfredsen, K. Regional statistical and precipitation-runoff modelling for ecological applications: prediction of hourly streamflow in regulated rivers and ungauged basins. River Res. Appl. 2016, 33, 233–248. [Google Scholar] [CrossRef] [Green Version]
  9. Hernandez-Ambato, J.; Asqui-Santillan, G.; Arellano, A.; Cunalata, C. Multistep-ahead Streamflow and Reservoir Level Prediction Using ANNs for Production Planning in Hydroelectric Stations. In Proceedings of the 16th IEEE International Conference on Machine Learning and Applications (ICMLA 2017), Institute of Electrical and Electronics Engineers (IEEE), Cancun, Mexico, 18–21 December 2017; pp. 479–484. [Google Scholar]
  10. Yaseen, Z.M.; Jaafar, O.; Deo, R.; Kisi, O.; Adamowski, J.; Quilty, J.; El-Shafie, A. Stream-flow forecasting using extreme learning machines: A case study in a semi-arid region in Iraq. J. Hydrol. 2016, 542, 603–614. [Google Scholar] [CrossRef]
  11. Maceira, M.E.P.; Damázio, J.M. Use of the PAR (p) model in the stochastic dual dynamic programming optimization scheme used in the operation planning of the brazilian hydropower system. Probab. Eng. Inf. Sci. 2005, 20, 143–156. [Google Scholar] [CrossRef]
  12. Siqueira, H.V.; Boccato, L.; Attux, R.; Lyra, C. Unorganized machines for seasonal streamflow series forecasting. Int. J. Neural Syst. 2014, 24, 1430009. [Google Scholar] [CrossRef] [PubMed]
  13. Munera, S.; Amigo, J.M.; Aleixos, N.; Talens, P.; Cubero, S.; Blasco, J. Potential of VIS-NIR hyperspectral imaging and chemometric methods to identify similar cultivars of nectarine. Food Control. 2018, 86, 1–10. [Google Scholar] [CrossRef]
  14. Yan, L.; Liu, C.; Qu, H.; Liu, W.; Zhang, Y.; Yang, J.; Zheng, L. Discrimination and measurements of three flavonols with similar structure using terahertz spectroscopy and chemometrics. J. Infrared Millim. Terahertz Waves 2018, 39, 492–504. [Google Scholar] [CrossRef]
  15. Moon, Y.I.; Rajagopalan, B.; Lallid, U. Estimation of mutual information using kernel density estimators. Phys. Rev. E 1995, 52, 2318–2321. [Google Scholar] [CrossRef] [PubMed]
  16. Crone, S.F.; Kourentzes, N. Feature selection for time series prediction—A combined filter and wrapper approach for neural networks. Neurocomputing 2010, 73, 1923–1936. [Google Scholar] [CrossRef] [Green Version]
  17. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182. [Google Scholar] [CrossRef] [Green Version]
  18. Wang, A.; Xie, L. Technology using near infrared spectroscopic and multivariate analysis to determine the soluble solids content of citrus fruit. J. Food Eng. 2014, 143, 17–24. [Google Scholar] [CrossRef]
  19. Yin, L.; Zhou, J.; Chen, D.; Han, T.; Zheng, B.; Younis, A.; Shao, Q. A review of the application of near-infrared spectroscopy to rare traditional Chinese medicine. Spectrochim. Acta Part. A Mol. Biomol. Spectrosc. 2019, 221, 117208. [Google Scholar] [CrossRef]
  20. Harrell, F.E. Regression Modeling Strategies; Springer Science and Business Media LLC: New York, NY, USA, 2001. [Google Scholar]
  21. Tsakiris, G.; Nalbantis, I.; Cavadias, G. Regionalization of low flows based on canonical correlation analysis. Adv. Water Resour. 2011, 34, 865–872. [Google Scholar] [CrossRef]
  22. Li, X.; Liu, Z.; Lin, H.; Wang, G.; Sun, H.; Long, J.; Zhang, M. Estimating the growing stem volume of chinese pine and larch plantations based on fused optical data using an improved variable screening method and stacking algorithm. Remote. Sens. 2020, 12, 871. [Google Scholar] [CrossRef] [Green Version]
  23. Bonah, E.; Huang, X.; Aheto, J.H.; Yi, R.; Yu, S.; Tu, H. Comparison of variable selection algorithms on vis-NIR hyperspectral imaging spectra for quantitative monitoring and visualization of bacterial foodborne pathogens in fresh pork muscles. Infrared Phys. Technol. 2020, 107, 103327. [Google Scholar] [CrossRef]
  24. Xiong, Y.; Zhang, R.; Zhang, F.; Yang, W.; Kang, Q.; Chen, W.; Du, Y. A spectra partition algorithm based on spectral clustering for interval variable selection. Infrared Phys. Technol. 2020, 105, 103259. [Google Scholar] [CrossRef]
  25. Speiser, J.L.; Miller, M.A.; Tooze, J.; Ip, E.H. A comparison of random forest variable selection methods for classification prediction modeling. Expert Syst. Appl. 2019, 134, 93–101. [Google Scholar] [CrossRef]
  26. Rendall, R.; Pereira, A.C.; Reis, M. An extended comparison study of large scale datadriven prediction methods based on variable selection, latent variables, penalized regression and machine learning. Comput. Aided Chem. Eng. 2016, 38, 1629–1634. [Google Scholar] [CrossRef]
  27. Marcjasz, G.; Uniejewski, B.; Weron, R. Beating the naïve—Combining LASSO with naïve intraday electricity price forecasts. Energies 2020, 13, 1667. [Google Scholar] [CrossRef] [Green Version]
  28. Santi, V.M.A.; Notodiputro, K.; Sartono, B. Variable selection methods applied to the mathematics scores of Indonesian students based on convex penalized likelihood. J. Phys. Conf. Ser. 2019, 1402, 077096. [Google Scholar] [CrossRef]
  29. Karim, N.; Reid, C.M.; Tran, L.; Cochrane, A.; Billah, B. Variable selection methods for multiple regressions influence the parsimony of risk prediction models for cardiac surgery. J. Thorac. Cardiovasc. Surg. 2017, 153, 1128–1135.e3. [Google Scholar] [CrossRef]
  30. Kim, D.; Kang, S. Effect of irrelevant variables on faulty wafer detection in semiconductor manufacturing. Energies 2019, 12, 2530. [Google Scholar] [CrossRef] [Green Version]
  31. Furmańczyk, K.; Rejchel, W. Prediction and variable selection in high-dimensional misspecified binary classification. Entropy 2020, 22, 543. [Google Scholar] [CrossRef]
  32. Tutkun, N.A.; Atilgan, Y.K. Visual research on the trustability of classical variable selection methods in Cox regression. Hacet. J. Math. Stat. 2020, 49, 1–18. [Google Scholar] [CrossRef]
  33. Mehmood, T.; Saebø, S.; Liland, K.H. Comparison of variable selection methods in partial least squares regression. J. Chemom. 2020, 34, e3226. [Google Scholar] [CrossRef] [Green Version]
  34. McGee, M.; Yaffee, R.A. Comparison of Variable Selection Methods for Forecasting from Short Time Series. In Proceedings of the 6th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2019), Institute of Electrical and Electronics Engineers (IEEE), Washington, DC, USA, 5–8 October 2019; pp. 531–540. [Google Scholar]
  35. Seo, H.S. Unified methods for variable selection and outlier detection in a linear regression. Commun. Stat. Appl. Methods 2019, 26, 575–582. [Google Scholar] [CrossRef]
  36. Dong, W.; Yang, Q.; Fang, X. Multi-Step ahead wind power generation prediction based on hybrid machine learning techniques. Energies 2018, 11, 1975. [Google Scholar] [CrossRef] [Green Version]
  37. Sigauke, C.; Nemukula, M.M.; Maposa, D. Probabilistic hourly load forecasting using additive quantile regression models. Energies 2018, 11, 2208. [Google Scholar] [CrossRef] [Green Version]
  38. Wang, H.; Sun, J.; Sun, J.; Wang, J. Using random forests to select optimal input variables for short-term wind speed forecasting models. Energies 2017, 10, 1522. [Google Scholar] [CrossRef] [Green Version]
  39. Taormina, R.; Ahmadi, M.H. Data-driven input variable selection for rainfall–runoff modeling using binary-coded particle swarm optimization and Extreme Learning Machines. J. Hydrol. 2015, 529, 1617–1632. [Google Scholar] [CrossRef]
  40. Taormina, R.; Ahmadi, M.H.; Sivakumar, B. Neural network river forecasting through baseflow separation and binary-coded swarm optimization. J. Hydrol. 2015, 529, 1788–1797. [Google Scholar] [CrossRef]
  41. Cui, X.; Jiang, M. Chaotic time series prediction based on binary particle swarm optimization. AASRI Proc. 2012, 1, 377–383. [Google Scholar] [CrossRef]
  42. Silva, N.; Siqueira, I.; Okida, S.; Stevan, S.L.; Siqueira, H.V. Neural networks for predicting prices of sugarcane derivatives. Sugar Tech. 2018, 21, 514–523. [Google Scholar] [CrossRef]
  43. Siqueira, H.V.; Boccato, L.; Attux, R.; Lyra, C. Echo state networks and extreme learning machines: A comparative study on seasonal streamflow series prediction. In Computer Vision; Springer Science and Business Media LLC: Heidelberg/Berlin, Germany, 2012; Volume 7664, pp. 491–500. [Google Scholar]
  44. Siqueira, H.V.; Boccato, L.; Attux, R.; Filho, C.L. Echo state networks for seasonal streamflow series forecasting. In Computer Vision; Springer Science and Business Media LLC: Berlin/Heidelberg, Germany, 2012; Volume 7435, pp. 226–236. [Google Scholar]
  45. Siqueira, H.V.; Boccato, L.; Attux, R.; Filho, C.L. Echo State networks in seasonal streamflow series prediction. Learn. Nonlinear Model 2012, 10, 181–191. [Google Scholar] [CrossRef] [Green Version]
  46. Kachba, Y.R.; Chiroli, D.M.D.G.; Belotti, J.T.; Alves, T.A.; Tadano, Y.D.S.; Siqueira, H.V. Artificial neural networks to estimate the influence of vehicular emission variables on morbidity and mortality in the largest metropolis in South America. Sustainability 2020, 12, 2621. [Google Scholar] [CrossRef] [Green Version]
  47. Puma-Villanueva, W.; Dos Santos, E.; Von Zuben, F. Data partition and variable selection for time series prediction using wrappers. In Proceedings of the 2006 IEEE International Joint Conference on Neural Network Proceedings, Institute of Electrical and Electronics Engineers (IEEE), Vancouver, BC, Canada, 16–21 July 2006; pp. 4740–4747. [Google Scholar]
  48. Yu, L.; Liu, H. Efficient feature selection via analysis of relevance and redundancy. J. Mach. Learn. Res. 2004, 5, 1205–1224. [Google Scholar]
  49. Hyvärinen, A.; Karhunen, J.; Oja, E. Independent Component Analysis; John Wiley and Sons: New York, NY, USA, 2001; ISBN 978-0-471-40540-5. [Google Scholar]
  50. Geurts, M.; Box, G.E.P.; Jenkins, G.M. Time Series Analysis: Forecasting and Control, 5th ed.; John Wiley and Sons: Hoboken, NJ, USA, 2016; ISBN 978-1-118-67502-1. [Google Scholar]
  51. Quenouille, M.H. Approximate tests of correlation in time-series. J. R. Stat. Soc. Ser. B 1949, 11, 68–84. [Google Scholar] [CrossRef]
  52. Stedinger, J.R. Report on the Evaluation of CEPEL’s PAR Models, Technical Report; School of Civil and Environmental Engineering—Cornell University, Ithaca: New York, NY, USA, 2001. [Google Scholar]
  53. Bonnlander, V.; Weigend, A.S. Selecting Input Variables Using Mutual Information and Nonparametric Density Estimation. In Proceedings of the 1994 International Symposium on Artificial Neural Networks (ISANN’94), National Cheng Kung University, Taiwan, China, 28 June–2 July 1994; pp. 42–50. [Google Scholar]
  54. Luna, I.; Soares, S.; Ballini, R. Partial Mutual Information Criterion for Modelling Time Series Via Neural Networks. In Proceedings of the 11th Information Processing and Management of Uncertainty in Knowledge-Based System (IPMU 2006), Université Pierre et Marie Curie, Paris, France, 2–7 July 2006; pp. 2012–2019. [Google Scholar]
  55. Bowden, G.J.; Dandy, G.; Maier, H.R. Input determination for neural network models in water resources applications. Part 1—Background and methodology. J. Hydrol. 2005, 301, 75–92. [Google Scholar] [CrossRef]
  56. Akaho, S. Conditionally independent component analysis for supervised feature extraction. Neurocomputing 2002, 49, 139–150. [Google Scholar] [CrossRef]
  57. Luna, I.; Ballini, R. Top-down strategies based on adaptive fuzzy rule-based systems for daily time series forecasting. Int. J. Forecast. 2011, 27, 708–724. [Google Scholar] [CrossRef]
  58. Sharma, A. Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: Part 1—A strategy for system predictor identification. J. Hydrol. 2000, 239, 232–239. [Google Scholar] [CrossRef]
  59. Specht, D.F. A general regression neural network. IEEE Trans. Neural Netw. 1991, 2, 568–576. [Google Scholar] [CrossRef] [Green Version]
  60. Bowden, G.J. Forecasting Water Resources Variables Using Artificial Neural Networks. Ph.D. Thesis, University of Adelaide, Adelaide, Australia, February 2003. [Google Scholar]
  61. Scott, D.W. Multivariate Density Estimation: Theory, Practice, And Visualization; John Wiley and Sons: New York, NY, USA, 1992; ISBN 978-0-471-54770-9. [Google Scholar]
  62. Che, J.; Yang, Y.; Li, L.; Bai, X.; Zhang, S.; Deng, C.; Fowler, J.E. Maximum relevance minimum common redundancy feature selection for nonlinear data. Inf. Sci. 2017, 68–86. [Google Scholar] [CrossRef]
  63. Kohavi, R.; John, G.H. Wrappers for feature subset selection. Artif. Intell. 1997, 97, 273–324. [Google Scholar] [CrossRef] [Green Version]
  64. McLeod, A.I. Diagnostic checking of periodic autoregression models with application. J. Time Ser. Anal. 1994, 15, 221–233. [Google Scholar] [CrossRef]
  65. Schwarz, G. Estimating the Dimension of a Model. Ann. Stat. 1978, 6, 461–464. [Google Scholar] [CrossRef]
  66. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control. 1974, 19, 716–723. [Google Scholar] [CrossRef]
  67. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95—International Conference on Neural Networks, Institute of Electrical and Electronics Engineers (IEEE), Perth, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948. [Google Scholar]
  68. Santana, C.J.; Macedo, M.; Siqueira, H.V.; Gokhale, A.A.; Bastos-Filho, C.J.A. A novel binary artificial bee colony algorithm. Futur. Gener. Comput. Syst. 2019, 98, 180–196. [Google Scholar] [CrossRef]
  69. Siqueira, H.; Santana, C.; Macedo, M.; Figueiredo, E.; Gokhale, A.; Bastos-Filho, C. Simplified binary cat swarm optimization. Integr. Comput. Eng. 2020, 1–15. [Google Scholar] [CrossRef]
  70. Holland, J.H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, 1st ed.; MIT Press: Cambridge, MA, USA, 1992; ISBN 978-0-262-08213-6. [Google Scholar]
  71. Kennedy, J.; Eberhart, R.C. A discrete binary version of the particle swarm algorithm. In Proceedings of the 1997 IEEE International Conference on Systems, Man, and Cybernetics, Computational Cybernetics and Simulation, Institute of Electrical and Electronics Engineers (IEEE), Orlando, FL, USA, 12–15 October 1997; Volume 5, pp. 4104–4108. [Google Scholar]
  72. EPE—Energy Research Company (in Portuguese: Empresa de Pesquisa Energética). 2019; Brazilian National Energy Balance 2019 (Base Year 2012). Available online: https://www.epe.gov.br/en/publications/publications/brazilian-energy-balance (accessed on 15 May 2020).
  73. Sacchi, R.; Ozturk, M.C.; Principe, J.C.; Carneiro, A.A.F.M.; Da Silva, I.N. Water Inflow Forecasting using the Echo State Network: A Brazilian Case Study. In Proceedings of the 2007 International Joint Conference on Neural Networks, Institute of Electrical and Electronics Engineers (IEEE), Orlando, FL, USA, 12–17 August 2007; pp. 2403–2408. [Google Scholar]
  74. ONS—Electric System Operator—Brazil (in Portuguese: Operador Nacional do Sistema Elétrico). 2020. Dados Hidrológicos/Vazões. Available online: http://www.ons.org.br/Paginas/resultados-da-operacao/historico-da-operacao/dados_hidrologicos_vazoes.aspx (accessed on 1 May 2020).
  75. Vecchia, A.V. Maximum likelihood estimation for periodic autoregressive moving average models. Technometrics 1985, 27, 375–384. [Google Scholar] [CrossRef]
  76. Hipel, K.W.; McLeod, A.I. Time Series Modelling of Water Resources and Environmental Systems, 1st ed.; Elsevier: New York, NY, USA, 1994; ISBN 978-0-444-89270-6. [Google Scholar]
  77. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
  78. Bartlett, P.L. The sample complexity of pattern classification with neural networks: The size of the weights is more important than the size of the network. IEEE Trans. Inf. Theory 1998, 44, 525–536. [Google Scholar] [CrossRef] [Green Version]
  79. Huang, G.; Huang, G.B.; Song, S.; You, K. Trends in extreme learning machines: A review. Neural Netw. 2015, 61, 32–48. [Google Scholar] [CrossRef]
  80. Siqueira, H.; Luna, I. Performance comparison of feedforward neural networks applied to stream flow series forecasting. Math. Eng. Sci. Aerosp. 2019, 10, 41–53. [Google Scholar]
  81. Kowalczyk-Juśko, A.; Pochwatka, P.; Zaborowicz, M.; Czekała, W.; Mazurkiewicz, J.; Mazur, A.; Janczak, D.; Marczuk, A.; Dach, J. Energy value estimation of silages for substrate in biogas plants using an artificial neural network. Energy 2020, 202, 117729. [Google Scholar] [CrossRef]
Figure 1. Block diagram of monthly streamflow series prediction.
Figure 2. Schematic of the filter model for variable selection.
Figure 3. Example of partial autocorrelation function.
Figure 4. Original bi-variable Gaussian function (a) and (b), and the approximation (c) and (d).
Figure 5. Example of mutual information values.
Figure 6. Schematic of the wrapper method.
Figure 7. Error behavior with the progressive selection method.
Figure 8. Best prediction by plant for p = 1.
Table 1. Possible subsets for the input vector V.
Subset | Selected Inputs
1 | v1
2 | v2
3 | v3
4 | v1, v2
5 | v1, v3
6 | v2, v3
7 | v1, v2, v3
Table 2. Mean and standard deviation of each series.
Series | Complete Series Mean (m³/s) | Complete Series S. Deviation (m³/s) | Test Set Mean (m³/s) | Test Set S. Deviation (m³/s)
Furnas | 912.1225 | 613.5036 | 803.6833 | 611.6814
Emborcação | 480.6578 | 360.3957 | 447.7333 | 355.7428
Sobradinho | 2.6062 × 10³ | 1.9412 × 10³ | 1.9607 × 10³ | 1.5001 × 10³
Agua Vermelha | 2.0773 × 10³ | 1.2957 × 10³ | 1.9635 × 10³ | 1.2668 × 10³
Passo Real | 208.6216 | 169.7734 | 228.0083 | 167.1326
Table 3. Results of the variable selection using the AR model.
Series | Variable Selection | Test MSE | Test MAE | Test MSEd | Test MAEd | Training MSE | Training MAE | Training MSEd | Training MAEd
FURNASWRBIC109,014210.770.43300.507896,033196.580.44900.4907
AIC109,014210.770.43300.507896,033196.580.44900.4907
WRAPPER-MSE107,962208.810.42240.499297,037195.550.44590.4859
LfFACPPe107,551209.320.42590.504396,646195.750.44640.4871
FACPPe-Sted.107,551209.320.42590.504396,646195.750.44640.4871
NfMI108,083209.650.42520.501596,702195.430.44520.4856
PMI108,083209.650.42520.501596,702195.430.44520.4856
N-MRMCR-MI108,083209.650.42520.501596,702195.430.44520.4856
MGA107,962208.810.42240.499297,037195.550.44590.4859
PSO107,962208.810.42240.499297,037195.550.44590.4859
EMBORCAÇÃOWRBIC51,745139.360.54870.571640,456119.790.48380.5131
AIC51,745139.360.54870.571640,456119.790.48380.5131
WRAPPER-MSE51,745139.360.54870.571640,456119.790.48380.5131
LfFACPPe50,408138.010.53530.561339,953119.400.47900.5102
FACPPe-Sted.50,408138.010.53530.561339,953119.400.47900.5102
NfMI50,559138.240.53970.560239,880119.200.47680.5081
PMI50,559138.240.53970.560239,880119.200.47680.5081
N-MRMCR-MI50,559138.240.53970.560239,880119.200.47680.5081
MGA51,745139.360.54870.571640,456119.790.48380.5131
PSO51,745139.360.54870.571640,456119.790.48380.5131
SOBRADINHOWRBIC836,738568.040.30710.44081,032,181580.830.38950.4366
AIC836,738568.040.30710.44081,032,181580.830.38950.4366
WRAPPER-MSE836,738568.040.30710.44081,032,181580.830.38950.4366
LfFACPPe836,738568.040.30710.44081,032,181580.830.38950.4366
FACPPe-Sted.863,796577.300.31960.45151,020,492578.130.39100.4403
NfMI828,142566.450.30430.4375994,408573.770.38370.4350
PMI828,142566.450.30430.4375994,408573.770.38370.4350
N-MRMCR-MI828,142566.450.30430.4375994,408573.770.38370.4350
MGA836,738568.040.30710.44081,032,181580.830.38950.4366
PSO836,738568.040.30710.44081,032,181580.830.38950.4366
AGUA VERMELHAWRBIC417,720404.350.40970.4826378,866394.300.40950.4780
AIC417,720404.350.40970.4826378,866394.300.40950.4780
WRAPPER-MSE417,720404.350.40970.4826378,866394.300.40950.4780
LfFACPPe409,613401.000.40520.4803379,329392.980.40780.4752
FACPPe-Sted.409,613401.000.40520.4803379,329392.980.40780.4752
NfMI415,465404.500.40620.4828378,197392.120.40650.4749
PMI413,991403.400.41190.4886369,658393.500.41490.4819
N-MRMCR-MI413,991403.400.41190.4886378,197392.120.40650.4749
MGA415,465404.500.40620.4828378,197392.120.40650.4749
PSO415,465404.500.40620.4828378,197392.120.40650.4749
PASSO REALWRBIC14,99688.650.65700.596916,63786.700.64900.5718
AIC14,99688.650.65700.596916,63786.700.64900.5718
WRAPPER-MSE14,99688.650.65700.596916,63786.700.64900.5718
LfFACPPe14,52387.740.63970.591416,49786.320.64150.5696
FACPPe-Sted.14,52387.740.63970.591416,49786.320.64150.5696
NfMI14,63288.160.64470.595616,47886.080.63980.5676
PMI14,63288.160.64470.595616,47886.080.63980.5676
N-MRMCR-MI14,63288.160.64470.595616,47886.080.63980.5676
MGA14,99688.650.65700.596916,63786.700.64900.5718
PSO14,99688.650.65700.596916,63786.700.64900.5718
Table 4. Results of the variable selection using the PAR model.
Series | Variable Selection | Test MSE | Test MAE | Test MSEd | Test MAEd | Training MSE | Training MAE | Training MSEd | Training MAEd
FURNASWRBIC117,347207.370.40960.481994,879190.770.41240.4725
AIC120,629211.500.43430.495793,403189.090.40410.4677
WRAPPER-MSE128,415215.490.45460.504687,842183.450.38460.4556
LfPACF118,744211.520.41130.4882102,317198.260.44990.4899
PACF-Sted.117,144206.700.40550.478894,824190.410.41120.4709
NfMI120,121211.570.43020.496392,821187.350.39730.4620
PMI122,682215.720.44420.505596,930193.120.45020.4822
N-MRMCR-MI135,260224.640.48110.527195,500187.390.39990.4619
MGA128,680216.090.46010.507387,837183.330.38450.4550
PSO133,171228.390.47940.5279109,291201.150.48560.5007
EMBORCAÇÃOWRBIC46,356128.980.50340.535035,421112.660.43770.4817
AIC46,356129.000.50340.535235,415112.570.43700.4806
WRAPPER-MSE51,529134.640.54600.548933,362109.120.41880.4702
LfPACF50,251134.960.55260.561242,622123.540.55960.5418
PACF-Sted.46,195129.230.50930.544435,710113.810.45350.4914
NfMI47,918130.250.50940.535339,096115.990.45880.4851
PMI48,392129.720.51830.536140,967118.630.49130.5010
N-MRMCR-MI46,088128.220.49820.534335,593113.050.44190.4836
MGA51,529134.640.54600.548933,362109.120.41880.4702
PSO58,768153.160.91610.679548,079136.610.81870.6266
SOBRADINHOWRBIC650,374507.000.27910.4097850,140530.470.35600.4133
AIC642,631496.610.27060.3972847,763528.970.35300.4118
WRAPPER-MSE675,439510.130.30710.4194829,497523.190.34500.4061
LfPACF724,099546.380.33610.44621,021,024577.880.46650.4523
PACF-Sted.666,690513.820.28960.4175886,340540.870.36710.4207
NfMI628,672495.940.29230.4124847,892530.370.35190.4104
PMI665,958511.550.29480.4171886,511544.870.38520.4340
N-MRMCR-MI682,074513.840.30340.4180824,000519.210.33990.4010
MGA675,494510.640.30740.4207829,499523.200.34500.4061
PSO755,372564.360.34020.45091,181,095632.300.52900.4889
AGUA VERMELHAWRBIC438,074401.410.40750.4729357,604380.460.37740.4614
AIC439,586402.880.41610.4769356,951379.090.37420.4587
WRAPPER-MSE473,968409.230.44370.4881335,448366.740.35640.4461
LfPACF469,836407.750.41790.4732404,467393.700.40560.4728
PACF-Sted.432,799393.570.39600.4611359,174382.190.38170.4645
NfMI459,970408.370.43000.4836379,638380.900.38170.4566
PMI443,387399.300.41880.4711360,565384.670.38380.4673
N-MRMCR-MI436,274395.680.40290.4664357,410380.800.37810.4618
MGA476,365409.640.44500.4884335,047366.400.35610.4459
PSO775,418560.050.77220.6972627,743500.780.78430.6385
PASSO REALWRBIC16,79393.490.79450.644915,86485.700.62280.5684
AIC15,60188.500.76640.619815,29984.280.60220.5590
WRAPPER-MSE15,58491.700.74540.639514,77082.820.57970.5474
LfPACF15,92490.390.80310.629615,01284.050.60620.5605
PACF-Sted.15,52289.040.75570.615114,97683.780.60180.5576
NfMI14,98290.730.71520.630016,10785.610.63360.5671
PMI15,28287.410.74230.603816,01385.630.63440.564
N-MRMCR-MI15,47490.550.74020.630614,63882.250.57240.5428
MGA15,77992.120.75510.642414,76982.990.57960.5486
PSO17,895101.470.87740.713022,018103.180.87690.6816
Table 5. Results of the variable selection using the ELM.
Series | Variable Selection | Monthly MSE | Monthly MAE | Monthly MSEd | Monthly MAEd | Annual MSE | Annual MAE | Annual MSEd | Annual MAEd
FURNASWRBIC124,391212.470.43200.4943123,754220.190.45970.5307
AIC123,550210.020.42640.4878126,426219.670.45030.5195
WRAPPER-MSE119,067206.450.40600.4809123,745217.800.44550.5171
LfPACF126,594222.320.46230.5236129,323219.720.44510.5138
PACF-Sted.121,806212.110.44330.4997126,768216.790.45090.5110
NfMI130,637230.240.51150.5656124,105220.330.46650.5279
PMI126,445217.600.47690.5207126,349221.970.46110.5287
N-MRMCR-MI140,451238.150.54240.5842126,507222.670.46500.5330
MGA137,304230.580.50630.5448135,978232.350.49630.5594
PSO132,599228.870.48290.5377131,426226.520.48230.5435
EMBORCAÇÃOWRBIC38,143114.510.44950.495648,513130.030.53330.5586
AIC41,110116.210.46570.494445,459127.260.51580.5501
WRAPPER-MSE37,551118.560.43350.500144,936129.150.51370.5557
LfPACF44,227124.360.51220.535545,994130.150.51530.5563
PACF-Sted.48,543130.420.53950.552644,690129.010.50950.5556
NfMI49,315131.440.56410.566245,169130.890.51260.5565
PMI52,571132.200.57070.556444,094128.540.50610.5511
N-MRMCR-MI47,931129.260.53980.545945,707129.500.51690.5558
MGA54,659142.880.62250.601745,434128.470.51660.5515
PSO50,211130.820.56670.564445,298130.200.51700.5595
SOBRADINHOWRBIC642,185519.690.29540.4329669,441534.370.32090.4632
AIC590,254492.200.29790.4298657,405530.220.31660.4532
WRAPPER-MSE587,680495.730.29450.4250672,783531.900.32290.4567
LfPACF696,480530.920.33760.4506718,719549.400.33660.4676
PACF-Sted.747,932550.630.35610.4591690,307549.760.35100.4872
NfMI692,277533.350.35260.4587668,656531.830.31870.4527
PMI746,916546.820.36470.4621694,596540.000.32340.4549
N-MRMCR-MI828,437582.140.48190.3842710,649542.320.33160.4583
MGA728,468563.810.34000.4674698,381555.960.35010.4869
PSO773,147571.990.36570.4738712,829558.870.36550.4920
AGUA VERMELHAWRBIC408,982384.660.36460.4528443,959394.450.40550.4727
AIC411,485387.400.37010.4583436,790393.100.39460.4661
WRAPPER-MSE374,264375.430.34120.4429436,981394.690.39520.4664
LfPACF436,494412.250.41090.4864439,692401.170.40250.4759
PACF-Sted.419,472401.620.40400.4813458,381406.680.41310.4813
NfMI423,961426.850.45650.5274453,903420.360.42880.5065
PMI434,519410.820.44200.5026432,417394.640.39550.4687
N-MRMCR-MI417,154397.180.40200.4787458,725422.000.43440.5076
MGA502,689437.620.46730.5201449,919405.970.41420.4874
PSO478,617440.210.45260.5271439,377402.750.40220.4806
PASSO REALWRBIC12,76879.700.63820.554915,85989.780.73670.6079
AIC12,96478.810.65920.549015,86689.780.73660.6078
WRAPPER-MSE11,82878.420.60330.548815,43587.800.72770.5962
LfPACF16,28891.410.77720.621815,25786.400.72780.5884
PACF-Sted.16,43589.650.78500.611615,35186.550.73200.5893
NfMI15,05986.490.71960.594416,07489.960.76120.6170
PMI15,40087.350.75170.597916,14691.000.75840.6207
N-MRMCR-MI16,63591.480.77230.627416,20890.100.76830.6168
MGA17,03590.310.81340.614016,25891.150.76260.6206
PSO15,58986.860.73440.596416,27091.040.77310.6217
Table 6. Number of best results by approach.
VS Method | AR Train | AR Test | PAR Train | PAR Test | ELM Annual | ELM Monthly
BIC | 1(+1) | - | - | - | 1(+3) | -
AIC | 1(+1) | - | - | - | 2; 1(+8) | -
WRAPPER-MSE | - | - | 1(+1) | - | 1(+3); 1(+8) | 5
PACF | - | 1(+1); 1(+1); 1(+1); 1(+1) | - | - | 1; 1(+8) | -
PACF-Sted. | - | 1(+1); 1(+1); 1(+1); 1(+1) | - | 2 | 1(+8) | -
MI | 1(+2); 1(+2); 1(+2) | 1(+2) | - | 2 | 1(+3); 1(+8) | -
PMI | 1; 1(+2); 1(+2); 1(+2) | 1(+2) | - | - | 1(+3); 1(+8) | -
N-MRMCR-MI | 1(+2); 1(+2); 1(+2) | 1(+2) | 1 | 1 | 1(+8) | -
GA | - | - | 3; 1(+1) | - | 1(+8) | -
PSO | - | - | - | - | 1(+8) | -
