1. Introduction
Agricultural datasets are of great importance as they provide information on various variables, such as the occurrence and intensity of rainfall, daily temperature fluctuations, and price variations of commodities. The forecasting of future commodity prices is crucial for all stakeholders in the agricultural supply chain, from farmers/producers to consumers and policymakers. Having proper knowledge about possible future prices can prevent distress sales by farmers and enable them to make better decisions about their farming activities.
Time series modeling is a crucial tool in data analysis and forecasting, as it allows researchers to uncover hidden patterns and trends within the data. Model selection is an important step in time series modeling, as the choice of model can have a significant impact on the accuracy of the forecast. Ultimately, the selection of the appropriate technique depends on the nature of the data and the specific research question being addressed.
The wavelet transformation helps capture minute events in a signal that may not be obvious [1]. It represents localized phenomena at different time scales in the signal. A wavelet representation of a signal can identify frequency content as well as temporal variations [2]. There are two different ways to analyze a signal through wavelet transformation: the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT). CWT operates on a continuous signal, whereas DWT performs decomposition at discrete scales with a dyadic structure. CWT produces a redundant number of subsignals to capture specific details, which makes reconstruction of the original signal quite difficult [3]. Reconstruction is useful for forecasting the original signals [3]. Consequently, DWT is best suited for discrete, multiscale agricultural price datasets, and it produces orthogonal sub-signals that are amenable to further treatment.
Stochastic models, such as the autoregressive integrated moving average (ARIMA) and generalized autoregressive conditional heteroscedastic (GARCH) models, impose several specifications on the data before modeling can be performed [4]. Data-driven machine learning (ML) algorithms, by contrast, need very little human intervention [5]. Several important algorithms are used for forecasting purposes. Multivariate adaptive regression splines (MARS) is a piecewise regression model that can efficiently handle nonlinearity in the data [6]. Principal component regression (PCR) is an artificial intelligence (AI) algorithm based on principal components (PCs), which are linear combinations of the original predictor variables [7,8]. The PCs are mutually orthogonal, which resolves the serious issue of multicollinearity in regression analysis [9,10]. The support vector regression (SVR) algorithm is useful in both classification and regression problems and is based on the risk minimization principle [11,12,13]. Zhang et al. [14] used the linear programming method and proposed a two-phase SVR with multiple kernel functions. This helped them find important features to predict output variables while reducing the computational complexity of solving the underlying convex quadratic programming problem. Random forest (RF) is based on the well-known bagging algorithm, which combines several decision trees to give the final prediction from many input variables [15,16,17,18]. An artificial neural network (ANN) makes use of a three-layered system for classification and regression tasks [19]. Li et al. [20] demonstrated that an induced ordered weighted averaging (IOWA)-optimized neural network (NN) model outperformed other models in predicting a vegetable price series. Zhou et al. [21] discussed tensor principal component analysis (PCA)-based techniques to recover clean data in the presence of noise. Zhao [22] studied a wavelet-based signal processing technique along with an ML model to predict future prices of agricultural products. Paul and Garai [23] predicted tomato prices using wavelet filter-based decomposition in combination with stochastic and ML models, but they did not address which filter-model combination works best or how filters and models relate. Iniyan et al. [24] used ML as a dynamic tool to forecast crop yield. They utilized several ML methods with several variables to help farmers decide which crop to grow and to increase yield.
In the present investigation, onion prices from three major markets in India, namely Bengaluru, Delhi, and Lasalgaon, were used. Onion is the second-most-produced vegetable in India after the potato; as per the 3rd Advance Estimates (2020-21), its production stands at 26.83 million tonnes. The modeling and forecasting of onion price series in various Indian markets has drawn considerable attention from researchers, and many studies on the application of stochastic and ML models are available in the literature [25,26,27,28,29,30]. In this study, the price series were first predicted using stochastic models and ML algorithms. Thereafter, wavelet-decomposed subseries were fed into these models to obtain better results. The efficacy of the prediction results was measured using three commonly used error functions: root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). Several wavelet filters have evolved in the literature for use in different contexts. It is necessary to identify which filter performs particularly well for an individual model so that it can be used efficiently when modeling time series with that model in the future. In this paper, an attempt is made to find the best-performing filter for each method based on several performance metrics.
2. Methodology
2.1. ARIMA
The most popular linear time series model is the autoregressive integrated moving average (ARIMA) model [31]. For a time series $y_t$, the ARMA($p$, $q$) model is presented by Equation (1) or (2):

$y_t = c + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}$ (1)

or

$\phi(B) \, y_t = \theta(B) \, \varepsilon_t$ (2)

where $\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p$ and $\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q$ are the AR and MA polynomials of lag operator $B$ of order $p$ and $q$, respectively, and $\varepsilon_t$ is a white-noise error term.

For a wide class of nonstationary time series, the ARMA model is generalized by incorporating a differencing term. The ARIMA($p$, $d$, $q$) model is defined as:

$\phi(B) \, (1 - B)^d \, y_t = \theta(B) \, \varepsilon_t$ (3)

where $p$, $d$, and $q$ represent the order of autoregression, integration (differencing), and moving average, respectively.
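As a concrete illustration, the AIC-based order selection used later in this paper can be sketched in Python with statsmodels. This is a minimal sketch on a synthetic series, not the paper's actual implementation; the search bounds and the 1-week forecast horizon are illustrative assumptions.

```python
# Minimal ARIMA order search by AIC on a synthetic price-like series.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(42)
prices = np.cumsum(rng.normal(0, 1, 300)) + 100  # synthetic stand-in for a price series

best_aic, best_order = np.inf, None
for p in range(3):
    for q in range(3):
        fit = ARIMA(prices, order=(p, 1, q)).fit()
        if fit.aic < best_aic:
            best_aic, best_order = fit.aic, (p, 1, q)

print(f"Selected ARIMA{best_order}, AIC = {best_aic:.2f}")
forecast = ARIMA(prices, order=best_order).fit().forecast(steps=7)  # 1-week horizon
```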
2.2. GARCH
The ARIMA model is unable to capture the nonlinear structure of a time series. The generalized autoregressive conditional heteroscedastic (GARCH) model was proposed in [32] to capture the conditional heteroscedasticity present in time series data. For a GARCH process, the conditional distribution of the error, $\varepsilon_t$, given the available information $\psi_{t-1}$ up to the $(t-1)$th time epoch, is assumed to follow a normal distribution, i.e., $\varepsilon_t \mid \psi_{t-1} \sim N(0, h_t)$ and $\varepsilon_t = e_t \sqrt{h_t}$. The values of $e_t$ are identically and independently distributed (i.i.d.) innovations with zero mean and unit variance.

Here, the conditional variance $h_t$ of the GARCH($p$, $q$) process is defined as:

$h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i \, \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \, h_{t-j}$ (4)

provided $\alpha_0 > 0$, $\alpha_i \geq 0$, and $\beta_j \geq 0$.

The GARCH($p$, $q$) process is said to be weakly stationary if and only if:

$\sum_{i=1}^{q} \alpha_i + \sum_{j=1}^{p} \beta_j < 1$ (5)
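A GARCH fit of this kind can be sketched with the arch package; this is an assumption about tooling, as the paper does not name its software, and the heavy-tailed synthetic residuals are illustrative only.

```python
# Minimal GARCH(1, 1) fit on stand-in ARIMA residuals using the arch package.
import numpy as np
from arch import arch_model

rng = np.random.default_rng(0)
residuals = rng.standard_t(df=5, size=500)  # heavy-tailed stand-in for ARIMA residuals

garch = arch_model(residuals, mean="Zero", vol="GARCH", p=1, q=1)
result = garch.fit(disp="off")
print(result.params)                           # omega, alpha[1], beta[1]
h_next = result.forecast(horizon=7).variance   # conditional variance, 1 week ahead
```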
2.3. ANN
ANNs are nonlinear, data-driven, and self-adaptive approaches. Like human brains, neural networks also consist of processing units (artificial neurons) and connections (weights) between them.
The output $y$ from a neuron can be expressed as:

$y = f\left( \sum_{i=1}^{n} w_i x_i + b \right)$ (6)

where $x_i$ are the inputs to the network, $w_i$ are the corresponding weights, $b$ is the bias imposed on the output of the neuron, and $f(\cdot)$ is the activation function. The number of hidden layers, the number of neurons in each hidden layer, the learning rate, the activation function, the regularization technique, and the optimization algorithm are its hyperparameters. Manual tuning, grid search, random search, Bayesian optimization, and evolutionary algorithms are some of the methods employed to tune them. Once the system is optimized, it can produce outputs from the supplied inputs.
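A minimal sketch of such a network on lagged series values, assuming scikit-learn; the hidden-layer size, sigmoid activation, and learning rate are illustrative choices, and the synthetic series stands in for a price series.

```python
# Single-hidden-layer ANN on a lagged feature matrix built from one series.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
series = np.sin(np.linspace(0, 20, 400)) + rng.normal(0, 0.1, 400)

lags = 7
X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
y = series[lags:]

ann = MLPRegressor(hidden_layer_sizes=(4,), activation="logistic",
                   alpha=1e-4,                # L2 regularization strength
                   learning_rate_init=0.05, max_iter=2000, random_state=1)
ann.fit(X[:-30], y[:-30])                     # hold out the last 30 points
print("validation R^2:", ann.score(X[-30:], y[-30:]))
```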
2.4. SVR
SVR is useful in both classification and regression studies [11]. It is formulated as Equations (7) and (8), subject to the constraints represented in Equation (9):

$\min_{w, b, \xi_i, \xi_i^*} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$ (7)

$\hat{y} = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) \, K(x_i, x) + b$ (8)

$y_i - w^\top \phi(x_i) - b \leq \epsilon + \xi_i, \quad w^\top \phi(x_i) + b - y_i \leq \epsilon + \xi_i^*, \quad \xi_i, \xi_i^* \geq 0$ (9)

In the above equations, $x_i$ are the input variables; $\hat{y}$ is the predicted value of the output variable; $K(x_i, x) = \phi(x_i)^\top \phi(x)$ is the kernel function; $\alpha_i$ and $\alpha_i^*$ are constants where $0 \leq \alpha_i, \alpha_i^* \leq C$ and $\sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0$; $C$ is the cost factor or regularization parameter; $\xi_i$ and $\xi_i^*$ are slack variables; and epsilon ($\epsilon$) is a constant defining the width of the insensitive tube. The kernel function (linear, polynomial, radial basis function), cost factor, and epsilon are tunable parameters for optimizing the SVR algorithm.
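A minimal epsilon-SVR sketch with a radial kernel, assuming scikit-learn; C, epsilon, and gamma correspond to the tunable parameters named above, and the data are synthetic.

```python
# Epsilon-SVR with an RBF kernel; the number of support vectors is reported,
# as in the per-market model summaries later in the paper.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, 200)

svr = SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma=0.33)
svr.fit(X, y)
print("support vectors used:", len(svr.support_))
print("predictions:", svr.predict(X[:3]))
```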
2.5. RF
RF is a supervised learning algorithm used for both classification and regression. A natural forest consists of trees; similarly, the random forest algorithm develops decision trees from the sample information, obtains predictions from each of them, and finally combines them, by majority voting in classification or by averaging in regression. It is based on the bagging (bootstrap aggregation) technique over decision trees. The correlation between trees is reduced by applying randomization in two ways. Firstly, each tree is trained on a bootstrapped subset of the data. Secondly, the feature by which splitting is performed at each node is not selected from all possible features but only from a random subset of them of size $m$. The algorithm generates all $B$ trees independently, constructing a full binary tree of maximum depth for each candidate tree. The random forest prediction can be estimated using the following formula:

$\hat{y} = \frac{1}{B} \sum_{i=1}^{B} \hat{y}_i$ (10)

Here, $B$ is the total number of candidate regression trees, $\hat{y}_i$ is the prediction from the $i$th tree, and $\hat{y}$ is the final random forest prediction.
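A minimal sketch of Equation (10) via scikit-learn's RandomForestRegressor; the tree count of 500 echoes the setting reported later, while the data and remaining settings are illustrative.

```python
# Random forest regression: B bootstrapped trees averaged, with per-split
# feature subsampling (max_features) providing the second source of randomness.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(0, 0.1, 300)

rf = RandomForestRegressor(n_estimators=500,     # B trees
                           max_features="sqrt",  # random feature subset of size m
                           oob_score=True, random_state=3)
rf.fit(X, y)
print("out-of-bag R^2 (explained variation):", rf.oob_score_)
```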
2.6. SMLR
MLR is formulated as:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$ (11)

where $y$ denotes the response; $x_1, x_2, \dots, x_k$ are the explanatory variables (predictors); and $\beta_0, \beta_1, \dots, \beta_k$ are the constants to be estimated, called regression coefficients. In SMLR, the final model is produced by selectively adding or deleting explanatory variables one at a time and iteratively checking their statistical significance. The algorithm proceeds in a stepwise manner, typically using a combination of forward selection, backward elimination, and/or variable updating. Once the stepwise procedure is complete, the algorithm provides a final model that includes a subset of the available predictor variables. The selected variables are considered the most relevant and statistically significant for predicting the response variable.
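A minimal forward-selection sketch of SMLR using OLS p-values from statsmodels; the 0.05 entry threshold is an assumption, not a value taken from the paper, and the data are synthetic.

```python
# Forward stepwise selection: add the most significant candidate predictor
# until no remaining candidate is significant at the chosen level.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(0, 1, 200)

selected, remaining = [], list(range(X.shape[1]))
while remaining:
    # p-value of each candidate when added to the current model
    pvals = {j: sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit().pvalues[-1]
             for j in remaining}
    best = min(pvals, key=pvals.get)
    if pvals[best] > 0.05:        # stop when no candidate is significant
        break
    selected.append(best)
    remaining.remove(best)

print("selected predictors:", selected)   # expect columns 0 and 2
```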
2.7. MARS
The MARS model uses a series of piecewise linear or nonlinear splines as basis functions (BFs) to mimic the nonlinear relationship between the output and input variables. Each subspace's BF and slope can be altered by moving from one subspace to the adjacent subspace. Knots are the endpoints of each section. The MARS model can be denoted by:

$\hat{y} = c_0 + \sum_{m=1}^{M} c_m \, BF_m(x)$ (12)

where $\hat{y}$ is the expected response, $c_0$ is the bias, and $c_m$ is the unknown coefficient of the weight connecting the $m$th BF to the response. The unknown coefficients are estimated using the least squares method. The spline BF can be expressed as:

$BF_m(x) = \prod_{k=1}^{K_m} \left[ s_{km} \left( x_{v(k,m)} - t_{km} \right) \right]_+$ (13)

where $K_m$ is the number of knots; $s_{km}$ denotes the right or left associated linear step function, which takes values of either +1 or −1; $x_{v(k,m)}$ denotes the input variable $v(k,m)$ at knot $k$; and $t_{km}$ represents the knot location.

An optimal MARS model is created by using a two-stage forward and backward technique. In the forward stage, the data are overfitted by taking into account a large number of BFs. To overcome this, the redundant BFs are eliminated from Equation (12) in the backward stage. The generalized cross-validation (GCV) criterion is used to remove the redundant BFs. The GCV is calculated as:

$GCV = \frac{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}{\left( 1 - \frac{C(M)}{N} \right)^2}$ (14)

where $N$ is the total number of points in the data; $y_i$ is the observed response; $\hat{y}_i$ is the expected response; and $C(M)$ is a complexity penalty that increases with the number of BFs in the model, defined as $C(M) = (M + 1) + dM$. Here, $M$ is the number of BFs in the model, and $d$ denotes the penalty per BF.
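A minimal sketch assuming the py-earth package (installable as sklearn-contrib-py-earth), which the paper does not itself mention; `penalty` plays the role of the per-BF penalty $d$ in the GCV criterion of Equation (14), and all other settings are illustrative.

```python
# MARS fit on a synthetic hinge-shaped response; summary() lists the
# surviving hinge basis functions and their knot locations.
import numpy as np
from pyearth import Earth

rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(300, 2))
y = np.where(X[:, 0] > 5, 2 * (X[:, 0] - 5), 0) + X[:, 1] + rng.normal(0, 0.2, 300)

mars = Earth(max_terms=20, max_degree=1, penalty=3.0)  # forward-stage cap, additive BFs
mars.fit(X, y)
print(mars.summary())
```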
2.8. PCR
PCA is a data dimension reduction technique in which a set of correlated variables is transformed into a set of uncorrelated PCs that retain as much information as the original variables. The PCs are ordered by explained variance, and the first PC can explain most of the variance in the data.
Let $\mathbf{x} = (x_1, x_2, \dots, x_p)'$ be the vector of variables under study and $\boldsymbol{\Sigma}$ be the variance–covariance matrix of the dataset. Let $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_p \geq 0$ be the eigenvalues of $\boldsymbol{\Sigma}$ and $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_p$ be the corresponding eigenvectors. Then, the PCs are defined as $PC_i = \mathbf{e}_i' \mathbf{x}$, subject to the conditions that $\mathbf{e}_i' \mathbf{e}_i = 1$ and $\mathbf{e}_i' \mathbf{e}_j = 0$ for $i \neq j$. The variances of the PCs are the corresponding eigenvalues, i.e., $Var(PC_i) = \lambda_i$. Out of the $p$ PCs, the first few are selected that can explain around 85% to 90% of the total variability. PCR is a statistical technique that combines PCA and linear regression. It is used for handling multicollinearity in regression models and for dealing with high-dimensional data. In PCR, the selected PCs are used as regressors in a linear regression model, and usually the ordinary least squares (OLS) method is applied for estimation. PCR helps to improve the stability and interpretability of the regression model.
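A minimal PCR sketch with scikit-learn, retaining the PCs that explain about 90% of the variance (`PCA(n_components=0.90)`) before OLS regression; the collinear synthetic predictors are illustrative.

```python
# PCR pipeline: standardize, project onto leading PCs, then regress by OLS.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
base = rng.normal(size=(200, 3))
X = np.hstack([base, base + rng.normal(0, 0.05, (200, 3))])  # highly collinear predictors
y = base @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 200)

pcr = make_pipeline(StandardScaler(), PCA(n_components=0.90), LinearRegression())
pcr.fit(X, y)
print("PCs retained:", pcr.named_steps["pca"].n_components_)
print("R^2:", pcr.score(X, y))
```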
2.9. Wavelet
The wavelet transform (WT) uses particular high- and low-pass filters to decompose a signal into several subseries containing information at different resolutions [33]. The number of levels of decomposition ($J$) is fixed according to the number of observations ($N$) in the series ($J = \log_2 N$ when $N$ is a power of 2). Detailed (Equation (15)) and smooth (Equation (16)) coefficients are generated at the first decomposition. In the subsequent steps, decomposition proceeds on the approximate (smooth) series until all levels of decomposition are completed. The detailed coefficients are given by:

$d_{j,k} = \sum_{t} \psi_{j,k}(t) \, y_t$ (15)

The smooth coefficients are given by:

$s_{J,k} = \sum_{t} \phi_{J,k}(t) \, y_t$ (16)

where $\psi$ is the wavelet function and is associated with $\phi$, the scaling function. The wavelet coefficient, or mother wavelet, is represented as $\psi_{j,k}(t) = 2^{-j/2} \, \psi(2^{-j} t - k)$; the father wavelet, or scaling coefficient, is denoted by $\phi_{J,k}(t) = 2^{-J/2} \, \phi(2^{-J} t - k)$; $t$ represents time; and $j$ and $k$ are the scale and translation parameters, respectively.
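A minimal three-level DWT sketch assuming PyWavelets; the paper's filter names follow R's wavelets package, and PyWavelets' 'db2' (4 taps) corresponds to the D4 filter, while BL14 is not available there. The synthetic signal is illustrative.

```python
# Three-level DWT: one smooth (s3) and three detail (d3, d2, d1) coefficient
# vectors, followed by exact reconstruction via the inverse transform.
import numpy as np
import pywt

rng = np.random.default_rng(7)
signal = np.cumsum(rng.normal(0, 1, 512)) + 100

coeffs = pywt.wavedec(signal, wavelet="db2", level=3)   # [s3, d3, d2, d1]
s3, d3, d2, d1 = coeffs
print("coefficient lengths:", [len(c) for c in coeffs])

reconstructed = pywt.waverec(coeffs, wavelet="db2")     # inverse DWT
print("max reconstruction error:", np.max(np.abs(reconstructed - signal)))
```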
2.10. Proposed Methods
The extent of statistical dependencies in the onion price series was assessed using the autocorrelation function (ACF) and partial autocorrelation function (PACF). Lag series were prepared accordingly for all series. The price series were predicted using a suitable ARIMA model selected on the basis of the Akaike information criterion (AIC). The residual series were obtained, and it was found that they were not white noise. The residual series were fitted with the GARCH model to obtain the prediction. Using the prepared data frame of lagged series, the ANN, SVR, RF, SMLR, MARS, and PCR models were trained. The predictions obtained from these individual models were stored to determine the accuracy measures. Wavelet decomposition of the original price series was carried out using the Haar, D4, C6, LA8, and BL14 filters at three levels; three levels of decomposition have been suggested in many studies [23,34,35]. Thereafter, the wavelet-decomposed subseries were fitted one by one into the ARIMA, GARCH, ANN, SVR, and RF models, respectively. Thus, each of these models yielded five predictions, one per wavelet filter. All of them were stored to calculate their prediction performance. A detailed explanation of the work is given in the following steps.
Step 1: The original series was divided into training and validation sets. Two validation sets were prepared, containing the last 1-week and last 1-month data, respectively.
Step 2: The training set was used to train various models by preparing lag series wherever necessary. The obtained predictions were validated through unseen validation sets, and the performance under different scenarios was recorded.
Step 3: Decomposition of the actual series was performed through five wavelet filters, namely Haar, D4, C6, BL14, and LA8.
Step 4: The decomposed series were divided into training and validation sets.
Step 5: The lag series were prepared for fitting the training set into various models (see the lag-preparation sketch after this list).
Step 6: Predictions from the wavelet-based models were obtained and compared with those from the validation sets.
Step 7: According to the results, the best wavelet filter in combination with the best statistical model was determined.
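As referenced in Step 5, the following is a minimal sketch of building a lagged data frame and holding out the last 30 observations (Step 1). The lag order of 7 is illustrative; in the paper, lags were chosen from the ACF/PACF.

```python
# Build a lagged feature frame (lag_1..lag_n as predictors, y as target)
# and split off the final 30 days as the validation set.
import numpy as np
import pandas as pd

def make_lagged_frame(series: pd.Series, n_lags: int) -> pd.DataFrame:
    frame = pd.DataFrame({f"lag_{k}": series.shift(k) for k in range(1, n_lags + 1)})
    frame["y"] = series
    return frame.dropna()

prices = pd.Series(np.cumsum(np.random.default_rng(8).normal(0, 1, 200)) + 100)
data = make_lagged_frame(prices, n_lags=7)
train, valid = data.iloc[:-30], data.iloc[-30:]
print(train.shape, valid.shape)
```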
In this study, the stochastic ARIMA and GARCH models and the ML models ANN, MARS, PCR, RF, SMLR, and SVR were used along with wavelet-based decomposition (Figure 1).
These models are collectively represented as WML in Equation (23). If $y_t$ is the actual series ($t$ represents time), $W$ represents the wavelet transform, and $d_{1,t}$, $d_{2,t}$, $d_{3,t}$, and $s_{3,t}$ are the decomposed series at three levels of decomposition (Equation (17)), then:

$W(y_t) = \{ d_{1,t}, d_{2,t}, d_{3,t}, s_{3,t} \}$ (17)

WARMA represents the wavelet-based ARMA model and is expressed as:

$\phi(B) \, \left( d_{i,t}/s_{3,t} \right) = \theta(B) \, \left( \varepsilon_{i,t}/\varepsilon_{s,t} \right)$ (18)

Here, $i = 1, 2, 3$; and $\varepsilon_{i,t}$ and $\varepsilon_{s,t}$ are the error terms associated with the $d_{i,t}$'s and $s_{3,t}$, respectively. The '/' sign in Equations (18)–(22) has been used to denote 'or'.
The conditional variance of the error terms in Equation (18) can be modeled using the wavelet-based GARCH (WGARCH) model, as in Equation (19):

$h_{i,t}/h_{s,t} = \alpha_0 + \sum_{l=1}^{q} \alpha_l \left( \varepsilon_{i,t-l}^2/\varepsilon_{s,t-l}^2 \right) + \sum_{j=1}^{p} \beta_j \left( h_{i,t-j}/h_{s,t-j} \right)$ (19)
The wavelet-based ANN (WANN) model is represented below:

$\hat{d}_{i,t}/\hat{s}_{3,t} = f\left( \sum_{l=1}^{n} w_l \left( d_{i,t-l}/s_{3,t-l} \right) + b \right)$ (20)

The term $f(\cdot)$ represents the activation function, the terms $d_{i,t-l}/s_{3,t-l}$ represent lags of $d_{i,t}/s_{3,t}$, and $\hat{d}_{i,t}/\hat{s}_{3,t}$ represent the respective predicted values.
The wavelet-based SVR (WSVR) model is given as:

$\hat{d}_{i,t}/\hat{s}_{3,t} = \sum_{l} (\alpha_l - \alpha_l^*) \, K(\mathbf{x}_l, \mathbf{x}) + b$ (21)

Here, $\mathbf{x}$ represents the matrix of lagged values of $d_{i,t}/s_{3,t}$. For implementation of the wavelet-based RF (WRF) model, each $d_{i,t}/s_{3,t}$ is modeled with the RF algorithm using its lagged series as the predictor variables. If $\hat{y}_{b,t}$ is the prediction from the $b$th tree, then:

$\hat{d}_{i,t}/\hat{s}_{3,t} = \frac{1}{B} \sum_{b=1}^{B} \hat{y}_{b,t}$ (22)
The inverse wavelet transform is represented as $W^{-1}$, $g$ is the function (Equations (18)–(22)) used to predict each subseries individually, and $\hat{y}_t$ is the final prediction of the WML model, so:

$\hat{y}_t = W^{-1}\left( g(d_{1,t}), \, g(d_{2,t}), \, g(d_{3,t}), \, g(s_{3,t}) \right)$ (23)
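The WML idea can be sketched under one simplifying assumption: `pywt.mra` (PyWavelets 1.3+) yields additive multiresolution components, so the inverse transform $W^{-1}$ in Equation (23) reduces to summing the per-component predictions. The filter, lag order, and synthetic series are illustrative.

```python
# WRF-style pipeline: decompose, fit one RF per subseries on its own lags,
# then sum the one-step-ahead predictions to recover the original scale.
import numpy as np
import pywt
from sklearn.ensemble import RandomForestRegressor

def lagged(x, n_lags):
    X = np.column_stack([x[i:len(x) - n_lags + i] for i in range(n_lags)])
    return X, x[n_lags:]

rng = np.random.default_rng(9)
price = np.cumsum(rng.normal(0, 1, 512)) + 100

components = pywt.mra(price, wavelet="haar", level=3)  # analogues of d1, d2, d3, s3

final_pred = 0.0
for comp in components:
    X, y = lagged(comp, n_lags=7)
    rf = RandomForestRegressor(n_estimators=500, random_state=9).fit(X, y)
    final_pred += rf.predict(comp[-7:].reshape(1, -1))[0]  # g(.) per subseries

print("one-step-ahead WRF prediction:", final_pred)
```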
The ranks of the individual models were determined based on RMSE, MAE, and MAPE values for each of the markets. Then, the wavelet filter that performed the best for an individual model was determined using similar metrics. For a particular wavelet filter, the ranks of the wavelet-based combination models were obtained based on the performance measures. Based on the results, the best wavelet-ML combination model was declared, and poorly performing models were pointed out for particular markets.
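For reference, the three error functions used for this ranking can be computed as follows; the example prices are illustrative values, not data from the paper.

```python
# RMSE, MAE, and MAPE (in percent), the metrics used to rank all models.
import numpy as np

def rmse(actual, pred):
    return np.sqrt(np.mean((np.asarray(actual) - np.asarray(pred)) ** 2))

def mae(actual, pred):
    return np.mean(np.abs(np.asarray(actual) - np.asarray(pred)))

def mape(actual, pred):
    actual, pred = np.asarray(actual), np.asarray(pred)
    return 100 * np.mean(np.abs((actual - pred) / actual))

actual = np.array([2500.0, 2600.0, 2450.0])   # illustrative prices (Rs./q)
pred = np.array([2480.0, 2630.0, 2500.0])
print(rmse(actual, pred), mae(actual, pred), mape(actual, pred))
```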
3. Data Description
The daily modal prices (Rs./q) of onions for the Bengaluru, Delhi, and Lasalgaon markets were obtained from the Agricultural Marketing Information System (AGMARKNET) website (https://agmarknet.gov.in/, accessed on 28 January 2023) for the time interval of 1 May 2019 to 31 December 2022. The descriptive statistics of these price series are given in Table 1. From this table, it can be seen that the mean prices were in the order of Bengaluru > Lasalgaon > Delhi. The same ordering held for the median and maximum prices, standard deviation (SD), and coefficient of variation (CV) percentage. The minimum price was the same for the Bengaluru and Lasalgaon markets and lower for the Delhi market. All price series were positively skewed and leptokurtic. For skewness and kurtosis, the ordering was Bengaluru > Delhi > Lasalgaon, with the kurtosis of the Bengaluru price series considerably higher than that of the other two markets. The time plots of the price series are shown in Figure 2; all three series followed broadly similar patterns. The highest spike in price occurred in December 2019, and a price spike of lower intensity was noticeable during the last quarter of 2020.
The other properties of the dataset, such as normality, stationarity, and linearity, were also tested. The normality of the price series was tested using the Shapiro–Wilk test [36]. The null hypothesis of this test is that the series follows a normal distribution, and it was seen (Table 2) that none of the price series displayed normality at the 1% level of significance.
The stationarity of the price series was tested using the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test [37] and the Phillips–Perron (PP) test [38]. The KPSS test has a null hypothesis of stationarity (absence of a unit root), whereas the PP test assumes under its null hypothesis that the data have a unit root (non-stationarity). The values of the test statistics for these two tests are given in Table 3; all price series were found to be stationary.
The linearity of the price series was tested using the Broock–Dechert–Scheinkman (BDS) test [39]. Its null hypothesis is that the data in a time series are independently and identically distributed (i.i.d.). The values of the test statistics for two and three embedding dimensions at different values of epsilon are given in Table 4; all were significant at the 1% level of significance. Hence, it can be concluded that the price series are nonlinear.
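This battery of tests can be sketched in Python, assuming scipy, statsmodels, and the arch package (the paper does not state which software it used); the synthetic series stands in for an onion price series.

```python
# Shapiro-Wilk (normality), KPSS and Phillips-Perron (stationarity),
# and BDS (i.i.d./linearity) tests on one series.
import numpy as np
from scipy.stats import shapiro
from statsmodels.tsa.stattools import kpss, bds
from arch.unitroot import PhillipsPerron

rng = np.random.default_rng(10)
price = np.cumsum(rng.normal(0, 1, 400)) + 100

w_stat, w_p = shapiro(price)                                  # H0: normality
kpss_stat, kpss_p, _, _ = kpss(price, regression="c", nlags="auto")  # H0: stationary
pp = PhillipsPerron(price)                                    # H0: unit root
bds_stat, bds_p = bds(price, max_dim=3)                       # H0: i.i.d.

print(f"Shapiro-Wilk p = {w_p:.4f}, KPSS p = {kpss_p:.4f}, "
      f"PP p = {pp.pvalue:.4f}, BDS p (dims 2-3) = {bds_p}")
```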
The autocorrelation function (ACF) and partial autocorrelation function (PACF) of the onion price series are illustrated in Figure 3. Significant values of the ACF and PACF indicate statistical dependencies among the lagged realizations of a time series. The ACF values of the price series of all the markets remained significant even at large lags, decaying hyperbolically; this indicates the presence of long-term persistence (long memory). Significant values of the PACF at different lags were also noticeable.
In this research article, wavelet decomposition of the price series helped to address non-normality, nonlinearity, and the presence of long-term persistence.
5. Results and Discussion
In this study, the onion price series of three markets in India were divided into training and validation sets. The validation set contained 30 observations for each market. Here, the short- (1-week) and long- (1-month) term prediction performances of the models were studied. A total of 33 different models, including wavelet-based combination models, were used to model the series. Among the models used for this purpose, two were stochastic models and six were ML models.
The training process of an ANN involved feeding the training data through the network, adjusting the weights based on the error, and updating the model's hyperparameters (number of hidden layers, number of neurons in each hidden layer, learning rate, activation function, regularization technique, and optimization algorithm). The validation set was used to assess the model's performance and choose the best hyperparameters. There are several methods for tuning the hyperparameters of an ANN, including manual tuning, grid search, random search, Bayesian optimization, and evolutionary algorithms. Significant changes were not noticed during the manual tuning phase with different hyperparameter setups. To keep the model simple but efficient, a single hidden layer with four hidden units, a sigmoid activation function, resilient backpropagation with weighted backtracking as the optimization algorithm, L2 regularization, and a learning rate of 0.05 were set. For the RF-based models, 100, 250, 500, and 1000 trees (bootstrapped subsets) were tried; 500 trees were found to be optimal for all three markets, giving the maximum explained variation in the data. To tune the SVR model, the epsilon values were varied from 0.001 to 0.1, the cost factor from 0.01 to 1, and gamma from 0.1 to 0.5. The optimum combination of hyperparameters was selected based on the minimum MSE value. The optimum hyperparameter values are reported in Tables S1–S4 in the Supplementary Material.
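A hedged sketch of the SVR tuning grid described above, using scikit-learn's GridSearchCV with a time-series-aware split; the grid bounds follow the text, while the feature matrix and split count are illustrative assumptions.

```python
# Grid search over epsilon, cost factor C, and gamma, scored by minimum MSE.
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

rng = np.random.default_rng(11)
X = rng.normal(size=(300, 7))                 # stand-in for a lagged-price matrix
y = X @ rng.normal(size=7) + rng.normal(0, 0.1, 300)

grid = {"epsilon": [0.001, 0.01, 0.1],
        "C": [0.01, 0.1, 1.0],
        "gamma": [0.1, 0.33, 0.5]}
search = GridSearchCV(SVR(kernel="rbf"), grid,
                      scoring="neg_mean_squared_error",
                      cv=TimeSeriesSplit(n_splits=5))
search.fit(X, y)
print("best hyperparameters:", search.best_params_)
```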
In the case of the Bengaluru market, the order of the best ARIMA model was (4, 1, 4) with an AIC value of −6516.45. The most efficient SVR model had a radial kernel, an epsilon value of 0.1, a cost factor of 1, a gamma of 0.33, and 269 support vectors. The RF model with 500 trees explained 95.71% of the variation in the data. In the case of the Delhi market, the order of the best ARIMA model was (1, 1, 0) with an AIC value of −6331.73. The most efficient SVR model had 209 support vectors, with the other parameters remaining the same. The best RF model explained 97.39% of the variation in the Delhi price. In the case of the Lasalgaon market, the order of the best ARIMA model was (0, 1, 1) with an AIC value of −5140.65. The best SVR model used 325 support vectors, with the other parameters remaining the same as those for the Bengaluru market. The best RF model explained 94.97% of the variation in the Lasalgaon price.
Five wavelet filters were used to decompose the three datasets. Each decomposed set was predicted using the ARMA, GARCH, ANN, SVR, and RF models to obtain the final predictions of the original datasets; these are called the wavelet-based combination models. In total, 25 wavelet-based combination models were formulated.
The validation results of the performance metrics for the 1-month and 1-week data for the above-mentioned models for the three markets are provided in Tables 5 and 6, respectively. The MAPE values in the tables are given in percentages. To denote the different wavelet-based combination models, the term 'Wmodel_filter' is used, where 'W' stands for 'wavelet', 'model' is one of ARMA, GARCH, ANN, SVR, or RF, and 'filter' is one of Haar, D4, C6, BL14, or LA8.
To compare the prediction performance of the different methods, Figure 4 depicts the actual versus predicted values obtained using the best prediction model.
There were eight individual models, comprising the stochastic and ML models. Their performance in predicting the onion prices of the mentioned markets was compared, and the ranks of these models are presented in Table 7. PCR was the best-performing model for the Bengaluru and Delhi markets based on the RMSE values, and SVR was the best model for the Lasalgaon market. The ARIMA model was the worst performer for the Bengaluru and Delhi markets, and GARCH did not perform well for the Lasalgaon market. Based on the MAE values, MARS was the best-performing model for the first two markets, and ANN was the best for the last market; ARIMA was the worst performer for the first market, and GARCH was the worst for the remaining two. Based on the MAPE values, however, ANN was the best-performing model for all three markets, with ARIMA and GARCH the worst-performing models for the first market and the last two markets, respectively.
It is also necessary to identify which filter performs particularly well with an individual model. A thorough representation of the ranks of the models with separate filters is provided in Table 8 for predicting the onion prices of the selected markets. The three performance measures indicated that, with the Haar filter, the RF model was the best performer for all markets. With the D4 filter, the best performance was achieved by RF for the Bengaluru market, while GARCH was the best model for the Delhi and Lasalgaon markets. It may be said that the RF and GARCH models performed consistently when used with the C6 filter for the prediction of onion prices. RF, ARMA, and SVR performed best with the BL14 filter for the Bengaluru, Delhi, and Lasalgaon markets, respectively. GARCH and RF were more or less equally efficient with the LA8 filter for the prediction of onion prices in the three markets.
The previous discussion was confined to which model predicts well with a particular wavelet filter. In this section, an attempt is made to identify the best filter to use with an individual model to predict onion prices. From Table 9, it is noticeable that the Haar and D4 filters could be used interchangeably with the stochastic and ML models mentioned above for the best prediction. However, the BL14 and C6 filters should not be used for the decomposition of datasets in these models.
With the filter-wise and model-wise comparisons complete, the best and worst models for the prediction of onion prices could be identified. It was observed from Table 10 that the Haar filter combined with the RF and SVR models was the best performer for all markets. WGARCH_BL14 and WANN_BL14 were the worst-performing models for the prediction of onion prices in the first two markets; here, GARCH was the most poorly performing model. WANN_C6 and WARMA_C6 were the least desirable models for the Lasalgaon market, where the ANN-based combination was the worst.