1. Introduction
Until now, supply chain management (SCM) has primarily been perceived as an issue confined to individual companies. Both large and small disruptions occur routinely at the enterprise level, often being naturally resolved through price adjustments. However, the landscape of supply chain disruptions has undergone a significant transformation, reaching a global scale due to risks such as the obstruction of the Suez Canal, the COVID-19 pandemic, the Russia–Ukraine war, and the U.S.–China trade dispute. This escalation of supply chain disruptions has surpassed the capabilities and scope of individual companies. It is no longer sufficient to rely solely on price adjustments for resolution. Notably, the direct correlation between supply chain risk management for critical goods or strategic materials—integral for national security—and the maintenance of industrial competitiveness, social stability, and diplomatic and security leverage has elevated the issue. Consequently, developing robust response systems and building resilience to supply chain shocks have become imperative tasks, requiring coordinated efforts at both corporate and national levels.
South Korea has been significantly affected by recent global supply chain risks. In particular, securing stable prices and a stable supply of raw materials such as crude oil, for which the country is 100% import-dependent, is important for the South Korean economy. Many studies have analyzed the correlation between the Korean economy and international oil prices. Shin [1] showed, through quantitative analysis, that the impact of international oil prices on the Korean economy continues to grow. Lee [2] analyzed how Korea’s GDP and producer price index change in response to hikes in oil prices and petroleum product prices.
Also, as oil prices have a great impact not only on Korea but also on the world economy, numerous researchers have focused on predicting crude oil prices. Shin et al. [3] proposed a semi-supervised learning method devised for oil price prediction. Mahdian and Khamehchi [4] compared a modified neural network model with a pure neural network in predicting both daily and monthly crude oil prices, demonstrating the superior performance of the modified model, particularly when few input data are available for training or when the variables fluctuate greatly. Xiong et al. [5] indicated that the proposed EMD–SBM–FNN model using the MIMO strategy is the best in terms of prediction accuracy with accredited computational load.
In this research, our objective is to examine whether supply chain-related variables can enhance the forecasting performance of crude oil prices. Furthermore, we propose a novel hybrid factor-based approach to enhance the accuracy of forecasting crude oil prices. Our research contributes to the analysis and forecasting of the crude oil price by exploiting its relationship with global supply chain variables. Variation in the crude oil price is a crucial factor for the economy and society: price instability contributes to problems such as unstable global energy supply and inflation. Therefore, our work aims to provide an analysis framework that predicts the time-varying tendency of the crude oil price in relation to global supply chain pressure. Ultimately, this method aims to help cope with uncertain and unstable supply chain conditions and to enable sustainable supply chain management.
1.1. Types of Global Supply Chain Risk
Many previous studies have explored the types of global supply chain risks and the triggers that cause supply chain disruptions. Yang et al. [6] classified internal risk types into logistical, financial, and information risks, and external risk types into policy, economy, culture, technology, natural disasters, and demand-related risks. Elsewhere, Harland et al. [7], Faisal et al. [8], and Manners-Bell et al. [9] also distinguished between internal and external risks. According to these studies, internal risks refer to cases that are directly related to operations, such as excessive inventory holding, product defects, and production volatility. External risks are those that can affect the supply chain from the outside, such as terrorism, war, piracy, or the global economic crisis. The WEF report in 2012 [10] also divided risks into internal and external risks: internal risks include credit rating, capital flow, intellectual property, asset value, and production quality, while external risks include natural disasters, disputes, brand reputational damage, and asset damage.
Zsidisin and Hendrick [11] classified risks into six areas: transportation, inventory, forecast, information, market, and suppliers. Dae-hyun et al. [12] added global risks to supply, operation, and demand risks. Global risks include innovative technologies, frequent legal and institutional changes, natural disasters, political risks, and strategic risks, as mentioned by Jeongwook [13]. Houlihan and Laurent [14] categorized supply chain risk factors into changes in short-term forecasts, changes in customer preferences, changes in technology, changes in government policies, changes in organizational frameworks, changes in organizational members, and changes in competitive strategies. Cooper and Ellram [15] classified risk factors as inaccurate fluctuations in customer demand, inaccurate supply lead times, partner financial conditions, inaccurate information, shortened product lifecycles, frequent market changes, globalization, intensified competition, and innovative technologies. These were divided into development risks and regulatory changes.
Harland et al. [7] defined four supply chain risk types, including financial loss, material loss, and psychological loss. In addition, Tang and Musa [16] defined risk as any factor that disrupts the supply chain process and grouped risk factors into material flow, financial flow, and information flow. Lin and Zhou [17], Seok-Mo and Choong-Bae [18], and Choong-Bae and Hee-Chan [19] classified the internal environment by supply chain nodes, such as supply and demand, while grouping various risk factors, such as natural disasters, terrorism, international politics, and war, into a single risk factor called ‘external environment.’ In other words, these previous studies focused on risks that can occur within the supply chain while neglecting to classify external risk types in detail.
Based on our analysis of previous studies, risks can be largely divided into external and internal risks, with macro risk factors and micro risk factors. External risks refer to global factors that affect the supply chain, including natural disasters, war and terrorism, political instability, economic downturn, sovereignty risks, and regional instability. Internal risks refer to risks that may occur in relation to any of the activities that a company conducts within its supply chain. Previous studies have identified risk factors by classifying them into different supply chain stages, such as demand, manufacturing, and supply. At the demand stage, risk factors such as inaccurate demand forecasts, rapid demand changes, short product life cycles, competitor movements, and market changes were derived. Risks at the manufacturing stage include technical knowledge, production capacity, product quality, demonstrations, and design changes. In our analysis, we focus particularly on logistics risks that arise from factors external to the supply chain.
1.2. Influential Factors and Models in Crude Oil Price Forecasting
Supply and demand factors have been widely recognized as significant indicators for oil price prediction. Hamilton [20] and Kilian [21] emphasized that oil supply and demand shocks are crucial determinants in explaining oil price shocks. Furthermore, Miao et al. [22] suggested a total of twenty-six determinants for forecasting models of West Texas Intermediate (WTI) crude oil spot prices, grouping them into six categories: supply factors, demand factors, financial factors, commodity market factors, speculative factors, and political factors. With respect to supply and demand factors, they considered factors such as global production, global stock, global export, OPEC surplus, US stock, capacity utilization rate, Baltic Dirty, Kilian index; GDP growth in China, US, and EU; Steel World; global imports of China, US, and EU; and ISM. Despite the significant impact of supply chain disruptions on many economies, few research papers consider supply chain factors as determinants in oil price forecasting models. In this research, we aim to investigate whether supply chain-related variables could enhance forecasting performance.
A variety of models, including statistical and econometric models, artificial intelligence (AI) models, and hybrid models, have been employed to predict crude oil prices. Traditional time series econometric models, such as the autoregressive integrated moving average (ARIMA) model, the generalized autoregressive conditional heteroscedasticity (GARCH) model, the random walk (RW) model, the vector autoregression (VAR) model, and the vector error correction model (VECM), are commonly used for oil price prediction. However, these models often face challenges in handling complexity and nonlinearity. Safari and Davallou [23] noted that time series models might be insufficient to capture the nonlinear features of crude oil prices. To address this limitation, AI models have been increasingly applied to oil price prediction. Azadeh et al. [24] introduced a flexible algorithm based on artificial neural networks (ANNs) and fuzzy regression (FR) to optimize long-term oil price forecasting in noisy, uncertain, and complex environments. Zhao et al. [25] utilized an advanced deep neural network model called stacked denoising autoencoders (SDAE) and an ensemble method named bootstrap aggregation (bagging) to demonstrate superior forecasting ability. Li et al. [26] proposed a novel crude oil price forecasting method based on online media text mining, aiming to capture more immediate market antecedents of price fluctuations.
In addition to individual approaches, there have been endeavors to explore hybrid methods. Safari and Davallou [23] identified three categories of hybrid methods: a combination of soft-computing techniques, a fusion of econometric models, and an amalgamation of soft-computing and econometric methods. They integrated the exponential smoothing model (ESM), the autoregressive integrated moving average (ARIMA) model, and the nonlinear autoregressive (NAR) neural network to enhance the accuracy of forecasting crude oil prices. Zhang et al. [27] introduced a hybrid method to predict crude oil prices, combining the ensemble empirical mode decomposition (EEMD) method, the least squares support vector machine together with the particle swarm optimization (LSSVM–PSO) method, and the generalized autoregressive conditional heteroskedasticity (GARCH) model. Abdollahi [28] constructed a hybrid model incorporating complete ensemble empirical mode decomposition, support vector machine, particle swarm optimization, and Markov-switching generalized autoregressive conditional heteroscedasticity to more effectively capture the nonlinearity and volatility of the time series. Despite numerous studies on developing hybrid models, a consensus on the best-fit model for forecasting oil prices has yet to be reached. In this research, we propose a novel hybrid factor-based approach to enhance the accuracy of forecasting crude oil prices. This will be achieved by comparing time series models and machine learning models based on the encompassing test.
This research aims to investigate whether supply chain-related variables have statistically significant effects on South Korea’s crude oil import price. Additionally, we propose a novel hybrid factor-based approach to forecasting crude oil prices, incorporating supply chain aspects. This involves comparing the forecasting accuracy of traditional time series models and machine learning models. In the following section, we describe the time series models ARIMA, VAR, and VECM and the machine learning models KNN, SVM, and RF. The results are then presented, followed by the discussion and conclusions.
The data are presented in Section 2.1. The main data sources are Petronet, KEEI, and the New York Fed. The time series of the main crude oil price indicators are used to analyze the effect of global supply chain pressure. Supply and demand factors and supply chain factors are elaborated in Section 2.1 and Section 2.2, respectively.
In Section 2.2.1, Section 2.2.2, Section 2.2.3, Section 2.2.4 and Section 2.2.5, the models used to analyze the effect of the global supply chain variables on the crude oil price are presented. Three time series models and three machine learning prediction methods are considered. The target is to estimate ∆lnKprice using the variation in the other variables, such as the supply–demand factors and global supply chain pressure.
In Section 3, the experimental results are presented. The exogeneity test and Johansen’s cointegration rank test are applied to the VAR and VECM models, which are used to forecast lnKprice based on the other indicators. The forecast performance of the time series methods and the machine learning methods is compared under the moving, expanding, and fixed window schemes. The limitations and future work are elaborated in Section 4 and Section 5.
Figure 1 presents the schematic diagram of this research.
3. Results
To investigate whether the supply–demand and supply chain factors are weakly exogenous variables, we conducted a weak exogeneity test using the standard Wald test. The null hypothesis of weak exogeneity was rejected at the 1% level for six of the seven variables and at the 5% level for the remaining one. These results imply that the supply–demand and supply chain factors react to disequilibrium in the long run and could improve the accuracy of predicting the target variable, lnKprice.
Table 5 below shows the results of the weak exogeneity test. In the case of a larger VAR (n > 2), the Granger causality restriction implies a form of weak exogeneity. The results of the Granger causality test based on the VAR and VECM models align with those of the weak exogeneity test. In both models, tests 1, 2, 3, 5, 6, and 7 reject the null hypothesis at the 1% significance level, while test 4 does so at the 10% and 5% levels. For each test, group 1 consists of the single variable associated with that test (one of the seven model variables, for tests 1 through 7), and group 2 comprises the remaining six variables.
Since the variables are non-stationary over time and each has a single unit root, Johansen’s cointegration rank test is conducted to ascertain the presence of a long-run equilibrium relationship between the variables.
Table 6 reports the results of Johansen’s cointegration test. Both the trace and maximum eigenvalue tests fail to reject the null hypothesis of four cointegration vectors at the 5% level. The table below reports the long-run equilibrium relationship in the VECM model, which consists of the long-run parameter and the adjustment coefficient (conventionally denoted $\beta$ and $\alpha$ in the Johansen framework), under a normalization of the cointegrating vector.
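For reference, the roles of the long-run parameter and the adjustment coefficient can be read off the textbook VECM representation. The notation below is an assumption based on the standard Johansen framework (the symbols used in the original tables are not reproduced in this section): $y_t$ is the vector of model variables, $\beta$ holds the cointegrating vectors, $\alpha$ holds the adjustment coefficients, $\Gamma_i$ are short-run coefficient matrices, and $p$ is the lag order of the underlying VAR.

```latex
\Delta y_t = \alpha \beta^{\top} y_{t-1}
           + \sum_{i=1}^{p-1} \Gamma_i \, \Delta y_{t-i}
           + \varepsilon_t ,
\qquad
H_0^{(j)} : \alpha_j = 0
\quad \text{(variable } j \text{ is weakly exogenous)}
```

Under this form, the cointegration rank is the rank of $\alpha \beta^{\top}$, and rejecting $H_0^{(j)}$ means that variable $j$ adjusts to deviations from the long-run equilibrium.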
Based on the results of the weak exogeneity test and Johansen’s cointegration rank test, we constructed the VAR model and the VECM model. Further, we employed the KNN, SVM, and RF models to forecast the target variable, lnKprice. In principle, the three machine learning models considered are supervised models and require sufficiently large training sets. The training set in this research contains one hundred and seventy observations, which is not sufficient to build a machine learning forecast model. Data at a finer time interval might therefore enhance the performance of the models. In addition, the choice of hyperparameters also makes a difference in forecast performance. This research adopts the grid search method, which evaluates the forecast performance of each combination of the model’s hyperparameters. In particular, for the SVM model it is necessary to determine the weight (wSVM) that maximizes the margin of the forecast model, and an optimization method is involved in this step. This research uses the Lagrangian multiplier method to determine the maximum-margin weight, but other optimization methods, such as gradient descent, the genetic algorithm, particle swarm optimization, and simulated annealing, can be applied to optimize the SVM model depending on the characteristics of the dataset.
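The grid search idea can be illustrated with a minimal, dependency-free sketch. It is not the configuration used in this research: for brevity it tunes a small KNN regressor rather than the SVM, and the function names, the grid values, and the synthetic data are all hypothetical. The validation set is taken from the end of the sample so that the time order is respected.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_predict(train_x: Sequence[Point], train_y: Sequence[float],
                query: Point, k: int, weighted: bool) -> float:
    """Predict as the (optionally distance-weighted) mean of the k nearest targets."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    weights = [1.0 / (d + 1e-9) if weighted else 1.0 for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def grid_search(train_x: Sequence[Point], train_y: Sequence[float],
                valid_x: Sequence[Point], valid_y: Sequence[float],
                grid: Dict[str, List]) -> Tuple[float, Dict[str, object]]:
    """Evaluate every hyperparameter combination; keep the lowest validation RMSE."""
    best_score, best_params = math.inf, {}
    for k, weighted in itertools.product(grid["k"], grid["weighted"]):
        preds = [knn_predict(train_x, train_y, q, k, weighted) for q in valid_x]
        score = math.sqrt(
            sum((p - y) ** 2 for p, y in zip(preds, valid_y)) / len(valid_y)
        )
        if score < best_score:
            best_score, best_params = score, {"k": k, "weighted": weighted}
    return best_score, best_params


# Synthetic data (illustrative only): two features and a linear target.
xs = [(float(i), float(i % 4)) for i in range(24)]
ys = [0.5 * a + b for a, b in xs]
grid = {"k": [1, 2, 3, 5], "weighted": [False, True]}  # hypothetical grid

# Chronological split: first 18 observations for training, last 6 for validation.
best_score, best_params = grid_search(xs[:18], ys[:18], xs[18:], ys[18:], grid)
print(best_score, best_params)
```

The same loop applies to any model with a fit-and-predict step; only the hyperparameter names in the grid change.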
We considered three window schemes (moving window, expanding window, and fixed window) to compare the forecasting performance of the resulting models. In the moving window scheme, the model is estimated on 119 monthly observations. We compute the first one-step-ahead forecast using the first 119 observations (from February 2008 to December 2017). For the second forecast, we drop the very first observation (February 2008) and include the 120th observation (January 2018), so the size of the window stays fixed at 119. In the expanding window scheme, we use all the data available at each step to estimate the one-step-ahead forecast. The dataset used to forecast the 120th observation (January 2018) is the same as in the moving window scheme. However, to forecast the 121st observation (February 2018), the expanding window scheme uses all the data from the 1st to the 120th observation (from February 2008 to January 2018), whereas the moving window scheme drops the very first observation (February 2008). In other words, the size of the expanding window increases by one at each step. Lastly, in the fixed window scheme, we use the 119 in-sample observations (from February 2008 to December 2017) to estimate all 50 forecast values (from January 2018 to February 2022).
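The three schemes differ only in which observations are used to estimate the model before each forecast. The following sketch makes this explicit with zero-based indices (index 0 is February 2008, index 119 is January 2018). The function names are hypothetical, and the total of 169 observations is inferred from the 119 in-sample observations plus the 50 forecasts described above.

```python
from typing import Iterator, List, Tuple

# (indices of the training observations, index of the observation to forecast)
Split = Tuple[List[int], int]


def moving_window(n_obs: int, window: int) -> Iterator[Split]:
    """Fixed-size window that slides forward by one observation per forecast."""
    for target in range(window, n_obs):
        yield list(range(target - window, target)), target


def expanding_window(n_obs: int, window: int) -> Iterator[Split]:
    """Window that always starts at the first observation and grows by one."""
    for target in range(window, n_obs):
        yield list(range(0, target)), target


def fixed_window(n_obs: int, window: int) -> Iterator[Split]:
    """The first `window` observations are reused unchanged for every forecast."""
    train = list(range(window))
    for target in range(window, n_obs):
        yield train, target


N_OBS, WINDOW = 169, 119  # 119 in-sample months + 50 forecast months
moving = list(moving_window(N_OBS, WINDOW))
expanding = list(expanding_window(N_OBS, WINDOW))
fixed = list(fixed_window(N_OBS, WINDOW))

# Number of forecasts, then the training size used for the second forecast.
print(len(moving), len(moving[1][0]), len(expanding[1][0]), len(fixed[1][0]))
# prints: 50 119 120 119
```

For the second forecast, the moving window has dropped index 0, the expanding window has grown to 120 observations, and the fixed window is unchanged.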
To assess the predictive performance of the forecast models, we used the root mean square error (RMSE) and the mean absolute percentage error (MAPE) of each forecasting model. According to Table 7, among the time series models, the VAR and VECM models outperform the ARIMA model under the moving window and expanding window schemes. This result implies that the crude oil supply–demand factors and supply chain factors are useful in improving the forecasting performance of the crude oil import price in South Korea, because the VAR and VECM models are suited to forecasting a target that depends on multiple mutually influencing variables. Additionally, the VECM model has the smallest RMSE under the moving window scheme and the smallest RMSE and MAPE under the expanding window scheme. This result implies that the VECM model can be an appropriate method when the time series is long enough to identify the equilibrium among the variables involved. For the fixed window scheme, the table shows that the VAR model outperforms the ARIMA and VECM models. These results suggest that the VAR model is the superior model when the amount of data is relatively limited. Therefore, the VECM and VAR models are recommended in most situations for forecasting the effect of the Global Supply Chain Pressure Index (GSCPI) on crude oil prices.
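As a reminder of how the two accuracy measures are computed, here is a minimal sketch using the standard definitions. The function names and the sample numbers are illustrative and are not taken from the tables in this paper.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean square error: square root of the mean squared forecast error."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


# Illustrative numbers only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print(round(rmse(actual, forecast), 4), round(mape(actual, forecast), 4))
# prints: 8.165 5.0
```

RMSE is expressed in the units of the target and penalizes large errors more heavily, whereas MAPE is scale-free, which is why the two measures can rank models differently.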
A performance comparison is conducted between the deep neural network (DNN) and the machine learning models, as presented in Table 7. The superior forecast performance of the machine learning methods holds under the limited quantity of data. It has been empirically observed that machine learning methods tend to perform better than deep learning on numerical data. For verification, the DNN and the machine learning methods were compared without any special tuning: data from 2008 to 2017 were used for training, and data from 2018 to 2022 were used for testing.
For the machine learning models, Table 8 shows that the random forest model performs well under the expanding window and fixed window schemes. The SVM model has the best performance under the fixed window scheme and is slightly better than KNN under the moving and expanding window schemes. Compared to the KNN model, the SVM and RF models have robust forecast performance when the training dataset is insufficient and unbalanced. The random forest model is itself an ensemble model, so it performs reasonably well when the data are not sufficient for machine learning. Our problem is more explainable with a hyperparameter optimized by the Lagrangian multiplier. This finding aligns with a previous study by Keerthan [41], which also shows the superiority of the SVM forecast model for oil price prediction on a dataset with a structure similar to ours.
Interestingly, under the moving window and expanding window, the time series model seems to have better prediction performance than the machine learning model. However, under the fixed window scheme, the machine learning models outperform the conventional time series models. This could imply that the prediction performance of machine learning and time series models might vary depending on the prediction range.
The results of the sensitivity analysis are shown in Table 9. The window sizes of the fixed, moving, and expanding schemes are varied to analyze the forecast performance of the machine learning methods. Three window sizes, 99, 119, and 139, are tested. The sensitivity results show that the forecast tendency of the machine learning methods is maintained as the training data size changes. Therefore, it can be concluded that this forecast scheme is robust for crude oil price prediction using the relation with the GSCPI.
Based on the preliminary comparison results, we conducted the encompassing test to compare the forecast values of the conventional time series model with those of the machine learning model. The test is based on the following regression:

$$y_t = \beta_1 \hat{y}_t^{TS} + \beta_2 \hat{y}_t^{ML} + \varepsilon_t$$

where $y_t$ is the real value of the crude oil import price in South Korea, $\hat{y}_t^{TS}$ is the forecast value from the time series model, $\hat{y}_t^{ML}$ is the forecast value from the machine learning model, $\beta_i$ is the coefficient of the $i$th forecast, and $\varepsilon_t$ is the error term.
Table 10 indicates that we could reject both hypotheses, namely that either forecast alone encompasses the other, which means that the combined (or weighted) forecasts using both $\hat{y}_t^{TS}$ and $\hat{y}_t^{ML}$ provide a better forecast at the 5% level. That is, for the moving window and expanding window schemes, the combined VECM and SVM forecasts would provide better forecast information. Further, under the fixed window scheme, the combined VAR and SVM prediction values could improve the forecasting accuracy.
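A forecast encompassing regression of this kind can be sketched in a few lines. The version below is an assumption rather than the exact specification used in this research: it regresses the realized values on the two forecasts by ordinary least squares without an intercept and reports the t-statistic of each coefficient against zero. The function names and the numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def encompassing_regression(
    y: Sequence[float], f_ts: Sequence[float], f_ml: Sequence[float]
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """OLS of y on two forecasts (no intercept): y = b1*f_ts + b2*f_ml + e.

    Returns ((b1, b2), (t1, t2)), where t_i is the t-statistic of b_i against zero.
    """
    n = len(y)
    s11 = sum(a * a for a in f_ts)
    s22 = sum(b * b for b in f_ml)
    s12 = sum(a * b for a, b in zip(f_ts, f_ml))
    s1y = sum(a * v for a, v in zip(f_ts, y))
    s2y = sum(b * v for b, v in zip(f_ml, y))
    det = s11 * s22 - s12 ** 2  # determinant of X'X
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    ssr = sum((v - b1 * a - b2 * b) ** 2 for v, a, b in zip(y, f_ts, f_ml))
    sigma2 = ssr / (n - 2)  # residual variance with two estimated coefficients
    se1 = math.sqrt(sigma2 * s22 / det)
    se2 = math.sqrt(sigma2 * s11 / det)
    return (b1, b2), (b1 / se1, b2 / se2)


def combine(f_ts: Sequence[float], f_ml: Sequence[float],
            b1: float, b2: float) -> List[float]:
    """Weighted combination of the two forecasts using the estimated coefficients."""
    return [b1 * a + b2 * b for a, b in zip(f_ts, f_ml)]


# Synthetic numbers for illustration only.
f_ts = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
f_ml = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1.5, 1.8, 3.3, 3.4, 5.5, 5.5]

(b1, b2), (t1, t2) = encompassing_regression(y, f_ts, f_ml)
combined = combine(f_ts, f_ml, b1, b2)
print(round(b1, 4), round(b2, 4))
```

If both t-statistics exceed the chosen critical value, neither forecast encompasses the other, and the weighted combination carries information that each individual forecast lacks.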
4. Discussion
As the importance of supply chain risk management for strategic materials, especially those that are 100% import-dependent, has increased, we incorporated a supply chain-related variable into the crude oil import price forecasting model. This research investigates whether the supply chain factor, represented by the GSCPI, could significantly improve the forecast performance for crude oil import prices in South Korea. We conducted the weak exogeneity test to see if the GSCPI is a weakly exogenous variable. The null hypothesis was rejected at the 1% level, which implies that the GSCPI might improve the accuracy of predicting the crude oil import price. Furthermore, we compared the forecasting performance of the VAR and VECM models, which include the GSCPI, with that of the ARIMA model, and found that the models with the GSCPI outperformed the model without the GSCPI in predicting the crude oil import price in South Korea. Based on these results, we propose that monitoring variables related to supply chain disruptions, such as the GSCPI, could be effective in stabilizing domestic prices and establishing long-term sustainable supply chain or energy policies. For instance, the South Korean government is currently seeking to enact a basic bill to support supply chain stabilization for economic security. This proposed bill includes the selection of economic security items and the operation of an early warning system to proactively identify and respond to supply chain risks. Building on the findings of this research, the authors suggest that early warning systems for crude oil should include monitoring of the GSCPI. Moreover, these implications are applicable to countries facing conditions similar to those of South Korea, particularly those that depend entirely on oil imports.
In this research, a novel hybrid factor-based approach is proposed. We compared the forecasting performance of the time series models ARIMA, VAR, and VECM, as well as the machine learning models KNN, SVM, and RF, using RMSE and MAPE under three different window schemes. As shown in Table 8, for time series models, the VECM model outperforms the ARIMA and VAR models under the moving and expanding window schemes, while the VAR model outperforms the ARIMA and VECM models under the fixed window scheme. For the machine learning models, the SVM model has the smallest RMSE and MAPE under all three window schemes. Based on these preliminary comparison results, we conducted the encompassing test to compare the forecast values of traditional time series models with those of machine learning models. Interestingly, the results of the encompassing test indicated that combining forecasts from time series models and machine learning models provided a better forecast. These findings are consistent with previous research, which showed better prediction performance of proposed hybrid models than their counterparts (Safari and Davallou [23]; Zhang et al. [27]; Abdollahi [28]; He et al. [42]; Ning et al. [43]).
This research is meaningful in that it may serve as a foundation for the development of future oil price prediction models by examining whether supply chain-related variables are important factors in oil price prediction and what methodologies can be applied to enhance the forecasting accuracy of oil prices. Based on this, it is expected that more sophisticated forecasting models can be developed in future studies. In addition, it will be necessary to continue to discover supply chain-related variables such as the GSCPI for oil price prediction.
5. Conclusions
In this research, we aim to offer valuable insights for policymakers tasked with establishing a stable supply and demand strategy for strategic commodities and a national price stability strategy. Given South Korea’s 100% dependence on crude oil imports, forecasting crude oil prices is crucial. This research emphasizes the importance of monitoring supply chain-related variables to enhance the predictive performance of crude oil prices, proposing a hybrid factor-based forecasting approach.
Nevertheless, there are a few considerations to solidify this approach. Firstly, this research evaluated the forecast performance of three representative machine learning-based regression methods to determine whether they could enhance the forecasting performance of the crude oil import price in South Korea. We analyzed the performance of SVM, RF, and KNN under the different window schemes. This research contributes by suggesting appropriate hyperparameters for building the machine learning-based analysis framework. The grid search method was employed to find the optimal values that boost the forecast performance of the machine learning models. In addition, the weight of the SVM is determined by Lagrangian multiplier-based optimization. The machine learning-based estimators show strong forecast performance in the fixed window scheme and in the encompassing experiments.
Meanwhile, further studies can be conducted to optimize the SVM’s parameter selection. Investigating different optimization methods, such as gradient descent, the genetic algorithm, and particle swarm optimization, could help define the weighting factor more effectively. In addition, the forecast performance of the machine learning methods should also be studied when the time interval of the training features is finer than in this research. If the number of training observations is insufficient, then augmentation or replication of the time series data can be examined. Furthermore, other sequence-based estimators, such as the Long Short-Term Memory (LSTM) model, could be helpful for oil price prediction. LSTM is a type of recurrent neural network model that has strong forecast performance on time-varying data.
The crude oil price data are gathered from Petronet, but oil price data can be gathered in identical forms from other sources such as Bloomberg energy and Wall Street Journal market data [44,45]. Global balance, Strategic Reserves, and GSCPI are global indexes, so their data can be obtained in similar forms from various sources (e.g., Bloomberg and the Organization of the Petroleum Exporting Countries (OPEC) [44,46]). The limitation of this paper is that it only targeted the Korean market, which is described by Kprice, Kdemand, and Kstock, so future work can look into applying our model to other markets, for example, Europe, China, and the United States.
Changing the measurement frequency of the original data sources that capture the relation between the crude oil price and the GSCPI could be a valuable topic for future study. If enough data to apply a deep learning model become available, they can be used to predict the future tendency of the crude oil price. Also, recent generative AI foundation models have the potential to yield a general AI model that can answer questions about crude oil price projections given assumptions about the GSCPI and other parameters.
Lastly, as Livieris [47] mentioned, AI methods do not guarantee better forecasting performance in all cases. In this research, we found that traditional econometric models outperformed machine learning methods in forecasting accuracy in the case of one-step-ahead oil price forecasting. While we considered three window schemes (moving, expanding, and fixed window) for the robustness of the estimates, it would be interesting to conduct a forecasting horizon sensitivity test, as the horizon could affect the forecasting performance. Furthermore, it might be meaningful to analyze specific cases where the machine learning method demonstrates higher accuracy. Future research should focus on specific cases that contribute to increasing the prediction accuracy of econometric models, machine learning models, or hybrid models.