Prediction of Gasoline Orders at Gas Stations in South Korea Using VAE-Based Machine Learning Model to Address Data Asymmetry
Abstract
1. Introduction
- We proposed a gasoline order prediction model for gas stations, based on linear regression, to understand trends in gasoline consumption in South Korea.
- We proposed using a Variational Auto-Encoder (VAE) together with the K-means clustering algorithm to address data asymmetry.
  - We augmented the training data with the VAE to obtain a model with high accuracy and good generalization performance.
  - We grouped the data into clusters using K-means clustering and then augmented each cluster's data with the VAE, so that the augmented samples better reflect the characteristics of the original samples.
- We identified the significant independent variables that influence gasoline orders using the variance inflation factor (VIF) and p-values.
- By comparing various regression models, we confirmed that linear regression is the most suitable method for predicting gasoline orders.
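The per-cluster VAE augmentation summarized in the sub-points above can be illustrated with a short NumPy sketch. It is a sketch under stated assumptions, not the authors' implementation: the VAE has a linear Gaussian encoder and a linear decoder with a two-dimensional latent space, it is trained by full-batch gradient descent with hand-derived gradients, and each cluster is assumed to be topped up to a fixed target (85) that includes its original samples. The function names `train_linear_vae`, `augment_cluster`, and `augment_all`, the hyperparameters, and the placeholder data and cluster labels are all hypothetical.

```python
import numpy as np


def train_linear_vae(x, latent_dim=2, lr=0.01, steps=2000, seed=0):
    """Fit a VAE with a linear Gaussian encoder and a linear decoder.

    x is standardized data of shape (n, d); returns the decoder weights.
    Uses full-batch gradient descent with hand-derived gradients.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    we = rng.normal(0.0, 0.1, (d, latent_dim))
    be = np.zeros(latent_dim)
    wv = rng.normal(0.0, 0.1, (d, latent_dim))
    bv = np.zeros(latent_dim)
    wd = rng.normal(0.0, 0.1, (latent_dim, d))
    bd = np.zeros(d)
    for _ in range(steps):
        # Encoder: q(z | x) = N(mu, exp(logvar))
        mu = x @ we + be
        logvar = x @ wv + bv
        std = np.exp(0.5 * logvar)
        eps = rng.standard_normal(mu.shape)
        z = mu + std * eps  # reparameterization trick
        x_hat = z @ wd + bd  # decoder mean
        # Gradients of mean(0.5 * ||x - x_hat||^2 + KL(q(z|x) || N(0, I)))
        g_xhat = (x_hat - x) / n
        g_z = g_xhat @ wd.T
        g_mu = g_z + mu / n
        g_logvar = g_z * eps * 0.5 * std + 0.5 * (std**2 - 1.0) / n
        grads = (x.T @ g_mu, g_mu.sum(0), x.T @ g_logvar, g_logvar.sum(0),
                 z.T @ g_xhat, g_xhat.sum(0))
        for param, grad in zip((we, be, wv, bv, wd, bd), grads):
            param -= lr * grad  # in-place gradient-descent step
    return wd, bd


def augment_cluster(x_c, target, seed=0):
    """Standardize one cluster, fit the VAE, and add synthetic rows up to `target`."""
    mean, sd = x_c.mean(0), x_c.std(0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant features
    wd, bd = train_linear_vae((x_c - mean) / sd, seed=seed)
    n_new = max(target - len(x_c), 0)
    z = np.random.default_rng(seed + 1).standard_normal((n_new, wd.shape[0]))
    synthetic = (z @ wd + bd) * sd + mean  # decode prior samples, undo standardization
    return np.vstack([x_c, synthetic])


def augment_all(x, labels, target=85):
    """Augment every cluster to `target` rows and stack the results."""
    return np.vstack([augment_cluster(x[labels == c], target, seed=int(c))
                      for c in np.unique(labels)])


# Hypothetical usage: 144 placeholder samples with 11 features, assigned to
# 11 clusters round-robin as a stand-in for real K-means labels.
rng = np.random.default_rng(42)
x = rng.normal(size=(144, 11))
labels = np.arange(144) % 11
augmented = augment_all(x, labels, target=85)
print(augmented.shape)  # (935, 11)
```

Standardizing inside each cluster before training, and undoing it after decoding, keeps the synthetic rows on the scale of the cluster they were generated from; a nonlinear encoder and decoder could replace the linear maps without changing the per-cluster loop.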
2. Related Works
2.1. K-Means Clustering
2.2. Variational Auto-Encoder
2.3. Regression
2.4. Ensemble
3. Material and Methods
3.1. Data Collection
3.2. Data Augmentation with Variational Auto-Encoder
- First, we grouped the training data into clusters. The number of clusters, k, was set to 11 using the elbow method of the K-means clustering algorithm (refer to Section 2.1). The elbow method is a visual analysis of how well the data are organized into clusters, and it gives insight into the optimal value of k.
- Figure 3 shows the change in the similarity distance according to the number of clusters (k). At k = 11, the distance decreased dramatically. The resulting clusters contained between 4 and 28 samples each. However, to obtain good performance from the machine learning algorithm, the amount of data in each cluster should preferably be similar.
- The amount of data in each cluster should therefore be evenly distributed. To meet this objective, the data in each cluster were standardized and then augmented with the VAE. The data within each cluster were augmented to 85 samples, giving a total of 935 augmented training samples (11 clusters × 85 samples).
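The clustering step above can be sketched with a plain NumPy implementation of Lloyd's K-means. Assumptions: the distance plotted in the elbow curve is taken to be the within-cluster sum of squared Euclidean distances (inertia), initial centroids are random training samples, and the data are random placeholders for the 144 training samples; `kmeans` and `elbow_curve` are hypothetical helper names. Choosing k from the curve is left to visual inspection, as in the text.

```python
import numpy as np


def kmeans(x, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid (squared Euclidean distance)
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty
        new = np.array([x[labels == j].mean(0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    labels = d2.argmin(1)
    inertia = d2[np.arange(len(x)), labels].sum()  # within-cluster sum of squares
    return labels, centroids, inertia


def elbow_curve(x, k_max, n_init=5):
    """Best-of-n_init inertia for k = 1..k_max, to be inspected for the elbow."""
    return [min(kmeans(x, k, seed=s)[2] for s in range(n_init))
            for k in range(1, k_max + 1)]


rng = np.random.default_rng(0)
x = rng.normal(size=(144, 11))  # placeholder for the 144 training samples
curve = elbow_curve(x, k_max=15)  # plot k against curve to look for the elbow
labels, _, _ = kmeans(x, k=11)
sizes = np.bincount(labels, minlength=11)  # samples per cluster before augmentation
target_per_cluster = 85
total = 11 * target_per_cluster  # 935 augmented training samples
```

Taking the best of several random initializations for each k smooths the curve, because a single unlucky initialization can make the inertia jump upward between neighbouring values of k.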
3.3. Preprocessing and Exploration of Independent Variables
3.4. Modeling
4. Experimental Results
4.1. Evaluation of the Prediction of Gasoline Orders Using Data Augmentation
4.2. Evaluation of the Prediction of Gasoline Orders Using Regression Models
4.3. Linear Regression Equation of Gasoline Orders
4.4. Analysis of Variables Affecting Gasoline Orders with Linear Regression
5. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix B
References
Category | Variables | VIF |
---|---|---|
Climate | Cooling degree day | 1.1 |
Prices | Dubai crude oil prices | 8.7 |
Prices | International gasoline (95RON) prices | 8.8 |
Stocks | FFR (federal funds rate) | 1.4 |
Stocks | USDX (U.S. Dollar Index) | 4.4 |
Stocks | S&P 500 | 4.0 |
Economy | PPI (producer price index) fluctuation rate | 1.8 |
Economy | NSI (news sentiment index) | 2.1 |
Policy | GPR (geopolitical risk index) | 1.8 |
Policy | Fuel tax | 1.7 |
Management | Gasoline inventory at the gas station | 1.2 |
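The VIF of a variable measures how well it is explained by the other independent variables: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on all the others with an intercept. A minimal NumPy sketch of that definition follows; the `vif` helper and the synthetic data are hypothetical, and the software used to compute the table is not stated here.

```python
import numpy as np


def vif(x):
    """Variance inflation factor of each column of x (n_samples, n_features)."""
    n, p = x.shape
    out = np.empty(p)
    for j in range(p):
        y = x[:, j]
        # Regress column j on all other columns plus an intercept
        others = np.column_stack([np.ones(n), np.delete(x, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out[j] = 1.0 / (1.0 - r2)
    return out


# Synthetic example: the first two columns are nearly collinear, the third is independent
rng = np.random.default_rng(0)
a = rng.normal(size=200)
x = np.column_stack([a, a + 0.1 * rng.normal(size=200), rng.normal(size=200)])
v = vif(x)
print(v.round(1))  # large values for the first two columns, close to 1 for the third
```

A pair of closely related variables, such as the two price series in the table, inflates each other's VIF in the same way as the first two synthetic columns here.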
Category | Variables | Point |
---|---|---|
Climate | Cooling degree day | |
Prices | Dubai crude oil prices | |
Prices | International gasoline (95RON) prices | |
Stocks | FFR | |
Stocks | USDX | |
Stocks | S&P 500 | |
Economy | PPI fluctuation rate | |
Economy | NSI | |
Policy | GPR | |
Policy | Fuel tax | |
Management | Gasoline inventory at the gas station | |
Method | Augmented per Cluster | Number of Training Sets | R-Squared (Training) | R-Squared (Test) | RMSE (Training) | RMSE (Test) | Accuracy (Training) | Accuracy (Test) |
---|---|---|---|---|---|---|---|---|
Without Augmentation | - | 144 | 0.7441 | 0.7162 | 0.4898 | 0.5827 | 86.26% | 84.63% |
VAE | - | 935 | 0.7490 | 0.7204 | 0.1931 | 0.5598 | 86.54% | 84.88% |
K-means Clustering + VAE | 80 | 880 | 0.7842 | 0.7827 | 0.2666 | 0.4359 | 88.55% | 88.47% |
K-means Clustering + VAE | 85 | 935 | 0.7862 | 0.7858 | 0.2614 | 0.4328 | 88.67% | 88.65% |
K-means Clustering + VAE | 90 | 990 | 0.7892 | 0.7831 | 0.2569 | 0.4355 | 88.84% | 88.49% |
K-means Clustering + VAE | 95 | 1045 | 0.7924 | 0.7759 | 0.2513 | 0.4416 | 89.02% | 88.14% |
K-means Clustering + VAE | 100 | 1100 | 0.7953 | 0.7722 | 0.2464 | 0.4463 | 89.18% | 87.88% |
K-means Clustering + VAE | 110 | 1210 | 0.8016 | 0.7706 | 0.2381 | 0.4479 | 89.53% | 87.78% |
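The R-squared and RMSE columns are assumed to follow the standard definitions below, where y_i is the observed gasoline order, ŷ_i the prediction, ȳ the mean of the observations, and n the number of samples in the training or test set; the Accuracy column is not covered by these formulas.

$$
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
$$

With this definition, R² = 0 corresponds to always predicting the mean, and a negative value, such as the test R² of the most heavily regularized models in the regularization comparison, indicates a fit worse than that baseline.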
Regression Model | Regularization Weight | L1:L2 | R-Squared (Training) | R-Squared (Test) | RMSE (Training) | RMSE (Test) | Accuracy (Training) | Accuracy (Test) |
---|---|---|---|---|---|---|---|---|
Linear | - | - | 0.7862 | 0.7858 | 0.2614 | 0.4328 | 88.67% | 88.65% |
Ridge | 0.01 | - | 0.7862 | 0.7810 | 0.2613 | 0.4376 | 88.66% | 88.37% |
Ridge | 0.1 | - | 0.7849 | 0.7816 | 0.2622 | 0.4370 | 88.60% | 88.40% |
Ridge | 1 | - | 0.7885 | 0.7834 | 0.2602 | 0.4352 | 88.80% | 88.51% |
Ridge | 10 | - | 0.7873 | 0.7827 | 0.2611 | 0.4359 | 88.73% | 88.47% |
Lasso | 0.01 | - | 0.7764 | 0.7756 | 0.2673 | 0.4429 | 88.11% | 88.07% |
Lasso | 0.1 | - | 0.6962 | 0.6851 | 0.3117 | 0.5247 | 83.44% | 82.77% |
Lasso | 1 | - | 0.0000 | −0.0525 | 0.5665 | 0.9593 | 0.00% | 0.00% |
Lasso | 10 | - | 0.0000 | −0.0515 | 0.5637 | 0.9588 | 0.00% | 0.00% |
Elastic-Net | 0.01 | 30%:70% | 0.7823 | 0.7799 | 0.2642 | 0.4387 | 88.45% | 88.31% |
Elastic-Net | 0.01 | 50%:50% | 0.7844 | 0.7818 | 0.2628 | 0.4367 | 88.57% | 88.42% |
Elastic-Net | 0.01 | 70%:30% | 0.7815 | 0.7814 | 0.2644 | 0.4372 | 88.41% | 88.40% |
Elastic-Net | 0.1 | 30%:70% | 0.7551 | 0.7549 | 0.2804 | 0.4629 | 86.89% | 86.89% |
Elastic-Net | 0.1 | 50%:50% | 0.7367 | 0.7338 | 0.2899 | 0.4825 | 85.83% | 85.66% |
Elastic-Net | 0.1 | 70%:30% | 0.7209 | 0.7086 | 0.2993 | 0.5048 | 84.90% | 84.18% |
Elastic-Net | 1 | 30%:70% | 0.2773 | 0.2665 | 0.4809 | 0.8008 | 52.66% | 51.63% |
Elastic-Net | 1 | 50%:50% | 0.0000 | −0.0524 | 0.5655 | 0.9593 | 0.00% | 0.00% |
Elastic-Net | 1 | 70%:30% | 0.0000 | −0.0535 | 0.5649 | 0.9598 | 0.00% | 0.00% |
Elastic-Net | 10 | 30%:70% | 0.0000 | −0.0525 | 0.5649 | 0.9593 | 0.00% | 0.00% |
Elastic-Net | 10 | 50%:50% | 0.0000 | −0.0525 | 0.5646 | 0.9593 | 0.00% | 0.00% |
Elastic-Net | 10 | 70%:30% | 0.0000 | −0.0519 | 0.5657 | 0.9590 | 0.00% | 0.00% |
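One plausible explanation for the rows with a training R-squared of exactly 0 can be checked numerically. Assume the scikit-learn parameterization, in which the objective is (1/2n)||y - Xw||^2 + alpha * rho * ||w||_1 + (alpha * (1 - rho) / 2) * ||w||^2, with alpha the regularization weight and rho the L1 share of the L1:L2 ratio, and assume standardized features and target. Then every coefficient is exactly zero once alpha >= max_j |x_j^T y| / (n * rho), and for standardized data max_j |x_j^T y| / n is the largest absolute correlation, which cannot exceed 1. The coordinate-descent sketch below (hypothetical helpers `elastic_net_cd` and `alpha_max`, synthetic data) shows Lasso collapsing to the mean prediction at alpha = 1. Under the same assumptions, this would also be consistent with Elastic-Net at weight 1 keeping nonzero coefficients only for the 30%:70% ratio, whose zeroing threshold is the largest.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * |v|."""
    return np.sign(v) * max(abs(v) - t, 0.0)


def elastic_net_cd(x, y, alpha, l1_ratio=1.0, n_iter=500):
    """Coordinate descent for (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1
    + 0.5*alpha*(1 - l1_ratio)*||w||^2. Assumes centered x and y (no intercept)."""
    n, p = x.shape
    w = np.zeros(p)
    col_sq = (x ** 2).sum(0) / n
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - x @ w + x[:, j] * w[j]  # partial residual without feature j
            rho = x[:, j] @ r_j / n
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
    return w


def alpha_max(x, y, l1_ratio=1.0):
    """Smallest alpha at which every coefficient is exactly zero."""
    return np.max(np.abs(x.T @ y)) / (len(y) * l1_ratio)


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))
y = x @ np.array([0.6, 0.3, 0.0, 0.0, 0.2]) + 0.5 * rng.normal(size=200)
x = (x - x.mean(0)) / x.std(0)  # standardize features
y = (y - y.mean()) / y.std()  # standardize target
print(alpha_max(x, y))  # largest |correlation|, at most 1
print(elastic_net_cd(x, y, alpha=1.0))  # every coefficient is zero: training R^2 = 0
print(elastic_net_cd(x, y, alpha=0.01))  # informative features keep nonzero weights
```

Ridge has no such threshold, because its L2 penalty shrinks coefficients toward zero without setting them exactly to zero, which matches the stable Ridge rows in the table.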
Regression Model | R-Squared (Training) | R-Squared (Test) | RMSE (Training) | RMSE (Test) | Accuracy (Training) | Accuracy (Test) |
---|---|---|---|---|---|---|
Linear | 0.7862 | 0.7858 | 0.2614 | 0.4328 | 88.67% | 88.65% |
AdaBoost | 0.8117 | 0.6531 | 0.2452 | 0.5507 | 90.10% | 80.82% |
Extra Trees | 0.7632 | 0.6158 | 0.2753 | 0.5796 | 87.36% | 78.48% |
Random Forest | 0.8382 | 0.5511 | 0.2274 | 0.6265 | 91.55% | 74.23% |
XGBoost | 0.9823 | 0.6969 | 0.0750 | 0.5148 | 99.11% | 83.48% |
Variables | p-Value | Coefficient |
---|---|---|
Cooling degree day | 0.000 | 0.1345 |
Dubai crude oil prices | 0.000 | 0.1802 |
International gasoline (95RON) prices | 0.000 | −0.1370 |
FFR | 0.048 | −0.0204 |
USDX | 0.000 | 0.2235 |
S&P 500 | 0.000 | 0.2824 |
PPI fluctuation rate | 0.043 | 0.0232 |
NSI | 0.000 | 0.0714 |
GPR | 0.000 | −0.0542 |
Fuel tax | 0.002 | −0.0341 |
Gasoline inventory at the gas station | 0.000 | −0.0747 |
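Collecting the coefficients in the table gives the fitted linear model below; all eleven p-values are below 0.05. The intercept β₀ is not reported in the table and is left symbolic, the subscripts are abbreviations introduced here for compactness (CDD: cooling degree day; Dubai: Dubai crude oil prices; 95RON: international gasoline prices; Tax: fuel tax; Inv: gasoline inventory at the gas station), and whether each variable enters on the raw or standardized scale is not stated in this section.

$$
\begin{aligned}
\widehat{\text{Order}} = \beta_0 &+ 0.1345\,x_{\text{CDD}} + 0.1802\,x_{\text{Dubai}} - 0.1370\,x_{\text{95RON}} - 0.0204\,x_{\text{FFR}} \\
&+ 0.2235\,x_{\text{USDX}} + 0.2824\,x_{\text{S\&P 500}} + 0.0232\,x_{\text{PPI}} + 0.0714\,x_{\text{NSI}} \\
&- 0.0542\,x_{\text{GPR}} - 0.0341\,x_{\text{Tax}} - 0.0747\,x_{\text{Inv}}
\end{aligned}
$$

Read this way, the S&P 500, USDX, and Dubai crude oil prices carry the largest positive coefficients, while international gasoline prices and station inventory carry the largest negative ones.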
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yoon, S.; Park, M. Prediction of Gasoline Orders at Gas Stations in South Korea Using VAE-Based Machine Learning Model to Address Data Asymmetry. Appl. Sci. 2023, 13, 11124. https://doi.org/10.3390/app132011124
Chicago/Turabian Style: Yoon, Sungyeon, and Minseo Park. 2023. "Prediction of Gasoline Orders at Gas Stations in South Korea Using VAE-Based Machine Learning Model to Address Data Asymmetry" Applied Sciences 13, no. 20: 11124. https://doi.org/10.3390/app132011124