State-Space Compression for Efficient Policy Learning in Crude Oil Scheduling
Abstract
1. Introduction
- We present VSCS, a novel deep reinforcement learning framework that uses a variational autoencoder to compress the high-dimensional state space of refinery crude oil scheduling into a compact, low-dimensional feature space for policy learning.
- To select the dimensionality of the low-dimensional features, we propose a method that evaluates the similarity between reconstructed and original states; combined with visual analytics, it identifies the appropriate feature dimensionality.
- We validate VSCS through comprehensive experiments on the crude oil scheduling problem, confirming both the framework's efficacy and the suitability of the chosen feature dimensionality.
2. Related Work
3. Problem Formulation
3.1. Description of the Refinery Scheduling Problem
- Within a single cycle, each tank must contain only one type of oil product.
- Communal storage tanks and dock tanks can only commence oil transfer operations after completing static desalting.
- The liquid levels in all storage tanks must be maintained within the specified upper and lower capacity limits.
- The transfer rates must remain within the safe transfer speed range.
- Crude oil transported via overland pipelines enters the factory tanks at a predetermined rate.
- The processing units must operate continuously in accordance with the specified processing schemes and plans.
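The operational constraints above can be read as a feasibility filter on candidate transfer actions. The sketch below illustrates this idea; all field names, tank identifiers, and numeric limits are assumptions for illustration, not values from the paper:

```python
# Illustrative feasibility check for a single proposed transfer action.
# Field names and all numeric limits here are assumed for this sketch.

def is_feasible(action, tank_levels, limits):
    """Return True if a proposed transfer respects the stated constraints."""
    tank = action["tank"]
    rate = action["rate"]        # transfer rate
    volume = action["volume"]    # volume moved this step

    lo, hi = limits["level_bounds"][tank]   # tank capacity limits
    rmin, rmax = limits["rate_bounds"]      # safe transfer speed range

    # Transfer rate must stay within the safe range.
    if not (rmin <= rate <= rmax):
        return False
    # Resulting liquid level must stay within capacity bounds.
    new_level = tank_levels[tank] + volume
    if not (lo <= new_level <= hi):
        return False
    # A tank still in static desalting cannot start a transfer.
    if action.get("desalting", False):
        return False
    return True

limits = {"level_bounds": {"T1": (100.0, 900.0)}, "rate_bounds": (50.0, 300.0)}
print(is_feasible({"tank": "T1", "rate": 120.0, "volume": 200.0}, {"T1": 400.0}, limits))  # True
print(is_feasible({"tank": "T1", "rate": 400.0, "volume": 200.0}, {"T1": 400.0}, limits))  # False
```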
3.2. Markov Modeling
4. The Proposed VSCS Algorithm
4.1. The Framework of VSCS
4.2. Low-Dimensional Feature Generation Module
Algorithm 1: Steps of computation in the low-dimensional feature generation module
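The core of the low-dimensional feature generation module is a standard VAE forward pass: encode the state to a mean and log-variance, sample a latent vector with the reparameterization trick, and decode it back. The sketch below uses randomly initialized weights in place of a trained network; the layer sizes (a 120-dimensional state) are illustrative assumptions, while the 40-dimensional latent matches the best-performing setting reported later:

```python
# Minimal numpy sketch of the VAE feature-generation step:
# encode s -> (mu, logvar), sample z via reparameterization, decode z.
# Weights are random stand-ins for the trained encoder/decoder, and the
# state dimension (120) is an assumption for this sketch.
import numpy as np

rng = np.random.default_rng(0)
state_dim, latent_dim = 120, 40   # 40 matches the best-performing setting

W_enc = rng.normal(0, 0.1, (state_dim, 2 * latent_dim))
W_dec = rng.normal(0, 0.1, (latent_dim, state_dim))

def encode(s):
    h = s @ W_enc
    return h[:latent_dim], h[latent_dim:]          # (mu, logvar)

def reparameterize(mu, logvar):
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps         # z = mu + sigma * eps

def decode(z):
    return z @ W_dec                                # reconstruction

s = rng.normal(size=state_dim)
mu, logvar = encode(s)
z = reparameterize(mu, logvar)
s_hat = decode(z)
print(z.shape, s_hat.shape)   # (40,) (120,)
```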
4.3. Policy Learning Module
Algorithm 2: The proposed VSCS algorithm
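Structurally, the VSCS interaction loop differs from plain SAC in one place: the agent never observes the raw scheduling state, only its compressed encoding. The skeleton below shows that wiring; the environment, encoder, and policy are random stubs standing in for the scheduling simulator, the trained VAE encoder, and the SAC learner:

```python
# Skeleton of the VSCS loop: raw state -> latent features -> action.
# All three components below are stubs, not the paper's implementation.
import random

random.seed(0)
LATENT_DIM = 40

def encode(state):
    # Stub for the trained VAE encoder: here, fixed down-sampling.
    return state[:LATENT_DIM]

def policy(z):
    # Stub for the SAC actor over latent features.
    return random.randrange(4)

def env_step(state, action):
    # Stub transition: returns (next_state, reward, done).
    next_state = [x + random.uniform(-1, 1) for x in state]
    return next_state, -abs(sum(next_state)) * 0.001, False

state = [0.0] * 120
total_reward = 0.0
for t in range(10):
    z = encode(state)      # compress the raw scheduling state
    a = policy(z)          # act on the compressed features only
    state, r, done = env_step(state, a)
    total_reward += r
print(len(encode(state)))   # 40
```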
5. Experiment
- Comparing the proposed VSCS method with baseline algorithms on a real-world refinery crude oil storage and transportation scheduling dataset.
- Analyzing algorithm performance at various compression scales to determine the optimal low-dimensional feature dimensionality.
- Conducting a similarity analysis between reconstructed low-dimensional state features and the original state samples, and proposing a reconstruction-similarity threshold for refinery crude oil scheduling problems.
- Evaluating the proposed algorithm by visualizing the low-dimensional features.
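For the similarity analysis above, one common way to score how faithfully a reconstructed state matches the original is cosine similarity. The paper's exact similarity measure is not reproduced here, so treat this as an illustrative stand-in with made-up vectors:

```python
# Cosine similarity between an original state vector and its VAE
# reconstruction; values near 1.0 indicate a faithful reconstruction.
# The example vectors are invented for illustration.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

original = [1.0, 0.5, -0.2, 0.8]
reconstructed = [0.9, 0.6, -0.1, 0.7]
print(round(cosine_similarity(original, reconstructed), 3))  # 0.991
```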
5.1. Data for Simulator
5.2. Comparison with Baseline Algorithm
5.3. Impact of Reconstruction with Different Compression Sizes
5.4. Reconstructed State Vector Similarity Analysis
5.5. Visual Analysis of Low-Dimensional Features
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Xu, J.; Zhang, S.; Zhang, J.; Wang, S.; Xu, Q. Simultaneous scheduling of front-end crude transfer and refinery processing. Comput. Chem. Eng. 2017, 96, 212–236.
- Jia, Z.; Ierapetritou, M.; Kelly, J.D. Refinery short-term scheduling using continuous time formulation: Crude-oil operations. Ind. Eng. Chem. Res. 2003, 42, 3085–3097.
- Zheng, W.; Gao, X.; Zhu, G.; Zuo, X. Research progress on crude oil operation optimization. CIESC J. 2021, 72, 5481.
- Hamisu, A.A.; Kabantiok, S.; Wang, M. An improved MILP model for scheduling crude oil unloading, storage and processing. In Computer Aided Chemical Engineering; Elsevier: Lappeenranta, Finland, 2013; Volume 32, pp. 631–636.
- Zhang, H.; Liang, Y.; Liao, Q.; Gao, J.; Yan, X.; Zhang, W. Mixed-time mixed-integer linear programming for optimal detailed scheduling of a crude oil port depot. Chem. Eng. Res. Des. 2018, 137, 434–451.
- Furman, K.C.; Jia, Z.; Ierapetritou, M.G. A robust event-based continuous time formulation for tank transfer scheduling. Ind. Eng. Chem. Res. 2007, 46, 9126–9136.
- Li, F.; Qian, F.; Du, W.; Yang, M.; Long, J.; Mahalec, V. Refinery production planning optimization under crude oil quality uncertainty. Comput. Chem. Eng. 2021, 151, 107361.
- Vinyals, O.; Babuschkin, I.; Czarnecki, W.M.; Mathieu, M.; Dudzik, A.; Chung, J.; Choi, D.H.; Powell, R.; Ewalds, T.; Georgiev, P.; et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 2019, 575, 350–354.
- Esteso, A.; Peidro, D.; Mula, J.; Díaz-Madroñero, M. Reinforcement learning applied to production planning and control. Int. J. Prod. Res. 2023, 61, 5772–5789.
- Dong, Y.; Zhang, H.; Wang, C.; Zhou, X. Soft actor-critic DRL algorithm for interval optimal dispatch of integrated energy systems with uncertainty in demand response and renewable energy. Eng. Appl. Artif. Intell. 2024, 127, 107230.
- Kuhnle, A.; Kaiser, J.P.; Theiß, F.; Stricker, N.; Lanza, G. Designing an adaptive production control system using reinforcement learning. J. Intell. Manuf. 2021, 32, 855–876.
- Park, J.; Chun, J.; Kim, S.H.; Kim, Y.; Park, J. Learning to schedule job-shop problems: Representation and policy learning using graph neural network and reinforcement learning. Int. J. Prod. Res. 2021, 59, 3360–3377.
- Yang, X.; Wang, Z.; Zhang, H.; Ma, N.; Yang, N.; Liu, H.; Zhang, H.; Yang, L. A review: Machine learning for combinatorial optimization problems in energy areas. Algorithms 2022, 15, 205.
- Ogunfowora, O.; Najjaran, H. Reinforcement and deep reinforcement learning-based solutions for machine maintenance planning, scheduling policies, and optimization. J. Manuf. Syst. 2023, 70, 244–263.
- Hamisu, A.A.; Kabantiok, S.; Wang, M. Refinery scheduling of crude oil unloading with tank inventory management. Comput. Chem. Eng. 2013, 55, 134–147.
- Shah, N. Mathematical programming techniques for crude oil scheduling. Comput. Chem. Eng. 1996, 20, S1227–S1232.
- Pinto, J.M.; Joly, M.; Moro, L.F.L. Planning and scheduling models for refinery operations. Comput. Chem. Eng. 2000, 24, 2259–2276.
- Zimberg, B.; Ferreira, E.; Camponogara, E. A continuous-time formulation for scheduling crude oil operations in a terminal with a refinery pipeline. Comput. Chem. Eng. 2023, 178, 108354.
- Su, L.; Bernal, D.E.; Grossmann, I.E.; Tang, L. Modeling for integrated refinery planning with crude-oil scheduling. Chem. Eng. Res. Des. 2023, 192, 141–157.
- Castro, P.M.; Grossmann, I.E. Global optimal scheduling of crude oil blending operations with RTN continuous-time and multiparametric disaggregation. Ind. Eng. Chem. Res. 2014, 53, 15127–15145.
- Assis, L.S.; Camponogara, E.; Menezes, B.C.; Grossmann, I.E. An MINLP formulation for integrating the operational management of crude oil supply. Comput. Chem. Eng. 2019, 123, 110–125.
- Assis, L.S.; Camponogara, E.; Grossmann, I.E. A MILP-based clustering strategy for integrating the operational management of crude oil supply. Comput. Chem. Eng. 2021, 145, 107161.
- Zimberg, B.; Camponogara, E.; Ferreira, E. Reception, mixture, and transfer in a crude oil terminal. Comput. Chem. Eng. 2015, 82, 293–302.
- Ramteke, M.; Srinivasan, R. Large-scale refinery crude oil scheduling by integrating graph representation and genetic algorithm. Ind. Eng. Chem. Res. 2012, 51, 5256–5272.
- Hou, Y.; Wu, N.; Zhou, M.; Li, Z. Pareto-optimization for scheduling of crude oil operations in refinery via genetic algorithm. IEEE Trans. Syst. Man Cybern. Syst. 2015, 47, 517–530.
- Hou, Y.; Wu, N.; Li, Z.; Zhang, Y.; Qu, T.; Zhu, Q. Many-objective optimization for scheduling of crude oil operations based on NSGA-III with consideration of energy efficiency. Swarm Evol. Comput. 2020, 57, 100714.
- Ramteke, M.; Srinivasan, R. Integrating graph-based representation and genetic algorithm for large-scale optimization: Refinery crude oil scheduling. In Computer Aided Chemical Engineering; Elsevier: Amsterdam, The Netherlands, 2011; Volume 29, pp. 567–571.
- Badia, A.P.; Piot, B.; Kapturowski, S.; Sprechmann, P.; Vitvitskyi, A.; Guo, Z.D.; Blundell, C. Agent57: Outperforming the Atari human benchmark. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 507–517.
- Hubbs, C.D.; Li, C.; Sahinidis, N.V.; Grossmann, I.E.; Wassick, J.M. A deep reinforcement learning approach for chemical production scheduling. Comput. Chem. Eng. 2020, 141, 106982.
- Gui, Y.; Tang, D.; Zhu, H.; Zhang, Y.; Zhang, Z. Dynamic scheduling for flexible job shop using a deep reinforcement learning approach. Comput. Ind. Eng. 2023, 180, 109255.
- Che, G.; Zhang, Y.; Tang, L.; Zhao, S. A deep reinforcement learning based multi-objective optimization for the scheduling of oxygen production system in integrated iron and steel plants. Appl. Energy 2023, 345, 121332.
- Lee, Y.H.; Lee, S. Deep reinforcement learning based scheduling within production plan in semiconductor fabrication. Expert Syst. Appl. 2022, 191, 116222.
- Yang, F.; Yang, Y.; Ni, S.; Liu, S.; Xu, C.; Chen, D.; Zhang, Q. Single-track railway scheduling with a novel gridworld model and scalable deep reinforcement learning. Transp. Res. Part C Emerg. Technol. 2023, 154, 104237.
- Pan, L.; Cai, Q.; Fang, Z.; Tang, P.; Huang, L. A deep reinforcement learning framework for rebalancing dockless bike sharing systems. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 1393–1400.
- Yan, Q.; Wang, H.; Wu, F. Digital twin-enabled dynamic scheduling with preventive maintenance using a double-layer Q-learning algorithm. Comput. Oper. Res. 2022, 144, 105823.
- Chen, Y.; Liu, Y.; Xiahou, T. A deep reinforcement learning approach to dynamic loading strategy of repairable multistate systems. IEEE Trans. Reliab. 2021, 71, 484–499.
- Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. arXiv 2013, arXiv:1312.6114.
- Zang, W.; Song, D. Energy-saving profile optimization for underwater glider sampling: The soft actor critic method. Measurement 2023, 217, 113008.
- Hussain, A.; Bui, V.H.; Musilek, P. Local demand management of charging stations using vehicle-to-vehicle service: A welfare maximization-based soft actor-critic model. eTransportation 2023, 18, 100280.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv 2018, arXiv:1802.03426.
| Technique | Scale | Performance |
|---|---|---|
| Discrete-time MILP framework [16] | Four crude types, two CDUs, seven refinery tanks, and eight portside tanks; one-month horizon with a one-day discretization interval | A few minutes |
| Continuous- and discrete-time MILP [17] | Three CDUs, six storage tanks, and three oil pipelines; one-day horizon at hourly resolution | Reasonable time |
| Continuous-time MINLP [1] | One docking berth, four storage tanks, four charging tanks, and two CDUs; 15-day horizon | 25.94 s |
| Many-objective optimization based on NSGA-III [26] | Three distillers, nine charging tanks, and a long-distance pipeline; a 10-day schedule is produced each run | About 100–150 s |
| MILP framework with rolling-horizon strategy [23] | Eight tanks (one assumed under maintenance), five crude qualities; 31- or 61-day horizon (periods) | Less than 5 min |
| Model | Number of Neurons | Number of Hidden Layers | Optimizer | Discount Factor | Learning Rate | Soft Update Coefficient | Batch Size | Entropy Threshold | Experience Buffer Size |
|---|---|---|---|---|---|---|---|---|---|
| Policy learning module | 512 | 5 | Adam [40] | 0.99 | 0.03 | 0.005 | 128 | 0.9 | 100,000 |
| Low-dimensional feature generation module | 40 | 1 | Adam [40] | — | — | — | — | — | — |
| Algorithm | Iterations for Maximum Reward | Final Reward | Training Time to Steady State |
|---|---|---|---|
| SAC | 209 | −27,540,217 | 305 |
| VSCS | 47 | −2,942,594 | 78 |
| Improvement Rate (%) | 77.5 | 89.3 | 74.4 |
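The improvement rates in the table above follow directly from the SAC and VSCS values as 1 − VSCS/SAC, expressed as a percentage:

```python
# Recomputing the reported improvement rates from the table's raw values.
def improvement(sac, vscs):
    return round((1 - vscs / sac) * 100, 1)

print(improvement(209, 47))            # 77.5  (iterations to maximum reward)
print(improvement(27540217, 2942594))  # 89.3  (final reward, by magnitude)
print(improvement(305, 78))            # 74.4  (training time to steady state)
```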
| Feature Dimension | Iterations for Steady State | Convergence Speed Improvement Rate | Final Reward | Reward Improvement Rate |
|---|---|---|---|---|
| VSCS (10) | 148 | 29.19% | −4,040,694 | 85.33% |
| VSCS (15) | 215 | −2.87% | −1,980,772 | 92.81% |
| VSCS (20) | 158 | 24.40% | −2,143,448 | 92.22% |
| VSCS (25) | 134 | 35.89% | −3,493,724 | 87.31% |
| VSCS (30) | 147 | 29.67% | −1,940,762 | 92.95% |
| VSCS (35) | 170 | 18.66% | −2,942,594 | 89.32% |
| VSCS (40) | 47 | 77.51% | −2,991,348 | 89.14% |
| VSCS (45) | 87 | 58.37% | −1,941,364 | 92.95% |
| VSCS (50) | 105 | 49.76% | −4,876,383 | 82.29% |
| Dimensionality | 55 | 50 | 45 | 40 | 35 | 30 | 25 | 20 | 15 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Arithmetic Mean | 12.66 | 12.72 | 12.66 | 12.61 | 12.67 | 12.47 | 12.52 | 12.55 | 12.53 | 12.54 |
| Maximum | 146.93 | 149.13 | 141.87 | 145.71 | 149.14 | 146.29 | 141.77 | 136.13 | 134.52 | 139.56 |
| Minimum | 0.23 | 0.27 | 0.33 | 0.44 | 0.38 | 0.56 | 0.61 | 1.17 | 1.36 | 0.59 |
| Variance | 608.91 | 620.99 | 617.28 | 608.06 | 619.23 | 606.81 | 611.70 | 614.83 | 614.93 | 611.88 |
| Standard Deviation | 24.67 | 24.92 | 24.85 | 24.66 | 24.88 | 24.63 | 24.73 | 24.80 | 24.80 | 24.74 |
| Median | 4.19 | 4.17 | 4.04 | 4.06 | 3.99 | 3.88 | 3.90 | 3.79 | 3.73 | 3.86 |
| Metric | 50 | 40 | 30 | 20 | 10 |
|---|---|---|---|---|---|
| Intracluster Cumulative Distance (5, 0.3) | 75.07 | 73.04 | 71.88 | 79.53 | 90.4 |
| Intracluster Cumulative Distance (5, 0.15) | 66.32 | 69.14 | 65.7 | 72.67 | 83.5 |
| Intracluster Cumulative Distance (10, 0.15) | 57.81 | 57.77 | 57.55 | 61.17 | 70.1 |
| Intracluster Cumulative Distance (10, 0.10) | 56.38 | 55.4 | 55.14 | 59.02 | 67.3 |
| Intracluster Cumulative Distance (10, 0.50) | 71.25 | 71.13 | 70.22 | 75.74 | 86.35 |
| Average Intracluster Cumulative Distance | 65.37 | 65.30 | 64.10 | 69.63 | 79.53 |
| Intracluster Density (10, 0.50) | 0.104 | 0.104 | 0.102 | 0.109 | 0.124 |
| Intracluster Density (10, 0.15) | 0.083 | 0.082 | 0.082 | 0.086 | 0.101 |
| Intracluster Density (5, 0.3) | 0.108 | 0.104 | 0.104 | 0.113 | 0.131 |
| Intracluster Density (5, 0.15) | 0.094 | 0.099 | 0.093 | 0.105 | 0.124 |
| Average Intracluster Density | 0.0984 | 0.0984 | 0.0968 | 0.1042 | 0.1208 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ma, N.; Li, H.; Liu, H. State-Space Compression for Efficient Policy Learning in Crude Oil Scheduling. Mathematics 2024, 12, 393. https://doi.org/10.3390/math12030393