Deep Reinforcement Learning for Distributed Flow Shop Scheduling with Flexible Maintenance
Abstract
:1. Introduction
2. Problem Description
3. Solution Approach Design
3.1. Background of General Q-Learning and Deep Q Networks
Algorithm 1 General Q-Learning Algorithm |
arbitrarily for all state-action pairs |
; |
is terminated |
Algorithm 2 Deep Q-Learning Algorithm with Experience Replay | |
Initialize replay memory to capacity | |
Initialize action-value function with random weights | |
Initialize target action-value function with weights | |
For episode = 1: M_ do | |
Initialize sequence and preprocessed sequence | |
For t = 1: T_ do | |
With probability select a random action | |
otherwise select | |
Execute action in emulator and observe reward and image | |
Set and preprocess | |
Store transition in | |
Sample random minibatch of transitions from | |
Set | if episode terminates at step |
otherwise | |
Perform a gradient descent step on with respect to the network parameters | |
Every steps reset | |
End For | |
End for |
3.2. Definition of Key Elements
Algorithm 3 |
based on the above difference list |
from the available action space |
If the system state is terminated, i.e., there are no more jobs that can be assigned to any factory, then |
Else |
, respectively |
then |
End If End If |
3.3. Overall Algorithm Framework
Algorithm 4 Selection of the First Action |
Input (Optional): The Job List which includes the first job of each factory in the previous solution |
] |
For machine in ML do |
If the machine index is equal to 1 then |
let the starting time of the machine is 0 and initialize the time window list using Algorithm 5 |
If the input is empty, i.e., the previous solution is worse than the historical optimal solution, then |
select the first job randomly from the optional job list for the machine |
Else |
select the job of the corresponding machine in the list of entered Job List End If |
Else |
Update the current machine and machine index |
Calculate the starting time of the machine, i.e., the completion time of the previous machine |
Initialize the time window list using Algorithm 5 |
Identify and reserve the selected operation which depends on the last operation of the previous machine End If |
Update the available action space, remaining operation number |
Update the lists of the starting time, processing time, machine’s age, completion time |
Initialize the system state referring to the difference calculation in Algorithm 3 |
Remove the selected operation from job lists for all the factories |
End For |
Reserve the maximum completion time for all the factories |
Output: System state features, available action space |
Algorithm 5 Initialization of Flexible Maintenance Time Windows |
timeWindowList = [ [ ] for machine in ML ] |
Input: |
For index = 1: N + 2 do |
] in the corresponding list in the timeWindowList |
End For |
Output: timeWindowList |
Algorithm 6 Judgement of the Start Time of Actions |
Input: Selected action |
Decode the selected action, i.e., which job is to be assigned to which factory |
For do |
If the machine index is equal to 1 then |
the earliest start time of the action is initialized as the completion time of the last job on this machine |
Else |
the earliest start time depends on the maximum completion time of the previous job as well as operation End If |
Update the available action space, remaining operation number |
Calculate the actual processing time and remove the selected operation from job lists for all the factories |
If the completion time of the previous job does not conflict with the adjacent time window and the insertion of the action still does not conflict with the time window constraint then the starting time of the action is equal to the earliest start time of the action |
Else if the PM is performed immediately after the previous job then |
the starting time of the action is equal to the maximum value of the action’s earliest start time and the completion time of PM |
Else if the completion time of the previous job does not conflict with the adjacent time window and the insertion of the action still does not conflict with the time window constraint then the machine keeps idle until the lower bound of the time window and PM is performed |
Calculate the starting time of the action by evaluating the maximum value of the action’s earliest start time and the completion time of PM End If |
Update the lists of the starting time, processing time, machine’s age, completion time |
Update the system state referring to the difference calculation in Algorithm 3 |
End For |
Reserve the maximum completion time for all the factories |
Output: Start time of the selected action |
Algorithm 7 DQN-Based Solution Framework |
Input: |
For episode = 1: Learning number do |
Select the first action using Algorithm 4 to initialize the system state features |
While True: |
Choose an action based on the observed state features using Algorithm 2 |
Judge the actual start time of the chosen action using Algorithm 6 |
Update new state features and calculate immediate reward using Algorithm 3 |
using Algorithm 2 |
If there are no more jobs that can be assigned to any factory then |
calculate the final reward using Algorithm 3 |
Break End If End While |
End For |
Output: Algorithm runtime, learning curve, detailed optimal scheduling result |
4. Numerical Experiments
4.1. Parameter Settings
4.2. Performance Evaluation of the Developed Algorithm
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Framinan, J.M.; Gupta, J.N.; Leisten, R. A review and classification of heuristics for permutation flow-shop scheduling with makespan objective. Oper. Res. Soc. 2004, 55, 1243–1255. [Google Scholar] [CrossRef]
- Zobolas, G.; Tarantilis, C.D.; Ioannou, G.J.C. Minimizing makespan in permutation flow shop scheduling problems using a hybrid metaheuristic algorithm. Comput. Oper. Res. 2009, 36, 1249–1267. [Google Scholar] [CrossRef]
- Yenisey, M.M.; Yagmahan, B.J.O. Multi-objective permutation flow shop scheduling problem: Literature review, classification and current trends. Omega 2014, 45, 119–135. [Google Scholar] [CrossRef]
- Wu, X.; Che, A.J.O. Energy-efficient no-wait permutation flow shop scheduling by adaptive multi-objective variable neighborhood search. Omega 2020, 94, 102117. [Google Scholar] [CrossRef]
- Jiang, E.D.; Wang, L. An improved multi-objective evolutionary algorithm based on decomposition for energy-efficient permutation flow shop scheduling problem with sequence-dependent setup time. Int. J. Prod. Res. 2019, 57, 1756–1771. [Google Scholar] [CrossRef]
- Mishra, A.; Shrivastava, D.J.C.; Engineering, I. A TLBO and a Jaya heuristics for permutation flow shop scheduling to minimize the sum of inventory holding and batch delay costs. Comput. Ind. Eng. 2018, 124, 509–522. [Google Scholar] [CrossRef]
- Fu, Y.; Wang, H.; Tian, G.; Li, Z.; Hu, H. Two-agent stochastic flow shop deteriorating scheduling via a hybrid multi-objective evolutionary algorithm. J. Intell. Manuf. 2019, 30, 2257–2272. [Google Scholar] [CrossRef]
- Han, Y.; Li, J.; Sang, H.; Liu, Y.; Gao, K.; Pan, Q. Discrete evolutionary multi-objective optimization for energy-efficient blocking flow shop scheduling with setup time. Appl. Soft Comput. 2020, 93, 106343. [Google Scholar] [CrossRef]
- Wang, S.-Y.; Wang, L.; Liu, M.; Xu, Y. An effective estimation of distribution algorithm for solving the distributed permutation flow-shop scheduling problem. Int. J. Prod. Econ. 2013, 145, 387–396. [Google Scholar] [CrossRef]
- Fu, Y.; Hou, Y.; Wang, Z.; Wu, X.; Gao, K.; Wang, L. Distributed scheduling problems in intelligent manufacturing systems. Tsinghua Sci. Technol. 2021, 26, 625–645. [Google Scholar] [CrossRef]
- Wang, J.J.; Wang, L. A knowledge-based cooperative algorithm for energy-efficient scheduling of distributed flow-shop. IEEE Trans. Syst. Man Cybern. Syst. 2018, 50, 1805–1819. [Google Scholar] [CrossRef]
- Wang, G.; Gao, L.; Li, X.; Li, P.; Tasgetiren, M.F. Energy-efficient distributed permutation flow shop scheduling problem using a multi-objective whale swarm algorithm. Swarm Evol. Comput. 2020, 57, 100716. [Google Scholar] [CrossRef]
- Deng, J.; Wang, L. A competitive memetic algorithm for multi-objective distributed permutation flow shop scheduling problem. Swarm Evol. Comput. 2017, 32, 121–131. [Google Scholar] [CrossRef]
- Mao, J.-Y.; Pan, Q.-K.; Miao, Z.-H.; Gao, L. An effective multi-start iterated greedy algorithm to minimize makespan for the distributed permutation flowshop scheduling problem with preventive maintenance. Expert Syst. Appl. 2021, 169, 114495. [Google Scholar] [CrossRef]
- Wang, H.; Yan, Q.; Zhang, S. Integrated scheduling and flexible maintenance in deteriorating multi-state single machine system using a reinforcement learning approach. Adv. Eng. Inform. 2021, 49, 101339. [Google Scholar] [CrossRef]
- Zhao, Z.; Shen, L.; Yang, C.; Wu, W.; Zhang, M.; Huang, G.Q. IoT and digital twin enabled smart tracking for safety management. Comput. Oper. Res. 2021, 128, 105183. [Google Scholar] [CrossRef]
- Chen, J.-S. Scheduling of nonresumable jobs and flexible maintenance activities on a single machine to minimize makespan. Eur. J. Oper. Res. 2008, 190, 90–102. [Google Scholar] [CrossRef]
- Yang, S.-L.; Ma, Y.; Xu, D.-L.; Yang, J.-B. Minimizing total completion time on a single machine with a flexible maintenance activity. Comput. Oper. Res. 2011, 38, 755–770. [Google Scholar] [CrossRef]
- Mosheiov, G.; Sarig, A. Scheduling a maintenance activity to minimize total weighted completion-timeComput. Math. Appl. 2009, 57, 619–623. [Google Scholar]
- Wang, T.; Baldacci, R.; Lim, A.; Hu, Q. A branch-and-price algorithm for scheduling of deteriorating jobs and flexible periodic maintenance on a single machine. Eur. J. Oper. Res. 2018, 271, 826–838. [Google Scholar] [CrossRef]
- Zandieh, M.; Khatami, A.; Rahmati, S.H.A. Flexible job shop scheduling under condition-based maintenance: Improved version of imperialist competitive algorithm. Appl. Soft Comput. 2017, 58, 449–464. [Google Scholar] [CrossRef]
- Rahmati, S.H.A.; Ahmadi, A.; Govindan, K. A novel integrated condition-based maintenance and stochastic flexible job shop scheduling problem: Simulation-based optimization approach. Ann. Oper. Res. 2018, 269, 583–621. [Google Scholar] [CrossRef]
- Ghaleb, M.; Taghipour, S.; Sharifi, M.; Zolfagharinia, H. Integrated production and maintenance scheduling for a single degrading machine with deterioration-based failures. Comput. Ind. Eng. 2020, 143, 106432. [Google Scholar] [CrossRef]
- Chan, F.T.S.; Chung, S.H.; Chan, L.Y.; Finke, G.; Tiwari, M.K. Solving distributed FMS scheduling problems subject to maintenance: Genetic algorithms approach. Robot. Comput. Manuf. 2006, 22, 493–504. [Google Scholar] [CrossRef] [Green Version]
- Chung, S.H.; Chan, F.T.S.; Chan, H.K. A modified genetic algorithm approach for scheduling of perfect maintenance in distributed production scheduling. Eng. Appl. Artif. Intell. 2009, 22, 1005–1014. [Google Scholar] [CrossRef]
- Lei, D.; Liu, M. An artificial bee colony with division for distributed unrelated parallel machine scheduling with preventive maintenance. Comput. Ind. Eng. 2020, 141, 106320. [Google Scholar] [CrossRef]
- Wang, K.; Huang, Y.; Qin, H. A fuzzy logic-based hybrid estimation of distribution algorithm for distributed permutation flowshop scheduling problems under machine breakdown. J. Oper. Res. Soc. 2016, 67, 68–82. [Google Scholar] [CrossRef]
- Miyata, H.H.; Nagano, M.S. Optimizing distributed no-wait flow shop scheduling problem with setup times and maintenance operations via iterated greedy algorithm. J. Manuf. Syst. 2021, 61, 592–612. [Google Scholar] [CrossRef]
- Jafar-Zanjani, H.; Zandieh, M.; Sharifi, M. Robust and resilient joint periodic maintenance planning and scheduling in a multi-factory network under uncertainty: A case study. Reliab. Eng. Syst. Saf. 2022, 217, 108113. [Google Scholar] [CrossRef]
- Wang, Y.; Usher, J.M. Application of reinforcement learning for agent-based production scheduling. Eng. Appl. Artif. Intell. 2005, 18, 73–82. [Google Scholar] [CrossRef]
- Cheng, L.; Tang, Q.; Zhang, L.; Zhang, Z. Multi-objective Q-learning-based hyper-heuristic with Bi-criteria selection for energy-aware mixed shop scheduling. Swarm Evol. Comput. 2022, 69, 100985. [Google Scholar] [CrossRef]
- Lin, J.; Li, Y.-Y.; Song, H.-B. Semiconductor final testing scheduling using Q-learning based hyper-heuristic. Expert Syst. Appl. 2022, 187, 115978. [Google Scholar] [CrossRef]
- Shahmardan, A.; Sajadieh, M.S.J.C.; Engineering, I. Truck scheduling in a multi-door cross-docking center with partial unloading–Reinforcement learning-based simulated annealing approaches. Comput. Ind. Eng. 2020, 139, 106134. [Google Scholar] [CrossRef]
- Long, X.; Zhang, J.; Qi, X.; Xu, W.; Jin, T.; Zhou, K. A self-learning artificial bee colony algorithm based on reinforcement learning for a flexible job-shop scheduling problem. Concurr. Comput. Pract. Exp. 2021, 34, e6658. [Google Scholar] [CrossRef]
- Wang, J.; Lei, D.; Cai, J. An adaptive artificial bee colony with reinforcement learning for distributed three-stage assembly scheduling with maintenance. Appl. Soft Comput. 2021, 117, 108371. [Google Scholar] [CrossRef]
- Luo, S. Dynamic scheduling for flexible job shop with new job insertions by deep reinforcement learning. Appl. Soft Comput. 2020, 91, 106208. [Google Scholar] [CrossRef]
- Wang, H.; Yan, Q.; Wang, J. Blockchain-secured multi-factory production with collaborative maintenance using Q learning-based optimisation approach. Int. J. Prod. Res. 2021, 59, 1–18. [Google Scholar] [CrossRef]
- Naderi, B.; Azab, A. An improved model and novel simulated annealing for distributed job shop problems. Int. J. Adv. Manuf. Technol. 2015, 81, 693–703. [Google Scholar] [CrossRef]
- Liu, C.; Zhang, Y.; Sun, J.; Cui, Z.; Wang, K. Stacked bidirectional LSTM RNN to evaluate the remaining useful life of supercapacitor. Int. J. Energy Res. 2021, 46, 3034–3043. [Google Scholar] [CrossRef]
- Hua, Y.; Wang, N.; Zhao, K. Simultaneous unknown input and state estimation for the linear system with a rank-deficient distribution matrix. Math. Probl. Eng. 2021, 2021, 1–11. [Google Scholar] [CrossRef]
- Gao, J.; Chen, R. A hybrid genetic algorithm for the distributed permutation flowshop scheduling problem. Int. J. Comput. Intell. Syst. 2011, 4, 497–508. [Google Scholar]
Job 1 | Job 2 | Job 3 | Job 4 | Job 5 | Job 6 | Job 7 | Job 8 | Job 9 | Job 10 | |
---|---|---|---|---|---|---|---|---|---|---|
F1M1 | 10 | 9 | 12 | 8 | 9 | 17 | 15 | 9 | 13 | 8 |
F1M2 | 11 | 14 | 12 | 11 | 13 | 14 | 7 | 13 | 7 | 14 |
F1M3 | 10 | 13 | 13 | 12 | 6 | 12 | 15 | 18 | 10 | 15 |
F2M1 | 9 | 8 | 11 | 7 | 10 | 15 | 15 | 7 | 13 | 8 |
F2M2 | 10 | 17 | 10 | 11 | 11 | 15 | 7 | 13 | 7 | 15 |
F2M3 | 13 | 12 | 15 | 13 | 7 | 12 | 15 | 18 | 12 | 16 |
Number of jobs | |
Number of machines | |
Number of factories | |
Processing time of a job | |
Maintenance cycle |
Parameters | Values |
---|---|
Learning number | 5000 |
Replay memory size | 2000 |
Batch size of samples to perform gradient descent | 128 |
Greedy rate | Linearly decreases from 0.5 to 0.1 |
Learning rate | 0.1 |
Discount factor | 0.9 |
Iteration interval of updating the target network | 300 |
DQND | DQNF | GA | IGA | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Mean | Std | Mean | Std | t-Test | Mean | Std | t-Test | Mean | Std | t-Test | |||
2 | 3 | 20 | 178.00 | 2.19 | 183.12 | 1.20 | + | 177.20 | 2.25 | ~ | 175.11 | 1.56 | − |
40 | 322.15 | 1.75 | 340.52 | 2.56 | + | 324.86 | 3.86 | + | 321.89 | 2.49 | ~ | ||
60 | 495.63 | 2.01 | 533.17 | 1.79 | + | 501.89 | 4.79 | + | 498.24 | 3.46 | + | ||
80 | 642.40 | 2.98 | 679.30 | 3.05 | + | 658.45 | 4.35 | + | 650.75 | 2.89 | + | ||
100 | 808.28 | 3.48 | 867.95 | 3.76 | + | 820.14 | 5.80 | + | 812.37 | 4.01 | + | ||
2 | 5 | 20 | 221.29 | 1.76 | 235.24 | 0.99 | + | 219.54 | 1.42 | − | 218.46 | 1.68 | − |
40 | 380.65 | 1.45 | 394.66 | 1.49 | + | 383.27 | 2.43 | + | 384.12 | 1.77 | + | ||
60 | 539.78 | 1.89 | 578.20 | 2.64 | + | 548.31 | 2.87 | + | 545.33 | 2.20 | + | ||
80 | 707.51 | 2.18 | 746.79 | 1.86 | + | 715.24 | 3.58 | + | 709.64 | 3.87 | + | ||
100 | 891.45 | 2.45 | 911.56 | 2.51 | + | 901.70 | 4.76 | + | 896.34 | 4.06 | + | ||
3 | 3 | 20 | 131.02 | 0.86 | 145.28 | 2.01 | + | 129.11 | 1.10 | − | 130.97 | 1.92 | ~ |
40 | 234.64 | 1.64 | 252.69 | 1.94 | + | 233.98 | 2.49 | ~ | 235.88 | 2.04 | + | ||
60 | 344.48 | 1.82 | 356.17 | 2.31 | + | 350.86 | 3.47 | + | 349.18 | 3.28 | + | ||
80 | 455.89 | 2.49 | 470.92 | 1.68 | + | 460.75 | 2.54 | + | 459.66 | 4.08 | + | ||
100 | 554.10 | 3.08 | 581.34 | 1.99 | + | 565.34 | 4.97 | + | 560.02 | 3.67 | + | ||
3 | 5 | 20 | 168.02 | 1.54 | 188.02 | 1.76 | + | 167.20 | 1.16 | − | 166.16 | 1.01 | − |
40 | 281.72 | 1.25 | 294.32 | 0.86 | + | 283.56 | 2.89 | + | 282.14 | 2.04 | + | ||
60 | 390.15 | 1.99 | 410.59 | 2.14 | + | 395.68 | 4.06 | + | 393.54 | 3.47 | + | ||
80 | 495.17 | 2.51 | 512.62 | 3.12 | + | 506.86 | 3.85 | + | 500.02 | 2.88 | + | ||
100 | 601.44 | 1.67 | 624.25 | 2.68 | + | 606.79 | 5.10 | + | 604.29 | 3.79 | + | ||
4 | 3 | 20 | 100.12 | 0.76 | 120.96 | 1.58 | + | 99.59 | 1.29 | − | 99.76 | 0.99 | ~ |
40 | 194.06 | 1.05 | 230.42 | 2.16 | + | 196.13 | 2.16 | + | 194.32 | 1.83 | + | ||
60 | 278.46 | 1.69 | 291.68 | 1.76 | + | 283.59 | 3.47 | + | 280.49 | 2.56 | + | ||
80 | 360.17 | 2.54 | 379.69 | 1.53 | + | 369.96 | 4.63 | + | 365.56 | 3.29 | + | ||
100 | 437.69 | 2.34 | 455.16 | 2.18 | + | 442.77 | 3.57 | + | 439.76 | 4.25 | + | ||
4 | 5 | 20 | 131.35 | 1.86 | 152.30 | 2.34 | + | 131.59 | 2.44 | + | 130.45 | 1.85 | − |
40 | 225.16 | 2.04 | 248.36 | 1.58 | + | 233.75 | 3.86 | + | 229.37 | 2.88 | + | ||
60 | 315.33 | 1.66 | 330.27 | 1.36 | + | 318.62 | 2.76 | + | 317.60 | 3.49 | + | ||
80 | 410.89 | 2.38 | 440.86 | 2.86 | + | 415.63 | 4.35 | + | 415.31 | 2.95 | + | ||
100 | 475.51 | 3.06 | 501.24 | 2.76 | + | 481.21 | 3.86 | + | 479.62 | 4.32 | + |
DQND | DQNF | GA | IGA | |||||
---|---|---|---|---|---|---|---|---|
Mean | Std | Mean | Std | Mean | Std | Mean | Std | |
50 | 322.15 | 1.75 | 340.52 | 2.56 | 324.86 | 3.86 | 321.89 | 2.49 |
60 | 332.71 | 1.46 | 348.85 | 1.95 | 334.22 | 2.89 | 331.89 | 2.43 |
70 | 331.25 | 2.03 | 341.55 | 2.58 | 332.17 | 4.21 | 332.12 | 1.68 |
80 | 335.42 | 1.45 | 349.79 | 1.64 | 334.69 | 2.89 | 336.88 | 3.49 |
90 | 342.56 | 1.67 | 356.34 | 1.35 | 345.86 | 2.64 | 344.64 | 2.34 |
100 | 346.28 | 1.89 | 359.60 | 2.57 | 346.98 | 1.97 | 345.66 | 2.69 |
110 | 361.29 | 2.56 | 386.99 | 2.68 | 365.47 | 3.44 | 363.20 | 3.64 |
120 | 362.75 | 2.35 | 385.42 | 3.04 | 363.70 | 1.66 | 364.18 | 2.34 |
130 | 361.92 | 1.75 | 386.48 | 2.67 | 362.86 | 2.76 | 362.56 | 3.48 |
140 | 366.46 | 2.41 | 397.12 | 1.95 | 369.79 | 3.45 | 365.79 | 2.99 |
150 | 371.11 | 2.68 | 402.31 | 1.86 | 377.64 | 2.86 | 374.55 | 3.47 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yan, Q.; Wu, W.; Wang, H. Deep Reinforcement Learning for Distributed Flow Shop Scheduling with Flexible Maintenance. Machines 2022, 10, 210. https://doi.org/10.3390/machines10030210
Yan Q, Wu W, Wang H. Deep Reinforcement Learning for Distributed Flow Shop Scheduling with Flexible Maintenance. Machines. 2022; 10(3):210. https://doi.org/10.3390/machines10030210
Chicago/Turabian StyleYan, Qi, Wenbin Wu, and Hongfeng Wang. 2022. "Deep Reinforcement Learning for Distributed Flow Shop Scheduling with Flexible Maintenance" Machines 10, no. 3: 210. https://doi.org/10.3390/machines10030210
APA StyleYan, Q., Wu, W., & Wang, H. (2022). Deep Reinforcement Learning for Distributed Flow Shop Scheduling with Flexible Maintenance. Machines, 10(3), 210. https://doi.org/10.3390/machines10030210