Article

An End-to-End Deep Learning Method for Dynamic Job Shop Scheduling Problem

1 School of Intelligent Systems Science and Engineering, Jinan University, Zhuhai 519070, China
2 Institute of Physical Internet, Jinan University, Zhuhai 519070, China
* Author to whom correspondence should be addressed.
Machines 2022, 10(7), 573; https://doi.org/10.3390/machines10070573
Submission received: 13 June 2022 / Revised: 3 July 2022 / Accepted: 14 July 2022 / Published: 16 July 2022
(This article belongs to the Section Machines Testing and Maintenance)

Abstract

The job shop scheduling problem (JSSP) is essential in production, as effective scheduling can significantly improve production efficiency. Dynamic events such as machine breakdown and job rework occur frequently in smart manufacturing, so methods for the dynamic job shop scheduling problem (DJSSP) are urgently needed. Existing rule-based and meta-heuristic methods cannot cope with dynamic events in DJSSPs of different sizes in real time. This paper proposes an end-to-end transformer-based deep learning method named the spatial pyramid pooling-based transformer (SPP-Transformer), which shows strong generalizability and can be applied to DJSSPs of different sizes. Its feature extraction module extracts the production environment features, which are further compressed into fixed-length vectors by a feature compression module; an action selection module then selects a simple priority rule in real time. The experimental results show that the makespan of SPP-Transformer is 11.67% smaller than the average makespan of dispatching rules, meta-heuristic methods, and RL methods, proving that SPP-Transformer realizes effective dynamic scheduling without training different models for different DJSSPs. To the best of our knowledge, SPP-Transformer is the first application of an end-to-end transformer to DJSSP; it not only improves the productivity of industrial scheduling but also provides a paradigm for future research on deep learning in DJSSP.

1. Introduction

Smart manufacturing is an emerging concept accompanying the development of Internet of Things technology [1], and scheduling plays a vital role in it, since effective scheduling can improve profitability and resource utilization in production [2]. The job shop scheduling problem (JSSP) is a classical scheduling problem in manufacturing and an NP-hard combinatorial optimization problem [3] that aims to find the optimal production schedule given a set of jobs, each with multiple operations processed on different machines. Most JSSP methods assume that the production environment is static and that its states are known in advance. However, these methods have to reschedule the jobs when dynamic events such as machine breakdown and the insertion of new jobs occur, since such events invalidate the initial scheduling plans. The extension of JSSP that considers such dynamic events is called the dynamic job shop scheduling problem (DJSSP); solving it effectively can substantially boost productivity in actual manufacturing.
DJSSP has been widely studied, and various methods have been proposed. Dispatching rules are preferred in production due to their simplicity, ease of implementation, and low computational complexity [4]. However, the real-time selection of dispatching rules is crucial, because the performance of a given rule varies significantly across different DJSSPs. Simple priority rules are a kind of dispatching rule that uses a single job or shop feature to prioritise jobs and processes the job with the highest priority first. Human operators select simple priority rules based on their own experience, which does not guarantee that the selected rule is optimal at each step; moreover, selecting simple priority rules manually is intractable for human operators and fails to realize real-time scheduling. Using only one simple priority rule at every step is also irrational, since DJSSP often involves multiple objectives that a single rule cannot satisfy. Thus, two types of composite dispatching rules have been proposed, which combine the good features of different simple priority rules [5]: one deploys different simple priority rules on different machines; the other combines different simple priority rules into a new dispatching rule using operators such as addition, subtraction, multiplication, and division. However, designing a stable composite dispatching rule is hard, since the way the simple priority rules are combined significantly affects the performance of the resulting rule [6].
Meta-heuristic algorithms have proven competitive in solving DJSSP. The most widely used in DJSSP are evolutionary algorithms (EAs) and swarm-based algorithms. EAs mimic the mechanisms of biological evolution to generate better solutions; examples include the genetic algorithm (GA) [7] and differential evolution (DE) [8]. Swarm-based algorithms, in contrast, mimic the behaviour of animal groups such as fish, birds, and wolf packs, achieving optimization through simple and limited information exchanges between individuals; examples include the grey wolf optimizer (GWO) [9], ant colony optimization [10], and particle swarm optimization [11]. Although meta-heuristic algorithms can yield high-quality solutions, they often require massive numbers of iterations to reach an optimal or near-optimal solution, and their parameters are set empirically. Furthermore, meta-heuristic algorithms need to reschedule the jobs whenever dynamic events occur in the production environment, and such rescheduling is impractical in real time.
Reinforcement learning (RL) has been considered a powerful method for solving combinatorial optimization problems [12] and has succeeded in various fields: AlphaGo defeated one of the top human Go players [13], and AlphaFold predicts protein structures with great accuracy [14]. The main component in RL is the agent, which takes actions to maximize the cumulative reward through interactions with the environment, much like a human being. Owing to its fast computation and its ability to cope with dynamic events [15], RL has achieved outstanding success in solving DJSSP. Q-learning is a classic RL algorithm that chooses the action with the highest Q-value stored in a Q table; it not only improves multi-objective performance in JSSP but also responds effectively to dynamic events [16]. Since the Q table may become too large as the DJSSP size increases, occupying massive memory and lengthening computation time, deep reinforcement learning has been proposed; a representative algorithm is the Deep Q-Network (DQN), which outperforms popular dispatching rules in solving DJSSP [17]. Other RL methods such as proximal policy optimization (PPO) [18] and deep deterministic policy gradient [19] have also been applied to DJSSP with great success. However, researchers have found experimentally that the optimal policy obtained by an RL agent is deterministic while unseen instances are only partially observable, so RL models tend to generalize poorly [20]: a model trained on instances of a fixed size cannot cope with instances of different sizes.
Deep learning is a kind of representation-learning method in artificial intelligence; its deeper network structure helps it extract more accurate feature representations and map them to a particular output. Deep learning methods can generalize well to unseen problems, and some researchers have successfully used deep learning to solve DJSSP. Weckman et al. [21] combine a multi-layer artificial neural network (ANN) with the Giffler–Thompson (GT) algorithm to build a neural network scheduler that can schedule operations in real time. The ANN takes real-time features of the production environment as input and outputs the priority of each operation; the priorities are further processed by the GT algorithm to obtain a scheduling plan that can deal with dynamic events. Sim et al. [22] further optimize the neural network scheduler: observing that the ANN may output identical priorities for different operations on the same machine, causing those operations to be scheduled arbitrarily, they propose using dispatching rules to schedule operations with identical priorities. However, the GT algorithm is still needed to process the priorities of the other operations, so the method is not end-to-end. Zang et al. [23] design a hybrid deep neural network scheduler (HDNNS) built on a two-dimensional convolutional neural network (CNN), which also extracts features of the production environment and outputs the priority of each operation. However, HDNNS's ability to cope with dynamic events is not tested, and the CNN overlooks some critical time-step features in the environment, so its performance is still not satisfactory. Tian et al. [24] use a long short-term memory network (LSTM), which can retain features across time steps, to predict production information and use an improved GA to perform the scheduling. Shao et al. [25] also build an LSTM-based model named self-supervised long short-term memory (SS-LSTM), which deals with dynamic events effectively, but its hyperparameters are difficult to determine. All the above deep learning models need either a module to preprocess the input of the neural network or another module to postprocess its output into the final scheduling plan, which may compound the errors between modules and lead to suboptimal scheduling plans.
DJSSP can be viewed as a time-series forecasting problem [26], and the transformer is a deep learning method widely used for time-series forecasting [27]. Moreover, the transformer, built on the attention mechanism, is a frontier model, and some studies have successfully used the attention mechanism to solve JSSP. For example, Magalhães et al. [28] build a neural network with an attention mechanism to solve the dual resource constrained flexible job shop scheduling problem, Yang [29] utilizes the attention mechanism to extract the representation of the production environment, and Chen et al. [30] use the attention mechanism to allocate processing resources to the more significant parts when solving large JSSPs. Most attention-based JSSP methods use attention only to extract features of the production environment; no work has used the attention-based transformer to solve JSSP directly. In this paper, we build an end-to-end transformer-based deep learning model named the spatial pyramid pooling-based transformer (SPP-Transformer) to solve DJSSPs of different sizes. The disjunctive graph, which can represent the production environment features with time information [31], is adopted as the input of SPP-Transformer. SPP-Transformer utilizes a feature extraction module to extract the features in the disjunctive graph [32] and transform them into vectors of different sizes. Then, a feature compression module compresses the different-sized feature vectors into fixed-length vectors, which are input to the action selection module to select a simple priority rule. To the best of our knowledge, SPP-Transformer is the first work to solve DJSSP purely with a transformer-based model and achieve satisfactory results, providing a paradigm for further studies on deep learning in DJSSP. The contributions of this work are as follows:
  • An end-to-end transformer-based method is proposed, which takes the disjunctive graph as input and outputs a simple priority rule that can be applied directly in production, providing a fully data-driven method to solve DJSSP.
  • A feature compression module containing a spatial pyramid pooling (SPP) [33] layer is presented to enhance the generalizability of the proposed model. The feature compression module enables the model to be applied to DJSSP instances of different sizes and to be trained on various-sized instances represented by disjunctive graphs, so there is no need to train the model instance by instance.
  • A sequence-to-action model that deals effectively with dynamic events is presented. Unlike conventional sequence-to-sequence deep learning models, the proposed sequence-to-action model reacts to the environment at each step according to the real-time states of the production environment, so dynamic events occurring at uncertain steps can be tackled without extra measures.
The rest of this paper is organized as follows: Section 2 introduces the DJSSP with dynamic events such as machine breakdown and job rework, including the mathematical model and constraint conditions. Section 3 presents the whole structure of SPP-Transformer and gives the training details. Section 4 compares the performance of SPP-Transformer with existing methods. Section 5 concludes the paper.

2. Materials and Methods

2.1. Problem Description

The JSSP refers to allocating $n$ jobs on $m$ machines under limited resources to optimize one or more objectives, such as minimizing the makespan or lowering production costs. DJSSP is an extension of JSSP that considers dynamic events occurring unexpectedly in production and is more common than JSSP in actual manufacturing. DJSSP can be described as follows: $n$ jobs $J = \{J_1, J_2, \dots, J_n\}$ arrive at the shop successively and are processed on $m$ machines $M = \{M_1, M_2, \dots, M_m\}$. Each job $J_i$ consists of multiple operations $O_i = \{O_{i1}, O_{i2}, \dots, O_{ij}, \dots\}$, where $O_{ij}$ is the $j$th operation of job $J_i$ and can only be processed on a specific machine. Various dynamic events may happen in production, including machine breakdown, the insertion of new jobs, changes in delivery dates and processing times, job rework, etc. [34]. DJSSP should satisfy the following assumptions:
(1) Each machine can process only one operation at a time;
(2) An operation is processed by only one machine at a time;
(3) All jobs and machines are available at time 0;
(4) An operation cannot be interrupted once it has started, unless a machine breakdown occurs;
(5) The processing time of each operation may change during machining;
(6) An operation cannot be processed until its preceding operations are completed.
In this paper, the dynamic events we consider are machine breakdown and job rework. We randomly change the processing time and the machine order of job $J_1$ to simulate machine breakdown or job rework, while the number of machines and operations needed by $J_1$ remains unchanged. For simplicity, we only change the processing time of an operation to represent machine breakdown and switch the machine order of the job to represent job rework. The parameter Switch disorders the machine order to represent job rework: if Switch is "True", the machine order of the job is shuffled. Moreover, we set a random rate $f$ to represent the probability of machine breakdown, and the new processing time $p'_{ij}$ of operation $O_{ij}$ caused by machine breakdown is calculated as follows:
$$p'_{ij} = \begin{cases} p_{ij} + \min\big(1,\, \max(-1,\, N(0, 0.1))\big) \cdot p_{ij}, & r < f \\ p_{ij}, & r \geq f \end{cases} \tag{1}$$
where $p_{ij}$ is the original processing time of operation $O_{ij}$, $r \in [0, 1]$ is a random number, $f \in [0, 1]$ is a hyperparameter, and $N(0, 0.1)$ is a normal distribution with mean 0 and variance 0.1.
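A minimal sketch of this simulation in Python follows; the clamp of the normal sample to $[-1, 1]$ is assumed from the min/max pair in Equation (1), and the function names are illustrative.

```python
import random

def perturb_processing_time(p_ij: float, f: float) -> float:
    """Sketch of Equation (1): with probability f, a machine breakdown
    rescales the processing time by a clamped N(0, 0.1) sample."""
    r = random.random()
    if r < f:
        noise = random.gauss(0.0, 0.1 ** 0.5)   # mean 0, variance 0.1
        noise = min(1.0, max(-1.0, noise))      # assumed clamp to [-1, 1]
        return p_ij + noise * p_ij
    return p_ij

def rework_machine_order(machine_order: list, switch: bool) -> list:
    """Sketch of the Switch flag: job rework shuffles the job's machine order
    while keeping the number of machines and operations unchanged."""
    if not switch:
        return machine_order
    order = list(machine_order)
    random.shuffle(order)
    return order

print(perturb_processing_time(10.0, f=0.3))      # 10.0, or a perturbed value
print(rework_machine_order([2, 0, 1, 3], True))  # e.g., [1, 3, 0, 2]
```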
SPP-Transformer is a sequence-to-action model: it schedules the jobs at each step using feature sequences from the disjunctive graph, which change with the production environment. SPP-Transformer deals with dynamic events by taking actions based on the current disjunctive graph, without changing the scheduling procedure when dynamic events occur.

2.2. Mathematical Model

This study aims to minimize the makespan under dynamic events, including machine breakdown and job rework, that occur during production. A mathematical model of DJSSP is formulated as follows, with the notations listed in Table 1.
$$\min C_{\max} = \min \max_{1 \le i \le n,\ 1 \le j \le h} \left( s_{ij} + p_{ij} \right) \tag{2}$$

$$\text{s.t.} \quad \begin{aligned} &C1:\ p_{ij} > 0 \\ &C2:\ s_{ij} + p_{ij} \le s_{i(j+1)} \\ &C3:\ e_{ij} - s_{ij} = p_{ij} \\ &C4:\ e_{ij} \ne e_{i'j'} \\ &C5:\ T = n \times m \end{aligned}$$
where the objective is minimizing the makespan, as shown in Equation (2). C1 indicates that the processing time of every operation is positive. C2 shows that an operation cannot be processed until its preceding operations are completed. C3 represents that an operation cannot be interrupted while it is being processed, since the operations $O_{ij}$ and $O_{i'j'}$ may be processed on the same machine. C4 ensures that each machine processes only one operation at a time by requiring the end times of $O_{ij}$ and $O_{i'j'}$ to differ. C5 ensures that each operation is processed by only one machine at a time.

3. Proposed Method

3.1. Input and Output Representation

3.1.1. Input Representation

Considering that the disjunctive graph can represent both the global and local features of production and divides discrete time steps in terms of operation assignment, it is adopted as the input of our model to help collect more features with time information. The disjunctive graph $G = (V, C \cup D)$ is a directed graph widely used to represent the JSSP. All operations of all jobs construct the set $V$ and are represented as vertices; $V$ also includes two dummy vertices called source and sink, representing the start and the end of the scheduling, respectively, both with zero processing time. The precedence constraints between every two consecutive operations of a job are reflected by directed conjunctive edges, which construct the set $C$. Unordered operations that can be processed on the same machine are linked to each other by undirected disjunctive edges, which construct the set $D$ [35]. Thus, solving a JSSP means turning each disjunctive edge into a directed edge so as to obtain a directed acyclic graph and determine the operation sequence on each machine. A simple JSSP instance and its solution represented by disjunctive graphs are shown in Figure 1.
Each vertex denotes an operation of a job, and its label $O_{ij}$ denotes the $j$th operation of job $J_i$. Solid lines denote conjunctive edges, while dotted lines denote disjunctive edges. The number near an edge denotes the processing time of the operation where the edge begins. Furthermore, vertices of the same colour represent operations processed on the same machine.
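To make the construction concrete, the following sketch builds the disjunctive graph of a toy two-job, two-machine instance; the use of networkx and the attribute names are assumptions, since the paper does not prescribe an implementation.

```python
import itertools
import networkx as nx

# Toy instance: job -> [(machine, processing_time), ...]
jobs = {1: [(1, 3), (2, 2)], 2: [(2, 4), (1, 1)]}

G = nx.DiGraph()
G.add_nodes_from(["source", "sink"], processing_time=0)  # dummy vertices

for i, op_list in jobs.items():
    names = [f"O{i}{j + 1}" for j in range(len(op_list))]
    for name, (machine, p) in zip(names, op_list):
        G.add_node(name, machine=machine, processing_time=p)
    # Conjunctive edges (set C): precedence between consecutive operations.
    G.add_edge("source", names[0])
    for a, b in zip(names, names[1:]):
        G.add_edge(a, b)
    G.add_edge(names[-1], "sink")

# Disjunctive edges (set D): unordered pairs of operations on the same machine.
operations = [v for v in G if v not in ("source", "sink")]
D = {frozenset((a, b)) for a, b in itertools.combinations(operations, 2)
     if G.nodes[a]["machine"] == G.nodes[b]["machine"]}
print(sorted(tuple(sorted(e)) for e in D))  # [('O11', 'O22'), ('O12', 'O21')]
```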
However, disjunctive graphs only reflect the static features of JSSP and fail to represent the dynamic features of DJSSP. Therefore, the following attributes are added to the vertices of the disjunctive graph to represent the dynamic features [36] (a sketch of the resulting feature vector follows the list):
(1) Operation identified number: the serial number of this operation within its job;
(2) Job identified number: the serial number of the job that includes this operation;
(3) Machine identified number: the serial number of the machine that can process this operation, or 0 if no machine can process it;
(4) Node situation: 1, 0, and −1 represent that the operation is finished, being processed, and unfinished, respectively;
(5) Completion rate: the completion rate of the entire job once this operation is finished;
(6) Number of subsequent operations: the number of remaining operations to be finished;
(7) Time for waiting: the time elapsed since the start before this operation can be processed;
(8) Processing time: the time required to process the operation;
(9) Remaining time: the remaining processing time of the operation (0 if unprocessed);
(10) Doable: "True" if the operation is doable.
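A sketch of how these ten attributes could be packed into the 10-dimensional feature vector of one vertex follows; the field names and numeric encodings (e.g., "True" as 1.0) are assumptions.

```python
from dataclasses import dataclass, astuple

@dataclass
class VertexFeatures:
    """The ten vertex attributes listed above, packed into the
    10-dimensional feature vector of one operation (a sketch)."""
    operation_id: float        # (1) serial number of the operation in its job
    job_id: float              # (2) serial number of the job
    machine_id: float          # (3) machine serial number, 0 if none can process it
    node_situation: float      # (4) 1 finished, 0 being processed, -1 unfinished
    completion_rate: float     # (5) job completion rate once this operation finishes
    num_subsequent_ops: float  # (6) remaining operations to be finished
    waiting_time: float        # (7) time elapsed before the operation can be processed
    processing_time: float     # (8) time required to process the operation
    remaining_time: float      # (9) remaining processing time, 0 if unprocessed
    doable: float              # (10) 1.0 if the operation is doable, else 0.0

x_11 = VertexFeatures(1, 1, 2, -1, 0.5, 1, 0.0, 3.0, 0.0, 1.0)
print(astuple(x_11))  # one 10-dimensional row of the feature sequence X
```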

3.1.2. Output Representation

Because of the black-box nature of neural networks, a detailed scheduling operation output directly by a neural network may be unsafe. Therefore, dispatching rules, which are widely used in JSSP since they are easy to implement and fast to compute, are adopted as the output of our model. Dispatching rules prioritise the jobs waiting to be processed according to shop features (e.g., machine utilization) or job features, and the machine first processes the job with the highest priority [37]. Simple priority rules are a kind of dispatching rule, and some typical ones have shown their effectiveness in DJSSP. In our work, SPP-Transformer takes the following eight simple priority rules as output (sketched as job-selection functions after the list):
(1) First In First Out (FIFO): the machine first processes the job that arrived at its queue first;
(2) Last In First Out (LIFO): the machine first processes the job that arrived at its queue last;
(3) Most Operations Remaining (MOR): the machine first processes the job with the most remaining operations;
(4) Least Operations Remaining (LOR): the machine first processes the job with the fewest remaining operations;
(5) Longest Processing Time (LPT): the machine first processes the job with the longest processing time;
(6) Shortest Processing Time (SPT): the machine first processes the job with the shortest processing time;
(7) Longest Total Processing Time (LTPT): the machine first processes the job with the longest total processing time;
(8) Shortest Total Processing Time (STPT): the machine first processes the job with the shortest total processing time.
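A sketch of these eight rules as job-selection functions over a machine's queue; the Job fields are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Job:
    arrival: int        # arrival order in the machine's queue
    remaining_ops: int  # operations still to be processed
    next_op_time: float # processing time of the job's next operation
    total_time: float   # total processing time over all operations

# Each rule maps a non-empty queue of jobs to the job processed first.
RULES = {
    "FIFO": lambda q: min(q, key=lambda j: j.arrival),
    "LIFO": lambda q: max(q, key=lambda j: j.arrival),
    "MOR":  lambda q: max(q, key=lambda j: j.remaining_ops),
    "LOR":  lambda q: min(q, key=lambda j: j.remaining_ops),
    "LPT":  lambda q: max(q, key=lambda j: j.next_op_time),
    "SPT":  lambda q: min(q, key=lambda j: j.next_op_time),
    "LTPT": lambda q: max(q, key=lambda j: j.total_time),
    "STPT": lambda q: min(q, key=lambda j: j.total_time),
}

queue = [Job(0, 3, 5.0, 12.0), Job(1, 1, 2.0, 9.0)]
print(RULES["SPT"](queue))  # the job with the shortest next processing time
```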

3.2. Transformer Layer

3.2.1. Feature Extraction Module

The feature extraction module of SPP-Transformer consists of 6 encoder layers; each encoder layer contains stacked self-attention, a fully connected feed-forward network (FFN), residual connections [38], and layer normalization (LN) [39]. All operations $O = \{O_{i1}, O_{i2}, \dots, O_{ij}, \dots\}$, corresponding to the vertices in the disjunctive graph, construct a feature sequence $X = \{x_{i1}, x_{i2}, \dots, x_{ij}, \dots\}$, where each element is a 10-dimensional vector since 10 attributes are attached to each vertex. The feature sequence is input to the feature extraction module and processed into a feature vector so that the features of the production environment can be better extracted.
Self-attention computes an attention distribution over the input feature sequence [40] and maps a query and a set of key-value pairs to an output. Three different learned linear transformations are applied to the input feature sequence to obtain the queries, keys, and values of all operations. The query of an operation represents the type of operation it attends to, the key of an operation represents its own type, and the value of an operation contains information about itself. The queries, keys, and values of all operations construct the query matrix $Q$, the key matrix $K$, and the value matrix $V$, respectively, with $Q, K, V \in \mathbb{R}^{d_L \times d_F}$, where $d_L$ is the number of operations and $d_F$ is the number of attributes of each operation. First, the dot products of each operation's query with the keys of the other operations are computed to obtain matching scores over all operations; the scores are divided by $\sqrt{d_F}$ to keep the attention from over-focusing on high-scoring operations. Then, a softmax function is applied to the matching scores to obtain the attention weights of the values, and the $\mathrm{output} \in \mathbb{R}^{d_L \times d_F}$ of a single attention function is computed by multiplying the attention weights and the values, as shown in Equation (3). It is worth mentioning that this process can be executed in parallel, so high computational efficiency is achieved.
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_F}}\right) V \tag{3}$$
Furthermore, self-attention is often extended to multi-head attention: the outputs of multiple attention functions, each with different query weight matrices $W_t^Q$, key weight matrices $W_t^K$, and value weight matrices $W_t^V$, are concatenated to capture more information, as shown in Equation (4):
$$\begin{aligned} \mathrm{MultiHead}(Q, K, V) &= \mathrm{Concat}\left(\mathrm{head}_1, \dots, \mathrm{head}_H\right) W^{\mathrm{multi}} \\ \text{where } \mathrm{head}_t &= \mathrm{Attention}\left(Q W_t^Q,\ K W_t^K,\ V W_t^V\right) \end{aligned} \tag{4}$$
where $H$ is the number of self-attention heads applied to the input sequence, and $W_t^Q, W_t^K, W_t^V \in \mathbb{R}^{d_F \times \frac{d_F}{H}}$ and $W^{\mathrm{multi}} \in \mathbb{R}^{d_F \times d_F}$ are parameter matrices. By concatenating the outputs of the different attention functions, we obtain a vector containing more information about the input feature sequence.
Besides the multi-head attention sub-layer, a fully connected FFN is included in both the feature extraction module and the action selection module. It consists of two linear transformations with a nonlinear activation function in between, transforming the input from a $d_{Input}$-dimensional vector to a $d_{Output}$-dimensional vector with parameters $W_1$, $W_2$, $b_1$, and $b_2$, as shown in Equation (5).
$$\mathrm{FFN}(x) = \max\left(0,\ x W_1 + b_1\right) W_2 + b_2 \tag{5}$$
Moreover, residual connections are applied around every multi-head attention sub-layer and the FFN sub-layer, which helps prevent vanishing gradients. We apply LN after each residual connection to prevent internal covariate shift, since a single batch may contain several examples.
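The following sketch assembles Equations (3)–(5) into one encoder layer in PyTorch; the head count and FFN width are illustrative assumptions, and nn.MultiheadAttention stands in for the per-head projections of Equation (4).

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """Sketch of one encoder layer of the feature extraction module:
    multi-head self-attention (Eqs. (3)-(4)), FFN (Eq. (5)),
    residual connections, and layer normalization."""
    def __init__(self, d_f: int = 10, n_heads: int = 2, d_ffn: int = 64):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_f, n_heads, batch_first=True)
        self.ffn = nn.Sequential(              # two linear maps with ReLU between
            nn.Linear(d_f, d_ffn), nn.ReLU(), nn.Linear(d_ffn, d_f))
        self.ln1, self.ln2 = nn.LayerNorm(d_f), nn.LayerNorm(d_f)

    def forward(self, x):                      # x: (batch, d_L, d_F)
        a, _ = self.attn(x, x, x)              # queries, keys, values from x
        x = self.ln1(x + a)                    # residual connection + LN
        return self.ln2(x + self.ffn(x))       # residual connection + LN around FFN

layer = EncoderLayer()
print(layer(torch.randn(1, 36, 10)).shape)     # torch.Size([1, 36, 10])
```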

3.2.2. Feature Compression Module

Various-sized feature vectors are obtained after the feature extraction module processes different disjunctive graphs, and the feature compression module is proposed to compress these various-sized feature vectors into fixed-length ones. The feature compression module is an SPP layer, which is typically used in computer vision to aggregate the convolutional features of images by splitting images into a range of grids at each level of the pyramid and aggregating local features. Since images can also be viewed as multi-dimensional vectors, the vectors output by the feature extraction module can likewise be compressed into fixed-length vectors by the SPP layer.
To sample the extracted features at different scales from the disjunctive graph, four different bins are used for adaptive pooling in the SPP layer. After the various-sized feature vectors are input to the feature compression module, the four different-sized bins in the SPP layer pool them into four fixed-length vectors of 64 elements each. The sliding windows and strides of the four bins change adaptively according to the size of the input vectors. For a bin, the size and stride of its sliding window are determined by Equations (6) and (7), respectively.
$$\mathrm{win} = \left\lceil \frac{d_{In}}{d_{Out}} \right\rceil \tag{6}$$

$$\mathrm{str} = \left\lfloor \frac{d_{In}}{d_{Out}} \right\rfloor \tag{7}$$
where "win" and "str" represent the size and the stride of the sliding window, respectively, "ceil(·)" and "floor(·)" represent rounding up and rounding down, and $d_{In}$ and $d_{Out}$ represent the sizes of the input and output vectors. For example, pooling a 36-element input down to 8 outputs gives win = ⌈36/8⌉ = 5 and str = ⌊36/8⌋ = 4. Note that the sizes of the input vectors equal the sizes of the vectors output by the feature extraction module.
Finally, four fixed-length vectors of sizes 64 × 1, 32 × 2, 16 × 4, and 8 × 8 are output by the four bins. Each vector is then reshaped to 64 × 1, and the four are concatenated to construct the final fixed-length feature vector of 256 dimensions. This fixed-length feature vector gives SPP-Transformer its generalizability, enabling it to be trained once on existing instances and applied to DJSSPs of different sizes instead of being trained instance by instance. The framework of the feature compression module in our work is shown in Figure 2.
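A sketch of the feature compression module under these assumptions (adaptive max pooling is assumed, as the paper does not state the pooling operator), showing that inputs of different lengths yield the same 256-dimensional output:

```python
import torch
import torch.nn as nn

class FeatureCompression(nn.Module):
    """Sketch of the SPP-based feature compression module: four adaptive
    pooling bins (64x1, 32x2, 16x4, 8x8) turn a variable-sized feature map
    into one fixed 256-dimensional vector."""
    def __init__(self):
        super().__init__()
        self.bins = nn.ModuleList(
            nn.AdaptiveMaxPool2d(size) for size in [(64, 1), (32, 2), (16, 4), (8, 8)]
        )

    def forward(self, x):
        # x: (batch, d_L, d_F), where d_L varies with the instance size.
        x = x.unsqueeze(1)                             # one-channel 2D map
        pooled = [b(x).flatten(1) for b in self.bins]  # four (batch, 64) vectors
        return torch.cat(pooled, dim=1)                # (batch, 256)

fc = FeatureCompression()
print(fc(torch.randn(2, 36, 10)).shape)   # torch.Size([2, 256]) for a 6 x 6 instance
print(fc(torch.randn(2, 100, 10)).shape)  # torch.Size([2, 256]) for a 10 x 10 instance
```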

3.2.3. Action Selection Module

The 256-dimensional fixed-length vectors output by the feature compression module are input to the action selection module, which outputs a simple priority rule. The action selection module of SPP-Transformer consists of 6 decoder layers, a linear transformation, and a softmax function. Each decoder layer is similar to an encoder layer in the feature extraction module but contains a third sub-layer that performs multi-head attention over the output of the feature compression module. Moreover, masking is added to the self-attention sub-layer, combined with the fact that the output embeddings are offset by one position, to avoid attending to subsequent positions [41].
We utilize the linear transformation and softmax function to convert the output of the final decoder layer into predicted probabilities. The linear transformation is a single fully connected layer that transforms the 256-dimensional vectors into 8-dimensional vectors, since there are 8 candidate simple priority rules to select from; each dimension of the vector corresponds to one simple priority rule. The specific expression of the softmax function used in this paper is shown in Equation (8) below.
$$\mathrm{Softmax}(z_v) = \frac{e^{z_v}}{\sum_{v'=1}^{8} e^{z_{v'}}} \tag{8}$$
where $v = 1, 2, \dots, 8$ and $z_v$ is the value of the $v$th dimension of the 8-dimensional vector, corresponding to one of the eight simple priority rules. The probability of each simple priority rule being selected is calculated using Equation (8).

3.3. Training Procedure

SPP-Transformer is built from the feature extraction module, the feature compression module, and the action selection module; its framework is shown in Figure 3. A DJSSP instance represented by a disjunctive graph is input to SPP-Transformer; the feature extraction module extracts the initial features in the disjunctive graph and turns them into feature vectors of different sizes. Then, the feature compression module compresses the different-sized feature vectors into fixed-length 256-dimensional vectors. Finally, the action selection module transforms the 256-dimensional vectors into 8-dimensional vectors using the linear transformation and selects a simple priority rule from the eight candidates at each step using the softmax function.
DJSSP in this paper can be modelled as a multi-class classification problem, since a specific instance represented by the disjunctive graph is related to one of the eight simple priority rules. The softmax function shown in Equation (8) is used as the output unit, and the cross-entropy loss function shown in Equation (9) is used to calculate the loss between the simple priority rule output by our model and the ground truth from the expert to train SPP-Transformer.
$$L = -\frac{1}{N} \sum_{c} \sum_{v=1}^{8} y_{cv} \log p_{cv} \tag{9}$$
where $N$ is the total number of disjunctive graphs in the training dataset, $v$ denotes one of the eight simple priority rules, $c$ denotes a disjunctive graph, $y_{cv} \in \{0, 1\}$ is the label indicating whether disjunctive graph $c$ is related to simple priority rule $v$, and $p_{cv}$ is the predicted probability that disjunctive graph $c$ is related to simple priority rule $v$.
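A sketch of one supervised training step under Equation (9), with a stand-in linear head in place of the full model; nn.CrossEntropyLoss combines the softmax of Equation (8) with the cross-entropy loss:

```python
import torch
import torch.nn as nn

# Sketch: SPP-Transformer is trained as an 8-way classifier against expert
# (TOFA) rule labels. The `head` below is a stand-in for the full model.
head = nn.Linear(256, 8)                     # 256-dim features -> 8 rule logits
optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()            # softmax (Eq. (8)) + loss (Eq. (9))

features = torch.randn(32, 256)              # compressed fixed-length vectors
labels = torch.randint(0, 8, (32,))          # expert rule indices (0..7)

optimizer.zero_grad()
loss = criterion(head(features), labels)     # Eq. (9) averaged over the batch
loss.backward()
optimizer.step()
print(float(loss))
```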

4. Numerical Experiments

4.1. Experimental Setup

We conduct the experiments on gymjsp, a benchmark whose data come from the OR-Library, a dataset of various Operational Research (OR) problems including JSSP [42]. We test SPP-Transformer on the JSSP instances FT06, FT10, FT20, and LA01-20. The FT and LA instances come from different distributions: the FT instances were proposed by Fisher and Thompson [43], while the LA instances were proposed by Lawrence [44]. The FT and LA instances used in this paper are of different sizes, which helps test whether SPP-Transformer generalizes well across instance sizes. The performance of SPP-Transformer relies heavily on the training samples, so high-quality training samples from an expert whose solutions are viewed as optimal are required. The RL method named TOFA is a recent state-of-the-art method for solving DJSSP [45], and the action selected by the TOFA agent at each step is one of the 8 simple priority rules: FIFO, LIFO, MOR, LOR, LPT, SPT, LTPT, and STPT. We use the trained optimal TOFA agent to interact with the production environment and record the disjunctive graphs and the actions taken by the agent at each step; the collected disjunctive graphs serve as the training data and the collected actions as the training labels, yielding 100,000 training samples. To verify the generalizability of SPP-Transformer, we test it on instances of different sizes without further training. We randomly change the machine order and the processing time of a job to represent job rework or machine breakdown when testing SPP-Transformer's ability to tackle dynamic events; as a sequence-to-action model, SPP-Transformer deals with dynamic events by taking actions according to the current environment without extra measures. The hyperparameters used in this paper are shown in Table 2, and we test our model on a machine with an Intel(R) Xeon(R) Gold 6230 CPU @ 2.10 GHz and a single NVIDIA GeForce RTX 3090 GPU.

4.2. Performance Analysis

In this subsection, we compare SPP-Transformer with simple priority rules, composite dispatching rules, meta-heuristic methods, and RL methods. The results are shown in Table 3.
We use "the number of machines × the number of jobs" to represent the size of each instance. Makespan, the completion time of all jobs, is an important indicator in production; a smaller makespan means better method performance. "Opt" represents the optimal value obtained by TOFA. The best results among the compared methods are highlighted in bold. Further, the relative percentage deviation (RPD) of each method is calculated using Equation (10).
$$RPD = \frac{Alg - Opt}{Opt} \times 100\% \tag{10}$$
where "Alg" is the makespan obtained by each method on each instance. RPD reflects the distance between each method and the optimal method; a smaller RPD means better performance. It can be seen from Equation (10) that the RPD is negative when the makespan obtained by a method is smaller than that obtained by TOFA, i.e., the method outperforms the state-of-the-art method on that instance. The average makespan and RPD of each method are shown in Table 3. Specifically, the "Makespan" and "RPD" of the simple priority rules on each instance are the averages over eight simple priority rules (FIFO, LIFO, MOR, LOR, LPT, SPT, LTPT, and STPT); those of the composite dispatching rules are the averages over four composite dispatching rules (MOR+LPT, LOR+SPT, FCFS∗S, and SI/Q); those of the meta-heuristic methods are the averages over three meta-heuristic methods (GA, DE, and GWO); and those of the RL methods are the averages over three RL methods (DQN, Rainbow, and PPO). MOR+LPT means first processing the job with the most remaining operations and the longest processing time, while LOR+SPT means first processing the job with the fewest remaining operations and the shortest processing time. FCFS∗S sets a specific threshold in advance: if the number of waiting jobs exceeds it, SPT is used; otherwise, FIFO is used. SI/Q mainly uses SPT, but if a job in the queue will join the queue of its next machine and that queue exceeds a specific length, the job is selected first. GA is a traditional meta-heuristic method for JSSP, while DE and GWO are relatively novel meta-heuristic methods. Rainbow is an extension of the traditional DQN that performs well on Atari games [46], and PPO is an effective policy-gradient RL method [47]. The detailed results of each simple priority rule, composite dispatching rule, meta-heuristic method, and RL method are listed in Appendix A.
We depict the average RPD of each method to show the differences between them clearly; the result is shown in Figure 4.
The red dotted line represents the average RPD of SPP-Transformer; the average RPDs of the other 18 methods all lie above it, showing that SPP-Transformer outperforms all of them. Moreover, Table 3 shows that the RPDs of SPP-Transformer on LA08-09, LA13, LA15-17, and LA19-20 are negative, meaning that SPP-Transformer obtains even better results than the optimal solutions on these instances. The makespan of SPP-Transformer is 11.67% smaller than the average makespan of the other 18 methods. Specifically, it is 14.30% smaller than the average makespan of all dispatching rules, including simple priority rules and composite dispatching rules, showing that SPP-Transformer can select an effective dispatching rule at each step according to the current environment. Moreover, MOR outperforms the other 17 methods except SPP-Transformer, while the performances of the other dispatching rules vary widely across instances, reflecting that no dispatching rule always performs well in every situation; dynamic selection of dispatching rules is therefore needed. The composite dispatching rules do not achieve the best result among all dispatching rules, which shows that combining different simple priority rules into an effective composite rule is not easy and requires domain knowledge. The makespan of SPP-Transformer is 16.99% smaller than the average makespan of the meta-heuristic methods, showing that SPP-Transformer can effectively deal with dynamic events without extra measures, whereas meta-heuristic methods always need to reschedule the jobs when dynamic events occur. Both SPP-Transformer and the RL methods select dispatching rules at each step, but the makespan of SPP-Transformer is 13.82% smaller than the average makespan of the RL methods, showing that SPP-Transformer selects more effective dispatching rules.
Figure 5, Figure 6 and Figure 7 are the Gantt charts of the scheduling schemes obtained by SPP-Transformer on FT06, LA01, and LA20, respectively.
The rectangles of different colours represent different jobs, the horizontal axis represents the processing time, the vertical axis represents the machines, and $C_{\max}$ corresponds to the minimum makespan. The closer together the rectangular areas are, the more balanced the machine loads and the better the scheduling effect [48]. The rectangles in the three Gantt charts are relatively close together, reflecting that our model realizes satisfactory scheduling. The above analyses prove that using an end-to-end transformer to solve DJSSP is feasible and rather effective.

4.3. Data Ablation

Transformer-based models are known to require large amounts of training data [49], so we verify the dependence of SPP-Transformer on dataset scale by training it on datasets of different sizes. We randomly sample 1000 and 10,000 samples from the original dataset of 100,000 samples to build two new subsets. For simplicity, all other hyperparameters are kept fixed. A line chart showing the RPD of each SPP-Transformer on each instance is given in Figure 8, where "ours-100000", "ours-10000", and "ours-1000" denote the SPP-Transformers trained on the 100,000-, 10,000-, and 1000-sample datasets, respectively. The detailed results of each model are listed in Appendix B.
Except on LA04, LA05, LA07, and LA14, where the RPDs are relatively close, the performances of the three SPP-Transformers differ significantly. Moreover, "ours-100000" performs best among the three models, even beating the optimal solutions on some instances, reflecting that a larger dataset helps improve the performance of SPP-Transformer. The fact that the improvement from "ours-10000" to "ours-100000" is more significant than that from "ours-1000" to "ours-10000" further indicates that larger datasets continue to improve model performance, which provides guidance for further improving SPP-Transformer.

5. Conclusions

This paper introduces an end-to-end transformer-based deep learning model called SPP-Transformer to solve DJSSPs of different sizes. SPP-Transformer is constructed from a feature extraction module, a feature compression module, and an action selection module. First, a DJSSP instance represented by a disjunctive graph is input to the feature extraction module, which extracts the features in the disjunctive graph and transforms them into various-sized feature vectors. Then, the feature compression module compresses the various-sized feature vectors into fixed-length vectors. Finally, the action selection module selects a simple priority rule to schedule the jobs at each step to complete the scheduling. The experimental results show that the makespan of SPP-Transformer is 11.67% smaller than the average makespan of dispatching rules, meta-heuristic methods, and RL methods; specifically, it is 14.30% smaller than the average makespan of dispatching rules, 16.99% smaller than that of meta-heuristic methods, and 13.82% smaller than that of RL methods. These results prove that SPP-Transformer generalizes well on JSSP instances of different sizes and effectively deals with dynamic events. It is also shown that a larger training dataset improves the performance of SPP-Transformer. However, the training time of SPP-Transformer is relatively long because of its many parameters, and collecting enough training data is not easy; both issues remain to be solved. The dynamic events considered in this paper are also limited and can be extended in further work. To the best of our knowledge, this is the first study of an end-to-end transformer in DJSSP, providing a paradigm for future research on deep learning in DJSSP. Using disjunctive graphs as input and dispatching rules as output, further research can explore other end-to-end deep learning methods besides transformer-based ones to solve DJSSP.

Author Contributions

Conceptualization, S.C.; methodology, S.C.; software, S.C.; validation, S.C., Z.H. and H.G.; formal analysis, S.C.; investigation, S.C. and Z.H.; resources, S.C. and Z.H.; data curation, S.C.; writing—original draft preparation, S.C. and Z.H.; writing—review and editing, S.C. and H.G.; visualization, S.C. and Z.H.; project administration, S.C.; funding acquisition, S.C. and H.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Special Funds for the Cultivation of Guangdong College Students’ Scientific and Technological Innovation (“Climbing Program” Special Funds) (Grant No.pdjh2022b0059), the Guangdong Academic Degree and Graduate Education Reform Research Project (Grant No.2019JGXM15), Guangdong Province Higher Education Teaching Research and Reform Project (Grant No.2020059), the Guangdong Graduate Education Innovation Project (Grant No.82620516), Jinan University Off-campus Practice Teaching Base Construction Project (Grant No.55691207), Guangdong Higher Education Association “14th Five-Year Plan” Higher Education Research Topic (Grant No.21GZD09), Guangdong Province “Quality Engineering” Construction Project (Grant No.210308), Jinan University Graduate Education Teaching Achievement Cultivation Project (Grant No.2021YPY010), Jinan University Teaching Reform Research Project (Grant No.JG2022086).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All experiment datasets of solving DJSSP are available at https://github.com/Yunhui1998/Gymjsp (accessed on 17 May 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. The comparison between simple priority rules and SPP-Transformer on makespan.

| Instance | Size | Opt | FIFO | LIFO | LOR | LPT | LTPT | MOR | SPT | STPT | SPP-Transformer |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FT06 | 6 × 6 | 59 | 65 | 70 | 68 | 77 | 68 | 59 | 88 | 83 | 59 |
| FT10 | 10 × 10 | 1002 | 1230 | 1201 | 1352 | 1295 | 1190 | 1163 | 1074 | 1262 | 1180 |
| FT20 | 5 × 20 | 1205 | 1658 | 1304 | 1487 | 1631 | 1495 | 1601 | 1267 | 1309 | 1231 |
| LA01 | 5 × 10 | 666 | 848 | 772 | 941 | 822 | 835 | 763 | 751 | 933 | 699 |
| LA02 | 5 × 10 | 684 | 821 | 799 | 982 | 990 | 881 | 812 | 821 | 761 | 716 |
| LA03 | 5 × 10 | 615 | 734 | 765 | 758 | 825 | 696 | 726 | 672 | 811 | 651 |
| LA04 | 5 × 10 | 615 | 737 | 827 | 745 | 818 | 887 | 706 | 711 | 874 | 670 |
| LA05 | 5 × 10 | 593 | 593 | 681 | 704 | 693 | 658 | 593 | 610 | 677 | 593 |
| LA06 | 5 × 15 | 926 | 1014 | 1246 | 1095 | 1125 | 1098 | 926 | 1200 | 1012 | 926 |
| LA07 | 5 × 15 | 920 | 1105 | 1156 | 1137 | 1069 | 1025 | 1001 | 1034 | 1105 | 956 |
| LA08 | 5 × 15 | 866 | 982 | 1101 | 995 | 1035 | 952 | 925 | 942 | 995 | 863 |
| LA09 | 5 × 15 | 952 | 1011 | 1089 | 1149 | 1183 | 1304 | 951 | 1045 | 1220 | 951 |
| LA10 | 5 × 15 | 958 | 1035 | 1132 | 1032 | 1132 | 1033 | 958 | 1049 | 1312 | 958 |
| LA11 | 5 × 20 | 1222 | 1229 | 1488 | 1586 | 1467 | 1416 | 1222 | 1473 | 1446 | 1222 |
| LA12 | 5 × 20 | 1039 | 1039 | 1329 | 1295 | 1240 | 1126 | 1039 | 1203 | 1358 | 1039 |
| LA13 | 5 × 20 | 1151 | 1160 | 1519 | 1320 | 1230 | 1252 | 1150 | 1275 | 1267 | 1150 |
| LA14 | 5 × 20 | 1292 | 1321 | 1505 | 1567 | 1434 | 1419 | 1292 | 1427 | 1646 | 1292 |
| LA15 | 5 × 20 | 1336 | 1471 | 1519 | 1598 | 1612 | 1394 | 1436 | 1339 | 1632 | 1321 |
| LA16 | 10 × 10 | 1108 | 1297 | 1306 | 1119 | 1229 | 1165 | 1108 | 1156 | 1268 | 1074 |
| LA17 | 10 × 10 | 844 | 908 | 987 | 948 | 1082 | 993 | 844 | 924 | 1120 | 812 |
| LA18 | 10 × 10 | 942 | 1057 | 1178 | 1154 | 1114 | 1198 | 942 | 981 | 1111 | 942 |
| LA19 | 10 × 10 | 952 | 1062 | 992 | 1004 | 1062 | 1004 | 1088 | 940 | 981 | 935 |
| LA20 | 10 × 10 | 1026 | 1243 | 1092 | 1207 | 1272 | 1086 | 1130 | 1000 | 1201 | 979 |
| Average | | 912 | 1027 | 1089 | 1098 | 1106 | 1051 | 975 | 999 | 1104 | 923 |
Table A2. The comparison between simple priority rules and SPP-Transformer on RPD (%).

| Instance | Size | FIFO | LIFO | LOR | LPT | LTPT | MOR | SPT | STPT | SPP-Transformer |
|---|---|---|---|---|---|---|---|---|---|---|
| FT06 | 6 × 6 | 10.17 | 18.64 | 15.25 | 30.51 | 15.25 | 0 | 49.15 | 40.68 | 0 |
| FT10 | 10 × 10 | 22.75 | 19.86 | 34.93 | 29.24 | 18.76 | 16.07 | 7.19 | 25.95 | 17.76 |
| FT20 | 5 × 20 | 37.59 | 8.22 | 23.4 | 35.35 | 24.07 | 32.86 | 5.15 | 8.63 | 2.16 |
| LA01 | 5 × 10 | 27.33 | 15.92 | 41.29 | 23.42 | 25.38 | 14.56 | 12.76 | 40.09 | 4.95 |
| LA02 | 5 × 10 | 20.03 | 16.81 | 43.57 | 44.74 | 28.8 | 18.71 | 20.03 | 11.26 | 4.68 |
| LA03 | 5 × 10 | 19.35 | 24.39 | 23.25 | 34.15 | 13.17 | 18.05 | 9.27 | 31.87 | 5.85 |
| LA04 | 5 × 10 | 19.84 | 34.47 | 21.14 | 33.01 | 44.23 | 14.8 | 15.61 | 42.11 | 8.94 |
| LA05 | 5 × 10 | 0 | 14.84 | 18.72 | 16.86 | 10.96 | 0 | 2.87 | 14.17 | 0 |
| LA06 | 5 × 15 | 9.5 | 34.56 | 18.25 | 21.49 | 18.57 | 0 | 29.59 | 9.29 | 0 |
| LA07 | 5 × 15 | 20.11 | 25.65 | 23.59 | 16.2 | 11.41 | 8.8 | 12.39 | 20.11 | 3.91 |
| LA08 | 5 × 15 | 13.39 | 27.14 | 14.9 | 19.52 | 9.93 | 6.81 | 8.78 | 14.9 | −0.35 |
| LA09 | 5 × 15 | 6.2 | 14.39 | 20.69 | 24.26 | 36.97 | −0.11 | 9.77 | 28.15 | −0.11 |
| LA10 | 5 × 15 | 8.04 | 18.16 | 7.72 | 18.16 | 7.83 | 0 | 9.5 | 36.95 | 0 |
| LA11 | 5 × 20 | 0.57 | 21.77 | 29.79 | 20.05 | 15.88 | 0 | 20.54 | 18.33 | 0 |
| LA12 | 5 × 20 | 0 | 27.91 | 24.64 | 19.35 | 8.37 | 0 | 15.78 | 30.7 | 0 |
| LA13 | 5 × 20 | 0.78 | 31.97 | 14.68 | 6.86 | 8.77 | −0.09 | 10.77 | 10.08 | −0.09 |
| LA14 | 5 × 20 | 2.24 | 16.49 | 21.28 | 10.99 | 9.83 | 0 | 10.45 | 27.4 | 0 |
| LA15 | 5 × 20 | 10.1 | 13.7 | 19.61 | 20.66 | 4.34 | 7.49 | 0.22 | 22.16 | −1.12 |
| LA16 | 10 × 10 | 17.06 | 17.87 | 0.99 | 10.92 | 5.14 | 0 | 4.33 | 14.44 | −3.07 |
| LA17 | 10 × 10 | 7.58 | 16.94 | 12.32 | 28.2 | 17.65 | 0 | 9.48 | 32.7 | −3.79 |
| LA18 | 10 × 10 | 12.21 | 25.05 | 22.51 | 18.26 | 27.18 | 0 | 4.14 | 17.94 | 0 |
| LA19 | 10 × 10 | 11.55 | 4.2 | 5.46 | 11.55 | 5.46 | 14.29 | −1.26 | 3.05 | −1.79 |
| LA20 | 10 × 10 | 21.15 | 6.43 | 17.64 | 23.98 | 5.85 | 10.14 | −2.53 | 17.06 | −4.58 |
| Average | | 12.94 | 19.8 | 20.68 | 22.51 | 16.25 | 7.06 | 11.48 | 22.52 | 1.45 |
Table A3. The comparison between composite dispatching rules and SPP-Transformer on makespan.

| Instance | Size | Opt | LOR+SPT | MOR+LPT | FCFS∗S | SI/Q | SPP-Transformer |
|---|---|---|---|---|---|---|---|
| FT06 | 6 × 6 | 59 | 85 | 68 | 78 | 84 | 59 |
| FT10 | 10 × 10 | 1002 | 1100 | 1284 | 1176 | 1114 | 1180 |
| FT20 | 5 × 20 | 1205 | 1280 | 1616 | 1358 | 1279 | 1231 |
| LA01 | 5 × 10 | 666 | 761 | 818 | 869 | 775 | 699 |
| LA02 | 5 × 10 | 684 | 796 | 952 | 773 | 778 | 716 |
| LA03 | 5 × 10 | 615 | 745 | 807 | 770 | 711 | 651 |
| LA04 | 5 × 10 | 615 | 703 | 865 | 722 | 698 | 670 |
| LA05 | 5 × 10 | 593 | 620 | 668 | 623 | 623 | 593 |
| LA06 | 5 × 15 | 926 | 1182 | 1122 | 1022 | 1171 | 926 |
| LA07 | 5 × 15 | 920 | 1021 | 1087 | 1082 | 1023 | 956 |
| LA08 | 5 × 15 | 866 | 959 | 1086 | 902 | 958 | 863 |
| LA09 | 5 × 15 | 952 | 1122 | 1178 | 1015 | 1060 | 951 |
| LA10 | 5 × 15 | 958 | 1090 | 1116 | 1041 | 1077 | 958 |
| LA11 | 5 × 20 | 1222 | 1494 | 1471 | 1318 | 1435 | 1222 |
| LA12 | 5 × 20 | 1039 | 1214 | 1245 | 1168 | 1227 | 1039 |
| LA13 | 5 × 20 | 1151 | 1338 | 1278 | 1234 | 1289 | 1150 |
| LA14 | 5 × 20 | 1292 | 1518 | 1405 | 1426 | 1463 | 1292 |
| LA15 | 5 × 20 | 1336 | 1356 | 1596 | 1429 | 1366 | 1321 |
| LA16 | 10 × 10 | 1108 | 1255 | 1219 | 1144 | 1181 | 1074 |
| LA17 | 10 × 10 | 844 | 956 | 1074 | 1001 | 944 | 812 |
| LA18 | 10 × 10 | 942 | 998 | 1099 | 1029 | 998 | 942 |
| LA19 | 10 × 10 | 952 | 950 | 1068 | 963 | 957 | 935 |
| LA20 | 10 × 10 | 1026 | 1022 | 1210 | 1223 | 1023 | 979 |
| Average | | 912 | 1025 | 1101 | 1016 | 1010 | 923 |
Table A4. The comparison between composite dispatching rules and SPP-Transformer on RPD (%).

| Instance | Size | LOR+SPT | MOR+LPT | FCFS∗S | SI/Q | SPP-Transformer |
|---|---|---|---|---|---|---|
| FT06 | 6 × 6 | 44.07 | 15.25 | 32.2 | 42.37 | 0 |
| FT10 | 10 × 10 | 9.78 | 28.14 | 17.37 | 11.18 | 17.76 |
| FT20 | 5 × 20 | 6.22 | 34.11 | 12.7 | 6.14 | 2.16 |
| LA01 | 5 × 10 | 14.26 | 22.82 | 30.48 | 16.37 | 4.95 |
| LA02 | 5 × 10 | 16.37 | 39.18 | 13.01 | 13.74 | 4.68 |
| LA03 | 5 × 10 | 21.14 | 31.22 | 25.2 | 15.61 | 5.85 |
| LA04 | 5 × 10 | 14.31 | 40.65 | 17.4 | 13.5 | 8.94 |
| LA05 | 5 × 10 | 4.55 | 12.65 | 5.06 | 5.06 | 0 |
| LA06 | 5 × 15 | 27.65 | 21.17 | 10.37 | 26.46 | 0 |
| LA07 | 5 × 15 | 10.98 | 18.15 | 17.61 | 11.2 | 3.91 |
| LA08 | 5 × 15 | 10.74 | 25.4 | 4.16 | 10.62 | −0.35 |
| LA09 | 5 × 15 | 17.86 | 23.74 | 6.62 | 11.34 | −0.11 |
| LA10 | 5 × 15 | 13.78 | 16.49 | 8.66 | 12.42 | 0 |
| LA11 | 5 × 20 | 22.26 | 20.38 | 7.86 | 17.43 | 0 |
| LA12 | 5 × 20 | 16.84 | 19.83 | 12.42 | 18.09 | 0 |
| LA13 | 5 × 20 | 16.25 | 11.03 | 7.21 | 11.99 | −0.09 |
| LA14 | 5 × 20 | 17.49 | 8.75 | 10.37 | 13.24 | 0 |
| LA15 | 5 × 20 | 1.5 | 19.46 | 6.96 | 2.25 | −1.12 |
| LA16 | 10 × 10 | 13.27 | 10.02 | 3.25 | 6.59 | −3.07 |
| LA17 | 10 × 10 | 13.27 | 27.25 | 18.6 | 11.85 | −3.79 |
| LA18 | 10 × 10 | 5.94 | 16.67 | 9.24 | 5.94 | 0 |
| LA19 | 10 × 10 | −0.21 | 12.18 | 1.16 | 0.53 | −1.79 |
| LA20 | 10 × 10 | −0.39 | 17.93 | 19.2 | −0.29 | −4.58 |
| Average | | 13.82 | 21.41 | 12.92 | 12.33 | 1.45 |
Table A5. The comparison between meta-heuristic methods and SPP-Transformer on makespan.

| Instance | Size | Opt | GA | DE | GWO | SPP-Transformer |
|---|---|---|---|---|---|---|
| FT06 | 6 × 6 | 59 | 58 | 60 | 59 | 59 |
| FT10 | 10 × 10 | 1002 | 1331 | 1312 | 1219 | 1180 |
| FT20 | 5 × 20 | 1205 | 1634 | 1635 | 1564 | 1231 |
| LA01 | 5 × 10 | 666 | 782 | 769 | 727 | 699 |
| LA02 | 5 × 10 | 684 | 821 | 805 | 768 | 716 |
| LA03 | 5 × 10 | 615 | 737 | 731 | 696 | 651 |
| LA04 | 5 × 10 | 615 | 736 | 728 | 681 | 670 |
| LA05 | 5 × 10 | 593 | 633 | 616 | 599 | 593 |
| LA06 | 5 × 15 | 926 | 1040 | 1025 | 982 | 926 |
| LA07 | 5 × 15 | 920 | 1085 | 1068 | 1014 | 956 |
| LA08 | 5 × 15 | 866 | 1045 | 1024 | 974 | 863 |
| LA09 | 5 × 15 | 952 | 1094 | 1063 | 1021 | 951 |
| LA10 | 5 × 15 | 958 | 1048 | 1020 | 985 | 958 |
| LA11 | 5 × 20 | 1222 | 1400 | 1371 | 1330 | 1222 |
| LA12 | 5 × 20 | 1039 | 1195 | 1182 | 1133 | 1039 |
| LA13 | 5 × 20 | 1151 | 1335 | 1312 | 1261 | 1150 |
| LA14 | 5 × 20 | 1292 | 1377 | 1352 | 1321 | 1292 |
| LA15 | 5 × 20 | 1336 | 1527 | 1498 | 1449 | 1321 |
| LA16 | 10 × 10 | 1108 | 1238 | 1206 | 1138 | 1074 |
| LA17 | 10 × 10 | 844 | 1072 | 1032 | 947 | 812 |
| LA18 | 10 × 10 | 942 | 1149 | 1109 | 1040 | 942 |
| LA19 | 10 × 10 | 952 | 1182 | 1142 | 1066 | 935 |
| LA20 | 10 × 10 | 1026 | 1235 | 1185 | 1118 | 979 |
| Average | | 912 | 1076 | 1054 | 1004 | 923 |
Table A6. The comparison between meta-heuristic methods and SPP-Transformer on RPD (%).

| Instance | Size | GA | DE | GWO | SPP-Transformer |
|---|---|---|---|---|---|
| FT06 | 6 × 6 | −1.69 | 1.69 | 0 | 0 |
| FT10 | 10 × 10 | 32.83 | 30.94 | 21.66 | 17.76 |
| FT20 | 5 × 20 | 35.6 | 35.68 | 29.79 | 2.16 |
| LA01 | 5 × 10 | 17.42 | 15.47 | 9.16 | 4.95 |
| LA02 | 5 × 10 | 20.03 | 17.69 | 12.28 | 4.68 |
| LA03 | 5 × 10 | 19.84 | 18.86 | 13.17 | 5.85 |
| LA04 | 5 × 10 | 19.67 | 18.37 | 10.73 | 8.94 |
| LA05 | 5 × 10 | 6.75 | 3.88 | 1.01 | 0 |
| LA06 | 5 × 15 | 12.31 | 10.69 | 6.05 | 0 |
| LA07 | 5 × 15 | 17.93 | 16.09 | 10.22 | 3.91 |
| LA08 | 5 × 15 | 20.67 | 18.24 | 12.47 | −0.35 |
| LA09 | 5 × 15 | 14.92 | 11.66 | 7.25 | −0.11 |
| LA10 | 5 × 15 | 9.39 | 6.47 | 2.82 | 0 |
| LA11 | 5 × 20 | 14.57 | 12.19 | 8.84 | 0 |
| LA12 | 5 × 20 | 15.01 | 13.76 | 9.05 | 0 |
| LA13 | 5 × 20 | 15.99 | 13.99 | 9.56 | −0.09 |
| LA14 | 5 × 20 | 6.58 | 4.64 | 2.24 | 0 |
| LA15 | 5 × 20 | 14.3 | 12.13 | 8.46 | −1.12 |
| LA16 | 10 × 10 | 11.73 | 8.84 | 2.71 | −3.07 |
| LA17 | 10 × 10 | 27.01 | 22.27 | 12.2 | −3.79 |
| LA18 | 10 × 10 | 21.97 | 17.73 | 10.4 | 0 |
| LA19 | 10 × 10 | 24.16 | 19.96 | 11.97 | −1.79 |
| LA20 | 10 × 10 | 20.37 | 15.5 | 8.97 | −4.58 |
| Average | | 17.28 | 15.08 | 9.61 | 1.45 |
Table A7. The comparison between RL methods and SPP-Transformer on makespan.

| Instance | Size | Opt | DQN | Rainbow | PPO | SPP-Transformer |
|---|---|---|---|---|---|---|
| FT06 | 6 × 6 | 59 | 65 | 63 | 67 | 59 |
| FT10 | 10 × 10 | 1002 | 1289 | 1235 | 1341 | 1180 |
| FT20 | 5 × 20 | 1205 | 1463 | 1378 | 1479 | 1231 |
| LA01 | 5 × 10 | 666 | 785 | 935 | 759 | 699 |
| LA02 | 5 × 10 | 684 | 809 | 804 | 851 | 716 |
| LA03 | 5 × 10 | 615 | 750 | 742 | 713 | 651 |
| LA04 | 5 × 10 | 615 | 684 | 812 | 779 | 670 |
| LA05 | 5 × 10 | 593 | 592 | 660 | 634 | 593 |
| LA06 | 5 × 15 | 926 | 984 | 1066 | 963 | 926 |
| LA07 | 5 × 15 | 920 | 1035 | 991 | 1034 | 956 |
| LA08 | 5 × 15 | 866 | 1014 | 964 | 1125 | 863 |
| LA09 | 5 × 15 | 952 | 1010 | 1121 | 987 | 951 |
| LA10 | 5 × 15 | 958 | 978 | 1045 | 1021 | 958 |
| LA11 | 5 × 20 | 1222 | 1283 | 1480 | 1314 | 1222 |
| LA12 | 5 × 20 | 1039 | 1123 | 1204 | 1121 | 1039 |
| LA13 | 5 × 20 | 1151 | 1198 | 1300 | 1243 | 1150 |
| LA14 | 5 × 20 | 1292 | 1283 | 1290 | 1324 | 1292 |
| LA15 | 5 × 20 | 1336 | 1390 | 1393 | 1516 | 1321 |
| LA16 | 10 × 10 | 1108 | 1102 | 1183 | 1179 | 1074 |
| LA17 | 10 × 10 | 844 | 894 | 900 | 987 | 812 |
| LA18 | 10 × 10 | 942 | 993 | 1147 | 1123 | 942 |
| LA19 | 10 × 10 | 952 | 963 | 958 | 1146 | 935 |
| LA20 | 10 × 10 | 1026 | 1047 | 1183 | 1178 | 979 |
| Average | | 912 | 988 | 1037 | 1038 | 923 |
Table A8. The comparison between RL methods and SPP-Transformer on RPD (%).

| Instance | Size | DQN | Rainbow | PPO | SPP-Transformer |
|---|---|---|---|---|---|
| FT06 | 6 × 6 | 10.17 | 6.78 | 13.56 | 0 |
| FT10 | 10 × 10 | 28.64 | 23.25 | 33.83 | 17.76 |
| FT20 | 5 × 20 | 21.41 | 14.36 | 22.74 | 2.16 |
| LA01 | 5 × 10 | 17.87 | 40.39 | 13.96 | 4.95 |
| LA02 | 5 × 10 | 18.27 | 17.54 | 24.42 | 4.68 |
| LA03 | 5 × 10 | 21.95 | 20.65 | 15.93 | 5.85 |
| LA04 | 5 × 10 | 11.22 | 32.03 | 26.67 | 8.94 |
| LA05 | 5 × 10 | −0.17 | 11.3 | 6.91 | 0 |
| LA06 | 5 × 15 | 6.26 | 15.12 | 4 | 0 |
| LA07 | 5 × 15 | 12.5 | 7.72 | 12.39 | 3.91 |
| LA08 | 5 × 15 | 17.09 | 11.32 | 29.91 | −0.35 |
| LA09 | 5 × 15 | 6.09 | 17.75 | 3.68 | −0.11 |
| LA10 | 5 × 15 | 2.09 | 9.08 | 6.58 | 0 |
| LA11 | 5 × 20 | 4.99 | 21.11 | 7.53 | 0 |
| LA12 | 5 × 20 | 8.08 | 15.88 | 7.89 | 0 |
| LA13 | 5 × 20 | 4.08 | 12.95 | 7.99 | −0.09 |
| LA14 | 5 × 20 | −0.7 | −0.15 | 2.48 | 0 |
| LA15 | 5 × 20 | 4.04 | 4.27 | 13.47 | −1.12 |
| LA16 | 10 × 10 | −0.54 | 6.77 | 6.41 | −3.07 |
| LA17 | 10 × 10 | 5.92 | 6.64 | 16.94 | −3.79 |
| LA18 | 10 × 10 | 5.41 | 21.76 | 19.21 | 0 |
| LA19 | 10 × 10 | 1.16 | 0.63 | 20.38 | −1.79 |
| LA20 | 10 × 10 | 2.05 | 15.3 | 14.81 | −4.58 |
| Average | | 9.04 | 14.45 | 14.42 | 1.45 |

Appendix B

Table A9. The comparison between different SPP-Transformers on makespan.

| Instance | Size | Opt | Ours-1000 | Ours-10000 | Ours-100000 |
|---|---|---|---|---|---|
| FT06 | 6 × 6 | 59 | 67 | 59 | 59 |
| FT10 | 10 × 10 | 1002 | 1347 | 1206 | 1180 |
| FT20 | 5 × 20 | 1205 | 1414 | 1313 | 1231 |
| LA01 | 5 × 10 | 666 | 787 | 706 | 699 |
| LA02 | 5 × 10 | 684 | 755 | 741 | 716 |
| LA03 | 5 × 10 | 615 | 688 | 699 | 651 |
| LA04 | 5 × 10 | 615 | 688 | 690 | 670 |
| LA05 | 5 × 10 | 593 | 602 | 597 | 593 |
| LA06 | 5 × 15 | 926 | 988 | 1005 | 926 |
| LA07 | 5 × 15 | 920 | 975 | 978 | 956 |
| LA08 | 5 × 15 | 866 | 937 | 883 | 863 |
| LA09 | 5 × 15 | 952 | 1007 | 951 | 951 |
| LA10 | 5 × 15 | 958 | 1033 | 956 | 958 |
| LA11 | 5 × 20 | 1222 | 1317 | 1425 | 1222 |
| LA12 | 5 × 20 | 1039 | 1180 | 1055 | 1039 |
| LA13 | 5 × 20 | 1151 | 1217 | 1207 | 1150 |
| LA14 | 5 × 20 | 1292 | 1296 | 1292 | 1292 |
| LA15 | 5 × 20 | 1336 | 1374 | 1355 | 1321 |
| LA16 | 10 × 10 | 1108 | 1131 | 1127 | 1074 |
| LA17 | 10 × 10 | 844 | 904 | 904 | 812 |
| LA18 | 10 × 10 | 942 | 996 | 984 | 942 |
| LA19 | 10 × 10 | 952 | 969 | 963 | 935 |
| LA20 | 10 × 10 | 1026 | 1020 | 1015 | 979 |
| Average | | 912 | 988 | 1037 | 923 |
Table A10. The comparison between different SPP-Transformers on RPD (%).

| Instance | Size | Ours-1000 | Ours-10000 | Ours-100000 |
|---|---|---|---|---|
| FT06 | 6 × 6 | 13.56 | 0 | 0 |
| FT10 | 10 × 10 | 34.43 | 20.36 | 17.76 |
| FT20 | 5 × 20 | 17.34 | 8.96 | 2.16 |
| LA01 | 5 × 10 | 18.17 | 6.01 | 4.95 |
| LA02 | 5 × 10 | 10.38 | 8.33 | 4.68 |
| LA03 | 5 × 10 | 11.87 | 13.66 | 5.85 |
| LA04 | 5 × 10 | 11.87 | 12.2 | 8.94 |
| LA05 | 5 × 10 | 1.52 | 0.67 | 0 |
| LA06 | 5 × 15 | 6.7 | 8.53 | 0 |
| LA07 | 5 × 15 | 5.98 | 6.3 | 3.91 |
| LA08 | 5 × 15 | 8.2 | 1.96 | −0.35 |
| LA09 | 5 × 15 | 5.78 | −0.11 | −0.11 |
| LA10 | 5 × 15 | 7.83 | −0.21 | 0 |
| LA11 | 5 × 20 | 7.77 | 16.61 | 0 |
| LA12 | 5 × 20 | 13.57 | 1.54 | 0 |
| LA13 | 5 × 20 | 5.73 | 4.87 | −0.09 |
| LA14 | 5 × 20 | 0.31 | 0 | 0 |
| LA15 | 5 × 20 | 2.84 | 1.42 | −1.12 |
| LA16 | 10 × 10 | 2.08 | 1.71 | −3.07 |
| LA17 | 10 × 10 | 7.11 | 7.11 | −3.79 |
| LA18 | 10 × 10 | 5.73 | 4.46 | 0 |
| LA19 | 10 × 10 | 1.79 | 1.16 | −1.79 |
| LA20 | 10 × 10 | −0.58 | −1.07 | −4.58 |
| Average | | 8.69 | 5.41 | 1.45 |

References

1. Ma, J.; Chen, H.; Zhang, Y.; Guo, H.; Ren, Y.; Mo, R.; Liu, L. A digital twin-driven production management system for production workshop. Int. J. Adv. Manuf. Technol. 2020, 110, 1385–1397.
2. Guo, H.; Zhu, Y.; Zhang, Y.; Ren, Y.; Chen, M.; Zhang, R. A digital twin-based layout optimization method for discrete manufacturing workshop. Int. J. Adv. Manuf. Technol. 2021, 112, 1307–1318.
3. Garey, M.R.; Johnson, D.S.; Sethi, R. The complexity of flowshop and jobshop scheduling. Math. Oper. Res. 1976, 1, 117–129.
4. Renke, L.; Piplani, R.; Toro, C. A Review of Dynamic Scheduling: Context, Techniques and Prospects. In Implementing Industry 4.0; Intelligent Systems Reference Library; Springer, 2021; pp. 229–258.
5. Tay, J.C.; Ho, N.B. Evolving dispatching rules using genetic programming for solving multi-objective flexible job-shop problems. Comput. Ind. Eng. 2008, 54, 453–473.
6. Papakostas, N.; Chryssolouris, G. A scheduling policy for improving tardiness performance. Asian Int. J. Sci. Technol. 2009, 2, 79–89.
7. Cheng, R.; Gen, M.; Tsujimura, Y. A tutorial survey of job-shop scheduling problems using genetic algorithms—I. Representation. Comput. Ind. Eng. 1996, 30, 983–997.
8. Wong, K.P.; Dong, Z.Y. Differential evolution, an alternative approach to evolutionary algorithm. In Proceedings of the 13th International Conference on Intelligent Systems Application to Power Systems, Arlington, VA, USA, 6–10 November 2005; pp. 73–83.
9. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey wolf optimizer. Adv. Eng. Softw. 2014, 69, 46–61.
10. Zhou, R.; Nee, A.; Lee, H. Performance of an ant colony optimisation algorithm in dynamic job shop scheduling problems. Int. J. Prod. Res. 2009, 47, 2903–2920.
11. Wang, Z.; Zhang, J.; Yang, S. An improved particle swarm optimization algorithm for dynamic job shop scheduling problems with random job arrivals. Swarm Evol. Comput. 2019, 51, 100594.
12. Mazyavkina, N.; Sviridov, S.; Ivanov, S.; Burnaev, E. Reinforcement learning for combinatorial optimization: A survey. Comput. Oper. Res. 2021, 134, 105400.
13. Wang, F.Y.; Zhang, J.J.; Zheng, X.; Wang, X.; Yuan, Y.; Dai, X.; Zhang, J.; Yang, L. Where does AlphaGo go: From Church–Turing thesis to AlphaGo thesis and beyond. IEEE/CAA J. Autom. Sin. 2016, 3, 113–120.
14. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
15. Rinciog, A.; Meyer, A. Towards Standardizing Reinforcement Learning Approaches for Stochastic Production Scheduling. arXiv 2021, arXiv:2104.08196.
16. Zhou, T.; Tang, D.; Zhu, H.; Wang, L. Reinforcement learning with composite rewards for production scheduling in a smart factory. IEEE Access 2020, 9, 752–766.
17. Turgut, Y.; Bozdag, C.E. Deep Q-network model for dynamic job shop scheduling problem based on discrete event simulation. In Proceedings of the 2020 Winter Simulation Conference (WSC), Orlando, FL, USA, 14–18 December 2020; pp. 1551–1559.
18. Wang, L.; Hu, X.; Wang, Y.; Xu, S.; Ma, S.; Yang, K.; Liu, Z.; Wang, W. Dynamic job-shop scheduling in smart manufacturing using deep reinforcement learning. Comput. Netw. 2021, 190, 107969.
19. Liu, C.L.; Chang, C.C.; Tseng, C.J. Actor-critic deep reinforcement learning for solving job shop scheduling problems. IEEE Access 2020, 8, 71752–71762.
20. Ghosh, D.; Rahme, J.; Kumar, A.; Zhang, A.; Adams, R.P.; Levine, S. Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–14 December 2021; pp. 25502–25515.
21. Weckman, G.R.; Ganduri, C.V.; Koonce, D.A. A neural network job-shop scheduler. J. Intell. Manuf. 2008, 19, 191–201.
22. Sim, M.H.; Low, M.Y.H.; Chong, C.S.; Shakeri, M. Job Shop Scheduling Problem Neural Network Solver with Dispatching Rules. In Proceedings of the 2020 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), Singapore, 14–17 December 2020; pp. 514–518.
23. Zang, Z.; Wang, W.; Song, Y.; Lu, L.; Li, W.; Wang, Y.; Zhao, Y. Hybrid deep neural network scheduler for job-shop problem based on convolution two-dimensional transformation. Comput. Intell. Neurosci. 2019, 2019, 7172842.
24. Tian, W.; Zhang, H. A dynamic job-shop scheduling model based on deep learning. Adv. Prod. Eng. Manag. 2021, 16, 23–36.
25. Shao, X.; Kim, C.S. Self-Supervised Long-Short Term Memory Network for Solving Complex Job Shop Scheduling Problem. KSII Trans. Internet Inf. Syst. (TIIS) 2021, 15, 2993–3010.
26. Morariu, C.; Borangiu, T. Time series forecasting for dynamic scheduling of manufacturing processes. In Proceedings of the 2018 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR), Cluj-Napoca, Romania, 24–26 May 2018; pp. 1–6.
27. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021; pp. 11106–11115.
28. Magalhães, R.; Martins, M.; Vieira, S.; Santos, F.; Sousa, J. Encoder-Decoder Neural Network Architecture for Solving Job Shop Scheduling Problems Using Reinforcement Learning. In Proceedings of the 2021 IEEE Symposium Series on Computational Intelligence (SSCI), Orlando, FL, USA, 5–7 December 2021; pp. 1–8.
29. Yang, S. Using Attention Mechanism to Solve Job Shop Scheduling Problem. In Proceedings of the 2022 2nd International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 14–16 January 2022; pp. 59–62.
30. Chen, R.; Li, W.; Yang, H. A Deep Reinforcement Learning Framework Based on an Attention Mechanism and Disjunctive Graph Embedding for the Job Shop Scheduling Problem. IEEE Trans. Ind. Inform. 2022, 1.
31. Shakhlevich, N.; Sotskov, Y.N.; Werner, F. Adaptive scheduling algorithm based on mixed graph model. IEE Proc.-Control Theory Appl. 1996, 143, 9–16.
32. Gholami, O.; Sotskov, Y.N. Solving parallel machines job-shop scheduling problems by an adaptive algorithm. Int. J. Prod. Res. 2014, 52, 3888–3904.
33. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
34. Wang, Z.; Zhang, J.; Si, J. Dynamic job shop scheduling problem with new job arrivals: A survey. In Proceedings of the Chinese Intelligent Automation Conference, Zhenjiang, China, 20–22 September 2019; pp. 664–671.
35. Błażewicz, J.; Pesch, E.; Sterna, M. The disjunctive graph machine representation of the job shop scheduling problem. Eur. J. Oper. Res. 2000, 127, 317–331.
36. Zeng, Y.; Liao, Z.; Dai, Y.; Wang, R.; Li, X.; Yuan, B. Hybrid intelligence for dynamic job-shop scheduling with deep reinforcement learning and attention mechanism. arXiv 2022, arXiv:2201.00548.
37. Chen, B.; Matis, T.I. A flexible dispatching rule for minimizing tardiness in job shop scheduling. Int. J. Prod. Econ. 2013, 141, 360–365.
38. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
39. Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450.
40. Wang, Y.; Mohamed, A.; Le, D.; Liu, C.; Xiao, A.; Mahadeokar, J.; Huang, H.; Tjandra, A.; Zhang, X.; Zhang, F.; et al. Transformer-based acoustic modeling for hybrid speech recognition. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 6874–6878.
41. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All you Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
42. Beasley, J.E. OR-Library: Distributing test problems by electronic mail. J. Oper. Res. Soc. 1990, 41, 1069–1072.
43. Fisher, H. Probabilistic Learning Combinations of Local Job-Shop Scheduling Rules; Prentice-Hall: Hoboken, NJ, USA, 1963; pp. 225–251.
44. Lawrence, S. Resource Constrained Project Scheduling: An Experimental Investigation of Heuristic Scheduling Techniques (Supplement); Graduate School of Industrial Administration, Carnegie-Mellon University: Pittsburgh, PA, USA, 1984.
45. Available online: https://github.com/Yunhui1998/TOFA (accessed on 27 April 2022).
46. Hessel, M.; Modayil, J.; Van Hasselt, H.; Schaul, T.; Ostrovski, G.; Dabney, W.; Horgan, D.; Piot, B.; Azar, M.; Silver, D. Rainbow: Combining improvements in deep reinforcement learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 3215–3222.
47. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
48. Zhang, W.; Wen, J.; Zhu, Y.; Hu, Y. Multi-objective scheduling simulation of flexible job-shop based on multi-population genetic algorithm. Int. J. Simul. Model. 2017, 16, 313–321.
49. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
Figure 1. An example of a JSSP instance and its solution represented by disjunctive graphs.
Figure 2. The framework of the feature compression module.
Figure 3. The whole framework of SPP-Transformer.
Figure 4. The average RPD (%) of each method on each instance.
Figure 5. The Gantt chart of FT06.
Figure 6. The Gantt chart of LA01.
Figure 7. The Gantt chart of LA20.
Figure 8. The RPD of each SPP-Transformer on each instance.
Table 1. DJSSP-specific parameters.
Parameter | Description
i | index of jobs, i = 1, 2, …, n
j | index of operations of a job, j = 1, 2, …, h
m | the number of machines
C_max | the maximum completion time of all jobs
s_ij | the start time of operation O_ij
e_ij | the end time of operation O_ij
p_ij | the processing time of operation O_ij
T | the maximum total number of all planned operations at the current time
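As a reading aid, under the usual no-preemption JSSP conventions these symbols combine as follows (our gloss, not an equation reproduced from the paper's model section):

    e_{ij} = s_{ij} + p_{ij}, \qquad C_{\max} = \max_{i,j} e_{ij}

That is, an operation finishes p_ij time units after it starts, and the makespan C_max is the latest end time over all operations.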
Table 2. List of hyperparameters and their values.
Hyperparameter | Value
Number of encoder layers | 6
Number of decoder layers | 6
Number of feed-forward network layers | 1
Random rate f | 0.1
Switch | True
Number of heads | 8
Number of training epochs | 10,000
Number of test epochs | 100
Optimizer | Adam
Mini-batch size | 128
Dropout rate | 0.1
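To make Table 2 concrete, the sketch below instantiates a generic encoder-decoder transformer with the listed layer counts, head count, dropout rate, optimizer, and mini-batch size. It is an illustration only, not the SPP-Transformer implementation: d_model, dim_feedforward, and the learning rate are assumed values that Table 2 does not specify, and PyTorch is used purely as an example framework.

    # Illustrative sketch only (not the authors' implementation): a generic
    # encoder-decoder transformer configured with the values from Table 2.
    # d_model, dim_feedforward, and lr are assumptions not given in the paper.
    import torch
    import torch.nn as nn

    model = nn.Transformer(
        d_model=128,             # assumed feature width
        nhead=8,                 # number of heads (Table 2)
        num_encoder_layers=6,    # number of encoder layers (Table 2)
        num_decoder_layers=6,    # number of decoder layers (Table 2)
        dim_feedforward=512,     # assumed feed-forward network width
        dropout=0.1,             # dropout rate (Table 2)
        batch_first=True,
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam (Table 2); lr assumed

    # A dummy forward pass with the mini-batch size of 128 from Table 2.
    src = torch.randn(128, 10, 128)  # (batch, source length, d_model)
    tgt = torch.randn(128, 1, 128)   # (batch, target length, d_model)
    out = model(src, tgt)            # -> torch.Size([128, 1, 128])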
Table 3. The comparison between each method on makespan and RPD (%). Each method column reports Makespan / RPD.
Instance | Size | Opt | Simple Priority Rules | Composite Dispatching Rules | Meta-Heuristic Methods | RL Methods | SPP-Transformer
FT06 | 6 × 6 | 59 | 72 / 22.46 | 79 / 33.47 | 59 / 0 | 65 / 10.17 | 59 / 0
FT10 | 10 × 10 | 1002 | 1221 / 21.84 | 1169 / 16.62 | 1287 / 28.48 | 1288 / 28.57 | 1180 / 17.76
FT20 | 5 × 20 | 1205 | 1469 / 21.91 | 1383 / 14.79 | 1611 / 33.69 | 1440 / 19.5 | 1231 / 2.16
LA01 | 5 × 10 | 666 | 833 / 25.09 | 806 / 20.98 | 759 / 14.02 | 826 / 24.07 | 699 / 4.95
LA02 | 5 × 10 | 684 | 858 / 25.49 | 825 / 20.58 | 798 / 16.67 | 821 / 20.08 | 716 / 4.68
LA03 | 5 × 10 | 615 | 748 / 21.69 | 758 / 23.29 | 721 / 17.29 | 735 / 19.51 | 651 / 5.85
LA04 | 5 × 10 | 615 | 788 / 28.15 | 747 / 21.47 | 715 / 16.26 | 758 / 23.31 | 670 / 8.94
LA05 | 5 × 10 | 593 | 651 / 9.8 | 634 / 6.83 | 616 / 3.88 | 629 / 6.01 | 593 / 0
LA06 | 5 × 15 | 926 | 1090 / 17.66 | 1124 / 21.41 | 1016 / 9.68 | 1004 / 8.46 | 926 / 0
LA07 | 5 × 15 | 920 | 1079 / 17.28 | 1053 / 14.49 | 1056 / 14.75 | 1020 / 10.87 | 956 / 3.91
LA08 | 5 × 15 | 866 | 991 / 14.42 | 976 / 12.73 | 1014 / 17.13 | 1034 / 19.44 | 863 / −0.35
LA09 | 5 × 15 | 952 | 1119 / 17.54 | 1094 / 14.89 | 1059 / 11.28 | 1039 / 9.17 | 951 / −0.11
LA10 | 5 × 15 | 958 | 1085 / 13.3 | 1081 / 12.84 | 1018 / 6.23 | 1015 / 5.92 | 958 / 0
LA11 | 5 × 20 | 1222 | 1416 / 15.87 | 1430 / 16.98 | 1367 / 11.87 | 1359 / 11.21 | 1222 / 0
LA12 | 5 × 20 | 1039 | 1204 / 15.84 | 1214 / 16.8 | 1170 / 12.61 | 1149 / 10.62 | 1039 / 0
LA13 | 5 × 20 | 1151 | 1272 / 10.48 | 1285 / 11.62 | 1303 / 13.18 | 1247 / 8.34 | 1150 / −0.09
LA14 | 5 × 20 | 1292 | 1451 / 12.34 | 1453 / 12.46 | 1350 / 4.49 | 1299 / 0.54 | 1292 / 0
LA15 | 5 × 20 | 1336 | 1500 / 12.29 | 1437 / 7.54 | 1491 / 11.63 | 1433 / 7.26 | 1321 / −1.12
LA16 | 10 × 10 | 1108 | 1206 / 8.84 | 1200 / 8.28 | 1194 / 7.76 | 1155 / 4.21 | 1074 / −3.07
LA17 | 10 × 10 | 844 | 976 / 15.61 | 994 / 17.74 | 1017 / 20.4 | 927 / 9.83 | 812 / −3.79
LA18 | 10 × 10 | 942 | 1092 / 15.91 | 1031 / 9.45 | 1099 / 16.7 | 1088 / 15.46 | 942 / 0
LA19 | 10 × 10 | 952 | 1017 / 6.79 | 985 / 3.42 | 1130 / 18.7 | 1022 / 7.39 | 935 / −1.79
LA20 | 10 × 10 | 1026 | 1154 / 12.47 | 1120 / 9.11 | 1179 / 14.95 | 1136 / 10.72 | 979 / −4.58
Average |  | 912 | 1056 / 16.66 | 1038 / 15.12 | 1045 / 13.99 | 1021 / 12.64 | 923 / 1.45
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
