1. Introduction
With the introduction of Industry 4.0, production systems have become increasingly complex within the last few years [1]. This is due, on the one hand, to their growing size and, on the other, to the extended influence of automation in manufacturing environments. As a result, the demands on intralogistics in terms of flexibility and internal control are increasing. One major development in material handling technology is the evolution of autonomous mobile robots (AMRs), including automated guided vehicles (AGVs) [2]. AGVs are wheel-based, unmanned vehicles that transport small or large unit loads along the floor of a facility and provide the internal and external transport of materials [3]. Two of the main enabling software technologies behind AGVs are scheduling and routing. Dynamic AGV scheduling models have the potential to minimize delays and reduce production costs by facilitating a minimal production makespan [4].
Scheduling, in general, is a decision-making process that deals with the allocation of resources to tasks over given time periods, with the goal of optimizing one or more objectives [5]. Resources and tasks can take many different forms, and the objectives themselves may range from minimizing the completion time of the last task to minimizing the overall delay, depending on the use case. Scheduling can be either static or dynamic: in static scheduling, all task information is stable and known in advance, while dynamic scheduling considers uncertainties on the shop floor [6]. The majority of scheduling problems are NP-hard, which makes them more difficult to solve than other problems: the computation time required to solve them grows rapidly as the problem size increases [7]. There is a wide variety of approaches for solving scheduling problems, which differ in their solution quality. Traditional scheduling approaches fall into three major areas: heuristic approaches, such as dispatching rules; metaheuristics, such as simulated annealing, tabu search, local search, and genetic algorithms; and exact methods, such as mathematical programming (MP), mixed-integer programming (MIP), linear programming (LP), integer linear programming (ILP), and mixed-integer linear programming (MILP) [8,9]. Another emerging technology for optimizing scheduling problems is constraint programming (CP). Considering the growing complexity of production systems, the dynamic selection of the most appropriate algorithm for a specific scheduling problem can be beneficial, especially under changing framework conditions: depending on the scheduling problem, and in particular on its specific instances, some algorithms may be considerably more efficient than others.
Algorithm selection has long been a subject of research; the algorithm selection problem was introduced in 1976 [10]. Three major characteristics help to categorize a model for algorithm selection: the problem space, the algorithm space, and the performance space. Algorithm selection has become especially relevant in the last decade, as researchers increasingly investigate how to identify the most suitable existing algorithm for solving a problem instead of developing new algorithms [11]. This is because, in some scenarios, a new approach improves the current state of the art only for some problems. Selecting the most suitable algorithm for a particular problem aims to mitigate this issue and has the potential to significantly increase performance in practice [12]. Researchers use algorithm selection to explore the details of the data and to exploit data characteristics in order to choose the most promising algorithm, thereby saving computational time and increasing performance.
For executing the algorithm selection process, researchers use different techniques to select among several algorithms for solving optimization problems, such as integration or hyper-heuristics. Hyper-heuristics select between low-level heuristics for solving optimization problems [13]. They operate at a higher level of abstraction than traditional heuristics, which makes them more flexible and adaptable to different problem domains. Alternatively, integration can be used as part of an algorithm selection process: a set of algorithms and performance metrics is defined, and an integration method calculates the overall performance of each algorithm based on these metrics [14]. In essence, algorithm selection refers to the overall process of choosing an algorithm for a given problem, which can involve a variety of techniques and approaches, including integration, metaheuristics, and hyper-heuristics; machine learning also finds frequent application.
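As a minimal sketch of the integration idea, the following combines two performance metrics into a single weighted score per algorithm; the solver names, metric values, and weights are hypothetical.

```python
def integrate_scores(metrics, weights):
    """Combine several performance metrics into one score per algorithm.

    metrics: {algorithm: {metric_name: value}}, where lower values are better
    weights: {metric_name: weight}
    """
    return {
        algo: sum(weights[m] * vals[m] for m in weights)
        for algo, vals in metrics.items()
    }

# Hypothetical measurements: makespan and solving time for two solvers.
metrics = {
    "cp_optimizer": {"makespan": 120.0, "solve_time": 25.0},
    "or_tools":     {"makespan": 118.0, "solve_time": 30.0},
}
weights = {"makespan": 0.7, "solve_time": 0.3}
scores = integrate_scores(metrics, weights)
best = min(scores, key=scores.get)  # lower combined score wins
```

The weighting reflects how much each metric matters for the use case; with these illustrative numbers, the slightly faster solver outweighs the slightly better makespan.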
Machine learning has received considerable attention in recent years due to its ability to learn patterns in data relating to a specified output. It extracts useful knowledge from the data generated throughout the search process and can support the detection of a problem class by evaluating incoming scheduling problems according to their individual characteristics. Accordingly, machine learning algorithms have been implemented to select algorithms in several problem areas [15].
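Such a selector needs descriptive features of each incoming instance. The following sketch extracts a few simple characteristics from a JSSP-style instance; this feature set is illustrative only, and a real selector would use the characteristics identified during feature engineering.

```python
def instance_features(jobs):
    """Extract simple descriptive features from a scheduling instance.

    jobs: list of jobs, each a list of (machine, duration) operations.
    """
    durations = [d for job in jobs for _, d in job]
    machines = {m for job in jobs for m, _ in job}
    n_ops = len(durations)
    return {
        "num_jobs": len(jobs),
        "num_machines": len(machines),
        "num_operations": n_ops,
        "mean_duration": sum(durations) / n_ops,
        "max_duration": max(durations),
    }

jobs = [[(0, 3), (1, 2)], [(1, 4), (0, 1)]]
feats = instance_features(jobs)
```

A trained classifier would then map such a feature vector to the solver expected to perform best on the instance.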
The motivation lies in the successful application of algorithm selection approaches in other domains. One outstanding example in the field of satisfiability problems (SAT) is the portfolio-based algorithm selector SATzilla [16]. Instead of traditionally choosing the best solver for a given class of instances, this decision is made online, on a per-instance basis. The approach takes a distribution of problem instances and a set of component solvers as input and constructs a portfolio that optimizes a given objective function. Furthermore, Refs. [17,18] introduced algorithm selection approaches for two well-known production planning problems: the job-shop scheduling problem (JSSP) and the flexible job-shop scheduling problem (FJSSP). In [17], the authors proposed a CP model with several solvers for the FJSSP and showed that there was no clear winner over all tested instances. The same result was obtained in [18] by testing several state-of-the-art algorithms for the JSSP. Both works implemented machine learning algorithms to select the best solver for specific instances of the respective scheduling problem class.
This article aimed to develop an algorithm selector that is capable of finding the most suitable scheduling solution strategy with machine learning to determine the best schedule for a particular AGV problem instance. An automated algorithm selector was built from two state-of-the-art solvers for the JSSP and FJSSP, which were transferred to AGV scheduling. Several machine learning models were designed to decide which algorithm to run on an incoming instance. By selecting the algorithm for every given problem instance, a scheduling system can react dynamically to incoming scheduling problem types: no static algorithm is prescribed; instead, each instance is analyzed, and a suitable algorithm is chosen based on the input features. Furthermore, more than one overall problem type is taken into account. This enables a manufacturing system to be flexible in how it issues transport requests: these can already be assigned to an AGV, or the scheduler can allocate the orders to AGVs. The selection of an appropriate algorithm for individual scheduling scenarios thus enables a dynamic scheduling procedure.
To this end, the methodology applied to develop the algorithm selector is first presented in Section 2. The analysis phase covers the selection of the algorithm base and the theoretical concept of the selector. The design phase introduces the problem classification and the mathematical problem definition; the algorithms are modeled, and the dataset for training the machine learning algorithms is designed. Section 3 presents the results of the implementation, including the comparison of scheduling algorithms and the evaluation of machine learning models. A conclusion summarizing the results and the discussion follows in Section 4.
3. Results and Discussion
The proposed algorithm selector and the scheduling algorithms were programmed in Python 3.9 and run on an 11th-generation Intel® Core i7-1165G7 @ 2.80 GHz processor with 16.0 GB of RAM. The time limits of the solvers were set to 30 s and 300 s to ensure realistic calculation times and to test the effect of extending the time limit. The experiments were conducted on all presented instances, and the solvers solved them successfully.
To verify the correctness of the algorithms, the results were compared to those of [65] for the FJSSP benchmark instances and to the benchmark instance collection [66] for the JSSP. The comparison showed similar results, which confirmed the overall correctness of the algorithms. Furthermore, for some instances, CP Optimizer or the OR-Tools solver found better makespan values, illustrating the development of solvers over the years and their improved ability to solve larger instances faster.
3.1. Computational Study on Benchmark Instances with CP Solvers
Before going into detail about the solvers' performance per instance, we present the evaluation of the overall performance. For the JSSP, the solvers achieved optimal results for 27% of the instances with a time limit of 30 s and for 43% of the instances after extending the time limit to 300 s. Regarding the FJSSP, the solvers reached the optimum for 71.3% of instances with a 30 s time limit and for 76.9% with a 300 s time limit, as shown in Figure 4.
After calculating the makespan with the introduced solvers, the results were compared (Table 4 and Table 5). During the computational study, it became clear that there was no overall winner across all problem instances. Compared to the JSSP results, the distribution of wins for FJSSP instances was balanced between both solvers. When the time limit for calculating the makespan was extended to 300 s, the JSSP results differed from the previous experiments with a 30 s time limit, whereas the FJSSP results were similar to those with a 30 s time limit. Moreover, the winning solver sometimes changed for individual instances when the time limit changed: for a given instance, OR-Tools provided better results with the 30 s time limit, while CP Optimizer provided a better makespan with a 300 s time limit. Consequently, the time limit influenced the performance of an algorithm on a specific instance. Nevertheless, this mostly applied to cases where both solvers calculated equal makespan values but with different solving times.
To evaluate the effects of the algorithm selection approach for industrial applications, the assessment of the makespan was important. For 54 JSSP instances with a 30 s solving time, both solvers achieved the same makespan; with a time limit of 300 s, they calculated the same makespan for 72 instances. It was also interesting to evaluate the impact of extending the time limit on the makespan. For JSSP instances, extending the time limit to 300 s achieved savings of 2.5% over all tested instances. Moreover, selecting between the two solvers decreased the makespan by 5.9% for the 30 s time limit and by 2.9% for 300 s (Figure 5). Consequently, the selection has the potential to strongly improve overall production productivity. The smaller decrease for the extended time limit is not surprising, because the longer solving time led to a convergence of solutions: with a 300 s time limit, the solvers achieved equal makespan values more often than with the 30 s solving time. Consequently, especially for use cases where a short calculation time is necessary, for example, dynamically reacting production systems, algorithm selection can be useful and can increase production performance by up to 5.9%.
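Improvement figures of this kind can be obtained with a simple calculation: sum the per-instance best makespan of the two solvers and compare it to the sum obtained by always using a single solver. The following sketch uses hypothetical makespan values, not the benchmark results reported here.

```python
def selection_improvement(makespans_a, makespans_b, baseline="a"):
    """Percentage makespan reduction achieved by per-instance solver
    selection, relative to always using one solver (illustrative)."""
    best = [min(a, b) for a, b in zip(makespans_a, makespans_b)]
    base = makespans_a if baseline == "a" else makespans_b
    return 100.0 * (sum(base) - sum(best)) / sum(base)

# Hypothetical makespans of two solvers on four instances
a = [100, 90, 120, 80]
b = [95, 92, 110, 80]
improvement = selection_improvement(a, b, baseline="a")
```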
Regarding the FJSSP, extending the time limit to 300 s achieved savings of 0.18% over all instances compared to the winning makespan values for 30 s. Selecting between the two solvers reduced the makespan by 6329 time units for the 30 s time limit and by 1245 time units for the 300 s time limit, calculated over all tested instances (Figure 6). This showed that extending the time limit did not substantially improve the overall makespan performance.
Within a 30 s solving time, both solvers achieved the same makespan for 187 instances; with a 300 s solving time, for 219 instances. There were therefore only small differences between first and second place regarding the makespan metrics: 1.31% for the 30 s solving time and a 0.3% makespan reduction for 300 s. Nevertheless, for those instances, the individual solving times deviated widely. Taking the solver with the shorter calculation time saved 77.5% of the calculation time for the 300 s time limit compared to the solving time of the second-choice solver, and 43.1% for the 30 s maximum calculation time.
Considering the JSSP instances, there were fewer equal makespan values. Nevertheless, when comparing the instances with equal calculated makespans, the time savings from selecting between solvers were high as well: the selection saved 50.3% of the calculation time for the 30 s overall time limit and 22.9% for the 300 s overall time limit.
Thus, although the solvers often achieved the same makespan, the differences in calculation time were large, and algorithm selection could save up to 77.5% of the calculation time. Especially in industrial environments, where time is an expensive factor, algorithm selection can therefore save costs by reducing both the computational time and the production makespan.
3.2. Machine Learning Training and Evaluation
After implementing the selected machine learning models, they were trained on the designed datasets. Feature engineering ensured the assessment of feature importance; three different methods were applied: a heatmap of Pearson correlations, the built-in feature importance of tree-based classifiers, and random forest classifiers for evaluating feature importance. After selecting the features with the highest correlation to the output, the machine learning models were trained. The evaluation phase tested the robustness and validity of the models; it involved verifying that the chosen machine learning models met the requirements and solved the initial problem. For classification problems, accuracy, recall, precision, and the F1 score are key performance indicators that build on the notions of true positives, false negatives, true negatives, and false positives. The results of the JSSP dataset evaluation are presented in Table 6 and Table 7.
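The feature-importance assessment described above can be sketched as follows, using synthetic data in place of the actual scheduling datasets: the Pearson correlation of each feature with the label (the heatmap input) and the built-in impurity-based importance of a random forest.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the instance dataset: 200 instances, 3 features;
# the binary "which solver wins" label depends only on feature 0 here.
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

# Pearson correlation of each feature with the label
correlations = [np.corrcoef(X[:, i], y)[0, 1] for i in range(X.shape[1])]

# Impurity-based feature importance from a fitted random forest
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
importances = clf.feature_importances_
```

On this toy data, both methods agree that feature 0 carries the signal; on the real datasets, such rankings guided the selection of input features.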
The tested models for classification problems were implemented successfully. Nevertheless, their performance varied depending on the problem class. For the JSSP dataset with a solving time of 30 s, the neural network using the Keras implementation, decision trees, and KNN provided the best results with respect to the tested performance metrics. For the 300 s solving time, random forest and KNN provided the best accuracy scores. However, the close values of the performance metrics made a definitive choice of the best machine learning technique difficult. Therefore, the selection of the best machine learning model must take the dataset used into account.
The performance metrics for the FJSSP datasets are presented in Table 8 and Table 9.
Compared to the JSSP datasets, the FJSSP datasets achieved overall weaker results in terms of accuracy, precision, and recall. The highest scores for the 30 s solving time were obtained by the neural network with Sklearn and the decision tree, with an accuracy of 75%. For the 300 s solving time dataset, KNN and decision trees achieved the highest performance metrics. Consequently, no single machine learning algorithm provided the best results throughout. The structure of the input data is an important criterion for future prediction accuracy. Therefore, assessing machine learning algorithm performance on the given data structure must be part of implementing the algorithm selection approach for an AGV scheduling use case.
3.3. Validation of Machine Learning Models
An additional measure to test the validity of the machine learning models is cross-validation. During the evaluation, splitting the data into training and test datasets already validated how the models react to unseen data. However, this validation method can lead to sampling bias if one subset includes only some of the existing instance groups. Therefore, cross-validation served as a further method to ensure the validity of the created models. K-fold cross-validation provides more reliability in terms of accuracy and parameter determination; it is a technique commonly used to measure the performance and generalizability of a model and helps to reduce the chance of overfitting [67]. For the presented models, five-fold cross-validation was applied to measure the overall performance (Table 10).
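A minimal sketch of this five-fold cross-validation step with scikit-learn, run on synthetic data in place of the actual instance datasets:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

# Synthetic stand-in data: 150 instances, 4 features, separable-ish labels
X = rng.normal(size=(150, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# cv=5 splits the data into five folds; each fold serves once as test set
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
mean_accuracy = scores.mean()
```

Reporting the mean and spread of the five fold scores, rather than a single train/test split, reduces the risk that one lucky or unlucky split misrepresents the model.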
Furthermore, to validate the model, two new datasets were created from literature data and randomly generated test instances. In this way, the machine learning models were validated on previously unseen data. To properly validate the results of the design phase, CP Optimizer and OR-Tools both solved the given instances, so that the predictions of the machine learning models could be evaluated against the actual outcomes. The comparison of the selector predictions with the actual results of the two solvers showed satisfactory results. For the JSSP, the decision tree achieved an accuracy of 78.5%, the random forest model 70.1%, and the neural network 67.1%. For the FJSSP validation dataset, the accuracy scores of correct predictions were even higher (Table 11).
The testing of the individual machine learning models on the new datasets emphasized the interdependency between datasets and algorithm performance. Overall, the results showed high performance metrics and consequently validated the concept of the algorithm selection approach for AGV scheduling problems.
3.4. Validation of Approach with Disruptions and Rescheduling
To test the approach in a running factory environment, the algorithm selector made predictions on a dataset created from real-world data from the learning factory “Werk 150” at Reutlingen University. The learning factory represents a scooter production that uses AGVs to transport goods between different stations. “Werk 150” is a production company that replicates processes in the areas of product and work system engineering, incoming goods, storage, order picking, production, assembly, and distribution, which are fundamental for the verification of a scheduling system for AGVs [68]. All necessary processes that impact AGV scheduling are present. For delivering parts, three different transport vehicles are available (Figure 7): two AGVs and one AMR, which travel at an average velocity of 0.9 m/s. The transport orders of the vehicles vary across production, depending on the outstanding work step. The transport orders are in JSON format and include information about the operation, the machine ID, the area where the transport takes place, the line, and the planned quantity. This file format complies with VDA5050, a standardization approach in the German automotive industry intended to ensure conflict-free and transparent interfaces between AGV fleets and the central executing software system, independent of the manufacturer.
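A transport order carrying the attributes listed above might be parsed as follows; note that the field names here are illustrative placeholders mirroring that attribute list, not the exact VDA5050 message schema.

```python
import json

# Hypothetical transport order with the attributes mentioned in the text
raw = """
{
  "operation": "transport",
  "machineId": "M-07",
  "area": "assembly",
  "line": "L2",
  "plannedQuantity": 12
}
"""

order = json.loads(raw)
destination = order["machineId"]
quantity = order["plannedQuantity"]
```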
To test the functionality of the algorithm selector, the experiment included different scheduling scenarios. One scenario considered AGVs that were already assigned to tasks, whereas the second offered alternatives for assigning AGVs to orders; in this case, the scheduler allocated the eligible AGVs to individual tasks to minimize the makespan of the entire scenario. Furthermore, disruptions were injected into the system to test the reaction of the approach to disturbances. Each incoming disruption included a timestamp identifying the orders that had already been executed; these were removed from the original scheduling problem so that only the pending orders were rescheduled. In the tested scenario, an AGV failure occurred, and the system replaced the defective AGV with the corresponding substitute. After this adaptation, a new schedule was generated, and the execution of the plan continued, taking the downtime of the respective AGV into account.
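The rescheduling step just described can be sketched as a simple filter-and-reassign operation; the order structure, timestamps, and vehicle names below are hypothetical.

```python
def pending_orders(orders, disruption_time):
    """Keep only orders not yet finished at the disruption timestamp,
    so that only pending work is rescheduled (illustrative sketch)."""
    return [o for o in orders if o["end"] > disruption_time]

def replace_vehicle(orders, failed, substitute):
    """Reassign the orders of a failed AGV to its substitute."""
    for o in orders:
        if o["agv"] == failed:
            o["agv"] = substitute
    return orders

orders = [
    {"id": 1, "agv": "AGV-1", "end": 40},
    {"id": 2, "agv": "AGV-1", "end": 90},
    {"id": 3, "agv": "AGV-2", "end": 120},
]
remaining = pending_orders(orders, disruption_time=60)  # order 1 is done
remaining = replace_vehicle(remaining, failed="AGV-1", substitute="AMR-1")
```

After this filtering and reassignment, the reduced problem is handed back to the selected solver to generate the new schedule.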
4. Conclusions
This research focused on the analysis of AGV scheduling and on potential fields of artificial intelligence for improving the performance of current solutions. The purpose was to explore how artificial intelligence can increase the efficiency of traditional scheduling approaches. By investigating current scheduling algorithms, constraint programming emerged as a promising solution technique, and the suitability of the developed artifact for dynamic scheduling approaches was tested with real-world data from a learning factory environment.
The algorithm selection model resulting from this research helped to save computational time and improve production efficiency by minimizing the makespan. The experimental study showed that the automated selection process can increase production performance by up to 5.9% and can reduce computational time by up to 77.5%. Furthermore, the research supported the statement of other researchers that CP has improved and can solve large instances in reasonable time. Additionally, CP is easy to adapt to different constraints and goals, which makes it dynamically adjustable to industrial requirements. The detailed performance evaluation of the two tested CP solvers showed great potential for selecting between both solutions, depending on the instance. Machine learning algorithms evaluated the performance of existing methods based on the characteristics of a given scheduling instance and achieved up to 88.9% precision in predicting the optimal solver for a given scheduling problem.
In the future, further parameters such as job priorities, real-time AGV location data, and battery constraints could be included. Additionally, considering further objectives could be of great interest, especially regarding how this influences algorithm performance across various problem instances. This work concentrated on the implementation of constraint programming to solve the scheduling problem. Future research could test more algorithms, such as tabu search or genetic algorithms, to compare performance across algorithm categories.