1. Introduction
Air traffic planning plays an important role in the planning and the management of an airline [
1,
2,
3]. With the globalization of economies and the rapid development of information and technology, the choice to travel by air and the use of airplanes as a means of transportation are becoming increasingly common today [
2,
4,
5,
6]. Before the coronavirus pandemic, air transport demand had been growing consistently. In 2019, the number of flights boarded by passenger airlines in mainland China reached 4.611 million, representing an increase of 6.1% over the previous year [
7,
8]. In the same year, the Boeing Company estimated that the civil aviation industry will need 790,000 new pilots in the next 20 years, with a scarcity of pilots, meaning that the number of monthly flights is projected to far outweigh the number of crew members. Therefore, it is likely that some flights will not have crew matches, which is an issue that cannot be solved in a short time because it takes an average of seven years for a general pilot to become a captain and costs between 0.2 and 1 million dollars to train an individual to this level [
9,
10]. Although the coronavirus pandemic has had a huge impact on the air transport industry, it is expected that air traffic will recover quickly [
7,
11]. The rapid increase in the demand for airline transportation and air traffic has led airline companies to use their resources more effectively, and the need for operation optimal planning has arisen [
12,
13,
14,
15,
16]. Airline operation plans include a route plan [
17,
18], flight plan [
19,
20], and crew scheduling plan [
21,
22,
23], among which crew scheduling is an important part. The aircraft and crew are two fundamental factors to consider in the operating costs of the airline industry; the flight is the direct means of income of the airline, and crew expenditures represent their second-highest source of spending after fuel costs [
21,
24]. Airline crew scheduling is related to the flight and crew, and it is significant in the management and decisions of commercial airlines. Therefore, how to develop a high-quality crew scheduling methodology with a limited number of crew members but a large number of flights is an important issue [
25,
26,
27,
28]. High-quality crew scheduling means a more rational crew schedule that minimizes total costs and maximizes total profitability, which is particularly important for commercial airlines to optimize their operation costs. For example, a 1% increase in flights meeting crew configurations or 1% decrease in crew scheduling equates to billions of dollars in additional profits in large commercial airlines [
26,
29,
30,
31].
Airline crew scheduling is the process of assigning flights to pilots and flight attendants within a flight plan in accordance with the relevant Civil Aviation Administration (CAA) airworthiness documents and a set of regulations set by the specific airline. The crew scheduling problem concerns the development of a crew schedule for a specific period (generally a month) under the condition that the flight information and the crew information are known; schedules should make clear when and where each crew member will perform what task in which flight so as to minimize the total cost and maximize the total profit of the flight [
32]. In view of crew scarcity, the number of planned monthly flights exceeds the availability of crew members, which could lead to some flights not having a crew match, meaning that crew configuration of the flight cannot be met. In this paper, a high-quality crew scheduling method is developed to ensure as many flights as possible are made to meet the crew configuration. The airline crew scheduling problem is a constraint integer linear programming problem, which is NP-hard [
33,
34]. Because the flight and the crew that satisfy the constraints can be matched, the possible matching combination increases exponentially as the number of flights and crew increases. Finding a viable crew scheduling solution manually is almost impossible, leading to difficulty in finding a cost-effective solution [
25,
35].
In the definition of airline crew scheduling, a flight is considered as the take-off and landing of an aircraft. The tasks performed by the crew are divided into fly and deadhead. Fly refers to the crew that includes the captain or co-captain of the flight, while deadhead refers to the crew who board as passengers to take a flight to another airport to perform fly tasks. Duty consists of a series of flights (fly or deadhead) and intervals of rest. Pairing defines a series of duties and layovers that begin and end at the same base (the airport where the crew is stationed) [
25]. Traditionally, the airline crew scheduling problem is divided into two sub-problems that need to be solved sequentially, namely CPP and CRP [
21,
36,
37]. The goal of CPP is to find a feasible set of pairings that cover a given set of flights at the lowest cost [
26]. CRP is the process of assigning all necessary staff members by using the pairing generated by CPP to create a work schedule for each crew member [
38,
39]. Alternatively, the CPP and the CRP could be integrated into a single problem [
29,
40,
41,
42].
However, traditional crew scheduling models and solvers face the following three problems:
A higher time complexity. An NP-hard large-scale integer programming problem is decomposed into two large-scale integer programming sub-problems, resulting in almost double the time complexity [
43,
44,
45].
Poor generalization and reliance on the feasible sub-solutions generated by initialization. Once faced with the new flight dataset, it is necessary to create a large number of feasible sub-solutions that meet the constraints, which does not facilitate parallel calculation and consumes nearly 80% of the computation time of the total program [
29,
31,
46].
Due to the need to store the large number of feasible sub-solutions and branch-cut tree searches, they usually need to run on servers equipped with professional multi-core and multi-threaded CPUs, such as Intel Xeon processors [
21,
26].
In order to alleviate the problem of time-consuming initialization, we propose an integrated crew scheduling solver based on the parallel genetic algorithm. The randomly generated flight sequence is defined to guarantee that each flight is represented as a 0–1 independent variable. The value of each position in the flight sequence corresponding to this flight successfully matched with the crew is equal to 1; otherwise, it is 0. After four optimization processes of matching the flight with the crew, the operation of removing ‘0’, returning to base, and pairing adjustment to the flight sequence, we obtain the maximum number of flights successfully matched with a crew as the fitness value and the qualified flight sequence. The genetic algorithm [
9,
27,
47] performs evaluation, selection, crossover and mutation for the flight sequence to calculate the optimal flight sequence. MPI is used to divide the population into multiple threads for parallel calculation [
48,
49]. Drift strategy performs information exchange between subpopulations in parallel calculation. We integrate the above algorithms and develop the computational framework PGAF. Based on the parallel genetic algorithm and innovative crew scheduling algorithm, our heuristic solver achieves global random search iterative optimization and parallel calculation to match and optimize flights and crew directly. Our solver avoids spending multiple hours on initialization to generate feasible sub-solutions and improves the generalization with lower time complexity and model complexity. Meanwhile, the parallel optimization efficiency depends on the number of processor cores and running memory, but it can be run on ordinary personal computers and maximizes the use of processing cores and running memory to improve optimization efficiency. Its advantages can be exploited to a greater extent in formal laboratory clusters.
The integrated crew scheduling solver with the computational framework PGAF performs a parallel calculation of innovative the crew scheduling algorithm and genetic algorithm to ensure that the number of flights that meet the crew configuration can be maximized and then improves optimization efficiency. It avoids high time complexity and model complexity, which has high generalization and is suitable for research in common experimental environment. Compared with the traditional genetic algorithm and commercial solvers, such as CPLEX and Gurobi, which are at the core of the traditional crew scheduling model, our solver reduces the computation time by 16.57–85.82% on a small-scale dataset with 206 flights. Meanwhile, a more than 45.75% reduction in calculation time is achieved on a large-scale dataset with 13,954 flights, requiring only 284.59 min. More importantly, it achieves a near−optimal flight matched percentage of 97% and higher crew utilization ratio of nearly 44 flights than traditional models with a DCA [
40] and GENCOL column generation library with the commercial solver CPLEX [
26], which is more suitable in solving crew scarcity and reduces crew costs.
The remainder of this paper is structured as follows.
Section 2 is the literature review on airline crew scheduling.
Section 3 describes the construction of the integrated airline crew scheduling model, including its assumptions, objectives and constraints.
Section 4 introduces the parallel genetic algorithm in crew scheduling, including the innovative crew scheduling algorithm, genetic algorithm, MPI multi-thread parallel calculation, and drift strategy.
Section 5 demonstrates the calculation results, and we provide the conclusions in
Section 6.
3. Airline Crew Scheduling
3.1. Model Assumptions
In order to make airline crew scheduling more applicable and to ensure it is in compliance with the real common needs of different commercial airlines, we make the following assumptions for our crew scheduling model:
Only pilots are considered for the crew of our model. Pilots, flight attendants, flight police, etc., are not interchangeable. However, due to the constraints and commonality of their scheduling, our crew scheduling model is equally applicable.
Crew scheduling only considers the human resources planning, without considering airline hardware resources, such as fleet resources, ground facilities, communication and navigation facilities.
Flight scheduling is determined without considering flight delays in our model. That is, every flight departs and arrives on time according to its schedule. All crew members work according to schedules and scheduling plans, without personal preferences, contingencies, etc.
3.2. Problem Analysis
Based on the feasible path method, is denoted as the flight number sequence of pilot k performing fly and deadhead. is the flight number sequence of pilot k only performing deadhead. is denoted as the flight number at the mth position in . If pilot k performs deadhead on the flight with flight number , then , otherwise . Assuming that each flight only needs one captain and one co-captain to satisfy the crew configuration, there are , where NC is the set of pilot.
For duty, the sequence is introduced to record the position, where each duty starts in . is denoted as the number at the rth position in , which is the position where the rth duty starts in ; therefore, . We note that and we know that the flight with the flight number from to in is a duty of pilot k; therefore, .
When dividing the pairing for crew scheduling, the sequence is introduced to represent the starting position of each pairing in the duty sequence . is denoted as the number at the sth position in , which is the position where the sth pairing starts in ; therefore, and we define , so we find that the flight with the flight number from to in is a pairing of pilot k, which means that .
3.3. Model Description
With the changing attitude toward COVID-19 in various countries and regions, the air transport industry will gradually return to normal levels [
7,
8,
9,
10,
11]. In view of crew scarcity, the number of planned monthly flights exceeds the availability of crew members, which means that it is possible for some flights to not have crew matching; therefore, the necessary crew configuration of the flight cannot be met. In this paper, a high-quality crew scheduling method is developed to ensure that as many flights as possible meet the necessary crew configuration. The crew scheduling problem is transformed into the following optimization problem with constraints:
According to the financial revenue of commercial airlines, the primary goal of airline crew scheduling is to maximize the number of flights that can satisfy the crew configuration in crew scarcity [
52,
53].
is the total number of flights meeting the crew configuration; therefore, our objective function is to maximize
:
The connection time between two adjacent flights of pilot
k to have interval rest not less than
:
The constraint condition of flow conservation.
is used to measure the indegree of airport
i in
of pilot
k;
is used to measure the outdegree of airport
i in
of pilot
k. According to the requirements, no aircraft will stay at airports without base attribute and they should come back to their base, so the airport’s outdegree is equal to the indegree. This ensures all aircraft and pilots eventually return to their bases:
Comparing the start time of two adjacent duties with their dates means they cannot be on the same day.
The rest time between two adjacent duties of pilot
k is not less than
:
The maximum time of each duty of pilot
k is not more than
:
The maximum flight time of each duty, because the time when the pilot performs deadhead belongs to the time of duty, but not the fly time of duty, has to subtract the deadhead time
on duty:
The maximum total time of pairing is not more than
in the crew scheduling:
The interval rest time between two adjacent pairings of pilot
k is not less than
:
For the upper limitation of continuous duty days, the continuous duty can only belong to the same pairing due to the limitation of the minimum rest time between pairings. For , if continuous duty days more than , the crew scheduling plan of this pilot does not meet the requirements.
Table 1 is a summary of the variables and their explanations mentioned in the above mathematical model:
4. The Proposed Airline Crew Scheduling with Parallel Genetic Algorithm
The mathematical model of airline crew scheduling proposed in the
Section 3 is solved in this section. An innovative crew scheduling algorithm is proposed, which contains four optimization processes of matching the flight with the crew, the operation of removing ‘0’, returning to base, and pairing adjustment, to adjust the randomly generated flight sequence. After adjustment, we obtain the maximum number of the flight matched with the crew as the fitness value and the qualified flight sequence. The genetic algorithm performs evaluation, selection, crossover and mutation for the flight sequence to calculate the optimal flight sequence and maximum number of flights that meet the crew configuration. MPI is used for multi-thread parallel calculation to improve the calculation efficiency of the genetic algorithm. The drift strategy leads to the effect of information exchange between subpopulations after mutation in genetic algorithm parallel calculations. Algorithm 1 is the mathematical model for airline crew scheduling using the parallel genetic algorithm.
Algorithm 1 Heuristic crew scheduling algorithm. |
- 1:
Process data to meet input standards - 2:
Generate the initial population - 3:
Divide the subpopulation of thread - 4:
Match crew and flight according to flight sequence - 5:
Adjust the flight sequence - 6:
for… do - 7:
if all flights are matched or then - 8:
return flight sequence with the highest number - 9:
break; - 10:
else - 11:
Calculate the number of flights after adjustment - 12:
Perform selection, crossover, mutation - 13:
Execute Drift Strategy on subpopulation - 14:
Update flight sequence as new population - 15:
end if - 16:
- 17:
end for
|
The following section details the process of solving the model.
Section 4.1 describes the preprocessing of flight data, which standardizes the flight data and crew data.
Section 4.2 provides an introduction to generating the initial randomly generated flight sequence (generating initial populations) and the input of the genetic algorithm.
Section 4.3 explains the process of matching the crew member and the planned flight based on the randomly generated flight sequence.
Section 4.4 shows the three detailed steps used to adjust the randomly generated flight sequence, which ensures that the maximum number of flights meet the crew configuration under each flight sequence.
Section 4.5 provides a description of the parallel genetic algorithm, with the construction of the genetic algorithm, MPI multi-thread parallel calculation, and drift strategy.
4.1. Data Preprocessing
Sort the planned flight data from smallest to largest by departure time. (Note: in this paper, flight rank number refers to the position of the sorted flight in the randomly generated flight sequence and the planned flight data , and flight number refers to the number of each flight in .)
Assume that there are m crew members in , and each crew member is recorded as . For the crew information data , we add the following six kinds of information:
: The list of performing fly and deadhead.
H: The list of only performing deadhead.
: Departure time.
: Arrival time.
: The passing airport for each flight performing fly and deadhead (the first passing airport of each flight is its base).
: Duty list.
4.2. Generate the Initial Randomly Generated Flight Sequence—Initial Population
Suppose there are
n flight in
, and
is the list that records whether each flight takes off, so
is an
n-dimensional vector. We write down the take-off situation of the
ith flight in
as
, and its value is as follows:
So
, and
is the chromosome in the genetic algorithm. If 10
is required as the initial population
in the genetic algorithm, then
; its population size is
= 10.
Figure 1 shows the population of the genetic algorithm.
4.3. Match the Crew and the Flight
We searched
from the beginning. When the
ith position in
is retrieved, if
= 0, the flight does not take off, and crew matching is not required. If
= 1, we will verify whether the flight with
=
i can satisfy the crew configuration, and we also verify whether it is possible to find qualified crew members. The matching is based on the fact that the currently located airport of the crew member is the same as the departure airport of the flight and the crew is allowed to perform the fly task at the departure time point of the flight. If the flight cannot meet the crew configuration requirements, it is not allowed to take off and
is changed to 0. If successfully matched, i.e., if the crew can be found to meet the crew configuration requirements, we select the crew with the lowest work cost from the crew members who meet the requirements, and update the information of the selected crew in
. The pseudo code is shown in Algorithm 2.
Algorithm 2 Match the crew and the flight. |
- 1:
- 2:
fordo - 3:
if then - 4:
- 5:
else - 6:
if (the set of crew meet requirements) then - 7:
- 8:
else - 9:
Select the crew with lowest cost in C - 10:
Update these crews’ information in - 11:
end if - 12:
- 13:
end if - 14:
end for
|
4.4. Adjust the Randomly Generated Flight Sequence
4.4.1. Operation of Removing ‘0’
We traversed
, as adjusted in
Section 4.3. Suppose
= 0, that is, the flight with
=
i does not take off. First of all, we find that the crew can match this flight from
, which is adjusted in
Section 4.3. If we find out the crew requirements, we update their information.
If we cannot find out the crew match for this flight, traverse forward from the
position of
to find the flight from the airport with base attribute to the departure airport of this flight that meets the time requirement. Suppose that the flight with
=
j meets these requirements. Find the crew who can perform fly in the flight with
=
j and perform deadhead in the flight with
=
i from
adjusted in
Section 4.3. If the flight and the crew meeting above requirements are found, update
and change
to 1. If it cannot be found after the traversal is completed, the flight cannot take off.
4.4.2. Handling of Crew Returning to Base
Adjust for each crew member finally needs to return to their base. After successfully matching with through above operations, we also need to meet the constraint for each crew to initially depart from base and eventual return to base. We find that the final airport of some crew members is not the base they belong to in . So we will return the unqualified crew members to their base by performing fly or deadhead tasks. Assuming that does not meet this requirement, the following is how to deal with it.
Solution 1: Find the flight with the closest departure time to the time point when is allowed to perform the task and meet the requirement that the flight that departs from the airport is located to his base in adjusted above. If the sign of this flight is = 0, performs the fly task on this flight and we change the sign of the flight to = 1. If the sign of this flight is = 1, then performs the deadhead task. Finally, update the information of according to the above situation.
Solution 2: If the flight in the above situation does not exist, the crew will not be able to return to his base by performing fly or deadhead. We find the last of in his and delete all information about this flight from , so that this flight does not meet the crew configuration. We modify to 0 in . The information of is used for performing the previous flight. If the final airport of the crew at this state is the base he belongs to, the requirements are met; otherwise, we apply Solution 1.
When using Solution 2, we still cannot solve the problem of
eventually returning to his base; repeat the solution and the worst result is that
does not match any flight. The pseudo code of
Section 4.4 is shown in Algorithm 3.
Algorithm 3 Adjust the randomly generated flight sequence. |
- 1:
- 2:
- 3:
fordo - 4:
if then - 5:
- 6:
else - 7:
if (the set of crew meets requirements) then - 8:
Select the crew with lowest cost in - 9:
Update these crew information in - 10:
else - 11:
- 12:
while do - 13:
if = and then - 14:
Update crew information in - 15:
continue; - 16:
else - 17:
- 18:
end if - 19:
end while - 20:
end if - 21:
end if - 22:
end for - 23:
fordo - 24:
while do - 25:
for do - 26:
if and then - 27:
Update these crews’ information in - 28:
break; - 29:
end if - 30:
if then - 31:
Delete the last flight information of - 32:
end if - 33:
end for - 34:
end while - 35:
- 36:
end for
|
4.4.3. Pairing Adjustment
We adjust the crew that does not meet the constraints of pairing in after the above operations. Each crew has a duty list , which is used to record the information of each duty. According to the constraint that both the departure and arrival airport belong to the base of the crew, is divided into pairings, and each pairing is independent of the other. After dividing into pairings, judge whether each pairing meets the constraints, find out whether the pairings do not meet the constraints and find the other crew members who can complete the pairings. If there are no other crew members who can complete them, this input is illegal.
4.5. Construction of Parallel Genetic Algorithm
threads are allocated in CPU by MPI; each thread is responsible for calculating chromosomes, that is, the total population is divided into subpopulations. Each subpopulation contains solution. On the premise that the number of cores occupied by each thread is an integer, each thread performs parallel computations on allocated cores and memory.
The fitness function of the Genetic Algorithm is
, and
x is the randomly generated flight sequence before processing
. We remember the randomly generated flight sequence after it is adjusted as
, and the fitness function can be expressed as follows:
It is the number of 1 in .
Calculate the fitness function and sort from large to small according to the fitness value for all chromosomes in each subpopulation . Record each maximum value and its chromosome . Note the global maximum fitness value and the corresponding chromosome .
Figure 2 demonstrates the iterative process of fitness function calculation, selection, crossover and mutation in the genetic algorithm. The selection strategy of the genetic algorithm is roulette wheel selection. The probability of each chromosome in the population being selected for genetic operations is proportional to its fitness value. In crossover, we use a single-point crossover method, which obtains two different sub-chromosomes by selecting two chromosomes, dividing them at a randomly selected position, and exchanging their right part. The traditional method of single gene mutation is used in mutation.
After the above four operations are completed, we need to exchange information between each subpopulation. The drift strategy is to replace the first-to-last, second-to-last and
-to-last solution in the subsequent
subpopulations with its optimal solution, respectively.
Figure 3 shows the drift strategy, supposing there are a total of four subpopulations and
.
For each new subpopulation obtained, if the maximum fitness value of the subpopulation is greater than , update and ; otherwise, there is no need to update . It is the same for and .
6. Conclusions
We improve the traditional airline crew scheduling model by integrating the different problems into a single problem, and match the flight and crew that meet the requirements by performing local-scope optimal search and parallel calculation. A series of operations are innovatively proposed to ensure the number of flights that meet the crew configuration can be maximized. The genetic algorithm calculates the maximum number of flights that satisfy the crew configurations in each by iterating through the global search. The drift strategy provides higher efficiency, stability and faster convergence of airline crew scheduling optimization with the information exchange effect between subpopulations. Using multi-thread parallel calculation with MPI greatly reduces the calculation time, improves our solver efficiency and optimizes the high time complexity of generating initial feasible clusters in traditional methods. We integrate the innovative crew scheduling algorithm, genetic algorithm, drift strategy, and MPI into a computational framework PGAF, which is the synthesis of all algorithms and innovations in this paper. In the experimental results of Dataset A and Dataset B, the maximum number of flights after adjustment is significantly improved by 3.40% and 3.28%, respectively, with the genetic algorithm. The parallel genetic algorithm greatly shortens the calculation time and improves our model calculation efficiency by 2.5 to 5 times. Furthermore, the iterative convergence of parallel computation is faster, and the error rate between the average optimal value and maximum optimal value is lower and more stable with drift strategy. Compared with the traditional genetic algorithm, commercial solver CPLEX and Gurobi, the parallel genetic algorithm has significant optimization efficiency in terms of calculation time due to a reduction of 16.57–85.82% for time complexity and the maximum number of flights that meet the crew configuration being much higher. A comparison with the methods of the traditional model shows that our solver has good scalability and stability and that the flights-matched percentage is high and stable, of up to nearly more than 97% both in small and large datasets. Furthermore, fewer crew members are required and the crew utilization ratio is higher by our solver, achieving almost 44 flights per month; therefore, it could be universally applicable in times of crew scarcity. It has lower requirements for the software and hardware, which is more suitable for a general experimental research environment and has a strong generalization ability.
7. Discussion
Traditional airline crew scheduling is divided into two sub-problems and solved sequentially. We improve the time-effectiveness by integrating airline crew scheduling into a single problem, which has been the goal of some scholars in recent years. The experiments show that it alleviates the problem of time-consuming initialization and has a lower time and model complexity, resulting in better generalizability.
The airline crew scheduling problem is a type of resource scheduling problem. Using the solver based on the parallel genetic algorithm for an integrated solution means that the constraints of time costs of CPLEX and the commercial solver Gurobi are overcome so as to obtain a large number of feasible solutions for comparative optimization. Compared with the traditional intelligent optimization method, the genetic algorithm has the characteristics of multivariate coding processing and random optimization iteration, and parallel computing greatly shortens the total calculation time and improves the efficiency of our solver.
For MPI used in parallel calculation, if there is a laboratory cluster with multiple high-configuration hosts sharing a large running memory server, the optimization efficiency will be greatly improved.
We assume that if an airline’s scheduling is relatively stable, then we can use the machine learning method to train a scheduling prediction model that can predict the next most likely flight based on the information of the previous flight, using the scheduling of previous years as training data.