1. Introduction
The passenger load factor (PLF) is the percentage of available seats that are occupied by passengers. Because it reflects how well an airline’s demand and capacity management operations are working, it is commonly used to assess efficiency and effectiveness.
Although there have been several studies on the airline business, only a few have attempted to estimate the load factor parameter that is commonly used to describe an airline’s capacity management performance [1]. The load factor for the American airline industry was calculated using the number of travel agent locations using each computerized reservation system, the average length in miles of all airline flights between city pairs, the number of departures for each carrier, advertising expenses, and the change in vehicle miles. The number of departures for each carrier, as well as the cost of advertising, were found to be significant factors in explaining the load factor for American Airlines [2]. Pegels and Yang tested a domestic air transportation model in the United States. In this model, the load factor is a dependent variable that is determined by available seat miles, total assets, and advertising costs [3]. In another paper, two gravity models are presented for estimating air passenger volume between city pairs. They can be applied to city pairs where no air service is established, where historical data are not available, or for which factors describing the current service level of air transportation are not accessible or accurately predictable. Both models show a good fit to the observed data and are statistically tested and validated [4].
Stochastic models have been established to determine the best-fit load factor trend for Europe’s North Atlantic and mid-Atlantic flights within the Association of European Airlines [5]. The load factor exhibits both periodic and serial correlations, and two distinct models were developed to handle them. The Prais–Winsten technique for serial correlations and dynamic temporal effects for periodic correlations were later integrated into the models [6].
In the study of airlines, many restrictions can be considered when explaining load factors. The inherent uncertainty associated with predicting these restrictions is what makes this challenge so difficult, leaving human decision-makers dependent on expertise to develop efficient air traffic capacity management measures. Some of the variables are the reservation system, the average length in miles between city pairs, and the number of departures for each carrier [7]. The balance between price and demand is taken into account through the gradual sale of tickets to maintain a high load factor, as well as the integrity of network connections [8]. An initial investigation found that passenger load varies significantly depending on parameters such as airline type (full-service vs. low-cost), aircraft type, and destination [9].
In airline operations, knowing the future values of the load factor that has been agreed upon to identify the line is critical. This line is determined by several things. One is that aviation operations are controlled in a loop, and the load factor in the aviation industry is not stable [10]. Another is that airlines use dynamic capacity management to meet and absorb demand and can change the type of aircraft they use at any time [11]. The consumer is strong due to the fierce competition between end-market free-market airlines and the online reservation system (ORS). As a result, dynamic decisions are made based on airway efficiency, which affects the load factor [12,13,14].
As mentioned above, many parameters and issues need to be taken into account when modeling, analyzing, or monitoring the current status of airlines and when predicting the coming days. In the present study, fractional calculus is employed because it provides more flexible mathematical modeling. Fractional calculus, which is more broadly defined as the calculus of integrals and derivatives of any arbitrary real or complex order, arose from a question posed to Gottfried Wilhelm Leibniz (1646–1716) by the French mathematician Marquis de L’Hôpital (1661–1704) in 1695 [15,16,17,18]. L’Hôpital asked what would happen if the order of the derivative were 0.5. Leibniz responded: “This is an apparent paradox from which, one day, useful consequences will be drawn…” [19].
Over the last 50 years, researchers from many disciplines have shown that fractional derivatives and integrals contain crucial information about the systems they investigate. The fractional derivative, in particular, gives useful information for complex problems and for processes with memory and heredity. Fractional calculus is frequently utilized in control theory, mechanics, economics, finance, and electromagnetics [15,16,17,18,19,20,21,22,23,24,25]. AlBaidani’s study, for instance, analyzes the time-fractional Kawahara and modified Kawahara equations, which are crucial for modeling nonlinear water waves and signal transmission. Two new techniques are offered: the homotopy perturbation transform and the Elzaki transform decomposition. To handle nonlinear terms efficiently, fractional derivatives in the Caputo sense are combined with Adomian and He’s polynomials [26]. In a further study, modeling based on fractional-order derivatives is utilized to predict future trends in confirmed cases and deaths from the COVID-19 outbreak in India through October 2020. A mathematical model based on a fractal fractional operator is built to explore the dynamics of the disease epidemic, accounting for various transmission channels and the role of environmental reservoirs [27]. Another study explains how to create mathematical models for claydate constructions that consider the material’s fractal structure to solve the heat conduction problem. The fractal structure is described using fractional-order integro-differentiation [28].
A review article focuses on variable-order fractional differential equations (VO-FDEs), which are utilized to explore dynamics dependent on time, space, or other variables. The study surveys the recent literature on definitions, models, numerical methods, and applications of VO-FDEs [29]. Podlubny’s book provides thorough coverage of essential topics such as special functions, the theory of fractional differentiation, analytical and numerical methods for solving fractional differential equations, and practical applications. This is useful for those seeking more in-depth information about fractional derivatives and fractional-order mathematical models. It serves as a resource for both pure mathematicians and applied scientists, offering inspiration for additional study and the rapid application of concepts from fractional calculus [30]. In this work, we developed a mathematical approach that uses fractional calculus theory to analyze and explore the components that are continuously related to the modeling and forecasting of the passenger load factor. We developed the linear model using information from two years of reservations, group sales data, calendar information, weekly dates, the trend difference between the current year and the previous year, past load factor information, and load factor information from the same period of the previous year. For the Fractional Model-3, DAM, and DAM wd models, we used the reservation rate data of the year 2016 and the least-squares method to develop a continuous curve valid for any time interval. The theory, the numerical results of the proposed theory, and comparisons with various modeling approaches, including machine learning (linear), are discussed in this work.
In forecasting, two main stochastic numerical methods are highly useful: time-series analysis and regression analysis. Nominal variables may have varying values inside each time unit in a time series. Factors are another name for nominal variables, for instance, a location, a country, a profession, a province, per capita income, unemployment rates in various nations, job-based income, and so on. The factor effect cannot be accurately measured using a time-series analysis. On the other hand, regression analysis does not account for the dynamic effect of external variables on internal variables. Panel data and the least-squares method appear to be a good combination [25].
PLF is calculated by dividing the revenue passenger kilometers (RPK) by the available seat kilometers (ASK). Hence, assuming the capacity of an airline remains the same, PLF increases in direct proportion to RPK. Its formula is shown in Equation (1) below:
$$\mathrm{PLF} = \frac{\mathrm{RPK}}{\mathrm{ASK}} \times 100 \qquad (1)$$
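As a minimal sketch of the ratio described above, the following Python snippet computes PLF as a percentage. The function name and the sample RPK and ASK figures are hypothetical and are not taken from the study's data.

```python
def passenger_load_factor(rpk: float, ask: float) -> float:
    """Return PLF as a percentage: RPK divided by ASK, times 100."""
    if ask <= 0:
        raise ValueError("ASK must be positive")
    return 100.0 * rpk / ask


# Hypothetical figures, for illustration only.
plf = passenger_load_factor(rpk=8_000_000.0, ask=10_000_000.0)
print(f"PLF = {plf:.1f}%")

# With ASK held fixed, doubling RPK doubles PLF.
doubled = passenger_load_factor(rpk=4_000_000.0, ask=10_000_000.0) * 2
print(f"Doubling 4,000,000 RPK gives PLF = {doubled:.1f}%")
```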
The objectives of this study are as follows:
Predicting the PLF at boarding time once flights are opened for sale;
Understanding the seasonality;
Revealing the effects of special days;
Revealing the behaviors of previous flights;
Making board-off country and area-based predictions.
This paper is the first to adapt the Fractional Model-3, DAM, and DAM wd models to air PLF and to compare these models. In our research, we determine the optimal fractional order of the derivative for each factor, which allows us to develop effective modeling. We also build a sample application using air PLF market area and flight data to put the mathematical models into practice.
The formulation for the modeling and prediction will be provided in the next section. Applications will be provided in the third section. The results and figures to show the comparison of the proposed models are provided in the fourth part.
3. Numerical Results
A linear regression study was carried out to forecast and model the passenger load factor (PLF) of Turkish Airlines using the 2015–2016 reservations, group sales data, calendar information, weekly dates, the trend difference between the current year and the previous year, and load factor information from the same period of the previous year. To cover all flying points, we chose 19 market areas to display our modeling results. All analyses and reports for linear modeling were developed in the R programming language for portability, and MATLAB was used for the Fractional Model-3, DAM, and DAM wd modeling. In this section, we discuss the numerical results of our four models.
The linear regression model gives meaningful results in terms of the R² values shown in Table 6 below. Given the high variability seen in the aviation sector, R² (the coefficient of determination) values of 60% and above give consolidated results for the 30-day forecasts. The model does not explain the uncertainty very well for flights with 120 days or more remaining before departure.
The model results were obtained using the 2016 data for testing. The mean absolute deviation statistic is used for model validation of the linear regression method. The mean absolute deviation (MAD) is a reliable measure of the variability of a single-variate sample of quantitative data in statistics. It can also refer to a population parameter generated from a sample and approximated by the MAD.
Table 7 shows the validation findings for the machine learning–linear regression method.
The Fractional Model-3, DAM, and DAM wd modeling results were obtained using the 2016 reservation rate data (real load factor data) to find the optimum modeling method. The modeling days, expressed as days to flight, were RES1, RES2, RES3, RES4, RES5, RES6, RES7, RES8, RES9, RES10, RES12, RES15, RES20, RES30, RES45, RES60, RES90, RES120, RES200, and RES340. We used the same M and l constants for the comparison.
For model validation, the mean absolute percentage error (MAPE) is utilized. MAPE measures forecast accuracy as the average absolute deviation of the predicted values from the real values, expressed relative to the real values. Table 8 gives the MAPE and optimum α values for Fractional Model-3. For N = 15, we found the lowest AMAPE value, 0.734.
Then, we used the DAM model and obtained results for different M and l values. We found the lowest AMAPE value, 0.853, for M = 7 and l = 3, as shown in Table 9.
Then, we used the DAM wd model, which is the first-order derivative-added version of the DAM model, and obtained results for different M and l values. We found the lowest AMAPE value, 0.571, for M = 7 and l = 3, as shown in Table 10.
Table 8, Table 9 and Table 10 list the 19 market route areas of Turkish Airlines; for instance, in eastern Africa–Turkey, the departure area is eastern Africa and the arrival area is Turkey. First, Fractional Model-3 was used to compare different N exponent values up to 5, 9, and 15 (N = 5, 9, 15) in Equation (5); the resulting MAPE and α values are shown in Table 8. For N = 15, we found the most accurate results, which are approximately the same as the real reservation rate data. Then, we used another model, DAM (deep assessment methodology), with different M and l values that were chosen arbitrarily, to find the optimum values that correspond to the lowest MAPE values for the reservation rate data. We also used DAM wd, which is the first-order derivative-added version of the DAM model. Here, M defines how many truncations are performed, and l shows how many values are used for modeling each time. For instance, l = 2 means that the data between 340 days to flight and the boarding day were modeled using the previous two values each time. In addition, our MATLAB code automatically varies the α values between 0 and 1, within seconds, to find the optimum modeling results, as shown by the MAPE values in Table 8, Table 9 and Table 10 provided below.
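The models were compared using the mean absolute percentage error (MAPE). The MAPE formulation is shown in Equation (17):
$$\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \qquad (17)$$
where $y_i$ is the real value, $\hat{y}_i$ is the predicted value, and $n$ is the number of compared values.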
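The MAPE comparison described above can be sketched in Python as follows. This is an illustrative implementation of the standard MAPE definition expressed in percent; the function name and the sample load factor values are hypothetical and are not the study's data.

```python
from typing import Sequence


def mape(real: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error between real and predicted values, in percent."""
    if len(real) == 0 or len(real) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(r == 0 for r in real):
        raise ValueError("real values must be non-zero")
    total = sum(abs((r - p) / r) for r, p in zip(real, predicted))
    return 100.0 * total / len(real)


# Hypothetical real and predicted load factors (percent of seats sold).
real_lf = [50.0, 80.0, 100.0]
predicted_lf = [45.0, 84.0, 100.0]
print(f"MAPE = {mape(real_lf, predicted_lf):.2f}%")
```

The guard against zero real values matters in practice because reservation rates far from the departure date can be zero, which would make the percentage error undefined.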
DAM and the fractional model are theoretically identical when the fractional order value in the fractional model equals one. The alpha values in the fractional model were taken between 0.001 and 1 and increased by 0.001. As a result, some results in the DAM model and the fractional model may be the same. The minimal MAPE values were used to determine the alpha values.
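The α scan described above (values from 0.001 to 1 in steps of 0.001, keeping the value with the minimal MAPE) can be sketched as a simple grid search. This is a minimal sketch under stated assumptions: `toy_model` is a hypothetical stand-in curve and is not Fractional Model-3 or Equation (5), and the data are synthetic.

```python
import math
from typing import Callable, List, Sequence, Tuple


def mape(real: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((r - p) / r) for r, p in zip(real, predicted)) / len(real)


def toy_model(days: Sequence[float], alpha: float) -> List[float]:
    """Hypothetical stand-in curve, NOT the paper's Equation (5)."""
    return [d ** alpha for d in days]


def scan_alpha(
    days: Sequence[float],
    real: Sequence[float],
    model: Callable[[Sequence[float], float], List[float]] = toy_model,
    step: float = 0.001,
) -> Tuple[float, float]:
    """Try alpha = step, 2*step, ..., 1 and keep the value with the lowest MAPE."""
    best_alpha, best_err = math.nan, math.inf
    for k in range(1, round(1.0 / step) + 1):
        alpha = k * step
        err = mape(real, model(days, alpha))
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err


# Synthetic data generated by the toy model itself with alpha = 0.5.
days = [1.0, 2.0, 3.0, 4.0, 5.0]
real = toy_model(days, 0.5)
alpha, err = scan_alpha(days, real)
print(f"best alpha = {alpha:.3f}, MAPE = {err:.4f}%")
```

An exhaustive scan is practical here because the grid has only 1000 points; a different model can be passed through the `model` argument without changing the search loop.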
Table 8, Table 9 and Table 10 show the MAPE values of the load factor versus the number of days to flight for the fractional calculus, DAM, and DAM wd models. In Table 8, when the truncation number N in Equation (5) was increased, the MAPE ratio of the fractional models decreased, as expected. The average of the total MAPE (AMAPE) was calculated as
$$\mathrm{AMAPE} = \frac{1}{K} \sum_{k=1}^{K} \mathrm{MAPE}_k$$
where $K$ represents the number of values, which was 19.
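As an illustration of the averaging step, the sketch below computes AMAPE as the arithmetic mean of per-area MAPE values. The function name, the area labels, and the numbers are hypothetical; the study averages over 19 values, whereas three are used here for brevity.

```python
from typing import Dict


def amape(mape_by_area: Dict[str, float]) -> float:
    """Average MAPE: the arithmetic mean of the individual MAPE values."""
    if not mape_by_area:
        raise ValueError("at least one MAPE value is required")
    return sum(mape_by_area.values()) / len(mape_by_area)


# Hypothetical per-area MAPE values, for illustration only.
sample = {"Area A": 0.4, "Area B": 0.6, "Area C": 0.8}
print(f"AMAPE over {len(sample)} areas = {amape(sample):.3f}")
```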
To compare the performance of the DAM model, DAM wd model, and Fractional Model-3, we calculated the MAPE ratio for each. The MAPE of each DAM model was divided by the corresponding Fractional Model-3 MAPE, and the maximum and minimum values were established as benchmarks. The DAM model’s MAPE results were 0.68 to 2.39 times smaller than the fractional model’s MAPE. The DAM wd model’s MAPE results were 0.33 to 1108 times smaller than the DAM model’s MAPE. Approximate values were obtained for Fractional Model-3 (N = 15) and the DAM model (M = 7 and l = 3). When N equaled 9, the fractional model’s AMAPE was 1.407, while the DAM model (M = 3 and l = 3) had an AMAPE of 2.14. For DAM (M = 5 and l = 3), the results were about 1.16 times better than those of the fractional model with N = 9. At N = 5, the fractional model’s AMAPE was 3.951, and the DAM model (M = 3 and l = 3) reported an AMAPE of 2.14. The DAM wd model (M = 7 and l = 3) outperformed the fractional model (N = 15) by about 1.43 times in the highest AMAPE values.
At N = 9 and N = 15, the fractional model surpassed the DAM model with M = 3 and l = 3. The DAM and DAM wd models yielded better outcomes with fewer computational steps; in other words, they achieve the same MAPE with fewer terms than Fractional Model-3. DAM wd (M = 7 and l = 3) had the lowest AMAPE value at 0.571, superior to DAM wd (M = 3 and l = 3) and DAM wd (M = 5 and l = 3). These AMAPE values were lower than those of the DAM and Fractional Model-3 modeling.
To visualize our modeling results, we acquired results using linear regression (machine learning), Fractional Model-3, DAM wd, and DAM, respectively. We compared the results of the four modeling methods with the real reservation rates, as shown in Figure 1, Figure 2, Figure 3 and Figure 4. Until 120 days before the flight, linear regression mostly carries the previous year’s load factor rate over to this year because the reservation pattern is not yet regular. Within 120 days of the flight, as the reservation rates increase, more accurate load factor results can be found. For this reason, the figures show more accurate results from 120 days to the boarding day.
Figure 1, Figure 2, Figure 3 and Figure 4 depict how the load factor value varies with the number of days until the flight in various locations. The last point on the graph, read from right to left, gives the load factor value on the boarding day. Flights to distant market areas such as the North Atlantic, South Atlantic, Far East, and South Africa start to fill up long before departure. Flights from Turkey to Turkey, that is, domestic flights, can easily be seen to fill up mainly close to departure. Flights to different destinations exhibit different patterns, as these observations show. The panel data format is critical for capturing these trends. The vertical lines demonstrate how the reservation days are subject to change. For the South Atlantic–Turkey destination, for example, each booking day has widely varying values. As a result, forecasting will be difficult, and variations will be significant. DAM wd, DAM, and Fractional Model-3 provide similar modeling results. The linear regression method gives more accurate results at 340 days to flight because it uses the previous year’s reservation rates and other factors such as the day of the week. Before 200 days to flight, the four modeling methods show similar differences from the real reservation data. As the departure day approaches and more reservation data are included in the dataset, more accurate modeling results are obtained. Especially in the last 30 days before departure, all methods give closer modeling results, despite cancellations and reservation changes.
For the Northern Africa–Turkey, Turkey–Northern Africa, North Atlantic–Turkey, Turkey–North Atlantic, Middle East–Turkey, and Turkey–Middle East market areas, the Fractional Model-3 (yellow line), DAM (purple line), and DAM wd (green line) results are closer to the real reservation rates than the linear regression (red line) modeling results, as shown in Figure 2.
For the Europe–Turkey and Turkey–Europe market areas, the DAM results (yellow line) are closer to the real reservation rates than the linear regression (red line) modeling results, as shown in Figure 3. In the Far East–Turkey market area after 30 days to flight and for Turkey–Far East after 15 days to flight, the linear regression (red line) modeling results are closer than the DAM modeling results. For Eastern Africa–Turkey after 80 days to flight and for Turkey–Eastern Africa after 18 days to flight, the linear regression (red line) modeling results were closer to the real reservation rates than the DAM modeling results.
For domestic flights within Turkey, the DAM results (yellow line) and DAM wd results (green line) are closer to the real reservation rates than the linear regression (red line) modeling results, especially 30 days before the departure date, as shown in Figure 4.
The DAM wd technique was used in the following phase to model real flight reservation data with three distinct M and l values because it provides the closest modeling results among the four modeling approaches (Table 11). We used the same constants to model market areas. For market areas and sample flights, the modeling outcomes using DAM wd with the same constants are quite similar. The lowest MAPE value, 0.11, was obtained for M = 7 and l = 3 for flight number 45. Predicting flight results is challenging due to varying values on each booking day. Linear regression predominantly relies on the previous year’s load factor until 120 days before the flight, with improved accuracy after that due to rising reservation rates. Different flight markets exhibit distinct patterns; for instance, flights from Turkey to Turkey show increasing demand as the departure day approaches. Based on data from 32 sample flights, the lowest MAPE values for DAM wd modeling are 0.44 for (M = 3 and l = 3), 0.27 for (M = 5 and l = 3), and 0.11 for (M = 7 and l = 3). For African flights, the MAPE values are high because the reservation rates are unstable, especially 200 days before flights. Far East and Europe flights have smaller MAPE values.
4. Conclusions
We propose a method for passenger load factor (PLF) prediction that involves modeling the relationship between the load factor and the number of days remaining until a flight. Our approach yields minimal errors compared to other methods like DAM, fractional calculus, and standard linear methods. We leveraged data from 19 Turkish Airlines market routes and sample flights to construct a continuous curve applicable to any time frame using the least-squares approach, incorporating fractional calculus theory and a linear model. By utilizing historical data from the reservation process development, our method enables the anticipation of future flight values.
The analysis of the DAM wd model using specific coefficients demonstrates superior performance compared to linear techniques, Fractional Model-3, and DAM modeling. In this paper, we compare the effectiveness of the deep assessment methodology (DAM), with and without the first-order derivative, against Fractional Model-3 and the linear regression method in modeling air transport PLF. We conclude that the DAM model with a first-order derivative excels not only in modeling capabilities but also in its ability to predict such data accurately.
Our provided approaches allow the extraction of load factor development parameters for any chosen time using the DAM method and discrete data points for each percentile, resulting in a more precise continuous curve.
Table 8, Table 9 and Table 10 display the load factor modeling outcomes of the Fractional Model-3, DAM, and DAM wd models, respectively, for three different constant values.
Each of the DAM, DAM wd, and Fractional Model-3 models exhibits a decrease in the MAPE ratio as the exponent number in Equation (5) rises. When examining the M and l values for the DAM model, we observe that altering the M value has a greater impact on results than changing the l value.
The lowest AMAPE of the DAM wd model is approximately 0.67 times that of the DAM model (0.571 versus 0.853), and the DAM wd model also outperforms the fractional model and regression analysis. For the DAM wd modeling method, which models real reservation data with varying M and l values, the lowest AMAPE value is 0.571. In practical reservation modeling, each booking day shows considerable variation, making forecasting challenging. Our AMAPE results for DAM wd modeling are as follows: (M = 3 and l = 3) 1.99, (M = 5 and l = 3) 0.948, and (M = 7 and l = 3) 0.571. These values are lower than the DAM modeling results, which are (M = 3 and l = 3) 2.14, (M = 5 and l = 3) 1.27, and (M = 7 and l = 3) 0.853. A detailed analysis reveals that the linear regression method yields less accurate estimations during the modeling phase. We developed a load factor prediction model to enhance the management of Turkish Airlines’ flight capacity. As we opted for a dynamic and conditional model approach, the models are trained daily through parallel processing tools. Over a 30-day forecast, our model predicts flight results with an error range of 0 to 8%, accurately predicting around 64% of the flights.
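The comparison of the reported AMAPE values can be checked with a few lines of Python. The numbers below are the AMAPE values quoted in the text; the dictionary layout and variable names are illustrative assumptions.

```python
# AMAPE values reported in the text, keyed by (M, l).
dam = {(3, 3): 2.14, (5, 3): 1.27, (7, 3): 0.853}
dam_wd = {(3, 3): 1.99, (5, 3): 0.948, (7, 3): 0.571}

# A ratio below 1 means DAM wd has the lower (better) AMAPE.
ratios = {key: dam_wd[key] / dam[key] for key in dam}

for (m, l), ratio in ratios.items():
    print(f"M = {m}, l = {l}: DAM wd / DAM = {ratio:.2f}")
```

For M = 7 and l = 3 the ratio is about 0.67, which matches the figure quoted for the best-performing configuration.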
This study’s limitation revolves around the predictive accuracy of the proposed method beyond a 30-day forecast for reservations, where deviations greater than 10% may occur. This highlights a potential challenge in achieving reliable long-term forecasting accuracy. To address this limitation, future research could focus on refining the modeling approach to enhance its predictive capability for extended forecasting periods. Possible remedies for improvement may include exploring advanced forecasting techniques, incorporating additional variables or factors into the model, proposing the DAM with first and second derivatives, and optimizing the methodology to make it more robust and adaptable to longer-term prediction scenarios. By addressing this limitation, future studies could enhance the practical applicability and reliability of the proposed method in real-world air transport capacity management contexts.
To seamlessly integrate our proposed model into the operational system, flight accuracy must be further optimized. Our forecasting model, including factors like special day effects, significantly outperforms the prior linear model. Revenue and capacity management in air transport are increasingly embracing data-driven optimization methods. Thus, there is a growing need for efficient models grounded in extensive available data to address these traditional challenges. Our ongoing effort focuses on developing an infrastructure to handle sales days separately from reservations. We plan to overcome this limitation by incorporating emerging technologies such as distributed computing or cloud computing to process data. Our emphasis lies in adopting a novel fractional calculus approach to manage vast amounts of data by forecasting reservation intervals and enhancing model performance.
We also intend to incorporate the inflation rate variable into the model as an extension of this study’s scope for an improved multi-input model. Additionally, using the model for daily and monthly evaluations, along with integrating second- and higher-order derivatives, would enhance our DAM wd model. While our current study assessed individual flights and marketing areas, future research will explore how PLF factors of one marketing area or flight affect others and to what extent changes in one area or flight influence another.