Article

Identifying Spatial–Temporal Characteristics and Significant Factors of Bus Bunching Based on an eGA and DT Model

1
School of Architecture, Harbin Institute of Technology, Shenzhen 518055, China
2
Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(22), 11778; https://doi.org/10.3390/app122211778
Submission received: 14 October 2022 / Revised: 9 November 2022 / Accepted: 14 November 2022 / Published: 19 November 2022
(This article belongs to the Section Transportation and Future Mobility)

Abstract

Bus bunching is a common phenomenon caused by irregular bus headway; it increases passenger waiting time, distributes passenger loads unevenly, and severely reduces the reliability of bus service. This paper clarifies the process of bus bunching formation, analyzes the variation characteristics of bus bunching in a single day, in different types of periods, and at different bus stops, and then identifies twelve potential factors. A hybrid model integrating a genetic algorithm with an elitist preservation strategy (eGA) and a decision tree (DT) is proposed. The eGA part constructs the model framework and transforms factor identification into a problem of selecting the fittest individual from the population, while the DT part evaluates the fitness. Model verification and comparison were conducted on real automatic vehicle location (AVL) data from Shenzhen, China. The results showed that the proposed eGA–DT model outperformed the frequently used single DT and extra tree (ET) models, with at least a 20% reduction in MAE under different bus routes, periods, and bus stops. Six factors, including the sequence of the bus stop, the headway and dwell time at the previous bus stop, and the travel time between bus stops, were identified as having a significant effect on bus bunching, which is of great value for feature selection to improve the accuracy and efficiency of bus bunching prediction and real-time bus dispatching.

1. Introduction

Bus bunching is easily observed in daily life and is usually understood as consecutive buses of a single line arriving at the same bus stop within a small interval. Gf et al. [1] first studied the bus bunching problem and demonstrated that bus vehicles tend to run in pairs during operation and that the instability of bus headway increases this trend. Bus bunching occurs when the headway of two adjacent vehicles at the same stop is less than the regular value; therefore, the identification of bus bunching could be transformed into the evaluation of the bus headway. The evaluation criteria of bus bunching are flexible and vary among scholars. A fixed value such as 30 s or 1 min has been studied as the bus bunching threshold [2,3]. Moreover, the real-time headway is recommended to be larger than 1/2 or 1/4 of the scheduled headway [4,5]. The coefficient of variation of headway, which has the advantage of eliminating the influence of average values and reflecting the deviation from the mean, is also generally used to evaluate the stability of headway [5].
Bus bunching is undesirable for both passengers and operators. Since buses usually leave at a regular interval from the terminus, when the headway between adjacent vehicles on the same route is less than normal, there is a subsequent large interval, causing overcrowding on the front bus and highly increased passenger waiting time at the stop, which have a more negative impact on passengers compared to on-board travel time or average travel time [6,7]. For operators, bus bunching is usually accompanied by overcrowding on the front bus and a lower passenger load on the following bus, resulting in a waste of transport resources and increased operational costs. What is worse is that there is a positive feedback loop in the process of bus bunching. Small delays during operation may increase the number of passengers and raise the boarding and alighting time at the stop, which in turn exacerbate bus bunching [8]. The effects of bus bunching are difficult to eliminate without effective control strategies. Initial delays would be continuously transmitted and expanded, increasing the number of affected bus vehicles, stops, and passengers and finally exacerbating the instability of bus operation along the entire route.
Bus bunching is usually caused by numerous factors during bus traveling and dwelling, but not all these factors are equally important. Identifying significant factors is of vital importance to improve the accuracy of bus bunching estimation, optimize real-time bus dispatching, etc., which greatly help to eliminate the impacts of bus bunching as much as possible. Overall, the main contributions of this paper could be summarized as follows:
  • We analyzed the whole process of bus bunching formation in two parts: bus traveling between bus stops and bus dwelling at the stops. How delays during bus operation make headway deviate from schedule and finally cause bus bunching is clarified.
  • The variation characteristics of bus headway and bus headway stability in a single day, in different types of periods, and at different bus stops were systematically studied temporally and spatially. Twelve potential factors were identified as the inputs of the model.
  • A new hybrid model integrating eGA and DT was proposed to identify significant factors of bus bunching. eGA constructs the model framework and transforms the factor identification into a problem of selecting the fittest individual from the population. DT is used to evaluate the fitness of individuals.
  • The eGA–DT model was evaluated under different routes, periods, and bus stops using real AVL data collected from bus vehicles in Shenzhen, China. The significant factors were identified. The results showed that the proposed hybrid model outperformed the single DT and ET models.
This paper is organized as follows: Section 2 presents the literature review. Section 3 describes the process of bus bunching formation in theory, the proposed methods for the analysis of spatial–temporal characteristics, and the model construction for factor identification. Section 4 presents the experimental results. Section 5 contains the discussion and conclusions of the paper.

2. Literature Review

Vehicles equipped with GPS devices can dynamically provide AVL data containing the real-time location, speed, and time of buses, which have been widely used in the prediction of bus arrival or travel time on the route [9,10,11], bus dwell time at the stop [12], the number of boarding passengers [13], the condition of traffic flow [14,15], etc. Spatial–temporal analysis methods are utilized to figure out where bus bunching happens and explore potential influential factors with the help of AVL data. Statistical methods such as clustering algorithms and sensitivity analysis are usually used to analyze spatial–temporal characteristics, measure the correlations, and group the occurrence of bus bunching into specific patterns [16]. Iliopoulou et al. applied the spatial–temporal clustering algorithm to identify hot spots where bus bunching more easily occurs [17]. Feng et al. identified and visualized bus bunching events temporally and spatially to summarize seven attributes for the front bus and six attributes for the following bus, indicating that bus bunching is due to at-stop variation more than between-stop variation [18]. Shaji et al. held the opinion that the travel time in particular sections on a roadway follows similar patterns, and spatial clustering algorithms were utilized to identify the group sections with similar characteristics [19].
Furthermore, bus operation is easily influenced by external disturbances, and there are massive and complex potential factors causing bus bunching. For example, Gf et al. first proposed that the impact of bus bunching is not only affected by the initial bus delay but also by passenger arrival rates and bus occupancy rates along the route [1]. Fonzone et al. argued that unstable traffic conditions and high passenger demands are potential reasons why bus bunching occurs [20]. Zhang et al. held the opinion that the influence of the previous bus is the most serious cause and that bus bunching is affected by the front bus more than the following bus [21].
Time series models, regression models, DT, GA, etc. are widely used to model and predict bus travel or arrival time [10,22,23,24] and have been gradually applied to bus bunching problems. Rashidi et al. utilized decision tree and genetic algorithm methods separately to model bus bunching, concluded that the decision tree performed better in solving non-linear and complex problems, and determined five factors, namely day of the week, number of left turns, distance to next stop, bus stop closeness, and schedule deviation, to be the most influential factors [25]. Arriagada et al. built regression models for bus bunching and classified influencing factors into three groups by importance, demonstrating that the sequence of the bus stop, scheduled frequency, and bus dispatch headways are the most significant factors [26]. Chioni et al. applied the geographically weighted regression model to bus bunching and took bus stops and network attributes into account, emphasizing the importance of spatial structures [27]. Iliopoulou et al. employed duration models for bus bunching events, and results showed that temporal factors and operational characteristics of the initial bus stop had a great impact on the duration of bus bunching [28].
Generally, a planned schedule is adopted for bus routes with low frequency [29,30]. However, bus routes with high frequency are much more easily affected by abnormal traffic conditions, and the deviation from the planned schedule is more likely to result in bus bunching. Hence, dynamic headway-holding and real-time bus dispatching strategies are proposed based on real-time bus headway and traffic demands. He et al. removed the stochastic factors influencing the headway and proposed a target-headway-based holding strategy to stabilize an unstable bus line [31]. Berrebi et al. proposed prediction-based methods to achieve the best compromise between headway regularity and holding time [32]. He utilized the arrival time of the current bus and the preceding bus and proposed a strategy that adaptively determines the holding time and bus cruising speed. Petit et al. proposed a bus substitution strategy by using standby buses to take over service from any early or late buses to contain deviations from the schedule [33]. Dai et al. proposed a predictive headway-based bus holding strategy, and factors related to bus running times and passenger demands were considered in the model [34]. In general, the reliability of real-time bus dispatching mainly depends on the accuracy of the prediction of bus arrival time or possible bus bunching points, which is highly affected by the input data of the model. As a result, identifying significant factors as the inputs of the model is important for improving its efficiency and accuracy.
Overall, the analysis of spatial–temporal characteristics of bus bunching is generally statistical, and how the characteristics impact the process of bus bunching has still not been adequately studied spatially and temporally. Although existing works have proposed numerous potential factors, not all of these are equally important; therefore, identifying the significant factors is still a great challenge. Single models are widely used to model bus bunching, while the performance of hybrid models has been so far overlooked.
To solve these problems, we analyzed the spatial–temporal characteristics of bus bunching on the basis of the process of bus bunching formation to find potential factors. A hybrid model integrating eGA and DT was proposed to improve the model performance and identify significant factors of bus bunching, which is of great importance for finding efficient inputs for future bus bunching prediction and real-time bus dispatching.

3. Methodology

3.1. Bus Bunching Formation Process

Bus vehicles usually run along a fixed route and dwell at fixed stops, so the process of bus operation could be divided into two parts: travel between bus stops and dwell at fixed stops, as shown in Figure 1. Solid lines represent the actual running track of a bus, and the dotted lines represent the ideal running track without any external disturbances. Assuming that passengers arrive at a certain rate, the longer travel time of bus 1 between stop 1 and stop 2 would significantly increase the time for passengers gathering at stop 2, leading to an increase in the dwell time of bus 1 at stop 2. When bus 2 arrives at stop 2, it serves fewer passengers because of the shorter interval with bus 1, and the dwell time of bus 2 would be remarkably decreased. Bus 2 leaves the stop ahead of schedule and keeps chasing bus 1 until they bunch together. Similarly, if bus 1 delays at the stop because of the unexpectedly large passenger flow or vehicle queue, it departs behind schedule, and bus 2 would gradually operate ahead of schedule, eventually causing bus bunching.
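The feedback described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the model used in this paper: dwell time is assumed to be proportional to the gap since the previous bus departed, and every parameter value (`sched_headway`, `link_time`, `dwell_per_gap`, `initial_delay`) is hypothetical.

```python
def simulate_headways(n_stops=8, sched_headway=300.0, link_time=120.0,
                      dwell_per_gap=0.1, initial_delay=60.0):
    """Return the departure headway (s) between bus 1 and bus 2 at each stop.

    Toy assumption: dwell time = dwell_per_gap * (arrival time minus the
    departure time of the previous bus at that stop).
    """
    c = dwell_per_gap
    base_dwell = c * sched_headway / (1.0 + c)  # dwell of an undisturbed bus
    # Bus 0 is an undisturbed leader that departs stop j at j * (link + dwell).
    prev_dep = [j * (link_time + base_dwell) for j in range(n_stops + 1)]
    deps_by_bus = []
    for bus in (1, 2):
        dep = [bus * sched_headway]  # on-time departure from the first stop
        for j in range(1, n_stops + 1):
            # Only bus 1 is delayed, and only on the first link.
            extra = initial_delay if (bus == 1 and j == 1) else 0.0
            arrival = dep[j - 1] + link_time + extra
            gap = max(arrival - prev_dep[j], 0.0)  # passenger accumulation time
            dep.append(arrival + c * gap)
        deps_by_bus.append(dep)
        prev_dep = dep
    bus1, bus2 = deps_by_bus
    return [max(b2 - b1, 0.0) for b1, b2 in zip(bus1[1:], bus2[1:])]
```

With these illustrative defaults, a single 60 s delay of bus 1 on the first link makes the headway to bus 2 shrink at every subsequent stop, which is the chasing behaviour sketched in Figure 1.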

3.2. Analysis of Spatial–Temporal Characteristics

3.2.1. Analysis of Spatial Characteristics

In the same temporal dimension, the variation of bus headway at different bus stops was analyzed to judge if the sequence of the bus stop affects bus bunching and to demonstrate the transitivity of the bus bunching phenomenon.

3.2.2. Analysis of Temporal Characteristics

The analysis of bus headway in the temporal dimension specifically included the following aspects: (1) daily variation, where one day is divided by hour from 6:00 to 24:00 to analyze the trend in each hour of the day; and (2) different types of periods, where according to the characteristics of traffic conditions and the passenger flow, time is divided into four types: morning peak time on weekdays (7:00–9:00), evening peak time on weekdays (17:30–19:30), off-peak time on weekdays, and rest days.
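The four period types can be assigned to a timestamp as in the sketch below. It assumes that rest days are Saturdays and Sundays (public holidays are not handled) and that the peak windows are half-open intervals; the function name and labels are illustrative only.

```python
from datetime import datetime, time


def period_type(ts: datetime) -> str:
    """Classify a timestamp into one of the four period types."""
    if ts.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
        return "rest_day"
    t = ts.time()
    if time(7, 0) <= t < time(9, 0):
        return "morning_peak"
    if time(17, 30) <= t < time(19, 30):
        return "evening_peak"
    return "off_peak"
```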

3.3. eGA–DT Model for Factor Identification

3.3.1. Overall Model Framework

The framework of the eGA–DT model is shown in Figure 2. GA is an adaptive, evolutionary global optimization algorithm that simulates the process of biological evolution. Each potential factor is encoded as a gene, and different groups of factors are regarded as chromosomes. In this way, significant factor identification could be transformed into a problem of selecting the best individual from the population. DT was utilized to evaluate the fitness of each individual so that the individual with the greatest fitness could be selected. In addition, to prevent the best individual from being destroyed by crossover and mutation, the strategy of elitist preservation was used during the chromosome selection, making sure that the best individual would be directly copied to the next generation and the worst individual would be eliminated.

3.3.2. Construction of eGA Model

The construction of the GA model combined with the strategy of elitist preservation is described as follows.
1. Encode factors with a binary digit
Numerous potential factors may cause bus bunching, and each factor could be regarded as a binary gene, which is the basic unit of a chromosome. A binary digit is utilized to indicate if the factor is selected. Code 1 means the factor is selected, while 0 means unselected. Different combinations of the binary genes would create a large number of chromosomes, representing different groups of selected factors.
2. Initialize the population
To generate the initial population, the length of the chromosome $L$ and the number of initial chromosomes $N$ should be determined. The initial population may not contain the best individual, so the following steps for crossover, mutation, and fitness selection are extremely important.
3. Evaluate the fitness of each individual
Individuals with better fitness have a greater chance of being selected, so the fitness selection model plays a decisive role in the results. The DT model is adopted to assign the fitness value. In addition, different methods of data division also affect the result of selection. Thus, 5-fold cross-validation is adopted to reduce the dependence on data sets and ensure that the model generalizes well to unseen data. Mean absolute error (MAE) is selected to verify the results. The fitness of an individual could be expressed as follows:
$$F_i = \frac{1}{\frac{1}{5}\sum_{i=1}^{5} S_i}$$
where $S_i$ is the MAE of each fold of the cross-validation and $i$ ranges from 1 to 5.
4. Select the chromosome
The probability of the individual selection is proportional to its fitness value, and chromosomes with better fitness are more likely to be selected into the next generation. The probability of each individual being selected is as follows:
$$P_i = \frac{F_i}{\sum_{i=1}^{n} F_i}$$
where $F_i$ is the fitness value of each individual and $n$ is the total number of individuals.
5. Elitist preservation
To keep crossover and mutation from destroying the fittest individual, the strategy of elitist preservation is used during selection. According to the fitness evaluation of each individual as discussed before, the individual with the highest fitness $F_i$ is regarded as the best individual in the population and is directly copied to the next generation without crossover and mutation. Meanwhile, the individual with the smallest fitness is eliminated to keep the population size constant.
6. Chromosome crossover
Genes from different chromosomes are randomly crossed to generate new chromosomes. Set the probability of the crossover as $P_c$; the number of chromosomes participating in the crossover is then $P_c \times N$.
7. Chromosome mutation
To keep the model from becoming trapped in a local optimum, variant genes are randomly selected from chromosomes to replace original genes and form the new population. Set the probability of the chromosome mutation as $P_m$.
8. Identify significant factors
After a certain number of iterations, the value of the fitness remains unchanged, and the algorithm ends. The best individual could be selected, and significant factors of bus bunching are also identified.
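The eight steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the implementation used in this paper: single-point crossover applied pairwise with probability `p_c`, bit-flip mutation with per-gene probability `p_m`, and roulette selection via `random.choices` are choices made here for brevity, and `fitness_fn` is a hypothetical callback that must return a positive fitness (in the paper this role is played by the DT evaluation).

```python
import random


def fitness_from_fold_maes(fold_maes):
    """Fitness as the reciprocal of the mean MAE over the cross-validation folds."""
    return 1.0 / (sum(fold_maes) / len(fold_maes))


def run_ega(fitness_fn, length, pop_size=20, p_c=0.6, p_m=0.1,
            generations=50, seed=0):
    """Return (best chromosome, best fitness seen at the start of each generation)."""
    rng = random.Random(seed)
    # Steps 1-2: binary chromosomes, random initial population.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        # Step 3: evaluate fitness (must be positive for roulette selection).
        fits = [fitness_fn(ch) for ch in pop]
        best_history.append(max(fits))
        # Step 5: keep a copy of the elite and drop the worst individual.
        elite = pop[fits.index(max(fits))][:]
        worst = fits.index(min(fits))
        pool = [ch for k, ch in enumerate(pop) if k != worst]
        pool_fits = [f for k, f in enumerate(fits) if k != worst]
        # Step 4: roulette selection, probability proportional to fitness.
        children = [rng.choices(pool, weights=pool_fits, k=1)[0][:]
                    for _ in range(pop_size - 1)]
        # Step 6: single-point crossover on consecutive pairs.
        for a, b in zip(children[0::2], children[1::2]):
            if rng.random() < p_c:
                cut = rng.randint(1, length - 1)
                a[cut:], b[cut:] = b[cut:], a[cut:]
        # Step 7: bit-flip mutation.
        for child in children:
            for g in range(length):
                if rng.random() < p_m:
                    child[g] = 1 - child[g]
        pop = [elite] + children  # the elite bypasses crossover and mutation
    # Step 8: report the fittest chromosome of the final population.
    fits = [fitness_fn(ch) for ch in pop]
    return pop[fits.index(max(fits))], best_history
```

Because the elite is copied unchanged into the next generation, the recorded best fitness can never decrease when `fitness_fn` is deterministic, which is the purpose of the elitist preservation step.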

3.3.3. Fitness Evaluation Based on DT

Feature vectors $x = \{x_1, x_2, \ldots, x_i\}$ and the target value $y = \{y_1, y_2, \ldots, y_i\}$ are selected as the inputs of the DT model to evaluate the fitness.
1. Initial spatial division
Feature vectors $x$ are initially divided into two subspaces $r = \{r_1, r_2\}$ in space based on the selected feature $x_i$ and cut-off point $s$, which could be described as:
$$r_1(i, s) = \{x \mid x_i < s\}$$
$$r_2(i, s) = \{x \mid x_i \ge s\}$$
2. Prediction in subspaces
The value of prediction in each subspace is expressed as the average value of total vectors within the space:
$$y_{r_i} = \frac{1}{n}\sum_{i \in r_i} y_i$$
where $y_i$ is the target value of the feature vector $x_i$ in range $r_i$ and $n$ is the total number of vectors in the range $r_i$.
3. Spatial division based on error evaluation
Feature vectors $x$ are divided into several non-overlapping ranges $r = \{r_1, r_2, \ldots, r_i\}$ in space based on the feature $x_i$ and cut-off point $s$ to minimize the loss of regression. The evaluation criterion of the regression is described as:
$$loss = \sum_{x \in r_1(i, s)} (y_i - y_{r_1})^2 + \sum_{x \in r_2(i, s)} (y_i - y_{r_2})^2$$
4. Decision tree pruning
If branching is unrestricted, the model becomes extremely complex and prone to overfitting. Therefore, pruning strategies are adopted to avoid overfitting: the decision tree is constructed, and nodes carrying little information are deleted or collapsed.
5. Fitness output
After each iteration of GA, one chromosome represents a selection of features and the inputs of the DT model. The fitness of each chromosome is obtained by the DT model, which is an important basis for the following chromosome selection.
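The split criterion in steps 1 to 3 can be made concrete with a small, dependency-free sketch. It searches a single feature for the cut-off $s$ that minimizes the summed squared error of the two regions, with each region predicting its mean target value. The function name and the exhaustive search over observed values are assumptions made for illustration; a full regression tree would apply this search recursively over all features and then prune.

```python
def best_split(xs, ys):
    """Return (cut_off, loss) for the best two-region split on one feature.

    Region 1 holds samples with x < s, region 2 holds samples with x >= s.
    Returns None when the feature has fewer than two distinct values.
    """
    def sse(values):
        # Sum of squared deviations from the region mean (the region prediction).
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    # Skipping the smallest value keeps both regions non-empty.
    for s in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < s]
        right = [y for x, y in zip(xs, ys) if x >= s]
        loss = sse(left) + sse(right)
        if best is None or loss < best[1]:
            best = (s, loss)
    return best
```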

3.3.4. Parameter Calibration

1. The length of chromosomes $L$
Each chromosome corresponds to a set of factors; thus, $L$ is the number of features.
2. The number of initial chromosomes $N$
The value of $N$ strongly affects the search ability and efficiency of the algorithm. A large initial population increases the diversity of the population but also raises the computational burden and decreases the efficiency of the algorithm. However, if the initial population is too small, the probability of containing the best chromosomes decreases, reducing the search ability of the algorithm. The initial population $N$ is generally within the range of 20 to 100.
3. The probability of the chromosome crossover $P_c$
If $P_c$ is too small, it is difficult to complete the forward search. On the contrary, if it is too large, it would destroy the structure of high-fitness individuals. Empirically, the value of $P_c$ usually ranges from 0.4 to 0.99.
4. The probability of the chromosome mutation $P_m$
A large $P_m$ makes it easier to destroy the current optimal individual, while a small value may make it difficult to generate new genes and escape from a local optimum. The value of $P_m$ usually ranges from 0.01 to 0.1.

3.3.5. Evaluation Criteria

Three evaluation indexes, namely mean absolute error ($MAE$), root mean square error ($RMSE$), and coefficient of determination ($R^2$), were utilized to evaluate the model results. We set $N$ as the number of test samples, $y_i$ as the real value of the sample, $y_i'$ as the predicted value, and $\hat{y}_i$ as the average value of the sample. The specific formulae for $MAE$, $RMSE$, and $R^2$ are shown below.
$$MAE = \frac{1}{N}\sum_{i=1}^{N} |y_i - y_i'|$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{N} (y_i - y_i')^2}{N - 1}}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - y_i')^2}{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}$$
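For reference, the three indexes can be computed with the standard library alone, as in the sketch below. The function names are illustrative, and the RMSE divides by N − 1 to follow the formula given here; note that many libraries divide by N instead.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error with the N - 1 denominator used in the text."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / (len(y_true) - 1))


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((mean - t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```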

4. Results

4.1. Data Description

In this study, AVL data from eight bus routes in Shenzhen, covering nearly two months (from 1 April to 26 May 2019), were selected as the raw data. To fully reflect the characteristics of bus bunching and make the analysis more representative, four different types of bus routes were selected, namely the regular route, main route, express route, and branch route, as shown in Figure 3 and Table 1. The selected lines evenly cover different kinds of land use, such as commercial and residential areas. Furthermore, none of the selected routes passed through construction areas; thus, the bus operation was not affected by road and building construction. These principles ensured that the selected routes could reflect the general characteristics of bus operation.
The AVL data mainly includes the following eleven fields, and data samples are shown in Table 2.
  • TripId: number of trips for each vehicle;
  • LineId: number of bus routes;
  • LineDir: direction of bus routes, including up direction and down direction;
  • BusNum: unique ID number of each vehicle;
  • StationName: name of the bus stop;
  • StationIndex: serial number of the bus stop;
  • StationId: unique ID number of the bus stop;
  • ArrTime: the time when the bus arrives at the bus stop (accurate to seconds);
  • LeaTime: the time when the bus leaves the bus stop (accurate to seconds);
  • PreStationId: unique ID number of the previous bus stop;
  • NextStationId: unique ID number of the next bus stop.
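A minimal sketch of how dwell time and headway could be derived from these fields is given below. The class and function names are hypothetical, only four of the eleven fields are modelled, and headway is assumed here to be the gap between consecutive arrivals at the same stop.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StopVisit:
    """One AVL record reduced to the fields needed for headway analysis."""
    bus_num: str         # BusNum
    station_index: int   # StationIndex
    arr_time: datetime   # ArrTime
    lea_time: datetime   # LeaTime

    @property
    def dwell_seconds(self) -> float:
        """Dwell time at the stop in seconds (LeaTime minus ArrTime)."""
        return (self.lea_time - self.arr_time).total_seconds()


def arrival_headways(visits, station_index):
    """Headways in seconds between consecutive arrivals at one stop."""
    arrivals = sorted(v.arr_time for v in visits
                      if v.station_index == station_index)
    return [(b - a).total_seconds() for a, b in zip(arrivals, arrivals[1:])]
```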

4.2. Bus Bunching Process

Route 113 is selected as an example to reflect the process of bus operation during peak time (7:00 to 9:00) and off-peak time (13:00 to 15:00), as observed in Figure 4. The results in Figure 4a clearly indicate that the adjacent vehicles BS37580D and BS06015D bunched together at stop 25 with 99 s headway, roughly a quarter of the departure interval of 346 s. Similarly, BS02935D and BS05185D arrived at stop 25 at nearly the same time, with 13 s headway.
The phenomenon of “large and small intervals” could be clearly seen as well. As the buses departed from the terminus at a fixed interval, bus bunching at stop 25 resulted in a large interval before and behind the bunched pair, remarkably increasing the passenger waiting time. Vehicle BS06015D may serve fewer passengers because of the small interval with vehicle BS37580D; thus, its transport capacity could not be fully utilized in the case of no overtaking. On the contrary, affected by the large interval between BS06015D and BS02935D, BS02935D may face more severe operational pressure.
Figure 4b shows the same principles, but the fluctuation of bus headway is relatively gentle because of the reduced passenger demand and the lengthened departure interval. However, the phenomenon of “large and small intervals” still exists.
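The headways reported for stop 25 can be checked against the fraction-of-scheduled-headway criteria mentioned in the Introduction. The helper below is an illustrative sketch; its name and the default fraction are assumptions, not the detection rule used in this paper.

```python
def is_bunched(headway_s, scheduled_headway_s, fraction=0.5):
    """Flag bunching when the observed headway falls below a fraction
    of the scheduled headway."""
    return headway_s < fraction * scheduled_headway_s
```

With the 346 s departure interval, the 99 s headway counts as bunching under the 1/2 criterion (threshold 173 s) but not under the stricter 1/4 criterion (threshold 86.5 s), whereas the 13 s headway counts as bunching under both.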

4.3. Spatial–Temporal Characteristics

4.3.1. Analysis of Bus Headway

1. Daily variation
Figure 5 shows the daily variation of bus headway at different bus stops at each hour of a day. Bus stops were selected at equal intervals in a sequence. A conclusion could be drawn that a small headway usually occurs from 7:00 to 10:00 and 17:00 to 19:00, while a large headway frequently occurs from 13:00 to 15:00 and 21:00 to 24:00. Specifically, the minimum headway was 277 s, which occurred from 7:00 to 8:00 at stop 10, and the maximum headway appeared from 23:00 to 24:00 at stop 10 with 1383 s.
The result also demonstrates that the small headway of stops 10, 20, and 30 occurred from 16:00 to 17:00, while the later bus stops such as stops 40 and 50 had a small headway from 18:00 to 19:00, proving that the small headway caused by former bus stops is continuously transmitted to the following buses.
2. Variation in different time types
The average bus headway in different time types is presented in Figure 6, including four time periods, namely morning peak on weekdays (7:00–9:00), evening peak on weekdays (17:30–19:30), off-peak time on weekdays, and rest days. The headway during the morning peak on weekdays was the most unstable, followed by the evening peak on weekdays. The most stable headway usually occurred on rest days, with an average value of 450 s. Bus headway fluctuated relatively gently during rest days and showed a flat peak on weekdays compared with the morning and evening peak times.

4.3.2. Analysis of Bus Headway Stability

The coefficient of variation of headway, which is generally used to evaluate the stability of headway, is calculated as follows [5]:
$$Cov = \frac{\text{standard deviation of headway}}{\text{mean scheduled headway}}$$
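As a small sketch, the coefficient of variation can be computed as follows. The use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not state whether the population or sample form is intended.

```python
import statistics


def headway_cov(observed_headways, mean_scheduled_headway):
    """Standard deviation of observed headways over the mean scheduled headway."""
    return statistics.pstdev(observed_headways) / mean_scheduled_headway
```

Perfectly regular headways give a value of 0, and larger values indicate less stable headway.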
1. Daily variation
Figure 7 shows the daily variation of $Cov$ at each bus stop based on the AVL data of route 113. The $Cov$ generally had a local peak from 7:00 to 9:00 and 17:00 to 20:00, when commuter trips significantly increase and traffic conditions worsen; thus, the stability of headway is easily affected by more random factors. In addition, there was a peak from 20:00 to 21:00 after the regular evening peak, probably caused by overtime commuter trips. Furthermore, the $Cov$ at stops 2 and 17 was more stable than at the following stops.
2. Variation on a work day and non-work day
The AVL data of route 113 from 15 April to 21 April 2019 are divided by bus stops and dates, of which the 20th and 21st are non-working days and others are working days. Figure 8 shows that the date with the maximum $Cov$ is Wednesday, 17 April 2019, while the minimum occurred on Saturday, 20 April 2019. The $Cov$ on a non-work day was obviously lower than on a work day.
3. Variation at different bus stops
To clearly show the $Cov$ at different bus stops on a work day and non-work day, several bus stops were selected, as shown in Figure 9. The $Cov$ increased severely from 7:00 to 9:00, probably because of unexpected travel demands and traffic congestion, which is more likely to cause bus bunching. The $Cov$ on a non-work day was about 1.0, and the local peak value mainly appeared from 11:00 to 13:00 and 15:00 to 17:00. In general, the trend was relatively stable throughout the non-work day.

4.4. Feature Vector Selection

Based on the previous analyses of the formation process and spatial–temporal characteristics of bus bunching, twelve features were initially selected as the input vectors of the model:
  • $x_1$: the number of the bus stop j.
  • $x_2$: the dwell time of bus i − 1 at stop j.
  • $x_3$: the dwell time of bus i at stop j.
  • $x_4$: the travel time of bus i − 1 from stop j − 1 to stop j.
  • $x_5$: the travel time of bus i from stop j − 1 to stop j.
  • $x_6$: the average travel time between stop j − 1 and stop j in the previous 15 min of the same type of period last week.
  • $x_7$: the headway of bus i at stop j − 1.
  • $x_8$: the average headway at stop j in the previous 15 min of the same type of period last week.
  • $x_9$: the departure interval of bus i at the first stop.
  • $x_{10}$: the number of 15-min period groups.
  • $x_{11}$: the type of the day (morning peak time: 1, evening peak time: 2, off-peak time on weekdays: 3, and rest day: 4).
  • $x_{12}$: the day of the week.
Factors were selected based on the process of bus operation and by different dimensions so that we can measure the importance of different factors to gain a deeper understanding of which aspects have a greater impact on bus bunching. The feature $x_1$ represents the location of the bus stop, reflecting the spatial characteristics of bus bunching. As discussed above, the process of bus bunching could be divided into two stages, namely the bus traveling stage and the bus dwelling stage. $x_2$ and $x_3$ represent the time the bus dwells at stops, indicating the influence of passenger flow and bus queuing at stops. $x_4$, $x_5$, and $x_6$ represent the real-time and historical condition of the bus traveling between bus stops, reflecting the traffic condition. $x_7$ and $x_8$ represent the bus headway of two adjacent buses. $x_9$, $x_{10}$, $x_{11}$, and $x_{12}$ reflect the temporal characteristics of bus bunching. The data samples of feature vectors are shown in Table 3.

4.5. Factor Identification by eGA–DT

4.5.1. Parameter Calibration

Each chromosome corresponds to a combination of different factors, so L is the number of features. As discussed above, there were twelve features selected as the inputs of the model; therefore, the value of L was 12.
The ranges of the other parameters are proposed empirically in Section 3.3.4. Values that are too large or too small may have a negative impact on both the fitting effect and the efficiency of the algorithm. Therefore, the selection of parameters should balance the fitting effect and the efficiency of the algorithm for the data set at hand.
For the AVL data analyzed in this article, the parameters were set to $L = 12$, $N = 100$, $P_c = 0.6$, and $P_m = 0.1$, which performed well in terms of both the fitting effect and the efficiency of calculation.
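These settings can be held in a small configuration object that also checks them against the empirical ranges given in Section 3.3.4. The class below is a hypothetical sketch; its name, field names, and validation behaviour are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgaParams:
    """eGA settings with the values reported for the AVL data as defaults."""
    length: int = 12     # L, number of candidate features
    pop_size: int = 100  # N, number of initial chromosomes
    p_c: float = 0.6     # crossover probability
    p_m: float = 0.1     # mutation probability

    def __post_init__(self):
        # Empirical ranges quoted in the parameter calibration section.
        if not 20 <= self.pop_size <= 100:
            raise ValueError("N is expected to lie within 20 to 100")
        if not 0.4 <= self.p_c <= 0.99:
            raise ValueError("P_c is expected to lie within 0.4 to 0.99")
        if not 0.01 <= self.p_m <= 0.1:
            raise ValueError("P_m is expected to lie within 0.01 to 0.1")

    @property
    def crossover_count(self) -> int:
        """Number of chromosomes taking part in crossover, P_c x N."""
        return round(self.p_c * self.pop_size)
```

With the reported values, $P_c \times N$ gives 60 chromosomes participating in crossover per generation.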

4.5.2. Results of eGA–DT Model

The result presented in Figure 10 is the trend of the best fitness during 100 iterations. Considering that the model always selects the individual with better fitness, the reciprocal of the deviation is regarded as the value of fitness in the process of population evolution. It could be clearly observed that as the number of iterations increased, the fitness value of the population rose gradually until it reached the maximum value.
The result in Table 4 clearly indicates that the overall fitting effect of bus headway by the proposed hybrid eGA–DT was significantly improved, with a nearly 20% reduction in MAE compared to single DT and more than a 50% reduction compared to ET.
There are slight differences between the results of the four different types of routes. For regular routes and main routes, the best chromosome was [1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0]. For express routes, the best chromosome was [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0]. For branch routes, the best chromosome was [1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]. Considering all four different types of routes, the best chromosome could be concluded as [1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0], representing that $x_1$ (the number of the bus stop j), $x_2$ (the dwell time of bus i − 1 at stop j), $x_4$ (the travel time of bus i − 1 from stop j − 1 to stop j), $x_5$ (the travel time of bus i from stop j − 1 to stop j), $x_6$ (the historical average travel time between stop j − 1 and stop j), and $x_7$ (the headway of bus i at stop j − 1) are the most significant factors of bus bunching.
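The overall chromosome reported above coincides with the gene-wise union of the per-route-type chromosomes, which the sketch below reproduces. Treating the combination as a union is an interpretation made here for illustration, and the helper names are hypothetical.

```python
FEATURES = [f"x{k}" for k in range(1, 13)]  # x1 ... x12

BEST_BY_ROUTE_TYPE = {
    "regular_and_main": [1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0],
    "express":          [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0],
    "branch":           [1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
}


def union_chromosome(chromosomes):
    """Select a gene if any of the given chromosomes selects it."""
    return [int(any(bits)) for bits in zip(*chromosomes)]


def selected_features(chromosome):
    """Decode a binary chromosome into the names of the selected features."""
    return [name for name, bit in zip(FEATURES, chromosome) if bit]


overall = union_chromosome(BEST_BY_ROUTE_TYPE.values())
print(overall)                     # [1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(selected_features(overall))  # ['x1', 'x2', 'x4', 'x5', 'x6', 'x7']
```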

4.5.3. Model Comparison

  1. The performances under different bus stops
Because bus bunching is transitive, later bus stops may have more fluctuating headway. Therefore, the performance of eGA–DT, DT, and ET was analyzed at all bus stops of route 113 except the initial and terminal stops, as presented in Figure 11 and Figure 12, with MAE as the evaluation metric. Figure 11 shows that eGA–DT outperformed the other models at all bus stops, especially ET. Compared to the single DT model, the average reduction in MAE by eGA–DT at each bus stop was about 5%, as shown in Figure 12.
  2. The performances under different bus routes
Bus routes fall into several levels with different characteristics, including regular, main, express, and branch routes. Eight routes covering these types were selected, and eGA–DT, DT, and ET were compared on each, as shown in Table 5. The results indicate that eGA–DT performed better on every route. Its lowest MAE, 23.54 s, was obtained on branch route B618. On average, eGA–DT reduced MAE by 26% relative to DT and by 43% relative to ET.
  3. The performances under different periods
The data were divided into four types of periods: morning peak on weekdays (7:00–9:00), evening peak on weekdays (17:30–19:30), off-peak time on weekdays, and rest days. Table 6 shows the MAE of each model for the four types of periods. The MAE in peak periods was larger than in off-peak periods, probably because headway fluctuates more severely under unexpected passenger flow and traffic conditions. Compared to DT, eGA–DT reduced MAE by an average of 7.3% during peak periods and about 10% during off-peak periods; the reductions relative to ET were far larger in every period.
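The average reductions quoted above can be recomputed from the MAE values in Table 5 and Table 6. The short sketch below does this; the helper names are illustrative, and treating rest days as an off-peak period is an assumption that matches the quoted 10% figure.

```python
# MAE in seconds as (eGA-DT, DT, ET), copied from Table 5.
TABLE5_MAE = {
    "1":    (47.76, 72.45, 78.98),
    "113":  (35.27, 43.75, 74.61),
    "E11":  (77.08, 113.10, 126.68),
    "E15":  (76.11, 93.62, 111.32),
    "M156": (57.29, 80.36, 106.08),
    "M182": (35.52, 56.36, 61.60),
    "B811": (45.41, 49.91, 87.27),
    "B618": (23.54, 35.10, 45.03),
}

# MAE as (eGA-DT, DT, ET), copied from Table 6.
TABLE6_MAE = {
    "morning peak": (41.19, 43.70, 123.46),
    "evening peak": (50.67, 55.63, 168.30),
    "off-peak":     (35.09, 39.33, 118.37),
    "rest day":     (35.65, 39.19, 130.71),
}


def reduction(baseline, improved):
    """Percentage reduction in MAE relative to the baseline model."""
    return 100.0 * (baseline - improved) / baseline


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Route-level averages over the eight routes of Table 5.
avg_vs_dt = mean(reduction(dt, ega) for ega, dt, _ in TABLE5_MAE.values())
avg_vs_et = mean(reduction(et, ega) for ega, _, et in TABLE5_MAE.values())

# Period-level averages relative to DT, from Table 6.
peak_vs_dt = mean(
    reduction(TABLE6_MAE[p][1], TABLE6_MAE[p][0])
    for p in ("morning peak", "evening peak")
)
offpeak_vs_dt = mean(
    reduction(TABLE6_MAE[p][1], TABLE6_MAE[p][0])
    for p in ("off-peak", "rest day")
)
```

The route-level averages come out at roughly 26% against DT and 43% against ET, and the period-level figures of about 7.3% and 10% correspond to the comparison with DT.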

5. Discussion and Conclusions

In this paper, bus operation was divided into two parts, traveling between bus stops and dwelling at stops, to analyze how bus bunching forms. Delays in bus operation make the headway at stops deviate from the schedule and eventually cause bus bunching. Bus bunching is accompanied by the phenomenon of “large and small intervals”, which is worse during peak times than during off-peak times. After analyzing the headway and headway stability temporally and spatially, the following conclusions were drawn:
  • One single day. A small headway usually occurred from 7:00 to 10:00 and 17:00 to 19:00, while a large headway frequently occurred from 13:00 to 15:00 and 21:00 to 24:00. The Cov generally had a local peak from 7:00 to 9:00, 17:00 to 20:00, and even at 21:00 after the regular evening peak.
  • Different time types. The headway during the morning peak on weekdays was the most unstable, followed by the evening peak on weekdays. The most stable headway usually occurred on rest days. The Cov on a non-work day was clearly lower and more stable than on a work day.
  • Different bus stops. The small headway arising at earlier bus stops is continuously transmitted to the following buses. Bus bunching is transitive, and bus stops far from the departure place are more likely to be affected. The spatial attributes of the bus stop therefore deserve close attention.
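The headway stability measure used in these conclusions can be illustrated with a small example. The sketch assumes that Cov denotes the coefficient of variation of headway, computed as the population standard deviation divided by the mean; the arrival times are made up for illustration and are not taken from the AVL data.

```python
from datetime import datetime
from statistics import mean, pstdev


def headways(arrivals):
    """Headways in seconds between consecutive arrivals at one stop."""
    times = sorted(datetime.strptime(t, "%H:%M:%S") for t in arrivals)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def cov(values):
    """Coefficient of variation: standard deviation over mean (assumed definition)."""
    return pstdev(values) / mean(values)


# Hypothetical arrivals: evenly spaced buses versus a "large and small
# intervals" pattern with the same average headway of 480 s.
regular = ["07:00:00", "07:08:00", "07:16:00", "07:24:00"]
bunched = ["07:00:00", "07:14:00", "07:16:00", "07:24:00"]

cov_regular = cov(headways(regular))
cov_bunched = cov(headways(bunched))
```

Both sequences have the same mean headway, but the bunched one has a much larger Cov, which is why the measure is useful for spotting unstable periods and stops.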
Considering the spatial and temporal characteristics of bus bunching discussed above, twelve potential factors related to the following aspects were selected as the input vectors of the model: the location of the bus stop, the bus dwelling time at the former stop, the bus traveling time between bus stops, the bus headway at former stops, the departure interval, the type of the day, and the day of a week.
A hybrid model integrating eGA and DT was constructed to identify the significant factors of bus bunching. eGA was adopted to build the overall framework for factor identification, and the elitist preservation strategy was used to keep the best-fitness individual from being destroyed by crossover and mutation. Each potential factor was encoded as a gene, and each factor group was regarded as a chromosome. Significant factor identification was thus transformed into a problem of selecting the fittest chromosome from the population, with DT used to evaluate the fitness of each individual. The proposed hybrid eGA–DT model outperformed the single DT and ET models in different situations:
  • Different bus stops. eGA–DT outperformed the other models at all bus stops, especially ET. The average reduction in MAE by eGA–DT at each bus stop was about 5% compared to DT.
  • Different bus routes. Four types of routes, including regular, main, express, and branch routes, were selected. eGA–DT performed better on every route, with its lowest MAE, 23.54 s, on branch route B618. On average, eGA–DT reduced MAE by 26% relative to DT and by 43% relative to ET.
  • Different periods. Across four types of periods, including morning peak on weekdays (7:00–9:00), evening peak on weekdays (17:30–19:30), off-peak on weekdays, and rest days, the MAE in peak periods was clearly larger than in off-peak periods. Compared to DT, eGA–DT reduced MAE by an average of 7.3% during peak periods and about 10% during off-peak periods; the reductions relative to ET were far larger in every period.
The results demonstrate that x_1 (the number of the bus stop j), x_2 (the dwell time of bus i − 1 at stop j), x_4 (the time of bus i travel from stop j − 1 to stop j), x_5 (the time of bus i travel from stop j − 1 to stop j), x_6 (the historical average travel time from stop j − 1 to stop j), and x_7 (the headway of bus i at stop j − 1) are significant factors contributing to bus bunching. Bus bunching is transitive, and bus stops far from the departure place are more likely to be affected; thus, the spatial attributes of bus stops should be emphasized in relevant prediction models. Moreover, when bus bunching occurs at an early stage along a route, possible bunching points and stop-skipping schemes can be determined on the basis of these spatial characteristics. The results also showed that the bus travel time between stops was a more significant factor of bus bunching than the bus dwell time at the stop. As a result, detailed factors that may affect the bus travel time, such as signalized intersections and bus lane planning, should be taken into account in models. Because the reliability of future prediction and dispatch applications depends greatly on the selected inputs, the significant factors play an important role in improving the accuracy and efficiency of bus arrival/travel time prediction, bus bunching point estimation, bus dispatching strategies, etc.
Some areas still need to be improved. The eGA–DT model has the advantage of fitting non-linear relationships, but it also tends to overfit and to converge slowly on some data sets. Therefore, methods related to data preprocessing, cross-validation, and random forest could be studied further to mitigate the problem. Moreover, multi-source data, such as bus passenger data, mobile signal data, and traffic flow data, could also help to reflect the characteristics of headway. Finally, this paper mainly focused on the characteristics of a single bus line, and the interaction among multiple bus routes along the same corridor could be taken into account in future research.

Author Contributions

Conceptualization, M.Y., B.X. and G.X.; data curation, M.Y. and G.X.; formal analysis, M.Y.; funding acquisition, B.X.; investigation, M.Y. and G.X.; methodology, M.Y. and B.X.; project administration, B.X.; resources, B.X.; software, M.Y. and G.X.; supervision, B.X.; validation, B.X. and G.X.; visualization, G.X.; writing—original draft, M.Y.; writing—review & editing, B.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (71974043) and the Science and Technology Innovation Committee of Shenzhen under Grant JCYJ20210324133202006.

Data Availability Statement

The data presented in this study are available on request from the first author and corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Newell, G.F.; Potts, R.B. Maintaining a Bus Schedule. In Proceedings of the 2nd Australian Road Research Board (ARRB) Conference, Melbourne, Australia, January 1964; pp. 388–393. [Google Scholar]
  2. Feng, W.; Figliozzi, M. Identifying Spatial-Temporal Attributes of Bus Bunching through AVL/APC Data. In Proceedings of the Institute of Transportation Engineers (ITE) Western States Annual Meeting, Anchorage, Alaska, 10–13 July 2011. [Google Scholar]
  3. Du, B.; Dublanche, P.-A. Bus Bunching Identification Using Smart Card Data. In Proceedings of the 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS), Singapore, 11–13 December 2018; pp. 1087–1092. [Google Scholar] [CrossRef]
  4. Moreira-Matias, L.; Gama, J.; Mendes-Moreira, J.; Freire de Sousa, J. An Incremental Probabilistic Model to Predict Bus Bunching in Real-Time. In Advances in Intelligent Data Analysis XIII; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2014; pp. 227–238. [Google Scholar] [CrossRef]
  5. Ryus, P.; Danaher, A.; Walker, M.; Nichols, F.; Carter, B.; Ellis, E.; Cherrington, L.; Bruzzone, A. Transit Capacity and Quality of Service Manual, 3rd ed.; The National Academies Press: Washington, DC, USA, 2013. [Google Scholar] [CrossRef]
  6. Schmöcker, J.-D.; Sun, W.; Fonzone, A.; Liu, R. Bus bunching along a corridor served by two lines. Transp. Res. Part B Methodol. 2016, 93, 300–317. [Google Scholar] [CrossRef]
  7. Verbich, D.; Diab, E.; El-Geneidy, A. Have they bunched yet? An exploratory study of the impacts of bus bunching on dwell and running times. Public Transp. 2016, 8, 225–242. [Google Scholar] [CrossRef]
  8. Enayatollahi, F.; Idris, A.; Amiri Atashgah, M.A. Modelling bus bunching under variable transit demand using cellular automata. Public Transp. 2019, 11, 269–298. [Google Scholar] [CrossRef]
  9. Zhou, T.; Wu, W.; Peng, L.; Zhang, M.; Li, Z.; Xiong, Y.; Bai, Y. Evaluation of urban bus service reliability on variable time horizons using a hybrid deep learning method. Reliab. Eng. Syst. Saf. 2022, 217, 108090. [Google Scholar] [CrossRef]
  10. Čelan, M.; Lep, M. Bus-arrival time prediction using bus network data model and time periods. Future Gener. Comput. Syst. 2020, 110, 364–371. [Google Scholar] [CrossRef]
  11. Yu, Z.; Wood, J.S.; Gayah, V.V. Using survival models to estimate bus travel times and associated uncertainties. Transp. Res. Part C Emerg. Technol. 2017, 74, 366–382. [Google Scholar] [CrossRef]
  12. AlHadidi, T.; Rakha, H.A. Modeling bus passenger boarding/alighting times: A stochastic approach. Transp. Res. Interdiscip. Perspect. 2019, 2, 100027. [Google Scholar] [CrossRef]
  13. Sun, W.; Schmöcker, J.-D.; Fukuda, K. Estimating the route-level passenger demand profile from bus dwell times. Transp. Res. Part C Emerg. Technol. 2021, 130, 103273. [Google Scholar] [CrossRef]
  14. Peng, L.; Li, Z.; Wang, C.; Sarkodie-Gyan, T. Evaluation of roadway spatial-temporal travel speed estimation using mapped low-frequency AVL probe data. Measurement 2020, 165, 108150. [Google Scholar] [CrossRef]
  15. Liu, Y.; Qing, R.; Zhao, Y.; Liao, Z. Road Intersection Recognition via Combining Classification Model and Clustering Algorithm Based on GPS Data. ISPRS Int. J. Geo-Inf. 2022, 11, 487. [Google Scholar] [CrossRef]
  16. Iliopoulou, C.A.; Milioti, C.P.; Vlahogianni, E.I.; Kepaptsoglou, K.L. Identifying spatio-temporal patterns of bus bunching in urban networks. J. Intell. Transp. Syst. 2020, 24, 365–382. [Google Scholar] [CrossRef]
  17. Iliopoulou, C.; Milioti, C.; Vlahogianni, E.; Kepaptsoglou, K.; Sanchez-Medina, J. The Bus Bunching Problem: Empirical Findings from Spatial Analytics. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 871–876. [Google Scholar] [CrossRef]
  18. Feng, W.; Figliozzi, M. Empirical Findings of Bus Bunching Distributions and Attributes Using Archived AVL/APC Bus Data. In Proceedings of the 11th International Conference of Chinese Transportation Professionals 2011 (ICCTP 2011), Nanjing, China, 14–17 August 2011; pp. 4330–4341. [Google Scholar]
  19. Shaji, H.E.; Tangirala, A.K.; Vanajakshi, L. Prediction of Trends in Bus Travel Time Using Spatial Patterns. Transp. Res. Procedia 2020, 48, 998–1007. [Google Scholar] [CrossRef]
  20. Fonzone, A.; Schmöcker, J.-D.; Liu, R. A Model of Bus Bunching under Reliability-based Passenger Arrival Patterns. Transp. Res. Procedia 2015, 7, 276–299. [Google Scholar] [CrossRef] [Green Version]
  21. An, S.; Zhang, X.; Wang, J. Finding Causes of Irregular Headways Integrating Data Mining and AHP. ISPRS Int. J. Geo-Inf. 2015, 4, 2604–2618. [Google Scholar] [CrossRef] [Green Version]
  22. Tamir, T.S.; Xiong, G.; Li, Z.; Tao, H.; Shen, Z.; Hu, B.; Menkir, H.M. Traffic Congestion Prediction using Decision Tree, Logistic Regression and Neural Networks. IFAC-Pap. 2020, 53, 512–517. [Google Scholar] [CrossRef]
  23. Dhivya Bharathi, B.; Anil Kumar, B.; Achar, A.; Vanajakshi, L. Bus travel time prediction: A log-normal auto-regressive (AR) modelling approach. Transp. A Transp. Sci. 2020, 16, 807–839. [Google Scholar] [CrossRef]
  24. Yuan, Y.; Zhang, W.; Yang, X.; Liu, Y.; Liu, Z.; Wang, W. Traffic state classification and prediction based on trajectory data. J. Intell. Transp. Syst. 2021, 1–15. [Google Scholar] [CrossRef]
  25. Rashidi, S.; Ranjitkar, P.; Csaba, O.; Hooper, A. Using Automatic Vehicle Location Data to Model and Identify Determinants of Bus Bunching. Transp. Res. Procedia. 2017, 25, 1444–1456. [Google Scholar] [CrossRef]
  26. Arriagada, J.; Gschwender, A.; Munizaga, M.A.; Trépanier, M. Modeling bus bunching using massive location and fare collection data. J. Intell. Transp. Syst. 2019, 23, 332–344. [Google Scholar] [CrossRef]
  27. Chioni, E.; Iliopoulou, C.; Milioti, C.; Kepaptsoglou, K. Factors affecting bus bunching at the stop level: A geographically weighted regression approach. Int. J. Transp. Sci. Technol. 2020, 9, 207–217. [Google Scholar] [CrossRef]
  28. Iliopoulou, C.; Vlahogianni, E.; Kepaptsoglou, K. Understanding the factors that affect the bus bunching events’ duration. In Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, 20–23 September 2020; pp. 1–6. [Google Scholar] [CrossRef]
  29. Gkiotsalitis, K.; Alesiani, F. Robust timetable optimization for bus lines subject to resource and regulatory constraints. Transp. Res. Part E Logist. Transp. Rev. 2019, 128, 30–51. [Google Scholar] [CrossRef]
  30. Mendes-Moreira, J.; Moreira-Matias, L.; Gama, J.; Freire de Sousa, J. Validating the coverage of bus schedules: A Machine Learning approach. Inf. Sci. 2015, 293, 299–313. [Google Scholar] [CrossRef]
  31. He, S.; Liang, S.; Dong, J.; Zhang, D.; He, J.; Yuan, P. A holding strategy to resist bus bunching with dynamic target headway. Comput. Ind. Eng. 2020, 140, 106237. [Google Scholar] [CrossRef]
  32. Berrebi, S.J.; Hans, E.; Chiabaut, N.; Laval, J.A.; Leclercq, L.; Watkins, K.E. Comparing bus holding methods with and without real-time predictions. Transp. Res. Part C Emerg. Technol. 2018, 87, 197–211. [Google Scholar] [CrossRef] [Green Version]
  33. Petit, A.; Ouyang, Y.; Lei, C. Dynamic bus substitution strategy for bunching intervention. Transp. Res. Part B Methodol. 2018, 115, 1–16. [Google Scholar] [CrossRef]
  34. Dai, Z.; Liu, X.C.; Chen, Z.; Guo, R.; Ma, X. A predictive headway-based bus-holding strategy with dynamic control point selection: A cooperative game theory approach. Transp. Res. Part B Methodol. 2019, 125, 29–51. [Google Scholar] [CrossRef]
Figure 1. The process of bus bunching in two situations: (a) Bus bunching occurs when traveling between bus stops; (b) Bus bunching occurs when dwelling at a stop.
Figure 2. The overall framework of the eGA–DT model.
Figure 3. Map of selected bus routes.
Figure 4. The process of bus operation during peak time and off-peak time: (a) During peak time from 7:00 to 9:00; (b) During the off-peak time from 13:00 to 15:00.
Figure 5. Daily variation of bus headway.
Figure 6. Average headway at bus stops in different time types.
Figure 7. Daily variation of Cov.
Figure 8. Variation of Cov on a work day and non-work day.
Figure 9. The Cov at different bus stops on a work day and non-work day: (a) On a work day; (b) On a non-work day.
Figure 10. The trend of model fitness in iterations.
Figure 11. Model comparison by eGA–DT, DT, and ET for different bus stops: (a) Comparison result between eGA–DT and DT; (b) Comparison result between eGA–DT and ET.
Figure 12. Reduction of MAE compared to DT for different bus stops.
Table 1. Basic information about selected bus routes.

| Route Number | Type | Number of Stops | Departure Interval (min) |
| --- | --- | --- | --- |
| 1 | regular | 19 | 4–10 |
| 113 | regular | 52 | 4–10 |
| E11 | express | 17 | 15 |
| E15 | express | 23 | 4–10 |
| M156 | main | 30 | 10 |
| M182 | main | 55 | 6–12 (up direction); 4–10 (down direction) |
| B811 | branch | 15 | 5 |
| B618 | branch | 14 | 6–12 |
Table 2. Data samples of AVL.

| Trip Id | Line Id | Line Dir | Bus Num | Station Index | Station Id | Arr Time | Lea Time | Pre StationId | Next StationId |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 650c8 2aa9 | 1130 | down | BS05265D | 1 | B_ZS0165 | 20 April 2019 13:09:45 | 20 April 2019 13:10:24 | NULL | B_ZS0029 |
| 650c8 2aa9 | 1130 | down | BS05265D | 2 | B_ZS0029 | 20 April 2019 13:13:12 | 20 April 2019 13:13:49 | B_ZS0165 | B_ZS0033 |
| 650c8 2aa9 | 1130 | down | BS05265D | 20 | B_SH0033 | 20 April 2019 13:49:36 | 20 April 2019 13:50:18 | B_SH0035 | B_SH0045 |
Table 3. Data samples of feature vectors.
x_1  x_2 (s)  x_3 (s)  x_4 (s)  x_5 (s)  x_6 (s)  x_7 (s)  x_8 (s)  x_9 (s)  x_10  x_11  x_12
30202011590905015015162742
3420308080804184183972742
3510191701401351272873882742
Table 4. Model performance by eGA–DT, DT, and ET.

| Model | MAE (s) | RMSE (s) | R2 |
| --- | --- | --- | --- |
| eGA–DT | 35.27 | 73.37 | 0.91 |
| DT | 43.75 | 78.69 | 0.86 |
| ET | 74.61 | 124.58 | 0.61 |
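Table 5. Model performance by eGA–DT, DT, and ET for different bus routes.

| Route Number | Route Type | Model | MAE (s) | RMSE (s) | R2 |
| --- | --- | --- | --- | --- | --- |
| 1 | regular | eGA–DT | 47.76 | 84.37 | 0.89 |
| 1 | regular | DT | 72.45 | 123.71 | 0.76 |
| 1 | regular | ET | 78.98 | 128.39 | 0.74 |
| 113 | regular | eGA–DT | 35.27 | 73.37 | 0.91 |
| 113 | regular | DT | 43.75 | 78.69 | 0.86 |
| 113 | regular | ET | 74.61 | 124.58 | 0.61 |
| E11 | express | eGA–DT | 77.08 | 129.97 | 0.81 |
| E11 | express | DT | 113.10 | 181.93 | 0.62 |
| E11 | express | ET | 126.68 | 198.65 | 0.55 |
| E15 | express | eGA–DT | 76.11 | 118.28 | 0.89 |
| E15 | express | DT | 93.62 | 148.55 | 0.83 |
| E15 | express | ET | 111.32 | 168.91 | 0.78 |
| M156 | main | eGA–DT | 57.29 | 99.92 | 0.89 |
| M156 | main | DT | 80.36 | 132.85 | 0.81 |
| M156 | main | ET | 106.08 | 166.74 | 0.70 |
| M182 | main | eGA–DT | 35.52 | 75.17 | 0.91 |
| M182 | main | DT | 56.36 | 111.28 | 0.81 |
| M182 | main | ET | 61.60 | 117.01 | 0.79 |
| B811 | branch | eGA–DT | 45.41 | 60.40 | 0.94 |
| B811 | branch | DT | 49.91 | 66.62 | 0.93 |
| B811 | branch | ET | 87.27 | 141.72 | 0.66 |
| B618 | branch | eGA–DT | 23.54 | 43.36 | 0.95 |
| B618 | branch | DT | 35.10 | 64.42 | 0.88 |
| B618 | branch | ET | 45.03 | 75.64 | 0.83 |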
Table 6. MAE of eGA–DT, DT, and ET for different types of periods.

| Time Type | eGA–DT | DT | ET |
| --- | --- | --- | --- |
| morning peak | 41.19 | 43.70 | 123.46 |
| evening peak | 50.67 | 55.63 | 168.30 |
| off-peak | 35.09 | 39.33 | 118.37 |
| rest day | 35.65 | 39.19 | 130.71 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Yan, M.; Xie, B.; Xu, G. Identifying Spatial–Temporal Characteristics and Significant Factors of Bus Bunching Based on an eGA and DT Model. Appl. Sci. 2022, 12, 11778. https://doi.org/10.3390/app122211778
