1. Introduction
The stable development of urban society relies on the efficient operation of infrastructure. Public transportation systems carry much of the travel demand of residents in medium and large cities and have become crucial infrastructure supporting TOD (transit-oriented development) in recent years [1]. However, with the continuous expansion and development of cities, the resilience of public transportation systems to natural disasters and public health emergencies has been declining. After such disruptions, it is challenging for these systems to recover quickly. The loss or reduction of transportation system functionality directly impacts population mobility and affects social and economic recovery [2]. Due to their efficiency, punctuality, and large capacity, subway systems are popular among commuters in major cities and account for a significant portion of public transportation. Since 2020, the COVID-19 pandemic has severely impacted travel patterns. The need for social distancing due to the airborne nature of the virus became a primary concern during severe phases of the pandemic [3]. Consequently, subway travel was severely affected, with a significant decline in passenger flow.
In the post-pandemic new normal, subway passenger flow is gradually recovering, but it exhibits spatial and temporal differences at the station level [4]. This necessitates introducing a new concept of station-level passenger flow resilience to represent these differences. As early as the late 1990s, researchers began using the concept of resilience to study cities as complex socio-economic systems. Urban resilience refers to the ability of the overall urban system and its subsystems to maintain and restore their functions when facing external shocks. Due to the pandemic, most urban rail transit systems experienced three phases: a sharp decline, gradual recovery, and stable normalization [5]. Station-level passenger flow reflects the relationship between urban rail transit systems and urban spaces. The built environment around stations, especially land-use characteristics, significantly influences the spatial and temporal distribution of passenger flow. For example, stations primarily serving residential and employment functions tended to recover passenger flow more quickly post-pandemic, while those focused on tourism and entertainment recovered more slowly, showing clear differences between weekdays and non-weekdays [6,7].
Regarding the mechanisms behind the differentiated distribution of passenger flow characteristics, a large body of literature has explored the complex relationship between multi-source built environment variables and passenger flow characteristics. The selection of built environment variables primarily focuses on the 5D characteristics of density, diversity, design, distance, and destination. Overall, factors related to residence and employment tend to play a decisive role in passenger flow characteristics [8,9]. Other types of land use are mainly associated with non-essential travel, thus exhibiting considerable fluctuations in their correlation with overall station connections. Other built environment factors, such as the number of bus routes and distance to the city center, are also important influencing factors, especially concerning peak-hour passenger flow [10,11]. However, when studying the correlation between subway passenger flow and the built environment under the impact of the pandemic, many scholars rely on qualitative analysis or empirical judgment without systematically quantifying the complex relationship between land-use characteristics and passenger flow resilience during the post-pandemic recovery stage.
In terms of research methods, earlier approaches often used global and local regression models, overlooking the fact that the impact of built environment factors on passenger flow distribution varies across spatial and temporal scales. The multiscale geographically weighted regression (MGWR) model addresses this issue by optimizing a separate bandwidth for each variable to account for these variations, although it still adheres to the assumption of linear relationships [12,13,14]. Tree-based machine learning regression methods handle supervised learning problems by fitting features according to a tree structure and overcome the limitations of linear regression assumptions with second-order residual-based criteria; however, they still suffer from the drawback of relying on subjective experience when setting hyperparameters [11].
In summary, although detailed research has been conducted on the changes in transportation services under the impact of the pandemic, there is still a lack of metric development for station-level passenger flow resilience in the post-pandemic era, as well as analysis of the mechanisms behind the complex nonlinear relationships between these changes and land use-related built environment variables and their differentiated spatiotemporal distribution. Based on this, the current study uses AFC card data to establish station-level passenger flow recovery resilience (PFRR) characteristics during the post-pandemic period and constructs a new analysis framework. The objectives are to analyze the spatiotemporal distribution framework of station-level PFRR indices in different time periods in relation to land use-related built environment characteristics and to optimize light gradient boosting machine (LightGBM) hyperparameters using improved gray wolf optimization with Levy flight (LGWO) to achieve the optimal fitting of LightGBM supervised learning. Subsequently, SHAP attribution analysis based on game theory is used to evaluate the nonlinear impacts of relevant built environment factors on passenger flow resilience. Finally, other commonly used baseline methods are employed to evaluate the model’s efficiency and validity using different evaluation metrics to verify the effectiveness and rationality of the proposed analysis approach.
2. Related Work
2.1. Research on Passenger Flow Resilience
Resilience refers to the ability to recover from the current state to the original state, specifically the capacity of a related system to return to its previous state after being disrupted [15]. Resident activities are an important dimension for assessing social recovery, and indicators of the extent of recovery are a key representation of social resilience. A significant portion of urban residents’ travel activities depends on public transportation. Thus, the spatial and temporal distribution characteristics of transportation system activities are considered a starting point for studying activity resilience [16]. During the more severe periods of the pandemic, residents’ travel activities were regulated due to government control measures and concerns for their own health. The urban rail transit system, which is the backbone of urban residents’ travel activities, especially commuting, can serve as an important indicator for understanding resident activity resilience, given its high passenger density and elevated risk of transmission. Therefore, analyzing changes in rail transit passenger flow metrics can offer important insights into resident activity resilience.
Since the outbreak of the pandemic, scholars have examined the frequency of residents’ travel activities and the shift in transportation modes across the world. For subway systems, the decline in residents’ travel frequency has become an indisputable fact [17]. Research has shown that the land-use characteristics around subway stations play a key role in the differences in passenger flow, with this variance being particularly evident in the proportion of passenger flow during peak hours, which correlates strongly with residential and commuting land use. Several studies have highlighted the differences in station passenger flow during the early stages of the pandemic. For example, a study of Seoul’s subway passenger flow from January to March 2020 found that stations serving work-related functions experienced relatively small fluctuations in passenger flow, while those serving leisure and entertainment purposes saw frequent fluctuations closely linked to the rise in new infections [18]. A contemporaneous investigation of Taiwan province’s subway system revealed that stations near large shopping centers, universities, and areas with active nighttime economies were the most severely impacted by the pandemic [6]. In a study conducted in Hong Kong on transportation modes used for different travel purposes, it was found that subway ridership for entertainment and shopping trips dropped by more than 80% and 40%, respectively, further highlighting the significant decline in non-essential travel during the pandemic [7].
After the pandemic went through various stages, including virus mutations, residents receiving vaccinations, and the implementation of control measures, life gradually returned to normal for the population. Although studies indicate that subway ridership cannot return to pre-pandemic levels in a short period, a trend of gradual recovery after system disruptions has emerged. Therefore, it is necessary to study the differences in subway ridership resilience in this “new normal,” as this can better assist operators in improving operational efficiency by adjusting strategies such as train–car matching, route planning, and departure frequency [19]. Consequently, we can draw on previous analyses of subway network resilience to explain the station-level differences from the perspective of physical factors. At present, studies based on station network topological attributes (e.g., network centrality and betweenness centrality) and travel performance indicators (e.g., travel time, waiting time, and safety attributes) observe the recovery characteristics of subway networks when affected by global or local disruptive events. For example, in an analysis using Chengdu’s subway system, the recovery performance of stations and local lines was evaluated during disruption, response, and recovery phases considering impedance needs. This provided practical suggestions for improving emergency response capabilities [20,21,22]. Research on identifying key stations using weighted network complex topology dynamics found that during peak hours, weighted network indicators shifted alongside changes in passenger flow, indicating that key stations are not static [23]. Zou et al. analyzed the impact of sudden heavy rainfall, an extreme weather event that brings an influx of more passengers than usual, on subway networks, including the spatiotemporal effects on passenger flow and resilience evaluation, using passenger elasticity curves to assess system resilience [24].
2.2. The Correlation Between Land-Related Built Environment Factors and Passenger Flow
The TOD model encourages the integration of public transportation infrastructure with an efficiently built environment that is walkable by developing station areas closely connected to non-motorized transportation [10,25]. Therefore, the catchment area around subway stations is generally limited to a 15 min walking zone, typically defined within a radius of 500 to 1000 m [26,27]. It has been verified that the intensity of land use and accessibility of the built environment around stations play an extremely important role in generating and attracting passenger flow. This is not difficult to understand, and can be better illustrated through travel theory based on the four-step model [28]. For example, population density and a younger population around stations positively promote passenger flow, and areas with higher levels of urban development tend to attract commuting passenger flow from less developed areas, creating a siphoning effect. Of course, the spatial effects of these variables are not static, and they are also related to the spatial morphology of cities [29]. From a temporal heterogeneity perspective, the same land-use characteristics may have different impacts on passenger flow at different times. For example, the influence of residential and office land use on the passenger flow of surrounding stations may show opposite correlations during morning and evening peak hours [26,30]. Commercial and recreational land use also have significantly different impacts during peak and non-peak periods [31,32]. Regardless of the type of land use, it is through the generation and attraction of trips that travel activities across the entire network are formed.
Single-type land use may not accurately reflect the development intensity per unit area, while land use-related POI (point of interest) data can effectively compensate for this limitation. Because POI data have clearer functional attributes, they can have a more direct impact on the generation and attraction of passenger flow [33,34]. For example, when two stations have the same area of commercial land use, the station with a greater number of associated POIs indicates a higher intensity of commercial development in that area. Although this method of judgment may not be applicable on a global scale, it serves as a valuable analytical supplement. While some studies have confirmed the association between a station’s topological structure, road density, and other connecting distributions with passenger flow characteristics, the complexity and degree of correlation with passenger flow characteristics are less than those of land attributes [35,36,37].
Research methods exploring the association between the built environment and passenger flow primarily rely on advancements in computational techniques, which can be broadly categorized into three types: global regression methods, local regression methods, and supervised machine learning methods. A typical representative of global regression methods is ordinary least squares (OLS), which clearly promoted detailed research on the relationship between the built environment and passenger flow characteristics in the early stages of study [38,39,40]. However, because it fits a single set of global coefficients, OLS cannot reflect local distribution differences. This limitation led to the application of local regression models, represented by geographically weighted regression (GWR), which uses a distance matrix to capture spatial distribution differences and provide fitting variations for the model [13,14]. However, all of the aforementioned methods adhere to the linear regression assumptions between the dependent and independent variables, which has been proven to be unrealistic. This results in poor fitting performance and imprecise interpretability of these methods. In contrast, tree-based supervised machine learning methods excel at uncovering such nonlinear relationships among variables, which has led to their successful application in various cities in recent years [41,42].
2.3. Limitations and Contributions
In summary, although many researchers have examined the relationship between various built environment variables and passenger flow characteristics across different regions, there are still many limitations. First, there has been inadequate analysis of passenger flow resilience at stations during the post-pandemic period, particularly in the recovery stage, and how these station-specific differences in passenger flow resilience relate to land use-associated built environment factors remains unverified. Additionally, the nonlinear relationships reported for the same built environment factors differ among studies, primarily due to variations in the spatial and temporal contexts of the research subjects. Finally, current tree-based models have poor control over complexity, and the selection of hyperparameters in supervised learning methods often relies on experience, which may lead to model overfitting and reduced generalization capability.
Therefore, this study first uses swipe data during the pandemic recovery period to determine the evaluation indicators for passenger flow resilience. Next, it proposes a new machine learning regression model that combines LGWO (gray wolf optimization with Levy flight) and LightGBM (light gradient boosting machine). This model can automatically configure parameters with the help of heuristic algorithms, effectively limiting model complexity and preventing overfitting. Furthermore, it incorporates SHAP (Shapley additive explanations) tools to conduct a global feedback mechanism analysis on the selected variables, illustrating the positive and negative feedback and threshold effects of passenger flow resilience and land use. The research findings can provide a unique perspective on the recovery characteristics of subway network passenger flow under the influence of public health events, addressing experiential judgments in operational decision-making without a scientifically sound theoretical basis.
4. Methodology
This study first employs the MGWR model to analyze the spatial heterogeneity of built environment variable coefficients under weekday and weekend passenger flow characteristics, aiming to examine the spatial scale associations of these variables. Secondly, the tree-based LightGBM algorithm is used to assess the contribution of built environment variables to station-level passenger flow recovery resilience. To optimize the hyperparameters of the LightGBM algorithm for the best-fitting characteristics, an LGWO method is proposed for automatic hyperparameter tuning. Finally, SHAP theory, based on Shapley values, is utilized to explain the nonlinear relationship trends between individual variables and PFRR, as well as the corresponding threshold effects.
4.1. MGWR Model
In the analysis of geographic spatial issues, independent variables have both global effects on dependent variables and localized effects due to spatial differences. In this study, the built environment variables related to land use may exhibit different impacts due to the spatial distribution of subway stations. The MGWR model is a local regression model that embeds geographic location information into the regression parameters through weighting, allowing for the estimation of local spatial variations in the data. It can identify the degree of influence that built environment indicators related to land use have on the PFRR of each subway station. Factors influencing station-level PFRR may exhibit both spatially stable and spatially unstable characteristics. By allowing variable bandwidth differences, the MGWR model improves upon GWR, resulting in more accurate estimation outcomes. The MGWR method is employed in this study to investigate the spatial heterogeneity of the regression coefficients of land use-related built environment variables on passenger flow recovery resilience, with spatial visualization of the results. The calculation formula is as follows:
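$$y_i = \beta_0(u_i, v_i) + \sum_{j=1}^{k} \beta_{bwj}(u_i, v_i)\, x_{ij} + \varepsilon_i$$
In the equation, $y_i$ represents the PFRR value at station $i$, $x_{ij}$ denotes the $j$-th built environment indicator for station $i$, $k$ is the total number of built environment indicators, $(u_i, v_i)$ represents the spatial coordinates of the center point of subway station $i$, $\beta_0(u_i, v_i)$ is the intercept for station $i$, $\beta_{bwj}(u_i, v_i)$ is the local regression coefficient of the $j$-th built environment indicator for station $i$, estimated with the bandwidth $bwj$ of that indicator, and $\varepsilon_i$ represents the random error term.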
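The idea of a location-specific coefficient can be illustrated with a minimal sketch. It assumes a Gaussian distance kernel, a single predictor, and a fixed bandwidth; the function names and toy data are hypothetical, and the per-variable bandwidth calibration that distinguishes MGWR from GWR is not reproduced here.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: the weight decays smoothly with distance (assumed kernel)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def local_fit(target, coords, x, y, bandwidth):
    """Weighted least squares for y = b0 + b1 * x, centred on `target`.

    Returns (b0, b1), the local intercept and slope at that location.
    """
    w = [gaussian_weight(math.dist(target, c), bandwidth) for c in coords]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

# Toy data: the true slope is 1 in the west (u < 5) and 3 in the east (u >= 5).
coords = [(float(u), 0.0) for u in range(10)]
x = [0.0, 1.0, 2.0, 3.0, 4.0, 0.0, 1.0, 2.0, 3.0, 4.0]
y = [xi * (1.0 if c[0] < 5 else 3.0) for xi, c in zip(x, coords)]

west = local_fit(coords[0], coords, x, y, bandwidth=1.5)
east = local_fit(coords[9], coords, x, y, bandwidth=1.5)
```

Because nearby stations dominate each local fit, the slope recovered at the western end stays close to 1 and the one at the eastern end close to 3, which a single global regression could not express.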
The selection of the bandwidth in the MGWR model is a direct indicator for measuring how the relationship between the independent variables and the dependent variable varies with spatial scale. In other words, the bandwidth reflects how many surrounding sample points are required to estimate the regression parameters. By using different bandwidths for different variables, the model can capture relationships at varying spatial scales. The Akaike information criterion (AIC) is used to determine the optimal bandwidth selection by evaluating the goodness of fit. The basic form is as follows:
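$$\mathrm{AIC}_c = n\ln(\hat{\sigma}^2) + n\ln(2\pi) + n\,\frac{n + \mathrm{tr}(S)}{n - 2 - \mathrm{tr}(S)}$$
where $\mathrm{AIC}_c$ is the corrected value of AIC, $n$ is the size of the sample, $\hat{\sigma}^2$ is the maximum likelihood estimate of the variance of the random error term, and $\mathrm{tr}(S)$ denotes the trace of the hat matrix $S$. The bandwidth corresponding to the minimum $\mathrm{AIC}_c$ is selected as the optimal bandwidth.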
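Bandwidth selection by corrected AIC can be sketched as follows. This is only an illustration under stated assumptions: one predictor, a Gaussian kernel, a single shared bandwidth (as in GWR rather than MGWR), and synthetic data. The names `aicc`, `gwr_score`, and the candidate list are hypothetical.

```python
import math

def aicc(n, sigma2, trace_s):
    """Corrected AIC from the ML error variance and the hat-matrix trace."""
    return (n * math.log(sigma2) + n * math.log(2 * math.pi)
            + n * (n + trace_s) / (n - 2 - trace_s))

def gwr_score(coords, x, y, bandwidth):
    """Fit y = b0 + b1 * x locally at every point and return the AICc."""
    n = len(y)
    rss, trace_s = 0.0, 0.0
    for i in range(n):
        w = [math.exp(-0.5 * (math.dist(coords[i], c) / bandwidth) ** 2)
             for c in coords]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        y_hat = my + (sxy / sxx) * (x[i] - mx)
        rss += (y[i] - y_hat) ** 2
        # i-th diagonal element of the hat matrix S (w[i] equals 1 at distance 0)
        trace_s += w[i] * (1.0 / sw + (x[i] - mx) ** 2 / sxx)
    return aicc(n, rss / n, trace_s)

# Synthetic stations on a line: the slope drifts from 1 to 3, plus small noise.
n = 30
coords = [(float(u), 0.0) for u in range(n)]
x = [float((3 * u) % 7) for u in range(n)]
y = [(1.0 + 2.0 * u / (n - 1)) * xi + 0.3 * math.sin(7 * u)
     for u, xi in zip(range(n), x)]

candidates = [2.0, 4.0, 8.0, 16.0, 64.0]
scores = {bw: gwr_score(coords, x, y, bw) for bw in candidates}
best_bandwidth = min(scores, key=scores.get)
```

A small bandwidth lowers the residual variance but raises $\mathrm{tr}(S)$, the effective number of parameters; the criterion trades the two off, so the minimum marks the scale at which the relationship actually varies.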
4.2. LightGBM Model
LightGBM is a gradient boosting framework that uses a histogram-based method to process data efficiently and fits each new tree to the residuals of the current model, gradually improving the performance of weak learners. It employs a leaf-wise growth strategy with a limit on tree depth, iterating by splitting only the leaf with the highest gain, thereby reducing errors and improving prediction accuracy.
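The histogram idea can be sketched in a few lines: feature values are bucketed into a small number of bins, gradient sums are accumulated per bin, and only the bin boundaries are scanned as split candidates. The sketch assumes squared-error loss (gradient equal to prediction minus target, unit hessians), equal-width bins, and a hypothetical L2 penalty `lam`; it is not LightGBM's actual implementation.

```python
def best_histogram_split(feature, gradients, n_bins=8, lam=1.0):
    """Scan the bin boundaries of one feature and return (threshold, gain).

    Assumes unit hessians, so the hessian sum of a bin is its sample count.
    """
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / n_bins
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for value, g in zip(feature, gradients):
        b = min(int((value - lo) / width), n_bins - 1)  # last bin is closed
        grad_sum[b] += g
        count[b] += 1

    total_g, total_n = sum(grad_sum), sum(count)

    def score(g, n):
        return g * g / (n + lam)

    best_threshold, best_gain = None, 0.0
    left_g, left_n = 0.0, 0
    for b in range(n_bins - 1):
        left_g += grad_sum[b]
        left_n += count[b]
        right_g, right_n = total_g - left_g, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        gain = (score(left_g, left_n) + score(right_g, right_n)
                - score(total_g, total_n))
        if gain > best_gain:
            best_threshold, best_gain = lo + (b + 1) * width, gain
    return best_threshold, best_gain

# Low feature values are over-predicted (negative gradient), high ones under-predicted.
feature = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
gradients = [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
threshold, gain = best_histogram_split(feature, gradients)
```

In leaf-wise growth, a gain of this kind would be computed for every current leaf, and only the leaf with the largest gain would be split next.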
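The initial objective function of the model consists of the loss function $l$ and the regularization term $\Omega$. The objective function for LightGBM can be expressed as:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
where $n$ represents the total number of samples, $y_i$ is the actual value of the $i$-th sample, $\hat{y}_i$ is the predicted value of the $i$-th sample, $f_k$ refers to the $k$-th tree model, $K$ is the number of trees, and $\Omega(f_k)$ denotes the complexity of the $k$-th tree.
Because the trees are added one at a time, the objective function at iteration $t$ can be transformed into the following form:
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + Const$$
where $\hat{y}_i^{(t-1)}$ is the prediction of the first $t-1$ trees, $f_t$ is the tree added at iteration $t$, and $Const$ is the constant term. By expanding using the second-order Taylor series and removing the constant term, we can regularize the expansion and combine the coefficients of the linear and quadratic terms to obtain the final objective function. The second-order Taylor expansion of the objective function is as follows:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) + Const$$
where $g_i$ and $h_i$ are the first and second derivatives of the loss $l\left(y_i, \hat{y}_i^{(t-1)}\right)$ with respect to $\hat{y}_i^{(t-1)}$.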
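For squared-error loss the second-order expansion is exact, which gives a simple numerical check of the derivation. The sketch below assumes $l(y, \hat{y}) = (y - \hat{y})^2$, so the first derivative is $2(\hat{y} - y)$ and the second derivative is 2; the sample values are hypothetical and the regularization term is left out.

```python
def squared_loss(y, pred):
    return (y - pred) ** 2

def taylor_objective(y, prev_pred, step):
    """Second-order expansion of the loss around the previous prediction."""
    g = 2.0 * (prev_pred - y)   # first derivative w.r.t. the prediction
    h = 2.0                     # second derivative
    return squared_loss(y, prev_pred) + g * step + 0.5 * h * step ** 2

y_true = [3.0, 1.0, 4.0]
prev = [2.5, 1.5, 3.0]     # predictions of the first t-1 trees
step = [0.4, -0.3, 0.8]    # outputs f_t(x_i) of a candidate new tree

exact = sum(squared_loss(y, p + s) for y, p, s in zip(y_true, prev, step))
approx = sum(taylor_objective(y, p, s) for y, p, s in zip(y_true, prev, step))
```

For other losses the two quantities differ by third-order terms, which is why the expansion is written as an approximation.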
4.3. LGWO Optimization Algorithm
In the process of fitting the relationship between the subway station-level PFRR value and land use-related built environment factors, it is necessary to adjust the hyperparameters of the LightGBM model to achieve optimal performance. Therefore, the LGWO algorithm is chosen for parameter optimization to overcome local optimality issues and automatically generate the optimal set of hyperparameters, enhancing the model’s fitting and predictive accuracy:
where
,
is the current position of the leader wolf
, and
a is the random coefficient. The expression for determining the random search path
of the leader wolf
is:
In the equation, is the best position of the leader wolf, and both and follow a normal distribution
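A Levy-flight perturbation of the leader wolf can be sketched with Mantegna's algorithm, in which the step is the ratio of two normally distributed draws. This is an illustration only and not the exact update rule of this study: the stability index `beta`, the step `scale`, the clipping to bounds, and the hyperparameter ranges are all assumptions.

```python
import math
import random

def mantegna_sigma(beta):
    """Standard deviation of the numerator draw in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)

def levy_step(beta, rng):
    """Draw one heavy-tailed step length: u / |v|^(1/beta)."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

def perturb_leader(position, bounds, rng, beta=1.5, scale=0.01):
    """Move the leader by one Levy step per dimension, clipped to the bounds."""
    new_position = []
    for value, (low, high) in zip(position, bounds):
        candidate = value + scale * (high - low) * levy_step(beta, rng)
        new_position.append(min(max(candidate, low), high))
    return new_position

rng = random.Random(0)
# Hypothetical search ranges: tree depth, number of leaves, learning rate.
bounds = [(3, 12), (8, 128), (0.01, 0.3)]
leader = [6.0, 31.0, 0.1]
moved = perturb_leader(leader, bounds, rng)
```

Most steps are small, but the heavy tail occasionally produces a long jump, which is what lets the search leave a local optimum that plain gray wolf optimization would converge to.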
4.4. SHAP Attribution Analysis Principles
SHAP is a post hoc explanation method inspired by game theory. It measures the impact of various features and their interaction terms by calculating their marginal contributions in the model. The marginal contributions of different features and their interaction terms are known as Shapley values. Shapley values quantify the influence of different features on the model’s output, providing an explanation for the predictions made by black-box models. Let the prediction be represented as
$f(x)$; then we have:
$$f(x) = \phi_0 + \sum_{i=1}^{|M|} \phi_i$$
In the equation, $\phi_0$ is the base value (the expected model output), and $\phi_i$ represents the magnitude of the impact of feature $x_i$ on $f(x)$. The model's prediction can thus be understood as the sum of the contributions of all the features. The calculation formula for $\phi_i$ is as follows:
$$\phi_i = \sum_{S \subseteq M \setminus \{x_i\}} \frac{|S|!\,(|M| - |S| - 1)!}{|M|!} \left[ f_x(S \cup \{x_i\}) - f_x(S) \right]$$
In the equation, the overall expression represents an expected value, indicating the change in the model output when feature $x_i$ is included in the model versus when it is not. Here, $M$ refers to the full set of features, and $S$ denotes a subset of features from $M \setminus \{x_i\}$, with multiple possible values corresponding to different feature combinations. $f_x(S \cup \{x_i\})$ and $f_x(S)$ represent the model's output results when $x_i$ is included in the model and when it is not, respectively.
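The Shapley formula can be evaluated exactly for a small model by enumerating every feature subset. The sketch below assumes that an absent feature is replaced by a baseline value, which is one common convention; the toy model and inputs are hypothetical. Tree-based SHAP implementations obtain the same quantities far more efficiently.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every feature subset.

    Features outside the subset take their baseline value (an assumed way
    of removing a feature from the model).
    """
    m = len(x)

    def value(subset):
        inputs = [x[j] if j in subset else baseline[j] for j in range(m)]
        return model(inputs)

    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        total = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(v):
    # One additive term and one interaction term.
    return 2.0 * v[0] + v[1] * v[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
phi_0 = toy_model(baseline)
```

The additive feature receives its full effect, the interaction effect is shared equally between the two interacting features, and the base value plus all contributions reproduces the prediction.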
4.5. Overall Architecture Implementation Process
The overall model architecture can be divided into the following steps:
Step 1: Obtain land use-related built environment variables and calculate PFRR values for weekdays and weekends.
Step 2: Use the MGWR model to solve for the optimal bandwidth of each variable, verify the spatial heterogeneity distribution of the variable coefficients, and visualize the results.
Step 3: Select the tree depth, number of leaves, and learning rate in the LightGBM model as optimization targets.
Step 4: Initialize the parameters of the LGWO algorithm, including the value range of the parameters to be optimized, the initial positions of the gray wolf population, the population size, and the fitness function.
Step 5: Use the LGWO algorithm for optimization, continuously updating the position of the wolf and outputting the optimal parameter set to LightGBM.
Step 6: Fit the relationship between built environment variables and PFRR using LightGBM, utilizing R-squared, RMSE, and MAE metrics, and output the importance of built environment variables.
Step 7: Perform global positive and negative feedback feature analysis and investigate the nonlinear relationships and threshold effects between individual variables and passenger flow recovery resilience using SHAP analysis based on Shapley values.
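The three evaluation metrics named in Step 6 can be computed as in the following sketch. The function name and sample values are hypothetical; using one of these metrics (for example, validation RMSE) as the LGWO fitness function in Step 4 would be a natural choice, but that is an assumption rather than something stated above.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R-squared, RMSE, and MAE for paired observations and predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }

scores = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

In this toy case a single large error gives an RMSE twice the MAE, which is why reporting both is informative.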
5. Results
5.1. Built Environment Coefficient Spatial Heterogeneity
In the process of model calculation, we standardized the built environment variables. When constructing the MGWR model, we used the variance inflation factor (VIF) to analyze the correlation among the independent variables, finding that all VIF values were less than 5. This indicates that there is no significant multicollinearity among the variables, allowing them to be included in the model. According to the introduction of the MGWR model, coefficient spatial distribution heterogeneity exists only when the bandwidth is less than the standard global bandwidth. The results show that for PFRR on weekdays and weekends, the bandwidths for
,
,
,
, and
are all less than 85 (the global bandwidth). Therefore, we characterize the spatial distribution heterogeneity of the coefficients corresponding to these land-related built environment variables. It is worth noting that in order to highlight samples contributing to spatial heterogeneity, only samples with a
p-value less than 0.1 are depicted. The spatial distribution of the coefficients is shown in Figure 4 and Figure 5.
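The multicollinearity check described above can be sketched as follows. The VIF of each variable equals the corresponding diagonal element of the inverse correlation matrix; the three indicator columns are hypothetical and the small Gauss-Jordan routine stands in for a linear algebra library.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    centred = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        centred.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in centred]
            for ci in centred]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF of each variable: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical indicators for six stations.
residential = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
office = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
leisure = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]
values = vif([residential, office, leisure])
keep = [v < 5 for v in values]
```

Even though the first two hypothetical indicators are strongly correlated, their VIF stays below the threshold of 5, so all three would be retained.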
Figure 4 and Figure 5 illustrate the distribution of parameters with spatially heterogeneous variable coefficients, showing only the distribution of stations with p-values less than 0.1, meaning only statistically significant stations are displayed. We can observe that
has a positive promoting effect on the PFRR during both time periods (with positive coefficients). On weekends, this type of feature has a greater impact in urban areas than in the terminal areas of the lines; however, the opposite is true for weekdays, where the influence of urban areas on PFRR is less than that of non-city center areas, especially significantly impacting stations in the western part of the city.
has a negative effect on the PFRR for both weekdays and weekends, indicating a poor recovery of non-essential travel, with no obvious spatial distribution pattern.
positively correlates with PFRR, with varying degrees of influence on weekdays and weekends. Specifically, stations with a smaller impact on weekdays may exhibit a larger influence on weekends. For
, there are fewer stations with spatially heterogeneous distribution on weekends, reflecting that the size of leisure land area does not constrain its impact on PFRR on weekends, while for weekdays, the negative impact is greater for north–south-oriented stations compared to east–west-oriented stations. Notably, the influence of
on PFRR is negative, indirectly indicating a shift from subway travel to other commuting patterns. Stations with coefficient spatial heterogeneity are predominantly located at the ends of the lines.
To more comprehensively explore the spatial heterogeneity of PFRR, we further analyzed the impact of urban land-use patterns on PFRR and considered the interference of transportation modes such as private cars, public transit, and walking. Our analysis indicates that the diversity and complexity of urban land use significantly affect PFRR. Specifically, areas with a concentration of commercial and office spaces exhibit stronger PFRR due to higher commuting demands on weekdays. However, PFRR may decline in these areas during weekends due to an increase in commercial and entertainment activities. Additionally, residential areas generally have lower PFRR, which may be associated with a shift in residents’ travel patterns, from subways to private cars or walking and other modes of transportation.
When considering the interference of private cars, public transit, and walking, we found that these modes of transportation have a complex impact on the spatial distribution of PFRR. The prevalence of private cars may lead to a reduction in subway passenger flow in certain areas, especially in residential areas on the outskirts of the city. Meanwhile, the layout of public transit routes and pedestrian-friendly urban planning also affect PFRR. For instance, areas dense with bus stops may attract more public transit passengers, thereby affecting the subway’s PFRR. Pedestrian-friendly environments may increase the willingness of people to walk, particularly for short-distance trips, which also impacts the subway’s PFRR.
5.2. Built Environment’s Significance and Global Impact
Global interpretation refers to explaining the input features and feature interactions of the entire model, specifically by calculating the Shapley values for all features and their interactions. The contribution levels of different features to the PFRR are shown in Figure 6, where feature importance is ranked based on the average absolute impact on the target variable. Figure 6 presents the ranking of variable contributions to the optimal fitting of PFRR by the LGWO–LightGBM model on weekdays and weekends.
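A ranking by average absolute impact can be computed from a matrix of SHAP values as in the sketch below. The feature names and values are hypothetical, and expressing each importance as a percentage of the total is an assumption about how such shares might be derived.

```python
def importance_shares(shap_rows, feature_names):
    """Mean absolute SHAP value per feature, as a percentage of the total."""
    n = len(shap_rows)
    mean_abs = [sum(abs(row[j]) for row in shap_rows) / n
                for j in range(len(feature_names))]
    total = sum(mean_abs)
    shares = {name: 100.0 * value / total
              for name, value in zip(feature_names, mean_abs)}
    # Sort from most to least important.
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))

# Hypothetical SHAP values: one row per station, one column per feature.
names = ["residential", "office", "leisure"]
rows = [[0.30, -0.10, 0.05],
        [-0.50, 0.20, -0.05],
        [0.40, -0.30, 0.10]]
ranking = importance_shares(rows, names)
```

Taking absolute values first means that a feature with strong positive effects at some stations and strong negative effects at others still ranks as important, rather than averaging out to zero.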
As shown in
Figure 6a, for weekdays,
and
have the highest importance, accounting for 37.69% and 18.83%, respectively. In contrast, the importance of
is only 9.82%. This suggests that travelers who shifted away from using the subway during the rapid recovery phase did not quickly return to it, likely due to changes in travel habits. Once travel behavior changes, it does not easily revert to the original state.
,
,
, and
have less impact on PFRR, indicating that non-essential travel did not effectively recover during the rapid recovery period.
As shown in
Figure 6b, for weekends, the impact of land-related built environment variables on PFRR is relatively balanced. Notably,
has the most significant impact, indicating an increase in trips for recreational purposes, which reflects the recovery of travel vitality.
and
also contribute significantly, with the positive and negative feedback mechanisms on weekend PFRR being clearly identifiable in
Figure 7. Additionally, features related to non-essential travel, such as
,
, and
, also have relatively high contributions, highlighting the recovery of non-essential travel during weekends.
Figure 7 presents the SHAP summary plots for weekdays and weekends, illustrating the impact and importance of various features on the PFRR. The SHAP summary plot ranks the variables from top to bottom based on the sum of their absolute SHAP values. The relationship between the Shapley values and the feature values for each sample helps to estimate the effect of each feature on the prediction results. For example, on weekdays, stations with higher
tend to show mostly negative SHAP values. This suggests that for most stations, an increase in
leads to a negative feedback mechanism on the resilience of passenger flow recovery. Similar negative feedback mechanisms are observed for
,
, and
. Other variables either have a positive feedback mechanism or show no significant feedback. On weekends,
exhibits a clear positive feedback mechanism, as do
and
. In the following section, a more detailed investigation will be conducted into the nonlinear relationships between individual variables and features.
5.3. Built Environment’s Nonlinear Impact
In this subsection, we introduce the SHAP dependence plots for individual features. Figure 8 and Figure 9 illustrate the specific relationships between individual variables and PFRR for weekdays and weekends, respectively. The horizontal axis represents the values of the features, while the vertical axis indicates the corresponding SHAP values. By observing the trends in SHAP values, we can infer the degree to which the feature impacts the model’s results, allowing for a better understanding of the relationship between the feature and the model.
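A threshold effect in a dependence plot corresponds to the feature value at which the SHAP value changes sign. A minimal sketch of locating such crossings is given below, assuming noise-free data and linear interpolation between neighbouring points; the variable names and values are hypothetical, and real dependence data would usually need smoothing first.

```python
def sign_change_thresholds(feature_values, shap_values):
    """Feature values at which the SHAP value crosses zero.

    Points are sorted by feature value; a crossing between two neighbours
    is located by linear interpolation.
    """
    points = sorted(zip(feature_values, shap_values))
    thresholds = []
    for (x0, s0), (x1, s1) in zip(points, points[1:]):
        if s0 * s1 < 0:
            thresholds.append(x0 + (x1 - x0) * (-s0) / (s1 - s0))
    return thresholds

# Hypothetical dependence data: negative feedback at low values, positive above.
density = [0.0, 1.0, 2.0, 3.0, 4.0]
shap = [-0.4, -0.2, -0.1, 0.1, 0.3]
crossings = sign_change_thresholds(density, shap)
```

A single crossing indicates a negative-to-positive or positive-to-negative pattern, while several crossings correspond to the alternating feedback seen for some variables.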
From
Figure 8, it can be seen that for the PFRR on weekdays, the selected built environment variables exhibit significant nonlinear relationships.
has a negative feedback effect on the dependent variable at lower values; as its value increases, the SHAP values become positive, indicating a shift to positive feedback. Similarly, other built environment variables that display this negative-then-positive trend include
,
,
,
, and
. In contrast,
facilities show a positive SHAP value at lower values, which then turns negative as the values increase, indicating an association with passenger flow resilience that shifts from positive to negative. Other variables exhibiting this positive-to-negative relationship include
,
,
,
, and
.
From
Figure 9, it can be observed that for the PFRR on weekends, different types of nonlinear relationships are also evident. Built environment variables related to land that exhibit a negative-to-positive feedback characteristic include
,
,
, and
. In contrast, variables such as
,
,
, and
show a positive-to-negative feedback pattern. Other features alternate between positive and negative feedback or fluctuate around a SHAP value of zero, suggesting that their apparent effects on PFRR are driven largely by random variation among sample points.
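One simple way to label the patterns described in this subsection is to sort samples by feature value, average the SHAP values within a few equal-size bins, and compare the signs of the first and last bins. The sketch below is an illustrative heuristic only, not the procedure used in this study; the function names, bin count, tolerance, and sample values are all assumptions.

```python
from typing import List, Sequence


def binned_mean_shap(
    feature_values: Sequence[float],
    shap_values: Sequence[float],
    n_bins: int = 4,
) -> List[float]:
    """Mean SHAP value in n_bins equal-size groups ordered by feature value."""
    pairs = sorted(zip(feature_values, shap_values))
    if len(pairs) < n_bins:
        raise ValueError("need at least one sample per bin")
    means = []
    for b in range(n_bins):
        start = b * len(pairs) // n_bins
        stop = (b + 1) * len(pairs) // n_bins
        chunk = [s for _, s in pairs[start:stop]]
        means.append(sum(chunk) / len(chunk))
    return means


def classify_pattern(means: Sequence[float], tol: float = 1e-3) -> str:
    """Label a dependence curve by the sign of its first and last bin means."""
    first, last = means[0], means[-1]
    if first < -tol and last > tol:
        return "negative-to-positive"
    if first > tol and last < -tol:
        return "positive-to-negative"
    return "mixed or near zero"


# Hypothetical feature values and SHAP values for eight stations.
x = [1, 2, 3, 4, 5, 6, 7, 8]
shap_up = [-0.30, -0.20, -0.10, -0.05, 0.05, 0.10, 0.20, 0.30]
shap_down = list(reversed(shap_up))
shap_flat = [0.0] * 8

print(classify_pattern(binned_mean_shap(x, shap_up)))
print(classify_pattern(binned_mean_shap(x, shap_down)))
print(classify_pattern(binned_mean_shap(x, shap_flat)))
```

The bin boundary at which the mean SHAP value changes sign gives a rough estimate of the threshold; a finer binning or a smoothed curve would be needed for anything more precise.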
5.4. Comparison of Model Fitting Effects
To assess the fit and performance of the proposed model, we compared it with the following baseline models in terms of R-squared, RMSE (root mean squared error), MAE (mean absolute error), runtime, and model complexity. The characteristics of the baseline models are as follows.
OLS: This method estimates parameters in a linear model by finding the best-fitting line that minimizes the sum of the squared vertical distances (i.e., errors) from all data points to this line. It is a straightforward and intuitive approach that serves as the baseline model for many regression analyses.
GBDT [29]: The gradient boosting decision tree (GBDT) is an ensemble learning algorithm that iteratively trains decision trees to minimize a loss function, with each tree fitted to the residuals of the ensemble built so far. A new tree is added at each step to improve the model’s fit to the data, while overfitting is limited through regularization (for example, shrinkage and constraints on tree size).
LightGBM: This is an efficient gradient boosting decision tree algorithm that improves training speed and reduces memory usage through histogram-based split finding and a leaf-wise tree growth strategy.
XGBoost [11]: This is an efficient gradient boosting framework that improves model performance by sequentially adding decision trees, each aiming to correct the residuals of the previous model. XGBoost incorporates a regularization term into the objective function to control model complexity and prevent overfitting.
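For reference, the residual-fitting idea shared by the boosting baselines above can be written compactly in its standard textbook form for squared-error loss, together with the general shape of an XGBoost-style regularized objective. The symbols used here ($F_m$, $h_m$, $\nu$, $\Omega$, $\gamma$, $\lambda$, $T$, $w$) are notation introduced for illustration and are not taken from this paper.

```latex
% Boosting update: the new tree h_m is fit to the residuals of the current model
r_i^{(m)} = y_i - F_{m-1}(x_i), \qquad
F_m(x) = F_{m-1}(x) + \nu \, h_m(x), \quad 0 < \nu \le 1

% Regularized objective over K trees (T: number of leaves, w: leaf weights)
\mathcal{L} = \sum_{i=1}^{n} \ell\bigl(y_i, \hat{y}_i\bigr) + \sum_{k=1}^{K} \Omega(f_k), \qquad
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^{2}
```

Here $\nu$ is the learning rate (shrinkage), and the penalty $\Omega$ grows with the number of leaves and the magnitude of the leaf weights, which is how tree complexity enters the objective.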
The experiments were run on personal computers equipped with an i7-9700 CPU (2.40 GHz) and 32 GB of RAM, using Python 3.7.
Table 2 presents the fitting results of the model proposed in this study and the baseline models.
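The three fit metrics used in this comparison follow their standard definitions and can be computed as in the sketch below. The function name and the sample values are hypothetical and serve only to make the arithmetic concrete; they are not results from this study.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[str, float]:
    """Return R-squared, RMSE, and MAE for paired observations and predictions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }


# Hypothetical observed and predicted values for four stations.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(observed, predicted)
print({name: round(value, 4) for name, value in metrics.items()})
```

A higher R-squared and lower RMSE and MAE indicate a better fit; RMSE penalizes large errors more heavily than MAE because the errors are squared before averaging.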
As shown in
Table 2, the new method proposed in this study demonstrates advantages in fitting characteristics. For the XGBoost and LightGBM methods listed in
Table 2, we used grid search cross-validation to obtain the optimal combination of hyperparameters. The range for the maximum depth of trees (Max depth) was set from 1 to 10, while the number of regression trees (N estimators) ranged from 10 to 50. The information in
Table 2 indicates that the proposed model achieves a more accurate fit while reducing model complexity and computation time. Overall, the reduced number of optimal hyperparameters suggests that the proposed architecture is more streamlined, which lowers both the risk of overfitting and the operational burden of parameter tuning.
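To make the tuning procedure concrete, the following is a minimal pure-Python sketch of grid search with k-fold cross-validation over maximum tree depth and number of trees. It uses a small one-feature boosted-tree regressor as a stand-in for XGBoost and LightGBM, so it illustrates the search logic only and is not the implementation used in this study. The toy data, learning rate, fold count, and grid step sizes are assumptions.

```python
import itertools
import random


def fit_tree(xs, rs, depth):
    """Fit a 1-D regression tree to targets rs; return a prediction function."""
    mean = sum(rs) / len(rs)
    values = sorted(set(xs))
    if depth == 0 or len(values) < 2:
        return lambda x: mean
    best_sse, best_t = float("inf"), values[1]
    for t in values[1:]:  # candidate thresholds; both sides stay non-empty
        left = [r for x, r in zip(xs, rs) if x < t]
        right = [r for x, r in zip(xs, rs) if x >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if sse < best_sse:
            best_sse, best_t = sse, t
    lo = [(x, r) for x, r in zip(xs, rs) if x < best_t]
    hi = [(x, r) for x, r in zip(xs, rs) if x >= best_t]
    left_fn = fit_tree([x for x, _ in lo], [r for _, r in lo], depth - 1)
    right_fn = fit_tree([x for x, _ in hi], [r for _, r in hi], depth - 1)
    return lambda x: left_fn(x) if x < best_t else right_fn(x)


def boosted_predict(x_train, y_train, x_test, max_depth, n_estimators,
                    learning_rate=0.3):
    """Fit trees to successive residuals and return predictions for x_test."""
    base = sum(y_train) / len(y_train)
    train_pred = [base] * len(x_train)
    test_pred = [base] * len(x_test)
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(y_train, train_pred)]
        tree = fit_tree(x_train, residuals, max_depth)
        train_pred = [p + learning_rate * tree(x) for p, x in zip(train_pred, x_train)]
        test_pred = [p + learning_rate * tree(x) for p, x in zip(test_pred, x_test)]
    return test_pred


def grid_search_cv(x, y, grid, k=3, seed=0):
    """Return (best_params, best_mean_rmse) over every combination in grid."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    keys = sorted(grid)
    best_params, best_rmse = None, float("inf")
    for combo in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, combo))
        rmses = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in idx if i not in held_out]
            preds = boosted_predict(
                [x[i] for i in train_idx], [y[i] for i in train_idx],
                [x[i] for i in test_idx], **params)
            mse = sum((y[i] - p) ** 2 for i, p in zip(test_idx, preds)) / len(test_idx)
            rmses.append(mse ** 0.5)
        mean_rmse = sum(rmses) / k
        if mean_rmse < best_rmse:
            best_params, best_rmse = params, mean_rmse
    return best_params, best_rmse


# Toy data (hypothetical): a smooth nonlinear relationship.
x = [i / 10 for i in range(30)]
y = [v ** 2 for v in x]
# Reduced grid so the pure-Python demo runs quickly. The stated ranges are
# max depth 1 to 10 and 10 to 50 trees; the step sizes here are assumptions.
grid = {"max_depth": [1, 2, 3], "n_estimators": [10, 30, 50]}
best_params, best_rmse = grid_search_cv(x, y, grid)
print(best_params, round(best_rmse, 3))
```

Each hyperparameter combination is scored by its mean RMSE on held-out folds, so the selected combination reflects out-of-sample error rather than training fit. In practice, the same search is usually delegated to library tooling rather than written by hand.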
6. Discussion and Conclusions
This study examined the relationship between station-level passenger flow recovery resilience and land-related built environment factors during the post-pandemic period. We developed a novel approach that combines heuristic algorithms with supervised tree-based methods to explore the impact of land use-related built environment variables on station-level passenger flow recovery resilience. Several interesting and unique findings warrant further discussion.
First, during the recovery phase, although sporadic outbreaks may still have affected people’s travel, the overall characteristics of passenger flow recovery were not significantly impacted. People may have adapted to pandemic-influenced travel patterns as these became normalized [
22,
44]. However, passenger flow recovery varied considerably among stations, reflecting spatial differences in how people respond to sudden public health emergencies. It is generally believed that high resilience and rapid recovery occur mainly outside urban centers and that residential characteristics are positively correlated with this phenomenon [
45]. Indeed, our study found a strong correlation between the contribution of residential land and the characteristics of passenger flow resilience disturbances. Additionally, based on the spatial distribution of correlation coefficients for built environment variables at subway stations in Xi’an, we observed that this spatial heterogeneity is more closely associated with the urban center.
Second, we found that, compared with weekdays, the importance of built environment factors for weekend passenger flow resilience is more evenly distributed. This reflects the non-urgent nature of residents’ travel on non-working days: with less essential travel, diverse land uses shape the composition of passenger flow in multiple ways. This is consistent with the temporal variation in station vulnerability [
46]. Overall, the nonlinear effects of the variables exhibit distinct threshold characteristics, providing significant data support for operational strategy adjustments during the recovery phase, such as offering more efficient departure frequencies or reserving greater onboard capacity at rapidly recovering stations.
From a policy perspective, the pandemic has exacerbated disparities in travel accessibility. Our findings indicate that commercial, recreational, and residential land are decisive factors during the rapid recovery phase. This suggests that the speed of recovery of subway travel is most closely related to residents’ leisure and daily entertainment travel in the vicinity of stations, reflecting the broad service scope of the subway as a fundamental mode of transportation. Notably, many studies have pointed out that low-income populations, as well as certain age and gender groups, may depend more heavily on public transportation, which could lead these groups to return to public transit more quickly in the post-pandemic period [
47,
48]. This is consistent with our finding that stations with a stronger residential character in non-central areas tend to have greater recovery capacity. Therefore, our research provides a basis for the integrated planning of subway networks and supports development around subway stations that enhances their resilience to unknown disturbances or sudden public health events.
In urban planning and policymaking, the PFRR index provides a useful tool for understanding and enhancing the resilience of urban transportation systems. Given the significant impact of commercial, leisure, and residential land use on the recovery of passenger flow at subway stations, urban planners should promote diverse and comprehensive land use around stations to strengthen their recovery capacity. The study also reveals the recovery capacity of stations in non-central urban areas, suggesting that policymakers should pay attention to the potential for public transport development in these areas and improve citywide transportation connectivity through service optimization. Because the PFRR index reflects how well stations adapt to public health emergencies, planners and policymakers can use it to assess and plan the resilience of the urban transportation system and to develop effective emergency response plans. Data play a key role in revealing the recovery capacity of passenger flow and can guide decision-making, resource allocation, and operational strategies.
This paper approaches the topic from the perspective of station-level passenger flow recovery resilience and presents a novel analytical framework with specific results, in particular indicators of passenger flow recovery during the pandemic recovery period. Nevertheless, the impact of such public health events on travel behavior is still not fully understood, and comprehensive analyses of their impact on public transportation networks are lacking. In addition, the applicability of the conclusions drawn from our case study to other cities needs further validation. Finally, while this study focuses on the impact of land-use characteristics on passenger flow recovery resilience, future research should also consider factors such as residents’ income, age, and access to personal transportation modes.