An Improved Machine Learning Framework Considering Spatiotemporal Heterogeneity for Analyzing the Relationship Between Subway Station-Level Passenger Flow Resilience and Land Use-Related Built Environment

Li, Peikun; Yang, Quantao; Lu, Wenbo; Xi, Shu; Wang, Hao

doi:10.3390/land13111887

by Peikun Li ¹, Quantao Yang ²,*, Wenbo Lu ³,⁴, Shu Xi ¹ and Hao Wang ¹

¹ Key Laboratory of Transport Industry of Big Data Application Technologies for Comprehensive Transport, Ministry of Transport, Beijing Jiaotong University, Beijing 100044, China

² Department of Public Security, Shaanxi Police College, Xi’an 710021, China

³ School of Transportation, Southeast University, Nanjing 214135, China

⁴ Department of Civil Engineering, Monash University, Melbourne, VIC 3800, Australia

* Author to whom correspondence should be addressed.

Land 2024, 13(11), 1887; https://doi.org/10.3390/land13111887

Submission received: 22 September 2024 / Revised: 6 November 2024 / Accepted: 9 November 2024 / Published: 11 November 2024

(This article belongs to the Special Issue Land Use Planning for Post COVID-19 Urban Transport Transformations)


Abstract: The COVID-19 pandemic and similar public health emergencies have significantly impacted global travel patterns. Analyzing the recovery characteristics of subway station-level passenger flow during the pandemic recovery phase can offer unique insights into public transportation operations and guide practical planning efforts. This study constructs a station-level passenger flow recovery resilience (PFRR) index for the rapid recovery phase using swipe data from the subway automatic fare collection (AFC) system. It then develops an analytical framework based on a multiscale geographically weighted regression (MGWR) model, improved gray wolf optimization with Levy flight (LGWO), and light gradient boosting machine (LightGBM) regression to analyze weekday and weekend passenger flow resilience in relation to land use-related built environment types. Finally, SHAP attribution analysis is used to study the nonlinear relationships between built environment variables and the PFRR index. The results show significant spatial heterogeneity in the impact of commercial, recreational, and residential land, as well as leisure and shopping POIs (points of interest), on PFRR. On weekdays, the built environment variables most relevant to PFRR are the numbers of enterprise and shopping POIs. In contrast, the contributions of built environment variables to weekend PFRR are more balanced, reflecting the recovery of non-essential travel on weekends. Most land use-related built environment variables exhibit nonlinear associations with PFRR values. The proposed analytical framework shows significant performance advantages over the baseline models. This study provides unique insights into subway passenger flow characteristics and surrounding land use-related development layouts under the impact of public health emergencies.

Keywords: land use; passenger flow recovery resilience; light gradient boosting machine; nonlinear relationship; SHAP value

1. Introduction

The stable development of urban society relies on the efficient operation of infrastructure. Public transportation systems bear the significant task of facilitating travel for residents in medium and large cities and have become crucial infrastructure support systems for TOD (transit-oriented development) in recent years [1]. However, with the continuous expansion and development of cities, the resilience of public transportation systems to natural disasters and public health emergencies has been declining, and after such disruptions it is challenging for these systems to recover quickly. Lost or reduced transportation system functionality directly impacts population mobility and affects social and economic recovery [2]. Due to their efficiency, punctuality, and large capacity, subway systems are popular among commuters in major cities and account for a significant portion of public transportation. Since 2020, the COVID-19 pandemic has severely impacted travel patterns. The need for social distancing due to the airborne nature of the virus became a primary concern during severe phases of the pandemic [3]. Consequently, subway travel was severely affected, with a significant decline in passenger flow.

In the post-pandemic new normal, subway passenger flow is gradually recovering, but it exhibits spatial and temporal differences at the station level [4]. This necessitates introducing a new concept of station-level passenger flow resilience to represent these differences. As early as the late 1990s, researchers began using the concept of resilience to study cities as complex socio-economic systems. Urban resilience refers to the ability of the overall urban system and its subsystems to maintain and restore their functions when facing external shocks. Due to the pandemic, most urban rail transit systems experienced three phases: a sharp decline, gradual recovery, and stable normalization [5]. Station-level passenger flow reflects the relationship between urban rail transit systems and urban spaces. The built environment around stations, especially land-use characteristics, significantly influences the spatial and temporal distribution of passenger flow. For example, stations primarily serving residential and employment functions tended to recover passenger flow more quickly post-pandemic, while those focused on tourism and entertainment recovered more slowly, showing clear differences between weekdays and non-weekdays [6,7].

Reviewing the mechanisms behind the differentiated distribution of passenger flow characteristics, a large body of literature has explored the complex relationship between multi-source built environment variables and passenger flow characteristics. The selection of built environment variables primarily focuses on the 5D characteristics of density, diversity, design, distance, and destination. Overall, factors related to residence and employment tend to play a decisive role in passenger flow characteristics [8,9]. Other types of land use are mainly associated with non-essential travel, thus exhibiting considerable fluctuations in their correlation with overall station connections. Other built environment factors, such as the number of bus routes and distance to the city center, are also important influencing factors, especially concerning peak-hour passenger flow [10,11]. However, when studying the correlation between subway passenger flow and the built environment under the impact of the pandemic, many scholars rely on qualitative analysis or empirical judgment without systematically quantifying the complex relationship between land-use characteristics and passenger flow resilience during the post-pandemic recovery stage.

In terms of research methods, earlier studies typically used global or local regression models, which overlook the fact that the impact of built environment factors on passenger flow distribution varies across spatial and temporal scales. The multiscale geographically weighted regression (MGWR) model addresses this by optimizing a separate bandwidth for each variable, although it still adheres to the assumption of linear relationships [12,13,14]. Tree-based machine learning regression methods handle supervised learning problems by fitting features according to the tree structure and overcome the limitations of the linear regression assumption with second-order residual-based criteria, but they still rely on subjective experience when setting hyperparameters [11].

In summary, although detailed research has been conducted on the changes in transportation services under the impact of the pandemic, there is still a lack of metric development for station-level passenger flow resilience in the post-pandemic era, as well as analysis of the mechanisms behind the complex nonlinear relationships between these changes and land use-related built environment variables and their differentiated spatiotemporal distribution. Based on this, the current study uses AFC card data to establish station-level passenger flow recovery resilience (PFRR) characteristics during the post-pandemic period and constructs a new analysis framework. The objectives are to analyze the spatiotemporal distribution framework of station-level PFRR indices in different time periods in relation to land use-related built environment characteristics and to optimize light gradient boosting machine (LightGBM) hyperparameters using improved gray wolf optimization with Levy flight (LGWO) to achieve the optimal fitting of LightGBM supervised learning. Subsequently, SHAP attribution analysis based on game theory is used to evaluate the nonlinear impacts of relevant built environment factors on passenger flow resilience. Finally, other commonly used baseline methods are employed to evaluate the model’s efficiency and validity using different evaluation metrics to verify the effectiveness and rationality of the proposed analysis approach.

2. Related Work

2.1. Research on Passenger Flow Resilience

Resilience refers to the ability to recover from the current state to the original state, specifically the capacity of a related system to return to its previous state after being disrupted [15]. Resident activities are an important dimension for assessing social recovery, and indicators of the extent of recovery are a key representation of social resilience. A significant portion of urban residents’ travel activities depends on public transportation. Thus, the spatial and temporal distribution characteristics of transportation system activities are considered a starting point for studying activity resilience [16]. During the more severe periods of the pandemic, residents’ travel activities were regulated due to government control measures and concerns for their own health. The urban rail transit system, which is the backbone of urban residents’ travel activities, especially commuting, can serve as an important indicator for understanding resident activity resilience, given its high passenger density and elevated risk of transmission. Therefore, analyzing changes in rail transit passenger flow metrics can offer important insights into resident activity resilience.

Since the outbreak of the pandemic, scholars have examined the frequency of residents’ travel activities and the shift in transportation modes across the world. For subway systems, the decline in residents’ travel frequency has become an indisputable fact [17]. Research has shown that the land-use characteristics around subway stations play a key role in the differences in passenger flow, with this variance being particularly evident in the proportion of passenger flow during peak hours, which correlates strongly with residential and commuting land use. Several studies have highlighted the differences in station passenger flow during the early stages of the pandemic. For example, a study of Seoul’s subway passenger flow from January to March 2020 found that stations serving work-related functions experienced relatively small fluctuations in passenger flow, while those serving leisure and entertainment purposes saw frequent fluctuations closely linked to the rise in new infections [18]. A contemporaneous investigation of Taiwan province’s subway system revealed that stations near large shopping centers, universities, and areas with active nighttime economies were the most severely impacted by the pandemic [6]. In a study conducted in Hong Kong on transportation modes used for different travel purposes, it was found that subway ridership for entertainment and shopping trips dropped by more than 80% and 40%, respectively, further highlighting the significant decline in non-essential travel during the pandemic [7].

After the pandemic went through various stages, including virus mutations, residents receiving vaccinations, and the implementation of control measures, life gradually returned to normal for the population. Although studies indicate that subway ridership cannot return to pre-pandemic levels in a short period, a trend of gradual recovery after system disruptions has emerged. Therefore, it is necessary to study the differences in subway ridership resilience in this “new normal,” as this can better assist operators in improving operational efficiency by adjusting strategies such as train–car matching, route planning, and departure frequency [19]. Consequently, we can draw on previous analyses of subway network resilience to explain the station-level differences from the perspective of physical factors. At present, studies based on station network topological attributes (e.g., network centrality and betweenness centrality) and travel performance indicators (e.g., travel time, waiting time, and safety attributes) observe the recovery characteristics of subway networks when affected by global or local disruptive events. For example, in an analysis using Chengdu’s subway system, the recovery performance of stations and local lines was evaluated during disruption, response, and recovery phases considering impedance needs. This provided practical suggestions for improving emergency response capabilities [20,21,22]. Research on identifying key stations using weighted network complex topology dynamics found that during peak hours, weighted network indicators shifted alongside changes in passenger flow, indicating that key stations are not static [23]. In the event of a sudden heavy rainfall, which leads to an influx of more passengers than usual, Zou et al. analyzed the impact of such extreme weather events on subway networks, including the spatiotemporal effects on passenger flow and resilience evaluation, using passenger elasticity curves to assess system resilience [24].

2.2. The Correlation Between Land-Related Built Environment Factors and Passenger Flow

The TOD model encourages the integration of public transportation infrastructure with an efficiently built environment that is walkable by developing station areas closely connected to non-motorized transportation [10,25]. Therefore, the catchment area around subway stations is generally limited to a 15 min walking zone, typically defined within a radius of 500 to 1000 m [26,27]. It has been verified that the intensity of land use and accessibility of the built environment around stations play an extremely important role in generating and attracting passenger flow. This is not difficult to understand, and can be better illustrated through travel theory based on the four-step model [28]. For example, population density and a younger population around stations positively promote passenger flow, and areas with higher levels of urban development tend to attract commuting passenger flow from less developed areas, creating a siphoning effect. Of course, the spatial effects of these variables are not static, and they are also related to the spatial morphology of cities [29]. From a temporal heterogeneity perspective, the same land-use characteristics may have different impacts on passenger flow at different times. For example, the influence of residential and office land use on the passenger flow of surrounding stations may show opposite correlations during morning and evening peak hours [26,30]. Commercial and recreational land use also have significantly different impacts during peak and non-peak periods [31,32]. Regardless of the type of land use, it is through the generation and attraction of trips that travel activities across the entire network are formed.

Single-type land use may not accurately reflect the development intensity per unit area, while land use-related POI (point of interest) data can effectively compensate for this limitation. Because POI data have clearer functional attributes, they can have a more direct impact on the generation and attraction of passenger flow [33,34]. For example, when two stations have the same area of commercial land use, the station with a greater number of associated POIs indicates a higher intensity of commercial development in that area. Although this method of judgment may not be applicable on a global scale, it serves as a valuable analytical supplement. While some studies have confirmed the association between a station’s topological structure, road density, and other connecting distributions with passenger flow characteristics, the complexity and degree of correlation with passenger flow characteristics are less than those of land attributes [35,36,37].

Research methods exploring the associative characteristics between the two primarily rely on advancements in computational techniques, which can be broadly categorized into three types: global regression methods, local regression methods, and supervised machine learning methods. A typical representative of global regression methods is ordinary least squares (OLS), which clearly promoted detailed research on the relationship between the built environment and passenger flow characteristics in the early stages of study [38,39,40]. However, because it considers global regression, OLS cannot reflect local distribution differences. This limitation led to the application of local regression models, represented by geographically weighted regression (GWR), which uses a distance matrix to capture spatial distribution differences and provide fitting variations for the model [13,14]. However, all of the aforementioned methods adhere to the linear regression assumptions between the dependent and independent variables, which has been proven to be unrealistic. This results in poor fitting performance and imprecise interpretability of these methods. In contrast, tree-based supervised machine learning methods excel at uncovering such nonlinear relationships among variables, which has led to their successful application in various cities in recent years [41,42].

2.3. Limitations and Contributions

In summary, although many researchers have examined the relationship between various built environment variables and passenger flow characteristics across different regions, several limitations remain. First, there has been inadequate analysis of passenger flow resilience at stations during the post-pandemic period, particularly in the recovery stage, and how these station-specific differences in passenger flow resilience relate to land use-associated built environment factors remains unverified. Second, the nonlinear relationships reported for the same built environment factors differ among studies, primarily because the spatial and temporal contexts of the research subjects vary. Finally, current tree-based models have poor control over complexity, and the selection of hyperparameters in supervised learning methods often relies on experience, which may lead to model overfitting and reduced generalization capability.

Therefore, this study first uses swipe data from the pandemic recovery period to determine the evaluation indicators for passenger flow resilience. Next, it proposes a new machine learning regression model that combines LGWO (gray wolf optimization with Levy flight) and LightGBM (light gradient boosting machine). This model configures its parameters automatically with the help of a heuristic algorithm, effectively limiting model complexity and preventing overfitting. Furthermore, it incorporates SHAP (Shapley additive explanations) tools to conduct a global feedback mechanism analysis on the selected variables, illustrating the positive and negative feedback and threshold effects between passenger flow resilience and land use. The research findings can provide a unique perspective on the recovery characteristics of subway network passenger flow under the influence of public health events, offering a scientific basis for operational decisions that currently rest on experiential judgment.

3. Data and Processing

3.1. Research Area and Passenger Flow Data

Xi’an, the capital of Shaanxi Province, is the political, economic, educational, and cultural center of the northwest region of China, covering an area of 10,752 square kilometers (including the Xi’an Xianyang New Area) and with a population of 13.16 million as of 2021. Xi’an is an ideal case study that can reveal the relationship between the built environment around subway stations and network passenger flow. The first subway line was opened in 2011, and by the end of 2020, when the data analysis was conducted, four lines were studied after excluding the airport connector line between the airport and Xi’an North Station, totaling 87 stations. The distribution of the subway network and stations is shown in Figure 1. Line 1 runs east–west through the urban area of Xi’an, with a total length of about 25 km, connecting the Weiyang District and Baqiao District. Line 2 and Line 4 run north–south through the urban area of Xi’an, serving the city’s main educational institutions, science and technology parks, residential areas, and commercial districts. Line 3 extends in an arc from the high-tech zone in the northwest to the southeastern suburbs.

The selection of the recovery period during the pandemic is determined based on changes in passenger flow. First, we compile the hourly entrance data for each station. Next, we provide the statistical distribution of total passenger flow across the entire network. After that, based on the definition of the resilience triangle, we record the time it takes for the overall network passenger flow to recover from 50% to 80% of pre-pandemic levels. We then calculate the passenger flow recovery resilience indicators for each station during this period. Considering that passenger flow varies in regular weekly cycles, we use the passenger flow data from the week starting Monday, 6 January 2020, when the pandemic had not yet emerged, as our baseline. We aim to identify the times T_{50%} and T_{80%}, at which the overall network passenger flow first returns to 50% and 80% of this baseline, respectively, after the onset of the pandemic. Our goal is to calculate the differences in passenger flow recovery resilience among various stations during this time. The changes in passenger flow characteristics across the entire network are illustrated in Figure 2.

3.2. Calculation of Station-Level PFRR

We used raw card swipe data to conduct a consolidated analysis of the weekly swipe data for each station between T_{50%} and T_{80%}, and calculated the PFRR values for each station on both weekdays and weekends. This value represents the percentage of passenger flow recovered at each station between the T_{50%} and T_{80%} time points compared to the same period before the pandemic. The PFRR indicators for each station in the two time periods (weekdays and weekends) are shown in Figure 3.
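The calculation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the weekly data layout, and the choice to average the weekly recovery ratios between T_{50%} and T_{80%} are all assumptions, since the text does not give the exact formula. The numbers are hypothetical.

```python
from statistics import mean


def first_week_reaching(network_weekly, baseline_total, share):
    """Index of the first week whose network-wide flow reaches `share`
    (e.g. 0.5 or 0.8) of the pre-pandemic baseline total, or None."""
    for week, total in enumerate(network_weekly):
        if total >= share * baseline_total:
            return week
    return None


def station_pfrr(station_weekly, station_baseline, t50, t80):
    """Assumed PFRR: mean recovery ratio of one station over weeks
    t50..t80 (inclusive), relative to that station's baseline flow."""
    window = station_weekly[t50:t80 + 1]
    return mean(flow / station_baseline for flow in window)


# Hypothetical weekly totals, for illustration only.
network = [200, 350, 520, 610, 700, 820, 900]  # network-wide weekly flow
baseline = 1000                                # baseline-week network flow
t50 = first_week_reaching(network, baseline, 0.5)
t80 = first_week_reaching(network, baseline, 0.8)

# One station's weekly flow and its baseline-week flow (hypothetical).
pfrr = station_pfrr([10, 20, 30, 36, 44, 50, 55], 60, t50, t80)
```

Under this reading, the same function would be applied twice per station, once to weekday flows and once to weekend flows, to obtain the two PFRR values.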

3.3. Land Use-Related Built Environment Variables

Through literature review and practical experience, it is understood that land use-related variables have the greatest impact on travel. Therefore, based on extensive research by previous scholars and the selection of the catchment area around the stations in Xi’an, we define an 800 m radius as the statistical range for the built environment surrounding the stations [43]. Land-use attributes are categorized into five major types: work land, educational land (including secondary and university land), commercial land, residential land, and recreational land. To compensate for the inadequacy of land use in reflecting development intensity and its inability to directly represent the locations generating and attracting passenger flow, we also included the number of different types of POI as part of the selected built environment variables. It should be noted that the land-use data are sourced from the 2020 land-use statistics published by the Xi’an Municipal Bureau of Statistics and have been verified using online mapping tools. The POI data were obtained from web scraping of Amap data. Importantly, we utilized the Shannon entropy index to characterize land-use mix. Basic information about the selected variables is presented in Table 1.
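As an illustration of the land-use mix measure, the sketch below computes a Shannon entropy index from the areas of each land-use type within a station's catchment. Whether the study normalizes the entropy by the logarithm of the number of categories is not stated, so the `normalize` flag and the function name are assumptions.

```python
import math


def land_use_entropy(areas, normalize=True):
    """Shannon entropy of land-use shares within one catchment area.

    areas: area of each land-use type (e.g. work, educational, commercial,
    residential, recreational). Zero-area types contribute nothing.
    """
    total = sum(areas)
    if total <= 0:
        return 0.0
    shares = [a / total for a in areas if a > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    if normalize and len(areas) > 1:
        # Assumed normalization: divide by ln(number of categories) so the
        # index lies between 0 (single use) and 1 (perfectly even mix).
        entropy /= math.log(len(areas))
    return entropy
```

With five equally sized land-use types the normalized index equals 1, and a catchment containing a single land-use type yields 0.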

4. Methodology

This study first employs the MGWR model to analyze the spatial heterogeneity of built environment variable coefficients under weekday and weekend passenger flow characteristics, aiming to examine the spatial scale associations of these variables. Secondly, the tree-based LightGBM algorithm is used to assess the contribution of built environment variables to station-level passenger flow recovery resilience. To optimize the hyperparameters of the LightGBM algorithm for the best-fitting characteristics, an LGWO method is proposed for automatic hyperparameter tuning. Finally, SHAP theory, based on Shapley values, is utilized to explain the nonlinear relationship trends between individual variables and PFRR, as well as the corresponding threshold effects.

4.1. MGWR Model

In the analysis of geographic spatial issues, independent variables have both global effects on dependent variables and localized effects due to spatial differences. In this study, the built environment variables related to land use may exhibit different impacts due to the spatial distribution of subway stations. The MGWR model is a local regression model that embeds geographic location information into the regression parameters through weighting, allowing for the estimation of local spatial variations in the data. It can identify the degree of influence that built environment indicators related to land use have on the PFRR of each subway station. Factors influencing station-level PFRR may exhibit both spatially stable and spatially unstable characteristics. By allowing variable bandwidth differences, the MGWR model improves upon GWR, resulting in more accurate estimation outcomes. The MGWR method is employed in this study to investigate the spatial heterogeneity of the regression coefficients of land use-related built environment variables on passenger flow recovery resilience, with spatial visualization of the results. The calculation formula is as follows:

y_{i} = β_{0}(u_{i}, v_{i}) + \sum_{j=1}^{k} β_{bwj}(u_{i}, v_{i}) x_{ij} + ε_{i}

(1)

In the equation, y_{i} represents the PFRR value at station i, x_{ij} denotes the j-th built environment indicator for station i, k is the total number of built environment indicators, (u_{i}, v_{i}) represents the spatial coordinates of the center point of station i, β_{0}(u_{i}, v_{i}) is the intercept for station i, β_{bwj}(u_{i}, v_{i}) is the local regression coefficient of the j-th built environment indicator for station i, estimated with the variable-specific bandwidth bwj, and ε_{i} represents the random error term.

The selection of the bandwidth in the MGWR model is a direct indicator for measuring how the relationship between the independent variables and the dependent variable varies with spatial scale. In other words, the bandwidth reflects how many surrounding sample points are required to estimate the regression parameters. By using different bandwidths for different variables, the model can capture relationships at varying spatial scales. The Akaike information criterion (AIC) is used to determine the optimal bandwidth selection by evaluating the goodness of fit. The basic form is as follows:

AICc = 2n \ln(\hat{σ}) + n \ln(2π) + n \frac{n + tr(S)}{n - 2 - tr(S)}

(2)

where AICc is the corrected value of AIC, n is the sample size, \hat{σ} is the maximum likelihood estimate of the standard deviation of the random error term, and tr(S) denotes the trace of the hat matrix S. The bandwidth corresponding to the minimum AICc is selected as the optimal bandwidth.
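To make the bandwidth criterion concrete, the sketch below evaluates Equation (2) for a set of candidate bandwidths and picks the minimizer. It assumes that the residuals and the trace of S for each candidate have already been produced by a fitted local regression, and that \hat{σ} is estimated as the square root of the mean squared residual; both function names are hypothetical.

```python
import math


def aicc(residuals, trace_s):
    """Corrected AIC as in Equation (2).

    residuals: model residuals for one candidate bandwidth.
    trace_s:   trace of the hat matrix S for that bandwidth.
    """
    n = len(residuals)
    # Assumption: sigma_hat is the ML estimate sqrt(RSS / n).
    sigma_hat = math.sqrt(sum(r * r for r in residuals) / n)
    return (2 * n * math.log(sigma_hat)
            + n * math.log(2 * math.pi)
            + n * (n + trace_s) / (n - 2 - trace_s))


def pick_bandwidth(candidates):
    """candidates maps bandwidth -> (residuals, trace_s).
    Returns the bandwidth with the smallest AICc."""
    return min(candidates, key=lambda bw: aicc(*candidates[bw]))
```

In MGWR this selection is repeated for each variable, so every coefficient surface ends up with its own bandwidth.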

4.2. LightGBM Model

LightGBM uses a histogram-based method to process data efficiently and fits model residuals to gradually improve the performance of its weak learners. It employs a leaf-wise growth strategy with a limit on tree depth, splitting only the leaf with the highest gain at each iteration, thereby reducing errors and improving prediction accuracy.

The initial objective function of the model consists of the loss function l(*) and the regularization term Ω(*). The objective function for LightGBM can be expressed as:

L^{t}(ϕ) = \sum_{i=1}^{n} l(y_{i}, \bar{y}_{i}) + \sum_{k} Ω(f_{k})

(3)

where n represents the total number of samples, y_{i} is the actual value of the i-th sample, \bar{y}_{i} is the predicted value of the i-th sample, f_{k} refers to the k-th tree model, and Ω(f_{k}) denotes the complexity of the k-th tree.

The objective function can be transformed into the following form:

L^{t}(ϕ) = \sum_{i=1}^{n} l(y_{i}, \bar{y}_{i}^{t-1} + f_{t}(x_{i})) + Ω(f_{t}) + Const

(4)

where Const is the constant term. By expanding with the second-order Taylor series and removing the constant term, we can combine the coefficients of the linear and quadratic terms with the regularization term to obtain the final objective function. The second-order Taylor expansion of the objective function is as follows:

L^{t}(ϕ) = \sum_{i=1}^{n} [l(y_{i}, \bar{y}_{i}^{t-1}) + g_{i} f_{t}(x_{i}) + \frac{1}{2} h_{i} f_{t}^{2}(x_{i})] + Ω(f_{t}) + Const

(5)

where g_{i} and h_{i} are the first- and second-order derivatives of the loss l(y_{i}, \bar{y}_{i}^{t-1}) with respect to the previous prediction \bar{y}_{i}^{t-1}.
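A small worked example may help with Equation (5). The sketch below assumes a squared-error loss l = 0.5 (y - \bar{y})^2, which the text does not specify, and leaves out the regularization and constant terms. For this loss the first derivative is \bar{y} - y and the second derivative is 1, so the second-order expansion reproduces the true loss exactly.

```python
def grad_hess_squared_error(y_true, y_prev):
    """g_i and h_i for the assumed loss l = 0.5 * (y - y_hat)**2,
    evaluated at the previous prediction y_prev."""
    g = [p - y for y, p in zip(y_true, y_prev)]
    h = [1.0] * len(y_true)
    return g, h


def taylor_objective(y_true, y_prev, step):
    """Sum over samples of l + g*f + 0.5*h*f^2, i.e. the bracketed part of
    Equation (5), where `step` holds the new tree's outputs f_t(x_i)."""
    g, h = grad_hess_squared_error(y_true, y_prev)
    base = sum(0.5 * (y - p) ** 2 for y, p in zip(y_true, y_prev))
    return base + sum(gi * s + 0.5 * hi * s * s
                      for gi, hi, s in zip(g, h, step))


def exact_loss(y_true, y_prev, step):
    """True summed loss after adding the new tree's outputs."""
    return sum(0.5 * (y - (p + s)) ** 2
               for y, p, s in zip(y_true, y_prev, step))
```

For losses that are not quadratic the two functions would differ slightly, which is why the expansion is described as an approximation of the objective.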

4.3. LGWO Optimization Algorithm

In the process of fitting the relationship between the subway station-level PFRR value and land use-related built environment factors, it is necessary to adjust the hyperparameters of the LightGBM model to achieve optimal performance. Therefore, the LGWO algorithm is chosen for parameter optimization to overcome local optimality issues and automatically generate the optimal set of hyperparameters, enhancing the model’s fitting and predictive accuracy:

X_{α}(t + 1) = X_{α}(t) - a \oplus L(β)

(6)

where a = random(size(α_{position})) is the random coefficient, drawn with the same size as the position of the leader wolf, and X_{α}(t) is the current position of the leader wolf α. The expression for determining the random search path L(β) of the leader wolf α is:

L(β) : 0.01 \frac{μ}{|ν|^{-β}} (X_{α}(t) - X_{α-best}); β \in [1, 3]

(7)

In the equation, X_{α-best} is the best position of the leader wolf, and both μ and ν follow a normal distribution.
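The leader-wolf update in Equations (6) and (7) can be sketched as below. This is an assumption-laden illustration rather than the authors' code: the Lévy step is drawn with Mantegna's algorithm, which divides by |ν|^{1/β} and scales μ by a β-dependent standard deviation, and this differs from the exponent as printed in Equation (7). The default β = 1.5 and the per-dimension sampling are also assumptions.

```python
import math
import random


def levy_step(beta, rng):
    """One Levy-distributed step via Mantegna's algorithm (assumed form).
    Valid for 0 < beta < 2, where the sine term is positive."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta
                  * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    mu = rng.gauss(0.0, sigma_u)   # normal with beta-dependent spread
    nu = rng.gauss(0.0, 1.0)       # standard normal
    return mu / abs(nu) ** (1 / beta)


def update_alpha(x_alpha, x_best, beta=1.5, rng=None):
    """X_alpha(t+1) = X_alpha(t) - a * L(beta), applied per dimension,
    with a ~ U(0, 1) and L(beta) scaled by 0.01 * (X_alpha - X_best)."""
    rng = rng or random.Random()
    new_position = []
    for x, best in zip(x_alpha, x_best):
        a = rng.random()
        jump = 0.01 * levy_step(beta, rng) * (x - best)
        new_position.append(x - a * jump)
    return new_position
```

Because the jump is scaled by the distance to the best known position, the leader stays put when it already sits at that position and otherwise takes mostly small moves with occasional long ones, which is what helps the search leave local optima.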

4.4. SHAP Attribution Analysis Principles

SHAP is a post hoc explanation method inspired by game theory. It measures the impact of various features and their interaction terms by calculating their marginal contributions in the model. The marginal contributions of different features and their interaction terms are known as Shapley values. Shapley values quantify the influence of different features on the model’s output, providing an explanation for the predictions made by black-box models. Let the prediction be represented as f(x), then we have:

f (x) = ϕ_{0} + \sum_{i = 1}^{M} ϕ_{i}

(8)

In the equation,

ϕ_{i}

represents the magnitude of the impact of each feature on

f (x)

. According to the model’s predictions, it can be understood as the sum of all the features. The calculation formula for

ϕ_{i}

is as follows:

ϕ_{i} = \sum_{S \subseteq \{M \ x_{i}\}} \frac{|S|! (|M| - |S| - 1)!}{|M|!} \{f (x_{s \cup \{i\}}) - f (x_{s})\}

(9)

In the equation, the overall expression represents an expected value, indicating the change in the model output when feature x_i is included in the model versus when it is not. Here, M refers to the full set of features, and S denotes a subset of features from \{M \setminus x_{i}\}, with multiple possible values corresponding to different feature combinations. f(x_{s \cup \{i\}}) and f(x_{s}) represent the model's output when x_{i} is included in the model and when it is not, respectively.
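As a small worked illustration of Equations (8) and (9), the following Python sketch computes exact Shapley values by enumerating every subset S. The toy model, the input, and the baseline are hypothetical, and replacing the features outside S with baseline values is only one common way of evaluating f(x_{s}); tree-based SHAP implementations use faster algorithms than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values (Eq. 9) by enumerating all feature subsets S."""
    m = len(x)

    def f(subset):
        # Features outside the subset are set to their baseline value (assumption).
        return model([x[j] if j in subset else baseline[j] for j in range(m)])

    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        total = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for combo in combinations(others, size):
                s = set(combo)
                total += weight * (f(s | {i}) - f(s))
        phi.append(total)
    return phi


def toy_model(v):
    # Hypothetical model with one additive term and one interaction term.
    return 2.0 * v[0] + v[1] * v[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
phi_0 = toy_model(baseline)  # baseline term of Eq. (8)
```

Here the additive feature receives its full effect (2), the two interacting features share their joint effect equally (3 each), and phi_0 plus the sum of phi reproduces toy_model(x) = 8, as Equation (8) requires.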

4.5. Overall Architecture Implementation Process

The overall model architecture can be divided into the following steps:

Step 1: Obtain land use-related built environment variables and calculate PFRR values for weekdays and weekends.

Step 2: Use the MGWR model to solve for the optimal bandwidth of each variable, verify the spatial heterogeneity distribution of the variable coefficients, and visualize the results.

Step 3: Select the tree depth, number of leaves, and learning rate in the LightGBM model as optimization targets.

Step 4: Initialize the parameters of the LGWO algorithm, including the value range of the parameters to be optimized, the initial positions of the gray wolf population, the population size, and the fitness function.

Step 5: Use the LGWO algorithm for optimization, continuously updating the position of the α wolf and outputting the optimal parameter set to LightGBM.

Step 6: Fit the relationship between built environment variables and PFRR using LightGBM, evaluate the fit with the R-squared, RMSE, and MAE metrics, and output the importance of built environment variables.

Step 7: Perform global positive and negative feedback feature analysis and investigate the nonlinear relationships and threshold effects between individual variables and passenger flow recovery resilience using SHAP analysis based on Shapley values.
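Steps 3 to 5 can be illustrated with a minimal gray wolf search over three hyperparameters. The sketch below is the standard gray wolf optimizer, not the authors' LGWO implementation: the Lévy perturbation of the α wolf is only marked by a comment, the search ranges, population size, and iteration count are hypothetical, and stand_in_fitness is a placeholder for the validation error of a fitted LightGBM model.

```python
import random

# Hypothetical search ranges for (tree depth, number of leaves, learning rate).
BOUNDS = [(1.0, 10.0), (2.0, 64.0), (0.01, 0.3)]


def stand_in_fitness(params):
    """Placeholder for a validation error such as RMSE; lower is better."""
    target = (6.0, 31.0, 0.1)  # arbitrary optimum, for demonstration only
    return sum(((p - t) / (hi - lo)) ** 2
               for p, t, (lo, hi) in zip(params, target, BOUNDS))


def gwo(fitness, bounds, n_wolves=12, n_iter=60, seed=0):
    """Standard gray wolf optimizer with positions clipped to the bounds."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_fit = None, float("inf")
    for t in range(n_iter):
        ranked = sorted(wolves, key=fitness)
        alpha, beta, delta = (list(w) for w in ranked[:3])
        alpha_fit = fitness(alpha)
        if alpha_fit < best_fit:
            best_pos, best_fit = list(alpha), alpha_fit
        # An LGWO variant would perturb `alpha` with the Levy step of Eq. (6) here.
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 towards 0
        new_wolves = []
        for wolf in wolves:
            new_wolf = []
            for d, (lo, hi) in enumerate(bounds):
                pulls = []
                for leader in (alpha, beta, delta):
                    coef_a = 2.0 * a * rng.random() - a
                    coef_c = 2.0 * rng.random()
                    distance = abs(coef_c * leader[d] - wolf[d])
                    pulls.append(leader[d] - coef_a * distance)
                # Average of the three leader-guided moves, kept inside the range.
                new_wolf.append(max(lo, min(hi, sum(pulls) / 3.0)))
            new_wolves.append(new_wolf)
        wolves = new_wolves
    final_best = min(wolves, key=fitness)
    if fitness(final_best) < best_fit:
        best_pos, best_fit = list(final_best), fitness(final_best)
    return best_pos, best_fit


best_pos, best_fit = gwo(stand_in_fitness, BOUNDS)
# Integer-valued hyperparameters are rounded before being passed to the model.
tree_depth, n_leaves, learning_rate = round(best_pos[0]), round(best_pos[1]), best_pos[2]
```

In an actual run, the fitness function would train LightGBM with the candidate parameters and return a cross-validated error, so each fitness evaluation is the expensive part of the loop.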

5. Results

5.1. Built Environment Coefficient Spatial Heterogeneity

In the process of model calculation, we standardized the built environment variables. When constructing the MGWR model, we used the variance inflation factor (VIF) to analyze the correlation among the independent variables, finding that all VIF values were less than 5. This indicates that there is no significant multicollinearity among the variables, allowing them to be included in the model. As noted in the introduction of the MGWR model, the coefficients of a variable are spatially heterogeneous only when its bandwidth is less than the standard global bandwidth. The results show that for PFRR on weekdays and weekends, the bandwidths for X_{poi}^{leis}, X_{poi}^{shop}, X_{land}^{comm}, X_{land}^{recr}, and X_{land}^{resi} are all less than 85 (the global bandwidth). Therefore, we characterize the spatial distribution heterogeneity of the coefficients corresponding to these land-related built environment variables. It is worth noting that, in order to highlight samples contributing to spatial heterogeneity, only samples with a p-value less than 0.1 are depicted. The spatial distribution of the coefficients is shown in Figure 4 and Figure 5.
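The VIF screening mentioned above can be sketched as follows. The sketch uses the identity that the VIF of variable j equals the j-th diagonal element of the inverse of the correlation matrix; the three columns are hypothetical toy data, and the plain Gauss-Jordan inversion is only meant for small, non-singular matrices.

```python
from statistics import mean, pstdev


def standardize(column):
    """Z-score a column using the population standard deviation."""
    m, s = mean(column), pstdev(column)
    return [(v - m) / s for v in column]


def correlation_matrix(columns):
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (assumes a non-singular matrix)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each variable: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical toy variables: the first two are correlated (r = 0.8),
# the third is uncorrelated with both.
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 3.0, 2.0, 4.0],
    [1.0, -1.0, -1.0, 1.0],
]
vif_values = vif(columns)
keep = [v < 5.0 for v in vif_values]  # screening threshold used in the text
```

For the toy data, the two correlated variables have a VIF of 1 / (1 - 0.8^2), about 2.78, and the uncorrelated one has a VIF of 1, so all three would pass the threshold of 5.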

Figure 4 and Figure 5 show the coefficients of the spatially heterogeneous variables, restricted to statistically significant stations (p-values less than 0.1). We can observe that X_{poi}^{leis} has a positive promoting effect on the PFRR during both time periods (with positive coefficients). On weekends, this type of feature has a greater impact in urban areas than in the terminal areas of the lines; on weekdays the opposite is true, with urban areas influencing PFRR less than non-city center areas and the effect being especially pronounced at stations in the western part of the city. X_{poi}^{shop} has a negative effect on the PFRR for both weekdays and weekends, indicating a poor recovery of non-essential travel, with no obvious spatial distribution pattern. X_{land}^{comm} positively correlates with PFRR, with varying degrees of influence on weekdays and weekends. Specifically, stations with a smaller impact on weekdays may exhibit a larger influence on weekends. For X_{land}^{resi}, there are fewer stations with spatially heterogeneous distribution on weekends, reflecting that the size of leisure land area does not constrain its impact on PFRR on weekends, while for weekdays, the negative impact is greater for north–south-oriented stations than for east–west-oriented stations. Notably, the influence of X_{land}^{resi} on PFRR is negative, indirectly indicating a shift from subway travel to other commuting patterns. Stations with coefficient spatial heterogeneity are predominantly located at the ends of the lines.

To more comprehensively explore the spatial heterogeneity of PFRR, we further analyzed the impact of urban land-use patterns on PFRR and considered the interference of transportation modes such as private cars, public transit, and walking. Our analysis indicates that the diversity and complexity of urban land use significantly affect PFRR. Specifically, areas with a concentration of commercial and office spaces exhibit stronger PFRR due to higher commuting demands on weekdays. However, PFRR may decline in these areas during weekends due to an increase in commercial and entertainment activities. Additionally, residential areas generally have lower PFRR, which may be associated with a shift in residents’ travel patterns, from subways to private cars or walking and other modes of transportation.

When considering the interference of private cars, public transit, and walking, we found that these modes of transportation have a complex impact on the spatial distribution of PFRR. The prevalence of private cars may lead to a reduction in subway passenger flow in certain areas, especially in residential areas on the outskirts of the city. Meanwhile, the layout of public transit routes and pedestrian-friendly urban planning also affect PFRR. For instance, areas dense with bus stops may attract more public transit passengers, thereby affecting the subway’s PFRR. Pedestrian-friendly environments may increase the willingness of people to walk, particularly for short-distance trips, which also impacts the subway’s PFRR.

5.2. Built Environment’s Significance and Global Impact

Global interpretation refers to explaining the input features and feature interactions of the entire model, specifically by calculating the Shapley values for all features and their interactions. The contribution levels of different features to the PFRR are shown in Figure 6, where feature importance is ranked based on the average absolute impact on the target variable. Figure 6 presents the ranking of variable contributions to the optimal fitting of PFRR by the LGWO–LightGBM model on weekdays and weekends.
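The ranking by average absolute impact can be sketched as below. This is an illustration under an assumption: the percentages reported for each feature are taken to be its mean absolute SHAP value divided by the total over all features. The SHAP matrix (one row per station, one column per feature) and the short feature names are hypothetical.

```python
def importance_shares(shap_matrix, feature_names):
    """Rank features by mean |SHAP| and express each as a percentage of the total."""
    n = len(shap_matrix)
    mean_abs = {
        name: sum(abs(row[j]) for row in shap_matrix) / n
        for j, name in enumerate(feature_names)
    }
    total = sum(mean_abs.values())
    ranked = sorted(mean_abs.items(), key=lambda item: item[1], reverse=True)
    return [(name, 100.0 * value / total) for name, value in ranked]


# Hypothetical SHAP values for three stations and three features.
feature_names = ["comp", "shop", "resi"]
shap_matrix = [
    [0.4, -0.1, 0.0],
    [-0.2, 0.3, 0.1],
    [0.6, -0.2, -0.1],
]
shares = importance_shares(shap_matrix, feature_names)
```

With these toy values the shares are roughly 60%, 30%, and 10%. Taking absolute values before averaging means that a feature with strong positive effects at some stations and strong negative effects at others still ranks as important.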

As shown in Figure 6a, for weekdays, X_{poi}^{comp} and X_{poi}^{shop} have the highest importance, accounting for 37.69% and 18.83%, respectively. In contrast, the importance of X_{land}^{resi} is only 9.82%. This suggests that travelers who shifted away from using the subway during the rapid recovery phase did not quickly return to it, likely due to changes in travel habits. Once travel behavior changes, it does not easily revert to the original state. X_{poi}^{cult}, X_{land}^{edu}, X_{poi}^{cate}, and X_{poi}^{leis} have less impact on PFRR, indicating that non-essential travel did not effectively recover during the rapid recovery period.

As shown in Figure 6b, for weekends, the impact of land-related built environment variables on PFRR is relatively balanced. Notably, X_{poi}^{cate} has the most significant impact, indicating an increase in trips for recreational purposes, which reflects the recovery of travel vitality. X_{poi}^{comp} and X_{land}^{resi} also contribute significantly, with the positive and negative feedback mechanisms on weekend PFRR being clearly identifiable in Figure 7. Additionally, features related to non-essential travel, such as X_{land}^{comm}, X_{poi}^{shop}, and X_{land}^{recr}, also have relatively high contributions, highlighting the recovery of non-essential travel during weekends.

Figure 7 presents the SHAP summary plots for weekdays and weekends, illustrating the impact and importance of various features on the PFRR. The SHAP summary plot ranks the variables from top to bottom based on the sum of their absolute SHAP values. The relationship between the Shapley values and the feature values for each sample helps to estimate the effect of each feature on the prediction results. For example, on weekdays, stations with higher X_{land}^{mix} tend to show mostly negative SHAP values. This suggests that for most stations, an increase in X_{land}^{mix} leads to a negative feedback mechanism on the resilience of passenger flow recovery. Similar negative feedback mechanisms are observed for X_{land}^{resi}, X_{land}^{recr}, and X_{poi}^{shop}. Other variables either have a positive feedback mechanism or show no significant feedback. On weekends, X_{land}^{comm} exhibits a clear positive feedback mechanism, as do X_{poi}^{cate} and X_{poi}^{comp}. In the following section, a more detailed investigation will be conducted into the nonlinear relationships between individual variables and PFRR.

5.3. Built Environment’s Nonlinear Impact

In this subsection, we introduce the SHAP dependence plots for individual features. Figure 8 and Figure 9 illustrate the specific relationships between individual variables and PFRR for weekdays and weekends, respectively. The horizontal axis represents the values of the features, while the vertical axis indicates the corresponding SHAP values. By observing the trends in SHAP values, we can infer the degree to which the feature impacts the model’s results, allowing for a better understanding of the relationship between the feature and the model.
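One simple way to read a dependence plot programmatically is to sort the stations by feature value and look for the points where the SHAP value changes sign. The sketch below does this with linear interpolation between neighboring points; it is a hypothetical helper, not part of the SHAP library or of the authors' workflow, and on noisy data the SHAP values would normally be smoothed first. The feature values and SHAP values are toy numbers.

```python
def sign_change_thresholds(feature_values, shap_values):
    """Feature values at which the SHAP value crosses zero, in increasing order."""
    pairs = sorted(zip(feature_values, shap_values))
    thresholds = []
    for (x0, s0), (x1, s1) in zip(pairs, pairs[1:]):
        if s0 * s1 < 0:
            # Linear interpolation between the two neighboring points.
            thresholds.append(x0 + (x1 - x0) * (-s0) / (s1 - s0))
    return thresholds


def feedback_pattern(feature_values, shap_values):
    """Label the overall trend from the lowest to the highest feature value."""
    pairs = sorted(zip(feature_values, shap_values))
    first, last = pairs[0][1], pairs[-1][1]
    if first < 0 < last:
        return "negative-to-positive"
    if first > 0 > last:
        return "positive-to-negative"
    return "mixed or flat"


# Hypothetical standardized feature values and their SHAP values.
values = [0.0, 1.0, 2.0, 3.0, 4.0]
shap = [-0.2, -0.1, 0.1, 0.3, 0.2]
thresholds = sign_change_thresholds(values, shap)
pattern = feedback_pattern(values, shap)
```

For the toy data the SHAP value crosses zero once, halfway between the feature values 1 and 2, and the pattern is labeled negative-to-positive.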

From Figure 8, it can be seen that for the PFRR on weekdays, the selected built environment variables exhibit significant nonlinear relationships. X_{poi}^{leis} has a negative feedback effect on the dependent variable at lower values, and as the values increase, its SHAP values become positive, indicating a subsequent positive feedback phenomenon. Other built environment variables that display this initial negative and then positive trend include X_{poi}^{comp}, X_{land}^{work}, X_{land}^{edu}, X_{land}^{comm}, and X_{poi}^{cate}. In contrast, X_{poi}^{cult} shows a positive SHAP value at lower values, which then turns negative as the values increase, indicating an association with passenger flow resilience that shifts from positive to negative. Other variables exhibiting this positive-to-negative relationship include X_{poi}^{scen}, X_{poi}^{shop}, X_{land}^{mix}, X_{land}^{recr}, and X_{land}^{resi}.

From Figure 9, it can be observed that for the PFRR on weekends, different types of nonlinear relationships are also evident. Built environment variables related to land that exhibit a negative-to-positive feedback characteristic include X_{poi}^{scen}, X_{poi}^{comp}, X_{land}^{work}, and X_{land}^{comm}. In contrast, variables such as X_{poi}^{leis}, X_{poi}^{shop}, X_{land}^{mix}, and X_{land}^{recr} show a positive-to-negative feedback pattern. Other features display alternating positive and negative feedback characteristics or fluctuate around a SHAP value of zero, indicating that the feedback mechanisms of these variables on PFRR are influenced by the random variation of sample points.

5.4. Comparison of Model Fitting Effects

To highlight the advantages of the proposed model in terms of fitting effects and performance, we selected the following models for comparison, assessing them based on R-squared, RMSE, MAE, as well as runtime and model complexity. The characteristics of the baseline models are as follows.

OLS: This method estimates parameters in a linear model by finding the best-fitting line that minimizes the sum of the squared vertical distances (i.e., errors) from all data points to this line. It is a straightforward and intuitive approach that serves as the baseline model for many regression analyses.
GBDT [29]: This is an ensemble learning algorithm that iteratively trains decision trees to minimize a loss function, with each tree attempting to correct the residuals of the previous one. GBDT (gradient boosting decision tree) adds a new tree at each step to improve the model’s fit to the data while controlling model complexity through a regularization term to prevent overfitting.
LightGBM: This efficient gradient boosting decision tree algorithm significantly improves training speed and reduces memory usage through histogram optimization and a leaf-wise growth strategy.
XGBoost [11]: This is an efficient gradient boosting framework that improves model performance by sequentially adding decision trees, each aiming to correct the residuals of the previous model. XGBoost incorporates a regularization term into the objective function to control model complexity and prevent overfitting.

The experiments were run on a personal computer with an i7-9700 2.40 GHz CPU and 32 GB of RAM, in a Python 3.7 environment. Table 2 presents the fitting results of the model proposed in this study and the baseline models.

As shown in Table 2, the new method proposed in this study demonstrates advantages in fitting characteristics. For the XGBoost and LightGBM methods listed in Table 2, we used grid search cross-validation to obtain the optimal combination of hyperparameters. The range for the maximum depth of trees (Max depth) was set from 1 to 10, while the number of regression trees (N estimators) ranged from 10 to 50. The information in Table 2 indicates that the proposed model achieves a more accurate fitting effect while reducing model complexity and computation time. Overall, the reduced number of optimal hyperparameters suggests that the proposed architecture is more streamlined in terms of model complexity, thereby minimizing the risk of overfitting and operational complexity associated with parameter tuning.
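For reference, the three fit metrics used in the comparison can be computed as in the following sketch. The observed and fitted values are hypothetical toy numbers, not results from Table 2.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """R-squared, RMSE, and MAE for paired observed and fitted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }


# Hypothetical observed and fitted PFRR values for four stations.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(observed, fitted)
```

RMSE penalizes large errors more heavily than MAE, so reporting both alongside R-squared shows whether a model's advantage comes from fewer large misses or from a uniformly closer fit.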

6. Discussion and Conclusions

This study examined the relationship between station-level passenger flow recovery resilience and land-related built environment factors during the post-pandemic period. We developed a novel approach that combines heuristic algorithms with supervised tree-based methods to explore the impact of land use-related built environment variables on station-level passenger flow recovery resilience. Several interesting and unique findings warrant further discussion.

First, during the recovery phase of passenger flow amid the pandemic, although sporadic outbreaks may still affect people’s travel, the characteristics of passenger flow recovery were not significantly impacted. People may have adapted to the travel patterns influenced by the pandemic in a normalized context [22,44]. However, there were significant fluctuations in passenger flow recovery among different sites, reflecting spatial differences in how people respond to sudden public health emergencies. It is generally believed that high elasticity and rapid recovery occur mainly outside urban centers, and residential statistics are positively correlated with this phenomenon [45]. Indeed, our study found a strong correlation between the contribution of residential land and the characteristics of passenger flow resilience disturbances. Additionally, based on the spatial distribution of correlation coefficients for built environment variables at subway stations in Xi’an, we observed that this spatial heterogeneity is more closely associated with the urban center.

Second, we found that compared to weekdays, the importance of built environment factors influencing weekend passenger flow resilience is relatively balanced. This reflects the non-urgency of resident travel on non-working days, and the reduction in essential travel underscores the diverse land uses’ multifaceted impact on passenger flow composition during this period. This confirms the temporal variation in station vulnerability [46]. Overall, the nonlinear effects of the variables exhibit distinct threshold characteristics, providing significant data support for operational strategy adjustments during the recovery phase, such as offering more efficient departure frequencies or reserving greater onboard capacity at rapidly recovering stations.

From a policy perspective, the pandemic has exacerbated disparities in travel accessibility. Our findings indicate that commercial land, recreational land, and residential land are decisive factors during the rapid recovery phase. This suggests that the speed of recovery for subway travel is most closely related to residents’ leisure and daily entertainment travel within the vicinity of the stations, reflecting the broad service scope of the subway as a fundamental mode of transportation. It is important to note that many studies have pointed out that low-income populations, along with age and gender characteristics, may exhibit a greater dependence on public transportation. This could lead to a quicker return to public transit for these groups in the post-pandemic period [47,48]. This confirms that stations with higher residential characteristics in non-central areas tend to have greater recovery capabilities. Therefore, our research provides a basis for the integrated planning of subway networks, supporting the development around subway stations to enhance their resilience in the face of unknown disturbances or sudden public health events.

In the context of urban planning and policymaking, the application of the PFRR index provides a powerful tool for understanding and enhancing the resilience of urban transportation systems. Considering the significant impact of commercial, leisure, and residential land use on the recovery of passenger flow at subway stations, urban planners should consider promoting diversity and comprehensiveness in land use around subway stations to enhance the recovery capacity of traffic flow. The study reveals the recovery capabilities of stations in non-central urban areas, suggesting that policymakers should pay attention to the potential for public transport development in these areas and improve the connectivity of the entire city’s transportation through service optimization. The PFRR index can reflect the adaptability of stations to public health emergencies, and urban planners and policymakers can use this index to assess and plan the resilience of the urban transportation system, developing effective emergency response plans. Data play a key role in revealing the recovery force of passenger flow, and urban planners and policymakers can use these data to guide decision-making, optimize resource allocation, and operational strategies.

Although this paper approaches the topic from the perspective of station-level passenger flow recovery resilience and presents a novel analytical framework with specific research results, particularly unique indicators of passenger flow recovery during the pandemic recovery period, challenges remain regarding the impact of such public health events on travel behavior. Moreover, there is a lack of comprehensive analyses of the impact on public transportation networks. Specifically, the applicability of the conclusions drawn from our case study to other cities needs further validation. Additionally, while this study focuses on the impact of land-use characteristics on passenger flow recovery resilience indicators, future research should also consider factors such as residents’ income, age, and other personal transportation tools in relation to their impact on recovery resilience.

Author Contributions

Conceptualization, methodology, P.L.; software, validation, Q.Y.; investigation, writing—original draft preparation, W.L. and P.L.; resources, writing—review and editing, supervision, S.X. and H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (grant 71871027).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors express their gratitude to doctoral candidate Yuqing Wang for their valuable feedback and technical support in enhancing this research work. Any reviewers’ valuable suggestions for revisions to the manuscript are likewise appreciated.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Li, S.; Liu, X.; Li, Z.; Wu, Z.; Yan, Z.; Chen, Y.; Gao, F. Spatial and temporal dynamics of urban expansion along the Guangzhou–Foshan inter-city rail transit corridor, China. Sustainability 2018, 10, 593.
2. Serdar, M.Z.; Koç, M.; Al-Ghamdi, S.G. Urban transportation networks resilience: Indicators, disturbances, and assessment methods. Sustain. Cities Soc. 2022, 76, 103452.
3. Li, P.; Chen, X.; Ma, C.; Zhu, C.; Lu, W. Risk assessment of COVID-19 infection for subway commuters integrating dynamic changes in passenger numbers. Environ. Sci. Pollut. Res. 2022, 29, 74715–74724.
4. Kwon, D.; Oh, S.E.S.; Choi, S.; Kim, B.H. Viability of compact cities in the post-COVID-19 era: Subway ridership variations in Seoul Korea. Ann. Reg. Sci. 2023, 71, 175–203.
5. Parker, M.E.; Li, M.; Bouzaghrane, M.A.; Obeid, H.; Hayes, D.; Frick, K.T.; Rodríguez, D.A.; Sengupta, R.; Walker, J.; Chatman, D.G. Public transit use in the United States in the era of COVID-19: Transit riders’ travel behavior in the COVID-19 impact and recovery period. Transp. Policy 2021, 111, 53–62.
6. Chang, H.-H.; Lee, B.; Yang, F.-A.; Liou, Y.-Y. Does COVID-19 affect metro use in Taipei? J. Transp. Geogr. 2021, 91, 102954.
7. Zhang, N.; Jia, W.; Wang, P.; Dung, C.-H.; Zhao, P.; Leung, K.; Su, B.; Cheng, R.; Li, Y. Changes in local travel behaviour before and during the COVID-19 pandemic in Hong Kong. Cities 2021, 112, 103139.
8. Su, S.; Zhao, C.; Zhou, H.; Li, B.; Kang, M. Unraveling the relative contribution of TOD structural factors to metro ridership: A novel localized modeling approach with implications on spatial planning. J. Transp. Geogr. 2022, 100, 103308.
9. Lin, J.-J.; Zhao, P.; Takada, K.; Li, S.; Yai, T.; Chen, C.-H. Built environment and public bike usage for metro access: A comparison of neighborhoods in Beijing, Taipei, and Tokyo. Transp. Res. Part D Transp. Environ. 2018, 63, 209–221.
10. Wang, Z.; Liu, S.; Lian, H.; Chen, X. Investigating the Nonlinear Effect of Land Use and Built Environment on Public Transportation Choice Using a Machine Learning Approach. Land 2024, 13, 1302.
11. Li, P.; Yang, Q.; Lu, W. Nonlinear Relationship of Multi-Source Land Use Features with Temporal Travel Distances at Subway Station Level: Empirical Study from Xi’an City. Land 2024, 13, 1021.
12. Ma, X.; Zhang, J.; Ding, C.; Wang, Y. A geographically and temporally weighted regression model to explore the spatiotemporal influence of built environment on transit ridership. Comput. Environ. Urban Syst. 2018, 70, 113–124.
13. Cardozo, O.D.; García-Palomares, J.C.; Gutiérrez, J. Application of geographically weighted regression to the direct forecasting of transit ridership at station-level. Appl. Geogr. 2012, 34, 548–558.
14. Zhu, Y.; Chen, F.; Wang, Z.; Deng, J. Spatio-temporal analysis of rail station ridership determinants in the built environment. Transportation 2019, 46, 2269–2289.
15. Walker, B.; Holling, C.S.; Carpenter, S.R.; Kinzig, A. Resilience, adaptability and transformability in social–ecological systems. Ecol. Soc. 2004, 9, 5.
16. Wu, L.; Yuan, M.; Liu, F.; Niu, Q. The Impact of COVID-19 on the Jobs–Housing Dynamic Balance: Empirical Evidence from Wuhan between 2019, 2021, 2023. Land 2024, 13, 1299.
17. Liu, Y.; Pei, T.; Song, C.; Chen, J.; Chen, X.; Huang, Q.; Wang, X.; Shu, H.; Wang, X.; Guo, S. How did human dwelling and working intensity change over different stages of COVID-19 in Beijing? Sustain. Cities Soc. 2021, 74, 103206.
18. Park, J. Changes in subway ridership in response to COVID-19 in Seoul, South Korea: Implications for social distancing. Cureus 2020, 12, e7668.
19. Yin, J.; Cao, X.J.; Huang, X. Association between subway and life satisfaction: Evidence from Xi’an, China. Transp. Res. Part D Transp. Environ. 2021, 96, 102869.
20. Chen, J.; Liu, J.; Peng, Q.; Yin, Y. Resilience assessment of an urban rail transit network: A case study of Chengdu subway. Phys. A Stat. Mech. Appl. 2022, 586, 126517.
21. Marra, A.D.; Sun, L.; Corman, F. The impact of COVID-19 pandemic on public transport usage and route choice: Evidences from a long-term tracking study in urban area. Transp. Policy 2022, 116, 258–268.
22. Osorio, J.; Liu, Y.; Ouyang, Y. Executive orders or public fear: What caused transit ridership to drop in Chicago during COVID-19? Transp. Res. Part D Transp. Environ. 2022, 105, 103226.
23. Meng, Y.; Zhao, X.; Liu, J.; Qi, Q.; Zhou, W. Data-driven complexity analysis of weighted Shenzhen Metro network based on urban massive mobility in the rush hours. Phys. A Stat. Mech. Appl. 2023, 610, 128403.
24. Zhou, Y.; Li, Z.; Meng, Y.; Li, Z.; Zhong, M. Analyzing spatio-temporal impacts of extreme rainfall events on metro ridership characteristics. Phys. A Stat. Mech. Appl. 2021, 577, 126053.
25. Liu, L.; Wang, Y.; Hickman, R. How Rail Transit Makes a Difference in People’s Multimodal Travel Behaviours: An Analysis with the XGBoost Method. Land 2023, 12, 675.
26. Yang, L.; Yu, B.; Liang, Y.; Lu, Y.; Li, W. Time-varying and non-linear associations between metro ridership and the built environment. Tunn. Undergr. Space Technol. 2023, 132, 104931.
27. Yu, L.; Cong, Y.; Chen, K. Determination of the peak hour ridership of metro stations in Xi’an, China using geographically-weighted regression. Sustainability 2020, 12, 2255.
28. Jun, M.-J.; Choi, K.; Jeong, J.-E.; Kwon, K.-H.; Kim, H.-J. Land use characteristics of subway catchment areas and their influence on subway ridership in Seoul. J. Transp. Geogr. 2015, 48, 30–40.
29. Gan, Z.; Yang, M.; Feng, T.; Timmermans, H.J. Examining the relationship between built environment and metro ridership at station-to-station level. Transp. Res. Part D Transp. Environ. 2020, 82, 102332.
30. Li, S.; Lyu, D.; Liu, X.; Tan, Z.; Gao, F.; Huang, G.; Wu, Z. The varying patterns of rail transit ridership and their relationships with fine-scale built environment factors: Big data analytics from Guangzhou. Cities 2020, 99, 102580.
31. Cervero, R. Alternative approaches to modeling the travel-demand impacts of smart growth. J. Am. Plan. Assoc. 2006, 72, 285–295.
32. Ding, C.; Cao, X.; Liu, C. How does the station-area built environment influence Metrorail ridership? Using gradient boosting decision trees to identify non-linear thresholds. J. Transp. Geogr. 2019, 77, 70–78.
33. An, D.; Tong, X.; Liu, K.; Chan, E.H. Understanding the impact of built environment on metro ridership using open source in Shanghai. Cities 2019, 93, 177–187.
34. Sun, L.-S.; Wang, S.-W.; Yao, L.-Y.; Rong, J.; Ma, J.-M. Estimation of transit ridership based on spatial analysis and precise land use data. Transp. Lett. 2016, 8, 140–147.
35. Zhao, J.; Deng, W.; Song, Y.; Zhu, Y. What influences Metro station ridership in China? Insights from Nanjing. Cities 2013, 35, 114–124.
36. Durning, M.; Townsend, C. Direct ridership model of rail rapid transit systems in Canada. Transp. Res. Rec. 2015, 2537, 96–102.
37. Ewing, R.; Hamidi, S.; Gallivan, F.; Nelson, A.C.; Grace, J.B. Combined effects of compact development, transportation investments, and road user pricing on vehicle miles traveled in urbanized areas. Transp. Res. Rec. 2013, 2397, 117–124.
38. Kim, M.-K.; Kim, S.; Sohn, H.-G. Relationship between spatio-temporal travel patterns derived from smart-card data and local environmental characteristics of Seoul, Korea. Sustainability 2018, 10, 787.
39. Yang, H.; Lu, Y.; Wang, J.; Zheng, Y.; Ruan, Z.; Peng, J. Understanding post-pandemic metro commuting ridership by considering the built environment: A quasi-natural experiment in Wuhan, China. Sustain. Cities Soc. 2023, 96, 104626.
40. Choi, J.; Lee, Y.J.; Kim, T.; Sohn, K. An analysis of Metro ridership at the station-to-station level in Seoul. Transportation 2012, 39, 705–722.
41. Liu, M.; Liu, Y.; Ye, Y. Nonlinear effects of built environment features on metro ridership: An integrated exploration with machine learning considering spatial heterogeneity. Sustain. Cities Soc. 2023, 95, 104613.
42. Chen, E.; Ye, Z.; Wu, H. Nonlinear effects of built environment on intermodal transit trips considering spatial heterogeneity. Transp. Res. Part D Transp. Environ. 2021, 90, 102677.
43. Wang, Z.; Song, J.; Zhang, Y.; Li, S.; Jia, J.; Song, C. Spatial heterogeneity analysis for influencing factors of outbound ridership of subway stations considering the optimal scale range of “7D” built environments. Sustainability 2022, 14, 16314.
44. Xiao, W.; Wei, Y.D.; Wu, Y. Neighborhood, built environment and resilience in transportation during the COVID-19 pandemic. Transp. Res. Part D Transp. Environ. 2022, 110, 103428.
45. Wang, S.; Huang, X.; Shen, Q. Disparities in resilience and recovery of ridesourcing usage during COVID-19. J. Transp. Geogr. 2024, 114, 103745.
46. Pan, S.; Ling, S.; Jia, N.; Liu, Y.; He, Z. On the dynamic vulnerability of an urban rail transit system and the impact of human mobility. J. Transp. Geogr. 2024, 116, 103850.
47. Kim, S.; Lee, S.; Ko, E.; Jang, K.; Yeo, J. Changes in car and bus usage amid the COVID-19 pandemic: Relationship with land use and land price. J. Transp. Geogr. 2021, 96, 103168.
48. Li, P.; Chen, X.; Lu, W.; Wang, H.; Wang, Y. Nonlinear Effects of the Built Environment on Subways at Station Level: Average Travel Distance Changes under the Influence of COVID-19. J. Urban Plan. Dev. 2025, 151, 04024070.

Figure 1. Spatial distribution of Xi’an subway network and stations.

Figure 2. Overall subway network passenger flow and time period selection.

Figure 3. The distribution of PFRR values across all stations at different time points.

Figure 4. Spatial heterogeneity distribution of built environment variable coefficients on weekdays: (a) coefficient of X_{poi}^{leis}, (b) coefficient of X_{poi}^{shop}, (c) coefficient of X_{land}^{comm}, (d) coefficient of X_{land}^{recr}, and (e) coefficient of X_{land}^{resi}.

Figure 5. Spatial heterogeneity distribution of built environment variable coefficients on weekends: (a) coefficient of $X_{poi}^{leis}$, (b) coefficient of $X_{poi}^{shop}$, (c) coefficient of $X_{land}^{comm}$, (d) coefficient of $X_{land}^{recr}$, and (e) coefficient of $X_{land}^{resi}$.

Figure 6. The ranking of the importance of land-related built environment factors on PFRR.

Figure 7. SHAP summary plot of PFRR.

Figure 8. SHAP dependence plots for single variables on PFRR during weekdays: (a) $X_{poi}^{leis}$, (b) $X_{poi}^{scen}$, (c) $X_{poi}^{cult}$, (d) $X_{poi}^{shop}$, (e) $X_{poi}^{cate}$, (f) $X_{poi}^{comp}$, (g) $X_{land}^{mix}$, (h) $X_{land}^{work}$, (i) $X_{land}^{edu}$, (j) $X_{land}^{comm}$, (k) $X_{land}^{recr}$, and (l) $X_{land}^{resi}$.

Figure 9. SHAP dependence plots for single variables on PFRR during weekends: (a) $X_{poi}^{leis}$, (b) $X_{poi}^{scen}$, (c) $X_{poi}^{cult}$, (d) $X_{poi}^{shop}$, (e) $X_{poi}^{cate}$, (f) $X_{poi}^{comp}$, (g) $X_{land}^{mix}$, (h) $X_{land}^{work}$, (i) $X_{land}^{edu}$, (j) $X_{land}^{comm}$, (k) $X_{land}^{recr}$, and (l) $X_{land}^{resi}$.

Table 1. Built environment variable information.

Variable	Description	Mean	STD
$X_{land}^{work}$	Area of work land within the catchment area (acre)	61.44	56.51
$X_{land}^{edu}$	Area of educational land within the catchment area (acre)	21.28	31.68
$X_{land}^{comm}$	Area of commercial land within the catchment area (acre)	120.01	107.66
$X_{land}^{resi}$	Area of residential land within the catchment area (acre)	626.10	288.49
$X_{land}^{recr}$	Area of recreational land within the catchment area (acre)	19.81	35.07
$X_{land}^{mix}$	Land-use mix within the catchment area	0.34	0.23
$X_{poi}^{comp}$	Number of companies within the catchment area	103.44	132.41
$X_{poi}^{leis}$	Number of leisure points within the catchment area	6.35	10.96
$X_{poi}^{cult}$	Number of cultural service points within the catchment area	69.46	76.17
$X_{poi}^{shop}$	Number of shopping points within the catchment area	8.67	8.54
$X_{poi}^{cate}$	Number of catering service points within the catchment area	179.66	154.15
$X_{poi}^{scen}$	Number of scenic points within the catchment area	6.35	10.96
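
Table 1 lists a land-use mix variable without restating its formula here. As an illustration only, the sketch below assumes a normalized Shannon entropy index, a common choice for this kind of variable; the function name `land_use_mix` and the sample areas are hypothetical and are not taken from the study.

```python
import math

def land_use_mix(areas):
    """Normalized Shannon entropy of land-use areas (assumed definition).

    Returns 0.0 when a single use occupies the whole catchment area and
    1.0 when all listed uses have equal area.
    """
    k = len(areas)                          # number of land-use categories
    positive = [a for a in areas if a > 0]  # zero-area categories add no entropy
    total = sum(positive)
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum((a / total) * math.log(a / total) for a in positive)
    return entropy / math.log(k)            # scale to the range [0, 1]

# Hypothetical catchment areas (acres): work, educational, commercial,
# residential, recreational. Illustrative values only.
example_mix = land_use_mix([60.0, 20.0, 120.0, 600.0, 20.0])
```

Under this assumed definition, a value near 0 indicates a catchment area dominated by one land use, and a value near 1 indicates an even mix.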

Table 2. Comparison of model fitting effects.

Evaluation Metrics	Weekday LGWO–LightGBM	Weekday OLS	Weekday LightGBM	Weekday XGBoost	Weekend LGWO–LightGBM	Weekend OLS	Weekend LightGBM	Weekend XGBoost
R-squared	0.67	0.49	0.52	0.55	0.64	0.42	0.53	0.58
RMSE	0.064	0.082	0.072	0.071	0.092	0.120	0.107	0.092
MAE	0.045	0.059	0.057	0.054	0.062	0.083	0.092	0.089
Computation time (s)	8.25	6.24	9.56	10.20	9.21	7.20	9.83	11.40
Max depth	3	N/A	3	4	3	N/A	3	4
Number of estimators	28	N/A	34	40	28	N/A	34	38

Note: N/A = not applicable.
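
The fit metrics reported in Table 2 (R-squared, RMSE, and MAE) have standard definitions. The following is a minimal sketch of how they can be computed from observed and predicted values; the function name `regression_metrics` and the sample values are hypothetical and do not reproduce the study's data or code.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R-squared, RMSE, MAE) for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for constant observations")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae

# Hypothetical PFRR values, for illustration only
observed = [0.60, 0.70, 0.80, 0.90]
predicted = [0.65, 0.70, 0.75, 0.90]
r2, rmse, mae = regression_metrics(observed, predicted)
```

A higher R-squared and lower RMSE and MAE indicate a better fit, which is how the columns of Table 2 are compared.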

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
