1. Introduction
Shared bicycles not only provide convenient trips for urban residents but also help them reach public transportation. They play a vital role in improving the efficiency of short-distance trips, solving the first- and last-mile problem of urban traffic, and alleviating a city’s traffic stress [1,2]. However, while bicycle sharing facilitates residents’ trips and improves trip efficiency, supply and demand are often imbalanced: few bicycles are available in areas with high demand, while too many are placed in areas with low demand [3]. When shared bicycles are deployed, generally only the area’s land use is considered, so more bicycles tend to be placed in areas such as commercial districts, subway stations, and bus stations [4]. However, areas with the same land use may have quite different demand for bicycles. This is because urban built environments differ in how well they suit shared bicycles, and land use is only one factor. A series of factors reflecting the urban built environment, such as the accessibility of the road network, the distribution of points of interest (POI), and public transport coverage, affect the use of shared bicycles. In this study, we discuss the impact of urban built environment factors on the spatial distribution of shared bicycles and explore their usage. The findings help in understanding the demand for and supply of shared bicycles and in improving service quality.
Studies of public bicycle trips mainly focus on demand prediction [5], scheduling optimization [6], cycling OD identification [7], and path selection [8]. The first two types of studies mainly focus on docked public bicycles and pay less attention to dockless shared bicycles. The practical significance of research on OD identification and path selection is to guide bicycle network planning, and there is little research on operation and management. The characteristics of the urban built environment are critical factors affecting the use of public bicycles [9,10,11,12]. These studies show that bicycle trips are affected by the “3D” elements of urban form, namely density, diversity, and design. Density mainly refers to the density of public transport stations, the density of the various kinds of POI, shared bicycle station density, etc.; diversity mostly indicates land-use diversity; and design mostly represents urban spatial structure design. Based on this, some scholars expanded the urban built environment elements to “5D” by adding “destination accessibility” and “distance to transit”. Later, some studies incorporated “demand management” into the influencing factors to form “6D”, and indicators related to population statistics were subsequently added to form “7D”. Weliwitiya [13] built a linear model to analyze the impact of the built environment on the utilization rate of bicycles and found that the density of non-motorized lanes and subway passenger flow have a positive impact on it. Noland et al. [14] used public bicycle operation data from New York City to show that the utilization rate of public bicycles is related to factors such as population density, distance from bus stops, and catering outlets. In addition, some studies have found that mixed land use can reduce travel distance and improve the utilization rate of shared bicycles [15,16]. For example, commercial places mixed with public buildings positively impact the use of shared bicycles [17]. The higher the density of catering and shopping outlets near a station, and the closer the station is to a subway station, the greater the number of public bicycles used [18,19]. As research has developed, more scholars have analyzed the impact of the urban built environment on shared bicycles from aspects such as infrastructure [20], population density [21], land use [22,23], and traffic accessibility [24].
In general, the studies on the impact of the urban built environment on shared bicycles have not yet formed a complete theoretical method system. Most existing studies either take the whole study area as the analysis unit to analyze the relationship between the distribution of shared bicycles and the built environment, or make a horizontal comparison between different areas, which cannot focus on the smaller spatial scale, such as the street level.
Meanwhile, in terms of research methods, most existing studies rely on linear regression models to analyze the built environment’s impact on the distribution of shared bicycles, such as multiple linear regression [25,26] and least squares regression models [27]. However, these models do not fit well, and the regression coefficients of most studies are between 0.5 and 0.7. These linear regression models cannot measure the complex relationship between the distribution of shared bicycles and influencing factors, nor can they analyze the impact of different built environments on the distribution of shared bicycles in a small-scale space.
This paper uses Space Syntax theory and ArcGIS 10.3 to assign the urban built environment elements and the shared bicycle borrowing and returning location data to an urban spatial grid. It describes the spatial distribution of shared bicycles with the grid cell as the unit, obtains spatial geographic information data, analyzes the spatial distribution characteristics of shared bicycles, and discusses the influencing factors. The road network axis map is drawn and imported into the Space Syntax software DepthMapX 0.7.0 to obtain the accessibility index of the road network, which is used to analyze the impact of the road network on the distribution of shared bicycles. By combining the factors influencing the distribution of shared bicycles, prediction models based on support vector regression (SVR) are proposed to explore the relationship between the built environment and the distribution of shared bicycles. The results provide a basis for the rational planning of shared bicycle parking areas and can help improve the operation and service level of shared bicycles.
The remainder of this paper is organized as follows. Section 2 introduces the study area and the methods used to collect the research data. Section 3 proposes the method for extracting urban built environment factors and assigning them to the grid, and introduces the SVR model used to explore how the urban built environment affects the distribution of shared bicycles. Section 4 gives the analysis results of the model and verifies its effectiveness. Finally, Section 5 summarizes our findings.
3. Methodology
3.1. Extraction of Urban Built Environment Factors
Kernel density analysis in ArcGIS 10.3 yields the spatial distribution of shared bicycles in downtown Shanghai, as shown in Figure 3.
As shown in Figure 3, the spatial distributions of shared bicycles in downtown Shanghai on weekdays and weekends are similar, and both are clustered. From the perspective of administrative divisions, the areas with higher shared bicycle density are Yangpu District, Huangpu District, Putuo District, and Hongkou District. At the street level, shared bicycles are concentrated around public transport stations, shopping areas, office buildings, education institutions, catering, and other service industries, as well as locations with high road accessibility.
According to the distribution characteristics of shared bicycles and the existing literature, the urban built environment influence factors are divided into three aspects: building function, public transportation convenience, and road network conditions. They are characterized by the POI comprehensive index, the intensity of public transport coverage, and spatial accessibility, respectively.
3.2. Assign Values to the Network Grid
In order to analyze the impact of public transport and road network coverage, the paper uses the Fishnet function of ArcGIS software to divide the study area into 1297 grids of 500 m × 500 m. By assigning built environment factors to grid cells, we can analyze the relationship between the urban built environment and the spatial distribution of shared bicycles on a small-scale space.
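The grid assignment described above can be illustrated with a minimal sketch. It assumes the bicycle locations have already been projected to planar coordinates in metres; the function names, the grid origin, and the sample points are hypothetical and stand in for the ArcGIS Fishnet workflow rather than reproducing it.

```python
from collections import Counter

CELL_SIZE_M = 500.0  # grid cell edge length used in the paper


def cell_index(x, y, x_min, y_min, cell_size=CELL_SIZE_M):
    """Return the (column, row) of the grid cell that contains a projected point."""
    col = int((x - x_min) // cell_size)
    row = int((y - y_min) // cell_size)
    return col, row


def count_per_cell(points, x_min, y_min, cell_size=CELL_SIZE_M):
    """Count how many points fall into each grid cell."""
    return Counter(cell_index(x, y, x_min, y_min, cell_size) for x, y in points)


# Hypothetical borrow/return locations in metres (not data from the study).
bike_points = [(120.0, 80.0), (499.9, 10.0), (500.0, 10.0), (1250.0, 760.0)]
counts = count_per_cell(bike_points, x_min=0.0, y_min=0.0)
print(counts[(0, 0)], counts[(1, 0)], counts[(2, 1)])
```

A point lying exactly on a cell boundary is assigned to the cell to its right or above, which is a convention chosen here for simplicity.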
This paper proposes three built environment indicators: the POI comprehensive index, the intensity of public transport coverage, and spatial accessibility. Their characterization methods and their assignment to the grid are described below. The POI comprehensive index takes into account both POI density and land-use diversity. First, the POI density indexes that have no significant impact on shared bicycles are screened out in SPSS, and then the combined impact of the remaining POI indexes is calculated by the entropy weight method. The intensity of public transport coverage expresses the impact of public transport on shared bicycles. The research object of this paper is the dockless shared bicycle, and the research area is divided into a grid of 500 m × 500 m. Simply using indicators such as station density may ignore the impact of stations in adjacent grid cells; therefore, the index of public transport coverage intensity is proposed. The spatial accessibility index expresses the impact of urban spatial structure on the distribution of shared bicycles. Previous studies mostly used road network density, non-motor vehicle density, and other indicators. This paper introduces the concept of Space Syntax and uses its indicators to quantify spatial accessibility.
3.2.1. POI Composite Index
Point of Interest (POI) data, representing building functions and geographic location information, can characterize the spatial distribution of urban land use to a certain extent. Eight categories of POI data closely related to residents’ daily life were selected, and the Pearson correlation analysis method in IBM SPSS Statistics 24.0 software was used to explore the relationship between these categories and the distribution of shared bicycles. The results are shown in Table 1.
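As a rough illustration of the correlation step, the Pearson coefficient between a POI category and the bicycle count per grid cell can be computed as follows. This is a sketch in plain Python rather than the SPSS procedure used in the paper; the per-cell values are made up, and the significance test that SPSS reports alongside the coefficient is not reproduced.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-cell values (not data from the study).
bikes = [10, 20, 30, 40, 50]
catering = [1, 2, 3, 4, 5]
medical = [3, 1, 4, 1, 5]

r_catering = pearson_r(catering, bikes)
r_medical = pearson_r(medical, bikes)
print(round(r_catering, 3), round(r_medical, 3))
```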
The results show that, except for medical institutions, the other seven types of POI data are significantly correlated with the number of shared bicycles borrowed and returned. The correlation values are relatively close, generally between 0.6 and 0.8. This paper introduces the POI comprehensive index to reflect the combined impact of multiple types of POI data. It reduces the dimension of the data analysis and is obtained by the weighted summation of the above seven types of POI. The calculation method is:
$$P = \sum_{j=1}^{7} w_j x_j$$
where $x_j$ is the number of POI of type $j$, and $w_j$ is the weight of that POI type. Its value is determined by the entropy method, and the specific calculation method is as follows:
Standardize POI data:
$$x'_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$$
where $x'_{ij}$ is the standardized result of the POI value $x_{ij}$ of sample $i$ for index $j$.
Calculate the weight:
$$w_j = \frac{d_j}{\sum_{k=1}^{7} d_k}$$
where $w_j$ is the weight coefficient of index $j$, and $d_j$ is the information utility, which can be calculated from the information entropy $e_j$:
$$d_j = 1 - e_j, \qquad e_j = -\frac{1}{\ln n}\sum_{i=1}^{n} p_{ij}\ln p_{ij}, \qquad p_{ij} = \frac{x'_{ij}}{\sum_{i=1}^{n} x'_{ij}}$$
where $e_j$ is the information entropy of index $j$, $p_{ij}$ is the proportion of sample value $i$ in index $j$, and $n$ is the number of samples.
The weight calculation results of POI are shown in Table 2. Since the POI data contain geographic information, the weighted POI comprehensive value can be assigned directly to the grid cell.
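The entropy weighting steps can be sketched in a few lines of Python. The sketch assumes min-max standardization and the usual entropy weight formulas (information utility equal to one minus the normalized entropy); the function names and the sample matrix are hypothetical, and every indicator column is assumed to contain at least two distinct values.

```python
import math


def entropy_weights(rows):
    """Entropy weights for the columns of `rows` (one row per sample)."""
    n = len(rows)
    utilities = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        # Min-max standardization (assumed form of the standardization step).
        std = [(v - lo) / (hi - lo) for v in col]
        total = sum(std)
        entropy = 0.0
        for v in std:
            p = v / total
            if p > 0:  # the term p * ln(p) is taken as 0 when p == 0
                entropy -= p * math.log(p)
        entropy /= math.log(n)
        utilities.append(1.0 - entropy)  # information utility
    utility_sum = sum(utilities)
    return [d / utility_sum for d in utilities]


def composite_index(poi_counts, weights):
    """Weighted sum of the POI counts of one grid cell."""
    return sum(w * x for w, x in zip(weights, poi_counts))


# Hypothetical POI counts: four grid cells (rows), two POI categories (columns).
samples = [[0, 1], [0, 2], [0, 3], [10, 4]]
weights = entropy_weights(samples)
print([round(w, 3) for w in weights])
print(round(composite_index([10, 4], weights), 3))
```

The first column is concentrated in a single cell, so its entropy is low and it receives the larger weight, which is the behaviour the entropy method is designed to produce.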
Through Pearson analysis in SPSS, this paper analyzes the correlation between the POI composite index and the number of shared bicycles. The result shows that the correlation between them is 0.603, and that the POI composite index can explain the distribution of shared bicycles.
3.2.2. Intensity of Public Transport Coverage
According to relevant surveys, 69% of shared bicycle trips in Shanghai are made to connect with public transport. Therefore, the intensity of public transport coverage has an important impact on the distribution of shared bicycles. Because the grid cells are small, simply counting public transport stations would assign a value only to the cell in which a station is located, which is inconsistent with the actual situation. The study therefore introduces the intensity of public transport coverage to characterize how conveniently each grid cell connects to public transport.
Let $T_i$ denote the intensity of public transport coverage of grid $i$. According to the coverage rate of the public transport network in downtown Shanghai, the service radius of a conventional bus station is set to 300 m, and the service radius of a rail transit station is set to 800 m. The areas of the unit grid affected by bus stations and by rail transit stations are calculated and recorded as the intensity of bus coverage and the intensity of rail transit coverage, respectively. Considering the difference in service level and passenger attraction between rail transit and conventional buses, a conversion coefficient for rail transit stations is introduced. The ratio of the transportation capacity of rail transit to that of conventional buses per unit time is taken as the conversion factor and recorded as $\alpha$. The following is then obtained:
$$T_i = \alpha R_i + B_i$$
where $R_i$ and $B_i$, respectively, indicate the coverage of rail transit stations and bus stations in grid $i$.
Figure 4 shows the assignment results of the public transportation coverage intensity of the grid network in downtown Shanghai. The data in picture (c) are the sum of pictures (a) and (b).
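One way to approximate the coverage intensity of a single grid cell is sketched below. Several choices are assumptions made for illustration: coverage is expressed as the fraction of the cell lying within the service radius of any station (overlapping buffers are not double counted), the fraction is estimated by sampling the cell on a regular lattice rather than by exact GIS overlay, the combined index is the rail coverage scaled by the conversion factor plus the bus coverage, and the value of `alpha` is made up.

```python
def covered_fraction(cell_x, cell_y, stations, radius, cell_size=500.0, steps=50):
    """Approximate share of a grid cell within `radius` of any station.

    (cell_x, cell_y) is the lower-left corner of the cell in metres. The cell
    is sampled at the midpoints of a steps x steps lattice of sub-cells.
    """
    step = cell_size / steps
    radius_sq = radius * radius
    hits = 0
    for i in range(steps):
        px = cell_x + (i + 0.5) * step
        for j in range(steps):
            py = cell_y + (j + 0.5) * step
            if any((px - sx) ** 2 + (py - sy) ** 2 <= radius_sq for sx, sy in stations):
                hits += 1
    return hits / (steps * steps)


def coverage_intensity(cell_x, cell_y, bus_stops, rail_stations, alpha):
    """Combined coverage: alpha * rail coverage + bus coverage (assumed form)."""
    bus = covered_fraction(cell_x, cell_y, bus_stops, radius=300.0)
    rail = covered_fraction(cell_x, cell_y, rail_stations, radius=800.0)
    return alpha * rail + bus


# Hypothetical cell with one bus stop and one rail station at its centre.
centre = [(250.0, 250.0)]
bus_share = covered_fraction(0.0, 0.0, centre, radius=300.0)
rail_share = covered_fraction(0.0, 0.0, centre, radius=800.0)
intensity = coverage_intensity(0.0, 0.0, centre, centre, alpha=3.0)
print(round(bus_share, 3), rail_share, round(intensity, 3))
```

Because a station outside the cell can still cover part of it, a neighbouring cell without any station of its own receives a non-zero value, which is the effect that plain station counts miss.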
Through Pearson analysis in SPSS, this paper analyzes the correlation between the intensity of public transport coverage and the number of shared bicycles. The result shows that the correlation between them is 0.786, and that the intensity of public transport coverage can explain the distribution of shared bicycles.
3.2.3. Spatial Accessibility
The influence of road network conditions on the distribution of shared bicycles essentially reflects the effect of spatial accessibility on travel choices. This paper uses the indexes of Space Syntax to reflect spatial accessibility. Space Syntax is a theory and method of urban spatial analysis proposed by Bill Hillier in the 1980s [28]. Following the principles of graph theory, it abstracts urban space into intersecting line segments. Formulas have been put forward to quantify the topological relationship between line segments, explore the internal laws of spatial structure, and reveal the relationship between human activities and spatial structure [29].
The analysis variables of the Space Syntax model include:
1. Depth: This refers to the minimum number of steps required for a space to reach other spaces and expresses the accessibility of the space in a topological sense. The higher the depth value, the worse the accessibility, and the lower the intensity of human activity. The mean depth (MD) is the mean of the minimum number of steps from a node to all other nodes in the system. It is a description of the whole system. The calculation formula is:
$$MD_i = \frac{\sum_{j=1,\, j \neq i}^{n} d_{ij}}{n - 1}$$
where $d_{ij}$ is the minimum number of topological steps between nodes $i$ and $j$, and $n$ is the number of nodes in the system.
2. Integration: This indicates the degree of aggregation or dispersion between a node and the other nodes in the entire system. If the integration value is greater than 1, the cell space has a high degree of aggregation with all other spaces in the system. If it is less than 1, it indicates that the nodes show a trend of mutual dispersion. When the value is between 0.4 and 0.6, the layout of spatial objects is relatively scattered. The calculation formula is as follows:
3. Choice: This indicates how often a spatial unit lies on the shortest topological paths between other nodes in the system. It measures the advantage of the spatial unit as part of the shortest travel path and thus reflects the likelihood of the space being traversed. A node with a higher choice value is more likely to be traversed by people. The calculation formula is as follows:
$$C_i = \sum_{j}\sum_{k} g_{jk}(i), \qquad j < k,\; j \neq i,\; k \neq i$$
where $C_i$ represents the total number of times spatial unit $i$ lies on the shortest topological path between two nodes $j$ and $k$, with $g_{jk}(i)$ equal to 1 if unit $i$ lies on such a path and 0 otherwise.
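The depth and choice variables can be illustrated on a toy axial graph. The sketch below assumes a connected, undirected graph given as an adjacency dictionary; the node names are hypothetical. Choice is computed in a simplified way, by counting the node pairs for which the unit lies on at least one shortest topological path, which may differ from the weighting applied inside DepthMapX. Integration is left out because its normalization differs between tools.

```python
from collections import deque


def topological_depths(graph, source):
    """Breadth-first search: minimum number of steps from `source` to each node."""
    depths = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in depths:
                depths[neighbour] = depths[node] + 1
                queue.append(neighbour)
    return depths


def mean_depth(graph, node):
    """Mean of the minimum step counts from `node` to all other nodes."""
    depths = topological_depths(graph, node)
    return sum(depths.values()) / (len(graph) - 1)


def choice(graph, node):
    """Number of node pairs whose shortest path can pass through `node`."""
    dist = {v: topological_depths(graph, v) for v in graph}
    others = [v for v in graph if v != node]
    count = 0
    for idx, j in enumerate(others):
        for k in others[idx + 1:]:
            if dist[j][node] + dist[node][k] == dist[j][k]:
                count += 1
    return count


# Hypothetical axial graph: five axes connected in a chain A-B-C-D-E.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
print(mean_depth(chain, "C"), mean_depth(chain, "A"))
print(choice(chain, "C"), choice(chain, "A"))
```

The central axis C has the lowest mean depth and the highest choice value, matching the interpretation that shallow, frequently traversed spaces are the most accessible.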
Referring to the road network data of downtown Shanghai in 2020, the road network axis map is drawn in ArcGIS 10.3 based on the principle of “longest and fewest” axial lines and imported into DepthMapX 0.7.0 for data analysis to obtain the variable values of Space Syntax. Finally, the obtained road network index values are fed back to ArcGIS 10.3 and assigned to the road network axes. The calculation method of the spatial accessibility of unit grid cells is as follows:
$$A_i = \frac{\sum_{j} a_j l_j}{\sum_{j} l_j}$$
where $A_i$ is the accessibility value of grid $i$, $a_j$ is the accessibility value of the road axis $j$ in the grid, and $l_j$ is the length of the road axis $j$.
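A small sketch of the grid-level aggregation follows. It assumes that the grid value is the length-weighted mean of the accessibility values of the axes inside the cell, which is one plausible reading of the definitions above; the function name and the sample segments are hypothetical.

```python
def grid_accessibility(segments):
    """Length-weighted mean accessibility of the road axes inside one grid cell.

    `segments` holds one (accessibility_value, length_in_cell_m) pair per axis.
    """
    total_length = sum(length for _, length in segments)
    if total_length == 0:
        return 0.0  # a cell without any road axis gets no accessibility value
    return sum(value * length for value, length in segments) / total_length


# Hypothetical cell crossed by two axes: 300 m at 1.2 and 100 m at 0.8.
value = grid_accessibility([(1.2, 300.0), (0.8, 100.0)])
print(round(value, 3))
```

Weighting by length keeps a short stub of a highly integrated road from dominating a cell that is mostly served by poorly integrated streets.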
In Space Syntax, the n-step and 3-step topological distances are representative, so they are selected for analysis. The Pearson correlations, computed in SPSS, between the Space Syntax indexes and the spatial distribution of shared bicycles are shown in Table 3. The integration (Rn) has the highest correlation with the shared bicycles, so it is selected to reflect spatial accessibility. Drawing the road network axis map of downtown Shanghai yielded 26,926 road axes. The result is shown in Figure 5.
The integration (Rn) of the road network in downtown Shanghai shows a single-center structure. The core area is the junction of Jingan District, Hongkou District, and Huangpu District. The roads with high integration are concentrated near the Bund on the west side of the Huangpu River, forming the accessibility center of the road network in downtown Shanghai.
3.3. Support Vector Regression Model
At present, regression prediction methods for shared bicycle demand mainly include the multivariable linear regression model, geographically weighted regression, and regression prediction models based on machine learning, such as support vector machines, random forests, and BP neural networks. Because the relationship between the spatial distribution of shared bicycles and its influencing factors is nonlinear, and the distribution of shared bicycles changes periodically, this paper uses the support vector regression (SVR) model to establish the relationship between the spatial distribution of shared bicycles and the urban built environment. As an efficient machine learning method, SVR balances model complexity against learning ability while minimizing empirical error. It has advantages with small samples and under nonlinear conditions [30].
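The idea of SVR is to map the data into a high-dimensional feature space through a nonlinear mapping and to carry out linear regression there, which achieves the effect of nonlinear regression in the original space. The regression estimation function is:
$$f(x) = \omega^{T}\varphi(x) + b$$
where $\omega$ is the weight vector, $\varphi(x)$ is the nonlinear mapping, and $b$ is a constant.
Suppose $x_i$ is the urban built environment factor affecting the distribution of shared bicycles and $y_i$ is the number of shared bicycles, and set the training dataset $\{(x_i, y_i)\}_{i=1}^{n}$. The $\varepsilon$-insensitive loss function is introduced into the constrained optimization problem, which can be expressed as:
$$\min_{\omega,\, b,\, \xi,\, \xi^{*}} \; \frac{1}{2}\|\omega\|^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right)$$
$$\text{s.t.}\quad y_i - \omega^{T}\varphi(x_i) - b \le \varepsilon + \xi_i,\qquad \omega^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*},\qquad \xi_i,\ \xi_i^{*} \ge 0$$
where $\frac{1}{2}\|\omega\|^{2}$ is the regularization part, $C\sum_{i=1}^{n}(\xi_i + \xi_i^{*})$ is the empirical risk, $C$ is the penalty factor, and $\xi_i$, $\xi_i^{*}$ are the slack variables.
We can transform the above optimization problem into its dual by using the Lagrange multiplier method and then introduce the kernel function $K(x_i, x_j) = \varphi(x_i)^{T}\varphi(x_j)$, from which we obtain:
$$\max_{\alpha,\, \alpha^{*}} \; -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})K(x_i, x_j) - \varepsilon\sum_{i=1}^{n}(\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{n} y_i(\alpha_i - \alpha_i^{*})$$
$$\text{s.t.}\quad \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*}) = 0,\qquad 0 \le \alpha_i,\ \alpha_i^{*} \le C$$
where $\alpha_i$, $\alpha_i^{*}$ are Lagrange multipliers, which can be obtained by solving the quadratic programming problem.
The final regression function expression is:
$$f(x) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*})K(x_i, x) + b$$
In solving the SVR problem, the selection of the kernel function is key. Generally, there are three types of kernel functions: (a) linear kernel function: $K(x_i, x_j) = x_i^{T}x_j$; (b) polynomial kernel function: $K(x_i, x_j) = (\gamma x_i^{T}x_j + r)^{d}$; and (c) Gaussian radial basis kernel function: $K(x_i, x_j) = \exp(-\gamma\|x_i - x_j\|^{2})$.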
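To make the role of the kernel concrete, the three kernel types named above and the kernel form of the regression function can be written out in plain Python. This is a sketch only: the parameter names `gamma`, `coef0`, and `degree` follow a common parameterization and are assumptions, the support vectors and dual coefficients are made-up numbers, and the quadratic programming step that would normally produce them is not shown.

```python
import math


def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian radial basis kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svr_predict(x, support_vectors, dual_coefs, bias, kernel):
    """f(x) = sum_i coef_i * K(x_i, x) + b, with coef_i = alpha_i - alpha_i*."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical support vectors and coefficients, for illustration only.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [2.0, -1.0]
prediction = svr_predict([1.0, 2.0], support_vectors, dual_coefs, 0.5, linear_kernel)
print(prediction)
print(round(rbf_kernel([0.0, 0.0], [1.0, 1.0]), 4))
```

Swapping the `kernel` argument changes the shape of the fitted function without changing the prediction code, which is why the kernel choice can be compared as a single experimental factor.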
This paper takes the grid cell as the unit of analysis. The inputs are the POI comprehensive index, the intensity of public transport coverage, spatial accessibility, and shared bicycle data during peak hours (7:30 a.m. to 8:30 a.m. and 5:30 p.m. to 6:30 p.m.) in downtown Shanghai over one week. The data within Xinhua Road, Changning District, Shanghai, were selected as the prediction sample set, and the rest of the data were selected as the training sample set. In this paper, the root mean square error (RMSE) and the coefficient of determination ($R^{2}$) are used to test the prediction effect. The calculation formulas are as follows:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2}}$$
$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}$$
where $\hat{y}_i$ is the forecast value of shared bicycle demand, $y_i$ is the measured value of shared bicycle demand, $\bar{y}$ is the sample mean of shared bicycle demand, and $n$ is the number of samples in the prediction sample set.
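The two evaluation measures are simple to compute directly. The following sketch uses the standard definitions of RMSE and the coefficient of determination; the observed and predicted values are invented for illustration.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between predictions and observations."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def r_squared(predicted, observed):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical bicycle counts for four grid cells.
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 39]
print(round(rmse(predicted, observed), 4), round(r_squared(predicted, observed), 4))
```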
The multiple linear regression method is introduced as a comparative experiment to compare and analyze the advantages and disadvantages of SVR and traditional linear regression methods. The calculation formula is as follows:
$$y = \beta_0 + \sum_{k=1}^{m}\beta_k x_k$$
where $y$ is the number of shared bicycles, $x_k$ are the influencing factors of shared bicycles, $\beta_0$ is a constant, and $\beta_k$ are the regression coefficients.
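For the baseline, a multiple linear regression can be fitted by ordinary least squares. The sketch below solves the normal equations with Gauss-Jordan elimination in plain Python; it assumes the design matrix has full column rank, and the sample data are constructed so that the true coefficients are known. It illustrates the technique and is not the fitting procedure used in the paper.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y.

    Returns [beta_0, beta_1, ..., beta_m], where beta_0 is the intercept.
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


# Hypothetical data generated from y = 5 + 2*x1 + 3*x2 without noise.
X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [7, 8, 10, 12, 16]
beta = fit_linear_regression(X, y)
print([round(b, 6) for b in beta])
```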
4. Results
After assigning the research data to the grid, the correlation analysis results show that the POI comprehensive index, the intensity of public transport coverage, and spatial accessibility significantly impact the spatial distribution of shared bicycles. The linear kernel function, Gaussian radial basis function, and polynomial kernel function are used to construct SVR models. The forecast accuracies of the SVR models under these three kernel functions, obtained by training the models, are shown in Figure 6, Figure 7 and Figure 8.
The root mean square error and coefficient of determination of the SVR models based on the three kernel functions and of the multivariable linear regression model are shown in Table 4.
In the figures, the X axis shows the forecasting sample, that is, the grid cell serial number, while the Y axis shows the forecasting result, that is, the number of shared bicycles in each grid cell. The results show that the SVR models based on all three kernel functions achieve high forecast accuracy. The model based on the Gaussian radial basis function has the smallest error and the best fit. Its RMSE is 14.3416, which means that the error between the predicted and true number of shared bicycles is about 14 bicycles per 500 m × 500 m grid cell. Its $R^{2}$ is 0.97797, indicating that the prediction model can explain 97.797% of the variation in the distribution of shared bicycles. The multiple linear regression model obtains an RMSE of 67.3542 and an $R^{2}$ of 0.847, which is not as good as the former. The study found that the shared bicycle SVR model based on the Gaussian radial basis function can better explain the impact of urban built environment factors on the spatial distribution of shared bicycles.
5. Discussion and Conclusions
5.1. Discussion
This study provides a new perspective for investigating the impacts of the built environment on shared bicycles in a small-scale space, taking Shanghai as a case study. The main contribution of this study is threefold. First, the Fishnet function of ArcGIS was utilized to divide the study area into grids of 500 m × 500 m, and a method to calculate the borrowing and returning position of shared bicycles was proposed based on time-varying GPS data. Second, the built environmental factors are represented by building function, public transportation convenience, and road network conditions, and then the conditions are assigned to the grid. Third, SVR models are applied to explore the nonlinear relationship between the usage of shared bicycles and contributing factors.
The results show that the POI comprehensive index, the intensity of public transport coverage, and spatial accessibility significantly affect the spatial distribution of shared bicycles. The SVR model based on the Gaussian radial basis function is more effective than traditional linear models in revealing the complex, nonlinear relationship between shared bicycle usage and built environment elements. In addition, the study finds that environmental factors such as financial institutions and residential areas have a more significant impact on shared bicycles than others.
5.2. Conclusions
Although this paper analyzes the nonlinear relationship between urban built environment factors and shared bicycle distribution in small-scale space, there are still some limitations. First, Shanghai was taken as the case study; therefore, the conclusions may not be universal. In the future, this research method can be extended to case studies in different cities. Second, this paper mainly considers the impact of the urban built environment on shared bicycles. Although it concludes that environmental factors such as financial institutions and residential areas have a more significant impact on shared bicycles than other factors, the population density of such POI areas is high, so the population density around the various POIs may also play an important role in the demand for and supply of shared bicycles. The influence of individual characteristics and sociodemographic attributes of residents has not been considered.
The spatial distribution law of shared bicycles has practical significance in understanding the supply and demand of shared bicycles in different built environment areas. In the next stage, the site selection of shared bicycle stations can be studied on this basis.