3.1. Construction of Model
The two research hypotheses of this article are restated here. Hypothesis 1: sponge cities exhibit interregional cooperation in terms of management. Hypothesis 2: within the framework of regional cooperation, the management and control elements of sponge cities are linked. Both hypotheses rest on the premise that sponge cities are regionally connected with respect to their management and control elements; that is, these elements show regional clustering, and certain regions resemble one another in them. This clustering of management elements between regions is the precondition for a regional cooperative management mechanism for sponge cities. It is therefore necessary to examine the similarity of management elements between regions empirically, using cluster analysis, in order to demonstrate the feasibility of the mechanism. This article also assumes that sponge cities are linked in their management and construction elements. On the basis of the cluster analysis, a diversified clustering index system is therefore constructed to demonstrate the feasibility of a multiple linkage mechanism under regional cooperation. In summary, this article uses meteorological indicators to represent the management and control factors of sponge cities and builds a spatial clustering model of multiple meteorological indicators to test its hypotheses.
A cluster analysis model depends on the support of data. Meteorological indicators are closely related to the distribution of climate and precipitation, and the meteorological indicators of different cities are similar, which creates the conditions for cluster analysis. Within the framework of regional cooperative management of sponge cities, this article conducts k-means cluster analysis on 30 years of meteorological data for 71 cities in the various geographical regions of China. It makes a preliminary study of how sponge city characteristics manifest in different regions and cities of China, seeks the general laws reflected in sponge city management and construction indexes across regions and cities, and demonstrates the macro-management framework of the sponge city. The mechanism is shown as follows:
Figure 3 shows the modelling process of the cluster analysis used in this article. The detailed steps are as follows:
- (1)
The modelling process starts with sample city selection, shown at the far left of Figure 3. The sample cities to be studied are determined first; as described below, 71 Chinese cities were selected as sample cities.
- (2)
The second step is to select the original data for the sample cities determined in the first step. Since the research object of this paper is the sponge city, which is inseparable from meteorological factors, the original data selected in this step are the meteorological data of the sample cities. The reasons for selecting meteorological data are discussed in detail below.
- (3)
The third step is to determine the indicators for cluster analysis. As described below, eight indicators were selected as cluster analysis indicators in this article.
- (4)
The fourth step is “annual averaging treatment.” Because the original data are the meteorological data of each city for each of the 30 years, the data of the individual years are aggregated and averaged for the purposes of this paper.
- (5)
The fifth step is data standardization, which unifies the dimensions of the indicators and eliminates the adverse effects of the inconsistent units in the original data.
- (6)
The sixth step completes the pre-processing of the sample data, after which the data enter the cluster analysis stage.
- (7)
Steps 7 to 10, inside the dotted box, are the calculation steps of the clustering process. The seventh step is to determine the clustering centres, that is, to decide how many categories the data are to be divided into. As described below, seven clustering centres were finally determined in this paper.
- (8)
In this step, the Euclidean distance between the data of each of the 71 sample cities and each of the 7 clustering centres is calculated, and each city is assigned to the category of the clustering centre nearest to it.
- (9)
In step 9, the clustering centres are updated: each centre is recalculated from the cities currently assigned to its category, so that the seven clustering centres remain representative of their categories.
- (10)
In step 10, the sum of squared distances between the sample city data and their clustering centres is calculated. Minimizing this sum brings the cities in each category as close as possible to the clustering centre of that category, which makes the clustering result more stable. Steps 8 to 10 form an iterative calculation: the clustering centres are updated and the distances recalculated repeatedly until the clustering centres and the clustering result no longer change significantly.
- (11)
In this step, the whole clustering calculation is complete and stable clustering results have been obtained. The clustering results are then output.
- (12)
This is the last step of the cluster analysis: the clustering results obtained in the previous step are analysed, and appropriate policies and measures are formulated according to the clustering results of the sample cities.
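The iterative part of the process (steps 7 to 10) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this article: the function name `kmeans`, the random choice of initial centres, the stopping rule (assignments unchanged) and the six toy points are all hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples (illustrative)."""
    rng = random.Random(seed)
    # Step 7: choose k initial clustering centres (random samples: an assumption).
    centres = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 8: assign every sample to its nearest centre (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points
        ]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 9: move each centre to the mean of the samples assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    # Step 10: sum of squared distances between samples and their centres.
    sse = sum(math.dist(p, centres[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centres, sse


# Toy data: two well-separated groups of three points each (hypothetical).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centres, sse = kmeans(points, k=2)
print(labels, round(sse, 4))
```

In the setting of this article, `points` would hold one standardized indicator vector per sample city and `k` would be 7; the toy data above only serve to show the loop converging.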
The index system of the clustering analysis model adopted in this article is a combination of $n$-dimensional clustering features, $F = \{f_1, f_2, \ldots, f_n\}$, where $f_j$ represents the clustering feature of the $j$-th dimension. The data used in the clustering model are the annual meteorological data of each city under a specific combination of indicators, expressed as the matrix

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{p1} & x_{p2} & \cdots & x_{pn} \end{pmatrix}$$

where the matrix $X$ represents the multivariate annual meteorological data of a city, $p$ is the number of sample days and $x_{ij}$ represents the annual average value of meteorological feature $j$ on sample day $i$ of the city. It is calculated as

$$x_{ij} = \frac{1}{M} \sum_{m=1}^{M} x_{ij}^{(m)}$$

where $x_{ij}^{(m)}$ represents the actual data value of characteristic variable $j$ on sample day $i$ in year $m$ and $M$ is the number of years, so that $x_{ij}$ is accumulated and averaged over the years of data. To eliminate dimensional differences, the data entering the clustering model are standardized, which keeps the measurement scales consistent and prevents variables with a large order of magnitude from dominating the final classification results. The standardization method used in this article is min-max standardization:

$$x_{ij}^{*} = \frac{x_{ij} - x_{j}^{\min}}{x_{j}^{\max} - x_{j}^{\min}}$$

where $x_{ij}$ represents the variable, $x_{ij}^{*}$ represents the standardized variable, $x_{j}^{\min}$ represents the minimum value in the corresponding variable column and $x_{j}^{\max}$ represents the maximum value in the corresponding variable column. The standardized meteorological data matrix is

$$X^{*} = \begin{pmatrix} x_{11}^{*} & x_{12}^{*} & \cdots & x_{1n}^{*} \\ x_{21}^{*} & x_{22}^{*} & \cdots & x_{2n}^{*} \\ \vdots & \vdots & \ddots & \vdots \\ x_{p1}^{*} & x_{p2}^{*} & \cdots & x_{pn}^{*} \end{pmatrix}$$
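The two pre-processing operations described above, averaging over years and min-max standardization by column, can be illustrated with a short Python sketch. The function names and the small input values are hypothetical, and mapping a constant column to 0.0 is an assumption made here to avoid division by zero; the article does not say how such a column is handled.

```python
def annual_average(yearly_values):
    """Average one indicator over years.

    yearly_values[m][i] is the value in year m on sample day i.
    """
    n_years = len(yearly_values)
    return [sum(day) / n_years for day in zip(*yearly_values)]


def min_max_standardize(matrix):
    """Rescale every column of a row-major matrix to the range [0, 1]."""
    columns = list(zip(*matrix))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in matrix:
        scaled.append([
            # A constant column has no spread; map it to 0.0 (an assumption).
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Two hypothetical years of one indicator on two sample days.
averaged = annual_average([[1.0, 4.0], [3.0, 6.0]])

# Three hypothetical rows with two indicators on very different scales.
standardized = min_max_standardize([[10.0, 200.0], [20.0, 400.0], [15.0, 300.0]])
print(averaged, standardized)
```

After standardization both columns lie in [0, 1], so neither indicator dominates the Euclidean distance merely because of its unit.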
The above is the pre-processing of the original meteorological data. After pre-processing, the standardized meteorological data matrix $X^{*}$ is put into the clustering model for calculation. In this article, the $k$-means method is adopted for clustering analysis, with Euclidean distance as the similarity measure and minimization of the sum of squared distances between the samples and their clustering centres as the clustering objective. First, according to the research purpose, the sample data are divided into $k$ groups, and the mean of the sample data in each group is calculated to determine the $k$ corresponding clustering centres. Given the feature values of these clustering centres, the Euclidean distance between every sample and each of the $k$ clustering centres is calculated, and each sample is assigned to its nearest centre. This process is then iterated, updating the clustering centres each time, until the sum of squared distances between the samples and their corresponding clustering centres reaches a minimum and the clustering centres no longer change. The sample data are finally gathered into $k$ clusters. The objective function of the clustering model is

$$J = \sum_{c=1}^{k} \sum_{i=1}^{N_c} \left\lVert x_i^{(c)} - \mu_c \right\rVert^2$$

where $J$ is the sum of squared distances between the sample values and the clustering centres; $x_i^{(c)}$ is the value of sample $i$ belonging to group $c$; $N_c$ is the number of samples in group $c$; $\mu_c$ is the mean of the sample data in group $c$; and the final clustering objective is achieved by minimizing the value of the objective function $J$.
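As a brief worked step, the objective also explains why each clustering centre is updated to the mean of its group. The notation here is assumed for illustration: $x_i^{(c)}$ is sample $i$ of group $c$, $N_c$ is the number of samples in group $c$, $\mu_c$ is its centre and $k$ is the number of groups. Holding the group assignments fixed and setting the derivative with respect to $\mu_c$ to zero gives:

```latex
\begin{aligned}
J &= \sum_{c=1}^{k} \sum_{i=1}^{N_c} \left\lVert x_i^{(c)} - \mu_c \right\rVert^2, \\
\frac{\partial J}{\partial \mu_c}
  &= -2 \sum_{i=1}^{N_c} \left( x_i^{(c)} - \mu_c \right) = 0
  \;\Longrightarrow\;
  \mu_c = \frac{1}{N_c} \sum_{i=1}^{N_c} x_i^{(c)} .
\end{aligned}
```

The group mean is therefore the centre that minimizes $J$ for a given assignment. Reassigning a sample to its nearest centre cannot increase $J$ either, so $J$ does not increase from one iteration to the next.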
Efficient operation of a sponge city presupposes that the characteristics of the related management elements are identified and controlled. Sponge cities are inseparable from natural meteorology, so their management and construction need the technical support of meteorological data. Existing meteorological data mainly cover a city’s temperature, wind speed, water vapor pressure, precipitation and other meteorological information; specifically, they comprise elements such as average temperature, maximum temperature, average wind speed, average precipitation and average water vapor pressure. These elements describe the meteorological conditions at the monitoring points of a region over successive time periods. The meteorological environment is characterized by geographical connectivity: cities and regions with similar geographical locations and similar mountain and landform features are similar in meteorological performance, and even geographically unconnected regions show certain regularities in meteorological conditions, especially in particular meteorological elements. Mining meteorological data composed of the above elements and exploring their general laws therefore gives each city a deep and clear understanding of its own conditions and provides a reference for formulating sponge city plans and for the preliminary exploration, planning and design of system construction in the various regions. Between cities and regions with similar geographical location and geomorphological features, the planning and construction of low-impact development rainwater systems, and their subsequent monitoring and evaluation, can be linked and coordinated, which improves the overall efficiency of sponge city construction.
Consistency of meteorological indexes between cities helps national, administrative and municipal government departments at all levels to coordinate the overall planning of sponge city construction, and improves working efficiency from the early design and development of the system through to its later evaluation and maintenance.
In this article, following the idea of zoning, the sponge city macro-management areas are divided according to China’s geographical regions. The original control indexes are decomposed, and further control indexes such as ineffective precipitation time and seasonal precipitation factors are introduced, in order to uncover and explain the consistency of control indexes between cities and the regularities they show. The final clustering results are determined by the index variables that make up the data, and the clustering of the data points reflects the regularities of the data on those index variables. The processing and analysis of regional meteorological data is an important part of sponge city construction. To ensure that the data are representative over time, the time span of the data should be no less than 30 years and the research indexes should be multi-year meteorological indexes [36]. Some scholars have carried out decomposition studies on the annual total stormwater control rate index, detailing the impact of the index on the target values of sponge city construction and decomposing the sponge city construction indexes; this supports the specific implementation of low-impact development rainwater systems and provides data support for system construction [58]. To further explore the general laws of meteorological indexes in Chinese cities, this article decomposes the rainfall indexes by year, day and season. At the same time, other meteorological indexes related to rainfall are introduced into the data set, so as to explore the interaction between multiple indexes and the regularities they present.
Sponge city construction takes the city as its unit, and meteorological conditions are consistent between geographically adjacent cities [55]. According to the standard geographical division of China, the cities selected in this article cover all seven geographical divisions: central China, north China, south China, southwest China, northwest China, northeast China and east China. According to the research purpose, eight meteorological indicators finally entered the cluster analysis model: average annual daily temperature over 30 years (ADT, unit: °C); average annual daily precipitation over 30 years (ADP, unit: mm); average annual daily precipitation in spring over 30 years (ADSP, unit: mm); average annual daily precipitation in summer over 30 years (ADSU, unit: mm); average annual daily precipitation in autumn over 30 years (ADAU, unit: mm); average annual daily precipitation in winter over 30 years (ADWI, unit: mm); average annual total precipitation over 30 years (ATP, unit: mm); and average annual days of ineffective precipitation over 30 years (ADIP, unit: day).
In terms of index variable processing and selection, this article extends the control targets, further decomposes and refines indexes such as precipitation, and introduces other control elements. The meteorological clustering indexes and their corresponding control target elements and administrative departments are shown in Table 1. The index processing in this article is mainly based on the following considerations:
- (1)
Introduce seasonal factors into the meteorological indicators. The geographical distribution of precipitation and climate in China shows that summer and winter precipitation differ in most regions. For example, in the temperate continental climate zone precipitation is concentrated in summer, while in the monsoon climate zone precipitation is higher in summer and lower in winter. To reflect possible seasonal water shortage or excessive rainfall, seasonal factors were taken into account at the modelling stage: the average daily precipitation of the sample cities in spring, summer, autumn and winter was calculated by season.
- (2)
Introduce the concept of average annual days of ineffective precipitation (ADIP). In this article, the number of days on which the 30-year average daily precipitation of a sample city is less than or equal to 2.0 mm is counted as the value of ADIP, which is added to the clustering model as a variable. ADIP examines the annual precipitation distribution of the sample cities from the perspective of time, that is, whether precipitation is distributed evenly throughout the year. For example, some southern regions of China experience seasonal drought: annual precipitation is relatively abundant, yet drought is likely in summer. In such cases, the construction of a low-impact development rainwater system should consider how to improve water conservation during periods of ineffective precipitation.
- (3)
Introduce a temperature index. The existing reference indexes for sponge city construction are mainly rainfall indexes and do not draw on other important meteorological indexes, although air temperature interacts closely with rainfall and non-point source pollution. This article therefore introduces the annual average daily temperature index into the model, alongside the annual average daily precipitation, for clustering analysis.
To explore in depth the influence of the individual variables on the clustering results, this article adopts a method of gradually accumulating variables in the cluster analysis. Starting from the multi-year average rainfall values, cluster analysis is carried out repeatedly on different variable combinations, gradually adding to the variables and variable combinations that are clustered. For example, variables such as ADP, ADSU, ATP and ADIP were each taken as the clustering variable to cluster the 71 data records separately, and the clustering results were observed and recorded. Then variables such as annual average daily precipitation and annual average ineffective precipitation time were combined, and the clustering of the data under the different variable combinations was observed and recorded.
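One way to organise such repeated runs is to generate the variable combinations first and then build the feature matrix for each run. The sketch below is only one plausible reading of the procedure: the ordering (single variables first, then nested combinations), the function names and the record values are assumptions, and the clustering call itself is left out.

```python
def variable_schedule(variables):
    """Single variables first, then nested combinations of growing size."""
    singles = [(name,) for name in variables]
    nested = [tuple(variables[:n]) for n in range(2, len(variables) + 1)]
    return singles + nested


def select_features(records, combination):
    """Build the feature rows that one clustering run would receive."""
    return [[record[name] for name in combination] for record in records]


# Hypothetical standardized records for three cities (illustrative values only).
records = [
    {"ADP": 0.10, "ADSU": 0.20, "ATP": 0.10, "ADIP": 0.90},
    {"ADP": 0.55, "ADSU": 0.60, "ATP": 0.55, "ADIP": 0.40},
    {"ADP": 0.95, "ADSU": 0.80, "ATP": 0.95, "ADIP": 0.05},
]

schedule = variable_schedule(["ADP", "ADSU", "ATP", "ADIP"])
for combination in schedule:
    features = select_features(records, combination)
    # Each `features` matrix would be passed to the clustering routine here,
    # and the resulting grouping recorded for comparison across combinations.
    print(combination, features[0])
```

Recording the grouping obtained for each entry of `schedule` makes it possible to see which added variable changes the cluster membership of a city.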
3.2. Data Selection
To ensure the accuracy of the data, the meteorological data used in this article come from the China Meteorological Data Network, sponsored by the China Meteorological Administration. The data set selected is the Chinese Ground Annual Value Data Set (1981–2010), which is compiled from the computerized monthly report files of China’s basic, benchmark and general ground meteorological observation stations. It contains daily climate standard data, including air pressure, temperature, precipitation, wind speed and other elements. The data cover 1 January 1981 to 31 December 2010, a span of 30 years.
Since the purpose of this study is to explore the clustering characteristics of cities in mainland China in the sponge city sense, the selection of sample cities took the following into account: (1) sample cities should be distributed as widely as possible across the large geographical areas of mainland China, and each large geographical area should contain a certain number of cities; (2) at the administrative level, each province in mainland China should also contain a certain number of cities; (3) because of the importance of capitals within administrative regions, each administrative capital should be included in the sample set where possible; (4) the city data entered into the sample set should be consistent in statistical time span. On this basis, 71 cities were finally selected as samples for analysis. However, the meteorological data of some cities, in particular some administrative capitals, could not be included in the sample set because of inconsistent statistical standards. The data structure comprises nominal variables and meteorological indicators: the city name, station number, daily serial number and the various meteorological indicators. Each record is the 30-year average of the city’s meteorological data for one day, giving 365 records in total, arranged in ascending order from 1 to 365.
On the basis of the original data, this article takes the ADP in the original data as the city’s multi-year average daily precipitation index and the ADT as its multi-year average daily temperature index. The ADP in the original data is further processed as follows: (1) The average daily precipitation of each season is calculated according to the annual distribution of the four seasons. Spring is March, April and May, with daily serial numbers 60–151, a total of 92 days; summer is June, July and August, with daily serial numbers 152–243, a total of 92 days; autumn is September, October and November, with daily serial numbers 244–334, a total of 91 days; winter is January, February and December, with daily serial numbers 1–59 and 335–365, a total of 90 days. In the data of each city, ADSP, ADSU, ADAU and ADWI are calculated over the corresponding range of daily serial numbers according to the season length. (2) In the data of each city, ATP, the city’s 30-year average total precipitation, is obtained from the values for daily serial numbers 1–365. (3) Daily precipitation events with precipitation less than or equal to 2.0 mm are identified in the data of each city, and the ADIP of the city is determined from the annual distribution of the daily precipitation events that meet this condition.
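The derivation of the precipitation indicators from the 365 daily records can be sketched as follows. This is an illustrative reading of the processing described above, not the authors' code: the function name and the synthetic daily series are hypothetical, ADIP is taken as a simple count of days at or below 2.0 mm (so days without precipitation are counted too), and ATP is taken as the sum over daily serial numbers 1 to 365.

```python
# Daily serial number ranges (inclusive) for each seasonal indicator.
SEASON_DAYS = {
    "ADSP": [(60, 151)],            # spring: March to May, 92 days
    "ADSU": [(152, 243)],           # summer: June to August, 92 days
    "ADAU": [(244, 334)],           # autumn: September to November, 91 days
    "ADWI": [(1, 59), (335, 365)],  # winter: January, February, December, 90 days
}
INEFFECTIVE_MM = 2.0  # threshold for ineffective precipitation, in mm


def precipitation_indicators(daily_mm):
    """daily_mm[d - 1] is the 30-year mean precipitation on daily serial number d."""
    if len(daily_mm) != 365:
        raise ValueError("expected 365 daily records")
    result = {
        "ADP": sum(daily_mm) / 365,  # average daily precipitation
        "ATP": sum(daily_mm),        # total over serial numbers 1 to 365 (assumed)
        "ADIP": sum(1 for x in daily_mm if x <= INEFFECTIVE_MM),
    }
    for name, ranges in SEASON_DAYS.items():
        values = [daily_mm[d - 1] for lo, hi in ranges for d in range(lo, hi + 1)]
        result[name] = sum(values) / len(values)
    return result


# Synthetic series: one constant value per season (hypothetical numbers).
levels = {"ADSP": 2.0, "ADSU": 6.0, "ADAU": 3.0, "ADWI": 1.0}
daily = [0.0] * 365
for name, ranges in SEASON_DAYS.items():
    for lo, hi in ranges:
        for d in range(lo, hi + 1):
            daily[d - 1] = levels[name]

indicators = precipitation_indicators(daily)
print(indicators)
```

With the synthetic series, every spring and winter day is at or below the threshold, so ADIP equals 92 + 90 = 182 days.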
In this article, a total of 71 sample cities were selected from the 7 standard geographical regions of China for cluster analysis, and the average value of each indicator in each geographical region was calculated from the meteorological data of the cities belonging to that region. The number of sample cities in each geographical region and the average value of each index are shown in Table 2.
As can be seen from Table 2, among the seven geographical regions of China, ADT is lowest in northeast China and highest in south China. In terms of precipitation, ADP, ADSP, ADSU, ADAU, ADWI and ATP are all lowest in northwest China, where ADIP is longest. ADP, ADSP, ADSU, ADAU and ATP are all highest in south China, where ADIP is shortest. ADWI is highest in east China.
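The regional averages summarised in Table 2 amount to grouping the city records by geographical region and averaging each indicator. A minimal sketch of that aggregation follows; the function name, city names, region names and values are placeholders, not data from this study.

```python
from collections import defaultdict


def regional_means(city_rows):
    """Count the cities per region and average every numeric indicator."""
    grouped = defaultdict(list)
    for row in city_rows:
        grouped[row["region"]].append(row)
    summary = {}
    for region, rows in grouped.items():
        indicators = [key for key in rows[0] if key not in ("city", "region")]
        summary[region] = {"cities": len(rows)}
        for name in indicators:
            summary[region][name] = sum(r[name] for r in rows) / len(rows)
    return summary


# Placeholder records: names and values are purely illustrative.
city_rows = [
    {"city": "city_a", "region": "region_1", "ADT": 10.0, "ATP": 500.0},
    {"city": "city_b", "region": "region_1", "ADT": 14.0, "ATP": 700.0},
    {"city": "city_c", "region": "region_2", "ADT": 22.0, "ATP": 1600.0},
]
table = regional_means(city_rows)
print(table)
```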