2.3.1. Theoretical Foundation
The term ‘fractal’ was first proposed by Mandelbrot, with the initial meaning of irregular and broken. It was defined as a rough geometric shape that can be subdivided into smaller parts, which are the copy of the whole [
30]. Fractals are born in nature and often used to describe intricate natural phenomena, such as the length of a coastline, the shape of a cloud and the outline of a leaf. Fractal geometry has been applied in various fields, and has been widely adopted in geography for approximately 30 years. A number of publications have discussed the multiscale spatial organization resulting from urban growth [
31,
32,
33]. Others explore the statistical scaling relationship between the size and number of urban clusters [
25,
34,
35]. Fractals such as Fournier dusts have strict hierarchy between distances and numbers of empty lanes: their numbers will increase as their widths become narrower. Such characteristics could be recognized by the dilation method proposed by Minkowski [
36], as shown in
Figure 3. In dilation, every element is surrounded by a buffer of increasing width. It is obvious that the number of clusters will ultimately decrease to one, which is to say that there exists only one enormous cluster (
Figure 3d). If a spatial organization resembles Fournier dusts, the number of clusters N and the size of buffer width ε will follow a power-law function. If we take the logarithm of both sides, a linear relationship with slope equaling to fractal dimension D would appear (Equation (1)).
where m is a constant of proportionality. If we draw a log–log plot about N and ε, and use a curve to fit it, we can obtain a dilation curve reflecting the spatial scaling behavior of patterns.
Built-up land in an urban area (except transportation land and squares) (
Figure 4b) is morphologically similar to Fournier dusts (
Figure 4a). Both are separated by hierarchical streets (lanes) following the rule that the narrower, the more. In contrast, built-up land in a rural area (
Figure 4d) is more random and has an irregular spatial organization (
Figure 4c). Tannier built three theoretical urban patterns: (1) regular fractal city, (2) random fractal city, and (3) regular fractal city in a nonfractal environment (an appropriate substitute of a real city involving urban and rural areas) [
25]. The dilation curves of these three patterns express different scaling characteristics. Only the third pattern shows a sharp change in its dilation curve. In the third theoretical urban pattern, the urban area is a fractal pattern which is well-bedded, hierarchical, high-density and agglomerative, whereas the rural area is amorphous, random, low-density and scattered. Different spatial organization leads to different scaling behavior, so that we may find a significant change in one city’s dilation curve, separating urban and rural areas.
According to the aforementioned findings, Tannier et al. developed a fractal-based approach to identify the urban boundary [
25]. Although this method is adapted to both vector and raster data in theory, it may have difficulties when applied to raster data. One problem is that raster resolution cannot satisfy the need of exponential growth of buffer radius similarly to vector data. Another limit of this method is the lack of approach distinguishing urban and rural clusters automatically from rank-size distribution, which reduces the comparability of urban boundaries between different spatial or temporal dimensions. Therefore, this paper proposed an improved method to delineate urban boundaries.
2.3.2. Urban Boundary Delineation
The basic logic of the fractal approach to delineating urban boundaries is that the number of built-up land clusters and their size at each dilation step should follow a linearly shaped power-law function [
25]. It is assumed that two spatial subsets with distinct fractal characteristics would be obtained when the deviation between the dilation curve and a straight line reaches the top point. The top point is regarded to be the optimum threshold for classifying the built-up land patches, because the fractality of built-up land would no longer exist beyond the threshold. After that, all the built-up land patches are buffered with the optimum threshold and the rank-size distribution of new clusters can be re-plotted. A challenge emerges in the last step to distinguish urban and rural clusters. Instead of artificial judgement, the focus of our approach is to automatically classify the urban and rural clusters.
Figure 5 shows the flowchart of the proposed approach and details for each step are described as follows:
First, every built-up patch will be covered by a steadily widening buffer in every iteration, and the buffer radius ε and the number of built-up clusters N will be stored.
Second, to ensure that ε can exponentially grow, we adopt a method of interpolation: cubic hermite interpolation (CHI) [
37,
38]. Cubic hermite interpolation can ensure that both function value and 1st–nth derivatives remain the same between the interpolation polynomial and original function. Especially, we can assign derivatives of the interpolation polynomial in the node. In view of the distribution of log(ε) and log(N) monotonically decreasing, the closer two nodes are, and the more similar their 1st derivatives are. Therefore, we calculated every 1st derivative in the node with the following equation:
where
n is the number of nodes, k
1 and k
2 are the slope from node i − 1 to node i and that from node i to node i + 1, respectively, whereas d
1 and d
2 are the distance from node i − 1 to node i and that from node i to node i + 1, respectively. The 1st derivatives of the first and last node are equal to slopes of their nearest node to themselves, respectively. In this paper, we set 0.1 as the sample interval, which meant that the position (log(ε
0) + 0.1 ∗
n) could be interpolated. Meanwhile, the first and last points will be reserved.
Third, log(N) and log(ε) were fitted with a polynomial from the 1st to 13th degree to obtain 13 estimated curves. Then, the best dilation curve was selected according to the Bayesian information criterion (BIC). BIC can evaluate the trade-off between complexity and goodness of fit for statistical models [
39], as shown in Equation (3), where
n is the number of data,
p is the number of model parameters, and σ
2 is the mean-square error (MSE).
This equation shows that only when the polynomial fits well and its complexity remains low does the BIC value reach the minimum. After we selected the best estimated dilation curve, we needed another indicator to catch the discontinuity in scaling behavior: curvature k (Equation (4)).
where y′ is used to measure the velocity of the decreasing number of clusters, y″ is its accelerated velocity, and k is the ratio of y″ and y′. BIC: Bayesian information criterion; Existing method: the fractal approach [
25].
In this study, the point of main curvature of a dilation curve with minimal BIC corresponds with the distance threshold. To avoid the point of main curvature arising from estimation artifacts, two extremities of the estimated curve should be removed before calculating the curvature [
25]. If there is not a point of main curvature within a valid range, other dilation curves with secondary BIC values would not be selected until a meaningful curvature appeared. The data range can be determined by Equation (5), the same as the Morpholim software [
40]:
where X means log(ε) and R means the valid range of X.
Next, built-up land was covered with a buffer radius equaling the distance threshold. Then, the rank-size distribution of new clusters was plotted. Instead of artificial judgement, we adopted a bottom-up clustering approach named hierarchical agglomerative clustering (HAC) [
41] to distinguish urban and rural clusters. In consideration of the characteristics of the rank-size distribution of urban and rural clusters, we chose the single-linkage as the linkage criteria of HAC. This means the minimum distance between any two subsets which determines whether these two subsets are able to merge together or not. Urban clusters seem to be top-ranking, few and large, whereas rural clusters are always lower-ranking, abundant and much smaller; therefore, Euclidean distance is sufficient as a metric. All clusters will be divided into two clusters: the one including the first rank cluster must be the urban set, and the other one is the rural set. The clustering result will be exhibited in hierarchical clustering dendrograms. Finally, we extracted the urban boundary from urban clusters. All of the above steps were performed using Visual C++ programming.
2.3.3. Urban Boundary Characteristics
After we extracted the boundary of the clusters, the city was divided into two spatial forms: urban areas within the boundary and rural areas outside of the boundary. To explore the shape of the boundary and the difference between urban and rural area as the assessment of delineated boundary, we then calculated multidimensional indices [
3]. The first one characterized fractal patterns or sets by quantifying their complexity as a ratio of the change in detail to the change in scale [
42]. The second one measured the average morphological compact degree of spatial organization. The third one was an index reflecting aggregation effect, and the final one showed the proportion of one landscape in a scope.
The fractal dimension is a ratio reflecting the space-filling capacity of a pattern in changing scales, which does not have to be an integer. If we used rulers of different lengths to measure the length of a coastline, we would not obtain a unique value, because coastlines show more details at higher resolution. This is why we need an index named fractal dimension instead of a perimeter to describe a fractal-like coastline. It can be calculated by different methods, such as the structured walk method [
43], equipaced polygon method [
44], hybrid walk method [
45] and cell count method [
46]. However, there is a small diversity in the calculated dimensions [
47]. In this paper, we used the box-counting method [
48,
49] realized in a fractal analysis software package named Fractalyse [
50] to calculate the fractal dimension of the boundary (D
B) and surface (D
S). This method uses the exponential growth box size to cover the whole picture and then counts the boxes containing objects. Then, the result is displayed on a log-log plot. If a pattern is fractal, points can be fitted by a straight line whose slope is its fractal dimension.
To be specific, the raster images in the five years were all extracted by the minimum enclosing rectangle of the urban boundary in 2016. These extracted images were converted into BMP format and added into the Fractalyse software, which was downloaded from the official address (
https://sourcesup.renater.fr/www/fractalyse/, accessed on 3 March 2020). The upper left corner was used as the start point for box-counting. We chose six theoretical patterns (
Figure 6) with known fractal dimensions to test the approximate accuracy of box-counting in Fractalyse: boundary and surface of circle, square, Sierpinski carpet and Fournier dusts. The box size was set following an exponential function whose exponent equaled 2 and the grid algorithm. The result presented a high correlation coefficient of size and number (near 1.0) and tiny deviation (smaller than 0.06), which demonstrates the high precision of the box-counting method in Fractalyse in calculating the fractal dimension (
Table 2). Theoretically, the fractal dimension of a point is 0; a line is 1; and a surface is 2, which equals their own topological dimensions. If a square is narrow enough, its fractal dimension may be close to 1, because the space-filling capacity of this pattern finally degenerates to that of a line in large scales. If a line is complex and winding enough or widens, its space-filling capacity will be stronger in small scales than smoother lines, such that its fractal dimension may be close to 2. When we calculate the fractal dimension of sets of lines or squares, it reflects the space-filling capacity of all the elements in changing scales, not a part of them. In theory, the space-filling capacity of built-up land in urban areas is stronger than that in rural areas. With respect to the urban boundary, the minimum size was 2 pixels and maximum size was 8192 pixels. The minimum size was 1 pixel and the maximum size was 8192 pixels for both urban clusters and rural clusters.
- 2.
Area-weighted compactness index
The compactness index (CI) is used to evaluate whether the shape of one patch is compact or not, which in some places is named after the shape complexity. It is calculated using the following formula:
where A is the surface area and P is the perimeter.
A circle has the most compact geometry, whose CI reaches a maximum equaling 1. If a patch is very compact, CI is close to 1. In contrast, if a patch has a long perimeter and small surface area, CI is close to 0. We expected to compare overall patches in urban areas to rural areas in terms of the compactness index; therefore, we designed the area-weighted compactness index (AWCI):
where A means the total area.
We calculated the AWCI of urban and rural built-up land to explore the difference in shapes between two such patterns.
The aggregation index (AI) is a global measurement regarding the aggregation of landscapes, which is mathematically described as the ratio of actual shared edges versus maximal possible shared edges [
51]. It was calculated as follows:
where g
ii is the number of similar adjacencies tallied using the single-count method, whereas the max g
ii is the maximum possible number of similar adjacencies corresponding to the whole landscape.
The single-count method means that if two neighboring pixels belong to the same landscape, it tallies only once. Hence, the maximal aggregation is achieved when the patch type consists of a single and compact patch, which is not necessarily a square patch. We calculated this index with FRAGSTATS v4 [
52], which is a computer software program designed to compute a wide variety of landscape metrics for categorical map patterns.
Density in this paper is the ratio of built-up land surface area and the urban or rural area, divided by the urban morphological boundary we have defined. Its formula is:
where D
u/r means the built-up land density in the urban or rural area, S
u/r means the built-up land area in urban or rural area, and A
u/r means the whole area in the urban or rural region.
By comparing Du to Dr, it is easy to observe the difference in built-up land distribution in spatial scales between urban and rural areas, contributing to further explorations of the functions of such morphological boundaries.