4.1. GIS Statistical Analysis
In the experiment, each algorithm was run five times with 10-fold cross-validation on the same dataset, which is equivalent to 50 runs per algorithm. The mean accuracy over these runs is taken as the final result, and the AUC is obtained from the average for the first category for comparison. The experimental results are shown in Table 2. AUC (area under the curve) is a model evaluation index, ACC denotes accuracy, DBN denotes the deep belief network, and NBC denotes the naive Bayes classifier. The results show that, for most datasets, the joint operation improves both the accuracy and the AUC value, and thereby the overall performance of the classifier. The joint operation plays a role similar to that of the kernel function in an SVM: it combines two attributes into a new attribute that carries more information useful for classification.
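The evaluation protocol described above (five repetitions of 10-fold cross-validation, 50 evaluations in total) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the `evaluate` callback are hypothetical, and the folds here are shuffled but not stratified, which is an assumption.

```python
import random

def repeated_kfold(n_samples, k=10, repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` rounds of k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)                      # new random partition each round
        folds = [idx[i::k] for i in range(k)]
        for f in range(k):
            test = folds[f]
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield train, test

def mean_score(n_samples, evaluate, k=10, repeats=5, seed=0):
    """Average a user-supplied score (e.g. accuracy or AUC) over all splits.

    `evaluate(train_idx, test_idx)` is a hypothetical callback that trains a
    classifier on the training indices and returns its score on the test fold.
    """
    scores = [evaluate(tr, te)
              for tr, te in repeated_kfold(n_samples, k, repeats, seed)]
    return sum(scores) / len(scores), len(scores)
```

With `k=10` and `repeats=5` the generator produces 50 train/test splits, so the returned mean is the average over 50 runs.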
The results of running the NBC on the selected datasets are shown in Figure 4. For five of the datasets no deletable attributes were found, so only twenty-five datasets were used in this round of experiments. The experiment tested five indicators: accuracy, root mean square error, area under the receiver operating characteristic (ROC) curve, classifier modeling time, and classification time. The higher the accuracy, the better. The root mean square error represents the degree of dispersion of the classification results; the smaller the error, the more stable the results. The area under the ROC curve, already used in the previous experiments, is a more comprehensive evaluation criterion than accuracy alone, and the larger its value, the better. For modeling time and classification time, the smaller the value, the better.
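The three quality indicators listed above can be computed as in the sketch below. It assumes a binary problem with labels 0 and 1 and a predicted probability for class 1; computing the root mean square error against those probabilities is an assumption, since the text does not say how the error was obtained. The function names are illustrative.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, p_pos):
    """Root mean square error between 0/1 labels and predicted probabilities."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, p_pos)) / len(y_true))

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and scores, not data from the experiment.
y_true = [1, 1, 0, 0]
p_pos = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in p_pos]
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a small illustration; a rank-based formula would be preferable for large datasets.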
The descriptive statistics of PAHs in the subsidence area of Yongtai County are shown in Table 3. The PAH content in the soil of the subsidence area ranges from 6.81 ng/g dw to 408.79 ng/g dw. The coefficient of variation of the PAH content in the two subsidence areas exceeds 1.0, which indicates strong variation. Although conventional statistical analysis can summarize the overall characteristics of soil PAH content in the subsidence area, it cannot reflect local variation: it describes the sample as a whole to a certain extent, but cannot quantitatively describe the randomness, structure, independence, and correlation of soil PAH content.
From the perspective of spatial distribution, the agricultural production function is strong in the north and weak in the south. Specifically, grade I agricultural production functional areas are mainly distributed in the northeastern villages and towns, including Hongxing Township, Baiyun Township, and Danyun Township. The grade II agricultural production functional area contains four townships, mainly located in the northwestern and central towns. The grade III agricultural production functional area includes eight townships, mainly located in the central and eastern parts of Yongtai County. The grade IV agricultural production functional areas are mainly distributed in the central and southern towns of the county, including Wutong Town, Chek Tin Township, Camphor Town, and Chengfeng Town. Grade I agricultural production functional areas are mainly distributed in Hongxing Township, Baiyun Township, and Danyun Township. The grade II agricultural production functional area includes eight townships, mainly located in the north of Yongtai County. The grade III agricultural production functional area includes six townships, mainly located in the central and western regions of Yongtai County. The grade IV agricultural production functional areas are mainly distributed in the central and southern towns of the county, including Wutong Town, Chek Tin Township, Camphor Town, and Chengfeng Town. Therefore, the northern townships of Yongtai County have a stronger agricultural production function, while the southern townships are relatively weak in agricultural production development.
From the perspective of temporal variation, the overall spatial differentiation changed little, but the differentiation pattern changed. At the earlier evaluation time point, the average agricultural production function index in Yongtai County is 0.010714 and the median is 0.0099720; in 2015 the average is 0.010715 and the median is 0.010721, which indicates that the index is rising slightly over time. The coefficient of variation of the agricultural production function index is 0.273 at the earlier time point and 0.243 in 2015, which indicates that spatial differentiation is fairly evident, although it narrowed slightly in 2015.
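The coefficient of variation used throughout this section is the standard deviation divided by the mean. The sketch below shows the calculation and the "greater than 1.0 means strong variation" reading applied to the PAH data. Using the population standard deviation is an assumption, and the sample values are hypothetical rather than taken from Table 3.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean

def is_strong_variation(values, threshold=1.0):
    """True when the CV exceeds the threshold used in the text (1.0)."""
    return coefficient_of_variation(values) > threshold

# Hypothetical values, not data from the study.
moderate = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
skewed = [1.0] * 9 + [41.0]
```

A single large value, as in `skewed`, is enough to push the coefficient above 1.0, which is why a high coefficient is often read together with the minimum and maximum of the data.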
The cluster analysis classifies geomorphic windows according to their proximity in the spatial coordinate system. This paper uses the hierarchical clustering method to classify the geomorphic types of the study area. First, the geomorphology of the study area is divided into five categories. The statistical values of the geomorphic factors are shown in Table 4 and Figure 5. ESTDmean is the mean standard deviation of elevation, SMEANmean is the mean slope, SSTDmean is the mean standard deviation of slope, and ARmean is the mean area ratio. On the basis of this classification, the karst landforms can be further divided into subcategories to highlight their spatial differentiation under the influence of multiple factors. Landform types I, II, and III are the most widely distributed in the study area. The Type I landform is the product of the late development of early karst cycles and is well preserved owing to the influence of headward erosion by flowing water. It takes the form of remnant mound troughs, gentle mound troughs, karst basins, and similar landforms. Its standard deviation of elevation is 18.27 m, its average slope is 11.79°, its slope standard deviation is 9.47, and its area ratio is 1.05. These parameters indicate that this type of karst landform has wide, gently undulating karst planation surfaces, a shallow groundwater level, and a thick cover of Quaternary loess residual material. Landform development is dominated by lateral dissolution by flowing water, and the surface is continuously leveled. However, because the erosion base level has declined, sinkholes and funnels occur at the margins, where the landform grades into the Type II landform. A secondary cluster analysis of the geomorphic factors of this karst landform unit divides it further into Type I-1, remnant mound troughs and basins; Type I-2, gentle mound troughs and karst depressions; and Type I-3. Spatially, the differentiation of Type I landforms is affected both by the deeply incised lateral valley acting as the regional drainage base level and by the lateral valley acting as the local drainage base level.
Therefore, from the upstream watershed of the karst system toward the regional and local drainage base levels, the Type I landform grades from Type I-1 to Type I-3. The more closely karst development adapts to the lowering and strengthening of the local erosion base level, the greater the depth of karst development.
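The hierarchical clustering of geomorphic windows can be illustrated with a small agglomerative sketch. The paper does not state the linkage rule or distance measure, so average linkage on standardized Euclidean distance is an assumption here, and the six windows (elevation standard deviation, mean slope, slope standard deviation, area ratio) are hypothetical values chosen only to show the mechanics.

```python
import math

def standardize(rows):
    """Z-score each column so factors with different units are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering; returns sorted index lists."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                d = sum(euclidean(points[p], points[q])
                        for p in clusters[i] for q in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

# Hypothetical windows: (elevation std in m, mean slope, slope std, area ratio).
windows = [
    (18.0, 11.5, 9.4, 1.05), (18.5, 12.0, 9.6, 1.05), (17.9, 11.8, 9.3, 1.04),
    (60.0, 28.0, 15.0, 1.30), (62.0, 27.5, 15.5, 1.32), (59.0, 29.0, 14.8, 1.29),
]
groups = agglomerative(standardize(windows), n_clusters=2)
```

The study cuts the hierarchy into five categories; two clusters are used here only because the illustrative data contain two obvious groups.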
4.2. Spatial Differentiation Feature Analysis
Because factor ecological analysis is an inductive method, it requires multivariate statistical analysis, and the selected variables must be general and relatively comprehensive. In research on urban social space, most scholars choose urban demographic data as social factors. Research on rural social space emphasizes the network space of daily practice. Therefore, the selected influencing factors involve population, residence, employment, schooling, and so on, and, given the spatial attributes of rural social space, spatial distance and topography are also considered. Adding these factors better reflects the fact that rural social space is a spatial manifestation of rural residents' daily life practice. The correlation between the soil erodibility K value and organic matter content is shown in Figure 6. The formation and change of rural settlements are often affected by multiple factors, and the indicators differ in their impact on settlement distribution. It is therefore necessary to grade and weight the indicators and to explore the significance of the correlation between the impact factors and settlements, as well as among the impact factors themselves. From the perspective of temporal variation, the overall spatial differentiation changed little, but the differentiation pattern changed. The K value of the agricultural production function index of each township is 10.714, with an average of 20.715, indicating that the agricultural production function index of each township is rising over time. From the perspective of spatial distribution, the life security function is strong in the center and weak on both sides. Specifically, the coefficient of variation of the rural life security function in Yongtai County is greater than 0.5, which indicates that its spatial differentiation is obvious. The level I functional area of the life security function includes Zhangcheng Town and Chengfeng Town. The level II functional area includes Dayang Town, Tongan Town, and Wutong Town, all located in the central part of Yongtai County. The level III functional area includes five townships, mainly located on the periphery of the level I and II functional areas. The level IV functional area includes 11 townships, the largest share of any level. From the perspective of change over time, the ecological pattern of Yongtai County is relatively stable, and the change in the functional index is not obvious.
The coefficients of variation of the soil texture components of the main soil types are shown in Table 5. Regression analysis of the scatter-plot trends shows that the K value follows an exponential relationship with sand content, a power-function relationship with silt content, and a relationship with N1 content that also tends toward a power function.
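Exponential and power-function relationships of the kind reported above are commonly fitted by linear least squares after a logarithmic transformation. The sketch below shows that approach; it is one possible fitting method, not necessarily the one used in the study, and the coefficients in the usage example are hypothetical.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by regressing ln(y) on x; requires y > 0."""
    b, ln_a = linear_fit(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), b

def fit_power(xs, ys):
    """Fit y = a * x**b by regressing ln(y) on ln(x); requires x, y > 0."""
    b, ln_a = linear_fit([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(ln_a), b

# Synthetic data generated from known (hypothetical) coefficients.
content = [10.0, 20.0, 30.0, 40.0, 50.0]
k_exp = [0.05 * math.exp(-0.02 * x) for x in content]
k_pow = [0.01 * x ** 0.5 for x in content]
```

Fitting in log space minimizes relative rather than absolute error, so the estimates can differ from a direct nonlinear fit when the data are noisy.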
The test results of the GWO-SVM model are compared with those of the PSO-SVM, GA-SVM, and ABC-SVM models. The classification results are shown in Table 6 and Figure 7. SVM denotes the support vector machine, and GA-SVM, PSO-SVM, ABC-SVM, and GWO-SVM denote SVMs optimized by the genetic algorithm, particle swarm optimization, the artificial bee colony algorithm, and gray wolf optimization, respectively. The results show that GWO-SVM has good classification performance: its cross-validation rate reaches 91.66%, and its recognition rate on the test samples reaches 82.41%. Both values are the best among the compared models, which shows that GWO-SVM gives better evaluation accuracy on small-sample datasets. When the distance between samples is greater than the range (of the semivariogram), the samples become completely independent.
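For readers unfamiliar with gray wolf optimization, the following sketch shows the standard position-update rule on a toy two-parameter objective. It is not the authors' GWO-SVM: in a real setting the objective would be the cross-validation error of an SVM over its hyperparameters, whereas the quadratic `toy_cv_error` here is a hypothetical stand-in, and the population size and iteration count are arbitrary choices.

```python
import random

def grey_wolf_optimize(objective, bounds, n_wolves=30, n_iter=200, seed=0):
    """Minimize `objective` over box `bounds` with a basic gray wolf optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    leaders = []
    for it in range(n_iter):
        # Keep the three best positions found so far (alpha, beta, delta).
        leaders = sorted(leaders + wolves, key=objective)[:3]
        a = 2.0 * (1.0 - it / n_iter)          # decreases linearly from 2 to 0
        new_wolves = []
        for w in wolves:
            pos = []
            for d, (lo, hi) in enumerate(bounds):
                x = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    x += leader[d] - A * abs(C * leader[d] - w[d])
                pos.append(min(hi, max(lo, x / 3.0)))   # average, then clip
            new_wolves.append(pos)
        wolves = new_wolves
    best = min(leaders + wolves, key=objective)
    return best, objective(best)

def toy_cv_error(params):
    """Hypothetical stand-in for an SVM cross-validation error surface."""
    x, y = params
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

best_params, best_error = grey_wolf_optimize(toy_cv_error, [(-5.0, 5.0), (-5.0, 5.0)])
```

The control parameter `a` shrinks over the iterations, so the pack explores widely at first and then concentrates around the three leading solutions.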
The change in the landscape index of rural settlement patterns is shown in Figure 8. From the perspective of change over time, the rural life security function in Yongtai County shows a clear growth trend, and its spatial differentiation characteristics have changed noticeably. The mean and median of the life security function were lower in 2013 than in 2015, and the number of villages and towns in the grade I and II functional areas was significantly higher in 2015 than in 2013, indicating an overall improvement of the life security function in Yongtai County in 2015. In terms of spatial distribution pattern, although the number of areas in each functional grade changed between the two evaluation time points, the relative strength of the township functional indices did not change significantly. The analysis of per capita income helps in understanding the degree of wealth and poverty in a region. In terms of income sources, a higher degree of industrial agglomeration and a more developed industrial and agricultural system bring higher income, which reflects the level of urbanization of a region; the level of urbanization is in turn closely related to the distribution characteristics of rural settlements. Logically, per capita income has a positive impact on the density of rural settlements. Nearly 75% of the rural settlements are distributed in gentle areas with slopes of less than 2°, and less than 2% are distributed in areas with slopes greater than 10°. The impact of slope on the spatial distribution of rural settlements is more obvious than that of altitude. Low-slope areas are favorable for building residential areas and infrastructure and for agricultural production, so settlements form there more easily.
Non-normally distributed data cause the semivariogram to exhibit a proportional effect, which enlarges the impact of errors, raises the sill and nugget values, and changes the correlation of the spatial structure. Therefore, to eliminate the proportional effect, outliers were removed from the Hg data of the surface soil layer and the data were log-transformed, after which they follow a normal distribution; the Hg data of the deep soil layer were processed in the same way. Data that do not follow a normal distribution generally need to be transformed. If the transformed data still do not satisfy the normality condition, kriging interpolation should not be used. The impact of spatial relationship similarity and name similarity on data links is shown in Figure 9. The figure shows that, when measuring the result of a data link, spatial topological relationship similarity is used markedly more than the other similarities. Name similarity, however, varies more between links, which makes it more convenient for judging whether a link is successful. Therefore, spatial topological relationship similarity and name similarity are more important than category similarity in data linking and matching.
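The logarithmic transformation applied to the Hg data before kriging can be checked with a simple skewness comparison, as sketched below. Skewness is used here as a rough indicator only; the text does not say which normality test was applied, and the values are hypothetical rather than measured concentrations.

```python
import math

def skewness(values):
    """Population skewness: third central moment over variance to the 3/2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def log_transform(values):
    """Natural-log transform; all values must be strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]

# Hypothetical right-skewed concentrations (log-symmetric by construction).
raw = [math.exp(z) for z in (-2.0, -1.0, 0.0, 1.0, 2.0)]
transformed = log_transform(raw)
```

In this example the raw values are strongly right-skewed while the transformed values are symmetric, which is the behavior expected of approximately log-normal data. Predictions made on the log scale need to be back-transformed before they are interpreted as concentrations.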
The prediction accuracy evaluation of the models is shown in Table 7 and Figure 10. The results indicate that the prediction accuracy of all three prediction models exceeds 88%, and that the radial basis function neural network model performs better than the least squares support vector machine model and the random forest model. Comparing soil layers, the least squares support vector machine and radial basis function neural network models predict better in the 0~20 cm soil layer than in the 20~40 cm soil layer, which may be because meteorological factors act more strongly on surface soil and heavily influence all prediction factors. The random forest model, however, predicts slightly better in the 20~40 cm soil layer than in the 0~20 cm soil layer. The accuracy comparison shows that the radial basis function neural network model has the best prediction effect in this study area.
The statistical values of the ecological functions of each township in Yongtai County are shown in Table 8. From the perspective of spatial characteristics, the comprehensive function of rural areas is stronger in the central and northeastern areas than elsewhere. The coefficient of variation is less than 0.2, and it is lower in 2015 than in 2013, which indicates that the spatial differentiation of the comprehensive function of rural areas is not pronounced and is decreasing. Specifically, in 2013, the level I functional zone includes Tangqian Township and Chengfeng Town, the level II functional zone includes six towns, the level III functional zone includes five towns, and the level IV functional zone includes eight towns. In 2015, the level I functional zone includes seven townships, namely Tangtang Township, Chengfeng Town, Song Kou Town, Tongan Town, Wutong Town, Dayang Town, and Qing Liang Town; the level II functional zone includes four towns, the level III functional zone includes seven towns, and the level IV functional zone includes two towns. From the perspective of change over time, the comprehensive function of rural areas has strengthened as a whole, and the spatial differentiation pattern has changed. In 2015, the number of townships in the level I functional zone was significantly higher than in 2013, and the number in the level IV functional zone was significantly lower. In general, the multifunctionality of rural areas increased over time, but the growth differed between towns.