Article

Spatial and Temporal Variations in Soil Organic Carbon in Northwestern China via Comparisons of Different Methods

1
College of Geography and Environmental Science, Northwest Normal University, Lanzhou 730070, China
2
Linze Inland River Basin Research Station, Key Laboratory of Ecohydrology of Inland River Basin, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou 730000, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(3), 420; https://doi.org/10.3390/rs17030420
Submission received: 9 December 2024 / Revised: 19 January 2025 / Accepted: 23 January 2025 / Published: 26 January 2025

Abstract

Soil organic carbon (SOC) is a crucial component for investigating carbon cycling and global climate change. Accurate data on the temporal and spatial distributions of SOC are essential for determining the soil carbon sequestration potential and formulating climate strategies. An important approach to mapping SOC is to establish a link between environmental factors and SOC via different methods. The Shiyang River Basin is the third largest inland river basin in the Hexi Corridor; its closed geographical conditions and relatively independent carbon cycle system make it an ideal area for carbon cycle research in arid areas. In this study, 65 SOC samples were collected and 21 environmental factors were assessed from 2011 to 2021 in the Shiyang River Basin. The linear regression (LR) method and two machine learning methods, i.e., support vector machine regression (SVR) and random forest (RF), were applied to estimate the spatial distribution of SOC. RF was slightly better than SVR, owing to the advantages of its classification-tree-based structure. The best SOC estimation performance was obtained when latitude, slope, and the normalized difference vegetation index (NDVI) were used as predictor variables. Compared with the Harmonized World Soil Database (HWSD), the optimal scheme significantly improved the accuracy of the SOC estimates. Overall, SOC tended to increase, with a total increase of 135.94 g/kg across the whole basin. SOC in the northwestern part of the middle basin decreased by 2.82% because of industrial activities, whereas SOC in Minqin County increased by approximately 62.77% from 2011 to 2021. Thus, the spatial variability of SOC increased. This study provides a theoretical basis for the spatial and temporal distributions of SOC in inland river basins.
In addition, this study can also provide effective and scientific suggestions for carbon projects, offer a key scientific basis for understanding the carbon cycle, and support global climate change adaptation and mitigation strategies.

1. Introduction

The Sixth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC, 2023) indicates that the global climate is experiencing unprecedented changes. The global carbon cycle, which is the linkage among the various spheres of the Earth, is crucial for studying climate change [1,2]. As the largest carbon reservoir in terrestrial ecosystems, soil organic carbon (SOC) is a crucial component for investigating carbon cycling and global climate change [2,3]. SOC is a fundamental indicator of soil quality and a key constituent of the soil carbon pool [4]. The spatial–temporal distribution of SOC plays a crucial role in the management of carbon and climate change, which can not only contribute to determining the SOC stock but also benefit inputs to earth system models and sustainable soil management [1,5,6,7].
The most accurate way to obtain SOC data is to analyze soil samples in the laboratory [8]. However, this method is costly in time and labor and is easily limited by the natural conditions of the study area [9,10,11]. Thus, it is typically suitable only on a small scale. Obtaining a high-resolution map of SOC on a large scale, however, is crucial for understanding the processes controlling the global carbon cycle [12,13]. The Harmonized World Soil Database (HWSD), a collaborative effort between the Food and Agriculture Organization (FAO) and ISRIC-World Soil Information, was an important foundation for previous research on this topic [14,15,16,17]. However, an increasing number of studies have noted that its accuracy is no longer sufficient to meet the demands of recent research [18]. Thus, higher-precision SOC maps are needed on a large scale.
The development of ‘3S’ technology has greatly advanced large-scale research [19,20]. The main method of mapping the distribution of SOC on a large scale is to establish models between SOC and environmental factors on the basis of ground observations. However, two key issues affect the accuracy of these SOC maps. The first is the selection of environmental factors for estimating SOC [21]. SOC usually shows strong spatial heterogeneity due to the impact of both natural and anthropogenic factors [22]. A reasonable scheme of environmental factors, drawn from factors such as climate, topography, soil properties, biota, parent material, and human activities (land use change, land management, etc.), can effectively improve the prediction accuracy of SOC [4,23,24]. The second is the selection of an appropriate estimation method. Two types of SOC estimation methods exist. The first type is the traditional statistical method (linear regression). This type of method is widely used because of its simplicity, computational efficiency, and intuitive interpretation. However, it is difficult to achieve further breakthroughs in accuracy with it [21,25,26,27,28]. The second type comprises nonlinear methods, such as the machine learning (ML) methods random forest (RF) and support vector machines (SVMs) [29,30]. These methods can handle multiple factors and nonlinear relationships and have achieved satisfactory results in various research fields, including SOC prediction [27,31]. Different ML methods perform differently in various scenarios due to their structural and theoretical differences [27,32]. Thus, a comparison of different methods for mapping SOC is essential [33,34,35]. RF and SVR are the ML methods most widely used to estimate various environmental variables, such as SOC [35,36,37,38].
Both of them excel in capturing nonlinear relationships and interactions, making them highly effective in scenarios with high-dimensional and heterogeneous data [39,40]. SVR involves the nonlinear mapping of primary data into a higher-dimensional feature space, and RF performs well in the presence of outliers and noisy data [41,42]. As a classical statistical method, multiple linear regression (MLR) serves as a baseline for comparison.
Conducting SOC research at the watershed scale is the foundation for accurately understanding regional carbon cycle processes and mechanisms and for achieving a comprehensive and reliable assessment of global carbon cycle processes and carbon budgets [1]. Globally, arid and semi-arid regions account for approximately one-third of the Earth’s land area and play a critical role in global carbon storage [36]. As climate change accelerates, alterations in SOC dynamics in these regions may become particularly consequential, with potential implications for both regional and global carbon cycles [37,38,39]. Inland river basins are located mostly in arid and semi-arid regions [40]. These basins have closed geographical conditions, independent carbon cycle systems, and extreme sensitivity to climate change and human activities. Thus, these basins are ideal for studying the carbon cycle in arid areas [1,41,42]. The Shiyang River Basin is the third largest basin and has the highest degree of development and utilization of water and soil resources in the Hexi Corridor, and it is located in Northwest China. The Shiyang River Basin features complex topography and is divided into distinct vertical zones, resulting in significant spatial heterogeneity of SOC. This variability allows SOC research to not only map the geographical distribution of soil carbon storage but to also elucidate carbon dynamics across different land types and ecological conditions. In recent years, the basin has confronted environmental challenges such as land degradation, desertification, and water scarcity. As ecological restoration measures are implemented, studying changes in SOC provides a scientific basis for evaluating restoration effectiveness and supports the achievement of the basin’s sustainable development goals [1,43]. Therefore, in this study, the Shiyang River Basin is selected as the study area, and 65 soil samples were collected in the Shiyang River Basin between 2011 and 2021. 
Three methods for estimating SOC are compared: the representative linear regression (MLR) method and two nonlinear methods (SVR and RF). The comparison aims to identify the optimal estimation method and the optimal combination of environmental factors. Then, the spatiotemporal variation in SOC in the Shiyang River Basin from 2011 to 2021 is analyzed. This study helps clarify the characteristics of changes in SOC over the past decade and the factors driving them, and it can provide a theoretical basis for estimating SOC in inland river basins and for planning reasonable carbon projects in the Shiyang River Basin. Studying the spatiotemporal distribution of SOC in arid areas can provide a key scientific basis for understanding the carbon cycle and can also support global climate change adaptation and mitigation strategies.

2. Materials and Methods

2.1. Study Area

The Shiyang River Basin is one of the three major inland river basins in the Hexi Corridor. It is located in Gansu Province, Northwest China, in the eastern part of the Hexi Corridor. The basin lies west of the Wushaoling Mountains and at the northern foot of the Qilian Mountains [44]. The basin is geographically situated within 101°22′–104°16′E and 36°29′–39°27′N (Figure 1), with a total area of approximately 4.16 × 10⁴ km². The terrain slopes from the southwest to the northeast, with the southern part being higher than the northern part, and comprises the Qilian Mountains, the corridor plains, the low hills, and the desert areas from south to north [44,45]. The average annual precipitation in the basin is 197.957 mm, and the average annual temperature is 7.745 °C. The annual evaporation ranges from 700 mm to 2600 mm [44,46,47]. Situated on the boundary of the monsoon region, the Shiyang River Basin is influenced by both the Asian winter and summer monsoons, as well as the westerly circulation [48]. The interplay between water and heat conditions at varying altitudes gives rise to pronounced vertical climatic gradients, leading to the formation of distinct vertical zones with diverse landscapes [49].
Given that SOC dynamics in the upper reaches change more slowly because of limited human activities, whereas the middle and lower reaches are more strongly influenced by anthropogenic factors, the sampling points in this study are concentrated primarily in the middle and lower reaches [49,50,51]. Two cities are located in the basin: Wuwei and Jinchang [52]. Wuwei had the largest permanent population in the Hexi Corridor in 2023, with more than 1.4 million people. It currently governs Liangzhou District, Minqin County, Gulang County, and Tianzhu Tibetan Autonomous County. Jinchang, known as the Nickel City, is an important industrial city [43]. It has jurisdiction over one district and one county (Jinchuan District and Yongchang County). Jinchang is a typical resource-based city, with the motto of “mining before city”. The ecological environment of the Shiyang River Basin is fragile, and the relationship between humans and the environment is weak. Thus, ecological protection and sustainable utilization are particularly important and meaningful.

2.2. Soil Sampling and Analysis of Soil Organic Matter

The soil samples covered the main regions of the entire Shiyang River Basin (Figure 1). The samples were selected from the main types of land use/land cover in each region so that they could capture regional variations in SOC [51]. In addition, the number of sampling points was kept as consistent as possible across different years [7]. In total, 65 soil samples were collected in this study (Table 1).
The soil organic matter content was measured via the potassium dichromate oxidation method. Visible stones and leaves were first removed in the laboratory. After drying, each soil sample was weighed twice to obtain a mass of 0.3 ± 0.0005 g. Then, 5 mL of potassium dichromate solution and 5 mL of concentrated sulfuric acid were added to dissolve the samples. The mixture was then boiled for ten minutes at 203 °C. After boiling, the solution was titrated with FeSO4 to reduce the remaining Cr2O7. The volume of FeSO4 consumed was recorded when the orange–yellow solution turned brownish-red. Finally, the soil organic matter content was calculated via the following formula:
$$\mathrm{SOC}\ (\mathrm{g/kg}) = \frac{(V_0 - V) \times C \times 0.003 \times 1000}{m}$$
C represents the concentration of the standard solution (mol/L), V0 is the volume of recorded FeSO4 solution consumed by the blank solution (mL), V is the volume of recorded FeSO4 (mL), m is the sample mass (g), and a factor of 1000 is used to convert the result to the organic matter content per 1 kg of soil.
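The titration formula can be checked numerically with a short function. The following is a minimal sketch under stated assumptions: the function name `soc_g_per_kg` and the titration readings are hypothetical and chosen only to exercise the arithmetic; they are not measurements from this study.

```python
def soc_g_per_kg(v_blank_ml, v_sample_ml, conc_mol_per_l, mass_g):
    """Content in g/kg from a dichromate titration.

    Implements (V0 - V) * C * 0.003 * 1000 / m, where V0 and V are the FeSO4
    volumes (mL) for the blank and the sample, C is the concentration of the
    standard solution (mol/L), and m is the sample mass (g).
    """
    if mass_g <= 0:
        raise ValueError("sample mass must be positive")
    return (v_blank_ml - v_sample_ml) * conc_mol_per_l * 0.003 * 1000 / mass_g


# Hypothetical readings (assumption, not data from the study).
example = soc_g_per_kg(v_blank_ml=20.0, v_sample_ml=19.5,
                       conc_mol_per_l=0.2, mass_g=0.3)
print(round(example, 3))
```

A larger difference between the blank and sample volumes means that more dichromate was consumed by the soil, and therefore a higher computed content.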

2.3. Data

2.3.1. Digital Elevation Model (DEM)

The digital elevation model (DEM) data were obtained from the geospatial data cloud (https://www.gscloud.cn/, accessed on 11 April 2024), whose spatial resolution is 30 m. The data were preprocessed, which included radiometric calibration, atmospheric correction, orthorectification, and clipping.

2.3.2. Meteorological Data

The meteorological data were sourced from the National Centers for Environmental Information (NCEI) of the National Oceanic and Atmospheric Administration (NOAA) (https://www.ncei.noaa.gov/access/search/data-search/global-summary-of-the-day, accessed on 20 April 2024).
Two meteorological stations were selected (Minqin and Wushao Mountains) within the Shiyang River Basin from 2011 to 2021. The data include the monthly average temperature, precipitation, vapor pressure, evaporation, relative humidity, and sunshine duration.

2.3.3. HWSD

The Harmonized World Soil Database (HWSD) was also used in this study (https://webarchive.iiasa.ac.at/Research/LUC/External-World-soil-database/HTML/HWSD_Data.html?sb=4, accessed on 9 January 2024). The HWSD has a spatial resolution of 1 km and provides various soil information, such as SOC. The data were imported into ArcGIS 10.2, and the SOC data were extracted at the measurement points on the basis of their corresponding coordinates.

2.3.4. Net Primary Productivity (NPP)

In this study, the Moderate Resolution Imaging Spectroradiometer (MODIS) net primary productivity (NPP) product (MOD17A3HGF) from the United States Geological Survey (USGS) website (https://www.usgs.gov/, accessed on 20 April 2024) was used. Its spatial resolution was 1 km from 1982 to 2021. After reprojection and format conversion via the MODIS Reprojection Tool (MRT), additional preprocessing steps, including clipping, calculation, and spatial analysis, were conducted using ArcGIS 10.2 software.

2.3.5. Normalized Difference Vegetation Index (NDVI)

In this study, the MODIS NDVI product (MOD13A3) from the Earth data search website (https://search.earthdata.nasa.gov/search, accessed on 20 April 2024), with a spatial resolution of 1 km, was used. Its temporal resolution was monthly, and the temporal range was from February 2000 to December 2021.

2.4. Methods

2.4.1. Support Vector Regression (SVR)

SVR applies the SVM framework to complex regression problems [32]. This approach minimizes empirical risk, addresses high-dimensional, small-sample, nonlinear problems, and can help avoid overfitting. SVR achieves high estimation accuracy when the predictor variables are free of multicollinearity; its modeling principle is to find a regression hyperplane in a high-dimensional space. For nonlinear models, SVR applies linear algorithms in the high-dimensional feature space to analyze the nonlinear characteristics of the samples [53].
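Two standard ingredients of SVR can be written in a few lines. The sketch below is illustrative only and rests on assumptions: the text does not state which kernel or hyperparameters were used, so the radial basis function kernel and the values of `epsilon` and `gamma` are placeholders. The kernel plays the role of the mapping into a high-dimensional feature space, and the epsilon-insensitive loss is the error measure commonly associated with SVR.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Similarity of two feature vectors in the implicit feature space."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean loss in which errors smaller than epsilon cost nothing."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


# Hypothetical values (assumption): the first error lies inside the tube.
print(epsilon_insensitive_loss([1.0, 2.0], [1.05, 2.5], epsilon=0.1))
```

Because small residuals are ignored, only the samples lying on or outside the epsilon tube influence the fitted function.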

2.4.2. Multiple Linear Regression (MLR)

Multiple linear regression is an effective method of solving multivariable prediction problems in classical statistical analysis. It describes how the dependent variable Y changes with changes in multiple independent variables X [43]. The general form of the multiple linear regression model is as follows:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_M X_M + e$$
where β0 is the constant term, also known as the intercept, and β1, β2, …, βM are the partial regression coefficients, or simply regression coefficients. This equation indicates that the dependent variable Y can be approximately represented as a linear function of the independent variables X1, X2, …, XM. The term e represents the random error (i.e., the residual) after removing the influence of the M independent variables on Y. The partial regression coefficient βj (j = 1, 2, …, M) represents the average change in Y when the independent variable Xj changes by one unit while the other independent variables are held constant.
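The regression coefficients are usually estimated by ordinary least squares. The following is a minimal sketch, assuming least-squares estimation via the normal equations (the text does not specify the fitting procedure or software); the function name `fit_mlr` and the toy data are hypothetical.

```python
def fit_mlr(X, y):
    """Least-squares coefficients [b0, b1, ..., bM] for Y = b0 + sum(bj * Xj).

    X is a list of rows (one per sample, M predictor values each) and y is the
    list of responses. Solves the normal equations (A^T A) b = A^T y by
    Gauss-Jordan elimination with partial pivoting.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]  # intercept column
    k = len(A[0])
    # Augmented system [A^T A | A^T y].
    aug = [
        [sum(row[i] * row[j] for row in A) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(A, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for other in range(k):
            if other != col:
                factor = aug[other][col] / aug[col][col]
                aug[other] = [a - factor * b
                              for a, b in zip(aug[other], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Toy data generated from Y = 1 + 2*X1 - 3*X2 (assumption, not study data).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
print([round(b, 6) for b in fit_mlr(X, y)])
```

The collinearity check reflects a practical point: when predictors are strongly correlated, the partial regression coefficients are no longer uniquely determined.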

2.4.3. Random Forest (RF)

RF is a flexible ensemble learning algorithm based on decision trees. The core idea of RF is to use random samples to generate multiple independent decision trees and then combine the estimates of all the decision trees in the forest into a single estimated value [54,55]. The construction of an RF follows four main principles: (1) the bootstrap algorithm is used for sampling; (2) during the generation of the forest, a constant number of features m (m < M) is specified for each decision tree, and the optimal features are selected from these m features during model training; (3) the construction of the RF does not require pruning; and (4) each time a new sample is input, the k decision trees make independent judgements, generate k results, and the overall result of the forest is obtained by averaging them. These principles ensure that each decision tree is random and that the computations are independent in the RF, which helps reduce the impact of overfitting and feature interference [56].
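The four principles can be illustrated with a deliberately simplified forest. This is a sketch under stated assumptions, not the implementation used in the study: each tree is reduced to a single split (a stump) to keep the example short, so the no-pruning principle holds trivially, and all function names and data are hypothetical.

```python
import random


def fit_stump(rows, targets, feature_ids):
    """Best single-split regression tree over the candidate features."""
    mean_all = sum(targets) / len(targets)
    # (sse, feature, threshold, left_mean, right_mean)
    best = (float("inf"), None, None, mean_all, mean_all)
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[f] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[f] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((t - lm) ** 2 for t in left)
                   + sum((t - rm) ** 2 for t in right))
            if sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    if feature is None:  # no valid split found; fall back to the mean
        return left_mean
    return left_mean if row[feature] <= threshold else right_mean


def fit_forest(rows, targets, n_trees=25, m_features=1, seed=0):
    rng = random.Random(seed)
    n, total_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # (1) bootstrap sample
        feats = rng.sample(range(total_features), m_features)  # (2) m < M
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feats))
    return forest  # (3) trees are used as grown, without pruning


def predict_forest(forest, row):
    # (4) each tree predicts independently; the forest averages the results
    return sum(predict_stump(s, row) for s in forest) / len(forest)


# Hypothetical predictors and SOC-like targets (assumption, not study data).
rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 9.0], [6.0, 2.0]]
targets = [0.30, 0.35, 0.40, 0.60, 0.70, 0.80]
forest = fit_forest(rows, targets)
print(round(predict_forest(forest, [5.5, 4.0]), 3))
```

Because every tree sees a different bootstrap sample and feature subset, the individual errors partly cancel when averaged, which is the mechanism behind the reduced overfitting described above.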

2.4.4. Estimating Methods

To evaluate the accuracy of the SVR, RF, and MLR methods, the correlation coefficient (r), root mean square error (RMSE), and coefficient of determination (R2) were used to assess the simulation performance of these methods [57]. The equations for the calculation are as follows:
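$$r = \frac{\mathrm{Cov}(a_i, b_i)}{\sqrt{\mathrm{Var}(a_i)\,\mathrm{Var}(b_i)}}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(a_i - b_i)^2}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(a_i - b_i)^2}{\sum_{i=1}^{n}(a_i - \bar{a})^2}$$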
where n represents the number of samples, ai represents the measured SOC content, and bi represents the calculated SOC content. The RMSE reflects the difference between the measured and estimated values: the smaller the RMSE is, the smaller the overall difference between them. In contrast, r represents the agreement in trend between the estimated and measured values: the larger the value of r is, the closer the pattern of fluctuation in the estimated values is to that in the measured values. R2 quantifies how well the independent variables explain the variation in the dependent variable in a regression model.
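The three metrics translate directly into code. The following is a minimal sketch in which `a` holds measured values and `b` holds estimated values; the sample numbers are hypothetical and serve only to exercise the functions.

```python
import math


def pearson_r(a, b):
    """Correlation coefficient r = Cov(a, b) / sqrt(Var(a) * Var(b))."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    # The 1/n factors of the covariance and variances cancel in the ratio.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rmse(a, b):
    """Root mean square error between measured and estimated values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def r_squared(a, b):
    """Coefficient of determination relative to the mean of the measurements."""
    mean_a = sum(a) / len(a)
    ss_res = sum((x - y) ** 2 for x, y in zip(a, b))
    ss_tot = sum((x - mean_a) ** 2 for x in a)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and estimated SOC values (assumption, not study data).
measured = [0.40, 0.50, 0.60, 0.70]
estimated = [0.45, 0.50, 0.55, 0.70]
print(round(pearson_r(measured, estimated), 3),
      round(rmse(measured, estimated), 4),
      round(r_squared(measured, estimated), 3))
```

Note that R2 computed in this way is not the square of r in general, so the two metrics can rank models differently.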

3. Results

3.1. Influence of the Number of Samples

In this study, 65 soil samples were used to estimate the distribution of SOC. First, the influence of the number of samples on the performance of the estimation methods was analyzed via cross-validation. Initially, 30 sample points were selected randomly to train the model, and the remaining samples were used for validation. This process was repeated 20 times, and the performance of the three methods (SVR, RF, and MLR) was compared for 30 to 60 training points, as shown in Figure 2. The results showed that the accuracy of the three methods changed as the number of samples varied. The optimal result occurred when the number of training samples was 47, which was approximately 72% of the total number of samples. Both classical statistical analysis and ML algorithms have shown that the number of sample points affects a model’s predictive accuracy. It is generally recommended that training samples account for 60% to 70% of all samples [58,59]. It is also generally believed that the more sample points a model is trained on, the better its performance [60]. However, more sample points are not necessarily better. The main reason is that the distribution characteristics of the data and the scale of the research jointly affect the model’s accuracy [61]. Therefore, a comparative analysis is needed to obtain the optimal number of sample points [62].
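The repeated random splitting described above can be sketched as follows. This is one plausible reading of the procedure, with assumptions: the seed handling and the helper names are hypothetical, and `score_fn` stands for any user-supplied function that fits a model on the training indices and returns a validation score such as the RMSE.

```python
import random


def repeated_holdout_splits(n_samples, n_train, repeats=20, seed=0):
    """Yield (train_idx, valid_idx) pairs for repeated random hold-out."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


def mean_score_by_train_size(score_fn, n_samples=65, sizes=range(30, 61),
                             repeats=20):
    """Average a score over the repeats for each training-set size."""
    results = {}
    for n_train in sizes:
        scores = [score_fn(train, valid)
                  for train, valid in repeated_holdout_splits(
                      n_samples, n_train, repeats)]
        results[n_train] = sum(scores) / len(scores)
    return results


# With 47 of 65 samples used for training, 18 remain for validation.
train, valid = next(repeated_holdout_splits(65, 47))
print(len(train), len(valid), round(len(train) / 65, 2))
```

Averaging over the 20 repeats reduces the dependence of the result on any single random split, which matters with only 65 samples.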

3.2. Performance with Different Schemes of Factors

Before the distribution of SOC in a basin can be mapped, the best environmental factors for estimating SOC need to be identified. On the basis of previous studies, in this study, 21 environmental factors were collected (Table 2). Then, the estimation accuracies of different models under different schemes of factors were compared to identify the optimal scheme of factors via cross-validation. The aforementioned 47 sample points were selected as the training samples to establish the model, and the process was repeated 20 times.

3.2.1. Comparisons of the Effects of Single Environmental Factors on the Results of the Estimation

First, we compared the estimation results of the three methods when a single environmental factor was used (Figure 3). In terms of the RMSE alone, SVR performed best (RMSE = 0.008) when elevation was the driving environmental factor. LR performed best when latitude was the environmental factor (RMSE = 0.02). RF performed best when the average visibility was the environmental factor (RMSE = 0.015).
Considering the RMSE, r, and R2 values together, SVR was the best when elevation was the environmental factor (RMSE = 0.008, r = 0.47, R2 = 0.24). For LR, the best results were obtained when latitude was used as the environmental factor (RMSE = 0.02, r = 0.36, R2 = 0.139). For RF, the average dew point was the best environmental factor (RMSE = 0.07, r = 0.67, R2 = 0.18). Although SVR had a lower RMSE and a slightly higher R2 than RF did, the r value of RF was much greater than that of SVR. Therefore, the overall performance of RF was judged superior to that of SVR.

3.2.2. Comparisons of the Effect of the Performance of Multiple Environmental Factors on the Estimated Results

On the basis of the single-factor comparison and screening, the five most effective factors were selected: average dewpoint, NDVI, temperature, elevation, and latitude. Each of the other environmental factors was then added to each of these five factors separately, giving a total of 100 two-factor estimation schemes. Among them, 17 schemes had RMSE values less than 0.04 and absolute values of r greater than 0.2. In addition, the schemes with the smallest RMSE values and largest r values for the three methods were also selected. These results are shown in Figure 4. The optimal scheme for the RF model was NDVI and elevation (RMSE = 0.07, r = 0.38, R2 = 0.31), that for the SVR model was latitude and temperature (RMSE = 0.11, r = 0.31, R2 = 0.2), and that for the LR model was elevation and maximum precipitation (RMSE = 0.05, r = 0.30, R2 = 0.16). For RF and SVR, the estimation accuracy improved after the number of environmental factors was increased to two. For LR, however, the performance did not show an obvious improvement and even declined in some schemes. When the RMSE and r of the three methods are compared jointly, RF performed best when NDVI and elevation were used as the driving environmental variables.
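The count of 100 two-factor schemes can be reproduced by pairing each of the five screened factors with each of the remaining 20 factors. The sketch below shows one reading that matches the stated total; it is an assumption that pairs formed from two screened factors are counted once per ordering, and the `factor_i` names are placeholders for the other factors in Table 2.

```python
def two_factor_schemes(base_factors, all_factors):
    """Pair each screened factor with every other available factor."""
    return [(base, other)
            for base in base_factors
            for other in all_factors
            if other != base]


base = ["average dewpoint", "NDVI", "temperature", "elevation", "latitude"]
# Placeholder names (assumption) for the remaining 16 of the 21 factors.
others = [f"factor_{i}" for i in range(1, 17)]
schemes = two_factor_schemes(base, base + others)
unique_pairs = {frozenset(pair) for pair in schemes}
print(len(schemes), len(unique_pairs))  # 100 ordered pairings, 90 distinct pairs
```

If pairs of two screened factors were counted only once, 90 distinct schemes would remain, so the total of 100 implies that such pairs were evaluated from both starting factors.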
Building on the two-factor results, the performances of the three methods were then compared with three driving environmental factors. Among the schemes, those for which the RMSE value was less than 0.04 and the absolute value of r was greater than 0.2 were selected. The best-performing schemes in terms of the RMSE and r were also selected. As shown in Figure 5, eight schemes were selected in total. Among these schemes, the optimal scheme for RF was latitude, slope, and NDVI (RMSE = 0.005, r = 0.39, R2 = 0.41), which was also the best scheme for SVR (RMSE = 0.07, r = 0.27, R2 = 0.24). For LR, the optimal scheme was latitude, slope, and temperature (RMSE = 0.05, r = 0.36, R2 = 0.25). The fluctuations of SVR gradually decreased, whereas the performance of RF continued to improve with increasing environmental factors. The performance of LR tended to deteriorate with additional environmental factors.
Building on the three-factor results, the number of environmental factors was then increased to four. Similarly, four schemes were selected on the basis of an RMSE value less than 0.04, an absolute value of r greater than 0.2, and the best performance in terms of RMSE, r, and R2 (Figure 6). Among the three methods, the best performance occurred with the RF method (RMSE = 0.006, r = 0.27, R2 = 0.45), with latitude, slope, NDVI, and minimum precipitation as the driving variables, followed by the LR method (RMSE = 0.05, r = 0.36, R2 = 0.27). The SVR was the worst (RMSE = 0.04, r = 0.23, R2 = 0.32). The performance of both the RF and SVR methods decreased as the number of factors increased from three to four. In addition, the performance fluctuated greatly as the number of factors increased for all methods. Thus, further comparisons with more factors were not performed.
In summary, the optimal scheme for predicting SOC was the RF method with three environmental factors (latitude, slope, and NDVI). Different methods exhibited different patterns with increasing environmental factors. SVR showed a more significant improvement in performance with an increasing number of factors, and its performance stabilized once the number of environmental factors reached three. The performance of RF was close to that of SVR, but it was more stable across the environmental schemes. The performance of LR fluctuated noticeably as the number of factors increased. The instability of the R2 value may be attributed to several factors. The data may have nonlinear relationships that cannot be accurately fitted by a linear model, and the model may not capture all the key factors, resulting in a low R2 [63,64].

3.3. Spatial Distribution of SOC in the Shiyang River Basin

The HWSD is the most widely used traditional soil database in the world and contains geographical information on the physical and chemical properties of soil from regional and national inventories worldwide [65,66]. The HWSD map of SOC across the Shiyang River Basin revealed that the SOC values ranged from −9 to 42.523 g/kg, with an average of 0.98 g/kg and a standard deviation of 2.44 g/kg (Figure 7d). According to the optimal estimation scheme in this study, the SOC values for the Shiyang River Basin in 2011, 2018, and 2021 ranged from 0.295 to 0.841 g/kg, with mean values of 0.49 g/kg, 0.54 g/kg, and 0.50 g/kg and standard deviations of 0.07 g/kg, 0.1 g/kg, and 0.08 g/kg, respectively. A comparison of the SOC distributions in the HWSD and in this study shows that the range of fluctuation in the HWSD was larger than that for the RF method, but the accuracy of the HWSD (RMSE = 0.41, r = 0.19, R2 = 0.11) was much worse than that of the result in this study (RMSE = 0.005, r = 0.39, R2 = 0.41). The accuracy increased by 98% and 51.5%, respectively. The accuracy of the HWSD was poor because of the uneven spatial distribution of profiles in certain areas [15,16,62,67]. Thus, a more precise distribution of SOC was mapped in this study.
According to the simulation results for the optimal scheme of the RF method, the average SOC values in 2011, 2018, and 2021 indicated that SOC showed an overall increasing trend from 2011 to 2021. Notably, the SOC content in 2018 was the highest, mainly because of the influence of extreme weather conditions. The La Niña event was the main factor leading to a greater carbon sink, which caused the SOC in the Shiyang River Basin in 2018 to be significantly greater than that in 2011 and 2021 [68]. In October 2017, a La Niña event occurred, which significantly increased the precipitation in the following autumn. In addition, the lower winter temperatures also increased the SOC content in the basin. Overall, the total SOC in the Shiyang River Basin increased from 23.897 × 10³ g/kg in 2011 to 24.033 × 10³ g/kg in 2021. We observed significant fluctuations in SOC content due to extreme weather events in 2018. While the short-term effects of such events on SOC have been documented, their long-term impacts on the carbon cycle in the watershed require further investigation [69].
From 2011 to 2021, the spatial heterogeneity of SOC in the Shiyang River Basin became more pronounced. This was mainly caused by the variance in the middle and downstream regions of the basin. The NDVI showed a decreasing trend from 2011 to 2021 in the northwestern part of the middle basin, which is the main reason why the SOC content there decreased by 2.82% (Figure 8 and Figure 9). The local government has implemented the “northwards relocation of the residential area and eastwards expansion of the industrial area” plan since 2014, which has led to the gradual expansion of urban construction land towards the northwest. The industrial area increased by about 5% from 2011 to 2021, which is consistent with other research [8,68,69,70] (Figure 10). Thus, industrial activity was the main reason why the SOC content decreased from 2011 to 2021 in the middle basin.
In contrast, the SOC content downstream tended to increase. The SOC increased by 34.4% from 2011 to 2018 in Minqin County, Wuwei city. This was caused mainly by improvements in the ecological environment [43,70]. Minqin County is surrounded by the Tengger and Badain Jaran deserts on the east, west, and north sides. It is one of the most severely desertified areas in the lower reaches of the Shiyang River, and the county has long been confronted with the degradation of the ecological environment and the scarcity of water resources. Since 2001, many ecological restoration projects, such as livestock grazing bans, well closures, agricultural water-saving practices, and the promotion of low-water-consuming crops, have prevented ecological degradation in this area. In addition, the annual water volume transferred from the Jingdian Phase II project to Minqin has essentially met the value established after adjustments in the management of the Shiyang River were made in 2014. With the combined support of these measures, the Minqin diversion canal operated at or above the design flow for 149 days in 2020. Inflow is increasing significantly, and the downward trend of groundwater levels is being effectively curbed [71,72]. Additionally, given the inherent spatial heterogeneity of SOC, an analysis of NDVI and land use data from 2011 to 2021 indicates that the NDVI increased by 0.5% from 2011 to 2021 (Figure 11).
In summary, the SOC showed an increasing trend from 2011 to 2021 in the Shiyang River Basin. This was caused mainly by improvements in environmental quality, which resulted from climate change, ecological restoration, and other factors. Since 2011, the spatial distribution of SOC has varied significantly, and the variability in SOC has increased. The SOC in the northwestern part of the middle basin showed a decreasing trend because of the relatively small area of vegetation there. In addition, the SOC in Minqin County, which is located downstream in the Shiyang River Basin, was greater than that in the other areas, owing to the ecological restoration measures.

4. Discussion

4.1. Differences in the Performance of the Three Methods

A comparison of the three methods under different schemes of factors reveals that RF performs better than SVR and LR. The performances of RF and SVR are much closer to each other than to that of LR, mainly because RF and SVR are nonlinear methods. Compared with linear methods, nonlinear methods better capture and explain the nonlinear relationships between SOC and environmental factors. Thus, the SVR method also performs better than the LR method [67,73,74,75].
A comparison of the RF and SVR methods with different sample sizes revealed that the SVR method was less sensitive to changes in sample size than the RF method was. SVR performed better with a smaller number of training sample points, whereas RF performed better as the number of training sample points increased. This difference reflects the different mechanisms of RF and SVR. The RF method has shown advantages in classification comparisons and can provide more accurate SOC predictions [12,73,74,76]. Thus, the RF model achieved the best performance when many training samples were available.
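The contrast between linear and nonlinear methods can be illustrated with a minimal sketch. The example below assumes scikit-learn and uses synthetic data; the three predictors, the response function, and all hyperparameters (`C=10.0`, `n_estimators=200`) are hypothetical choices for illustration and are not the settings used in this study. Only the split of 47 training points out of 65 mirrors the value reported here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def compare_methods(seed=0):
    """Fit LR, SVR, and RF on synthetic data; return RMSE and r per method."""
    rng = np.random.default_rng(seed)
    # Synthetic stand-ins for three predictors, scaled to [0, 1] (assumption).
    X = rng.uniform(0.0, 1.0, size=(65, 3))
    # Hypothetical nonlinear response; this is NOT the study's SOC data.
    y = (10.0 * (X[:, 0] - 0.5) ** 2
         + np.sin(4.0 * X[:, 2])
         + rng.normal(0.0, 0.1, size=65))

    # 47 training points; the remaining 18 points are held out for testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=47, random_state=seed
    )

    models = {
        "LR": LinearRegression(),
        # SVR is sensitive to feature scale, so inputs are standardized first.
        "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
        "RF": RandomForestRegressor(n_estimators=200, random_state=seed),
    }

    results = {}
    for name, model in models.items():
        pred = model.fit(X_train, y_train).predict(X_test)
        rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
        r = float(np.corrcoef(y_test, pred)[0, 1])
        results[name] = {"rmse": rmse, "r": r}
    return results


if __name__ == "__main__":
    for name, scores in compare_methods().items():
        print(name, scores)
```

The ranking of the three methods in such a comparison depends on the data and on the tuning, so the sketch shows the evaluation procedure (RMSE and r on held-out points) rather than a fixed outcome.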
Generally, ML methods perform better with more environmental factors and samples [77]. In this study, we also compared the performance of the three methods with different environmental factors. However, all three methods performed best when three types of driving environmental factors were used. This result indicates that performance can deteriorate when too many types of environmental factors are used as driving factors, mainly because of collinearity between factors, sample representativeness, and other factors. The characteristics of the samples determine the accuracy of a method, which requires the samples to reflect the spatial–temporal distribution of SOC [34]. Too many or too few samples can easily lead to obvious differences between the samples and the true distribution, especially on a large scale. The comprehensive and systematic observation of SOC data is challenging because of the limitations of various natural conditions and cognitive limitations [78]. Therefore, an excessive number of samples and factors may also lead to a decline in the performance of the method.
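Collinearity between candidate factors can be screened before model fitting. The sketch below computes variance inflation factors (VIFs) with NumPy. The VIF is a common diagnostic that is assumed here for illustration and is not a procedure reported in this study, and the three factors are synthetic.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows: samples, columns: factors)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_factors = X.shape
    vifs = []
    for j in range(n_factors):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress factor j on the remaining factors plus an intercept.
        design = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        r_squared = 1.0 - residual.var() / target.var()
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)


rng = np.random.default_rng(0)
factor_a = rng.normal(size=65)                     # hypothetical factor
factor_b = rng.normal(size=65)                     # independent of factor_a
factor_c = factor_a + 0.05 * rng.normal(size=65)   # nearly duplicates factor_a

vifs = variance_inflation_factors(
    np.column_stack([factor_a, factor_b, factor_c])
)
print(vifs)
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of strong collinearity; in this synthetic case the first and third factors would be flagged, whereas the independent factor would not.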

4.2. Influences of the Latitude, Aspect, and the NDVI on SOC

In this study, the optimal estimation performance was achieved by the RF model when the latitude, slope, and NDVI were used as the independent variables. The Shiyang River Basin extends from the upstream mountainous areas to the downstream desert oasis, exhibiting different geomorphic and vegetation conditions. The NDVI captured the variance in vegetation effectively. Consequently, the significant influence of vegetation on SOC leads to better performance in predicting SOC [79,80]. Typically, small-scale slopes affect SOC by impacting the variance in the soil temperature, soil moisture, and vegetation communities and by generating microclimates that are significantly different from those under zonal climate conditions. Owing to the large changes in elevation and the complex geomorphology of the Shiyang River Basin, the NDVI and slope direction lead to obvious differences in vegetation and climate. A typical example is the Minqin Oasis, located between the surrounding Badain Jaran Desert and Tengger Desert, which plays an important role in maintaining the ecological balance of the area. It is one of the hotspots for the study of soil desertification, mainly including artificial sand fixation, sand fixation, natural white thorn shrubs, and plants such as P. sylvestris and S. salviphytes, Halophyllum salina, and P. salvophytes and Halophylla, which may have different adaptation strategies [46]. The green pattern of the Minqin Oasis in the lower reaches is complex, and desertification is more severe in areas farther from cities and towns [81]. The main reason for this phenomenon is the difference in the spatial distribution of SOC in the erosion and deposition process in the watershed [28,82]. Thus, the NDVI, temperature, and precipitation in the study area exhibit more obvious stepped distributions with changes in latitude and slope, which affects the distribution of SOC [28,83,84].
When latitude, slope, and the NDVI were used as independent variables, the RF method achieved optimal predictions for SOC. However, these three factors were not the most significant factors according to the correlation analysis (Figure 12). This discrepancy arose primarily from the complex nonlinear interactions between environmental factors and SOC, as well as among the environmental factors themselves. Many studies have shown that the relationships between SOC and environmental variables are complex and nonlinear [85,86]. The mechanism of interaction between some environmental factors and SOC is still unclear [87]. In addition, correlation coefficients can reflect only the linear relationships between factors [88,89]. Thus, the correlation results cannot serve as the basis for feature selection when nonlinear methods, such as the RF and SVR methods, are used to select the optimal solutions. Therefore, it is essential to conduct comparative analyses that consider the scale of the study, the characteristics of the study area, and the methods themselves to identify the most effective approaches and strategies.
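The limitation of correlation coefficients noted above can be checked with a small worked example. The sketch below uses only the Python standard library and hypothetical data: a variable that depends perfectly on a predictor through a quadratic relationship yields a Pearson coefficient of approximately zero, whereas a linear dependence yields a coefficient of approximately one.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical predictor sampled symmetrically around zero.
x = [i / 10 for i in range(-10, 11)]
y_linear = [2.0 * v + 1.0 for v in x]   # perfectly linear dependence
y_quadratic = [v ** 2 for v in x]       # deterministic but nonlinear dependence

r_linear = pearson_r(x, y_linear)
r_quadratic = pearson_r(x, y_quadratic)
print(r_linear, r_quadratic)
```

A factor like the quadratic one would rank last in a correlation-based screening even though a nonlinear model could predict from it well, which is consistent with the discrepancy described above.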

5. Conclusions

On the basis of 65 soil sample points collected from 2011 to 2021, three representative methods for estimating SOC, namely, one linear method (LR) and two nonlinear methods (SVR and RF), were used in this study to estimate the distribution of SOC. To identify the best scheme for mapping SOC, the optimal number of training points was determined, different environmental factors were used as predictor variables, and the performance of the three methods was compared. Then, the spatiotemporal variations in SOC from 2011 to 2021 were analyzed. The three conclusions are as follows:
1. The optimal number of training samples is 47, which is 72% of all sample points. This percentage is the same as that in other studies. When the performances of the LR, RF, and SVR methods are compared, the RF and SVR methods are better than the LR method because they can capture nonlinear relationships.
2. The best estimation of SOC is achieved when latitude, slope, and NDVI are used as predictor variables in this study, which was conducted at the watershed scale over a 10-year period. Vegetation distribution exhibits a significant correlation with latitude, and this distribution plays a key role in determining SOC content. Additionally, temperature and precipitation show a clear stepwise distribution that corresponds to latitude variations within the study area.
3. The spatial distribution and variability of SOC increased from 2011 to 2021. In the northwestern part of the middle basin, SOC decreased due to industrial activities, coinciding with a reduction in grassland and forest areas. In contrast, SOC levels downstream exhibited a continuous increase, primarily driven by a range of ecological restoration efforts.

Author Contributions

Conceptualization, J.L. and N.H.; data curation, J.L., N.H., Y.Q. and Q.D.; formal analysis, J.L.; investigation, N.H.; methodology, J.L. and N.H.; data collection, Q.D.; visualization, N.H.; writing—original draft, J.L.; writing—review and editing, N.H. and W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

HWSD datasets are derived from (https://webarchive.iiasa.ac.at/Research/LUC/External-World-soil-database/HTML/HWSD_Data.html?sb=4, accessed on 9 January 2024). NDVI data are available from (https://search.earthdata.nasa.gov/search, accessed on 20 April 2024). NPP data are available from (https://www.usgs.gov/, accessed on 20 April 2024).

Acknowledgments

We sincerely thank the reviewers for their comments, which have greatly helped to improve this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, Y.; Zhang, C.; Wang, N.; Han, Q.; Zhang, X.; Liu, Y.; Xu, L.; Ye, W. Substantial inorganic carbon sink in closed drainage basins globally. Nat. Geosci. 2017, 10, 501–506.
  2. Piao, S.; Sitch, S.; Ciais, P.; Friedlingstein, P.; Peylin, P.; Wang, X.; Ahlström, A.; Anav, A.; Canadell, J.G.; Cong, N. Evaluation of terrestrial carbon cycle models for their response to climate variability and to CO2 trends. Glob. Change Biol. 2013, 19, 2117–2132.
  3. Fang, Y.; Liu, C.; Huang, M.; Li, H.; Leung, L.R. Steady state estimation of soil organic carbon using satellite-derived canopy leaf area index. J. Adv. Model. Earth Syst. 2014, 6, 1049–1064.
  4. Shafizadeh-Moghadam, H.; Minaei, F.; Talebi-khiyavi, H.; Xu, T.; Homaee, M. Synergetic use of multi-temporal Sentinel-1, Sentinel-2, NDVI, and topographic factors for estimating soil organic carbon. Catena 2022, 212, 106077.
  5. Lal, R. Carbon sequestration in dryland ecosystems. Environ. Manag. 2004, 33, 528–544.
  6. Jandl, R.; Rodeghiero, M.; Martinez, C.; Cotrufo, M.F.; Bampa, F.; Van Wesemael, B.; Harrison, R.B.; Guerrini, I.A.; Richter, D.D., Jr.; Rustad, L. Current status, uncertainty and future needs in soil organic carbon monitoring. Sci. Total Environ. 2014, 468, 376–383.
  7. Wang, Y.; Li, Y.; Zhang, C.; Zhou, X. Significance of proxy indicators for lake sediments in arid zones—An example of surface sediment samples from Inojirasawa. J. Lanzhou Univ. (Nat. Sci.) 2014, 50, 816–823.
  8. He, M.; Tang, L.; Li, C.; Ren, J.; Zhang, L.; Li, X. Dynamics of soil organic carbon and nitrogen and their relations to hydrothermal variability in dryland. J. Environ. Manag. 2022, 319, 115751.
  9. Ahirwal, J.; Nath, A.; Brahma, B.; Deb, S.; Sahoo, U.K.; Nath, A.J. Patterns and driving factors of biomass carbon and soil organic carbon stock in the Indian Himalayan region. Sci. Total Environ. 2021, 770, 145292.
  10. van der Meij, W.M. Evolutionary pathways in soil-landscape evolution models. SOIL Discuss. 2021, 2021, 1–14.
  11. Huang, C.; Chen, Y.; Zhang, S.; Wu, J. Detecting, extracting, and monitoring surface water from space using optical sensors: A review. Rev. Geophys. 2018, 56, 333–360.
  12. Wang, B.; Waters, C.; Orgill, S.; Gray, J.; Cowie, A.; Clark, A.; Li Liu, D. High resolution mapping of soil organic carbon stocks using remote sensing variables in the semi-arid rangelands of eastern Australia. Sci. Total Environ. 2018, 630, 367–378.
  13. O’Rourke, S.M.; Angers, D.A.; Holden, N.M.; McBratney, A.B. Soil organic carbon across scales. Glob. Change Biol. 2015, 21, 3561–3574.
  14. Pan, G.; Xu, X.; Smith, P.; Pan, W.; Lal, R. An increase in topsoil SOC stock of China’s croplands between 1985 and 2006 revealed by soil monitoring. Agric. Ecosyst. Environ. 2010, 136, 133–138.
  15. Tifafi, M.; Guenet, B.; Hatté, C. Large differences in global and regional total soil carbon stock estimates based on SoilGrids, HWSD, and NCSCD: Intercomparison and evaluation based on field data from USA, England, Wales, and France. Glob. Biogeochem. Cycles 2018, 32, 42–56.
  16. Silatsa, F.B.; Yemefack, M.; Tabi, F.O.; Heuvelink, G.B.; Leenaars, J.G. Assessing countrywide soil organic carbon stock using hybrid machine learning modelling and legacy soil data in Cameroon. Geoderma 2020, 367, 114260.
  17. Zhang, K.; Chen, H.; Ma, N.; Shang, S.; Wang, Y.; Xu, Q.; Zhu, G. A global dataset of terrestrial evapotranspiration and soil moisture dynamics from 1982 to 2020. Sci. Data 2024, 11, 445.
  18. Scharlemann, J.P.; Tanner, E.V.; Hiederer, R.; Kapos, V. Global soil carbon: Understanding and managing the largest terrestrial carbon pool. Carbon Manag. 2014, 5, 81–91.
  19. Poulter, B.; Frank, D.; Ciais, P.; Myneni, R.B.; Andela, N.; Bi, J.; Broquet, G.; Canadell, J.G.; Chevallier, F.; Liu, Y.Y. Contribution of semi-arid ecosystems to interannual variability of the global carbon cycle. Nature 2014, 509, 600–603.
  20. Tong, R.; Liang, X.; Guan, Q.; Song, Y.; Chen, Y.; Wang, Q.; Zheng, L.; Jin, Q.; Yu, Y.; He, J.; et al. Accounting for Carbon Stocks in Terrestrial Soils and Carbon Sinks in Land Management in China, 2000–2020. Acta Geogr. Sin. 2023, 78, 2209–2222.
  21. Jiang, B.; Xu, W.; Zhang, D.; Nie, F.; Sun, Q. Contrasting multiple deterministic interpolation responses to different spatial scale in prediction soil organic carbon: A case study in Mollisols regions. Ecol. Indic. 2022, 134, 108472.
  22. Hu, B.; Xie, M.; Zhou, Y.; Chen, S.; Zhou, Y.; Ni, H.; Peng, J.; Ji, W.; Hong, Y.; Li, H. A high-resolution map of soil organic carbon in cropland of Southern China. Catena 2024, 237, 107813.
  23. Khormali, F.; Kehl, M. Micromorphology and development of loess-derived surface and buried soils along a precipitation gradient in Northern Iran. Quat. Int. 2011, 234, 109–123.
  24. Beillouin, D.; Corbeels, M.; Demenois, J.; Berre, D.; Boyer, A.; Fallot, A.; Feder, F.; Cardinael, R. A global meta-analysis of soil organic carbon in the Anthropocene. Nat. Commun. 2023, 14, 3700.
  25. Wiesmeier, M.; von Lützow, M.; Spörlein, P.; Geuß, U.; Hangen, E.; Reischl, A.; Schilling, B.; Kögel-Knabner, I. Land use effects on organic carbon storage in soils of Bavaria: The importance of soil types. Soil Tillage Res. 2015, 146, 296–302.
  26. Liu, X.; Li, S.; Wang, S.; Bian, Z.; Zhou, W.; Wang, C. Effects of farmland landscape pattern on spatial distribution of soil organic carbon in Lower Liaohe Plain of northeastern China. Ecol. Indic. 2022, 145, 109652.
  27. Chen, F.; Feng, P.; Harrison, M.T.; Wang, B.; Liu, K.; Zhang, C.; Hu, K. Cropland carbon stocks driven by soil characteristics, rainfall and elevation. Sci. Total Environ. 2023, 862, 160602.
  28. Zhang, S.; Tian, J.; Lu, X.; Tian, Q. Temporal and spatial dynamics distribution of organic carbon content of surface soil in coastal wetlands of Yancheng, China from 2000 to 2022 based on Landsat images. Catena 2023, 223, 106961.
  29. Zhang, C.-t.; Yang, Y. Can the spatial prediction of soil organic matter be improved by incorporating multiple regression confidence intervals as soft data into BME method? Catena 2019, 178, 322–334.
  30. Qu, M.; Li, W.; Zhang, C.; Huang, B.; Zhao, Y. Spatially nonstationary relationships between copper accumulation in rice grain and some related soil properties in paddy fields at a regional scale. Soil Sci. Soc. Am. J. 2014, 78, 1765–1774.
  31. Ma, D. Characteristics of spatial and temporal variations in water production and water conservation in the Shiyang River Basin and their influencing factors. J. Northwest Norm. Univ. (Nat. Sci.) 2022. (In Chinese)
  32. Li, J.; Zhang, L. Comparison of Four Methods for Vertical Extrapolation of Soil Moisture Contents from Surface to Deep Layers in an Alpine Area. Sustainability 2021, 13, 8862.
  33. Tang, K.; Zhao, X.; Xu, Z.; Sun, H. A stacking ensemble model for predicting soil organic carbon content based on visible and near-infrared spectroscopy. Infrared Phys. Technol. 2024, 140, 105404.
  34. Xu, X.; Du, C.; Ma, F.; Qiu, Z.; Zhou, J. A framework for high-resolution mapping of soil organic matter (SOM) by the integration of fourier mid-infrared attenuation total reflectance spectroscopy (FTIR-ATR), sentinel-2 images, and DEM derivatives. Remote Sens. 2023, 15, 1072.
  35. Liu, S.; Wang, Z.; Lin, Z.; Zhao, Y.; Yan, Z.; Zhang, K.; Visser, M.; Townsend, P.A.; Wu, J. Spectra-phenology integration for high-resolution, accurate, and scalable mapping of foliar functional traits using time-series Sentinel-2 data. Remote Sens. Environ. 2024, 305, 114082.
  36. Berdyyev, A.; Al-Masnay, Y.A.; Juliev, M.; Abuduwaili, J. Desertification Monitoring Using Machine Learning Techniques with Multiple Indicators Derived from Sentinel-2 in Turkmenistan. Remote Sens. 2024, 16, 4525.
  37. Odebiri, O.; Mutanga, O.; Odindi, J.; Slotow, R.; Mafongoya, P.; Lottering, R.; Naicker, R.; Matongera, T.N.; Mngadi, M. Mapping sub-surface distribution of soil organic carbon stocks in South Africa’s arid and semi-arid landscapes: Implications for land management and climate change mitigation. Geoderma Reg. 2024, 37, e00817.
  38. Ali, A.B.; Shukla, M. Assessment of soil organic and inorganic carbon stocks in arid and semi-arid rangelands of southeastern New Mexico. Ecol. Indic. 2024, 166, 112398.
  39. Sun, W.; Zhang, T.; Clow, G.D.; Sun, Y.-H.; Zhao, W.-Y.; Liang, B.-B.; Fan, C.-Y.; Peng, X.-Q.; Cao, B. Observed permafrost thawing and disappearance near the altitudinal limit of permafrost in the Qilian Mountains. Adv. Clim. Change Res. 2022, 13, 642–650.
  40. Zhao, Q.; Ye, B.; Ding, Y.; Zhang, S.; Yi, S.; Wang, J.; Shangguan, D.; Zhao, C.; Han, H. Coupling a glacier melt model to the Variable Infiltration Capacity (VIC) model for hydrological modeling in north-western China. Environ. Earth Sci. 2013, 68, 87–101.
  41. Meybeck, M. Global analysis of river systems: From Earth system controls to Anthropocene syndromes. Philos. Trans. R. Soc. London. Ser. B Biol. Sci. 2003, 358, 1935–1955.
  42. Li, Y.; Morrill, C. A Holocene East Asian winter monsoon record at the southern edge of the Gobi Desert and its comparison with a transient simulation. Clim. Dyn. 2015, 45, 1219–1234.
  43. Wang, X.; He, K.; Dong, Z. Effects of climate change and human activities on runoff in the Beichuan River Basin in the northeastern Tibetan Plateau, China. Catena 2019, 176, 81–93.
  44. Zhu, G.; Wan, Q.; Yong, L.; Li, Q.; Zhang, Z.; Guo, H.; Zhang, Y.; Sun, Z.; Zhang, Z.; Ma, H. Dissolved organic carbon transport in the Qilian mountainous areas of China. Hydrol. Process. 2020, 34, 4985–4995.
  45. Wang, N.; Xue, J.; Peng, J.; Biswas, A.; He, Y.; Shi, Z. Integrating remote sensing and landscape characteristics to estimate soil salinity using machine learning methods: A case study from Southern Xinjiang, China. Remote Sens. 2020, 12, 4118.
  46. Wang, Z.; Shi, P.; Zhang, X.; Tong, H.; Zhang, W.; Liu, Y. Research on landscape pattern construction and ecological restoration of Jiuquan City based on ecological security evaluation. Sustainability 2021, 13, 5732.
  47. Xu, C.; Hu, X.; Wang, X.; Liu, Z.; Tian, J.; Ren, Z. Spatiotemporal variations of eco-environmental vulnerability in Shiyang River Basin, China. Ecol. Indic. 2024, 158, 111327.
  48. Zhang, Z.; Jia, W.; Zhu, G.; Shi, Y.; Yang, L.; Xiong, H.; Zhang, M.; Zhang, F. Hydrochemical characteristics and ion sources of river water in the upstream of the Shiyang River, China. Environ. Earth Sci. 2021, 80, 614.
  49. Zhang, W.; Wan, Q.; Zhu, G.; Xu, Y. Distribution of soil organic carbon and carbon sequestration potential of different geomorphic units in Shiyang river basin, China. Environ. Geochem. Health 2023, 45, 4071–4086.
  50. Wang, Q.; Xu, Y.; Zhu, G.; Lu, S.; Qiu, D.; Jiao, Y.; Meng, G.; Chen, L.; Li, R.; Zhang, W. Agricultural Activities Increased Soil Organic Carbon in Shiyang River Basin, a typical inland river basin in China. Res. Sq. 2024, preprint.
  51. Ngabire, M.; Wang, T.; Xue, X.; Liao, J.; Sahbeni, G.; Huang, C.; Duan, H.; Song, X. Soil salinization mapping across different sandy land-cover types in the Shiyang River Basin: A remote sensing and multiple linear regression approach. Remote Sens. Appl. Soc. Environ. 2022, 28, 100847.
  52. Ngabire, M.; Wang, T.; Liao, J.; Sahbeni, G. Quantitative Analysis of Desertification-Driving Mechanisms in the Shiyang River Basin: Examining Interactive Effects of Key Factors through the Geographic Detector Model. Remote Sens. 2023, 15, 2960.
  53. Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: Berlin, Germany, 2013.
  54. Takeda, H.; Tamura, Y.; Sato, S. Using the ensemble Kalman filter for electricity load forecasting and analysis. Energy 2016, 104, 184–198.
  55. Choubin, B.; Darabi, H.; Rahmati, O.; Sajedi-Hosseini, F.; Kløve, B. River suspended sediment modelling using the CART model: A comparative study of machine learning techniques. Sci. Total Environ. 2018, 615, 272–281.
  56. Zhao, T. Construction of dynamic model and optimization of transportation mode of solid waste in typical scenarios of non-waste city. J. Harbin Inst. Technol. 2020. Available online: https://ss.zhizhen.com/detail38502727e7500f26418314f75983cca59c4ca8c76a193bce1921b0a3ea255101928fa69a765a3d2d9d38b5213d88b75302aa9b643567414b03d3e8bdc9ed00a22dd8f5ad09666cdb88fcefc965c92709 (accessed on 20 May 2024). (In Chinese).
  57. Yuan, S.; Quiring, S.M. Comparison of three methods of interpolating soil moisture in Oklahoma. Int. J. Climatol. 2017, 37, 987–997.
  58. Yi, Y.; Zhang, Z.; Zhang, W.; Jia, H.; Zhang, J. Landslide susceptibility mapping using multiscale sampling strategy and convolutional neural network: A case study in Jiuzhaigou region. Catena 2020, 195, 104851.
  59. Yang, C.; Liu, L.-L.; Huang, F.; Huang, L.; Wang, X.-M. Machine learning-based landslide susceptibility assessment with optimized ratio of landslide to non-landslide samples. Gondwana Res. 2023, 123, 198–216.
  60. Liu, H.; Chen, S.-M.; Cocea, M. Subclass-based semi-random data partitioning for improving sample representativeness. Inf. Sci. 2019, 478, 208–221.
  61. Saurette, D.D.; Berg, A.A.; Laamrani, A.; Heck, R.J.; Gillespie, A.W.; Voroney, P.; Biswas, A. Effects of sample size and covariate resolution on field-scale predictive digital mapping of soil carbon. Geoderma 2022, 425, 116054.
  62. Zhao, Z.; Yang, Q.; Sun, D.; Ding, X.; Meng, F.-R. Extended model prediction of high-resolution soil organic matter over a large area using limited number of field samples. Comput. Electron. Agric. 2020, 169, 105172.
  63. Zhang, Y.; Guo, L.; Chen, Y.; Shi, T.; Luo, M.; Ju, Q.; Zhang, H.; Wang, S. Prediction of soil organic carbon based on Landsat 8 monthly NDVI data for the Jianghan Plain in Hubei Province, China. Remote Sens. 2019, 11, 1683.
  64. Emadi, M.; Taghizadeh-Mehrjardi, R.; Cherati, A.; Danesh, M.; Mosavi, A.; Scholten, T. Predicting and mapping of soil organic carbon using machine learning algorithms in Northern Iran. Remote Sens. 2020, 12, 2234.
  65. Carvalhais, N.; Forkel, M.; Khomik, M.; Bellarby, J.; Jung, M.; Migliavacca, M.; Saatchi, S.; Santoro, M.; Thurner, M.; Weber, U. Global covariation of carbon turnover times with climate in terrestrial ecosystems. Nature 2014, 514, 213–217.
  66. Batjes, N.H. Harmonized soil property values for broad-scale modelling (WISE30sec) with estimates of global soil carbon stocks. Geoderma 2016, 269, 61–68.
  67. Zhou, T.; Geng, Y.; Chen, J.; Pan, J.; Haase, D.; Lausch, A. High-resolution digital mapping of soil organic carbon and soil total nitrogen using DEM derivatives, Sentinel-1 and Sentinel-2 data based on machine learning algorithms. Sci. Total Environ. 2020, 729, 138244.
  68. Luo, X.; Li, L. Response and Prediction Analysis of ENSO Events to Arid Climate in the Shiyang River Basin. Desert Oasis Meteorol. 2020, 14, 46–51.
  69. Restrepo-Coupe, N.; Campos, K.S.; Alves, L.F.; Longo, M.; Wiedemann, K.T.; de Oliveira Jr, R.C.; Aragao, L.E.; Christoffersen, B.O.; Camargo, P.B.; Figueira, A.M.S. Contrasting carbon cycle responses to dry (2015 El Niño) and wet (2008 La Niña) extreme events at an Amazon tropical forest. Agric. For. Meteorol. 2024, 353, 110037.
  70. Huang, S.; Zhou, L.; Chen, Y.; Lu, H. Impact of policy factors on ecological changes in Minqin over the past 60 years. J. Arid Land Resour. Environ. 2014, 28, 73–78.
  71. Xue, J.; Zhong, W.; Xie, L.; Unkel, I. Vegetation responses to the last glacial and early Holocene environmental changes in the northern Leizhou Peninsula, south China. Quat. Res. 2015, 84, 223–231.
  72. He, Y. Analysis of feasibility and solution measures for increasing water transfer to Minqin in Jingdian Phase II. Value Eng. 2024, 43, 29–31.
  73. Rodriguez-Galiano, V.F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J.P. An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J. Photogramm. Remote Sens. 2012, 67, 93–104.
  74. Zhang, W.-c.; Wan, H.-s.; Zhou, M.-h.; Wu, W.; Liu, H.-b. Soil total and organic carbon mapping and uncertainty analysis using machine learning techniques. Ecol. Indic. 2022, 143, 109420.
  75. Chen, B.; Zhu, Z.; Yang, W. Study on habitat quality assessment and ecological restoration strategy in Jinchang City based on InVEST model. Gansu Sci. Technol. Inf. 2024, 53, 72–80.
  76. Nguyen, T.T.; Pham, T.D.; Nguyen, C.T.; Delfos, J.; Archibald, R.; Dang, K.B.; Hoang, N.B.; Guo, W.; Ngo, H.H. A novel intelligence approach based active and ensemble learning for agricultural soil organic carbon prediction using multispectral and SAR data fusion. Sci. Total Environ. 2022, 804, 150187.
  77. Bonfatti, B.R.; Hartemink, A.E.; Giasson, E.; Tornquist, C.G.; Adhikari, K. Digital mapping of soil carbon in a viticultural region of Southern Brazil. Geoderma 2016, 261, 204–221.
  78. Liu, B.; Guo, B.; Zhuo, R.; Dai, F. Estimation of soil organic carbon in LUCAS soil database using Vis-NIR spectroscopy based on hybrid kernel Gaussian process regression. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2024, 321, 124687.
  79. Mishra, U.; Hugelius, G.; Shelef, E.; Yang, Y.; Strauss, J.; Lupachev, A.; Harden, J.W.; Jastrow, J.D.; Ping, C.-L.; Riley, W.J. Spatial heterogeneity and environmental predictors of permafrost region soil organic carbon stocks. Sci. Adv. 2021, 7, eaaz5236.
  80. Wang, H.; Zhang, X.; Wu, W.; Liu, H. Prediction of Soil Organic Carbon under Different Land Use Types Using Sentinel-1/-2 Data in a Small Watershed. Remote Sens. 2021, 13, 1229.
  81. Li, X.; Wang, Y.; Zhao, Y.; Zhai, J.; Liu, Y.; Han, S.; Liu, K. Research on the Impact of Climate Change and Human Activities on the NDVI of Arid Areas—A Case Study of the Shiyang River Basin. Land 2024, 13, 533.
  82. Hu, Y.; Fister, W.; Kuhn, N.J. Temporal variation of SOC enrichment from interrill erosion over prolonged rainfall simulations. Agriculture 2013, 3, 726–740.
  83. Li, P. Vertical zonal differentiation of surface sediments in the Shiyang River Basin and its application to environmental change. J. Lanzhou Univ. (Nat. Sci.) 2018. Available online: https://ss.zhizhen.com/detail38502727e7500f26092d2caccfc2605be4bcfd07be3ac4dd1921b0a3ea255101928fa69a765a3d2d695ce4c3744cf04e416453b0ac50e5618d8eefdefd1041d7e96669aeb633e2437ce31be008ef327c (accessed on 20 June 2024). (In Chinese).
  84. Zobeck, T.M.; Baddock, M.; Van Pelt, R.S.; Tatarko, J.; Acosta-Martinez, V. Soil property effects on wind erosion of organic soils. Aeolian Res. 2013, 10, 43–51.
  85. Lark, R. Soil–landform relationships at within-field scales: An investigation using continuous classification. Geoderma 1999, 92, 141–165.
  86. Yang, L.; Jia, W.; Shi, Y.; Zhang, Z.; Xiong, H.; Zhu, G. Spatiotemporal differentiation of soil organic carbon of grassland and its relationship with soil physicochemical properties on the northern slope of Qilian mountains, China. Sustainability 2020, 12, 9396.
  87. Taghizadeh-Mehrjardi, R.; Schmidt, K.; Amirian-Chakan, A.; Rentschler, T.; Zeraatpisheh, M.; Sarmadian, F.; Valavi, R.; Davatgar, N.; Behrens, T.; Scholten, T. Improving the spatial prediction of soil organic carbon content in two contrasting climatic regions by stacking machine learning models and rescanning covariate space. Remote Sens. 2020, 12, 1095.
  88. Li, H.; Wu, Y.; Liu, S.; Xiao, J.; Zhao, W.; Chen, J.; Alexandrov, G.; Cao, Y. Decipher soil organic carbon dynamics and driving forces across China using machine learning. Glob. Change Biol. 2022, 28, 3394–3410.
  89. Gao, C.; Yan, X.; Qiao, X.; Wei, K.; Zhang, X.; Yang, S.; Wang, C.; Yang, W.; Feng, M.; Xiao, L. Multivariate prediction of soil aggregate-associated organic carbon by simulating satellite sensor bands. Comput. Electron. Agric. 2023, 209, 107859.
Figure 1. General characteristics of the study area and distribution of SOC samples (drawing review number is GS(2024)0650).
Figure 2. Plots comparing the RMSE and r for 30–60 samples for three methods ((a): RMSE; (b): r).
Figure 3. Comparison of different methods under different schemes ((a): curves showing the error analysis of sampling for one element under different methods; (b): error plot showing the error analysis of sampling for E in SVR).
Figure 4. Comparison of different methods under different schemes ((a): curves showing the error analysis of sampling for two elements under different methods; (b): error plot showing the error analysis of sampling for N-E in RF).
Figure 5. Comparison of different methods under different schemes ((a): curves showing the error analysis of sampling for three elements under different methods; (b): error plot showing the error analysis of sampling for L-A-N in RF).
Figure 6. Comparison of different methods under different schemes ((a): curves showing the error analysis of sampling for four elements under different methods; (b): error plot showing the error analysis of sampling for L-A-N-Min in RF).
Figure 7. Maps showing the distribution of SOC in the Shiyang River Basin ((a): 2011 RF forecast map; (b): 2018 RF forecast map; (c): 2021 RF forecast map; (d): HWSD).
Figure 8. Map showing a comparison of the middle reaches of the Shiyang River in 2011, 2018, and 2021 ((a): map showing the comparison between 2011 and 2018; (b): map showing the comparison between 2018 and 2021).
Figure 9. Comparison of the NDVI values in the northwestern part of the middle basin in 2011, 2018, and 2021.
Figure 10. Comparison of the land use type of the northwestern part of the middle basin in 2011, 2018, and 2021.
Figure 11. Comparison of the NDVI in the lower reaches of the Shiyang River in 2011, 2018, and 2021 ((a): 2011; (b): 2018; (c): 2021; (d): map showing the comparison between 2011 and 2021).
Figure 12. Pearson correlation analysis of SOC and environmental factors.
Table 1. Basic data for each of the sampling points.

| Location | Number of Samples | Land Use Type |
|---|---|---|
| Haxi (HX) | 5 | Forest |
| Maozang Township (ZM) | 6 | Grassland |
| BaiTa Lake (BT) | 1 | Cropland |
| Wuwei Basin (WQ) | 12 | Cropland |
| Datan Township (B) | 19 | Cropland |
| Qingtu Lake (SQ) | 22 | Desert |
Table 2. Twenty-one types of environmental factors and their abbreviations.

| Environmental Factor | Abbreviation | Environmental Factor | Abbreviation |
|---|---|---|---|
| Longitude | Lon | Dewp_Attributes | D_A |
| Latitude | Lat | SLP_Attributes | S_A |
| Elevation | E | Visib | V |
| HWSD | H | Visib_Attributes | V_A |
| NPP | NPP | WDSP | W |
| NDVI | N | WDSP_Attributes | W_A |
| Slope | S | MXSPD | MXSPD |
| Aspect | A | MIN | MIN |
| Temperature | T | MAX | MAX |
| Temp_Attributes | T_A | PRCP | P |
| Dewp | D | | |

Li, J.; Hu, N.; Qi, Y.; Zhao, W.; Dong, Q. Spatial and Temporal Variations in Soil Organic Carbon in Northwestern China via Comparisons of Different Methods. Remote Sens. 2025, 17, 420. https://doi.org/10.3390/rs17030420
