Machine Learning Enhances Soil Aggregate Stability Mapping for Effective Land Management in a Semi-Arid Region

Khosravani, Pegah; Moosavi, Ali Akbar; Baghernejad, Majid; Kebonye, Ndiye M.; Mousavi, Seyed Roohollah; Scholten, Thomas

doi:10.3390/rs16224304

by Pegah Khosravani ¹,², Ali Akbar Moosavi ¹, Majid Baghernejad ¹, Ndiye M. Kebonye ²,³,*, Seyed Roohollah Mousavi ⁴ and Thomas Scholten ²,³

¹ Department of Soil Science, Faculty of Agriculture, Shiraz University, Shiraz 7194684471, Iran

² Department of Geosciences, Soil Science and Geomorphology, University of Tübingen, 72076 Tübingen, Germany

³ Cluster of Excellence Machine Learning: New Perspectives for Science, University of Tübingen, 72076 Tübingen, Germany

⁴ Department of Soil Science, Faculty of Agriculture, University of Tehran, Karaj 77871-31587, Iran

* Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(22), 4304; https://doi.org/10.3390/rs16224304

Submission received: 26 September 2024 / Revised: 7 November 2024 / Accepted: 15 November 2024 / Published: 18 November 2024

(This article belongs to the Special Issue Remote Sensing for Land Degradation and Drought Monitoring II)

Abstract:

Soil aggregate stability (SAS) is needed to evaluate the soil’s resistance to degradation and erosion, especially in semi-arid regions. Traditional laboratory methods for assessing SAS are labor-intensive and costly, limiting timely and cost-effective monitoring. Thus, we developed cost-efficient wall-to-wall spatial prediction maps for two fundamental SAS proxies [mean weight diameter (MWD) and geometric mean diameter (GMD)], across a 5000-hectare area in Southwest Iran. Machine learning algorithms coupled with environmental and soil covariates were used. Our results showed that topographic covariates were the most influential covariates in predicting these SAS proxies. Overall, our SAS maps are valuable tools for sustainable soil and natural resource management, enabling decision-making for addressing potential soil degradation and promoting sustainable land use in semi-arid regions.

Keywords:

aggregate stability; digital soil mapping; machine learning; random forest; semi-arid regions; Southern Iran

1. Introduction

Soil aggregate stability (SAS) is one of the main soil physical properties indicating the level of soil resistance to water and wind erosion [1,2,3]. It is also a proxy of soil structure, demonstrating the soil’s ability to control water infiltration and nutrient movement. SAS significantly impacts soil aeration, microbial activity, and tillage performance by improving air circulation, supporting microbial life, and reducing compaction for easier tillage [4].

The spatial distribution of SAS depends on many factors including particle size distribution, plant characteristics (e.g., plant type, growth stages, plant root system, and residues), and land management (e.g., tillage operations), just to name a few [4]. Since SAS is not easily quantified as a single value, certain indices have been proposed as proxies to represent this property: the mean weight diameter (MWD) and the geometric mean diameter (GMD) of soil aggregates [5,6]. These proxies provide insights into soil structure and resilience to erosion [7]. MWD measures the average size of soil aggregates, weighted by their mass. It is often used to evaluate the soil’s physical resistance to external forces such as water erosion and compaction. MWD values greater than 2 indicate more stable and robust soil aggregates [8], contributing to soil health and resistance to degradation [7]. GMD reflects the distribution of aggregate sizes, providing a finer assessment of soil structure by emphasizing the arrangement and connectivity of soil particles [9]. This measure is useful when evaluating soil porosity, aeration, and water infiltration capacity, which are crucial for plant growth and microbial activity [10]. Studies have shown that both MWD and GMD serve as essential proxies for predicting soil health, erosion susceptibility, and nutrient cycling efficiency, thus highlighting their relevance in soil conservation practices and land management efforts [8]. Specific to semi-arid regions, other proxies have been adopted as well [11,12,13,14,15,16]. For example, in some Australian regions with low organic matter, SAS is largely influenced by clay content. Here, wetting-and-drying cycles, especially with the addition of organic amendments like straw, can significantly enhance SAS [17]. 
Additionally, biological soil crusts (biocrusts)—composed of cyanobacteria, algae, fungi, and other microorganisms—play a crucial role in stabilizing soil aggregates, particularly in arid climates where they act as a biological agent to enhance SAS [18]. However, for humid conditions, their stabilizing effect may be overshadowed by the presence of vascular plants and higher organic matter content [18].

Given the importance of SAS for land management, it is important to quantify and map its distribution over vast landscapes. Digital soil mapping (DSM) has made this possible and more rapid than relying solely on conventional approaches [19]. DSM has previously been applied to understand the relationship between SAS indices and environmental covariates and to present their spatial and temporal variations at the landscape scale [20,21]. In DSM, freely accessible environmental covariates, such as those derived from remote sensing (RS) data, have frequently been used for such modeling tasks. DSM uses different machine learning algorithms (MLAs) to predict soil properties across space from observational data [22,23,24]. One reliable and widely applied MLA is the Random Forest (RF) algorithm. RF has demonstrated superior performance in soil property prediction and mapping over the years, as it can handle complex relationships between input and target variables, making it suitable for capturing nonlinear patterns in soil datasets [25]. RF is an ensemble approach that combines multiple decision trees, resulting in improved accuracy and robustness of model predictions [16,26,27,28,29]. DSM is widely used for mapping soils, with varying degrees of success. Despite this, there is still insufficient information on SAS modeling and mapping for soil management purposes in dryland regions [24,25].

To fill this fundamental knowledge gap, we aimed to estimate two critical indices of SAS (MWD and GMD) based on two MLAs, RF and k-nearest neighbor (k-NN), coupled with selected environmental covariates, in the Lapuee Plains of Southwestern Iran. Specifically, our study objectives were to (1) develop spatial prediction maps for SAS properties and their associated uncertainties; (2) evaluate the performance of different MLAs in predicting SAS indices; and (3) highlight the benefits of SAS maps for operational purposes.

2. Materials and Methods

Our study follows the sequential workflow described in Figure 1. The main steps were to (i) compile a database of soil observations of SAS indices for the topsoil layer (0 to 30 cm) coupled with the environmental covariates, (ii) select the effective environmental covariates through the recursive feature elimination (RFE) method, (iii) model SAS indices using the RF and k-NN MLAs under two scenarios, S1 [all covariates (RS indices, topographic attributes, and soil covariates)] and S2 (only topographic and RS covariates), (iv) evaluate the performance of each MLA based on conventional model validation criteria, and (v) map both SAS indices of interest, MWD and GMD, alongside their associated uncertainties. We adopted a scenario-based approach to examine how the same model would perform given a varying number of covariates. In particular, it allowed us to establish whether a smaller set of covariates would yield a comparably accurate, more parsimonious model.

2.1. Study Area Description

Our study area is located between 29°78′ and 29°88′N latitude and 52°68′ and 52°71′E longitude in the Lapuee Plains of Fars province, Southwestern Iran, and covers 5000 ha (Figure 2). The mean annual temperature and precipitation in this area are 17 °C and 446 mm, respectively, which corresponds to the BSh (semi-arid climate) class of the Köppen classification system [30]. The climatic data, covering the period from 2001 to 2021, were recorded at the Zarghan Meteorological Station and provided by the Iranian Meteorological Organization. The study area consists mainly of two soil orders, Entisols and Inceptisols. Soils here have relatively low organic matter, ranging from less than 1% to 3% [31], and tend to have low nutrient levels, which limits vegetation cover and sustenance. Nonetheless, in cultivated areas, farmers apply organic manure to improve soil aggregation, nutrient levels, and organic matter input. Soils here also have relatively low moisture owing to the climatic conditions (xeric moisture and thermic temperature regimes) [32]. The region has three main landscape types from south to north: Plain, Piedmont, and Hill lands. The Lapuee Plain is largely flat, with elevation ranging between 1591 and 1900 m above sea level (Figure 2). Over 85% of the area consists of Plain landscapes, less than 10% is Piedmont, and Hill lands cover less than 5% (Figure 2).

2.2. Field Study and Laboratory Analysis

The study area comprises three primary land-use types: irrigated agriculture, dry farming, and pasture lands. A total of 129 georeferenced points were sampled. At each point, surface soils were collected (0–30 cm depth) and transferred to the laboratory, where they were immediately air-dried following routine laboratory procedures.

Since our goal was to estimate SAS, we first used a standard sieving method to separate stable aggregates. The soil samples were each passed through a 6 mm sieve to remove debris. After that, 100 g of each soil sample was placed on top of a nest of sieves with openings covering the classes >2, 2, 1, 0.5, 0.25, and <0.053 mm. The nest of sieves was then placed in the water-filled container of a wet-sieving apparatus and shaken vertically for 5 min. After shaking, each aggregate size fraction was washed with water, oven-dried at 65 °C, and weighed [7,8]. From the weights of the different aggregate fractions, we then calculated GMD and MWD as follows:

\mathrm{MWD} = \sum_{i=1}^{n} w_{i} \bar{d}_{i} \qquad (1)

\mathrm{GMD} = \exp\left[\sum_{i=1}^{n} w_{i} \log \bar{d}_{i}\right] \qquad (2)

where \bar{d}_{i} is the arithmetic mean of the aggregate size on the ith sieve, w_i is the proportion of stable aggregates on the ith sieve, and n is the number of aggregate size groups.
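As an illustration of Equations (1) and (2), the following minimal Python sketch computes MWD and GMD from sieve fractions. It is not the authors' code. The function names, the mean diameters, and the fraction masses are hypothetical, and the logarithm in Equation (2) is assumed to be the natural logarithm so that it pairs with the exponential.

```python
import math

def proportions_from_masses(masses_g):
    """Convert oven-dry fraction masses (g) into proportions w_i that sum to 1."""
    total = sum(masses_g)
    return [m / total for m in masses_g]

def mwd(mean_diameters_mm, proportions):
    """Mean weight diameter, Eq. (1): sum of w_i * d_i."""
    return sum(w * d for w, d in zip(proportions, mean_diameters_mm))

def gmd(mean_diameters_mm, proportions):
    """Geometric mean diameter, Eq. (2): exp(sum of w_i * ln d_i).

    Assumption: 'Log' in Eq. (2) is read as the natural logarithm.
    """
    return math.exp(sum(w * math.log(d)
                        for w, d in zip(proportions, mean_diameters_mm)))

# Hypothetical mean aggregate diameters (mm) and fraction masses (g)
mean_d = [3.0, 1.5, 0.75, 0.375, 0.15]
masses = [20.0, 25.0, 25.0, 20.0, 10.0]
w = proportions_from_masses(masses)

print(f"MWD = {mwd(mean_d, w):.3f} mm, GMD = {gmd(mean_d, w):.3f} mm")
```

Because GMD is a weighted geometric mean and MWD a weighted arithmetic mean of the same diameters, GMD can never exceed MWD for a given sample, which offers a quick sanity check on laboratory results.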

For better interpretation, especially regarding the MWD, values were classified based on the criteria in [8] (Table 1).

Along with the measurements of GMD and MWD, we also assessed other physical properties of the soil known to influence SAS. These properties included bulk density (BD) and the Atterberg limits, specifically the Plastic Limit (PL), Liquid Limit (LL), Shrinkage Limit (SL), and plasticity index (PI). BD was measured via the cylinder method, and all Atterberg limits were measured following international ASTM standards and protocols (D 422 and D 4318). These properties provide valuable information about the behavior of soil under different moisture conditions, which is directly linked to the formation and stability of soil aggregates.

2.3. Environmental Covariates

We selected environmental covariates including topographic attributes, soil data, and RS data. As mentioned earlier, soil data were BD and Atterberg limit values. RS data were based on Landsat 8 (OLI/TIRS) satellite imagery, including individual bands and vegetation indices commonly used for predicting SAS, such as the Normalized Ratio Vegetation Index (NRVI) and Greenness (Green). Vegetation indices serve as proxies for vegetation cover, while individual spectral bands offer insights into the physical properties of land surface features pertinent to soils (i.e., soil texture, moisture content, and mineral composition) [26,33]. Topographic covariates like slope gradient, aspect, and microtopography influence SAS by affecting water redistribution, microclimatic conditions, vegetation cover, and exposure to environmental factors [16,34]. Steeper slopes are more prone to erosion, while microtopographic features can create localized variations in soil moisture and organic matter content [6,35].
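To make the idea of a band-derived vegetation index concrete, the sketch below computes two indices that are retained later in the study, the Difference Vegetation Index (DVI) and the Soil Adjusted Vegetation Index (SAVI), using their commonly cited definitions. It is only an illustration: the function names and reflectance values are hypothetical, the soil adjustment factor of 0.5 is the conventional default rather than a value reported here, and for Landsat 8 the red and near-infrared inputs would correspond to bands 4 and 5.

```python
def dvi(nir, red):
    """Difference Vegetation Index: NIR minus red reflectance."""
    return nir - red

def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index.

    SAVI = (NIR - red) * (1 + L) / (NIR + red + L), where L is the soil
    adjustment factor. L = 0.5 is an assumed default, not a value from
    the study.
    """
    return (nir - red) * (1.0 + soil_factor) / (nir + red + soil_factor)

# Hypothetical surface reflectance values for a single pixel
nir_reflectance = 0.40
red_reflectance = 0.10

print(f"DVI  = {dvi(nir_reflectance, red_reflectance):.2f}")
print(f"SAVI = {savi(nir_reflectance, red_reflectance):.2f}")
```

In a mapping workflow the same arithmetic would be applied cell by cell to the band rasters, producing one covariate layer per index.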

Topographic covariates were derived from the ASTER Global Digital Elevation Model (DEM) with a spatial resolution of 30 m. From the DEM, 33 topographic covariates were obtained using SAGA GIS software, version 7.4.0 [36]. For the spectral data, 20 covariates were obtained, including the ratios of visible, near-infrared, short-wave infrared, and thermal bands (2, 3, 4, 5, 6, 7, 10, and 11). For this purpose, Landsat 8 images (path/row: 163/39) with a spatial resolution of 30 m, corresponding to the soil sampling period, were used. The remote sensing data were pre-processed, including radiometric correction, in ERDAS IMAGINE (version 14) software. Additionally, the five corresponding soil covariates previously mentioned were measured in the laboratory. Finally, all environmental covariates were resampled to a spatial resolution of 30 m using the bilinear resampling method in R statistical software, version 4.2.1. We should highlight that each soil covariate was first mapped using ordinary kriging to obtain a wall-to-wall spatial map of that property.

2.4. Environmental Covariate Selection for Modeling Purposes

We selected covariates under two scenarios. In the first scenario (S1), we started from all the environmental covariates (RS, topographic, and soil covariates) and applied the recursive feature elimination (RFE) method, as implemented in the “caret” package in R, to select the most predictive set of environmental covariates for modeling the SAS indices (MWD and GMD). More detailed explanations of RFE can be found in [37]. For the second scenario (S2), only the RFE-selected RS and topographic covariates were used. This comparison allowed us to assess model performance with a larger versus a smaller set of covariates.

2.5. Modeling and Mapping Aspects

We implemented two MLAs, the k-nearest neighbors (k-NN) and random forest (RF), to predict MWD and GMD indices under the two scenarios. The main reason for using these two algorithms is that the RF has proven effective for mapping several soil properties with success in the past [38,39,40,41]. Meanwhile, k-NN, which is considered a simple algorithm, was only used to compare our results with those of the RF.

The RF algorithm is a well-known tree-based approach in which many decision trees are combined to generate a prediction [42]. Each tree is trained on a bootstrap sample of the original data (the in-bag set), and the samples left out of that bootstrap (the out-of-bag set) are used for internal cross-validation. The algorithm can capture non-linear and complex relationships between soil properties and environmental covariates. The RF algorithm is advantageous because it requires few parameters for training while being easy to implement [43]. It also reduces overall error by using subsets of randomly chosen input variables for each tree [44]. Its predictive power increases with the number of independent trees, and the random selection of variables decreases the correlation between them [45]. Two parameters were tuned to minimize the RMSE using the “randomForest” package in R software, version 4.2.1: the number of trees (ntree), varied from 100 to 1000 in steps of 50 [46], and the number of covariates tried at each split (mtry).
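The bootstrap and out-of-bag idea behind RF, together with the ntree search grid described above, can be sketched as follows. This is a minimal illustration in Python rather than the R “randomForest” workflow used in the study. The function name and random seed are hypothetical, and the sample size of 129 simply mirrors the number of sampling points.

```python
import random

def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample (with replacement) and return the
    in-bag indices plus the out-of-bag indices that were never drawn."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag

rng = random.Random(42)          # hypothetical seed, for reproducibility only
in_bag, oob = bootstrap_split(129, rng)

# ntree candidates: 100 to 1000 in steps of 50, as described in the text
ntree_grid = list(range(100, 1001, 50))

print(f"{len(set(in_bag))} unique in-bag samples, {len(oob)} out-of-bag samples")
print(f"{len(ntree_grid)} candidate ntree values")
```

On average roughly 37% of the samples end up out-of-bag for each tree, which is what allows RF to estimate its error internally without a separate hold-out set.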

The k-NN algorithm is a simple and easy-to-implement method used for both classification and regression tasks [47]. It predicts the value at a new location from the values of its nearest neighbors in covariate space, with the number of neighbors chosen to minimize model error [48]. The k-NN was tuned using the “caret” package in R software, with the optimal number of neighbors (k) selected to calibrate the model. This approach was chosen because of its simplicity and effectiveness. Based on these models, we further generated wall-to-wall maps of MWD and GMD together with their associated uncertainty estimates. For the uncertainty estimates, the bootstrap method with n = 100 bootstraps was applied. Specifically, we generated 100 maps for each of the SAS indices and averaged them.
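The k-NN prediction and the bootstrap procedure described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' “caret” workflow: the function names, the toy covariate values, and the MWD values are hypothetical, and the standard deviation across bootstrap predictions is used here as one common uncertainty measure, which the text does not specify.

```python
import math
import random
import statistics

def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to `query`
    (Euclidean distance in covariate space)."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def bootstrap_predictions(train_x, train_y, query, k, n_boot=100, seed=0):
    """Refit on n_boot bootstrap resamples and return the mean prediction
    and its standard deviation (assumed uncertainty measure)."""
    rng = random.Random(seed)
    n = len(train_x)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        boot_x = [train_x[i] for i in idx]
        boot_y = [train_y[i] for i in idx]
        preds.append(knn_predict(boot_x, boot_y, query, k))
    return statistics.mean(preds), statistics.stdev(preds)

# Hypothetical one-covariate training set and MWD-like targets (mm)
train_x = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
train_y = [0.0, 1.0, 2.0, 3.0, 4.0]

point = knn_predict(train_x, train_y, (2.0,), k=3)
mean_pred, sd_pred = bootstrap_predictions(train_x, train_y, (2.0,), k=3)
print(f"k-NN prediction: {point:.2f}; bootstrap mean: {mean_pred:.2f}, SD: {sd_pred:.2f}")
```

Applied to every grid cell, the bootstrap mean would give the prediction map and the spread would give the uncertainty map. Because k-NN relies on distances, covariates on very different scales would normally be standardized first.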

Having randomly divided the dataset into calibration (80%) and validation (20%) sets, three common model evaluation metrics including the coefficient of determination (R²), root mean square error (RMSE), and normalized root mean squared error (nRMSE) were used to assess model accuracy (equations below).

R^{2} = 1 - \frac{\sum_{i=1}^{n} (a_{i} - b_{i})^{2}}{\sum_{i=1}^{n} (a_{i} - \bar{a})^{2}} \qquad (3)

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (a_{i} - b_{i})^{2}} \qquad (4)

\mathrm{nRMSE} = \frac{\mathrm{RMSE}}{\bar{a}} \qquad (5)

where a_i is the observed value, b_i is the predicted value, \bar{a} is the mean of the observed values, and n is the total number of observations.
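The three validation metrics can be computed directly from paired observed and predicted values, as in the minimal Python sketch below. The function names and the example values are hypothetical, and the normalization follows the definition above, in which RMSE is divided by the mean of the observed values.

```python
import math

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((a - b) ** 2 for a, b in zip(observed, predicted))
    ss_tot = sum((a - mean_obs) ** 2 for a in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(observed, predicted)) / n)

def nrmse(observed, predicted):
    """RMSE normalized by the mean of the observed values."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))

# Hypothetical validation values (mm)
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 5.0]

print(f"R2 = {r_squared(obs, pred):.2f}, "
      f"RMSE = {rmse(obs, pred):.2f}, nRMSE = {nrmse(obs, pred):.2f}")
```

Because nRMSE is scaled by the observed mean, it allows errors for MWD and GMD to be compared even though the two indices differ in magnitude.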

3. Results and Discussion

3.1. Summary Statistics of Soil Variables and SAS Indices

A statistical summary of the SAS indices is presented in Table 2. MWD ranged from 0.24 to 2.25 mm and GMD from 0.009 to 0.81 mm, with mean values of 1.64 and 0.48 mm, respectively. BD ranged from 0.36 to 1.95 g·cm⁻³, with a mean value of 1.14 g·cm⁻³. According to Hazelton and Murphy [49], this mean BD places the soils in the low BD rating class (1 < BD < 1.3 g·cm⁻³). Meanwhile, the Shrinkage Limit (SL), Plastic Limit (PL), Liquid Limit (LL), and Plasticity Index (PI) ranged from 10.1 to 45.0, 18.6 to 49.2, 30.0 to 73.0, and 2.4 to 31.9%, respectively. Moreover, the coefficient of variation (CV%) indicated high variability for MWD, BD, and PI and moderate variability for GMD, SL, PL, and LL, according to the classes proposed by [45].

3.2. The Importance of Environmental Covariates and Modeling Performance

Based on the output generated by the RFE method, 15 out of the total 58 environmental covariates were selected as the most important predictors of SAS. Carefully selecting a subset of covariates has been shown to improve model outcomes compared with using all available covariates [50]. The selected covariates fall into three groups. The first group consisted of five indices obtained from RS data: the Normalized Ratio Vegetation Index (NRVI), Greenness (Green), Difference Vegetation Index (DVI), Soil Adjusted Vegetation Index (SAVI), and Transformed Vegetation Index (TVI). The second group included five parameters derived from topographic attributes: Channel Network Base Level (CNBL), Watershed Basins (WB), Valley Depth (VD), Diffuse Insolation (Diff), and Standardized Height (SH). The third group comprised five soil covariates: BD, PL, SL, LL, and PI (see Table 3). The relative importance of the selected environmental covariates in predicting GMD and MWD under the two scenarios is presented in Figure 3 and Figure 4. Based on the ranking of environmental covariates, topographic attributes showed the greatest importance in predicting SAS indices (Figure 3 and Figure 4), which indicates the dominant influence of topography on the spatial variability of SAS indices in the study area. Topographic factors such as Channel Network Base Level, Watershed Basins, Valley Depth, and Diffuse Insolation play crucial roles in soil aggregate stability by influencing soil moisture, erosion patterns, nutrient cycling, and microclimate conditions. Channel Network structure regulates soil moisture and erosion, essential for SAS [51], while Base Level influences water table depth and aggregation processes [51]. Additionally, Watershed Basins shape water flow and nutrient cycles, affecting soil structure [52], and Valley Depth creates microclimatic conditions, thus altering soil cohesion [53]. 
Meanwhile, Diffuse Insolation modulates soil temperature and moisture, affecting microbial activity critical for soil aggregation [54]. Overall, among all the soil and environmental covariates, the two topographic covariates, CNBL and Diff, had the highest influence on the modeling process for SAS indices (Figure 3 and Figure 4). The CNBL has the highest influence on both GMD and MWD in the study area (Figure 3 and Figure 4). CNBL, representing topography, plays a significant role in SAS indices, aligning with SOC patterns observed in the area due to soil erosion and deposition [55,56].

Topographic attributes, which describe the land physiography, are widely applied in DSM [57,58,59,60]. We observe that these attributes drive SAS variability in the area, influencing soil moisture regimes, organic matter, microbial diversity and activity, and plant cover, as well as surface features known to influence runoff and erosion. All these directly or indirectly influence the formation and stability of soil structure. The variability and distribution of SAS indices have also been shown to be dependent on the abovementioned topographic covariates [6,61]. Meanwhile, soil erosion results in the deposition of surface soil, changes in micro-topographic characteristics, and the composition of soil properties, all of which affect SAS stability [29,62,63,64]. Moreover, research by Parker et al. [53] highlighted that elevation stands out as the most crucial terrain factor influencing the spatial distribution of vegetation. Vegetation, in turn, acts as a source of SOM, directly contributing to the formation and stabilization of soil aggregates [6,54]. Also, topographic covariates are important factors in controlling the effects of climate, water availability, lithological dissimilarity, and biota in soil formation [65,66]. Topographic and climatic variables control the state of soil water, and the dynamics of plant litter mineralization, erosion, and deposition [46,67]. Several studies have confirmed the importance of topographic covariates such as elevation, slope, and aspect [2,57] in the prediction of soil properties. Millard and Richardson [68] also showed the importance of topographic attributes in predicting and mapping soil properties. Bouslihim et al. [21] specifically stated that topography and geology are the most crucial parameters in the spatial prediction of SAS. The results based on the two scenarios indicate that different environmental covariate combinations affect the prediction of SAS indices. 
The relative importance analysis (Table 3) showed that the presence of three sets of environmental covariates, shown in S1, increased the RF and k-NN performances. The results validate the significance and benefit of soil covariates in influencing the variability of SAS indices in the study area. Soil covariates were ranked lower in importance in the RF algorithm compared with the k-NN algorithm because RF evaluates variable significance based on their contribution to improving node purity across multiple decision trees, which could dilute the impact of individual variables. In contrast, k-NN directly uses soil covariates to calculate distances between data points, making these covariates more prominent in determining predictions. This difference arises because RF averages the importance across many trees, while k-NN relies on immediate distance metrics, highlighting the soil covariates’ significance more clearly (Figure 3a,c and Figure 4a,c). Therefore, soil covariates showed a high ability to predict the changes in some soil secondary attributes, such as SAS indices, and demonstrated acceptable results in improving modelling performance [24]. Kamamia et al. [6] found that soil covariates have an important role in MWD prediction. Our results, in agreement with Zeraatpisheh et al. [24], show that soil covariates have the most important effect in predicting SAS compared with RS indices. DVI and SH had relatively greater effects when predicting MWD, whereas DVI, SH, and WB covariates had lesser effects when predicting GMD (Figure 3 and Figure 4).

In evaluating the performance of our RF and k-NN algorithms (Table 4, also see Supplementary Figure S1), the results showed that S1 was the better of the two scenarios (S1 and S2) for predicting both GMD and MWD. We link this improvement in model performance to the integration of soil covariates, which are crucial and should be considered in SAS prediction. Altogether, both algorithms succeeded in modeling SAS. Overall, the RF algorithm resulted in higher accuracy for the SAS indices (R² = 0.70 for GMD and R² = 0.72 for MWD) than the k-NN algorithm (R² = 0.62 for GMD and R² = 0.63 for MWD). Kamamia et al. [6] predicted GMD and MWD indices via the RF algorithm with R² values of 0.50 and 0.39, respectively, lower than those in the current study. The main advantages of RF are its high capability to generate robust estimates, its ability to model non-linear relationships between the response and independent variables, and its partial interpretability, associated with the ability to derive relative importance estimates for each covariate applied in the model [69]. Applying the RF algorithm, Zeraatpisheh et al. [24] also reported comparable R² values of 0.50 to 0.75 while predicting SAS indices in another semi-arid region of Iran. In contrast, higher R² values than ours (0.65 < R² < 0.8) were reported by Bouslihim et al. [21] in Morocco, also using the RF algorithm.

3.3. Spatial Prediction and Uncertainty Maps

Predicted maps of SAS indices (MWD and GMD) based on both RF and k-NN algorithms are presented in Figure 5.

Under the best scenario (S1), both MLAs produced comparable overall spatial patterns of MWD and GMD, with low SAS predicted for the highlands and high SAS for the lowlands. Nonetheless, on closer inspection, the k-NN maps contain several artifacts that are mainly introduced by the covariates used here (see Supplementary Figures S2 and S3). The regions with the highest values of MWD (>2 mm) and GMD (0.70 mm) coincide with irrigated croplands. In contrast, the lowest values of MWD (1.2 mm) and GMD (0.2 mm) were found at the northern and southern boundaries, where pastures with sparse vegetation were strongly affected by high erosion rates. The central parts of the study region have mostly flat landscapes (lowland), and the depositional (alluvial) material is assumed to have been eroded from the higher areas (upland) and deposited here. Therefore, the accumulation of fine soil texture components such as clay could have led to a higher SOC stock and eventually increased SAS levels in these soils. Some physicochemical properties such as clay, SOC, and calcium carbonate equivalent (CCE) act as binding agents that improve soil structure by forming large and stable aggregates [70]. The study area is mainly on the Piedmont plain and Hill land landscapes. Comparing these two landscapes, our SAS indicators show lower soil stability in the highlands, whereas the lowlands, where land-use activities (e.g., crop rotation) take place, show higher soil stability. Variability in soil structure and quality is linked to variations in land-use types, which are directly correlated with soil aggregate stability [71]. Furthermore, the type of agricultural land, the amount of fertilizer applied, and the type of crops can influence SAS [72], which is also relevant to our study area. Agricultural activities are known to encourage higher rooting density and reduce erosion rates. 
Under these conditions, the level of SOC and the percentage of stable aggregates are expected to be high. Our results agree with those of Kamamia et al. [6] and Ostovari et al. [70]. Given the land management in the study area, the higher SAS in agricultural lands may relate to agricultural practices that involve adding crop residues to the soil [73]. This process promotes the formation of microaggregates, which then contribute to the development of macroaggregates through the influence of transient organic binding agents such as polysaccharides, roots, and fungal hyphae [74]. Additionally, bioturbation by earthworms and other soil micro- and macrofauna plays a crucial role in this aggregation process. Earthworms and soil fauna enhance soil structure by physically mixing the soil and organic matter, which accelerates the formation of microaggregates and their subsequent consolidation into macroaggregates. Their activity also facilitates the distribution of organic binding agents throughout the soil, further supporting the development and stability of soil aggregates [75]. In pastures, sparse vegetation may result in weaker soil structure and low structural stability, as vegetation plays a crucial role in protecting the soil from erosion and maintaining aggregate stability [76].

Additionally, a histogram analysis of MWD and GMD across the entire study area revealed that their frequency distributions were approximately normal, with a mean MWD of 1.64 mm and a mean GMD of 0.48 mm (Figure 6), matching the means of the observed data set (Table 2). Le Bissonnais [8] classified the stability of aggregates into five classes, “very unstable”, “unstable”, “medium”, “stable”, and “very stable”, for MWD values of <0.4, 0.4–0.8, 0.8–1.3, 1.3–2, and >2 mm, respectively. Based on this classification, most of the soils in our study area are classified as “stable” and “very stable”, which implies they are not prone to a high risk of soil degradation.
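The MWD-based stability classes can be applied to any measured or predicted value with a simple lookup, sketched below in Python. The function name is hypothetical, and the treatment of values that fall exactly on a class boundary is an assumption, since the class limits as quoted do not specify it.

```python
def stability_class(mwd_mm):
    """Assign an aggregate stability class from MWD (mm), using the
    class limits attributed to Le Bissonnais [8] in the text.

    Assumption: a value exactly on a limit goes to the higher class,
    except 2.0 mm, which is kept in "stable" because "very stable"
    is defined as > 2 mm.
    """
    if mwd_mm < 0.4:
        return "very unstable"
    if mwd_mm < 0.8:
        return "unstable"
    if mwd_mm < 1.3:
        return "medium"
    if mwd_mm <= 2.0:
        return "stable"
    return "very stable"

# Minimum, mean, and maximum MWD reported for the data set (mm)
for value in (0.24, 1.64, 2.25):
    print(f"MWD = {value:.2f} mm -> {stability_class(value)}")
```

Applied to the reported statistics, the mean MWD of 1.64 mm falls in the “stable” class, while the minimum and maximum fall in “very unstable” and “very stable”, respectively.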

Our uncertainty maps for SAS indices based on the two models are presented in Figure 7 and Figure 8. The lowest uncertainty was observed from the center to the northern part of the study region, where it was 0.003–0.005 mm for MWD and 0.002 mm for GMD (Figure 7 and Figure 8). Given the low level of uncertainty in this area, the high density of observation points, and the strong correlation between the environmental covariates and surface SAS indices, the RF algorithm was chosen as the superior method for predicting these indices. Baltensweiler et al. [77] also reported the effect of observation density on predictive model uncertainty, showing that increased data density generally leads to reduced uncertainty in predictions. However, in some parts of the northern and southern regions, the uncertainty was slightly higher, which may be a consequence of sedimentation conditions in these areas. Furthermore, the uncertainty maps and the applied statistical indices showed that the S2 scenario has slightly higher uncertainty than the S1 scenario (Table 4, Figure 7, and Figure 8). Since the S2 scenario excluded the soil covariates and only used the other environmental covariates as predictors, this may have resulted in lower prediction accuracy, which in turn led to higher associated uncertainty compared with the S1 scenario.

The uncertainty results for the S1 scenario showed that the uncertainty for MWD and GMD ranged from 0.005 to 0.15 mm and 0.002 to 0.09 mm, respectively. Shahini Shamsabadi et al. [78] stated that to make better decisions about the quality of prediction maps, the results of model uncertainty should also be considered alongside the model validation statistics. The quantification of model uncertainty is an essential step in DSM if the spatial prediction maps of soil properties are to be applied to management and decision-making procedures [79]. Overall, the uncertainty analysis showed that the maps of MWD and GMD performed well under both scenarios. Therefore, the current map products can be considered useful for decision-making related to agricultural management and ecosystem services monitoring, a significant benefit of DSM already highlighted in several studies [80,81].

Table 3. Environmental predictors used to predict soil aggregate stability (SAS) indices in the present study (detailed descriptions are given in Table S1).

Environmental Variables	Source	Abbreviation	Definition	Scorpan Factors	Reference
Topographic	DEM	CNBL	Channel Network Base Level	Topographic (r)	[54,82,83,84,85]
		WB	Watershed Basins	Topographic (r)
		VD	Valley Depth	Topographic (r)
		Diff	Diffuse Insolation	Topographic (r)
		SH	Standardized Height	Topographic (r)
Soil	LAB	BD	Bulk density	Soil (s)	[86]
		PI	Plasticity Index	Soil (s)
		SL	Shrinkage Limit	Soil (s)
		LL	Liquid Limit	Soil (s)
		PL	Plastic Limit	Soil (s)
RS indices	Landsat 8 images	NRVI	Normalized Ratio Vegetation Index	Organism (o, s)	[51,87,88,89]
		Green	Greenness	Organism (o, s)
		DVI	Difference Vegetation Index	Organism (o)
		SAVI	Soil Adjusted Vegetation Index	Organism (o, s)
		TVI	Transformed Vegetation Index	Organism (o)

Note: r, s, o, and p represent relief, soil, organism, and parent materials in the SCORPAN model, respectively.

Table 4. Statistical criteria of model performance for estimating aggregate stability indices using machine learning (ML) algorithms under two scenarios.

		MWD (mm)			GMD (mm)
Scenarios (S)	ML Algorithm	R²	RMSE	NRMSE	R²	RMSE	NRMSE
S1 (RS + Topographic and Soil covariates)	RF	0.72	0.16	0.09	0.7	0.09	0.18
S1 (RS + Topographic and Soil covariates)	k-NN	0.63	0.19	0.11	0.62	0.07	0.14
S2 (RS + Topographic covariates)	RF	0.69	0.13	0.07	0.69	0.12	0.25
S2 (RS + Topographic covariates)	k-NN	0.58	0.22	0.13	0.59	0.1	0.20

Note: RF: Random Forest; k-NN: k-nearest neighbor; MWD: mean weight diameter; GMD: geometric mean diameter; RS: remote sensing; R²: coefficient of determination; RMSE: root mean square error; NRMSE: normalized root mean square error.
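The three validation statistics in Table 4 can be sketched as follows. This is an illustrative implementation under stated assumptions: R² is computed as 1 − SS_res/SS_tot (the squared Pearson correlation is another common definition, and this section does not say which was used), and NRMSE is taken as RMSE divided by the mean of the observations. Dividing the reported RMSE by the means in Table 2 gives values close to the reported NRMSE (for example, 0.09 / 0.48 ≈ 0.19 against 0.18 for GMD with RF under S1), but the normalization remains an assumption. The data below are toy values.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the units of the target variable."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def nrmse(observed, predicted):
    """RMSE divided by the observed mean (assumed normalization)."""
    mean_obs = sum(observed) / len(observed)
    return rmse(observed, predicted) / mean_obs


# Toy validation set (not study data)
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

r2_value = r_squared(observed, predicted)
rmse_value = rmse(observed, predicted)
nrmse_value = nrmse(observed, predicted)
```

Because NRMSE is unitless, it allows the MWD and GMD models to be compared even though the two indices have different magnitudes.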

3.4. Strengths and Limitations of the Research

Innovative use of machine learning: This study employs advanced machine learning algorithms (RF and k-NN) to create spatial prediction maps for MWD and GMD. This approach enhances the efficiency and accuracy of SAS assessments compared with traditional, labor-intensive methods.

Practical application for sustainable management: The resulting digital SAS maps serve as valuable tools for sustainable soil and natural resource management. They facilitate timely interventions to combat soil degradation, particularly in semi-arid regions, thus promoting sustainable land-use practices.

Limited generalizability: The findings are specific to the Lapuee Plain and may not be directly applicable to other regions with different soil and environmental conditions.

Data collection constraints: The time-consuming and expensive nature of experimental and laboratory methods for measuring SAS indices may have limited the scope and scale of the study, potentially impacting the representativeness of the results.

Model transferability: While the ML algorithms used in the study were promising, their performance in predicting SAS indices in other regions with distinct soil-forming factors remains uncertain and requires further exploration.

Incomplete consideration of factors: This study focused on topographic covariates, remote sensing indices, and soil covariates, potentially overlooking other influential factors that could affect soil aggregate stability, such as land-use practices or climate variability.

4. Conclusions

In conclusion, this study examined the topsoil variations of soil aggregate stability (SAS) indices in the Lapuee Plain of Shiraz, Iran. Because experimental and laboratory methods for measuring SAS indices (MWD and GMD) are time-consuming and expensive, this study aimed to predict them across the study area. The results showed that scenario S1, which incorporated topographic attributes (CNBL, WB, VD, Diff, and SH), remote sensing (RS) indices (NRVI, Green, DVI, SAVI, and TVI), and soil covariates (BD, PL, SL, LL, and PI), resulted in better accuracy (Table 4). Some of the topographic attributes emerged as the most influential factors in explaining the spatial variation of SAS indices, followed by soil covariates. Our results confirmed that topographic covariates, by controlling the state of soil water, the dynamics of plant litter mineralization, erosion, and deposition, can be used to predict soil properties even in low-relief areas. The performance of the two ML algorithms was evaluated based on their prediction accuracy and associated uncertainty. Both algorithms demonstrated acceptable performance, but the RF algorithm slightly outperformed the k-NN algorithm, showing higher accuracy and lower uncertainty as indicated by the model validation statistics used.

The high-resolution maps of SAS indices generated for the Lapuee Plain in southwestern Iran are important for understanding the soil’s ability to resist erosion, maintain fertility, and support agricultural productivity. By assessing SAS, researchers and land managers can identify areas at risk of degradation, especially in the northern and southern parts of the study area, implement targeted conservation practices, and make appropriate, science-based management decisions to sustain soil health and productivity. This knowledge is crucial for developing effective soil management strategies and, most importantly, for preserving soils.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs16224304/s1, Table S1. Description of selected topography and remote sensing indices. Figure S1. Scatter plot for (a) MWD and (b) GMD based on the best model (RF) and best scenario (S1). Figure S2. Map of environmental variables. Figure S3. Map of soil variables.

Author Contributions

Conceptualization, P.K. and A.A.M.; methodology, P.K. and A.A.M.; software, P.K.; validation, P.K. and S.R.M.; formal analysis, P.K. and N.M.K.; resources, A.A.M. and M.B.; data curation, P.K.; writing—P.K.; writing—review and editing A.A.M., T.S., M.B., S.R.M. and N.M.K.; visualization, S.R.M.; supervision, A.A.M. and T.S.; project administration, A.A.M. and T.S. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by “Shiraz University, and the Faculty of Agriculture”. Shiraz University: 99GCB1M148056.

Data Availability Statement

Data are available upon reasonable email request to the corresponding author. The data are not publicly available due to ethical reasons.

Acknowledgments

The authors express their deep gratitude to Shiraz University for providing the financial support necessary to conduct this research. They also sincerely thank the editor and reviewers involved in the publication process. NMK was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy—EXC number 2064/1—Project number 390727645.

Conflicts of Interest

The remaining authors declare that the research was conducted without any commercial or financial relationships that could be construed as potential conflicts of interest.

References

Barthès, B.; Roose, E. Aggregate Stability as an Indicator of Soil Susceptibility to Runoff and Erosion; Validation at Several Levels. CATENA 2002, 47, 133–149.
Cantón, Y.; Solé-Benet, A.; De Vente, J.; Boix-Fayos, C.; Calvo-Cases, A.; Asensio, C.; Puigdefábregas, J. A Review of Runoff Generation and Soil Erosion across Scales in Semiarid South-Eastern Spain. J. Arid Environ. 2011, 75, 1254–1261.
Khosravani, P.; Baghernejad, M.; Moosavi, A.A.; Rezaei, M. Digital Mapping and Spatial Modeling of Some Soil Physical and Mechanical Properties in a Semi-Arid Region of Iran. Environ. Monit. Assess. 2023, 195, 1367.
Mustafa, A.; Minggang, X.; Ali Shah, S.A.; Abrar, M.M.; Nan, S.; Baoren, W.; Zejiang, C.; Saeed, Q.; Naveed, M.; Mehmood, K.; et al. Soil Aggregation and Soil Aggregate Stability Regulate Organic Carbon and Nitrogen Storage in a Red Soil of Southern China. J. Environ. Manag. 2020, 270, 110894.
Khanifar, J.; Khademalrasoul, A. Effects of Neighborhood Analysis Window Forms and Derivative Algorithms on the Soil Aggregate Stability—Landscape Modeling. CATENA 2021, 198, 105071.
Kamamia, A.W.; Vogel, C.; Mwangi, H.M.; Feger, K.-H.; Sang, J.; Julich, S. Mapping Soil Aggregate Stability Using Digital Soil Mapping: A Case Study of Ruiru Reservoir Catchment, Kenya. Geoderma Reg. 2021, 24, e00355.
Kemper, W.D.; Rosenau, R.C. Aggregate Stability and Size Distribution. In Methods of Soil Analysis; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 1986; pp. 425–442. ISBN 978-0-89118-864-3.
Le Bissonnais, Y. Aggregate Stability and Assessment of Soil Crustability and Erodibility: I. Theory and Methodology. Eur. J. Soil Sci. 2016, 67, 11–21.
Hillel, D. Environmental Soil Physics: Fundamentals, Applications, and Environmental Considerations; Elsevier Science: Amsterdam, The Netherlands, 2014.
Six, J.; Elliott, E.T.; Paustian, K. Soil Macroaggregate Turnover and Microaggregate Formation: A Mechanism for C Sequestration under No-Tillage Agriculture. Soil Biol. Biochem. 2000, 32, 2099–2103.
Bird, S.B.; Herrick, J.E.; Wander, M.M.; Wright, S.F. Spatial Heterogeneity of Aggregate Stability and Soil Carbon in Semi-Arid Rangeland. Environ. Pollut. 2002, 116, 445–455.
Gavili, E.; Moosavi, A.A.; Moradi Choghamarani, F. Cattle Manure Biochar Potential for Ameliorating Soil Physical Characteristics and Spinach Response under Drought. Arch. Agron. Soil Sci. 2018, 64, 1714–1727.
Okolo, C.C.; Gebresamuel, G.; Zenebe, A.; Haile, M.; Eze, P.N. Accumulation of Organic Carbon in Various Soil Aggregate Sizes under Different Land Use Systems in a Semi-Arid Environment. Agric. Ecosyst. Environ. 2020, 297, 106924.
Mozaffari, H.; Akbar Moosavi, A.; Ostovari, Y.; Cornelis, W. Comparing Visible-Near-Infrared Spectroscopy with Classical Regression Pedotransfer Functions for Predicting Near-Saturated and Saturated Hydraulic Conductivity of Calcareous Soils. J. Hydrol. 2022, 613, 128412.
Zahedifar, M. Assessing Alteration of Soil Quality, Degradation, and Resistance Indices under Different Land Uses through Network and Factor Analysis. CATENA 2023, 222, 106807.
Khosravani, P.; Baghernejad, M.; Moosavi, A.A.; FallahShamsi, S.R. Digital Mapping to Extrapolate the Selected Soil Fertility Attributes in Calcareous Soils of a Semiarid Region in Iran. J. Soils Sediments 2023, 23, 4032–4054.
Wagner, S.; Cattle, S.R.; Scholten, T. Soil-aggregate Formation as Influenced by Clay Content and Organic-matter Amendment. Z. Pflanzenernähr. Bodenk. 2007, 170, 173–180.
Riveras-Muñoz, N.; Seitz, S.; Witzgall, K.; Rodríguez, V.; Kühn, P.; Mueller, C.W.; Oses, R.; Seguel, O.; Wagner, D.; Scholten, T. Biocrust-Linked Changes in Soil Aggregate Stability along a Climatic Gradient in the Chilean Coastal Range. Soil 2022, 8, 717–731.
Minasny, B.; McBratney, A.B.; Lark, R.M. Digital Soil Mapping Technologies for Countries with Sparse Data Infrastructures. In Digital Soil Mapping with Limited Data; Hartemink, A.E., McBratney, A., Mendonça-Santos, M.D.L., Eds.; Springer: Dordrecht, The Netherlands, 2008; pp. 15–30. ISBN 978-1-4020-8591-8.
Minasny, B.; McBratney, A.B. Digital Soil Mapping: A Brief History and Some Lessons. Geoderma 2016, 264, 301–311.
Bouslihim, Y.; Rochdi, A.; Aboutayeb, R.; El Amrani-Paaza, N.; Miftah, A.; Hssaini, L. Soil Aggregate Stability Mapping Using Remote Sensing and GIS-Based Machine Learning Technique. Front. Earth Sci. 2021, 9, 748859.
Lagacherie, P.; McBratney, A.B. Chapter 1 Spatial Soil Information Systems and Spatial Soil Inference Systems: Perspectives for Digital Soil Mapping. In Developments in Soil Science; Elsevier: Amsterdam, The Netherlands, 2006; Volume 31, pp. 3–22. ISBN 978-0-444-52958-9.
Khanifar, J.; Khademalrasoul, A.; Amerikhah, H. Modelling of soil aggregate stability as an index of soil erodibility using geomorphometric parameters. Agric. Eng. 2020, 43, 49–64.
Zeraatpisheh, M.; Ayoubi, S.; Mirbagheri, Z.; Mosaddeghi, M.R.; Xu, M. Spatial Prediction of Soil Aggregate Stability and Soil Organic Carbon in Aggregate Fractions Using Machine Learning Algorithms and Environmental Variables. Geoderma Reg. 2021, 27, e00440.
Khaledian, Y.; Miller, B.A. Selecting Appropriate Machine Learning Methods for Digital Soil Mapping. Appl. Math. Model. 2020, 81, 401–418.
Browning, D.M.; Duniway, M.C. Digital Soil Mapping in the Absence of Field Training Data: A Case Study Using Terrain Attributes and Semiautomated Soil Signature Derivation to Distinguish Ecological Potential. Appl. Environ. Soil Sci. 2011, 2011, 421904.
Rezaei, M.; Mousavi, S.R.; Rahmani, A.; Zeraatpisheh, M.; Rahmati, M.; Pakparvar, M.; Jahandideh Mahjenabadi, V.A.; Seuntjens, P.; Cornelis, W. Incorporating Machine Learning Models and Remote Sensing to Assess the Spatial Distribution of Saturated Hydraulic Conductivity in a Light-Textured Soil. Comput. Electron. Agric. 2023, 209, 107821.
Ayoubi, S.; Mokhtari Karchegani, P.; Mosaddeghi, M.R.; Honarjoo, N. Soil Aggregation and Organic Carbon as Affected by Topography and Land Use Change in Western Iran. Soil Tillage Res. 2012, 121, 18–26.
Ye, L.; Tan, W.; Fang, L.; Ji, L.; Deng, H. Spatial Analysis of Soil Aggregate Stability in a Small Catchment of the Loess Plateau, China: I. Spatial Variability. Soil Tillage Res. 2018, 179, 71–81.
Köppen, W. Das Geographische System der Klimate. In Handbuch der Klimatologie; Gebrüder Borntraeger: Berlin, Germany, 1936.
Khosravani, P.; Baghernejad, M.; Taghizadeh-Mehrjardi, R.; Mousavi, S.R.; Moosavi, A.A.; Fallah Shamsi, S.R.; Shokati, H.; Kebonye, N.M.; Scholten, T. Assessing the Role of Environmental Covariates and Pixel Size in Soil Property Prediction: A Comparative Study of Various Areas in Southwest Iran. Land 2024, 13, 1309.
Van Wambeke, A.R. The Newhall Simulation Model for Estimating Soil Moisture and Temperature Regimes. Department of Crop and Soil Sciences, Cornell University: Ithaca, NY, USA, 2000.
Taghizadeh-Mehrjardi, R.; Schmidt, K.; Toomanian, N.; Heung, B.; Behrens, T.; Mosavi, A.; Band, S.S.; Amirian-Chakan, A.; Fathabadi, A.; Scholten, T. Improving the Spatial Prediction of Soil Salinity in Arid Regions Using Wavelet Transformation and Support Vector Regression Models. Geoderma 2021, 383, 114793.
Taghizadeh-Mehrjardi, R.; Emadi, M.; Cherati, A.; Heung, B.; Mosavi, A.; Scholten, T. Bio-Inspired Hybridization of Artificial Neural Networks: An Application for Mapping the Spatial Distribution of Soil Texture Fractions. Remote Sens. 2021, 13, 1025.
Mousavi, S.R.; Sarmadian, F.; Angelini, M.E.; Bogaert, P.; Omid, M. Cause-Effect Relationships Using Structural Equation Modeling for Soil Properties in Arid and Semi-Arid Regions. CATENA 2023, 232, 107392.
Olaya, V.; Conrad, O. Chapter 12 Geomorphometry in SAGA. In Developments in Soil Science; Elsevier: Amsterdam, The Netherlands, 2009; Volume 33, pp. 293–308. ISBN 978-0-12-374345-9.
Kuhn, M.; Johnson, K. A Short Tour of the Predictive Modeling Process. In Applied Predictive Modeling; Springer: Berlin, Germany, 2013; pp. 19–26.
Bouslihim, Y.; John, K.; Miftah, A.; Azmi, R.; Aboutayeb, R.; Bouasria, A.; Razouk, R.; Hssaini, L. The Effect of Covariates on Soil Organic Matter and pH Variability: A Digital Soil Mapping Approach Using Random Forest Model. Ann. GIS 2024, 30, 215–232.
Suleymanov, A.; Gabbasova, I.; Komissarov, M.; Suleymanov, R.; Garipov, T.; Tuktarova, I.; Belan, L. Random Forest Modeling of Soil Properties in Saline Semi-Arid Areas. Agriculture 2023, 13, 976.
Van der Westhuizen, S.; Heuvelink, G.B.M.; Hofmeyr, D.P. Multivariate Random Forest for Digital Soil Mapping. Geoderma 2023, 431, 116365.
Amin, M.E.S.; Abdelfattah, M.A.; Mohamed, E.S.; Nabil, M.; Belal, A.A.; Ahmed, S.; Samir, E.; Mahmoud, A.G. Sentinel-2 Satellite Imagery for Retrieving and Mapping Soil Properties Using Machine Learning. In Applications of Remote Sensing and GIS Based on an Innovative Vision, Proceedings of the First International Conference of Remote Sensing and Space Sciences Applications, Hurghada, Egypt, 8–10 December 2022; Gad, A.A., Elfiky, D., Negm, A., Elbeih, S., Eds.; Springer Nature: Cham, Switzerland, 2023; pp. 281–288.
Emadi, M.; Taghizadeh-Mehrjardi, R.; Cherati, A.; Danesh, M.; Mosavi, A.; Scholten, T. Predicting and Mapping of Soil Organic Carbon Using Machine Learning Algorithms in Northern Iran. Remote Sens. 2020, 12, 2234.
Mousavi, S.R.; Jahandideh Mahjenabadi, V.A.; Khoshru, B.; Rezaei, M. Spatial Prediction of Winter Wheat Yield Gap: Agro-Climatic Model and Machine Learning Approaches. Front. Plant Sci. 2024, 14, 1309171.
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
Ließ, M.; Glaser, B.; Huwe, B. Uncertainty in the Spatial Prediction of Soil Texture. Geoderma 2012, 170, 70–79.
Hengl, T.; Heuvelink, G.B.M.; Kempen, B.; Leenaars, J.G.B.; Walsh, M.G.; Shepherd, K.D.; Sila, A.; MacMillan, R.A.; Mendes De Jesus, J.; Tamene, L.; et al. Mapping Soil Properties of Africa at 250 m Resolution: Random Forests Significantly Improve Current Predictions. PLoS ONE 2015, 10, e0125814.
Wang, C.; Shi, Y.; Fan, X.; Shao, M. Attribute Reduction Based on K-Nearest Neighborhood Rough Sets. Int. J. Approx. Reason. 2019, 106, 18–31.
Mousavi, S.R.; Sarmadian, F.; Omid, M.; Bogaert, P. Three-Dimensional Mapping of Soil Organic Carbon Using Soil and Environmental Covariates in an Arid and Semi-Arid Region of Iran. Measurement 2022, 201, 111706.
Hazelton, P.; Murphy, B. Interpreting Soil Test Results: What Do All the Numbers Mean? 2nd ed.; CSIRO Publishing: Melbourne, VIC, Australia, 2007.
Behrens, T.; Zhu, A.-X.; Schmidt, K.; Scholten, T. Multi-Scale Digital Terrain Analysis and Feature Selection for Digital Soil Mapping. Geoderma 2010, 155, 175–185.
Tucker, G.E.; Slingerland, R. Drainage Basin Responses to Climate Change. Water Resour. Res. 1997, 33, 2031–2047.
Schumm, S.A. The Fluvial System. In Applied Fluvial Geomorphology; Hails, J.R., Ed.; Wiley Interscience: New York, NY, USA, 1977.
Parker, R.S. Experimental Study of Drainage Basin Evolution and Its Hydrologic Implications; Colorado State University: Fort Collins, CO, USA, 1977; Volume 90.
Fu, P.; Rich, P.M. A Geometric Solar Radiation Model with Applications in Agriculture and Forestry. Comput. Electron. Agric. 2002, 37, 25–35.
Schillaci, C.; Acutis, M.; Lombardo, L.; Lipani, A.; Fantappiè, M.; Märker, M.; Saia, S. Spatio-Temporal Topsoil Organic Carbon Mapping of a Semi-Arid Mediterranean Region: The Role of Land Use, Soil Texture, Topographic Indices and the Influence of Remote Sensing Data to Modelling. Sci. Total Environ. 2017, 601–602, 821–832.
Sabetizade, M.; Gorji, M.; Roudier, P.; Zolfaghari, A.A.; Keshavarzi, A. Combination of MIR Spectroscopy and Environmental Covariates to Predict Soil Organic Carbon in a Semi-Arid Region. CATENA 2021, 196, 104844.
Forkuor, G.; Hounkpatin, O.K.L.; Welp, G.; Thiel, M. High Resolution Mapping of Soil Properties Using Remote Sensing Variables in South-Western Burkina Faso: A Comparison of Machine Learning and Multiple Linear Regression Models. PLoS ONE 2017, 12, e0170478.
Mondal, A.; Khare, D.; Kundu, S.; Mondal, S.; Mukherjee, S.; Mukhopadhyay, A. Spatial Soil Organic Carbon (SOC) Prediction by Regression Kriging Using Remote Sensing Data. Egypt. J. Remote Sens. Space Sci. 2017, 20, 61–70.
Guevara, M.; Arroyo, C.; Brunsell, N.; Cruz, C.O.; Domke, G.; Equihua, J.; Etchevers, J.; Hayes, D.; Hengl, T.; Ibelles, A.; et al. Soil Organic Carbon Across Mexico and the Conterminous United States (1991–2010). Glob. Biogeochem. Cycles 2020, 34, e2019GB006219.
Taghizadeh-Mehrjardi, R.; Khademi, H.; Khayamim, F.; Zeraatpisheh, M.; Heung, B.; Scholten, T. A Comparison of Model Averaging Techniques to Predict the Spatial Distribution of Soil Properties. Remote Sens. 2022, 14, 472.
Stavi, I.; Lal, R. Variability of Soil Physical Quality in Uneroded, Eroded, and Depositional Cropland Sites. Geomorphology 2011, 125, 85–91.
Zhang, P.; Wang, Y.; Xu, L.; Li, R.; Sun, H.; Zhou, J. Factors Controlling Spatial Variation in Soil Aggregate Stability in a Semi-Humid Watershed. Soil Tillage Res. 2021, 214, 105187.
Zhang, W.-C.; Wu, W.; Li, J.-W.; Liu, H.-B. Climate and Topography Controls on Soil Water-Stable Aggregates at Regional Scale: Independent and Interactive Effects. CATENA 2023, 228, 107170.
Celik, I. Land-Use Effects on Organic Matter and Physical Properties of Soil in a Southern Mediterranean Highland of Turkey. Soil Tillage Res. 2005, 83, 270–277.
Eichenberg, D.; Pietsch, K.; Meister, C.; Ding, W.; Yu, M.; Wirth, C. The Effect of Microclimate on Wood Decay Is Indirectly Altered by Tree Species Diversity in a Litterbag Study. J. Plant Ecol. 2017, 10, 170–178.
Behrens, T.; Schmidt, K.; MacMillan, R.A.; Viscarra Rossel, R.A. Multiscale Contextual Spatial Modelling with the Gaussian Scale Space. Geoderma 2018, 310, 128–137.
Nsabimana, G.; Bao, Y.; He, X.; Nambajimana, J.D.D.; Wang, M.; Yang, L.; Li, J.; Zhang, S.; Khurram, D. Impacts of Water Level Fluctuations on Soil Aggregate Stability in the Three Gorges Reservoir, China. Sustainability 2020, 12, 9107.
Millard, K.; Richardson, M. On the Importance of Training Data Sample Selection in Random Forest Image Classification: A Case Study in Peatland Ecosystem Mapping. Remote Sens. 2015, 7, 8489–8515.
Arabameri, A.; Pradhan, B.; Pourghasemi, H.R.; Rezaei, K.; Kerle, N. Spatial Modelling of Gully Erosion Using GIS and R Programing: A Comparison among Three Data Mining Algorithms. Appl. Sci. 2018, 8, 1369.
Ostovari, Y.; Moosavi, A.A.; Mozaffari, H.; Poppiel, R.R.; Tayebi, M.; Demattê, J.A.M. Soil Erodibility and Its Influential Factors in the Middle East. In Computers in Earth and Environmental Sciences; Elsevier: Amsterdam, The Netherlands, 2022; pp. 441–454. ISBN 978-0-323-89861-4.
Zhang, W.-C.; Wu, W.; Liu, H.-B. Planting Year- and Climate-Controlled Soil Aggregate Stability and Soil Fertility in the Karst Region of Southwest China. Agronomy 2023, 13, 2962.
Mikha, M.M.; Green, T.R.; Untiedt, T.J.; Hergret, G.W. Land Management Affects Soil Structural Stability: Multi-Index Principal Component Analyses of Treatment Interactions. Soil Tillage Res. 2024, 235, 105890.
Spohn, M.; Giani, L. Impacts of Land Use Change on Soil Aggregation and Aggregate Stabilizing Compounds as Dependent on Time. Soil Biol. Biochem. 2011, 43, 1081–1088.
Six, J.; Bossuyt, H.; Degryze, S.; Denef, K. A History of Research on the Link between (Micro)Aggregates, Soil Biota, and Soil Organic Matter Dynamics. Soil Tillage Res. 2004, 79, 7–31.
Ayuke, F.O.; Brussaard, L.; Vanlauwe, B.; Six, J.; Lelei, D.K.; Kibunja, C.N.; Pulleman, M.M. Soil Fertility Management: Impacts on Soil Macrofauna, Soil Aggregation and Soil Organic Matter Allocation. Appl. Soil Ecol. 2011, 48, 53–62.
Milne, R.M.; Haynes, R.J. Soil Organic Matter, Microbial Properties, and Aggregate Stability under Annual and Perennial Pastures. Biol. Fertil. Soils 2004, 39, 172–178.
Baltensweiler, A.; Walthert, L.; Hanewinkel, M.; Zimmermann, S.; Nussbaum, M. Machine Learning Based Soil Maps for a Wide Range of Soil Properties for the Forested Area of Switzerland. Geoderma Reg. 2021, 27, e00437.
Shahini Shamsabadi, M.; Esfandiarpour-Broujeni, I.S.A.; Mosleh, Z.; Shirani, H.; Salehi, M.H. Error and Uncertainty Analysis in the Preparation of Thematic Maps using Artificial Neural Network and Environmental Data (A Case Study: Digital Soil Map of Shahrekord Plain). Geog. Environ. Plan. 2019, 30, 23–36.
Fathololoumi, S.; Vaezi, A.R.; Alavipanah, S.K.; Ghorbani, A.; Saurette, D.; Biswas, A. Improved Digital Soil Mapping with Multitemporal Remotely Sensed Satellite Data Fusion: A Case Study in Iran. Sci. Total Environ. 2020, 721, 137703.
Rau, K.; Eggensperger, K.; Schneider, F.; Hennig, P.; Scholten, T. How Can We Quantify, Explain, and Apply the Uncertainty of Complex Soil Maps Predicted with Neural Networks? Sci. Total Environ. 2024, 944, 173720.
Fathizad, H.; Taghizadeh-Mehrjardi, R.; Hakimzadeh Ardakani, M.A.; Zeraatpisheh, M.; Heung, B.; Scholten, T. Spatiotemporal Assessment of Soil Organic Carbon Change Using Machine-Learning in Arid Regions. Agronomy 2022, 12, 628.
Montgomery, D.R.; Dietrich, W.E. Channel initiation and the problem of landscape scale. Science 1992, 255, 826–830.
Strahler, A.N. Quantitative analysis of watershed geomorphology. Eos Trans. Am. Geophys. Union 1957, 38, 913–920.
Schmidt, J.; Hewitt, A. Fuzzy land element classification from DTMs based on geometry and terrain position. Geoderma 2004, 121, 243–256.
Wilson, J.P.; Gallant, J.C. Secondary Topographic Attributes. In Terrain Analysis: Principles and Applications; John Wiley & Sons: New York, NY, USA, 2000; pp. 87–131.
PA 19428-2959; Committee D18 on Soil and Rock Subcommittee D18.04 on Hydrologic Properties and Hydraulic Barriers. Research Report: D18-1018. ASTM: West Conshohocken, PA, USA, 2010.
Rouse, J.W.; Haas, R.H.; Schell, J.A.; Deering, D.W. Monitoring vegetation systems in the Great Plains with ERTS. NASA Spec. Publ. 1974, 351, 309.
Huete, A.R. A soil-adjusted vegetation index (SAVI). Remote Sens. Environ. 1988, 25, 295–309.
Gao, B.C. NDWI—A normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sens. Environ. 1996, 58, 257–266.

Figure 1. Workflow of predicting the mean weight diameter (MWD) and geometric mean diameter (GMD) of soil aggregates of topsoil (0–30 cm). DEM (digital elevation models), RS (remote sensing), MWD (mean weight diameter), GMD (geometric mean diameter), R² (coefficient of determination), RMSE (root mean square error), NRMSE (normalized root mean squared error), RF (random forest), k-NN (k-nearest neighbor), RFE (recursive feature elimination). CNBL: Channel Network Base Level, SH: Standardized Height, Diff: Diffuse Insolation, WB: Watershed Basins, VD: Valley Depth, Green: Greenness, SAVI: Soil Adjusted Vegetation Index, NRVI: Normalized Ratio Vegetation Index, DVI: Difference Vegetation Index, TVI: Transformed Vegetation Index, PL: Plastic Limit, SL: Shrinkage Limit, LL: Liquid Limit, PI: Plasticity Index, BD: Bulk density.

Figure 2. Location of the Fars province in Iran (a), the Lapuee Plain (b), and soil sample points across the study area.

Figure 3. Ranking of environmental variable importance (%) for the prediction of soil aggregate stability (SAS) indices (mean weight diameter (MWD) and geometric mean diameter (GMD)) using the random forest (RF) model. Panels (a,c) relate to scenario S1 (including RS indices, topographic, and soil covariates) and panels (b,d) relate to scenario S2 (including RS indices and topographic covariates). CNBL: Channel Network Base Level, SH: Standardized Height, Diff: Diffuse Insolation, WB: Watershed Basins, VD: Valley Depth, Green: Greenness, SAVI: Soil Adjusted Vegetation Index, NRVI: Normalized Ratio Vegetation Index, DVI: Difference Vegetation Index, TVI: Transformed Vegetation Index, PL: Plastic Limit, SL: Shrinkage Limit, LL: Liquid Limit, PI: Plasticity Index, BD: Bulk density.

Figure 4. Ranking of environmental variable importance (%) for the prediction of soil aggregate stability (SAS) indices (mean weight diameter (MWD) and geometric mean diameter (GMD)) using the k-nearest neighbor (k-NN) model. Panels (a,c) relate to scenario S1 (including RS indices, topographic, and soil covariates) and panels (b,d) relate to scenario S2 (including RS indices and topographic covariates). CNBL: Channel Network Base Level, SH: Standardized Height, Diff: Diffuse Insolation, WB: Watershed Basins, VD: Valley Depth, Green: Greenness, SAVI: Soil Adjusted Vegetation Index, NRVI: Normalized Ratio Vegetation Index, DVI: Difference Vegetation Index, TVI: Transformed Vegetation Index, PL: Plastic Limit, SL: Shrinkage Limit, LL: Liquid Limit, PI: Plasticity Index, BD: Bulk density.

Figure 5. Spatial variability maps of soil aggregate stability (SAS) indices (mean weight diameter (MWD) and geometric mean diameter (GMD)) derived from random forest (RF) and k-nearest neighbor (k-NN) models based on the two applied scenarios: (a) Scenario S1 (including RS indices, topographic and soil covariates) and (b) scenario S2 (including RS indices and topographic covariates).

Figure 6. Frequency distribution (histograms) of the predicted (mean weight diameter (a) (MWD), and geometric mean diameter (b) (GMD)) across the whole study region. The vertical dashed red line represents the mean values.

Figure 7. The uncertainty maps of mean weight diameter (MWD) of soil aggregates using the random forest (RF) model for two scenarios: (a) scenario S1 (including RS indices, topographic and soil covariates) and (b) scenario S2 (including RS indices and topographic covariates).

Figure 8. The uncertainty maps of geometric mean diameter (GMD) of soil aggregates using the random forest (RF) model for two scenarios: (a) scenario S1 (including RS indices, topographic and soil covariates) and (b) scenario S2 (including RS indices and topographic covariates).

Table 1. Classes of the mean weight diameter (MWD) of soil aggregates and the corresponding stability and crustability ratings [8].

Class	MWD (mm)	Stability	Crustability
1	<0.4	Very unstable	Systematic crust formation
2	0.4–0.8	Unstable	Crusting frequent
3	0.8–1.3	Medium	Crusting moderate
4	1.3–2.0	Stable	Crusting rare
5	>2.0	Very stable	No crusting
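As an illustration of how Table 1 can be applied to predicted MWD values, the sketch below maps a value in mm to its class, stability, and crustability labels. The function name `classify_mwd` is hypothetical. The table does not say which class the shared boundaries (0.4, 0.8, 1.3) belong to, so assigning them to the higher class is an assumption; 2.0 mm is placed in class 4 because class 5 is given as ">2.0".

```python
def classify_mwd(mwd_mm):
    """Return (class, stability, crustability) for an MWD value in mm.

    Thresholds follow Table 1. Interior boundary values are assigned to
    the higher class (assumption); exactly 2.0 mm stays in class 4.
    """
    if mwd_mm < 0:
        raise ValueError("MWD cannot be negative")
    if mwd_mm < 0.4:
        return 1, "Very unstable", "Systematic crust formation"
    if mwd_mm < 0.8:
        return 2, "Unstable", "Crusting frequent"
    if mwd_mm < 1.3:
        return 3, "Medium", "Crusting moderate"
    if mwd_mm <= 2.0:
        return 4, "Stable", "Crusting rare"
    return 5, "Very stable", "No crusting"


# Minimum, average, and maximum MWD reported in Table 2
for value in (0.24, 1.64, 2.25):
    cls, stability, crusting = classify_mwd(value)
    print(f"MWD {value} mm: class {cls}, {stability}, {crusting}")
```

Applied to the summary statistics in Table 2, the observed MWD values span the full range from class 1 (minimum) to class 5 (maximum), with the average falling in class 4.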

Table 2. Summary statistics of soil aggregate stability (SAS) indices along with the other soil covariates for all datasets (n = 129 points).

Aggregate Stability Indices	Unit	Min	Max	Average	SD	CV (%)
MWD	mm	0.24	2.25	1.64	1.01	61.5
GMD	mm	0.009	0.81	0.48	0.15	31.2
BD	g·cm⁻³	0.36	1.95	1.14	0.46	40.3
SL	%	10.1	45	23.2	7.59	32.6
PL	%	18.6	49.2	39.4	6.04	15.3
LL	%	30	73	53.1	9.28	17.4
PI	%	2.4	31.9	14.2	5.44	38.1

Note: MWD: Mean weight diameter; GMD: Geometric mean diameter; Min: Minimum; Max: Maximum; SD: Standard deviation; CV%: coefficient of variation percentage, SL: Shrinkage Limit, PL: Plastic Limit, LL: Liquid Limit, PI: Plasticity Index.
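The CV (%) column in Table 2 follows from the average and SD columns as CV = 100 × SD / mean. The short sketch below recomputes it for three rows; the function name `coefficient_of_variation` is hypothetical, and small differences from the tabulated values are expected because the table was presumably computed from unrounded data.

```python
def coefficient_of_variation(mean, sd):
    """Coefficient of variation in percent: 100 * SD / mean."""
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * sd / mean


# (average, SD) pairs copied from Table 2
summary = {
    "GMD": (0.48, 0.15),
    "BD": (1.14, 0.46),
    "PL": (39.4, 6.04),
}

cv_values = {
    name: coefficient_of_variation(mean, sd)
    for name, (mean, sd) in summary.items()
}

for name, cv in cv_values.items():
    print(f"{name}: CV = {cv:.2f}%")
```

A higher CV, as for MWD in Table 2, indicates greater relative variability across the 129 sampling points than a lower one, as for PL.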


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

