Article

Using Machine Learning to Propose a Qualitative Classification of Risk of Soil Erosion

by Dione Pereira Cardoso 1, Paulo Cesar Ossani 2, Marcelo Angelo Cirillo 3, Marx Leandro Naves Silva 1 and Junior Cesar Avanzi 1,*
1 Department of Soil Science, Federal University of Lavras, Lavras 37203-202, Brazil
2 Department of Statistics, State University of Maringá, Maringá 87020-900, Brazil
3 Department of Statistics, Federal University of Lavras, Lavras 37203-202, Brazil
* Author to whom correspondence should be addressed.
AgriEngineering 2024, 6(4), 4280-4293; https://doi.org/10.3390/agriengineering6040241
Submission received: 7 October 2024 / Revised: 1 November 2024 / Accepted: 11 November 2024 / Published: 14 November 2024

Abstract

Soil loss compromises ecosystem services essential for sustainable development, necessitating effective strategies to identify priority areas for conservation practices aimed at reducing soil erosion. Current methods often rely on literature-based classification, which can be subjective. This study explores the use of artificial intelligence techniques to enhance the objectivity and efficiency of qualitative classifications for soil erosion risk. Accordingly, the aims were to apply machine learning methods, specifically cluster analysis, to categorize soil erosion risk in the Peixe Angical Basin, and to use discriminant analysis to propose discriminant classifier vectors for current and future predictions of soil loss risk. Our database consisted of pixel-based data on the R, K, LS, and C factors. These input data were linked to soil losses (output data), which had been classified based on findings from studies conducted in a different basin. Following this, machine learning techniques were applied to analyze the data. The cluster analysis identified seven distinct erosion risk groups: slight, slight to moderate, moderate, moderate to severe, severe, very severe, and extremely severe. Additionally, discriminant analysis facilitated the development of seven predictive models for current and future soil erosion risk, reducing the need for new soil erosion modeling and enhancing decision-making processes. We anticipate that this methodology can be applied to other basins, providing a more robust framework for assessing soil erosion risk without relying on arbitrary qualitative classification.

1. Introduction

Soil ecosystem services are essential for environmental sustainability, providing vital functions such as water purification, erosion control, and hydrological regulation. However, these services can be severely compromised by inappropriate anthropogenic activities. To maintain the integrity of these ecosystem services, it is necessary to identify priority areas for the implementation of soil conservation. A qualitative classification of soil erosion risk, based on soil erosion modeling, is necessary to select these priority areas for soil and water conservation. By analyzing the spatial distribution of soil loss values, we can delineate zones that require intervention to prevent or reduce surface runoff, thereby alleviating the impacts of soil erosion. Consequently, adopting a qualitative classification framework enhances decision-making by public managers, extension workers and farmers, promoting effective conservation efforts.
In the 1980s, the FAO [1] proposed a quantitative and qualitative classification system for assessing erosion severity, which relied on specific standard erosion plots for soil loss determination. Currently, advancements in geographic information systems (GIS) and machine learning techniques enable the efficient modeling of spatial distributions and the processing of extensive databases. The classification system established by the FAO [1] delineates four levels of soil loss: slight (0–10 Mg ha−1 year−1), moderate (10–50 Mg ha−1 year−1), high (50–200 Mg ha−1 year−1), and severe (>200 Mg ha−1 year−1). These quantitative thresholds were designed to guide decision-making processes, from the management of individual rural properties to global initiatives, aimed at preventing and mitigating erosion impacts in priority areas.
Quantitative classification is important for facilitating comparison within specific areas or regions, thereby enhancing decision-making in prioritizing conservation efforts. This quantification of soil losses concurrently supports qualitative classification, providing a comprehensive understanding of soil erosion risk. To effectively assess the soil erosion risk across basins, regions, and countries, it is vital to recognize the varying intensity levels, ranging from slight to extremely severe. This spectrum of severity informs the development of targeted and effective conservation strategies.
Soil loss and its classifications are influenced by various modeling input factors, including rainfall erosivity, soil erodibility, slope length and steepness, cover management, and supporting practices. Among these, rainfall erosivity stands out as having a great influence on modeling [2]. However, the qualitative classifications reported in the literature often use different limit values, posing challenges for inter-basin comparisons. Several studies have modeled soil erosion for Brazilian conditions, some considering both quantitative and qualitative classifications [3,4,5,6,7,8,9], while others focus solely on quantitative assessment [10,11,12,13,14,15,16,17]. This has resulted in a spectrum of soil erosion risk categories, ranging from three to nine classes, which hampers the ability to compare soil erosion risks across different regions. Consequently, achieving a standardized approach to soil erosion risk assessment remains a challenge in the field.
The quantitative classification of soil erosion exhibits considerable variability, influenced by several soil and climate factors. For example, Galdino et al. [18] define high soil erosion as within the range of 10 to 40 Mg ha−1 year−1, whereas Salis et al. [19] establish higher thresholds, considering values above 25 Mg ha−1 year−1 or 20 Mg ha−1 year−1, respectively, as very high. Gomes et al. [20] categorize soil losses exceeding 80 Mg ha−1 year−1 as very severe, whereas Avanzi et al. [21] and Batista et al. [15] raise this threshold to over 100 Mg ha−1 year−1, labeling such losses as extremely severe. This inconsistency underscores the need for standardized criteria in soil loss classification.
Given this variability in classification thresholds, a standardized qualitative system would significantly enhance effectiveness and efficiency, allowing for more meaningful comparisons across different areas. Furthermore, such a system would improve the precision of soil conservation strategies, thereby supporting more informed and secure decision-making processes.
In the context of soil losses, it is important to recognize that each soil class has its own tolerance levels. Within a chronosequence, Oxisols, which represent the most weathered soil in the landscape [22], inherently have the highest tolerance for soil losses. In this sense, deeper and more weathered soils have a soil loss tolerance of approximately 12 Mg ha−1 year−1 [23]; thus, soil losses exceeding this threshold can be categorized as high erosion. However, since not all river basins contain the same soil classes, a standardized classification that accommodates various environments must be tailored accordingly. For effective decision-making, an ideal approach would involve a qualitative classification that accurately reflects the soil erosion risk specific to each context.
The qualitative classification system for soil erosion (comprising categories such as slight, slight to moderate, moderate, moderate to severe, severe, very severe, and extremely severe) should be uniformly applied across all assessed basins. However, the number of classes within this classification may vary depending on the unique characteristics of each area. Unfortunately, this classification is often implemented without prior analysis, largely due to challenges in accessing comprehensive databases. For a public manager making decisions at the macro-regional level, the lack of a standardized qualitative classification presents significant obstacles, leading to costly and biased decision-making. Therefore, establishing a standardized qualitative classification system is essential for facilitating more efficient and impartial decision-making across diverse geographical regions.
This study aimed to apply machine learning techniques, specifically cluster analysis, to obtain the optimal number of groups for qualitatively classifying soil erosion risk in the Peixe Angical Basin. Additionally, we employed discriminant analysis to develop discriminant classification vectors for current and future predictions of soil erosion risks. Machine Learning algorithms enhance classification accuracy by identifying patterns within databases, even when the data points may seem unrelated. This approach streamlines decision-making for basin managers, facilitating more efficient soil and water conservation and promoting environmental sustainability. Importantly, the techniques adopted in this study are universally applicable, regardless of geographical location or size, highlighting the versatility of machine learning in addressing erosion risk assessment and management challenges across diverse contexts.

2. Materials and Methods

2.1. Study Area

The Peixe Angical Reservoir Drainage Basin (PARDB) is formed by the Tocantins River and its tributaries, which include the Palma, Paranã, Maranhão, and Almas rivers. The basin covers an area of 125,776.82 km2, with 0.62% located in the Federal District, 75.58% in the state of Goiás, and 23.80% in the state of Tocantins. Geographically, PARDB spans latitudes between 11° and 17° South and longitudes from 47° to 51° West (Figure 1). The region’s climate is classified as Aw (tropical with dry winters) [24], according to Köppen’s system. The altitude within the basin ranges from 235 m to 1674 m, as indicated by the elevation model [25]. The dominant soil class in the area is Litholic Neosol, although other soil classes such as Latosol, Gleisols, and Argisols are also present [26]. Pasture is the predominant land use across the PARDB landscape [26].

2.2. Data Description

Soil erosion modeling was performed using the InVEST SDR (Sediment Delivery Ratio) module, version 3.14.0 Workbench (Windows), which calculated soil loss based on USLE input parameters (Figure 2). These parameters included rainfall erosivity (R, MJ mm ha−1 h−1 yr−1), soil erodibility (K, Mg h MJ−1 mm−1), slope length-gradient factor (LS, dimensionless), cover-management factor (C, dimensionless) and support practice factor (P, dimensionless). The methodology for obtaining these parameters is detailed in Cardoso et al. [26]. The input rasters of the Peixe Angical Basin had a spatial resolution of 30 m, enabling a detailed assessment of soil erosion risk across the basin’s varied landscape features.
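For orientation, the USLE estimates annual soil loss per pixel as the product of the five factors listed above, A = R · K · LS · C · P. The following is a minimal Python sketch of that product for a single pixel. It is not the InVEST implementation; the function name and the input values are hypothetical and were chosen only for illustration.

```python
def usle_soil_loss(r, k, ls, c, p=1.0):
    """Annual soil loss A (Mg ha^-1 yr^-1) as the USLE product R * K * LS * C * P."""
    for name, value in (("R", r), ("K", k), ("LS", ls), ("C", c), ("P", p)):
        if value < 0:
            raise ValueError(f"{name} factor must be non-negative, got {value}")
    return r * k * ls * c * p


# Hypothetical pixel values (not taken from the study).
a = usle_soil_loss(r=8000.0, k=0.02, ls=1.5, c=0.05, p=1.0)
print(f"A = {a:.1f} Mg ha^-1 yr^-1")
```

With the units given above, the product of R (MJ mm ha−1 h−1 yr−1) and K (Mg h MJ−1 mm−1) already carries the units of soil loss, since LS, C, and P are dimensionless.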
Table 1 illustrates how the data were structured, considering the 50,108 observations. The values for rainfall erosivity (R Factor), soil erodibility (K Factor), slope length and steepness (LS Factor), and cover-management (C Factor) were ordered from the lowest to the highest erosion risk class.
Soil loss (A) was classified into seven categories, as proposed by Avanzi et al. [21], with the following thresholds: 1, slight (0.0–2.5 Mg ha−1 yr−1); 2, slight/moderate (2.5–5.0 Mg ha−1 yr−1); 3, moderate (5.0–10.0 Mg ha−1 yr−1); 4, moderate/high (10.0–15.0 Mg ha−1 yr−1); 5, high (15.0–25.0 Mg ha−1 yr−1); 6, very high (25.0–100.0 Mg ha−1 yr−1); and 7, extremely high (>100.0 Mg ha−1 yr−1). In Brazil, an earlier classification applied to the Rio Grande basin in southern Minas Gerais followed similar intervals, including higher ranges, such as 100–500 Mg ha−1 yr−1 and >500 Mg ha−1 yr−1 [17]. To improve alignment with the diverse characteristics of different basins, we propose a new approach that uses cluster and discriminant analysis for a qualitative classification of soil erosion risk. Although these techniques are widely used in other fields, their application in soil erosion modeling is novel, especially for classifying soil erosion risk qualitatively.
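The threshold-based labeling described above can be sketched as a simple lookup. The Python example below is illustrative only: the function and constant names are hypothetical, and the handling of values that fall exactly on a class limit (assigned here to the higher class) is an assumption, since the text does not specify it.

```python
from bisect import bisect_right

# Upper class limits (Mg ha^-1 yr^-1) of the seven-class scheme quoted above.
UPPER_LIMITS = [2.5, 5.0, 10.0, 15.0, 25.0, 100.0]
LABELS = [
    "slight", "slight/moderate", "moderate", "moderate/high",
    "high", "very high", "extremely high",
]


def erosion_class(soil_loss):
    """Return (class number, label) for a soil loss value in Mg ha^-1 yr^-1.

    Assumption: a value exactly on a limit is placed in the higher class.
    """
    if soil_loss < 0:
        raise ValueError("soil loss must be non-negative")
    idx = bisect_right(UPPER_LIMITS, soil_loss)
    return idx + 1, LABELS[idx]


# Hypothetical soil loss values, for illustration only.
for value in (1.0, 12.0, 150.0):
    print(value, erosion_class(value))
```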

2.3. Cluster

Cluster analysis is a statistical technique used to group observations based on their shared characteristics, aiming to identify similarities within each group while ensuring distinct differences between groups. This method can be performed using two main approaches: hierarchical and non-hierarchical.
Hierarchical clustering organizes observations in a hierarchy, typically visualized using a dendrogram. The similarity or dissimilarity between observations is determined using various measures, guiding the formation of clusters based on their relative closeness in the dataset.
Dissimilarity measures represent the distance between data points and can be calculated using various methods, such as Euclidean, Manhattan, Minkowski, Canberra, and Maximum distances. Each method influences the resulting clustering by grouping data based on these calculated distances. Common clustering techniques include Single Linkage, Complete Linkage, Average Linkage, Centroid method, Median, Ward, and McQuitty method. Conversely, similarity measures, which complement dissimilarity, assess how closely data points resemble one another [27].
Non-hierarchical methods directly partition n elements into k groups (clusters) without following a hierarchical structure, requiring the user to specify the desired number of clusters. These methods are iterative and particularly effective for analyzing large datasets. Common non-hierarchical algorithms include k-means and k-medoid, which group observations into user-defined clusters without categorizing any elements as noise. K-means uses centroids, while k-medoid uses medoids, typically employing Euclidean distance for comparison, though other distance measures can also be applied. Both algorithms are non-deterministic due to their reliance on random initial points. In this work, the k-means method was chosen for creating the groupings.
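To make the distance measures named above concrete, the following Python sketch implements the Minkowski distance (which reduces to the Manhattan distance for order 1 and the Euclidean distance for order 2), the Canberra distance, and the Maximum distance. The function names are hypothetical, and the convention of skipping Canberra terms whose denominator is zero is an assumption of this sketch.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def canberra(x, y):
    """Canberra distance; terms with a zero denominator are skipped (assumption)."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def maximum(x, y):
    """Maximum (Chebyshev) distance: the largest coordinate-wise difference."""
    return max(abs(a - b) for a, b in zip(x, y))


u, v = [0.0, 0.0], [3.0, 4.0]
print("Manhattan:", minkowski(u, v, 1))
print("Euclidean:", minkowski(u, v, 2))
print("Minkowski (order 3):", minkowski(u, v, 3))
print("Maximum:", maximum(u, v))
```

As the order increases, the Minkowski distance moves from the Manhattan value toward the Maximum distance, which is one reason the choice of order can change which observations end up grouped together.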
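The iterative partitioning described above can be illustrated with a short Python sketch of Lloyd's algorithm, the usual k-means iteration. This is not the code used in the study, whose computations were run in R; the function name, the seed argument, and the toy data are hypothetical. The random choice of initial centroids is what makes the method non-deterministic unless a seed is fixed.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length tuples.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Random initial centroids drawn from the data.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Toy data: two well-separated groups (hypothetical values).
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, groups = kmeans(data, k=2)
print(groups)
```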
Hierarchical methods are often used initially to identify the optimal number of clusters (k), which can then be refined using non-hierarchical methods. The latter are favored for their perceived accuracy in cluster formation, enhancing the results obtained from hierarchical clustering.

2.4. Projection Pursuit

This technique is used for exploratory analysis of multivariate data, seeking low-dimensional linear projections of high-dimensional data. These projections are obtained by optimizing an objective function known as the projection index [28]. To investigate cluster formation in high-dimensional space, the Penalized Discriminant Analysis (PDA) index was applied, taking into account the number of variables (Table 1).
The PDA index builds on the Linear Discriminant Analysis (LDA) index [29] and was developed to handle problems with many highly correlated predictors [30,31]. This index is defined for one or more dimensions, as described in Equation (1):
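$$
PI_{pda}(A) = 1 - \frac{\left| A^{T} \left\{ (1-\lambda) H + n \lambda I_p \right\} A \right|}{\left| A^{T} \left\{ (1-\lambda)(H + B) + n \lambda I_p \right\} A \right|} \qquad (1)
$$
where $I_p$ is the identity matrix of order $p$ and $\lambda \in [0, 1)$ is a parameter to be estimated. If $\lambda = 0$, the PDA index is the same as that of LDA [31].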
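As a rough illustration of Equation (1), the Python sketch below evaluates the PDA index for a one-dimensional projection, where the determinants reduce to scalars. It rests on two assumptions not stated in the text: that H is the within-group sum-of-squares matrix and B the between-group sum-of-squares matrix (the usual reading that makes the index reduce to LDA at λ = 0), and that the equation reads 1 minus the ratio of the two penalized quadratic forms. The function name and the toy data are hypothetical.

```python
def pda_index_1d(rows, groups, a, lam):
    """PDA index for a single projection vector a (1-D case of Equation (1)).

    Assumptions: H = within-group and B = between-group sums of squares,
    so a'Ha and a'Ba are computed directly on the projected values.
    """
    n = len(rows)
    # Project each observation onto the direction a.
    y = [sum(ai * xi for ai, xi in zip(a, row)) for row in rows]
    grand_mean = sum(y) / n
    by_group = {}
    for yi, g in zip(y, groups):
        by_group.setdefault(g, []).append(yi)
    within = sum(
        sum((yi - sum(v) / len(v)) ** 2 for yi in v) for v in by_group.values()
    )
    between = sum(
        len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in by_group.values()
    )
    ridge = n * lam * sum(ai * ai for ai in a)  # n * lambda * a'I_p a
    numerator = (1 - lam) * within + ridge
    denominator = (1 - lam) * (within + between) + ridge
    return 1.0 - numerator / denominator


# Toy data: the two groups differ only along the first variable.
X = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
g = [1, 1, 2, 2]
print(pda_index_1d(X, g, a=(1.0, 0.0), lam=0.0))  # direction that separates the groups
print(pda_index_1d(X, g, a=(0.0, 1.0), lam=0.0))  # direction that does not
```

Under these assumptions, a projection that separates the groups yields an index near 1, while a projection along which the groups coincide yields an index near 0; increasing λ shrinks the index toward 0.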
The kurtosis index, defined by Pena and Prieto [32], is used for identifying clusters in multivariate data, using information from univariate projections of the sample data in certain directions. The selected directions aim to either minimize or maximize the kurtosis coefficient. Maximizing this index favors the detection of outliers, while minimizing favors the detection of clusters. The index is defined for one dimension by Equation (2).
$$
PI_{Curtose}(A) = \frac{(n-1)^{2} \sum_{i=1}^{n} (y_i - \bar{y})^{4}}{n \left[ \sum_{i=1}^{n} (y_i - \bar{y})^{2} \right]^{2}} \qquad (2)
$$
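The behavior of the kurtosis index can be checked numerically. The Python sketch below assumes that Equation (2) is the ratio of (n − 1)² times the sum of fourth powers of the deviations to n times the squared sum of squared deviations, where the y values are the data projected onto a candidate direction. The function name and the sample values are hypothetical.

```python
def kurtosis_index(y):
    """Kurtosis coefficient of projected values y, following the assumed form of Equation (2)."""
    n = len(y)
    mean = sum(y) / n
    sum_sq = sum((v - mean) ** 2 for v in y)
    sum_fourth = sum((v - mean) ** 4 for v in y)
    return (n - 1) ** 2 * sum_fourth / (n * sum_sq ** 2)


two_clusters = [-1.0, 1.0, -1.0, 1.0]      # projection showing two tight groups
with_outlier = [0.0, 0.0, 0.0, 0.0, 10.0]  # projection dominated by one outlier
print(kurtosis_index(two_clusters))
print(kurtosis_index(with_outlier))
```

The clustered projection gives the smaller value and the projection with an outlier the larger one, which matches the statement that minimizing the index favors clusters while maximizing it favors outliers.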
In this work, the projection pursuit technique was employed as an exploratory method to analyze n-dimensional data, allowing for a visual assessment of class separation. Out of 50,108 observations, 70% (35,130) were used for model training, while the remaining 30% (14,978) served as the test set to evaluate accuracy in the discriminant analysis technique. In contrast, all observations were utilized for the cluster analysis. The results were obtained using the MVar 2.2.1 package [33] in the R software [34], version 4.3.3.
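A random training/test partition of this kind can be sketched as follows. The example is a generic illustration in Python, not the procedure used in the study: the function name and the seed are hypothetical, and a strict 70% cut of 50,108 observations gives slightly different counts from those reported above, so the sampling scheme actually used may have differed.

```python
import random


def split_indices(n, train_fraction=0.7, seed=42):
    """Shuffle the indices 0..n-1 and split them into training and test sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(50108)
print(len(train_idx), len(test_idx))
```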

3. Results

3.1. Visual Analysis of Soil Erosion Modeling Input Data and Detection of Possible Outliers

As shown in Figure 3 and Figure 4, projection pursuit was employed to reduce the dimensionality of the input and output data for hydrological modeling. The input factors included rainfall erosivity (R), soil erodibility (K), topographic factor (LS), and cover-management factor (C), while the output was represented by the qualitative classifications for the years 1990, 2000, 2010, and 2017. The application of projection pursuit, utilizing both minimum and maximum kurtosis indices for the data from these years, revealed similar patterns regarding the presence of outliers and clusters, as illustrated in Figure 3 and Figure 4. Given this consistent behavior across years, the cluster and discriminant analyses focused on the year 2017.
Figure 3 demonstrates a clear separation of data points reduced to two dimensions, with minimal overlap between the years. This indicates intra-class consistency, meaning the elements within each year are highly similar to one another, yet distinctly different from those in other years. However, the grouping behavior remains notably similar across the years, as reflected in the comparable shapes observed in each year’s projection.
In Figure 4, the two-dimensional reduction of the data reveals a clear separation between the years, like the patterns observed in Figure 3. The analysis of data for each year shows distinct grouping, with notable outliers appearing in similar patterns across different periods. Due to this consistent behavior over time, the clustering and discriminant analyses focused exclusively on the 2017 data.

3.2. Proposals for Soil Erosion Risk Classes

Figure 5 illustrates the formation of soil erosion risk classes using cluster analysis techniques. By applying hierarchical cluster analysis to the 2017 data, with Minkowski distance (lambda = 3) and the Ward method, a cutoff point around 50,000 was used to generate the dendrogram presented in Figure 5.
The non-hierarchical k-means method was applied with k = 7, generating seven distinct groups (classes) with well-defined characteristics, as shown in Figure 4, which was generated using the projection pursuit technique and the PDA index. This number of classes matches that observed for a sub-basin in the state of Espírito Santo [21]. However, for other locations, it is important to apply the technique to determine the appropriate number of classes. The distribution of elements across the new classes is presented in Table 2. Notably, class 5 and class 7 have the largest and smallest numbers of observations, representing 21.88% (qualitative classification: High) and 5.04% (qualitative classification: Slight), respectively.
The distribution of the new erosion risk classes, without overlapping, is shown in Figure 6. As expected in a real-world field scenario, soil loss is assigned exclusively to a single class, since it would be impossible to categorize an area simultaneously as both “very high” and “extremely high” risk, for example.
Table 3 presents the minimum and maximum values for each soil erosion risk class based on the input factors used in hydrological modeling (R, K, LS, and C). A descriptive analysis of each group shows that rainfall erosivity plays a decisive role in classifying soil losses compared to other factors (soil erodibility, topographic factor and cover-management factor). Every class contained at least one pixel with forest formation (C factor = 0.001) and one pixel with crop formation (C factor = 0.50). However, these are not isolated pixels but rather continuous features, indicating that such land cover types can occur across different soil classes. In contrast with the development of soil erodibility and rainfall erosivity maps, these factors may be more distinct.
Using the 35,130 observations for training, discriminant analysis was applied to the formed groups, resulting in the following discriminant vectors for each class.
$$
\begin{aligned}
C_1 &= 0.24820\,R - 0.01557\,K - 0.00066\,LS - 2.12334\,C - 915.44461 \\
C_2 &= 0.30467\,R - 0.03562\,K - 0.00006\,LS - 4.85647\,C - 1378.20518 \\
C_3 &= 0.32143\,R - 0.04508\,K - 0.00157\,LS - 6.14642\,C - 1533.49969 \\
C_4 &= 0.38898\,R - 0.01583\,K - 0.00211\,LS - 2.15835\,C - 2247.10265 \\
C_5 &= 0.28734\,R - 0.02303\,K + 0.00019\,LS - 3.13947\,C - 1226.51055 \\
C_6 &= 0.26993\,R - 0.02248\,K + 0.00035\,LS - 3.06501\,C - 1082.35721 \\
C_7 &= 0.34303\,R - 0.05744\,K - 0.00094\,LS - 7.83168\,C - 1746.88104
\end{aligned}
$$
The performance metrics for the classifiers, including sensitivity, specificity, ROC area, false positive rate (FP Rate), false negative rate (FN Rate), and F-score for each soil erosion risk class, are highlighted in Table 4. The sensitivity of the C3 classifier was 0.9989, the highest among all, while the lowest sensitivity was observed in C5 (0.9297), indicating that the C3 classifier had the best performance in correctly identifying true positive cases. All classification models performed well, with the ROC area values exceeding 0.97 (Table 4). Additionally, both false positive and false negative rates were low, indicating a minimal chance that the classifier does not accurately represent reality. The F-score further confirms good classifier performance, ranging from 0.9601 (C5) to 0.9972 (C1), demonstrating good qualitative classification regarding soil erosion risk. Overall, all classifiers showed excellent performance.
Table 5 shows the confusion matrix for the soil erosion risk classes and their corresponding accuracy proportions. Rainfall erosivity (R) emerged as the most important factor in defining the new classification. When the discriminant vectors were applied to the randomly selected 14,978 test observations, an impressive accuracy of 97.77% was achieved. These results are summarized in the confusion matrix presented in Table 5.
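The per-class metrics and the overall accuracy discussed above can all be derived from a confusion matrix. The Python sketch below shows one way to do so; it is illustrative only. The function names and the three-class matrix are hypothetical (they are not the values of Table 5), rows are assumed to hold the true classes and columns the predicted classes, and the F-score is assumed to be the F1 score (harmonic mean of precision and sensitivity).

```python
def per_class_metrics(cm):
    """Per-class metrics from a square confusion matrix.

    Assumption: cm[i][j] counts observations of true class i predicted as class j.
    Each class is assumed to have at least one true and one predicted observation.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(k)) - tp
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        precision = tp / (tp + fp)
        f_score = 2 * precision * sensitivity / (precision + sensitivity)
        results.append({
            "sensitivity": sensitivity,
            "specificity": specificity,
            "fp_rate": 1 - specificity,
            "fn_rate": 1 - sensitivity,
            "f_score": f_score,
        })
    return results


def accuracy(cm):
    """Overall accuracy: correctly classified observations divided by the total."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


# Hypothetical three-class confusion matrix, for illustration only.
example = [[50, 2, 0], [3, 45, 2], [0, 1, 47]]
print(f"accuracy = {accuracy(example):.4f}")
for i, m in enumerate(per_class_metrics(example), start=1):
    print(i, {name: round(value, 4) for name, value in m.items()})
```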
These discriminant vectors facilitate the classification of soil erosion risk into the following categories: slight, slight to moderate, moderate, moderate to severe, severe, very severe, and extremely severe.

3.3. Application of Discriminant Vectors for Soil Erosion Risk Class

Figure 7 illustrates the application of discriminant vectors in the qualitative classification to assess soil erosion risk in the Peixe Angical Basin. Analysis of the results indicates that the most significant soil loss is associated with the vector C4, indicating a moderate to severe soil erosion risk across the entire basin.
Therefore, it is expected that appropriate conservation practices will be adopted to reduce soil loss, ultimately leading to a shift toward classes with lower environmental impact. In principle, the application of these vectors can simulate the most effective practices for controlling erosion before they are implemented in the field, thereby contributing to both economic and environmental benefits.

4. Discussion

The pre-analysis and post-analysis using projection pursuit facilitate a visual examination of an extensive database, allowing for the identification of outliers and similarities. This analysis revealed that the qualitative classification proposed in the literature is not applicable to the Peixe Angical Basin, as there was significant overlap between the classes. Subsequently, after defining new classes based on cluster analysis, the visualization through projection pursuit clearly distinguished these classes without any points of intersection; such intersections could otherwise lead to ambiguous interpretations of the results and potential underestimation or overestimation. Given this distinct behavior, the newly defined classes are more suitable for qualitatively representing soil loss in this basin.
The descriptive analysis of each group indicated that rainfall erosivity was a key factor in classifying soil losses, particularly in relation to other factors such as soil erodibility, topographic factor, and cover-management. These vectors facilitate the classification of soil erosion risk at various levels, providing a robust foundation for identifying priority areas for soil and water conservation practices within the Peixe Angical Basin. Notably, this classification approach eliminates the need for region-specific modeling, thereby streamlining the conservation planning process. This prior assessment enables the adoption of conservation practices and the reevaluation of land use strategies to mitigate the impacts of soil erosion, promoting the long-term sustainability of the basin. The capability to predict different soil management scenarios and their corresponding impacts allows for more efficient and cost-effective planning, minimizing resource waste while promoting the preservation of ecosystem services.
Before implementing any conservation measures, it is crucial to evaluate soil erosion risk using these vectors, as they provide valuable insights into soil erosion dynamics. This approach enables the identification of priority areas for soil conservation practices within the Peixe Angical Basin without the need for complex modeling. Additionally, it allows for future projections regarding potential changes in rainfall erosivity and land use. In contrast, the K and LS factors are expected to remain stable in the short and medium term.
The application of these vectors provides valuable insights for implementing soil conservation practices and planning future land use changes, thereby promoting sustainable management of the basin. Utilizing discriminant vectors enables proactive decision-making, allowing for the anticipation of optimal scenarios that minimize soil impact. This approach fosters more efficient planning, reducing resource expenditure while preserving ecosystem services. Moreover, it empowers managers and extension workers to make informed decisions in the short term, thereby enhancing the preservation and sustainability of the basin’s ecosystems.
To determine the soil erosion risk class of a specific area, the values of R, K, LS, and C must be substituted into the seven equations. The class associated with the highest resulting value indicates the soil erosion risk level for that pixel. For other study basins, it is essential to apply the same technique and evaluate the outcomes. This qualitative classification aligns with the proposal by Avanzi et al. [21]. Likewise, other studies conducted in Brazil have utilized the same number of classes [3,7,8]. However, some classifications differ from the findings in this study, using fewer classes [4,5] or more classes [16,17]. In the case of Peixe Angical Basin, the classification was based on a pre-analysis of the data, followed by the application of machine learning techniques, providing a more robust scientific foundation for the qualitative classification of soil erosion risk.
Furthermore, the results derived from the discriminant vectors offer a strong foundation for decision-making by managers and extension workers, contributing to the preservation of natural resources and the maintenance of environmental balance. Applying these vectors in other study basins is essential to assess their effectiveness in various contexts and to ensure the replicability of results. This qualitative approach aligns with prior proposals in the literature, reinforcing the credibility and significance of the findings.
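The substitution rule described above can be sketched in a few lines of Python. The coefficients are those of the seven discriminant vectors listed in Section 3.2, under the assumption that every term not preceded by an explicit plus sign in the published equations is negative. The function names and the example pixel are hypothetical, and the inputs are assumed to be on the same scale as the data used to fit the vectors.

```python
# Coefficients (R, K, LS, C, constant) of the discriminant vectors C1..C7.
# Assumption: terms without an explicit "+" in the published equations are negative.
COEFFS = {
    1: (0.24820, -0.01557, -0.00066, -2.12334, -915.44461),
    2: (0.30467, -0.03562, -0.00006, -4.85647, -1378.20518),
    3: (0.32143, -0.04508, -0.00157, -6.14642, -1533.49969),
    4: (0.38898, -0.01583, -0.00211, -2.15835, -2247.10265),
    5: (0.28734, -0.02303, 0.00019, -3.13947, -1226.51055),
    6: (0.26993, -0.02248, 0.00035, -3.06501, -1082.35721),
    7: (0.34303, -0.05744, -0.00094, -7.83168, -1746.88104),
}


def discriminant_scores(r, k, ls, c):
    """Evaluate all seven discriminant functions for one pixel."""
    return {
        cls: a_r * r + a_k * k + a_ls * ls + a_c * c + const
        for cls, (a_r, a_k, a_ls, a_c, const) in COEFFS.items()
    }


def classify(r, k, ls, c):
    """Return the class whose discriminant function gives the highest value."""
    scores = discriminant_scores(r, k, ls, c)
    return max(scores, key=scores.get)


# Hypothetical pixel values, for illustration only.
print(classify(r=8000.0, k=0.02, ls=1.5, c=0.05))
```

Because the R coefficients and the constants are much larger in magnitude than the other terms, the winning class in this sketch is driven mainly by R, which is consistent with the dominant role of rainfall erosivity reported in the results.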
This approach not only identifies priority areas for soil conservation practices but also provides valuable insights for planning and implementing sustainable soil management strategies. By offering a detailed understanding of the factors influencing soil erosion, these results significantly contribute to the promotion of environmentally responsible management and the preservation of ecosystems globally.

5. Conclusions

The application of the k-means method successfully identified seven distinct groups of soil erosion risk within the Peixe Angical Basin, each with unique characteristics.
Rainfall erosivity was found to be the most significant factor in determining this classification, and the discriminant vectors achieved a high accuracy rate of 99.35% when applied to the selected test observations.
The qualitative classification, guided by discriminant analysis, indicated that the Peixe Angical Basin is at a moderate to severe soil erosion risk.
This approach, which combines cluster analysis and discriminant analysis, provides a robust, accurate assessment of soil erosion risk, enabling more effective soil and water conservation measures.
The versatility of these techniques highlights their potential for application across diverse geographic regions, making them valuable tools for supporting conservation management practices in basins.

Author Contributions

Conceptualization, D.P.C., P.C.O. and M.A.C.; methodology, D.P.C., P.C.O. and M.A.C.; validation, D.P.C., P.C.O., M.A.C., M.L.N.S. and J.C.A.; formal analysis, D.P.C., P.C.O., M.A.C., M.L.N.S. and J.C.A.; investigation, D.P.C.; resources, D.P.C.; data curation, D.P.C.; writing—original draft preparation, D.P.C., P.C.O., M.A.C., M.L.N.S. and J.C.A.; writing—review and editing, D.P.C., P.C.O., M.A.C., M.L.N.S. and J.C.A.; visualization, D.P.C., P.C.O., M.A.C., M.L.N.S. and J.C.A.; supervision, J.C.A.; project administration, D.P.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Council for Scientific and Technological Development (CNPq), process number 152652/2022-1.

Data Availability Statement

All data are available and open access.

Acknowledgments

D.P.C., J.C.A., and M.L.N.S. thank CNPq for grants number 152652/2022-1, 307059/2022-7, and 307950/2021-2, respectively.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Food and Agriculture Organization [FAO]; United Nations Environment Programme [PNUMA]; United Nations Educational, Scientific and Cultural Organization [UNESCO]. Metodología Provisional para la Evaluación de la Degradación de Los Suelos. Technical Report. 1981. Available online: https://catalogosiidca.csuca.org/Record/UCR.000142550/Description (accessed on 22 June 2023).
  2. Bertoni, J.; Lombardi Neto, F. Conservação do Solo, 10th ed.; Ícone: São Paulo, Brazil, 2017; 392p. [Google Scholar]
  3. Alves, W.S.; Martins, A.P.; Morais, W.A.; Pôssa, É.M.; Castro, R.M.; Moura, D.M.B. USLE modelling of soil loss in a Brazilian cerrado catchment. Remote Sens. Appl. Soc. Environ. 2022, 27, 100788. [Google Scholar] [CrossRef]
  4. Castro, R.M.; Alves, W.S.; Marcionilio, S.M.L.O.; Moura, D.M.B.; Oliveira, D.M.S. Soil losses related to land use and rainfall seasonality in a watershed in the Brazilian Cerrado. J. S. Am. Earth Sci. 2022, 119, 104020. [Google Scholar] [CrossRef]
  5. Cunha, E.R.; Santos, C.A.G.; Silva, R.M.; Panachuki, E.; Oliveira, P.T.S.; Oliveira, N.S.; Falcão, K.S. Assessment of current and future land use/cover changes in soil erosion in the Rio da Prata basin (Brazil). Sci. Total Environ. 2022, 818, 151811. [Google Scholar] [CrossRef] [PubMed]
  6. Botelho, T.H.; Jácomo, S.D.A.; Almeida, R.T.; Griebeler, N.P. Use of USLE/GIS technology for identifying criteria for monitoring soil erosion losses in agricultural areas. Eng. Agric. 2018, 38, 13–21. [Google Scholar] [CrossRef]
  7. Durães, M.F.; Coelho Filho, J.A.P.; Oliveira, V.A.D. Water erosion vulnerability and sediment delivery rate in upper Iguaçu river basin–Paraná. RBRH 2016, 21, 728–741. [Google Scholar] [CrossRef]
  8. Durães, M.F.; Mello, C.R.D. Distribuição espacial da erosão potencial e atual do solo na Bacia Hidrográfica do Rio Sapucaí, MG. Eng. Sanit. Ambient. 2016, 21, 677–685. [Google Scholar] [CrossRef]
  9. Miranda, R.B.; Scarpinella, G.D.A.; Silva, R.S.; Mauad, F.F. Water erosion in Brazil and in the world: A brief review. Mod. Environ. Sci. Eng. 2015, 1, 17–26. [Google Scholar] [CrossRef]
  10. Couto Júnior, A.A.; Conceição, F.T.; Fernandes, A.M.; Spatti Junior, E.P.; Lupinacci, C.M.; Moruzzi, R.B. Land use changes associated with the expansion of sugar cane crops and their influences on soil removal in a tropical watershed in São Paulo State (Brazil). Catena 2019, 172, 313–323. [Google Scholar] [CrossRef]
  11. Vieira, A.S.; Valle Junior, R.F.; Rodrigues, V.S.; Quinaia, T.L.S.; Mendes, R.G.; Valera, C.A.; Fernandes, L.F.S.; Pacheco, F.A.L. Estimating water erosion from the brightness index of orbital images: A framework for the prognosis of degraded pastures. Sci. Total Environ. 2021, 776, 146019. [Google Scholar] [CrossRef]
  12. Weiler, E.B.; Tamiosso, M.F.; Cruz, J.C.; Reichert, J.M.; Schorr, L.P.B.; Mantovanelli, B.C.; Santos, F.D.; Fantinel, R.A.; Baumhardt, E. Integrated Environmental Management and Planning based on Soil Erosion Susceptibility Scenarios. An. Acad. Bras. Ciênc. 2021, 93, e20191120. [Google Scholar] [CrossRef]
  13. Pinto, G.S.; Servidoni, L.E.; Lense, G.H.E.; Moreira, R.S.; Mincato, R.L. Estimativa das perdas de solo por erosão hídrica utilizando o Método de Erosão Potencial. Rev. Dep. Geogr. 2020, 39, 62–71. [Google Scholar] [CrossRef]
  14. Rodrigues, V.S.; Valle Júnior, R.F.; Fernandes, L.F.S.; Pacheco, F.A.L. The assessment of water erosion using Partial Least Squares-Path Modeling: A study in a legally protected area with environmental land use conflicts. Sci. Total Environ. 2019, 691, 1225–1241. [Google Scholar] [CrossRef]
  15. Batista, P.V.G.; Silva, M.L.N.; Silva, B.P.C.; Curi, N.; Bueno, I.T.; Acérbi Júnior, F.W.; Davies, J.; Quinton, J. Modelling spatially distributed soil losses and sediment yield in the upper Grande River Basin-Brazil. Catena 2017, 157, 139–150. [Google Scholar] [CrossRef]
  16. Ayer, J.E.B.; Olivetti, D.; Mincato, R.L.; Silva, M.L.N. Erosão hídrica em Latossolos Vermelhos distróficos. Pesq. Agropec. Trop. 2015, 45, 180–191. [Google Scholar] [CrossRef]
  17. Beskow, S.; Mello, C.R.; Norton, L.D.; Curi, N.; Viola, M.R.; Avanzi, J.C. Soil erosion prediction in the Grande River Basin, Brazil using distributed modeling. Catena 2009, 79, 49–59. [Google Scholar] [CrossRef]
  18. Galdino, S.; Sano, E.E.; Andrade, R.G.; Grego, C.R.; Nogueira, S.F.; Bragantini, C.; Flosi, A.H. Large-scale modeling of soil erosion with RUSLE for conservationist planning of degraded cultivated Brazilian pastures. Land Degrad. Dev. 2016, 27, 773–784. [Google Scholar] [CrossRef]
  19. Salis, H.H.C.; Costa, A.M.; Viana, J.H.M. Estimativa da perda anual de solos na bacia hidrográfica do Córrego Marinheiro, Sete Lagoas-MG, por meio da RUSLE. Bol. Geogr. 2019, 37, 101–115. [Google Scholar] [CrossRef]
  20. Gomes, L.; Simões, S.J.; Dalla Nora, E.L.; Sousa-Neto, E.R.; Forti, M.C.; Ometto, J.P.H. Agricultural expansion in the Brazilian Cerrado: Increased soil and nutrient losses and decreased agricultural productivity. Land 2019, 8, 12. [Google Scholar] [CrossRef]
  21. Avanzi, J.C.; Silva, M.L.N.; Curi, N.; Norton, L.D.; Beskow, S.; Martins, S.G. Spatial distribution of water erosion risk in a watershed with eucalyptus and Atlantic Forest. Ciênc. Agrotec. 2013, 37, 427–434. [Google Scholar] [CrossRef]
  22. Resende, M.; Curi, N.; Rezende, S.B.; Corrêa, G.F.; Ker, J.C. Pedologia: Base Para Distinção de Ambientes, 6th ed.; Editora UFLA: Lavras, Brazil, 2014; 378p. [Google Scholar]
  23. Food and Agriculture Organization of the United Nations. Soil Erosion by Water: Some Measures for Its Control on Cultivated Lands; FAO: Rome, Italy, 1965; 284p. [Google Scholar]
  24. Alvares, C.A.; Stape, J.L.; Sentelhas, P.C.; Gonçalves, J.L.M.; Sparovek, G. Köppen’s climate classification map for Brazil. Meteorol. Z. 2013, 22, 711–728. [Google Scholar] [CrossRef]
  25. Farr, T.G.; Rosen, P.A.; Caro, E.; Crippen, R.; Duren, R.; Hensley, S.; Kobrick, M.; Paller, M.; Rodriguez, E.; Roth, L.; et al. The shuttle radar topography mission. Rev. Geophys. 2007, 45, RG2004. [Google Scholar] [CrossRef]
  26. Cardoso, D.P. Rainfall Erosivity Estimation via Several Methods, and Water Erosion Modeling at Peixe Angical Reservoir-TO. Ph.D. Thesis, University Federal of Lavras, Lavras, Brazil, 2021. [Google Scholar]
  27. Rencher, A.C.; Christensen, W.F. Methods of Multivariate Analysis, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2012; 781p. [Google Scholar]
  28. Friedman, J.H.; Tukey, J.W. A projection pursuit algorithm for exploratory data analysis. IEEE Trans. Comput. 1974, 23, 881–890. [Google Scholar] [CrossRef]
  29. Espezua, S.; Villanueva, E.; Maciel, C.D.; Carvalho, A. A projection pursuit framework for supervised dimension reduction of high dimensional small sample datasets. Neurocomputing 2015, 149, 767–776. [Google Scholar] [CrossRef]
  30. Hastie, T.; Buja, A.; Tibshirani, R. Penalized discriminant analysis. Ann. Statist. 1995, 23, 73–102. [Google Scholar] [CrossRef]
  31. Lee, E.K.; Cook, D. A projection pursuit index for large p small n data. Stat. Comput. 2010, 20, 381–392. [Google Scholar] [CrossRef]
  32. Pena, D.; Prieto, F. Cluster identification using projections. J. Am. Stat. Assoc. 2001, 96, 1433–1445. [Google Scholar] [CrossRef]
  33. Ossani, P.C.; Cirillo, M.A. MVar: Multivariate Analysis. Available online: https://cran.r-project.org/web/packages/MVar/index.html (accessed on 22 June 2023).
  34. R Development Core Team. R: A Language and Environment for Statistical Computing. Available online: http://www.R-project.org/ (accessed on 17 September 2023).
Figure 1. Location map of the Peixe Angical Drainage Basin, showing its boundaries across Goiás, Tocantins, and the Federal District, along with the spatial distribution of altitudes.
Figure 2. Input parameters for soil erosion modeling: rainfall erosivity (R), soil erodibility (K), topographic factor (LS), and cover-management factor (C).
Figure 3. Application of the projection pursuit technique on soil loss data for 1990, 2000, 2010, and 2017 using the minimum Kurtosis index.
Figure 4. Application of the projection pursuit technique to soil loss data (years of 1990, 2000, 2010, and 2017), using the maximum Kurtosis index.
Figure 5. Dendrogram using Minkowski distance and Ward method for similarity and dissimilarity analysis in the formation of erosion risk classes for the Peixe Angical Basin.
Figure 6. Application of the projection pursuit technique with the PDA index to the 2017 soil loss data, applying the k-means method.
Figure 7. General soil erosion risk classification for the Peixe Angical Basin based on the discriminant vectors.
Table 1. Organization of Input Data for Cluster Analysis.
| Instances | Factors for USLE Modeling (R, K, LS, C) | Classes |
|---|---|---|
| 1 | x_ij | 1 |
| 2 | | 2 |
| 50,108 | | 7 |
x_ij = value of the ith instance (observation) of the jth attribute.
Table 2. Number of observations per soil erosion risk class.
| Classes | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Totals | 8238 | 7910 | 6187 | 8412 | 10,964 | 5872 | 2525 |
Table 3. Ranges of soil erosion risk factors for the Peixe Angical Basin, state of Tocantins.
| Limits | R (MJ mm ha−1 h−1 year−1) | K (Mg h MJ−1 mm−1) | LS (dimensionless) | C (dimensionless) | Classes |
|---|---|---|---|---|---|
| Minimum | 7074.14 | 0.0090 | <1 | 0.001 | 1 |
| Maximum | 7684.94 | 0.0567 | >20 | 0.500 | 1 |
| Minimum | 7683.45 | 0.0032 | <1 | 0.001 | 2 |
| Maximum | 8265.08 | 0.0567 | >20 | 0.500 | 2 |
| Minimum | 8265.26 | 0.0032 | <1 | 0.001 | 3 |
| Maximum | 8779.61 | 0.0567 | >20 | 0.500 | 3 |
| Minimum | 8780.08 | 0.0032 | <1 | 0.001 | 4 |
| Maximum | 9289.02 | 0.0567 | >20 | 0.500 | 4 |
| Minimum | 9284.97 | 0.0032 | <1 | 0.001 | 5 |
| Maximum | 9853.04 | 0.0355 | >20 | 0.500 | 5 |
| Minimum | 9853.33 | 0.0090 | <1 | 0.001 | 6 |
| Maximum | 10,855.87 | 0.0355 | >20 | 0.500 | 6 |
| Minimum | 10,856.77 | 0.0090 | <1 | 0.001 | 7 |
| Maximum | 11,966.68 | 0.0567 | >20 | 0.500 | 7 |
Table 4. Performance evaluation of classifiers for qualitative classification of soil erosion risk, including sensitivity, specificity, ROC area, false positive rate (FP Rate), false negative rate (FN Rate), and F-score.
| Class | Sensitivity | Specificity | ROC Area | FP Rate | FN Rate | F-Score |
|---|---|---|---|---|---|---|
| 1 | 0.9987 | 0.9992 | 0.9989 | 0.0008 | 0.0013 | 0.9972 |
| 2 | 0.9653 | 0.9933 | 0.9793 | 0.0067 | 0.0347 | 0.9661 |
| 3 | 0.9989 | 0.9858 | 0.9924 | 0.0142 | 0.0011 | 0.9748 |
| 4 | 0.9905 | 1.0000 | 0.9952 | 0.0000 | 0.0095 | 0.9950 |
| 5 | 0.9297 | 0.9957 | 0.9644 | 0.0010 | 0.0703 | 0.9601 |
| 6 | 0.9956 | 0.9957 | 0.9957 | 0.0043 | 0.0044 | 0.9865 |
| 7 | 0.9477 | 0.9995 | 0.9736 | 0.0005 | 0.0523 | 0.9711 |
Table 5. Confusion matrix results for soil erosion risk class: total values, counts, and success proportion.
| Classes | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1 | 8227 | 0 | 0 | 0 | 0 | 11 | 0 |
| 2 | 0 | 8120 | 249 | 0 | 43 | 0 | 0 |
| 3 | 0 | 12 | 10,952 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 2501 | 0 | 0 | 24 |
| 5 | 0 | 266 | 0 | 0 | 5752 | 169 | 0 |
| 6 | 35 | 0 | 0 | 0 | 0 | 7875 | 0 |
| 7 | 0 | 0 | 306 | 1 | 0 | 0 | 5565 |
| Total | 8262 | 8398 | 11,507 | 2502 | 5795 | 8055 | 5589 |
| Number of hits | 8227 | 8120 | 10,952 | 2501 | 5752 | 7875 | 5565 |
| Proportion of hits | 0.9957 | 0.9669 | 0.9518 | 0.9996 | 0.9926 | 0.9777 | 0.9957 |
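To illustrate how the per-class measures in Table 4 relate to the counts in Table 5, the following Python sketch recomputes sensitivity, specificity, FP rate, FN rate, and F-score from the confusion matrix. It is only a sketch under stated assumptions: the rows of Table 5 are read as reference classes and the columns as classifier-assigned classes (the table does not state its orientation; this reading is chosen because it reproduces the listed sensitivities), and the function name per_class_metrics and the use of NumPy are illustrative choices rather than the authors' workflow.

```python
import numpy as np

# Counts transcribed from Table 5.
# Assumption: rows = reference classes, columns = classifier-assigned classes.
cm = np.array([
    [8227,    0,     0,    0,    0,   11,    0],
    [   0, 8120,   249,    0,   43,    0,    0],
    [   0,   12, 10952,    0,    0,    0,    0],
    [   0,    0,     0, 2501,    0,    0,   24],
    [   0,  266,     0,    0, 5752,  169,    0],
    [  35,    0,     0,    0,    0, 7875,    0],
    [   0,    0,   306,    1,    0,    0, 5565],
])


def per_class_metrics(cm):
    """One-vs-rest metrics for each class of a square confusion matrix."""
    total = cm.sum()
    tp = np.diag(cm)              # hits on the diagonal
    fn = cm.sum(axis=1) - tp      # rest of each row
    fp = cm.sum(axis=0) - tp      # rest of each column
    tn = total - tp - fn - fp

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)    # hits divided by the column total
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fp_rate": 1.0 - specificity,
        "fn_rate": 1.0 - sensitivity,
        "precision": precision,
        "f_score": f_score,
    }


metrics = per_class_metrics(cm)
for name, values in metrics.items():
    print(name, np.round(values, 4))
```

Under this reading, the "Proportion of hits" row of Table 5 corresponds to the hits divided by the column totals (the precision in the sketch), while the sensitivity in Table 4 corresponds to the hits divided by the row totals. For class 5 the recomputed specificity is approximately 0.9990, which agrees with the listed FP rate of 0.0010 (FP rate = 1 − specificity).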
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cardoso, D.P.; Ossani, P.C.; Cirillo, M.A.; Silva, M.L.N.; Avanzi, J.C. Using Machine Learning to Propose a Qualitative Classification of Risk of Soil Erosion. AgriEngineering 2024, 6, 4280-4293. https://doi.org/10.3390/agriengineering6040241
