Soil Classification Mapping Using a Combination of Semi-Supervised Classification and Stacking Learning (SSC-SL)
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area
2.2. Soil Data
2.3. Environmental Variables Data
2.3.1. Remote Sensing Data
2.3.2. Terrain Data
2.3.3. Soil Parent Material Data
2.3.4. Pre-Processing of Environmental Variables
2.4. Stacking Learning (SL)
2.4.1. Ranger
2.4.2. Rpart
2.4.3. XGBoost
2.4.4. Model Parameter Setting
2.5. Selection of Unlabelled Sample Points
2.6. Semi-Supervised Classification (SSC)
2.7. Model Prediction
- (1) Ranger, Rpart, and XGBoost were selected as the base learners for modelling and predicting the unlabelled data. The hyperparameters of each learner were tuned, each tuned model was used to predict the unlabelled data, and the prediction results of all three base learners were collected.
- (2) A random forest meta-learner then combined the predictions of the three base learners, yielding the stacking learning (SL) model.
- (3) A cluster analysis of the environmental variables was performed with fuzzy c-means (FCM) clustering to generate a cluster map. The unlabelled sample points were then selected near the patch boundaries of this map.
- (4) An initial model was trained on the labelled data and used to predict the unlabelled dataset.
- (5) We established a range of confidence thresholds from 0.55 to 0.95 in increments of 0.05. For a given threshold in this range, we retained the unlabelled samples whose prediction confidence met or exceeded it and merged them, with their predicted labels, into the labelled data.
- (6) Steps (4) and (5) were repeated until no further unlabelled samples reached the preset confidence threshold. This yielded the final training set expanded by semi-supervised classification.
- (7) The SL model and its three base learners were retrained on the training set expanded by semi-supervised classification. This produced four models: SSC-SL, SSC-Ranger, SSC-Rpart, and SSC-XGBoost. The four original models (SL, Ranger, Rpart, and XGBoost) were also output for comparison.
- (8) Soil maps were predicted with all eight models.
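Steps (1) and (2) amount to stacked generalization. The base learners are fitted first, and their class-probability outputs then become the input features of a meta-learner. The plain-Python sketch below illustrates only that mechanism; it is not the authors' R workflow. Note the following assumptions:

- The two toy learners (`NearestCentroid`, `KNN`) are hypothetical stand-ins for Ranger, Rpart, and XGBoost.
- The meta-learner is a k-nearest-neighbour vote rather than the random forest used in step (2).
- The meta-features come from out-of-fold predictions. This is a common way to keep the meta-learner from training on predictions the base learners made for their own training data. Whether the study used exactly this scheme is an assumption.

```python
import math
from collections import Counter


class NearestCentroid:
    """Toy learner: probabilities proportional to inverse distance to each class centroid."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.centroids = []
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.centroids.append([sum(col) / len(rows) for col in zip(*rows)])
        return self

    def predict_proba(self, X):
        probs = []
        for x in X:
            w = [1.0 / (math.dist(x, ctr) + 1e-9) for ctr in self.centroids]
            total = sum(w)
            probs.append([v / total for v in w])
        return probs


class KNN:
    """Toy learner: probabilities are vote shares among the k nearest training samples."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        self.classes = sorted(set(y))
        return self

    def predict_proba(self, X):
        probs = []
        for x in X:
            nearest = sorted(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))[: self.k]
            votes = Counter(self.y[i] for i in nearest)
            probs.append([votes[c] / len(nearest) for c in self.classes])
        return probs


def stratified_folds(y, k):
    """Round-robin fold ids within each class; with >= 2 samples per class,
    every training split still contains every class (so class orders agree)."""
    seen, folds = Counter(), []
    for label in y:
        folds.append(seen[label] % k)
        seen[label] += 1
    return folds


def out_of_fold_features(makers, X, y, k):
    """Meta-features: each base learner's class probabilities, predicted on held-out folds."""
    folds = stratified_folds(y, k)
    meta = [[] for _ in X]
    for make in makers:
        for f in range(k):
            train = [i for i in range(len(X)) if folds[i] != f]
            test = [i for i in range(len(X)) if folds[i] == f]
            if not test:
                continue
            model = make().fit([X[i] for i in train], [y[i] for i in train])
            for i, p in zip(test, model.predict_proba([X[i] for i in test])):
                meta[i].extend(p)
    return meta


class Stacking:
    """Base learners -> out-of-fold probabilities -> meta-learner."""

    def __init__(self, base_makers, meta_maker, k=3):
        self.base_makers, self.meta_maker, self.k = base_makers, meta_maker, k

    def fit(self, X, y):
        self.meta = self.meta_maker().fit(out_of_fold_features(self.base_makers, X, y, self.k), y)
        # Refit every base learner on all labelled data for use at prediction time.
        self.base = [make().fit(X, y) for make in self.base_makers]
        self.classes = self.meta.classes
        return self

    def predict_proba(self, X):
        per_learner = [m.predict_proba(X) for m in self.base]
        meta_X = [[v for probs in per_learner for v in probs[j]] for j in range(len(X))]
        return self.meta.predict_proba(meta_X)

    def predict(self, X):
        return [self.classes[max(range(len(p)), key=p.__getitem__)] for p in self.predict_proba(X)]


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
    y = ["A"] * 4 + ["B"] * 4
    model = Stacking([NearestCentroid, lambda: KNN(k=3)], lambda: KNN(k=3), k=2).fit(X, y)
    print(model.predict([[0.5, 0.5], [5.5, 5.5]]))
```

Real learners can be swapped in as long as they expose the same `fit`/`predict_proba`/`classes` shape. scikit-learn's `StackingClassifier` packages this same cross-validated stacking pattern.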
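Step (3) picks unlabelled points where the FCM cluster map changes class. The stdlib-only sketch below runs standard fuzzy c-means, takes each cell's highest-membership cluster, and flags cells that have a 4-neighbour in a different cluster. It makes several illustrative assumptions rather than reporting the study's settings:

- The environmental variables are already combined into one feature vector per raster cell, listed row by row.
- The fuzziness exponent is m = 2, and the initialisation is random.
- "Near the patch boundary" is approximated by immediate adjacency.
- The function names are hypothetical.

How many boundary cells the study actually sampled is not reproduced here.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of feature vectors; returns (memberships, centres)."""
    rng = random.Random(seed)
    n, dims = len(points), len(points[0])
    U = []
    for _ in range(n):  # random initial memberships, each row summing to 1
        row = [rng.random() for _ in range(c)]
        U.append([v / sum(row) for v in row])
    for _ in range(max_iter):
        # Centres are the membership^m-weighted means of the points.
        centres = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            centres.append([sum(w[i] * points[i][d] for i in range(n)) / sum(w) for d in range(dims)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_U = []
        for p in points:
            d = [math.dist(p, ctr) for ctr in centres]
            if min(d) == 0.0:  # point sits exactly on a centre: full membership there
                hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
                new_U.append([h / sum(hits) for h in hits])
            else:
                new_U.append([1.0 / sum((dj / dk) ** (2 / (m - 1)) for dk in d) for dj in d])
        shift = max(abs(a - b) for old, new in zip(U, new_U) for a, b in zip(old, new))
        U = new_U
        if shift < tol:
            break
    return U, centres


def hard_labels(U):
    """Defuzzify: each cell takes the cluster with the highest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in U]


def boundary_cells(labels, rows, cols):
    """(row, col) of cells with a 4-neighbour in a different cluster, i.e. on a patch boundary."""
    cells = []
    for r in range(rows):
        for c in range(cols):
            here = labels[r * cols + c]
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and labels[rr * cols + cc] != here:
                    cells.append((r, c))
                    break
    return cells


if __name__ == "__main__":
    rows, cols = 4, 6
    # Hypothetical single covariate: low values in the west half, high values in the east half.
    grid = [[0.0] if c < 3 else [10.0] for r in range(rows) for c in range(cols)]
    U, _ = fuzzy_c_means(grid, c=2)
    print(boundary_cells(hard_labels(U), rows, cols))
```

On this toy grid the two clusters split the west and east halves. The flagged candidates are therefore the cells in columns 2 and 3, on either side of the split.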
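Steps (4) to (6) describe a self-training loop. Each pass trains on the current labelled set, pseudo-labels the unlabelled samples whose confidence meets the threshold, merges them in, and repeats until no sample qualifies. The minimal stdlib sketch below rests on these assumptions:

- `NearestCentroid` is a hypothetical stand-in for whichever probabilistic classifier is used in step (4).
- "Confidence" is taken to be the top class probability.
- `max_rounds` is a safety cap added for illustration.

How the study chose among the thresholds in the 0.55 to 0.95 sweep is not stated in this section. The sweep here therefore only shows how many samples each threshold admits.

```python
import math


class NearestCentroid:
    """Hypothetical stand-in classifier: inverse-distance probabilities to class centroids."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.centroids = []
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.centroids.append([sum(col) / len(rows) for col in zip(*rows)])
        return self

    def predict_proba(self, X):
        probs = []
        for x in X:
            w = [1.0 / (math.dist(x, ctr) + 1e-9) for ctr in self.centroids]
            total = sum(w)
            probs.append([v / total for v in w])
        return probs


def self_train(make_model, X_lab, y_lab, X_unlab, threshold, max_rounds=50):
    """Train, pseudo-label unlabelled samples with confidence >= threshold, merge, repeat."""
    X_train, y_train, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        if not pool:
            break
        model = make_model().fit(X_train, y_train)
        added, remaining = 0, []
        for x, p in zip(pool, model.predict_proba(pool)):
            best = max(range(len(p)), key=p.__getitem__)
            if p[best] >= threshold:  # "met or exceeded" the confidence threshold
                X_train.append(x)
                y_train.append(model.classes[best])
                added += 1
            else:
                remaining.append(x)
        if added == 0:  # no new sample reached the threshold: stop iterating
            break
        pool = remaining
    return X_train, y_train


# Confidence thresholds 0.55, 0.60, ..., 0.95 (rounded to avoid float drift).
THRESHOLDS = [round(0.55 + 0.05 * i, 2) for i in range(9)]

if __name__ == "__main__":
    X_lab, y_lab = [[0, 0], [6, 6]], ["A", "B"]
    X_unlab = [[0.5, 0.2], [5.8, 6.1], [3, 3]]  # hypothetical covariate vectors
    for t in THRESHOLDS:
        _, y_new = self_train(NearestCentroid, X_lab, y_lab, X_unlab, t)
        print(t, len(y_new))
```

Lower thresholds admit more pseudo-labelled points, at a higher risk of adding label noise. Higher thresholds grow the training set more cautiously.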
2.8. Assessment of Model Accuracy
2.9. Importance Analysis of Environmental Variables
3. Results
3.1. Prediction Accuracy of Different Models
3.2. Spatial Distribution of Soil Subgroups
3.3. Importance Analysis of Environmental Variables
4. Discussion
5. Conclusions
- The SSC-SL model performs robustly in soil classification mapping and significantly improves prediction accuracy. It adds unlabelled sample points through the SSC approach and combines the strengths of the individual models within the SL framework. Together, these mitigate the loss of accuracy that results from relying on a single, simple machine learning model.
- The soil subgroup maps generated by the SSC-SL model are more plausible than those of the base models. They refine the spatial extent of each soil type and depict local detail more clearly.
- SPM, LU, MRVBF, and Ele consistently rank as highly important in all eight models, indicating that they are the primary environmental variables influencing the soil types in the study area.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
| Type | Name of Environmental Variable/Definition | Abbreviation |
|---|---|---|
| Remote Sensing | Simple Ratio (NIR/R) | SR |
| | Normalised Difference Vegetation Index ((NIR − R)/(NIR + R)) | NDVI |
| | Green Ratio Vegetation Index (NIR/G) | GRVI |
| | Normalised Difference Water Index ((G − NIR)/(G + NIR)) | NDWI |
| | Green Leaf Index ((2 × G − B − R)/(2 × G + B + R)) | GLI |
| | Land Use | LU |
| Terrain | Elevation (m) | Ele |
| | Aspect | Asp |
| | Slope | Slo |
| | Plan Curvature | PlC |
| | Profile Curvature | PrC |
| | Topographic Position Index | TPI |
| | Topographic Wetness Index | TWI |
| | Multi-Resolution Ridge Top Flatness Index | MRRTF |
| | Multi-Resolution Valley Bottom Flatness Index | MRVBF |
| | Horizontal Distance to Ridge Line | HDRL |
| | Horizontal Distance to Valley Line | HDVL |
| Soil Parent Material | Soil Parent Material | SPM |
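The five remote-sensing indices in the table are plain band arithmetic on the blue (B), green (G), red (R), and near-infrared (NIR) bands. The per-pixel sketch below implements exactly the formulas listed. The function names, the NaN convention for zero denominators (for example, no-data pixels), and the example band values are illustrative assumptions, not details from the study. For a full raster, the same function would be applied cell by cell to co-registered band arrays.

```python
import math


def _ratio(num, den):
    """Safe division: NaN instead of ZeroDivisionError (e.g. for no-data pixels)."""
    return num / den if den != 0 else math.nan


def spectral_indices(blue, green, red, nir):
    """Compute the table's five remote-sensing indices from one pixel's band values."""
    return {
        "SR": _ratio(nir, red),
        "NDVI": _ratio(nir - red, nir + red),
        "GRVI": _ratio(nir, green),
        "NDWI": _ratio(green - nir, green + nir),
        "GLI": _ratio(2 * green - blue - red, 2 * green + blue + red),
    }


if __name__ == "__main__":
    # Hypothetical reflectance values scaled by 10 000; not data from the study.
    print(spectral_indices(blue=500, green=800, red=600, nir=3000))
```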
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhu, F.; Zhu, C.; Lu, W.; Fang, Z.; Li, Z.; Pan, J. Soil Classification Mapping Using a Combination of Semi-Supervised Classification and Stacking Learning (SSC-SL). Remote Sens. 2024, 16, 405. https://doi.org/10.3390/rs16020405