Digital Mapping of Soil Organic Carbon Using Machine Learning Algorithms in the Upper Brahmaputra Valley of Northeastern India
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area
2.2. Soil Sampling and Analysis
2.3. Digital Soil Mapping Technique
2.4. Selection of Environmental Covariates
2.5. Machine Learning Techniques
2.5.1. Random Forest
2.5.2. Cubist
2.5.3. Extreme Gradient Boosting
2.5.4. Support Vector Machines
2.6. Model Evaluation
2.7. Uncertainty Assessment
3. Results
3.1. Descriptive Statistics
3.2. Evaluation of Prediction Models
3.3. Importance of Environmental Variables
3.4. Spatial Prediction of SOC
3.5. Uncertainty Prediction
4. Conclusions
- RF had the best accuracy and the lowest uncertainty for predicting the regional SOC compared to XGBoost, SVM, and Cubist.
- Compared to XGBoost, SVM, and Cubist, RF showed the highest R2 (0.418) and the lowest RMSE (0.377) for predicting SOC on the validation data.
- In the RF model, the most important predictors of SOC ranked Elev > MAT > Band3 > Band1 > MRVBF. In the Cubist model, the most important variables for explaining the variation in SOC were Slp, TRI, MAT, and Band4. AP and LS were the most important factors in the XGBoost and SVM models.
- The predicted SOC ranged from 0.44 to 1.35%, 0.031 to 1.61%, 0.035 to 1.71%, and 0.47 to 1.36% with the RF, Cubist, XGBoost, and SVM models, respectively.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Disclaimer/Publisher’s Note: The statements, opinions, and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions, or products referred to in the content.
Groups | Predictor | Abbreviation | Resolution | Description |
---|---|---|---|---|
Terrain indices | Elevation (m) | Elev | 30 m | Vertical distance above sea level |
Terrain indices | Slope | Slp | 30 m | Inclination of the land surface from the horizontal |
Terrain indices | Relative Slope Position | RSP | | Relative slope position |
Terrain indices | Topographic Wetness Index | TWI | 30 m | Ratio of local catchment area to slope |
Terrain indices | Multi-Resolution index of Valley Bottom Flatness | MRVBF | 30 m | Measure of flatness and lowness |
Terrain indices | Valley Depth | VD | 30 m | Relative position of the valley |
Terrain indices | Channel Network Base Level | CNBL | 30 m | Distance to the channel network base level |
Terrain indices | Channel Network Distance | CND | 30 m | Distance to the channel network |
Spectral indices | Normalized Difference Vegetation Index | NDVI | 30 m | Amount of vegetation |
Spectral indices | Landsat data (11 bands) | Band1–11 | 30 m | Landsat OLI spectral band |
Climate | Annual Precipitation | AP | 1 km | Bioclimatic variable (BIO12) |
Climate | Mean Annual Temperature | MAT | 1 km | Bioclimatic variable (BIO1) |
ML Algorithm | Hyperparameter | Defined Parameters | Definition |
---|---|---|---|
Random Forest (RF) | Mtry | 1–30 | The number of input variables |
Random Forest (RF) | Ntree | 100–3000 | The number of trees |
Cubist (Regression Tree) | Committees | 1–100 | The number of model trees |
Cubist (Regression Tree) | Neighbors | 0–9 | The number of nearest neighbors |
Extreme Gradient Boosting (XGBoost) | Booster | gbtree | The type of model |
Extreme Gradient Boosting (XGBoost) | Max_Depth | 3–10 | The depth of the tree |
Extreme Gradient Boosting (XGBoost) | Min_Child_Weight | 0–5 | The minimum sum of weights of all observations |
Extreme Gradient Boosting (XGBoost) | Colsample_Bytree | 0.5–1 | The number of variables supplied to a tree |
Extreme Gradient Boosting (XGBoost) | Subsample | 0.5–1 | The number of samples supplied to a tree |
Extreme Gradient Boosting (XGBoost) | eta | 0.01–0.5 | Learning rate |
Support Vector Machine (SVM) | Kernel Type | RBF | The kernel function |
Support Vector Machine (SVM) | C | 0.01–100 | The penalty parameter |
Support Vector Machine (SVM) | σ | 0.01–100 | The bandwidth parameter |
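The hyperparameter ranges in the table above can be written down as a search space. The following is a minimal sketch, assuming a simple random search; the tuning procedure actually used is not described in this section, and the names `SEARCH_SPACE`, `FIXED`, and `sample_candidate` are hypothetical. Integer bounds are sampled as integers and decimal bounds as continuous values, which is also an assumption.

```python
import random

# Tunable ranges copied from the hyperparameter table (low, high).
SEARCH_SPACE = {
    "RF": {"mtry": (1, 30), "ntree": (100, 3000)},
    "Cubist": {"committees": (1, 100), "neighbors": (0, 9)},
    "XGBoost": {
        "max_depth": (3, 10),
        "min_child_weight": (0, 5),
        "colsample_bytree": (0.5, 1.0),
        "subsample": (0.5, 1.0),
        "eta": (0.01, 0.5),
    },
    "SVM": {"C": (0.01, 100.0), "sigma": (0.01, 100.0)},
}

# Settings the table lists as a single fixed choice rather than a range.
FIXED = {"XGBoost": {"booster": "gbtree"}, "SVM": {"kernel": "RBF"}}


def sample_candidate(model, rng):
    """Draw one hyperparameter combination for `model` (hypothetical helper)."""
    params = dict(FIXED.get(model, {}))
    for name, (low, high) in SEARCH_SPACE[model].items():
        if isinstance(low, int) and isinstance(high, int):
            # Integer-valued range: sample inclusively on both ends.
            params[name] = rng.randint(low, high)
        else:
            # Continuous range: sample uniformly between the bounds.
            params[name] = rng.uniform(low, high)
    return params


if __name__ == "__main__":
    rng = random.Random(0)
    for model in SEARCH_SPACE:
        print(model, sample_candidate(model, rng))
```

Each sampled dictionary would then be passed to the corresponding model and scored on held-out data; a grid search over the same ranges would be an equally valid reading of the table.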
Dataset | n | Min | Max | Mean | SE | Median | SD | CV | Skewness | Kurtosis |
---|---|---|---|---|---|---|---|---|---|---|
Total | 160 | 0.10 | 1.85 | 0.81 | 0.03 | 0.75 | 0.38 | 47.36 | 0.28 | −0.66 |
Calibration | 128 | 0.10 | 1.54 | 0.81 | 0.03 | 0.81 | 0.37 | 45.78 | 0.04 | −0.88 |
Validation | 32 | 0.29 | 1.85 | 0.78 | 0.07 | 0.63 | 0.42 | 54.37 | 1.05 | 0.27 |
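The table above splits the 160 samples into 128 calibration and 32 validation samples (an 80/20 split), and reports SE and CV alongside SD. The sketch below shows how such columns are commonly derived, assuming a random split, the sample standard deviation, SE = SD/√n, and CV = 100·SD/mean; the function names are hypothetical and the splitting method is not stated in this section. With the rounded table values, 100 × 0.38 / 0.81 ≈ 46.9, close to the reported CV of 47.36, and 0.38 / √160 ≈ 0.03, matching the reported SE.

```python
import math
import random
import statistics


def describe(values):
    """Summary statistics in the style of the descriptive table (assumed definitions)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator (assumption)
    return {
        "n": n,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "se": sd / math.sqrt(n),   # standard error of the mean
        "cv": 100 * sd / mean,     # coefficient of variation, in percent
    }


def split_calibration_validation(samples, validation_fraction=0.2, seed=42):
    """Randomly split samples into (calibration, validation) lists."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    calibration, validation = split_calibration_validation(range(160))
    print(len(calibration), len(validation))
    print(describe([0.5, 0.7, 0.9]))
```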
Covariates | Min | Max | Mean | Median | SD | CV | Skewness | Kurtosis |
---|---|---|---|---|---|---|---|---|
Asp | 0.00 | 6.28 | 3.18 | 3.14 | 1.86 | 58.32 | −0.02 | −1.09 |
CI | −47.18 | 51.21 | 5.39 | 3.84 | 14.96 | 277.65 | 0.33 | 1.46 |
CNBL | 70.75 | 109.97 | 82.43 | 79.61 | 8.59 | 10.42 | 1.18 | 0.67 |
CND | 0.00 | 22.51 | 7.03 | 5.98 | 5.37 | 76.44 | 0.75 | 0.02 |
Elev | 59.67 | 116.40 | 88.53 | 86.13 | 11.17 | 12.62 | 0.34 | −0.46 |
LS-Factor | 0.02 | 9.90 | 2.67 | 2.56 | 1.62 | 60.83 | 1.31 | 3.54 |
MRRTF | 0.00 | 4.42 | 0.65 | 0.33 | 0.83 | 128.10 | 2.06 | 4.23 |
MRVBF | 0.00 | 5.80 | 0.73 | 0.30 | 1.10 | 150.04 | 2.51 | 6.26 |
NDVI | 0.08 | 0.46 | 0.37 | 0.37 | 0.05 | 14.19 | −1.15 | 4.44 |
RSP | 0.00 | 0.43 | 0.11 | 0.10 | 0.09 | 83.05 | 1.17 | 1.26 |
Slp | 0.00 | 0.26 | 0.10 | 0.10 | 0.05 | 50.57 | 0.31 | 0.11 |
TCA | 959 | 4,892,340 | 78,638 | 2317 | 522,500 | 664 | 8 | 71 |
TPI | −11.19 | 10.49 | 0.85 | 1.26 | 2.98 | 352.22 | −0.39 | 1.59 |
TRI | 1.08 | 8.40 | 3.23 | 3.08 | 1.22 | 37.73 | 1.07 | 2.27 |
VD | 18.95 | 100.15 | 59.45 | 59.20 | 17.64 | 29.68 | 0.09 | −0.63 |
Band1 | 10,693.90 | 14,115.70 | 11,112.31 | 11,038.65 | 355.06 | 3.20 | 4.01 | 31.11 |
Band2 | 9613.74 | 13,974.50 | 10,121.27 | 10,015.80 | 448.33 | 4.43 | 4.22 | 33.29 |
Band3 | 8952.72 | 14,430.60 | 9615.18 | 9455.46 | 554.81 | 5.77 | 4.38 | 34.80 |
Band4 | 7539.60 | 15,172.70 | 8488.40 | 8311.01 | 796.62 | 9.38 | 4.02 | 30.28 |
Band5 | 14,978.70 | 21,984.60 | 18,392.15 | 18,427.75 | 1178.43 | 6.41 | 0.10 | 0.37 |
Band6 | 11,266.50 | 20,536.80 | 13,582.21 | 13,334.00 | 1183.53 | 8.71 | 1.79 | 7.28 |
Band7 | 7657.73 | 18,707.30 | 9506.82 | 9174.76 | 1252.43 | 13.17 | 2.90 | 17.33 |
Band8 | 8331.01 | 14,683.00 | 9091.32 | 8789.09 | 739.24 | 8.13 | 3.19 | 19.53 |
Band9 | 5024.32 | 5068.92 | 5048.36 | 5048.27 | 7.03 | 0.14 | 0.00 | 0.93 |
Band10 | 27,930.40 | 30,232.70 | 28,404.28 | 2343.90 | 268.27 | 0.94 | 2.56 | 13.60 |
Band11 | 24,840.80 | 26,183.80 | 25,195.66 | 25,178.80 | 178.82 | 0.71 | 1.66 | 6.24 |
AP | 2554.71 | 3253.06 | 3019.87 | 3126.40 | 222.67 | 7.37 | −0.59 | −1.18 |
MAT | 23.65 | 24.15 | 23.98 | 23.95 | 0.10 | 0.43 | −0.24 | −0.41 |
Subscript c denotes calibration; subscript v denotes validation.

Model | R2c | CCCc | RMSEc | MEc | R2v | CCCv | RMSEv | MEv |
---|---|---|---|---|---|---|---|---|
RF | 0.966 | 0.863 | 0.159 | 0.001 | 0.418 | 0.549 | 0.377 | 0.136 |
Cubist | 0.396 | 0.571 | 0.291 | 0.039 | 0.230 | 0.314 | 0.485 | 0.062 |
SVM | 0.471 | 0.453 | 0.293 | 0.015 | 0.081 | 0.175 | 0.452 | 0.049 |
XGBoost | 0.998 | 0.990 | 0.022 | 0.000 | 0.152 | 0.190 | 0.424 | 0.054 |
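The four metrics in the table above (R2, CCC, RMSE, ME) can be computed from paired observed and predicted values. The following is a minimal sketch under stated assumptions: R2 is taken as the squared Pearson correlation, CCC as Lin's concordance correlation coefficient, and ME as the mean of predicted minus observed. The exact formulas and sign convention used for the table are not given in this section, and `evaluation_metrics` is a hypothetical name.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return R2, CCC, RMSE and ME for paired observed/predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Divide-by-n variances and covariance, as in Lin's CCC.
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n

    r2 = cov ** 2 / (var_o * var_p)  # squared Pearson correlation (assumption)
    ccc = 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    me = sum(p - o for o, p in zip(observed, predicted)) / n  # predicted - observed
    return {"R2": r2, "CCC": ccc, "RMSE": rmse, "ME": me}


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    obs = [0.5, 0.8, 1.1, 1.4]
    pred = [0.6, 0.8, 1.0, 1.2]
    print(evaluation_metrics(obs, pred))
```

Because CCC penalizes both scatter and systematic offset, it cannot exceed the absolute correlation, which is consistent with the table: for example, a calibration R2 of 0.966 for RF corresponds to r ≈ 0.98, above its CCC of 0.863.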
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kumar, A.; Moharana, P.C.; Jena, R.K.; Malyan, S.K.; Sharma, G.K.; Fagodiya, R.K.; Shabnam, A.A.; Jigyasu, D.K.; Kumari, K.M.V.; Doss, S.G. Digital Mapping of Soil Organic Carbon Using Machine Learning Algorithms in the Upper Brahmaputra Valley of Northeastern India. Land 2023, 12, 1841. https://doi.org/10.3390/land12101841