Water Hyacinth Invasion and Management in a Tropical Hydroelectric Reservoir: Insights from Random Forest and SVM Classification
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area
2.2. Dataset and Preprocessing
- Resampling: All bands were resampled to a spatial resolution of 10 m, using one of the visible bands as the reference. With nearest neighbor resampling, each pixel of the resampled band takes the value of the nearest pixel at the original resolution, without interpolation. The method therefore conserves the original data values, avoids distortions that could affect the analysis of discrete data such as classifications, and is computationally efficient [16]. The result is a set of bands with a homogeneous spatial resolution, suitable for multiband analysis such as the calculation of spectral indices.
- Subset: The images were cropped to a region of interest (ROI) that included the entire reservoir to facilitate processing. Additionally, a subset was created with bands B2, B3, B4, B5, B6, B7, B8, B8A, B9, B11, and B12.
- Collocation: Finally, the images were stacked with the Collocation tool, using a maximum of three images per period. This tool spatially overlays the images so that the pixel values of the secondary products (dependent images) are resampled onto the geographic grid of the main product (reference image).
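The nearest neighbor resampling described in the list above can be illustrated with a minimal sketch. It assumes a 20 m band stored as a list of rows and an integer upsampling factor of 2 (20 m to 10 m). The function name `resample_nearest` and the sample values are hypothetical, and this is not the tool used in the study.

```python
def resample_nearest(band, factor):
    """Upsample a 2-D grid by an integer factor using nearest neighbor.

    Each output pixel copies the value of the source pixel that contains it,
    so no new (interpolated) values are created.
    """
    out = []
    for row in band:
        # Repeat every pixel `factor` times along the row (x direction).
        wide_row = [value for value in row for _ in range(factor)]
        # Repeat the widened row `factor` times (y direction).
        for _ in range(factor):
            out.append(list(wide_row))
    return out


# Hypothetical 2 x 2 patch of a 20 m band (reflectance values are made up).
band_20m = [[0.10, 0.30],
            [0.50, 0.70]]

band_10m = resample_nearest(band_20m, 2)  # 4 x 4 grid at 10 m

# Nearest neighbor never invents new values: both sets are identical.
original_values = {v for row in band_20m for v in row}
resampled_values = {v for row in band_10m for v in row}
```

Because every 10 m pixel is a copy of an existing 20 m pixel, the set of values in the output is the same as in the input, which is the property the text relies on.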
2.3. Training and Validation Points
2.4. Image Classification with Machine Learning
2.5. Assessing Accuracy
- Precision: The proportion of instances classified as positive that are actually positive, that is, true positives (TP) divided by the sum of true positives and false positives (FP). It indicates how reliable the model’s positive predictions are.
- Recall: The proportion of actual positive instances that the model correctly identified, that is, TP divided by the sum of TP and false negatives (FN). It is especially important in problems where the omission of positive instances is critical.
- F1-score: The harmonic mean of Precision and Recall, which combines both into a single value and balances the two.
- Accuracy: A global measure of the proportion of correct predictions made by the model over the total number of predictions made.
- Macro average (Macro avg): The unweighted mean of a metric over all classes. Every class counts equally regardless of its size, so poor performance on a small class is not hidden by class imbalance.
- Weighted average (Weighted avg): The mean of a metric over all classes, with each class weighted by its support (the number of samples of that class in the dataset).
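The definitions above can be made concrete with a small sketch that derives each metric from a confusion matrix. The three-class matrix and the helper name `per_class_metrics` are hypothetical and chosen only so that the arithmetic is easy to follow; they are not the study's data.

```python
def per_class_metrics(cm):
    """Return a (precision, recall, f1, support) tuple for each class.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(cm[k]) - tp                       # truly k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append((precision, recall, f1, tp + fn))
    return results


# Hypothetical confusion matrix (rows: true class, columns: predicted class).
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 1, 4]]

metrics = per_class_metrics(cm)
total = sum(sum(row) for row in cm)

accuracy = sum(cm[k][k] for k in range(len(cm))) / total
macro_f1 = sum(m[2] for m in metrics) / len(metrics)     # unweighted mean
weighted_f1 = sum(m[2] * m[3] for m in metrics) / total  # support-weighted mean
```

In this made-up example the third class has a precision of 1.0 but a recall of 0.8: nothing was wrongly assigned to it, yet one of its five samples was missed. That is the same pattern of high precision with lower recall that a small class can show in a real report.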
3. Results and Discussion
- Water: both classifiers give very similar and highly accurate results for the Water class (F1-score of 0.97 for RF and 0.96 for SVM).
- Water Hyacinth: both techniques perform well, with RF slightly more precise in classifying Water Hyacinth (precision of 0.86 versus 0.81 for SVM).
- Other Vegetation: both classifiers provide consistent and high results for the Other Vegetation class (F1-score of 0.91 for RF and 0.90 for SVM).
- Bare Ground: both classifiers reach a precision of 1.00 but a lower recall for the Bare Ground class (0.72 for RF and 0.62 for SVM), indicating that some instances of bare ground were not correctly classified, particularly with SVM.
- Grass: both classifiers offer similar and high results for the Grass class (F1-score of 0.90 for RF and 0.89 for SVM).
4. Conclusions
- Algorithm performance:
  - Random Forest: provided a smoother and more stable view of water hyacinth dynamics, which is beneficial for analyzing long-term trends and gradual changes.
  - Support Vector Machine: showed greater sensitivity to fluctuations in water hyacinth coverage, capturing more pronounced variations and responding more strongly to changes in control conditions. This makes it suitable for detecting rapid changes and peaks in infestation.
- Impact of the COVID-19 pandemic:
  - During the COVID-19 pandemic (March 2020–July 2022), control activities were intermittent or suspended, leading to a significant increase in the area covered by water hyacinth according to both algorithms. This highlights the importance of maintaining infestation control and the need for continued implementation of control measures.
- Post-pandemic control and recovery:
  - After control activities resumed following the pandemic, both algorithms show a decrease in the area occupied by water hyacinth, indicating the effectiveness of the implemented measures.
- Algorithm strengths and applications:
  - RF’s stability advantage: allows for long-term monitoring and trend analysis, providing a reliable tool for assessing the effectiveness of control policies over time.
  - SVM’s sensitivity to rapid changes: makes it useful for reactive management and for planning emergency control measures.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Suárez, B.; Vera, J.D.; Botero, F.; Suárez, B.H.; Giraldo, W. Hidroituango Intake Gate Closure—Emergency Conditions. Rev. Fac. Ing. 2022.
- Datta, A.; Maharaj, S.; Prabhu, G.N.; Bhowmik, D.; Marino, A.; Akbari, V.; Rupavatharam, S.; Sujeetha, J.A.R.P.; Anantrao, G.G.; Poduvattil, V.K.; et al. Monitoring the Spread of Water Hyacinth (Pontederia crassipes): Challenges and Future Developments. Front. Ecol. Evol. 2021, 9.
- Malik, A. Environmental Challenge Vis a Vis Opportunity: The Case of Water Hyacinth. Environ. Int. 2007, 33, 122–138.
- Barrett, S.C.H.; Forno, I.W. Style Morph Distribution in New World Populations of Eichhornia crassipes (Mart.) Solms-Laubach (Water hyacinth). Aquat. Bot. 1982, 13, 299–306.
- Wilson, J.R.; Holst, N.; Rees, M. Determinants and Patterns of Population Growth in Water Hyacinth. Aquat. Bot. 2005, 81, 51–67.
- Rodríguez-Lara, J.W.; Cervantes-Ortiz, F.; Arámbula-Villa, G.; Mariscal-Amaro, L.A.; Aguirre-Mancilla, C.L.; Andrio-Enríquez, E. Water Hyacinth (Eichhornia crassipes): A Review. Agron. Mesoam. 2022, 33. Available online: https://www.redalyc.org/journal/437/43768481006/43768481006.pdf (accessed on 1 September 2024).
- Harun, I.; Pushiri, H.; Amirul-Aiman, A.J.; Zulkeflee, Z. Invasive Water Hyacinth: Ecology, Impacts and Prospects for the Rural Economy. Plants 2021, 10, 1613.
- Pádua, L.; Antão-Geraldes, A.M.; Sousa, J.J.; Rodrigues, M.Â.; Oliveira, V.; Santos, D.; Miguens, M.F.P.; Castro, J.P. Water Hyacinth (Eichhornia crassipes) Detection Using Coarse and High Resolution Multispectral Data. Drones 2022, 6, 47.
- Thamaga, K.H.; Dube, T. Understanding Seasonal Dynamics of Invasive Water Hyacinth (Eichhornia crassipes) in the Greater Letaba River System Using Sentinel-2 Satellite Data. GIsci Remote Sens. 2019, 56, 1355–1377.
- Godana, G.; Fufa, F.; Debesa, G. Eichhornia Crassipes Expansion Detection Using Geospatial Techniques: Lake Dambal, Oromia, Ethiopia. Environ. Chall. 2022, 9, 100616.
- Villa, P.; Bresciani, M.; Bolpagni, R.; Pinardi, M.; Giardino, C. A Rule-Based Approach for Mapping Macrophyte Communities Using Multi-Temporal Aquatic Vegetation Indices. Remote Sens. Environ. 2015, 171, 218–233.
- Pádua, L.; Duarte, L.; Antão-Geraldes, A.M.; Sousa, J.J.; Castro, J.P. Spatio-Temporal Water Hyacinth Monitoring in the Lower Mondego (Portugal) Using Remote Sensing Data. Plants 2022, 11, 3465.
- Belgiu, M.; Drăguţ, L. Random Forest in Remote Sensing: A Review of Applications and Future Directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
- Mukarugwiro, J.A.; Newete, S.W.; Adam, E.; Nsanganwimana, F.; Abutaleb, K.; Byrne, M.J. Mapping Spatio-Temporal Variations in Water Hyacinth (Eichhornia crassipes) Coverage on Rwandan Water Bodies Using Multispectral Imageries. Int. J. Environ. Sci. Technol. 2021, 18, 275–286.
- BID. EPM Implementacion Modelo Calidad del Agua Phi Resumen Didáctico; BID: Medellin, Colombia, 2016.
- Roy, D.P.; Li, J.; Zhang, H.K.; Yan, L. Best Practices for the Reprojection and Resampling of Sentinel-2 Multi Spectral Instrument Level 1C Data. Remote Sens. Lett. 2016, 7, 1023–1032.
- Rolon-Mérettea, D.; Ross, M.; Rolon-Mérettea, T.; Church, K. Introduction to Anaconda and Python: Installation and setup. Python Res. Psychol. 2020, 16, S3–S11.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Liaw, A.; Wiener, M. Classification and Regression by Random Forest. R News 2002, 2, 18–22.
- Gislason, P.O.; Benediktsson, J.A.; Sveinsson, J.R. Random Forests for Land Cover Classification. Pattern Recognit. Lett. 2006, 27, 294–300.
- Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297.
- Bayable, G.; Cai, J.; Mekonnen, M.; Legesse, S.A.; Ishikawa, K.; Imamura, H.; Kuwahara, V.S. Detection of Water Hyacinth (Eichhornia crassipes) in Lake Tana, Ethiopia, Using Machine Learning Algorithms. Water 2023, 15, 880.
- Powers, D.M.W. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation. arXiv 2020, arXiv:2010.16061.
- Dersseh, M.G.; Steenhuis, T.S.; Kibret, A.A.; Eneyew, B.M.; Kebedew, M.G.; Zimale, F.A.; Worqlul, A.W.; Moges, M.A.; Abebe, W.B.; Mhiret, D.A.; et al. Water Quality Characteristics of a Water Hyacinth Infested Tropical Highland Lake: Lake Tana, Ethiopia. Front. Water 2022, 4, 774710.
- Avci, C.; Budak, M.; Yağmur, N.; Balçik, F. Comparison between Random Forest and Support Vector Machine Algorithms for LULC Classification. Int. J. Eng. Geosci. 2023, 8, 1–10.
Band | Spatial Resolution (m) | Wavelength (nm) |
---|---|---|
B2 (blue) | 10 | 496.6 |
B3 (green) | 10 | 560 |
B4 (red) | 10 | 664.5 |
B5 (red edge 1) | 20 | 703.9 |
B6 (red edge 2) | 20 | 740.2 |
B7 (red edge 3) | 20 | 782.5 |
B8 (NIR) | 10 | 835.1 |
B8A (NIR narrow) | 20 | 864.8 |
B11 (SWIR 1) | 20 | 1613.7 |
B12 (SWIR 2) | 20 | 2202.4 |
Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2018 | 1 | 1 | 1 | 1 | 4 | ||||||||
2019 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 8 | |||||
2020 | 1 | 1 | 1 | 1 | 4 | ||||||||
2021 | 1 | 1 | 1 | 1 | 1 | 5 | |||||||
2022 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 7 | |||||
2023 | 1 | 1 | 2 | 1 | 5 | ||||||||
Total | 2 | 4 | 3 | 2 | 2 | 1 | 5 | 5 | 2 | 2 | 3 | 2 | 33 |
Parameter | Value | Description |
---|---|---|
max_depth | None | Nodes are expanded until all leaves are pure or contain fewer samples than the value defined in min_samples_split. |
min_samples_split | 2 | The minimum number of samples required to split an internal node. Controls the creation of new splits and helps prevent overfitting. |
min_samples_leaf | 1 | The minimum number of samples required at a leaf node. |
min_weight_fraction_leaf | 0 | The minimum weighted fraction of the total sum of sample weights required at a leaf node. |
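max_features | auto | The number of features considered when looking for the best split. For a scikit-learn classifier, auto is equivalent to sqrt (the square root of the number of features) in the versions that accept this value. |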
max_leaf_nodes | None | There is no limit to the number of leaf nodes allowed in each tree. |
min_impurity_decrease | 0 | A node is split if the split induces a decrease in impurity greater than or equal to this value. |
bootstrap | True | Bootstrap samples (drawn with replacement) are used when building trees. |
oob_score | False | The out-of-bag score is not computed to estimate the generalization of the model. |
n_jobs | None | Only one job is used during fitting and prediction. |
random_state | None | No fixed seed is set; the global random state of the NumPy library is used. |
verbose | 0 | No output during fitting. |
warm_start | False | Previous results are not reused to fit and add additional trees to the estimator. |
class_weight | None | All classes have equal weight in the classification problem. |
ccp_alpha | 0 | The complexity parameter for cost-complexity pruning; with a value of 0, no pruning is performed. |
max_samples | None | Each tree is fitted on a bootstrap sample of the same size as the training set. |
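
To make the `bootstrap`, `max_samples`, and `oob_score` entries more concrete, the following sketch draws one bootstrap sample of the kind a single tree is fitted on. It uses only the Python standard library and is not the scikit-learn implementation; the helper name `draw_bootstrap` and the training-set size are hypothetical.

```python
import random


def draw_bootstrap(n_samples, rng):
    """Draw n_samples indices with replacement (bootstrap=True, max_samples=None).

    Returns the in-bag indices and the out-of-bag (never drawn) indices.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed only so that the sketch is repeatable
n_samples = 1000        # hypothetical training-set size

in_bag, out_of_bag = draw_bootstrap(n_samples, rng)

# Fraction of samples that this tree never sees. For large n it tends
# towards 1/e (about 0.37). These are the samples an out-of-bag score
# would be computed on if oob_score were enabled.
oob_fraction = len(out_of_bag) / n_samples
```

Since roughly a third of the samples are left out of each tree, an out-of-bag estimate is available at no extra labelling cost; with `oob_score` set to False, that estimate is simply not computed.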
Parameter | Value | Description |
---|---|---|
C | 1 | Regularization parameter. Controls the trade-off between a smooth decision boundary and the correct classification of training points. |
kernel | rbf | Radial basis function kernel, suitable for most cases. |
degree | 3 | Degree of the polynomial kernel; ignored by the rbf kernel. |
gamma | scale | Uses 1 / (n_features × X.var()) as the gamma value. |
coef0 | 0 | Independent term in the kernel function; not used by the rbf kernel. |
shrinking | True | Uses the shrinking heuristic. |
probability | False | Probability estimates are not enabled. |
tol | 0.001 | Tolerance for the stopping criterion. Controls the accuracy of the result. |
cache_size | 200 | Kernel cache size in MB. |
class_weight | None | All classes have the same weight. |
verbose | False | Verbose output during fitting is disabled. |
max_iter | −1 | Maximum number of iterations; −1 means there is no limit. |
decision_function_shape | ovr | Shape of the decision function: one-vs-rest (ovr). |
break_ties | True | If there is a tie in the class decision, the tie is broken according to the confidence values of the decision function. |
random_state | None | Controls the randomness of the estimator. |
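
The `kernel` and `gamma` entries can be illustrated with a short sketch. scikit-learn documents `gamma='scale'` as 1 / (n_features * X.var()), and the RBF kernel has the form exp(-gamma * ||x - z||^2). The code below reproduces both with the standard library only; the feature matrix `X` and the helper names are hypothetical, and the values are made up.

```python
import math


def gamma_scale(X):
    """gamma = 1 / (n_features * variance of all values in X)."""
    values = [v for row in X for v in row]
    mean = sum(values) / len(values)
    # Population variance over all elements, as X.var() computes it.
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return 1.0 / (len(X[0]) * variance)


def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Hypothetical samples with two features each.
X = [[0.0, 2.0],
     [2.0, 4.0],
     [4.0, 0.0]]

gamma = gamma_scale(X)
k_same = rbf_kernel(X[0], X[0], gamma)  # identical points give 1.0
k_far = rbf_kernel(X[0], X[2], gamma)   # distant points give a smaller value
```

Because gamma is derived from the variance of the input features, the kernel width adapts to the scale of the data, which is one reason feature scaling matters for an RBF-kernel SVM.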
Class | RF Precision | RF Recall | RF F1-Score | RF Support | SVM Precision | SVM Recall | SVM F1-Score | SVM Support |
---|---|---|---|---|---|---|---|---|
Water | 0.98 | 0.95 | 0.97 | 162 | 0.97 | 0.95 | 0.96 | 162 |
Water Hyacinth | 0.86 | 0.96 | 0.90 | 136 | 0.81 | 0.93 | 0.87 | 136 |
Other Vegetation | 0.90 | 0.92 | 0.91 | 157 | 0.87 | 0.92 | 0.90 | 157 |
Bare Ground | 1.00 | 0.72 | 0.84 | 39 | 1.00 | 0.62 | 0.76 | 39 |
Grass | 0.92 | 0.88 | 0.90 | 94 | 0.94 | 0.85 | 0.89 | 94 |
Accuracy | | | 0.92 | 588 | | | 0.90 | 588 |
Macro Avg | 0.93 | 0.89 | 0.90 | 588 | 0.92 | 0.85 | 0.88 | 588 |
Weighted Avg | 0.90 | 0.92 | 0.92 | 588 | 0.91 | 0.90 | 0.90 | 588 |
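
As a consistency check on the table above, the macro and weighted averages can be recomputed from the per-class values. The sketch below does this for the RF recall column, using the recall and support values as reported (rounded to two decimals), so a recomputed figure could in principle differ from the table in the last digit.

```python
# RF recall and support per class, as reported in the table above.
rf_recall = {
    "Water": 0.95,
    "Water Hyacinth": 0.96,
    "Other Vegetation": 0.92,
    "Bare Ground": 0.72,
    "Grass": 0.88,
}
support = {
    "Water": 162,
    "Water Hyacinth": 136,
    "Other Vegetation": 157,
    "Bare Ground": 39,
    "Grass": 94,
}

total = sum(support.values())  # total number of samples in the report

# Macro average: every class counts equally, regardless of its size.
macro_recall = sum(rf_recall.values()) / len(rf_recall)

# Weighted average: each class contributes in proportion to its support.
weighted_recall = sum(rf_recall[c] * support[c] for c in rf_recall) / total

print(f"macro recall    = {macro_recall:.2f}")
print(f"weighted recall = {weighted_recall:.2f}")
```

The support-weighted recall is, by definition, the overall accuracy, which is why it agrees with the Accuracy row for RF. The macro value is lower because the small Bare Ground class, which has the lowest recall, counts as much as the larger classes.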
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Correa-Mejía, L.F.; Garcés-Gómez, Y.A. Water Hyacinth Invasion and Management in a Tropical Hydroelectric Reservoir: Insights from Random Forest and SVM Classification. Ecologies 2025, 6, 8. https://doi.org/10.3390/ecologies6010008