Article

Spatial Mapping of the Groundwater Potential of the Geum River Basin Using Ensemble Models Based on Remote Sensing Images

1
National Institute of Ecology (NIE), 1210 Geumgang-ro, Maseo-myeon, Seocheon-gun, Chungcheongnam-do 33657, Korea
2
Department of Geoinformatics, University of Seoul, 163 Seoulsiripdaero, Dongdaemun-gu, Seoul 02504, Korea
3
Geoscience Platform Research Division, Korea Institute of Geoscience and Mineral Resources (KIGAM), 124, Gwahak-ro Yuseong-gu, Daejeon 34132, Korea
4
Korea University of Science and Technology, 217 Gajeong-ro Yuseong-gu, Daejeon 305-350, Korea
*
Author to whom correspondence should be addressed.
Remote Sens. 2019, 11(19), 2285; https://doi.org/10.3390/rs11192285
Submission received: 3 September 2019 / Revised: 23 September 2019 / Accepted: 26 September 2019 / Published: 30 September 2019

Abstract

This study analyzed the Groundwater Productivity Potential (GPP) of Okcheon city, Korea, using three different models. Two of the three are data mining models: the Boosted Regression Tree (BRT) model and the Random Forest (RF) model. The third is the Logistic Regression (LR) model. The three models are based on the relationship between groundwater-productivity data (specific capacity (SPC) and transmissivity (T)) and related hydrogeological factors from thematic maps, such as topography, lineament, geology, and land cover. The thematic maps were generated from remote sensing images. Groundwater productivity data were collected from 86 well locations. The resulting GPP maps were validated through area-under-the-curve (AUC) analysis using well data that had not been used for training the models. When T was used in the BRT, RF, and LR models, the obtained GPP maps had 81.66%, 80.21%, and 85.04% accuracy, respectively, and when SPC was used, the maps had 81.53%, 78.57%, and 82.22% accuracy, respectively. The LR model, which is a statistical model, showed the highest verification accuracy, and the other two models also showed high accuracies. These observations indicate that all three models can be useful for groundwater resource development.


1. Introduction

According to the United Nations World Water Development Report (WWDR) 2018, more than 2 billion people in the world do not have access to safe drinking water and sanitation. If the current levels of water pollution and consumption are not reduced, nearly one-third of the world’s population will suffer severe water stress by approximately 2050 [1]. Climate change, increasing water scarcity, environmental degradation, population growth, and urbanization are already posing challenges for surface water supply systems [2]. Other means of meeting the demand for freshwater, such as using groundwater, will have to be found. Presently, around 20% of the total groundwater resources are being used globally [3].
Groundwater is a very efficient resource and can be used for agriculture, forestry, rearing of livestock, industrial purposes, and as a drinking water source for the community [4]. One of the most valuable benefits of groundwater is that it is less susceptible to environmental pollution than surface water [5]. Therefore, efforts to find high-quality groundwater are growing globally [6]. In South Korea, the rate of groundwater use has increased, and yet its supply does not meet the needs of the people [7]. Therefore, studies that evaluate the sustainability and the potential of groundwater in South Korea should be encouraged, so that groundwater can be used and managed more efficiently.
With scientific advancement in terms of GIS technology, various spatial modeling techniques have been developed and applied to evaluate the potential of groundwater productivity in recent years. GIS and remote sensing can be used to detail large areas in a more cost-effective manner [8,9,10,11,12,13,14,15,16]. In contemporary studies, Frequency Ratio (FR) [17,18,19], Random Forest (RF) [20,21,22,23], Logistic Regression (LR) [24,25,26], Boosted Regression Tree (BRT) [27,28], Support Vector Machine (SVM) [13,29,30,31], Artificial Neural Network (ANN) [32,33,34,35,36], Weights of Evidence (WoE) [37,38,39], Evidential Belief Function (EBF) [40,41,42], and various other ensemble models have been applied for Groundwater Productivity Potential (GPP) mapping.
Ensemble models such as RF and BRT have also been used to study ecology, landslides, subsidence, and flood vulnerability [43]. Nsiah, et al. [44] evaluated the groundwater potential of Ghana’s Nabogo basin using the weighted overlay technique. They achieved more accurate and reliable results by utilizing the commonly used specific capacity (SPC) values as well as transmissivity (T) values. Park, Hamm, Jeon and Kim [24] performed GPP mapping using the LR and Multivariate Adaptive Regression Splines (MARS) models; these showed 84% and 87% verification accuracies, respectively. Lee, et al. [45] analysed the relationships between the groundwater pumping capacity and related factors using the FR and Boosted Classification Tree (BCT) models in Goyang-si in Gyeonggi-do province, South Korea. The accuracy rates were 68.31% and 69.39%, respectively. In these previous studies, various ensemble models were used to predict GPP, and their accuracy reached a reliable level (approximately >65%). However, to obtain highly accurate results, further studies are needed that apply various topographical, geological, and hydrological data (e.g., data obtained through remote sensing images) and various models that have not been utilized previously.
The purpose of the present study was to apply and analyze the LR (statistical model), RF, and BRT (machine learning models) models and to determine their ability to perform accurate and effective GPP mapping. In addition, this study intended to identify the important factors affecting GPP. Numerous preceding studies have used various models to analyze GPP; however, the LR, RF, and BRT models have not yet been widely used. Therefore, we related them to hydrogeological factors associated with groundwater productivity data to perform a more accurate GPP analysis and to verify and compare the accuracy and suitability of the LR, RF, and BRT models. The various groundwater-related factors used in this study were derived from thematic maps based on remote sensing data [46,47,48,49]. This study can be used as a reference for related future studies, such as the development of clean water resources, particularly groundwater [50].

2. Study Area and Spatial Data Set

2.1. Study Area

The research area in this study was Okcheon-gun in South Korea. The region is located in the upstream area of the Geum River basin, which is the basin of one of South Korea’s four major rivers. The Geum River flows from north to south in this area. Okcheon-gun lies between 36°10′N and 36°26′N latitudes and 127°29′E and 127°53′E longitudes. Its total area is 537.06 km², of which 347.04 km² is forest land, 55.82 km² is covered by fields, 45.63 km² is used as paddy fields, and other areas occupy 88.57 km² [6]. The annual precipitation in the area is 1297.4 mm, which is nearly equal to South Korea’s annual average precipitation of 1277.4 mm (1978–2007) [33]. However, due to the influence of the East Asian Monsoon climate, rainfall is intense during summer and winter, while there is not enough water during spring and autumn. This area uses approximately 45,032,000 m³ of groundwater per year. Of the total consumed groundwater, 67.2% is used for living, 32.1% for agriculture, and 0.5% for industrial purposes. Okcheon-gun thus uses most of its groundwater for living and agricultural purposes, and a GPP map of Okcheon-gun is necessary for more efficient groundwater management [51].
Geologically, this area was developed from the Okcheon era and includes the unrecorded Okcheon supergroup. It also includes the Pyeongan supergroup, Paleozoic Choseon supergroup, the Triassic and Jurassic granitic rocks, the Cretaceous sedimentary, Quaternary alluvium, volcanic rocks, and intrusive igneous rocks (Figure 1). The Quaternary alluvium was found to be distributed along the tributaries in the Okcheon area. The alluvial layers in the plain are developed in the granite area, and the basin shape is narrow downstream in the plain. The Quaternary alluvium constitutes unconsolidated clastic sediments consisting of gravel, sand, silt, and clay. Relatively, silt and clay are more thickly deposited in the plain due to river flooding.
The representative geological structure of the Okcheon area is the fault to the northwest of Okcheon. The area also contains a thrust fault in the Okcheon and Choseon supergroups, located over the upper Pyeongan supergroup. The thrust fault developed in the northeast and south-northwest directions [7]. A strike-slip fault exists to the northwest of the Okcheon fault and cuts across the Jurassic granite rocks, the Okcheon supergroup, the Paleozoic sequence, and the Triassic granite rocks. It stretches for tens of kilometers from the west side (Figure 1).

2.2. Spatial Data Set

This study used three models that are based on the relationship between groundwater productivity and geological factors (Table 1). To calculate GPP accurately, we used SPC and T as groundwater productivity values.
SPC is defined as the amount of water that a well produces per unit of drawdown of its water level during pumping. Its value is derived from pumping test results by dividing the pumping rate by the drawdown. The pumping tests last for more than 24 h. The formula for calculating the SPC is as follows:
$$SPC = \frac{Q}{h_0 - h} \tag{1}$$
where SPC is the specific capacity of the aquifer ($[L^2 T^{-1}]$; m³/day/m), Q is the pumping rate ($[L^3 T^{-1}]$; m³/day), and $h_0 - h$ is the drawdown ([L]; m). T is defined as the flow rate through a unit width of the entire aquifer thickness under a unit hydraulic gradient. Therefore, T represents the ability to transfer flow in aquifers of constant thickness. T is the measure of a material’s capacity to transmit water according to Darcy’s law. In other words, it indicates the volume of water flowing through a 0.3 m × 0.3 m cross-sectional area of an aquifer under a hydraulic gradient of 0.3 m/0.3 m in a given amount of time (usually 24 h).
$$K(x, y) = \frac{1}{b} \int_0^b K(x, y, z)\, dz \tag{2}$$
$$T = K b \tag{3}$$
where T is transmissivity ($[L^2 T^{-1}]$), b is aquifer thickness ([L]), and K is hydraulic conductivity [6]. Generally, high values of T indicate wider unit widths of the aquifer and better drawdown. The mathematical calculations for estimating T using SPC are explained in detail in [52,53]. All the T and SPC values in this study were extracted from the pumping tests recorded in [7].
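As a minimal sketch of the definitions above, the following Python functions compute SPC from a pumping rate and a drawdown, and T from hydraulic conductivity and aquifer thickness. The function names, the layered approximation of the depth integral, and the sample numbers are illustrative assumptions; they are not taken from the pumping tests in [7].

```python
def specific_capacity(q_m3_per_day, h0_m, h_m):
    """SPC = Q / (h0 - h), returned in m^3/day/m."""
    drawdown = h0_m - h_m
    if drawdown <= 0:
        raise ValueError("drawdown (h0 - h) must be positive")
    return q_m3_per_day / drawdown


def depth_averaged_k(k_layers_m_per_day, layer_thicknesses_m):
    """Discrete form of K = (1/b) * integral_0^b K(z) dz.

    Assumes the aquifer can be described by horizontal layers with a
    constant conductivity each (a hypothetical simplification).
    """
    b = sum(layer_thicknesses_m)
    weighted = sum(k * dz for k, dz in zip(k_layers_m_per_day, layer_thicknesses_m))
    return weighted / b


def transmissivity(k_m_per_day, b_m):
    """T = K * b, returned in m^2/day."""
    return k_m_per_day * b_m


if __name__ == "__main__":
    # Hypothetical well: 100 m^3/day pumped, water level drops from 20 m to 15 m.
    print(specific_capacity(100.0, 20.0, 15.0))
    # Hypothetical two-layer aquifer, 10 m per layer.
    k_mean = depth_averaged_k([0.5, 0.1], [10.0, 10.0])
    print(transmissivity(k_mean, 20.0))
```

Keeping the drawdown check explicit guards against swapped water levels, which would otherwise produce a negative SPC.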
Table 2 shows the results of the pumping tests, which lasted about 120 min for each well in Okcheon. The tests were performed by the Korea Institute of Geoscience and Mineral Resources, and all detailed experimental procedures and results are reported in the national groundwater survey report [7]. The groundwater productivity data used in this study were converted into binary form, where 1 is assigned when the groundwater productivity exceeds the median value and 0 is assigned otherwise. The split criteria were T (2.6 m²/day) and the corresponding SPC (4.88 m³/day/m), which are the medians of the two variables. In the present study, we applied T and SPC to the three models.
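The binary conversion described above can be sketched as follows. The thresholds are the split criteria quoted in the text; the choice of a greater-than-or-equal comparison follows the "≥" used later in the methodology, and the sample values are hypothetical.

```python
T_THRESHOLD = 2.6      # m^2/day, split criterion quoted in the text
SPC_THRESHOLD = 4.88   # m^3/day/m, split criterion quoted in the text


def binarize(values, threshold):
    """Return 1 for wells at or above the threshold and 0 otherwise."""
    return [1 if v >= threshold else 0 for v in values]


if __name__ == "__main__":
    # Hypothetical transmissivity values for five wells (m^2/day).
    sample_t = [0.4, 2.6, 3.1, 1.9, 12.0]
    print(binarize(sample_t, T_THRESHOLD))
```

Because the thresholds are medians, this coding yields roughly balanced classes, which is convenient for the classifiers described later.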
Eighteen groundwater-related factors were used for GPP analysis, including terrain and surface data derived from remote sensing images (Figure 2). The selected factors were slope gradient, relative slope position, plan curvature, hydrogeology, hydraulic slope, distance from faults, distance from lineament, depth of groundwater, distance from channel network, lineament density, valley depth, Topographic Wetness Index (TWI), slope length (LS) factor, drainage basin, Terrain Ruggedness Index (TRI), convergence index, land cover, and soil texture. The spatial database of these factors was built using the ArcGIS software together with SAGA-GIS.
The topographical data were obtained by digitizing aerial photographs taken in 2006; additional corrections and updates were made using other high-resolution satellite images. The satellite image used for correction was from Pleiades 1A, whose multispectral spatial resolution is 0.5 m, and the image was similar to the aerial photographs. Land cover maps were classified into 8 main categories using an unsupervised classification method applied to aerial photographs with a spatial resolution of 0.25 m taken in 2013. In addition, a Kompsat-3 remote sensing image with a spatial resolution of 0.7 m was used to evaluate the classification accuracy [45].
The digital elevation model (DEM) was generated at a resolution of 30 m from a 1:5000 digital topographic map from the National Geographic Information Institute (NGII). The slope gradient, plan curvature, relative slope position, valley depth, LS factor, convergence index, drainage basin, TWI, and TRI were calculated using the DEM [54]. In addition, various thematic maps, such as those depicting soil texture, land cover, and hydrogeology, were resampled to a 30 m resolution and used in this study [11].
Various parameters were used to analyze the GPP with more precision. The LS factor is the ratio of soil loss per unit catchment area to the slope length (L) and slope steepness (S). The formula proposed by Moore and Burch [55] for calculating the LS factor is as follows (Equation (4)):
$$LS = \left(\frac{A_s}{22.13}\right)^{0.6} \left(\frac{\sin\beta}{0.0896}\right)^{1.3} \tag{4}$$
where,
  • $A_s$ is the catchment area
  • β represents the slope gradient measured in degrees
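Equation (4) can be evaluated directly, as in the hedged sketch below. The function name and the sample inputs are assumptions for illustration; the slope is supplied in degrees, as in the definition above, and converted to radians before taking the sine.

```python
import math


def ls_factor(catchment_area, slope_deg):
    """LS = (A_s / 22.13)^0.6 * (sin(beta) / 0.0896)^1.3, with beta in degrees."""
    if catchment_area < 0 or slope_deg < 0:
        raise ValueError("catchment area and slope must be non-negative")
    beta = math.radians(slope_deg)
    return (catchment_area / 22.13) ** 0.6 * (math.sin(beta) / 0.0896) ** 1.3


if __name__ == "__main__":
    # Hypothetical cell: A_s = 150, slope of 8 degrees.
    print(ls_factor(150.0, 8.0))
```

The constants 22.13 and 0.0896 normalize the result so that a cell with A_s = 22.13 and sin β = 0.0896 has LS = 1.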
The convergence index represents the structure of the slope as a set of convergence and divergence sites. The index value for maximum convergence is +100. Conversely, the index value for maximum divergence is −100. For flat areas, the index is 0 [56]. This means that the index value is closer to convergence (+100) for larger slope values.
The TRI is an index developed by Riley [57]. It represents the altitude difference between adjacent cells in a grid. The TRI is calculated by determining the height differences between the center cell and the eight cells surrounding it. This difference corresponds to the average altitude change between any point on the grid and the surrounding area.
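The neighbourhood calculation described above can be illustrated with a small sketch. It uses one common formulation attributed to Riley, the square root of the summed squared elevation differences between a cell and its eight neighbours; this is an assumption, since other tools use the mean absolute difference instead and the exact variant used in this study is not stated. The grid values are hypothetical.

```python
import math


def terrain_ruggedness_index(dem, row, col):
    """TRI of an interior cell: sqrt of summed squared differences to 8 neighbours.

    `dem` is a list of rows of elevations; edge cells are not handled here.
    """
    if not (0 < row < len(dem) - 1 and 0 < col < len(dem[0]) - 1):
        raise ValueError("cell must have eight neighbours")
    center = dem[row][col]
    total = 0.0
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if d_row == 0 and d_col == 0:
                continue  # skip the center cell itself
            total += (dem[row + d_row][col + d_col] - center) ** 2
    return math.sqrt(total)


if __name__ == "__main__":
    # Hypothetical 3 x 3 elevation window in metres.
    window = [
        [101.0, 102.0, 103.0],
        [100.0, 100.0, 104.0],
        [99.0, 98.0, 100.0],
    ]
    print(terrain_ruggedness_index(window, 1, 1))
```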

3. Methodology

The GPP mapping process is shown in Figure 3. A total of 86 groundwater wells were split: 50% were randomly selected as training data and the other 50% were retained as validation data. A total of 18 hydrogeology-related factors were combined into a spatial database. Then, the selected T and SPC data (T values ≥ 2.61 m²/day and SPC values ≥ 4.88 m³/day/m) were used to train the three models. Finally, the resulting GPP maps were verified using Area-Under-the-Curve (AUC) analysis.
The 18 factors were arranged in a grid format with 1016 rows by 1211 columns. There was a total of 601,320 cells in the grid. The T and SPC values corresponded to a total of 86 cells.
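A random 50/50 split of the well set, as described above, could look like the following sketch. The function name, the use of integer well identifiers, and the fixed seed are assumptions made so the example is reproducible; the study does not state how its random selection was implemented.

```python
import random


def split_wells(well_ids, train_fraction=0.5, seed=0):
    """Shuffle the wells and return (training_ids, validation_ids)."""
    rng = random.Random(seed)  # local generator so the split is reproducible
    shuffled = list(well_ids)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    training, validation = split_wells(range(86))
    print(len(training), len(validation))
```

With 86 wells this gives 43 training and 43 validation wells, and no well appears in both groups.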

3.1. Random Forest (RF) Model

The random forest model is an ensemble classification technique that was developed as an extension of classification and regression trees (CART) to improve the prediction performance of the model [58]. The RF model constructs numerous decision trees to estimate the spatial relationship between groundwater and various topographic factors that consist of either categorical or continuous response variables. The RF model functions in two steps. First, it constructs the plurality of decision trees; this is the learning step. Second, a test to classify a loaded input value or predict its loading is performed. The advantages of the RF model include extremely high accuracy, simple and fast learning and testing algorithms, ability to handle thousands of input variables without deleting variables, good generalization performance through randomization, and multi-class algorithm characteristics.
Before running the RF model, two parameters must be defined: the number of randomly sampled variables ($m_{try}$) to use in each tree-building process and the number of trees ($n_{tree}$) to build in the forest. Both parameters should be optimized to minimize the generalization error. Breiman [59] and Liaw and Wiener [60] stated that even a single variable ($m_{try} = 1$) can yield high accuracy, while Grömping [61] showed that two or more variables (i.e., $m_{try} = 2, 3, 4, \ldots, m$) should be used to increase the accuracy of the model.
The RF model combines numerous trees, each grown on a bootstrap sample, and estimates its error from the out-of-bag (OOB) samples. About two thirds of the samples are used for training each tree, and the remaining one third are used for verification. The OOB error is an unbiased estimate of the generalization error. A detailed description of the mathematical formulation of the RF model is found in Breiman [59].
The goal of the RF model is to analyze the relationship between independent and dependent variables in the model-building stage to determine the weights for each factor. In this study, to analyze correlations between groundwater and related factors, groundwater productivity data (T or SPC) were used as the dependent variable, and the 18 groundwater-related factors were used as independent variables. The parameters used in the RF model are as follows: (1) the number of trees to be grown ($n_{tree}$), (2) the number of randomly sampled variables in each split ($m_{try}$), and (3) the minimum number of observations at the end node of the tree (node size). These parameters were set to 500, 10, and 5, respectively, using the STATISTICA 10.1 software [62].
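The split between in-bag and out-of-bag samples follows from sampling with replacement, which the sketch below illustrates. It only simulates the bootstrap step, not tree growing; the function names, the seed, and the sample size of 43 (the training half of the wells) are assumptions for illustration.

```python
import random


def bootstrap_oob(n_samples, rng):
    """Draw one bootstrap sample and return (in_bag_indices, oob_indices)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    oob = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, oob


def mean_oob_fraction(n_samples, n_trees, seed=0):
    """Average share of samples left out of the bag across n_trees draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        _, oob = bootstrap_oob(n_samples, rng)
        total += len(oob) / n_samples
    return total / n_trees


if __name__ == "__main__":
    # 43 training wells, 500 bootstrap draws (one per tree).
    print(mean_oob_fraction(43, 500))
```

The expected OOB share is (1 − 1/n)^n, which approaches 1/e (about 0.37) for large n; this is the origin of the "roughly one third" figure.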

3.2. Logistic Regression (LR) Model

The LR model is useful for predicting whether groundwater exists in a particular location, based on the predictor variables. The primary reason for using the LR model is to explain the relationship between the dependent variable and the independent variables [63]. The advantage of the LR model is that the variables need not be normally distributed, regardless of whether they are continuous, discrete, or a combination of both types [26]. In this study, the dependent variable indicates the presence of groundwater as a binary variable. The following equations show the relationship between groundwater presence and the independent variables:
$$p = \frac{e^{Z}}{1 + e^{Z}} \tag{5}$$
$$Z = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_m x_m \tag{6}$$
$$Z = \log_e\left[\frac{p}{1 - p}\right] = \operatorname{logit}(p) \tag{7}$$
where a is the intercept of the LR model, $b_1, b_2, \ldots, b_m$ are the regression coefficients of the logistic regression model, $x_1, x_2, \ldots, x_m$ are the independent variables, and Z is a linear combination of the independent variables. The probability (p) represents the estimated probability of potential groundwater. The dependent variable is coded in binary form, where 1 implies more than a specific amount of groundwater (T ≥ 2.61 m²/day or SPC ≥ 4.88 m³/day/m), and 0 indicates either less than that specific amount or no groundwater. The function Z, written as logit(p), is the natural logarithm of the odds that the dependent variable is 1. The LR model coefficient is a value that represents the percentage (%) of the variance of the dependent variable that is explained by the independent variables, and it has a value between 0.00 and 1.00. A value closer to 1.00 indicates a closer-to-perfect relationship, which is almost the same as the square of the multiple correlation coefficient in a linear regression.
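The logistic relationship above can be checked numerically with a short sketch. The intercept, coefficients, and factor values are hypothetical and are not the fitted values reported in Table 3; the probability is computed in a numerically stable form that is algebraically equal to e^Z / (1 + e^Z).

```python
import math


def linear_predictor(intercept, coefficients, factors):
    """Z = a + b_1 x_1 + ... + b_m x_m."""
    if len(coefficients) != len(factors):
        raise ValueError("one coefficient is needed per factor")
    return intercept + sum(b * x for b, x in zip(coefficients, factors))


def probability(z):
    """p = e^Z / (1 + e^Z), evaluated without overflow for large |Z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def logit(p):
    """Z = log_e(p / (1 - p)), the inverse of probability()."""
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    # Hypothetical model with two factors.
    z = linear_predictor(-1.0, [0.8, -0.05], [2.0, 10.0])
    print(z, probability(z))
```

A negative coefficient lowers Z and therefore p as its factor increases, while a positive coefficient raises them.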

3.3. Boosted Regression Tree (BRT) Model

BRT is one of many ensemble models that combine two or more models to enhance predictive capability. This model can be used to effectively perform classification or regression analysis on continuous and categorical data. The model constructs binary trees in which each node splits the samples into two groups. Each split node determines whether an observed value corresponds to a binary value of 1 or 0. The residual and standard deviations of the node are calculated in the following step. It has also been used in research to detect natural resources such as groundwater and minerals. Basically, the model depends on the number of regression trees produced; in this respect, it is similar to the RF model. BRT adopts a machine learning technique to resolve regression problems and uses a predefined loss function to create each regression tree step by step. It measures the error in one step and corrects it in the subsequent steps. The BRT model does not need to transform the original data or remove outliers for training. It is also suitable for analyzing nonlinear, complex relationships [64].
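The stepwise error correction described above can be illustrated with a toy boosting loop. This is a simplified sketch under stated assumptions (a single feature, one-split trees, squared-error loss, a fixed learning rate); it is not the STATISTICA implementation used in the study, and the data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump: one threshold and one mean per side."""
    candidates = sorted(set(xs))[:-1]  # keeps both sides of every split non-empty
    if not candidates:
        raise ValueError("need at least two distinct feature values")
    best = None
    for threshold in candidates:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def boost(xs, ys, n_trees=100, learning_rate=0.1):
    """Fit stumps one after another, each to the residuals of the current model."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared-error loss the negative gradient is the plain residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    value = model["base"]
    for threshold, left_mean, right_mean in model["stumps"]:
        value += model["learning_rate"] * (left_mean if x <= threshold else right_mean)
    return value


if __name__ == "__main__":
    # Hypothetical factor values and binary productivity labels.
    xs = [1, 2, 3, 4, 5, 6]
    ys = [0, 0, 0, 1, 1, 1]
    model = boost(xs, ys)
    print(predict(model, 1), predict(model, 6))
```

The learning rate shrinks each tree's contribution, so many small corrections accumulate instead of one tree fitting the residuals in a single step.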

3.4. GPP Mapping Process

First, we applied the Frequency Ratio (FR) method to calculate the spatial relationships between groundwater presence and the related factors. We analyzed the correlation between the groundwater well locations and the 18 factors related to groundwater productivity using the frequency ratio of each factor class. The frequency ratio is the ratio of the proportion of productive wells located in a class to the proportion of the total study area occupied by that class, so a value of 1 indicates an average relationship. In other words, if the FR is greater than 1, the class has a higher correlation with groundwater. Also, the logistic regression coefficient correlates with the potential productivity of groundwater, and the higher the value, the higher the correlation. The FR calculations are described in more detail in [6]. To analyze the GPP, the results of the three models were compared based on the FR model results. The results of the RF and BRT models were calculated using the STATISTICA software; the regression values at all nodes were classified and summed to calculate the importance of each predictor. The LR model was calculated using the SPSS 21 statistical software. The groundwater productivity data (T and SPC) were randomly separated for each model into training (50%) and verification (50%) data.
Most of the maps used in the GPP analysis were generated using the ArcGIS 10.5 software. A number of groundwater-related factor maps, including geological maps, were regenerated in the ASCII grid format at a 30 m resolution. The groundwater productivity values (T and SPC) were set as dependent variables and used as training data. In the following step, all data were classified as categorical or continuous data. The continuous variables included the Terrain Ruggedness Index (TRI), slope length (LS) factor, hydraulic slope, depth of groundwater, lineament density, slope gradient, relative slope position, drainage basins, valley depth, distance from faults, Topographic Wetness Index (TWI), and distance from lineament. The categorical variables included the hydrogeology, soil texture, convergence index, plan curvature, and land cover.
To validate the algorithms, the 86 T data points were randomly divided into two groups. Verification was performed using the previously segregated verification data. In the final verification process, a Receiver Operating Characteristics (ROC) curve was used. The ROC curve is an indicator of model performance [13]. To quantitatively determine the verification accuracy of the models, the Area Under the Curve (AUC) of the ROC curve was calculated for the total area, and the prediction accuracies were obtained. Typically, the validation accuracy of a model is measured by the area under the ROC curve, which can lie between 0.5 and 1. High AUC values indicate superior performance of an algorithm.
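A frequency ratio per factor class can be computed as in the sketch below, taking the FR as the share of wells in a class divided by the share of study-area cells in that class. The function name and the toy class labels are assumptions; the study's own values are in Table 3.

```python
from collections import Counter


def frequency_ratio(well_classes, cell_classes):
    """FR per class = (share of wells in class) / (share of area cells in class).

    `well_classes` holds the class of the cell under each productive well,
    `cell_classes` holds the class of every cell in the study area.
    """
    wells = Counter(well_classes)
    cells = Counter(cell_classes)
    n_wells = len(well_classes)
    n_cells = len(cell_classes)
    return {
        name: (wells.get(name, 0) / n_wells) / (count / n_cells)
        for name, count in cells.items()
    }


if __name__ == "__main__":
    # Hypothetical slope classes: 25% of the area is gentle, 75% is steep,
    # while 3 of 4 productive wells lie on gentle slopes.
    area = ["gentle"] * 25 + ["steep"] * 75
    wells = ["gentle", "gentle", "gentle", "steep"]
    print(frequency_ratio(wells, area))
```

In this toy case the gentle class has FR = 3 (wells are three times more frequent there than its area share suggests) and the steep class has an FR below 1.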
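The area under the ROC curve can be computed without plotting, using its equivalence to the probability that a randomly chosen positive well receives a higher score than a randomly chosen negative one. The sketch below uses this pairwise form; the labels and scores are hypothetical, and the study's own AUC values may have been derived with a different procedure (for example, from ranked map classes).

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical validation wells: binary productivity and predicted GPP index.
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.3, 0.1]
    print(roc_auc(labels, scores))  # 3 of 4 pairs are ordered correctly: 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of productive and non-productive wells.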

4. Results

4.1. Correlation between GPP and the Variables

Generally, the productivity of groundwater is affected by various factors such as topography, hydrogeology, soil, forestation, and flow velocity [54]. To analyze this quantitatively, we examined the relationship between GPP and the related factors using the FR and LR models.
Table 3 shows the coefficients of the factors in each class range; they were calculated with respect to groundwater T and SPC values. In general, the relationship between slope gradient and groundwater is inversely proportional. Thus, when the slope gradient is high, it is difficult for groundwater to accumulate in the aquifer. For slopes between 0° and 5°, the frequency ratio was approximately 3.0, which indicated a high probability of GPP. The hydraulic slope gradient, relative slope position, valley depth, and slope length (LS) were also found to be inversely proportional to GPP.
In the case of hydrogeological factors, the frequency ratio was higher for unconsolidated clastic sediment areas (Table 3). It was 0 for carbonate rocks, dolomite rock, and non-porous volcanic rocks. The areas with unconsolidated clastic sediments showed a stronger GPP than areas with carbonate rocks, because groundwater cannot easily penetrate between the tiny, densely packed particles of the latter. With regard to land cover, groundwater potential values were higher for paddy fields and urban areas and lower in the mixed forest area. In fact, it is highly probable that there are many wells containing a large amount of groundwater in the mixed-forest area, but the frequency ratio was relatively low because 65% of the study area was covered with mixed forest.
In the case of soil texture, the frequency ratio was higher for the D class (very slow infiltration rate) and lower for the C class (low infiltration rate). Sandy soil (D class) has an excellent effect on groundwater penetration because of its high permeability. Conversely, clay soil (A) has a low impact on groundwater accumulation because of its poor drainage capability and permeability.
In the case of plan curvature, concave areas have a ratio of about 1.2, which is considerably higher than that of convex areas (approximately 0.70). Concave surfaces contain more water, particularly during periods of heavy rainfall. Therefore, areas with concave surfaces are more advantageous than areas with convex surfaces for storing groundwater. The GPP frequency ratio generally increases with increasing lineament density. That is, the nearer the lineament density is to 0, the lower the GPP is, and the larger the lineament density, the higher the GPP. With regard to the distance-based factors, such as distance from a fault, lineament, and channel network, the closer an area is to these features, the higher is the likelihood of groundwater productivity. The longer the distance is, the lower is the likelihood of groundwater generation. In other words, various linear structures and remote areas were leakier, while the nearby areas had better recharge and higher penetration.
The frequency ratio for the depth of groundwater was highest between 6 m and 12 m. The TWI is defined as a function of the upstream contributing area per unit and the slope gradient. The results show that as the TWI value increases, the groundwater productivity ratio also increases. This is because a high wetness index indicates a better groundwater retention capability of an area.
The TRI represents the altitude difference between adjacent cells in a grid. That is, the higher the TRI value, the greater is the difference in altitude between adjacent areas. Therefore, a low TRI value indicates high groundwater content because the GPP is higher at low altitude differences. A drainage basin represents a catchment area. When the TRI value is between 100 and 105, the GPP is the highest.
For mapping the GPP, the LR coefficients of the 18 factors were computed using the SPSS software (Table 3). The LR coefficient represents the probability of occurrence, and this value typically ranges between 0 and 1. If the value of multiple logistic coefficients is calculated to be less than 0, the GPP is low. This is so because GPP becomes smaller than 1 when converted to the corresponding log value.
Positive values were obtained for the relative slope position, TWI, lineament density, distance from lineament, drainage basin, and TRI when T productivity was used in the LR model. SPC productivity values were positive for LS factor, lineament density, distance from lineament, depth of groundwater, drainage basin, and TRI. Non-porous volcanic rocks had the lowest T and SPC values for the hydrogeology factor, thereby indicating the lowest impact on GPP. The urban areas had the highest values of land-cover factor. The flat item showed least influence on the GPP corresponding to plan curvature and convergence index variables.
Table 4 shows the importance values of each predictor variable in the BRT and RF models. The data in the table also explain the correlation between GPP and the related factors. The predictor importance ranges between 0 and 1. A factor with an importance near 1 can be said to be closely related to the presence of groundwater. As shown in Table 4, the most important variable affecting groundwater productivity (both T and SPC values) when applying the BRT and RF models is land cover. Conversely, the least influential variable in the case of both models is plan curvature.

4.2. GPP Mapping and Validation

The GPP map was generated using the predictor values determined by the three models. That is, the higher the value of probability for an area, the more likely it is to contain groundwater. The probabilities calculated by the three models were re-expressed in the form of the groundwater productivity potential index (Figure 4).
The next step was to validate the GPP maps created using the BRT, RF, and LR models. The prediction rate of the validations was determined by comparing the GPP maps created using the RF, LR, and BRT models with the remaining 50% of groundwater wells data not used in the training set. A GPP rank with more than 10% value could explain the presence of 30% of the groundwater wells and rank with more than 60% value could explain 90% of the groundwater wells identified by the three models.
AUC was used to quantitatively compare the results of the three models. Upon validation of the GPP maps (Figure 5), the LR, RF, and BRT models produced AUC values of 0.8504, 0.8021, and 0.8166 with T values (i.e., the prediction accuracy was 85.04%, 80.21%, and 81.66%), respectively, and 0.8222, 0.7857, and 0.8153 with SPC values (i.e., the prediction accuracy was 82.22%, 78.57%, and 81.53%), respectively. All models indicated the presence of 90% of the potential groundwater wells in 60% of the study area.
This study applied the RF, BRT (data mining), and LR (statistics) models to estimate GPP. The RF and BRT data mining models showed good accuracy while spatially predicting GPP. Their accuracy amounted to approximately 80% and more. The LR statistical model showed the highest verification accuracy, reaching beyond 85%.

5. Conclusions and Discussion

The region of Okcheon in South Korea needs a stable water management system that can provide high-quality drinking water and water for agricultural use; this system should be based on sources other than surface water. To provide a sufficient amount of water, it is very important to predict the locations of uncontaminated usable groundwater with accuracy. Therefore, this study estimated the groundwater of the un-surveyed area by analyzing the relationship between the well locations and the surrounding environment including the topographic factors using three models. For the models applied in this study, half of well location data (43 well locations) were used as training data and the other half were used as validation data. Total 18 topography, soil, and land cover variables were used as independent variables. Finally, the estimated potential groundwater maps were provided by using three models of LR, RF and BRT.
From the results of the LR, RF, and BRT models, the following relationships between the well data and the examined factors could be established. GPP is higher on gentle slopes and gentle hydraulic slopes, at lower relative slope positions, and where slope lengths are shorter, because rainfall running off from the upper regions accumulates in the lower regions. This in turn positively influences the aquifer. In addition, the TRI is an index representing the altitude difference between two adjacent areas in open terrain, and GPP is higher where this difference is small. On the other hand, the distances from faults, lineaments, and the channel network were negatively correlated with GPP. Like surface water, groundwater in aquifers flows from higher to lower areas. As a result, most groundwater accumulates in areas of low altitude, and some is eventually discharged back to the river, the lowest zone. Consequently, most areas with large amounts of groundwater are close to the river, which is clearly reflected in this study. These results indicate that proximity to rivers increases the GPP of an area, as is known from the hydrogeological point of view.
In the case of the “distance from fault” factor, Bense et al. [61] noted that deformation along faults in the shallow crust (<1 km) introduces permeability heterogeneity and anisotropy, which has an important impact on processes such as groundwater flow. Although the results in this paper suggest that voids along faults have a positive effect on groundwater recharge, directly assessing the relationship between faults and groundwater recharge remains difficult. We consider that this needs to be examined further through various experiments.
In conclusion, the three proposed models were able to estimate the locations of groundwater wells with an average prediction accuracy of over 80%. These results support the usefulness of the three models for groundwater resource development. The final GPP maps proposed in this paper are based on data from only 86 wells, which limits how well they reflect real conditions. However, better results can be expected if more well data are collected for this region in the future and the GPP analysis is repeated. The results also show that the accuracy is higher when GPP is predicted using T values rather than SPC values. Despite these limitations, the GPP mapping methods can be efficiently applied in the future for national groundwater development and utilization planning in Korea.

Author Contributions

S.L. and H.-S.J. conceived and designed the experiments; J.-C.K. performed the experiments; all the authors analyzed the results; J.-C.K. wrote the paper.

Funding

This research received no external funding.

Acknowledgments

This research was conducted as part of the Basic Research Project of the Korea Institute of Geoscience and Mineral Resources (KIGAM) and the Science and Technology Internationalization Project (NRF-2016K1A3A1A09915721) hosted by the National Research Foundation of Korea (NRF) Grant, which is funded by the Ministry of Science and ICT.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. The United Nations World Water Development Report 2018 (WWDR). In Nature-Based Solutions for Water; UNESCO: Paris, France, 2018.
  2. World Health Organization (WHO). Progress on Sanitation and Drinking Water: 2015 Update and mdg Assessment; WHO: Geneva, Switzerland, 2015.
  3. World Economic Forum (WEF). The Global RISK Report 2016; WEF: Geneva, Switzerland, 2016.
  4. Vadiati, M.; Asghari-Moghaddam, A.; Nakhaei, M.; Adamowski, J.; Akbarzadeh, A. A fuzzy-logic based decision-making approach for identification of groundwater quality based on groundwater quality indices. J. Environ. Manag. 2016, 184, 255–270.
  5. Naghibi, S.A.; Moghaddam, D.D.; Kalantar, B.; Pradhan, B.; Kisi, O. A comparative assessment of gis-based data mining models and a novel ensemble model in groundwater well potential mapping. J. Hydrol. 2017, 548, 471–483.
  6. Kim, J.-C.; Jung, H.-S.; Lee, S. Groundwater productivity potential mapping using frequency ratio and evidential belief function and artificial neural network models: Focus on topographic factors. J. Hydroinf. 2018, 20, 1436–1451.
  7. Ministry of Land, Transport and Maritime Affairs (MLTM). National Groundwater Monitoring Network in Korea Annual Report 2016; MLTM: Seoul, Korea, 2016.
  8. Kim, J.-C.; Yoon, J.-D.; Park, J.-S.; Choi, J.-Y.; Yoon, J.-H. Utilizing the revised universal soil loss equation (rusle) technique comparative analysis of soil erosion risk in the Geumhogang Riparian Area. Korean J. Remote Sens. 2018, 34, 179–190.
  9. Kim, D.; Jung, H.-S.; Kim, J.-C. Comparison of snow cover fraction functions to estimate snow depth of South Korea from modis imagery. Korean J. Remote Sens. 2017, 33, 401–410.
  10. Kim, J.-C.; Jung, H.-S. Application of landsat tm/etm+ images to snow variations detection by volcanic activities at southern volcanic zone, Chile. Korean J. Remote Sens. 2017, 33, 287–299.
  11. Park, S.-H.; Jung, H.-S.; Choi, J.; Jeon, S. A quantitative method to evaluate the performance of topographic correction models used to improve land cover identification. Adv. Space Res. 2017, 60, 1488–1503.
  12. Baek, W.-K.; Jung, H.-S.; Jo, M.-J.; Lee, W.-J.; Zhang, L. Ground subsidence observation of solid waste landfill park using multi-temporal radar interferometry. Int. J. Urban Sci. 2018, 1–16.
  13. Lee, S.; Lee, M.-J.; Jung, H.-S. Data mining approaches for landslide susceptibility mapping in Umyeonsan, Seoul, South Korea. Appl. Sci. 2017, 7, 683.
  14. Kim, J.-C.; Kim, D.-H.; Park, S.-H.; Jung, H.-S.; Shin, H.-S. Application of landsat images to snow cover changes by volcanic activities at Mt. Villarica and Mt. Lliama, Chile. Korean J. Remote Sens. 2014, 30, 341–350.
  15. Lee, Y.-S.; Park, S.-H.; Jung, H.-S.; Baek, W.-K. Classification of natural and artificial forests from kompsat-3/3a/5 images using artificial neural network. Korean J. Remote Sens. 2018, 34, 1399–1414.
  16. Baek, W.-K.; Jung, H.-S.; Chae, S.-H.; Lee, W.-J. Two-dimensional velocity measurements of uvêrsbreen glacier in Svalbard using terrasar-x offset tracking approach. Korean J. Remote Sens. 2018, 34, 495–506.
  17. Al-Abadi, A.M. Modeling of groundwater productivity in northeastern Wasit Governorate, Iraq using frequency ratio and Shannon’s entropy models. Appl. Water Sci. 2015, 7, 699–716.
  18. Jothibasu, A.; Anbazhagan, S. Spatial mapping of groundwater potential in ponnaiyar river basin using probabilistic-based frequency ratio model. Model. Earth Syst. Environ. 2017, 3, 33.
  19. Naghibi, S.A.; Pourghasemi, H.R. A comparative assessment between three machine learning models and their performance comparison by bivariate and multivariate statistical methods in groundwater potential mapping. Water Resour. Manag. 2015, 29, 5217–5236.
  20. Golkarian, A.; Naghibi, S.A.; Kalantar, B.; Pradhan, B. Groundwater potential mapping using c5. 0, random forest, and multivariate adaptive regression spline models in gis. Environ. Monit. Assess. 2018, 190, 149.
  21. Rahmati, O.; Pourghasemi, H.R.; Melesse, A.M. Application of gis-based data driven random forest and maximum entropy models for groundwater potential mapping: A case study at Mehran Region, Iran. Catena 2016, 137, 360–372.
  22. Sameen, M.I.; Pradhan, B.; Lee, S. Self-learning random forests model for mapping groundwater yield in data-scarce areas. Nat. Resour. Res. 2018, 28, 1–19.
  23. Zabihi, M.; Pourghasemi, H.R.; Pourtaghi, Z.S.; Behzadfar, M. Gis-based multivariate adaptive regression spline and random forest models for groundwater potential mapping in Iran. Environ. Earth Sci. 2016, 75, 1–19.
  24. Park, S.; Hamm, S.-Y.; Jeon, H.-T.; Kim, J. Evaluation of logistic regression and multivariate adaptive regression spline models for groundwater potential mapping using r and gis. Sustainability 2017, 9, 1157.
  25. Zandi, J.; Ghazvinei, P.T.; Hashim, R.; Yusof, K.B.W.; Ariffin, J.; Motamedi, S. Mapping of regional potential groundwater springs using logistic regression statistical method. Water Resour. 2016, 43, 48–57.
  26. Lee, S.; Lee, M.-J. Susceptibility mapping of Umyeonsan using logistic regression (lr) model and post-validation through field investigation. Korean J. Remote Sens. 2017, 33, 1047–1060.
  27. Lee, S.; Kim, J.-C.; Jung, H.-S.; Lee, M.J.; Lee, S. Spatial prediction of flood susceptibility using random-forest and boosted-tree models in Seoul Metropolitan City, Korea. Geomat. Nat. Haz. Risk. 2017, 8, 1185–1203.
  28. Mousavi, S.M.; Golkarian, A.; Naghibi, S.A.; Kalantar, B.; Pradhan, B. Gis-based groundwater spring potential mapping using data mining boosted regression tree and probabilistic frequency ratio models in iran. AIMS Geosci. 2017, 3, 91–115.
  29. Lee, S.; Hong, S.-M.; Jung, H.-S. Gis-based groundwater potential mapping using artificial neural network and support vector machine models: The case of Boryeong City in Korea. Geocarto. Int. 2018, 33, 847–861.
  30. Naghibi, S.A.; Ahmadi, K.; Daneshi, A. Application of support vector machine, random forest, and genetic algorithm optimized random forest models in groundwater potential mapping. Water Resour. Manag. 2017.
  31. Oh, H.-J.; Kadavi, P.R.; Lee, C.-W.; Lee, S. Evaluation of landslide susceptibility mapping by evidential belief function, logistic regression and support vector machine models. Geomat. Nat. Haz. Risk. 2018, 9, 1053–1070.
  32. Alizadeh, M.; Alizadeh, E.; Asadollahpour Kotenaee, S.; Shahabi, H.; Beiranvand Pour, A.; Panahi, M.; Bin Ahmad, B.; Saro, L. Social vulnerability assessment using artificial neural network (ann) model for earthquake hazard in Tabriz City, Iran. Sustainability 2018, 10, 3376.
  33. Kim, D.; Jung, H.-S. Mapping oil spills from dual-polarized sar images using an artificial neural network: Application to oil spill in the Kerch Strait in November 2007. Sensors 2018, 18, 2237.
  34. Lee, S.; Lee, S.; Song, W.; Lee, M.-J. Habitat potential mapping of marten (martes flavigula) and leopard cat (prionailurus bengalensis) in South Korea using artificial neural network machine learning. Appl. Sci. 2017, 7, 912.
  35. Lee, S.; Song, K.-Y.; Kim, Y.; Park, I. Regional groundwater productivity potential mapping using a geographic information system (gis) based artificial neural network model. Hydrogeol. J. 2012, 20, 1511–1527.
  36. Sokeng, V.J.; Kouamé, F.; Ngatcha, B.N.; N’da, H.D.; You, L.A.; Rirabe, D. Delineating groundwater potential zones in western cameroon highlands using gis based artificial neural networks model and remote sensing data. Int. J. Innovation Appl. Stud. 2016, 15, 747–759.
  37. Ghorbani Nejad, S.; Falah, F.; Daneshfar, M.; Haghizadeh, A.; Rahmati, O. Delineation of groundwater potential zones using remote sensing and gis-based data-driven models. Geocarto. Int. 2017, 32, 167–187.
  38. Lee, S.; Kim, Y.-S.; Oh, H.-J. Application of a weights-of-evidence method and gis to regional groundwater productivity potential mapping. J. Environ. Manag. 2012, 96, 91–105.
  39. Tahmassebipoor, N.; Rahmati, O.; Noormohamadi, F.; Lee, S. Spatial analysis of groundwater potential using weights-of-evidence and evidential belief function models and remote sensing. Arab. J. Geosci. 2016, 9, 79.
  40. Mogaji, K.A.; San Lim, H. Application of dempster-shafer theory of evidence model to geoelectric and hydraulic parameters for groundwater potential zonation. NRIAG J. Astron. Geophys. 2018, 7, 134–148.
  41. Pourghasemi, H.R.; Beheshtirad, M. Assessment of a data-driven evidential belief function model and gis for groundwater potential mapping in the Koohrang Watershed, Iran. Geocarto. Int. 2015, 30, 662–685.
  42. Zeinivand, H.; Ghorbani Nejad, S. Application of gis-based data-driven models for groundwater potential mapping in Kuhdasht Region of Iran. Geocarto. Int. 2018, 33, 651–666.
  43. Kim, J.-C.; Lee, S.; Jung, H.-S.; Lee, S. Landslide susceptibility mapping using random forest and boosted tree models in Pyeong-Chang, Korea. Geocarto. Int. 2018, 33, 1000–1015.
  44. Nsiah, E.; Appiah-Adjei, E.K.; Adjei, K.A. Hydrogeological delineation of groundwater potential zones in the nabogo basin, ghana. J. Afr. Earth Sci. 2018, 143, 1–9.
  45. Lee, S.; Hyun, Y.; Lee, M.-J. Groundwater potential mapping using data mining models of big data analysis in Goyang-Si, South Korea. Sustainability 2019, 11, 1678.
  46. Jo, M.J.; Jung, H.S.; Won, J.S. Detecting the source location of recent summit inflation via three-dimensional insar observation of Kilauea volcano. Remote Sens. 2015, 7, 14386–14402.
  47. Hu, F.; Xia, G.-S.; Hu, J.; Zhang, L. Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery. Remote Sens. 2015, 7, 14680–14707.
  48. Watts, A.C.; Ambrosia, V.G.; Hinkley, E.A. Unmanned aircraft systems in remote sensing and scientific research: Classification and considerations of use. Remote Sens. 2012, 4, 1671–1692.
  49. Atzberger, C. Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs. Remote Sens. 2013, 5, 949–981.
  50. Tam, V.T.; Nga, T.T.V. Assessment of urbanization impact on groundwater resources in hanoi, vietnam. J. Environ. Manag. 2018, 227, 107–116.
  51. Ministry for Food, Agriculture, Forestry and Fisheries (MFAFF). Rural Groundwater Survey Report (Okcehon Gun); MFAFF: Seoul, Korea, 2010.
  52. Bradbury, K.R.; Rothschild, E.R. A computerized technique for estimating the hydraulic conductivity of aquifers from specific capacity data. Groundwater 1985, 23, 240–246.
  53. Razack, M.; Huntley, D. Assessing transmissivity from specific capacity in a large and heterogeneous alluvial aquifer. Groundwater 1991, 29, 856–861.
  54. Oh, K.-Y.; Jung, H.-S.; Lee, M.-J. Accuracy evaluation of dem generated from satellite images using automated geo-positioning approach. Korean J. Remote Sens. 2017, 33, 69–77.
  55. Moore, I.; Burch, G. Modelling erosion and deposition: Topographic effects. Transact. ASAE 1986, 29, 1624–1630.
  56. Claps, P.; Fiorentino, M.; Oliveto, G. Informational entropy of fractal river networks. J. Hydrol. 1996, 187, 145–156.
  57. Riley, S.J. Index that quantifies topographic heterogeneity. Intermountain J. Sci. 1999, 5, 23–27.
  58. Cutler, A.; Cutler, D.R.; Stevens, J.R. Random forests. In Ensemble Machine Learning; Springer: Berlin, Germany, 2012; pp. 157–175.
  59. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  60. Liaw, A.; Wiener, M. Classification and regression by randomforest. R News 2002, 2, 18–22.
  61. Grömping, U. Variable importance assessment in regression: Linear regression versus random forest. Am. Statist. 2009, 63, 308–319.
  62. Were, K.; Bui, D.T.; Dick, Ø.B.; Singh, B.R. A comparative assessment of support vector regression, artificial neural networks, and random forests for predicting and mapping soil organic carbon stocks across an afromontane landscape. Ecol. Indic. 2015, 52, 394–403.
  63. Lee, S.; Talib, J.A. Probabilistic landslide susceptibility and factor effect analysis. Environ. Geol. 2005, 47, 982–990.
  64. Youssef, A.M.; Pourghasemi, H.R.; Pourtaghi, Z.S.; Al-Katheeri, M.M. Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslides 2016, 13, 839–856.
Figure 1. Study area (a) and geological map (b).
Figure 2. The spatial database constructed for Groundwater Productivity Potential (GPP): (a) slope gradient, (b) hydraulic slope, (c) relative slope position, (d) valley depth, (e) Topographic Wetness Index (TWI), (f) slope length (LS) factor, (g) drainage basin, (h) distance from lineament, (i) lineament density, (j) distance from fault, (k) distance from channel network, (l) depth of groundwater, (m) Terrain Ruggedness Index (TRI), (n) hydrogeology, (o) convergence index, (p) soil texture, (q) land cover, (r) plan curvature.
Figure 3. Flow chart of the study procedures.
Figure 4. The results of GPP maps generated using LR, BRT and RF models. T/SPC values of (a,b) LR model, (c,d) BRT model and (e,f) RF model.
Figure 5. ROC curves for the GPP maps with T/SPC values produced by (a) LR model, (b) BRT model and (c) RF model.
Table 1. Spatial data set related to groundwater of the study area.

| Category | Factors | Data Type | Scale |
|---|---|---|---|
| Geological map 1 | Hydrogeology | Polygon | 1:50,000 |
| Land cover map 2 | Land cover | Polygon | 1:5000 |
| Soil map 3 | Soil texture | Polygon | 1:25,000 |
| Topographic map 4 | Slope gradient; Hydraulic slope gradient; Relative slope position; Valley depth; Topographic Wetness Index (TWI); Slope Length factor (LS-factor); Drainage basin; Distance from lineament; Line density; Distance from fault; Distance from channel network; Depth of groundwater; Terrain Ruggedness Index (TRI); Convergence index; Plan curvature | GRID | 1:5000 |

1 The geology map offered by Ministry of Land, Transport and Maritime Affairs.
2 The land cover map offered by the Korea Ministry of Environment.
3 The soil map and land cover map offered by the National Institute of Agricultural Science and Technology.
4 Topographical maps offered by National Geographic Information Institute.
Table 2. The results of the pumping test of wells in the Okcheon-gun.

| Type of Aquifers | SPC Min (m³/day/m) | SPC Max (m³/day/m) | SPC Average (m³/day/m) | SPC Median (m³/day/m) | T Min (m²/day) | T Max (m²/day) | T Average (m²/day) | T Median (m²/day) |
|---|---|---|---|---|---|---|---|---|
| Porous rock saturated aquifers | 2.23 | 769.23 | 20.07 | 4.88 | 0.70 | 489.91 | 23.78 | 2.61 |
| Alluvial aquifer | 2.67 | 283.33 | 37.60 | | 0.83 | 73.16 | 11.30 | |
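Table 3. Frequency ratio and logistic regression (LR) model’s results between groundwater productivity data and related factors.

| Factor | Class | No. of Pixels in Domain a | % of Domain | T ≥ 2.61 b: No. of Data | T ≥ 2.61 b: % of Data | T ≥ 2.61 b: FR | SPC ≥ 4.88 c: No. of Data | SPC ≥ 4.88 c: % of Data | SPC ≥ 4.88 c: FR | LR Coefficient (T ≥ 2.61) | LR Coefficient (SPC ≥ 4.88) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Slope gradient (degree) | 0–5.11 | 113,319 | 18.85 | 25 | 58.14 | 3.09 | 24 | 55.81 | 2.96 | −0.58 | −0.74 |
| | 5.11–13.97 | 128,401 | 21.35 | 16 | 37.21 | 1.74 | 16 | 37.21 | 1.74 | | |
| | 13.97–20.82 | 112,842 | 18.77 | 2 | 4.65 | 0.25 | 3 | 6.98 | 0.37 | | |
| | 20.82–28.18 | 118,770 | 19.75 | 0 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | | |
| | 28.18–90 | 127,988 | 21.28 | 0 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | | |
| Hydraulic slope (degree) | 0–5 | 154,281 | 25.66 | 33 | 76.74 | 2.99 | 35 | 81.40 | 3.17 | −0.40 | −0.84 |
| | 5–10 | 113,239 | 18.83 | 8 | 18.60 | 0.99 | 7 | 16.28 | 0.86 | | |
| | 10–20 | 178,303 | 29.65 | 2 | 4.65 | 0.16 | 1 | 2.33 | 0.08 | | |
| | 20–30 | 101,221 | 16.83 | 0 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | | |
| | 30–90 | 54,276 | 9.03 | 0 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | | |
| Relative slope position | 0–0.0275 | 118,086 | 19.64 | 17 | 39.53 | 2.01 | 22 | 51.16 | 2.61 | 0.32 | −0.52 |
| | 0.0275–0.2235 | 122,639 | 20.40 | 19 | 44.19 | 2.17 | 14 | 32.56 | 1.60 | | |
| | 0.2235–0.4784 | 121,248 | 20.16 | 2 | 4.65 | 0.23 | 2 | 4.65 | 0.23 | | |
| | 0.4784–0.7529 | 120,084 | 19.97 | 1 | 2.33 | 0.12 | 3 | 6.98 | 0.35 | | |
| | 0.7529–1 | 119,263 | 19.83 | 4 | 9.30 | 0.47 | 2 | 4.65 | 0.23 | | |
| Valley depth (m) | 0–19.1231 | 118,571 | 19.72 | 8 | 18.60 | 0.94 | 6 | 13.95 | 0.47 | −0.16 | −0.28 |
| | 19.1231–37.0510 | 126,094 | 20.97 | 17 | 39.53 | 1.89 | 12 | 27.91 | 0.79 | | |
| | 37.0510–58.5645 | 122,756 | 20.41 | 5 | 11.63 | 0.57 | 11 | 25.58 | 1.25 | | |
| | 58.5645–88.4443 | 119,947 | 19.95 | 10 | 23.26 | 1.17 | 8 | 18.60 | 1.42 | | |
| | 88.4443–304.7743 | 113,952 | 18.95 | 3 | 6.98 | 0.37 | 6 | 13.95 | 1.07 | | |
| TWI | −0.27–3.6 | 117,062 | 19.47 | 2 | 4.65 | 0.24 | 1 | 2.33 | 0.12 | 0.02 | −0.11 |
| | 3.6–4.35 | 129,046 | 21.46 | 2 | 4.65 | 0.22 | 3 | 6.98 | 0.33 | | |
| | 4.35–5.4 | 118,685 | 19.74 | 3 | 6.98 | 0.35 | 2 | 4.65 | 0.24 | | |
| | 5.4–7.8 | 117,832 | 19.59 | 17 | 39.53 | 2.02 | 19 | 44.19 | 2.26 | | |
| | 7.8–25.37 | 118,695 | 19.74 | 19 | 44.19 | 2.24 | 18 | 41.86 | 2.12 | | |
| LS factor | 0–1.0473 | 117,812 | 19.59 | 25 | 58.14 | 2.97 | 25 | 58.14 | 2.97 | −0.56 | 0.23 |
| | 1.0473–3.7223 | 119,314 | 19.84 | 14 | 32.56 | 1.64 | 13 | 30.23 | 1.52 | | |
| | 3.7223–6.3280 | 123,649 | 20.56 | 3 | 6.98 | 0.34 | 3 | 6.98 | 0.34 | | |
| | 6.3280–8.9336 | 123,837 | 20.60 | 1 | 2.33 | 0.11 | 2 | 4.65 | 0.23 | | |
| | 8.9336–47.4598 | 116,708 | 19.41 | 0 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | | |
| Lineament density (km/km²) | 0–0.6219 | 118,888 | 19.77 | 4 | 9.30 | 0.47 | 4 | 9.30 | 0.47 | 0.06 | 0.05 |
| | 0.6219–1.0305 | 123,348 | 20.51 | 7 | 16.28 | 0.79 | 7 | 16.28 | 0.79 | | |
| | 1.0305–1.4036 | 123,437 | 20.53 | 9 | 20.93 | 1.02 | 11 | 25.58 | 1.25 | | |
| | 1.4036–1.8300 | 118,218 | 19.66 | 11 | 25.58 | 1.30 | 12 | 27.91 | 1.42 | | |
| | 1.8300–4.5306 | 117,429 | 19.53 | 12 | 27.91 | 1.43 | 9 | 20.93 | 1.07 | | |
| Distance from fault (m) | 0–783 | 116,641 | 19.40 | 10 | 23.26 | 1.20 | 11 | 25.58 | 1.32 | −0.13 | −0.27 |
| | 783–1740 | 122,116 | 20.31 | 14 | 32.56 | 1.60 | 10 | 23.26 | 1.15 | | |
| | 1740–2957 | 122,194 | 20.32 | 8 | 18.60 | 0.92 | 8 | 18.60 | 0.92 | | |
| | 2957–4610 | 120,773 | 20.08 | 7 | 16.28 | 0.81 | 7 | 16.28 | 0.81 | | |
| | 4610–11,090 | 119,596 | 19.89 | 4 | 9.30 | 0.47 | 7 | 16.28 | 0.82 | | |
| Distance from lineament (m) | 0–84 | 133,995 | 22.28 | 14 | 32.56 | 1.46 | 15 | 34.88 | 1.57 | 0.12 | 0.05 |
| | 84–182 | 119,978 | 19.95 | 8 | 18.60 | 0.93 | 9 | 20.93 | 1.05 | | |
| | 182–308 | 119,286 | 19.84 | 8 | 18.60 | 0.94 | 6 | 13.95 | 0.70 | | |
| | 308–510 | 117,397 | 19.52 | 8 | 18.60 | 0.95 | 9 | 20.93 | 1.07 | | |
| | 510–1804 | 110,664 | 18.40 | 5 | 11.63 | 0.63 | 4 | 9.30 | 0.51 | | |
| Distance from channel network (m) | 0–10.7073 | 126,750 | 21.08 | 25 | 58.14 | 2.76 | 28 | 65.12 | 3.09 | −0.68 | −0.05 |
| | 10.7073–29.9805 | 124,456 | 20.70 | 15 | 34.88 | 1.69 | 12 | 27.91 | 1.35 | | |
| | 29.9805–57.8195 | 120,035 | 19.96 | 1 | 2.33 | 0.12 | 2 | 4.65 | 0.23 | | |
| | 57.8195–104.9317 | 115,499 | 19.21 | 1 | 2.33 | 0.12 | 1 | 2.33 | 0.12 | | |
| | 104.9317–546.0730 | 114,580 | 19.05 | 1 | 2.33 | 0.12 | 0 | 0.00 | 0.00 | | |
| Depth of ground water (m) | 0–6 | 77,398 | 12.87 | 8 | 18.60 | 1.45 | 8 | 18.60 | 1.45 | −0.37 | 0.33 |
| | 6–12 | 165,831 | 27.58 | 22 | 51.16 | 1.86 | 24 | 55.81 | 2.02 | | |
| | 12–18 | 118,083 | 19.64 | 9 | 20.93 | 1.07 | 9 | 20.93 | 1.07 | | |
| | 18–24 | 87,655 | 14.58 | 2 | 4.65 | 0.32 | 1 | 2.33 | 0.16 | | |
| | 24–30 | 152,353 | 25.34 | 2 | 4.65 | 0.18 | 1 | 2.33 | 0.09 | | |
| Drainage basin (km²) | 0–100.8281 | 120,219 | 19.99 | 6 | 13.95 | 0.70 | 7 | 16.28 | 0.81 | 0.60 | 0.32 |
| | 100.8281–125.4287 | 123,553 | 20.55 | 17 | 39.53 | 1.92 | 19 | 44.19 | 2.15 | | |
| | 125.4287–157.2648 | 121,267 | 20.17 | 7 | 16.28 | 0.81 | 7 | 16.28 | 0.81 | | |
| | 157.2648–202.1247 | 120,238 | 20.00 | 11 | 25.58 | 1.28 | 9 | 20.93 | 1.05 | | |
| | 202.1247–442.3421 | 116,043 | 19.30 | 2 | 4.65 | 0.24 | 1 | 2.33 | 0.12 | | |
| Terrain Ruggedness Index (TRI) | 0–0.6067 | 114,532 | 19.05 | 22 | 51.16 | 2.69 | 24 | 55.81 | 2.93 | 0.75 | 0.52 |
| | 0.6067–1.9716 | 130,756 | 21.74 | 19 | 44.19 | 2.03 | 16 | 37.21 | 1.71 | | |
| | 1.9716–3.0333 | 125,987 | 20.95 | 1 | 2.33 | 0.11 | 1 | 2.33 | 0.11 | | |
| | 3.0333–4.0950 | 115,656 | 19.23 | 1 | 2.33 | 0.12 | 2 | 4.65 | 0.24 | | |
| | 4.0950–38.0000 | 114,389 | 19.02 | 0 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | | |
| Hydrogeology | Unconsolidated clastic rock | 94,010 | 15.63 | 17 | 39.53 | 2.53 | 17 | 39.53 | 2.53 | −0.20 | −1.44 |
| | Intrusive igneous rocks | 255,683 | 42.52 | 19 | 44.19 | 1.04 | 17 | 39.53 | 0.93 | 0.04 | −0.99 |
| | Dolomite rock | 9173 | 1.53 | 1 | 2.33 | 1.52 | 0 | 0.00 | 0.00 | 0.75 | −11.90 |
| | Non-porous volcanic rock | 431 | 0.07 | 0 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | −8.13 | −8.27 |
| | Clastic sedimentary rock | 15,536 | 2.58 | 1 | 2.33 | 0.90 | 1 | 2.33 | 0.90 | 1.02 | −0.26 |
| | Carbonate rocks | 3608 | 0.60 | 0 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 1.76 | 0.71 |
| | Metamorphic rocks | 222,879 | 37.06 | 5 | 11.63 | 0.31 | 8 | 18.60 | 0.50 | 0 | 0 |
| Land cover | Barren land | 5464 | 0.91 | 1 | 2.33 | 2.56 | 0 | 0.00 | 0.00 | 0.50 | 10.05 |
| | Field | 77,488 | 12.89 | 13 | 30.23 | 2.35 | 13 | 30.23 | 2.35 | 0.21 | 10.18 |
| | Paddy field | 62,789 | 10.44 | 18 | 41.86 | 4.01 | 18 | 41.86 | 4.01 | 0.37 | 1.08 |
| | Mixed forest | 395,630 | 65.79 | 4 | 9.30 | 0.14 | 5 | 11.63 | 0.18 | −1.15 | 9.82 |
| | Water | 24,128 | 4.01 | 1 | 2.33 | 0.58 | 0 | 0.00 | 0.00 | −0.53 | 0.36 |
| | Wetlands | 3963 | 0.66 | 0 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | −11.64 | 10.24 |
| | Urban area | 21,945 | 3.65 | 5 | 11.63 | 3.19 | 6 | 13.95 | 3.82 | 0.53 | 10.97 |
| | Grass land | 9913 | 1.65 | 1 | 2.33 | 1.41 | 1 | 2.33 | 1.41 | 0.00 | 0.00 |
| Soil texture | High infiltration rate | 247,471 | 41.15 | 21 | 48.84 | 1.19 | 23 | 53.49 | 1.30 | 0.50 | 10.71 |
| | Moderate infiltration rate | 105,563 | 17.56 | 10 | 23.26 | 1.32 | 7 | 16.28 | 0.93 | 0.21 | 10.24 |
| | Low infiltration rate | 199,492 | 33.18 | 7 | 16.28 | 0.49 | 8 | 18.60 | 0.56 | 0.37 | 10.03 |
| | Very slow infiltration rate | 22,539 | 3.75 | 4 | 9.30 | 2.48 | 5 | 11.63 | 3.10 | −1.15 | 10.22 |
| | Water | 26,255 | 4.37 | 1 | 2.33 | 0.53 | 0 | 0.00 | 0.00 | −0.53 | 0.00 |
| Plan curvature | Concave (−) | 308,875 | 51.37 | 29 | 67.44 | 1.31 | 27 | 62.79 | 1.22 | 0.35 | −1.20 |
| | 0 | 1409 | 0.23 | 0 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | −9.79 | −9.66 |
| | Convex (+) | 291,036 | 48.40 | 14 | 32.56 | 0.67 | 16 | 37.21 | 0.77 | 0.00 | 0.00 |
| Convergence index | Concave (−) | 294,208 | 48.93 | 26 | 60.47 | 1.24 | 26 | 60.47 | 1.24 | 0.43 | 0.91 |
| | 0 | 1434 | 0.24 | 0 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | −9.67 | −7.87 |
| | Convex (+) | 305,678 | 50.83 | 17 | 39.53 | 0.78 | 17 | 39.53 | 0.78 | 0.00 | 0.00 |

a Total number of pixels is 601,320. b,c Total number of pixels of wells location is 43 (training set).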
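The frequency ratio (FR) values in Table 3 are consistent with the ratio of the share of training wells in a class to the share of the study area occupied by that class. The short sketch below reproduces the FR for T ≥ 2.61 in the 0–5.11° slope gradient class from the counts in the table (25 of 43 wells; 113,319 of 601,320 pixels). The function name is hypothetical and the code is an illustrative check, not the authors' implementation.

```python
def frequency_ratio(wells_in_class, total_wells, pixels_in_class, total_pixels):
    """Share of wells falling in a class divided by the class's share of the area."""
    share_of_wells = wells_in_class / total_wells
    share_of_area = pixels_in_class / total_pixels
    return share_of_wells / share_of_area


# Slope gradient 0–5.11 degrees, T >= 2.61 (counts taken from Table 3).
fr = frequency_ratio(25, 43, 113_319, 601_320)
print(round(fr, 2))  # 3.09, as listed in Table 3
```

An FR above 1 indicates that wells are over-represented in the class relative to its area, and an FR below 1 indicates that they are under-represented.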
Table 4. Predictor of importance factor of the Boosted Regression Tree (BRT), Random Forest (RF) models.

| Factor | BRT (T ≥ 2.61) | BRT (SPC ≥ 4.88) | RF (T ≥ 2.61) | RF (SPC ≥ 4.88) |
|---|---|---|---|---|
| Land cover | 1.000000 | 1.000000 | 1.000000 | 0.823951 |
| Relative slope position | 0.250930 | 0.690807 | 0.139181 | 0.698339 |
| Hydraulic slope gradient | 0.282250 | 0.509170 | 0.231625 | 0.723901 |
| Depth of groundwater | 0.176912 | 0.484824 | 0.121499 | 0.970446 |
| Slope gradient | 0.297003 | 0.480254 | 0.302406 | 0.677954 |
| Distance from channel network | 0.259162 | 0.457526 | 0.109057 | 0.845124 |
| Hydrogeology | 0.371401 | 0.455671 | 0.113639 | 0.843345 |
| Topographic Wetness Index (TWI) | 0.217793 | 0.349910 | 0.255783 | 0.642187 |
| LS-factor | 0.251877 | 0.323731 | 0.250476 | 0.850150 |
| Terrain Ruggedness Index (TRI) | 0.332975 | 0.311433 | 0.162339 | 0.723926 |
| Distance from fault | 0.170614 | 0.302349 | 0.523101 | 0.933228 |
| Soil texture | 0.181788 | 0.301434 | 0.556208 | 1.000000 |
| Drainage basin | 0.169170 | 0.253417 | 0.149756 | 0.841819 |
| Line density | 0.101006 | 0.214975 | 0.260951 | 0.612850 |
| Distance from lineament | 0.114429 | 0.201170 | 0.174736 | 0.403119 |
| Convergence index | 0.061917 | 0.135385 | 0.155390 | 0.347651 |
| Valley depth | 0.155604 | 0.133019 | 0.070290 | 0.532714 |
| Plan curvature | 0.052066 | 0.060235 | 0.134918 | 0.392855 |
