1. Introduction
In many hydrogeological modelling studies, models are calibrated using the zonation approach, in which a model domain is discretized into a finite number of zones representing the variability of a calibrated parameter, thus reducing the number of variables [
1,
2,
3]. The problem with the zonation approach is that the boundaries between zonation can cause an abrupt change in parameter values, which could be unrealistic [
4], and the structural error can be large [
5]. The full parameterization of a parameter at each node is an alternative to the zonation approach, but the computation cost is high [
4].
De Marsily [
6] introduced the pilot-points method as a means to reduce the computational burden of the full parameterization approach. In the pilot-points approach, the hydraulic parameter domain is represented by a finite number of points distributed over the model domain. The inverse problem solves for this parameter at these points and the results are interpolated (i.e., using kriging or another interpolation method). The inverse problem requires an optimization process to minimize the objective function at the calibration targets. After optimization, interpolation of the solution obtained at pilot-points is done to cover the entire model domain. The calibration is then obtained and checked against the threshold error. If the error value is above a pre-defined threshold value, then the optimization process is repeated again until the error is accepted.
The pilot-points approach was later developed to increase its efficiency, especially in highly non-linear models [
7,
8,
9]. Many studies have used geostatistics with the pilot-points method for fractured aquifer model calibration [
10,
11,
12]. Lavenue and Pickens [
13] used adjoint sensitivity to identify the most sensitive areas of the hydraulic conductivity to place pilot-points. Lavenue and de Marsily [
12] used kriging with pilot-points to calibrate a highly heterogeneous conductivity field. Doherty [
8] incorporated regularization with the pilot-points method using the parameter estimation software PEST [
14], whereas Alcolea et al. [
9] added prior information to the optimization problem to minimize the objective function and to reduce the number of parameters. Regularization makes use of information from data to reduce the number of parameters and makes calibration easier, especially in complex problems [
4]. Fienen et al. [
15] explained how to constrain regularization and produce a good calibration.
Although the pilot-points method eliminates the drawbacks of the zonation method, it has some limitations. The pilot-points method neglects other sources of model error, such as noise in the data, and can result in model over-parameterization [
15,
16]. One major problem of the pilot-points approach is the choice of the number and locations of pilot-points [
10]. While no rules exist for placing these points, it is recommended to place them in a uniform pattern, avoiding large gaps between them, and to use hydraulic properties characteristic of length as a guide for the separation distance [
4]. De Marsily [
6] recommended that the number of pilot-points should be less than the number of observed data of the calibrated model, and the placement of pilot-points should be at high head gradient locations. Some studies have suggested the use of the correlation length of a hydraulic property to distribute the points [
17,
18]. Klaas and Imteaz [
19] suggested a spatial distribution of the pilot-points based on the spatial structure of the model grid. While most previous studies focus on the number of pilot-points and how to select their locations, the main goal of this study is to explore the number of pilot-points adequate for calibration, and the effect of pilot-points location and distribution on the model results. In addition to that main goal, this study proposes a new methodology of the optimization of pilot-points locations different than the ones in the literature.
This paper considered the groundwater flow model in the northern karst aquifer of Qatar as a case study. This aquifer is the only natural source of fresh groundwater in the country and it is highly heterogeneous as it comprises karstified limestone with lots of conduits and cavities. A 3D finite-difference-based model was developed using MODFLOW. This model was calibrated using the pilot-points approach and PEST software [
14]. The calibration process relied on historical data of groundwater level, representing steady state conditions. A statistical analysis was developed to evaluate the impact of pilot-points number and locations on the estimated parameters. Two pilot-points configurations were tested: randomly and uniformly placed pilot-points, with 20 perturbations of points locations in each configuration.
The Study Area and Conceptual Model
The area of study is the northern karst aquifer of Qatar (
Figure 1). Qatar is located in the Arabian Peninsula, to the east of Saudi Arabia, with a total land area of 11,500 km
2. The country is very arid, with no surface water and barely 80 mm of average annual rainfall [
20]. The area of study covers approximately 4338 km
2 and its geology comprises three layers of limestone and dolomite of Pliocene and Eocene age, namely the Dam and Dammam Formation, Rus Formation, and Um Al-Radhuma Formation (
Figure 2). A thick layer of gypsum formation occurs within the Rus Formation and extends in many places over the southern aquifer. This has resulted in deterioration of the groundwater quality in the southern aquifer, compared to the north.
Qatar generally has a flat topography, except for some areas in the south. The country has a flat terrain as the topography varies between 0 near the coast to more than 100 m above the mean sea level in the south-west. Many land depressions occur in the country and vary in size between a few meters and more than one kilometer [
21]. These land depressions are a result of land collapse due to the dissolution of the evaporate layer (gypsum) in the sub-surface and also due to cracks in the limestone at the top layer of geology. These land depressions are important for groundwater recharge. Rainfall-runoff accumulates in land depressions after heavy storms, eventually recharging the aquifer [
22] and also enriching the soil by transporting fine sediments, making the depression suitable for agriculture [
21].
The main aquifer occurs in Rus and Um Al-Radhuma Formation and is recharged through local rainfall. Thicknesses of the geological formations vary, as is shown in
Figure 2, where thickness ranges between 50 m for the Dam and Dammam Formation to more than 300 m for the Um Al-Radhuma Formation [
23]. A layer of gypsum (Evaporite unit) occurs within the Rus Formation, especially in the southern part of the country (
Figure 2), but this layer is absent in the north. This explains why the groundwater quality is better in the north than the south.
The long term annual rainfall recharge is approximately 60 million m
3 in the entire country [
20,
21,
24]. Fresh groundwater occurs as lenses on the top of brackish and saline groundwater [
25], especially in the northern part of the country, which receives the highest rainfall.
The aquifer has been extensively pumped over the last three decades, which has resulted in a significant drop in the water table and increase in salinity.
Table 1 shows the groundwater abstraction from the aquifer for the period between 1976 and 2009 [
26].
The conceptual model was built based on available data of groundwater levels [
20] and a previous modelling study [
24]. The model area covers the northern part of the country and is bounded by the Arabian Gulf from the east, north, and west. The southern boundary was considered as the head-dependent flow boundary.
The finite deference-based MODFLOW 2000 model was used to simulate the flow in the area [
27]. The model grid contains 222 rows and 148 columns, with a regular grid size of 500 m in each direction. The model comprises one layer representing Rus and Um Al-Radhuma Formation. Top and bottom of the layers were identified based on structural contours of the geology [
28] and bore-logs [
20,
21].
The east, north, and west boundaries are the sea, which was considered as a constant head boundary, whereas the southern boundary was considered as a head-dependent flow boundary. The General Head Boundary GHB module in MODFLOW was used to simulate the flow along the southern boundary. Flow values were estimated based on contour maps and previous modelling studies [
24]. During the calibration process, the conductance value of the GHB was adjusted to decrease the error in the water budget [
4].
The aquifer is highly heterogeneous as hydraulic conductivity varies significantly from one location to another. In general, hydraulic conductivity may vary between 0.01 m/day up to more than 100 m/day, due to the existence of fractures and conduits in the limestone [
24]. The hydraulic conductivity values were later calibrated using the parameter estimation and uncertainty analysis software (PEST) with MODFLOW 2000 [
14].
Groundwater recharge comes essentially from rainfall, which is too little. Despite the lower amount of rainfall, recharge occurs when heavy storms create enough runoff that accumulates in land depressions. Several studies have estimated rainfall recharge in Qatar [
20,
21,
22,
29,
30]. From the previously-mentioned studies, it was found that rainfall recharge varies between 5 and 68 million m
3 per year. In this study, the recharge spatial distribution is based on Baalousha et al. [
22].
2. Model Calibration: PEST and Pilot-Points
The model was calibrated with PEST [
14] using the truncated singular value decomposition “SVD-assist” [
31], which is helpful in the case of high parameterization. By applying the SVD to the Jacobian matrix, super and sensitive parameters can be identified. Parameters with small eigenvalues can then be truncated. Kriging was used for interpolation of the parameter values obtained at the pilot-points.
The PEST calibration tool uses the least square method to minimize the difference between the model results and the observation measurements after applying weights to observations. The objective function is the squared sum of weighted residuals. The weights are assigned based on the uncertainty of field measurements; that is, measurements with high uncertainty have lower weights and vice versa.
The objective function
φ of the inverse problem is based on the best fit between the linear combination of input parameters
p, as affected by the model inputs
X compared to observations
h, is given by Doherty [
14,
32]:
where
Q is a diagonal matrix of the squared observation and
t indicates the transpose of the matrix. The weights and given by:
where
m is the number of observations (i.e., targets),
w is the weight of an observation point, and
r is the residual. The weights vary depending on the relative importance of field data. In this study, all observations have been assigned an equal weight of 1 as all have the same importance.
As the inverse problem is ill-posed, PEST uses several tools to make it more stable. Regularization is one of these mathematical tools. PEST includes two types of mathematical regularization: one is Tikhonov regularization and the other is singular value decomposition—or SVD.
The Tikhonov regularization approach uses more information with the ill-posed inverse problem, such as suggesting a preferred value of the calibrated parameters. The SVD approach reduces the dimensionality of the problem by subtractions of parameters. The readers can refer to [
4,
5] for more details on regularization in PEST.
PEST minimizes the objective function until a pre-defined threshold error is achieved. The model was calibrated for steady state conditions using historical data of groundwater level. The calibrated parameter was the hydraulic conductivity. As the model has a steady state, the calibration targets are the groundwater head measurements in 1958, which is the oldest available data for the area of study (
Figure 3). In this case, the threshold is the absolute mean residual head at any point that is less than 0.5. This value is considered adequate for model calibration given the uncertainty in the historical target heads values. It is assumed that the aquifer was in steady state conditions (or near steady state) because groundwater abstraction at that time was very low, and groundwater abstraction was little [
24]. In the presumed steady state conditions in 1958, groundwater levels varied between 0.0 near the coastline to more than 13 m above the mean sea level in the central part of the aquifer. For calibration, groundwater level data from 1958 was used as observation points covering the entire model domain. As the uncertainty of this data is similar, all points had an equal weight in calibration.
3. Number of Pilot-Points
Three calibration sets with two configurations each were examined with different numbers of pilot-points. The number of pilot-points in the three runs was 112, 187, and 200 points, for the first, second, and third case, respectively. For each case, the model domain was divided into cells equal to the number of points. It should be noted that these cells are for pilot-points, not to be confused with the finite difference cells. Two configurations of pilot-points are considered: uniformly placed points in the centre of cells and random configuration were one point is randomly placed in each cell.
Statistical results of the three cases are shown in
Table 2. As expected, when the number of pilot-points increases, so does the number of variables and the solution takes a longer time to be achieved (Equation (1)). In all cases, the threshold error (mean head residual < 0.5) was achieved, but the sum of squared weighted residuals varied. In the first case, random configuration (i.e., pilot-points = 112), the points were randomly placed in a grid of eight rows and 14 columns distributed over the model domain, thus the density was one pilot-point for every 72.8 km
2. On the other hand, the uniform configuration of the first case implies placing the 112 points in the centres of the 112 cells. Given the low density of the points, the convergence was not easy, as fewer points were used for interpolation of the entire model domain. The sum of the squared weighted residual was 1370.
Mean Residual—Average error for the observations. This can be misleading because the positive and negative errors can cancel each other out. The root mean squared residual is calculated by calculating the square root of the average of the square of the errors for the observations. This tends to give more weight to cases where a few extreme error values exist.
In the second case (pilot-points = 187), the model domain was divided into a grid of 17 rows and 11 columns, thus the density of pilot-points is one point per 43.55 km2. As with the first case, uniformly distributed pilot-points produced less error than the randomly-distributed case. Although the mean residual head shows improvement from case 1, the sum of the squared weighted residual is significantly lower.
In the third case, the model domain was divided into a grid of 20 rows and 10 columns. The pilot-points density, in this case, is one per 40.8 km2. Obviously, this case shows much improvement compared to the previous two cases, as the number of pilot-points is higher. However, this came at a price of higher computational expenses.
4. Calibration Results
Based on the results shown in
Table 2, it was clear the number of pilot-points of 187 (uniformly placed) is adequate for the calibration, as it converges easily with an error below the threshold value of 0.5 m. Therefore, 187 pilot-points were used to calibrate the model. As explained above, the pilot-points were defined using a grid of 17 rows and 11 columns created over the model domain. Each cell in this grid is 6700 m wide and has a height of 6500 m. The pilot-points were placed in the center of the defined cells (not to be confused with the finite-difference grid).
Figure 4 shows the calibrated groundwater level for the steady state solution, with pilot-points and targets. The sum of squared weighted residual is 421 for the entire model area. The individual residuals at target points are within the acceptable threshold (i.e., the mean absolute residual of head at any point is less than 0.5 m), as shown in
Table 2.
As depicted in
Figure 5, values of the calibrated hydraulic conductivity vary between 0.001 and 191 m/day. While the majority of the model area shows low values (less than 10 m/day), a high hydraulic conductivity up to 191 m/day occurs in some locations. These locations occur in the east, north-west, south, and centre of the model area, as appears in
Figure 5. This reveals the high heterogeneity of the aquifer due to karst features in some places.
5. Effect of Pilot-Points Locations
Two cases of pilot-points configurations were considered to investigate the effect of pilot-points locations on the calibrated hydraulic conductivity. These configurations are: randomly and uniformly spaced.
5.1. Random Perturbation
The same grid defined previously was used to generate the pilot-points (17 × 11 cells). Therefore, the total number of pilot-points is 187. In each square of this grid, one pilot-point was randomly placed, similar to Latin hypercube sampling. This results in stratified locations of pilot-points locations, which converges faster than random sampling [
33,
34,
35] and covers the entire sampling domain.
5.2. Uniform Perturbation of Pilot-Points
As per the random placement, the model domain was divided into 17 rows and 11 columns. In each square, one pilot-point was placed at the same location within the square (same x and y within the square). The locations have been changed 20 times for the 20 runs of the model, keeping the distance increments between points constant.
6. Result and Discussion
To examine the effect of the locations of pilot-points on the estimated parameters, 20 runs were performed for each configuration of pilot-points locations (random and uniform distribution), respectively. Statistical analysis of the results was conducted, as well as a comparison between the uniform and random perturbation.
6.1. Random Placement
Twenty runs were performed using random realizations of the pilot-points in each run, as described above. In each run, the hydraulic conductivity was calibrated. In all cases, the calibrated absolute residual error was below the threshold value (<0.5). After all runs were completed, descriptive statistics of calibrated hydraulic conductivity were obtained to explore the effect of pilot-point locations on model calibration.
Figure 5 shows the calibrated model results, with 187 pilot-points centered at 187 cells. The calibrated hydraulic conductivity varies between 0.001 and 191.0 m/day.
Figure 6 shows the mean value contour maps of the calibrated hydraulic conductivity and the coefficient of variation map resulting from the 20 runs. The coefficient of variation was used to show how the hydraulic conductivity values spread around the mean. It was obtained by dividing the standard deviation by the mean map. The mean value of the hydraulic conductivity resulting from these iterations varies between 0.001 and 118.8 m/day, which is lower than the initial results of the model (
Figure 5). While the vast majority of the model domain has low conductivity (less than 10 m/day), areas of relatively higher hydraulic conductivity occur in the east and north-western area.
Table 3 shows descriptive statistics of the mean and coefficient of variation of the hydraulic conductivity maps. The mean and median values of hydraulic conductivity are 9.58 and 1.4 m/day, respectively. The low median of the hydraulic conductivity is in line with the results of the pumping test [
20], which show low values, except at some locations, where fractures/conduits occur.
The coefficient of variation values of the hydraulic conductivity of the 20 runs vary between 0 and 3.5, as shown in
Table 3 and
Figure 7. Unlike the mean hydraulic conductivity distribution, high values of the coefficient of variation occur in the middle of the model.
6.2. Uniform Placement of Random Points
In the second case, another 20 runs were performed using uniformly placed pilot-points. In each run, pilot-points were distributed in a way that their locations are different, but the spacing between them is constant.
Figure 8 shows the mean value map of the calibrated hydraulic conductivity using 187 pilot-points (the same number as in random placement). The map shows the mean of 20 runs using different perturbations of pilot-points and a uniform spacing between points. As with previous examples, the same threshold error was used. Descriptive statistics of both the mean and the coefficient of variation, in this case, are shown in
Table 4. The hydraulic conductivity values vary between 0.001 and 162 m/day, with high values in the middle of the aquifer and in the north-west area. The coefficient of variation varies between 0.0 and 2.6.
Comparing the data in
Table 3 and
Table 4, while the median values are very similar in both cases, the mean of the uniformly distributed case is higher. From the minimum and maximum values, the uniform case data covers a wider range compared to the random case.
The statistics of the coefficient of variation maps show more similarity than the mean. The maximum values of the coefficient of variation are 3.6 and 3.4 for the random and uniform case, respectively. However, the uniform case shows less variance as the 99% tile is 2.6 compared with 3.4 in the case of random points. The mean and the median of the coefficient of variation are much lower in the case of uniform points than random points.
Figure 6 and
Figure 7 clearly show the similarity of the coefficient of variation in both cases, where high variations occur in the same areas. Unlike the random case, it is obvious that the coefficient of variation and the mean values in
Table 4 are not similar.
6.3. Parameter Sensitivity Analysis
As part of PEST calculation, the Jacobian matrix is calculated as at least the number of calibration iterations plus one. The PEST calculates sensitivity with respect to each calibrated parameter. The Jacobian matrix contains the derivative of model output to the hydraulic conductivity in this case. Each column in the Jacobian matrix contains the derivative of the model output with respect to the calibrated parameter at each observation point. PEST calculates the composite sensitivity, which is the normalized magnitude of the Jacobian matrix columns with respect to the observation points. For any parameter
i, the composite sensitivity (
Si) is given by Doherty [
14]:
where
J is the Jacobian matrix,
Jt is the transpose of the Jacobian matrix,
m is the number of observations, and
Q is the weight matrix. In this study, Equation (3) calculates the sensitivity of the parameter value at each pilot-point normalized by the number of observation points (or targets)
m.
Figure 8 shows the mean composite sensitivity in the case of random and uniform pilot-points. While the general trend in sensitivity is similar in both figures, some differences exist. The random case shows a higher sensitivity magnitude than the uniform case. This is consistent with the high coefficient of variation for the case of randomly placed pilot-points. Composite sensitivity maps show that the model is more sensitive to parameter changes at pilot-points uniformly rather than randomly placed, with a maximum sensitivity of 0.048 and 0.073 for the random and uniform case, respectively.
6.4. Optimization of Pilot-Points Locations
Spatial interpolation depends on values at sampling points (in this case the pilot-points) and the spatial correlation between these points (in kriging). It is important to focus pilot-points at areas where the changes in the parameter under consideration are high (in this case high dk/dn, where n is the flow direction). In the steady state simulation, groundwater head is fixed. When calibrating both hydraulic conductivity and recharge, different solutions can be as long as the ration between recharge and hydraulic conductivity remains the same [
36,
37]. It is possible to estimate hydraulic conductivity if the recharge is well-known and vice versa [
36].
For any given groundwater head, the spatial changes in hydraulic conductivity (dk/dn) are directly proportional to the spatial changes in the hydraulic head. The spatial changes in hydraulic conductivity (dk/dn) can be estimated from the piezometric map (
Figure 3), and recharge
R, using the continuity equation in both
x and
y directions:
where
B is the thickness of the aquifer and
h is the piezometric level. For the isotropic aquifer, Equation (4) can be rewritten as follows:
Raster maps were prepared using ArcGIS for the derivative of piezometric maps, aquifer thickness, and recharge. Using Equation (5), the derivative of estimated hydraulic conductivity (
K) was obtained as shown in
Figure 9. Although these maps are just an estimation, they provide a good indication of areas that need more pilot-points.
Pilot-points were redistributed on areas of high spatial variation of the estimated hydraulic conductivity. A total of 180 points were used, with a high density at a high spatial variation of the calibrated parameter and lower density elsewhere. Statistical analysis of the model calibration results using this configuration of pilot-points is shown in
Table 5.
Comparing the data in
Table 2 and
Table 5, it is obvious that the optimized pilot-points produced much better calibration.
7. Conclusions
The northern aquifer of Qatar has a high variability in hydraulic conductivity as the aquifer is karst, but has a lower hydraulic conductivity in general. The calibrated hydraulic conductivity varies between 0.001 and 200 m/day. This study investigated the effect of pilot-points number, locations, and distribution on the model calibration results. It also explored the optimization of pilot-points locations for a steady state flow model.
Three calibration sets were performed using three different densities of pilot-points: one point per 72.8 km2, one point per 43.55 km2, and one point per 40.8 km2. It was found that the pilot-point density of one per 40.8 km2 provided the sought solution below the pre-defined threshold error, without over-parameterizing the inverse problem.
Two different perturbations of pilot-points were used; randomly and uniformly placed points, keeping the number of points constant in both cases (187 points or one per 43.55 km2). A total of 40 model calibrations were performed to test both configurations of pilot-points. The mean and the coefficient of variation of 20 runs for each case were obtained, with descriptive statistics.
Results show that the uniformly placed pilot-points perform better than the randomly placed points, given the same number of points. The randomly placed points solution has a higher coefficient of variation from the mean. This is because the interpolation between points is better when points are uniformly placed. The range of the calibrated parameter is higher in the case of uniform points. The uniformly placed points provide a good and unbiased coverage of the model area compared to randomly placed points. In both cases, high uncertainty (i.e., high coefficient of variation) occurs in the same areas.
In this study, and as the model represents a steady state, the relationship between hydraulic conductivity and recharge provided a means to estimate changes in hydraulic conductivity. This estimation can be used as a guide to place pilot-points at high variations of the hydraulic conductivity. Results demonstrated the usefulness of this method to optimize the pilot-points locations.