1. Introduction
The Central Valley (CV) aquifer lies under one of the most productive areas of the United States. It accounts for 17% of the total national irrigated land used for agriculture and 75% of the irrigated land of the state of California [
1]. In the CV, for the past several decades huge amount of chemical nitrogen fertilizer has been applied to increase the crop productivity [
2]. Based on a mass balance study, the long-term assessment of nitrogen loading in the Sacramento Valley, San Joaquin Basin, and Tulare Basin of the CV observed significant increase in the nitrogen fertilizer application rate between year 2002–2012 compared to 1991–2001 [
3]. The estimated 740,000 tons of nitrogen fertilizer was applied to 6.7 million acres of irrigated farmland in California. The excess nitrogen fertilizer, not absorbed by crops, as a chemical contaminant, easily dissolves in water and leaches into the aquifer to pollute the groundwater [
4]. Harter [
4] estimated more than 80 pounds of nitrogen per acre per year leaching into the groundwater. The nitrate contamination level in some public, domestic, and monitoring wells of the CV have exceeded the EPA maximum contaminant level (MCL) of 10 mg/L [
5,
6,
7]. Several studies have attributed the groundwater nitrate concentration level to agricultural activities of the region [
7,
8,
9].
The susceptibility of groundwater nitrate contamination is influenced by the aquifer characteristics and geochemical condition of the valley, which could enhance or reduce the leaching rate of nitrate into the aquifer [
10]. The hydrogeological conditions like shallow water table, high permeability, and high hydraulic conductivity enhance the downward percolation rate of the nitrate into the aquifer. The lower nitrate concentration has been found in reducing environment due to denitrification process that breaks down nitrate into nitrogen gas [
11]. The interplay of these different environmental factors often makes it difficult to predict the areas vulnerable to groundwater nitrate contamination. However, identification of vulnerable areas is of extreme importance for groundwater management agencies to develop site-specific mitigation tools and adopt appropriate policies to mitigate the contamination.
The process-based, statistical, and index- overlay methods are the most widely used techniques to assess the vulnerability of the aquifer to groundwater nitrate contamination. Process-based methods are deterministic ways of predicting the nitrate contamination based on solute transport model. Physical, chemical, geological, and biological processes are considered to simulate the subsurface flow of contaminants, involving complex numerical computations [
12]. The process-based methods are effective at local scale prediction, but because of detailed data requirements, they are not practical for large spatial scale prediction [
13,
14].
Statistical methods are robust ways of analyzing the relationship between groundwater nitrate contamination and predictor variables. The multiple linear regression calculates individual coefficients of the predictor variables to measure its contribution to the nitrate contamination [
15,
16]. There are numerous studies where logistic regression has been applied to calculate the probability of nitrate concentration above the background value [
17,
18]. Other statistical methods have been used to measure the linear trend [
19], comparing the mean nitrate concentrations under different agriculture settings [
8] or the difference of nitrate concentration over different time intervals [
20]. The computing power of statistical method to process large data in a short period of time signifies the strength of these methods. However, careful analysis is still warranted in the interpretation of statistical output. The availability of sufficient data meeting the requirement of the statistical analysis often becomes challenging. Further, site-specific nitrate contamination data or data covering regional or national scale are sparse and not uniform, which could cause biasness in the statistical results. Long term monitoring of groundwater data over large spatial scale demands government priority and budgeting over a long period of time.
The index and overlay method uses geological and hydrogeological setting to assess the vulnerability of the study area to the groundwater contamination. This method is preferred for its simplicity to map the vulnerable areas by using different layers of data that affect the transport of nitrate to create spatially oriented index values. Several index and overlay methods has been applied to assess the vulnerability of aquifer such as DRASTIC [
21], GOD [
22], AVI, and SINTACS [
23]. The selection of correct method is important in generating the vulnerability map and site-specific methods have been proposed in different study. For example, EPIK and RISK methods are used for Karst aquifers, the AVI method for porous aquifers and GOD for sedimentary aquifers. DRASTIC has been a widely used method and is applicable to all aquifer types [
21]. DRASTIC was developed by Aller et al. (1987) for the United States Environmental Protection Agency (EPA). The DRASTIC method has been applied frequently to assess the vulnerability of the study area to groundwater nitrate contamination [
13,
14,
24,
25,
26]. Recently, the application of GIS techniques has enhanced the ability to process different data layers to compute the DRASTIC Index values [
27,
28]. The method has been relatively successful in generating the vulnerability map with minimum error in the model results. It is important to understand that the selection of vulnerability method could produce difference in the output map and therefore researcher should be aware in using the correct method.
Some of the limitations of DRASTIC method include its subjectivity in assigning the weight values to its variables and the dimensionless index values [
29,
30]. The pre-assigned weights to DRASTIC variables could be inaccurate for a given study area and could create biases in the calculation of DRASTIC Index values, as the influence of variables to vulnerability could change in different study areas. The generalization of the subjective weights of DRASTIC method have been criticized in several studies [
30]. The need to add additional variables to increase the efficiency of method has been realized in many studies. Several studies have modified the DRASTIC variables by adding landuse variables [
14,
31] and calibrating the point rating of variables based on correlation of nitrate concentration and hydrological variables to the model or modified the rating values to increase the efficiency of the method [
32]. Some authors have also emphasized on the availability and reliability of data suitable for the DRASTIC methods and interdependence of variable in linear combination could cause uncertainty in the method [
30].
Index–overlay methods, process-based methods, and statistical methods all have their strengths and limitations. The pros and cons of these methods can be accounted to correctly meet the scientific objective, budgeting constraints, and to aid in the decision supporting process. In this study, we developed two nitrate vulnerability maps: one based on the optimized-DRASTIC method and the other based on the new Geodetector-Frequency Ratio (GFR) method proposed here. Geodetector is a relatively new method that measures the correspondence between the incidence rate of events and the spatially stratified heterogeneity of explanatory variables [
33]. The Geodetector calculates the Power of Determinant (PD) values for each contributing factor (or independent variable) based on the spatial variance analysis, which estimates the degree of influence a variable has on the incidence rate, which is groundwater nitrate contamination in this case. The Frequency-ratio (FR) is a technique commonly used in landslide susceptibility mapping [
34,
35] and was recently introduced in groundwater vulnerability mapping [
36]. The FR measures the percent of area with incidence in each interval of a factor to the percent of area of that interval relative to the whole study area. The interval with higher FR is more susceptible to the incidence. We assigned the weights to the predictor variables based on the Geodetector results and assigned the rating based on the FR method to compute the Nitrate Susceptibility Index (NSI) using the geometric mean [
37].
Therefore, the objective of this paper is to generate a nitrate vulnerability map of the CV using the quantitatively derived rating and weights of the variables using GFR method and to compare the result with that generated from optimized-DRASTIC model. As will become evident later, the GFR method generated a map that is more consistent with the spatial distribution of contaminated wells in the CV, because GFR method integrates Geodetector and Frequency-Ratio method, which overcomes the subjective assignment of weights and rating values of variables in optimized-DRASTIC method.
4. Discussions
This study developed the new GFR method and applied it to analyze the vulnerability of the Central Valley (CV) aquifer to the groundwater nitrate contamination. The GFR result was also compared with the result derived from optimized-DRASTIC method. The optimized-DRASTIC index values were generally higher in the eastern part of the CV and the GFR method predicted the high vulnerability in the San Joaquin and Tulare basin compared to the Sacramento Valley. The GFR method predicted higher number of basins in the higher index range than the optimized-DRASTIC method.
The DRASTIC method is a widely applied groundwater vulnerability assessment method used on local, regional, and national scale. The DRASTIC map can provide a general picture of the vulnerable area of the CV aquifer to the groundwater nitrate contamination. The hydrogeologic variables used in the DRASTIC method are the key variables in determining the susceptibility of the aquifer. The DRASTIC Index values, calculated using weight and rating values of the variables, have been found consistent with other validation techniques in several other studies. In this study, in order to capture the local hydrogeologic condition of the CV, we optimized the DRASTIC method by modifying the range of rating interval of DRASTIC variables using natural breaking classification to calculate the index values [
14]. In the GFR method, the incorporation of variables with significant PD values, normalized weight, frequency-ratio, and geometric mean to calculate the GFR Index generated the vulnerability map that was more consistent with the measured groundwater concentration values in the aquifer as measured by both Pearson’s correlation and the PD value of Geodetector method.
The variables, weights, and rating values in DRASTIC method are preassigned. This pre-assignment of variables, weights, and rating in the DRASTIC method increases the chance of overlooking a variable that might be the underlying factor for increasing the susceptibility of the aquifer. Also, the subjective assignment of weights to the variables may not always correctly represent the influencing variable in a local aquifer. In this study, higher weight is assigned to the Depth to Water (DTW) in the shallow Sacramento Valley than San Joaquin and Tulare basin. However, the hydrogeologic condition of the CV in the southern part of the valley have changed due to hundreds of wells dug to pump the groundwater for irrigation for the last several decades. Although DTW is deep in this part of the valley, the residence time of contaminants is still short due to higher hydraulic gradient caused by heavy pumping, and groundwater generally flows towards the cone of depression [
1]. The groundwater basins in this part of the CV has the highest priority for groundwater evaluation and monitoring by California Groundwater Elevation Monitoring (CASGEM) [
42] .The results of the optimized-DRASTIC method could have been impacted by the inaccurate weighting of DTW. Also, variables specified by DRASTIC method are not always easily available for the study area. In this study, impact of vadose zone (IVZ) data was not easily available for Central Valley. Here, we calculated the IVZ data based on depth to water and permeability [
51]. The selection process of site specific data to correctly represent contamination in the data processing is crucial to understand the contaminant’s spatial distribution. The DRASTIC method has been criticized for its lack of ability to include the site-specific variables in the method. Landuse characteristics and other variables have been added to the DRASTIC method to increase its prediction accuracy.
We borrowed the Frequency- Ratio method from the landslide susceptibility method, which has been successfully applied in many landslides susceptibility studies. In the GFR method, Geodetector identifies precipitation, fertilizer, manure, elevation, and clay as the significant variables among twelve different variables and have direct spatial association with the distribution of the contaminated wells in the CV [
26]. Although, in this study, Geodetector utilized only 12 different explanatory variables to compute the PD values, the method can process any number of potential explanatory variables to determine their relevance based on the computed PD values and significance tests. In addition, this method can process both categorical and continuous variables, which makes it possible to include wider variety of potential explanatory variables. Furthermore, this method makes no assumptions about the data making it suitable to process nitrate data which are mostly skewed data as they are found in low concentration in normal condition. The GFR method offers a general framework for objectively selecting contributing factors (based on the significance test of PD values) and assigning their weights (based on normalized PD values), which can potentially be applied to any other study areas. Thus, GFR method is more effective than optimized- DRASTIC method for accurately capturing the vulnerability of groundwater contamination, as illustrated in this study in the CV aquifer.
The results of this study are comparable to the findings of previous studies performed in the CV. Ransom et al. [
69] also found higher nitrate concentration in wells of San Joaquin and Tulare Basin than Sacramento valley. The Sacramento Valley is composed of fine grained sediments compared to southern part of the CV, which could reduce the susceptibility of Sacramento Valley. There are several studies which modify DRASTIC method by adding the variables or changing the intervals to assign rating values matching the site-specific condition. However, they are mostly site specific, whereas the GFR method offers a general framework that can be applied anywhere.
In this study, a total of 31 groundwater basins were observed with contaminated wells (>5 mg/L) out of 41 basins in the CV. The advantage of the groundwater susceptibility mapping using overlay–index approach is that a vulnerability index value can be calculated for all the basins using the rating and weight values [
70]. The visual comparison of two vulnerability index maps showed that optimized-DRASTIC method produced generally a higher index value in the eastern flank of the valley, while the GFR index was higher in San Joaquin and Tulare Basins. However, the GFR method was found to be more consistent with observed contamination data than the optimized-DRASTIC method, as measured by the Pearson’s correlation coefficient and the PD value between the vulnerability index and percent of contaminated wells in the groundwater basins, which confirmed the validity of the newly developed GFR method. The GFR index values were negatively skewed, showing relatively higher numbers of groundwater basins in the upper range. The spatial variance analysis based Geodetector method was effective and can be used in different areas to measure the spatial association. However, high PD values obtained from the Geodetector method is a statistical output and does not necessarily mean causality. Therefore, we should still be careful in selecting the relevant variables.