1. Introduction
Salinity has long been recognized as a critical environmental variable in estuaries, which are transition zones between upstream freshwater environments and downstream saline marine environments [
1]. The spatial and temporal variation pattern of salinity in estuaries plays a dominant role in the health of estuarine habitats and biota [
2,
3,
4,
5,
6,
7]. This pattern is typically influenced by drivers including upstream freshwater inflow, downstream tidal forces, as well as local water diversions, precipitation, evaporation, and wind in the estuaries, among others. Understanding the variation pattern of salinity is the foremost step in predicting its future behavior and thus guiding salinity management in estuaries [
8,
9,
10,
11,
12]. This is particularly the case for estuarine environments with paramount economic, ecological, and social significance including the Sacramento–San Joaquin Delta in California, United States.
As the fifth largest economy in the world, the state of California accommodates a population of nearly 40 million and is one of the most productive agricultural areas globally [
13]. A reliable water supply is indispensable to support such a large population and sustain such a robust economy. However, the spatial and temporal distribution of precipitation, the largest water supply source for the state, is largely mismatched with water demand. Most of the state’s population and farmland (and thus water demand) are in the southern half, while most of the precipitation falls on the northern mountain ranges. In addition, the majority of precipitation occurs in the wet winter season, while the highest water demand normally occurs in the dry summer and fall. To balance the mismatch, a complex water storage and transfer system has been built to redistribute water across different spatial and temporal scales. The most critical components of the system are the State Water Project (SWP) and the Central Valley Project (CVP), which are operated by the state and federal governments, respectively. The SWP and CVP infrastructure consists of dozens of dams and reservoirs, pumping plants, hydropower generation plants, and over 1000 km of aqueducts, tunnels, canals, and pipelines [
14,
15].
The hub of this statewide water redistribution system is the Sacramento–San Joaquin Delta (Delta, California, CA, USA). Physically, the delta is a patchwork of islands surrounded by about 1100 km of waterways (
Figure 1). It receives freshwater from the two largest rivers in the state, namely the Sacramento River from the north and the San Joaquin River and its tributaries from the southeast. Freshwater inflows are either diverted to water users within and outside the delta or serve to repel seawater intrusion from its downstream boundary at Martinez (
Figure 1). Ecologically, the delta is a globally important biodiversity hot spot with the highest priority of conservation [
16]. It provides habitats that support about 750 species of plants and animals including some near extinction [
17]. Socioeconomically, the delta provides water to about two thirds of the state’s population and over 15,000 km² of farmlands via SWP and CVP deliveries. Millions of people use the delta for recreation and transportation [
18]. These physical, ecological, and socioeconomic features of the delta drive SWP and CVP operations with the coequal goals of a reliable water supply and an ecologically sustainable delta ecosystem [
19]. The SWP and CVP pump water from the southern delta (
Figure 1) and transfer the water to municipal and agricultural users in the state. The pumping time and rates are dictated by state and federal regulatory requirements to ensure that: (1) flow and water quality standards at various locations in the delta are complied with, and that (2) additional regulations to protect endangered species are followed [
20,
21]. One critical water quality standard is that the salinity level at compliance locations, typically reported as electrical conductivity (EC) in microsiemens per centimeter (µS/cm), cannot exceed preset threshold values during certain periods in a given type of water year (e.g., wet, dry, critical). Traditionally, empirical and process-based models have been used to simulate salinity variations at these locations to guide real-time delta operations (e.g., SWP and CVP operations) and long-term planning studies (e.g., structural changes to the delta) and thereby ensure salinity compliance.
One of the earliest models developed for this purpose is the conceptual–empirical salinity gradient model (i.e., G-model) of [
22]. The model derives salinity from antecedent delta outflow based on the assumption that there is a non-linear relationship between these variables. Hutton et al. [
11] extended the G-model to simulate the low salinity zone in the delta, defined as the position of a predetermined isohaline. Numerical process-based models have also been developed to simulate the spatial and temporal variation of salinity in the delta. These models include, among others, the one-dimensional Delta Simulation Model II (DSM2) [
23], two-dimensional RMA10 [
24] and TRIM2D [
25], and three-dimensional SCHISM [
26,
27], UnTrim [
28,
29], and SUNTANS [
30,
31]. Although multi-dimensional models are physically more rigorous and resolve flow and salinity distributions at a higher spatial resolution than one-dimensional models, they are computationally expensive. For studies involving long time series and multiple scenarios, simpler one-dimensional models are still favored. This is the case for DSM2, which is widely applied in contemporary studies of the delta [
32]. The DSM2 domain covers the entire delta with Martinez as its downstream boundary. For operational planning and forecasting studies, DSM2 relies on the Martinez Boundary Salinity Generator (MBSG) [
33] to produce its downstream salinity boundary conditions.
In addition to empirical and process-based models, data-driven models including Artificial Neural Networks (ANNs) have also been explored in deriving salinity in the delta area [
34,
35,
36,
37,
38,
39]. An ANN employs a mathematical network structure to implicitly identify the relationships between one or more input datasets (usually measured variables such as stage or flow) and output datasets (e.g., salinity at selected locations). The basic processing units in the network are called neurons. These neurons are arranged in layers and are connected to other neurons in adjacent layers. Multilayer perceptrons (MLPs) are probably the most popular ANN models applied in the field of water resources engineering [
40]. The above-mentioned ANN studies generally use MLP-based models to estimate salinity in the Delta. An MLP is a feedforward ANN typically consisting of two visible layers, one at each end of the network (i.e., the input and output layers), and one or more hidden layers in the middle. A neuron in a specific layer takes inputs from neurons in the previous layer and outputs a linear or non-linear transformation of the combined input information to neurons in the next layer. The connections between neurons are represented by linear weights. These weights are determined in the training process by minimizing the difference (i.e., error signal) between network predictions of the variable of interest (e.g., salinity) and the corresponding observations. The most common training method is gradient descent with backpropagation, which propagates the error signal backward through the network and updates the weights using gradients obtained from the chain rule [
41].
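As an illustration of the training procedure described above, the following minimal sketch implements a forward pass and one chain-rule-based gradient descent update for an MLP with a single hidden layer. The network size, the tanh activation, the learning rate, and the input and target values are assumptions chosen for illustration; they are not the configuration used in this study.

```python
import math
import random

def forward(x, W1, b1, w2, b2):
    """One hidden layer (tanh) followed by a single linear output neuron."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sum(w * hj for w, hj in zip(w2, h)) + b2
    return h, y

def train_step(x, target, W1, b1, w2, b2, lr=0.05):
    """One gradient descent update on the squared error 0.5 * (y - target)**2.

    W1, b1, and w2 are updated in place; the new b2 and the loss are returned.
    """
    h, y = forward(x, W1, b1, w2, b2)
    err = y - target  # error signal at the output
    for j, hj in enumerate(h):
        # Chain rule: output error -> output weight -> tanh derivative.
        grad_pre = err * w2[j] * (1.0 - hj * hj)
        w2[j] -= lr * err * hj
        b1[j] -= lr * grad_pre
        for i, xi in enumerate(x):
            W1[j][i] -= lr * grad_pre * xi
    return b2 - lr * err, 0.5 * err * err

random.seed(0)
n_in, n_hidden = 2, 4  # hypothetical sizes
W1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
b2 = 0.0
x, target = [0.3, -0.2], 0.8  # hypothetical scaled inputs and scaled salinity

losses = []
for _ in range(200):
    b2, loss = train_step(x, target, W1, b1, w2, b2)
    losses.append(loss)
```

Repeating the update on a single sample drives the squared error toward zero; practical training would instead loop over many samples and monitor the error on held-out data.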
Despite their popularity, MLPs do not treat the sequential ordering of input time series as a feature during training. For non-linear systems where short- or long-term temporal dependencies exist between output and input (e.g., salinity at the current time relates to antecedent flows), MLPs may not be the most viable choice [
42]. Recurrent Neural Networks (RNNs), a different category of ANN, are designed to overcome this drawback of MLPs [
43]. RNNs process input in its temporal order. The output of hidden layer neurons at each time step is recurrently fed as an additional input to the next time step. This feature grants RNNs the advantage of better understanding the temporal dynamics between input and output variables. In spite of this advantage, standard RNN models are shown to have difficulties in capturing long-term dependencies [
44]. This is mainly caused by two problems encountered during the training process: vanishing gradients (the back-propagated error gradients shrink toward zero, so the weights effectively stop updating) and exploding gradients (the gradients grow extremely large, so the weight updates become unstable). As a result, the error signal can only be back-propagated effectively for a few time steps [
45].
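The recurrence and the decay of the back-propagated error signal can be sketched with a single scalar recurrent unit. This is a simplified illustration under assumed weights and inputs, not one of the networks used in this study: the sensitivity of the final hidden state to the initial one is a product of per-step factors, which shrinks rapidly when each factor has magnitude below one and grows rapidly when each exceeds one.

```python
import math

def rnn_forward(xs, w_x, w_h, h0=0.0):
    """Scalar recurrent unit: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    hs, h = [], h0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        hs.append(h)
    return hs

def gradient_through_time(hs, w_h):
    """d h_T / d h_0 as the product of the factors w_h * (1 - h_t**2)."""
    grad = 1.0
    for h in hs:
        grad *= w_h * (1.0 - h * h)
    return grad

xs = [0.1] * 50  # hypothetical constant input sequence of 50 time steps

# Each factor has magnitude at most 0.5, so the product vanishes.
vanishing = gradient_through_time(rnn_forward(xs, w_x=0.5, w_h=0.5), w_h=0.5)

# Without a saturating activation the per-step factor is simply w_h,
# so a recurrent weight above one makes the product explode.
exploding = 1.5 ** len(xs)
```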
Many variants of RNNs have been proposed to avoid the vanishing and exploding gradient problems. The most well-known and successful variant is the long short-term memory (LSTM) network [
45]. LSTM introduces the concept of gates, which are essentially neurons with learnable weights. The gates inform the network on what information to discard, what to retain, and for how long. This gate configuration helps the network preserve essential information over a long time and avoid rapid error signal decay. LSTM networks have only recently been applied in the field of water resources, to model rainfall-runoff processes [
46,
47,
48,
49], groundwater table [
50,
51], water level in channels [
52], water quality [
53,
54], and reservoir operations [
55]. Given the long-term dependencies between salinity and flow/stage, LSTM should also be suitable for salinity simulation given flow and stage inputs.
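The gate mechanism can be made concrete with a single-unit LSTM cell in its commonly used form (forget, input, and output gates plus a candidate value). The sketch below is illustrative only; the parameter values are hypothetical and were chosen so that the forget gate stays nearly open and the input gate nearly closed, which lets the cell state persist over many steps.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM cell; p maps gate name -> (w_x, w_h, b)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much new information to write
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate value to be written
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

# Hypothetical parameters, not trained values.
params = {
    "forget": (0.0, 0.0, 6.0),     # gate nearly open: retain memory
    "input": (0.0, 0.0, -6.0),     # gate nearly closed: write very little
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 1.0
for _ in range(100):
    h, c = lstm_step(0.0, h, c, params)
```

After 100 steps the cell state still retains most of its initial value, whereas a state multiplied by a factor of 0.5 at each step would be negligible after the same number of steps.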
The convolutional neural network (CNN) is another category of ANN. A CNN consists of a sequence of layers that shrink in length from the input layer to the output layer. The shrinking aims to condense information learned in earlier layers into more abstract concepts in deeper layers [
42]. CNN is a leading network architecture in deep learning techniques and has had extremely successful applications in image pattern recognition and classification [
56,
57,
58,
59]. It has only recently received significant attention in water-related time series modeling, including groundwater level prediction [
60], precipitation estimation [
61,
62], and flood forecasting [
63,
64].
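The shrinking of layer length in a CNN can be illustrated with a one-dimensional convolution followed by pooling. The kernels, the pooling size, and the input series below are hypothetical, and the sketch is not the CNN architecture used in this study.

```python
def conv1d(seq, kernel):
    """Valid 1-D convolution (cross-correlation, as in most deep learning libraries)."""
    k = len(kernel)
    return [sum(kernel[j] * seq[i + j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def relu(seq):
    return [max(0.0, v) for v in seq]

def max_pool(seq, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(seq[i:i + size]) for i in range(0, len(seq) - size + 1, size)]

series = [float(v) for v in range(16)]  # hypothetical 16-step input series

# Length 16 -> 14 after the convolution -> 7 after pooling.
layer1 = max_pool(relu(conv1d(series, [-1.0, 0.0, 1.0])), 2)
# Length 7 -> 6 after the convolution -> 3 after pooling.
layer2 = max_pool(relu(conv1d(layer1, [0.5, 0.5])), 2)
```

Each stage summarizes a wider window of the original series in fewer values, which is the condensing behavior described above.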
To our knowledge, few studies have explored the applicability of state-of-the-art deep learning techniques (e.g., LSTM, CNN) to salinity estimation in deltaic and/or estuarine environments, let alone in the Sacramento–San Joaquin Delta specifically. This study explores the capacity of these techniques through a case study in which LSTM- and CNN-based models emulate the Martinez Boundary Salinity Generator in estimating the downstream salinity boundary (i.e., Martinez salinity) for the delta. The salinity generator itself serves as the benchmark model. A number of MLP-based neural network models are also proposed for comparison. The rest of the paper is organized as follows:
Section 2 describes the study area, the available dataset, the Martinez Boundary Salinity Generator, the proposed neural network models, and the evaluation metrics;
Section 3 presents the results and findings;
Section 4 discusses data stationarity, study limitations, implications, and future work; the last section concludes the paper.
3. Results
The results are grouped into three sub-sections. The first sub-section presents the overall results for the entire evaluation period, water years 2003–2014. In the second sub-section, the evaluation period is divided into three sub-periods corresponding to three ranges of salinity (high, medium, and low), and model performance in simulating each range is examined. In the last sub-section, the evaluation period is divided into five sub-periods representing the five water year types, and model results are evaluated in each of them.
3.1. Entire Evaluation Period
For each model, the standard deviation (SD) of simulated salinity at Martinez, its correlation (R) with the reference salinity, and its centered root mean square difference (RMSD) are calculated and illustrated in
Figure 3. The hybrid MBSG–CNN model slightly outperforms the process-based MBSG model (
Figure 3a). The former has a smaller RMSD (by 5.7%) and a higher R value (by about 0.4%) than the latter. The SD values of the two models are fairly close to each other. Both are smaller than the SD of the reference salinity, indicating that the salinity simulated by both models varies less than the reference salinity.
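For readers unfamiliar with these three statistics, the sketch below computes SD, R, and the centered RMSD using their standard definitions, which are assumed here to match those used in this study. The salinity values are hypothetical. The three quantities are linked by RMSD² = SD_sim² + SD_ref² − 2·SD_sim·SD_ref·R, which is what allows them to be shown together on a single diagram.

```python
import math

def taylor_stats(sim, ref):
    """Return (SD of sim, SD of ref, correlation R, centered RMSD)."""
    n = len(ref)
    mean_s, mean_r = sum(sim) / n, sum(ref) / n
    dev_s = [s - mean_s for s in sim]  # anomalies of the simulation
    dev_r = [r - mean_r for r in ref]  # anomalies of the reference
    sd_s = math.sqrt(sum(d * d for d in dev_s) / n)
    sd_r = math.sqrt(sum(d * d for d in dev_r) / n)
    corr = sum(a * b for a, b in zip(dev_s, dev_r)) / (n * sd_s * sd_r)
    rmsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(dev_s, dev_r)) / n)
    return sd_s, sd_r, corr, rmsd

# Hypothetical EC values in µS/cm, for illustration only.
ref = [12000.0, 18000.0, 25000.0, 21000.0, 15000.0]
sim = [12500.0, 17000.0, 23500.0, 21500.0, 15500.0]
sd_s, sd_r, corr, rmsd = taylor_stats(sim, ref)
```

In this made-up example the simulated SD is smaller than the reference SD, which corresponds to a simulation that varies less than the reference.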
For MLP models, when only using net Delta outflow (NDO) and average water stage as input (MLP1; point B in
Figure 3b), the resulting salinity simulations have a smaller (by 9.5%) correlation value and a remarkably larger (by 90%) RMSD compared to MBSG simulations (point B in
Figure 3a). The SD values of MLP1 and MBSG are comparable to each other, yet both are smaller than that of the reference salinity. Adding daily maximum and minimum stage as input (MLP2; point C in
Figure 3b) yields simulations with only a slightly higher R value and a marginally smaller RMSD than that of the MLP1 simulations. The SD of MLP2 differs noticeably (12% smaller) from that of the reference salinity. When further incorporating MBSG simulations as an additional input feature (MLP3; point D in
Figure 3b), however, the results are improved markedly. The metrics (SD, R, and RMSD) of MLP3 become comparable to those of MBSG. Fine-tuning MLP3 hyper-parameters (MLP4; point E in
Figure 3b) leads to salinity simulations with even more satisfactory metrics compared to both MLP3 and MBSG.
Different from MLP1, the LSTM model using NDO and average stage information alone as input (LSTM1; point B in
Figure 3c) yields comparable simulations to that of the MBSG (point B in
Figure 3a). The MBSG model has slightly smaller RMSD and higher R. However, the SD value of LSTM1 is closer to that of the reference salinity compared to the SD value of MBSG. Adding daily maximum and minimum stage information as input (LSTM2; point C in
Figure 3c) yields simulations with a higher R value and a lower RMSD than LSTM1. Further including MBSG simulations as input (LSTM3; point D in
Figure 3c) leads to salinity simulations with smaller RMSD, higher R, and better SD (i.e., closer to the reference SD) than LSTM2 and MBSG simulations. Fine-tuning LSTM3 hyper-parameters (LSTM4; point E in
Figure 3c) results in simulations with even better R and RMSD than that of LSTM3 simulations.
In addition to R, SD, and RMSD, bias and mean absolute error (MAE) are also calculated for all models studied. Overall, the process-based MBSG model under-estimates the reference salinity (bias = −2.4%;
Figure 4). Similarly, most neural network models also under-estimate the salinity, except for MLP1 (14.7% bias) and LSTM3 (2.1% bias). In terms of magnitude, MLP4 and LSTM4 are less biased than MBSG. Except for MLP1, the remaining neural network models have a bias comparable to but slightly higher than that of MBSG. MLP1 also has the largest mean absolute error (MAE = 3979 µS/cm). MLP2 has the second largest MAE value. The MAE values of the other models are consistently smaller than 2000 µS/cm. Four neural network models (MLP4, LSTM3, LSTM4, and the hybrid model) have smaller MAE values than MBSG.
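A minimal sketch of the two additional metrics is given below. The percent bias formula (total difference divided by total reference) is an assumption, since the exact definition used in this study is not restated in this section; MAE follows its standard definition. The data are hypothetical.

```python
def percent_bias(sim, ref):
    """Assumed definition: 100 * sum(sim - ref) / sum(ref)."""
    return 100.0 * sum(s - r for s, r in zip(sim, ref)) / sum(ref)

def mean_absolute_error(sim, ref):
    """Mean of the absolute differences, in the units of the data."""
    return sum(abs(s - r) for s, r in zip(sim, ref)) / len(ref)

# Hypothetical EC values in µS/cm, for illustration only.
ref = [10000.0, 20000.0, 30000.0]
sim = [9000.0, 21000.0, 29000.0]

bias = percent_bias(sim, ref)        # negative: under-estimation on average
mae = mean_absolute_error(sim, ref)  # insensitive to the sign of the errors
```

Positive and negative errors offset each other in the bias but not in the MAE, which is why a model can have a small bias and a comparatively large MAE.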
Considering all five metrics together, for both MLP and LSTM models, incorporating maximum and minimum stage information generally improves network performance. Adding MBSG simulations as an additional network input feature leads to further improvement. The improvement is much more pronounced for MLP than for LSTM. Fine-tuning network hyper-parameters is shown to improve the general performance of both MLP and LSTM models. Put differently, among all MLP (LSTM) models, MLP4 (LSTM4) generally has the best performance over the entire evaluation period. Among all nine neural network models, LSTM4 has the smallest RMSD, highest R, and lowest MAE; MLP4 has the lowest bias; LSTM1 and LSTM3 have the SD closest to that of the reference salinity. MLP4, LSTM3, and LSTM4 are the only three models that outperform the process-based MBSG model in terms of all five metrics. In comparison, the hybrid model improves on MBSG in terms of R, RMSD, and MAE, while its bias and SD values are comparable to those of MBSG.
3.2. Different Salinity Ranges
Martinez salinity varies seasonally, typically with low values in winter/spring and high values during summer/fall (
Figure 2). Salinity management practices vary accordingly, based on the range of salinity. In addition to model performance over the entire evaluation period, this section examines performance in different salinity ranges. Specifically, three ranges are considered: a low range (below the 25th percentile of observed Martinez salinity during the evaluation period; <1.19 × 10⁴ microsiemens per centimeter (µS/cm)), a medium range (25th to 75th percentile), and a high range (above the 75th percentile; >2.53 × 10⁴ µS/cm). The entire evaluation period is divided into three sub-periods accordingly. The low salinity period is as long as the high salinity period, and each is half as long as the medium salinity period.
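The percentile-based split can be sketched as follows. The quantile interpolation method (the default of Python's `statistics.quantiles`) and the synthetic series are assumptions for illustration; the thresholds reported above come from the observed Martinez record, not from this example.

```python
import statistics

def split_by_range(values):
    """Assign each time index to the low, medium, or high salinity range."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # 25th, 50th, 75th percentiles
    groups = {"low": [], "medium": [], "high": []}
    for idx, v in enumerate(values):
        if v < q1:
            groups["low"].append(idx)
        elif v > q3:
            groups["high"].append(idx)
        else:
            groups["medium"].append(idx)
    return (q1, q3), groups

observed = [float(v) for v in range(1, 101)]  # hypothetical observed series
(q1, q3), groups = split_by_range(observed)
counts = {name: len(idx) for name, idx in groups.items()}
```

By construction the low and high groups each hold about a quarter of the record and the medium group about half, consistent with the relative lengths of the three sub-periods noted above.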
Based on the results during the entire evaluation period presented in
Section 3.1, MLP4 and LSTM4 have the best performance among all MLP and LSTM models, respectively. The hybrid model provides simulations that are generally comparable or superior to those of MBSG. The current section first evaluates the performance of these three neural network models (MLP4, LSTM4, and Hybrid) against that of the MBSG model (
Figure 5). For low range salinity (
Figure 5a), all three models yield higher correlation values and lower RMSD compared to MBSG. For medium range salinity (
Figure 5b), all three models have higher correlation values and smaller RMSD with one exception. The RMSD of MLP4 is slightly (2%) larger than that of MBSG. For high range salinity (
Figure 5c), the RMSD of MLP4 is even higher (by 9.7%) compared to its counterpart of MBSG. The correlation value of MLP4 is also smaller. Conversely, LSTM4 and the hybrid model outperform MBSG in terms of both R and RMSD. Regarding SD, for both medium and high ranges of salinity, all three neural network models and the MBSG model yield simulations with higher variations (higher SD) than the reference salinity; for low salinity, LSTM4 is the only model with a higher than reference SD value. For low, medium, and high ranges of salinity, MLP4, LSTM4, and the hybrid model have the most satisfactory SD (closest to reference SD) values, respectively.
For the models not depicted in
Figure 5, those three metrics (R, SD, and RMSD) are also examined (
Table 3). Similar to the results presented in
Section 3.1, adding additional information as network input features generally improves model performance across all three salinity ranges. Nevertheless, a noticeable difference is that fine-tuning network hyper-parameters does not necessarily lead to improved performance. The differences in these three metrics between MLP3 (LSTM3) and MLP4 (LSTM4) are minimal. MLP3 performs relatively better than MLP4 in high salinity ranges while LSTM3 generally outperforms LSTM4 in medium and high salinity ranges.
Table 3 also indicates that model performance in the high salinity range differs clearly from that in the low-to-medium ranges. Specifically, the SD and RMSD values of high salinity simulations are considerably smaller than those of the low and medium salinity simulations, while the R value of the former is also markedly smaller than that of the latter. This suggests that simulations of high salinity are generally less spread out (smaller SD and RMSD). However, their linear relationship with the corresponding reference salinity is markedly weaker than that of simulations of low-to-medium salinity.
In terms of bias and MAE, model performance varies across salinity ranges. First, all models tend to over-simulate low salinity (
Figure 6a). The process-based model has a bias of 7.9% and an MAE of 1519 µS/cm for low salinity simulations. Among the nine neural network models, only LSTM1, MLP3, and the hybrid model are less biased. Overall, MLP1 is the outlier model, with very large bias and MAE. MLP2 shows improvement over MLP1. However, its bias and MAE values are still markedly larger than those of the remaining models. In contrast, MLP3 and the hybrid model have the smallest bias and MAE. Second, all models except for MLP1 under-simulate high salinity (
Figure 6c). The bias and MAE of MBSG are −6.4% and 1858 µS/cm, respectively, for high salinity simulations. Four neural network models (MLP4, LSTM1, LSTM3, and LSTM4) have smaller bias and MAE than MBSG. Among them, LSTM3 has the most satisfactory bias and MAE. Finally, most models also under-estimate the medium range salinity (
Figure 6b). MLP1 is again the outlier model with the largest positive bias and MAE. LSTM3 is the other model with positive bias (4.2%). MBSG simulations of medium salinity have a bias of −1.2% and an MAE of 1723 µS/cm. In comparison, both MLP4 and LSTM4 have smaller bias and MAE values.
All in all, no single model consistently outperforms the others in terms of all five metrics across low, medium, and high ranges of salinity. However, among all models, MLP1 and MLP2 have the worst performance measured by nearly all metrics. For high range salinity, LSTM3 has the best performance in general. It is the least biased model, with the smallest RMSD and MAE and the best SD. The associated correlation coefficient (0.598) is very close to the optimal value (0.599) of the hybrid model. For medium range salinity, LSTM3 has the smallest RMSD and the best SD; LSTM4 has the highest correlation coefficient and smallest MAE, while MLP4 is the least biased. The results on low salinity are mixed. The five optimal metrics come from five different models. Nevertheless, on average, MLP4, LSTM4, and the hybrid model have relatively better performance.
3.3. Different Water Year Types
In the Delta, water quality standards vary with water year types (e.g.,
Table A1 in the
Appendix A). Understanding model performance in different types of water years is critical to guide corresponding salinity management practices. The entire evaluation period (2003–2014) is divided into five sub-periods, with each sub-period containing the data from a specific water year type (W, AN, BN, D, C). There are two wet years, two above-normal years, three below-normal years, three dry years, and two critical years. Therefore, these five sub-periods vary (from two to three years) in length.
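Grouping results by water year type can be sketched as below. The sketch assumes the usual convention that a water year runs from 1 October to 30 September and is named after the calendar year in which it ends. The classification table and the data are placeholders, not the official water year types of the evaluation period.

```python
from collections import defaultdict
from datetime import date

def water_year(d):
    """Water year of a date: 1 October to 30 September, named by the ending year."""
    return d.year + 1 if d.month >= 10 else d.year

def mae_by_type(records, wy_type):
    """records: (date, simulated, reference); wy_type: water year -> type label."""
    errors = defaultdict(list)
    for d, sim, ref in records:
        errors[wy_type[water_year(d)]].append(abs(sim - ref))
    return {label: sum(v) / len(v) for label, v in errors.items()}

# Placeholder classification and EC data (µS/cm), for illustration only.
wy_type = {2101: "W", 2102: "C"}
records = [
    (date(2100, 10, 1), 9000.0, 10000.0),   # first day of water year 2101
    (date(2101, 9, 30), 11000.0, 10000.0),  # last day of water year 2101
    (date(2101, 10, 1), 24000.0, 26000.0),  # first day of water year 2102
    (date(2102, 3, 15), 27000.0, 26000.0),
]
result = mae_by_type(records, wy_type)
```

Because the number of years differs between types, each group aggregates a different number of daily values, so metrics from two-year and three-year groups rest on samples of different sizes.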
Following
Section 3.2, this section first examines the three metrics illustrated by the Taylor diagrams for the process-based MBSG model and three neural network models (MLP4, LSTM4, and Hybrid). Overall, the four models perform similarly across all five types of water years (
Figure 7). However, none of them consistently outperforms the others. Specifically, across all types of water years, LSTM4 and the hybrid model have higher correlation values than MLP4 and MBSG. In addition, the hybrid model has a smaller RMSD than MLP4 and MBSG. Regarding SD, MLP4 has the best performance in all types of water years except above-normal years, in which the SD value of MBSG is the closest to the reference SD (−0.6% difference versus −2.3% for MLP4). Model performance also varies across water year types. The highest R values of all four models occur in wet years, when salinity is generally low. In contrast, R values during dry and critical years (when salinity is generally high) are typically the lowest. The smallest and largest RMSD values are observed in below-normal and critical years, respectively, for MBSG, MLP4, and the hybrid model. For LSTM4, wet years have the smallest RMSD while above-normal years have the largest. In terms of SD, model performance is generally the worst in critical years, followed by dry years. On average, above-normal years have the most satisfactory SD values.
The performance of those four models is also compared to that of the remaining models.
Table 4 shows the RMSE of all nine neural network models along with that of the process-based MBSG model. For MLP models, when only NDO and stage data are considered as network input features (MLP1 and MLP2), the resulting RMSE values are much larger than those of the process-based model across all types of water years. Adding MBSG simulations as an additional input (MLP3) largely improves model performance. Fine-tuning hyper-parameters (MLP4) leads to even smaller RMSE in all types of water years except for below-normal years. For LSTM models, when only NDO and average stage are employed (LSTM1), the resulting RMSE values are generally comparable to those of MBSG. Adding minimum and maximum stage (LSTM2) yields smaller RMSE in general. Incorporating MBSG simulations as input (LSTM3) leads to consistently smaller RMSE values (versus the RMSE values of MBSG, LSTM1, and LSTM2) in all five types of water years. Fine-tuning hyper-parameters (LSTM4) does not necessarily lead to further improvement. Similar features are also observed in the other two metrics (
Table A3 and
Table A4 in the
Appendix A). Looking at all models together, LSTM3 has the smallest RMSE in wet years; LSTM4 has the smallest RMSE in dry and critical years, while the hybrid model performs the best during above-normal and below-normal years.
Similar to what has been observed in the entire evaluation period (
Figure 4) and in the three sub-periods representing the three salinity ranges (
Figure 6), MLP1 and MLP2 tend to be the outlier models with very different bias and MAE from other models (
Figure 8). Their MAE values are markedly larger than those of the other models. MLP1 considerably over-estimates the salinity in all types of water years except critical years, while MLP2 largely under-estimates Martinez salinity in dry and critical years. Among the remaining models, the hybrid model performs the best in above-normal years, with the smallest bias and MAE; MLP4 is the best-performing model in below-normal years; LSTM3 outperforms the other models in dry years. For wet years, MLP4 and LSTM3 have the smallest bias and MAE, respectively. For critical years, LSTM3 and LSTM4 have the smallest bias and MAE, respectively.
Examining all five metrics together, the neural network models can outperform the process-based MBSG model consistently across all water year types.
Table 5 tabulates the improvements calculated as the percent difference between the optimal metrics (of the neural network models with the outlier models MLP1 and MLP2 excluded) and the corresponding MBSG metrics. For R, SD, RMSE, and MAE, the improvements in extreme years (wet, dry, and critical) are more noticeable than the improvements in near-normal (above-normal and below-normal) years. For R and SD (RMSE and MAE), the largest improvements occur in critical (wet) years. The optimal metrics are not associated with a single neural network model. In extreme years, the LSTM models (LSTM3 and LSTM4) tend to be the optimal models; in above-normal years, the hybrid model seems to have the best metrics (except for SD); in below-normal years, the hybrid model and MLP4 perform relatively better in terms of the number of optimal metrics associated with them.
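The improvement calculation described above can be sketched for an error metric such as RMSE or MAE, where lower is better. The function name, the metric values, and the benchmark value are hypothetical and are not taken from the tables of this study.

```python
def percent_improvement(candidates, baseline, higher_is_better=False, exclude=()):
    """Pick the optimal candidate metric and express it relative to the baseline."""
    pool = {model: v for model, v in candidates.items() if model not in exclude}
    pick = max if higher_is_better else min
    best_model = pick(pool, key=pool.get)
    pct = 100.0 * (pool[best_model] - baseline) / baseline
    return best_model, pct

# Hypothetical RMSE values in µS/cm, for illustration only.
rmse = {"MLP1": 4000.0, "MLP4": 1900.0, "LSTM4": 1800.0, "Hybrid": 1850.0}
best, pct = percent_improvement(rmse, baseline=2000.0, exclude=("MLP1", "MLP2"))
```

Here a negative percent difference indicates a reduction in error relative to the benchmark. For R, `higher_is_better=True` would be used; SD needs separate handling because its optimum is the value closest to the reference SD, which this sketch does not cover.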