Article

Artificial Neural Network and Multiple Linear Regression for Flood Prediction in Mohawk River, New York

by Katerina Tsakiri 1,*, Antonios Marsellos 2 and Stelios Kapetanakis 3
1 Department of Information Systems and Supply Chain Management, Rider University, Lawrenceville, NJ 08648, USA
2 Department of Geology, Environment, and Sustainability, Hofstra University, Hempstead, NY 11549, USA
3 School of Computing, Engineering, and Mathematics, University of Brighton, Brighton BN24GJ, UK
* Author to whom correspondence should be addressed.
Water 2018, 10(9), 1158; https://doi.org/10.3390/w10091158
Submission received: 28 June 2018 / Revised: 21 August 2018 / Accepted: 22 August 2018 / Published: 29 August 2018
(This article belongs to the Section Hydrology)

Abstract

This research introduces a hybrid model for forecasting river flood events, using the Mohawk River in New York as an example. Time series analysis and artificial neural networks are combined to explain and forecast the daily water discharge using hydrogeological and climatic variables. A low pass filter (the Kolmogorov–Zurbenko filter) is applied to decompose the time series into long-term, seasonal, and short-term components. To predict the water discharge time series, each component is described by a multiple linear regression (MLR) model and by an artificial neural network (ANN) model. The MLR retains the advantage of a physical interpretation of the water discharge time series. We show that time series decomposition is essential before the application of any model. The decomposition also shows that the Mohawk River is affected by components at multiple time scales that contribute to the hydrologic cycle of the included watersheds. Comparison of the models shows that applying the ANN to the decomposed time series improves the accuracy of forecasting flood events. The hybrid model, which combines time series decomposition with an artificial neural network, explains up to 96% of the water discharge time series.

1. Introduction

Worldwide, flood events are the most significant natural hazards, with severe consequences for the socio-economic structure and damage of up to billions of dollars [1,2,3]. Although the mortality rate from floods, often called the most dangerous natural phenomenon, has fallen in recent years, the cost is increasing [4]. An increase in the frequency of irregular flood events has been observed with climatic changes [5,6,7,8,9], and prediction of flood events is required in densely populated regions such as New York State.
In upstate New York, along the Mohawk River and other tributaries, flood events are widespread and persistent. In Schenectady, New York, flood events are due either to an increase in precipitation and storms during the summer or to ice jams in the winter months. Ice jams occur when floes accumulate at the base of bridge piers, locks, and dam structures, impeding the downstream water flow and causing an upstream rise in the water level [10,11]. Flood forecasting and damage evaluation have been studied previously at Schenectady [12,13,14,15,16,17]. Prediction of flood events is necessary to mitigate damage. In this study, we implement an artificial neural network (ANN) prediction model based on hydrogeological and climatic time series with minimal interference between time scales to predict the water discharge time series of the Mohawk River.
Flood prediction and forecasting is a topic of the most significant interest in the area of natural hazards [18,19,20]. There are two main approaches to the flood prediction problem. The first uses mathematical modeling, in which the physical dynamics are studied, modeled, and applied. Mathematical modeling takes into consideration measurements reflecting the hydrologic cycle. These measurements are then combined and transformed into a hydrographic prediction tailored to the specific spatial variation of the variables contributing to water flow. A channel flow routing model can then be adapted to calculate river flow and yield a successful prediction. River flow or water discharge forecasting is an example of mathematical modeling for flow forecasting [21]. The second approach relies on statistical observation of the relationships between the inputs and outputs of water discharge, with no additional information on the physical dynamics of the process. Such observations lead to stochastic models such as moving average models [22] and Markov models [20].
Natural hazard geophysical processes are related to multiple daily natural phenomena such as rain, wind, and gravitational forces. Some hazards are described by univariate models of variables such as precipitation, wind speed, or groundwater-level time series. Floods result from a multi-variable relationship that describes the interaction of the climatic and hydrogeological variables. However, this relationship may involve interferences in data acquired at different time scales. An example is the groundwater fluctuation, which is related to annual water discharge, seasonal cycles, and short-term variations [14]; using the multiple linear regression model and the decomposition of the variables, the resulting prediction yielded up to 83% of explanation. The superposition of those components results in interferences, which introduce noise in the data and may hamper the accuracy of the forecasting analysis. The decomposition of the variables into different components decreases the noise and separates the different signals in the data. However, the short-term prediction model showed the lowest performance among the different time series components. In this research, we present how this methodology can be improved using non-linear methods such as the ANN.
There are several studies on analysis and prediction in river flooding applied in several locations (e.g., in Germany, Indus River in India or Elbe River in Czech Republic) [23,24,25,26]. In the Czech Republic, the investigation of the August 2002 flooding has shown an explanation described by a correlation of 0.75 yielding a coefficient of determination (R2) value of 0.56 between the water discharge and the climatic variables. The resulting moderate linear relationship might be due to a mixed interference of scales in the time series. However, this model can be improved with the separation of scales in the time series [27,28,29]. In particular, the separation of the components in the time series is necessary to avoid interferences from different covariance structures on the data. An absence of separation of scales may lead to erroneously estimated parameters of the linear regression model or other multivariate models (principal components, canonical pairs). For this reason, the time series decomposition is essential before performing any analysis.
The time series decomposition using the Kolmogorov–Zurbenko (KZ) filter provides adequate separation of frequencies in the time series data, and it has been applied in many environmental applications [30,31,32,33]. The KZ filter provides a simple design and allows a physical interpretation of all the components of the time series [29,34]. The KZ filter provides the best and closest results to the optimal mean square of error. Also, it allows effective separation of frequencies for application directly to datasets with missing data [18,32,35,36].
Several studies have examined the prediction of the water discharge in different locations [9,23,37,38]. Some studies have used artificial neural networks [23,37,38], and others traditional statistical approaches [9,39]. ANNs have become popular for prediction and modeling purposes in various fields [40,41,42,43,44,45]. Several studies have developed hydrological forecasting models of river flow using ANNs [46,47]. In recent studies, ANNs have been used as an alternative to statistical and empirical methods for evaluating different physical phenomena [23,37,43]. In comparison with a multiple linear regression (MLR) model, an ANN provides a computational way of determining a non-linear relationship between a set of inputs and one or more outputs. ANNs have been applied to the modeling, identification, and prediction of complex systems [48]. Although the methods used for the prediction of the water discharge provide a prediction accuracy of up to approximately 80% [14], the performance of the prediction may be increased using ANNs. There are, however, cases where ANNs suffer in a similar way to linear models due to dataset limitations such as volume and noise [49,50].
Noise in hydrologic datasets is a case-specific feature and depends on many factors (including equipment and sensor sensitivity). Different signals may produce noise on the time series data [14,27,28] and subsequently influence the ANN application. We intend to determine the different signals on the time series and eliminate the noise on the data using time series decomposition. We aim to examine how much the ANN application will be improved for each component, separately.
In this study, we examine the water discharge time series as a function of climatic and hydrological variables. We use the multiple linear regression model and artificial neural networks in the decomposed time series data. We examine whether the time series components can explain the hydrological behavior of the Mohawk River basin and reveal the main characteristics of the watershed. We aim to facilitate modeling by separating individual patterns, and decrease interferences originating from covariance structures existing between the components of the time series. We compare the linear MLR model with a non-linear ANN one using the climatic and hydrogeological variables. It is anticipated that this study will provide baseline information toward the establishment of a flood warning system for several sections of Mohawk Watershed, New York.

2. Materials and Methods

2.1. Study Area

The Mohawk River crosses its basin with an easterly flow direction and a length of 240 kilometers. It is the largest tributary of the Hudson River, which it joins at Cohoes, a few kilometers north of the city of Albany. Many tributaries contribute to the Mohawk River, including the Schoharie Creek, the East Canada Creek, and the West Canada Creek. We selected for investigation three water discharge stations located in the Mohawk River basin west of Albany, NY, USA, including a location at the confluence of the Mohawk River and the Schoharie Creek, which originate from the Mohawk watershed and the Schoharie watershed, respectively (Figure 1). The Mohawk River Watershed is divided into three geographic regions: the Upper Mohawk, the Main River, and the Schoharie Watershed. The Main River, also called the Mohawk Valley, includes portions of Fulton, Montgomery, Schenectady, Saratoga, and Albany Counties, and it is a lowland area with extensive agricultural land use. The Schoharie Creek area has less agricultural land use, and it contributes to the public water supply of New York City through a pipeline from the Schoharie Reservoir [51].

2.2. Data

For the analysis, we use daily water discharge, groundwater level, temperature, tides, wind speed, and precipitation time series data. The water discharge data were derived from three different United States Geological Survey (USGS) gage stations (Figure 1) and cover the period from 1 January 2005 to 23 February 2013. We use the monitoring stations 01351500 (https://waterdata.usgs.gov/nwis/inventory?agency_code=USGS&site_no=01351500; latitude 42°48′00″, longitude 74°15′46″), 01347000 (https://waterdata.usgs.gov/nwis/inventory?agency_code=USGS&site_no=01347000; latitude 42°55′11.1″, longitude 74°25′40.5″), and 01346000 (https://waterdata.usgs.gov/nwis/inventory?agency_code=USGS&site_no=01346000; latitude 43°04′08″, longitude 74°59′18″), referred to in this paper as WD1500, WD1347, and WD1346, respectively. Those stations are located near West Canada Creek (WD1346), East Canada Creek (WD1347), and Schoharie Creek (WD1500). The groundwater USGS station 425511074254001 (https://waterdata.usgs.gov/nwis/inventory?agency_code=USGS&site_no=425511074254001; latitude 42°55′11.1″, longitude 74°25′40.5″) is located between WD1500 and WD1347 and is referred to as GWL4001. The atmospheric data were obtained from the National Oceanic and Atmospheric Administration (NOAA) database (http://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/quality-controlled-local-climatological-data-qclcd; Albany, Hudson River, NY). The water discharge is measured in cubic feet per second (cfs), the groundwater level in feet (ft) with reference to land-surface datum (LSD; that is, depth to the water level), the average wind speed in miles per hour (mph), the temperature in degrees Fahrenheit (°F), the precipitation in inches to hundredths, and the tides (https://tidesandcurrents.noaa.gov/stationhome.html?id=8518995) in feet from the NOAA station 8518995 in Albany, Hudson River (latitude 42°39′ N, longitude 73°44.8′ W).

2.3. Time Series Decomposition–Kolmogorov–Zurbenko (KZ) Filter

The presence of different signals in time series data may produce misleading results when a numerical model is applied to the raw data [14,18,19]. Decomposing the time series of each variable separates the different signals in the data and minimizes the interferences between the components of the time series [14,18]. After the decomposition, we apply a model to each component separately [19,27,28]. The decomposition of the time series of a variable is given by expression (1):
$$X(t) = L(t) + \mathrm{Se}(t) + \mathrm{Sh}(t) \tag{1}$$
where X(t) represents the original time series of a variable, L(t) is the long-term trend component, Se(t) is the seasonal component, and Sh(t) is the short-term component. The long-term component expresses the fluctuations of a time series longer than a given threshold, the seasonal component describes the intraannual fluctuations, and the short-term component describes the short-term variations.
The KZ filter was applied for the decomposition of the time series data. The KZ filter is a low pass filter defined by p iterations of a simple moving average of m points (here p = 3 and m = 33), where the window size m is odd (m = 2k + 1). The moving average of the KZ filter is given by expression (2):
$$Y_t = \frac{1}{m} \sum_{j=-k}^{k} X_{t+j} \tag{2}$$
where m = 2k + 1. The output of the first iteration becomes the input for the second iteration, and so on. The time series produced by p iterations of the filter described in expression (2) is denoted by:
$$Y_t = KZ_{m,p}(X_t) \tag{3}$$
The KZ33,3 filter (length of 33 with three iterations) was applied to all the variables to provide a physically based explanation of the water discharge time series. The parameters of the filter have been selected to provide the optimal solution for the water discharge time series [14].
The water discharge time series for the three different stations were logarithmically transformed to convert the water discharge time series into a stationary time series with constant variance (Figure 2). The KZ filter was applied to the logarithm of the daily water discharge and produced a time series which consists of the long-term variations in the time series (L(t)). The KZ filter with the same parameters was applied to the variables of daily wind speed, temperature, tides, precipitation, and groundwater level and produced the long-term component of each variable.
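The iterated moving average of expressions (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: the function names `moving_average` and `kz_filter` are hypothetical, and the treatment of edges and missing values (averaging over whichever points fall inside the window) is an assumption.

```python
import numpy as np

def moving_average(x, m):
    """Centered moving average of odd width m = 2k + 1.

    Assumption: at the edges and around NaNs, the mean is taken over the
    points that are actually available inside the window.
    """
    x = np.asarray(x, dtype=float)
    if m % 2 == 0:
        raise ValueError("window size m must be odd (m = 2k + 1)")
    if x.size < m:
        raise ValueError("series must be at least m points long")
    valid = ~np.isnan(x)
    kernel = np.ones(m)
    sums = np.convolve(np.where(valid, x, 0.0), kernel, mode="same")
    counts = np.convolve(valid.astype(float), kernel, mode="same")
    out = np.full(x.shape, np.nan)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out

def kz_filter(x, m=33, p=3):
    """KZ_{m,p}: apply the m-point moving average p times in succession."""
    y = np.asarray(x, dtype=float)
    for _ in range(p):
        y = moving_average(y, m)
    return y

# Example: long-term component of a synthetic log-discharge series.
rng = np.random.default_rng(0)
log_wd = 5.0 + 0.001 * np.arange(1000) + rng.normal(scale=0.3, size=1000)
long_term = kz_filter(log_wd, m=33, p=3)
```

Each pass feeds the output of the previous pass back in, which is what makes the KZ filter smoother than a single moving average of the same width.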

2.4. Multiple Linear Regression (MLR)

For the explanation and prediction of the water discharge time series by the climatic and hydrogeological variables, we use the multiple linear regression model which describes the projection of the water discharge time series into the climatic and hydrogeological variables. The purpose of the MLR is to explain as much as possible of the variation observed in the water discharge time series, leaving as little variation as possible to unexplained “noise” [52]. For the raw time series data, the multiple linear regression model can be expressed by:
$$WD(t) = a\,T(t) + b\,TD(t) + c\,PR(t) + d\,WS(t) + e\,GWL(t) + \varepsilon(t) \tag{4}$$
where WD(t), T(t), TD(t), PR(t), WS(t), and GWL(t) denote the raw data of the logarithm of the water discharge, temperature, tide, precipitation, wind speed, and groundwater-level time series, respectively. The regression coefficients a, b, c, d, and e in expression (4) multiply the independent variables and have been selected so that they are statistically significant for the model. ε(t) represents the residuals.
Because expression (4) is given for the natural logarithm of the water discharge, the additive term ε(t) has a multiplicative effect, exp(ε(t)), in the original time series. Since the term ε(t) is sufficiently small, we can conclude that:
$$\exp(\varepsilon(t)) \approx 1 + \varepsilon(t), \quad \text{or} \quad \ln(1 + \varepsilon(t)) \approx \varepsilon(t) \tag{5}$$
Therefore, ε(t)*100 corresponds to percent changes in the water discharge time series unexplained by the climatic and hydrogeological variables.
To measure the strength of the relationship in the multiple linear regression model of the raw data, we use the coefficient of determination, which is the square of the correlation coefficient, R2 as a measure of goodness of fit.
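As an illustration of how a model of the form of expression (4) and its R2 could be computed, the following sketch fits the coefficients by ordinary least squares with NumPy. The function name `fit_mlr` and the synthetic data are hypothetical; the absence of an intercept simply mirrors expression (4) as written and is an assumption about the fitted model, and no significance testing of the coefficients is performed here.

```python
import numpy as np

def fit_mlr(predictors, response):
    """Least-squares fit of response = sum_k coef_k * predictor_k + residual.

    predictors: sequence of 1-D arrays (e.g., T, TD, PR, WS, GWL).
    Returns the coefficients, the fitted values, and R^2 computed as the
    squared correlation between fitted and observed values.
    """
    X = np.column_stack([np.asarray(p, dtype=float) for p in predictors])
    y = np.asarray(response, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    r2 = np.corrcoef(fitted, y)[0, 1] ** 2
    return coef, fitted, r2

# Synthetic example with two hypothetical predictors.
rng = np.random.default_rng(1)
temperature = rng.normal(size=300)
precipitation = rng.normal(size=300)
log_discharge = (0.4 * temperature + 1.2 * precipitation
                 + rng.normal(scale=0.1, size=300))
coef, fitted, r2 = fit_mlr([temperature, precipitation], log_discharge)
residuals = log_discharge - fitted   # eps(t); eps(t) * 100 is roughly a percent change
```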
For the prediction of the long-term component of the water discharge, the multiple linear regression model is applied in the filtered climatic and hydrogeological variables in the three stations. Expression (6) describes the multiple linear regression model of the long-term component of the water discharge time series:
$$WD_{LT}(t) = e\,T_{LT}(t) + f\,TD_{LT}(t) + g\,PR_{LT}(t) + h\,WS_{LT}(t) + i\,GWL_{LT}(t) + \varepsilon(t) \tag{6}$$
In expression (6), we denote the long-term components of the water discharge, temperature, tides, precipitation, wind speed and groundwater-level time series with WDLT(t), TLT(t), TDLT(t), PRLT(t), WSLT(t), GWLLT(t), respectively.
Similar methodology has been applied to the prediction of the seasonal component of the water discharge time series. The seasonal component of the water discharge, WDSe(d), can be defined by the expression (7):
$$WD_{Se}(d) = \frac{1}{Y} \sum_{y=1}^{Y} \left( WD(t) - WD_{LT}(t) \right) \tag{7}$$
where d = 1, 2, …, 365 (d = 1 represents 1 January) and y = 1, …, Y.
In expression (7), d is the day of the year, y indexes the years of the observed values, Y is the number of years, t is the date that falls on day d of year y, WD(t) is the raw logarithm of the water discharge time series, and WDLT(t) is the long-term component of the water discharge time series. The seasonal components of the climatic and hydrogeological variables can be defined similarly. Based on previous studies, we apply the multiple linear regression model for the prediction of the seasonal component of the water discharge time series [14].
To estimate the short-term component of the water discharge time series, WDSh (t), we use the expression (8):
$$WD_{Sh}(t) = WD(t) - WD_{LT}(t) - WD_{Se}(d) \tag{8}$$
where WD(t) is the raw water discharge time series, WDLT(t) is the long-term component of the water discharge time series, and WDSe(d) is the seasonal component of the water discharge. Similarly, we can estimate the short-term components of the climatic and hydrogeological variables.
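Expressions (1), (7), and (8) together define a three-way split of each series, which the following sketch illustrates. It is a minimal example under stated assumptions: the names `kz_long_term` and `decompose` are hypothetical, the caller supplies a `day_of_year` array (leap days are not treated specially), and edge points of the moving average are averaged over the available window.

```python
import numpy as np

def kz_long_term(x, m=33, p=3):
    """Long-term component: p passes of an m-point centered moving average."""
    y = np.asarray(x, dtype=float)
    if y.size < m:
        raise ValueError("series must be at least m points long")
    kernel = np.ones(m)
    # Number of points inside each window (smaller near the edges).
    norm = np.convolve(np.ones(y.size), kernel, mode="same")
    for _ in range(p):
        y = np.convolve(y, kernel, mode="same") / norm
    return y

def decompose(x, day_of_year, m=33, p=3):
    """Split x into long-term, seasonal, and short-term components."""
    x = np.asarray(x, dtype=float)
    day_of_year = np.asarray(day_of_year)
    long_term = kz_long_term(x, m, p)
    detrended = x - long_term
    seasonal = np.empty_like(x)
    for d in np.unique(day_of_year):
        same_day = day_of_year == d
        # Expression (7): average the detrended values of day d over the years.
        seasonal[same_day] = detrended[same_day].mean()
    # Expression (8): what remains is the short-term component.
    short_term = x - long_term - seasonal
    return long_term, seasonal, short_term

# Three synthetic years of a log-discharge-like series.
years = 3
doy = np.tile(np.arange(1, 366), years)
t = np.arange(years * 365)
rng = np.random.default_rng(2)
series = (5.0 + 0.0005 * t + 0.8 * np.sin(2 * np.pi * doy / 365)
          + rng.normal(scale=0.2, size=t.size))
lt, se, sh = decompose(series, doy)
```

By construction the three components add back to the original series, so no information is lost by modeling them separately.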
To estimate the total explanation of the MLR model for the water discharge time series, we combine the explanation of the multiple linear regression models for the long, seasonal, and short-term components using expression (9):
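$$\mathrm{Expl} = \sum_{(KZ,\,Se,\,Sh)} \left( R^2 \cdot \%\mathrm{Var} \right) \tag{9}$$
where the sum runs over the long-term (KZ), seasonal (Se), and short-term (Sh) components, R2 is the coefficient of determination of the model for each component, and %Var is the percent of the variance attributed to that component.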
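The weighting in expression (9) amounts to a short calculation, sketched below. The function name `total_explanation` is hypothetical and the numbers in the example are placeholders chosen for illustration, not values from this study.

```python
def total_explanation(components):
    """Sum of R^2 * %Var over the components.

    components: iterable of (r2, percent_variance) pairs, with r2 given as
    a fraction between 0 and 1 and percent_variance in percent.
    """
    return sum(r2 * pct_var for r2, pct_var in components)

# Placeholder values for the long-term, seasonal, and short-term components.
example = [(0.90, 50.0), (0.80, 30.0), (0.50, 20.0)]
overall = total_explanation(example)   # overall explanation in percent
```

A component with a poor fit only lowers the overall explanation in proportion to the share of variance it carries.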

2.5. Artificial Neural Network (ANN)

ANNs are a robust method for classifying flow data which may perform well for solving non-linear problems using their inner-parallel architecture [20,49,50,53]. The neural classification usually has two main areas of application. One area is to use a number of relevant dataset attributes (predictors) and calculate the value of an unknown attribute (target value). Another area is to use all available attributes at any given time window t and calculate their value at t + 1, t + 2, …, t + n (one step ahead and multi-step ahead prediction). This methodology in combination with the KZ filter may provide the minimum interferences between the long-, seasonal-, and short-term component and a maximum performance of ANN.
For this study, ANNs are applied to the decomposed time series (long-, seasonal-, and short-term components) of all the variables. An ANN is a mathematical model that consists of an interconnected group of neurons and processes information using a connectionist approach to computation. An ANN changes its structure based on the external and internal information that flows through the network during the learning process. The advantages of ANNs are that they can represent both linear and non-linear relationships and learn these relationships directly from the data [53].
For the application of ANNs, a feedforward multilayer perceptron (MLP) classifier with two layers has been implemented for the analysis. The hyperbolic tangent has been selected as the transfer function of the MLP classifier. The formula of the transfer function can be described by Equation (10):
$$\varphi_i(\upsilon_i) = \tanh(\upsilon_i) \tag{10}$$
where φi is the output of the ith node (neuron) and υi is the weighted sum of the input connections [53]. MLP training identifies the weight, ωij, for each layer based on the network learning (Figure 3).
Learning occurs in the perceptron by changing connection weights after each piece of data is processed, based on the amount of error in the output compared to the expected result. MLP applies back-propagation to improve the weights of each neuron while minimizing the error [53]. The error can be represented by Equation (11):
$$e_j(n) = d_j(n) - y_j(n) \tag{11}$$
where ej(n) is the error at output node j for the nth data point, d is the target value, and y is the value produced by the MLP [53]. The total error over all output nodes for the nth data point is given by Equation (12):
$$\varepsilon(n) = \frac{1}{2} \sum_{j} e_j^2(n) \tag{12}$$
To minimize the error, each weight is adjusted by gradient descent as described in Equation (13):
$$\Delta\omega_{ij}(n) = -\eta \, \frac{\partial \varepsilon(n)}{\partial \upsilon_j(n)} \, y_i(n) \tag{13}$$
In Equation (13), yi is the output of the previous neuron and η is the learning rate. To evaluate our proposed ANN architecture, different sensor observations from 2005 to 2013 have been used. We split the dataset in this study into 80% (training) and 20% (testing). A seed was kept constant to provide consistent results in the ANN runs of the different time scale components. Once the climatic and hydrological time series variables have been split, we apply the ANN approach to each component (long-, seasonal-, and short-term) separately (Figure 4).
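To make Equations (10) to (13) concrete, the following NumPy sketch trains a small perceptron with one tanh hidden layer by back-propagation. It is an illustration under several assumptions, not the network used in the study: the names `split_80_20`, `train_mlp`, and `predict` are hypothetical, the output node is linear, the hidden layer size, learning rate, and number of epochs are arbitrary, the 80/20 split is a seeded random shuffle, and the weights are updated on the mean error over the whole training set rather than after each data point.

```python
import numpy as np

def split_80_20(X, y, seed=0):
    """Shuffle with a fixed seed, then split 80% training / 20% testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    cut = (len(y) * 4) // 5
    train, test = order[:cut], order[cut:]
    return X[train], y[train], X[test], y[test]

def train_mlp(X, y, hidden=8, eta=0.05, epochs=2000, seed=0):
    """One tanh hidden layer and a linear output, trained by gradient descent."""
    rng = np.random.default_rng(seed)
    n, n_in = X.shape
    W1 = rng.normal(scale=0.5, size=(n_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # Equation (10): phi(v) = tanh(v)
        pred = H @ W2 + b2
        err = y - pred                      # Equation (11): e = d - y
        losses.append(0.5 * np.mean(err ** 2))   # Equation (12), averaged over n
        g_out = -err / n                    # d(loss) / d(output)
        # Back-propagate through tanh, whose derivative is 1 - tanh^2.
        g_hid = np.outer(g_out, W2) * (1.0 - H ** 2)
        # Equation (13): step each weight against its error gradient.
        W2 -= eta * (H.T @ g_out)
        b2 -= eta * g_out.sum()
        W1 -= eta * (X.T @ g_hid)
        b1 -= eta * g_hid.sum(axis=0)
    return (W1, b1, W2, b2), losses

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

# Toy non-linear target standing in for one time series component.
x = np.linspace(-2.0, 2.0, 200).reshape(-1, 1)
target = np.sin(x[:, 0])
X_tr, y_tr, X_te, y_te = split_80_20(x, target)
params, losses = train_mlp(X_tr, y_tr)
test_mse = np.mean((predict(params, X_te) - y_te) ** 2)
```

Keeping the seed fixed, as described above, makes repeated runs on the different components directly comparable.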

2.6. Assessment of Model Performance for MLR and ANN Model

To estimate the performance of the ANN and MLR model, the coefficient of determination, R2, and the mean square of error (MSE) are used for the analysis. The forecasting accuracy of the water discharge time series is determined by using those criteria with the following expressions (14 and 15):
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{Y}_i - Y_i \right)^2 \tag{14}$$
$$R^2 = \frac{\sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^2 - \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2}{\sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^2} \tag{15}$$
where Yi is the actual water discharge value, Ŷi is the estimated water discharge value, Ȳ is the mean of the actual water discharge values, and n is the total number of observations. The MSE is never negative and equals zero in the ideal case [54]. For ideal data modeling, the coefficient of determination, R2, should approach 100% as closely as possible.
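Both criteria are straightforward to compute from paired arrays of observed and estimated values, as the following sketch of expressions (14) and (15) shows. The function names `mse` and `r_squared` are hypothetical, and the sample values are made up for illustration.

```python
import numpy as np

def mse(actual, estimated):
    """Expression (14): mean of the squared differences."""
    actual = np.asarray(actual, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return np.mean((estimated - actual) ** 2)

def r_squared(actual, estimated):
    """Expression (15): share of the total variation that is explained."""
    actual = np.asarray(actual, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    total = np.sum((actual - actual.mean()) ** 2)
    unexplained = np.sum((actual - estimated) ** 2)
    return (total - unexplained) / total

observed = np.array([2.0, 3.5, 5.0, 4.0, 6.5])
modelled = np.array([2.2, 3.3, 4.8, 4.4, 6.1])
error = mse(observed, modelled)
fit = r_squared(observed, modelled)   # multiply by 100 for a percentage
```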

3. Results

3.1. Assessment of the MLR Model

To evaluate the effectiveness of the MLR model in river flood prediction, we estimate the coefficient of determination, R2, for the raw time series data and for the components of the time series (long-, seasonal-, and short-term). Table 1 summarizes the coefficients of determination, R2, for the three stations. For the raw data, R2 is moderate because different signals exist in all the time series and cause interferences in the model. In particular, the coefficient of determination varies from 48.7% to 59% across the locations (Table 1). Stations WD1346 and WD1347 perform similarly (47.6% and 48.7%, respectively), while station WD1500 performs better (59%). With the decomposition of all the time series, the accuracy of the MLR model is higher (Table 1). The long-term components of the water discharge time series provide a strong relationship, with a coefficient of determination, R2, that ranges from 73.5% to 83.3%. The MLR model applied to the seasonal components of the water discharge time series provides a very strong relationship, with values that range from 91.2% to 93.1%. As shown in previous studies, it is reasonable to observe seasonal variations because all the climatic and hydrogeological variables contain seasonal fluctuations (periodicity of 365 days) [14]. The short-term component provides an R2 of 34.1% at station WD1500, while the R2 ranges from 68.3% to 70.6% for stations WD1346 and WD1347. All stations show a range of different time-scale components affecting the Mohawk watershed, in particular in the short term.
The performance of the MLR model varies for the short-term components of the three stations. The stations WD1346 and WD1347 located at Herkimer and Little Falls area, inside the Mohawk watershed provide a similar performance of the MLR model (68.3% and 70.6%), while the performance is lower for the station WD1500 (34.1%) which is located at Schoharie Creek, inside the Schoharie watershed. The multiple linear regression model is applied in the short-term components of the variables and derives a moderate R2 (Table 1).

3.2. Assessment of the ANN Model

The ANN model has been applied to the raw data (Figure 5), yielding coefficients of determination from 53.4% to 60.7% across the locations (Table 2). The accuracy of forecasting has been improved by applying the ANNs to the decomposed time series data (long-term, seasonal, and short-term components). In particular, the performance of the long-term component ranges from 97.6% to 98% (Table 2), which is approximately 15% higher compared with the MLR model. The performance of the seasonal component using the ANN model yields a small improvement compared to the MLR model (R2 ranges from 93.1% to 95.1% for the ANN). However, the performance of the short-term component has been substantially improved using the ANN model (R2 ranges from 94.1% to 95.3%). The accuracy of the model has been improved approximately three to six times for the short-term component (depending on the station) compared with the MLR method.

3.3. Mean Square of Error (MSE) for the MLR and ANN Model

Another measure that we use to evaluate the performance of both models is the MSE. Table 3 shows the MSE estimated for the MLR and the training and testing data of the ANN. For the decomposed data of the water discharge time series of all the stations, the MSE is lower for the ANN compared with the MLR. This reveals that the ANN model provides a higher performance compared with the MLR model.
The MSE has also been plotted as a function of the coefficient of determination, R2 (Figure 6). Using the ANN model on the decomposed data (long-, seasonal-, and short-term), a lower MSE and a higher R2 have been achieved compared with the MLR model. Figure 6 also reveals a linear pattern between the MSE and the coefficient of determination for each of the components for both models (MLR and ANN), while there is no pattern between the measures for the raw data. This pattern of forecasting accuracy (lower MSE) has been achieved when the frequencies in the data have been separated (long-term, seasonal, and short-term components). Figure 6 also shows a different pattern between the MSE and the coefficient of determination for the two different watersheds (Mohawk River and Schoharie Creek). It is reasonable to assume that the accuracy of the MLR model depends on the watershed of the river. This plot can be used to identify different watersheds based on the measures of goodness of fit (MSE and R2) for the MLR model. However, such an interpretation was not apparent for the ANN model.

3.4. Total Explanation for the MLR and ANN Model

To evaluate the overall explanation of the water discharge time series at all the stations, we multiply the percent of variance of each component by the corresponding coefficient of determination (expression (9)) for the two different models (MLR and ANN). Table 4 shows the results of the MLR and ANN application to the three stations. From Table 4, it can be observed that the unexplained share of the water discharge has been reduced by up to a factor of about six using the ANN compared with the MLR models. In particular, for station WD1500, the overall explanation has increased from 69.4% to 95.1% using ANNs, so the unexplained share has fallen roughly sixfold (from 30.6% to 4.9%). Similar results are obtained for the remaining stations. The increase in the overall explanation of the model using the ANN method is attributed to the significant increase in the explanation of the short-term component. The short-term component describes short-term variations, which can be modeled with better accuracy using a non-linear method (ANN). The explanation across the stations varies from 95.1% to 95.8%, which allows a very accurate prediction of the flood events.
A graphical representation of the overall MLR and ANN models can be used to compare the raw and the predicted values of the water discharge time series at all the stations. Figure 7 shows the flooding event during the Irene storm (21 August to 28 August 2011) at West Canada Creek near Herkimer (Mohawk Watershed). Using the ANN model, the predicted values agree well with the raw data. With the MLR model, some of the flood events may be predicted, but with lower forecasting accuracy. A similar plot can be derived for station WD1500 (Figure 8), which shows the impact of the Mid-Atlantic United States flood of 2006 (25 June to 5 July 2006) at Schoharie Creek, New York. The values of the water discharge predicted by the ANN model capture all the flood events and agree well with the raw data. Previous studies have shown that separating the data into summer and winter periods may increase the performance of the MLR model [14]. This plot shows the MLR model derived from only the summer period of 2005. The flooding events may be predicted with a good fit, but the MLR model does not perform as well as the ANN model. In this study, we do not separate the data into summer and winter periods because the ANN model provides the maximum performance in the predicted values without separating the data into different periods of the year.

4. Discussion

4.1. Performance of ANN and MLR on Decomposition Data

Traditional statistical models (e.g., the multiple linear regression model) have produced comprehensive and adequate results in response to the ongoing demand for analysis and prediction as close to real time as possible [55,56,57,58]. In our study, applying the MLR and the ANN to decomposed time series substantially increases the model accuracy. Time series decomposition is essential before the application of a linear or non-linear model in order to separate the different signals in the time series. However, the multiple linear regression model shows a measurable limitation in predicting the long- and short-term components of the data (Table 1), which is most prominent for the short-term component of the water discharge at station WD1500. The prediction accuracy of the water discharge can be improved using ANNs, which have been shown to be highly efficient in predicting time series data in a range of environments [55,56,58]. Therefore, a hybrid approach combining time series analysis and artificial neural networks improves the prediction of all the components of the water discharge time series.
The MLR model had a wide range of coefficient determination (R2) based on the short-, seasonal-, and long-term data. The stations WD1346 and WD1347 located at the Herkimer and Little Falls area, inside the same watershed that is the Mohawk Watershed provide a similar performance as the MLR model on the short-term component (68.3% and 70.6%). The WD1500 is located at Schoharie Creek, in a watershed that provides a lower performance on the short term (34.1%). This performance can be justified due to the naive nature of the model which does not use any previous knowledge (e.g., case-based reasoning) or feed outputs to inputs (recurrent neural networks) to enhance its training and improve its outcomes. The superior performance of the ANN on the short-term time data can be explained because a clear pattern can be identified from the decomposed data and, thus, the training can make the model substantially robust and accurate to its predictions. The short-term data are “cleaned” enough from the long-term and seasonal-term variations enabling the ANNs application for an effective non-linear approach for the prediction of the short-term component of the water discharge time series with better accuracy than the multiple linear regression model. ANNs stands above MLR in performance in the short-term component, while the MLR determines a weak R2 for the short-term component implying the existence of an unclear linear relationship.
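The two goodness-of-fit measures used throughout this comparison can be sketched as follows. This is a minimal illustration using the standard definitions of the MSE and R2; the function names and the sample values are hypothetical and are not taken from the study.

```python
def mse(observed, predicted):
    """Mean square error between observed and predicted values."""
    n = len(observed)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical values standing in for one decomposed component.
observed = [1.0, 2.0, 3.0, 4.0]
perfect = list(observed)        # a model that reproduces the data exactly
mean_only = [2.5] * 4           # a model that only predicts the mean

r2_perfect = r_squared(observed, perfect)
r2_mean_only = r_squared(observed, mean_only)
```

A model that only predicts the mean of the observations scores R2 = 0, which gives a sense of scale for the 34.1% reported for the short-term component at WD1500. The tables report R2 as a percentage, so the value returned here would be multiplied by 100.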

4.2. Comparison of MLR and ANN Performance on the Physical Interpretation

This study has not only presented the robustness of ANN models in river flood forecasting but also assessed their superiority to multiple regression models. To establish the actual merits of ANNs relative to conventional statistical methods (i.e., the multiple linear regression model), we compared the performance of the ANN model and the MLR on each component. Both models share the need for decomposition of the time series, which physically eliminates the interference between different frequencies in the time series. This paper presents a combination of time series decomposition and ANNs that offers more advantages than the combination of time series decomposition and the multiple linear regression model. The results of this study can be applied to similar watersheds for river flood prediction.
It is well known that stochastic optimization does not provide identical results on each run. For this reason, studying a combination of models, such as an ANN and an MLR, provides more accurate prediction and comparisons. In our study, keeping the seed constant in the ANN gave consistent results for the MSE and R2. However, with stochastic processes such as ANN training, the weights or the importance of the variables may easily change between runs while the R2 and MSE remain the same, which may subsequently lead to wrong interpretations of the model. A linear approach expresses a cause–effect relationship between the dependent and the independent variables that does not allow such loose constraints; although it may give poorer goodness-of-fit measures (R2 and MSE) than the ANN, the linear approach is more reliable in the physical interpretation of the prediction model.
The long-term component of the water discharge of the WD1500 station at Schoharie Creek dominates in comparison to the short-term component, while the WD1346 and WD1347 stations located in the Mohawk Watershed show very similar long-term and short-term components. The MLR may provide a physical interpretation, perhaps by distinguishing watersheds whose performance differs between the time scale components, whereas the ANN performs uniformly well regardless of the watershed. The total explanation using the MLR model reveals the importance of each time scale component. Each component in the water discharge time series may affect the Mohawk Watershed and the subsequent hydrologic cycle.
The MLR model’s coefficient of determination (R2) for the water discharge time series varies between the time series components and, in particular, with the location of the water discharge stations. All stations, regardless of their location, indicate similar values of the linear relationship in the seasonal and the long-term components. The short-term component shows a broader range of values across the three stations (Figure 9). In particular, the short-term value is lower for the station located at Schoharie Creek, whereas the stations located in the Mohawk Watershed show higher performance in the short term. The WD1346 station measures the water discharge of the East Canada Creek (ECC) and is located 6.6 km away from the Mohawk River following the ECC meander path, and the WD1347 station measures the Mohawk River. The WD1500 station, located in the Schoharie Watershed, is at a significant distance from the Mohawk River, approximately 24 km following the Schoharie Creek meander path. We think that this contrast in curvature length may control the way that a river recharges or discharges over a short time. In the long term, the curvature length may not affect the discharge rate, as reflected in the uniformly high performance of the MLR model across the three stations. It is reasonable to assume that at stations located closer to Schoharie Creek, such as the WD1500, the series may be affected by the properties or use of the river and the watershed, such as permeability; agricultural, residential, or business activities; and the associated water usage or supply needs. Also, we notice that the long-term components seem to dominate in watersheds with longer river meanders and less curvature, while the short-term components are mostly affected by smaller watersheds and shorter river meanders with mature curvature.
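The role of the seed can be illustrated with a toy network. The sketch below is not the ANN used in this study: it is a hypothetical one-input network with a hyperbolic tangent hidden layer, trained by plain batch gradient descent on synthetic data, and every name and parameter in it is an assumption made for illustration.

```python
import math
import random


def forward(x, w1, b1, w2, b2):
    """Return the hidden activations and the network output for one input."""
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    return hidden, sum(w * h for w, h in zip(w2, hidden)) + b2


def train_tiny_ann(xs, ys, seed, hidden=3, epochs=300, lr=0.05):
    """Train the toy network; return (list of final weights, training MSE)."""
    rng = random.Random(seed)  # the only source of randomness in the run
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    b1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    b2 = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw1, gb1, gw2, gb2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        for x, y in zip(xs, ys):
            h, pred = forward(x, w1, b1, w2, b2)
            err = pred - y  # gradient of half the squared error
            gb2 += err
            for j in range(hidden):
                gw2[j] += err * h[j]
                back = err * w2[j] * (1.0 - h[j] ** 2)  # tanh derivative
                gw1[j] += back * x
                gb1[j] += back
        for j in range(hidden):
            w1[j] -= lr * gw1[j] / n
            b1[j] -= lr * gb1[j] / n
            w2[j] -= lr * gw2[j] / n
        b2 -= lr * gb2 / n
    error = sum((forward(x, w1, b1, w2, b2)[1] - y) ** 2 for x, y in zip(xs, ys)) / n
    return w1 + b1 + w2 + [b2], error


# Synthetic data, for illustration only.
xs = [i / 10.0 - 1.0 for i in range(21)]
ys = [math.sin(3.0 * x) for x in xs]

run_a = train_tiny_ann(xs, ys, seed=1)
run_a_again = train_tiny_ann(xs, ys, seed=1)
run_b = train_tiny_ann(xs, ys, seed=2)
```

With the same seed, the two runs return identical weights and MSE. A different seed leads to a different set of weights, which is why the weights of a trained ANN are a less stable basis for physical interpretation than the coefficients of a linear model.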
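The total explanation values in Table 4 appear to be consistent with weighting the R2 of each component (Table 1) by the share of the variance carried by that component. This reading is inferred from the tabulated values rather than stated in this section, and the function name below is hypothetical. A short check for the MLR model:

```python
# Share of the total variance per component (long, seasonal, short), in %,
# copied from the "Variance %" block of Table 4.
variance_share = {
    "WD1346": (23.5, 37.1, 39.4),
    "WD1347": (23.4, 36.1, 40.5),
    "WD1500": (53.1, 16.2, 30.7),
}

# MLR coefficient of determination per component, in %, copied from Table 1.
mlr_r2 = {
    "WD1346": (73.5, 93.1, 70.6),
    "WD1347": (73.6, 92.1, 68.3),
    "WD1500": (83.3, 91.2, 34.1),
}


def total_explanation(shares, r2_values):
    """Sum of (variance share x R2) over the components, in %."""
    return sum(share * r2 / 100.0 for share, r2 in zip(shares, r2_values))


mlr_totals = {
    station: total_explanation(variance_share[station], mlr_r2[station])
    for station in variance_share
}
```

The computed totals agree with the 79.6%, 78.1%, and 69.4% reported in Table 4 to within rounding of the tabulated inputs. For WD1500, the short-term component carries 30.7% of the variance but contributes only about 10.5 percentage points, which is where most of the gap to the ANN arises.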
Another possible explanation for the watersheds’ performance difference in the short term is the characteristics of local climatic variations at the drainage basin including but not limited to the precipitation, temperature, and local winds.
Even though there might be local climatic variations in the Schoharie Watershed, mainly due to a mountainous landscape, upstate New York is the primary source of water supply for New York State. The operation of reservoirs and dams is necessary for sustainable water supply and flood control. Examples include the reservoir at Delta Dam on the Mohawk River above Rome, which supplies water to the Erie Canal; the Hinckley Reservoir on West Canada Creek; and the Schoharie Reservoir at the Gilboa Dam on Schoharie Creek, which supplies water to New York City. The operation of the reservoirs may introduce interferences in the water discharge time series and affect the multi-scale time series components, especially the short-term variations. The Schoharie Reservoir is one of the two reservoirs in the Catskill system, and its watershed drains the mountainous Catskills area. The Schoharie Watershed behaves as a distinct basin, with dimensions and a storage capacity ratio different from those of other watersheds such as the Mohawk Valley Watershed, because its reservoir is very small and may allow a rapid watershed recharge or discharge [55]. Therefore, the short-term component is expected to be affected by local climatic variations or by human contributions such as the reservoir operation that provides the public water supply for New York City.

5. Conclusions

The hybrid model that combines time series decomposition and ANNs has been shown to be an efficient model for flood hazard forecasting, explaining up to 96% of the water discharge time series. The combination of the KZ filter and the ANN approach allows us to eliminate the short-term variations on the decomposed data and take advantage of a non-linear approach. In particular, the decomposition of the water discharge time series from different watersheds substantially diminished interferences and improved the ANN’s performance. This methodology allows us to study similar patterns and frequencies in the same component and maximizes the performance and physical interpretation of the forecasting model. The results demonstrate that the ANN provides a more efficient prediction model than the MLR model. However, the MLR model allows physical interpretation of the decomposed data. The Mohawk Watershed is affected by physical phenomena that depend on multiple time scale components, such as the long-term or short-term variations of the water discharge time series. Local infrastructure and control of the water supply through dams and intermittent operation of reservoirs may affect the short-term water discharge variations. Depending on the drainage area, the short-term water discharge variations may be prominent in the MLR prediction model.
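The decomposition step can be sketched as follows. A Kolmogorov–Zurbenko (KZ) filter is an iterated moving average; the window lengths and iteration counts used in the usage lines are placeholders and not the parameters of this study, and the handling of the series ends (averaging over the values available inside the window) is one common choice assumed here.

```python
def moving_average(series, window):
    """Centered moving average; windows are truncated at the series ends."""
    half = window // 2
    n = len(series)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed


def kz_filter(series, window, iterations):
    """KZ low pass filter: apply a moving average of odd length repeatedly."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that the average is centered")
    result = list(series)
    for _ in range(iterations):
        result = moving_average(result, window)
    return result


def decompose(series, long_params, seasonal_params):
    """Split a series into long-term, seasonal, and short-term components.

    long_params and seasonal_params are (window, iterations) pairs, with the
    long-term window wider than the seasonal one.
    """
    long_term = kz_filter(series, *long_params)
    smooth = kz_filter(series, *seasonal_params)  # long-term plus seasonal
    seasonal = [s - l for s, l in zip(smooth, long_term)]
    short_term = [x - s for x, s in zip(series, smooth)]
    return long_term, seasonal, short_term


# Hypothetical series and placeholder parameters, for illustration only.
raw = [float((i * 7) % 11) + 0.1 * i for i in range(60)]
long_term, seasonal, short_term = decompose(raw, (15, 3), (5, 3))
```

By construction the three components add back to the raw series, so each model (MLR or ANN) can be fitted to one component at a time and the component predictions can then be summed.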

Author Contributions

K.T., A.M. and S.K. are responsible for the conceptualization, the methodology, the application of software in the data, the formal analysis of the data, the visualization of the results, and the writing, review and editing of different parts of the manuscript.

Funding

This research received no external funding.

Acknowledgments

We would like to acknowledge the anonymous reviewers for their careful reading of our manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Svetlana, D.; Radovan, D.; Ján, D. The economic impact of floods and their importance in different regions of the world with emphasis on Europe. Procedia Econ. Financ. 2015, 34, 649–655.
  2. Berz, G. Flood disasters: Lessons from the past—Worries for the future. Proc. Inst. Civ. Eng.-Water Marit. Eng. 2000, 142, 3–8.
  3. Jha, A.K.; Bloch, R.; Lamond, J. Cities and Flooding: A Guide to Integrated Urban Flood Risk Management for the 21st Century; The World Bank: Washington, DC, USA, 2012.
  4. Annual Disaster Statistical Review 2016: The Numbers and Trends; Centre for Research on the Epidemiology of Disasters (CRED), Institute of Health and Society (IRSS): Université catholique de Louvain, Brussels, Belgium, 2017; p. 91.
  5. Arnell, N.W. Climate change and global water resources. Glob. Environ. Chang. 1999, 9, 31–49.
  6. Arora, V.K.; Boer, G.J. Effects of simulated climate change on the hydrology of major river basins. J. Geophys. Res. Atmos. 2001, 106, 3335–3348.
  7. Milly, P.C.D.; Wetherald, R.T.; Dunne, K.A.; Delworth, T.L. Increasing risk of great floods in a changing climate. Nature 2002, 415, 514–517.
  8. Kleinen, T.; Petschel-Held, G. Integrated assessment of changes in flooding probabilities due to climate change. Clim. Chang. 2007, 81, 283–312.
  9. Damle, C.; Yalcin, A. Flood prediction using Time Series Data Mining. J. Hydrol. 2007, 333, 305–316.
  10. Pariset, E.; Hausser, R.; Gagnon, A. Formation of Ice Covers and Ice Jams in Rivers. J. Hydraul. Div. 1966, 92, 1–24.
  11. Garver, J.I.; Cockburn, J.M.H. A historical Perspective of Ice Jams on the Lower Mohawk River. In Proceedings of the Mohawk Watershed Symposium, Schenectady, NY, USA, 27 March 2009; pp. 25–29.
  12. Marsellos, A.E.; Garver, J.I.; Cockburn, J.M.H. Flood Map and Volumetric Calculation of the January 25th Ice-jam Flood Using LiDAR and GIS. In Proceedings of the Mohawk Watershed Symposium, Schenectady, NY, USA, 19 March 2010; pp. 23–27.
  13. Foster, J.A.; Marsellos, A.E.; Garver, J.I. Predicting Trigger Level for Ice jam Flooding of the Lower Mohawk River Using LiDAR and GIS. In Proceedings of the Mohawk Symposium, Schenectady, NY, USA, 18 March 2011; pp. 13–15.
  14. Tsakiri, K.G.; Marsellos, A.E.; Zurbenko, I.G. An efficient prediction model for water discharge in Schoharie Creek, NY. J. Climatol. 2014.
  15. Plate, E.J.; Shahzad, K.M. Uncertainty analysis of multi-model flood forecasts. Water 2015, 7, 6788–6809.
  16. Lewis, A.; Weaver, E.; Dorward, E.; Marsellos, A.E. Two Methods for Determining the Extent of Flooding During Hurricane Irene in Schenectady, NY. In Proceedings of the Mohawk Watershed Symposium, Schenectady, NY, USA, 18 March 2016; pp. 21–23.
  17. Mahoney, L.; Roscoe, S.L.; Marsellos, A.E. Reconstruction and flood simulation using GIS and Google Earth to Determine the Extent and Damage of the January 14–15th 2018 Ice Jams on the Mohawk River in Schenectady, New York. In Proceedings of the Mohawk Watershed Symposium, Schenectady, NY, USA, 23 March 2018; pp. 39–42.
  18. Yang, W.; Zurbenko, I. Nonstationarity. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 107–115.
  19. Rao, S.T.; Zurbenko, I.G.; Neagu, R.; Porter, P.S.; Ku, J.Y.; Henry, R.F. Space and time scales in ambient ozone data. Bull. Am. Meteorol. Soc. 1997, 10, 2153–2166.
  20. Yakowitz, S. Markov flow models and the flood warning problem. Water Resour. Res. 1985, 21, 81–88.
  21. Moore, R.J.; Jones, D.A.; Black, K.B.; Austin, R.M.; Carrington, D.S.; Tinnion, M.; Akhondi, A. RFFS and HYRAD: Integrated System for Rainfall and River Flow Forecasting in Real-Time and Their Application in Yorkshire; BHS National Meeting: Newcastle upon Tyne, UK, 1994.
  22. Box, G.E.P.; Jenkins, G.M. Time Series Analysis: Forecasting and Control. Available online: http://garfield.library.upenn.edu/classics1989/A1989AV48500001.pdf (accessed on 22 August 2018).
  23. Elsafi, S.H. Artificial Neural Networks (ANNs) for flood forecasting at Dongola station in River Nile Sudan. Alex. Eng. J. 2014, 53, 655–662.
  24. Bronstert, A. River flooding in Germany: Influenced by climate change? Phys. Chem. Earth 1995, 20, 445–450.
  25. Hartmann, H.; Andresky, L. Flooding in the Indus river basin—A spatiotemporal analysis of precipitation records. Glob. Planet. Chang. 2013, 107, 25–35.
  26. Kotlarski, S.; Hagemann, S.; Krahe, P.; Podzun, R.; Jacob, D. The Elbe river flooding 2002 as seen by an extended regional climate model. J. Hydrol. 2012, 472, 169–183.
  27. Zurbenko, I.G.; Sowizral, M. Resolution of the destructive effect of noise on linear regression of two time series. Far East J. Theor. Stat. 1999, 3, 139–157.
  28. Tsakiri, K.G.; Zurbenko, I.G. Effect of noise in principal component analysis. J. Stat. Math. 2011, 2, 40.
  29. Zurbenko, I.G. The Spectral Analysis of Time Series; Amsterdam North Holland Series in Statistics and Probability; Elsevier North-Holland: New York, NY, USA, 1986.
  30. Tsakiri, K.; Zurbenko, I.G. Explanation of fluctuations in water use time series. Environ. Ecol. Stat. 2013, 20, 399–412.
  31. Tsakiri, K.G.; Zurbenko, I.G. Prediction of ozone concentrations using atmospheric variables. Air Qual. Atmos. Health 2011, 4, 111–120.
  32. Eskridge, R.E.; Ku, J.U.; Porter, P.S.; Rao, S.T.; Zurbenko, I. Separating Different Scales of Motion in Time Series of Meteorological Variables. Bull. Am. Meteorol. Soc. 1997, 78, 1473–1483.
  33. Ward, D. Guidelines for the reporting of different forms of data. Data and Metadata Reporting and Presentation Handbook; OECD Publishing: Paris, France, 2007.
  34. Wikipedia. Kolmogorov-Zurbenko Filters. Available online: http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Zurbenko_filter (accessed on 2 February 2018).
  35. Yang, W.; Zurbenko, I. Kolmogorov-Zurbenko filters. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 340–351.
  36. Close, B.; Zurbenko, I. Kolmogorov-Zurbenko Adaptive Filters. Available online: http://cran.r-project.org/web/packages/kza/index.html (accessed on 22 August 2018).
  37. Chiari, F.; Delhom, M.; Santucci, J.F.; Filippi, J.B. Prediction of the hydrologic behavior of a watershed using artificial neural networks and geographic information systems. IEEE Syst. Man Cybern. 2000, 1, 382–386.
  38. Campolo, M.; Andreussi, P.; Soldati, A. River flood forecasting with a neural network model. Water Resour. Res. 1999, 35, 1191–1197.
  39. Xiong, L.; Shamseldin, A.Y.; O’connor, K.M. A non-linear combination of the forecasts of rainfall-runoff models by the first-order Takagi–Sugeno fuzzy system. J. Hydrol. 2001, 245, 196–217.
  40. Smith, J.; Eli, R.N. Neural-network models of rainfall-runoff process. J. Water Resour. Plann. Manag. 1995, 121, 499–508.
  41. Lee, S.Y.; Chung, S.K.; Oh, J.K. Demand forecasting based infrastructure asset management. KSCE J. Civ. Eng. 2004, 8, 165–172.
  42. Kim, T.W. Monthly precipitation forecasting using rescaling errors. KSCE J. Civ. Eng. 2006, 10, 137–143.
  43. Şahin, M.; Kaya, Y.; Uyar, M. Comparison of ANN and MLR models for estimating solar radiation in Turkey using NOAA/AVHRR data. Adv. Space Res. 2013, 51, 891–904.
  44. Luxhøj, J.T.; Riis, J.O.; Stensballe, B. A hybrid econometric—Neural network modeling approach for sales forecasting. Int. J. Prod. Econ. 1996, 43, 175–192.
  45. Theodosiou, M. Disaggregation & aggregation of time series components: A hybrid forecasting approach using generalized regression neural networks and the theta method. Neurocomputing 2011, 74, 896–905.
  46. Govindoraju, R.S.; Rao, A.R. Artificial Neural Networks in Hydrology. Available online: https://books.google.com.hk/books?hl=zh-TW&lr=&id=5-7qCAAAQBAJ&oi=fnd&pg=PR11&dq=Artificial+neural+networks+in+hydrology&ots=dIdLPN7xl0&sig=aLKmodH9KCSnVjOk0_D2P2rFP4U&redir_esc=y#v=onepage&q=Artificial%20neural%20networks%20in%20hydrology&f=false (accessed on 22 August 2018).
  47. Kişi, Ö. A combined generalized regression neural network wavelet model for monthly streamflow prediction. KSCE J. Civ. Eng. 2011, 15, 1469–1479.
  48. Li, Y.X.; Jiang, L.C. Application of ANN algorithm in tree height modeling. Appl. Mech. Mater. 2010, 20, 756–761.
  49. Gaume, E.; Gosset, R. Over-parameterisation, a major obstacle to the use of artificial neural networks in hydrology? Hydrol. Earth Syst. Sci. 2003, 7, 693–706.
  50. Imrie, C.E.; Durucan, S.; Korre, A. River flow prediction using artificial neural networks: Generalisation beyond the calibration range. J. Hydrol. 2000, 233, 138–153.
  51. Mohawk River Watershed Coalition (MRWC). Mohawk River Watershed Management Plan; Department of State: New York, NY, USA, March 2015; p. 131.
  52. Helsel, D.R.; Hirsch, R.M. Techniques of Water-Resources Investigations of the U. S. Geological Survey Book 4, Hydrologic Analysis and Interpretation, Chapter A3. Statistical Methods in Water Resources. Available online: https://pubs.usgs.gov/twri/twri4a3/pdf/twri4a3-new.pdf (accessed on 22 August 2018).
  53. Haykin, S. Neural Networks: A Comprehensive Foundation. Available online: https://www.amazon.com/Neural-Networks-Comprehensive-Foundation-2nd/dp/0132733501 (accessed on 22 August 2018).
  54. Iqbal, M. An Introduction to Solar Radiation; Academic Press: Toronto, ON, Canada, 1986; pp. 215–234.
  55. Joint Venture of Gannett Flemming and Hazen and Sawyer, prepared for NYCDEP. In Phase 1 Final Report, Catskill Turbidity Control Study; Bureau of Water Supply: Flushing, NY, USA, 2004.
  56. Minocha, V.K. Discussion of “comparative analysis of event-based rainfall-runoff modeling techniques—Deterministic, statistical, and artificial neural networks” by Ashu Jain and S. K. V. Prasad Indurthy. J. Hydrol. Eng. 2004, 9, 550–551.
  57. Sveinsson, O.G.; Lall, U.; Fortin, V.; Perrault, L.; Gaudet, J.; Zebiak, S.; Kushnir, Y. Forecasting spring reservoir inflows in Churchill Falls basin in Quebec, Canada. J. Hydrol. Eng. 2008, 13, 426–437.
  58. Magar, R.B.; Jothiprakash, V. Intermittent reservoir daily-inflow prediction using lumped and distributed data multi-linear regression models. J. Earth Syst. Sci. 2011, 120, 1067–1084.
Figure 1. The map shows the location of the weather station, the USGS groundwater station (GWL4001), the two USGS water discharge stations (WD1346 and WD1347) located at the Mohawk River (MR) area of the Mohawk watershed, the third station (WD1500) located at the Schoharie Creek (S) of the Schoharie watershed, and the station of the tides in Albany, NY (ECC—East Canada Creek; MR—Mohawk River; S—Schoharie Creek; WCC—West Canada Creek; NOAA—National Oceanic and Atmospheric Administration; USGS—United States Geological Survey). The geographic information system (GIS) map was composed in ArcMap utilizing data from the New York State clearinghouse and the New York geographic information gateway.
Figure 2. The logarithmic water discharge time series of the USGS stations WD1346 (USGS 01346000), WD1347 (USGS 01347000), and WD1500 (USGS 01351500) measured from 1 January 2005 to 23 February 2013.
Figure 3. The multilayer perceptron (MLP) feedforward classifier structure with vector inputs represented as x1, x2, …, xn and weights as ω1, ω2, …, ωn; b is the bias and φ is the transfer function; υ refers to the weighted sum of the input.
Figure 4. The flowchart shows the hybrid model that consists of the data processing and the integration of the decomposition into the multiple linear regression (MLR) or artificial neural network (ANN) application.
Figure 5. The feedforward artificial neural network of the raw data from the USGS station WD1346 using a hidden layer transfer function of hyperbolic tangent and an output layer for the prediction of water discharge.
Figure 6. The MSE as a function of the coefficient of determination (R2) for the raw and the decomposed time series data using the MLR and the training ANN model.
Figure 7. The predicted MLR and ANN values plotted against the raw values of the water discharge time series for the station WD1346. Significant peaks reflect the flooding event during the Irene storm (21 August to 28 August 2011) in West Canada Creek at Herkimer. The water discharge gage station is located in the Mohawk watershed.
Figure 8. The predicted MLR and ANN values plotted against the raw values of the water discharge time series for the station WD1500. Significant peaks reflect the impact of the Mid-Atlantic United States flood of 2006 (25 June to 5 July 2006) in Schoharie Creek. The water discharge gage station is located in the Schoharie watershed.
Figure 9. The coefficient of determination (R2) of the different time series components for the three USGS water discharge monitoring stations (WD1346, WD1347, WD1500) from the multiple linear regression application.
Table 1. Coefficient of determination (R2, %) of the raw and decomposed data for the explanation of the water discharge time series using the MLR model.

Station   Raw    Long   Seasonal   Short
WD1346    48.7   73.5   93.1       70.6
WD1347    47.6   73.6   92.1       68.3
WD1500    59.0   83.3   91.2       34.1

Table 2. Coefficient of determination (R2, %) of the raw and decomposed data for the explanation of the water discharge time series using the ANN model.

Station   Raw    Long   Seasonal   Short
WD1346    53.4   97.6   93.1       94.9
WD1347    55.4   97.6   95.1       95.3
WD1500    60.7   98.0   94.5       94.1

Table 3. The mean square error (MSE) of the raw and decomposed data for the three water discharge stations. Training and testing data refer to the ANN (TR = training; TE = testing; MLR = multiple linear regression).

Station   Model   Raw     Long    Seasonal   Short
WD1346    TR      0.237   0.018   0.036      0.012
WD1346    TE      0.246   0.017   0.039      0.014
WD1346    MLR     0.190   0.100   0.055      0.126
WD1347    TR      0.224   0.012   0.025      0.011
WD1347    TE      0.234   0.012   0.030      0.015
WD1347    MLR     0.214   0.110   0.063      0.141
WD1500    TR      0.196   0.010   0.027      0.008
WD1500    TE      0.222   0.011   0.031      0.013
WD1500    MLR     0.429   0.205   0.100      0.274

Table 4. Summary of the explanation in the water discharge time series for all the stations.

Station   Long   Seasonal   Short   Total Explanation
Variance %
WD1346    23.5   37.1       39.4
WD1347    23.4   36.1       40.5
WD1500    53.1   16.2       30.7
MLR Explanation %
WD1346    17.3   34.5       27.8    79.6
WD1347    17.2   33.2       27.7    78.1
WD1500    44.1   14.8       10.5    69.4
ANN Explanation %
WD1346    23.0   35.4       37.4    95.8
WD1347    22.8   34.5       38.6    95.8
WD1500    51.0   15.2       28.9    95.1
