High-Resolution Flow and Phosphorus Forecasting Using ANN Models, Catering for Extremes in the Case of the River Swale (UK)

Timis, Elisabeta Cristina; Hangan, Horia; Cristea, Vasile Mircea; Mihaly, Norbert Botond; Hutchins, Michael George

doi:10.3390/hydrology12020020


by Elisabeta Cristina Timis ¹,*, Horia Hangan ¹, Vasile Mircea Cristea ¹, Norbert Botond Mihaly ¹,* and Michael George Hutchins ²

¹ Department of Chemical Engineering, Computer Aided Process Engineering Research Centre, “Babes-Bolyai” University, Cluj-Napoca, 11 Arany Janos, 400028 Cluj, Romania

² UK Centre for Ecology and Hydrology Wallingford, Oxford OX10 8BB, UK

* Authors to whom correspondence should be addressed.

Hydrology 2025, 12(2), 20; https://doi.org/10.3390/hydrology12020020

Submission received: 21 November 2024 / Revised: 14 January 2025 / Accepted: 15 January 2025 / Published: 21 January 2025

(This article belongs to the Special Issue Hydrodynamics and Water Quality of Rivers and Lakes)


Abstract

The forecasting of river flows and pollutant concentrations is essential in supporting mitigation measures for anthropogenic and climate change effects on rivers and their environment. This paper addresses two aspects receiving little attention in the literature: high-resolution (sub-daily) data-driven modeling and the prediction of phosphorus compounds. It presents a series of artificial neural networks (ANNs) to forecast flows and the concentrations of soluble reactive phosphorus (SRP) and total phosphorus (TP) under a wide range of conditions, including low flows and storm events (0.74 to 484 m³/s). Results show accurate forecasts along a stretch of the River Swale (UK) with a forecast horizon of up to 15 h, at resolutions of up to 3 h. The concentration prediction is improved compared to a previous application of an advection–dispersion model.

Keywords: pollutant transport forecast; hydrological model; artificial neural networks; river flow forecast; in-river phosphorus model; high-resolution model

1. Introduction

Floods, droughts, and water contamination are major events (escalating into disasters on many occasions) that humankind faces in the context of growing pressures on nature, together with more industrialization, climate change, and growing urbanization [1,2]. Public health and security are threatened under these circumstances. Therefore, effective forecasting (here, also termed prediction) and early warning systems are very important for the prevention and management of floods and water pollution [3,4], as they offer stakeholders a time window to prepare mitigation measures and grounds to develop additional support tools [5,6,7]. Forecasting tools may be based on conventional models [8] and artificial intelligence (AI)-based models (data-driven approach) used separately or in combination [3,6,9]. Conventional modeling requires significant effort in four main directions: (1) the understanding of phenomena; (2) their mathematical description or the selection of an appropriate off-the-shelf model which may need significant adjustment; (3) the gathering of large amounts of field data for calibration and validation; and (4) significant computational resources and workforce. On the other hand, AI models, despite demanding a significant amount of field data (e.g., in some cases, historical data may be needed), require much less effort in the first two directions and are very promising [10,11,12]. They are adaptable and accurate in handling complex, large datasets [9,13] and can be more accurate [9] and reliable [14] than conventional models, especially in cases where the description of phenomena is very difficult to achieve [4,15].

Over the years, AI techniques have significantly advanced the modeling of river water flows and concentrations [13,16,17]. However, modeling at a high time resolution (e.g., hourly) is implemented less often [13,18,19,20] than applications with a coarser (daily to yearly) resolution [1,15,21,22,23,24,25], despite sub-daily simulation being essential for representing short-duration storms and flash events [26,27]. Several aspects point to areas open to further development. First, models involving hydrological and quality variables (e.g., chemical parameters) together are less prominent compared to “pure” hydraulic or hydrological models [15,28,29]. Second, in the context of a very steep increase in machine learning applications (including artificial neural networks (ANNs)) in the water quality field (from 310 in 2000 to 3444 in 2020 [17]), phosphorus compounds still receive little attention [16,17,30] compared to other indicators, such as dissolved oxygen (DO [31,32]), biochemical oxygen demand (BOD [32,33]), total dissolved solids (TDS [34,35,36]), nitrates (NO₃⁻ [30]), electrical conductivity (EC [34,36]), or pH [35,37]. Moreover, models tend to include a single phosphorus compound type [32,38,39,40,41,42] rather than fully accounting for different species [30]. Third, there is a clear need for measures to alleviate the four identified drawbacks of ANNs stated in Table 1 [4,9,17,43,44,45]. Drawbacks one and three are addressed in this research via a systematic methodology for data handling and ANN development.

This paper is focused on the development of ANN-based models capable of forecasting the dynamics of water flow and pollutant concentrations (soluble reactive phosphorus (SRP) and total phosphorus (TP)) in a downstream location based on data from an upstream location (and tributaries in certain cases), at resolutions of up to 3 h. A 54 km stretch of the River Swale (part of the Ouse River catchment, United Kingdom (UK)) is used as a case study. An earlier study of the Ouse [46], forecasting depth and water flows with a horizon (lead time) of 6 h with the help of ANNs, concluded that longer time horizons would be required for real-time practical applications in flood management warning systems. Longer lead times (up to 20 days) have been achieved only with monthly [47] and daily resolution models [48], while sub-daily models have achieved only a few hours of lead time [26,49], and it has been observed that increasing the lead time decreases prediction performance. To this, it is important to add the estimation of inflows to water reservoirs [50,51] and the need for a forecasting tool to handle a wide range of water flows, including extremes (very low drought flows to flood flows) which, for most models or from a data analyst’s point of view, may be considered outliers [1].

The objective of this study is to offer reliable ANN-based models for the prediction of water flow, SRP concentrations, and TP concentrations in ordinary conditions, during storms, or in the case of uncontrolled pollution events. To cover a wide range of conditions, the ANNs for water flow feature forecast horizons of 1 h, 6 h, 8 h, 9 h, 12 h, and 15 h, while the ANNs for water flow, SRP, and TP concentrations together feature a forecast horizon of 15 h. The models use easily attainable field data (water flow, temperature, and SRP and TP concentrations) and an indicator of seasonality. Therefore, the use of additional parameters (such as rainfall) is out of the scope of this paper, as the interest is to offer forecasts using easily attainable parameters from the river only to facilitate easier model use for stakeholders. The model’s reliability is tested by using independent data and assessing the prediction performance via the visual inspection of figures and the following numerical indices: Nash–Sutcliffe efficiency (NSE) [52], Kling–Gupta efficiency (KGE) [53], modified peak-flow criteria (PFC), and modified low-flow criteria (LFC) [49].

This paper’s novel contributions are as follows: (a) the modeling of phosphorus compounds; (b) the use of high-resolution measurements (between 15 min and 3 h for water flow and 3 h for phosphorus compounds), which are less prominent in the literature compared to lower resolutions (daily and above); (c) an increased forecasting horizon (also termed lead time in the literature) of up to 15 h (we believe that this has not been previously achieved by other high-time-resolution data-driven models) for sub-daily ANNs that can cover extreme events as well as base flows; (d) the improved performance of water flow and phosphorus species predictions compared to existing models for the same river stretch, demonstrating the benefits of ANNs; and (e) a systematic methodology acting as a blueprint for wider applications.

2. Materials and Methods

2.1. A Description of the River Stretch and the Field Data

The river stretch involved in this research (see Figure 1) is the lower part of the River Swale (North Yorkshire, northeast England), flowing from Catterick (the upstream end, M1) to Crakehill (the downstream end, M2).

Catterick is situated in a piedmont zone (of the Pennine hills) from where the river flows through the valley of Swaledale, draining the North Yorkshire Dales and the Vale of York. The Swale catchment is part of the greater Ouse catchment, as the River Swale merges with the River Ure (10 km downstream of Crakehill) forming the River Ouse. The catchment geology is diverse, consisting mainly of limestones, sandstones (particularly in the lower area), and shales, with significant superficial deposits (mainly boulder clay). The headwaters feature moorland and grassland, while the Vale of York is largely covered by arable land [54]. The investigated stretch is situated on Triassic new red sandstone covered with coarse gravels and boulders in the upper part and human-made floor downwards [55,56]. It is influenced by 3 major tributaries (Bedale Beck, Wiske, and Cod Beck, denoted by T1, T2, and T3 in Figure 1) discharging significant amounts of phosphorus compounds. Wiske and Cod Beck alone are responsible for 78% of the SRP in the River Swale [57]. Other significant influences are the 15 minor tributaries [58], point pollution sources (e.g., 3 sewage treatment works and a quarry discharging directly into the main channel), activities in populated areas (e.g., Catterick), diffuse pollution sources (farming, agriculture, and leisure activities), and several water abstractions [55,59].

This lowland area of the Swale catchment receives rainfall of up to 1300 mm/year [56], with an average annual rainfall of 1123 mm in the upper zone and 835 mm in the lower zone (Table 2). Historical daily data show average flows of 12.9 m³/s in Catterick and of 20.2 m³/s in Crakehill and a pattern within the year (applicable to the three main tributaries as well): lower values during summer months (especially July and August) and higher values in winter (especially December and January) [54,60,61,62,63].

In the NE English region in which the Swale is located, there have been long-term positive trends in the peak river flow. There is much spatio-temporal variability in these trends in England. It is known that 48% of basins in northeast England show increases in their annual maximum flow (AMAX) and 6% significant decreases in periods of at least 30 years prior to 2017 [67]. Long-term trends in the Q90 (90% exceedance) low flow and Q10 peak flow have been identified, while in the Swale, there have been significant increases across 1985–2014 in both low- and high-flow statistics [68].

The River Swale is in an area of a high mean annual runoff (471 mm) and is a fast-responding river with a low base-flow index (0.48) at the downstream end of the stretch [54,60]. There has been significant overbank flooding downstream (e.g., in York) in recent years. For this downstream stretch, especially detailed flood records with long-term trend analysis are available, and there is much concern surrounding heightened risk in the future [54,60,69]. Developing early warning systems is especially important for such cases. During the investigated period (January 1994 to February 2000), the water flows exhibited an average of 20.07 m³/s with a standard deviation of 26.73 m³/s. Values were generally lower during spring, summer, and the beginning of autumn (exhibiting a minimum of 0.74 m³/s in October 1995) compared to the higher mid–late autumn and winter water flows (exhibiting a maximum of 484 m³/s in January 1994). Such a broad flow range allows the trained ANNs to cope with extreme conditions as well as with base water flows.

The field data employed in this research are of two kinds: (1) the regular monitoring of the water flow at M1 and M2, with an average time resolution of 15 min (January 1994 to December 1996); and (2) special monitoring campaigns (carried out under low-flow, usual-flow, and storm conditions) with data comprising water flow and concentrations of SRP and TP at M1, at M2, and in three main tributaries (T1, T2, and T3) with resolutions up to 3 h (15 min, 1 h, and 3 h, depending on the campaign). These intensive campaigns were organized at multiple locations within the catchment between February 1996 and February 2000. Such detailed campaigns have not been repeated more recently. Out of the extensive datasets, we employed 107,881 samples for water flow, 2445 samples for SRP, and 2445 samples for TP.

2.2. Building the ANN Models

All ANNs are feedforward backpropagation networks developed using the MATLAB Deep Learning Toolbox following the steps in Figure 2, as described in Sections 2.2.1 to 2.2.7.

2.2.1. Travel Time Evaluation

Regular water flow monitoring data were employed for the estimation of travel times between M1 and M2. The visual analysis of the data reveals a consistent shift of the peaks along the time axis from M1 to M2 (Figure 3) of approximately 14 h.

Provided that the positions of peaks relative to each other are reasonably consistent, we propose estimating the travel time by shifting the downstream (M2) flow back in time hour by hour from 5 h to 25 h and calculating the normalized squared difference between the shifted M2 and (unmodified) M1 values. The shift giving the lowest difference is taken as the travel time estimate, and it is numerically close to the visually observed travel time. Results in Table 3 show an average travel time between M1 and M2 of around 14 h to 15 h. The low-flow subset covers no-rain to light-rain conditions, overlapping mainly with spring and summer, while the high-flow subset mainly comprises autumn and winter measurements.
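The lag search described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `estimate_travel_time`, the min-max scaling, and the averaging over overlapping pairs are assumptions, since the text does not specify how the squared difference was normalized. The synthetic hydrograph is likewise invented for the check.

```python
def estimate_travel_time(upstream, downstream, min_lag=5, max_lag=25):
    """Return the lag (in time steps) with the smallest normalized squared difference.

    Assumes both series are equally spaced (hourly here), equally long,
    and not constant. Min-max scaling is an assumed normalization.
    """

    def scale(series):
        low, high = min(series), max(series)
        return [(value - low) / (high - low) for value in series]

    up, down = scale(upstream), scale(downstream)
    best_lag, best_score = None, float("inf")
    for lag in range(min_lag, max_lag + 1):
        # Shifting M2 back by `lag` steps pairs down[t + lag] with up[t].
        pairs = list(zip(up[: len(up) - lag], down[lag:]))
        score = sum((u - d) ** 2 for u, d in pairs) / len(pairs)
        if score < best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic check: a triangular flood peak that reaches M2 14 h after M1.
def peak(t):
    return 10.0 + max(0.0, 50.0 - 5.0 * abs(t - 40))


m1_flow = [peak(t) for t in range(120)]
m2_flow = [peak(t - 14) for t in range(120)]
print(estimate_travel_time(m1_flow, m2_flow))  # 14
```

With real data the minimum is rarely zero, so it can be worth inspecting the score for neighboring lags as well before settling on a single value.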

Travel time values together with the sampling resolution guided the data selection for the ANNs and network structuring (e.g., setting the forecast horizon).

2.2.2. Performance Evaluation Methods

The ANNs’ prediction performance for water flows, SRP, and TP was assessed using NSE, Equation (1) [52]; KGE, Equation (2) [53]; PFC, Equation (4); and LFC, Equation (5) [49]. A perfect prediction is indicated by an NSE and KGE of 1.00 and a PFC and LFC of 0.00.

$N S E = 1 - \frac{\sum_{i = 1}^{n} {(O_{i} - P_{i})}^{2}}{\sum_{i = 1}^{n} {(O_{i} - \bar{O})}^{2}}$

(1)

where n is the total number of samples, i the individual samples, $O$ the observations, P the predictions, and $\bar{O}$ the mean of the observations.

$K G E = 1 - \sqrt{{(R - 1)}^{2} + {(\frac{σ_{P}}{σ_{O}} - 1)}^{2} + {(\frac{\bar{P}}{\bar{O}} - 1)}^{2}}$

(2)

$R = \frac{{C o v}_{P O}}{σ_{P} σ_{O}}$

(3)

where R is the linear correlation coefficient between predictions and observations, $Cov_{PO}$ the covariance between the predictions and observed values, $σ_{P}$ and $σ_{O}$ the standard deviations of the predictions and observations, and $\bar{P}$ the mean of the predictions.
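Equations (1) to (3) translate directly into code. The sketch below is one possible reading, with the function names `nse` and `kge` chosen here for illustration; it assumes population (divide by n) statistics for the covariance and standard deviations, which the text does not specify.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency, Equation (1)."""
    o_bar = mean(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - o_bar) ** 2 for o in observed)
    return 1.0 - residual / spread


def kge(observed, predicted):
    """Kling-Gupta efficiency, Equations (2) and (3)."""
    n = len(observed)
    o_bar, p_bar = mean(observed), mean(predicted)
    # Population statistics (divide by n) are an assumption of this sketch.
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(observed, predicted)) / n
    sd_o = sqrt(sum((o - o_bar) ** 2 for o in observed) / n)
    sd_p = sqrt(sum((p - p_bar) ** 2 for p in predicted) / n)
    r = cov / (sd_p * sd_o)
    return 1.0 - sqrt((r - 1) ** 2 + (sd_p / sd_o - 1) ** 2 + (p_bar / o_bar - 1) ** 2)
```

A prediction equal to the observations gives an NSE of 1, while predicting the observed mean everywhere gives an NSE of 0, which is a convenient sanity check when wiring the metrics into an evaluation script.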

$PFC = \frac{{\left(\sum_{p = 1}^{T_{p}} {(O_{p} - P_{p})}^{2} O_{p}^{2}\right)}^{0.25}}{{\left(\sum_{p = 1}^{T_{p}} O_{p}^{2}\right)}^{0.25}}$

(4)

where p indexes the data points among the peaks; O is the observed data; P is the simulated data; and $T_{p}$ is the total number of peaks (flows greater than the measured mean peak value).

$LFC = \frac{{\left(\sum_{n = 1}^{T_{L}} {(O_{n} - P_{n})}^{2} O_{n}^{2}\right)}^{0.25}}{{\left(\sum_{n = 1}^{T_{L}} O_{n}^{2}\right)}^{0.25}}$

(5)

where n indexes the data points among the low flows; and $T_{L}$ is the total number of low flows (flows smaller than the measured mean low value).
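Equations (4) and (5) share the same form and differ only in which observations are selected, so a single helper can cover both. The sketch below is illustrative: the name `extreme_flow_criterion` and the `is_extreme` callback are hypothetical, the thresholds in the usage lines are invented, and the exponents follow the equations as printed here.

```python
def extreme_flow_criterion(observed, predicted, is_extreme):
    """Evaluate the common form of Equations (4) and (5).

    `is_extreme` decides which observations count as peaks (PFC)
    or as low flows (LFC); at least one observation must qualify.
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if is_extreme(o)]
    numerator = sum((o - p) ** 2 * o ** 2 for o, p in pairs) ** 0.25
    denominator = sum(o ** 2 for o, _ in pairs) ** 0.25
    return numerator / denominator


# Illustrative flows (m³/s) with assumed thresholds of 50 and 5 m³/s.
observed = [2.0, 3.0, 40.0, 120.0, 90.0, 4.0]
predicted = [2.5, 3.0, 42.0, 110.0, 95.0, 3.0]
pfc = extreme_flow_criterion(observed, predicted, lambda q: q > 50.0)
lfc = extreme_flow_criterion(observed, predicted, lambda q: q < 5.0)
```

Passing the selection rule as a function keeps the threshold definition (mean peak value, mean low value) outside the metric, so it can be computed once from the observations and reused.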

Additionally, the visual inspection of hydrographs and scatter plots was carried out to assess the ANNs’ performance (steps in Figure 2). The visual inspection of SRP and TP loads (evaluated from concentration and flow) was added for the evaluation of the ANNs predicting concentrations together with the water flow.
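As a worked illustration of how a load follows from a concentration and a flow (the symbols $C$ and $Q$ and the numerical values are illustrative assumptions, not campaign data): since 1 mg/L equals 1 g/m³, multiplying a concentration in mg/L by a flow in m³/s gives a load in g/s.

$\mathrm{Load} = C \times Q, \quad \text{e.g.,} \quad 0.10 \ \mathrm{g/m^{3}} \times 20 \ \mathrm{m^{3}/s} = 2.0 \ \mathrm{g/s}$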

2.2.3. Splitting of Data

Available field data were split into two: a set for the ANNs’ development (termed “TRAIN” data, usually containing two thirds of the data, divided automatically by MATLAB in different proportions for training, testing, and validation) and a second set used for an additional independent evaluation of the forecast performance (termed “UNSEEN” data, containing at least one third of the data). Three types of ANNs were developed using the TRAIN data. The ANNs differentiate via the form of inputs (data from a single time stamp or data as time series), as shown in Figure 4.

Type 1 ANNs: One dataset of measurements corresponding to a single time stamp was used as the input in each step for the ANN to predict the output. Figure 4a illustrates the first computation step for an ANN with a resolution (time step) of 3 h and a forecast (prediction) horizon of 15 h.

Type 2 ANNs: Observation data series (data from multiple time stamps) were used as inputs in each forecast step, and the ANN generated the output for a later time step. Figure 4b illustrates the first computation step for an ANN with a resolution of 3 h and a forecast horizon of 12 h.

Type 3 ANNs: The inputs were a time series of observations and a time series of earlier predictions of the output. Figure 4c illustrates steps 1, 2, and 21+ for an ANN with a resolution of 1 h and forecast horizon of 1 h. In each computation step (e.g., step 1), the input consisted of 20 past values from M1 (e.g., U1 to U20) and 20 past values from M2 (e.g., D1 to D20). The output was the predicted value at M2 with the corresponding forecast horizon (e.g., D21). In step 1 (ANN initialization), D1 to D20 consisted of observed values only. In further steps, observed values at M2 were incrementally substituted by predicted values from previous computation steps. M2 observations were completely replaced with M2 predictions after 20 iterations (step 21+).
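The way type 3 inputs evolve from observations to predictions can be sketched as a sliding window. The code below is a schematic illustration only: `recursive_forecast` and the `predict_next` callback are hypothetical names, and the callback stands in for a trained network that maps 20 upstream and 20 downstream values to the next downstream value.

```python
def recursive_forecast(predict_next, upstream, downstream_seed, window=20):
    """Run a one-step-ahead model recursively along an upstream series.

    `downstream_seed` supplies the observed M2 values used for initialization;
    afterwards each prediction is fed back as an input.
    """
    history = list(downstream_seed[:window])
    forecasts = []
    for step in range(len(upstream) - window + 1):
        u_window = upstream[step : step + window]  # e.g., U1..U20, then U2..U21
        d_window = history[-window:]               # observed values are pushed out one by one
        d_next = predict_next(u_window, d_window)
        forecasts.append(d_next)
        history.append(d_next)
    return forecasts


# Toy stand-in for the network: average of the latest M1 and M2 values.
def toy_model(u_window, d_window):
    return 0.5 * (u_window[-1] + d_window[-1])


m1 = [float(v) for v in range(1, 25)]
m2_seed = [0.0] * 20
print(recursive_forecast(toy_model, m1, m2_seed))
# [10.0, 15.5, 18.75, 20.875, 22.4375]
```

After `window` iterations the downstream window contains predictions only, which corresponds to step 21+ in Figure 4c.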

All three types of ANNs predicted water flow, while type 2 (sub-type 2.3) was employed for the prediction of both water flow and concentrations at M2, as it was considered the most appropriate to use. The forecasting horizon and the time step (step size/resolution) for the inputs and outputs of the three ANN types were selected based on the dataset resolution and on the reach travel time. Depending on the employed field data (long-term water flow or monitoring campaigns of water flow alone or with concentrations of SRP and TP), the ANN types were further divided into sub-types, each assigned a code in Table 4. The seasonality factor (as in [8]) was employed as an additional input for sub-type 2.3 to facilitate better SRP and TP predictions, as their transformations were strongly correlated to seasonality. The developed ANNs were applicable in a wide range of situations, e.g., (i) when tributaries’ data were unavailable/unknown (sub-types 2.2 and 3.1), as tributaries are often not monitored; (ii) when multiple successive measurements were available at the upper end of the stretch (all sub-types of types 2 and 3); and (iii) when only one measurement was available (sub-type 1.1).

The three ANN types for flow forecasting helped provide a foundation for the development of ANNs for the forecasting of both water flow and concentrations. Moreover, the ANNs’ capability to predict under different field circumstances (e.g., water flow variability), such as unexpected water flow changes along the river (e.g., abundant rain or a sudden decrease in flows), was tested using a different data preparation method applied to the sub-type 2.2 networks, which predicted water flow at M2 based on observations at M1 and employed a 1 h resolution. The TRAIN data were mixed; the complete time series was split into many smaller time series and then reassembled randomly. Results revealed that different water flow scenarios did not significantly affect the ANNs’ performance.
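The mixing of TRAIN data described above (cutting the series into shorter segments and reassembling them in random order) can be sketched as follows. The function name `shuffle_segments`, the fixed segment length, and the seed are assumptions of this illustration; the text does not state how segment boundaries were chosen.

```python
import random


def shuffle_segments(samples, segment_length, seed=0):
    """Cut a series into consecutive segments and reassemble them in random order.

    The order inside each segment is preserved, so short-term dynamics
    survive while the long-term sequence of events is scrambled.
    """
    segments = [
        samples[start : start + segment_length]
        for start in range(0, len(samples), segment_length)
    ]
    random.Random(seed).shuffle(segments)
    return [sample for segment in segments for sample in segment]


hourly_flow = list(range(12))  # stand-in for an hourly flow record
mixed = shuffle_segments(hourly_flow, segment_length=4, seed=1)
```

When inputs and targets are stored as pairs, shuffling the pairs together keeps each input window aligned with its target.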

2.2.4. Exploring Wide Ranges of ANN Hyperparameters

We created randomizing functions to generate a wide range of networks with respect to the following: (A) the number of layers; (B) the number of neurons; (C) the transfer function in each layer; and (D) the training function. A benchmark performance threshold was set, depending on the network purpose (e.g., NSE > 0.85 for ANNs predicting only flows). Networks predicting above the threshold were saved. This step would generally have been slow and time consuming; therefore, automated algorithms were implemented.
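A randomized search over characteristics (A) to (D) with a benchmark threshold could look like the sketch below. Everything specific in it is an assumption for illustration: the candidate lists (which borrow MATLAB function names such as `trainlm` and `tansig`), the upper bounds on layers and neurons, and the hypothetical `train_and_score` callback that would train one network and return its NSE.

```python
import random

# Assumed candidate sets; the ranges explored by the authors are not given in the text.
TRANSFER_FUNCTIONS = ["tansig", "logsig", "purelin"]
TRAINING_FUNCTIONS = ["trainlm", "trainbr", "trainscg"]


def random_architecture(rng, max_layers=3, max_neurons=20):
    """Draw one random configuration covering characteristics (A) to (D)."""
    n_layers = rng.randint(1, max_layers)
    return {
        "hidden_sizes": [rng.randint(1, max_neurons) for _ in range(n_layers)],
        "transfer_functions": [rng.choice(TRANSFER_FUNCTIONS) for _ in range(n_layers)],
        "training_function": rng.choice(TRAINING_FUNCTIONS),
    }


def random_search(train_and_score, n_trials, threshold=0.85, seed=0):
    """Keep every configuration whose score exceeds the benchmark threshold."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_trials):
        config = random_architecture(rng)
        score = train_and_score(config)  # hypothetical: trains a network, returns NSE
        if score > threshold:
            kept.append((score, config))
    return kept
```

Fixing the seed makes a search run repeatable, which helps when comparing thresholds or candidate ranges between runs.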

2.2.5. Reducing the ANN Hyperparameter Search Ranges

After enough ANNs met the benchmark (>20, varying among the ANN sub-types), we analyzed complying ANNs for common features to facilitate the following: (a) the programming of characteristics (A–D) defined in the previous step; or (b) the constraint of the range of randomizing functions (Section 2.2.4). As a common feature, it was observed that all saved ANNs were trained using the Levenberg–Marquardt (LM) training algorithm. The good performance of the LM algorithm in training the ANNs for flow has been confirmed by others [70].
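Looking for features shared by the saved networks amounts to counting how often each choice occurs. The sketch below assumes configurations stored as dictionaries; the name `common_choices`, the key names, and the `min_share` parameter are hypothetical.

```python
from collections import Counter


def common_choices(configs, key, min_share=1.0):
    """Return the values of `key` used by at least `min_share` of the saved networks."""
    counts = Counter(config[key] for config in configs)
    total = len(configs)
    return [value for value, count in counts.items() if count / total >= min_share]


saved = [
    {"training_function": "trainlm", "n_layers": 2},
    {"training_function": "trainlm", "n_layers": 3},
    {"training_function": "trainlm", "n_layers": 2},
]
print(common_choices(saved, "training_function"))  # ['trainlm']
```

A value returned with `min_share=1.0` can be fixed in the next round, while a lower share only suggests narrowing the corresponding range.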

2.2.6. Generating Better-Performing ANNs

A new series of ANNs (>20) was generated using the reduced search ranges, and higher performance was achieved. Each network was analyzed under the following criteria: (a) NSE for the training; (b) NSE for the testing; (c) the ability to predict observations under extreme events (e.g., peaks or low-flow situations); and (d) the graphical comparison of the network output against measurements. The ANNs considered most suitable were selected. These ANNs were considered proficient and reliable for multi-step ahead forecasting.

2.2.7. Evaluating Forecast Performance Using UNSEEN Data

The selected ANNs’ forecast performance was evaluated again using the UNSEEN data to ensure their generalization capacity. These ANNs are available on HydroShare [71].

3. Results

3.1. ANNs’ Architectures

A selection of the best-performing ANNs during development (training, validation, and testing) and independent evaluation is illustrated in Table 5, showing one or multiple ANNs for each sub-type. ANNs from a sub-type may be different in terms of architecture, data resolution (time step), the number of inputs, and the forecast horizon.

3.2. ANNs’ Calibration Results (Training, Validation, and Testing) Using TRAIN Data

The ANNs’ development steps aimed at producing reliable ANNs for a wide range of conditions, using the optimum hyperparameters. Performance evaluation indices (Table 6) and the visual evaluation of results for the flow-predicting networks (Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9) show that among the long-forecasting-horizon ANNs, #1.1.1 predicted extremes better than medium flows (Figure 5), while ANN #2.1.1 (Figure 6) predicted equally well for all flows, except for the range 30 m³/s to 60 m³/s. ANNs #2.1.3 and #2.1.4 (Figure 7) predicted very well for all tested flow ranges, being the best performing (on TRAIN data), together with #3.1.1. They featured the shortest forecasting horizons (1 h and 6 h) and were closely followed in performance by ANNs #1.1.1 and #2.1.1, featuring longer forecasting horizons (15 h and 9 h). All these networks (except #3.1.1) used observed water flows at M1, T1, T2, and T3 (from the special monitoring campaigns) to forecast water flows at M2.

Networks #2.2.1, #2.2.2 (Figure 8), and #3.1.1 (Figure 9) used observed water flows at M1 (from the regular monitoring of the water flow at M1 and M2) to forecast water flows at M2. ANN #3.1.1 offered better flow prediction over the entire range, probably due to the shorter forecast horizon (1 h) and to the additional use of previous predictions of water flows at M2 as inputs. However, the other two networks offered longer forecasting horizons of 8 h and 12 h while still predicting well. Moreover, ANN #2.2.2 differed from the other sub-type 2.2 ANNs in the preparation of data (mixed time series reassembled randomly, as described in Section 2.2.3), aiming to capture unexpected flow conditions.

Most of the ANNs are deep neural networks, with more than one hidden layer, except ANN #2.3.1 (Figure 10), a shallow network used to predict both water flow and concentrations.

Generally, adding layers and/or neurons should facilitate better predictions in cases of complex phenomena, with the associated drawbacks of more demanding data and computation requirements and a greater risk of overfitting. For sub-type 2.3, good prediction performance could be achieved using the shallow ANN #2.3.1. Complementing the ANN with additional layers or using more complex transfer functions increased the training time and slightly improved the prediction performance for at least one of the indicators (flow, SRP, TP) but did not improve the forecast horizon on any occasion. The more complex ANN #2.3.2 featured similar performance to ANN #2.3.1. ANN #2.3.1 was therefore preferable for estimating both water flow and concentrations due to its architectural simplicity and long forecast horizon (15 h). Overall, simpler networks appeared better suited for this application, as increased complexity brought at most marginal gains. Similar behavior was visible in the case of water flow predictions, as explained in the following section.

3.3. ANNs’ Forecast Results Using UNSEEN Data

During the additional testing, the best-performing flow-forecast ANNs were #3.1.1, #2.1.1, #2.1.4, and #2.2.2 (Table 7). Three different time resolutions were tested for the ANNs predicting water flow: 15 min (#2.1.3), 1 h (#2.1.4, #2.2.1, #2.2.2, and #3.1.1), and 3 h (#1.1.1, #2.1.1, #2.1.2, #2.3.1, and #2.3.2). It was observed that a higher resolution did not necessarily lead to better prediction performance for the UNSEEN data. Among the ANN designs of sub-type 2.1, the higher resolution of 15 min in the case of network #2.1.3 was associated with a lower prediction performance (NSE of 0.85) compared to the 1 h resolution of network #2.1.4 (NSE of 0.96), which had the same forecast horizon of 6 h. Better predictions (compared to networks with a 15 min resolution) were also made by networks with a 3 h resolution and a forecast horizon of 9 h and 12 h (NSE of 0.97 for ANN #2.1.1 and 0.96 for #2.1.2). ANN #2.1.3 performed better during training (NSE of 0.99), while ANNs #2.1.1 and #2.1.4 performed very well on both sets of data. In terms of input preparation, it is noted that network #2.2.2, for which the inputs were mixed (to capture unexpected weather scenarios), had good performance (TRAIN NSE of 0.91 and UNSEEN NSE of 0.89).

ANN #1.1.1 showed very good results (NSE of 0.97) (Figure 11) for M2 flow prediction, except for local peaks (up to 45 m³/s), which were occasionally overpredicted. When the forecast horizon was increased or decreased, the NSE always decreased, as expected, considering that the relationship between inputs (flows at M1) and outputs (flows at M2) was strongly connected to the travel time. This ANN is of practical use in predicting a value at the downstream end of the river stretch based on a single measurement at the upstream end and in tributaries, offering water stakeholders a time window of 15 h to make decisions and implement actions.

ANNs #2.1.1 to #2.1.4 had the advantage of good prediction performance and of capturing the flow dynamics trend, probably owing to the wide spectrum of input data. The results for the UNSEEN data in Figure 12 (9 h and 12 h forecast horizon) and Figure 13 (6 h forecast horizon) do not show significantly improved peak prediction compared to ANN #1.1.1. Comparing ANNs using different data resolutions (1 h and 15 min in Figure 13), it is observed that the higher-resolution ANN #2.1.3 (15 min) predicted the lower flows slightly better, while the 1 h-resolution ANN #2.1.4 predicted the other flow ranges better. From a practical point of view, network #2.1.2 may be the most promising due to the longest anticipation time (12 h), the lower number of inputs needed (12 observations compared to 20, 52, or 196 in the case of other ANNs of the same sub-type), and the good prediction performance (NSE of 0.96).

The ANN #2.2.2 forecast results (Figure 14) show a better forecast for low flows (up to 50 m³/s) and high extremes compared to for medium-range peaks. It is significant that simulated values follow the trend of observations very well given the 12 h forecast horizon and the capturing of unexpected weather events.

The water flow forecasts of ANN #3.1.1 (Figure 15) reveal very good prediction, with the better capturing of peaks compared to earlier sub-types, despite the large variability of targeted outputs (observed M2 flows between 2.42 m³/s and 158 m³/s). Reduced overestimation was obtained for the medium-range peaks (20 m³/s to 90 m³/s), while slight underestimation occurred for the largest peak (160 m³/s). This one-step recurrent ANN could be iteratively used for the computation of predictions over long time horizons, provided that M1 water flows were specified.

Figure 16 presents four samples of the UNSEEN data, each 3 weeks long and chosen randomly subject to two conditions: each span falls in a different season, and every interval shows variation in the flow rate. A short-lived drop in the predicted flow rate immediately before the very steep peaks (e.g., spring on 22nd April, summer on 10th August) can be observed. This might be because the ANN overreacted, anticipating a lower flow rate based on the decreasing upstream flow, and corrected itself once the upstream values became higher.

The water flow and concentration ANNs with a forecast horizon of 15 h (#2.3.1 with two neurons in a single hidden layer) and 12 h (#2.3.2 with three hidden layers) offered relatively similar prediction performance for water flow and SRP, while ANN #2.3.1 had better performance for TP (Figure 17). ANN #2.3.1 captured the TP loadings better, while #2.3.2 captured the SRP loadings better.

Flows above 20 m³/s were slightly overpredicted by both networks on around 75% of occasions (Figure 17). Very large values of SRP (over 0.22 mg/L) were underestimated, while for the other values, there was no evidence of systematic over- or underestimation. The values of NSE associated with ANN #2.3.1 with the UNSEEN data (NSE_Flowrate > 0.70; NSE_SRP > 0.74; NSE_TP > 0.60) were better in comparison to earlier results for the same river stretch using an advection–dispersion model. Timis and colleagues [8] reported NSE values of 0.39 for SRP and 0.47 for organic phosphorus (OP, estimated as the difference between TP and SRP) during evaluation runs.

4. Discussion

This paper presented a method of designing ANNs to predict water flow and phosphorus species’ (SRP and TP) concentrations based on different types of available data. Given the non-linear and complex nature of ANN models, the employed method focused on evaluating the model’s predictive accuracy using multiple performance indices and additional performance testing using an independent dataset (UNSEEN data) to ensure robustness. This approach was preferred to performing a sensitivity analysis, which is usually employed to aid feature selection and model simplification, improve model robustness, and understand the system’s internal processes. The understanding of this system has been enhanced by experimental investigations [55,58,72,73] and an advection–dispersion model [8]. Therefore, these ANN models seek the practical use described in Table 8 and the desired forecast horizon increase (from 6 h), compared to the earlier ANN applications in the wider Ouse catchment [46].

All ANNs could forecast a wide range of water flows (0.74 to 484 m³/s) with a wide range of anticipation (1 h to 15 h, depending on the needs). ANNs #3.1.1 (1 h forecast horizon), #1.1.1 (15 h forecast horizon), #2.1.1 (9 h forecast horizon), and #2.2.2 (12 h forecast horizon) performed equally well with both TRAIN and UNSEEN data. Most ANNs (except ANNs #1.1.1 and #3.1.1) had a better performance when it came to estimating lower flows compared to high-flow peaks, as observed for other existing AI models [74,75]. A good capability to predict a wide range of water flows was seen, which is noteworthy as few previous studies [2,76] have accurately predicted very high flows, especially at a high sub-daily resolution and featuring lower forecast horizons compared to present studies [20,26]. In particular, such predictions are fundamental requirements for urban streams [2,38,77].

Among the described ANNs, the highlight was ANN #2.3.1, which predicted both the flow rate and the concentrations of SRP and TP with a 15 h forecast horizon and high predictive performance using UNSEEN data (NSE_Flowrate > 0.70; NSE_SRP > 0.74; NSE_TP > 0.60). Its capabilities equaled those of other AI-based models including at least one phosphorus compound. TP was predicted at a monthly resolution with an NSE of 0.71 [78]; at monthly, bimonthly, and trimonthly resolutions with an R² of 0.74 to 0.94 [79]; and at a daily resolution with relative errors of up to 20% [80]. Better results, with an R² of 0.92 to 0.97, have been obtained for TP by more complex models comprising one or multiple variables at daily [39] and sub-daily resolutions [38]. Compared to the aforementioned models in the literature, the ANNs in this study provided a higher time resolution in the forecasts while asking for easily attainable field data.

These practical aspects of the presented ANNs, together with the good prediction performance, are even more valuable given the difficulty of forecasting river variables: very fast-changing dynamics make forecasting a challenging task [3] and cause results at higher resolutions to be worse than those at lower resolutions [20]. This further highlights the need for high-frequency data (e.g., sub-daily) and for models comprising both hydrological and water quality parameters [15]. Such data from future monitoring campaigns may facilitate at least three research directions: (1) the application of these ANN models for forecasting and analytical purposes; (2) the further improvement of the ANN models’ performance and functionality (e.g., by including additional water quality parameters); and (3) the testing of other ANN types and additional AI techniques for this case study.

5. Conclusions

In this paper, the performance and applicability of feedforward backpropagation artificial neural networks were assessed using datasets for a stretch of the River Swale. All networks were double-tested, first by using the built-in functionality of the MATLAB Deep Learning Toolbox (using a share of the TRAIN data) and second by manually using additional datasets set aside during data processing (UNSEEN data).
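The double-testing idea can be sketched as an index split. This is a minimal Python illustration under stated assumptions, not the procedure used in the study: the fractions, the choice of holding out the final block of records as UNSEEN data, and the function name `split_records` are all hypothetical.

```python
import numpy as np


def split_records(n_records, unseen_fraction=0.2, val_fraction=0.15, seed=0):
    """Return index arrays (train, validation, unseen).

    The UNSEEN block is set aside first and never used during development;
    the remaining TRAIN records are split randomly into a training share
    and a validation share. All fractions are illustrative assumptions.
    """
    n_unseen = int(round(n_records * unseen_fraction))
    n_dev = n_records - n_unseen
    unseen_idx = np.arange(n_dev, n_records)   # held-out block

    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_dev)          # shuffle development records only
    n_val = int(round(n_dev * val_fraction))
    val_idx = np.sort(shuffled[:n_val])
    train_idx = np.sort(shuffled[n_val:])
    return train_idx, val_idx, unseen_idx


train_idx, val_idx, unseen_idx = split_records(1000)
print(len(train_idx), len(val_idx), len(unseen_idx))
```

Keeping the UNSEEN indices out of the shuffled pool is what makes the second test independent of the toolbox-internal validation.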

The ANNs successfully and reliably predicted flows and SRP and TP concentrations. ANNs with higher time resolutions (15 min for #2.1.3) performed better with the TRAIN data, while the ANNs with 1 h or 3 h resolutions performed very well using both datasets (#2.1.1, #2.1.4, and #3.1.1). The most potent ANN variants were as follows: (1) #2.1.4 and #2.1.2, water flow prediction with a 6 h and 12 h forecast horizon using time series from the upstream end and tributaries as an input (NSE of 0.96); (2) #2.2.2, water flow prediction with a 12 h forecast horizon using main stream flow rate time series as an input in the absence of data on tributary flows (NSE of 0.89); (3) #3.1.1, water flow prediction with a 1 h forecast horizon (extendable by recurrence to any future horizon) using upstream flow rate time series and previous ANN-predicted values as an input in the absence of data on tributary flows; (4) #2.3.1, the prediction of water flow and concentrations together with a 15 h forecast horizon using as an input the main stream and two tributary flow rate and concentration time series along with temperature and seasonality. These networks bring significant improvement beyond that achieved previously in the River Swale using an advection–dispersion water quality model.
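The extension "by recurrence" mentioned for #3.1.1 can be illustrated with a generic roll-forward loop in which each one-step prediction is appended to the downstream history and reused as an input. This Python sketch is an assumption about how such a scheme could be organized, not the authors' implementation: `recursive_forecast`, `toy_model`, and the lag structure are hypothetical, and how upstream values for later steps are obtained is left to the caller.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    step_model: Callable[[Sequence[float], Sequence[float]], float],
    upstream: Sequence[float],
    downstream_history: Sequence[float],
    n_lags: int,
    n_steps: int,
) -> List[float]:
    """Roll a one-step-ahead model forward by feeding its predictions back in."""
    if len(upstream) < n_lags + n_steps - 1:
        raise ValueError("upstream series too short for the requested steps")
    if len(downstream_history) < n_lags:
        raise ValueError("downstream history shorter than the lag window")

    history = list(downstream_history)
    predictions: List[float] = []
    for k in range(n_steps):
        u_window = upstream[k:k + n_lags]   # upstream values for this step
        d_window = history[-n_lags:]        # latest observed or predicted values
        y_next = float(step_model(u_window, d_window))
        predictions.append(y_next)
        history.append(y_next)              # prediction becomes a future input
    return predictions


def toy_model(u_window, d_window):
    """Stand-in for a trained one-step network (purely illustrative)."""
    return 0.5 * u_window[-1] + 0.5 * d_window[-1]


print(recursive_forecast(toy_model, [1.0, 2.0, 3.0, 4.0], [10.0, 10.0],
                         n_lags=2, n_steps=3))
```

In such a scheme, errors can accumulate with each recycled prediction, so accuracy at long recursive horizons would need to be checked separately from the 1 h performance reported here.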

Author Contributions

Conceptualization, V.M.C. and E.C.T.; methodology, V.M.C., N.B.M. and E.C.T.; software, H.H. and E.C.T.; validation, E.C.T. and H.H.; formal analysis, H.H., E.C.T. and V.M.C.; investigation, H.H. and E.C.T.; resources, V.M.C.; data curation, H.H. and M.G.H.; writing—original draft preparation, H.H.; writing—review and editing, E.C.T., V.M.C. and M.G.H.; visualization, V.M.C., N.B.M. and M.G.H.; supervision, V.M.C.; project administration, E.C.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Field data (measurements of phosphorus species’ concentrations) from the River Swale monitoring stations are openly available in HydroShare at https://doi.org/10.4211/hs.858aaf445ca645f5948a7bd73c16cdd6. A detailed description of the first five monitoring campaigns [58] and of the following five campaigns [55,72] is also available. Further discussions and graphical representations of water flow and phosphorus compounds’ time series are also available online [8]. These data are part of a larger, freely available accredited dataset [81], also including nutrient concentrations for the River Swale. The water flow data are available online in the UK National River Flow Archive (the link is available in the references list) at the specific locations for the two sites: the upstream end (M1) of the investigated river stretch (station number 27090) and at the downstream end (M2) of the river stretch (station number 27071).

Acknowledgments

We thank Mike Bowes for his knowledge of nutrient dynamics in the Swale River basin. The Environment Agency and Yorkshire Water provided data for model testing.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Bak, G.; Bae, Y. Deep Learning Algorithm Development for River Flow Prediction: PNP Algorithm. Soft Comput. 2023, 27, 13487–13515.
2. Park, K.; Jung, Y.; Kim, K.; Park, S.K. Determination of Deep Learning Model and Optimum Length of Training Data in the River with Large Fluctuations in Flow Rates. Water 2020, 12, 3537.
3. Ahmed, A.A.; Sayed, S.; Abdoulhalik, A.; Moutari, S.; Oyedele, L. Applications of Machine Learning to Water Resources Management: A Review of Present Status and Future Opportunities. J. Clean. Prod. 2024, 441, 140715.
4. Noori, N.; Kalin, L.; Isik, S. Water Quality Prediction Using SWAT-ANN Coupled Approach. J. Hydrol. 2020, 590, 125220.
5. Avramenko, Y.; Ani, E.C.; Kraslawski, A.; Agachi, P.S. Mining of Graphics for Information and Knowledge Retrieval. Comput. Chem. Eng. 2009, 33, 618–627.
6. Zhang, Q.; You, X.Y. Recent Advances in Surface Water Quality Prediction Using Artificial Intelligence Models. Water Resour. Manag. 2024, 38, 235–250.
7. Hòa, P.T.; Mogoş-Kirner, M.; Mircea Cristea, V.; Alexandra, C.; Agachi, P.Ş. Simulation and Control of Floods in a Water Network. Case Study Jijia River Catchment. Environ. Eng. Manag. J. 2017, 16, 587–595.
8. Timis, E.C.; Hutchins, M.G.; Cristea, V.M. Advancing Understanding of In-River Phosphorus Dynamics Using an Advection–Dispersion Model (ADModel-P). J. Hydrol. 2022, 612, 128173.
9. Tan, W.Y.; Lai, S.H.; Teo, F.Y.; El-Shafie, A. State-of-the-Art Development of Two-Waves Artificial Intelligence Modeling Techniques for River Streamflow Forecasting. Arch. Comput. Methods Eng. 2022, 29, 5185–5211.
10. Difi, S.; Elmeddahi, Y.; Hebal, A.; Singh, V.P.; Heddam, S.; Kim, S.; Kisi, O. Monthly Streamflow Prediction Using Hybrid Extreme Learning Machine Optimized by Bat Algorithm: A Case Study of Cheliff Watershed, Algeria. Hydrol. Sci. J. 2023, 68, 189–208.
11. Khatun, A.; Chatterjee, C.; Sahu, G.; Sahoo, B. A Novel Smoothing-Based Long Short-Term Memory Framework for Short-to Medium-Range Flood Forecasting. Hydrol. Sci. J. 2023, 68, 488–506.
12. Martins, L.L.; Martins, W.A.; Rodrigues, I.C.d.A.; Xavier, A.C.F.; de Moraes, J.F.L.; Blain, G.C. Gap-Filling of Daily Precipitation and Streamflow Time Series: A Method Comparison at Random and Sequential Gaps. Hydrol. Sci. J. 2023, 68, 148–160.
13. Ibrahim, K.S.M.H.; Huang, Y.F.; Ahmed, A.N.; Koo, C.H.; El-Shafie, A. A Review of the Hybrid Artificial Intelligence and Optimization Modelling of Hydrological Streamflow Forecasting. Alex. Eng. J. 2022, 61, 279–303.
14. Zounemat-Kermani, M.; Matta, E.; Cominola, A.; Xia, X.; Zhang, Q.; Liang, Q.; Hinkelmann, R. Neurocomputing in Surface Water Hydrology and Hydraulics: A Review of Two Decades Retrospective, Current Status and Future Prospects. J. Hydrol. 2020, 588, 125085.
15. Dodig, A.; Ricci, E.; Kvascev, G.; Stojkovic, M. A Novel Machine Learning-Based Framework for the Water Quality Parameters Prediction Using Hybrid Long Short-Term Memory and Locally Weighted Scatterplot Smoothing Methods. J. Hydroinform. 2024, 26, 1059–1079.
16. Irwan, D.; Ali, M.; Ahmed, A.N.; Jacky, G.; Nurhakim, A.; Ping Han, M.C.; AlDahoul, N.; El-Shafie, A. Predicting Water Quality with Artificial Intelligence: A Review of Methods and Applications. Arch. Comput. Methods Eng. 2023, 30, 4633–4652.
17. Cojbasic, S.; Dmitrasinovic, S.; Kostic, M.; Sekulic, M.T.; Radonic, J.; Dodig, A.; Stojkovic, M. Application of Machine Learning in River Water Quality Management: A Review. Water Sci. Technol. 2023, 88, 2297–2308.
18. Devi, G.; Sharma, M.; Sarma, P.; Phukan, M.; Sarma, K.K. Flood Frequency Modeling and Prediction of Beki and Pagladia Rivers Using Deep Learning Approach. Neural Process. Lett. 2022, 54, 3263–3282.
19. Nur Adli Zakaria, M.; Abdul Malek, M.; Zolkepli, M.; Najah Ahmed, A. Application of Artificial Intelligence Algorithms for Hourly River Level Forecast: A Case Study of Muda River, Malaysia. Alex. Eng. J. 2021, 60, 4015–4028.
20. Feinstein, J.; Ploussard, Q.; Veselka, T.; Yan, E. Using Data-Driven Prediction of Downstream 1D River Flow to Overcome the Challenges of Hydrologic River Modeling. Water 2023, 15, 3843.
21. Zounemat-Kermani, M.; Mahdavi-Meymand, A.; Hinkelmann, R. A Comprehensive Survey on Conventional and Modern Neural Networks: Application to River Flow Forecasting. Earth Sci. Inf. 2021, 14, 893–911.
22. Granata, F.; Di Nunno, F.; de Marinis, G. Stacked Machine Learning Algorithms and Bidirectional Long Short-Term Memory Networks for Multi-Step Ahead Streamflow Forecasting: A Comparative Study. J. Hydrol. 2022, 613, 128431.
23. Hayder, G.; Solihin, M.I.; Najwa, M.R.N. Multi-Step-Ahead Prediction of River Flow Using NARX Neural Networks and Deep Learning LSTM. H2Open J. 2022, 5, 42–59.
24. Chu, H.; Wei, J.; Wu, W.; Jiang, Y.; Chu, Q.; Meng, X. A Classification-Based Deep Belief Networks Model Framework for Daily Streamflow Forecasting. J. Hydrol. 2021, 595, 125967.
25. Martinho, A.D.; Saporetti, C.M.; Goliatt, L. Approaches for the Short-Term Prediction of Natural Daily Streamflows Using Hybrid Machine Learning Enhanced with Grey Wolf Optimization. Hydrol. Sci. J. 2023, 68, 16–33.
26. Yeoh, K.L.; Puay, H.T.; Abdullah, R.; Manan, T.S.A. Appraisal of Data-Driven Techniques for Predicting Short-Term Streamflow in Tropical Catchment. Water Sci. Technol. 2023, 88, 75–91.
27. Jhong, Y.-D.; Lin, H.P.; Chen, C.S.; Jhong, B.C. Real-Time Neural-Network-Based Ensemble Typhoon Flood Forecasting Model with Self-Organizing Map Cluster Analysis: A Case Study on the Wu River Basin in Taiwan. Water Resour. Manag. 2022, 36, 3221–3245.
28. Maier, H.R.; Jain, A.; Dandy, G.C.; Sudheer, K.P. Methods Used for the Development of Neural Networks for the Prediction of Water Resource Variables in River Systems: Current Status and Future Directions. Environ. Model. Softw. 2010, 25, 891–909.
29. Tiyasha; Tung, T.M.; Yaseen, Z.M. A Survey on River Water Quality Modelling Using Artificial Intelligence Models: 2000–2020. J. Hydrol. 2020, 585, 124670.
30. Elsayed, A.; Rixon, S.; Levison, J.; Binns, A.; Goel, P. Machine Learning Models for Prediction of Nutrient Concentrations in Surface Water in an Agricultural Watershed. J. Environ. Manag. 2024, 372, 123305.
31. Bisht, A.K.; Singh, R.; Bhutiani, R.; Bhatt, A. Artificial Neural Network Based Water Quality Forecasting Model for Ganga River. Int. J. Eng. Adv. Technol. 2019, 8, 2778–2785.
32. Novianta, M.A.; Syafrudin; Warsito, B.; Rachmawati, S. Monitoring River Water Quality through Predictive Modeling Using Artificial Neural Networks Backpropagation. AIMS Environ. Sci. 2024, 11, 649–664.
33. Li, X.; Song, J. A New ANN-Markov Chain Methodology for Water Quality Prediction. In Proceedings of the International Joint Conference on Neural Networks, Killarney, Ireland, 12–17 July 2015; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2015; p. 7280320.
34. Affandi, F.; Rahman, M.F.A.; Ani, A.I.C.; Sulaiman, M.S. Artificial Neural Network Modeling for Predicting the Quality of Water in the Sabak Bernam River. Indones. J. Electr. Eng. Comput. Sci. 2022, 3, 1684–1691.
35. Ubah, J.I.; Orakwe, L.C.; Ogbu, K.N.; Awu, J.I.; Ahaneku, I.E.; Chukwuma, E.C. Forecasting Water Quality Parameters Using Artificial Neural Network for Irrigation Purposes. Sci. Rep. 2021, 11, 24438.
36. Abdullah, D.; Gartsiyanova, K.; Mansur Qizi, K.E.M.; Javlievich, E.A.; Bulturbayevich, M.B.; Zokirova, G.; Nordin, M.N. An Artificial Neural Networks Approach and Hybrid Method with Wavelet Transform to Investigate the Quality of Tallo River, Indonesia. Casp. J. Environ. Sci. 2023, 21, 647–656.
37. Ghazali, R.; Fuzi, N.A.M.; Mostafa, S.A.; Khattak, U.F.; Ali, R.R. An Application of Multilayer Perceptron for the Prediction of River Water Quality. In International Conference on Innovative Computing and Communications; Lecture Notes in Networks and Systems; Hassanien, A.E., Castillo, O., Anand, S., Jaiswal, A., Eds.; Springer Science and Business Media Deutschland GmbH: Delhi, India, 2023.
38. Chen, J.; Li, H.; Felix, M.; Chen, Y.; Zheng, K. Water Quality Prediction of Artificial Intelligence Model: A Case of Huaihe River Basin, China. Environ. Sci. Pollut. Res. 2024, 31, 14610–14640.
39. Baek, S.S.; Pyo, J.; Chun, J.A. Prediction of Water Level and Water Quality Using a CNN-LSTM Combined Deep Learning Approach. Water 2020, 12, 3399.
40. Song, J.; Meng, H.; Kang, Y.; Zhu, M.; Zhu, Y.; Zhang, J. A Method for Predicting Water Quality of River Basin Based on OVMD-GAT-GRU. Stoch. Environ. Res. Risk Assess. 2024, 38, 339–356.
41. Tian, Q.; Luo, W.; Guo, L. Water Quality Prediction in the Yellow River Source Area Based on the DeepTCN-GRU Model. J. Water Process. Eng. 2024, 59, 105052.
42. Yao, J.; Chen, S.; Ruan, X. Interpretable CEEMDAN-FE-LSTM-Transformer Hybrid Model for Predicting Total Phosphorus Concentrations in Surface Water. J. Hydrol. 2024, 629, 130609.
43. Hunter, J.M.; Maier, H.R.; Gibbs, M.S.; Foale, E.R.; Grosvenor, N.A.; Harders, N.P.; Kikuchi-Miller, T.C. Framework for Developing Hybrid Process-Driven, Artificial Neural Network and Regression Models for Salinity Prediction in River Systems. Hydrol. Earth Syst. Sci. 2018, 22, 2987–3006.
44. Ghorbani, M.A.; Khatibi, R.; Danandeh Mehr, A.; Asadi, H. Chaos-Based Multigene Genetic Programming: A New Hybrid Strategy for River Flow Forecasting. J. Hydrol. 2018, 562, 455–467.
45. Tikhamarine, Y.; Souag-Gamane, D.; Najah Ahmed, A.; Kisi, O.; El-Shafie, A. Improving Artificial Intelligence Models Accuracy for Monthly Streamflow Forecasting Using Grey Wolf Optimization (GWO) Algorithm. J. Hydrol. 2020, 582, 124435.
46. See, L.; Openshaw, S. Applying Soft Computing Approaches to River Level Forecasting. Hydrol. Sci. J. 1999, 44, 763–778.
47. Nguyen, T.T.H.; Vu, D.Q.; Mai, S.T.; Dang, T.D. Streamflow Prediction in the Mekong River Basin Using Deep Neural Networks. IEEE Access 2023, 11, 97930–97943.
48. Cheng, M.; Fang, F.; Kinouchi, T.; Navon, I.M.; Pain, C.C. Long Lead-Time Daily and Monthly Streamflow Forecasting Using Machine Learning Methods. J. Hydrol. 2020, 590, 125376.
49. Tan, W.Y.; Lai, S.H.; Pavitra, K.; Teo, F.Y.; El-Shafie, A. Deep Learning Model on Rates of Change for Multi-Step Ahead Streamflow Forecasting. J. Hydroinform. 2023, 25, 1667–1689.
50. Latif, S.D.; Ahmed, A.N. A Review of Deep Learning and Machine Learning Techniques for Hydrological Inflow Forecasting. Environ. Dev. Sustain. 2023, 25, 12189–12216.
51. Hadiyan, P.P.; Moeini, R.; Ehsanzadeh, E. Application of Static and Dynamic Artificial Neural Networks for Forecasting Inflow Discharges, Case Study: Sefidroud Dam Reservoir. Sustain. Comput. Inform. Syst. 2020, 27, 100401.
52. Nash, J.E.; Sutcliffe, J.V. River Flow Forecasting through Conceptual Models Part I—A Discussion of Principles. J. Hydrol. 1970, 10, 282–290.
53. Gupta, H.V.; Kling, H.; Yilmaz, K.K.; Martinez, G.F. Decomposition of the Mean Squared Error and NSE Performance Criteria: Implications for Improving Hydrological Modelling. J. Hydrol. 2009, 377, 80–91.
54. NRFA Station Data for 27071—Swale at Crakehill. Available online: https://nrfa.ceh.ac.uk/data/station/info/27071 (accessed on 15 December 2024).
55. Bowes, M.J.; House, W.A. Phosphorus and Dissolved Silicon Dynamics in the River Swale Catchment, UK: A Mass-Balance Approach. Hydrol. Process. 2001, 15, 261–280.
56. Dennis, I. The Impact of Historical Metal Mining on the River Swale Catchment, North Yorkshire, UK. Ph.D. Thesis, University of Wales, Aberystwyth, UK, 2005.
57. House, W.A.; Leach, D.; Warwick, M.S.; Whitton, B.A.; Pattinson, S.N.; Ryland, G.; Pinder, A.; Ingram, J.; Lishman, J.P.; Smith, S.; et al. Nutrient Transport in the Humber Rivers. Sci. Total Environ. 1997, 194, 303–320.
58. House, W.A.; Warwick, M.S. A Mass-Balance Approach to Quantify the Importance of in-Stream Processes during Nutrient Transport in a Large River Catchment. Sci. Total Environ. 1998, 210–211, 139–152.
59. Ani, E.C.; Hutchins, M.; Kraslawski, A.; Agachi, P.S. Mathematical Model to Identify Nitrogen Variability in Large Rivers. River Res. Appl. 2011, 27, 1216–1236.
60. NRFA Station Data for 27090—Swale at Catterick Bridge. Available online: https://nrfa.ceh.ac.uk/data/station/info/27090 (accessed on 15 December 2024).
61. NRFA Station Data for 27085—Cod Beck at Dalton Bridge. Available online: https://nrfa.ceh.ac.uk/data/station/info/27085 (accessed on 15 December 2024).
62. NRFA Station Data for 27069—Wiske at Kirby Wiske. Available online: https://nrfa.ceh.ac.uk/data/station/info/27069 (accessed on 15 December 2024).
63. NRFA Station Data for 27075—Bedale Beck at Leeming. Available online: https://nrfa.ceh.ac.uk/data/station/info/27075 (accessed on 15 December 2024).
64. Locational Information | National River Flow Archive. Available online: https://nrfa.ceh.ac.uk/locational-information (accessed on 15 December 2024).
65. Rainfall Statistics | National River Flow Archive. Available online: https://nrfa.ceh.ac.uk/rainfall-statistics (accessed on 11 January 2025).
66. FEH Catchment Descriptors | National River Flow Archive. Available online: https://nrfa.ceh.ac.uk/feh-catchment-descriptors (accessed on 11 January 2025).
67. Hannaford, J.; Mastrantonas, N.; Vesuviano, G.; Turner, S. An Updated National-Scale Assessment of Trends in UK Peak River Flow Data: How Robust Are Observed Increases in Flooding? Hydrol. Res. 2021, 52, 699–718.
68. Harrigan, S.; Hannaford, J.; Muchan, K.; Marsh, T.J. Designation and Trend Analysis of the Updated UK Benchmark Network of River Flow Stations: The UKBN2 Dataset. Hydrol. Res. 2018, 49, 552–567.
69. Macdonald, N. Trends in Flood Seasonality of the River Ouse (Northern England) from Archive and Instrumental Sources since AD 1600. Clim. Change 2012, 110, 901–923.
70. Shikhar, K.C.; Bhattarai, K.P.; De Shan, T.; Mishra, S.; Joshi, I.; Singh, A.K. Comprehensive Performance Analysis of Training Functions in Flow Prediction Model Using Artificial Neural Network. Water SA 2024, 50, 190–200.
71. Hangan, H.; Timis, E.C.; Cristea, V.M.; Hutchins, M.G. ANNs to Forecast Waterflow, Soluble Reactive Phosphorus and Total Phosphorus at the Lower End of a River Stretch. 2023 HydroShare. Available online: http://www.hydroshare.org/resource/02f5b6093a5941d8ba471b2b387765dd (accessed on 11 January 2025).
72. Bowes, M.J.; House, W.A.; Hodgkinson, R.A. Phosphorus Dynamics along a River Continuum. Sci. Total Environ. 2003, 313, 199–212.
73. Bowes, M.J.; House, W.A.; Hodgkinson, R.A.; Leach, D.V. Phosphorus-Discharge Hysteresis during Storm Events along a River Catchment: The River Swale, UK. Water Res. 2005, 39, 751–762.
74. Reed, S.M. An Active Learning Convolutional Neural Network for Predicting River Flow in a Human Impacted System. Front. Water 2023, 5, 1271780.
75. Letessier, C.; Cardi, J.; Dussel, A.; Ebtehaj, I.; Bonakdari, H. Enhancing Flood Prediction Accuracy through Integration of Meteorological Parameters in River Flow Observations: A Case Study Ottawa River. Hydrology 2023, 10, 164.
76. Li, F.F.; Wang, Z.Y.; Qiu, J. Long-Term Streamflow Forecasting Using Artificial Neural Network Based on Preprocessing Technique. J. Forecast. 2019, 38, 192–206.
77. Henonin, J.; Russo, B.; Mark, O.; Gourbesville, P. Real-Time Urban Flood Forecasting and Modelling—A State of the Art. J. Hydroinform. 2013, 15, 717–736.
78. Li, Q.; Yang, Y.; Yang, L.; Wang, Y. Comparative Analysis of Water Quality Prediction Performance Based on LSTM in the Haihe River Basin, China. Environ. Sci. Pollut. Res. 2023, 30, 7498–7509.
79. Liu, M.; Lu, J. Support Vector Machine―An Alternative to Artificial Neuron Network for Water Quality Forecasting in an Agricultural Nonpoint Source Polluted River? Environ. Sci. Pollut. Res. 2014, 21, 11036–11053.
80. Petrea, S.M.; Zamfir, C.; Simionov, I.A.; Mogodan, A.; Nuţă, F.M.; Rahoveanu, A.T.; Nancu, D.; Cristea, D.S.; Buhociu, F.M. A Forecasting and Prediction Methodology for Improving the Blue Economy Resilience to Climate Change in the Romanian Lower Danube Euroregion. Sustainability 2021, 13, 11563.
81. Leach, D.; Neal, M.; Bachiller-Jareno, N.; Tindall, I.; Moore, R. Major Ion and Nutrient Data from Rivers [LOIS]—EIDC. Available online: https://catalogue.ceh.ac.uk/documents/4482fa14-aee2-4c7f-9c62-a08dc9704051 (accessed on 8 November 2024).

Figure 1. A study area map showing the catchment’s position within the UK.

Figure 2. Models’ development and independent testing procedure.

Figure 3. The illustration of flow peaks at the upstream (M1) and downstream end (M2) of the river stretch.

Figure 4. Illustration of input and output data for ANN types: (a) type 1; (b) type 2; and (c) type 3. M1 = upstream end; M2 = downstream end; T1, T2, and T3 = tributaries.

Figure 5. Type 1 ANN results for M2 flow forecast based on M1 and tributaries’ water flow with 3 h resolution and 15 h forecast horizon, using TRAIN data.

Figure 6. Type 2 ANN results, with sub-type 2.1, for M2 flow forecast based on M1 and tributaries with resolution of 3 h and 9 h (#2.1.1) and 12 h (#2.1.2) forecast horizons, using TRAIN data.

Figure 7. Type 2 ANN results, with sub-type 2.1, for M2 flow forecast based on M1 and tributaries with resolutions of 15 min (#2.1.3) and 1 h (#2.1.4) and 6 h forecast horizon, using TRAIN data.

Figure 8. Type 2 ANN results, with sub-type 2.2, for M2 flow forecast based on M1 with resolution of 1 h and 12 h forecast horizon, using mixed TRAIN data.

Figure 9. Type 3 ANN results for M2 flow forecast based on M1 and previous predictions at M2 with resolution of 1 h and 1 h forecast horizon, using TRAIN data.

Figure 10. A comparison of predicted and observed water flow, soluble reactive phosphorus (SRP), and total phosphorus (TP) with a 12 h (ANN #2.3.2.) and 15 h (ANN #2.3.1.) forecast horizon and a resolution of 3 h with TRAIN data.

Figure 11. Type 1 ANN results, with sub-type 1.1, for M2 flow forecast based on M1 and tributaries’ water flow with 3 h resolution and 15 h forecast horizon, using UNSEEN data.

Figure 12. Type 2 ANN results, with sub-type 2.1, for M2 flow forecast based on M1 and tributaries with resolution of 3 h and for 9 h (#2.1.1) and 12 h (#2.1.2) forecast horizons, using UNSEEN data.

Figure 13. Type 2 ANN results, with sub-type 2.1, for M2 flow forecast based on M1 and tributaries with resolutions of 15 min (#2.1.3) and 1 h (#2.1.4) and for 6 h forecast horizon, using UNSEEN data.

Figure 14. Type 2 ANN results, with sub-type 2.2, for M2 flow forecast based on M1 with resolution of 1 h and 12 h forecast horizon when the UNSEEN data were mixed randomly to capture unexpected events.

Figure 15. Type 3 ANN results for M2 flow forecast based on M1 and previous predictions at M2 with resolution of 1 h and 1 h forecast horizon, using UNSEEN data.

Figure 16. Type 3 ANN results, with sub-type 3.1, using UNSEEN data for M2 flow forecast based on M1 with resolution of 1 h and 1 h forecast horizon for 3 weeks sampled from each season.

Figure 17. A comparison of predicted and observed water flow, soluble reactive phosphorus (SRP), and total phosphorus (TP) with a 12 h (ANN #2.3.2.) and 15 h (ANN #2.3.1.) forecast horizon and a resolution of 3 h with UNSEEN data.

Table 1. Drawbacks of ANN models and proposed steps to overcome them.

Drawbacks	Mitigation Measures
An often-time-consuming process of trial-and-error to identify the optimum ANN type and structure, avoiding overfitting	Automated algorithms can be set up for the handling of the trial-and-error process to minimize time
The danger of large errors during extrapolations	Increased model accuracy may be achieved using multiple models, hybrid models, or combinations of ANNs with optimization techniques [13,44,45]
The lack of process understanding and limited possibility of correlating phenomena with the model parameters	Additional datasets for independent verifications after model development; the combination of ANNs with mechanistic models
Missing data	Using other types of data in the system or relying on data in the literature [17]

Table 2. Study area details [54,60,61,62,63]: National Grid Reference (NGR [64]); average annual rainfall in standard period (1961–1990) in millimeters (SAAR [65,66]); base flow index (BFI); 10% exceedance (Q10); 5% exceedance (Q5).

Site Name and Notation	Geographical Coordinates, NGR	Elevation, m	SAAR, mm	Daily Flow Record Period	BFI	Mean Daily Flow, m³/s	Q10, m³/s	Q5, m³/s
Swale in Catterick Bridge, M1	SE226993	60	1123	1992–2023	0.37	12.9	31	47
Bedale Beck in Leeming, T1	SE306902	24	729	1983–2023	0.31	3.2	5	14
Wiske in Kirby Wiske, T2	SE375843	20	632	1980–2023	0.16	4.8	14	29
Cod Beck in Dalton Bridge, T3	SE421766	19	696	1988–2023	0.48	1.7	4	6
Swale in Crakehill, M2	SE426734	12	835	1955–2023	0.48	20.2	47	68

Table 3. The values of water flow and travel times.

Water Flow Range	Indicator	Flow at M1 [m³/s]	Flow at M2 [m³/s]	Travel Time Range [h]
All flows (entire dataset)	Minimum	0.74	1.97	14–15
	Average	11.82	18.69
	Maximum	484	224
Lower-flows subset	Minimum	0.74	1.97	14–16
	Average	3.58	5.57
	Maximum	84.4	52.5
Higher-flows subset	Minimum	0.995	3.04	12–14
	Average	18.72	29.68
	Maximum	484	224

Table 4. Characteristics of the developed ANNs.

ANN Type	ANN Sub-Type	ANN Notation	Details for ANN Input and Output Data
1	1.1	#1.1.1	Inputs: one set of observed water flows at M1, T1, T2, and T3 in a single time stamp. Output: water flow at M2.
2	2.1	#2.1.1; #2.1.2; #2.1.3; #2.1.4	Inputs: time series of observed water flows at M1, T1, T2, and T3. Output: water flow at M2.
2	2.2	#2.2.1; #2.2.2	Inputs: time series of observed water flows at M1. No tributaries’ data. Output: water flow at M2.
2	2.3	#2.3.1; #2.3.2	Inputs: time series of observed water flows, SRP, and TP concentrations at M1, T2, and T3, water temperature, and seasonality. Output: water flow and SRP and TP at M2.
3	3.1	#3.1.1	Inputs: time series of observed water flows at M1 and previous predictions of water flows at M2. No tributaries’ data. Output: water flow at M2.

M1 = the upstream end; M2 = the downstream end; T1, T2, and T3 = tributaries.
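The input and output arrangement of the time-series ANNs can be illustrated by pairing each input window with a later target value. The Python sketch below is one possible reading of the "Input Window" and "Output Timing" columns of Table 5 (forecast horizon = output timing minus the end of the input window) and is not the authors' data-processing code; `make_samples` and the synthetic series are hypothetical, and only a single upstream series is used for brevity (tributary series could be stacked in the same way).

```python
import numpy as np


def make_samples(upstream, downstream, resolution_h, window_end_h, output_h):
    """Pair each input window (0..window_end_h) with the target at output_h.

    All times are measured from the start of the window, in hours.
    """
    n_in = int(round(window_end_h / resolution_h)) + 1   # points inside the window
    out_offset = int(round(output_h / resolution_h))     # target index from window start
    X, y = [], []
    for start in range(len(upstream) - out_offset):
        X.append(upstream[start:start + n_in])           # inputs observed at M1
        y.append(downstream[start + out_offset])         # target flow at M2
    return np.array(X), np.array(y)


# Synthetic series for illustration only (not River Swale data).
m1_flow = np.arange(10.0)
m2_flow = np.arange(100.0, 110.0)

# Settings resembling #2.1.1 in Table 5: 3 h resolution, 0-12 h window, output at 21 h.
X, y = make_samples(m1_flow, m2_flow, resolution_h=3, window_end_h=12, output_h=21)
print(X.shape, y)
```

With these settings each sample holds five input values (0, 3, 6, 9, and 12 h) and a target 9 h after the last input, matching the 9 h forecast horizon listed for #2.1.1.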

Table 5. ANNs’ characteristics.

ANN No.	Predicted Indicators	Hidden Layers’ Neurons	Transfer Functions for Hidden Layers	Resolution [h]	Input Window [h]	Output Timing [h]	Forecast Horizon [h]
#1.1.1	flow	[3, 3]	[logsig, tansig]	3	0	15	15
#2.1.1	flow	[3, 3]	[logsig, tansig]	3	0–12	21	9
#2.1.2	flow	[1, 7, 6]	[logsig, purelin, tansig]	3	0–6	18	12
#2.1.3	flow	[5, 10, 7]	[tansig, tansig, logsig]	0.25	0–12	18	6
#2.1.4	flow	[6, 6, 4]	[tansig, purelin, logsig]	1	0–12	18	6
#2.2.1	flow	[3, 2, 7, 5]	[logsig, tansig, logsig, purelin]	1	0–9	17	8
#2.2.2	flow	[7, 6, 4]	[logsig, logsig, logsig]	1	0–9	21	12
#2.3.1	flow, SRP, TP	[2]	[purelin]	3	0–12	27	15
#2.3.2	flow, SRP, TP	[4, 6, 5]	[purelin, purelin, purelin]	3	0–12	24	12
#3.1.1	flow	[6, 1, 1]	[purelin, purelin, purelin]	1	0–19	20	1
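The transfer functions named in Table 5 follow MATLAB naming: logsig is the logistic sigmoid, tansig is mathematically equivalent to the hyperbolic tangent, and purelin is the identity. The Python sketch below shows a forward pass through a network laid out like #1.1.1 (hidden layers [3, 3] with [logsig, tansig]). It is an illustration only, not the trained model: the random weights, the single linear output neuron, the normalized inputs, and the helper names `init_network` and `forward` are assumptions.

```python
import numpy as np


# MATLAB-style transfer functions named in Table 5.
def logsig(x):
    return 1.0 / (1.0 + np.exp(-x))


def tansig(x):
    return np.tanh(x)  # equals 2 / (1 + exp(-2x)) - 1


def purelin(x):
    return x


TRANSFER = {"logsig": logsig, "tansig": tansig, "purelin": purelin}


def init_network(n_inputs, layer_sizes, seed=0):
    """Random weights and zero biases for consecutive layers (illustrative only)."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs] + list(layer_sizes)
    weights = [rng.normal(scale=0.5, size=(n_out, n_in))
               for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n_out) for n_out in sizes[1:]]
    return weights, biases


def forward(x, weights, biases, transfer_names):
    """Propagate one input vector through the layers in order."""
    a = np.asarray(x, dtype=float)
    for W, b, name in zip(weights, biases, transfer_names):
        a = TRANSFER[name](W @ a + b)
    return a


# Four inputs (flows at M1, T1, T2, T3), hidden layers [3, 3], one output (flow at M2).
# The linear output layer is an assumption; Table 5 lists hidden layers only.
weights, biases = init_network(4, [3, 3, 1])
flow_inputs = np.array([0.2, 0.1, 0.05, 0.03])  # hypothetical normalized flows
print(forward(flow_inputs, weights, biases, ["logsig", "tansig", "purelin"]))
```

Note that a network using purelin in every hidden layer, as listed for #2.3.1, #2.3.2, and #3.1.1, reduces mathematically to a linear mapping of its inputs, assuming a linear output layer as well.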

Table 6. ANNs’ model performance during development using TRAIN data.

ANN No.	Indicator	Forecast Horizon, h	KGE	NSE	PFC	LFC
#1.1.1.	flow	15	0.98	0.97	0.17	0.17
#2.1.1.	flow	9	0.97	0.96	0.19	0.17
#2.1.2.	flow	12	0.89	0.97	0.26	0.15
#2.1.3.	flow	6	0.99	0.99	0.02	0.12
#2.1.4.	flow	6	0.99	0.99	0.14	0.24
#2.2.1.	flow	8	0.92	0.89	0.20	0.12
#2.2.2.	flow	12	0.94	0.91	0.20	0.12
#2.3.1.	flow	15	0.92	0.94	0.10	0.19
	SRP	15	0.78	0.69	0.17	0.33
	TP	15	0.45	0.48	0.44	0.37
#2.3.2.	flow	12	0.99	0.99	0.26	0.19
	SRP	12	0.74	0.61	0.19	0.32
	TP	12	0.73	0.53	0.41	0.34
#3.1.1.	flow	1	0.99	0.99	0.13	0.09

Table 7. ANNs’ forecast performance with the UNSEEN dataset.

ANN No.	Indicator	Forecast Horizon, h	KGE	NSE	PFC	LFC
#1.1.1.	flow	15	0.93	0.97	0.15	0.15
#2.1.1.	flow	9	0.96	0.97	0.19	0.18
#2.1.2.	flow	12	0.88	0.96	0.17	0.13
#2.1.3.	flow	6	0.89	0.85	0.38	0.19
#2.1.4.	flow	6	0.97	0.96	0.21	0.15
#2.2.1.	flow	8	0.91	0.85	0.21	0.18
#2.2.2.	flow	12	0.94	0.89	0.19	0.16
#2.3.1.	flow	15	0.74	0.70	0.44	0.28
	SRP	15	0.74	0.74	0.28	0.24
	TP	15	0.79	0.60	0.29	0.28
#2.3.2.	flow	12	0.79	0.70	0.29	0.38
	SRP	12	0.85	0.75	0.25	0.27
	TP	12	0.40	0.08	0.31	0.39
#3.1.1.	flow	1	0.99	0.99	0.13	0.11
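
One way to read Tables 6 and 7 together is to look at how much NSE each ANN loses when moving from TRAIN to UNSEEN data. The Python sketch below does this with the NSE values copied from the two tables. The 0.10 threshold and the names `NSE`, `DROP_THRESHOLD`, and `nse_drops` are hypothetical choices for illustration, not criteria used in the study.

```python
# NSE per (ANN No., indicator) as (TRAIN, UNSEEN), copied from Tables 6 and 7.
NSE = {
    ("#1.1.1", "flow"): (0.97, 0.97),
    ("#2.1.1", "flow"): (0.96, 0.97),
    ("#2.1.2", "flow"): (0.97, 0.96),
    ("#2.1.3", "flow"): (0.99, 0.85),
    ("#2.1.4", "flow"): (0.99, 0.96),
    ("#2.2.1", "flow"): (0.89, 0.85),
    ("#2.2.2", "flow"): (0.91, 0.89),
    ("#2.3.1", "flow"): (0.94, 0.70),
    ("#2.3.1", "SRP"): (0.69, 0.74),
    ("#2.3.1", "TP"): (0.48, 0.60),
    ("#2.3.2", "flow"): (0.99, 0.70),
    ("#2.3.2", "SRP"): (0.61, 0.75),
    ("#2.3.2", "TP"): (0.53, 0.08),
    ("#3.1.1", "flow"): (0.99, 0.99),
}

DROP_THRESHOLD = 0.10  # hypothetical cut-off, chosen only for this illustration


def nse_drops(table, threshold):
    """Return the entries whose NSE falls by more than `threshold` on UNSEEN data."""
    drops = {key: round(train - unseen, 2) for key, (train, unseen) in table.items()}
    return {key: drop for key, drop in drops.items() if drop > threshold}


flagged = nse_drops(NSE, DROP_THRESHOLD)
for (ann, indicator), drop in flagged.items():
    print(f"{ann} {indicator}: NSE drop of {drop:.2f}")
```

With the tabulated values, this flags the flow forecasts of #2.1.3, #2.3.1, and #2.3.2 and the TP forecast of #2.3.2, while the SRP forecasts of both #2.3 ANNs score higher on UNSEEN data than on TRAIN data.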

Table 8. A summary of the best-performing ANNs during development and evaluation.

ANN Number	Applicability and Observations
#1.1.1	A water flow forecast at M2 when a single set of measurements is available at M1 and in the main tributaries, at a resolution of 3 h. The 15 h forecast horizon covers the travel time along the stretch. Suitable when exact estimates of large peaks are not required.
#2.1.1, #2.1.2, #2.1.4	A water flow forecast at M2 when flows are monitored at M1 and in the tributaries; applicable to a well-monitored watercourse. Different data resolutions can be used (1 h for #2.1.4 and 3 h for the other two). Prediction performance is better than for #1.1.1, so these ANNs suit cases where a shorter forecast horizon (9 h, 12 h, and 6 h, respectively) is not problematic. Peak predictions are better than for #1.1.1 and #2.2.2.
#2.2.2	A water flow forecast at M2 with 12 h of anticipation based on hourly data at M1, when no data on the tributaries are available. Very good prediction of high peaks (up to 225 m³/s) and of flows under 50 m³/s. Reduced need for observations.
#3.1.1	A water flow forecast at M2 with 1 h of anticipation in situations without monitoring data in tributaries and at M2. Used when very accurate predictions are needed and a short forecast horizon (1 h) is sufficient. Trends are captured very well, and peak predictions are better than for the earlier ANNs. Reduced need for observations.
#2.3.1	Used when both water flows and concentrations (SRP and TP) need to be predicted. Requires data from the upstream end and from two of the main tributaries. The 15 h forecast horizon covers the travel time along the stretch.

M1, T1, T2, and T3 = observed water flow and concentrations of SRP and TP at the upstream end (M1) and in the tributaries (T1, T2, and T3).

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
