1. Introduction
Understanding the hydrological processes in large, open areas, such as catchments, and modelling these processes remain open research questions. In [1], the author discussed the issues with the invalidation of computer models from an environmental science perspective. In [2], Teng et al. reviewed state-of-the-art methods for flood monitoring and hydrodynamic models and concluded that no “perfect model” exists; there are still unanswered research questions that warrant addressing. The authors also note that recent improvements in remote sensing and the availability of data play a key role in the development of new models as well as in improving the accuracy of existing ones. Many advanced sensing and modelling techniques have been developed in recent decades, but we are still far from fully understanding the nature of hydrological processes, and we cannot measure everything that affects the hydrological environment. In fact, only a fixed range of measurements in time and space can be obtained through a limited range of techniques [3]. Traditionally, hydrologists have focused on building physical models. Building such a model, especially a generic model that can be easily transferred from catchment to catchment, is extremely challenging. Significant research effort has been devoted to quantifying the response to water inputs [4,5,6,7,8].
In recent years, many researchers have switched from this traditional perceptual-based practice to statistical-based approaches, also known as the data-driven approach [9,10,11]. It has been acknowledged that field data still holds the key to understanding the hydrological environment [3]. It has also been noted that there is a global shortage of hydrometric data and that this is hampering water management efforts [12,13]. The lack of data can be attributed to the high cost of ownership of hydrometric sensing equipment and global restrictions on funding. A significant reduction in metering density since the 1980s has been observed in Canada and the US [14]. This has been recognized by government bodies, such as the EU (EU data portal: https://www.europeandataportal.eu/en/), the US (U.S. Government’s open data: https://www.data.gov/) and the Irish (Ireland’s open data portal: https://data.gov.ie) governments, which have started releasing official data to the public. However, public data is often sparse (at both temporal and spatial scales) and may not provide adequate information at the regional level. This is mainly due to the high cost of deploying monitoring stations and of data collection procedures. In contrast, building, evaluating and testing hydrological models typically requires years of high-frequency datasets.
Recent advances in smart sensors, wireless data communications, cloud computing and machine learning have pointed hydrological research in a new direction. Low-power, self-contained sensor units have been released, new wireless communication methods have been standardized (e.g., LPWAN, 4G and 5G), and fast-growing cloud computing services have become available. New smart sensor network architectures for environmental monitoring tasks have been proposed [15,16]. One typical architecture of a future smart sensor network for environmental monitoring consists of three virtual layers: (1) a physical layer, where all the smart sensors reside and data pre-processing occurs, either on the sensor itself or on a field gateway; (2) a data transmission layer, where data and instruction exchange occur; and (3) a data processing layer, where meaningful information is extracted and organized.
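As a concrete illustration, the following Python sketch mirrors these three layers as plain functions. The names, the averaging step and the data shapes are illustrative assumptions only, not a prescribed implementation.

```python
# Illustrative sketch of the three-layer architecture described above.
from dataclasses import dataclass
from statistics import mean

@dataclass
class Reading:
    sensor_id: str
    level_m: float

# (1) Physical layer: the sensor (or field gateway) pre-processes locally,
# e.g., condensing a window of raw readings into one summary value.
def preprocess(window: list[Reading]) -> Reading:
    return Reading(window[0].sensor_id, mean(r.level_m for r in window))

# (2) Data transmission layer: only the condensed reading is exchanged.
def transmit(reading: Reading) -> Reading:
    return reading  # stands in for GSM/GPRS/LPWAN transport

# (3) Data processing layer: meaningful information is extracted/organized.
def process(reading: Reading) -> None:
    print(f"{reading.sensor_id}: {reading.level_m:.2f} m")

process(transmit(preprocess([Reading("dodder-01", 0.41),
                             Reading("dodder-01", 0.43)])))
```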
There are many definitions of “smart sensors” [17,18,19]. To summarize these definitions, however, a smart sensor must be intelligent and adaptable. In future large-scale sensor networks, e.g., a rapidly sampling hydrometric network, the data collected will probably be far too large for traditional applications to send, store or process. Moreover, most of the measurements carry negligible information, as in most environmental monitoring networks “normal” behaviour is not particularly interesting to researchers. Thus, a sensor unit must be intelligent and pre-process data locally on board (or on field gateway devices, depending on the sensor network structure), so that only condensed, useful information is uploaded to the server. The sensor must also adapt itself to handle variations in its environment. For instance, the sensor should update its on-board data model automatically if the physical condition of the river channel changes, or adjust its alerting threshold when measurements become noisy, e.g., after heavy rainfall or a storm event.
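To make the adaptability idea concrete, here is a minimal sketch of a noise-adaptive threshold: the alert threshold widens as the recent readings become noisier. The window size, base threshold and scaling factor are assumptions for illustration.

```python
# Minimal sketch: widen a change-alert threshold when readings get noisy.
from collections import deque
from statistics import pstdev

class AdaptiveThreshold:
    def __init__(self, base_m=0.10, k=3.0, window=96):  # 96 = 24 h at 15 min
        self.base = base_m          # quiet-conditions threshold (m)
        self.k = k                  # how strongly noise widens the threshold
        self.recent = deque(maxlen=window)

    def update(self, level_m: float) -> float:
        """Record a reading and return the current change threshold."""
        self.recent.append(level_m)
        noise = pstdev(self.recent) if len(self.recent) > 1 else 0.0
        return self.base + self.k * noise  # widen the threshold when noisy
```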
In this work, we first provide a general overview of previous and current developments in the hydrology and machine learning research domains and their limitations. Subsequently, the catchment used and the sensors deployed for this study are described. The performance of the deployed off-the-shelf sensor units is evaluated against the high-precision, high-cost reference stations operated by South Dublin Council. These affordable sensor units provide an opportunity to bridge the gap between data-driven models and the availability of the fine-grained “big” datasets they require. In addition, a data-driven smart sensing method, MoPBAS [20] (which could potentially be built into the sensing units), is illustrated, and its strengths and weaknesses are discussed as an introduction to the smart sensing concept. Moreover, events are constructed from the anomalies detected by the MoPBAS method and mapped to the storm events that occurred in the catchment during the deployment period. Finally, the lag time of the River Dodder is analyzed. To achieve this, we propose and evaluate an automated end-to-end pipeline (as shown in Figure 1) from data capture to information extraction.
The main contributions of this work are: (1) establishing the performance of an off-the-shelf, low-cost liquid level sensor, designed for operation in a closed tank environment, when deployed in an open water environment, and (2) using the collected data to demonstrate a fully automated pipeline of data collection, data processing and information extraction in a catchment area. This can significantly reduce the costs of deploying a water level monitoring system at high spatial and temporal scales and potentially assist hydrologists in better understanding and managing catchments. To support reproducible and collaborative research, the collected dataset and the source code of this work are publicly available at https://github.com/DianZhang/WaterLevelMonitoring (for research purposes only).
The remainder of the paper is organized as follows: Section 2 presents an overview of previous and current developments in hydrology and machine learning. Section 3 describes the test site, and Section 4 describes the sensors deployed and the data captured. Section 5 presents a computationally inexpensive smart sensing method, used in conjunction with the fixed threshold alerting built into the sensor. The results obtained are then discussed, and, finally, Section 8 contains the conclusions and future work.
2. Literature Review
The cost of simple telemetered river gauges in Ireland has been estimated at up to €15,000 per installation and up to €5,000 per annum for ongoing operation and maintenance [21]. Similar estimates have been reported in the USA [22]. Low-cost alternatives based on low-power Wireless Sensor Networks (WSNs) with commercial off-the-shelf sensors have been developed and tested in locations including São Paulo, Brazil [23], the Sierra Nevada Mountains [24] and the Upper Hudson River, New York [25]. These studies mainly focused on addressing the power and communication issues around distributed hydrometric monitoring, which typically requires fixed grid power supplies or solar power installations; the resulting systems are not yet commercially available. The Kingspan Watchman Anywhere is a complete off-the-shelf solution, proven in the field of tank level monitoring. It is a simple and robust ultrasonic sensor with an integrated battery (4 × LR14 Alkaline C batteries) and tri-band GSM/GPRS telemetry. The cost of the sensors is approximately €180 (including VAT and delivery) per unit (price from: https://heatingpartswarehouse.co.uk/product/watchman-anywhere-sonic-oil-level-monitor/?gclid=EAIaIQobChMIysCugebp4QIVaL7tCh1BWAicEAYYAiABEgJSFvD_BwE last accessed: 11 April 2019). The sensor unit includes a one-year free data communication subscription, which costs €30 per sensor per year thereafter. It can be installed quickly and easily, and for the duration of the field test in this work (27 months), no maintenance was required. With smart communication management and power saving, the operational life can extend to 5 years. When adapted to river level monitoring, this offers a low-cost alternative to traditional hydrometric stations and allows for the cost-effective installation of high-spatial-density networks.
A catchment is defined as a specific segment of the earth’s surface, set off from adjacent segments by a more or less clearly defined boundary, and occupied at any given time by a particular grouping of plants and animals [26]. Traditional hydrological models at the catchment scale are physically based (also known as process-based) and date back to the 1960s [27]. Many approaches have been proposed since [28,29,30,31]. However, the initial optimism around physically based methods has been challenged by the scientific community [32,33,34,35,36]. It has been argued that there are fundamental problems in the application of physically based models for practical prediction in hydrology, and that these problems result from limitations of the model equations relative to a heterogeneous reality. Today, understanding and modelling the hydrological processes in large areas are still open research questions. Many models have been proposed and evaluated. One of the most widely applied is SWAT (the Soil and Water Assessment Tool) [37,38]. It is a comprehensive model whose development is still ongoing. However, the model requires a diversity of information in order to produce outputs, and significant effort is required to configure, calibrate, run and evaluate it. For example, the input/output documentation for the SWAT 2009 model runs to over 600 pages (39 chapters) (https://swat.tamu.edu/media/19754/swat-io-2009.pdf last accessed: 20 April 2019). Other hydrological models, such as MIKE SHE [29] (and its variations [39]), FEFLOW [40], MODFLOW [41] and HydroGeoSphere [42], have also been applied in the literature. However, similar to SWAT, these models require extensive data and input parameters, which are sometimes unavailable. This makes it difficult to calibrate a model and often results in unreliable outputs. Both meteorological data and soil properties also have a large influence on the performance of these models, and proper knowledge of subsurface flow pathways and hydraulic characteristics is necessary; otherwise, a poorly calibrated model may perform badly [43]. Ref. [44] provided an excellent review of current developments and challenges in hydrology, and the author also gave his vision of future trends in hydrological modelling.
On the other hand, recent developments in IoT [45,46] and cloud computing technologies [47,48] have enabled automated data capture, transmission and processing on a massive scale. Microsoft Azure, IBM Watson, SAP Leonardo, Amazon AWS, etc., all provide platform-as-a-service (PaaS) offerings for IoT applications, enabling rapid development of the back-end for smart remote monitoring systems. In terms of data analysis, recent advances in machine learning, especially deep learning, have achieved near-human performance in applications such as object detection in images [49,50], image captioning [51,52] and machine translation [53,54]. However, these models, successful in their own domains, are difficult to adapt to catchment modelling due to its complexity, spatial heterogeneity and lack of data. Deep models contain millions of parameters (e.g., AlexNet: 63 million [49]; VGG16: 138 million [55]), which require a massive dataset to train and evaluate (AlexNet and VGG16 are trained on the ImageNet dataset, which contains over 14 million annotated images [56]). Collecting a dataset at such a scale for catchment monitoring is not feasible.
In contrast, many researchers in the environmental science domain are still focused on developing new sensors to measure the physical properties [57,58,59,60] or bio-chemical properties [61,62] of a water body. Much research has also been carried out from the catchment management perspective. Rather than building complex models and simulations, simple real-time, or near-real-time, data-driven monitoring systems at key locations have been proposed and evaluated [63,64,65]. Research has also focused on the fusion of multiple sensing modalities, combining complementary information from various data sources to provide higher-level information for further analysis and decision support [66].
3. The Deployment Site
The Dodder river originates in the Dublin mountains to the south of Dublin City, flowing through the towns of Churchtown and Dundrum before joining the River Liffey at Dublin port and then entering Dublin Bay. The area of the catchment is 142.4 km² [67]. The Dodder has five major tributaries, including the Tallaght Stream, the Owendoher Stream, the Whitechurch Stream, the Little Dargle and the Dundrum Slang, which together contribute almost 50% of the flow. Upstream from the confluence with the Tallaght Stream, there are two storage reservoirs (Glenasmole Reservoir Upper and Glenasmole Reservoir Lower), which hold 1.6 and 0.73 million m³, respectively. The larger reservoir is used to supply drinking water to County Dublin, while the smaller one is used to maintain a minimum flow in the Dodder. In advance of heavy rainfall, the water level of the lower reservoir is reduced to provide storage capacity. The catchment, especially the lower Dodder, is known to be at risk of flooding due to its large change in elevation over a short distance (160 m at the lower reservoir to sea level over 13.5 km measured in a direct line). An overview of the catchment area is shown in Figure 2. Three major flood events have been recorded, in Aug 1986, Feb 2002 and Oct 2011, with 369, 621 and 335 dwellings, respectively, reported flooded in the catchment. Thus, due to its economic importance and high flood risk, the Dodder catchment was selected as a suitable test site for this study.
4. Sensor Deployment
The sensors deployed in this case study were off-the-shelf Kingspan Watchman Anywhere Pro ultrasonic sensors, designed for monitoring liquid levels in tanks. Liquids such as diesel, AdBlue and lubricant additives have been monitored successfully using these sensors for periods of over 5 years. The relatively low cost (almost 100 times cheaper than the reference stations operated by Dublin City Council, which cost over €15k to construct and approximately €5k to operate and maintain annually, as stated in Section 2) enables the monitoring of a catchment at a much higher spatial density than is feasible with traditional, more expensive, monitoring stations. The sensor unit is capable of two-way communication, which allows measurements to be sent to a cloud server and the unit to receive instructions remotely. The sensor unit itself consists of an ultrasonic transducer, a tri-band GSM/GPRS module (LPWAN and 5G in a future version, according to Kingspan’s sensor development division), four type C LR14 Alkaline batteries, a control board and a UV-stabilized polypropylene housing (as shown in Figure 3, left). The full specification of the sensor unit can be found here (https://www.kingspan.com/irl/en-ie/product-groups/service-and-telemetry/telemetry/commercial-level-monitoring/watchman-anywhere-pro last accessed 24 April 2019). A sample installation is shown in Figure 3, right, where the sensor unit is screwed into the wall on top of a stilling tube, which calms the water surface and prevents interference in the signal path from intruders, such as spiders. An ultrasonic signal is emitted, then reflected as an echo from the water surface before being captured by the receiver. The time interval between emission and reception is used by the sensor to calculate the distance to the water surface, which is converted to a water level by subtracting it from the distance between the sensor and the river bottom (measured during installation).
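In other words, the level is computed as level = d_bottom − c·Δt/2, where c is the speed of sound in air and Δt the echo round-trip time. The short Python sketch below illustrates this arithmetic; the speed-of-sound constant and the function names are illustrative assumptions (the sensor performs this calculation internally).

```python
# Illustrative sketch of the ultrasonic level calculation described above.
SPEED_OF_SOUND_M_S = 343.0  # in air at ~20 degrees C; varies with temperature

def water_level_m(echo_time_s: float, sensor_to_bottom_m: float) -> float:
    """Convert an ultrasonic echo round-trip time to a water level.

    echo_time_s: round-trip time between emission and echo reception.
    sensor_to_bottom_m: distance from sensor to river bottom, measured
        once during installation.
    """
    distance_to_surface = SPEED_OF_SOUND_M_S * echo_time_s / 2.0  # one-way
    return sensor_to_bottom_m - distance_to_surface

# Example: a 20 ms echo with the sensor mounted 4.2 m above the river
# bottom gives a water level of about 0.77 m.
print(water_level_m(0.020, 4.2))
```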
In comparison with traditional monitoring stations, the Kingspan solution has several advantages:
Easy to deploy, with a 30-min average install time.
Easy to upgrade, via remote firmware upgrades or settings updates.
No mains electric power required, giving safety and civil cost benefits.
Field proven, with over 30 thousand units installed for fluid level monitoring in 24 countries globally.
Built-in alert when rapid changes are detected.
No additional maintenance required.
A total of 11 sensor units were originally deployed in the South Dublin region. The geolocations of the installations are shown in Figure 4. Nine of these units were along the River Dodder and its tributaries. One unit was vandalized immediately after installation; thus, its location is not shown. The Bohernabreena and Clonskeagh Bridge units on the River Dodder and two further units (Gandon Close on the Poddle River and Lady’s Lane on the Camac River) were co-located with existing hydrometric stations belonging to Dublin City Council (DCC) in order to validate the performance of the sensors. Detailed deployment site information is listed in Table 1. Unfortunately, the units at Brehons Chair and Edmondstown were vandalized a few months after being installed; thus, the dataset collected from these units is not included in this study.
Data Captured
Originally, the sensors were set to take hourly readings. However, after comparison with the DCC reference stations, it was found that these measurements did not capture rapid variations at multiple sites. Therefore, the sampling interval of all the sensors was reduced to 15 min on 17 November 2015 by sending instructions remotely from the central control server. To optimize battery life, each sensor stored its measurements and sent one data package every four hours unless the built-in alert level was breached, in which case the sensor sent the data immediately. During the 27-month deployment (since the increase of the sampling rate), a total of 624,276 readings were received from the eight sensor units, while 1716 values were lost. A summary of the dataset is given in Table 2. The distributions of all sensor readings are shown in Figure 5.
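This store-and-forward behaviour can be summarized by the following sketch, in which readings are buffered and flushed every four hours, or immediately when the built-in alert level is breached. Function and variable names are illustrative assumptions, not the vendor’s firmware API.

```python
# Sketch of the store-and-forward batching policy described above.
from datetime import datetime, timedelta

BATCH_INTERVAL = timedelta(hours=4)     # normal upload cadence
SAMPLE_INTERVAL = timedelta(minutes=15)

buffer = []
last_upload = datetime(2015, 11, 17)

def on_sample(timestamp: datetime, level_m: float, alert_level_m: float):
    """Buffer a reading; upload every 4 h, or immediately on an alert breach."""
    global last_upload
    buffer.append((timestamp, level_m))
    breached = level_m >= alert_level_m
    if breached or timestamp - last_upload >= BATCH_INTERVAL:
        upload(buffer)          # send the buffered readings over GSM/GPRS
        buffer.clear()
        last_upload = timestamp

def upload(readings):
    print(f"uploading {len(readings)} readings")
```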
5. Smart Sensing
The sensor has a built-in fixed dangerous-level alerting mechanism. Three dangerous levels were set individually for each of the sensor units. An alert message, green, amber or red, was sent to the corresponding operator when any of the thresholds was breached. Between deployment and 8 February 2018, a total of 115 alerts were received.
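This fixed alerting logic amounts to comparing each reading against the three per-site thresholds and reporting the highest level breached, as in the following sketch (the threshold values shown are hypothetical):

```python
# Illustrative sketch of the fixed three-level alerting described above.
def classify(level_m: float, green_m: float, amber_m: float, red_m: float):
    """Return the highest alert level breached, or None."""
    if level_m >= red_m:
        return "red"
    if level_m >= amber_m:
        return "amber"
    if level_m >= green_m:
        return "green"
    return None

# Example with hypothetical thresholds for one site:
print(classify(1.8, green_m=1.2, amber_m=1.6, red_m=2.0))  # -> "amber"
```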
To convert the units to smart sensors, we investigated a data-driven anomaly detection method. To prove the concept, the data-driven MoPBAS [20] anomaly detection method was applied to the captured data. As described in [20], the method is specifically designed for environmental sensors and has several advantages, including the following (a simplified sketch of the mechanism is given after the list):
Low hardware requirements (the algorithm can be built into the sensor).
Computationally inexpensive (anomalies can be detected in real time).
Only a small training data set is required (a model can be built as soon as a small set of data, e.g., 50 readings, has been received).
Easy to tune (initial parameters can be set based on a site survey).
Dynamic modelling (each model is trained on the data captured by its own sensor and therefore reflects the variation of the measurements at that site).
Dynamic updating (the model is updated in real time as new data arrive).
Dynamic thresholding (the detection threshold is constantly updated based on the variation of the measurements).
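The following is a minimal, single-channel sketch in the spirit of the dynamic-model, dynamic-threshold mechanism described above. It is not the published MoPBAS implementation; all parameter names and values are illustrative assumptions.

```python
# Single-sensor sketch of a dynamic-model, dynamic-threshold detector.
import random

class AnomalyDetector:
    def __init__(self, n_samples=50, min_matches=2, r_init=0.05,
                 r_scale=5.0, r_rate=0.05, update_prob=0.1):
        self.model = []              # recent "normal" readings
        self.n = n_samples
        self.min_matches = min_matches
        self.r = r_init              # per-sensor dynamic threshold (m)
        self.r_scale = r_scale
        self.r_rate = r_rate
        self.update_prob = update_prob

    def step(self, level_m: float) -> bool:
        """Return True if the reading is anomalous; update model and threshold."""
        if len(self.model) < self.n:          # bootstrap from the first N readings
            self.model.append(level_m)
            return False
        dists = sorted(abs(level_m - s) for s in self.model)
        matches = sum(d < self.r for d in dists)
        anomalous = matches < self.min_matches
        # Adapt the threshold toward a multiple of the typical distance,
        # so noisier sites automatically get a wider tolerance.
        d_min_avg = sum(dists[: self.min_matches]) / self.min_matches
        self.r += self.r_rate * (self.r_scale * d_min_avg - self.r)
        self.r = max(self.r, 0.01)            # keep a small floor on the threshold
        if not anomalous and random.random() < self.update_prob:
            self.model[random.randrange(self.n)] = level_m  # random-in-time update
        return anomalous

# Example on a synthetic stream: steady levels, then a sudden rise.
det = AnomalyDetector()
stream = [0.40 + random.gauss(0, 0.01) for _ in range(200)] + [0.90]
flags = [det.step(x) for x in stream]
print(flags[-1])  # the sudden rise is flagged as anomalous
```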
Once the anomalies are detected, they are grouped into events based on their temporal information; contiguous anomalies are considered part of the same event.
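Grouping contiguous anomalies into events can be done with a single pass over the anomaly timestamps. In the sketch below, the maximum allowed gap (one 15-min sampling interval) is an illustrative assumption.

```python
# Sketch: group temporally contiguous anomalies into (start, end) events.
from datetime import timedelta

def group_events(anomaly_times, max_gap=timedelta(minutes=15)):
    """Group sorted anomaly timestamps into (start, end) events."""
    events = []
    for t in sorted(anomaly_times):
        if events and t - events[-1][1] <= max_gap:
            events[-1][1] = t            # extend the current event
        else:
            events.append([t, t])        # start a new event
    return [tuple(e) for e in events]
```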
8. Conclusions
In this work, we first evaluated the performance of the low-cost, off-the-shelf Kingspan Watchman Anywhere Pro sensors for open water level monitoring. The performance is very promising, indicating that this self-contained unit can measure the water level of an open water body almost as accurately as traditional, sophisticated stations at a small fraction of the cost. This also indicates that the collected dataset is valid and provides sufficient information for the subsequent analysis.
In addition, the proposed end-to-end pipeline was evaluated using the collected dataset as a case study, illustrating a fully automated data collection, transmission and information extraction system. We demonstrated a data-driven anomaly detection method that automatically adapts to the variation at each site. This opens up the possibility of creating a large-scale, data-driven smart water level sensor network that adapts to the characteristics of the target sites.
As an example of the utility of the system, the lag time along the River Dodder was analyzed, and the results show that there was no significant lag time along the river. The likely reason is that rapid changes in the water level measurements were caused only by heavy rainfall, which generally covers the whole catchment; water enters the river through surface run-off, drainage systems, etc., concurrently along the whole river channel, so the water levels at all the deployed locations rise almost simultaneously. Finally, the rainfall values from all major storm events during the test period were also analyzed. The results show that all the rapid changes in water level following storm events with heavy rainfall were successfully detected. The benefit of coupling anomaly detection, and the subsequent construction of abnormal events, with storm events is that it provides large-scale, fine-grained water level responses to heavy rainfall at a local level, which can subsequently contribute to data-driven rainfall modelling. The data-driven anomaly detection method combined with the self-contained sensor unit provides a fully automatic, end-to-end pipeline. This pipeline offers the opportunity to automatically extract high-level information at high spatial and temporal scales, which can further assist hydrologists in better understanding the hydrological processes in large areas.
Recent developments in computer science (such as deep learning), especially in the big data analysis domain, have shown significant improvements in performance, achieving near-human performance in many applications (e.g., image captioning and object detection in images) and exceeding human performance in others (e.g., AlphaGo). As future work, we will adapt these successful methods to the catchment monitoring domain, either by fine-tuning an existing model or by retraining a model from scratch. LSTM (long short-term memory), a type of recurrent neural network (RNN), has shown excellent accuracy in predictive and time-series applications, such as machine translation, and will be evaluated using the dataset collected in this work.