1. Introduction
Estimating rainfall in urban areas is essential for water management tasks such as rainfall–runoff (R-R) modelling, stormwater management, and flash flood nowcasting [1]. Accurate rainfall data are therefore essential for authorities managing water resources and floods in urban areas. Furthermore, urban catchments are known for their high spatial variability and relatively short catchment response times [2]. Thus, rainfall estimation in an urban catchment requires high spatial and temporal data resolution to better represent the hydrological processes in the catchment [2].
To date, several techniques have been developed for measuring rainfall, such as conventional rain gauges, weather radars, and satellite imagery, all of which are well accepted and widely utilised [1,3]. However, a universal methodology for rainfall estimation remains elusive, as each of these techniques, though advantageous in some respects, is limited in others [4]. For instance, conventional rain gauges provide only a direct point measurement of the rainfall reaching the ground surface. Although rain gauges offer reliable temporal resolution, deploying enough of them over a catchment to achieve the desired spatial resolution may not be feasible because of hardware maintenance and operating costs [5].
On another front, weather radars and satellite imagery are indirect areal rainfall estimation techniques that capture rainfall's spatial and temporal characteristics far better than rain gauges. Yet radar rainfall estimates are affected by multiple error sources and require constant calibration to remain reliable [6,7]. Satellite imagery, in turn, performs poorly in specific settings such as coastal regions, snow-covered areas, valleys and mountains, and hills with shallow clouds [8]. Additionally, satellites provide a coarse spatial resolution (e.g., 25 km × 25 km), which downscaling methods can improve to about 1 km × 1 km at best [9]. These limitations highlight potential inaccuracies in capturing the localised precipitation events that may trigger flash floods.
Given these limitations, constant efforts have been made to develop alternative rainfall-sensing approaches, such as telecommunication networks, citizen science, acoustic sensors, and image sensors. These alternative methods provide low-cost observations that complement the data measured by existing rainfall estimation techniques.
Among these alternative approaches, citizen science stands out owing to its successful application to quantitative rainfall measurement in the US over the last two decades through the CoCoRaHS project [10]. The CoCoRaHS project was established in 1997; participating citizens are equipped with a mini rain gauge to collect daily rainfall estimates. These rain gauges can be installed in backyards or any open area, and citizens only need to read the measurements and upload them to a website, where all readings are collected and presented for different agencies to utilise. So far, more than 20,000 participants have been involved in the project. Several efforts followed the success of CoCoRaHS, involving regular public rain data collection across North America as a prime example of shared data.
However, citizen science rain gauges, while easily installed in rural areas and at townhouses, are impractical in urban areas with tall buildings, and they provide only a single daily rainfall measurement. This highlights the need for new techniques that citizens can adopt with greater ease and acceptability to collect sub-daily rainfall measurements in urban areas for use in hydrological modelling applications. Nowadays, most urban citizens carry smartphones with internet connections, and most buildings have surveillance cameras; this study therefore proposes using smartphones and surveillance cameras as image sensors for measuring rainfall, with the notion that such data collection can be conducted through citizen science. Rainfall predicted by this method will complement existing measurement techniques, such as rain gauges and radars, by offering additional spatial coverage. Images were selected in this study because of the constant development of, and growing attention in the literature towards, image sensors and image processing techniques for rainfall sensing.
A few studies share our objective of measuring rainfall using camera-based techniques. For instance, Allamano et al. [11] employed a statistical framework based on fundamental camera optics to estimate rainfall intensity from images captured by smartphones. The framework comprises five processing phases: (1) drop detection, (2) blur effect removal, (3) estimation of drop velocities, (4) drop positioning in the control volume, and (5) rain rate estimation. The drop detection approach was based entirely on identifying the brightness threshold in a camera setting that allows drop size detection. The study used 104 min of data to develop the framework. The results were compared with tipping-bucket rain gauge data, giving a root mean square error of 3.01 mm/h.
Similarly, Dong et al. [12] developed a rainfall intensity estimation technique using video recordings. The methodology identifies the raindrop size distribution (DSD) in a video frame (image) and estimates rainfall intensity by fitting the DSD to a gamma distribution model. It comprises two stages: (1) extracting grey-tone features from images to detect the presence of raindrops and (2) extracting the average colour tensor and average intensity difference features for each raindrop, to focus on the raindrops and eventually calculate the diameter of each focused raindrop. The authors highlighted that the main challenge with this methodology is finding the best camera settings, such as focal length and focal plane, to effectively capture small drop sizes in an image. The algorithm was compared with pluviometer rain gauge data (9 min) for three rainfall intensities (low, moderate, and heavy rain) and showed acceptable agreement between the two methods.
Jiang et al. [13] attempted to improve on the work of Allamano et al. [11] and Dong et al. [12] and to apply it to surveillance cameras with real-world backgrounds (moving cars). The enhanced algorithm adds a removal framework that eliminates unfocused raindrops and drops with unsatisfactory size–velocity relationships. The authors used a total of 403 min of data to develop the algorithm, which achieved a mean absolute percentage error (MAPE) of 21.8%, slightly better than the previously published methods of Allamano et al. [11] (26.0%) and Dong et al. [12] (31.8%).
Yin et al. [14] utilised state-of-the-art convolutional neural networks (CNNs) to estimate rainfall intensity from captured images. The authors used a pre-trained model (ResNet) with transfer learning to develop the irCNN model (a rainfall estimator from images). Ground-truth rainfall values were collected using a tipping-bucket rain gauge with 1 min and 0.1 mm resolution. The model was trained on a synthetic surveillance camera dataset of 4000 images and tested on a smartphone dataset (918 one-second frames) collected on the campus of Zhejiang University, China, yielding a MAPE of 18.5%. The same model was then trained on a real-time surveillance camera dataset of 7117 images from six rainfall events, yielding a MAPE of 16.5%. The authors further tested the model with an event-based training approach, selectively changing the number of surveillance camera events used for training and testing, which resulted in an average MAPE of 21.9%.
Recently, Wang et al. [15] proposed a near-infrared surveillance-video-based rain gauge using a one-dimensional convolutional neural network (1D CNN). The model applies a rain streak extraction algorithm and uses the extracted feature as the input to the CNN. A total of 4368 min of data was used for model development, collected with a siphon rain gauge reporting rainfall every 0.1 mm at 1 min resolution. The developed model showed varying performance, with mean absolute error (MAE) ranging from 8.86 to 84.84 mm/h. The authors noted that the model is recommended only where the lighting conditions of the surveillance area and the cameras' main parameters do not differ significantly from those of the camera used in their study.
The literature reviewed above thus shows successful efforts toward developing image-based rainfall estimators. However, those studies used a limited number of rainfall events and focused more on surveillance cameras than on smartphones. In addition, the AI-based techniques in past studies can still be improved to yield more accurate rainfall estimates. Therefore, this study makes three contributions: (1) extending the input images from surveillance cameras alone to a mix of smartphones and surveillance cameras, (2) collecting an extensive dataset (one of the largest published in the literature) of observed rainfall and its corresponding images captured by smartphones and surveillance cameras to better train the AI-based model, and (3) introducing a CNN algorithm that applies state-of-the-art image processing techniques before feeding images to the CNN model.
In this study, it is hypothesised that, because smartphones and CCTV cameras on the market differ widely in their ability to capture HD images, the representation of rainfall in images would vary between devices and could severely degrade the CNN model's performance in translating rainfall images into rainfall intensities. This study therefore examined the ability of image pre-processing techniques to achieve reliable performance on diverse data for future implementations of image-based rainfall-sensing applications.
The image-based rainfall estimation technique developed in this study has the potential to be used in citizen science applications. It could enhance the spatial coverage of rainfall data in urban areas and thereby improve the spatial representation of rainfall for stormwater network design and rainfall–runoff studies, providing greater confidence in the models as a further complementary source of data.
2. Materials and Methods
The methodology of this study comprises four main stages. Stage 1 focuses on collecting rainfall images from different sources (i.e., surveillance cameras and smartphones) to conduct rainfall image analysis using an event-based approach. Stage 2 compares five image pre-processing techniques based on CNN model performance on the surveillance camera dataset. Stage 3 evaluates the model developed on the surveillance camera dataset using both the surveillance camera and smartphone datasets. The final stage (Stage 4) improves the CNN model by retraining the surveillance camera CNN model with a transfer-learning technique [16] to adapt it to smartphone data.
The basic idea of the suggested method comprises both CNN model development and rainfall image capture. When it rains, rain images are collected from already-installed sensors, such as cameras, which are common in urban areas. The suggested CNN model is then used to estimate rainfall intensity from these images. The rainfall intensities estimated by the CNN model at many locations yield rainfall data of high spatiotemporal resolution.
2.1. Study Site and Data Collection
The study site and its surroundings are located at Monash University's Malaysia campus (see Figure 1). Data were collected at different locations with different spatiotemporal characteristics to build a diverse rain image dataset. Rainfall data were captured using a tipping-bucket rain gauge at 1 min resolution, and rainfall images were captured using both a surveillance camera and smartphones. Data collection of rain and its corresponding images took place between May and December 2022, covering Malaysia's southwest (May–September) and northeast (November–March) monsoon seasons [17].
2.1.1. Rain Data Collection
Since the extracted snapshots (rain images) were captured at one frame per second, the 1 min rainfall intensity data were linearly interpolated to assign a specific rainfall intensity to each image frame. In this process, the rainfall intensities recorded at two consecutive minutes were used to interpolate values for the frames captured between the centroids of those minutes.
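As a minimal illustration of this interpolation step (a sketch in Python with NumPy; the intensity values and array names are assumptions, not the study's actual records):

```python
import numpy as np

# Minute-by-minute intensities from a tipping-bucket gauge (mm/h);
# the values here are illustrative only.
minute_intensity = np.array([2.4, 6.0, 10.8, 7.2])             # minutes 0..3
minute_centroids = np.arange(len(minute_intensity)) * 60 + 30  # centre of each minute (s)

# One frame per second over the same period.
frame_times = np.arange(0, len(minute_intensity) * 60)         # seconds

# Linear interpolation between consecutive minute centroids;
# np.interp holds the boundary values constant outside the centroid range.
frame_intensity = np.interp(frame_times, minute_centroids, minute_intensity)
```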
2.1.2. Rain Images from Surveillance Cameras and Smartphones
The first and most important task in developing a reliable deep-learning model is obtaining sufficient high-quality data. This study uses natural rainfall data, and from a hydrological point of view rainfall is inherently imbalanced: low-intensity rainfall is observed far more often within an event than high-intensity rainfall. The collected dataset reflects this fact, and the authors intentionally kept it as is to encourage acceptance among hydrologists, who tend to be wary of methods that rely on synthetic data.
In this study, an outdoor solar-powered surveillance camera was installed approximately 700 m from the rain gauge station. The video files had a maximum resolution of 1080 pixels and were in mp4 format. The surveillance camera recorded 805 min of video during rainfall events from 1 May 2022 to 31 December 2022, depicting a variety of rainfall events with different intensities and durations. Data were collected with a fixed background to better show how rain affects image characteristics. In total, 6121 images were extracted from these video recordings. Additionally, rainfall images were collected using smartphones at a few points on the Monash University campus around the rain gauge station (see Figure 1) during rainfall events between May and December 2022, yielding 1984 rainfall images covering various rainfall intensities and durations. Figure 2 shows sample rainfall images taken by smartphones during the data collection period.
The dataset was taken from two locations, as shown in Figure 1. Location 1 (labelled the CCTV station) was used mainly for CCTV dataset collection, with one CCTV camera set in a static position with a fixed view angle. Location 2 (labelled the smartphone stations) was where the smartphone dataset was captured using one mobile phone, with photos taken at different points within that location (or site). The geographical setting was urban, with typical city scenery, such as buildings, trees, and streets, in the background (see Figure 2).
Weather conditions ranged from cloudy to slightly cloudy, and most of the photos in the datasets were taken during the daytime. As a pilot study, we intentionally limited the diversity of the dataset to explore the possibility of achieving a working model; the next increment of this study will challenge the model further and expose its limitations.
Figure 1. Locations of rainfall images captured using surveillance cameras and smartphones during rain events near the Monash University campus in Malaysia.
Figure 2. Sample images captured by smartphone on campus during rainfall events.
2.2. Image Pre-Processing
Image pre-processing is essential after gathering the image records and rain gauge readings. Real-world raw images often lack focus on particular features or trends; therefore, once collected, pre-processing the images into a simpler, more meaningful format for deep-learning applications is an essential step. Image thresholding effectively and efficiently reduces the number of elements and locates objects in complex images; such pre-processing prevents the CNN model from misinterpreting or favouring specific images during learning [18]. This study examines four methods for highlighting important rainfall features in an image and potentially improving the model's performance in estimating rainfall intensity: image sharpening, pixel intensity, Otsu's method, and Yen's method [19,20,21].
In the current study, the choice of pre-processing methods was based on the most widely accepted methods in the literature, so that future research can easily replicate the study. Moreover, more advanced texture descriptors were avoided because they treat rainfall as noise and remove it from the image, risking over-processing or loss of the rainfall signal. Such descriptors shall be considered and analysed for rainfall image sensing in future studies.
Image sharpening is a crucial image processing step that enhances the perceived sharpness of an image. Digital cameras often incorporate sharpening algorithms, and professional photographers use the same methods to improve the quality of their images [21]. The most widely used and significant characteristic for categorisation is the pixel intensity value, the basic information held within each pixel [22]. A greyscale image holds one intensity value per pixel, whereas a colour image holds three (one per channel). Rain affects an image by changing pixel intensities: background pixels covered by raindrops show changes in intensity.
Otsu's thresholding distinguishes objects from the image background by assigning an intensity threshold T so that each pixel can be categorised as a point in the background or a point on the object [23]. Otsu's approach chooses the appropriate threshold separating foreground pixels from background pixels in images whose two pixel classes support a bi-modal histogram. For each 2D image, the method builds a histogram and, for a candidate threshold, calculates the weights, means, and variances of the background and foreground pixels. Yen's thresholding method, in contrast, was established to segment images using automatic multilevel thresholding, producing a 2D image that facilitates separating the foreground from the background [19].
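As a rough illustration of how these thresholding steps can be applied (the study itself performed pre-processing within KNIME; the Python sketch below uses scikit-image, and the file name and parameter values are assumptions):

```python
from skimage import io, color
from skimage.filters import threshold_otsu, threshold_yen, unsharp_mask

frame = io.imread("rain_frame.png")   # hypothetical extracted video frame
grey = color.rgb2gray(frame)          # one intensity value per pixel

# Global thresholds: pixels brighter than the threshold become foreground.
otsu_mask = grey > threshold_otsu(grey)
yen_mask = grey > threshold_yen(grey)

# Sharpening (unsharp masking) before thresholding, in the spirit of the
# combined pre-processing used for Models 3 and 4 below.
sharpened = unsharp_mask(grey, radius=2, amount=1.5)
combined_mask = sharpened > threshold_yen(sharpened)
```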
After reviewing the pros and cons of these four image processing techniques, this study adopted four combinations for developing CNN-based rainfall estimation models: Model 1 using Otsu's threshold method, Model 2 using Yen's threshold method, Model 3 combining Yen's method with sharpening and pixel intensity, and Model 4 combining Otsu's method with sharpening and pixel intensity. The sharpening and pixel intensity techniques are not used alone because the overall performance of Otsu's and Yen's methods is consistently better in the literature [19,21].
2.3. CNN Model Development
CNNs are a deep-learning method initially developed for image classification and object detection. They have shown excellent performance in image recognition, image quality improvement, object identification, and rain image analysis [23,24]. These studies have led to rain filters that remove rain from images; owing to that original purpose, they focus mostly on rain removal and image restoration. This study instead uses a CNN model framework fine-tuned and improved for estimating rainfall from rain images.
To the authors' knowledge, regression CNN models that take images as inputs are scarce in the literature, and no open-source models are available to allow transfer learning. The only available or accessible CNN models are those built for classification problems, such as VGG, ResNet, and GoogLeNet. In our initial experiments, applying these classification CNN models to the regression problem yielded poor results, which led us to build a model from scratch.
Nonetheless, a transfer-learning approach was used in this study to enhance model performance on the smartphone dataset: the initial model was built on the surveillance camera dataset, and the smartphone dataset was then introduced through transfer learning, yielding a slight model improvement.
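A minimal Keras sketch of this adaptation step is shown below; the model file name, number of frozen layers, learning rate, and data arrays are illustrative assumptions, not the study's exact settings:

```python
import tensorflow as tf

# Load the CNN already trained on the surveillance camera dataset
# (file name is hypothetical).
base = tf.keras.models.load_model("cctv_rainfall_cnn.h5")

# Freeze the early convolutional layers so their learned rain features are
# preserved, and fine-tune only the last few layers on smartphone data.
for layer in base.layers[:-4]:
    layer.trainable = False

base.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
             loss="mse", metrics=["mae"])

# smartphone_images / smartphone_intensities are assumed to be
# preloaded NumPy arrays of shape (N, 250, 250, 1) and (N,).
base.fit(smartphone_images, smartphone_intensities,
         validation_split=0.2, epochs=20, batch_size=32)
```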
The CNN model in this study was developed in the Konstanz Information Miner (KNIME) [25], a free, open-source data analytics platform commonly used to create CNNs. KNIME integrates the most popular deep-learning framework through its Keras network integration [25]. This deep-learning framework can compare images with the rainfall data recorded by the rain gauge to estimate rainfall intensity time series. Given raw image data, the CNN model automatically finds the features required for detection or classification; in other words, it estimates continuous rainfall intensity values by analysing the complex relationships between image features and the single output associated with each image, a rainfall intensity value.
Before training the model, the rainfall images need to be pre-processed, which involves extracting image frames at one frame per second from the rain videos. The corresponding rain gauge intensity (1 min intervals) was used to describe each minute of rain video, and linear interpolation was then used to assign a rainfall intensity value to each extracted frame at each second [25]. The architecture of a typical CNN comprises an input layer, convolutional layers, subsampling layers, fully connected layers, and an output layer [26]. For example, an image containing a few raindrops, encoded as a pixel matrix, is delivered to the input layer. Various feature maps are then created within the convolutional layer using different convolutional kernels (typically k × k matrices), each with a different weight vector. The subsampling layer performs local averaging or a maximum function to lower the resolution of the feature map and reduce the sensitivity of the output to shifts and distortions of the raw input image. Each repetition of the convolution and subsampling operations identifies a different characteristic of the input image. The number of feature maps in each convolutional and subsampling layer must be predetermined in the model. Finally, the feature maps and fully connected layers produce the CNN output; in a classification setting, this is a probability vector, for instance over the number of raindrops in the image. This process is described and illustrated in Figure 3.
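As an illustration of the frame extraction step (the study performed this within its KNIME workflow; the OpenCV sketch below is one possible equivalent, and the video file name is assumed):

```python
import cv2

# Extract one frame per second from a rain video (file name is hypothetical).
cap = cv2.VideoCapture("rain_event.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)            # native frame rate of the recording

frames = []
index = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if index % int(round(fps)) == 0:       # keep the first frame of each second
        grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frames.append(cv2.resize(grey, (250, 250)))  # match the model input size
    index += 1
cap.release()
```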
Various CNN types have been reported in the literature for detecting raindrops in rain images. These models vary in design, including the number of layers, the size of the convolutional kernels, and the subsampling method. Further information about such CNN models can be found in [23,24,27].
Although deep-learning models have been effective in different applications, model training remains challenging because of the numerous model parameters involved [14]. This study proposes a CNN model based on the general CNN architecture to estimate rainfall intensity from rainfall images. A CNN is a complex model with several internal parameters requiring optimisation, such as the number of convolutional layers, kernel size, activation function type, input size, learning rate, and batch size. In the current study, a trial-and-error approach was used to arrive at the final model topology, as in several studies in the literature [13,14,15].
Several input sizes and batch sizes were examined; the configuration documented in this paper was the best found in our investigation, allowing the model to be trained on the available network without crashing. In future studies, Bayesian optimisation [28,29] could be used to tune the CNN model further and improve its performance.
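A minimal sketch of such a trial-and-error search over a few candidate settings (the candidate values, data arrays, and the `build_model` helper are illustrative assumptions, not the study's actual search grid):

```python
import itertools

# Candidate settings to try; values are illustrative only.
kernel_sizes = [3, 5, 7]
batch_sizes = [16, 32]

best_setting, best_mse = None, float("inf")
for k, b in itertools.product(kernel_sizes, batch_sizes):
    model = build_model(kernel_size=k)        # hypothetical model factory
    history = model.fit(train_x, train_y, batch_size=b, epochs=10,
                        validation_data=(val_x, val_y), verbose=0)
    val_mse = min(history.history["val_loss"])
    if val_mse < best_mse:                    # keep the best validation MSE
        best_setting, best_mse = (k, b), val_mse

print("best (kernel, batch):", best_setting, "val MSE:", best_mse)
```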
The regression CNN architecture begins with an input layer holding a greyscale rain image of 250 × 250 pixels. In the second layer, 64 convolutional kernels, each a 7 × 7 matrix, create 64 distinct feature maps from the raw input image. Each kernel is applied to the input pixel matrix with a shift of two steps (convolutional stride s = 2) to reduce the resolution of the feature maps. After the numbers of convolutional and pooling layers, kernels, and strides were set in the network, a dropout layer was added at 30%, meaning that 30% of the neurons in that layer are dropped randomly in every epoch to prevent the model from overfitting. Because most CNN models are used to classify objects in images, a regression layer was added to the proposed CNN to output rainfall intensity, using the rectified linear unit (relu) activation. The relu activation is a piecewise linear function that outputs its input directly when the input is positive and zero otherwise [29]. The final CNN architecture consists of 17 convolutional layers, as schematically depicted in Figure 4, with the overall model structure shown in Figure 5; the layers have different convolutional kernel sizes and an increasing number of feature maps (from 64 to 512). The model employs three max-pooling subsampling layers, each taking the maximum value within each local patch of the feature maps.
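The following Keras sketch approximates the described topology (250 × 250 greyscale input, a 7 × 7 stride-2 first convolution, feature maps growing from 64 to 512, three max-pooling layers, 30% dropout, and a single regression output). The intermediate layers are compressed for brevity, so this is not the study's exact 17-layer network:

```python
from tensorflow.keras import layers, models

def build_rainfall_cnn():
    model = models.Sequential([
        layers.Input(shape=(250, 250, 1)),             # greyscale rain image
        layers.Conv2D(64, 7, strides=2, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 5, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(256, 3, activation="relu", padding="same"),
        layers.Conv2D(512, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Dropout(0.3),                           # 30% dropout against overfitting
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(1),                               # rainfall intensity (mm/h)
    ])
    return model
```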
The CNN model was trained to predict rainfall intensity from the image features of the rain images using the Keras network learner [30]. Because the model in this study is designed for regression rather than classification, the mean squared error (MSE) was used as the loss function. Owing to its previously demonstrated outstanding performance [31], the Adam optimiser [32,33] was chosen to train the CNN model. Adam is a stochastic gradient descent (SGD) iterative method based on adaptive estimation of first-order and second-order moments [32]. Given the massive number of CNN model parameters that must be calibrated, adequate model training requires a vast quantity of rainfall images, drawn here from the large dataset captured by the surveillance camera and smartphones.
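A minimal training sketch under these settings (MSE loss, Adam optimiser); the data arrays, epoch count, and batch size are assumptions:

```python
from tensorflow.keras.optimizers import Adam

model = build_rainfall_cnn()       # sketch defined in the previous block

model.compile(optimizer=Adam(), loss="mse", metrics=["mae"])

# images: (N, 250, 250, 1) pre-processed frames; intensities: (N,) mm/h,
# both assumed to be preloaded NumPy arrays.
model.fit(images, intensities, validation_split=0.2,
          epochs=50, batch_size=32)
```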
2.4. Model Performance Criteria
The performance criteria employed to evaluate the models' predictive capacity during training, validation, and testing are presented in Table 1. These criteria are frequently used in CNN model development, hydrological modelling, and forecasting applications [34,35,36,37]. The CNN model was trained using the mean absolute error (MAE) and the mean squared error (MSE) as its loss functions, where smaller MAE and MSE values indicate better model performance. After training, the goodness of fit between the predicted and observed rainfall intensities was assessed using the coefficient of determination (R²).
Table 1. Model performance criteria.
Criterion | Formula | Range
---|---|---
Coefficient of determination (R²) | $R^2 = 1 - \dfrac{\sum_{i=1}^{n}(O_i - P_i)^2}{\sum_{i=1}^{n}(O_i - \bar{O})^2}$ | [0–1]
Mean absolute error (MAE) | $\mathrm{MAE} = \dfrac{1}{n}\sum_{i=1}^{n}\lvert O_i - P_i \rvert$ | [0, +∞)
Mean squared error (MSE) | $\mathrm{MSE} = \dfrac{1}{n}\sum_{i=1}^{n}(O_i - P_i)^2$ | [0, +∞)

where $O_i$ and $P_i$ are the observed and predicted rainfall intensities, $\bar{O}$ is the mean observed intensity, and $n$ is the number of samples.
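For reference, these criteria can be computed directly with scikit-learn (a sketch; `observed` and `predicted` are assumed intensity arrays):

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

mae = mean_absolute_error(observed, predicted)   # mm/h
mse = mean_squared_error(observed, predicted)    # (mm/h)^2
r2 = r2_score(observed, predicted)               # dimensionless, best = 1
```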
4. Conclusions
In this study, an image-based rainfall estimation technique using a deep-learning convolutional neural network (CNN) was developed and examined for images captured by smartphones and surveillance cameras. The results of this study showed the following:
- (1)
The performance of the image-based rainfall estimation model was assessed by comparing the estimated rainfall data with the observed data. In this comparison, the study's best rainfall-estimating CNN model achieved R² = 0.955 using outdoor surveillance rainfall photos as the input. The same model was then used to estimate rainfall from the smartphone rain image dataset in the KNIME transfer-learning environment using two different approaches, where Approach 1 gave the best result of R² = 0.844.
- (2)
The developed CNN model demonstrated significant potential as a rain-sensing tool by effectively estimating rainfall intensity from images captured by smartphones and surveillance cameras (i.e., rainfall videos). This model can be implemented in citizen science applications to enhance the spatial coverage of rainfall data in urban areas. In addition, this model leverages urban image-based rainfall sensors, a low-cost data collection system, to improve the spatiotemporal resolution of rainfall data.
This study is subject to certain constraints. The outdoor security camera has poor night vision, so it can only record rain images during the day; as a result, the surveillance camera dataset includes only daytime rain photos. Subsequent studies ought to examine the model's ability to predict rainfall intensity during nighttime and low-light rainfall events, as low light could impact the visibility of rain features in photos. Apart from that, the quality and completeness of the training data significantly affect how well the CNN model predicts rainfall; erroneous data values or missing photos for particular rain classes are examples of data-collection issues.
The suggested models for estimating rainfall could be further optimised and enhanced using more complex machine-learning tools and a larger dataset. Since accumulating various rainfall photos from widely spaced sensors can further improve the model’s calibration and validation for rainfall categorisation and estimation, enriching the model with large, high-quality datasets is recommended. This is essential for fine-tuning a reliable CNN rainfall estimation algorithm and making it more equipped for practical applications. Creating rainfall time series and determining any time gap between records and rain gauge data can be significantly aided by mobile recording devices, such as smartphone cameras. In addition, future research can examine the uncertainties associated with different parts of the proposed model and make thorough comparisons among other image rainfall-intensity-based deep-learning models.
The developed pilot model has been designed with scalability in mind. Although the current study focused on collecting rainfall data in tropical climates using two settings and two devices, further exploration is required for the model to generalise across different urban environments, regions, and varying climatic conditions. Future research will focus on enhancing the model to ensure it can effectively capture rainfall data in diverse locations, including areas with less intense rains, thus improving its scalability.