Article

The Identification and Quantification of Hidden Hazards in Small Scale Reservoir Engineering Based on Deep Learning: Intelligent Perception for Safety of Small Reservoir Projects in Jiangxi Province

1 Jiangxi Academy of Water Science and Engineering, Hydraulic Safety Engineering Technology Research Center of Jiangxi Province, Nanchang 330029, China
2 College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
3 Nanjing Research Institute of Hydrology and Water Conservation Automation, Nanjing 210012, China
4 China Railway Water Conservancy Information Technology Co., Ltd., Nanchang 330029, China
5 Jiangxi Provincial Water Conservancy Investment Jianghe Information Technology Co., Ltd., Nanchang 330029, China
6 Beijing Guoxinhuayuan Technology Co., Ltd., Beijing 100071, China
7 College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
* Authors to whom correspondence should be addressed.
Water 2024, 16(20), 2880; https://doi.org/10.3390/w16202880
Submission received: 11 September 2024 / Revised: 6 October 2024 / Accepted: 9 October 2024 / Published: 10 October 2024
(This article belongs to the Section Urban Water Management)

Abstract:
This study aims to enhance the detection and assessment of safety hazards in small-scale reservoir engineering using advanced image processing and deep learning techniques. Given the critical importance of small reservoirs in flood management, water supply, and ecological balance, effective monitoring of their structural integrity is crucial. This paper develops a fully convolutional semantic segmentation method for hidden-hazard images of small reservoirs using an encoder–decoder structure, built on a convolutional neural network (CNN) deep learning framework to process and analyze high-resolution images captured by unmanned aerial vehicles (UAVs). The method incorporates data augmentation and adaptive learning techniques to improve model accuracy under diverse environmental conditions. Finally, quantitative hazard data (length, width, area, etc.) are obtained by converting image pixels to actual physical size. Results demonstrate significant improvements in detecting structural deficiencies, such as cracks and seepage areas, with higher precision and recall than conventional methods, and the proposed HHSN-25 network (Hidden Hazard Segmentation Network with 25 layers) outperforms other methods. On the main evaluation indicator, mIoU, HHSN-25 reaches 87.00%, compared with 85.50% for U-Net and 85.55% for U-Net++. The proposed model achieves reliable real-time performance, allowing for early warning and effective management of potential risks. This study contributes to the development of more efficient monitoring systems for small-scale reservoirs, enhancing their safety and operational sustainability.

1. Introduction

Small-scale reservoirs are a critical component of water resource management and play an essential role in flood control, irrigation, and ecological preservation, particularly in regions susceptible to climatic and hydrological variability [1]. In China alone, there are over 98,000 reservoirs, of which 95% are classified as small-scale, reflecting their pivotal importance in the national context. Among these, Jiangxi Province is particularly notable, with over 10,300 small-scale reservoirs—accounting for 11.4% of the country’s total [2]. These reservoirs significantly contribute to the socio-economic development of the region by mitigating floods, ensuring agricultural productivity, and maintaining ecological balance. However, the safety and stability of these small-scale reservoirs remain a significant challenge due to several factors, including inadequate design, incomplete hydrological and geological data, and substandard construction practices. These challenges render small-scale reservoirs more susceptible to failures compared to their larger counterparts [3,4]. Since 2010, 1031 reservoirs in Jiangxi Province have experienced major hazards, including severe dam deformation in 127 cases, leakage damage in 382 cases, and scattered flooding in 170 cases. These hidden dangers can easily lead to dam landslides, collapses, and breaches. As of now, more than 10 dam collapses have occurred, causing casualties and damaging over 5000 acres of farmland, resulting in a direct economic loss of 50 million yuan. The remaining 600 reservoirs have become dangerous and unable to operate normally, with a reinforcement cost of about 200 million yuan and indirect economic losses of 160 million yuan due to the inability to realize their functional benefits.
The monitoring and assessment of small-scale reservoirs have traditionally relied on manual inspections and conventional surveillance methods. These approaches are not only labor-intensive and time-consuming but also prone to human error, which can compromise the timely detection of hazards [5,6,7]. Consequently, there is an increasing demand for automated and accurate techniques that can enhance the safety management of small-scale reservoirs. One promising solution is the application of image processing and machine learning techniques, particularly deep learning, to the detection and characterization of structural defects in reservoirs [8].
The conventional approaches to reservoir safety monitoring, such as visual inspections, geotechnical instrumentation, and satellite remote sensing, have provided essential baseline data for hazard detection and management. However, these methods exhibit several limitations. Visual inspections and manual monitoring are labor-intensive, limited by human capability, and can be subjective and inconsistent [9]. Geotechnical instrumentation provides detailed point-based measurements but lacks spatial coverage and can be affected by external environmental factors [10]. Satellite remote sensing offers wide area coverage but is often hindered by coarse resolution, low revisit frequency, and sensitivity to atmospheric conditions [11]. In recent years, several studies have explored the potential of machine learning techniques, such as support vector machines (SVM) and random forests, for detecting and classifying structural anomalies in large dams [12]. These methods, while promising, have not yet been extensively applied to small-scale reservoirs. The specific challenges of small-scale reservoirs, such as their varied construction standards, diverse environmental settings, and unique hydrological conditions, necessitate the development of more tailored and sophisticated monitoring techniques [13,14].
Deep learning, particularly convolutional neural networks (CNNs), has emerged as a powerful tool for image-based structural defect detection due to its ability to automatically learn and extract hierarchical features from complex datasets [15]. CNN-based methods have been successfully applied in various fields, including urban infrastructure monitoring [16], crack detection in concrete surfaces [17], and landslide mapping [18]. These studies demonstrate that deep learning models, especially when combined with high-resolution imagery, can provide detailed and accurate assessments of structural conditions. Semantic segmentation, a subfield of deep learning, has proven particularly effective in detecting and categorizing structural defects in various settings [19]. Unlike traditional image classification, which assigns a single label to an entire image, semantic segmentation provides pixel-wise classification, enabling precise localization and characterization of defects [20]. This capability is crucial for identifying and analyzing multiple types of hazards in small-scale reservoirs, where defects may vary significantly in size, shape, and context [21]. While there have been significant advancements in the application of deep learning for infrastructure monitoring, most studies have focused on large-scale structures or urban environments. There is a noticeable gap in the application of these technologies to small-scale reservoirs, which are characterized by unique hazard patterns that differ significantly in spatial scale and temporal dynamics from those of larger infrastructure [22]. Furthermore, the integration of deep learning models with multi-source data, such as hydrological, geological, and meteorological data, for comprehensive risk assessment and prediction remains underexplored [23]. In practice, devices equipped with high-resolution visual sensors, such as industrial cameras, underwater robots, and unmanned aerial vehicles (UAVs), are used to collect large numbers of reservoir surface images. Feng et al. [24] proposed a method for classifying reservoir damage based on convolutional neural networks (CNNs) and transfer learning. Huang et al. [25] improved region-based CNNs to classify and locate multiple reservoir damage regions and achieved accurate results. Zhao et al. [26] proposed excellent reservoir damage detection methods using the improved You Only Look Once v5 (YOLOv5) and a three-dimensional (3D) photogrammetric reconstruction method. Semantic segmentation networks, which can classify each pixel in an image, have been studied to achieve accurate pixel-level dam crack detection [27,28,29]. Li et al. [30] proposed a digital twin method for the structural health monitoring of underground pipelines. Underwater structural defects, including cracks and tunnel defects, have been detected pixel-wise using deep learning (DL) techniques [31,32,33].
This study aims to bridge this gap by developing a novel semantic segmentation-based framework for small-scale reservoir safety monitoring. The proposed framework utilizes a combination of deep learning techniques and multi-source data fusion to achieve quantitative extraction of key warning indicators, such as cracks, seepage areas, and surface deformation. This approach integrates image data with conventional monitoring datasets to establish a robust model for real-time hazard evaluation and prediction, enhancing early warning capabilities and operational safety [34]. Our approach involves the development of a deep learning model based on semantic segmentation to identify and categorize various types of structural hazards in small-scale reservoirs. This model leverages multi-source data, including UAV imagery, multispectral and infrared data, and environmental sensor data, to capture the diverse range of hazard indicators present in reservoir environments [35]. We utilize U-Net architecture, a popular choice in semantic segmentation tasks due to its ability to retain fine spatial details while capturing global contextual information [36]. To enhance the robustness and accuracy of the model, we integrate domain adaptation techniques to handle the variability in data quality and environmental conditions. Furthermore, we employ data augmentation strategies, such as random cropping, flipping, and brightness adjustment, to increase the diversity of the training data and improve model generalization.
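As one concrete realization of the augmentation strategies named above, a minimal torchvision-based sketch might look as follows; the crop size and jitter strength are illustrative values, not the study's tuned settings.

```python
# Illustrative augmentation pipeline: random cropping, flipping, and
# brightness adjustment, as described in the text (values are examples).
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomCrop(256, pad_if_needed=True),  # random cropping
    transforms.RandomHorizontalFlip(p=0.5),          # flipping
    transforms.ColorJitter(brightness=0.3),          # brightness adjustment
    transforms.ToTensor(),
])
```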
The proposed framework offers several significant contributions to the field of reservoir safety monitoring. First, it provides a comprehensive methodology for the automatic detection and classification of structural hazards in small-scale reservoirs, addressing a critical gap in current research. Second, by integrating multi-source data, our approach enables a more holistic assessment of reservoir safety, capturing a broader range of hazard indicators and providing more accurate predictions of potential failures. Moreover, the development of a real-time hazard evaluation and prediction model has practical implications for improving the operational safety of small-scale reservoirs. The ability to detect and respond to hazards promptly can significantly reduce the risk of failure, minimize damage, and ensure the continued provision of essential water services. This is particularly relevant in regions such as Jiangxi Province, where small-scale reservoirs play a critical role in supporting socio-economic development and maintaining ecological balance. By leveraging semantic segmentation techniques and multi-source data fusion, we aim to provide a more accurate and comprehensive assessment of reservoir safety, enhancing early warning capabilities and contributing to the sustainable management of water resources. This research not only addresses existing gaps in the literature but also offers a practical solution to the challenges faced by small-scale reservoir engineering in China and beyond.

2. Construction of Hidden Hazards Dataset for Small Reservoir Engineering

2.1. Causes of Hidden Hazards in Small-Scale Reservoirs

Small-scale reservoirs, as a critical component of water resource management, have various causes of safety hazards. These mainly involve the following aspects:
Design and Construction Deficiencies: Many small-scale reservoirs lack standardization in their initial design and construction stages, especially those built during the mid-20th century’s large-scale construction efforts. Common issues include non-standardized design, poor-quality construction materials, and outdated construction techniques. These factors make the reservoirs susceptible to structural damage over time or under extreme weather conditions, leading to potential hazards such as dam cracks, seepage, and landslides.
Material and Structural Issues: The construction of small-scale reservoirs often relies on locally available materials, which may not meet durability and stability standards. Additionally, the aging and erosion of dam structures exacerbate material deterioration, triggering hazards. For instance, earth and concrete dams are prone to cracks and seepage due to weathering and chemical erosion.
Climate Change and Extreme Weather Events: Global climate change has increased the frequency of extreme weather events such as heavy rainfall and floods, imposing greater water pressure and flood risks on small-scale reservoirs. This risk is particularly pronounced in mountainous and hilly regions, where extreme weather can destabilize the water body, increasing the risk of dam collapse or piping. Common hidden hazards in small-scale reservoir engineering include cracks, collapse, leaching, and seepage, as shown in Figure 1.

2.2. Image Data Sources for Hidden Hazard Detection

To accommodate the actual environment and hardware capabilities, sample images were captured using handheld devices and subsequently cropped to sizes suitable for processing. The CMOS image sensor is a critical factor determining the quality of the captured samples; its ability to faithfully reproduce the scene depends on the pixel-array parameters. During image capture, the most direct parameters are sensor size, effective pixels, and shutter speed. The handheld device used is equipped with a 1/1.7-type Sony IMX600 image sensor with 40 M effective square pixels (manufactured by Sony Corporation, Tokyo, Japan). This CMOS sensor achieves high-speed image capture through a column-parallel AD converter circuit, and high sensitivity with low noise through a backside-illuminated pixel structure. It uses an R, G, and B color mosaic filter, incorporates lens shading correction, and supports synchronous dual-sensor operation. The IMX600 image sensor parameters are detailed in Table 1.
In order to further understand the hidden hazards, the Jiangxi Provincial Government conducted a survey on all small reservoirs in the province. According to the survey, among the 11 prefecture-level cities (including directly administered counties) in the province, 33 earth and stone dams in 8 prefecture-level cities including Nanchang, Jiujiang, Shangrao, Fuzhou, Yichun, Ji’an, Pingxiang, and Jingdezhen have cracks, as shown in Table 2.
Across these 33 dams, 79 individual cracks were investigated, as shown in Table 3.
Cracks, the most common hidden hazard, significantly affect the engineering safety of small reservoirs. It is therefore necessary to combine image recognition with experimental data to study how cracks develop. Figure 2 shows crack testing.
On the other hand, the leaching and seepage caused by cracks or collapses in small reservoirs also warrant attention. The common path to dam failure is: flood → seepage damage → landslide → dam crest lowering → no manual rescue intervention → dam collapse, as shown in Figure 3. A time-series correlation diagram of seepage under rainfall conditions, compiled from the survey, is shown in Figure 4.
Under rainfall conditions, small reservoirs are particularly vulnerable to seepage, a critical safety concern that can lead to structural instability and potential failure. When rainwater infiltrates the embankments or foundations of a reservoir, it increases the pore water pressure within the soil, reducing the shear strength and potentially leading to internal erosion or piping. Seepage pathways can develop more easily during intense or prolonged rainfall, especially in areas with poor compaction or material defects. In small reservoirs, which often lack the sophisticated monitoring systems of larger dams, seepage can remain undetected until it reaches a critical stage. Signs of seepage include wet spots or boils on downstream slopes, increased water turbidity, or unexpected drops in reservoir water levels. In the context of heavy rainfall, these signs may intensify rapidly, as the hydraulic gradient between the reservoir water and the downstream area increases, forcing more water through any existing or newly formed cracks or weak zones.
Managing seepage under such conditions requires continuous monitoring and timely intervention. Utilizing advanced technologies, such as UAVs and deep learning-based image analysis, can help identify early signs of seepage, allowing for proactive maintenance and repairs. Additionally, employing geotechnical instruments, such as piezometers and seepage meters, can provide real-time data to assess the extent of seepage and guide decision making to prevent structural failure and ensure the safety of the reservoir.

2.3. Dataset Construction

2.3.1. Data Collection

Data were collected from multiple sources, including UAV images, sensor data, and fixed camera data, ensuring coverage across sufficient time and spatial ranges, particularly in areas with high hazard occurrence. Image sampling targeted small-scale reservoir infrastructure in Jiangxi Province, focusing on cracks, seepage, and collapses of varying sizes and positions, with the smallest hazards (cracks) appearing as narrow as a single pixel in the image.

2.3.2. Data Preprocessing

Collected image data underwent data augmentation (e.g., rotation, translation, scaling) to expand the dataset and enhance model generalization. The original image samples were sized at 3648 × 2736 pixels, with 300 images containing hazards selected after screening. These images were first cropped to 3648 × 1824 pixels for easier processing. The 3648 × 1824 images were then horizontally cropped into 912 × 912 pixel images, and additional 912 × 912 pixel images were obtained through random cropping after rotating the original images. In total, 4800 images containing both hazard and non-hazard areas were produced. These 912 × 912 images were then uniformly resized to 256 × 256 pixels. After a second round of filtering, the images were categorized into two sets: one containing only background and the other containing hazards. Selecting the images with hazards yielded a final dataset of 4000 valid samples, which were then manually annotated in the next step.
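To make the cropping pipeline above concrete, the following is a minimal Python sketch using Pillow. The directory names, rotation range, and number of random crops per image are illustrative assumptions, not the exact settings used to produce the 4800 images.

```python
import random
from pathlib import Path
from PIL import Image

SRC, DST = Path("raw_images"), Path("dataset_256")  # hypothetical paths
DST.mkdir(exist_ok=True)

def tile_and_resize(img, stem):
    """Crop 3648x2736 -> central 3648x1824 band, tile into 912x912, resize to 256x256."""
    band = img.crop((0, 456, 3648, 2280))
    for i, left in enumerate(range(0, 3648, 912)):
        for j, top in enumerate(range(0, 1824, 912)):
            patch = band.crop((left, top, left + 912, top + 912))
            patch.resize((256, 256)).save(DST / f"{stem}_{i}_{j}.png")

def random_rotated_crops(img, stem, n=4):
    """Additional 912x912 samples taken from randomly rotated copies."""
    for k in range(n):
        rot = img.rotate(random.uniform(-30, 30))
        left = random.randint(0, rot.width - 912)
        top = random.randint(0, rot.height - 912)
        rot.crop((left, top, left + 912, top + 912)).resize((256, 256)).save(
            DST / f"{stem}_rot{k}.png")

for path in SRC.glob("*.jpg"):
    image = Image.open(path).convert("RGB")
    tile_and_resize(image, path.stem)
    random_rotated_crops(image, path.stem)
```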

2.3.3. Manual Annotation

Based on the image data, hazards such as cracks, collapses, and seepage were labeled, including their locations and severity. The open-source LabelImg tool was used for manual annotation, offering several annotation modes: classification, bounding box, segmentation, and brush-based segmentation. Given the significant differences among hazards, a brush mode was used for the fine, narrow cracks, coloring crack areas while leaving non-crack areas uncolored; conversely, for larger regions such as collapse zones, a bounding-box mode was employed. The tool outputs labels as 32-bit RGBA images in which unannotated areas are left transparent, so post-processing is needed to convert these labels for model input. Annotation is the most time-consuming step in dataset creation. Unlike common object detection tasks, which require only a rectangular bounding box around the object, semantic segmentation annotation requires marking the entire object area. This is particularly challenging for crack images, where cracks are thin and small, necessitating careful adjustment of brush width and frequent erasure of excess annotation. An example of an original crack image and the output label style from the LabelImg tool is shown in Figure 5. In the annotated image, crack areas are marked in red [255, 0, 0] and the background consists of transparent pixels [0, 0, 0]. The resulting 32-bit label image is not optimal for computer processing; given that the pixel data in this study fall into two categories, the 32-bit labels are converted to 8-bit images for easier data reading in subsequent processing.
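As an illustration of this label conversion, the sketch below maps the tool's RGBA output to a single-channel 8-bit class map (0 = background, 1 = crack); the function and file names are hypothetical.

```python
import numpy as np
from PIL import Image

def rgba_label_to_8bit(label_path, out_path):
    """Convert a 32-bit RGBA label image to an 8-bit class-index image."""
    rgba = np.array(Image.open(label_path).convert("RGBA"))
    # Pixels painted red [255, 0, 0] become class 1; everything else class 0.
    crack = (rgba[..., 0] == 255) & (rgba[..., 1] == 0) & (rgba[..., 2] == 0)
    Image.fromarray(crack.astype(np.uint8), mode="L").save(out_path)
```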
The processed dataset was divided into training, validation, and test sets to ensure model training, validation, and testing on different data, enhancing model reliability and stability.

3. Deep Learning Based Semantic Segmentation Model for Hidden Hazards Images

3.1. Network Architecture for Hazard Segmentation

The proposed high-performance semantic segmentation method for detecting hazards in images of small-scale reservoirs is illustrated in Figure 6. The network comprises three main components: an encoder, a decoder, and a classifier. The overall structure consists of 12 encoding layers, 12 decoding layers, and 1 classification layer, making a total of 25 layers, which we have named HHSN-25 (Hidden Hazard Segmentation Network with 25 layers). The encoder structure is built from convolutional blocks (Conv) and residual blocks (Block). The Conv block comprises a 2D convolution (Conv2d), Batch Normalization (BN), and a GELU activation function, while the Block contains two ODConv units. Each ODConv unit is made up of an omni-dimensional dynamic convolution (ODConv2d), BN, and GELU activation. The primary role of the encoder is to reduce feature dimensions and increase network depth, thereby expanding the receptive field to capture more hazard-related features.
The decoder consists of transposed convolutional blocks (TConv) and Blocks. The TConv block includes a 2D transposed convolution (ConvTranspose2d), BN, and GELU activation function. The main purpose of the decoder is to recover the original image dimensions while further extracting hazard features, merging multi-dimensional features from various stages of the encoder for pixel-level classification. The classification layer is a 2D convolutional layer that transforms the dimensions along the channel direction to match the number of classes, with a SoftMax function used to predict the probability of each pixel belonging to a specific class. In the HHSN-25 structure, parameters for each sub-module are specified as module name, output channels, kernel size, stride.
In the HHSN-25 architecture, the encoding and decoding layers work together to effectively capture and reconstruct features from input data. The encoder progressively reduces the spatial dimensions of the input while simultaneously increasing feature depth, which allows it to extract high-level features and latent patterns pertinent to hidden hazards in reservoir engineering. This process transforms the raw input data into a compact, abstract representation, which is useful for efficient processing. The decoder then takes this compressed representation and gradually reconstructs the spatial resolution through a series of upsampling operations. It combines information from both the encoding and decoding layers at various stages through skip connections. These connections retain important spatial details lost during downsampling, thereby ensuring that fine-grained information is preserved during reconstruction. This collaborative interaction between encoding and decoding helps the model retain both high-level features and detailed spatial information, which is essential for accurately segmenting and identifying hidden hazards with high precision and stability.
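The following PyTorch sketch outlines the three building blocks described above. It is a simplified reading of the architecture, not the authors' code: plain Conv2d stands in for ODConv2d inside Block, and the channel counts are illustrative.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out, k=3, s=2):
    """Conv block: Conv2d + BN + GELU (downsamples when stride > 1)."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, stride=s, padding=k // 2),
                         nn.BatchNorm2d(c_out), nn.GELU())

def tconv_block(c_in, c_out, k=2, s=2):
    """TConv block: ConvTranspose2d + BN + GELU (restores spatial size)."""
    return nn.Sequential(nn.ConvTranspose2d(c_in, c_out, k, stride=s),
                         nn.BatchNorm2d(c_out), nn.GELU())

class Block(nn.Module):
    """Residual Block: two conv units (kernel sizes 3 and 5) plus a skip
    connection. The paper uses ODConv2d units; Conv2d is substituted here."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.GELU(),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.BatchNorm2d(channels), nn.GELU())

    def forward(self, x):
        return x + self.body(x)

# Classifier head: 1x1 conv to the class count, softmax over channels.
classifier = nn.Sequential(nn.Conv2d(40, 2, kernel_size=1), nn.Softmax(dim=1))
```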

3.1.1. Feature Dimension Transformation in the Hazard Segmentation Network

The transformation of feature dimensions in a typical deep learning network involves two aspects: changes in feature map size and feature map count. Regarding feature map size, convolution and transposed convolution operations are primarily responsible for altering dimensions. The convolution and transposed convolution processes for 2D images are shown in Figure 7.
Figure 7a demonstrates the process where a feature map of size 4 × 4 is convolved with a kernel size of 3, stride of 1, and no padding, resulting in a 2 × 2 feature map. Here, the stride represents the number of pixels the kernel moves at each step, and padding refers to adding blank pixels around the edges to accommodate the desired size. Figure 7b illustrates the reverse, where a 2 × 2 feature map is transformed via transposed convolution with a kernel size of 2, stride of 1, and padding of 1 to produce a 4 × 4 feature map. The input–output relationship for convolution and transposed convolution is given by specific equations:
$$o_{conv} = \left\lfloor \frac{i - k + 2p}{s} \right\rfloor + 1$$
$$o_{tconv} = s(i - 1) - 2p + k + \left[ (o + 2p - k) \bmod s \right]$$
where $o_{conv}$ and $o_{tconv}$ represent the output sizes, i is the input size, k is the kernel size, s is the stride, and p is the padding. The second aspect of dimension transformation concerns the change in the number of channels. Increasing the number of channels while downsampling along spatial dimensions is crucial for network performance, as it enhances the diversity of high-level features. In HHSN-25, the encoder increases the number of channels as the feature map dimensions decrease, while the decoder decreases the number of channels as the feature map dimensions are restored. This design adheres to the feature pyramid principle, optimizing memory use.
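As a quick numeric check of these formulas (treating the bracketed mod term as optional output padding), the Figure 7a case i = 4, k = 3, s = 1, p = 0 gives a 2 × 2 output, and a transposed convolution with k = 3, s = 1, p = 0 restores 2 × 2 to 4 × 4:

```python
def conv_out(i, k, s, p):
    return (i - k + 2 * p) // s + 1

def tconv_out(i, k, s, p, out_pad=0):
    return s * (i - 1) - 2 * p + k + out_pad

print(conv_out(4, 3, 1, 0))   # -> 2 (4x4 feature map convolved down to 2x2)
print(tconv_out(2, 3, 1, 0))  # -> 4 (one parameter choice restoring 2x2 to 4x4)
```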

3.1.2. Basic Feature Extraction Structure of the Hazard Segmentation Network

Increasing the network depth helps expand the receptive field and extract higher-level features. Since the introduction of ResNet, skip connections have become the dominant structure in deep learning, such as in ConvNeXt, which improves network performance by using skip connections and increasing kernel size. The basic feature extraction structure (Block) in HHSN-25 adopts the skip connection concept from ResNet, integrating omni-dimensional dynamic convolution and experimentally verified optimal kernel sizes.
The ResNet structure consists of two convolutions with a kernel size of 3. The first convolution is followed by BN and a ReLU activation function; the second is followed by BN, with the skip connection then linked to a ReLU activation function, keeping the feature map size unchanged throughout. In ConvNeXt, the input first passes through a depthwise convolution with a kernel size of 7 and Layer Normalization (LN), followed by a pointwise convolution with a kernel size of 1 to increase the channel count fourfold, activated by GELU. A final pointwise convolution restores the channel count before the skip connection. The Block in our study applies two omni-dimensional dynamic convolutions with kernel sizes of 3 and 5, followed by skip connections, each convolution operation incorporating BN and GELU activation. The omni-dimensional dynamic convolution introduces a multi-dimensional attention mechanism, employing a parallel strategy to learn different attentions across four dimensions in kernel space. The basic structure of the omni-dimensional dynamic convolution used in this study is depicted in Figure 8.
This convolution utilizes an SE-style attention module with multiple heads to compute the various types of attention. The input is reduced to a one-dimensional feature vector by global average pooling, a fully connected layer, and a ReLU activation, and is then processed by four fully connected heads to produce the different attention values. The four attention dimensions focus on position, channel, filter, and kernel, capturing richer contextual information. The omni-dimensional dynamic convolution employs this new multi-dimensional attention mechanism, calculating four types of attention across all four dimensions of the kernel space in parallel:
$$y = \left( \alpha_{w1} \odot \alpha_{f1} \odot \alpha_{c1} \odot \alpha_{s1} \odot W_1 + \cdots + \alpha_{wn} \odot \alpha_{fn} \odot \alpha_{cn} \odot \alpha_{sn} \odot W_n \right) \ast x$$
where $\alpha_{wi}$ is the attention scalar for the $i$-th convolutional kernel $W_i$; $\alpha_{si}$, $\alpha_{ci}$, and $\alpha_{fi}$ are the three newly introduced attention scalars, computed along the spatial dimension, input channel dimension, and output channel (filter) dimension of the kernel, respectively; and ⊙ denotes multiplication along the corresponding dimension of the kernel space.
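A simplified PyTorch sketch of this mechanism is given below, assuming sigmoid gating for the spatial, input-channel, and filter attentions, softmax over the n candidate kernels, and a reduction ratio of 4; it follows the ODConv design in spirit and is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ODConv2dSketch(nn.Module):
    """Simplified omni-dimensional dynamic convolution with n candidate kernels."""
    def __init__(self, c_in, c_out, k=3, n_kernels=4, reduction=4):
        super().__init__()
        self.k, self.c_in, self.c_out, self.n = k, c_in, c_out, n_kernels
        hidden = max(c_in // reduction, 4)
        self.fc = nn.Sequential(nn.Linear(c_in, hidden), nn.ReLU())
        # Four heads: spatial (k*k), input channel, filter (output channel), kernel.
        self.head_s = nn.Linear(hidden, k * k)
        self.head_c = nn.Linear(hidden, c_in)
        self.head_f = nn.Linear(hidden, c_out)
        self.head_w = nn.Linear(hidden, n_kernels)
        self.weight = nn.Parameter(torch.randn(n_kernels, c_out, c_in, k, k) * 0.02)

    def forward(self, x):
        b = x.size(0)
        z = self.fc(x.mean(dim=(2, 3)))                    # global average pooling
        a_s = torch.sigmoid(self.head_s(z)).view(b, 1, 1, 1, self.k, self.k)
        a_c = torch.sigmoid(self.head_c(z)).view(b, 1, 1, self.c_in, 1, 1)
        a_f = torch.sigmoid(self.head_f(z)).view(b, 1, self.c_out, 1, 1, 1)
        a_w = torch.softmax(self.head_w(z), 1).view(b, self.n, 1, 1, 1, 1)
        # Aggregate the n kernels, modulated along all four dimensions.
        w = (a_w * a_f * a_c * a_s * self.weight.unsqueeze(0)).sum(dim=1)
        w = w.reshape(b * self.c_out, self.c_in, self.k, self.k)
        out = F.conv2d(x.reshape(1, b * self.c_in, *x.shape[2:]), w,
                       padding=self.k // 2, groups=b)      # per-sample dynamic conv
        return out.reshape(b, self.c_out, *x.shape[2:])
```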

3.2. Loss Function and Improvements for the Hazard Segmentation Network

Among various deep learning-based image semantic segmentation methods, Cross Entropy (CE) is the most commonly used function for loss calculation, analyzing each pixel separately and comparing predicted values for each pixel class with the ground truth. This approach assigns the same weight to every pixel during loss calculation, which may not be ideal for segmentation tasks involving imbalanced pixel distributions. The CE loss function expression is as follows:
$$CE_{Loss} = -\sum_{c=1}^{M} y_c \log(p_c)$$
where M is the number of categories, $y_c$ is a one-hot vector whose elements take values of only 0 and 1, and $p_c$ is the predicted probability that the sample belongs to class c. When there are only two categories, the Binary Cross Entropy (BCE) loss function can be expressed as:
$$BCE_{Loss} = -\sum_{c=1}^{N} \left[ y_c \log(p_c) + (1 - y_c)\log(1 - p_c) \right]$$
where $p_c$ is the model prediction and $y_c$ the ground-truth label. The Binary Cross Entropy with Logits (BCEL) loss combines a Sigmoid layer and the BCE loss into a single operation. Shrivastava et al. [20] proposed online hard example mining (Ohem) to address the imbalance between positive and negative samples: the OhemCE loss first computes the cross-entropy loss, selects hard samples based on that loss, and then applies higher weights to them in subsequent training. The Intersection over Union (IoU) loss, commonly used in object detection, reflects the ratio of the intersection to the union of the annotated and predicted regions; it is also commonly used for loss calculation in image semantic segmentation. The IoU loss is expressed as:
$$IoU_{Loss} = 1 - \frac{|A \cap B|}{|A \cup B|}$$
Focal Loss (FL) is commonly used to handle class imbalance. It weights class-related losses according to how difficult each sample is to distinguish, applying smaller weights to easily distinguishable samples and larger weights to difficult-to-distinguish samples. The FL expression is as follows:
$$Focal_{Loss} = -\sum_{c=1}^{M} (1 - p_c)^{\gamma} \, y_c \log(p_c)$$
FL adds a weighting coefficient $(1 - p_c)^{\gamma}$ to the standard CE function. The Dice loss function is named after the Dice coefficient, a metric that effectively measures the similarity between two samples. The Dice loss is expressed as follows:
$$Dice_{Loss} = 1 - \frac{2|A \cap B|}{|A| + |B|}$$
where A represents the pixel labels of the ground-truth segmentation, and B represents the model's pixel-class predictions. Considering the highly disproportionate numbers of hazard and background pixels in crack images, this study combines FL with the Dice coefficient for loss calculation, expressed as follows:
$$Loss = 1 - \sum_{c=1}^{M} (1 - p_c)^{\gamma} \, y_c \log(p_c) - \frac{2I + \varepsilon}{U + \varepsilon}$$
where I is the intersection of correctly identified pixels, U is the union of the prediction and the label, and ε is a smoothing coefficient.
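For the binary hazard/background case, one possible PyTorch realization of this combined loss is sketched below; the γ and ε values are illustrative, and the function name is ours.

```python
import torch
import torch.nn.functional as F

def focal_dice_loss(logits, target, gamma=2.0, eps=1.0):
    """logits: (B, 1, H, W) raw scores; target: (B, 1, H, W) with values in {0, 1}."""
    p = torch.sigmoid(logits)
    # Focal term: down-weight easy pixels by (1 - p_t)^gamma.
    pt = torch.where(target == 1, p, 1 - p)
    focal = ((1 - pt) ** gamma *
             F.binary_cross_entropy(p, target.float(), reduction="none")).mean()
    # Dice term: 1 - (2|A ∩ B| + eps) / (|A| + |B| + eps).
    inter = (p * target).sum()
    union = p.sum() + target.sum()
    return focal + 1 - (2 * inter + eps) / (union + eps)
```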

4. Results

4.1. Experimental Environment and Evaluation Indicators

All models in this study were trained and validated on the Windows 11 operating system. The hardware configuration was an AMD Ryzen 7 5800H processor with 16 GB of RAM and an NVIDIA RTX 3060 laptop GPU with 6 GB of video memory. The deep learning framework was PyTorch in a CUDA 11.6 environment. Due to memory limitations, the number of worker threads and the batch size were both set to 4, and different sets of hyperparameters were evaluated during training to select the best validation configuration. The dataset was split into 80% (3200 images) for training, 10% (400 images) for validation, and 10% (400 images) for testing. All image inputs used in the experiments were sized 256 × 256.
The simplest evaluation is to count how many pixels are correctly segmented, but this alone is insufficient. Crack image segmentation essentially evaluates each pixel with a classifier, so each pixel's result is either correct or wrong. Because the hidden-hazard dataset has imbalanced class pixels, simply using average accuracy as the evaluation index would let the accuracy of background pixels swamp that of hazard pixels, obscuring the results. In this experiment, the mean Intersection over Union (mIoU) was used as the main performance index, and the precision (P), recall (R), F1, accuracy, and Mean Pixel Accuracy (mPA) scores were also considered [15]. These indicators are derived from the confusion matrix, expressed in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
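For the two-class (background/hazard) case, these indicators can be computed from the confusion-matrix counts as in the following sketch:

```python
def metrics_from_confusion(tp, tn, fp, fn):
    """Derive P, R, F1, accuracy, mIoU, and mPA from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Per-class IoU, then the mean over the two classes (mIoU).
    iou_hazard = tp / (tp + fp + fn)
    iou_background = tn / (tn + fn + fp)
    miou = (iou_hazard + iou_background) / 2
    # Mean pixel accuracy: per-class accuracy averaged over the classes.
    mpa = (tp / (tp + fn) + tn / (tn + fp)) / 2
    return dict(P=precision, R=recall, F1=f1, acc=accuracy, mIoU=miou, mPA=mpa)
```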

4.2. Comparison of the Experimental Results

4.2.1. Basic Feature Extraction Structure Block

For small hazard pixels (such as cracks), in theory only a small window is needed to identify their features, yet existing hazard-image segmentation methods are mostly adapted from current mainstream methods, many of which expand the receptive field too aggressively. We therefore study a suitable convolutional structure that expands the receptive field mainly by enlarging the convolution kernel and deepening the network, examine the kernel sizes used in the basic feature extraction structure (Block) experimentally, and introduce the omni-dimensional dynamic convolution model. The performance of the Block structure is also compared with that of several other mainstream network modules. The effect of different convolution kernel sizes in Block is shown in Figure 9, where the k values represent, in order, the kernel sizes of the two convolutional layers in Block. The experiments show, first, that omni-dimensional dynamic convolution (ODConv) outperforms ordinary convolution and, second, that performance is best when the k values are 3 and 5. In addition, the mIoU gap across the three ODConv experiments is smaller than that across the ordinary-convolution experiments, further confirming that attention over different dimensions improves network performance.
The performance of the feature extraction structures from ResNet, MobileNetV3, and ConvNeXt (ResBlock, MobileBlock, and the ConvNeXt block) compared with the Block in HHSN-25 is shown in Figure 10. The ConvNeXt structure performs worst on the crack image segmentation task, while Block significantly outperforms the other structures.

4.2.2. Comparison of Changes in Different Network Widths in Decoding Structures

The pyramid structure is a pattern of network width that research has shown to be well suited to convolutional neural networks. In practice, feature maps are repeatedly downsampled and upsampled, with the number of channels doubled or halved accordingly. In HHSN-25, if the number of feature channels recovered during decoding simply mirrored the encoding structure, some latent features could be lost; conversely, extra feature channels increase the network's computational cost. It is therefore necessary to study how features vary along the channel dimension during the decoding stage. Figure 11 compares four different network widths, expressed as the number of feature channels output after each of the four upsampling operations in the decoding process (horizontal axis). The experimental results show that network performance was optimal with channel counts of 128, 96, 64, and 40. In this feature fusion strategy, the channel count is reduced step by step across the upsampling operations, whereas the configuration with 64, 32, 16, and 8 channels performed significantly worse than the other three. The reason is that reducing the number of channels too aggressively during upsampling leaves too few channels to carry the latent feature information.

4.2.3. The Influence of Different Loss Functions on the Experimental Results

The performance of the HHSN-25 network with different loss functions is compared in Figure 12. Because the improved loss function focuses on the similarity between samples, it weights different samples differently. Using the improved loss function of this project for loss calculation yields the best results, significantly better than those of the other loss functions.

4.2.4. The Performance Comparison of Different Segmentation Methods

To verify the effectiveness of the proposed HHSN-25 network, its experimental results were compared with those of other segmentation methods, as shown in Figure 13. On the main evaluation indicator, mIoU, HHSN-25 is significantly higher than the other methods, reaching 87.00%, versus 85.00% for FCN, 85.00% for SegNet, 80.50% for Deeplabv3+, 85.50% for Unet, 85.55% for Unet++, 82.00% for Swin-Unet, and 84.50% for PSPNet. HHSN-25 also outperforms the other methods on the other commonly used evaluation indicators in Figure 13.
The output class labels of the semantic segmentation network are crack, collapse, leaching, and seepage. We referenced the study by Feng et al. (2019), which analyzed similar damage types (cracks and collapse) in dams and reservoirs and reported a detection accuracy of approximately 90%. Compared with these studies, our approach achieves an average accuracy of 98%, demonstrating a clear improvement.
Meanwhile, Figure 13 shows almost no difference in the accuracy index across these methods. This is because the number of hazard pixels is far smaller than the number of background pixels; when both are counted together, the large base of background pixels masks the ability of this index to reflect hazard pixels. This also explains why accuracy was not used as the main evaluation index in this project. For the F1 score and mPA, the indicators for hazard and background pixels are first calculated separately and then averaged; they therefore better reflect the overall classification of background and hazard pixels and remain within the reference range for evaluating network performance.

4.2.5. Other Experimental Results

The loss function describes the difference between the model's predicted and true values; the smaller the loss, the better the model's performance. During training, the loss value decreases as the number of training rounds increases until convergence, at which point the model usually achieves good results. The convergence of the loss function is affected by factors such as the data, network, and environment. Figure 14 shows how the loss value of HHSN-25 changes with the number of training rounds. The loss has essentially stopped changing by 50 rounds, which explains why 50 training rounds were chosen for this experiment. With too few rounds, the model may not converge, resulting in poor performance that cannot serve as a basis for comparing final results; with too many rounds, training consumes more time, the performance gain is insignificant, and overfitting may occur.
In addition, because each method uses a different loss function during training, the loss values cannot be compared directly even if scaled to the same range, owing to their different calculation methods. Figure 14 is therefore used only to illustrate training behavior; the primary means of comparing these methods remains the mIoU value, which reflects the final segmentation quality.

5. Discussion and Application

Based on the image semantic segmentation framework, we extracted and quantified hidden-hazard image features as follows (a sketch of the pixel-to-physical-size conversion follows this list):
  • Crack length and width: find the boundary of the target object in the segmentation result (edge detection), fit a bounding box or ellipse to the target object, and use the height and width of the bounding box to represent its length and width.
  • Leaching, leakage, and collapse areas: count the pixels within the target region (the area of the segmented region) and convert the pixel area to actual area, given the spatial resolution of the image (the actual physical size corresponding to each pixel): area = number of pixels in the segmented region × actual area per pixel.
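A sketch of this quantification step is given below, assuming OpenCV for contour extraction and box fitting and a known ground sampling distance (GSD, metres per pixel); the helper name is hypothetical.

```python
import cv2
import numpy as np

def quantify_hazards(mask: np.ndarray, gsd_m: float) -> list:
    """mask: uint8 array with 1 where the hazard class was predicted."""
    results = []
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        # Minimum-area rotated rectangle: its sides give length and width.
        (_, _), (w, h), _ = cv2.minAreaRect(c)
        length_m, width_m = max(w, h) * gsd_m, min(w, h) * gsd_m
        # Area = pixel count inside the region x actual area per pixel.
        area_m2 = cv2.contourArea(c) * gsd_m ** 2
        results.append(dict(length_m=length_m, width_m=width_m, area_m2=area_m2))
    return results
```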
In the context of transfer learning or model adaptation for the HHSN-25 architecture, the process involves initially training the model on a large, general dataset to learn common features that are transferable across tasks. This is followed by fine-tuning on a specific reservoir engineering dataset to adapt the model to the domain-specific characteristics of hidden hazards in small-scale reservoirs. During this fine-tuning phase, the model selectively updates the weights of certain layers to refine its understanding of unique hazard features, while preserving foundational patterns acquired from the initial training.
To ensure compatibility with reservoir-specific image data, transfer learning also involves adapting feature extraction layers, such as ResNet or ConvNeXt, through incremental training. This helps the model to better capture the localized textures and anomalies associated with different hazards. By leveraging transfer learning, the model can effectively generalize from limited labeled data and reduce training time, as the model starts with a base of pre-learned representations. This approach is particularly useful when training data for certain hazards are scarce, as it enables the model to make informed predictions based on previously learned knowledge from related tasks.

5.1. Integration of Hidden Hazard Feature Extraction Algorithm

The trained hidden-hazard feature extraction algorithm has been integrated into edge devices and deployed in demonstration applications. During actual operation, the collected images must be clear enough to identify small structural changes such as cracks or leaks, and the camera equipment must capture video of the target area clearly within a distance of less than 50 m. Accordingly, a high-definition camera (above 1080p resolution) was used to ensure rich image detail. This ensures that data collected under different environmental conditions are of sufficient quality for the model to accurately detect and identify engineering hazards.
The HHSN-25 model can be integrated into existing monitoring systems by running on edge devices for real-time hazard detection or on cloud-based systems for large-scale monitoring. On edge devices, it leverages local processing power to identify potential hazards promptly, which is crucial for applications requiring immediate action, such as industrial safety monitoring or disaster prevention. Deployed in the cloud, it can harness far greater computational resources to monitor wide geographical areas, processing and analyzing data from multiple sources to provide a comprehensive overview of hazard situations and support decision making at the macro level. The model is also designed to integrate with various sensor types and data streams, allowing it to be tailored to specific use cases and monitoring scenarios, and as it is exposed to more data and hazard situations it can be continuously retrained, improving its detection and classification accuracy and reliability over time.
Through the edge computing architecture, the algorithm achieves real-time monitoring and analysis, reducing the total time from data acquisition to hazard detection and alarm from an average of 5 min to 40 s (a reduction of about 86%). The high-performance embedded processor (NVIDIA Jetson AGX Xavier) and software stack (TensorFlow, PyTorch) are designed for edge computing, providing efficient processing with low energy consumption, suitable for real-time data processing and analysis. In addition, the edge computing architecture completes data processing locally, reducing dependence on the cloud and improving response speed and privacy protection. Table 4 compares the adaptive detection model developed in this project with the mainstream "Edge AI and Machine Learning" approach. The results indicate that the differences lie mainly in data processing and resource requirements during model training and inference. The deep learning method of our detection model does not rely on manually annotated data, reducing the need for annotation when training on edge devices; once the model is trained and integrated, inference is relatively fast and computation time short. Moreover, multimodal data fusion and the adaptive deep learning model improve the model's robustness to interference in complex environments, enabling it to adjust dynamically to sudden environmental changes and offering advantages in scenarios where data annotation is difficult. By contrast, the training process of Edge AI and Machine Learning is usually completed in the cloud; when performing inference on edge devices with pre-trained models and optimization techniques, its performance under sudden environmental changes (such as night, rain, and fog) is weak and its adaptability is poor.
Precision and recall vary across hazard types. In this study, the model’s precision and recall were evaluated for specific hidden hazard classes: cracks, collapse, leaching, and seepage. Each hazard type demonstrated slightly different characteristics in terms of these metrics, reflecting the model’s capability to differentiate and accurately identify distinct types of anomalies.
Cracks: The model achieved a high precision of 97% in detecting cracks, indicating that false positives were minimal. The recall for cracks was also robust, at approximately 94%, suggesting strong detection capability, although some cracks might not have been detected under certain conditions, such as in images with varying light or debris covering parts of the crack.
Collapse: For collapse hazards, the precision was around 98%, while recall was slightly lower at 93%. This indicates that the model effectively minimized false alarms, though a few instances of collapse might have been missed due to occlusions or the complex visual nature of collapsed structures.
Leaching: The precision for leaching was noted at 99%, with a recall of 95%. This shows that the model is particularly adept at identifying leaching without mistakenly classifying other anomalies as leaching. The high recall value also demonstrates the model’s ability to consistently recognize leaching occurrences, even in cases where staining or discoloration might present challenges.
Seepage: Seepage detection scored an impressive precision of 98% and recall of 96%. These figures reflect the model’s proficiency in accurately distinguishing seepage hazards, with few false negatives or positives. The characteristics of seepage—such as moisture patterns or soil discoloration—are distinct and were well captured by the model.

5.2. Algorithm Promotion and Practical Application

In summary, by integrating edge computing hardware and software, optimizing response speed, and adopting advanced multi-source data fusion and adaptive image segmentation models, the project not only significantly improves the precision and recall of hidden-hazard detection for small reservoir projects but also greatly shortens detection and response times, improving reservoir safety and operational efficiency. The image hazard feature extraction algorithm has been widely applied in 25 small reservoirs in Jiangxi Province, including Chookeng Reservoir, Jutang Tuanjie Reservoir, Xiashitang Reservoir, Gaokeng Reservoir, Dongzhan Reservoir, Linjiang Zoujia Reservoir, Zhangshu City Daqiao Dam, Yangqiao Tientang Reservoir, Lianhua County Reservoir, Changlan Maqing Dam, Xixi Reservoir, Xiaoshankou Reservoir, and Mengtang Reservoir, among others. The application effect at a typical reservoir is shown in Figure 15, Figure 16, Figure 17 and Figure 18.
On the basis of existing research, further exploration and expansion of the application scope of this project will bring more potential improvement space and innovative value to the safety monitoring and management of water conservancy engineering. Firstly, the existing technological framework can be extended to larger scale water conservancy facilities such as reservoirs, rivers, and gates, especially in areas with complex geographical locations and variable climate conditions. By increasing the adaptability to diverse data sources, the robustness and universality of the model can be improved in different environments and conditions. To further improve system performance, future research can optimize in the following areas:
Introducing deep learning and transfer learning techniques: Combining existing multi-source data fusion and adaptive image segmentation models, introducing deep learning frameworks, especially convolutional neural networks (CNN) and structured data processing methods based on graph neural networks (GNN), to improve the depth and accuracy of feature extraction. Meanwhile, transfer learning techniques can optimize existing models, reduce reliance on large-scale annotated data, and accelerate model deployment and application.
Enhance real-time data processing capability: Although edge computing has significantly improved data processing speed, it can further combine 5G technology and distributed computing methods to enhance the real-time data transmission and processing capability of the system, ensure that a large amount of data can be acquired and processed in time when emergencies occur, and improve the timeliness and accuracy of decision responses.
Developing multimodal data fusion methods: Currently, multi-source data fusion mainly focuses on the integration of visual image data and sensor data. In the future, more types of sensors (such as acoustics, radar, optics, etc.) can be introduced, combined with multimodal data sources such as meteorological data and historical disaster records, to construct a more comprehensive hazard monitoring model. Through multimodal information fusion, potential risk factors in complex environments can be better understood, and the ability to identify and warn of hidden dangers in small reservoirs can be improved.
Improved data management and analysis capabilities: In terms of data management, future research can develop and apply more efficient data storage, retrieval, and analysis tools to meet the storage and analysis needs of large-scale, multi type data. By utilizing data lake and data warehouse technology, data from different sources can be integrated into a unified analysis platform, and more valuable hidden danger information and patterns can be extracted from it with the help of artificial intelligence and machine learning algorithms.
Intelligent management and decision support: Through further algorithm optimization and data mining, this system can not only be used for detection and warning, but also provide intelligent management suggestions and decision support for reservoir managers. For example, by combining historical data and real-time monitoring data, the model can generate dynamic risk assessment reports, provide emergency response plans for different levels of hidden dangers, and help decision-makers manage the safe operation of reservoirs more scientifically.

5.3. Model’s Performance with Different Hyperparameters

Hyperparameters play a crucial role in determining both the performance and stability of deep learning models, including our hazard detection framework. Variations in key hyperparameters, such as learning rate and batch size, can significantly affect the model’s ability to generalize, converge, and remain stable during training.

5.3.1. Learning Rate

The learning rate is a critical hyperparameter that directly influences how quickly the model adapts to new data. A high learning rate can lead to rapid learning but might cause instability, resulting in the model overshooting the optimal parameters, failing to converge, or converging prematurely to a suboptimal solution. Conversely, a low learning rate stabilizes the learning process but might prolong training and lead to overfitting as the model adapts too slowly. In our tests, the learning rate critically impacts the convergence speed and the stability of training. During our experiments, a learning rate of 0.001 was found optimal, balancing accuracy and convergence. A lower learning rate (e.g., 0.0001) resulted in slower convergence and prolonged training, requiring around 50 epochs to reach an accuracy plateau. In contrast, a higher learning rate (e.g., 0.01) accelerated convergence but led to unstable training with oscillations in the loss curve, ultimately reducing accuracy by approximately 3–5% due to overshooting. Empirical results show that maintaining the learning rate around 0.001 provided stable convergence while maximizing accuracy, achieving 95% recall and 98% precision across hazard classes.

5.3.2. Batch Size

The batch size determines how many samples are processed before the model's weights are updated. Larger batches yield more stable gradient estimates but require more memory, whereas smaller batches inject noise into gradient descent and can destabilize training; batch size therefore influences both generalization and computational efficiency. In our model, a batch size of 32 achieved the best balance between stability and performance, reaching 96% accuracy after 30 epochs. Smaller batches (e.g., 16) produced more fluctuation in the loss curve, although final accuracy was comparable at 95%, while larger batches (e.g., 64) reduced training time but generalized slightly worse, losing around 2% accuracy on unseen data. The selected batch size of 32 provided optimal stability with a consistent learning rate, achieving high recall and precision while maintaining training efficiency. Future improvements may involve dynamic batch sizing strategies, in which the batch size adjusts according to the training phase.
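For concreteness, a loader configured in this way might look as follows; the synthetic tensors are placeholders for the UAV hazard-image dataset, which is not reproduced here.

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Synthetic stand-ins for UAV image patches and per-pixel hazard masks.
    images = torch.randn(128, 3, 128, 128)
    masks = torch.randint(0, 2, (128, 128, 128))

    # batch_size = 32: best stability/accuracy trade-off in our experiments;
    # 16 added loss-curve noise, 64 generalized ~2% worse on unseen data.
    train_loader = DataLoader(TensorDataset(images, masks),
                              batch_size=32, shuffle=True)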

5.3.3. Optimizer Choice

The optimizer determines how effectively the model navigates the loss landscape, and different optimizers affect performance and stability differently. We compared three: Adam, SGD (Stochastic Gradient Descent), and RMSprop. Adam, with its adaptive learning rates, accelerated convergence and stabilized training early on, achieving the best performance with 97% accuracy and stable loss reduction across epochs, though it was occasionally prone to overfitting. SGD provided smoother convergence but required a lower learning rate (0.001) and more careful scheduling, resulting in slower training and 94% accuracy after 40 epochs. RMSprop offered a balance, reaching 95% accuracy but with a less stable loss curve than Adam. Based on this analysis, Adam was selected for its ability to maintain stability and high accuracy across epochs.
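A small factory function, sketched below under the learning rates reported here, makes the comparison easy to reproduce; the momentum value and any setting beyond those rates are conventional assumptions rather than values from our experiments.

    import torch

    def make_optimizer(name, params):
        # Learning rates follow Section 5.3.3; momentum is an assumed default.
        if name == "adam":          # best observed: ~97% accuracy, stable loss
            return torch.optim.Adam(params, lr=1e-3)
        if name == "sgd":           # slower: ~94% accuracy after 40 epochs
            return torch.optim.SGD(params, lr=1e-3, momentum=0.9)
        if name == "rmsprop":       # middle ground: ~95%, noisier loss curve
            return torch.optim.RMSprop(params, lr=1e-3)
        raise ValueError(f"unknown optimizer: {name}")

    # Usage: opt = make_optimizer("adam", model.parameters())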
For future work, several additional hyperparameters could be considered for exploration; a combined configuration sketch follows the list below:
Dropout Rate: Dropout is a regularization technique to prevent overfitting by randomly deactivating neurons during training. Future work could experiment with varying dropout rates across layers, especially under conditions where datasets are limited, to enhance stability and prevent overfitting.
Weight Decay: Adjusting weight decay or L2 regularization could be crucial for maintaining model generalization, particularly in complex models prone to overfitting. Fine-tuning this parameter might improve the stability of the model by penalizing excessively large weight values.
Early Stopping Patience: Early stopping halts training once performance ceases to improve. Fine-tuning the patience parameter, which determines how long to wait for improvement before stopping, could help in achieving a balance between underfitting and overfitting, improving model robustness.
Gradient Clipping: This technique helps to prevent exploding gradients, which can lead to instability, especially in deep models. Investigating different thresholds for gradient clipping may ensure smoother and more stable updates to the model parameters.
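A combined PyTorch sketch of these four mechanisms is given below; the dropout rate, weight decay, patience, and clipping threshold are illustrative values chosen for the example, not settings tuned or reported in this study.

    import torch
    import torch.nn as nn

    # Placeholder model; Dropout2d illustrates the dropout rate to be explored.
    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1),
        nn.ReLU(),
        nn.Dropout2d(p=0.2),                     # dropout against overfitting
        nn.Conv2d(16, 2, kernel_size=1),
    )
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                                 weight_decay=1e-4)  # L2-style weight decay

    def train_step(images, labels):
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        # Gradient clipping keeps updates bounded and training stable.
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        return loss.item()

    # Early stopping: halt once the validation loss stops improving
    # for `patience` consecutive epochs.
    images = torch.randn(8, 3, 64, 64)           # synthetic stand-in data
    labels = torch.randint(0, 2, (8, 64, 64))
    best, wait, patience = float("inf"), 0, 5
    for epoch in range(100):
        train_step(images, labels)
        model.eval()                              # disable dropout for validation
        with torch.no_grad():
            val_loss = criterion(model(images), labels).item()
        model.train()
        if val_loss < best:
            best, wait = val_loss, 0
        else:
            wait += 1
            if wait >= patience:
                break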

5.4. Other Significant Hidden Hazards in Small-Scale Reservoir Engineering

In addition to the hazards already included in the dataset, there are several other potential risks in small-scale reservoir engineering that are critical to address for maintaining safety and operational integrity. These hazards include erosion, sedimentation, and issues related to the foundation and surrounding slopes. Each of these hazards presents unique challenges for detection, and future efforts should focus on developing models and methods to address them effectively.
Erosion: Erosion of the reservoir embankments or surrounding areas gradually weakens structural integrity. It is therefore crucial to develop techniques that detect early signs of erosion, such as soil displacement, changes in vegetation, or subtle shifts in topography. Multi-temporal image analysis of remote sensing data can help identify these patterns, and machine learning models trained on datasets that include erosion-related features can provide effective early warning.
Sedimentation: Sediment buildup in the reservoir reduces water capacity and can lead to operational issues. Identifying sedimentation patterns requires combining bathymetric surveys with surface monitoring. Using underwater sensors and image analysis can assist in detecting sediment accumulation over time. It is important to create a dataset that captures the characteristics of sedimentation processes in various reservoirs.
Foundation Issues: Problems with the foundation, such as subsidence or differential settlement, are harder to detect because they occur below the surface. Geotechnical monitoring, along with ground-penetrating radar or other subsurface imaging techniques, could be combined with visual monitoring data to track any foundation shifts or settlements. Collecting these data requires collaboration with geological surveys and real-time monitoring through embedded sensors.
Surrounding Slopes: Slope instability can lead to landslides or rockfalls, which can severely impact the reservoir structure. Detecting slope hazards involves monitoring changes in vegetation, rock fractures, or moisture levels. UAVs equipped with high-resolution cameras and LiDAR sensors could be used to monitor slope conditions over time. Deep learning models could be trained to identify early signs of slope failure using time-series image data.
Erosion and sedimentation not only affect the structural safety of reservoirs but also have long-term impacts on water capacity and environmental health. Foundation issues can lead to catastrophic failures if left undetected, making real-time monitoring essential for early intervention. Slope instability poses both immediate and long-term risks to reservoir stability, and proactive detection of landslide-prone areas is key to preventing disaster.
To capture these hidden hazards, a combination of historical data (from past reservoir failures), real-time monitoring (using sensors and UAVs), and synthetic data (generated from simulations) is necessary. Engaging with local authorities and field engineers to gather case studies of erosion, sedimentation, and foundation issues will help enrich the dataset. Datasets should be diverse and include images from various weather conditions, seasons, and environmental contexts to improve model robustness. By expanding the dataset and developing hazard-specific models, future systems can better predict and mitigate these risks, ensuring greater safety and operational efficiency for small-scale reservoirs.

6. Conclusions

This study demonstrates the effectiveness of an advanced image processing framework for detecting and assessing safety hazards in small-scale reservoirs using deep learning techniques. By employing a fully convolutional semantic segmentation method with an encoding–decoding structure, the proposed model effectively utilizes convolutional neural networks (CNNs) to enhance detection accuracy and response efficiency. The pyramid structure, validated as optimal for CNNs, was carefully implemented to balance feature extraction and computational resource usage. Experimental results showed that the best network performance was achieved with a specific channel configuration, highlighting the importance of carefully managing channel dimensions during the decoding stage to preserve hidden features without excessive computational costs.
Furthermore, the HHSN-25 network demonstrated superior performance over existing segmentation methods, such as FCN, SegNet, and Deeplabv3+, with a mean Intersection over Union (mIoU) reaching 87.00%. The study also integrated an improved loss function focusing on the similarity of different samples, which proved to be the most effective in enhancing the model’s performance. The chosen evaluation metrics, particularly the F1 score and mPA index, provided a comprehensive assessment of both hidden and background pixel classification, further validating the superiority of the proposed approach.
The convergence analysis of the loss function also highlighted the optimal training rounds required to achieve a balance between model performance and computational efficiency, avoiding issues of underfitting or overfitting. These findings underscore the potential of the proposed approach to provide accurate and real-time hazard detection, significantly contributing to the safety and sustainability of small-scale reservoirs.
The limitations of the proposed method mainly revolve around the model’s dependence on the quality and variety of training data. As small-scale reservoir data are often scarce, the model’s ability to generalize across different conditions (e.g., weather patterns, sedimentation, or vegetation growth) may be limited. Additionally, the computational complexity of deep learning models can be a constraint when deploying on edge devices in real-time applications. Moreover, the accuracy of the model might be affected in extreme environmental conditions, such as heavy rain, fog, or night-time monitoring, where visibility is drastically reduced.
Overall, the proposed framework successfully integrates advanced neural network architectures and data processing strategies to offer a robust solution for reservoir monitoring, with potential applications in other areas of water resource management and infrastructure safety. Further research can explore the integration of transfer learning techniques to improve model accuracy when faced with unseen conditions or locations where reservoir data are limited. Another promising direction is the development of hybrid models that combine deep learning with physical models of reservoir behavior to enhance prediction accuracy. Additionally, research could be directed at improving the model’s performance under extreme conditions, using enhanced image preprocessing techniques like noise reduction, fog removal, and image enhancement.

Author Contributions

Conceptualization, Z.Z. and S.F.; methodology, S.F. and W.T.; software, S.F. and W.F.; validation, Z.Z., Y.X. and B.Z.; formal analysis, Z.Z., W.F., Y.X. and B.Z.; investigation, L.L., H.J. and W.T.; resources, Z.Z.; data curation, Z.Z., S.F., W.F., Y.X., B.Z. and L.L.; writing—original draft preparation, Z.Z. and S.F.; writing—review and editing, S.F. and W.T.; supervision, S.F.; project administration, Z.Z.; funding acquisition, Z.Z. and S.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 42476213), the Science and Technology Projects of the Water Resources Department of Jiangxi Province (Grant Nos. 202325ZDKT18 and 202425BZKT04), the Open Research Fund Project of the Jiangxi Academy of Water Science and Engineering (Grant No. 2022SKSG03), and the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2022A1515110200).

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 42476213), the Science and Technology Projects of the Water Resources Department of Jiangxi Province (Grant Nos. 202325ZDKT18 and 202425BZKT04), the Open Research Fund Project of the Jiangxi Academy of Water Science and Engineering (Grant No. 2022SKSG03), and the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2022A1515110200).

Conflicts of Interest

Author Yaozong Xu was employed by the company China Railway Water Conservancy Information Technology Co., Ltd. Author Bin Zhu was employed by the company Jiangxi Provincial Water Conservancy Investment Jianghe Information Technology Co., Ltd. Author Lei Li was employed by the company Beijing Guoxinhuayuan Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

Terms | Abbreviations
Convolutional neural networks | CNNs
Unmanned aerial vehicles | UAVs
Hidden Hazard Segmentation Network with 25 layers | HHSN-25 network
Support vector machines | SVM
Deep learning | DL
Convolutional blocks | Conv
Residual blocks | Block
2D convolution | Conv2d
Batch Normalization | BN
Omni-dimensional dynamic convolution layer | ODConv2d
Transposed convolutional blocks | TConv
2D transposed convolution | ConvTranspose2d
Layer Normalization | LN
Cross Entropy | CE
Binary Cross Entropy with Logits | BCEL
Intersection over Union | IoU
Online hard example mining | Ohem
Focal Loss | FL
Complementary Metal-Oxide-Semiconductor | CMOS
Analog-to-Digital | AD
Gaussian Error Linear Unit | GELU
Rectified Linear Unit | ReLU
Squeeze-and-Excitation | SE
Omni-dimensional dynamic convolution | ODConv

References

1. Wang, D.; Li, Y.; Yang, X.; Zhang, Z.; Gao, S.; Zhou, Q.; Zhuo, Y.; Wen, X.; Guo, Z. Evaluating urban ecological civilization and its obstacle factors based on integrated model of PSR-EVW-TOPSIS: A case study of 13 cities in Jiangsu Province, China. Ecol. Indicat. 2021, 133, 108431.
2. Wang, Q.; Li, S.; Li, R. Evaluating water resource sustainability in Beijing, China: Combining PSR model and matter-element extension method. J. Clean. Prod. 2019, 206, 171–179.
3. Nahmens, I. From lean to green construction: A natural extension. In Proceedings of the Building a Sustainable Future—Proceedings of the 2009 Construction Research Congress, Seattle, WA, USA, 5–7 April 2009; pp. 1058–1067.
4. Plessis, C.D. A strategic framework for sustainable construction in developing countries. Constr. Manag. Econ. 2007, 25, 67–76.
5. Lapworth, D.J.; Boving, T.B.; Kreamer, D.K.; Kebede, S.; Smedley, P.L. Groundwater quality: Global threats, opportunities and realising the potential of groundwater. Sci. Total Environ. 2022, 811, 152471.
6. Zhao, Y.; Qu, R.; Xing, Z.; Lu, W. Identifying groundwater contaminant sources based on a KELM surrogate model together with four heuristic optimization algorithms. Adv. Water Resour. 2020, 138, 103540.
7. Li, J.; Lu, W.; Wang, H.; Fan, Y.; Chang, Z. Groundwater contamination source identification based on a hybrid particle swarm optimization-extreme learning machine. J. Hydrol. 2020, 584, 124657.
8. Zhang, J.; Vrugt, J.A.; Shi, X.; Lin, G.; Wu, L.; Zeng, L. Improving Simulation Efficiency of MCMC for Inverse Modeling of Hydrologic Systems with a Kalman-Inspired Proposal Distribution. Water Resour. Res. 2020, 56, e2019WR025474.
9. Seyedpour, S.M.; Kirmizakis, P.; Brennan, P.; Doherty, R.; Ricken, T. Optimal remediation design and simulation of groundwater flow coupled to contaminant transport using genetic algorithm and radial point collocation method (RPCM). Sci. Total Environ. 2019, 669, 389–399.
10. Wang, Z.; Lu, W. Groundwater Contamination Source Recognition Based on a Two-Stage Inversion Framework with a Deep Learning Surrogate. Water 2024, 16, 1907.
11. Rashid, M.; Bari, B.S.; Yusup, Y.; Kamaruddin, M.A.; Khan, N. A Comprehensive Review of Crop Yield Prediction Using Machine Learning Approaches with Special Emphasis on Palm Oil Yield Prediction. IEEE Access 2021, 9, 63406–63439.
12. Siade, A.J.; Cui, T.; Karelse, R.N.; Hampton, C. Reduced-Dimensional Gaussian Process Machine Learning for Groundwater Allocation Planning Using Swarm Theory. Water Resour. Res. 2020, 56, e2019WR026061.
13. Sze, V.; Chen, Y.; Yang, T.; Emer, J.S. Efficient Processing of Deep Neural Networks: A Tutorial and Survey. Proc. IEEE 2017, 105, 2295–2329.
14. Liang, T.; Glossner, J.; Wang, L.; Shi, S.; Zhang, X. Pruning and quantization for deep neural network acceleration: A survey. Neurocomputing 2021, 461, 370–403.
15. Murad, A.; Pyun, J. Deep Recurrent Neural Networks for Human Activity Recognition. Sensors 2017, 17, 2556.
16. Anul Haq, M.; Khadar Jilani, A.; Prabu, P. Deep Learning Based Modeling of Groundwater Storage Change. Comput. Mater. Contin. 2022, 70, 4599–4617.
17. Jiang, S.; Fan, J.; Xia, X.; Li, X.; Zhang, R. An Effective Kalman Filter-Based Method for Groundwater Pollution Source Identification and Plume Morphology Characterization. Water 2018, 10, 1063.
18. Hinton, G.E.; Osindero, S.; Teh, Y. A Fast Learning Algorithm for Deep Belief Nets. Neural Comput. 2006, 18, 1527–1554.
19. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
20. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; IEEE Press: New York, NY, USA, 2014; pp. 580–587.
21. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot MultiBox detector. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2016; Volume 9905, pp. 21–37.
22. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE Press: New York, NY, USA, 2016; pp. 779–788.
23. Choi, K.; Lee, H.W.; Mao, Z.; Lavy, S.; Ryoo, B.Y. Environmental, Economic, and Social Implications of Highway Concrete Rehabilitation Alternatives. J. Constr. Eng. Manag. 2016, 142, 04015079.
24. Feng, C.; Zhang, H.; Wang, S.; Li, Y.; Wang, H.; Yan, F. Structural damage detection using deep convolutional neural network and transfer learning. KSCE J. Civ. Eng. 2019, 23, 4493–4502.
25. Huang, B.; Zhao, S.; Kang, F. Image-based automatic multiple-damage detection of concrete dams using region-based convolutional neural networks. J. Civ. Struct. Health Monit. 2023, 13, 413–429.
26. Zhao, S.; Kang, F.; Li, J. Concrete dam damage detection and localisation based on YOLOv5s-HSC and photogrammetric 3D reconstruction. Autom. Constr. 2022, 143, 104555.
27. Feng, C.; Zhang, H.; Wang, H.; Wang, S.; Li, Y. Automatic pixel-level crack detection on dam surface using deep convolutional network. Sensors 2020, 20, 2069.
28. Chen, B.; Zhang, H.; Li, Y.; Wang, S.; Zhou, H.; Lin, H. Quantify pixel-level detection of dam surface crack using deep learning. Meas. Sci. Technol. 2022, 33, 065402.
29. Zhang, E.; Shao, L.; Wang, Y. Unifying transformer and convolution for dam crack detection. Autom. Constr. 2023, 147, 1–14.
30. Li, M.; Feng, X.; Han, Y. Brillouin fiber optic sensors and mobile augmented reality-based digital twins for quantitative safety assessment of underground pipelines. Autom. Constr. 2022, 144, 104617.
31. Cao, W.; Li, J. Detecting large-scale underwater cracks based on remote operated vehicle and graph convolutional neural network. Front. Struct. Civ. Eng. 2022, 16, 1378–1396.
32. Li, Y.; Bao, T.; Huang, X.; Wang, R.; Shu, X.; Xu, B.; Tu, J.; Zhou, Y.; Zhang, K. An integrated underwater structural multi-defects automatic identification and quantification framework for hydraulic tunnel via machine vision and deep learning. Struct. Health Monit. 2023, 22, 2360–2383.
33. Li, Y.; Bao, T.; Huang, X.; Chen, H.; Xu, B.; Shu, X.; Zhou, Y.; Cao, Q.; Tu, J.; Wang, R.; et al. Underwater crack pixel-wise identification and quantification for dams via lightweight semantic segmentation and transfer learning. Autom. Constr. 2022, 144, 104600.
34. Kokkaew, N.; Rudjanakanoknad, J. A Framework of Green Growth Assessment for Thailand's Highway Infrastructure Developments. Comput. Civ. Build. Eng. 2014, 2014, 1951–1959.
35. Zhou, J.; Jing, J.F.; Zhang, H.H.; Wang, Z.; Huang, H.L. Real-time Fabric Defect Detection Algorithm Based on S-YOLOV3 Model. Laser Optoelectron. Prog. 2020, 57, 161001.
36. Zhang, G.S.; Ge, G.Y.; Zhu, R.H.; Sun, Q. Gear defect detection based on the improved YOLOv3 network. Laser Optoelectron. Prog. 2020, 57, 121009.
Figure 1. Hidden hazards in small-scale reservoir engineering (the watermark comes from the Jiangxi Academy of Water Science and Engineering and is a necessary part of the work documents).
Figure 2. Crack hazard investigation in small-scale reservoir engineering.
Figure 3. Reservoir accident site map.
Figure 4. Time series correlation diagram of seepage under rainfall conditions and abnormal values at measurement points. Red squares represent abnormal data during the period, and circles represent abnormal data at the end of the period.
Figure 5. LabelImg tool output labels.
Figure 6. Hidden Hazard Segmentation Network with 25 layers.
Figure 7. Convolution and deconvolution. (a) Convolution; (b) deconvolution.
Figure 8. Basic structure of full-dimensional dynamic convolution.
Figure 9. Comparison of different convolution kernel sizes in Block.
Figure 10. Comparison of the different Block structures.
Figure 11. Comparison of changes in different network widths.
Figure 12. Comparison of network performance calculated using different loss functions.
Figure 13. Comparison of the performance of the different methods.
Figure 14. HHSN-25 train-validation loss trends.
Figure 15. Identification of leaching and extraction of quantitative indicator (the watermark comes from the Jiangxi Academy of Water Science and Engineering and is a necessary part of the work documents).
Figure 16. Identification of cracks and extraction of quantitative indicator (the watermark comes from the Jiangxi Academy of Water Science and Engineering and is a necessary part of the work documents).
Figure 17. Identification of leakage and extraction of quantitative indicator (the watermark comes from the Jiangxi Academy of Water Science and Engineering and is a necessary part of the work documents).
Figure 18. Identification of collapse and extraction of quantitative indicator (the watermark comes from the Jiangxi Academy of Water Science and Engineering and is a necessary part of the work documents).
Table 1. IMX600 image sensor parameters.
Name | Parameters
Image size | Diagonal 9.2 mm (1/1.7 type)
Total number of pixels | 7392 (H) × 5744 (V), approximately 42.28 M pixels
Total number of effective pixels | 7392 (H) × 5584 (V), approximately 41.27 M pixels
Total number of active pixels | 7360 (H) × 5504 (V), approximately 40.51 M pixels
Chip size | 8.51 mm (H) × 6.22 mm (V)
Cell size | 1.00 μm (H) × 1.00 μm (V)
Substrate material | Silicon
Table 2. Crack hazard investigation in the province. Columns 3–4 count earth-rock dams across the province; columns 5–10 cover the earth-rock dams in this survey, with cracked reservoirs divided by grade (large and medium vs. small) and by dam height (≥15 m vs. <15 m).
Order | Cities | Reservoir Number | Number of Dams | Adopting Concrete Anti-Seepage Wall | Reservoirs with Cracks | Large and Medium Size | Small Size | ≥15 m | <15 m
1 | Nanchang | 8 | 7 | 7 | 2 | 1 | 1 | 1 | 1
2 | Jiujiang | 29 | 14 | 6 | 7 | 3 | 4 | 7 | 0
3 | Shangrao | 44 | 25 | 6 | 7 | 7 | 0 | 5 | 2
4 | Fuzhou | 27 | 20 | 11 | 3 | 3 | 0 | 3 | 0
5 | Yichun | 49 | 41 | 22 | 1 | 1 | 0 | 1 | 0
6 | Ji'an | 47 | 29 | 10 | 8 | 5 | 3 | 6 | 2
7 | Pingxiang | 7 | 5 | 2 | 2 | 1 | 1 | 2 | 0
8 | Jingdezhen | 7 | 7 | 3 | 3 | 2 | 1 | 3 | 0
9 | Ganzhou | 47 | 19 | 7 | 0 | 0 | 0 | 0 | 0
10 | Yingtan | 11 | 8 | 6 | 0 | 0 | 0 | 0 | 0
11 | Xinyu | 7 | 7 | 2 | 0 | 0 | 0 | 0 | 0
Total | | 283 | 182 | 82 | 33 | 23 | 10 | 28 | 5
Table 3. Crack hazard investigation in reservoirs.
Class | Item | Crack Number | Proportion (%)
Total | | 79 | 100
Type | Transverse crack | 32 | 41
Type | Longitudinal crack | 47 | 59
Location | Before and after the seepage wall | 61 | 77
Location | Upstream dam slope | 13 | 16
Location | Downstream dam slope | 3 | 4
Location | Other parts or sites | 2 | 3
Current situation | Stable | 64 | 81
Current situation | Still developing | 4 | 5
Current situation | Situation unknown | 11 | 14
Table 4. Comparison of the adaptive detection model developed in this project with the mainstream method "Edge AI and Machine Learning".
Comparative Indicators | Adaptive Detection Model | Edge AI and Machine Learning
Data acquisition and processing | 40 s | 5 min
Model calculation time | 10 s | 25 s
Recall | 95% | 85%
F1 score | 93% | 80%
Average intersection ratio | 94% | 82%
Stability | High stability (sensors with electromagnetic interference resistance; multiple sensors complement each other to provide redundant information) | Medium stability (grounding protection system; backup computing units and sensors)
Robustness | High-frequency data sampling and multimodal data fusion enhance anti-interference ability in complex environments; adaptive deep learning models dynamically adjust and adapt to sudden environmental changes | Advanced signal processing techniques and filtering algorithms effectively filter out noise and interference caused by environmental changes, but cannot cope with sudden environmental changes
