Article

Damage Detection and Segmentation in Disaster Environments Using Combined YOLO and Deeplab

1 Railroad Test and Certification Division, Korea Railroad Research Institute, Uiwang 16105, Republic of Korea
2 School of Computer Science and Engineering, Kunsan National University, Gunsan 54150, Republic of Korea
3 Department of Artificial Intelligence and Robotics, Sejong University, Seoul 05006, Republic of Korea
4 School of Mechanical Engineering, Kunsan National University, Gunsan 54150, Republic of Korea
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Remote Sens. 2024, 16(22), 4267; https://doi.org/10.3390/rs16224267
Submission received: 4 September 2024 / Revised: 12 November 2024 / Accepted: 12 November 2024 / Published: 15 November 2024

Abstract

Building damage from various causes occurs frequently and creates risk factors that can lead to additional collapses. However, it is difficult to accurately identify objects in complex structural sites because of inaccessible conditions and image noise. Conventional approaches have relied on close-up images to detect and segment damage such as cracks. In this study, a deep learning-based method is proposed for the rapid detection and analysis of multiple damage types, such as cracks and concrete rubble, at disaster sites. With the proposed method, analysis can be performed on image information received from a robot explorer instead of a human, and damage information can be detected and segmented even when the damaged point is photographed from a distance. To accomplish this goal, damage information is detected and segmented using YOLOv7 and Deeplabv2. Damage information is quickly detected by YOLOv7, and semantic segmentation is then performed by Deeplabv2 based on the bounding box information obtained from YOLOv7. By training on images with various resolutions and distances to the object, damage information can be effectively detected not only at short distances but also at long distances. When the results were compared according to how YOLOv7 and Deeplabv2 were combined, the proposed method returned better scores than the comparison models, with a Recall of 0.731, a Precision of 0.843, an F1 of 0.770, and an mIoU of 0.638, and it had the lowest standard deviation.

1. Introduction

Over the past 20 years, the number of disasters has increased by 1.7 times compared to the previous 20 years, with floods, storms, and earthquakes accounting for 80% of the total. The climate crisis is considered the main cause, and the number of disasters is expected to increase as the climate crisis worsens [1]. Cracks and concrete piles are common in buildings that have collapsed or broken due to such disasters and dilapidation. Cracks and rubble can cause additional collapses, increasing the magnitude of damage. In a disaster environment, rapid identification and analysis of such damage information is an important part of assessing the scene. Collecting data to investigate the causes of facility damage and to prevent accidents caused by such damage in disaster environments is essential. It is dangerous for a person to enter and inspect the scene of an accident, and it is also difficult to move into potentially narrow sections. Considering the additional risk of disasters and the narrow, complex terrain, the use of disaster robots is necessary. Therefore, robots, drones, and similar platforms are deployed instead of humans, and their images are received and analyzed through remote control. With the remote method, workers cannot directly see the damage, so methods are needed to grasp the damage information accurately. To collect accurate and proper information, it is crucial to develop algorithms that can detect and segment information in real time and to enhance their performance.
In this paper, the goal is to detect and segment damage information obtained at long distances. The model presented in this study is part of a program that uses a remote robot to explore disaster areas and gather damage data, including cracks and rubble, even at a distance. High-resolution images containing multiple types of damage, such as cracks and rubble, are acquired through a camera mounted on a robot deployed to the disaster environment.
Existing damage detection studies have mostly used images taken at short distances or with monotonous backgrounds, and cases where the object appears small because of a longer distance have not been frequently investigated. Also, most models detect only one type of object. When a model is trained on data in which the object occupies a large, central portion of the image, it cannot detect the object in a wide field of view. As a result, damaged objects that should be detected may not be identified and are missed. The robot judges well when it is close to the object, but detection accuracy decreases as the distance increases. In the real field, however, damage information is obtained against a complex background, so it is important to identify the damage targets in these situations.
In field exploration, it is necessary to identify all the damage across the entire environment before proceeding with a detailed investigation and then to focus the investigation on the necessary points. For this purpose, the YOLOv7 and Deeplabv2 models were used to detect and segment multiple types of damage information based on the images obtained through a camera mounted on a rover. YOLOv7 has shown an overall higher performance when recognizing small- and medium-sized objects [2], and it is also slightly faster than YOLOv8. YOLOv7 introduces trainable bag-of-freebies methods that improve accuracy without increasing the inference cost. The Deeplab series has been developed through v1, v2, v3, and v3+. In general, v3+ has been reported to perform better, but when segmenting line-shaped objects such as sketches, v2 has been shown to perform better [3,4]. Because cracks have a line shape like a sketch, v2 was chosen among the Deeplab models.
First, damage information is quickly detected using YOLOv7, and then the input image is cropped using the bounding box (Bbox) information obtained through YOLOv7. Segmentation is performed with Deeplabv2 only on the cropped images. This provides the segmentation model with explicit targets by specifying the areas of the image to be segmented. In this way, damage can be detected preferentially at a long distance, including in areas that cannot be approached closely. The main contributions of this paper are as follows:
  • By combining the detection and segmentation models, the evaluation metric scores are improved compared to a single model, and small objects that were previously undetected can be identified.
  • Damage information can be detected and segmented not only at short distances but also at long distances.
  • Multiple types of damage information (multiple classes), rather than a single type such as only cracks or piles, can be detected and segmented.
Section 2 provides the related work in research for the detection or segmentation of damage information. Section 3 describes the method and model proposed in this paper. Section 4 describes the datasets used for training and testing, the evaluation method, and the experimental environment. Section 5 illustrates and discusses the training and analysis results. Limitations of the proposed work and future work are then discussed. Section 6 presents conclusions on the system and experimental results proposed in this paper.

2. Related Work

Damage information detection methods have been developed in various ways, with a focus on crack detection [5]. Visual inspection can be used to evaluate the degree of damage, and detection methods using contact or sensors, such as non-destructive inspection, have been studied [6]. However, in a complex environment, such inspection carries risks and limitations, and issues with the reliability of the sensor data can arise. To remedy these issues, methods for detecting damage using images and video with remote sensing techniques have been studied [7]. Augmented reality was used to detect damage in multiclass structures and showed an accuracy of over 90% for four types of damage at distances of up to 2 m [8]. With the rapid development of computer vision and deep learning in the past few years, object detection and segmentation models have been used in various fields. In particular, convolutional neural networks (CNNs) have typically been proposed for image classification tasks [9,10,11]. Various models such as Fully Convolutional Networks, Mask R-CNN, Faster R-CNN, and ResNet have been developed for image segmentation and detection based on CNNs [12,13,14]. Previous work [15] proposed the Robust Mask R-CNN as an end-to-end deep learning method for detecting cracks. In another study [16], researchers proposed an approach based on a transformer network for automated pavement crack inspection. The authors of [17] detected and segmented defects in thermal images using a 2D/3D-UNet network. The advancement of technologies such as optics, flight, and communication has made it possible to work with high-quality images. Based on this, CNN architectures were designed, and methods for detecting building damage using satellite images or drones were studied [18,19,20]. RescueNet, a unified model capable of end-to-end learning, was developed, along with a novel localization-aware loss function for building segmentation and damage classification [21].
Especially in damage detection, cracks are actively studied, and various methods have been proposed for the detection and segmentation of information on cracks, mainly in asphalt, bridges, and building walls [22,23,24]. Xiang et al. [25] proposed a dual-encoder network fusing transformers and convolutional neural networks to segment crack information. Alexnet (ImageNet CNN model) was implemented to classify collapse events and support decision-making for building a database [26]. YOLOv3 was improved to make it lightweight, and it was further combined with MobileNets and CBAM (Convolutional Block Attention Module) to detect cracks on a bridge surface [27]. Wang et al. [28] proposed YOLOv5s-VF to detect defects on a railway surface, and Wang et al. [29] used BiFPN with weights removed, based on YOLOv7, to detect defects on a steel strip surface, with an ECA attention mechanism combined in the backbone. Kumar et al. [30] suggested a real-time multi-drone damage detection system using the edge computing principle and YOLO. To segment cracks, researchers in [31] proposed an end-to-end deep learning model and used the entire image information at once rather than a method such as a sliding window. A semantic damage detection network, SDDNet, was proposed using various convolution modules, Atrous Spatial Pyramid Pooling (ASPP), and decoder modules [32]. Instead of existing crack images with a monotonous background, the cracks were segmented even on complex backgrounds; however, there was difficulty in segmenting faint cracks. Faster R-CNN and a modified Tubularity Flow Field (TuFF) algorithm were combined to detect cracks even in environments with complex angles and distances [33]. However, most studies focused on detecting only one object or they used monotonous images with greatly enlarged objects. In a real environment, complex images are obtained, and various damage factors exist, so problems may occur at the actual use stage.

3. Proposed Methodology

To evaluate and compare the performance of the combination of detection and segmentation models, three methods are defined as follows:
(1) Deeplabv2: using full-scale images;
(2) InYD: Intersection of YOLOv7 and Deeplabv2;
(3) DCIY: Deeplabv2 with Cropped Images by YOLOv7.

3.1. Deeplabv2

In the first method, two classes of damage information (i.e., cracks and concrete remains) are trained on Deeplabv2, an existing segmentation model. The Deeplabv2 model is based on the Resnet101 model, and an ASPP has been added to apply multi-scale context [34].
In the process of obtaining damage information, especially for cracks in complex environments, segmentation can be difficult when trained using only enlarged images without the surrounding information. To prevent this, it is necessary to collect the surrounding information. However, if the size of the filter is increased for this purpose, the computation cost increases. ASPP is a technique that arranges Atrous Convolutions in parallel. In the convolution calculation process, the Field of View (FOV) can be diversified by adding spaces between the filter values.
The spacing of the filter values is represented by the rate r. The model used in this paper has r values of 6, 12, 18, and 24. Because ASPP has several receptive fields, it is less affected by the size of an object and therefore has an advantage when segmenting objects of various sizes within an image.
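A minimal PyTorch-style sketch of this ASPP head is shown below for illustration only (the Deeplabv2 used in this work runs on TensorFlow 1.15, see Table 2); the channel and class counts are assumptions.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Parallel atrous (dilated) convolutions with rates 6, 12, 18, and 24, as in Deeplabv2.

    Each branch sees a different receptive field over the same feature map,
    and the branch outputs are summed into per-class score maps.
    """
    def __init__(self, in_channels: int, num_classes: int, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, num_classes, kernel_size=3, padding=r, dilation=r)
            for r in rates  # padding = rate keeps the spatial size unchanged
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum the parallel branches (Deeplabv2-style fusion of multi-scale context).
        return sum(branch(x) for branch in self.branches)

# Example with a ResNet-101-like feature map (2048 channels); three classes are
# assumed here (background, crack, pile).
features = torch.randn(1, 2048, 41, 41)
print(ASPP(2048, num_classes=3)(features).shape)  # torch.Size([1, 3, 41, 41])
```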

3.2. InYD

In the second method, damage information is detected and segmented using the results of both Deeplabv2 and YOLOv7. YOLOv7 is used as the detection model to detect damage information such as cracks and rubble in a disaster environment. The YOLO family shows excellent speed in object detection, and many versions have been released, up to YOLOv11 [35]. YOLOv7, which is used in this study, reports a detection speed 120% faster than YOLOv5 on the MS COCO dataset. YOLOv7 improves performance without increasing the inference cost by using various bag-of-freebies (BoF) methods, such as a reparameterization method that fuses the multiple layers learned during training into one layer at inference, and a soft label generation method using the lead head and the ground truth. YOLOv7 transforms RepVGG to use a RepConvN structure without the identity connection in RepConv. This prevents the residual and concatenation connections from being destroyed and yields a RepConvN structure that can be reparameterized. Additionally, the E-ELAN structure improves the learning ability by preventing the unstable state that arises when computational blocks are stacked without limit, which is a disadvantage of the existing ELAN. The YOLOv7 model is constructed from the CBS module, the ELAN module, and the MP module, with SiLU as the activation function. The CBS module consists of Conv, BN, and SiLU, and the MP module consists of Maxpool and CBS.
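For illustration, the CBS and MP modules described above can be sketched as follows; this is a simplified PyTorch sketch that mirrors the brief description in the text, not the official YOLOv7 implementation, and the channel choices are assumptions.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU, the basic YOLOv7 block described in the text."""
    def __init__(self, c_in: int, c_out: int, k: int = 3, s: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class MP(nn.Module):
    """Maxpool followed by CBS, matching the module composition stated in the text."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # halves the spatial resolution
        self.cbs = CBS(c_in, c_out)

    def forward(self, x):
        return self.cbs(self.pool(x))

x = torch.randn(1, 64, 160, 160)
print(MP(64, 128)(x).shape)  # torch.Size([1, 128, 80, 80])
```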
InYD is a method designed by simply overlapping the Bbox results of YOLOv7 with the segmentation results of Deeplabv2. The training data are the same as in the first method, and the post-processed image is produced by intersecting the segmentation result from Deeplabv2 with the Bbox detection result from YOLOv7. Only where the detection and segmentation results of the two models overlap is a result output. Because the two results are cross-checked against each other, false positives or over-detection that may occur when searching for damage information with Deeplabv2 or YOLOv7 alone can be reduced. However, if either model fails to detect or segment an object, no result may be obtained for it.
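A minimal sketch of this intersection post-processing is shown below; it assumes the segmentation output is an (H, W) per-pixel class map and the detector returns pixel-coordinate boxes, both of which are assumed wrapper formats rather than the paper's exact implementation.

```python
import numpy as np

def inyd_intersection(seg_mask: np.ndarray, bboxes) -> np.ndarray:
    """Keep only the segmented pixels that fall inside at least one detected Bbox.

    seg_mask : (H, W) array of class ids from Deeplabv2 (0 = background).
    bboxes   : iterable of (x1, y1, x2, y2) pixel-coordinate boxes from YOLOv7.
    Returns the post-processed mask, i.e., the intersection of the two results.
    """
    inside_any_box = np.zeros(seg_mask.shape, dtype=bool)
    for x1, y1, x2, y2 in bboxes:
        inside_any_box[y1:y2, x1:x2] = True
    # Pixels outside every box are set back to background.
    return np.where(inside_any_box, seg_mask, 0)
```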

3.3. DCIY

The third is the method proposed in this paper; after identifying the damage information through YOLOv7, the images are extracted and cropped based on the Bbox information obtained by YOLOv7. Object information and Bbox coordinate information detected through the YOLOv7 model are used for image cropping as preprocessing for the damage segmentation model. The objects in the Bbox image are segmented using Deeplabv2. The overview of the proposed method is shown in Figure 1, and Algorithm 1 shows the pseudo-code of DCIY.
Although YOLOv7’s performance has improved compared to the previous versions in terms of speed and accuracy, the tendency to over-detect was still confirmed in the damage information detection test. To prevent this, Deeplabv2 was used with YOLOv7 to explore damage information.
When the segmentation model is used alone, segmentation works well for images taken at a short distance, as shown in Figure 2a,c. However, as shown in Figure 2b,d, segmentation is not performed properly when the image is taken from a long distance. In particular, the dark part of the background in Figure 2b was mistakenly judged to be a crack and segmented, and in Figure 2d, some objects could not be segmented.
Errors can be reduced by first identifying the damage coordinates through a detection model and then segmenting the corresponding parts. In the image analysis for damage detection in a disaster environment, the scale of cracks and piles is first identified with Bbox through YOLOv7. To segment the cracks and piles in the area indicated by Bbox, the corresponding area is cropped.
In the case of segmentation models, the time and capacity may increase with larger image sizes. Through the proposed method, it is possible to calculate only the object to be segmented out of the entire image by first quickly detecting the damage information and proceeding with the segmentation operation only for the detected area. This can reduce unnecessary calculations for the entire image and improve segmentation accuracy [36].
This method enables the rapid detection and segmentation of facility damage information in complex and messy disaster environments.
Algorithm 1 Pseudo-code of DCIY
1.  procedure DCIY(IMG)
2.  function Train(model, input):
3.    Evaluation ← Eval(model(input))
4.    model update with Evaluation
5.  end function
6.  if input IMG from a camera then
7.    if YOLOv7 is learning then
8.      Train(YOLOv7, IMG)
9.    else
10.      Bbox ← YOLOv7(IMG)
11.      Crop IMG ← Bbox ∩ IMG
12.    end if
13.    if Crop IMG exists then
14.      if Deeplabv2 is learning then
15.        Train(Deeplabv2, Crop IMG)
16.      else
17.        Seg IMG ← Deeplabv2(Crop IMG)
18.      end if
19.    end if
20.  end if
21.   end procedure
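A runnable sketch of the inference path of Algorithm 1 is shown below; `detector` and `segmenter` are hypothetical wrappers around the trained YOLOv7 and Deeplabv2 models, and the 64 × 64 minimum crop size follows the rule described later in Section 4.1.

```python
import numpy as np

MIN_SIZE = 64  # minimum crop resolution used for segmentation inputs (Section 4.1)

def expand_to_min_size(x1, y1, x2, y2, img_w, img_h, min_size=MIN_SIZE):
    """Grow an undersized Bbox around its centre until it reaches min_size,
    clamped to the image borders."""
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = max(x2 - x1, min_size), max(y2 - y1, min_size)
    nx1, ny1 = int(max(0, cx - w / 2.0)), int(max(0, cy - h / 2.0))
    nx2, ny2 = int(min(img_w, nx1 + w)), int(min(img_h, ny1 + h))
    return nx1, ny1, nx2, ny2

def dciy(img: np.ndarray, detector, segmenter) -> np.ndarray:
    """DCIY inference: detect with YOLOv7, crop each Bbox, segment the crop with
    Deeplabv2, and paste the crop-level mask back into a full-size mask.

    detector(img)   -> iterable of (x1, y1, x2, y2, class_id) boxes (assumed format).
    segmenter(crop) -> class map with the same height/width as the crop (assumed format).
    """
    h, w = img.shape[:2]
    full_mask = np.zeros((h, w), dtype=np.uint8)
    for x1, y1, x2, y2, _cls in detector(img):
        x1, y1, x2, y2 = expand_to_min_size(x1, y1, x2, y2, w, h)
        crop = img[y1:y2, x1:x2]
        crop_mask = segmenter(crop)  # segmentation runs only on the cropped region
        full_mask[y1:y2, x1:x2] = np.maximum(full_mask[y1:y2, x1:x2], crop_mask)
    return full_mask
```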

4. Experimental Configurations

In this section, the methods for experiments and evaluations are introduced. The characteristics and configurations of the dataset used for training, as well as the method of evaluation and the experimental environment are discussed.

4.1. Dataset

The purpose of this paper is not only to recognize an enlarged image acquired after approaching the object but also to detect objects at long distances. However, most public datasets provide only one class, or contain monotonous backgrounds, small images, enlarged objects, road cracks, and so on. With such data there is no reason to first detect objects and then crop the damage information, so these datasets differ from the purpose presented here and are not appropriate for this paper.
Figure 3 shows the proportion of the area that an object occupies per pixel depending on the resolution of the image file. If the image is smaller at the same resolution, the object's area in pixel units increases. Such changes also affect detection model training, because they change the area that an object accounts for in pixel units. There is nothing wrong with having large objects that take up a large area in a picture; however, learning from multiple small objects rather than one large object is much more helpful for training an object detection model [37,38]. In [32], it was reported that performance improved when training with data that have a complex background compared to data focused closely on cracks.
As shown in Figure 4, a test was conducted to confirm the change according to the size of the picture by photographing the crack on the wall near the laboratory. Cracks were detected when the ratio of crack size to image size was large (i.e., when cropped from the original photo), even though it was the same picture. However, using the original picture, it can be seen that it is almost undetectable when the crack size is small. This is the reason why images with cracks of various sizes are needed in the training dataset. Accordingly, when the images were collected, the number, position, and ratio of the objects in the images were controlled to have various forms.
Crack and pile images were collected to build datasets for training, validation, and testing. A total of 727 crack and pile images were collected from the reproduced disaster environment and several old buildings nearby. Figure 5 shows the reproduced environment for data collection and testing. An indoor space of more than 2000 m3 with obstacles such as bumps was constructed at the National Disaster Management Research Institute's demonstration test center, and a remote robot equipped with a camera was driven around it to collect images. Since it is difficult to reproduce rubble and wall cracks in a large space, concrete piles were placed and crack marks were attached to the pillars. The camera used is a FLIR Blackfly camera (4 mm lens) with a global shutter. The camera has high-resolution (HR) specifications and can obtain clear, large images; with HR images, the image quality does not deteriorate significantly even when the image is cropped. The raw images were received through this camera, and one image can contain various objects.
The cracks and piles were labeled in each image, and the 727 images were labeled twice: once as a detection dataset and once as a segmentation dataset. For the detection dataset, the amount of data was increased to a total of 2076 images, including the original images, through mosaic and noise augmentation techniques. The segmentation dataset was obtained by cropping the images based on the Bboxes of the objects detected by YOLOv7 in the 727 images. Some Bboxes detected by YOLOv7 could be too small to be used as segmentation model inputs. To prevent this, the minimum resolution was set to 64 × 64, and if a Bbox was smaller than this, the image was cropped by adding pixels centered on the detected object. By combining the data created using YOLOv7 with the existing data, a total of 2307 images with various resolutions were obtained, as shown in Figure 6. Each image in Figure 6 shows only the object part cut out from the original image. The test dataset used raw images at the original size that did not overlap with the training data, and the test was conducted on 52 images, about one-tenth of the photographed data. Table 1 shows the dataset specifications.
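As an illustration of the mosaic and noise augmentation mentioned above, a minimal sketch is given below; the noise level and the 2 × 2 mosaic layout are assumptions, and Bbox labels would also have to be remapped in a real pipeline.

```python
import cv2
import numpy as np

def add_gaussian_noise(img: np.ndarray, sigma: float = 10.0) -> np.ndarray:
    """Noise augmentation; the noise level sigma is an assumed value."""
    noisy = img.astype(np.float32) + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def mosaic_2x2(imgs, out_size: int = 1280) -> np.ndarray:
    """Tile four images into one 2 x 2 mosaic (Bbox labels must be remapped accordingly)."""
    half = out_size // 2
    tiles = [cv2.resize(im, (half, half)) for im in imgs[:4]]
    return np.vstack([np.hstack(tiles[:2]), np.hstack(tiles[2:])])
```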

4.2. Evaluation Method

For the evaluation and comparison of the above three methods, Precision, Recall, the F1 score, and IoU are used. The relationship between the true and predicted labels can be classified into four types. TP (True Positive) refers to the case where both the actual outcome and the prediction are "Positive". If the actual outcome is "Positive" but the prediction is "Negative", it is an FN (False Negative). Likewise, if the prediction is "Positive" when the actual outcome is "Negative", it is an FP (False Positive). Finally, TN (True Negative) means that both the actual and predicted outcomes are "Negative". Precision, Recall, and the F1 score can be expressed as combinations of the above four [39].
Precision is the ratio of actual positives (True Positives) among predicted positives (True Positives and False Positives).
Precision = TP / (TP + FP)
Recall is the ratio of what the model predicted to be positive (True Positive) out of the actual positives (True Positive and False Negative). That is, it can indicate how many correct answers the model obtained for a positive outcome.
Recall = TP / (TP + FN)
If the difference between the Precision and Recall values is significant, there is a possibility of relying on one indicator in evaluating the model.
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
The F1 Score is the harmonic mean of Precision and Recall and accounts for the data imbalance between classes. The F1 Score takes a value between 0 and 1, and the higher the score, the better [40].
Intersection over Union (IoU) is used to evaluate the accuracy of object detection and represents the relationship between the actual correct answer (Ground Truth box) and the predicted value (Predicted box). Since performance is evaluated through the size of the overlapping area between the actual correct answer and the predicted value, the larger the overlapping size, the better the performance. BboxGT is the Ground Truth box, and BboxP is the Predicted box [41].
IoU = area(BboxGT ∩ BboxP) / area(BboxGT ∪ BboxP)
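A minimal sketch of how these metrics can be computed from binary masks is given below; counting TP/FP/FN per pixel is an assumption here, and the same formulas apply to box-level counts.

```python
import numpy as np

def pixel_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-9):
    """Precision, Recall, F1, and IoU from binary masks (True = damage pixel)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    iou = tp / (tp + fp + fn + eps)  # area of intersection over area of union
    return precision, recall, f1, iou
```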

5. Result and Discussions

This section shows the technical specifications for using YOLOv7 and Deeplabv2, and the results and analysis of the experiments.

5.1. Training

The technical specifications of Deeplabv2 and YOLOv7 are shown in Table 2. A total of 2076 images were used for YOLOv7 training, with an input size of 640 × 640. The images were a mixture of original images and augmented images through mosaic and noise techniques. Momentum was set to 0.937, batch size was set to 32, weight decay was set to 0.0005, and epochs were 300. The training image was cropped through the Bbox coordinates and detection information output as a result of YOLOv7 and used as a dataset for Deeplabv2 training. A total of 2307 images were used for Deeplabv2 training, and the input size was 320 × 320. Crack and pile images were randomly mixed and input, and the random input produced better results than the sequential input by class. The learning rate was set to 0.00025, momentum to 0.9, batch size to 5, weight decay to 0.0005, and the model was trained for 20,000 steps.
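For reference, the training settings listed above can be collected as in the sketch below; the values come from this section, while the variable names are illustrative and not tied to either framework's configuration files.

```python
# Training settings reported in Section 5.1 (names are illustrative, values from the paper).
YOLOV7_TRAIN = dict(
    images=2076, input_size=640, epochs=300,
    batch_size=32, momentum=0.937, weight_decay=5e-4,
)
DEEPLABV2_TRAIN = dict(
    images=2307, input_size=320, steps=20_000,
    batch_size=5, learning_rate=2.5e-4, momentum=0.9, weight_decay=5e-4,
)
```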

5.2. Result and Analysis

YOLOv7 and Deeplabv2 were trained separately, and the results are as follows. For YOLOv7, Precision was 0.8536, Recall was 0.9169, mAP@.5 was 0.9085, and mAP@[.5:.95] was 0.625. The results were compared in the three ways described in Section 3. The first method used Deeplabv2 alone. The second extracted the overlapping detection results for damage by post-processing the damage segmentation results from the trained Deeplabv2 together with the results from YOLOv7 (i.e., InYD). The third used the trained YOLOv7 to detect damage on the test dataset and, after cropping the test images based on the Bboxes, performed segmentation on the cropped images with Deeplabv2 (i.e., DCIY).
When comparing the segmentation results of Deeplabv2 and InYD for the same image, Deeplabv2 segmented actual cracks only sparsely or inaccurately. With DCIY, however, the entire shape of the crack could be segmented by cropping and segmenting only the necessary part using YOLO. The concrete piles showed similar results. Deeplabv2 could not segment all the piles when they were very small or when the pictures were blurry. However, as with the cracks, when the necessary parts were cropped using YOLO and segmented by Deeplab, most of the piles were successfully segmented. In other words, to better detect a distant object in a wide space, it is more effective to quickly find the necessary part of an image and segment only the selected part than to use the entire image. With this approach, data collection also becomes easier, because the datasets mentioned in Section 4 (images taken with similar sizes and angles, some very small images, etc.) can also be used.
Figure 7 shows the original test set images of 1280 × 1024 in size and the results of detection and segmentation. InYD shows post-processed images of the results of YOLOv7 and Deeplabv2, and the overlapping part of the two results is displayed. The translucent gray box in InYD indicates the detection result of YOLOv7. When the two results are used together, the error of over-detection can be reduced compared to obtaining damage information using the two results separately. However, as the intersection of the two results is used, if object detection is omitted in either model, the result is discarded. As shown in the case of InYD, segmentation performance deteriorates when the object moves away, resulting in partial segmentation or missing objects. Looking at Case 4–10, YOLOv7 detects the cracks and piles, but Deeplabv2 does not segment them, so they are missing.
DCIY shows the results of the first detection of damage through YOLOv7 and the subsequent object segmentation with Deeplabv2 based on the detection result. In contrast with InYD, the objects were detected and segmented for all images. The objects were detected with YOLO, and it was confirmed that the cracks and piles could be segmented with the Bbox area. When detecting an object first using YOLO, over-segmentation was prevented, as shown in Figure 2b, and segmentation was possible to some extent even when the damage information in the image was small or for disturbances such as light. DCIY (Pile) did not miss the small objects that InYD had failed to capture.
The difference in the method also affected the evaluation metrics. Figure 8 shows the histograms of Recall, Precision, F1, and IoU for each model, illustrating how the scores are distributed across the score bands; CDF denotes the cumulative distribution function. For Recall, the distribution of DCIY is not as good, but all of its other scores are distributed in a higher score range than those of the other models.
In the case of InYD, the probability of correctly answering increases compared to Deeplabv2, so the Recall value increases. However, by excluding non-duplicate results, the percentage of actual correct answers decreases, resulting in a decrease in Precision. The decrease in Precision was larger than the increase in Recall, so the value of F1 recorded a lower score than Deeplabv2. In the case of IoU, the precision of InYD was reduced, so the IoU score was lower than that of Deeplabv2.
In the case of DCIY, the method proposed in this paper, it can be seen that the evaluation metric score has improved compared to the previous method. Since the result of detection depends on YOLOv7, the Recall value decreased according to the detection result of YOLOv7. However, by performing segmentation only on the images within the detected Bbox, it could make the most of Deeplabv2’s performance, and the Precision value greatly increased accordingly. As can be seen in Figure 8b, while the Precision of Deeplabv2 and InYD are distributed over a wide range of scores, DCIY is narrowly distributed over a high range of scores. As the imbalance between Precision and Recall is adjusted, the F1 score also increases as a result, and IoU also has the highest score among the three methods. The gap in the score distribution in the evaluation metrics, except for Recall, is greatly reduced. This reduces the standard deviation σ for each score, and it can be confirmed that the performance is improved by moving to the higher score band.
Table 3 shows the mean and standard deviation of the evaluation metric scores for the three methods. The mean Recall is highest for InYD, but its standard deviation is lowest for DCIY. For every metric except Recall, DCIY has both the highest mean and the lowest standard deviation, showing that its scores are the most reliable.
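A small sketch of how the per-image scores can be reduced to the means and standard deviations reported in Table 3 is given below; the dictionary layout is an assumption.

```python
import numpy as np

def summarize(per_image_scores: dict) -> None:
    """Print mean and standard deviation per metric, as reported in Table 3.
    per_image_scores maps a metric name (e.g., 'Recall') to the list of per-image values."""
    for name, values in per_image_scores.items():
        v = np.asarray(values, dtype=float)
        print(f"{name}: mean = {v.mean():.3f}, sigma = {v.std():.3f}")
```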

5.3. Limitations and Future Work

As discussed in Section 4.1, other open datasets could not be used for comparison because they did not fit the purpose of this paper. In addition, most other segmentation models with a similar purpose perform single-class crack segmentation, so it was difficult to compare them with the model proposed here. An attempt was made to compare on cracks only, rather than on multiple types of damage, but the papers on the candidate comparison models either did not provide a dataset or provided datasets that were not relevant to the model presented in this paper (i.e., with the characteristics of long distances and complexity). The purpose of this paper is to perform object detection and segmentation even for damage information on distant objects. Therefore, comparing segmentation results on such datasets or models would simply compare the performance of segmentation models, that is, Deeplabv2 against other segmentation models. For these reasons, we were unable to compare our model with other datasets and models.
The available damage datasets do not have object detection labels and segmentation labels that include cracks and rubble in complex environments, so the dataset was created manually. Damage data resulting from the collapse of a facility are difficult and dangerous to obtain, making it hard to collect a large number of images; instead, an experiment was conducted by creating artificial damage. The smaller the crack, the more likely it was to be affected by camera performance. Improving segmentation performance for low-quality images, however, is a common challenge. In the future, it will be necessary to collect data through continuous monitoring in actual disaster environments, create detection and segmentation labels, and apply them to the various models.
Potential limitations may arise when image quality is compromised or if the robot’s exploration is limited or obstructed. It is crucial to design a model and build a dataset while considering the camera that the robot will use, the images it will receive, and the environment in which it will operate. Additionally, it would be beneficial to explore collaboration with other tools that the robot can utilize, rather than relying solely on cameras.
When the model encounters novel or unseen scenarios that were not well represented in the training data, detection and segmentation are possible as long as the object shares features with the learned data. However, scenarios with objects that do not share these features are unlikely to be recognized by the model. When a novel issue arises, these data can be added to train the model and enhance its capabilities for damage detection and segmentation. It is important to continuously update the dataset with actual field data.
Furthermore, several technical considerations need to be addressed regarding the implementation and deployment of deep learning models in real-world disaster scenarios. The computational capacity of both the main system and deployed robots must be carefully evaluated. The durability of a robot in harsh environments is another critical factor. The choice between edge computing and cloud-based solutions presents different challenges; edge computing requires optimization for resource consumption, while cloud-based systems must address data transfer speeds and server infrastructure requirements. These technical aspects significantly influence the model’s real-world performance and must be carefully considered in future developments.

6. Conclusions

This paper proposes a method that combines the YOLOv7 and Deeplabv2 models to quickly and accurately detect damage information in a disaster environment. YOLOv7 was used to quickly detect the damage information, while the Deeplabv2 model was employed to segment the damage using only the information inside the detected Bbox. This process was designed to eliminate unnecessary background calculations and focus on the object. As a result, the method proposed in this paper scored better on the evaluation metrics than the method using the segmentation model alone or the post-processing method using the outputs of the segmentation and detection models. The standard deviation was also reduced by reducing the imbalance of the output values. Compared to existing segmentation models, which perform well only on objects photographed from a close distance, the proposed method can detect damaged parts and analyze information even from a long distance. This is expected to make environmental analysis possible at a distance during remote sensing in a disaster environment. The developed technology can be widely used not only in disaster environments but also in various industrial fields that use sensor-based robots to detect and segment target objects. In the future, research will be conducted to compare the performance of combinations of the latest YOLO versions with other segmentation models and to improve segmentation performance for data whose resolution is smaller than in conventional object segmentation as a result of Bbox cropping. As new and improved detection and segmentation models are developed, the synergies are likely to increase further.

Author Contributions

Conceptualization, S.Y.K. and C.H.K.; methodology, S.Y.K., S.-H.J. and J.W.; software, S.-H.J. and J.W.; validation, S.Y.K., S.-H.J., J.W. and C.H.K.; formal analysis, S.-H.J. and J.W.; investigation, S.-H.J. and J.W.; resources, S.Y.K., S.-H.J. and J.W.; data curation, S.Y.K., S.-H.J., J.W. and C.H.K.; writing—original draft preparation, S.-H.J.; writing—review and editing, S.Y.K., S.-H.J. and C.H.K.; visualization, S.-H.J. and J.W.; supervision, S.Y.K. and C.H.K.; project administration, S.Y.K. and C.H.K.; funding acquisition, S.Y.K., S.-H.J. and C.H.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Research Foundation of Republic of Korea (NRF) grant funded by the Ministry of Science and ICT, the Republic of Korea (No. 2021R1C1C1009219); This research was supported by Basic Science Research Program through the National Research Foundation of Korea(NRF) funded by the Ministry of Education (2022R1A6A3A01087518); This research was supported by Korea Basic Science Institute (National research Facilities and Equipment Center) grant funded by the Ministry of Education (2023R1A6C101B042); This work was supported by the faculty research fund of Sejong University in 2024; This research outputs are the part of the project “Disaster Field Investigation using Mobile Robot technology (II)”, which is supported by the NDMI (National Disaster Management research Institute) under the project number NDMI-MA-2022-06-02. The authors would like to acknowledge the financial support of the NDMI.

Data Availability Statement

The data supporting the results of this study can be obtained by contacting the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Van Loenhout, J.; Below, R.; McClean, D. The Human Cost of Disasters: An Overview of the Last 20 Years (2000–2019). Technical Report. Centre for Research on the Epidemiology of Disasters (CRED) and United Nations Office for Disaster Risk Reduction (UNISDR), 2020. Available online: https://www.preventionweb.net/files/74124_humancostofdisasters20002019reportu.pdf (accessed on 13 October 2022).
  2. Gochoo, M.; Otgonbold, M.-E.; Ganbold, E.; Hsieh, J.-W.; Chang, M.-C.; Chen, P.-Y.; Dorj, B.; Al Jassmi, H.; Batnasan, G.; Alnajjar, F.; et al. Fisheye8k: A benchmark and dataset for fisheye camera object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 5305–5313.
  3. Zou, C.; Yu, Q.; Du, R.; Mo, H.; Song, Y.-Z.; Xiang, T.; Gao, C.; Chen, B.; Zhang, H. Sketchyscene: Richly-annotated scene sketches. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 421–436.
  4. Ge, C.; Sun, H.; Song, Y.Z.; Ma, Z.; Liao, J. Exploring local detail perception for scene sketch semantic segmentation. IEEE Trans. Image Process. 2022, 31, 1447–1461.
  5. Mohan, A.; Poobal, S. Crack detection using image processing: A critical review and analysis. Alex. Eng. J. 2018, 57, 787–798.
  6. Yuan, F.; Yu, Y.; Li, L.; Tian, G. Investigation of DC electromagnetic-based motion induced eddy current on NDT for crack detection. IEEE Sens. J. 2021, 21, 7449–7457.
  7. Dong, L.; Shan, J. A comprehensive review of earthquake-induced building damage detection with remote sensing techniques. ISPRS J. Photogramm. Remote Sens. 2013, 84, 85–99.
  8. Awadallah, O.; Sadhu, A. Automated multiclass structural damage detection and quantification using augmented reality. J. Infrastruct. Intell. Resil. 2023, 2, 100024.
  9. Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6999–7019.
  10. O’Shea, K. An introduction to convolutional neural networks. arXiv 2015, arXiv:1511.08458.
  11. Amjoud, A.B.; Amrouch, M. Object detection using deep learning, CNNs and vision transformers: A review. IEEE Access 2023, 11, 35479–35516.
  12. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
  13. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
  14. Cha, Y.J.; Choi, W.; Suh, G.; Mahmoudkhani, S.; Büyüköztürk, O. Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types. Comput.-Aided Civ. Infrastruct. Eng. 2018, 33, 731–747.
  15. Bai, Y.; Sezen, H.; Yilmaz, A. End-to-end deep learning methods for automated damage detection in extreme events at various scales. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 6640–6647.
  16. Guo, F.; Qian, Y.; Liu, J.; Yu, H. Pavement crack detection based on transformer network. Autom. Constr. 2023, 145, 104646.
  17. He, Y.; Mu, X.; Wu, J.; Ma, Y.; Yang, R.; Zhang, H.; Wang, P.; Wang, H.; Wang, Y. Intelligent detection algorithm based on 2D/3D-UNet for internal defects of carbon fiber composites. Nondestruct. Test. Eval. 2024, 39, 923–938.
  18. Duarte, D.; Nex, F.; Kerle, N.; Vosselman, G. Satellite image classification of building damages using airborne and satellite image samples in a deep learning approach. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, 4, 89–96.
  19. Vetrivel, A.; Duarte, D.; Nex, F.; Gerke, M.; Kerle, N.; Vosselman, G. Potential of multi-temporal oblique airborne imagery for structural damage assessment. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 3, 355–362.
  20. Xu, J.Z.; Lu, W.; Li, Z.; Khaitan, P.; Zaytseva, V. Building damage detection in satellite imagery using convolutional neural networks. arXiv 2019, arXiv:1910.06444.
  21. Gupta, R.; Shah, M. RescueNet: Joint building segmentation and damage assessment from satellite imagery. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 4405–4411.
  22. Reddy, A.; Indragandhi, V.; Ravi, L.; Subramaniyaswamy, V. Detection of cracks and damage in wind turbine blades using artificial intelligence-based image analytics. Measurement 2019, 147, 106823.
  23. Zhang, L.; Yang, F.; Zhang, Y.D.; Zhu, Y.J. Road crack detection using deep convolutional neural network. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3708–3712.
  24. Liu, Y.; Yao, J.; Lu, X.; Xie, R.; Li, L. DeepCrack: A deep hierarchical feature learning architecture for crack segmentation. Neurocomputing 2019, 338, 139–153.
  25. Xiang, C.; Guo, J.; Cao, R.; Deng, L. A crack-segmentation algorithm fusing transformers and convolutional neural networks for complex detection scenarios. Autom. Constr. 2023, 152, 104894.
  26. Yeum, C.M.; Dyke, S.J.; Ramirez, J. Visual data classification in post-event building reconnaissance. Eng. Struct. 2018, 155, 16–24.
  27. Zhang, Y.; Huang, J.; Cai, F. On bridge surface crack detection based on an improved YOLO v3 algorithm. IFAC-PapersOnLine 2020, 53, 8205–8210.
  28. Wang, M.; Li, K.; Zhu, X.; Zhao, Y. Detection of surface defects on railway tracks based on deep learning. IEEE Access 2022, 10, 126451–126465.
  29. Wang, Y.; Wang, H.; Xin, Z. Efficient detection model of steel strip surface defects based on YOLO-V7. IEEE Access 2022, 10, 133936–133944.
  30. Kumar, P.; Batchu, S.; Kota, S.R. Real-time concrete damage detection using deep learning for high rise structures. IEEE Access 2021, 9, 112312–112331.
  31. Lee, D.; Kim, J.; Lee, D. Robust concrete crack detection using deep learning-based semantic segmentation. Int. J. Aeronaut. Space Sci. 2019, 20, 287–299.
  32. Choi, W.; Cha, Y.J. SDDNet: Real-time crack segmentation. IEEE Trans. Ind. Electron. 2019, 67, 8016–8025.
  33. Kang, D.; Benipal, S.S.; Gopal, D.L.; Cha, Y.J. Hybrid pixel-level concrete crack segmentation and quantification across complex backgrounds using deep learning. Autom. Constr. 2020, 118, 103291.
  34. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
  35. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475.
  36. Zhao, H.; Qi, X.; Shen, X.; Shi, J.; Jia, J. ICNet for real-time semantic segmentation on high-resolution images. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 405–420.
  37. Kuznetsova, A.; Rom, H.; Alldrin, N.; Uijlings, J.; Krasin, I.; Pont-Tuset, J.; Kamali, S.; Popov, S.; Malloci, M.; Kolesnikov, A.; et al. The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale. Int. J. Comput. Vis. 2020, 128, 1956–1981.
  38. Lin, T.-Y.; Maire, M.; Belongie, S.; Bourdev, L.; Girshick, R.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V; pp. 740–755.
  39. Veropoulos, K.; Campbell, I.C.G.; Cristianini, N. Controlling the sensitivity of support vector machines. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI99), Stockholm, Sweden, 31 July–6 August 1999; pp. 55–60.
  40. Lipton, Z.C.; Elkan, C.; Naryanaswamy, B. Optimal thresholding of classifiers to maximize F1 measure. In Proceedings of the Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2014, Nancy, France, 15–19 September 2014; Proceedings, Part II; pp. 225–239.
  41. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 658–666.
Figure 1. Damage detection and segmentation methods.
Figure 2. Crack and pile detection using a segmentation model. (a) Crack taken at a short distance, (b) Crack taken at a long distance, (c) Piles taken at a short distance, (d) Piles taken at a long distance.
Figure 3. The ratio of objects within one pixel.
Figure 4. Difference in detection according to the resolution. (a) Case ① in Figure 3, (b) Case ② in Figure 3.
Figure 5. Experimental environment.
Figure 6. Datasets of different resolutions and sizes.
Figure 7. Crack and pile detection.
Figure 8. Histograms. (a) Recall, (b) Precision, (c) F1 Score, (d) IoU.
Table 1. Dataset specifications.

            YOLOv7                       Deeplabv2
Raw         Custom 727 images
Train       643 → 2076 (augmentation)    2307
Val         32                           -
Test        52
Table 2. Technical specifications.

            YOLOv7                       Deeplabv2
GPU         NVIDIA GeForce RTX3090       NVIDIA GeForce RTX2080Ti
OS          Win10                        Win11
Framework   torch 1.11                   TensorFlow 1.15
CUDA        11                           10
Python      3.7                          3.7
Table 3. Comparison of evaluation metrics according to training methods.

            Deeplabv2           InYD                DCIY
            Mean      σ         Mean      σ         Mean      σ
R           0.797     0.254     0.823     0.265     0.731     0.157
P           0.510     0.333     0.478     0.321     0.843     0.088
F1          0.567     0.333     0.551     0.329     0.770     0.107
mIoU        0.466     0.305     0.448     0.301     0.638     0.139
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
