1. Introduction
For historical buildings and objects subject to various forms of deterioration, cracks represent the most critical type of damage [1,2,3]. However, the task of inspecting the structural condition of historical surfaces is carried out manually in most cases, which is a costly and subjective process. Thus, developing automated and efficient structural damage identification techniques has attracted the attention of many researchers.
Crack detection and analysis are of great significance, since cracks are a key indicator of a building’s safety and durability [4]. Most structures, such as bridges, roads, and buildings, become susceptible to cracking and failure due to periodic loading, weathering, or stress accumulation [5]. Consequently, continuous structural health monitoring is a very laborious task for humans, whereas it is well suited to automated systems based on computer vision. Thus, the demand for automatic crack detection systems has rapidly increased, despite the challenges they face due to complex real environments, varying illumination conditions, and the irregular shapes of cracks [6].
Pixel-level crack detection in historical building images is a challenging task because of many sources of uncertainty, such as shadows, ornaments, carvings, separators, crack-like artifacts, and wood patterns [6]. In image-based crack detection, the user provides photographs as input and receives the detected cracks as output, without the need for any manual intervention. The task of semantic segmentation is to assign a class label to each pixel of an image. Performing this task manually is time-consuming, error-prone, and resource-intensive, as some cracks may be only one pixel wide. Pixel-level crack detection can be used to perform quantitative analysis of crack width and length in order to assess crack severity. In practice, such assessment is used to diagnose the integrity of civil infrastructure, such as bridges, pavements, and buildings, using image sensors. Historical buildings exhibit various surface structures, dust, ornaments, carvings, wood patterns, separators, corrosion on walls, bird droppings, fungi, detachment, color degradation, and other weathering damages that hinder the recognition of the actual abnormality, namely cracks. As shown in Figure 1, surface images include corrosion and shadows, as well as different appearances and textures. In Figure 1a,b, corrosion and wood patterns are shown, respectively. On the other hand, Figure 1c,d,h contain different texture patterns, while carvings, separators, and bird droppings are shown in Figure 1e,f,g, respectively.
This research focuses on utilizing deep learned features for the pixel-level crack detection/segmentation problem. Cracks usually appear on structures as the first visible sign of structural damage. Over the last few decades, in response to the crack segmentation problem, various traditional computer vision-based segmentation approaches, including thresholding, the modified Tubularity Flow Field (TuFF) algorithm, filtering, tensor voting, super-pixel methods, morphological approaches, clustering, and skeletonization techniques, have been introduced in the literature to perform pixel-level crack detection in images of engineering structures. Moreover, several deep learning-based crack segmentation methods, such as DeepCrack, SegNet, SDDNet, and U-Net, have been developed to efficiently detect surface cracks [7,8,9,10,11]. The significance of this paper lies in investigating the impact of pixel-level deep crack segmentation through adopting several variants of the U-Net deep learning model for crack detection on historical surfaces.
The main contributions of this paper are summarized as follows:
Developing an automated pixel-level detection approach through assessing various U-Net deep learning architectures for handling the problem of deep crack segmentation on historical surfaces.
Investigating two loss functions, namely Dice and cross-entropy (CE), in addition to a third hybrid loss function, for training and enhancing the performance of the proposed approach.
Constructing an expert-annotated primary dataset of crack images on historical surfaces, collected over two years from a historical location in Historic Cairo, Egypt.
Applying a contrast stretching method for handling the impacts of different environmental conditions on images of historical surfaces.
Building an extra semi-supervised pixel-level map generation module for annotating historical surface images, to avoid the cost of pixel-by-pixel manual annotation.
The remainder of this paper is organized as follows.
Section 2 introduces the state-of-the-art literature related to the pixel-level crack detection problem.
Section 3 describes the different phases of the proposed pixel-level crack detection approach, namely the data acquisition, data preparation, pixel-level map generation, and pixel-level crack segmentation phases.
Section 4 presents details of the crack image datasets and discusses the obtained experimental results.
Section 5 discusses the research conclusions and addresses recommendations for future research.
2. State-of-the-Art Studies for Crack Segmentation
This section reviews the most relevant literature addressing pixel-level crack detection techniques.
In [11], Dais et al. proposed automatic crack classification and segmentation methods based on a Convolutional Neural Network (CNN) and a transfer learning scheme for masonry surfaces. The performance of different pre-trained deep learning models was investigated for the crack detection task. Then, both Feature Pyramid Networks (FPN) and U-Net, with the best-performing model as the encoder, were used for pixel-level crack detection. Meanwhile, in [7], Y. Liu et al. proposed an end-to-end pixel-wise crack segmentation method based on a deep hierarchical CNN capable of learning and effectively aggregating multi-scale and multi-level features from the lower convolutional layers to the higher ones. Moreover, a loss function was specially designed to alleviate the problem of imbalanced data distribution. Another pixel-level, end-to-end, trainable deep CNN for road crack detection, able to handle strong non-uniformity, complex topology, and strong noise in crack images, was proposed in [12] by Song et al., using a multi-scale dilated convolution module to obtain richer crack texture information. Furthermore, Dung in [13] proposed an end-to-end deep fully convolutional network (FCN)-based crack detection method for semantic segmentation of concrete crack images.
Moreover, in [14], Zhang et al. presented a novel crack segmentation method based on a context-aware deep convolutional network to segment cracks effectively in structural infrastructure under different conditions. Meanwhile, in [15], Kang et al. proposed a hybrid crack detection, localization, and measurement method based on a Faster R-CNN, a modified Tubularity Flow Field (TuFF) algorithm, and a modified Distance Transform Method (DTM) for pixel-level crack segmentation.
From another perspective, J. Liu et al. in [16] proposed a U-Net-based crack image segmentation network for pavement cracks with a one-cycle training schedule for speeding up the convergence. The proposed segmentation network replaced the encoder part of the U-Net neural network with a pre-trained ResNet-34 neural network.
Ghorbanzadeh et al. in [17] proposed a framework for landslide detection that integrated the ResU-Net deep learning model with rule-based object-based image analysis (OBIA) to improve the detection rate of the ResU-Net model. When evaluated on the Sentinel-2 imagery dataset, the integrated framework improved the F1 score by 8% and 22% over the F1 scores achieved by the standalone ResU-Net and OBIA approaches, respectively.
In [18], Jia et al. proposed an integrated pixel-level crack detection and quantification approach based on DeepLabv3+ to automatically detect and quantify cracks in asphalt pavement images for operation and maintenance. In addition, they adjusted the weights of the different classes to address the problem of class imbalance and used skeletonization and fast parallel thinning (FPT) techniques to measure the length, width, area, and ratio of cracks. In [10], Chen et al. presented an encoder–decoder model based on a modified SegNet architecture, which handled arbitrary-sized images for pixel-level crack detection on concrete pavement, bridge decks, and asphalt pavement, aiming to improve performance by merging the features of low-resolution input samples.
Li et al. in [19] proposed a pixel-level crack detection method based on a Deep Local Pattern Predictor (DLPP) for concrete bridge images, aiming to handle the limitations resulting from noise and clutter in the environment. Finally, in [20], Guzman-Torres et al. proposed an open-source crack detection platform based on an improved VGG-16 transfer learning model, capable of detecting multi-scale cracks in concrete structures. Moreover, the impact of various DL architectures, regularization techniques, network depths, and transfer learning methods was examined. The proposed approach achieved an accuracy of 99.5% and an F1 score of 100%.
In general, based on the surveyed related literature, several aspects related to crack detection have been addressed. However, there is a very limited number of studies addressing the problem of pixel-level crack detection in the field of historical heritage. Consequently, the approach proposed in this paper investigates the performance of adopting pixel-level deep crack segmentation models on historical surfaces using variants of the U-Net deep learning model in order to address segmentation problems with limited amounts of data.
Table 1 summarizes the surveyed state-of-the-art studies on deep learning-based crack segmentation.
3. The Proposed Deep Crack Segmentation Approach
As stated in the previous sections, this paper proposes an end-to-end U²-Net-based pixel-level crack segmentation approach aiming to generate a pixel-level crack map. It consists of four phases, as shown in Figure 2, namely: (1) the data acquisition phase, which is responsible for collecting surface crack images of historical buildings; (2) the data preparation phase, which is responsible for preparing and preprocessing the images for training; (3) the pixel-level map generation phase, which is responsible for annotating the primary dataset; and (4) the pixel-level crack segmentation phase, which investigates three different U-Net architectures, namely Deep ResU-Net, ResU-Net++, and U²-Net.
3.1. Data Acquisition Phase
In this phase, real data of surface cracks are collected from an ancient building suffering from damage problems. One of these problems is the presence of cracks, defined here as damage distinguishable by the human eye. A primary dataset of 40 raw images of the building surfaces was captured using a digital Canon camera (EOS REBEL T3i) over two years (2018 and 2019), before the completion of the mosque’s restoration and rehabilitation project in 2020. The image collection location was the mosque (Masjed) of the Amir Altunbugha Al-Maridani, located in Sekat Al Werdani, “Bab-Al-Wazir” street, in the El-Darb El-Ahmar district of Historic Cairo, Egypt, at coordinates 30.03974 N, 31.25922 E. The mosque was built during the era of the Mamluk Sultanate of Cairo, Egypt, in 1339–40 CE. It is distinguished by its octagonal minaret and large dome and is considered one of the most distinctively decorated historical buildings.
3.2. Data Preparation Phase
In this phase, the image data are prepared and preprocessed for the next pixel-level crack segmentation phase. Samples of cracks in the primary dataset are shown in Figure 3. The data preparation phase comprises multiple steps:
- (1) Image bank generation: raw images are divided into smaller sub-images of fixed pixel resolution;
- (2) Filtering: only sub-images with cracks are considered, while intact ones are ignored;
- (3) Augmentation: several spatial transformations are applied systematically, as follows [21] (a code sketch is given after this list):
- 1. Flipping images vertically;
- 2. Flipping images horizontally;
- 3. Flipping images vertically, then horizontally;
- 4. Rotating images by 90°, then by −90°, individually;
- 5. Combining the output images of the previous steps with the original images to establish a new dataset.
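A minimal sketch of the augmentation step is given below, assuming the sub-images are NumPy arrays; the function name and return structure are illustrative only.

```python
import numpy as np

def augment(sub_image: np.ndarray) -> list:
    """Return the original sub-image together with its spatial variants listed above."""
    vertical = np.flipud(sub_image)                      # 1. vertical flip
    horizontal = np.fliplr(sub_image)                    # 2. horizontal flip
    both = np.fliplr(np.flipud(sub_image))               # 3. vertical then horizontal flip
    rot_pos = np.rot90(sub_image, k=1)                   # 4. rotation by 90 degrees
    rot_neg = np.rot90(sub_image, k=-1)                  # 4. rotation by -90 degrees
    return [sub_image, vertical, horizontal, both, rot_pos, rot_neg]  # 5. combined set
```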
3.3. Semi-Supervised Pixel-Level Map Generation
This phase aims to build up the training dataset for crack segmentation. Pixel-level annotation is extremely expensive and labor-intensive, especially for critical domains such as structural damage, which require expert dedication. This module only requires a bounding box around the object; GrabCut [22] then works iteratively to generate the mask instead of requiring pixel-by-pixel marking. This saves time, effort, and cost, as shown in Figure 4.
GrabCut is an iterative segmentation method that extends the GraphCut method. Gaussian mixture models (GMMs) are used for color images in place of the histogram model used for monochrome images. The GMMs are utilized to learn the color distributions of the background and the foreground by assigning each pixel a probability of belonging to a group of other pixels [22,23].
Given a color image I, let us consider the array z = (z_1, ..., z_N) of N pixels, where each z_n is a value in RGB color space. The segmentation of the image is presented as an array of opacity values α = (α_1, ..., α_N), one per pixel, with α_n ∈ {0, 1}, where 0 and 1 denote background and foreground, respectively. Two GMMs for the background and the foreground are taken to be full-covariance Gaussian mixtures with K components each. In the optimization framework, an extra vector k = (k_1, ..., k_N) is introduced, with k_n ∈ {1, ..., K}, assigning each pixel a unique GMM component, taken either from the foreground or the background model according to α_n, in order to deal with the GMMs more easily [22,23]. The Gibbs energy function E defined by GrabCut for segmentation, whose minimum value should agree with a good segmentation, is computed as follows:
E(α, k, θ, z) = U(α, k, θ, z) + V(α, z).
The data term U evaluates the fit of the opacity distribution α to the data z, given the histogram model θ, and is defined to be
U(α, k, θ, z) = Σ_n D(α_n, k_n, θ, z_n),
where D(α_n, k_n, θ, z_n) = −log p(z_n | α_n, k_n, θ) − log π(α_n, k_n), p(·) is a Gaussian probability distribution, and π(·) are mixture weighting coefficients, so that (up to a constant)
D(α_n, k_n, θ, z_n) = −log π(α_n, k_n) + (1/2) log det Σ(α_n, k_n) + (1/2) [z_n − μ(α_n, k_n)]ᵀ Σ(α_n, k_n)⁻¹ [z_n − μ(α_n, k_n)].
Therefore, the parameters of the model are
θ = { π(α, k), μ(α, k), Σ(α, k); α = 0, 1; k = 1, ..., K },
where π, μ, and Σ are the weights, means, and covariances of the 2K Gaussian components for both background and foreground distributions. Meanwhile, the smoothness term V is computed as follows [22,23]:
V(α, z) = γ Σ_{(m,n)∈C} [α_n ≠ α_m] exp(−β ‖z_m − z_n‖²),
where C denotes the set of pairs of neighboring pixels.
The original GrabCut segmentation works as shown in Algorithm 1 [22,23].
Algorithm 1 GrabCut pixel-level map generation
- 1: Obtain a bounding box b around the region of interest.
- 2: Initialize the trimap T by supplying only the background region T_B. The foreground region is set to T_F = ∅; the unknown region T_U is the complement of the background.
- 3: Set α_n = 0 for n ∈ T_B and α_n = 1 for n ∈ T_U as the initial segmentation.
- 4: Initialize the GMMs for the background and foreground from the sets α_n = 0 and α_n = 1, respectively.
- 5: Assign GMM components to pixels: for each n in T_U, k_n := arg min_{k_n} D_n(α_n, k_n, θ, z_n).
- 6: Learn the GMM parameters from the data z: θ := arg min_θ U(α, k, θ, z).
- 7: Estimate the segmentation using min-cut: min over {α_n : n ∈ T_U} and k of E(α, k, θ, z).
- 8: Repeat steps 5, 6, and 7 until convergence.
- 9: Apply the border matting method.
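The same bounding-box-driven procedure can be reproduced with OpenCV’s built-in GrabCut implementation, as sketched below; the file name and box coordinates in the usage comment are placeholders, not values used in this work.

```python
import cv2
import numpy as np

def grabcut_mask(image_bgr: np.ndarray, box: tuple, iterations: int = 5) -> np.ndarray:
    """Return a binary foreground mask for a bounding box (x, y, w, h) using iterative GrabCut."""
    mask = np.zeros(image_bgr.shape[:2], dtype=np.uint8)
    bgd_model = np.zeros((1, 65), dtype=np.float64)  # background GMM parameters (internal buffer)
    fgd_model = np.zeros((1, 65), dtype=np.float64)  # foreground GMM parameters (internal buffer)
    cv2.grabCut(image_bgr, mask, box, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_RECT)
    # Sure and probable foreground labels become 1; everything else becomes 0.
    binary = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0)
    return binary.astype(np.uint8)

# Example usage with a hypothetical crack patch and box:
# img = cv2.imread("crack_patch.png")
# crack_map = grabcut_mask(img, box=(10, 10, 200, 120))
```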
3.4. Segmentation-Based Variant U-Net Models
U-Net is an encoder–decoder convolutional network that utilizes skip connections for preserving features at multi-resolution, used to solve end-to-end semantic segmentation tasks. Skip connections are a critical component of conventional deep neural networks (DNNs) such as DenseNet, ResNet, ResNeXt, and WideResNet. Skip connections build a short-cut from shallow layers to deep layers by directly connecting the input of a convolutional block/the residual module to its output. Their task throughout the network is to speed up the learning process, preserve low-level features, and avoid the problem of vanishing gradients in deep models [24,25,26].
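As a concrete illustration of an encoder–decoder with a concatenation skip connection, a toy single-level U-Net is sketched below in PyTorch; it is purely illustrative and is not one of the architectures evaluated in this paper.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3 convolutions with ReLU, the basic U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """One-level U-Net: the encoder feature map is concatenated into the decoder (skip connection)."""
    def __init__(self, in_ch: int = 3, base: int = 16):
        super().__init__()
        self.enc = conv_block(in_ch, base)
        self.down = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base, base * 2)
        self.up = nn.ConvTranspose2d(base * 2, base, kernel_size=2, stride=2)
        self.dec = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, 1, kernel_size=1)  # single-channel crack probability map

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e = self.enc(x)                               # encoder features (skip source)
        b = self.bottleneck(self.down(e))             # lower-resolution features
        d = self.dec(torch.cat([self.up(b), e], 1))   # skip connection: concatenate encoder features
        return torch.sigmoid(self.head(d))
```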
In general, U-Net comprises two parts, namely an encoder and a decoder. The encoder takes an image as input and extracts features at multiple scales and abstraction levels, yielding a multi-level, multi-resolution feature representation. It is a simple down-sampling path consisting of stacked convolutional blocks with a max-pooling operator used for dimensionality reduction, and it can be replaced by a deeper network such as VGG or ResNet. Meanwhile, the decoder controls the reconstruction of the probability segmentation maps; its task is to take the feature representation and classify all pixels at the original image resolution in parallel. U-Net has the advantage of performing well on segmentation problems with limited amounts of data [24,25,26]. In this phase, three different U-Net-based models along with two different loss functions, namely Dice and cross-entropy (CE), are utilized for pixel-level crack detection, as shown in Equations (9) and (10). Moreover, benchmark datasets covering various crack types and severities are utilized for training the proposed models.
where p is a predicted map and y is its corresponding ground truth for class j, N and C are the number of pixels and the number of classes (excluding the background), respectively, and ε is a smoothing constant used to avoid division by zero.
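For concreteness, the sketch below shows one common formulation of the Dice and CE losses for a binary crack probability map in PyTorch, together with a simple additive hybrid; the exact equations and the hybrid weighting used in this paper are not reproduced here, so this is an assumption-laden illustration only.

```python
import torch

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss on a binary probability map; eps is the smoothing constant."""
    pred, target = pred.flatten(), target.flatten()
    intersection = (pred * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def ce_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy on the predicted probability map (values in [0, 1])."""
    return torch.nn.functional.binary_cross_entropy(pred, target)

def hybrid_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """One plausible hybrid: the unweighted sum of the Dice and CE terms (an assumption)."""
    return dice_loss(pred, target) + ce_loss(pred, target)
```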
Moreover, contrast stretching is used as a preprocessing step at testing time to handle the problems arising from different environmental conditions. In addition, the historical building’s concrete has various surface structures, dust, ornaments, carvings, wood patterns, separators, corrosion on walls, bird droppings, fungi, detachment, color degradation, and other weathering damages that hinder the recognition of the actual abnormality, mainly cracks.
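A minimal percentile-based contrast-stretching sketch is given below; the paper does not state the exact stretch limits, so the 2nd/98th percentile bounds are assumptions.

```python
import numpy as np

def contrast_stretch(image: np.ndarray, low_pct: float = 2.0, high_pct: float = 98.0) -> np.ndarray:
    """Linearly rescale intensities so the chosen percentiles map to the full 0-255 range."""
    lo, hi = np.percentile(image, (low_pct, high_pct))
    stretched = np.clip((image.astype(np.float32) - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
    return (stretched * 255).astype(np.uint8)
```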
3.4.1. Deep Residual U-Net (ResU-Net)
Deep ResU-Net is a U-Net-based segmentation neural network that integrates the strengths of U-Net and residual neural networks, as shown in Figure 5. This integration yields two benefits: (1) ease of network training due to the residual units; (2) information propagation without degradation, due to the skip connections both inside a residual unit and between the low and high levels of the network. As a result, it is possible to design a neural network with far fewer parameters while achieving similar or even better performance on semantic segmentation tasks [25,26,27].
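For illustration, a sketch of a residual unit of the kind described above (two convolution blocks plus an identity/shortcut mapping) is given below in PyTorch; the pre-activation ordering and the 1×1 shortcut convolution are assumptions rather than the paper’s exact configuration.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two BN-ReLU-Conv blocks with a 1x1 shortcut, as used in ResU-Net-style encoders/decoders."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        )
        self.shortcut = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x) + self.shortcut(x)  # residual addition fuses input and transformed features
```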
3.4.2. ResU-Net++
The ResU-Net++ architecture is a Deep ResU-Net-based semantic segmentation deep neural network that combines the strengths of residual blocks, Atrous Spatial Pyramid Pooling (ASPP), squeeze-and-excitation blocks, and attention blocks, as shown in Figure 6. As stated in [28], attention maps improve image classification by highlighting relevant information and suppressing misleading information such as the background. The attention mechanism is widely employed in Natural Language Processing (NLP) and in semantic segmentation tasks such as pixel-wise prediction. It attends to a subset of its input feature map, determining which parts of the network require more attention. The advantages of the attention mechanism are that it reduces the computational cost of encoding information, is simple, and improves the results. ASPP is a module able to precisely capture multi-scale information and add more global contextual information for more robust and accurate classification. The ASPP module consists of several parallel atrous convolutions with different dilation rates, each probing its input feature map at a specific effective field-of-view to extract information precisely at multiple scales. Moreover, global average pooling adds further global contextual information [26,27].
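A compact ASPP sketch in PyTorch is shown below, illustrating the parallel atrous branches and the image-level pooling branch; the dilation rates (6, 12, 18) are common defaults and are assumptions here, not the rates used in ResU-Net++.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Parallel atrous convolutions at several dilation rates plus image-level pooling."""
    def __init__(self, in_ch: int, out_ch: int, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, kernel_size=1)] +
            [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r) for r in rates]
        )
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, out_ch, kernel_size=1))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[2:]
        feats = [branch(x) for branch in self.branches]          # multi-rate atrous branches
        pooled = F.interpolate(self.pool(x), size=(h, w), mode="bilinear", align_corners=False)
        feats.append(pooled)                                     # global context branch
        return self.project(torch.cat(feats, dim=1))             # fuse multi-scale context
```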
Figure 5. Architecture of Deep Residual U-Net (ResU-Net).
Moreover, the squeeze-and-excitation (SE) block is an architectural unit constructed to enhance the network’s representational power by enabling dynamic recalibration of the channel-wise features. It learns to use global information to automatically determine the importance of each feature channel, then selectively emphasizes useful features and suppresses unproductive ones. Meanwhile, the residual block comprises multiple combinations of convolutional layers, batch normalization, ReLU activation, and identity skip connections; it aims to ease the training process and improve the network’s representational ability [29].
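A minimal squeeze-and-excitation block in PyTorch is sketched below to illustrate the channel recalibration described above; the reduction ratio of 16 is a conventional choice and not necessarily the one used in ResU-Net++.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Squeeze-and-excitation: global average pooling followed by a two-layer gating MLP."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)  # per-channel weights in [0, 1]
        return x * w                                      # recalibrate each feature channel
```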
3.4.3. U²-Net
U²-Net is a semantic segmentation neural network designed for salient object detection (SOD). It is a nested U-structure with exactly two levels, as shown in Figure 7. A novel ReSidual U-block (RSU) was inspired by the structure proposed in [29] to collect intra-stage multi-scale features. The structure of the RSU block is shown in Figure 8, where L represents the number of layers in the encoder, C_in and C_out are the numbers of input and output channels, respectively, and M is the number of channels in the internal layers of the block.
Briefly, the RSU block comprises three main components:
- 1. An input convolution layer, responsible for transforming the input feature map into an intermediate one;
- 2. A U-Net-like encoder–decoder of height L, responsible for learning how to extract and encode multi-scale contextual information using the intermediate feature map as input;
- 3. A residual connection, responsible for fusing the local features and the multi-scale features through a summation operator.
Thus, U²-Net has the advantage of being able to capture more contextual information at multiple scales and to increase the depth of the whole architecture without notably increasing the computational cost, thanks to the RSU block [29].
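To make the three components concrete, a simplified RSU-style block is sketched below in PyTorch; the layer counts, dilation settings, and channel sizes are illustrative assumptions and do not reproduce the exact RSU design of [29].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(in_ch: int, out_ch: int, dilation: int = 1) -> nn.Sequential:
    """3x3 convolution + batch norm + ReLU, the basic layer inside an RSU-style block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=dilation, dilation=dilation),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class SimpleRSU(nn.Module):
    """Simplified ReSidual U-block: input conv, a small U-shaped encoder-decoder of a given height,
    and a residual summation of local and multi-scale features."""
    def __init__(self, in_ch: int, mid_ch: int, out_ch: int, height: int = 4):
        super().__init__()
        self.height = height
        self.conv_in = conv_bn_relu(in_ch, out_ch)                      # local feature map
        self.enc = nn.ModuleList([conv_bn_relu(out_ch, mid_ch)] +
                                 [conv_bn_relu(mid_ch, mid_ch) for _ in range(height - 1)])
        self.bottom = conv_bn_relu(mid_ch, mid_ch, dilation=2)          # dilated bottom layer
        self.dec = nn.ModuleList([conv_bn_relu(mid_ch * 2, mid_ch) for _ in range(height - 1)] +
                                 [conv_bn_relu(mid_ch * 2, out_ch)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local = self.conv_in(x)
        feats, h = [], local
        for i, layer in enumerate(self.enc):                            # encoder path
            h = layer(h)
            feats.append(h)
            if i < self.height - 1:
                h = F.max_pool2d(h, 2)
        h = self.bottom(h)
        for i, layer in enumerate(self.dec):                            # decoder path with skips
            skip = feats[-(i + 1)]
            if h.shape[-2:] != skip.shape[-2:]:
                h = F.interpolate(h, size=skip.shape[-2:], mode="bilinear", align_corners=False)
            h = layer(torch.cat([h, skip], dim=1))
        return h + local                                                # residual fusion of features
```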
3.4.4. Utilized U-Net-Based Models
Deep ResU-Net model: In this paper, a nine-level architecture of deep ResU-Net is utilized along with two different loss functions for pixel-by-pixel crack detection. All levels are built with residual blocks comprising two convolution blocks and an identity mapping connecting both the input and output of the block.
ResU-Net++ model: The original architecture of ResU-Net++ is utilized along with two different loss functions for pixel-by-pixel crack detection with the filter numbers [16, 32, 64, 128, 256]. The filter number was selected based on experiments.
U²-Net model: The original architecture of U²-Net is utilized along with two different loss functions for pixel-by-pixel crack detection.
4. Experimental Results
In this study, simulation experiments were performed on a Kaggle kernel with an NVIDIA TESLA P100 GPU and 16 GB of memory. The proposed approach was implemented with TensorFlow and PyTorch in a Python environment on the Linux platform. To evaluate the proposed models, six performance metrics, namely accuracy, precision, recall, Dice coincidence index (Sørensen similarity coefficient), Jaccard coefficient, and IoU, were calculated according to Equations (11) to (16), respectively [30].
The TP, FP, TN, and FN terms are defined as follows:
True Positive (TP): the pixel is a crack and is classified as a crack;
False Positive (FP): the pixel is intact and is classified as a crack;
True Negative (TN): the pixel is intact and is classified as intact;
False Negative (FN): the pixel is a crack and is classified as intact.
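Based on these definitions, the metrics can be computed from binary prediction and ground-truth maps as in the sketch below; these are the standard pixel-level formulations and may differ in minor details from Equations (11) to (16).

```python
import numpy as np

def pixel_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-6) -> dict:
    """Compute pixel-level metrics from binary prediction and ground-truth crack maps."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()      # crack predicted as crack
    fp = np.logical_and(pred, ~gt).sum()     # intact predicted as crack
    tn = np.logical_and(~pred, ~gt).sum()    # intact predicted as intact
    fn = np.logical_and(~pred, gt).sum()     # crack predicted as intact
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn + eps),
        "precision": tp / (tp + fp + eps),
        "recall": tp / (tp + fn + eps),
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "iou": tp / (tp + fp + fn + eps),    # the Jaccard coefficient coincides with IoU for binary maps
    }
```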
4.1. Dataset Description
This subsection describes the different characteristics of the utilized datasets in this study, as follows:
- 1. The open crack detection dataset [7] is a benchmark dataset consisting of a total of 537 images with manual annotation maps. It is divided into 300 training images and 237 testing images.
- 2. The CrackForest dataset [31] is another benchmark crack detection dataset, consisting of a total of 118 images.
- 3. The primary dataset consists of a total of 263 crack images of historical surfaces with ornaments, carvings, wood patterns, separators, and corrosion on walls.
All models were trained on a mixed dataset, consisting of the augmented CrackForest [31] dataset and the open crack [7] training dataset, using rotation and flipping, with a total of 2508 crack images.
Moreover, the proposed approach was subsequently tested on a mixed dataset of two different datasets (the open crack detection testing dataset and the primary dataset of the historical building) using 500 crack images.
4.2. Results and Discussion
All models were trained for 50 epochs, with a batch size of 16 for the Deep ResU-Net and ResU-Net++ models and 12 for the U²-Net model. Figure 9 shows samples of the training dataset. Moreover, samples of the GrabCut results are shown in Figure 10.
As shown in Table 2, which presents the performance metrics of the Deep ResU-Net model, the best model is the one trained using cross-entropy (CE) as a loss function and tested on images enhanced using contrast stretching. It is noticed that using contrast stretching as a preprocessing step in the testing phase enhances the performance by increasing the mean IoU (mIoU) from 67.069% to 74.959% for the model that uses the CE loss. In general, the performance of the Deep ResU-Net model is improved when tested on images preprocessed by contrast stretching.
Table 3 shows the performance metrics of the ResU-Net++ model. From Table 3, it is noticed that the performance of the ResU-Net++ model is improved when tested on images preprocessed by contrast stretching. Table 4 and Table 5 show the performance metrics of the ResU-Net++ model on each testing dataset individually, with and without contrast stretching.
Table 6 shows the performance metrics of the U²-Net model. As noticed from Table 6, the best model is the one trained using the Dice coefficient as a loss function, achieving an mIoU and Dice score of 80.922% and 75.523%, respectively. Using contrast stretching as a preprocessing step in the testing phase has only a slight effect on the performance and does not enhance it.
At this point, the U²-Net architecture utilized for the crack segmentation task is highlighted, since it obtained the best results, as concluded from Table 2, Table 3 and Table 6. Table 7 shows the performance metrics of the U²-Net model on each testing dataset individually.
It is concluded from Table 7 that the proposed crack segmentation approach based on the U²-Net model outperforms the DeepCrack [7] approach without any preprocessing or post-processing, achieving an mIoU of 83.87%. Moreover, the proposed approach performs well on the primary dataset, even though it was excluded from training. Figure 11 shows some sample results of U²-Net on the primary dataset.
4.3. Comparative Analysis
As shown in Figure 12a, Figure 13a,c,d and Figure 14d,e, ResU-Net++ without contrast stretching has difficulty in dealing with blurred images, where it cannot detect all or most of the crack pixels.
On the other hand, it is noticed that using contrast stretching as a preprocessing step during the testing phase helps improve the performance of ResU-Net++ when dealing with blurred images, increasing the Dice score from 37.98% to 61.42%. Although contrast stretching enabled ResU-Net++ to detect a higher percentage of cracks on historical surfaces, it also caused shadows to be mistaken for cracks. Conclusively, ResU-Net++, both with and without contrast stretching, lacks the ability to handle images with shadows.
The main advantage of using the U²-Net model for crack segmentation is its capability of detecting tiny cracks and unlabeled cracks, as in Figure 15b,c. Also, the U²-Net model outperforms the others in detecting cracks on edges, as in Figure 15b, and cracks in blurred images, as shown in Figure 12a and Figure 13a,c,d. Moreover, the model is capable of detecting cracks in images with varying lighting conditions that result in shadows, as in Figure 12d.
Conversely, the main limitation of crack segmentation based on the U²-Net model lies in dealing with deep circular patterns with shadows, as shown in Figure 12b, and patterns with multi-level depth, as in Figure 15b. This is due to the salient-object-detection nature of the U²-Net model: when moving between two or more different depths, a part of the boundary between the depths is mistakenly marked as a crack.
5. Conclusions and Future Work
This paper presents a study highlighting the effectiveness of implementing pixel-level deep crack segmentation models on historical surfaces. The significance of the proposed approach is revealed through adopting variants of the U-Net deep learning model for handling segmentation problems with limited amounts of data.
Thus, in this paper, the proposed automated approach examined three U-Net-based deep learning models, namely Deep ResU-Net, ResU-Net++, and U²-Net, for pixel-level crack segmentation. Decisively, it was observed that the U²-Net semantic segmentation model outperformed the other tested U-Net models thanks to its ability to capture more contextual information at multiple scales and to increase the whole architecture’s depth at a reasonable computational cost.
Moreover, in order to tackle the challenge of limited historical surface data in the literature, a primary dataset is generated, for testing the proposed approach, containing crack images of historical surfaces with various crack types, sizes, and severity on several complex backgrounds of ornaments, carvings, wood patterns, separators, and corrosion. To the best of the authors’ knowledge, this research is the first to utilize the U-Net deep learning model for pixel-level deep crack segmentation on images of historical surfaces.
It is observed that the performance of the proposed approach using the U²-Net model deteriorates, with the Dice score, mIoU, and Jaccard measures declining from 80.52% to 71.09%, from 83.78% to 78.38%, and from 67.392% to 55.147%, respectively, when tested on the historical surfaces dataset compared to the open crack dataset [7]. The U²-Net model is known for detecting salient objects, which in some cases results in boundaries being mistakenly detected as cracks. However, despite this observation, it still surpasses the performance reported in the literature, considering the previously stated challenges of crack detection on historical surfaces. Furthermore, several state-of-the-art variant U-Net models were examined for their efficacy in classifying crack images from historical surfaces at the pixel level, with the highest obtained Dice score being 71.09%. In particular, for historical surfaces, when ResU-Net++ is used with contrast stretching, the Dice score increases from 37.98% to 61.42%, which demonstrates the beneficial effect of using contrast stretching as a preprocessing step during the testing phase.
The most significant findings of the proposed approach surpass those of comparable approaches in the literature for crack segmentation on concrete or asphalt surfaces. However, several challenges remain to be addressed in future research, such as running more experiments on other types of surfaces. Furthermore, although the proposed deep segmentation approach achieved promising results for crack defects on historical surfaces, generating more annotated images covering other defect types, such as corrosion, bird droppings, and shadows, should be considered in future work in order to enrich the currently available dataset. Moreover, additional experiments considering images under low lighting and/or other environmental conditions should reveal the sensitivity of the proposed approach to image quality.