AAL-Net: A Lightweight Detection Method for Road Surface Defects Based on Attention and Data Augmentation

Zhang, Cheng; Li, Gang; Zhang, Zekai; Shao, Rui; Li, Min; Han, Delong; Zhou, Mingle

doi:10.3390/app13031435

Open AccessArticle

AAL-Net: A Lightweight Detection Method for Road Surface Defects Based on Attention and Data Augmentation

by

Cheng Zhang

^†

,

Gang Li

^†

,

Zekai Zhang

,

Rui Shao

,

Min Li

,

Delong Han

and

Mingle Zhou

^*

Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Appl. Sci. 2023, 13(3), 1435; https://doi.org/10.3390/app13031435

Submission received: 1 January 2023 / Revised: 16 January 2023 / Accepted: 17 January 2023 / Published: 21 January 2023

(This article belongs to the Special Issue Applications of Deep Learning and Artificial Intelligence Methods)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

The pothole is a common road defect that seriously affects traffic efficiency and personal safety. Road evaluation and maintenance and automatic driving take pothole detection as their main research part. In the above scenarios, accuracy and real-time pothole detection are the most important. However, the current pothole detection methods can not meet the accuracy and real-time requirements of pothole detection due to their multiple parameters and volume. To solve these problems, we first propose a lightweight one-stage object detection network, the AAL-Net. In the network, we design an LF (lightweight feature extraction) module and use the NAM (Normalization-based Attention Module) attention module to ensure the accuracy and real time of the pothole detection process. Secondly, we make our own pothole dataset for pothole detection. Finally, in order to simulate the real road scene, we design a data augmentation method to further improve the detection accuracy and robustness of the AAL-Net. The metrics F1 and GFLOPs show that our method is better than other deep learning models in the self-made dataset and the pothole600 dataset and can well meet the accuracy and real-time requirements of pothole detection.

Keywords:

lightweight network; attentional mechanisms; data augmentation; road surface defect detection

1. Introduction

Road infrastructure construction is an important basis for the normal operation of society today and is one of the important ways of passenger transport and material transport. Every year, a large amount of money is invested in road maintenance all over the world. Potholes are one of the main forms of road defects. In terms of road evaluation and maintenance and automatic driving, pothole detection is the main part of the research.

Road pothole detection is mainly studied in the two scenarios of road evaluation and maintenance and automatic driving. There are mainly two methods to detect potholes. The first is the manual detection method, which is still the main form of pothole detection at present [1]. However, the manual detection method has the following three disadvantages: long time consumption, low precision, and high loss. The second is to adopt automatic detection methods. Automatic detection methods are mainly divided into the following three types: vision-based detection method, vibration-based detection method, and 3D reconstruction-based detection method [2]. The vision-based detection method [3,4] is suitable for detecting the number of potholes as well as the shape of the potholes, but the method is susceptible to environmental influences, such as light and shadows. The vibration-based detection method [5,6], which has the lowest cost of the three methods, is more influenced by the sensor and the speed at which the vehicle moves during its adoption. The 3D reconstruction-based detection method [7,8] can detect the shape of potholes most accurately among the three methods, but the cost is the highest. Different automatic detection methods are used in [9,10]. With the maturity of deep learning, vision-based detection methods are gradually applied. Vision-based detection methods are mainly divided into two methods: semantic segmentation [11] and object detection [12]. In the area of object detection, there are a variety of object detectors, which are constantly innovative. With the use of attention mechanisms , the accuracy of the object detector is improved. In particular, from 2018 to 2020, the IEEE (Institute of Electrical and Electronics Engineers) held the GRDDC (Global Road Damage Detection Challenge) [13]. The vision-based detection method can detect the number of potholes and the size and shape of potholes, but this method is easily affected by light, ponding in potholes, and other environments, as shown in Figure 1. In terms of the pothole detection model, the current model algorithm [14,15,16,17] has high redundancy, many parameters, and a poor real-time performance. Now, no matter whether in a road assessment and maintenance scenarios or in the field of automatic driving, there are high requirements for the convenience of detection model deployment, in real time, and the accuracy of pothole detection. Compared with manual detection, automatic detection methods have improved in all aspects, but they can not achieve the balance of cost, efficiency, and accuracy and still can not meet the requirements of detection efficiency and accuracy.

In order to solve the above problems, we propose a one-stage object detector (AAL-Net) for pothole detection. In the model, through the LF module and the NAM attention module, we achieve the goals of lightweight and high accuracy of the model and meet the real-time and accuracy requirements of pothole detection. At the same time, we make our own dataset to test the effectiveness of our model. Finally, we propose a data augmentation method to improve the accuracy and robustness of pothole detection.

To sum up, our contributions are as follows:

We make a new pothole detection dataset for pothole detection.
We propose a one-stage object detector, the AAL-Net. By designing the LF module and using the NAM attention module, we achieve the goal of a lightweight network model and high detection accuracy.
We design a data augmentation method to improve the accuracy and robustness of pothole detection.

The content of this article is organized as follows: The second part discusses the related work, the third part details the AAL-Net we designed and the data augmentation methods we used, the fourth part introduces the dataset and the effectiveness of the comparative experiments and data augmentation methods, and the last section summarizes the results.

2. Related Work

2.1. Object Detectors

Common object detectors are divided into two categories, one-stage and two-stage. Two-stage object detectors are represented by the R-CNN [18] family, including Fast R-CNN, Faster R-CNN [19], and R-FCN [20]. One-stage object detectors are represented by YOLO [21,22,23,24], SSD [25], and RetinaNet [26]. In object detection, the object detector usually consists of three parts: the backbone, neck, and head. The backbone usually consists of a convolutional cascade used for feature extraction. The backbone is generally divided into two categories, heavyweight and lightweight. Heavyweight backbone networks can extract more features, but the extraction speed is slow. The lightweight-level backbone network extracts features fast but extracts few features. Common heavyweight backbones are the VGG [27], ResNet [28], VIT [29], etc. There are also lightweight backbones such as ShuffleNet [30,31], MobileNet [32,33,34], and GhostNet [35]. ShuffleNetv1 [30] is a neural network structure specially designed for devices with limited computing resources. It mainly adopts pointwise group convolution and channel shuffle technologies, which greatly reduces the computational overhead while preserving the model accuracy. ShuffleNetv2 [31] further improves Shufflenetv1 and puts forward four principles for designing a lightweight network: 1. When the input and output channels are the same, the MAC (Memory Access Cost) is the minimum. 2. Too many packets will increase the MAC. 3. Fragmentation is not friendly to parallel acceleration. 4. An element-by-element operation will increase the memory consumption. MobileNetv1 [32] was proposed by Google in 2017, mainly focusing on the use and deployment of a convolution neural network’s mobile terminal. In short, it is to replace the conventional convolution with the VGG network with depthwise convolution. The subsequent MobileNetv2 [33] is improved by replacing the ReLU6 activation function with a linear activation function and designing an inverted residual block. MobileNetv3 [34] uses the NAS (network architecture search) to search and determine the network structure, introduces the Squeeze and Excitation structure based on the block of MobileNetv2, uses the h-swish activation function, and improves the tail structure of MobileNetv2 network. Common necks are the FPN [36], PANet [37], and BiFPN [38]. The head is used to predict the category information and the bounding box of the object and is usually divided into two categories: the decoupled detection head and coupled detection head. The number of coupled detection head parameters is small, but the effect is average. The decoupled detection head has a large number of parameters but is effective.

2.2. Surface Defect Detection

In the detection of road surface defects, Park et al. [3] applied different YOLO models to the field of pothole detection and compared them. Deepak Kumar Dewangan et al. [4] combined convolutional neural networks and cameras to propose an improved prototype convolution neural network model. Baek et al. [9] proposed a pothole detection method that first converts RGB images to greyscale to reduce the computational effort and then adopts an edge detection method to extract features of potholes. This method takes a long time to process the images and does not have a good real-time performance. Hansen Chen et al. [10] focused on pothole detection in 2D vision and proposed a new method for pothole detection based on location-aware convolutional neural networks, focusing on distinguished regions in the road. In the field of industrial surface defect detection, Li et al. [39] proposed an attention enhancement-based PANet feature fusion method. Zhang et al. [40] proposed a lightweight industrial training model based on knowledge distillation while also incorporating the transformer self-attention used to extract global features. Wang et al. [41] explored different types of shallow and lightweight neural networks, including supervised and unsupervised models, to improve the fall detection results. Among the tested lightweight neural networks, the best one proved to be the supervised convolutional neural network. Li et al. [42] developed a lightweight convolutional neural network, WearNet, using convolution layers, max pooling layers, convolution blocks, dropout layers, and GAP (global average pooling) layers to realize the automatic scratch detection of contact sliding parts (such as metal forming parts).

2.3. Attention Mechanism

Attention mechanisms have been a hot topic of research in the field of deep learning in recent years. Many studies have focused on extracting features in one or more dimensions by using feature maps in different dimensions to perform attentional operations. SENet (Squeeze and Excitation) [43] focuses on the importance of features of the feature map in different channels and operates on the channel dimension. The BAM (Bottleneck Attention Module) [44] operates on channel and spatial dimensions. The CBAM (Convolutional Block Attention Module) [45] provides a method for sequentially embedding channels and spatial attention submodules. To avoid ignoring cross-dimensional interactions, the TAM (Triplet Attention Module) [46] considers dimensional correlations by rotating the feature map.

3. Method

3.1. Architecture

As shown in Figure 2, the pothole detection method designed in this paper involves data augmentation and network model training. First of all, data augmentation is carried out for the input pothole dataset. The data augmentation methods mainly include taking a manhole similar to the pothole as a negative sample and adding fog to simulate the real environment. Secondly, we select the mature network model YOLOv5 one-stage detection model as the baseline model for improvement. It mainly includes three parts: backbone, neck, and head. In the backbone part, we use GhostConv to replace the conventional convolution and the designed LF module to replace the feature extraction module, improving the cspdarknet53 [47] structure used by YOLOv5. In the neck part, we also use the PANet structure. Similar to backbone, we use GhostConv, the designed LF module, and the NAM attention module. In the head part, the same as the YOLOv5 network model, we use the anchor-based detection head, that is, YOLO head. We use three detection heads, which can meet the actual requirements of detecting potholes of different sizes.

3.2. Structure

3.2.1. Backbone

As shown in Figure 3, the backbone mainly includes five feature extraction modules and an SPPF (Spatial Pyramid Pooling—Fast) module. The first feature extraction module, CBS, is composed of Conv, Batch Normalization, and SiLU activation function. The remaining four feature extraction modules are composed of GCBS and LF modules designed by us. Unlike CBS, GCBS consists of GhostConv, Batch Normalization, and SiLU activation function. The difference between the four feature extraction modules is that block1, block2, block3, and block4 contain 1, 2, 3, and 1 LF modules, respectively, which can achieve different levels of feature extraction for feature maps of different sizes.

In the backbone, we use GhostConv to achieve the goal of model lightweight. Compared with ordinary convolution, some feature maps in GhostConv have depthwise separable convolution operations, which reduces the number of parameters used. However, at the same time, it also leads to the problems of incomplete extracted hole feature information and low detection accuracy. Therefore, we designed the LF module to enhance the feature extraction capability of the feature extraction network.

As shown in Figure 4, in the LF module, we first use two CBS feature extraction modules to halve the number of channels in the original feature map. Then, in order to suppress the unimportant information in the feature map, pay attention to the effective information of the pothole, and improve the detection accuracy, we integrate the ECA (Efficient Channel Attention) attention module [48]. First, the global average pooling operation is performed on the feature map, and the channel compression is performed to obtain the average value of the entire feature map. Then, a 1 × 1 convolution operation is carried out for feature extraction and sigmoid activation function operation to obtain the weight information of the feature map. Finally, it is multiplied with the original feature map to obtain a new feature map with greater attention to potholes. To make the LF module lightweight, we use depthwise separable convolution to extract feature information. In order to compensate for the feature loss caused by the depthwise separable convolution, we conduct an attention operation again to focus on useful information. Here, we will splice the output feature map and the original feature map on the channel to fuse the feature information. The number of channels is equal to the number of channels before the LF module is input. Then, we perform a 1 × 1 convolution operation to extract the feature information. Finally, we use the attention module again to filter out unimportant information. To sum up, our LF module has fewer parameters and can pay more attention to important information. The LF module Formulas (1)–(5) we designed are as follows:

C B S = SiLU (B N (Conv (x)))

(1)

E C A = x \cdot (Sig (Conv (Avg (x))))

(2)

P_{1} = C B S (E C A (C B S (x)))

(3)

P_{2} = (D W Conv (P_{1})) Cat (P_{1})

(4)

y = ECA (CBS ((E C A (P_{2})) Cat (CBS (x))))

(5)

where x is the input feature map,

C o n v

is a convolution operation,

B N

is Batch Normalization,

S i L U

is SiLU activate function,

C B S

is the CBS module,

A v g

indicates the global average pooling operation,

S i g

is sigmoid activate function, · is point multiplication operation,

E C A

is an ECA module,

P_{1}

is the output feature map before lightweight and

P_{2}

is the output feature map after lightweight,

D W C o n v

is a depthwise separable convolution,

C a t

is concat operation, y is the output feature map.

At the end of the backbone, in order to enrich the information in the pothole feature map obtained by the feature extraction network, we use the SPPF module to fuse the features of different receptive fields in the feature map.

3.2.2. Neck

As shown in Figure 5, the neck uses the PANet structure, conducts top-down feature fusion, and then conducts bottom-up feature fusion. The neck part mainly includes the NAM attention module [49] and four feature fusion modules. Similarly, in order to ensure the lightweight and detection accuracy of the entire network model and meet the real-time and accuracy requirements of hole detection, we used the GCBS module and LF module in each feature fusion module.

To enrich the information in the fusion feature map, we fuse the feature information from block2 and block3 in the feature extraction network with the N1 module and N2 module after the NAM attention operation. The attention thought of NAM is similar to that of ECA, which is to focus on more important feature information. The NAM attention module first performs a BN operation on the input feature map and then conducts a sigmoid activation operation after obtaining the corresponding weight. Then, the original feature map is multiplied by the result to obtain the output feature map that focuses on important information.

After the N1 module, the size of the feature map changes from 20 × 20 to become 40 × 40, 256 channels. After the N2 module, the size of the feature map changes from 40 × 40 to become 80 × 80, and the number of channels is 128. After the N3 module, the size of the feature map is 40 × 40, 256 channels. After the N4 module, the size of the feature map is 20 × 20, and the number of channels is 512. We, respectively, send the output results of N2, N3, and N4 modules to the detection head to detect small, medium, and large potholes.

3.3. Method of Data Augmentation

3.3.1. Negative Sample

A negative sample refers to an image that does not contain the target to be recognized by the task. It can be a background image that does not contain the detection target or an object image that is easily mistaken for the detection target. The purpose of training negative samples is to reduce the false detection rate and false recognition rate and improve the generalization ability of the network model. During the experiment, we also find that taking the object image that is easily mistaken as the detection object as a negative sample can also improve the detection accuracy of the target. Objects with the same characteristics can be easily identified as a class. Objects on some roads can be easily detected as potholes by the model, especially manholes (Figure 6).

In this experiment, we propose a method to enhance the dataset with negative samples, which improves the detection accuracy of potholes. We improve the original pothole dataset by adding the manhole images to obtain a new negative sample dataset.

3.3.2. Fog

In order to enhance the robustness of the method in extreme weather, we use the fogging method to enhance the dataset and improve the detection accuracy in ordinary environments. In this study, we use the fog algorithm to process the image and add the original pothole dataset to obtain a new fog dataset. The fogging result is shown in Figure 7.

The process of fogging algorithm is as follows: First, normalize the image pixels to find the point with the largest pixel value. The height map is generated from the normalized image using the diamond square algorithm to determine the fog concentration. Second, the result of the diamond square algorithm is processed with the maximum pixel value and then regularized. Finally, it reverts to normal pixel size. Formulas (6)–(8) of process are as follows:

Y = B N (X)

(6)

X = X + random (X) \times D S (severity)

(7)

X_{out} = F B N (B N (\frac{X \times y_{max}}{(y_{max} + random (X)}))

(8)

where

B N

denotes the image pixel regularization operation,

r a n d o m

denotes the random point taken in the graph,

D S

denotes the diamond squared algorithm,

s e v e r i t y

denotes the add fog concentration variable, and

F B N

(Fine-grained Batch Normalization) denotes the image pixel restoration normalization operation.

3.4. Loss Function

In this method, the loss function involves three components. They are classification loss, confidence loss, and location loss. For both classification loss and confidence loss, the BCE (Binary Cross-Entropy) loss is used. Equation (9) of BCE loss is as follows:

L_{B C E} (p (y), y) = - ω * (y * ln (p (y)) + (1 - y) * ln (1 - p (y)))

(9)

where y denotes the predicted class,

p (y)

denotes the probability of the class y, and

ω

denotes the weight value.

In addition, CIoU loss is used for the localization loss. Equations (10)–(12) of CIoU loss are as follows:

L_{C I o U} = 1 - I o U + \frac{ρ^{2} (b, b^{g t})}{c^{2}} + α v

(10)

v = \frac{4}{π^{2}} {(arctan \frac{w^{g t}}{h^{g t}} - arctan \frac{w}{h})}^{2}

(11)

α = \frac{v}{1 - I o U + v}

(12)

where b,

b^{g t}

represent the centroids of the anchor and object boxes, and

ρ

represents the Euclidean distance between the two centroids. c represents the diagonal distance

α

of the smallest rectangle that can cover both the anchor and object boxes and is the parameter used to balance the proportions. v is used to measure the consistency of the proportions between the anchor and object boxes. w and h represent the width and height of the boxes.

In summary, the total loss function used in this paper is (13)

L_{Total} = 1.5 L_{B C E} (p (y), y) + 0.05 L_{C I o U}

(13)

4. Results and Discussion

4.1. Experiment Description

4.1.1. Dataset

Eight datasets are used in this paper, including two datasets that were not augmented and the datasets that were processed with negative, fogging, and negative samples mixed with fog enhancement for two datasets.

To solve the real problem of pothole detection, we create a dataset containing pictures of potholes with shadows and water accumulation similar to the road background. In addition, the shape and size of the potholes in the dataset are different, as shown in Figure 1. The self-made and augmented datasets are shown in Table 1. We also visualized the types of pictures contained in the self-made and augmented datasets in Figure 8.

(1): The self-made dataset, A1 in Table 1, includes 994 pictures of potholes, 795 pictures in the training set, 199 pictures in the test set, and the ratio of the training set to the test set is 8:2.
(2): The negative sample dataset, A2 in Table 1, includes 746 pictures of potholes under normal conditions in the baseline dataset A1 and 248 pictures of manholes. The training set includes 795 pictures, and the test set contains 199 pictures. The ratio of the training set to the test set is 8:2.
(3): The fogging dataset, referred to as A3 in Table 1, includes 746 pictures of potholes under normal conditions in the baseline dataset A1 and 248 pictures of potholes after the fogging treatment. The training set includes 795 pictures, and the test set contains 199 pictures. The ratio of the training set to the test set is 8:2.
(4): The fogging and negative sample dataset, A4 in Table 1, includes 498 pictures of potholes in normal conditions in the baseline dataset A1, 248 pictures of potholes in the fogging dataset, and 248 pictures of manholes in the negative sample dataset. The training set includes 795 pictures, and the test set contains 199 pictures. The ratio of the training set to the test set is 8:2.

The Pothole600 dataset [50] was originally used for image segmentation, and we process it for labeling for the object detection. The Pothole600 dataset has smaller potholes, which can be used to verify the accuracy of our method for detecting smaller potholes. The details of the pothole600 dataset are shown in Table 2. We also visualized the types of pictures contained in pothole600 and the augmented datasets in Figure 9.

(1): The pothole600 dataset, B1 in Table 2, contains pictures of 600 potholes. The training set includes 480 pictures, and the test set contains 120 pictures. The ratio of the training set to the test set is 8:2.
(2): The negative pothole600 sample dataset, B2 in Table 2, includes 450 original pothole600 pothole pictures and 150 manhole pictures. The training set includes 480 pictures, and the test set contains 120 pictures. The ratio of the training set to the test set is 8:2.
(3): The pothole600 fogging dataset, B3 in Table 2, includes 450 original pothole600 pothole pictures and 150 fogging-treated pictures. The training set includes 480 pictures, and the test set contains 120 pictures. The ratio of the training set to the test set is 8:2.
(4): The pothole600 fogging and negative sample dataset, B4 in Table 2, includes 300 pictures of potholes under normal conditions in the baseline dataset B1, 150 pictures of potholes treated with fogging in the fogging dataset B3, and 150 pictures of manholes in the negative sample dataset B2. The training set consists of 450 pictures, the test set contains 150 pictures, and the ratio of the training set to the test set is 8:2.

4.1.2. Experimental Settings

In this experiment, the PyTorch version is 1.9.1, the CUDA version is 11.0, and the python version is 3.9. The training and testing are performed on the GPU NVIDIA-A100-SXM4-40GB. During each experiment, the epoch is set to 300 on the self-made pothole dataset and 200 on the pothole600 dataset because of the small number of pictures. The picture size is 460 × 460. Based on the parameter settings commonly used in engineering, the parameters are set as follows: The batch size is set to 32. The number works is set to 8. The learning rate is set to 0.01. The optimizer is the SGD.

4.1.3. Metrics

P (Precision): The accuracy rate is the ratio of the number of correct tests to the number of positive tests. Equation (14) is as follows:

p r e c e s i o n = \frac{T P}{T P + F P}

(14)

R (Recall): Recall is the ratio of the number of correct detections to the number of actual positive detections. Equation (15) is as follows:

r e c a l l = \frac{T P}{T P + F N}

(15)

where

T P

indicates the number of actuals and detections that are positive;

F P

indicates the number of detections that are positive but negative;

F N

indicates the number of detections that are negative but positive.

F1 is a measure of the accuracy of a model that takes into account both the accuracy and recall of a classification model. Equation (16) is as follows:

F 1 = \frac{2 \times p r e c e s i o n \times r e c a l l}{p r e c i s i o n + r e c a l l}

(16)

mAP (Mean Average Precision): mAP is the mean value of the average precision of the different categories. Equation (17) is as follows:

m A P = \frac{\sum_{1}^{N} \int_{0}^{1} p (r) d r}{N}

(17)

where N represents the total number of categories of detection objects, p represents the precision of each category, and r represents the recall of each category.

GFLOPs (Giga Floating-point Operations Per Second): the number of floating-point operations per second can be used to measure the complexity of a model. Smaller GFLOPs indicate that the model requires less computation and runs faster.

Parameter: parameter refers to the total number of parameters to be trained in model training. It is used to measure the size of the model. The unit is usually M. M refers to millions and is the unit of count.

4.2. Comparison Experiments

In order to prove the validity of the AAL-Net, we compare it with YOLOv3-tiny, YOLOv5s, and YOLOv5s-tiny, which replaces all Convs in YOLOv5s with GhostConv. To prove the robustness of the method, the experiments are carried out on the self-made dataset (A1) and pothole600 dataset (B1), respectively. The results are as follows.

As shown in Table 3, in the self-made dataset, the parameters of the AAL-Net are about 58% less than YOLOv3-tiny, the GFLOPs are about 37% less, and the F1 value is increased by 4.35%. The parameters of the AAL-Net are about 47% less than YOLOv5s and about 48% less than GFLOPs, and the F1 value is increased by 2.18%. Even though the parameters of the AAL-Net and GFLOPs are close to YOLOv5s-tiny, the F1 value is 1.11% higher than YOLOv5s-tiny. In addition, to prove the validity of our data augmentation method, we train the AAL-Net in the training set of the fogging and negative sample dataset (A4), test it in the test set of the self-made dataset (A1), and the final result is 13.68% higher than the F1 value train in the training set of the self-made dataset (A1). We visualize the experimental results (Figure 10) and the actual results (Figure 11).

Similarly, as shown in Table 4, in the pothole600 dataset, the F1 values of AAL-Net are 6.06%, 1.53%, and 0.42% higher than those of the YOLOv3-tiny, YOLOv5s, and YOLOv5s-tiny models, respectively. With our data augmentation method, the F1 value of the AAL-Net is increased by 0.4%. The experimental results (Figure 12) and the actual effects (Figure 13) are visualized.

By comparing and analyzing the experimental results, the AAL-Net proposed by us has less parameters than the tiny version of YOLOv3, YOLOv5s, and the model replacing Conv in YOLOv5s with GhostConv, which indicates that the AAL-Net proposed by us has lower memory requirements for embedded devices and is easier to deploy. The value of the GFLOPs is lower, indicating that the AAL-Net proposed by us is faster and can better meet the real-time detection of potholes. At the same time, the F1 value is higher in the self-made dataset and pothole600 dataset, indicating that the AAL-Net proposed by us has a strong generalization ability and can detect the potholes on the road surface more accurately. In addition, through our data augmentation method, the F1 value of the AAL-Net is further improved, indicating the effectiveness of our data augmentation method. Compared with the model replacing Conv with GhostConv in YOLOv5s, we use the LF module and add the NAM attention mechanism, and the parameter is lower, which indicates the light weight of the LF module we proposed. The higher F1 value also shows that the LF module we designed and the use of the NAM attention mechanism effectively increase the detection accuracy of the model.

4.3. Data Augmentation Experiments

To demonstrate the effectiveness of our proposed data augmentation methods, we used the YOLOv5s model as a baseline model for training. For the negative sample data augmentation method, we used the training set of the negative sample dataset (A2 and B2) to train and the test set of the unprocessed dataset (A1 and B1) to test. For the fogging data augmentation method, we used the training set of the fogging dataset (A3 and B3) to train and the test set of the unprocessed dataset (A1 and B1) to test. For the mixed enhancement method of fogging and negative samples, we used the training set of the fogging and negative sample mixed dataset (A4 and B4) to train and the test set of the unprocessed dataset (A1 and B1) to test. Similarly, we performed the experiments on the self-made dataset and the pothole600 dataset. The experimental results are shown below.

As shown in Table 5, in the self-made dataset, the experimental F1 value is increased by 17.72% with the negative sample data augmentation method. After the fogging data augmentation method, the effect of improving the detection accuracy is more obvious; the F1 value is increased by 25.01%. When both the negative samples and fogging data augmentation methods are used at the same time, the effect of improving the detection accuracy is not only better with one of them, but the F1 value is also improved by 16.84%. The actual results (Figure 14) are visualized.

As shown in Table 6, in the pothole600 dataset, the F1 values are increased by 2.29%, 2.29%, and 0.25%, respectively, after the negative samples, fogging, and mixed data augmentation methods. The actual effects (Figure 15) are visualized.

5. Conclusions

To solve the practical problem of pothole detection, we first made our own dataset to validate our proposed method. Secondly, we proposed a one-stage lightweight object detector, the AAL-Net. We designed the LF module and used the NAM attention module to make the AAL-Net lightweight and accurate. At the same time, combining the data augmentation method, which takes the manhole similar to the pothole as a negative sample and simulates the real road environment with fog, improves the detection accuracy and the practicability of the method. Finally, our method is better than other deep learning methods in the real time and accuracy of pothole detection and has good practical value. In the future, we will enrich the dataset, further improve the detection accuracy of the method, and improve the real-time detection of potholes. All datasets used in the paper can be found in [51].

Author Contributions

Conceptualization, Software, Resources, Methodology, Writing—Original Draft Preparation, C.Z.; Methodology, Writing—Review and Editing, Funding acquisition, G.L.; Software, Supervision, Z.Z.; Validation, Visualization, R.S.; Project administration, M.L.; Formal Analysis, D.H.; Data Curation, M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Taishan Scholars Program (NO.tsqn202103097).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Fan, R.; Liu, M. Road Damage Detection Based on Unsupervised Disparity Map Segmentation. IEEE Trans. Intell. Transp. Syst. 2020, 21, 4906–4911. [Google Scholar] [CrossRef] [Green Version]
Kim, Y.-M.; Kim, Y.-G.; Son, S.-Y.; Lim, S.-Y.; Choi, B.-Y.; Choi, D.-H. Review of Recent Automated Pothole-Detection Methods. Appl. Sci. 2022, 12, 5320. [Google Scholar] [CrossRef]
Park, S.-S.; Tran, V.-T.; Lee, D.-E. Application of Various YOLO Models for Computer Vision-Based Real-Time Pothole Detection. Appl. Sci. 2021, 11, 11229. [Google Scholar] [CrossRef]
Dewangan, D.K.; Sahu, S.P. PotNet: Pothole Detection for Autonomous Vehicle System Using Convolutional Neural Network. Electron. Lett. 2021, 57, 53–56. [Google Scholar] [CrossRef]
Sattar, S.; Li, S.; Chapman, M. Road Surface Monitoring Using Smartphone Sensors: A Review. Sensors 2018, 18, 3845. [Google Scholar] [CrossRef] [Green Version]
Du, R.; Qiu, G.; Gao, K.; Hu, L.; Liu, L. Abnormal Road Surface Recognition Based on Smartphone Acceleration Sensor. Sensors 2020, 20, 451. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Ul Haq, M.U.; Ashfaque, M.; Mathavan, S.; Kamal, K.; Ahmed, A. Stereo-Based 3D Reconstruction of Potholes by a Hybrid, Dense Matching Scheme. IEEE Sens. J. 2019, 19, 3807–3817. [Google Scholar] [CrossRef]
Guan, J.; Yang, X.; Ding, L.; Cheng, X.; Lee, V.C.S.; Jin, C. Automated Pixel-Level Pavement Distress Detection Based on Stereo Vision and Deep Learning. Autom. Constr. 2021, 129, 103788. [Google Scholar] [CrossRef]
Baek, J.-W.; Chung, K. Pothole Classification Model Using Edge Detection in Road Image. Appl. Sci. 2020, 10, 6662. [Google Scholar] [CrossRef]
Chen, H.; Yao, M.; Gu, Q. Pothole Detection Using Location-Aware Convolutional Neural Networks. Int. J. Mach. Learn. Cybern. 2020, 11, 899–911. [Google Scholar] [CrossRef]
Pan, Y.; Zhang, X.; Cervone, G.; Yang, L. Detection of Asphalt Pavement Potholes and Cracks Based on the Unmanned Aerial Vehicle Multispectral Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 3701–3712. [Google Scholar] [CrossRef]
Salaudeen, H.; Çelebi, E. Pothole Detection Using Image Enhancement GAN and Object Detection Network. Electronics 2022, 11, 1882. [Google Scholar] [CrossRef]
Arya, D.; Maeda, H.; Kumar Ghosh, S.; Toshniwal, D.; Omata, H.; Kashiyama, T.; Sekimoto, Y. Global Road Damage Detection: State-of-the-Art Solutions. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10 December 2020; pp. 5533–5539. [Google Scholar]
Gao, M.; Wang, X.; Zhu, S.; Guan, P. Detection and Segmentation of Cement Concrete Pavement Pothole Based on Image Processing Technology. Math. Probl. Eng. 2020, 2020, 1360832. [Google Scholar] [CrossRef] [Green Version]
Masihullah, S.; Garg, R.; Mukherjee, P.; Ray, A. Attention Based Coupled Framework for Road and Pothole Segmentation. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10 January 2021; pp. 5812–5819. [Google Scholar]
Fan, J.; Bocus, M.J.; Hosking, B.; Wu, R.; Liu, Y.; Vityazev, S.; Fan, R. Multi-Scale Feature Fusion: Learning Better Semantic Segmentation for Road Pothole Detection. In Proceedings of the 2021 IEEE International Conference on Autonomous Systems (ICAS), Montreal, QC, Canada, 11 August 2021; pp. 1–5. [Google Scholar]
Anand, S.; Gupta, S.; Darbari, V.; Kohli, S. Crack-Pot: Autonomous Road Crack and Pothole Detection. In Proceedings of the 2018 Digital Image Computing: Techniques and Applications (DICTA), IEEE, Canberra, Australia, 10–13 December 2018; pp. 1–6. [Google Scholar]
Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object Detection via Region-Based Fully Convolutional Networks. In Proceedings of the Annual Conference on Neural Information Processing Systems 2016, Barcelona, Spain, 5–10 December 2016. [Google Scholar]
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525. [Google Scholar]
Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2016; Volume 9905, pp. 21–37. ISBN 978-3-319-46447-3. [Google Scholar]
Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar]
Ma, N.; Zhang, X.; Zheng, H.-T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 11218, pp. 122–138. ISBN 978-3-030-01263-2. [Google Scholar]
Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520. [Google Scholar]
Howard, A.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.-C.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R.; et al. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar]
Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589. [Google Scholar]
Lin, T.-Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944. [Google Scholar]
Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10781–10790. [Google Scholar]
Li, G.; Shao, R.; Wan, H.; Zhou, M.; Li, M. A Model for Surface Defect Detection of Industrial Products Based on Attention Augmentation. Comput. Intell. Neurosci. 2022, 2022, 9577096. [Google Scholar] [CrossRef]
Zhang, Z.-K.; Zhou, M.-L.; Shao, R.; Li, M.; Li, G. A Defect Detection Model for Industrial Products Based on Attention and Knowledge Distillation. Comput. Intell. Neurosci. 2022, 2022, 6174255. [Google Scholar] [CrossRef]
Wang, G.; Li, Q.; Wang, L.; Zhang, Y.; Liu, Z. Elderly Fall Detection with an Accelerometer Using Lightweight Neural Networks. Electronics 2019, 8, 1354. [Google Scholar] [CrossRef] [Green Version]
Li, W.; Zhang, L.; Wu, C.; Cui, Z.; Niu, C. A New Lightweight Deep Neural Network for Surface Scratch Detection. Int. J. Adv. Manuf. Technol. 2022, 123, 1999–2015. [Google Scholar] [CrossRef] [PubMed]
Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
Park, J.; Woo, S.; Lee, J.-Y.; Kweon, I.S. BAM: Bottleneck Attention Module. arXiv 2018, arXiv:1807.06514. [Google Scholar]
Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 11211, pp. 3–19. ISBN 978-3-030-01233-5. [Google Scholar]
Liu, Z.; Wang, L.; Wu, W.; Qian, C.; Lu, T. TAM: Temporal Adaptive Module for Video Recognition. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 13688–13698. [Google Scholar]
Wang, C.-Y.; Mark Liao, H.-Y.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W.; Yeh, I.-H. CSPNet: A New Backbone That Can Enhance Learning Capability of CNN. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 1571–1580. [Google Scholar]
Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020. [Google Scholar]
Liu, Y.; Shao, Z.; Teng, Y.; Hoffmann, N. NAM: Normalization-Based Attention Module. arXiv 2021, arXiv:2111.12419. [Google Scholar]
Fan, R.; Wang, H.; Bocus, M.J.; Liu, M. We Learn Better Road Pothole Detection: From Attention Aggregation to Adversarial Domain Adaptation. In Computer Vision—ECCV 2020; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2020; Volume 12538, pp. 285–300. [Google Scholar]
Available online: https://www.kaggle.com/datasets/zchengcheng/pothole-datasets (accessed on 14 January 2023).

Figure 1. Potholes with different environments, shapes, and sizes in the self-made dataset. The upper left is a small round pothole in the normal environment. On the upper right is a long strip of the small pothole in the shadow environment. At the bottom left is a small round pothole in rainy weather. At the lower right is a large irregular pothole in rainy weather.

Figure 2. Framework of the method.

Figure 3. Backbone Structure. The left part is the structure of the backbone, and the right part is the size and number of channels of the output feature map of each module.

Figure 4. LF module.

Figure 5. Neck.

Figure 6. Negative sample: manhole.

Figure 7. Comparison of pothole photos before and after fogging.

Figure 8. Self-made datasets and augmented number of pictures in the dataset.

Figure 9. Pothole600 datasets and augmented number of pictures in the dataset.

Figure 10. In the self-made dataset, the comparison between F1 values and parameter values of different methods, the horizontal axis represents different methods, the left half of the vertical axis represents F1 values, and the right half of the vertical axis represents parameter values.

Figure 11. Visualization of image detection on the self-made dataset using different methods.

Figure 12. In the pothole600 dataset, the comparison between F1 values and parameter values of different methods, the horizontal axis represents different methods, the left half of the vertical axis represents F1 values, and the right half of the vertical axis represents parameter values.

Figure 13. Visualization of image detection on the pothole600 dataset using different methods.

Figure 14. Visualization of images detected on the self-made dataset using different data augmentation methods.

Figure 15. Visualization of images detected on the pothole600 dataset using different data augmentation methods.

Table 1. Number of photos of each type in A1, A2, A3, A4 of the self-made dataset.

Name	Pothole	Fog	Manhole
Pothole(A1)	994	0	0
Pothole-manhole(A2)	746	0	248
Pothole-fog(A3)	746	248	0
Pothole-manhole-fog(A4)	498	248	248

Table 2. Number of photos of each type in B1, B2, B3, B4 of pothole600 dataset.

Name	Pothole	Fog	Manhole
Pothole600(B1)	600	0	0
Pothole600-manhole(B2)	450	0	150
Pothole600-fog(B3)	450	150	0
Pothole600-manhole-fog(B4)	300	150	150

Table 3. Comparison of experimental results with existing methods in the self-made dataset.

Model	P (%)	R (%)	F1 (%)	Parameter	GFLOPs
YOLOv3-tiny	69.40	62.58	65.81	8.67M	13.0
YOLOv5s	72.92	63.68	67.98	7.02M	15.9
YOLOv5s-tiny	73.94	64.77	69.05	3.68M	8.2
AAL-Net	77.47	64.11	70.16	3.67M	8.2
AAL-Net+aug	87.97	80.08	83.84	3.67M	8.2

Table 4. Comparison of experimental results with existing methods in the pothole600 dataset.

Model	P (%)	R (%)	F1 (%)	Parameter	GFLOPs
YOLOv3-tiny	91.12	88.97	90.03	8.67M	13.0
YOLOv5s	93.11	96.06	94.56	7.02M	15.9
YOLOv5s-tiny	95.29	96.06	95.67	3.68M	8.2
AAL-Net	95.34	96.85	96.09	3.67M	8.2
AAL-Net+aug	95.38	97.63	96.49	3.67M	8.2

Table 5. Based on YOLOv5s model, compare the results of data augmentation in the self-made dataset.

Data Augmentation	P (%)	R (%)	F1 (%)
no	72.92	63.68	67.98
manhole	89.71	82.04	85.70
fog	96.03	90.15	92.99
fog-manhole	85.87	83.80	84.82

Table 6. Based on YOLOv5s model, compare the results of data augmentation in the pothole600 dataset.

Data Augmentation	P (%)	R (%)	F1 (%)
no	93.11	96.06	94.56
manhole	96.08	97.63	96.85
fog	96.09	97.63	96.85
fog-manhole	95.97	93.68	94.81

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhang, C.; Li, G.; Zhang, Z.; Shao, R.; Li, M.; Han, D.; Zhou, M. AAL-Net: A Lightweight Detection Method for Road Surface Defects Based on Attention and Data Augmentation. Appl. Sci. 2023, 13, 1435. https://doi.org/10.3390/app13031435

AMA Style

Zhang C, Li G, Zhang Z, Shao R, Li M, Han D, Zhou M. AAL-Net: A Lightweight Detection Method for Road Surface Defects Based on Attention and Data Augmentation. Applied Sciences. 2023; 13(3):1435. https://doi.org/10.3390/app13031435

Chicago/Turabian Style

Zhang, Cheng, Gang Li, Zekai Zhang, Rui Shao, Min Li, Delong Han, and Mingle Zhou. 2023. "AAL-Net: A Lightweight Detection Method for Road Surface Defects Based on Attention and Data Augmentation" Applied Sciences 13, no. 3: 1435. https://doi.org/10.3390/app13031435

APA Style

Zhang, C., Li, G., Zhang, Z., Shao, R., Li, M., Han, D., & Zhou, M. (2023). AAL-Net: A Lightweight Detection Method for Road Surface Defects Based on Attention and Data Augmentation. Applied Sciences, 13(3), 1435. https://doi.org/10.3390/app13031435

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

AAL-Net: A Lightweight Detection Method for Road Surface Defects Based on Attention and Data Augmentation

Abstract

1. Introduction

2. Related Work

2.1. Object Detectors

2.2. Surface Defect Detection

2.3. Attention Mechanism

3. Method

3.1. Architecture

3.2. Structure

3.2.1. Backbone

3.2.2. Neck

3.3. Method of Data Augmentation

3.3.1. Negative Sample

3.3.2. Fog

3.4. Loss Function

4. Results and Discussion

4.1. Experiment Description

4.1.1. Dataset

4.1.2. Experimental Settings

4.1.3. Metrics

4.2. Comparison Experiments

4.3. Data Augmentation Experiments

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI