1. Introduction
Road infrastructure is an indispensable foundation for economic development, and countries invest enormous resources in road construction projects every year [1]. However, rising traffic volumes and the prolonged use of road infrastructure have led to an alarming increase in the prevalence of road cracks [2,3]. Cracks can be classified into five distinct categories: alligator crack, longitudinal crack, transverse crack, pothole, and patching [4]. Each type of crack has different detection needs and treatment priorities. The first three types are the main targets in the field of road crack detection. Among them, longitudinal and transverse cracks, which are linear in shape, are often a sign of structural damage to the road. Treating them before they reach a critical size that may endanger road safety or significantly increase maintenance costs reduces both the occurrence of safety accidents and the cost of road maintenance, so linear cracks are the main area of research in image-based crack detection [5,6]. If linear cracks are not controlled in a timely and effective manner, they not only weaken the bearing capacity of the road but also disrupt the road surface, creating serious safety hazards [7], and may eventually lead to the road being scrapped [8]. Therefore, in civil engineering practice, the identification of linear cracks is a key measure for ensuring the safety of road structures and extending their service life [9].
Traditional road crack detection relies mainly on visual observation by inspectors; although its accuracy is high, it is inefficient and consumes substantial human and material resources [10]. As the number of roads continues to grow, it has become evident that manual inspection cannot meet the rising demand for efficient detection [11]. Consequently, a new approach based on digital image processing has emerged.
Traditional digital image processing methods, such as threshold segmentation, edge detection, and morphological operations, have been effective in the field of road crack detection [12]. Kusumaningrum et al. [13] proposed a road crack detection model based on damage category: RGB images captured by a digital camera are converted into grayscale images and normalized, the damage classes are determined, and the damage is finally classified from grayscale thresholds using the K-NN algorithm, which achieves a significant improvement in accuracy. Liu et al. [14] addressed the inadequate performance of automatic crack detection by extracting crack information through thresholding at multiple scales and employing edge detection to integrate crack characteristics across those scales. Zhong et al. [15] put forth an algorithm that can accurately detect cracks in various orientations based on mobile laser scanning (MLS) data. This is achieved by assigning two-dimensional indices according to the laser scanning angle or acquisition time and extracting crack features through the integration of morphological filtering, sparse algorithms, and Freeman coding. Mohamed et al. [16] used Beamlet algorithms to improve the grayscale representation of road cracks and reduce image noise, combined with a dedicated crack segmentation network to classify pavement cracks, which was effective in detecting different types of cracks. Although traditional image processing methods have achieved satisfactory results in road crack detection, their generalization ability and robustness are relatively weak due to their complexity and dependence on specific conditions [17]. Therefore, to improve the performance of crack detection, researchers have turned their attention to deep-learning algorithms.
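As a concrete illustration of the threshold segmentation mentioned above, the following is a minimal sketch using Otsu's method, a common way to pick a global gray-level threshold. It is not the procedure of any of the cited works; the function name `otsu_threshold` and the tiny 4 x 4 image are hypothetical and chosen only for illustration.

```python
def otsu_threshold(pixels):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form the dark class (crack candidates).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no dark pixels yet
        w_bright = total - w_dark
        if w_bright == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_bright = (sum_all - sum_dark) / w_bright
        var_between = w_dark * w_bright * (mean_dark - mean_bright) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Hypothetical grayscale patch: a dark diagonal crack on bright pavement.
image = [
    [200, 198, 205, 201],
    [199,  35, 202, 203],
    [204,  30,  32, 200],
    [201, 197,  28, 199],
]
flat = [p for row in image for p in row]
t = otsu_threshold(flat)
mask = [[1 if p <= t else 0 for p in row] for row in image]
```

A fixed global threshold of this kind works only when crack and background intensities are well separated, which is one reason such methods depend on specific imaging conditions.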
Deep-learning algorithms [18] can autonomously discern and efficiently extract image features and have made significant advances in the domain of image processing, particularly in object detection. Consequently, applying them to crack detection against intricate backgrounds is expected to yield greater robustness and precision. Such an implementation can not only adapt to complex and changing image contexts but also enhance the efficiency and dependability of crack detection.
Single-stage deep-learning algorithms represent a significant subcategory of deep-learning algorithms. Among these, RetinaNet [19], the YOLO series [20,21], EfficientDet [22], and SSD [23] have demonstrated exceptional performance in real-time processing scenarios owing to their high processing speed. This has attracted numerous researchers to pursue further exploration and innovation, thereby advancing the development of single-stage deep-learning algorithms. Zhu et al. [24] put forth a novel Crack U-Net model that exhibits remarkable precision in crack detection. The basic convolution module of the network is composed of a module based on the U-Net network, residual blocks, and mini-U. Dilated convolution replaces traditional convolution to fully capture the edge information of the crack. The resulting network achieves excellent accuracy on the CRACK500 and CFD datasets. Fan et al. [25] evaluated and compared the performance of 30 state-of-the-art (SoTA) DCNNs for road crack detection. The results demonstrate that all evaluated DCNNs exhibit comparable performance with a limited amount of training data, with PNASNet achieving the best balance between speed and accuracy. Li et al. [26] developed a novel road crack detection model, DenxiDeepCrack, based on data collected using UAV technology. The model demonstrated effective performance in detecting road cracks on both the CrackTree260 dataset and a self-built dataset. Hammouch et al. [27] put forth a methodology for the automated identification and categorization of flexible pavement cracks in Morocco, employing a CNN and a pre-trained VGG-19 network through transfer learning. This approach yielded remarkable detection results on a self-constructed dataset. Lv et al. [28] proposed a mask region-based convolutional neural network (Mask R-CNN) model to classify the types of road crack damage and elucidated the design concept of deep-learning technology in road crack detection tasks. The model demonstrated high detection accuracy for different types of road cracks. Li et al. [29] proposed a novel algorithm for the precise detection of road cracks. The proposed algorithm is based on the gm-resnet network and introduces a global attention mechanism to facilitate the extraction of high-dimensional features across channels and spaces. Additionally, the algorithm employs a focal loss function to address class imbalance, resulting in high accuracy on the Concrete Structure Spalling and Cracking (CSSC) dataset. Chen et al. [30] devised a novel neural network for pixel-level pavement crack detection, which fuses the strengths of the encoder-decoder network architecture and the attention mechanism to extract crack pixels more accurately and efficiently. Experimental results demonstrate that the network exhibits remarkable detection capabilities. The research conducted by these scholars has yielded favorable detection results on public and self-created datasets, providing robust technical support for road maintenance and management. This indicates that the field of road crack detection is progressing towards a more intelligent and automated future.
Accurately identifying tiny cracks poses a major challenge in road crack detection. Although often overlooked because of their small size, these cracks signal road deterioration, and early detection is crucial to prevent major damage and ensure road safety [31]. Therefore, the detection of small cracks has become a focus of current research. Zhang et al. [32] proposed CTCD-Net, whose attention mechanism and cross-layer fusion precisely capture microcrack features while effectively reducing background noise, significantly improving detection accuracy and completeness. Ciocarlan et al. [33] developed an add-on NFA model for the detection of small targets. The model introduces an a contrario decision criterion, which enhances the sensitivity of feature mapping, minimizes false positives, and markedly improves the accuracy of small target detection. Li et al. [34] designed a deep-learning model based on CrackTinyNet (CrTNet). The model incorporates the BiFormer transformer with an optimized loss function and the Space-to-Depth Conv technology, demonstrating distinctive capabilities in microcrack detection. He et al. [35] put forth a novel UAV-based MUENet algorithm. The algorithm markedly enhances the precision, speed, and adaptability of road crack detection through its main and auxiliary dual-path module (MADPM), non-uniform fusion structure, and efficient E-SimOTA strategy, and it has established a new standard for the identification of small target cracks on the pavement. These investigations into small-target detection have not only yielded noteworthy results but have also provided substantial impetus for the advancement of real-time and precise road crack detection technology.
While progress has been made in the field of road crack detection, current automated identification methods still have limitations due to the complexity and variability of the shapes, sizes, and backgrounds of pavement cracks. Especially in large-scale application scenarios that require a fast response and real-time processing, conventional models tend to have high computational complexity and resource consumption, making it difficult to keep them lightweight while ensuring high accuracy; models that are lightweight, in turn, tend to exhibit insufficient robustness and a high false detection rate. Therefore, developing a lightweight model that can accurately detect linear cracks of different scales against complex road surface backgrounds has become an urgent need in the field of road crack detection [36]. To address this issue, this paper puts forth a novel lightweight road crack detection model, USSC-YOLO, which is based on the YOLOv5s model and UAV aerial road crack data. The model is capable of real-time, rapid monitoring of road cracks, especially linear cracks, at different scales in complex backgrounds while maintaining a high processing speed. The contributions of this study are summarized as follows:
To detect cracks at varying scales and expand the crack detection range of a single image, we used drones to capture images of road cracks from both close and distant perspectives. We then constructed the UNFSRCI dataset, comprising images of cracked and crack-free road surfaces at different scales, to help neural networks learn to distinguish crack characteristics from crack-free surfaces.
To mitigate the challenges of small crack proportions, sparse feature maps, and complex background interference, we incorporated the CA attention mechanism at the end of the backbone of the YOLOv5s model. This mechanism, leveraging coordinate encoding, heightens the model’s focus on crack regions, enhancing feature representation and resilience to background noise.
To improve detection accuracy for small cracks across different scales, we integrated the Swin Transformer module into the neck of YOLOv5s. This advanced multi-scale self-attention mechanism significantly boosts the model’s ability to capture minute pixel-level details, enhancing crack detection sensitivity and reducing missed and false detections.
In view of the parameter inflation and slower inference that these improvements may cause, we optimized the model structure and replaced the feature extraction backbone with the efficient ShuffleNet V2 architecture. These changes not only stabilize the detection accuracy but also significantly reduce the computational cost, allowing the model to operate efficiently in low-latency scenarios such as drone aerial photography.
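The coordinate encoding idea behind the CA mechanism can be sketched as follows, assuming CA refers to coordinate attention as it is commonly formulated: features are pooled separately along the width and along the height, and the two resulting position-aware descriptors gate each location. This is a deliberately simplified, weight-free illustration for a single channel, not the USSC-YOLO implementation; `coordinate_gate` and the sample activations are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def coordinate_gate(feature):
    """Reweight one H x W channel by a per-row gate and a per-column gate.

    Simplification: real coordinate attention passes the pooled descriptors
    through learned 1x1 convolutions before the sigmoid; here the pooled
    values are gated directly so the example needs no trained weights.
    """
    h, w = len(feature), len(feature[0])
    # Pool along the width: one descriptor per row (keeps vertical position).
    row_desc = [sum(row) / w for row in feature]
    # Pool along the height: one descriptor per column (keeps horizontal position).
    col_desc = [sum(feature[i][j] for i in range(h)) / h for j in range(w)]
    row_gate = [sigmoid(v) for v in row_desc]
    col_gate = [sigmoid(v) for v in col_desc]
    return [
        [feature[i][j] * row_gate[i] * col_gate[j] for j in range(w)]
        for i in range(h)
    ]


# Hypothetical activations: a vertical crack in column 1 on a weak background.
feature = [
    [1.0, 4.0, 1.0],
    [1.0, 4.0, 1.0],
    [1.0, 4.0, 1.0],
]
gated = coordinate_gate(feature)
crack_gain = gated[0][1] / feature[0][1]
background_gain = gated[0][0] / feature[0][0]
```

In this toy case the column containing the crack receives a larger gate than the background columns, which is the sense in which directional pooling preserves positional information along a linear structure.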
4. Discussion
As traffic volumes and accumulated road service time increase, the prevalence of road cracks is becoming a significant concern [75,76]. Therefore, it is essential to detect cracks accurately and quickly. Traditional detection methods are inefficient and costly, and are limited in real-time detection capability and detection range. With advances in science and technology, artificial intelligence has been increasingly applied to traffic management [77]. Against this background, deep-learning algorithms have become a crucial tool for detecting road conditions; among them, single-stage algorithms process images faster but fall somewhat short in detection accuracy. Accordingly, the search for an algorithm that can guarantee both real-time performance and high accuracy has emerged as a pivotal research topic [78,79,80].
To avoid obstructing normal road traffic, this study employs UAV aerial photography to capture detailed images of road conditions daily. To meet the model’s requirements for comprehensive crack detection and to facilitate the detailed examination of crack characteristics, we captured multi-scale crack images encompassing both close-range and long-range shots. Because no public dataset provides crack-free images of structures such as deformation joints and post-pouring strips, we deliberately added images containing these strongly interfering backgrounds to the set of crack-free samples, thereby constructing the novel UNFSRCI dataset. It is anticipated that this data resource will address some of the gaps in existing collections of drone-photographed crack images, thereby furnishing a more comprehensive set of resources for researchers and practitioners.
To ensure that the model can operate efficiently in low-latency scenarios such as UAV aerial photography, this paper selects the lightweight YOLOv5s as the baseline for improvement. However, YOLOv5s exhibits shortcomings in robustness and accuracy, such as missed and false detections. Accordingly, we improve its detection performance by adding the CA attention mechanism together with the Swin Transformer. These two additions, however, increase the number of parameters and the amount of computation, resulting in higher GFLOPs and, ultimately, lower real-time detection performance. To offset this increased computational load, we replaced the basic feature extraction units throughout the YOLOv5s backbone with ShuffleNet V2 modules. Compared to YOLOv5s, the introduction of ShuffleNet V2 modules results in a significant reduction in GFLOPs, indicating a substantial decrease in computational complexity. In summary, the YOLOv5s + ShuffleNet V2 + Swin + CA variant obtained through these improvements not only delivers excellent detection performance but is also sufficiently lightweight for real-time detection on drones.
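To make the computational argument concrete, the sketch below compares the multiply-accumulate (MAC) count of a standard convolution with that of a depthwise convolution followed by a 1 x 1 pointwise convolution, the kind of factorization that ShuffleNet V2 units rely on. The layer sizes are hypothetical and are not taken from USSC-YOLO; stride 1, "same" padding, and no bias are assumed, and this is not how the GFLOPs reported for the model were measured.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs of a k x k standard convolution on an h x w output map."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs of a k x k depthwise convolution plus a 1 x 1 pointwise convolution."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


# Hypothetical layer: 80 x 80 feature map, 128 -> 128 channels, 3 x 3 kernel.
std = standard_conv_macs(80, 80, 128, 128, 3)
sep = depthwise_separable_macs(80, 80, 128, 128, 3)
ratio = std / sep
print(f"standard: {std:,} MACs, separable: {sep:,} MACs, ratio: {ratio:.2f}x")
```

For this hypothetical layer the factorized form needs roughly 8.4 times fewer MACs, which illustrates why swapping backbone units can offset the cost added by attention modules.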
This exceptional detection performance brings tangible benefits to road operations. The improved detection accuracy, particularly in identifying small and complex cracks, enables more timely maintenance actions, effectively preventing further road deterioration. This not only reduces long-term maintenance costs but also significantly enhances road safety. As a result, the USSC-YOLO model, when applied in real-world scenarios, not only increases detection efficiency to reduce the burden of manual inspections but also aids road managers in devising more effective maintenance strategies to extend the lifespan of roads and ensure safer transportation.
Ablation experiments conducted on the UNFSRCI dataset confirmed that the USSC-YOLO model achieves a significant improvement in detection performance compared to the original YOLOv5s model. Additionally, compared to other mainstream models with high detection accuracy, the new model greatly reduces the computational load. However, its GFLOPs value is only slightly lower than that of the original model. Therefore, in the future, we will continue to explore methods for reducing the computational load, aiming to make the model even more lightweight without compromising its current detection performance. Currently, the crack images in the UNFSRCI dataset come from only one city in China, which limits the diversity of the dataset. We therefore plan to collect road crack images from different regions to further expand the dataset. In addition, the current experimental results are based on a single, specific hardware configuration. In the future, we plan to deploy the model on multiple hardware devices and to measure and report its FPS under each configuration in detail, in order to determine how the model is best applied under different hardware conditions.
5. Conclusions
Road cracks are the early warning signs of road damage, and their detection is of great importance in ensuring road safety. However, crack detection work with traditional detection methods will inevitably affect the normal operation of traffic. To reconcile this contradiction, this paper relies on visual recognition technology to develop a lightweight model suitable for real-time accurate detection by UAVs.
Specifically, we insert the CA attention mechanism into the backbone network of the YOLOv5s model to improve the model’s ability to extract crack features. Then, the Swin Transformer block is introduced at the end of the neck to reduce the false detection rate. Finally, the basic feature extraction unit of the backbone network is replaced with ShuffleNet V2 to give the model a better performance with fewer parameters.
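The window-based attention used by Swin Transformer blocks can be illustrated with a minimal sketch of its two bookkeeping steps: splitting a feature grid into non-overlapping windows, and cyclically shifting the grid so that the next layer's windows straddle the previous window borders. Only the partitioning is shown, not the attention computation itself; the function names and the 4 x 4 grid are hypothetical.

```python
def window_partition(grid, window):
    """Split an H x W grid into non-overlapping window x window blocks.

    Each block is returned as a flat list of its tokens, in row-major order.
    """
    h, w = len(grid), len(grid[0])
    assert h % window == 0 and w % window == 0, "grid must tile evenly"
    windows = []
    for top in range(0, h, window):
        for left in range(0, w, window):
            windows.append(
                [grid[top + i][left + j] for i in range(window) for j in range(window)]
            )
    return windows


def cyclic_shift(grid, shift):
    """Roll the grid up and to the left by `shift` positions (wrapping around)."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i + shift) % h][(j + shift) % w] for j in range(w)] for i in range(h)]


# Hypothetical 4 x 4 token grid, tokens numbered 0..15 in row-major order.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
regular = window_partition(grid, 2)
shifted = window_partition(cyclic_shift(grid, 1), 2)
```

Here `regular[0]` holds tokens 0, 1, 4, 5, while `shifted[0]` holds tokens 5, 6, 9, 10, which belong to four different regular windows. Alternating the two layouts lets information cross window borders while attention is still computed only within small windows.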
The experimental results demonstrate that the USSC-YOLO model achieves a notable improvement in detection accuracy and a reduction in computational cost compared to YOLOv5s. Its mAP@50 and mAP@50-95 increase by 6.3% and 12%, respectively, and the robustness and generalization ability of the model are significantly improved.
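For readers unfamiliar with the two metrics, the sketch below shows the intersection-over-union (IoU) test that underlies them, assuming the usual convention in which mAP@50 counts a prediction as correct at IoU >= 0.50 and mAP@50-95 averages over the ten thresholds 0.50, 0.55, ..., 0.95. The boxes are hypothetical and the example covers only the matching step, not the full average-precision computation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds assumed for mAP@50-95: 0.50 to 0.95 in steps of 0.05.
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

# Hypothetical long, thin crack box and a prediction offset by 10 pixels.
ground_truth = (10, 10, 110, 30)
prediction = (20, 10, 120, 30)

overlap = iou(ground_truth, prediction)
passed = [t for t in THRESHOLDS if overlap >= t]
print(f"IoU = {overlap:.3f}, counted as a match at {len(passed)} of {len(THRESHOLDS)} thresholds")
```

Because thin linear cracks produce narrow boxes, a small localization offset lowers the IoU quickly, so mAP@50-95 is the stricter of the two metrics for this kind of target.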
The accurate detection of road cracks is a prerequisite for their intelligent management. Currently, we are undertaking a comprehensive analysis of a range of factors, including the coverage area of cracks, the lanes they cover, and the degree of road use, to model the degree of damage caused by cracks. This can assist in evaluating the present road safety situation and the repair sequence for crack targets, thereby providing a scientific basis for the intelligent management of road cracks.