1. Introduction
Road infrastructure is pivotal for the socio-economic development and general prosperity of societies, acting as the backbone of connectivity and transportation. Maintaining the integrity and longevity of these road networks is imperative for transportation authorities globally. A major impediment faced by these bodies is the swift detection and subsequent classification of road anomalies like potholes, cracks, and other deformities. Such irregularities, if left unchecked, can result in accidents, amplify vehicular wear and tear, and escalate road maintenance expenditures.
Historically, the detection of such road defects depended predominantly on human inspection, a method that is laborious, time-intensive, and susceptible to oversights. The evolution of camera and laser technologies [1] has ushered in a new era, allowing road images to be captured in high resolution even while in transit. This has resulted in the accumulation of expansive datasets of road imagery. However, the true challenge lies in the meticulous identification and categorization of road distress within these extensive datasets [2].
Addressing the intricacies of pavement management, Jooste et al. architected a model harnessing the prowess of multi-classification machine learning algorithms, aiming for the judicious application of treatments to distinct road segments. This model showcases formidable accuracy, predicting both untreated and treated segments with high precision [3].
Numerous innovators have further enhanced this domain. Hanandeh introduced a novel intersection of mathematical modeling with Genetic Algorithm and Artificial Neural Networks, aspiring to deduce the Pavement Quality Index of flexible pavements [4]. Zhao and collaborators ventured into machine learning, aiming to gauge the repercussions of excessive traffic weight on asphalt pavement’s life, employing a random survival forest algorithm [5]. In yet another groundbreaking endeavor, Kaloop et al. presented a hybrid model, demonstrating superior accuracy in estimating the International Roughness Index of rigid pavements compared to traditional techniques [6].
In this expansive realm, researchers like Qureshi et al. probed the potential of intelligent image analysis techniques for pavement assessment, elucidating that these avant-garde systems could substantially economize both cost and time [7]. Dong and his team showcased a fusion LSTM-BPNN model that surpassed its predecessors in forecasting pavement performance [8]. Moreover, a slew of reviews and studies from researchers like Yang et al. and Damirchilo et al. spotlighted the myriad applications and predictive capacities of machine learning and artificial intelligence within pavement engineering [9,10,11].
Despite these advancements, road distress identification, with its intricate nuances, often remains elusive to conventional image classification models. A pivotal reason is the sparse distribution of pixels that characterize road distress juxtaposed against a predominant background—creating a stark imbalance and a unique challenge in image analysis.
To counter these challenges, this paper details a novel model rooted in sparse perception for road distress classification. The model’s architecture rests on two foundational pillars: a sparse feature extraction module and a domain-specific loss function. The sparse feature extraction module, leveraging the potential of dilated convolution, is adept at capturing sparse features across various scales, creating a comprehensive representation of road conditions. Meanwhile, our loss function, conceived with a meticulous understanding of the pavement domain, ensures the consistent sparsity of the feature space.
Rigorous validation on an authentic dataset affirmed the potency of our model, recording remarkable F1 scores and signifying a profound edge over conventional methods.
In summary, our contributions are as follows:
We design a pioneering model emphasizing sparse perception tailored for road distress classification.
We design an intricate sparse feature extraction module, capitalizing on dilated convolution, to distill and amalgamate sparse features from diverse feature scales.
We craft a loss function underpinned by a thorough understanding of the pavement domain, ensuring consistent feature space sparsity.
We empirically validate the model on an authentic dataset, underscoring its superiority.
2. Method
The maintenance and preservation of transportation infrastructure systems hinge crucially upon the effective identification and classification of pavement distress. The health and safety of these systems are paramount, necessitating reliable and precise methods for identifying potential hazards. In the current study, we introduce a novel approach explicitly crafted for the classification of pavement image distress. This innovative method addresses the shortcomings of existing image classification models in the field of computer vision, which often fail to account for features unique to pavement distress images.
At the heart of our method lies a bespoke sparse feature module. This module is designed to optimize the extraction of image features, thus significantly enhancing the overall performance of the model. Pavement images typically present a unique challenge, wherein the background occupies the majority of the image, and the distresses, which are the primary targets of our analysis, are sparsely distributed.
Figure 1 presents an instance of a road image dataset. The visual display comprises two facets: a grayscale image to the left, which is employed in our model due to its optimal fitting characteristics, and a depth map appearing concurrently on the right for clarity and comprehension. A critical aspect to note in the figure is a specific region of interest, demarcated by a red bounding box. This portion, though relatively diminutive in comparison to the entirety of the image, captures the focus of our model due to the significant information it carries about the road’s condition. The highlighted area demonstrates the inherent sparse nature of the distress features in road imagery—it constitutes only a small proportion of the total image pixels, yet represents the vital information necessary for effective distress classification.
By centering our attention on these sparse regions, we ensure that no critical data points are overlooked. Further, we have developed a loss function tailored to imposing a sparsity constraint, ensuring that the features extracted align with the sparse nature of the target areas. This illustrative example, therefore, underlines our model’s core proposition: focusing on sparse, significant features rather than the largely monotonous background. It encapsulates the challenges of pavement distress classification and how our proposed method addresses them, serving as a testament to the potential effectiveness of our sparse perception-based road disease classification model.
2.1. Backbone Network
Our method leverages the strengths of the High-Resolution Network (HRNet) [12], an established backbone network widely employed for diverse tasks within the realm of computer vision. HRNet has heralded significant progress in the field by maintaining high-resolution representations throughout the model. Its multi-scale, parallel, high-resolution design effectively tackles the issue of information loss at lower resolutions, thereby facilitating the capture of fine-grained details. However, when HRNet is applied to pavement image data, certain limitations become evident. The model fails to effectively isolate and highlight the sparse target features against the pervasive background. Our work is designed to address these limitations, focusing on the critical, sparse target areas to yield better results.
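To make the backbone step concrete, the following is a minimal sketch of extracting multi-scale feature maps with an off-the-shelf HRNet implementation; the use of the timm library, the hrnet_w18 variant, and the single-channel input adaptation are illustrative assumptions, not the exact configuration reported here.

```python
import timm
import torch

# Off-the-shelf HRNet used purely as a feature extractor; features_only=True
# returns the intermediate multi-resolution feature maps instead of logits.
backbone = timm.create_model(
    "hrnet_w18", pretrained=False, features_only=True, in_chans=1
)

x = torch.randn(1, 1, 512, 512)   # one grayscale 512 x 512 pavement image
features = backbone(x)            # list of feature maps at decreasing resolutions
for f in features:
    print(f.shape)
```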
2.2. Sparse Module
The second key component of our method is the Sparse Module. This module serves a critical role in our proposed model. It is crafted to focus on the sparse targets amidst the monotonous backgrounds, a feature often overlooked in conventional methods. The module is designed to be adaptable and scalable, ensuring compatibility with diverse model architectures. It harnesses the power of sparse representation theory, integrating sparse paradigm formulas to enhance the extraction of sparse features from pavement images.
Figure 2 portrays the comprehensive architecture of our model, meticulously designed to address the challenges of pavement distress classification. This schematic illustrates a holistic workflow that begins with an input pavement image and ends with the extraction and sharing of sparse image features, which then yield the classification results.
The pavement image serves as the initial input and is processed through four distinct feature extractors. Each of these extractors operates at a different scale, offering a multi-scale approach to feature extraction. Simultaneously, as part of this process, we introduce a bespoke sparse module. This component is designed to extract features from the largest-scale feature maps, further enhancing the model’s ability to identify sparse areas of interest. A 1 × 1 convolution aids in preserving these features. In the following step, a pooling layer is employed to identify and highlight sparse points within the image, consequently expanding the receptive field of the model. The sparse module also leverages dilated convolutions of varying sizes to further scrutinize the image, focusing on comprehensive coverage and extraction of relevant features. The model then amalgamates these diverse outputs, concatenating them into a single, coherent representation. This concatenated output participates in multi-dimensional information exchange among the different scales, thus enhancing the overall feature extraction process.
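The following PyTorch sketch illustrates one way the sparse module described above could be assembled: a 1 × 1 convolution, a pooling step, parallel dilated convolutions, and concatenation-based fusion. The channel counts, dilation rates (1, 2, 4), and pooling choice are illustrative assumptions rather than the exact configuration of our implementation.

```python
import torch
import torch.nn as nn

class SparseModule(nn.Module):
    """Minimal sketch of a sparse feature module with multi-rate dilated convolutions."""

    def __init__(self, in_ch: int = 64, mid_ch: int = 64):
        super().__init__()
        # 1x1 convolution that preserves the incoming high-resolution features
        self.reduce = nn.Conv2d(in_ch, mid_ch, kernel_size=1)
        # pooling enlarges the receptive field and highlights salient points
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        # dilated convolutions of varying rates cover sparse distress regions
        # at several context sizes
        self.dilated = nn.ModuleList([
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=d, dilation=d)
            for d in (1, 2, 4)
        ])
        # fuse the concatenated branches back into a single representation
        self.fuse = nn.Conv2d(mid_ch * 3, mid_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(self.reduce(x))
        branches = [conv(x) for conv in self.dilated]
        return self.fuse(torch.cat(branches, dim=1))
```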
Figure 3 visually encapsulates a comparative exploration of feature extraction methodologies for pavement distress. The graphical representation utilizes color coding to distinguish between different regions of interest within the pavement image. Notably, the red box delineates the target area, indicative of pavement distress, while the blue box demarcates the surrounding background region.
The crux of our enhancement lies in the ingenious alteration of the conventional feature extraction process. As the illustration suggests, we have enhanced the qualitative aspect of feature extraction by expanding the receptive field. This expansion allows the model to grasp a broader perspective of the image, effectively incorporating a larger, more diverse set of contextual details. By doing so, the model transcends the limitations of a narrow field, gaining a more holistic understanding of the spatial and relational intricacies embedded within the image.
Furthermore, the illustration elucidates our decision to implement a sparse threshold set at 0.1. This sparse threshold plays a pivotal role in fine-tuning the feature extraction process, ensuring that the model maintains a focus on areas with significant information density. The sparse threshold serves as a guiding parameter, allowing the model to prioritize extraction from the key areas of interest—the sparse distress points—while effectively reducing the noise from the relatively monotonous background. Through the expanded receptive field and the incorporation of a sparse threshold, we have been able to enhance the qualitative feature extraction, leading to a more accurate and effective representation of pavement distress.
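As a small, hedged illustration of the sparse threshold, the snippet below masks out positions whose normalized activation falls below 0.1; the per-channel min-max normalization is an assumption made purely for the example.

```python
import torch

def apply_sparse_threshold(features: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Keep only activations above the sparse threshold; features is (B, C, H, W)."""
    flat = features.flatten(2)                                   # (B, C, H*W)
    lo = flat.min(dim=-1, keepdim=True).values.unsqueeze(-1)     # per-channel min
    hi = flat.max(dim=-1, keepdim=True).values.unsqueeze(-1)     # per-channel max
    normalized = (features - lo) / (hi - lo + 1e-6)              # scale to [0, 1]
    mask = (normalized > tau).float()                            # sparse positions
    return features * mask
```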
2.3. Domain Sparse Constrained Loss Function
Building upon the foundational concepts of computer vision and image classification, the third component of our research introduces a uniquely tailored mechanism—the Domain Sparse Constrained Loss Function. This innovative approach to handling the intricacies of pavement distress classification is borne out of the need to explicitly incorporate the sparse nature of pavement distresses into our model’s learning process.
Figure 4 portrays the distinct difference in feature space distribution between conventional image classification tasks and the specialized domain of pavement images. The left shows the feature distribution pattern of usual computer vision tasks, the middle shows the sparse feature distribution specific to the road distress domain, and the right shows our proposed loss function dedicated to optimizing the feature extraction by exploiting the distance between sparse clusters. The key objective of our function is to discern and distinguish these sparse, yet critical, features in the pavement images. The loss function, by imposing a distance-based separation, helps to accentuate the sparse features, thereby facilitating their recognition and classification.
Unlike traditional loss functions, our design takes into account the sparsity of target features in pavement image data. This tailoring to the specifics of pavement distresses ensures the optimal extraction and classification of the sparse features. In essence, we modify the original loss function by incorporating a term for $L_1$ regularization, which helps enforce sparsity in the extracted features:

$$\mathcal{L} = \mathcal{L}_{CE} + \lambda \, \mathcal{L}_{L_1}$$

where $\mathcal{L}_{CE}$ is the cross-entropy loss function typically used in classification tasks, and $\mathcal{L}_{L_1}$ is the $L_1$ regularization term. The $\lambda$ factor in this equation is a hyperparameter that controls the degree of regularization, striking a balance between the original loss and the imposed sparsity constraint ($\lambda = 1$ in this case). The $L_1$ regularization term is defined as the sum of the absolute values of the weights in our model’s dilated convolution modules, a formulation known for its property of promoting sparsity:

$$\mathcal{L}_{L_1} = \sum_{m} \left\| W_m \right\|_1$$

In this equation, $\left\| W_m \right\|_1$ represents the $L_1$ norm of the weight matrix in the convolutional layer of each dilated convolution module.
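For clarity, a minimal sketch of the combined objective is shown below; the function signature and the way the dilated convolution layers are passed in are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def domain_sparse_loss(logits, targets, dilated_convs, lam: float = 1.0):
    """Cross-entropy plus an L1 penalty on the dilated convolution weights
    (lambda = 1, as stated in the text)."""
    ce = F.cross_entropy(logits, targets)                            # classification term
    l1 = sum(conv.weight.abs().sum() for conv in dilated_convs)      # sparsity term
    return ce + lam * l1
```

In practice, dilated_convs would iterate over the convolutional layers of each dilated convolution module, so the penalty is applied exactly to the weights named in the definition above.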
Moreover, the Domain Sparse Constrained Loss Function addresses the common challenges in handling sparse data. Conventional models often fall short in dealing with sparse data due to the risk of overfitting or inadequate feature extraction. Our custom loss function mitigates this by imposing a constraint that not only aligns with the data’s natural distribution but also aids in maintaining a robust model by discouraging overfitting.
To recap, the Domain Sparse Constrained Loss Function serves a dual purpose: Firstly, it improves the quality of feature extraction from sparse data by incorporating the nature of the data directly into the model’s learning process. Secondly, it improves the overall robustness and performance of the model by mitigating the common challenges associated with handling sparse data.
3. Results
In this section, we present an empirical evaluation of our proposed model and compare its performance with six contemporary, state-of-the-art methods in image classification. The primary aim of this study was to ascertain the performance and efficiency of our model in addressing image classification tasks with a particular focus on pavement distress categories. A dataset embodying an array of pavement distress categories was meticulously analyzed as part of the evaluation process. Extensive experimentation underscored the superior performance of our sparse model in terms of classification accuracy as compared to the other methods.
Dataset. Our dataset comprises over 100,000 km of road data from various Chinese regions, such as Jiangsu, Henan, Xinjiang, and Shanxi, and features a variety of pavement conditions. Data were collected using a 3D line laser scanner mounted on a vehicle, with a 1 mm sampling accuracy and a 100 km/h acquisition speed, scanning a 4 m wide lane at a 28 kHz frequency. The raw data were converted to images, and the brightness was homogenized. Images were resized to 512 × 512 pixels for model input, as shown in Figure 5. Road engineering professionals performed pixel-level mask labeling, and domain experts confirmed accuracy. The dataset was split into 80% training, 10% validation, and 10% test sets. The Road dataset includes diverse cracks and healthy pavement, with no overlap or excess images from the same road segment, ensuring a high-resolution and robust dataset.
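For reference, a minimal preprocessing and splitting sketch consistent with the description above is given below; the dataset path, the use of torchvision’s ImageFolder, and the fixed random seed are illustrative assumptions.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),  # grayscale input, as in Figure 1
    transforms.Resize((512, 512)),                # model input size
    transforms.ToTensor(),
])

# Hypothetical directory layout with one sub-folder per distress class.
full_dataset = datasets.ImageFolder("road_dataset", transform=preprocess)

n = len(full_dataset)
n_train, n_val = int(0.8 * n), int(0.1 * n)
train_set, val_set, test_set = random_split(
    full_dataset,
    [n_train, n_val, n - n_train - n_val],        # 80% / 10% / 10%
    generator=torch.Generator().manual_seed(0),   # reproducible split
)
```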
Metrics. The $F_1$ score is a commonly used metric in machine learning, particularly in tasks that deal with binary or multi-class classification. It is a balanced measure of precision and recall, providing a single metric that encapsulates model performance in terms of both false positives and false negatives. Precision measures the correctness of the positive predictions: it is the ratio of correctly predicted positive observations to the total predicted positives. Recall measures the ability of a model to find all the relevant cases in a dataset: it is the ratio of correctly predicted positive observations to all observations in the actual positive class.

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

where True Positive (TP) denotes images correctly classified as positive, False Positive (FP) denotes images incorrectly classified as positive, and False Negative (FN) denotes positive images incorrectly classified as negative.
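The computation can be summarized by the short helper below, which derives precision, recall, and the F1 score directly from the TP, FP, and FN counts defined above; the example counts are purely illustrative.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Compute the F1 score from raw counts, following the definitions above."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# example: 92 true positives, 5 false positives, 8 false negatives
print(round(f1_score(92, 5, 8), 2))  # ~0.93
```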
Table 1 and Table 2 outline the performance metrics of the various models tested against three classes of road damage, namely, Crack, Pothole, and SealedCrack, and their average performance across these classes. In terms of average performance, our method significantly outperforms all other models with an average F1 score of 0.94, an improvement over the closest competitors, EfficientNet and REXNet, which both achieved an average F1 score of 0.92.
Our model also excelled in the identification of Crack and Pothole defects, recording top scores of 0.99 and 0.92, respectively. Cracks and potholes present unique challenges due to their infrequent occurrence and diverse characteristics, calling for a model that can effectively isolate and recognize these sparse yet critical data points. Our proposed model, designed with a focus on the unique sparsity of such features, effectively captures and identifies these sparse and intricate patterns in road images, demonstrating the adaptability and efficacy of our method. The superior performance in detecting these specific distress types underscores the model’s ability to address the critical aspect of sparsity in pavement distress detection.
These findings robustly affirm the efficacy of our proposed model in recognizing a wide array of road damage types. The superior performance of our model can be largely attributed to the innovative design of our sparse feature module and the domain-specific sparse constraint, which contribute to the robustness of the model in a variety of detection scenarios.
4. Conclusions
In this study, we have presented a novel method for pavement distress classification, designed specifically to handle the unique challenges posed by the sparse distribution of features in road images. The traditional image classification models used in computer vision often fail to capture these sparse features effectively, leading to suboptimal performance when applied to tasks such as road distress detection and classification. Our model, equipped with a specially designed sparse feature module and a sparse constrained loss function, shows superior performance in tackling this problem.
The experimental results indicate that our model outperforms other state-of-the-art models in terms of classification accuracy for various road distress types, especially for sparsely distributed features such as cracks and potholes. The improved performance can be attributed to the effective extraction and focus on these sparse yet crucial features in the image.
Additionally, the sparse constrained loss function provides more refined control over the learning process, guiding the model to focus on the sparse target areas and ignore the largely monotonous background. This results in a more accurate and effective distress classification, as demonstrated by the high average F1 score achieved by our model.
This research paves the way for further optimization of image classification models for similar tasks. Future work can extend this methodology to other sparse feature problems within the domain of road detection.
The success of our model underlines the significance of incorporating domain-specific knowledge in the design of image classification models, encouraging continued exploration of specialized models to tackle domain-specific challenges effectively. This work also holds practical implications for transportation authorities, offering a more reliable, efficient, and scalable method for road maintenance and safety assessment.
5. Future Work
While our study has achieved notable progress in the domain of pavement distress classification, there are several promising directions for future research that can extend and enhance the proposed methodology. Improving model performance through augmented data is a crucial consideration. Developing specialized data augmentation techniques that capture the nuances of real-world road conditions and distress instances could lead to enhanced model generalization. Extending the methodology to analyze distress patterns across extensive road networks could provide insights into broader maintenance strategies. Exploring ways to incorporate geospatial information and analyzing distress trends on a regional scale could guide informed decision making by transportation authorities. Researching ways to optimize the algorithm’s efficiency and enable real-time distress assessment would be valuable. In conclusion, our study lays the foundation for further research that addresses both technical intricacies and real-world applications in pavement distress classification. By exploring these potential directions, we aim to contribute to the advancement of computer vision methodologies and the effectiveness of road maintenance practices.