1. Introduction
Road networks require large investments for construction, renewal, and maintenance. Growing networks demand increasing investments while repair deficits continue to grow. In 2017, the repair deficit of the Finnish road network was approximately EUR 1.3 billion [1], while in the USA, it was USD 420 billion [2]. In the European Union, deficits have grown since investments dwindled after the 2008 financial crisis, though exact figures are unavailable [3]. Materially, deficits translate into pavement defects, caused by weathering, wear, and structural problems, which in turn decrease safety, disturb traffic flow, increase fuel use, and cause time delays and discomfort [4].
Hadjidemetriou et al. [5] outline four “diseases”, or distress types, for pavement defect classification, presented in Table 1. In Finnish conditions, the Finnish Transport Infrastructure Agency [6] attributes most of these defects, especially rutting and cracking, to climatic causes, specifically the use of studded tires in the winter and water freezing under the pavement. In Nordic conditions, rutting has been explored by Lampinen [7], while Belt et al. [8] have modeled the structural deterioration and predicted the future condition of roads. Automation and high-precision instrumentation can provide high-quality data on road pavement conditions, allowing for accurate estimation of maintenance needs and prioritization of different targets.
Comprehensive knowledge about road conditions is necessary for planning timely maintenance procedures [9,10]. However, many road quality surveying methods currently in use appear out of date. Rutting is often measured using laser profilometers with limited numbers of lasers, e.g., 17 [11,12,13], while cracking and potholes are identified manually or using low-resolution images and rudimentary feature extraction algorithms on limited areas [14]. Better coverage and more information on road quality allow for more effective pavement management [15]. In fact, many proposals for crack and distress detection have been made in recent years, as can be seen in multiple reviews [16,17,18,19]. In addition to detection and identification, some degree of information extraction is necessary to determine the need for maintenance procedures. While some information can be extracted from 2D images, 3D—and with time series, even 4D—data provide opportunities for more accurate assessments.
Distress detection and analysis is only one aspect of effective pavement management. In addition to pavement surface conditions, effective management requires knowledge about the foundations of the road, such as structural bearing capacity and pavement thickness [20]. While surface deformations and distress can reveal structural issues [21], dedicated tools such as ground-penetrating radar and deflectometers are important in establishing underlying structural conditions [22,23]. In some cases, embedded sensors are used to continuously monitor structural conditions [24]. Through the use of these kinds of methods, possible structural failures can be predicted [22]. While possibilities for structural evaluation beyond the visible surface exist, surface distress detection remains an essential part of a viable pavement management system [15]. Indeed, an effective approach integrates different data sources to continuously monitor a road through its lifetime, model structural responses to use, evaluate performance, and prompt maintenance [24].
The development of practically feasible and efficient—mobile, automated—distress detection methods requires accurate information about road surfaces and defects. Defects vary in size, ranging from small cracks to large potholes. At the same time, asphalt concrete surfaces are relatively coarse, and distinguishing small cracks can be challenging, depending on resolution and precision. High-resolution reference models allow the identification and quantification of both small and large defects. These reference models can then be used in the development of automated methods for defect detection, measurement, and classification. However, the production of high-resolution models can be laborious and time-consuming. This article aims to compare and contrast various methods for the production of such reference models in realistic circumstances, where conditions are nonoptimal.
Point clouds are a simple and common 3D model format, and can be produced by measurements with, for example, laser scanners and photogrammetry. Evaluating the quality of point clouds produced by different methods is fundamentally a comparative endeavor. Many different approaches have been proposed for comparing point clouds. Lehtola et al. [25] divide point cloud quality evaluation into three approaches: (1) the control point approach, where a distance between two control points is evaluated from different point clouds; (2) the subset approach, where a subset of a point cloud is extracted and evaluated, for example, by comparing the planarity of a subset representing a wall; and (3) the full point cloud approach, where point clouds are taken in their entirety and compared using an arbitrary metric. The nature of pavement defects and their measurement places the evaluation of their quality in the second—subset—approach, since it makes sense to extract, compare, and evaluate individual defects or areas of interest. Different aspects of, and approaches to, point cloud quality have been investigated by many authors, including quality metrics [26,27], subjective assessment [28,29], interactive evaluation [30], and color [31,32].
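As a concrete illustration of the subset approach, planarity can be quantified as the RMS deviation of an extracted patch from its best-fit plane. The following sketch is a hypothetical example, not part of this study's pipeline; it assumes the subset is available as an (N, 3) NumPy array of XYZ coordinates and fits the plane via singular value decomposition:

```python
import numpy as np

def planarity_rms(points: np.ndarray) -> float:
    """RMS distance of points from their best-fit plane.

    points: (N, 3) array of XYZ coordinates of an extracted subset,
    e.g. a patch of pavement assumed to be planar.
    """
    centered = points - points.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # normal of the least-squares plane through the centroid.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    residuals = centered @ normal  # signed point-to-plane distances
    return float(np.sqrt(np.mean(residuals ** 2)))
```

For a perfectly planar subset, the metric is numerically zero; for a noisy one, it approaches the noise level along the plane normal, making it a simple precision indicator for patches of intact pavement.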
Considering the geometric accuracy of a given point cloud poses a challenge, as this is typically done by comparing the cloud to another of greater accuracy, with the deviation determining the accuracy of the given cloud. Without an established reference modeling method, there is no obvious way to determine ground truth. Nonetheless, comparisons allow us to determine deviations between modeling methods and to establish any systematic failures in instruments. Additionally, the use of established and well-calibrated methods, such as terrestrial laser scanning (TLS), which can be considered accurate within its calibrated accuracy, provides a reliable baseline [33]. However, TLS data may be too sparse for accurate quantification of pavement defects. Various approaches have previously been used for pavement modeling. Inzerillo et al. [34] modeled a large pothole using both TLS and handheld photogrammetry, determining that the photogrammetric model can be of higher precision. Knyaz and Chibunichev [35] constructed a stereo camera system that uses structured light and used it to model the deformation of a paved surface, reporting a measurement accuracy of 0.1 mm and a 3D model resolution of about 0.3 mm. Puzzo et al. [36] determined the accuracies of various cameras in photogrammetric modeling of asphalt surfaces for roughness modeling, concluding that digital single-lens reflex (dSLR) cameras outperform the others.
This research examines three technologies (photogrammetry, terrestrial laser scanning, and structured-light laser scanning with handheld scanners) and five instrumentations for creating high-density and highly accurate point clouds of road surfaces for referential use in the development of automated defect detection and analysis systems. These instrumentations, described in this article as methods, are examined in realistic circumstances; that is, test plots are real road surfaces, defects are real defects, and measurements are conducted alongside traffic in natural lighting and nonoptimal weather conditions. Compromises were made in measurements, and obtained measurements are nonideal. As a result, results are nonuniversal, but provide a comparative case study of how different approaches to road surface modeling perform. For ground truth, laboratory measurements with pavement samples are conducted. A state-of-the-art TLS instrument is contrasted with two photogrammetric approaches and two handheld structured-light scanners. Based on reference measurements, photogrammetry based on high-resolution images is chosen as a reference for further evaluation of other methods. Point clouds are compared directly by utilizing cross-sections, other visualizations, and point–point distances between clouds, and indirectly by comparing volume and maximum depth of defects as measured by the different instruments.
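The direct point–point comparison between clouds can, in its simplest form, be computed as nearest-neighbor distances from one cloud to the other. The sketch below is a minimal, hypothetical illustration (not the exact pipeline of this study); it assumes both clouds are already registered in a common coordinate frame and uses brute force for clarity:

```python
import numpy as np

def cloud_to_cloud_distances(evaluated: np.ndarray,
                             reference: np.ndarray) -> np.ndarray:
    """For each point of the evaluated cloud, the Euclidean distance to its
    nearest neighbor in the reference cloud. Both inputs are (N, 3) arrays.

    Brute force for clarity; for clouds with millions of points, a spatial
    index (e.g. scipy.spatial.cKDTree) would be used instead.
    """
    diffs = evaluated[:, None, :] - reference[None, :, :]  # (Ne, Nr, 3)
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)
```

Summary statistics of these distances, such as the mean or RMS, then serve as a comparison metric between two modeling methods.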
4. Discussion
The level of detail necessary for extracting geometric information about various types of pavement distress varies greatly depending on the defect. In this study, some defects were only a few milliliters in size (Table 12), while others measured multiple liters. While larger defects can be identified in rather low-resolution data of sufficient accuracy, identification and extraction of defect properties, such as depth, width, and volume, requires reliable data of sufficient resolution. To identify and extract information about a 1 mm wide crack, sub-millimeter accuracy is necessary. Potholes, which are typically over 10 cm in diameter and multiple centimeters deep, can be robustly identified with centimeter-level accuracy. El Issaoui et al. [33] deem a 1.4 mm error level adequate for operational rut depth measurements. For reference use, sub-millimeter levels of accuracy and precision are necessary to reliably evaluate various defect types. As seen in Table 11, of the methods investigated here, only photogrammetric approaches provide this level of detail.
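To illustrate how defect properties such as maximum depth and volume might be extracted from a point cloud (a hypothetical sketch, not the exact procedure used in this study), a defect cloud can be rasterized and its deviations below the intact pavement surface integrated. The example assumes the cloud has been leveled so that the intact surface lies at z = 0:

```python
import numpy as np

def defect_depth_and_volume(points: np.ndarray, cell: float = 0.001):
    """Estimate the maximum depth (m) and volume (m^3) of a defect.

    points: (N, 3) XYZ cloud of the defect area, leveled so that the intact
            pavement surface lies at z = 0 and the defect at z < 0.
    cell:   grid cell size in meters used to rasterize the surface.
    """
    x, y, z = points.T
    ix = np.rint((x - x.min()) / cell).astype(int)
    iy = np.rint((y - y.min()) / cell).astype(int)
    # Rasterize, keeping the lowest z per cell as the defect floor.
    floor = np.full((ix.max() + 1, iy.max() + 1), np.inf)
    np.minimum.at(floor, (ix, iy), z)
    depths = -floor[np.isfinite(floor) & (floor < 0.0)]
    max_depth = float(depths.max()) if depths.size else 0.0
    volume = float(depths.sum() * cell ** 2)  # depth times cell area, summed
    return max_depth, volume
```

A sketch like this also makes plain why accuracy requirements scale with defect size: at a 1 mm cell size, sub-millimeter noise in z translates directly into volume error.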
High-resolution images that are carefully captured and processed into high-density point clouds provide 3D models of higher precision and density than is available from terrestrial or handheld structured-light laser scanning. A lower-resolution industrial camera is also capable of providing high-density models, but imprecisions are more likely to remain, especially when the imaging process is not carefully planned and executed. It is reasonable to assume that a more careful imaging process, perhaps carried out on a moving platform at a constant height and with higher overlap, would provide results with less noise. TLS modeling is reliable and straightforward, and processing is quick. However, it is insufficient for millimeter-level precision and detail, as point densities are very low relative to photogrammetry. While all investigated instrumentations achieved high precision in reference measurements (Section 3.1), photogrammetry delivered unrivaled point density, which is vital for reference use. At the same time, the use of photogrammetry is always a compromise, as choices have to be made about the level of detail and processing settings. It would, for example, be possible to image a research plot with smaller ground sampling distances and larger numbers of images, which might result in higher point densities. In this study, the chosen measurement approaches can be justified as being reasonable for the acquisition of reference data. Closer-range imaging or more delicate measurement conditions would increase the workload in ways that do not reflect realistic measurement conditions.
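Point density comparisons of this kind can be made concrete by estimating the mean nearest-neighbor spacing within a cloud. The sketch below is a hypothetical brute-force illustration (suitable only for small subsets; a spatial index would be needed for full clouds):

```python
import numpy as np

def mean_point_spacing(points: np.ndarray) -> float:
    """Mean distance from each point to its nearest neighbor within the
    same cloud ((N, 3) array). Brute force; use on small subsets only."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # exclude each point's distance to itself
    return float(np.sqrt(d2.min(axis=1)).mean())
```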
While photogrammetry seems to perform well, results from handheld structured-light scanners are less straightforward. The measurements used in this study were nonoptimal, with Artec Leo measurements being made 2 months later than other measurements, and Faro Freestyle measurements not covering all defects that were used for comparisons. Some properties of these methods are evident nonetheless.
Figure 6, Figure 7, Figure 12 and Table 15 show that the Freestyle produces very noisy point clouds, which largely disqualifies the scanner from being used to precisely map road defects for reference purposes. The Freestyle also performed poorest in reference measurements. The Leo seems quite capable of producing precise point clouds, but scanning without targets or other ground control points causes drift and bending, as seen in Figure 6 and Figure 7. While the Leo obtained the maximum depth of defects very accurately, it was rather poor at capturing their volume, perhaps because drift makes the creation of reference surfaces inaccurate. Other possible reasons for this are smoothing or hole-filling during data processing as a result of some areas being occluded or otherwise poorly scanned. Another drawback of handheld scanners is that they have limited possibilities for further development and cannot, for example, be implemented on a mobile platform. With both handheld scanners, point density varied considerably, which suggests that human factors in the measurement process—and, with the Artec Leo, the processing process—have a significant effect on this quality. Furthermore, it is unclear whether higher point density—resulting, perhaps, from slower measurements—provides higher accuracy or precision, or whether, indeed, the opposite occurs. The latter could be the case if the tracking of the scanner is noisy. Further investigation into the effects of different measurement techniques with structured-light scanners is required to assess these questions.
4.1. Reference Measurements and Systematic Error
Significantly, the largest errors in reference measurements seem to occur at the points in the targets with the deepest cracks, holes, or crevices (Section 3.1 and Figure 4). Similar results appeared across all instruments. This implies that all of the modeling methods may underestimate defect depth and volume. While the displacement shown in the figure is small, it may have consequences for modeling large, long, or deep cracks, which may, as a result, be underestimated in size and, therefore, in significance and need of repair, especially by algorithmic evaluation. In other words, this suggests that defects are larger than they appear.
4.2. Evaluating Usability and Efficiency
4.2.1. Measurement and Processing Times
Table 16 presents approximations of measurement and processing times for a single measurement instance, i.e., plot. It also reports the need for targets in measurement, which contributes to measurement times and preparation needs. These times are indicative of what to expect in reproductions, but, as will be discussed in Section 4.2, many factors affect them. Data processing was decentralized across multiple computers with different specifications, making direct comparisons difficult.
The fastest measurements were performed with the handheld Artec Leo scanner, which was able to scan a plot in approximately one or two minutes with practically no preparation. Supplementary scanning (the Artec Leo user interface allows the user to stop and resume scanning) was found to be unhelpful, likely because the simple geometry of the road surface makes it difficult to register multiple scans, compared to one continuous scan. The use of markers, which might improve scanning results with the Leo, would increase preparation time significantly. Nikon imaging took some 5–10 min for a single plot (200–250 images with two cameras). Grasshopper imaging took 1–4 min for actual imaging (500–1600 images per plot at 7 frames per second), but took some additional minutes for setup with a computer and imaging software, focusing, etc. TLS measurements require some 5 min per measurement, and in this study we measured each plot twice, giving a total of 10 min, to which setup times for target spheres, including tripod leveling and GNSS measurements, should be added. Specifying more exact scan areas would reduce this time; that is, not doing complete 360° scans, but only scanning the relevant section. The scanning time on the Faro Freestyle handheld scanner was 1–2 min, making it approximately as fast as the Leo. However, in this study, calibration signals were used to improve the geometry, which increased the total measurement time to 3–4 min, including the placement of the calibration signals, the scanning time, and the removal of the signals. Although only the damaged pavement and surrounding area were scanned with the Freestyle scanner, it would not have taken significantly longer if the entire test area had been scanned.
While the Artec Leo scanned the plot quickly, it required long processing times. The proprietary Artec Studio software that must be used functions like a black box, and it is difficult to know what processing is taking place and how much time it requires. Generally, when using nonoptimal settings, a plot was processed in about 5 h. However, using the best possible settings caused extensive processing times (days or weeks), especially in texturing. In this case, where interests are strictly in geometry, texturing could be foregone to accelerate processing, but due to the lack of georeferencing in Artec Leo data, texturing (which, in this case, translates to a colored point cloud) made manual registration with other point clouds easier and more reliable. At the same time, improvements in the point clouds themselves were hard to find, based on a few computational and visual comparisons. In stark contrast, processing times for individual test plot data from the Faro Freestyle were less than one minute each.
TLS processing required approximately two hours to process all eight plots. Photogrammetric processing times depend on the software and hardware being used, the number of images, image resolutions, and choices of algorithms for interest point detection, among other factors. A reasonable estimate is that processing a set of images from one research plot into a high-density point cloud takes approximately a day. There are, however, many ways to improve processing times for photogrammetry, such as limiting the number of images, using optimized imaging patterns, selecting interest point detection and other algorithms based on the target, and limiting the examined area. All measurement methods require substantial processing time, and in all examined cases, this means hours of passive work. The practical result is that most processing is run overnight, and differences become less consequential. In addition, the time cost is of less importance in the case of reference modeling compared to deployed methods.
4.2.2. Other Considerations
Besides the time cost, other factors influence usability and efficiency as well. All discussed methods require a level of sophistication in instrumentation, though the cameras are the most affordable. Collecting images with a dSLR camera is straightforward, though photogrammetric applications require some well-known considerations in imaging (see, for example, [51,52]). Imaging using the Grasshopper industrial camera as described in Section 2.2.2 is a nonstandard and somewhat complicated approach, and the study of this camera should be considered a trial for a more systematic approach or a rig in which several such cameras are installed. As presented, the GH imaging method was unstable and prone to nonoptimal imaging, though it performed well on average. It also results in a deluge of images, most of which do not provide additional information. TLS measurement is a straightforward and well-established surveying method. It requires the use of targets for cloud alignment, but this cost is significant only insofar as it requires more time. The Artec Leo is easy to use and fast to operate, but quality control during operation is challenging, as quite little information is available to the user. Though the instrument notifies the user if tracking is lost, some errors in registration may be revealed only in postprocessing, which is impractical to do onsite. Using targets or signals would make measurement more cumbersome. In processing, the proprietary Artec Studio software is nonideal for research purposes, as the workings of its different procedures and algorithms are not clear. Faro Freestyle measurements are quick to produce and process, but seem to be too imprecise and noisy for reliable use in reference modeling. In other use cases, its speed may be a decisive advantage compared to the other instruments discussed here, especially if steps are taken to denoise the resulting point clouds. The use of markers is not strictly necessary, but experience suggests it helps prevent drift and distortions.
4.3. Future Research
As measurement and modeling technologies continue to develop and improve, further research can provide insight into how new instruments and methodologies compare to existing and established ones. The objective of this study was to determine which methods are sufficiently accurate and efficient for use as references in the study of more versatile pavement defect detection and analysis methods. As such, future research will focus on these methods, which are likely to be mobile and autonomous to varying degrees. Reference measurements remain necessary for benchmarking and evaluation. As research and development are focused mainly on mobile mapping of road defects, reference modeling can be mobilized as well, for example by constructing rigs with cameras capable of imaging the entire width of a lane at high resolution. Such rigs can facilitate processing, since initial camera positions can be pre-estimated quite accurately. In addition, they can provide constant quality across different plots, as images are not being taken by hand, and reduce the amount of tedious manual measurement labor in general.
5. Conclusions
As the development of increasingly sophisticated defect detection and analysis techniques continues, evaluating these techniques is a necessary component of their development. This research contributes a case study of various state-of-the-art methods for producing close-range, static, high-resolution, three-dimensional reference measurements of pavement defects, investigating three technologies in five instrumentations in real measurement conditions. Such reference measurements can be used to provide ground truth for automated defect detection methods and for sensors of lower accuracy, precision, or density. The study finds that carefully measured, high-resolution photogrammetric point clouds are the most reliable, detailed, and precise without losing accuracy, providing mean distances between points of down to 0.04 mm and mean accuracies and precisions of 0.2 mm. Industrial camera photogrammetry can provide similar density, accuracy, and precision, but due to the imaging method deployed here, can retain distinct erroneous areas. Terrestrial laser scanning is likewise accurate, but much less dense. Of the investigated handheld structured-light scanners, the Artec Leo provides high accuracy and precision, but only on a small scale, and has much lower point density (1–10 mm between points on average), while the Faro Freestyle creates quite noisy point clouds (almost 1 mm errors in reference measurements) with only slightly better densities than the Leo. While the photogrammetric approaches were superior in density, accuracy, and precision, other factors, such as measurement and processing time costs, may favor the other approaches. TLS holds the middle ground between accuracy and precision on the one hand and efficiency on the other, while the handheld scanners provide quick measurements and, specifically in the case of the Faro Freestyle, quick processing.
Due to the requirements for reference measurements in developing and evaluating defect detection methods, photogrammetric or TLS approaches provide the most reliable reference datasets, with the choice depending mainly on required level of detail.