1. Introduction
Roads are the main and most used land transport axes, accounting for 71.8% of inland transport in the EU-27 [1]. Road maintenance is therefore of the utmost importance to the relevant management authorities. However, investment in road infrastructure maintenance has decreased significantly since 2008, resulting in the deterioration of the road network as well as additional costs, which increase exponentially as road assets deteriorate [1]. To ensure infrastructure conservation and compliance with road users' safety standards [2,3], government and local authorities must regularly measure, map, and inventory road infrastructure. However, manual approaches that disrupt traffic are time-consuming and increase the risk of accidents with material or human consequences. Road work zones remain dangerous, which led the European Union Road Federation (ERF) to launch a program in 2014 to raise awareness among governments and standardize road work zones across European countries, increasing safety during road works [4].
In response, mobile mapping systems (MMS) have gained popularity, benefiting from being safer and faster than conventional approaches, since they are dynamic remote sensing techniques that do not disturb the traffic flow. In fact, a wide range of applications can use the same acquisition: from road marking detection [5,6,7], road boundary estimation [8], and automatic retroreflectivity measurement on road signs [9,10], to road inventory [11,12]. The prospect of an automatic road inventory workflow paves the way for the digitization of roads, marking a step forward in the possibility of modeling existing roads with the emergence of building information modeling (BIM) applications. The standardization of information regarding transportation infrastructure domains is already being studied by buildingSMART to comply with industry foundation classes (IFC) standards [13]. Indeed, roads present a specific challenge, as existing road networks exhibit a heterogeneous level of detail and information.
The purpose of this work is to analyze an automated method for road inventory from a 3D point cloud acquired with a low-cost MMS. The inventory includes the road width, the number and width of individual lanes, the superelevation, the width of the pavement shoulders, and the barrier height when present. Previous works have proposed various methodologies for road inventory: Holgado-Barco et al. proposed a heuristic method based on the segmentation, classification, and extraction of road markings through intensity thresholding and clustering of discontinuous road lines [12]. An estimation of road parameters, except for barrier height, is then performed using PCA and specific thresholds. However, that method did not use the road centerline and was limited to highway case studies. Vidal et al. [14] performed segmentation and classification of barrier types on 3D point clouds using an intensity threshold to isolate road markings and fit a plane to the road. DBSCAN clustering is then used to distinguish barriers among the non-ground points. However, that work focuses on safety barriers and does not apply more complex semantic segmentation to different objects. Regarding road tilt, Gargoum et al. performed a detailed estimation of superelevation for water drainage and roadside slopes on road cross sections of an Alaskan highway [15].
These works highlight the importance of the automatic estimation of the various elements surrounding road infrastructure. Being able to inventory road assets is a crucial step toward building an accurate digital representation of roads in an automated way. In the context of road parametrization, Soilán et al. [16] proposed a method to extract the road centerline and the geometric features of the road in a format compliant with industry foundation classes (IFC) standards. To this end, the 3D point cloud is semantically segmented using a deep learning approach based on Point Transformer [17]. The point cloud is divided into four classes, including road asphalt and markings. Then, the road centerline is extracted, and a robust curvature estimate is applied. The curvature is used to classify the uniformly sampled points into three geometric classes according to the horizontal alignment of the road (i.e., straight lines, circular arcs, and clothoids), whose parameters are calculated.
Che et al. [18] developed a specific structure called the scan pattern grid as a pre-processing step for feature extraction and point cloud segmentation. Scan lines are used to reconstruct the vehicle trajectory, and points are then projected onto a 2D plane where rows correspond to a scan angle and columns represent a timestamp of the acquisition. This parametrization allows curved roads to be represented as a straight line. However, this method has only been tested on a one-way acquisition and is not directly applicable to a round-trip acquisition.
An essential step in the extraction of information in road applications is isolating the elements of interest. In this regard, semantic segmentation addresses this problem by assigning a semantic class to each point of a point cloud, allowing operations to be applied only to the desired elements. Semantic segmentation can be achieved through geometric considerations and thresholds, using algorithms such as random sample consensus (RANSAC) and region growing, or through machine learning approaches [19]. Vidal et al. [14] used a scan angle threshold to isolate the road and characterized the density and verticality of the point clouds to extract safety barriers from the road environment.
An increasingly popular approach relies on deep learning models to segment point clouds [20]. Deep learning research on semantic segmentation of point clouds took a large step forward with the release of PointNet in 2016 [21], the first model able to directly process raw point clouds without the need for additional 2D information or additional transformations. Since then, great efforts have been made to improve results on benchmark datasets such as S3DIS [17,20,21,22,23,24]. Point Transformer [17] is a modern architecture published in 2021, based on self-attention layers that use a concept analogous to queries, keys, and values to enrich the input with contextual information. This structure proved effective and increasingly popular in natural language processing tasks [25] before being successfully applied to point cloud semantic segmentation, achieving state-of-the-art results.
Deep learning has been successfully employed for the segmentation of 3D point clouds of infrastructure features. In [26], road surface objects are segmented to extract features from the road surface and road markings. However, the information is processed using 2D images resulting from the projection of the 3D data. Ma et al. [27] focus on the road pavement, developing a graph convolution network for pavement crack extraction. While the network obtains good results, its applicability is limited to a single road feature. In [28], PointNet++ is used to extract road footprints from airborne LiDAR point clouds in urban areas. The main drawback of airborne data is that road parameters or assets requiring higher data resolution cannot be extracted.
In this context, the motivation of this work is to explore the possibilities of the semantic classification of 3D point clouds to support road inventory tasks. To this end, this work proposes the following contributions:
Exploit a particular version of a deep learning model based on Point Transformer architecture for the semantic segmentation of 3D point clouds of road environments;
Use the road centerline to divide roads into cross sections and develop an algorithm to build a rectified road model from a round-trip MMS acquisition to facilitate its parametrization;
Integrate robust methods to estimate road parameters (i.e., road width, number and width of lanes, road shoulder width, and barrier height).
The remainder of this work is organized as follows. Case study data are presented in Section 2. Section 2 also describes the methodology, which consists of three main steps: (1) point cloud segmentation and road centerline extraction, (2) cross sections and construction of the rectified road model, and (3) geometric inventory of the road for each cross section. The results are presented in Section 3 and discussed in Section 4. Finally, the conclusions and future lines of work are presented in Section 5.
2. Materials and Methods
This section comprises a subsection describing the materials used for acquisition and three methodological subsections, which are outlined in Figure 1. First, a Point Transformer model is trained to perform semantic segmentation on the 3D point cloud. Second, the segmented point cloud is rectified to produce road cross sections that ease the road parameterization process. Finally, different parameters derived from the road layout (lanes, shoulders, barriers, superelevation) are extracted as the output of this methodology.
2.1. Case Study Data
This work uses data acquired with a custom, low-cost MMS in Ávila (Spain) in July 2021 on a 6 km stretch of a conventional road (AV-110, starting at its kilometric point 0) (Figure 2a). The sensor was mounted on a van at a 45° tilt, driving at approximately 80 km/h on the lane closest to the road centerline when possible. The laser scanner is a Phoenix Scout Ultra 32 (Figure 2b) equipped with a Velodyne VLP-32C, with 32 laser beams and horizontal and vertical fields of view of 360° and 40°, respectively. Its scan rate of 600,000 measurements per second (PhoenixLidar, 2021) provides dense 3D point clouds.
2.2. Point Cloud Semantic Segmentation and Road Centerline Extraction
This section focuses on the semantic segmentation of the point cloud using the Point Transformer architecture and on the extraction of the road centerline, both of which are used as input for the rest of the work [16].
Semantic segmentation: The road dataset corresponds to a round-trip MMS acquisition, which ensures equal density on both sides of the road. Due to the high density in some areas, the point cloud is subsampled with a distance criterion of 3 cm, resulting in a dataset of 103 M points. Ground truth data were obtained by manual labeling for the deep learning training. The labeled dataset consists of 3 M points for the training set and 3.5 M points for the test set after subsampling, ensuring enough representative data in both of them. The training and test sets are divided into five classes: asphalt, road markings, road signs, barriers, and other (see Figure 3). The class "other" contains all the points that do not belong to any of the other classes to be segmented. Considering how each class is defined, the dataset is unbalanced towards the classes "asphalt" and "other", which each account for 47% of the manually labeled points in the training set, as shown in Table 1.
Point Transformer is an architecture introduced by Zhao et al. [17]. Supported by this architecture, we designed a deep learning model composed of five encoders and five decoders, each consisting of a variable number of Point Transformer layers followed by a transition-down or transition-up layer, respectively, as represented in Figure 4. Points are grouped through k-nearest-neighbor pooling, reducing the cardinality of the point set by a factor of 4 at each stage of the architecture.
The model is trained for five classes: asphalt, road markings, road signs, barriers, and other. The original weights from the authors' training on S3DIS [17,20,21,22,23,24] are used as initial weights to increase the generalization capacity. The model is trained with Adam for 300 epochs; the learning rate is set to 0.001 and decayed by a factor of 0.1 every 60 epochs. The batch size is fixed to 32 samples of about 1500 points each.
To account for class imbalance, a weighted cross-entropy loss is used, with weights inversely proportional to the number of points of each class in the training set. Laser intensity is used as an additional input feature for the deep learning model. To augment the data, a random rotation around the Z-axis is applied for each batch, as well as random flipping of the positions and slight rotations of up to 15 degrees around the X- and Y-axes.
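As an illustration, a minimal PyTorch sketch of this training configuration is given below; only the hyperparameters (Adam, learning rate 0.001 decayed by 0.1 every 60 epochs, 300 epochs, inverse-frequency class weights) come from the text, while the function name and the `class_counts` argument are hypothetical.

```python
import torch
import torch.nn as nn

def build_training_setup(model: nn.Module, class_counts: torch.Tensor):
    """Loss, optimizer, and scheduler matching the configuration described above.

    class_counts holds the number of training points per class (float tensor);
    the cross-entropy weights are inversely proportional to these counts to
    compensate for the class imbalance of the dataset.
    """
    weights = class_counts.sum() / class_counts
    criterion = nn.CrossEntropyLoss(weight=weights)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Decay the learning rate by a factor of 0.1 every 60 epochs (300 epochs total).
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=60, gamma=0.1)
    return criterion, optimizer, scheduler
```

In such a setup, each batch would contain 32 samples of about 1500 points (XYZ plus intensity), and `scheduler.step()` would be called once per epoch.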
Finally, to reduce the classification noise and produce a smoother result by removing isolated regions, a conditional random field (CRF) post-processing step is added. The energy model defined by Krähenbühl and Koltun [29] is considered as follows (Equations (1) and (2)):

$$E(\mathbf{x}) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j) \quad (1)$$

with

$$\psi_p(x_i, x_j) = \mu(x_i, x_j)\left[ w^{(1)} \exp\!\left(-\frac{\lVert p_i - p_j\rVert^2}{2\theta_\alpha^2} - \frac{\lVert f_i - f_j\rVert^2}{2\theta_\beta^2}\right) + w^{(2)} \exp\!\left(-\frac{\lVert p_i - p_j\rVert^2}{2\theta_\gamma^2}\right)\right] \quad (2)$$

where $p_i$ is the position of the point $i$, $x_i$ its associated class and $f_i$ its feature, $w^{(1)}$ and $w^{(2)}$ are the weights, and $\theta_\alpha$, $\theta_\beta$, $\theta_\gamma$ are the bandwidth parameters.
The energy in Equation (1) is the sum of unary and pairwise potentials. Unary potentials correspond to the negative log probability of a point belonging to a class. The pairwise potentials are composed of a label compatibility matrix and two kernels: the smoothness kernel and the appearance kernel. Nearby points sharing similar features tend to belong to the same class, which is represented mathematically by the appearance kernel, while the smoothness kernel removes small regions in disagreement with their neighbors. The features used in the pairwise energy term were selected for their relevance to this segmentation task and their effectiveness in reducing classification noise and smoothing the segmentation output: the 3D coordinates of each point, geometric features derived from them (normal vectors), and intensity.
Unary potentials are obtained from the class scores returned by the deep learning segmentation model. Pairwise potentials make it possible to consider interactions between points and their associated classes and to penalize points classified differently from their neighborhood through the label compatibility term $\mu(x_i, x_j)$. By adding a higher energy when points are connected to points of different classes, the minimization of the energy $E(\mathbf{x})$ results in a smoother segmentation. The minimization is performed through a Python wrapper of the original code, https://github.com/lucasb-eyer/pydensecrf (accessed on 2 February 2023).
However, the hyperparameters of the pairwise energy, namely the weights $w^{(1)}$, $w^{(2)}$ and the bandwidths $\theta_\alpha$, $\theta_\beta$, $\theta_\gamma$, have to be chosen beforehand. To this end, the test set is classified by the final trained model and a grid search over the hyperparameters is performed. For each parameter combination, the increase in mean intersection-over-union (see Section 3) due to the CRF processing is measured on the test set. Experiments showed a small influence of the bandwidth parameters, although large values were found to be more beneficial, and the best-performing combination of weights and bandwidths was retained.
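A minimal sketch of this refinement step using the pydensecrf wrapper is shown below; the array layout and the way positions, normals, and intensity are scaled by the bandwidths are illustrative assumptions, as the exact kernel construction is not detailed here, and the default parameter values are placeholders to be replaced by the grid-search results.

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(softmax_scores, xyz, normals, intensity,
               w_smooth=3, w_app=3, theta_pos=1.0, theta_feat=1.0, n_iter=5):
    """Refine per-point class probabilities with a dense CRF.

    softmax_scores: (n_classes, n_points) class probabilities from the network.
    xyz, normals:   (n_points, 3) coordinates and normal vectors.
    intensity:      (n_points,) laser intensity.
    Weights and bandwidths are placeholders to be set by grid search.
    """
    n_classes, n_points = softmax_scores.shape
    d = dcrf.DenseCRF(n_points, n_classes)

    # Unary potentials: negative log of the network's class probabilities.
    d.setUnaryEnergy(unary_from_softmax(softmax_scores))

    # Smoothness kernel: positions only, removes small isolated regions.
    feats_pos = np.ascontiguousarray((xyz / theta_pos).T, dtype=np.float32)
    d.addPairwiseEnergy(feats_pos, compat=w_smooth)

    # Appearance kernel: positions, normals, and intensity, so that nearby points
    # with similar features are encouraged to share the same label.
    feats_app = np.concatenate([xyz / theta_pos,
                                normals / theta_feat,
                                intensity[:, None] / theta_feat], axis=1)
    d.addPairwiseEnergy(np.ascontiguousarray(feats_app.T, dtype=np.float32),
                        compat=w_app)

    Q = d.inference(n_iter)
    return np.argmax(np.array(Q), axis=0)      # refined label per point
```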
Road centerline extraction: The road centerline is extracted following the semantic segmentation of the point cloud. First, points labeled as asphalt and road marking are selected and processed to filter false positives as a post-processing step. Then, the road markings are used together with the trajectory obtained by the navigation system of the MMS to extract the road centerline, which separates both traffic directions. That road centerline is processed so it can be defined by continuous geometries associated with the horizontal alignment of the road (i.e., straight lines, circular arcs, and clothoids). Hence, it is possible to sample the road centerline with a fixed distance, obtaining a discrete set of points belonging to the road centerline, which are used as input data for the next processing steps of this work.
2.3. Cross Sections and Rectified Road Model
This section first describes the methodology for dividing the road point cloud into smaller portions called cross sections, and then the way they are transformed and reattached to build a rectified road model.
Cross sections: Following a similar approach to Gargoum et al. [15], the road is divided into cross sections. The points of the road centerline are separated by a distance of 1 m and defined by their planimetric coordinates $(x, y)$. Subtracting consecutive planimetric road centerline points gives the direction vector $\mathbf{v}_i$ of the road at each point $\mathbf{c}_i$. The direction vector $\mathbf{v}_i$, used as a normal vector, and the point $\mathbf{c}_i$ define the road cross-section plane $\pi_i$, visible in Figure 5. The points of the point cloud are then extracted based on a threshold distance to the cross-section plane and a threshold distance to the origin of the vector, which were experimentally chosen as 2.5 m and 20 m, respectively. The distance $d$ of a point $\mathbf{p}$ to the plane is defined as (Equation (4)):

$$d = \frac{\lvert \mathbf{v}_i \cdot (\mathbf{p} - \mathbf{c}_i) \rvert}{\lVert \mathbf{v}_i \rVert} \quad (4)$$
To facilitate the next steps in the workflow, the road is oriented such that the road axis becomes collinear with the X-axis. The angle $\alpha$ between the direction vector and the X-axis is used to define a rotation matrix $R_z(\alpha)$, which is applied to the cross section as shown in Figure 6.
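The following NumPy sketch illustrates the cross-section extraction and rotation described in this subsection, assuming a `centerline` array of planimetric points sampled every 1 m and a classified point cloud array `points`; the 2.5 m and 20 m thresholds come from the text, while the centering of each section on its centerline point is an implementation assumption.

```python
import numpy as np

def extract_cross_sections(points, centerline, d_plane=2.5, d_along=20.0):
    """Cut the point cloud into cross sections along the road centerline.

    points:     (N, 3) point cloud coordinates.
    centerline: (M, 2) planimetric centerline points sampled every 1 m.
    Returns a list of (section_points, rotated_points) per centerline point.
    """
    sections = []
    for i in range(len(centerline) - 1):
        c = centerline[i]
        v = centerline[i + 1] - c                      # planimetric direction vector
        v = v / np.linalg.norm(v)

        # Signed distance of each point to the cross-section plane (normal = v).
        d = (points[:, :2] - c) @ v
        # Lateral distance to the centerline point, bounding the section width.
        lateral = np.linalg.norm(points[:, :2] - c, axis=1)
        mask = (np.abs(d) < d_plane) & (lateral < d_along)
        section = points[mask]

        # Rotate the section so the road axis becomes collinear with the X-axis.
        alpha = np.arctan2(v[1], v[0])
        Rz = np.array([[np.cos(-alpha), -np.sin(-alpha), 0.0],
                       [np.sin(-alpha),  np.cos(-alpha), 0.0],
                       [0.0,             0.0,            1.0]])
        rotated = (section - np.append(c, 0.0)) @ Rz.T
        sections.append((section, rotated))
    return sections
```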
Rectified road model: The resulting cross sections are convenient when working individually on each one of them. However, the continuity between consecutive sections is lost. Larger cross sections give more robust line estimations but are limited by road curves. This step aims at building a rectified model of the road that allows the full length of the road to be used by removing the curves. To solve this problem, a translation aligning each cross section with the others is performed in addition to the rotation. The algorithm used in this work requires only a list of road centerline points and is presented as pseudo-code in Figure 7. A rotation centered on the origin of the associated vector is applied to each section. The section is then translated along the X-axis and positioned at the calculated distance between the vector origin and the center of the previous section. The comparison is shown in Figure 8.
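A simplified sketch of the alignment loop summarized in the pseudo-code of Figure 7 is given below; it assumes the sections have already been rotated and centered as above, and it approximates the translation step by accumulating the distance between consecutive centerline points.

```python
import numpy as np

def build_rectified_model(rotated_sections, centerline):
    """Chain rotated cross sections along the X-axis into a rectified road model.

    rotated_sections: list of (K_i, 3) arrays already rotated so the local road
                      axis is collinear with the X-axis and centered on the
                      corresponding centerline point.
    centerline:       (M, 2) planimetric centerline points.
    """
    rectified = []
    x_offset = 0.0
    for i, section in enumerate(rotated_sections):
        shifted = section.copy()
        shifted[:, 0] += x_offset                  # translate along the X-axis
        rectified.append(shifted)
        if i + 1 < len(centerline):
            # Advance by the distance between consecutive centerline points (~1 m),
            # so consecutive sections stay contiguous once curves are removed.
            x_offset += np.linalg.norm(centerline[i + 1] - centerline[i])
    return np.vstack(rectified)
```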
2.4. Road Parameters
This section focuses on computing the road parameters: road width, superelevation, lane width, number of lanes, road shoulder width, and barrier height (Figure 9).
First, the cross sections are considered individually to calculate the geometric characteristics shown in Figure 8. Thanks to the rotation computed before (Figure 6), the asphalt edges can be approximated as lines of equation $y = c$, with $c$ constant (Figure 10a). Since the asphalt classification contains false positives and false negatives (see Table 2 and Table 3), the outliers are filtered out by approximating the points with a normal distribution along a specific axis and discarding the points outside the range [mean − 2·std, mean + 2·std]. Filtering is performed on the vertical Z-axis and then on the Y-axis. Finally, the 0.01 and 0.99 percentiles of the Y distribution of the asphalt points are taken as the edges of the asphalt, $y_1$ and $y_2$, respectively, the road width being $y_2 - y_1$ (Figure 10b). The superelevation is calculated by using the mean $\bar{x}$ of the cross section and taking the asphalt points closest to the planimetric positions $(\bar{x}, y_1)$ and $(\bar{x}, y_2)$ (Equation (5)):

$$e = \frac{z_2 - z_1}{y_2 - y_1} \times 100 \quad (5)$$

where $z_1$ and $z_2$ are the heights of those two points.
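A compact NumPy sketch of this per-section estimation is given below, following the outlier filtering, percentile, and transverse-slope computation described above; the function name and the exact form of Equation (5) used here are assumptions.

```python
import numpy as np

def road_width_and_superelevation(asphalt):
    """Estimate road width and superelevation for one rotated cross section.

    asphalt: (N, 3) points labeled as asphalt, already rotated so the road axis
             is collinear with the X-axis.
    """
    # Discard outliers outside [mean - 2*std, mean + 2*std], first in Z, then in Y.
    for axis in (2, 1):
        mean, std = asphalt[:, axis].mean(), asphalt[:, axis].std()
        asphalt = asphalt[np.abs(asphalt[:, axis] - mean) < 2 * std]

    # Road edges as the 0.01 and 0.99 percentiles of the Y distribution.
    y1, y2 = np.percentile(asphalt[:, 1], [1, 99])
    width = y2 - y1

    # Superelevation: transverse slope between the asphalt points closest to the
    # two edges at the mean X of the section, expressed as a percentage.
    x_mean = asphalt[:, 0].mean()
    p1 = asphalt[np.argmin(np.hypot(asphalt[:, 0] - x_mean, asphalt[:, 1] - y1))]
    p2 = asphalt[np.argmin(np.hypot(asphalt[:, 0] - x_mean, asphalt[:, 1] - y2))]
    superelevation = 100.0 * (p2[2] - p1[2]) / (p2[1] - p1[1])
    return width, superelevation
```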
Now that the road width and superelevation are calculated, the lane-related parameters are computed: the number of lanes, the lane width, and the road shoulder width. Their estimation is a process sensitive to outliers and misclassifications, especially when sections are considered individually. To account for this sensitivity, the rectified road model is used to refine the points classified as road markings. First, road marking points outside the road boundaries (previously computed and represented by $y_1$ and $y_2$) are discarded.
Even when considered in aggregate, additional road markings such as chevrons, white diagonal stripes (see Figure 11a), and arrows can disrupt the process. Therefore, the entire rectified road marking point cloud is converted into a raster, more specifically a binary image: the points are projected onto the XY plane with a resolution of 0.1 m, and each pixel containing at least one road marking point has its value set to 1. Then, a Sobel operator [30], defined as

$$S = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}$$

is applied to all pixels of the image to highlight the horizontal edges. The pixels that do not give a response (i.e., equal to 0) are discarded, while the rest are kept. The result can be seen in Figure 11 and Figure 12. Errors resulting from the alignment process are visible in Figure 12 in the form of undulations and are discussed in more detail in the discussion section.
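The rasterization and edge-filtering step could be sketched as follows with NumPy and SciPy; the grid construction and the choice of derivative axis are assumptions consistent with keeping markings that run along the rectified road axis.

```python
import numpy as np
from scipy import ndimage

def filter_markings_raster(markings, resolution=0.1):
    """Rasterize rectified road-marking points and keep only pixels responding
    to a horizontal-edge Sobel filter, so that longitudinal markings survive
    while chevrons and transverse markings tend to be suppressed.

    markings: (N, 3) road-marking points of the rectified road model.
    """
    xy = markings[:, :2]
    origin = xy.min(axis=0)
    idx = np.floor((xy - origin) / resolution).astype(int)   # pixel index per point

    image = np.zeros(idx.max(axis=0) + 1, dtype=np.float32)
    image[idx[:, 0], idx[:, 1]] = 1.0                        # binary occupancy raster

    # Sobel derivative across the Y direction highlights edges along the road axis.
    response = ndimage.sobel(image, axis=1)

    keep = response[idx[:, 0], idx[:, 1]] != 0               # drop zero-response pixels
    return markings[keep]
```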
The next step involves the detection and clustering of road markings using a line extraction based on RANSAC [31]. RANSAC estimates the line parameters by repeatedly using a random sampling strategy: the line parameters are estimated from the subset that contains the most inliers, i.e., the largest number of points closer than an orthogonal distance threshold set to 10 cm (Figure 13).
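A basic RANSAC line estimator of this kind can be sketched as follows; the 10 cm orthogonal distance threshold comes from the text, while the number of iterations and the return format are arbitrary choices.

```python
import numpy as np

def ransac_line(points_xy, n_iter=500, threshold=0.10, rng=None):
    """Fit a single 2D line with RANSAC.

    points_xy: (N, 2) planimetric road-marking points.
    threshold: orthogonal distance (m) below which a point counts as an inlier.
    Returns (point_on_line, unit_direction, inlier_mask) of the best candidate.
    """
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(points_xy), dtype=bool)
    best_model = None
    for _ in range(n_iter):
        i, j = rng.choice(len(points_xy), size=2, replace=False)
        p, q = points_xy[i], points_xy[j]
        direction = q - p
        norm = np.linalg.norm(direction)
        if norm == 0:
            continue
        direction = direction / norm
        normal = np.array([-direction[1], direction[0]])
        # Orthogonal distance of every point to the candidate line.
        dist = np.abs((points_xy - p) @ normal)
        inliers = dist < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (p, direction)
    return best_model[0], best_model[1], best_inliers
```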
The lines found on the whole rectified road model, referred to as global lines in the following paragraphs, can then be compared to the lines found with RANSAC in each cross section individually. As central road markings can be separated by a great distance, the two neighboring cross sections are also selected for each cross section to ensure that at least two central road markings appear. To give a more precise estimation, local lines are estimated using RANSAC and then compared to the global lines: two points are randomly selected from the subset of road markings, a line equation is estimated, and the points falling within a 0.2 m threshold from the line are grouped together (see Figure 14). Only the candidate with the most inliers is retained before repeating the process.
Additional constraints based on the line slope are then added. These allow discarding lines that are diagonal or orthogonal to the road, which can occur with a noisy classification or in the presence of chevrons (Figure 14). Each global line found in the rectified road model is associated with the closest local line found in the cross section; these local lines are the ones kept for lane delimitation. The process is repeated for each cross section.
Although RANSAC can perform well in the presence of outliers, applying it to individual road sections yielded a high rate of false positives that resulted in erroneous estimates of the width and number of lanes. This is mainly due to the presence of chevrons or white diagonal stripes on the road, in addition to the classification noise. The use of the rectified road model provides a larger-scale approach more suitable for lane estimation.
The final step is to calculate the number and width of the lanes. A line of equation $x = c$, with $c$ constant, passing through the center of the road section, is intersected with the lines resulting from RANSAC, and the number of intersections is counted. The distances between consecutive intersections can then be computed, giving the number and width of each lane. Similarly, the smallest distances from a road marking to the asphalt edges $y_1$ and $y_2$ correspond to the two road shoulder widths, as represented in Figure 9.
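The lane and shoulder computation for a single cross section can be sketched as follows; it assumes that the Y coordinates of the intersections between the retained marking lines and the transverse line x = c are already available, and that the outermost markings bound the traffic lanes.

```python
import numpy as np

def lane_and_shoulder_widths(marking_ys, y1, y2):
    """Derive lane count, lane widths, and shoulder widths for one cross section.

    marking_ys: Y coordinates where the retained marking lines intersect the
                transverse line x = c through the center of the section.
    y1, y2:     Y coordinates of the asphalt edges (y1 < y2).
    """
    ys = np.sort(np.asarray(marking_ys, dtype=float))
    ys = ys[(ys > y1) & (ys < y2)]                 # keep intersections on the roadway

    # Shoulders: distance from the outermost markings to the asphalt edges.
    left_shoulder = ys[0] - y1 if len(ys) else np.nan
    right_shoulder = y2 - ys[-1] if len(ys) else np.nan

    # Lanes: gaps between consecutive marking intersections.
    lane_widths = np.diff(ys)
    return len(lane_widths), lane_widths, (left_shoulder, right_shoulder)
```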
The last part of the method focuses on the calculation of the barrier height, as shown in Figure 15. Using the asphalt points classified by the deep learning model defined in Section 2, a plane of equation $ax + by + cz + d = 0$ is fitted to the road with the least squares method. Then, using the classified barrier points, the barrier height can be computed as the perpendicular distance from the road plane to the barrier. Similar to the road markings, RANSAC is used to extract the line that best fits the upper part of the barrier and to cluster each barrier independently, since the number and height of the barriers can vary for each cross section. The 0.95 percentile value along the Z-axis of each cluster is retained as the safety barrier height.
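A minimal sketch of the plane fit and barrier height computation is given below; it assumes the barrier points of a single cluster are already isolated, and it omits the RANSAC extraction of the barrier's upper line.

```python
import numpy as np

def barrier_height(asphalt, barrier):
    """Estimate the barrier height for one cross section.

    asphalt: (N, 3) points classified as asphalt.
    barrier: (M, 3) points of a single barrier cluster.
    """
    # Least-squares fit of the road plane z = a*x + b*y + c.
    A = np.column_stack([asphalt[:, 0], asphalt[:, 1], np.ones(len(asphalt))])
    (a, b, c), *_ = np.linalg.lstsq(A, asphalt[:, 2], rcond=None)

    # Perpendicular distance from each barrier point to the road plane
    # a*x + b*y - z + c = 0.
    normal = np.array([a, b, -1.0])
    dist = np.abs(barrier @ normal + c) / np.linalg.norm(normal)

    # Retain the 0.95 percentile as the barrier height, i.e., its upper part.
    return np.percentile(dist, 95)
```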
4. Discussion
In accordance with the results obtained, there are different error sources that can be discussed here. First, the road rectification process takes as reference the extraction of the road centerline. However, this reference centerline contains errors that result in erroneous translations of the cross sections, visible as apparent undulations when displaying the rectified road model. Figure 18a shows the road alignment computed over a subset of points and the resulting road model with a local curb, visible in Figure 18b, that can affect the rest of the process.
Second, the process is highly dependent on the quality of the input semantic segmentation. Large errors propagate into the subsequent heuristic process, resulting in poor parameter estimation. While road asphalt is one of the best segmented classes and can therefore be considered more reliable than the others, this is not the case for road markings, whose segmentation strongly influences the traffic lane delineation, or for barriers (Figure 19).
Finally, the road itself may have a variety of singular structures that require special attention to be handled correctly. In the results presented in the previous section, an intersection, visible in Figure 20, consisting of a secondary road joining the main road perpendicularly and a merge lane, has implications for the road width, the road shoulder width, and the lane estimates.
In order to discuss possible improvements of this work, it is interesting to note that in most of these failure cases, there is a geometric logic that is not being fulfilled. For example, in Figure 19, it can be seen how the barrier is not correctly classified on one side of the roadway despite being practically symmetrical with respect to the barrier on the other side, which is correctly classified. Another example is that of erroneous results that may clearly contradict existing regulations regarding shoulder or lane lengths. Adding this type of logic to the classification architecture can improve the results. In this sense, works such as [34] can be of great relevance to add this geometric logic to the segmentation process, limiting these errors thanks to the prior knowledge of the domain.
It is also important to highlight some of the potential applications of the proposed method. First, the straightforward application is road infrastructure mapping: measuring the dimensions and features of the road is critical for road planning and design, as well as for maintenance and safety improvements. The results of this method may be useful to generate standardized as-is information models of the infrastructures, using formats such as IFC [35]. Second, improving the semantic segmentation accuracy and working toward real-time implementations would aid in the development of driver assistance systems, providing crucial information for the perception and decision-making modules of these systems (for example, lane detection or recognition of traffic signs). Finally, by segmenting the road environment and tracking changes over time, this method can be used to provide information about the deterioration of the road, as well as identify areas where maintenance is required. This can help reduce the cost and time associated with manual inspections and improve the safety and efficiency of the road network, especially if the method is extended to segment geometries related to slopes in mountainous areas, which is a relevant issue for road safety [36].
5. Conclusions
This work presents a novel approach for automated road inventory that addresses the challenges of determining road width, number of lanes, lane width, road shoulder width, superelevation, and barrier heights. The approach employs deep learning on 3D point cloud data acquired by a low-cost mobile mapping system (MMS). A deep learning model is designed and trained on a manually labeled subset of the dataset, and the resulting semantic segmentation of the road dataset is refined using conditional random field (CRF) post-processing to reduce classification noise. Road cross sections are extracted using direction vectors computed from the road centerline, and a rectified road model is generated to aid in lane delineation estimation. The rectified road model is rasterized and filtered with a Sobel operator to remove markings diagonal or orthogonal to the road axis, and outlier filtering and heuristic processes are used to estimate the road parameters.
The results of this workflow are compared to a ground truth manually measured by an expert on the point cloud of a 1.5 km-long subset of the road. The estimates yield positive results for road width, superelevation, and barrier heights, with median errors of 0.35 m, 0.36%, and 0.01 m, respectively, and the number of lanes is correctly inferred in 81% of the road. This demonstrates the viability of the workflow to support inventory tasks with a more automated and safer approach than the classical protocols used for road inventory.
Although there are potential sources of error that may affect the results, this methodology shows potential for further improvements, such as enhancing the quality of the input elements (semantic segmentation and road centerline extraction) and improving the robustness of the heuristic processes to errors. The approach can be refined to extend to more complex objects and rules for road maintenance and digitalization. Adding prior geometric logic to the segmentation network is proposed as an innovative line to improve the presented results. This encourages research on road inventory parameterization, in a context where MMSs and digitalization are increasingly popular. Further research could lead to segmenting more diverse and complex features, paving the way for building digital models from as-built infrastructure acquired by MMS and for performing more complete geometric assessments.