Recent Developments on Drivable Area Estimation: A Survey and a Functional Analysis
Abstract
1. Introduction
- An analysis of the state of the art of the last eight years in drivable area estimation
- An architecture breakdown and a taxonomy that accommodate both learning-based and non-learning-based algorithms
- A study of the existing relevant datasets used to assess the performance of these algorithms, and of the influence of modern sensing technologies on that performance
- A proposal for future research directions for the field
1.1. Scope
1.2. Related Surveys
2. Drivable Area Perception
2.1. RGB
2.2. 3D LiDAR
2.3. Open Source Maps
3. Datasets
- Waymo Perception, part of the Waymo Open Dataset, is one of the most significant sensor datasets in terms of the sheer quantity and variety of annotation types. On top of its sensor suite, it offers labels for bounding boxes, key points, 2D panoptic segmentation and 3D semantic segmentation. It shows urban and residential scenes with diverse weather in both daytime and nighttime environments. It was updated in 2023 with an HD map, and it also includes a Python devkit to streamline development.
- Argoverse 2 builds its value proposition on the long range (200 m) of its LiDAR and annotations, the six different US cities in which its data were collected and the number of different labels it offers (30). It also offers a vector map with lane-level geometry and a ground height raster map to ease the filtering of ground LiDAR returns.
- nuScenes is a dataset that implements a full sensor array with RGB cameras spanning a 360° view, a 32-layer 3D LiDAR and a RADAR array. It includes a high number of annotations in inner-city traffic scenes with changing weather and heavy traffic. It has a leaderboard that differentiates between LiDAR only and any sensor modality for semantic segmentation and panoptic segmentation.
- CityScapes is a Stereo RGB dataset of complex urban scenes that offers fine and coarse semantic annotations. It depicts busy inner-city driving with high levels of traffic and pedestrian interaction and has separate annotations for road and sidewalk.
- KITTI is one of the oldest and most popular autonomous driving perception datasets available. The specific subset for drivable area estimation is named KITTI ROAD, and it consists of a non-sequential set of images and pointclouds taken in low-traffic suburban environments, with and without lane markings, on single- and multi-lane roads. It has become the standard benchmark for the task, and one of its main strengths is the dedicated leaderboard, which acts as a powerful surveying tool of the current state of the art. Its main drawbacks are its lack of weather and illumination diversity and the relative sparsity of its traffic scenes.
- Semantic KITTI emerged as a complementary dataset thanks to KITTI’s popularity, and it expands on its features by adding semantic annotations for all sequences of the odometry subset.
- BDD100K is an extensive RGB-only dataset that offers 100,000 sequential images of driving in four different American cities. The dataset offers lane marking and drivable area annotations, making a distinction between directly drivable and alternatively drivable areas: (i) the directly drivable tag marks road areas on which the ego-vehicle is currently driving and has priority, and (ii) the alternatively drivable tag marks road areas on which the ego-vehicle is not currently driving but which it could reach through a lane change. This dataset also offers a dedicated leaderboard based on yearly competitions.
- KITTI-360 calls itself the successor of KITTI and expands on the latter by offering richer sensor modalities, semantic instance annotations and more accurate localization in suburban scenes with moderate traffic. It extends the sensor suite of the original KITTI by adding an additional LiDAR, a pair of front-facing RGB cameras to produce disparity maps and two lateral fisheye cameras to complete its 360° scene perception. The main value proposition of this dataset is that it is the only segmentation-oriented dataset with all sensor modalities simultaneously available.
- DIODE is a combined indoor/outdoor dataset that offers RGB, depth and normal information using the same sensing and imaging setup. It achieves this by using the FARO Focus S350 sensor, which is an actuated phase-shift laser scanner that creates RGB and depth scenes with very high accuracy, resolution and FOV. The downside of using this dataset for drivable area estimation is that it has no sequential scenes as all of them are static due to the nature of the sensor.
- 3DHD CityScenes is a dataset that combines high-definition maps with high-density synchronized and georeferenced pointclouds taken by a high-end spatial imaging sensor.
- OpenLane V2 is a dataset built on top of nuScenes and Argoverse 2 that focuses on scene structure perception and reasoning by offering a dynamic map that takes into account traffic elements such as ground markings and traffic lights and signals. One of its highlights is the fact that they offer 3D-annotated lanes in their map, as opposed to the 2D lanes present in most datasets.
- Online HD Map Construction Benchmark offers a set of vectorized and rasterized maps from camera images that is built on top of the nuScenes dataset.
4. Drivable Area Estimation
4.1. Architecture
- Noise Removal is the process of identifying and correcting noisy sensor data with the purpose of improving algorithmic or computational performance. Two types of noise are identified: static and motion noise. Static noise encompasses data that are irrelevant or harmful to the algorithm, such as outliers or out-of-range information, and is inherent to the sensor. The removal process for this type of noise can range from simple and fast thresholding, such as min–maxing points that exceed a set height value, to more complex and costly techniques, such as plane fitting through least-squares or RANSAC. Motion noise causes deformations in information relevant to the algorithm and is created by the vehicle’s movement. Example correction processes for this issue are pointcloud deskewing or attitude alignment. A minimal noise-removal sketch is given after this list.
- Modal Transformation: As a step prior to fusion, or in order to apply cross-field techniques, a modal transformation can be performed on sensor data. Modal transformations usually affect data dimensionality by means of projecting (e.g., 3D LiDAR pointclouds onto image coordinates) or deriving (e.g., creating a map of normals from an image or a pointcloud). The first sketch after this list includes an example projection.
- Fusion is the process of combining data coming from different sensors or stages to reach an accuracy beyond what each would achieve in isolation. It is a stage that can be performed at different times during an algorithm’s execution, as it can be applied to sensor information or to already generated features. Common fusion techniques are grid maps and Kalman filtering. Fusion inputs can be diverse: raw sensor data, processed sensor data, low-level features and high-level features. A toy grid-fusion sketch is shown after this list.
- Feature Extraction: The translation from high-dimensional rich sensor data into basic features takes place in feature extraction. Features useful for drivable area estimation can be low level, such as image color or texture, or high level, such as detected lane lines in RGB images or height differentials extracted from pointclouds.
- Feature Expansion formulates hypotheses of the drivable area and fits them to the available extracted features. It generates drivable area proposals and uncertainty estimations. Applicable techniques in this stage are image upscaling, graph search or model fitting.
- Tracking: Using an estimation of vehicle displacement and geospatial information, this stage matches the produced estimations through time in order to improve the output. Tracking can be achieved by applying Bayesian fusion or Kalman filtering.
- Neural Networks: deep learning technologies have proven useful as a backend in any of the drivable area estimation stages. Neural networks can be trained to perform any stage in isolation, to perform several tasks at once or even to run the complete algorithm from start to finish. Network training can be performed at different points in the algorithm’s timeline while taking a variety of different inputs (e.g., raw sensor data, data after modal transformation, data after fusion, features, etc.).
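To make the noise removal and modal transformation stages concrete, the following minimal sketch (Python/NumPy) drops static noise by height and range thresholding and then projects the surviving LiDAR points onto image coordinates. The height limits, the intrinsic matrix K and the extrinsic transform T are illustrative assumptions, not values from any surveyed work.

```python
import numpy as np

def remove_static_noise(points, z_min=-3.0, z_max=1.0, max_range=80.0):
    """Drop out-of-range returns and height outliers (static noise).
    points: (N, 3) array of x, y, z in the sensor frame."""
    rng = np.linalg.norm(points[:, :2], axis=1)
    keep = (points[:, 2] > z_min) & (points[:, 2] < z_max) & (rng < max_range)
    return points[keep]

def project_to_image(points, K, T_cam_lidar):
    """Modal transformation: 3D LiDAR points -> 2D pixel coordinates.
    K: (3, 3) camera intrinsics; T_cam_lidar: (4, 4) LiDAR-to-camera transform."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coords
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0]       # keep points in front of the camera
    uv = (K @ pts_cam.T).T
    return uv[:, :2] / uv[:, 2:3]              # perspective divide

# Illustrative usage with random points and assumed calibration values.
cloud = np.random.uniform([-50, -50, -5], [50, 50, 5], size=(10000, 3))
K = np.array([[720.0, 0.0, 620.0], [0.0, 720.0, 187.0], [0.0, 0.0, 1.0]])
T = np.eye(4)
pixels = project_to_image(remove_static_noise(cloud), K, T)
```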
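Fusion and multi-frame tracking can likewise be illustrated with a toy log-odds occupancy grid that accumulates evidence over consecutive, already ego-motion-compensated observations. The grid size, resolution and inverse sensor model probabilities below are arbitrary assumptions made for the sketch.

```python
import numpy as np

class LogOddsGrid:
    """Toy 2D occupancy grid fused over time with a log-odds Bayesian update."""
    def __init__(self, size=200, resolution=0.5, p_hit=0.7, p_miss=0.4):
        self.log_odds = np.zeros((size, size))
        self.res = resolution
        self.l_hit = np.log(p_hit / (1 - p_hit))     # evidence for "occupied"
        self.l_miss = np.log(p_miss / (1 - p_miss))  # evidence for "free"

    def update(self, hits, misses):
        """hits/misses: (N, 2) metric x, y coordinates already in the grid frame."""
        for xy, delta in ((hits, self.l_hit), (misses, self.l_miss)):
            idx = (xy / self.res).astype(int) + self.log_odds.shape[0] // 2
            ok = np.all((idx >= 0) & (idx < self.log_odds.shape[0]), axis=1)
            np.add.at(self.log_odds, (idx[ok, 0], idx[ok, 1]), delta)

    def probability(self):
        return 1.0 / (1.0 + np.exp(-self.log_odds))

grid = LogOddsGrid()
for _ in range(10):  # fuse ten consecutive frames
    obstacles = np.random.uniform(-40, 40, size=(500, 2))
    free = np.random.uniform(-40, 40, size=(2000, 2))
    grid.update(obstacles, free)
drivable_candidates = grid.probability() < 0.2  # low occupancy -> free space
```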
4.2. Taxonomy
- Constraints. Drivable area estimation deals with the detection of man-made, industrially designed artifacts such as roads, curbs and sidewalks. As such, it is useful to limit the search space using carefully chosen constraints. Symmetry constraints are commonly used both for finding features and for fitting models, as most roads are defined by left and right boundaries that are also parallel. Finding the twin equivalent of a feature or constraining a model to remove outliers is a sound geometric foundation for road detection. Smoothness constraints exploit the fact that roads are made for rigid, ground-bound vehicles weighing several hundred kilograms that cannot make abrupt direction changes and, therefore, require gradual evolution to be traversable. Smoothness constraints help find drivable area candidates and predict their evolution. Continuity constraints are useful to find matches between consecutive sensor reads since drivable areas do not usually end or change abruptly. Fixed size constraints rely on the fact that most roads maintain a fixed width that can be bound between a maximum and a minimum value, which is useful to weed out outliers and to find equivalents when the symmetry constraint is also applied. Flatness constraints are useful because surfaces need to be locally flat underneath a vehicle; otherwise, at least one of the wheels would not be in contact with the ground.
- Features are acquired by processing raw data coming from the sensors and creating data points that host new information. Color features use pixel properties in RGB images to differentiate the drivable area from other vehicles, pedestrians or non-drivable surfaces. They are sensitive to lighting changes, adverse weather or similarly colored bodies that do not belong to the same category. Edge features try to identify the drivable area by detecting hard gradients in color, intensity or geometry, as those are commonly found where the road meets sidewalks or ditches. Texture features extract information from how color is arranged spatially in an RGB image and help produce drivable area candidates on the assumption that texture patterns remain consistent within the same road objects. Normal features are produced by calculating the normal vector of a group of geometric points and clustering them based on their angle. Usually, road normals point in the same direction or change gradually. Reflectivity features can be acquired using modern LiDAR sensors and give information about how a surface reflects or absorbs light. They are useful to detect road markings and signs, as these are usually highly reflective. They could even be used to detect the road itself as a poorly reflective area if the sensor is sensitive enough.
- Modeling fits the sensor data to mathematical models that define lines or planes. Straight lines are a common model that is computationally cheap and can be applied directly to simple roads or, as a collection of segments, to model complex roads. Splines are smooth piecewise functions defined by polynomials. They have variable complexity depending on their order and, therefore, are useful to model curves. Bezier curves are smooth global functions defined by polynomials. They are attractive as a model because they offer curvature continuity at every point and are inexpensive to compute in their closed-form expressions. Polynomials of the second or third degree are useful to model curves under some constraints (e.g., flatness) as they are simple to understand and compute. Their drawbacks are that modifying any point affects the complete curve and that undesirable effects can appear at the boundaries. Planes are especially useful in LiDAR data processing, as they are a very common feature in LiDAR pointclouds, appearing in roads, sidewalks and buildings. All of these model proposals need to be fitted to the data using a mathematical approach. A common method to estimate the parameters of a model is random sample consensus (RANSAC), which consists of iteratively checking the fitness of a random sample of the data against a previously set model (a bare-bones sketch is given after this list). It is useful to separate inliers from outliers but is a non-deterministic method, which means that its accuracy depends on the number of iterations it has run. Another technique is principal component analysis (PCA), a method for dimensionality reduction that tries to keep most of the information in the dataset. It linearly transforms the data into a new coordinate system where the majority of their variance can be described using fewer dimensions than the original data. The challenge in applying PCA in this field is usually selecting the data subset to which the model needs to be fitted. Finally, least median squares (LMS) tries to minimize the squared differences between observed data points and the fitted values provided by a model. It is a simple method to apply and understand but is sensitive to outliers and can overfit the model.
- Representation methods offer alternative representations for sensor and feature information that reduce dimensionality and ease the computational load. They are also widely used for data and feature fusion. Occupancy Grids represent the world with evenly spaced cells that host a binary variable (occupied or free) that is estimated and assigned a probability. These grids are used to make drivable area assumptions over the free area. Elevation Maps model the world by keeping height information that can be used to derive the drivable area by detecting abrupt height gradients. Polar Grids host the information in polar coordinates instead of Cartesian coordinates. They usually result in smaller and, therefore, faster-to-compute grids, have higher resolution near the vehicle and are adequate representations of roads because both grow radially from the vehicle (a short polar grid sketch follows this list). Triangle Grids are common world representations in fields such as videogames because they can accurately model height and complex shapes. Their main characteristics are planarity, meaning that the three vertices of a triangle can be at different heights and still define a single plane, and simplicity, since triangles have the lowest number of vertices of any polygon.
- Propagation techniques create relational systems between data points to mitigate the issue of data gaps caused by occlusions or by great distances from the sensor. Markov random field is an undirected and cyclic graph technique that sets each data point as a node and tries to assign labels and their probabilities to the nodes by performing inference on the graph, for example through belief propagation. Conditional random field is a special case of Markov random field in which the graph takes into account the influence between neighboring nodes by modeling the dependencies between them. Bayesian generalized kernel inference propagates information by assuming continuity between adjacent data points and inferring the missing information by applying a kernel to the neighboring observed points (a simplified sketch follows this list). Dempster–Shafer theory is a general mathematical framework to reason with uncertainty and deal with information gaps. It is designed to work with sets of different labels, combining evidence from different sources and producing a degree of belief.
- Learning-Based methods are ubiquitous due to their flexibility, which allows them to morph into several different techniques. They can take sensor data, features or grid representations as inputs and can be used to perform model fitting, feature extraction or to generate drivable area candidates directly. Their main drawback is the need for large annotated datasets for training. Convolutional Neural Networks are very popular learning-based approaches in the field. They are usually applied directly to RGB images of the road or to images derived from LiDAR pointclouds. They work by training image kernels that learn to identify specific features in an image and then passing those features to a fully-connected layer that recognizes larger elements in the scene. Residual Neural Networks are a special case of convolutional neural networks that tackle the vanishing gradient problem occurring in deep networks. They work by adding skip connections that bypass blocks of layers, letting gradients flow around activation functions that would otherwise shrink their derivatives. This makes deeper networks feasible and more efficient to train (a minimal residual block sketch follows this list).
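The following bare-bones RANSAC plane fit (Python/NumPy) illustrates the model fitting loop discussed above; the iteration count and inlier threshold are arbitrary assumptions, and the same loop structure applies to fitting lines or polynomials.

```python
import numpy as np

def ransac_plane(points, n_iters=200, threshold=0.1, seed=0):
    """Fit a plane n·x + d = 0 to (N, 3) points; returns (normal, d, inlier mask)."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_model = (np.array([0.0, 0.0, 1.0]), 0.0)
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                 # degenerate (collinear) sample, retry
            continue
        normal /= norm
        d = -normal @ sample[0]
        inliers = np.abs(points @ normal + d) < threshold  # point-to-plane distance
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (normal, d)
    return best_model[0], best_model[1], best_inliers

# Illustrative usage: a noisy ground plane plus scattered obstacle points.
rng = np.random.default_rng(1)
ground = np.c_[rng.uniform(-20, 20, (3000, 2)), rng.normal(0.0, 0.03, 3000)]
clutter = rng.uniform([-20, -20, 0.2], [20, 20, 3.0], (500, 3))
normal, d, mask = ransac_plane(np.vstack([ground, clutter]))
```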
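A polar grid representation can be built in a few lines. This sketch bins LiDAR returns by range and azimuth and keeps a per-cell minimum height; the bin counts and maximum range are assumptions chosen for illustration.

```python
import numpy as np

def polar_min_height_grid(points, n_rings=40, n_sectors=180, max_range=80.0):
    """Bin (N, 3) points into a (rings, sectors) polar grid of per-cell min height."""
    rng = np.linalg.norm(points[:, :2], axis=1)
    azimuth = np.arctan2(points[:, 1], points[:, 0])     # in [-pi, pi)
    ring = np.minimum((rng / max_range * n_rings).astype(int), n_rings - 1)
    sector = np.minimum(((azimuth + np.pi) / (2 * np.pi) * n_sectors).astype(int),
                        n_sectors - 1)
    grid = np.full((n_rings, n_sectors), np.inf)
    np.minimum.at(grid, (ring, sector), points[:, 2])    # unbuffered per-cell min
    return grid

cloud = np.random.uniform([-80, -80, -2], [80, 80, 2], size=(20000, 3))
height_grid = polar_min_height_grid(cloud)  # inf marks cells without returns
```

Note how the fixed ring count gives the grid its characteristic higher resolution near the vehicle: each ring covers the same range interval but a smaller area close to the origin.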
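For propagation, the sketch below fills unobserved cells of a sparse height grid through kernel-weighted inference from observed neighbors, in the spirit of Bayesian generalized kernel inference. The Gaussian kernel, its bandwidth and the search radius are simplifying assumptions; the full method also propagates uncertainty.

```python
import numpy as np

def kernel_fill(height, observed, radius=3, length_scale=2.0):
    """Infer missing cells of a 2D height grid from observed neighbors.
    height: (H, W) float grid; observed: (H, W) bool mask of valid cells."""
    H, W = height.shape
    filled = height.copy()
    offsets = [(di, dj) for di in range(-radius, radius + 1)
                        for dj in range(-radius, radius + 1)]
    for i, j in zip(*np.nonzero(~observed)):
        weights, values = [], []
        for di, dj in offsets:
            ni, nj = i + di, j + dj
            if 0 <= ni < H and 0 <= nj < W and observed[ni, nj]:
                w = np.exp(-(di * di + dj * dj) / (2 * length_scale ** 2))
                weights.append(w)
                values.append(height[ni, nj])
        if weights:  # leave the cell empty if no neighbor was observed
            filled[i, j] = np.average(values, weights=weights)
    return filled

# Illustrative usage: only 30% of the cells are observed.
rng = np.random.default_rng(0)
sparse = rng.normal(0.0, 0.05, (60, 60))
mask = rng.random((60, 60)) < 0.3
dense = kernel_fill(np.where(mask, sparse, 0.0), mask)
```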
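Finally, a residual connection is simple to express. The PyTorch sketch below shows a generic basic block of the kind used in residual networks; the channel count is a placeholder, and the block is not the architecture of any specific surveyed method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicResidualBlock(nn.Module):
    """Two 3x3 convolutions with an identity skip connection that bypasses them."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # the skip keeps gradients flowing in deep stacks

block = BasicResidualBlock(64)
features = block(torch.randn(1, 64, 128, 128))  # shape preserved: (1, 64, 128, 128)
```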
4.3. Algorithms
- BGK [25] first performs coarse ground segmentation through a min–max height difference approach and projects it onto a grid. Then, it estimates the missing height information in the grid through Bayesian generalized kernel inference. From the generated dense height grid, it estimates the normal vectors and computes their angle differences to produce navigable candidates through angle thresholding (a generic sketch of this normal-angle step is given after this list). Afterwards, it applies bilateral filtering to preserve edges, and finally, it applies multi-frame tracking on consecutive grids.
- SNE-RoadSeg [26] estimates surface normals from the depth information coming from an RGB-D sensor using a surface normal estimation (SNE) module. Then, it feeds the estimated surface normal data and the raw RGB to a convolutional neural network (CNN) with a parallel encoder–decoder architecture with densely-connected skip connections.
- PLARD [27] performs an altitude difference-based transformation (ADT) to produce an altitude difference image from the LiDAR data. Then, it inputs the RGB data plus the produced altitude difference image into a CNN that fuses the generated features through feature space adaptation (FSA) in its intermediate stages to finally produce a road estimation candidate.
- USNet [28] performs surface normal estimation from the depth information coming from the RGB-D sensor. It then feeds the estimated data and the raw RGB data into two separate CNNs that generate feature maps. Finally, those feature maps are fed to a module called multi-scale evidence collection (MEC) that generates separate prediction and uncertainty maps.
- Unsupervised RD [29] fuses RGB and LiDAR data to create a superpixel data structure. Then, it performs Delaunay triangulation to assign spatial surfaces to the superpixels. From the estimated surfaces, it computes their normals and uses them to estimate obstacle points through flatness edge thresholding. Then, it creates a ray map by casting rays from the origin of the sensor to the detected obstacles, which produces a rough road estimation. Finally, it uses Markov random fields (MRF) and belief propagation to perform feature fusion with the objective of increased robustness.
- Map-Supervised RD [9] generates training annotations using OSM and geo-referenced images from the KITTI dataset. It then refines the generated labels by clustering pixels with similar color. Finally, it uses the automatically generated labeled data to train a CNN.
- RBANet [30] uses SegNet [50] to create residual feature maps and then applies reversed attention (RA) and boundary attention (BA) units to them to generate road estimations.
- HID-LS [31] fuses RGB and LiDAR data to create superpixels. Then, it performs Delaunay triangulation to interpolate the missing gaps in the 3D information. On the generated data, it computes inverse depth and height maps. It then uses the inverse depth map to compute horizontal and vertical histograms and derive a first road region estimation. Using row and column scanning in the height map, it derives a second road region estimation. Finally, it fuses both estimations to produce a final road candidate.
- Curb Detection [32] first applies RANSAC to extract the ground plane with sidewalk, curb and road points. Then, it performs sliding-beam segmentation to divide the extracted ground areas into regions of interest. Finally, it extracts features through angle thresholding to detect curbs.
- LiDAR-Histogram [33] creates LiDAR imagery by displaying the points in a 2D plane organized by pitch and rotation angle. Then, it computes histograms from the LiDAR imagery. Afterwards, it performs RANSAC line fitting on the histograms to estimate road lines. Finally, it segments the pointcloud based on the estimated road lines.
- CyberMELD [10] computes vertical and horizontal slope maps and sums them to grow and detect a general road region of interest (ROI). It then performs Delaunay triangulation to map the ROI 3D points onto the RGB image. On the generated data, it applies inverse perspective mapping (IPM) and finds lane lines using gradients. Finally, it fuses the lane lines with OSM to generate an ego-lane estimation.
- RoadNet3 [34] uses cascaded CNNs to reduce the resolution of the feature map, followed by a long short-term memory (LSTM) network to find the contour of the drivable area. It applies the same architecture in parallel to the full image and to a smaller region in the center of the image and fuses them to produce the road estimation.
- Double Projection [35] creates a 2.5D elevation map by assigning sets of points falling under the same cell through a min–maxing process. It also creates a 2.5D range map by projecting LiDAR points onto a virtual cylinder plane. Then, the elevation map is used to detect obstacles through thresholding, and a reachable area is estimated through forward flood fill. The range map produces a road area using a smoothness constraint. Finally, the reachable area and the road area are fused through Bayesian decision theory to extract the drivable area.
- Pseudo-LiDAR [36] uses the big-to-small [51] neural network to estimate depth from RGB. It then transforms depth into attitude space to generate a Pseudo-LiDAR [52] pointcloud. Finally, it applies a residual neural network (RNN) to process features in parallel for RGB and Pseudo-LiDAR, fusing them at different points of the pipeline to obtain the final result.
- CLCFNet [37] transforms a LiDAR pointcloud into LiDAR imagery. Then, it performs a perspective transformation to put the LiDAR imagery in the camera view. Next, it inputs the LiDAR imagery, LiDAR pointcloud and RGB into three cascaded CNNs to extract and fuse the road features. It works with LiDAR only or with both LiDAR and RGB data depending on lighting conditions.
- Multi-Cue [38] computes normal vectors on the disparity maps from stereo images. It then finds boundary areas of interest from highly diverging normals. Finally, it performs curve fitting on the boundary pixels using support vector regression (SVR) [53].
- TRAVEL [39] first corrects pointcloud skew and attitude caused by ego-motion. Then, it models the terrain by grouping subsets of pointcloud points into tri-grid field nodes. Afterwards, it uses breadth-first traversable graph search to classify traversable nodes by measuring acceptable concavity and convexity. Finally, it applies model fitting to match the traversable nodes to pointcloud points, assigning them a label.
- Road Markings [40] implements coarse ground segmentation through RANSAC plane fitting, along with region-growing clustering to weed out points belonging to the curb. Then, it applies adaptive thresholding based on Otsu’s method [54] to the reflectivity information coming from the sensor. Finally, it produces lane boundary proposals by line model fitting.
- Line Fitting [41] first implements coarse ground segmentation through the channel-based clustering of points in a 2.5D polar grid map, along with height thresholding. Then, it identifies boundary points by checking the angle, distance and height difference between adjacent points. Finally, it performs B-spline curve fitting to produce the road boundary candidate.
- YOLOP [42] uses CSPDarknet [55] as a backbone to extract feature maps. Then, it implements spatial pyramid pooling (SPP) [56] to fuse features of different scales and a feature pyramid network (FPN) [57] to fuse features of different semantic levels. Finally, it uses an upsampling process to restore the original size of the image from the feature map and generates two separate segmented images of the drivable area and the lane lines.
- HybridNets [43] is a neural network that uses EfficientNet-B3 [58] as a backbone to extract feature maps and an FPN as a neck to fuse features across feature maps with different resolutions. Then, it upsamples the feature maps up to half the original resolution and feeds them to its proprietary segmentation head to produce the final multi-label segmented image.
- Rangenet++ [44] performs a spherical projection of the LiDAR pointcloud into a range image and then feeds it into a fully convolutional neural network with an encoder–decoder hourglass-shaped architecture. Finally, it performs a k-nearest neighbor (kNN) search to reduce noise and shadow-like artifacts in the produced multi-label output.
- Urban Road Filter [46] tries to detect the road by applying three different techniques to LiDAR pointclouds. The techniques differ in how they divide the pointcloud: by channels, by beams and by sliding windows. It then applies different heuristics to the clusters to detect anomalies in height or angle differences and produce curb candidates. Finally, those curb candidates are used to produce a drivable area polygon.
- Evidential Grids [47] use the Dempster–Shafer theory to efficiently fuse information sources that provide partial information about the environment. The work offers a framework to fuse occupancy grids created from LiDAR information with meta-knowledge obtained from a high-fidelity map. It outputs drivable and non-drivable space and can differentiate between stationary and moving objects through temporal fusion.
- RoadSLAM [48] separates the pointcloud into ground and non-ground through coarse segmentation and clusters it into sets of free areas. Then, a robust weighted least-squares curve fit is applied to each side of the selected free area in order to find the instance that maximizes the likelihood given the current route of the vehicle. An unscented Kalman filter (UKF) is used to predict the control points of the B-splines representing the road boundaries, with the free area boundaries as input. Finally, all information is accumulated and fused using GraphSLAM [59] with OSM as prior information.
- SemanticDepth [49] creates two information sources from a single RGB input. First, through monocular depth estimation, it generates a disparity map from the RGB image and transforms it into a full pointcloud. Then, it performs semantic segmentation on the original RGB image using a CNN. Finally, it overlays the segmented image as a mask on the generated pointcloud to calculate the road width ahead of the ego-vehicle.
- Overall, the publications report high performance. Every algorithm reports an F1 score of at least 83.73, with one method reaching 97.42.
- Execution time and performance are not clearly correlated: the highest performer has a runtime of 460 ms, but a very close second needs only 22 ms.
- Neural networks offer consistently high performance at a relatively low computational cost but they all require a GPU to perform. The best tradeoff between computation time and algorithm performance is offered by USNet.
- With roughly 65% of the works under 100 ms in execution time and 47% under 50 ms, it is possible to see that a majority of the works today can handle sensor inputs at 10 Hz in real time and almost half can handle 20 Hz, which are two standard 3D LiDAR frequencies.
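Several of the LiDAR pipelines above (e.g., BGK [25]) share a common primitive: estimating surface normals on a dense height grid and thresholding their angle against the vertical to label traversable cells. The sketch below reproduces that primitive generically in Python/NumPy; the cell size and slope threshold are illustrative assumptions, not parameters from the cited works.

```python
import numpy as np

def traversable_mask(height_grid, cell_size=0.5, max_slope_deg=15.0):
    """Label grid cells traversable where the surface normal stays near vertical.
    height_grid: (H, W) dense heights in meters; returns (H, W) bool mask."""
    dz_di, dz_dj = np.gradient(height_grid, cell_size)  # per-axis slopes
    # The normal of z = f(x, y) is (-dz/dx, -dz/dy, 1) before normalization.
    norm = np.sqrt(dz_di ** 2 + dz_dj ** 2 + 1.0)
    cos_angle = 1.0 / norm                              # dot product with (0, 0, 1)
    return cos_angle > np.cos(np.deg2rad(max_slope_deg))

# Illustrative usage: a gentle ramp with a 30 cm ledge along one edge.
grid = np.fromfunction(lambda i, j: 0.02 * i, (100, 100))
grid[:, 60:] += 0.3                                     # sharp step (e.g., a curb)
mask = traversable_mask(grid)                           # cells at the step drop out
```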
5. Discussion—Future Research
- Fusion variability: Sensor technology advances have brought the importance of sensor fusion to the foreground. Different sensors fill gaps in the drawbacks of others, and having a diverse array has become vital to reach a robust solution. Most current methods tackle this by optimizing an algorithm for one sensor type and then modal-transforming the data from other sensor types into the optimized one. A potential research direction that could increase performance is delaying modal transformation and fusion to later stages of the algorithm and optimizing separate, parallel algorithms for each sensor type. The authors in [37] present a study with results from different fusion architectures that follows this direction.
- Three-dimensional LiDAR reflectivity: Modern developments in sensor technology have brought to the market 3D LiDARs with receptors that are more sensitive to light reflection intensity. The technology has reached a point where road shape can be understood from reflectivity alone. Drivable area estimation algorithms that rely on reflectivity features could therefore produce a new breakthrough in the state of the art. The authors in [40] introduce a technique for lane marking estimation based on LiDAR reflectivity that could be adapted for drivable area estimation.
- Low-fidelity map fusion: Open-source low-fidelity maps offer valuable context information for drivable area estimation algorithms that is currently underrepresented in the state of the art. Some of the map providers see their contributor base increasing each year [60], while new players backed by big tech companies are appearing now [61]. A research direction for the field could be algorithms that take advantage of those databases as a core part of their proposal. A publication that is already working in that direction is [10].
- Streamlined data processing: Technological advancements in the sensor industry have brought sensors that produce high-density and high-quality outputs. Learning-based methods have their performance tied to having access to a significant amount of data. Therefore, the field would greatly benefit from new data storage and processing systems that could streamline the flow of data.
- RGB strongholds: RGB-based drivable area estimation is being spearheaded by learning-based methods, while LiDAR does not yet have a clear leading technique. A potential gap in the current research is novel learning-based methods that use only 3D LiDAR.
- Algorithm output: The algorithms express their road estimations differently in terms of data type and quantity. Depending on the desired application, some outputs may be more interesting than others. Relevant criteria to study the algorithms include: (i) whether the output is single- or multi-label, (ii) whether the algorithm outputs an expression of probability or uncertainty together with the estimation, and (iii) whether the output type is an image, a map, a pointcloud or a road model. All of these can significantly influence the choice of research direction. One example is the need for an uncertainty measure to help deal with potential sensor errors. Another is the fact that a segmented image is likely to need an extra processing step to obtain a general road representation, while a road model can be used directly by the next module.
- Multi-frame estimation: Algorithms that fuse estimations from different timesteps are not very common. The few that combine estimations do so through tracking or recurrent neural networks. A potential research direction is to apply multi-frame estimation to produce drivable area candidates that are reliable and robust to intermittent noise.
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Pauls, J.H.; Strauss, T.; Hasberg, C.; Lauer, M.; Stiller, C. Can we trust our maps? An evaluation of road changes and a dataset for map validation. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 2639–2644. [Google Scholar]
- Wong, K.; Gu, Y.; Kamijo, S. Mapping for autonomous driving: Opportunities and challenges. IEEE Intell. Transp. Syst. Mag. 2020, 13, 91–106. [Google Scholar]
- Gwon, G.P.; Hur, W.S.; Kim, S.W.; Seo, S.W. Generation of a precise and efficient lane-level road map for intelligent vehicle systems. IEEE Trans. Veh. Technol. 2016, 66, 4517–4533. [Google Scholar]
- Bar Hillel, A.; Lerner, R.; Levi, D.; Raz, G. Recent progress in road and lane detection: A survey. Mach. Vis. Appl. 2014, 25, 727–745. [Google Scholar]
- Papadakis, P. Terrain traversability analysis methods for unmanned ground vehicles: A survey. Eng. Appl. Artif. Intell. 2013, 26, 1373–1385. [Google Scholar]
- Liang, D.; Guo, Y.C.; Zhang, S.K.; Mu, T.J.; Huang, X. Lane detection: A survey with new results. J. Comput. Sci. Technol. 2020, 35, 493–505. [Google Scholar]
- Gao, B.; Pan, Y.; Li, C.; Geng, S.; Zhao, H. Are we hungry for 3D LiDAR data for semantic segmentation? A survey of datasets and methods. arXiv 2021, arXiv:2006.04307. [Google Scholar]
- OpenStreetMap Contributors. 2017. Available online: https://www.openstreetmap.org (accessed on 9 April 2023).
- Laddha, A.; Kocamaz, M.K.; Navarro-Serment, L.E.; Hebert, M. Map-supervised road detection. In Proceedings of the 2016 IEEE Intelligent Vehicles Symposium (IV), Gothenburg, Sweden, 19–22 June 2016; pp. 118–123. [Google Scholar]
- Wang, X.; Qian, Y.; Wang, C.; Yang, M. Map-enhanced ego-lane detection in the missing feature scenarios. IEEE Access 2020, 8, 107958–107968. [Google Scholar]
- Poggenhans, F.; Pauls, J.H.; Janosovits, J.; Orf, S.; Naumann, M.; Kuhnt, F.; Mayr, M. Lanelet2: A High-Definition Map Framework for the Future of Automated Driving. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018. [Google Scholar]
- Maierhofer, S.; Klischat, M.; Althoff, M. Commonroad scenario designer: An open-source toolbox for map conversion and scenario creation for autonomous vehicles. In Proceedings of the 2021 IEEE Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; pp. 3176–3182. [Google Scholar]
- Sun, P.; Kretzschmar, H.; Dotiwalla, X.; Chouard, A.; Patnaik, V.; Tsui, P.; Guo, J.; Zhou, Y.; Chai, Y.; Caine, B.; et al. Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2446–2454. [Google Scholar]
- Wilson, B.; Qi, W.; Agarwal, T.; Lambert, J.; Singh, J.; Khandelwal, S.; Pan, B.; Kumar, R.; Hartnett, A.; Pontes, J.K.; et al. Argoverse 2: Next generation datasets for self-driving perception and forecasting. arXiv 2023, arXiv:2301.00493. [Google Scholar]
- Fritsch, J.; Kuehnl, T.; Geiger, A. A New Performance Measure and Evaluation Benchmark for Road Detection Algorithms. In Proceedings of the International Conference on Intelligent Transportation Systems (ITSC), The Hague, The Netherlands, 6–9 October 2013. [Google Scholar]
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A multimodal dataset for autonomous driving. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual Conference, Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
- Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2020. [Google Scholar]
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Liao, Y.; Xie, J.; Geiger, A. KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D. arXiv 2021, arXiv:2109.13410. [Google Scholar]
- Behley, J.; Garbade, M.; Milioto, A.; Quenzel, J.; Behnke, S.; Stachniss, C.; Gall, J. SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences. In Proceedings of the IEEE/CVF Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
- Vasiljevic, I.; Kolkin, N.; Zhang, S.; Luo, R.; Wang, H.; Dai, F.Z.; Daniele, A.F.; Mostajabi, M.; Basart, S.; Walter, M.R.; et al. DIODE: A Dense Indoor and Outdoor DEpth Dataset. arXiv 2019, arXiv:1908.00463. [Google Scholar]
- Plachetka, C.; Sertolli, B.; Fricke, J.; Klingner, M.; Fingscheidt, T. 3DHD CityScenes: High-Definition Maps in High-Density Point Clouds. In Proceedings of the 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), Macau, China, 8–12 October 2022; pp. 627–634. [Google Scholar]
- Wang, H.; Liu, Z.; Li, Y.; Li, T.; Chen, L.; Sima, C.; Wang, Y.; Jiang, S.; Wen, F.; Xu, H.; et al. Road Genome: A Topology Reasoning Benchmark for Scene Understanding in Autonomous Driving. arXiv 2023, arXiv:2304.10440. [Google Scholar]
- Li, Q.; Wang, Y.; Wang, Y.; Zhao, H. HDMapNet: An Online HD Map Construction and Evaluation Framework. arXiv 2021, arXiv:2107.06307. [Google Scholar]
- Xue, H.; Fu, H.; Ren, R.; Zhang, J.; Liu, B.; Fan, Y.; Dai, B. LiDAR-based Drivable Region Detection for Autonomous Driving. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 1110–1116. [Google Scholar]
- Fan, R.; Wang, H.; Cai, P.; Liu, M. Sne-roadseg: Incorporating surface normal information into semantic segmentation for accurate freespace detection. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 340–356. [Google Scholar]
- Chen, Z.; Zhang, J.; Tao, D. Progressive lidar adaptation for road detection. IEEE/CAA J. Autom. Sin. 2019, 6, 693–702. [Google Scholar] [CrossRef]
- Chang, Y.; Xue, F.; Sheng, F.; Liang, W.; Ming, A. Fast Road Segmentation via Uncertainty-aware Symmetric Network. arXiv 2022, arXiv:2203.04537. [Google Scholar]
- Liu, Z.; Yu, S.; Wang, X.; Zheng, N. Detecting drivable area for self-driving cars: An unsupervised approach. arXiv 2017, arXiv:1705.00451. [Google Scholar]
- Sun, J.Y.; Kim, S.W.; Lee, S.W.; Kim, Y.W.; Ko, S.J. Reverse and boundary attention network for road segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019. [Google Scholar]
- Gu, S.; Zhang, Y.; Yuan, X.; Yang, J.; Wu, T.; Kong, H. Histograms of the normalized inverse depth and line scanning for urban road detection. IEEE Trans. Intell. Transp. Syst. 2018, 20, 3070–3080. [Google Scholar] [CrossRef]
- Zhang, Y.; Wang, J.; Wang, X.; Dolan, J.M. Road-segmentation-based curb detection method for self-driving via a 3D-LiDAR sensor. IEEE Trans. Intell. Transp. Syst. 2018, 19, 3981–3991. [Google Scholar] [CrossRef]
- Chen, L.; Yang, J.; Kong, H. Lidar-histogram for fast road and obstacle detection. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 1343–1348. [Google Scholar]
- Lyu, Y.; Bai, L.; Huang, X. Road segmentation using CNN and distributed LSTM. In Proceedings of the 2019 IEEE International Symposium on Circuits and Systems (ISCAS), Sapporo, Japan, 26–29 May 2019; pp. 1–5. [Google Scholar]
- Xu, F.; Liang, H.; Wang, Z.; Lin, L. A Framework for Drivable Area Detection Via Point Cloud Double Projection on Rough Roads. J. Intell. Robot. Syst. 2021, 102, 1–19. [Google Scholar]
- Sun, L.; Zhang, H.; Yin, W. Pseudo-LiDAR-Based Road Detection. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 5386–5398. [Google Scholar] [CrossRef]
- Gu, S.; Yang, J.; Kong, H. A cascaded lidar-camera fusion network for road detection. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May 2021–5 June 2021; pp. 13308–13314. [Google Scholar]
- Wang, L.; Wu, T.; Xiao, Z.; Xiao, L.; Zhao, D.; Han, J. Multi-cue road boundary detection using stereo vision. In Proceedings of the 2016 IEEE International Conference on Vehicular Electronics and Safety (ICVES), Beijing, China, 10–12 July 2016; pp. 1–6. [Google Scholar]
- Oh, M.; Jung, E.; Lim, H.; Song, W.; Hu, S.; Lee, E.M.; Park, J.; Kim, J.; Lee, J.; Myung, H. TRAVEL: Traversable Ground and Above-Ground Object Segmentation Using Graph Representation of 3D LiDAR Scans. arXiv 2022, arXiv:2206.03190. [Google Scholar] [CrossRef]
- Certad, N.; Morales-Alvarez, W.; Olaverri-Monreal, C. Road Markings Segmentation from LIDAR Point Clouds using Reflectivity Information. In Proceedings of the 2022 IEEE International Conference on Vehicular Electronics and Safety (ICVES), Bogota, Colombia, 14–16 November 2022; pp. 1–6. [Google Scholar]
- Sun, P.; Zhao, X.; Xu, Z.; Wang, R.; Min, H. A 3D LiDAR data-based dedicated road boundary detection algorithm for autonomous vehicles. IEEE Access 2019, 7, 29623–29638. [Google Scholar] [CrossRef]
- Wu, D.; Liao, M.W.; Zhang, W.T.; Wang, X.G.; Bai, X.; Cheng, W.Q.; Liu, W.Y. Yolop: You only look once for panoptic driving perception. Mach. Intell. Res. 2022, 19, 550–562. [Google Scholar] [CrossRef]
- Vu, D.; Ngo, B.; Phan, H. HybridNets: End-to-End Perception Network. arXiv 2022, arXiv:2203.09035. [Google Scholar]
- Milioto, A.; Vizzo, I.; Behley, J.; Stachniss, C. Rangenet++: Fast and accurate lidar semantic segmentation. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 4213–4220. [Google Scholar]
- Rummelhard, L.; Paigwar, A.; Nègre, A.; Laugier, C. Ground estimation and point cloud segmentation using spatiotemporal conditional random field. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017; pp. 1105–1110. [Google Scholar]
- Horváth, E.; Pozna, C.; Unger, M. Real-time LiDAR-based urban road and sidewalk detection for autonomous vehicles. Sensors 2021, 22, 194. [Google Scholar] [CrossRef] [PubMed]
- Kurdej, M.; Moras, J.; Cherfaoui, V.; Bonnifait, P. Map-aided evidential grids for driving scene understanding. IEEE Intell. Transp. Syst. Mag. 2015, 7, 30–41. [Google Scholar] [CrossRef]
- Burger, P.; Naujoks, B.; Wuensche, H.J. Unstructured road slam using map predictive road tracking. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; pp. 1276–1282. [Google Scholar]
- Palafox, P.R.; Betz, J.; Nobis, F.; Riedl, K.; Lienkamp, M. Semanticdepth: Fusing semantic segmentation and monocular depth estimation for enabling autonomous driving in roads without lane lines. Sensors 2019, 19, 3224. [Google Scholar] [CrossRef]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef] [PubMed]
- Lee, J.H.; Han, M.K.; Ko, D.W.; Suh, I.H. From big to small: Multi-scale local planar guidance for monocular depth estimation. arXiv 2019, arXiv:1907.10326. [Google Scholar]
- Wang, Y.; Chao, W.L.; Garg, D.; Hariharan, B.; Campbell, M.; Weinberger, K.Q. Pseudo-lidar from visual depth estimation: Bridging the gap in 3d object detection for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8445–8453. [Google Scholar]
- Drucker, H.; Burges, C.J.; Kaufman, L.; Smola, A.; Vapnik, V. Support vector regression machines. Adv. Neural Inf. Process. Syst. 1996, 9, 1–7. [Google Scholar]
- Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man, Cybern. 1979, 9, 62–66. [Google Scholar] [CrossRef]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. Scaled-YOLOv4: Scaling cross stage partial network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13029–13038. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning (PMLR), Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
- Thrun, S.; Montemerlo, M. The graph SLAM algorithm with applications to large-scale mapping of urban structures. Int. J. Robot. Res. 2006, 25, 403–429. [Google Scholar] [CrossRef]
- OSM Maps User Stats. Available online: https://osmstats.neis-one.org/?item=members (accessed on 21 July 2023).
- Overture Maps. Available online: https://overturemaps.org/ (accessed on 21 July 2023).
Dataset | Year | Sensor Type | Sensors | Frames | Map | BM (Benchmark) |
---|---|---|---|---|---|---|
Waymo Perception [13] | 2023 | 3D LiDAR RGB | 5 × Proprietary LiDAR 5 × 1920 × 1280 RGB | 390,000 | ✓ | ✓ |
Argoverse 2 [14] | 2023 | 3D LiDAR RGB | 2 × VLP-32C 7 × 2048 × 1550 RGB | 6,000,000 | ✓ | ✓ |
KITTI ROAD [15] | 2015 | 3D LiDAR RGB | Velodyne HDL-64E 1242 × 375 RGB | 579 | ✗ | ✓ |
nuScenes [16] | 2020 | 3D LiDAR RGB RADAR | Velodyne HDL32E 1600 × 900 RGB ARS 408-21 | 40,000 | ✓ | ✓ |
BDD100K [17] | 2020 | RGB | 1280 × 720 RGB | 100,000 | ✗ | ✓ |
CityScapes [18] | 2016 | Stereo RGB | 2048 × 1024 RGB | 24,998 | ✗ | ✓ |
KITTI-360 [19] | 2021 | 3D LiDAR Stereo RGB 360° RGB | Velodyne HDL-64E SICK LMS 200 4 × 1408 × 376 RGB | 83,000 | ✗ | ✓ |
Semantic KITTI [20] | 2019 | 3D LiDAR | Velodyne HDL-64E | 43,552 | ✗ | ✓ |
DIODE [21] | 2019 | RGB-D | FARO Focus S350 | 27,858 | ✗ | ✗ |
3DHD CityScenes [22] | 2022 | 3D LiDAR | Trimble Mx8 | - | ✓ | ✗ |
OpenLane V2 [23] | 2023 | Built on: nuScenes and Argoverse 2 | – | – | ✓ | ✓ |
OMC Benchmark [24] | 2021 | Built on: nuScenes | – | – | ✓ | ✗ |
Name | Year | Input | Methodology | Output | Contributions | Limitations |
---|---|---|---|---|---|---|
BGK [25] | 2021 | 3D LiDAR | Min–maxing, Bilateral Filtering, BGK, Normal Estimation, Multi-frame tracking | Segmented Grid | Generates information from unobserved areas without LiDAR hits | Computational and algorithmic performance tied to grid resolution |
SNE-RoadSeg [26] | 2020 | RGB-D | SNE, CNN | Surface Normals; Segmented Image | SNE module that can be plugged into other CNNs with proven improvement | Surface normal information might misclassify sidewalks as road |
PLARD [27] | 2019 | 3D LiDAR RGB | ADT, CNN, FSA | Altitude Difference Image; Segmented Image | Leverages LiDAR to make RGB data robust to shadows | Requires high-end GPU to perform |
USNet [28] | 2022 | RGB-D | SNE, MEC, Uncertainty Map | Uncertainty Image; Probabilistic Segmented Image | Good tradeoff between computational and algorithmic performance; uncertainty map could be used by other modules | Requires high-end GPU to perform |
Unsupervised RD [29] | 2017 | 3D LiDAR RGB | Superpixel, Delaunay triangulation, MRF, Belief propagation | Superpixel; Probabilistic Segmented Image; Probabilistic Segmented Pointcloud | Robust to illumination changes | Detection of 3D features dictates whole algorithm performance |
Map-Supervised RD [9] | 2016 | Map RGB | OSM, CNN | Segmented Image | Able to work with or without map at inference | Reliant on extrinsic camera calibration and GPS quality |
RBANet [30] | 2019 | RGB | SegNet, BA, RA, CNN | Probabilistic Segmented Image | Residual stages make the algorithm inspectable at intermediate points | Requires high-end GPU to perform |
HID-LS [31] | 2019 | 3D LiDAR RGB | Superpixel, Depth Map, Height Map, Histograms | Superpixel; Segmented Image | Transforms spatially discrete LiDAR pointclouds into a continuous and organized structure | Reliant on LiDAR resolution and parametrization |
Curb Detection [32] | 2018 | 3D LiDAR | RANSAC, Sliding beam segmentation | Curb Points | Good performance on CPU | Relies on curbs to detect the drivable area |
LiDAR-Histogram [33] | 2017 | 3D LiDAR | RANSAC, Histogram | LiDAR Imagery; Segmented Pointcloud | Detects positive/negative obstacles and estimates road drivability degree | Makes assumptions on voids in LiDAR data, which could misclassify poorly reflective obstacles |
CyberMELD [10] | 2020 | Map 3D LiDAR RGB | Delaunay triangulation, IPM | Segmented Image | Leverages OSM to deal with missing features caused by shadows or occlusions | Only validated on single-lane two-way roads; sensitive to OSM errors |
RoadNet3 [34] | 2019 | RGB | CNN, LSTM | Segmented Image | Reduces feature map resolution with a CNN to achieve high performance | Assumes continuity of the drivable area |
Double Projection [35] | 2021 | 3D LiDAR | Range Map, Elevation Map, Forward flood fill | Ground Model; Segmented Pointcloud | Able to deal with rough terrain and offroad situations | Only considers positive obstacles |
Pseudo-LiDAR [36] | 2022 | RGB | RNN | Pseudo-LiDAR; Segmented Image | Takes advantage of 3D features but only requires an RGB camera | Requires an additional depth-estimation network to produce pseudo-LiDAR |
CLCFNet [37] | 2021 | 3D LiDAR RGB | CNN | LiDAR Imagery; Segmented Pointcloud; Segmented Image | Runs on LiDAR only when RGB is unavailable | LiDAR imagery over raw pointclouds makes the algorithm sensitive to occlusions |
Multi-Cue [38] | 2016 | RGB-D | SVR | Curb Points; Road Boundary Model; Segmented Image | Highest ranking stereo vision algorithm on the KITTI dataset | Uses surface normals to find road boundaries, which can misclassify sidewalks as road |
TRAVEL [39] | 2022 | 3D LiDAR | Breadth-first graph search, Tri-grid field | Segmented Pointcloud | Manages to perform on sloped surfaces and rough terrain; one of the few non-NN methods with open-source code | Focuses only on traversability; does not differentiate pathways from road |
Road Markings [40] | 2022 | 3D LiDAR | Otsu thresholding, Reflectivity, Line fitting | Lane Boundary Model | Takes advantage of a rarely used data type | Only tested in traffic-free environments; sensitive to parameter tuning |
Line Fitting [41] | 2019 | 3D LiDAR | Channel-based segmentation, B-spline curve fitting | Road Boundary Model; Segmented Pointcloud | Can deal with occlusions; extracted lines can be fused with OSM data | Relies on the curb to detect the drivable area |
YOLOP [42] | 2021 | RGB | CSPDarknet, SPP, FPN, CNN | Multi-label Segmented Image | Detects the opposite lane as non-drivable area; produces drivable area plus lane estimations | Can misclassify gaps in the drivable area as lane lines |
HybridNets [43] | 2022 | RGB | EfficientNet-B3, FPN, CNN | Multi-label Segmented Image | Same as YOLOP while improving computational and algorithmic performance | Limited by camera FOV; needs to be very close to an area to detect it as drivable |
Rangenet++ [44] | 2019 | 3D LiDAR | kNN, CNN | Multi-label Segmented Pointcloud | Offers open-source trained models; tackles over-segmentation in postprocessing | Requires high-end GPU to perform; hard-linked to LiDAR specifics in training |
SpatioTemporal CRF [45] | 2017 | 3D LiDAR | CRF | Segmented Pointcloud | Robust to changes in slope and temporarily obstructed areas | No quantitative results; implementation details are scarce in the publication |
Urban Road Filter [46] | 2021 | 3D LiDAR | Channel-based segmentation | Curb Points; Road Boundary Model | Works in real-time on CPU | Requires the curb to be visible and the sidewalk to be smooth to detect the road |
Evidential Grids [47] | 2015 | Map 3D LiDAR | DST | Evidential Grid | Deals very well with noisy input data | Requires previously segmented data; assumes that the map used is high-fidelity |
RoadSLAM [48] | 2019 | Map 3D LiDAR | OSM, B-Spline, UKF, Channel-based segmentation, GraphSLAM | Road Boundary Model | Is able to produce road estimations in areas out of sensor reach | Does not work on complex road geometries such as junctions |
SemanticDepth [49] | 2019 | RGB | CNN | Pseudo-LiDAR; Segmented Image | Robust to occlusions; specifically designed to work without lane lines | Computationally expensive |
Abbreviations used in the taxonomy table: Sym = symmetry, Smo = smoothness, Con = continuity, FS = fixed size, Flat = flatness; C = color, E = edge, T = texture, N = normal, R = reflectivity; SL = straight line, Sp = spline, P = polynomial, Pl = plane; OG = occupancy grid, EM = elevation map, PG = polar grid, TG = triangle grid; MRF = Markov random field, CRF = conditional random field, BGK = Bayesian generalized kernel, DST = Dempster–Shafer theory; CNN = convolutional neural network, RNN = residual neural network.
Name | Constraints | Features | Model | Representation | Propagation | Learning |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sym | Smo | Con | FS | Flat | C | E | T | N | R | SL | Sp | P | Pl | OG | EM | PG | TG | MRF | CRF | BGK | DST | CNN | RNN
BGK [25] | – | – | – | – | – | – | ✓ | – | ✓ | – | – | – | – | – | – | ✓ | – | – | – | – | ✓ | – | – | – |
SNE-RoadSeg [26] | – | – | ✓ | – | – | ✓ | – | – | ✓ | – | – | – | – | – | – | – | – | – | – | – | – | – | ✓ | – |
PLARD [27] | – | – | ✓ | – | – | ✓ | – | ✓ | – | – | – | – | – | – | – | – | – | – | – | – | – | – | ✓ | – |
USNet [28] | – | – | ✓ | – | – | ✓ | – | ✓ | – | – | – | – | – | – | – | – | – | – | – | – | – | – | ✓ | – |
Unsupervised RD [29] | – | – | – | – | ✓ | ✓ | ✓ | ✓ | ✓ | – | – | – | – | – | – | – | – | – | ✓ | – | – | – | – | – |
Map-Supervised RD [9] | – | – | ✓ | – | – | ✓ | – | ✓ | – | – | – | – | – | – | – | – | – | – | – | – | – | – | ✓ | – |
RBANet [30] | – | – | ✓ | – | – | ✓ | ✓ | ✓ | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | ✓ |
HID-LS [31] | – | – | – | – | ✓ | ✓ | ✓ | – | ✓ | – | – | – | – | – | – | ✓ | – | – | – | – | – | – | – | – |
Curb Detection [32] | ✓ | ✓ | – | ✓ | – | – | ✓ | – | – | – | – | – | ✓ | – | – | – | – | – | – | – | – | – | – | – |
LiDAR-Histogram [33] | – | – | – | – | ✓ | – | ✓ | – | – | – | ✓ | – | – | – | – | – | – | – | – | – | – | – | – | – |
CyberMELD [10] | – | – | ✓ | – | ✓ | – | ✓ | – | – | – | ✓ | – | – | – | – | – | – | – | – | – | – | – | – | – |
RoadNet3 [34] | – | – | ✓ | – | – | ✓ | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | ✓ | – |
Double Projection [35] | – | ✓ | ✓ | – | – | – | ✓ | – | – | – | – | – | – | – | ✓ | ✓ | – | – | – | – | – | – | – | – |
Pseudo-LiDAR [36] | – | – | ✓ | – | – | ✓ | ✓ | ✓ | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | ✓ |
CLCFNet [37] | – | – | ✓ | – | – | ✓ | – | ✓ | – | – | – | – | – | – | – | – | – | – | – | – | – | – | ✓ | – |
Multi-Cue [38] | – | – | ✓ | – | – | ✓ | ✓ | – | ✓ | – | – | – | – | ✓ | – | – | – | – | – | – | – | – | – | – |
TRAVEL [39] | – | – | ✓ | – | ✓ | – | ✓ | – | ✓ | – | – | – | – | – | – | – | – | ✓ | – | – | – | – | – | – |
Road Markings [40] | ✓ | – | – | – | – | – | – | – | – | ✓ | ✓ | – | – | – | – | – | – | – | – | – | – | – | – | – |
Line Fitting [41] | – | – | ✓ | – | – | – | ✓ | – | – | – | – | ✓ | – | – | – | – | ✓ | – | – | – | – | – | – | – |
YOLOP [42] | – | – | ✓ | – | – | ✓ | – | ✓ | – | – | – | – | – | – | – | – | – | – | – | – | – | – | ✓ | – |
HybridNets [43] | – | – | ✓ | – | – | ✓ | – | ✓ | – | – | – | – | – | – | – | – | – | – | – | – | – | – | ✓ | – |
Rangenet++ [44] | – | ✓ | ✓ | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | ✓ | – |
SpatioTemporal CRF [45] | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | ✓ | – | – | – | ✓ | – | – | – | – |
Urban Road Filter [46] | – | ✓ | ✓ | – | ✓ | – | ✓ | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – |
Evidential Grids [47] | – | – | – | – | – | – | – | – | – | – | – | – | – | – | ✓ | – | – | – | – | – | – | ✓ | – | – |
RoadSLAM [48] | – | ✓ | ✓ | – | – | – | – | – | ✓ | – | – | ✓ | – | – | – | – | – | – | – | – | – | – | – | – |
SemanticDepth [49] | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | ✓ | – | – |
Name | Time [ms] | Pre. | Rec. | F1 | GPU | Dataset | Code |
---|---|---|---|---|---|---|---|
BGK [25] | 45 | 98.66 | 72.72 | 83.73 | ✗ | SemKITTI | ✗ |
SNE-RoadSeg [26] | 80 | 96.90 | 96.61 | 96.75 | ✓ | KITTI | ✓ |
PLARD [27] | 160 | 96.79 | 96.86 | 96.83 | ✓ | KITTI | ✓ |
USNet [28] | 22 | 96.51 | 97.27 | 96.89 | ✓ | KITTI | ✓ |
Unsupervised RD [29] | - | 83.97 | 91.83 | 87.72 | ✗ | KITTI | ✗ |
Map-Supervised RD [9] | 280 | 86.01 | 89.66 | 87.80 | ✓ | KITTI | ✗ |
RBANet [30] | 160 | 95.14 | 97.50 | 96.30 | ✓ | KITTI | ✗ |
HID-LS [31] | 250 | 92.52 | 93.71 | 93.11 | ✗ | KITTI | ✗ |
Curb Detection [32] | 12 | 87.64 | 89.28 | 86.98 | ✗ | Proprietary | ✗ |
LiDAR-Histogram [33] | 100 | 93.06 | 88.41 | 90.67 | ✗ | KITTI | ✗ |
CyberMELD [10] | 50 | 95.94 | 91.30 | 93.56 | ✗ | KITTI | ✓ |
RoadNet3 [34] | 16 | 88.12 | 90.06 | 89.08 | ✓ | KITTI | ✗ |
Double Projection [35] | 77 | 95.91 | 99.28 | 95.00 | ✗ | SemKITTI | ✗ |
Pseudo-LiDAR [36] | 460 | 97.30 | 97.54 | 97.42 | ✓ | KITTI | ✗ |
CLCFNet [37] | 23 | 96.38 | 96.39 | 96.38 | ✓ | KITTI | ✗ |
Multi-Cue [38] | 2500 | 84.95 | 88.55 | 86.71 | ✗ | KITTI | ✗ |
TRAVEL [39] | 19 | 90.00 | 96.70 | 93.10 | ✗ | SemKITTI | ✓ |
Road Markings [40] | - | 97.04 | 94.03 | 95.51 | ✗ | Proprietary | ✗ |
Line Fitting [41] | 36 | 94.95 | 94.95 | 94.95 | ✗ | Proprietary | ✗ |
RoadSLAM [48] | - | 87.00 | 92.00 | 89.43 | ✗ | Proprietary | ✗ |
YOLOP [42] | 43 | – | – | mIoU: 91.5 | ✓ | BDD100K | ✓ |
HybridNets [43] | 37 | – | – | mIoU: 90.5 | ✓ | BDD100K | ✓ |
Rangenet++ [44] | 76 | – | – | mIoU: 91.8 | ✓ | SemKITTI | ✓ |
SpatioTemporal CRF [45] | 147 | – | – | qualitative only | ✓ | KITTI | ✗ |
Urban Road Filter [46] | 15 | – | – | qualitative only | ✗ | Proprietary | ✓ |
Evidential Grids [47] | - | – | – | qualitative only | ✗ | Proprietary | ✗ |
SemanticDepth [49] | 637 | – | – | MAE: 0.48 m | ✗ | Proprietary | ✗ |