1. Introduction
Traffic accidents are among the leading causes of death worldwide. The World Health Organization (WHO) estimated that approximately 1.19 million road traffic deaths occurred in 2021 [1]. In 2023, Brazil recorded 67,766 accidents on federal highways. Of these, 5639 (8.32%) involved lane-change maneuvers or motorcyclists riding between lanes. Furthermore, 1620 (2.39%) were due to improper overtaking, and 6325 (9.33%) occurred as vehicles entered the road without observing the presence of other vehicles. Together, these four types of incidents represented 13,584 cases (20% of the total), resulting in 1023 fatalities, 16,355 injuries, and 29,862 vehicles involved [2].
Previous studies have revealed numerous factors that influence the severity of traffic accidents, including human behavior, vehicle conditions, traffic characteristics, road infrastructure, and environmental conditions [3,4,5]. Among human-related causes, lack of attention, mobile phone use, excessive speed, improper overtaking, and alcohol consumption are the most frequently recorded contributing factors, underscoring their significant role in traffic-related deaths [6].
Illegal overtaking, along with other illegal driving behaviors, is strongly correlated with the frequency of accidents, fatalities, and injuries and is among the main factors associated with an increased risk of accidents [7]. Overtaking is one of the most significant driving behaviors on two-lane highways, affecting highway capacity, safety, and level of service [8]. It is also a complicated maneuver that requires a driver to make several decisions based on the prevailing passing conditions. For instance, a driver chooses an acceptable gap size in the opposing traffic, the following distance behind the impeding vehicle, and the distance to leave in front of the impeding vehicle when returning to the original lane after passing [9]. This study focuses exclusively on right-hand traffic; for left-hand traffic (as is common in countries such as England), minor adjustments would be necessary to account for the overtaking lane being on the right.
Figure 1 illustrates the overtaking scenario analyzed in our study.
The overtaking action depends on numerous factors, including the ego vehicle's current state and the surrounding vehicles' positions and speeds. Furthermore, psychological factors such as impulsivity, mindfulness, driving attitudes, and depression can influence overtaking decisions [10]. Establishing educational policies and promoting self-awareness can enhance traffic safety and encourage drivers to adopt safer driving practices [11]. By increasing awareness of risky behaviors and recognizing unsafe decisions, we can encourage drivers to adopt safer practices and ultimately reduce the number of road accidents.
Indeed, one of the dangers of the overtaking maneuver lies in the driver's ability to account for both the distance between the two vehicles and the speed of the one ahead [12]. Although overtaking is not the leading cause of road accidents, collisions during overtaking maneuvers are among the most severe [13]. For more than a decade, road traffic accidents have been the leading cause of death among young people [1].
Given those risks, autonomous driving is a widely studied research topic [14] that may improve traffic environments, making roads and highways safer. In traffic, overtaking is a very complex maneuver [15]. Detecting overtaking is not trivial due to the variety of driving scenarios and conditions and the quality of cameras and sensors, which presents a challenge in developing accurate detection systems.
In the last few years, advancements in deep learning and computer vision have revolutionized the analysis of traffic behavior [16,17,18,19,20]. These technologies are essential for the automatic interpretation of images and videos, enabling real-time object recognition, pattern detection, and scene segmentation. Despite these significant advancements, overtaking detection still poses challenges in low-cost applications [21].
This paper explores the application of computer vision techniques to develop a system that detects illegal overtaking maneuvers on the road, aiming to address the challenges mentioned above. The main goal of the proposed system is to improve road safety, assist vehicle auditing systems, and enhance driving practices.
Thus, the proposed method uses dashboard-mounted smartphone cameras and geolocation data to define analysis areas. You Only Look Once version 8 (YOLOv8) detects yellow road lines, while YOLO for Panoptic driving Perception version 2 (YOLOPv2), followed by post-processing, confirms potential illegal overtakes through detection overlaps. The system evaluates and stores these events throughout the video to identify and extract the moments of violation.
The main contributions of this paper are summarized as follows:
The use of low-cost cameras to make the technology more accessible and feasible for widespread implementation.
The application of recent deep learning and image processing techniques to accurately detect illegal overtaking maneuvers, improving traffic safety.
The development of a tool designed for integration into vehicle auditing systems to monitor and evaluate driving practices, which could improve road safety and help reduce accidents caused by illegal maneuvers.
The paper is structured as follows to provide a comprehensive understanding of this approach:
Section 2 discusses the previous state-of-the-art models designed.
Section 3 presents the dataset’s characteristics, methodology, and validation metrics.
Section 4 presents the results obtained in our study.
Section 5 discusses the results based on the data presented in the literature. Finally, Section 6 presents the conclusion of this study.
2. Related Work
Deep learning, a subset of artificial intelligence, has revolutionized numerous fields by enabling machines to learn and make decisions from vast amounts of data. Among the many architectures within deep learning, Convolutional Neural Networks (CNNs) have emerged as a cornerstone for image classification and computer vision tasks. These networks demonstrate remarkable performance by integrating feature extraction and classification within a single framework, eliminating the need for manual feature engineering. This seamless integration, powered by deep learning, transforms machine vision and drives significant breakthroughs in tasks such as object detection [20]. In traffic safety, deep learning supports tasks such as obstacle detection, lane recognition, and traffic sign and light recognition, and it is a key enabler of self-driving cars.
Lane recognition, in particular, plays a crucial role in deviation warnings, collision prevention, and effective path planning. Deep learning approaches explore different ways to improve the accuracy and efficiency of these systems, contributing to enhanced traffic safety [20]. Accurate lane detection is essential for identifying illegal overtaking maneuvers, as it allows for monitoring the vehicle's position relative to lane boundaries. Advanced models for lane detection, such as the SCNN-based hybrid model, have achieved accuracies exceeding 0.970 on benchmark datasets [22]. Additionally, semantic segmentation models like SegNet have performed well, reaching an F1-score of 0.934 and a recall of 0.926 [23]. By combining multiple detection frameworks, performance can be further enhanced, with models achieving F1-scores above 0.950 [24].
Lee and Park [25] proposed a method for rearview camera-based blind-spot detection and a lane change assistance system for autonomous vehicles using CNNs. They use YOLOv9 for vehicle detection, combined with Sobel edge detection and a Kalman filter to identify and track lanes, achieving a reported lane detection rate of 0.915. Lu and Chiu [26] employed a domain adaptation model and image segmentation to improve lane detection performance in challenging scenarios. They emphasize that factors like low light, shadows, rain, or snow hinder accurate detection, achieving an F1-score of 0.743 under normal conditions and 0.691 at night.
In vehicle overtaking detection, researchers [16,18,27,28] have developed several models to enhance detection accuracy. One such study [16] combines image pre-processing, segmentation, optical flow, and CNNs to identify overtaking maneuvers. The model removes repetitive patterns to eliminate false positives and employs behavior analysis to differentiate overtaking from regular lane changes or other movements. By analyzing the motion and speed of vehicles through optical flow, the approach improves overtaking predictions, thus reducing errors in detection algorithms.
Another approach [27] integrates image processing techniques with a CNN to enhance overtaking safety. The system extracts features from images and applies manual rules to assess critical parameters such as the distance, velocity, and acceleration of surrounding vehicles. The CNN, trained on sequential image datasets, predicts overtaking safety by evaluating risks based on the relative positions and speeds of surrounding vehicles, which improves decision-making during overtaking attempts.
In [29], a system focusing on illegal lane crossing detection achieved a precision of 0.920 and a recall of 0.890, ensuring effective detection while minimizing false positives. Similarly, Xia et al. [28] present a method for overtaking detection with an overall error rate below 25%, demonstrating strong performance across various traffic scenarios, including those with poor visibility or heavy traffic. RGB-D data from Kinect sensors capture color and depth information, improving detection accuracy in complex environments.
Panichpapiboon and Leakkaw [18] introduced a probabilistic method for detecting lane changes using smartphone sensors. This method achieved approximately 0.900 precision and 0.928 recall, demonstrating its effectiveness in correctly identifying lane change events, minimizing missed detections, and ensuring reliable performance in real-world scenarios.
The development of autonomous vehicles has become an increasingly prominent area of research and development. These systems demand extensive technological and computational resources, with real-time information processing crucial to ensuring the safety of passengers and others on the road. In response, several studies have explored using cameras and sensors to monitor dangerous behaviors on highways [15,16,27,29,30]. These studies employ monocular cameras [16] and other sensors, such as sonar, radar, and RGB-D data captured by Kinect devices [28]. Lin et al. [18] investigate the use of front-view cameras with gyro sensors, while smartphone sensors are used to detect lane changes.
Building on these advances, researchers have focused on creating robust panoptic driving perception systems [31], which integrate cameras and LiDAR to provide a comprehensive understanding of the environment. This approach supports route planning and enhances driving safety. For instance, Faizi and Al-sulaifanie [32] use a CNN to detect lane features from image blocks and apply K-means clustering to map the detected points to lane markings, improving lane detection and driving assistance.
Lin et al. [15] improve control strategies with Time to Lane Crossing estimation, which aids decision-making during overtaking maneuvers. YOLO, a deep learning model for object detection, excels in this application due to its efficiency and accuracy [33]. Additionally, Finite State Machines (FSMs) help manage the overtaking decision process, addressing various states such as free driving, following, overtaking, and aborting [34].
The literature presents recent and relevant studies on traffic safety, addressing aspects such as lane and vehicle detection, lane change detection, and maneuver identification. However, to the best of our knowledge, no studies specifically address the detection of illegal overtaking. Existing works primarily focus on lane detection without categorizing maneuvers or, in the case of autonomous vehicles, on assessing whether overtaking is safe based on the distances between detected objects.
Despite recent advancements in deep learning and computer vision, challenges persist, particularly in more complex scenarios. Adverse weather conditions, poor lighting, and the demand for efficient and accurate solutions amplify these challenges. At night, the loss of color information further hinders sensor performance and complicates overtaking detection [28]. Moreover, the effectiveness of models relies heavily on the quality and quantity of training data, making dataset collection a significant challenge [27]. A key research gap therefore remains: developing accessible, low-cost, and robust methodologies for accurately detecting illegal overtaking across various conditions.
In response to this challenge, this study proposes an innovative approach that integrates dashboard-mounted smartphone cameras, advanced deep learning models (YOLOv8 and YOLOPv2), and post-processing techniques. This solution aims to improve detection performance and simplify integration into vehicle auditing systems, enhancing road safety and reducing traffic violations.
3. Materials and Methods
This section presents the proposed methodology for identifying overtaking on continuous lanes.
Figure 2 illustrates the flow chart with the steps of this study.
The process starts by loading the data, which includes a file with the in-vehicle camera recording of the route and, when available, a file containing geolocation information. If geolocation data are available, we identify the segments where the driver was on a two-way road or highway and process only those relevant segments. If there are no geolocation data, we process the entire video. The processing involves detecting yellow lanes using YOLOv8. We also offer the option to filter YOLOv8 detections using YOLOPv2, followed by post-processing, to identify the overtaking lane and determine if the detections intersect. This filtering is activated only when YOLOv8 detects continuous lanes. We store the results of each frame in a Pandas DataFrame in Python, enabling further operations on the data to identify the start and end moments of overtaking on a continuous lane. Subsequently, we generate video clips corresponding to these segments.
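To make this flow concrete, the snippet below gives a minimal sketch of the frame-by-frame loop, assuming the ultralytics Python package and a hypothetical fine-tuned weights file (yolov8m_yellow_lanes.pt); the YOLOPv2 filtering, geolocation handling, and clip extraction are omitted here and detailed in the following subsections.

```python
import cv2
import pandas as pd
from ultralytics import YOLO

CONTINUOUS = {"SSL", "DdSL", "DdLDS"}          # lane classes that prohibit overtaking

model = YOLO("yolov8m_yellow_lanes.pt")        # hypothetical fine-tuned YOLOv8 weights

records = []
cap = cv2.VideoCapture("route.mp4")            # in-vehicle camera recording
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, verbose=False)[0]    # YOLOv8 yellow-lane detections for this frame
    names = [result.names[int(c)] for c in result.boxes.cls] if result.boxes is not None else []
    records.append({
        "frame": frame_idx,
        "classes": names,
        "continuous": any(n in CONTINUOUS for n in names),
    })
    frame_idx += 1
cap.release()

# Frame-by-frame results stored for the temporal analysis described in Section 3.4
df = pd.DataFrame(records)
```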
Section 3 provides a structured overview of our approach. It begins with an outline of the dataset (Section 3.1) and the associated geolocation information (Section 3.2), followed by a discussion of the lane detection techniques employed (Section 3.3), including yellow line detection (Section 3.3.1) and overtaking lane detection (Section 3.3.2). The section continues with an analysis of illegal overtaking incidents on continuous lane markings (Section 3.4) and concludes with the evaluation metrics used to assess the model's performance (Section 3.5).
3.1. Dataset
The dataset used in this study comes from 1440 videos captured by dashboard-mounted smartphone cameras in vehicles. These videos, provided by the Energy Company of Minas Gerais (CEMIG) in Brazil, include urban, highway, and rural scenes, along with variations in weather conditions and times of day.
We selected a total of 4035 images, annotated into five categories: single dash lane (SDL), single solid lane (SSL), double solid lane (DdSL), double lane with solid/dash (DdLSD), and double lane with dash/solid (DdLDS). The dataset contains 4235 labels, as some images have multiple labels assigned. The distribution of these labels is as follows: 1079 for DdSL, 1032 for SDL, 401 for DdLDS, 382 for DdLSD, and 122 for SSL. Additionally, 1219 images consist only of background, which, although part of the same scenarios as the annotated images, lack the yellow lane markings relevant to the study. Including these background images was necessary to prevent the model from falsely identifying non-relevant markings, such as lateral lanes or other road markings unrelated to overtaking, improving the model’s ability to accurately distinguish between lane markings and other elements in the scene.
Figure 3 presents sample annotations from the dataset.
We divided our dataset into three subsets—training, validation, and testing—using a ratio of 80:10:10. This resulted in 3388 images for training, 323 for validation, and 324 for testing.
3.2. Geolocation Information
Considering that overtaking on continuous lane markings predominantly occurs on highways and roads, we decided to focus our analysis on these environments. Unlike urban areas, where the road infrastructure is more complex and diverse, with traffic lights, intersections, pedestrians, and cyclists, highways and roads offer more homogeneous conditions, allowing for more accurate detection of illegal overtaking. On these roads, continuous lane markings signify a prohibition on overtaking due to safety and visibility restrictions, making the investigation of such violations more pertinent. Furthermore, focusing on highways and roads reduces the influence of external variables.
To ensure that the system detects illegal overtaking only on highways and roads, we developed an application for the Android platform that leverages the system’s geolocation service to obtain periodic updates regarding the device’s geographic location. Each location data point comprises the latitude, longitude, timestamp, accuracy, heading, and velocity. The application records this data in a CSV file and saves a video of the entire route with the corresponding geolocation data.
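As an illustration, the sketch below reads such a log with pandas; the column names and the timestamp unit are assumptions, since the exact CSV layout produced by the app is not reproduced here.

```python
import pandas as pd

# Assumed column layout; the app records one fix per location update
cols = ["timestamp", "latitude", "longitude", "accuracy", "heading", "velocity"]
gps = pd.read_csv("route_geolocation.csv", names=cols, header=0)

# Timestamps (assumed to be in milliseconds) let us align each fix with the video later
gps["timestamp"] = pd.to_datetime(gps["timestamp"], unit="ms", errors="coerce")
print(gps.head())
```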
After obtaining the driver's geolocation data, we utilized Valhalla [35], an open-source routing engine that works with OpenStreetMap (OSM) data [36], to match Global Positioning System (GPS) coordinates. Valhalla performs map matching by aligning GPS measurements (represented as triplets of latitude, longitude, and time) with the corresponding road segments.
Every second, we have approximate information about the driver’s position. For each position, we check whether the road belongs to the classes of interest (primary, secondary, tertiary, highway, and motorway) and if it is a two-way road. Our analysis excludes roads with unknown classifications, residential and rural areas, and one-way roads.
We then extract the intervals of interest, focusing on the moments when the driver enters a highway, where we analyze illegal overtaking in a continuous lane. To avoid fragmented data and ensure consistent intervals, we discard isolated periods that last less than five seconds within the area of interest. If the gap between one period of interest and the next is less than five seconds, we merge the two into a single period.
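A minimal sketch of this interval cleaning is shown below, assuming one boolean flag per second marking whether the current road segment belongs to the classes of interest; the order in which the discard and merge rules are applied is an assumption.

```python
def clean_intervals(flags, min_len=5, max_gap=5):
    """flags: one boolean per second, True when the driver is on a road of interest."""
    # Build raw (start, end) intervals from consecutive True flags
    intervals, start = [], None
    for t, on in enumerate(flags):
        if on and start is None:
            start = t
        elif not on and start is not None:
            intervals.append((start, t - 1))
            start = None
    if start is not None:
        intervals.append((start, len(flags) - 1))

    # Discard isolated periods shorter than min_len seconds
    intervals = [(s, e) for s, e in intervals if e - s + 1 >= min_len]

    # Merge consecutive periods separated by gaps shorter than max_gap seconds
    merged = []
    for s, e in intervals:
        if merged and s - merged[-1][1] < max_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged

# Example: a 1 s dropout inside a highway stretch is bridged, a 3 s stray segment is dropped
flags = [True] * 10 + [False] + [True] * 10 + [False] * 20 + [True] * 3
print(clean_intervals(flags))   # [(0, 20)]
```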
3.3. Lane Detection
Object detection plays a critical role in computer vision, as accurate detection is essential for the effectiveness of applications such as autonomous vehicles, surveillance systems, company audits, and image analysis.
Another area of deep learning involves segmentation-based methods, which have achieved notable detection results. Image segmentation can include semantic segmentation as well as instance segmentation, each serving a different purpose. Both operate at the pixel level to understand images; however, their focus differs: semantic segmentation identifies amorphous regions of uncountable objects with similar characteristics (stuff classes), such as drivable areas, lane lines, or background [37], while instance segmentation not only classifies pixels but also distinguishes between different object instances, such as individual cars or pedestrians in an image. Segmentation refines the classification problem by assigning a predefined category to each pixel in the image. This approach provides more precise pixel-level boundaries than object detection, which identifies and localizes objects within bounding boxes without considering finer details at the pixel level.
Recently, the literature has proposed several methods in this field, among which the You Only Look Once (YOLO) network, introduced by [38], stands out. The YOLO algorithm is renowned for its exceptional speed compared to other methods while maintaining high accuracy. We selected it for its precision, fast response time—enabling real-time output—and capability to perform object detection and instance segmentation tasks.
3.3.1. Yellow Line Detection
A state-of-the-art model called YOLOv8 [39] was recently released, offering advanced features for object detection and instance segmentation tasks. The network comes in five different versions: YOLOv8n (nano), YOLOv8s (small), YOLOv8m (medium), YOLOv8l (large), and YOLOv8x (extra large).
We trained different variants of the YOLOv8 model (nano, small, and medium) to achieve instance segmentation of yellow overtaking lines, excluding those related to parking or other purposes. We used models pre-trained on the Common Objects in Context (COCO) dataset [40].
To ensure a quick and accurate response while minimizing computational costs, we performed a grid search by varying the model size, number of epochs, batch size, and learning rate to find the optimal parameters.
To determine the best parameters, we initially trained the model on a smaller subset of the data. Due to the multiple combinations generated by the grid search, running these on the full dataset would have been time-consuming. Therefore, we used the test set as the training set and kept the validation set unchanged.
For the reduced dataset grid search, YOLOv8m with a batch size of 8, a learning rate of , and 100 epochs achieved the best results in terms of bounding boxes and segmentation mAP on the validation set. Based on these findings, we trained YOLOv8m with these hyperparameters on the entire dataset, which included 3388 training images, 323 validation images, and 324 test images.
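The snippet below sketches this final training run with the ultralytics API, assuming a dataset configuration file (yellow_lanes.yaml) describing the five lane classes; the image size shown is an assumption, and the learning-rate value selected by the grid search is not reproduced here.

```python
from ultralytics import YOLO

# COCO-pretrained YOLOv8m instance-segmentation weights as the starting point
model = YOLO("yolov8m-seg.pt")

model.train(
    data="yellow_lanes.yaml",   # hypothetical dataset config listing the 5 lane classes
    epochs=100,                 # number of epochs selected by the grid search
    batch=8,                    # batch size selected by the grid search
    imgsz=640,                  # assumed input resolution
    # lr0=...,                  # learning rate from the grid search (value omitted here)
)

metrics = model.val()           # box and mask mAP on the validation split
```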
Data augmentation is crucial for improving the robustness and performance of YOLO models. During training, we used the standard augmentation techniques provided by YOLOv8, as detailed in Table 1. However, we needed to modify the horizontal flip operation because flipping the image affects the DdLSD and DdLDS labels: after flipping the image horizontally, a DdLSD label becomes DdLDS and vice versa. To handle this, we customized the RandomFlip class in YOLOv8, as sketched below.
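The core of that customization is the label swap itself; in the actual pipeline this logic runs inside a subclass of the RandomFlip augmentation whenever a horizontal flip is applied, and the class indices below are hypothetical.

```python
import numpy as np

DDLSD, DDLDS = 3, 4    # hypothetical class indices of the two orientation-dependent classes

def swap_asymmetric_classes(cls_array):
    """Swap DdLSD and DdLDS instance labels after a horizontal image flip."""
    cls_array = cls_array.copy()
    is_lsd = cls_array == DDLSD
    is_lds = cls_array == DDLDS
    cls_array[is_lsd] = DDLDS
    cls_array[is_lds] = DDLSD
    return cls_array

print(swap_asymmetric_classes(np.array([0, 3, 4, 3])))   # -> [0 4 3 4]
```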
3.3.2. Overtaking Lane Detection
YOLOPv2 [31] is a multitask deep learning network that has demonstrated effective and efficient results in vehicle detection, drivable area segmentation, and lane segmentation. The YOLOPv2 model was inspired by the architectures of YOLOP [41] and HybridNet [42]. The main difference lies in its backbone for feature extraction and its use of three separate decoder heads to perform specific tasks, rather than a single branch for both drivable area segmentation and lane detection. The authors indicate that this modification is due to the inherent complexity of segmentation tasks; the drivable area and lane segments pose distinct challenges, which require different characteristics at the feature level. Consequently, utilizing different network structures enhances detection performance.
YOLOPv2 provides three types of output data: (1) information regarding object detection, specifically for the vehicle class, including the class number, bounding boxes, and confidence scores for each detected object; (2) a binary image resulting from the drivable area segmentation; and (3) a binary image with lane segmentation.
Figure 4 shows examples of YOLOPv2’s output data plotted on a sample image.
In our study, we used only the lane segmentation output from YOLOPv2 (Figure 5b), as this is sufficient to determine the area in which the driver is located without needing the road mask. At this stage, we did not consider vehicle detection because the focus is on evaluating the driver's movement; for this purpose, we did not check the distance between vehicles or whether there is a vehicle to the right.
We then applied post-processing to filter the lanes and retain only the lane relevant for overtaking, i.e., the yellow line on the driver’s left side if the driver is not currently overtaking. The post-processing consists of the following steps:
- (a)
The removal of small objects;
- (b)
The identification of the driver’s lane;
- (c)
The detection and disconnection of lane intersections;
- (d)
The removal of disconnected secondary lanes;
- (e)
The re-identification of the driver’s lane;
- (f)
The identification of the overtaking lane;
- (g)
The extension of the detected lane.
First, we eliminated small objects (a) that are likely to be noise rather than actual lanes by removing any objects with an area smaller than 50 pixels from the binary lane segmentation image provided by YOLOPv2.
Figure 5c illustrates an example of this step.
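A minimal sketch of this step using OpenCV connected components is shown below; the mask encoding (0/255, uint8) is an assumption.

```python
import cv2
import numpy as np

def remove_small_objects(mask, min_area=50):
    """Remove connected components smaller than min_area pixels from a binary lane mask."""
    num, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    cleaned = np.zeros_like(mask)
    for i in range(1, num):                          # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            cleaned[labels == i] = 255
    return cleaned
```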
To identify the driver's lane (b), we first delineated the contours of each segmented object. We then applied the orthogonal distance regression model [43] to determine the slope of each detected boundary. Next, we identified which lanes with positive and negative slopes were closest to the center of the image, thus obtaining the most representative pair. Subsequently, we removed the remaining objects.
Figure 5d displays the result of this step.
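The slope estimation can be sketched with SciPy's orthogonal distance regression as below; the toy contour points are illustrative only.

```python
import numpy as np
from scipy import odr

def lane_slope(xs, ys):
    """Fit y = a*x + b by orthogonal distance regression and return the slope a."""
    def linear(params, x):
        return params[0] * x + params[1]

    fit = odr.ODR(odr.Data(xs, ys), odr.Model(linear), beta0=[1.0, 0.0]).run()
    return fit.beta[0]

# Toy boundary: y decreases as x increases (image y grows downward), giving a negative slope
xs = np.array([100.0, 150.0, 200.0, 250.0])
ys = np.array([400.0, 350.0, 300.0, 250.0])
print(lane_slope(xs, ys))   # ≈ -1.0
```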
However, we observed that in some cases there could be unwanted connections between the driver's lane and the road lanes, which prevents the previous step from achieving the desired effect. To detect and disconnect the lane intersections (c), we first identify each segmented object in the image and locate its intersection point. Next, we select another point at the highest part of the object. A line with a thickness of 3 pixels is drawn to connect these two points, effectively separating both lanes (Figure 5e).
After disconnecting the objects, we remove the disconnected road lane (d). In this step, we identify the objects closest to the center of the image and eliminate the pixels of those that are further away (Figure 5f).
After the previous step, some unwanted noise may persist. To eliminate these remnants, we re-identify the driver’s lane (e), repeating step (b) to ensure accurate detection. While this process may seem redundant, it is crucial at both stages to minimize the risk of erroneously removing the lane of interest.
Figure 5g illustrates the results of this step.
To determine the overtaking lane (f), we use the lane segmentation image from YOLOPv2 alongside the result of the previous step. The YOLOPv2 image contains lane detections along the road, while the post-processed image includes only the driver's lane. We calculate the average x-coordinate of the segmented boundaries for both images and compare these values. If they are identical, we treat the driver and the highway lanes as the same, which prevents us from determining the overtaking lane; in this case, we obtain an image showing both lanes. If the center of the driver's lane is greater than that of the highway lane, we remove the lane on the right; otherwise, we remove the lane on the left (Figure 5h).
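The sketch below illustrates one way to implement this comparison, assuming both masks are binary NumPy arrays and that the two boundaries of the driver's lane are separate connected components; it approximates the rule described above rather than reproducing the exact implementation.

```python
import cv2
import numpy as np

def keep_overtaking_lane(all_lanes_mask, driver_pair_mask):
    """Keep only the boundary of the driver's lane pair on the overtaking side."""
    center_all = np.where(all_lanes_mask > 0)[1].mean()
    center_driver = np.where(driver_pair_mask > 0)[1].mean()
    if center_driver == center_all:
        return driver_pair_mask                      # undecidable: keep both boundaries

    num, labels, _, centroids = cv2.connectedComponentsWithStats(driver_pair_mask)
    comps = sorted(range(1, num), key=lambda i: centroids[i, 0])   # left-to-right by mean x
    keep = comps[0] if center_driver > center_all else comps[-1]
    return np.where(labels == keep, 255, 0).astype(driver_pair_mask.dtype)
```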
Since some lanes might appear as short segments, we perform the extension of detected lanes (g). We start by identifying the contour of the lane from the previous step and apply the orthogonal distance regression model [43] to determine the line's slope. Based on this slope, we draw a line with a thickness of 7 pixels extending from the bottom edge of the image to the highest point of the contour, limited to half the height of the image (Figure 5i).
3.4. Analysis of Illegal Overtaking Incidents on Continuous Lane Markings
In the proposed approach, we used YOLOv8 to detect yellow lanes frame by frame from the video. In any frame where a continuous lane is detected, we applied YOLOPv2 followed by post-processing to identify the overtaking lane. If the segmentations intersect, we retain the detection; otherwise, we discard it.
We applied the orthogonal distance regression model [43] to calculate the slope of the lane. Positive slope values suggest that the driver is on the left side of the lane, which could indicate an irregularity. However, we cannot draw any conclusions from a single frame. To address this, we stored the data from each frame in a Pandas DataFrame, allowing us to perform validation based on a broader range of information.
After processing the entire video and generating the complete DataFrame, we apply the following operations to reduce potential noise and improve the accuracy in correctly identifying overtaking irregularities:
- (a)
Removal of short sequences: a positive slope of detected lanes must be present for a minimum sequence of 50 frames. If the sequence length is less than this value, it is not considered an overtaking maneuver;
- (b)
Connecting nearby sequences: If the gap between two sequences is less than 20 frames, we connect them. This decision is based on the understanding that short gaps may be due to noise or missed detections of certain lanes;
- (c)
Assignment of the most common class using a sliding window: To accurately identify the lane in which the driver is during an overtaking maneuver, we determine the predominant class within a sliding window of 20 frames. This approach smooths detections and mitigates temporary fluctuations, ensuring a more stable and precise representation of the lane where overtaking occurs.
We empirically determined the frame quantities used in the operations described above. It is important to note that the videos used in our experiments have a frame rate of 30 frames per second (FPS). After filtering the data, we can identify the initial and final moments of overtaking in continuous lanes, ensuring a more accurate analysis by relying on temporal information rather than individual frames alone.
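The sketch below illustrates the three operations on the per-frame DataFrame; the column names (positive_slope, lane_class) are hypothetical, and lane_class is assumed to be integer-encoded.

```python
import pandas as pd

MIN_RUN, MAX_GAP, WINDOW = 50, 20, 20            # frame thresholds used above (30 FPS video)

def runs(flags):
    """Yield (start, end) frame-index pairs of consecutive True values."""
    start = None
    for i, v in enumerate(flags):
        if v and start is None:
            start = i
        elif not v and start is not None:
            yield start, i - 1
            start = None
    if start is not None:
        yield start, len(flags) - 1

def temporal_filter(df):
    sequences = list(runs(df["positive_slope"].tolist()))

    # (a) remove sequences shorter than MIN_RUN frames
    sequences = [(s, e) for s, e in sequences if e - s + 1 >= MIN_RUN]

    # (b) connect sequences separated by fewer than MAX_GAP frames
    events = []
    for s, e in sequences:
        if events and s - events[-1][1] < MAX_GAP:
            events[-1] = (events[-1][0], e)
        else:
            events.append((s, e))

    # (c) most common lane class within a sliding window of WINDOW frames
    df = df.copy()
    df["lane_class_smooth"] = (
        df["lane_class"]
        .rolling(WINDOW, min_periods=1)
        .apply(lambda w: w.mode().iloc[0])
    )
    return events, df
```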
3.5. Evaluation Metrics
We used precision, recall, F1-score, and Mean Average Precision (mAP) to quantitatively assess the model’s performance. To clarify these metrics, the key parameters are defined as follows: True positive (TP) refers to instances where the model correctly identifies a positive case. False positive (FP) occurs when the model incorrectly identifies a case as positive. False negative (FN) represents instances where the model fails to detect a positive case, instead classifying it as negative.
Precision (P) represents the proportion of TPs among all cases predicted as positive. In simpler terms, it measures how many of the model's positive predictions are correct. Equation (1) defines the formula for calculating precision:
$P = \frac{TP}{TP + FP}$ (1)
Recall (R) measures the proportion of TPs among all actual positive instances. It evaluates how well the model identifies all relevant positive cases. Recall is calculated using Equation (2):
$R = \frac{TP}{TP + FN}$ (2)
The F1-score (F) combines precision (P) and recall (R) into a single metric and is calculated as shown in Equation (3):
$F = \frac{2 \cdot P \cdot R}{P + R}$ (3)
The Average Precision (AP), also applied in [44], measures a model's performance for a specific category by building a precision–recall curve. This curve is created by plotting precision values (y-axis) against recall values (x-axis) at various confidence thresholds. The AP is calculated as the area under this curve, summarizing how well the model identifies that category across all thresholds. Mathematically, Equation (4) expresses this relationship:
$AP = \sum_{k=1}^{K} \left[ R(k) - R(k-1) \right] P(k)$ (4)
where $P(k)$ is the precision at threshold $k$, $R(k)$ is the recall at threshold $k$, and $K$ is the number of thresholds.
Mean Average Precision (mAP) is the average of the AP values across multiple categories. It provides a comprehensive measure of the model's overall accuracy and effectiveness in object recognition by summarizing performance across various classes and thresholds. The mAP is computed as in Equation (5):
$mAP = \frac{1}{n} \sum_{k=1}^{n} AP_k$ (5)
where $n$ is the number of categories and $AP_k$ is the AP for the $k$-th category. For our case, $n = 5$, corresponding to the five types of lanes.
In the results, we use the symbols mAP50 and mAP50-95, where mAP50 computes mAP at an Intersection over Union (IoU) threshold of 50%, and mAP50-95 averages mAP over IoU thresholds ranging from 50% to 95%, giving detailed information on performance at varying levels of localization strictness.
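For completeness, the helpers below sketch how these quantities are computed from TP/FP/FN counts and per-class AP values; the numbers in the example are illustrative and are not results from this study.

```python
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0

def mean_average_precision(per_class_ap):
    # Average of the per-class AP values (n = 5 lane classes in this study)
    return sum(per_class_ap) / len(per_class_ap)

p, r = precision(90, 10), recall(90, 5)
print(round(p, 3), round(r, 3), round(f1_score(p, r), 3))        # 0.9 0.947 0.923
print(mean_average_precision([0.91, 0.88, 0.95, 0.90, 0.76]))    # ≈ 0.88
```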
4. Results
This section presents the experimental results, which evaluate the effectiveness of different models and the detection of illegal overtaking with the proposed approach. We carried out the tests on an NVIDIA GeForce RTX 2070 SUPER with 8 GB of VRAM.
4.1. Yellow Line Detection
In this section, we assessed the performance of various instance segmentation models using pre-trained YOLOv8.
Table 2 presents the results achieved with the best hyperparameters, as detailed in Section 3.3.1.
The results in Table 2 show that the YOLOv8m model performs well overall in detecting and segmenting most types of yellow lanes, but there are notable differences in its effectiveness across classes.
The model demonstrates a high precision, recall, and Mean Average Precision (mAP) for the DdSL and DdLDS classes, indicating its effectiveness in detecting these lane types, which exhibit distinct visual patterns and are sufficiently represented in the dataset. The DdLDS class, in particular, achieves perfect precision in mask segmentation, reflecting a very low rate of FPs.
On the other hand, the SSL class exhibits the lowest performance, with lower precision, recall, and mAP values. The subtle appearance and underrepresentation of these lanes in the dataset suggest challenges in detecting and segmenting them. To improve the model’s performance for this class, increasing the number of SSL instances in the training data or adjusting hyperparameters could be effective solutions.
The model shows moderate results for the SDL and DdLSD classes, achieving reasonable precision and recall; however, it achieves slightly lower mAP scores at more challenging Intersection over Union (IoU) thresholds (mAP50-95).
The results demonstrate the model’s promising performance in detecting more common lane types, such as DdSL and DdLDS, with high precision and recall. However, the lower scores for SSL indicate potential for improvement. Further fine-tuning or incorporating more diverse training data could help the model better handle variations in lane appearance and challenging environmental conditions.
4.2. Experiments
We used YOLOv8 to segment yellow road lines (Experiment 1) and determine whether drivers can overtake. We trained YOLOv8 to recognize five types of lane markings: SDL, SSL, DdSL, DdLSD, and DdLDS. SDL and DdLSD lane markings allow overtaking, while SSL, DdSL, and DdLDS markings prohibit overtaking and indicate illegal maneuvers.
Figure 6 shows an example of detection for each possible yellow line on the road or highway. The images on the left display the original frames, and those on the right show the detection overlaid on the corresponding frames.
However, we noticed that YOLOv8 sometimes mistakenly identifies white markings and yellow boxes as single solid yellow lines. It also classifies white roadside markings as either double or single solid lanes. Additionally, the system often misinterprets parking markings as single solid lanes. As a result, in these situations—where the detected marking is to the driver's right—the system may incorrectly report an illegal overtaking maneuver.
Figure 7 illustrates some of these cases.
To minimize these errors, we evaluated a second approach: Experiment 2. In this approach, we applied a filter when YOLOv8 detects a road marking that could suggest an illegal overtaking opportunity, such as the SSL, DdSL, or DdLDS types. This filter uses YOLOPv2 for lane detection and incorporates additional post-processing, as detailed in Section 3.3.2. After identifying the lane involved in overtaking, we check for overlap with the segment detected by YOLOv8. If an overlap exists, we retain the segmentation; if not, we discard it, signaling a potential detection error.
Figure 8 presents an example of illegal overtaking. On the left, our approach (YOLOPv2 with post-processing) successfully identifies the overtaking lane, highlighting it in green over the original image, with the YOLOv8 detection shown in red. On the right, however, a bus occludes the lane marking that defines the highway boundary, preventing YOLOPv2 from detecting it. As a result, the post-processing algorithm fails to determine the correct overtaking lane, as the calculated distance between the edges of the lane and the highway boundary is the same in this case. Despite this limitation, the overall results remain largely unaffected, and lane filtering still significantly reduces the number of FPs.
The implementation of the overtaking lane detection method notably reduces the FPs produced by YOLOv8. However, some FP cases persist, especially in areas that lack continuous lane markings, such as residential zones with predominantly white street markings or rural areas without lane markings. As a result, we decided to focus our analysis exclusively on instances when the vehicle travels on highways or major roads.
Given this, we conducted Experiment 3, where we first checked the geolocation information. If the driver is on a two-way highway (under the conditions described in Section 3.2), we extracted the intervals of interest. We then applied the processing described in Experiment 2 exclusively to these intervals. This initial data filtering significantly optimizes the processing time and improves the accuracy of detecting violations related to illegal overtaking.
An important point is that the current research utilized real traffic videos provided by CEMIG. Our dataset comprised videos of actual routes recorded in various environments, including residential areas, single- and dual-direction highways, and rural areas, captured at different times of day and under varying weather conditions using in-vehicle cameras in cars and motorcycles. Since the CEMIG dataset contained only a limited number of infractions, we had to conduct additional simulations to assess our method’s effectiveness in scenarios with more violations. As a result, we simulated cases involving traffic violations, such as overtaking in continuous lanes, to assess our approach.
Table 3 presents the distribution of these videos between simulated infractions (conducted in real scenarios and intentionally executed) and routine videos from CEMIG, along with the number of infractions and the total hours analyzed for each case.
Table 4 presents the results from the three experiments conducted with this dataset. The results demonstrate progressive improvements in lane detection by addressing key challenges related to FPs and contextual accuracy.
The analysis of Experiment 1 reveals several challenges faced by YOLOv8 in detecting yellow lane markings across different datasets. The model struggled with various issues: the complexity and variability of lane markings in residential areas, false detections in rural settings, and misclassifications on single- and dual-lane roads. Factors such as motion blur due to vehicle movement, camera instability caused by road conditions, varying lighting, and the poor quality of lane markings further complicate detection.
For the CEMIG data, which represent a diverse range of real-world scenarios, the precision was notably low, at 0.312. Despite achieving a high recall of 1.000, the F1-score was relatively low, at 0.476, indicating that although the model successfully identifies nearly all true lane markings (high recall), it also includes many FPs (low precision), resulting in a lower F1-score. In contrast, the simulated dataset, which also includes real-world scenarios but is predominantly composed of highway scenarios, achieved a higher precision of 0.600 and a better F1-score of 0.750. This comparison highlights that YOLOv8 performs better in scenarios with less variability, such as highways, but may face challenges in more complex environments where the diversity of objects with characteristics similar to lane markings increases the likelihood of confusing the model.
Experiment 2 implemented a filtering mechanism that combined YOLOPv2 with post-processing to verify if the detected lane markings were correctly identified as overtaking lanes. This method notably enhanced the precision, achieving a score of 0.714 for the CEMIG dataset. However, the recall for the simulation dataset saw a slight reduction to 0.917. This decrease was due to an FN, where a lane detected by YOLOv8 was not matched with the overtaking lane identified by YOLOPv2 and was therefore excluded. Despite this, the overall F1-score improved to 0.889, indicating a more effective balance between accurately identifying true violations and reducing false detections. Lane type matching played a crucial role in refining the results, although challenges remained in accurately identifying lanes in more complex scenarios.
Experiment 3 improved the methodology by incorporating geolocation data and focusing detection on two-way highways, where overtaking lanes are relevant. This contextual filtering effectively reduced FPs from residential, rural, and single-direction roads, improving the precision to 1.000 across both datasets. The overall F1-score increased to 0.970, indicating that integrating the geographic context significantly enhanced the model’s accuracy in detecting relevant scenarios and reduced false detections.
Figure 9 presents a few examples of correct detections of illegal overtaking, illustrating the performance under favorable conditions and challenging environments, such as poor lighting or worn lane markings.
5. Discussion
Detecting illegal overtaking poses a significant challenge in computer vision, especially when using basic smartphone cameras. The difficulty increases under adverse weather conditions and low-light environments. In this study, we propose a method to identify illegal overtaking by analyzing videos recorded using smartphones and combining them with geolocation data. The approach employs two YOLO models: one for detecting lane types and another for identifying overtaking lanes. To enhance accuracy, we applied several post-processing techniques. By integrating detection areas, the system analyzes temporal patterns to determine whether the driver is in an illegal overtaking zone.
We conducted three distinct experiments using our method to analyze the impact of the proposed techniques on the achieved results. The progression through the experiments highlights how each step addressed specific limitations of the previous approach. Experiment 1 faced challenges due to diverse environments, which we addressed by introducing lane-type filtering in Experiment 2. We further enhanced the method by adding geolocation context in Experiment 3. These advancements demonstrate a clear improvement in detecting relevant lane markings and reducing false positives.
In the study proposed by [18], the authors achieved precision and recall values of 0.900 and 0.928, respectively. When comparing these with the results from our study, we observe that in Experiments 2 and 3, we achieved superior precision and recall values. In Experiment 3, we achieved a precision of 1.000 and a recall of 0.941 across the entire dataset.
Similarly, when examining the study by [29], which reported a precision of 0.920 and a recall of 0.890, Experiment 3 also demonstrates superiority in both metrics, culminating in an F1-score of 0.970 and thus exceeding the results obtained by [29].
Regarding the use of sensors and cameras, works utilizing monocular cameras and combined sensors, such as those by [15,16], highlight significant challenges in scenarios with considerable variability. Night-time conditions often lead to a loss of color information, while adverse weather can significantly impair sensor performance, complicating overtaking detection. We encounter similar challenges, particularly regarding lane changes in residential or rural areas. The results from Experiment 1, specifically within the CEMIG dataset, yielded a precision of 0.312, reflecting the difficulties experienced in these conditions. Additionally, the quality and quantity of training data heavily influence model effectiveness, making dataset collection a substantial challenge, as noted by [27].
While recent advancements have made strides in illegal overtaking detection and lane recognition, several challenges persist. Many models focus on optimizing performance for ideal conditions with good visibility and favorable weather, which limits their effectiveness in adverse situations, such as low light or poor weather, particularly with smartphone cameras [28,29]. Furthermore, limited training data reduces a model's ability to generalize across different environments, camera positions, and driver behaviors. This limitation can significantly affect performance in various traffic scenarios [27].
Another critical gap is the integration of multiple sensors to improve environmental perception. While some studies explore combining cameras with LiDAR [31], others suggest incorporating additional sensors like radar, sonar, and RGB-D data to enhance overtaking detection, especially in adverse conditions [28]. Despite promising advancements in the models of [16,27], the real-time analysis of vehicle behavior, such as speed and proximity, remains a challenge.
In addition to the challenges posed by varying environmental conditions, we also encountered difficulties in differentiating between the colors of the lines, which could be either yellow or white. Initially, we annotated both colors with all marking types (SDL, SSL, DdSL, DdLSD, and DdLDS) and trained the model. However, the model struggled to correctly differentiate these markings, likely due to factors like lighting conditions during capture and the influence of the surrounding terrain. For example, the reddish soil near the road, along with vehicle flow, could distort the perception of lane boundaries, even for human observers. In Brazil, white lanes typically mark the boundaries between the highway and the roadside, separate lanes in residential areas, or delineate lanes on one-way highways. To address this issue, we focused solely on yellow lanes for training, as they provided a more distinct differentiation given the specific conditions of our dataset.
Our method addresses these gaps by proposing a more accessible solution using smartphone cameras, eliminating the need for depth sensing or additional sensors. While more challenging for detection, this solution is more scalable and cost-effective, offering significant precision in real-world traffic environments. Despite the challenges posed by adverse conditions, our method maintains high accuracy in detecting illegal overtaking violations, demonstrating its robustness and potential for broader applicability.
To enhance the model's effectiveness across varied road conditions, countries, and traffic rules, future refinement should focus on improving accuracy without excessive reliance on filtering techniques. Furthermore, to ensure broader applicability, incorporating images with white lane markings, particularly from regions where white lanes indicate overtaking zones, would allow the model to generalize more effectively across different scenarios.
6. Conclusions
In conclusion, this study demonstrates significant advancements in lane change detection, offering practical implications for vehicle auditing and road safety. By identifying overtaking violations, our method provides valuable insights into driver behavior, supporting targeted interventions to improve adherence to traffic rules and enhance road safety.
The literature review revealed that previous methods often relied on complex and costly sensor setups, such as LiDAR, radar, and RGB-D data, or focused solely on lane detection without categorization. While effective in test environments, these approaches faced scalability challenges due to high costs and infrastructure requirements. In contrast, our method leverages low-cost smartphone cameras, eliminating the need for depth sensing or additional sensors. Despite the increased detection complexity, this approach remains scalable and cost-effective, achieving high accuracy in lane change detection.
The developed method effectively detects illegal overtaking; however, it is most applicable in countries where yellow markings indicate overtaking lanes. It also performs well on highways, whereas urban environments, with their complexity, present additional challenges and increase the likelihood of false positives. A further aspect to consider is the dataset's variability in weather conditions and times of day; however, accuracy on night-time videos could not be assessed due to the lack of data under such conditions, as our data source does not include this information.
For future work, we would like to highlight the following areas for improvement:
1. Incorporating night-time footage and challenging weather conditions, such as snow, into the training dataset to improve the model's robustness.
2. Expanding the training dataset to include global data and, specifically, images of highways worldwide, covering different overtaking lane marking colors (including both yellow and white markings).
3. Training the models by incorporating the data from items 1 and 2. Additionally, we may need to adjust the system to handle these new data, and we should collect more violation videos, particularly those depicting traffic violations under various climatic conditions, lighting scenarios, and at night.
4. Modifying the method to enable real-time processing, allowing integration with in-vehicle systems that provide immediate alerts to drivers about overtaking violations.
The results presented in this paper and the proposed directions for future work highlight the role of the developed system integrated with vehicle auditing systems in improving driving behavior and road safety. This system has the potential to help reduce traffic violations and, consequently, decrease the risk of accidents on the road.