Leveraging Perspective Transformation for Enhanced Pothole Detection in Autonomous Vehicles
Abstract
1. Introduction
1.1. Pothole Detection Importance for Autonomous Vehicles
1.2. Human Response to Potholes
1.3. Pothole Detection Methods
1.4. Challenges in Pothole Detection
1.5. Proposed Method: Vision-Based Pothole Detection Using Perspective Transformation
- We developed an automated perspective transformation algorithm that selects the ROI (the street area containing potholes from the vehicle’s perspective) and generates a corresponding transformation matrix. The matrix warps the image, excluding irrelevant areas like the sky, while enhancing pothole features by making distant ones appear closer for easier detection.
- Using this matrix to transform the input data before model inference significantly boosts the accuracy and robustness of pothole detection with limited impact on runtime. To the best of our knowledge, this is the first study to utilize perspective transformation in this manner to enhance pothole detection performance.
- We propose an intuitive evaluation strategy to assess the pothole detection models’ performance across potholes at three distance ranges (near, medium, and far), demonstrating the potential of our approach in improving pothole detection at far distances.
2. Related Work
2.1. Vision Approaches
2.2. Perspective Transformation Technique
3. Methodology
3.1. Framework Overview
3.1.1. Perspective Transformation Motivation
3.1.2. YOLOv5: Key Features and Functionality
3.2. Automated Perspective Transformation Algorithm
- a11, a12, a13: Affect the x-coordinate of the transformed point.
- a21, a22, a23: Affect the y-coordinate of the transformed point.
- a31, a32, a33: Homogenization factors (usually, a33 is set to 1).
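For reference, these coefficients form the standard 3 × 3 perspective (homography) matrix applied in homogeneous coordinates, so a source point (x, y) maps to a transformed point (x′, y′) as:

```latex
\begin{bmatrix} u \\ v \\ s \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix},
\qquad
x' = \frac{u}{s} = \frac{a_{11}x + a_{12}y + a_{13}}{a_{31}x + a_{32}y + a_{33}},
\quad
y' = \frac{v}{s} = \frac{a_{21}x + a_{22}y + a_{23}}{a_{31}x + a_{32}y + a_{33}}
```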
Algorithm 1 Automatic perspective transformation for images and bounding boxes.
Input: Images, ground truth labels, ROI offset α
Output: Transformed images and bounding boxes, perspective transformation matrix M
Initialize lists
1: Initialize lists: X_min, Y_min, W, H, X_max, Y_max
▹ X_min, Y_min: top-left corner coordinates
▹ W, H: width and height of bounding boxes
▹ X_max, Y_max: bottom-right corner coordinates
Read bounding boxes
2: for each image and its label file do
3: Read bounding box (x_min, y_min, w, h) per labeled object in the image
4: x_max ← x_min + w, y_max ← y_min + h
5: Append x_min, y_min, w, h, x_max, y_max to the respective lists
6: end for
Calculate ROI offsets
7: offset_x ← α · max(W), offset_y ← α · max(H)
Determine ROI corners then define source points
8: x_left ← min(X_min) − offset_x
9: x_right ← max(X_max) + offset_x
10: y_top ← min(Y_min) − offset_y
11: y_bottom ← max(Y_max) + offset_y
12: Source points P_src ← {(x_left, y_top), (x_right, y_top), (x_left, y_bottom), (x_right, y_bottom)}
Clip ROI corners to be within image boundaries
13: for each (x, y) in P_src do
14: if x < 0 then
15: x ← 0
16: else if x > image width then
17: x ← image width
18: end if
19: if y < 0 then
20: y ← 0
21: else if y > image height then
22: y ← image height
23: end if
24: end for
Define target points based on image dimensions
25: Target points P_dst ← {(0, 0), (image width, 0), (0, image height), (image width, image height)}
Calculate perspective transformation matrix
26: M = getPerspectiveTransform(P_src, P_dst)
Transform images and bounding boxes
27: for each image and its label file do
28: Transform the image using M
29: Transform the image’s bounding box coordinates using M
30: Save the transformed images and bounding boxes
31: end for
32: return M
- Initialize lists: Store coordinates and dimensions of bounding boxes for all images, including minimum and maximum x and y coordinates, width, and height for each bounding box.
- Read bounding boxes: Extract bounding box data from each image’s corresponding label file, calculate all ROI boundary points, and update the respective lists.
- Calculate offsets: Determine the ROI offsets using the ROI offset factor α and the maximum width and height of all bounding boxes, defining a slightly larger ROI.
- Determine ROI corners: Use the minimum and maximum coordinates from the lists, along with the calculated offsets, to determine the corners of the ROI. These corners are the source points (P_src) for the perspective transformation.
- Clip ROI corners: Ensure ROI corners stay within image boundaries.
- Define target points: Set target points (P_dst) based on the image dimensions, representing the transformed image corners.
- Calculate transformation matrix: Compute the perspective transformation matrix M using the source and target points. This matrix is used to transform the coordinates of the ROI to the new perspective.
- Transform images and bounding boxes: Apply the transformation matrix M to each image and its bounding boxes. This involves transforming the image and adjusting the bounding box coordinates accordingly. The transformed images and bounding boxes are then saved (see the code sketch after this list).
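As a concrete illustration of these steps, the following is a minimal Python/OpenCV sketch. It assumes one pixel-space “x_min y_min width height” box per line in each label file; the function name and label parsing are illustrative assumptions, not the authors’ released code.

```python
import cv2
import numpy as np

def auto_perspective_transform(image_paths, label_paths, alpha=0.1):
    """Illustrative sketch of Algorithm 1: derive a single perspective
    transformation matrix M from all labeled pothole bounding boxes,
    then warp every image and its boxes with M."""
    x_mins, y_mins, x_maxs, y_maxs, widths, heights = [], [], [], [], [], []

    # Read bounding boxes and collect ROI boundary points over the whole set.
    for label_path in label_paths:
        for line in open(label_path):
            x, y, w, h = map(float, line.split())
            x_mins.append(x); y_mins.append(y)
            x_maxs.append(x + w); y_maxs.append(y + h)
            widths.append(w); heights.append(h)

    # Enlarge the ROI slightly using the offset factor alpha.
    off_x = alpha * max(widths)
    off_y = alpha * max(heights)

    img_h, img_w = cv2.imread(image_paths[0]).shape[:2]

    # ROI corners (clipped to the image) become the source points.
    x_left = np.clip(min(x_mins) - off_x, 0, img_w)
    x_right = np.clip(max(x_maxs) + off_x, 0, img_w)
    y_top = np.clip(min(y_mins) - off_y, 0, img_h)
    y_bot = np.clip(max(y_maxs) + off_y, 0, img_h)
    src = np.float32([[x_left, y_top], [x_right, y_top],
                      [x_left, y_bot], [x_right, y_bot]])

    # Target points: the full image corners.
    dst = np.float32([[0, 0], [img_w, 0], [0, img_h], [img_w, img_h]])
    M = cv2.getPerspectiveTransform(src, dst)

    # Warp every image and remap its bounding boxes with M.
    for img_path, label_path in zip(image_paths, label_paths):
        img = cv2.imread(img_path)
        warped = cv2.warpPerspective(img, M, (img_w, img_h))
        boxes = np.loadtxt(label_path, ndmin=2)  # (N, 4): x, y, w, h
        corners = np.float32([[[x, y], [x + w, y], [x, y + h], [x + w, y + h]]
                              for x, y, w, h in boxes]).reshape(-1, 1, 2)
        warped_pts = cv2.perspectiveTransform(corners, M).reshape(-1, 4, 2)
        new_xy = warped_pts.min(axis=1)                    # new top-left corners
        new_wh = warped_pts.max(axis=1) - new_xy           # new widths and heights
        new_boxes = np.concatenate([new_xy, new_wh], axis=1)
        # ... save `warped` and `new_boxes` in the original label format
    return M
```

Estimated once over the labeled set, M can then be reused to warp incoming frames before model inference.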
4. Experiment Design
4.1. Evaluation Dataset
4.2. Comparison Methods
4.3. Evaluation Metrics
4.4. Evaluation Strategy
- Far:
- Medium:
- Near:
- Far:
- Medium:
- Near:
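A minimal sketch of such range-based grouping is given below. It bins each bounding box by its bottom-edge row in the original image (boxes lower in the frame are treated as nearer); the FAR_Y and NEAR_Y thresholds and the bottom-edge criterion are placeholder assumptions for illustration, not the ranges defined in this section.

```python
from typing import Dict, List, Tuple

# Hypothetical row thresholds (pixels from the top of the frame);
# the actual near/medium/far ranges are defined in Section 4.4.
FAR_Y, NEAR_Y = 600, 800

Box = Tuple[float, float, float, float]  # (x, y, w, h)

def distance_bin(box: Box) -> str:
    """Assign a box to a distance bin using its bottom edge row."""
    bottom = box[1] + box[3]
    if bottom < FAR_Y:
        return "far"
    if bottom < NEAR_Y:
        return "medium"
    return "near"

def split_by_distance(boxes: List[Box]) -> Dict[str, List[Box]]:
    """Group boxes into near/medium/far subsets so AP/AR can be
    computed separately for each range."""
    bins: Dict[str, List[Box]] = {"near": [], "medium": [], "far": []}
    for box in boxes:
        bins[distance_bin(box)].append(box)
    return bins
```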
4.5. Implementation Settings
5. Results and Discussion
5.1. Experiment 1: Naive vs. Fixed Cropping vs. Automated Transformation Approach
5.2. Experiment 2: Effects of Network Complexity/Scale on Performance
5.3. Experiment 3: Ablation Study
- YOLOv5’s augmentations only without negative images: This setup utilized only YOLOv5’s augmentation step without negative images, as illustrated in the first row of Table 3.
- YOLOv5’s augmentations with negative images: This setup included negative images alongside YOLOv5’s augmentation step, shown in the second row of Table 3.
- Manual preprocessing and YOLOv5’s augmentations without negative images: This configuration combined manual preprocessing augmentations with YOLOv5’s augmentations, using only positive images. It achieved the best results among all setups.
- Manual preprocessing and YOLOv5’s augmentation with negative images: This setup used both manual and YOLOv5’s augmentations, incorporating negative images into the positive dataset. It resulted in the lowest performance metrics.
5.4. Computational Latency Analysis
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Approach | Pothole Distance | AP50:95 (%) | AP50 (%) | AP75 (%) | ARmax=1 (%) | ARmax=10 (%) |
|---|---|---|---|---|---|---|
| Image as Is | All | 19.8 | 47.7 | 12.8 | 17.1 | 25.8 |
| | Near | 23.6 | 55.0 | 15.1 | 20.7 | 29.9 |
| | Medium | 19.5 | 48.4 | 13.5 | 21.6 | 26.6 |
| | Far | 6.8 | 18.8 | 3.1 | 10.0 | 11.5 |
| Bottom Cropped | All | 18.2 | 46.1 | 10.2 | 15.9 | 23.7 |
| | Near | 21.4 | 51.7 | 12.2 | 19.3 | 27.3 |
| | Medium | 18.0 | 46.1 | 11.1 | 20.6 | 25.4 |
| | Far | 5.5 | 18.2 | 1.5 | 7.3 | 9.0 |
| Double Cropped | All | 17.5 | 46.8 | 9.3 | 15.6 | 23.0 |
| | Near | 21.7 | 58.6 | 11.0 | 19.4 | 27.6 |
| | Medium | 17.5 | 47.9 | 10.5 | 20.0 | 23.9 |
| | Far | 11.2 | 27.9 | 5.1 | 11.7 | 14.2 |
| Auto Transformation | All | 28.4 | 61.9 | 22.1 | 21.6 | 35.2 |
| | Near | 31.7 | 64.5 | 26.5 | 26.2 | 38.5 |
| | Medium | 31.7 | 68.8 | 26.0 | 31.9 | 38.8 |
| | Far | 20.0 | 50.0 | 12.0 | 18.8 | 27.4 |
| Approach | Object Detection Model | Parameters (M) | FLOPs (G) | Pothole Distance | AP50:95 (%) | AP50 (%) | AP75 (%) | ARmax=1 (%) | ARmax=10 (%) |
|---|---|---|---|---|---|---|---|---|---|
| Image As Is | YOLOv5-small | 7.2 | 16.5 | All | 19.8 | 47.7 | 12.8 | 17.1 | 25.8 |
| | | | | Near | 23.6 | 55.0 | 15.1 | 20.7 | 29.9 |
| | | | | Medium | 19.5 | 48.4 | 13.5 | 21.6 | 26.6 |
| | | | | Far | 6.8 | 18.8 | 3.1 | 10.0 | 11.5 |
| | YOLOv5-medium | 21.2 | 49.0 | All | 21.7 | 50.3 | 14.6 | 17.7 | 27.4 |
| | | | | Near | 25.0 | 55.1 | 18.1 | 21.1 | 31.1 |
| | | | | Medium | 21.9 | 51.8 | 15.1 | 22.9 | 29.3 |
| | | | | Far | 7.4 | 25.0 | 1.7 | 9.6 | 12.3 |
| | YOLOv5-large | 46.5 | 109.1 | All | 21.7 | 50.5 | 14.2 | 17.5 | 27.7 |
| | | | | Near | 25.4 | 55.7 | 18.4 | 21.2 | 31.6 |
| | | | | Medium | 21.6 | 54.9 | 11.6 | 22.9 | 29.9 |
| | | | | Far | 6.7 | 20.5 | 2.8 | 9.8 | 12.0 |
| Auto Transformation | YOLOv5-small | 7.2 | 16.5 | All | 28.4 | 61.9 | 22.1 | 21.6 | 35.2 |
| | | | | Near | 31.7 | 64.5 | 26.5 | 26.2 | 38.5 |
| | | | | Medium | 31.7 | 68.8 | 26.0 | 31.9 | 38.8 |
| | | | | Far | 20.0 | 50.0 | 12.0 | 18.8 | 27.4 |
| | YOLOv5-medium | 21.2 | 49.0 | All | 28.0 | 61.6 | 22.4 | 21.5 | 34.5 |
| | | | | Near | 31.1 | 62.9 | 28.3 | 25.9 | 38.6 |
| | | | | Medium | 30.7 | 67.9 | 24.5 | 30.1 | 37.6 |
| | | | | Far | 19.3 | 47.9 | 12.2 | 18.8 | 25.9 |
| | YOLOv5-large | 46.5 | 109.1 | All | 28.6 | 60.2 | 24.5 | 22.0 | 35.3 |
| | | | | Near | 32.0 | 63.8 | 29.6 | 25.9 | 38.5 |
| | | | | Medium | 30.9 | 65.0 | 27.7 | 29.7 | 37.3 |
| | | | | Far | 20.8 | 47.6 | 13.8 | 21.0 | 29.1 |
| Preproc. Augs. | Neg. Images | AP50:95 (%) | AP50 (%) | AP75 (%) | ARmax=1 (%) | ARmax=10 (%) |
|---|---|---|---|---|---|---|
| | | 27.1 | 59.7 | 20.1 | 19.6 | 34.6 |
| | ✓ | 26.2 | 57.3 | 18.9 | 20.5 | 32.1 |
| ✓ | | 28.4 | 61.9 | 22.1 | 21.6 | 35.2 |
| ✓ | ✓ | 25.8 | 55.1 | 19.0 | 19.8 | 31.3 |
| Object Detection Model | Pre-Processing (ms) | Perspective Transformation (ms) | Inference (ms) | Post-Processing (ms) | Total Time (ms) |
|---|---|---|---|---|---|
| YOLOv5-small | 0.6 | 14.4 | 7.2 | 1.2 | 23.4 |
| YOLOv5-medium | 0.6 | 14.4 | 14.3 | 1.4 | 30.7 |
| YOLOv5-large | 0.6 | 14.4 | 25.5 | 1.3 | 41.8 |
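A per-frame timing breakdown of this kind can be gathered with a simple wall-clock loop. The sketch below is only illustrative: the torch.hub model loading, file paths, and averaging setup are assumptions, not the measurement procedure used here.

```python
import time
import cv2
import numpy as np
import torch

# Assumed setup: a YOLOv5 checkpoint loaded through torch.hub and a fixed
# perspective matrix M produced by the automated transformation algorithm.
model = torch.hub.load("ultralytics/yolov5", "yolov5s")
M = np.load("perspective_matrix.npy")    # hypothetical path
frame = cv2.imread("sample_frame.jpg")   # hypothetical path
h, w = frame.shape[:2]

timings = {"transform": 0.0, "inference": 0.0}
n_runs = 100
for _ in range(n_runs):
    t0 = time.perf_counter()
    warped = cv2.warpPerspective(frame, M, (w, h))   # perspective transformation stage
    t1 = time.perf_counter()
    _ = model(warped)                                # includes the model's own pre/post-processing
    t2 = time.perf_counter()
    timings["transform"] += (t1 - t0) * 1e3
    timings["inference"] += (t2 - t1) * 1e3

for stage, total in timings.items():
    print(f"{stage}: {total / n_runs:.1f} ms per frame")
```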