Enhancing Pedestrian Tracking in Autonomous Vehicles by Using Advanced Deep Learning Techniques
Abstract
1. Introduction
2. Related Work
- Multi-Object Tracking (MOT) Methods:
  - These methods track multiple pedestrians in a scene concurrently. Traditional approaches often use the Hungarian algorithm for association, linking detections across frames [5].
  - Recent approaches use deep neural networks, such as TrackletNet [15] and DeepSORT, to learn features and association scores, which improves tracking accuracy.
- Re-identification-based Methods:
  - This category combines appearance-based and geometric feature-based techniques to identify and track pedestrians across different camera views. Notably, Siamese networks are employed to learn a similarity metric between pairs of images [16].
  - Contemporary methods incorporate attention mechanisms that emphasize discriminative regions of pedestrians in order to improve tracking precision [17].
- Multi-Cue Fusion:
  - Strategies in this category combine various cues, such as color, shape, and motion, to make tracking more robust in complex scenes. For instance, the multi-cue multi-camera pedestrian tracking (MCMC-PT) method integrates color, shape, and motion cues from multiple cameras for comprehensive pedestrian tracking [18].
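The frame-to-frame association step mentioned above can be illustrated with a small sketch. It builds a cost matrix of (1 - IoU) values between existing tracks and new detections and selects the minimum-cost one-to-one assignment. For brevity, the sketch enumerates permutations rather than implementing the Hungarian algorithm, which solves the same assignment problem in polynomial time. The box format `(x1, y1, x2, y2)`, the `min_iou` gate, and the function names are assumptions made for illustration and are not taken from any of the cited trackers.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Return (track_index, detection_index) pairs with minimal total cost.

    Brute-force stand-in for the Hungarian algorithm; only suitable for a
    handful of boxes. Assumes len(tracks) <= len(detections).
    """
    cost = [[1.0 - iou(t, d) for d in detections] for t in tracks]
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(len(detections)), len(tracks)):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_cost:
            best_perm, best_cost = perm, total
    if best_perm is None:  # more tracks than detections: not handled here
        return []
    # Discard pairs whose overlap is below the (hypothetical) gate.
    return [(i, j) for i, j in enumerate(best_perm)
            if 1.0 - cost[i][j] >= min_iou]
```

In this sketch, a detection that overlaps no track is simply left unmatched; a full tracker would typically start a new track for it.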
2.1. YOLOv8
2.2. DeepSORT
- Feature Extraction: Responsible for extracting features from input video frames, including bounding boxes and corresponding features.
- Detection and Tracking: Detects objects in each video frame and associates them with their tracks using the Hungarian algorithm.
- Kalman Filter: Predicts the location of each object in the next video frame based on its previous location and velocity.
- Appearance Model: Stores and updates the appearance features of each object over time, facilitating re-identification and appearance updates.
- Re-identification: Matches the appearance of an object in one video frame with its appearance in a previous frame.
- Output: Generates the final output—a set of object tracks for each video frame.
- Blurred Objects: Tracking difficulties due to image artifacts caused by blurred objects [37].
- Intra-object Changes: Challenges in handling changes in the shape or size of objects [38].
- Non-rigid Objects: Difficulty in tracking objects appearing for short durations [39].
- Transparent Objects: Challenges in detecting objects made of transparent materials.
- Non-linear Motion: Difficulty in tracking irregularly moving objects [40].
- Fast Motion: Challenges posed by quickly moving objects.
- Similar Objects: Difficulty in differentiating objects with similar appearances [41].
- Occlusion: Tracking challenges when objects overlap or obstruct each other [18].
- Scale Variation: Difficulty when objects appear at different scales in the image [42].
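The Kalman filter component listed above predicts an object's next location from its previous location and velocity. The following is a minimal one-dimensional sketch of a constant-velocity Kalman filter, written under stated assumptions: trackers in this family estimate several bounding-box coordinates jointly, whereas this example tracks a single coordinate, and the class name and all noise values are hypothetical choices made for illustration.

```python
class ConstantVelocityKalman1D:
    """1D constant-velocity Kalman filter with state [position, velocity]."""

    def __init__(self, position, velocity=0.0, process_var=0.01, meas_var=0.1):
        self.x = [position, velocity]       # state estimate
        self.P = [[1.0, 0.0], [0.0, 1.0]]   # state covariance (assumed prior)
        self.q = process_var                # process noise (assumed value)
        self.r = meas_var                   # measurement noise (assumed value)

    def predict(self, dt=1.0):
        """Propagate the state one step ahead; returns the predicted position."""
        p, v = self.x
        self.x = [p + v * dt, v]
        (a, b), (c, d) = self.P
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = diag(q, q)
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.x[0]

    def update(self, z):
        """Correct the state with a position measurement z."""
        (a, b), (c, d) = self.P
        s = a + self.r              # innovation variance
        k0, k1 = a / s, c / s       # Kalman gain for H = [1, 0]
        y = z - self.x[0]           # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]
```

Calling `predict()` once per frame and `update()` whenever a detection is matched gives a track that can coast through short gaps, which is relevant to the occlusion and fast-motion challenges listed above.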
2.3. StrongSORT
- AFLink: A flag indicating whether the AFLink algorithm should be utilized for temporal matching.
- Path_AFLink: The path to the AFLink algorithm model to be employed.
- GSI: A flag indicating whether the GSI algorithm should be employed for temporal interpolation.
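- Interval: The maximum frame gap within a track that the GSI algorithm fills by interpolation.
- Tau: The smoothing parameter of the Gaussian process used in the GSI algorithm, which controls how strongly the interpolated trajectory is smoothed.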
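To make the role of the interval parameter concrete, the sketch below fills missing frames in a single track by linear interpolation whenever the gap does not exceed `interval`. This covers only the interpolation stage; the Gaussian smoothing stage governed by tau is not shown. The function name, the track representation (a dictionary from frame index to an `(x, y, w, h)` box), the default value, and the exact gap condition are assumptions for illustration rather than the StrongSORT implementation.

```python
def fill_gaps(track, interval=20):
    """Linearly interpolate missing frames in one track.

    track: dict mapping frame index -> (x, y, w, h).
    Gaps longer than `interval` frames are left unfilled.
    """
    frames = sorted(track)
    filled = dict(track)
    for prev, nxt in zip(frames, frames[1:]):
        gap = nxt - prev
        if 1 < gap <= interval:
            for f in range(prev + 1, nxt):
                w = (f - prev) / gap  # interpolation weight in (0, 1)
                filled[f] = tuple(a + w * (b - a)
                                  for a, b in zip(track[prev], track[nxt]))
    return filled


# Usage: frames 2-4 are filled, the long gap between frames 5 and 40 is not.
track = {1: (0, 0, 10, 10), 5: (8, 4, 10, 10), 40: (100, 100, 10, 10)}
filled = fill_gaps(track, interval=20)
```

A larger interval recovers more missed detections but risks bridging gaps in which the pedestrian did not move along a straight line.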
3. Proposed Method
3.1. Proposed Approach
- Diverse Appearance Model Training: Our approach begins with training the appearance model on a diverse set of data. By exposing the network to a wide range of visual appearances, we aim to enhance its ability to effectively track objects. This diversity helps the model generalize better to various scenarios, including challenging conditions, such as occlusions, variations in lighting, and diverse pedestrian characteristics.
- Precise Parameter Tuning: Recognizing the critical role of parameter tuning in tracking accuracy, our approach involves the fine-tuning of various parameters. This includes parameters related to the Kalman filter, the Hungarian algorithm, and the confidence threshold used for tracklet merging. The optimization of these parameters is essential for achieving optimal tracking results, and our methodology systematically explores the parameter space for performance enhancement.
- Incorporating Additional Information: To further improve tracking accuracy, we explore the integration of additional information about the tracked objects. This supplementary information may include object size, shape, or velocity. By incorporating these cues into the tracking process, we aim to provide the algorithm with richer contextual information, enabling more accurate predictions and reducing instances of tracking failures.
- Integration with Other Tracking Algorithms: Our approach investigates the collaborative integration of StrongSORT with other tracking algorithms. This synergistic approach allows us to leverage complementary information from different sources. By combining the strengths of multiple algorithms, we aim to enhance the overall robustness and reliability of pedestrian tracking in diverse and challenging scenarios [20,43].
- Genetic Algorithm for Parameter Tuning:
  - Random Parameter Generation: A diverse set of StrongSORT parameters is generated at random. The randomness spreads candidates across different regions of the parameter space, which helps prevent the search from converging to local optima.
  - Evaluation and Comparison: Each generated parameter set is evaluated by running StrongSORT on the tracking dataset and measuring its performance against predefined goals. These goals serve as benchmarks that determine whether a parameter set succeeds or fails, which identifies the parameters that improve tracking accuracy.
  - Natural Selection: Following the principle of biological evolution, the parameter sets that perform best against the predefined goals are selected. They form the foundation of the next generation, so that successful traits are passed on and refined over successive iterations.
  - Iteration: The process is repeated for several generations. In each iteration, the algorithm refines the parameters based on the success of previous generations, so that better-performing parameter sets emerge over time.
- Manual Tuning of High-Impact Parameters:
  - Impact Analysis: A comprehensive analysis is conducted to understand the impact of each parameter on the model’s performance, including how it influences the tracking algorithm and how it interacts with other parameters and the dataset. The resulting insights guide the subsequent tuning.
  - Precise Manual Tuning: Based on the impact analysis, high-impact parameters are adjusted manually. This requires a detailed understanding of the algorithm’s behavior and of the specific effects of parameter changes. Manual tuning gives fine-grained control over critical aspects, such as the Kalman filter parameters or tracklet merging thresholds, so that adjustments stay aligned with the goal of improving tracking accuracy.
  - Experiments and Measurements: After manual tuning, a series of experiments is conducted in which StrongSORT is run with the adjusted parameters on the tracking dataset. The goal is to quantify the improvement obtained, providing empirical evidence of the effect of the parameter adjustments on tracking accuracy.
  - Reiteration: Tuning and measurement are repeated, with parameters refined further based on the experimental results, until the parameter values meet the specific requirements of the tracking scenario.
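The genetic-algorithm steps described above (random generation, evaluation, selection, iteration) can be sketched as follows. This is an illustrative outline under several assumptions, not the configuration used in this work: the parameter names and ranges are hypothetical, `toy_fitness` is a stand-in for running StrongSORT on the tracking dataset and scoring the result, and the Gaussian mutation used to produce each new generation is one common choice that the description above does not specify.

```python
import random

# Hypothetical parameter names and search ranges (assumptions).
PARAM_RANGES = {
    "kalman_process_noise": (0.001, 0.1),
    "association_gate": (0.1, 0.9),
    "merge_confidence": (0.1, 0.9),
}


def random_params(rng):
    """Random parameter generation: one candidate drawn uniformly."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}


def mutate(params, rng, scale=0.05):
    """Perturb a parent with Gaussian noise, clipped to the search range."""
    child = {}
    for k, (lo, hi) in PARAM_RANGES.items():
        value = params[k] + rng.gauss(0.0, scale * (hi - lo))
        child[k] = min(hi, max(lo, value))
    return child


def toy_fitness(params):
    """Stand-in for a tracking score; peaks at an arbitrary target point."""
    target = {"kalman_process_noise": 0.02,
              "association_gate": 0.7,
              "merge_confidence": 0.5}
    return -sum((params[k] - target[k]) ** 2 for k in target)


def evolve(fitness, generations=30, population_size=20, survivors=5, seed=0):
    """Evaluate, select the best candidates, and iterate."""
    rng = random.Random(seed)
    population = [random_params(rng) for _ in range(population_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[:survivors]  # selection; parents are kept unchanged
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(population_size - survivors)]
        population = parents + children
    return max(population, key=fitness), history
```

Because the selected parents are carried over unchanged, the best score in this sketch never decreases from one generation to the next. In practice each fitness evaluation would be a full tracking run, so the population size and number of generations determine the computational cost.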
3.2. Datasets
3.3. Improving Performance Using StrongSORT
3.4. Evaluation Metrics
4. Results
4.1. Quantitative Evaluation of Tracking Algorithms
4.2. Discussion
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Razzok, M.; Badri, A.; Mourabit, I.E.; Ruichek, Y.; Sahel, A. Pedestrian Detection and Tracking System Based on Deep-SORT, YOLOv5, and New Data Association Metrics. Information 2023, 14, 218.
2. Bhola, G.; Kathuria, A.; Kumar, D.; Das, C. Real-time Pedestrian Tracking based on Deep Features. In Proceedings of the 4th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 13–15 May 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1101–1106.
3. Li, R.; Zu, Y. Research on Pedestrian Detection Based on the Multi-Scale and Feature-Enhancement Model. Information 2023, 14, 123.
4. Du, Y.; Zhao, Z.; Song, Y.; Zhao, Y.; Su, F.; Gong, T.; Meng, H. StrongSORT: Make DeepSORT Great Again. IEEE Trans. Multimed. 2023, 25, 8725–8737.
5. Dendorfer, P.; Rezatofighi, H.; Milan, A.; Shi, J.; Cremers, D.; Reid, I.D.; Roth, S.; Schindler, K.; Leal-Taixé, L. MOT20: A Benchmark for Multi Object Tracking in Crowded Scenes. arXiv 2020, arXiv:2003.09003.
6. Xiao, C.; Luo, Z. Improving Multiple Pedestrian Tracking in Crowded Scenes with Hierarchical Association. Entropy 2023, 25, 380.
7. Myagmar-Ochir, Y.; Kim, W. A Survey of Video Surveillance Systems in Smart City. Electronics 2023, 12, 3567.
8. Tao, M.; Li, X.; Xie, R.; Ding, K. Pedestrian Identification and Tracking within Adaptive Collaboration Edge Computing. In Proceedings of the 26th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Rio de Janeiro, Brazil, 24–26 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1124–1129.
9. Son, T.H.; Weedon, Z.; Yigitcanlar, T.; Sanchez, T.; Corchado, J.M.; Mehmood, R. Algorithmic Urban Planning for Smart and Sustainable Development: Systematic Review of the Literature. Sustain. Cities Soc. 2023, 94, 104562.
10. AL-Dosari, K.; Hunaiti, Z.; Balachandran, W. Systematic Review on Civilian Drones in Safety and Security Applications. Drones 2023, 7, 210.
11. Vasiljevas, M.; Damaševičius, R.; Maskeliūnas, R. A Human-Adaptive Model for User Performance and Fatigue Evaluation during Gaze-Tracking Tasks. Electronics 2023, 12, 1130.
12. Alhafnawi, M.; Salameh, H.A.B.; Masadeh, A.; Al-Obiedollah, H.; Ayyash, M.; El-Khazali, R.; Elgala, H. A Survey of Indoor and Outdoor UAV-Based Target Tracking Systems: Current Status, Challenges, Technologies, and Future Directions. IEEE Access 2023, 11, 68324–68339.
13. Li, F.; Chen, Y.; Hu, M.; Luo, M.; Wang, G. Helmet-Wearing Tracking Detection Based on StrongSORT. Sensors 2023, 23, 1682.
14. Abdulghafoor, N.H.; Abdullah, H.N. A Novel Real-time Multiple Objects Detection and Tracking Framework for Different Challenges. Alex. Eng. J. 2022, 61, 9637–9647.
15. Wang, G.; Wang, Y.; Zhang, H.; Gu, R.; Hwang, J. Exploit the Connectivity: Multi-Object Tracking with TrackletNet. In Proceedings of the 27th ACM International Conference on Multimedia (MM), Nice, France, 21–25 October 2019; pp. 482–490.
16. Li, R.; Zhang, B.; Kang, D.; Teng, Z. Deep Attention Network for Person Re-Identification with Multi-loss. Comput. Electr. Eng. 2019, 79, 106455.
17. Jiao, S.; Wang, J.; Hu, G.; Pan, Z.; Du, L.; Zhang, J. Joint Attention Mechanism for Person Re-Identification. IEEE Access 2019, 7, 90497–90506.
18. Guo, W.; Jin, Y.; Shan, B.; Ding, X.; Wang, M. Multi-Cue Multi-Hypothesis Tracking with Re-Identification for Multi-Object Tracking. Multimed. Syst. 2022, 28, 925–937.
19. Kang, W.; Xie, C.; Yao, J.; Xuan, L.; Liu, G. Online Multiple Object Tracking with Recurrent Neural Networks and Appearance Model. In Proceedings of the 13th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Chengdu, China, 17–19 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 34–38.
20. Guo, S.; Wang, S.; Yang, Z.; Wang, L.; Zhang, H.; Guo, P.; Gao, Y.; Guo, J. A Review of Deep Learning-Based Visual Multi-Object Tracking Algorithms for Autonomous Driving. Appl. Sci. 2022, 12, 10741.
21. Stadler, D.; Beyerer, J. Improving Multiple Pedestrian Tracking by Track Management and Occlusion Handling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual, 19–25 June 2021; pp. 10958–10967.
22. Kristan, M.; Leonardis, A.; Matas, J.; Felsberg, M.; Pflugfelder, R.P.; Kämäräinen, J.; Danelljan, M.; Zajc, L.C.; Lukezic, A.; Drbohlav, O.; et al. The 8th Visual Object Tracking VOT2020 Challenge Results. In Proceedings of the Workshops on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2020; Volume 12539, pp. 547–601.
23. Pham, Q.H.; Doan, V.S.; Pham, M.N.; Duong, Q.D. Real-Time Multi-vessel Classification and Tracking Based on StrongSORT-YOLOv5. In Proceedings of the International Conference on Intelligent Systems & Networks (ICISN); Lecture Notes in Networks and Systems. Springer: Berlin/Heidelberg, Germany, 2023; Volume 752, pp. 122–129.
24. Shelatkar, T.; Bansal, U. Diagnosis of Brain Tumor Using Light Weight Deep Learning Model with Fine Tuning Approach. In Proceedings of the International Conference on Machine Intelligence and Signal Processing (MISP); Lecture Notes in Electrical Engineering. Springer: Berlin/Heidelberg, Germany, 2022; Volume 998, pp. 105–114.
25. Li, J.; Wu, W.; Zhang, D.; Fan, D.; Jiang, J.; Lu, Y.; Gao, E.; Yue, T. Multi-Pedestrian Tracking Based on KC-YOLO Detection and Identity Validity Discrimination Module. Appl. Sci. 2023, 13, 12228.
26. Subramanian, M.; Shanmugavadivel, K.; Nandhini, P.S. On Fine-Tuning Deep Learning Models Using Transfer Learning and Hyper-Parameters Optimization for Disease Identification in Maize Leaves. Neural Comput. Appl. 2022, 34, 13951–13968.
27. Sukkar, M.; Kumar, D.; Sindha, J. Improve Detection and Tracking of Pedestrian Subclasses by Pre-Trained Models. J. Adv. Eng. Comput. 2022, 6, 215.
28. Kapania, S.; Saini, D.; Goyal, S.; Thakur, N.; Jain, R.; Nagrath, P. Multi Object Tracking with UAVs using Deep SORT and YOLOv3 RetinaNet Detection Framework. In Proceedings of the 1st ACM Workshop on Autonomous and Intelligent Mobile Systems, Linz, Austria, 8–10 July 2020; pp. 1–6.
29. Zhu, J.; Yang, H.; Liu, N.; Kim, M.; Zhang, W.; Yang, M. Online Multi-Object Tracking with Dual Matching Attention Networks. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2018; Volume 11209, pp. 379–396.
30. Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.T.; Upcroft, B. Simple Online and Realtime Tracking. In Proceedings of the International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 3464–3468.
31. Danelljan, M.; Bhat, G.; Khan, F.S.; Felsberg, M. ATOM: Accurate Tracking by Overlap Maximization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Computer Vision Foundation; IEEE: Piscataway, NJ, USA, 2019; pp. 4660–4669.
32. Guo, M.; Xue, D.; Li, P.; Xu, H. Vehicle Pedestrian Detection Method Based on Spatial Pyramid Pooling and Attention Mechanism. Information 2020, 11, 583.
33. Hussain, M. YOLO-v1 to YOLO-v8, the Rise of YOLO and Its Complementary Nature toward Digital Manufacturing and Industrial Defect Detection. Machines 2023, 11, 677.
34. Sirisha, U.; Praveen, S.P.; Srinivasu, P.N.; Barsocchi, P.; Bhoi, A.K. Statistical Analysis of Design Aspects of Various YOLO-Based Deep Learning Models for Object Detection. Int. J. Comput. Intell. Syst. 2023, 16, 126.
35. Ultralytics YOLOv8. Available online: https://github.com/ultralytics/ultralytics (accessed on 28 January 2024).
36. Wojke, N.; Bewley, A.; Paulus, D. Simple Online and Realtime Tracking with a Deep Association Metric. In Proceedings of the International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 3645–3649.
37. Guo, Q.; Feng, W.; Gao, R.; Liu, Y.; Wang, S. Exploring the Effects of Blur and Deblurring to Visual Object Tracking. IEEE Trans. Image Process. 2021, 30, 1812–1824.
38. Janai, J.; Güney, F.; Behl, A.; Geiger, A. Computer Vision for Autonomous Vehicles: Problems, Datasets and State of the Art. Found. Trends Comput. Graph. Vis. 2020, 12, 1–308.
39. Meimetis, D.; Daramouskas, I.; Perikos, I.; Hatzilygeroudis, I. Real-Time Multiple Object Tracking Using Deep Learning Methods. Neural Comput. Appl. 2023, 35, 89–118.
40. Wang, Z.; Zheng, L.; Liu, Y.; Li, Y.; Wang, S. Towards Real-Time Multi-Object Tracking. In Proceedings of the 16th European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2020; Volume 12356, pp. 107–122.
41. Ciaparrone, G.; Sánchez, F.L.; Tabik, S.; Troiano, L.; Tagliaferri, R.; Herrera, F. Deep Learning in Video Multi-Object Tracking: A Survey. Neurocomputing 2020, 381, 61–88.
42. Song, S.; Li, Y.; Huang, Q.; Li, G. A New Real-Time Detection and Tracking Method in Videos for Small Target Traffic Signs. Appl. Sci. 2021, 11, 3061.
43. Alikhanov, J.; Kim, H. Online Action Detection in Surveillance Scenarios: A Comprehensive Review and Comparative Study of State-of-the-Art Multi-Object Tracking Methods. IEEE Access 2023, 11, 68079–68092.
44. Multiple Object Tracking Benchmark. Available online: https://motchallenge.net/ (accessed on 28 January 2024).
45. Luiten, J.; Osep, A.; Dendorfer, P.; Torr, P.H.S.; Geiger, A.; Leal-Taixé, L.; Leibe, B. HOTA: A Higher Order Metric for Evaluating Multi-object Tracking. Int. J. Comput. Vis. 2021, 129, 548–578.
Method | Advantages | Disadvantages |
---|---|---|
StrongSORT [4] | | |
DeepSORT [28] | Simple and efficient | Limited robustness to occlusions and identity switches |
TrackletNet [15] | Robust to occlusions and identity switches | Complex architecture and not real-time |
DMAN [29] | Robust to occlusions and identity switches | Not real-time |
SORT [30] | Simple and efficient | Limited robustness to occlusions and identity switches |
ATOM [31] | Accurate tracking | Complex architecture and not real-time |
IVDM [25] | Enhanced handling of occlusions and identity switches | Needs evaluation against real-time performance |
# | Method | HOTA | MOTA | IDF1 |
---|---|---|---|---|
1 | BYTEtrack | 37.574 | 33.36 | 43.946 |
2 | Ocsort | 40.852 | 38.206 | 48.678 |
3 | StrongSORT_I | 42.611 | 38.46 | 51.805 |
4 | Botsort | 42.644 | 41.654 | 51.91 |
5 | StrongSORT | 42.698 | 38.416 | 51.899 |
6 | StrongSORT_P | 42.859 | 38.467 | 52.29 |
Increase | | 0.161 | 0.051 | 0.391 |
Percentage Increase | | 0.38% | 0.13% | 0.75% |
# | Method | HOTA | MOTA | IDF1 |
---|---|---|---|---|
1 | BYTEtrack | 41.073 | 38.41 | 48.973 |
2 | Ocsort | 44.389 | 42.394 | 53.121 |
3 | Botsort | 45.087 | 43.498 | 53.865 |
4 | StrongSORT_I | 45.299 | 42.414 | 54.856 |
5 | StrongSORT | 45.338 | 42.384 | 54.96 |
6 | StrongSORT_P | 45.55 | 42.413 | 55.338 |
Increase | | 0.212 | 0.029 | 0.378 |
Percentage Increase | | 0.47% | 0.07% | 0.69% |
Method | HOTA (%) | MOTA (%) | IDF1 (%) |
---|---|---|---|
YOLOv8n | 0.49 | 0.15 | 0.92 |
YOLOv8s | 0.47 | 0.07 | 0.69 |
YOLOv8m | 0.60 | 0.27 | 0.65 |
YOLOv8l | 0.51 | 0.46 | 0.69 |
YOLOv8x | 0.44 | 0.39 | 0.61 |
Average percentage increase (%) | 0.50 | 0.27 | 0.71 |
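The percentage figures in these tables follow from simple arithmetic, which the short sketch below reproduces as a consistency check. It recomputes the percentage increase of StrongSORT_P over StrongSORT from the preceding results table (HOTA 45.338 versus 45.55), whose values agree with the YOLOv8s row above, and then recomputes the average row from the five per-detector rows. The function and variable names are illustrative only.

```python
def percentage_increase(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Scores copied from the StrongSORT and StrongSORT_P rows of the results table.
strongsort = {"HOTA": 45.338, "MOTA": 42.384, "IDF1": 54.96}
strongsort_p = {"HOTA": 45.55, "MOTA": 42.413, "IDF1": 55.338}
gains = {m: round(percentage_increase(strongsort[m], strongsort_p[m]), 2)
         for m in strongsort}

# Per-detector percentage increases copied from the table above.
per_detector = {
    "YOLOv8n": {"HOTA": 0.49, "MOTA": 0.15, "IDF1": 0.92},
    "YOLOv8s": {"HOTA": 0.47, "MOTA": 0.07, "IDF1": 0.69},
    "YOLOv8m": {"HOTA": 0.60, "MOTA": 0.27, "IDF1": 0.65},
    "YOLOv8l": {"HOTA": 0.51, "MOTA": 0.46, "IDF1": 0.69},
    "YOLOv8x": {"HOTA": 0.44, "MOTA": 0.39, "IDF1": 0.61},
}
averages = {
    m: round(sum(row[m] for row in per_detector.values()) / len(per_detector), 2)
    for m in ("HOTA", "MOTA", "IDF1")
}
```

Both results match the tabulated values: `gains` evaluates to 0.47, 0.07, and 0.69, and `averages` to 0.50, 0.27, and 0.71 for HOTA, MOTA, and IDF1, respectively.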
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sukkar, M.; Shukla, M.; Kumar, D.; Gerogiannis, V.C.; Kanavos, A.; Acharya, B. Enhancing Pedestrian Tracking in Autonomous Vehicles by Using Advanced Deep Learning Techniques. Information 2024, 15, 104. https://doi.org/10.3390/info15020104