An Intelligent Tracking System for Moving Objects in Dynamic Environments
Abstract
1. Introduction
- A tracking system that excludes dynamic environment boundaries is presented. Its front end incorporates a new dynamic environment boundary detection algorithm (DRD) that does not degrade localization accuracy.
- The simple linear iterative clustering algorithm is enhanced to eliminate redundant calculations and to reduce the time complexity of the DRD algorithm.
2. Related Work
3. Model Description
3.1. Dynamic Area Extraction
- SIFT is a rotation-invariant feature model that performs well but requires complex, time-consuming computation.
- SIFT is employed to find local features in the spatial dimension.
- It detects key points and then adds descriptors to be utilized for object detection.
- SURF is a fast detection algorithm that compares the center-point’s intensity level with its surrounding pixels.
- SURF is fast and robust and is used for similarity-invariant comparison of images.
- The key feature of the SURF algorithm is the real-time computation of its operators using box filters, which makes it suitable for object tracking (see the sketch below).
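As a rough illustration of how such detectors are used in practice, the sketch below extracts keypoints and descriptors with OpenCV. SURF requires an opencv-contrib build with the non-free modules enabled, so the snippet falls back to SIFT (bundled with recent opencv-python releases); the file name frame.png and the Hessian threshold are placeholders, not values taken from the paper.

```python
import cv2

def detect_features(frame_path="frame.png"):
    """Detect local features in one frame (placeholder file name).

    Tries SURF (box-filter approximation of the Hessian detector, needs an
    opencv-contrib build with non-free modules enabled); falls back to SIFT,
    which ships with standard OpenCV >= 4.4.
    """
    img = cv2.imread(frame_path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(frame_path)

    try:
        detector = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    except (AttributeError, cv2.error):
        detector = cv2.SIFT_create()  # scale- and rotation-invariant keypoints

    keypoints, descriptors = detector.detectAndCompute(img, None)
    return keypoints, descriptors

if __name__ == "__main__":
    kps, desc = detect_features()
    print(len(kps), "keypoints,", None if desc is None else desc.shape)
```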
- (1) Each input frame is partitioned into multiple squares, and the SURF algorithm computes additional dynamic maps for each region.
- (2) Each image is represented by initial nodes, and each dynamic feature point is assigned to the appropriate node.
- (3) The algorithm checks whether each node holds only a single feature. If a node has more than one dynamic feature, the square containing that node is divided into four sub-squares, a new node is assigned to each sub-square, and the dynamic features of the old node are re-assigned to the new nodes.
- (4) Step (3) is repeated until the vertices carrying a specific map vector achieve the required dynamic map (a quadtree-style sketch of this subdivision follows this list).
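The node-splitting procedure in Steps (1)–(4) amounts to a quadtree over the dynamic feature points. The sketch below is written directly from the description above; the Node class and the subdivide helper are our own names, not identifiers from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One square region of the frame: (x, y) top-left corner, side length `size`."""
    x: float
    y: float
    size: float
    points: list = field(default_factory=list)    # dynamic feature points (px, py)
    children: list = field(default_factory=list)  # four sub-squares after a split

def subdivide(node, min_size=1.0):
    """Steps (3)-(4): split any node holding more than one dynamic feature."""
    if len(node.points) <= 1 or node.size <= min_size:
        return  # stop: this node now carries at most one feature
    half = node.size / 2.0
    node.children = [
        Node(node.x,        node.y,        half),
        Node(node.x + half, node.y,        half),
        Node(node.x,        node.y + half, half),
        Node(node.x + half, node.y + half, half),
    ]
    # Re-assign the old node's features to the new sub-squares.
    for px, py in node.points:
        for child in node.children:
            if child.x <= px < child.x + half and child.y <= py < child.y + half:
                child.points.append((px, py))
                break
    node.points = []
    for child in node.children:
        subdivide(child, min_size)

# Example: three dynamic feature points inside one 256 x 256 square of the frame.
root = Node(0, 0, 256, points=[(10, 20), (200, 40), (12, 22)])
subdivide(root)
```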
3.2. Dynamic Area Marking
- The proposed phases of the simple linear iterative clustering are as follows:
- (a) The class centers Z are spread uniformly with N as the spatial displacement.
- (b) The displacements between the class centers and the image pixels are calculated within a 2N × 2N area, and each pixel is allocated to the class with the smallest displacement.
- (c) Each cluster's center is updated.
- (d) Steps (b) and (c) are repeated until convergence.
- (e) All centers are distributed uniformly, and pixels near the edges are marked as unstable.
- (f) The displacement between each center and the selected unstable points is calculated within a 2N × 2N environment boundary, and the unstable points are re-assigned.
- (g) The centers and the unstable flags of the pixels adjacent to any unstable point are marked and updated.
- (h) Steps (f) and (g) are repeated until convergence (a minimal sketch of the basic loop follows this list).
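For concreteness, here is a minimal NumPy sketch of the basic assignment/update loop in Steps (a)–(d): centers seeded on a uniform grid with spacing N, each pixel assigned to the nearest center searched within a 2N × 2N window, and centers re-estimated for a fixed number of iterations. It clusters on position and gray level only and omits the unstable-pixel refinement of Steps (e)–(h), so it illustrates the idea rather than the paper's implementation.

```python
import numpy as np

def simple_slic(gray, N=20, compactness=10.0, iters=10):
    """Minimal SLIC-style clustering of a grayscale image (illustration only)."""
    h, w = gray.shape
    # Step (a): spread the class centers uniformly with spatial displacement N.
    ys, xs = np.meshgrid(np.arange(N // 2, h, N), np.arange(N // 2, w, N), indexing="ij")
    centers = np.stack([ys.ravel(), xs.ravel(), gray[ys, xs].ravel()], axis=1).astype(float)

    labels = -np.ones((h, w), dtype=int)
    dist = np.full((h, w), np.inf)

    for _ in range(iters):
        dist.fill(np.inf)
        # Step (b): assign each pixel inside a 2N x 2N window to the closest center.
        for k, (cy, cx, cg) in enumerate(centers):
            y0, y1 = max(int(cy) - N, 0), min(int(cy) + N, h)
            x0, x1 = max(int(cx) - N, 0), min(int(cx) + N, w)
            yy, xx = np.mgrid[y0:y1, x0:x1]
            d = (gray[y0:y1, x0:x1] - cg) ** 2 \
                + (compactness / N) ** 2 * ((yy - cy) ** 2 + (xx - cx) ** 2)
            better = d < dist[y0:y1, x0:x1]
            dist[y0:y1, x0:x1][better] = d[better]
            labels[y0:y1, x0:x1][better] = k
        # Step (c): move each center to the mean of its assigned pixels.
        for k in range(len(centers)):
            yk, xk = np.nonzero(labels == k)
            if len(yk):
                centers[k] = [yk.mean(), xk.mean(), gray[yk, xk].mean()]
    # Step (d): a fixed iteration count stands in for the convergence test here.
    return labels

# Example usage on a synthetic 128 x 128 gradient image.
img = np.tile(np.linspace(0, 255, 128), (128, 1))
superpixel_labels = simple_slic(img, N=16)
```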
3.3. The Tracking Model
Dynamic Object Tracking Algorithm
4. Experiments
4.1. Experimental Settings
4.2. Dynamic Environment Boundary Detection
4.3. Evaluation of the System DPR
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Engel, J.; Koltun, V.; Cremers, D. Direct sparse odometry. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 611–625. [Google Scholar] [CrossRef] [PubMed]
- Collins, R.; Zhou, X.; Teh, S.K. An open source tracking testbed and evaluation web site. IEEE Workshop Perform. Eval. Track. Surveill. 2020, 2, 35. [Google Scholar]
- Xu, H.; Yang, M.; Wang, X.; Yang, Q. Magnetic sensing system design for intelligent vehicle guidance. IEEE/ASME Trans. Mechatron. 2020, 15, 652–656. [Google Scholar]
- Loevsky, I.; Shimshoni, I. Reliable and efficient landmark-based localization for mobile robots. Robot. Auton. Syst. 2021, 58, 520–528. [Google Scholar] [CrossRef]
- Akter, S.; Habib, A.; Islam, M.A.; Hossen, M.S.; Fahim, W.A.; Sarkar, P.R.; Ahmed, M. Comprehensive Performance Assessment of Deep Learning Models in Early Prediction and Risk Identification of Chronic Kidney Disease. IEEE Access 2021, 9, 165184–165206. [Google Scholar] [CrossRef]
- Murrtal, R.; Tardos, J. SCR: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 2021, 33, 1255–1262. [Google Scholar] [CrossRef]
- Bescos, A.; Facil, J.; Neira, J. DynaS: Tracking, mapping, and inpainting in dynamic areas. IEEE Robot. Auton. Lett. 2019, 3, 4076–4083. [Google Scholar] [CrossRef]
- Ronzoni, D.; Olmi, R.; Fantuzzi, C. AGV global localization using indistinguishable artificial landmarks. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 287–292. [Google Scholar]
- Hafez, R.; David, J. SLAM2: A SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 2020, 33, 1255–1262. [Google Scholar]
- Mime, J.; Bayoun, D. LSD: Large static direct monocular model. Comput. Vis. 2020, 7, 83–89. [Google Scholar]
- Ahmed, M.; Cremers, D. Indirect deep learning odometry model. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 4, 61–65. [Google Scholar]
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The VID dataset. J. Robot. Reason. 2021, 32, 123–127. [Google Scholar]
- Fisher, R. The MOVSD4 surveillance ground-truth data sets. In Proceedings of the IEEE Workshop on Performance Evaluation of Tracking and Surveillance, Cairo, Egypt, 17–19 December 2019; pp. 12–17. [Google Scholar]
- Fuentes, J.; Ascencio, J.; Mancha, J. Visual simultaneous localization and mapping: A survey. Artif. Intell. Rev. 2019, 43, 55–81. [Google Scholar] [CrossRef]
- Saputra, M.; Markham, A.; Trigoni, N. Visual SLAM and structure from motion in dynamic environments: A survey. ACM Comput. Surv. 2020, 51, 37. [Google Scholar] [CrossRef]
- Cadena, C.; Carlone, L.; Carrillo, H.; Latif, Y. Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Trans. Robot. 2016, 32, 1309–1332. [Google Scholar] [CrossRef]
- Wang, Y.; Lin, M.; Ju, R. Visual SLAM and moving-object detection for a small-size humanoid robot. Adv. Robot. Syst. 2021, 7, 133–143. [Google Scholar] [CrossRef]
- Kundu, A.; Krishna, K.; Sivaswamy, J. Moving object detection by multi-view geometric techniques from a single camera mounted robot. In Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, MO, USA, 10–15 October 2009; pp. 436–441. [Google Scholar]
- Li, S.; Lee, D. RGB-D SLAM in dynamic environments using static point weighting. IEEE Robot. Autom. Lett. 2020, 2, 223–230. [Google Scholar] [CrossRef]
- Tan, W.; Liu, H.; Bao, H. Robust monocular SLAM in dynamic environments. In Proceedings of the 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Adelaide, SA, Australia, 1–4 October 2013; pp. 209–218. [Google Scholar]
- Fischler, M.; Bolles, R. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 2021, 24, 381–395. [Google Scholar] [CrossRef]
- Supreeth, H.S.G.; Patil, C.M. Moving object detection and tracking using deep learning neural network and correlation filter. In Proceedings of the 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), Coimbatore, India, 20–21 April 2018; pp. 1775–1780. [Google Scholar] [CrossRef]
- Alcantarilla, P.; Yebes, J.; Almazan, J.; Bergasa, L. On combining visual SLAM and dense scene flow to increase the robustness of localization and mapping in dynamic environments. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation, Saint Paul, MN, USA, 14–18 May 2012; pp. 190–197. [Google Scholar]
- Giordano, D.; Murabito, F.; Spampinato, C. Superpixel-based video object segmentation using perceptual organization and location prior. Comput. Pattern Recognit. 2020, 6, 484–489. [Google Scholar]
- Hu, M.; Liu, Z.; Zhang, J.; Zhang, G. Robust object tracking via multi-cue fusion. Signal Process. 2017, 139, 86–95. [Google Scholar] [CrossRef]
- Yu, C.; Liu, Z.; Wei, Q. DS-SLAM: A semantic visual SLAM towards dynamic environments. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1168–1174. [Google Scholar]
- Machiraju, G.S.R.; Kumari, K.A.; Sharif, S.K. Object Detection and Tracking for Community Surveillance using Transfer Learning. In Proceedings of the 2021 6th International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 20–22 January 2021; pp. 1035–1042. [Google Scholar] [CrossRef]
- Hirschmuller, H. Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 30, 328–341. [Google Scholar] [CrossRef]
- Lowe, D. Distinctive image features from scale-invariant keypoints. J. Comput. 2020, 6, 91–110. [Google Scholar]
- Bay, H.; Ess, A.; Gool, L.V. SURF: Speeded up robust features. Proc. Conf. Comput. Vis. 2021, 3, 346–359. [Google Scholar]
- Rosten, E.; Porter, R.; Drummond, T. Faster and better: A machine learning approach to corner detection. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 32, 105–119. [Google Scholar] [CrossRef]
- Achanta, R.; Süsstrunk, S. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 4, 227–234. [Google Scholar]
- Sturm, J.; Cremers, D. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; pp. 573–580. [Google Scholar]
- Kerl, C.; Sturm, J.; Cremers, D. Robust odometry estimation for RGB-D cameras. In Proceedings of the 2013 IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, 6–10 May 2013; pp. 3748–3754. [Google Scholar]
- Sun, Y.; Meng, M. Motion removal for reliable RGB-D SLAM in dynamic environments. Robot. Auton. Syst. 2018, 10, 115–128. [Google Scholar] [CrossRef]
Reference | Method | Description | Proposed Model | Database | Average Accuracy |
---|---|---|---|---|---|
[14] | Binary classification (safe/not-safe) | Tracking systems operate on different features of objects in dynamic areas, which decreases their effect on movement estimation. Tracking systems might miss dangerous moving objects in dynamic environments and might assume such objects as outliers that do not pose safety issues. | Spatial similarity map | Surveillance images database of 2500 videos | 91.23% |
[18] | Robot odometer to build a dynamic feature matrix | The model employed a robot odometer to build a dynamic feature matrix. Dynamic areas are discarded by imposing restrictions on the polar and path computational geometry; the rejection outcome was influenced by the odometer accuracy. | Recurrent CNN | 7064 images of five labeled dangerous situations | 90.76%
[19] | Elimination of motion areas that do not impose dangerous object movement | The method computed static weights by merging depth information and feature static weights to approximate the camera pose. The method eliminated the adverse impact on the pose computation that was triggered by dynamic objects. However, the previously proposed systems could not totally eradicate the effect of the motion. | Deep learning CNN | 4970 images | 92.7% |
[20] | Motion estimation algorithm | The model used pixel segmentation for each pair of consecutive video frames, where the super-pixels of the current video frame are broadcast. The motion metric between the broadcast super-pixels and the previous motion is calculated, and the largest area is used to determine whether a broadcast super-pixel belongs to the moving object. | Deep CNN architecture | 6024 videos with 30 frames each | 93.4%
[20] | Depth calculation | The model used the variance in the color hue of both the present motion area and the previous moving area to spot path variation. Pixel categorization was performed with the partition of the frame. One drawback of this algorithm is that it classifies part of the ground as a dynamic object, because the ground may appear at the same depth as a moving object. | CNN and discrete cosine transform | 8054 3D images | 92.3%
[22] | Image semantics | The model utilizes image semantics to help the tracking system recognize its surroundings, optimizing feature associations to ensure robustness. | Transfer learning | 8192 surveillance images | 91.5%
[23] | Contour segmentation | The model used contour segmentation of the dynamic objects in the video frames using ResNet, combined with a path detection algorithm to detect the dynamic status of an object. | Deep learning recurrent CNN model | 4005 video frames | 93.5% with higher CPU time
[24] | Dynamic background occlusion | The model employed pixel-based segmentation of the dynamic background in the video frames and prevented the tracking system from extracting dynamic features from those pixels. | R-CNN | 4860 images | 93.67% |
Our proposed model | Dynamic Province Reject (DPR) | In this study, we propose a novel model that combines a deep learning method with a tracking system algorithm that rejects dynamic areas which are not within the environment boundary of interest. The proposed model, Dynamic Province Reject (DPR), detects the areas of the dynamic points in the frames with an epipolar feature extraction model (see the sketch after this table). | Epipolar feature extraction model | 6608 videos | 97.6%
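The last row states that DPR detects dynamic areas through an epipolar feature extraction model. As one way such a check can be realized with OpenCV (not necessarily the paper's exact formulation), the sketch below estimates the fundamental matrix between two frames with RANSAC and flags matched feature points whose distance from their epipolar line exceeds a pixel threshold; the helper name and the threshold value are assumptions.

```python
import cv2
import numpy as np

def flag_dynamic_points(pts_prev, pts_curr, thresh_px=3.0):
    """Flag matches that violate the epipolar constraint between two frames.

    pts_prev, pts_curr: (M, 2) float32 arrays of matched keypoint coordinates.
    Returns a boolean mask, True where a match likely lies on a moving object.
    """
    F, _ = cv2.findFundamentalMat(pts_prev, pts_curr, cv2.FM_RANSAC, 1.0, 0.99)
    if F is None:
        return np.zeros(len(pts_curr), dtype=bool)

    # Epipolar lines in the current frame induced by points of the previous frame.
    lines = cv2.computeCorrespondEpilines(pts_prev.reshape(-1, 1, 2), 1, F).reshape(-1, 3)

    # Point-to-line distance |a*x + b*y + c| / sqrt(a^2 + b^2) for every match.
    a, b, c = lines[:, 0], lines[:, 1], lines[:, 2]
    dist = np.abs(a * pts_curr[:, 0] + b * pts_curr[:, 1] + c) / np.sqrt(a ** 2 + b ** 2 + 1e-12)

    # Static scene points lie near their epipolar lines; large residuals suggest
    # independent motion, i.e. candidate dynamic points to be rejected.
    return dist > thresh_px
```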
Parameter | Value |
---|---|
Input | 256 × 256 × 3 images |
Batch size | 64 |
Subset | Number of Frames | % |
---|---|---|
Training subset | 4340 | 70% |
Validation subset | 930 | 15% |
Testing subset | 930 | 15% |
Component | Specification |
---|---|
GPU | Eight cores, each 32-bit @ 1.5 GHz, 64 GB RAM with GP I/O |
CPU | Intel Xeon processors |
Operating System | UNIX System V Release 4 (SVR4) |
MOVSD4 [13] Dataset (Actual) | Predicted Positive | Predicted Negative |
---|---|---|
Positive | 3002 (TP) | 18 (FN) |
Negative | 19 (FP) | 750 (TN) |
VID [12] Dataset (Actual) | Predicted Positive | Predicted Negative |
---|---|---|
Positive | 2120 (TP) | 16 (FN) |
Negative | 13 (FP) | 670 (TN) |
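To make the confusion matrices above easier to read, the snippet below computes standard accuracy, precision, and recall from the reported TP/FN/FP/TN counts (values taken from the MOVSD4 table). How the paper averages accuracy across datasets and scenes is not specified here, so the printout is illustrative only.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Counts reported above for the MOVSD4 [13] dataset.
acc, prec, rec, f1 = classification_metrics(tp=3002, fn=18, fp=19, tn=750)
print(f"accuracy={acc:.4f}  precision={prec:.4f}  recall={rec:.4f}  f1={f1:.4f}")
```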
Frame Sequence | SCR TAE% | SCR RT (Degree/100 m) | SCR RRE (m) | DynaS TAE% | DynaS RT (Degree/100 m) | DynaS RRE (m) | Proposed TAE% | Proposed RT (Degree/100 m) | Proposed RRE (m) |
---|---|---|---|---|---|---|---|---|---|
FS1 | 1.37 | 0.21 | 5.4 | 1.58 | 0.23 | 3.6 | 0.53 | 0.18 | 3.2 |
FS3 | 0.69 | 0.17 | 0.7 | 0.68 | 0.19 | 4.7 | 0.75 | 0.12 | 0.78 |
FS4 | 0.48 | 0.13 | 0.3 | 0.45 | 0.09 | 1.23 | 0.41 | 0.11 | 0.2 |
FS5 | 0.39 | 0.17 | 0.7 | 0.41 | 0.17 | 0.79 | 0.39 | 0.17 | 0.69 |
FS7 | 0.51 | 0.29 | 0.6 | 0.53 | 0.29 | 0.5 | 0.41 | 0.14 | 0.38 |
FS8 | 1.03 | 0.33 | 3.7 | 1.07 | 0.33 | 3.6 | 0.57 | 0.18 | 2.2 |
Scene | SCR TAE% | SCR Mean | SCR Median | SCR SD | DynaS TAE% | DynaS Mean | DynaS Median | DynaS SD | Proposed TAE% | Proposed Mean | Proposed Median | Proposed SD |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.77 | 0.53 | 0.175 | 0.521 | 0.002 | 0.001 | 0.001 | 0.101 | 0.012 | 0.018 | 0.002 | 0.010 |
2 | 0.87 | 0.85 | 0.788 | 0.177 | 0.006 | 0.719 | 0.077 | 0.138 | 0.075 | 0.024 | 0.078 | 0.033 |
3 | 1.01 | 0.87 | 0.675 | 0.512 | 0.045 | 0.069 | 0.023 | 0.135 | 0.041 | 0.021 | 0.052 | 0.033 |
4 | 0.49 | 0.43 | 0.422 | 0.223 | 0.141 | 0.107 | 0.179 | 0.166 | 0.139 | 0.117 | 0.069 | 0.037 |
5 | 0.51 | 0.48 | 0.487 | 0.243 | 0.115 | 0.129 | 0.105 | 0.153 | 0.141 | 0.092 | 0.098 | 0.039 |
6 | 0.53 | 0.47 | 0.459 | 0.322 | 0.107 | 0.133 | 0.109 | 0.149 | 0.097 | 0.128 | 0.090 | 0.053 |
Scene | SCR TAE% | SCR Mean | SCR Median | SCR SD | DynaS TAE% | DynaS Mean | DynaS Median | DynaS SD | Proposed TAE% | Proposed Mean | Proposed Median | Proposed SD |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.013 | 0.008 | 0.008 | 0.012 | 0.003 | 0.003 | 0.001 | 0.001 | 0.010 | 0.008 | 0.002 | 0.011 |
2 | 0.072 | 0.014 | 0.014 | 0.013 | 0.006 | 0.019 | 0.017 | 0.018 | 0.013 | 0.014 | 0.078 | 0.013 |
3 | 0.041 | 0.021 | 0.011 | 0.013 | 0.045 | 0.039 | 0.013 | 0.015 | 0.023 | 0.011 | 0.052 | 0.013 |
4 | 0.039 | 0.017 | 0.014 | 0.010 | 0.041 | 0.017 | 0.079 | 0.006 | 0.017 | 0.017 | 0.019 | 0.007 |
5 | 0.041 | 0.014 | 0.014 | 0.013 | 0.015 | 0.029 | 0.015 | 0.005 | 0.023 | 0.022 | 0.018 | 0.019 |
6 | 0.067 | 0.027 | 0.021 | 0.043 | 0.017 | 0.033 | 0.019 | 0.009 | 0.030 | 0.028 | 0.010 | 0.013 |
Model | Dynamic Segmentation Time | Tracking Time | Total Dynamic Object Detection Time |
---|---|---|---|
SCR model | Does not have segmentation | 48.3 | 48.3 |
DynaS model | 501.4 | 77.3 | 578.7 |
Our proposed model | 83.4 | 51.2 | 134.7 |