A State-Based Language for Enhanced Video Surveillance Modeling (SEL)
Abstract
1. Introduction
2. Related Work
3. Proposal SEL: Language Description and Methodology
3.1. Representation of the Scenario
3.2. Motion Detection
3.3. Motion Primitives
3.4. Language Grammar
3.5. Methodology for Implementation
- Scenario Analysis: This step entails a thorough analysis of the scenario, concentrating on the pertinent areas and activities; Figure 6(1) shows sample frames of the scene.
- Segmentation: The scenario is systematically divided into sections using the proposed matrix. The granularity of these segments can be adjusted to match the demands of the activities being modeled; Figure 6(2) presents several possible segmentation options.
- State Listing: After segmentation, a list of named states is generated; Figure 6(3) shows the matrix-based structure of the state names and list.
- Activity Modeling: Once the states are available, the appropriate motion primitives are selected to model the activities and build the corresponding statements; Figure 6(4) gives a graphical representation of three activities built with the proposed motion primitives. The sequence primitive is used for activities such as tracking, the concurrency primitive is essential in scenarios that demand synchronization, and the parallelism primitive is intended for detecting movement inside designated zones.
- Activity Script Creation: Finally, one or more scripts are written for the activities of interest. Each script builds on the segmented states, which is essential for activities modeled with motion primitives. Once a script is completed with the grammar and syntax defined in SEL, the modeled activities can be inferred from the video; Figure 6(5) illustrates a script based on the previous examples (a minimal illustrative sketch of this pipeline follows this list).
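To make these steps concrete, the following is a minimal Python sketch of the pipeline: segmentation cells are mapped to named states, and a detected trajectory is checked against a sequence-style activity. This is an illustrative approximation rather than the SEL implementation; the grid layout, the state_of helper, and the seq function signature are assumptions chosen to mirror the state declarations and the seq() primitive described in the paper.

```python
# Illustrative sketch (not the SEL implementation): grid-based states
# and a sequence-style activity check over a detected trajectory.

# Steps 2-3: segment the scene with a cell matrix and list named states.
# Each state is a set of (column, row) cells, mirroring declarations such
# as state A=[(2, 2),(3, 2)]; used in the paper's examples.
states = {
    "A": {(2, 2), (3, 2)},
    "B": {(4, 2), (5, 2)},
    "C": {(6, 2), (7, 2)},
}

def state_of(cell):
    """Return the named state containing a motion-detected cell, or None."""
    for name, cells in states.items():
        if cell in cells:
            return name
    return None

def seq(trajectory, *expected):
    """Sequence primitive (illustrative): the expected states must be
    visited in the given order along the object's trajectory."""
    visited = []
    for cell in trajectory:
        s = state_of(cell)
        if s is not None and (not visited or visited[-1] != s):
            visited.append(s)
    it = iter(visited)
    # `e in it` consumes the iterator, so this checks an ordered subsequence.
    return all(e in it for e in expected)

# Step 5: a tracked object crossing the three zones from left to right.
trajectory = [(2, 2), (3, 2), (4, 2), (5, 2), (6, 2), (7, 2)]
print(seq(trajectory, "A", "B", "C"))  # True: the modeled activity is inferred
```

In an actual SEL script the same idea is expressed declaratively, with state declarations followed by a statement built from the seq(), par(), and con() primitives.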
4. Experimental Analysis and Results
- (a) Single-object counting. The first scenario involves passage states, or access zones, and its goal is to track the trajectory of moving objects. The complexity arises from external factors that the motion detector may pick up, such as changes in lighting, and from the presence of multiple access points. In this scenario, each motion state represents an individual moving object, such as a person or a car, that must be counted (a small sketch after this list illustrates this kind of zone-entry counting, together with the concurrency check used in scenario (c)).
- (b) Activity inference through displacement trajectories. Standard surveillance videos for traffic analysis are an essential input for an intelligent system that monitors vehicle density and flow, particularly in outdoor zones where illumination conditions are uncontrolled. These scenarios encompass the various activities that vehicles perform, behaviors that can be recognized and labeled as allowed or not. However, as in the first scenario, detecting the correct motion zones is challenging because of the uncontrollable outdoor conditions.
- (c) Complex interaction between two or more objects. This scenario represents one of the most complex recognition dynamics. It involves object interactions: situations in which two or more objects interact with a specific semantic interpretation, such as a handshake, where the motions of two states are synchronized. In this context, precise detection and tracking of the moving objects are fundamental for correctly interpreting these interactions, which has significant applications in advanced surveillance, human–robot interaction, and security monitoring.
- (d) Complex activity detection. In a complex scenario, multiple activities occur in different areas. These activities are repetitive, meaning that several state sequences can represent the activation of the same activity. Accurately identifying the activities and understanding their interactions are crucial for a precise analysis and interpretation of the scenario's dynamics, and reliable detection and tracking of moving objects are essential for capturing the complexity of these simultaneous activities. Scenarios of this type pose unique challenges for surveillance and activity-analysis systems and must be understood in detail to develop effective monitoring and security solutions.
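As a complement to scenarios (a) and (c), the sketch below shows, again as a hedged Python approximation rather than the paper's actual detector, how per-frame state activations could drive a zone-entry counter and a concurrency check for synchronized motion. The per-frame set format, the count_entries helper, and the con signature are illustrative assumptions layered on top of a motion detector that reports which states are active.

```python
# Illustrative sketch: scenario (a) counting and scenario (c) concurrency,
# assuming the motion detector reports, per frame, the set of active states.
from typing import List, Set

def count_entries(frames: List[Set[str]], zone: str) -> int:
    """Scenario (a): count rising edges of an access-zone state, i.e.,
    how many times a moving object enters the zone."""
    entries, prev_active = 0, False
    for active_states in frames:
        active = zone in active_states
        if active and not prev_active:
            entries += 1
        prev_active = active
    return entries

def con(frames: List[Set[str]], a: str, b: str, min_overlap: int = 3) -> bool:
    """Concurrency primitive (illustrative): states a and b must be active
    simultaneously for at least min_overlap consecutive frames, as in a
    handshake where the motions of two states are synchronized."""
    run = best = 0
    for active_states in frames:
        run = run + 1 if {a, b} <= active_states else 0
        best = max(best, run)
    return best >= min_overlap

# Assumed detector output: per-frame sets of active states.
frames = [{"A"}, {"A", "B"}, {"A", "B"}, {"A", "B"}, {"B"}, set(), {"A"}]
print(count_entries(frames, "A"))  # 2 entries into access zone A
print(con(frames, "A", "B"))       # True: A and B overlap for 3 frames
```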
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| Motion Primitive | Symbol | Computational Complexity |
|---|---|---|
| Sequence | seq() | |
| Parallelism | par() | |
| Concurrency | con() | |
| Approach | Computational Complexity |
|---|---|
| Hidden Markov Models | |
| Gaussian Mixture Model | |
| Tracking optical flow | |
| Convolutional Neural Networks | |
| PCA | |
| Bayes classifier | |
| Scenario | States | Activity Name | SEL |
|---|---|---|---|
| | state A=[(5, 0),(6, 0)]; state B=[(7, 0),(8, 0),(9, 0),(10, 0),(11, 0)]; state C=[(10, 6),(11, 6)]; state D=[(12, 6),(12, 7),(13, 7),(13, 6),(14, 6),(14, 7),(15, 7),(15, 6)]; | Abnormal activity | |
| | state A=[(2, 2),(3, 2),(3, 3),(2, 3)]; state B=[(4, 3),(4, 2),(5, 2),(5, 3)]; state C=[(6, 2),(6, 3),(7, 3),(8, 3),(9, 3),(9, 2),(8, 2),(7, 2)]; state D=[(10, 1),(10, 2),(11, 2),(11, 1),(12, 1),(12, 2),(13, 2),(13, 1)]; state E=[(15, 1),(14, 1),(14, 2),(15, 2)]; | Normal trajectory flow | |
| | state A=[(3, 3),(4, 3),(4, 4),(5, 4),(6, 4),(6, 3),(5, 3)]; state B=[(7, 4),(8, 4),(9, 4),(10, 4)]; state C=[(11, 5),(11, 4),(12, 4),(12, 5)]; state D=[(13, 4),(13, 5),(14, 5),(14, 4),(15, 4),(15, 5)]; | Lateral road entry | |
| | state A=[(6, 0),(7, 0),(8, 0)]; state B=[(6, 1),(7, 1),(8, 1)]; state C=[(6, 2),(7, 2),(8, 2)]; state D=[(6, 4),(7, 4),(8, 4)]; state E=[(6, 5),(7, 5),(8, 5)]; state F=[(6, 6),(7, 6),(8, 6)]; state G=[(6, 7),(7, 7),(8, 7)]; state H=[(6, 8),(7, 8),(8, 8)]; | Double traffic lanes | |
| | state A=[(9, 5),(9, 6),(10, 6),(11, 6),(11, 5),(10, 5)]; state B=[(9, 7),(9, 8),(10, 8),(11, 8),(11, 7),(10, 7)]; state C=[(12, 5),(12, 6),(13, 6),(14, 6),(14, 5),(13, 5),(15, 5),(15, 6)]; state D=[(12, 7),(12, 8),(14, 8),(13, 7),(13, 8),(14, 7),(15, 7),(15, 8)]; | Join the highway | |
| | state A=[(5, 4),(6, 4),(7, 4)]; state B=[(8, 4),(9, 4),(10, 4),(11, 4),(12, 4)]; state C=[(13, 4),(14, 4),(15, 4)]; | Crossing detection | |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).