Deep Multi-Scale Features Fusion for Effective Violence Detection and Control Charts Visualization
Abstract
1. Introduction
- We propose a novel spatio-temporal violence detection (VD) method that uses multi-scale deep features from the spatial domain to handle partially visible action performers and occluded violent actions.
- We introduce control charts to the VD domain to analyze violent behavior, reduce false alarms, and maintain a history of the events that occurred. To the best of our knowledge, this is the first work to employ such visualization with deep-model-based VD.
- Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed VD model.
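To make the control-chart contribution more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Shewhart-style chart over per-frame violence scores in [0, 1], with limits set at the baseline mean plus or minus three standard deviations, and it assumes that an alarm is raised only after several consecutive out-of-limit frames, as one plausible way to suppress false alarms. The function names `control_limits` and `alarm_runs`, the parameter `k`, and the `min_run` filter are all hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lcl, center, ucl) estimated from baseline scores.

    Assumption: scores are probabilities, so limits are clipped to [0, 1].
    """
    center = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation
    lcl = max(0.0, center - k * sigma)
    ucl = min(1.0, center + k * sigma)
    return lcl, center, ucl


def alarm_runs(scores, ucl, min_run=3):
    """Return (start, end) index pairs, inclusive, of runs above ucl.

    Assumption: a run must last at least min_run consecutive frames to
    count as an alarm; shorter spikes are ignored.
    """
    runs, start = [], None
    for i, score in enumerate(scores):
        if score > ucl:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that is still open at the end of the sequence.
    if start is not None and len(scores) - start >= min_run:
        runs.append((start, len(scores) - 1))
    return runs


if __name__ == "__main__":
    # Hypothetical scores from a calm period, then a monitored sequence.
    baseline = [0.10, 0.12, 0.08, 0.11, 0.09]
    lcl, center, ucl = control_limits(baseline)
    monitored = [0.1, 0.9, 0.1, 0.8, 0.9, 0.95, 0.1, 0.9, 0.9, 0.9]
    print(alarm_runs(monitored, ucl))
```

In this sketch, the list of alarm runs doubles as a simple event history, while the run-length filter trades detection latency for fewer alarms on isolated spikes.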
2. Related Work
3. The Proposed Framework
3.1. Spatial Features Extraction
3.2. Spatio-Temporal Learning Mechanism
3.3. Control Charts Construction
3.4. Implementation Details
4. Experimental Results
4.1. Setup
4.2. Datasets
4.2.1. Hockey Fight
4.2.2. RWF-2000
4.2.3. Others
4.3. Discussion
4.4. Comparison with SOTA
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Muhammad, K.; Obaidat, M.S.; Hussain, T.; Ser, J.D.; Kumar, N.; Tanveer, M.; Doctor, F. Fuzzy logic in surveillance big video data analysis: Comprehensive review, challenges, and research directions. ACM Comput. Surv. (CSUR) 2021, 54, 1–33.
- Sevcik, L.; Voznak, M. Adaptive Reservation of Network Resources According to Video Classification Scenes. Sensors 2021, 21, 1949.
- Zhang, S.; Li, Y.; Zhang, S.; Shahabi, F.; Xia, S.; Deng, Y.; Alshurafa, N. Deep learning in human activity recognition with wearable sensors: A review on advances. Sensors 2022, 22, 1476.
- Yao, H.; Hu, X. A survey of video violence detection. Cyber-Phys. Syst. 2021, 1–24.
- Baba, M.; Gui, V.; Cernazanu, C.; Pescaru, D. A sensor network approach for violence detection in smart cities using deep learning. Sensors 2019, 19, 1676.
- Khan, I.U.; Afzal, S.; Lee, J.W. Human activity recognition via hybrid deep learning based model. Sensors 2022, 22, 323.
- Ullah, W.; Ullah, A.; Haq, I.U.; Muhammad, K.; Sajjad, M.; Baik, S.W. CNN features with bi-directional LSTM for real-time anomaly detection in surveillance networks. Multimed. Tools Appl. 2021, 80, 16979–16995.
- Lejmi, W.; Khalifa, A.B.; Mahjoub, M.A. Challenges and methods of violence detection in surveillance video: A survey. In Computer Analysis of Images and Patterns, Proceedings of the International Conference on Computer Analysis of Images and Patterns; Springer: Cham, Switzerland, 2019; pp. 62–73.
- Serrano Gracia, I.; Deniz Suarez, O.; Bueno Garcia, G.; Kim, T.K. Fast fight detection. PLoS ONE 2015, 10, e0120448.
- Zhang, T.; Yang, Z.; Jia, W.; Yang, B.; Yang, J.; He, X. A new method for violence detection in surveillance scenes. Multimed. Tools Appl. 2016, 75, 7327–7349.
- Hassner, T.; Itcher, Y.; Kliper-Gross, O. Violent flows: Real-time detection of violent crowd behavior. In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June 2012; pp. 1–6.
- Sjöberg, M.; Baveye, Y.; Wang, H.; Quang, V.L.; Ionescu, B.; Dellandréa, E.; Schedl, M.; Demarty, C.H.; Chen, L. The MediaEval 2015 Affective Impact of Movies Task. In Proceedings of the MediaEval 2015 Workshop, Wurzen, Germany, 14–15 September 2015; Volume 1436.
- Serrano, I.; Deniz, O.; Espinosa-Aranda, J.L.; Bueno, G. Fight recognition in video using hough forests and 2D convolutional neural network. IEEE Trans. Image Process. 2018, 27, 4787–4797.
- Ding, C.; Fan, S.; Zhu, M.; Feng, W.; Jia, B. Violence detection in video by using 3D convolutional neural networks. In Advances in Visual Computing, Proceedings of the International Symposium on Visual Computing; Springer: Cham, Switzerland, 2014; pp. 551–558.
- Tran, D.; Bourdev, L.; Fergus, R.; Torresani, L.; Paluri, M. Learning spatiotemporal features with 3d convolutional networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 4489–4497.
- Meng, Z.; Yuan, J.; Li, Z. Trajectory-pooled deep convolutional networks for violence detection in videos. In Computer Vision Systems. ICVS 2017; Springer: Cham, Switzerland, 2017; pp. 437–447.
- Sudhakaran, S.; Lanz, O. Learning to detect violent videos using convolutional long short-term memory. In Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy, 29 August–1 September 2017; pp. 1–6.
- Aktı, Ş.; Tataroğlu, G.A.; Ekenel, H.K. Vision-based fight detection from surveillance cameras. In Proceedings of the 2019 Ninth International Conference on Image Processing Theory, Tools and Applications (IPTA), Istanbul, Turkey, 6–9 November 2019; pp. 1–6.
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
- Ullah, W.; Ullah, A.; Hussain, T.; Khan, Z.A.; Baik, S.W. An efficient anomaly recognition framework using an attention residual LSTM in surveillance videos. Sensors 2021, 21, 2811.
- Ullah, F.U.M.; Muhammad, K.; Haq, I.U.; Khan, N.; Heidari, A.A.; Baik, S.W.; de Albuquerque, V.H.C. AI-Assisted Edge Vision for Violence Detection in IoT-Based Industrial Surveillance Networks. IEEE Trans. Ind. Inform. 2021, 18, 5359–5370.
- Nafea, O.; Abdul, W.; Muhammad, G.; Alsulaiman, M. Sensor-based human activity recognition with spatio-temporal deep learning. Sensors 2021, 21, 2141.
- Ullah, A.; Muhammad, K.; Hussain, T.; Lee, M.; Baik, S.W. Deep LSTM-based sequence learning approaches for action and activity recognition. In Deep Learning in Computer Vision; CRC Press: Boca Raton, FL, USA, 2020; pp. 127–150.
- Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in vision: A survey. ACM Comput. Surv. (CSUR) 2021, 54, 200.
- Wu, H.; Xiao, B.; Codella, N.; Liu, M.; Dai, X.; Yuan, L.; Zhang, L. CvT: Introducing convolutions to vision transformers. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 22–31.
- Singh, J.; Thakur, D.; Ali, F.; Gera, T.; Kwak, K.S. Deep feature extraction and classification of android malware images. Sensors 2020, 20, 7013.
- Khan, K.; Khan, R.U.; Ahmad, K.; Ali, F.; Kwak, K.S. Face segmentation: A journey from classical to deep learning paradigm, approaches, trends, and directions. IEEE Access 2020, 8, 58683–58699.
- Ale, L.; Zhang, N.; Li, L. Road damage detection using RetinaNet. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 5197–5200.
- Ullah, A.; Ahmad, J.; Muhammad, K.; Sajjad, M.; Baik, S.W. Action recognition in video sequences using deep bi-directional LSTM with CNN features. IEEE Access 2017, 6, 1155–1166.
- Nievas, E.B.; Suarez, O.D.; García, G.B.; Sukthankar, R. Violence detection in video using computer vision techniques. In CAIP 2011: Computer Analysis of Images and Patterns; Springer: Berlin/Heidelberg, Germany, 2011; pp. 332–339.
- Cheng, M.; Cai, K.; Li, M. RWF-2000: An open large scale video database for violence detection. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 4183–4190.
- Bilinski, P.; Bremond, F. Human violence recognition and detection in surveillance videos. In Proceedings of the 2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Colorado Springs, CO, USA, 23–26 August 2016; pp. 30–36.
- Mabrouk, A.B.; Zagrouba, E. Spatio-temporal feature using optical flow based distribution for violence detection. Pattern Recognit. Lett. 2017, 92, 62–67.
- Xia, Q.; Zhang, P.; Wang, J.; Tian, M.; Fei, C. Real time violence detection based on deep spatio-temporal features. In CCBR 2018: Biometric Recognition; Springer: Cham, Switzerland, 2018; pp. 157–165.
- Ullah, F.U.M.; Ullah, A.; Muhammad, K.; Haq, I.U.; Baik, S.W. Violence detection using spatiotemporal features with 3D convolutional neural network. Sensors 2019, 19, 2472.
- Carreira, J.; Zisserman, A. Quo vadis, action recognition? A new model and the kinetics dataset. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6299–6308.
- Traoré, A.; Akhloufi, M.A. Violence detection in videos using deep recurrent and convolutional neural networks. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 11–14 October 2020; pp. 154–159.
- Ullah, F.U.M.; Obaidat, M.S.; Muhammad, K.; Ullah, A.; Baik, S.W.; Cuzzolin, F.; Rodrigues, J.J.; de Albuquerque, V.H.C. An intelligent system for complex violence pattern analysis and detection. Int. J. Intell. Syst. 2021.
- Freire-Obregón, D.; Barra, P.; Castrillón-Santana, M.; Marsico, M.D. Inflated 3D ConvNet context analysis for violence detection. Mach. Vis. Appl. 2022, 33, 1–13.
- Khaire, P.; Kumar, P.; Imran, J. Combining CNN streams of RGB-D and skeletal data for human activity recognition. Pattern Recognit. Lett. 2018, 115, 107–116.
Dataset | No. of Videos | No. of Violent Videos | No. of Non-Violent Videos | FPS |
---|---|---|---|---|
Hockey Fight | 300 | 150 | 150 | 20–30 |
RWF-2000 | 2000 | 1000 | 1000 | |
Violent Flow | 246 | 123 | 123 | 25 |
Violence in Movies | 1000 | 500 | 500 | 25 |
Method | Hockey Fight Accuracy (%) | RWF-2000 Accuracy (%) | Violent Crowd Accuracy (%) | Industrial Surveillance Accuracy (%) |
---|---|---|---|---|
Hassner et al. [11] | 58.20 | – | 81.20 | – |
Ding et al. [14] | 91.00 | – | – | – |
Bilinski et al. [32] | 93.40 | – | – | – |
Mabrouk et al. [33] | 88.60 | – | 85.83 | – |
Sudhakaran et al. [17] | 97.10 | – | – | – |
Xia et al. [34] | 95.90 | – | – | – |
Ullah et al. [35] | 96.00 | – | – | – |
Carreira et al. (a) [36] | – | 85.75 | – | – |
Carreira et al. (b) [36] | – | 75.50 | – | – |
Carreira et al. (c) [36] | – | 81.50 | – | – |
Traoré et al. [37] | 96.50 | – | – | – |
Ullah et al. [21] | 98.50 | 88.20 | – | 83.57 |
Ullah et al. [38] | 98.20 | – | – | – |
Freire et al. [39] | 99.40 | – | – | – |
Cheng et al. [31] | – | 87.25 | – | – |
Tran et al. [15] | – | 82.75 | – | – |
Ours | 91.29 | 90.47 | 89.63 | 81.22 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Mumtaz, N.; Ejaz, N.; Aladhadh, S.; Habib, S.; Lee, M.Y. Deep Multi-Scale Features Fusion for Effective Violence Detection and Control Charts Visualization. Sensors 2022, 22, 9383. https://doi.org/10.3390/s22239383