ACF: An Armed CCTV Footage Dataset for Enhancing Weapon Detection
Abstract
1. Introduction
- The first and most critical issue concerned the data fed to a Convolutional Neural Network (CNN), which learns features from them to perform classification and detection. No standard dataset was available for weapon detection in CCTV footage.
- The publicly available datasets did not cover our task of interest. For example, existing pistol datasets contain images of a full-size pistol and are intended for classification. In most of these datasets, image quality was poor and non-uniform.
- Collecting data manually was time-consuming, and labeling the collected data was laborious because all data had to be categorized by hand.
- The objects in images extracted from CCTV footage were small, so a preprocessing method was needed to improve the effectiveness of the object detection models.
- We collected a new dataset suited to training weapon detectors: the Armed CCTV Footage (ACF) dataset of 8319 images. It contains CCTV footage of pedestrians carrying a weapon in different scenes and armed postures, recorded in both indoor and outdoor scenarios to improve weapon detection on actual CCTV footage.
- We applied the image tiling method to improve the object detectors on small objects. Tiling helps the detector on the small image regions that weapons occupy.
2. Literature Review
2.1. Object Detection Approach Overview
2.2. Weapon Detection Public Datasets
3. ACF Dataset
3.1. CCTV Cameras Setting
3.2. Data Collection
- Parking lot 1: Located at the corner of the Faculty of Engineering parking lot, where no surrounding objects such as trees or buildings cast shadows on the recorded area. Lighting depended entirely on the weather, which was sunny. Recording took 38 min 44 s in total for the pistol and knife data and was carried out in the morning and afternoon of the same day. Three participants took turns carrying a weapon according to the data collection scenarios, with up to two armed participants per scenario.
- Parking lot 2: Located at the edge of the Faculty of Engineering parking lot, where buildings and trees cast shadows on the recorded area. Recording took 28 min 8 s in total for the pistol and knife data and was carried out in the morning and afternoon of the same day. Three participants took turns carrying a weapon according to the data collection scenarios, with up to two armed participants per scenario.
- Parking lot 3: Located at the edge of the Faculty of Engineering parking lot, next to a grass field, where nearby trees cast shadows on the recorded area. Recording took 45 min 35 s in total for the pistol and knife data and was carried out in the afternoon. Four participants took turns carrying a weapon according to the data collection scenarios, with up to two armed participants per scenario.
- Corridor: Located in the corridor leading to the classrooms on the second floor of the Faculty of Engineering, where the facade cast a shadow on the recorded area. Recording took 25 min 35 s in total for the pistol and knife data and was carried out in the afternoon. Two participants took turns carrying a weapon according to the data collection scenarios, with up to two armed participants per scenario.
3.3. Data Acquisition
- Dataset 1: ACF_Pistol Dataset
- Dataset 2: ACF_Knife Dataset
- Dataset 3: ACF Dataset
4. Methodology
4.1. Preprocessing
- Tiling Dataset 1: A total of 17,704 tiled images derived from Dataset 1, showing a person holding a pistol in different scenarios.
- Tiling Dataset 2: A total of 14,236 tiled images derived from Dataset 2, showing a person armed with a knife.
- Tiling Dataset 3: The combination of Tiling Dataset 1 and Tiling Dataset 2, comprising a total of 33,276 tiled images of Dataset 3, which show a person armed with a pistol or a knife.
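
The four-fold growth from each dataset to its tiled counterpart (for example, 4426 images in Dataset 1 versus 17,704 in Tiling Dataset 1) is consistent with cutting every frame into four tiles. The following is a minimal sketch of such a step, assuming a non-overlapping 2 × 2 grid over a 1920 × 1080 frame and bounding boxes stored as (xmin, ymin, xmax, ymax) in pixels. The grid layout, the function names, and the `min_size` rule for discarding slivers are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax) in pixels


def tile_extents(width: int, height: int, cols: int = 2, rows: int = 2) -> List[Box]:
    """Return the extent of each tile in a cols x rows grid, row by row.

    Assumption: tiles do not overlap and the frame divides evenly.
    """
    tile_w, tile_h = width // cols, height // rows
    return [
        (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
        for r in range(rows)
        for c in range(cols)
    ]


def clip_box_to_tile(box: Box, tile: Box, min_size: int = 2) -> Optional[Box]:
    """Clip a full-frame box to a tile and shift it into tile coordinates.

    Returns None when the overlap is narrower or shorter than min_size pixels
    (hypothetical rule for dropping slivers left at tile borders).
    """
    xmin, ymin = max(box[0], tile[0]), max(box[1], tile[1])
    xmax, ymax = min(box[2], tile[2]), min(box[3], tile[3])
    if xmax - xmin < min_size or ymax - ymin < min_size:
        return None
    return (xmin - tile[0], ymin - tile[1], xmax - tile[0], ymax - tile[1])


def tile_annotations(
    width: int, height: int, boxes: List[Box], cols: int = 2, rows: int = 2
) -> Dict[int, List[Box]]:
    """Map each tile index to the boxes that fall inside it, in tile coordinates."""
    result: Dict[int, List[Box]] = {}
    for idx, tile in enumerate(tile_extents(width, height, cols, rows)):
        clipped = [clip_box_to_tile(b, tile) for b in boxes]
        result[idx] = [b for b in clipped if b is not None]
    return result


# One 49 x 62 pixel box (the average pistol size listed for ACF) in the lower-right quadrant.
tiles = tile_extents(1920, 1080)
per_tile = tile_annotations(1920, 1080, [(1000, 600, 1049, 662)])
print(tiles[3])     # extent of the lower-right tile
print(per_tile[3])  # the box re-expressed in that tile's coordinates
```

Under these assumptions, a box of fixed pixel size covers four times as large a share of a 960 × 540 tile as of the full frame, which is the effect tiling relies on when the detector resizes its input to a fixed resolution.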
4.2. Weapon Detection Model Training
4.3. Evaluation Metrics
- Intersection over Union (IoU)
- Precision
- Recall
- Average Precision (AP)
- Average Recall (AR)
- Mean Average Precision (mAP)
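
The metrics listed above can be illustrated with a short sketch. It assumes boxes given as (xmin, ymin, xmax, ymax), greedy one-to-one matching of detections to ground truth in descending score order, and all-point interpolation of the precision-recall curve. These choices and all function names are illustrative; the COCO evaluation protocol differs in details such as its interpolation scheme, and its primary metric averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Detection = Tuple[float, Box]            # (confidence score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(
    detections: Sequence[Detection], ground_truth: Sequence[Box], iou_threshold: float = 0.5
) -> List[bool]:
    """Return true-positive flags in descending score order.

    Each ground-truth box can be matched by at most one detection.
    """
    matched = set()
    flags = []
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_tp = best_idx >= 0 and best_iou >= iou_threshold
        if is_tp:
            matched.add(best_idx)
        flags.append(is_tp)
    return flags


def precision_recall(flags: Sequence[bool], num_gt: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / number of ground-truth boxes."""
    tp = sum(flags)
    precision = tp / len(flags) if flags else 0.0
    recall = tp / num_gt if num_gt else 0.0
    return precision, recall


def average_precision(flags: Sequence[bool], num_gt: int) -> float:
    """Area under the precision-recall curve after making precision non-increasing."""
    if not flags or num_gt == 0:
        return 0.0
    precisions, recalls, tp = [], [], 0
    for rank, is_tp in enumerate(flags, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Toy example: two ground-truth boxes, three detections (one is a false positive).
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 28))]
flags = match_detections(detections, ground_truth, iou_threshold=0.5)
precision, recall = precision_recall(flags, len(ground_truth))
ap = average_precision(flags, len(ground_truth))
print(flags, round(precision, 3), recall, round(ap, 3))
```

Mean Average Precision is then the mean of such AP values over classes (and, in the COCO convention, over IoU thresholds). Repeating the calculation with `iou_threshold=0.75` corresponds to the stricter "0.75 IoU" setting.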
5. Experimental Results
5.1. Experimental Setup
- (1) SSD MobileNet V2;
- (2) EfficientDet D0;
- (3) Faster R-CNN Inception Resnet V2.
5.2. Weapon Detection on ACF Dataset
- SSD MobileNet V2;
- EfficientDet D0;
- Faster R-CNN Inception Resnet V2.
5.3. Approach Comparison with Related Research
6. Discussion
6.1. Importance of ACF Creation
6.2. Importance of Tiling Approach
7. Conclusions and Future Works
7.1. Conclusions
7.2. Limitations and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
References
- Royal Canadian Mounted Police. Community, Contract and Aboriginal Policing Services Directorate, Research and Evaluation Branch—Indigenous Justice Clearinghouse. Available online: https://www.indigenousjustice.gov.au/resource-author/royal-canadian-mounted-police-community-contract-and-aboriginal-policing-services-directorate-research-and-evaluation-branch/ (accessed on 22 December 2021).
- Available online: http://statbbi.nso.go.th/staticreport/page/sector/th/09.aspx (accessed on 22 December 2021).
- Mass Shooter Killed at Terminal 21 in Korat, 20 Dead. Available online: https://www.bangkokpost.com/learning/really-easy/1853824/mass-shooter-killed-at-terminal-21-in-korat-20-dead (accessed on 22 December 2021).
- ImageNet. Available online: https://www.image-net.org/update-mar-11-2021.php (accessed on 6 January 2022).
- COCO—Common Objects in Context. Available online: https://cocodataset.org/#home (accessed on 6 January 2022).
- Tiwari, R.K.; Verma, G.K. A Computer Vision Based Framework for Visual Gun Detection Using Harris Interest Point Detector. Procedia Comput. Sci. 2015, 54, 703–712.
- Verma, G. A Computer Vision Based Framework for Visual Gun Detection Using SURF. In Proceedings of the 2015 International Conference on Electrical, Electronics, Signals, Communication and Optimization (EESCO), Visakhapatnam, India, 24–25 January 2015; pp. 1–5.
- Huval, B.; Wang, T.; Tandon, S.; Kiske, J.; Song, W.; Pazhayampallil, J.; Andriluka, M.; Rajpurkar, P.; Migimatsu, T.; Cheng-Yue, R.; et al. An Empirical Evaluation of Deep Learning on Highway Driving. arXiv 2015, arXiv:1504.01716.
- Mastering the Game of Go with Deep Neural Networks and Tree Search. Available online: https://www.researchgate.net/publication/292074166_Mastering_the_game_of_Go_with_deep_neural_networks_and_tree_search (accessed on 22 December 2021).
- TensorFlow. Available online: https://www.tensorflow.org/ (accessed on 22 December 2021).
- Caffe | Deep Learning Framework. Available online: https://caffe.berkeleyvision.org/ (accessed on 22 December 2021).
- PyTorch. Available online: https://www.pytorch.org (accessed on 20 April 2021).
- ONNX | Home. Available online: https://onnx.ai/ (accessed on 22 December 2021).
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385.
- Liu, L.; Ouyang, W.; Wang, X.; Fieguth, P.; Chen, J.; Liu, X.; Pietikäinen, M. Deep Learning for Generic Object Detection: A Survey. Int. J. Comput. Vis. 2020, 128, 261–318.
- Ba, J.; Mnih, V.; Kavukcuoglu, K. Multiple Object Recognition with Visual Attention. arXiv 2015, arXiv:1412.7755.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. arXiv 2016, arXiv:1506.02640.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1 July 2017; pp. 6517–6525.
- YOLOv3: An Incremental Improvement | Semantic Scholar. Available online: https://www.semanticscholar.org/paper/YOLOv3%3A-An-Incremental-Improvement-Redmon-Farhadi/e4845fb1e624965d4f036d7fd32e8dcdd2408148 (accessed on 22 December 2021).
- Zhang, C.; Kim, J. Object Detection with Location-Aware Deformable Convolution and Backward Attention Filtering. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; IEEE: Long Beach, CA, USA, 2019; pp. 9444–9453.
- AttentionNet: Aggregating Weak Directions for Accurate Object Detection. Available online: https://arxiv.org/abs/1506.07704 (accessed on 2 February 2022).
- Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhutdinov, R.; Zemel, R.; Bengio, Y. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. arXiv 2016, arXiv:1502.03044.
- Li, L.; Xu, M.; Wang, X.; Jiang, L.; Liu, H. Attention Based Glaucoma Detection: A Large-Scale Database and CNN Model. arXiv 2019, arXiv:1903.10831.
- Hara, K.; Liu, M.-Y.; Tuzel, O.; Farahmand, A. Attentional Network for Visual Object Detection. arXiv 2017, arXiv:1702.01478.
- Chaudhari, S.; Mithal, V.; Polatkan, G.; Ramanath, R. An Attentive Survey of Attention Models. arXiv 2021, arXiv:1904.02874.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. arXiv 2014, arXiv:1311.2524.
- Fast R-CNN. Available online: https://arxiv.org/abs/1504.08083 (accessed on 22 December 2021).
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv 2016, arXiv:1506.01497.
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. arXiv 2017, arXiv:1612.03144.
- Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. arXiv 2019, arXiv:1904.01355.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. arXiv 2018, arXiv:1703.06870.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. arXiv 2018, arXiv:1708.02002.
- Kong, T.; Sun, F.; Liu, H.; Jiang, Y.; Li, L.; Shi, J. FoveaBox: Beyond Anchor-Based Object Detector. IEEE Trans. Image Process. 2020, 29, 7389–7398.
- Peppa, M.V.; Komar, T.; Xiao, W.; James, P.; Robson, C.; Xing, J.; Barr, S. Towards an End-to-End Framework of CCTV-Based Urban Traffic Volume Detection and Prediction. Sensors 2021, 21, 629.
- He, X.; Cheng, R.; Zheng, Z.; Wang, Z. Small Object Detection in Traffic Scenes Based on YOLO-MXANet. Sensors 2021, 21, 7422.
- Galvao, L.G.; Abbod, M.; Kalganova, T.; Palade, V.; Huda, M.N. Pedestrian and Vehicle Detection in Autonomous Vehicle Perception Systems—A Review. Sensors 2021, 21, 7267.
- Kim, D.; Kim, H.; Mok, Y.; Paik, J. Real-Time Surveillance System for Analyzing Abnormal Behavior of Pedestrians. Appl. Sci. 2021, 11, 6153.
- Tsiktsiris, D.; Dimitriou, N.; Lalas, A.; Dasygenis, M.; Votis, K.; Tzovaras, D. Real-Time Abnormal Event Detection for Enhanced Security in Autonomous Shuttles Mobility Infrastructures. Sensors 2020, 20, 4943.
- Huszár, V.D.; Adhikarla, V.K. Live Spoofing Detection for Automatic Human Activity Recognition Applications. Sensors 2021, 21, 7339.
- Fernández-Carrobles, M.; Deniz, O.; Maroto, F. Gun and Knife Detection Based on Faster R-CNN for Video Surveillance. In Iberian Conference on Pattern Recognition and Image Analysis; Springer: Cham, Switzerland, 2019; pp. 441–452. ISBN 978-3-030-31320-3.
- Navalgund, U.V.; Priyadharshini, K. Crime Intention Detection System Using Deep Learning. In Proceedings of the 2018 International Conference on Circuits and Systems in Digital Enterprise Technology (ICCSDET), Kottayam, India, 21–22 December 2018; pp. 1–6.
- Weapon_detection_dataset. Available online: https://www.kaggle.com/datasets/abhishek4273/gun-detection-dataset (accessed on 14 July 2022).
- GitHub—HeeebsInc/WeaponDetection. Available online: https://github.com/HeeebsInc/WeaponDetection (accessed on 8 May 2022).
- Salazar González, J.L.; Zaccaro, C.; Álvarez-García, J.A.; Soria Morillo, L.M.; Sancho Caparrini, F. Real-Time Gun Detection in CCTV: An Open Problem. Neural Netw. 2020, 132, 297–308.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241.
- Reina, G.A.; Panchumarthy, R. Adverse Effects of Image Tiling on Convolutional Neural Networks. In Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries; Crimi, A., Bakas, S., Kuijf, H., Keyvan, F., Reyes, M., van Walsum, T., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 25–36.
- An Application of Cascaded 3D Fully Convolutional Networks for Medical Image Segmentation. Available online: https://arxiv.org/abs/1803.05431 (accessed on 22 December 2021).
- Efficient ConvNet-Based Object Detection for Unmanned Aerial Vehicles by Selective Tile Processing. Available online: https://www.researchgate.net/publication/337273563_Efficient_ConvNet-based_Object_Detection_for_Unmanned_Aerial_Vehicles_by_Selective_Tile_Processing (accessed on 22 December 2021).
- Image Tiling for Embedded Applications with Non-Linear Constraints. Available online: https://ieeexplore.ieee.org/document/7367256 (accessed on 22 December 2021).
- Reina, G.A.; Panchumarthy, R.; Thakur, S.P.; Bastidas, A.; Bakas, S. Systematic Evaluation of Image Tiling Adverse Effects on Deep Learning Semantic Segmentation. Front. Neurosci. 2020, 14, 65.
- Huang, B.; Reichman, D.; Collins, L.M.; Bradbury, K.; Malof, J.M. Tiling and Stitching Segmentation Output for Remote Sensing: Basic Challenges and Recommendations. arXiv 2019, arXiv:1805.12219.
- The Power of Tiling for Small Object Detection | IEEE Conference Publication | IEEE Xplore. Available online: https://ieeexplore.ieee.org/document/9025422 (accessed on 22 December 2021).
- Daye, M.A. A Comprehensive Survey on Small Object Detection. Master’s Thesis, İstanbul Medipol Üniversitesi Fen Bilimleri Enstitüsü, Istanbul, Turkey, 2021.
- GitHub—Real-Time Gun Detection in CCTV: An Open Problem. 2021. Available online: https://github.com/Deepknowledge-US/US-Real-time-gun-detection-in-CCTV-An-open-problem-dataset (accessed on 8 May 2022).
- Feris, R.; Brown, L.M.; Pankanti, S.; Sun, M.-T. Appearance-Based Object Detection Under Varying Environmental Conditions. In Proceedings of the 2014 22nd International Conference on Pattern Recognition, Stockholm, Sweden, 24–28 August 2014; pp. 166–171.
- Wang, A.; Sun, Y.; Kortylewski, A.; Yuille, A. Robust Object Detection under Occlusion with Context-Aware CompositionalNets. arXiv 2020, arXiv:2005.11643.
- Seo, J.; Park, H. Object Recognition in Very Low Resolution Images Using Deep Collaborative Learning. IEEE Access 2019, 7, 134071–134082.
- Courtrai, L.; Pham, M.-T.; Lefèvre, S. Small Object Detection in Remote Sensing Images Based on Super-Resolution with Auxiliary Generative Adversarial Networks. Remote Sens. 2020, 12, 3152.
- Darrenl Tzutalin/LabelImg 2020. Available online: https://github.com/tzutalin/labelImg/ (accessed on 22 December 2021).
- Tensorflow/Models 2020. Available online: https://www.tensorflow.org/ (accessed on 22 December 2021).
- Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the Computer Vision—ECCV 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 740–755.
- The PASCAL Visual Object Classes Homepage. Available online: http://host.robots.ox.ac.uk/pascal/VOC/ (accessed on 17 March 2020).
- Open Images Object Detection RVC 2020 Edition. Available online: https://kaggle.com/c/open-images-object-detection-rvc-2020 (accessed on 23 December 2021).
- Models/Research/Object_detection at Master Tensorflow/Models. Available online: https://github.com/tensorflow/models (accessed on 22 December 2021).
- Padilla, R.; Passos, W.L.; Dias, T.L.B.; Netto, S.L.; da Silva, E.A.B. A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit. Electronics 2021, 10, 279.
- Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; Huang, T. UnitBox: An Advanced Object Detection Network. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 516–520.
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression. arXiv 2019, arXiv:1902.09630.
- He, Y.; Zhu, C.; Wang, J.; Savvides, M.; Zhang, X. Bounding Box Regression with Uncertainty for Accurate Object Detection. arXiv 2019, arXiv:1809.08545.
- Jiang, B.; Luo, R.; Mao, J.; Xiao, T.; Jiang, Y. Acquisition of Localization Confidence for Accurate Object Detection. arXiv 2018, arXiv:1807.11590.
- The PASCAL Visual Object Classes Challenge 2012 (VOC2012). Available online: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html (accessed on 17 March 2020).
- Hosang, J.; Benenson, R.; Dollár, P.; Schiele, B. What Makes for Effective Detection Proposals? IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 814–830.
- Chen, K.; Li, J.; Lin, W.; See, J.; Wang, J.; Duan, L.; Chen, Z.; He, C.; Zou, J. Towards Accurate One-Stage Object Detection with AP-Loss. arXiv 2020, arXiv:1904.06373.
- Azulay, A.; Weiss, Y. Why Do Deep Convolutional Networks Generalize so Poorly to Small Image Transformations? arXiv 2019, arXiv:1805.12177.
- Hashemi, M. Enlarging Smaller Images before Inputting into Convolutional Neural Network: Zero-Padding vs. Interpolation. J. Big Data 2019, 6, 98.
- Sabottke, C.F.; Spieler, B.M. The Effect of Image Resolution on Deep Learning in Radiography. Radiol. Artif. Intell. 2020, 2, e190015.
- Luke, J.; Joseph, R.; Balaji, M. Impact of image size on accuracy and generalization of convolutional neural networks. Int. J. Res. Anal. Rev. 2019, 6, 70–80.
Dataset | Image Size | Class | Number of Images | Labels | Ratio (Labels/Image) | Avg. Pixels
---|---|---|---|---|---|---
Weapon detection dataset [43] | 240 × 145 pixels–4272 × 2848 pixels | Gun | 3000 | 3464 | 1.15:1 | (261 × 202)
Weapon detection system [44] | 99 × 93 pixels–6016 × 4016 pixels | Gun | 4940 | 9202 | 1.82:1 | (150 × 116)
Mock Attack Dataset [55] | 1920 × 1080 pixels | Handgun | 5149 | 1714 | 0.32:1 | (40 × 50)
Mock Attack Dataset [55] | 1920 × 1080 pixels | Short rifle | | 797 | 0.15:1 | (56 × 99)
Mock Attack Dataset [55] | 1920 × 1080 pixels | Knife | | 210 | 0.04:1 | (40 × 52)
ACF Dataset (Ours) | 1920 × 1080 pixels | Pistol | 8319 | 4961 | 1.12:1 | (49 × 62)
ACF Dataset (Ours) | 1920 × 1080 pixels | Knife | | 3618 | 1.02:1 | (43 × 66)
Location | Video Duration | Weapon | Time 1 | Participant 2 |
---|---|---|---|---|
Parking lot 1 | 38 min 44 s | Pistol, Knife | Morning, Afternoon | 3 |
Parking lot 2 | 28 min 8 s | Pistol, Knife | Morning, Afternoon | 3 |
Parking lot 3 | 45 min 35 s | Pistol, Knife | Afternoon | 4 |
Corridor | 25 min 35 s | Pistol, Knife | Afternoon | 2 |
Architecture | SSD MobileNet V2 | EfficientDet D0 | Faster R-CNN Inception Resnet V2 |
---|---|---|---|
Input size | 640 × 640 | 512 × 512 | 640 × 640 |
Activation function | RELU_6 | Swish | SOFTMAX |
Steps | 30,000 | 30,000 | 30,000 |
Batch size | 20 | 12 | 2 |
Evaluation metrics | COCO | COCO | COCO |
Optimizer | Momentum optimizer | Momentum optimizer | Momentum optimizer |
Initial Learning rate | 0.04 | 0.04 | 0.04 |
Training time | 6 h 21 min 28 s | 13 h 8 min 13 s | 3 h 49 min 0 s |
Dataset | Training Set | Validation Set | Total | Test Dataset 1 |
---|---|---|---|---|
Dataset 1: ACF_Pistol Dataset | 3541 | 885 | 4426 | 660 |
Dataset 2: ACF_Knife Dataset | 2848 | 711 | 3559 | 660 |
Dataset 3: ACF Dataset | 6655 | 1664 | 8319 | 1320 |
Tiling Dataset 1: Tiling ACF_Pistol Dataset | 14,164 | 3540 | 17,704 | 2640 |
Tiling Dataset 2: Tiling ACF_Knife Dataset | 11,392 | 2844 | 14,236 | 2640 |
Tiling Dataset 3: Tiling ACF Dataset | 26,620 | 6656 | 33,276 | 5280 |
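
The counts in the split table above can be cross-checked with a few lines of arithmetic. The sketch below copies the numbers from the table and tests three things: that training plus validation equals the total, what share of each dataset goes to training, and whether every tiled count is exactly four times its raw counterpart. The factor of four is an inference from the numbers, and the dictionary keys and function name are illustrative.

```python
# Counts copied from the dataset table: (training, validation, total, test).
RAW = {
    "Dataset 1": (3541, 885, 4426, 660),
    "Dataset 2": (2848, 711, 3559, 660),
    "Dataset 3": (6655, 1664, 8319, 1320),
}
TILED = {
    "Dataset 1": (14164, 3540, 17704, 2640),
    "Dataset 2": (11392, 2844, 14236, 2640),
    "Dataset 3": (26620, 6656, 33276, 5280),
}
TILES_PER_IMAGE = 4  # assumption: inferred from the counts, not stated in the table


def summarize(raw, tiled, factor):
    """Return per-dataset consistency checks for the split and tiling counts."""
    summary = {}
    for name, counts in raw.items():
        train, val, total, _test = counts
        summary[name] = {
            "split_sums_to_total": train + val == total,
            "train_share": round(train / total, 3),
            "tiled_is_multiple": tiled[name] == tuple(factor * n for n in counts),
        }
    return summary


summary = summarize(RAW, TILED, TILES_PER_IMAGE)
for name, checks in summary.items():
    print(name, checks)
```

All three datasets pass these checks with a training share of about 0.8, which matches an 80/20 training/validation split and four tiles per source image.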
Dataset | Architecture | mAP | 0.5 IoU | 0.75 IoU
---|---|---|---|---
Dataset 1 | SSD MobileNet V2 | 0.427 | 0.777 | 0.426
Dataset 1 | EfficientDet D0 | 0.296 | 0.763 | 0.154
Dataset 1 | Faster R-CNN Inception Resnet V2 | 0.279 | 0.686 | 0.145
Dataset 2 | SSD MobileNet V2 | 0.544 | 0.907 | 0.585
Dataset 2 | EfficientDet D0 | 0.347 | 0.804 | 0.223
Dataset 2 | Faster R-CNN Inception Resnet V2 | 0.350 | 0.734 | 0.261
Dataset 3 | SSD MobileNet V2 | 0.495 | 0.861 | 0.517
Dataset 3 | EfficientDet D0 | 0.242 | 0.661 | 0.111
Dataset 3 | Faster R-CNN Inception Resnet V2 | 0.376 | 0.804 | 0.275
Dataset | Architecture | AUC
---|---|---
Dataset 1 | SSD MobileNet V2 | 0.215
Dataset 1 | EfficientDet D0 | 0.903
Dataset 1 | Faster R-CNN Inception Resnet V2 | 0.819
Dataset 2 | SSD MobileNet V2 | 0.949
Dataset 2 | EfficientDet D0 | 0.977
Dataset 2 | Faster R-CNN Inception Resnet V2 | 0.883
Dataset 3 | SSD MobileNet V2 | 0.832
Dataset 3 | EfficientDet D0 | 0.913
Dataset 3 | Faster R-CNN Inception Resnet V2 | 0.884
Dataset | Architecture | mAP | 0.5 IoU | 0.75 IoU
---|---|---|---|---
Tiling Dataset 1 | SSD MobileNet V2 | 0.559 | 0.900 | 0.636
Tiling Dataset 1 | EfficientDet D0 | 0.303 | 0.726 | 0.202
Tiling Dataset 1 | Faster R-CNN Inception Resnet V2 | 0.445 | 0.870 | 0.391
Tiling Dataset 2 | SSD MobileNet V2 | 0.256 | 0.689 | 0.113
Tiling Dataset 2 | EfficientDet D0 | 0.488 | 0.855 | 0.526
Tiling Dataset 2 | Faster R-CNN Inception Resnet V2 | 0.630 | 0.938 | 0.747
Tiling Dataset 3 | SSD MobileNet V2 | 0.544 | 0.891 | 0.616
Tiling Dataset 3 | EfficientDet D0 | 0.419 | 0.819 | 0.386
Tiling Dataset 3 | Faster R-CNN Inception Resnet V2 | 0.343 | 0.751 | 0.247
Dataset | Architecture | AUC
---|---|---
Tiling Dataset 1 | SSD MobileNet V2 | 0.972
Tiling Dataset 1 | EfficientDet D0 | 0.623
Tiling Dataset 1 | Faster R-CNN Inception Resnet V2 | 0.876
Tiling Dataset 2 | SSD MobileNet V2 | 0.953
Tiling Dataset 2 | EfficientDet D0 | 0.952
Tiling Dataset 2 | Faster R-CNN Inception Resnet V2 | 0.886
Tiling Dataset 3 | SSD MobileNet V2 | 0.779
Tiling Dataset 3 | EfficientDet D0 | 0.931
Tiling Dataset 3 | Faster R-CNN Inception Resnet V2 | 0.845
Dataset | Architecture | 0.5 IoU (Raw Image) | 0.5 IoU (Tile Image)
---|---|---|---
Tiling Dataset 1 | SSD MobileNet V2 | 0.667 | 0.777
Tiling Dataset 1 | EfficientDet D0 | 0.547 | 0.507
Tiling Dataset 1 | Faster R-CNN Inception Resnet V2 | 0.534 | 0.673
Tiling Dataset 2 | SSD MobileNet V2 | 0.789 | 0.779
Tiling Dataset 2 | EfficientDet D0 | 0.704 | 0.719
Tiling Dataset 2 | Faster R-CNN Inception Resnet V2 | 0.534 | 0.654
Tiling Dataset 3 | SSD MobileNet V2 | 0.717 | 0.758
Tiling Dataset 3 | EfficientDet D0 | 0.545 | 0.632
Tiling Dataset 3 | Faster R-CNN Inception Resnet V2 | 0.620 | 0.620
Dataset | Architecture | Raw Image Inference | Raw Image Throughput | Tile Image Inference | Tile Image Throughput
---|---|---|---|---|---
Dataset 1 | SSD MobileNet V2 | 44.3 ms | 16 images/s | 98.7 ms | 7 images/s
Dataset 1 | EfficientDet D0 | 46.4 ms | 15 images/s | 112.8 ms | 6 images/s
Dataset 1 | Faster R-CNN Inception Resnet V2 | 277.1 ms | 3 images/s | 1061.3 ms | 1 images/s
Dataset 2 | SSD MobileNet V2 | 43.2 ms | 16 images/s | 96.5 ms | 7 images/s
Dataset 2 | EfficientDet D0 | 47.9 ms | 15 images/s | 110.6 ms | 6 images/s
Dataset 2 | Faster R-CNN Inception Resnet V2 | 293.7 ms | 2 images/s | 1064.3 ms | 1 images/s
Dataset 3 | SSD MobileNet V2 | 41.2 ms | 9 images/s | 94.6 ms | 4 images/s
Dataset 3 | EfficientDet D0 | 46.7 ms | 8 images/s | 112 ms | 3 images/s
Dataset 3 | Faster R-CNN Inception Resnet V2 | 274.1 ms | 1 images/s | 1066.5 ms | 0.3 images/s
Method | mAP | 0.5 IoU | 0.75 IoU |
---|---|---|---|
Mock Attack Dataset (González et al. [45]) | 0.009 | 0.034 | 0.003 |
Tiling Mock Attack Dataset (Ours) | 0.077 | 0.192 | 0.035 |
Architecture | mAP | 0.5 IoU | 0.75 IoU |
---|---|---|---|
EfficientDet D0 | 0.092 | 0.254 | 0.028 |
Faster R-CNN Inception Resnet V2 | 0.077 | 0.192 | 0.035 |
SSD MobileNet V2 | 0.084 | 0.200 | 0.051 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hnoohom, N.; Chotivatunyu, P.; Jitpattanakul, A. ACF: An Armed CCTV Footage Dataset for Enhancing Weapon Detection. Sensors 2022, 22, 7158. https://doi.org/10.3390/s22197158