FireViT: An Adaptive Lightweight Backbone Network for Fire Detection
Abstract
1. Introduction
- An adaptive lightweight backbone network, which we name FireViT, combining deformable convolution with a transformer, is proposed for smoke and flame detection in fire scenarios. Our proposed DeformViT block is the main module of FireViT; a hedged sketch of such a block follows this list.
- An improved adaptive activation function, AdaptGELU, is proposed to strengthen the nonlinear representation capability of the model and further improve its accuracy; an illustrative sketch also follows this list.
- Considering the relatively small number of publicly available labeled fire datasets, we collected and built one of the richest labeled fire datasets, covering the largest number of fire scenes and fire images, to evaluate our model. Our labeled fire dataset comprises a fire natural light dataset and a fire infrared dataset.
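To make the first contribution concrete, the following is a minimal sketch of a block that pairs deformable convolution with a transformer encoder, in the spirit of the DeformViT block. The class name, channel widths, offset head, and token layout are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: pairs torchvision's deformable convolution with a transformer
# encoder layer, as the DeformViT block combines local deformable sampling
# with global self-attention. All hyperparameters here are assumptions.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformViTBlockSketch(nn.Module):
    def __init__(self, channels: int = 64, num_heads: int = 4):
        super().__init__()
        # Offsets for a 3x3 deformable kernel: 2 coords * 3 * 3 = 18 channels.
        self.offset_pred = nn.Conv2d(channels, 18, kernel_size=3, padding=1)
        self.deform_conv = DeformConv2d(channels, channels, kernel_size=3, padding=1)
        # One transformer encoder layer for global context over pixel tokens.
        self.encoder = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, dim_feedforward=2 * channels,
            batch_first=True, activation="gelu",
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Local features sampled at learned, input-dependent locations.
        local = self.deform_conv(x, self.offset_pred(x))
        # Flatten (B, C, H, W) -> (B, H*W, C) for self-attention.
        b, c, h, w = local.shape
        tokens = self.encoder(local.flatten(2).transpose(1, 2))
        # Restore the spatial layout and add a residual connection.
        return x + tokens.transpose(1, 2).reshape(b, c, h, w)

# Usage: out = DeformViTBlockSketch(64)(torch.randn(1, 64, 40, 40))
```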
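The second contribution can likewise be illustrated by a GELU variant with a learnable parameter. The exact AdaptGELUv1/v2 formulations are given in the paper; the placement of the learnable gain `alpha` inside the tanh approximation below is an assumption made only for illustration.

```python
import math
import torch
import torch.nn as nn

class AdaptGELUSketch(nn.Module):
    """Tanh-approximated GELU with a learnable gain (illustrative only; not
    the paper's exact AdaptGELUv1/v2 definition)."""

    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))  # learned jointly with the network

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inner = math.sqrt(2.0 / math.pi) * self.alpha * (x + 0.044715 * x.pow(3))
        return 0.5 * x * (1.0 + torch.tanh(inner))
```

With `alpha = 1` this reduces to the standard tanh approximation of GELU, so the learned parameter only needs to capture whatever deviation from the default nonlinearity best fits the task.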
2. Related Work
2.1. MobileViT
2.2. Prediction Head
3. Methods
3.1. Adaptive Lightweight Backbone Network Module: DeformViT Block
3.2. Adaptive Activation Function: Adaptive GELU (AdaptGELU)
4. Experiments and Results
4.1. Dataset
4.2. Implementation Details
4.3. Evaluation Metrics
4.4. Ablation Experiments on the Fire Natural Light Dataset
4.5. Comparison Experiments on Fire Infrared Datasets
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Rachman, F.Z.; Yanti, N.; Hadiyanto, H.; Suhaedi, S.; Hidayati, Q.; Widagda, M.E.P.; Saputra, B.A. Design of the early fire detection based fuzzy logic using multisensor. IOP Conf. Ser. Mater. Sci. Eng. 2020, 732, 012039. [Google Scholar] [CrossRef]
- Ye, H.; Xiaogang, W.; Shuchuan, G. Design and Evaluation Method of Wireless Fire Detection Node Based on Multi-Source Sensor Data Fusion. Int. J. Sens. Sens. Netw. 2021, 9, 19. [Google Scholar] [CrossRef]
- Solórzano, A.; Eichmann, J.; Fernández, L.; Ziems, B.; Jiménez-Soto, J.M.; Marco, S.; Fonollosa, J. Early fire detection based on gas sensor arrays: Multivariate calibration and validation. Sens. Actuators B Chem. 2022, 352, 130961. [Google Scholar] [CrossRef]
- Li, Y.; Yu, L.; Zheng, C.; Ma, Z.; Yang, S.; Song, F.; Tittel, F.K. Development and field deployment of a mid-infrared CO and CO2 dual-gas sensor system for early fire detection and location. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2022, 270, 120834. [Google Scholar] [CrossRef] [PubMed]
- Liu, X.; Sun, B.; Xu, Z.D.; Liu, X.; Xu, D. An intelligent fire detection algorithm and sensor optimization strategy for utility tunnel fires. J. Pipeline Syst. Eng. Pract. 2022, 13, 04022009. [Google Scholar] [CrossRef]
- Qiu, T.; Yan, Y.; Lu, G. An autoadaptive edge-detection algorithm for flame and fire image processing. IEEE Trans. Instrum. Meas. 2011, 61, 1486–1493. [Google Scholar] [CrossRef]
- Ji-neng, O.; Le-ping, B.; Zhi-kai, Y.; Teng, W. An early flame identification method based on edge gradient feature. In Proceedings of the 2018 2nd IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Xi’an, China, 25–27 May 2018; pp. 642–646. [Google Scholar] [CrossRef]
- Khalil, A.; Rahman, S.U.; Alam, F.; Ahmad, I.; Khalil, I. Fire detection using multi color space and background modeling. Fire Technol. 2021, 57, 1221–1239. [Google Scholar] [CrossRef]
- Majid, S.; Alenezi, F.; Masood, S.; Ahmad, M.; Gündüz, E.S.; Polat, K. Attention based CNN model for fire detection and localization in real-world images. Expert Syst. Appl. 2022, 189, 116114. [Google Scholar] [CrossRef]
- Chen, G.; Cheng, R.; Lin, X.; Jiao, W.; Bai, D.; Lin, H. LMDFS: A Lightweight Model for Detecting Forest Fire Smoke in UAV Images Based on YOLOv7. Remote Sens. 2023, 15, 3790. [Google Scholar] [CrossRef]
- Dogan, S.; Barua, P.D.; Kutlu, H.; Baygin, M.; Fujita, H.; Tuncer, T.; Acharya, U.R. Automated accurate fire detection system using ensemble pretrained residual network. Expert Syst. Appl. 2022, 203, 117407. [Google Scholar] [CrossRef]
- Li, A.; Zhao, Y.; Zheng, Z. Novel Recursive BiFPN Combining with Swin Transformer for Wildland Fire Smoke Detection. Forests 2022, 13, 2032. [Google Scholar] [CrossRef]
- Huang, J.; Zhou, J.; Yang, H.; Liu, Y.; Liu, H. A Small-Target Forest Fire Smoke Detection Model Based on Deformable Transformer for End-to-End Object Detection. Forests 2023, 14, 162. [Google Scholar] [CrossRef]
- Liu, X.; Peng, H.; Zheng, N.; Yang, Y.; Hu, H.; Yuan, Y. EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 14420–14430. [Google Scholar] [CrossRef]
- Li, Y.; Hu, J.; Wen, Y.; Evangelidis, G.; Salahi, K.; Wang, Y.; Ren, J. Rethinking vision transformers for MobileNet size and speed. arXiv 2022, arXiv:2212.08059. [Google Scholar] [CrossRef]
- Mehta, S.; Rastegari, M. MobileViT: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv 2021, arXiv:2110.02178. [Google Scholar] [CrossRef]
- Wang, R.; Shivanna, R.; Cheng, D.; Jain, S.; Lin, D.; Hong, L.; Chi, E. DCN V2: Improved deep and cross network and practical lessons for web-scale learning to rank systems. arXiv 2020, arXiv:2008.13535. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Houlsby, N. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar] [CrossRef]
- Zhuang, J.; Qin, Z.; Yu, H.; Chen, X. Task-Specific Context Decoupling for Object Detection. arXiv 2023, arXiv:2303.01047. [Google Scholar] [CrossRef]
- Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar] [CrossRef]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar] [CrossRef]
- Xu, S.; Wang, X.; Lv, W.; Chang, Q.; Cui, C.; Deng, K.; Lai, B. PP-YOLOE: An evolved version of YOLO. arXiv 2022, arXiv:2203.16250. [Google Scholar] [CrossRef]
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Wei, X. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar] [CrossRef]
- Ultralytics-YOLOv8. Available online: https://github.com/ultralytics/ultralytics (accessed on 26 June 2023).
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000. [Google Scholar] [CrossRef]
- Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Yang, J. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. arXiv 2020, arXiv:2006.04388. [Google Scholar] [CrossRef]
- Xia, Z.; Pan, X.; Song, S.; Li, L.E.; Huang, G. Vision transformer with deformable attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 4794–4803. [Google Scholar] [CrossRef]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229. [Google Scholar] [CrossRef]
- Hendrycks, D.; Gimpel, K. Gaussian error linear units (GELUs). arXiv 2016, arXiv:1606.08415. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar] [CrossRef]
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
- Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 315–323. [Google Scholar]
- Elfwing, S.; Uchibe, E.; Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw. 2018, 107, 3–11. [Google Scholar] [CrossRef]
- Dunnings, A.; Breckon, T.P. Fire Image Data Set for Dunnings 2018 Study-PNG Still Image Set; Durham University: Durham, UK, 2018. [Google Scholar] [CrossRef]
- Dedeoglu, N.; Toreyin, B.U.; Gudukbay, U.; Cetin, A.E. Real-time fire and flame detection in video. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’05), Philadelphia, PA, USA, 18–23 March 2005; Volume 2, pp. 669–672. [Google Scholar] [CrossRef]
- Ko, B.; Kwak, J.Y.; Nam, J.Y. Wildfire smoke detection using temporospatial features and random forest classifiers. Opt. Eng. 2012, 51, 017208. [Google Scholar] [CrossRef]
- Zhang, Q.X.; Lin, G.H.; Zhang, Y.M.; Xu, G.; Wang, J.J. Wildland forest fire smoke detection based on faster R-CNN using synthetic smoke images. Procedia Eng. 2018, 211, 441–446. [Google Scholar] [CrossRef]
- Shamsoshoara, A.; Afghah, F.; Razi, A.; Zheng, L.; Fulé, P.Z.; Blasch, E. Aerial imagery pile burn detection using deep learning: The FLAME dataset. Comput. Netw. 2021, 193, 108001. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1026–1034. [Google Scholar] [CrossRef]
- Tang, Y.; Han, K.; Guo, J.; Xu, C.; Xu, C.; Wang, Y. GhostNetV2: Enhance cheap operation with long-range attention. Adv. Neural Inf. Process. Syst. 2022, 35, 9969–9982. [Google Scholar] [CrossRef]
- Cui, C.; Gao, T.; Wei, S.; Du, Y.; Guo, R.; Dong, S.; Ma, Y. PP-LCNet: A lightweight CPU convolutional neural network. arXiv 2021, arXiv:2109.15099. [Google Scholar] [CrossRef]
- Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131. [Google Scholar] [CrossRef]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Adam, H. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar] [CrossRef]
- Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar] [CrossRef]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 10012–10022. [Google Scholar] [CrossRef]
(a) Fire natural light dataset

Data Sources | Rotate-Flip-Affine Transformation | Total Number | Number of “smoke” Labels | Number of “fire” Labels
---|---|---|---|---
A | × | 10,048 | 27,221 | 19,103
B | √ | 54,968 | 61,324 | 20,297
C | √ | 29,222 | 25,345 | 17,711
D | √ | 6488 | 12,124 | 5453
E | × | 12,201 | 23,615 | 72,660
F | × | 8412 | 6064 | 34,159

(b) Fire infrared dataset

Infrared Data | Rotate-Flip-Affine Transformation | Total Number | Number of “fire” Labels
---|---|---|---
Fusion | √ | 75,219 | 214,005
GreenHot | × | 5011 | 26,386
WhiteHot | × | 5891 | 22,590
Structured Infrared Light | √ | 9991 | 10,198
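The "Rotate-Flip-Affine Transformation" column above marks which data sources were augmented. As a hedged illustration (the paper's exact angles, probabilities, and scales are not reproduced here, so the values below are assumptions), such a pipeline might look like the following; for detection data, the bounding-box labels must be transformed consistently with the images.

```python
import torchvision.transforms as T
from PIL import Image

# Illustrative rotate-flip-affine augmentation; all parameter values are assumed.
augment = T.Compose([
    T.RandomRotation(degrees=15),                                        # rotate
    T.RandomHorizontalFlip(p=0.5),                                       # flip
    T.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.9, 1.1)),   # affine
])

image = Image.new("RGB", (640, 640))  # stand-in for a fire image
augmented = augment(image)
```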
Training Parameter | Setting |
---|---|
Initialization | MSRA initialization [39] |
Input image dimensions | (640, 640, 3) |
Optimizer | SGD |
Momentum | 0.937 |
Initial learning rate | 0.01 |
Weight Decay | 0.0005 |
Number of images per batch | 8 |
Epochs | 50 |
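The optimizer settings in the table translate directly into a PyTorch configuration. A minimal sketch, assuming a standard training loop (the `model` below is a stand-in, and the learning-rate schedule is not specified by the table):

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, kernel_size=3)  # stand-in; any detector nn.Module works

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,              # initial learning rate (table)
    momentum=0.937,       # momentum (table)
    weight_decay=0.0005,  # weight decay (table)
)
# Training runs for 50 epochs with 8 images per batch at 640x640x3, starting
# from MSRA-initialized weights [39], per the table above.
```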
Options | mAP | Params | GFLOPs |
---|---|---|---|
MobileViT block | 90.9% | 1.9 M | 13.8
I | 91.7% | 2.1 M | 12.5
II | 91.6% | 1.8 M | 12.2
III | 91.5% | 1.8 M | 12.2
IV | 91.1% | 1.6 M | 11.9
V | 91.2% | 1.6 M | 11.9
VI | 91.5% | 1.8 M | 12.2
VII | 91.5% | 1.8 M | 12.2
VIII | 91.8% | 2.1 M | 12.5
Model | mAP | Params | GFLOPs |
---|---|---|---|
FireViT-SiLU | 91.82% | 2.1 M | 12.5 |
FireViT-Sigmoid | 91.38% | 2.1 M | 12.5 |
FireViT-ReLU | 91.89% | 2.1 M | 12.5 |
FireViT-GELU | 91.88% | 2.1 M | 12.5 |
FireViT-AdaptGELUv1 | 92.09% | 2.1 M | 12.5 |
FireViT-AdaptGELUv2 | 92.14% | 2.1 M | 12.5 |
Model | AP “smoke” | AP “fire” | mAP | Params | GFLOPs
---|---|---|---|---|---
GhostNetV2 | 89.2% | 91.0% | 90.1% | 3.8 M | 6.3 |
PP-LCNet | 89.6% | 90.9% | 90.25% | 3.0 M | 10.8 |
ShuffleNetV2 | 89.4% | 90.1% | 89.75% | 3.0 M | 10.7 |
MobileNetV3 | 89.0% | 90.2% | 89.6% | 3.1 M | 5.5 |
EfficientNet | 89.0% | 91.0% | 90.0% | 3.4 M | 8.4 |
FireViT(ours) | 91.3% | 92.9% | 92.1% | 2.1 M | 12.5 |
Model | AP “smoke” | AP “fire” | mAP | Params | GFLOPs
---|---|---|---|---|---
EfficientViT-M0 | 89.0% | 90.7% | 89.85% | 2.8 M | 7.4 |
SwinTransformer | 88.9% | 90.1% | 89.5% | 2.2 M | 6.9 |
EfficientFormerV2-S0 | 89.4% | 91.2% | 90.3% | 3.8 M | 9.0 |
MobileViT-XS | 91.0% | 90.8% | 90.9% | 1.9 M | 13.8 |
FireViT(ours) | 91.3% | 92.9% | 92.1% | 2.1 M | 12.5 |
Model | mAP | Params | GFLOPs
---|---|---|---
GhostNetV2-Infrared | 93.7% | 3.8 M | 6.3 |
PP-LCNet-Infrared | 94.1% | 3.0 M | 10.8 |
ShuffleNetV2-Infrared | 94.0% | 3.0 M | 10.7 |
MobileNetV3-Infrared | 93.5% | 3.1 M | 5.5 |
EfficientNet-Infrared | 93.9% | 3.4 M | 8.4 |
EfficientViT-M0-Infrared | 93.8% | 2.8 M | 7.4 |
SwinTransformer-Infrared | 93.3% | 2.2 M | 6.9 |
EfficientFormerV2-S0-Infrared | 93.9% | 3.8 M | 9.0 |
MobileViT-XS-Infrared | 94.3% | 1.9 M | 13.8 |
FireViT-Infrared (ours) | 95.1% | 2.1 M | 12.5 |