1. Introduction
Forests are vital to human survival because they provide a variety of ecosystem services, such as air and water purification, land protection, and biodiversity conservation. According to resource data from China’s forestry industry, China’s forest coverage rate is only 16.55 percent, far below the global average of 27 percent, and its per capita forest resources amount to only 20 percent of the global average. These forests are further damaged by various human activities and, above all, by forest fires. Forest fires start suddenly, spread rapidly over wide areas, are difficult to fight, and are extremely harmful. They cause serious economic losses and human casualties, and they directly threaten the sustainable development of the forest industry and the national ecological security barrier.
Traditional forest fire monitoring relies mainly on ground patrols, watchtowers, airplanes, and satellites. Owing to constraints of weather and climate, technical level, and operating costs, ground patrols and satellite remote sensing cannot monitor and collect information about the start of a forest fire in real time. With the development of image processing and video analysis technology, forest fire monitoring based on video recognition has attracted increasing attention in forest fire prevention and control and has been widely used. The objects monitored by current forest fire video monitoring systems are mainly flames and smoke.
In smoke recognition, the focus is on the discovery, extraction, and analysis of smoke in visible images. In 2001, Jerome and Philippe [1,2] from France proposed a fast algorithm for extracting complex motion in small spatial envelopes, based on extracting localized motions through a cluster analysis of points in a multidimensional time-embedded space; in 2002, they proposed a main criterion for smoke recognition based on segmenting continuous dynamic pixel envelopes from an image. In 2003, F. Gomez-Rodriguez [3] of the University of Seville proposed computerized image processing techniques based on wavelets and optical flow for monitoring fire spread. In 2004, Fernandes A M [4] of Portugal applied a neural network algorithm to a small-scale forest fire experiment to automatically identify smoke features in the collected LIDAR signals, which improved the classification efficiency of smoke feature patterns and reduced the false alarm rate. References [5,6,7,8,9,10,11,12,13,14] performed smoke detection using motion detection, color, texture, contour, and spatio-temporal features, respectively. References [15,16,17], on the other hand, trained deep belief networks to obtain smoke recognition models, which improved the detection rate and reduced the number of false alarms. References [18,19,20,21,22,23,24] used convolutional neural network (CNN) models and the ResNet50 network for smoke detection. References [25,26] applied conditional random fields (CRFs) and bidirectional long short-term memory networks (ABi-LSTM) to smoke recognition, respectively. Xuehui Wu et al. [27] from Southeast University proposed a robust AdaBoost (RAB) classifier to improve the detection performance of fire smoke detectors, and later proposed a new regularization and variable selection method. Reference [28] co-trained a smoke recognition model with synthetic and real smoke images to improve the accuracy of smoke recognition. References [29,30,31] proposed algorithms for forest fire detection that focus in particular on improving accuracy in foggy environments.
In flame recognition, research focuses mainly on optimizing and improving the extraction of the flame foreground, including the segmentation of foreground and background, the spatial and temporal features of the flame, and texture recognition. Celik and Habiboglu et al. [32,33] in Turkey proposed to improve flame detection through statistical modeling, by refining the classification of fire pixels and by using spatial and temporal features, respectively. In 2019, Tan Yong et al. [34] at Jiangnan University improved the accuracy and anti-interference performance of flame recognition using co-occurrence matrices and an improved probabilistic neural network (PNN) method. Sun Xiaofang and Wu Xuehui [35,36] each improved the support vector machine and applied it to flame recognition. Dai et al. [37] proposed a deep-learning-based multi-scale video flame detection model for fire warning, which addresses the weak and inconvenient detection of fire at an early stage.
The recognition of infrared images in conjunction with visible images plays an increasingly important role in forest fire monitoring and early warning technologies. In 2021, Ciprián-Sánchez et al. [38] proposed a deep-learning-based infrared–visible image fusion method, FIRe-GAN, for recovering areas occluded by smoke and haze. In 2021, Sun Fei et al. [39] used the infrared and visible light bands to obtain composite feature information of forest fires with a binocular vision method. In 2022, Bowen Wang et al. [40,41] from Nanjing University of Science and Technology realized high-resolution infrared and visible light image reconstruction, which significantly improves the imaging quality of the fused image.
In summary, forest fire monitoring relies mainly on flame and smoke recognition. At the early stage of a forest fire, the flame is small and easily obscured by trees, so it cannot be monitored; by the time the flame is large enough to be recognized, the fire has already spread to a scale that makes it difficult to fight. Smoke, by contrast, can be detected at the early stage of a forest fire, and a warning issued at that stage reduces both the difficulty of fighting the fire and the loss of life and property. However, fire-induced smoke looks very similar to water mist, dust, and clouds, and existing methods cannot distinguish them well, which results in a high misidentification rate. Improving the accuracy of smoke recognition at the early stage of a forest fire, and thereby the accuracy of fire warnings, is therefore a technical problem that urgently needs to be solved.
2. Methodologies
In this paper, we propose a forest fire risk monitoring and early warning algorithm that aims to improve the accuracy of smoke recognition at the early stage of forest fires, and in particular to distinguish fire smoke from water mist, dust, clouds, and other situations that are prone to causing false alarms (see Figure 1), so as to improve the accuracy of early warning.
The algorithm acquires a current visible light image and an infrared image of the same forest area by means of a visible–infrared bispectral camera. First, it applies a deep-learning-based smoke detection model to the visible light image and obtains the confidence level that a fire has occurred in that image. Second, it determines whether the local temperature value of the infrared image exceeds the preset warning value and obtains the infrared-image-based judgment result. It then calculates the current FWI, from which the current fire danger level is determined. Finally, it decides whether to issue a fire warning based on the fire danger level, the confidence level that a fire has occurred in the visible light image, and the infrared-image-based judgment result. The algorithm flow is shown in Figure 2.
2.1. YOLOv5 Training Model
The deep-learning-based smoke detection model is trained in advance. The collected visible light images of the early stage of forest fires are labeled with the location and shape of the real fire point and of the smoke and fire. Visible light images of other objects that can cause false fire alarms, such as fog, clouds, dust, high-temperature machinery, lake reflections, and building chimneys, are also collected, and the location and shape of these objects are labeled. The labels fall into four categories, namely fire, clouds, fog, and other, and detecting smoke is the goal. The labeled images are used to train the predefined model so that the trained model can detect smoke. The output of the model is the confidence level that a fire has occurred in the input image, which should be interpreted as the confidence level that a fire has occurred in the actual environment corresponding to the input image.
Instead of examining an image thousands of times (extracting thousands of anchor boxes) to produce a classification result, the YOLO family of object detection models transforms object detection into a regression problem, using a deep neural network to predict object information and directly generate coordinates and probabilities for each category. During training, it processes the entire image at once, so its predictions are informed by the global context of the image. YOLO is thus an end-to-end, single-stage approach that has been developed into several versions through continuous optimization [42,43].
YOLOv5 adopts weighted NMS when screening prediction frames, which further improves the prediction ability of the model, so YOLOv5 is selected in this paper for deep learning training on visible light images. The EMA attention mechanism is introduced into the feature extraction network of the model to improve its feature representation capability. The network model framework used in this paper is shown in Figure 3.
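As a rough illustration of the weighted NMS idea mentioned above, the following pure-Python sketch merges overlapping prediction frames by a confidence-weighted average of their coordinates instead of discarding all but the highest-scoring one. It is a simplified stand-in and not the YOLOv5 implementation; the function names `iou` and `weighted_nms`, the box layout `(x1, y1, x2, y2, score)`, and the IoU threshold of 0.5 are assumptions made for this example.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2, score).
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def weighted_nms(boxes: List[Box], iou_thr: float = 0.5) -> List[Box]:
    """Merge each cluster of overlapping boxes into one score-weighted box."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    merged: List[Box] = []
    while remaining:
        top, rest = remaining[0], remaining[1:]
        # Boxes that overlap the current best box form one cluster.
        cluster = [top] + [b for b in rest if iou(top, b) >= iou_thr]
        remaining = [b for b in rest if iou(top, b) < iou_thr]
        total = sum(b[4] for b in cluster)
        if total > 0:
            coords = [sum(b[i] * b[4] for b in cluster) / total for i in range(4)]
        else:
            coords = list(top[:4])  # degenerate case: all scores are zero
        # The merged box keeps the score of the best box in the cluster.
        merged.append((coords[0], coords[1], coords[2], coords[3], top[4]))
    return merged


detections = [(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.6), (50, 50, 60, 60, 0.8)]
for box in weighted_nms(detections):
    print(box)
```

In this usage example, the first two boxes overlap strongly and are merged into a single box that lies closer to the higher-confidence one, while the distant third box is kept unchanged.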
The computational flow of the EMA attention mechanism [44] is shown in Figure 4. The mechanism fully considers feature grouping and multi-scale structure, which is conducive to effectively establishing both short-term and long-term dependencies and thus to better performance.
The module extracts the attention weight descriptors of the grouped feature map through three parallel routes, two in the 1 × 1 branch and one in the 3 × 3 branch. In the 1 × 1 branch, two 1D global average pooling operations encode the channels along the two spatial directions, respectively; in the 3 × 3 branch, only a single 3 × 3 kernel is stacked to capture the multi-scale feature representation. After the output of the 1 × 1 convolution is decomposed into two vectors, two nonlinear Sigmoid functions are used to fit the 2D binomial distribution after the linear convolution. To achieve different cross-channel interaction features between the two parallel paths of the 1 × 1 branch, the two channel-wise attention maps within each group are aggregated by multiplication. The 3 × 3 branch captures local cross-channel interactions with a 3 × 3 convolution in order to expand the feature space. In this way, inter-channel information is encoded to adjust the importance of the different channels, while precise spatial structure information is retained in the channels. The global spatial information of the 1 × 1 branch output is encoded using 2D global average pooling. Finally, the output feature map within each group is computed as the aggregation of the two generated spatial attention weight values, followed by a Sigmoid function that highlights the global context of all pixels. Because cross-spatial information aggregation is considered and 3 × 3 and 1 × 1 convolutions are used in parallel, more contextual information is exploited in the intermediate features, so the EMA mechanism models long-range dependencies while embedding precise location information.
2.2. Infrared Thermal Imaging Early Warning
Infrared thermal imaging receives the infrared radiation emitted by an object and converts the infrared radiation from the surface of the measured target into a video signal. The heat energy radiated by the target is thereby rendered as a real-time thermal image that reflects the characteristics of the target surface, from which the temperature distribution of the target is obtained and its state is determined. The background temperature of a forest area is generally −40 °C to 60 °C, while the flame produced by burning forest fuels reaches 600 °C to 1200 °C. Because the temperature difference between the two is large, the heat source can be detected as early as possible, which serves the purpose of fire prevention.
The temperature at a burning or smoldering fire point is much higher than the surrounding temperature, so temperature is a preferred feature for fire identification, and this is the main reason why forest fire monitoring and early warning methods complement visible light cameras with infrared cameras. If the local temperature value of an infrared image exceeds the preset warning value, the forest area corresponding to the infrared image is considered very likely to contain a fire. When the local temperature value of the infrared image exceeds the preset warning value, the judgment result is recorded as 1; otherwise, the judgment result is recorded as 0.
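The infrared judgment described above amounts to a simple threshold test, which can be sketched as follows. This is a minimal illustration under stated assumptions: the infrared image is taken to be a 2D grid of temperatures in °C, the "local temperature value" is interpreted as the hottest pixel, and the function name `infrared_judgment` and the warning value of 150 °C are hypothetical rather than taken from the paper.

```python
from typing import Sequence


def infrared_judgment(temps_c: Sequence[Sequence[float]], warning_value_c: float) -> int:
    """Return 1 if the hottest pixel exceeds the preset warning value, else 0."""
    # Assumption: the local temperature value is the maximum pixel temperature.
    hottest = max(max(row) for row in temps_c)
    return 1 if hottest > warning_value_c else 0


# Hypothetical 3 x 3 thermal frames in degrees Celsius.
background = [[18.0, 19.5, 18.7], [20.1, 21.0, 19.9], [18.2, 18.8, 19.0]]
hot_spot = [[18.0, 19.5, 18.7], [20.1, 640.0, 19.9], [18.2, 18.8, 19.0]]

print(infrared_judgment(background, warning_value_c=150.0))  # 0
print(infrared_judgment(hot_spot, warning_value_c=150.0))    # 1
```

Because forest background temperatures stay far below flame temperatures, a single threshold separates the two cases in this example; in practice the warning value would have to be chosen for the camera and the site.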
2.3. Canadian Forest Fire Weather Index
Forest fires have a very close relationship with meteorological conditions. The main meteorological elements that have a significant impact on the occurrence and spread of forest fires are temperature, precipitation, relative humidity, wind speed, and so on.
The Canadian Forest Fire Danger Rating System (CFFDRS) was developed by the Canadian federal forest service in 1968, although the initial research dates back to the 1920s. The two main subsystems of the CFFDRS, the Fire Weather Index (FWI) system and the Fire Behavior Prediction (FBP) system, have been in official operation for many years. The other two subsystems, the Accessory Fuel Moisture (AFM) system and the Canadian Forest Fire Occurrence Prediction (FOP) system, exist in various regional versions that are not available to the general public and have not yet been developed into a national version [45].
The formulas for calculating the component factors of the FWI system can be found in two publications that give detailed instructions: Equations and FORTRAN Program for the Canadian Forest Fire Weather Index System [46] and Development and Structure of the Canadian Forest Fire Weather Index System [47].
In forest fire ecological studies, the Fire Weather Index (FWI) is the main indicator for predicting forest fire occurrence, fire behavior, and energy release and for estimating the degree of fire danger, and it reflects fire weather conditions well. Forest areas in China commonly use a system in which fire danger weather is categorized into five levels; see Table 1 for details.
2.4. Forest Fire Risk Early Warning Algorithm
Whether to issue a fire warning is determined from the fire danger level described above, the confidence level that a fire has occurred in the visible light image, and the infrared-image-based judgment result, according to the following rules:
If the fire danger level is 2, the confidence level that a fire has occurred in the visible light image is greater than or equal to the first threshold, and the infrared-image-based judgment result is 1 (that is, the local temperature value of the infrared image exceeds the warning value), then a fire warning is issued;
If the fire danger level is 3, the confidence level that a fire has occurred in the visible light image is greater than or equal to the second threshold, and the infrared-image-based judgment result is 1, then a fire warning is issued;
If the fire danger level is 4, the confidence level that a fire has occurred in the visible light image is greater than or equal to the third threshold, and the infrared-image-based judgment result is 1, then a fire warning is issued;
If the fire danger level is 5, the confidence level that a fire has occurred in the visible light image is greater than or equal to the fourth threshold, and the infrared-image-based judgment result is 1, then a fire warning is issued. The first threshold is greater than the second, the second is greater than the third, and the third is greater than the fourth;
In all other cases, no fire warning is issued.
This method is intended to improve the accuracy of fire warnings. In each of the four cases above, that is, a fire danger level of 2, 3, 4, or 5 combined with a confidence level of fire occurrence in the visible light image that is greater than or equal to the first, second, third, or fourth threshold, respectively, and an infrared-image-based judgment result of 1 (the local temperature value of the infrared image exceeds the warning value), the recognized smoke is determined to be smoke generated by a fire, and a fire warning is issued. In all other cases, the recognized smoke is not considered to be fire-generated smoke, and no fire warning is issued; for example, when the fire danger level is 1, the recognized smoke is not treated as fire-generated smoke.
That is, as long as the fire danger level is 1, no fire warning is issued, regardless of the confidence level and the infrared-image-based judgment result. Likewise, when the fire danger level is 2 and the confidence level that a fire has occurred in the visible light image is greater than or equal to the first threshold, but the infrared-image-based judgment result is 0, no fire warning is issued.
Table 2 shows the cases in which a fire warning is required, where a warning result of 1 indicates that a fire warning is required.
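The rules of this section can be condensed into a small decision function. The sketch below is illustrative only: the function name `should_warn` and the dictionary layout are assumptions, and the threshold values are the ones reported in Section 3.4 (90%, 80%, 65%, and 58%), which the paper notes vary slightly with environmental conditions.

```python
# Confidence thresholds per fire danger level, using the values reported in
# Section 3.4. Level 1 has no entry because it never triggers a warning.
THRESHOLDS = {2: 0.90, 3: 0.80, 4: 0.65, 5: 0.58}


def should_warn(danger_level: int, confidence: float, ir_result: int) -> bool:
    """Apply the warning rules: all three conditions must hold together."""
    threshold = THRESHOLDS.get(danger_level)
    if threshold is None:
        # Fire danger level 1 (or an unknown level): never issue a warning.
        return False
    # The infrared judgment must be 1 and the visible-image confidence
    # must reach the threshold for the current fire danger level.
    return ir_result == 1 and confidence >= threshold


print(should_warn(1, 0.99, 1))  # False: level 1 never warns
print(should_warn(2, 0.92, 1))  # True
print(should_warn(2, 0.92, 0))  # False: infrared judgment is 0
print(should_warn(5, 0.60, 1))  # True: lower threshold at level 5
```

The thresholds decrease as the fire danger level rises, so the same visible-image confidence that is ignored in low-danger weather can trigger a warning in high-danger weather.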
3. Results
3.1. Experimental Data
A 45-channel visible–infrared bispectral camera deployed in a certain area was used, with the following parameters: maximum image size: 640 × 512; focal length (lens): 100 mm; furthest fire point detection distance (based on 2 m × 2 m): 6000 m; furthest smoke detection distance (based on 5 m × 5 m): 8 km; field of view: 24.6° × 19.8°~6.2° × 5.0°; visible lens: 15.6–500 mm; optical fog penetration supported; 360° continuous rotation in the horizontal direction, −45°~+45° in the vertical direction; power supply: DC48V; operating temperature and humidity: −40 °C to 65 °C, humidity less than 90%; protection grade: IP66.
The experimental images in this paper were all captured by the camera in natural conditions. They comprise 1359 visible light images captured in March and April 2023 and 1359 infrared images captured simultaneously, and they cover the four types of images that tend to trigger alarms, as shown in Figure 5, Figure 6, Figure 7, and Figure 8.
3.2. Model Training
In this paper, we use the PyTorch deep learning framework, the Python 3.9 programming language, the Windows 10 operating system, and an NVIDIA Quadro P6000 GPU with 24 GB of video memory. The official pre-trained weights of YOLOv5s are used, training runs for 600 epochs, the batch size is 16, the optimizer is SGD, the initial learning rate is 0.01, the momentum is 0.937, the weight decay regularization coefficient is 0.005, and Mosaic is used for data augmentation.
The 1359 visible light images are divided into a training set, a validation set, and a test set at a ratio of 8:1:1, giving 1088 images in the training set, 135 in the validation set, and 136 in the test set. The loss values and validation accuracy curves during training are shown in Figure 13 and Figure 14.
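One way to obtain the 1088/135/136 split from 1359 images at a ratio of 8:1:1 is sketched below. The rounding convention (training size rounded up, validation size rounded down, remainder assigned to the test set), the function name `split_8_1_1`, and the fixed shuffle seed are assumptions chosen so that the counts match the text; the paper does not state how the split was actually computed.

```python
import random
from typing import List, Sequence, Tuple


def split_8_1_1(items: Sequence, seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle the items and split them into train/validation/test at 8:1:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n = len(shuffled)
    n_train = -((-8 * n) // 10)  # ceil(0.8 * n) using integer arithmetic
    n_val = n // 10              # floor(0.1 * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test


# Hypothetical file names standing in for the 1359 visible light images.
images = [f"img_{i:04d}.jpg" for i in range(1359)]
train, val, test = split_8_1_1(images)
print(len(train), len(val), len(test))  # 1088 135 136
```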
The PR curves obtained on the test set for the model in this paper are shown in Figure 15. The algorithm achieves an mAP of 83.4% on the test set, which allows smoke at the early stage of a fire to be detected relatively accurately.
3.3. FWI Value Calculation
Data including temperature, humidity, wind speed, wind direction, soil temperature, soil moisture content, rainfall in the last hour, and the number of consecutive days without rain can be obtained from meteorological stations. From these data, the fire danger level is calculated according to the FWI formulas and serves as an auxiliary basis for forest fire risk early warning.
To eliminate the influence of the initial values and to form a complete time series, the component factors of the FWI system are calculated from a time series of meteorological factors that begins more than 1 year before the data to be analyzed. The period affected by the initial values is excluded from the analysis, which minimizes the effect of differences in the initial values on the results. The forest fire weather index (FWI) values for March are shown in Table 3.
The fire danger level thresholds are historical empirical values, selected and adjusted on the basis of fire archives, historical fire conditions, and actual weather. The fire danger thresholds for the areas in the selected images are shown in Table 4.
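Mapping an FWI value to one of the five fire danger levels is a lookup against four ascending boundaries, as the following sketch shows. The boundary values used here are placeholders for illustration only and are not the values of Table 1 or Table 4; the function name `danger_level` and the convention that a value equal to a boundary falls into the higher level are likewise assumptions.

```python
from bisect import bisect_right
from typing import Sequence

# Placeholder boundaries between levels 1|2, 2|3, 3|4, and 4|5.
# These are NOT the thresholds of Table 1 or Table 4.
EXAMPLE_BOUNDS = [5.0, 10.0, 20.0, 30.0]


def danger_level(fwi: float, bounds: Sequence[float] = EXAMPLE_BOUNDS) -> int:
    """Return the fire danger level (1 to 5) for an FWI value."""
    # bisect_right counts how many boundaries are less than or equal to fwi.
    return bisect_right(list(bounds), fwi) + 1


for value in (3.2, 5.0, 18.4, 42.0):
    print(value, danger_level(value))
```

With site-specific thresholds substituted for the placeholders, the returned level can be passed directly to the warning rules of Section 2.4.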
3.4. Experimental Results
The accuracy, precision, recall, and F1 score were calculated to evaluate the experimental results obtained in this study, where TP is the number of true positives, FP is the number of false positives, FN is the number of false negatives, and TN is the number of true negatives. The first letter indicates whether the prediction agrees with the real value: T means that the judgment is correct (true), and F means that it is wrong (false). The second letter indicates the classifier’s prediction: P means that the sample is judged to be a positive case, and N means that it is judged to be a negative case. The accuracy and precision are obtained using Equations (1) and (2), the recall using Equation (3), and the F1 score, which is the harmonic mean of precision and recall, using Equation (4).
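Equations (1)–(4) do not appear in this text. Under the definitions of TP, FP, FN, and TN given above, the standard forms of these four metrics would read as follows; they are stated here as the usual textbook definitions and are assumed, not copied, from the original equations.

```latex
\begin{align}
\mathrm{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \\
\mathrm{Precision} &= \frac{TP}{TP + FP} \tag{2} \\
\mathrm{Recall}    &= \frac{TP}{TP + FN} \tag{3} \\
F_1 &= \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}
\end{align}
```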
In the judgment rules of Section 2.4, the first, second, third, and fourth thresholds vary slightly with environmental conditions. In the experiments conducted here, the first threshold is 90%, the second is 80%, the third is 65%, and the fourth is 58%. Whether a fire warning should be issued is then determined from the fire danger level, the confidence level that a fire has occurred in the visible light image, and the infrared-image-based judgment result. Following the evaluation indices for classification algorithms, cases in which a warning is issued and a fire actually exists are counted as TP, cases in which a warning is issued but no fire exists are counted as FP, cases in which no warning is issued but a fire actually exists are counted as FN, and cases in which no warning is issued and no fire exists are counted as TN. The accuracy, precision, recall, and F1 score are then calculated according to the formulas in this section.
Table 5 shows the accuracy, precision, recall, and F1-score calculated from the fire warning results, together with a comparison against the method that uses only YOLOv5 with the EMA attention mechanism (YOLOv5-EMA, Section 2.1), Computer Vision Detection and Convolutional Neural Network (CVD-CNN) [31], Single Shot Multibox Detector (SSD) [48], and Faster R-CNN (Faster Region-based Convolutional Neural Network) [49].
As shown in Table 5, the model presented in this study achieves an accuracy of 94.12%, a precision of 96.1%, a recall of 93.67%, and an F1-score of 94.87%. By comparison, the accuracy is 83.42% when the YOLOv5-EMA deep learning algorithm is used alone, 93.0% for CVD-CNN, 85% for SSD, and 89.0% for Faster R-CNN.
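As a quick consistency check on the reported figures, the sketch below computes the four metrics from confusion counts and recomputes the F1-score from the reported precision and recall. The helper names and the confusion counts in the usage lines are hypothetical; only the 96.1% precision and 93.67% recall come from the text.

```python
from typing import Dict


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 score from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# The reported precision and recall reproduce the reported F1-score.
print(round(f1_score(96.1, 93.67), 2))  # 94.87

# Hypothetical confusion counts, used only to show the call.
print(classification_metrics(tp=8, fp=2, fn=1, tn=9))
```

The recomputed value of 94.87 agrees with the F1-score stated above, which indicates that the precision, recall, and F1-score in the text are mutually consistent.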
4. Discussion
In this study, the aim is not to distinguish images with smoke from images without smoke. Rather, it is to handle images containing smoke and fire, water mist, clouds, dust, exhaust from large machines, reflections of the sun, and similar situations that are judged to be a forest fire, and thus cause false alarms, when only a single type of image (visible or infrared) is used. The types of images that trigger early warnings are shown in Table 6.
In this study, we collect a large number of images of the early stage of forest fires and label the location and shape of the fire point and of the smoke and fire. We also collect other visible light images that cause false fire alarms, including images of fog, clouds, high-temperature machinery, lake reflections, and building chimneys, and label the objects that cause the false alarms together with their location and shape. The labels are divided into four categories, namely fire, cloud, fog, and other. The predefined model is then trained with these labeled images to obtain a deep-learning-based smoke detection model. Through this training, smoke at the early stage of a fire can be detected more accurately. In addition, the trained smoke detection model outputs a confidence level of fire occurrence for images containing water mist, clouds, dust, the exhaust of large machines, and reflections of the sun.
In this paper, YOLOv5 is selected for deep learning training on visible light images, and the EMA (Efficient Multi-Scale Attention) mechanism is introduced into the feature extraction network of the YOLOv5 model to improve its feature representation capability. The computational process of the EMA attention mechanism is taken from existing work. The mechanism fully considers feature grouping and multi-scale structure, which is conducive to effectively establishing both short-term and long-term dependencies and thus to better performance.
In forest fire monitoring scenarios, it is difficult to distinguish smoke and fire from similar-looking objects such as clouds and machine exhaust in visible light images, which results in a high false recognition rate. This situation improved after we combined the fire danger level obtained from the fire weather index with the infrared image warning. As can be seen from Table 6, when water mist or vapor occurs, the meteorological environment generally shows a low temperature and high humidity, which corresponds to a fire danger level of 1 or 2 and does not trigger an infrared alarm. When dust and white clouds appear, the meteorological environment generally shows a high temperature and low humidity, but no infrared warning is triggered. Reflections of the sun and the exhaust of machines and equipment do trigger an infrared warning, but their visual similarity to smoke is very low, so they do not trigger a visible image warning. In summary, combining deep learning on visible light images with the FWI fire danger level and the infrared image warning gives a higher accuracy than using visible image or infrared image warnings alone, further improves the precision, and addresses the high false alarm rate at the early stage of forest fires.
5. Conclusions
This paper proposes a method for forest fire danger monitoring and early warning. The method determines the current fire danger level from the current FWI and then decides whether to issue a fire warning based on that level, the confidence level that a fire has occurred in the visible light image, and the infrared-image-based judgment result. The collected visible light images, which include images of the early stage of forest fires and other images that cause false fire alarms, such as clouds, fog, and dust, are first labeled with the locations and shapes of the relevant objects. The EMA attention mechanism is then introduced into the feature extraction network of the YOLOv5 model to obtain the predefined model, which is trained with the labeled images to obtain a deep-learning-based smoke detection model. On this basis, the fire danger level obtained from the fire weather index and the infrared image warning are combined with the model output, which resolves the difficulty that the smoke detection model has in distinguishing fire-generated smoke from similar-looking objects such as clouds and machine exhaust in visible light images. With this method, false alarms caused by water mist, clouds, dust, and similar phenomena are effectively reduced, and the accuracy of early risk warning of forest fires is improved.
In future research, we will continue to work on early risk monitoring and warning of forest fires. We plan to use visible light images, infrared images, and meteorological information together as inputs to a single large model, in order to obtain a more accurate warning model that can also classify interfering images such as water mist, clouds, dust, exhaust from large machines, and reflections of the sun.