1. Introduction
Forest fires are significant natural disasters that have emerged as a major global concern. These fires present substantial threats to ecological environments, economic development, and human life and property, posing challenges for many countries in recent decades. Forest fire risk assessment is instrumental in mitigating these risks and plays a crucial role in forest fire management [
1,
2]. By evaluating forest fire risk, managers can implement fire prevention measures such as fire risk zoning and fuel management to reduce fire losses [
3,
4,
5].
Risk assessment can use probability to express the uncertainty of the occurrence or the intensity of hazardous events [
6]. Predicting the probability of forest fires in a region requires identifying factors influencing fire occurrence, such as climate, topography, and fuel conditions [
7,
8,
9,
10], and determining the connection between these factors and the probability of fire occurrence [
11]. These factors, commonly designated as driving factors, affect the occurrence and spread of forest fires. Vegetation, as the primary fuel source in forest fires, plays a pivotal role as a driving factor and has therefore been extensively studied. Jiang et al. [
12] studied the application of vegetation water content (VWC) in forest fire risk assessment, and Luo [
13] explored forest fire risk using the MODIS Leaf Area Index (LAI) and Live Fuel Moisture Content (LFMC) data as vegetation driving factors. All these indices aim to represent vegetation’s role as a combustible material in fires, where its internal moisture and carbon content significantly impact fire behavior. However, in dense forests, optical vegetation indices saturate as canopy biomass increases [14]. Optical remote sensing primarily detects the canopy fuel moisture content, which limits its ability to sense surface fuel moisture content in dense forests, even though the surface fuel moisture content is more critical for fire risk assessment. Microwave remote sensing has unique advantages in acquiring vegetation water content because the dielectric constant of vegetation is primarily determined by its moisture content, which in turn governs the absorption and scattering of microwave signals [
15,
16]. Microwave remote sensing signals can penetrate the canopy to sense the moisture content of ground fuels. Wang et al. [17,18,19] used microwave remote sensing to retrieve fuel moisture content (FMC), employing different microwave datasets and models to improve the accuracy of FMC estimation. Their results show that microwave remote sensing has advantages over traditional optical methods for monitoring FMC under all-weather conditions and confirm its effectiveness for FMC retrieval in grassland and forest environments. Mei [
20] used RADARSAT-2 dual-polarization Synthetic Aperture Radar (SAR) data combined with the water cloud model and bare soil scattering model to retrieve the fuel moisture content of forest margin grassland. These studies demonstrated the model’s feasibility for moisture retrieval. However, fewer studies have used data from microwave remote sensing to assess forest fire risk. The Vegetation Optical Depth (VOD), a microwave index derived from microwave remote sensing data, has been shown to be proportional to the water content of above-ground vegetation [
21]. Leveraging this relationship, we attempt to improve the model’s sensitivity to the vegetation water status by combining optical and microwave remote sensing data.
To enhance the accuracy of forest fire prediction, many scholars have proposed different methods. Eskandari and Miesel [22] showed that, for high-fire-risk areas in Iran, the knowledge-based analytic hierarchy process (AHP) and fuzzy set methods can effectively identify the locations where forest fires are likely to occur. Nami et al. [
23] employed the Evidential Belief Function (EBF) method to predict fire occurrence probability in northern Iran’s Hyrcanian ecological zone, showing significant effectiveness with an area under the curve (AUC) of 84.14%. Farahmand et al. [
24] developed the Fire Danger from Earth Observation (FDEO) algorithm to predict fire danger in the contiguous United States with a lead time of up to two months, achieving an overall accuracy of up to 75% during the fire season. In recent years, machine learning techniques have been widely used to analyze complex environmental data and predict fire occurrences [
25,
26]. Specifically, the application of deep learning models in forest fire occurrence probability research has become increasingly common. For instance, Abdollahi and Pradhan [
27] utilized deep neural networks (DNNs) in conjunction with an explainable artificial intelligence (XAI) to develop predictive models to map wildfire susceptibility and identify key contributing components within them. Lin et al. [
28] applied the long-term and short-term time series network (LSTNet) model, incorporating convolutional and recurrent layers, for forest fire prediction. This method captures long-term patterns missed by traditional models, recognizing that the occurrence of forest fires is influenced by both short-term and long-term data variations, thereby achieving high accuracy (ACC 0.941) and demonstrating effectiveness in spatial predictions of forest fire susceptibility using time series data.
In natural disaster research, risk is defined as the expected loss or benefit, including the probability and potential impact of natural disaster events [
29]. Fire occurrence and spread require a certain amount of time, and, if quickly extinguished after the initial occurrence, severe consequences can be avoided. However, in large-scale or severe forest fires, rapid spread and expansion accompany fire occurrences [
30]. Simulating fire spread can reflect the combined influence of factors on forest fires, thereby comprehensively reflecting regional fire risks [
31,
32]. To predict and simulate forest fire situations, various model techniques with different predictive variables have been developed. Cellular Automaton (CA) models and Monte Carlo (MC) simulations are widely used in fire spread modeling and forest fire research. CA models simulate fire spread by representing the landscape as a grid of cells, each of which can be in different states (e.g., unburned, burning, burned). The state of each cell evolves based on predefined rules and the states of neighboring cells [
33]. MC simulations add a probabilistic element to this process, allowing for the incorporation of randomness and uncertainty in fire behavior, making the models more realistic and robust [
30]. Xuezheng et al. [
32] used the Burn-P3 model to simulate burn probability, potential fire intensity, spread speed, and fire occurrence types at the landscape scale, calculated fire exposure, and applied the AHP to assess forest fire risk. Carmel et al. [34] used the FARSITE model for Monte Carlo simulation of fire spread, generating high-resolution forest fire risk maps for Mount Carmel, Israel.
To address the complex and dynamic nature of forest fires, many studies have developed methods for assessing fire risk and predicting fire occurrence. Building on these advances, this paper integrates multi-source remote sensing, meteorological, topographic, and social data through deep-learning-based time series prediction and potential fire spread simulation to establish a forest fire risk assessment method. The innovations of this work are as follows: (1) merging the microwave index VOD with optical remote sensing data to obtain vegetation water information; (2) predicting forest fire occurrence probability using time series methods with an attention mechanism that weights the driving factors in the input data, learning key features from the week before a fire and capturing trends and changes in meteorological and vegetation factors before fire occurrence; and (3) assessing forest fire risk through an integrated assessment that combines the probability of fire occurrence with the potential fire spread probability to provide an overall measure of fire risk.
2. Materials and Methods
2.1. Study Area and Fire Data
The study area (98° E–104° E, 24° N–30° N) is located at the border of Sichuan Province and Yunnan Province in the southwest region of China, covering Liangshan Prefecture, Panzhihua City in Sichuan Province, and Diqing Prefecture, Lijiang City, Dali City, Chuxiong City, and Kunming City in Yunnan Province (
Figure 1a).
The terrain in this region is characterized by complex topography, steep mountains, and elevations ranging from 350 to 6400 m. The forests are primarily composed of subtropical forests, and the main vegetation types are shown in
Table 1 and
Figure 1b. The region has a subtropical monsoon climate with distinct wet and dry seasons. The annual average temperature is 15–20 °C, and the annual temperature varies between 15 °C and 25 °C. The annual average precipitation is 800–1000 mm, with 75%–85% of it falling from May to October. Precipitation in winter is scarce and temperatures rise quickly, leading to severe drought and dry vegetation during winter and spring.
The study area is a high-fire-risk zone in China, where forest fires mainly occur in winter and spring [
35]. Our analysis of the data reveals a similar seasonal pattern, where the forest fire occurrence frequency across different months exhibits a clear trend (
Figure 2a), with peak occurrences particularly from December to May (
Figure 2b). In contrast, the fire frequency from June to November is lower and more concentrated.
In the study, forest fire data were collected through the NASA Fire Information for Resource Management System (FIRMS) (
https://firms.modaps.eosdis.nasa.gov/, accessed on 16 December 2023) using the Moderate Resolution Imaging Spectroradiometer (MODIS). Based on the main vegetation types shown in
Table 1 and
Figure 1b, forest fire incidents within forest areas were selected. A total of 783 forest fire incidents were gathered from 2015 to 2018, including notable fires such as the forest fire in Lijiang City in March 2015; the forest fire in Miyi County, Panzhihua City, in February 2016; and the forest fire in Muli County, Liangshan Prefecture, in November 2016.
2.2. Fire Point Unbalanced Data Preprocessing
A significant challenge in forest fire prediction is the imbalance between fire point and non-fire point data. Fire events are relatively rare, so far fewer fire point data are available than non-fire point data, which can degrade the model’s accuracy and predictive performance for the minority class [
36,
37]. In this study, non-fire point data were selected based on the semi-variogram function method [
38]. A buffer zone was established for each fire point using a semi-variogram function based on a spherical model (Equation (1)). The distance at which the model first flattens is called the range, i.e., the buffer radius. Sample positions separated by distances within the range are spatially correlated, while sample positions separated by distances beyond the range are not spatially correlated.
where γ(h) is the semi-variogram function value; C0, C, and a are model parameters; and h is the spatial distance.
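As an illustration of how a spherical model yields a buffer radius, the sketch below evaluates the textbook spherical semi-variogram. It assumes the usual interpretation of the parameters (C0 as the nugget, C as the partial sill, and a as the range); the function name and this parameterization are assumptions for illustration and may differ in detail from the form used in this study.

```python
def spherical_semivariogram(h, c0, c, a):
    """Textbook spherical semi-variogram value at lag distance h.

    Assumed meaning of the parameters (not confirmed by the source):
    c0 = nugget, c = partial sill, a = range (must be positive).
    """
    if h < 0 or a <= 0:
        raise ValueError("h must be non-negative and a must be positive")
    if h == 0:
        return 0.0  # by convention, gamma(0) = 0
    if h >= a:
        return c0 + c  # the curve is flat beyond the range
    ratio = h / a
    return c0 + c * (1.5 * ratio - 0.5 * ratio ** 3)


# Hypothetical parameters: the curve reaches its sill at h = a.
print(spherical_semivariogram(500.0, 0.1, 0.9, 1000.0))
print(spherical_semivariogram(1000.0, 0.1, 0.9, 1000.0))
```

Under this form, the fitted value of a is the distance beyond which samples are treated as spatially uncorrelated, which is why it can serve as the buffer radius.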
Because vegetation and meteorological conditions vary seasonally, conditions at the same time each year are similar. Therefore, positions with high fire risk at a particular time are also likely to have high fire risk at similar times in adjacent periods and years. Using multi-temporal data, a fire buffer overlay image was established, and only areas outside all buffer zones were eligible for non-fire point selection. Within these areas, typical non-fire points were randomly generated at a fire-to-non-fire ratio of 1:1.2. Then, the influencing factor data corresponding to these non-fire points were extracted for the corresponding dates.
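The buffer-based selection can be sketched as simple rejection sampling. The example below is a minimal illustration, assuming planar (projected) coordinates, a rectangular sampling window, and a single buffer radius; the function name, the `bounds` argument, and the `max_tries` guard are hypothetical and not part of the original workflow.

```python
import math
import random


def sample_non_fire_points(fire_points, buffer_radius, bounds,
                           ratio=1.2, seed=0, max_tries=100000):
    """Draw random points that lie outside every fire buffer.

    fire_points: list of (x, y) in projected units (assumption).
    bounds: (xmin, ymin, xmax, ymax) of the sampling window (assumption).
    ratio: number of non-fire points per fire point.
    """
    rng = random.Random(seed)
    target = round(ratio * len(fire_points))
    xmin, ymin, xmax, ymax = bounds
    samples = []
    tries = 0
    while len(samples) < target and tries < max_tries:
        tries += 1
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        # Keep the candidate only if it is outside all buffers.
        if all(math.hypot(x - fx, y - fy) > buffer_radius
               for fx, fy in fire_points):
            samples.append((x, y))
    return samples


fires = [(0.0, 0.0), (10.0, 10.0), (5.0, 2.0), (8.0, 1.0), (2.0, 9.0)]
non_fires = sample_non_fire_points(fires, 1.5, (-5.0, -5.0, 15.0, 15.0))
print(len(fires), len(non_fires))
```

With a ratio of 1.2, rounding 1.2 × 2308 gives 2770, which agrees with the fire and non-fire counts reported for the training samples.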
2.3. Driving Factors
The damage caused by forest fires is influenced by various factors, including types of vegetation, fuel characteristics, topography, weather and climatic conditions, and human activities [
39]. In this study, we considered 11 important parameters for the years from 2015 to 2017 from four categories (meteorological, topographic, vegetation factors, and anthropogenic factors).
Figure 3 illustrates these factors, spatially mapped across the study area.
Vegetation Factors: The rates of forest fire occurrences are heavily affected by land cover and types of vegetation [
12]. We used the 500 m spatial resolution MODIS Normalized Difference Vegetation Index (NDVI) dataset to analyze the vegetation status in the study area. To gain a more comprehensive picture of vegetation conditions, we supplemented it with the microwave index VOD. VOD data were sourced from the VODCA product provided by the Vienna University of Technology (TU Wien) [40], which integrates and reprocesses VOD data from multiple microwave sensors. Bilinear interpolation was used to resample the VOD data to the 500 m spatial resolution of the NDVI data.
Meteorological Factors: Temperature, precipitation, humidity, and wind speed play key roles in the occurrence and spread of forest fires, and their effects differ under varying conditions. Under low precipitation, low relative humidity, and high temperatures, forest surface fuels are more likely to ignite [41]. Wind not only accelerates the evaporation and drying of soil and surface fuels before a fire, but also supplies fresh oxygen to the flames once a fire has started, influencing the spread speed. When high wind speeds coincide with dry and hot conditions, forest fires can ignite quickly and spread rapidly [42]. All meteorological data in this study were taken from the European Centre for Medium-Range Weather Forecasts (ECMWF) global atmospheric reanalysis dataset. Relative humidity was calculated from the dew point and air temperature, while the horizontal 10 m wind speed was obtained by combining the zonal (U) and meridional (V) wind components. All factors were resampled to match the resolution of the MODIS vegetation indices.
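The two derived meteorological variables can be illustrated as follows. The text does not state which humidity formula was used, so this sketch assumes the widely used Magnus approximation for saturation vapor pressure; the function names are hypothetical, and inputs are taken in degrees Celsius (reanalysis temperatures stored in kelvin would need converting first).

```python
import math


def saturation_vapor_pressure_hpa(temp_c):
    """Magnus approximation over water; temperature in degrees Celsius."""
    return 6.1094 * math.exp(17.625 * temp_c / (temp_c + 243.04))


def relative_humidity_percent(temp_c, dewpoint_c):
    """Relative humidity (%) from air temperature and dew point."""
    rh = (100.0 * saturation_vapor_pressure_hpa(dewpoint_c)
          / saturation_vapor_pressure_hpa(temp_c))
    return min(rh, 100.0)  # guard against dew point slightly above temperature


def wind_speed_10m(u, v):
    """Horizontal wind speed from zonal (u) and meridional (v) components."""
    return math.hypot(u, v)


print(relative_humidity_percent(25.0, 10.0))
print(wind_speed_10m(3.0, 4.0))
```

A large gap between air temperature and dew point translates into a low relative humidity, which is the dry condition associated with easier ignition of surface fuels.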
Topographic Factors: Topography significantly influences local climate conditions such as wind and precipitation, which in turn affect forest fire behavior. For example, variations in elevation and slope can alter wind flow and humidity levels, making certain areas more susceptible to ignition. Mountain regions are often more sensitive to temperature changes at higher elevations, which can intensify fire behavior by altering local temperature and precipitation patterns [
43]. In this study, we considered slope, aspect, and elevation. The derived variables such as slope and aspect were calculated from the Digital Elevation Model (DEM). The DEM data used were sourced from the Shuttle Radar Topography Mission (SRTM) and accessed via the NASA Earth Data platform (
https://earthdata.nasa.gov/, accessed on 17 December 2023), with a resolution of approximately 90 m.
Anthropogenic Factors: Human activities significantly impact forest fires, with areas near transportation infrastructure experiencing frequent human activities, increasing fire risk [
44]. In our study, we selected the distance to railways and highways as anthropogenic driving factors. Euclidean distances to railways (
Figure 3j) and highways (
Figure 3k) were calculated using buffering tools.
2.4. Training and Testing Datasets
The training samples in our experiment included 5078 labeled locations, consisting of 2308 fire event data points and 2770 non-fire event data points. Each data point included a 7-day record of the relevant driving factors for the respective area. Time series information was extracted from these labeled locations, with each 7-day sequence forming one sample of the time series forest fire dataset. To train the forest fire occurrence probability model, we used two labels, 0 and 1, with 1 indicating a fire event and 0 indicating no fire event. During training, data from 2015 to 2017 were split into 80% for model training and 20% for validating the model’s performance, while the data from 2018 were used for prediction.
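A minimal sketch of how such samples could be assembled is given below. It assumes that each sample consists of the 7 daily feature vectors immediately preceding the labeled date and that the 80/20 split is a random shuffle; both choices, along with the function names and the seed, are illustrative assumptions rather than details confirmed by the text.

```python
import random


def make_window(daily_records, event_index, window=7):
    """Return the `window` daily feature vectors that precede event_index."""
    start = event_index - window
    if start < 0:
        raise ValueError("not enough history before the event")
    return daily_records[start:event_index]


def split_train_val(samples, train_fraction=0.8, seed=42):
    """Shuffle a copy of the samples and split it into two subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Toy example: ten days of (hypothetical) single-feature records.
records = [[float(day)] for day in range(10)]
window = make_window(records, event_index=9)
train, val = split_train_val(range(5078))
print(len(window), len(train), len(val))
```

Under this rounding, 5078 samples would give 4062 training and 1016 validation samples.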
2.5. Forest Fire Occurrence Prediction Models
In this study, the Long Short-Term Memory (LSTM) algorithm, augmented with an attention mechanism, was used to develop a predictive model for forest fire occurrence probability. LSTM is a type of Recurrent Neural Network (RNN) specifically designed to overcome the limitations of traditional RNNs in handling long-term dependencies. It does so through a gating mechanism that retains important information over longer periods while reducing the risk of vanishing gradients. The model is a multi-layer neural network that combines the LSTM’s time series processing capability with the attention mechanism’s ability to focus on key features, making it well suited to binary classification. As shown in Figure 4, the architecture was composed of an input layer Li, an output layer Lo, and LSTM layers Xi (where i ∈ {1, 2, …, 6}), with an Attention Module between the input and output layers. The input layer Li received time series data with multiple features at each time step. The LSTM layers, each employing the tanh activation function (as defined in Equation (2)), processed the input and returned the complete output sequence. A Dropout layer was then applied to prevent overfitting. The Attention layer focused on important features by calculating attention weights for the time steps, enhancing the model’s learning ability. The output of the Attention layer was flattened and passed to a dense layer with the tanh activation function. Because the Sigmoid function has a smooth gradient and an output range of (0, 1), it is well suited to binary classification [45]. Therefore, the output layer Lo used the Sigmoid activation function to convert the output into binary classification probability values.
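To make the role of the attention weights concrete, the sketch below shows one simple form of attention over time steps, followed by a sigmoid output. It is a simplified stand-in and not the architecture of Figure 4: the hidden states would come from the LSTM layers, the weight vectors `score_weights` and `out_weights` are hypothetical learned parameters, and the flatten and dense steps are collapsed into a single weighted sum. The activation used for scoring is the standard tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)).

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, score_weights):
    """Score each time step, then return the weighted sum and the weights."""
    scores = [math.tanh(sum(w * h for w, h in zip(score_weights, state)))
              for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(weights[t] * hidden_states[t][d]
                   for t in range(len(hidden_states)))
               for d in range(dim)]
    return context, weights


def sigmoid(x):
    """Numerically stable logistic function with output in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fire_probability(hidden_states, score_weights, out_weights, out_bias):
    """Attention-pooled context vector mapped to a fire probability."""
    context, _ = attention_pool(hidden_states, score_weights)
    logit = sum(w * c for w, c in zip(out_weights, context)) + out_bias
    return sigmoid(logit)


# Seven hypothetical 2-dimensional hidden states, one per day.
states = [[0.1 * day, 0.5 - 0.05 * day] for day in range(7)]
_, day_weights = attention_pool(states, [0.8, -0.3])
print(day_weights)
print(fire_probability(states, [0.8, -0.3], [1.2, -0.7], 0.1))
```

The per-day weights indicate which days in the 7-day window contribute most to the prediction, which is the interpretive benefit of attending over time steps.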
2.6. Fire Risk Combined Assessment
In this study, we considered forest fire risk as the combined result of forest fire occurrence probability and potential burn probability. Using fire occurrence probability and potential burn probability, forest fire risk was estimated using Equation (3), as follows:
where OP is the fire occurrence probability (0 < OP < 1), and BP is the fire potential burn probability (0 < BP < 100%).
The potential burn probability (BP), calculated from the outcomes of fire spread simulations according to Equation (4), represents the proportion of simulations in which a given location burns. To determine the BP, we randomly selected ignition points from the high and very-high forest fire occurrence probability areas predicted by the occurrence probability model, excluding non-combustible locations such as roads, residential zones, and water bodies. This random selection is the Monte Carlo component of the method: stochastic sampling within a defined range accounts for variability and uncertainty in fire spread. The selected points were then used to run Cellular Automaton-Monte Carlo simulations of potential fire spread, with temperature, humidity, wind speed, and NDVI data as inputs to the Cellular Automaton model. In this study, 100 fire spread simulations were conducted, which provides a good balance between statistical stability and computational efficiency.
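The BP calculation can be sketched with a toy Cellular Automaton-Monte Carlo loop. This is a deliberately simplified illustration: it assumes a constant spread probability between 4-connected neighbors, whereas the model in this study derives spread behavior from temperature, humidity, wind speed, and NDVI. The function names, `spread_prob`, and `max_steps` are hypothetical.

```python
import random


def simulate_spread(combustible, ignition, spread_prob, rng, max_steps=50):
    """One stochastic cellular-automaton run; returns the set of burned cells."""
    rows, cols = len(combustible), len(combustible[0])
    burning = {ignition}
    burned = set()
    for _ in range(max_steps):
        if not burning:
            break
        next_burning = set()
        for r, c in burning:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                cell = (nr, nc)
                if (cell in burning or cell in burned
                        or cell in next_burning or not combustible[nr][nc]):
                    continue
                if rng.random() < spread_prob:
                    next_burning.add(cell)
        burned |= burning  # cells burn for one step, then are burned out
        burning = next_burning
    return burned | burning


def burn_probability(combustible, candidate_ignitions, n_sims=100,
                     spread_prob=0.3, seed=0):
    """BP per cell = number of runs in which it burned / number of runs."""
    rng = random.Random(seed)
    rows, cols = len(combustible), len(combustible[0])
    valid = [p for p in candidate_ignitions if combustible[p[0]][p[1]]]
    if not valid:
        raise ValueError("no combustible ignition candidates")
    counts = [[0] * cols for _ in range(rows)]
    for _ in range(n_sims):
        ignition = rng.choice(valid)  # Monte Carlo choice of ignition point
        for r, c in simulate_spread(combustible, ignition, spread_prob, rng):
            counts[r][c] += 1
    return [[counts[r][c] / n_sims for c in range(cols)] for r in range(rows)]


grid = [[True] * 5 for _ in range(5)]
grid[2][2] = False  # e.g., a water body
bp = burn_probability(grid, [(0, 0)])
print(bp[0][0], bp[2][2])
```

Cells that burn in most runs receive a BP close to 1, while non-combustible cells always remain at 0.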
2.7. Evaluation Metrics
To validate the accuracy of the proposed forest fire occurrence prediction models, three common metrics were applied in our study: Accuracy, F1 score, and the AUC, as shown in
Table 2.
Accuracy is an indicator of the model’s overall prediction accuracy, defined as the ratio of correctly predicted cases, True Positive (TP) and True Negative (TN), to all cases, True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). The F1 score is the harmonic mean of precision and recall, used to comprehensively evaluate the model’s performance. It balances precision and recall and is commonly used when there is class imbalance. A high F1 score (typically above 0.8) indicates good performance in both precision and recall.
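For reference, the definitions above translate directly into code. The sketch below computes Accuracy and the F1 score from confusion-matrix counts; the counts in the example are hypothetical and are not results from this study.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all cases that were predicted correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion-matrix counts.
print(accuracy(tp=40, tn=45, fp=5, fn=10))
print(f1_score(tp=40, fp=5, fn=10))
```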
The AUC is used to evaluate the performance of classification models, where an AUC value close to 0.5 indicates random prediction, while an AUC value close to 1 indicates high prediction accuracy. The AUC is derived from the Receiver Operating Characteristic (ROC) curve, which illustrates the trade-off between the false positive rate (FPR) and the true positive rate (TPR), quantifying the model’s overall performance.
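The AUC can also be read as the probability that a randomly chosen fire sample receives a higher score than a randomly chosen non-fire sample. The sketch below uses this pairwise interpretation, which is equivalent to the area under the ROC curve; it is a minimal illustration with hypothetical scores and is quadratic in the number of samples, so it is suited only to small examples.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive-negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A model that assigns the same score to every sample obtains 0.5 under this definition, which is why 0.5, rather than 0, marks random prediction.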
3. Results
In the study, we established and optimized a forest fire occurrence probability model to evaluate its predictive capabilities for fire occurrence. As mentioned above, to better assess the effectiveness of the forest fire occurrence probability model,
Table 3 presents the quantitative results of different evaluation metrics for five different models. The Multi-Layer Perceptron (MLP) model serves as a baseline in this study to highlight the effectiveness of using LSTM for time series data analysis.
As shown in
Table 3, the forest fire occurrence probability model combining microwave remote sensing data and the attention mechanism performed the best, achieving a prediction accuracy of 88.82%, an AUC of 0.9608, and an F1 score of 0.8869. Compared to the model without VOD data, the prediction accuracy increased by 1.55%, and the inclusion of the attention mechanism enhanced the model’s generalization ability, improving the prediction accuracy by 1.78%. Using time series for prediction, the model significantly outperformed models without such data, with an accuracy of 84.46%. Time series played a crucial role in capturing trends and changes in meteorological and vegetation factors shortly before fire occurrences, significantly enhancing the model’s predictive performance. The ROC curve for the best-performing model, which achieved a prediction accuracy of 88.82%, is depicted in
Figure 5. This ROC curve illustrates the model’s excellent ability to distinguish between different classes, demonstrating its robustness and reliability in practical applications.
To validate the practical application of the model, we used the final optimized model to predict fire risk in the study area and generated a fire occurrence probability map. We employed five classes: very-high, high, moderate, low, and very-low, established using the natural break algorithm [
46], a standard method for grouping similar values to classify fire occurrence probability. The results of the classification are shown in
Figure 6.
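The idea behind natural breaks classification is to choose class boundaries that minimize the variance within each class. The sketch below implements this with a Fisher-Jenks style dynamic program on a small list of values; it is an illustrative implementation and not necessarily the exact procedure of [46] or of the GIS software used. The function name and the sample values are hypothetical.

```python
def jenks_breaks(values, n_classes):
    """Upper bound of each class from optimal 1-D partitioning."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")
    # Prefix sums allow O(1) within-class squared-deviation queries.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(data):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        """Sum of squared deviations of data[i:j] (j exclusive)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: lowest total SSD when data[:j] is split into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    split[k][j] = i
    # Walk back through the stored split points to recover class upper bounds.
    breaks = []
    j = n
    for k in range(n_classes, 0, -1):
        breaks.append(data[j - 1])
        j = split[k][j]
    return breaks[::-1]


print(jenks_breaks([1, 2, 3, 10, 11, 12, 50, 51], 3))  # [3, 12, 51]
```

Because the running time grows with the square of the number of values, a raster-scale probability map would normally be classified from a sample of pixel values rather than from every pixel.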
Analyzing the fire occurrence probability map reveals that most areas within the study region have a low or very-low probability of forest fire occurrence. The high and very-high probability areas are mainly concentrated in densely vegetated and dry regions, particularly near some railways and highways. Conversely, low probability areas are primarily distributed in sparsely vegetated or moist regions. The high temperatures in the region have increased the probability of forest fires. Due to the distribution of forests, the probability of forest fires is relatively high in Liangshan Prefecture and Panzhihua. Meanwhile, the probability of forest fires is slightly higher in areas near the Sichuan–Yunnan border than in locations further away.
By applying the fire risk combined assessment method, which integrates the potential burn probability (
Figure 6b) with the fire occurrence probability, the resulting fire risk map (
Figure 6c) shows a significant reduction in the proportion of high and very-high risk areas compared to the fire occurrence probability alone (
Table 4). The very-high risk areas are more clearly defined, allowing for more targeted fire prevention measures.
4. Discussion
In the study, we proposed a new method for forest fire risk assessment by integrating multi-source remote sensing data (including microwave and optical remote sensing) with deep learning models. The results indicate that the method has advantages in predicting forest fire occurrence probabilities and assessing fire risk. By introducing VOD as a vegetation influencing factor, the model’s sensitivity to forest fire occurrence was improved. Compared to traditional methods relying solely on optical remote sensing data, VOD data better reflect vegetation water content, enhancing the model’s ability to capture fire occurrence conditions. Additionally, the deep learning model combining LSTM networks with attention mechanisms effectively handled time series data, capturing trends and changes in meteorological and vegetation factors before fire occurrences. The evaluation metrics, including accuracy, F1 score, and the AUC, all showed high accuracy, further validating the model’s effectiveness.
This study comprehensively utilized various data sources, including optical remote sensing data, microwave remote sensing data, meteorological data, topographic data, and anthropogenic factors. This diversity enriched the feature dimensions of the model inputs and enhanced the model’s adaptability to the complex forest fire environment. Furthermore, the inclusion of distances to railways and highways as anthropogenic factors allowed the model to reflect the impact of human activities on forest fire risk, highlighting the significance of human-caused fires in forests.
We utilized a combination of cellular automaton and Monte Carlo simulation methods to perform fire spread simulations and calculate potential burn probability for points within the study area with predicted forest fire occurrence probabilities in the high and very-high regions.
Figure 6c shows the results of fire risk assessments based on two different fire risk-related factors (fire occurrence probability in
Figure 6a and potential burn probability in
Figure 6b). The final results indicate that fire risk is significantly influenced by the combined effects of fire occurrence probability and potential burn probability. The very-high risk areas show both high fire occurrence probabilities and high potential burn probabilities. Medium risk and low risk areas are characterized by high or moderate fire occurrence probabilities, but their lower potential burn probabilities reduce the overall fire risk. Very-low risk areas are mainly distributed in sparsely vegetated or moist regions, where both fire occurrence probabilities and potential burn probabilities are low. By jointly assessing fire occurrence probabilities and potential burn probabilities, this method identifies high fire risk areas within the study region more accurately, providing scientific support for forest fire management departments to develop targeted fire prevention measures and resource allocation strategies. Additionally, the results of fire spread simulations support rapid responses in the early stages of a fire, reducing damage to ecosystems and human society.
However, our study has some limitations. Firstly, the study area is limited to specific regions in Southwestern China, and the model’s applicability and generalizability need further validation. Secondly, for potential burn probability studies, we used a combination of cellular automaton and Monte Carlo simulation methods, which can be influenced by initial conditions and rule settings [
33]. This model lacks a detailed consideration of environmental factors and complex terrains, making it difficult to accurately reflect the interactions of multiple factors during an actual fire spread [
47]. Additionally, the impact of lightning-caused fires has not been adequately accounted for in this study. Although the proportion of lightning-induced forest fires in Southwestern China, such as in Muli County, Sichuan (46.7%) [
48], is lower compared to northern regions, such as the Greater Khingan Mountains (68.28%) [
49], lightning-caused fires remain a significant factor that should not be overlooked. Future studies could incorporate more detailed lightning data and consider its spatial and temporal distribution to improve the accuracy of fire occurrence models in different regions. Furthermore, training and tuning the deep learning model for forest fire occurrence probability is complex and requires significant computational resources, posing potential challenges for practical applications. Future research can improve potential spread simulations by combining deep learning spread prediction methods to obtain more accurate potential burn probabilities.