Forest Fire Prediction: A Spatial Machine Learning and Neural Network Approach

Sharma, Sanjeev; Khanal, Puskar

doi:10.3390/fire7060205

by Sanjeev Sharma * and Puskar Khanal *

Department of Forestry and Environmental Conservation, College of Agriculture, Forestry and Life Sciences, Clemson University, Clemson, SC 29634, USA

* Authors to whom correspondence should be addressed.

Fire 2024, 7(6), 205; https://doi.org/10.3390/fire7060205

Submission received: 20 March 2024 / Revised: 20 May 2024 / Accepted: 13 June 2024 / Published: 18 June 2024

(This article belongs to the Special Issue Machine Learning (ML) and Deep Learning (DL) Applications in Wildfire Science: Principles, Progress and Prospects)

Abstract

Forest fire prediction holds significant environmental and scientific importance, particularly in regions such as South Carolina (SC), which has a high incidence of forest fires. Although existing research on forest fires in this area is limited, machine learning and neural network techniques present an opportunity to enhance forest fire prevention and control efforts. Using 2023 forest fire data from the SC Forestry Commission, prediction models were developed that incorporate meteorology, terrain, vegetation, and infrastructure, the key drivers of forest fires in SC. Feature importance analysis was employed to construct the final fire prediction model using different machine learning and neural network approaches, including Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), Artificial Neural Network (ANN), Support Vector Machine (SVM), and Convolutional Neural Network (CNN). In addition, the coefficients from a correlation test were used to construct a final fire hazard map. The evaluation of predictive performance based on accuracy scores revealed that the DT model achieved the highest accuracy (90.58%), and all models achieved prediction accuracies exceeding 80%. However, when compared against the kernel density map of the fire data from 2000 to 2023, the correlation test gave a better fire hazard map than any of the machine learning or neural network approaches that relied on feature importance. This finding directed us toward the approach based on correlation coefficients rather than those based only on feature importance. The overlap between fire locations and carbon hotspots highlighted the immediate need to mitigate carbon loss due to fire in those locations.
These results serve as a valuable resource for forest fire prediction in SC: they demonstrate the efficacy of the correlation test and its ability to outperform approaches based on feature importance, provide a theoretical foundation and data support for future forestry applications in the region, and underline the importance of prioritizing areas for climate change mitigation based on fire prediction.

Keywords:

machine learning; neural network; Google Earth Engine; ArcGIS Pro; fire

1. Introduction

Forests serve as a crucial natural asset, playing a pivotal role in maintaining environmental equilibrium. They represent a valuable natural resource globally, with particular significance for local economies. Monitoring the condition of forests, including their shape and overall health, provides valuable insights into the state of the environment [1]. In many communities residing on the fringes of forests, these wooded areas are integral to their economic activities and hold significant social value [1]. Additionally, forests contribute to local weather patterns, which highlights their multifaceted importance [2]. Before the onset of industrialization, forests spanned approximately 5.9 billion hectares of the earth’s land surface. However, due to various factors, including human activities, this coverage has dwindled to around 4 billion hectares, accounting for roughly 31% of the planet’s land area [3].

In recent years, wildfires have emerged as a critical concern, posing a recurring threat to vast forested areas worldwide [4]. These wildfires wreak havoc on ecosystems, cause substantial damage to infrastructure, and pose significant risks to human life. Notable recent wildfires ravaged regions such as Australia, the Amazon, and the United States [5]. The impact of wildfires extends beyond immediate destruction, often causing irreparable harm to wildlife and resulting in substantial economic losses [6]. Similarly, wildfires pose a significant threat to forests, particularly impacting young trees and contributing to deforestation. The frequency of forest fires has shown an alarming upward trend over the years due to climate change, underscoring the urgent need for comprehensive research efforts to monitor and implement effective measures for mitigating the wildfire menace [6]. As wildfires become increasingly complex hazards, they pose significant challenges to habitats, communities, and economies alike.

Forest fire susceptibility mapping serves as a crucial tool in preemptive measures against forest fires and aids in the identification of high-risk areas prone to such occurrences [7]. By providing spatial insights into the likelihood of forest fires, these maps enable managers and planners to mitigate potential damages to forest ecosystems, reduce casualties, and minimize economic losses [7,8]. Thus, forest fire susceptibility maps play a pivotal role in effective forest fire risk management strategies. Different tools and software such as Global Mapper version 25.1, the Environment for Visualizing Images (ENVI) version 6.0, ERDAS IMAGINE version 16.8, Google Earth Engine (GEE) version 7.3, and eCognition version 10.4 are available, but for large-scale analyses, GEE is considered superior [9].

The GEE (https://earthengine.google.com, accessed on 23 January 2024) platform provides access to a vast array of satellite and geospatial data, spanning over four decades of earth observation imagery. This wealth of data enables users to conduct comprehensive comparisons and longitudinal analyses over extensive geographical areas, contingent upon the coverage of the specific satellite [9]. One of the primary advantages of GEE is its seamless access to satellite data archives. Additionally, GEE is coupled with robust computing resources that facilitate the processing of large-scale geospatial and remote sensing datasets, which are regularly updated. GEE can also be used in web application development and programming. This accessibility extends the reach of geospatial analysis to a broader audience, fostering collaboration and innovation in the field. GEE offers a diverse range of satellite data, including but not limited to Landsat, Sentinel-2, and NEXRAD, so users can integrate various datasets into their applications for a wide range of scientific and societal endeavors. Its user-friendly interface and robust support for coding and APIs make it particularly convenient for processing spatial data [10,11]. The use of GEE has grown steadily across various research domains: researchers have leveraged it for analyses spanning land cover classification, forest and vegetation studies, ecosystem monitoring, and agricultural assessments [12,13,14,15]. This widespread adoption underscores the platform’s versatility and effectiveness in addressing diverse spatial analysis needs.

GEE has been used in the past to predict fire in small areas, and various factors have been employed to develop fire risk maps. However, the performance of GEE over large areas such as states has not been assessed. Furthermore, there is a need to select the machine learning or neural network approach that gives the best prediction for the state of SC. In this study, GEE and six machine learning algorithms (Random Forest (RF), Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR), Artificial Neural Network (ANN), and Convolutional Neural Network (CNN)) were used to construct and evaluate a forest fire susceptibility model for SC. The occurrence of forest fires was assessed through the analysis of forest fire data in relation to climate, topography, hydrology, and human activities. The results are expected to be used for the protection of forest resources, biodiversity conservation, and environmental management planning. Therefore, the objectives of this study were to (1) identify the factors affecting forest fire in SC and (2) identify the best prediction model for forest fire.

2. Methods

2.1. Study Area

This study was conducted in SC, U.S.A. (Figure 1). SC is the 40th-largest and 23rd-most populous U.S. state, with a recorded population of 5,124,712 according to the 2020 census. In 2019, its GDP was USD 213.45 billion [16]. SC is composed of 46 counties. Within SC, from east to west, there are three main geographic regions, i.e., the Atlantic coastal plain, the Piedmont, and the Blue Ridge Mountains in the northwestern corner of upstate SC. SC has primarily a humid subtropical climate, with hot, humid summers and mild winters. Areas in the Upstate have a subtropical highland climate. Along SC’s eastern coastal plain, there are many salt marshes and estuaries. SC’s southeastern Lowcountry contains portions of the Sea Islands, a chain of barrier islands along the Atlantic Ocean. In the summer, daytime temperatures average between 30 and 34 °C in most of the state, and overnight lows average 21–24 °C on the coast and 19–23 °C inland. Winter temperatures are much less uniform in SC. The coastal areas of the state have very mild winters, with high temperatures approaching an average of 16 °C and overnight lows around 4 °C.

2.2. Data Collection

Fire point location data for 2023 were obtained from the SC Forestry Commission and imported into Google Earth Engine. To ensure a balanced dataset, an equal number of random non-fire points were generated within Earth Engine across SC. Additionally, raster layers depicting features such as roads, rivers, ponds or lakes, and settlements were created and imported into Earth Engine. These raster layers, sourced from different sites and repositories, were then clipped to the SC state shapefile (Table 1).

Several datasets were incorporated to supply essential environmental variables. The National Land Cover Database (NLCD) for 2021 was added and clipped to match SC boundaries, providing land cover information. Furthermore, Sentinel-2 imagery (COPERNICUS/S2_SR_HARMONIZED) captured between May and October 2023 was loaded into Earth Engine to calculate the average Normalized Difference Vegetation Index (NDVI). Images with less than 1% cloud cover were selected and clipped based on SC region to minimize seasonal variations in NDVI. Additionally, topographic data such as Digital Elevation Model (DEM) layers were included in Earth Engine, enabling the calculation of slope and aspect. Lastly, meteorological data from TerraClimate were integrated, providing temperature, precipitation, and other climatic factors essential for a comprehensive fire analysis. All these factors were selected based upon previous studies.

The block diagram outlines the data collection process, involving the acquisition of diverse datasets from multiple sources. These datasets included meteorological data, topographic information, land cover data, and vegetative indices. Following data gathering, an exploratory analysis was conducted as part of the preprocessing. This step involved the removal of noisy data and the conversion of categorical variables into numerical formats to prepare the data for further analysis. Once preprocessing was complete, feature extraction was performed using the location data of fire incidents. Relevant features were extracted to capture key characteristics of the landscape and environmental conditions at the fire locations. Finally, detailed analysis techniques were applied to the extracted features to uncover patterns, trends, and relationships, providing valuable insights into fire behavior and factors influencing fire occurrence (Figure 2).

The workflow diagram outlines a systematic approach to analyzing fire risk, beginning with the extraction of various features for both fire and random points. Subsequently, 80% of the dataset was designated for training, while the remaining 20% was reserved for testing the selected model. Model selection prioritized accuracy scores, ensuring the identification of the most suitable model for predicting fire risk. Upon selecting the model, a final fire risk map was generated based on the feature importance of different factors, allowing for a comprehensive assessment of fire risk across the study area (Figure 3).

2.3. Data Analysis

Data on 1076 fire incidents from 2023, sourced from the SC Forestry Commission (SCFC), were imported into Google Earth Engine. An equivalent number of random (non-fire) points were generated and processed to remove any overlapping points, ensuring data integrity. Additionally, various vector layers containing information on infrastructure such as roads, rivers, lakes or ponds, and settlements were imported, alongside raster layers such as the National Land Cover Database (NLCD).

The process continued with the integration of harmonized Sentinel-2 imagery with less than 1% cloud cover. From this imagery, the Normalized Difference Vegetation Index (NDVI) layer was derived, a critical indicator calculated using the traditional formula:

NDVI = (NIR − R)/(NIR + R)

(1)

where NIR represents near-infrared, and R represents red values. Furthermore, a Digital Elevation Model (DEM) raster layer was incorporated, allowing for the derivation of slope and aspect layers.
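
Equation (1) can be illustrated with a minimal Python sketch. The function name `ndvi` and the reflectance values are hypothetical and chosen only for illustration; the study itself derived NDVI from Sentinel-2 imagery in Google Earth Engine, not with this code.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index for one pixel, Equation (1)."""
    total = nir + red
    if total == 0:
        # The ratio is undefined when both bands are zero.
        # Returning 0.0 is an assumption made for this sketch.
        return 0.0
    return (nir - red) / total


# Hypothetical surface-reflectance values (not taken from the study).
dense_vegetation = ndvi(nir=0.50, red=0.10)
sparse_cover = ndvi(nir=0.30, red=0.25)
```

Values close to 1 are typical of dense green vegetation, while values close to 0 are typical of bare or sparsely vegetated surfaces.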

TerraClimate data enriched the analysis by providing essential meteorological variables such as precipitation accumulation, mean minimum and maximum temperatures, soil moisture, vapor pressure, runoff, and wind speed (Figure 4). Euclidean raster layers were generated using ArcGIS Pro version 3.2.2 for roads, rivers, lakes or ponds, and settlements and then imported into Google Earth Engine. These layers, along with others like land cover, NDVI, DEM, slope, aspect, and TerraClimate data were used to extract the values. To ensure consistency, all raster layers were resampled to a uniform spatial resolution of 100 m. Feature extraction from raster layers was conducted to augment the fire and non-fire points. Subsequently, these points were exported and subjected to comprehensive analysis using Python 3.11.5. Given the categorical nature of the land cover layer, it was converted into dummy variables, yielding 13 distinct land cover classes. A row was eliminated if a value was missing in that row.

The data underwent min–max normalization in Python version 3.11.9, and various libraries were employed for a detailed analysis. Diverse machine learning algorithms and neural networks, including RF, LR, SVM, DT, ANN, and CNN, were utilized for prediction and model selection. The dataset was split into training (80%) and testing (20%) subsets, with model selection based on higher accuracy scores and lower classification errors. For the weighted overlay, each raster layer was multiplied by its weight. Two sets of weights were used: the feature importance scores obtained from the top three models by accuracy, DT, RF, and LR (Figure 4), and the correlation coefficients obtained from the correlation test. This process resulted in a composite layer, which served as the final fire hazard map layer. This hazard map was then compared to the Kernel Density Estimation (KDE) map generated from the fire location data from 2000 to 2023. KDE is a method used to analyze and visualize the spatial distribution of point data. The KDE map was compared with the hazard maps generated by the different machine learning and neural network approaches. All raster layers adhered to the WGS 1984 geographic coordinate system within Google Earth Engine (GEE). The carbon data for the state of SC were collected from the SC Forestry Commission and were later used to compare the fire areas with the carbon hotspot areas. Carbon hotspots were identified using Getis-Ord Gi* (Gi-star) hotspot analysis, a statistical method used in spatial analysis to identify clusters of high or low values in a dataset. This technique is particularly useful for detecting areas with significantly higher or lower values than expected, which are referred to as “hotspots” or “cold spots”, respectively.
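
The min–max normalization and the 80/20 split described above can be sketched in plain Python. This is an illustrative sketch rather than the authors' code: the helper names, the random seed, and the toy values are assumptions, and the study does not state which libraries performed these steps.

```python
import random


def min_max_normalize(values):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows and hold out `test_fraction` of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical elevation values in metres (not from the study).
scaled = min_max_normalize([10.0, 20.0, 30.0])

# Ten hypothetical sample identifiers split 80/20.
train_rows, test_rows = train_test_split(range(10))
```

Shuffling before splitting matters when the fire points and the random points are stored one after the other, because an unshuffled split could otherwise leave one class out of the test set.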

A correlation matrix was also generated to examine the correlation between factors. A correlation matrix is a statistical tool used to quantify the relationships between variables in a dataset. It consists of a table where each cell represents the correlation coefficient between two variables. These coefficients range from −1 to 1, where 1 indicates a perfect positive correlation, −1 indicates a perfect negative correlation, and 0 indicates no correlation. The correlation matrix shows the strength and direction of relationships between variables, helping researchers and analysts understand patterns and dependencies within their data. Implementing a correlation matrix involves calculating correlation coefficients using statistical methods such as Pearson correlation and visualizing the results using techniques like heatmaps. Factors were discarded based on the correlation values: if two factors were highly correlated, one of them was removed before model prediction. The following approaches were used to predict forest fire occurrence based upon multiple factors.
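
The screening step described above can be sketched as follows. The Pearson formula is standard, but the 0.9 threshold, the function names, and the toy columns are assumptions made for illustration; the study does not report the cut-off it used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def drop_correlated(columns, threshold=0.9):
    """Keep a column only if it is not highly correlated with one already kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical values: "min_temp" closely tracks "vapor", "slope" does not.
columns = {
    "vapor": [1.0, 2.0, 3.0, 4.0, 5.0],
    "min_temp": [2.0, 4.0, 6.0, 8.0, 10.5],
    "slope": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept_factors = drop_correlated(columns)
```

Because columns are visited in insertion order, the first member of a highly correlated pair is the one that survives; which member to keep is a modelling decision that this sketch does not settle.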

2.4. Random Forest (RF)

RF was introduced by Breiman in 2001 and is a non-parametric supervised method applicable to both classification and prediction tasks [17,18]. RF is essentially a combination of DTs, where each tree independently votes for a class for the input data, and the final class prediction is determined by the majority vote of all trees in the forest [19]. In RF models, trees are grown from different subsets of training data to enhance diversity, thereby achieving greater classifier stability [19]. The data subsets excluded from the training of a particular tree, known as out-of-bag samples, are subsequently used for assessing accuracy and performance and for calculating an internal unbiased estimate of the generalization error as the number of trees grows [19]. By increasing the number of trees in the model, RF helps in reducing the generalization error and prevents overfitting. Effective attribute selection is crucial for RF model construction to maximize dissimilarity measures between classes. Common methods for attribute selection include gain ratio, Gini index, and chi-square [19,20]. During tree growth, each node is divided using the best split from a random subset of input features. While this makes individual trees less robust, it simultaneously reduces the correlation between trees, thereby lowering the generalization error and improving the model accuracy [19].
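
Two of the ideas described above, bootstrap sampling with out-of-bag samples and majority voting, can be sketched without any machine learning library. The function names are hypothetical, and the per-tree predictions are made up; this is not a full RF implementation.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw a bootstrap sample (with replacement) and list the out-of-bag rows."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


def majority_vote(tree_predictions):
    """Return the most frequent class among the per-tree predictions.

    On a tie, the class that appears first in the list is returned.
    """
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)
in_bag, out_of_bag = bootstrap_indices(100, rng)

# Hypothetical votes from five trees for one location (1 = fire, 0 = no fire).
forest_prediction = majority_vote([1, 0, 1, 1, 0])
```

Each tree would be trained on its own `in_bag` rows and evaluated on its own `out_of_bag` rows, which is what makes the internal error estimate possible without a separate validation set.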

2.5. Multiple Logistic Regression (LR)

In fire probability modeling, the aim of multiple logistic regression is to develop a robust model that describes the relationships between the presence or absence of a fire event (the dependent variable) and a set of independent variables known as fire influencing factors [21]. Logistic regression was applied because the dependent variable was binary (a dummy variable): the fire points were coded as 1, and the random points were coded as 0. The data were divided into two portions: 80% were used for training the model, and 20% for testing. A confusion matrix was also generated.

The logistic regression equation takes a general form, expressed as:

P = \frac{1}{1 + e^{-z}}

(2)

Here, P represents the probability of fire occurrence, and z is the linear combination of the explanatory variables; the observed outcome is coded as 1 for the occurrence and 0 for the non-occurrence of a fire event. The equation for z is

z = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n

(3)

In this equation, b_0 represents the intercept of the model equation, b_i (where i = 1, 2, …, n) signifies the model coefficients, and x_i (where i = 1, 2, …, n) denotes the fire explanatory variables.
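
Equations (2) and (3) can be evaluated with a few lines of Python. The coefficients and feature values below are hypothetical and are not estimates from the study; the branch on the sign of z is only a numerical safeguard added for this sketch.

```python
import math


def linear_predictor(intercept, coefficients, features):
    """Equation (3): z = b_0 + b_1*x_1 + ... + b_n*x_n."""
    return intercept + sum(b * x for b, x in zip(coefficients, features))


def fire_probability(z):
    """Equation (2): P = 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, the algebraically equivalent form keeps exp() small.
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


# Hypothetical model with two normalized explanatory variables.
z = linear_predictor(intercept=-1.0, coefficients=[2.0, 0.5], features=[0.5, 0.0])
p = fire_probability(z)
```

A point is then typically labelled as fire when P exceeds a chosen threshold such as 0.5; the threshold is a modelling choice and is not specified in the text.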

2.6. Support Vector Machine (SVM)

SVM was introduced by Cortes and Vapnik and is a prominent supervised statistical machine learning algorithm for binary classification [22,23]. It has been widely adopted for real-world problems within Earth observation, notably in fire prediction scenarios [24,25]. SVM operates by maximizing the margin between the classes to establish an optimal separating hyperplane. Because the hyperplane is determined by the support vectors, the training examples closest to the boundary, the remaining training examples have little influence on it, especially in the context of a linear classifier. SVM can tackle both classification and regression problems while accommodating large feature spaces, which sets it apart from other machine learning approaches [26,27]. Furthermore, SVM allows for flexibility in selecting a similarity function and effectively mitigates overfitting concerns. For a deeper understanding of SVM models, readers can consult previous comprehensive research [1,24,28,29].

2.7. Decision Tree (DT)

DT is a popular machine learning algorithm used for both classification and regression tasks. It is a tree-like model where each internal node represents a “test” on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label or a continuous value. The tree is built by recursively splitting the dataset into subsets based on the value of an attribute [30]. The split is determined by selecting the attribute that best separates the data into different classes or reduces the uncertainty (e.g., entropy or Gini impurity) the most. The splitting process continues until one of the stopping criteria is met, such as reaching a maximum depth or having a minimum number of samples in a node, or when no further improvement can be made. Once the tree is constructed, it can be used to make predictions on new data by traversing the tree from the root node to a leaf node based on the attribute values of the data and then assigning the class label or value associated with that leaf node. When the output is continuous, the criterion for splitting a node is based on minimizing the standard deviation of the output values reaching that node, which serves as a measure of error. The standard deviation reduction (SDR) at a node is computed as follows:

SDR = \frac{m}{|T|} \times \beta(i) \times \left[ sd(T) - \sum_{j \in \{L, R\}} \frac{|T_j|}{|T|} \times sd(T_j) \right]

(4)

where SDR is the standard deviation reduction, sd denotes the standard deviation, T is the set of samples that reach the node, m is the number of samples that have no missing values for this parameter, β(i) is a correction factor, and T_{L} and T_{R} are the sets created by dividing on this parameter.
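
Equation (4) can be checked on a small example. In the sketch below, the function name, the default β(i) = 1, the use of the population standard deviation, and the toy output values are all assumptions made for illustration.

```python
import statistics


def sdr(parent, left, right, m=None, beta=1.0):
    """Standard deviation reduction of a candidate split, Equation (4).

    parent      -- output values of all samples reaching the node (T)
    left, right -- output values in the two subsets (T_L and T_R)
    m           -- samples without missing values for the split attribute
                   (defaults to all samples in the node)
    beta        -- correction factor beta(i); 1.0 is an assumed default
    """
    if m is None:
        m = len(parent)
    weighted_child_sd = sum(
        len(child) / len(parent) * statistics.pstdev(child)
        for child in (left, right)
    )
    return (m / len(parent)) * beta * (statistics.pstdev(parent) - weighted_child_sd)


# Hypothetical node with four output values.
node = [1.0, 1.0, 5.0, 5.0]
good_split = sdr(node, left=[1.0, 1.0], right=[5.0, 5.0])  # children are pure
poor_split = sdr(node, left=[1.0, 5.0], right=[1.0, 5.0])  # children mirror parent
```

The split with the largest SDR is preferred, since it leaves the least spread in the child nodes.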

2.8. Artificial Neural Network (ANN)

The ANN draws inspiration from biological neurons and represents a subset of artificial intelligence techniques adept at recognizing intricate patterns within data [31]. These networks comprise various nonlinear computational units that operate in parallel and are structured in patterns [32]. Traditionally, an ANN is constructed as a multilayer perceptron, with an input layer and an output layer interconnected by one or more hidden layers [33]. This architecture enables the identification of non-linear relationships within data [34].

2.9. Convolutional Neural Network (CNN)

The CNN is one of the most popular deep neural networks. It takes its name from a linear mathematical operation between matrices, called convolution. CNNs have multiple layers, including convolutional, non-linearity, pooling, and fully connected layers [35]. The convolutional and fully connected layers have parameters, but the pooling and non-linearity layers do not. CNNs perform very well on many machine learning problems.
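
The convolution and non-linearity steps mentioned above can be sketched on a tiny grid. As in most deep learning libraries, the kernel is applied without flipping, which is strictly a cross-correlation; the function names, the 3 × 3 input, and the 2 × 2 kernel are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding ("valid" mode)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k_rows)
                for b in range(k_cols)
            )
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


def relu(feature_map):
    """Non-linearity layer: negative responses are set to zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]


# Hypothetical 3x3 raster patch and a 2x2 kernel holding the layer's parameters.
patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]

feature_map = conv2d_valid(patch, kernel)
activated = relu(feature_map)
```

Only the kernel entries are learned during training, which is why the text attributes parameters to the convolutional layer and none to the non-linearity layer.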

2.10. Validation and Accuracy

The Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) were generated for each model to validate it. The AUC is a widely used metric for evaluating the accuracy and performance of prediction models [36]; it summarizes the ROC curve, which plots the false positive rate on the X-axis against the true positive rate on the Y-axis. For a model that performs no worse than chance, the AUC value ranges from 0.5 to 1: when the model does not predict well, the AUC value is close to 0.5, and when the model predicts well, it is close to 1. In general, AUC values greater than 0.8 indicate high prediction performance [37].
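
One way to compute the AUC without plotting the ROC curve uses its rank interpretation: the AUC equals the probability that a randomly chosen fire point receives a higher score than a randomly chosen non-fire point. The sketch below follows that definition; the function name and the scores are hypothetical, and the study does not state how its AUC values were computed.

```python
def roc_auc(labels, scores):
    """AUC as the share of (fire, non-fire) pairs ranked correctly.

    Ties between a fire score and a non-fire score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("Both classes must be present to compute the AUC.")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores for four test points (1 = fire, 0 = non-fire).
perfect = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
imperfect = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

The pairwise loop is quadratic in the number of points, which is acceptable for a test set of a few hundred samples but would be replaced by a sort-based method for larger data.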

3. Results

The highest fire prediction accuracy was obtained with the DT model (90.58%), followed by RF (88%), while the score was low for ANN. Similarly, the classification error was low for DT (9.42%) and high for ANN (Figure 5).

Correlation coefficients were obtained from the correlation test with different factors for forest fire prediction. Factors such as aspect, DEM, mean minimum temperature (Min_Temp), mean maximum temperature (Max_Temp), moisture, NDVI, precipitation, runoff, slope, vapor, wind speed (WindSpeed), distance from rivers (Drivers), distance from lake or pond (Dlake), distance from roads (Droads), distance from settlement (Dset), and all the landcover class dummies were used for the correlation test. Both positive and negative coefficients were obtained. The positive correlation was high for the landcover class 21, while the negative correlation was high for distance to roads (Figure 6). This indicated that fire occurrence was maximum for the landcover class 21 and in areas closer to roads.

The same 28 factors were used to generate the correlation matrix. A heat map was generated from these factors and inspected for highly correlated pairs of variables so that redundant factors could be discarded before model construction (Figure 7). The heat map showed a high correlation between vapor pressure and mean minimum temperature, so mean minimum temperature was eliminated in model building to increase the prediction performance of the model.

The final hazard maps (Figure 8) were based upon the feature importance of the different factors for predicting fire, obtained from the different machine learning models. The feature importance of each factor was multiplied by its corresponding raster layer to form a composite image. The composite image was later downloaded and added to ArcGIS Pro for better visualization, and histogram equalization was performed to better symbolize the results. The landcover class 21 and the distance from roads showed higher importance compared to other factors (Figure 8). Although the accuracy scores for DT, RF, and LR were high, the hazard maps built from their feature importance were not better than the map built from the correlation coefficients. The correlation test used the correlation coefficient values for the raster overlay and generated a final hazard map that was similar to the one generated by applying kernel density estimation to the fire data from 2000 to 2023.
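
The overlay step can be sketched on two tiny rasters. The text states that each layer is multiplied by its weight; summing the weighted layers cell by cell is an assumption of this sketch, as are the layer names, the weights, and the 2 × 2 grids.

```python
def weighted_overlay(layers, weights):
    """Multiply each raster by its weight and sum the results cell by cell.

    layers  -- dict mapping a layer name to a 2D list of normalized values
    weights -- dict mapping the same names to feature importances or
               correlation coefficients (which may be negative)
    """
    names = list(weights)
    n_rows = len(layers[names[0]])
    n_cols = len(layers[names[0]][0])
    return [
        [sum(weights[name] * layers[name][r][c] for name in names) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical normalized rasters (not from the study).
layers = {
    "landcover_21": [[1.0, 0.0], [1.0, 0.0]],
    "dist_to_roads": [[0.0, 1.0], [0.5, 0.2]],
}
# Hypothetical weights; the negative sign means hazard falls with distance.
weights = {"landcover_21": 0.4, "dist_to_roads": -0.3}

hazard = weighted_overlay(layers, weights)
```

With signed correlation coefficients as weights, cells that are far from roads are pushed down in the composite, whereas feature importance scores are non-negative and carry no direction.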

KDE was applied to the fire points from 2000 to 2023. The kernel density showed a higher hazard in the lower part of SC, which was similar to the result obtained using the correlation coefficients from the correlation test. Therefore, the correlation coefficients had a higher prediction capability for the hazard map than the feature importance obtained from the different machine learning and neural network approaches, even though the latter had higher accuracy scores. By examining the carbon hotspot map and the kernel density map, we identified regions common to both maps (Figure 9).
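
The idea behind a kernel density surface can be sketched with a Gaussian kernel. The kernel type, the bandwidth, the function name, and the coordinates below are assumptions; the text does not state which kernel or bandwidth was used for the 2000 to 2023 fire points.

```python
import math


def gaussian_kde_2d(points, query, bandwidth):
    """Kernel density estimate at `query` from a list of (x, y) points."""
    qx, qy = query
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    return norm * sum(
        math.exp(-((qx - x) ** 2 + (qy - y) ** 2) / (2.0 * bandwidth ** 2))
        for x, y in points
    )


# Hypothetical fire locations in arbitrary planar units: three clustered, one isolated.
fires = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]

density_in_cluster = gaussian_kde_2d(fires, (0.0, 0.0), bandwidth=1.0)
density_between = gaussian_kde_2d(fires, (2.5, 2.5), bandwidth=1.0)
```

Evaluating the function on a regular grid of query points yields the density raster, and a larger bandwidth produces a smoother surface. Distances here are planar, so geographic coordinates would need to be projected first.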

ROC curves (Figure 10) were generated for the top two models, DT and RF. Feature importance from the DT and correlation coefficients from the correlation test were used for the hazard maps. Although the AUC for the DT was high (96%), the hazard map generated from the DT feature importance was less precise than the one generated from the correlation coefficients, as shown by comparing both maps to the KDE map.

4. Discussion

(1): Major Forest Fire Driving Factors in SC

In this study, twenty-eight factors influencing the occurrence of forest fires were selected. These factors could be divided into five categories: geographical location, meteorology, climate, topography, and vegetation. The influencing factors of forest fires have been studied previously [38,39]. The drivers of forest fires considered were mean maximum temperature, mean minimum temperature, cumulative precipitation, soil moisture, runoff, wind speed, vapor pressure, DEM, slope, aspect, NDVI, landcover, and distance from roads, rivers, lakes, and settlements. These were selected based upon past research. Landcover was included in our analysis and, thus, was an additional variable compared to past research. The different machine learning and neural network approaches provided the feature importance for each factor. Among all factors, the distance from roads and the landcover class 21 had the highest importance. However, in this study, the hazard maps built from the feature importance of the machine learning and neural network approaches did not perform well. In contrast, the authors of [40] used feature importance in different models and achieved a higher accuracy than our research, though they did not compare their results with actual long-term data. They used not only meteorology, vegetation, topography, and socio-humanities data but also fuel factors and obtained an AUC near or above 90% for all models they selected. However, the lack of long-term forest fire data made it difficult to assess the precision of their hazard map. In our case, the correlation coefficients from the correlation test produced a better fire hazard map, based upon the comparison with the KDE map. This suggests that correlation coefficients are better suited to fire hazard mapping than feature importance, which many researchers have used for fire hazard map prediction.
Other criteria for generating coefficients are needed to better predict a hazard map instead of just relying on feature importance. The results from this study showed a higher occurrence of fire in the southeast region. Better management in this region will decrease the impact of fire on the community. According to a previous study [41], the factors and conditions influencing fire occurrences operate differently depending on the region, because environmental and social conditions differ from country to country. The KDE and carbon hotspot maps showed common regions; better management in these regions would help mitigate carbon loss due to forest fire.

The factors influencing fire occurrence and risk have been assessed in several studies [41,42,43]. The main and critical factors in fire ignition and spread are the meteorological (temperature, relative humidity, precipitation, and wind speed) and climatic conditions. One of these studies [42] reported that the most influential variable is rainfall in the last 24 h, followed by temperature, wind speed, and relative humidity. According to [44], the most influential factors in wildfire occurrence in Switzerland are (in order of influence) land cover, elevation, mean annual rainfall, and mean annual temperature. However, in our case, the most influential factors were the distance to roads and the landcover class 21. The landcover class 21 (Developed, Open Space in the NLCD legend) includes areas with a mixture of some constructed materials but mostly vegetation in the form of lawn grasses, in which impervious surfaces account for less than 20% of the total cover. Our results indicate that human factors are the key factors for forest fire prediction.

Human factors also play an important role in the occurrence of forest fires. These factors include the distance from fire points to residential areas and special festivals. Frequent human activities, such as burning paper, wasteland reclamation, or arson, gradually increase the frequency of forest fires [45]. Ref. [40] found that anthropogenic factors are the most influential factors in forest fire occurrence, which is aligned with our results. In selecting the forest fire driving factors, this study first tested for multicollinearity and then selected features. For the collinearity test, a correlation test was used to generate a heat map and to eliminate highly correlated variables.
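
The collinearity screening described above can be sketched as follows. This is only an illustration under stated assumptions: the 0.9 threshold, the factor values, and the helper names `pearson` and `drop_collinear` are hypothetical and are not taken from this study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def drop_collinear(columns, threshold=0.9):
    """Keep factors in order, skipping any factor whose |r| with an
    already kept factor exceeds the threshold (threshold is an assumption)."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Made-up values sampled at six locations, for illustration only.
factors = {
    "tmax": [30.1, 31.5, 29.8, 33.0, 32.2, 28.9],
    "tmin": [18.0, 19.6, 17.9, 21.1, 20.0, 16.8],  # nearly collinear with tmax
    "dist_road": [120.0, 900.0, 450.0, 300.0, 60.0, 700.0],
}

kept = drop_collinear(factors, threshold=0.9)
print(kept)
```

In this toy example, `tmin` is dropped because it is almost perfectly correlated with `tmax`, while `dist_road` is retained. The pairwise values returned by `pearson` are also what a correlation heat map displays.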

(2): Choice of the Model

We compared the feature importance of each factor from each model (DT, RF, LR, ANN, CNN, and SVM) and used the correlation coefficients from the correlation test to predict the hazard map. For this, we divided the data into training and testing sets and calculated the accuracy scores. As the accuracy was above 80% for all six models, all were considered reliable, and we performed a raster-weighted overlay based upon feature importance and upon the correlation coefficients. We then evaluated the results using ROC curves and selected the correlation test as the optimal choice for forest fire prediction: judged by the resulting hazard map, the correlation coefficients allowed for a better prediction than the DT, RF, LR, ANN, SVM, and CNN methods. The DT has high accuracy, can handle high-dimensional samples without factor screening as well as heterogeneous or missing data, has high training and prediction speed, and can reduce model overfitting. However, because it relies on feature importance, its hazard map was less accurate than that of the correlation test, which relies on the correlation coefficients. The ANN and CNN models can be trained very quickly and can handle large samples, but their accuracy in this study was relatively low. The SVM model has a high predictive ability but also certain shortcomings: the higher the model complexity, the lower the calculation speed, and obtaining the optimal parameters takes a long time when processing large amounts of data.
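
The weighted overlay and ROC evaluation described above can be sketched in a few lines. This is a simplified illustration rather than the workflow used in this study: the coefficient values, the layer values, the normalization of absolute coefficients into weights, and the function names are all assumptions, and each layer is assumed to be already rescaled to 0-1 so that higher values mean higher hazard.

```python
def normalize_weights(coeffs):
    """Scale absolute coefficients so the weights sum to 1 (an assumed scheme)."""
    total = sum(abs(c) for c in coeffs.values())
    return {name: abs(c) / total for name, c in coeffs.items()}

def weighted_overlay(layers, weights):
    """Cell-wise weighted sum of layers that share the same flattened grid."""
    n_cells = len(next(iter(layers.values())))
    return [sum(weights[k] * layers[k][i] for k in weights) for i in range(n_cells)]

def roc_auc(scores, labels):
    """AUC as the probability that a fire cell outranks a non-fire cell
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical coefficients (could be correlation coefficients or importances).
coeffs = {"road_proximity": 0.30, "landcover_21": 0.15, "tmax": 0.05}

# Hypothetical rasters flattened to six cells, each rescaled to 0-1.
layers = {
    "road_proximity": [0.9, 0.8, 0.2, 0.1, 0.7, 0.4],
    "landcover_21": [1.0, 0.0, 0.0, 0.0, 1.0, 1.0],
    "tmax": [0.5, 0.6, 0.4, 0.5, 0.5, 0.6],
}
fire_labels = [1, 1, 0, 0, 1, 0]  # 1 = fire observed in the cell

weights = normalize_weights(coeffs)
hazard = weighted_overlay(layers, weights)
auc = roc_auc(hazard, fire_labels)
print(round(auc, 3))
```

Swapping the values in `coeffs` between feature importances and correlation coefficients, while keeping the layers fixed, gives two hazard maps that can be compared on the same footing.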

The most widely used machine learning algorithm for forest fire modeling is LR [43,46,47], which is mainly employed in studies that use GIS data to predict wildfire ignition probability; however, when we used LR, feature importance did not allow us to obtain a precise hazard map. The DT learning algorithm and its ensembles, namely, the DT and RF algorithms, are also applied in forest fire prediction and detection systems. The DT is suitable for this purpose because it is simpler and easier to read than other machine learning algorithms. The accuracies reported in [48] for the DT and its ensembles are about 81.2%. The comparison reported in [41] indicated that the RF model showed higher predictive ability than the Multiple Linear Regression (MLR) model. We also compared the DT and the RF. Although they showed higher accuracy scores, the feature importance obtained from these models did not yield a precise hazard map when compared with the KDE map. This indicates the existence of region-specific requirements for predicting forest fires.

These results provide valuable insights into the potential causes of the spatial distribution of fire events throughout SC. Despite the strong human influence on fire occurrence in SC, physical variables—particularly those related to climatic conditions—should be included in the analysis, since they establish the natural setting that favors or hampers human actions. Fire management strategies should focus on the areas where there is a combination of landcover type 21 and a high density of roads. Preventive measures should be applied in such areas.

5. Conclusions

The issue of forest fires in SC has not received as much research attention as in other regions of the U.S.A. or globally. However, recent studies have highlighted the significant influence of human activities on the occurrence of wildfires in this area, which surpasses the impact of climatic or topographical factors. Research findings suggest that anthropogenic factors exhibit a strong correlation with wildfire occurrences. This underscores the need to prioritize human-induced influences when assessing fire risk in the region. Moreover, studies have shown that traditional DT methods outperform other machine learning or neural network approaches in predicting wildfire occurrences. However, data were lacking to verify whether these predictions corresponded to actual conditions in the fire areas. The correlation test predicted the fire hazard from correlation coefficients rather than from feature importance. When compared with the KDE map, the hazard map based on the correlation coefficients was better than that based on the DT, even though the latter had high accuracy. Appropriate management in the areas where carbon hotspots and fire locations overlap might help mitigate the carbon loss due to fire.

To advance research in this area, there is a need for a well-defined and conceptually robust approach utilizing remote sensing data. Establishing such a framework will lay the groundwork for developing a fully automated fire risk assessment system for SC, U.S.A. These results can serve as valuable tools for proactive fire prevention and planning strategies. Future endeavors should focus on integrating the generated risk maps with seasonal risk indices and monitoring the periodic dynamics of forest fires. This holistic approach will enable the development of an application capable of accurately determining wildfire risk and predicting fire spread during extreme events, thus enhancing the overall preparedness and response efforts.

Author Contributions

S.S. wrote the original draft; P.K. contributed editing and comments. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Extension, Education and USDA Climate program, project award 2023-67022-39531, from the U.S. Department of Agriculture’s National Institute of Food and Agriculture.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The forest fire location data are available from the South Carolina Forestry Commission website, the shapefile of South Carolina from the TIGER/Line website, the topographic data from The National Map website, the landcover data from the Multi-Resolution Land Characteristics (MRLC) Consortium website, and the climate data from the TerraClimate website.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ghorbanzadeh, O.; Blaschke, T.; Gholamnia, K.; Aryal, J. Forest Fire Susceptibility and Risk Mapping Using Social/Infrastructural Vulnerability and Environmental Variables. Fire 2019, 2, 50.
2. Moayedi, H.; Mehrabi, M.; Bui, D.T.; Pradhan, B.; Foong, L.K. Fuzzy-Metaheuristic Ensembles for Spatial Assessment of Forest Fire Susceptibility. J. Environ. Manag. 2020, 260, 109867.
3. MacDicken, K.G. Global Forest Resources Assessment 2015: What, Why and How? For. Ecol. Manag. 2015, 352, 3–8.
4. Sayad, Y.O.; Mousannif, H.; Al Moatassime, H. Predictive Modeling of Wildfires: A New Dataset and Machine Learning Approach. Fire Saf. J. 2019, 104, 130–146.
5. Hantson, S.; Pueyo, S.; Chuvieco, E. Global Fire Size Distribution Is Driven by Human Impact and Climate. Glob. Ecol. Biogeogr. 2015, 24, 77–86.
6. Tymstra, C.; Stocks, B.J.; Cai, X.; Flannigan, M.D. Wildfire Management in Canada: Review, Challenges and Opportunities. Prog. Disaster Sci. 2020, 5, 100045.
7. Jaiswal, R.K.; Mukherjee, S.; Raju, K.D.; Saxena, R. Forest Fire Risk Zone Mapping from Satellite Imagery and GIS. Int. J. Appl. Earth Obs. Geoinf. 2002, 4, 1–10.
8. Pourghasemi, H.R.; Gayen, A.; Edalat, M.; Zarafshar, M.; Tiefenbacher, J.P. Is Multi-Hazard Mapping Effective in Assessing Natural Hazards and Integrated Watershed Management? Geosci. Front. 2020, 11, 1203–1217.
9. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-Scale Geospatial Analysis for Everyone. Remote Sens. Environ. 2017, 202, 18–27.
10. Sidhu, N.; Pebesma, E.J.; Câmara, G. Using Google Earth Engine to Detect Land Cover Change: Singapore as a Use Case. Eur. J. Remote Sens. 2018, 51, 486–500.
11. Liu, X.; Hu, G.; Chen, Y.; Li, X.; Xu, X.; Li, S.; Pei, F.; Wang, S. High-Resolution Multi-Temporal Mapping of Global Urban Land Using Landsat Images Based on the Google Earth Engine Platform. Remote Sens. Environ. 2018, 209, 227–239.
12. Hu, Y.; Hu, Y. Land Cover Changes and Their Driving Mechanisms in Central Asia from 2001 to 2017 Supported by Google Earth Engine. Remote Sens. 2019, 11, 554.
13. Oliphant, A.J.; Thenkabail, P.S.; Teluguntla, P.; Xiong, J.; Gumma, M.K.; Congalton, R.G.; Yadav, K. Mapping Cropland Extent of Southeast and Northeast Asia Using Multi-Year Time-Series Landsat 30-m Data Using a Random Forest Classifier on the Google Earth Engine Cloud. Int. J. Appl. Earth Obs. Geoinf. 2019, 81, 110–124.
14. Piao, Y.; Jeong, S.; Park, S.-J.; Lee, D.K. Analysis of Land Use and Land Cover Change Using Time-Series Data and Random Forest in North Korea. Remote Sens. 2021, 13, 3501.
15. Tamiminia, H.; Salehi, B.; Mahdianpari, M.; Quackenbush, L.; Adeli, S.; Brisco, B. Google Earth Engine for Geo-Big Data Applications: A Meta-Analysis and Systematic Review. ISPRS J. Photogramm. Remote Sens. 2020, 164, 152–170.
16. U.S. Census Bureau. Apportionment Population and Number of Representatives by State: 2020 Census; U.S. Census Bureau: Suitland-Silver Hill, MD, USA, 2020.
17. Costache, R.; Hong, H.; Pham, Q.B. Comparative Assessment of the Flash-Flood Potential within Small Mountain Catchments Using Bivariate Statistics and Their Novel Hybrid Integration with Machine Learning Models. Sci. Total Environ. 2020, 711, 134514.
18. Gokceoglu, C.; Nefeslioglu, H.A.; Sezer, E.; Bozkir, A.S.; Duman, T.Y. Assessment of Landslide Susceptibility by Decision Trees in the Metropolitan Area of Istanbul, Turkey. Math. Probl. Eng. 2010, 73, 1.
19. Breiman, L. Random Forest; Kluwer Academic Publishers: Dordrecht, The Netherlands, 2001.
20. Quinlan, J. C4.5: Programs for Machine Learning; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1993.
21. Hong, H.; Jaafari, A.; Zenner, E.K. Predicting Spatial Patterns of Wildfire Susceptibility in the Huichang County, China: An Integrated Model to Analysis of Landscape Indicators. Ecol. Indic. 2019, 101, 878–891.
22. Bui, D.T.; Le, K.T.T.; Nguyen, V.C.; Le, H.D.; Revhaug, I. Tropical Forest Fire Susceptibility Mapping at the Cat Ba National Park Area, Hai Phong City, Vietnam, Using GIS-Based Kernel Logistic Regression. Remote Sens. 2016, 8, 347.
23. Ireland, G.; Petropoulos, G.P. Exploring the Relationships between Post-Fire Vegetation Regeneration Dynamics, Topography and Burn Severity: A Case Study from the Montane Cordillera Ecozones of Western Canada. Appl. Geogr. 2015, 56, 232–248.
24. Brown, A.; Petropoulos, G.P.; Ferentinos, K.P. Appraisal of the Sentinel-1 & 2 Use in a Large-Scale Wildfire Assessment: A Case Study from Portugal’s Fires of 2017. Appl. Geogr. 2018, 100, 78–89.
25. Gigovic, L.; Pourghasemi, H.R.; Drobnjak, S.; Bai, S. Testing a New Ensemble Model Based on SVM and Random Forest in Forest Fire Susceptibility Assessment and Its Mapping in Serbia’s Tara National Park. Forests 2019, 10, 408.
26. Roy, D.P.; Huang, H.; Boschetti, L.; Giglio, L.; Yan, L.; Zhang, H.H.; Li, Z. Landsat-8 and Sentinel-2 Burned Area Mapping—A Combined Sensor Multi-Temporal Change Detection Approach. Remote Sens. Environ. 2019, 231, 111254.
27. Tien Bui, D.; Van Le, H.; Hoang, N.-D. GIS-Based Spatial Prediction of Tropical Forest Fire Danger Using a New Hybrid Machine Learning Method. Ecol. Inform. 2018, 48, 104–116.
28. Jain, P.; Coogan, S.; Subramanian, S.; Crowley, M.; Taylor, S.W.; Flannigan, M. A Review of Machine Learning Applications in Wildfire Science and Management. Environ. Rev. 2020, 28, 478–505.
29. Tehrany, M.; Jones, S.; Shabani, F.; Martínez-Álvarez, F.; Bui, D. A Novel Ensemble Modelling Approach for the Spatial Prediction of Tropical Forest Fire Susceptibility Using Logitboost Machine Learning Classifier and Multi-Source Geospatial Data. Theor. Appl. Clim. 2019, 137, 637–653.
30. Debeljak, M.; Džeroski, S. Decision Trees in Ecological Modelling. In Modelling Complex Ecological Dynamics; Springer: Berlin/Heidelberg, Germany, 2011.
31. Skapura, D. Building Neural Networks; Addison-Wesley Professional: Boston, MA, USA, 1996.
32. Lippmann, R.P. An Introduction to Computing with Neural Nets; Association for Computing Machinery: New York, NY, USA, 1987; Volume 3.
33. Rosenblatt, F. The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychol. Rev. 1958, 65, 386–408.
34. Basheer, I.A.; Hajmeer, M. Artificial Neural Networks: Fundamentals, Computing, Design, and Application. J. Microbiol. Methods 2000, 43, 3–31.
35. Naderpour, M.; Rizeei, H.; Ramezani, F. Forest Fire Risk Prediction: A Spatial Deep Neural Network-Based Framework. Remote Sens. 2021, 13, 2513.
36. Shabani, S.; Jaafari, A.; Bettinger, P. Spatial Modeling of Forest Stand Susceptibility to Logging Operations. Environ. Impact Assess. Rev. 2021, 89, 106601.
37. Nachappa, T.G.; Ghorbanzadeh, O.; Gholamnia, K.; Blaschke, T. Multi-Hazard Exposure Mapping Using Machine Learning for the State of Salzburg, Austria. Remote Sens. 2020, 12, 2757.
38. Syphard, A.D.; Radeloff, V.C.; Keuler, N.S.; Taylor, R.S.; Hawbaker, T.J.; Stewart, S.I.; Clayton, M.K. Predicting Spatial Patterns of Fire on a Southern California Landscape. Int. J. Wildland Fire 2008, 17, 602–613.
39. Tian, X.; Zhao, F.; Shu, L.; Wang, M. Distribution Characteristics and the Influence Factors of Forest Fires in China. For. Ecol. Manag. 2013, 310, 460–467.
40. Wu, X.; Zhang, G.; Yang, Z.; Tan, S.; Yang, Y.; Pang, Z. Machine Learning for Predicting Forest Fire Occurrence in Changsha: An Innovative Investigation into the Introduction of a Forest Fuel Factor. Remote Sens. 2023, 15, 4208.
41. Oliveira, S.; Oehler, F.; San-Miguel-Ayanz, J.; Camia, A.; Pereira, J.M.C. Modeling Spatial Patterns of Fire Occurrence in Mediterranean Europe Using Multiple Regression and Random Forest. For. Ecol. Manag. 2012, 275, 117–129.
42. Vasilakos, C.; Kalabokidis, K.; Hatzopoulos, J.; Matsinos, I. Identifying Wildland Fire Ignition Factors through Sensitivity Analysis of a Neural Network. Nat. Hazards 2009, 50, 125–143.
43. Chang, Y.; Bu, R.; Chen, H.; Feng, Y.; Li, Y.; Hu, Y.; Wang, Z. Predicting Fire Occurrence Patterns with Logistic Regression in Heilongjiang Province, China. Landsc. Ecol. 2013, 28, 1989–2004.
44. Dlamini, W.M. A Bayesian Belief Network Analysis of Factors Influencing Wildfire Occurrence in Swaziland. Environ. Model. Softw. 2010, 25, 199–208.
45. Zeng, A.-C.; Cai, Q.-J.; Su, Z.-W.; Guo, X.-B.; Jin, Q.-F.; Guo, F.-T. Seasonal Variation and Driving Factors of Forest Fire in Zhejiang Province, China, Based on MODIS Satellite Hot Spots. Ying Yong Sheng Tai Xue Bao 2020, 31, 399–406.
46. Vega-Garcia, C. Applying Neural Network Technology to Human-Caused Wildfire Occurrence Prediction. AI Appl. 1996, 10, 9–18.
47. Padilla, M.; Vega-García, C. On the Comparative Importance of Fire Danger Rating Indices and Their Integration with Spatial and Temporal Variables for Predicting Daily Human-Caused Fire Occurrences in Spain. Int. J. Wildland Fire 2011, 20, 46–58.
48. Stojanova, D.; Panov, P.; Kobler, A.; Taskova, S.; Taskova, K. Learning to Predict Forest Fires with Different Data Mining Techniques. In Proceedings of the Conference on Data Mining and Data Warehouses (SiKDD 2006), Ljubljana, Slovenia, 9 October 2006.

Figure 1. U.S.A. map and SC state map as the study area.

Figure 2. Forest fire prediction block design.

Figure 3. General workflow from feature extraction to the prediction of fire location.

Figure 4. Raster layers of the different factors used for the final fire prediction, created in GEE and imported into ArcGIS Pro for better visualization.

Figure 5. Accuracy scores of different machine learning and neural network models for predicting forest fires.

Figure 6. Correlation coefficient of different factors for fire prediction.

Figure 7. Heat map of the factors used for predicting forest fires.

Figure 8. Final fire prediction model after the weighted overlay of different raster layers at 100 m spatial resolution. The first and second panels from the top show the final fire hazard maps predicted using feature importance from the DT and RF in Earth Engine, while the bottom panel shows the fire hazard map predicted using correlation coefficients.

Figure 9. Carbon hotspot map of SC (left) and KDE map of the fire location points from 2000 to 2023 (right).

Figure 10. The ROC curve obtained from the DT. The accuracy was high for the DT but using it for the prediction of forest fire based on feature importance was not a better option than using the correlation coefficients obtained from the correlation test.

Table 1. Different sites and methods used for data collection.

Name	Website or Google Earth Engine Asset	Data (Shapefile or Raster)
TIGER/Line Shapefile	https://www.census.gov/cgi-bin/geo/shapefiles/index.php (accessed on 23 January 2024)	Roads, Rivers, SC State, and Lakes or Ponds
TerraClimate	ee.ImageCollection("IDAHO_EPSCOR/TERRACLIMATE")	Average Minimum Temperature, Average Maximum Temperature, Precipitation Accumulation, Wind Speed, Soil Moisture, Vapor Pressure, and Runoff
National Land Cover Dataset	ee.ImageCollection("USGS/NLCD_RELEASES/2021_REL/NLCD")	Landcover (National Land Cover Dataset, NLCD)
Sentinel	ee.ImageCollection("COPERNICUS/S2_SR")	Normalized Difference Vegetation Index (NDVI)
NASA/USGS	ee.Image("USGS/SRTMGL1_003")	Digital Elevation Model (DEM), Slope, and Aspect
Fire Data	https://www.scfc.gov (accessed on 23 January 2024)	Fire Points
Carbon Data	https://www.scfc.gov (accessed on 23 January 2024)	Carbon (metric tons) for each county of SC

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Sharma, S.; Khanal, P. Forest Fire Prediction: A Spatial Machine Learning and Neural Network Approach. Fire 2024, 7, 205. https://doi.org/10.3390/fire7060205
