Article

A Machine-Learning Approach to Predicting Daily Wildfire Expansion Rate

Department of Geophysics, Porter School of the Environment and Earth Sciences, Tel Aviv University, Tel Aviv 69978, Israel
*
Author to whom correspondence should be addressed.
Fire 2023, 6(8), 319; https://doi.org/10.3390/fire6080319
Submission received: 14 July 2023 / Revised: 12 August 2023 / Accepted: 14 August 2023 / Published: 16 August 2023

Abstract

Accurate predictions of daily wildfire growth rates are crucial, as extreme wildfires have become increasingly frequent in recent years. The factors which determine wildfire growth rates are complex and include numerous meteorological factors, topography, and fuel loads. In this paper, we have built upon previous studies that have mapped daily burned areas at the individual fire level around the globe. We applied several Machine Learning (ML) algorithms including XGBoost, Random Forest, and Multilayer Perceptron to predict daily fire growth rate based on meteorological factors, topography, and fuel loads. Our best model on the entire dataset obtained an MAE of 1.15 km². When predicting whether a fire’s growth rate will increase or decrease the following day, the ML models obtained an AUC score of 0.90, compared to 0.61 using a logistic regression. We discuss the central factors that determine wildfire growth rate. To the best of our knowledge, this study is the first to perform such analyses on a global dataset.

1. Introduction

Wildfires are among the most dangerous and costly natural hazards. In recent years, the occurrence of extreme wildfires has become more frequent in numerous regions around the globe, most likely due to processes related to climate change (for example, [1,2]). Effective wildfire hazard management and firefighting strategies require accurate wildfire risk assessment, both prior to ignition and as the fire spreads. Various wildfire risk indices have been formed for this purpose. Despite the significant advances made in the field, prediction of wildfire characteristics remains a challenging task as wildfire behavior depends on numerous meteorological factors, topography, and fuel load (e.g., [3,4,5,6]). Wildfire prediction focuses on various wildfire characteristics, one of which is wildfire growth rate. Accurate wildfire growth prediction is crucial for planning the dispatch and safe deployment of firefighters and other resources on fires [7].
Scholars have tackled the challenge of wildfire prediction for decades and have suggested various methods of prediction. Most studies in the field can be divided into two categories: physical models, which simulate the physical processes of wildfires; and empirical studies, which assess wildfire characteristics based on historical data of wildfires. The two approaches have different advantages and disadvantages, according to the purpose of the study. Owing to high scholarly focus, both approaches have undergone substantial advancement in recent years.
Among the empirical studies, the most significant advancement in recent years can probably be attributed to the application of advanced Machine Learning (ML) models that have provided scholars with new tools in addition to the traditional statistical analyses [8]. ML models are becoming increasingly popular and have been applied in many scientific fields. ML can be defined as a data-centric approach in which algorithms improve automatically [8]. Naturally, the performance of ML models depends on the quantity and quality of available data. One of the central advantages of ML models is their relatively high performance when predicting non-linear phenomena. As wildfire behavior is affected by non-linear interactions between various factors, scholars have recognized the potential contribution of ML models to predictions in the field [8] and have demonstrated the high performance of ML models in wildfire prediction both regionally (e.g., [9,10]) and globally (e.g., [11,12]). Fortunately, in recent years, data required for wildfire prediction have become available not only in specific regions but also globally. This includes both datasets on wildfires (e.g., [13,14,15]) and datasets on factors which could be used as predictors, such as temperature or relative humidity (e.g., [16]). Researchers have identified the potential contribution of ML models to various fields of wildfire science, including the prediction of fuel characterization, fire detection, wildfire preparedness and response, fuel treatments, and many more [8]. One such field is the prediction of daily growth rate for existing wildfires, which could be used to assess ongoing wildfire risk, alert populations which are in danger, assist firefighters in planning effective preventive measures and a fire attack strategy, and more.
Scholars have been researching wildfire growth rate for decades. Studies in the field include both physical and empirical models (e.g., [17,18,19]). Physical wildfire spread models require high-resolution measurements of meteorological data, vegetation, and many other factors which influence wildfire behavior. In addition, physical simulators incur high computational costs and are difficult to run on a large scale. Additional research efforts have attempted to empirically assess wildfire spread rate based on historical data. The traditional empirical approach mainly includes linear regression models (e.g., [20,21]). The results of such models are combined with fire weather indices and are widely used (e.g., [22,23]).
In recent years, several studies have successfully applied ML models to predict daily wildfire growth rate on a regional level. For example, [24] applied several ML models to predict whether a fire’s growth rate will become large and obtained an accuracy score of over 75%. Their data included almost 3000 wildfires in southwestern United States over a six-year period. The authors of [25] applied deep supervised ML models to predict which areas around a burning fire are likely to burn in the near future and outperformed FARSITE, a well-known physical fire modeling tool [17]. Their training and testing data were limited to the Rocky Mountain region of the United States, but they believe similar models can be trained and used in additional regions. The authors of [26] developed several ML models to predict wildfire rate of spread in Australian grassfires. Their data included 283 records from both actual wildfires and experiments. Some studies, such as [27], have trained ML models to approximate the results of wildfire simulations. The authors of [28] have developed a model using ML-based cellular automata and validated it using historical fire data from the Lushan area of Liangshan Prefecture, Sichuan Province.
The goal of the current study is to apply advanced ML models on a global dataset of wildfires in order to predict wildfire growth rate based on meteorological factors, topography, and fuel loads. Although previous studies have already predicted wildfire growth rate (among other wildfire characteristics) using ML models, this is the first study (to the best of our knowledge) implementing a global dataset analysis. New global wildfire datasets have recently become available from satellite observations (e.g., [13,14,15]), providing an opportunity to study wildfire behavior on a global scale. Training the models on a global dataset has several advantages. First, global datasets are composed of an enormous number of observations which are necessary for a proper training of ML models. Second, models trained on a global dataset can provide prediction capabilities and scientific insights which are not region dependent. Finally, the dataset we chose [14] not only provides burned areas at high spatial (250 m) and temporal (daily) resolution but also clusters burned areas to wildfires (allowing us to perform the analysis at the individual fire level as opposed to the regional burned area) and separates between fires that occurred in proximate regions.
The paper is organized as follows. We begin by providing a detailed summary of the data, followed by a description of the ML methods we apply in the paper. We then present the primary results of the study and the prediction accuracies of the different models, both regression models which predict the size of daily burned areas, and classification models which predict whether the daily burned area of a fire will increase or decrease in the following day. We analyze which factors are the most influential and have the highest impact on wildfire growth rate. In the final section, we discuss the contribution, implications and limitations of the study, and propose several directions for future research.

2. Research Design

2.1. Data

The target (dependent) variable in the study is the daily burned area in a specific fire. To measure the target variable, we processed the global wildfire dataset published by [14]. The dataset included Shapefiles describing the daily-burned polygons at the individual fire level with global coverage. The polygons were available in a 250 m resolution. For each fire, we calculated the total daily burned area for each day of the fire. The target variable in each observation of our dataset thus consisted of a daily burned area of a specific fire. We covered all global wildfires of 2016, resulting in a total of 2,409,079 daily observations of 1,024,926 different wildfires. Figure 1 presents a histogram of the daily burned area in the dataset.
For each observation, we included several variables describing the behavior of the specific wildfire in the days prior to the observation. These include the total area burned since the beginning of the fire, the mean daily area burned until the observation, and the area burned in the previous day as well as the mean area burned in the three days prior to the observation. While these variables could substantially improve the predictive performance of the model, they are only useful for predictions of daily burned area in real time, after the fire initiates. In contrast, they are not available for wildfire risk assessment before the ignition of a wildfire. We therefore repeated the analysis with and without these variables, so that the model’s performance would be evaluated according to its purpose. When removing observations that do not include all of the aforementioned data (i.e., observations from the first three days of the fire), the number of daily observations was reduced to 821,376. We also performed an analysis that only includes the first day of the fire, which included 584,877 observations.
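The fire-behavior features described above can be illustrated with a minimal sketch. It assumes that each fire is available as a plain list of daily burned areas in km²; the function name `lag_features` and the dictionary keys are hypothetical and are not taken from the study's code. Days without three full previous days are skipped, mirroring the removal of observations from the first three days of a fire.

```python
from statistics import mean


def lag_features(daily_areas):
    """Build one feature row per fire day from a fire's daily burned areas (km^2).

    Only days with at least three previous days are returned, so every row
    has a complete set of lagged features.
    """
    rows = []
    for day in range(3, len(daily_areas)):
        history = daily_areas[:day]  # strictly before the target day
        rows.append({
            "day_index": day,                       # zero-based day of the fire
            "total_burned_so_far": sum(history),    # area burned since ignition
            "mean_daily_so_far": mean(history),     # mean daily area until now
            "prev_day": history[-1],                # area burned the previous day
            "mean_prev_3_days": mean(history[-3:]), # mean of the three previous days
            "target": daily_areas[day],             # value to be predicted
        })
    return rows


# Toy fire lasting five days (hypothetical values).
rows = lag_features([1.0, 2.0, 3.0, 6.0, 2.0])
```

Because every feature is computed from `history` only, the target day never leaks into its own predictors.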
Additional features (independent variables) included meteorological factors, topography, fuel characteristics, and regional fire history. While the wildfire data were provided in a relatively high resolution (250 m) some of the independent variables were only available at coarser resolutions. This means that the presented models provide a lower bound on the predictive performance and can theoretically be improved by higher resolution independent variables (although this is not necessarily desired, as this would also mean users would have to access such high-resolution data to apply our model). Most meteorological data were taken from the ERA5 hourly reanalysis dataset [16]. We used 2 m temperature, precipitation, relative humidity and 10 m wind velocity and direction. For precipitation and relative humidity, for each observation we included three features: present value, value in the previous month, and mean value in the previous year. We included these lagged variables as previous studies have shown that past meteorological factors are correlated with wildfire risk (e.g., [7]). We also included a variable for incoming short-wave solar radiation, obtained as a daily mean in 0.25° resolution regions [29].
For each observation, we included various fire weather indices. These indices included three groups: (1) Canadian Forest Service Fire Weather Index Rating System; (2) Australian McArthur Mark 5 Rating System; and (3) U.S. Forest Service National Fire-Danger Rating System. The variables in each group included (1a) fire weather index; (1b) build up index; (1c) danger index; (1d) drought code; (1e) duff moisture code; (1f) initial fire spread index; (1g) fine fuel moisture code; (1h) fire daily severity rating; (2a) Keetch–Byram drought index; (2b) fire danger index; (3a) spread component; (3b) energy release component; (3c) burning index; and (3d) ignition component. All data were available in 0.25° resolution and were obtained from the Copernicus Climate Change Service [30].
Topography can affect the growth rate of a wildfire, and consequently the area burned by the fire (e.g., [31]). We included the mean slope in a 0.1° × 0.1° region around the center of the fire, based on the dataset published by [32].
Previous studies have shown that population density has a substantial effect on the area burned by wildfires. We included population density based on the dataset of the Center for International Earth Science Information Network [33]. While the original dataset is provided with a resolution of ~1 km, we calculated the mean population density in a 0.25° region whose center is located at the center of the fire.
Leaf area index (LAI) is a variable that describes the leaf material in a given location. LAI is a dimensionless variable that varies between 0 and approximately 10. LAI data at a 1/112° (~1 km) resolution were taken from [34]. The LAI data were separated into low and high vegetation; we also included a variable of their sum. In addition, we included the normalized difference vegetation index (NDVI), a dimensionless parameter which is commonly used to estimate the density of live green vegetation. NDVI is calculated as the difference between near-infrared (NIR) and red reflectance, divided by their sum [35]: $\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}$. We included the NDVI value of the 0.25° region of the wildfire in the month before the fire, obtained from the NASA Earth Observations website [35].
Regional fire history was obtained from the Copernicus dataset ([13]), which maps monthly burned areas in a 0.25° resolution. For each observation we included two variables: one describing the total burned area in the region in the year prior to that observation, and one describing the mean monthly burned area in the region from January 2003 up to the month prior to the observation. Both variables excluded the month of the observation to prevent data leakage.
As a categorical variable, month of the year was transformed into 1-of-C dummy encoding [36]. Numerical features were standardized using the formula $Z_i = \frac{x_i - \bar{x}}{\delta_x}$, where $x_i$ are the original values, $\bar{x}$ is the mean of the original values, and $\delta_x$ is their standard deviation. The features and their sources are summarized in Table 1.
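The two preprocessing steps can be sketched as follows. This is an illustrative example rather than the study's code: the helper names are hypothetical, and the use of the population standard deviation (the convention of Scikit-learn's `StandardScaler`) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores (x_i - mean) / std for a list of numbers.

    Assumption: population standard deviation (ddof = 0).
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot_month(month):
    """1-of-C encoding of the month of the year (C = 12, months 1..12)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return [1 if m == month else 0 for m in range(1, 13)]


# Toy feature column with mean 5 and population standard deviation 2.
z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
march = one_hot_month(3)
```

In practice the mean and standard deviation would typically be estimated on the training data only and then reused for the test data, so that no test information enters the scaling.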
Scatterplots of the target variable (the daily burned area) and each of the features are presented in Figure A1 in Appendix A. Several wildfire indices are strongly correlated with daily burned areas (though not necessarily by a linear correlation); for example, KBDI, FFMC, and danger rating. Widespread wildfires appear to be rare with RH > 60% or temperature < 280 K and are very rare with any positive daily precipitation. As expected, highly populated areas are less prone to large wildfires. The latitude and longitude scatterplots reflect the familiar (e.g., [37]) global wildfire distribution, such as the peaks in the African longitudes and latitudes. The mean areas burned in the previous 1 and 3 days are positively correlated with the predicted burned area.

2.2. Methodology

We developed regression models which predict daily burned areas in a given fire. As stated in the previous section, we repeated the analysis either with or without data on the previous days of the fire. Since we included a variable of the mean areas burned in the previous three days of the fire, observations from the first three days of the fire lacked these data and were therefore discarded in the analysis which included previous fire data. For this reason, the subsets of the data were different in the two analyses. This prevents a fair comparison that could clearly present the contribution of the previous fire data. To resolve this issue, we could discard observations from the first three days of the fire in both analyses; however, this would limit the validity of our models to advanced (4+) days of the fire. Since predicting the first days of the fire is crucial in our opinion, we chose a different approach: we present the results of the models when limiting the data to advanced (4+) days of the fire either with or without previous fire days in order to provide a fair comparison, but we also provide the results of the model without previous fire days when applied on the entire dataset. The two approaches enable an evaluation of the models either for the purpose of quantifying the contribution of data from previous fire days, or for the purpose of prediction over the entire dataset. In addition, we present a model which only includes the first day of the fire. Such a model can quantify the potential risk that a fire would spread in its early stage if a fire were to ignite.
Our original data were extremely positively skewed, with skewness values of over 500 (Table 2). Developing models with such skewed data is problematic and could potentially lead to biased predictions. We addressed this issue using two different approaches, both of which are commonly used in the literature. The first approach is to apply a logarithmic transformation of the data. The logarithmic transformation reduces the skewness of the data and allows a more balanced prediction. The second approach is to use outlier detection and to remove particularly large observations. The advantage of the latter is that it enables prediction on the original scale of the data; however, its disadvantage is that it does not provide predictions of large values. We applied outlier removal using the median absolute deviation (MAD) value and removed observations whose values were not in the interval of median ± 3·MAD [38]. We present the histograms of daily burned areas in the three analyses along with their respective MAD boundaries in Figure 2. The skewness values before and after applying these techniques are presented in Table 2.
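Both approaches to skewness can be sketched in a few lines. The example below is illustrative only: the function names are hypothetical, the MAD is used without the consistency constant (1.4826) that some authors apply, and the natural logarithm is assumed for the transformation, as the text does not specify these details.

```python
import math
from statistics import median


def mad_bounds(values, k=3.0):
    """Return (lower, upper) = median -/+ k * MAD.

    Assumption: unscaled MAD, i.e. the median of absolute deviations
    from the median, without a normal-consistency constant.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med - k * mad, med + k * mad


def remove_outliers(values, k=3.0):
    """Keep only the values inside the median +/- k * MAD interval."""
    lower, upper = mad_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


def log_transform(values):
    """Natural-log transform; valid because burned areas are strictly positive."""
    return [math.log(v) for v in values]


# Toy daily burned areas in km^2 with one extreme day (hypothetical values).
areas = [0.5, 1.0, 1.5, 2.0, 2.5, 400.0]
bounds = mad_bounds(areas)
kept = remove_outliers(areas)
logged = log_transform(areas)
```

The sketch makes the trade-off visible: outlier removal drops the 400 km² day entirely, whereas the log transformation keeps it but compresses its magnitude.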
We evaluated the accuracy of these models by measuring the mean absolute error (MAE), as is common in many ML papers (e.g., [9,39,40]). In the analysis in which the data are transformed to a logarithmic form, we present both the MAE of the original (exponentiated) data and the MAE of the transformed data. The MAE of the transformed data corresponds, to a first-order approximation, to the mean absolute percentage error (MAPE) divided by 100, where MAPE is defined as $\mathrm{MAPE} = 100 \cdot \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - x_i}{x_i} \right|$, where $x_i$ denotes the observation, $y_i$ denotes the predicted value, and $n$ denotes the number of observations [41]. The MAPE metric is less sensitive to errors in extremely large values (outliers) compared to metrics such as the MAE, which are less suitable when the data include observations of different scales [42]. MAPE cannot be applied when the target variable can reach zero, as this would produce an infinite error term; in the current study we did not include unburned observations, but rather only predicted the daily expansion rate of existing fires. Therefore, the use of MAPE is valid in this sense. In addition, we also present the mean squared errors (MSE) and root mean squared errors (RMSE).
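The metrics named above can be written out as a small sketch using only the standard library. The function names are hypothetical. The last function computes the MAE in natural-log space, which is close to MAPE/100 only when relative errors are small; the toy values below show that the two quantities are similar but not identical.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def mape(obs, pred):
    """Mean absolute percentage error; observations must be non-zero."""
    return 100.0 * sum(abs(p - o) / abs(o) for o, p in zip(obs, pred)) / len(obs)


def mse(obs, pred):
    """Mean squared error."""
    return sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(mse(obs, pred))


def log_mae(obs, pred):
    """MAE computed on natural-log transformed values (strictly positive inputs)."""
    return sum(abs(math.log(p) - math.log(o)) for o, p in zip(obs, pred)) / len(obs)


# Toy observed and predicted daily burned areas in km^2 (hypothetical values).
observed = [1.0, 2.0, 4.0]
predicted = [1.5, 2.0, 3.0]

scores = {
    "MAE": mae(observed, predicted),
    "MAPE": mape(observed, predicted),
    "MSE": mse(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "log-MAE": log_mae(observed, predicted),
}
```

For these toy values the MAPE is 25%, while the log-space MAE is about 0.23.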
An additional section is dedicated to predicting whether the expansion rate of a fire will increase or decrease in the following day. This becomes a classification problem in which the target variable is 1 if the burned area is larger than that of the previous day, or 0 if it is not. For this specific analysis, we added features which describe the difference between the current meteorological factors and fire indices and their values in the previous day. Different studies use different metrics for classification problems, the two most common being the prediction accuracy metric, which is the percentage of correctly classified observations, and the area under the curve (AUC) metric, which is the area under the relative operating characteristic (ROC) curve [43]. We used the AUC metric because it is considered preferable to accuracy in binary classifications [44]. A random classification would produce an AUC score of 0.5; AUC scores between 0.5 and 0.7 are considered poor prediction accuracies; scores between 0.7 and 0.9 are considered moderate; and AUC scores of 0.9 or above are considered excellent prediction accuracies [10,45].
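The classification target and the AUC metric can be illustrated with a short sketch. The function names are hypothetical. The AUC is computed here through its pairwise interpretation, i.e., the probability that a randomly chosen positive observation receives a higher score than a randomly chosen negative one (ties counted as one half), which is equivalent to the area under the ROC curve.

```python
def increase_labels(daily_areas):
    """Label each day from the second onward: 1 if the burned area is
    larger than that of the previous day, otherwise 0."""
    return [
        1 if today > yesterday else 0
        for yesterday, today in zip(daily_areas, daily_areas[1:])
    ]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy fire (km^2 per day) and hypothetical model scores for days 2..5.
labels = increase_labels([1.0, 3.0, 2.0, 2.0, 5.0])
model_scores = [0.9, 0.4, 0.6, 0.5]
toy_auc = auc(labels, model_scores)
```

The pairwise form is quadratic in the number of observations and is meant only for illustration; library implementations use sorting and scale far better.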
We applied four different classification and regression models: (i) Random Forest (RF) [46]; (ii) Extreme Gradient Boosting (XGBoost) [47]; (iii) Multilayer Perceptron (MLP), a form of Neural Network [48]; and (iv) linear/logistic regression [49]. In addition, when previous fire data were available, we present a naïve prediction which takes the burned area in the previous day as the predicted area. We performed a train-test split where 25% of the observations are used for testing. We also used a 5-fold cross-validation for the training data. The analyses were performed using Python’s Scikit-learn package [50], apart from the Extreme Gradient Boosting model which was based on the XGBoost package [47].
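The evaluation protocol (a 25% hold-out test set and 5-fold cross-validation on the remaining training data) can be sketched with index bookkeeping alone. This is a standard-library illustration with hypothetical function names, not the Scikit-learn calls used in the study; it assumes the split is made at the level of individual observations, as the wording above suggests, and uses an arbitrary random seed.

```python
import random


def train_test_split_indices(n, test_fraction=0.25, seed=0):
    """Shuffle the indices 0..n-1 and hold out a fraction for testing."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


# 100 hypothetical observations: 75 for training, 25 for testing.
train_idx, test_idx = train_test_split_indices(100)
splits = list(k_fold(train_idx, k=5))
```

Each training observation appears in exactly one validation fold, and the held-out test indices are never touched during cross-validation.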
The RF, XGBoost, and MLP models were tuned over several hyperparameters. We performed hyperparameter optimization for all three models to achieve optimal predictions. The following hyperparameters were examined for the RF and XGBoost: the number of estimators (‘n_estimators’) between 100 and 300 and the maximal tree depth (‘max_depth’) between 8 and 12. We used a minimal samples leaf (‘min_samples_leaf’) of 1, a learning rate (‘learning_rate’ in XGBoost) of 0.3, and did not limit the maximal number of features (‘max_features’ in RF). The accuracies presented in the Results section are of the optimal hyperparameters. The following hyperparameters were examined for the MLP model: the number and size of hidden layers (‘hidden_layer_sizes’), i.e., 1–3 hidden layers of 50–150 neurons, the Adam optimization algorithm [51], a learning rate of 0.001, and up to 200 epochs.
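The search space described above can be written down as explicit parameter grids. The ranges and fixed values come from the text, but the specific grid points (100/200/300 estimators, depths 8/10/12, layer widths 50/100/150, equal width in every hidden layer) are assumptions made for illustration, since the exact values examined are not listed. The keys follow the parameter names of Scikit-learn and XGBoost's Scikit-learn interface.

```python
from itertools import product

# Assumed grid points inside the ranges stated in the text.
N_ESTIMATORS = (100, 200, 300)
MAX_DEPTH = (8, 10, 12)


def tree_grid(fixed):
    """Combine every (n_estimators, max_depth) pair with fixed settings."""
    return [
        dict(n_estimators=n, max_depth=d, **fixed)
        for n, d in product(N_ESTIMATORS, MAX_DEPTH)
    ]


# Random Forest: leaf size of 1 and no limit on the number of features.
rf_grid = tree_grid({"min_samples_leaf": 1, "max_features": None})

# XGBoost: learning rate of 0.3.
xgb_grid = tree_grid({"learning_rate": 0.3})

# MLP: 1-3 hidden layers of 50-150 neurons, Adam, learning rate 0.001,
# up to 200 epochs. Equal layer widths are an assumption.
mlp_grid = [
    {
        "hidden_layer_sizes": (width,) * depth,
        "solver": "adam",
        "learning_rate_init": 0.001,
        "max_iter": 200,
    }
    for depth, width in product((1, 2, 3), (50, 100, 150))
]
```

Each dictionary could be unpacked into the corresponding estimator's constructor and scored with cross-validation; the combination with the lowest validation error would then be retained.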

3. Results

3.1. Regression on Entire Dataset: Log Transformation

We begin by presenting the regression analysis for the entire dataset (without performing outlier detection), using the log transformation. The results are summarized in Table 3. For each subset of the data, we present the MAE scores for each of the models, both in the transformed and in the original forms of the data. Both the right and left sides of the upper part of the table are based on the same subset of the data, which excludes observations from the first three days of each fire. The left side makes use of previous fire data, while the model described in the right side has no access to these data. In both cases, the XGBoost model outperformed all other models. When previous wildfire data were available, the XGBoost model obtained an MAE score of 1.76 km² compared to slightly inferior scores in the RF and MLP models, and a substantially inferior score (7.93 km²) in the linear model. Compared to the naïve prediction of the area burned in the previous day, our model obtained an improvement of 30% in terms of MAE (1.76 km² versus 2.54 km²). In terms of MAPE, we obtained a score of 77% compared to 104% in the naïve prediction. Without information on the previous days of the fire, the MAE worsened to 2.12 km² (upper right side of the table).
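The comparison with the naïve prediction can be made concrete with a short sketch. The function names are hypothetical; the first two functions show how a persistence baseline is scored on a toy series, and the last one reproduces the relative improvement from the MAE values quoted above (1.76 km² versus 2.54 km²).

```python
def persistence_predictions(daily_areas):
    """Naive baseline: predict each day's burned area as the previous day's."""
    return daily_areas[:-1]


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def relative_improvement(baseline_error, model_error):
    """Percentage reduction of the error relative to the baseline."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Toy fire in km^2 per day (hypothetical values).
fire = [2.0, 3.0, 3.0, 1.0]
naive_mae = mae(fire[1:], persistence_predictions(fire))

# MAE values quoted in the text for the naive prediction and XGBoost.
improvement = relative_improvement(2.54, 1.76)
```

With the quoted values the reduction is roughly 30.7%, consistent with the 30% stated in the text.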
The bottom left side of the table includes the entire dataset including observations from the first three days of the fire, but does not utilize previous fire data as a predictive feature. The bottom right side presents an analysis which only includes the first day of the fire. The MAE scores in these two analyses are substantially better since these data include a large number of small observations from the first three days of the fires. In the analysis of the entire dataset without previous fire data, the XGBoost model obtained an MAE score of 1.15 km² and a relative error which is similar to the previous analysis. In the final analysis, which only includes the first fire day, the MAE score improved to 0.33 km² and the relative error improved to 52%. In the last two analyses, the performance of the XGBoost model was only slightly better than the other models, including the linear regression. Figure 3 and Figure 4 provide further visualizations of the models’ performances.
Figure 5 and Figure 6 present the SHAP values for our best model, XGBoost, with or without previous fire behavior data available, respectively. Figure 7 presents SHAP values for the analysis which only includes the first day of the fires. For each feature, SHAP values were calculated by comparing the predictions without the feature with the predictions including the feature [52]. Each dot in each feature represents the feature’s effect on the prediction in a single observation. For clarity of viewing the graph, we randomly chose 5000 observations from the entire dataset.
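The idea of comparing predictions with and without a feature can be illustrated by computing exact Shapley values for a toy model. This sketch is not the SHAP library and not the procedure used for the figures: it assumes that an "absent" feature is replaced by a fixed baseline value, it enumerates every feature subset (feasible only for a handful of features), and the toy model and its two inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each feature for one observation x.

    Features outside a coalition take their baseline value (an assumption;
    SHAP implementations typically average over background data instead).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight of coalitions with `size` members excluding i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                # Marginal contribution of feature i to this coalition.
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical model with an interaction between its two inputs."""
    first, second = z
    return 2.0 * first - second + first * second


x = [3.0, 2.0]
baseline = [1.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that makes plots of per-observation SHAP values interpretable.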
The features describing fire behavior in previous days were substantially more important than all other features including meteorological factors and fire indices. The most influential factor in this model is the mean area burned in the last three days, followed by the variable of area burned in the previous day. This suggests relatively low variability in daily burned areas. However, the mean area burned since the beginning of the fire is found to be less important, indicating some variation in daily burned area over longer time periods of the fire. The next most important feature in both models is the mean monthly burned area in the region since 2003. The long-term history of burned areas is positively correlated with the target variable, indicating which areas are more prone to burning. In contrast, the total area burned in the year before the fire is found to be significantly less important; it does not even appear in Figure 5 and is rated relatively low in Figure 6 and Figure 7. We believe that this result has two explanations: the first is that medium-term (one year) history is less statistically significant as it includes less data; the second is that in the one-year timescale, burned regions are less prone to burning again because the remaining fuel loads are reduced. This may explain why the one-year burned area variable is even negatively correlated to the target variable in the first-day analysis (Figure 7).
The remaining influential factors are similar, though not identical, in the three analyses. As expected, location variables (latitude and longitude) were two of the most influential factors in both models. Relative humidity was found to be important and negatively correlated with fire growth rate as expected. In all three models, various fire indices had a significant effect on model outcomes; however, the influential fire indices as well as their order were significantly different between the models. One possible explanation of this finding is that some of these indices are redundant and include similar information, so that the most important indices are sensitive to the specific training data.
Topographic slope appeared in Figure 6 and Figure 7, though its effect in both was not found to be high. In both cases, slope appears positively correlated with wildfire growth rate, but there are also many contradictory observations. We note that slope also had an unexpected negative Pearson correlation coefficient (Figure A1). As for fuel loads, NDVI as well as LAI were only influential in the two latter models. It is possible that fuel loads did not appear directly in Figure 5 because the mean burned areas in previous days already reflect the fuel loads in the region (as opposed to variables with higher daily variation such as relative humidity).

3.2. Regression on Entire Dataset: Outlier Detection

In Table 4 we present the prediction accuracies after performing outlier detection based on the MAD method. As expected, the absolute accuracies are substantially better than those of the full model since the variance of the data is smaller. In all of the four analyses the performances of the ML models are only slightly better than the performance of the linear regression. Unlike the analysis on the full dataset (Table 3), including information on the previous days of the fire did not improve the prediction accuracies. Even so, the analysis of feature importance provided overall similar results to those of the previous section (presented in Appendix A).

3.3. Classification: Increase or Decrease in Daily Burned Area

In this section, we predict whether a wildfire’s growth rate will increase or decrease the following day. In other words, we predict whether the burned area at the day of the observation is larger than that of the previous day. The results are summarized in Figure 8. All three ML models obtained similar AUC scores of 0.89–0.90 (on a scale of 0 to 1, 0.5 being a random guess). The logistic regression was significantly less accurate with an AUC score of 0.61, only slightly higher than a flip of a coin.
A feature importance analysis based on SHAP values is presented in Figure 9. Here, as well, the most important features are those describing the fire behavior in the previous days. Unlike the regression analysis, the variable of the area burned in the previous day and the variable of the mean areas burned in the previous three days have opposite effects on the target value. We interpret this as the difference between these two variables; if the area burned in the previous day is larger than the mean in the last three days, the fire is decaying (and vice versa). Some additional influential variables concern regional characteristics: mean burned since 2003, latitude, and longitude. As for meteorological factors and fire indices, the most important features concern the relative change of these variables compared to the previous day. However, some of these features are also significant in their absolute form, such as RH and temperature.

4. Discussion

In this study, we predicted daily wildfire behavior using a global wildfire dataset that provides daily wildfire data at the individual fire level [14]. Using several different data sources, we created a novel dataset which describes the meteorological factors, fuel loads and topography for each observation. We applied several state-of-the-art Machine Learning models to predict the daily wildfire growth rate for each fire, either with or without information on the wildfire behavior in previous days. The study joins several previous papers which have applied ML models to wildfire growth predictions (e.g., [24,25,27]). In contrast to physics-based systems which require various and accurate data for predictions (e.g., [17]), ML-based models leverage large datasets to provide accurate data-driven predictions. To the best of our knowledge this is the first paper to apply ML models to wildfire growth rate predictions using a global dataset, making it applicable throughout the globe.
Our best model obtained an MAE score of 1.15 km² for the entire dataset. When limiting the data to advanced (4+) days of the fire, we compared the prediction accuracies either with or without previous wildfire data. In this subset of the data our best model obtained an MAE score of 1.76 km² when using information on the previous days of the fire and 2.12 km² when these variables were not available. A naïve prediction that the burned area is identical to that of the previous day provided a significantly inferior MAE score of 2.54 km²: a 30% difference from the ML model on the same dataset. This demonstrates that information on previous days of the fire is important for future predictions, but is not sufficient by itself and should therefore be combined with meteorological and additional factors for improved performance.
The importance of previous wildfire data was also demonstrated in the feature importance analysis. When this information was available, it had a substantial effect on the model. Longer-term historical data of wildfires in the region were also extremely important and had a strong positive correlation with wildfire growth rate. In contrast to the data on previous wildfire behavior, the long-term historical data are available before a fire initiates, and so they can be used to identify fire-susceptible regions. The location-dependent effect was also reflected by the latitude and longitude variables. As for the meteorological factors, we found that RH had the most important effect on daily wildfire growth rate, in accordance with previous research. The effect of some factors was surprisingly small, such as the wind speed, which did not appear among the 20 most important features of any model, and the topographic slope, which only had a marginal effect. However, one possible explanation is that the wind speed is reflected in some of the fire indices and might be less significant by itself. The absence of several wildfire indices from the lists of most important features is also unexpected but could potentially be explained by redundancy as well.
An additional section was dedicated to predicting whether a wildfire’s growth rate will increase or decrease the following day. While the ML models also outperformed traditional models in the previous analyses, their advantage was especially pronounced for this task. The best ML model obtained an AUC score of 0.90, compared to only 0.61 using a logistic regression (Figure 8). To the best of our knowledge, these AUC scores are higher than those obtained in previous studies (e.g., [24]).
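Since the scores compared here are AUC values (see Figure 8), it may help to see how AUC differs from plain classification accuracy. The following is a minimal sketch under stated assumptions: the labels and scores are hypothetical, the function names are illustrative, and AUC is computed directly from its pairwise-ranking definition rather than with the library routines used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as one half. Runs in O(n_pos * n_neg), which is fine
    for a small illustration.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold=0.5):
    """Share of correct predictions after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


# Hypothetical data: 1 = growth rate increases the following day.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.45, 0.8, 0.2, 0.6]

auc = roc_auc(labels, scores)   # ranking quality, threshold-free
acc = accuracy(labels, scores)  # depends on the chosen threshold
```

In this toy example the two metrics give different values for the same predictions, which is why an AUC of 0.90 should not be read as 90% of days being classified correctly.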
The contribution of this study can be summarized in terms of data, methodology, and results. We have built on previous studies and utilized large and detailed wildfire datasets to develop ML-based models for wildfire growth prediction [14]. The performance of the trained models is substantially higher than that of linear or logistic regressions, as we have demonstrated using various models and metrics. As the models were developed using a global dataset, their results can be applied in any location and can potentially improve current fire weather indices, which have mostly been developed using linear regressions. In addition, these models provide a better understanding of the most influential factors in wildfire growth rate.
One limitation of the current study is that it does not predict where the fire would spread, but rather its total spread rate. Predictions of spread direction (such as in [25]) are also important for firefighting purposes. An additional possible limitation concerns the nature of the global dataset used in this study; although using a global dataset for model training is helpful for generalization purposes, it might overlook regional characteristics and underperform compared to models which are focused on a specific region. A focus on specific regions and vegetation types also makes it possible to evaluate the ML models against experimental fires (e.g., [26] for Australian grassfires).
Future studies can build upon the results of this research in several directions. At the practical level, we propose to apply ML-based fire indices that could assess the expected expansion rate of a wildfire should one occur. In addition, ML-based indices could be used to estimate the progress rate of ongoing wildfires. This information is crucial for decision makers during wildfire crisis management and could assist in decisions such as evacuation of the population or spatiotemporal considerations of fire attack. An additional direction of research could focus on predicting which areas are expected to be burned, in contrast to the current study, which predicted how much area will be burned. Previous studies have applied deep learning algorithms for such tasks (e.g., [25]), but not on a global scale. Finally, the results of this study provide a better understanding and quantification of the relations between meteorological conditions and wildfire behavior; applying our models to climate change projections could provide insights regarding the expected nature of wildfires in a changing climate.

Author Contributions

Conceptualization, A.S. and E.H.; methodology, A.S. and E.H.; software, A.S.; validation, E.H.; data curation, A.S.; writing—original draft preparation, A.S.; writing—review and editing, E.H.; visualization, A.S.; supervision, E.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created in this study. Data sharing is not applicable to this article.

Acknowledgments

We thank the three anonymous reviewers and the editorial team for their careful reading of our manuscript and their many insightful comments and suggestions.

Conflicts of Interest

The authors declare that they have no conflict of interest.

Appendix A

Figure A1 presents scatterplots of the target variable (the daily burned area) and each of the features used in the model.
Figure A1. Scatterplots of the features and the target variable. Descriptive statistics for all features used in the model. The y-axis represents daily burned areas in km2 (log scale) and the x-axes represent the values of the different features. The Pearson correlation coefficient is presented for each subplot.
Figure A2, Figure A3 and Figure A4 are similar to Figure 5, Figure 6 and Figure 7, but are for the models that were trained on the data after outlier detection based on the MAD method. The results differed only slightly from the original analysis.
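For readers unfamiliar with MAD-based outlier detection, the idea can be sketched as follows. This is an illustration only: the consistency constant 1.4826 and the threshold of 2.5 follow the common recommendation in [38] and are assumptions here, since the threshold used in this study is not restated in this section; the function names and sample values are hypothetical.

```python
from statistics import median


def mad_bounds(values, threshold=2.5, scale=1.4826):
    """Lower and upper outlier boundaries around the median.

    MAD is the median absolute deviation from the median, multiplied by
    a consistency constant (assumed 1.4826, appropriate for normal data).
    """
    center = median(values)
    mad = scale * median([abs(v - center) for v in values])
    return center - threshold * mad, center + threshold * mad


def remove_outliers(values, threshold=2.5):
    """Keep only the observations that fall inside the MAD boundaries."""
    lower, upper = mad_bounds(values, threshold)
    return [v for v in values if lower <= v <= upper]


# Hypothetical daily burned areas (km2) with one extreme day.
areas = [0.2, 0.3, 0.25, 0.4, 0.35, 12.0]

lower, upper = mad_bounds(areas)
kept = remove_outliers(areas)
```

Unlike boundaries based on the mean and standard deviation, the median-based boundaries are barely affected by the extreme value itself, which is the motivation given in [38].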
Figure A2. Feature importance: previous fire behavior available, MAD outlier detection. Feature importance plot for the highest performing model, XGBoost. Variables describing the behavior of the fire in the days before the observations are available in this model. Outliers were removed based on the MAD method.
Figure A3. Feature importance: previous fire behavior not available, MAD outlier detection. Feature importance plot for the highest performing model, XGBoost. Variables describing the behavior of the fire in the days before the observations are not available in this model. Outliers were removed based on the MAD method.
Figure A4. Feature importance: first day of fire, MAD outlier detection. Feature importance plot for the highest performing model, XGBoost. Since only the first day of the fire is included in this analysis, no previous fire data are available. Outliers were removed based on the MAD method.

References

  1. Westerling, A.L.; Bryant, B.P.; Preisler, H.K.; Holmes, T.P.; Hidalgo, H.G.; Das, T.; Shrestha, S.R. Climate change and growth scenarios for California wildfire. Clim. Change 2011, 109, 445–463.
  2. Abatzoglou, J.T.; Williams, A.P. Impact of anthropogenic climate change on wildfire across western US forests. Proc. Natl. Acad. Sci. USA 2016, 113, 11770–11775.
  3. Flannigan, M.D.; Harrington, J.B. A study of the relation of meteorological variables to monthly provincial area burned by wildfire in Canada (1953–80). J. Appl. Meteorol. Climatol. 1988, 27, 441–452.
  4. Slocum, M.G.; Beckage, B.; Platt, W.J.; Orzell, S.L.; Taylor, W. Effect of climate on wildfire size: A cross-scale analysis. Ecosystems 2010, 13, 828–840.
  5. Vlassova, L.; Pérez-Cabello, F.; Mimbrero, M.R.; Llovería, R.M.; García-Martín, A. Analysis of the relationship between land surface temperature and wildfire severity in a series of landsat images. Remote Sens. 2014, 6, 6136–6162.
  6. Joseph, M.B.; Rossi, M.W.; Mietkiewicz, N.P.; Mahood, A.L.; Cattau, M.E.; St. Denis, L.A.; Balch, J.K. Spatiotemporal prediction of wildfire size extremes with Bayesian finite sample maxima. Ecol. Appl. 2019, 29, e01898.
  7. Taylor, S.W.; Woolford, D.G.; Dean, C.B.; Martell, D.L. Wildfire prediction to inform fire management: Statistical science challenges. Statist. Sci. 2013, 28, 586–615.
  8. Jain, P.; Coogan, S.C.; Subramanian, S.G.; Crowley, M.; Taylor, S.; Flannigan, M.D. A review of machine learning applications in wildfire science and management. Environ. Rev. 2020, 28, 478–505.
  9. Castelli, M.; Vanneschi, L.; Popovič, A. Predicting burned areas of forest fires: An artificial intelligence approach. Fire Ecol. 2015, 11, 106–118.
  10. Cao, Y.; Wang, M.; Liu, K. Wildfire susceptibility assessment in Southern China: A comparison of multiple methods. Int. J. Disaster Risk Sci. 2017, 8, 164–181.
  11. Shmuel, A.; Heifetz, E. Global wildfire susceptibility mapping based on machine learning models. Forests 2022, 13, 1050.
  12. Shmuel, A.; Heifetz, E. Developing novel machine-learning-based fire weather indices. Mach. Learn. Sci. Technol. 2023, 4, 015029.
  13. Chuvieco, E.; Pettinari, M.L.; Lizundia-Loiola, J.; Storm, T.; Padilla Parellada, M. ESA Fire Climate Change Initiative (Fire_cci): MODIS Fire_cci Burned Area Pixel Product, Version 5.1. Centre for Environmental Data, 2018. Available online: https://catalogue.ceda.ac.uk/uuid/58f00d8814064b79a0c49662ad3af537 (accessed on 3 August 2023).
  14. Artés, T.; Oom, D.; De Rigo, D.; Durrant, T.H.; Maianti, P.; Libertà, G.; San-Miguel-Ayanz, J. A global wildfire dataset for the analysis of fire regimes and fire behaviour. Sci. Data 2019, 6, 1–11.
  15. Andela, N.; Morton, D.C.; Giglio, L.; Paugam, R.; Chen, Y.; Hantson, S.; Randerson, J.T. The Global Fire Atlas of individual fire size, duration, speed and direction. Earth Syst. Sci. Data 2019, 11, 529–552.
  16. Hersbach, H.; Bell, B.; Berrisford, P.; Hirahara, S.; Horányi, A.; Muñoz-Sabater, J.; Thépaut, J.N. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 2020, 146, 1999–2049.
  17. Finney, M.A. FARSITE, Fire Area Simulator—Model Development and Evaluation (No. 4); US Department of Agriculture, Forest Service, Rocky Mountain Research Station: Fort Collins, CO, USA, 1998.
  18. Cruz, M.G.; Alexander, M.E. Uncertainty associated with model predictions of surface and crown fire rates of spread. Environ. Model. Softw. 2013, 47, 16–28.
  19. Hoffman, C.; Canfield, J.; Linn, R.; Mell, W.; Sieg, C.; Pimont, F.; Ziegler, J. Evaluating crown fire rate of spread predictions from physics-based models. Fire Technol. 2016, 52, 221–237.
  20. Rothermel, R. How to Predict the Spread and Intensity of Forest Fire and Range Fires; General Technical Reports, INT-143; US Department of Agriculture, Forest Service, Intermountain Forest and Range Experiment Station: Ogden, UT, USA, 1983.
  21. Alexander, M.E.; Cruz, M.G. Evaluating a model for predicting active crown fire rate of spread using wildfire observations. Can. J. For. Res. 2006, 36, 3015–3028.
  22. Van Wagner, C.E. Structure of the Canadian Forest Fire Weather Index; Environment Canada, Forestry Service: Petawawa Forest Experiment Station: Chalk River, ON, Canada, 1974; Volume 1333.
  23. Dowdy, A.J.; Mills, G.A.; Finkele, K.; De Groot, W. Australian Fire Weather as Represented by the McArthur Forest Fire Danger Index and the Canadian Forest Fire Weather Index; Centre for Australian Weather and Climate Research: Melbourne, Australia, 2009; p. 91.
  24. Markuzon, N.; Kolitz, S. Data driven approach to estimating fire danger from satellite images and weather information. In Proceedings of the IEEE Applied Imagery Pattern Recognition Workshop (AIPR 2009), Washington, DC, USA, 14–16 October 2009; pp. 1–7.
  25. Radke, D.; Hessler, A.; Ellsworth, D. FireCast: Leveraging Deep Learning to Predict Wildfire Spread. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macao, China, 10–16 August 2019; pp. 4575–4581.
  26. Khanmohammadi, S.; Arashpour, M.; Golafshani, E.M.; Cruz, M.G.; Rajabifard, A.; Bai, Y. Prediction of wildfire rate of spread in grasslands using machine learning methods. Environ. Model. Softw. 2022, 156, 105507.
  27. Allaire, F.; Mallet, V.; Filippi, J.B. Emulation of wildland fire spread simulation using deep learning. Neural Netw. 2021, 141, 184–198.
  28. Xu, Y.; Li, D.; Ma, H.; Lin, R.; Zhang, F. Modeling forest fire spread using machine learning-based cellular automata in a GIS environment. Forests 2022, 13, 1974.
  29. Troccoli, A. Solar Radiation—Variable Fact Sheet. Copernicus Climate Change Service. Available online: https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=overview (accessed on 15 August 2023).
  30. Fire Danger Indices Historical Data from the Copernicus Emergency Management Service—User Guide, 2021. Available online: https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-fire-historical?tab=overview (accessed on 3 August 2023).
  31. Pimont, F.; Dupuy, J.L.; Linn, R.R. Coupled slope and wind effects on fire spread with influences of fire size: A numerical study using FIRETEC. Int. J. Wildland Fire 2012, 21, 828–842.
  32. Amatulli, G.; Domisch, S.; Tuanmu, M.N.; Parmentier, B.; Ranipeta, A.; Malczyk, J.; Jetz, W. A suite of global, cross-scale topographic variables for environmental and biodiversity modeling. Sci. Data 2018, 5, 180040.
  33. Center for International Earth Science Information Network—CIESIN—Columbia University. Gridded Population of the World, Version 4 (GPWv4): Population Density, Revision 11; NASA Socioeconomic Data and Applications Center (SEDAC): Palisades, NY, USA, 2018.
  34. Blessing, S.; Giering, R. Leaf Area Index and Fraction Absorbed of Photosynthetically Active Radiation 10-Daily Gridded Data from 1981 to Present, 2018. Available online: https://cds.climate.copernicus.eu/cdsapp#!/dataset/satellite-lai-fapar?tab=overview (accessed on 11 August 2023).
  35. Didan, K.; Munoz, A.B.; Solano, R.; Huete, A. MODIS Vegetation Index User’s Guide (MOD13 Series); Vegetation Index and Phenology Lab, University of Arizona: Tucson, AZ, USA, 2015.
  36. Hsu, C.W.; Chang, C.C.; Lin, C.J. A Practical Guide to Support Vector Classification. 2003, pp. 1396–1400. Available online: https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf (accessed on 11 August 2023).
  37. Andela, N.; Morton, D.C.; Giglio, L.; Chen, Y.; van der Werf, G.R.; Kasibhatla, P.S.; Randerson, J.T. A human-driven decline in global burned area. Science 2017, 356, 1356–1362.
  38. Leys, C.; Ley, C.; Klein, O.; Bernard, P.; Licata, L. Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. J. Exp. Soc. Psychol. 2013, 49, 764–766.
  39. Xie, Y.; Peng, M. Forest fire forecasting using ensemble learning approaches. Neural Comput. Appl. 2019, 31, 4541–4550.
  40. Al_Janabi, S.; Al_Shourbaji, I.; Salman, M.A. Assessing the suitability of soft computing approaches for forest fires prediction. Appl. Comput. Inform. 2018, 14, 214–224.
  41. Morley, S.K.; Brito, T.V.; Welling, D.T. Measures of model performance based on the log accuracy ratio. Space Weather 2018, 16, 69–88.
  42. Shcherbakov, M.V.; Brebels, A.; Shcherbakova, N.L.; Tyukov, A.P.; Janovsky, T.A.; Kamaev, V.A.E. A survey of forecast error measures. World Appl. Sci. J. 2013, 24, 171–176.
  43. Hanley, J.A.; McNeil, B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982, 143, 29–36.
  44. Ling, C.X.; Huang, J.; Zhang, H. AUC: A better measure than accuracy in comparing learning algorithms. In Proceedings of the Conference of the Canadian Society for Computational Studies of Intelligence 2003, Victoria, BC, Canada, 9–11 May 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 329–341.
  45. Bradley, A.P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997, 30, 1145–1159.
  46. Biau, G.; Scornet, E. A random forest guided tour. Test 2016, 25, 197–227.
  47. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  48. Ramchoun, H.; Idrissi, M.A.J.; Ghanou, Y.; Ettaouil, M. Multilayer Perceptron: Architecture Optimization and Training. Int. J. Interact. Multimed. Artif. Intell. 2016, 4, 26–30.
  49. Lever, J.; Krzywinski, M.; Altman, N. Logistic regression: Regression can be used on categorical responses to estimate probabilities and to classify. Nat. Methods 2016, 13, 541–543.
  50. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Duchesnay, E. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  51. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
  52. Mangalathu, S.; Hwang, S.H.; Jeon, J.S. Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach. Eng. Struct. 2020, 219, 110927.
Figure 1. Histogram of daily burned areas throughout the data’s time span. The data include a total of 2,409,079 daily observations of 1,024,926 different wildfires. For clarity of viewing the figure, values exceeding 10 km2 are not presented (a total of 42,429 values which are 1.76% of the data).
Figure 2. Outlier detection. Histograms of daily burned areas when previous wildfire data are available (a) or not available (b), and for the analysis which only includes the first day of the fire (c). Lower boundaries are not presented as they are negative and have no effect, except for the positive MAD lower boundary in (c). Observations of more than 10 km2 are not presented.
Figure 3. Boxplot for the models’ errors. Boxplot for the errors of the XGBoost, RF, MLP, and LR models when trained on the full dataset, including wildfire history (top-left model in Table 3). All values in the figure are in log (km2).
Figure 4. Scatterplots for the model predictions versus true values. Scatterplots for the XGBoost, RF, MLP, and LR models when trained on the full dataset, including wildfire history (top-left model in Table 3). All values in the figure are in log (km2).
Figure 5. Feature importance: previous fire behavior available. Feature importance plot for the highest performing model, XGBoost. Variables describing the behavior of the fire in the days before the observations are available in this model.
Figure 6. Feature importance: previous fire behavior not available. Feature importance plot for the highest performing model, XGBoost. Variables describing the behavior of the fire in the days before the observations are not available in this model.
Figure 7. Feature importance: first day of fire. Feature importance plot for the highest performing model, XGBoost. Since only the first day of the fire is included in this analysis, no previous fire data are available.
Figure 8. ROC curves: wildfire rate increase or decrease classification. ROC curves of the four models used for classifying whether the fire’s rate will increase or decrease the following day. The XGBoost model has the best prediction performance with an AUC score of 0.90 (on a scale of 0 to 1). The RF and MLP models’ performance is slightly lower with an AUC of 0.89. The AUC score of the logistic regression is substantially lower (0.61), which is only slightly better than a flip of a coin (AUC = 0.5).
Figure 9. Feature importance: wildfire rate increase or decrease classification. Feature importance plot for the highest performing classification model, XGBoost. The data include observations starting from the second day of each fire.
Table 1. Summary of features and data sources.
| Variable | Abbreviation | Source |
|---|---|---|
| daily burned area | area_burned | [14] |
| 2 m temperature | temp | [16] |
| relative humidity | RH | [16] |
| 10 m wind velocity | wind_speed | [16] |
| precipitation | prec | [16] |
| mean relative humidity in previous month | RH_1_month | [16] |
| mean precipitation in previous month | prec_1_month | [16] |
| mean relative humidity in previous year | RH_12_months | [16] |
| mean precipitation in previous year | prec_12_months | [16] |
| area burned in previous year | burned_previous_year | [13] |
| median burned area | mean_regional_burn_since_2003 | [13] |
| month (categorical) | month_1, month_2, etc. | - |
| mean slope | slope | [32] |
| population density | population | [33] |
| leaf area index: low vegetation | LAI_low | [34] |
| leaf area index: high vegetation | LAI_high | [34] |
| leaf area index: total | LAI_tot | [34] |
| normalized difference vegetation index | NDVI | [35] |
| incoming short-wave solar radiation | radiation | [29] |
| daily fire weather index | FWI | [30] |
| daily build-up index | bui | [30] |
| daily danger index | danger | [30] |
| daily drought code | drought | [30] |
| daily duff moisture code | duff_moisture | [30] |
| daily initial fire spread index | initial_spread | [30] |
| daily fine fuel moisture code | FFMC | [30] |
| daily fire severity rating | severity | [30] |
| daily Keetch–Byram drought index | KBDI | [30] |
| daily fire danger index | fdi | [30] |
| daily spread component | spread | [30] |
| daily energy release component | energy | [30] |
| daily burning index | BI | [30] |
| daily ignition component | IC | [30] |
Table 2. Data skewness.
| Model | Original Data | Transformed (Log) Data | MAD Outlier Detection |
|---|---|---|---|
| Previous Fire Data Available | 76.30 | 1.12 | 1.12 |
| Previous Fire Data Not Available | 126.82 | 1.46 | 1.47 |
| Only First Day of Fire | 559.80 | 1.64 | −0.51 |
Skewness values for the original data, the transformed data, and the original data after outlier removal based on the MAD method.
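The effect of the log transform on skewness can be illustrated with a short sketch. The sample values are hypothetical, the natural logarithm is an assumption (the exact transform used in the study is not restated here), and skewness is computed as the standard moment coefficient, which may differ slightly from the bias-corrected estimator of a particular library.

```python
import math


def skewness(values):
    """Moment coefficient of skewness: third central moment over variance^1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical daily burned areas (km2): many small days, one very large day.
areas = [0.1, 0.2, 0.2, 0.3, 0.5, 1.0, 2.0, 40.0]

raw_skew = skewness(areas)
log_skew = skewness([math.log(a) for a in areas])  # assumed natural-log transform
```

As in Table 2, the transformed values remain right-skewed in this toy example, but far less so than the raw values.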
Table 3. Summary of model performances.
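| | Previous Fire Data Available (Days 4+ of Fire) | | | Previous Fire Data Not Available (Days 4+ of Fire) | | |
|---|---|---|---|---|---|---|
| Model | MAE | MSE | RMSE | MAE | MSE | RMSE |
| XGBoost | 0.77 | 0.91 | 0.95 | 0.87 | 1.26 | 1.12 |
| RF | 0.78 | 0.93 | 0.96 | 0.92 | 1.43 | 1.20 |
| MLP | 0.78 | 0.95 | 0.97 | 0.91 | 1.38 | 1.18 |
| LR | 0.91 | 1.27 | 1.13 | 0.95 | 1.42 | 1.19 |
| Naïve Prediction (Burned Last Day) | 1.04 | 1.56 | 1.25 | - | - | - |

| | Previous Fire Data Not Available (Days 1+ of Fire) | | | Only First Day of Fire | | |
|---|---|---|---|---|---|---|
| Model | MAE | MSE | RMSE | MAE | MSE | RMSE |
| XGBoost | 0.75 | 0.92 | 0.96 | 0.52 | 0.44 | 0.66 |
| RF | 0.77 | 0.98 | 0.99 | 0.52 | 0.44 | 0.66 |
| MLP | 0.76 | 0.98 | 0.99 | 0.54 | 0.45 | 0.67 |
| LR | 0.80 | 1.04 | 1.02 | 0.54 | 0.47 | 0.69 |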
Summary of model performances. All values in the table are in log (km2). The two upper models are evaluated on the same subset of the data, which excludes the first three days of the fire. The bottom left model is trained on the entire dataset and does not include data on previous days of the fire as predictors. The bottom right model only includes observations of the first day of the fire.
Table 4. Summary of model performances with outlier removal.
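| Subset | Model | MAE | MSE | RMSE |
|---|---|---|---|---|
| Previous Fire Behavior Available (Days 4+ of Fire) | XGBoost | 0.27 | 0.10 | 0.31 |
| Previous Fire Behavior Available (Days 4+ of Fire) | RF | 0.27 | 0.11 | 0.33 |
| Previous Fire Behavior Available (Days 4+ of Fire) | MLP | 0.27 | 0.11 | 0.34 |
| Previous Fire Behavior Available (Days 4+ of Fire) | LR | 0.28 | 0.12 | 0.34 |
| Previous Fire Behavior Not Available (Days 4+ of Fire) | XGBoost | 0.27 | 0.12 | 0.35 |
| Previous Fire Behavior Not Available (Days 4+ of Fire) | RF | 0.27 | 0.12 | 0.35 |
| Previous Fire Behavior Not Available (Days 4+ of Fire) | MLP | 0.27 | 0.12 | 0.35 |
| Previous Fire Behavior Not Available (Days 4+ of Fire) | LR | 0.29 | 0.12 | 0.35 |
| Previous Fire Behavior Not Available (Days 1+ of Fire) | XGBoost | 0.24 | 0.10 | 0.31 |
| Previous Fire Behavior Not Available (Days 1+ of Fire) | RF | 0.24 | 0.10 | 0.31 |
| Previous Fire Behavior Not Available (Days 1+ of Fire) | MLP | 0.24 | 0.10 | 0.31 |
| Previous Fire Behavior Not Available (Days 1+ of Fire) | LR | 0.24 | 0.10 | 0.31 |
| Only First Day of Fire | XGBoost | 0.17 | 0.04 | 0.19 |
| Only First Day of Fire | RF | 0.17 | 0.04 | 0.19 |
| Only First Day of Fire | MLP | 0.17 | 0.04 | 0.19 |
| Only First Day of Fire | LR | 0.18 | 0.04 | 0.19 |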
Model performances after applying outlier detection using the MAD method.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Shmuel, A.; Heifetz, E. A Machine-Learning Approach to Predicting Daily Wildfire Expansion Rate. Fire 2023, 6, 319. https://doi.org/10.3390/fire6080319

Chicago/Turabian Style

Shmuel, Assaf, and Eyal Heifetz. 2023. "A Machine-Learning Approach to Predicting Daily Wildfire Expansion Rate" Fire 6, no. 8: 319. https://doi.org/10.3390/fire6080319
