1. Introduction
Forest covers 31% of the global land surface and is of great importance in the wildland ecosystem. Forest fires are one of the major challenges for the preservation of forests, causing great economic and ecological losses and even the loss of human lives [1]. Even though much attention and expense have been devoted to monitoring and controlling forest fires [2], the global annual burned forest area amounts to millions of hectares (ha) [3]. The development of prediction models is expected to benefit fire management strategies in the ecosystem [4].
Traditionally, a forest fire is monitored by watch-keepers on a watchtower, but it is not feasible to construct many watchtowers scattered across extensive forests, and a watch-keeper can only detect fires already occurring instead of making predictions [5]. It is known that the occurrence of a forest fire is correlated with environmental conditions; for example, a forest fire is more likely to occur in hot and dry conditions than in cold and humid ones [6]. In the modern world, since numerous meteorological stations are available, the collection of weather data is fast and cheap. In addition, with the help of satellite remote sensing technology, local area conditions, such as the state of crops and the land surface temperature, can be computed based on satellite images [7]. This information can greatly benefit the construction of real-time and low-cost forest fire prediction methods.
Environmental data include many different variables, such as temperature, air pressure, humidity, wind speed, and vegetation index. These variables are usually recorded as numerical values and contain valuable patterns correlated with the occurrence of forest fires. Since the dimension of environmental data, i.e., the number of variables, is large, it is difficult for human experts to analyze the complex patterns. Machine learning models have been employed to automatically learn the relationship between environmental data and the occurrence of forest fires [8,9]. Logistic regression and random forest were compared for fire detection in Slovenian forests [10]. Cortez et al. tested five different machine learning models for burned area prediction in the Montesinho Natural Park of Portugal using meteorological data [5]. Artificial Neural Networks (ANNs) and logistic regression were applied for the prediction of fire danger in Galicia [11]. Genetic programming was adopted for forest burned area prediction based on meteorological data in [12]. A hybrid machine learning approach for predicting fire risk indices was proposed in [13]. West et al. predicted the occurrence of large wildfires based on multivariate regression and future climate information [14]. The performance of different methods, such as the cascade correlation network, ANN, polynomial neural network, radial basis function and support vector machine (SVM), was assessed for forest fire prediction in [15]. Sayad et al. predicted the occurrence of wildfires based on weather-related metrics using ANN and SVM [7]. Fuzzy logic models were utilized for forest fire forecasting in [16]. Meteorological and land-cover parameters were used for burned area prediction in [17]. A comprehensive precipitation index was built and used for predicting forest fire risk in central and northern China [18].
The limitations of these methods are that (i) they are based on shallow machine learning models that require the selection of useful features as the input, and (ii) they do not consider the imbalance problem in the historical data: the number of large-scale forest fires is much smaller than that of small-scale ones [19], which makes the prediction models neglect the information of large-scale forest fires, which is, in fact, more important for preventing serious consequences [20].
To contribute to addressing these limitations, the use of deep learning (DL) methods for forest fire prediction is considered in this work. DL methods are able to automatically extract useful features based on neural networks with multiple layers [21]. Among the DL methods, convolutional neural networks and recurrent neural networks have been employed to process satellite images or videos for forest fire detection [22,23,24]. An autoencoder-based deep neural network (DNN) is the most suitable for the processing of numerical input values. The autoencoder is constructed by an "encoder" network and a "decoder" network whose structure is symmetrical to the encoder [25]. The encoder network automatically extracts features from the input data and the decoder reconstructs the data from the extracted features. A sparse autoencoder tends to extract more representative and discriminative features by adopting a sparsity penalty on the activation of the network neurons [26]. A sparse autoencoder-based DNN can be built by taking the encoder of the sparse autoencoder and adding a regression/classification layer on top of it.
To deal with the imbalance problem, a data balancing procedure is proposed, whose key idea is to generate synthetic data samples by over-sampling and introducing Gaussian noise to balance the distribution of the prediction targets. Under-sampling the majority and over-sampling the minority are commonly used strategies for tackling imbalance problems [27]. Since the available dataset recording historical forest fire conditions is typically small, under-sampling is not preferred, as it discards a valuable proportion of the data. Over-sampling generates new data samples by randomly copying minority samples; Gaussian noise is introduced to the copied samples to increase the diversity of the generation. Since a DNN is capable of extracting useful features from a large amount of data, the generation of synthetic samples allows improvement of its performance.
The proposed method is evaluated using a real-world dataset provided by [5], which contains the environmental conditions of forest fires and the corresponding burned areas collected from the Montesinho Natural Park of Portugal. Experimental results show that the prediction accuracy on the burned area of large-scale fires, which constitute a minority in the dataset, was improved by adopting the proposed method. This is particularly important to prevent serious consequences of large-scale fires. To the best of our knowledge, a sparse autoencoder-based DNN has not been used for forest fire prediction, and the imbalance problem is seldom considered in this task.
The objective of this work was to develop a method able to predict a forest fire. We assume the availability of a set of historical data, composed of many records of small-scale forest fires and few records of large-scale ones, which is common considering that the occurrence of small-scale forest fires is much more frequent. Specifically, we assumed the availability of $N$ records of forest fires, which hereafter are called data samples, $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $\mathbf{x}_i$ is a vector containing numerical variables associated with the environmental condition of the forest fire, e.g., weather measurements and metrics that are computed based on satellite images, and $y_i$ is the corresponding severity of the forest fire. The forest fire prediction method receives the test vector $\mathbf{x}$ as the input, containing the environmental variables collected at a certain location and time, and is required to provide a prediction $\hat{y}$. The prediction is better when it is closer to the true value $y$.
2. Materials and Methods
2.1. Benchmark Data for Forest Fire Prediction
We considered forest fire data collected during 2000–2003 from the Montesinho Natural Park of Portugal. The dataset contains 517 records of forest fires, whose environmental condition is described using 12 numerical variables, and severity is represented by the burned area measured in ha.
Table 1 shows the numerical variables. The Natural Park is divided into 81 subareas using a 9 × 9 grid; therefore, the x and y coordinates indicate a certain subarea. Considering that 517 data samples are not enough to show the relevance of the 81 subareas to the forest fire, the coordinates are not used for the prediction of the burned area. The "month" and "day" are transformed into numerical data by denoting January to December as 1 to 12, and Monday to Sunday as 1 to 7, respectively. Each of the remaining variables is normalized into the scale $[0, 1]$, forming the 10-dimensional input vector $\mathbf{x}$ for the prediction of the burned area. More details of the dataset can be found in [5].
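As an illustration, the encoding and normalization described above can be sketched as follows; this is a minimal helper, not the authors' code, and the function and variable names are hypothetical:

```python
# Hypothetical preprocessing sketch: month/day are mapped to integers
# and each variable is min-max normalized into [0, 1].

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

def encode_record(month, day, raw_values):
    """Turn one record into a numeric vector: month -> 1..12, day -> 1..7,
    followed by the remaining raw variables."""
    return [MONTHS.index(month) + 1, DAYS.index(day) + 1] + list(raw_values)

def min_max_normalize(column):
    """Scale a list of numbers into [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]
```

Normalization is applied column-wise over the whole dataset, e.g., `min_max_normalize([rec[2] for rec in records])` for the first weather variable.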
A histogram of the burned area is shown in Figure 1a. Notice that the burned area is obviously imbalanced: the number of small-scale fires is much greater than that of large-scale fires. In the dataset, there are 247 samples with a zero burned area, indicating that the burned area is lower than 0.01 ha.
To ease the imbalance problem, a logarithm transformation was applied to the original burned area values:

$$y = \frac{\ln(a + 1) - \ln(a_{\min} + 1)}{\ln(a_{\max} + 1) - \ln(a_{\min} + 1)} \quad (1)$$

where $a$ represents the original burned area, $a_{\max}$ and $a_{\min}$ are the maximum and minimum of the burned area, respectively, and $y$ is the transformed burned area, which is normalized into the scale $[0, 1]$ to be used as the output for the prediction task, as shown in Figure 1b.
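A minimal sketch of such a transformation and its inverse (used to map predictions back to hectares), assuming a log-plus-one transform followed by min-max scaling; the exact form used by the authors may differ slightly:

```python
import math

def transform_area(a, a_min, a_max):
    """Log-transform a burned area and min-max scale it into [0, 1].
    Assumes a log(a + 1) transform followed by min-max normalization."""
    num = math.log(a + 1) - math.log(a_min + 1)
    den = math.log(a_max + 1) - math.log(a_min + 1)
    return num / den

def inverse_transform_area(y, a_min, a_max):
    """Map a prediction y in [0, 1] back to the original burned-area scale."""
    log_a = y * (math.log(a_max + 1) - math.log(a_min + 1)) + math.log(a_min + 1)
    return math.exp(log_a) - 1
```

The inverse is needed in the evaluation step, where test-set predictions are transformed back to hectares before computing the error metrics.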
2.2. Sparse Autoencoder
The sparse autoencoder [26] aims at extracting useful features from the $D$-dimensional input of the $N$ training samples $\{\mathbf{x}_i\}_{i=1}^{N}$. As shown in Figure 2, the encoder extracts a feature vector $\mathbf{h}$ from the input vector $\mathbf{x}$ as follows:

$$\mathbf{h} = f_e(\mathbf{W}_e \mathbf{x} + \mathbf{b}_e) \quad (2)$$

where $f_e$, $\mathbf{W}_e$ and $\mathbf{b}_e$ are the activation function, the weight matrix and the bias vector of the encoder, respectively. Then, the decoder reconstructs $\mathbf{x}$ into $\hat{\mathbf{x}}$ based on $\mathbf{h}$:

$$\hat{\mathbf{x}} = f_d(\mathbf{W}_d \mathbf{h} + \mathbf{b}_d) \quad (3)$$

where $f_d$, $\mathbf{W}_d$ and $\mathbf{b}_d$ are the activation function, the weight matrix and the bias vector of the decoder, respectively. The training of the sparse autoencoder aims at minimizing the following loss function to encourage the extraction of discriminative features:

$$L = L_{rec} + \beta\, \Omega_{sparse} + \lambda\, \Omega_{L_2} \quad (4)$$

where $L_{rec}$, $\Omega_{sparse}$ and $\Omega_{L_2}$ are the terms with respect to the reconstruction error, the sparsity regularization and the $L_2$ regularization, respectively, and $\beta$ and $\lambda$ are coefficients. $L_{rec}$ measures how close the reconstruction $\hat{\mathbf{x}}_i$ is to the input $\mathbf{x}_i$:

$$L_{rec} = \frac{1}{N} \sum_{i=1}^{N} \|\hat{\mathbf{x}}_i - \mathbf{x}_i\|^2 \quad (5)$$
$\Omega_{sparse}$ is used to constrain the hidden neurons to be inactive most of the time in order to extract discriminative features. Denote the mean activation of the $j$-th hidden neuron, $h_j$, over all the samples $\{\mathbf{x}_i\}_{i=1}^{N}$ as:

$$\hat{\rho}_j = \frac{1}{N} \sum_{i=1}^{N} h_j(\mathbf{x}_i) \quad (6)$$

Ideally, the expected value of $\hat{\rho}_j$ should be a small value, e.g., 0.05, since the activation is required to be at zero for most of the samples. $\Omega_{sparse}$ is computed using the Kullback-Leibler (KL) divergence function to evaluate whether each $\hat{\rho}_j$ is close to an expected value $\rho$:

$$\Omega_{sparse} = \sum_{j} \left[ \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j} \right] \quad (7)$$

The above function reaches zero, the minimal value, when all $\hat{\rho}_j$ are equal to $\rho$.

The $L_2$ regularization, $\Omega_{L_2}$, is used to constrain the weight values to prevent the network from overfitting:

$$\Omega_{L_2} = \frac{1}{2} \left( \|\mathbf{W}_e\|_F^2 + \|\mathbf{W}_d\|_F^2 \right) \quad (8)$$
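The sparsity regularization can be illustrated with a short sketch that computes the KL-divergence penalty from the mean activations; the function name and the clamping constant are hypothetical additions for numerical safety:

```python
import math

def kl_sparsity_penalty(mean_activations, rho=0.05, eps=1e-12):
    """Sparsity regularization: sum over hidden neurons of the KL divergence
    between the target sparsity rho and the neuron's mean activation rho_hat.
    It is zero when every rho_hat equals rho and grows as activations drift."""
    penalty = 0.0
    for rho_hat in mean_activations:
        rho_hat = min(max(rho_hat, eps), 1.0 - eps)  # keep logs finite
        penalty += (rho * math.log(rho / rho_hat)
                    + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))
    return penalty
```

During training, this penalty (weighted by the coefficient for the sparsity term) is added to the reconstruction error, pushing most hidden activations toward zero.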
2.3. Deep Neural Networks
The DNN aims at constructing an empirical mapping function from the $D$-dimensional input space to the output space based on the input-output samples $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$.
The DNN is composed of an encoder and a regression layer (Figure 3). The encoder extracts high-level features from $\mathbf{x}$ using multiple hidden layers, and the regression layer provides a prediction $\hat{y}$ of $y$.
Following the idea of the sparse autoencoder, the $\Omega_{sparse}$ and $\Omega_{L_2}$ terms defined in Equation (4) are applied to assist the encoder in extracting high-level features from $\mathbf{x}$ using $L$ hidden layers. Denote the feature vectors progressively extracted by the hidden layers as $\mathbf{h}^{(1)}, \ldots, \mathbf{h}^{(L)}$. The dimension of $\mathbf{h}^{(1)}$, $D_1$, is typically larger than the input dimension $D$ to obtain a sparse-overcomplete feature vector, which was found to be capable of benefiting the feature extraction of the following layers [28]. For the following hidden layers, the decreasing dimension $D_l < D_{l-1}$, $l = 2, \ldots, L$, forces effective feature extraction. The ReLU activation function is adopted for all the layers of the encoder to allow the fast training of the DNN [29].
The regression layer computes the prediction based on the high-level feature $\mathbf{h}^{(L)}$:

$$\hat{y} = \sigma(\mathbf{W}_r \mathbf{h}^{(L)} + b_r) \quad (9)$$

where $\sigma$ is the sigmoid activation function, and $\mathbf{W}_r$ and $b_r$ are the weight matrix and the bias of the regression layer, respectively. Then, the training of the DNN is to minimize:

$$L_{DNN} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2 + \beta \sum_{l=1}^{L} \Omega_{sparse}^{(l)} + \lambda\, \Omega_{L_2} \quad (10)$$

where the first term is the mean squared prediction error, $\Omega_{sparse}^{(l)}$ is the sparsity regularization for the $l$-th hidden layer, and the last term is the $L_2$ regularization.
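A minimal forward-pass sketch of such an encoder-plus-regression network follows. The layer dimensions and all names are hypothetical, and training together with the regularization terms is omitted for brevity:

```python
import math
import random

def dense(x, W, b, activation):
    """One fully connected layer: activation(W x + b), on plain lists."""
    z = [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]
    return [activation(s) for s in z]

def relu(v):
    return max(0.0, v)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def random_layer(n_out, n_in, rng, scale=0.1):
    """Random initial weights for an n_in -> n_out layer (untrained)."""
    W = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

def predict(x, encoder_layers, regression_layer):
    """Encoder (ReLU hidden layers) followed by a sigmoid regression layer,
    so the prediction lies in (0, 1) like the transformed burned area."""
    h = x
    for W, b in encoder_layers:
        h = dense(h, W, b, relu)
    W_r, b_r = regression_layer
    return dense(h, W_r, b_r, sigmoid)[0]

# Hypothetical structure: 10 inputs -> 20 (sparse-overcomplete) -> 10 -> 5 -> 1
rng = random.Random(0)
encoder = [random_layer(20, 10, rng), random_layer(10, 20, rng),
           random_layer(5, 10, rng)]
regression = random_layer(1, 5, rng)
```

The widening first layer followed by narrowing ones mirrors the sparse-overcomplete principle above; in practice the weights would be trained by minimizing the regularized loss rather than left at random initialization.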
2.4. Data Balancing Procedure
In order to deal with the imbalanced distribution of the output $y$, a data balancing procedure is proposed. It is described as follows:

Step 1. Identify the range $[0, 1]$ of the output $y$ and divide it equally into $K$ non-overlapping intervals, i.e., $[0, 1/K), [1/K, 2/K), \ldots, [(K-1)/K, 1]$;

Step 2. Generate a random number $u$ from the uniform distribution $U(0, 1)$ and select the $k$-th interval for synthetic sample generation, where $k = \lceil uK \rceil$, $k \in \{1, \ldots, K\}$. Notice that each interval has the same possibility of being selected due to the random sampling from a uniform distribution, which helps to avoid over- or under-estimating a specific interval;

Step 3. Randomly choose a sample $(\mathbf{x}, y)$ whose output $y$ falls in the selected interval;

Step 4. Generate a synthetic sample $(\mathbf{x} + \boldsymbol{\epsilon}, y)$ by introducing the Gaussian noise $\boldsymbol{\epsilon}$. There is a possibility of 10% that the elements of $\boldsymbol{\epsilon}$ are all zeros, i.e., the synthetic sample is the same as the original sample, and a possibility of 90% that the elements of $\boldsymbol{\epsilon}$ are randomly sampled from a Gaussian distribution $\mathcal{N}(0, \sigma^2)$ to increase the diversity of the synthetic samples;

Step 5. Repeat Steps 2–4 until $N_s$ random samplings are done. Then, the obtained synthetic samples are used for training the DNN.
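The five steps above can be sketched as follows. This is a minimal illustration, not the authors' code; the parameter defaults (interval count, number of draws) are assumptions based on the experiment description:

```python
import random

def balance_dataset(samples, n_intervals=100, n_draws=20000,
                    sigma=0.001, p_zero_noise=0.1, rng=random):
    """Data balancing by interval-uniform over-sampling with Gaussian noise.
    `samples` is a list of (x, y) pairs, with x a list of scaled variables
    and y in [0, 1]. Defaults for n_intervals/n_draws/sigma are assumed."""
    # Step 1: bucket the samples by output interval.
    buckets = [[] for _ in range(n_intervals)]
    for x, y in samples:
        idx = min(int(y * n_intervals), n_intervals - 1)
        buckets[idx].append((x, y))

    synthetic = []
    for _ in range(n_draws):
        # Step 2: pick an interval uniformly at random.
        bucket = buckets[rng.randrange(n_intervals)]
        if not bucket:
            continue  # empty interval: nothing can be generated
        # Step 3: pick a sample from the chosen interval.
        x, y = rng.choice(bucket)
        # Step 4: copy as-is (10%) or perturb with Gaussian noise (90%).
        if rng.random() < p_zero_noise:
            new_x = list(x)
        else:
            new_x = [v + rng.gauss(0.0, sigma) for v in x]
        synthetic.append((new_x, y))
    return synthetic  # Step 5: the synthetic training set
```

Skipping empty intervals is why the number of generated samples can fall below the number of draws, as noted in the experiment section.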
3. Results
The overall procedure of applying the proposed method for forest fire prediction is shown in Figure 4.
Since the number of data samples is limited, a 10-fold cross-validation is adopted to evaluate the performance of the DNN. The dataset is randomly divided into 10 subsets, each containing approximately 10% of the samples. For each fold of the cross-validation, a subset is taken as the test set, and the other subsets are used as the training set for building the DNN model with the data balancing procedure. The predictions are obtained on the test set using the DNN and are inversely transformed to the original scale. The subsets are used as the test set in turn to obtain predictions on the whole dataset. The performance of the DNN is evaluated using the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE):

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |\hat{a}_i - a_i|, \qquad RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{a}_i - a_i)^2} \quad (11)$$

where $a_i$ and $\hat{a}_i$ are the true and predicted burned areas of the $i$-th sample and $n$ is the number of samples. Lower values of these two metrics indicate better performance. Considering the random effect caused by dataset splitting, the 10-fold cross-validation is repeated 10 times. The average MAE and RMSE are computed as the final performance of the DNN.
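The two metrics can be computed with a few lines (a straightforward sketch on plain Python lists):

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((t - p) ** 2
                         for t, p in zip(y_true, y_pred)) / len(y_true))
```

Because RMSE squares the residuals, a few badly missed large-scale fires inflate it more than they inflate MAE, which is why both metrics are reported.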
Figure 5 shows an example of applying the data balancing procedure (Section 2.4). The scaled range $[0, 1]$ of $y$ is divided into $K = 100$ non-overlapping intervals with length 0.01. Figure 5a shows the number of samples with respect to the intervals, where the first interval contains many more samples than the others, i.e., the distribution of $y$ is extremely imbalanced. Synthetic samples are generated by randomly choosing an interval, randomly choosing a sample from the interval, and adding Gaussian noise to the input of the sample. Since the variables of $\mathbf{x}$ are scaled into $[0, 1]$, the standard deviation $\sigma$ of the noise distribution $\mathcal{N}(0, \sigma^2)$ is set to 0.001 to slightly modify the original variables for the diversity of the synthetic samples. $N_s$ samplings are performed and approximately 200 synthetic samples are generated for each interval, as shown in Figure 5b. The number of synthetic samples is less than $N_s$ since nothing is generated for the intervals which do not contain any data sample.
A DNN with multiple hidden layers is constructed. Its encoder structure, i.e., the dimensions of the hidden layers, is set following the general principle discussed in Section 2.3: a first sparse-overcomplete layer wider than the input, followed by progressively narrower layers. For the hyperparameters, the coefficients $\beta$ and $\lambda$ are set by computing the magnitude ratio of the terms in the loss function to keep them balanced. The expected sparsity $\rho$ is chosen from a set of candidate values using an internal 10-fold grid search, i.e., for each fold of the cross-validation, 10-fold cross-validations are first performed using only the training data for the DNN with each candidate value of $\rho$, and then the DNN with the best-performing $\rho$ on the training set is selected to make predictions on the test set. The median value of the selected $\rho$ is 0.08. The obtained results are reported in Table 2 in terms of the mean and standard deviation of MAE and RMSE, which are computed based on the 10 repetitions of cross-validation.
4. Discussion
For comparison, the popular regression methods ANN, SVM and RF were used for forest fire prediction, and their average MAEs and RMSEs over 10 repetitions of 10-fold cross-validation were computed. The ANN is a typical feedforward network with one hidden layer; the activation function of the hidden and output neurons is the sigmoid. The number of hidden neurons, $H$, is selected by an internal 10-fold grid search over a set of candidate values; the median value of the selected $H$ is 10. The SVM maps the input variables into a high-dimensional space and then finds the best linear hyperplane for regression with the support of a nonlinear kernel function. The regularization parameter, $C$, is selected by an internal 10-fold grid search over a set of candidate values; the median value of the selected $C$ is 1. The Random Forest (RF) is constructed by averaging the outputs of multiple decision trees, each trained using training samples randomly selected by the bootstrap technique. The number of trees, $T$, is selected by an internal 10-fold grid search over a set of candidate values; the median value of the selected $T$ is 200. All experiments were conducted using a computer with an Intel i7-8550U CPU, 8.00 GB RAM and the Windows 10 OS. The DNN was built using the Keras 2.0 framework, and the other models for comparison were constructed using the scikit-learn 0.24.2 package. The obtained results are shown in Table 2 in terms of the mean and standard deviation of MAE and RMSE. The metrics were computed over 10 intervals of the burned area to investigate the prediction accuracy of the methods for different scales of forest fire.
In Table 2, the ANN, SVM and RF give better performance for small-scale forest fires whose burned area is less than 15.42 ha. The proposed method outperforms the other methods when the scale is larger, indicating that the DNN successfully pays more attention to the large-scale forest fires with the support of the data balancing procedure.
To better understand the performance of the methods, Figure 6 shows the prediction results with respect to one fold of the cross-validation. Since the ANN, SVM and RF behave similarly, and to keep the figure clear, only the SVM, which has the best performance on the smallest interval, is shown for comparison. Since nearly half of the samples in the dataset have a zero burned area, the SVM actually learns to always provide a very small value no matter what the input is. To further investigate the behavior of the different methods, Figure 7 shows the histogram of their predictions obtained from one fold of prediction. It is verified that the SVM, ANN and RF always provide small predictions regardless of the input variables. This means that the SVM, ANN and RF are not reliable; they ignore or extremely underestimate all large-scale fires.
Different from the classical methods that do not learn useful information, the proposed method pays attention to all fire scales. For most small-scale forest fires, the proposed method provides close predictions, and the relatively larger prediction error is mainly due to some occasional large predictions (Figure 6). For large-scale fires, the proposed method attempts closer predictions, though these are still not very accurate. The prediction histogram of the proposed method (Figure 7) is closer to the original data distribution (Figure 1a), indicating that the proposed method facilitates the extraction of the correlation between the input variables and the output. However, the performance of the proposed method is limited by the lack of sufficient information, since the synthetic data, generated by adding Gaussian noise to the original data, can help the training of the DNN but do not provide any new information beyond the original data. Therefore, the proposed method cannot fully detect the causes of large-scale fires because the information is insufficient, leading to the erroneous over-estimation of some small-scale fires. As a result, it shows a trade-off between improving the prediction accuracy for large-scale fires and over-estimating small-scale fires (Table 2).
The over-estimation of small-scale fires may cause overreaction and extra costs. However, considering that a large-scale fire can cause serious consequences and huge losses, its accurate prediction usually is more important. We expect that the performance of the proposed method can be further improved if more data containing more information are collected.
In Table 2, notice that the performance variations of the proposed method are larger than those of the SVM, ANN and RF. The variations of MAE and RMSE are computed based on the results obtained from the 10 repetitions of cross-validation. Heavily affected by the imbalanced data, the ANN, SVM and RF always provide very small and stable values in all the 10 repetitions, resulting in small variations. The performance variation of the proposed method among the repetitions is mainly due to the change in the training set (Figure 4). Since the dataset is relatively small (517 records), the random division into subsets makes the training sets very different across repetitions. When more data are collected, the effect of the random data division can be weakened to reduce the variation of the proposed method. The large variations actually indicate that the proposed method tends to learn useful information in all the repetitions.
To further investigate why it is difficult for the SVM, ANN and RF to learn useful patterns for fire prediction, the trends of forest fires with respect to the input variables are shown in Figure 8. With respect to the month, large-scale fires tend to occur in summer (July to September), which is consistent with common sense. The day of the week does not seem to strongly influence fire occurrence, and large-scale fires happened on all days except Friday. By definition, FFMC, DMC, DC and ISI suggest more severe burning conditions with larger values [5]. From the collected data, FFMC and DC behave in accordance with this definition, while DMC and ISI are more likely to indicate severe burning conditions in the middle of their ranges. The remaining four variables are the most intuitive: large-scale fires are likely to be driven by high temperature, low RH (relative humidity), wind with a speed of about 2–6 km/h and a small amount of rain. However, notice that the variable values more likely to lead to severe burning can also correspond to small-scale fires, and the variables correlate nonlinearly with the burned area and can have various value combinations. Thus, it is difficult to explicitly extract rules for fire prediction from the data, and a method able to automatically extract useful features from the data is required. The ANN, SVM and RF rely on high-quality input features strongly correlated with the output, whereas the proposed method based on the DNN can further extract useful features for prediction from the input variables.
In summary, the SVM, ANN and RF cannot make meaningful predictions due to the complexity of the mapping between the input variables and the burned area. The proposed method performs relatively well in the prediction of large-scale forest fires. However, it sometimes over-estimates small-scale fires, and its prediction variation is relatively large. If more data are available, the superiority of the proposed method based on the DNN is expected to become more evident and its drawbacks to be mitigated. Fortunately, more and more forest monitoring data can be collected in the big data era, and the proposed method is a promising tool to be considered for forest fire prediction.