1. Introduction
Forest fire research is crucial for understanding how forest fires originate and for identifying areas that need further investigation. Forest fires remain highly uncontrollable events that inflict significant disruption on entire ecosystems, necessitating examination through remote sensing technology. The motivation for this research stems from the imperative to protect forests against the devastating impact of forest fires. Human activities profoundly influence biological resources, contributing to the deterioration of biodiversity. Effective forest management tools are essential for safeguarding biodiversity. With its diverse flora and fauna, India faces the challenge of forest degradation resulting from fires and other activities, posing a threat to animal habitats [1]. Legislation has been enacted to protect wildlife, including plants, animals, and birds, emphasizing the need to preserve the country’s biological and environmental safety. Article 8 of the Convention on Biological Diversity emphasizes the establishment and management of protected areas, promoting resource efficiency, and ensuring the protection and restoration of ecosystems to safeguard a nation’s biodiversity [2].
The detection and monitoring of wildfires have gained significant attention due to their impact on the environment, economy, and public safety. The literature provides a comprehensive overview of various methods and technologies employed for wildfire detection, with a focus on advancements in remote sensing and deep learning approaches. Recent data from the Forest Survey of India [1] indicate a significant increase in forest fire incidents across India, with a 2.7-fold rise compared to previous years [2]. This trend underscores the need for more effective monitoring and early detection systems. The rise in wildfire occurrences has been linked to climatic factors, land-use changes, and human activities. Traditional methods of fire detection, such as ground-based observations and manual reporting, have limitations in terms of coverage and response time, necessitating the adoption of advanced technologies like remote sensing. Remote sensing has emerged as a crucial tool for monitoring forest fires, offering broad coverage and the ability to detect fires in inaccessible regions. Platforms such as Landsat-8, which provides high-resolution imagery and thermal data, have been widely utilized in wildfire detection and monitoring [3].
The use of satellite imagery allows for real-time tracking and assessment of fire dynamics, aiding in decision-making for firefighting and resource allocation. Optical remote sensing techniques, which involve the use of satellite or aerial imagery to monitor changes in vegetation and surface temperatures, have shown promise in early fire detection. Barmpoutis et al. [4] reviewed various optical remote sensing systems, highlighting their effectiveness in identifying fire hotspots and smoke plumes, which are early indicators of wildfires. However, challenges remain, including false alarms due to cloud cover or other atmospheric conditions. Recent advancements in deep learning have led to significant improvements in the accuracy and speed of wildfire detection. Several studies have explored the application of deep neural networks (DNNs) and convolutional neural networks (CNNs) for wildfire detection using various data sources, including satellite imagery, UAV footage, and remote camera feeds [5].
Toan et al. [6] proposed a deep learning approach utilizing hyperspectral satellite images for early wildfire detection, which demonstrated high accuracy in identifying fire-prone areas. Similarly, Lee et al. [7] employed deep neural networks with UAV-based imagery, achieving effective fire detection in remote regions. The use of UAVs provides an additional advantage of flexibility and high-resolution data, though challenges such as limited battery life and flight range persist. Deep learning techniques have also been employed for segmenting wildfire regions in satellite images [8]. Khryashchev and Larionov [9] applied deep learning algorithms for segmentation tasks, demonstrating the capability to accurately delineate fire-affected areas. Ganesan et al. [10] compared various segmentation methods for forest fire regions in high-resolution satellite images, further emphasizing the importance of precise segmentation in early fire detection. Deep-learning-based segmentation models, such as U-Net and Mask R-CNN, have shown substantial improvements over traditional methods by leveraging hierarchical feature extraction to capture complex fire patterns [11]. Wang et al. [11] utilized deep learning techniques for early forest fire region segmentation, reporting significant advancements in accuracy.
Machine learning algorithms have been utilized not only for detecting fires but also for predicting their spread and behavior. Priya and Vani [12] developed a deep-learning-based classification system for identifying different types of fire occurrences in satellite images. Predictive modeling techniques, such as Bayesian networks and random forests, have been used to forecast fire spread, incorporating variables like wind speed, temperature, and vegetation type [13,14]. Khakzad [13] modeled wildfire spread using a dynamic Bayesian network, demonstrating its effectiveness in capturing complex interactions between variables in wildland-industrial interfaces. Similarly, Sayad et al. [14] introduced a new dataset and employed machine learning methods to enhance wildfire prediction capabilities.
Combining data from multiple sources, such as satellite imagery, UAVs, and ground-based sensors, can significantly improve the accuracy of wildfire detection systems. Govil et al. [15] reported preliminary results from a wildfire detection system that integrated deep learning algorithms with remote camera images, demonstrating promising outcomes in early fire identification. The fusion of multi-sensor data allows for a more comprehensive analysis of fire events, enabling better situational awareness and resource management [16,17].
Despite the advancements in wildfire detection technologies, several challenges persist. The reliability of remote sensing-based systems can be affected by environmental factors such as cloud cover and atmospheric disturbances [18]. Moreover, deep learning models require extensive training datasets and computational resources, which may limit their deployment in real-time applications. Future research should focus on developing more robust algorithms that can operate under varying conditions and integrating emerging technologies such as hyperspectral imaging and LiDAR for improved fire detection capabilities [19].
Table 1 below summarizes the main contributions and limitations in this field.
The present study aimed to utilize publicly available large-scale multi-sensor satellite data to develop and implement advanced algorithms for the accurate detection, classification, and segmentation of fire outbreaks. The primary focus was to create an automated fire detection algorithm that leverages satellite imagery to enhance recall, precision, and accuracy while minimizing false alarm rates (commission errors) and maintaining efficient processing times.
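The evaluation criteria named above can be made concrete with a short sketch. It assumes binary per-pixel masks (1 = fire, 0 = no fire) and the usual remote sensing convention that the commission error is the fraction of predicted fire pixels that are not fire in the reference mask. The function name `pixel_metrics` and the toy masks are illustrative only and are not taken from the study.

```python
import numpy as np

def pixel_metrics(pred_mask, true_mask):
    """Per-pixel detection metrics for binary fire masks (1 = fire, 0 = no fire)."""
    pred = np.asarray(pred_mask).astype(bool)
    true = np.asarray(true_mask).astype(bool)

    tp = int(np.sum(pred & true))     # fire predicted, fire observed
    fp = int(np.sum(pred & ~true))    # fire predicted, no fire observed (false alarm)
    fn = int(np.sum(~pred & true))    # fire missed
    tn = int(np.sum(~pred & ~true))   # no fire predicted, no fire observed

    predicted_fire = tp + fp
    observed_fire = tp + fn
    return {
        "precision": tp / predicted_fire if predicted_fire else 0.0,
        "recall": tp / observed_fire if observed_fire else 0.0,
        "accuracy": (tp + tn) / pred.size,
        # Commission error: share of predicted fire pixels that are false alarms.
        "commission_error": fp / predicted_fire if predicted_fire else 0.0,
    }

# Toy 2x4 masks (hypothetical values, for illustration only).
true_mask = np.array([[1, 1, 0, 0],
                      [0, 0, 0, 0]])
pred_mask = np.array([[1, 0, 1, 0],
                      [0, 0, 0, 0]])
metrics = pixel_metrics(pred_mask, true_mask)
```

Under this convention the commission error equals one minus the precision, so reducing false alarms and raising precision are the same objective.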
The present study involves designing a machine learning architecture aimed at achieving near-real-time capabilities. To evaluate the effectiveness of our wildfire risk assessment model, we compared it with other advanced models, with the goal of training a system capable of accurately assessing wildfire risk across diverse landscapes.
This paper investigates the wildfire risk through the lens of an image segmentation task, wherein the model assesses the susceptibility of each pixel to fire rather than providing a classification for the entire image, resulting in an image mask that delineates areas prone to fires within the region of interest. The primary objectives of this research are as follows:
To develop and optimize deep learning models for the detection, classification, and segmentation of wildfires using multi-sensor satellite images, focusing on improving real-time prediction capabilities.
To evaluate the performance of various deep learning architectures, including convolutional neural networks (CNNs), U-Net, and autoencoders, in accurately predicting wildfire risk.
To reduce false alarms in wildfire detection by implementing advanced loss functions and optimizing model parameters.
To assess the scalability and applicability of these models in fire-prone regions across different temporal and spatial scales, enhancing wildfire management strategies.
The remainder of this article is organized as follows:
Section 2: Materials provides an overview of the data and tools used in this study. It describes the sources and characteristics of the wildfire data and discusses the deep learning techniques employed for fire detection and segmentation. Subsections include details on the wildfire datasets used and the specific deep learning architectures implemented.
Section 3: Proposed System introduces the system architecture and methodology used for wildfire detection. It details the dataset preparation process, followed by the architectural design of the three deep learning models: autoencoder, U-Net, and convolutional neural network (CNN). This section also discusses the loss functions used in model training to optimize performance.
Section 4: Experimental Results presents the outcomes of various experiments conducted to optimize the CNN model and assess its variability across different configurations. This section provides insights into the tuning of hyperparameters and the model’s response to different training conditions.
Section 5: Results offers a comprehensive evaluation of the models’ performance, comparing the CNN, U-Net, and autoencoder architectures. It includes analyses of feature importance, spatial dependence in the data, and the overall accuracy of the proposed system.
Section 6: Discussion addresses the implications of the findings, highlights the limitations of the current approach, and provides recommendations for future work. This section also discusses potential improvements to the system and alternative methodologies.
Section 7: Conclusion summarizes the key contributions of the study and emphasizes the importance of integrating deep learning techniques with satellite imagery for enhancing wildfire detection and management.
2. Materials
This section provides an overview of the materials used in the study, including the data sources for wildfire detection and the deep learning techniques employed for modeling and prediction.
2.1. Wildfires
Wildfire data were collected from publicly available satellite datasets, including multispectral imagery from the Landsat, Sentinel-1, and Sentinel-2 missions. These satellites were chosen due to their fine spatial, temporal, and spectral resolutions, which are suitable for detecting and monitoring fire-related phenomena such as active fires, burned areas, and smoke. The satellite data covered the period from 2018 to 2020 and included regions known for frequent wildfire occurrences. Data were acquired from the Google Earth Engine (GEE) platform, which provides access to a vast repository of geospatial datasets. However, due to gaps in satellite data and the absence of fire instances in some areas, the data were not globally uniform. To address this, specific samples were selected based on known wildfire events, using the date and location of the incidents to guide the extraction of relevant imagery. This ensured a comprehensive dataset that represents various fire conditions and intensities. To supplement the satellite data, ground-truth information on fire occurrences was obtained from government agencies and wildfire monitoring services. These records helped validate the model predictions by providing reference points for assessing the accuracy of the deep learning models. Additionally, environmental factors such as vegetation type, land cover, and meteorological conditions were considered to improve the accuracy of the models in predicting fire-prone areas.
2.2. Deep Learning
Deep learning techniques were employed to develop models capable of detecting, classifying, and segmenting wildfires from satellite images. Three primary architectures were tested: convolutional neural networks (CNNs), U-Net, and autoencoders. These models were chosen for their proven effectiveness in image analysis tasks, such as semantic segmentation and feature extraction.
Convolutional Neural Networks (CNNs): The CNN architecture consisted of multiple layers designed to progressively extract spatial features from the satellite images. Each layer applied a set of filters to the input data, followed by batch normalization and activation functions to improve the model’s learning capability. The CNN model in this study contained 587,177 trainable parameters, making it suitable for handling complex wildfire detection tasks.
U-Net Architecture: U-Net is a popular model for image segmentation that utilizes an encoder–decoder structure. In this study, the encoder reduced the spatial dimensions of the input data while increasing the number of feature channels, while the decoder expanded the dimensions to reconstruct a segmentation map. The U-Net architecture was tested with different configurations to find an optimal balance between accuracy and computational efficiency.
Autoencoders: Autoencoders are unsupervised learning models that encode the input data into a compressed representation and then decode it back to the original form. In this study, the autoencoder was used as a baseline model for wildfire detection, employing a simpler architecture compared to CNNs and U-Nets. The autoencoder’s ability to learn compact feature representations was evaluated to determine its effectiveness in identifying fire-prone regions.
To optimize the performance of these models, several loss functions, including Dice loss, binary cross entropy (BCE), and focal loss, were tested. The Dice loss function, which measures the overlap between the predicted and ground truth segmentation, was selected as the primary loss function due to its superior performance in handling class imbalance. Hyperparameters such as learning rate, batch size, and the number of epochs were fine-tuned to achieve the best possible accuracy for each model.
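To illustrate the two alternative loss functions mentioned above, the following is a minimal NumPy sketch of binary cross entropy and focal loss, assuming per-pixel fire probabilities `p` and binary labels `g`. The focal-loss settings `gamma=2.0` and `alpha=0.25` are commonly used defaults chosen here for illustration. They are not reported in this study, and the toy arrays are hypothetical.

```python
import numpy as np

def binary_cross_entropy(p, g, eps=1e-7):
    """Mean binary cross entropy between probabilities p and binary labels g."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    g = np.asarray(g, dtype=float)
    return float(np.mean(-(g * np.log(p) + (1.0 - g) * np.log(1.0 - p))))

def focal_loss(p, g, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean focal loss; gamma down-weights pixels that are already well classified."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    g = np.asarray(g, dtype=float)
    p_t = np.where(g == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(g == 1, alpha, 1.0 - alpha)  # class weighting term
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

# Toy imbalanced example: 2 fire pixels out of 100, model predicts "almost no fire".
g = np.zeros(100)
g[:2] = 1.0
p = np.full(100, 0.05)

bce_value = binary_cross_entropy(p, g)
focal_value = focal_loss(p, g)
```

With `gamma=0` and `alpha=0.5`, the focal loss reduces to half the binary cross entropy, which is a convenient sanity check when experimenting with these settings.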
The combination of satellite data and deep learning techniques enabled the development of an advanced system for wildfire detection, capable of real-time monitoring and prediction. The models were trained using a dataset that represented a wide range of fire conditions, which allowed for the evaluation of the models’ robustness across different scenarios.
The CNN architecture comprises multiple blocks, each integrating convolutional, batch normalization, and activation layers, as depicted in Figure 1. The U-Net architecture for semantic segmentation uses convolutional layers to preserve the original size of the image while progressively increasing the number of channels. The CNN consists of four blocks, with the filter counts for each block set as 40, 60, 30, and 1, respectively. The activation functions used in the inner blocks were tanh, while the remaining blocks utilized the ReLU activation function. The final layer employed a sigmoid activation function to transform the outputs of the preceding layers into a single filter. The entire model encompasses 587,177 parameters.
3. Proposed System
The proposed system aims to enhance wildfire detection, classification, and segmentation by integrating advanced deep learning techniques with satellite-based Earth observation data. Given the increasing frequency and intensity of wildfires around the world, timely and accurate fire prediction has become a critical need for effective disaster management and mitigation. Traditional methods relying on spectral indices and manual analysis are often limited by fixed thresholds and environmental variability, making it challenging to achieve consistent accuracy across different regions and fire conditions. To address these limitations, this study introduces a system that leverages deep learning models—specifically convolutional neural networks (CNNs), U-Net architectures, and autoencoders—to process multispectral satellite imagery for improved fire detection and segmentation. The system is designed to automatically identify fire-prone areas and classify various fire-related phenomena, such as active fire fronts, burned areas, and smoke, with a high degree of accuracy. By optimizing the architectures and fine-tuning key hyperparameters, the proposed approach seeks to outperform traditional methods and existing models in terms of both speed and precision.
The system’s architecture is built to handle the complexities of real-time wildfire monitoring, integrating various pre-processing, training, and prediction modules. It includes data acquisition from multiple satellite sensors, model training using ground-truth data for validation, and post-processing techniques to refine the output segmentation maps. The following subsections provide a detailed description of each component of the system, including the data preparation workflow, model architectures, training procedures, and evaluation metrics used to assess the system’s performance.
3.1. Dataset
The NASA Earth Observing System (EOS) is a comprehensive program designed to study Earth’s climate, ecosystems, and atmosphere using a series of advanced satellites and ground-based observations. It focuses on collecting long-term, consistent, and global data for understanding and monitoring environmental and climate processes.
Utilizing Google Earth Engine (GEE), a diverse set of images was extracted from various regions worldwide from 2018 to 2020. The study emphasized fire seasons in Africa, Asia, Australia, Europe, South America, and the United States (Figure 2) with broad geographical and historical coverage, while accounting for considerations related to missing data. In this study, 20 remote sensing data sources were explored, aiming for a comprehensive representation. Table 2 presents details on the spatial and temporal resolution of these features.
Figure 2 below illustrates global patterns and trends in forest loss and associated dynamics across different regions from 2000 to 2020.
Figure 3 and Table 2 provide a comprehensive visualization of environmental variables derived from remote sensing and meteorological data, including topographical parameters (elevation), vegetation indices (LAI, FAPAR, NDVI), climatic factors (land surface temperature, humidity, precipitation, wind components, and air pressure), and soil properties (soil temperature). The figure also includes histograms for variables such as LAI, FAPAR, land surface temperature, soil temperature, precipitation, and humidity, offering statistical insights into their distributions. Additionally, evapotranspiration, fire occurrences, and land cover classifications are represented, highlighting the diversity of datasets used for environmental modeling and analysis. These datasets are critical for studying land-atmosphere interactions, ecological dynamics, and disaster monitoring.
3.2. Autoencoder Architecture
The initial architecture for assessing wildfire risk is an autoencoder. The autoencoder comprises six essential blocks: the first three are convolutional layers that progressively decrease the number of channels, effectively compressing the image, and the subsequent three increase the number of filters, expanding the image. The filter counts for each respective convolutional layer are 40, 20, 5, 5, 20, and 40. All max-pooling and upsampling layers employed a 2 by 2-pixel kernel. The activation function used across all layers was tanh, except for the final layer, which employed a sigmoid activation function to convert the outputs of preceding layers into a single filter. The entire model encompassed 36,596 parameters.
Figure 4 illustrates the architecture of the autoencoder designed to predict fire risk. The model begins with an input layer, followed by three successive blocks of convolutional layers (CONV_2D), activation functions (TANH), and pooling layers (MAX_POOL_2D) to extract and compress spatial features. The compressed representation is then passed through symmetric decoding layers comprising upsampling layers (UPSAMPLING_2D), activation functions (TANH), and convolutional layers (CONV_2D) to reconstruct the data. Finally, a sigmoid activation function in the output layer maps the decoded features to a fire risk probability. This structure demonstrates the use of autoencoders for feature extraction, dimensionality reduction, and reconstruction, tailored for fire risk prediction.
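The compression and expansion path described above can be traced with a small NumPy sketch. The filter counts (40, 20, 5, 5, 20, 40) and the 2 by 2 pooling and upsampling come from the text. The 64 by 64 tile size, the use of "same" padding in the convolutions, and nearest-neighbour upsampling are assumptions made only for illustration, and the helper names are hypothetical.

```python
import numpy as np

ENCODER_FILTERS = [40, 20, 5]   # conv + tanh + 2x2 max pooling
DECODER_FILTERS = [5, 20, 40]   # 2x2 upsampling + tanh + conv

def max_pool_2x2(x):
    """2x2 max pooling over an (H, W, C) array; H and W must be even."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample_2x2(x):
    """Nearest-neighbour 2x2 upsampling of an (H, W, C) array (assumed mode)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def trace_shapes(height, width):
    """Return the (H, W, C) shape after each block, assuming 'same' convolutions."""
    shapes = []
    h, w = height, width
    for filters in ENCODER_FILTERS:
        h, w = h // 2, w // 2          # convolution keeps the size, pooling halves it
        shapes.append((h, w, filters))
    for filters in DECODER_FILTERS:
        h, w = h * 2, w * 2            # upsampling doubles the size
        shapes.append((h, w, filters))
    shapes.append((h, w, 1))           # final sigmoid layer: one fire-risk map
    return shapes

shapes = trace_shapes(64, 64)          # hypothetical 64x64 input tile

# Small demonstration of the pooling/upsampling round trip on a 4x4 array.
tile = np.arange(16, dtype=float).reshape(4, 4, 1)
pooled = max_pool_2x2(tile)
restored = upsample_2x2(pooled)
```

Under these assumptions the bottleneck holds an 8 by 8 map with 5 channels, and the output mask has the same height and width as the input tile.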
3.3. U-Net Architecture
The second model was a U-Net, a well-known architecture highly regarded for its performance in tasks such as biomedical image segmentation. The U-Net architecture consists of five down blocks for compression and four up blocks for expansion. The number of filters for the down blocks ranges from 64 to 1024, while for the up blocks, the filter counts range from 512 to 64. The final layer utilizes a sigmoid activation function to convert the outputs into a single filter.
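The filter ranges given above are consistent with the common U-Net convention of doubling the filter count at each down block and halving it at each up block. The sketch below generates such a schedule. The doubling rule is an assumption, since the text reports only the end points of each range, and the function name is hypothetical.

```python
def unet_filter_schedule(base_filters=64, num_down=5):
    """Filter counts per block, assuming doubling on the way down and halving on the way up."""
    down = [base_filters * 2 ** i for i in range(num_down)]
    up = down[-2::-1]   # mirror of the down path, excluding the deepest block
    return down, up

down_filters, up_filters = unet_filter_schedule()
```

With five down blocks starting at 64 filters, this reproduces the reported ranges: 64 to 1024 on the way down and 512 to 64 on the way up.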
Convolutional Neural Network Architecture
In the context of this research, the convolutional neural network (CNN) was employed as a deep learning model for predicting fire risk and segmenting wildfire regions in satellite images. The CNN architecture used in this study was designed to optimize the performance in semantic segmentation tasks by leveraging multiple layers to extract relevant features from the input data.
The architecture comprises several sequential blocks, each integrating convolutional layers, batch normalization, and activation functions to progressively refine the feature representation. As depicted in Figure 5, the CNN architecture utilized in this study consists of four main blocks. The filter counts for each block were set to 40, 60, 30, and 1, respectively, to adjust the network’s capacity for learning different levels of features. This arrangement first expands and then contracts the number of feature maps, enabling the model to capture more complex patterns related to wildfire occurrences before reducing them to a single output map. For activation functions, the architecture used the hyperbolic tangent (tanh) function in the inner blocks to maintain the range of the output while allowing negative values, which can help in capturing intricate details. The remaining blocks employed the Rectified Linear Unit (ReLU) activation function, known for its efficiency in deep learning tasks due to reduced likelihood of vanishing gradients. The final layer used a sigmoid activation function to transform the outputs into a single filter, representing the probability of each pixel belonging to the fire risk class. The overall CNN model encompasses 587,177 trainable parameters. This parameter count was selected to achieve a balance between model complexity and computational efficiency, making the architecture suitable for real-time wildfire detection tasks while maintaining a high level of accuracy.
U-Net Architecture for Semantic Segmentation
The U-Net architecture, which was also tested in this study, is designed specifically for image segmentation tasks. It uses an encoder–decoder structure where convolutional layers in the encoder path reduce the spatial dimensions while increasing the number of channels. The U-Net maintained the original image size by utilizing convolutional layers with padding, allowing for the retention of spatial information throughout the network. In the decoder path, the network expands the spatial dimensions while reducing the number of channels, enabling the reconstruction of a segmented output that matches the input size. This process facilitates the precise localization of fire risk areas in the satellite images. The comparison between the CNN and U-Net models revealed that while both architectures performed well in segmenting wildfire regions, the CNN architecture demonstrated superior performance with fewer parameters, making it more suitable for real-time applications.
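The role of padding in retaining the image size can be illustrated with a naive single-channel convolution. This is a minimal sketch assuming zero padding and an odd-sized kernel. It is not the implementation used in the study, and the 8 by 8 input and 3 by 3 averaging kernel are hypothetical.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Naive single-channel 2D cross-correlation with zero 'same' padding (odd kernel sizes)."""
    kh, kw = kernel.shape
    pad_h, pad_w = kh // 2, kw // 2
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)), mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # Each output pixel is the weighted sum of its kh x kw neighbourhood.
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

image = np.ones((8, 8))
kernel = np.full((3, 3), 1.0 / 9.0)   # 3x3 averaging filter
same_output = conv2d_same(image, kernel)

# Without padding, a 3x3 kernel would shrink an 8x8 input to 6x6.
valid_shape = (image.shape[0] - kernel.shape[0] + 1,
               image.shape[1] - kernel.shape[1] + 1)
```

The padded output has the same height and width as the input, which is what allows a per-pixel mask to line up with the original image. Zero padding does attenuate values near the border, as the corner pixel of `same_output` shows.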
Loss Function
Several loss functions were tested in the U-Net model for semantic segmentation, including mean squared error (MSE), binary cross entropy (BCE), and focal loss. These loss functions are commonly used in machine learning for various tasks, such as regression (MSE), binary classification (BCE), and class imbalance handling (focal loss). However, they yielded unsatisfactory results in this study, indicating their lack of suitability for the specific task of wildfire segmentation. The primary goal of the optimization model in this context was to minimize the discrepancy between the predicted segmentation map and the ground truth, thus maximizing the accuracy of the segmentation. The optimization problem can be expressed as

$$\min_{\theta} \; L(P, G)$$

where $L$ represents the loss function, $P$ denotes the predicted segmentation, $G$ is the ground truth segmentation, and $\theta$ is the set of parameters (weights) of the model. The choice of an appropriate loss function plays a crucial role in effectively optimizing the model.
After experimenting with different loss functions, Dice loss, derived from the Dice coefficient, demonstrated optimal performance and was consequently implemented across the three previously presented architectures. The Dice loss function is particularly suited for segmentation tasks as it measures the overlap between the predicted and ground truth segmentation. In its standard form, it is defined as

$$L_{\mathrm{Dice}} = 1 - \frac{2 \sum_{i=1}^{N} p_i \, g_i}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} g_i}$$

where $p_i$ and $g_i$ represent the predicted and ground truth values, respectively, at each pixel $i$, and $N$ is the total number of pixels. The Dice loss is effective for addressing class imbalance, which is common in semantic segmentation tasks where the target class (wildfire) is often much smaller than the background.
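The behaviour of the Dice loss under class imbalance can be sketched in a few lines of NumPy. The code assumes the standard soft Dice formulation, one minus twice the overlap divided by the sum of predicted and ground truth values. The small smoothing constant `eps` is an addition for numerical safety and is not taken from the study, and the toy mask is hypothetical.

```python
import numpy as np

def dice_loss(p, g, eps=1e-7):
    """Soft Dice loss: 1 - 2*sum(p*g) / (sum(p) + sum(g)), with smoothing eps (assumed)."""
    p = np.asarray(p, dtype=float).ravel()
    g = np.asarray(g, dtype=float).ravel()
    intersection = np.sum(p * g)
    return float(1.0 - (2.0 * intersection + eps) / (np.sum(p) + np.sum(g) + eps))

# Toy 100x100 ground truth with only 5 fire pixels (hypothetical values).
ground_truth = np.zeros((100, 100))
ground_truth[0, :5] = 1.0

all_background = np.zeros_like(ground_truth)   # model that never predicts fire
pixel_accuracy = float(np.mean(all_background == ground_truth))

loss_all_background = dice_loss(all_background, ground_truth)
loss_perfect = dice_loss(ground_truth, ground_truth)
```

A model that never predicts fire reaches a pixel accuracy above 99.9% on this mask, yet its Dice loss is close to 1, the worst possible value. This is why an overlap-based loss is better suited to rare targets than a per-pixel average.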
Boundary Conditions:
For the optimization problem, the boundary conditions were set such that the values of the predicted segmentation probabilities P fall within the range [0, 1], ensuring valid probability outputs. Additionally, the Dice loss function ensures that the output is continuous and differentiable, which is a necessary condition for gradient-based optimization methods used in training the deep learning models.
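Both conditions can be checked numerically. The sketch below assumes the standard soft Dice formulation with a small smoothing constant, applies a sigmoid to random logits to obtain probabilities, and compares an analytic gradient of the loss with central finite differences. The random inputs and function names are illustrative only.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued logits to probabilities strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def dice_loss(p, g, eps=1e-7):
    """Soft Dice loss with smoothing constant eps (assumed, for numerical safety)."""
    return 1.0 - (2.0 * np.sum(p * g) + eps) / (np.sum(p) + np.sum(g) + eps)

def dice_grad(p, g, eps=1e-7):
    """Analytic gradient of the soft Dice loss with respect to each p_i."""
    numerator = 2.0 * np.sum(p * g) + eps
    denominator = np.sum(p) + np.sum(g) + eps
    # Quotient rule: d(numerator)/dp_i = 2*g_i and d(denominator)/dp_i = 1.
    return -(2.0 * g * denominator - numerator) / denominator ** 2

rng = np.random.default_rng(0)
logits = rng.normal(size=16)
g = (rng.random(16) > 0.7).astype(float)   # sparse toy ground truth
p = sigmoid(logits)

# Central finite differences, one coordinate at a time.
step = 1e-6
numeric_grad = np.array([
    (dice_loss(p + step * e, g) - dice_loss(p - step * e, g)) / (2.0 * step)
    for e in np.eye(p.size)
])
analytic_grad = dice_grad(p, g)
```

Agreement between the two gradients indicates that the loss is smooth in the predicted probabilities, and the sigmoid keeps every prediction inside the valid probability range.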
6. Discussion
While traditional spectral indices, such as the Normalized Burn Ratio (NBR) and Fire Radiative Power (FRP), are indeed effective for rapid detection and monitoring of various fire-related phenomena (e.g., smoke, active fires, burned areas, and fire severity), they have limitations in accurately predicting fire occurrences in complex scenarios. Spectral indices rely heavily on fixed thresholds and specific spectral characteristics, which may not be reliable in all cases, especially under varying atmospheric conditions, land cover types, or mixed fire severity levels. The motivation for developing deep learning models for fire detection lies in their ability to learn complex patterns in the data without relying on predefined thresholds. These models can integrate multiple sources of information (e.g., multispectral data from different satellite sensors) and account for non-linear relationships that are difficult to capture using spectral indices alone. Additionally, deep learning models can adapt to new data over time, potentially improving their predictive capabilities as more labeled data become available.
To ensure practical applicability, the proposed models, particularly the CNN that achieved an 82% detection accuracy, can be integrated into existing wildfire management systems in several ways. First, these models can be embedded into satellite-based early warning platforms, providing real-time fire detection alerts to firefighting teams and disaster management authorities. Second, they can be combined with geographic information systems (GISs) to create detailed fire risk maps, aiding resource allocation and decision-making. Third, the adaptability of deep learning models makes them suitable for integration with drone- or ground-based sensor networks, allowing for real-time fire monitoring and dynamic updates in high-risk areas. Finally, partnerships with wildfire management agencies could facilitate the deployment of these models into operational workflows, such as automated resource planning and evacuation strategies. Such integrations would not only enhance the speed and accuracy of fire detection but also improve the overall efficiency and effectiveness of wildfire response systems. Continued research and collaboration will be essential to address challenges like computational resource demands and model generalization across diverse geographies.
To validate the effectiveness of the proposed models, including the convolutional neural network (CNN), U-Net, and autoencoder, a detailed comparison with traditional approaches was performed. The CNN model, which achieved a fire detection accuracy of 82%, demonstrated improved robustness in identifying fire-prone areas compared to spectral indices under diverse conditions. The model was designed to optimize performance by incorporating advanced techniques such as batch normalization, dropout layers, and regularization, which contribute to reducing overfitting and improving generalization to new data.
6.1. Comparative Analysis and Model Performance
Table 4 compares the three deep learning models used in this study, highlighting their strengths and limitations. The CNN outperformed the U-Net and autoencoder, achieving a fire ratio score of 0.82 and a no-fire ratio score of 0.87. This performance advantage can be attributed to the CNN’s ability to capture more intricate spatial features due to its higher number of parameters (16 times more than the autoencoder). Incorporating dropout and batch normalization layers into the CNN further enhanced its regularization capabilities, contributing to its superior performance.
The U-Net model’s relatively average performance in fire detection can be linked to its architectural design. The use of Conv2DTranspose layers for image upscaling, while useful in some segmentation tasks, may not be as effective in capturing fine-grained details necessary for accurate fire detection compared to the CNN. In contrast, the autoencoder’s simpler upscaling method using UpSampling2D, along with its fewer parameters, limited its ability to learn complex patterns, resulting in poorer performance compared to the other models.
6.2. Justification for Model Development over Traditional Approaches
While spectral indices remain valuable tools for initial fire detection, the optimized deep learning models offer advantages in adapting to various environmental conditions and integrating multi-sensor data. The ability of these models to generalize across different datasets and handle varying data quality makes them suitable for large-scale applications. However, the use of deep learning models does not exclude spectral indices; rather, these approaches can complement each other. By integrating spectral-index-based pre-processing steps with deep learning models, the combined approach could provide a more accurate and faster detection pipeline.
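One way such a combined pipeline could look is to compute a spectral index and append it as an extra input channel for the network. The sketch below uses the standard definition of the Normalized Burn Ratio, (NIR - SWIR2) / (NIR + SWIR2). The band ordering, the toy reflectance values, and the function names are hypothetical, and this pre-processing step is not described in the study itself.

```python
import numpy as np

def normalized_burn_ratio(nir, swir2, eps=1e-9):
    """NBR = (NIR - SWIR2) / (NIR + SWIR2); eps guards against division by zero."""
    nir = np.asarray(nir, dtype=float)
    swir2 = np.asarray(swir2, dtype=float)
    return (nir - swir2) / (nir + swir2 + eps)

def add_nbr_channel(bands, nir_index, swir2_index):
    """Append NBR as an extra channel to an (H, W, C) reflectance array."""
    nbr = normalized_burn_ratio(bands[..., nir_index], bands[..., swir2_index])
    return np.concatenate([bands, nbr[..., np.newaxis]], axis=-1)

# Toy 2x2 scene with channels (red, NIR, SWIR2); the band order is an assumption.
bands = np.zeros((2, 2, 3))
bands[..., 0] = 0.1                  # red reflectance everywhere
bands[..., 1] = 0.5                  # healthy vegetation: high NIR
bands[..., 2] = 0.1                  # healthy vegetation: low SWIR2
bands[1, 1, 1:] = [0.1, 0.3]         # one burned pixel: low NIR, high SWIR2

stacked = add_nbr_channel(bands, nir_index=1, swir2_index=2)
```

In this toy scene the vegetated pixels receive an NBR near 0.67 and the burned pixel an NBR of about -0.5, so the added channel gives the network a physically motivated cue without imposing a fixed threshold.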
The findings from this study indicate that the CNN, with its optimized architecture and regularization techniques, offers a viable alternative to traditional spectral index methods, especially in complex scenarios where conventional approaches struggle. Continued research is needed to further improve the integration of spectral indices and machine learning techniques, ensuring that the developed models can outperform existing methods consistently.