Applying Deep Learning for Wildfire Identification: Economical and Accessible Solutions Leveraging Small Datasets

Shrivastava, Aarav M.; Shrivastava, Manish

doi:10.3390/atmos16020131

Open Access Article


Aarav M. Shrivastava ¹ and Manish Shrivastava ²,*

¹ Richland High School, Richland, WA 99352, USA
² Pacific Northwest National Laboratory, Richland, WA 99352, USA
* Author to whom correspondence should be addressed.

Atmosphere 2025, 16(2), 131; https://doi.org/10.3390/atmos16020131

Submission received: 31 December 2024 / Revised: 17 January 2025 / Accepted: 24 January 2025 / Published: 26 January 2025

(This article belongs to the Section Air Quality)


Abstract

Wildfires significantly affect human health, air quality, visibility, weather, and climate, and they cause substantial economic losses. While state- and county-operated air quality monitors provide critical insights during wildfires, they are not available in all regions. This highlights the need for affordable, accessible tools that allow the general public to assess air quality impacts. In this study, we apply machine learning with deep neural networks to rapidly diagnose air quality from sky images taken at the Pacific Northwest National Laboratory in Richland, WA, USA. Using a convolutional neural network (CNN) framework, we trained a deep learning model to classify sky images into air quality index (AQI) categories. By leveraging transfer learning, our approach fine-tunes a pre-trained model on a small dataset of sky images, significantly reducing training time while maintaining high accuracy. Our results demonstrate the potential of deep learning to provide rapid air quality diagnostics during wildfire episodes, offering early warnings to the public and enabling timely mitigation strategies, particularly for vulnerable populations. Additionally, we show that lower respiratory infections pose the highest health risk during acute smoke exposures. Reactive oxygen species (ROS) associated with wildfire particles further exacerbate health risks by triggering inflammation and other adverse effects.

Keywords:

biomass burning; machine learning; neural networks; total sky imager; air quality; climate; human health

1. Introduction

Wildfire activity (both the frequency and duration of large wildfires) has increased in recent decades [1,2,3] in many regions globally, including the Western United States, due to climate change and expanding human activity. Fires emit hazardous air pollutants, including fine particles (PM_2.5, less than 2.5 μm in diameter), carbon monoxide, and volatile organic compounds, which can affect human and ecosystem health [4,5,6,7] and the Earth’s radiative forcing [8]. Ultrafine particles (less than 50 nm in diameter) formed by the atmospheric chemical aging of smoke gases were recently shown to invigorate deep clouds and intensify heavy precipitation, affecting weather and climate systems [9]. Wildfire smoke can travel thousands of miles, impacting large regions and communities. To assess compliance with the National Ambient Air Quality Standards (NAAQS), the Environmental Protection Agency (EPA) collects air quality monitoring data in collaboration with state, local, and tribal agencies and makes these data publicly available [10]. Fine particles (PM_2.5) associated with wildfire smoke can significantly impact human health, causing lower respiratory infections, chronic obstructive pulmonary disease (COPD), and lung cancer [11]. Studies indicate that redox-active components of wildfire smoke particles, such as polycyclic aromatic hydrocarbons and transition metals, generate reactive oxygen species (ROS) that can cause cardiovascular impairment, which is responsible for a significant portion of air pollution-related mortality [12].

Acute exposure to wildfire smoke can offset the benefits of controlling human-caused air pollution. Continuous monitoring of air pollution, including PM_2.5, is therefore crucial, particularly in vulnerable communities globally. However, traditional air quality monitoring instruments are often expensive and require specialized skills to operate, making their widespread deployment over large regions impractical. Low-cost PM_2.5 sensors are an alternative, but they need to be individually calibrated for each source in their intended environment [13]. This creates a need for fast, accessible, low-cost air quality detection methods that can provide early warnings of poor air quality. Such warning systems can help people make informed decisions about their health and safety during wildfires.

Machine learning is a promising approach for wildfire science and management, aiding in understanding fire processes across multiple scales [14]. When wildfire data for training a machine learning model are limited, transfer learning and data augmentation techniques can be applied [15,16]. In this work, we use deep CNNs to rapidly diagnose air quality by combining hourly PM_2.5 data from the Washington State Department of Ecology [17] with local sky images collected by a total sky imager (TSI) [18] at the Pacific Northwest National Laboratory in Richland, WA, USA (46.34° N, 119.27° W). Using a transfer learning approach combined with data augmentation, we adapt a deep neural network to rapidly detect air quality indices (AQIs) from a relatively small set of sky images. Our approach can quickly detect air quality during intense wildfires, providing an economical early warning system to reduce health risks from smoke exposure. Additionally, we calculate the health risk of PM_2.5 using two methods: (1) integrated exposure–response functions relating PM_2.5 to health endpoints, and (2) the calculation of ROS potential during and after smoke influence. Our results highlight the significant increase in health risks during wildfire smoke events.

2. Materials and Methods

In September 2020, wildfires burned in Oregon, USA, and their smoke was transported over hundreds of kilometers. In Kennewick, WA, PM_2.5 concentrations were measured hourly, with daily averages exceeding 200 μg/m³, as discussed later. This study aims to develop a cost-effective air quality detection method that is useful during heavy pollution episodes such as wildfires. We leverage two key datasets for a site in WA, USA during and after the intense wildfires in September 2020: (1) air quality indices derived from hourly PM_2.5 measurements by the Washington State Department of Ecology [19], and (2) hourly daytime images of the sky generated by a total sky imager (TSI) [20] at the Pacific Northwest National Laboratory (PNNL), less than 15 miles from the Kennewick site. Air quality indices are categorized based on PM_2.5 concentrations as Class 1, Good (0 ≤ PM_2.5 ≤ 12 μg/m³); Class 2, Moderate (12.1 ≤ PM_2.5 ≤ 35.4 μg/m³); Class 3, Unhealthy (55.5 ≤ PM_2.5 ≤ 150.4 μg/m³); and Class 4, Very Unhealthy (150.5 ≤ PM_2.5 ≤ 250.4 μg/m³) during 10–20 September 2020. During the daytime, the TSI captured a total of 104 sky images, which were categorized into these four AQI classes. We train a supervised machine learning model to classify TSI images into these AQI categories based on the corresponding PM_2.5 measurements. Although the dataset is small for training a deep learning model, we demonstrate the utility of a “transfer learning” approach leveraging a previously trained MobileNetV2 [21] model. [MobileNetV2 (Mobile: optimized for mobile and embedded devices; Net: neural network architecture; V2: version 2) is an efficient CNN model designed for resource-constrained environments like smartphones and IoT devices. It is simpler and smaller than the latest MobileNetV4 [22] model, making it a practical choice for older hardware and limited resources. Since MobileNetV4 offers improved efficiency on modern hardware, it may be explored in future studies for classifying sky images.]
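The four class definitions above can be written as a simple lookup from an hourly PM_2.5 value to a class label. The sketch below is hypothetical: the function name `aqi_class` and the rounding to 0.1 μg/m³ before comparison are assumptions, not taken from the authors’ notebook. It returns `None` for concentrations outside the four classes, including 35.5–55.4 μg/m³ (the range the EPA AQI labels “Unhealthy for Sensitive Groups”), which is not among the classes listed above.

```python
from typing import Optional

# (class id, label, lower bound, upper bound) in µg/m³, as defined in the text.
AQI_CLASSES = [
    (1, "Good", 0.0, 12.0),
    (2, "Moderate", 12.1, 35.4),
    (3, "Unhealthy", 55.5, 150.4),
    (4, "Very Unhealthy", 150.5, 250.4),
]


def aqi_class(pm25: float) -> Optional[int]:
    """Map an hourly PM2.5 concentration (µg/m³) to the study's AQI class (1-4).

    Returns None for values outside the four classes, e.g. 35.5-55.4 µg/m³
    or above 250.4 µg/m³.
    """
    value = round(pm25, 1)  # assumption: compare at 0.1 µg/m³ resolution
    for class_id, _label, low, high in AQI_CLASSES:
        if low <= value <= high:
            return class_id
    return None


# Hourly values quoted for the Figure 1 examples.
for pm in (5.7, 19.3, 111.0, 191.9):
    print(pm, aqi_class(pm))
```

Labelling each TSI image then amounts to applying this lookup to the PM_2.5 value measured in the same hour.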

Below, we briefly describe the measurement methods for the PM_2.5 and sky image datasets used in our study. The Beta Attenuation Monitor (BAM) instruments are Federal Equivalent Method (FEM) monitors used by the WA Department of Ecology to measure PM_2.5 concentrations in near-real-time for regulatory and public information purposes [19]. The total sky imager (TSI) at PNNL captures hemispheric images of the sky during daylight using a solid-state CCD imaging camera that looks down onto a heated hemispherical mirror [20]. The mirror reflects the sky image into the camera lens, and a shadow band guided by the solar ephemeris blocks direct solar radiation. Images are captured at user-defined intervals (hourly images were used in this study) and saved as JPEG files for analysis.

We leverage transfer learning based on MobileNetV2 [21] to efficiently train our machine learning model with limited air quality data. MobileNetV2 was originally trained on the ImageNet (ILSVRC-2012) dataset, which contains approximately 1.2 million images spanning 1000 categories, from which it learned diverse feature representations. When the model is adapted to new datasets, these learned features are reused, reducing training time and data requirements while maintaining accuracy. We adapted MobileNetV2 to classify TSI images into four AQI categories (Good, Moderate, Unhealthy, and Very Unhealthy) by adding two trainable layers: layer 1 with 1024 neurons and rectified linear unit (ReLU) activation, and layer 2 with four neurons and a softmax activation [23] corresponding to the AQI classes. The output of the second layer is the probability that a given sky image belongs to each of the four AQI classes. Our transfer learning code, available on GitHub [24], was developed on the Google Colab platform [25] and can be easily run on any local computer for rapid air quality prediction.
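The architecture described above can be sketched in Keras as follows. This is a minimal illustration, not the authors’ notebook [24]. The function name `build_aqi_classifier`, the 224 × 224 input size, the pixel rescaling to [−1, 1], and the global-average-pooling step between the frozen base and the 1024-unit layer are all assumptions. The optimizer and loss follow the settings described in this section: Adam with a learning rate of 0.0001 and categorical cross-entropy.

```python
import tensorflow as tf

NUM_CLASSES = 4               # Good, Moderate, Unhealthy, Very Unhealthy
INPUT_SHAPE = (224, 224, 3)   # assumption: MobileNetV2's default ImageNet size


def build_aqi_classifier(weights="imagenet"):
    """Frozen MobileNetV2 base plus two trainable layers
    (1024-unit ReLU layer, then a 4-unit softmax layer)."""
    # Pre-trained convolutional base with its ImageNet classification head removed.
    base = tf.keras.applications.MobileNetV2(
        input_shape=INPUT_SHAPE, include_top=False, weights=weights)
    base.trainable = False  # freeze every pre-trained layer

    inputs = tf.keras.Input(shape=INPUT_SHAPE)
    # Assumption: scale raw 0-255 pixels to [-1, 1], the range MobileNetV2 expects.
    x = tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)
    x = base(x, training=False)                      # keep BatchNorm in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)  # assumption: pool to a 1280-d vector
    x = tf.keras.layers.Dense(1024, activation="relu")(x)                  # trainable layer 1
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)  # trainable layer 2
    model = tf.keras.Model(inputs, outputs)

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

Calling `build_aqi_classifier()` downloads the ImageNet weights. Passing `weights=None` builds the same graph with random weights, which is useful for checking shapes. Under the pooling assumption, only the two new dense layers are trainable. The first has 1280 × 1024 + 1024 = 1,311,744 parameters and the second has 1024 × 4 + 4 = 4100, for a total of 1,315,844.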

3. Results

Our dataset comprises a total of 104 images, which we split into a training set (80% of the images) and a validation set (20%). Holding out a validation set enables evaluation of intermediate model performance during training, guides hyperparameter tuning, and helps detect and prevent overfitting to the training data [26]. Given the limited size of our training data, we used Keras’s ImageDataGenerator class [27] to augment the dataset during training with techniques such as horizontal flipping, shearing, and zooming. For each epoch (a complete pass over the training data), ImageDataGenerator produces approximately 64 augmented images (two batches of 32, given our batch size and training set size). These images are generated in memory during training, for a total of 1920 images across 30 epochs.
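The per-epoch and total image counts quoted above follow from simple arithmetic. The hypothetical helper below assumes that the number of steps per epoch equals the training-set size divided by the batch size, rounded down, so the final partial batch is skipped. Under that assumption, it reproduces the reported 64 images per epoch and 1920 images over 30 epochs. In Keras, the augmentations themselves correspond to the `ImageDataGenerator` arguments `horizontal_flip`, `shear_range`, and `zoom_range`, and the split corresponds to `validation_split`. The magnitudes used in the study are not stated here.

```python
def augmentation_budget(n_images: int, val_fraction: float = 0.2,
                        batch_size: int = 32, epochs: int = 30) -> dict:
    """Image counts for an 80/20 split with on-the-fly augmentation.

    Assumes steps_per_epoch = n_train // batch_size, i.e. only full batches
    of augmented images are drawn each epoch.
    """
    n_train = round(n_images * (1 - val_fraction))  # 104 * 0.8 = 83.2 -> 83
    steps_per_epoch = n_train // batch_size          # 83 // 32 = 2
    per_epoch = steps_per_epoch * batch_size         # 2 * 32 = 64
    return {
        "n_train": n_train,
        "n_val": n_images - n_train,
        "steps_per_epoch": steps_per_epoch,
        "images_per_epoch": per_epoch,
        "images_total": per_epoch * epochs,          # 64 * 30 = 1920
    }


print(augmentation_budget(104))
```

A split of 84 training images gives the same result, because 84 // 32 is also 2.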

We first froze the pre-trained layers of MobileNetV2. In architectures like MobileNetV2, the final stage is a fully connected “top layer” that maps high-level features to output predictions, combining all learned representations into a single predictive distribution (over the 1000 ImageNet classes in the original model).

To adapt MobileNetV2 to our datasets, we replaced its top fully connected layer with a new trainable layer of 1024 neurons using ReLU activation. This was followed by a second trainable layer with four neurons and softmax activation, which outputs probabilities summing to 1 across the four AQI classes. Transfer learning was applied by training only these newly added layers while reusing the frozen pre-trained MobileNetV2 layers.

We trained the model using the Adam optimizer and categorical cross-entropy loss, which is commonly used for multi-class classification. Categorical cross-entropy quantifies the difference between the predicted and true probability distributions (here, the one-hot AQI class labels). The training data were reshuffled every epoch so that batches were not presented in a fixed order, which helps reduce overfitting and improve generalization. Given the use of transfer learning, we set a small learning rate of 0.0001 to enable gradual adaptation to the new task [28].
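For N images and K = 4 classes, categorical cross-entropy is L = −(1/N) Σ_n Σ_k y_{n,k} log p_{n,k}, where y is the one-hot label and p is the softmax output. The NumPy sketch below computes this quantity directly; the helper name is hypothetical. As a reference point, a model that assigns equal probability to all four classes has a loss of ln 4 ≈ 1.39. The initial losses of ~1.24 (training) and ~1.10 (validation) reported for Figure 2b therefore start only somewhat below this chance level.

```python
import numpy as np


def categorical_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean categorical cross-entropy over a batch.

    y_true: one-hot labels, shape (n, k); y_prob: predicted probabilities,
    shape (n, k), each row summing to 1.
    """
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 so log() stays finite.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    per_sample = -np.sum(y_true * np.log(y_prob), axis=-1)
    return float(np.mean(per_sample))


# Only the probability of the true class matters: -ln(0.62) for a true
# "Unhealthy" label predicted with 62% confidence.
print(categorical_cross_entropy([[0, 0, 1, 0]], [[0.01, 0.01, 0.62, 0.36]]))
# A uniform guess over four classes gives ln(4).
print(categorical_cross_entropy([[1, 0, 0, 0]], [[0.25, 0.25, 0.25, 0.25]]))
```

Consider the unseen 2018 image discussed in Section 3.1.2, where the model assigned 0.62 to the true class. That prediction contributes −ln 0.62 ≈ 0.48 to the loss. The split of the remaining probability (here an arbitrary 0.01/0.01/0.36) does not change this value.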

All experiments were conducted in a Google Colab notebook [29], configured with Ubuntu 22.04.3 LTS (Jammy Jellyfish). The environment used Python 3.10.12 along with TensorFlow 2.17.1, scikit-learn 1.6.0, NumPy 1.26.4, Matplotlib 3.8.0, and Seaborn 0.13.2.

The region around PNNL experienced intense wildfire smoke from Oregon between 9 a.m. local time on 11 September and 6 p.m. local time on 18 September 2020. Clear-sky conditions were observed before and after this period (9–10 September and 19–20 September). Figure 1 shows the four air quality index (AQI) classes analyzed, represented by total sky imager (TSI) images from PNNL and PM_2.5 monitoring data from the WA Department of Ecology.

Good AQI: Represented by a TSI image captured on 22 September 2020, at 22 UTC, with a PM_2.5 concentration of 5.7 μg/m³.
Moderate AQI: Represented by a TSI image captured on 10 September 2020, at 15 UTC, with a PM_2.5 concentration of 19.3 μg/m³.
Unhealthy AQI: Represented by a TSI image captured on 17 September 2020, at 15 UTC, with a PM_2.5 concentration of 111.0 μg/m³.
Very Unhealthy AQI: Represented by a TSI image captured on 13 September 2020, at 16 UTC, with a PM_2.5 concentration of 191.9 μg/m³.

Figure 1 illustrates the four air quality index (AQI) classes determined from PM_2.5 measurements by the WA Department of Ecology, showing strong correspondence with changes in the sky images captured by the TSI at PNNL.

3.1. Training the Model

Since the convolutional layers of the MobileNetV2 model were frozen, training via transfer learning only updates the parameters (weights and biases) of the two newly added layers for adapting the model to our sky image dataset. Figure 2 shows rapid convergence, with training and validation accuracies quickly rising from low initial values to over 90%.

3.1.1. Discussing Model Performance

Figure 2a shows the evolution of training and validation accuracy over 30 training epochs. Training accuracy increases from ~42% to ~92%, while validation accuracy improves from ~57% to ~91%. The nearly parallel rise indicates consistent learning across both sets, with no strong evidence of overfitting.

Figure 2b displays the corresponding loss curves. Training loss decreases from ~1.24 to ~0.26, and validation loss drops from ~1.10 to ~0.33. The similar downward trends suggest that the model learns meaningful features rather than memorizing the data. The small gap between training and validation metrics further suggests that the model generalizes well to the held-out validation images.

Overall, the results indicate that the network effectively classifies the data while maintaining stable validation performance. The convergence of training and validation curves in the final epochs reflects a good balance between underfitting and overfitting. Testing the model on more unseen images would further validate its robustness.

3.1.2. Testing the Model

We evaluated the computational efficiency of our model by measuring the wall-clock time needed to classify a batch of 40 test images, using the IPython “magic” command %timeit in Colab. The average execution time was 2.29 s ± 1.23 s per loop (mean ± standard deviation of 7 runs), or roughly 0.06 s per image. The relatively large standard deviation likely reflects variability in resource allocation within the shared Colab environment.
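`%timeit` is an IPython feature. Outside a notebook, the standard-library `timeit` module gives comparable statistics. The sketch below is hypothetical: the helper name, and the idea of passing a Keras model’s `predict` method, are illustrative. It mirrors the “7 runs, 1 loop each” setting quoted above and also reports the mean time per image.

```python
import statistics
import timeit


def time_inference(predict_fn, batch, repeat=7):
    """Time `repeat` runs of predict_fn(batch), one call per run, and return
    (mean seconds per run, standard deviation in seconds, mean seconds per item)."""
    runs = timeit.repeat(lambda: predict_fn(batch), number=1, repeat=repeat)
    mean = statistics.mean(runs)
    std = statistics.pstdev(runs)  # population standard deviation across runs
    return mean, std, mean / len(batch)


# Stand-in "model" for demonstration; in practice pass model.predict and an
# array of 40 preprocessed sky images.
mean, std, per_image = time_inference(lambda b: [v * 2 for v in b], list(range(40)))
print(f"{mean:.6f} s +/- {std:.6f} s per run, {per_image:.8f} s per image")
```

For a NumPy image array, `len()` returns the first dimension, so the per-image figure is the batch time divided by 40.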

Figure 3 illustrates the performance of our trained model on representative images from the validation set, showing one randomly selected example from each of the four AQI classes. Displaying the true AQI class of each image alongside the prediction allows a direct visual comparison between the model-predicted and ground-truth classifications.

The fifth (rightmost) image represents an entirely unseen example from the August 2018 wildfires, deliberately excluded from both model training and validation. This image corresponds to a measured PM_2.5 concentration classified as Unhealthy. The model successfully predicts the correct class, assigning a 62% probability to Unhealthy and a 36% probability to Very Unhealthy AQI.

Using statistical evaluation methods [30], we assessed the model’s performance on the unseen 2018 wildfire dataset, which comprises 10 test images per AQI class (40 images in total), with classes assigned from the measured PM_2.5 concentrations.

The confusion matrix (Figure 4a) shows the true vs. predicted class labels, highlighting classification accuracy and misclassification patterns [31]. The model performed strongly for the wildfire-affected classes, correctly classifying all 10 Class 3 (Unhealthy) images and 7 of the 10 Class 4 (Very Unhealthy) images. Misclassifications were more frequent between the cleaner-air classes, Class 1 (Good) and Class 2 (Moderate), suggesting that additional training data are needed to better distinguish these categories.

The ROC curves (Figure 4b) assess the model’s discrimination ability using the area under the curve (AUC). Excellent discrimination was observed for Class 3 (AUC = 0.97) and Class 4 (AUC = 0.93), while Class 1 (AUC = 0.79) and Class 2 (AUC = 0.81) showed moderate performance, consistent with the confusion matrix.

The model’s performance was further evaluated using the F1 score, the harmonic mean of precision and recall, which is particularly informative for imbalanced datasets [30]. For the 2018 dataset, the weighted F1 score was 0.63, reflecting moderate overall performance. Table 1 provides detailed metrics, including precision, recall, and F1 scores for each class. The highest F1 scores were observed in Class 3 (0.80) and Class 4 (0.74), with lower scores for Class 1 (0.44) and Class 2 (0.56). These trends align with the confusion matrix and ROC analysis, demonstrating better model performance in identifying polluted conditions.

The weighted F1 score averages the per-class F1 scores, weighting each by the number of true instances in that class. Because every class here contains 10 images, it equals the unweighted mean of the four per-class scores. These results suggest that the model is well suited to identifying wildfire-related pollution and may generalize to other wildfire events in the same region in different years.
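The evaluation steps above (confusion matrix, one-vs-rest ROC AUC, and weighted F1) can be reproduced with scikit-learn, which is listed among the study’s software. Several details in this sketch are assumptions: the helper name `evaluate`, and the use of the arg-max softmax output as the predicted class. The example data in the usage lines are synthetic and are not the 2018 results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.preprocessing import label_binarize

CLASSES = [1, 2, 3, 4]  # Good, Moderate, Unhealthy, Very Unhealthy


def evaluate(y_true, y_prob):
    """Return (confusion matrix, per-class AUC dict, weighted F1, macro F1).

    y_true: integer AQI classes (1-4); y_prob: (n_images, 4) softmax outputs
    whose columns follow the order of CLASSES.
    """
    y_prob = np.asarray(y_prob, dtype=float)
    # Predicted class = column with the highest softmax probability.
    y_pred = np.asarray(CLASSES)[np.argmax(y_prob, axis=1)]
    cm = confusion_matrix(y_true, y_pred, labels=CLASSES)  # rows: true, cols: predicted

    # One-vs-rest ROC AUC: each class scored against the other three.
    y_bin = label_binarize(y_true, classes=CLASSES)
    auc = {c: float(roc_auc_score(y_bin[:, i], y_prob[:, i]))
           for i, c in enumerate(CLASSES)}

    # Weighted F1 weights each class by its number of true instances.
    f1_weighted = f1_score(y_true, y_pred, labels=CLASSES, average="weighted")
    f1_macro = f1_score(y_true, y_pred, labels=CLASSES, average="macro")
    return cm, auc, f1_weighted, f1_macro


# Synthetic, balanced example (two images per class).
y_true = [1, 1, 2, 2, 3, 3, 4, 4]
y_prob = [[0.7, 0.1, 0.1, 0.1], [0.2, 0.6, 0.1, 0.1],
          [0.1, 0.7, 0.1, 0.1], [0.1, 0.6, 0.2, 0.1],
          [0.1, 0.1, 0.7, 0.1], [0.1, 0.1, 0.6, 0.2],
          [0.1, 0.1, 0.3, 0.5], [0.1, 0.1, 0.5, 0.3]]
cm, auc, f1w, f1m = evaluate(y_true, y_prob)
print(cm, auc, f1w, f1m, sep="\n")
```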

Although PM_2.5 concentrations during the 2018 and 2020 wildfires may indicate similar AQI levels (Unhealthy, Very Unhealthy), the color and visibility of the sky can vary significantly due to differences in vegetation burned, atmospheric transport, and the chemical aging of smoke. The smoke observed in Kennewick during 2020 originated from Oregon and traveled shorter distances with less aging compared to the 2018 smoke, which was transported from Canada.

Despite these variations, the model successfully identified Unhealthy AQI levels for the 2018 dataset, corroborated by local PM_2.5 measurements. This result highlights the model’s potential applicability across diverse smoke events. Future performance could be further improved by incorporating sky images from other wildfire events (e.g., the 2018 data) and datasets from additional locations into the training process. Furthermore, as demonstrated in this study, evaluating the trained model on entirely unseen datasets provides a robust metric for assessing its generalizability.

3.2. Health Impacts of Smoke

Exposure to ambient air pollution is a major contributor to the global burden of disease, driving both mortality and morbidity. Cohen et al. [11] estimated the relative risks of mortality from respiratory diseases and lung cancer attributable to PM_2.5, as well as global trends in disease burden from 1990 to 2015. Their findings emphasize that policy actions aimed at controlling PM_2.5 sources can effectively reduce ambient air pollution and its associated health impacts. However, uncontrolled wildfires significantly contribute to air pollution, including PM_2.5, potentially undermining these policy efforts.

Figure 5 illustrates the relationship between the PM_2.5 concentrations measured during the September 2020 wildfires in Kennewick, WA, the corresponding AQI levels, and relative health risks (normalized to a baseline value of 1.0). Using the bottom three panels of Figure 1 in Cohen et al. [11], we compared the relative risks of lower respiratory infections, lung cancer, and chronic obstructive pulmonary disease (COPD) across all ages in Kennewick, WA. The health risks were estimated by reading off, from the curves in Cohen et al. [11], the relative-risk values corresponding to the PM_2.5 concentrations observed in Kennewick, WA.
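Reading relative risks off published curves amounts to digitizing each curve into (PM_2.5, RR) points and then interpolating at the observed concentrations. The sketch below is hypothetical. The paper does not state that linear interpolation was used, the helper name is illustrative, and the digitized points would have to be taken from the panels of Cohen et al. [11]. None of those points are reproduced here, and the values in the test are synthetic.

```python
import numpy as np


def relative_risk(pm25, curve_pm25, curve_rr):
    """Interpolate a digitized exposure-response curve at PM2.5 value(s) in µg/m³.

    curve_pm25, curve_rr: points read off a published relative-risk curve,
    with curve_pm25 strictly increasing.
    """
    curve_pm25 = np.asarray(curve_pm25, dtype=float)
    curve_rr = np.asarray(curve_rr, dtype=float)
    if np.any(np.diff(curve_pm25) <= 0):
        raise ValueError("curve_pm25 must be strictly increasing")
    # Linear interpolation between digitized points; outside the digitized
    # range np.interp returns the first/last RR value (no extrapolation).
    return np.interp(pm25, curve_pm25, curve_rr)
```

In practice, one call per disease curve (lower respiratory infections, COPD, lung cancer) with the hourly or daily Kennewick concentrations would give the three risk series.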

The three curves demonstrate a non-linear increase in health risks with rising PM_2.5 concentrations, with the risk of lower respiratory infections showing the steepest growth, followed by COPD and lung cancer. Vertical dashed lines indicate AQI thresholds for “Good”, “Moderate”, and “Unhealthy” air quality, marking key concern levels for pollution exposure.

Figure 5 underscores the significant public health impacts of air pollution, revealing that even AQI levels classified as “Good” may correspond to relative risks exceeding 1.0. At “Good” to “Moderate” AQI levels, COPD risk is higher than that of lower respiratory infections and lung cancer. However, as PM_2.5 levels increase, the risk of lower respiratory infections grows steeply, overtaking the others. During wildfire smoke episodes, when PM_2.5 concentrations exceed 55 μg/m³ (“Unhealthy”), the risk of lower respiratory infections becomes the most pronounced.

This approach provides a practical framework for estimating health risks from observed PM_2.5 concentrations in a specific region, such as Kennewick in this study.

Reactive Oxygen Species Associated with PM_2.5 in Wildfire Smoke

Elevated ROS levels can intensify oxidative stress, triggering inflammation, respiratory distress, and cardiovascular strain [12]. Vulnerable populations, including those with pre-existing respiratory or cardiovascular conditions, are especially at risk during such events.

Figure 6 illustrates the temporal variation of daily averaged PM_2.5 concentrations (bars, left y-axis) and the corresponding reactive oxygen species (ROS) potential (line, right y-axis) during the September 2020 wildfires in Kennewick, WA, highlighting critical public health implications. The ROS potential was calculated by multiplying PM_2.5 concentrations by an oxidative potential (OP) of 0.069 nmol/min/μg for biomass burning [32].

Combustion conditions influence OP: flaming conditions (OP ~0.12 nmol/min/μg) generate higher values than smoldering conditions (OP ~0.03 nmol/min/μg) [33]. For urban and traffic-influenced areas, OP ranges from ~0.08 to 0.13 nmol/min/μg, based on normalized OP^DTT values from Janssen et al. [34]. In this study, we used a constant OP of ~0.07 nmol/min/μg (the biomass-burning value of 0.069 nmol/min/μg [32]) to estimate ROS potential at Kennewick, WA. This single value was taken as representative of the combined contributions from urban traffic, background sources, and wildfire smoke during the smoke episodes.
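The ROS calculation is a unit-consistent product: PM_2.5 in μg/m³ multiplied by OP in nmol/min/μg gives ROS potential in nmol/min/m³. The sketch below applies the OP values quoted in the text; the function names are hypothetical. Because OP is held constant, the ROS series is simply PM_2.5 rescaled. Back-calculating with OP = 0.069 nmol/min/μg, the daily mean ROS potentials of 14.8 and 0.5 nmol/min/m³ on 12 and 20 September correspond to PM_2.5 of roughly 214 and 7 μg/m³. Their ratio (29.6) gives the roughly 30-fold contrast between the two days. The flaming and smoldering OP values bound the estimate.

```python
# Oxidative potential (OP) values quoted in the text, in nmol/min/µg.
OP_BIOMASS = 0.069    # biomass burning [32]
OP_FLAMING = 0.12     # flaming combustion [33]
OP_SMOLDERING = 0.03  # smoldering combustion [33]


def ros_potential(pm25_ug_m3, op=OP_BIOMASS):
    """ROS potential (nmol/min/m³) = PM2.5 (µg/m³) × OP (nmol/min/µg)."""
    return pm25_ug_m3 * op


def pm25_from_ros(ros_nmol_min_m3, op=OP_BIOMASS):
    """Invert the constant-OP relation to recover PM2.5 (µg/m³)."""
    return ros_nmol_min_m3 / op


# Back-calculate the daily mean PM2.5 implied by the reported ROS values and
# show the range spanned by smoldering vs. flaming OP.
for day, ros in [("12 Sep 2020", 14.8), ("20 Sep 2020", 0.5)]:
    pm = pm25_from_ros(ros)
    print(f"{day}: PM2.5 ~ {pm:.0f} ug/m3; ROS range "
          f"{ros_potential(pm, OP_SMOLDERING):.1f}-"
          f"{ros_potential(pm, OP_FLAMING):.1f} nmol/min/m3")
```

The back-calculated ~214 μg/m³ on 12 September is consistent with the daily averages above 200 μg/m³ noted in Section 2.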

The redox activity of biomass burning emissions and other combustion sources is attributed to the presence of polycyclic aromatic hydrocarbons (PAHs) and transition metals in PM_2.5. Figure 6 underscores the significant oxidative stress burden caused by wildfire smoke. The daily averaged ROS potential on the intense wildfire day of 12 September 2020 (14.8 nmol/min/m³) was approximately 30 times higher than on the clean background day of 20 September 2020 (0.5 nmol/min/m³) in Kennewick, WA.

Although PM_2.5 levels decreased after 15 September 2020, the ROS potential remained elevated above the clean-background level. Secondary aerosol formation, driven by atmospheric chemical processing of smoke gases and particulate matter, may further raise the oxidative potential of aged smoke; this effect is not captured by the constant OP assumed here. Future research should explore the chemical factors contributing to ROS potential and the role of atmospheric aging in sustaining health risks associated with wildfire smoke.

4. Discussion

Wildfires are among the most significant natural sources of air pollution, posing substantial challenges to the effectiveness of global pollution control policies. Wildfire smoke can travel thousands of miles, influencing the Earth’s weather and climate systems while exposing human populations to hazardous pollutants, including PM_2.5. This exposure leads to acute health risks, including increased mortality and morbidity. Timely detection of acute smoke exposure is critical in densely populated regions to mitigate health impacts effectively. However, continuous and widespread PM_2.5 monitoring is often expensive and impractical, particularly in developing regions. This limitation highlights the need for affordable smoke detection methods.

In this study, we developed a deep neural network framework to predict air quality indices (AQIs) by correlating sky images with PM_2.5-derived AQIs at a local site in Kennewick, WA. Using a “transfer learning” approach, we adapted the pre-trained MobileNetV2 deep convolutional neural network to analyze new datasets of sky images and AQIs. Despite a small dataset of 104 daytime sky images collected hourly from 10 to 20 September 2020, our method achieved high training and validation accuracies (~92% and ~91%, respectively) within just 30 training epochs. Data augmentation was used to make the most of the limited training data.

Our model successfully predicted Unhealthy AQI levels during the August 2018 wildfires using completely unseen test images that were excluded from the training and validation datasets. These findings highlight the potential of our machine learning approach to rapidly assess AQI levels from sky images. This method could provide a practical alternative to PM_2.5 measurements, particularly in urban neighborhoods of developing countries that lack air quality monitoring infrastructure. By enabling timely public health advisories, the establishment of clean air shelters, and other interventions, it can help mitigate health risks during wildfire episodes.

The trained model, implemented on the Google Colab platform, is efficient and straightforward to deploy. It classified 40 test images in about 2.3 s on average, corresponding to less than 0.1 s (about 0.06 s) per image. Built on the resource-efficient MobileNetV2 architecture, the model is compatible with a wide range of hardware systems, including older and resource-constrained devices. This accessibility could empower policymakers and the public to rapidly detect and respond to wildfire smoke episodes.

Our analysis also highlights that lower respiratory infections pose the greatest mortality risk during acute smoke exposure. Additionally, wildfire smoke can elevate reactive oxygen species (ROS) exposure by more than an order of magnitude compared to clean periods. This increase can lead to inflammation, respiratory distress, and cardiovascular strain, particularly in vulnerable populations. These findings underscore the significant public health implications of wildfire smoke and the need for innovative, cost-effective monitoring tools to mitigate its impact.

Author Contributions

A.M.S. conceptualized this work, analyzed the PM_2.5 measurement data and related them to the TSI sky image dataset, conducted formal analyses including the development, running, and testing of the deep learning model, and wrote the original draft of the manuscript. M.S. contributed to validation of the software and results, visualization, and review and editing of the manuscript drafts. All authors have read and agreed to the published version of the manuscript.

Funding

M.S. acknowledges support by the US Department of Energy (DOE) Office of Science, Office of Biological and Environmental Research (BER) through the Early Career Research Program. The Pacific Northwest National Laboratory (PNNL) is operated for the DOE by Battelle Memorial Institute under contract DE-AC06-76RL01830.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Our deep learning training code is available on GitHub [24]. The sky image dataset used to train our model is available through the Zenodo platform [35].

Acknowledgments

The authors gratefully acknowledge Susanne Glienke at the Pacific Northwest National Laboratory (PNNL) for curating the TSI sky image dataset used in this work, and Evgueni Kassianov at PNNL for helpful discussions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Westerling, A.L.; Hidalgo, H.G.; Cayan, D.R.; Swetnam, T.W. Warming and Earlier Spring Increase Western U.S. Forest Wildfire Activity. Science 2006, 313, 940–943.
2. Wasserman, T.; Mueller, S. Climate influences on future fire severity: A synthesis of climate-fire interactions and impacts on fire regimes, high-severity fire, and forests in the western United States. Fire Ecol. 2023, 19, 43.
3. Donovan, V.M.; Crandall, R.; Fill, J.; Wonkka, C.L. Increasing large wildfire in the eastern United States. Geophys. Res. Lett. 2023, 50, e2023GL107051.
4. Urbanski, S.P.; Hao, W.M.; Baker, S. Chemical composition of wildland fire emissions. Dev. Environ. Sci. 2008, 8, 79–107.
5. Cascio, W.E. Wildland fire smoke and human health. Sci. Total Environ. 2018, 624, 586–595.
6. Andreae, M.O. Emission of trace gases and aerosols from biomass burning—An updated assessment. Atmos. Chem. Phys. 2019, 19, 8523–8546.
7. Schneider, S.R.; Shi, B.; Abbatt, J.P.D. The measured impact of wildfires on ozone in Western Canada from 2001 to 2019. J. Geophys. Res. Atmos. 2024, 129, e2023JD038866.
8. Ward, D.S.; Mahowald, N.M.; Kloster, S. The changing radiative forcing of fires: Global model estimates for past, present, and future. Atmos. Chem. Phys. 2012, 12, 10857–10886.
9. Shrivastava, M.; Fan, J.; Zhang, Y.; Rasool, Q.Z.; Zhao, B.; Shen, J.; Pierce, J.R.; Jathar, S.H.; Akherati, A.; Zhang, J.; et al. Intense formation of secondary ultrafine particles from Amazonian vegetation fires and their invigoration of deep clouds and precipitation. One Earth 2024, 7, 1029–1043.
10. U.S. Environmental Protection Agency. EPA AirData—Download Data Files. Available online: https://aqs.epa.gov/aqsweb/airdata/download_files.html (accessed on 22 December 2024).
11. Cohen, A.J.; Brauer, M.; Burnett, R.; Anderson, H.R.; Frostad, J.; Estep, K.; Balakrishnan, K.; Brunekreef, B.; Dandona, L.; Dandona, R.; et al. Estimates and 25-year trends of the global burden of disease attributable to ambient air pollution: An analysis of data from the Global Burden of Diseases Study 2015. Lancet 2017, 389, 1907–1918.
12. Miller, M.R. Oxidative stress and the cardiovascular effects of air pollution. Free Radic. Biol. Med. 2020, 151, 69–87.
13. Jayaratne, R.; Liu, X.; Ahn, K.H.; Asumadu-Sakyi, A.; Fisher, G.; Gao, J.; Mabon, A.; Mazaheri, M.; Mullins, B.; Nyaku, M.; et al. Low-cost PM2.5 sensors: An assessment of their suitability for various applications. Aerosol Air Qual. Res. 2020, 20, 520–532.
14. Jain, P.; Coogan, S.C.; Subramanian, S.G.; Crowley, M.; Taylor, S.W.; Flannigan, M.D. A review of machine learning applications in wildfire science and management. Environ. Rev. 2020, 28, 478–505.
15. Sousa, M.J.; Moutinho, A.; Almeida, M.; Moreira, A. Wildfire detection using transfer learning on augmented datasets. Expert Syst. Appl. 2020, 142, 112975.
16. Sathishkumar, V.E.; Cho, J.; Subramanian, M.; Naren, O.S. Forest fire and smoke detection using deep learning-based learning without forgetting. Fire Ecol. 2023, 19, 9.
17. Washington State Department of Ecology. Enviwa Air Quality Monitoring Map. 2024. Available online: https://enviwa.ecology.wa.gov/home/map (accessed on 22 December 2024).
18. Atmospheric Radiation Measurement (ARM) User Facility. TSI (Total Sky Imager). 2024. Available online: https://arm.gov/capabilities/instruments/tsi (accessed on 22 December 2024).
19. Washington State Department of Ecology. Met One BAM 1020 Operating Procedure. 2017. Available online: https://apps.ecology.wa.gov/publications/documents/1702005.pdf (accessed on 24 December 2024).
20. Morris, V.R. Total Sky Imager (TSI) Handbook; Technical Report DOE/SC-ARM/TR-017; U.S. Department of Energy: Washington, DC, USA, 2005.
21. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
22. Qin, D.; Leichner, C.; Delakis, M.; Fornoni, M.; Luo, S.; Yang, F.; Wang, W.; Banbury, C.; Ye, C.; Akin, B.; et al. MobileNetV4—Universal Models for the Mobile Ecosystem. arXiv, 2024; arXiv:2404.10518.
23. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. (See Chapter 6 for activation functions and softmax in neural networks.)
24. Shrivastava, A. Aarav_train.ipynb. 2024. Available online: https://github.com/amshriva810/SkyImages/blob/main/Savebest_Aarav_train.ipynb (accessed on 28 December 2024).
25. Google Colaboratory (Colab). 2024. Available online: https://colab.research.google.com (accessed on 22 December 2024).
26. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444.
27. Chollet, F. Keras. 2015. Available online: https://github.com/keras-team/keras (accessed on 28 December 2024).
28. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2020, 109, 43–76.
29. Google Colab Notebook. Available online: https://colab.research.google.com/drive/1O6b4TP8IecEgSQo36z52lhOw6swMZ3hj#scrollTo=xEhngj4LOhtn (accessed on 1 January 2025).
30. Powers, D.M. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Technol. 2011, 2, 37–63.
31. Provost, F.; Fawcett, T. Robust classification for imprecise environments. Mach. Learn. 2001, 42, 203–231.
32. Fang, T.; Verma, V.; Bates, J.T.; Abrams, J.; Klein, M.; Strickland, M.J.; Sarnat, S.E.; Chang, H.H.; Mulholland, J.A.; Tolbert, P.E.; et al. Oxidative potential of ambient water-soluble PM_2.5 in the southeastern United States: Contrasts in sources and health associations between ascorbic acid (AA) and dithiothreitol (DTT) assays. Atmos. Chem. Phys. 2016, 16, 3865–3879.
Fang, T.; Hwang, B.C.H.; Kapur, S.; Hopstock, K.S.; Wei, J.; Nguyen, V.; Nizkorodov, S.A.; Shiraiwa, M. Wildfire particulate matter as a source of environmentally persistent free radicals and reactive oxygen species. Environ. Sci. Atmos. 2023, 3, 34–46. [Google Scholar] [CrossRef]
Janssen, N.A.; Yang, A.; Strak, M.; Steenhof, M.; Hellack, B.; Gerlofs-Nijland, M.E.; Kuhlbusch, T.; Kelly, F.; Harrison, R.; Brunekreef, B.; et al. Oxidative potential of particulate matter collected at sites with different source characteristics. Sci. Total Environ. 2014, 472, 572–581. [Google Scholar] [CrossRef] [PubMed]
Shrivastava, A.; Susanne, G. Richland WA Sky Images. 2024. Available online: https://zenodo.org/records/14545814 (accessed on 22 December 2024).

Figure 1. Examples of our paired training data, showing TSI sky images for the four air quality index (AQI) classes determined from PM_2.5 monitoring data: (a) Good, (b) Moderate, (c) Unhealthy, and (d) Very Unhealthy.
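Each training example in Figure 1 pairs a sky image with an AQI class derived from the PM_2.5 reading at the same time. The class boundaries are not stated in this section, so the minimal sketch below assumes HYPOTHETICAL edges: the pre-2024 U.S. EPA 24-hour PM_2.5 breakpoints, with "Unhealthy for Sensitive Groups" folded into Unhealthy and "Hazardous" folded into Very Unhealthy to obtain four classes. The function names and the timestamp-keyed dictionaries are illustrative assumptions, not the authors' code.

```python
from bisect import bisect_right

# HYPOTHETICAL class edges (24-h PM2.5, ug/m^3), loosely based on the pre-2024
# EPA breakpoints; the paper's actual boundaries may differ.
LOWER_EDGES = [12.1, 35.5, 150.5]  # lower bounds of Moderate, Unhealthy, Very Unhealthy
LABELS = ["Good", "Moderate", "Unhealthy", "Very Unhealthy"]


def aqi_class(pm25: float) -> str:
    """Map a PM2.5 concentration to one of four AQI class labels."""
    if pm25 < 0:
        raise ValueError("PM2.5 concentration cannot be negative")
    # bisect_right counts how many lower edges are <= pm25, i.e. the class index
    return LABELS[bisect_right(LOWER_EDGES, pm25)]


def pair_images_with_labels(pm25_by_time, image_by_time):
    """Pair each sky image with the AQI class of the PM2.5 value at the same timestamp.

    Timestamps with a PM2.5 value but no image are skipped.
    """
    return [
        (image_by_time[t], aqi_class(v))
        for t, v in sorted(pm25_by_time.items())
        if t in image_by_time
    ]
```

For example, `aqi_class(40.0)` returns `"Unhealthy"` under these assumed edges.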

Figure 2. Training progression of the MobileNetV2 transfer-learning model as a function of the number of epochs (x-axes): (a) training and validation accuracy; (b) training and validation loss.
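Figure 2 plots accuracy and loss after each epoch. The loss function is not named in this section; assuming the categorical cross-entropy that is conventional for a softmax classifier, the two plotted quantities for one epoch's predictions can be sketched in plain Python as follows. The function names are illustrative and are not taken from the authors' notebook.

```python
import math


def categorical_cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class.

    probs:  per-image probability vectors (e.g. softmax outputs over 4 AQI classes)
    labels: integer class indices (0 = Good ... 3 = Very Unhealthy)
    eps:    floor that avoids log(0)
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total -= math.log(max(p[y], eps))
    return total / len(labels)


def accuracy(probs, labels):
    """Fraction of images whose highest-probability class equals the true class.

    Ties are broken in favor of the lowest class index.
    """
    hits = sum(
        1 for p, y in zip(probs, labels)
        if max(range(len(p)), key=p.__getitem__) == y
    )
    return hits / len(labels)
```

Evaluating both functions on the training and validation sets after every epoch yields the two pairs of curves in Figure 2.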

Figure 3. Model-predicted softmax probabilities (y-axis) for the four AQI classes, shown as grouped colored vertical bars. The first four images on the x-axis (from left) were randomly selected from the validation dataset from the September 2020 wildfires. The fifth (rightmost) test image, taken during the August 2018 wildfires, was excluded from both the training and validation datasets.
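The bars in Figure 3 are softmax outputs: the network's four raw output scores (logits) z_k are converted to probabilities p_k = exp(z_k) / Σ_j exp(z_j), which are positive and sum to 1, and the predicted class is the one with the largest p_k. A minimal sketch, with made-up logits in the usage line:

```python
import math

AQI_CLASSES = ["Good", "Moderate", "Unhealthy", "Very Unhealthy"]


def softmax(logits):
    """Convert raw scores into probabilities that are positive and sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow without changing the result
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def predict(logits):
    """Return the most probable AQI class and the full probability vector."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return AQI_CLASSES[best], probs
```

With illustrative logits `[0.1, 0.2, 2.5, 1.0]`, `predict` returns `"Unhealthy"` together with the four bar heights.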

Figure 4. Model performance on the test dataset of TSI sky images and corresponding PM_2.5 concentrations from the 2018 wildfires: (a) confusion matrix comparing true and predicted labels for the four AQI classes; (b) receiver operating characteristic (ROC) curves with the corresponding area under the curve (AUC) values.
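Both panels of Figure 4 can be computed from the test labels and the model's softmax outputs. The minimal sketch below assumes a one-vs-rest treatment of the four classes (the usual approach for multi-class ROC curves) and computes each AUC through its rank interpretation: the probability that a randomly chosen image of the class scores higher than a randomly chosen image of any other class. It is an illustration in plain Python, not the authors' evaluation code.

```python
def confusion_matrix(y_true, y_pred, n_classes=4):
    """Counts of images per (true class, predicted class); rows are true classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def one_vs_rest_auc(y_true, scores, positive):
    """AUC of one class against the rest.

    scores: the model's probability for `positive` on each image.
    Each (positive, negative) pair counts 1 if the positive image scores higher,
    1/2 on a tie, and 0 otherwise (the Mann-Whitney form of the AUC).
    """
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of images, which is negligible for a 40-image test set.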

Figure 5. Central estimates of relative health risks from wildfire smoke exposure, determined using local PM_2.5 measurements and integrated exposure-response (IER) functions. A relative risk of 1 corresponds to PM_2.5 concentrations between 0 and 2.4 μg/m³.
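For reference, the IER functions used in the Global Burden of Disease 2015 analysis (Cohen et al., 2017) have the general form below, where z is the PM_2.5 concentration, z_cf is the counterfactual concentration below which no excess risk is assumed (consistent with a relative risk of 1 up to 2.4 μg/m³), and α, γ, δ are cause-specific fitted parameters. This is a sketch of the standard form only; the parameter values behind Figure 5 are not reproduced here.

```latex
\mathrm{RR}_{\mathrm{IER}}(z) =
\begin{cases}
1, & z < z_{\mathrm{cf}},\\[4pt]
1 + \alpha\left\{1 - \exp\!\left[-\gamma\,(z - z_{\mathrm{cf}})^{\delta}\right]\right\}, & z \ge z_{\mathrm{cf}}.
\end{cases}
```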

Figure 6. ROS potential calculated from PM_2.5 concentrations, assuming a constant oxidative potential (OP) value.
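With a constant mass-normalized OP, the volume-normalized ROS potential in Figure 6 presumably scales linearly with PM_2.5 mass concentration: ROS potential = OP × PM_2.5. The OP value actually assumed is not given in this section, so the constant below is a HYPOTHETICAL placeholder; if OP is expressed in nmol min⁻¹ μg⁻¹ and PM_2.5 in μg m⁻³, the result is in nmol min⁻¹ m⁻³.

```python
# HYPOTHETICAL mass-normalized oxidative potential (e.g. nmol min^-1 ug^-1).
# Placeholder only: substitute the OP value assumed in the study.
OP_MASS = 0.05


def ros_potential(pm25_ug_m3, op_mass=OP_MASS):
    """Volume-normalized ROS potential for each PM2.5 concentration.

    With a constant OP, the result is simply OP times the PM2.5 mass
    concentration, so the curve has the same shape as the PM2.5 time series.
    """
    return [op_mass * c for c in pm25_ug_m3]
```

Because OP is held constant, Figure 6 inherits the shape of the PM_2.5 record; differences in particle composition (and hence OP) between smoke episodes are not captured by this approach.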

Table 1. Precision, recall, and F1 scores of the model on the unseen 2018 wildfire dataset.

Class	Precision	Recall	F1-Score	Number of Images
Class 1 (Good)	0.50	0.40	0.44	10
Class 2 (Moderate)	0.62	0.50	0.56	10
Class 3 (Unhealthy)	0.67	1.00	0.80	10
Class 4 (Very Unhealthy)	0.78	0.70	0.74	10
Weighted Avg	0.64	0.65	0.63	40
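Table 1 follows the usual definitions: precision = TP/(TP + FP), recall = TP/(TP + FN), F1 = 2PR/(P + R), and the weighted average weights each class by its number of images. The sketch below recomputes the table from per-class counts. These counts are back-calculated from the reported values (true positives = recall × support, predicted count = true positives ÷ precision); they are the integer values consistent with Table 1, not figures read from the paper's confusion matrix.

```python
# Per-class counts consistent with Table 1 (40 test images, 10 per class).
# Back-calculated for illustration: (true_positives, predicted_count, support)
COUNTS = {
    "Good": (4, 8, 10),
    "Moderate": (5, 8, 10),
    "Unhealthy": (10, 15, 10),
    "Very Unhealthy": (7, 9, 10),
}


def per_class_metrics(tp, predicted, support):
    """Precision, recall and F1 for one class (0.0 when undefined)."""
    precision = tp / predicted if predicted else 0.0
    recall = tp / support if support else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def weighted_average(counts):
    """Support-weighted mean of precision, recall and F1 (the 'Weighted Avg' row)."""
    total = sum(s for _, _, s in counts.values())
    sums = [0.0, 0.0, 0.0]
    for tp, pred, s in counts.values():
        for i, m in enumerate(per_class_metrics(tp, pred, s)):
            sums[i] += m * s
    return tuple(x / total for x in sums)


if __name__ == "__main__":
    for name, c in COUNTS.items():
        print(name, ["%.2f" % m for m in per_class_metrics(*c)])
    print("Weighted Avg", ["%.2f" % m for m in weighted_average(COUNTS)])
```

Because every class has 10 images, the weighted average equals the unweighted (macro) average, and the weighted recall equals the overall accuracy (26 of 40 images, 0.65).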

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Shrivastava, A.M.; Shrivastava, M. Applying Deep Learning for Wildfire Identification: Economical and Accessible Solutions Leveraging Small Datasets. Atmosphere 2025, 16, 131. https://doi.org/10.3390/atmos16020131

AMA Style

Shrivastava AM, Shrivastava M. Applying Deep Learning for Wildfire Identification: Economical and Accessible Solutions Leveraging Small Datasets. Atmosphere. 2025; 16(2):131. https://doi.org/10.3390/atmos16020131

Chicago/Turabian Style

Shrivastava, Aarav M., and Manish Shrivastava. 2025. "Applying Deep Learning for Wildfire Identification: Economical and Accessible Solutions Leveraging Small Datasets" Atmosphere 16, no. 2: 131. https://doi.org/10.3390/atmos16020131

APA Style

Shrivastava, A. M., & Shrivastava, M. (2025). Applying Deep Learning for Wildfire Identification: Economical and Accessible Solutions Leveraging Small Datasets. Atmosphere, 16(2), 131. https://doi.org/10.3390/atmos16020131

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
