Predictive Model for Bark Beetle Outbreaks in European Forests

Fernández-Carrillo, Ángel; Franco-Nieto, Antonio; Yagüe-Ballester, María Julia; Gómez-Giménez, Marta

doi:10.3390/f15071114

by Ángel Fernández-Carrillo, Antonio Franco-Nieto, María Julia Yagüe-Ballester and Marta Gómez-Giménez *

Remote Sensing & Geospatial Analytics, GMV, Isaac Newton 11, 28760 Tres Cantos, Spain

* Author to whom correspondence should be addressed.

Forests 2024, 15(7), 1114; https://doi.org/10.3390/f15071114

Submission received: 17 May 2024 / Revised: 19 June 2024 / Accepted: 22 June 2024 / Published: 27 June 2024

(This article belongs to the Special Issue Ecology and Management of Forest Pests—2nd Edition)


Abstract:

Bark beetle outbreaks and forest mortality have rocketed in European forests because of warmer winters, intense droughts, and poor management. The methods developed to predict a bark beetle outbreak have three main limitations: (i) a small-spatial-scale implementation; (ii) specific field-based input datasets that are usually hard to obtain at large scales; and (iii) predictive models constrained by coarse climatic factors. Therefore, a methodological approach accounting for a comprehensive set of environmental traits that can predict a bark beetle outbreak accurately is needed. In particular, we aimed to (i) analyze the influence of environmental traits that cause bark beetle outbreaks; (ii) compare different machine learning architectures for predicting bark beetle attacks; and (iii) map the attack probability before the start of the bark beetle life cycle. Random Forest regression achieved the best-performing results. The predicted bark beetle damage reached a high robustness in the test area (F1 = 96.9, OA = 94.4) and showed low errors (CE = 2.0, OE = 4.2). Future improvements should focus on including additional variables, e.g., forest age and validation sites. Remote sensing-based methods contributed to detecting bark beetle outbreaks in large extensive forested areas in a cost-effective and robust manner.

Keywords:

remote sensing; bark beetles; forest pest management; machine learning; predictive model; predisposition; disturbance

1. Introduction

In recent years, European forests have experienced a pronounced increase in bark beetle outbreaks triggered by warmer winters and intense droughts [1]. The European spruce bark beetle (Ips typographus L.) is the most destructive species of conifer bark beetle in Europe, with Norway spruce (Picea abies (L.) Karst.) being its primary host. Other occasional hosts may include fir (Abies spp.), larch (Larix decidua), pine (Pinus spp.), and Douglas fir (Pseudotsuga menziesii) [2,3]. The problem has unfolded gradually since 2015, mainly because of the convergence of extreme climate periods and an inefficient forest protection system. Prolonged warm spells and insufficient precipitation have rendered forest stands vulnerable to drought. A critical issue is the delayed management of trees infested by bark beetles, which increases negative impacts [4]. For instance, droughts, wildfires, and bark beetle outbreaks interact, producing cascading effects [5].

In Central Europe, combined abiotic and biotic stressors have boosted forest mortality. In the Czech Republic, bark beetle disturbance has steadily increased over recent decades and it is expected to grow sevenfold before 2030 compared to the reference period 1971–1980 [5,6]. Similar challenges have also been observed in Polish forests, where prolonged droughts have compromised the health of Norway spruce stands [7,8,9]. In Germany, bark beetle outbreaks are a recurrent threat to spruce trees, which cover 25 percent of its forests [10]. Moreover, climate change is exacerbating the frequency and intensity of bark beetle outbreaks, also leading to the alteration of interconnected ecological processes, such as wildfires, severe drought episodes, or windstorms [11,12,13,14].

On the one hand, the life cycle of bark beetles contains four stages of development: egg, larva, pupa, and adult. During the initial development stages, eggs are protected under the bark of host trees. Young beetles feeding on the inner bark inhibit sap flow, leading to tree death. Filial beetles emerge 6–10 weeks later, establishing potential second and third generations [15]. Adult beetles hibernate under bark or in forest litter [16,17,18]. Temperature is a key driver of the insect’s development stage [2]. The characterization of these phases of development is particularly important for monitoring and accurately predicting outbreaks. Therefore, the spring and summer seasons are the critical periods of the year to be monitored.

On the other hand, infested trees go through three different stages depending on the host response to bark beetles, i.e., green attack, red attack, and grey attack [19]. Trees under red and grey attack are easier to identify, both visually and using multi-spectral remote sensing data [20,21,22]. During a green attack, visual changes in leaves are subtle and difficult to detect [23,24,25]. This task is particularly challenging due to the subtlety of symptoms [26] as well as the similarity of the spectral signature resulting from other factors that cause vegetation stress. Remote sensing-based methods yield low accuracies in detecting green attacks [27,28], which has restricted the implementation of these methods to a post-attack context, such as damage estimation [22]. However, detecting the green attack stage is crucial for mitigating bark beetle spread and preventing significant outbreaks [23], as well as predicting the potential damage in advance for better management [15].

In recent years, different studies have tried to predict bark beetle damage based on remote sensing and other publicly available data. Predictive models as part of early warning systems have also been implemented to prevent bark beetle outbreaks [29,30]. These early warning systems can predict and mitigate the impact of disturbances, being essential to reduce economic costs associated with forest pests [29]. While different studies have aimed to predict the probability of bark beetle outbreaks [29,31,32,33,34], shortcomings in the scalability and cost-effectiveness of these studies still remain. The main drawback of these models is that they need comprehensive field datasets, which are not always available, as input (e.g., soil base saturation, sanitary felling records). Therefore, these methods are mainly relevant for small-scale studies with available field surveys and statistics that can result in high accuracies. For instance, de Groot and Ogris [29] reached an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.83 for a spruce bark beetle predictive model (1 km), using a Generalized Additive Model (GAM) and some predictive variables that are hard to obtain for most regions (e.g., soil depth, soil cation exchange capacity, amount of phosphorus). Moreover, Duraciova et al. [35] developed yearly models of susceptibility to bark beetle attacks for 2008–2010 using a combination of remote sensing data and stand measures derived from forest management plans. They evaluated the model, obtaining an AUC between 0.75 and 0.82. Nevertheless, their model made use of field data derived from local management plans (e.g., forest age, stand density), and was also limited to a relatively small area (i.e., the Bohemian Forest in the Czech Republic).

Remote sensing offers substantial potential for detecting bark beetle outbreaks in large areas, as it allows for cost-effective and recurrent coverage of extensive forested areas [20,36,37,38,39]. Artificial Intelligence (AI) methods, particularly machine learning (ML) and deep learning (DL), are of significant relevance for developing scalable bark beetle predictive modelling approaches [34,39,40]. These techniques provide advanced methodological frameworks to assess and predict the probability of infestation, as well as to understand the interaction between environmental factors and pest dynamics [34,41]. While traditional methods of bark beetle detection and prediction have proven to be effective, the capabilities of ML and DL offer substantial improvements through a more detailed analysis of data, including the identification of subtle patterns that might go unnoticed with conventional approaches [34]. Furthermore, these techniques enable the continuous adjustment of the model as new data are collected, which is crucial given the changing variability and dynamics of forest ecosystems [42]. Rammer and Seidl [39] also used deep neural networks to create a bark beetle predictive model in the Bavarian Forest National Park, Germany. The authors used input climatic information and damage records from the two previous years as predictive variables. Although their model showed a high inference capacity, performance metrics obtained against the ground truth achieved a moderate algorithm performance (F1-score = 0.49).

In summary, although several successful methods have been developed to predict the probability of bark beetle infestation, most of them are restricted to small scales and need specific field-based variables that are usually hard to obtain at a large scale. In other cases, the factors used in the predictive models are limited to climatic conditions such as temperature, sun irradiation, and precipitation [15,43,44]. Moreover, most of these studies are based on classical modelling approaches. Remote sensing-based studies have overcome some of the limitations associated with field campaigns, for example by implementing approaches at regional or even global levels at relatively low cost and by leveraging state-of-the-art ML and DL methods. Nevertheless, most of those studies are still centred on post-damage detection (i.e., red and grey attack). Studies that have attempted to detect green attack do not usually reach high levels of accuracy, and they consider only a small number of the factors that predispose forests to a bark beetle attack. Therefore, a contribution to this research field would include the development of state-of-the-art methodologies that aim at improving the early detection of a bark beetle outbreak, especially the detection of green attacks. To do so, we propose a methodological approach employing EO data and environmental factors to estimate the probability of bark beetle infestation in European forests.

The European Commission has indicated that a susceptibility model able to identify forest mortality and areas prone to a bark beetle outbreak is a useful tool for decision-making. For example, stakeholders could identify sites and forests prone to biotic damage, as well as options for prophylactic forest management in advance [2]. Therefore, the general objective of this study was to develop a methodological approach to account for the environmental traits that can predict a bark beetle outbreak accurately. The specific objectives that we have developed in this study are (1) to analyze the influence of environmental traits that cause bark beetle outbreaks, (2) to compare different ML architectures for predicting bark beetle attack, and (3) to map the attack probability before the start of the bark beetle life cycle.

This study was carried out within the context of the European Union (EU)-funded FirEUrisk project and, in particular, within the analysis of non-fire hazards that, combined with wildfires, can potentially increase negative ecological and socioeconomic impacts, triggering cascading effects [45]. We aimed at understanding the environmental context of the region before the wildfire that occurred in the cross-border region of the Saxon Switzerland National Park (N.P.) between Germany and the Czech Republic in summer 2022. This human-caused fire burnt 16 km² in an area that was highly impacted by a bark beetle infestation. This is the largest fire that has occurred in Czechia in modern history [46]. The attack probability approach developed in this study considers the phenological season prior to the wildfire event, and it could help managers optimize their resources and mitigate the negative effects of forest pests and associated hazards. There is usually a higher fire danger in disturbed areas [2], and this fire danger may be even larger in regions with no fire prevention regulations in disturbed areas [47].

2. Study Area

The study area is in a transboundary region encompassing parts of Czechia, Poland, and Germany (Figure 1). It covers 32,777 km², and over one-third of the area is covered by forests that are highly affected by bark beetle. Norway spruce and Scots pine (Pinus sylvestris) are the dominant tree species, covering 24.5% of the area. Mixed forests cover 7.25% and broadleaved forest accounts for 4.2%. The area includes two cross-border protected areas. The first protected area is formed by the Saxon Switzerland N.P. (93.5 km²) in Germany and the Bohemian Switzerland N.P. (79.27 km²) in the Czech Republic. The second protected area is formed by the Polish Karkonosze N.P. (59.54 km²) and the Czech Krkonoše N.P. (363.52 km²). These National Parks are densely wooded landscapes. Forest represents 94% of the Saxon Switzerland N.P. area; 59% of the Krkonoše N.P. area; and 47% of the Karkonosze N.P. area.

3. Data Description

3.1. Sentinel-2 Imagery

Surface reflectance images were acquired from the Sentinel-2 Level 2A catalogue through the Copernicus Open Access Hub [48]. Sentinel Level 1C (i.e., top-of-atmosphere) images were used to mask clouds. All spectral bands from the MultiSpectral Instrument (MSI) were initially considered.

3.2. Environmental Traits

Environmental traits that predispose forests to a bark beetle infestation, as identified in the literature [2,36], were derived from freely available datasets. The data sources used for the analysis are listed below (see Section 4.4.1 and Appendix A. Data Sources for more details):

National forest inventories (NFIs). Three forest inventories were collected: Lower-Silesian Forest data [49,50], Brandenburg forest data [51], and Saxon forest data [52]. Their reference year is 2021.
Corine Land Cover (CLC) + Backbone [53,54] was used to generate training samples for the spruce classification.
The European Digital Elevation Model (EU-DEM) [55] was used to characterize the topography (i.e., elevation, slope, and aspect). Solar irradiation was also derived from the EU-DEM.
Saxon Switzerland National Park bark beetle data records from 1990 to 2021, provided by the Technische Universität Dresden, TUD (see Acknowledgements), were used to validate the bark beetle predictive model and the intermediate bark beetle damage estimates. The data collected cover information on the spatial extents of the forest areas affected by bark beetle disturbances in the National Park. Annual damaged areas were derived from ground surveys and did not include the severity of the damage (defoliation, discoloration, mortality, or dieback). Thus, it was not possible to determine the stage of attack (green, red, or grey), because the datasets only included spatio-temporal references.

4. Methods

Figure 2 summarizes the methods used to model bark beetle attack and produce a predictive map of bark beetle damage. Cloudless Sentinel-2 composites were used to obtain a spruce mask and estimate the damage caused by bark beetles in summer 2021 in the study area. Bark beetle damage estimates were used to train a bark beetle predictive model, using environmental factors as explanatory variables, and test different ML architectures and configurations. The different phases of the methodology are detailed in the following sub-sections.

4.1. Sentinel-2 Composites

It was not possible to produce a cloud-free composite for some months due to the high presence of clouds and snowstorms. Hence, monthly Sentinel-2 composites were created for the years 2019, 2020, and 2021. All images with less than 60% cloud coverage were considered. Pixels with clouds and cloud shadows were masked using an improvement of the FMASK algorithm [56]. The mean was computed for each remaining pixel to produce the final monthly composites. The mean was used to ease the process of capturing disturbances.
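The compositing step can be sketched as a per-pixel mean over cloud-free observations. This is a minimal illustration using plain Python lists; the function name `monthly_composite` and the toy reflectance values are assumptions made for the example, and the cloud masks are taken as given rather than produced by the improved FMASK algorithm.

```python
from statistics import mean

def monthly_composite(scenes, cloud_masks, nodata=None):
    """Per-pixel mean of the observations not flagged as cloud or shadow.

    scenes:      list of 2-D lists, one reflectance grid per acquisition.
    cloud_masks: list of 2-D lists of bools, True where cloud or shadow.
    Pixels that are masked in every acquisition receive `nodata`.
    """
    rows, cols = len(scenes[0]), len(scenes[0][0])
    composite = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            clear = [s[r][c] for s, m in zip(scenes, cloud_masks) if not m[r][c]]
            out_row.append(mean(clear) if clear else nodata)
        composite.append(out_row)
    return composite

# Two hypothetical acquisitions of a 1 x 3 pixel strip within one month.
scenes = [[[0.2, 0.4, 0.5]], [[0.4, 0.9, 0.7]]]
masks = [[[False, False, True]], [[False, True, True]]]
composite = monthly_composite(scenes, masks)
```

In this toy case the third pixel is cloudy in both acquisitions, so it stays empty, which mirrors the months for which no cloud-free composite could be produced.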

4.2. Spruce Classification (September 2020)

A binary spruce/non-spruce mask was produced over the study area to accurately detect bark beetle damage during 2021. A Sentinel-2 monthly composite from September 2020 was selected to classify Norway spruce before the first attack signs observed in spring 2021. This monthly composite was the closest to spring 2021 with 0% cloud cover.

The spruce/non-spruce dataset was used to train a classification algorithm. Two NFI datasets (i.e., Polish and German) were harmonized to create a single geospatial dataset. To do so, the generation of training samples involved the following steps: First, a systematic grid of 2000 × 2000 points (i.e., 4 million samples) was generated in the study area. Each point was assigned to a land cover category according to the CLC + Backbone product to obtain non-spruce samples. For the spruce class, a systematic sampling was carried out, creating points which were 10 × 10 m each within spruce stands according to NFI datasets. Points within NFI polygons with 100% tree cover density and Norway spruce as the dominant species were included in the spruce class. This dataset was split into two subsets, i.e., 80% of the dataset was used for training the model, while the remaining 20% was left out for testing the model’s performance.
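The sampling and splitting steps can be sketched as follows. This is an illustrative example only: the helper names, the bounding box, and the random seed are assumptions, and the sketch uses a 10 × 10 grid instead of the 2000 × 2000 grid described above. Whether the original split was random or stratified is not stated, so a simple random split is assumed.

```python
import random

def systematic_grid(xmin, ymin, xmax, ymax, n):
    """Return n x n regularly spaced points (cell centres) inside a bounding box."""
    dx, dy = (xmax - xmin) / n, (ymax - ymin) / n
    return [(xmin + (i + 0.5) * dx, ymin + (j + 0.5) * dy)
            for j in range(n) for i in range(n)]

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and hold out `test_fraction` for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical 100 m x 100 m extent sampled with a 10 x 10 grid.
points = systematic_grid(0.0, 0.0, 100.0, 100.0, n=10)
train_pts, test_pts = train_test_split(points)
```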

A literature review was carried out to identify the most suited spectral indices to discriminate Norway spruce areas [21,57,58,59]. The following spectral indices were selected since they yielded the best results in the reviewed literature: Bare Soil Index (BSI) [60,61,62], Red-edge-band Chlorophyll Index (CLRE) [63], Dead Fuel Index (DFI) [64], Enhanced Vegetation Index (EVI) [65], Modified Bare Soil Index (MBI) [66], Modified Soil Adjusted Vegetation Index 2 (MSAVI2) [67,68], Normalized Difference Moisture Index (NDMI) [69], Normalized Difference Red Edge Index 1 (NDREI1) [70], Normalized Difference Red Edge Index 2 (NDREI2) [71], and Normalized Difference Vegetation Index (NDVI) [72]. The pairwise Pearson correlation of indices (Table 1) was studied to reduce collinearity, avoid redundancies in the model, and make it more efficient. The indices with the lowest Pearson coefficient (i.e., with more complementary information) were EVI, MBI, MSAVI2, and NDVI. These indices were finally selected to build the classification model.
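For reference, three of the four selected indices can be computed from surface reflectance as sketched below. The formulas are the commonly published forms of NDVI, EVI, and MSAVI2, not expressions taken from this article, and the reflectance values are invented for illustration. For Sentinel-2, the near-infrared, red, and blue inputs would typically be bands B8, B4, and B2. MBI is omitted because its exact formulation is not given in the text.

```python
import math

def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)

def evi(nir, red, blue):
    """Enhanced Vegetation Index with the usual coefficients (2.5, 6, 7.5, 1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

def msavi2(nir, red):
    """Modified Soil Adjusted Vegetation Index 2."""
    term = 2.0 * nir + 1.0
    return (term - math.sqrt(term ** 2 - 8.0 * (nir - red))) / 2.0

# Hypothetical reflectances (0-1 scale) for a dense conifer pixel.
nir, red, blue = 0.5, 0.1, 0.05
indices = {"NDVI": ndvi(nir, red), "EVI": evi(nir, red, blue), "MSAVI2": msavi2(nir, red)}
```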

Before training the model, the training dataset was cleaned to avoid possible errors in the spruce training samples (e.g., cleared pixels within spruce areas according to NFIs). The distribution of the selected indices was studied, discarding outliers to make the dataset more robust. Spruce pixels beyond 2.5 standard deviations from the mean were discarded, as they were not considered pure spruce canopy. Finally, a Random Forest model [73] was trained and a binary spruce/non-spruce classification was produced for September 2020. This output map was used in further steps to assess vegetation stress monitoring and estimate bark beetle damage.
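The outlier filter described above can be illustrated with a small sketch. The function name and sample values are hypothetical, and the example handles a single index; in practice the same rule would be applied to each selected index.

```python
from statistics import mean, pstdev

def drop_outliers(values, k=2.5):
    """Keep only values within k standard deviations of the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mu) <= k * sigma]

# Hypothetical NDVI values of spruce training pixels; 0.1 mimics a cleared pixel.
spruce_ndvi = [0.8] * 20 + [0.1]
clean_ndvi = drop_outliers(spruce_ndvi)
```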

4.3. Bark Beetle Damage Estimate (September 2020–October 2021)

To estimate bark beetle damage during the 2021 spring–summer season, the algorithm proposed by Fernandez-Carrillo et al. [22] was used. The process can be summarized in the following steps:

First, a baseline for spruce condition was established using Sentinel-2 composites with September 2020 as a reference date. To do so, vegetation indices including NDVI, Green Leaf Area Index (LAI_green) [74], and NDMI were computed as proxies of photosynthetic activity, number of leaves, and water content, factors that are strongly affected by bark beetle. These indices were masked with the previously derived Norway spruce map. Due to persistent cloud cover, Sentinel-2 images from September 2021 were unusable, leading to the assessment of bark beetle damage between September 2020 and October 2021.

Second, anomalies in vegetation condition were estimated using multitemporal linear regression on NDVI, LAI_green, and NDMI. The indices in September 2020 were used as independent variables, while the values of October 2021 were set as dependent variables. After adjusting the regression model, errors were standardized (i.e., z-scores) and averaged to generate images of vegetation anomalies. Negative anomalies were categorized into the following damage classes: no damage (average error > −1 standard deviations), minor damage (−1 to −2 standard deviations), moderate damage (−2 to −3 standard deviations), and severe damage (<−3 standard deviations), indicating varying levels of spruce tree damage caused by bark beetles.
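The two steps can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the implementation of the cited algorithm [22]: each index is regressed with ordinary least squares on a flat list of spruce pixels, residuals are converted to z-scores, the z-scores are averaged across indices, and the thresholds from the text are applied. All function names and values are hypothetical.

```python
from statistics import mean, pstdev

def standardized_residuals(x, y):
    """Fit y = a + b*x by least squares and return the residuals as z-scores."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sd = pstdev(residuals)
    # OLS residuals have zero mean, so dividing by sd gives z-scores.
    return [r / sd if sd > 0 else 0.0 for r in residuals]

def damage_class(z):
    """Map an averaged z-score to the damage classes described in the text."""
    if z > -1:
        return "no damage"
    if z > -2:
        return "minor"
    if z > -3:
        return "moderate"
    return "severe"

def classify_pixels(index_pairs):
    """index_pairs: {index name: (values in Sept 2020, values in Oct 2021)}."""
    z_by_index = [standardized_residuals(x, y) for x, y in index_pairs.values()]
    mean_z = [mean(zs) for zs in zip(*z_by_index)]
    return [damage_class(z) for z in mean_z]

# Hypothetical NDVI values for 20 spruce pixels; pixel 10 drops sharply.
ndvi_2020 = [0.70 + 0.01 * i for i in range(20)]
ndvi_2021 = list(ndvi_2020)
ndvi_2021[10] -= 0.5
classes = classify_pixels({"NDVI": (ndvi_2020, ndvi_2021)})
```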

The lack of accurate and timely bark beetle damage reference datasets hindered the validation of damage estimates in the whole region of interest. Field data were limited to bark beetle damage records (1990–2021) for the Saxon Switzerland N.P. Within the time frame of the analysis (September 2020–October 2021), damaged areas registered in 2021 in the aforementioned reference dataset were used as ground truth damage samples. The remaining damage records from dates out of this range were considered no-damage, since we aimed at predicting bark beetle damage in 2021. A confusion matrix and performance metrics were computed to validate the damage estimates derived from Sentinel-2 imagery.
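The performance metrics reported in this study (overall accuracy, commission error, omission error, and F1-score) can all be derived from the four cells of a binary confusion matrix. The sketch below uses the standard definitions for the damage class; the counts are invented and do not correspond to the results of this study.

```python
def accuracy_metrics(tp, fp, fn, tn):
    """Metrics for the positive (damage) class of a binary confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "OA": (tp + tn) / (tp + fp + fn + tn),              # overall accuracy
        "CE": fp / (tp + fp),                               # commission error = 1 - precision
        "OE": fn / (tp + fn),                               # omission error = 1 - recall
        "F1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts of validation pixels.
metrics = accuracy_metrics(tp=90, fp=10, fn=10, tn=90)
```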

4.4. Bark Beetle Predictive Modelling

4.4.1. Bark Beetle Parametrization: Environmental Traits

According to the European Commission (EC) [2], there is a set of key environmental traits that create the perfect conditions for bark beetle development. Therefore, this study was designed to use spatial information about the key environmental traits that could help us infer areas prone to bark beetle attack between September 2020 and October 2021. Some of the environmental traits were discarded from the analysis due to data limitations (e.g., lack of accurate data sources, low resolution). The complete list of the environmental traits considered by the EC is shown in Appendix B. Subsequent paragraphs detail the reasons behind the selection of each trait.

Norway spruce proportion and stand age (partially selected): The susceptibility of Norway spruce stands to attack by bark beetle is highly correlated with the abundance of its prime host tree and stand age. A spruce density image was derived from the Random Forest probability of the spruce mask generated from Sentinel-2 images. The spruce density raster ranged from 0% to 100% and had a spatial resolution of 10 m. Forest age was discarded because no reliable reference dataset was available, either to use directly or to train a supervised model. Unsupervised forest age models were considered not accurate enough.
Stand structure (discarded): No reference dataset was available for the whole study area, and generating a new one would require extensive LiDAR data, which were also unavailable.
Autochthonous tree species (discarded): The high presence of Norway spruce monocultures outside their natural range creates the ideal conditions for massive bark beetle outbreaks [2].
Stand density (selected): High canopy cover is directly related to the bark beetle life cycle [2]. Stand density was derived from the probability of a new Random Forest model trained with the same dataset as the spruce mask but grouping the samples into forest and non-forest classes. Samples were stratified to ensure the representativity of different forest types (e.g., conifers other than spruce, mixed deciduous forest) and non-forest land cover classes (e.g., urban, crops, natural grassland, bare soil). The stand density raster ranged from 0% to 100% and had a spatial resolution of 10 m.
Stand vitality and tree sociology (discarded): This trait was not selected since there was not a valid reference dataset and its generation is not straightforward.
Exploitation/Phytosanitary measures (discarded): Many protected areas are affected by unprecedented, large, and severe bark beetle outbreaks. Stand accessibility practices such as cleaning, removal of dead wood, and salvage cutting of infested trees prevent bark beetle outbreaks. Data on exploitation and phytosanitary measures were not included in the analysis, as this would have required extensive work to refine forest inventories, which was out of the scope of this study.
Topography (selected): Exposed spruce stands on ridges, hilltops, or upper slopes are particularly susceptible to bark beetle attacks [2]. Three EU-DEM products were used to include topographic information in the model: elevation, slope, and aspect, at 25 m spatial resolution.
Potential solar irradiation (selected): High solar irradiation triggers beetle propagation [75]. Potential irradiation between September 2020 and October 2021 was computed from the DEM using the implementation by Hofierka and Šúri [76], with a pixel size of 50 m.
Site water supply (discarded): This trait was not included in the analysis because the generation of a synthetic dataset about generic water supply would involve a high level of complexity. Moreover, the benefits of adding this variable might not be significant since there are no extremely dry or wet zones in the study area.
Soil type (discarded): Spruce stands suffering from root deterioration caused by stagnic soil conditions are particularly prone to suffering bark beetle damage [2]. No dataset was found containing this precise soil information.
Soil depth and skeleton (discarded): No valid data source was found. Its influence on bark beetle attack should be minor according to the EC [2].
Site index (discarded): There was no valid data source to estimate the site index in the whole study area.
Temperature (discarded): To the authors’ knowledge, the mean daily air temperature derived from Copernicus ERA5-Land at 10 km resolution is the only public source of spatially continuous data on temperature for the transboundary study area. Temperature was discarded to avoid lowering the final model resolution to 10 km.
Precipitation (discarded): As with temperature, ERA5-Land precipitation data were discarded to avoid lowering the final model resolution to 10 km.
Drought periods (discarded): Bark beetle damage increases after intense summer drought [15,77]. Drought datasets are available from the European Drought Observatory at 5 km spatial resolution. As using these datasets would lower the resolution of the model to 5 km, this variable was left out of the analysis.
Wind throw (discarded): Windstorms are directly related to greater bark beetle outbreaks [78]. To the authors’ knowledge, there are no datasets of direct wind damage. Its generation is possible using remote sensing data, but it is costly, and results often have poor quality in terms of accuracy [79]. ERA5-Land wind speed at 10 km resolution could be used as a proxy for wind damage, but it would require lowering the spatial resolution of the model.
Snow breakage (discarded): There are no datasets providing spatially continuous information on the forest damage caused directly by snowfall.
Other damages (discarded): The EC names fire, avalanches, and rock falls as examples of natural events causing forest damage. Fire counts could have been used in this study. Nevertheless, fire damage was not selected due to the low occurrence of wildfires in the study area (see Section 1).
Bark beetle abundance (selected): Exact data on bark beetle population were not available. Nevertheless, it can be assumed that bark beetle population is directly related to the extent and intensity of previous bark beetle attacks, as pointed out by the EC [2]. Hence, bark beetle damage was estimated for the year preceding the time frame of our analysis (October 2019–September 2020) using the methods described in Section 4.3. Subsequently, attack densities were computed in different spatial kernels as proxies of bark beetle abundance. According to [80,81,82], bark beetle (Ips typographus) individuals can fly up to 40–55 km away from infested spruce trees in exceptional cases. Hence, bark beetle attack density was computed for kernels within this flight distance range (i.e., kernels with a 100 m, 200 m, 1 km, 5 km, 10 km, 25 km, and 50 km radius were considered) [78,79].

The final density features, selected according to their importance as proxies of bark beetle abundance, were the attack densities within the 200 m, 1 km, 10 km, and 25 km kernels (see Section 4.4.2).

Population density in adjacent stands (partially selected): The bark beetle density variables previously explained were also considered as proxies of beetle population density.
Predisposition of adjacent stands to infestation (partially selected): In addition to the four bark beetle attack density features, the Euclidean distance to affected areas in the previous year was used as a proxy of the predisposition of adjacent stands to suffer a bark beetle attack. The distance in metres was computed with a resolution of 50 m.
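The two abundance proxies described in the list above, attack density within a circular kernel and Euclidean distance to the nearest previously attacked pixel, can be sketched on a small binary damage grid. The brute-force loops and function names are illustrative assumptions; for rasters of realistic size, a convolution and a distance transform would be the usual choices. On a 50 m grid, a 200 m kernel would correspond to `radius_px=4`.

```python
import math

def attack_density(damage, row, col, radius_px):
    """Fraction of damaged pixels within a circular kernel centred on (row, col)."""
    hits = total = 0
    for r in range(max(0, row - radius_px), min(len(damage), row + radius_px + 1)):
        for c in range(max(0, col - radius_px), min(len(damage[0]), col + radius_px + 1)):
            if (r - row) ** 2 + (c - col) ** 2 <= radius_px ** 2:
                total += 1
                hits += damage[r][c]
    return hits / total

def distance_to_damage(damage, row, col, pixel_size=50.0):
    """Euclidean distance in metres from (row, col) to the nearest damaged pixel."""
    distances = [math.hypot(r - row, c - col) * pixel_size
                 for r, line in enumerate(damage)
                 for c, value in enumerate(line) if value]
    return min(distances) if distances else math.inf

# Hypothetical previous-year damage map (1 = attacked) on a 3 x 3 grid.
damage = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]
centre_density = attack_density(damage, 1, 1, radius_px=1)
corner_distance = distance_to_damage(damage, 0, 0)
```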

The final subset of the selected variables is summarized in Table 2.

Although the initial variables had spatial resolutions between 10 and 25 m, all variables were resampled to 50 m to ease the computation of potential solar irradiation and bark beetle densities. The resolution of potential solar irradiation was limited by the available computing resources, and 50 m was found to be a workable compromise. Moreover, computing bark beetle density at resolutions finer than 50 m risked yielding insignificant or poor results. Hence, to build the predictive model at the selected resolution, all variables were resampled by computing the mean value of all pixels overlapping the final 50 m × 50 m pixel. This spatial resolution was the best trade-off between capturing bark beetle dynamics and including key environmental traits. All of the selected factors were included in a table to carry out a feature engineering process before training the bark beetle predictive model.
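Mean resampling to a coarser grid can be sketched as block averaging. The example assumes that the fine and coarse grids are perfectly aligned and that the coarse pixel size is an integer multiple of the fine one (factor 5 for 10 m to 50 m, factor 2 for 25 m to 50 m); the function name is hypothetical.

```python
def block_mean(grid, factor):
    """Aggregate a 2-D grid by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(grid) // factor, len(grid[0]) // factor
    out = []
    for big_r in range(rows):
        out_row = []
        for big_c in range(cols):
            block = [grid[big_r * factor + i][big_c * factor + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

# Hypothetical 10 m binary damage map (5 x 5 pixels) with one damaged row.
fine = [[1 if r == 0 else 0 for _ in range(5)] for r in range(5)]
coarse = block_mean(fine, factor=5)   # a single 50 m pixel
```

Applied to a binary map, the result is the proportion of the coarse pixel that is flagged, which is how a 0 to 1 proportion arises from a damage/no-damage layer.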

4.4.2. Feature Engineering

The first step of the feature engineering process was to transform the input variables for the model. Different transformation methods were used as follows:

All quantitative continuous variables remained with their original values and ranges.
The aspect was originally provided in degrees (0–360), measured clockwise from North (i.e., North = 0°, East = 90°, South = 180°, West = 270°). A generic model would normally assume that these values are on a linear scale (not an angle), and thus would consider that 359 is extremely far from 0, instead of recognizing them as neighbouring values with very similar meanings (i.e., North orientation). The aspect was thus divided into two components [84]: Sin(Aspect), the W-E component, and Cos(Aspect), the S-N component, both ranging from −1 to 1.
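The circular encoding of aspect can be illustrated with a short sketch (the function name is hypothetical). It shows that 359° and 0° end up close together in the transformed space, which is the motivation given above.

```python
import math

def aspect_components(aspect_deg):
    """Return (sin, cos) of an aspect given in degrees clockwise from North."""
    angle = math.radians(aspect_deg)
    return math.sin(angle), math.cos(angle)

north = aspect_components(0.0)       # sin = 0, cos = 1
east = aspect_components(90.0)       # sin = 1, cos close to 0
almost_north = aspect_components(359.0)

# Distance between 359 degrees and 0 degrees in the transformed space.
gap = math.dist(north, almost_north)
```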

With all variables correctly co-registered, resampled, and transformed, different techniques were applied to reduce dimensionality and avoid collinearity and redundancies. One of the main problems of one-hot encoding is that it increases the dimensionality of features proportionally to the number of possible values in each variable [85].

The pairwise Pearson correlation coefficient was also computed for all variables to study collinearity (Table 3). The highest correlations were found for the bark beetle density variables, with a maximum for the pair 1 km–25 km (r = 0.87). Spruce density also showed a relatively high correlation with stand density (r = 0.71). Since none of the variables yielded very high correlations (i.e., >0.9), it was decided to feed the model with all variables to avoid losing significant information.
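A collinearity screen of this kind can be sketched as follows: compute the Pearson coefficient for every pair of variables and flag the pairs whose absolute value exceeds a threshold. The feature names and values are invented for illustration; the 0.9 threshold is the one mentioned in the text.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def collinear_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(features, 2):
        r = pearson(features[a], features[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged

# Hypothetical feature table with four samples per variable.
features = {
    "density_1km": [1.0, 2.0, 3.0, 4.0],
    "density_5km": [2.0, 4.0, 6.0, 8.5],
    "slope": [4.0, 1.0, 3.0, 2.0],
}
flagged = collinear_pairs(features)
```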

After dimensionality reduction, the final variables selected to build the model were (1) spruce density, (2) stand density, (3) elevation, (4) slope, (5) aspect sin, (6) aspect cos, (7) potential solar irradiation, (8) bark beetle density 200 m, (9) bark beetle density 1 km, (10) bark beetle density 10 km, (11) bark beetle density 25 km, and (12) bark beetle distance.

The target variable of this study (i.e., the dependent variable) for predicting a bark beetle attack was bark beetle damage. As presented in Section 4.2, a bark beetle damage map was produced for the year 2021. The bark beetle categorical map was developed at 10 m spatial resolution and binarized into simple damage and no-damage categories. This layer was resampled to 50 m by computing the mean, yielding pixels with values from 0 to 1 that indicate the proportion of each 50 m pixel that was attacked between September 2020 and October 2021. Given that the bark beetle predominantly attacks Norway spruce, using this bark beetle damage map directly as the target variable could lead to misleading results, because the real factors affecting bark beetle behaviour (i.e., attack vs. no attack) in areas with similar spruce proportions might not be accurately captured. To mitigate this, a per-pixel spruce proportion map was derived by resampling the spruce map (see Section 4.2) to 50 m using the mean. The resampled damage/no-damage map was then multiplied by the spruce proportion, resulting in a final map showing the proportion of spruce that was damaged in each pixel.
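As an illustration of the resampling and masking steps described above, the following minimal Python sketch aggregates a toy 10 m binary grid to 50 m by block averaging and multiplies it by the spruce proportion. The grids, the function name `block_mean`, and the assumption that the grid dimensions are exact multiples of five are illustrative and are not taken from the study's code.

```python
def block_mean(grid, factor=5):
    """Average non-overlapping factor x factor blocks of a 2D list.

    Assumes both grid dimensions are exact multiples of `factor`.
    """
    out = []
    for r in range(0, len(grid), factor):
        row = []
        for c in range(0, len(grid[0]), factor):
            cells = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out

# Toy 10 m layers covering two 50 m pixels (5 rows x 10 columns).
damage_10m = [[1 if (r == 0 and c < 5) else 0 for c in range(10)] for r in range(5)]
spruce_10m = [[0 if (r == 4 and c < 5) else 1 for c in range(10)] for r in range(5)]

damage_50m = block_mean(damage_10m)  # proportion of each 50 m pixel attacked
spruce_50m = block_mean(spruce_10m)  # proportion of each 50 m pixel under spruce

# Target as described in the text: damage proportion times spruce proportion.
target_50m = [[d * s for d, s in zip(d_row, s_row)]
              for d_row, s_row in zip(damage_50m, spruce_50m)]
```

Going from 10 m to 50 m reduces the number of pixels by a factor of 25, which is also what makes the later processing steps cheaper.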

This last variable was used as the target, directing the model’s focus towards estimating the proportion of spruce that had been damaged by bark beetles in each 50 m × 50 m pixel. This final target was a continuous variable ranging from 0 to 1 (i.e., 0% to 100% of the spruce in the pixel area was attacked by the bark beetle between September 2020 and October 2021). As the distribution of this variable was highly skewed towards 0, the target was transformed by applying a logarithmic function (Equation (1)) before training the model. This process smoothed the exponential-like distribution of the original values towards a less skewed distribution, helping the model to perform better at higher attack intensity levels, which were originally under-represented.

y' = \log(1 + y)    (1)

where y′ is the value of the dependent variable after the transformation and y is the original one.
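A minimal sketch of the transformation in Equation (1) and of its inverse is given below. It assumes the natural logarithm, which the text does not state explicitly, and the function names are hypothetical. The inverse is included because predictions made on the transformed scale need to be mapped back to proportions before they can be interpreted.

```python
import math

def transform_target(y):
    """Equation (1): y' = log(1 + y), assuming the natural logarithm."""
    return math.log1p(y)

def inverse_transform_target(y_prime):
    """Inverse of Equation (1): y = exp(y') - 1."""
    return math.expm1(y_prime)

original = [0.0, 0.01, 0.05, 0.37, 1.0]
transformed = [transform_target(y) for y in original]
recovered = [inverse_transform_target(v) for v in transformed]
```

`math.log1p` and `math.expm1` are used instead of `math.log(1 + y)` and `math.exp(y) - 1` because they stay accurate for the very small values that dominate a target skewed towards 0.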

Before model selection and hyperparameter tuning, the dataset containing all predictive features was split into two subsets, one for training and one for testing. The training subset contained 80% of the data and was used to fit the different models described in Section 4.4.3. The remaining 20% was used to internally test model performance and to compare the different combinations of architectures and hyperparameters using the coefficient of determination (r²), root mean square error (RMSE), and mean absolute error (MAE).
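The split and the three evaluation metrics can be sketched with the standard library alone. The random seed, the toy values, and the function names are assumptions for illustration; the 80/20 proportion and the metrics follow the text.

```python
import math
import random

def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle sample indices and return (train_indices, test_indices)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

train_idx, test_idx = train_test_split_indices(100)

# Toy observed and predicted damage proportions (hypothetical values).
y_true = [0.0, 0.1, 0.2, 0.4]
y_pred = [0.0, 0.1, 0.3, 0.3]
scores = {"r2": r2_score(y_true, y_pred),
          "rmse": rmse(y_true, y_pred),
          "mae": mae(y_true, y_pred)}
```

For any set of predictions the RMSE is at least as large as the MAE, which is a useful sanity check when reading tables that report both.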

4.4.3. Model Selection and Hyperparameter Tuning

Different architectures were tested to select the best model. As the target and the desired output of the model were continuous variables, only regression models were considered [86]. Four different algorithms were selected: Ordinary Least Squares (OLS) [87], Support Vector Regression (SVR) [88], Random Forest regression (RF) [73], and LightGBM [89]. OLS is a simple linear regression method that finds the best-fitting linear relationship between the independent and dependent variables by minimizing the sum of the squared differences between the predicted and actual values. It is straightforward and interpretable, but it assumes a linear relationship between variables and may not perform well with complex, nonlinear data. SVR is a regression technique based on support vector machines, which aims to find a hyperplane that minimizes the margin of error while adhering to a defined tolerance level. It is effective in capturing nonlinear patterns and offers control over the level of tolerance, but it can be computationally very intensive and sensitive to hyperparameter choices (i.e., prone to overfitting if poorly tuned). RF regression is an ensemble method that combines multiple decision trees to make predictions, offering good flexibility and robustness while reducing overfitting. RF excels at handling complex data and provides feature importance, but it can be less interpretable than a single model. LightGBM is a gradient boosting framework that uses a histogram-based approach to constructing decision trees, resulting in faster training and efficient handling of large datasets. It offers strong predictive power, but it requires careful hyperparameter tuning to avoid overfitting and, as with RF, has limited interpretability. The choice among these algorithms involves a trade-off between model interpretability, computational efficiency, and predictive accuracy.

The same optimization strategy was applied to tune the hyperparameters of all algorithms [90,91,92]. Hyperparameter tuning is essential in machine learning as it significantly impacts the model’s performance and generalization. It allows for optimizing the model’s accuracy and ensuring robustness across different datasets. Proper tuning also enhances the model’s efficiency and can influence its interpretability. The proposed systematic hyperparameter tuning strategy involved a two-phase process: a broad random search [93] was followed by a more precise grid search [92]. The initial random search phase explored a wide range of hyperparameter values to identify promising regions within the hyperparameter space. Subsequently, a grid search was conducted in these regions with finer granularity, leading to the selection of optimal hyperparameter values. Each configuration of model/hyperparameters was evaluated by computing the train and test r², mean absolute error (MAE), and root mean square error (RMSE).
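The two-phase strategy can be sketched as a coarse random search followed by a fine grid search centred on the best random candidate. In this minimal illustration, `score` is a hypothetical stand-in for the validation score of a trained model, and the search ranges, step sizes, and number of iterations are assumptions rather than the settings used in the study.

```python
import random

def score(n_estimators, max_depth):
    """Hypothetical validation score (higher is better); a toy smooth surface."""
    return -((n_estimators - 250) / 100.0) ** 2 - ((max_depth - 41) / 10.0) ** 2

def random_search(rng, n_iter=30):
    """Phase 1: sample hyperparameters at random over wide ranges."""
    best = None
    for _ in range(n_iter):
        params = (rng.randrange(50, 501, 10), rng.randrange(5, 61))
        candidate = (score(*params), params)
        if best is None or candidate > best:
            best = candidate
    return best

def grid_search(center, step_n=10, step_d=1, radius=3):
    """Phase 2: exhaustive search on a fine grid around the phase-1 optimum."""
    n0, d0 = center
    best = None
    for n in range(n0 - radius * step_n, n0 + radius * step_n + 1, step_n):
        for d in range(d0 - radius * step_d, d0 + radius * step_d + 1, step_d):
            if n < 1 or d < 1:
                continue  # skip invalid hyperparameter values
            candidate = (score(n, d), (n, d))
            if best is None or candidate > best:
                best = candidate
    return best

rng = random.Random(0)
coarse_score, coarse_params = random_search(rng)
fine_score, fine_params = grid_search(coarse_params)
```

Because the fine grid always contains the best random candidate, the second phase can only match or improve the score found in the first phase.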

4.4.4. Feature Importance

Feature importance was computed based on the Gini impurity decrease [73] for the Random Forest model. The Gini importance measures how much each feature helps to make the tree nodes purer. Node impurity quantifies the unpredictability in the target variable at a node (i.e., the point of the decision tree where a split is made). In Random Forests, features that effectively reduce the impurity (i.e., increase purity) when splitting nodes are considered important. The overall importance of a feature is averaged across all trees in the forest, highlighting its influence on predictions. This method is commonly used to study which features have a stronger influence on the model’s result. Other methods were also analyzed, such as examining OLS coefficients or LightGBM split feature importance. Since RF was the best model according to the results (see Section 5), only the RF feature importance is shown in this work.

4.4.5. Inference and Postprocessing

Once the best model was selected, fine-tuned, and trained, it was used to infer the proportion of spruce attacked between September 2020 and October 2021. This result was divided by the known spruce proportion in September 2020 to obtain the bark beetle attack probability in each pixel, ranging from 0 (i.e., no attack, 0% of the pixel is predicted to be attacked at the end of the next season) to 1 (i.e., 100% of the pixel area is expected to suffer a bark beetle infestation at the end of the next season).
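The post-processing step can be sketched as follows. The division by the spruce proportion follows the text, whereas the guard for pixels without spruce, the clipping to the 0–1 range, and the assumption that the prediction has already been mapped back from the logarithmic scale are choices made for this illustration only.

```python
def attack_probability(predicted_damage, spruce_proportion, eps=1e-6):
    """Divide the predicted damage by the spruce proportion and clip to [0, 1].

    Pixels with (almost) no spruce are assigned 0 to avoid division by zero;
    this guard and the clipping are assumptions, not part of the source text.
    """
    if spruce_proportion <= eps:
        return 0.0
    return min(1.0, max(0.0, predicted_damage / spruce_proportion))

# Toy pixels: (predicted damage, spruce proportion), hypothetical values.
pixels = [(0.2, 0.8), (0.5, 0.0), (0.9, 0.5)]
probabilities = [attack_probability(d, s) for d, s in pixels]
```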

4.5. Validation

Validation was carried out using the Saxon Switzerland National Park ground truth data (see Section 3.2 for a detailed description). The damaged areas registered in 2021 were considered ground truth for predicted damage, while the areas registered in 2020 were discarded. This produced an imbalanced reference dataset, with 1281 (11%) samples of damage and 10,448 (89%) of no-damage.

The bark beetle attack product was binarized into damage and no-damage categories using different probability thresholds between 0 and 100%. For each threshold, a confusion matrix was built, and the following performance metrics were computed: overall accuracy (OA), precision (P), recall (R), F1-score (F1), relative bias (relB), and commission and omission errors (CE and OE, respectively).

It is important to consider that, with the available ground truth classes, possible damage suffered between September and December 2020, even if correctly detected by the model, might bias the validation towards a falsely high commission error (i.e., precision might decrease). Conversely, areas damaged between November and December 2021 should not be detected by the model, and this might bias the validation results towards a higher omission error (i.e., recall would decrease). Given that 89% of the validation samples are no-damage, the validation results are likely to be biased towards lower recall values. However, the areas damaged in autumn–winter should always be residual.

After studying the performance metrics of each binary classification, the best threshold to binarize the continuous bark beetle damage prediction was found through the optimization of the F1-score, which effectively balances precision and recall.
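The threshold sweep and the F1-based selection can be sketched as below. The formulas are commonly used definitions (CE as the share of predicted damage that is wrong, OE as the share of reference damage that is missed, relB as (FP − FN) / (TP + FN)); the text does not spell them out, so they are assumptions and may differ from those applied by the authors. The toy probabilities, labels, and thresholds are hypothetical.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def binary_metrics(probs, labels, threshold):
    """Binarize probabilities at `threshold` and derive agreement/error metrics."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "OA": safe_div(tp + tn, len(labels)),
        "P": precision,
        "R": recall,
        "F1": safe_div(2 * precision * recall, precision + recall),
        "CE": safe_div(fp, tp + fp),         # commission error
        "OE": safe_div(fn, tp + fn),         # omission error
        "relB": safe_div(fp - fn, tp + fn),  # relative bias
    }

def best_threshold(probs, labels, thresholds):
    """Return the threshold with the highest F1-score."""
    return max(thresholds, key=lambda t: binary_metrics(probs, labels, t)["F1"])

# Toy predicted probabilities and reference labels (1 = damage).
probs = [0.9, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
thresholds = [0.25, 0.5, 0.75]

mid_metrics = binary_metrics(probs, labels, 0.5)
selected = best_threshold(probs, labels, thresholds)
```

With these definitions, a negative relB means that omissions outnumber commissions, which is the sense in which the relative bias is read in the results.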

5. Results

5.1. Spruce Classification

The output of the spruce classification algorithm was a binary spruce/non-spruce map of the study area at 10 m spatial resolution (Figure 3).

The spruce classification map was evaluated by building a confusion matrix from the product and the 20% of the reference samples that were not used for training. Different performance metrics were derived from the confusion matrix (Table 4). Agreement metrics reached high values (OA = 91%, F1 = 86%). However, a relatively high omission error (OE = 21%) revealed some underestimation of the spruce area.

5.2. Bark Beetle Damage Estimate

The bark beetle damage estimate map generated for September 2020–October 2021 had a spatial resolution of 10 m (Figure 4) and three different categories depending on the damage intensity.

Performance metrics are shown in Table 5. It is important to note that these statistics were computed based solely on reference data from the Saxon Switzerland N.P. (see Section 4.4 for more details). The metrics give information about the model’s ability to map bark beetle damage, with high values for agreement metrics (OA = 82%, F1 = 85%). Although there is room for improvement regarding the commission error (CE = 21%), the resulting value should be taken with caution, given the imbalanced nature of the validation dataset (i.e., 11% damage vs. 89% no-damage).

5.3. Bark Beetle Predictive Model

5.3.1. Model Comparison and Selection

After adjusting the hyperparameters of all the models tested (i.e., OLS, RF, SVR, XGBoost, LightGBM), Random Forest yielded the best results (Table 6).

For the training phase, the r² values ranged from 0.23 for OLS to 0.49 for RF, indicating the proportion of the variance in the dependent variable captured by the models. Among the ensemble methods, RF exhibited the highest training r². Moving to the testing phase, the r² remained relatively consistent across models, with RF and LightGBM yielding the highest scores at 0.48 and 0.46, respectively. This suggests that these models generalize well to unseen data. OLS and SVR exhibited slightly lower testing r² compared to ensemble methods.

In terms of RMSE, RF and LightGBM outperformed the other models, demonstrating lower prediction errors on the test set. Similar trends were observed for MAE, where RF and LightGBM yielded smaller values compared to OLS, SVR, and XGBoost.

Overall, the results highlight the varying performance of the considered machine learning models, with ensemble methods, particularly Random Forest and LightGBM, demonstrating favourable predictive capabilities on the tested dataset. Although LightGBM clearly outperformed other algorithms in terms of computational efficiency, RF was chosen as the best model to predict bark beetle attacks in view of the performance metrics. The hyperparameters for the RF regression optimized through random + grid search are shown in Table 7.

The choice of square root for max_features ensures a balanced selection of features in each split, while the use of the squared error as the criterion prioritizes the minimization of the mean squared error during model training. Moreover, the selected 250 estimators contribute to the robustness of the ensemble, allowing the model to better capture complex relationships within the data. The maximum tree depth of 41 serves as a regularization mechanism, preventing the model from becoming overly intricate and improving its ability to generalize to unseen data.
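For readers who want to reproduce a comparable configuration, the four hyperparameters named above can be written as a small configuration. The sketch assumes scikit-learn's `RandomForestRegressor`, which accepts these parameter names; the text does not state which library was used, and any further settings listed in Table 7 are not reproduced here. The builder function and the `random_state` value are hypothetical.

```python
# Hyperparameters named in the text; other settings are left at library defaults.
RF_PARAMS = {
    "n_estimators": 250,           # number of trees in the ensemble
    "max_depth": 41,               # maximum depth of each tree
    "max_features": "sqrt",        # features considered at each split
    "criterion": "squared_error",  # split quality measured by squared error
}

def build_random_forest(params=RF_PARAMS, random_state=0):
    """Build an untrained model; scikit-learn is imported only when called."""
    from sklearn.ensemble import RandomForestRegressor
    return RandomForestRegressor(random_state=random_state, **params)
```

Calling `build_random_forest().fit(X_train, y_train)` would then train the model on a feature matrix and the transformed target.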

The selected RF model yielded very low errors (RMSE = 0.03, MAE = 0.11), highlighting the good adjustment of the model. Nevertheless, the coefficient of determination was modest (train r² = 0.49, test r² = 0.48). This coefficient reveals a clear relationship between the analyzed factors and bark beetle damage, but its relatively low value highlights some of the shortcomings of the approach. Firstly, although the selected variables are significant for the model, the model is evidently missing some variables that might help to explain bark beetle attack occurrence. It is important to point out that, among other variables, climate and forest age were not considered in this analysis due to data limitations and low resolutions, although they are well-known drivers of bark beetle infestation.

5.3.2. Feature Importance

The Gini impurity-based feature importance ranking derived from the RF model is shown in Table 8.

Spruce and stand densities stand out as the most important variables according to the RF importance, accounting for 21.34% and 16.65% of the total feature importance, respectively. Together, these two variables (17% of the total number of variables) accumulate 38% of the importance, and around 50% is explained by the first three variables. After spruce and stand densities, the density and distance of bark beetle attacks in the previous year formed the second most important group of variables, representing 31.7% of the importance. In third place, topography-derived variables (i.e., elevation, slope, aspect, and solar irradiation) add up to 30.3% of the feature importance.

5.3.3. Final Model

As explained in Section 4.4.5, the map resulting from applying the trained RF model to the whole study area, representing the proportion of spruce damaged per pixel, was divided by the spruce proportion to obtain a single map of damage per pixel area (Figure 5). Areas with the highest bark beetle attack probability (September 2020–October 2021) are located in the spruce plantations near the German–Czech border and, with less intensity, the Polish–Czech border. These regions are characterized by a mountainous topography and the high presence of monospecific Norway spruce plantations, although in some parts there is a relatively high proportion of naturalized spruce individuals in mixed stands (e.g., Saxon Switzerland N.P. in Germany).

5.3.4. Validation and Damage Calibration

As explained in the Methods Section, the predictive map was binarized into damage and no-damage classes using different thresholds (Table 9, Figure 6). The performance metrics showed that the model has a fair ability to discriminate areas with a higher probability of suffering bark beetle attack. For the mid-range thresholds (45%–75%), the OA was always above 90%. Nevertheless, given the imbalance in both the spatial distribution of bark beetle attacks and the reference dataset, it was decided to use F1 as the reference metric. F1 outperforms OA in applications where the performance on the minority class (i.e., damage, 11% of the samples) or a balanced precision–recall trade-off is crucial. The F1-score was higher than 75% at probability thresholds around 50%.

Figure 6 shows how the precision increases with the threshold, while recall experiences a gradual decline. The F1-score, the harmonic mean of precision and recall, exhibits a peak around the 52% threshold. The right panel provides insights into commission and omission errors, and relative bias. The commission errors exhibit a decreasing trend, while omission errors rise with high thresholds. Relative bias yields the best value (i.e., closest to 0) around the 50% threshold (Table 9), highlighting the balanced performance of the model when probability is binarized between 45% and 52%. Negative relative bias values in the mid-thresholds reveal a slight systematic bias towards omission.

The threshold for the prediction of bark beetle attack occurrence was established at 52% according to the optimization of F1, with values above this being categorized as damaged, and values below as undamaged. Recall in mid-thresholds might yield higher values in a more balanced dataset. The performance metrics (Table 10) yield a comprehensive assessment of the classification model’s efficacy.

The metrics (Table 10) show the fair performance of the model, with an overall accuracy of 92.3%, indicating its proficiency in correctly categorizing both positive and negative instances. Precision and recall also yielded fair values (P = 81.6%, R = 75.3%). Nevertheless, errors were relatively high (CE = 31.2%, OE = 46.2%, relB = −22.2%). The F1-score, the harmonic mean of precision and recall, stands at 78.0%.

6. Discussion

According to our model, bark beetle attacks can be predicted with relatively high accuracy. More than one-third of the attack probability in areas dominated by spruce seems to be determined by forest structure (i.e., stand density and spruce density), while the other two-thirds are related to the closeness to areas damaged in previous bark beetle outbreaks and topographic factors, including solar irradiation. As previously commented (c.f. Section 5.3.1), we have identified two main data limitations: first, some factors, such as forest age, were not considered in this analysis because of a lack of harmonized field datasets, even though they are recognized as relevant traits to predict bark beetle infestation [2]; second, the coarser spatial resolution of climate time series restricted the possibility of including precipitation and temperature variables in the analysis, which might improve the model’s ability to capture a clearer and stronger relationship between factors and bark beetle attack.

The model validation carried out in the Saxon Switzerland N.P. (c.f. Section 5.3.4) yielded fair overall performance metrics (OA = 92.3%, F1 = 78.0%) for the attack probability threshold of 52%, with a slightly decreased predictive performance towards higher and lower attack probabilities. It is important to note that OA is not the best metric to assess model performance, as a model classifying all samples as no-damage would yield an accuracy of 89% given the class imbalance. In these cases, it is highly recommended to look at the F1-score, which provides more meaningful insights on model performance when the classes are not balanced. Precision, which indicates the proportion of correctly classified samples among the predicted positive ones, increased with higher thresholds, since low thresholds lead to higher proportions of false positives (i.e., higher CE). Conversely, high thresholds lead to a decrease in recall, which indicates the proportion of real damaged samples that were correctly classified, and to an increase in OE. Recall was the most problematic metric, with values lower than other metrics in the central thresholds. This means that a certain proportion of the real damage samples were not correctly classified. Nevertheless, considering that only 11% of the reference samples belong to damaged areas, this metric should be interpreted with caution. It might be that some proportion of the reference damage samples corresponds to areas affected between October and December 2021, which are not detected by the model. Precision, which was calculated by taking into account the ground truth samples of no-damage (i.e., 89%), is more stable in this case and yielded higher values than recall at all the central thresholds; if a more complete validation is carried out using a greater area and a more balanced dataset, the resulting values are likely to be close to the ones obtained in this research. Nevertheless, from a management perspective, recall might be a more interesting metric, since it reveals the proportion of actual damaged areas that are correctly detected by the model.
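The effect of the class imbalance on OA can be checked with a short calculation based on the reference sample counts reported in Section 4.5 (1281 damage and 10,448 no-damage samples). The sketch evaluates a trivial classifier that labels every sample as no-damage; the variable names are illustrative.

```python
n_damage = 1281       # reference samples labelled as damage
n_no_damage = 10448   # reference samples labelled as no-damage

# A classifier that always predicts "no-damage" has no true or false positives.
tp, fp = 0, 0
fn, tn = n_damage, n_no_damage

baseline_oa = (tp + tn) / (tp + fp + fn + tn)
baseline_recall = tp / (tp + fn)
# Precision is undefined without positive predictions; F1 is taken as 0 here.
baseline_f1 = 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)
```

The baseline reaches an OA of about 0.89 while detecting none of the damaged samples, which is why F1 is the more informative reference metric for this dataset.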

To delve deeper into the model’s performance, error metrics were examined. The relatively high CE (31.2%) and OE (46.4%) reveal that there is room for improvement in terms of false positives and false negatives. Although the relative bias remains close to 0 in the central probability thresholds, indicating a balanced prediction capability without significant bias, the trend towards negative values underscores a bias towards omission. Again, this bias might be overestimated as a consequence of the high class imbalance in the reference dataset. It is important to bear in mind that, in comparison with previous studies [35,93], the methods proposed in the present research paper do not depend on field surveys or local data, as they are based on freely accessible remote sensing data (e.g., Sentinel-2) and other datasets at global or pan-European levels (e.g., EU-DEM). It is also important to consider the possible propagation of errors coming from some of the products used as inputs. Such is the case of spruce classification, which yielded an omission error of 21%. Future efforts are needed to improve the spruce classification algorithm to mitigate the propagation of uncertainties.

Our methods were designed and implemented to be scalable to other locations across Europe and eventually be adapted to other continents. Rammer and Seidl [39] adjusted a bark beetle predictive model for the Bavarian Forest N.P. (Germany) based on neural networks without using field data. However, only climatic information and damage records were used as predictive variables, missing the influence of forest structure and topography that has been analyzed in this study. Although their model showed a high inference capacity, the performance metrics obtained against ground truth data were modest (F1 = 0.49). A model similar to the one presented here has been proposed to predict the damage caused by Dendroctonus frontalis in the United States [94]. Although the authors obtained an overall accuracy of 87.7%, the different ecological dynamics of the beetles make these results hardly comparable. The same occurred with other publicly available datasets from several agencies in Germany (Saxony State Forestry, Saxon Switzerland N.P., Brandenburg State Forestry), Czech Republic (Czech Forest Management Institute), and Poland (Polish Forest Research Institute) [95,96,97], which were not tailored to the model’s temporal and spatial resolution (c.f. Acknowledgements).

It is crucial to emphasize that this methodology is designed for delivering general information within a regional or global framework. Inherent uncertainties exist in both the methodology and the data sources used. To the authors’ knowledge, the proposed methodology for predicting a potential attack of Ips typographus is the first one to be fully developed and based on open geospatial datasets. Therefore, we have carried out a SWOT (strengths, weaknesses, opportunities, and threats) analysis (Table 11) that might help researchers identify further research lines to improve the prediction of a bark beetle attack at local scales.

In summary, the lack of field data and spatially continuous datasets in the whole region and the absence of validation datasets tailored to the model’s temporal and spatial resolution underscore the need for high-quality, finely resolved data for accurate predictions. Overcoming these challenges may involve improved data collection methods, data standardization, and the development of validation datasets that mirror the model’s specific requirements. This, in turn, can enhance the model’s utility and reliability in assessing bark beetle infestation hazards. This analysis helped understand the biotic context of a region, which, combined with abiotic factors, may trigger a cascading effect [45].

7. Limitations

7.1. Validation Datasets

There is a lack of appropriate data for calibrating and validating bark beetle outbreaks in the study area. Four datasets have been gathered to monitor bark beetle dynamics (see Appendix A for a detailed description):

Saxon Switzerland National Park bark beetle data records from 1990 to 2021 derived from field surveys provided by TUD (c.f. Section 3.2 for a detailed description).
Damaged and open areas in the Saxon Forest from Sentinel-2 data [96] provided by Saxony State Forestry.
Map of areas of bark beetle spread derived from Planet satellite data [97], provided by Kůrovcová mapa of the Czech Forest Management Institute.
The database of European Forest Insect and Disease Disturbances (DEFID2, [38,95]) provided by JRC.

From the above list, only the bark beetle data records from the Saxon Switzerland National Park could be used to validate our analysis. The three remaining datasets were discarded for the following reasons:

They were not tailored to the time frame of the modelling analysis (e.g., the Saxony State Forestry dataset) due to the presence of clouds and snow (c.f. Section 4.1) at the beginning of the year. Images taken to analyze the status of the forest during the hibernation period of the insect were from the previous year (c.f. Section 4.2). However, Forest Agencies grouped data annually or through a multiannual composite.
Some datasets were not used in the validation process as their bark beetle damage definition was not consistent with the bark beetle damage considered in our study. Datasets such as DEFID2 or Kůrovcová mapa use different categories (e.g., clearcuts, deadwood, unprocessed deadwood) that include areas that have been damaged by bark beetle, but not necessarily in the time frame of the damage assessment campaign. Using these data to validate our predicted bark beetle damage might introduce biases, as it is hard to identify their real sources.

7.2. Datasets on Environmental Factors

Although climate variables are essential to understanding bark beetle outbreaks [15,44,98], there are no spatially continuous datasets at high resolution. ERA-5 Land temperature, precipitation, and wind variables were discarded due to their low spatial resolution (10 km), which was not consistent with the scale of the analysis proposed. The generation of higher-resolution climate datasets using local meteorological stations and interpolation methods might be a solution, but gaining access to meteorological stations in the three different countries (Germany, Poland, and the Czech Republic) included in the study area defined in the FirEUrisk project was not straightforward, and the selection of an interpolation method would have had an impact on the results of the model. Nevertheless, we were able to include the variable of potential solar irradiation, which relates to temperature because, in large openings, both variables trigger bark beetle propagation. Moreover, nitrogen (N) deposition can fertilize trees and improve the nutrient quality of needles, producing further attacks [2]. N deposition plays an important role in gross primary productivity, especially in N-limited ecosystems [99]. Forest age, which is also known as one of the main factors favouring bark beetle attack, was not included given the lack of datasets containing such information for the whole transboundary area. Although some forest management plans include data on stand age, substantial efforts should be made in harmonizing this information [17,78,79].

7.3. Spatial Resolution

The selected spatial resolution was a compromise between the 100 m required by the project and the 10 m of our variables with higher resolution (i.e., the ones derived from Sentinel-2). Choosing 50 m as the target spatial resolution also significantly reduced the computational burden, which was especially heavy for processes such as computing potential solar irradiance and bark beetle densities, and for hyperparameter tuning.

8. Conclusions

This study proposes a novel and integrated methodological framework by combining different open geospatial datasets with ML techniques to create a predictive model of spruce bark beetle attack for Central Europe in 2021. The time frame of the analysis was selected to understand the environmental context of the region and potential factors that may trigger cascading effects, especially prior to the wildfire that occurred in the cross-border region of the Saxon Switzerland N.P. between Germany and the Czech Republic in summer 2022.

Different environmental datasets were used together with Earth Observation data to train different ML models and produce a bark beetle attack predictive map for September 2020–October 2021 at a spatial resolution of 50 m. Random Forest regression was the best of the architectures tested, yielding a fair adjustment (train r² = 0.49, test r² = 0.48) and low errors (RMSE = 0.03, MAE = 0.11). The resulting bark beetle predicted damage was validated using independent field data records from the Saxon Switzerland National Park. The performance metrics derived from the validation process highlight the ability of the model to predict bark beetle attacks. Fair agreement metrics (F1 = 78.0, OA = 92.3) and balanced errors emphasize the model’s reliability in making predictions. Nevertheless, the lack of accurate reference datasets prevented a robust validation over all the study area.

Its performance indicates the model’s potential utility across a range of scientific and real-world applications. These findings serve as a foundation for future investigations, potentially guiding refinements in the model to better align with specific task requirements. Future improvements should focus on adding more input variables, such as forest age and climate, and using a larger dataset to validate the damage prediction maps.

The proposed approach not only facilitates the prevention and early detection of outbreaks but also allows forest managers and stakeholders to obtain a more nuanced understanding of the complex interplay between environmental factors and bark beetle dynamics. This framework represents a significant step forward in the ongoing efforts to improve the sustainability and resilience of forests against the threat of bark beetle.

Author Contributions

Conceptualization, M.G.-G.; methodology, Á.F.-C., A.F.-N. and M.G.-G.; software, Á.F.-C.; validation, Á.F.-C. and A.F.-N.; investigation, Á.F.-C., A.F.-N. and M.G.-G.; formal analysis, Á.F.-C. and A.F.-N.; data curation, Á.F.-C. and A.F.-N.; writing—original draft preparation, Á.F.-C., A.F.-N. and M.G.-G.; writing—review and editing, Á.F.-C., A.F.-N., M.G.-G. and M.J.Y.-B.; visualization, Á.F.-C. and A.F.-N.; supervision, M.G.-G.; project administration, M.G.-G.; funding acquisition, M.J.Y.-B. All authors have read and agreed to the published version of the manuscript.

Funding

This paper is part of the FirEUrisk project research funded by the European Union’s Horizon 2020 research and innovation program under Grant Agreement No. 101003890.

Data Availability Statement

Data presented in this study is available at https://doi.org/10.5281/zenodo.12533018.

Acknowledgments

In the context of the FirEUrisk project, the Technische Universität Dresden provided this study with bark beetle damage records (1990–2021) for the Saxon Switzerland National Park. We are grateful to Christopher Marrs (TUD) for establishing contact with several agencies to request data to calibrate and validate the analysis: (i) Germany: Saxony State Forestry (Staatsbetrieb Sachsenforst), Brandenburg State Forestry (Landesbetrieb Forst Brandenburg), and Saxon Switzerland National Park (Nationalpark Sächsische Schweiz); (ii) Czechia: Czech Forest Management Institute (Ústav pro hospodářskou úpravu lesů Brandýs nad Labem, ÚHÚL) and Czech State Forest (Lesy České republiky, LČR); and (iii) Poland: Polish State Forests (Lasy Państwowe), Polish Forest Research Institute (Instytut Badawczy Leśnictwa), and Polish Forest Management and Geodesy Office (Biuro Urządzania Lasu i Geodezji Leśnej, BULIGL).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Table A1. Data Sources.

Provider	Data Source	Reference
National Forest Inventory, NFI	(2021) Lower-Silesian Forest data were provided by the Polish Forest Management and Geodesy Office (Biuro Urządzania Lasu i Geodezji Leśnej, BULIGL) [Dataset]	[49,50]
National Forest Inventory, NFI	(2021) Brandenburg forest data were provided by Brandenburg State Forestry (Landesbetrieb Forst Brandenburg) [Dataset]	[51]
National Forest Inventory, NFI	(2021) Saxon forest data were provided by Saxony State Forestry (Staatsbetrieb Sachsenforst) [Dataset]	[52]
Copernicus Land Monitoring Service, CLMS	(2023). CLC + Backbone 2018, CORINE Land Cover +. Feb. 2023. [Dataset] DOI: https://doi.org/10.2909/cd534ebf-f553-42f0-9ac1-62c1dc36d32c	[53,54]
Copernicus Land Monitoring Service, CLMS	(2016). EU-DEM (raster)—version 1.1, Apr. 2016. [Dataset] PID: https://sdi.eea.europa.eu/catalogue/copernicus/api/records/3473589f-0854-4601-919e-2e7dd172ff50, accessed on 17 May 2024	[55]
Joint Research Centre, JRC	(2023) European Forest Insect & Disease Disturbances, DEFID2. Joint Research Centre, JRC [Dataset] PID: http://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/FOREST/DISTURBANCES/DEFID2/, accessed on 17 May 2024	[38,95]
Dresden University of Technology, TUD	(2021) Saxon Switzerland National Park bark beetle data derived from field work and provided by Dresden University of Technology (Technische Universität Dresden, TUD) [Dataset]	-
Dresden University of Technology, TUD	(2020) Damaged and open areas in the Saxon Forest from Sentinel-2 data by Staatsbetrieb Sachsenforst and provided by Dresden University of Technology (Technische Universität Dresden, TUD) [Dataset]	-

Appendix B

Table A2. Environmental Traits Exposing European Forests to Bark Beetle Attack, According to the European Commission.

Environmental Traits That Predispose towards Bark Beetle Attack		Selected Factors	Data Source
Stand factors/Silviculture	Norway spruce proportion and stand age	Spruce density	EO-derived
	Stand structure	-	-
	Autochthonous tree species	-	-
	Stand density	Stand density	EO-derived
	Stand vitality and tree sociology	-	-
Stand management	Exploitation/Phytosanitary measures	-	-
Site characteristics	Topography	Elevation	EU-DEM
		Slope	EU-DEM
		Aspect	EU-DEM
	Potential solar irradiation	Potential solar irradiation	DEM-derived
	Site water supply	-	-
	Soil type	-	-
	Soil depth and skeleton	-	-
	Site index	-	-
Climate	Temperature	-	-
	Precipitation	-	-
	Drought periods	-	-
Pre-damages	Wind throw	-	-
	Snow breakage	-	-
	Other damages	-	-
Population dynamics	Abundance	Bark beetle attack density (September 2019–September 2020)	EO-derived
Situation in adjacent forest stands	Population density in adjacent stands	Bark beetle attack density (September 2019–September 2020)	EO-derived
Situation in adjacent forest stands	Predisposition of adjacent stands to infestation	Bark beetle attack distance (September 2019–September 2020)	EO-derived

References

Schelhaas, M.-J.; Nabuurs, G.-J.; Schuck, A. Natural Disturbances in the European Forests in the 19th and 20th Centuries. Glob. Change Biol. 2003, 9, 1620–1633.
BIO Intelligence Service. Disturbances of EU Forests Caused by Biotic Agents, Final Report Prepared for European Commission (DG ENV); European Commission: Luxembourg, 2011; Volume 270.
Raffa, K.F.; Grégoire, J.-C.; Lindgren, B.S. Chapter 1—Natural History and Ecology of Bark Beetles. In Bark Beetles; Vega, F.E., Hofstetter, R.W., Eds.; Academic Press: San Diego, CA, USA, 2015; pp. 1–40. ISBN 978-0-12-417156-5.
Hlásny, T.; König, L.; Krokene, P.; Lindner, M.; Montagné-Huck, C.; Müller, J.; Qin, H.; Raffa, K.F.; Schelhaas, M.-J.; Svoboda, M.; et al. Bark Beetle Outbreaks in Europe: State of Knowledge and Ways Forward for Management. Curr. For. Rep. 2021, 7, 138–165.
Hlásny, T.; Zimová, S.; Merganicova, K.; Stepanek, P.; Modlinger, R.; Turcani, M. Devastating Outbreak of Bark Beetles in the Czech Republic: Drivers, Impacts, and Management Implications. For. Ecol. Manag. 2021, 490, 119075.
Knížek, M.; Liška, J.; Lubojacký, J. Recent Spruce Bark Beetle Calamity in Czechia. In Proceedings of the Forest Future 2020: Consequences of Bark Beetle Calamity for the Future of Forestry in Central Europe, Jihlava, Czech Republic, 20–23 June 2020.
Grodzki, W. Mass Outbreaks of the Spruce Bark Beetle Ips typographus in the Context of the Controversies around the Białowieża Primeval Forest. For. Res. Pap. 2016, 77, 324–331.
Jankowiak, R.; Solheim, H.; Bilański, P.; Mukhopadhyay, J.; Hausner, G. Ceratocystiopsis spp. Associated with Pine- and Spruce-Infesting Bark Beetles in Norway. Mycol. Prog. 2022, 21, 61.
Jaworski, T.; Jabłoński, T.; Skrzecz, I.; Grodzki, W. Current State of Bark Beetle Outbreaks in Poland. In Proceedings of the Forest Future 2020: Consequences of Bark Beetle Calamity for the Future of Forestry in Central Europe, Jihlava, Czech Republic, 20–23 June 2020.
Petercord, R. Forest Protection Situation and Measures against Bark Beetles in Germany. In Proceedings of the Forest Future 2020: Consequences of Bark Beetle Calamity for the Future of Forestry in Central Europe, Jihlava, Czech Republic, 20–23 June 2020.
Lecina-Diaz, J.; Martínez-Vilalta, J.; Alvarez, A.; Banqué, M.; Birkmann, J.; Feldmeyer, D.; Vayreda, J.; Retana, J. Characterizing Forest Vulnerability and Risk to Climate-Change Hazards. Front. Ecol. Environ. 2021, 19, 126–133.
Seidl, R.; Schelhaas, M.-J.; Rammer, W.; Verkerk, P.J. Increasing Forest Disturbances in Europe and Their Impact on Carbon Storage. Nat. Clim. Chang. 2014, 4, 806–810.
Thom, D.; Seidl, R. Natural Disturbance Impacts on Ecosystem Services and Biodiversity in Temperate and Boreal Forests. Biol. Rev. 2016, 91, 760–781.
Beetz, K.; Marrs, C.; Busse, A.; Poděbradská, M.; Kinalczyk, D.; Kranz, J.; Forkel, M. Effects of Bark Beetle Disturbance and Fuel Types on Fire Radiative Power and Burn Severity in the Bohemian-Saxon Switzerland. For. Int. J. For. Res. 2024, cpae024.
Wermelinger, B. Ecology and Management of the Spruce Bark Beetle Ips typographus—A Review of Recent Research. For. Ecol. Manag. 2004, 202, 67–82.
Bentz, B.; Jönsson, A. Modeling Bark Beetle Responses to Climate Change. In Bark Beetles: Biology and Ecology of Native and Invasive Species; Academic Press: San Diego, CA, USA, 2015; pp. 533–553. ISBN 9780124171565.
Huang, C.; Anderegg, W.R.L.; Asner, G.P. Remote Sensing of Forest Die-off in the Anthropocene: From Plant Ecophysiology to Canopy Structure. Remote Sens. Environ. 2019, 231, 111233.
Marini, L.; Økland, B.; Jönsson, A.M.; Bentz, B.; Carroll, A.; Forster, B.; Grégoire, J.-C.; Hurling, R.; Nageleisen, L.M.; Netherer, S.; et al. Climate Drivers of Bark Beetle Outbreak Dynamics in Norway Spruce Forests. Ecography 2017, 40, 1426–1435.
Wulder, M.A.; Dymond, C.C.; White, J.C.; Leckie, D.G.; Carroll, A.L. Surveying Mountain Pine Beetle Damage of Forests: A Review of Remote Sensing Opportunities. For. Ecol. Manag. 2006, 221, 27–41.
Bárta, V.; Hanuš, J.; Dobrovolný, L.; Homolová, L. Comparison of Field Survey and Remote Sensing Techniques for Detection of Bark Beetle-Infested Trees. For. Ecol. Manag. 2022, 506, 119984.
Dalponte, M.; Solano-Correa, Y.T.; Frizzera, L.; Gianelle, D. Mapping a European Spruce Bark Beetle Outbreak Using Sentinel-2 Remote Sensing Data. Remote Sens. 2022, 14, 3135.
Fernandez-Carrillo, A.; Patočka, Z.; Dobrovolný, L.; Franco-Nieto, A.; Revilla-Romero, B. Monitoring Bark Beetle Forest Damage in Central Europe. A Remote Sensing Approach Validated with Field Data. Remote Sens. 2020, 12, 3634.
Abdullah, H.; Darvishzadeh, R.; Skidmore, A.K.; Groen, T.A.; Heurich, M. European Spruce Bark Beetle (Ips typographus, L.) Green Attack Affects Foliar Reflectance and Biochemical Properties. Int. J. Appl. Earth Obs. Geoinf. 2018, 64, 199–209.
Huo, L.; Lindberg, E.; Bohlin, J.; Persson, H.J. Assessing the Detectability of European Spruce Bark Beetle Green Attack in Multispectral Drone Images with High Spatial- and Temporal Resolutions. Remote Sens. Environ. 2023, 287, 113484.
Zabihi, K.; Surovy, P.; Trubin, A.; Singh, V.V.; Jakuš, R. A Review of Major Factors Influencing the Accuracy of Mapping Green-Attack Stage of Bark Beetle Infestations Using Satellite Imagery: Prospects to Avoid Data Redundancy. Remote Sens. Appl. 2021, 24, 100638.
Einzmann, K.; Ng, W.-T.; Immitzer, M.; Pinnel, N.; Atzberger, C. Method Analysis for Collecting and Processing In-Situ Hyperspectral Needle Reflectance Data for Monitoring Norway Spruce. Photogramm. Fernerkund. Geoinform. 2014, 5, 423–434.
Abdullah, H.; Skidmore, A.K.; Darvishzadeh, R.; Heurich, M. Sentinel-2 Accurately Maps Green-Attack Stage of European Spruce Bark Beetle (Ips typographus, L.) Compared with Landsat-8. Remote Sens. Ecol. Conserv. 2019, 5, 87–106.
Fassnacht, F.E.; Latifi, H.; Ghosh, A.; Joshi, P.K.; Koch, B. Assessing the Potential of Hyperspectral Imagery to Map Bark Beetle-Induced Tree Mortality. Remote Sens. Environ. 2014, 140, 533–548.
de Groot, M.; Ogris, N. Short-Term Forecasting of Bark Beetle Outbreaks on Two Economically Important Conifer Tree Species. For. Ecol. Manag. 2019, 450, 117495.
Lausch, A.; Heurich, M.; Fahse, L. Spatio-Temporal Infestation Patterns of Ips typographus (L.) in the Bavarian Forest National Park, Germany. Ecol. Indic. 2013, 31, 73–81.
Netherer, S.; Nopp-Mayr, U. Predisposition Assessment Systems (PAS) as Supportive Tools in Forest Management—Rating of Site and Stand-Related Hazards of Bark Beetle Infestation in the High Tatra Mountains as an Example for System Application and Verification. For. Ecol. Manag. 2005, 207, 99–107.
Seidl, R.; Baier, P.; Rammer, W.; Schopf, A.; Lexer, M.J. Modelling Tree Mortality by Bark Beetle Infestation in Norway Spruce Forests. Ecol. Model. 2007, 206, 383–399.
Seidl, R.; Schelhaas, M.J.; Lindner, M.; Lexer, M.J. Modelling Bark Beetle Disturbances in a Large Scale Forest Scenario Model to Assess Climate Change Impacts and Evaluate Adaptive Management Strategies. Reg. Environ. Change 2009, 9, 101–119.
Marvasti-Zadeh, S.M.; Goodsman, D.; Ray, N.; Erbilgin, N. Early Detection of Bark Beetle Attack Using Remote Sensing and Machine Learning: A Review. ACM Comput. Surv. 2024, 56, 1–40.
Duraciova, R.; Munko, M.; Barka, I.; Koren, M.; Resnerova, K.; Holusa, J.; Blazenec, M.; Potterf, M.; Jakus, R. A Bark Beetle Infestation Predictive Model Based on Satellite Data in the Frame of Decision Support System TANABBO. IForest 2020, 13, 215–223.
European Commission, Directorate-General for Environment; Atzberger, C.; Zeug, G.; Defourny, P.; Aragão, L.; Hammarström, L.; Immitzer, M. Monitoring of Forests through Remote Sensing: Final Report; Publications Office: Luxembourg, 2020.
Pause, M.; Schweitzer, C.; Rosenthal, M.; Keuck, V.; Bumberger, J.; Dietrich, P.; Heurich, M.; Jung, A.; Lausch, A. In Situ/Remote Sensing Integration to Assess Forest Health—A Review. Remote Sens. 2016, 8, 471.
Forzieri, G.; Dutrieux, L.P.; Elia, A.; Eckhardt, B.; Caudullo, G.; Taboada, F.Á.; Andriolo, A.; Bălăcenoiu, F.; Bastos, A.; Buzatu, A.; et al. The Database of European Forest Insect and Disease Disturbances: DEFID2. Glob. Change Biol. 2023, 29, 6040–6065.
Rammer, W.; Seidl, R. Harnessing Deep Learning in Ecology: An Example Predicting Bark Beetle Outbreaks. Front. Plant Sci. 2019, 10, 1327.
Ramazi, P.; Kunegel-Lion, M.; Greiner, R.; Lewis, M.A. Predicting Insect Outbreaks Using Machine Learning: A Mountain Pine Beetle Case Study. Ecol. Evol. 2021, 11, 13014–13028.
Koreň, M.; Jakuš, R.; Zápotocký, M.; Barka, I.; Holuša, J.; Ďuračiová, R.; Blaženec, M. Assessment of Machine Learning Algorithms for Modeling the Spatial Distribution of Bark Beetle Infestation. Forests 2021, 12, 395.
Pichler, M.; Hartig, F. Machine Learning and Deep Learning—A Review for Ecologists. Methods Ecol. Evol. 2023, 14, 994–1016.
Baier, P.; Pennerstorfer, J.; Schopf, A. PHENIPS—A Comprehensive Phenology Model of Ips typographus (L.) (Col., Scolytinae) as a Tool for Hazard Rating of Bark Beetle Infestation. For. Ecol. Manag. 2007, 249, 171–186.
Czech Hydrometeorological Institute. Weather and Bark Beetle. Available online: https://www.chmi.cz/aktualni-situace/aktualni-stav-pocasi/ceska-republika/pocasi-a-kurovec (accessed on 17 May 2024).
Chuvieco, E.; Yebra, M.; Martino, S.; Thonicke, K.; Gómez-Giménez, M.; San-Miguel, J.; Oom, D.; Velea, R.; Mouillot, F.; Molina, J.R.; et al. Towards an Integrated Approach to Wildfire Risk Assessment: When, Where, What and How May the Landscapes Burn. Fire 2023, 6, 215.
Boháč, A.; Drápela, E. Present Climate Change as a Threat to Geoheritage: The Wildfire in Bohemian Switzerland National Park and Its Use in Place-Based Learning. Geosciences 2023, 13, 383.
Berčák, R.; Holuša, J.; Kaczmarowski, J.; Tyburski, Ł.; Szczygieł, R.; Held, A.; Vacik, H.; Slivinský, J.; Chromek, I. Fire Protection Principles and Recommendations in Disturbed Forest Areas in Central Europe: A Review. Fire 2023, 6, 310.
European Space Agency. Copernicus Open Access Hub. Available online: https://scihub.copernicus.eu/ (accessed on 17 May 2024).
Biuro Urządzania Lasu i Geodezji Leśnej. Lower-Silesian Forest Data. Available online: https://www.bdl.lasy.gov.pl/portal/mapy-en (accessed on 17 May 2024).
Talarczyk, A. National Forest Inventory in Poland. Balt. For. 2014, 20, 333–341.
Landesbetrieb Forst Brandenburg. Brandenburg Forest Data. Available online: https://forst.brandenburg.de/lfb/de/ (accessed on 17 May 2024).
Staatsbetrieb Sachsenforst. Saxon Forest Data. Available online: https://www.wald.sachsen.de/ergebnisse-der-bundeswaldinventur-3-4913.html (accessed on 17 May 2024).
Copernicus Land Monitoring Service. CLC+Backbone 2018 (Raster 10 m), Europe, 3-Yearly, Feb. 2023. Available online: https://sdi.eea.europa.eu/catalogue/copernicus/api/records/cd534ebf-f553-42f0-9ac1-62c1dc36d32c?language=all (accessed on 17 May 2024).
Copernicus Land Monitoring Service. CLC+ Backbone Product Specification and User Manual. Version: 3.0. 2022. Available online: https://land.copernicus.eu/en/technical-library/clc-backbone-product-user-manual/@@download/file (accessed on 17 May 2024).
European Environment Agency. EU-DEM (Raster)—Version 1.1, Apr. 2016. Available online: https://sdi.eea.europa.eu/catalogue/srv/api/records/3473589f-0854-4601-919e-2e7dd172ff50 (accessed on 17 May 2024).
Frantz, D.; Haß, E.; Uhl, A.; Stoffels, J.; Hill, J. Improvement of the Fmask Algorithm for Sentinel-2 Images: Separating Clouds from Bright Surfaces Based on Parallax Effects. Remote Sens. Environ. 2018, 215, 471–481.
Bonannella, C.; Hengl, T.; Heisig, J.; Parente, L.; Wright, M.; Herold, M.; de Bruin, S. Forest Tree Species Distribution for Europe 2000–2020: Mapping Potential and Realized Distributions Using Spatiotemporal Machine Learning. PeerJ 2022, 10, e13728.
Candotti, A.; De Giglio, M.; Dubbini, M.; Tomelleri, E. A Sentinel-2 Based Multi-Temporal Monitoring Framework for Wind and Bark Beetle Detection and Damage Mapping. Remote Sens. 2022, 14, 6105.
Xue, J.; Su, B. Significant Remote Sensing Vegetation Indices: A Review of Developments and Applications. J. Sens. 2017, 2017, 1353691.
Azadeh, A.; Dimitrios, P.; Peter, S. Forest Canopy Density Assessment Using Different Approaches—Review. J. For. Sci. 2017, 63, 107–116.
Godinho, S.; Gil, A.; Guiomar, N.; Neves, N.; Pinto-Correia, T. A Remote Sensing-Based Approach to Estimating Montado Canopy Density Using the FCD Model: A Contribution to Identifying HNV Farmlands in Southern Portugal. Agrofor. Syst. 2016, 90, 23–34.
Rikimaru, A.; Roy, P.S.; Miyatake, S. Tropical Forest Cover Density Mapping. Trop. Ecol. 2002, 43, 39–47.
Gitelson, A.A.; Gritz, Y.; Merzlyak, M.N. Relationships between Leaf Chlorophyll Content and Spectral Reflectance and Algorithms for Non-Destructive Chlorophyll Assessment in Higher Plant Leaves. J. Plant Physiol. 2003, 160, 271–282.
Cao, X.; Chen, J.; Matsushita, B.; Imura, H. Developing a MODIS-Based Index to Discriminate Dead Fuel from Photosynthetic Vegetation and Soil Background in the Asian Steppe Area. Int. J. Remote Sens. 2010, 31, 1589–1604.
Huete, A.; Didan, K.; Miura, T.; Rodriguez, E.P.; Gao, X.; Ferreira, L.G. Overview of the Radiometric and Biophysical Performance of the MODIS Vegetation Indices. Remote Sens. Environ. 2002, 83, 195–213.
Nguyen, C.T.; Chidthaisong, A.; Kieu Diem, P.; Huo, L.-Z. A Modified Bare Soil Index to Identify Bare Land Features during Agricultural Fallow-Period in Southeast Asia Using Landsat 8. Land 2021, 10, 231.
Huete, A.R. A Soil-Adjusted Vegetation Index (SAVI). Remote Sens. Environ. 1988, 25, 295–309.
Qi, J.; Chehbouni, A.; Huete, A.R.; Kerr, Y.H.; Sorooshian, S. A Modified Soil Adjusted Vegetation Index. Remote Sens. Environ. 1994, 48, 119–126.
Gao, B. NDWI—A Normalized Difference Water Index for Remote Sensing of Vegetation Liquid Water from Space. Remote Sens. Environ. 1996, 58, 257–266.
Gitelson, A.; Merzlyak, M.N. Spectral Reflectance Changes Associated with Autumn Senescence of Aesculus hippocastanum L. and Acer platanoides L. Leaves. Spectral Features and Relation to Chlorophyll Estimation. J. Plant Physiol. 1994, 143, 286–292.
Barnes, E.; Clarke, T.R.; Richards, S.E.; Colaizzi, P.; Haberland, J.; Kostrzewski, M.; Waller, P.; Choi, C.; Riley, E.; Thompson, T.L. Coincident Detection of Crop Water Stress, Nitrogen Status, and Canopy Density Using Ground Based Multispectral Data. In Proceedings of the Fifth International Conference on Precision Agriculture, Bloomington, MN, USA, 16–19 July 2000.
Tucker, C.J.; Elgin, J.H.; McMurtrey, J.E.; Fan, C.J. Monitoring Corn and Soybean Crop Development with Hand-Held Radiometer Spectral Data. Remote Sens. Environ. 1979, 8, 237–248.
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
Pasqualotto, N.; Delegido, J.; Van Wittenberghe, S.; Rinaldi, M.; Moreno, J. Multi-Crop Green LAI Estimation with a New Simple Sentinel-2 LAI Index (SeLI). Sensors 2019, 19, 904.
Mezei, P.; Potterf, M.; Škvarenina, J.; Rasmussen, J.G.; Jakuš, R. Potential Solar Radiation as a Driver for Bark Beetle Infestation on a Landscape Scale. Forests 2019, 10, 604.
Hofierka, J.; Šúri, M. The Solar Radiation Model for Open Source GIS: Implementation and Applications. In Proceedings of the Open Source GIS-GRASS Users Conference, Seoul, Republic of Korea, 14–15 September 2015.
Netherer, S.; Panassiti, B.; Pennerstorfer, J.; Matthews, B. Acute Drought Is an Important Driver of Bark Beetle Infestation in Austrian Norway Spruce Stands. Front. For. Glob. Change 2019, 2, 39.
Hroššo, B.; Mezei, P.; Potterf, M.; Majdák, A.; Blaženec, M.; Korolyova, N.; Jakuš, R. Drivers of Spruce Bark Beetle (Ips typographus) Infestations on Downed Trees after Severe Windthrow. Forests 2020, 11, 1290.
Tanase, M.A.; Aponte, C.; Mermoz, S.; Bouvet, A.; Le Toan, T.; Heurich, M. Detection of Windthrows and Insect Outbreaks by L-Band SAR: A Case Study in the Bavarian Forest National Park. Remote Sens. Environ. 2018, 209, 700–711.
Forsse, E.; Solbreck, C.H. Migration in the Bark Beetle Ips typographus L.: Duration, Timing and Height of Flight. Z. Für Angew. Entomol. 1985, 100, 47–57.
Franklin, A.J.; Grégoire, J.-C. Flight Behaviour of Ips typographus L. (Col., Scolytidae) in an Environment without Pheromones. Ann. For. Sci. 1999, 56, 591–598.
European Environment Agency. Conversion of DN in “SLOP” Files into Degrees off Horizontal of the Surface Tangent; EEA: Copenhagen, Denmark, 2016.
European Environment Agency. Conversion of DN in CP-ASPC Files into Degrees North over East; EEA: Copenhagen, Denmark, 2016.
MacLeod, C.D.; Mandleberg, L.; Schweder, C.; Bannon, S.M.; Pierce, G.J. A Comparison of Approaches for Modelling the Occurrence of Marine Animals. Hydrobiologia 2008, 612, 21–32.
Rodríguez, P.; Bautista, M.A.; Gonzàlez, J.; Escalera, S. Beyond One-Hot Encoding: Lower Dimensional Target Embedding. Image Vis. Comput. 2018, 75, 21–31.
Sen, P.C.; Hajra, M.; Ghosh, M. Supervised Classification Algorithms in Machine Learning: A Survey and Review. In Emerging Technology in Modelling and Graphics; Mandal, J.K., Bhattacharya, D., Eds.; Springer Singapore: Singapore, 2020; pp. 99–111.
Maulud, D.; Abdulazeez, A.M. A Review on Linear Regression Comprehensive in Machine Learning. J. Appl. Sci. Technol. Trends 2020, 1, 140–147.
Kavitha, S.; Varuna, S.; Ramya, R. A Comparative Analysis on Linear Regression and Support Vector Regression. In Proceedings of the 2016 Online International Conference on Green Engineering and Technologies (IC-GET), Coimbatore, India, 19 November 2016; pp. 1–5.
Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017); Guyon, I., von Luxburg, U., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Glasgow, UK, 2017; Volume 30.
Gridin, I. Hyperparameter Optimization. In Automated Machine Learning: Methods, Systems, Challenges; Springer: Berlin/Heidelberg, Germany, 2022; ISBN 9783030053178.
Schratz, P.; Muenchow, J.; Iturritxa, E.; Richter, J.; Brenning, A. Hyperparameter Tuning and Performance Assessment of Statistical and Machine-Learning Algorithms Using Spatial Data. Ecol. Model. 2019, 406, 109–120.
Andonie, R. Hyperparameter Optimization in Learning Systems. J. Membr. Comput. 2019, 1, 279–291.
Stadelmann, G.; Bugmann, H.; Wermelinger, B.; Meier, F.; Bigler, C. A Predictive Framework to Assess Spatio-Temporal Variability of Infestations by the European Spruce Bark Beetle. Ecography 2013, 36, 1208–1217.
Munro, H.L.; Montes, C.R.; Gandhi, K.J.K. A New Approach to Evaluate the Risk of Bark Beetle Outbreaks Using Multi-Step Machine Learning Methods. For. Ecol. Manag. 2022, 520, 120347.
Forzieri, G.; Beck, P.; Cescatti, A. Database of European Forest Insect & Disease Disturbances—DEFID2. 2020. Available online: https://forest.jrc.ec.europa.eu/media/filer_public/c1/1e/c11e2b28-b263-4cbe-ad99-4a0447d1f7fc/defid2_protocol-for-data-collection_v01-1.pdf (accessed on 17 May 2024).
Staatsbetrieb Sachsenforst. Damaged and Open Areas in the Saxon Forest from Sentinel-2 Data (Raster). Available online: https://geoportal.de/Metadata/f03d3d89-34b8-430d-84c5-87ed24c3a0b9 (accessed on 17 May 2024).
Ústav pro Hospodářskou Úpravu lesů Brandýs nad Labem. Kurovcová Mapa. Available online: https://www.kurovcovamapa.cz/ (accessed on 17 May 2024).
Seidl, R.; Müller, J.; Hothorn, T.; Bässler, C.; Heurich, M.; Kautz, M. Small Beetle, Large-Scale Drivers: How Regional and Landscape Factors Affect Outbreaks of the European Spruce Bark Beetle. J. Appl. Ecol. 2016, 53, 530–540.
Gómez Giménez, M.; de Jong, R.; Keller, A.; Rihm, B.; Schaepman, M.E. Studying the Influence of Nitrogen Deposition, Precipitation, Temperature, and Sunshine in Remotely Sensed Gross Primary Production Response in Switzerland. Remote Sens. 2019, 11, 1135.

Figure 1. FirEUrisk pilot site in Central–Eastern Europe.

Figure 2. Flowchart describing the main steps of the methodology. The results of the methodology are highlighted through color. The intermediate outputs are marked in light yellow, while the main output is marked in dark yellow.

Figure 3. Map of spruce classification in Sep. 2020 derived from Sentinel-2 data.

Figure 4. Map of bark beetle damage from September 2020 to October 2021, derived from Sentinel-2 data and validated with the reference dataset.

Figure 5. Map of the bark beetle predictive model from September 2020 to October 2021.

Figure 6. Optimization of performance metrics of thresholding.

Table 1. Pairwise Pearson correlation of the selected vegetation indices. The color scale shows higher correlations in reds, medium correlations in yellows, and lower correlations in greens.

Index	CLRE	DFI	EVI	MBI	MSAVI2	NDMI	NDREI1	NDREI2	NDVI
BSI	0.87	0.82	0.64	0.94	0.63	0.97	0.92	0.92	0.92
CLRE		0.8	0.67	0.73	0.67	0.77	0.96	0.96	0.87
DFI			0.67	0.7	0.68	0.73	0.86	0.86	0.95
EVI				0.54	0.99	0.57	0.67	0.65	0.68
MBI					0.53	0.99	0.78	0.77	0.77
MSAVI2						0.56	0.66	0.65	0.68
NDMI							0.83	0.82	0.82
NDREI1								0.99	0.94
NDREI2									0.94
NDVI
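
A correlation table of this kind is commonly used to screen out redundant predictors. The following is a minimal sketch of that screening, not the authors' code: it transcribes the upper triangle of Table 1 and lists the index pairs whose correlation reaches a cutoff. The cutoff of 0.95 and the helper name `redundant_pairs` are assumptions made for this illustration.

```python
# Index names in the row/column order of Table 1.
NAMES = ["BSI", "CLRE", "DFI", "EVI", "MBI",
         "MSAVI2", "NDMI", "NDREI1", "NDREI2", "NDVI"]

# Upper triangle of Table 1: row i holds the correlations between
# NAMES[i] and every later index NAMES[i + 1:].
UPPER = [
    [0.87, 0.82, 0.64, 0.94, 0.63, 0.97, 0.92, 0.92, 0.92],  # BSI
    [0.80, 0.67, 0.73, 0.67, 0.77, 0.96, 0.96, 0.87],        # CLRE
    [0.67, 0.70, 0.68, 0.73, 0.86, 0.86, 0.95],              # DFI
    [0.54, 0.99, 0.57, 0.67, 0.65, 0.68],                    # EVI
    [0.53, 0.99, 0.78, 0.77, 0.77],                          # MBI
    [0.56, 0.66, 0.65, 0.68],                                # MSAVI2
    [0.83, 0.82, 0.82],                                      # NDMI
    [0.99, 0.94],                                            # NDREI1
    [0.94],                                                  # NDREI2
]


def redundant_pairs(names, upper, cutoff):
    """Return (index_a, index_b, r) for every pair with r >= cutoff."""
    pairs = []
    for i, row in enumerate(upper):
        for offset, r in enumerate(row):
            if r >= cutoff:
                pairs.append((names[i], names[i + 1 + offset], r))
    return pairs


if __name__ == "__main__":
    # Hypothetical cutoff; the study does not state one in this section.
    for a, b, r in redundant_pairs(NAMES, UPPER, cutoff=0.95):
        print(f"{a} ~ {b}: r = {r:.2f}")
```

With this assumed cutoff, pairs such as EVI/MSAVI2, MBI/NDMI, and NDREI1/NDREI2 (all 0.99) would be flagged as near-duplicates, so keeping one index from each pair would lose little information.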

Table 2. Environmental traits selected to be included in the feature engineering and modelling process.

Name	Format	Resolution (m)	Units	Date
Spruce density	TIF	10	%	September 2020
Stand density	TIF	10	%	September 2020
Elevation	TIF	25	m	2011
Slope	TIF	25	Degrees *	2011
Aspect	TIF	25	Degrees *	2011
Potential solar irradiation	TIF	50	W/m²	October 2020–September 2021
Bark beetle density (previous season)	TIF	50	Damaged ha/total ha	1 August 2019–1 September 2020
Bark beetle distance (previous season)	TIF	50	m	1 August 2019–1 September 2020

* Originally in pseudo-degrees; transformed to degrees following [82,83].

Table 3. Pearson correlation matrix for all variables after applying transformations. The colour scale shows the level of correlation in shades of red; lighter shades indicate lower correlations.

	Stand Density	Elevation	Slope	Potential Solar Irradiation	Bark Beetle Density 200 m	Bark Beetle Density 1 km	Bark Beetle Density 10 km	Bark Beetle Density 25 km	Bark Beetle Distance	Aspect Sin	Aspect Cos
Spruce density	0.71	0.47	0.44	0.11	0.27	0.26	0.19	0.31	0.30	0	0
Stand density		0.23	0.23	0.03	0.16	0.12	0.11	0.11	0.11	0	0.01
Elevation			0.51	0.40	0.18	0.26	0.14	0.34	0.49	0	0.01
Slope				0.01	0.21	0.25	0.17	0.31	0.29	0	0
Potential solar irradiation					0.05	0.09	0.03	0.11	0.18	0.01	0.02
Bark beetle density 200 m						0.73	0.69	0.57	0.15	0	0
Bark beetle density 1 km							0.46	0.87	0.21	0	0
Bark beetle density 10 km								0.38	0.12	0	0
Bark beetle density 25 km									0.29	0	0.01
Bark beetle distance										0	0
Aspect Sin											0
Aspect Cos
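
Table 3 lists aspect as two variables, Aspect Sin and Aspect Cos. A likely motivation, stated here as an assumption rather than the authors' explanation, is that aspect is a circular quantity: 359° and 1° are almost the same direction but far apart numerically. The sketch below shows the usual sine/cosine encoding; the function names are hypothetical.

```python
import math


def encode_aspect(aspect_deg):
    """Encode an aspect angle (degrees clockwise from north) as (sin, cos)."""
    rad = math.radians(aspect_deg % 360.0)
    return math.sin(rad), math.cos(rad)


def encoded_distance(a_deg, b_deg):
    """Euclidean distance between two aspects in (sin, cos) space."""
    sin_a, cos_a = encode_aspect(a_deg)
    sin_b, cos_b = encode_aspect(b_deg)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


if __name__ == "__main__":
    # 359 and 1 degrees are neighbours on the compass, and the encoding
    # keeps them close, unlike the raw difference of 358 degrees.
    print(encoded_distance(359.0, 1.0))
    print(encoded_distance(1.0, 90.0))
```

In this encoding the sine component separates east-facing from west-facing slopes and the cosine component separates north-facing from south-facing slopes.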

Table 4. Performance metrics of the spruce mask (September 2020).

Metric	Value (%)
OA	91
P	79
R	94
F1	86
CE	3
OE	21
relB	−16

Table 5. Performance metrics of the bark beetle damage estimate (September 2020–October 2021).

Metric	Value (%)
OA	82
P	80
R	92
F1	85
CE	21
OE	8
relB	16

Table 6. Comparison of the training and test metrics obtained from the trained ML models.

Metric	OLS	RF	SVR	XGBoost	LightGBM
Train r²	0.23	0.49	0.39	0.45	0.45
Test r²	0.21	0.48	0.38	0.45	0.46
Test RMSE	0.47	0.03	0.16	0.05	0.03
Test MAE	0.72	0.11	0.31	0.14	0.11

Table 7. Hyperparameter tuning results for the RF model.

Hyperparameter	Best Value
min_samples_split	5
min_samples_leaf	1
max_features	Square root
criterion	Squared error
n_estimators	250
max_depth	41
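
The hyperparameter names in Table 7 match those of scikit-learn's `RandomForestRegressor`. Assuming that library was used (this section does not confirm it), the tuned configuration could be written as below. The mapping of "Square root" to `"sqrt"` and "Squared error" to `"squared_error"`, the `random_state` argument, and the helper name `build_random_forest` are assumptions of this sketch.

```python
# Tuned values from Table 7, keyed by scikit-learn parameter names.
RF_PARAMS = {
    "n_estimators": 250,
    "max_depth": 41,
    "min_samples_split": 5,
    "min_samples_leaf": 1,
    "max_features": "sqrt",        # "Square root" in Table 7
    "criterion": "squared_error",  # "Squared error" in Table 7
}


def build_random_forest(random_state=0):
    """Create an unfitted regressor with the Table 7 settings.

    scikit-learn is imported lazily so that the parameter dictionary
    can be inspected without the library installed.
    """
    from sklearn.ensemble import RandomForestRegressor

    return RandomForestRegressor(random_state=random_state, **RF_PARAMS)


if __name__ == "__main__":
    for name, value in RF_PARAMS.items():
        print(f"{name} = {value}")
```

Calling `build_random_forest().fit(X, y)` on a feature matrix with the twelve variables of Table 3 would then train a model with this configuration; fixing `random_state` is only there to make runs repeatable.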

Table 8. Variable ranking derived from the RF impurity decrease.

Ranking	Variable	Importance (%)
1	Spruce density	21.34
2	Stand density	16.65
3	Elevation	10.04
4	Bark beetle density 25 km	9.28
5	Bark beetle density 10 km	8.07
6	Bark beetle distance	6.20
7	Bark beetle density 1 km	6.17
8	Potential solar irradiation	5.89
9	Slope	5.56
10	Aspect Sin	4.41
11	Aspect Cos	4.37
12	Bark beetle density 200 m	2.02
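
The importances in Table 8 sum to 100%, so they can be added up by group to compare broad drivers. The sketch below does this arithmetic. The three groups loosely follow the categories of Table A2, but the grouping itself and the helper name `group_totals` are choices made for this illustration, not part of the study.

```python
# Importance (%) per variable, copied from Table 8.
IMPORTANCE = {
    "Spruce density": 21.34,
    "Stand density": 16.65,
    "Elevation": 10.04,
    "Bark beetle density 25 km": 9.28,
    "Bark beetle density 10 km": 8.07,
    "Bark beetle distance": 6.20,
    "Bark beetle density 1 km": 6.17,
    "Potential solar irradiation": 5.89,
    "Slope": 5.56,
    "Aspect Sin": 4.41,
    "Aspect Cos": 4.37,
    "Bark beetle density 200 m": 2.02,
}

# Illustrative grouping (an assumption of this sketch).
GROUPS = {
    "Stand factors": ["Spruce density", "Stand density"],
    "Site characteristics": ["Elevation", "Potential solar irradiation",
                             "Slope", "Aspect Sin", "Aspect Cos"],
    "Previous-season attack": ["Bark beetle density 25 km",
                               "Bark beetle density 10 km",
                               "Bark beetle distance",
                               "Bark beetle density 1 km",
                               "Bark beetle density 200 m"],
}


def group_totals(importance, groups):
    """Sum the importance (%) of the variables in each group."""
    return {
        group: round(sum(importance[name] for name in members), 2)
        for group, members in groups.items()
    }


if __name__ == "__main__":
    for group, total in group_totals(IMPORTANCE, GROUPS).items():
        print(f"{group}: {total:.2f}%")
```

Under this grouping, the two stand variables account for 37.99% of the total importance, the previous-season attack variables for 31.74%, and the site variables for 30.27%.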

Table 9. Performance metrics of the damage/no-damage predictions for different attack thresholds. A three-colour sequential scale marks each metric value: green for favourable values, yellow for mid-range values, and red for unfavourable values.

Attack Threshold	0.25	0.4	0.45	0.48	0.5	0.52	0.55	0.6	0.75
OA	76.6	87.1	90	91.3	91.8	92.3	92.6	92.7	91.1
P	64.4	70.3	74.5	77.7	79.5	81.6	84.2	88.3	93.8
R	82.3	80.2	78.4	77	75.9	75.3	73.2	70	59.8
F1	65.3	73.6	76.2	77.4	77.5	78	77.3	75.5	63.9
CE	69.5	55.7	46.4	39.6	35.7	31.2	25.5	16.6	3.5
OE	10.3	28.5	36.5	41.2	44.3	46.4	51.5	58.9	80.4
relB	194.1	61.4	18.5	−2.7	−13.4	−22.2	−35	−50.7	−79.7
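
The values in Table 9 can be scanned to see which threshold is best under a given criterion. The sketch below copies the F1 and relB rows and picks the threshold with the highest F1 and the one with the smallest absolute relative bias. Using these two criteria is an assumption of this example; this section does not state which rule the authors applied, although the 0.52 column is the one repeated in Table 10.

```python
# Attack thresholds and the corresponding F1 and relB rows of Table 9.
THRESHOLDS = [0.25, 0.4, 0.45, 0.48, 0.5, 0.52, 0.55, 0.6, 0.75]
F1 = [65.3, 73.6, 76.2, 77.4, 77.5, 78.0, 77.3, 75.5, 63.9]
REL_B = [194.1, 61.4, 18.5, -2.7, -13.4, -22.2, -35.0, -50.7, -79.7]


def best_threshold(thresholds, scores):
    """Return (threshold, score) for the highest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return thresholds[best], scores[best]


def least_biased_threshold(thresholds, rel_bias):
    """Return (threshold, relB) for the relB value closest to zero."""
    best = min(range(len(rel_bias)), key=lambda i: abs(rel_bias[i]))
    return thresholds[best], rel_bias[best]


if __name__ == "__main__":
    print("Highest F1:", best_threshold(THRESHOLDS, F1))
    print("Lowest |relB|:", least_biased_threshold(THRESHOLDS, REL_B))
```

The two criteria disagree: F1 peaks at a threshold of 0.52, whereas the relative bias is closest to zero at 0.48, which illustrates the trade-off between classification accuracy and area bias when choosing a threshold.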

Table 10. Performance metrics in Saxon Switzerland National Park, DE.

Metric	Value (%)
OA	92.3
P	81.6
R	75.3
F1	78.0
CE	31.2
OE	46.4
relB	−22.2

Table 11. SWOT (strengths, weaknesses, opportunities, and threats) analysis of bark beetle predictive modelling.

Strengths (internal)

Semi-automatic pipeline: Variable selection is semi-automatic and hyperparameter tuning is automatic. The model can identify and use the most relevant variables for predicting bark beetle outbreaks and optimize itself accordingly.
Good metric-based performance: High OA and F1 and low errors (CE, OE, RMSE, and MAE) suggest that the estimates provided by the model tend to be close to actual values, which is essential for informed decision-making.
ML-based methodology: One of the main advantages of ML algorithms is that they adapt more easily to new data than classical approaches.

Opportunities (external)

Ease of extrapolation to other regions: The model can be applied to various geographic areas, making it highly adaptable and useful in a global or regional context. This is particularly beneficial for large-scale forest and pest management.
Replicability and standardization of the methodology: Consistency and reproducibility ensure reliable comparisons between studies.
Data are Findable, Accessible, Interoperable, and Re-usable (FAIR): This can promote collaborative research in forest ecology and pest management, laying the groundwork for sustainable solutions and informed decision-making.

Weaknesses (internal)

Absence of important data: The literature underscores climate, forest age, and other damage types (e.g., snow, windthrows, drought) as pivotal factors in beetle outbreaks, but these were not available for this analysis. Other significant factors might have been ignored, such as those related to ecological processes that cannot be inferred from geospatial datasets (e.g., interactions with other pests or the effects of dead biomass).
Inherited data limitations from the inputs: The model encounters spatial resolution challenges due to broad-scale data. Temporal resolution constraints were caused by varying data intervals. Inconsistent data formats require extensive preprocessing, potentially compromising adaptability and accuracy.
Lack of validation datasets for this temporal resolution: This hindered a robust accuracy assessment, potentially compromising the model's reliability in other spatial and temporal contexts.

Threats (external)

Transboundary areas with different forest management and policies: Diverse forest management and policies pose challenges in coordinating strategies, sharing data, and implementing unified pest management measures. These complexities hinder the efficacy of early warning systems, and mitigating bark beetle threats successfully requires coordinated effort and collaboration.
Interoperability and open data: The lack of large, open, and interoperable datasets may hinder the validation of this type of approach at large scales. Collaborative efforts at the administrative and research levels are required for effective pest management and forest conservation, especially in cross-border regions.
Need for a pan-European monitoring system for forest disease: Standardized reference bark beetle damage data with consistent methodologies are essential. Efforts in this direction have already begun [39].

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
