High-Resolution Land Use Land Cover Dataset for Meteorological Modelling—Part 1: ECOCLIMAP-SG+ an Agreement-Based Dataset

Bessardon, Geoffrey; Rieutord, Thomas; Gleeson, Emily; Pálmason, Bolli; Oswald, Sandro

doi:10.3390/land13111811


by Geoffrey Bessardon ¹,*,†, Thomas Rieutord ¹,†, Emily Gleeson ¹,*, Bolli Pálmason ² and Sandro Oswald ³

¹ Met Éireann, 65/67 Glasnevin Hill, D09 Y921 Dublin, Ireland
² Veðurstofa Íslands, Bústaðavegi 7–9, 105 Reykjavík, Iceland
³ GeoSphere Austria, Hohe Warte 38, 1190 Vienna, Austria
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.

Land 2024, 13(11), 1811; https://doi.org/10.3390/land13111811

Submission received: 13 September 2024 / Revised: 23 October 2024 / Accepted: 30 October 2024 / Published: 1 November 2024

(This article belongs to the Special Issue Feature Papers for Land Innovations—Data and Machine Learning: 2nd Edition)


Abstract

ECOCLIMAP-SG+ is a new 60 m land use land cover dataset, which covers a continental domain and represents the 33 labels of the original ECOCLIMAP-SG dataset. ECOCLIMAP-SG is used in HARMONIE-AROME, the numerical weather prediction model used operationally by Met Éireann and other national meteorological services. ECOCLIMAP-SG+ was created using an agreement-based method to combine information from many maps to overcome variations in semantic and geographical coverage, resolutions, formats, accuracy, and representative periods. In addition to ECOCLIMAP-SG+, the process generates an agreement score map, which estimates the uncertainty of the land cover labels in ECOCLIMAP-SG+ at each location in the domain. This work presents the first evaluation of ECOCLIMAP-SG and ECOCLIMAP-SG+ against the following trusted land cover maps: LUCAS 2022, the Irish National Land Cover 2018 dataset, and an Icelandic version of ECOCLIMAP-SG. Using a set of primary labels, ECOCLIMAP-SG+ outperforms ECOCLIMAP-SG in terms of F1-score against LUCAS 2022 over Europe and against the Irish National Land Cover 2018 dataset. Similarly, it outperforms ECOCLIMAP-SG against the Icelandic version of ECOCLIMAP-SG for most of the represented secondary labels. The score map shows that the quality of ECOCLIMAP-SG+ is heterogeneous. The map could be improved as new maps become available, but their release is beyond our control. Therefore, the second part of this publication series aims to improve the map using machine learning.

Keywords:

land cover land use; meteorology; uncertainty quantification

1. Introduction

To estimate parameters required for calculating turbulent, radiative, heat, and moisture fluxes from the surface of the Earth, numerical weather prediction (NWP) models need information describing the Earth’s surface. Physiographic databases are used to provide such information. These databases comprise a land use land cover (LULC) map and a complementary set of geophysical datasets such as leaf area index, albedo, tree heights, and lake depths. A LULC map gathers a set of identifiable features from observations (generally remote sensing) into classes that the map producer desires. The LULC map is a pivotal element of the physiographic database used in NWP models as it triggers the use of different physical parametrisations and biophysical arrays associated with each LULC class. These physical parametrisations and biophysical arrays enable the estimation of surface fluxes. An accurate estimation of the surface fluxes is essential for NWP, as most energy and water exchanges happen between the surface and the lower boundary of the atmosphere. Thus, the classes of the LULC map used in each NWP model need to address the surface physics requirements of the model to correctly represent the most energy- and water-intensive exchanges in the atmosphere.

Meteorological organizations use different LULC maps in their NWP models [1]. These maps vary in the number of land-cover classes (semantic resolution) and grid size (or resolution), but all aim to correctly represent the sub-grid surface heterogeneity of the NWP model. To do so, the grid size of the LULC map must be significantly smaller than the NWP model grid spacing. Over the past few years, sub-kilometer-resolution NWP experiments have emerged with grid spacings as fine as 100 m [2,3,4]. As described in [5], the datasets used in operational NWP are based on datasets with resolutions coarser than 100 m and are therefore unable to provide sub-grid surface heterogeneity information for hectometric NWP.

Many high-resolution maps exist [6,7,8,9]. As described in [10], these maps can be global, continental [6,7], national [9], or even local [11], and all have limitations in spatial coverage, spatial accuracy, semantic resolution, and how up-to-date they are. The global and continental maps have the best spatial coverage but lower spatial accuracy and semantic resolution, and they are the most difficult to update. National and local maps tend to have better spatial accuracy and semantic resolution, and they are easier to update [10]. National and local datasets also tend to have labels specific to the region they represent, which can lead to semantic incompatibility when attempting to merge datasets.

Operational NWP runs on international, continental, or even global domains with high semantic resolution. Therefore, there are still challenges to be overcome to benefit from the sub-grid information contained in these maps for use in hectometric NWP [5].

For example, the latest version of ECOCLIMAP [12,13], called ECOCLIMAP-Second Generation (ECOSG)¹, is a LULC map implemented in multiple NWP systems [14,15]. ECOSG’s resolution of 300 m is insufficient for 100 m NWP experiments, so a higher-resolution LULC map with ECOSG labeling is needed. Retaining the ECOSG labeling is essential to avoid having to rewrite the surface physics parametrizations.

The ECOSG physiographic database has a land cover with 33 labels, far fewer than the 215 labels of ECOCLIMAP-II [13] but many more than most existing LULC maps. As a result, no high-resolution (finer than 100 m) dataset with ECOSG labels is currently available globally, or even for Europe.

Machine learning (ML) offers efficient solutions for producing LULC maps, thanks to its cost-effectiveness, spatial extensivity, multi-temporality, and ability to save time compared with traditional surveys [16,17]. A first approach is to use ML to derive the quantity of interest from raw remote sensing data, such as the land-cover classes [1] or the building heights [18]. However, in both cases, the main limitation is the availability of large, representative, and labeled data. Using processed remote sensing data instead of raw data leads to much better results and is widely used to produce both general [6,7] and specialized maps [19]. However, the workload of processing input data can be substantial, and this approach barely reuses the information contained in existing maps. Our approach is to use ML to leverage information about the uncertainty of the target LULC map: large, representative, and labeled data are easily made available in areas where the map is of high quality, and ML is then used to extend the map to lower-quality areas.

The paper, as part one of a two-part publication, introduces an agreement-based method for producing a 60 m resolution reference LULC map with ECOSG labels that covers the regional NWP domains over continental Europe. The resolution of 60 m was chosen as the highest resolution for which the current labels still make sense. Our method leverages existing suitable datasets and produces a map we will call ECOCLIMAP-SG+ or ECOSG+ hereafter. The agreement-based method uniquely addresses the limitations of various existing land cover datasets (e.g., global, continental, national, local) by leveraging their semantic and spatial disparities. The ECOSG+ methodology overcomes these challenges by calculating agreement at two semantic levels, primary and secondary labels, enabling it to harmonize these diverse datasets within the framework of the ECOSG labeling system. This process preserves the NWP physical parametrizations while improving spatial resolution and label accuracy, particularly for hectometric weather prediction. By quantifying the agreement between datasets, the method provides an estimate of the uncertainty of the land cover labels, information that is rarely provided although very useful in some applications. In particular, it can be used to easily build a reference dataset for machine learning. This is done in the second part of this publication series [20], which focuses on improving ECOSG+ where its uncertainty is high. Section 2 presents the datasets and methodology used to produce ECOSG+. The results follow in Section 3, while the conclusions are provided in Section 4.

2. Materials and Methods

ECOSG+ is built by mixing a large number of existing land cover maps based on the agreement between them. The agreement is quantified by a consensus criterion on the labels at the pixel level, which is detailed in Section 2.2. Section 2.1 introduces the land cover maps used.

2.1. Material

2.1.1. Primary and Secondary Labels

In the construction and evaluation of ECOSG+, we use two sets of labels: primary labels, $L_{1}$, and secondary labels, $L_{2}$. The sets $L_{1}$ and $L_{2}$ are such that for each $l_{2} \in L_{2}$, there is a unique $l_{1} \in L_{1}$. The function for the hierarchical link is denoted $h$ as in Equation (1).

h : l_{2} \in L_{2} \mapsto h (l_{2}) = l_{1} \in L_{1}

(1)

Figure 1 gives the sets of labels $L_{1}$ and $L_{2}$ and the hierarchy between them.

While we aim to create ECOSG+ with the set of secondary labels $L_{2}$, the method starts with $L_{1}$ to aid the comparison of maps with heterogeneous semantic coverage. More specifically, some maps focus on primary labels (we refer to these as backbone maps), and some maps focus on some secondary labels (referred to as specialist maps), although the focus is not always exclusive (some maps are both backbone and specialist maps).
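As a minimal sketch, the hierarchical link $h$ of Equation (1) can be stored as a lookup table keyed by secondary label. Because it is a dictionary, each $l_{2}$ automatically has exactly one parent $l_{1}$. Only the “Water bodies” group below is stated in the text. The other parent assignments are hypothetical placeholders; the actual hierarchy is the one given in Figure 1. The names `H`, `h`, and `children` are illustrative.

```python
# Hypothetical subset of the label hierarchy (the full one is in Figure 1).
# Only the "Water bodies" group is stated in the text; the other parents are assumed.
H = {
    "1. Sea and oceans": "Water bodies",
    "2. Lakes": "Water bodies",
    "3. Rivers": "Water bodies",
    "4. Bare land": "Bare land",        # assumed parent
    "5. Bare rock": "Bare land",        # assumed parent
    "19. Winter C3 crops": "Crops",     # assumed parent
}


def h(l2: str) -> str:
    """Hierarchical link of Equation (1): secondary label -> its unique primary label."""
    return H[l2]


def children(l1: str) -> list:
    """All secondary labels l2 such that h(l2) == l1, in insertion order."""
    return [l2 for l2, parent in H.items() if parent == l1]


print(children("Water bodies"))  # ['1. Sea and oceans', '2. Lakes', '3. Rivers']
```

The `children` helper returns the set of secondary labels compatible with a given primary label. This is the set over which the refinement of backbone maps selects its label.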

2.1.2. Land Cover Maps

The data used in the construction of ECOSG+ is a set of LULC maps. We define a map as a function from a geographical location to a label as in Equation (2).

M : x \in D \mapsto l \in L

(2)

with the following definitions: $x$ is a geographical location, $D$ is the geographical domain of definition of the map, $l$ is a land cover label, and $L$ is the set of labels for this map (i.e., the semantic domain of definition).

For ECOSG+, the geographical domain of definition is the entire globe, denoted $D$. However, most of the maps used in the building of ECOSG+ are focused on Europe. Therefore, the evaluation is only conducted over Europe, and visualization is restricted to the EURAT domain (longitudes: −32 to 42 degrees, latitudes: 20 to 72 degrees). The semantic domain of definition of ECOSG+ is the set of secondary labels $L_{2}$.

A total of 43 land cover maps were used in the construction of ECOSG+. They are listed in Table A1, Table A2 and Table A3. For details on the pre-processing and exceptions, we refer to Appendix D.1.

Backbone Maps

We define a backbone map as any map that provides all² the primary labels $L_{1}$ (preferably at a high resolution). We denote the total number of backbone maps available as $N_{bb}$. For any $i \in \{1, \dots, N_{bb}\}$, the $i$-th backbone map $M_{bb}^{i}$ is, therefore, a function satisfying Equation (3).

M_{b b}^{i} : x \in D_{b b}^{i} \mapsto l_{1} \in L_{1}

(3)

where $D_{bb}^{i}$ is the geographical domain of the map $M_{bb}^{i}$. Note that the geographical domain varies from one backbone map to another. In this work, $N_{bb} = 16$ backbone maps were used. These are identified as $M_{bb}$ in Table A1, Table A2 and Table A3.

Specialist Maps

We define a specialist map as any map that provides secondary labels. We denote the total number of specialist maps available as $N_{sp}$. For any $j \in \{1, \dots, N_{sp}\}$, the $j$-th specialist map $M_{sp}^{j}$ satisfies

M_{s p}^{j} : x \in D_{s p}^{j} \mapsto l_{2} \in L_{2}^{j}

(4)

where $D_{sp}^{j}$ is the geographical domain of definition and $L_{2}^{j} \subset L_{2}$ is the semantic domain of definition. As for backbone maps, the geographical coverage varies from one specialist map to another, and so does the semantic domain of definition. In this work, $N_{sp} = 33$ specialist maps were used. They are identified as $M_{sp}$ in Table A1, Table A2 and Table A3, which also list the secondary label indices in their semantic domain of definition.

2.2. Methods

2.2.1. Construction of ECOSG+

This section only describes the main steps of the construction, which are original to this work. For technical details and exceptions, please see Appendix D. The input data used for this method are the backbone maps $\{M_{bb}^{1}, \dots, M_{bb}^{N_{bb}}\}$ and the specialist maps $\{M_{sp}^{1}, \dots, M_{sp}^{N_{sp}}\}$ with their definition domains, as introduced in Section 2.1.2.

Definition of a Specialist Agreement Score

For any position $x \in D$ and secondary label $l_{2} \in L_{2}$, we define the specialist agreement score, $S_{sp}(x, l_{2})$, as in Equation (5).

S_{sp}(x, l_{2}) = \frac{\sum_{j=1}^{N_{sp}} \mathbb{1}(x \in D_{sp}^{j} \land l_{2} \in L_{2}^{j} \land M_{sp}^{j}(x) = l_{2})}{\sum_{j=1}^{N_{sp}} \mathbb{1}(x \in D_{sp}^{j} \land l_{2} \in L_{2}^{j})}

(5)

with $\mathbb{1}$ the indicator function, which returns 1 if its argument is true and 0 otherwise, and $\land$ the logical “and” operator. Therefore, the score $S_{sp}(x, l_{2})$ is the ratio between the number of specialist maps that agree with the label $l_{2}$ at $x$ and the number of maps that could provide this information. For example, suppose that at a position $x$, 4 maps can give the secondary label $l_{2}$ = “19. Winter C3 crops” (i.e., $\sum_{j=1}^{N_{sp}} \mathbb{1}(x \in D_{sp}^{j} \land l_{2} \in L_{2}^{j}) = 4$) but only 3 maps actually give this label (i.e., $\sum_{j=1}^{N_{sp}} \mathbb{1}(x \in D_{sp}^{j} \land l_{2} \in L_{2}^{j} \land M_{sp}^{j}(x) = l_{2}) = 3$). The specialist agreement score is then 0.75 (i.e., $S_{sp}(x, l_{2}) = 3/4$). If, at another location, only 3 maps can give this label and 2 actually give it (therefore with more semantic heterogeneity), the specialist score is $2/3 \approx 0.67$. The specialist agreement score ranges between 0 and 1 and reflects the confidence level in the information provided by the specialist maps: the higher the value of $S_{sp}(x, l_{2})$, the more confident we are that the label $l_{2}$ is correct at $x$. Exceptions to Equation (5) are listed in Appendix D.2.
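Equation (5) can be evaluated for all pixels at once when the specialist maps share a common grid. The minimal NumPy sketch below makes the following assumptions:

- labels are integer indices;
- a boolean mask per map encodes $x \in D_{sp}^{j}$;
- a Python set per map encodes $L_{2}^{j}$.

The array layout and the name `specialist_score` are illustrative, not the authors' implementation. Where no map can provide $l_{2}$ (zero denominator), the sketch returns 0, which is an assumption; the paper's own exceptions are in Appendix D.2. The example reproduces the two cases described in the text.

```python
import numpy as np


def specialist_score(labels, in_domain, label_sets, l2):
    """Specialist agreement score S_sp(x, l2) of Equation (5) for every pixel x.

    labels     : int array (n_sp, n_pixels), M_sp^j(x) on a common grid
    in_domain  : bool array (n_sp, n_pixels), True where x is in D_sp^j
    label_sets : list of n_sp sets, the semantic domain L_2^j of each map
    l2         : secondary label index
    """
    # 1(l2 in L_2^j), broadcast over pixels
    can_give = np.array([l2 in s for s in label_sets])[:, None]
    eligible = in_domain & can_give            # terms of the denominator
    agree = eligible & (labels == l2)          # terms of the numerator
    num = agree.sum(axis=0)
    den = eligible.sum(axis=0)
    # Assumption: the score is 0 where no specialist map can provide l2
    return np.divide(num, den, out=np.zeros(den.shape), where=den > 0)


# Two pixels, five specialist maps; 19 = "Winter C3 crops", -1 = any other label
labels = np.array([[19, 19], [19, 19], [19, 20], [-1, -1], [5, 5]])
in_domain = np.array([[True, True], [True, True], [True, True],
                      [True, False], [True, True]])
label_sets = [{19, 20}, {19}, {19, 20}, {19}, {5}]
s = specialist_score(labels, in_domain, label_sets, 19)
print(np.round(s, 2).tolist())  # [0.75, 0.67]
```

The fifth map never counts for label 19 because 19 is not in its semantic domain. The fourth map counts only at the first pixel, where it is defined and disagrees.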

Refinement of Backbone Maps

For each backbone map, we create a map giving secondary labels instead of primary labels. This process results in a so-called refined map for each backbone map. For any $i \in \{1, \dots, N_{bb}\}$, we write $M_{rf}^{i}$, the refined map of the $i$-th backbone map, as

M_{rf}^{i} : x \in D_{rf}^{i} \mapsto \operatorname*{argmax}_{l_{2} \in L_{2},\ h(l_{2}) = M_{bb}^{i}(x)} S_{sp}(x, l_{2}) \in L_{2}

(6)

with $S_{sp}$ the specialist agreement score (see Equation (5)), $h$ the hierarchical link between secondary and primary labels (see Equation (1)) and

D_{rf}^{i} = \{x \in D_{bb}^{i} : \max_{l_{2} \in L_{2},\ h(l_{2}) = M_{bb}^{i}(x)} S_{sp}(x, l_{2}) \neq 0\}

(7)

The refined map $M_{rf}^{i}$ returns the secondary label with the highest specialist agreement score while satisfying the hierarchical link with the primary label given by the backbone map $M_{bb}^{i}$. For example, let us consider the ESA WorldCover v200 backbone map and a position $x$ where the map gives the primary label “Water bodies” (i.e., $M_{bb}(x) =$ “Water bodies”). The associated refined map will return whichever of the following secondary labels $l_{2} \in$ {“1. Sea and oceans”, “2. Lakes”, “3. Rivers”} (i.e., $h(l_{2}) = M_{bb}(x)$) has the highest $S_{sp}(x, l_{2})$.

The domain of definition of a refined map, $D_{rf}^{i}$, is a subset of (and generally smaller than) that of the corresponding backbone map, $D_{bb}^{i}$, because refinement is not possible everywhere. When the highest specialist agreement score among the compatible secondary labels is zero, no specialist map supports any of these labels at $x$. Therefore, refinement is impossible.
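The following minimal NumPy sketch implements Equations (6) and (7) for one backbone map. It assumes that:

- the specialist scores $S_{sp}(x, l_{2})$ have already been computed for every secondary label on the backbone grid;
- the hierarchy $h$ is stored as an array `parent` that gives the primary-label index of each secondary label.

Ties in the argmax resolve to the lowest label index, an assumption, since the text does not specify tie-breaking. Pixels outside $D_{rf}^{i}$ receive the hypothetical fill value −1. The label indices in the example are illustrative.

```python
import numpy as np


def refine(backbone, in_domain, s_sp, parent):
    """Refined map M_rf^i of Equations (6)-(7) for a single backbone map.

    backbone  : int array (n_pixels,), primary label index M_bb^i(x)
    in_domain : bool array (n_pixels,), True where x is in D_bb^i
    s_sp      : float array (n_secondary, n_pixels), S_sp(x, l2) from Eq. (5)
    parent    : int array (n_secondary,), parent[l2] = h(l2) as a primary index
    Returns (refined labels, mask of D_rf^i); labels are -1 outside D_rf^i.
    """
    # Only secondary labels whose parent is the backbone's primary label compete
    compatible = parent[:, None] == backbone[None, :]
    masked = np.where(compatible, s_sp, -np.inf)
    best = masked.argmax(axis=0)               # argmax of Eq. (6)
    best_score = masked.max(axis=0)
    in_rf = in_domain & (best_score > 0)       # Eq. (7); scores lie in [0, 1]
    return np.where(in_rf, best, -1), in_rf


# Illustrative indices: secondary 0-2 (sea, lakes, rivers) -> primary 0 (water),
# secondary 3 (winter C3 crops) -> primary 1 (assumed "crops")
parent = np.array([0, 0, 0, 1])
backbone = np.array([0, 0, 1])
s_sp = np.array([[0.2, 0.0, 0.0],
                 [0.9, 0.0, 0.0],
                 [0.1, 0.0, 0.0],
                 [0.5, 0.0, 0.75]])
labels, mask = refine(backbone, np.ones(3, dtype=bool), s_sp, parent)
print(labels.tolist())  # [1, -1, 3]
```

At the first pixel, the crop label scores 0.5 but is ignored because the backbone map says water. At the second pixel, no compatible label has a non-zero score, so that pixel falls outside $D_{rf}^{i}$.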

We define a refined agreement score, $S_{rf}(x, l_{2})$, for any position $x$ and secondary label $l_{2}$ as in Equation (8).

S_{rf}(x, l_{2}) = \frac{\sum_{i=1}^{N_{bb}} \mathbb{1}(x \in D_{rf}^{i} \land M_{rf}^{i}(x) = l_{2})}{\max_{x} \sum_{i=1}^{N_{bb}} \mathbb{1}(x \in D_{rf}^{i})}

(8)

Therefore, $S_{rf}(x, l_{2})$ is the ratio between the number of refined maps that agree with the label $l_{2}$ at $x$ and the maximum number of overlapping refined maps. For example, in our case $N_{bb} = 16$, but because the definition domains of some refined maps are disjoint, no more than 9 refined maps overlap (i.e., $\max_{x} \sum_{i=1}^{N_{bb}} \mathbb{1}(x \in D_{rf}^{i}) = 9$). If we have a position $x$ for which 4 refined maps give the secondary label $l_{2}$ = “19. Winter C3 crops” (i.e., $\sum_{i=1}^{N_{bb}} \mathbb{1}(x \in D_{rf}^{i} \land M_{rf}^{i}(x) = l_{2}) = 4$), then we have a refined agreement score of 0.44 (i.e., $S_{rf}(x, l_{2}) = 4/9$). Note that the denominator is constant: it does not depend on $x$. This choice ensures that, for the same proportion of agreeing maps, an area covered by more refined maps obtains a higher score. Exceptions in the refinement process are listed in Appendix D.3.
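Equation (8) can be sketched in the same style. The sketch assumes the refined maps are stacked into one integer array, with the hypothetical fill value −1 outside each $D_{rf}^{i}$. The denominator is the maximum overlap over the whole domain. In the sketch, it defaults to the maximum over the pixels passed in, whereas in the paper it equals 9 over the full domain. The example reproduces the 4/9 ≈ 0.44 case. It also shows the effect of the constant denominator: four unanimous maps at a pixel covered by only four refined maps score 4/9 as well.

```python
import numpy as np


def refined_score(refined, n_labels, max_overlap=None):
    """Refined agreement score S_rf(x, l2) of Equation (8).

    refined     : int array (n_bb, n_pixels), M_rf^i(x), -1 outside D_rf^i
    n_labels    : number of secondary labels (indices 0 .. n_labels - 1)
    max_overlap : max over x of the number of refined maps defined at x;
                  computed from the given pixels when None
    Returns a float array (n_labels, n_pixels).
    """
    if max_overlap is None:
        max_overlap = int((refined >= 0).sum(axis=0).max())
    # Number of refined maps voting for each label at each pixel (numerator)
    votes = np.stack([(refined == l2).sum(axis=0) for l2 in range(n_labels)])
    return votes / max_overlap


# Nine refined maps, two pixels; pixel 1 is covered by only four of them
refined = np.array([[1, 1], [1, 1], [1, 1], [1, 1],
                    [2, -1], [2, -1], [2, -1], [0, -1], [0, -1]])
s_rf = refined_score(refined, n_labels=3)
print(np.round(s_rf[1], 2).tolist())  # [0.44, 0.44]
```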

Best-Guess Map

The ensemble of refined maps is then used to create a single best-guess map. This is done by taking, at each position $x$, the label $l_{2}$ with the highest refined agreement score $S_{rf}(x, l_{2})$. The resulting map, $M^{*}$, is defined as follows³:

M^{*} : x \in D^{*} \mapsto \operatorname*{argmax}_{l_{2} \in L_{2}} S_{rf}(x, l_{2}) \in L_{2}

(9)

with $S_{rf}$ the refined agreement score (see Equation (8)) and

D^{*} = \{x \in D : \max_{l_{2} \in L_{2}} S_{rf}(x, l_{2}) \neq 0\}

(10)

We then extend the definition domain of $M^{*}$ to the whole globe $D$ by inserting the label “0. No data” for all $x \notin D^{*}$. The extended map is also denoted $M^{*} : D \to L_{2}$.
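The following sketch implements Equations (9) and (10) together with the extension to the whole grid. It assumes that the refined agreement scores are stored as an array with one row per secondary label and that row 0 corresponds to “0. No data”. Ties resolve to the lowest label index, an assumption, since the text does not specify tie-breaking. `best_guess` and `NO_DATA` are illustrative names.

```python
import numpy as np

NO_DATA = 0  # index of the "0. No data" label (assumed layout)


def best_guess(s_rf):
    """Best-guess map M* of Equations (9)-(10), extended with "0. No data".

    s_rf : float array (n_labels, n_pixels), S_rf(x, l2) from Eq. (8)
    """
    m_star = s_rf.argmax(axis=0)          # Eq. (9)
    outside = s_rf.max(axis=0) == 0       # x not in D*, Eq. (10)
    return np.where(outside, NO_DATA, m_star)


s_rf = np.array([[0.0, 0.0, 0.0],
                 [4 / 9, 0.0, 0.1],
                 [3 / 9, 0.0, 0.2]])
print(best_guess(s_rf).tolist())  # [1, 0, 2]
```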

Quality Assessment

The quality of $M^{*}(x)$ depends on the refinement process and the construction of the best-guess map. The uncertainties of these steps are represented by the specialist agreement score $S_{sp}$ and the refined agreement score $S_{rf}$. For any position $x$, we define the quality score, $S(x)$, as in Equation (11).

S (x) = \sqrt{S_{r f} (x, M^{*} (x)) S_{s p} (x, M^{*} (x))}

(11)

with the following definitions:

$M^{*}$ is the best-guess map, defined in Equation (9);
$S_{r f}$ is the refined agreement score, defined in Equation (8);
$S_{s p}$ is the specialist agreement score, defined in Equation (5).

The quality score $S(x)$ is the geometric mean of two agreement scores. These scores capture the uncertainty caused by disagreement among the backbone maps (represented by their refined counterparts) and among the specialist maps, respectively. This is the score we use to estimate the uncertainty of the label given by ECOSG+ at any position $x$. Note that when $M^{*}(x) =$ “0. No data”, the score $S(x) = 0$ because $S_{sp}(x, \text{“0. No data”}) = 0$, as defined in Equation (5).
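Equation (11) reduces to an element-wise geometric mean once both scores are looked up at the best-guess label. This sketch assumes score arrays with one row per secondary label, row 0 being “0. No data”. Combining the two worked examples of the text, $S_{rf} = 4/9$ and $S_{sp} = 3/4$ give $S = \sqrt{1/3} \approx 0.577$, which is above the 0.525 threshold. A “0. No data” pixel gets $S = 0$.

```python
import numpy as np


def quality_score(s_rf, s_sp, m_star):
    """Quality score S(x) of Equation (11).

    s_rf, s_sp : float arrays (n_labels, n_pixels) of refined and specialist scores
    m_star     : int array (n_pixels,), best-guess label M*(x)
    """
    cols = np.arange(m_star.size)
    # Both scores are evaluated at the best-guess label of each pixel
    return np.sqrt(s_rf[m_star, cols] * s_sp[m_star, cols])


s_rf = np.array([[0.0, 0.0], [4 / 9, 0.0]])
s_sp = np.array([[0.0, 0.0], [0.75, 0.0]])
m_star = np.array([1, 0])   # pixel 1 is "0. No data"
q = quality_score(s_rf, s_sp, m_star)
print(np.round(q, 3).tolist())  # [0.577, 0.0]
```

A geometric mean, unlike an arithmetic one, drops to zero whenever either score is zero. A label then needs support from both the backbone and the specialist maps to obtain a high quality score.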

Assembling

The last step in producing ECOSG+ is to assemble the available information, namely the best-guess map $M^{*}$, the quality score $S$, and the ECOSG map. The assembly of ECOSG+ takes the best-guess map $M^{*}$ where the quality score $S$ is higher than a threshold $S_{min}$ and uses ECOSG elsewhere. If we denote the map returning the labels of ECOSG (resp. ECOSG+) by $M_{sg}$ (resp. $M_{sg+}$), we have the following:

M_{sg+}(x) = \begin{cases} M^{*}(x) & \text{if } S(x) > S_{min} \\ M_{sg}(x) & \text{otherwise} \end{cases}

(12)
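Equation (12) is a per-pixel selection. In the sketch below, the threshold is passed explicitly (0.525 in this work). ECOSG is assumed to have already been resampled onto the same 60 m grid as the best-guess map, a step the text does not detail. The strict inequality means that a pixel whose score equals $S_{min}$ keeps its ECOSG label. The sketch also returns the fraction of pixels taken from $M^{*}$, the quantity reported as 33.79% over EURAT in Section 3.1.2. `assemble` is an illustrative name.

```python
import numpy as np


def assemble(m_star, m_sg, score, s_min):
    """ECOSG+ of Equation (12): M*(x) where S(x) > s_min, ECOSG elsewhere.

    m_star, m_sg : int arrays of labels on the same grid (ECOSG assumed resampled)
    score        : float array, quality score S(x) from Eq. (11)
    Returns the assembled labels and the fraction of pixels taken from M*.
    """
    take_best = score > s_min
    return np.where(take_best, m_star, m_sg), take_best.mean()


m_star = np.array([19, 0, 12])
m_sg = np.array([18, 1, 13])
score = np.array([0.58, 0.0, 0.525])
m_sgp, frac = assemble(m_star, m_sg, score, s_min=0.525)
print(m_sgp.tolist(), round(float(frac), 3))  # [19, 1, 13] 0.333
```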

The $S_{min}$ threshold was determined using a histogram of $S(x)$ for $x$ covering the EURAT domain at 0.1° resolution. The threshold was chosen with the Otsu method [21], that is, by minimizing the weighted sum of the variances within the rejected and the accepted values. This method gives $S_{min} = 0.525$.
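The following from-scratch sketch computes the Otsu threshold from a histogram of scores. Among all candidate bin edges, it selects the one that maximizes the between-class variance, which is equivalent to minimizing the weighted within-class variance of the rejected and accepted values. The bin count of 256 and the synthetic bimodal input are illustrative assumptions; the paper applies the method to $S(x)$ sampled over EURAT at 0.1°. Library implementations such as scikit-image's `skimage.filters.threshold_otsu` follow the same principle.

```python
import numpy as np


def otsu_threshold(values, n_bins=256):
    """Threshold maximizing the between-class variance of a 1-D sample."""
    hist, edges = np.histogram(values, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    # Class 0 = bins 0..k, class 1 = bins k+1..n_bins-1, for k = 0..n_bins-2
    n0 = np.cumsum(hist)[:-1]
    n1 = hist.sum() - n0
    s0 = np.cumsum(hist * centers)[:-1]
    s1 = (hist * centers).sum() - s0
    valid = (n0 > 0) & (n1 > 0)               # both classes must be non-empty
    mu0 = s0 / np.maximum(n0, 1)
    mu1 = s1 / np.maximum(n1, 1)
    # Between-class variance up to a constant factor: n0 * n1 * (mu0 - mu1)^2
    between = np.where(valid, n0 * n1 * (mu0 - mu1) ** 2, -1.0)
    k = int(np.argmax(between))
    return edges[k + 1]                       # edge separating bin k from k+1


scores = np.concatenate([np.linspace(0.0, 0.4, 500), np.linspace(0.6, 1.0, 500)])
print(0.4 < otsu_threshold(scores) < 0.6)  # True
```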

2.2.2. Evaluation of ECOSG+

As for all land cover maps, the evaluation of ECOSG+ is complicated by the heterogeneity of resolutions, semantics, and geographical coverage. Existing methods for evaluating land cover maps include the following:

Comparison to derived measurable quantities. For example, ref. [22] trained a machine learning model to derive the skin temperature from the land cover and compared the derived skin temperature to the measured skin temperature. This method allows quantitative evaluation of the land cover maps of large domains but requires measured quantities at a comparable resolution over a comparable domain.
Human validation. For example, LUCAS [23] and CLC+ [24] are validated by human experts. In the case of LUCAS, experts went to designated sites to verify the land cover. In the case of CLC+, experts validated the land cover classes by photo-interpretation. In both cases, human validation requires a sufficient number of trained staff and a carefully designed validation procedure.
Comparison to trusted land cover maps. For example, ref. [25] and ref. [1] trained and validated a machine learning model using the CORINE land cover map. In this method, the quality of the evaluation is dependent on the quality of the trusted map. It, therefore, requires the existence of a trusted map of proven quality with an appropriate set of labels, and preferably a higher spatial resolution and greater detail than the map being assessed. Although this is considered less accurate than human validation [26], this method validates every pixel on the trusted map domain.

In our case, human validation was not possible because of limitations in time and expertise. The comparison to derived measurable quantities was not investigated because of the lack of measured quantities at 60 m resolution covering significant parts of Europe. Therefore, we chose to perform a comparison with trusted land cover maps used as references.

Reference Maps

The trusted land cover maps used are the following:

LUCAS 2022: In situ data at validated sites over all of Europe translated to primary labels (see Table A7). A translation to secondary labels is not possible.
NLC 2018: A raster map at 10 m resolution, created by the National Mapping Division of Tailte Éireann (formerly Ordnance Survey of Ireland) in partnership with the Irish Environmental Protection Agency (EPA), translated to primary labels (see Table A8). A translation to secondary labels is not possible, and the map only covers Ireland.
ECOSGIMO: A raster map at 25 m resolution (provided at 60 m) providing secondary labels, using national datasets and expert rules. Most of the natural land covers come from a habitat classification map [27] from the Icelandic Institute of Natural History (IINH) based on the EUNIS classification system. The habitat types were translated to secondary labels for Snow, Water bodies, Bare land, Grassland, Crops, and Flooded vegetation. The secondary labels for Forests and Shrubs are based on data from the Icelandic Forest Service, Icelandic Forest Research, Mógilsá. Two maps were used: a map of native birch forests and shrubs and a map of afforestation with different coniferous and broadleaf species. The urban local climate zone labels come from the CORINE Land Cover 2018 [28] with a few updates in Reykjavík city. Recent lava fields were added as rocks with data from the Icelandic Meteorological Office and other national institutes in Iceland.

None of the reference maps were included in the construction process, which avoids bias in the evaluation. Three components of ECOSG+ are tested with the three reference maps: accuracy across Europe (LUCAS), accuracy of small-scale features (NLC 2018), and accuracy of secondary labels (ECOSGIMO). However, these three components are not evaluated together, and each has limitations (only discrete sites in LUCAS 2022, only the geographical region of Ireland in NLC 2018, and only Icelandic secondary labels in ECOSGIMO).

Comparison Scores

To quantify the similarity between two maps, we compute the confusion matrix on the labels⁴ for all pixels. Then, we compute the overall accuracy (OA) for an overall comparison and the F1-score for a per-label comparison.

The confusion matrices shown in Section 3 are normalized row-wise by the number of pixels with the row’s label in the reference map. Therefore, the diagonal of these matrices shows the recall (or producer accuracy) value for each label.
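The comparison scores can be sketched from a confusion matrix built over all pixels. The sketch assumes both maps are on the same grid and already translated to a common label set, for example the primary labels of Tables A7 and A8. Rows index the reference label, so the diagonal of the row-normalized matrix gives the recall, as in the matrices of Section 3. The F1-score is the harmonic mean of precision and recall for each label. A label absent from both maps would produce NaN here, which the sketch does not handle.

```python
import numpy as np


def confusion_matrix(ref, pred, n_labels):
    """C[i, j] = number of pixels labelled i in the reference and j in the map."""
    c = np.zeros((n_labels, n_labels), dtype=np.int64)
    np.add.at(c, (np.asarray(ref), np.asarray(pred)), 1)
    return c


def comparison_scores(c):
    """Overall accuracy, per-label recall (producer's accuracy), and per-label F1."""
    oa = np.trace(c) / c.sum()
    recall = np.diag(c) / c.sum(axis=1)       # diagonal of the row-normalized matrix
    precision = np.diag(c) / c.sum(axis=0)    # user's accuracy
    f1 = 2 * precision * recall / (precision + recall)
    return oa, recall, f1


ref = [0, 0, 0, 1, 1, 2]
pred = [0, 0, 1, 1, 1, 2]
oa, recall, f1 = comparison_scores(confusion_matrix(ref, pred, n_labels=3))
print(round(float(oa), 3), np.round(f1, 2).tolist())  # 0.833 [0.8, 0.8, 1.0]
```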

Baseline Maps

Once the comparison scores were computed for ECOSG+ against the reference maps, the values of the scores were compared with the ones obtained using baseline maps. The baseline maps considered are ECOSG, ECOSG+300, and ESA WorldCover v200 ([29], hereafter ESA WorldCover).

ECOSG is used as a baseline to quantify the improvement on the land cover map currently used in the HARMONIE-AROME NWP model.
ECOSG+300 refers to ECOSG+ resampled at ECOSG’s native resolution of 300 m. This baseline aims to show whether the improvement is due to the increase in resolution or the correction of some labels.
ESA WorldCover v200 is one of the most commonly used land cover maps, and therefore makes a good standard.

In addition to the quantitative evaluation described here, a qualitative evaluation is also carried out. It consists of visualizing ECOSG+ and its quality score at different scales and calculating basic statistics about the labels.

3. Results

The results are presented as introduced in Section 2.2.2: first, a qualitative evaluation of ECOSG+ and its quality score (Section 3.1), then a quantitative evaluation. The quantitative evaluation involves testing three components of ECOSG+ separately, marked by their reference datasets: a continental scale evaluation is performed with LUCAS 2022 as a reference (Section 3.2.1), a small scale evaluation is performed with NLC 2018 as a reference (Section 3.2.2), and a secondary label evaluation is performed with ECOSGIMO as a reference (Section 3.2.3).

3.1. Qualitative Evaluation

3.1.1. Overview of the ECOSG+ Map and Its Quality Score

Figure 2 shows an overview of the ECOSG+ map (upper panel) and its associated quality score map (lower panel). It is obtained by taking the nearest neighbor value on a regular longitude–latitude grid (EPSG:4326) covering the EURAT domain at 0.1°. The colormap of the land cover labels is the same as for ECOSG. Only labels actually present on the map are listed in the color bar. The colormap of the scores transitions from red (low values, which means low confidence in the label) to green (high values, which means high confidence). The cut-off value in the colormap is the same as the threshold $S_{min}$ used in assembling the map (see Equation (12)): 0.525. Therefore, the labels of ECOSG+ come from ECOSG where the scores are in red, and they come from the best-guess map $M^{*}$ (see Equation (9)) where the scores are in green.

On the land cover map (upper panel of Figure 2), little can be said at this scale, except that no obviously wrong labels were found. We retrieve the expected main features: credible coastlines, a large area of desert in Northern Africa, and a complex tiling of covers everywhere else.

On the score map (lower panel of Figure 2), the main red areas are over the Atlantic Ocean, Eastern Europe, and Mediterranean countries. The low scores over the Atlantic Ocean are due to the common practice of removing data over large sea areas. Consequently, despite the low score values, we are confident that the “1. Sea and oceans” label taken from ECOSG over this area is correct. In Eastern Europe, low scores are expected as fewer datasets representing these areas were included in the method. This is also the case for some Mediterranean countries, such as Turkey, Morocco, and Algeria. However, some regions are in red despite having good dataset coverage. For example, southern France and Portugal are depicted in red despite national land cover maps being used there, which suggests disagreement among the datasets used in these areas.

The main green areas are the coastline, the deserts, and most European countries. The coastline is usually well represented in all land cover maps, and some of the maps used in this work have a resolution as high as 10 m, which explains the good agreement on the coastline. A large number of maps cover most European countries, explaining the high confidence level in these areas. Surprisingly, the deserts are shown in light green, which suggests a satisfactory confidence level, despite the limited number of datasets available. This result can be explained by the limited number of labels covering the deserts (“4. Bare land” and “5. Bare rock”), which reduces the risk of disagreement compared with areas covered by more labels. Therefore, most of the few available maps are likely to agree. The numerator of the specialist agreement score (see Equation (5)) is then close to its small denominator, leading to a high score and, consequently, a high confidence level.

Other noticeable features on the score map are the numerous artifacts due to the manipulation of many maps with heterogeneous projections and boundaries. We can see stitching artifacts (regularly spaced red horizontal lines) and reprojection artifacts (patterns in the North and Arctic Seas). Notably, these artifacts are not visible on the land cover map, at least at this scale, which is an encouraging result for the ECOSG+ construction method.

In practice, such information is used to warn users about varying quality within the map. In [20], it is used to create a reference dataset for a machine learning algorithm.

3.1.2. Distribution of Labels

Figure 3 represents the distribution of labels over the EURAT domain for ECOSG+ (outer ring) and ECOSG (inner ring). In addition, the proportion of pixels with scores exceeding the $S_{min} = 0.525$ threshold has been estimated as 33.79%, which means that 33.79% of the pixels of ECOSG+ take their label from a source other than ECOSG. The pie chart names only the most common labels (those with 2% coverage or more). The first visible feature is the similar distribution of labels between ECOSG+ and ECOSG, which is expected at this scale. Therefore, although 33.79% of pixels have changed, the distribution of labels is barely modified, which is one element in the validation of ECOSG+. The second visible feature is the strong imbalance among the land cover labels. On the one hand, four labels cover more than 81% of the pixels: “1. Sea and oceans” (52%), “4. Bare land” (17%), “19. Winter C3 crops” (7%), and “12. Boreal needleleaf evergreen” (5%). On the other hand, the 15 least dominant labels cover less than 1% of the pixels. In particular, urban areas (LCZs 1 to 10) represent 0.9% of the pixels.

3.1.3. Zoom on a Few Patches

Figure 4 shows examples of land cover patches for several land cover maps (one per column) at several locations (one per row). Each row represents a geographical area and is identified on the left-hand side by a toponym, the country it is in, and the longitude–latitude coordinates of the central point. The geographical areas have been chosen to display various landscapes and latitudes within the EURAT domain. The first row is the Snaefell Glacier in Iceland, as glacier classification is challenging due to diversity in glaciers, temporal changes [30], the presence of debris, shadows, and illumination effects [31]. The second row is centered on Nanterre, France, in the northwestern part of the Paris urban area. This was chosen because the classification of heterogeneous urban areas has been identified as an issue in ECOSG [1]. The third row shows the small islands of Kihdinluoto in the southwest of Finland. The classification of small islands is challenging due to their size, which requires high-resolution data, and the complex interaction between land and sea at the coastline [32]. The fourth row is a rural part of Portugal, around the small town of Pinhel. The underestimation of small towns was previously identified as an issue in ECOSG [33]. The fifth and last row is the oasis town of El Menia in the Sahara Desert (Algeria). This region was chosen to investigate the consequence of having a lower number of datasets available for the construction of ECOSG+ in that area. All patches are 0.0833° in size, representing approximately 8 km at low latitudes.

Each column represents a different land cover map. The first column is ESA WorldCover, one of the backbone maps with global coverage and 10 m resolution, shown here to verify the primary labels. The second column is ECOSG, currently used in NWP models, which is the baseline to improve upon. The colormap for the ECOSG labels is the same as in Figure 1, Figure 2 and Figure 3. The third column is ECOSG+, $M_{sg+}$ (see Equation (12)), the final map of this work. The fourth column is the best-guess map, $M^{*}$ (see Equation (9)). The fifth and last column is the ECOSG+ quality score map, $S$ (see Equation (11)), used in the assembly of ECOSG+. The colormap for the score values is the same as in Figure 2, with green indicating where ECOSG+ takes values from $M^{*}$ instead of ECOSG.

Figure 4 illustrates the gain in resolution between ECOSG and ECOSG+ and the label correction (e.g., over Paris, row 2). The best-guess map differs even more from ECOSG and very often agrees with ESA WorldCover, which is desirable. However, some missing values remain where the score is 0 (deep red, e.g., the crops over Portugal, row 4, or the Algerian lake and town, row 5). Because there is no agreement in some places (where the score is zero), a cut-off value $S_{min}$ is necessary, as described in Equation (12). The impact of the cut-off value is assessed further in [20]. As ECOSG is used where the score is below the cut-off value, the areas with the lowest scores also have a lower resolution. Although ECOSG+ is qualitatively more satisfying, further work is required to run hectometric-scale simulations with the updated physiography and to assess the impact of the gain in resolution.
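
The assembly rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that ECOSG+ takes the best-guess label wherever the quality score reaches the cut-off (taken as 0.525, the Otsu value reported in the Conclusions) and keeps the ECOSG label otherwise. Using `>=` at the boundary is an assumption. Rasters are flattened to plain Python lists, and the function name `assemble` is hypothetical.

```python
def assemble(ecosg, best_guess, score, s_min=0.525):
    """Combine ECOSG and the best-guess map pixel by pixel.

    Returns the assembled labels and the fraction of pixels taken from the
    best-guess map. Using ">=" at the cut-off is an assumption of this sketch.
    """
    assembled = []
    n_from_best = 0
    for label_sg, label_best, s in zip(ecosg, best_guess, score):
        if s >= s_min:
            assembled.append(label_best)   # confident agreement: use the best guess
            n_from_best += 1
        else:
            assembled.append(label_sg)     # low score (including 0): fall back to ECOSG
    return assembled, n_from_best / len(ecosg)


# Toy 5-pixel example with ECOSG label numbers (0 = "No data" in the best guess).
ecosg = [1, 4, 19, 19, 12]
best_guess = [1, 5, 17, 0, 12]
score = [0.9, 0.6, 0.3, 0.0, 0.7]
labels, frac = assemble(ecosg, best_guess, score)
print(labels, frac)  # [1, 5, 19, 19, 12] 0.6
```

The returned fraction corresponds to the 33.79% reported for the full map. The "No data" value of the best guess at the fourth pixel is never used, because its score of 0 falls below the cut-off.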

3.2. Quantitative Evaluations

3.2.1. Europe-Wide Evaluation Against LUCAS

The ESA WorldCover, ECOCLIMAP-SG, and new ECOSG+ maps were evaluated over the European Union using the LUCAS 2022 dataset [23]. In this comparison, the LUCAS points were expanded to a 60 m radius, and the LUCAS labels were translated into primary labels using the C3 classification and Table A7. To rule out resolution as the cause of the differences between ECOSG and ECOSG+, ECOSG+ was downsampled to 300 m (ECOSG+300) by taking the most frequent land cover label in each 300 m grid cell.
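
Majority (mode) downsampling from 60 m to 300 m groups the pixels into 5 × 5 blocks. The pure-Python sketch below illustrates the idea on a toy grid. The handling of incomplete edge blocks and the tie-breaking rule are assumptions, because neither is specified here.

```python
from collections import Counter


def mode_downsample(grid, factor=5):
    """Downsample a 2-D label grid by keeping the most frequent label in each
    factor x factor block (5 x 60 m = 300 m).

    Incomplete blocks at the right/bottom edges are dropped, and ties go to
    the label encountered first in row-major order (Counter.most_common);
    both are assumptions of this sketch.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, n_rows - n_rows % factor, factor):
        out_row = []
        for c0 in range(0, n_cols - n_cols % factor, factor):
            block = [grid[r][c]
                     for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            out_row.append(Counter(block).most_common(1)[0][0])
        out.append(out_row)
    return out


# Toy 5 x 10 grid of 60 m pixels: left block crops (19), right block sea (1).
grid = [[19] * 5 + [1] * 5 for _ in range(5)]
grid[0][0] = grid[1][1] = 4   # two "bare land" pixels do not change the majority
print(mode_downsample(grid))   # [[19, 1]]
```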

Figure 5 shows the confusion matrices of ECOSG, ESA WorldCover, ECOSG+, and ECOSG+300. Primary labels of the reference dataset (LUCAS) are on the y-axis, and those of the evaluated dataset are on the x-axis. To compensate for label imbalance, the matrices have been normalized row-wise by the number of pixels per label in the reference dataset. The figure also shows each map’s overall accuracy (OA). For every dataset, “Forest”, “Grassland”, and “Crops” display the largest off-diagonal spread, with a clear confusion between “Bare land” and “Crops”. A misclassification of bare land was previously reported in multiple backbone maps used in ECOSG+ [7,34]. The refinement process and the best-guess map depend on the accuracy of the backbone maps: when most backbone maps are inaccurate, the best-guess map is erroneous. As ECOSG has a similar issue, this explains the misclassification of bare land in ECOSG+.
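
The following minimal pure-Python sketch makes the row-wise normalization and the overall accuracy (OA) concrete using toy labels. It assumes that the reference and evaluated maps have already been sampled at the same points and translated into primary labels. The function names are illustrative and are not taken from the authors' code.

```python
def confusion_matrix(reference, evaluated, labels):
    """Count matrix: rows = reference labels (y-axis), columns = evaluated labels (x-axis)."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for ref, ev in zip(reference, evaluated):
        cm[index[ref]][index[ev]] += 1
    return cm


def row_normalize(cm):
    """Divide each row by the number of reference pixels with that label."""
    normalized = []
    for row in cm:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def overall_accuracy(cm):
    """Fraction of all pixels that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


labels = ["Forest", "Grassland", "Crops"]
reference = ["Forest", "Forest", "Grassland", "Grassland", "Crops"]
evaluated = ["Forest", "Grassland", "Grassland", "Grassland", "Grassland"]
cm = confusion_matrix(reference, evaluated, labels)
print(cm)                     # [[1, 1, 0], [0, 2, 0], [0, 1, 0]]
print(row_normalize(cm)[0])   # [0.5, 0.5, 0.0]
print(overall_accuracy(cm))   # 0.6
```

Row normalization makes each row sum to 1, so rare reference labels are as visible as dominant ones. The OA, in contrast, is still weighted by the number of pixels per label.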

ECOSG+, ECOSG+300, and ESA WorldCover exhibit similar behavior, with higher diagonal values for “Grassland” and more confusion between “Grassland” and “Shrubs” than ECOSG. The OA reflects these two groups: ECOSG+, ECOSG+300, and ESA WorldCover all exceed 0.5, while ECOSG reaches 0.42. These observations suggest that the agreement-based method improved the overall representation of primary labels. Since ECOSG+300 behaves similarly to ECOSG+, this improvement is not only due to the increase in resolution but also to label correction.

Table 1 displays the F1-scores of ECOSG, ESA WorldCover, ECOSG+, and ECOSG+300 with LUCAS 2022 as a reference. ECOSG+ has the highest F1-score for “Bare land” (0.154, compared with 0.126 for ECOSG and 0.138 for ESA WorldCover), “Snow” (0.788, compared with 0.708 for ECOSG and 0.643 for ESA WorldCover), and “Flooded vegetation” (0.438, compared with 0.342 for ECOSG and 0.367 for ESA WorldCover). It outperforms ECOSG for every label. Although downsampling lowers the F1-scores, ECOSG+300 still outperforms ECOSG for every label except “Shrubs”. This shows that ECOSG+ improves the overall representation of the ECOSG primary labels over Europe.
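
The per-label F1-score is the harmonic mean of precision (diagonal count divided by the column sum) and recall (diagonal count divided by the row sum). The sketch below computes it from a confusion matrix laid out as in Figure 5. Reporting NaN when a label is never predicted or is absent from the reference is an assumed convention, chosen to be consistent with the NaN entries mentioned for Table 4. It is not necessarily the authors' exact rule.

```python
import math


def f1_scores(cm, labels):
    """Per-label F1-score from a confusion matrix (rows = reference, columns = evaluated).

    Precision is undefined (NaN) when the evaluated map never predicts the label,
    and recall is undefined when the label is absent from the reference; in both
    cases the F1-score is reported as NaN (assumed convention).
    """
    scores = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        n_predicted = sum(row[i] for row in cm)  # column sum
        n_reference = sum(cm[i])                 # row sum
        precision = tp / n_predicted if n_predicted else math.nan
        recall = tp / n_reference if n_reference else math.nan
        if math.isnan(precision) or math.isnan(recall):
            scores[label] = math.nan
        elif precision + recall == 0:
            scores[label] = 0.0
        else:
            scores[label] = 2 * precision * recall / (precision + recall)
    return scores


labels = ["Forest", "Grassland", "Crops"]
cm = [[1, 1, 0], [0, 2, 0], [0, 1, 0]]
print(f1_scores(cm, labels))  # Crops is never predicted, so its F1-score is NaN
```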

3.2.2. Small Scale Feature Evaluation Against NLC 2018

In Section 3.2.1, we assessed the overall representation of the primary labels over Europe. In this section, we compare ECOSG+ against a national dataset for Ireland to assess how well ECOSG+ represents high-resolution detail. Produced by the EPA and Tailte Éireann, NLC 2018 covers the Republic of Ireland. It has two classification levels, with a thematic accuracy of 78.5% at Level 2 and 88.7% at Level 1, and a geometric accuracy (i.e., the accuracy of the area outlines) of 87.2%.

To perform the analysis, the NLC 2018 data were rasterized on a 60 m grid and converted to primary labels following Table A8. The NLC 2018 labels do not map perfectly onto the primary labels, especially for the “Flooded vegetation” types, so the conversion of NLC 2018 to primary labels needs to be assessed. This assessment was performed over Ireland using LUCAS as a reference. The assessment, presented in Appendix E, determined that NLC 2018 is a suitable reference map for the Republic of Ireland once it has been transformed into primary labels. However, NLC 2018 classifies “Shrubs” as “Flooded vegetation”, so the outcomes for these two primary labels should be viewed with caution.
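
The label conversion itself is a simple lookup. The sketch below transcribes Table A8 into a Python dictionary and applies it to an NLC 2018 grid that is assumed to be already rasterized (the rasterization onto the 60 m grid is not shown). Leaving unknown codes as `None` is an assumption. Heath and bracken codes such as 720 (Dry Heath) end up in “Flooded vegetation”, which may contribute to the “Shrubs”/“Flooded vegetation” ambiguity discussed above.

```python
# Table A8: NLC 2018 codes grouped by primary label ("Snow" has no NLC code).
NLC_TO_PRIMARY = {
    "Water bodies": [810, 820, 830, 840, 850],
    "Bare land": [210, 220, 230, 240, 250],
    "Forest": [410, 420, 430, 440, 470],
    "Shrubs": [450, 460],
    "Grassland": [510, 520, 530],
    "Crops": [310],
    "Flooded vegetation": [540, 550, 570, 610, 620, 630, 640, 650, 710, 720, 730],
    "Urban": [110, 120, 130],
}

# Invert the table into a code -> primary label lookup.
CODE_TO_PRIMARY = {code: label
                   for label, codes in NLC_TO_PRIMARY.items()
                   for code in codes}


def to_primary(nlc_grid):
    """Translate a rasterized NLC 2018 grid; unknown codes become None (assumption)."""
    return [[CODE_TO_PRIMARY.get(code) for code in row] for row in nlc_grid]


print(to_primary([[810, 540], [720, 110]]))
# [['Water bodies', 'Flooded vegetation'], ['Flooded vegetation', 'Urban']]
```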

Figure 6 shows the row-wise normalized confusion matrices for ECOSG+, ECOSG, and ESA WorldCover with NLC 2018 as a reference. The dark blue column over grassland indicates the large spread in the grassland classification, which is consistent with the LUCAS confusion matrix (Appendix E). ECOSG+ and ESA WorldCover classify most of the NLC 2018 “Flooded vegetation” as “Grassland”, while there is good agreement between ECOSG and NLC 2018 for flooded grassland. As “Flooded vegetation” is overestimated in NLC 2018, the relative underestimation in ECOSG+ and ESA WorldCover suggests a better representation of “Flooded vegetation” than in ECOSG. The forest land cover in ECOSG+ and ESA WorldCover corresponds well to NLC 2018 forest, while ECOSG overestimates “Grassland”.

Table 2 shows the F1-scores over the Republic of Ireland for ECOSG+, ECOSG and ESA WorldCover, using NLC 2018 as a reference. ECOSG+ has the highest F1-score, ahead of ESA WorldCover, for “Forests”, “Grassland”, “Crops”, and “Urban” areas. Meanwhile, ECOSG is a better fit for “Bare land” and “Flooded vegetation”, which were the least accurate in NLC 2018 compared with LUCAS (Appendix E). This suggests that the agreement-based method correctly captures most primary labels over Ireland, even when compared with a higher resolution reference. The quality of the “Bare land”, “Shrubs”, and “Flooded vegetation” labels remains questionable in all tested maps.

3.2.3. Secondary Label Evaluation Against ECOSGIMO

In the previous sections, we evaluated the ECOSG+ primary labels over Europe and their level of detail over the Republic of Ireland. This comparison is incomplete, as this work aims to create a reference dataset with the 33 secondary labels, so an evaluation at the secondary-label level is necessary. In this section, we compare the ECOSG+ secondary labels with ECOSGIMO, a high-resolution version of ECOSG and the only one available. However, ECOSGIMO has no accuracy assessment, and its domain (Iceland) lies outside the LUCAS domain and that of any known in situ dataset. Hence, we cannot verify the accuracy of ECOSGIMO. Rather than providing an absolute assessment of the ECOSG+ secondary labels, this section aims to demonstrate how closely the agreement-based method of ECOSG+ can approximate a more traditional technique.

Figure 7 shows the row-wise normalized confusion matrices for ECOSG+ and ECOSG with ECOSGIMO as a reference. For visibility, labels not represented over Iceland were removed from Figure 7, so only 20 labels can be evaluated. The overall accuracies of the two maps are similar to each other and higher than in the previous section because of the imbalanced label distribution in the Icelandic domain. Indeed, 48% of the domain is covered by “1. Sea and oceans” and “6. Permanent snow”, and the dark blue diagonal pixels for these labels reveal a strong agreement of both ECOSG+ and ECOSG with ECOSGIMO, which increases the proportion of correctly classified pixels.

The dashed squares in Figure 7 indicate the primary labels. The diagonal in the “Water bodies” square indicates that water bodies in both ECOSG+ and ECOSG agree with ECOSGIMO at the primary level. Still, at the secondary level, rivers are better represented in ECOSG+ than in ECOSG. In the “Bare land” square, ECOSG+ agrees more at the primary level but is more confused at the secondary level. For “Forests”, “Shrubs”, and “Grassland”, both maps overestimate “Grassland”. However, ECOSG+ preferentially returns “16. Boreal grassland”, while ECOSG preferentially returns “17. Temperate grassland”, which is the most frequent label in ECOSGIMO. This explains a significant part of the overall accuracy gap between ECOSG and ECOSG+. For “Flooded vegetation”, ECOSG is in good agreement with ECOSGIMO, while ECOSG+ overestimates “Grassland”. For “Urban”, ECOSG confuses sparsely built labels with “Grassland” more than ECOSG+ does. This underestimation of sparsely built areas in ECOSG is consistent with previous studies [33], and ECOSG+ seems to partially resolve the issue. Heterogeneities within the “Urban” primary label also seem to be better represented in ECOSG+, with more color on the diagonal, especially for the label “LCZ6: open low-rise”.

Table 3 and Table 4 show the F1-score of ECOSG and ECOSG+ over Iceland for primary and secondary labels with ECOSGIMO as a reference. ESA WorldCover was added at the primary level for information. ECOSG+ has the highest score at the primary level, except for “Forest”, “Shrubs”, and “Flooded vegetation”.

Interestingly, ECOSG+ has the worst score for “Shrubs”, which is also a secondary label. This is consistent with other analyses showing that “Shrubs” are still problematic in ECOSG+. Generally speaking, shrubs are problematic because the reality they cover may differ from one map to another: the sparsity, the height, and the type of vegetation can vary. As a result, no map focuses specifically on shrubs (see Table A1, Table A2 and Table A3), which might affect the accuracy of ECOSG+. Moreover, the definition of the shrub label might differ from that of the NLC 2018 map, which might degrade the scores of both ECOSG and ECOSG+ in this evaluation.

At the secondary level, ECOSG+ scores better than ECOSG for 14 labels, while ECOSG has a better score for 4. Neither map identifies “13. Boreal needleleaf deciduous” and “28. LCZ5: open midrise”, explaining the NaN values.

Among the labels for which ECOSG+ has a better F1-score than ECOSG, the gap between the two scores exceeds 0.1 for eight labels. Of the four labels for which ECOSG has a better F1-score than ECOSG+, the gap exceeds 0.1 for two: “4. Bare land” and “17. Temperate grassland”. For “4. Bare land”, the low F1-score is due to a distribution issue between the “Bare land” primary label and its two secondary labels, “4. Bare land” and “5. Bare rock”. This issue arises from the limited number of specialist maps that distinguish “4. Bare land” from “5. Bare rock”. The bioclimatic distribution between the “Grassland” primary label and the “16. Boreal grassland” and “17. Temperate grassland” secondary labels seems to be off in Figure 7, which explains the relatively low boreal grassland F1-score. This shows the limits of relying solely on [35] for the bioclimatic classification (see Appendix D.3 for more detail).

4. Conclusions

In this work, we presented an agreement-based method for producing ECOSG+, a LULC map at 60 m with ECOSG labels. It is also the first high-resolution (below 100 m) map with the 33 ECOSG labels covering continental Europe.

ECOSG+ results from the assembly of ECOSG and a so-called best-guess map, $M^{*}$, which represents the most probable ECOSG labels obtained by combining information from many suitable datasets. The quality score $S$ represents the confidence level of $M^{*}$ and defines the ECOSG+ assembly rules. A cut-off value of $S_{min} = 0.525$ was determined with the Otsu method to define the pixels where $M^{*}$ replaces ECOSG in ECOSG+.
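
Otsu's method picks the threshold that best separates a histogram into two classes by maximizing the between-class variance. The sketch below is a generic pure-Python implementation applied to a toy bimodal score distribution. The bin count (256) and the [0, 1] score range are assumptions, since the settings used to obtain 0.525 are not stated here. In practice, a library routine such as `skimage.filters.threshold_otsu` from scikit-image can be used instead.

```python
def otsu_threshold(values, n_bins=256, lo=0.0, hi=1.0):
    """Otsu's method: pick the histogram split that maximizes the
    between-class variance of the two resulting groups.

    Assumes scores lie in [lo, hi]; values outside are clamped to the edge bins.
    Returns the upper edge of the last bin of the lower class.
    """
    width = (hi - lo) / n_bins
    hist = [0] * n_bins
    for v in values:
        b = int((v - lo) / width)
        hist[min(max(b, 0), n_bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]

    total = len(values)
    sum_all = sum(h * c for h, c in zip(hist, centers))
    w0, sum0 = 0, 0.0
    best_var, best_t = -1.0, lo
    for i in range(n_bins - 1):
        w0 += hist[i]
        sum0 += hist[i] * centers[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        # Proportional to the between-class variance (up to a 1/total**2 factor).
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, lo + (i + 1) * width
    return best_t


# Toy bimodal score distribution: a low-agreement and a high-agreement mode.
scores = [0.1] * 50 + [0.2] * 50 + [0.8] * 50 + [0.9] * 50
print(otsu_threshold(scores))  # a value between 0.2 and 0.8
```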

Although 33.79% of the resulting ECOSG+ pixels are sourced from $M^{*}$, ECOSG+ keeps a label distribution similar to that of the original ECOSG. A qualitative evaluation of ECOSG+ shows the gain in resolution between ECOSG and ECOSG+, as well as the label correction. The qualitative evaluation also reveals the limits of $M^{*}$, which has some missing values, and therefore the need for the cut-off value $S_{min}$, despite the loss of higher-resolution information where ECOSG is retained.

We performed the first quantitative evaluation of ECOSG, in parallel with ECOSG+, against LUCAS 2022, NLC 2018, and ECOSGIMO. The evaluation against LUCAS showed that ECOSG+ outperforms ECOSG across Europe for every primary label at the LUCAS sites. The downsampling of ECOSG+ to 300 m (ECOSG+300) showed that the improvements of ECOSG+ over ECOSG are not only due to resolution. The most improved labels are “Forest” and “Grassland”, while “Bare land” and “Shrubs” show small improvements and remain the least well-represented labels.

The evaluation of small-scale features against NLC 2018 revealed that ECOSG+ improves upon ECOSG for every label except “Bare land” and “Flooded vegetation”, for which the NLC 2018 translation to primary labels has been identified as being less reliable. “Shrubs” were confirmed as poorly represented in every map.

The evaluation of secondary labels against ECOSGIMO showed that ECOSG+ is superior for 14 of the 20 labels represented in Iceland. However, two secondary labels were clearly inferior (an F1-score gap above 0.1): “4. Bare land” and “17. Temperate grassland”. While their respective primary labels, “Bare land” and “Grassland”, were better represented in ECOSG+, the lack of a specialist dataset for “4. Bare land” and the sole reliance on [35] for the bioclimatic classification showed the limits of the method.

The evaluation showed limitations in representing “Bare land”, “Shrubs”, and “Flooded vegetation”. Therefore, including new specialist maps covering these labels would greatly benefit these aspects of ECOSG+. For example, including the global mangrove extent produced by Global Mangrove Watch [36] could improve the representation of flooded trees. The ESA High Resolution Land Cover Climate Change Initiative [37,38], which includes wetland secondary labels, could greatly improve the results of the agreement-based method. Unfortunately, it is not yet available for Europe. The classification of “Urban” areas into local climate zones needs further evaluation. Because few datasets explicitly represent local climate zones, the method could be improved by considering the agreement on building height and density, similar to what was done for trees (see Appendix D.3).

The method for creating ECOSG+ is flexible enough to include other datasets as they become available. This allows regular updates to keep ECOSG+ up to date with LULC changes over time. Integrating more comprehensive data can also improve the accuracy of ECOSG+ and enable the expansion of ECOSG+ and its verification beyond the EURAT domain. Table A4 lists datasets that are included in the code but lie outside the EURAT domain. More datasets are available on GEE [39]. These can be added to the framework, but great care is needed regarding which labels are usable and how they can be used (as backbone and/or specialist maps).

The method also allows a regional customization of ECOSG+. For example, a dataset can be integrated or excluded from the framework depending on its performance over the region of interest. The cut-off value can also be adjusted to better reflect the quality score histogram over the region of interest.

This study showed that ECOSG+ outperforms ECOSG at the primary level regarding overall accuracy and F1-score per label. Still, it would have been desirable to use a larger domain with more secondary labels to ensure the quality of all 33 labels. One way to assess this product could be to compare it to derived measurable quantities.

However, the aim of ECOSG+ is not only to provide an accurate land cover map but also to provide a quality score for that land cover. Such information is rarely provided with LULC maps, although it enables further uses of the LULC information. For example, the quality score highlights the areas where improvements are needed and the areas that can be used to build a trustworthy reference dataset for machine learning applications. This is done in [20], where ECOSG+ is used to train an AI model that translates ESA WorldCover v200 into ECOSG+ with convolutional auto-encoders, in order to replace the areas with the lowest S scores. The quality score could also be integrated as a feature to increase the skill of AI models that use land cover information (e.g., for post-processing of NWP output or quality control of observations).

As land cover is only one aspect of the ECOCLIMAP-SG physiography database, additional work will be needed to ensure the complementarity between land cover and geophysical parameters such as leaf area index, albedo, tree heights, and lake depths.

Author Contributions

All authors G.B., T.R., E.G., B.P. and S.O. contributed to writing the paper. G.B. implemented and ran the method, producing the ECOSG+ map. The figures were prepared by G.B. and T.R. B.P. created ECOSGIMO used for the evaluation of ECOSG+. S.O. created ESAGHSurban. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

ECOSG+, the best-guess map $M^{*}$, and the quality score map $S$ are available at https://doi.org/10.5281/zenodo.10944693 [40]. Each map is provided as a zip file containing 100 .tif tiles covering the EURAT domain (longitudes: −32 to 42 degrees, latitudes: 20 to 72 degrees). Each .tif file has a resolution of 60 m. The zip files are named as follows: “quality_score_map.zip”: the quality score map $S$; “best-guess_map.zip”: the best-guess map $M^{*}$; “ecosg_plus.zip”: ECOSG+.

Acknowledgments

We would like to thank all the mapping agencies for providing the datasets used to create ECOSG+. We also thank Eurostat for the LUCAS 2022 dataset, and the EPA and Tailte Éireann for providing the NLC 2018 map under the National Mapping Agreement.

Conflicts of Interest

The contact author has declared that none of the authors has any competing interests.

Abbreviations

The following abbreviations are used in this manuscript:

ECOSG	ECOCLIMAP-SG: a physiography database currently used in NWP
ECOSG+	ECOCLIMAP-SG+: the land cover map described in this manuscript
EURAT	Europe–Atlantic domain (longitudes: −32 to 42, latitudes: 20 to 72)
LCZ	Local climate zone
LULC	Land use land cover
NWP	Numerical weather prediction

Appendix A. Tables of Land Cover Datasets Used in the Creation of ECOSG+

Table A1. Land cover datasets used in the creation of ECOSG+ (1/3).

Name	Reference	Resolution	AOI	Usage in ECOSG +
CALC2020	[41]	10 m	circumpolar Arctic	$M_{b b}$
CGLSLC100	[42]	100 m	global	$M_{s p}$ ( $l_{2}$ = 7, 8, 9, 10, 11, 12, 13, 14)
CGLSLC100F	[42]	100 m	global	$M_{s p}$ ( $l_{2}$ = 16, 17, 18)
ESAGHSurban	ESA WorldCover and GHS-BUILT-S according to Table A5	10 m	world	$M_{b b}$
ESAWorldcereal	[43]	10 m	world	$M_{s p}$ ( $l_{2}$ = 19, 20, 21)
ESA WorldCover	[29]	10 m	world	$M_{b b}$
ESRI2020	[8]	10 m	world	$M_{b b}$
FROMGLC10	[44]	10 m	world	$M_{b b}$
GHS-BUILT-C	[45]	10 m	global	$M_{s p}$ ( $l_{2}$ = 31)
GHS-BUILT-S	[46]	10 m	global	$M_{b b}$ (ESAGHSurban)
GLCZ	[47]	100 m	global	$M_{s p}$ ( $l_{2}$ = 4, 5, 15, 16, 17, 18, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33)
GLC_FCS302020	[48]	30 m	world	$M_{b b}$ , $M_{s p}$ ( $l_{2}$ = 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
GRWLwatermask	[49]	30 m	global	$M_{s p}$ ( $l_{2}$ = 1, 2, 3)
GWL_FCS30	[50]	30 m	world	$M_{s p}$ ( $l_{2}$ = 22, 23)
Hydrolakes	[51]	MMA: 10 ha	world	$M_{s p}$ ( $l_{2}$ = 2)
OSMsurfacewater	[52]	90 m	world	$M_{s p}$ ( $l_{2}$ = 1, 3)

Table A2. Land Cover Datasets used in the creation of ECOSG+ (2/3).

Name	Reference	Resolution	AOI	Usage in ECOSG +
CLCplus	[24]	10 m	Europe	$M_{b b}$ , $M_{s p}$ ( $l_{2}$ = 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18)
Coastal2018	[53]	Vector MMU: 0.5 ha MMW:10 m	Europe Coastal areas	$M_{s p}$ ( $l_{2}$ = 1, 2, 3, 4, 5, 6, 31, 32)
ELC10	[7]	10 m	Europe	$M_{b b}$
EUCROPMAP	[19]	10 m	Europe	$M_{s p}$ ( $l_{2}$ = 19, 20, 21)
EUMAPOSMgrass	[54]	30 m	Europe	$M_{s p}$ ( $l_{2}$ = 16, 17, 18)
EUMAPlandcover	[55]	30 m	Europe	$M_{s p}$ ( $l_{2}$ = 4, 5, 6, 15, 23)
EUSALP	[56]	up to 5 m	European Alps Macro region	$M_{s p}$ ( $l_{2}$ = 4, 5, 6, 31)
EUhydrocoastline	[57]	Vector MMU: 1 ha	Europe	$M_{s p}$ ( $l_{2}$ = 1)
Geoclimate	[58]	Vector MMW: 60 m	Run over multiple large urban area across Europe	$M_{s p}$ ( $l_{2}$ = 16, 17, 18, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33)
GRA2018	[59]	10 m	Europe	$M_{s p}$ ( $l_{2}$ = 16, 17, 18)
IMD2018	[60]	10 m	Europe	$ϕ$ ( $l_{2}$ = 4, 5)
N2K2018	[61]	Vector MMU: 0.5 ha MMW:10 m	Europe Natura 2000 zones	$M_{s p}$ ( $l_{2}$ = 1, 2, 3, 4, 5, 6, 31, 32)
OpenEuroRegionalCoast OpenEuroRegionalIce OpenEuroRegionalLake OpenEuroRegionalRailrdL OpenEuroRegionalRoadL1 OpenEuroRegionalRoadL2 OpenEuroRegionalSea OpenEuroRegionalSoilcrs OpenEuroRegionalWatercrs OpenEuroRegionalWatercrsL	[62]	Vector data 1:25,000 missing linear small scale features	Europe	$M_{s p}$ ( $l_{2}$ = 1, 2, 3, 4, 5, 6, 31, 32)
RPZ2018	[63]	Vector MMU: 0.5 ha MMW:10 m	Europe riparian zones	$M_{s p}$ ( $l_{2}$ = 1, 2, 3, 4, 5, 6, 31, 32)
S2GLC	[6]	10 m	Europe	$M_{b b}$ , $M_{s p}$ ( $l_{2}$ = 7, 8, 9, 10, 11, 12, 13, 14)

Table A3. Land Cover Datasets used in the creation of ECOSG+ (3/3).

Name	Reference	Resolution	AOI	Usage in ECOSG +
COSc2020	[64]	10 m	Portugal	$M_{b b}$
Icelandhabitat	[27]	1:25,000	Iceland	$M_{b b}$ , $M_{s p}$ ( $l_{2}$ = 1, 2, 3)
MACATECOSG	MACAT and COSc2020 using Table A6	10 m	Portugal	$M_{s p}$ ( $l_{2}$ = 19, 20, 21)
NLCSweden2018	[65]	10 m	Sweden	$M_{b b}$
OCS2020	[66]	10 m	Metropolitan France	$M_{b b}$ , $M_{s p}$ ( $l_{2}$ = 4, 5), grassland, Shr…
WFDCanalIE	[67]	Vector data 1:50,000	Ireland	$M_{s p}$ ( $l_{2}$ = 3)
WFDCoastalIE	[68]	Vector data 1:50,000	Ireland	$M_{s p}$ ( $l_{2}$ = 1)
WFDLakeIE	[69]	Vector data 1:50,000	Ireland	$M_{s p}$ ( $l_{2}$ = 2)
WFDRiverIE	[70]	Vector data 1:50,000	Ireland	$M_{s p}$ ( $l_{2}$ = 3)
WFDTransitionalIE	[71]	Vector data 1:50,000	Ireland	$M_{s p}$ ( $l_{2}$ = 1)

Table A4. Land cover included in the code but outside the EURAT domain.

Name	Reference	Resolution	AOI	Usage in ECOSG +
NALCMS2020	[72]	30 m	North America	$M_{b b}$
NLCD2019	[73]	30 m	USA	$M_{b b}$ , $M_{s p}$ broadleaf_deciduous, broadleaf_evergreen

Appendix B. Conversion Tables for Particular Cases

Table A5. ESAGHSurban map creation rules. The columns are combined inclusively (logical OR): a pixel is labeled “Urban” if it has the ESA WorldCover value 50 (Built-up) or a GHS-BUILT-S built-up fraction above 5%.

ESAGHSurban Primary Label	ESA WorldCover Labels	GHS-BUILT-S Fraction
Water bodies	80 Permanent Water bodies	Not used
Bare land	60 Bare/sparse vegetation	Not used
Snow	70 Snow and ice	Not used
Forest	10 Tree cover	Not used
Shrubs	20 Shrubland	Not used
Grassland	30 Grassland	Not used
Flooded vegetation	90 Herbaceous wetland; 95 Mangroves; 100 Moss and lichen	Not used
Urban	50 Built-up	>5% of built up surfaces
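
The inclusive (OR) rule of Table A5 can be written as a per-pixel function. This is a hypothetical sketch, not the authors' GEE code. It assumes that the GHS-BUILT-S built-up fraction is expressed between 0 and 1 and has already been resampled to the target grid, and that ESA WorldCover codes not listed in Table A5 are left unmapped.

```python
# ESA WorldCover codes -> primary labels, as listed in Table A5.
ESA_TO_PRIMARY = {
    80: "Water bodies", 60: "Bare land", 70: "Snow", 10: "Forest",
    20: "Shrubs", 30: "Grassland", 90: "Flooded vegetation",
    95: "Flooded vegetation", 100: "Flooded vegetation", 50: "Urban",
}


def esaghs_urban(esa_code, built_fraction, threshold=0.05):
    """Primary label of one ESAGHSurban pixel.

    "Urban" is assigned when EITHER condition holds (inclusive rule);
    codes absent from Table A5 return None in this sketch.
    """
    if esa_code == 50 or built_fraction > threshold:
        return "Urban"
    return ESA_TO_PRIMARY.get(esa_code)


print(esaghs_urban(30, 0.12))  # Urban (GHS-BUILT-S alone is enough)
print(esaghs_urban(50, 0.00))  # Urban (ESA WorldCover alone is enough)
print(esaghs_urban(30, 0.01))  # Grassland
```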

Table A6. Rules to create MACATECOSG using MACAT and COSc2020. The columns are combined conjunctively (logical AND): a MACATECOSG secondary label is assigned only when both the COSc2020 and the MACAT conditions are met.

MACATECOSG Secondary Label	COSc2020 Label	MACAT Labels
19. Winter C3 Crops	211 Culturas anuais de outono/inverno (winter crops)	1101, 1203, 1305 Aveia (Oat); 1102, 1204, 1306 Azevém (Ryegrass); 1103, 1301, 1307 Trigo (Wheat); 1104, 1302, 1308 Triticale (Triticale); 1105, 1303, 1309 Centeio (Rye); 1106, 1304, 1310 Cevada (Barley); 1311 Courgete (Zucchini); 1312 Pimento (Pepper); 1401 Tremocilha (Lupini beans); 1402 Ervilha (Pea); 1403 Grão de bico (Chickpea); 1404 Fava (Fava); 1405 Trevo (Clover); 1406 Feijão (Bean); 1407 Tremoço (Lupine); 1409 Ervilhaças (Peas)
20. Summer C3 Crops	212 Culturas anuais de primavera/verão (summer crops)	1101, 1203, 1305 Aveia (Oat); 1102, 1204, 1306 Azevém (Ryegrass); 1103, 1301, 1307 Trigo (Wheat); 1104, 1302, 1308 Triticale (Triticale); 1105, 1303, 1309 Centeio (Rye); 1106, 1304, 1310 Cevada (Barley); 1311 Courgete (Zucchini); 1312 Pimento (Pepper); 1401 Tremocilha (Lupini beans); 1402 Ervilha (Pea); 1403 Grão de bico (Chickpea); 1404 Fava (Fava); 1405 Trevo (Clover); 1406 Feijão (Bean); 1407 Tremoço (Lupine); 1409 Ervilhaças (Peas)
21. C4 Crops	211 Culturas anuais de outono/inverno (winter crops); 212 Culturas anuais de primavera/verão (summer crops); 213 Outras áreas agrícolas (other crops)	1201 Milho (Corn); 1202 Sorgo (Sorghum)
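
The conjunctive (AND) rule of Table A6 can be sketched as follows. The code sets are transcribed from the table, while the function and variable names are hypothetical. Because the three rules use disjoint code sets (winter and summer crops differ in their COSc2020 code, and C3 and C4 crops differ in their MACAT codes), at most one rule can match, so the order of the checks does not matter.

```python
# MACAT codes listed in Table A6 for C3 crops (oat, ryegrass, wheat, ..., peas).
C3_MACAT = {
    1101, 1203, 1305, 1102, 1204, 1306, 1103, 1301, 1307,
    1104, 1302, 1308, 1105, 1303, 1309, 1106, 1304, 1310,
    1311, 1312, 1401, 1402, 1403, 1404, 1405, 1406, 1407, 1409,
}
C4_MACAT = {1201, 1202}  # corn, sorghum

# (secondary label, accepted COSc2020 codes, accepted MACAT codes)
RULES = [
    ("19. Winter C3 Crops", {211}, C3_MACAT),
    ("20. Summer C3 Crops", {212}, C3_MACAT),
    ("21. C4 Crops", {211, 212, 213}, C4_MACAT),
]


def macatecosg(cos_code, macat_code):
    """Return the MACATECOSG label, requiring BOTH conditions (AND rule); else None."""
    for label, cos_ok, macat_ok in RULES:
        if cos_code in cos_ok and macat_code in macat_ok:
            return label
    return None


print(macatecosg(211, 1103))  # 19. Winter C3 Crops (winter crop + wheat)
print(macatecosg(212, 1201))  # 21. C4 Crops (summer crop + corn)
print(macatecosg(213, 1103))  # None (wheat outside winter/summer crop areas)
```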

Appendix C. Conversion Tables Used for the Evaluation

Table A7. LUCAS C3 classification conversion to primary labels.

Primary Label	LUCAS C3 Code and Name
Water bodies	G10 Inland Water bodies, G11 Inland fresh Water bodies, G12 Inland salty Water bodies, G20 Inland running water, G21 Inland fresh running water, G22 Inland salty running water, G30 Transitional Water bodies, G40 Sea and ocean
Bare land	F10 Rocks and stones, F20 Sand, F40 Other bare soil
Snow	G50 Glaciers, Permanent snow
Forest	C, C1, C10 Broadleaved woodland, C2, C20 Coniferous woodland, C21 Spruce dominated coniferous woodland, C22 Pine-dominated coniferous woodland, C23 Other coniferous woodland, C30 Mixed woodland, C31 Spruce dominated mixed woodland, C32 Pine dominated mixed woodland, C33 Other mixed woodland
Shrubs	D, D1, D10 Shrubland with sparse tree cover, D2, D20 Shrubland without tree cover
Grassland	E, E1, E10 Grassland with sparse tree/shrub cover
	E2, E20 Grassland without tree/shrub cover
	E3, E30 Spontaneously re-vegetated surfaces
Crops	every B code from B00 Cropland to B84 Permanent industrial crops
Flooded vegetation	H, H10 Inland wetlands, H11 Inland marshes, H12 Peatbogs, H20 Coastal wetlands, H21 Salt marshes, H22 Salines and other chemical deposits, H23 Intertidal flats, F3, F30 Lichens and moss
Urban	A00 Artificial land, A1, A10 Roofed built-up areas, A11 Buildings with one to three floors, A12 Buildings with more than three floors, A13 Greenhouses, A2, A20 Artificial non-built up areas, A21 Non built-up area features, A22 Non built-up linear features, A30 Other Artificial Areas

Table A8. Conversion of NLC 2018 to ECOSG primary labels.

Primary Label	NLC 2018 Code and Label
Water bodies	810 Rivers and Streams, 820 Lakes and Ponds, 830 Artificial Water bodies, 840 Transitional Water bodies, 850 Marine Water
Bare land	210 Exposed Rock and Sediments, 220 Coastal Sediments, 230 Mudflats, 240 Bare Soil and Disturbed Ground, 250 Burnt Areas
Snow	None
Forest	410 Coniferous Forest, 420 Mixed Forest, 430 Transitional Forest, 440 Broadleaved Forest and Woodland, 470 Treelines
Shrubs	450 Scrub, 460 Hedgerows
Grassland	510 Improved Grassland, 520 Amenity Grassland, 530 Dry Grassland
Crops	310 Cultivated Land
Flooded vegetation	540 Wet Grassland, 550 Saltmarsh, 570 Swamp, 610 Raised Bog, 620 Blanket Bog, 630 Cutover Bog, 640 Bare Peat, 650 Fens, 710 Bracken, 720 Dry Heath, 730 Wet Heath
Urban	110 Buildings, 120 Ways, 130 Other Artificial Surfaces

Appendix D. Exceptions and Special Cases in the Construction of ECOSG+

Appendix D.1. Exceptions in the Land Cover Maps

Pre-processing consists of uploading all required maps onto Google Earth Engine (GEE) and reprojecting them onto the GEE default grid (see [74] for details). Most of the data are hosted in the GEE data catalog [75] and in the so-called “awesome” GEE community catalog [39]. Datasets that are not in these catalogs, such as CLC+, were manually uploaded to GEE. Another reprojection is performed when exporting the ECOSG+ map and its quality estimation to EPSG:4326 at a resolution of 0.000539° (approximately 60 m). These operations are handled by GEE commands that are beyond the scope of this paper. We refer the reader to [74,76] for comprehensive documentation on the Python API and the geemap Python package.
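
Because EPSG:4326 pixels are defined in degrees, their ground size depends on latitude. The sketch below converts the 0.000539° export resolution to metres using a spherical-Earth approximation with the mean Earth radius. This is an assumption; an ellipsoidal formula would give slightly different values.

```python
import math

# Spherical-Earth approximation with the mean Earth radius (assumption).
EARTH_RADIUS_M = 6_371_008.8


def pixel_size_m(res_deg, lat_deg):
    """Approximate north-south and east-west extent (m) of a res_deg x res_deg pixel."""
    metres_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360
    north_south = res_deg * metres_per_degree
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


for lat in (20, 45, 72):  # southern edge, middle, and northern edge of EURAT
    ns, ew = pixel_size_m(0.000539, lat)
    print(f"latitude {lat}: {ns:.1f} m (N-S) x {ew:.1f} m (E-W)")
```

The north-south extent is close to 60 m everywhere, while the east-west extent shrinks with the cosine of latitude. The nominal 60 m therefore holds most closely in the north-south direction and at low latitudes.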

Some particular cases in Table A1, Table A2, Table A3 and Table A4 are noteworthy:

The Geoclimate dataset consists of a map of LCZ obtained by running the Geoclimate tool [58] on the main European urban areas.
The Copernicus Imperviousness HRL does not provide secondary labels, but it helps distinguish sand and rocks (“4. Bare land”, “5. Bare rocks”) from concrete runways (“31. LCZ8: large low-rise”) (see Appendix D.2). We extract the artificial imperviousness density, denoted $ϕ(x)$, from this dataset.
ESAGHSurban is a combination of ESA WorldCover v200 and the GHS built-up surface (GHS-BUILT-S) converted to primary labels, where GHS-BUILT-S was resampled to the target grid, and missing urban areas have been added from ESA WorldCover following Table A5.
MACATECOSG is a merge of the Portuguese DGterritorio MACAT and COSc2020 [64] maps. Using the rules in Table A6, this merge locates the labels “19. Winter C3 Crops”, “20. Summer C3 Crops”, and “21. C4 crops” over Portugal.

Appendix D.2. Exceptions in the Specialist Agreement Score

Exceptions to Equation (5) are made in the following cases:

$l_{2} =$ “0. No data”. The specialist agreement score is set to 0 everywhere for this label.
Null denominator: $\sum_{j = 1}^{N_{s p}} 1 (x \in D_{s p}^{j} \land l_{2} \in L_{2}^{j}) = 0$ . When no specialist map provides the label $l_{2}$ at x, the specialist agreement score is set to 0.
$l_{2} \in$ {“4. Bare land”, “5. Bare rocks”}. Preliminary experiments showed that confusion often occurs between sand, rocks (secondary labels “4. Bare land”, “5. Bare rocks”) and concrete runways (“31. LCZ8: large low-rise”), which can be disambiguated thanks to the artificial imperviousness density. Therefore, maps providing the labels “4. Bare land” and “5. Bare rocks” see their score penalized by the imperviousness density (sand and rocks with high imperviousness are likely to be wrong).

Therefore, a more general (but more complicated) formula for the specialist agreement score is

S_{sp}(x, l_{2}) = \alpha(x, l_{2}) \sum_{j=1}^{N_{sp}} \mathbb{1}\left(x \in D_{sp}^{j} \land l_{2} \in L_{2}^{j} \land M_{sp}^{j}(x) = l_{2}\right) + \beta(x, l_{2})

(A1)

with the following coefficients:

\alpha(x, l_{2}) = \begin{cases} 1 / N'_{sp}(x, l_{2}) & \text{if } N'_{sp}(x, l_{2}) > 0 \text{ and } l_{2} \neq \text{“0. No data”} \\ 0 & \text{otherwise} \end{cases}

(A2)

\beta(x, l_{2}) = \begin{cases} -\phi(x) & \text{if } l_{2} \in \{\text{“4. Bare land”}, \text{“5. Bare rocks”}\} \\ 0 & \text{otherwise} \end{cases}

(A3)

where $N'_{sp}(x, l_{2}) = \sum_{j=1}^{N_{sp}} \mathbb{1}(x \in D_{sp}^{j} \land l_{2} \in L_{2}^{j})$ is the number of specialist maps that cover $x$ and provide the label $l_{2}$ (the denominator of Equation (5)), and $\phi(x)$ is the artificial imperviousness density (ranging between 0 and 1, with 1 being totally human-made impervious ground).
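
Equations (A1) to (A3) can be read as the following per-pixel computation. This minimal sketch represents each specialist map by a hypothetical tuple (whether it covers $x$, the set of secondary labels it provides, and its value at $x$). It is not the authors' Earth Engine implementation.

```python
def specialist_score(l2, specialists, phi):
    """Specialist agreement score S_sp(x, l2) following Equations (A1)-(A3).

    l2          -- secondary label number (0 = "No data", 4/5 = bare land/rocks)
    specialists -- one (covers_x, provided_labels, value_at_x) tuple per map j
    phi         -- artificial imperviousness density at x, between 0 and 1
    """
    # Values of the maps that cover x and provide l2; their count is N'_sp(x, l2).
    relevant = [value for covers_x, provided, value in specialists
                if covers_x and l2 in provided]
    n_prime = len(relevant)
    alpha = 1.0 / n_prime if n_prime > 0 and l2 != 0 else 0.0   # Eq. (A2)
    beta = -phi if l2 in (4, 5) else 0.0                         # Eq. (A3)
    votes = sum(1 for value in relevant if value == l2)          # indicator sum in (A1)
    return alpha * votes + beta


# Toy pixel: two specialist maps cover x and provide "4. Bare land"; one agrees,
# the other says "31. LCZ8: large low-rise". A third map does not cover x.
specialists = [
    (True, {4, 5}, 4),
    (True, {4, 5, 31}, 31),
    (False, {4, 5}, None),
]
print(specialist_score(4, specialists, phi=0.2))  # 0.5 agreement minus 0.2 penalty
```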

Appendix D.3. Exceptions in the Refinement Process

The refinement process described in Section 2.2.1 has three exceptions: a joint maximum in score, the labels with a bioclimatic classification, and the “Forest” primary label.

Joint Maximum in Score

In Equation (6), when the specialist agreement score has a joint maximum, the argmax returns the label with the lowest label number. For example, consider a position $x$ and a backbone map $M_{bb}^{i}$ giving $M_{bb}^{i}(x) =$ “Crops”, with a joint maximum in the specialist agreement score (e.g., $S_{sp}(x, \text{“21. C4 crops”}) = S_{sp}(x, \text{“20. Summer C3 crops”}) > S_{sp}(x, \text{“19. Winter C3 crops”})$). The refined map $M_{rf}^{i}(x)$ then returns “20. Summer C3 crops” because it has the lowest label number in the joint maximum. This default behavior is arbitrary, as no better solution has been found so far.
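The tie-break can be written as a small Python function. It assumes the candidate scores for the labels allowed by the backbone primary label are already available in a dictionary keyed by secondary label number; the function name is hypothetical.

```python
def refine_label(scores):
    """Argmax over candidate secondary labels with the tie-break of Appendix D.3.

    scores: dict {secondary label number: S_sp(x, label)} for the labels allowed
    by the backbone primary label at x. Ties resolve to the lowest label number.
    """
    best = max(scores.values())
    # exact equality is assumed meaningful here (scores built from small integer counts)
    return min(label for label, score in scores.items() if score == best)


# "Crops" backbone pixel with a joint maximum between labels 20 and 21
print(refine_label({19: 0.25, 20: 0.5, 21: 0.5}))  # 20
```

With NumPy, storing the scores in ascending label order and calling `numpy.argmax` gives the same behavior, because `argmax` returns the index of the first occurrence of the maximum.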

Bioclimatic Classification

Bioclimatic classification (i.e., “boreal”, “temperate”, or “tropical”) is used in the “Forest” and “Grassland” primary labels. However, few datasets provide the bioclimatic class. Therefore, the secondary labels that differ only by their bioclimatic class are distinguished using the Beck et al. [35] bioclimatic map, and the score is calculated independently of the bioclimatic class. For example, “8. Temperate broadleaf deciduous” and “9. Tropical broadleaf deciduous” are two types of “broadleaf deciduous”, so they have the same score values:

$$
S_{sp}(x, \text{“8. Temperate broadleaf deciduous”}) = S_{sp}(x, \text{“9. Tropical broadleaf deciduous”}) \quad \forall x.
$$
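One way to implement this sharing is to compute a score once per bioclimate-free key and copy it to every bioclimatic variant, as in the sketch below. It assumes labels are identified by their names without numbers; the helper names and the score values are hypothetical. Which variant is finally kept at a pixel is decided separately, from the Beck et al. [35] map.

```python
BIOCLIMATIC_WORDS = ("Boreal", "Temperate", "Tropical")


def bioclimate_free_key(label_name):
    """Drop a leading bioclimatic word: 'Temperate broadleaf deciduous' -> 'broadleaf deciduous'."""
    first, _, rest = label_name.partition(" ")
    return rest if first in BIOCLIMATIC_WORDS else label_name


def expand_shared_scores(label_names, score_by_key):
    """Give each bioclimatic variant the score computed for its bioclimate-free key."""
    return {name: score_by_key[bioclimate_free_key(name)] for name in label_names}


scores = expand_shared_scores(
    ["Temperate broadleaf deciduous", "Tropical broadleaf deciduous", "Boreal grassland"],
    {"broadleaf deciduous": 0.4, "grassland": 0.7},  # illustrative values
)
print(scores)
```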

The Forest Primary Label

The “Forest” primary label contains secondary labels with a more regular structure than those of the other primary labels: the seven forest classes are combinations of the following:

The bioclimatic classification (i.e., “boreal”, “temperate”, or “tropical”);
The dominant leaf type (i.e., “broadleaf”, “needleleaf”);
The leaf cycle (i.e., “deciduous”, “evergreen”).

Therefore, the secondary labels under the “Forest” primary label are obtained using a deeper label hierarchy than for the other primary labels. The bioclimatic class is given by Beck et al. [35], as explained in the previous paragraph. The dominant leaf type and the leaf cycle are taken jointly from datasets that provide this specific information. The refinement score $S_{rf}$ (see Equation (8)) is then calculated ignoring the bioclimatic class.
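Assuming the bioclimatic class, the dominant leaf type and the leaf cycle are already known at a pixel, the deeper forest hierarchy reduces to composing a label name and checking it against the legend. In line with the text, only the leaf type and leaf cycle pair would take part in the refinement score, with the bioclimatic word attached afterwards. The `forest_label` helper and the reduced legend below are hypothetical; in practice the full ECOSG legend would replace `FOREST_LABELS`.

```python
# Illustrative subset: forest secondary labels named in this article, not the full ECOSG legend
FOREST_LABELS = {
    "Boreal broadleaf deciduous",
    "Temperate broadleaf deciduous",
    "Tropical broadleaf deciduous",
    "Boreal needleleaf evergreen",
    "Boreal needleleaf deciduous",
}


def forest_label(bioclimate, leaf_type, leaf_cycle, legend=FOREST_LABELS):
    """Combine the three forest attributes into a secondary label name.

    bioclimate: "boreal", "temperate" or "tropical" (from the Beck et al. map)
    leaf_type:  "broadleaf" or "needleleaf" (from a leaf-type dataset)
    leaf_cycle: "deciduous" or "evergreen" (from a leaf-cycle dataset)
    Returns None when the combination is not a label of `legend`.
    """
    name = f"{bioclimate.capitalize()} {leaf_type} {leaf_cycle}"
    return name if name in legend else None


print(forest_label("temperate", "broadleaf", "deciduous"))  # Temperate broadleaf deciduous
```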

Appendix E. Limitations of NLC 2018: Comparison Against LUCAS

Figure A1 shows the normalized confusion matrices of ECOSG, ESA WorldCover, ECOSG+, and NLC 2018 against LUCAS 2022 for the Republic of Ireland. ECOSG+, ECOSG, and ESA WorldCover each have dark-colored off-diagonal entries, indicating an overestimation of grassland over Ireland, which seems stronger than over the rest of Europe (Figure 5). Shrubs are barely found on any of the maps. They are misclassified as grassland in ESA WorldCover and ECOSG+, while in ECOSG “Shrubs” are distributed between “Shrubs” and “Flooded vegetation” (compare the LUCAS shrub label (y-axis) with what appears in the other datasets (x-axis)). Meanwhile, NLC 2018 has a smaller overestimation of grassland and only confuses grassland with urban areas. This is possibly due to the conversion of the NLC 2018 labels “510 Improved Grassland” and “520 Amenity Grassland”, but we note that this confusion is less pronounced than for the other datasets. NLC 2018 also indicates the presence of “Flooded vegetation” instead of “Shrubs”, another possible consequence of the conversion to primary labels. NLC 2018’s overall accuracy of 0.571 is below the producer accuracy range; however, once “Shrubs” and “Flooded vegetation” are not taken into account, the overall accuracy is 0.797, within the producer accuracy range. This indicates that NLC 2018 is a valid reference except for the “Shrubs” and “Flooded vegetation” primary labels.
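The effect of leaving labels out of the overall accuracy can be illustrated with a small NumPy sketch. The text does not say whether excluded labels are dropped from the reference side only or from both axes; this sketch assumes both, the function name is hypothetical, and the counts are invented for illustration.

```python
import numpy as np


def overall_accuracy(cm, labels, exclude=()):
    """Share of reference samples that the evaluated map labels correctly.

    cm[i, j]: number of reference samples of labels[i] that the map assigns to labels[j].
    Labels in `exclude` are removed from both rows and columns (an assumption).
    """
    cm = np.asarray(cm)
    keep = [i for i, label in enumerate(labels) if label not in exclude]
    sub = cm[np.ix_(keep, keep)]
    return float(np.trace(sub) / sub.sum())


labels = ["Grassland", "Shrubs", "Urban"]
cm = [[8, 1, 1],   # toy counts, not the NLC 2018 data
      [4, 0, 0],
      [1, 0, 5]]
print(overall_accuracy(cm, labels))                      # 0.65
print(overall_accuracy(cm, labels, exclude={"Shrubs"}))  # about 0.867
```

Because the excluded label contributes samples but no correct classifications, removing it raises the overall accuracy, which is the pattern described above.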

The F1-scores in Table A9 show that NLC 2018 has the highest F1-score for “Water bodies”, “Bare land”, “Shrubs”, and “Urban”, and for every other label its F1-score is at most 0.058 below the label’s highest score. This consistency in F1-score across all labels confirms that NLC 2018, converted to primary labels, is a reasonable reference over the Republic of Ireland.
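The “at most 0.058” statement can be checked directly from Table A9 with NumPy. The values below are copied from the table; the function name is hypothetical. `numpy.nanmax` skips the NaN entries of the “Shrubs” row.

```python
import numpy as np

# F1-scores from Table A9 (columns: ECOSG+, ECOSG, NLC 2018, ESA WorldCover)
LABELS = ["Water bodies", "Bare land", "Forest", "Shrubs",
          "Grassland", "Crops", "Flooded vegetation", "Urban"]
F1 = np.array([
    [0.843, 0.684, 0.896, 0.892],
    [0.086, 0.086, 0.280, 0.053],
    [0.537, 0.164, 0.605, 0.614],
    [np.nan, np.nan, 0.080, np.nan],
    [0.784, 0.692, 0.730, 0.788],
    [0.644, 0.411, 0.690, 0.737],
    [0.269, 0.205, 0.262, 0.092],
    [0.273, 0.188, 0.456, 0.295],
])
NLC_COLUMN = 2


def gap_to_best(f1, column):
    """Distance between one map's F1-score and the best F1-score of each label (NaN ignored)."""
    return np.nanmax(f1, axis=1) - f1[:, column]


gaps = gap_to_best(F1, NLC_COLUMN)
for label, gap in zip(LABELS, gaps):
    print(f"{label}: {gap:.3f}")
```

The largest gap is the one for “Grassland” (0.788 against 0.730).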

Figure A1. Normalized confusion matrices for ECOSG+, NLC 2018, ECOSG, and ESA WorldCover, against LUCAS over Ireland.

Table A9. F1-scores over Ireland for the ECOSG, ECOSG+, NLC 2018, and ESA WorldCover maps with LUCAS 2022 as a reference. The maps have been converted to primary land cover labels. For each primary label, bold font indicates the best result between ECOSG and ECOSG+. ESA WorldCover scores are only informative, as we first want to compare NLC 2018, ECOSG, and ECOSG+. Stars indicate each row’s highest value.

	ECOSG+	ECOSG	NLC 2018	ESA WorldCover
Water bodies	0.843	0.684	0.896 *	0.892
Bare land	0.086	0.086	0.280 *	0.053
Forest	0.537	0.164	0.605	0.614 *
Shrubs	NaN	NaN	0.080 *	NaN
Grassland	0.784	0.692	0.730	0.788 *
Crops	0.644	0.411	0.690	0.737 *
Flooded vegetation	0.269 *	0.205	0.262	0.092
Urban	0.273	0.188	0.456 *	0.295

Notes

1	See CNRM wiki page on ECOCLIMAP-SG: https://opensource.umr-cnrm.fr/projects/ecoclimap-sg/wiki (accessed on 13 September 2024).
2	Except for one map of Portugal that does not include the snow primary label.
3	In the case of a joint maximum in $S_{r f}$ , we take the $l_{2}$ with the highest $S_{s p}$ . If the highest $S_{s p}$ is also a joint maximum, we take the $l_{2}$ with the lowest label number (see Appendix D.3 for an example).
4	Primary labels when the reference is LUCAS 2022 or NLC 2018, secondary labels when the reference is ECOSGIMO.

References

Walsh, E.; Bessardon, G.; Gleeson, E.; Ulmas, P. Using machine learning to produce a very high resolution land-cover map for Ireland. Adv. Sci. Res. 2021, 18, 65–87.
CNRM. Research Demonstration Project Paris 2024 Olympics; CNRM: Toulouse, France, 2024.
Lemonsu, A.; Alessandrini, J.; Capo, J.; Claeys, M.; Cordeau, E.; de Munck, C.; Dahech, S.; Dupont, J.; Dugay, F.; Dupuis, V.; et al. The heat and health in cities (H2C) project to support the prevention of extreme heat in cities. Clim. Serv. 2024, 34, 100472.
Hagelin, S.; Auger, L.; Brovelli, P.; Dupont, O. Nowcasting with the AROME model: First results from the high-resolution AROME airport. Weather Forecast. 2014, 29, 773–787.
Lean, H.W.; Theeuwes, N.E.; Baldauf, M.; Barkmeijer, J.; Bessardon, G.; Blunn, L.; Bojarova, J.; Boutle, I.A.; Clark, P.A.; Demuzere, M.; et al. The hectometric modelling challenge: Gaps in the current state of the art and ways forward towards the implementation of 100 m scale weather and climate models. Q. J. R. Meteorol. Soc. 2024, 1–38.
Malinowski, R.; Lewiński, S.; Rybicki, M.; Gromny, E.; Jenerowicz, M.; Krupiński, M.; Nowakowski, A.; Wojtkowski, C.; Krupiński, M.; Krätzschmar, E.; et al. Automated Production of a Land Cover/Use Map of Europe Based on Sentinel-2 Imagery. Remote Sens. 2020, 12, 3523.
Venter, Z.S.; Sydenham, M.A.K. Continental-Scale Land Cover Mapping at 10 m Resolution Over Europe (ELC10). Remote Sens. 2021, 13, 2301.
Karra, K.; Kontgis, C.; Statman-Weil, Z.; Mazzariello, J.C.; Mathis, M.; Brumby, S.P. Global land use/land cover with Sentinel 2 and deep learning. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 4704–4707.
Lydon, K.; Smith, G. National Land Cover Map of Ireland 2018 Final Report; Technical Report, Tailte Éireann in Partnership with the Environmental Protection Agency (EPA) and with the Support of Members of the Cross-Governmental National Landcover and Habitat Mapping (NLCHM) Working Group; Tailte Éireann: Dublin, Ireland, 2023.
Mallet, C.; Le Bris, A. Current Challenges in Operational Very High Resolution Land-cover Mapping. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. ISPRS Arch. 2020, 43, 703–710.
Radoux, J.; Bourdouxhe, A.; Copp, T.; Vroey, M.D.; Dufr, M.; Defourny, P. A Consistent Land Cover Map Time Series at 2 m Spatial Resolution—The LifeWatch 2006–2015–2018–2019 Dataset for Wallonia. Data 2023, 8, 13.
Masson, V.; Champeaux, J.L.; Chauvin, F.; Meriguet, C.; Lacaze, R. A global database of land surface parameters at 1-km resolution in meteorological and climate models. J. Clim. 2003, 16, 1261–1282.
Faroux, S.; Kaptué Tchuenté, A.T.; Roujean, J.L.; Masson, V.; Martin, E.; Le Moigne, P. ECOCLIMAP-II/Europe: A twofold database of ecosystems and surface parameters at 1 km resolution based on satellite information for use in land surface, meteorological and climate models. Geosci. Model Dev. 2013, 6, 563–582.
Bengtsson, L.; Andrae, U.; Aspelien, T.; Batrak, Y.; Calvo, J.; de Rooy, W.; Gleeson, E.; Hansen-Sass, B.; Homleid, M.; Hortal, M.; et al. The HARMONIE–AROME Model Configuration in the ALADIN–HIRLAM NWP System. Mon. Weather Rev. 2017, 145, 1919–1935.
Schulz, J.P.; Mercogliano, P.; Adinolfi, M.; Apreda, C.; Bassani, F.; Bucchignani, E.; Campanale, A.; Cinquegrana, D.; Dumitrache, R.; Fedele, G.; et al. A New Urban Parameterisation for the ICON Atmospheric Model. In Proceedings of the COSMO General Meeting, Gdansk, Poland, 11–15 September 2023.
Hoffmann, J. The future of satellite remote sensing in hydrogeology. Hydrogeol. J. 2005, 13, 247–250.
Talukdar, S.; Singha, P.; Mahato, S.; Shahfahad; Pal, S.; Liou, Y.A.; Rahman, A. Land-Use Land-Cover Classification by Machine Learning Classifiers for Satellite Observations—A Review. Remote Sens. 2020, 12, 1135.
Keany, E.; Bessardon, G.; Gleeson, E. Using machine learning to produce a cost-effective national building height map of Ireland to categorise local climate zones. Adv. Sci. Res. 2022, 19, 13–27.
D’Andrimont, R.; Verhegghen, A.; Lemoine, G.; Kempeneers, P.; Meroni, M.; van der Velde, M. From parcel to continental scale – A first European crop type map based on Sentinel-1 and LUCAS Copernicus in-situ observations. Remote Sens. Environ. 2021, 266, 112708.
Rieutord, T.; Bessardon, G.; Gleeson, E. High-resolution land use land cover dataset for meteorological modelling – Part 2: ECOCLIMAP-SG-ML an ensemble land cover map. Earth Syst. Sci. Data 2024.
Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man. Cybern. 1979, 9, 62–66.
Kimpson, T.; Choulga, M.; Chantry, M.; Balsamo, G.; Boussetta, S.; Dueben, P.; Palmer, T. Deep learning for quality control of surface physiographic fields using satellite Earth observations. Hydrol. Earth Syst. Sci. 2023, 27, 4661–4685.
Ballin, M.; Barcaroli, G.; Masselli, G. New LUCAS 2022 Sample and Subsamples Design: Criticalities and Solutions; Technical Report; Publications Office of the European Union: Brussels, Belgium, 2022.
EEA. CLC+Backbone 2018 (Raster 10 m), Europe, 3-Yearly, Feb. 2023; EEA: Copenhagen, Denmark, 2022.
Ulmas, P.; Liiv, I. Segmentation of Satellite Imagery using U-Net Models for Land Cover Classification. arXiv 2020, arXiv:2003.02899.
Camacho Olmedo, M.T.; García-Álvarez, D.; Gallardo, M.; Mas, J.F.; Paegelow, M.; Castillo-Santiago, M.Á.; Molinero-Parejo, R. Validation of Land Use Cover Maps: A Guideline. In Land Use Cover Datasets and Validation Tools: Validation Practices with QGIS; Springer International Publishing: Cham, Switzerland, 2022; pp. 35–46.
Ottósson, J.G.; Sveinsdóttir, A.; Harðardóttir, M. Vistgerðir á Íslandi. Fjölrit Náttúrufræðistofnunar 2016, 54, 1–299.
EEA. CORINE Land Cover 2018 (Raster 100 m), Europe, 6-Yearly—Version 2020_20u1, May 2020; EEA: Copenhagen, Denmark, 2020.
Zanaga, D.; Kerchove, R.V.D.; Daems, D.; Keersmaecker, W.D.; Brockmann, C.; Kirches, G.; Wevers, J.; Cartus, O.; Santoro, M.; Fritz, S.; et al. ESA WorldCover 10 m 2021 v200; ESA: Paris, France, 2022.
Maslov, K.A.; Persello, C.; Schellenberger, T.; Stein, A. Towards Global Glacier Mapping with Deep Learning and Open Earth Observation Data. arXiv 2024, arXiv:2401.15113.
Mitkari, K.V.; Arora, M.K.; Tiwari, R.K.; Sofat, S.; Gusain, H.S.; Tiwari, S.P. Large-Scale Debris Cover Glacier Mapping Using Multisource Object-Based Image Analysis Approach. Remote Sens. 2022, 14, 3202.
Nath, A.; Koley, B.; Choudhury, T.; Saraswati, S.; Ray, B.C.; Um, J.S.; Sharma, A. Assessing Coastal Land-Use and Land-Cover Change Dynamics Using Geospatial Techniques. Sustainability 2023, 15, 7398.
Bessardon, G.; Gleeson, E. Using the Best Available Physiography to Improve Weather Forecasts for Ireland. 2019. Available online: https://presentations.copernicus.org/EMS2019/EMS2019-702_presentation.pdf (accessed on 30 April 2021).
Venter, Z.S.; Barton, D.N.; Chakraborty, T.; Simensen, T.; Singh, G. Global 10 m Land Use Land Cover Datasets: A Comparison of Dynamic World, World Cover and Esri Land Cover. Remote Sens. 2022, 14, 4101.
Beck, H.E.; Zimmermann, N.E.; McVicar, T.R.; Vergopolan, N.; Berg, A.; Wood, E.F. Present and future Köppen-Geiger climate classification maps at 1-km resolution. Sci. Data 2018, 5, 180214.
Bunting, P.; Rosenqvist, A.; Hilarides, L.; Lucas, R.M.; Thomas, N.; Tadono, T.; Worthington, T.A.; Spalding, M.; Murray, N.J.; Rebelo, L.M. Global Mangrove Extent Change 1996–2020: Global Mangrove Watch Version 3.0. Remote Sens. 2022, 14, 3657.
Bruzzone, L.; Bovolo, F.; Amodio, A.; Brovelli, M.; Corsi, M.; Defourny, P.; Domingo, C.; Gamba, P.; Kolitzus, D.; Lamarche, C.; et al. ESA High Resolution Land Cover Climate Change Initiative: High Resolution Land Cover Maps in Amazonia (Eastern Amazonas Region) at 10 m Spatial Resolution for 2019 in Geotiff Format, v1.2; NERC EDS Centre for Environmental Data Analysis: Didcot, UK, 2024.
Bruzzone, L.; Bovolo, F.; Amodio, A.; Brovelli, M.; Corsi, M.; Defourny, P.; Domingo, C.; Gamba, P.; Kolitzus, D.; Lamarche, C.; et al. ESA High Resolution Land Cover Climate Change Initiative: High Resolution Land Cover Maps in Africa (Eastern Sahel Region) at 10 m Spatial Resolution for 2019 in Geotiff Format, v1.2; NERC EDS Centre for Environmental Data Analysis: Didcot, UK, 2024.
Roy, S.; Schwehr, K.; Pasquarella, V.; Swetnam, T. Samapriya/Awesome-Gee-Community-Datasets: Community Catalog; Zenodo: Genève, Switzerland, 2023.
Bessardon, G.; Rieutord, T.; Gleeson, E.; Oswald, S. ECOCLIMAP-SG+: An Agreement-Based High-Resolution Land Use Land Cover Dataset for Meteorological Modelling; Zenodo: Genève, Switzerland, 2024.
Liu, C.; Xu, X.; Feng, X.; Cheng, X.; Liu, C.; Huang, H. CALC-2020: A new baseline land cover map at 10 m resolution for the circumpolar Arctic. Earth Syst. Sci. Data 2023, 15, 133–153.
Buchhorn, M.; Smets, B.; Bertels, L.; Roo, B.D.; Lesiv, M.; Tsendbazar, N.E.; Herold, M.; Fritz, S. Copernicus Global Land Service: Land Cover 100m: Collection 3: Epoch 2019: Globe; Zenodo: Genève, Switzerland, 2020.
Tricht, K.V.; Degerickx, J.; Gilliams, S.; Zanaga, D.; Battude, M.; Grosu, A.; Brombacher, J.; Lesiv, M.; Bayas, J.C.L.; Karanam, S.; et al. WorldCereal: A dynamic open-source system for global-scale, seasonal, and reproducible crop and irrigation mapping. Earth Syst. Sci. Data 2023, 15, 5491–5515.
Gong, P.; Liu, H.; Zhang, M.; Li, C.; Wang, J.; Huang, H.; Clinton, N.; Ji, L.; Li, W.; Bai, Y.; et al. Stable classification with limited sample: Transferring a 30 m resolution sample set collected in 2015 to mapping 10 m resolution global land cover in 2017. Sci. Bull. 2019, 64, 370–373.
Pesaresi, M.; Politis, P. GHS-BUILT-C R2023A—GHS Settlement Characteristics, Derived from Sentinel2 Composite (2018) and Other GHS R2023A Data; European Commission, Joint Research Centre (JRC): Brussels, Belgium, 2023.
Pesaresi, M.; Politis, P. GHS-BUILT-S R2023A—GHS Built-Up Surface Grid, Derived from Sentinel2 Composite and Landsat, Multitemporal (1975–2030); European Commission, Joint Research Centre (JRC): Brussels, Belgium, 2023.
Demuzere, M.; Kittner, J.; Martilli, A.; Mills, G.; Moede, C.; Stewart, I.D.; Vliet, J.V.; Bechtel, B. A global map of local climate zones to support earth system modelling and urban-scale environmental science. Earth Syst. Sci. Data 2022, 14, 3835–3873.
Liangyun, L.; Xiao, Z.; Xidong, C.; Yuan, G.; Jun, M. GLC_FCS30-2020: Global Land Cover with Fine Classification System at 30m in 2020; Zenodo: Genève, Switzerland, 2020.
Allen, G.H.; Pavelsky, T.M. Global River Widths from Landsat (GRWL) Database; Zenodo: Genève, Switzerland, 2018.
Zhang, X.; Liu, L.; Zhao, T.; Chen, X.; Lin, S.; Wang, J.; Mi, J.; Liu, W. GWL_FCS30: A global 30 m wetland map with a fine classification system using multi-sourced and time-series remote sensing imagery in 2020. Earth Syst. Sci. Data 2023, 15, 265–293.
Messager, M.L.; Lehner, B.; Grill, G.; Nedeva, I.; Schmitt, O. Estimating the volume and age of water stored in global lakes using a geo-statistical approach. Nat. Commun. 2016, 7, 1–11.
Yamazaki, D.; Ikeshima, D.; Sosa, J.; Bates, P.D.; Allen, G.H.; Pavelsky, T.M. MERIT Hydro: A High-Resolution Global Hydrography Map Based on Latest Topography Dataset. Water Resour. Res. 2019, 55, 5053–5073.
EEA. Coastal Zones Land Cover/Land Use 2018 (Vector), Europe, 6-Yearly, February 2021; European Environment Agency (EEA) Datahub: Copenhagen, Denmark, 2021.
Witjes, M. OSM Grass; OpenGeoHub Foundation: Doorwerth, The Netherlands, 2022.
Parente, L.; Witjes, M.; Hengl, T.; Landa, M.; Brodsky, L. Continental Europe Land Cover Mapping at 30m Resolution Based on CORINE and LUCAS Samples; Zenodo: Genève, Switzerland, 2021.
Marsoner, T.; Simion, H.; Giombini, V.; Vigl, L.E.; Candiago, S. A detailed land use/land cover map for the European Alps macro region. Sci. Data 2023, 10, 468.
EEA. EU-Hydro—Coastline—Version 1.2, September 2020; European Environment Agency (EEA) Datahub: Copenhagen, Denmark, 2020.
Bocher, E.; Bernard, J.; Wiederhold, E.; Leconte, F.; Petit, G.; Palominos, S.; Noûs, C. GeoClimate: A Geospatial processing toolbox for environmental and climate studies. J. Open Source Softw. 2021, 6, 3541.
EEA. Grassland 2018 (Raster 10 m), Europe, 3-Yearly, August 2020; EEA: Copenhagen, Denmark, 2020.
EEA. Imperviousness Density 2018 (Raster 10 m), Europe, 3-Yearly, August 2020; EEA: Copenhagen, Denmark, 2020.
EEA. N2K 2018 (Vector), Europe, 6-Yearly, July 2021; EEA: Copenhagen, Denmark, 2021.
Eurogeographics. EuroRegionalMap; EuroGeographics AISBL: Brussels, Belgium, 2021.
EEA. Riparian Zones Land Cover/Land Use 2018 (Vector), Europe, 6-Yearly, December 2021; EEA: Copenhagen, Denmark, 2021.
Costa, H.; Benevides, P.; Moreira, F.D.; Moraes, D.; Caetano, M. Spatially Stratified and Multi-Stage Approach for National Land Cover Mapping Based on Sentinel-2 Data and Expert Knowledge. Remote Sens. 2022, 14, 1865.
Naturvårdsverket. Nationella marktäckedata 2018: Basskikt; Swedish Environmental Protection Agency: Stockholm, Sweden, 2018.
Thierion, V.; Vincent, A.; Valero, S. Theia OSO Land Cover Map 2020; Zenodo: Genève, Switzerland, 2022.
EPA. Water Framework Directive Canal Waterbodies; EPA: Johnstown Castle Estate, Wexford, Ireland, 2020.
EPA. Water Framework Directive Coastal Waterbodies; EPA: Johnstown Castle Estate, Wexford, Ireland, 2020.
EPA. Water Framework Directive Lake Waterbodies; EPA: Johnstown Castle Estate, Wexford, Ireland, 2020.
EPA. Water Framework Directive River Waterbodies; EPA: Johnstown Castle Estate, Wexford, Ireland, 2020.
EPA. Water Framework Directive Transitional Waterbodies; EPA: Johnstown Castle Estate, Wexford, Ireland, 2020.
CEC. North American Land Cover, 2020 (Landsat, 30m); Commission for Environmental Cooperation (CEC): Montreal, QC, Canada, 2023.
Wickham, J.; Stehman, S.V.; Sorenson, D.G.; Gass, L.; Dewitz, J.A. Thematic accuracy assessment of the NLCD 2019 land cover for the conterminous United States. GIScience Remote Sens. 2023, 60, 1143.
Wu, Q. geemap: A Python package for interactive mapping with Google Earth Engine. J. Open Source Softw. 2020, 5, 2305.
Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27.
Wu, Q.; Lane, C.R.; Li, X.; Zhao, K.; Zhou, Y.; Clinton, N.; DeVries, B.; Golden, H.E.; Lang, M.W. Integrating LiDAR data and multi-temporal aerial imagery to map wetland inundation dynamics using Google Earth Engine. Remote Sens. Environ. 2019, 228, 1–13.

Figure 1. Illustration showing the backbone and specialist maps, primary labels, and the ECOCLIMAP-Second Generation secondary labels.

Figure 2. Overview of ECOSG+ (top) and its quality score (bottom) on the EURAT domain. Upsampled at 0.1° in EPSG:4326 with the nearest neighbor method.

Figure 3. Distribution of the land cover labels over the EURAT domain at 0.1° resolution for ECOSG+ (outer ring) and ECOSG (inner ring). Labels with less than 2% coverage have been removed from the annotations. A total of 33.79% of the pixels have a quality score above the 0.525 threshold.

Figure 4. Qualitative verification of the ECOSG+ map on a given set of patches.

Figure 5. Row-wise normalized confusion matrices of ECOSG+, ECOSG+ downsampled at 300 m (ECOSG+300), ECOSG, and ESA WorldCover over Europe, using LUCAS 2022 as a reference.

Figure 6. Row-wise normalized confusion matrices for ECOSG+, ECOSG, and ESA WorldCover, against NLC 2018 over Ireland.

Figure 7. Row-wise normalized confusion matrices for ECOSG+ and ECOSG against ECOSGIMO over Iceland. For visibility, labels not in ECOSGIMO have been removed.
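Row-wise normalization divides each row of a confusion matrix by its total, so each row shows how the reference samples of one label are spread across the evaluated map's labels. With reference labels on the rows (the y-axis, as in Appendix E), the diagonal then gives each label's recall, also called producer's accuracy. A minimal NumPy sketch follows; the function name and the toy counts are illustrative, and leaving empty rows as NaN is an assumption.

```python
import numpy as np


def row_normalize(cm):
    """Divide each row (reference label) by its total so that each row sums to 1.

    Rows with no reference samples are returned as NaN instead of raising an error.
    """
    cm = np.asarray(cm, dtype=float)
    totals = cm.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        return cm / totals


print(row_normalize([[8, 2],
                     [1, 3]]))  # rows: [0.8, 0.2] and [0.25, 0.75]
```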

Table 1. F1-scores over Europe for the ECOSG, ECOSG+, ECOSG+300, and ESA WorldCover maps with LUCAS 2022 as a reference. The maps have been converted to primary land cover labels. For each primary label, bold font indicates the highest ECOSG or ECOSG+ F1-score. ESA WorldCover scores are purely informative, as we first want to compare ECOSG and ECOSG+. Stars indicate the highest value per label.

	ECOSG+	ECOSG+300	ECOSG	ESA WorldCover
Water bodies	0.786	0.709	0.600	0.831 *
Bare land	0.154 *	0.146	0.126	0.138
Snow	0.788 *	0.783	0.708	0.643
Forest	0.709	0.648	0.545	0.749 *
Shrubs	0.114	0.094	0.091	0.171 *
Grassland	0.599	0.522	0.304	0.622 *
Crops	0.614	0.570	0.471	0.705 *
Flooded vegetation	0.438 *	0.376	0.342	0.367
Urban	0.374	0.321	0.279	0.418 *
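Each per-label F1-score in these tables is the harmonic mean of precision and recall for that label. The NumPy sketch below computes them from a confusion matrix with reference labels on the rows and the evaluated map on the columns. It leaves the F1-score undefined (NaN) when the map never predicts a label, which is one way a NaN entry can arise; the article does not spell out its convention. The function name and the toy counts are illustrative.

```python
import numpy as np


def f1_per_label(cm):
    """Per-label F1-score from a confusion matrix (rows: reference, columns: evaluated map).

    Precision is NaN when the map never predicts a label (0/0) and recall is NaN when the
    reference never contains it; a NaN propagates to the F1-score. When both precision and
    recall are 0, the F1-score is set to 0.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = tp / cm.sum(axis=0)  # TP / (TP + FP)
        recall = tp / cm.sum(axis=1)     # TP / (TP + FN)
        f1 = np.where(precision + recall == 0, 0.0,
                      2 * precision * recall / (precision + recall))
    return f1


cm = [[8, 0, 2],   # toy counts for Grassland, Shrubs, Urban (not taken from the article)
      [4, 0, 0],
      [1, 0, 5]]
print(f1_per_label(cm))
```

Here the map never predicts “Shrubs”, so its F1-score is NaN, while “Grassland” and “Urban” get 16/23 (about 0.696) and 10/13 (about 0.769).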

Table 2. F1-scores over Ireland for the ECOSG+, ECOSG, and ESA WorldCover maps with NLC 2018 as a reference. The maps have been converted to primary land cover labels. For each primary label, bold font indicates the highest ECOSG or ECOSG+ F1-score. ESA WorldCover scores are purely informative, as we first want to compare ECOSG and ECOSG+. Stars indicate the highest value per label.

	ECOSG+	ECOSG	ESA WorldCover
Water bodies	0.964	0.932	0.966 *
Bare land	0.109	0.151 *	0.043
Forest	0.725 *	0.299	0.710
Shrubs	0.000	0.000	0.000
Grassland	0.746 *	0.703	0.726
Crops	0.675 *	0.400	0.643
Flooded vegetation	0.171	0.552 *	0.025
Urban	0.465 *	0.340	0.427

Table 3. Primary-label F1-scores over Iceland for ECOSG+, ECOSG, and ESA WorldCover against ECOSGIMO. The maps have been converted to primary land cover labels. Bold font indicates the highest ECOSG or ECOSG+ F1-score. ESA WorldCover scores are purely informative, as we first want to compare ECOSG and ECOSG+. Stars indicate the highest value per label.

	ECOSG+	ECOSG	ESA WorldCover
Water bodies	0.986 *	0.976	0.978
Bare land	0.800 *	0.749	0.706
Snow	0.966 *	0.950	0.907
Forest	0.106	0.069	0.253 *
Shrubs	0.047	0.055	0.228 *
Grassland	0.738 *	0.566	0.738 *
Flooded vegetation	0.174	0.223 *	0.028
Urban	0.411 *	0.334	0.344

Table 4. Secondary-label F1-scores over Iceland for ECOSG+ and ECOSG against ECOSGIMO. Bold font indicates the highest ECOSG or ECOSG+ F1-score. A third column with the difference between the two scores has been added to highlight significant F1-score differences (when one value is NaN, it is counted as 0 in the difference).

	ECOSG+	ECOSG	Gap (New–Old)
1. Sea and oceans	0.992	0.989	0.003
2. Lakes	0.837	0.523	0.314
3. Rivers	0.632	0.201	0.431
4. Bare land	0.428	0.740	−0.312
5. Bare rocks	0.072	0.016	0.056
6. Permanent snow and ice	0.966	0.950	0.016
7. Boreal broadleaf deciduous	0.102	0.000	0.101
8. Temperate broadleaf deciduous	0.001	0.002	−0.001
12. Boreal needleleaf evergreen	0.013	0.012	0.001
13. Boreal needleleaf deciduous	NaN	NaN	NaN
15. Shrubs	0.047	0.056	−0.009
16. Boreal grassland	0.444	NaN	0.444
17. Temperate grassland	0.126	0.298	−0.172
23. Flooded grassland	0.174	0.030	0.144
25. LCZ2: compact mid rise	0.046	NaN	0.046
28. LCZ5: open midrise	NaN	NaN	NaN
29. LCZ6: open low-rise	0.569	0.355	0.214
31. LCZ8: large low-rise	0.400	NaN	0.400
32. LCZ9: sparsely built	0.033	NaN	0.033
33. LCZ10: heavy industry	0.329	NaN	0.329
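The gap column follows a simple rule: a NaN counts as 0. The rows where both scores are NaN show a NaN gap, so the sketch below also returns NaN in that case; this second rule is inferred from the table rather than stated in the caption. The helper name is hypothetical.

```python
import math


def f1_gap(new, old):
    """Gap (New - Old) as in Table 4: a NaN counts as 0, unless both scores are NaN."""
    if math.isnan(new) and math.isnan(old):
        return math.nan
    new = 0.0 if math.isnan(new) else new
    old = 0.0 if math.isnan(old) else old
    return new - old


print(round(f1_gap(0.837, 0.523), 3))    # 0.314 ("2. Lakes")
print(round(f1_gap(0.444, math.nan), 3))  # 0.444 ("16. Boreal grassland")
```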

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
