High-Resolution Land Use Land Cover Dataset for Meteorological Modelling—Part 2: ECOCLIMAP-SG-ML an Ensemble Land Cover Map
Abstract
1. Introduction
2. Data
2.1. Land Cover Maps
2.1.1. ECOCLIMAP-SG
2.1.2. ECOCLIMAP-SG+
- Specialist maps (i.e., maps with a focus on a specific land cover type, such as forest, crops, or urban, for example) are more reliable than non-specialist maps;
- The more maps that agree with the land cover label at a given location, the more confident we are of the label;
- When no better information is available, the ECOSG label is kept.
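The three principles above can be illustrated with a small per-pixel voting sketch. This is not the scoring actually used to build ECOSG+ (that is described in [25]); the function name `agreement_label`, the `specialist_weight` and the `min_score` threshold are hypothetical choices made only to show how the principles could interact.

```python
from collections import defaultdict

def agreement_label(votes, ecosg_label, specialist_weight=2.0, min_score=2.0):
    """Pick a label for one pixel from the labels proposed by several maps.

    votes: list of (label, is_specialist) pairs, one per input map;
           label is None where a map has no data.
    ecosg_label: fallback label taken from ECOSG.
    specialist_weight, min_score: hypothetical tuning values (assumptions).
    Returns (chosen_label, score_of_best_candidate).
    """
    scores = defaultdict(float)
    for label, is_specialist in votes:
        if label is None:
            continue
        # Principle 1: a specialist map counts more than a non-specialist map.
        scores[label] += specialist_weight if is_specialist else 1.0
    if not scores:
        # No map has data here: keep the ECOSG label.
        return ecosg_label, 0.0
    # Principle 2: the more maps agree on a label, the higher its score.
    best = max(scores, key=scores.get)
    if scores[best] < min_score:
        # Principle 3: not enough evidence, keep the ECOSG label.
        return ecosg_label, scores[best]
    return best, scores[best]
```

With these assumed values, two agreeing non-specialist maps or a single specialist map are enough to override ECOSG, while a single non-specialist map is not.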
2.1.3. ESA WorldCover
2.1.4. Other Land Cover Maps Used
2.2. Training, Testing, and Validation Sets
3. Methods
3.1. Map Translation with Auto-Encoders
- It requires the training to be redone if changes are made to the input or the output map.
- It does not provide a “common ground” for both maps.
- It is supervised by the output map, with its inaccuracies.
- the reconstruction loss (cross-entropy loss to ensure the auto-encoder correctly reproduces the original map),
- the translation loss (cross-entropy loss to penalize an incorrect translation),
- the embedding loss (mean squared error loss to ensure that the latent space is shared across all maps).
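As a minimal sketch of how these three terms could be combined for a single pixel and a single pair of maps, the snippet below takes the decoder outputs (logits) and the latent vectors as plain Python lists. The function names, the argument layout and the weights `w_rec`, `w_tra`, `w_emb` are assumptions for illustration; the actual network, batching and weighting are not specified here.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one pixel: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def mse(a, b):
    """Mean squared error between two latent vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def total_loss(rec_logits, src_label, tra_logits, tgt_label, z_src, z_tgt,
               w_rec=1.0, w_tra=1.0, w_emb=1.0):
    """Combine the three loss terms (weights are hypothetical).

    rec_logits: source decoder applied to the source embedding.
    tra_logits: target decoder applied to the source embedding.
    z_src, z_tgt: embeddings of the same location from the two maps.
    """
    parts = {
        "reconstruction": cross_entropy(rec_logits, src_label),
        "translation": cross_entropy(tra_logits, tgt_label),
        "embedding": mse(z_src, z_tgt),
    }
    total = (w_rec * parts["reconstruction"]
             + w_tra * parts["translation"]
             + w_emb * parts["embedding"])
    return total, parts
```

Returning the individual terms alongside the total makes it easy to monitor the share of each component during training.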
3.2. Training Strategy
3.3. Production of the Final Map: Merging Inference and ECOSG+
3.4. Generation of an Ensemble Land Cover
3.5. Evaluation Method
4. Results
4.1. Evaluation of the Inference Against ECOSG+
4.2. Evaluation of ECOSG-ML Against LUCAS
4.3. Qualitative Evaluation of the Final Map
4.4. Demonstration of Ensemble Land Cover Generation
5. Discussion
5.1. Limitations
- Obviously wrong classifications. Some pixels may show inconsistent land cover (e.g., lake or river pixels surrounded by sea pixels, or permanent snow at low altitude or latitude). These might be due to insufficient input data (e.g., for lakes, ESA WorldCover only shows “water”). This might lead to incorrect triggering of surface parameterisations, but further studies are necessary to assess their impact on the final weather forecast.
- Default secondary labels. For some primary labels (e.g., “Crops” or “Forest”), a default secondary label is predicted almost all the time (“19. winter C3 crops” for “Crops”, “8. temperate broadleaf deciduous” or “12. boreal needleleaf evergreen” for “Forest”), as visible in Figure 4. This results in correct primary label classification but incorrect secondary label classification.
- Overly simple ensemble construction. With the current method of generating the members, the random number u is the same everywhere on the map. As a result, all locations are modified in the same way, as if the uncertainty varied identically everywhere, which may not be valid. Moreover, only a qualitative evaluation of the ensemble is made here. In particular, how well the ensemble represents the land cover uncertainty has not been established.
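The last limitation can be made concrete with a small sketch of member generation by CDF inversion. It assumes that each pixel carries a vector of class probabilities and that a member is obtained by inverting the cumulative distribution at u; the function names and the nested-list map layout are hypothetical, and the per-pixel variant is shown only for contrast.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def invert_cdf(probs, u):
    """Return the first class index whose cumulative probability exceeds u."""
    cdf = list(accumulate(probs))
    k = bisect_right(cdf, u)
    # Guard against floating-point rounding in the last bin.
    return min(k, len(probs) - 1)

def member_shared_u(prob_map, u):
    """One ensemble member where the same u is used at every pixel."""
    return [[invert_cdf(p, u) for p in row] for row in prob_map]

def member_per_pixel_u(prob_map, rng):
    """Contrast case: an independent draw of u at each pixel."""
    return [[invert_cdf(p, rng.random()) for p in row] for row in prob_map]

# Tiny 1 x 2 map with 3 classes per pixel (made-up probabilities).
prob_map = [[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]]
print(member_shared_u(prob_map, 0.05))  # [[0, 0]]
print(member_shared_u(prob_map, 0.50))  # [[0, 1]]
print(member_shared_u(prob_map, 0.95))  # [[2, 2]]
```

With a shared u, an extreme draw pushes every pixel towards the same end of its class ordering at once, which is the "modified in the same way" behaviour noted above.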
5.2. Potential Directions for Improvement
- Enrich input information. Many of the current limitations are due to a lack of input information. The addition of informative variables like elevation or a position encoding would certainly help the network to better detect some labels (such as “6. permanent snow” or the bioclimatic classification). Such complementary information can be added as input to the auto-encoder or in the latent space (therefore as input to the decoder). For example, despite the limitations of ECOSG, it certainly contains valuable information to distinguish some secondary labels. After being projected in the latent space, the information from any land cover has the same resolution and channels, which makes the combination easier. Additionally, any improvements in ECOSG+ will be directly beneficial to ECOSG-ML in both the training and evaluation.
- Better loss function. The current loss is unweighted and unaware of class similarities (for example, classes are more similar within the same primary label). Class-weighted cross-entropy could address class imbalance, focal loss could focus training on hard-to-classify examples, and similarity-aware loss functions could prioritize minimizing errors within related class groups (e.g., within the same primary label). Stratified sampling could also be used to create more balanced datasets: we suggest adding a third phase of training on a DS3 dataset created with stratified sampling. The role of each component of the loss has not been investigated yet. In our experience, one component never exceeds 6% of the overall loss but has a different dynamic (an increase followed by a slow and smooth decrease) from that of the other components (noisier decreases). The role it plays in the training process could be further explored with an ablation study.
- Better input for CDF inversion. In this work, we used a single random number for all pixels and patches. This is technically very simple, and it is preferable to an independent random draw for each pixel, which would reduce the correlation between geographically close pixels. However, it is not entirely satisfactory because all locations are modified in the same way. A suggestion for future developments is to use a two-dimensional stochastic process with appropriate properties to generate the members.
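Two of the loss variants mentioned under "Better loss function" can be sketched for a single pixel as follows. The focal loss follows its usual definition, -(1 - p)^gamma log(p) with p the predicted probability of the true class; the inverse-frequency weighting in `inverse_frequency_weights` is only one possible choice and is an assumption, as are all function names and default values.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def weighted_cross_entropy(logits, target, class_weights):
    """Cross-entropy scaled by the weight of the true class."""
    p = softmax(logits)[target]
    return -class_weights[target] * math.log(p)

def focal_loss(logits, target, gamma=2.0):
    """Focal loss: down-weights pixels that are already well classified."""
    p = softmax(logits)[target]
    return -((1.0 - p) ** gamma) * math.log(p)

def inverse_frequency_weights(counts):
    """Hypothetical weights proportional to 1 / class count.

    Classes with zero support get weight 0.0 since they never occur
    as a target. With all classes present, a count-weighted average
    of the weights equals 1.
    """
    total = sum(counts)
    n = len(counts)
    return [total / (n * c) if c > 0 else 0.0 for c in counts]
```

For example, `inverse_frequency_weights([90, 10])` gives the rare class nine times the weight of the frequent one, and `focal_loss` with `gamma=0.0` reduces to the plain cross-entropy.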
5.3. Prospects for Future Use
- Update other components of physiography. To be used in NWP, the whole physiography database must be updated to be consistent with the land cover maps. Other components include Leaf Area Index (LAI), albedo, lake parameters, and tree height. In ECOSG, these components are present but stored in a way that is highly dependent on the land cover map (LAI and tree height are only stored for pixels with vegetation or trees, etc.). Therefore, although ECOSG is a priori compatible because it is already used in NWP, reusing its values can be complicated. Moreover, the values for the other components might be outdated since ECOSG is a static database. Consequently, we recommend using up-to-date, high-quality sources for these other components as much as possible. Over Europe, Copernicus products (see footnote 3) are available. Machine learning can also help to provide up-to-date and fit-for-purpose datasets for these components, as in, e.g., [33].
- Assess benefit of new maps in NWP. Once an updated physiography database is available, the potential benefit of this update will need to be evaluated. In particular, the resolution of ECOSG-ML also allows sub-kilometer NWP experiments to be carried out, for which the influence of the physiography is expected to be large.
- Assess benefit of ensemble land cover maps in physics-driven and data-driven ensemble forecasts. Besides the remaining questions about the representativity of the ensemble, there are open questions on the opportunities for using ensemble land covers in EPS. The effect of using a different land cover for each forecast member is unknown and is, in our opinion, an interesting question.
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
DS1 | Dataset for phase 1 of the training (France mainland, 5 maps) |
DS2 | Dataset for phase 2 of the training (EURAT, 3 maps) |
ECOSG | ECOCLIMAP-SG: a physiography database currently used in NWP |
ECOSG+ | ECOCLIMAP-SG+: the land cover map created by [25], used as a reference |
ECOSG-ML | ECOCLIMAP-SG-ML: the ensemble land cover map described in this manuscript |
EPS | Ensemble Prediction Systems |
EURAT | Europe-Atlantic domain (longitudes: −32 to 42, latitudes: 20 to 72) |
NWP | Numerical Weather Prediction |
1 | The ECOCLIMAP-SG wiki: https://opensource.umr-cnrm.fr/projects/ecoclimap-sg/wiki (accessed on 4 November 2024). |
2 | MLULC [20] Zenodo archive: https://doi.org/10.5281/zenodo.5843595 (accessed on 4 November 2024). |
3 | Copernicus products. Leaf area index: https://land.copernicus.eu/en/products/vegetation/high-resolution-leaf-area-index (accessed on 4 November 2024); Albedo: https://www.copernicus.eu/en/global-land-surface-albedo (accessed on 4 November 2024); Building height: https://land.copernicus.eu/api/en/products/urban-atlas/building-height-2012 (accessed on 4 November 2024). |
References
- Bauer, P.; Thorpe, A.; Brunet, G. The quiet revolution of numerical weather prediction. Nature 2015, 525, 47–55.
- Nuissier, O.; Duffourg, F.; Martinet, M.; Ducrocq, V.; Lac, C. Hectometric-scale simulations of a Mediterranean heavy-precipitation event during the Hydrological cycle in the Mediterranean Experiment (HyMeX) first Special Observation Period (SOP1). Atmos. Chem. Phys. 2020, 20, 14649–14667.
- Sabatier, T.; Largeron, Y.; Paci, A.; Lac, C.; Rodier, Q.; Canut, G.; Masson, V. Semi-idealized simulations of wintertime flows and pollutant transport in an Alpine valley. Part II: Passive tracer tracking. Q. J. R. Meteorol. Soc. 2020, 146, 827–845.
- Lemonsu, A.; Alessandrini, J.; Capo, J.; Claeys, M.; Cordeau, E.; de Munck, C.; Dahech, S.; Dupont, J.; Dugay, F.; Dupuis, V.; et al. The heat and health in cities (H2C) project to support the prevention of extreme heat in cities. Clim. Serv. 2024, 34, 100472.
- Lean, H.W.; Theeuwes, N.E.; Baldauf, M.; Barkmeijer, J.; Bessardon, G.; Blunn, L.; Bojarova, J.; Boutle, I.A.; Clark, P.A.; Demuzere, M.; et al. The hectometric modelling challenge: Gaps in the current state of the art and ways forward towards the implementation of 100-m scale weather and climate models. Q. J. R. Meteorol. Soc. 2024.
- Seity, Y.; Brousseau, P.; Malardel, S.; Hello, G.; Bénard, P.; Bouttier, F.; Lac, C.; Masson, V. The AROME-France Convective-Scale Operational Model. Mon. Weather Rev. 2011, 139, 976–991.
- Bengtsson, L.; Andrae, U.; Aspelien, T.; Batrak, Y.; Calvo, J.; de Rooy, W.; Gleeson, E.; Hansen-Sass, B.; Homleid, M.; Hortal, M.; et al. The HARMONIE–AROME Model Configuration in the ALADIN–HIRLAM NWP System. Mon. Weather Rev. 2017, 145, 1919–1935.
- Gleeson, E.; Kurzeneva, E.; de Rooy, W.; Rontu, L.; Pérez, D.M.; Clancy, C.; Ivarsson, K.I.; Engdahl, B.J.; Tijm, S.; Nielsen, K.P.; et al. The Cycle 46 Configuration of the HARMONIE-AROME Forecast Model. Meteorology 2024, 3, 354–390.
- Masson, V.; Le Moigne, P.; Martin, E.; Faroux, S.; Alias, A.; Alkama, R.; Belamari, S.; Barbu, A.; Boone, A.; Bouyssel, F.; et al. The SURFEXv7.2 land and ocean surface platform for coupled or offline simulation of earth surface variables and fluxes. Geosci. Model Dev. 2013, 6, 929–960.
- Le Moigne, P.; Boone, A.; Calvet, J.C.; Decharme, B.; Faroux, S.; Gibelin, A.L.; Lebeaupin, C.; Mahfouf, J.F.; Martin, E.; Masson, V. SURFEX Scientific Documentation; Note de Centre (CNRM/GMME); Météo-France: Toulouse, France, 2009; Volume 268.
- Zanaga, D.; Van De Kerchove, R.; Daems, D.; De Keersmaecker, W.; Brockmann, C.; Kirches, G.; Wevers, J.; Cartus, O.; Santoro, M.; Fritz, S.; et al. ESA WorldCover 10 m 2021 v200. 2022. Available online: https://zenodo.org/records/7254221 (accessed on 4 November 2024).
- Malinowski, R.; Lewiński, S.; Rybicki, M.; Gromny, E.; Jenerowicz, M.; Krupiński, M.; Nowakowski, A.; Wojtkowski, C.; Krupiński, M.; Krätzschmar, E.; et al. Automated Production of a Land Cover/Use Map of Europe Based on Sentinel-2 Imagery. Remote Sens. 2020, 12, 3523.
- Venter, Z.S.; Sydenham, M.A.K. Continental-Scale Land Cover Mapping at 10 m Resolution Over Europe (ELC10). Remote Sens. 2021, 13, 2301.
- Mirmazloumi, S.M.; Kakooei, M.; Mohseni, F.; Ghorbanian, A.; Amani, M.; Crosetto, M.; Monserrat, O. ELULC-10, a 10 m European Land Use and Land Cover Map Using Sentinel and Landsat Data in Google Earth Engine. Remote Sens. 2022, 14, 3041.
- Sumbul, G.; de Wall, A.; Kreuziger, T.; Marcelino, F.; Costa, H.; Benevides, P.; Caetano, M.; Demir, B.; Markl, V. BigEarthNet-MM: A Large Scale Multi-Modal Multi-Label Benchmark Archive for Remote Sensing Image Classification and Retrieval. IEEE Geosci. Remote Sens. Mag. 2021, 9, 174–180.
- Schmitt, M.; Hughes, L.H.; Qiu, C.; Zhu, X.X. SEN12MS—A Curated Dataset of Georeferenced Multi-Spectral Sentinel-1/2 Imagery for Deep Learning and Data Fusion. arXiv 2019, arXiv:1906.07789.
- Zhang, D.; Zhao, J.; Chen, J.; Zhou, Y.; Shi, B.; Yao, R. Edge-aware and spectral–spatial information aggregation network for multispectral image semantic segmentation. Eng. Appl. Artif. Intell. 2022, 114, 105070.
- Aksoy, A.K.; Ravanbakhsh, M.; Kreuziger, T.; Demir, B. A Consensual Collaborative Learning Method for Remote Sensing Image Classification Under Noisy Multi-Labels. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 3842–3846.
- Baudoux, L.; Inglada, J.; Mallet, C. Toward a Yearly Country-Scale CORINE Land-Cover Map without Using Images: A Map Translation Approach. Remote Sens. 2021, 13, 1060.
- Baudoux, L.; Inglada, J.; Mallet, C. Multi-nomenclature, multi-resolution joint translation: An application to land-cover mapping. Int. J. Geogr. Inf. Sci. 2023, 37, 403–437.
- Gneiting, T.; Raftery, A.E. Weather forecasting with ensemble methods. Science 2005, 310, 248–249.
- Frogner, I.L.; Andrae, U.; Bojarova, J.; Callado, A.; Escribà, P.; Feddersen, H.; Hally, A.; Kauhanen, J.; Randriamampianina, R.; Singleton, A.; et al. HarmonEPS—The HARMONIE ensemble prediction system. Weather Forecast. 2019, 34, 1909–1937.
- Ben Bouallègue, Z.; Clare, M.C.; Magnusson, L.; Gascon, E.; Maier-Gerber, M.; Janoušek, M.; Rodwell, M.; Pinault, F.; Dramsch, J.S.; Lang, S.T.; et al. The rise of data-driven weather forecasting: A first statistical assessment of machine learning-based weather forecasts in an operational-like context. Bull. Am. Meteorol. Soc. 2024, 105, E864–E883.
- Oskarsson, J.; Landelius, T.; Lindsten, F. Graph-based neural weather prediction for limited area modeling. arXiv 2023, arXiv:2309.17370.
- Bessardon, G.; Rieutord, T.; Gleeson, E.; Palmason, B.; Oswald, S. High-resolution land use land cover dataset for meteorological modelling—Part 1: ECOCLIMAP-SG+ an agreement-based dataset. Land 2024, 13, 1811.
- Venter, Z.S.; Barton, D.N.; Chakraborty, T.; Simensen, T.; Singh, G. Global 10 m land use land cover datasets: A comparison of dynamic world, world cover and esri land cover. Remote Sens. 2022, 14, 4101.
- Inglada, J.; Vincent, A.; Arias, M.; Tardy, B.; Morin, D.; Rodes, I. Operational High Resolution Land Cover Map Production at the Country Scale Using Satellite Image Time Series. Remote Sens. 2017, 9, 95.
- EEA. CORINE Land Cover 2018 (Vector), Europe, 6-Yearly—Version 2020_20u1, May 2020. 2018. Available online: https://sdi.eea.europa.eu/catalogue/copernicus/api/records/960998c1-1870-4e82-8051-6485205ebbac?language=all (accessed on 4 November 2024).
- Ballin, M.; Barcaroli, G.; Masselli, G. New LUCAS 2022 Sample and Subsamples Design: Criticalities and Solutions; Technical Report; Publications Office of the European Union: Luxembourg, 2022.
- Devroye, L. Non-Uniform Random Variate Generation; Springer: New York, NY, USA, 1986.
- Mironov, D.; Heise, E.; Kourzeneva, E.; Ritter, B.; Schneider, N.; Terzhevik, A. Implementation of the lake parameterisation scheme FLake into the numerical weather prediction model COSMO. Boreal Environ. Res. 2010, 15, 218.
- Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
- Keany, E.; Bessardon, G.; Gleeson, E. Using machine learning to produce a cost-effective national building height map of Ireland to categorise local climate zones. Adv. Sci. Res. 2022, 19, 13–27.
| Primary Label | Inference (primary) | ECOSG (primary) | Secondary Label | Inference (secondary) | ECOSG (secondary) | Support |
|---|---|---|---|---|---|---|
| Water | 0.9149 | 0.4402 | 1. sea and oceans | 0.8185 | 0.5937 | 2% |
| | | | 2. lakes | 0.7615 | 0.305 | 3% |
| | | | 3. rivers | 0.3268 | 0.0812 | 140,570 |
| Bare | 0.8767 | 0.6892 | 4. bare land | 0.645 | 0.4758 | 1% |
| | | | 5. bare rock | 0.7874 | 0.0242 | 1% |
| Snow | 0.7018 | 0.4119 | 6. permanent snow | 0.7018 | 0.4119 | 83,305 |
| Forest | 0.8806 | 0.6206 | 7. boreal broadleaf deciduous | 0.311 | 0.3506 | 397,158 |
| | | | 8. temperate broadleaf deciduous | 0.6397 | 0.4017 | 14% |
| | | | 9. tropical broadleaf deciduous | - | - | 0 |
| | | | 10. temperate broadleaf evergreen | 0.016 | 0.0808 | 25,087 |
| | | | 11. tropical broadleaf evergreen | - | - | 0 |
| | | | 12. boreal needleleaf evergreen | 0.7628 | 0.6183 | 12% |
| | | | 13. temperate needleleaf evergreen | 0.5098 | 0.2372 | 7% |
| | | | 14. boreal needleleaf deciduous | - | - | 41,558 |
| Shrubs | 0.0855 | 0.0509 | 15. shrubs | 0.0855 | 0.0509 | 228,646 |
| Grass | 0.6983 | 0.428 | 16. boreal grassland | 0.5956 | 0.0574 | 1% |
| | | | 17. temperate grassland | 0.6848 | 0.4222 | 10% |
| | | | 18. tropical grassland | - | 0.0072 | 597 |
| Crops | 0.8513 | 0.6773 | 19. winter C3 crops | 0.7021 | 0.5265 | 25% |
| | | | 20. summer C3 crops | 0.0 | 0.1015 | 3% |
| | | | 21. C4 crops | 0.2624 | 0.1984 | 8% |
| Flooded | 0.5621 | 0.2335 | 22. flooded trees | 0.0118 | - | 53,089 |
| | | | 23. flooded grassland | 0.5478 | 0.2293 | 1% |
| Urban | 0.7543 | 0.3387 | 24. LCZ1: compact high-rise | - | 0.0284 | 8955 |
| | | | 25. LCZ2: compact midrise | 0.3257 | 0.1207 | 53,105 |
| | | | 26. LCZ3: compact low-rise | 0.0697 | 0.0683 | 33,709 |
| | | | 27. LCZ4: open high-rise | 0.0272 | - | 9746 |
| | | | 28. LCZ5: open midrise | 0.282 | 0.0676 | 139,875 |
| | | | 29. LCZ6: open low-rise | 0.6833 | 0.0781 | 1% |
| | | | 30. LCZ7: lightweight low-rise | - | - | 38 |
| | | | 31. LCZ8: large low-rise | 0.4995 | 0.103 | 254,488 |
| | | | 32. LCZ9: sparsely built | 0.434 | 0.1319 | 3% |
| | | | 33. LCZ10: heavy industry | 0.0998 | 0.0881 | 10,641 |
| Overall accuracy | 0.831 | 0.583 | Overall accuracy | 0.634 | 0.411 | 50M |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Rieutord, T.; Bessardon, G.; Gleeson, E. High-Resolution Land Use Land Cover Dataset for Meteorological Modelling—Part 2: ECOCLIMAP-SG-ML an Ensemble Land Cover Map. Land 2024, 13, 1875. https://doi.org/10.3390/land13111875