Data Augmentation in Earth Observation: A Diffusion Model Approach
Abstract
1. Introduction
- Natural Changes: Gradual or abrupt changes caused by natural processes, such as snow cover, droughts, or floods.
- Human Impacts: Anthropogenic activities, such as urbanization, road construction, and deforestation, that alter landscapes.
- Disasters: Extreme events like wildfires, floods, and storms that lead to rapid and significant environmental transformations.
2. Related Work
2.1. Traditional Augmentation Techniques
2.2. Augmentation Techniques Using Diffusion Models
3. Materials and Methods
3.1. Earth Observation Data Augmentation
Algorithm 1: Proposed Four-Stage Data Augmentation Process
3.1.1. Instruction Generation
Generate a detailed and descriptive caption for the provided remote sensing image, focusing on the specified <class> class. In your description, clearly identify the key characteristics visible in the image. If the image suggests any impact of human activity, natural events, or environmental conditions, elaborate on these.
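The instruction above is a meta-prompt with a class placeholder that is filled in for each target class before being sent to the captioning model. A minimal sketch of that substitution step (the template text is quoted from the paper; the function name and placeholder syntax are illustrative assumptions, not the authors' code):

```python
# Illustrative sketch: filling the meta-prompt template for one class.
META_PROMPT = (
    "Generate a detailed and descriptive caption for the provided remote "
    "sensing image, focusing on the specified {cls} class. In your "
    "description, clearly identify the key characteristics visible in the "
    "image. If the image suggests any impact of human activity, natural "
    "events, or environmental conditions, elaborate on these."
)

def build_instruction(cls: str) -> str:
    """Substitute the target class name into the meta-prompt."""
    return META_PROMPT.format(cls=cls)

print(build_instruction("Forest"))
```

Each filled-in instruction is then paired with a training image and passed to the vision–language model in the captioning stage.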
3.1.2. Captioning
3.1.3. Model Fine-Tuning
3.1.4. Data Augmentation
3.2. Experiment
3.2.1. EO Dataset
3.2.2. Data Augmentation Techniques
- Baseline: The baseline serves as a control scenario where no data augmentation is applied beyond resizing the images to the model input size. This setup benchmarks the model’s performance on the EuroSAT dataset, providing a reference point for evaluating the other augmentation techniques.
- Basic Augmentation: Basic augmentation includes random resized cropping to the model input size and random horizontal flipping. These simple transformations make the model less sensitive to variations in orientation and scale, which are common in EO imagery.
- Advanced Augmentation: Advanced augmentation introduces additional variability through:
  - Random horizontal and vertical flips;
  - Random rotations within a fixed angular range;
  - Random resized cropping to the model input size with a scale range from 0.7 to 1.0.
- AutoAugment for Earth Observation: AutoAugment [15] applies transformations based on policies optimized for the ImageNet dataset. While originally designed for natural images, AutoAugment exposes models to diverse visual variations, which can enhance robustness in EO datasets, based on preliminary findings [33].
- Our Augmentation: Our approach, as detailed in Section 3.1, leverages diffusion models to generate semantically rich and diverse synthetic EO images. We generate images at the diffusion model’s output resolution and resize them to match the input size used by the other methods. To ensure domain-specific relevance, we employ a prompt specification (Prompt Spec) in which all prompts begin with the following prefix:
“Generate a remote sensing satellite image capturing”
Each prompt focuses on a specific class in the dataset, incorporating key visual and semantic details:
- Annual Crop: Fields of crops arranged in orderly rows with agricultural machinery, capturing agricultural precision.
- Forest: A forest landscape as seen from space, highlighting the canopy’s texture and diversity, with visible paths, clearings, or water bodies. Include signs of deforestation in a small area.
- Herbaceous Vegetation: Areas dominated by herbaceous vegetation, such as meadows or grasslands, showing the texture and color variations of the vegetation.
- Highway: A major highway traversing various landscapes, including bridges, interchanges, and adjacent urban or rural areas.
- Industrial: A large industrial area featuring factories and warehouses, with clear indications of industrial activity such as large parking lots.
- Pasture: A lush pasture area from above, featuring grazing livestock, with no trees and a nearby fire.
- Permanent Crop: A vineyard showing permanent crop arrangements, rows of trees or vines, and possibly signs of ongoing maintenance or harvesting.
- Residential: A dense residential area with a variety of housing units, surrounded by streets and green spaces.
- River: A winding river cutting through diverse landscapes, with adjacent vegetation or urban areas.
- Sea/Lake: A lake or coastal sea area as seen from above, highlighting the surrounding land, including beaches, docks, or natural vegetation.
This detailed prompt specification ensures that the generated images incorporate semantic variations caused by natural changes, human impacts, and disasters. By explicitly targeting these key semantic axes, our method produces diverse and realistic EO images that traditional augmentation techniques cannot achieve.
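The prompt specification above amounts to a shared prefix plus one description per class. A minimal sketch of how such prompts could be assembled (the prefix and descriptions are quoted from the text; the dictionary and helper function are illustrative assumptions, not the authors' code, and only three of the ten classes are shown):

```python
# Illustrative prompt-specification sketch; remaining EuroSAT classes
# follow the same pattern.
PREFIX = "Generate a remote sensing satellite image capturing"

CLASS_DESCRIPTIONS = {
    "Annual Crop": (
        "fields of crops arranged in orderly rows with agricultural "
        "machinery, capturing agricultural precision"
    ),
    "Forest": (
        "a forest landscape as seen from space, highlighting the canopy's "
        "texture and diversity, with visible paths, clearings, or water "
        "bodies, and signs of deforestation in a small area"
    ),
    "River": (
        "a winding river cutting through diverse landscapes, with adjacent "
        "vegetation or urban areas"
    ),
    # ...remaining classes omitted for brevity
}

def build_prompt(cls: str) -> str:
    """Compose the full text-to-image prompt for one class."""
    return f"{PREFIX} {CLASS_DESCRIPTIONS[cls]}."

print(build_prompt("River"))
```

The resulting strings are what the fine-tuned diffusion model receives when generating synthetic images for each class.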
3.2.3. Model Architecture
3.2.4. Evaluation Metrics
3.2.5. Experimental Procedure
4. Results
4.1. Classification Performance
4.2. Zero-Shot Performance
4.3. Qualitative Analysis
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Full Form |
| --- | --- |
| AI | Artificial Intelligence |
| BLIP | Bootstrapping Language–Image Pre-training |
| CLIP | Contrastive Language–Image Pre-training |
| DM | Diffusion Model |
| EO | Earth Observation |
| ESA | European Space Agency |
| FID | Fréchet Inception Distance |
| LoRA | Low-Rank Adaptation |
| LULC | Land Use and Land Cover |
| MP | Meta-Prompt |
| PS | Prompt Specification |
| RN50 | ResNet-50 |
| SDG | Sustainable Development Goal |
| SGD | Stochastic Gradient Descent |
| U-Net | U-shaped Network |
| ViT-B/32 | Vision Transformer Base Model with 32 × 32 Patches |
| VLM | Vision–Language Model |
Appendix A
Appendix A.1. Synthetic Image Augmentations for Earth Observation
Appendix A.2. Applications of Synthetic Image Augmentations in Earth Observation
References
- Campbell, J.B.; Wynne, R.H.; Thomas, V.A. Introduction to Remote Sensing, 6th ed.; Guilford Press: New York, NY, USA, 2022.
- Yang, J.; Gong, P.; Fu, R.; Zhang, M.; Chen, J.; Liang, S.; Xu, B.; Shi, J.; Dickinson, R. The Role of Satellite Remote Sensing in Climate Change Studies. Nat. Clim. Chang. 2013, 3, 875–883.
- Purkis, S.J.; Klemas, V.V. Remote Sensing and Global Environmental Change; John Wiley & Sons: Hoboken, NJ, USA, 2011.
- Andries, A.; Morse, S.; Murphy, R.J.; Lynch, J.; Woolliams, E.R. Using Data from Earth Observation to Support Sustainable Development Indicators: An Analysis of the Literature and Challenges for the Future. Sustainability 2022, 14, 1191.
- Sousa, T. Towards Modeling and Predicting the Resilience of Ecosystems. In Proceedings of the 2023 ACM/IEEE International Conference on Model Driven Engineering Languages and Systems Companion (MODELS-C), Västerås, Sweden, 1–6 October 2023; pp. 159–165.
- Sathyaraj, P.; Nirmala, G.; Vijayalakshmi, S.; Rajakumar, S. Artificial Intelligence: Applications, Benefits, and Future Challenges in the Monitoring and Prediction of Earth Observations. In Novel AI Applications for Advancing Earth Sciences; IGI Global: Hershey, PA, USA, 2024; pp. 1–18.
- Schmitt, M.; Ahmadi, S.A.; Xu, Y.; Taskin, G.; Verma, U.; Sica, F.; Hansch, R. There Are No Data Like More Data – Datasets for Deep Learning in Earth Observation. IEEE Geosci. Remote Sens. Mag. 2023, 11, 63–97.
- Perez, L.; Wang, J. The Effectiveness of Data Augmentation in Image Classification Using Deep Learning. arXiv 2017, arXiv:1712.04621.
- Shorten, C.; Khoshgoftaar, T.M. A Survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60.
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10674–10685.
- Bansal, M.A.; Sharma, D.R.; Kathuria, D.M. A Systematic Review on Data Scarcity Problem in Deep Learning: Solution and Applications. ACM Comput. Surv. 2022, 54, 208:1–208:29.
- Elmes, A.; Alemohammad, H.; Avery, R.; Caylor, K.; Eastman, J.R.; Fishgold, L.; Friedl, M.A.; Jain, M.; Kohli, D.; Laso Bayas, J.C.; et al. Accounting for Training Data Error in Machine Learning Applied to Earth Observations. Remote Sens. 2020, 12, 1034.
- Kansakar, P.; Hossain, F. A Review of Applications of Satellite Earth Observation Data for Global Societal Benefit and Stewardship of Planet Earth. Space Policy 2016, 36, 46–54.
- Hendrycks, D.; Mu, N.; Cubuk, E.D.; Zoph, B.; Gilmer, J.; Lakshminarayanan, B. AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty. arXiv 2019, arXiv:1912.02781.
- Cubuk, E.D.; Zoph, B.; Mane, D.; Vasudevan, V.; Le, Q.V. AutoAugment: Learning Augmentation Policies from Data. arXiv 2018, arXiv:1805.09501.
- Abdelhack, M. A Comparison of Data Augmentation Techniques in Training Deep Neural Networks for Satellite Image Classification. arXiv 2020, arXiv:2003.13502v1.
- Illarionova, S.; Nesteruk, S.; Shadrin, D.; Ignatiev, V.; Pukalchik, M.; Oseledets, I. MixChannel: Advanced Augmentation for Multispectral Satellite Images. Remote Sens. 2021, 13, 2181.
- Lalitha, V.; Latha, B. A Review on Remote Sensing Imagery Augmentation Using Deep Learning. Mater. Today Proc. 2022, 62, 4772–4778.
- Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. In Proceedings of the Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 6840–6851.
- Ho, J.; Salimans, T. Classifier-Free Diffusion Guidance. In Proceedings of the NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, Online, 13 December 2021.
- Trabucco, B.; Doherty, K.; Gurinas, M.; Salakhutdinov, R. Effective Data Augmentation with Diffusion Models. arXiv 2023, arXiv:2302.07944.
- Dhariwal, P.; Nichol, A. Diffusion Models Beat GANs on Image Synthesis. In Proceedings of the Neural Information Processing Systems; Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2021; Volume 34, pp. 8780–8794.
- Zhao, C.; Ogawa, Y.; Chen, S.; Yang, Z.; Sekimoto, Y. Label Freedom: Stable Diffusion for Remote Sensing Image Semantic Segmentation Data Generation. In Proceedings of the 2023 IEEE International Conference on Big Data, Sorrento, Italy, 15–18 December 2023.
- Sebaq, A.; ElHelw, M. RSDiff: Remote Sensing Image Generation from Text Using Diffusion Model. arXiv 2023, arXiv:2309.02455.
- Lu, X.; Wang, B.; Zheng, X.; Li, X. Exploring Models and Data for Remote Sensing Image Caption Generation. IEEE Trans. Geosci. Remote Sens. 2018, 56, 2183–2195.
- Dai, W.; Li, J.; Li, D.; Tiong, A.M.H.; Zhao, J.; Wang, W.; Li, B.; Fung, P.; Hoi, S. InstructBLIP: Towards General-Purpose Vision-Language Models with Instruction Tuning. arXiv 2023, arXiv:2305.06500.
- Li, J.; Li, D.; Xiong, C.; Hoi, S. BLIP: Bootstrapping Language-Image Pre-Training for Unified Vision-Language Understanding and Generation. arXiv 2022, arXiv:2201.12086.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
- Hu, J.E.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. arXiv 2021, arXiv:2106.09685.
- Ruiz, N.; Li, Y.; Jampani, V.; Pritch, Y.; Rubinstein, M.; Aberman, K. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 22500–22510.
- Ha, D.; Dai, A.; Le, Q.V. HyperNetworks. arXiv 2016, arXiv:1609.09106.
- Helber, P.; Bischke, B.; Dengel, A.; Borth, D. EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2217–2226.
- Li, Y.; Zhang, S.; Li, X.; Ye, F. Remote Sensing Image Classification with Few Labeled Data Using Semisupervised Learning. Wirel. Commun. Mob. Comput. 2023, 2023, e7724264.
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning Transferable Visual Models from Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; Volume 139, pp. 8748–8763.
- Palatucci, M.; Pomerleau, D.; Hinton, G.E.; Mitchell, T.M. Zero-Shot Learning with Semantic Output Codes. In Proceedings of the Advances in Neural Information Processing Systems; Bengio, Y., Schuurmans, D., Lafferty, J., Williams, C., Culotta, A., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2009; Volume 22.
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Proceedings of the Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
- Jayasumana, S.; Ramalingam, S.; Veit, A.; Glasner, D.; Chakrabarti, A.; Kumar, S. Rethinking FID: Towards a Better Evaluation Metric for Image Generation. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 9307–9315.
- Ding, M.; Zheng, W.; Hong, W.; Tang, J. CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers. In Proceedings of the Advances in Neural Information Processing Systems; Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2022; Volume 35, pp. 16890–16902.
- Sumbul, G.; Charfuelan, M.; Demir, B.; Markl, V. BigEarthNet: A Large-Scale Benchmark Archive for Remote Sensing Image Understanding. In Proceedings of the IGARSS 2019–2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 5901–5904.
- Schmitt, M.; Hughes, L.H.; Qiu, C.; Zhu, X.X. SEN12MS—A Curated Dataset of Georeferenced Multi-Spectral Sentinel-1/2 Imagery for Deep Learning and Data Fusion. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, IV-2/W7, 153–160.
- Yang, Y.; Newsam, S. Bag-of-Visual-Words and Spatial Extensions for Land-Use Classification. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS ’10), New York, NY, USA, 2–5 November 2010; pp. 270–279.
| Augmentation Technique | ResNet50 Top-1 | ResNet50 Top-3 | ResNet50 F1 | ResNet50 ETT | ViT-B/32 Top-1 | ViT-B/32 Top-3 | ViT-B/32 F1 | ViT-B/32 ETT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Baseline | 33% | 62% | 0.28 | 261 | 83% | 95% | 0.75 | 333 |
| Basic Augmentation | 34% | 60% | 0.30 | 264 | 81% | 96% | 0.72 | 337 |
| Advanced Augmentation | 36% | 62% | 0.33 | 293 | 81% | 96% | 0.70 | 375 |
| AutoAugment | 31% | 55% | 0.27 | 325 | 85% | 94% | 0.80 | 423 |
| Our Augmentation | 39% | 66% | 0.36 | 301 | 90% | 99% | 0.85 | 391 |
| Model | Original Top-1 | Our Top-1 | Improvement |
| --- | --- | --- | --- |
| CLIP RN50 | 41.1% | 58.07% | +16.97% |
| CLIP ViT-B/32 | 49.4% | 69.23% | +19.83% |
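The zero-shot results above rely on CLIP-style classification: class prompts are embedded as text, the image is embedded, and the predicted class is the prompt with the highest cosine similarity. A minimal, dependency-free sketch of that decision rule (the toy vectors below are illustrative stand-ins for CLIP features, not real embeddings):

```python
from math import sqrt

def zero_shot_predict(img_emb, text_embs, class_names):
    # CLIP-style zero-shot: rank class prompt embeddings by cosine
    # similarity to the image embedding and return the best match.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sqrt(sum(x * x for x in a))
        nb = sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    sims = [cos(img_emb, t) for t in text_embs]
    return class_names[sims.index(max(sims))]

# Toy 2-D vectors standing in for CLIP image/text features.
classes = ["Forest", "River", "Highway"]
text_embs = [[1.0, 0.1], [0.0, 1.0], [0.7, 0.7]]
image_emb = [0.05, 0.9]  # points mostly along the "River" prompt direction
print(zero_shot_predict(image_emb, text_embs, classes))  # prints River
```

In practice the embeddings come from a pretrained CLIP image and text encoder, but the ranking step is exactly this argmax over cosine similarities.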
| Feature | Traditional Methods | Generic Diffusion Models | Our Approach |
| --- | --- | --- | --- |
| Diversity | Low (e.g., flips, rotations) | Moderate (trained on broad imagery) | High (fine-tuned on EO) |
| Domain Adaptation | Generic | Limited domain customization | Fine-tuned for EO tasks |
| Semantic Diversity | Not addressed | Implicitly addressed | Explicitly targeted via prompts |
| EO Suitability | Low (lacks domain realism) | Moderate (not tailored to EO) | High (EO fine-tuned) |
| Computational Efficiency | High | Low | Balanced via LoRA |
| Scalability | Scalable, but limited in diversity | Scalable, but high compute cost | Domain-specific scalability |
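The "Balanced via LoRA" entry in the table above reflects that LoRA replaces a full weight update with two trainable low-rank factors B (d × r) and A (r × d), leaving the pretrained matrix frozen. A small arithmetic sketch of the resulting parameter savings (the dimensions are illustrative assumptions, not the values used in the paper):

```python
# Illustrative LoRA parameter count: the adapted layer computes
# W x + B (A x), so only B and A are trained.
d, r = 512, 8                 # feature dimension, adapter rank (r << d)

full_params = d * d           # parameters updated by full fine-tuning
lora_params = 2 * d * r       # parameters updated by LoRA (B plus A)

print(full_params, lora_params)                        # 262144 8192
print(f"reduction: {full_params / lora_params:.0f}x")  # reduction: 32x
```

The ratio d / (2r) explains why LoRA fine-tuning of the diffusion model's U-Net stays cheap even at large feature dimensions.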
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sousa, T.; Ries, B.; Guelfi, N. Data Augmentation in Earth Observation: A Diffusion Model Approach. Information 2025, 16, 81. https://doi.org/10.3390/info16020081