Empirical Study of PEFT Techniques for Winter-Wheat Segmentation †
Abstract
1. Introduction
- We explore different PEFT techniques to efficiently adapt pre-trained models for crop-type segmentation, and identify the most important parameters for each PEFT method to achieve the best performance.
- We experiment with datasets from multiple countries, including Germany and Lebanon, and consider a realistic and challenging setup in which the model is tested on different years and regions.
2. Framework
2.1. PEFT Techniques
2.2. Baselines
- Training from Scratch: In this approach, the TSViT model was trained without using any prior weights. The goal of this strategy was to discern the innate potential of the TSViT architecture without the influence of fine-tuning.
- Full Fine-Tuning: Given a pre-trained model, we apply transfer learning by training all model parameters on a new dataset using a low learning rate. It represents an aspirational benchmark: any PEFT technique that performs as well as or better than full fine-tuning would be deemed successful.
- Head Fine-Tuning: This technique introduces a layer to the front of the model, which then undergoes training. It is minimalist, targeting only the initial aspects of the model and setting the minimal performance expectation. Any PEFT technique that underperforms compared to this baseline would need re-evaluation.
- Token Tuning: TSViT stands out due to its reliance on temporal tokens for segmentation. This intrinsic characteristic opens up the possibility of a unique fine-tuning technique: Token Tuning. By manipulating the temporal tokens, one can effectively alter the output classes. It is a nuanced method, specifically for the TSViT model.
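The baselines above differ only in which parameters are left trainable. The selection logic can be sketched as follows; this is a minimal pure-Python illustration, and the parameter names (`head.weight`, `temporal_cls_token`) are assumptions for the example, not TSViT's actual identifiers:

```python
def select_trainable(param_sizes, mode):
    """Pick which parameters train under each baseline.

    param_sizes: dict mapping parameter name -> number of elements.
    Returns (set of trainable parameter names, trainable fraction).
    """
    def keep(name):
        if mode == "full":   # full fine-tuning: every parameter trains
            return True
        if mode == "head":   # head tuning: only the classification head
            return name.startswith("head.")
        if mode == "token":  # token tuning: only the temporal class tokens
            return "cls_token" in name
        raise ValueError(f"unknown mode: {mode}")

    trainable = {n for n in param_sizes if keep(n)}
    frac = sum(param_sizes[n] for n in trainable) / sum(param_sizes.values())
    return trainable, frac


# Toy parameter table (sizes are made up for illustration).
sizes = {"backbone.weight": 10_000, "temporal_cls_token": 50, "head.weight": 100}
_, frac = select_trainable(sizes, "head")  # head tuning trains ~1% of parameters
```

In a real PyTorch setup the same predicate would set `p.requires_grad` while iterating over `model.named_parameters()`.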
2.3. Datasets
3. Experimental Results
3.1. BitFit and LoRA
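BitFit trains only the bias vectors of a frozen model, while LoRA freezes each weight matrix W and learns a low-rank residual, computing y = Wx + (α/r)·B(Ax) with only the factors A and B trainable. A minimal pure-Python sketch of the LoRA forward pass (shapes and values are illustrative only):

```python
def matvec(M, v):
    """Plain matrix-vector product over nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(x, W, A, B, alpha=1.0, r=1):
    """y = W x + (alpha / r) * B (A x).

    W (d x k) stays frozen; only the low-rank factors
    A (r x k) and B (d x r) receive gradient updates.
    """
    base = matvec(W, x)              # frozen pre-trained path
    delta = matvec(B, matvec(A, x))  # trainable low-rank path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


# Rank-1 example: identity W plus a learned update on the first output.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]    # r x k = 1 x 2
B = [[1.0], [0.0]]  # d x r = 2 x 1
y = lora_forward([1.0, 2.0], W, A, B)  # -> [4.0, 2.0]
```

The appeal for fine-tuning is the parameter count: for a d x k weight, LoRA trains only r(d + k) values instead of dk.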
3.2. VPT and AdaptFormer
- Series 1: External prompt is not used, and the prompt is deep. The F1-score hovers around the low 82% range, regardless of the temporal and spatial dimensions.
- Series 2: We used an external deep prompt, which shows the strongest performance: the highest F1-score of 83.5% is achieved at dimensions (16, 16).
- Series 3: We used an external shallow prompt and witnessed a dip in performance, with F1-scores of 81.36% and 79.00% for dimensions (8, 8) and (4, 4), respectively. This suggests that although the shallow prompt might not be as effective as the deep prompt, it still outperforms head tuning, as shown in Table 1.
- Series 4: We experiment here with setting one dimension (temporal or spatial) at a time to zero while using an external deep prompt. An F1-score of 83.5% is achieved when the temporal dimension is set to 8 and the spatial dimension to zero, but the performance drops significantly to 69.73% the other way around (temporal set to zero).
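The series above vary three knobs: how many prompt tokens are added along the temporal and spatial dimensions, whether the prompt is external, and whether it is deep (fresh tokens at every layer) or shallow (input only). The parameter cost of these choices can be sketched roughly, under the simplifying assumption that each prompt token is one embedding vector and that deep prompts repeat per layer; the embedding width and layer count below are illustrative, not TSViT's:

```python
def prompt_param_count(t_dim, s_dim, embed_dim, n_layers, deep):
    """Learnable parameters in a VPT-style prompt.

    t_dim / s_dim: prompt tokens added along the temporal / spatial dimension.
    deep=True inserts fresh tokens at every transformer layer;
    deep=False (shallow) inserts them only at the input.
    """
    tokens = t_dim + s_dim
    layers = n_layers if deep else 1
    return tokens * embed_dim * layers


# Deep prompts cost n_layers times more parameters than shallow ones.
deep_cost = prompt_param_count(8, 8, embed_dim=128, n_layers=4, deep=True)
shallow_cost = prompt_param_count(8, 8, embed_dim=128, n_layers=4, deep=False)
```

Note that this count is symmetric in the two dimensions; the large accuracy gap in Series 4 therefore comes from where the tokens act in the model, not from how many parameters they add.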
3.3. Munich 480 Dataset
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Tarasiou, M.; Chavez, E.; Zafeiriou, S. ViTs for SITS: Vision Transformers for Satellite Image Time Series. arXiv 2023, arXiv:2301.04944.
- Garnot, V.S.F.; Landrieu, L. Panoptic segmentation of satellite image time series with convolutional temporal attention networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 4872–4881.
- Lialin, V.; Deshpande, V.; Rumshisky, A. Scaling down to scale up: A guide to parameter-efficient fine-tuning. arXiv 2023, arXiv:2303.15647.
- Lester, B.; Al-Rfou, R.; Constant, N. The Power of Scale for Parameter-Efficient Prompt Tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online, 7–11 November 2021; pp. 3045–3059.
- Houlsby, N.; Giurgiu, A.; Jastrzebski, S.; Morrone, B.; De Laroussilhe, Q.; Gesmundo, A.; Attariyan, M.; Gelly, S. Parameter-efficient transfer learning for NLP. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 2790–2799.
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-rank adaptation of large language models. arXiv 2021, arXiv:2106.09685.
- Li, X.L.; Liang, P. Prefix-tuning: Optimizing continuous prompts for generation. arXiv 2021, arXiv:2101.00190.
- Karimi Mahabadi, R.; Henderson, J.; Ruder, S. Compacter: Efficient low-rank hypercomplex adapter layers. Adv. Neural Inf. Process. Syst. 2021, 34, 1022–1035.
- Chen, S.; Ge, C.; Tong, Z.; Wang, J.; Song, Y.; Wang, J.; Luo, P. AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition. In Proceedings of the Thirty-Sixth Annual Conference on Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022.
- Yuan, Y.; Zhan, Y.; Xiong, Z. Parameter-Efficient Transfer Learning for Remote Sensing Image-Text Retrieval. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5619014.
- Zaken, E.B.; Ravfogel, S.; Goldberg, Y. BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. arXiv 2021, arXiv:2106.10199.
- Rußwurm, M.; Körner, M. Multi-temporal land cover classification with sequential recurrent encoders. ISPRS Int. J. Geo-Inf. 2018, 7, 129.
Table 1. PEFT techniques compared with the baselines: trainable parameters and F1-scores.

| Method | Trainable Parameters (%) | F1-Score (%) |
|---|---|---|
| BitFit-Partial Bias | 0.29 | 83.0 |
| BitFit-Full Bias | 0.54 | 83.9 |
| VPT | 0.29 | 83.5 |
| LoRA | 5.87 | 84.76 |
| AdaptFormer | 1.09 | 85.0 |
| Head tune | 0.05 | 56.0 |
| Full fine-tuning | 100 | 84.3 |
Table 2. VPT prompt configurations (Series 1–4).

| Series | Temporal Dimension | Spatial Dimension | External Prompt | Deep Prompt | F1-Score (%) |
|---|---|---|---|---|---|
| Series 1 | 4 | 4 | ✘ | ✓ | 82.12 |
| Series 1 | 8 | 8 | ✘ | ✓ | 82.88 |
| Series 1 | 16 | 16 | ✘ | ✓ | 82.35 |
| Series 2 | 4 | 4 | ✓ | ✓ | 82.16 |
| Series 2 | 8 | 8 | ✓ | ✓ | 82.75 |
| Series 2 | 16 | 16 | ✓ | ✓ | 83.50 |
| Series 3 | 8 | 8 | ✓ | ✘ | 81.36 |
| Series 3 | 4 | 4 | ✓ | ✘ | 79.00 |
| Series 4 | 8 | 0 | ✓ | ✓ | 83.50 |
| Series 4 | 0 | 8 | ✓ | ✓ | 69.73 |
Table 3. Results on the Munich 480 dataset.

| Training Technique | F1-Score (%) | IoU (%) | Trainable Parameters (%) |
|---|---|---|---|
| AdaptFormer | 84.7 | 74.3 | 0.808 |
| Head Tune | 63.9 | 48.4 | 0.079 |
| Partial Token Tune | 67.4 | 52.1 | 0.061 |
| Full Token Tune | 75.2 | 61.3 | 0.208 |
| Training from scratch | 88.9 | 80.7 | 100 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zahweh, M.H.; Nasrallah, H.; Shukor, M.; Faour, G.; Ghandour, A.J. Empirical Study of PEFT Techniques for Winter-Wheat Segmentation. Environ. Sci. Proc. 2024, 29, 50. https://doi.org/10.3390/ECRS2023-15833