LLMDiff: Diffusion Model Using Frozen LLM Transformers for Precipitation Nowcasting
Abstract
1. Introduction
- We propose LLMDiff, a precipitation nowcasting model built on a diffusion framework based on Earthformer. The diffusion formulation is designed to handle the complexity and uncertainty inherent in meteorological conditions, giving a data-driven approach that produces high-quality prediction sequences closely mirroring real-world atmospheric dynamics.
- We use a two-stage method to train an encoder–decoder conditional network and a denoising network. To explore the potential of LLMs in rainfall prediction, the encoder of the denoising network includes a frozen transformer block taken from a pre-trained LLM. This design is intended to improve the accuracy and reliability of precipitation nowcasts.
- LLMDiff outperforms the compared methods on the SEVIR precipitation nowcasting benchmark, reaching a CSI-M of 0.4508 and an MSE of 3.5581, against 0.4343 and 3.6692 for Earthformer, the strongest baseline.
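The idea of a frozen block inside a trainable network can be illustrated with a toy scalar model. This is a minimal sketch and not the authors' implementation: the `Param` class, the three scalar weights, and the learning rate are all hypothetical, chosen only to show that the layers around a frozen block can still be fitted while the frozen weight stays fixed. In a deep learning framework the same effect is usually obtained by disabling gradients on the pre-trained block (for example `requires_grad_(False)` in PyTorch).

```python
from dataclasses import dataclass


@dataclass
class Param:
    """A scalar weight with a flag saying whether the optimizer may change it."""
    value: float
    trainable: bool = True


def forward(x, w_in, w_llm, w_out):
    # Trainable input projection -> frozen "LLM" block -> trainable output projection.
    return w_out.value * (w_llm.value * (w_in.value * x))


def sgd_step(x, target, w_in, w_llm, w_out, lr=0.05):
    """One gradient step on 0.5 * (prediction - target)^2.

    Returns the loss measured before the update.
    """
    err = forward(x, w_in, w_llm, w_out) - target
    # Gradients are written out for every weight, frozen or not.
    grads = [
        (w_in, err * w_out.value * w_llm.value * x),
        (w_llm, err * w_out.value * w_in.value * x),
        (w_out, err * w_llm.value * w_in.value * x),
    ]
    for param, grad in grads:
        if param.trainable:  # frozen weights are skipped by the update
            param.value -= lr * grad
    return 0.5 * err * err


w_in = Param(0.5)
w_llm = Param(1.5, trainable=False)  # stands in for the pre-trained, frozen block
w_out = Param(0.5)

losses = [sgd_step(1.0, 2.0, w_in, w_llm, w_out) for _ in range(200)]
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.2e}, frozen weight {w_llm.value}")
```

The loss falls because the two trainable projections adapt to the fixed block, which is the role the surrounding encoder layers would play around a frozen LLM transformer block.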
2. Related Work
2.1. Deterministic Predictive Models
2.2. Probabilistic Predictive Models
2.3. Denoise Predictive Models
3. LLMDiff
3.1. Preliminaries
3.2. Overall Architecture
3.2.1. Condition Network
3.2.2. Denoising Design Structure
3.2.3. Frozen LLM Transformer Module
3.3. Parameter Settings for Model Architecture
4. Experiments
4.1. Dataset
4.2. Evaluation Metric
4.3. Comparison Analysis
4.4. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Dataset | Train Size | Val Size | Test Size | Input Seq. Len | Output Seq. Len | Spatial Resolution |
---|---|---|---|---|---|---|
SEVIR | 35,718 | 9,060 | 12,159 | 13 | 12 | 384 × 384 |
Model | CSI-M↑ | CSI-219↑ | CSI-181↑ | CSI-160↑ | CSI-133↑ | CSI-74↑ | CSI-16↑ | MSE↓ | CRPS↓ |
---|---|---|---|---|---|---|---|---|---|
U-Net | 0.3593 | 0.0577 | 0.1580 | 0.2157 | 0.3274 | 0.6531 | 0.7441 | 4.1119 | \ |
ConvLSTM | 0.4185 | 0.1288 | 0.2482 | 0.2928 | 0.4052 | 0.6793 | 0.7569 | 3.7532 | 0.0264 |
PredRNN | 0.4080 | 0.1312 | 0.2324 | 0.2767 | 0.3858 | 0.6713 | 0.7507 | 3.9014 | 0.0271 |
PhyDNet | 0.3940 | 0.1288 | 0.2309 | 0.2708 | 0.3720 | 0.6556 | 0.7059 | 4.8165 | 0.0253 |
E3D-LSTM | 0.4038 | 0.1239 | 0.2270 | 0.2675 | 0.3825 | 0.6645 | 0.7573 | 4.1702 | \ |
Rainformer | 0.3661 | 0.0831 | 0.1670 | 0.2167 | 0.3438 | 0.6585 | 0.7277 | 4.0272 | \ |
Earthformer | 0.4343 | 0.1675 | 0.2815 | 0.3138 | 0.4201 | 0.6845 | 0.7385 | 3.6692 | 0.0251 |
LLMDiff | 0.4508 | 0.1812 | 0.2817 | 0.3305 | 0.4313 | 0.6956 | 0.7576 | 3.5581 | 0.0245 |
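The CSI columns in the table can be read with the standard definition of the critical success index, CSI = hits / (hits + misses + false alarms), evaluated after binarizing prediction and observation at a threshold. The sketch below is an illustration under stated assumptions, not the paper's evaluation code: it assumes the column suffixes (16, 74, 133, 160, 181, 219) are pixel-value thresholds, that a pixel counts as an event when its value is greater than or equal to the threshold, and that CSI-M is the plain mean of the six per-threshold scores. The helper names and the toy pixel values are hypothetical.

```python
THRESHOLDS = (16, 74, 133, 160, 181, 219)


def csi(pred, truth, threshold):
    """Critical success index for two equal-length sequences of pixel values."""
    hits = misses = false_alarms = 0
    for p, t in zip(pred, truth):
        p_event, t_event = p >= threshold, t >= threshold
        if p_event and t_event:
            hits += 1
        elif t_event:
            misses += 1
        elif p_event:
            false_alarms += 1
    denom = hits + misses + false_alarms
    # Assumption: score 0.0 when no event is predicted or observed.
    return hits / denom if denom else 0.0


def csi_m(pred, truth, thresholds=THRESHOLDS):
    """Mean CSI over all thresholds."""
    return sum(csi(pred, truth, th) for th in thresholds) / len(thresholds)


# Toy flattened frames (hypothetical values, not SEVIR data).
pred = [0, 20, 80, 200, 230, 10]
truth = [0, 30, 60, 220, 225, 90]
print(csi(pred, truth, 16))  # 4 hits, 1 miss, 0 false alarms
print(csi(pred, truth, 74))  # 2 hits, 1 miss, 1 false alarm

# Averaging the six per-threshold scores of the Earthformer row.
earthformer_row = [0.1675, 0.2815, 0.3138, 0.4201, 0.6845, 0.7385]
print(round(sum(earthformer_row) / len(earthformer_row), 4))
```

Under the mean assumption, the last line reproduces the CSI-M value reported for Earthformer in the table.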
Model | CSI-M↑ | CSI-219↑ | CSI-181↑ | CSI-160↑ | CSI-133↑ | CSI-74↑ | CSI-16↑ | MSE↓ |
---|---|---|---|---|---|---|---|---|
Earthformer | 0.4343 | 0.1675 | 0.2815 | 0.3138 | 0.4201 | 0.6845 | 0.7385 | 3.669 |
+diffusion | 0.4336 | 0.1757 | 0.2782 | 0.3171 | 0.4242 | 0.6863 | 0.7512 | 3.6352 |
+LLaMA | 0.4239 | 0.1369 | 0.2549 | 0.2989 | 0.4066 | 0.6820 | 0.7539 | 3.6633 |
LLMDiff | 0.4508 | 0.1812 | 0.2817 | 0.3305 | 0.4313 | 0.6956 | 0.7576 | 3.5581 |
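The ablation rows are easier to compare as differences from the Earthformer baseline. The short sketch below does that arithmetic for the CSI-M and MSE columns only. The numbers are copied from the table above, while the dictionary layout and the function name are hypothetical.

```python
# (CSI-M, MSE) per model, copied from the ablation table.
ABLATION = {
    "Earthformer": (0.4343, 3.669),
    "+diffusion": (0.4336, 3.6352),
    "+LLaMA": (0.4239, 3.6633),
    "LLMDiff": (0.4508, 3.5581),
}


def deltas_vs_baseline(table, baseline="Earthformer"):
    """Return {model: (delta CSI-M, delta MSE)} relative to the baseline row."""
    base_csi, base_mse = table[baseline]
    return {
        name: (round(csi_m - base_csi, 4), round(mse - base_mse, 4))
        for name, (csi_m, mse) in table.items()
        if name != baseline
    }


for name, (d_csi, d_mse) in deltas_vs_baseline(ABLATION).items():
    print(f"{name:12s} dCSI-M {d_csi:+.4f}  dMSE {d_mse:+.4f}")
```

Read this way, each component added on its own lowers MSE slightly but does not raise CSI-M, whereas the combined LLMDiff model improves both.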
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
She, L.; Zhang, C.; Man, X.; Shao, J. LLMDiff: Diffusion Model Using Frozen LLM Transformers for Precipitation Nowcasting. Sensors 2024, 24, 6049. https://doi.org/10.3390/s24186049