1. Introduction
Precipitation is a crucial element of Earth’s climate system. It influences many areas of human life, such as agriculture and water resource management. Its variability and extremes can pose considerable risks to people and property, which underscores the importance of disaster mitigation strategies. Accurate and timely precipitation estimates are vital for effective decision making in these domains. Traditional precipitation estimation methods rely primarily on ground-based observations from rain gauges and weather radar systems. However, rain gauges are too sparsely distributed to capture the spatial variability of precipitation accurately. Radars provide real-time precipitation measurements with high spatiotemporal resolution, but their installation and maintenance costs limit their widespread use. As of 2023, one-third of Earth’s countries, including populous regions such as Africa, lack coverage from precipitation radars [1].
On the other hand, meteorological satellites allow for global observations, and satellite remote sensing has emerged as a powerful tool for precipitation estimation, offering comprehensive spatial coverage and high-frequency measurements. Algorithms for satellite-based precipitation estimation typically merge infrared (IR) measurements from geosynchronous-Earth-orbiting (GEO) satellites with passive microwave (PMW) data from low-Earth-orbiting (LEO) satellites. GEO satellites provide nearly global monitoring at high spatial resolution, with a timestep of 5 to 15 min. However, visible and IR channels only provide information about the cloud top, so the precipitation estimate is indirect. Meanwhile, PMW measurements, which are directly sensitive to hydrometeors, are only available twice a day for a given area and satellite [2] and have relatively poor spatial resolution.
Active microwave observations from radars onboard satellites, such as the Tropical Rainfall Measuring Mission (TRMM, 1998–2014) [3] and the ongoing Global Precipitation Measurement Core Observatory mission (GPMCO, 2014–present) [4], are also available. However, these satellites cover a very limited area; for instance, the swath of the GPMCO radar spans only 245 km.
Over the past two decades, numerous satellite-based precipitation estimation products have been developed by combining various types of data. These products are widely used for monitoring natural disasters, initializing numerical weather prediction models, and evaluating precipitation forecasts. Some products use IR data as their only input. The Global Hydro Estimator (GHE, [5]) employs a fixed relationship between IR data and rainfall rates, initially calibrated with radar data. The PERSIANN Dynamic Infrared–Rain Rate Model (PDIR, [6]) calibrates IR data with PMW datasets and climatology data via several machine learning algorithms. Quantitative Precipitation Estimation (QPE, [7]) and P-IN-SEVIRI [8], by contrast, use IR data calibrated in real time against the most recent PMW data through the SCaMPR [9] and Rapid Update [10] algorithms, respectively.
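The contrast between a fixed IR–rain relationship and real-time calibration against PMW retrievals can be illustrated with generic probability matching (histogram matching). This technique pairs quantiles of collocated IR brightness temperatures with the opposite quantiles of PMW rain rates. The sketch below is not the SCaMPR or Rapid Update algorithm. The function names and the synthetic data are hypothetical, and the sketch assumes that colder cloud tops correspond to heavier rain.

```python
import numpy as np

def calibrate_ir_to_pmw(tb_collocated, rain_pmw, n_quantiles=101):
    """Build a monotonic lookup table from IR brightness temperature (K) to rain rate (mm/h).

    Probability matching: the q-quantile of Tb is paired with the (1 - q)-quantile
    of rain rate, assuming colder cloud tops produce heavier rain.
    """
    q = np.linspace(0.0, 1.0, n_quantiles)
    tb_q = np.quantile(tb_collocated, q)        # increasing brightness temperatures
    rain_q = np.quantile(rain_pmw, 1.0 - q)     # correspondingly decreasing rain rates
    return tb_q, rain_q

def apply_calibration(tb, tb_q, rain_q):
    """Estimate rain rates for new IR pixels by interpolating in the lookup table."""
    return np.interp(tb, tb_q, rain_q)  # tb_q is increasing, as np.interp requires

# Synthetic collocations (hypothetical): rain increases linearly below 250 K.
rng = np.random.default_rng(0)
tb_obs = rng.uniform(200.0, 290.0, 5000)
rain_obs = np.clip(0.5 * (250.0 - tb_obs), 0.0, None)

tb_q, rain_q = calibrate_ir_to_pmw(tb_obs, rain_obs)
estimates = apply_calibration(np.array([210.0, 240.0, 280.0]), tb_q, rain_q)
```

Re-running the calibration on each new batch of collocated PMW overpasses is what would make such a scheme track the most recent PMW data. A fixed relationship, by contrast, is calibrated once.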
Other products combine several data types. The Global Satellite Mapping of Precipitation (GSMaP, [11]) combines microwave data with IR data and rain gauges through a Kalman filter. The Integrated Multi-satellitE Retrievals for GPM (IMERG, [12]) aims to intercalibrate and merge all PMW precipitation estimates with IR estimates, precipitation gauge analyses, and potentially other precipitation estimators. All the products mentioned provide rainfall estimates within an hour, except for IMERG, whose “Early Run” is available four hours after the start of data acquisition.
Table 1 summarizes the characteristics of these products, which will be used for comparison.
Recent advances in computing technology and the growing availability of large-scale satellite datasets have opened new opportunities for improving precipitation estimation with deep learning methods. Deep learning, a branch of machine learning, has attracted significant interest across scientific disciplines because it can learn intricate patterns and relationships directly from large datasets. Applying deep learning techniques, particularly deep convolutional neural networks (CNNs), to satellite data has produced promising results in several tasks, including image classification [13], object detection [14], and image segmentation [15]. Importantly, CNNs can efficiently extract complex spatial features from images without meticulous feature engineering.
In the context of precipitation estimation, deep learning algorithms can potentially exploit the abundant information in satellite imagery to produce accurate, real-time precipitation estimates, and several studies have already demonstrated this potential. Many researchers have used IR channels as their sole input and ground radars in the United States of America (USA) as their training target to train various networks, such as Stacked Denoising Autoencoders [16], CNNs [17,18], U-Net, and Conditional Generative Adversarial Networks (cGANs) [19]. Other researchers, also working with USA ground radar data, have combined IR data with PMW data using a Multi-Layer Perceptron (MLP) [20] or Generative Adversarial Networks (GANs) [21]. The authors of [22] pretrained their network on USA ground radar data and fine-tuned it on sparse Chinese radar data, achieving better performance than training directly on the Chinese data. More recently, [23] is the only study to use Level 3 IMERG GPM data instead of ground radar data as the target. Its authors focused on the southeast coast of China and trained an Attention U-Net with five IR channels as input.
Deep learning has not yet been applied to satellite precipitation estimation at the global scale, so it remains an open question whether deep neural networks can generalize to different parts of the globe and diverse climates. In this study, we build upon previous methodologies by training a state-of-the-art DeepLabV3+ architecture on a global precipitation dataset. This dataset pairs geostationary satellite data and additional descriptive features, used as input, with precipitation measurements from the GPMCO satellite, used as the target. Various configurations of input data, loss functions, and hyperparameters are examined to identify the best-performing model, whose performance is then evaluated globally against the GPMCO data. Additionally, we compare our model with six operational satellite-based precipitation products, using ground-based radar measurements from the French radar network as the reference. This comparison is conducted across four distinct regions spread across the globe.
This paper is structured as follows: Section 2.1 introduces the datasets and study regions; Section 2.2 describes the neural network and the experimental protocol; Section 3 evaluates the quality of the model with various input data (Section 3.1), against a test set of GPMCO data (Section 3.2), in comparison with other operational products (Section 3.3), and finally through case studies (Section 3.4); and Sections 4 and 5 present the discussion and conclusions and outline future prospects.
3. Results
3.1. Choice of Best Configuration
This section examines how the input features influence the quality of the neural network’s output. The aim is to identify the configuration that gives the most accurate precipitation estimates from satellite observations. The investigation proceeds methodically: starting from all available data, we progressively remove different input features.
First, we evaluated the effect of varying the number of GEO images used as input. Figure 4a illustrates the performance of the model as the number of input images varies. Interestingly, using only the most recent image improved the FSS of the precipitation estimates. As the figure shows, the model’s performance generally improved as historical images were removed, although it does not vary strictly with the number of images. This suggests that the most recent image provides the most relevant information for estimating the current precipitation, and that additional images introduce either noise or redundant information.
Next, we investigated the influence of the different channels on the model’s performance. Figure 4b highlights the importance of channel selection. First, combining the IR108 channel with WV062 significantly improved the FSS across all rain rate thresholds compared with using either channel alone. Second, adding channels such as VIS006 and IR120 did not improve the model’s FSS and in fact degraded it. As with the historical series of images, the IR120 and VIS006 channels may only introduce noise or redundant information. These findings suggest that the combination of the IR108 and WV062 channels provides sufficient information for reliable precipitation estimates.
We then evaluated the significance of our additional descriptive features: latitude, longitude, sun elevation, and date information. These features were included to enhance the model’s capacity to generalize across regions and climates. We trained the model three times: once with all the additional features, once with only the latitude feature, and once without any additional features. Figure 4c shows that incorporating these features does not improve the model’s performance but actually degrades it.
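To add descriptive features as extra input channels, as in the experiment above, scalars and coordinates must first be turned into image-shaped planes. The sketch below shows one common way to do this. The normalizations and the sine/cosine encodings of longitude and day of year are assumptions for illustration, not the encoding used for Espresso, and `descriptive_channels` is a hypothetical helper. Sun elevation is assumed to be available as a precomputed per-pixel map.

```python
import numpy as np

def descriptive_channels(lat2d, lon2d, sun_elev2d, day_of_year):
    """Stack auxiliary features as extra CNN input channels (hypothetical encoding).

    lat2d, lon2d, sun_elev2d: arrays of shape (H, W) in degrees.
    day_of_year: 1..366, encoded cyclically so that 31 Dec and 1 Jan are close.
    Returns an array of shape (6, H, W).
    """
    angle = 2.0 * np.pi * (day_of_year - 1) / 365.25
    ones = np.ones_like(lat2d, dtype=np.float32)
    channels = [
        lat2d / 90.0,                  # latitude scaled to [-1, 1]
        np.sin(np.deg2rad(lon2d)),     # longitude, cyclic encoding
        np.cos(np.deg2rad(lon2d)),
        sun_elev2d / 90.0,             # sun elevation scaled to [-1, 1]
        np.sin(angle) * ones,          # date: constant plane for the whole sample
        np.cos(angle) * ones,
    ]
    return np.stack(channels, axis=0).astype(np.float32)

# Usage on a hypothetical 48 x 64 tile.
lon2d, lat2d = np.meshgrid(np.linspace(-30.0, 30.0, 64), np.linspace(-20.0, 20.0, 48))
aux = descriptive_channels(lat2d, lon2d, np.zeros_like(lat2d), day_of_year=172)
```

These planes would simply be concatenated to the GEO channels along the channel axis. This is the kind of "additional channel" approach whose limitations are discussed in Section 4.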
These findings contradict our initial assumption that more features would help the network generalize to different situations and improve its performance. Our hypothesis is that these additional sources of information are not relevant, and that the variability in the model’s performance is due more to the intrinsic variability of weight initialization and training than to the added features. We therefore settled on an optimal configuration for the precipitation estimation model. This final configuration, named Espresso, uses only the most relevant information: the most recent GEO image, restricted to the IR108 and WV062 channels. This configuration is used in all subsequent experiments.
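The channel and time-step selection that defines Espresso can be expressed as a simple slicing step on the GEO data. This sketch assumes a hypothetical in-memory layout: a dictionary mapping channel names to arrays of shape (T, H, W), oldest frame first. The function name and the layout are illustrative and are not taken from the Espresso code.

```python
import numpy as np

ESPRESSO_CHANNELS = ("IR108", "WV062")  # channels retained in the final configuration

def build_input(frames_by_channel, channels=ESPRESSO_CHANNELS, n_times=1):
    """Stack the last `n_times` frames of each selected channel.

    frames_by_channel: mapping channel name -> array of shape (T, H, W),
    ordered oldest to newest (hypothetical layout).
    Returns an array of shape (len(channels) * n_times, H, W).
    """
    if n_times < 1:
        raise ValueError("n_times must be at least 1")
    selected = [frames_by_channel[name][-n_times:] for name in channels]
    return np.concatenate(selected, axis=0).astype(np.float32)

# Hypothetical example: four time steps of four SEVIRI-like channels on a 32 x 32 tile.
rng = np.random.default_rng(1)
frames = {name: rng.normal(size=(4, 32, 32)) for name in ("IR108", "WV062", "VIS006", "IR120")}
espresso_input = build_input(frames)                                   # shape (2, 32, 32)
full_input = build_input(frames, channels=tuple(frames), n_times=4)    # shape (16, 32, 32)
```

The ablations in this section correspond to varying `channels` and `n_times` in such a function. Espresso keeps the two-channel, single-frame case.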
3.2. Evaluation of Espresso on the Test Set
Having established the optimal configuration for the Espresso model, we evaluated its performance on the test set of GPMCO data. This evaluation provides insights into the model’s accuracy and its ability to estimate precipitation effectively.
The evaluation begins with an example of rainfall estimation from Espresso. Figure 5 depicts the case of cyclone Emnati, which developed into a category 4 tropical cyclone when it passed north of La Réunion and Mauritius on 20 February 2022. The cyclone caused flooding, and wind gusts of 163 km/h were recorded at the Maido station in La Réunion. The figure shows the two channels used as input to the neural network (IR108 and WV062), the Espresso estimate, and the GPMCO rainfall measurement. Comparing Espresso with the GPMCO reveals several characteristics. The Espresso rainfall field appears smoother and more spread out than the GPMCO data. As a result, Espresso’s spatial precision is not as fine as that of the GPMCO, and it misses some light rain in the north and south of the GPM swath. Moreover, Espresso tends to enlarge rain areas, which results in a consistent overestimation of precipitation within and on the fringes of rain cells. This behavior can be attributed to the nature of a deep learning regression model trained with the MSE, which cannot generate fields as sharp and discontinuous as the GPMCO’s. Nevertheless, Espresso captures the structure of the cyclone well, including the eyewall with its peak rainfall intensity and the surrounding rainbands. The cyclone is correctly positioned, and its intense rainfall is accurately represented.
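An MSE-trained regressor tends to smooth and spread rain because of how it handles uncertainty. When the exact position of a cell is uncertain, the prediction that minimizes the expected squared error is the average of the plausible fields. The toy example below uses two equally likely positions of a one-dimensional rain cell. It is a generic illustration of this property of the MSE, offered as one plausible reading of Figure 5 rather than an analysis of Espresso itself.

```python
import numpy as np

# Two equally likely "truths": the same 20 mm/h cell, shifted by two pixels.
truth_a = np.zeros(8)
truth_a[2:4] = 20.0
truth_b = np.zeros(8)
truth_b[4:6] = 20.0
truths = np.stack([truth_a, truth_b])

def expected_mse(prediction):
    """Mean squared error averaged over the equally likely truths."""
    return np.mean((truths - prediction) ** 2)

smooth = truths.mean(axis=0)   # MSE-optimal: 10 mm/h spread over four pixels
sharp = truth_a.copy()         # a sharp guess that commits to one position

print(expected_mse(smooth), expected_mse(sharp))
```

The averaged field halves the expected error (50 versus 100). However, it places rain wherever either truth has rain, at half the peak intensity. This mirrors the enlarged rain areas and softened details seen in Figure 5.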
Second, Figure 6 shows the confusion matrix, which gives a comprehensive view of the model’s ability to classify rainfall intensities correctly. The model performs strongly in identifying the “No rain” category. However, its accuracy diminishes at higher rainfall thresholds, where it often underestimates precipitation rates. For instance, more than half of the “Light rain” cases are misclassified as “No rain”. Meanwhile, the model misses fewer medium and heavy rainfall cases: it still detects some rain in most of them, even if the amount is underestimated.
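A confusion matrix such as the one in Figure 6 can be computed by binning observed and estimated rain rates into categories and counting the pairs. The category thresholds below are hypothetical placeholders; the thresholds used for Figure 6 are defined elsewhere in the paper.

```python
import numpy as np

CATEGORIES = ("No rain", "Light rain", "Medium rain", "Heavy rain")
EDGES = np.array([0.1, 2.5, 10.0])  # hypothetical thresholds in mm/h

def rain_category(rate):
    """0 below 0.1, 1 in [0.1, 2.5), 2 in [2.5, 10), 3 at or above 10 mm/h."""
    return np.digitize(rate, EDGES)

def confusion_matrix(reference, estimate, n=len(CATEGORIES)):
    """Counts with rows = reference (GPMCO) category, columns = estimated category."""
    pair_index = rain_category(reference.ravel()) * n + rain_category(estimate.ravel())
    return np.bincount(pair_index, minlength=n * n).reshape(n, n)

# Usage: four pixels, where heavier rain is underestimated by one category.
reference = np.array([0.0, 0.5, 5.0, 20.0])
estimate = np.array([0.0, 0.0, 3.0, 5.0])
cm = confusion_matrix(reference, estimate)
```

Entries below the diagonal count underestimated cases, such as light rain classified as no rain. Normalizing each row by its sum gives per-category accuracies.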
To further assess the model’s performance, we calculated the Fractions Skill Score (FSS), Probability of Detection (POD), and False Alarm Rate (FAR) of Espresso on the test set.
Figure 7 gives an overview of the FSS across latitude bands. In each band, the FSS decreases as the rainfall threshold increases, consistent with the trends observed in the confusion matrix. This can be attributed to the neural network’s difficulty in generalizing to infrequent events that are not well represented in its training dataset. Additionally, as observed in Figure 5, the model tends to smooth out the finer details present in the GPMCO data, occasionally omitting isolated pixels of intense rainfall.
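For reference, the FSS works on neighbourhood fractions for a given rain-rate threshold. For each pixel, it takes the fraction of pixels exceeding the threshold within a square neighbourhood, in both the estimate and the reference. It then computes FSS = 1 − MSE(fractions) / (mean(P_est²) + mean(P_ref²)). The NumPy sketch below implements this standard definition with zero padding at the edges. The neighbourhood size and threshold are free parameters here, not the values used in this study.

```python
import numpy as np

def neighbourhood_fraction(binary, n):
    """Fraction of True pixels in the n x n window centred on each pixel (n odd)."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    pad = n // 2
    padded = np.pad(binary.astype(float), pad, mode="constant")
    # Summed-area table with a leading row and column of zeros.
    sat = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    sat[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = binary.shape
    window_sum = (sat[n:n + h, n:n + w] - sat[:h, n:n + w]
                  - sat[n:n + h, :w] + sat[:h, :w])
    return window_sum / (n * n)

def fss(reference, estimate, threshold, n):
    """Fractions Skill Score of `estimate` against `reference` for one threshold."""
    f_ref = neighbourhood_fraction(reference >= threshold, n)
    f_est = neighbourhood_fraction(estimate >= threshold, n)
    mse = np.mean((f_est - f_ref) ** 2)
    mse_worst = np.mean(f_est ** 2) + np.mean(f_ref ** 2)
    return 1.0 - mse / mse_worst if mse_worst > 0 else float("nan")

# A single rain pixel displaced by one column.
ref = np.zeros((11, 11))
ref[5, 5] = 5.0
est = np.zeros((11, 11))
est[5, 6] = 5.0
print(fss(ref, est, threshold=1.0, n=1), fss(ref, est, threshold=1.0, n=3))
```

For this one-pixel displacement, the score rises from 0 with n = 1 to 2/3 with n = 3. This shows how the FSS rewards estimates that are close to, but not exactly at, the observed rain. Isolated intense pixels that the model smooths away, by contrast, lower it at high thresholds.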
Furthermore, the model performs best near the equator, with a higher FSS for all three rain thresholds. In contrast, in the [−70°; −50°] and [50°; 70°] latitude bands, the FSS for heavy rainfall ranges from poor to very poor. The POD and FAR, available in Appendix C, support the same conclusions, even though the difference in POD between the high latitudes and the equator is less marked.
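POD and FAR by latitude band, like those reported in Appendix C, can be computed from a pixel-wise contingency table restricted to each band. This sketch assumes the common definitions POD = hits / (hits + misses) and FAR = false alarms / (hits + false alarms); the latter is sometimes called the false alarm ratio. The 20° band edges and the threshold are illustrative only.

```python
import numpy as np

def pod_far(reference, estimate, threshold):
    """POD and FAR for one rain-rate threshold from a pixel-wise contingency table."""
    obs_rain = reference >= threshold
    est_rain = estimate >= threshold
    hits = np.count_nonzero(obs_rain & est_rain)
    misses = np.count_nonzero(obs_rain & ~est_rain)
    false_alarms = np.count_nonzero(~obs_rain & est_rain)
    pod = hits / (hits + misses) if hits + misses > 0 else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms > 0 else float("nan")
    return pod, far

def scores_by_latitude_band(lat, reference, estimate, threshold, bands):
    """Apply pod_far separately to the pixels of each half-open band [lo, hi)."""
    scores = {}
    for lo, hi in bands:
        in_band = (lat >= lo) & (lat < hi)
        scores[(lo, hi)] = pod_far(reference[in_band], estimate[in_band], threshold)
    return scores

# Hypothetical 20-degree bands between 70 S and 70 N.
BANDS = [(-70, -50), (-50, -30), (-30, -10), (-10, 10), (10, 30), (30, 50), (50, 70)]
```

A band with no observed rain above the threshold yields an undefined POD (NaN). This is one reason scores for heavy rainfall in sparsely sampled high-latitude bands can be noisy.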
These results suggest that while the model can accurately detect and estimate precipitation in temperate and tropical regions, it struggles to do so at higher latitudes. Despite our precautions of oversampling the dataset and weighting the loss at higher latitudes, heavy rainfall in these regions remains too scarce for the network to learn the rain patterns specific to high latitudes. In addition, at high latitudes the tropopause is lower and the GEO satellite views the scene at a more oblique angle, which diminishes the contrast between precipitating and non-precipitating clouds. This makes rain detection more challenging, even for the human eye. It is also worth noting that the overall FSS, POD, and FAR scores for each latitude band are relatively low. Large variations in precipitation between adjacent pixels, caused by localized phenomena, are difficult for a deep learning model to reproduce accurately, as seen in Figure 5.
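The two precautions mentioned above, oversampling high-latitude samples and weighting the loss there, can be sketched as follows. The quadratic weight profile, the `alpha` parameter, and the equal-probability-per-band sampling rule are hypothetical choices made for illustration. The scheme actually used to train Espresso may differ.

```python
import numpy as np

def latitude_weights(lat, alpha=2.0):
    """Hypothetical per-pixel loss weights: 1 at the equator, 1 + alpha at the poles."""
    return 1.0 + alpha * (np.abs(lat) / 90.0) ** 2

def weighted_mse(pred, target, lat, alpha=2.0):
    """Mean squared error in which high-latitude pixels count more."""
    w = latitude_weights(lat, alpha)
    return np.sum(w * (pred - target) ** 2) / np.sum(w)

def oversampling_probabilities(sample_lat, band_edges):
    """Sampling probabilities giving each occupied |latitude| band the same total weight."""
    band = np.digitize(np.abs(sample_lat), band_edges)
    counts = np.bincount(band)
    p = 1.0 / counts[band]
    return p / p.sum()

# Usage: draw a training batch in which rare high-latitude samples appear more often.
rng = np.random.default_rng(0)
sample_lat = np.array([0.0, 5.0, 12.0, 61.0])
probs = oversampling_probabilities(sample_lat, band_edges=[50.0])
batch = rng.choice(len(sample_lat), size=8, p=probs)
```

Both mechanisms only reweight the examples that already exist. If heavy high-latitude rain is very rare, they cannot supply new patterns to learn from, which is consistent with the limitation described above.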
These results, based on the 2022 GPMCO data, underscore Espresso’s ability to estimate precipitation effectively across a range of rainfall categories in temperate and tropical regions. Closer to the poles, it often misses rainfall, especially heavy rainfall. Although the model cannot reproduce the fine details of the GPMCO data, as is evident from Figure 5, it may still prove useful to forecasters in situations of extreme rainfall.
3.3. Comparison with Other Operational Products
To evaluate the Espresso model thoroughly, we carried out a comparative analysis against six other operational precipitation estimation products: IMERG, GHE, QPE, PDIR-NOW, P-IN-SEVIRI, and GSMaP. This evaluation used 1000 samples of 1 h accumulated precipitation from 2022, spanning our four radar domains. Note that QPE and P-IN-SEVIRI are produced in the field of view of a specific geostationary satellite; thus, they are not available over all the French radar domains.
Figure 8 presents the RMSE of each product in each radar domain. RMSE values are higher in the tropical domains, which experience heavier rainfall than mainland France, leading to larger errors for all the models. Espresso’s error against ground radar data is on par with that of the other models, with IMERG, GSMaP, and PDIR-NOW competing for the lowest RMSE. However, the RMSE alone does not reveal how these errors are distributed, for example between missed rainfall and false alarms.
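Comparing products on 1 h accumulations involves two steps, because some products (such as QPE and P-IN-SEVIRI) are missing over part of the domains. Rain rates must first be accumulated over the hour, and the RMSE must then be restricted to pixels where both the product and the radar are valid. The sketch assumes rain rates in mm/h at a fixed, hypothetical timestep, a rectangle-rule accumulation, and NaN for missing coverage. It is not the evaluation code used in this study.

```python
import numpy as np

def hourly_accumulation(rates_mm_per_h, step_minutes):
    """Accumulate one hour of rain rates (mm/h), stacked on axis 0, into a total (mm).

    Rectangle rule: each rate is assumed constant over its time step.
    """
    if 60 % step_minutes != 0 or rates_mm_per_h.shape[0] != 60 // step_minutes:
        raise ValueError("expected exactly one hour of equally spaced time steps")
    return rates_mm_per_h.sum(axis=0) * (step_minutes / 60.0)

def masked_rmse(estimate_mm, radar_mm):
    """RMSE over pixels where both the product and the radar have valid (finite) values."""
    valid = np.isfinite(estimate_mm) & np.isfinite(radar_mm)
    if not valid.any():
        return float("nan")
    return float(np.sqrt(np.mean((estimate_mm[valid] - radar_mm[valid]) ** 2)))

# Usage: two 30 min rain-rate fields, and a product missing over one radar pixel.
rates = np.array([[2.0, 0.0, 6.0],
                  [4.0, 0.0, 6.0]])
accum = hourly_accumulation(rates, step_minutes=30)   # [3.0, 0.0, 6.0] mm
radar = np.array([3.0, 1.0, 4.0])
product = np.array([accum[0], np.nan, accum[2]])
rmse = masked_rmse(product, radar)                    # sqrt((0**2 + 2**2) / 2)
```

Because the RMSE squares the errors, a few large misses in heavy tropical rain dominate it. This is one reason the RMSE is higher in the tropical domains and why the categorical scores examined next are complementary.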
Next, we examined the FSS, POD, and FAR. Figure 9 displays the FSS of each model for the France domain. The FSS for the other domains, along with the POD and FAR for each domain, can be found in Appendix D.1 and Appendix D.2.
Espresso performs best in the FR, NC, and RE domains, achieving a higher FSS and POD than the other models. IMERG ranks second, while the remaining models are of comparable quality to one another but fall short of both Espresso and IMERG. However, across all the domains, Espresso is outperformed by IMERG, and occasionally by GSMaP and GHE, in terms of the FAR for weak and moderate rainfall. For heavy rainfall, Espresso has a lower FAR than IMERG and GSMaP, although in the NC and FR domains it is outperformed by GHE.
In the AG domain, the GHE and QPE models outperform IMERG in terms of the FSS and POD. While Espresso retains the best FSS, it is clearly outperformed by GHE and QPE in detecting moderate and heavy rainfall. Among the global models, Espresso outperforms the others on nearly all FSS and POD scores, particularly for heavy precipitation events.
These results indicate that Espresso is better than the other global products at detecting and localizing rainfall, especially heavy rainfall. It achieves a higher POD and FSS across all four domains, which have distinct climates. At the same time, its FAR is comparable to that of the other products, which indicates that its better detection does not come at the cost of substantially more false alarms, an important property for crisis management. The smaller difference from the other products in terms of the FAR and RMSE can be explained by the model’s intrinsic spreading of precipitation, which causes false alarms on the periphery of precipitation cells.
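The trade-off described above can be reproduced with an idealized toy case: spreading rain around cells raises detection but adds false alarms at their edges. In this case, the estimate is simply the observed rain mask dilated by one pixel. The dilation is an assumption that stands in for the smoothing of an MSE-trained network, for illustration only.

```python
import numpy as np

def dilate(mask, radius=1):
    """Binary dilation with a (2*radius + 1)-wide square, built from shifted copies."""
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    h, w = mask.shape
    out = np.zeros((h, w), dtype=bool)
    for di in range(2 * radius + 1):
        for dj in range(2 * radius + 1):
            out |= padded[di:di + h, dj:dj + w]
    return out

def pod_far(observed, estimated):
    """POD and FAR from boolean rain masks (assumes at least one hit or miss)."""
    hits = np.count_nonzero(observed & estimated)
    misses = np.count_nonzero(observed & ~estimated)
    false_alarms = np.count_nonzero(~observed & estimated)
    return hits / (hits + misses), false_alarms / (hits + false_alarms)

observed = np.zeros((10, 10), dtype=bool)
observed[4:7, 4:7] = True           # a 3 x 3 rain cell
estimated = dilate(observed)        # the same cell spread to 5 x 5

pod, far = pod_far(observed, estimated)  # POD = 9/9, FAR = 16/25
```

Every observed rain pixel is detected, but 16 of the 25 estimated rain pixels lie on the ring around the cell. A small positional error in a real product would turn some of those fringe pixels into hits, which helps explain why spreading can still pay off in POD and FSS.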
These findings position Espresso as a viable and reliable alternative to existing operational products for precipitation estimation. Espresso delivers real-time, accurate, and efficient precipitation estimates, comparable to or better than the widely recognized IMERG, without the associated data availability time delays.
3.4. Case Study
To further assess the Espresso model’s performance and visually compare it to the other six operational products, Météo-France forecasters conducted a double-blind review involving 15 instances of extreme precipitation events. Where available, their assessments were made in reference to radar readings and rain gauge measurements. The evaluation criteria included the maximum rainfall captured by the radar and predicted by the various models, the spatial distribution and spread of rainfall, and the structural representation of the event.
Overall, the forecasters preferred Espresso, primarily because of its superior ability to localize events and the closeness of its estimates to actual rain gauge values. GHE was the next preferred model because it locates rainfall accurately, even though it somewhat underestimates amounts. IMERG, QPE, PDIR-NOW, and P-IN-SEVIRI were rated similarly, with precipitation cells often mislocated and underestimated. GSMaP was the least preferred, as it frequently failed to detect precipitation events altogether. However, all the products tend to produce structures larger than the actual events.
Below, we detail two specific cases: one in Montpellier, on the Mediterranean coast in the south of mainland France, and another on the island of Guadeloupe.
3.4.1. Case Study 1: Stationary Convective Storm in Montpellier
The first case study features a stationary convective storm that struck Montpellier on 6 September 2022, leading to a meteorological warning for flooding and potential river overflow. The storm produced substantial rainfall, with accumulations of up to 70 mm in a single afternoon.
Figure 10 displays the 1 h cumulative estimations from each of the six available models, as well as the ground radar’s estimation.
GSMaP failed to detect the storm, registering only light rainfall. The other operational products located the storm but consistently underestimated the associated rainfall. In contrast, Espresso performed best in this scenario, correctly identifying and locating the storm and providing reliable rainfall estimates.
These results underscore the robustness and effectiveness of the Espresso model in capturing the detailed features of convective storms and providing accurate precipitation estimates.
Figure 10 also highlights Espresso’s fine spatial resolution (5 km), which is comparable to that of GHE, PDIR-NOW, and P-IN-SEVIRI and notably finer than that of IMERG and GSMaP (10 km).
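A coarser grid can hide the peak of a small convective cell, as averaging a field onto a grid twice as coarse shows. In this sketch, a single 40 mm/h pixel on a hypothetical 5 km grid is block-averaged to 10 km. Real products are not necessarily regridded this way, so the example only illustrates how resolution affects peak values.

```python
import numpy as np

def block_average(field, factor=2):
    """Coarsen a 2-D field by averaging non-overlapping factor x factor blocks."""
    h, w = field.shape
    if h % factor or w % factor:
        raise ValueError("field dimensions must be divisible by factor")
    return field.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# Toy convective cell on a hypothetical 5 km grid: one 40 mm/h pixel.
fine = np.zeros((4, 4))
fine[1, 1] = 40.0
coarse = block_average(fine)   # hypothetical 10 km grid: peak drops to 10 mm/h
```

The domain-mean rain is unchanged (2.5 mm/h on both grids), but the peak is divided by four. This is why resolution matters most for small, intense cells like the Montpellier storm.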
3.4.2. Case Study 2: Southeast Flow over Guadeloupe Island
Our second case study focuses on a southeast airflow over Guadeloupe island, characterized by warm and humid air. The combination of converging low-level winds, wind shear at higher altitudes, and the unique configuration of the island amplified the convective activity. Consequently, the region witnessed heavy rainfall, resulting in widespread flooding and the issuance of a meteorological warning. Record-breaking precipitation levels were noted, especially at the Raizet station, where 312 mm fell within a 24 h period. This severe rainfall event caused substantial material damage and fatalities.
Figure 11 displays the 1 h cumulative estimations from each of the six available models, as well as the ground radar’s estimation.
In this scenario, both GSMaP and IMERG failed to detect the storm. GHE, P-IN-SEVIRI, and QPE located the storm correctly but consistently underestimated its intensity. Espresso and PDIR-NOW, on the other hand, performed well at both localizing the storm and estimating its intensity. Their ability to locate and estimate extreme rainfall values accurately highlights their potential to provide valuable data for disaster management and response.
In conclusion, the Montpellier and Guadeloupe case studies consolidate Espresso’s position as a reliable and capable model for precipitation estimation. Despite its tendency to overestimate precipitation within and on the periphery of rain cells, the model’s demonstrated accuracy in localizing and estimating the intensity of convective storms underscores its practical utility in real-time monitoring and response to extreme weather events.
3.5. Computational Resources
Each training experiment was performed on four Nvidia Tesla V100 Graphics Processing Units (GPUs), and the neural network took approximately 4 h to converge. Over the course of this year-and-a-half-long research project, a total of 160 training experiments were conducted. This amounts to about 640 h (approximately 27 days) of computation and an electricity consumption of 650 kWh.
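A quick back-of-the-envelope check shows that the reported figures are mutually consistent. The per-GPU power figure assumes, for illustration only, that all 650 kWh were drawn by the GPUs of the 4-GPU jobs. The text does not say what the energy figure includes.

```python
n_experiments = 160
hours_per_experiment = 4        # approximate, as stated in the text
gpus_per_experiment = 4
energy_kwh = 650

total_hours = n_experiments * hours_per_experiment              # 640 h
total_days = total_hours / 24                                   # about 26.7 days
mean_power_kw = energy_kwh / total_hours                        # about 1.02 kW per 4-GPU job
power_per_gpu_w = 1000 * mean_power_kw / gpus_per_experiment    # about 254 W

print(f"{total_hours} h = {total_days:.1f} days, ~{power_per_gpu_w:.0f} W per GPU")
```

An average draw of roughly 250 W per GPU is in the range of a V100's power rating, so the 650 kWh figure is plausible under that assumption.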
In contrast, the inference phase, which covers the entire globe, requires only around 5 min of computation time on a single Central Processing Unit (CPU).
4. Discussion
Throughout this work, we have presented the development and evaluation of Espresso, a convolutional neural network architecture that leverages satellite imagery for global precipitation estimation. As a global model, Espresso avoids the data-availability delays inherent to operational products such as IMERG, offering real-time precipitation estimates while matching or even exceeding the performance of existing operational products. Furthermore, the relatively low computational cost of inference and the ease of deployment make it an attractive solution for operational use.
The model has been carefully designed to ensure that the high resolution of the input satellite data is preserved, contributing to the precise detection and estimation of precipitation events. In the various evaluations conducted, Espresso has demonstrated its ability to accurately capture precipitation patterns across the globe, particularly in temperate and tropical regions. It provides a better POD and FSS, and a similar FAR, when compared to other models, especially for heavy rainfall.
However, the model’s performance was found to be less robust at higher latitudes, an aspect that could be improved in future iterations. While the current approach to addressing the imbalance in the data using oversampling and weighting the loss has proven to be somewhat effective, other methods could be explored to further improve the model’s ability to learn in these regions. One potential solution is to incorporate data from periods before 2018 and after 2022.
Additionally, advances in deep learning architectures such as Vision Transformers [35] and diffusion models [36] could be used to develop a model that provides finer details than our DeepLabV3+ and lowers its False Alarm Rate. Moreover, improvements in the infrared sensors of the next generation of satellites, such as Meteosat Third Generation, are expected to yield higher-resolution infrared GEO images, thereby contributing to improved estimation accuracy.
Finally, our attempt to enhance the model’s performance by incorporating additional features, such as topography, latitude, or season, as additional input channels proved ineffective. Better results might be achieved with an approach similar to that of the recent MetNet-3 [37], whose authors use topographical embeddings instead. This allows the network to discover relevant topographical information autonomously and store it in the embedding, which is a trainable parameter, similar to the embeddings used in Natural Language Processing. Additionally, the seasonal information, which is constant for a single sample or for a global inference, could be injected into the output of the final encoder layer, before the first decoder layer. With this approach, the model might make more effective use of this information, since it would be combined with the features already learned from the input image.
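The two ideas above can be sketched in a few lines. The first is a learned per-location embedding table, looked up for every pixel. The second is a seasonal encoding, broadcast over the encoder output and concatenated to it before decoding. This NumPy sketch only shows the tensor shapes and data flow. The table size, embedding dimension, mapping of pixels to location cells, and sine/cosine date encoding are all hypothetical, and the sketch is not MetNet-3's implementation. In a trained network, the table would be a parameter updated by backpropagation rather than a fixed random array.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned topographic embedding: one EMB_DIM vector per location cell.
N_CELLS, EMB_DIM = 1000, 8
topo_table = rng.normal(scale=0.02, size=(N_CELLS, EMB_DIM))  # trainable in a real model

def topo_embedding(cell_ids):
    """Look up embeddings for an (H, W) grid of integer cell ids -> (EMB_DIM, H, W)."""
    return np.moveaxis(topo_table[cell_ids], -1, 0)

def inject_season(encoder_out, day_of_year):
    """Concatenate a cyclic date encoding to an encoder feature map of shape (C, h, w)."""
    angle = 2.0 * np.pi * (day_of_year - 1) / 365.25
    _, h, w = encoder_out.shape
    season = np.stack([np.full((h, w), np.sin(angle)), np.full((h, w), np.cos(angle))])
    return np.concatenate([encoder_out, season], axis=0)

# Usage with hypothetical shapes.
cell_ids = np.arange(12).reshape(3, 4)          # location-cell index per pixel
topo = topo_embedding(cell_ids)                 # (8, 3, 4), concatenated to the inputs
encoder_out = np.zeros((16, 5, 5))              # stand-in for the final encoder feature map
decoder_in = inject_season(encoder_out, day_of_year=1)   # (18, 5, 5)
```

Injecting the date at the bottleneck means that it is combined with already-abstracted image features, rather than with raw pixels as in the additional-channel experiments of Section 3.1.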
5. Conclusions
This paper has presented Espresso, a deep convolutional neural network designed for global precipitation estimation from satellite data. The model has demonstrated strong performance across various geographical regions, particularly in temperate and tropical zones. Its ability to detect and accurately estimate rainfall, especially heavy rainfall, makes it a reliable and competitive tool for weather monitoring and prediction.
Despite some limitations in higher latitudes, the model demonstrates significant results and potential for further improvements. Future work could explore new approaches to address the imbalance in the data, incorporate additional data sources, or fine-tune the model parameters to enhance performance. As weather patterns continue to become increasingly complex due to climate change, the role of precise, real-time precipitation estimation models like Espresso becomes critical.
Espresso has been incorporated as an operational product at Météo-France, delivering high-quality, real-time global precipitation estimates every 30 min. These estimations are readily accessible to forecasters for monitoring French Overseas Territories, where ground radars may not be available, and for anticipating the movement of incoming precipitation before it becomes visible on radars. This tool strengthens Météo-France’s ability to respond to and manage the impacts of extreme weather events, thereby contributing to the protection of people and property across French territories.