1. Introduction
Precipitation is a key component of the global water cycle [1]. Climate change has increased the frequency and amount of precipitation, which has severely threatened people’s lives and property [2,3,4]. High-quality precipitation data are of great significance to industrial and agricultural production, water conservancy development, and drought and flood prevention [5]. The traditional method for obtaining precipitation data is to install a network of rain gauge stations with a specific spatial density. Although accurate precipitation can be obtained at each station, the uneven distribution of rain gauge stations and the resulting spatial discontinuity of precipitation data are obvious limitations [6,7]. Remote-sensing-based methods using radars or satellites have been increasingly applied to estimate precipitation and its spatial distribution. However, many evaluation studies have shown that satellite precipitation estimates often include significant errors, with particularly large uncertainty when estimating extreme precipitation [8]. Therefore, accurately estimating extreme precipitation and understanding its spatiotemporal evolution is critical [9].
With the development of satellite remote sensing technology and precipitation retrieval algorithms, satellite precipitation products have progressed significantly [10,11]. The most typical example is the transition from the Tropical Rainfall Measuring Mission (TRMM) to the Global Precipitation Measurement (GPM) mission [12]. The Integrated Multi-Satellite Retrievals for GPM (IMERG) provides improved precipitation data by integrating microwave and infrared data from the GPM satellite constellation [13]. Compared to TRMM products, IMERG products have higher temporal (half-hourly) and spatial (0.1°) resolution and improved capability for monitoring extreme precipitation [14,15,16]. Since the release of IMERG precipitation products in 2014, many studies have evaluated different versions of these products at various spatial scales (e.g., watersheds, countries, continents, and the globe) and temporal scales (e.g., half-hourly, hourly, daily, and monthly) [17,18,19,20,21,22,23,24,25,26].
IMERG is one of the most widely used satellite precipitation products and has been extensively evaluated in China. For example, Tang et al. evaluated IMERG precipitation products in mainland China and found that they perform better than TMPA (TRMM Multi-satellite Precipitation Analysis) products on both hourly and daily scales; however, the accuracy of IMERG in arid and high-altitude areas needs to be improved [8]. Li et al. evaluated the spatial distribution of IMERG’s precipitation detection ability and error structure and the climate dependence of error sources. They concluded that IMERG could capture the spatial pattern of light rain (<10 mm/day) in mainland China, but IMERG’s performance is poor in areas with complex winter schemes [27]. Yang et al. evaluated IMERG precipitation products against rain gauge data over Sichuan, China, and concluded that IMERG performed better in estimating rainfall intensity than in detecting precipitation [28]. Guo et al. compared the IMERG precipitation products before and after bias correction/calibration and found that the IMERG-F product calibrated with rain gauge measurements shows improved quality but still overestimates precipitation in western China [29]. Chen et al. compared IMERG and TRMM data in China and found that IMERG has higher accuracy and a lower probability of outliers [30]. Foelsche et al. and Wang et al. compared three versions of IMERG products (V03, V04, and V05) in mainland China. They found that V04 and V05 improve on V03; in particular, the early and late real-time products of V04 are better than the corresponding V03 products, and the V05 final product has the highest accuracy [31,32].
Recently, IMERG products have been updated to the latest version, V06B, which provides retrospective precipitation estimates from June 2000 to the present. The extended period greatly broadens the scope of applications of IMERG products, such as regional and global hydrological simulations of floods. Arshad et al. (2021) explored the advantages and differences of the GPM-IMERG V6 product (real-time and calibrated versions) in Pakistan and verified that the product captures precipitation in the northeast with high accuracy [33]. Tang et al. found that although the quality of IMERG V06 has greatly improved on the hourly scale and the product reproduces the diurnal cycle well, its snowfall estimation performance is poor [10]. The quality of IMERG at the hourly and half-hourly scales in high-latitude and high-altitude areas and its precipitation estimates in arid climate areas need further improvement [8]. Overall, evaluation studies of IMERG V06 are still limited.
Considering that extreme precipitation is a major cause of flood disasters, many studies have investigated the performance of IMERG for different precipitation rates, with special attention to heavy precipitation. For example, Sahlu et al. found that IMERG is better than CMORPH on an hourly or daily scale but is worse at detecting heavy rain [34]. He et al. found that IMERG improved the ability to capture moderate-intensity precipitation events and was highly sensitive to extreme precipitation but overestimated extreme precipitation events [35]. Wang et al. found that IMERG captures light and moderate rain events well but shows degraded performance in detecting extreme precipitation events [36]. Mahmoud et al. evaluated the performance of GPM IMERG products (i.e., early, late, and final run products) in Saudi Arabia, especially the errors associated with different precipitation intensities, showing that IMERG products can be an effective surrogate for rainfall data in arid regions [37]. The most common approach to extracting extreme precipitation events is to define intensity thresholds or percentiles (e.g., 90% and 99%). However, most studies treat precipitation in each grid cell and time interval as an isolated event, which is easy to implement but ignores the temporal and spatial evolution of precipitation weather systems. Therefore, it is necessary to consider the temporal and spatial distribution characteristics of precipitation for a more comprehensive assessment of satellite precipitation products. To overcome this limitation, we extracted three-dimensional extreme precipitation events using an object-based tracking method.
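The contrast between treating each grid cell as an isolated event and extracting three-dimensional events can be illustrated with a minimal sketch. It groups above-threshold cells that touch in time or space into one event. The function name `label_events_3d`, the 6-connectivity rule, the fixed threshold, and the toy values are assumptions made for illustration; they are not taken from the tracking method used in this study.

```python
from collections import deque

def label_events_3d(precip, threshold):
    """Group above-threshold cells that touch in time or space into events.

    precip is a nested list indexed as precip[t][y][x]. Returns a list of
    events, each a sorted list of (t, y, x) cells. Cells are neighbours if
    they share a face (6-connectivity), an assumption of this sketch.
    """
    nt, ny, nx = len(precip), len(precip[0]), len(precip[0][0])
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen = set()
    events = []
    for t in range(nt):
        for y in range(ny):
            for x in range(nx):
                if (t, y, x) in seen or precip[t][y][x] < threshold:
                    continue
                # Breadth-first search from an unvisited extreme cell.
                queue = deque([(t, y, x)])
                seen.add((t, y, x))
                cells = []
                while queue:
                    ct, cy, cx = queue.popleft()
                    cells.append((ct, cy, cx))
                    for dt, dy, dx in steps:
                        n = (ct + dt, cy + dy, cx + dx)
                        inside = 0 <= n[0] < nt and 0 <= n[1] < ny and 0 <= n[2] < nx
                        if inside and n not in seen and precip[n[0]][n[1]][n[2]] >= threshold:
                            seen.add(n)
                            queue.append(n)
                events.append(sorted(cells))
    return events

# Toy field with three time steps on a 2 x 3 grid (hypothetical values, mm/h).
toy = [
    [[12, 0, 0], [0, 0, 0]],
    [[15, 11, 0], [0, 0, 0]],
    [[0, 13, 0], [0, 0, 20]],
]
events = label_events_3d(toy, threshold=10)
print(len(events), [len(e) for e in events])
```

A per-cell threshold alone would report five separate exceedances here, whereas the grouping returns one moving system of four cells and one isolated cell.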
Given the abovementioned limitation of previous studies, this study aims to (1) evaluate the accuracy of the latest version (V06) of IMERG-L and IMERG-F at multiple scales (hourly and daily) against rain gauge measurements from 2000 to 2018 in the North China Plain; (2) use object tracking to assess how accurately the post-real-time corrected IMERG-F and the near-real-time IMERG-L, PERSIANN-CCS, and GSMaP satellite precipitation data capture 3D extreme precipitation events and analyze the factors that induce errors; and (3) investigate whether IMERG products can capture typical flood disasters. The North China Plain is selected because it is the most populated region in China, and Beijing, the capital of China, is located in the plain. Floods are a non-negligible threat in the North China Plain during the rainy season. This study can provide a reference for the application of IMERG and the development of IMERG retrieval algorithms.
3. Results and Discussion
3.1. Evaluation Results at Multiple Scales
3.1.1. Evaluation at the Hourly Scale
Many applications are built on sub-daily precipitation data, and thus, hourly scale evaluation is critical to comprehensively understand the accuracy of IMERG.
Figure 3 shows the spatial distributions of the three evaluation metrics, CC, ME, and KGE, for the two IMERG products at the hourly scale. The CC is below 0.2. ME is relatively small in most areas of the North China Plain and is characterized by alternating positive and negative values. KGE balances the contributions of correlation, deviation, and variability. KGE values are very low (all below 0.1) for both IMERG-L and IMERG-F. All three evaluation metrics show that the accuracy of the two IMERG products at the hourly scale is quite poor compared to the rain gauge measurements. The difference between IMERG-F and IMERG-L is not obvious, indicating that the monthly correction used for IMERG-F has little impact on the hourly scale data. The current bias correction of IMERG forces IMERG’s monthly precipitation to be consistent with that of the Global Precipitation Climatology Centre (GPCC). This method has two limitations: (1) the spatial resolution of GPCC is coarse, and thus, the correction cannot account for the spatial variability of IMERG precipitation at the 0.1-degree resolution; and (2) the monthly correction only adjusts the total precipitation amount and cannot adjust sub-monthly (e.g., daily and hourly) temporal variability. These limitations restrict the further improvement of the quality of IMERG-F. Therefore, a possible future solution is to use a high-resolution gauge-based daily precipitation data set to correct IMERG satellite precipitation estimates. Future versions of IMERG should consider adopting bias correction or calibration with ground measurements at finer temporal resolutions (daily or even hourly).
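For readers who want to reproduce metrics of this kind, the following minimal sketch computes CC, ME, and KGE for one gauge and its collocated satellite series. The exact formulas used in this study are not given in this section, so the sketch assumes the Pearson correlation for CC, satellite mean minus gauge mean for ME, and the widely used KGE form that combines correlation, the ratio of standard deviations, and the ratio of means. The function name and the sample values are hypothetical.

```python
import math

def continuous_metrics(sat, gauge):
    """Return (CC, ME, KGE) for paired satellite and gauge series.

    Assumed definitions: CC is the Pearson correlation, ME is the mean of
    (satellite - gauge), and KGE = 1 - sqrt((r-1)^2 + (a-1)^2 + (b-1)^2)
    with a = std(sat)/std(gauge) and b = mean(sat)/mean(gauge).
    """
    n = len(sat)
    mean_s, mean_g = sum(sat) / n, sum(gauge) / n
    cov = sum((s - mean_s) * (g - mean_g) for s, g in zip(sat, gauge)) / n
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in sat) / n)
    std_g = math.sqrt(sum((g - mean_g) ** 2 for g in gauge) / n)
    cc = cov / (std_s * std_g)
    me = mean_s - mean_g
    alpha = std_s / std_g    # variability ratio
    beta = mean_s / mean_g   # bias ratio
    kge = 1 - math.sqrt((cc - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return cc, me, kge

# Hypothetical hourly values (mm): the satellite doubles every gauge value.
gauge = [0.0, 1.0, 2.0, 5.0]
sat = [0.0, 2.0, 4.0, 10.0]
cc, me, kge = continuous_metrics(sat, gauge)
print(round(cc, 3), round(me, 3), round(kge, 3))
```

In this toy case the correlation is perfect, yet KGE is negative because both the mean and the variability are doubled, which shows how KGE penalizes bias that CC alone would miss.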
Figure 4 shows the distribution of the detection ability of the two products at the hourly scale in terms of POD, FAR, and CSI. Again, the detection ability of IMERG-F is similar to that of IMERG-L. The maximum values of POD and CSI are 0.4 and 0.2, respectively, while the minimum FAR is 0.6. Obviously, the two products have low detection ability. However, it should be noted that for hourly scale evaluation, some problems such as the continuity time and the mismatch between point scale rain gauges and regional satellite data may lead to greater uncertainty.
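The detection metrics can be sketched from a simple contingency count of hits, misses, and false alarms. The rain/no-rain threshold of 0.1 mm, the function name, and the sample series below are assumptions for illustration only; the threshold used in this study is not stated in this section.

```python
def detection_metrics(sat, gauge, wet_threshold=0.1):
    """Return (POD, FAR, CSI) from paired satellite and gauge series.

    A time step counts as wet when precipitation >= wet_threshold
    (an assumed value). NaN is returned when a denominator is zero.
    """
    hits = misses = false_alarms = 0
    for s, g in zip(sat, gauge):
        sat_wet, gauge_wet = s >= wet_threshold, g >= wet_threshold
        if sat_wet and gauge_wet:
            hits += 1
        elif gauge_wet:
            misses += 1
        elif sat_wet:
            false_alarms += 1
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    total = hits + misses + false_alarms
    csi = hits / total if total else nan
    return pod, far, csi

# Hypothetical hourly values (mm): two hits, one miss, one false alarm.
gauge = [0.0, 0.5, 2.0, 0.0, 1.0, 0.0]
sat = [0.3, 0.6, 0.0, 0.0, 2.0, 0.0]
pod, far, csi = detection_metrics(sat, gauge)
print(round(pod, 3), round(far, 3), round(csi, 3))
```

Because CSI counts both misses and false alarms in its denominator, it can never exceed POD, which is consistent with the lower maximum reported for CSI than for POD.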
3.1.2. Evaluation at the Daily Scale
The evaluation of daily precipitation is performed with rain gauge measurements from 2000 to 2018.
Figure 5 presents the distributions of three metrics: CC, ME, and KGE. Combining the results of the various indicators, performance is higher in the southern region than in the northern region, and IMERG-F is slightly better than IMERG-L in statistical accuracy. The CC and KGE of IMERG-F are significantly higher than those of IMERG-L, while the magnitude of ME is smaller. Overall, IMERG-L significantly overestimates precipitation, while IMERG-F solves this problem owing to the monthly bias correction.
Combined with the results in
Figure 5, both IMERG-F and IMERG-L can better detect precipitation in the southern region.
Figure 6 shows the distribution of POD, FAR, and CSI. Notably, the difference between IMERG-F and IMERG-L is extremely small, resulting in similar colors in
Figure 5. This proves that the monthly scale correction of IMERG-F has little impact on daily scale precipitation’s occurrence. Meanwhile, in the northeast part of the study area, the FAR of IMERG-F is slightly smaller than that of IMERG-L, indicating that IMERG-F has a lower false alarm rate and relatively better accuracy.
3.1.3. Evaluation Results at the Seasonal Scale
The seasonal analysis is conducted by calculating the indicators of daily or hourly data in the corresponding season and then analyzing it by season.
Figure 7 is the box plot of CC, KGE, and CSI for the IMERG-F and IMERG-L products. In each box, the horizontal line inside the box is the median, the lower and upper boundaries of the box are the 25th and 75th percentiles of the data, and the two outermost horizontal lines represent the 10th and 90th percentiles. Overall, the seasonal assessment indicators of IMERG-F and IMERG-L have insignificant differences. For example, the IMERG-F boxes are higher than those of IMERG-L, and the numerical values and the medians vary widely.
Combined with the correlation indicators in
Figure 7a, the CCs of IMERG-F in spring, summer, autumn, and winter are 0.39, 0.38, 0.44, and 0.38, respectively, which are higher than the corresponding 0.35, 0.36, 0.4, and 0.31 of IMERG-L. The winter box is wider than those of the other three seasons, which is related to the climate of the North China Plain. Meanwhile, the KGEs of IMERG-F in spring, summer, autumn, and winter are 0.36, 0.36, 0.41, and 0.32, respectively, which are higher than the corresponding 0.31, 0.31, 0.08, and 0.17 of IMERG-L. In short, the regional performance of winter precipitation is unstable and changes greatly (Figure 7b). However, the boxes of IMERG-L vary widely, even extending below 0, reflecting the poor quality of the product in autumn and winter. For IMERG-F, the improvement is very obvious in autumn and winter but not in summer, when there is more precipitation. The CSI characterizes the ability of the two products to detect precipitation. The CSIs of IMERG-F in spring, summer, autumn, and winter are 0.34, 0.44, 0.39, and 0.27, respectively, equal to or higher than the corresponding 0.34, 0.44, 0.37, and 0.26 of IMERG-L. The accuracy is the same in spring and summer, while in autumn and winter, IMERG-F is slightly better than IMERG-L (Figure 7c). Overall, IMERG-F is better than IMERG-L on the seasonal scale. The difference is small in summer, when there is more precipitation and the regional performance is relatively stable, while in winter, the regional performance is unstable and has greater variability.
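The seasonal procedure described above (group the data by season, then compute each indicator within the group) can be sketched as follows. The month-to-season mapping, the function names, and the sample values are assumptions for illustration; mean error stands in for any of the indicators.

```python
from collections import defaultdict
from datetime import date

# Assumed meteorological seasons; the grouping used in the study is not stated.
SEASON_BY_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}

def split_by_season(dates, sat, gauge):
    """Return {season: (sat_values, gauge_values)} for paired daily series."""
    groups = defaultdict(lambda: ([], []))
    for day, s, g in zip(dates, sat, gauge):
        sat_vals, gauge_vals = groups[SEASON_BY_MONTH[day.month]]
        sat_vals.append(s)
        gauge_vals.append(g)
    return dict(groups)

def mean_error(sat, gauge):
    """Mean of (satellite - gauge); any other indicator could be used here."""
    return sum(s - g for s, g in zip(sat, gauge)) / len(sat)

# Hypothetical daily values (mm) for one station.
dates = [date(2016, 1, 5), date(2016, 7, 19), date(2016, 7, 20), date(2016, 12, 1)]
sat = [1.0, 40.0, 12.0, 0.0]
gauge = [0.0, 50.0, 10.0, 2.0]
seasonal = split_by_season(dates, sat, gauge)
seasonal_me = {name: mean_error(s, g) for name, (s, g) in seasonal.items()}
print(seasonal_me)
```

Repeating this per station gives one value per station and season, which is the kind of sample a seasonal box plot summarizes.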
3.2. Evaluation in Terms of 3D Extreme Precipitation Events
Three-dimensional precipitation events are extracted using the object-tracking method.
Figure 8 shows the thresholds of extreme precipitation for IMERG-F and IMERG-L at the hourly and daily scales. The thresholds display strong spatial variation, gradually decreasing from ocean to inland. This is because the water vapor supply in the inland areas is less than in the ocean and coastal areas, and thus, the intensity is lower. The lower thresholds of the IMERG-F indicate that performing bias correction with gauge data results in lower precipitation intensity. IMERG-L thresholds have a smooth transition between the ocean and the land, which conforms to the actual evolution of precipitation systems. However, IMERG-F thresholds drop sharply from ocean to land (
Figure 7b). The main reason is that rain gauge data used for bias correction during the generation of the IMERG-F are mainly from land, excluding measurements from the ocean (mainly due to the limited availability of data). Therefore, the bias correction would inevitably lead to more bias for the transitional areas between the ocean and land. In contrast, IMERG-F and IMERG-L data products are similar on the ocean. Therefore, if IMERG-F is applied to study typhoon and hurricane phenomena generated over the ocean and propagated to land, data users must consider the discontinuity caused by this problem.
In this study, we extract three-dimensional precipitation systems, which include the generation, movement, and dissipation processes of a precipitation system. For example, a precipitation event may occur over the ocean and move to land with time.
Figure 9 reveals the seasonal characteristics of extreme precipitation events (duration, total pixels, maximum rainfall, and average rainfall intensity). The precipitation area is defined as the number of total pixels of a precipitation event. Maximum precipitation is defined as the maximum precipitation measured or investigated in a certain period and area. IMERG-L presents a longer event duration than IMERG-F for all seasons except summer. IMERG-F displays the longest precipitation duration in summer, while IMERG-L has less obvious seasonal variations.
Figure 9b reveals that the precipitation coverage of the four seasons is basically the same, but IMERG-L is higher than IMERG-F.
The maximum precipitation refers to the maximum intensity of every three-dimensional event (
Figure 9c). The maximum precipitation has notable seasonal variation and reaches the peak in summer. Moreover, the maximum precipitation of IMERG-L exceeds that of IMERG-F. Compared to the maximum intensity, the mean intensity has similar seasonal features but a much smaller magnitude (
Figure 9d). The monthly gauge-based correction used by IMERG-F can adjust the amount of precipitation, as revealed by many previous studies [
10]. However, our analysis shows that the correction can impact the whole lifecycle of precipitation systems, such as the duration and affected area in
Figure 9. This adjustment indirectly contributes to the reduction of precipitation overestimation in the North China Plain (
Figure 5).
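The four event characteristics discussed above can be sketched for a single three-dimensional event. The sketch assumes an event is stored as a list of (time, row, column) cells, that duration is the number of distinct time steps, and that the precipitation area is the total number of cells, following the definition given in the text. The function name and the toy values are hypothetical.

```python
def event_features(cells, precip):
    """Summarize one 3D precipitation event.

    cells is a list of (t, y, x) indices belonging to the event and
    precip is a nested list indexed as precip[t][y][x].
    """
    values = [precip[t][y][x] for t, y, x in cells]
    return {
        "duration": len({t for t, _, _ in cells}),  # distinct time steps
        "total_pixels": len(cells),                  # area as total cell count
        "max_intensity": max(values),
        "mean_intensity": sum(values) / len(values),
    }

# Toy field with three time steps on a 2 x 3 grid (hypothetical values, mm/h).
toy = [
    [[12, 0, 0], [0, 0, 0]],
    [[15, 11, 0], [0, 0, 0]],
    [[0, 13, 0], [0, 0, 20]],
]
event = [(0, 0, 0), (1, 0, 0), (1, 0, 1), (2, 0, 1)]
features = event_features(event, toy)
print(features)
```

Computing these per event for two products makes it possible to compare lifecycle properties such as duration and area, rather than only pixel-level amounts.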
Figure 10 and Figure 11 evaluate the extreme precipitation of IMERG and compare IMERG to PERSIANN-CCS and GSMaP using rain gauge data at the hourly and daily scales. The CC and BIAS of each 3D precipitation event are selected as the metrics. The hourly median CCs of IMERG-F, IMERG-L, GSMaP, and PERSIANN-CCS are −0.2, −0.16, −0.09, and 0.06, respectively. IMERG-F, IMERG-L, and GSMaP have negative linear correlations, with IMERG-F showing the largest magnitude. The median daily CCs of IMERG-F and IMERG-L are 0.31 and 0.3, respectively. Both IMERG-F and IMERG-L have a positive correlation, but IMERG-F is slightly better than IMERG-L.
In terms of the mean error, the median hourly BIAS values of IMERG-F, IMERG-L, GSMaP, and PERSIANN-CCS are 8.7, 10.77, 3.98, and 3.42 mm, respectively. Notably, both IMERG-F and IMERG-L overestimate precipitation. Among them, the relative error of IMERG-F is small: better than IMERG-L and slightly worse than GSMaP and PERSIANN-CCS. However, GSMaP and PERSIANN-CCS have a certain underestimation of precipitation. The median daily BIAS values of IMERG-F and IMERG-L are 3.86 and 5.77, respectively. Similarly, IMERG-F is better than IMERG-L.
Based on the above analysis, the mean error of IMERG-F is better than that of IMERG-L, but the CC is lower, which is consistent with the previous analysis. Although many studies have shown that the quality of PERSIANN-CCS is generally low, GSMaP and PERSIANN-CCS are better for extreme precipitation in the North China Plain, especially PERSIANN-CCS, which shows that infrared has potential for extreme precipitation.
3.3. Extreme Precipitation-Induced Disaster Analysis
Flash flood disasters are caused by heavy rainfall in hilly areas and can result in debris flow and landslides, causing considerable losses to the national economy and people’s lives and property. The precursors of flash flood events are closely related to the intensity of precipitation, among which previous continuous rainfall and sudden heavy rainfall are the main causes of flash floods. It is extremely important to make short-term rainfall forecasts in advance, which is inseparable from high-precision rainfall data. Therefore, assessing the accuracy of IMERG to capture flash flood disasters is extremely important for flash flood early warnings in un-gauged basins. Combined with the better performance of IMERG-F, the article mainly focuses on qualitative analysis due to the difficulty of collecting large samples of measured flood data. Therefore, this study takes the “7·19” flood disaster as an example to investigate whether the spatial distribution of IMERG-F agrees with flood events.
Based on flash flood events, the danger is divided into four levels, as shown in
Figure 11, where the vertical axis is the number of dead or missing—the greater the value, the greater the disaster intensity. Flash floods are concentrated in the lower levels (second and first), while the more serious third and fourth levels occur less frequently but cause more casualties.
Figure 12 also demonstrates that the typical case selected in this article (the “7·19” flash flood disaster) is very harmful. If the satellite precipitation data can capture the precipitation that induced the disaster, this will greatly improve the accuracy of flash flood forecasting, thereby reducing the impact of the disaster.
From 18 to 20 July 2016, the “7·19” flood occurred in Hebei Province, mainly due to heavy rainfall. Considering the impact of previous rainfall, this study uses the evolution of rainfall data to analyze the disaster probability captured by IMERG-F and IMERG-L from 15 to 23 July.
Figure 13 shows the precipitation distribution from 15 to 23 July 2016, where (a) and (b) are IMERG-F and IMERG-L, respectively. The background is the precipitation of the corresponding IMERG product, and the black circles represent station precipitation, filled with the same color map of the corresponding IMERG product. The precipitation spatial distributions of IMERG-F and IMERG-L are basically consistent with the ground station and can capture the precipitation center. From 15 to 18 July, both of them captured most of the rainfall. From 18 to 21 July, these two precipitation products were in good agreement with the measured precipitation, reproducing the precipitation process. On 19 July, both IMERG and station data show relatively strong precipitation in the disaster area, indicating that heavy rainfall directly induced the disaster. On 20 July, the disaster-hit area had a small intraday rainfall and a large previous rainfall, reflecting that the previous cumulative rainfall was also a key factor inducing flood. On 21 to 23 July, the filling color of the small black circle was the same as the background color, again verifying that IMERG-F and IMERG-L were consistent with the measured precipitation. However, the area where the flood occurred on 23 July underwent heavy rainfall on 19 July, which may be attributed to different types of disasters that have not been confirmed in the previous period. Above all, both IMERG-F and IMERG-L have high accuracy in capturing disaster-inducing rainfall. Of course, the current flood research is relatively weak, especially in collecting measured disaster data. In the future, more flood disaster data needs to be collected to further strengthen the analysis.
4. Conclusions
This study evaluated the accuracy of the latest version of IMERG V06 satellite precipitation products (IMERG-L and IMERG-F) in capturing extreme precipitation at multiple scales and compared IMERG to several other datasets in the North China Plain. We utilized the novel object-based tracking method to extract 3D extreme precipitation events, which can comprehensively demonstrate the spatiotemporal evolution of precipitation systems compared to traditional evaluation methods. In addition, we selected a typical flood event in the North China Plain to explore whether IMERG can capture extreme precipitation-triggering floods.
The statistical evaluation shows that overall, IMERG-F is better than IMERG-L. The improvement is more notable for bias-related metrics such as mean error and KGE compared to CC and contingency metrics (POD, FAR, and CSI). However, the improvement is less significant at the hourly scale than the daily scale. This phenomenon is caused by the nature of the monthly scale total volume correction, which has a limited effect on adjusting daily variations of precipitation.
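The limited effect of a monthly total-volume correction on sub-monthly variation can be illustrated with a deliberately simplified model. The sketch assumes the correction is a single multiplicative factor that matches the satellite monthly total to the gauge total; the actual IMERG procedure is more involved, and the values below are hypothetical. Under this assumption, the monthly total is fixed, but the correlation with the gauge series and the pattern of wet and dry days are unchanged.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical daily values (mm) within one month at one grid cell.
gauge_daily = [0.0, 5.0, 0.0, 12.0, 1.0, 0.0]
sat_daily = [3.0, 1.0, 6.0, 10.0, 0.0, 4.0]

# Simplified monthly correction: one factor matching the monthly totals.
factor = sum(gauge_daily) / sum(sat_daily)
corrected = [factor * v for v in sat_daily]

cc_before = pearson(sat_daily, gauge_daily)
cc_after = pearson(corrected, gauge_daily)
wet_before = [v > 0 for v in sat_daily]
wet_after = [v > 0 for v in corrected]
print(sum(corrected), round(cc_before, 3), round(cc_after, 3), wet_before == wet_after)
```

The bias in the total disappears while the day-to-day timing errors remain, which is the behavior a finer-resolution correction would need to address.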
Object-based tracking is beneficial for more comprehensive evaluation. Although this study focuses on land areas, we found that IMERG-F exhibits a discontinuity in the thresholds of extreme precipitation between land and ocean, while IMERG-L does not have this problem. This reminds researchers that caution is needed when using bias-corrected satellite products to study extreme precipitation in land–ocean transition zones. IMERG-L and IMERG-F show similar characteristics of extreme precipitation (e.g., seasonal variation) in the North China Plain. IMERG-F shows shorter precipitation duration, smaller precipitating areas, and lower maximum and mean precipitation intensity, indicating that monthly correction reduces precipitation occurrence and magnitude in the study area. However, based on the evaluation metrics, neither IMERG-L nor IMERG-F outperforms PERSIANN-CCS and GSMaP in capturing 3D extreme events at the hourly and daily scales. The IMERG products have a great deal of room for improvement, although qualitative flood assessment indicates that IMERG products can generally capture the precipitation events triggering floods.
In summary, this study validates that GPM IMERG can generally capture extreme precipitation in the North China Plain, whereas the quantitative estimates of extreme precipitation still have room for improvement according to statistical metrics. One limitation is the utilization of monthly scale correction, which cannot effectively adjust the daily variation of satellite precipitation. Future IMERG versions could consider adopting daily or sub-daily bias correction to further improve the performance of IMERG-F. Meanwhile, researchers should also pay attention to the discontinuity in coastal regions caused by gauge-based correction over land. This study shows that the object-based tracking method provides more evaluation aspects (e.g., precipitation features such as duration and max intensity and event-based evaluation) compared to pixel-by-pixel evaluation. Future studies could explore how multiple factors (e.g., regional distribution, wind direction, and terrain features) affect the errors revealed by object-based tracking methods and carry out more comprehensive multi-satellite dataset comparisons in broader regions. Finally, the general lack of flood disaster data limits relevant research. Thus, it is challenging but meaningful for future studies to collect and share more flood disaster data.