Regional Frequency Analysis of Annual Maximum Rainfall and Sampling Uncertainty Quantification

Billios, Marios; Vasiliades, Lampros

doi:10.3390/eesp2025032003

Open Access Proceeding Paper

Regional Frequency Analysis of Annual Maximum Rainfall and Sampling Uncertainty Quantification^†

by Marios Billios and Lampros Vasiliades ^*

Laboratory of Hydrology and Aquatic Systems Analysis, Department of Civil Engineering, University of Thessaly, 38334 Volos, Greece

^* Author to whom correspondence should be addressed.

^† Presented at the 8th International Electronic Conference on Water Sciences, 14–16 October 2024; Available online: https://sciforum.net/event/ECWS-8.

Environ. Earth Sci. Proc. 2025, 32(1), 3; https://doi.org/10.3390/eesp2025032003

Published: 24 January 2025

(This article belongs to the Proceedings of The 8th International Electronic Conference on Water Sciences)

Abstract

Accurate quantile estimation of extreme precipitation is crucial for hydraulic infrastructure design but is often hindered by limited data records, leading to uncertainties. This study applies regional frequency analysis (RFA) using L-moments, comparing classical and Bayesian approaches to quantify uncertainties. Data from 55 rainfall stations in Thessaly, Greece, are analyzed through clustering using PCA and k-means. The Generalized Extreme Value (GEV) distribution is fitted to delineated clusters, and uncertainties are assessed via bootstrap and MCMC methods. Results highlight consistency in location and scale estimates, with Bayesian methods offering narrower uncertainty bounds, demonstrating improved reliability for long-term rainfall prediction and design.

Keywords:

regional frequency analysis; extreme precipitation; uncertainty; L-moments; Metropolis algorithm; Markov Chain Monte Carlo

1. Introduction

Extreme precipitation plays a critical role in natural disasters, with hydro-meteorological events comprising approximately 90% of global disasters in recent decades. Accurate measurement of extreme precipitation is essential for effective infrastructure design and risk assessment [1,2]. While short-term rainfall forecasting is feasible, long-term prediction, crucial for infrastructure planning, is inherently uncertain and requires probabilistic approaches [3,4]. Rainfall is typically modeled as a random variable, necessitating the selection of an appropriate probability density function (PDF) to estimate the likelihood of extreme events. However, limited or incomplete data records often misrepresent the underlying population, especially for longer return periods, introducing significant sampling uncertainties [5,6].

To address these challenges, the L-moment approach of Hosking and Wallis [7], an extension of the “index flood” method [8], is widely used for regional frequency analysis. This method enhances statistical robustness by pooling data from multiple sites to increase the effective record length, improving the reliability of parameter estimates. When dealing with high quantiles, it is crucial to quantify the associated uncertainties. Classical methods, such as the bootstrap technique, construct confidence intervals for this purpose. Similarly, Bayesian approaches, using methods like the Metropolis algorithm, produce credible intervals that incorporate prior knowledge and provide a more comprehensive uncertainty assessment. Recent studies [9,10,11] discuss the significance of extreme precipitation modeling in the context of climate change and risk management, as well as existing methodologies, including Maximum Likelihood Estimation (MLE), L-moments, Bayesian approaches, and uncertainty quantification techniques such as parametric bootstrapping and Markov Chain Monte Carlo (MCMC) methods.

This study applies regional frequency analysis (RFA) based on L-moments to analyze extreme precipitation in Thessaly, Greece, using data from 55 rainfall stations. The analysis evaluates and compares the frequentist and Bayesian approaches to parameter estimation and uncertainty quantification, aiming to enhance the accuracy and reliability of extreme precipitation predictions for infrastructure planning and risk management.

2. Materials and Methods

2.1. Study Area

This study analyzes extreme precipitation using data from 55 rain gauges in Thessaly, a region in central Greece characterized by complex terrain (Figure 1). The dataset comprises annual maxima of 24 h rainfall.

2.2. Regional Frequency Analysis Based on L-Moments

This study adopts the methodology proposed by Hosking and Wallis [7] following these steps:

Region Delineation and Clustering: The region is divided into statistically homogeneous clusters using the k-means algorithm, which minimizes the total within-cluster sum of squares [12]. To support the choice of the number of clusters, principal component analysis (PCA) is applied to identify the key covariates (e.g., coordinates, elevation, mean annual precipitation) that best describe the data, and the average silhouette method [13] indicates an appropriate number of clusters. Final cluster selection is performed using the nsRFA package [14], evaluated via the heterogeneity measure $H_{1}$ [7]. If $H_{1} < 1$, the region is “acceptably homogeneous”; if $1 \leq H_{1} < 2$, the region is “possibly heterogeneous”; and if $H_{1} \geq 2$, the region is “definitely heterogeneous” [7].
Fit the Generalized Extreme Value Distribution Using the L-Moment Method: Hosking [15] introduced L-moments as linear combinations of probability-weighted moments (PWMs) [16]. L-moments offer advantages over methods like maximum likelihood, providing more robust parameter estimates [17]. According to classical extreme value theory [18], the distribution of block-maxima random variables converges to one of three limiting distributions: Type I (Gumbel), Type II (Fréchet), or Type III (Weibull) [19,20]. These three distributions are unified in the Generalized Extreme Value (GEV) distribution as follows:

$G(x) = \exp\left\{-\left[1 + \frac{\gamma (x - a)}{\beta}\right]^{-\frac{1}{\gamma}}\right\}, \quad 1 + \frac{\gamma (x - a)}{\beta} > 0$

(1)

with location parameter $a$, scale parameter $\beta > 0$, and shape parameter $\gamma$. In this parameterization, $\gamma > 0$ corresponds to the Fréchet type, $\gamma < 0$ to the Weibull type, and the limit $\gamma \to 0$ to the Gumbel type.
Quantile estimation: The quantiles are calculated from the following “index-flood” equation:

$Q_{i}(F) = \mu_{i} \, q_{R}(F)$

(2)

where $Q_{i}(F)$ is the quantile with non-exceedance probability $F$ at station $i$, $\mu_{i}$ is the mean value of station $i$, and $q_{R}(F)$ is the regional growth curve estimated from the scaled and pooled data of a homogeneous region.
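The fitting and quantile steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes unbiased probability-weighted moments and Hosking's rational approximation for the GEV shape parameter, and all function names, the synthetic pooled sample, and the 60 mm station mean are hypothetical. Hosking's shape parameter k has the opposite sign to γ in Equation (1), so the code returns γ = −k.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def sample_lmoments(data):
    """Return (l1, l2, t3): first two sample L-moments and the L-skewness."""
    x = sorted(data)
    n = len(x)
    # Unbiased probability-weighted moments; j is the 0-based rank.
    b0 = sum(x) / n
    b1 = sum(j * x[j] for j in range(n)) / (n * (n - 1))
    b2 = sum(j * (j - 1) * x[j] for j in range(n)) / (n * (n - 1) * (n - 2))
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    return l1, l2, l3 / l2


def fit_gev_lmom(data):
    """Estimate (a, beta, gamma) of Equation (1) from sample L-moments."""
    l1, l2, t3 = sample_lmoments(data)
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c  # Hosking's shape parameter; gamma = -k
    if abs(k) < 1e-6:  # Gumbel limit
        beta = l2 / math.log(2.0)
        return l1 - EULER_GAMMA * beta, beta, 0.0
    g = math.gamma(1.0 + k)
    beta = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    a = l1 - beta * (1.0 - g) / k
    return a, beta, -k


def gev_quantile(F, a, beta, gamma):
    """Invert Equation (1) for non-exceedance probability F in (0, 1)."""
    y = -math.log(F)
    if abs(gamma) < 1e-8:
        return a - beta * math.log(y)
    return a + beta / gamma * (y ** (-gamma) - 1.0)


def station_quantile(station_mean, F, growth_params):
    """Equation (2): scale the regional growth curve by the station mean."""
    return station_mean * gev_quantile(F, *growth_params)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for the scaled, pooled annual maxima of one cluster.
    pooled = [gev_quantile(rng.uniform(1e-9, 1 - 1e-9), 0.83, 0.29, 0.05)
              for _ in range(2000)]
    params = fit_gev_lmom(pooled)
    F = 1.0 - 1.0 / 100.0  # non-exceedance probability of the 100-year event
    print("a, beta, gamma:", params)
    print("100-year quantile at a station with mean 60 mm:",
          station_quantile(60.0, F, params))
```

Because each station series is divided by its own mean before pooling, the growth curve is dimensionless, and multiplying by the station mean returns the quantile in the original units.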

2.3. Uncertainty Quantification

The first method used is parametric bootstrapping [6], where the GEV distribution is first fitted to the pooled data using L-moments. A sample of the same size is then generated from this distribution, and the GEV parameters are re-estimated to calculate the quantiles from the growth curve. This process is repeated 10,000 times to determine the 5–95% confidence intervals for the quantiles.
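A minimal sketch of this parametric bootstrap is given below. It is an illustration under stated assumptions rather than the code used in the study: the L-moment fit uses Hosking's approximation for the shape parameter, the data are synthetic, the function names are hypothetical, and the default number of replicates is kept at 1000 for speed, whereas the study uses 10,000.

```python
import math
import random


def fit_gev_lmom(data):
    """L-moment estimates (a, beta, gamma) via Hosking's approximation."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(j * x[j] for j in range(n)) / (n * (n - 1))
    b2 = sum(j * (j - 1) * x[j] for j in range(n)) / (n * (n - 1) * (n - 2))
    l1, l2, t3 = b0, 2 * b1 - b0, (6 * b2 - 6 * b1 + b0) / (2 * b1 - b0)
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c  # Hosking's shape parameter; gamma = -k
    if abs(k) < 1e-6:  # Gumbel limit
        beta = l2 / math.log(2.0)
        return l1 - 0.5772156649 * beta, beta, 0.0
    g = math.gamma(1.0 + k)
    beta = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    return l1 - beta * (1.0 - g) / k, beta, -k


def gev_quantile(F, a, beta, gamma):
    """Quantile of the GEV for non-exceedance probability F in (0, 1)."""
    y = -math.log(F)
    if abs(gamma) < 1e-8:
        return a - beta * math.log(y)
    return a + beta / gamma * (y ** (-gamma) - 1.0)


def percentile(sorted_vals, p):
    """Linearly interpolated percentile of an already sorted list."""
    idx = p * (len(sorted_vals) - 1)
    lo = int(math.floor(idx))
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (idx - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bootstrap_quantile_ci(data, F, n_boot=1000, seed=0):
    """Return (estimate, 5% bound, 95% bound) for the F-quantile."""
    rng = random.Random(seed)
    fitted = fit_gev_lmom(data)  # step 1: fit to the pooled data
    replicates = []
    for _ in range(n_boot):
        # Step 2: simulate a sample of the same size from the fitted GEV.
        sim = [gev_quantile(rng.uniform(1e-9, 1 - 1e-9), *fitted)
               for _ in range(len(data))]
        # Step 3: re-estimate the parameters and recompute the quantile.
        replicates.append(gev_quantile(F, *fit_gev_lmom(sim)))
    replicates.sort()
    return (gev_quantile(F, *fitted),
            percentile(replicates, 0.05),
            percentile(replicates, 0.95))


if __name__ == "__main__":
    rng = random.Random(1)
    pooled = [gev_quantile(rng.uniform(1e-9, 1 - 1e-9), 0.83, 0.29, 0.03)
              for _ in range(500)]
    print(bootstrap_quantile_ci(pooled, F=0.99))
```

The difference between the two returned bounds corresponds to the uncertainty width (5–95%) reported for the classical estimates.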

The second method uses the MCMC technique with the Metropolis algorithm [21]. Bayesian statistics allow simultaneous parameter estimation and uncertainty quantification. The Metropolis algorithm operates by sampling a candidate value from the proposal distribution

$q(x^{\text{candidate}} \mid x)$

and accepting it as the new state of the Markov chain with probability

$P(x, x^{\text{candidate}}) = \min\left[1, \frac{\sigma(x^{\text{candidate}})}{\sigma(x)}\right]$

(3)

where $x$ is the current state and $\sigma(\cdot)$ is the target (posterior) density; if the candidate is rejected, the chain remains at $x$. Because the results should not depend on where a chain starts, it is good practice to run chains from different starting values. In this study, three (3) distinct Markov chains are generated for each estimate.
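The sampler described above can be sketched as a random-walk Metropolis algorithm for the three GEV parameters. This is an illustrative sketch only: the flat priors, the Gaussian proposal with the step sizes shown, the starting values, the burn-in length, and the synthetic data are all assumptions, since the study does not report these settings, and the function names are hypothetical.

```python
import math
import random

# Hypothetical starting values (a, beta, gamma) for the three chains.
STARTS = [(0.7, 0.2, 0.0), (0.8, 0.3, 0.1), (0.9, 0.4, 0.2)]


def gev_quantile(F, a, beta, gamma):
    """Inverse of Equation (1); used here only to simulate data."""
    y = -math.log(F)
    if abs(gamma) < 1e-8:
        return a - beta * math.log(y)
    return a + beta / gamma * (y ** (-gamma) - 1.0)


def log_target(data, a, beta, gamma):
    """Log target density: GEV log-likelihood under flat priors (assumed)."""
    if beta <= 0.0:
        return -math.inf
    total = -len(data) * math.log(beta)
    for x in data:
        s = (x - a) / beta
        if abs(gamma) < 1e-8:  # Gumbel limit
            t, log_term = -s, s
        else:
            z = 1.0 + gamma * s
            if z <= 0.0:  # outside the support of Equation (1)
                return -math.inf
            t = -math.log(z) / gamma  # log of z ** (-1 / gamma)
            log_term = (1.0 + 1.0 / gamma) * math.log(z)
        if t > 700.0:  # exp(t) would overflow; density is effectively zero
            return -math.inf
        total -= log_term + math.exp(t)
    return total


def metropolis(data, start, n_iter, steps, rng):
    """Random-walk Metropolis chain; returns (states, acceptance rate)."""
    current = tuple(start)
    logp = log_target(data, *current)
    if not math.isfinite(logp):
        raise ValueError("starting value has zero target density")
    chain, accepted = [], 0
    for _ in range(n_iter):
        candidate = tuple(c + rng.gauss(0.0, s) for c, s in zip(current, steps))
        logp_cand = log_target(data, *candidate)
        # Equation (3) on the log scale: accept with probability min(1, ratio).
        if math.log(1.0 - rng.random()) < logp_cand - logp:
            current, logp = candidate, logp_cand
            accepted += 1
        chain.append(current)
    return chain, accepted / n_iter


def run_chains(data, starts=STARTS, n_iter=3000, burn_in=1000,
               steps=(0.03, 0.02, 0.05), seed=0):
    """Run one chain per starting value and pool the post-burn-in draws."""
    rng = random.Random(seed)
    draws, rates = [], []
    for start in starts:
        chain, rate = metropolis(data, start, n_iter, steps, rng)
        draws.extend(chain[burn_in:])
        rates.append(rate)
    return draws, rates


def credible_interval(values, lower=0.05, upper=0.95):
    """Empirical 5-95% interval of a list of posterior draws."""
    v = sorted(values)
    return v[int(lower * (len(v) - 1))], v[int(upper * (len(v) - 1))]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [gev_quantile(rng.uniform(1e-9, 1 - 1e-9), 0.83, 0.29, 0.05)
            for _ in range(150)]
    draws, rates = run_chains(data)
    for i, name in enumerate(("location", "scale", "shape")):
        print(name, credible_interval([d[i] for d in draws]))
    print("acceptance rates:", rates)
```

Since the Gaussian proposal is symmetric, the proposal densities cancel and the acceptance rule reduces to the ratio of target densities in Equation (3). Comparing the three chains after burn-in is a simple check that the result does not depend on the starting value.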

3. Results and Discussion

3.1. Cluster Analysis

For the PCA, four covariates were used: the station coordinates (X and Y), elevation (Z), and mean annual precipitation. Table 1 presents the proportion of the total variance explained by each principal component. The first principal component (PC1) explains approximately 50% of the variance. To reduce dimensionality, the first three principal components, which together explain 92% of the variance, were selected for further analysis. Table 2 shows the loadings, which indicate the contribution of each covariate to the principal components. In PC1, mean annual precipitation has the highest weight, followed in absolute magnitude by the X-coordinate and elevation Z.

As noted in Section 2.2, the average silhouette plot (Figure 2) helps indicate the number of clusters. The highest average silhouette score, 0.42, is obtained for three clusters, followed closely by five and eight clusters with scores of 0.40 and 0.39, respectively. The final number of clusters is determined by applying the k-means algorithm to create between two and seven clusters and assessing each cluster with the H₁ heterogeneity measure (Table 3). The spatial distribution of the clusters is presented in Figure 1.
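The selection of the number of clusters by the average silhouette score can be illustrated with a small pure-Python sketch. It is not the workflow of the study: the points are synthetic two-dimensional stand-ins for station scores on the retained principal components, the k-means routine is a plain Lloyd iteration with random restarts, and all names are hypothetical.

```python
import math
import random


def make_blobs(rng, centers=((0, 0), (10, 0), (0, 10)), n_per=20, sd=0.5):
    """Synthetic, well-separated point groups (illustrative data only)."""
    return [(rng.gauss(cx, sd), rng.gauss(cy, sd))
            for cx, cy in centers for _ in range(n_per)]


def kmeans(points, k, rng, n_init=20, max_iter=100):
    """Lloyd's algorithm with restarts; returns (labels, within-cluster SS)."""
    best = None
    for _ in range(n_init):
        centroids = rng.sample(points, k)
        labels = None
        for _ in range(max_iter):
            new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                          for p in points]
            if new_labels == labels:  # assignments stable: converged
                break
            labels = new_labels
            for c in range(k):
                members = [p for p, l in zip(points, labels) if l == c]
                if members:  # keep the old centroid if a cluster empties
                    centroids[c] = tuple(sum(v) / len(members)
                                         for v in zip(*members))
        wcss = sum(math.dist(p, centroids[l]) ** 2
                   for p, l in zip(points, labels))
        if best is None or wcss < best[1]:
            best = (labels, wcss)
    return best


def avg_silhouette(points, labels, k):
    """Mean silhouette width; singleton clusters contribute zero."""
    clusters = [[p for p, l in zip(points, labels) if l == c] for c in range(k)]
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) < 2:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for c, other in enumerate(clusters) if c != l and other)
        total += (b - a) / max(a, b)
    return total / len(points)


def silhouette_by_k(points, ks, rng):
    """Average silhouette score for each candidate number of clusters."""
    return {k: avg_silhouette(points, kmeans(points, k, rng)[0], k) for k in ks}


if __name__ == "__main__":
    rng = random.Random(0)
    pts = make_blobs(rng)
    scores = silhouette_by_k(pts, range(2, 7), rng)
    for k, s in scores.items():
        print(k, round(s, 3))
    print("suggested number of clusters:", max(scores, key=scores.get))
```

As in the study, the silhouette score only suggests candidate values; the configurations would still need to be checked with the H₁ heterogeneity measure before a final choice is made.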

Upon interpreting the results, none of the cluster configurations yields homogeneous regions throughout: every configuration contains at least one cluster with H₁ ≥ 2 (Table 3). This outcome is anticipated due to data scarcity and the region’s complex terrain. Sparse data in complex terrains (such as Thessaly) can lead to increased heterogeneity among clusters, as rainfall is highly influenced by topography (e.g., elevation, slope, and orographic effects) [22]. Low-density rainfall networks are limited in their ability to represent large areas [23]. Also, increasing terrain complexity is associated with greater spatial variability in precipitation. Given that record length directly influences estimations, it is essential to balance the number of stations within each cluster against their heterogeneity. Consequently, after considering the spatial distribution of clusters, we have decided to delineate the area into four (4) clusters.

3.2. Estimation, Uncertainty and Covariates

After defining each cluster, we scale the data from each station by its mean. Subsequently, we pool the data and apply the L-moments method alongside the Metropolis algorithm. The final estimates derived from maximum likelihood estimation (ML), L-moments (LMOM), and three Markov Chain Monte Carlo (MCMC) chains, along with their uncertainty intervals, are presented in Table 4 for location parameters, Table 5 for scale parameters, and Table 6 for shape parameters. The location estimates across all clusters are consistent regardless of the method employed, as are their uncertainty widths (the difference between the 95% quantile and the 5% quantile). A similar consistency is observed in scale parameter estimates across methods.

In contrast, shape parameter estimates exhibit greater variability among methods than the location and scale parameters, and their uncertainty intervals are wider. The maximum likelihood and L-moments methods show broader uncertainties than the Metropolis algorithm results. Notably, all methods agree on the distribution types: the negative shape estimates for Clusters 1 and 2 correspond to a Weibull distribution, while the positive estimates for Clusters 3 and 4 correspond to a Fréchet distribution.

Finally, we investigate the relationship between Generalized Extreme Value (GEV) parameters and station covariates. We focus on elevation and mean annual precipitation due to their significance in the first principal component. To illustrate potential trends further, we fit a linear model to the parameters derived from both the Bayesian method and the classical L-moments approach. Figure 3 shows a positive correlation between the location parameter and both the cluster-average mean annual precipitation and the cluster mean elevation: higher values of these covariates are associated with larger location estimates. Conversely, Figure 4 reveals negative trends between the scale parameter and these same covariates. For the shape parameter, however, no clear trend emerges from the data.

To evaluate the effectiveness of the L-moments-based RFA methodology, we constructed a rainfall frequency analysis (Figure 5) for selected stations in each cluster using the InSite classical approach and RFA. We compared quantiles derived from L-moment estimates and their uncertainty bounds—calculated using parametric bootstrap—with quantiles obtained solely from station data (InSite method). The results support the method’s potential, indicating reduced uncertainty at higher return periods (Figure 5).

4. Concluding Remarks

This study examined how classical and Bayesian methods differ in estimating extreme precipitation and its uncertainty. Utilizing the regional frequency analysis (RFA) procedure based on the L-moments approach, we delineated clusters for our study area. We then pooled the data to estimate the Generalized Extreme Value (GEV) parameters using both classical and Bayesian methods. The analysis reveals that both approaches yield consistent results for location and scale parameters, while differences are observed in the shape parameter. This discrepancy is further reflected in the uncertainty analysis, which shows that the shape parameter has the widest overall uncertainty intervals, with the L-moments and maximum likelihood methods producing broader intervals than those generated by the Metropolis algorithm.

The results of this study have significant implications for the general application of extreme precipitation estimation in hydraulic infrastructure design and risk management. By employing regional frequency analysis (RFA) and comparing classical and Bayesian approaches, the research enhances the reliability of extreme value predictions, particularly in regions with limited data records. Furthermore, the integration of uncertainty quantification through parametric bootstrapping and Markov Chain Monte Carlo (MCMC) methods provides a robust framework for future studies aiming to assess the impact of extreme precipitation events on engineering practices.

Author Contributions

Conceptualization, software, methodology, writing—original draft–review, M.B.; conceptualization, supervision, methodology, review, and editing, L.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Gründemann, G.J.; Zorzetto, E.; Beck, H.E.; Schleiss, M.; van de Giesen, N.; Marani, M.; van der Ent, R.J. Extreme Precipitation Return Levels for Multiple Durations on a Global Scale. J. Hydrol. 2023, 621, 129558.
2. Hailegeorgis, T.T.; Alfredsen, K. Analyses of Extreme Precipitation and Runoff Events Including Uncertainties and Reliability in Design and Management of Urban Water Infrastructure. J. Hydrol. 2017, 544, 290–305.
3. Papalexiou, S.M.; Koutsoyiannis, D.; Makropoulos, C. How Extreme Is Extreme? An Assessment of Daily Rainfall Distribution Tails. Hydrol. Earth Syst. Sci. 2013, 17, 851–862.
4. Gu, X.; Ye, L.; Xin, Q.; Zhang, C.; Zeng, F.; Nerantzaki, S.D.; Papalexiou, S.M. Extreme Precipitation in China: A Review on Statistical Methods and Applications. Adv. Water Resour. 2022, 163, 104144.
5. Hailegeorgis, T.T.; Thorolfsson, S.T.; Alfredsen, K. Regional Frequency Analysis of Extreme Precipitation with Consideration of Uncertainties to Update IDF Curves for the City of Trondheim. J. Hydrol. 2013, 498, 305–318.
6. Liang, Y.; Liu, S.; Guo, Y.; Hua, H. L-Moment-Based Regional Frequency Analysis of Annual Extreme Precipitation and Its Uncertainty Analysis. Water Resour. Manag. 2017, 31, 3899–3919.
7. Hosking, J.R.M.; Wallis, J.R. Regional Frequency Analysis: An Approach Based on L-Moments; Cambridge University Press: Cambridge, UK, 1997; ISBN 978-0-521-43045-6.
8. Dalrymple, T. Flood-Frequency Analyses, Manual of Hydrology: Part 3; US Government Printing Office: Washington, DC, USA, 1960.
9. Efron, B. Bootstrap Methods: Another Look at the Jackknife. In Breakthroughs in Statistics: Methodology and Distribution; Kotz, S., Johnson, N.L., Eds.; Springer: New York, NY, USA, 1992; pp. 569–593. ISBN 978-1-4612-4380-9.
10. Chung, E.-S.; Kim, S.U. Bayesian Rainfall Frequency Analysis with Extreme Value Using the Informative Prior Distribution. KSCE J. Civ. Eng. 2013, 17, 1502–1514.
11. Billios, M.; Vasiliades, L. A Network-Based Clustering Method to Ensure Homogeneity in Regional Frequency Analysis of Extreme Rainfall. Water 2024, 17, 38.
12. Na, S.; Xumin, L.; Yong, G. Research on K-Means Clustering Algorithm: An Improved k-Means Clustering Algorithm. In Proceedings of the 2010 Third International Symposium on Intelligent Information Technology and Security Informatics, Jinggangshan, China, 2–4 April 2010; pp. 63–67.
13. Kaufman, L.; Rousseeuw, P. Finding Groups in Data: An Introduction to Cluster Analysis; John Wiley & Sons: Hoboken, NJ, USA, 1990; ISBN 978-0-471-87876-6.
14. Viglione, A.; Laio, F.; Claps, P. A Comparison of Homogeneity Tests for Regional Frequency Analysis. Water Resour. Res. 2007, 43, W03428.
15. Hosking, J.R.M. L-Moments: Analysis and Estimation of Distributions Using Linear Combinations of Order Statistics. J. R. Stat. Soc. Ser. B (Methodol.) 1990, 52, 105–124.
16. Greenwood, J.A.; Landwehr, J.M.; Matalas, N.C.; Wallis, J.R. Probability Weighted Moments: Definition and Relation to Parameters of Several Distributions Expressable in Inverse Form. Water Resour. Res. 1979, 15, 1049–1054.
17. Norbiato, D.; Borga, M.; Sangati, M.; Zanon, F. Regional Frequency Analysis of Extreme Precipitation in the Eastern Italian Alps and the August 29, 2003 Flash Flood. J. Hydrol. 2007, 345, 149–166.
18. Coles, S. An Introduction to Statistical Modeling of Extreme Values; Springer Science & Business Media: London, UK, 2013; ISBN 978-1-4471-3675-0.
19. Papalexiou, S.M.; Koutsoyiannis, D. Battle of Extreme Value Distributions: A Global Survey on Extreme Daily Rainfall. Water Resour. Res. 2013, 49, 187–201.
20. Fisher, R.A.; Tippett, L.H.C. Limiting Forms of the Frequency Distribution of the Largest or Smallest Member of a Sample. Math. Proc. Camb. Phil. Soc. 1928, 24, 180–190.
21. Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. Equation of State Calculations by Fast Computing Machines. J. Chem. Phys. 1953, 21, 1087–1092.
22. Ouyang, L.; Lu, H.; Yang, K.; Leung, L.R.; Wang, Y.; Zhao, L.; Zhou, X.; Lazhu; Chen, Y.; Jiang, Y.; et al. Characterizing Uncertainties in Ground “Truth” of Precipitation Over Complex Terrain Through High-Resolution Numerical Modeling. Geophys. Res. Lett. 2021, 48, e2020GL091950.
23. Long, Y.; Zhang, Y.; Ma, Q. A Merging Framework for Rainfall Estimation at High Spatiotemporal Resolution for Distributed Hydrological Modeling in a Data-Scarce Area. Remote Sens. 2016, 8, 599.

Figure 1. Region of Thessaly, Greece, with the station clusters.

Figure 2. Silhouette plot for the rainfall stations in the study area.

Figure 3. Location parameter of clusters plotted against the mean elevation of the clusters (left) and the average mean annual precipitation of the clusters (right).

Figure 4. Scale parameter of clusters plotted against the mean elevation of the clusters (left) and the average mean annual precipitation of the clusters (right).

Figure 5. Comparison of InSite and RFA approaches for annual maximum rainfall frequency analysis for (a) Pythio Station—Cluster 1, (b) Myra Station—Cluster 2, (c) Argithea Station—Cluster 3, and (d) Amarantos Station—Cluster 4.

Table 1. Proportion of variance explained by each principal component (PC).

Importance of Components
	PC1	PC2	PC3	PC4
Proportion of Variance	0.51	0.28	0.14	0.08
Cumulative Proportion	0.51	0.78	0.92	1

Table 2. Loadings of each covariate.

Principal Components
	PC1	PC2	PC3	PC4
Mean annual Precipitation	0.607	0.327	0.03	0.723
X	−0.578	0.032	0.684	0.443
Y	0.042	−0.908	−0.168	0.382
Z	0.543	−0.261	0.709	−0.368

Table 3. Heterogeneity measure H₁ of each cluster, for configurations of two to seven clusters.

Number of Clusters	H₁ of Each Cluster
2	2.00	7.56
3	4.47	1.20	6.18
4	3.24	1.43	6.86	1.50
5	1.50	5.68	7.36	0.18	1.37
6	1.32	0.43	0.13	3.43	6.72	1.02
7	5.07	4.57	1.29	0.60	1.44	0.12	3.44

Table 4. Location parameter estimation and its uncertainty width.

Estimation						Uncertainty Width (5–95%)
	MCMC1	MCMC2	MCMC3	ML	LMOM	MCMC1	MCMC2	MCMC3	ML	LMOM
Cluster 1	0.82	0.82	0.83	0.83	0.83	0.064	0.054	0.041	0.065	0.065
Cluster 2	0.77	0.77	0.77	0.77	0.77	0.055	0.062	0.063	0.062	0.062
Cluster 3	0.85	0.85	0.85	0.85	0.85	0.073	0.074	0.082	0.082	0.082
Cluster 4	0.85	0.85	0.86	0.84	0.85	0.059	0.069	0.059	0.083	0.083

Table 5. Scale parameter estimation and its uncertainty width.

Estimation						Uncertainty Width (5–95%)
	MCMC1	MCMC2	MCMC3	ML	LMOM	MCMC1	MCMC2	MCMC3	ML	LMOM
Cluster 1	0.29	0.29	0.30	0.29	0.29	0.05	0.05	0.03	0.05	0.05
Cluster 2	0.31	0.32	0.32	0.31	0.31	0.05	0.04	0.04	0.05	0.05
Cluster 3	0.27	0.27	0.28	0.27	0.26	0.06	0.06	0.05	0.07	0.07
Cluster 4	0.30	0.30	0.30	0.30	0.29	0.05	0.06	0.06	0.06	0.06

Table 6. Shape parameter estimation and its uncertainty width.

Estimation						Uncertainty Width (5–95%)
	MCMC1	MCMC2	MCMC3	ML	LMOM	MCMC1	MCMC2	MCMC3	ML	LMOM
Cluster 1	−0.004	−0.002	−0.003	−0.002	−0.003	0.143	0.131	0.121	0.145	0.145
Cluster 2	−0.113	−0.126	−0.101	−0.108	−0.126	0.091	0.073	0.063	0.136	0.136
Cluster 3	0.027	0.024	0.025	0.031	0.033	0.151	0.153	0.143	0.217	0.217
Cluster 4	0.053	0.054	0.053	0.060	0.086	0.120	0.117	0.107	0.219	0.219

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
