1. Introduction
Some of the major natural catastrophe events are floods, caused by the falling of extreme rainfall exceeding the capacity of a particular catchment. These events are likely to be more frequent and severe considering the ongoing radical changes in climate [
1]. The calamity caused by these continuous, heavy inundations poses a significant effect on human lives and the surrounding environment both socially and economically [
2]. In fact, they are the costliest hydrological events, and account for 31% of the world’s economic loss [
3]. In the last 20 years, there have been 951 weather-related events among the 999 natural disaster events in Europe, of which 41% of them were floods [
3]. Specifically, flash floods, caused by intense rainfall, need to be comprehensively addressed since they leave little time to warn the population and cause more deaths than normal floods [
4].
Even with the ongoing threat associated with floods, settlements are sited along the floodplain zones due to the benefits that come from rivers [
5]. Such settlements have served a major contribution to the current flood’s complexity requiring the need for extensive analysis in predicting and managing the future probability of flood re-occurrence. This is possible by adopting various flood risk analysis measures [
6]. The designing of flood defenses has been widely applied as a measure to manage and control future flood risk. So long as a flood magnitude is known, decision-makers are in a position to characterize the possible location for a flood to occur by predicting flood duration, rising rate, flood depths, velocities, and damage. For instance, Dottori [
7] produced a new dataset for river flood hazard maps for the European and Mediterranean Basin regions, which can be used for a range of applications at a continental scale, from evaluating present and future river flood risk scenarios to the cost-benefit assessment of different adaptation strategies to reduce flood impacts. Diakakis [
8] developed a database of flood-related fatalities to investigate qualitative changes in flood mortality in Greece between 1960 and 2010. Chatzichristaki et al. [
9] analyzed the flash flood in Rhodes Island (South Greece) on 22 November 2013 through the curve number method and the soil conservation service runoff estimation method. Rehan [
10] developed an innovative micro-scale approach for vulnerability and flood risk assessment with applications in property-level protection for an urban area in Teddington, London.
The quantification of a flood risk involves the characterization of different hydrological factors that need to be taken into consideration during analysis. More clarity is seen by considering more than one variable in analyses through a multivariate approach [
10]. This kind of approach has been adopted in many research works [
11,
12,
13] due to its proficiency in delivering a full screen for a flood event. For example, Karatzetzou [
14] developed a unified seismic-flood-hazard model for the risk assessment of roadway networks in Greece, which supports the risk assessment of critical elements of transportation infrastructure in a multi-hazard environment. Moreover, an occurrence of a flood event is a combination of several attributes, such as peak flow, flood duration, and flood volume. However, flood factors are dependent on each other, which limits directly combining their marginal distributions. Recent studies employed copulas in modeling such multivariate hydrologic events [
1,
2,
10,
11,
12,
15,
16,
17,
18,
19,
20]. The copula function models the marginal distribution and multivariate dependency separately, offering flexibility in making the needful adjustment of the marginal and joint probability functions.
River Thames is the second-longest river in the UK, with a total catchment area of 12,935.77 km
2. There have been several reported flood incidences reflected as early as the Anglo-Saxon Chronicle of 1099 [
21], making it a flood-prone river. At Teddington lock (Kingston), the River Thames changes its regime from being tidal to non-tidal [
5]. Recently, there have been climatic suggestions of an increase in magnitude and intensity of rainfall as well as a rise in sea level that will lead to an increase in floods. In addition, there are new developments planned that will interfere with the effectiveness of the measures already in place to combat floods at the Thames flood plain [
21]. Although several mitigation measures are in place to manage the reoccurring events, the risk of flood is still a threat. Most flood defenses constructed on the Thames, such as the tidal surge barrier, walls, and embankments, are coming to an end of their design life. At the moment, there are plans in place to manage tidal flood risk for the next 100 years. However, implementations of engineering and managerial measures to combat floods require a properly structured flood risk analysis among the inputs. Owing to this, most studies conducted on the Thames have focused on the design and management of tidal floods at the Thames estuary [
11,
21,
22,
23]. Considering the ongoing threat of fluvial floods due to climate change, there are limited studies that analyze its risk. Furthermore, there are no research works undertaken that analyze multivariate risk resulting from floods in the Thames River. Multivariate flood risk analysis aims at finding the best probability estimate of flood re-occurrence on the river Thames in a specified return period, since it compares different variables using the copula function.
Therefore, this study aims to analyze the multivariate flood risk in the River Thames under consideration of multiple flood variables. In detail, this analysis involves quantifying the joint probabilities of flood peak and three-day maximum flow through different copula functions. Moreover, the unknown parameters of copulas will be estimated using method-of-moments (MOM). The performances of the copulas are evaluated through the goodness-of-fit statistical tests in which the best copula is employed for further flood risk analyses. In this study, the risk indices under consideration would be the return periods for critical thresholds of the flood variables, which include the primary, joint, and secondary return periods. Moreover, the risk of failure will be further characterized based on interactive analysis of two flood variables aiming at revealing significant effects for persisting elevated risk levels due to impacts from multiple interactive flood variables.
2. Overview of the Study Area
The River Thames, as shown in
Figure 1 below, is the longest river in England with a longitude and latitude of 51°35′8.07″ N and 0°36′57.87″ W, respectively. It has a length of 346 km flowing from the springs on Gloucestershire east towards the North Sea with a total catchment area of 12,935.77 km
2. The Thames is divided into five sections with three non-tidal and two tidal sections. It is fed by fifty-one named tributaries having eighty islands and forty-five weirs and locks throughout its course [
24].
Figure 1 shows the largest basin of the river Thames with a total catchment of 9942 km
2 above the Teddington weir (Kingston). The Teddington weir is located at the furthest lower west end of the catchment located at 51°24′54.0″ N and 0°18′31.9″ W.
The basin receives precipitation at an average of 710 mm, evenly distributed throughout the year with the exception of autumn or the early winter maximia. During summer, 460 mm of the precipitation is lost due to evaporation, accounting for 65% of the total rainfall. The average river flow ranges from 1.5 m
3/day to 65.8 m
3/day from the upper side of the river to the tidal limit located at Teddington [
25]. The annual runoff of the river ranges between 99 (in 1934) and 418 mm (in 1951), with an average of 250 mm [
26]. The catchment has a temperate climate with oceanic influence, with temperatures of 16.4 °C and 4.6 °C during summer and winter, respectively, and a mean yearly temperature of 11 °C [
24].
The basin has undergone historical changes in its topography, geology, and land use; its land cover is comprised of arable agriculture and pasture throughout the catchment, while forests and urban areas are mainly located at the lower side of the basin [
25]. Despite several measures placed in the lower Thames to support resilience to climate variability, there is still future fluvial flood risk due to an increase in rainfall, especially during the winter season [
22]. Flooding has always been one of the major challenges for the River Thames in the 20th/21st century, such as the 1947 and 2007 floods [
27]. The flood-generating mechanisms have changed in which snowmelt (sometimes over frozen ground) was a more common mechanism in major flood events (e.g., the 1947 flood) prior to the 1960s, while rising winter temperatures have seen snowmelt decline as an aggravating factor in relation to flood risk [
28] Furthermore, the sustained heavy rainfall on a saturated catchment caused the 1894 flood [
28].
3. Methodology
This study used an inductive research approach rather than a deductive one for analysis. Such an approach was chosen with an aim to formulate a hypothesis and develop a hydrological risk indicator from the joint distributions of two flood variables, i.e., annual peak flow and the corresponding 3-day maximum flow. This study is based on the quantitative approach of data collection using existing data from the NRFA. Before further analysis, the collected data was gathered in Microsoft Excel to check for missing data in any years—no missing data were observed. The data gathered from the gauged daily flow (GDF) was then calculated further to obtain the three-day annual maxima in each year from 1 January 1883 to 1 October 2019. The peak flow and three-day maxima flow, each containing 137 samples, were presented for further analysis in the software MATLAB. These variables are dependent on each other; hence, the copula function was used to bring together their marginals to form a joint distribution. The Gumbel, Frank, and Clayton copula methods, which belong to the Archimedean family, were applied to join the marginal distributions of the variables. The best copula method was then used to model the hydrological risk based on the joint return period in AND.
Various tools and materials were used in the analysis of the flood risk for the River Thames at Kingston. The main software used for the analysis of flood risk estimation were MATLAB and Microsoft Excel. MATLAB was extensively used for the construction and selection of marginal distributions and copulas, as well as the calculation of the return period of the flood variables. Several charts were extracted for analysis in this study. Microsoft Excel was used for the identification of the annual peak flow and 3-day maximum flow to be used in MATLAB.
3.1. Flood Risk Assessment through the Copula Function
The copula function is seen as a cornerstone in the field of hydrological and meteorological risk analysis. Its uniqueness is stipulated in its ability to incorporate the joint probabilities of more than one dependent variable to anticipate the future risk of an event, as shown in
Figure 2. It is applied in water resource engineering in analyzing flood frequency, precipitation, storm, and drought characteristics [
29]. The joint probabilities of multiple attributes provide a wider span in analysis, thus improving the accuracy of the estimated risk [
12].
A d-dimensional copula, C: [0; 1]
d → [0;1] is a cumulative distribution function (CDF) of a random vector (U
1,U
d) with uniform marginals (0,1). The generic copulas C(u) is shown in Equation (1) below as:
where
p [U
j ≤ u
j] = u
j for
and
.
If a component u
j is 0, then
while
if 0 ≤ u
j ≤ 1 [
2].
3.2. Sklar’s Theorem
The copulas theorem was first introduced by Sklar, who considered the existence of copulas when d-dimensional CDF and F have marginals
F1,
Fd such that [
29]:
where
are marginal distributions; if these are continuous, then the copula function C can be written as follows:
Details on copula function properties are described in detail by Zhang and Singh [
29] and Salvadori and De Michele [
30].
3.3. Copula Density
If the copula has a probability distribution function (PDF), then it is obtained in the usual manner as [
29]:
and the joint probability density function (pdf) of the two random variables is given as:
Subsequently, the conditional pdf of X
1, given the value of X
2, can be formulated as:
and the conditional pdf of X
2, given the value of X
1, can be expressed as:
3.4. Types of Copulas
There are five main types of copulas that are applied in different fields; the Gaussian and student-t copulas, which belong to the elliptical family, while the Gumbel, Clayton, and Frank copulas are from the Archimedean copulas family.
The Archimedean family is quite popular and attractive in hydrological flood frequency analysis. This is because of their capability of being easily generated and encapsulating a wide range of dependence structures with several desirable properties, such as symmetry and associativity [
31]. Furthermore, it has no limitation on the marginal distribution for constructing the joint distribution, making it a perfect fit in the analysis of the joint probability distribution in the flood analysis [
12].
The Archimedean copula is a simplified measure of dependence and is defined as [
18]:
where φ(u) is a copula function (C
2) called a generator of copular with φ(1) = 0 while φ′(u) < 0 then φ is decreasing. φ″(u) > 0 for all 0 ≤ u ≤1 [
2].
Each copula family’s persistence is constricted by the involvement of the flood variables (e.g., by using the Kendall’s rank correlation (t) dependence measure). To begin with, the Clayton copula and Gumbel-Hougaard copula can only model positive dependence between random variables (i.e., t
θ ≥ 0), while the Frank copula can model both negative and positive dependence structures (i.e., t
θ [−1, 1]). Despite the Ali-Mikhail-Haq copula being applicable for both negative and positive dependence, the dependence parameter is restricted for Kendall’s tau, t
θ [−0.1817, 0.3333], and thus its copula parameter θ does not cover the entire range [−1, 1] of association measures [
13]. Their formulas are shown in
Table 1 below.
3.5. Marginal Distribution of Flood Variables
There are several marginal distributions used for estimating the hydrological variables, such as normal distribution, lognormal, extreme value type-1 (Gumbel or EVI), and generalized extreme value distribution (GEV) [
12]. Two of the distributions that will be used in this study are elaborated in
Table 2 below.
3.6. Estimation of Parameters
Several methods are applicable for estimating copula parameters, such as method-of moments (MOM), maximum likelihood estimators, and matching moments methods [
32].
3.6.1. Method of Moment (MOM)
The MOM is a simple technique that matches the first amount of data with the theoretical moments of the distribution. This method is achieved by solving the sample (k) equations; hence, reducing the theoretical moment of a function of the parameters and parameter estimation problem. Given the k-th population moment of a random variable Y as
, and the k-th sample moment of a sample
to be
for k = 1,2,n, the method of moment of the distribution parameters
are attained by solving the set of
p equations expressed as [
32]:
Even though the method-of-moment estimators are not generally efficient, they are asymptotically normal and unbiased
3.6.2. Maximum Likelihood Estimation (MLE)
Although this method is considered difficult since it involves too many parameters, especially for large values of d, it is widely applied in flood risk. It takes the probability distribution of the flood variable, then makes the comparison of the distributed dataset to acquire a suitable match [
32].
The joint pdf of
is the same as the likelihood given as:
Assuming
as the parameter value at which
attains its maxima as the function of θ for each sample x,
is the maximum likelihood estimator of the parameter
, based on the sample data. Given the likelihood function as C
2, the probable MLE parameter values of θ are solved as [
32]:
3.7. Goodness-of-Fit Statistical Tests
The goodness-of-fit is a hypothesis test used to determine the correlation of variables and whether the collected data fit a particular distribution. They evaluate performance for both marginal and joint probability functions. The measures of the goodness-of-fit are used in hypothesis testing to test the normality of residuals as well as to check whether two samples (observed and from the marginal distribution) are drawn from identical distributions. The tests commonly used are: the chi-square, Kolmogorov–Smirnov, Anderson–Darling, root mean square error (RMSE), and Shapiro–Wilk. The estimation of empirical non-exceedance probabilities of the flood variables for the observed values is as follows [
2]:
where
is the probability of non-exceedance; N signifies the sample size; k is the kth smallest observation in the data set.
The root mean square error (RMSE) is one of the most applied measures of the goodness-of-test. RMSE calculates the variations between the hypothetical model and the observed values of the probability distribution. It is known that the smaller the RMSE, the better the estimation. It is given by [
11]:
where F
emp and C are the empirical and theoretical cumulative probability, respectively; n is the sample size. The smaller the RMSE value, the better the fitting [
11].
- 2.
Akaike Information Criterion (AIC)
The AIC value can be obtained as follows, established on the basis of RMSE [
11]:
where
N is the sample size and
k is the number of unknown parameters in the probability distribution
- 3.
The Kolmogorov–Smirnov (K–S) test
The Kolmogorov–Smirnov (K–S) test is preferred since it does not make any assumptions about the distribution of data [
33]. This method makes a comparison of the maximum gap of the experimental cumulative distribution function from the theoretical cumulative distribution function. The K–S test (
) is used to determine whether the parameters are acceptable or not and is given by [
33]:
where F
1,n = experimental distribution;
= theoretical distribution; sup = supremum function.
To perform the goodness-of-fit test, the null hypothesis is applied; it only gets accepted when the gap between the theoretical is smaller than the expected for the given sample.
3.8. Primary and Secondary Return Period
Primary and secondary return periods can be obtained if the appropriate copula functions are specified to reflect the joint probabilistic features among flood variables. The joint return periods can be described in OR and AND, formulated as follows [
31]:
where μ is the mean inter-arrival time of the two consecutive flooding events.
The secondary (Kendall’s) return period plays a vital role in identifying the average time between two supercritical flood events, thus analyzing the potential risk. The probability of a supercritical event decreases upon an increase in the primary return period. It is defined as [
31]:
where K
C is the Kendall’s distribution, associated with multivariate copula function C
θ.
For Archimedean copulas, K
C can be expressed as [
31]:
where
is the right derivative of the copula generator function
3.9. Flood Hydrologic Risk
In terms of the engineering design of hydrologic infrastructure, the risk is the probability of downstream flooding due to uncontrolled water release from upstream facilities associated with flooding [
31]. The intensity of flood risk is usually determined by the period of future re-occurrence of heavy inundation, commonly known as the return period. Under consideration of the service time (i.e.,
n) of a hydrological infrastructure, the risk of failure associated with the return period of a flooding event can be expressed as [
31]:
where R is the risk of failure,
p and
q are the probability of an event exceeding and not exceeding the set threshold, and T is the return period.
The failure probability (R) is the measure of flood risk ranging between 0 and 1, with a greater value indicating a greater risk magnitude. At most, the failure of hydrologic structures is due to high peak flow, making it an important attribute for risk analysis. Nevertheless, other flood variables, such as multi-day flood volume and duration, are also important for flood control. A risk framework that considers more than one variable may provide more support for actual flood control than the conventional analysis [
31]. It is essential to characterize floods through multiple variable aspects by a process called bivariate risk analysis. Bivariate risk analysis plays a significant role in taking non-structural safety measures and developing flood mitigation strategies. Bivariate risk analysis in relation to joint return period in AND is defined as [
31]:
6. Conclusions
6.1. Summary of Current Study
The novelty of this study was to analyze the flood risks for the River Thames within a multivariate context and further reveal the failure probabilities under consideration of the joint return period with different service time scenarios. Initially, a framework for bivariate hydrologic risk analysis for peak flow and 3-day annual maximum flow was prepared. This analysis was done based on historical data from 1883 to 2019, obtained from the National River flow regime website (NRFA). The marginal distribution of the obtained flood variables was conducted using Gumbel and lognormal methods by using two estimation methods—method of moment (MOM) and maximum likelihood estimation (MLE). The flood variables were hypothetically tested for statistical satisfaction. The Kolmogorov–Smirnov test was adopted to check for the acceptability of the variables, while the root mean square error (RMSE) was used to decide the best choice among the four options. The best fit from both flood variables was the Gumbel method obtained from MOM. The distributions of the two flood variables were joined using different copula methods—the Clayton, Frank, and Gumbel–Hougaard methods. The estimation of the unknown parameter of copula was conducted using the MOM method. The obtained copulas were then tested for statistical correlation using the Kolmogorov–Smirnov test. The results showed that all the methods were statistically correlated. The root mean square error (RMSE) and Akaike information criterion (AIC) methods were then used for the identification of the best copula among the three copula methods. Based on the selected copula, the primary, joint, and secondary return periods were developed. The bivariate hydrologic risk (i.e., failure probability) of the peak 3-day AMAX flow was then defined from the derived joint return periods
6.2. Main Findings and Recommendation
The Clayton copula was used to model the joint return period of the two flood variables to characterize the bivariate hydrological risk. Three return periods were estimated: primary, joint, and secondary return periods. The essence of calculating the secondary return period was to show the correspondence of the supercritical flood event. The secondary return period was located between TOR and TAND, whilst TOR was the shortest return period, less than the return period of individual flood variables, and TAND was the longest one among the joint and secondary return periods.
Therefore, the hydrological risk analysis of the flood pair was analyzed based on the TAND joint return period. Results showed that during a flood event, the bivariate hydrologic risk for the flood pair would have a constant 3-day flow up to 700 m3/s. However, there will be a decrease in flood risk as the 3-day flow rose above the stated value. Such bivariate values could be useful in the design of various engineering facilities that are to be placed on the Thames for flood control. Analysis of this study outlines the need to consider more variables apart from peak flow during infrastructure design to produce a more standardized design of the hydrologic facilities that can withstand extreme conditions of floods.
6.3. Suggestions for Future Research
This study only considered one flood pair, i.e., peak flow and 3-day AMAX flow, for risk characterization. During heavy inundation, peak was the major flooding factor for the failure of hydraulic facilities, making it an important factor for consideration of flood risk analysis. However, other factors have a role to play in the control and alleviation of river flooding. Such factors include flooding volume, duration, and annual maximum days flow. Analyzing flooding duration assists in the understanding of flood pressure, enabling its control while results from the volume factor airs out information on diversion of floods. Thus, further studies are required that consider other flood pairs (peak-volume, peak-duration, and peak-5-day AMAX) in order to come up with a robust multivariate flood analysis for the Thames.
Moreover, the developed model for multivariate risk analysis is based on stationary data series for both flood peaks and 3-day AMAX flows, since no significant variation trends are detected based on the historical records. Nevertheless, these stationary features may not be valid in the future due to the increasing effects of climate change. It is widely accepted that assuming stationarity in the hydrologic variables used for long-lived engineering designs is no longer tenable [
34,
35]. For the Thames Basin, it may hardly introduce covariates into the estimated parameters in the established copula model based on stationary historical records. One way to address the non-stationarity that may occur in future flood series is to generate flood series through reliable hydrological models based on climate projections as done by Bell et al. [
22]. The copula model can then be developed based on the projected flood series, in which covariates such as time can be used to reflect the nonstationary features in the flood series. Such a model can be applied for supporting flood resilience under climate change. Nevertheless, the results from the current study are still useful for flood control even though these results are based on stationary data series. These results can be considered as the reference baselines for future flood protection under climate change.