Article

A Water Shortage Risk Assessment Model Based on Kernel Density Estimation and Copulas

1
Faculty of Geography, Yunnan Normal University, Kunming 650500, China
2
Yunnan Provincial Institute of Water Resources and Hydroelectric Survey & Design & Research, Kunming 650500, China
*
Authors to whom correspondence should be addressed.
Water 2024, 16(11), 1465; https://doi.org/10.3390/w16111465
Submission received: 21 April 2024 / Revised: 13 May 2024 / Accepted: 17 May 2024 / Published: 21 May 2024

Abstract

Accurate assessment and prediction of water shortage risk are essential prerequisites for the rational allocation and risk management of water resources. However, previous water shortage risk assessment models based on copulas have strict requirements for data distribution, making them unsuitable for extreme conditions such as insufficient data volume and indeterminate distribution shapes. These limitations restrict the applicability of the models and result in lower evaluation accuracy. To address these issues, this paper proposes a water shortage risk assessment model based on kernel density estimation (KDE) and copula functions. This approach not only enhances the robustness and stability of the model but also improves its prediction accuracy. The methodology first uses kernel density estimation to quantify the random uncertainties in water supply and demand based on historical statistical data, thereby calculating their respective marginal probability distributions. Copula functions are then employed to quantify the coupled interdependence between water supply and demand based on these marginal probability distributions, thereby computing the joint probability distribution. Finally, the water shortage risk is evaluated based on potential loss rates and occurrence probabilities. The proposed model is applied to assess the water shortage risk of the Yuxi water receiving area in the Central Yunnan Water Diversion Project and is compared experimentally with existing models. The experimental results demonstrate that the model exhibits evident advantages in terms of robustness, stability, and evaluation accuracy, with a rejection rate of 0 for the null hypothesis of marginal probability fitting and a smaller deviation in joint probability fitting than the best-performing existing model.
These findings indicate that the model presented in this paper is capable of adapting to non-ideal scenarios and extreme climatic conditions for water shortage risk assessment, providing reliable prediction outcomes even under extreme circumstances. Therefore, it can serve as a valuable reference and source of inspiration for related engineering applications and technical research.

1. Introduction

Global warming and intensified human activities have exacerbated the spatiotemporal variations in global precipitation, evapotranspiration, runoff, and their associated water cycle [1,2], increasing uncertainty and risk in water supply and posing greater challenges to water resource management [3]. The “Global Drought Snapshot 2023: The Need for Proactive Action” [4] points out that if the global average temperature exceeds pre-industrial levels by 3 °C, an estimated 170 million people will experience extreme drought; limiting the temperature rise to 1.5 °C would result in an expected 50 million people experiencing extreme drought. Furthermore, 15–20% of China’s population could face more frequent moderate-to-severe droughts within this century; and by 2100, the intensity of drought in China is expected to increase by 80%. Therefore, the water crisis is a global issue [5], with China being particularly affected. Additionally, over the past century, the world’s freshwater usage has increased sixfold [6], and it is forecast that the global water demand will continue to rise, with approximately 25% of major cities facing heightened water stress [7]. Therefore, under the conditions of ongoing climate change and increasing water resource demand, accurate assessment and prediction of water scarcity risks are crucial for regional water resource planning, allocation, and risk management [8,9]. Since the 1980s, scholars have conducted extensive research on water scarcity and risk assessment [10,11,12,13,14], making significant contributions to mitigating global and regional water scarcity risks. However, there has been little research on the robustness of models and the accuracy of assessments.
Assessing and predicting water shortage risks is a complex process, owing to the uncertainty and extremity of hydrological variables in both space and time. Hashimoto, et al. [15] were the first to propose quantitative assessment indicators for water scarcity risks, including reliability, vulnerability, and resilience, from the perspective of the probability of water resource system failures. Subsequently, indicators such as the Falkenmark Water Stress Index [16], per capita water scarcity standards [17], Social Water Stress Index (SWSI) [18], Water Poverty Index (WPI) [19,20,21], and Water Scarcity Risk Index (WSRI) [22] have been proposed for assessing the degree and risk of water scarcity at both global and regional scales [23,24,25,26]. The aforementioned methods extract key elements from events and factors causing water scarcity to construct an evaluation indicator system for quantifying water shortage risk. Commonly used indicator weighting methods include the analytic hierarchy process (AHP) [27], entropy weight method [28], maximum entropy principle [29], principal component analysis (PCA) [30,31], projection pursuit [32], G1 method, and entropy weight–G1 method [33]. Approaches for risk quantification based on such indicators and weights include fuzzy comprehensive evaluation [27,30,34], fuzzy cluster analysis [35], variable fuzzy set evaluation model [36], grey relational and information diffusion theory [37,38,39], normal cloud model [40], matter element model [33,41], dynamic modeling of water resource shadow price calculation [42], and multi-objective risk decision models [43]. These types of methods have simple principles and are easy to use, but have strong subjectivity in indicator selection, weight determination, and risk level mapping. The evaluation results are relatively rudimentary, and the evaluation accuracy is not high, so they are only suitable for large-scale applications.
Another type of data-driven method [44,45,46] involves capturing data patterns through mathematical statistics, simulating the probability distribution of hydrological variables based on historical statistical data, thereby quantifying the risks associated with the random uncertainty of hydrological variables. The commonly used univariate probability distribution simulation methods include Pearson-III [47], normal [48], logistic [49], log-logistic [50], generalized extreme value (GEV) [51], gamma and Weibull [52,53], etc. These probability distribution functions are commonly used to fit the marginal probability distributions of hydro-meteorological time series such as precipitation [47], runoff [54,55], water supply and demand [50], drought duration and severity [51,53], potential evapotranspiration, and average temperature [52]. In practical situations, water shortage risk is a multi-variable stochastic coupled risk, requiring resolution of joint probabilities across multiple variables. Copula function families are widely used for computing joint probability distributions of multiple variables [50,53,56,57,58], due to their excellent ability to characterize the degree and structure of multivariate correlations [59]. It is important to emphasize that this requires building upon previous univariate marginal distributions. Ultimately, water scarcity risk is defined as a function of the joint probability of occurrence of water shortage events and their potential losses [48,60,61].
Previous studies have commonly selected a function from known distribution families that fits the univariate probability distribution as the marginal distribution, presupposing knowledge of the specific distribution to which the data adhere, verified through significance tests. However, the distribution forms of hydrological, meteorological, and socioeconomic variables are complex and diverse, potentially approximating normal or skewed distributions, complicating selection of distribution functions. This challenge may result in suboptimal function choices or even an inability to identify a suitable distribution function, severely restricting model applicability. To address this, this paper introduces kernel density estimation (KDE) [62,63,64] as a fitting method for univariate marginal probability distributions. Leveraging its non-parametric nature and capability to capture local detail features, KDE enhances model robustness and fitting accuracy, enabling adaptation to complex real-world scenarios. Multivariate joint probability distribution modeling utilizes copula functions. To accommodate different correlation structures among multivariate combinations, the copula function is not fixed. By employing Spearman and Kendall rank correlation coefficients [65,66], the optimal copula function is dynamically selected for each set of multivariate data as the final linking function, enhancing the predictive accuracy of the model. Finally, water shortage risk is quantified from the perspectives of joint probability and loss rate.
The Central Yunnan Water Diversion Project is a major water diversion project aimed at addressing severe water shortage issues in the Central Yunnan region, with significant economic, social, and ecological benefits [67]. Studying the characteristics and variations of water shortage risks in the water-receiving area based on historical supply and demand data is of great value, as it can provide a basis for the precise allocation of water resources after project completion. Applying the method proposed in this paper to evaluate the historical water shortage risk in the Yuxi water-receiving area of the Central Yunnan Water Diversion Project and conducting comparative experiments with existing methods will validate the superiority of the proposed method in terms of robustness, resilience, and predictive accuracy.

2. Principles and Methods

The water shortage risk refers to the probability and severity of threat posed to the normal operation of social, economic, and ecological systems in a specific spatiotemporal context due to the disruption of the water supply–demand balance caused by the randomness and uncertainty of water supply and demand [27]. Water scarcity is primarily evaluated and analyzed from the perspective of the water supply–demand balance to assess the degree and duration of water shortage and its potential impacts on life, production, and ecology. This evaluation considers how imbalances in water availability can disrupt normal societal functions, economic activities, and environmental health. This paper draws on Qian, Zhang, Wang and Hong [48] for the definition of water shortage risk, defining it as the product of water shortage probability level and potential loss rate level. Firstly, KDE is utilized to simulate the marginal probability distributions of long-term time series of both water supply and demand, quantifying the stochastic uncertainty of supply and demand. Secondly, based on the marginal probability distributions of both supply and demand, copula functions are employed to simulate the joint probability distribution, quantifying the probability of water shortage occurrence. Subsequently, the potential loss rate due to water shortage for each water user is calculated to quantify the severity of water shortage. Finally, both the probability of water shortage and the potential loss rate are divided into five warning levels, and criteria for level division are determined. The calculated water shortage probability and potential loss rate from the previous steps are mapped to the warning levels and multiplied to obtain the water shortage risk warning level. The constructed model not only characterizes the interdependence of supply and demand but also quantifies the coupling risk under extreme events. The algorithm flowchart of the model is shown in Figure 1.

2.1. Simulating the Marginal Probability

The monthly probability distributions of water supply and demand vary widely and are difficult to determine. This paper introduces kernel density estimation (KDE) to simulate the marginal probability distributions of water supply and demand sequences. KDE is a non-parametric probability density estimation method that does not require assuming that the data follow a specific distribution. Instead, it fits the probability distribution of discrete data based on the characteristics and properties of the data themselves. It can flexibly adapt to various complex and unknown data distribution forms and is commonly used in statistics for inferring the distribution of population data based on finite samples. Assuming that the discrete data points $x_1, x_2, \ldots, x_n$ are drawn from an unknown distribution, the probability density $f(x)$ at any point $x$ is
$$f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \tag{1}$$
where $n$ is the sample size, $h$ is the bandwidth parameter determining the smoothness of the estimated curve, and $K(\cdot)$ is the kernel function. Because of its smoothness and good mathematical properties, this paper selects the Gaussian function as the kernel function for kernel density estimation. Thus, the Gaussian kernel density estimation function is
$$f(x) = \frac{1}{nh\sqrt{2\pi}} \sum_{i=1}^{n} e^{-\frac{(x - x_i)^2}{2h^2}} \tag{2}$$
Therefore, the cumulative probability at any point $x$ is obtained by integrating the density up to $x$:
$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \frac{1}{nh\sqrt{2\pi}} \sum_{i=1}^{n} \int_{-\infty}^{x} e^{-\frac{(t - x_i)^2}{2h^2}}\,dt \tag{3}$$
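As a rough illustration of the Gaussian kernel density and its cumulative distribution, the following Python sketch evaluates both with the standard library only. The paper's own implementation is in Matlab and is not shown; the function names, the sample values, and the bandwidth of 100 are hypothetical. The cumulative form relies on the fact that each Gaussian kernel integrates to a normal distribution function, written here with `math.erf`.

```python
import math


def gaussian_kde_pdf(x, samples, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    n = len(samples)
    scale = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return scale * sum(math.exp(-((x - xi) ** 2) / (2.0 * h * h)) for xi in samples)


def gaussian_kde_cdf(x, samples, h):
    """Cumulative probability at x: the KDE integrated from -infinity to x.

    Each kernel contributes a normal CDF centred on its sample point.
    """
    n = len(samples)
    root2 = math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - xi) / (h * root2))) for xi in samples) / n


# Hypothetical annual demand values (assumed for illustration, not study data).
demand = [1200.0, 1350.0, 1280.0, 1500.0, 1420.0, 1310.0]
bandwidth = 100.0  # illustrative value only

density_at_1300 = gaussian_kde_pdf(1300.0, demand, bandwidth)
prob_below_1300 = gaussian_kde_cdf(1300.0, demand, bandwidth)
```

Using the closed form avoids numerical integration, which keeps the marginal probabilities smooth and strictly increasing before they are passed to a copula.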
The bandwidth $h$ directly determines the quality of the fit and can be neither too large nor too small. If it is too large, the estimate is over-smoothed and the condition $h \to 0$ is not satisfied; if it is too small, too few points are involved in the fitting at each location, leading to large errors. To determine $h$, this paper borrows an idea from machine learning: a risk function is constructed and the error is minimized through optimization iterations. When the error converges to a set value, the optimal bandwidth $h$ is obtained. Here, the risk function is constructed by minimizing the mean integrated squared error (MISE):
$$\mathrm{MISE}(h) = \min_{h} \int \left(\hat{f}(x) - f(x)\right)^2 dx \tag{4}$$
where $f(x)$ is the true probability density function, and $\hat{f}(x)$ is the probability density function obtained with a given value of $h$. The $h$ that minimizes $\mathrm{MISE}(h)$ is the optimal bandwidth. Based on Silverman's rule of thumb [68], the optimal bandwidth is given by
$$h_{best} = 1.06 \times \min\left(\hat{\sigma}, \frac{IQR}{1.34}\right) \times n^{-\frac{1}{5}} \tag{5}$$
where $n$ is the sample size; $\hat{\sigma}$ is the sample standard deviation; and $IQR$ is the interquartile range, i.e., the difference between the upper and lower quartiles, which serves as another measure of data dispersion. When the distribution contains outliers, using $IQR$ is more robust.
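The rule-of-thumb bandwidth can be sketched in a few lines of Python. This is an illustrative version under stated assumptions, not the authors' Matlab code: the function name and the supply values are hypothetical, and quartiles are taken from `statistics.quantiles` with its default method, so the result may differ slightly from tools that use another quartile convention.

```python
import statistics


def rule_of_thumb_bandwidth(samples):
    """Bandwidth 1.06 * min(sigma, IQR / 1.34) * n^(-1/5)."""
    n = len(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation
    q1, _median, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1  # interquartile range
    return 1.06 * min(sigma, iqr / 1.34) * n ** (-1.0 / 5.0)


# Hypothetical annual supply values (assumed for illustration, not study data).
supply = [980.0, 1100.0, 1250.0, 1020.0, 1400.0, 1180.0, 1300.0, 1090.0]
h_supply = rule_of_thumb_bandwidth(supply)
```

Because both dispersion measures scale with the data, the bandwidth is expressed in the same units as the series itself, which is why bandwidths for annual water volumes can be in the hundreds.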

2.2. Simulating the Joint Probability

The water shortage risk arises from the randomness and uncertainty of water supply and demand. Copula functions can connect the joint distribution of multiple variables with their respective marginal distributions, capturing the stochastic dependence between variables. They are widely used in multivariate hydrological risk analysis and calculation [55,58]. In this study, copula function families are utilized to simulate the joint probability distribution of water supply and demand, aiming to quantify the coupling probability of water shortage. Due to the different properties of various copula functions and their ability to describe different correlation structures, as well as the varying correlation structures and probability distributions of water supply–demand sequences in different regions and months, selecting appropriate copula functions for different water supply–demand sequences can improve the fitting accuracy. Specifically, this study chooses a total of five copula functions from two families: the Archimedean function family (Frank copula, Clayton copula, Gumbel copula) [69] and the elliptical function family (Gaussian copula, t copula) [52] to simulate the joint probability distribution of water supply and demand. The best-fitting copula function is then selected as the final linking function. Copulas construct multidimensional joint distributions based on marginal distributions and correlation structures, with their fundamental theory based on Sklar’s theorem.
Sklar's theorem [70]: Let $x$ and $y$ be continuous random variables with marginal distribution functions $u = F_x(x)$ and $v = F_y(y)$, respectively, and let $F(x, y)$ be the joint distribution function of $x$ and $y$. If $F_x$ and $F_y$ are continuous, then there exists a unique function $C(u, v)$ such that
$$F(x, y) = C\left(F_x(x), F_y(y)\right), \quad \forall x, y \tag{6}$$
$C(u, v)$ is the copula connection function, where $u$ and $v$ are the cumulative distribution function values of variables $x$ and $y$, respectively. Different copula connection functions have different mathematical forms and properties. Table 1 below provides a brief introduction to the properties of the five bivariate copula connection functions and their parameters.
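To make the role of the copula concrete, the sketch below writes out the standard textbook distribution functions of the three Archimedean copulas named above and evaluates them at one pair of marginal probabilities. It is a minimal Python illustration; the values of `u`, `v`, and `theta` are hypothetical and are not parameters estimated in the study. Inputs are assumed to lie in the interval (0, 1].

```python
import math


def clayton_cdf(u, v, theta):
    """Clayton copula, theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def gumbel_cdf(u, v, theta):
    """Gumbel copula, theta >= 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def frank_cdf(u, v, theta):
    """Frank copula, theta != 0."""
    num = (math.exp(-theta * u) - 1.0) * (math.exp(-theta * v) - 1.0)
    return -math.log(1.0 + num / (math.exp(-theta) - 1.0)) / theta


# Hypothetical marginal probabilities and dependence parameter.
u, v, theta = 0.3, 0.6, 2.0
joint = {
    "clayton": clayton_cdf(u, v, theta),
    "gumbel": gumbel_cdf(u, v, theta),
    "frank": frank_cdf(u, v, theta),
}
```

With the same marginals, the three families return different joint probabilities because each encodes a different dependence structure, which is the motivation for selecting the best-fitting family per sequence.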

2.3. Estimation of Potential Loss Rate

The potential loss rate is the ratio of the potential loss incurred due to water shortage to the normal production value under conditions of full water supply for the water user. This paper categorizes water users into three groups: production, domestic, and ecological.
The production users mainly include industry and agriculture. The potential loss rate for production, denoted as $L_P$, is defined as the ratio of the reduction in industrial and agricultural production value due to water shortage to the normal production value; the calculation formula is as follows:
$$L_P(x, y, p) = \frac{\sum_{i=1}^{N} (x_i - y_i) \times p_i}{\sum_{i=1}^{N} x_i \times p_i}, \quad i = 1, 2 \tag{7}$$
where $x_i$ and $y_i$ represent the water demand and supply for industrial or agricultural production, respectively, and $p_i$ denotes the economic value that ten thousand cubic meters of water supply can bring to industrial or agricultural production.
The potential loss rates for domestic use (denoted as $L_L$) and ecological use (denoted as $L_E$) cannot be quantified in the same manner as production. In this study, they are represented using the water shortage rate, defined as
$$L_L(x, y) = \frac{x_L - y_L}{x_L} \tag{8}$$
$$L_E(x, y) = \frac{x_E - y_E}{x_E} \tag{9}$$
where $x_L$ and $y_L$ represent the domestic water demand and supply, and $x_E$ and $y_E$ represent the ecological water demand and supply.
Given the differing levels of importance between production, domestic, and ecological water uses, an empirical weight is assigned to each type’s potential loss rate to distinguish their significance. Considering that domestic water use is the most critical, followed by production water use, and then, ecological water use, we assign weights of 0.5, 0.3, and 0.2, respectively, in that order. Consequently, the overall water shortage potential loss rate for a given assessment unit is finally defined as
$$L(x, y, p) = 0.5 \times L_L + 0.3 \times L_P + 0.2 \times L_E \tag{10}$$
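The loss-rate definitions in this subsection amount to a short calculation, sketched here in Python. All demand, supply, and unit-value figures are hypothetical numbers chosen for easy arithmetic; only the formulas and the weights 0.5, 0.3, and 0.2 come from the text.

```python
def production_loss_rate(demand, supply, unit_value):
    """Value-weighted shortage over the production sectors."""
    lost = sum((x - y) * p for x, y, p in zip(demand, supply, unit_value))
    full = sum(x * p for x, p in zip(demand, unit_value))
    return lost / full


def shortage_rate(demand, supply):
    """Water shortage rate used for domestic and ecological users."""
    return (demand - supply) / demand


def overall_loss_rate(l_domestic, l_production, l_ecological):
    """Weighted combination with the weights given in the text."""
    return 0.5 * l_domestic + 0.3 * l_production + 0.2 * l_ecological


# Hypothetical inputs: [industry, agriculture] demand, supply, value per unit of water.
l_p = production_loss_rate([100.0, 300.0], [80.0, 240.0], [50.0, 10.0])  # 1600 / 8000
l_l = shortage_rate(50.0, 45.0)
l_e = shortage_rate(20.0, 10.0)
total = overall_loss_rate(l_l, l_p, l_e)
```

In this made-up case the ecological shortage rate is the largest of the three, but its low weight means the overall loss rate stays close to the production value.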

2.4. Estimation of Water Shortage Risk

In this study, the water shortage risk warning level R is defined as the product of the water shortage probability warning level and the potential loss rate level:
$$R = M\left(F(x, y)\right) \cdot M\left(L(x, y, p)\right) \tag{11}$$
Here, M represents the level mapping function, which is utilized to map water shortage probabilities to their corresponding warning levels according to probability threshold intervals outlined in Table 2. Similarly, it maps potential loss rates to their respective warning levels based on loss rate threshold intervals provided in Table 3. The final level mapping matrix for water shortage risk and the corresponding warning level and safety level classification standards can be found in Table 4. The equation indicates that the overall risk assessment takes into account both the likelihood of a water shortage event occurring and the severity of potential losses associated with such an event. By multiplying these two factors, a more comprehensive understanding of the risk posed by water shortage can be achieved.
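The level mapping and multiplication can be sketched as follows. Note the assumption: the threshold values below are evenly spaced placeholders, not the intervals from Tables 2 and 3, which are not reproduced in this text. The function names are likewise hypothetical.

```python
from bisect import bisect_right

# Placeholder thresholds splitting [0, 1] into five levels (NOT the paper's tables).
PLACEHOLDER_PROBABILITY_THRESHOLDS = [0.2, 0.4, 0.6, 0.8]
PLACEHOLDER_LOSS_THRESHOLDS = [0.2, 0.4, 0.6, 0.8]


def warning_level(value, thresholds):
    """Map a value to a level from 1 to len(thresholds) + 1."""
    return bisect_right(thresholds, value) + 1


def risk_level(shortage_probability, loss_rate):
    """Product of the probability level and the loss-rate level."""
    return (warning_level(shortage_probability, PLACEHOLDER_PROBABILITY_THRESHOLDS)
            * warning_level(loss_rate, PLACEHOLDER_LOSS_THRESHOLDS))


example = risk_level(0.45, 0.21)  # level 3 times level 2
```

With five levels on each axis the product ranges from 1 to 25, which is the kind of matrix that a final warning-level classification would then partition.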

3. Instance Application and Comparative Experiment

3.1. Overview of the Study Area

The Yuxi receiving water area of the Central Yunnan Water Diversion Project selected as the case study region (refer to Figure 2) is located in the central part of Yunnan Province, China. For convenient management and allocation of water resources, the research area is divided into 13 receiving water sub-regions and three ecological lakes, as depicted in Figure 2a. The hydrological and topographical features of the study area are observed in Figure 2b. Major river systems in the area include the Qujiang, Nanpanjiang, Dajie, and Hongqi rivers. The region encompasses five meteorological stations, Hongta, Jiangchuan, Tonghai, Huaning, and Eshan (Figure 2b), which serve as sources of precipitation data. The average annual precipitation in the study area ranges from 800 to 900 mm. From May to October, influenced by the southeast monsoon from the Bay of Bengal and the southwest monsoon from the Indian Ocean, the region experiences abundant rainfall, with approximately 83% to 87% of the annual precipitation occurring during this period. The months from June to September receive the highest rainfall, accounting for 65% to 69% of the annual total. From November to April of the following year, the region is influenced by dry and clear weather conditions due to the dry and warm air currents from the northern Indian continent, resulting in minimal precipitation, accounting for only 13% to 17% of the annual total. The driest months, such as December and February, receive only about 2% of the annual precipitation. The average annual temperature ranges from 15 to 16 °C, with little variation in other climatic features such as average evaporation and relative humidity. Runoff is primarily derived from precipitation, with distinct seasonal variations. Approximately 85% of the runoff occurs during the flood season from June to November, with the highest runoff volumes observed in July, August, and September, accounting for 58% of the annual total. 
In contrast, during the dry season from December to May of the following year, runoff decreases significantly, accounting for only about 15% of the annual total, with the lowest runoff volumes observed in March and April, representing only 2.9% of the annual total.

3.2. Data Sources

The precipitation data are sourced from the monthly surface meteorological observation historical dataset provided by the National Meteorological Science Data Center, available at https://data.cma.cn (accessed on 20 January 2024). They comprise monthly precipitation records from five meteorological stations (Hongta, Jiangchuan, Tonghai, Huaning, and Eshan) spanning from 1961 to 2011. Historical supply and demand data from 1960 to 2011 for the 13 receiving water sub-regions, as well as population and economic data, were provided by the Yunnan Provincial Institute of Water Resources and Hydroelectric Survey, Design, and Research. The supply and demand data are monthly and include the domestic, industrial, agricultural, and ecological categories; annual supply and demand series in the water-receiving area are shown in Figure 3. Agricultural water demand is the highest and fluctuates significantly from year to year, followed by industrial, domestic, and ecological water demand. Other hydrological and water resource data are sourced from the “Yuxi City Water Resources Bulletin”, and the socioeconomic data are sourced from the “Statistical Bulletin on National Economic and Social Development of Yuxi City”.

3.3. Building the Water Shortage Risk Assessment Model

We use the annual supply–demand water series of Xiushan as sample data to demonstrate the construction process of the water shortage risk assessment model, implemented through Matlab programming. The key to model construction lies in determining the optimal bandwidth for Gaussian kernel density estimation and solving the parameters of the bivariate copula model.
Computation of the Optimal Bandwidth
The optimal bandwidths for Gaussian kernel density estimation for the demand and supply water samples are calculated as $h_{demand} = 444.4479$ and $h_{supply} = 457.8039$, respectively. To validate the fitting effect of the optimal bandwidth, we scale down (0.2 and 0.5 times) and scale up (1.5 and 2 times) the optimal bandwidths and use them as parameters in Equation (2) to simulate the probability density of the demand and supply water sequences. The results are shown in Figure 4. From Figure 4, it can be observed that the optimal bandwidth gives the best fit. The fit with a reduced bandwidth is not smooth enough and tends to overfit, while the fit with an enlarged bandwidth is too smooth, leading to larger errors. Then, by using the optimal bandwidths in Equation (3), we can construct the simulation functions for the marginal probability distributions of the supply and demand water sequences.
Estimation of Bivariate Copula Parameters
Based on the marginal probability distributions of water supply and demand, the model parameters of the Gaussian copula, t copula, Gumbel copula, Clayton copula, and Frank copula are estimated using Matlab’s copulafit function. The parameter estimation results are shown in Table 5, and the fitted bivariate copula density functions and distribution functions are illustrated in Figure 5. Thus, by incorporating the model parameters into Equation (6), the bivariate copula joint probability fitting model can be obtained.
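Matlab's `copulafit` performs likelihood-based estimation, which is not reproduced here. As a simpler stand-in that shows how a dependence parameter relates to rank correlation, the Python sketch below swaps in a different technique: inversion of Kendall's tau, using the standard relations theta = 2 tau / (1 - tau) for the Clayton copula and theta = 1 / (1 - tau) for the Gumbel copula. The data are hypothetical, ties are assumed absent, and the relations are only meaningful for 0 < tau < 1.

```python
def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs; assumes no ties."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if product > 0:
                score += 1
            elif product < 0:
                score -= 1
    return score / (n * (n - 1) / 2)


def clayton_theta_from_tau(tau):
    """Clayton parameter implied by Kendall's tau (0 < tau < 1)."""
    return 2.0 * tau / (1.0 - tau)


def gumbel_theta_from_tau(tau):
    """Gumbel parameter implied by Kendall's tau (0 <= tau < 1)."""
    return 1.0 / (1.0 - tau)


# Hypothetical ranks of paired supply and demand observations.
tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])  # 5 concordant, 1 discordant pair
theta_clayton = clayton_theta_from_tau(tau)
theta_gumbel = gumbel_theta_from_tau(tau)
```

Estimates from tau inversion generally differ from maximum likelihood estimates, so they are best read as a quick plausibility check rather than a replacement for the fitted values in Table 5.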

3.4. Model Performance Comparative Experiment

To verify the superiority of the proposed model in robustness and assessment accuracy, comparative experiments are designed from two aspects: simulating the robustness and fitting accuracy of univariate marginal distributions, and simulating the stability and fitting accuracy of bivariate joint probabilities. Comparative experiments are conducted using Matlab programming. Finally, the experimental results are compared and analyzed to provide evidence and insights.

3.4.1. Comparison Experiments on the Accuracy and Robustness of Marginal Distribution Simulations

To validate the superiority of KDE in simulating the univariate probability distribution of monthly supply–demand sequences, four commonly used distribution functions in the hydrological field (gamma [52], normal [48], logistic [49], Pearson3 [57]) were selected for comparison experiments with the introduced KDE in this paper. The Kolmogorov–Smirnov test (KS test) and root mean square error (RMSE) were utilized to evaluate the robustness and assessment accuracy of the comparison methods from both confidence probability and fitting accuracy perspectives.
The Kolmogorov–Smirnov test [71,72] is a non-parametric goodness-of-fit test. It determines whether to reject the null hypothesis by comparing the maximum absolute deviation (D-statistic) between the empirical cumulative distribution function (ECDF) of the sample and the theoretical cumulative distribution function (CDF) to the critical value at a specified significance level (0.05). Suppose the sample set $X = \{x_1, x_2, x_3, \ldots, x_n\}$ has an empirical cumulative distribution function $F_n(x)$, and $X$ follows a theoretical distribution $F(x)$. The formula to compute the maximum absolute deviation $D_n$ between the two is as follows:
$$D_n = \sup_{x} \left| F_n(x) - F(x) \right| \tag{12}$$
where $\sup_x$ denotes the supremum, i.e., the least upper bound of the absolute differences over all $x$, which here is the largest absolute difference. If $X$ follows the theoretical distribution $F(x)$, then as $n$ tends to infinity, $D_n$ almost surely converges to 0.
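A minimal Python sketch of the D-statistic is given below. Because the ECDF is a step function, the supremum is reached at a sample point, just before or just after the step, so both sides are checked. The function name and the three-point sample are hypothetical; converting the statistic to a p-value requires the Kolmogorov distribution and is omitted.

```python
def ks_statistic(samples, cdf):
    """Largest absolute gap between the ECDF of samples and a theoretical CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # ECDF equals i/n just after the step at x and (i-1)/n just before it.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1], used as a toy reference."""
    return min(max(x, 0.0), 1.0)


d_n = ks_statistic([0.75, 0.25, 0.5], uniform_cdf)
```

Passing the fitted distribution as a plain function means the same routine can score a gamma, normal, logistic, Pearson type III, or KDE fit without modification.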
The root mean square error (RMSE) [73] is commonly used to assess the performance of prediction models or the goodness-of-fit of data. In this paper, RMSE is utilized to quantify the difference between the empirical distribution and the fitted theoretical distribution as a measure of goodness-of-fit [74]. RMSE can be expressed as follows:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( F_n(x_i) - F(x_i) \right)^2} \tag{13}$$
In the above equation, $F_n(x_i)$ represents the empirical cumulative probability of the observed data, $F(x_i)$ represents the fitted theoretical cumulative probability, and $n$ is the number of samples in the dataset.
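The RMSE between empirical and fitted cumulative probabilities can be sketched as below. One assumption is labeled explicitly: the empirical probability of the i-th smallest observation is taken as i/n, whereas the paper does not state which plotting-position formula it uses. The function name and the sample are hypothetical.

```python
import math


def ecdf_rmse(samples, cdf):
    """RMSE between empirical probabilities (assumed i/n) and a fitted CDF."""
    xs = sorted(samples)
    n = len(xs)
    squared = sum((i / n - cdf(x)) ** 2 for i, x in enumerate(xs, start=1))
    return math.sqrt(squared / n)


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1], used as a toy reference."""
    return min(max(x, 0.0), 1.0)


rmse = ecdf_rmse([0.75, 0.25, 0.5], uniform_cdf)
```

Unlike the D-statistic, which records only the single worst gap, RMSE averages the gaps over all sample points, so the two measures can rank candidate distributions differently.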
We simulated the probability distributions of the monthly and annual water supply and demand sequences for the 13 assessment units (sub-regions) in the study area using gamma, normal, logistic, Pearson3, and KDE. With 12 monthly series and one annual series per unit, this gives 13 × 13 = 169 sequences each for supply and demand. Then, based on the fitted models and empirical cumulative probabilities, we used Equations (12) and (13) to calculate the KS p-value and RMSE.
Comparative analysis of the Kolmogorov–Smirnov test results
The results of the KS p-value calculations for the demand and supply sequences are plotted as curves in Figure 6. The p-value is the probability of observing the current sample data, or more extreme data, under the assumption that the null hypothesis is true. A higher p-value means the observed deviation is more compatible with the fitted distribution and is read here as a better fit. From Figure 6a, it can be observed that, out of the 169 demand sequences, only the 164th sequence has a KS p-value (0.038, for the normal distribution fit) below the significance level of 0.05. The p-values for all other sequences are above the significance level, indicating that extreme situations in water demand are relatively rare and that most sequences tend towards a normal distribution. From the KS p-values for the supply sequences (Figure 6b), it can be seen that every distribution except KDE has instances where the p-value falls below the 0.05 significance level. This indicates that the distribution of water supply is complex and diverse, and that some sequences do not follow any of the four known distributions; only KDE fits their probability distribution adequately. KDE therefore performs well where the other methods fit poorly. It adapted to every distribution shape in this dataset, including the extreme ones, demonstrating robustness and resilience that the other four methods did not match.
The statistical results of the KS p-values, including the mean, variance, and rejection rate of the null hypothesis, are summarized in Table 6. The optimal values are highlighted in bold in the table. The mean is used to measure the overall fitting accuracy of the method. A higher mean of the KS p-value indicates higher overall fitting accuracy. From the mean statistics in Table 6, it can be observed that KDE has the highest overall fitting accuracy for both the demand and supply sequences. The variance is used to evaluate the volatility of the KS p-value results, which reflects the stability of the method. A smaller variance indicates less volatility and greater stability of the method. From the variance statistics in Table 6, it can be seen that KDE exhibits the most stable fitting of the distribution shapes for both the demand and supply sequences. The rejection rate of the null hypothesis indicates the proportion of p-values lower than the significance level of 0.05, serving to characterize the fitting capability of the method when facing various distribution shapes. As shown in Table 6, the rejection rate of the null hypothesis for KDE is 0, indicating that KDE can adapt to the distribution shapes of all water supply and demand sequences, demonstrating the strongest modeling capability.
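The three summary statistics described above can be computed as in the following Python sketch. The p-values are hypothetical, and the use of the population variance is an assumption, since the text does not say whether the population or sample form was used.

```python
import statistics


def summarize_p_values(p_values, alpha=0.05):
    """Mean, variance, and null-hypothesis rejection rate of KS p-values."""
    rejected = sum(1 for p in p_values if p < alpha)
    return {
        "mean": statistics.fmean(p_values),
        "variance": statistics.pvariance(p_values),  # population form (assumed)
        "rejection_rate": rejected / len(p_values),
    }


# Hypothetical p-values for one fitting method across four sequences.
summary = summarize_p_values([0.9, 0.5, 0.03, 0.2])
```

A method with a rejection rate of 0 has no sequence whose fit is rejected at the 0.05 level, which is the property reported for KDE in Table 6.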
Comparative analysis of the RMSE evaluation results
The RMSE calculation results for the empirical and theoretical probability distributions of the water demand and supply sequences are shown in Figure 7. RMSE is a measure of the deviation between empirical and theoretical distributions, where smaller RMSE values indicate better fitting accuracy. The mean, variance, and range of RMSE values are then calculated in Table 7, with the optimal values highlighted in bold. The mean RMSE represents the average deviation of the fitting, with smaller values indicating higher overall fitting accuracy. The variance and range of RMSE values assess the stability and robustness of the fitting, where smaller values indicate lower fluctuation and a narrower range. From Figure 7a and Table 7, it can be observed that all five methods exhibit relatively high fitting accuracy for the water demand sequences, with fitting deviations below 0.1. Pearson3 and KDE have the smallest mean deviations, but KDE also demonstrates the lowest variance and range, indicating that among the five methods, KDE not only offers high fitting accuracy but also superior stability and robustness. Regarding the RMSE evaluation results for the water supply sequences in Figure 7b and Table 7, it is evident that the fitting deviations vary significantly among the five methods. Except for KDE and logistic, the other three methods show considerable fluctuation, especially gamma, which has the largest mean deviation, variance, and range. This suggests that the gamma distribution cannot adequately accommodate all distribution patterns of the water supply sequences. In contrast, KDE exhibits the smallest mean deviation, variance, and range, indicating that KDE not only offers the highest fitting accuracy but also the best stability and robustness, capable of accommodating all extreme distribution shapes of the water supply sequences.
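As a minimal sketch of the RMSE criterion, the following Python code compares an empirical distribution with two fitted theoretical distributions. The paper does not state which empirical frequency formula was used, so the plotting position i/(n+1) is an assumption, as are the synthetic data and the helper names `empirical_cdf` and `rmse`.

```python
import numpy as np
from scipy import stats

def empirical_cdf(sample):
    """Sorted sample and its empirical non-exceedance probabilities.

    Uses the plotting position i / (n + 1), which is an assumption here.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    return x, np.arange(1, n + 1) / (n + 1)

def rmse(empirical, theoretical):
    """Root-mean-square deviation between two probability vectors."""
    e = np.asarray(empirical, dtype=float)
    t = np.asarray(theoretical, dtype=float)
    return float(np.sqrt(np.mean((e - t) ** 2)))

# Synthetic, skewed stand-in for one water supply sequence (51 values).
rng = np.random.default_rng(1)
supply = rng.gamma(shape=2.0, scale=3.0, size=51)
x, p_emp = empirical_cdf(supply)

# Fit two candidate distributions and evaluate their CDFs at the sample points.
a, loc, scale = stats.gamma.fit(supply, floc=0)
rmse_gamma = rmse(p_emp, stats.gamma.cdf(x, a, loc=loc, scale=scale))

mu, sigma = stats.norm.fit(supply)
rmse_norm = rmse(p_emp, stats.norm.cdf(x, mu, sigma))

print(f"RMSE gamma:  {rmse_gamma:.4f}")
print(f"RMSE normal: {rmse_norm:.4f}")
```

Repeating this calculation over all sequences and taking the mean, variance, and range of the RMSE values gives statistics of the kind reported in Table 7.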
Comparison and analysis of the fitting results
To compare the fitting effects of the five methods more intuitively, the empirical probability distributions and the fitted theoretical distributions were plotted. Five representative sets (Figure 8) were selected from all the results for comparative analysis. From the fitted curves, it can be observed that for sequence data tending towards a normal distribution, such as the water demand and supply sequences of Xiushan, all methods achieve relatively good fitting effects. For sequences with significant fluctuations in demand and supply, such as Luohe and Yanhe, the gamma distribution fits poorly, but the other four methods can still adapt. However, for supply sequences with special distribution shapes like Shifu and Lishan-2-month, only KDE can accurately fit their probability distributions.
Overall, it is evident that KDE demonstrates the optimal fitting capability, showing robustness and resilience for random variables of water supply and demand with diverse and unknown distribution shapes. It can adapt to extreme distribution shapes and is highly suitable as a marginal distribution simulation function for water supply and demand sequences.

3.4.2. Joint Probability Simulation Accuracy and Robustness Comparative Experiment

The accuracy of marginal probability fitting for the individual water supply and demand variables and the suitability of the copula function jointly determine the accuracy of the joint probability simulation. The higher the accuracy of the joint probability simulation, the more accurate the assessment of the water shortage risk. To further verify the superiority of KDE in joint probability simulation, the marginal probability distributions of water supply and demand obtained by the previous five methods are each input into the five copula functions in Equation (6) to calculate their respective joint probability distributions. That is, the joint probabilities of water supply and demand are calculated for each combination of the five univariate probability simulation methods and the five copula functions. Then, the mean of the Spearman correlation coefficient and the Kendall rank correlation coefficient for the bivariate copula under the given parameters, which are shown in Table 5, is used to select the optimal copula for each method as the final linking function. Finally, the accuracy of the model is evaluated by calculating the squared Euclidean distance (SED) between the empirical joint probability distribution and the theoretical joint probability distribution.
The Spearman and Kendall rank correlation coefficients are non-parametric methods for measuring the strength and direction of the relationship between two variables based on the ranks of data objects [65]. They are particularly suitable for situations where the data do not follow a bivariate normal distribution or the measurement scale is not continuous and quantitative, and they are relatively insensitive to outliers. The difference is that the Kendall rank correlation coefficient ( τ ) is based on the concordant and discordant pairs of two sample datasets, as shown in Equation (14), while the Spearman rank correlation coefficient ( ρ ) is based on rank differences, as shown in Equation (15).
$$\tau_a = \frac{c - d}{\frac{1}{2} n (n - 1)}, \qquad \tau_b = \frac{c - d}{\sqrt{(c + d + t_x)(c + d + t_y)}} \tag{14}$$
In the equations, $n$ represents the number of samples; $\frac{1}{2} n (n - 1)$ represents the total number of pairwise combinations of samples; $c$ and $d$ represent the numbers of concordant and discordant pairs, respectively; $\tau_b$ is used to handle tied ranks, where $t_x$ and $t_y$ represent the numbers of tied ranks in datasets $X$ and $Y$, respectively. It is important to note that tied ranks occurring simultaneously in both $X$ and $Y$ are not counted in $t_x$ and $t_y$.
$$\rho = 1 - \frac{6 \sum d_i^2}{n (n^2 - 1)} \tag{15}$$
In the equation, $n$ represents the number of samples, and $d_i$ is the absolute difference between the ranks of the original observed data $x_i$ and $y_i$.
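To make the two rank coefficients concrete, the following Python sketch evaluates the tie-free forms of the Kendall and Spearman coefficients directly from their definitions on a small hand-made sample and cross-checks them against SciPy. The five demand and supply values and the helper names `kendall_tau_a` and `spearman_rho` are hypothetical and chosen only so that the arithmetic can be followed by hand.

```python
from itertools import combinations

import numpy as np
from scipy import stats

def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / (n(n-1)/2), no tie correction."""
    c = d = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            c += 1      # pair ordered the same way in x and y
        elif s < 0:
            d += 1      # pair ordered in opposite ways
    n = len(x)
    return (c - d) / (0.5 * n * (n - 1))

def spearman_rho(x, y):
    """Spearman rho from squared rank differences (valid when there are no ties)."""
    diff = stats.rankdata(x) - stats.rankdata(y)
    n = len(x)
    return 1.0 - 6.0 * np.sum(diff ** 2) / (n * (n ** 2 - 1))

# Hypothetical demand and supply values (no ties).
demand = [3.1, 4.0, 2.5, 5.2, 4.4]   # ranks 2, 3, 1, 5, 4
supply = [2.9, 3.5, 3.0, 5.0, 4.8]   # ranks 1, 3, 2, 5, 4

tau = kendall_tau_a(demand, supply)   # 9 concordant, 1 discordant pair
rho = spearman_rho(demand, supply)    # rank differences 1, 0, -1, 0, 0

# Cross-check against SciPy (kendalltau returns tau-b, equal to tau-a without ties).
tau_scipy, _ = stats.kendalltau(demand, supply)
rho_scipy, _ = stats.spearmanr(demand, supply)
print(tau, rho, tau_scipy, rho_scipy)
```

For this sample there are 9 concordant pairs and 1 discordant pair out of 10, so the Kendall coefficient is 0.8, and the squared rank differences sum to 2, so the Spearman coefficient is 0.9.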
The formula for calculating the mean of the Spearman coefficient ( ρ ) and Kendall rank coefficient ( τ ) (abbreviated as MSK) is as follows:
$$MSK = \frac{\tau + \rho}{2} \tag{16}$$
The formula for calculating the squared Euclidean distance between the empirical joint probability distribution and the theoretical joint probability distribution is as follows:
$$SED = \sum_{i=1}^{n} (E_i - C_i)^2 \tag{17}$$
The symbol $E_i$ represents the empirical joint probability distribution, $C_i$ represents the theoretical joint probability distribution fitted by different copula functions, and $n$ is the sample size.
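The SED calculation can be sketched in Python as follows. Several choices here are assumptions rather than the paper's procedure: the data are synthetic, the marginal probabilities are rank-based pseudo-observations standing in for the fitted marginals, the Clayton copula is used as one example family (the five copulas of Equation (6) are not restated here), and its parameter is obtained from the standard relation θ = 2τ/(1 − τ). The helper names are hypothetical.

```python
import numpy as np
from scipy import stats

def empirical_joint_cdf(x, y):
    """E_i: fraction of observations j with x_j <= x_i and y_j <= y_i."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    below = (x[None, :] <= x[:, None]) & (y[None, :] <= y[:, None])
    return below.mean(axis=1)

def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def sed(empirical, theoretical):
    """Squared Euclidean distance between two joint-probability vectors."""
    e = np.asarray(empirical, dtype=float)
    c = np.asarray(theoretical, dtype=float)
    return float(np.sum((e - c) ** 2))

# Synthetic, positively dependent demand and supply series.
rng = np.random.default_rng(2)
demand = rng.normal(100.0, 10.0, size=60)
supply = 0.8 * demand + rng.normal(0.0, 6.0, size=60)

# Stand-in marginal probabilities (pseudo-observations); an assumption.
n = demand.size
u = stats.rankdata(demand) / (n + 1)
v = stats.rankdata(supply) / (n + 1)

# Clayton parameter from the sample Kendall tau.
tau, _ = stats.kendalltau(demand, supply)
theta = 2.0 * tau / (1.0 - tau)

E = empirical_joint_cdf(demand, supply)
C = clayton_cdf(u, v, theta)
sed_value = sed(E, C)
print(f"theta = {theta:.3f}, SED = {sed_value:.4f}")
```

Replacing `u` and `v` with the probabilities from a fitted marginal model (for example, a KDE-based CDF) gives the SED for that marginal and copula combination.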
Comparative analysis of MSK
We use MATLAB’s copulastat function to calculate the τ and ρ of the bivariate copula under the given parameters, to measure how well the copula function suits the input data. A higher mean of τ and ρ (MSK) indicates that the copula function is more capable of describing the dependence between the input data, making it more suitable for simulating their joint probability distribution. Therefore, the copula function corresponding to the maximum MSK is chosen as the final connection function to compute the joint probability. The calculated results of MSK for the optimal bivariate copula, using gamma, normal, logistic, Pearson3, and KDE as marginal distribution functions, are shown in Figure 9. The mean and variance of MSK are further calculated and presented in Table 8, with the optimal values highlighted in bold. From Figure 9, it can be observed that the MSK of the bivariate copula with KDE as the marginal distribution function is higher than that of the other four methods. Particularly for some extreme supply and demand distribution shapes, KDE demonstrates significant superiority and robustness. From the statistical results in Table 8, it can be observed that the bivariate copula with KDE as the marginal distribution exhibits the optimal ability to fit joint probability distributions, demonstrating strong stability and robustness, particularly for extreme bivariate distribution shapes.
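The selection rule can be illustrated with a Python sketch that derives τ and ρ from a copula's parameter rather than from the sample, in the same spirit as the calculation described above. It is not a port of the MATLAB function: Kendall's τ uses the closed forms θ/(θ + 2) for the Clayton copula and 1 − 1/θ for the Gumbel copula, Spearman's ρ is approximated numerically from the identity ρ = 12 ∬ C(u, v) du dv − 3, and the two parameter values are arbitrary assumptions rather than values fitted to the study data.

```python
import numpy as np

def spearman_from_copula(cdf, m=400):
    """Approximate rho = 12 * double integral of C(u, v) over the unit square - 3.

    Uses a midpoint rule on an m x m grid, which also avoids u = 0 and v = 0.
    """
    g = (np.arange(m) + 0.5) / m
    U, V = np.meshgrid(g, g)
    return 12.0 * float(np.mean(cdf(U, V))) - 3.0

def clayton(theta):
    """Clayton copula CDF for theta > 0."""
    return lambda u, v: (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def gumbel(theta):
    """Gumbel copula CDF for theta >= 1."""
    return lambda u, v: np.exp(
        -(((-np.log(u)) ** theta + (-np.log(v)) ** theta) ** (1.0 / theta))
    )

# Candidate copulas with assumed (not fitted) parameters and their Kendall tau.
candidates = {
    "clayton": (clayton(2.0), 2.0 / (2.0 + 2.0)),   # tau = theta / (theta + 2)
    "gumbel": (gumbel(1.5), 1.0 - 1.0 / 1.5),       # tau = 1 - 1 / theta
}

# MSK = (tau + rho) / 2 for each candidate; keep the copula with the largest MSK.
msk = {
    name: 0.5 * (tau + spearman_from_copula(cdf))
    for name, (cdf, tau) in candidates.items()
}
best = max(msk, key=msk.get)
print(msk, "->", best)
```

In practice each candidate's parameter would first be estimated from the marginal probabilities of the sequence under study, so the ranking differs from sequence to sequence.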
Comparative analysis of SED
First, the empirical joint probability distribution values of the supply and demand marginal distributions were calculated, and then, the optimal bivariate copula joint probability distribution values were calculated using five methods, gamma, normal, logistic, Pearson3, and KDE, as the marginal distribution functions. Subsequently, these values were input into Equation (17) to compute the squared Euclidean distance (SED) between the empirical joint probability (ECDF) and theoretical joint probability (CDF). The calculation results are shown in Figure 10, while the mean and variance statistics of SED are presented in Table 9, with the optimal values highlighted in bold. From Figure 10 and Table 9, it can be observed that the optimal bivariate copula model based on KDE exhibits minimal fitting bias and low fluctuation for all supply and demand sequences. The SED’s mean and variance are both minimal, indicating high simulation accuracy, robustness, and stability of the optimal bivariate copula model based on KDE.

3.5. Water Shortage Risk Assessment and Results Analysis

The superiority of the water shortage risk assessment model based on KDE and the optimal copula function has been validated. By incorporating monthly and yearly water supply and demand data, as well as potential losses of water users, into the model, and combining risk classification levels and threshold mapping standards (Table 4), the monthly and yearly water shortage risks for all water receiving sub-regions in the study area were calculated. The findings of this study are consistent with those of Jin, et al. [75]: the frequency, cumulative intensity, and cumulative impact station of regional droughts in Yunnan all show an increasing trend; droughts in Yunnan occur most frequently in December, January, and March, and least frequently in July and August.
Seasonal Variation Characteristics Analysis of Water Shortage Risk
The average water shortage risk was calculated for the four seasons (spring, summer, autumn, and winter) and the seasonal variation characteristic curve of the water shortage risk is plotted in Figure 11. From Figure 11, it can be observed that the water shortage risk is generally higher in spring and winter compared to summer and autumn, with the highest overall water shortage risk occurring in spring and the lowest in autumn. In wet years, the water shortage risk is low throughout the spring, summer, autumn, and winter seasons, while in dry years, the water shortage risk in spring is significantly higher than in other seasons. Overall, there is a trend of increasing water shortage risk.
Analysis of water shortage risk in typical wet, normal, and dry years
Based on the precipitation observation sequences from five precipitation observation stations in the water receiving area over 51 years (1961–2011), the cumulative frequency of the sorted average precipitation was fitted using the Pearson3 curve; five typical years were selected for scenario analysis of monthly water shortage risk, corresponding to precipitation frequencies of 5% (extremely abundant year: 1971), 25% (abundant year: 1981), 50% (normal year: 1996), 75% (drought year: 1982), and 95% (extreme drought year: 2011). The water shortage risk was categorized into high risk, medium risk, and low risk based on the 6 to 12 level of risk (refer to Table 4), as shown in Figure 12. From Figure 12, it can be observed that the water shortage risks in extreme drought years and drought years are significantly higher than in other years, with the first half of the year exhibiting higher risk than the second half. Water shortage risk in abundant years and extremely abundant years is relatively low and remains consistent throughout the year, which is related to the typical monsoon climate in the study area. In extreme drought years, except for the rainy season in August, the region experiences high risk almost throughout the year. The first half of drought years is almost consistently at a high-risk level. Normal years and abundant years are mostly in a medium-risk state, with only extremely abundant years approaching a low-risk level.
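The selection of typical years from a fitted Pearson type III curve can be sketched in Python as below. The precipitation series is synthetic, the curve is fitted by the method of moments (the fitting method used in the study is not stated), and the frequency is taken as the exceedance probability, so that 5% corresponds to a very wet year. The years printed by this sketch therefore have no relation to the typical years reported above.

```python
import numpy as np
from scipy import stats

# Synthetic annual precipitation (mm) for 1961-2011; a stand-in, not observed data.
rng = np.random.default_rng(3)
years = np.arange(1961, 2012)
precip = rng.gamma(shape=20.0, scale=45.0, size=years.size)

# Method-of-moments Pearson type III fit (an assumption). SciPy's pearson3 is
# parameterized by skew, with loc equal to the mean and scale to the standard deviation.
skew = stats.skew(precip, bias=False)
mean = precip.mean()
std = precip.std(ddof=1)

# Exceedance frequency of each year's precipitation under the fitted curve.
exceed = stats.pearson3.sf(precip, skew, loc=mean, scale=std)

# For each target frequency, pick the year whose exceedance frequency is closest.
targets = [0.05, 0.25, 0.50, 0.75, 0.95]
typical = {p: int(years[np.argmin(np.abs(exceed - p))]) for p in targets}

for p, year in typical.items():
    amount = precip[years == year][0]
    print(f"P = {p:.0%}: year {year}, precipitation {amount:.0f} mm")
```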
Spatial Distribution Characteristics of Water Shortage Risk
The mean and variance of water shortage risks in each water-receiving sub-region from 1961 to 2011 were calculated. The statistical values were then utilized to create thematic maps of the spatial distribution of the mean and variance of water shortage risks using the ArcGIS platform, based on the inverse-distance-weighted interpolation method (Figure 13). This was performed to evaluate the long-term water shortage risk status and the variation in the water-receiving sub-regions. Combining Figure 13a,b, it can be observed that Shifu, Yanhe, Dajie, and Ningzhou have been in a severe water shortage state for a long time, indicating that they are high-risk areas; this is related to the population concentration and rapid economic development in these areas. Dajie and Ningzhou exhibit significant fluctuations in water shortage risk, indicating a stronger dependence on precipitation; Qianwei, Jiangcheng, Xiushan, and Lishan have been in a moderate water shortage state for an extended period; the water resources in Luohe, Jiuxi, Xiongguan, Panxi, and Gaoda are abundant, and the water shortage risk has remained relatively low over the long term; the variance in water shortage risk in Luohe, Jiuxi, and Xiongguan is significant, indicating weak resilience in these areas, which are susceptible to the impacts of climate change and insufficient water infrastructure; only Gaoda exhibits strong resilience to water shortage risk, ensuring long-term water supply security without the need for external water supplementation.
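For readers unfamiliar with inverse-distance-weighted interpolation, a minimal NumPy sketch is given below. It is not the ArcGIS implementation used for Figure 13: the coordinates and risk values are hypothetical, the function name `idw` is illustrative, and the power of 2 is a common choice assumed here because the power used in the study is not stated.

```python
import numpy as np

def idw(xy_known, values, xy_query, power=2.0, eps=1e-12):
    """Inverse-distance-weighted estimate at each query point.

    Each known value is weighted by 1 / distance**power; `eps` keeps the
    weight finite when a query point coincides with a known point.
    """
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.asarray(xy_query, dtype=float)
    dist = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    weights = 1.0 / np.maximum(dist, eps) ** power
    return (weights @ values) / weights.sum(axis=1)

# Hypothetical sub-region centroids (km) and their long-term mean risk levels.
centroids = [(0.0, 0.0), (12.0, 3.0), (5.0, 9.0), (15.0, 14.0)]
mean_risk = [10.5, 7.2, 8.8, 6.1]

# Interpolate onto a coarse regular grid covering the centroids.
gx, gy = np.meshgrid(np.linspace(0.0, 15.0, 4), np.linspace(0.0, 14.0, 3))
grid_points = np.column_stack([gx.ravel(), gy.ravel()])
risk_grid = idw(centroids, mean_risk, grid_points).reshape(gx.shape)
print(np.round(risk_grid, 2))
```

Because the estimate is a weighted average of the known values, every interpolated cell lies between the smallest and largest input values, which is one reason the resulting maps are smooth around each sub-region.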

4. Discussion

The accuracy of water scarcity risk assessment hinges on the time scale, spatial scale, water supply and demand entities, and assessment methods. To enhance overall quantification accuracy, Mekonnen and Hoekstra [76] utilized a higher spatial resolution (30 × 30 arc minutes), evaluated on a monthly time scale, and incorporated environmental flow requirements, yielding a more precise depiction of water scarcity scenarios. Salmivaara, et al. [77] addressed the modifiable areal unit problem (MAUP) in spatial water resource assessments, in which results differ significantly depending on the choice of analysis unit, and proposed a multi-zone, multi-scale approach to enhance evaluation robustness. Similarly, Veldkamp, et al. [78], recognizing that a single scale cannot comprehensively reveal the diversity and complexity of water scarcity issues, employed probability simulation and multi-scale analysis methods to capture variations and extremes across scales, considering the contributions of climate change and population growth to water scarcity risks across regions. Some scholars have also made advancements in remote sensing, GIS technology, and the integration of multi-source data [9,79].
To enhance assessment accuracy, our study simultaneously considers temporal scales, spatial scales, and water users. Temporally, we examine the seasonal characteristics and historical trends of water scarcity risk on monthly and annual scales. Spatially, we use the minimum water-receiving area as the assessment unit to study differences in water scarcity risk and spatial distribution patterns under varying water endowments and socioeconomic structures across regions. Because variations in domestic, industrial, and agricultural water demand and the potential economic losses are accounted for, the evaluation results better reflect reality. Finally, in terms of probability distribution simulation methods, our study pioneers the use of kernel density estimation to model the marginal probability distributions of supply and demand variables. The advantage of KDE is that it requires neither a prior assumption that the data follow a specific distributional form nor any other prior knowledge. It can fit the probability distribution of discrete data based on the characteristics and properties of the data themselves. In particular, when the data distribution is complex and does not closely resemble any known distribution, KDE can capture the local features and details of the data, thereby providing more accurate and robust probability density estimates. However, a drawback of KDE is that, for highly discrete data, it may not perform as well as methods specifically designed for such data.

5. Conclusions

There are numerous studies on water shortage risk assessment models, but few explore the robustness and evaluation accuracy of these models. This study leverages the non-parametric properties of KDE and its robustness in probability simulation to construct a new robust model for water shortage risk assessment based on copula functions. The model quantifies water shortage risk from the perspectives of both water shortage probability and loss rate. Through application in the Yuxi receiving area of the Central Yunnan Water Diversion Project, the robustness and accuracy of the model under various scenarios and data shapes are verified and analyzed. Additionally, seasonal characteristic analysis of historical water shortage risk, precipitation variability analysis under different scenarios, and assessment of the spatial distribution status and resilience of water receiving sub-regions are conducted. The conclusions drawn are consistent with the planning and design of the phase II supporting project of the Central Yunnan Water Diversion Project. The results of this research can provide data support and evaluation tools for the rational allocation of externally transferred water resources to Yuxi secondary receiving areas after the completion of the Central Yunnan Water Diversion Project. The constructed water shortage risk assessment model can be widely applied to the evaluation of water resource scarcity and rational water allocation in the entire receiving area of the Central Yunnan Water Diversion Project. It can also be applied to the quantitative research of other multivariate stochastic coupling risks, providing reference and guidance for other related research and applications.
The limitation of this study lies in its sole focus on enhancing the stability and accuracy of water scarcity risk assessment methods, without addressing the issue of water scarcity risk assessment in cross-basin water diversion projects under various coupled scenarios in the context of future climate change [80,81,82]. For instance, it did not explore the water scarcity risks in receiving areas under different water availability scenarios between the source and receiving areas, as well as the water scarcity risks under different proportions of external water diversion and local water sources. Furthermore, how to achieve optimal allocation of water resources under the premise of risk minimization based on water scarcity risk assessment results [57] is also an interesting topic worthy of further investigation. These will be key directions of future research.

Author Contributions

Conceptualization, T.Q. and Z.S.; methodology, T.Q.; software, T.Q.; validation, T.Q., J.C. (Jing Chen) and J.C. (Jinming Chen); formal analysis, T.Q. and W.X.; resources, S.G., L.W. and S.B.; data curation, S.B.; writing—original draft preparation, T.Q. and S.G.; project administration, Z.S.; funding acquisition, S.G.; writing—review and editing, T.Q., Z.S., S.G., W.X., J.C. (Jing Chen), J.C. (Jinming Chen), L.W. and S.B. All authors have read and agreed to the published version of the manuscript.

Funding

The authors gratefully acknowledge the financial support from the Demonstration project of comprehensive government management and large-scale industrial application of the major special project of CHEOS (No. 89-Y50G31-9001-22/23-05).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors would like to thank the National Meteorological Science Data Center for providing the precipitation data free of charge.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gudmundsson, L.; Boulange, J.; Do, H.X.; Gosling, S.N. Globally observed trends in mean and extreme river flow attributed to climate change. Science 2021, 371, 1159–1162.
  2. Min, S.-K.; Zhang, X.; Zwiers, F.W.; Hegerl, G.C. Human contribution to more-intense precipitation extremes. Nature 2011, 470, 378–381.
  3. Sara, S.A.; Abel, S.; Jaime, M.; Joaquín, A.; Javier, P.A. Risk assessment in water resources planning under climate change at the Júcar River basin. Hydrol. Earth Syst. Sci. 2020, 24, 5297–5315.
  4. UNCCD. Global Drought Snapshot 2023: The Need for Proactive Action; United Nations: New York, NY, USA, 2023.
  5. Maryam, S. Global water shortage and potable water safety; Today’s concern and tomorrow’s crisis. Environ. Int. 2021, 158, 106936.
  6. UNESCO. The United Nations World Water Development Report 2021: Valuing Water; United Nations: New York, NY, USA, 2021.
  7. Josefine, L.S.; Per, B. Differentiated vulnerabilities and capacities for adaptation to water shortage in Gaborone, Botswana. Int. J. Water Resour. Dev. 2021, 37, 278–299.
  8. Wang, H.; Qian, L.; Zhao, Z.; Wang, Y. Theory and assessment method of water resources risk. J. Hydraul. Eng. 2019, 50, 980–989.
  9. Yang, P.; Zhang, S.; Xia, J.; Chen, Y.; Zhang, Y.; Cai, W.; Wang, W.; Wang, H.; Luo, X.; Chen, X. Risk assessment of water resource shortages in the Aksu River basin of northwest China under climate change. J. Environ. Manag. 2022, 305, 114394.
  10. Jiang, Y. China’s water scarcity. J. Environ. Manag. 2009, 90, 3185–3196.
  11. Li, J.; Li, L.; Liu, Y. Framework for Water Scarcity Assessment and Solution at Regional Scales: A Case Study in Beijing-Tianjin-Tangshan Region. Prog. Geogr. 2010, 29, 1041–1048.
  12. Jiang, Y. China’s water security: Current status, emerging challenges and future prospects. Environ. Sci. Policy 2015, 54, 106–125.
  13. Müller, A.B.; Avellán, T.; Schanze, J. Risk and sustainability assessment framework for decision support in ‘water scarcity—Water reuse’ situations. J. Hydrol. 2020, 591, 125424.
  14. Müller, A.B.; Avellán, T.; Schanze, J. Translating the ‘water scarcity—Water reuse’ situation into an information system for decision-making. Sustain. Sci. 2021, 17, 9–25.
  15. Hashimoto, T.; Stedinger, J.; Loucks, P. Reliability, resiliency and vulnerability criteria for water resources system performance evaluation. Water Resour. Res. 1982, 18, 14–20.
  16. Falkenmark, M.; Lundqvist, J.; Widstrand, C. Macro-scale water scarcity requires micro-scale approaches: aspects of vulnerability in semi-arid development. Nat. Resour. Forum. 1989, 13, 258–267.
  17. Engelman, R.; LeRoy, P. Sustaining Water: Population and the Future of Renewable Water Supplies; Population Action International: Washington, DC, USA, 1993.
  18. Ohlsson, L. Water conflicts and social resource scarcity. Phys. Chem. Earth Part B Hydrol. Ocean. Atmos. 2000, 25, 213–220.
  19. Sullivan, C. Calculating a Water Poverty Index. World Dev. 2002, 30, 1195–1210.
  20. Garriga, R.G.; Foguet, A.P. Improved Method to Calculate a Water Poverty Index at Local Scale. J. Environ. Eng. 2010, 136, 1287–1298.
  21. Shalamzari, M.J.; Zhang, W. Assessing Water Scarcity Using the Water Poverty Index (WPI) in Golestan Province of Iran. Water 2018, 10, 1079.
  22. Liu, L.N.; Liu, W.L. Risk assessment of drought-induced water scarcity in upper and middle reaches of Xiu River. IOP Conf. Ser. Earth Environ. Sci. 2017, 59, 012057.
  23. Merabtene, T.; Kawamura, A.; Jinno, K.; Olsson, J. Risk assessment for optimal drought management of an integrated water resources system using a genetic algorithm. Hydrol. Process. 2002, 16, 2189–2208.
  24. Gain, A.K.; Giupponi, C. A dynamic assessment of water scarcity risk in the Lower Brahmaputra River Basin: An integrated approach. Ecol. Indic. 2015, 48, 120–131.
  25. Van Vliet, M.T.; Jones, E.R.; Flörke, M.; Franssen, W.H.; Hanasaki, N.; Wada, Y.; Yearsley, J.R. Global water scarcity including surface water quality and expansions of clean water technologies. Environ. Res. Lett. 2021, 16, 024020.
  26. Lu, Z.; Kang, Z.; Li, W.; Huang, M. Research on water scarcity risk assessment in the Yellow River Basin. China Environ. Sci. 2024, 44, 1–11.
  27. Ruan, B.; Han, Y.; Wang, H.; Jiang, R. Fuzzy comprehensive assessment of water shortage risk. J. Hydraul. Eng. 2005, 36, 906–912.
  28. Luo, J.; Xie, J.; Ruan, B. Fuzzy comprehensive assessment model for water shortage risk based on entropy weight. J. Hydraul. Eng. 2008, 39, 1092–1097+1104.
  29. Han, Y.; Wang, Y.; Xiaoming, F. Comprehensive Assessment of Regional Water Shortage Risk Based on the Maximum Entropy Principle. J. Anhui Agric. Sci. 2011, 39, 397–399.
  30. Yang, L.; Li, A.; Bai, H. Using Fuzzy Theory and Principal Component Analysis for Water Shortage Risk Assessment in Beijing, China. Energy Procedia 2011, 11, 2085–2092.
  31. Ling, Z.; Liu, R. Assessment of Regional Water Scarcity Risk in Guangdong Province Based on Principal Component Analysis. Resour. Sci. 2010, 32, 2324–2328.
  32. Li, J.; Cui, D.; Yuan, S. Evaluation of Water Resources Shortage Risk Based on Soccer League Competition Algorithm-Projection Pursuit-Cloud Model. J. China Hydrol. 2018, 38, 40–47.
  33. Xu, M. Study on the Risk Assessment Model of Water Resources Shortage—Taking Zhengzhou as an Example. Master’s Thesis, North China University of Water Resources and Electric Power, Zhengzhou, China, 2020.
  34. Hao, G.; Wang, X.; Luo, Y. Risk assessment of water shortage in Beijing based on an improved comprehensive evaluation model. Water Resour. Prot. 2017, 33, 27–31.
  35. Liao, Q.; Zhang, S.; Chen, J. Risk Assessment and Prediction of Water Shortages in Beijing. Resour. Sci. 2013, 35, 140–147.
  36. Wang, Y.; Wang, D.; Wu, J. A Variable Fuzzy Set Assessment Model for Water Shortage Risk: Two Case Studies from China. Hum. Ecol. Risk Assess. Int. J. 2011, 17, 631–645.
  37. Feng, L.H.; Huang, C.F. A Risk Assessment Model of Water Shortage Based on Information Diffusion Technology and its Application in Analyzing Carrying Capacity of Water Resources. Water Resour. Manag. 2008, 22, 621–633.
  38. Yan, F.; Xie, J.; Qin, T.; Ma, Z. Risk Evaluation of Water Shortage Based on Information Diffusion Theory. J. Xi’an Univ. Technol. 2011, 27, 285–289.
  39. Du, X.; Feng, M.; Zhang, J. Research on risk assessment of water shortage based on improved information diffusion theory. Agric. Res. Arid Areas 2014, 32, 188–194.
  40. Gong, Y.; Liu, G.; Feng, L. Research on Similarity Cloud Evaluation Method for Water Scarcity Risk in Jiangsu Province. Resour. Environ. Yangtze Basin 2015, 24, 931–936.
  41. Jiang, Q.; Zhou, Z.; Wang, Z. Water Scarcity Risk Assessment and Optimization Based on the Coupling of Water and Soil Resources. Trans. Chin. Soc. Agric. Eng. 2017, 33, 136–143.
  42. Han, Y.; Ruan, B. Economic loss assessment of shortage risk of water resources. J. Hydraul. Eng. 2007, 10, 1253–1257.
  43. Han, Y.; Ruan, B.; Wang, D. Multi-objective risk decision-making model for regional water resources shortage. J. Hydraul. Eng. 2008, 39, 667–673.
  44. Qian, L.; Wang, Z.; Wang, H.; Deng, C. An improved method for predicting water shortage risk in the case of insufficient data and its application in Tianjin, China. J. Earth Syst. Sci. 2020, 129, 48.
  45. Swain, S.S.; Mishra, A.; Sahoo, B.; Chatterjee, C. Water scarcity-risk assessment in data-scarce river basins under decadal climate change using a hydrological modelling approach. J. Hydrol. 2020, 590, 125260.
  46. Qian, L.; Wang, H.; Zhang, K. Evaluation Criteria and Model for Risk Between Water Supply and Water Demand and its Application in Beijing. Water Resour. Manag. 2014, 28, 4433–4447.
  47. Liu, X.; Wang, H.; Yu, S.; Ma, D.; Liang, Y.; Lai, W.; Gao, Y. Study on Water Resources Risk in Beijing after “South-North Water Transfer” Project. J. China Hydrol. 2015, 35, 55–61.
  48. Qian, L.; Zhang, R.; Wang, H.; Hong, M. A Water Resources Supply and Demand Risk Loss Model and Its Applications Based on Copula Functions. Syst. Eng.-Theory Pract. 2016, 36, 517–527.
  49. Qian, L.; Zhang, R.; Wang, H.; Wang, Y. Monthly Risk Assesment Model of Water Supply and Demand Based on Logistic Regression DEA and Its Application. J. Nat. Resour. 2016, 31, 177–186.
  50. Qian, L.; Wang, H.; Wang, Y.; Zhao, Z. Model for Water Shortage Risk Econimic Losses Based on M-Copula and Its Application. J. Appl. Basic Eng. Sci. 2022, 30, 907–917.
  51. Zhang, D.-D.; Yan, D.-H.; Lu, F.; Wang, Y.-C.; Feng, J. Copula-based risk assessment of drought in Yunnan province, China. Nat. Hazards 2015, 75, 2199–2220.
  52. Gu, S.; Zhao, Z.; Chen, J.; Chen, J.; Zhang, L. Daily reference evapotranspiration and meteorological drought forecast using high-dimensional Copula joint distribution model. Trans. Chin. Soc. Agric. Eng. 2020, 36, 143–151.
  53. Yang, X.; Li, Y.P.; Huang, G.H.; Li, Y.F.; Liu, Y.R.; Zhou, X. Development of a multi-GCMs Bayesian copula method for assessing multivariate drought risk under climate change: A case study of the Aral Sea basin. Catena 2022, 212, 106048.
  54. Li, Z.; Shao, Q.; Tian, Q.; Zhang, L. Copula-based drought severity-area-frequency curve and its uncertainty, a case study of Heihe River basin, China. Hydrol. Res. 2020, 51, 867–881.
  55. Zellou, B.; Rahali, H. Assessment of the joint impact of extreme rainfall and storm surge on the risk of flooding in a coastal area. J. Hydrol. 2019, 569, 647–665.
  56. Bazrafshan, O.; Zamani, H.; Shekari, M.; Singh, V.P. Regional risk analysis and derivation of copula-based drought for severity-duration curve in arid and semi-arid regions. Theor. Appl. Climatol. 2020, 141, 889–905. [Google Scholar] [CrossRef]
  57. Gao, X.; Liu, Y.; Sun, B. Water shortage risk assessment considering large-scale regional transfers: A copula-based uncertainty case study in Lunan, China. Environ. Sci. Pollut. Res. Int. 2018, 25, 23328–23341. [Google Scholar] [CrossRef] [PubMed]
  58. Bevacqua, E.; Maraun, D.; Haff, I.H.; Widmann, M.; Vrac, M. Multivariate statistical modelling of compound events via pair-copula constructions: Analysis of floods in Ravenna (Italy). Hydrol. Earth Syst. Sci. 2017, 21, 2701–2723. [Google Scholar] [CrossRef]
  59. Liu, Z.; Guo, S.; Xu, X.; Xu, S.; Cheng, J. Application of Copula functions in hydrology and water resources: A state-of-the-art review. Adv. Water Sci. 2021, 32, 148–159. [Google Scholar] [CrossRef]
  60. Qian, L.; Zhang, R.; Wang, H.; Hong, M. A model for water shortage risk loss based on MEP and DEA and its application. J. Hydraul. Eng. 2015, 46, 1199–1206. [Google Scholar] [CrossRef]
  61. Qian, L.; Zhang, R.; Hong, M.; Wang, H.; Yang, L. A new multiple integral model for water shortage risk assessment and its application in Beijing, China. Nat. Hazards 2015, 80, 43–67. [Google Scholar] [CrossRef]
  62. Xiaoyu, B.; Hui, J.; Chen, L.; Lei, H. Joint probability distribution of coastal winds and waves using a log-transformed kernel density estimation and mixed copula approach. Ocean Eng. 2020, 216, 107937. [Google Scholar]
  63. François-Rémi, M.; Pierre-Yves, L. Towards a generic theoretical framework for pattern-based LUCC modeling: An accurate and powerful calibration–estimation method based on kernel density estimation. Environ. Model. Softw. 2022, 158, 105551. [Google Scholar]
  64. Wang, M.; Ying, F.; Nan, Q. Refined offshore wind speed prediction: Leveraging a two-layer decomposition technique, gated recurrent unit, and kernel density estimation for precise point and interval forecasts. Eng. Appl. Artif. Intell. 2024, 133, 108435. [Google Scholar] [CrossRef]
  65. Hashash, E.F.E.; Shiekh, R.H.A. A Comparison of the Pearson, Spearman Rank and Kendall Tau Correlation Coefficients Using Quantitative Variables. Asian J. Probab. Stat. 2022, 20, 36–48. [Google Scholar] [CrossRef]
  66. Jiang, J.; Zhang, X.; Yuan, Z. Feature selection for classification with Spearman’s rank correlation coefficient-based self-information in divergence-based fuzzy rough sets. Expert Syst. Appl. 2024, 249, 123633. [Google Scholar] [CrossRef]
  67. Zhao, Y. The Most Representative National Key Water Conservancy Project under Construction: Central Yunnan Water Diversion Project. Tunn. Constr. 2019, 39, 511–522. [Google Scholar]
  68. Harpole, J.K.; Woods, C.M.; Rodebaugh, T.L.; Levinson, C.A.; Lenze, E.J. How bandwidth selection algorithms impact exploratory data analysis using kernel density estimation. Psychol. Methods 2014, 19, 428–443. [Google Scholar] [CrossRef] [PubMed]
  69. Chen, J.; Gu, S.; Zhang, T. Synchronous-Asynchronous Encounter Probability Analysis of High-Low Runoff for Jinsha River, China, using Copulas. MATEC Web Conf. 2018, 246, 01094. [Google Scholar]
  70. Espen, B.F.; Giulia, D.N.; Dennis, S. Copula measures and Sklar’s theorem in arbitrary dimensions. Scand. J. Stat. 2021, 49, 1144–1183. [Google Scholar]
  71. Massey, J.F.J. The Kolmogorov-Smirnov Test for Goodness of Fit. J. Am. Stat. Assoc. 1951, 46, 68–78. [Google Scholar] [CrossRef]
  72. Lin, X.; Fang, F. Variable selection of Kolmogorov-Smirnov maximization with a penalized surrogate loss. Comput. Stat. Data Anal. 2024, 195, 107944. [Google Scholar] [CrossRef]
  73. Willmott, C.J.; Matsuura, K. Advantages of the Mean Absolute Error (MAE) over the Root Mean Square Error (RMSE) in Assessing Average Model Performance. Clim. Res. 2005, 30, 79–82. [Google Scholar] [CrossRef]
  74. Scott, D.W. Multivariate Density Estimation: Theory, Practice, and Visualization; John Wiley & Sons: Hoboken, NJ, USA, 2015. [Google Scholar]
  75. Jin, Y.; Kuang, X.; Yan, H.; Wan, Y.; Wang, P. Studies on Distribution Characteristics and Variation Trend of the Regional Drought Events over Yunnan in Recent 55 Years. Meteorol. Mon. 2018, 44, 1169–1178. [Google Scholar]
  76. Mekonnen, M.M.; Hoekstra, A.Y. Four billion people facing severe water scarcity. Sci. Adv. 2016, 2, e1500323. [Google Scholar] [CrossRef]
  77. Salmivaara, A.; Porkka, M.; Kummu, M.; Keskinen, M.; Guillaume, J.H.A.; Varis, O. Exploring the Modifiable Areal Unit Problem in Spatial Water Assessments: A Case of Water Shortage in Monsoon Asia. Water 2015, 7, 898–917. [Google Scholar] [CrossRef]
  78. Veldkamp, T.I.E.; Wada, Y.; Aerts, J.C.J.H.; Ward, P.J. Towards a global water scarcity risk assessment framework: Incorporation of probability distributions and hydro-climatic variability. Environ. Res. Lett. 2016, 11, 024006. [Google Scholar] [CrossRef]
  79. Cordo, R.; Barros Ramalho, A.; Filho, B. Water shortage risk mapping: A GIS-MCDA approach for a medium-sized city in the Brazilian semi-arid region. Urban Water J. 2020, 17, 642–655. [Google Scholar] [CrossRef]
  80. Janssen, J.; Radić, V.; Ameli, A. Assessment of Future Risks of Seasonal Municipal Water Shortages Across North America. Front. Earth Sci. 2021, 9, 730631. [Google Scholar] [CrossRef]
  81. Zha, X.; Sun, H.; Jiang, H.; Cao, L.; Xue, J.; Gui, D.; Yan, D.; Tuo, Y. Coupling Bayesian Network and copula theory for water shortage assessment: A case study in source area of the South-to-North Water Division Project (SNWDP). J. Hydrol. 2023, 620, 129434. [Google Scholar] [CrossRef]
  82. Dehghani, S.; Bavani, A.M.; Roozbahani, A.; Sahin, O. Assessment of Climate Change-Induced Water Scarcity Risk by Using a Coupled System Dynamics and Bayesian Network Modeling Approaches. Water Resour. Manag. 2024. [Google Scholar] [CrossRef]
Figure 1. The algorithm flowchart of water shortage risk assessment model in this study.
Figure 2. Overview of the study area: (a) Location of study area and receiving water sub-regions; (b) river–lake hydrological system and topography.
Figure 3. Annual supply and demand series in the water-receiving area (unit: 10⁴ m³).
Figure 4. Kernel density estimation with different bandwidths.
Figure 5. Bivariate copula density functions and distribution functions: (a) Gaussian copula; (b) t copula; (c) Gumbel copula; (d) Clayton copula; (e) Frank copula.
Figure 6. KS p-values of water demand and supply sequences.
Figure 7. RMSE of water demand and supply sequences.
Figure 8. Comparison of the probability distributions fitted by different methods.
Figure 9. MSK of the bivariate copula with different methods used for the marginal distribution functions.
Figure 10. SED between the empirical joint probability and the theoretical joint probability.
Figure 11. Seasonal characteristics of water shortage risk.
Figure 12. Variation characteristics of water shortage risk for five typical years.
Figure 13. Spatial distribution characteristics of water shortage risk: the mean (a) and variance (b) of water shortage risks.
Table 1. Bivariate copula functions and parameter properties [52,69].

| Name of Copula | $C(u,v)$ | Parameters |
|---|---|---|
| Gaussian Copula | $C(u,v;\rho)=\Phi_2\left(\Phi^{-1}(u),\Phi^{-1}(v);\rho\right)$ | $\Phi$ and $\Phi^{-1}$ represent the CDF and the inverse CDF of the standard normal distribution, respectively. $\Phi_2$ denotes the CDF of the bivariate standard normal distribution. $\rho\in[-1,1]$ is the correlation coefficient between variables $x$ and $y$, reflecting the degree of linear association between them. $\rho=1$ indicates perfect positive correlation, $\rho=-1$ indicates perfect negative correlation, and $\rho=0$ indicates no correlation. |
| t Copula | $C(u,v;\rho,\tau)=t_{\tau}\left(\tau\Phi^{-1}(u),\tau\Phi^{-1}(v);\rho\right)$ | $\Phi$ and $\Phi^{-1}$ represent the CDF and the inverse CDF of the standard normal distribution, respectively. $t_{\tau}$ denotes the CDF of the bivariate t-distribution with $\tau$ degrees of freedom, where $\tau$ typically takes the minimum degrees-of-freedom value that satisfies the dependence structure between the two variables. |
| Gumbel Copula | $C(u,v;\theta)=\exp\left\{-\left[(-\ln u)^{\theta}+(-\ln v)^{\theta}\right]^{1/\theta}\right\}$ | $\theta\ge 1$ determines the degree of dependence between the two variables. $\theta=1$ corresponds to independence, and as $\theta$ increases, the positive correlation between the variables strengthens. The Gumbel copula cannot represent negative correlation. |
| Clayton Copula | $C(u,v;\theta)=\left[\max\left(u^{-\theta}+v^{-\theta}-1,\,0\right)\right]^{-1/\theta}$ | $\theta>-1$, $\theta\ne 0$, determines the degree of dependence between the two variables. As $\theta\to 0$, the variables become nearly independent. As $\theta$ increases, the positive correlation between the variables strengthens, especially in the lower tail. |
| Frank Copula | $C(u,v;\theta)=-\dfrac{1}{\theta}\ln\left[1+\dfrac{\left(e^{-\theta u}-1\right)\left(e^{-\theta v}-1\right)}{e^{-\theta}-1}\right]$ | $\theta\in(-\infty,+\infty)$, $\theta\ne 0$, determines the degree of dependence between the two variables. As $\theta\to 0$, the variables become nearly independent. As $\lvert\theta\rvert$ increases, the positive or negative correlation between the variables strengthens. |
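The three Archimedean families in Table 1 have closed forms that can be evaluated directly. The following is a minimal sketch, assuming the standard textbook parameterizations; the function names are illustrative and are not taken from the paper. The Gaussian and t copulas are omitted because they require a bivariate distribution function that the Python standard library does not provide.

```python
import math


def gumbel_copula(u, v, theta):
    """Gumbel copula CDF for u, v in (0, 1]; theta >= 1 (theta = 1 is independence)."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def clayton_copula(u, v, theta):
    """Clayton copula CDF for u, v in (0, 1]; theta > -1 and theta != 0."""
    base = max(u ** (-theta) + v ** (-theta) - 1.0, 0.0)
    # base can only reach 0 for negative theta, where the copula value is 0.
    return base ** (-1.0 / theta) if base > 0.0 else 0.0


def frank_copula(u, v, theta):
    """Frank copula CDF for u, v in (0, 1); theta != 0."""
    # expm1/log1p keep precision when theta is close to 0.
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


if __name__ == "__main__":
    u, v = 0.3, 0.7
    # Parameter values as reported in Table 5.
    print("Gumbel :", gumbel_copula(u, v, 12.8667))
    print("Clayton:", clayton_copula(u, v, 10.0259))
    print("Frank  :", frank_copula(u, v, 43.6217))
```

With strongly positive dependence, each value should lie between the independence value u·v and the upper bound min(u, v).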
Table 2. Classification of water shortage probability levels and mapping standards for threshold intervals.

| Probabilistic Qualitative Description | Probability Range (%) | Level Mapping | Warning Signal |
|---|---|---|---|
| Extremely low, almost impossible to occur, no warning | [0.001, 0.1) | 1 | Green |
| Low, unlikely to occur, no warning | [0.1, 0.2) | 2 | Blue |
| Medium, occasional occurrence, advisory | [0.2, 0.3) | 3 | Yellow |
| High, likely to occur, warning | [0.3, 0.5) | 4 | Orange |
| Extremely high, frequent occurrence, emergency | [0.5, 1.0] | 5 | Red |
Table 3. Classification of potential loss rate levels and mapping standards for threshold intervals.

| Severity Description of Water Shortage | Potential Loss Rate Range (%) | Level Mapping | Warning Signal |
|---|---|---|---|
| Essentially No Scarcity: Indicates abundant water resources with no apparent supply issues. | <5 | 1 | Green |
| Mild Scarcity: Implies slight water shortages, but not enough to significantly impact normal life and production. | 5–10 | 2 | Blue |
| Moderate Scarcity: Suggests water supply tension, potentially affecting certain areas or industries' livelihoods and production. | 10–20 | 3 | Yellow |
| Severe Scarcity: Indicates severe water shortages that could impact normal life and production across large areas or multiple industries. | 20–40 | 4 | Orange |
| Critical Scarcity: Represents extremely scarce water resources, possibly resulting in interruptions, severe losses, or even threats to safety and life, production, and ecology. | >40 | 5 | Red |
Table 4. Mapping matrix of water shortage risk levels and criteria for warning level classification.

| Likelihood (Probability Level) | Potential Loss Rate Level 1 | Potential Loss Rate Level 2 | Potential Loss Rate Level 3 | Potential Loss Rate Level 4 | Potential Loss Rate Level 5 | Warning Signal: Color ¹ | Warning Signal: Warning Level | Qualitative Description of Safety Level |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 2 | 3 | 4 | 5 | Green | No warning | Low risk (R ≤ 6) |
| 2 | 2 | 4 | 6 | 8 | 10 | Blue | Blue warning | Low risk (R ≤ 6) |
| 3 | 3 | 6 | 9 | 12 | 15 | Yellow | Yellow warning | Medium risk (6 < R ≤ 12) |
| 4 | 4 | 8 | 12 | 16 | 20 | Orange | Orange warning | Medium risk (6 < R ≤ 12) |
| 5 | 5 | 10 | 15 | 20 | 25 | Red | Red warning | High risk (R > 12) |

Note: ¹ Different colors represent distinct warning levels, while the same color signifies an equivalent warning level.
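The level mappings in Tables 2 and 3 and the risk matrix in Table 4 can be combined into a small lookup. The sketch below is illustrative only: the function names are hypothetical, the matrix entry R is computed as the product of the two levels (which is what the tabulated cells equal), and the handling of values that fall exactly on a shared boundary, or below the lowest tabulated probability, is an assumption, since the tables do not specify it.

```python
def probability_level(p):
    """Level 1-5 for a shortage probability p given as a fraction (Table 2)."""
    # Assumption: probabilities below the tabulated lower bound 0.001 map to level 1.
    for upper, level in ((0.1, 1), (0.2, 2), (0.3, 3), (0.5, 4)):
        if p < upper:
            return level
    return 5


def loss_level(loss_rate_pct):
    """Level 1-5 for a potential loss rate in percent (Table 3)."""
    # Assumption: the shared boundaries 10 and 20 belong to the higher level;
    # 5 starts level 2 ("<5" is level 1) and 40 stays in level 4 (">40" is level 5).
    if loss_rate_pct < 5:
        return 1
    if loss_rate_pct < 10:
        return 2
    if loss_rate_pct < 20:
        return 3
    if loss_rate_pct <= 40:
        return 4
    return 5


def risk_assessment(p, loss_rate_pct):
    """Return (R, safety level), with R the Table 4 matrix entry."""
    r = probability_level(p) * loss_level(loss_rate_pct)
    if r <= 6:
        safety = "Low risk"
    elif r <= 12:
        safety = "Medium risk"
    else:
        safety = "High risk"
    return r, safety


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the mapping.
    print(risk_assessment(0.35, 15))
    print(risk_assessment(0.60, 45))
```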
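Table 5. Copula parameter estimation results.

| Copula | Parameter Name | Parameter Values |
|---|---|---|
| Gaussian Copula | $\rho$ | $\begin{pmatrix} 1 & 0.9859 \\ 0.9859 & 1 \end{pmatrix}$ |
| t Copula | $\rho$ | $\begin{pmatrix} 1 & 0.9904 \\ 0.9904 & 1 \end{pmatrix}$ |
| t Copula | $\tau$ | 2.65 |
| Gumbel Copula | $\theta$ | 12.8667 |
| Clayton Copula | $\theta$ | 10.0259 |
| Frank Copula | $\theta$ | 43.6217 |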
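The parameters in Table 5 are on different scales, so they are not directly comparable across families. One common way to put them on a shared scale is Kendall's tau, which has closed forms for several families. The sketch below applies the standard textbook relations to the Table 5 estimates; it is an illustrative check, not a calculation reported in the paper, and the Frank copula is omitted because its relation involves a Debye function with no closed form.

```python
import math


def tau_elliptical(rho):
    """Kendall's tau for Gaussian and t copulas: (2 / pi) * arcsin(rho)."""
    return 2.0 / math.pi * math.asin(rho)


def tau_gumbel(theta):
    """Kendall's tau for the Gumbel copula: 1 - 1 / theta."""
    return 1.0 - 1.0 / theta


def tau_clayton(theta):
    """Kendall's tau for the Clayton copula: theta / (theta + 2)."""
    return theta / (theta + 2.0)


if __name__ == "__main__":
    # Parameter values as reported in Table 5.
    implied = {
        "Gaussian": tau_elliptical(0.9859),
        "t": tau_elliptical(0.9904),
        "Gumbel": tau_gumbel(12.8667),
        "Clayton": tau_clayton(10.0259),
    }
    for name, tau in implied.items():
        print(f"{name:8s} tau = {tau:.4f}")
```

All four implied values indicate strong positive dependence between supply and demand, which is consistent with the large parameter estimates.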
Table 6. Statistical measures of KS p-values.

| Sequence | Statistical Measure | Gamma | Normal | Logistic | Pearson3 | KDE |
|---|---|---|---|---|---|---|
| Water Demand | Mean | 0.476 | 0.437 | 0.518 | 0.56 | 0.57 |
| Water Demand | Variance | 0.128 | 0.108 | 0.092 | 0.124 | 0.075 |
| Water Demand | Rejection Rate of the Null Hypothesis (%) | 0 | 0.59 | 0 | 0 | 0 |
| Water Supply | Mean | 0.331 | 0.336 | 0.523 | 0.46 | 0.666 |
| Water Supply | Variance | 0.12 | 0.103 | 0.091 | 0.119 | 0.057 |
| Water Supply | Rejection Rate of the Null Hypothesis (%) | 32.54 | 21.30 | 3.55 | 15.97 | 0 |
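To make the KS comparison in Table 6 concrete, the sketch below builds a Gaussian-kernel KDE distribution function for a sample and computes the one-sample Kolmogorov-Smirnov statistic against it. Several things here are assumptions rather than the paper's procedure: the sample values are synthetic and purely illustrative, the bandwidth uses the normal-reference rule of thumb h = 1.06·σ·n^(−1/5), and the function names are hypothetical. Converting the statistic to a p-value is left out.

```python
import math
import statistics


def kde_cdf(sample, h):
    """Return F(x) for a Gaussian-kernel KDE of `sample` with bandwidth h."""
    n = len(sample)

    def cdf(x):
        # The KDE CDF is the average of normal CDFs centred on each observation.
        return sum(0.5 * (1.0 + math.erf((x - xi) / (h * math.sqrt(2.0))))
                   for xi in sample) / n

    return cdf


def ks_statistic(sample, cdf):
    """One-sample KS statistic: largest gap between empirical and fitted CDF."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))


# Synthetic values for illustration only; not data from the study area.
sample = [12.1, 13.4, 11.8, 14.2, 12.9, 13.7, 12.4, 13.1]

# Assumed bandwidth: normal-reference rule of thumb.
h = 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)
fitted = kde_cdf(sample, h)
d = ks_statistic(sample, fitted)

if __name__ == "__main__":
    print(f"bandwidth h = {h:.3f}, KS statistic D = {d:.3f}")
```

A smaller D means the fitted marginal tracks the empirical distribution more closely; the bandwidth controls how much the KDE smooths over individual observations.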
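Table 7. Statistical measures of RMSE.

| Sequence | Statistical Measure | Gamma | Normal | Logistic | Pearson3 | KDE |
|---|---|---|---|---|---|---|
| Water Demand | Mean | 0.052 | 0.056 | 0.049 | 0.046 | 0.046 |
| Water Demand | Variance | 0.00037 | 0.00037 | 0.00025 | 0.00026 | 0.00017 |
| Water Demand | Range | 0.055 | 0.056 | 0.046 | 0.045 | 0.038 |
| Water Supply | Mean | 0.077 | 0.065 | 0.048 | 0.058 | 0.039 |
| Water Supply | Variance | 0.00240 | 0.00094 | 0.00034 | 0.00097 | 0.00011 |
| Water Supply | Range | 0.189 | 0.134 | 0.117 | 0.139 | 0.037 |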
Table 8. Statistical measures of MSK.

| Statistical Measure | Gamma | Normal | Logistic | Pearson3 | KDE |
|---|---|---|---|---|---|
| Mean | 0.69 | 0.73 | 0.77 | 0.78 | 0.8 |
| Variance | 0.063 | 0.044 | 0.033 | 0.026 | 0.022 |
Table 9. Statistical measures of SED.

| Statistical Measure | Gamma | Normal | Logistic | Pearson3 | KDE |
|---|---|---|---|---|---|
| Mean | 0.098 | 0.065 | 0.043 | 0.040 | 0.030 |
| Variance | 0.0162 | 0.0060 | 0.0028 | 0.0019 | 0.0009 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Qian, T.; Shi, Z.; Gu, S.; Xi, W.; Chen, J.; Chen, J.; Bai, S.; Wu, L. A Water Shortage Risk Assessment Model Based on Kernel Density Estimation and Copulas. Water 2024, 16, 1465. https://doi.org/10.3390/w16111465
