1. Introduction
Managing the water resources of basins scientifically and efficiently is still a major challenge, primarily because of a lack of hydrological data. The International Association of Hydrological Sciences (IAHS) put forward the problem of hydrological prediction for basins without data, known as predictions in ungauged basins (PUBs) [1]. This is one of the key and most difficult problems in hydrology research and has received extensive attention from many scholars [2,3,4,5]. In basins with no data or insufficient data, it is difficult to establish a plan for regional water resources.
However, the number of gauging stations worldwide is limited, and it is decreasing as a result of funding reductions, so many streams are not gauged. Generally, there are two methods for studying the runoff process in ungauged basins [6]. The first is to establish a watershed hydrological model that contains only physical parameters; because these parameters can be measured, they can be used directly in areas without data. The second is to build a watershed hydrological model with calibrated parameters, in which the parameters calibrated in regions with data are transferred to regions without data [7,8].
Flow duration curves (FDCs) capture the runoff characteristics of a basin in a statistical form by relating flow magnitude to its frequency of exceedance, and they have been widely applied in various fields of hydrology [9,10,11,12]. FDCs are built in two main steps: ranking the streamflow data in descending order and plotting the sorted values against the corresponding frequency of exceedance [13]. Because it represents the relationship between flow magnitude and frequency in a simple yet comprehensive way, the FDC describes the full range of runoff over the study period, from low flows to floods, better reflects the rainfall–runoff characteristics of the basin, and can also be applied to water resources exploitation and protection.
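The two construction steps can be sketched in Python as follows. This is a minimal illustration, not the study's code: the Weibull plotting position i/(n + 1) used for the exceedance frequency is an assumption (several plotting positions are in common use), and the helper names `empirical_fdc` and `flow_at_exceedance` are hypothetical. Linear interpolation between plotted points gives the flow at a chosen exceedance frequency, for example at a specific percentile.

```python
def empirical_fdc(flows):
    """Rank flows in descending order and pair each with its exceedance frequency.

    The exceedance frequency of the i-th largest value (i = 1..n) is the
    Weibull plotting position i / (n + 1), an assumed choice.
    """
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    if n == 0:
        raise ValueError("flows must not be empty")
    return [((idx + 1) / (n + 1), q) for idx, q in enumerate(ranked)]


def flow_at_exceedance(fdc, p):
    """Linearly interpolate the flow exceeded with frequency p on an empirical FDC."""
    if not fdc[0][0] <= p <= fdc[-1][0]:
        raise ValueError("p lies outside the plotted exceedance range")
    for (p0, q0), (p1, q1) in zip(fdc, fdc[1:]):
        if p0 <= p <= p1:
            return q0 + (q1 - q0) * (p - p0) / (p1 - p0)
    return fdc[-1][1]  # reached only when the FDC has a single point


# Example with made-up daily flows (m^3/s); zero flows are kept as they are.
daily = [3.2, 0.0, 12.5, 1.1, 0.4, 7.8, 0.0, 2.0, 5.5]
curve = empirical_fdc(daily)
median_flow = flow_at_exceedance(curve, 0.5)
```

Because the empirical curve is built from ranks alone, zero flows from intermittent rivers pose no difficulty at this step; they simply occupy the low-flow end of the curve.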
Many authors have addressed FDC prediction at ungauged or partially gauged locations, estimating the FDCs of ungauged sites from distance measures related to catchment area and climatic parameters through regional regression [14,15,16,17] and geostatistical interpolation [18]. Several scholars have developed spatial nonlinear interpolation methods [19,20,21]. Worland et al. (2019) [22] presented a method based on copula functions. Hughes and Smakhtin (1996) [23] proposed a method to extend or fill in the daily flow time series at a site by using the monthly FDCs of that target site itself.
The Loess Plateau is located in the middle and upper reaches of the Yellow River and has a fragile ecological environment with serious soil erosion [6,24]. Under global climate change, extreme climate events are becoming more frequent, and the risk of meteorological disasters is intensifying. Furthermore, the rivers of the Loess Plateau are subject to the dual effects of global climate change and human activities; their flow processes have changed significantly, and most small watersheds are ungauged, with no observed flow data [25]. The National Development and Reform Commission and the Ministry of Water Resources officially issued and implemented the 14th Five-Year implementation plan for the construction of warping dams in the Yellow River basin and the comprehensive control of soil and water loss on sloping farmland [26]. Over five years, the plan intends to build 1461 warping dams and 2559 sand-blocking dams in the concentrated source area of coarse sediment. Many of these silt dams for soil and water conservation are being constructed in small watersheds without discharge data. If the FDCs of small watersheds can be constructed through parameter regionalization, they will provide an important reference for the construction of warping dams in these watersheds.
Current FDC studies mainly construct a reliable fitting function from which the flow processes in an ungauged basin can be inferred. FDC shapes differ strongly between regions. Preliminary studies on the Loess Plateau have shown that the low-flow part of the FDCs in the Yellow River basin decreases rapidly, giving the curves an obvious S-shape. Blum et al. (2017) [27] omitted intermittent sites with an average daily flow of zero from their analysis because such sites require additional methodological considerations. Therefore, it is necessary to identify the optimal functional form of the FDC on the Loess Plateau and to construct formulas for calculating its parameters. Accordingly, this study constructs FDCs, analyzes the factors that influence the curve shape, and examines the variation in the FDC parameters on both temporal and spatial scales. Under the combined influence of global climate change and human activities, important signatures of basin hydrological processes, especially their flow duration curves, have changed significantly. However, FDCs are difficult to compare quantitatively because they differ between basins with climatic and basin characteristics, and their shapes vary greatly. For instance, FDCs have been found to display an ‘L-shape’ in the Americas [28], whereas the runoff distribution in North China exhibits an ‘S-shape’. For this reason, the H2018 function [29] has been proposed to describe the FDCs of this study area. It is therefore necessary to study the variation of the FDCs in this basin and to obtain FDCs in ungauged regions, which lack discharge data, through parametric analysis. Additionally, the parametric formula of the FDC can be used to validate remote-sensing-based runoff data in ungauged basins.
The study is organized as follows.
Section 2 presents the study area and data.
Section 3 presents the methodology employed to fit FDCs at hydrological stations.
Section 4 presents the main results.
Section 5 discusses the results for one catchment, the Kuye River.
Section 6 concludes the study and outlines further research based on our findings.
3. Methods
Three functions, i.e., the lognormal, generalized Pareto, and H2018 functions, were fitted to the FDCs and evaluated with four assessment indices: the Nash–Sutcliffe efficiency, the root mean square relative error, the logarithmic Nash–Sutcliffe efficiency coefficient, and the coefficient of determination. A regression model was then used to construct formulas for the parameters.
3.1. Log Normal Function
If Y = ln X, where X is a random variable, follows the normal distribution N(μ, σ²), then X follows the lognormal distribution with parameters μ and σ², whose probability density function is as follows:
$$f(x;\mu,\sigma)=\frac{1}{x\sigma\sqrt{2\pi}}\exp\left[-\frac{(\ln x-\mu)^{2}}{2\sigma^{2}}\right],\quad x>0$$
where X is the random variable, x is the independent variable of the probability density function, μ is the logarithmic mean parameter, and σ is the logarithmic standard deviation (σ² is the logarithmic variance).
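As an illustration of how a lognormal distribution can represent an FDC, the sketch below estimates μ and σ from the logarithms of observed flows and evaluates the flow exceeded with probability p as exp(μ + σz), where z is the standard normal quantile of 1 − p. The estimator (mean and population standard deviation of ln Q, i.e., the maximum-likelihood estimates) is an assumed choice; the study's own fitting procedure may differ, and the function names are hypothetical. Because ln 0 is undefined, the sketch rejects zero flows, which shows why intermittent records are awkward for this function.

```python
import math
from statistics import NormalDist


def fit_lognormal(flows):
    """Estimate (mu, sigma) from the natural logarithms of strictly positive flows.

    mu is the mean of ln(Q) and sigma the population standard deviation of
    ln(Q), i.e. the maximum-likelihood estimates for a lognormal sample.
    """
    if not flows:
        raise ValueError("flows must not be empty")
    if any(q <= 0 for q in flows):
        raise ValueError("ln(Q) is undefined for zero flows")
    logs = [math.log(q) for q in flows]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((y - mu) ** 2 for y in logs) / len(logs))
    if sigma == 0:
        raise ValueError("flows must not all be equal")
    return mu, sigma


def lognormal_pdf(x, mu, sigma):
    """Lognormal probability density f(x; mu, sigma); zero for x <= 0."""
    if x <= 0:
        return 0.0
    coeff = 1.0 / (x * sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((math.log(x) - mu) ** 2) / (2.0 * sigma ** 2))


def lognormal_fdc_flow(p, mu, sigma):
    """Flow exceeded with probability p (0 < p < 1) under the lognormal model."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - p)  # standard normal quantile of non-exceedance
    return math.exp(mu + sigma * z)
```

At p = 0.5 the sketch returns exp(μ), the median flow, and the returned flow falls as p grows, as an FDC should.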
3.2. Generalized Pareto Function
If X follows the generalized Pareto distribution, its cumulative distribution function (CDF) is as follows:
where
.
Its corresponding probability density function (PDF) is as follows:
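A minimal Python sketch of the generalized Pareto distribution follows. It assumes the common location–scale–shape form F(x) = 1 − [1 + ξ(x − μ)/σ]^(−1/ξ), with the exponential limit F(x) = 1 − exp[−(x − μ)/σ] when ξ = 0; the authors' parameterization may differ, and the names `loc`, `scale`, and `shape` are illustrative. Solving 1 − F(x) = p for x gives the flow exceeded with probability p, which is how a fitted distribution is read as an FDC.

```python
import math


def gpd_cdf(x, loc, scale, shape):
    """Generalized Pareto CDF, F(x) = 1 - (1 + shape*z)^(-1/shape), z = (x - loc)/scale."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    z = (x - loc) / scale
    if z <= 0:
        return 0.0
    if shape == 0:
        return 1.0 - math.exp(-z)
    t = 1.0 + shape * z
    if t <= 0:  # beyond the upper bound loc - scale/shape (only when shape < 0)
        return 1.0
    return 1.0 - t ** (-1.0 / shape)


def gpd_pdf(x, loc, scale, shape):
    """Generalized Pareto PDF, f(x) = (1/scale) * (1 + shape*z)^(-1/shape - 1)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    z = (x - loc) / scale
    if z < 0:
        return 0.0
    if shape == 0:
        return math.exp(-z) / scale
    t = 1.0 + shape * z
    if t <= 0:
        return 0.0
    return t ** (-1.0 / shape - 1.0) / scale


def gpd_fdc_flow(p, loc, scale, shape):
    """Flow exceeded with probability p, obtained by solving 1 - F(x) = p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if shape == 0:
        return loc - scale * math.log(p)
    return loc + scale * (p ** (-shape) - 1.0) / shape
```

With p = 1 the sketch returns the location parameter, the lowest flow the model allows; a negative shape gives a finite upper bound on flow, while a positive shape gives a heavy upper tail.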
3.3. H2018 Function
Han and Tian [29] proposed a new function of the following form:
where m and n are the parameters, and ≤ ≤ .
In FDC fitting, the flow decreases as the exceedance frequency increases from 0 to 1, so the function in its original form must be transformed. The modified H2018 function is as follows:
3.4. Evaluation Indices
In this study, the Nash–Sutcliffe efficiency (NSE) [17,34,35], the root mean square relative error (RMSRE) [36], the logarithmic Nash–Sutcliffe efficiency coefficient (LNSE), and the coefficient of determination (R²) [37,38,39] were used to evaluate the applicability of the different distribution functions at each hydrological station.
The Nash–Sutcliffe efficiency coefficient (NSE) is generally used to evaluate model quality, in particular the quality of hydrological simulation results. The NSE is defined as follows:
$$NSE=1-\frac{\sum_{t=1}^{N}\left(Q_{o}^{t}-Q_{m}^{t}\right)^{2}}{\sum_{t=1}^{N}\left(Q_{o}^{t}-\overline{Q}_{o}\right)^{2}}$$
where $Q_o$ refers to the observed value, $Q_m$ refers to the simulated value, the superscript $t$ denotes the value at time $t$, $\overline{Q}_o$ represents the overall mean of the observed values, and $N$ is the number of values.
The NSE ranges from negative infinity to 1. An NSE close to 1 indicates good model quality and high reliability. An NSE close to 0 indicates that the simulation is about as accurate as the mean of the observations: the overall result is reliable, but the error in simulating the process is large. An NSE much less than 0 indicates that the model is not credible.
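A direct Python transcription of the NSE definition is sketched below; the function name `nse` is illustrative. Passing the observed values back in as the simulation gives 1, and simulating every value with the observed mean gives 0, matching the interpretation above.

```python
def nse(observed, simulated):
    """Nash–Sutcliffe efficiency: 1 - SSE / (sum of squares about the observed mean)."""
    if not observed or len(observed) != len(simulated):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - sse / sst
```

For example, observed values [1, 2, 3, 4] simulated as [1, 2, 3, 5] give an error sum of squares of 1 against a total of 5, so NSE = 0.8.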
The logarithmic Nash–Sutcliffe efficiency coefficient (LNSE) is calculated in the same way as the NSE but from the logarithms of the observed and simulated values:
$$LNSE=1-\frac{\sum_{t=1}^{N}\left(\ln Q_{o}^{t}-\ln Q_{m}^{t}\right)^{2}}{\sum_{t=1}^{N}\left(\ln Q_{o}^{t}-\overline{\ln Q_{o}}\right)^{2}}$$
In the formula, $\ln Q_o^t$ refers to the logarithm of the observed value, $\ln Q_m^t$ refers to the logarithm of the simulated value, $\overline{\ln Q_o}$ refers to the mean of the logarithms of the observed values, and $N$ is the number of values.
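The sketch below computes the LNSE in Python, assuming the reference level is the mean of the log-transformed observations; the function name `lnse` is illustrative. Taking logarithms gives low flows more weight than the plain NSE does, and the sketch raises an error for zero flows because their logarithm is undefined.

```python
import math


def lnse(observed, simulated):
    """Nash–Sutcliffe efficiency computed on natural logarithms of the flows.

    The reference level is the mean of ln(observed), an assumed convention.
    """
    if not observed or len(observed) != len(simulated):
        raise ValueError("inputs must be non-empty and of equal length")
    if min(observed) <= 0 or min(simulated) <= 0:
        raise ValueError("logarithms require strictly positive flows")
    log_obs = [math.log(o) for o in observed]
    log_sim = [math.log(s) for s in simulated]
    mean_log_obs = sum(log_obs) / len(log_obs)
    sse = sum((o - s) ** 2 for o, s in zip(log_obs, log_sim))
    sst = sum((o - mean_log_obs) ** 2 for o in log_obs)
    if sst == 0:
        raise ValueError("observed flows have no variability")
    return 1.0 - sse / sst
```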
The root mean square relative error (RMSRE) is the square root of the mean of the squared relative deviations between the observed and simulated values; it measures the relative deviation of the simulated values from the observed values. The RMSRE is calculated as follows:
$$RMSRE=\sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(\frac{Q_{o}^{t}-Q_{m}^{t}}{Q_{o}^{t}}\right)^{2}}$$
where $N$ is the number of samples, $Q_o^t$ is the observed value, and $Q_m^t$ is the simulated value.
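A minimal Python sketch of the RMSRE follows, assuming each relative error is taken with respect to the observed value; the function name `rmsre` is illustrative. Under this definition the index is undefined when any observed flow is zero, so the sketch raises an error in that case rather than silently dividing by zero.

```python
import math


def rmsre(observed, simulated):
    """Root mean square of the relative errors (obs - sim) / obs."""
    if not observed or len(observed) != len(simulated):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(o == 0 for o in observed):
        raise ValueError("relative error is undefined for zero observed flow")
    total = sum(((o - s) / o) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(total / len(observed))
```

For instance, observations [1, 2, 4] simulated as [1.1, 1.8, 4.4] are each off by 10 %, so the RMSRE is 0.1.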
The coefficient of determination (R²) is the square of the Pearson correlation coefficient between the observed and simulated values; it measures the degree of linear correlation between them.
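The definition of R² as a squared Pearson correlation can be sketched in Python as follows; the function name `r_squared` is illustrative.

```python
def r_squared(observed, simulated):
    """Square of the Pearson correlation coefficient between observed and simulated values."""
    n = len(observed)
    if n < 2 or n != len(simulated):
        raise ValueError("need at least two paired values")
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    if var_o == 0 or var_s == 0:
        raise ValueError("correlation is undefined for constant series")
    return cov * cov / (var_o * var_s)
```

R² is insensitive to systematic bias: a simulation equal to 2 × observed + 3 still gives R² = 1, which is one reason it is reported together with the NSE rather than on its own.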
3.5. Multiple Regression
Regression methods have been used to link model parameters to catchment characteristics such as climatic indices, land cover, and geological and geomorphological parameters. In this study, this analysis was performed by dividing the study area into 19 subzones. The regression formula used in this study has the following structure [40]:
where y represents a parameter of the selected function, and the coefficients k0 and ki are determined through regression analysis.
Multiple regression examines how several independent variables are related to one dependent variable. Once the independent variables that predict the dependent variable have been identified, they can be used jointly to predict it and to quantify the effect of each on the outcome.
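As a minimal sketch, assuming a linear form y = k0 + k1·x1 + ... + km·xm (the structure given in [40] may differ, for example by using log-transformed variables), the coefficients can be estimated by ordinary least squares. The solver below forms the normal equations and solves them by Gauss–Jordan elimination using only the standard library; the function names and the synthetic predictors are hypothetical stand-ins for catchment descriptors such as precipitation or mean elevation.

```python
def fit_linear_regression(predictors, targets):
    """Ordinary least squares for y = k0 + k1*x1 + ... + km*xm.

    predictors: list of rows, each a list of m descriptor values.
    Returns [k0, k1, ..., km] by solving (X^T X) k = X^T y with
    Gauss-Jordan elimination and partial pivoting.
    """
    if len(predictors) != len(targets):
        raise ValueError("predictors and targets must have the same length")
    rows = [[1.0] + [float(v) for v in row] for row in predictors]  # intercept column
    n, p = len(rows), len(rows[0])
    if n < p:
        raise ValueError("need at least as many samples as coefficients")
    # Augmented normal-equation matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; normal equations are singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def predict(coefficients, row):
    """Evaluate k0 + sum(ki * xi) for one row of descriptors."""
    return coefficients[0] + sum(k * x for k, x in zip(coefficients[1:], row))


# Synthetic example: y = 1 + 2*x1 - 0.5*x2 is recovered exactly.
xs = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8], [6, 4]]
ys = [1 + 2 * x1 - 0.5 * x2 for x1, x2 in xs]
coeffs = fit_linear_regression(xs, ys)
```

For larger or poorly conditioned predictor sets, an SVD-based least-squares solver such as `numpy.linalg.lstsq` is numerically safer than forming the normal equations.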
3.6. Mann–Kendall Test
The Mann–Kendall test is a non-parametric statistical test that is widely used by researchers because of its simplicity and effectiveness; it is particularly suitable for hydrological, meteorological, and other data that are not normally distributed. In this study, when the Mann–Kendall test is applied for trend detection, a trend is considered significant if the absolute value of the standardized statistic U exceeds 1.645, i.e., if it passes the two-sided test at the 0.1 significance level [41,42].
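The test can be sketched in Python as below. This follows the standard textbook form (the S statistic, its variance corrected for tied groups, and a continuity-corrected standardized statistic U); the function name and return format are illustrative, and the default critical value 1.645 matches the two-sided 0.1 level used in this study.

```python
import math
from collections import Counter


def mann_kendall(series, z_crit=1.645):
    """Return (S, U, significant) for the Mann–Kendall trend test."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 values")
    # S counts later-minus-earlier sign differences over all pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S, corrected for groups of tied values.
    ties = Counter(series).values()
    var_s = (n * (n - 1) * (2 * n + 5) - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18.0
    # Standardized statistic with continuity correction.
    if s > 0:
        u = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        u = (s + 1) / math.sqrt(var_s)
    else:
        u = 0.0
    return s, u, abs(u) > z_crit
```

A positive U indicates an increasing trend and a negative U a decreasing one; for a strictly increasing series of 10 values, S = 45 and Var(S) = 125, so U = 44/√125 ≈ 3.94, well beyond 1.645.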
5. Discussion
This section investigates the FDCs of the Kuye River, whose flow is affected by coal mining and other influences. Its FDC parameters differ considerably from those of the other river basins in this region.
Figure 8, Figure 9 and Figure 10 show the temporal variations in the parameters at three stations on the Kuye River (Wenjiachuan, Wangdaohengta, and Shenmu). Parameters a and b show similar tendencies, and the Mann–Kendall test indicates significant increases or decreases in all cases. However, the variation in parameter k is inconspicuous, and k at Wangdaohengta differs slightly from that at the other two stations, perhaps because of differences in geomorphology: Wangdaohengta station is situated in the upstream sandy area, whereas the other two stations are situated in the loess area. In addition, annual runoff generally exhibited a downward trend [31], with an abrupt change point in 1996, when coal output also started to increase. Moreover, rainfall erosion is weakest in the upstream area.
Based on the 2011 general survey data on water conservation from the Yellow River Conservancy Commission, there were 306 key dams in the Kuye River watershed, with a total storage capacity of 316.64 Mm³. Most of the dams were built in the middle and lower reaches, and the regions at high risk of soil erosion and sediment yield were mainly concentrated in the middle reaches of the watershed [51]. Human activities can therefore affect regional runoff variations to a large extent.
Furthermore, the characteristic values of a, b, and k for the Kuye River were also calculated (Table 9). The values of a are all smaller than those at the Baijiachuan station, whereas the values of b and k are all larger. These values may affect the parameter formulas presented in Section 4.3.
6. Conclusions
Under the impact of human activities and climate change, runoff is continually changing, and most small watersheds in the region are ungauged, with insufficient flow data. It is therefore essential to develop FDC formulas that are widely applicable in the area and that can be extended to a wide range of ungauged areas. This study identifies the H2018 function as the best function for fitting the FDCs of a semi-arid region in North China and parameterizes it in order to estimate the FDCs of ungauged basins.
The generalized Pareto, H2018, and lognormal distribution functions were used to fit the FDCs of daily discharge at 19 stations in North China. The H2018 function improves the fit at both the head and the tail of the FDC and represents the curve well, including zero-value discharge. Flows at specific percentiles were determined, and ratios of flows at different frequencies were calculated to better understand the changes in the FDCs. The results show that the FDCs are S-shaped: the head and tail sections of the curve descend steeply, while the middle section varies less.
The parameters a, b, and k of the H2018 function were formulated by means of a regression model using hydrometeorological features and basin characteristics, such as annual runoff, precipitation, summer precipitation, potential evapotranspiration, the sub-basin area controlled by each hydrological station, mean elevation, main channel length (L), and maximum flow frequency (xmax). The spatial and temporal distributions of a, b, and k were analyzed. Parameter a shows a significant decreasing trend, b a significant increasing trend, and k a non-significant increasing trend. The regression formulas constructed in this study yield regional FDCs with satisfactory performance and provide a reference for validating remote-sensing-based runoff data in ungauged regions.