Article

Two Types of Size-Biased Samples When Modeling Extreme Phenomena

by Apostolos Batsidis 1,*,†, George Tzavelas 2,† and Polychronis Economou 3,†
1 Department of Mathematics, University of Ioannina, 45500 Ioannina, Greece
2 Department of Statistics and Insurance Science, University of Piraeus, 18534 Piraeus, Greece
3 Environmental Engineering Laboratory, Department of Civil Engineering, University of Patras, University Campus, 26500 Rio Achaia, Greece
* Author to whom correspondence should be addressed.
† All authors contributed equally to this work.
Stats 2024, 7(4), 1392-1404; https://doi.org/10.3390/stats7040081
Submission received: 30 September 2024 / Revised: 16 November 2024 / Accepted: 18 November 2024 / Published: 21 November 2024
(This article belongs to the Section Statistical Methods)

Abstract

The present research deals with two possible sources of bias that arise naturally from the selection procedure when modeling extreme phenomena. More specifically, the first type of bias arises when an r-size-biased sample is selected from a set of maximum values, while the second occurs when a random sample of maxima is observed and each maximum is obtained from an r-size-biased sample. The concept of weighted distributions is used not only to describe both cases but also as an adjustment methodology. The differences between the two types of bias are discussed, and the impact of ignoring the bias on the estimation of the unknown parameters is shown both theoretically and through a simulation study, under the assumption that the parent distribution belongs to the Fréchet maximum domain of attraction. Finally, numerical results indicate that ignoring the bias or misspecifying r results in inconsistent estimators.

1. Introduction

Sampling bias, also referred to as sample selection bias, is introduced when individuals are selected in such a way that the observed sample cannot be considered representative of the population intended to be analyzed. When modeling extreme phenomena in scientific fields that involve, for example, environmental, meteorological, and biomedical data (e.g., [1,2,3] and references therein), biased sampling is rather common. It may arise either from unintentionally applying a non-random sampling scheme or from the nature of the problem, e.g., [4,5]. Since a biased sample may cause serious problems if treated as a random one from the population of interest, the accurate estimation of the parameters under biased sampling schemes is of major importance. When dealing with positive continuous random variables, after the pioneering works of [6,7], the concept of r-size-biased distributions has been used by several authors, not only to describe the biased sample but also as an adjustment methodology (e.g., [8,9,10] and the references therein) in cases where the sampling mechanism selects units with probability proportional to some measure of the unit size. In such cases, the observed biased sample $X_1,\ldots,X_n$ from the positive continuous random variable $X$ with probability density function (p.d.f.) $f(x;\theta)$, where $\theta$ is an unknown parameter, can be interpreted as a random sample from the corresponding r-size-biased distribution with p.d.f. given by
$$f_r(x;\theta) = \frac{x^{r} f(x;\theta)}{E(X^{r})}, \tag{1}$$
provided that $E(X^{r}) = \mu_r = \int_0^{\infty} x^{r} f(x;\theta)\,dx < \infty$, where $E$ denotes expectation with respect to $f(x;\theta)$. The most common cases of r-size-biased distributions are the length-biased ($r = 1$) and the area-biased ($r = 2$) distributions.
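As a small worked illustration of the definition above, consider an exponential parent with rate $\lambda$. This choice is an assumption made here only for illustration; it is not one of the distributions studied in this paper. With $r = 1$ the weighting by $x$ turns the exponential density into a Gamma density with shape 2:

```latex
% Illustrative assumption: X ~ Exponential(\lambda), so f(x;\lambda) = \lambda e^{-\lambda x} and E(X) = 1/\lambda.
\begin{aligned}
f_1(x;\lambda) &= \frac{x\, f(x;\lambda)}{E(X)}
               = \frac{x\,\lambda e^{-\lambda x}}{1/\lambda}
               = \lambda^{2}\, x\, e^{-\lambda x}, \qquad x > 0, \\
E_{f_1}(X)     &= \int_0^{\infty} x\,\lambda^{2} x\, e^{-\lambda x}\,dx = \frac{2}{\lambda} \;>\; \frac{1}{\lambda} = E(X).
\end{aligned}
```

The length-biased mean is twice the parent mean, which shows in a simple case why treating a size-biased sample as a random one overstates the typical size.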
In this paper, two different sources of bias that arise naturally from the selection procedure when dealing with extreme phenomena are considered. The study is motivated by the sampling scheme described in [11], where it is assumed that the known information consists of a sample of $n$ values, say $X_i$, $i = 1,\ldots,n$, each of which is the maximum of a sample of $k$ independent random variables, say $Y_{ij}$, $j = 1,\ldots,k$, drawn from an exponential or normal distribution. In our study, we modify both the sampling mechanism and the choice of the distribution of the parent population.
To be more specific, the first source of bias occurs when the observed sample of size $n$ is an r-size-biased sample from the population of maxima, where the maxima have been obtained from random samples of an underlying parent population. Thus, in this case, we assume that $X_i$, $i = 1,\ldots,n$, is an r-size-biased sample obtained from a set of maxima $Z_i$, $i = 1,\ldots,N$, each of which is the maximum of a sample of $k_j$ independent random variables, say $Y_{ij}$, $j = 1,\ldots,k_j$, drawn from an underlying parent distribution. On the other hand, the second type of data considers a different modification. Specifically, the known information consists of a sample of $n$ values, say $X_i$, $i = 1,\ldots,n$, each of which is the maximum of an r-size-biased sample of size $k_j$, say $Y_{ij}$, $i = 1,\ldots,n$, $j = 1,\ldots,k_j$, drawn from an underlying parent distribution. Thus, the observed sample of size $n$ corresponds to the $n$ maxima of $n$ independent r-size-biased samples from an underlying parent population. In this framework, the concept of r-size-biased distributions will be used, for the first time to the best of our knowledge, to describe and deal with the two types of bias described above when modeling extreme phenomena.
Finally, related to the parent distribution, we assume that its cumulative distribution function belongs to the Fréchet maximum domain of attraction. Recall that, according to the fundamental theorem of extreme value theory (see [12,13]), three possible maximum domains of attraction exist, namely the Gumbel, the Weibull, and the Fréchet. As [14] mentioned, all distributions in the Gumbel domain, which contains well-known distributions such as the normal, the exponential, and the gamma, have the exponential as the limiting distribution of their tail, while the Fréchet domain contains distributions with an infinite yet heavier tail than the exponential. Moreover, the Weibull domain contains distributions with lighter tails than the exponential, which possess a finite upper bound (e.g., the uniform distribution). Taking into account that the Weibull domain of attraction consists of distribution functions with support bounded to the right and the fact that we deal with positive and continuous random variables, we easily conclude that the parent distribution cannot be in the Weibull domain of attraction. Moreover, to the best of our knowledge, the concept of r-size-biased distributions can only be used for dealing with size-biased samples in the case of distributions with support in the positive real line, which is not the case of the Gumbel distribution. On the other hand, every distribution function which belongs to the Fréchet domain has an infinite right endpoint, while at the same time several distributions belong to this domain. For these reasons, we assume in the whole study that the parent distribution belongs to the Fréchet maximum domain of attraction. This assumption implies that the concept of the r-size-biased Fréchet distribution, with r known, will be used in order to deal with the two types of bias. The r-size-biased Fréchet distribution has recently been introduced and studied in [15] and can be considered as an extension of the Fréchet distribution. 
For other extensions of the Fréchet distribution, we refer, among others, to [16,17] and references therein.
The rest of the paper is organized as follows. In Section 2, the first type of possible source of bias is described and the concept of the r-size-biased Fréchet distribution is used to deal with it. In Section 3, after describing the second type of possible source of bias, the results given in [18] will be utilized to obtain an accurate estimation of the unknown population parameters. At the end of both Section 2 and Section 3, the impact on the estimation of the unknown parameters when ignoring the bias or misspecifying the sampling mechanism is theoretically discussed, while Section 4 discusses the results of a simulation study performed for examining this impact. Finally, Section 5 concludes.
Before ending this section, some necessary notation is introduced: $\underline{x} = (x_1,\ldots,x_n)$ stands for the sample, $\bar{x}$ is the arithmetic mean of the sample, and $\xrightarrow{d}$ denotes convergence in distribution.

2. Bias to the Distribution of Maxima

This section deals with the first type of bias, which arises when a size-biased sample from a set of maximum values is selected.
Step 1:
Initially, $N$ units are selected from a population, and $k_j$ measurements of the characteristic $Y$ under study are recorded on the $j$-th unit. We assume that $Y$ is a positive and continuous random variable (r.v.) with cumulative distribution function (c.d.f.) $F_Y$, which belongs to the maximum domain of attraction of the Fréchet distribution, and that $k_j$ is large. Equivalently, in this step we select $N$ independent random samples, with $k_j$ units in the $j$-th sample. Let $Y_{j1},\ldots,Y_{jk_j}$ be the r.v.s that describe the measurements on the $j$-th unit or sample, $j = 1,\ldots,N$.
Step 2:
Next, the maximum value of each of the $N$ units or samples is recorded. Let $X_j = \max\{Y_{j1},\ldots,Y_{jk_j}\}$, $j = 1,\ldots,N$. By Theorem 3.3.7 in [19], since $F_Y$ belongs to the Fréchet domain and $k_j$ is large, the sample maximum $X_j$ of the random sample $\{Y_{j1},\ldots,Y_{jk_j}\}$ satisfies the following relation:
$$P\left(\frac{X_j}{\delta_{k_j}} \le y\right) \xrightarrow{d} \exp\left(-y^{-\beta}\right), \quad y > 0, \tag{2}$$
where
$$\delta_{k_j} = F_Y^{-1}\left(1 - k_j^{-1}\right),$$
i.e., $\delta_{k_j}$ is a normalizing constant that depends on $k_j$ and is determined by the generalized inverse of the c.d.f. $F_Y$, called the quantile function. Thus, the distribution of each $X_j$, $j = 1,\ldots,N$, is approximated by the Fréchet distribution with shape parameter $\beta > 0$ and scale parameter $\delta_{k_j}$.
Step 3:
An r-size-biased sample of size $n$ is selected from the population $X_1,\ldots,X_N$, i.e., each unit of the population is selected into the sample with probability proportional to $X^{r}$. In order not to introduce more notation, let $X_1,\ldots,X_n$ be the observed sample, which obviously does not coincide with the first $n$ values $X_i$ of the population.
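Steps 1 to 3 can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the parent is taken to be Log-logistic (one member of the Fréchet domain), all samples have the same size `k`, the size-biased draw is done with replacement, and the function names `loglogistic_draw`, `population_of_maxima`, and `size_biased_sample` are hypothetical helpers introduced here.

```python
import random

def loglogistic_draw(a, b, rng):
    """One draw from LL(a, b) by inverting F(y) = 1 / (1 + (y / b) ** (-a))."""
    u = rng.random()
    while u == 0.0:  # avoid y = 0, since Y is assumed to be positive
        u = rng.random()
    return b * (u / (1.0 - u)) ** (1.0 / a)

def population_of_maxima(a, b, k, N, rng):
    """Steps 1-2: N independent samples of size k, keeping each sample maximum."""
    return [max(loglogistic_draw(a, b, rng) for _ in range(k)) for _ in range(N)]

def size_biased_sample(population, r, n, rng):
    """Step 3: draw n units with probability proportional to x ** r.

    Assumption of this sketch: sampling is done with replacement.
    """
    weights = [x ** r for x in population]
    return rng.choices(population, weights=weights, k=n)

rng = random.Random(2024)
maxima = population_of_maxima(a=4.0, b=2.0, k=30, N=2000, rng=rng)
observed = size_biased_sample(maxima, r=2, n=500, rng=rng)

# The size-biased sample over-represents large maxima, so its mean is larger.
print(sum(maxima) / len(maxima), sum(observed) / len(observed))
```

Setting `r=0` in `size_biased_sample` gives equal weights, i.e., an ordinary random draw from the population of maxima.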
It is obvious that if $k_1 = k_2 = \cdots = k_N = k$, which is a usual assumption (see, for instance, the block maxima method), the observed biased sample $X_1,\ldots,X_n$ from $X$ can be interpreted as an r-size-biased sample from a distribution that is approximated by a Fréchet distribution with shape parameter $\beta > 0$ and scale parameter $\delta_k$. Equivalently, it can be interpreted as a random sample from a distribution that is approximated by the corresponding r-size-biased Fréchet distribution, which has recently been presented and studied by [15]. The p.d.f. of the r-size-biased Fréchet distribution is given by
$$f_r(x;\beta,\delta_k) = \frac{\beta}{\delta_k\,\Gamma\left(1-\frac{r}{\beta}\right)} \left(\frac{\delta_k}{x}\right)^{\beta-r+1} \exp\left\{-\left(\frac{\delta_k}{x}\right)^{\beta}\right\}, \quad x > 0, \tag{3}$$
where $\delta_k > 0$ and $\beta > r \ge 0$. In the sequel, the distribution with the p.d.f. given in (3) is denoted by $Fr(\beta,\delta_k,r)$.
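The following sketch evaluates the p.d.f. of the r-size-biased Fréchet distribution and checks numerically that it integrates to one. It is an illustration only: the function names and the integration scheme (a trapezoidal rule on a logarithmic grid with hand-picked truncation points) are assumptions of this sketch, not part of the paper.

```python
import math

def size_biased_frechet_pdf(x, beta, delta, r):
    """Density of the r-size-biased Frechet distribution, beta > r >= 0, delta > 0."""
    if not (beta > r >= 0 and delta > 0):
        raise ValueError("need beta > r >= 0 and delta > 0")
    if x <= 0.0:
        return 0.0
    z = delta / x
    log_pdf = (math.log(beta) - math.log(delta) - math.lgamma(1.0 - r / beta)
               + (beta - r + 1.0) * math.log(z) - z ** beta)
    return math.exp(log_pdf)

def total_mass(beta, delta, r, steps=20000):
    """Trapezoidal rule in t = log(x); the truncation points leave negligible mass outside."""
    t_lo = math.log(delta) - math.log(40.0) / beta   # below this, exp(-(delta/x)^beta) < e^-40
    t_hi = math.log(delta) + 25.0 / (beta - r)       # above this, the tail mass is about e^-25
    h = (t_hi - t_lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = math.exp(t_lo + i * h)
        w = 0.5 if i in (0, steps) else 1.0
        total += w * size_biased_frechet_pdf(x, beta, delta, r) * x  # dx = x dt
    return total * h

for r in (0, 1, 2, 3):
    print(r, total_mass(beta=4.0, delta=4.641192, r=r))
```

For $r = 0$ the function reduces to the ordinary Fréchet density, which gives a quick consistency check of the formula.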
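It is easily obtained that the log-likelihood function based on the observed sample is given by the following relation:
$$\ell(\underline{x},\beta,\delta_k) = n\log\left(\frac{\beta}{\delta_k\,\Gamma\left(1-\frac{r}{\beta}\right)}\right) + (\beta - r + 1)\sum_{i=1}^{n}\log\left(\frac{\delta_k}{x_i}\right) - \sum_{i=1}^{n}\left(\frac{\delta_k}{x_i}\right)^{\beta}.$$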
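Then, according to [15], the maximum likelihood estimators (MLEs) of the unknown parameters of the r-size-biased Fréchet distribution, with $r$ known, always exist and are unique. Specifically, the solution of $K_r(\beta) = 0$ gives the MLE of $\beta$, say $\hat{\beta}_r$, where
$$K_r(\beta) = k_0(\beta) + r\,k_1(\beta) + k_2(\beta,r),$$
and $k_0(\beta)$, $k_1(\beta)$, and $k_2(\beta,r)$ are defined as follows:
$$k_0(\beta) = \frac{1}{\beta} + \frac{1}{\sum_{i=1}^{n} x_i^{-\beta}} \sum_{i=1}^{n}\frac{\log(x_i)}{x_i^{\beta}} - \frac{1}{n}\sum_{i=1}^{n}\log(x_i),$$
$$k_1(\beta) = -\frac{1}{\beta}\,\frac{1}{\sum_{i=1}^{n} x_i^{-\beta}} \sum_{i=1}^{n}\frac{\log x_i}{x_i^{\beta}} - \frac{1}{\beta^{2}}\log\left(\frac{1}{n}\sum_{i=1}^{n} x_i^{-\beta}\right),$$
and
$$k_2(\beta,r) = \frac{r}{\beta^{2}}\left[\log\left(1-\frac{r}{\beta}\right) - \Psi\left(1-\frac{r}{\beta}\right)\right],$$
where $\Psi$ denotes the digamma function, while the MLE of $\delta_k$, say $\hat{\delta}_{k,r}$, is given explicitly by the following equation:
$$\hat{\delta}_{k,r} = \left[\frac{1 - r/\hat{\beta}_r}{\frac{1}{n}\sum_{i=1}^{n} x_i^{-\hat{\beta}_r}}\right]^{1/\hat{\beta}_r}.$$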
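A minimal sketch of this estimation procedure is given below. It solves $K_r(\beta) = 0$ by bisection and then plugs $\hat{\beta}_r$ into the closed-form expression for the scale. Several pieces are assumptions of the sketch rather than part of the paper: the helper names, the bracketing interval `(r, beta_max)`, the home-made `digamma` (recurrence plus a standard asymptotic series, since the Python standard library has no digamma), and the way test data are generated. For the latter, the change of variables $W = (\delta/X)^{\beta}$ turns the density in (3) into a Gamma density with shape $1 - r/\beta$ and unit scale, so $X = \delta W^{-1/\beta}$ can be simulated directly.

```python
import math
import random

def digamma(x):
    """Psi(x), x > 0: recurrence Psi(x) = Psi(x + 1) - 1/x, then an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    t = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - t * (1.0 / 12 - t * (1.0 / 120 - t / 252))

def weighted_stats(beta, logs):
    """Return (sum log(x_i) x_i^-beta / sum x_i^-beta, log((1/n) sum x_i^-beta))."""
    lmin = min(logs)
    w = [math.exp(-beta * (lx - lmin)) for lx in logs]  # x_i^-beta up to a common factor
    sw = sum(w)
    ratio = sum(wi * lx for wi, lx in zip(w, logs)) / sw
    return ratio, -beta * lmin + math.log(sw / len(logs))

def K(beta, logs, r):
    """Score function K_r(beta) = k_0 + r * k_1 + k_2."""
    ratio, log_mean_pow = weighted_stats(beta, logs)
    k0 = 1.0 / beta + ratio - sum(logs) / len(logs)
    if r == 0:
        return k0
    c = 1.0 - r / beta
    k1 = -ratio / beta - log_mean_pow / beta ** 2
    k2 = (r / beta ** 2) * (math.log(c) - digamma(c))
    return k0 + r * k1 + k2

def fit_size_biased_frechet(xs, r, beta_max=100.0, iters=100):
    """Bisection on K_r over (r, beta_max); assumes K_r > 0 left of the root, < 0 right of it."""
    logs = [math.log(x) for x in xs]
    lo, hi = r + 1e-9, beta_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if K(mid, logs, r) > 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    _, log_mean_pow = weighted_stats(beta, logs)
    delta = math.exp((math.log(1.0 - r / beta) - log_mean_pow) / beta)
    return beta, delta

# Simulated r-size-biased Frechet data via the Gamma change of variables.
rng = random.Random(7)
beta_true, delta_true, r_true = 4.0, 5.0, 1
sample = [delta_true * rng.gammavariate(1.0 - r_true / beta_true, 1.0) ** (-1.0 / beta_true)
          for _ in range(4000)]
print("r = 1 (correct):", fit_size_biased_frechet(sample, r=1))
print("r = 0 (bias ignored):", fit_size_biased_frechet(sample, r=0))
```

Fitting with the correct `r` should return estimates close to the generating values, while fitting with `r=0` on the same data gives a visibly smaller shape estimate.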
Remark 1. 
Note that even if the $k_j$ are not strictly equal, under the assumption that $\delta_{k_l}/\delta_{\max k_j} \approx 1$, which is fulfilled if the sample sizes $k_j$ are comparably large, the observed sample can be considered as an r-size-biased sample from a distribution that is approximated by the Fréchet distribution with shape parameter $\beta > 0$ and scale parameter $\delta_k = \delta_{\max k_j}$.
It was previously assumed that $F_Y$ belongs to the maximum domain of attraction of the Fréchet distribution. This assumption, as explained in the introduction, is not at all restrictive. Table 1 lists positive and continuous distributions that belong to the Fréchet domain, together with the parameters $\beta$ and $\delta_k$ of the approximating Fréchet distribution when a random sample of size $k$ is available. After estimating the shape and scale parameters of the Fréchet distribution that approximates the distribution of the maximum, an approximate estimator of the parameters of the parent distribution can be obtained from the results given in Table 1. For instance, if the parent distribution is the Log-logistic distribution with parameters $a$ and $b$, for which $\delta_k = b\,(k-1)^{1/a}$, then $\hat{a} = \hat{\beta}_r$ and $\hat{b} = \hat{\delta}_{k,r}\,(k-1)^{-1/\hat{a}}$.
Remark 2. 
If we deal with phenomena that can be described by distributions with support bounded to the right and their parent distribution belongs to the Weibull domain of maximum attraction, then the r-size-biased Weibull distribution studied in [20] should be used instead of the r-size-biased Fréchet distribution.
It is obvious that, from a theoretical point of view, ignoring the bias or misspecifying the true value of $r$, say $r_0$, implies that the MLEs are obtained as the solution of a different system of equations. This impact is studied numerically in Section 4.

3. Maxima of Biased Samples

This section deals with the second type of bias, which arises when a random sample of maxima from a series of r-size-biased samples is observed.
Step 1:
Let us consider a positive and continuous random variable $Y$ with p.d.f. $f_Y$ and c.d.f. $F_Y$, which belongs to the maximum domain of attraction of the Fréchet distribution.
Step 2:
Let $Y_{j1},\ldots,Y_{jk_j}$, $j = 1,\ldots,n$, be $n$ independent r-size-biased samples, with $k_j$ large.
Step 3:
Next, the maximum value of each of the $n$ samples is recorded. Let $X_j = \max\{Y_{j1},\ldots,Y_{jk_j}\}$, $j = 1,\ldots,n$.
Step 4:
$X_1,\ldots,X_n$ comprise our observed sample, i.e., a random sample of $n$ maxima obtained from respective independent r-size-biased samples from $Y$ with p.d.f. $f_Y$.
Notice that, in this case, the result given in (2) on the convergence in distribution of $X_j$ is not valid, since $Y_{j1},\ldots,Y_{jk_j}$ is not a random sample but an r-size-biased one. However, based on the results given in [18], if the parent distribution belongs to the Fréchet domain with index $\beta$, then the corresponding r-size-biased distribution also belongs to the Fréchet domain, with index $\beta^{*} = \beta - r$, provided that $\beta^{*} > 0$.
Thus, in this case, for the sample maximum $X_j = \max\{Y_{j1},\ldots,Y_{jk_j}\}$ of the r-size-biased sample, it holds that
$$P\left(\frac{X_j}{\delta^{*}_{k_j,r}} \le y\right) \xrightarrow{d} \exp\left(-y^{-\beta^{*}}\right), \quad y > 0,$$
where $\delta^{*}_{k_j,r} = F_r^{-1}\left(1 - k_j^{-1}\right)$, with $F_r$ being the c.d.f. of the r-size-biased distribution associated with $Y$.
It is obvious that if $k_1 = k_2 = \cdots = k_n = k$, then the observed biased sample $X_1,\ldots,X_n$ is a random sample from a distribution that is approximated by a Fréchet distribution with shape parameter $\beta^{*} = \beta - r > 0$ and scale parameter $\delta^{*}_{k,r} = F_r^{-1}\left(1 - k^{-1}\right)$.
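A worked example may help to see the shift of the index. Assume, for illustration only, a Pareto parent with unit lower bound; this specific choice is an assumption made here and is simply one member of the Fréchet domain. Both the index $\beta^{*}$ and the normalizing constant $\delta^{*}_{k,r}$ are then available in closed form:

```latex
% Illustrative assumption: F_Y(y) = 1 - y^{-\beta}, f_Y(y) = \beta y^{-\beta-1}, y >= 1, with 0 <= r < \beta.
\begin{aligned}
E(Y^{r}) &= \int_1^{\infty} y^{r}\,\beta y^{-\beta-1}\,dy = \frac{\beta}{\beta-r}, \\
f_r(y)   &= \frac{y^{r}\,\beta y^{-\beta-1}}{\beta/(\beta-r)} = (\beta-r)\,y^{-(\beta-r)-1}, \qquad y \ge 1, \\
F_r(y)   &= 1 - y^{-(\beta-r)}, \\
\delta^{*}_{k,r} &= F_r^{-1}\left(1-k^{-1}\right) = k^{1/(\beta-r)}.
\end{aligned}
```

The r-size-biased version is again Pareto, now with index $\beta^{*} = \beta - r$, and for $r = 0$ the constant reduces to $k^{1/\beta} = F_Y^{-1}(1-k^{-1})$, as in Section 2.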
Remark 3. 
Note that, even if the $k_j$ are not strictly equal, under the assumption that $\delta_{k_l}/\delta_{\max k_j} \approx 1$, which once again is fulfilled when the sample sizes $k_j$ are comparably large, we may assume that $X_1,\ldots,X_n$ is a random sample from a distribution that is approximated by a Fréchet distribution with shape parameter $\beta^{*} > 0$ and scale parameter $\delta^{*}_{k,r} = \delta^{*}_{\max k_j,\,r}$.
Based on the previous analysis, we conclude that by using the observed sample $X_1,\ldots,X_n$, the estimates of the parameters $\beta^{*} > 0$ and $\delta^{*}_{k,r} = \sigma > 0$ can be obtained by maximizing
$$-n\log(\sigma) + n\log(\beta^{*}) - \left(\beta^{*}+1\right)\sum_{i=1}^{n}\log\left(\frac{x_i}{\sigma}\right) - \sum_{i=1}^{n}\left(\frac{x_i}{\sigma}\right)^{-\beta^{*}}, \tag{10}$$
provided that $x_i > 0$, $i = 1,\ldots,n$.
Remark 4. 
In [18], sufficient conditions under which the weighted and parent distributions belong to the same domain were given, together with the relation that determines the shape parameters of the limiting distributions. Under r-size-biased sampling, the results stated in [18] hold without any further conditions or restrictions for the Fréchet domain, while for the Gumbel and Weibull domains it is necessary to specify the form of the parent distribution in order to examine the closure property of the domain of attraction. This is one more reason, in addition to those given in the introduction, for restricting attention to the Fréchet maximum domain of attraction.
In the sequel, we discuss the impact of ignoring the bias or misspecifying the true value of $r$, say $r_0$. From (10), we conclude that in any case the same function is maximized. However, when $\{Y_{j1},\ldots,Y_{jk}\}$ is a random sample, $\hat{\beta}$ estimates the index parameter $\beta_0$, which, for some specific distributions that belong to the Fréchet domain, is given in Table 1, while $\hat{\delta}$ estimates the parameter $\delta_k$, also given in the aforementioned table. Note that $\delta_k$ is determined as the solution of the equation $F_Y(y) = 1 - k^{-1}$. On the other hand, when $\{Y_{j1},\ldots,Y_{jk}\}$ is an $r_0$-size-biased sample, $\hat{\beta}$ estimates the index parameter of the r-size-biased version of the random variable $Y$, which equals $\beta^{*} = \beta - r_0$, while $\hat{\delta}$ estimates the parameter $\delta_{k,r_0}$, which is determined as the solution of the equation $F_Y^{(r_0)}(x) = 1 - k^{-1}$. Taking into account that, according to [21], it holds that
$$F_Y^{(r_0)}(y) = 1 - \frac{E\left(Y^{r_0}\mid Y \ge y\right)}{E\left(Y^{r_0}\right)}\,\bigl(1 - F(y)\bigr),$$
we have, after some simple algebra, that $\delta_{k,r_0}$ is determined as the solution of the equation
$$E\left(Y^{r_0}\mid Y \ge y\right)\bigl(1 - F(y)\bigr) = k^{-1}\,E\left(Y^{r_0}\right). \tag{11}$$
Obviously, for $r_0 = 0$, i.e., the case of a random sample, we have that $\beta^{*} = \beta$, while $\delta_{k,0}$ obtained from (11) with $r = 0$ coincides with $\delta_k$, as previously mentioned.
Based on the previous analysis, it is obvious that ignoring the bias or misspecifying the value of $r_0$ affects the estimation of the index parameter, which in turn affects the estimation of the mean value, the variance, the quantiles, etc. For further details, we refer to [18].
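The effect described above can be seen in a short simulation sketch. It rests on assumptions that are not taken from the paper: the parent is a Pareto distribution with unit lower bound and index $\beta$, whose $r_0$-size-biased version is again Pareto with index $\beta - r_0$ (the weighted density is proportional to $y^{r_0} y^{-\beta-1}$); the Fréchet fit is obtained by bisection on the profile score; and the helper names are hypothetical.

```python
import math
import random

def frechet_mle(xs, beta_max=200.0, iters=100):
    """Frechet MLE: bisection on the profile score in beta, then the closed-form scale."""
    logs = [math.log(x) for x in xs]
    n, lmin = len(logs), min(logs)
    mean_log = sum(logs) / n

    def stats(beta):
        w = [math.exp(-beta * (lx - lmin)) for lx in logs]  # x_i^-beta up to a common factor
        sw = sum(w)
        ratio = sum(wi * lx for wi, lx in zip(w, logs)) / sw
        return ratio, -beta * lmin + math.log(sw / n)       # second entry: log mean of x_i^-beta

    lo, hi = 1e-9, beta_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 1.0 / mid + stats(mid)[0] - mean_log > 0.0:      # score is decreasing in beta
            lo = mid
        else:
            hi = mid
    beta_hat = 0.5 * (lo + hi)
    sigma_hat = math.exp(-stats(beta_hat)[1] / beta_hat)
    return beta_hat, sigma_hat

def maxima_of_size_biased_pareto(beta, r0, k, n, rng):
    """n maxima, each taken over an r0-size-biased sample of size k from a Pareto(beta) parent."""
    return [max(rng.paretovariate(beta - r0) for _ in range(k)) for _ in range(n)]

rng = random.Random(11)
beta_parent, r0, k = 5.0, 2, 200
xs = maxima_of_size_biased_pareto(beta_parent, r0, k, n=2000, rng=rng)
beta_hat, sigma_hat = frechet_mle(xs)

# The fit recovers beta - r0 and k^(1/(beta - r0)), not the parent index beta.
print(beta_hat, beta_parent - r0)
print(sigma_hat, k ** (1.0 / (beta_parent - r0)))
```

An analyst who treats these maxima as coming from random samples would read the fitted shape as the parent index and would therefore understate it by $r_0$.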

4. Numerical Experiments

In this section, the impact of ignoring the bias or incorrectly specifying the true value of $r$, say $r_0$, is studied by means of an extensive simulation study. The simulation analysis was performed using the freely available R language [22], and the code is available from the authors upon request. Note that the study is performed under the first type of bias, while for the second type of bias we refer to [18]. Finally, without any loss of generality, in the simulation study the parent distribution in the Fréchet maximum domain of attraction is taken to be the Log-logistic distribution (also known as the Fisk distribution in economics). The Log-logistic is a continuous probability distribution for a non-negative random variable, similar in shape to the log-normal distribution but with heavier tails. It finds many applications, for instance, in survival analysis as a parametric model for the lifetime of an organism whose mortality rate increases initially and decreases later, in hydrology to model stream flow and precipitation, and in economics as a simple model of the distribution of wealth or income (see, among others, [23,24,25] and references therein). A random variable $Y$ has a Log-logistic distribution with shape parameter $a > 0$ and scale parameter $b > 0$, denoted as $LL(a,b)$, if its p.d.f. is of the form
$$f(y;a,b) = \frac{(a/b)\,(y/b)^{a-1}}{\left[1 + (y/b)^{a}\right]^{2}}, \quad a > 0,\ b > 0,\ y \ge 0.$$
Note that, taking into account the results given in Table 1 (see the discussion there), the distribution of the maximum of a sample of size $k$ from this distribution is approximated, for large $k$, by a Fréchet distribution with shape parameter $\beta = a$ and scale parameter $\delta = b\,(k-1)^{1/a}$.
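The quality of this approximation can be checked directly, because the exact c.d.f. of the maximum of $k$ independent $LL(a,b)$ variables is $F_Y(x)^{k}$ with $F_Y(x) = \left[1 + (x/b)^{-a}\right]^{-1}$. The sketch below compares it with the approximating Fréchet c.d.f. on a grid; the function names and the grid are assumptions made for illustration.

```python
import math

def frechet_scale(a, b, k):
    """delta_k = b * (k - 1)^(1/a), the solution of F_Y(y) = 1 - 1/k for LL(a, b)."""
    return b * (k - 1) ** (1.0 / a)

def exact_cdf_of_maximum(x, a, b, k):
    """P(max of k i.i.d. LL(a, b) <= x) = F_Y(x)^k."""
    return (1.0 + (x / b) ** (-a)) ** (-k)

def frechet_cdf(x, beta, delta):
    return math.exp(-((x / delta) ** (-beta)))

def max_abs_gap(a, b, k, grid=2000):
    """Largest absolute difference between the two c.d.f.s on [0.2, 5] * delta_k."""
    delta = frechet_scale(a, b, k)
    gap = 0.0
    for i in range(grid + 1):
        x = delta * (0.2 + 4.8 * i / grid)
        gap = max(gap, abs(exact_cdf_of_maximum(x, a, b, k) - frechet_cdf(x, a, delta)))
    return gap

for k in (30, 50, 100):
    print(k, round(frechet_scale(4.0, 2.0, k), 6), max_abs_gap(4.0, 2.0, k))
```

For $LL(4,2)$ the printed scales agree with the true values $\delta_k$ listed in the last column of Table A1, and the gap between the exact and approximating c.d.f.s shrinks as $k$ grows.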
In the sequel, the simulation scenario, which takes into account Steps 1–3 of Section 2, is described in detail.
Step 1:
For $j = 1,\ldots,N$, with $N$ = 10,000, we generate a sample $Y_{j1},\ldots,Y_{jk}$ of size $k \in \{30, 50, 100\}$ from the Log-logistic distribution with shape parameter $a > 0$ and scale parameter $b > 0$, denoted as $LL(a,b)$, where $(a,b) \in \{(4,2), (4,3), (5,2), (5,3)\}$.
Step 2:
Next, the maximum value of each of the $N$ = 10,000 samples of size $k$ is recorded. Let $Z_j = \max\{Y_{j1},\ldots,Y_{jk}\}$, $j = 1,\ldots,N$. The set of these values comprises the population of maxima.
Step 3:
An $r_0$-size-biased sample of size $n$, with $r_0 \in \{1, 2, 3\}$ and $n \in \{30, 50, 100\}$, is selected from the population of maxima. Let $X_1,\ldots,X_n$ be the observed sample.
Step 4:
The values $\hat{\beta}_r$ and $\hat{\delta}_r$, obtained by fitting the r-size-biased Fréchet distribution for $r = 0, 1, 2, 3$ with $r \le r_0$, are computed.
Step 5:
Steps 1–3 are repeated 10,000 times (simulation runs). Thus, 10,000 values of $\hat{\beta}_r$ and $\hat{\delta}_r$, for $r = 0, 1, 2, 3$ with $r \le r_0$, are obtained, with $\hat{\beta}_{r_0}$ and $\hat{\delta}_{r_0}$ being the appropriate MLEs under this scenario.
Step 6:
The means of the 10,000 estimates $\hat{\beta}_r$ and $\hat{\delta}_r$, denoted by mean.br and mean.dr, respectively, as well as their standard deviations, denoted by sd.br and sd.dr, respectively, are computed for $r = 0, 1, 2, 3$ with $r \le r_0$.
Following the previous steps, and using the results given in Table 1 and the discussion of Section 2, the $r_0$-size-biased observed sample $X_1,\ldots,X_n$ can be considered as a random sample from a distribution that is approximated by the $r_0$-size-biased Fréchet distribution with parameters $\beta = a$ and $\delta = b\,(k-1)^{1/a}$, under the further assumption that $\beta > r_0$. Thus, we evaluate the impact of ignoring the bias or incorrectly specifying the true value of $r$, say $r_0$, by examining the closeness of mean.br and mean.dr to the true values $\beta = a$ and $\delta_k = b\,(k-1)^{1/a}$. The results of this numerical experiment are given in Table A1, Table A2, Table A3 and Table A4 of Appendix A. For convenience, in each row the values of mean.br and mean.dr that correspond to the correct value of $r_0$ are indicated with boldface, while the true values $\beta = a$ and $\delta_k = b\,(k-1)^{1/a}$ of the Fréchet distribution that approximates the distribution of $X_j$ are given in the last column for each combination of $k$ and $(a,b)$.
From the results given in Table A1, Table A2, Table A3 and Table A4 in Appendix A, it is clear that the mean value of the MLEs that corresponds to the correctly specified $r = r_0$ approaches the true parameter values of the approximating Fréchet distribution, while ignoring the bias or misspecifying the value of $r_0$ distorts the results, and the estimation is inconsistent in many cases.
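A scaled-down version of the scenario in Steps 1 to 6 can be sketched as follows. This is not the authors' R code. It is a Python illustration with several assumptions and simplifications that should be kept in mind: the population of maxima is built once and reused across runs, the size-biased draw is done with replacement, `N` and the number of runs are much smaller than 10,000, and the estimates are obtained by a golden-section search on the profile log-likelihood in $\beta$ (with the scale profiled out in closed form), which presumes a single interior maximum.

```python
import math
import random
from statistics import mean, stdev

def fit(xs, r, beta_max=60.0, tol=1e-6):
    """MLE of the r-size-biased Frechet: golden-section search on the profile log-likelihood."""
    logs = [math.log(x) for x in xs]
    n, sum_logs, lmin = len(logs), sum(logs), min(logs)

    def profile(beta):
        c = 1.0 - r / beta
        log_s = -beta * lmin + math.log(sum(math.exp(-beta * (lx - lmin)) for lx in logs))
        log_delta = (math.log(n * c) - log_s) / beta        # closed-form scale for this beta
        ll = (n * math.log(beta) + n * (beta - r) * log_delta - n * math.lgamma(c)
              - (beta - r + 1.0) * sum_logs - n * c)
        return ll, log_delta

    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = r + 1e-6, beta_max
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = profile(x1)[0], profile(x2)[0]
    while hi - lo > tol:
        if f1 < f2:                                          # maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = profile(x2)[0]
        else:                                                # maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = profile(x1)[0]
    beta_hat = 0.5 * (lo + hi)
    return beta_hat, math.exp(profile(beta_hat)[1])

def simulate(a, b, k, r0, n, N=3000, runs=100, seed=1):
    rng = random.Random(seed)

    def draw():                                              # LL(a, b) by inversion
        u = rng.random()
        return b * (u / (1.0 - u)) ** (1.0 / a)

    maxima = [max(draw() for _ in range(k)) for _ in range(N)]   # Steps 1-2
    weights = [z ** r0 for z in maxima]
    estimates = {r: [] for r in range(r0 + 1)}
    for _ in range(runs):                                        # Steps 3-5
        xs = rng.choices(maxima, weights=weights, k=n)
        for r in estimates:
            estimates[r].append(fit(xs, r))
    summary = {}                                                 # Step 6
    for r, vals in estimates.items():
        betas, deltas = [v[0] for v in vals], [v[1] for v in vals]
        summary[r] = (mean(betas), stdev(betas), mean(deltas), stdev(deltas))
    return summary

result = simulate(a=4.0, b=2.0, k=30, r0=1, n=100)
for r, (mb, sb, md, sd) in result.items():
    print(f"r={r}: mean.b={mb:.3f} sd.b={sb:.3f} mean.d={md:.3f} sd.d={sd:.3f}")
```

With these settings the fit with `r=1` should give a mean shape estimate near $a = 4$, while the fit with `r=0` should give a clearly smaller value, in line with the pattern of the corresponding row of Table A1.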
Remark 5. 
In the previous simulation scenario, it was assumed that Y, which belongs to the Fréchet domain, follows a Log-logistic distribution. If Y follows another distribution in the Fréchet domain, the previous analysis can be performed following a similar procedure. The only difference lies in the parameters of the approximated Fréchet distribution, which can be computed using the respective results given in Table 1.

5. Conclusions

This paper deals with two different sources of bias that appear when modeling extreme phenomena, i.e., the bias to the distribution of maxima and the maxima of a series of r-size-biased samples. Since a biased sample may cause serious problems if treated as a random one from the population of interest, this paper utilizes the concept of r-size-biased distributions not only to describe both cases but also as an adjustment methodology. From the theoretical and simulation results, it was concluded that ignoring the bias or misspecifying the value of r affects the estimation of the unknown parameters. Although the derivation for both types of bias was carried out under the non-restrictive assumption that the c.d.f. of the parent population belongs to the maximum domain of attraction of the Fréchet distribution, extending this work to parent distributions that belong to another maximum domain of attraction remains an open problem.

Author Contributions

Conceptualization, A.B., G.T. and P.E.; methodology, A.B., G.T. and P.E.; software, A.B., G.T. and P.E.; validation, A.B., G.T. and P.E.; formal analysis, A.B., G.T., and P.E.; resources, A.B., G.T. and P.E.; data curation, A.B., G.T. and P.E.; writing—original draft preparation, A.B., G.T. and P.E.; writing—review and editing, A.B., G.T. and P.E. All authors have read and agreed to the published version of the manuscript.

Funding

Apostolos Batsidis acknowledges the support of this work by the project Establishment of Capacity Building Infrastructures in Biomedical Research (BIOMED-20) (MIS 5047236), which was implemented under the Action Reinforcement of the Research and Innovation Infrastructure, funded by the Operational Programme Competitiveness, Entrepreneurship, and Innovation (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund).

Data Availability Statement

Data are contained within the article.

Acknowledgments

Authors express their gratitude to the three anonymous referees for their valuable comments and remarks that improved the paper.

Conflicts of Interest

No conflicts of interest exist in the submission of this manuscript, and the manuscript has been approved by all authors for publication.

Appendix A

In the following tables, the results of the numerical experiments described in Section 4 are provided.
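Table A1. Mean values (mean.br, mean.dr) and standard deviations (sd.br, sd.dr) of the estimators $\hat{\beta}_r$ and $\hat{\delta}_r$, $r = 0, 1, 2, 3$, when 10,000 $r_0$-size-biased samples ($r_0 = 1, 2, 3$, $r_0 \ge r$) of size $n$ are selected from a population of $N$ = 10,000 maxima obtained from respective random samples of size $k$ from an $LL(4,2)$.
| k | n | r0 | mean.d0 | sd.d0 | mean.b0 | sd.b0 | mean.d1 | sd.d1 | mean.b1 | sd.b1 | mean.d2 | sd.d2 | mean.b2 | sd.b2 | mean.d3 | sd.d3 | mean.b3 | sd.b3 | (δ_k, β) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | 30 | 1 | 5.146 | 0.309 | 3.432 | 0.519 | 4.676 | 0.269 | 4.099 | 0.512 |  |  |  |  |  |  |  |  | (4.641192, 4) |
| 30 | 50 | 1 | 5.131 | 0.226 | 3.398 | 0.407 | 4.664 | 0.197 | 4.063 | 0.401 |  |  |  |  |  |  |  |  |  |
| 30 | 100 | 1 | 5.131 | 0.167 | 3.342 | 0.270 | 4.657 | 0.142 | 4.004 | 0.265 |  |  |  |  |  |  |  |  |  |
| 50 | 30 | 1 | 5.870 | 0.358 | 3.454 | 0.515 | 5.341 | 0.295 | 4.121 | 0.509 |  |  |  |  |  |  |  |  | (5.291503, 4) |
| 50 | 50 | 1 | 5.860 | 0.274 | 3.391 | 0.397 | 5.325 | 0.229 | 4.056 | 0.391 |  |  |  |  |  |  |  |  |  |
| 50 | 100 | 1 | 5.842 | 0.185 | 3.361 | 0.283 | 5.309 | 0.159 | 4.024 | 0.278 |  |  |  |  |  |  |  |  |  |
| 100 | 30 | 1 | 6.978 | 0.404 | 3.491 | 0.543 | 6.360 | 0.342 | 4.159 | 0.536 |  |  |  |  |  |  |  |  | (6.308684, 4) |
| 100 | 50 | 1 | 6.972 | 0.315 | 3.427 | 0.400 | 6.346 | 0.268 | 4.092 | 0.394 |  |  |  |  |  |  |  |  |  |
| 100 | 100 | 1 | 6.956 | 0.214 | 3.375 | 0.283 | 6.326 | 0.177 | 4.038 | 0.279 |  |  |  |  |  |  |  |  |  |
| 30 | 30 | 2 | 6.018 | 0.467 | 2.666 | 0.423 | 5.177 | 0.346 | 3.353 | 0.411 | 4.724 | 0.329 | 4.139 | 0.392 |  |  |  |  | (4.641192, 4) |
| 30 | 50 | 2 | 5.997 | 0.343 | 2.625 | 0.314 | 5.157 | 0.269 | 3.310 | 0.306 | 4.697 | 0.262 | 4.094 | 0.291 |  |  |  |  |  |
| 30 | 100 | 2 | 5.942 | 0.252 | 2.632 | 0.210 | 5.125 | 0.195 | 3.315 | 0.204 | 4.672 | 0.185 | 4.096 | 0.194 |  |  |  |  |  |
| 50 | 30 | 2 | 6.845 | 0.536 | 2.703 | 0.439 | 5.909 | 0.397 | 3.390 | 0.428 | 5.400 | 0.375 | 4.175 | 0.409 |  |  |  |  | (5.291503, 4) |
| 50 | 50 | 2 | 6.807 | 0.400 | 2.642 | 0.324 | 5.863 | 0.304 | 3.328 | 0.316 | 5.346 | 0.292 | 4.112 | 0.300 |  |  |  |  |  |
| 50 | 100 | 2 | 6.772 | 0.264 | 2.629 | 0.212 | 5.840 | 0.204 | 3.312 | 0.205 | 5.322 | 0.198 | 4.093 | 0.195 |  |  |  |  |  |
| 100 | 30 | 2 | 8.140 | 0.644 | 2.690 | 0.427 | 7.017 | 0.494 | 3.376 | 0.416 | 6.407 | 0.472 | 4.160 | 0.396 |  |  |  |  | (6.308684, 4) |
| 100 | 50 | 2 | 8.099 | 0.456 | 2.653 | 0.315 | 6.987 | 0.351 | 3.340 | 0.307 | 6.376 | 0.339 | 4.124 | 0.293 |  |  |  |  |  |
| 100 | 100 | 2 | 8.054 | 0.321 | 2.649 | 0.216 | 6.960 | 0.246 | 3.333 | 0.210 | 6.351 | 0.236 | 4.114 | 0.199 |  |  |  |  |  |
| 30 | 30 | 3 | 7.765 | 0.768 | 1.996 | 0.267 | 6.061 | 0.532 | 2.702 | 0.257 | 5.316 | 0.486 | 3.525 | 0.239 | 4.930 | 0.464 | 4.415 | 0.224 | (4.641192, 4) |
| 30 | 50 | 3 | 7.599 | 0.566 | 2.022 | 0.210 | 5.982 | 0.403 | 2.724 | 0.203 | 5.250 | 0.374 | 3.541 | 0.191 | 4.863 | 0.361 | 4.426 | 0.180 |  |
| 30 | 100 | 3 | 7.365 | 0.376 | 2.123 | 0.153 | 5.920 | 0.276 | 2.816 | 0.149 | 5.226 | 0.257 | 3.623 | 0.141 | 4.845 | 0.250 | 4.498 | 0.133 |  |
| 50 | 30 | 3 | 8.854 | 0.888 | 1.993 | 0.285 | 6.901 | 0.620 | 2.701 | 0.276 | 6.052 | 0.567 | 3.525 | 0.260 | 5.614 | 0.543 | 4.416 | 0.245 | (5.291503, 4) |
| 50 | 50 | 3 | 8.612 | 0.642 | 2.033 | 0.215 | 6.794 | 0.454 | 2.734 | 0.208 | 5.969 | 0.418 | 3.552 | 0.195 | 5.533 | 0.402 | 4.436 | 0.184 |  |
| 50 | 100 | 3 | 8.388 | 0.430 | 2.109 | 0.150 | 6.727 | 0.308 | 2.803 | 0.146 | 5.934 | 0.283 | 3.611 | 0.137 | 5.501 | 0.272 | 4.488 | 0.129 |  |
| 100 | 30 | 3 | 10.520 | 1.056 | 1.985 | 0.261 | 8.191 | 0.712 | 2.692 | 0.253 | 7.177 | 0.650 | 3.516 | 0.237 | 6.654 | 0.625 | 4.407 | 0.223 | (6.308684, 4) |
| 100 | 50 | 3 | 10.259 | 0.760 | 2.032 | 0.211 | 8.094 | 0.544 | 2.733 | 0.204 | 7.110 | 0.499 | 3.550 | 0.191 | 6.589 | 0.478 | 4.435 | 0.180 |  |
| 100 | 100 | 3 | 9.945 | 0.519 | 2.127 | 0.159 | 8.001 | 0.382 | 2.821 | 0.154 | 7.068 | 0.355 | 3.628 | 0.145 | 6.557 | 0.342 | 4.504 | 0.136 |  |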
Table A2. Mean values (mean.br, mean.dr) and standard deviations (sd.br, sd.dr) of the estimators $\hat{\beta}_r$ and $\hat{\delta}_r$, $r = 0, 1, 2, 3$, when 10,000 $r_0$-size-biased samples ($r_0 = 1, 2, 3$, $r_0 \ge r$) of size $n$ are selected from a population of $N$ = 10,000 maxima obtained from respective random samples of size $k$ from an $LL(4,3)$.
| k | n | r0 | mean.d0 | sd.d0 | mean.b0 | sd.b0 | mean.d1 | sd.d1 | mean.b1 | sd.b1 | mean.d2 | sd.d2 | mean.b2 | sd.b2 | mean.d3 | sd.d3 | mean.b3 | sd.b3 | (δ_k, β) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 30 | 30 | 1 | 7.730 | 0.457 | 3.485 | 0.547 | 7.042 | 0.389 | 4.152 | 0.540 |  |  |  |  |  |  |  |  | (6.961787, 4) |
| 30 | 50 | 1 | 7.728 | 0.370 | 3.394 | 0.417 | 7.020 | 0.313 | 4.057 | 0.413 |  |  |  |  |  |  |  |  |  |
| 30 | 100 | 1 | 7.708 | 0.251 | 3.341 | 0.282 | 6.996 | 0.217 | 4.003 | 0.278 |  |  |  |  |  |  |  |  |  |
| 50 | 30 | 1 | 8.795 | 0.519 | 3.506 | 0.550 | 8.018 | 0.439 | 4.171 | 0.543 |  |  |  |  |  |  |  |  | (7.937254, 4) |
| 50 | 50 | 1 | 8.779 | 0.383 | 3.399 | 0.392 | 7.982 | 0.327 | 4.065 | 0.388 |  |  |  |  |  |  |  |  |  |
| 50 | 100 | 1 | 8.773 | 0.281 | 3.357 | 0.269 | 7.970 | 0.239 | 4.020 | 0.266 |  |  |  |  |  |  |  |  |  |
| 100 | 30 | 1 | 10.468 | 0.616 | 3.482 | 0.555 | 9.534 | 0.521 | 4.150 | 0.548 |  |  |  |  |  |  |  |  | (9.463026, 4) |
| 100 | 50 | 1 | 10.476 | 0.474 | 3.419 | 0.419 | 9.530 | 0.389 | 4.083 | 0.414 |  |  |  |  |  |  |  |  |  |
| 100 | 100 | 1 | 10.439 | 0.333 | 3.377 | 0.269 | 9.496 | 0.282 | 4.040 | 0.266 |  |  |  |  |  |  |  |  |  |
| 30 | 30 | 2 | 9.007 | 0.671 | 2.676 | 0.432 | 7.755 | 0.501 | 3.364 | 0.420 | 7.079 | 0.482 | 4.150 | 0.399 |  |  |  |  | (6.961787, 4) |
| 30 | 50 | 2 | 8.981 | 0.496 | 2.634 | 0.314 | 7.731 | 0.383 | 3.320 | 0.305 | 7.046 | 0.373 | 4.103 | 0.291 |  |  |  |  |  |
| 30 | 100 | 2 | 8.938 | 0.369 | 2.630 | 0.221 | 7.705 | 0.267 | 3.313 | 0.215 | 7.021 | 0.252 | 4.094 | 0.205 |  |  |  |  |  |
| 50 | 30 | 2 | 10.353 | 0.816 | 2.643 | 0.420 | 8.886 | 0.608 | 3.331 | 0.408 | 8.098 | 0.577 | 4.117 | 0.387 |  |  |  |  | (7.937254, 4) |
| 50 | 50 | 2 | 10.215 | 0.593 | 2.626 | 0.313 | 8.786 | 0.453 | 3.313 | 0.304 | 8.008 | 0.437 | 4.098 | 0.289 |  |  |  |  |  |
| 50 | 100 | 2 | 10.154 | 0.402 | 2.644 | 0.219 | 8.769 | 0.308 | 3.328 | 0.214 | 7.997 | 0.300 | 4.108 | 0.204 |  |  |  |  |  |
| 100 | 30 | 2 | 12.227 | 0.935 | 2.683 | 0.408 | 10.546 | 0.721 | 3.373 | 0.398 | 9.636 | 0.688 | 4.159 | 0.380 |  |  |  |  | (9.463026, 4) |
| 100 | 50 | 2 | 12.158 | 0.707 | 2.659 | 0.307 | 10.494 | 0.542 | 3.346 | 0.299 | 9.580 | 0.517 | 4.130 | 0.285 |  |  |  |  |  |
| 100 | 100 | 2 | 12.066 | 0.487 | 2.656 | 0.223 | 10.431 | 0.373 | 3.339 | 0.217 | 9.518 | 0.358 | 4.119 | 0.206 |  |  |  |  |  |
| 30 | 30 | 3 | 11.679 | 1.224 | 1.988 | 0.299 | 9.078 | 0.841 | 2.695 | 0.288 | 7.954 | 0.775 | 3.519 | 0.269 | 7.377 | 0.744 | 4.410 | 0.251 | (6.961787, 4) |
| 30 | 50 | 3 | 11.379 | 0.891 | 2.026 | 0.218 | 8.959 | 0.613 | 2.727 | 0.211 | 7.863 | 0.557 | 3.544 | 0.198 | 7.284 | 0.534 | 4.429 | 0.186 |  |
| 30 | 100 | 3 | 11.011 | 0.564 | 2.114 | 0.154 | 8.840 | 0.425 | 2.808 | 0.149 | 7.800 | 0.397 | 3.616 | 0.141 | 7.230 | 0.386 | 4.491 | 0.133 |  |
| 50 | 30 | 3 | 13.265 | 1.355 | 1.998 | 0.298 | 10.344 | 0.955 | 2.705 | 0.289 | 9.075 | 0.887 | 3.530 | 0.272 | 8.422 | 0.855 | 4.420 | 0.256 | (7.937254, 4) |
| 50 | 50 | 3 | 12.901 | 0.983 | 2.028 | 0.221 | 10.166 | 0.689 | 2.731 | 0.215 | 8.931 | 0.636 | 3.549 | 0.202 | 8.281 | 0.615 | 4.435 | 0.191 |  |
| 50 | 100 | 3 | 12.508 | 0.647 | 2.122 | 0.152 | 10.053 | 0.462 | 2.815 | 0.148 | 8.874 | 0.427 | 3.622 | 0.140 | 8.228 | 0.414 | 4.498 | 0.133 |  |
| 100 | 30 | 3 | 15.726 | 1.572 | 2.009 | 0.292 | 12.295 | 1.084 | 2.716 | 0.283 | 10.795 | 0.992 | 3.540 | 0.265 | 10.021 | 0.950 | 4.429 | 0.249 | (9.463026, 4) |
| 100 | 50 | 3 | 15.497 | 1.283 | 2.031 | 0.217 | 12.216 | 0.882 | 2.734 | 0.212 | 10.732 | 0.799 | 3.551 | 0.199 | 9.949 | 0.766 | 4.437 | 0.188 |  |
| 100 | 100 | 3 | 14.905 | 0.772 | 2.126 | 0.146 | 11.993 | 0.555 | 2.820 | 0.143 | 10.592 | 0.508 | 3.627 | 0.135 | 9.824 | 0.490 | 4.503 | 0.128 |  |
Table A3. New mean values (mean.br, mean.dr) and standard deviations (sd.br, sd.dr) of the estimators $\hat{\beta}_r$ and $\hat{\delta}_r$, $r = 0, 1, 2, 3$, when 10,000 $r_0$-size-biased samples ($r_0 = 1, 2, 3$) of sample size $n$, with $r_0 \geq r$, are selected from a population of $N$ = 10,000 maxima obtained from respective random samples of size $k$ from an $LL(5, 2)$ distribution.
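| $k$ | $n$ | $r_0$ | mean.d0 | sd.d0 | mean.b0 | sd.b0 | mean.d1 | sd.d1 | mean.b1 | sd.b1 | mean.d2 | sd.d2 | mean.b2 | sd.b2 | mean.d3 | sd.d3 | mean.b3 | sd.b3 | $(\delta_k, \beta)$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 30 | 30 | 1 | 4.180 | 0.187 | 4.523 | 0.684 | 3.946 | 0.166 | 5.175 | 0.678 | | | | | | | | | (3.922018, 5) |
| 30 | 50 | | 4.181 | 0.146 | 4.379 | 0.495 | 3.939 | 0.129 | 5.032 | 0.491 | | | | | | | | | |
| 30 | 100 | | 4.173 | 0.103 | 4.342 | 0.351 | 3.931 | 0.092 | 4.992 | 0.348 | | | | | | | | | |
| 50 | 30 | | 4.648 | 0.211 | 4.533 | 0.709 | 4.388 | 0.187 | 5.189 | 0.703 | | | | | | | | | (4.355813, 5) |
| 50 | 50 | | 4.635 | 0.161 | 4.426 | 0.529 | 4.371 | 0.140 | 5.080 | 0.525 | | | | | | | | | |
| 50 | 100 | | 4.629 | 0.110 | 4.360 | 0.346 | 4.363 | 0.098 | 5.012 | 0.344 | | | | | | | | | |
| 100 | 30 | | 5.352 | 0.245 | 4.522 | 0.680 | 5.053 | 0.214 | 5.178 | 0.676 | | | | | | | | | (5.013685, 5) |
| 100 | 50 | | 5.332 | 0.182 | 4.475 | 0.508 | 5.036 | 0.161 | 5.127 | 0.503 | | | | | | | | | |
| 100 | 100 | | 5.317 | 0.126 | 4.401 | 0.364 | 5.016 | 0.110 | 5.051 | 0.359 | | | | | | | | | |
| 30 | 30 | 2 | 4.564 | 0.252 | 3.742 | 0.601 | 4.206 | 0.213 | 4.410 | 0.591 | 3.971 | 0.207 | 5.155 | 0.573 | | | | | (3.922018, 5) |
| 30 | 50 | | 4.546 | 0.186 | 3.665 | 0.429 | 4.185 | 0.153 | 4.330 | 0.423 | 3.946 | 0.148 | 5.073 | 0.411 | | | | | |
| 30 | 100 | | 4.529 | 0.131 | 3.638 | 0.295 | 4.170 | 0.107 | 4.301 | 0.291 | 3.932 | 0.103 | 5.042 | 0.283 | | | | | |
| 50 | 30 | | 5.050 | 0.263 | 3.738 | 0.596 | 4.653 | 0.222 | 4.408 | 0.587 | 4.394 | 0.220 | 5.155 | 0.570 | | | | | (4.355813, 5) |
| 50 | 50 | | 5.046 | 0.205 | 3.691 | 0.445 | 4.650 | 0.172 | 4.358 | 0.437 | 4.389 | 0.168 | 5.103 | 0.424 | | | | | |
| 50 | 100 | | 5.025 | 0.143 | 3.668 | 0.307 | 4.633 | 0.118 | 4.332 | 0.302 | 4.371 | 0.115 | 5.074 | 0.293 | | | | | |
| 100 | 30 | | 5.808 | 0.328 | 3.724 | 0.580 | 5.348 | 0.265 | 4.392 | 0.571 | 5.047 | 0.253 | 5.138 | 0.554 | | | | | (5.013685, 5) |
| 100 | 50 | | 5.783 | 0.240 | 3.708 | 0.445 | 5.334 | 0.197 | 4.376 | 0.439 | 5.036 | 0.191 | 5.120 | 0.426 | | | | | |
| 100 | 100 | | 5.782 | 0.167 | 3.676 | 0.304 | 5.334 | 0.140 | 4.341 | 0.300 | 5.035 | 0.136 | 5.083 | 0.292 | | | | | |
| 30 | 30 | 3 | 5.164 | 0.353 | 2.965 | 0.472 | 4.559 | 0.268 | 3.650 | 0.462 | 4.211 | 0.256 | 4.425 | 0.444 | 3.998 | 0.254 | 5.265 | 0.423 | (3.922018, 5) |
| 30 | 50 | | 5.159 | 0.267 | 2.928 | 0.349 | 4.557 | 0.209 | 3.612 | 0.342 | 4.208 | 0.202 | 4.387 | 0.328 | 3.991 | 0.201 | 5.227 | 0.313 | |
| 30 | 100 | | 5.096 | 0.190 | 2.981 | 0.241 | 4.527 | 0.147 | 3.659 | 0.235 | 4.186 | 0.138 | 4.427 | 0.225 | 3.971 | 0.136 | 5.259 | 0.214 | |
| 50 | 30 | | 5.759 | 0.403 | 2.961 | 0.449 | 5.087 | 0.311 | 3.647 | 0.439 | 4.700 | 0.296 | 4.424 | 0.421 | 4.463 | 0.291 | 5.265 | 0.401 | (4.355813, 5) |
| 50 | 50 | | 5.692 | 0.306 | 2.952 | 0.353 | 5.036 | 0.233 | 3.634 | 0.345 | 4.652 | 0.221 | 4.407 | 0.331 | 4.414 | 0.218 | 5.245 | 0.315 | |
| 50 | 100 | | 5.637 | 0.199 | 2.999 | 0.248 | 5.014 | 0.155 | 3.678 | 0.243 | 4.640 | 0.148 | 4.445 | 0.233 | 4.404 | 0.148 | 5.277 | 0.222 | |
| 100 | 30 | | 6.581 | 0.445 | 2.968 | 0.446 | 5.817 | 0.343 | 3.654 | 0.436 | 5.376 | 0.327 | 4.430 | 0.418 | 5.105 | 0.322 | 5.270 | 0.399 | (5.013685, 5) |
| 100 | 50 | | 6.577 | 0.347 | 2.918 | 0.334 | 5.806 | 0.268 | 3.601 | 0.326 | 5.357 | 0.253 | 4.375 | 0.312 | 5.079 | 0.249 | 5.214 | 0.297 | |
| 100 | 100 | | 6.497 | 0.229 | 2.989 | 0.233 | 5.776 | 0.184 | 3.668 | 0.227 | 5.344 | 0.177 | 4.437 | 0.217 | 5.072 | 0.175 | 5.270 | 0.207 | |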
Table A4. New mean values (mean.br, mean.dr) and standard deviations (sd.br, sd.dr) of the estimators $\hat{\beta}_r$ and $\hat{\delta}_r$, $r = 0, 1, 2, 3$, when 10,000 $r_0$-size-biased samples ($r_0 = 1, 2, 3$) of sample size $n$, with $r_0 \geq r$, are selected from a population of $N$ = 10,000 maxima obtained from respective random samples of size $k$ from an $LL(5, 3)$ distribution.
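| $k$ | $n$ | $r_0$ | mean.d0 | sd.d0 | mean.b0 | sd.b0 | mean.d1 | sd.d1 | mean.b1 | sd.b1 | mean.d2 | sd.d2 | mean.b2 | sd.b2 | mean.d3 | sd.d3 | mean.b3 | sd.b3 | $(\delta_k, \beta)$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 30 | 30 | 1 | 6.269 | 0.286 | 4.523 | 0.707 | 5.917 | 0.253 | 5.176 | 0.702 | | | | | | | | | (5.883027, 5) |
| 30 | 50 | | 6.275 | 0.228 | 4.386 | 0.491 | 5.913 | 0.201 | 5.037 | 0.486 | | | | | | | | | |
| 30 | 100 | | 6.265 | 0.149 | 4.346 | 0.360 | 5.902 | 0.133 | 4.995 | 0.358 | | | | | | | | | |
| 50 | 30 | | 6.978 | 0.314 | 4.528 | 0.691 | 6.588 | 0.277 | 5.183 | 0.686 | | | | | | | | | (6.533719, 5) |
| 50 | 50 | | 6.962 | 0.239 | 4.439 | 0.496 | 6.569 | 0.212 | 5.091 | 0.492 | | | | | | | | | |
| 50 | 100 | | 6.943 | 0.171 | 4.392 | 0.373 | 6.549 | 0.151 | 5.043 | 0.371 | | | | | | | | | |
| 100 | 30 | | 8.011 | 0.361 | 4.543 | 0.701 | 7.567 | 0.317 | 5.199 | 0.695 | | | | | | | | | (7.520527, 5) |
| 100 | 50 | | 7.999 | 0.268 | 4.474 | 0.534 | 7.554 | 0.239 | 5.128 | 0.531 | | | | | | | | | |
| 100 | 100 | | 7.983 | 0.194 | 4.404 | 0.370 | 7.532 | 0.171 | 5.055 | 0.367 | | | | | | | | | |
| 30 | 30 | 2 | 6.846 | 0.373 | 3.729 | 0.594 | 6.304 | 0.306 | 4.398 | 0.587 | 5.949 | 0.300 | 5.143 | 0.571 | | | | | (5.883027, 5) |
| 30 | 50 | | 6.837 | 0.285 | 3.663 | 0.422 | 6.294 | 0.237 | 4.328 | 0.416 | 5.934 | 0.231 | 5.071 | 0.404 | | | | | |
| 30 | 100 | | 6.800 | 0.195 | 3.639 | 0.297 | 6.263 | 0.166 | 4.303 | 0.293 | 5.904 | 0.164 | 5.043 | 0.284 | | | | | |
| 50 | 30 | | 7.573 | 0.411 | 3.778 | 0.614 | 6.988 | 0.342 | 4.447 | 0.607 | 6.602 | 0.333 | 5.191 | 0.591 | | | | | (6.533719, 5) |
| 50 | 50 | | 7.553 | 0.315 | 3.710 | 0.454 | 6.966 | 0.267 | 4.377 | 0.447 | 6.576 | 0.264 | 5.120 | 0.434 | | | | | |
| 50 | 100 | | 7.538 | 0.221 | 3.646 | 0.300 | 6.945 | 0.183 | 4.311 | 0.295 | 6.549 | 0.178 | 5.053 | 0.286 | | | | | |
| 100 | 30 | | 8.694 | 0.466 | 3.752 | 0.608 | 8.015 | 0.379 | 4.421 | 0.600 | 7.569 | 0.366 | 5.167 | 0.585 | | | | | (7.520527, 5) |
| 100 | 50 | | 8.708 | 0.349 | 3.667 | 0.415 | 8.021 | 0.296 | 4.335 | 0.408 | 7.567 | 0.290 | 5.080 | 0.395 | | | | | |
| 100 | 100 | | 8.674 | 0.250 | 3.666 | 0.311 | 7.998 | 0.206 | 4.332 | 0.306 | 7.547 | 0.200 | 5.074 | 0.296 | | | | | |
| 30 | 30 | 3 | 7.799 | 0.570 | 2.944 | 0.459 | 6.876 | 0.431 | 3.629 | 0.450 | 6.347 | 0.403 | 4.405 | 0.432 | 6.022 | 0.395 | 5.246 | 0.413 | (5.883027, 5) |
| 30 | 50 | | 7.709 | 0.390 | 2.945 | 0.332 | 6.821 | 0.302 | 3.628 | 0.324 | 6.301 | 0.290 | 4.401 | 0.310 | 5.978 | 0.288 | 5.239 | 0.294 | |
| 30 | 100 | | 7.641 | 0.266 | 2.986 | 0.239 | 6.790 | 0.211 | 3.664 | 0.234 | 6.279 | 0.203 | 4.430 | 0.225 | 5.956 | 0.203 | 5.262 | 0.215 | |
| 50 | 30 | | 8.619 | 0.605 | 2.939 | 0.457 | 7.600 | 0.464 | 3.625 | 0.447 | 7.016 | 0.439 | 4.403 | 0.428 | 6.659 | 0.433 | 5.245 | 0.409 | (6.533719, 5) |
| 50 | 50 | | 8.537 | 0.441 | 2.938 | 0.330 | 7.549 | 0.335 | 3.621 | 0.322 | 6.972 | 0.315 | 4.395 | 0.308 | 6.614 | 0.309 | 5.233 | 0.294 | |
| 50 | 100 | | 8.452 | 0.314 | 3.000 | 0.244 | 7.518 | 0.245 | 3.678 | 0.240 | 6.957 | 0.233 | 4.445 | 0.231 | 6.601 | 0.230 | 5.277 | 0.221 | |
| 100 | 30 | | 9.877 | 0.650 | 2.953 | 0.448 | 8.722 | 0.504 | 3.640 | 0.438 | 8.058 | 0.484 | 4.417 | 0.420 | 7.651 | 0.479 | 5.259 | 0.400 | (7.520527, 5) |
| 100 | 50 | | 9.809 | 0.535 | 2.956 | 0.349 | 8.682 | 0.400 | 3.638 | 0.342 | 8.021 | 0.373 | 4.411 | 0.328 | 7.611 | 0.366 | 5.248 | 0.313 | |
| 100 | 100 | | 9.723 | 0.347 | 2.981 | 0.242 | 8.638 | 0.265 | 3.661 | 0.237 | 7.990 | 0.250 | 4.429 | 0.227 | 7.581 | 0.246 | 5.262 | 0.216 | |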
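The sampling scheme described in the captions of Tables A3 and A4 can be illustrated with a short sketch. It builds a population of $N$ maxima, each the maximum of $k$ draws from a log-logistic $LL(a, b)$ distribution, and then selects an $r$-size-biased sample of size $n$ from that population. Several points are assumptions made only for illustration: the function names are hypothetical, the log-logistic draws use inverse-transform sampling with $F(x) = x^a / (b^a + x^a)$, and the size-biased selection is taken to be without replacement with weights $x^r$, implemented through Efraimidis-Spirakis keys. The sketch does not compute the estimators $\hat{\beta}_r$ and $\hat{\delta}_r$ reported in the tables.

```python
import math
import random


def population_of_maxima(N, k, a, b, rng):
    """Return N maxima, each the maximum of k inverse-transform draws from LL(a, b)."""
    def draw():
        u = rng.random()  # u in [0, 1)
        return b * (u / (1.0 - u)) ** (1.0 / a)
    return [max(draw() for _ in range(k)) for _ in range(N)]


def size_biased_sample(population, n, r, rng):
    """Select n units without replacement with weights x**r.

    Uses Efraimidis-Spirakis keys u**(1/w), compared on the log scale;
    r = 0 reduces to simple random sampling without replacement.
    """
    keyed = []
    for x in population:
        u = 1.0 - rng.random()  # u in (0, 1], so log(u) is finite
        keyed.append((math.log(u) / (x ** r), x))
    keyed.sort(reverse=True)  # largest keys are selected first
    return [x for _, x in keyed[:n]]


if __name__ == "__main__":
    rng = random.Random(2024)  # arbitrary seed, for reproducibility only
    pop = population_of_maxima(N=10_000, k=30, a=5.0, b=2.0, rng=rng)
    print("population mean:", round(sum(pop) / len(pop), 3))
    for r0 in (0, 1, 2, 3):
        sample = size_biased_sample(pop, n=100, r=r0, rng=rng)
        print("r0 =", r0, "sample mean:", round(sum(sample) / len(sample), 3))
```

Under these assumptions the sample mean tends to grow with $r_0$, since larger maxima are more likely to be selected. This matches the pattern in the tables, where the estimates obtained with $r = 0$ (ignoring the bias) lie further from $(\delta_k, \beta)$ as $r_0$ increases.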

References

  1. Simon, R. Length biased sampling in etiologic studies. Am. J. Epidemiol. 1980, 111, 444–452.
  2. Zelen, M.; Feinleib, M. On the theory of screening for chronic diseases. Biometrika 1969, 56, 601–614.
  3. Liu, J.; Wang, L.; Tripathi, Y.; Lio, Y. Inference of Constant-Stress Model of Fréchet Distribution under a Maximum Ranked Set Sampling with Unequal Samples. Axioms 2024, 13, 394.
  4. Patil, G.P.; Rao, C.R. Weighted distributions and size-biased sampling with applications to wildlife populations and human families. Biometrics 1978, 34, 179–189.
  5. Mudasir, S.; Ahmad, S. Parameter Estimation of the Weighted Generalized Inverse Weibull Distribution. J. Stat. Theory Appl. 2021, 20, 395–406.
  6. Fisher, R.A. The effect of methods of ascertainment upon the estimation of frequencies. Ann. Eugen. 1934, 6, 13–25.
  7. Rao, C.R. On discrete distributions arising out of methods of ascertainment. Sankhyā Indian J. Stat. Ser. A 1965, 27, 311–324.
  8. Tzavelas, G.; Economou, P. On the consequences of model misspecification for biased samples from the Weibull distribution. In Proceedings of the Statistics and Simulation: IWS 8, Vienna, Austria, 8 September 2015; Springer: Berlin/Heidelberg, Germany, 2018; pp. 357–369.
  9. Tzavelas, G.; Douli, M.; Economou, P. Model misspecification effects for biased samples. Metrika 2017, 80, 171–185.
  10. Economou, P.; Tzavelas, G.; Batsidis, A. Robust inference under r-size-biased sampling without replacement from finite population. J. Appl. Stat. 2020, 47, 2808–2824.
  11. Capaldi, A.; Kolba, T.N. Using the sample maximum to estimate the parameters of the underlying distribution. PLoS ONE 2019, 14, e0215529.
  12. Gnedenko, B. Sur la distribution limite du terme maximum d’une serie aleatoire. Ann. Math. 1943, 44, 423–453.
  13. Gumbel, E.J. Statistics of Extremes; Columbia University Press: New York, NY, USA, 1958.
  14. Beisel, C.J.; Rokyta, D.R.; Wichman, H.A.; Joyce, P. Testing the extreme value domain of attraction for distributions of beneficial fitness effects. Genetics 2007, 176, 2441–2449.
  15. Tzavelas, G.; Batsidis, A.; Economou, P. Size Biased Fréchet Distribution: Properties and Statistical Inference. J. Stat. Theory Appl. 2024.
  16. Alzeley, O.; Almetwally, E.M.; Gemeay, A.M.; Alshanbari, H.M.; Hafez, E.H.; Abu-Moussa, M.H. Statistical Inference under Censored Data for the New Exponential-X Fréchet Distribution: Simulation and Application to Leukemia Data. Comput. Intell. Neurosci. 2021, 2021, 2167670.
  17. Phaphan, W.; Ibrahim, A.; Wirawan, P. Properties and Maximum Likelihood Estimation of the Novel Mixture of Fréchet Distribution. Symmetry 2023, 15, 1380.
  18. Tzavelas, G.; Economou, P. Extreme value distributions for biased samples. Probab. Eng. Inf. Sci. 2015, 29, 277–290.
  19. Embrechts, P.; Klüppelberg, C.; Mikosch, T. Modelling Extremal Events: For Insurance and Finance; Springer Science & Business Media: Heidelberg, Germany, 2013.
  20. Tzavelas, G.; Panagiotakos, D. Statistical inference for the size-biased Weibull distribution. J. Stat. Comput. Simul. 2013, 83, 1252–1265.
  21. Jain, K.; Singh, H.; Bagai, I. Relations for reliability measures of weighted distributions. Commun. Stat.-Theory Methods 1989, 18, 4393–4412.
  22. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020.
  23. Al-Shomrani, A.A.; Shawky, A.; Arif, O.H.; Aslam, M. Log-logistic distribution for survival data analysis using MCMC. SpringerPlus 2016, 5, 1–16.
  24. Ashkar, F.; Mahdi, S. Fitting the log-logistic distribution by generalized moments. J. Hydrol. 2006, 328, 694–703.
  25. Kleiber, C.; Kotz, S. Statistical Size Distributions in Economics and Actuarial Sciences; John Wiley & Sons: Hoboken, NJ, USA, 2003.
Table 1. Positive and continuous distributions that belong to the Fréchet domain, together with the associated parameters $\beta$ and $\delta_k$ of the approximating Fréchet distribution.
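| Distribution | $1 - F(x)$ | $\beta$ | $\delta_k$ |
| --- | --- | --- | --- |
| $GP(\sigma, \gamma)$ | $\left(1 + \frac{\gamma x}{\sigma}\right)^{-1/\gamma}$, $x > 0$, $\sigma, \gamma > 0$ | $\frac{1}{\gamma}$ | $\frac{\sigma}{\gamma}\left(k^{\gamma} - 1\right)$ |
| $Burr(\eta, \tau, \lambda)$ (type XII) | $\left(\frac{\eta}{\eta + x^{\tau}}\right)^{\lambda}$, $x > 0$, $\eta, \tau, \lambda > 0$ | $\lambda\tau$ | $\eta^{1/\tau}\left(k^{1/\lambda} - 1\right)^{1/\tau}$ |
| $LL(a, b)$ | $\frac{b^{a}}{b^{a} + x^{a}}$, $x > 0$, $a, b > 0$ | $a$ | $b\,(k - 1)^{1/a}$ |
| $Burr(\eta, \tau, \lambda)$ (type III) | $1 - \left(\frac{\eta}{\eta + x^{-\tau}}\right)^{\lambda}$, $x > 0$, $\eta, \tau, \lambda > 0$ | $\tau$ | $\eta^{-1/\tau}(k - 1)^{1/\tau}$ |
| $Inv\Gamma(\lambda, a)$ | $\int_{x}^{\infty} \frac{\lambda^{a}}{\Gamma(a)} \exp(-\lambda/w)\, w^{-a-1}\, dw$, $x > 0$, $\lambda, a > 0$ | $a$ | $\frac{\lambda}{\Gamma^{-1}\left(a,\ \Gamma(a)\left(1 - k^{-1}\right)\right)}$ |
| Fréchet $(a)$ | $1 - \exp\left(-x^{-a}\right)$, $x > 0$, $a > 0$ | $a$ | $\left(-\log\frac{k - 1}{k}\right)^{-1/a}$ |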
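As a quick numerical check of the log-logistic row of Table 1, the sketch below evaluates $\delta_k = b\,(k-1)^{1/a}$ and compares it with a direct numerical solution of $1 - F(\delta_k) = 1/k$. Reading $\delta_k$ as the $(1 - 1/k)$-quantile of the parent distribution is an assumption here, although it agrees with the $(\delta_k, \beta)$ values listed in Tables A3 and A4. The function names and the bisection bounds are illustrative choices.

```python
def ll_survival(x, a, b):
    """Survival function 1 - F(x) of LL(a, b), as read from Table 1."""
    return b ** a / (b ** a + x ** a)


def delta_k_closed_form(k, a, b):
    """Closed form delta_k = b * (k - 1)**(1/a) for the log-logistic case."""
    return b * (k - 1) ** (1.0 / a)


def delta_k_numeric(k, survival, lo=1e-9, hi=1e9):
    """Solve survival(x) = 1/k by bisection; survival must be decreasing in x."""
    target = 1.0 / k
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if survival(mid) > target:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for b in (2.0, 3.0):
        for k in (30, 50, 100):
            closed = delta_k_closed_form(k, 5.0, b)
            numeric = delta_k_numeric(k, lambda x: ll_survival(x, 5.0, b))
            print("LL(5, %g), k = %d: closed form %.6f, bisection %.6f"
                  % (b, k, closed, numeric))
```

The same `delta_k_numeric` helper could be applied to any other survival function in the table, which is convenient where no simple closed form for the quantile is available.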