1. Introduction
Auxiliary variables are crucial tools for estimating population parameters. These variables are distinct from, but closely related to, the variable of interest, and they offer a dependable way to improve the consistency and validity of statistical estimates. Survey sampling theory makes their importance clear, especially for maintaining symmetry in the sampling process. Because collecting data from an entire population is often impractical or difficult, researchers draw a sample (a deliberately selected smaller subset) and aim for symmetry in representation. The goal is to extrapolate findings from this sample to the entire population, using auxiliary variables that promote symmetry in the estimate. Using auxiliary variables at both the sampling and the estimation stages is a powerful way to improve estimation accuracy while maintaining symmetry in the representation of population features. Previous research has addressed a variety of population parameters, including the mean, median, total, and distribution function, each of which requires auxiliary data in addition to the variable of interest. In stratified sampling, a distribution function estimator estimates the cumulative distribution function (CDF) within each stratum. These stratum-level estimates are then combined into a single estimate for the entire population, ensuring symmetry throughout the estimation process.
Over the years, numerous scholars have delved into different facets of estimators in stratified random sampling (St RS), enriching our understanding and refining the methodologies in this critical domain. The author of [1] addressed the credibility of the approximate formula for computing variance, while [2] discussed estimation methods and the dual property of ratio estimation. Eventually, ref. [3] focused on techniques in post-stratification, and later, ref. [4] demonstrated methods to improve ratio and regression estimators. Further, ref. [5] explored the characteristics of estimators for finite population distribution functions. Later, ref. [6] proposed a Bayesian model-based theory for post-stratification, and [7] presented calibration estimators using auxiliary data. Further, refs. [8,9] also contributed with their estimators for distribution functions in post-stratification.
Later, ref. [10] extended estimators with readily accessible supporting variables. An efficient ratio estimator for stratified sampling was introduced by [11]. Further, ref. [12] presented a family of estimators for the population mean, which was validated empirically. Later, ref. [13] proposed superior exponential ratio estimators. Further, ref. [14] derived a diligent ratio and product estimator, which outperformed others. Thereafter, ref. [15] devised exponential ratio estimators based on supporting variables. Additionally, ref. [16] suggested reliable ratio and difference estimators for population distributions. Researchers have made significant advancements in sampling estimation. Later, ref. [17] improved exponential estimators for post-stratification. Ref. [18] introduced superior estimators for SRS and stratified sampling with two supporting variables. Following that, ref. [19] enhanced the difference cum exponential estimator. Thereafter, ref. [20] proposed efficient estimators. Ref. [21] suggested a ratio estimator for post-stratification. Subsequently, ref. [22] improved the generalized population mean estimator, while [23] developed estimators for the finite population mean in SRS. Later, ref. [24] presented a two-parameter ratio product ratio estimator. Afterward, ref. [25] suggested an innovative family of exponential estimators using supporting attributes and actual data sets.
In recent times, several authors have focused on distribution function estimators using supporting variables. Ref. [26] proposed finite population distribution function estimators, which outperformed others in simple random sampling (SRS) and stratified sampling. Following that, ref. [27] introduced imputation methods for calculating the population mean in two-occasion successive sampling. Eventually, ref. [28] recommended exponential-type estimators for the finite population mean, demonstrating their superiority with four data sets. Thereafter, ref. [29] devised estimators for the population mean, including efficient combined and separate estimators in stratified sampling. Additionally, ref. [30] proposed a ratio estimator and showed it to be the most effective through empirical and simulation studies. Afterward, ref. [31] developed an estimator for the population distribution function and demonstrated its efficiency with a simulation study. Further, ref. [32] discussed the ratio estimator in stratified sampling and demonstrated its efficiency with empirical studies. Later, ref. [33] suggested robust-type estimators for population variance, outperforming existing methods in simple and St RS. A hybrid estimator for the population mean was proposed by [34], showing superior efficiency through empirical and simulated experiments. Later, ref. [35] proposed a log-type estimator in stratified ranked set sampling. A new approach to the mean estimators in ranked set sampling was introduced by [36].
The literature on the estimation of CDFs is notably sparse, which leaves a significant gap in research. In response, this article proposes two distinct classes of estimators that use auxiliary variable information to estimate the CDF of the variable under study. By leveraging auxiliary variables, the proposed estimators aim to fill this gap and to provide improved methods for estimating CDFs.
The paper is organized as follows. Section 2 introduces the key terms and notation needed for the subsequent discussion. The literature review covers existing estimators in both stratified and post-stratified sampling, laying the groundwork for the methodology section, where novel estimators for each method are proposed. The theoretical framework provides a theoretical underpinning for both sampling techniques, while Section 6 details the implementation and outcomes of the empirical investigations conducted for each method. Section 7 brings together the findings from both empirical studies and analyzes the proposed estimators and their implications. Finally, Section 8 summarizes the study’s key findings and their significance for future research and practical applications.
2. Background and Notations
2.1. Notations in Stratified Random Sampling
To evaluate the finite population distribution function, consider a finite population of $N$ distinct units divided into $k$ homogeneous strata, where $N_h$ is the size of the $h$-th stratum such that $\sum_{h=1}^{k} N_h = N$. A sample of size $n_h$ is taken from the $h$-th stratum by SRS without replacement.
Let and be the population distribution function of the variables (study variable) and (auxiliary variable) under St RS, respectively. Let and be the sample distribution functions of the variables and respectively.
where and and sample size.
$W_h = N_h/N$ denotes the stratum weight of the $h$-th stratum, where $N_h$ is the stratum size and $N$ is the population size.
and represents population and sample distribution functions of for the stratum and is the indicator variable of .
and represents the population and sample distribution functions of for the stratum and is the indicator variable of .
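As an illustrative sketch of the notation above (not the estimator class proposed in this paper), the usual stratified estimator of the distribution function weights each stratum's empirical CDF by its stratum weight. The function name, the argument names, and the toy data below are assumptions introduced here for illustration only.

```python
def stratified_cdf(samples, stratum_sizes, t):
    """Weighted sum of per-stratum empirical CDFs evaluated at t.

    samples       : list of lists, one list of sampled y-values per stratum
    stratum_sizes : list of population stratum sizes N_h (assumed known)
    t             : point at which the distribution function is evaluated
    """
    if len(samples) != len(stratum_sizes):
        raise ValueError("need one sample and one size per stratum")
    total = float(sum(stratum_sizes))
    estimate = 0.0
    for ys, size in zip(samples, stratum_sizes):
        if not ys:
            raise ValueError("every stratum needs at least one sampled unit")
        weight = size / total                      # stratum weight N_h / N
        # Empirical CDF in the stratum: mean of the indicator I(y <= t)
        f_h = sum(1 for y in ys if y <= t) / len(ys)
        estimate += weight * f_h
    return estimate


# Hypothetical toy data: two strata with population sizes 60 and 40
toy_samples = [[1.0, 2.0, 3.0, 4.0], [2.0, 6.0]]
toy_sizes = [60, 40]
print(stratified_cdf(toy_samples, toy_sizes, t=3.0))
```

The sketch uses only the study variable; estimators that exploit the auxiliary variable adjust this baseline with the corresponding quantities computed for the auxiliary variable.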
Here, we consider error terms for finding bias and MSE of the estimators.
Let ,
and
,
where
2.2. Notations in Post-Stratification
Post-stratification in survey sampling addresses missing crucial attributes by dividing the population into subgroups based on known auxiliary variables. Survey weights are adjusted to account for variations in the distribution of these variables, mitigating biases from nonresponse and small sample sizes. By using CDF estimators, researchers achieve more precise distribution estimates within specific subgroups, enhancing the understanding of the study variable’s characteristics.
In post-stratification, the traditional unbiased estimator of the population distribution function is referred to as
where
is the post-stratified empirical distribution function at
is the number of post-strata.
is the distribution function of for the stratum.
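A minimal sketch of the post-stratified empirical distribution function, assuming the post-stratum weights are known from auxiliary information and the sampled units are assigned to post-strata only after selection. The function name, the label scheme, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def post_stratified_cdf(y_values, labels, weights, t):
    """Post-stratified estimate of the distribution function at t.

    y_values : sampled values of the study variable
    labels   : post-stratum label of each sampled unit (assigned after sampling)
    weights  : dict mapping each post-stratum label to its known weight
    t        : point at which the distribution function is evaluated
    """
    # Group the sampled values by post-stratum; the group sizes are random
    groups = defaultdict(list)
    for y, h in zip(y_values, labels):
        groups[h].append(y)

    estimate = 0.0
    for h, w in weights.items():
        ys = groups.get(h)
        if not ys:
            raise ValueError(f"post-stratum {h!r} has no sampled units")
        # Empirical CDF within the post-stratum, weighted by its known weight
        estimate += w * sum(1 for y in ys if y <= t) / len(ys)
    return estimate


# Hypothetical toy sample classified into two post-strata "a" and "b"
y = [1.0, 5.0, 2.0, 8.0, 3.0]
lab = ["a", "b", "a", "b", "a"]
print(post_stratified_cdf(y, lab, {"a": 0.7, "b": 0.3}, t=5.0))
```

Unlike the stratified case, the number of sampled units falling in each post-stratum is not fixed in advance, which is why the sketch raises an error when a post-stratum is empty.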
Variance of
is formulated as
Consider the error terms below to obtain the bias and MSE of our proposed estimator,
where
Now, we will find the expected values of error terms:
7. Results and Discussion
In this study, we have proposed two novel estimators to estimate the CDF of a study variable by employing the auxiliary variables’ information under stratified random sampling and in post-stratification.
The first estimator is proposed under St RS, which contains a combination of estimators presented in Equation (13). By taking suitable constants in place of
and functions of auxiliary variable
or constants in places of
we obtain efficient results for our proposed estimators. Because the proposed estimator defines a class of estimators, it includes several existing estimators as special cases. Choosing suitable values for the functions or constants yields the different estimators listed in Table 1. Two data sets were used to demonstrate the efficiency of the proposed estimator. The derived conditions are given in Equations (22)–(27). Data Set-I is taken from [35], and the numerical study is presented in Section 6. Table 3 shows that the proposed estimator outperforms the other estimators in terms of MSE and PRE. Data Set-II is taken from https://doi.org/10.34740/KAGGLE/DSV/7623777 (accessed on 29 February 2024). Table 5 presents all the values needed for the calculation of MSEs. In Table 6, the proposed estimator stands out with a remarkably low MSE of 0.000057 and a high PRE of 194.876102, underscoring its superior accuracy and demonstrating greater symmetry than the usual unbiased estimator and the estimators of [1,2,4,13,34].
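For readers less familiar with the PRE, the following sketch shows how it is commonly computed, assuming the conventional definition: the MSE of a reference estimator (typically the usual unbiased estimator) divided by the MSE of the estimator under comparison, multiplied by 100. The function name and the numerical inputs are hypothetical and are not values from the tables.

```python
def percent_relative_efficiency(mse_reference, mse_estimator):
    """PRE of an estimator relative to a reference estimator.

    Assumes the conventional definition PRE = 100 * MSE(reference) / MSE(estimator).
    A value above 100 means the estimator has a smaller MSE than the reference.
    """
    if mse_reference <= 0 or mse_estimator <= 0:
        raise ValueError("MSE values must be positive")
    return 100.0 * mse_reference / mse_estimator


# Hypothetical MSE values, chosen only to illustrate the calculation
reference_mse = 4.0e-4
candidate_mse = 2.0e-4
print(percent_relative_efficiency(reference_mse, candidate_mse))
```

Under this definition, the reference estimator always has a PRE of 100, so a lower MSE and a higher PRE express the same ranking of estimators.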
The second estimator we proposed in this study is under post-stratification with constants
and
. We derived the bias and MSE expressions up to the first degree of approximation; the theoretical conditions are given in Section 5, Equations (28)–(33). To demonstrate the efficiency of the proposed estimator in post-stratification, we used two data sets. We took the information on
,
and
from Data Set-II and Data Set-III. Table 8 shows that the proposed estimator has a low MSE and high relative efficiency compared with the other estimators considered, in agreement with the figures. The comparison of the estimators on Data Set-II and Data Set-III in Table 8 also reveals distinct performance characteristics. Notably, the estimator
consistently exhibits superior predictive accuracy, as evidenced by its low MSE value and consistently high PRE across both datasets. Conversely,
demonstrates poor performance, with significantly higher MSE values and lower PRE, suggesting limited predictive capability. Among the estimators,
,
, and
present moderate performance, displaying relatively lower MSE and higher PRE compared to
but not reaching the levels of
.
In Figure 1 and Figure 3, the plotted trend line ascends toward our recommended estimator. In contrast, Figure 2 shows a consistent decline in MSE values, with the errors diminishing for both [34] and our proposed estimator, labeled 6 and 7, respectively. Figure 4 shows nearly identical MSE values for the second and fourth estimators, which perform well but do not surpass our proposed estimator. In Figure 3, the proposed estimator outperforms its counterparts in both Data Set-II and Data Set-III, closely followed by [4]. Conversely, ref. [2] performs worst across both data sets. Figure 4 mirrors this pattern, with our proposed method attaining the lowest MSE, followed closely by the estimator of [4], across both data sets. Notably, ref. [2] and the classical estimator record notably higher MSE values in Data Set-I and Data Set-II, respectively. Hence, Figure 3 and Figure 4 support the superiority of our proposed estimator over its counterparts, a conclusion reinforced by Figure 1 and Figure 2.
Table 3 reports the MSE and PRE of the existing estimators alongside our proposed estimator, listed as serial No. 7. Our proposed estimator attains the lowest MSE and the highest PRE, closely followed by [34]. Table 4 corroborates this finding. Additionally, Table 6 reports the performance metrics for Data Set-II, where our proposed method again performs best, followed by the estimator of [13]. This consistent advantage across data sets indicates the robustness and reliability of our proposed estimator, whereas the existing estimators perform less consistently.