1. Introduction
Over the last 50 years, the line transect method has been used to calculate population abundance (density) as an alternative to plot or strip sampling methods. Establishing a plot and then counting all the objects of interest in it can be highly time-consuming, and defining plots in certain environments, e.g., at sea or with fast-moving species, might be challenging. For example, see Burnham et al. [
1]. A retrospective on this topic is proposed below, with the main references provided.
The following two techniques are used to assess the detection function that is fundamental for estimating abundance: (i) nonparametric methods or semiparametric methods, and (ii) parametric methods. Among the pioneers, Gates et al. [
2] provided a function based on an exponential distribution with one scale parameter, and Hemingway [
3] proposed a function based on the half normal distribution, with the shoulder function at the origin. In addition, various methods have been proposed to estimate the parameter
, indicated as the probability density function (pdf) at distance 0, and hence, under the shoulder condition in literature, population abundance, which is denoted as
D. The reader can find these parametric estimation methods in Burnham and Anderson [
4], Pollock [
5], Ramsey [
6], Karunamuni and Quinn [
7], Buckland [
8], Eberhardt [
9], Eidous [
10], Ababneh and Eidous [
11], Quinn and Gallucci [
12], and Eidous and Al-Eibood [
13]. When conducting a whale survey, Buckland and Turnock [
14] used primary and secondary viewing platforms to disprove the notion that every whale would be detected perfectly. Using a logistic regression, Manly, McDonald, and Garner [
15] created a mark-recapture distance sampling (MRDS) model to estimate the number of polar bears (Ursus maritimus) in northern Alaska. Additionally, their model includes factors in the population estimate, including group size. By combining line-transect data with a stratified Lincoln–Petersen estimate, Alpiza-Jara and Pollock [
16] created an MRDS model. As opposed to utilizing the maximum likelihood approach, Becker and Quang [
17] fitted the model using iteratively reweighted least squares. They employed a logistic detection function that included covariates for contour transects. They also calculated the population size of brown/grizzly bears, using the Horvitz–Thompson estimate (Horvitz and Thompson [
18]). To estimate the harbor porpoise population, Borchers et al. [
19] created an MRDS model that includes variables and employed the Horvitz–Thompson estimate. When collecting data on harbor porpoises, Manly, McDonald, and Garner [
15] and Becker and Quang [
17] employed a survey design that involved two observers on a single aerial survey platform, while Borchers et al. [
19] used two different platforms on the same ship.
In the general line transect sampling approach, an observer walks a distance
L, which is often divided into
K, transects arranged at random and spans the study region over which inferences are required in an effort to estimate the unknown population abundance, denoted by
D. The observer counts the detected objects, and for each of them he details the
x perpendicular distance from the centerline to the location of the object. Let
n be the number of detected objects and
be the random distances between them. Then, the population density may be calculated using these distances. Suppose that there exists a detection function
, which is defined to be the conditional probability of observing an object whose perpendicular distance from the line is
x, i.e.,
The line transect technique has the benefit of not detecting every object in a specific area; some may go undetected. Additionally, things close to the transect line’s center are more likely to be identified than objects farther out from the line. Mathematically, if and are observed perpendicular distances, such that then . Given the properties of the sighting process, the logical assumption about is that it originates from a shoulder. The shoulder requirement indicates that the detection remains certain, or very close to certain, at a very small distance from the center of the line transect. This can be mathematically represented as . The goal under these conditions is to build a more flexible detection function with various right-skewness or asymmetric properties and tail-weight modulation.
Provided that an object with perpendicular distance
x has been detected, there exists a pdf of
x that has the same shape as
, but scaled (see also Burnham and Anderson [
4]). Thus, this pdf can be defined as
,
, with
. From the statistical viewpoint, assume now that the likelihood of spotting an object at a distance of zero is one (i.e.,
), then
. The general estimate of the animal density is calculated as
(see also Burnham and Anderson [
4]), in which the sample of perpendicular distances serves as the foundation for an approximate sample estimate of
, denoted as
. Moreover, the estimation of the population abundance
D can be used to estimate the number of objects
N in the target region of area
A. Indeed, the following relation holds:
, implying that
by the substitution method (see also Burnham et al. [
1], p. 16).
A comprehensive discussion of line transect sampling can be found in Buckland et al. [
20], Barabesi et al. [
21], Eidous et al. [
22], Eidous et al. [
23], Jang et al. [
24], Seber et al. [
25], Pollock et al. [
5], Strindberg et al. [
26], Eberhardt et al. [
9], Quang et al. [
27], and Drummer et al. [
28]. The practical and statistical aspects with line transect sampling are covered in-depth in Burnham and Anderson [
29], Routledge and Fyfe [
30], Southwell [
31], Southwell et al. [
32], Brockelman et al. [
33], Drummer et al. [
28], Chen et al. [
34], Melville et al. [
35], and Thomas et al. [
36].
Some statistical or applied works on this topic are presented below. For kangaroo populations of a known size, Southwell [
31] discovered that line transect population estimates tended to underestimate the true pdf. This could be explained by the violation of a basic assumption (i.e., ”assumption 2” in Burnham et al. [
1]) because animals on the centerline were frequently disturbed and moved, although only slightly, before they were sighted by the observers. The same issue was discovered by Porteus et al. [
37] in line transect testing in England using known populations of domestic sheep. Combining line transect estimation with mark-recapture studies is a crucial strategy in line transect research so that detectability may be directly measured and appropriate corrections can be made to the estimations. This strategy is covered by Borchers et al. [
19], Laake et al. [
38], and Thomas et al. [
36]. The effectiveness of this strategy for an airborne census of penguin populations in Antarctica is demonstrated by Southwell et al. [
32]. Schmidt et al. [
39] offered a further example of the use of line transect sampling in an aerial census of Dall sheep in Alaska. For species that can live in groups rather than alone because groups are easier to see, the detection function may change based on the size of the group. This effect was discovered with penguins in Antarctica by Fewster et al. [
40]. Group size is just one of many variables that might influence the likelihood of detection, and all the variables that affect the accuracy of aerial surveys equally apply to line transects conducted on the ground or in the air. To compute line transects, there are numerous computer applications available. Program HAYNE computes the Hayne estimate and the modified Hayne estimate for line transect sampling. The considerably larger and more complete program DISTANCE (
http://www.ruwpa.st-and.ac.uk/distance/ (accessed on 20 September 2022)) of Buckland et al. [
20] calculates the half-normal, the Fourier-series estimate, and several other parametric functions. The industry standard for calculating line transects is Program DISTANCE. The shape-restricted estimate for line transect data is computed by the method of Routledge and Fyfe [
30], using the program TRANSAN, which is part of the ecological methodology program bundle.
In this article, we propose a new detection function based on the Burr XII distribution. In order to explain this interest, let us recall that many commonly used distributions, including the gamma, lognormal, log-logistic, bell-shaped, and J-shaped beta distributions, are included in, overlap with, or include the Burr XII distribution as a limiting case. The main feature of the Burr XII distribution is to have simple functions based on power functions, to possess a heavy-right tail, and to cover a wide range of skewness or asymmetry and kurtosis with various values of its parameters. As a result, it is used to represent different forms of data in many different domains, including finance, hydrology, reliability, household income, crop prices, insurance risk, etc. For more details on the Burr XII distribution, we refer to Rodriguez [
41] and Al-Hussaini [
42]. The survival function (sf) of the Burr XII distribution is considered in this paper to gain a new perspective on how to use it as the detection function for distance sampling. As a matter of fact, in the literature, there are many detection functions with one parameter. The proposed detection function has two parameters, which gives it the merit of better modeling and analyzing practical perpendicular data compared to those with one parameter. Moreover, there is a very rare “polynomial power based” detection function that seems well adapted for capturing the right heavy tail in the data; our model thus fills a gap in this sense.
The following works composed the article: Our function meets all of the conditions of the shoulder function; see
Section 2, which also plots some detection functions for various parametric values. In
Section 3, we obtain some moment properties, which can be used to derive the mean and variance. In
Section 4, inference is performed on the involved parameters using the moments and maximum likelihood methods to estimate the value of the pdf at distance 0 and the population abundance. In
Section 5, a simulation is carried out to assess how well the estimated parameters perform. Mathematica 10 is used for simulation, and some plots and graphs are also given for visual analysis. In
Section 6, practical perpendicular distance data sets are considered to demonstrate the performance of the proposed function, and related measurements are computed for these data sets using the model. The conclusions are offered in
Section 7.
2. The Proposed Detection Model
Motivated by the heavy-tailed nature of the Burr XII distribution, we propose the following function as a new two-parameter detection function:
where
c and
k are shape parameters. Because of the necessary decreasing property, the definition of
is based on the sf of the Burr XII distribution rather than its pdf (see again Rodriguez [
41] and Al-Hussaini [
42]). With this definition, we have
, and
which gives
for
, and
is monotonically decreasing for all
. As a result, the three assumptions made by Burnham et al. [
1] are met:
is an admissible detection function. In addition, as far as we know, our method is the first to take advantage of the Burr XII distribution’s heavy right tail for detection models.
Normalizing the detecting function (provided that it exists) yields the pdf associated with
, which is defined as
where
denotes the standard beta function, and it is assumed that
(if not, the beta function term into Equation (
1) is not valid). Hence, the conditions on the parameters are
and
. The plots of this pdf for different parametric values are given in
Figure 1, aiming to show the effects of
c and
k on the shapes.
From this figure, various decreasing asymmetric shapes are observed, and the heavy right-tailed property is preserved. To the best of our knowledge, the proposed pdf is one of the rarest of its kind in the context of distance sampling.
On the other hand, since
, we get
and the population parameter
D is given by
where
L represents the total length of
r lines
, which are assumed to be non-overlapping and placed randomly in a specific region, and
n represents the number of sighted objects collected based on the
k lines. For the estimation of
D, it is needed to estimate the parameter
, which demonstrates that
plays a crucial role in this regard.
5. Simulation Study and Results
We now carry out a simulation study to examine the performance of the proposed model (PM) estimates,
and
, of
compared to some other existing estimates. The negative exponential model (NEM) estimate,
, the half normal model (HNM) estimate,
, and weighted exponential model (WEM) estimate,
(see Saeed [
45]), are considered in this regard.
To replicate the perpendicular distances, two alternative target detection functions are considered, as well as observations
of
, with sample sizes of
, 100, 200, 300 and 500. These detection functions were chosen based on the requirement that they depict the many shapes that might appear in the specific field (see Eidous [
22]). Target models for simulating perpendicular distances include:
Exponential power (EP) detection function (see Pollock [
5]):
where
is the standard gamma function.
Beta exponential (BE) detection function (see Eberhardt [
9]):
To simulate the data, each model is truncated at some distances w. The truncated points of the EP and BE detection functions are , 3, and 2, and and 1, respectively. For both models, several (arbitrary) values of are chosen. For each model, , 100, 200, 300 and 500 sample sizes are considered with 1500 times the samples for the perpendicular distances that are randomly drawn.
Table 2 and
Table 3 provide the relative biases (RBs) and the relative mean square errors (RMEs). We made the following observations based on these results.
As a result of the simulation, we can conclude that the proposed model fits the line transect data and that the proposed estimates of and D are efficient. When compared to the NEM, HNM, and WEM, better results are obtained.