1. Introduction
The genetic prepotency of cows is an important issue since the development of livestock is directly related to the growth of the food production. Brazilians institutes are concerned with the development of a particular race, the Tabapua, which was the first humped cattle developed in the country. Due to the economic results of this particular race, this study is twofold: present the TLL model and fit the times up to the first calving of the cows pointing characteristics of this race.
Proposed by
Granzotto and Louzada-Neto (
2014), the TLL model presents important characteristics of a good model: it is flexible, tractable, interpretable and simple. Following the
Shaw and Buckley (
2007) idea, this new distribution incorporates a new third parameter
that introduces skewness and preserve the moments of the baseline distribution. Several studies can be cited that proposed similar generalizations of survival models, see for example (
Aryal and Tsokos 2009;
Aryal and Tsokos 2011).
Due to good characteristics of the TTL model along with its simplicity (the main functions are analytically expressed) and the hazard properties (it has a larger range of choices for the shape of the hazard function most commonly observed in the survival analysis field), this paper present an application of the model in a Bayesian context.
In order to fit this new model, the subjective Bayesian analysis was used. For that, the half-Cauchy prior distribution, cited by several authors such as (
Polson and Scott 2012;
Gelman 2006), as an alternative prior to a inverse Gamma distribution, was used. Specially,
Gelman (
2006) made use of this particular prior for variance parameters in hierarchical models which is our case.
Furthermore, in order to provide an indication of bad model fitting or influential observations, an influential analysis was made, see for example (
Ortega et al. 2003;
Fachini et al. 2008).
The paper is organized as follows. The hierarchical log-logistic model built by using the half-Cauchy prior distribution is presented in
Section 2. In
Section 3 we presented an application by using a real data set on a polled Tabapua race time up to first calving data. An influence diagnostic was presented in
Section 4 and the data set was re-analyzed refitting the model. Final remarks and conclusions are presented in
Section 5.
2. Hierarchical TLL Model
Proposed by
Granzotto and Louzada-Neto (
2014), the TLL is a generalization of the log-logistic model containing the baseline model as a particular case (for log-logistic distribution see (
Bennett 1983;
Chen et al. 2001)). Tractable, Interpretable and flexible enough, the new construction can be used to analyze more complex dataset, introducing skew to a base distribution and preserving its moments. Let
X be a nonnegative random variable denoting the lifetime of an individual in some population then, the probability density function (pdf) and the cumulative function of the TLL distribution are respectively given by
and
where
,
and
. Since the distribution was proposed to model experiments in reliability analysis,
Figure 1 presents several examples of survival, probability density and hazard rate functions for different values of the parameters.
According to
Chen and Ibrahim (
2006), one of most common ways of combining several sources of information is though hierarchical modeling. Thus, the authors show us the relationship between the power prior and hierarchical models using as example the regression models.
Also,
Gelman (
2006) show us that several studies by using multilevel models are central to modern Bayesian statistics for both conceptual and practical reasons. The authors suggest to use the half-t family as a prior distribution for variance parameters such the half-Cauchy distribution, that is a special conditionally-conjugate folded-noncentral-t family case of prior distributions for parameters that represent the discrepancy. Even though several studies use the half-Cauchy prior for scale parameter (see for example
Polson and Scott 2012),
Gelman (
2006) used this prior not for scale but for variance parameters and illustrated serious problems with the inverse-Gamma prior which is the most commonly used prior for variance component, see
Daniels and Daniels (
1998).
In this paper we proposed to use the hierarchical models in two levels, for that, suppose the hierarchical model given as , , , , and . The posterior distribution can be constructed as following.
Proposition 1. Let us suppose that, in the first stage, we considered a class Γ of priors that led to following Also, the second stage (sometimes called a hyperprior), would consist of putting a prior distribution on the hyperparameters and θ where Thus, the hyerarchical log-logistic posterior distribution is written as Proof. The demonstration is direct.
Note that, the
parameter is supposed to be a half-Cauchy distribution which probability density function given by
where
is a scale parameter which has a broad peak at zero and, in limit,
this becomes a uniform prior density. However, large finite values for
represent prior distributions which we call “weakly informative”. For example,
Gelman (
2006) show us that, for
, the half-Cauchy is nearly flat although it is not completely. ☐
3. Application to Real Data
Founded in 70’s, the Brazilian Agricultural Research Corporation (Embrapa) is under the aegis of the Brazilian Ministry of Agriculture, Livestock, and Food Supply. Since the foundation, they have taken on the challenge to develop a genuinely Brazilian model of tropical livestock (and agriculture as well), to keep increasing the production of food. As a result of the intense research work, the beef and pork supply were quadrupled, helping the Brazilian food to one of the world’s largest food producers and exporters.
One of the special research is related to the genetic prepotency of cows whereas the economic results is directly related to beef cattle, see for example
Pereira (
2000).
Granzotto and Louzada-Neto (
2014) study the Tabapua race time up to first calving of
animals observed from 1983 to 2007, held at Embrapa. Firstly, as the minimum observed calving was 721 days, we subtract this amount of the observed times and the distribution of the first calving times can be observed in the
Figure 2b.
Also, the TTT plot, presented in
Figure 2a shows the possible unimodal hazard rate as it is concave, convex and then concave again, see for example
Barlow and Campo (
1975).
After initial analysis, we are considering the hierarchical TLL model, as we specify in
Section 2, to fit the data. The posterior samples were generated by the Metropolis-Hastings technique. Three chains of the dimension
was considered for each parameter, discarding the first 10,000 iterations (in order to eliminate the effect of the initial values), a lag size 10 was used to avoid the correlation problems, resulting in a final sample size 10,000. The posterior summaries for
and
are present in
Table 1 and the
credible intervals by considering the priors above-mentioned can be seem in
Table 2.
The convergence of the chain was verified by Gelman and Rubin’s convergence diagnostic criterion, see for example (
Gelman and Rubin 1992), which demonstrate that these criteria is satisfied (
Table 3). Also, the convergence can be seem in
Figure 3a–j.
Also, the marginal posterior densities for
,
and
, respectively, can be analyzed by the
Figure 4a–e.
After estimate and analyze the convergence of the model,
Figure 5a,b show us, respectively, the hazard estimate curve, with the
and the
confidence interval; the survival curves estimated vs empirical and the histograma which are possible to see how well it fits a set of observations.
Considering the hyerarchical TLL fitting, the
is equals to
days (
months) and its
confidence interval is given by IC
days (see
Figure 5a). Furthermore, the median time up to first calving is equals to
days (or approximately
months), and the mean of time is
days (or approximately 18 months), with standard deviation equals to
months.
4. Influence Analysis
In this section we present an analysis of global influence for the data set given, using the TLL model in a bayesian context.
In few words, the influence analysis is a case-deletion, that we study the effect of withdraw of the
ith element sampled. Several measures of global influence analysis are presented in the literature. In this study we are considering two: the generalized Cook’s distance (CD) and the likelihood difference (LD). The first one, CD, defined as the standard norm of
and
, and the LD are given, respectively, by
and
According to
Lee et al. (
2006),
can be approximated by the estimated covariance and variance matrix. Some possible influence points are identified in the LD plot,
Figure 6.
Furthermore, the impact of the identified influential points should be measured. For that, we consider the relative changes that can be measured as , where denotes the MLE of after the set I of observations has been removed.
Three measures of influential observations are considered: TRC is the total relative changes, MRC the maximum relative changes and LD the likelihood displacement, see for example (
Lee et al. 2006;
Granzotto and Louzada-Neto 2014).
Table 4 presents the values when we withdrew from 0.01 to 5% of the outstand identified points in
Figure 6.
By considering the RC’s, 10 most influential points were withdrew and the model was re-fitted. Again, by using the Metropolis-Hastings technique we generated a chain of
observations, burn in of
and lag 10, resulting in a final sample size
.
Table 5 and
Table 6 shows the posterior summaries and the
credible intervals.
Clearly, the most affected estimate parameter was
if we compare to the parameters estimated by using the original dataset. Further, withdrawing
of sample, i.e., just 17 observations, we do not lose much information and also improve the fitted model, see
Figure 7a–c, that show us the fitted model.
5. Concluding Remarks
In this paper study the model propose by
Granzotto and Louzada-Neto (
2014), the TLL distribution, in a Bayesian context. The two levels hierarchical TLL model was formulated by using the half-Cauchy as a prior to the parameter of discrepancy.
Techniques of influential analysis were used to identify and measure the influence of the outstanding observed points. It is important to observe that the re-fitted model presents a reduction in the estimated likelihood value plus a reduction in the estimated standard deviation, which shortens the range of the confidence interval obtained for the most probable time up to first calving.
Finally, considering the final fitted model, the changes to against days ( months) and its confidence interval is given by IC months. The median time up to first calving is equals to days (or approximately months), and the mean of time is days (or approximately months), with standard deviation equals to months.