1. Introduction
The log-normal distribution is commonly used to model the behavior of data with positive asymmetry, in which most of the observations are concentrated near the minimum value. Some applications of the log-normal model are in species abundance patterns, environmental concentrations, stock prices, the distribution of the molecular weights of polymers, the production of copper nano-particles, etc.
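The log-normal (LN) distribution arises from the transformation $Y = e^{X}$, where $X \sim N(\mu, \sigma^2)$ is the normal distribution with mean $\mu$ and variance $\sigma^2$, and has the density given by
$$f(y; \mu, \sigma) = \frac{1}{\sigma y}\,\phi\!\left(\frac{\log y - \mu}{\sigma}\right), \quad y > 0,$$
where $\phi$ is the standard normal density. The notation $LN(\mu, \sigma^2)$ is typically used for the log-normal distribution with parameters $\mu$ and $\sigma^2$. More simply, we can say that a random variable $Y \sim LN(\mu, \sigma^2)$ if and only if $\log Y \sim N(\mu, \sigma^2)$. As noted by Jones and Arnold [1], this distribution is both log-symmetric about its median $e^{\mu}$ and R-symmetric about its mode $e^{\mu - \sigma^2}$, a condition called double symmetry. As defined by Mudholkar and Wang [2], in density terms, a non-negative random variable $Y$ with density $f$ is said to be R-symmetric about $\theta$ if $f(\theta y) = f(\theta / y)$ for some $\theta > 0$ and all $y > 0$, which is equivalent to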
, where “
” means equal in distribution. When
$f$ is unimodal, then $\theta$ is the (unique) mode of $Y$, which is less than the mean of $Y$. Thus, R-symmetric distributions are always positively skewed. Despite these good properties, some data are not adequately modeled by the log-normal distribution because their asymmetry and kurtosis indices fall outside the range that the model can attain. A model that accommodates such data is the log-skew-normal (LSN) distribution, introduced and studied by Azzalini et al. [3]. It is a version of the skew-normal (SN) distribution with positive support, defined as follows: for a skew-normal random variable $X \sim SN(\lambda)$, where $\lambda$ is the skewness parameter, we say that the random variable $Y = e^{X}$ follows a log-skew-normal distribution with skewness parameter $\lambda$, denoted by $Y \sim LSN(\lambda)$, if its probability density function (pdf) is given by
$$f(y; \lambda) = \frac{2}{y}\,\phi(\log y)\,\Phi(\lambda \log y), \quad y > 0,$$
where $\Phi$ denotes the cumulative distribution function (cdf) of the standard normal distribution.
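The R-symmetry of the log-normal density and the fact that the LSN density contains the LN density as a special case can be checked numerically. The following is a minimal sketch, assuming the LN density $\phi((\log y - \mu)/\sigma)/(\sigma y)$, the LN mode $e^{\mu - \sigma^2}$, and the standard LSN density $(2/y)\phi(\log y)\Phi(\lambda \log y)$; the function names and the parameter values are illustrative and do not come from the paper.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ln_pdf(y, mu=0.0, sigma=1.0):
    """Log-normal density with log-scale mean mu and log-scale SD sigma."""
    z = (math.log(y) - mu) / sigma
    return norm_pdf(z) / (sigma * y)

def lsn_pdf(y, lam):
    """Standard log-skew-normal density with skewness parameter lam."""
    x = math.log(y)
    return 2.0 / y * norm_pdf(x) * norm_cdf(lam * x)

mu, sigma = 0.3, 0.8               # illustrative values only
theta = math.exp(mu - sigma ** 2)  # mode of LN(mu, sigma^2)

# R-symmetry about the mode: f(theta * y) should equal f(theta / y).
for y in (0.5, 2.0, 7.0):
    left = ln_pdf(theta * y, mu, sigma)
    right = ln_pdf(theta / y, mu, sigma)
    print(f"y={y}: f(theta*y)={left:.6g}, f(theta/y)={right:.6g}")
```

With `lam = 0`, `lsn_pdf` coincides with `ln_pdf` at `mu = 0`, `sigma = 1`, which is the sense in which the LSN model extends the LN model.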
Applications of this model to real datasets are reported in Azzalini et al. [
3], Marchenko and Genton [
4], and Bolfarine et al. [
5], where a bimodal version of the model was used to fit a real (bimodal) dataset. Chai and Bailey [
6] extended the SN model to continuous datasets with a discrete component at zero. On the other hand, the modified skew-normal (MSN) distribution is a particular case of the generalized skew-normal (GSN) distribution introduced by Arellano-Valle et al. [7], for $\lambda_1 = \lambda$ and $\lambda_2 = 1$. Later, Arellano-Valle et al. [8] investigated the Fisher information matrix (FIM) for the location-scale version of the GSN model. The MSN model is a fair competitor to the SN model, since both control asymmetry with a single scalar parameter, say $\lambda$, such that the ordinary normal model results when $\lambda = 0$. Moreover, one advantage of the MSN model over the SN model in location-scale situations is that its FIM is nonsingular at $\lambda = 0$, which is not the case with the SN model (see Arellano-Valle et al. [8] and Arrué et al. [9]). The present paper investigates the possibility of developing a more flexible distribution for positive data, such that the regularity conditions for large-sample inference remain valid when a maximum likelihood approach is used. We consider it natural to use the log version of the MSN model for this purpose, calling it the modified log-skew-normal (MLSN) model.
This paper is organized as follows.
Section 2 presents some probabilistic properties of the MSN model. It includes the derivation of the first few moments of the distribution and the moment-generating function. Plots and ranges for the asymmetry and kurtosis coefficients are reported. The score functions are derived, and the observed information matrix is presented.
Section 3 is devoted to the definition of the new distribution, termed the MLSN distribution. The survival and the associated risk functions are derived. A general expression for the moments is obtained, and the non-existence of the moment-generating function is proven. Parameter estimation is conducted by the maximum likelihood (ML) approach. Observed and expected (Fisher) information matrices are derived. It is shown that the FIM for a location-scale extension of the model is non-singular so that large sample properties of the ML estimators (MLE) are satisfied. In
Section 4, we present a brief introduction to Firth’s approach (see Firth [
10]) for bias reduction and some tables related to a small-scale simulation study, illustrating the amount of bias reduction in the asymmetry parameter. Finally, in
Section 5, a real data illustration is presented indicating that the new model outperforms its most direct competitors. The closing section summarizes the main contributions of the paper.
2. Preliminaries
We say that a random variable $Z$ follows a standard MSN distribution with parameter $\lambda$, denoted $Z \sim MSN(\lambda)$, if its pdf is given by
$$f(z; \lambda) = 2\,\phi(z)\,\Phi\!\left(\frac{\lambda z}{\sqrt{1 + z^2}}\right), \qquad (2)$$
where $z \in \mathbb{R}$ and $\lambda \in \mathbb{R}$. If $\lambda = 0$, then the MSN pdf in (2) reduces to the pdf of the standard normal distribution. Non-null values of the parameter $\lambda$ directly affect the model's asymmetry so that, in the limit, as $\lambda \to \infty$, the MSN model becomes the half-normal (HN) distribution. The location-scale version of the model is obtained through the transformation $X = \mu + \sigma Z$, where $\mu \in \mathbb{R}$ and $\sigma > 0$ are the location and scale parameters, respectively. We use the notation $X \sim MSN(\mu, \sigma, \lambda)$. As shown in Arellano-Valle et al. [8], the FIM for the location-scale version is nonsingular at $\lambda = 0$. Thus, the ordinary properties of the MLE (consistency and asymptotic normality) remain valid for the MSN model.
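The two limiting cases of the MSN density can be illustrated with a short numerical sketch. It assumes the standard MSN pdf $2\phi(z)\Phi(\lambda z/\sqrt{1+z^2})$; the helper names and the trapezoid integrator are illustrative choices, not part of the paper.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def msn_pdf(z, lam):
    """Assumed standard MSN density: 2 phi(z) Phi(lam z / sqrt(1 + z^2))."""
    return 2.0 * norm_pdf(z) * norm_cdf(lam * z / math.sqrt(1.0 + z * z))

def trapezoid(f, a, b, n=20000):
    """Composite trapezoid rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# The density should integrate to one for any lam (lam = 2 is illustrative).
total_mass = trapezoid(lambda z: msn_pdf(z, 2.0), -10.0, 10.0)
print("total mass for lam=2:", total_mass)

# lam = 0 gives the standard normal; a very large lam approaches the half-normal.
print("lam=0   at z=0.7:", msn_pdf(0.7, 0.0), "vs phi:", norm_pdf(0.7))
print("lam=1e6 at z=1.0:", msn_pdf(1.0, 1e6), "vs 2*phi:", 2.0 * norm_pdf(1.0))
```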
Figure 1 shows plots of the pdf for the standard MSN model for several values of $\lambda$. The subtle difference between the MSN and SN densities for the same parameter value can also be observed (dotted line and dashed line).
The following properties follow from Arellano-Valle et al. [
7].
2.1. Properties
Let $Z \sim MSN(\lambda)$; then:
;
If , then ;
If $Z \mid S = s \sim SN(s)$ and $S \sim N(\lambda, 1)$, then $Z \sim MSN(\lambda)$;
, and .
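According to Property 3, the MSN distribution can be represented as a mixture of skew-normal distributions in which the asymmetry parameter follows a normal distribution. The location-scale extension is $X = \mu + \sigma Z$, so that $X \sim MSN(\mu, \sigma, \lambda)$, with pdf
$$f(x; \mu, \sigma, \lambda) = \frac{2}{\sigma}\,\phi(z)\,\Phi\!\left(\frac{\lambda z}{\sqrt{1 + z^2}}\right), \quad z = \frac{x - \mu}{\sigma}, \quad x \in \mathbb{R}.$$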
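The mixture representation suggests a simple way to simulate MSN variates. The sketch below assumes that the mixing variable is $S \sim N(\lambda, 1)$ and that, given $S = s$, a skew-normal draw is generated through Henze's representation $\delta |U_0| + \sqrt{1 - \delta^2}\,U_1$ with $\delta = s/\sqrt{1 + s^2}$; the function names, the seed, and the parameter values are illustrative.

```python
import math
import random

def rmsn(lam, rng):
    """One draw from the standard MSN(lam) via the assumed mixture representation."""
    s = rng.gauss(lam, 1.0)                  # mixing variable S ~ N(lam, 1)
    delta = s / math.sqrt(1.0 + s * s)
    u0 = abs(rng.gauss(0.0, 1.0))            # half-normal component
    u1 = rng.gauss(0.0, 1.0)                 # independent normal component
    return delta * u0 + math.sqrt(1.0 - delta * delta) * u1

def rmsn_loc_scale(mu, sigma, lam, rng):
    """Location-scale version: X = mu + sigma * Z."""
    return mu + sigma * rmsn(lam, rng)

rng = random.Random(12345)                   # fixed seed for reproducibility
sample = [rmsn(1.5, rng) for _ in range(200000)]
mean_z = sum(sample) / len(sample)
second_moment = sum(z * z for z in sample) / len(sample)
print("sample mean:", mean_z)
print("sample second moment (should be close to 1):", second_moment)
```

Under this representation the second moment equals one for every $\lambda$, because the even moments of the skew-normal distribution do not depend on its asymmetry parameter.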
2.2. Moments of the MSN Model
According to Property 3, the moments of the MSN distribution can be obtained by using the fact that it is a mixture between the SN and the $N(\lambda, 1)$ distributions, so that we can write $E(Z^k) = E[E(Z^k \mid S)]$, where $E(Z^k \mid S = s)$, $k = 1, 2, \ldots$, are the moments for the SN model with asymmetry parameter $s$. For even values of $k$, the moments of the $SN(s)$ distribution do not depend on $s$ and coincide with those of the standard normal distribution, and hence so do the even moments of the MSN distribution. For the odd moments, making use of the stochastic representation for the skew-normal distribution in Henze [
11], we can write
where
,
. Considering
and
, we have that
Therefore, the first four moments of the standard MSN distribution are given by
The odd moments can also be obtained from Arellano-Valle et al. [
7]:
for
where
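As a worked illustration of the mixture argument, the first four moments can be written as expectations over the mixing variable. This is a sketch under the assumptions that $S \sim N(\lambda, 1)$ and that the conditional skew-normal moments are $E(Z \mid S = s) = \sqrt{2/\pi}\,\delta(s)$ and $E(Z^3 \mid S = s) = \sqrt{2/\pi}\,(3\delta(s) - \delta(s)^3)$; the symbol $\delta(s)$ is introduced here for convenience and may differ from the paper's notation.

```latex
\begin{aligned}
E(Z^k) &= E\big[E(Z^k \mid S)\big], \qquad S \sim N(\lambda, 1), \qquad
\delta(s) = \frac{s}{\sqrt{1 + s^2}}, \\
E(Z)   &= \sqrt{2/\pi}\; E\big[\delta(S)\big], \\
E(Z^2) &= 1, \\
E(Z^3) &= \sqrt{2/\pi}\; E\big[3\,\delta(S) - \delta(S)^3\big], \\
E(Z^4) &= 3.
\end{aligned}
```

The odd moments generally have to be evaluated numerically, since the expectations over $S$ do not reduce to elementary functions of $\lambda$.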
2.3. Moment-Generating Function
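As mentioned previously, if $Z \mid S = s \sim SN(s)$ and $S \sim N(\lambda, 1)$, then $Z \sim MSN(\lambda)$, so that we can write
$$M_Z(t) = E\!\left[E\!\left(e^{tZ} \mid S\right)\right] = 2\,e^{t^2/2}\,E\!\left[\Phi\!\left(\frac{t S}{\sqrt{1 + S^2}}\right)\right], \quad t \in \mathbb{R}. \qquad (5)$$
Therefore, all the moments for the random variable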
Z are defined as follows:
2.4. Observed Information Matrix for MSN Model
Consider a random sample
from the MSN(
) distribution, with
, so that the corresponding log-likelihood function is given by
where
,
,
,
,
, and
. The entries of the observed information matrix for the MSN model have the same structure as the corresponding moments for the MLSN model considered in
Section 4.3, with the appropriate transformation.
3. The Modified Log-Skew-Normal Distribution
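If $Z \sim MSN(\lambda)$, then $Y = e^{Z}$ follows the standard MLSN distribution with parameter $\lambda$, denoted by $Y \sim MLSN(\lambda)$, with pdf given by
$$f(y; \lambda) = \frac{2}{y}\,\phi(x)\,\Phi\!\left(\frac{\lambda x}{\sqrt{1 + x^2}}\right),$$
where $x = \log y$, $y > 0$, and $\lambda \in \mathbb{R}$.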
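Figure 2 depicts plots of the pdf for the MLSN($\lambda$) model for several values of $\lambda$. If $\lambda = 0$, then it coincides with the log-normal distribution (solid line), illustrating the fact that the MLSN model is an extension of the LN model. For the location-scale situation, that is $Y \sim MLSN(\mu, \sigma, \lambda)$, where $\mu \in \mathbb{R}$, $\sigma > 0$, and $\lambda \in \mathbb{R}$, the density is given by
$$f(y; \mu, \sigma, \lambda) = \frac{2}{\sigma y}\,\phi(z)\,\Phi\!\left(\frac{\lambda z}{\sqrt{1 + z^2}}\right),$$
where $z = (\log y - \mu)/\sigma$, $y > 0$, and $\phi$ and $\Phi$ are the standard normal pdf and cdf. In a survival analysis scenario, it is important to study the following functions: the survival function $S(t) = 1 - F(t)$ and the risk function $h(t) = f(t)/S(t)$, which for the model under study can be shown to be given by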
and
Clearly, using L’Hopital´s rule, it follows that as , as can also be appreciated graphically.
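The survival and risk functions can also be evaluated numerically, which is a convenient check on any closed-form expression. The following sketch assumes the standard MLSN density $(2/y)\phi(\log y)\Phi(\lambda \log y/\sqrt{1 + \log^2 y})$ and computes the survival function by integrating the MSN density on the log scale; the function names, the integration limit `upper`, and the grid size are illustrative choices.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def msn_pdf(z, lam):
    """Assumed standard MSN density."""
    return 2.0 * norm_pdf(z) * norm_cdf(lam * z / math.sqrt(1.0 + z * z))

def mlsn_pdf(y, lam):
    """Assumed standard MLSN density, obtained from Y = exp(Z)."""
    return msn_pdf(math.log(y), lam) / y

def mlsn_sf(y, lam, upper=12.0, n=6000):
    """Survival function P(Y > y) = P(Z > log y), by the trapezoid rule."""
    a = math.log(y)
    if a >= upper:          # beyond the integration range the mass is negligible
        return 0.0
    h = (upper - a) / n
    total = 0.5 * (msn_pdf(a, lam) + msn_pdf(upper, lam))
    for i in range(1, n):
        total += msn_pdf(a + i * h, lam)
    return total * h

def mlsn_hazard(y, lam):
    """Risk (hazard) function h(y) = f(y) / S(y)."""
    return mlsn_pdf(y, lam) / mlsn_sf(y, lam)

for lam in (-1.0, 0.0, 1.0):   # illustrative values
    print(lam, [round(mlsn_hazard(y, lam), 4) for y in (0.5, 1.0, 5.0, 50.0)])
```

For `lam = 0` the output reproduces the risk function of the standard log-normal model, which rises to a maximum and then decays for large arguments.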
Figure 3 illustrates the behavior of the risk function for some values of $\lambda$
. The maximum values of the risk function for each
decrease for
and increase otherwise. Moreover,
,
tends to a strictly increasing function defined in the interval
, otherwise being zero. On the other hand,
,
is a strictly increasing function defined in the interval
, coinciding with the risk function of the log-normal model in this interval and taking the value zero in the interval
.
3.1. Moments for the MLSN Model
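The $r$-th moment of $Y \sim MLSN(\lambda)$ is given by
$$E(Y^r) = E\!\left(e^{rZ}\right) = 2\,e^{r^2/2}\,E\!\left[\Phi\!\left(\frac{r S}{\sqrt{1 + S^2}}\right)\right]$$
for $r = 1, 2, \ldots$ and $S \sim N(\lambda, 1)$. This expression is obtained directly from the moment-generating function of the MSN model, given in (5), since it is valid for all $t \in \mathbb{R}$, particularly for $t = r$. Alternatively, the moments can be obtained from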
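The moment expression can be cross-checked against direct integration. The sketch below assumes the formula $E(Y^r) = 2e^{r^2/2}E[\Phi(rS/\sqrt{1+S^2})]$ with $S \sim N(\lambda, 1)$ and compares it with $\int e^{rz} f_Z(z)\,dz$, where $f_Z$ is the assumed standard MSN density; both integrals use a plain trapezoid rule, and all names and integration limits are illustrative.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def msn_pdf(z, lam):
    """Assumed standard MSN density."""
    return 2.0 * norm_pdf(z) * norm_cdf(lam * z / math.sqrt(1.0 + z * z))

def trapezoid(f, a, b, n=20000):
    """Composite trapezoid rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def mlsn_moment_formula(r, lam):
    """2 exp(r^2/2) E[Phi(r S / sqrt(1 + S^2))] with S ~ N(lam, 1)."""
    def integrand(s):
        return norm_cdf(r * s / math.sqrt(1.0 + s * s)) * norm_pdf(s - lam)
    return 2.0 * math.exp(0.5 * r * r) * trapezoid(integrand, lam - 10.0, lam + 10.0)

def mlsn_moment_direct(r, lam):
    """E(Y^r) = E(exp(r Z)) by integrating against the MSN density."""
    return trapezoid(lambda z: math.exp(r * z) * msn_pdf(z, lam), -12.0, r + 12.0)

for r in (1, 2, 3):
    print(r, mlsn_moment_formula(r, 1.0), mlsn_moment_direct(r, 1.0))
```

At `lam = 0` the first moment reduces to $e^{1/2}$, the mean of the standard log-normal distribution, since the expectation of $\Phi$ equals one half by symmetry.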
3.1.1. Non-Existence of the Moment-Generating Function for the MLSN Distribution
Proposition 1. For all $\lambda \in \mathbb{R}$, the random variable $Y \sim MLSN(\lambda)$ has no moment-generating function.
Proof. Part of the proof parallels that of Lin and Stoyanov [12] for a similar result. Thus, for each $t > 0$,
where
so that, for
and
in both cases,
as
. Therefore, given
,
, for any
. □
3.1.2. Skewness and Kurtosis Coefficients
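To obtain expressions for the skewness and kurtosis coefficients, we compute the central moments $\mu_k = E[(Y - m_1)^k]$ from the raw moments $m_r = E(Y^r)$ using the following relationships:
$$\mu_2 = m_2 - m_1^2, \quad \mu_3 = m_3 - 3 m_1 m_2 + 2 m_1^3, \quad \mu_4 = m_4 - 4 m_1 m_3 + 6 m_1^2 m_2 - 3 m_1^4.$$
This allows us to compute the variance, standard deviation (SD), coefficient of variation (CV), asymmetry ($\sqrt{\beta_1}$), and kurtosis ($\beta_2$) coefficients, respectively, given by:
$$\mathrm{Var}(Y) = \mu_2, \quad \mathrm{SD} = \sqrt{\mu_2}, \quad \mathrm{CV} = \frac{\sqrt{\mu_2}}{m_1}, \quad \sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}}, \quad \beta_2 = \frac{\mu_4}{\mu_2^{2}}.$$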
Table 1 reveals that the ranges of the variance and CV are relatively narrow, while the ranges of the asymmetry and kurtosis coefficients are relatively wide compared with those of the ordinary LSN model.
Figure 4 shows the behavior of the variance, CV, skewness, and kurtosis as functions of the parameter $\lambda$. It can be observed that the minimum values correspond to the asymptotes of the left tail. On the other hand, for large values of $\lambda$, the plots of the right tail stabilize around the horizontal asymptotes with values of 6.74, 0.936, 5.83, and 97.93, respectively, for the corresponding indices.
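The right-tail asymptotes can be reproduced from the limiting moments. The sketch below assumes that, as $\lambda \to \infty$, the moment formula $2e^{r^2/2}E[\Phi(rS/\sqrt{1+S^2})]$ converges to $2e^{r^2/2}\Phi(r)$ (because $S/\sqrt{1+S^2} \to 1$), and then applies the usual raw-to-central moment relations; the function and variable names are illustrative.

```python
import math

def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def limit_moment(r):
    """Assumed limit of E(Y^r) as lam -> +infinity: 2 exp(r^2/2) Phi(r)."""
    return 2.0 * math.exp(0.5 * r * r) * norm_cdf(r)

m1, m2, m3, m4 = (limit_moment(r) for r in (1, 2, 3, 4))

# Central moments from raw moments.
mu2 = m2 - m1 ** 2
mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4

variance = mu2
cv = math.sqrt(mu2) / m1
skewness = mu3 / mu2 ** 1.5
kurtosis = mu4 / mu2 ** 2

print(f"variance={variance:.3f}, CV={cv:.3f}, "
      f"skewness={skewness:.3f}, kurtosis={kurtosis:.2f}")
```

The computed values agree with the reported asymptotes for the variance, CV, and kurtosis, and the skewness agrees with the reported 5.83 up to the last digit.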
3.2. MLE for the MLSN Model
Consider a random sample $Y_1, \ldots, Y_n$ from a random variable $Y \sim MLSN(\mu, \sigma, \lambda)$, with parameter vector $\theta = (\mu, \sigma, \lambda)$. The corresponding log-likelihood is given by
where
,
,
,
,
, and
. Using the following notation:
we have that the associated score vector is given by
Equating the score functions to zero, it follows that the likelihood equations are given as
Solving this system of equations, which requires numerical procedures, yields the MLEs of $\mu$, $\sigma$, and $\lambda$.
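Since the likelihood equations need a numerical solver, it is often simplest in practice to code the log-likelihood itself and maximize it directly. The following is a minimal sketch, assuming the location-scale MLSN density $\frac{2}{\sigma y}\phi(z)\Phi(\lambda z/\sqrt{1+z^2})$ with $z = (\log y - \mu)/\sigma$; the function names are illustrative and the data values are made up for the example (they are not the ozone data).

```python
import math

def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def log_norm_pdf(x):
    """Log of the standard normal density."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def mlsn_loglik(params, ys):
    """Log-likelihood of (mu, sigma, lam) under the assumed MLSN density."""
    mu, sigma, lam = params
    if sigma <= 0.0:
        return float("-inf")           # outside the parameter space
    total = 0.0
    for y in ys:
        z = (math.log(y) - mu) / sigma
        w = lam * z / math.sqrt(1.0 + z * z)
        # Guard against log(0) when Phi(w) underflows for very negative w.
        total += (math.log(2.0) - math.log(sigma) - math.log(y)
                  + log_norm_pdf(z) + math.log(max(norm_cdf(w), 1e-300)))
    return total

def ln_loglik(mu, sigma, ys):
    """Log-normal log-likelihood, the lam = 0 special case."""
    return sum(-math.log(sigma) - math.log(y)
               + log_norm_pdf((math.log(y) - mu) / sigma) for y in ys)

ys = [12.0, 35.5, 7.1, 60.0]           # made-up positive observations
print(mlsn_loglik((3.0, 0.9, 0.0), ys), ln_loglik(3.0, 0.9, ys))
print(mlsn_loglik((3.0, 0.9, 1.5), ys))
```

The negative of `mlsn_loglik` can be passed to any general-purpose numerical optimizer, with the LN estimates and `lam = 0` as a natural starting point.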
5. An Application
The dataset analyzed in this section was previously studied in Nadarajah [
15] and Leiva et al. [
16]. It consists of daily measurements of ozone concentration (in
) in New York City between May and September 1973. The data were assumed to be independent, with no trend or cyclical components (see Gokhale and Khare [
17]).
Table 3 presents the summary statistics, in particular the asymmetry and kurtosis coefficients, which are represented by the sample quantities ($\sqrt{b_1}$) and ($b_2$), respectively.
Table 4 shows the MLEs of the parameters for the MLSN, LN, and LSN distributions, where the values in parentheses are standard errors. Although the AIC for the LSN model is slightly smaller than that for the MLSN model, it is our opinion that the latter model should be preferred because the LSN model has a singular information matrix at $\lambda = 0$, and so the likelihood ratio statistic for testing $H_0: \lambda = 0$ is not distributed according to a central chi-squared distribution in large samples. On the other hand, under the MLSN model, for which the FIM is non-singular, we have that the hypothesis $H_0: \lambda = 0$ is rejected at the
level, using the likelihood ratio statistic, which is given by
Replacing the MLE from
Table 4, we have that
, greater than the critical value
. Furthermore, from the table, a large-sample 95% confidence interval for $\lambda$ based on the MLSN model does not contain zero.
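The likelihood ratio comparison of the MLSN model against its LN submodel can be sketched as follows. The code assumes a single restriction ($\lambda = 0$), so that the reference distribution is chi-squared with one degree of freedom, whose upper tail is $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$; the function names and the two log-likelihood values are hypothetical and are not taken from Table 4.

```python
import math

def chi2_1_sf(x):
    """Upper tail probability of a chi-squared variable with 1 degree of freedom."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))

def lr_test(loglik_full, loglik_null):
    """Likelihood ratio statistic 2 * (l_full - l_null) and its chi^2_1 p-value."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2_1_sf(max(stat, 0.0))

# Hypothetical maximized log-likelihoods for the MLSN and LN fits.
stat, p_value = lr_test(loglik_full=-540.0, loglik_null=-543.2)
print(f"LR statistic = {stat:.3f}, p-value = {p_value:.4f}")
```

The null hypothesis is rejected at a given level when the p-value falls below that level, or equivalently when the statistic exceeds the corresponding chi-squared quantile.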
Figure 5 presents the data histogram with the fitted pdfs for the MLSN (solid line) and LN (dotted line) distributions, and the fitted cdfs for the MLSN and LN models together with the empirical cdf. The figure suggests that the MLSN model provides a (graphically) satisfactory fit.
The QQ-plots in Figure 6 give further graphical support for the better fit of the MLSN model relative to the LN model.
Table 5 presents the MLEs $\hat{\mu}$, $\hat{\sigma}$, and $\hat{\lambda}$ and the modified
with the corresponding standard errors (in parentheses), obtained by using the estimated FIM for the MLSN model, bearing in mind that the asymptotic distribution of
is
, where
or
. The table indicates that the modified MLE
is greater than the ordinary MLE
and is expected to have smaller bias.
Table 6 presents confidence intervals for
and
for several confidence coefficients. Comparing the lengths of the intervals for the two estimators, there is strong evidence that the modified estimator
presents shorter intervals.
6. Concluding Remarks
This paper focused on a transformation of the MSN model (Arellano-Valle et al. [
8]), which led to a more flexible distribution (wider ranges for asymmetry and kurtosis). This model, which we call the MLSN distribution, is suitable for positive data, its main competitors being the LN and LSN distributions. One interesting aspect of the new model is that its FIM is non-singular, so that the large-sample theory for the MLE remains valid. This is not the case with the LSN model, for which the FIM is singular in the location-scale version. Thus, in particular for testing $H_0: \lambda = 0$ against $H_1: \lambda \neq 0$ under the MLSN model, the likelihood ratio statistic follows a chi-squared distribution in large samples. Large-sample confidence intervals could also be constructed and used for testing the hypothesis $H_0: \lambda = 0$, where $\lambda$ is the skewness parameter. Rejection of $H_0$ indicates that the MLSN model should be preferred over the LN model. The simulation study also shows that the MLE of $\lambda$ can overestimate $\lambda$ (and could be infinite for some samples). Thus, the bias-reducing approach of Firth [10] was used to derive a less-biased estimator. Estimates of the location and scale parameters remain stable, however, and do not need to be corrected. An application to a real dataset revealed that the new model can be a valuable alternative for modeling positive data.