Article

Bayesian Inference for Finite Mixture Regression Model Based on Non-Iterative Algorithm

1 School of Mathematics, Shandong University, Jinan 250100, China
2 School of Mathematics and Statistics, Shandong University, Weihai 264209, China
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(6), 590; https://doi.org/10.3390/math9060590
Submission received: 4 February 2021 / Revised: 4 March 2021 / Accepted: 6 March 2021 / Published: 10 March 2021

Abstract

Finite mixture normal regression (FMNR) models are widely used to investigate the relationship between a response variable and a set of explanatory variables arising from several unknown latent homogeneous groups. However, the classical EM algorithm and Gibbs sampling used to fit this model have several weak points. In this paper, a non-iterative sampling algorithm for fitting the FMNR model is proposed from a Bayesian perspective. The procedure can generate independently and identically distributed samples from the posterior distributions of the parameters and produce more reliable estimates than the EM algorithm and Gibbs sampling. Simulation studies are conducted to illustrate the performance of the algorithm, with supporting results. Finally, a real data set is analyzed to show the usefulness of the methodology.

1. Introduction

Finite mixture regression (FMR) models are powerful statistical tools for exploring the relationship between a response variable and a set of explanatory variables arising from several latent homogeneous groups. The aim of FMR is to identify the group to which an observation belongs and to reveal the dependence between the response and the predictor variables within each group after classification. The finite mixture regression model with normal errors (FMNR) is the earliest and most widely used mixture regression model in practice; see Quandt and Ramsey [1], De Veaux [2], Jones and McLachlan [3], Turner [4], McLachlan [5], Frühwirth-Schnatter [6], and Faria and Soromenho [7] for early literature. In recent years, many authors have extended finite mixture normal regression to mixture regression models based on other error distributions, such as mixture Student's t regression (Peel [8]), mixture Laplace regression (Song et al. [9,10]), mixture regression based on scale mixtures of skew-normal distributions (Zeller et al. [11]), and mixture quantile regression (Tian et al. [12], Yang et al. [13]). The classical methods for these mixture models are mainly Gibbs sampling for Bayesian analysis and the EM algorithm (Dempster et al. [14]) for finding the maximum likelihood estimator (MLE) from a frequentist perspective; the crucial technique in both is to employ a group of latent variables that indicate the group each observation belongs to, thereby formulating a missing data structure. Although the EM algorithm and Markov chain Monte Carlo (MCMC) based algorithms are widely used for mixture models, they have some weak points that should not be overlooked. For the EM algorithm, the standard error of an estimated parameter is usually computed as the square root of the corresponding diagonal element of the asymptotic covariance matrix motivated by the central limit theorem, but when the sample size is small or even moderate this approximation may be unreasonable. For Gibbs sampling and other MCMC-based algorithms, the samples used for statistical inference are generated iteratively, so the accuracy of parameter estimation may decrease due to the dependence among the samples. Moreover, although several tests and diagnostics exist for studying the convergence of the generated Markov chain, no procedure can convincingly verify that convergence has been reached when the iterations are terminated. It is therefore worthwhile to develop a more effective and computationally feasible algorithm for dealing with such complex missing data problems.
Tan et al. [15,16] proposed a non-iterative sampling algorithm (named the IBF algorithm, after the inverse Bayes formula) for missing data models, which can draw independently and identically distributed (i.i.d.) samples from the posterior distribution. Without any convergence diagnosis, the i.i.d. samples can be used directly to estimate the model parameters and their standard errors, thus avoiding the shortcomings of the EM algorithm and Gibbs sampling. Recently, the IBF algorithm has seen extensive application in the literature. To name a few, Yuan and Yang [17] developed an IBF algorithm for Student's t regression with censoring, Yang and Yuan [18] applied the idea of IBF sampling to quantile regression models, and Yang and Yuan [19] designed an IBF algorithm for robust regression models with scale mixtures of normal distributions, which include the normal, Student's t, slash, and contaminated normal distributions as special cases. Tsionas [20] applied the idea to finance and proposed a non-iterative method for posterior inference in stochastic volatility models.
Inspired by Tan et al. [15,16], in this paper we propose an effective Bayesian inference procedure for the FMNR model from a non-iterative perspective. We first introduce a group of latent multinomially distributed mixture component variables to formulate a missing data structure, and then combine the EM algorithm, the inverse Bayes formula, and the sampling/importance resampling (SIR) algorithm [21,22] into a non-iterative sampling algorithm. Finally, we implement IBF sampling to generate i.i.d. samples from the posterior distributions and use these samples directly to estimate the parameters. We conduct simulation studies to assess the performance of the procedure in comparison with the EM algorithm and Gibbs sampling, and apply it to a classical data set with supporting results. The IBF sampling procedure can be directly extended to other mixture regression models, such as mixture regression models driven by Student's t or Laplace error distributions.

2. Finite Mixture Normal Regression (FMNR) Model

In this section, we first introduce the standard FMNR model and then display the complete likelihood function of the observations and a set of latent variables, namely the mixture component indicators, which identify the group each observation belongs to.
Let $y_i$ represent the $i$th observation of a response variable and $x_i = (1, x_{i1}, \ldots, x_{i,p-1})^T$ be the $i$th observation of a set of $p-1$ explanatory variables. The FMNR model is described as follows:
$$y_i \sim \sum_{j=1}^{g} \lambda_j N(x_i^T \beta_j, \sigma_j^2), \quad i = 1, \ldots, n;$$
that is, the distribution of the response variable is modeled as a finite mixture of normal distributions with $g$ components. Here, $\lambda_j$ is the mixture proportion, with $0 < \lambda_j < 1$ and $\sum_{j=1}^{g} \lambda_j = 1$; $\beta_j$ is the $p$-dimensional regression coefficient vector of the $j$th group, with the first element denoting the intercept; $\sigma_j^2$ represents the variance of the error distribution in group $j$; and $N(x_i^T \beta_j, \sigma_j^2)$ denotes the normal distribution with mean $x_i^T \beta_j$ and variance $\sigma_j^2$.
Denote the parameters by
$$\beta = (\beta_1^T, \ldots, \beta_g^T)^T, \qquad \sigma^2 = (\sigma_1^2, \ldots, \sigma_g^2)^T,$$
$$\lambda = (\lambda_1, \ldots, \lambda_g)^T, \qquad \theta = (\beta^T, \sigma^{2T}, \lambda^T)^T,$$
and the observation vector by $y = (y_1, y_2, \ldots, y_n)^T$. The likelihood of $y$ in the FMNR model is
$$L(\theta \mid y) = \prod_{i=1}^{n} \sum_{j=1}^{g} \lambda_j N(y_i; x_i^T \beta_j, \sigma_j^2) = \prod_{i=1}^{n} \sum_{j=1}^{g} \lambda_j \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left\{ -\frac{(y_i - x_i^T \beta_j)^2}{2\sigma_j^2} \right\},$$
where $N(y_i; x_i^T \beta_j, \sigma_j^2)$ denotes the density function of the distribution $N(x_i^T \beta_j, \sigma_j^2)$ evaluated at $y_i$. It is challenging to obtain the maximum likelihood estimates of the parameters by directly maximizing this likelihood, and it is also difficult to derive the full conditional distributions needed for Bayesian analysis. In the literature, an effective approach for mixture models is to introduce a group of latent variables that establishes a missing data structure; in this way, data augmentation algorithms such as the EM algorithm and Gibbs sampling can easily be performed. Moreover, by taking advantage of the same missing data structure, a new non-iterative sampling algorithm can be carried out smoothly.
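As a concrete companion to the likelihood above, the following is a minimal R sketch that evaluates $\log L(\theta \mid y)$ for a given parameter value; the object names (y, X, and a list theta with components beta, sigma2, and lambda) are illustrative conventions introduced here, not notation from the paper.

```r
# Observed-data log-likelihood of the FMNR model:
# log L(theta | y) = sum_i log( sum_j lambda_j * N(y_i; x_i' beta_j, sigma_j^2) )
loglik_fmnr <- function(y, X, theta) {
  g <- length(theta$lambda)
  dens <- sapply(1:g, function(j)
    theta$lambda[j] * dnorm(y, mean = drop(X %*% theta$beta[, j]), sd = sqrt(theta$sigma2[j])))
  sum(log(rowSums(dens)))   # mixture density of each y_i, summed over components, then logged
}

# Illustrative call for a two-component model with an intercept and one slope:
# theta <- list(beta = cbind(c(5, -5), c(-5, 5)), sigma2 = c(1, 1), lambda = c(0.5, 0.5))
# loglik_fmnr(y, cbind(1, x), theta)
```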
Here, we introduce a group of latent membership indicators $G_i = (G_{i1}, \ldots, G_{ig})^T$ such that
$$G_{ij} = \begin{cases} 1, & \text{if } y_i \text{ belongs to group } j, \\ 0, & \text{otherwise}, \end{cases}$$
with $P(G_{ij} = 1) = \lambda_j$ for $j = 1, 2, \ldots, g$, and $\sum_{j=1}^{g} G_{ij} = 1$. Given the mixing probabilities $\lambda = (\lambda_1, \ldots, \lambda_g)^T$, the latent variables $G_1, \ldots, G_n$ are independent of each other, with multinomial densities
$$f(G_i \mid \lambda) \propto \lambda_1^{G_{i1}} \cdots \lambda_{g-1}^{G_{i,g-1}} (1 - \lambda_1 - \cdots - \lambda_{g-1})^{G_{ig}}.$$
Note that given $G_{ij} = 1$, we have $y_i \mid G_{ij} = 1 \sim N(x_i^T \beta_j, \sigma_j^2)$, and thus obtain the following complete likelihood of the observations $y$ and the latent variables $G = (G_1, \ldots, G_n)$:
$$L(\theta \mid y, G) = \prod_{i=1}^{n} \prod_{j=1}^{g} \left[ \lambda_j N(y_i; x_i^T \beta_j, \sigma_j^2) \right]^{G_{ij}} = \prod_{i=1}^{n} \prod_{j=1}^{g} \left[ \lambda_j \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left\{ -\frac{(y_i - x_i^T \beta_j)^2}{2\sigma_j^2} \right\} \right]^{G_{ij}}.$$
The above complete likelihood is crucial for implementing the EM algorithm, Gibbs sampling, and the IBF algorithm.

3. Bayesian Inference Using IBF Algorithm

3.1. The Prior and Conditional Distributions

In the Bayesian framework, suitable prior distributions must be adopted for the parameters. To let the data dominate the analysis and for simplicity, we use a set of independent non-informative priors:
$$\pi(\beta_j, \sigma_j^2) \propto \frac{1}{\sigma_j^2}, \qquad \pi(\lambda_1, \ldots, \lambda_g) \propto 1.$$
In fact, the prior of $(\lambda_1, \ldots, \lambda_g)$ is a Dirichlet distribution with all concentration parameters equal to 1, that is, a uniform prior over the simplex $\lambda_1 + \cdots + \lambda_g = 1$. These priors are improper and noninformative, but the resulting posterior distributions are proper and meaningful. Given the complete data $(y, G)$, the posterior density of $\theta$ is
$$\pi(\theta \mid y, G) \propto L(\theta \mid y, G) \left[ \prod_{j=1}^{g} \pi(\beta_j, \sigma_j^2) \right] \pi(\lambda) \propto L(\theta \mid y, G) \prod_{j=1}^{g} \frac{1}{\sigma_j^2}.$$
To implement IBF sampling in the FMNR model, we now present the required conditional posterior distributions, all of which are proportional to the complete posterior density $\pi(\theta \mid y, G)$.
1. Conditional distribution $\pi(G_i \mid \theta, y)$
For $i = 1, \ldots, n$, the conditional posterior distribution of $G_i = (G_{i1}, \ldots, G_{ig})^T$ is
$$\pi(G_{i1}, \ldots, G_{ig} \mid \theta, y) \propto \prod_{j=1}^{g} \left[ \lambda_j N(y_i; x_i^T \beta_j, \sigma_j^2) \right]^{G_{ij}}. \tag{1}$$
The right-hand side of (1) is the kernel of a multinomial probability function, so we obtain
$$G_i \mid \theta, y \sim \mathrm{Mult}(1, r_{i1}, \ldots, r_{ig}), \tag{2}$$
where
$$r_{ij} = \frac{\lambda_j N(y_i; x_i^T \beta_j, \sigma_j^2)}{\sum_{k=1}^{g} \lambda_k N(y_i; x_i^T \beta_k, \sigma_k^2)}, \tag{3}$$
and $\mathrm{Mult}(1, r_{i1}, \ldots, r_{ig})$ stands for the multinomial distribution with one trial and cell probabilities $(r_{i1}, \ldots, r_{ig})$; an R sketch of these conditional draws is given at the end of this subsection.
2. Conditional distribution $\pi(\beta_j \mid \sigma_j^2, y, G)$
The conditional density of $\beta_j$ is given by
$$\pi(\beta_j \mid \sigma_j^2, y, G) \propto \exp\left\{ -\frac{\sum_{i=1}^{n} G_{ij} (y_i - x_i^T \beta_j)^2}{2\sigma_j^2} \right\}.$$
Let $G_{(j)} = (G_{1j}, \ldots, G_{nj})^T$, $W_j = \mathrm{diag}(G_{(j)})$, $X = (x_1, \ldots, x_n)^T$, and
$$B_j^{-1} = X^T W_j X, \qquad b_j = B_j X^T W_j y;$$
then we have
$$\pi(\beta_j \mid \sigma_j^2, y, G_{(j)}) \propto \exp\left\{ -\frac{(\beta_j - b_j)^T B_j^{-1} (\beta_j - b_j)}{2\sigma_j^2} \right\},$$
and consequently
$$\beta_j \mid \sigma_j^2, y, G_{(j)} \sim N(b_j, \sigma_j^2 B_j). \tag{4}$$
3. Conditional distribution $\pi(\sigma_j^2 \mid y, G_{(j)})$
The conditional density of $\sigma_j^2$ is obtained as
$$\pi(\sigma_j^2 \mid y, G_{(j)}) \propto \left( \frac{1}{\sigma_j^2} \right)^{\frac{\sum_{i=1}^{n} G_{ij} - p}{2} + 1} \exp\left\{ -\frac{\sum_{i=1}^{n} G_{ij} (y_i - x_i^T b_j)^2}{2\sigma_j^2} \right\}.$$
Noticing that the right-hand side is the kernel of an inverse-Gamma density, we get
$$\sigma_j^2 \mid y, G_{(j)} \sim \mathrm{IG}\left( \frac{\sum_{i=1}^{n} G_{ij} - p}{2}, \; \frac{\sum_{i=1}^{n} G_{ij} (y_i - x_i^T b_j)^2}{2} \right). \tag{5}$$
4. Conditional distribution $\pi(\lambda \mid G)$
Finally, we obtain the conditional density of $\lambda = (\lambda_1, \ldots, \lambda_g)^T$, which is
$$\pi(\lambda_1, \ldots, \lambda_g \mid G) \propto \prod_{j=1}^{g} \lambda_j^{\sum_{i=1}^{n} G_{ij} + 1 - 1}.$$
The right-hand side is the kernel of a Dirichlet density, so
$$\lambda \mid G \sim D\left( \sum_{i=1}^{n} G_{i1} + 1, \ldots, \sum_{i=1}^{n} G_{ig} + 1 \right). \tag{6}$$
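For later use in the IBF sampler of Section 3.2, the conditional draws in (2)–(6) can be coded directly. The following is a minimal R sketch under illustrative conventions not taken from the paper (a response vector y, a design matrix X with a leading column of ones, a parameter list theta with components beta, sigma2, and lambda, and an n × g indicator matrix G); it assumes every component contains enough observations for $X^T W_j X$ to be invertible and for the inverse-Gamma shape in (5) to be positive. The inverse-Gamma draw is taken as the reciprocal of a Gamma variate, and the Dirichlet draw is built from normalized Gamma variates.

```r
# r_ij in (3): an n x g matrix of responsibilities whose rows sum to one
responsibilities <- function(y, X, theta) {
  g <- length(theta$lambda)
  dens <- sapply(1:g, function(j)
    theta$lambda[j] * dnorm(y, mean = drop(X %*% theta$beta[, j]), sd = sqrt(theta$sigma2[j])))
  dens / rowSums(dens)
}

# One Mult(1, r_i1, ..., r_ig) draw per observation, as in (2); returns an n x g 0/1 matrix
draw_G <- function(r) {
  t(apply(r, 1, function(p_i) rmultinom(1, size = 1, prob = p_i)))
}

# One draw of (beta, sigma^2, lambda) from the conditionals (4)-(6), given the indicators G
draw_theta_given_G <- function(y, X, G) {
  p <- ncol(X); g <- ncol(G)
  lam <- rgamma(g, shape = colSums(G) + 1, rate = 1)   # Dirichlet(colSums(G) + 1) via Gamma variates
  lambda <- lam / sum(lam)
  beta <- matrix(NA, p, g); sigma2 <- numeric(g)
  for (j in 1:g) {
    Wj <- diag(G[, j])                                 # W_j = diag(G_(j))
    Bj <- solve(t(X) %*% Wj %*% X)                     # B_j = (X' W_j X)^(-1)
    bj <- drop(Bj %*% t(X) %*% Wj %*% y)               # b_j = B_j X' W_j y
    shape <- (sum(G[, j]) - p) / 2                     # IG shape in (5)
    rate  <- sum(G[, j] * (y - drop(X %*% bj))^2) / 2  # IG scale in (5)
    sigma2[j] <- 1 / rgamma(1, shape = shape, rate = rate)
    beta[, j] <- bj + drop(t(chol(sigma2[j] * Bj)) %*% rnorm(p))  # N(b_j, sigma_j^2 B_j) in (4)
  }
  list(beta = beta, sigma2 = sigma2, lambda = lambda)
}
```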

3.2. IBF Sampler

Based on the expression
$$\pi(\theta \mid y) = \sum_{G} \pi(\theta \mid y, G)\, \pi(G \mid y),$$
and
$$\pi(\theta \mid y, G) = \pi(\beta, \sigma^2, \lambda \mid y, G) = \left[ \prod_{j=1}^{g} \pi(\beta_j \mid \sigma_j^2, y, G_{(j)})\, \pi(\sigma_j^2 \mid y, G_{(j)}) \right] \pi(\lambda \mid y, G),$$
in order to draw i.i.d. samples from $\pi(\theta \mid y)$ we should first obtain i.i.d. samples from $\pi(G \mid y)$. According to Tan et al. [15,16], and based on the inverse Bayes formula, the posterior distribution of $G$ can be expressed as
$$\pi(G \mid y) \propto \frac{\pi(G \mid y, \theta_0)}{\pi(\theta_0 \mid y, G)},$$
where $\theta_0$ is the posterior mode of $\pi(\theta \mid y)$ obtained by the EM algorithm. The idea of the IBF sampler in the EM framework is to use the EM algorithm to obtain $\pi(G \mid y, \theta_0)$, an optimal importance sampling density that best approximates $\pi(G \mid y)$; by then applying the SIR algorithm, i.i.d. samples can be generated approximately from $\pi(G \mid y)$.
Specifically, the IBF algorithm for the FMNR model consists of the following four steps:
Step 1. Based on (2), draw i.i.d. samples
$$G_i^{(l)} = (G_{i1}^{(l)}, \ldots, G_{ig}^{(l)})^T \sim \mathrm{Mult}(1, r_{i1}^0, \ldots, r_{ig}^0), \quad l = 1, \ldots, L,$$
for $i = 1, 2, \ldots, n$, where $r_{ij}^0$ is the same as $r_{ij}$ in (3) with $\theta$ replaced by $\theta_0$. Denote $G^{(l)} = (G_1^{(l)}, G_2^{(l)}, \ldots, G_n^{(l)})$.
Step 2. Calculate the weights
$$\omega_l = \frac{1 / \pi(\theta_0 \mid y, G^{(l)})}{\sum_{l'=1}^{L} 1 / \pi(\theta_0 \mid y, G^{(l')})}, \quad l = 1, 2, \ldots, L,$$
where $\pi(\theta_0 \mid y, G^{(l)})$ is computed as $\pi(\theta_0 \mid y, G)$ with $G$ replaced by $G^{(l)}$.
Step 3. Choose a subset of $\{G^{(l)}\}_{l=1}^{L}$ by resampling without replacement from the discrete distribution on $\{G^{(l)}\}_{l=1}^{L}$ with probabilities $\{\omega_l\}_{l=1}^{L}$, to obtain an i.i.d. sample of size $K\,(<L)$ approximately from $\pi(G \mid y)$, denoted by $\{G^{(l_k)}\}_{k=1}^{K}$.
Step 4. Based on (6), generate
$$\lambda^{(l_k)} \mid G^{(l_k)} \sim D\left( \sum_{i=1}^{n} G_{i1}^{(l_k)} + 1, \ldots, \sum_{i=1}^{n} G_{ig}^{(l_k)} + 1 \right).$$
For $j = 1, \ldots, g$, based on (5), generate
$$\sigma_j^{2(l_k)} \mid y, G_{(j)}^{(l_k)} \sim \mathrm{IG}\left( \frac{\sum_{i=1}^{n} G_{ij}^{(l_k)} - p}{2}, \; \frac{\sum_{i=1}^{n} G_{ij}^{(l_k)} \big(y_i - x_i^T b_j^{(l_k)}\big)^2}{2} \right),$$
and, based on (4), generate
$$\beta_j^{(l_k)} \mid \sigma_j^{2(l_k)}, y, G_{(j)}^{(l_k)} \sim N\left( b_j^{(l_k)}, \; \sigma_j^{2(l_k)} B_j^{(l_k)} \right),$$
where
$$G_{(j)}^{(l_k)} = (G_{1j}^{(l_k)}, \ldots, G_{nj}^{(l_k)})^T, \qquad W_j^{(l_k)} = \mathrm{diag}\big(G_{(j)}^{(l_k)}\big),$$
and
$$b_j^{(l_k)} = B_j^{(l_k)} X^T W_j^{(l_k)} y, \qquad B_j^{(l_k)} = \big( X^T W_j^{(l_k)} X \big)^{-1}.$$
Finally, $\{(\beta^{(l_k)}, \sigma^{2(l_k)}, \lambda^{(l_k)})\}_{k=1}^{K}$ are i.i.d. samples generated approximately from $\pi(\beta, \sigma^2, \lambda \mid y)$.
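The four steps can be assembled into a compact sampler. The following is a minimal R sketch that reuses the helper functions responsibilities(), draw_G(), and draw_theta_given_G() from the sketch in Section 3.1; it assumes y, X, and the EM posterior mode theta0 are already available (obtaining theta0 by EM is not shown), and the default values of L and K simply mirror the simulation settings of Section 4.

```r
ibf_fmnr <- function(y, X, theta0, L = 6000, K = 3000) {
  r0 <- responsibilities(y, X, theta0)          # r_ij^0: responsibilities evaluated at theta0

  # Step 1: L i.i.d. draws of the indicator matrix G from pi(G | y, theta0)
  G_list <- lapply(seq_len(L), function(l) draw_G(r0))

  # log pi(theta0 | y, G) up to a constant: complete likelihood times the prior prod_j 1/sigma_j^2
  log_post <- function(G) {
    val <- -sum(log(theta0$sigma2))
    for (j in seq_along(theta0$lambda)) {
      mu  <- drop(X %*% theta0$beta[, j])
      val <- val + sum(G[, j] * (log(theta0$lambda[j]) +
                                 dnorm(y, mu, sqrt(theta0$sigma2[j]), log = TRUE)))
    }
    val
  }

  # Step 2: weights omega_l proportional to 1 / pi(theta0 | y, G^(l)), normalized on the log scale
  lw <- -vapply(G_list, log_post, numeric(1))
  w  <- exp(lw - max(lw)); w <- w / sum(w)

  # Step 3: SIR, resampling K indices without replacement with probabilities omega_l
  idx <- sample(L, K, replace = FALSE, prob = w)

  # Step 4: for each retained G, draw (lambda, sigma^2, beta) from their conditionals
  lapply(idx, function(l) draw_theta_given_G(y, X, G_list[[l]]))
}
```

Posterior means and standard deviations of any parameter can then be read off directly from the returned list of draws, with no burn-in or convergence check.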
In this section, we have employed a non-informative prior setting for the FMNR model and developed an IBF sampler. We extend this procedure to the conjugate prior setting in the supplement.

4. Results

4.1. Simulation Studies

To investigate the performance of the proposed algorithm, we conduct simulations under different settings and use four criteria, namely the mean, the mean squared error, the mean absolute deviation, and the coverage probability, to evaluate the method in comparison with the classical EM algorithm and Gibbs sampling.
The data $\{(x_i, y_i), i = 1, \ldots, n\}$ are generated from the following mixture model:
$$y_i = \begin{cases} 5 - 5x_i + \varepsilon_i, & \text{if } z_i = 1; \\ -5 + 5x_i + \varepsilon_i, & \text{if } z_i = 2, \end{cases}$$
where $z_i$ is the group indicator of $y_i$ with $P(z_i = 1) = 0.5$. Here, $x_i$ is generated from the uniform distribution on the interval $(-1, 1)$, and $\varepsilon_i$ is the random error. We consider the following symmetric distributions for the error:
1. Normal distribution: $\varepsilon \sim N(0, 1)$;
2. Student's t distribution: $\varepsilon \sim t(3)$;
3. Laplace distribution: $\varepsilon \sim L(0, 1)$;
4. Logistic distribution: $\varepsilon \sim \mathrm{Logistic}(0, 1)$.
Among these error distributions, cases 2–4 are heavy-tailed symmetric distributions that are often used in the literature to mimic outlier situations. We use them to illustrate the robustness and effectiveness of the procedure, and we fit a 2-component mixture normal regression model to the simulated data with the IBF sampling algorithm, the EM algorithm, and Gibbs sampling. The prior setting is the same as in Section 3.1.
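For reference, the following is a minimal R sketch of this data-generating design for the normal error case; the seed is illustrative, and the other cases would replace rnorm with rt(n, 3), a Laplace generator, or rlogis(n).

```r
set.seed(2021)                 # illustrative seed, not taken from the paper
n   <- 100
x   <- runif(n, -1, 1)         # predictor drawn uniformly on (-1, 1)
z   <- rbinom(n, 1, 0.5) + 1   # group indicator with P(z = 1) = 0.5
eps <- rnorm(n)                # error case 1; e.g. rt(n, 3) or rlogis(n) for cases 2 and 4
y   <- ifelse(z == 1, 5 - 5 * x, -5 + 5 * x) + eps
X   <- cbind(1, x)             # design matrix with an intercept column
```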
To evaluate the finite-sample performance of the proposed method, the experiment is replicated 200 times for each error distribution, with sample sizes $n = 100$ and $n = 300$. In each replication we set $L = 6000$ and $K = 3000$, and thus obtain 3000 i.i.d. samples from the posterior distributions using the proposed non-iterative sampling algorithm. Similarly, when performing Gibbs sampling, we generate 6000 Gibbs samples, discard the first 3000 as burn-in, and use the remaining 3000 samples to estimate the parameters. We record the mean of these samples in the $i$th replication as $\hat{\theta}^{(i)}$, $i = 1, 2, \ldots, 200$, and calculate the mean (Mean), mean squared error (MSE), and mean absolute deviation (MAD) of $\{\hat{\theta}^{(i)}\}_{i=1}^{200}$, defined as
$$\mathrm{Mean} = \frac{1}{200} \sum_{i=1}^{200} \hat{\theta}^{(i)}, \qquad \mathrm{MSE} = \frac{1}{200} \sum_{i=1}^{200} \big(\hat{\theta}^{(i)} - \theta\big)^2, \qquad \mathrm{MAD} = \frac{1}{200} \sum_{i=1}^{200} \big|\hat{\theta}^{(i)} - \theta\big|.$$
In addition, we calculate the coverage probability (CP), the proportion of the 200 intervals with 95% confidence level that cover the true parameter values. The evaluation criteria for the EM algorithm are the same four statistics, with $\hat{\theta}^{(i)}$ denoting the posterior mode rather than the posterior mean in the $i$th replication. These four statistics are used to assess the performance of the three algorithms.
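Computing the four criteria from the replications is a one-liner each; a minimal sketch, where est, lo, and up are hypothetical 200 × d matrices of point estimates and 95% interval limits across replications, and truth is the length-d vector of true parameter values.

```r
Mean <- colMeans(est)                                     # average point estimate per parameter
MSE  <- colMeans(sweep(est, 2, truth)^2)                  # mean squared error
MAD  <- colMeans(abs(sweep(est, 2, truth)))               # mean absolute deviation
CP   <- colMeans(sweep(lo, 2, truth, "<=") & sweep(up, 2, truth, ">="))  # interval coverage
```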
Simulation results are presented in Table 1, Table 2 and Table 3. From these tables, we observe the following:
1. All three algorithms recover the true parameters successfully: most of the Means based on the 200 replications are very close to the corresponding true parameter values, and the MSEs and MADs are rather small. Moreover, almost all of the coverage probabilities are around 0.95, the nominal confidence level. We also see that as the sample size increases from 100 to 300, the MSEs and MADs decrease and remain stable within a small range;
2. In general, for a fixed sample size, the MSEs and MADs of the IBF algorithm are smaller than those of the EM algorithm and Gibbs sampling, which shows that the IBF sampling algorithm produces more accurate estimators.
Thus, from these simulation studies we conclude that the IBF sampling algorithm is comparable to, and in most cases more effective than, the EM algorithm and Gibbs sampling when modelling mixture regression data. Besides that, the IBF algorithm is much faster than the iterative Gibbs sampling; for example, when $n = 100$ the mean running time is 6 s for IBF and 10.39 s for Gibbs, and when $n = 300$ it is 19.58 s for IBF and 31.16 s for Gibbs.

4.2. Real Data Analysis

The tone perception data stem from an experiment of Cohen [23] on the perception of musical tones by musicians. In this study, a pure fundamental tone was played to a trained musician with electronically generated overtones added, determined by a stretch ratio. The predictor variable is the stretch ratio (X) and the response variable is the tuning ratio (Y), defined as the ratio of the adjusted tone to the fundamental tone; 150 pairs of observations were recorded for the same musician. The data are well suited to mixture regression modelling because two separate trends clearly emerge in the scatter plot (Figure 1). The dataset “tonedata” can be found in the R package mixtools authored by Benaglia et al. [24]. Many articles have analyzed this dataset within a mixture of mean regressions framework; see Cohen [23], Viele and Tong [25], Hunter and Young [26], and Young [27]. Recently, Song et al. [9] and Wu and Yao [28] analyzed the data using 2-component mixture median regression models from parametric and semi-parametric perspectives, respectively.
We now re-analyze these data with the following 2-component FMNR model:
$$Y \sim \lambda_1 N(\beta_{10} + \beta_{11} X, \sigma_1^2) + \lambda_2 N(\beta_{20} + \beta_{21} X, \sigma_2^2),$$
and implement the IBF algorithm in comparison with the EM algorithm and Gibbs sampling. For the EM algorithm, we take $\beta_{10} = 1$, $\beta_{11} = 1$, $\beta_{20} = 1$, $\beta_{21} = 1$, $\sigma_1^2 = 2$, $\sigma_2^2 = 1$, $\lambda_1 = 0.6$, $\lambda_2 = 0.4$ as initial values, and use the R functions mixreg and covmix in the R package mixreg (authored by Turner [29]) to estimate the parameters and compute the standard deviations. For the IBF algorithm, we set $K = 10{,}000$ and $L = 60{,}000$, and thus obtain 6000 posterior samples. Similarly, in Gibbs sampling we generate 10,000 Gibbs samples, discard the first 4000 as burn-in, and use the remaining 6000 samples to estimate the parameters.
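As an illustration of how the tone data and an EM fit can be obtained in R, the sketch below uses the mixtools package cited above [24]; it assumes the column names stretchratio and tuned and the regmixEM interface documented in that package, whereas the paper itself used the mixreg package for its EM results, so this is a companion sketch rather than a reproduction of Table 4.

```r
library(mixtools)                    # provides the tonedata set and regmixEM (Benaglia et al. [24])
data(tonedata)
x <- tonedata$stretchratio           # column names as used in the mixtools documentation
y <- tonedata$tuned

em_fit <- regmixEM(y, x, k = 2)      # EM fit of a 2-component mixture of linear regressions
round(em_fit$beta, 4)                # intercept and slope, one column per component
round(em_fit$sigma^2, 4)             # component error variances
round(em_fit$lambda, 4)              # mixing proportions

# Scatter plot with the two fitted regression lines (cf. Figures 1 and 2)
plot(x, y, xlab = "stretch ratio", ylab = "tuning ratio")
for (j in 1:2) abline(a = em_fit$beta[1, j], b = em_fit$beta[2, j], lty = j)
```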
Table 4 reports the estimated means and standard deviations of the parameters based on the three algorithms. Clearly, the IBF algorithm is comparable to the EM algorithm and Gibbs sampling, with only slight differences in means and standard deviations. Note that, although the EM algorithm is effective for parameter estimation in mixture models, its standard deviations are computed as the square roots of the diagonal elements of the inverse observed information matrix, which relies on the asymptotic theory of the MLE; when the sample size is small or moderate, these estimated standard deviations may be unreliable. Therefore, compared with the EM algorithm, we prefer IBF sampling, because the standard deviations can be estimated directly from the i.i.d. posterior samples without extra effort. Figure 2 shows the scatter plot with the fitted mixture regression lines based on IBF, which clearly describes the linear relationship between the tuning ratio and the stretch ratio in the two groups. In this figure, the observations are shown with different symbols and classified by comparing the mean of $\{G^{(l_k)}\}_{k=1}^{3000}$ with 0.5: when the $i$th element of the mean vector is greater than 0.5, the $i$th observation is assigned to group one. In the supplement, we present plots of the ergodic means of the posterior samples of all parameters based on the IBF sampler and the Gibbs sampler. These plots show that, for each parameter, the ergodic means based on the IBF algorithm converge much faster than those based on Gibbs sampling, which is not surprising because the IBF samples are generated i.i.d. from the posterior distributions. In this respect, IBF sampling outperforms the Gibbs sampler.
For illustration, the supplement also shows histograms with kernel density estimates based on the 6000 IBF samples and 6000 Gibbs samples (Figures S5–S8), with red lines denoting the posterior means. In general, the posterior distributions of each parameter based on the IBF algorithm and the Gibbs sampler are very similar. These plots vividly depict the posterior distributions of the parameters: for example, the distributions of the regression coefficients are nearly symmetric and resemble normal distributions, whereas the distributions of the variances $\sigma_1^2$ and $\sigma_2^2$ are both right-skewed with a long right tail, a feature that cannot be captured by the asymptotic normality theory for the MLE (or posterior mode).
Therefore, by utilizing i.i.d. posterior samples, the IBF sampling algorithm provides a way to estimate the posterior distributions of the parameters more accurately than the EM algorithm and Gibbs sampling.

4.3. Algorithm Selection

Several Bayesian model selection criteria are available for comparing different models fitted to the same data set; here we borrow the idea of model selection to help determine which sampling algorithm, the IBF algorithm or Gibbs sampling, performs better. The deviance information criterion (DIC), proposed by Spiegelhalter et al. [30], is used as a measure of fit and complexity. The deviance is defined as
$$D(\theta) = -2 \log L(\theta \mid y) = -2\, l(\theta \mid y),$$
where $L(\theta \mid y)$ is the likelihood function of the FMNR model given in Section 2 and $l(\theta \mid y)$ is the log-likelihood. The DIC statistic is defined as
$$\mathrm{DIC} = 2\bar{D} - \hat{D},$$
with $\bar{D} = \sum_{k=1}^{K} D(\theta^{(k)})/K$ and $\hat{D} = D(\hat{\theta})$. Here, $\{\theta^{(k)}\}_{k=1}^{K}$ are the posterior samples obtained by the IBF sampler or Gibbs sampling, and $\hat{\theta}$ is the vector of estimated posterior means.
In the Bayesian framework, the Akaike information criterion (AIC) (Brooks [31]) and the Bayesian information criterion (BIC) (Carlin and Louis [32]) can be estimated by
$$\mathrm{AIC} = \bar{D} + 2s, \qquad \mathrm{BIC} = \bar{D} + s \log(n),$$
where $s$ is the number of model parameters. The algorithm yielding smaller DIC, AIC, and BIC values is regarded as fitting the model better.
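These criteria are easy to compute from posterior draws. The following is a minimal R sketch reusing loglik_fmnr() from the Section 2 sketch; draws is a hypothetical list of parameter lists as returned by the sampler sketch in Section 3.2, and the parameter count s shown is just one possible bookkeeping for a 2-component model with intercept and slope.

```r
# Deviance D(theta) = -2 log L(theta | y) evaluated at each posterior draw
D    <- vapply(draws, function(th) -2 * loglik_fmnr(y, X, th), numeric(1))
Dbar <- mean(D)                                   # posterior mean deviance

# Deviance at the estimated posterior means
theta_hat <- list(
  beta   = Reduce("+", lapply(draws, `[[`, "beta"))   / length(draws),
  sigma2 = Reduce("+", lapply(draws, `[[`, "sigma2")) / length(draws),
  lambda = Reduce("+", lapply(draws, `[[`, "lambda")) / length(draws))
Dhat <- -2 * loglik_fmnr(y, X, theta_hat)

s <- 2 * 2 + 2 + 1   # g*p regression coefficients + g variances + (g - 1) free mixing weights
c(DIC = 2 * Dbar - Dhat, AIC = Dbar + 2 * s, BIC = Dbar + s * log(length(y)))
```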
For the tone data study, the estimated criteria based on the FMNR model are given in Table 5. For all three criteria, the values based on IBF are slightly smaller than their Gibbs counterparts, which suggests a slight advantage of the IBF sampler over the Gibbs sampler, although the difference is not substantial.

5. Discussion

This paper proposes an effective Bayesian inference procedure for fitting FMNR models. Using the incomplete data structure introduced by a group of multinomially distributed mixture component variables, a non-iterative sampling algorithm (the IBF sampler) under a non-informative prior setting is developed. The algorithm combines the EM algorithm, the inverse Bayes formula, and the SIR algorithm to obtain i.i.d. samples approximately from the observed posterior distributions of the parameters. Simulation studies show that the IBF algorithm estimates the parameters more accurately than the EM algorithm and Gibbs sampling, and runs much faster than Gibbs sampling. The real data analysis shows the usefulness of the methodology and also illustrates the advantages of the IBF sampler: for example, the ergodic means of the IBF samples converge much faster than those of the Gibbs samples, and density estimation based on the i.i.d. IBF samples is more reliable than the asymptotic normal approximation from the EM algorithm. The IBF sampling algorithm can be viewed as an important supplement to the classical EM algorithm, Gibbs sampling, and other missing data algorithms. The procedure can be extended to other finite mixture regression models based on different error distributions, such as mixture Student's t regression; in that case, the Student's t distribution should first be represented as a scale mixture of normal distributions with a gamma-distributed latent mixing variable, and the IBF algorithm should be carried out twice. In this paper the number of components is assumed known; when g is unknown, the reversible-jump MCMC method (Green [33], Richardson and Green [34]) may be used to estimate g, and this point is left for future work.

Supplementary Materials

The following are available online at https://www.mdpi.com/2227-7390/9/6/590/s1.

Author Contributions

Conceptualization, S.A. and F.Y.; methodology, S.A. and F.Y.; investigation, S.A. and F.Y.; software, S.A.; writing-original draft preparation, S.A. and F.Y.; writing-review and editing, S.A. and F.Y.; Both authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Shandong Province of China under grant ZR2019MA026.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The “tonedata” used in this paper are available from the free R package mixtools authored by Benaglia et al. [24].

Acknowledgments

The authors would like to express their gratitude to Benaglia et al. [24] for providing the “tonedata” in the free R package mixtools.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Quandt, R.; Ramsey, J. Estimating mixtures of normal distributions and switching regression. J. Am. Stat. Assoc. 1978, 73, 730–738. [Google Scholar] [CrossRef]
  2. De Veaux, R.D. Mixtures of linear regressions. Comput. Stat. Data Anal. 1989, 8, 227–245. [Google Scholar] [CrossRef]
  3. Jones, P.N.; McLachlan, G.J. Fitting finite mixture models in a regression context. Aust. J. Stat. 1992, 34, 233–240. [Google Scholar] [CrossRef]
  4. Turner, T.R. Estimating the propagation rate of a viral infection of potato plants via mixtures of regressions. J. R. Stat. Soc. C-Appl. 2000, 49, 371–384. [Google Scholar] [CrossRef]
  5. McLachlan, G.J.; Peel, D. Finite Mixture Models, 1st ed.; Wiley: New York, NY, USA, 2000. [Google Scholar]
  6. Frühwirth-Schnatter, S. Finite Mixture and Markov Switching Models, 1st ed.; Springer: New York, NY, USA, 2006. [Google Scholar]
  7. Faria, S.; Soromenho, G. Fitting mixtures of linear regressions. J. Stat. Comput. Simul. 2010, 80, 201–225. [Google Scholar] [CrossRef]
  8. Peel, D.; McLachlan, G.J. Robust mixture modelling using the t distribution. Stat. Comput. 2000, 10, 339–348. [Google Scholar] [CrossRef]
  9. Song, W.X.; Yao, W.X.; Xing, Y.R. Robust mixture regression model fitting by Laplace distribution. Comput. Stat. Data Anal. 2014, 71, 128–137. [Google Scholar] [CrossRef] [Green Version]
  10. Song, W.X.; Yao, W.X.; Xing, Y.R. Robust mixture multivariate linear regression by multivariate Laplace distribution. Stat. Probab. Lett. 2017, 43, 2162–2172. [Google Scholar]
  11. Zeller, C.B.; Cabral, C.R.B.; Lachos, V.H. Robust mixture regression modelling based on scale mixtures of skew-normal distributions. Test 2016, 25, 375–396. [Google Scholar] [CrossRef]
  12. Tian, Y.Z.; Tang, M.L.; Tian, M.Z. A class of finite mixture of quantile regressions with its applications. J. Appl. Stat. 2016, 43, 1240–1252. [Google Scholar] [CrossRef]
  13. Yang, F.K.; Shan, A.; Yuan, H.J. Gibbs sampling for mixture quantile regression based on asymmetric Laplace distribution. Commun. Stat. Simul. Comput. 2019, 48, 1560–1573. [Google Scholar] [CrossRef]
  14. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 1977, 39, 1–38. [Google Scholar]
  15. Tan, M.; Tian, G.L.; Ng, K. A non-iterative sampling algorithm for computing posteriors in the structure of em-type algorithms. Stat. Sin. 2003, 43, 2162–2172. [Google Scholar]
  16. Tan, M.; Tian, G.L.; Ng, K. Bayesian Missing Data Problems: EM, Data Augmentation and Noniterative Computation, 1st ed.; Chapman & Hall/CRC: New York, NY, USA, 2010. [Google Scholar]
  17. Yuan, H.J.; Yang, F.K. A non-iterative Bayesian sampling algorithm for censored Student-t linear regression models. J. Stat. Comput. Simul. 2016, 86, 3337–3355. [Google Scholar]
  18. Yang, F.K.; Yuan, H.J. A non-iterative posterior sampling algorithm for linear quantile regression model. Commun. Stat. Simul. Comput. 2017, 46, 5861–5878. [Google Scholar] [CrossRef]
  19. Yang, F.K.; Yuan, H.J. A Non-iterative Bayesian Sampling Algorithm for Linear Regression Models with Scale Mixtures of Normal Distributions. Comput. Econ. 2017, 45, 579–597. [Google Scholar] [CrossRef]
  20. Tsionas, M.G. A non-iterative (trivial) method for posterior inference in stochastic volatility models. Stat. Probab. Lett. 2017, 126, 83–87. [Google Scholar] [CrossRef] [Green Version]
  21. Rubin, D.B. Comments on “The calculation of posterior distributions by data augmentation,” by M.A. Tanner and W.H. Wong. J. Am. Statist. Assoc. 1987, 82, 543–546. [Google Scholar]
  22. Rubin, D.B. Using the SIR algorithm to simulate posterior distributions. In Bayesian Statistics; Bernardo, J.M., DeGroot, M.H., Lindley, D.V., Smith, A.F.M., Eds.; Oxford University Press: Oxford, UK, 1988; Volume 3, pp. 395–402. [Google Scholar]
  23. Cohen, E. The Influence of Nonharmonic Partials on Tone Perception. Ph.D. Thesis, Stanford University, Stanford, CA, USA, 1980. [Google Scholar]
  24. Benaglia, T.; Chauveau, D.; Hunter, D.R.; Young, D.S. mixtools: An R package for analyzing finite mixture models. J. Stat. Softw. 2009, 32, 1–19. [Google Scholar] [CrossRef] [Green Version]
  25. Viele, K.; Tong, B. Modeling with mixtures of linear regressions. Stat. Comput. 2002, 12, 315–330. [Google Scholar] [CrossRef]
  26. Hunter, D.R.; Young, D.S. Semiparametric Mixtures of regressions. J. Nonparametr. Stat. 2012, 24, 19–38. [Google Scholar] [CrossRef] [Green Version]
  27. Young, D.S. Mixtures of regressions with change points. Stat. Comput. 2014, 24, 265–281. [Google Scholar] [CrossRef]
  28. Wu, Q.; Yao, W.X. Mixtures of quantile regressions. Comput. Stat. Data Anal. 2016, 93, 162–176. [Google Scholar] [CrossRef] [Green Version]
  29. Turner, R. Mixreg: Functions to Fit Mixtures of Regressions. R Package Version 0.0-6. 2018. Available online: http://CRAN.R-project.org/package=mixreg (accessed on 30 March 2018).
  30. Spiegelhalter, D.J.; Best, N.G.; Carlin, B.P.; Van Der Linde, A. Bayesian measures of model complexity and fit. J. R. Stat. Soc. B 2002, 64, 583–639. [Google Scholar] [CrossRef] [Green Version]
  31. Brooks, S.P. Discussion on the paper by Spiegelhalter, Best, Carlin, and van der Linde. J. R. Stat. Soc. B 2002, 64, 616–618. [Google Scholar]
  32. Carlin, B.P.; Louis, T.A. Bayes and Empirical Bayes Methods for Data Analysis, 2nd ed.; Chapman & Hall/CRC: Boca Raton, FL, USA, 2000. [Google Scholar]
  33. Green, P.J. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 1995, 82, 711–732. [Google Scholar] [CrossRef]
  34. Richardson, S.; Green, P.J. On Bayesian analysis of mixtures with an unknown number of components. J. R. Stat. Soc. B 1997, 59, 731–784. [Google Scholar] [CrossRef]
Figure 1. Scatter plot of tone perception data.
Figure 2. Scatter plot of tone perception data with estimated mixture regression lines by the IBF algorithm.
Table 1. Monte Carlo simulation results for IBF sampler.
Distribution | Parameter | IBF (n = 100): Mean / MSE / MAD / CP | IBF (n = 300): Mean / MSE / MAD / CP
Normal | β10 | 5.0105 / 0.0225 / 0.1160 / 0.93 | 4.9971 / 0.0069 / 0.0664 / 0.96
Normal | β11 | −5.000 / 0.0748 / 0.2202 / 0.93 | −4.9983 / 0.0250 / 0.1228 / 0.94
Normal | β20 | −4.9940 / 0.0214 / 0.1171 / 0.965 | −5.0061 / 0.0071 / 0.0662 / 0.93
Normal | β21 | 4.9949 / 0.0685 / 0.2053 / 0.940 | 5.0240 / 0.0230 / 0.1209 / 0.95
Normal | λ1 | 0.4937 / 0.0025 / 0.0411 / 0.965 | 0.4968 / 0.0008 / 0.0230 / 0.94
t(3) | β10 | 5.0110 / 0.0574 / 0.1884 / 0.96 | 5.0206 / 0.0184 / 0.1100 / 0.96
t(3) | β11 | −5.0290 / 0.2251 / 0.3545 / 0.91 | −5.0501 / 0.0648 / 0.1975 / 0.955
t(3) | β20 | −4.9817 / 0.0549 / 0.1785 / 0.96 | −5.0056 / 0.0165 / 0.1048 / 0.985
t(3) | β21 | 5.0243 / 0.2083 / 0.3496 / 0.95 | 5.0348 / 0.0852 / 0.2284 / 0.92
t(3) | λ1 | 0.4973 / 0.0032 / 0.0401 / 0.95 | 0.5005 / 0.0010 / 0.0262 / 0.93
Logistic(0,1) | β10 | 4.9907 / 0.0782 / 0.2215 / 0.945 | 5.0064 / 0.0327 / 0.1443 / 0.940
Logistic(0,1) | β11 | −4.9812 / 0.2347 / 0.3890 / 0.905 | −4.9892 / 0.1026 / 0.2491 / 0.905
Logistic(0,1) | β20 | −5.0054 / 0.0752 / 0.2187 / 0.955 | −5.0053 / 0.0314 / 0.1428 / 0.915
Logistic(0,1) | β21 | 4.9892 / 0.2388 / 0.3838 / 0.915 | 4.9810 / 0.0890 / 0.2308 / 0.920
Logistic(0,1) | λ1 | 0.4917 / 0.0035 / 0.0477 / 0.92 | 0.4972 / 0.0009 / 0.0248 / 0.955
Laplace(0,1) | β10 | 5.0182 / 0.0420 / 0.1631 / 0.95 | 5.0119 / 0.0138 / 0.0950 / 0.97
Laplace(0,1) | β11 | −4.9800 / 0.1385 / 0.2799 / 0.93 | −5.0029 / 0.0540 / 0.1856 / 0.935
Laplace(0,1) | β20 | −5.0161 / 0.0475 / 0.1644 / 0.915 | −5.0133 / 0.0160 / 0.0986 / 0.925
Laplace(0,1) | β21 | 4.9743 / 0.1691 / 0.3163 / 0.925 | 5.0014 / 0.0428 / 0.1705 / 0.950
Laplace(0,1) | λ1 | 0.4996 / 0.0021 / 0.0429 / 0.95 | 0.4992 / 0.0010 / 0.0231 / 0.94
Table 2. Monte Carlo simulation results for Gibbs sampling.
Distribution | Parameter | Gibbs (n = 100): Mean / MSE / MAD / CP | Gibbs (n = 300): Mean / MSE / MAD / CP
Normal | β10 | 5.0060 / 0.0228 / 0.1213 / 0.955 | 5.0031 / 0.0076 / 0.0707 / 0.96
Normal | β11 | −5.0047 / 0.0871 / 0.2381 / 0.940 | −5.0024 / 0.0313 / 0.1413 / 0.93
Normal | β20 | −5.000 / 0.0268 / 0.1311 / 0.945 | −4.9943 / 0.0073 / 0.0683 / 0.95
Normal | β21 | 5.0356 / 0.0810 / 0.2272 / 0.950 | 5.0079 / 0.0265 / 0.1281 / 0.94
Normal | λ1 | 0.4989 / 0.0024 / 0.0392 / 0.965 | 0.5008 / 0.0008 / 0.0238 / 0.955
t(3) | β10 | 4.9670 / 0.0732 / 0.1919 / 0.955 | 4.9724 / 0.0616 / 0.1909 / 0.975
t(3) | β11 | −5.0424 / 0.2138 / 0.3486 / 0.955 | −5.0850 / 0.1967 / 0.3215 / 0.960
t(3) | β20 | −4.9766 / 0.0789 / 0.2105 / 0.965 | −4.9761 / 0.0664 / 0.1928 / 0.95
t(3) | β21 | 5.1234 / 0.2851 / 0.4129 / 0.955 | 5.0654 / 0.2010 / 0.3415 / 0.935
t(3) | λ1 | 0.5001 / 0.0032 / 0.0449 / 0.92 | 0.5006 / 0.0027 / 0.0408 / 0.955
Logistic(0,1) | β10 | 4.9644 / 0.0861 / 0.2355 / 0.960 | 4.9937 / 0.0246 / 0.1264 / 0.960
Logistic(0,1) | β11 | −5.0682 / 0.3350 / 0.4700 / 0.935 | −5.0430 / 0.0872 / 0.2344 / 0.925
Logistic(0,1) | β20 | −4.9775 / 0.0894 / 0.2444 / 0.96 | −4.9958 / 0.0367 / 0.1483 / 0.915
Logistic(0,1) | β21 | 5.0036 / 0.2960 / 0.4289 / 0.955 | 4.9960 / 0.0979 / 0.2489 / 0.945
Logistic(0,1) | λ1 | 0.4983 / 0.0031 / 0.0437 / 0.93 | 0.50497 / 0.0009 / 0.0244 / 0.97
Laplace(0,1) | β10 | 5.0046 / 0.0447 / 0.1645 / 0.950 | 5.0094 / 0.0145 / 0.0953 / 0.965
Laplace(0,1) | β11 | −5.0263 / 0.1370 / 0.2893 / 0.975 | −5.0065 / 0.0598 / 0.1921 / 0.940
Laplace(0,1) | β20 | −4.9849 / 0.0395 / 0.1561 / 0.975 | −5.0019 / 0.0137 / 0.0940 / 0.975
Laplace(0,1) | β21 | 5.0549 / 0.1705 / 0.3376 / 0.950 | 4.9823 / 0.0489 / 0.1717 / 0.950
Laplace(0,1) | λ1 | 0.5050 / 0.0028 / 0.0438 / 0.96 | 0.4983 / 0.0009 / 0.0237 / 0.955
Table 3. Monte Carlo simulation results for the EM algorithm.
Distribution | Parameter | EM (n = 100): Mean / MSE / MAD / CP | EM (n = 300): Mean / MSE / MAD / CP
Normal | β10 | 4.9921 / 0.0226 / 0.1199 / 0.930 | 5.0195 / 0.0278 / 0.1346 / 0.955
Normal | β11 | −4.9789 / 0.0753 / 0.2203 / 0.935 | −4.9964 / 0.0227 / 0.1214 / 0.960
Normal | β20 | −5.0097 / 0.0243 / 0.1244 / 0.93 | −4.9959 / 0.0078 / 0.0684 / 0.935
Normal | β21 | 4.9887 / 0.0690 / 0.2045 / 0.95 | 5.0009 / 0.0236 / 0.1246 / 0.960
Normal | λ1 | 0.5020 / 0.0024 / 0.0386 / 0.945 | 0.4984 / 0.0008 / 0.0241 / 0.955
t(3) | β10 | 4.9919 / 0.0569 / 0.1889 / 0.92 | 5.0245 / 0.0206 / 0.1184 / 0.945
t(3) | β11 | −4.9531 / 0.2392 / 0.3826 / 0.89 | −0.5033 / 0.0837 / 0.2219 / 0.905
t(3) | β20 | −4.9900 / 0.0694 / 0.2001 / 0.915 | −4.9987 / 0.0203 / 0.1086 / 0.935
t(3) | β21 | 5.0178 / 0.2087 / 0.3505 / 0.920 | 5.0320 / 0.0876 / 0.2336 / 0.915
t(3) | λ1 | 0.4970 / 0.0032 / 0.0449 / 0.925 | 0.4972 / 0.0010 / 0.0261 / 0.96
Logistic(0,1) | β10 | 5.0291 / 0.0786 / 0.2229 / 0.925 | 5.0195 / 0.0278 / 0.1346 / 0.955
Logistic(0,1) | β11 | −4.9373 / 0.2377 / 0.3913 / 0.945 | −5.0232 / 0.0992 / 0.2514 / 0.915
Logistic(0,1) | β20 | −5.0034 / 0.0791 / 0.2233 / 0.925 | −5.0061 / 0.0276 / 0.1310 / 0.950
Logistic(0,1) | β21 | 4.9162 / 0.2440 / 0.3903 / 0.940 | 4.9769 / 0.0901 / 0.2316 / 0.935
Logistic(0,1) | λ1 | 0.4962 / 0.0035 / 0.0487 / 0.95 | 0.5026 / 0.0009 / 0.0247 / 0.965
Laplace(0,1) | β10 | 4.9928 / 0.0425 / 0.1638 / 0.940 | 4.9982 / 0.0177 / 0.1076 / 0.935
Laplace(0,1) | β11 | −4.9637 / 0.1714 / 0.3282 / 0.895 | −4.9834 / 0.0459 / 0.1749 / 0.970
Laplace(0,1) | β20 | −5.0185 / 0.0486 / 0.1697 / 0.895 | −5.0125 / 0.0133 / 0.0926 / 0.965
Laplace(0,1) | β21 | 5.0064 / 0.1831 / 0.3334 / 0.925 | 4.9880 / 0.0525 / 0.1795 / 0.940
Laplace(0,1) | λ1 | 0.4987 / 0.0031 / 0.0445 / 0.95 | 0.4957 / 0.0009 / 0.0255 / 0.955
Table 4. Estimates with EM, Gibbs, and IBF in the tone data analysis.
Algorithm | Estimate | β10 | β11 | β20 | β21 | σ1² | σ2² | λ1 | λ2
EM | Mean | 1.9163 | 0.0425 | −0.0192 | 0.9922 | 0.0021 | 0.0176 | 0.6977 | 0.3022
EM | Sd | 0.0226 | 0.0102 | 0.1021 | 0.0441 | 0.0003 | 0.0041 | 0.0484 | 0.0484
IBF | Mean | 1.9162 | 0.0426 | −0.0211 | 0.9930 | 0.0196 | 0.0022 | 0.6951 | 0.3048
IBF | Sd | 0.0228 | 0.0103 | 0.1072 | 0.0462 | 0.0048 | 0.0003 | 0.0444 | 0.0444
Gibbs | Mean | 1.9162 | 0.0427 | −0.0198 | 0.9921 | 0.0202 | 0.0022 | 0.6983 | 0.3017
Gibbs | Sd | 0.0233 | 0.0105 | 0.1124 | 0.0481 | 0.0055 | 0.0003 | 0.0475 | 0.0475
Table 5. Model selection criteria for the FMNR model with the IBF sampler and Gibbs algorithm in the tone data analysis.
Algorithm | DIC | AIC | BIC
IBF | −268.9490 | −259.5399 | −235.4548
Gibbs | −268.4472 | −259.2297 | −235.1446
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
