Skew-Normal Inflated Models: Mathematical Characterization and Applications to Medical Data with Excess of Zeros and Ones
Abstract
1. Introduction
2. Modeling with Skew Distributions
2.1. Fundamental Concepts of Regression and Link Functions
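- Logit link function: $g(\mu) = \log\{\mu/(1-\mu)\}$.
- Probit link function: $g(\mu) = \Phi^{-1}(\mu)$, where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function (CDF).
- Log–log link function: $g(\mu) = -\log\{-\log(\mu)\}$.
- Complementary log–log link function: $g(\mu) = \log\{-\log(1-\mu)\}$.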
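The four link functions listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the argument `mu` is assumed to lie strictly inside (0, 1), and the log–log link is taken with the sign convention −log(−log μ), which is one common choice.

```python
import math
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1), used by the probit link.
_STD_NORMAL = NormalDist()


def logit(mu: float) -> float:
    """Logit link: log(mu / (1 - mu))."""
    return math.log(mu / (1.0 - mu))


def probit(mu: float) -> float:
    """Probit link: inverse of the standard normal CDF."""
    return _STD_NORMAL.inv_cdf(mu)


def loglog(mu: float) -> float:
    """Log-log link, assumed convention: -log(-log(mu))."""
    return -math.log(-math.log(mu))


def cloglog(mu: float) -> float:
    """Complementary log-log link: log(-log(1 - mu))."""
    return math.log(-math.log(1.0 - mu))


# Inverse links map a linear predictor eta on the real line back to (0, 1).
def inv_logit(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta: float) -> float:
    return _STD_NORMAL.cdf(eta)


def inv_loglog(eta: float) -> float:
    return math.exp(-math.exp(-eta))


def inv_cloglog(eta: float) -> float:
    return 1.0 - math.exp(-math.exp(eta))


# Pair each link with its inverse so a model can select one by name.
LINKS = {
    "logit": (logit, inv_logit),
    "probit": (probit, inv_probit),
    "loglog": (loglog, inv_loglog),
    "cloglog": (cloglog, inv_cloglog),
}

if __name__ == "__main__":
    for name, (g, g_inv) in LINKS.items():
        eta = g(0.3)
        print(name, eta, g_inv(eta))
```

The logit and probit links are symmetric around μ = 0.5, whereas the log–log and complementary log–log links are not, which is why the asymmetric links are often considered when the response piles up near one end of the unit interval.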
2.2. Skew-Normal Distribution and Its Modeling
2.3. Single and Doubly Censored Data
2.4. Determination of Censoring Thresholds
3. Regression Models for Unit Interval Data with Inflation
3.1. Complementary Log–Log Unit Skew-Normal Regression Model
3.2. Probit Unit Skew-Normal Regression Model
3.3. Logit Unit Skew-Normal Regression Model
3.4. Information Matrices in Skew-Normal Models
4. Unit Skew-Normal Zero–One Inflated Regression Models
4.1. Formulation of the Skew-Normal Zero–One Inflated Model
4.2. Parameter Estimation in the SNZOI Model
5. Empirical Applications
5.1. Model Selection Criteria
5.2. Case Study 1: Doubly Censored Data
Algorithm 1 Case Study 1: Doubly censored data
5.3. Case Study 2: One-Inflated Data
Algorithm 2 Case Study 2: One-inflated data
5.4. Computational Costs of Algorithms
- Model fitting: Fitting models such as the L-LSNZOI and P-LSNZOI structures requires maximizing the log-likelihood function by iterative numerical optimization. Convergence typically takes many iterations (often 50 to 200), especially when the likelihood surface is complex and the model has many parameters. Each iteration recomputes the likelihood function and its gradient, so the computational cost grows with the sample size n, the number of parameters p, and the number of iterations r.
- Residual analysis: Computing martingale residuals and their diagnostic plots involves extensive computation. Generating simulated envelopes through bootstrapping, for instance, requires re-sampling the data repeatedly and re-fitting the model to each bootstrap sample, which is computationally expensive. If B bootstrap samples are used, the cost of this step is roughly B times that of a single model fit.
- Model selection criteria: Computing AIC and AICc requires the log-likelihood value at the estimated parameters, that is, one evaluation of the likelihood function at these estimates. This step is less computationally intensive than the fitting process, but it still adds to the overall computational burden; its cost is approximately that of a single likelihood evaluation.
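To make the last item concrete, the following sketch computes AIC and AICc from a maximized log-likelihood using their standard definitions, AIC = 2k − 2ℓ and AICc = AIC + 2k(k + 1)/(n − k − 1). The function names and the numbers in the usage lines are hypothetical and are not taken from the case studies.

```python
def aic(loglik: float, k: int) -> float:
    """Akaike information criterion for a model with k estimated parameters."""
    return 2.0 * k - 2.0 * loglik


def aicc(loglik: float, k: int, n: int) -> float:
    """Small-sample corrected AIC for n observations and k parameters."""
    if n - k - 1 <= 0:
        # The correction term is undefined unless n exceeds k + 1.
        raise ValueError("AICc requires n > k + 1")
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)


if __name__ == "__main__":
    # Hypothetical values: maximized log-likelihood, parameter count, sample size.
    loglik, k, n = -150.0, 5, 100
    print("AIC :", aic(loglik, k))
    print("AICc:", aicc(loglik, k, n))
```

Once the log-likelihood at the estimates is available, both criteria are simple arithmetic, which is consistent with this step being much cheaper than the fitting itself.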
6. Discussion
6.1. Key Findings and Insights
6.2. Comparison with Previous Studies
6.3. Model Limitations
6.4. Directions for Future Research
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
| Parameter | CLL-SNZOI | CLL-LSNZOI | CLL-DCLSN | P-LSNZOI | L-LSNZOI | L-DCLSN |
|---|---|---|---|---|---|---|
| | −0.4405 | 0.6337 | −3.9122 | 0.6337 | 0.6337 | 0.7301 |
| | (0.4174) | (0.7408) | (0.8863) | (0.7408) | (0.7408) | (0.8914) |
| | 0.0221 | −0.0376 | −0.0764 | −0.0376 | −0.0376 | −0.0957 |
| | (0.0075) | (0.0135) | (0.0160) | (0.0135) | (0.0135) | (0.0161) |
| | −2.2137 | −2.2137 | −2.1137 | −2.1123 | −1.2683 | −1.7604 |
| | (1.2252) | (1.2252) | (0.5710) | (1.0552) | (1.2515) | (0.7648) |
| | −1.5892 | −1.5892 | −0.4207 | −1.4316 | −1.1768 | −0.4990 |
| | (0.7523) | (0.7523) | (0.1805) | (0.5520) | (0.4923) | (0.2381) |
| | 0.0509 | 0.0509 | 0.0274 | 0.0533 | 0.0517 | 0.0291 |
| | (0.0260) | (0.0260) | (0.0101) | (0.0230) | (0.0218) | (0.0131) |
| | −2.8527 | −8.0316 | −8.0886 | −8.0316 | −8.0316 | −8.3703 |
| | (1.9217) | (2.3145) | (1.4958) | (2.3145) | (2.3145) | (1.4941) |
| | 0.0440 | 0.0788 | −0.0688 | 0.0788 | 0.0788 | 0.0324 |
| | (0.0274) | (0.0358) | (0.0236) | (0.0358) | (0.0358) | (0.0236) |
| | 0.7671 | 0.6453 | 0.3023 | 0.6161 | 0.3177 | 0.3198 |
| | (0.2731) | (0.2279) | (0.0545) | (0.2362) | (0.0477) | (0.0667) |
| | −1.9835 | −1.9835 | −0.7232 | −1.7782 | −1.6450 | −0.9430 |
| | (0.9456) | (0.9456) | (0.0523) | (0.7634) | (0.4274) | (0.1660) |
| AIC | 310.13 | 309.94 | 325.34 | 308.84 | 313.53 | 326.40 |
| AICc | 312.91 | 312.73 | 328.12 | 311.63 | 316.32 | 329.47 |

| Parameter | BZI | P-LNZOI | CLL-DCLN | P-DCLN | CLL-LNZOI |
|---|---|---|---|---|---|
| | 0.6337 | 0.6337 | 0.7768 | 1.1731 | 0.6337 |
| | (0.7408) | (0.7408) | (0.7821) | (2.1798) | (0.7408) |
| | −0.0376 | −0.0376 | −0.1791 | −0.2042 | −0.0376 |
| | (0.0135) | (0.0135) | (0.0139) | (0.0450) | (0.0135) |
| | −1.3885 | −2.7994 | −3.2301 | −2.0330 | −2.8949 |
| | (0.3957) | (1.0573) | (0.5903) | (0.4353) | (1.1453) |
| | −0.5366 | −1.1257 | −0.5926 | −0.4060 | −1.3134 |
| | (0.1613) | (0.4304) | (0.2087) | (0.4905) | (0.4387) |
| | 0.0217 | 0.0452 | 0.0380 | 0.0263 | 0.0393 |
| | (0.0068) | (0.0186) | (0.0106) | (0.0091) | (0.0194) |
| | −8.0316 | −8.0316 | −8.4042 | −11.0378 | −8.0316 |
| | (2.3153) | (2.3145) | (2.8866) | (1.3108) | (2.3145) |
| | 0.0788 | 0.0788 | −0.1049 | 0.0611 | 0.0788 |
| | (0.0358) | (0.0358) | (0.0449) | (0.0210) | (0.0358) |
| | 0.0903 | 0.3250 | 0.2649 | 0.2646 | 0.3096 |
| | (0.0652) | (0.0418) | (0.1012) | (0.0126) | (0.0796) |
| AIC | 311.70 | 313.80 | 324.33 | 323.51 | 316.07 |
| AICc | 314.58 | 316.45 | 326.97 | 326.15 | 318.71 |

Parameter | BOI | L-LSNZOI | P-LSNZOI | L-DCLSN | Tobit |
---|---|---|---|---|---|
−0.0086 | −3.3573 | −2.8368 | −3.3573 | 0.2968 | |
(0.3339) | (1.0938) | (1.5888) | (1.0938) | (0.0897) | |
−0.2828 | 0.1203 | 0.1036 | 0.1203 | −0.1003 | |
(0.1513) | (0.0351) | (0.0489) | (0.0351) | (0.0393) | |
0.0182 | 0.0104 | ||||
(0.0061) | (0.0015) | ||||
−6.1599 | −6.1599 | −5.7350 | 3.6472 | ||
(1.0003) | (0.9992) | (1.0661) | (0.5480) | ||
0.0853 | 0.0853 | 0.0756 | −0.0503 | ||
(0.0166) | (0.0165) | (0.0174) | (0.0092) | ||
1.3112 | 0.3815 | 0.3666 | 0.3815 | ||
(0.0868) | (0.0215) | (0.0180) | (0.0215) | −1.3046 | |
−7.7660 | −8.1582 | −7.7660 | 0.0493 | ||
(3.1311) | (3.4725) | (3.1311) | |||
AIC | 170.67 | 167.74 | 172.65 | 166.85 | 174.29 |
AICc | 173.07 | 170.08 | 175.04 | 169.24 | 176.50 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Martínez-Flórez, G.; Tovar-Falón, R.; Leiva, V.; Castro, C. Skew-Normal Inflated Models: Mathematical Characterization and Applications to Medical Data with Excess of Zeros and Ones. Mathematics 2024, 12, 2486. https://doi.org/10.3390/math12162486