Use of Bayesian Markov Chain Monte Carlo Methods to Model Kuwait Medical Genetic Center Data: An Application to Down Syndrome and Mental Retardation
Abstract
1. Introduction
2. Materials and Methods
2.1. Study and Dataset Used
2.2. Model Development and Validation
- Amniotic Fluid—categorical (1 = Yes, 0 = No)
- Complications During Pregnancy—categorical (1 = Yes, 0 = No)
- Ethnicity—categorical (1 = family, 0 = tribe)
- Gestational Age—continuous
- Maternal Age at Child’s Birth—continuous
- Nationality—categorical (1 = Kuwaiti, 0 = Non-Kuwaiti)
- Parental Couple Consanguinity—categorical (1 = Yes, 0 = No)
- Pre-conceptional History—categorical (1 = Yes, 0 = No)
- Sex—categorical (1 = female, 0 = male)
- Age—continuous
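The coding scheme listed above can be turned into a linear predictor and mapped to a probability through each of the three link functions compared in the results tables (logit, probit and complementary log-log). The following is a minimal sketch, not the authors' implementation: the variable names, the example record and the coefficient values are all hypothetical and chosen only for illustration.

```python
import math

# Covariate order follows the list above; the names are illustrative.
COVARIATES = [
    "amniotic_fluid", "complications", "ethnicity", "gestational_age",
    "maternal_age", "nationality", "consanguinity",
    "preconceptional_history", "sex", "age",
]


def encode(record):
    """Return [1, x1, ..., x10]: an intercept followed by the coded covariates."""
    return [1.0] + [float(record[name]) for name in COVARIATES]


def linear_predictor(beta, x):
    """Inner product of the coefficient vector and the design row."""
    return sum(b * v for b, v in zip(beta, x))


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inv_cloglog(eta):
    return 1.0 - math.exp(-math.exp(eta))


# Hypothetical record and coefficients, for illustration only.
record = {
    "amniotic_fluid": 0, "complications": 1, "ethnicity": 1,
    "gestational_age": 38, "maternal_age": 35, "nationality": 1,
    "consanguinity": 0, "preconceptional_history": 0, "sex": 1, "age": 5,
}
beta = [-3.0, 0.0, 0.0, 0.0, 0.0, 0.05, 0.0, 0.0, 0.0, 0.0, 0.0]

eta = linear_predictor(beta, encode(record))
probabilities = {
    "logit": inv_logit(eta),
    "probit": inv_probit(eta),
    "cloglog": inv_cloglog(eta),
}
for link, p in probabilities.items():
    print(f"{link}: {p:.4f}")
```

The same linear predictor gives a different probability under each link, which is why coefficients fitted under different links are not directly comparable in magnitude.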
2.2.1. Model Development
2.2.2. Model Complexity and Fit
2.2.3. Model Prediction
2.2.4. Bivariate Logistic Regression
3. Results
3.1. Study Cohort
3.2. Model Estimation
3.3. Model Diagnostics
3.4. Model Prediction
3.5. Bivariate Case
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Covariate | DS: Logit | DS: Probit | DS: Cloglog | MR: Logit | MR: Probit | MR: Cloglog |
---|---|---|---|---|---|---|
Intercept | −6.868 (−7.533, −4.68) | −0.5829 (−0.8539, −0.0541) | −1.105 (−1.518, −0.094) | −2.282 (−2.626, −1.859) | −0.5294 (−0.584, −0.428) | −0.5971 (−0.851, −0.296) |
Ethnicity | −0.140 (−0.260, −0.021) | −0.079 (−0.143, −0.017) | −0.108 (−0.212, −0.004) | 0.031 (−0.045, 0.105) | 0.018 (−0.026, 0.064) | 0.019 (−0.035, 0.075) |
Sex | 0.025 (−0.075, 0.121) | 0.015 (−0.036, 0.068) | 0.017 (−0.068, 0.102) | −0.434 (−0.496, −0.374) | −0.267 (−0.305, −0.229) | −0.323 (−0.370, −0.277) |
Nationality | −0.637 (−0.736, −0.538) | −0.346 (−0.399, −0.292) | −0.548 (−0.634, −0.461) | 0.244 (0.175, 0.312) | 0.149 (0.107, 0.193) | 0.175 (0.123, 0.226) |
Parental Couple Consanguinity | −0.485 (−0.589, −0.386) | −0.257 (−0.311, −0.204) | −0.430 (−0.520, −0.341) | −0.302 (−0.365, −0.241) | −0.188 (−0.227, −0.149) | −0.220 (−0.266, −0.174) |
Maternal Age at Child’s Birth | 0.131 (0.123, 0.138) | 0.069 (0.065, 0.072) | 0.112 (0.106, 0.117) | 0.044 (0.039, 0.049) | 0.026 (0.023, 0.029) | 0.031 (0.027, 0.034) |
Pre-conceptional History | −0.069 (−0.212, 0.070) | −0.033 (−0.108, 0.040) | −0.057 (−0.181, 0.065) | −0.681 (−0.774, −0.591) | −0.419 (−0.475, −0.364) | −0.520 (−0.593, −0.449) |
Gestational Age | 0.064 (0.041, 0.088) | 0.031 (0.022, 0.043) | 0.059 (0.045, 0.074) | −4.423 (−6.908, −3.817) | −0.188 (−0.227, −0.149) | 0.010 (0.002, 0.018) |
Amniotic Fluid | −0.273 (−0.384, −0.164) | −0.146 (−0.206, −0.085) | −0.246 (−0.345, −0.147) | 0.071 (−0.003, 0.145) | 0.044 (−0.0005, 0.087) | 0.046 (−0.009, 0.101) |
Complications During Pregnancy | −0.598 (−0.770, −0.437) | −0.316 (−0.405, −0.228) | −0.543 (−0.694, −0.390) | −0.247 (−0.350, −0.149) | −0.152 (−0.213, −0.091) | −0.189 (−0.267, −0.113) |
Age | 0.006 (0.001, 0.011) | 0.003 (0.001, 0.006) | 0.002 (0.001, 0.007) | 0.013 (0.010, 0.016) | 0.008 (0.006, 0.010) | 0.008 (0.006, 0.010) |
R² | 0.1145 | 0.0778 | 0.0755 | 0.0626 | 0.0594 | 0.0576 |
Adjusted R² | 0.114 | 0.0772 | 0.0749 | 0.0621 | 0.0589 | 0.0571 |
RMSE | 0.3074 | 0.3137 | 0.3141 | 0.4676 | 0.4828 | 0.4833 |
Overall DIC | | | | | | |
 | 7400.77 | 7414.02 | 7416.71 | 23,100 | 23,150 | 23,170 |
PD | 17.603 | 17.293 | 17.246 | 9.123 | 11.21 | 7.344 |
DIC | 7435.98 | 7448.61 | 7451.21 | 23,120 | 23,180 | 23,190 |
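The deviance information criterion is conventionally computed as the deviance evaluated at the posterior means plus twice the effective number of parameters (PD). The row under "Overall DIC" carries no label in the table above; the sketch below only checks whether the DS values in that row are arithmetically consistent with this reading. Treating the unlabelled row as the deviance at the posterior means is an assumption, not something the table states.

```python
# DS columns copied from the table above. The key "unlabelled" refers to the
# row under "Overall DIC" whose label is missing; interpreting it as the
# deviance at the posterior means is an assumption made for this check.
ds_fit = {
    "logit": {"unlabelled": 7400.77, "pD": 17.603, "DIC": 7435.98},
    "probit": {"unlabelled": 7414.02, "pD": 17.293, "DIC": 7448.61},
    "cloglog": {"unlabelled": 7416.71, "pD": 17.246, "DIC": 7451.21},
}

# Difference between the reported DIC and (unlabelled row + 2 * pD).
discrepancies = {
    link: abs(v["unlabelled"] + 2.0 * v["pD"] - v["DIC"])
    for link, v in ds_fit.items()
}
for link, gap in discrepancies.items():
    print(f"{link}: |reconstructed - reported| = {gap:.3f}")

# A smaller DIC indicates a better trade-off between fit and complexity.
best_link = min(ds_fit, key=lambda link: ds_fit[link]["DIC"])
print("Smallest DS DIC:", best_link)
```

Under this reading the reconstructed values agree with the reported DS DIC to within rounding, and the logit link has the smallest DIC among the three DS models.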
Bayesian Estimates: Odds Ratio (95% Credible Interval)

Covariate | DS | MR | Association |
---|---|---|---|
Constant | 0.0005 (0.0002, 0.0010) | 0.1040 (0.0634, 0.1606) | 0.9196 |
Amniotic Fluid | 1.0960 (0.8037, 1.4780) | 1.7800 (1.4580, 2.1660) | 1.6641 |
Complications During Pregnancy | 0.5916 (0.4985, 0.6948) | 0.7668 (0.6938, 0.8452) | 0.9273 |
Ethnicity | 0.8452 (0.7508, 0.9496) | 1.0350 (0.9585, 1.1150) | 1.0019 |
Gestational Age | 1.0600 (1.0390, 1.0830) | 1.0100 (0.9991, 1.0210) | 1.0129 |
Maternal Age at Child’s Birth | 1.1400 (1.1320, 1.1490) | 1.0450 (1.0400, 1.0500) | 1.1482 |
Nationality | 0.5261 (0.4754, 0.5806) | 1.2740 (1.1880, 1.3640) | 0.7584 |
Parental Couple Consanguinity | 0.6206 (0.5605, 0.6854) | 0.7430 (0.6970, 0.7910) | 1.0681 |
Pre-conceptional History | 0.9662 (0.8379, 1.1070) | 0.5044 (0.4597, 0.5518) | 0.9208 |
Sex | 1.0290 (0.9328, 1.1320) | 0.6499 (0.6102, 0.6912) | 0.9605 |
Age | 1.0070 (1.0020, 1.0120) | 1.0120 (1.0090, 1.0160) | 1.0747 |
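An odds ratio can be read as a multiplicative change in the odds per unit increase in a covariate (or relative to the reference category for a binary covariate), and a 95% credible interval that excludes 1 points to a direction of association. The sketch below applies that reading to three DS rows copied from the table above. The function name and the direction labels are illustrative choices, not the authors' terminology.

```python
import math

# DS odds ratios with 95% credible intervals, copied from the table above:
# (odds ratio, lower bound, upper bound).
ds_odds_ratios = {
    "Maternal Age at Child's Birth": (1.1400, 1.1320, 1.1490),
    "Nationality": (0.5261, 0.4754, 0.5806),
    "Sex": (1.0290, 0.9328, 1.1320),
}


def summarize(odds_ratio, lower, upper):
    """Return the log-odds, the percent change in odds, and a direction label."""
    if lower > 1.0:
        direction = "higher odds"
    elif upper < 1.0:
        direction = "lower odds"
    else:
        direction = "interval includes 1"
    return {
        "log_odds": math.log(odds_ratio),
        "percent_change": (odds_ratio - 1.0) * 100.0,
        "direction": direction,
    }


summaries = {name: summarize(*values) for name, values in ds_odds_ratios.items()}
for name, s in summaries.items():
    print(f"{name}: {s['percent_change']:+.1f}% change in odds, {s['direction']}")
```

For example, the DS odds ratio of 1.1400 for maternal age corresponds to roughly a 14% increase in the odds per additional year, while the interval for sex spans 1 and so does not indicate a direction.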
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Aljarallah, R.; Kharroubi, S.A. Use of Bayesian Markov Chain Monte Carlo Methods to Model Kuwait Medical Genetic Center Data: An Application to Down Syndrome and Mental Retardation. Mathematics 2021, 9, 248. https://doi.org/10.3390/math9030248