A New Method for Conditional Gene-Based Analysis Effectively Accounts for the Regional Polygenic Background
Abstract
1. Introduction
2. Materials and Methods
2.1. The TauCOR Method
2.1.1. Algorithm
- (i) Estimating the joint contribution of extragenic SNPs to trait variation and calculating the trait variance explained by these SNPs (i.e., the local SNP heritability, as termed by Shi et al. [16]); and
- (ii) Adjusting the LD matrix of intragenic SNPs so that their z statistics become conditionally independent of the z statistics of SNPs in the external region.
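The two steps can be illustrated with a minimal pure-Python sketch under stated assumptions. Step (i) uses a HESS-style estimator in the spirit of Shi et al. [16], h² = (zᵀL⁻¹z − q)/(N − q), assuming a full-rank LD matrix L for the q extragenic SNPs and a GWAS sample size N. Step (ii) is illustrated with generic Gaussian conditioning (the Schur complement of the joint LD matrix), which is a stand-in for, not a reproduction of, the TauCOR adjustment. The function names `solve`, `local_h2_hess`, and `condition_on_external` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    `a` must be square and non-singular; `b` may have several columns.
    """
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # copy as [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("LD matrix is singular; a pseudo-inverse would be needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def local_h2_hess(z_e: Vector, ld_ee: Matrix, n: int) -> float:
    """Step (i) sketch: HESS-style local SNP heritability of extragenic SNPs.

    h2 = (z' L^-1 z - q) / (n - q). Assumes L is full rank, so its
    pseudo-inverse is the ordinary inverse and q equals the number of SNPs.
    """
    q = len(z_e)
    if n <= q:
        raise ValueError("sample size must exceed the number of SNPs")
    x = solve(ld_ee, [[v] for v in z_e])  # L^-1 z as a column vector
    quad = sum(zi * xi[0] for zi, xi in zip(z_e, x))
    return (quad - q) / (n - q)


def condition_on_external(
    ld_gg: Matrix, ld_ge: Matrix, ld_ee: Matrix, z_e: Vector
) -> Tuple[Vector, Matrix]:
    """Step (ii) illustration: generic Gaussian conditioning, NOT the TauCOR formula.

    If (z_g, z_e) ~ N(0, [[L_gg, L_ge], [L_eg, L_ee]]), then z_g given z_e has
    mean L_ge L_ee^-1 z_e and covariance L_gg - L_ge L_ee^-1 L_eg.
    """
    p, q = len(ld_gg), len(ld_ee)
    ld_eg = [list(col) for col in zip(*ld_ge)]  # transpose of L_ge (q x p)
    w = solve(ld_ee, ld_eg)  # L_ee^-1 L_eg (q x p)
    cov = [
        [ld_gg[i][j] - sum(ld_ge[i][k] * w[k][j] for k in range(q)) for j in range(p)]
        for i in range(p)
    ]
    # L_ee is symmetric, so L_ge L_ee^-1 = w^T and the conditional mean is w^T z_e.
    mean = [sum(w[k][i] * z_e[k] for k in range(q)) for i in range(p)]
    return mean, cov


if __name__ == "__main__":
    ld_ee = [[1.0, 0.5], [0.5, 1.0]]
    print(local_h2_hess([2.0, 3.0], ld_ee, n=100))
    print(condition_on_external([[1.0]], [[0.5, 0.2]], ld_ee, [2.0, 3.0]))
```

The sketch raises an error for a singular LD matrix rather than computing a pseudo-inverse, which keeps it dependency-free; real reference-panel LD matrices are often rank-deficient and would need regularization or a pseudo-inverse.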
2.1.2. Task: Designations, Input Data, and Formulation
2.2. Simulation Strategy
2.3. ‘Gold Standard’ Gene List
2.4. Method Performance
3. Results
3.1. Simulated Data Analysis
3.2. Real Data Analysis Using ‘Gold Standard’ List of Genes
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. From Individual-Level Data to Summary-Level Data
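- 1. For set R, we construct a region-based linear regression model of the trait on the genotypes of the SNPs in R, with either random or fixed SNP effects:
- 2. We move to the level of summary statistics by left-multiplying all terms of this regression equation by the transposed genotype matrix of R:
- 3. By definition, the products of the genotype and phenotype data can be expressed through the z statistics and the LD matrix of R. After these substitutions, we obtain the following equation: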
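A standard form of these three steps is sketched below under assumed notation that may differ from the paper's: N individuals, a standardized trait vector y, a standardized N × m genotype matrix X_R for the m SNPs in set R, SNP effects β_R, and residuals ε. The sketch takes z_R = X_Rᵀy/√N as the large-sample marginal z statistics for standardized data with small effects, and L_R = X_RᵀX_R/N as the in-sample LD matrix.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}_R\boldsymbol{\beta}_R + \boldsymbol{\varepsilon}
  && \text{(step 1: region-based model)} \\
\mathbf{X}_R^{\top}\mathbf{y} &= \mathbf{X}_R^{\top}\mathbf{X}_R\boldsymbol{\beta}_R
  + \mathbf{X}_R^{\top}\boldsymbol{\varepsilon}
  && \text{(step 2: left-multiply by } \mathbf{X}_R^{\top}\text{)} \\
\mathbf{z}_R &= \tfrac{1}{\sqrt{N}}\mathbf{X}_R^{\top}\mathbf{y}, \qquad
\mathbf{L}_R = \tfrac{1}{N}\mathbf{X}_R^{\top}\mathbf{X}_R
  && \text{(step 3: definitions)} \\
\mathbf{z}_R &= \sqrt{N}\,\mathbf{L}_R\boldsymbol{\beta}_R + \mathbf{e}_R, \qquad
\mathbf{e}_R = \tfrac{1}{\sqrt{N}}\mathbf{X}_R^{\top}\boldsymbol{\varepsilon}
  && \text{(step 2 divided by } \sqrt{N}\text{)}
\end{aligned}
```

If the residuals are independent with common variance σ_ε², the error term e_R has covariance σ_ε² L_R, so the z statistics inherit the LD structure of the region.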
Appendix B. Formulating the Parameter τ
- 1. According to step 1 in Appendix A, we form the matrix of phenotypic correlations between individuals explained by the region (r) as follows:
- 2. The random-effects model assumes that the SNP effects are independent, with zero mean and a common variance, which yields the following equation:
- 3. The matrix of genotype cross-products, scaled by the number of SNPs in the region, is the relationship matrix (R_r) between individuals attributable to the region:
- 4. The matrix of phenotypic correlations between individuals can be written in terms of the local SNP heritability as follows:
- 5. Equating the two matrix expressions from steps 3 and 4, we obtain the following result:
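A standard form of these five steps is sketched below under assumed notation that may differ from the paper's: X_r is the standardized N × m genotype matrix of region r, β_r its SNP effects with common variance σ_β², C_r the phenotypic correlation matrix explained by the region, R_r = X_r X_rᵀ/m the relationship matrix, and h_r² the local SNP heritability. How the paper expresses τ in terms of these quantities is not shown here, so the sketch stops at the per-SNP effect variance that the equating step yields.

```latex
\begin{aligned}
\mathbf{C}_r &= \mathbf{X}_r\boldsymbol{\beta}_r\boldsymbol{\beta}_r^{\top}\mathbf{X}_r^{\top}
  && \text{(step 1)} \\
\mathrm{E}[\mathbf{C}_r] &= \sigma_\beta^2\,\mathbf{X}_r\mathbf{X}_r^{\top},
  \quad \text{given } \boldsymbol{\beta}_r \sim (\mathbf{0}, \sigma_\beta^2\mathbf{I}_m)
  && \text{(step 2)} \\
\mathrm{E}[\mathbf{C}_r] &= m\,\sigma_\beta^2\,\mathbf{R}_r,
  \quad \mathbf{R}_r = \tfrac{1}{m}\mathbf{X}_r\mathbf{X}_r^{\top}
  && \text{(step 3)} \\
\mathrm{E}[\mathbf{C}_r] &= h_r^2\,\mathbf{R}_r
  && \text{(step 4)} \\
\sigma_\beta^2 &= h_r^2 / m
  && \text{(step 5: equate steps 3 and 4)}
\end{aligned}
```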
References
- Svishcheva, G.R. A generalized model for combining dependent SNP-level summary statistics and its extensions to statistics of other levels. Sci. Rep. 2019, 9, 5461.
- Svishcheva, G.R.; Belonogova, N.M.; Zorkoltseva, I.V.; Kirichenko, A.V.; Axenovich, T.I. Gene-based association tests using GWAS summary statistics. Bioinformatics 2019, 35, 3701–3708.
- Yang, J.; Ferreira, T.; Morris, A.P.; Medland, S.E.; Genetic Investigation of ANthropometric Traits (GIANT) Consortium; DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium; Madden, P.A.; Heath, A.C.; Martin, N.G.; Montgomery, G.W.; et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 2012, 44, 369–375.
- Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least angle regression. Ann. Statist. 2004, 32, 407–499.
- Ning, Z.; Lee, Y.; Joshi, P.K.; Wilson, J.F.; Pawitan, Y.; Shen, X. A selection operator for summary association statistics reveals allelic heterogeneity of complex traits. Am. J. Hum. Genet. 2017, 101, 903–912.
- Belonogova, N.M.; Svishcheva, G.R.; Kirichenko, A.V.; Zorkoltseva, I.V.; Tsepilov, Y.A.; Axenovich, T.I. sumSTAAR: A flexible framework for gene-based association studies using GWAS summary statistics. PLoS Comput. Biol. 2022, 18, e1010172.
- Belonogova, N.M.; Zorkoltseva, I.V.; Tsepilov, Y.A.; Axenovich, T.I. Gene-based association analysis identifies 190 genes affecting neuroticism. Sci. Rep. 2021, 11, 2484.
- Li, M.; Jiang, L.; Mak, T.S.H.; Kwan, J.S.H.; Xue, C.; Chen, P.; Leung, H.C.-M.; Cui, L.; Li, T.; Sham, P.C. A powerful conditional gene-based association approach implicated functionally important genes for schizophrenia. Bioinformatics 2019, 35, 628–635.
- Dering, C.; Hemmelmann, C.; Pugh, E.; Ziegler, A. Statistical analysis of rare sequence variants: An overview of collapsing methods. Genet. Epidemiol. 2011, 35, S12–S17.
- Chen, H.; Meigs, J.B.; Dupuis, J. Sequence kernel association test for quantitative traits in family samples. Genet. Epidemiol. 2013, 37, 196–204.
- Wu, M.C.; Lee, S.; Cai, T.; Li, Y.; Boehnke, M.; Lin, X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 2011, 89, 82–93.
- Lee, S.; Emond, M.J.; Bamshad, M.J.; Barnes, K.C.; Rieder, M.J.; Nickerson, D.A.; ESP Lung Project Team; Christiani, D.C.; Wurfel, M.M.; Lin, X. Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am. J. Hum. Genet. 2012, 91, 224–237.
- Wu, B.; Guan, W.; Pankow, J.S. On efficient and accurate calculation of significance p-values for sequence kernel association testing of variant set. Ann. Hum. Genet. 2016, 80, 123–135.
- Wang, K.; Abbott, D. A principal components regression approach to multilocus genetic association studies. Genet. Epidemiol. 2008, 32, 108–118.
- Fan, R.; Wang, Y.; Mills, J.L.; Wilson, A.F.; Bailey-Wilson, J.E.; Xiong, M. Functional linear models for association analysis of quantitative traits. Genet. Epidemiol. 2013, 37, 726–742.
- Shi, H.; Kichaev, G.; Pasaniuc, B. Contrasting the genetic architecture of 30 complex traits from summary association data. Am. J. Hum. Genet. 2016, 99, 139–153.
- Pongpanich, M.; Neely, M.L.; Tzeng, J.-Y. On the aggregation of multimarker information for marker-set and sequencing data analysis: Genotype collapsing vs. similarity collapsing. Front. Genet. 2012, 2, 110.
- Lee, S.; Abecasis, G.R.; Boehnke, M.; Lin, X. Rare-variant association analysis: Study designs and statistical tests. Am. J. Hum. Genet. 2014, 95, 5–23.
- The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 2015, 526, 68–74.
- Purcell, S.; Neale, B.; Todd-Brown, K.; Thomas, L.; Ferreira, M.A.; Bender, D.; Maller, J.; Sklar, P.; De Bakker, P.I.; Daly, M.J. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007, 81, 559–575.
- Mountjoy, E.; Schmidt, E.M.; Carmona, M.; Schwartzentruber, J.; Peat, G.; Miranda, A.; Fumis, L.; Hayhurst, J.; Buniello, A.; Karim, M.A. An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Nat. Genet. 2021, 53, 1527–1533.
- McLaren, W.; Gil, L.; Hunt, S.E.; Riat, H.S.; Ritchie, G.R.; Thormann, A.; Flicek, P.; Cunningham, F. The Ensembl Variant Effect Predictor. Genome Biol. 2016, 17, 122.
- Bulik-Sullivan, B.; Finucane, H.K.; Anttila, V.; Gusev, A.; Day, F.R.; Loh, P.R.; ReproGen Consortium; Psychiatric Genomics Consortium; Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium; Duncan, L.; et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 2015, 47, 1236–1241.
- Pasaniuc, B.; Zaitlen, N.; Shi, H.; Bhatia, G.; Gusev, A.; Pickrell, J.; Hirschhorn, J.; Strachan, D.P.; Patterson, N.; Price, A.L. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics 2014, 30, 2906–2914.
- Zeng, J.; Xue, A.; Jiang, L.; Lloyd-Jones, L.R.; Wu, Y.; Wang, H.; Zheng, Z.; Yengo, L.; Kemper, K.E.; Goddard, M.E. Widespread signatures of natural selection across human complex traits and functional genomic categories. Nat. Commun. 2021, 12, 1164.
- Fortune, M.D.; Wallace, C. simGWAS: A fast method for simulation of large scale case–control GWAS summary statistics. Bioinformatics 2019, 35, 1901–1906.
- de Leeuw, C.A.; Mooij, J.M.; Heskes, T.; Posthuma, D. MAGMA: Generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 2015, 11, e1004219.
- Bulik-Sullivan, B.K.; Loh, P.R.; Finucane, H.K.; Ripke, S.; Yang, J.; Schizophrenia Working Group of the Psychiatric Genomics Consortium; Patterson, N.; Daly, M.J.; Price, A.L.; Neale, B.M. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 2015, 47, 291–295.
- Lee, S.; Wu, M.C.; Lin, X. Optimal tests for rare variant effects in sequencing association studies. Biostatistics 2012, 13, 762–775.
| | Initial GBA | COJO + GBA | TauCOR + GBA |
|---|---|---|---|
| GS genes | 15/28 * | 10/15 | 12/15 |
| Neighboring genes | 54/423 | 15/54 | 5/54 |
| Sensitivity | 0.54 | 0.67 | 0.80 |
| Specificity | 0.87 | 0.72 | 0.91 |
| Accuracy | 0.85 | 0.71 | 0.88 |
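The summary rows of the table can be reproduced from the count rows under one reading of the counts, which is an assumption here: in the "GS genes" row, a/b means a significant gold-standard genes out of b tested (true positives over positives), and in the "Neighboring genes" row, a/b means a significant neighboring genes out of b tested (false positives over negatives). Under this reading, the hypothetical helper `metrics` below recovers every reported value to two decimals.

```python
from typing import Dict, Tuple

# (significant GS genes, GS genes tested, significant neighbors, neighbors tested),
# taken from the count rows of the table; the interpretation is an assumption.
COUNTS: Dict[str, Tuple[int, int, int, int]] = {
    "Initial GBA": (15, 28, 54, 423),
    "COJO + GBA": (10, 15, 15, 54),
    "TauCOR + GBA": (12, 15, 5, 54),
}


def metrics(tp: int, n_pos: int, fp: int, n_neg: int) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    tn = n_neg - fp  # neighbors correctly not flagged
    sensitivity = tp / n_pos
    specificity = tn / n_neg
    accuracy = (tp + tn) / (n_pos + n_neg)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    for name, counts in COUNTS.items():
        sens, spec, acc = metrics(*counts)
        print(f"{name}: sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

For example, TauCOR + GBA gives specificity 49/54 and accuracy (12 + 49)/69, which round to the 0.91 and 0.88 shown in the table.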
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Svishcheva, G.R.; Belonogova, N.M.; Kirichenko, A.V.; Tsepilov, Y.A.; Axenovich, T.I. A New Method for Conditional Gene-Based Analysis Effectively Accounts for the Regional Polygenic Background. Genes 2024, 15, 1174. https://doi.org/10.3390/genes15091174