Reconstruction of a matrix of genotypic correlations between variants within a gene for joint analysis of imputed and sequenced data
- Authors: Svishcheva G.R.1,2, Kirichenko A.V.1, Belonogova N.M.1, Elgaeva E.E.1,3, Tsepilov A.Y.1, Zorkoltseva I.V.1, Axenovich T.I.1
- Affiliations:
- Federal Research Centre, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences
- Vavilov Institute of General Genetics, Russian Academy of Sciences
- Novosibirsk State University
- Issue: Vol 60, No 7 (2024)
- Pages: 91-99
- Section: MATHEMATICAL MODELS AND METHODS
- URL: https://rjpbr.com/0016-6758/article/view/667236
- DOI: https://doi.org/10.31857/S0016675824070089
- EDN: https://elibrary.ru/BHMPLU
- ID: 667236
Abstract
Combining imputed and sequenced data in a single gene-based association analysis requires reconstructing the matrix of correlations between the genotypes of the variants within the gene. For a given gene, the correlations between genotypes of all imputed variants are known, as are the correlations between genotypes of all sequenced variants, but the correlations between an imputed variant and a sequenced variant are not. To recover these correlations, we propose an efficient method based on maximising the determinant of the matrix. The method has a number of useful properties and admits an analytical solution for this task. We validated it by comparing reconstructed correlation matrices with real ones computed from individual genotypes in the UK Biobank. We also compared the results of gene-based association analysis by the SKAT, BT and PCA methods on reconstructed and real matrices, using both simulated summary statistics and summary statistics calculated for real phenotypes. The comparison showed high reconstruction quality and robustness of the method to different gene structures.
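The abstract does not state the closed-form solution, so the following is only a minimal sketch of determinant-maximising completion under one assumed setup: the imputed and sequenced sets share `k` variants, so the two known blocks overlap. In that case the standard maximum-determinant fill-in for the unknown cross block is R_io R_oo^{-1} R_os, and with no shared variants it reduces to a zero block. The function names, the variant ordering, and the overlap assumption are all hypothetical and are not taken from the paper.

```python
import numpy as np


def maxdet_cross_block(r_io, r_oo, r_os):
    """Maximum-determinant fill-in for the unknown cross block.

    r_io: correlations between imputed-only and shared variants (p x k)
    r_oo: correlations among the shared variants (k x k)
    r_os: correlations between shared and sequenced-only variants (k x q)
    """
    # Solve r_oo @ Z = r_os rather than forming an explicit inverse.
    return r_io @ np.linalg.solve(r_oo, r_os)


def complete_matrix(r_imp, r_seq, k):
    """Assemble the full correlation matrix from two overlapping blocks.

    Assumed ordering (hypothetical): the last k variants of r_imp are the
    same variants as the first k variants of r_seq.
    """
    p = r_imp.shape[0] - k  # imputed-only variants
    q = r_seq.shape[0] - k  # sequenced-only variants
    n = p + k + q
    full = np.empty((n, n))
    full[:p + k, :p + k] = r_imp
    full[p:, p:] = r_seq
    if k == 0:
        # No shared variants: the determinant is maximised by a zero block.
        cross = np.zeros((p, q))
    else:
        cross = maxdet_cross_block(r_imp[:p, p:], r_imp[p:, p:], r_seq[:k, k:])
    full[:p, p + k:] = cross
    full[p + k:, :p] = cross.T
    return full


# Toy input: three variants, the middle one present in both data sets.
r = np.array([[1.0, 0.5], [0.5, 1.0]])
completed = complete_matrix(r, r, 1)
# The single unknown entry is filled with 0.5 * 0.5 / 1.0 = 0.25.
print(completed)
```

A useful check on any such completion is that the inverse of the completed matrix is zero at the positions that were originally unknown, which is the defining property of the maximum-determinant solution.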
About the authors
G. Svishcheva
Federal Research Centre, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences; Vavilov Institute of General Genetics, Russian Academy of Sciences
Author for correspondence.
Email: gulsvi@mail.ru
Russian Federation, 630090, Novosibirsk; 119991, Moscow
A. Kirichenko
Federal Research Centre, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences
Email: gulsvi@mail.ru
Russian Federation, 630090, Novosibirsk
N. Belonogova
Federal Research Centre, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences
Email: gulsvi@mail.ru
Russian Federation, 630090, Novosibirsk
E. Elgaeva
Federal Research Centre, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences; Novosibirsk State University
Email: gulsvi@mail.ru
Russian Federation, 630090, Novosibirsk; 630090, Novosibirsk
A. Tsepilov
Federal Research Centre, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences
Email: gulsvi@mail.ru
Russian Federation, 630090, Novosibirsk
I. Zorkoltseva
Federal Research Centre, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences
Email: gulsvi@mail.ru
Russian Federation, 630090, Novosibirsk
T. Axenovich
Federal Research Centre, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences
Email: gulsvi@mail.ru
Russian Federation, 630090, Novosibirsk
