Department of Economics and Business Economics

Accuracy of haplotype estimation and whole genome imputation affects complex trait analyses in complex biobanks

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper › Journal article › Research › peer-review

  • Vivek Appadurai, Institute of Biological Psychiatry
  • Jonas Bybjerg-Grauholm, iPSYCH - The Lundbeck Foundation Initiative for Integrative Psychiatric Research
  • Morten Dybdahl Krebs, Institute of Biological Psychiatry
  • Anders Rosengren, Institute of Biological Psychiatry
  • Alfonso Buil, Institute of Biological Psychiatry
  • Andrés Ingason, Institute of Biological Psychiatry
  • Ole Mors
  • Anders D Børglum
  • David M Hougaard, iPSYCH - The Lundbeck Foundation Initiative for Integrative Psychiatric Research
  • Merete Nordentoft, iPSYCH - The Lundbeck Foundation Initiative for Integrative Psychiatric Research
  • Preben B Mortensen
  • Olivier Delaneau, University of Lausanne
  • Thomas Werge, Institute of Biological Psychiatry
  • Andrew J Schork, Institute of Biological Psychiatry

Sample recruitment for research consortia, biobanks, and personal genomics companies spans years, necessitating genotyping in batches, using different technologies. As marker content on genotyping arrays varies, integrating such datasets is non-trivial, and its impact on haplotype estimation (phasing) and whole genome imputation, necessary steps for complex trait analysis, remains under-evaluated. Using the iPSYCH dataset, comprising 130,438 individuals genotyped in two stages on different arrays, we evaluated phasing and imputation performance across multiple phasing methods and data integration protocols. While phasing accuracy varied by choice of method and data integration protocol, imputation accuracy varied mostly between data integration protocols. We demonstrate an attenuation in imputation accuracy within samples of non-European origin, highlighting challenges to studying complex traits in diverse populations. Finally, imputation errors can bias association tests and reduce the predictive utility of polygenic scores. Carefully optimized data integration strategies enhance accuracy and replicability of complex trait analyses in complex biobanks.

Original language: English
Article number: 101
Journal: Communications Biology
Volume: 6
Issue: 1
Number of pages: 12
ISSN: 2399-3642
DOIs
Publication status: Published - Dec 2023

Bibliographical note

© 2023. The Author(s).


ID: 304965508