TY - JOUR
T1 - Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores
AU - Privé, Florian
AU - Arbel, Julyan
AU - Aschard, Hugues
AU - Vilhjálmsson, Bjarni J.
N1 - Publisher Copyright: © 2022 The Author(s)
PY - 2022/10
Y1 - 2022/10
N2 - Publicly available genome-wide association studies (GWAS) summary statistics exhibit uneven quality, which can impact the validity of follow-up analyses. First, we present an overview of possible misspecifications that come with GWAS summary statistics. Then, in both simulations and real-data analyses, we show that additional information such as imputation INFO scores, allele frequencies, and per-variant sample sizes in GWAS summary statistics can be used to detect possible issues and correct for misspecifications in the GWAS summary statistics. One important motivation for us is to improve the predictive performance of polygenic scores built from these summary statistics. Unfortunately, owing to the lack of reporting standards for GWAS summary statistics, this additional information is not systematically reported. We also show that using well-matched linkage disequilibrium (LD) references can improve model fit and translate into more accurate prediction. Finally, we discuss how to make polygenic score methods such as lassosum and LDpred2 more robust to these misspecifications to improve their predictive power.
KW - GWAS summary statistics
KW - misspecifications
KW - polygenic scores
U2 - 10.1016/j.xhgg.2022.100136
DO - 10.1016/j.xhgg.2022.100136
M3 - Journal article
C2 - 36105883
AN - SCOPUS:85137302826
SN - 2666-2477
VL - 3
JO - Human Genetics and Genomics Advances
JF - Human Genetics and Genomics Advances
IS - 4
M1 - 100136
ER -