Leveraging effect size distributions to improve polygenic risk scores derived from summary statistics of genome-wide association studies

PLoS Comput Biol. 2020 Feb 11;16(2):e1007565. doi: 10.1371/journal.pcbi.1007565. eCollection 2020 Feb.

Abstract

Genetic risk prediction is an important problem in human genetics, and accurate prediction can facilitate disease prevention and treatment. Calculating a polygenic risk score (PRS) has become widely used because of its simplicity and effectiveness; the standard method needs only summary statistics from genome-wide association studies. Several methods have recently been proposed to improve the standard PRS by incorporating external information, such as linkage disequilibrium (LD) and functional annotations. In this paper, we introduce EB-PRS, a novel method that leverages information on effect sizes across all markers to improve prediction accuracy. Unlike most existing genetic risk prediction methods, our method requires neither parameter tuning nor external information. Real data applications to six diseases (asthma, breast cancer, celiac disease, Crohn's disease, Parkinson's disease and type 2 diabetes) show that EB-PRS achieved relative improvements in predictive r² of 307.1%, 42.8%, 25.5%, 3.1%, 74.3% and 49.6%, respectively, over the standard PRS method with optimally tuned parameters. Compared with LDpred, which makes use of LD information, EB-PRS also achieved relative improvements of 37.9%, 33.6%, 8.6%, 36.2%, 40.6% and 10.8%. We note that ours is not the first method to leverage effect size distributions. Here we first justify our method by showing its theoretical optimality relative to existing methods in this class, and we substantiate this theoretical result with extensive simulations. The R package EBPRS, which implements our method, is available on CRAN.
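The two ingredients contrasted above can be sketched in Python. The first function computes a standard PRS as a weighted sum of allele counts, assuming that the tuned parameter of the standard method is a GWAS p-value threshold. The second illustrates one generic way of leveraging the distribution of effect sizes across all markers: empirical-Bayes shrinkage of z-scores with Tweedie's formula, using a Gaussian kernel density estimate of the marginal z-score density. This is an assumed stand-in chosen for illustration, not the estimator or API of the EBPRS package; the names `standard_prs`, `tweedie_shrink` and `eb_prs_weights` are hypothetical, and the sketch ignores LD between markers.

```python
import numpy as np


def standard_prs(genotypes, beta_hat, pvalues=None, p_threshold=1.0):
    """Standard PRS: weighted sum of allele counts using GWAS effect estimates.

    genotypes: (n_individuals, n_snps) array of allele counts (0, 1 or 2).
    beta_hat:  per-SNP effect estimates from GWAS summary statistics.
    SNPs with p-value >= p_threshold get weight 0 (assumed tuning parameter).
    """
    weights = np.asarray(beta_hat, dtype=float).copy()
    if pvalues is not None:
        weights[np.asarray(pvalues, dtype=float) >= p_threshold] = 0.0
    return np.asarray(genotypes, dtype=float) @ weights


def tweedie_shrink(z, bandwidth=None):
    """Generic empirical-Bayes posterior means of z-scores via Tweedie's formula.

    Assumes z_j ~ N(mu_j, 1) and estimates E[mu_j | z_j] = z_j + f'(z_j) / f(z_j),
    where f, the marginal density of all z-scores, is a Gaussian kernel density
    estimate. Illustrative only; NOT the estimator in the EBPRS package.
    Builds an n x n matrix, so it is only practical for modest n.
    """
    z = np.asarray(z, dtype=float)
    if bandwidth is None:
        # Silverman's rule of thumb: a heuristic default, not a tuned value
        bandwidth = 1.06 * z.std() * z.size ** (-0.2)
    diff = (z[:, None] - z[None, :]) / bandwidth        # scaled pairwise differences
    kernel = np.exp(-0.5 * diff ** 2)                   # Gaussian kernel; constants cancel in f'/f
    f = kernel.sum(axis=1)                              # proportional to f(z_j)
    f_prime = (-diff / bandwidth * kernel).sum(axis=1)  # proportional to f'(z_j)
    return z + f_prime / f


def eb_prs_weights(beta_hat, se):
    """Hypothetical helper: shrink effects on the z-scale, then map back to beta."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    se = np.asarray(se, dtype=float)
    return tweedie_shrink(beta_hat / se) * se
```

For example, `standard_prs(genotypes, eb_prs_weights(beta_hat, se))` scores individuals with shrunken weights and no p-value cutoff: instead of a hard threshold, the many near-null z-scores in the dense bulk of the distribution are pulled toward zero. Note that the kernel bandwidth in this sketch is still a heuristic choice, whereas the abstract describes EB-PRS as requiring no parameter tuning.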

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Computational Biology*
  • Computer Simulation
  • Female
  • Genetic Predisposition to Disease*
  • Genome-Wide Association Study / methods*
  • Humans
  • Linkage Disequilibrium
  • Male
  • Multifactorial Inheritance*
  • Software