A random set scoring model for prioritization of disease candidate genes using protein complexes and data-mining of GeneRIF, OMIM and PubMed records

Publication: Contribution to journal/Conference contribution in journal/Contribution to newspaper; Journal article; Research; Peer-reviewed

  • Li Jiang, Denmark
  • Stefan McKinnon Edwards, Denmark
  • Bo Thomsen
  • Christopher T. Workman, Center for Biological Sequence Analysis, Technical University of Denmark, DK 2100 Lyngby, Denmark
  • Bernt Guldbrandtsen
  • Peter Sørensen
Background: Prioritizing genetic variants is a challenge because disease susceptibility loci are often located in genes of unknown function or the relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile of genes with respect to their connection to disease phenotypes. The importance of protein-protein interaction networks in the genetic heterogeneity of common diseases or complex traits is becoming increasingly recognized. Thus, the development of a network-based approach combined with phenotypic profiling would be useful for disease gene prioritization.

Results: We developed a random-set scoring model and implemented it to quantify phenotype relevance in a network-based disease gene-prioritization approach. We validated our approach based on different gene phenotypic profiles, which were generated from PubMed abstracts, OMIM, and GeneRIF records. We also investigated the validity of several vocabulary filters and different likelihood thresholds for predicted protein-protein interactions in terms of their effect on the network-based gene-prioritization approach, which relies on text-mining of the phenotype data. Our method demonstrated good precision and sensitivity compared with those of two alternative complex-based prioritization approaches. We then conducted a global ranking of all human genes according to their relevance to a range of human diseases. The resulting accurate ranking of known causal genes supported the reliability of our approach. Moreover, these data suggest many promising novel candidate genes for human disorders that have a complex mode of inheritance.

Conclusion: We have implemented and validated a network-based approach to prioritize genes for human diseases based on their phenotypic profile. We have devised a powerful and transparent tool to identify and rank candidate genes. Our global gene prioritization provides a unique resource for the biological interpretation of data from genome-wide association studies, and will help in the understanding of how the associated genetic variants influence disease or quantitative phenotypes.
Original language: English
Journal: BMC Bioinformatics
Volume: 15
Issue number: 315
Pages (from-to): 1-13
Number of pages: 13
ISSN: 1471-2105
DOI
Status: Published - 24 Sep. 2014

ID: 82078349