A random set scoring model for prioritization of disease candidate genes using protein complexes and data-mining of GeneRIF, OMIM and PubMed records

Li Jiang, Stefan McKinnon Edwards, Bo Thomsen, Christopher T. Workman, Bernt Guldbrandtsen, Peter Sørensen

Research output: Contribution to journal; Journal article; Research; peer-reviewed


Background: Prioritizing genetic variants is a challenge because disease susceptibility loci are often located in genes of unknown function or the relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile of genes with respect to their connection to disease phenotypes. The importance of protein-protein interaction networks in the genetic heterogeneity of common diseases or complex traits is becoming increasingly recognized. Thus, the development of a network-based approach combined with phenotypic profiling would be useful for disease gene prioritization.

Results: We developed a random-set scoring model and implemented it to quantify phenotype relevance in a network-based disease gene-prioritization approach. We validated our approach based on different gene phenotypic profiles, which were generated from PubMed abstracts, OMIM, and GeneRIF records. We also investigated the validity of several vocabulary filters and different likelihood thresholds for predicted protein-protein interactions in terms of their effect on the network-based gene-prioritization approach, which relies on text-mining of the phenotype data. Our method demonstrated good precision and sensitivity compared with those of two alternative complex-based prioritization approaches. We then conducted a global ranking of all human genes according to their relevance to a range of human diseases. The resulting accurate ranking of known causal genes supported the reliability of our approach. Moreover, these data suggest many promising novel candidate genes for human disorders that have a complex mode of inheritance.

Conclusion: We have implemented and validated a network-based approach to prioritize genes for human diseases based on their phenotypic profile. We have devised a powerful and transparent tool to identify and rank candidate genes. Our global gene prioritization provides a unique resource for the biological interpretation of data from genome-wide association studies, and will help in the understanding of how the associated genetic variants influence disease or quantitative phenotypes.
Original language: English
Journal: BMC Bioinformatics
Pages (from-to): 1-13
Number of pages: 13
Publication status: Published - 24 Sept 2014

