TY - JOUR
T1 - A random set scoring model for prioritization of disease candidate genes using protein complexes and data-mining of GeneRIF, OMIM and PubMed records
AU - Jiang, Li
AU - Edwards, Stefan McKinnon
AU - Thomsen, Bo
AU - Workman, Christopher T.
AU - Guldbrandtsen, Bernt
AU - Sørensen, Peter
PY - 2014/9/24
Y1 - 2014/9/24
N2 - Background: Prioritizing genetic variants is a challenge because disease susceptibility loci are often located in genes of unknown function or the relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile of genes with respect to their connection to disease phenotypes. The importance of protein-protein interaction networks in the genetic heterogeneity of common diseases or complex traits is becoming increasingly recognized. Thus, the development of a network-based approach combined with phenotypic profiling would be useful for disease gene prioritization. Results: We developed a random-set scoring model and implemented it to quantify phenotype relevance in a network-based disease gene-prioritization approach. We validated our approach based on different gene phenotypic profiles, which were generated from PubMed abstracts, OMIM, and GeneRIF records. We also investigated the validity of several vocabulary filters and different likelihood thresholds for predicted protein-protein interactions in terms of their effect on the network-based gene-prioritization approach, which relies on text-mining of the phenotype data. Our method demonstrated good precision and sensitivity compared with those of two alternative complex-based prioritization approaches. We then conducted a global ranking of all human genes according to their relevance to a range of human diseases. The resulting accurate ranking of known causal genes supported the reliability of our approach. Moreover, these data suggest many promising novel candidate genes for human disorders that have a complex mode of inheritance. Conclusion: We have implemented and validated a network-based approach to prioritize genes for human diseases based on their phenotypic profile. We have devised a powerful and transparent tool to identify and rank candidate genes. Our global gene prioritization provides a unique resource for the biological interpretation of data from genome-wide association studies, and will help in the understanding of how the associated genetic variants influence disease or quantitative phenotypes.
AB - Background: Prioritizing genetic variants is a challenge because disease susceptibility loci are often located in genes of unknown function or the relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile of genes with respect to their connection to disease phenotypes. The importance of protein-protein interaction networks in the genetic heterogeneity of common diseases or complex traits is becoming increasingly recognized. Thus, the development of a network-based approach combined with phenotypic profiling would be useful for disease gene prioritization. Results: We developed a random-set scoring model and implemented it to quantify phenotype relevance in a network-based disease gene-prioritization approach. We validated our approach based on different gene phenotypic profiles, which were generated from PubMed abstracts, OMIM, and GeneRIF records. We also investigated the validity of several vocabulary filters and different likelihood thresholds for predicted protein-protein interactions in terms of their effect on the network-based gene-prioritization approach, which relies on text-mining of the phenotype data. Our method demonstrated good precision and sensitivity compared with those of two alternative complex-based prioritization approaches. We then conducted a global ranking of all human genes according to their relevance to a range of human diseases. The resulting accurate ranking of known causal genes supported the reliability of our approach. Moreover, these data suggest many promising novel candidate genes for human disorders that have a complex mode of inheritance. Conclusion: We have implemented and validated a network-based approach to prioritize genes for human diseases based on their phenotypic profile. We have devised a powerful and transparent tool to identify and rank candidate genes. Our global gene prioritization provides a unique resource for the biological interpretation of data from genome-wide association studies, and will help in the understanding of how the associated genetic variants influence disease or quantitative phenotypes.
U2 - 10.1186/1471-2105-15-315
DO - 10.1186/1471-2105-15-315
M3 - Journal article
C2 - 25253562
SN - 1471-2105
VL - 15
SP - 1
EP - 13
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - 315
ER -