Automatic selection of reference taxa for protein–protein interaction prediction with phylogenetic profiling

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper, Journal article, Research, peer-review

  • Martin Simonsen, Denmark
  • Stefan R. Maetschke, The University of Queensland, Institute for Molecular Bioscience, Australia
  • Mark A Ragan, The University of Queensland, Institute for Molecular Bioscience, Australia
Motivation: Phylogenetic profiling methods can achieve good accuracy in predicting protein-protein interactions, especially in prokaryotes. Recent studies have shown that the choice of reference taxa is critical for accurate prediction, but with more than 2,500 fully sequenced taxa publicly available, identifying the most-informative reference taxa is becoming increasingly difficult. Previous studies on the selection of reference taxa have provided guidelines for manual taxon selection, and for eliminating closely related taxa. However, no general strategy for automatic selection of reference taxa is currently available.
Results: We present three novel approaches for automating the selection of reference taxa, using machine learning based on known protein-protein interaction networks. One of these approaches in particular, Tree-Based Search, yields greatly improved prediction accuracy. We further show that different methods for constituting phylogenetic profiles often require very different reference taxon sets to support high prediction accuracy.
Availability: All software used in the experiments can be found at
Original language: English
Pages (from-to): 851-857
Number of pages: 7
Publication status: Published - 2012

ID: 40373768