Evolving stochastic context-free grammars for RNA secondary structure prediction

Research output: Contribution to journal › Journal article › Research › peer-review

  • James WJ Anderson, Department of Statistics, University of Oxford, United Kingdom
  • Paula Cristina Tataru, Denmark
  • Joe Stains, University of Oxford, United Kingdom
  • Jotun Hein, University of Oxford, United Kingdom
  • Rune Lyngsø, University of Oxford, United Kingdom
Stochastic context-free grammars (SCFGs) were applied successfully to RNA secondary structure prediction in the early 1990s, and were combined with comparative methods in the late 1990s. The set of SCFGs potentially useful for RNA secondary structure prediction is very large, yet a few intuitively designed grammars have remained dominant. In this paper we investigate two automatic search techniques for effective grammars: exhaustive search for very compact grammars, and an evolutionary algorithm for finding larger grammars. We also examine whether grammar ambiguity is as problematic for structure prediction as has previously been suggested.
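To make concrete how an SCFG predicts a structure, the sketch below uses a toy grammar S → aS | aSâS | ε (a is an unpaired base; a…â is a base pair). It runs a CYK-style Viterbi recursion that returns the most probable parse as a dot-bracket string. The grammar, the rule and emission probabilities, the function name `viterbi_fold` and the minimum hairpin loop length `MIN_LOOP` are all illustrative assumptions. They are not the grammars or trained parameters from the paper.

```python
import math

# Hypothetical parameters for the toy grammar S -> aS | a S â S | ε.
# Rule probabilities sum to 1; the numbers are illustrative, not trained.
RULE = {"single": 0.4, "pair": 0.3, "end": 0.3}
SINGLE = {b: 0.25 for b in "ACGU"}  # unpaired-base emission probabilities
CANONICAL = {"AU": 0.2, "UA": 0.2, "GC": 0.2, "CG": 0.2, "GU": 0.08, "UG": 0.08}
# The 10 non-canonical pairs share the remaining 0.04 of probability mass.
PAIR = {a + b: CANONICAL.get(a + b, 0.004) for a in "ACGU" for b in "ACGU"}
MIN_LOOP = 3  # assumed minimum number of bases enclosed by a pair


def viterbi_fold(seq):
    """Return (log-probability, dot-bracket) of the most probable parse."""
    n = len(seq)
    log = math.log
    # V[i][j]: best log-probability that S derives seq[i:j] (half-open).
    V = [[-math.inf] * (n + 1) for _ in range(n + 1)]
    back = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        V[i][i] = log(RULE["end"])  # S -> ε on the empty interval
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # S -> aS: seq[i] is left unpaired.
            best = log(RULE["single"]) + log(SINGLE[seq[i]]) + V[i + 1][j]
            arg = None
            # S -> a S â S: seq[i] pairs with seq[k], enclosing >= MIN_LOOP bases.
            for k in range(i + MIN_LOOP + 1, j):
                score = (log(RULE["pair"]) + log(PAIR[seq[i] + seq[k]])
                         + V[i + 1][k] + V[k + 1][j])
                if score > best:
                    best, arg = score, k
            V[i][j], back[i][j] = best, arg
    # Trace the winning derivation back into dot-bracket notation.
    struct = ["."] * n
    stack = [(0, n)]
    while stack:
        i, j = stack.pop()
        if i == j:
            continue
        k = back[i][j]
        if k is None:
            stack.append((i + 1, j))
        else:
            struct[i], struct[k] = "(", ")"
            stack.extend([(i + 1, k), (k + 1, j)])
    return V[0][n], "".join(struct)
```

For example, `viterbi_fold("GGGGAAAACCCC")` folds into a single hairpin, `((((....))))`, under these made-up parameters. Each G-C pair scores higher than leaving both bases unpaired, while non-canonical pairs score lower.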

These search techniques were applied to predict RNA secondary structure on a maximal data set and revealed new and interesting grammars, though none is dramatically better than the classic grammars. In general, the results showed that many grammars with quite different structures can have very similar predictive ability. Many ambiguous grammars were found that were at least as effective as the best current unambiguous grammars.
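Ambiguity here means that a single secondary structure can be produced by more than one derivation. A structure's probability is then the sum over all of its derivations, so the single most probable parse returned by a Viterbi search need not correspond to the most probable structure. The hypothetical helper below counts the derivations of a dot-bracket structure under two toy grammars. The first is the unambiguous S → aS | aSâS | ε. The second is an ambiguous variant that adds S → Sa, which lets unpaired bases be generated from either end. Both grammars are illustrative choices, not ones taken from the paper.

```python
from functools import lru_cache


def partners(structure):
    """Map each '(' index to its matching ')' index in a dot-bracket string."""
    stack, match = [], {}
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            match[stack.pop()] = idx
    return match


def num_derivations(structure, ambiguous):
    """Count derivations of one secondary structure under a toy grammar.

    ambiguous=True : S -> aS | Sa | a S â S | ε
    ambiguous=False: S -> aS | a S â S | ε
    """
    match = partners(structure)

    @lru_cache(maxsize=None)
    def count(i, j):
        if i == j:
            return 1  # only S -> ε derives the empty interval
        total = 0
        if structure[i] == ".":
            total += count(i + 1, j)  # S -> aS
        if ambiguous and structure[j - 1] == ".":
            total += count(i, j - 1)  # S -> Sa
        if structure[i] == "(":
            k = match[i]
            if k < j:
                total += count(i + 1, k) * count(k + 1, j)  # S -> a S â S
        return total

    return count(0, len(structure))
```

Under the ambiguous variant, a run of three unpaired bases already has 2³ = 8 derivations. The unambiguous grammar gives exactly one derivation for every structure.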

Overall, evolving SCFGs for RNA secondary structure prediction proved effective: the method found many grammars with strong predictive accuracy, as good as or slightly better than those designed manually. Furthermore, several of the best grammars found were ambiguous, demonstrating that such grammars should not be disregarded.
Original language: English
Journal: BMC Bioinformatics
Publication status: Published - 4 May 2012
