Structured RNAs and synteny regions in the pig genome

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper; Journal article; Research; peer-review

  • Christian Anthon, Center for non-coding RNA in Technology and Health, IBHV University of Copenhagen, Denmark
  • Hakim Tafer, Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, Universität Leipzig, Germany
  • Jakob H Havgaard, Center for non-coding RNA in Technology and Health, IBHV University of Copenhagen, Denmark
  • Bo Thomsen
  • Jakob Hedegaard
  • Ernst Stefan Seemann, Institut for Klinisk Veterinær- og Husdyrvidenskab, Animal Genetics, Bioinformatics and Breeding, Denmark
  • Sachin Pundhir, 2012 Institut for Basal Husdyr- og Veterinærvidenskab, 2012 Genetik og Bioinformatik, Denmark
  • Stephanie Kehr, Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, Universität Leipzig, Germany
  • Sebastian Bartschat, Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, Universität Leipzig, Germany
  • Mathilde Nielsen, Denmark
  • Rasmus O Nielsen, GenoSkan A/S, Denmark
  • Merete Fredholm, Center for non-coding RNA in Technology and Health, IBHV University of Copenhagen, Denmark
  • Peter F Stadler, Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, Universität Leipzig, Germany
  • Jan Gorodkin, Center for non-coding RNA in Technology and Health, IBHV University of Copenhagen, Denmark

BACKGROUND: Annotating mammalian genomes for noncoding RNAs (ncRNAs) is nontrivial since far from all ncRNAs are known and the computational models are resource demanding. Currently, the human genome holds the best mammalian ncRNA annotation, a result of numerous efforts by several groups. However, a more direct strategy is desired for the increasing number of sequenced mammalian genomes of which some, such as the pig, are relevant as disease models and production animals.

RESULTS: We present a comprehensive annotation of structured RNAs in the pig genome. Combining sequence and structure similarity search as well as class-specific methods, we obtained a conservative set with a total of 3,391 structured RNA loci of which 1,011 and 2,314, respectively, hold strong sequence and structure similarity to structured RNAs in existing databases. The RNA loci cover 139 cis-regulatory element loci, 58 lncRNA loci, 11 conflicts of annotation, and 3,183 ncRNA genes. The ncRNA genes comprise 359 miRNAs, 8 ribozymes, 185 rRNAs, 638 snoRNAs, 1,030 snRNAs, 810 tRNAs and 153 ncRNA genes not belonging to the aforementioned classes. When running the pipeline on a local shuffled version of the genome, we obtained no matches at the highest confidence level. Additional analysis of RNA-seq data from a pooled library from 10 different pig tissues added another 165 miRNA loci, yielding an overall annotation of 3,556 structured RNA loci. This annotation represents our best effort at making an automated annotation. To further enhance the reliability, 571 of the 3,556 structured RNAs were manually curated by methods depending on the RNA class while 1,581 were declared as pseudogenes. We further created a multiple alignment of pig against 20 representative vertebrates, from which RNAz predicted 83,859 de novo RNA loci with conserved RNA structures. 528 of the RNAz predictions overlapped with the homology-based annotation or novel miRNAs. We further present a substantial synteny analysis which includes 1,004 lineage-specific de novo RNA loci and 4 ncRNA loci in the known annotation specific for Laurasiatheria (pig, cow, dolphin, horse, cat, dog, hedgehog).
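
The locus counts reported above are internally consistent, which can be checked with a short tally. The following is a minimal sketch in Python, not part of the published pipeline; all numbers are copied from the RESULTS paragraph, while the variable and dictionary names are illustrative assumptions.

```python
# Counts copied from the RESULTS paragraph; all names here are illustrative.
ncrna_genes = {
    "miRNA": 359,
    "ribozyme": 8,
    "rRNA": 185,
    "snoRNA": 638,
    "snRNA": 1030,
    "tRNA": 810,
    "other": 153,
}

# The four categories that make up the homology-based set of structured RNA loci.
locus_categories = {
    "cis-regulatory element": 139,
    "lncRNA": 58,
    "annotation conflict": 11,
    "ncRNA gene": sum(ncrna_genes.values()),
}

homology_total = sum(locus_categories.values())

# miRNA loci added from the pooled RNA-seq library.
novel_mirna_loci = 165
overall_total = homology_total + novel_mirna_loci

# Share of RNAz de novo predictions that overlap the annotation or novel miRNAs.
rnaz_predictions = 83859
rnaz_overlapping = 528
rnaz_overlap_fraction = rnaz_overlapping / rnaz_predictions

print("ncRNA genes:", locus_categories["ncRNA gene"])
print("homology-based loci:", homology_total)
print("overall loci:", overall_total)
print("RNAz overlap fraction:", round(rnaz_overlap_fraction, 4))
```

The tally reproduces the reported totals of 3,183 ncRNA genes, 3,391 homology-based loci and 3,556 loci overall, and shows that fewer than 1% of the RNAz predictions overlap the homology-based annotation or the novel miRNAs.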

CONCLUSIONS: We have obtained one of the most comprehensive annotations for structured ncRNAs of a mammalian genome, which is likely to play central roles in both health modelling and production. The core annotation is available in Ensembl 70 and the complete annotation is available at http://rth.dk/resources/rnannotator/susscr102/version1.02.

Original language: English
Article number: 459
Journal: BMC Genomics
Volume: 15
Issue: 459
Pages (from-to): 1-27
Number of pages: 27
ISSN: 1471-2164
Publication status: Published - 10 Jun 2014


ID: 77222877