Ratatosk: hybrid error correction of long reads enables accurate variant calling and assembly

Publication: Contribution to journal/Conference contribution in journal/Contribution to newspaper, Journal article, Research, peer review

  • Guillaume Holley, deCODE Genetics
  • Doruk Beyter, deCODE Genetics
  • Helga Ingimundardottir, deCODE Genetics
  • Peter L. Møller
  • Snædis Kristmundsdottir, deCODE Genetics, Reykjavík University
  • Hannes P. Eggertsson, deCODE Genetics
  • Bjarni V. Halldorsson, deCODE Genetics, Reykjavík University

A major challenge of long-read sequencing data is its high error rate of up to 15%. We present Ratatosk, a method to correct long reads with short-read data. We demonstrate on 5 human genome trios that Ratatosk reduces the error rate of long reads 6-fold on average, with a median error rate as low as 0.22%. SNP calls in Ratatosk-corrected reads are nearly 99% accurate, and indel call accuracy is increased by up to 37%. An assembly of Ratatosk-corrected reads from an Ashkenazi individual yields a contig N50 of 45 Mbp and fewer misassemblies than an assembly of PacBio HiFi reads.

Original language: English
Article number: 28
Journal: Genome Biology
Volume: 22
Issue: 1
Number of pages: 22
ISSN: 1474-7596
Status: Published - Dec. 2021

Bibliographical note

Funding Information:
The authors would like to thank our colleagues from deCODE genetics and Amgen Inc. We would also like to thank Rosemary Dokos and Philipp Rescheneder from Oxford Nanopore Technologies for their feedback on Ratatosk and providing the initial HG002 data set. Finally, we thank all research participants who provided a biological sample to deCODE genetics and to the Genome in a Bottle Consortium. The review history is available as Additional file 2.

Publisher Copyright:
© 2021, The Author(s).

Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.

ID: 207873367