TY - JOUR
T1 - Benchmarking long-read sequencing strategies for obtaining ASV-resolved rRNA operons from environmental microeukaryotes
AU - Overgaard, Christina Karmisholt
AU - Jamy, Mahwash
AU - Radutoiu, Simona
AU - Burki, Fabien
AU - Dueholm, Morten Kam Dahl
N1 - Publisher Copyright: © 2024 The Author(s). Molecular Ecology Resources published by John Wiley & Sons Ltd.
PY - 2024/10
Y1 - 2024/10
N2 - The use of short-read metabarcoding for classifying microeukaryotes is challenged by the lack of comprehensive 18S rRNA reference databases. While recent advances in high-throughput long-read sequencing provide the potential to greatly increase the phylogenetic coverage of these databases, the performance of different sequencing technologies and subsequent bioinformatics processing remains to be evaluated, primarily because of the absence of well-defined eukaryotic mock communities. To address this challenge, we created a eukaryotic rRNA operon clone library and turned it into a precisely defined synthetic eukaryotic mock community. This mock community was then used to evaluate the performance of three long-read sequencing strategies (PacBio circular consensus sequencing and two Nanopore approaches using unique molecular identifiers) and three tools for resolving amplicon sequence variants (ASVs) (USEARCH, VSEARCH, and DADA2). We investigated the sensitivity of the sequencing techniques based on the number of detected mock taxa, and the accuracy of the different ASV-calling tools with a specific focus on the presence of chimeras among the final rRNA operon ASVs. Based on our findings, we provide recommendations and best-practice protocols for how to cost-effectively obtain essentially error-free rRNA operons in high throughput. An agricultural soil sample was used to demonstrate that the sequencing and bioinformatic results from the mock community also translate to highly diverse natural samples, which enables us to identify previously undescribed microeukaryotic lineages.
AB - The use of short-read metabarcoding for classifying microeukaryotes is challenged by the lack of comprehensive 18S rRNA reference databases. While recent advances in high-throughput long-read sequencing provide the potential to greatly increase the phylogenetic coverage of these databases, the performance of different sequencing technologies and subsequent bioinformatics processing remains to be evaluated, primarily because of the absence of well-defined eukaryotic mock communities. To address this challenge, we created a eukaryotic rRNA operon clone library and turned it into a precisely defined synthetic eukaryotic mock community. This mock community was then used to evaluate the performance of three long-read sequencing strategies (PacBio circular consensus sequencing and two Nanopore approaches using unique molecular identifiers) and three tools for resolving amplicon sequence variants (ASVs) (USEARCH, VSEARCH, and DADA2). We investigated the sensitivity of the sequencing techniques based on the number of detected mock taxa, and the accuracy of the different ASV-calling tools with a specific focus on the presence of chimeras among the final rRNA operon ASVs. Based on our findings, we provide recommendations and best-practice protocols for how to cost-effectively obtain essentially error-free rRNA operons in high throughput. An agricultural soil sample was used to demonstrate that the sequencing and bioinformatic results from the mock community also translate to highly diverse natural samples, which enables us to identify previously undescribed microeukaryotic lineages.
KW - benchmarking
KW - eukaryotic mock community
KW - microeukaryotes
KW - rRNA operons
UR - http://www.scopus.com/inward/record.url?scp=85197689904&partnerID=8YFLogxK
U2 - 10.1111/1755-0998.13991
DO - 10.1111/1755-0998.13991
M3 - Journal article
C2 - 38979877
AN - SCOPUS:85197689904
SN - 1755-098X
VL - 24
JO - Molecular Ecology Resources
JF - Molecular Ecology Resources
IS - 7
M1 - e13991
ER -