TY - JOUR
T1 - Evaluating Bioinformatics Processing of Somatic Variant Detection in cfDNA Using Targeted Sequencing with UMIs
AU - Lin, Yixin
AU - Rasmussen, Mads Heilskov
AU - Christensen, Mikkel Hovden
AU - Frydendahl, Amanda
AU - Maretty, Lasse
AU - Andersen, Claus Lindbjerg
AU - Besenbacher, Søren
N1 - Publisher Copyright: © 2024 by the authors.
PY - 2024/11
Y1 - 2024/11
N2 - Circulating tumor DNA (ctDNA) is a promising cancer biomarker, but accurately detecting tumor mutations in cell-free DNA (cfDNA) is challenging due to their low frequency and sequencing errors. Our study benchmarked Mutect2, VarScan2, shearwater, and DREAMS-vc using deep targeted sequencing of cfDNA with Unique Molecular Identifiers (UMIs) from 111 colorectal cancer patients. Performance was assessed at both the mutation level (distinguish tumor variants from errors) and the sample level (detect if an individual has cancer). Additionally, we investigated the effects of various UMI grouping and consensus strategies. The shearwater-AND variant calling method demonstrated the highest precision in detecting tumor-derived mutations from plasma, and reached the highest ROC-AUC of 0.984 for sample classification in tumor-informed cfDNA analyses. DREAMS-vc exhibited the highest ROC-AUC of 0.808 for sample classification in tumor-agnostic studies. We also found that sequencing depth differences in PBMCs could lead to false positives, particularly with VarScan2 and Mutect2, which was addressed by downsampling to equivalent mean depths. Additionally, network-based UMI grouping methods outperformed those using identical UMIs when all reads were retained. Our findings emphasize that the optimal variant caller depends on the study context—whether focused on mutation or sample classification, and whether conducted under tumor-informed or tumor-agnostic conditions.
AB - Circulating tumor DNA (ctDNA) is a promising cancer biomarker, but accurately detecting tumor mutations in cell-free DNA (cfDNA) is challenging due to their low frequency and sequencing errors. Our study benchmarked Mutect2, VarScan2, shearwater, and DREAMS-vc using deep targeted sequencing of cfDNA with Unique Molecular Identifiers (UMIs) from 111 colorectal cancer patients. Performance was assessed at both the mutation level (distinguish tumor variants from errors) and the sample level (detect if an individual has cancer). Additionally, we investigated the effects of various UMI grouping and consensus strategies. The shearwater-AND variant calling method demonstrated the highest precision in detecting tumor-derived mutations from plasma, and reached the highest ROC-AUC of 0.984 for sample classification in tumor-informed cfDNA analyses. DREAMS-vc exhibited the highest ROC-AUC of 0.808 for sample classification in tumor-agnostic studies. We also found that sequencing depth differences in PBMCs could lead to false positives, particularly with VarScan2 and Mutect2, which was addressed by downsampling to equivalent mean depths. Additionally, network-based UMI grouping methods outperformed those using identical UMIs when all reads were retained. Our findings emphasize that the optimal variant caller depends on the study context—whether focused on mutation or sample classification, and whether conducted under tumor-informed or tumor-agnostic conditions.
KW - benchmarking
KW - cancer sample classification
KW - cell-free DNA
KW - low-frequency variant calling
KW - UMI sequencing
UR - http://www.scopus.com/inward/record.url?scp=85208548854&partnerID=8YFLogxK
U2 - 10.3390/ijms252111439
DO - 10.3390/ijms252111439
M3 - Journal article
C2 - 39518990
AN - SCOPUS:85208548854
SN - 1661-6596
VL - 25
JO - International Journal of Molecular Sciences
JF - International Journal of Molecular Sciences
IS - 21
M1 - 11439
ER -