TY - JOUR
T1 - Versatile Detection of Diverse Selective Sweeps with Flex-Sweep
AU - Lauterbur, M. Elise
AU - Munch, Kasper
AU - Enard, David
PY - 2023/6
Y1 - 2023/6
N2 - Understanding the impacts of selection pressures influencing modern-day genomic diversity is a major goal of evolutionary genomics. In particular, the contribution of selective sweeps to adaptation remains an open question, with persistent statistical limitations on the power and specificity of sweep detection methods. Sweeps with subtle genomic signals have been particularly challenging to detect. Although many existing methods powerfully detect specific types of sweeps and/or those with strong signals, their power comes at the expense of versatility. We present Flex-sweep, a machine learning-based tool designed to detect sweeps with a variety of subtle signals, including those thousands of generations old. It is especially valuable for nonmodel organisms, for which we have neither expectations about the overall characteristics of sweeps nor outgroups with population-level sequencing to otherwise facilitate detecting very old sweeps. We show that Flex-sweep has the power to detect sweeps with subtle signals, even in the face of demographic model misspecification, recombination rate heterogeneity, and background selection. Flex-sweep detects sweeps up to 0.125∗4Ne generations old, including those that are weak, soft, and/or incomplete; it can also detect strong, complete sweeps up to 0.25∗4Ne generations old. We apply Flex-sweep to the 1000 Genomes Yoruba data set and, in addition to recovering previously identified sweeps, show that sweeps disproportionately occur within genic regions and are close to regulatory regions. In addition, we show that virus-interacting proteins (VIPs) are strongly enriched for selective sweeps, recapitulating previous results that demonstrate the importance of viruses as a driver of adaptive evolution in humans.
AB - Understanding the impacts of selection pressures influencing modern-day genomic diversity is a major goal of evolutionary genomics. In particular, the contribution of selective sweeps to adaptation remains an open question, with persistent statistical limitations on the power and specificity of sweep detection methods. Sweeps with subtle genomic signals have been particularly challenging to detect. Although many existing methods powerfully detect specific types of sweeps and/or those with strong signals, their power comes at the expense of versatility. We present Flex-sweep, a machine learning-based tool designed to detect sweeps with a variety of subtle signals, including those thousands of generations old. It is especially valuable for nonmodel organisms, for which we have neither expectations about the overall characteristics of sweeps nor outgroups with population-level sequencing to otherwise facilitate detecting very old sweeps. We show that Flex-sweep has the power to detect sweeps with subtle signals, even in the face of demographic model misspecification, recombination rate heterogeneity, and background selection. Flex-sweep detects sweeps up to 0.125∗4Ne generations old, including those that are weak, soft, and/or incomplete; it can also detect strong, complete sweeps up to 0.25∗4Ne generations old. We apply Flex-sweep to the 1000 Genomes Yoruba data set and, in addition to recovering previously identified sweeps, show that sweeps disproportionately occur within genic regions and are close to regulatory regions. In addition, we show that virus-interacting proteins (VIPs) are strongly enriched for selective sweeps, recapitulating previous results that demonstrate the importance of viruses as a driver of adaptive evolution in humans.
UR - http://www.scopus.com/inward/record.url?scp=85163821672&partnerID=8YFLogxK
U2 - 10.1093/molbev/msad139
DO - 10.1093/molbev/msad139
M3 - Journal article
C2 - 37307561
AN - SCOPUS:85163821672
SN - 0737-4038
VL - 40
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
IS - 6
M1 - msad139
ER -