Inferring gene flow between populations using statistical methods

Research output: Book/anthology/dissertation/report › Ph.D. thesis › Research


Gene flow is the transfer of genetic material from one population to another.
It is very common and is important for describing the genetic history of
a population. Gene flow is hidden in the genome as migrated segments of
different genetic material. Inferring gene flow from sequenced genomes
is challenging. I present my work on estimating gene flow using three different
statistical models.
With two sequences from different populations, the Isolation Migration
CoalHMM model can determine the amount of gene flow between the populations
after their initial split. It takes all possible migrated segments into
account using the powerful HMM algorithms. In collaboration, I have built
new CoalHMMs incorporating several pairs of sequences to infer direction and
variation in the gene flow. I show that estimating direction and variation jointly
is too hard, but that estimating direction is possible with some uncertainty.
I apply the methods to a dataset of extinct and extant elephants and show that
there is extensive gene flow.
Instead of considering all possible segments with an HMM, I examine the
potential of only considering the most likely segments with particle filtering.
I present challenges and advantageous choices when implementing a particle
filter for this problem.
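
To illustrate the contrast between summing over all hidden paths and sampling
only likely ones, here is a minimal sketch on a toy two-state HMM. It is not
the CoalHMM or the particle filter from the thesis: the states, the
probabilities in INIT, TRANS and EMIT, the observations in OBS and both
function names are assumptions chosen for illustration. The forward algorithm
computes the likelihood exactly, and a bootstrap particle filter approximates
it with a finite number of sampled paths.

```python
import random

# Toy HMM (all numbers are illustrative assumptions).
# State 0: segment not migrated, state 1: segment migrated.
INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.2, 0.8]]   # TRANS[r][s] = P(next state s | state r)
EMIT = [[0.9, 0.1], [0.4, 0.6]]    # EMIT[s][o] = P(observation o | state s)
OBS = [0, 1, 1, 0]


def forward_likelihood(obs, init, trans, emit):
    """Exact likelihood: sums over every possible hidden state path."""
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return sum(alpha)


def particle_likelihood(obs, init, trans, emit, n_particles=5000, seed=0):
    """Bootstrap particle filter estimate of the same likelihood."""
    rng = random.Random(seed)
    states = list(range(len(init)))
    particles = rng.choices(states, weights=init, k=n_particles)
    likelihood = 1.0
    for t, o in enumerate(obs):
        if t > 0:
            # Propagate each particle through the transition model.
            particles = [rng.choices(states, weights=trans[p])[0] for p in particles]
        weights = [emit[p][o] for p in particles]
        likelihood *= sum(weights) / n_particles
        # Resample so that unlikely paths are dropped.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return likelihood


exact = forward_likelihood(OBS, INIT, TRANS, EMIT)
approx = particle_likelihood(OBS, INIT, TRANS, EMIT)
```

With enough particles, approx should be close to exact. The particle filter
becomes attractive when the state space is too large for the exact sum.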
The second statistical model infers gene flow from a covariance matrix
between several populations. Some relations between entries in the covariance
matrix can only be explained by gene flow. In collaboration I have developed a
method that fits the best phylogeny with gene flow events for an observed
covariance matrix. The method uses MCMC. A phylogeny with gene flow events
is called an admixture graph and the method is called AdmixtureBayes. I show
that AdmixtureBayes has a smaller error than the most popular admixture
graph estimators on simulated data. AdmixtureBayes produces a posterior
sample of admixture graphs and I demonstrate the possibilities with such a
sample on a real dataset of Native American genomes.
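
As a hedged illustration of how a covariance matrix can reveal gene flow, the
sketch below computes the covariance implied by a tiny admixture graph. It
assumes the common formulation in which each population is a weighted sum of
independent drift branches, with covariance measured relative to the root. The
graph, the branch lengths, the admixture proportion alpha and the function name
graph_covariance are all hypothetical and are not taken from AdmixtureBayes.

```python
def graph_covariance(weights, lengths):
    """Covariance implied by a graph of independent drift branches.

    weights[pop][branch]: fraction of pop's ancestry passing through branch
    lengths[branch]:      drift (variance) accumulated on that branch
    """
    pops = sorted(weights)
    return {
        (x, y): sum(
            weights[x].get(b, 0.0) * weights[y].get(b, 0.0) * lengths[b]
            for b in lengths
        )
        for x in pops
        for y in pops
    }


# Hypothetical graph: A and B split at the root; C is admixed from both.
lengths = {"root_to_A": 1.0, "root_to_B": 1.0, "C_private": 0.5}
alpha = 0.3  # assumed share of C's ancestry that comes from A
weights = {
    "A": {"root_to_A": 1.0},
    "B": {"root_to_B": 1.0},
    "C": {"root_to_A": alpha, "root_to_B": 1 - alpha, "C_private": 1.0},
}
cov = graph_covariance(weights, lengths)
```

Here A and B have zero covariance, yet C has positive covariance with both. In
a tree rooted at the reference point, C could share a branch with only one of
them, so under these assumptions the pattern requires an admixture event.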
The last statistical model infers very recent gene flow by classifying hybrid
individuals. The genome of a hybrid individual has long segments of alleles
originating from different populations. Using the allele frequencies from those
populations, it is possible to infer the segments. In collaboration, I implemented
ImmediateAncestry, which infers the segments and the most likely hybrid type
with an HMM. I show that the classifier has good accuracy on simulated data.
The classifier is not robust on a real dataset of chimpanzees, so I discuss
possible reasons and remedies.
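
The idea of inferring ancestry segments from allele frequencies can be sketched
with a small Viterbi decoder. This is a simplified illustration and not the
ImmediateAncestry implementation: it assumes a single haploid sequence, two
source populations labelled A and B, and a hypothetical per-site switch
probability. The function name viterbi_ancestry and all parameters are
assumptions, and the allele frequencies must lie strictly between 0 and 1.

```python
import math


def viterbi_ancestry(alleles, freq_a, freq_b, switch_prob=0.05):
    """Most likely ancestry label ('A' or 'B') per site.

    alleles:        observed alleles, coded 0 or 1
    freq_a, freq_b: per-site frequency of allele 1 in each source population
    switch_prob:    assumed probability that ancestry changes between sites
    """
    freqs = {"A": freq_a, "B": freq_b}
    states = ("A", "B")
    stay, switch = math.log(1.0 - switch_prob), math.log(switch_prob)

    def emit(state, i):
        p = freqs[state][i]
        return math.log(p if alleles[i] == 1 else 1.0 - p)

    def step(prev, cur):
        return stay if prev == cur else switch

    score = {s: math.log(0.5) + emit(s, 0) for s in states}
    back = []
    for i in range(1, len(alleles)):
        new_score, pointers = {}, {}
        for s in states:
            best = max(states, key=lambda r: score[r] + step(r, s))
            pointers[s] = best
            new_score[s] = score[best] + step(best, s) + emit(s, i)
        back.append(pointers)
        score = new_score

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


segments = viterbi_ancestry(
    [1, 1, 1, 1, 0, 0, 0, 0], [0.9] * 8, [0.1] * 8, switch_prob=0.05
)
```

For this input the decoder labels the first four sites A and the last four B,
which is a single ancestry switch. A classifier of hybrid types would
additionally need a diploid model and a comparison of likelihoods across
candidate pedigrees, which this sketch omits.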
Original language: English
Place of publication: Aarhus
Publisher: Aarhus Universitet
Number of pages: 139
Publication status: Published - 9 Oct 2018

ID: 130490309