Research output: Book/anthology/dissertation/report › Ph.D. thesis › Research

- Svend Vendelbo Nielsen thesis
Final published version, 4 MB, PDF-document

Gene flow is the transfer of genetic material from one population to another. It is very common and important in describing the genetic history of a population. Gene flow is hidden in the genome as migrated segments of different genetic material. Inferring gene flow from sequenced genomes is challenging. I present my work on estimating gene flow using three different statistical models.

With two sequences from different populations, the Isolation Migration CoalHMM model can determine the amount of gene flow between the populations after their initial split. It takes all possible migrated segments into account using the powerful HMM algorithms. In collaboration, I have built new CoalHMMs incorporating several pairs of sequences to infer direction and variation in the gene flow. I show that estimating direction and variation jointly is too hard, but that estimating direction alone is possible with some uncertainty. I apply the methods to a dataset of extinct and extant elephants and show that there is extensive gene flow.

Instead of considering all possible segments with an HMM, I examine the potential of only considering the most likely segments with particle filtering. I present challenges and advantageous choices when implementing a particle filter for this problem.

The second statistical model infers gene flow from a covariance matrix between several populations. Some relations between entries in the covariance matrix can only be explained by gene flow. In collaboration, I have developed a method that uses MCMC to fit the best phylogeny with gene flow events to an observed covariance matrix. A phylogeny with gene flow events is called an admixture graph, and the method is called AdmixtureBayes. I show that AdmixtureBayes has a smaller error than the most popular admixture graph estimators on simulated data. AdmixtureBayes produces a posterior sample of admixture graphs, and I demonstrate the possibilities of such a sample on a real dataset of Native American genomes.
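To illustrate why some covariance patterns require gene flow, here is a minimal sketch. It is not taken from the thesis or from AdmixtureBayes. It assumes the covariance between two populations is the drift they share from the root, so that under a tree without gene flow the two smallest of the three pairwise covariances in any triple must be equal. The function name, the tolerance, and the toy matrices are hypothetical.

```python
from itertools import combinations


def violates_tree_condition(cov, tol=1e-9):
    """Return the triples (i, j, k) whose pairwise covariances cannot
    come from a rooted tree without gene flow.

    Assumes cov[i][j] is the drift shared by populations i and j from
    the root. In a tree, two of the three pairs in any triple have the
    same most recent common ancestor, so the two smallest covariances
    among cov[i][j], cov[i][k], cov[j][k] are equal.
    """
    bad = []
    for i, j, k in combinations(range(len(cov)), 3):
        smallest, middle, _ = sorted([cov[i][j], cov[i][k], cov[j][k]])
        if middle - smallest > tol:
            bad.append((i, j, k))
    return bad


# Toy tree ((A, B), C): A and B share 0.3 units of drift, C splits at the root.
tree_cov = [
    [0.5, 0.3, 0.0],
    [0.3, 0.5, 0.0],
    [0.0, 0.0, 0.4],
]

# Toy admixture: A and B split at the root with unit drift each,
# and C is an equal mixture of the A and B lineages.
admixed_cov = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.5, 0.5, 0.5],
]

tree_violations = violates_tree_condition(tree_cov)
admixed_violations = violates_tree_condition(admixed_cov)
```

With estimated covariances the tolerance would have to reflect sampling noise, which is one reason a full statistical model is needed rather than a hard test like this one.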

The last statistical model infers very recent gene flow by classifying hybrid individuals. The genome of a hybrid individual has big segments of alleles originating from different populations. Using the allele frequencies from those populations, it is possible to infer the segments. In collaboration, I implemented ImmediateAncestry, which infers the segments and the most likely hybrid type with an HMM. I show that the classifier has good accuracy on simulated data. The classifier is not robust on a real dataset of chimpanzees, so I discuss reasons and remedies.
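The idea of inferring ancestry segments from allele frequencies with an HMM can be sketched as follows. This is a simplified illustration, not the ImmediateAncestry implementation. It assumes a single haploid sequence of 0/1 alleles, a constant per-site probability of switching ancestry, and allele frequencies strictly between 0 and 1. The function name and the parameter `switch` are hypothetical.

```python
import math


def viterbi_ancestry(alleles, freqs, switch=0.01):
    """Most likely ancestry path for a haploid sequence of 0/1 alleles.

    freqs[s][i] is the frequency of allele 1 at site i in population s
    (assumed strictly between 0 and 1). switch is the assumed per-site
    probability of changing ancestry between neighbouring sites.
    """
    n_states = len(freqs)
    stay = math.log(1.0 - switch)
    move = math.log(switch / (n_states - 1))

    def emit(s, i):
        # Log-probability of the observed allele given ancestry s.
        p = freqs[s][i]
        return math.log(p if alleles[i] == 1 else 1.0 - p)

    def step(r, s):
        # Log-probability of moving from ancestry r to ancestry s.
        return stay if r == s else move

    # Uniform prior over ancestries at the first site.
    score = [math.log(1.0 / n_states) + emit(s, 0) for s in range(n_states)]
    back = []
    for i in range(1, len(alleles)):
        new_score, pointers = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: score[r] + step(r, s))
            pointers.append(best)
            new_score.append(score[best] + step(best, s) + emit(s, i))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy data: allele 1 is common in population 0 and rare in population 1.
freqs = [[0.9] * 6, [0.1] * 6]
alleles = [1, 1, 1, 0, 0, 0]
path = viterbi_ancestry(alleles, freqs, switch=0.05)
```

In this toy example the recovered path changes ancestry once, in the middle of the sequence. A classifier of hybrid types would additionally have to handle diploid genotypes and tie the switch rate to the number of generations since admixture, which this sketch leaves out.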

Translated title of the contribution | Inferens af genflow mellem population ved hjælp af statistiske metoder |
---|---|
Original language | English |
Place of publication | Aarhus |
Publisher | Aarhus Universitet |
Number of pages | 139 |
Publication status | Published - 9 Oct 2018 |


ID: 130490309