Ancestral multi-species population genomics

Research output: Book/anthology/dissertation/report, Ph.D. thesis

Abstract

The ancestral history of DNA sequences cannot be represented with a single genealogy, and, thus, its
reconstruction is an arduous task, especially in the multi-species case. In this thesis, which reflects
the work I conducted during my PhD, I present a collection of six articles showcasing theoretical,
methodological, and empirical challenges and solutions for inferring the ancestral history of genomic
samples. In the first paper, we characterize the pervasive signal of incomplete lineage sorting (ILS)
in primates, reconstructing the primate phylogeny with estimated species split times and linking the
genome-wide variation of ILS to selective processes. We also provide examples of genomic locations
consistently showing low and high levels of ILS across primates. In the second study, targeting ILS in the
rapidly radiating marsupial phylogeny, we resolve the previously troublesome phylogenetic placement of
a marsupial species, we identify genes under the influence of ILS that might contribute to phenotypic
hemiplasy, and we experimentally validate these candidate genes using transgenic mice. In the third study,
an ILS-aware exploration of bird genomes enabled the inference of an updated topology of the Neoaves
clade within the avian phylogeny. It also revealed a long high-ILS region in an avian chromosome, in
which we hypothesize that recombination was suppressed for millions of years due to polymorphic
large-scale chromosomal rearrangements. In the fourth paper, we describe the development of a novel
coalescent-based hidden Markov model, in which we derive the coalescent-with-recombination formulas
for the sorting of lineages of three species in a time-discretized coalescent space. This allows for the
unbiased estimation of the population genetics parameters of the speciation process and the inference of
the multi-species ancestral recombination graph through posterior decoding of the hidden states. In the
fifth manuscript, we provide a novel way of rethinking the coalescent through phase-type distributions,
which can be used to model many quantities in population genomics through general formulas in matrix
notation. We include various examples to illustrate the usefulness of phase-type theory in population
genetics by contrasting them with classical mathematical derivations. Finally, in the sixth paper, we describe
an R package we developed, which contains general-purpose and user-friendly functions for key operations
arising from phase-type theory. We further supplement the software with extensive documentation and
examples from population genetics.
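
As an illustration of the "general formulas in matrix notation" mentioned above, the following minimal Python sketch computes the expected time to the most recent common ancestor of n sampled lineages under the standard Kingman coalescent, using the textbook phase-type mean E[T] = alpha (-S)^{-1} 1. It is not taken from the thesis or its R package; the function names are hypothetical, time is assumed to be measured in coalescent units (each pair of lineages coalesces at rate 1), and recombination and population structure are ignored.

```python
from fractions import Fraction


def kingman_subintensity(n):
    """Sub-intensity matrix S over transient states n, n-1, ..., 2 lineages.

    Assumption: standard Kingman coalescent, so the rate of leaving the
    state with k lineages is k*(k-1)/2.
    """
    size = n - 1
    S = [[Fraction(0)] * size for _ in range(size)]
    for i in range(size):
        k = n - i  # number of lineages in transient state i
        rate = Fraction(k * (k - 1), 2)
        S[i][i] = -rate
        if i + 1 < size:
            S[i][i + 1] = rate  # move to the state with k-1 lineages
    return S


def solve_upper_triangular(A, b):
    """Solve A x = b by back-substitution for an upper-triangular A."""
    size = len(b)
    x = [Fraction(0)] * size
    for i in reversed(range(size)):
        acc = b[i] - sum(A[i][j] * x[j] for j in range(i + 1, size))
        x[i] = acc / A[i][i]
    return x


def phase_type_mean(alpha, S):
    """Mean of a phase-type distribution: alpha (-S)^{-1} 1.

    Uses back-substitution, so S must be upper triangular (true here).
    """
    neg_S = [[-v for v in row] for row in S]
    ones = [Fraction(1)] * len(S)
    x = solve_upper_triangular(neg_S, ones)
    return sum(a * xi for a, xi in zip(alpha, x))


def expected_tmrca(n):
    """Expected T_MRCA for n >= 2 lineages, starting in the n-lineage state."""
    S = kingman_subintensity(n)
    alpha = [Fraction(1)] + [Fraction(0)] * (n - 2)
    return phase_type_mean(alpha, S)


if __name__ == "__main__":
    for n in (2, 4, 10):
        print(n, expected_tmrca(n))
```

The matrix result can be checked against the classical derivation, which sums the expected waiting times 2/(k(k-1)) for k = 2, ..., n and gives 2(1 - 1/n). Exact fractions are used here only to make that comparison free of rounding error.
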
Original language: English
Publisher: Aarhus University
Number of pages: 176
Publication status: Published - Oct 2023