Abstract
Progress in sequencing technologies has revolutionized data collection, leading to an exponential increase in the number of genomic sequences available. This wealth of data places the field of bioinformatics in a unique position, where novel biological questions can be posed. Accordingly, existing mathematical models and computational methods need to be reformulated. I address this from an inference perspective in two areas of bioinformatics: population genetics and pattern inference.
Population genetics studies the influence exerted by various factors on the dynamics of a population's genetic variation. These factors include evolutionary forces, such as mutation and selection, as well as changes in population size. The aim in population genetics is to untangle the history of a population from observed genetic variation. The subject is dominated by two dual models, the Wright-Fisher model and the coalescent. I first introduce a new approximation to the Wright-Fisher model, which I show can be used to infer split times between populations accurately. This approximation can potentially be applied to the inference of mutation rates and selection coefficients. I then illustrate how the coalescent process is the natural framework for detecting traces of common ancestry. Lastly, I discuss and extend efficient methods for calculating expectations of certain summary statistics of unobserved data.
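For readers unfamiliar with the Wright-Fisher model, the following minimal sketch simulates its simplest neutral form: each generation, every gene copy is drawn at random from the previous generation, so the allele count follows a binomial resampling step. It is an illustration only and not the approximation introduced in the thesis; the function names, the absence of mutation and selection, and the constant population size are all assumptions made for this example.

```python
import random


def wright_fisher_step(count, n_copies, rng):
    """Resample one generation: each of the n_copies gene copies
    picks the allele with probability equal to its current frequency."""
    freq = count / n_copies
    return sum(1 for _ in range(n_copies) if rng.random() < freq)


def simulate_wright_fisher(initial_count, n_copies, generations, seed=0):
    """Return the allele-count trajectory of a neutral Wright-Fisher
    population of constant size (hypothetical helper, illustration only)."""
    rng = random.Random(seed)
    trajectory = [initial_count]
    for _ in range(generations):
        trajectory.append(wright_fisher_step(trajectory[-1], n_copies, rng))
    return trajectory


if __name__ == "__main__":
    # 20 gene copies, allele initially present in 10 of them.
    print(simulate_wright_fisher(10, 20, 30, seed=1))
```

Counts of 0 and `n_copies` are absorbing: once the allele is lost or fixed, the trajectory stays there, which is the genetic drift behaviour the model is meant to capture.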
Identifying the intricate patterns that result from biological processes can often shed light on the underlying mechanisms. I address two independent problems of pattern inference within bioinformatics. The first is the occurrence of patterns described by regular expressions in observed or hidden sequences. I show how to detect statistically significant patterns in a list of ranked sequences, such as RNA sequences ranked by expression level. I then show how standard algorithms can be improved by including pattern occurrence in the hidden structure of observed sequences. Such a hidden structure could be the location and composition of genes within a DNA sequence. The second problem I target is the computational prediction of the pattern of base pairs that forms an RNA secondary structure. I introduce an evolutionary algorithm to search for a good predictor. Additionally, given a predictor, I show how to improve it using the kinetics of RNA folding coupled with the evolutionary information contained in an RNA alignment.
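To make the ranked-list problem concrete, the sketch below counts how many of the top `k` sequences in a ranked list contain a regular-expression pattern and computes a one-sided hypergeometric tail probability for that count. This is a generic enrichment test swapped in for illustration, not the method developed in the thesis; the function name, the example sequences, and the pattern are hypothetical.

```python
import re
from math import comb


def top_k_enrichment(ranked_sequences, pattern, k):
    """Return (hits in top k, P(X >= hits)) where X is hypergeometric:
    k sequences drawn without replacement from the whole list."""
    regex = re.compile(pattern)
    hits = [bool(regex.search(seq)) for seq in ranked_sequences]
    n = len(hits)
    total_hits = sum(hits)
    top_hits = sum(hits[:k])
    upper = min(k, total_hits)
    # Sum the hypergeometric probabilities from the observed count upward.
    tail = sum(
        comb(total_hits, x) * comb(n - total_hits, k - x)
        for x in range(top_hits, upper + 1)
    )
    return top_hits, tail / comb(n, k)


if __name__ == "__main__":
    # Hypothetical RNA sequences, already ranked (highest expression first).
    ranked = ["AUGGAC", "GGACUU", "UUUUUU", "ACACAC"]
    print(top_k_enrichment(ranked, r"GG[AU]C", k=2))
```

A small tail probability suggests the pattern is concentrated near the top of the ranking. The choice of `k` is arbitrary here, and a fixed cutoff is only one of several ways to score a ranked list.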
Original language | English |
---|---|
Publisher | Department of Computer Science, Aarhus University |
Number of pages | 178 |
Publication status | Published - 2015 |