## Abstract

In this thesis we consider three algorithmic problems that appear in computational biology and are mainly about comparing and constructing rooted phylogenetic trees and networks. A rooted phylogenetic tree is a rooted unordered tree that is used to represent evolutionary relationships among species. A rooted phylogenetic network is a rooted directed acyclic graph, and is able to represent more complex evolutionary relationships that arise because of reticulation events such as horizontal gene transfer and hybridization.

The first problem is about computing the rooted triplet distance between two rooted phylogenetic trees that are built on the same leaf label set. We provide two new algorithms that are cache oblivious, one optimized for binary trees and another that works for trees of arbitrary degree. Our algorithms achieve the best time and space bounds in the RAM model, and at the same time are the first to scale to external memory. We implement our algorithms and compare them experimentally against previous state-of-the-art algorithms, and show that our algorithms achieve the best performance in practice.

The second problem is about computing the rooted triplet distance between two rooted phylogenetic networks that are built on the same leaf label set. Previous algorithms were either based on a trivial approach or only worked under certain restrictions on the degrees of the nodes of the two input networks. We provide algorithms that have no such restrictions, as well as implementations and extensive experiments on both simulated and real datasets illustrating their practical performance.

Finally, the third problem is about constructing rooted phylogenetic trees, where the input is a set of binary trees on three leaves over a leaf label set, and the objective is to construct a rooted phylogenetic tree that has as embedded subtrees the largest number of input trees. This turns out to be an NP-Hard problem. Previous approximation algorithms produced trees with an arbitrary number of internal nodes, some even always produced binary trees. We study the computational complexity of the version of this problem, where there is a constraint on the number of internal nodes that the output tree is allowed to have. We provide several inapproximability results, approximation algorithms, as well as implementations and extensive experiments on both simulated and real datasets.

The first problem is about computing the rooted triplet distance between two rooted phylogenetic trees that are built on the same leaf label set. We provide two new algorithms that are cache oblivious, one optimized for binary trees and another that works for trees of arbitrary degree. Our algorithms achieve the best time and space bounds in the RAM model, and at the same time are the first to scale to external memory. We implement our algorithms and compare them experimentally against previous state-of-the-art algorithms, and show that our algorithms achieve the best performance in practice.

The second problem is about computing the rooted triplet distance between two rooted phylogenetic networks that are built on the same leaf label set. Previous algorithms were either based on a trivial approach or only worked under certain restrictions on the degrees of the nodes of the two input networks. We provide algorithms that have no such restrictions, as well as implementations and extensive experiments on both simulated and real datasets illustrating their practical performance.

Finally, the third problem is about constructing rooted phylogenetic trees, where the input is a set of binary trees on three leaves over a leaf label set, and the objective is to construct a rooted phylogenetic tree that has as embedded subtrees the largest number of input trees. This turns out to be an NP-Hard problem. Previous approximation algorithms produced trees with an arbitrary number of internal nodes, some even always produced binary trees. We study the computational complexity of the version of this problem, where there is a constraint on the number of internal nodes that the output tree is allowed to have. We provide several inapproximability results, approximation algorithms, as well as implementations and extensive experiments on both simulated and real datasets.

Original language | English |
---|

Publisher | Aarhus University |
---|---|

Number of pages | 169 |

Publication status | Published - Oct 2019 |