Reference free phasing and representation of complex variation

Research output: Book/anthology/dissertation/report › Ph.D. thesis

  • Jacob Malte Jensen
High-throughput sequencing has revolutionized our ability to interrogate genomes, and entire human genomes are sequenced daily across the world. Mapping of short reads to a reference genome has enhanced our ability to detect genetic variation and is currently the most widely used approach for detecting and calling variation in humans. However, it has become evident that mapping short reads to a single reference genome is subject to ascertainment bias (reference bias). This bias is especially pronounced in complex regions of the genome and particularly hampers detection of structural variation. Therefore, new methods for detecting variation that reduce reference bias are needed, including ways of representing genomes that account for the variability within and between populations. The major histocompatibility complex (MHC) region is one of the most diverse and complex regions of the human genome. The region contains genes that play central roles in the immune response, and it has been associated with far more diseases than any other locus in the human genome. However, due to the complexity of the region, identifying causal variants has been challenging and in many cases futile. We have developed a new method to phase the MHC region without relying on a reference genome. Here, we present 100 de novo assembled and fully resolved MHC haplotypes from the Danish population. We use the haplotypes to call a large set of variants, including a significant number of structural variants. We use this call set to perform a population genetic analysis of the region. We also show that our haplotypes contain more than 700 kb of novel sequence and that some of the novel segments are common and polymorphic in the Danish population. Finally, we propose and implement a new method to construct population reference graphs from complete haplotypes and show that it can be used to efficiently store variation from the complex MHC region.
Original language: English
Number of pages: 186
Publication status: Published - 27 Jul 2017

ID: 115150535