Deep sequencing of Danish Holstein dairy cattle for variant detection and insight into potential loss-of-function variants in protein coding genes

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper; Journal article; Research; peer-review

BACKGROUND: Over the last few years, continuous development of high-throughput sequencing platforms and sequence analysis tools has facilitated reliable identification and characterization of genetic variants in many cattle breeds. Deep sequencing of entire genomes within a cattle breed that has not been thoroughly investigated can be expected to uncover functional variants underlying phenotypic differences. Here, we sequenced Danish Holstein cattle to high coverage to detect and characterize single nucleotide polymorphisms (SNPs), insertions/deletions (indels), and loss-of-function (LoF) variants in protein-coding genes, in order to provide a comprehensive resource for subsequent detection of causal variants for recessive traits.

RESULTS: We sequenced four genetically unrelated Danish Holstein cows to a mean coverage of 27X using an Illumina HiSeq 2000. Multi-sample variant calling identified 10,796,794 SNPs and 1,295,036 indels, of which 482,835 (4.5 %) SNPs and 231,359 (17.9 %) indels were novel. A comparison between sequencing-derived SNPs and genotypes from the BovineHD BeadChip revealed a concordance rate of 99.6-99.8 % for homozygous SNPs and 93.3-96.5 % for heterozygous SNPs. Annotation of the variants identified 74,886 SNPs and 1,937 indels affecting coding sequences, with 2,145 being LoF mutations. The frequency of LoF variants differed greatly across the genome; a hot spot with a strikingly high density was observed in a 6 Mb region on BTA18. Genes affected by LoF variants were enriched for functional categories related to olfactory reception, whereas genes related to key cellular constituents and to the regulation of cellular and biological processes were underrepresented. The LoF variants were then filtered using sequence-derived genotype data for 288 Holstein animals from the 1000 Bull Genomes Project: removing every variant for which homozygous individuals were observed retained 345 of the LoF variants as putatively deleterious. A substantial number of these putatively deleterious LoF variants had a minor allele frequency >0.05 in the 1000 Bull Genomes data set.
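The homozygote filter described above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that genotypes are coded as the number of copies of the LoF allele (0, 1 or 2, with `None` for a missing call) and that "homozygous" means homozygous for the LoF allele. The function names, the variant identifiers and the toy panel are hypothetical.

```python
from typing import Dict, List, Optional

# Assumed coding: number of LoF-allele copies per animal (0, 1, 2); None = missing call.
Calls = List[Optional[int]]


def retain_without_homozygotes(panel: Dict[str, Calls]) -> List[str]:
    """Return the IDs of variants for which no animal carries two LoF copies."""
    return [vid for vid, calls in panel.items() if all(c != 2 for c in calls)]


def minor_allele_frequency(calls: Calls) -> float:
    """Minor allele frequency computed over the non-missing calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))  # frequency of the LoF allele
    return min(freq, 1.0 - freq)


# Hypothetical toy panel of four animals (the study used 288 Holstein animals).
panel = {
    "lof_a": [0, 1, 0, 1],     # carriers only: retained
    "lof_b": [0, 2, 1, 0],     # one homozygous animal: removed
    "lof_c": [0, 0, None, 1],  # missing call ignored: retained
}

retained = retain_without_homozygotes(panel)
common = [v for v in retained if minor_allele_frequency(panel[v]) > 0.05]
```

In this toy panel `lof_a` and `lof_c` pass the filter, and both exceed the 0.05 minor allele frequency threshold mentioned in the text.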

CONCLUSIONS: Deep sequencing of Danish Holstein genomes enabled us to identify 12.1 million variants. An investigation into LoF variants discovered a set of variants predicted to disrupt protein-coding genes. This catalog of variants will be a resource for future studies to understand variation underlying important phenotypes, particularly recessively inherited lethal phenotypes.

Original language: English
Article number: 1043
Journal: BMC Genomics
Pages (from-to): 1-12
Number of pages: 12
Publication status: Published - 2015

ID: 95244239