
Leveraging Multiple Layers of Data To Predict Drosophila Complex Traits

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper › Journal article › Research › peer-review


  • Fabio Morgante, North Carolina State University
  • Wen Huang, North Carolina State University
  • Peter Sørensen
  • Christian Maltecca, North Carolina State University
  • Trudy F.C. Mackay, North Carolina State University

The ability to accurately predict complex trait phenotypes from genetic and genomic data is critical for the implementation of personalized medicine and precision agriculture; however, prediction accuracy for most complex traits is currently low. Here, we used data on whole genome sequences, deep RNA sequencing, and high-quality phenotypes for three quantitative traits in the ∼200 inbred lines of the Drosophila melanogaster Genetic Reference Panel (DGRP) to compare the prediction accuracies of gene expression and genotypes for three complex traits. We found that expression levels (r = 0.28 and 0.38, for females and males, respectively) provided higher prediction accuracy than genotypes (r = 0.07 and 0.15, for females and males, respectively) for starvation resistance, similar prediction accuracy for chill coma recovery (null for both models and sexes), and lower prediction accuracy for startle response (r = 0.15 and 0.14 for female and male genotypes, respectively; and r = 0.12 and 0.11 for female and male transcripts, respectively). Models including both genotype and expression levels did not outperform the best single component model. However, accuracy increased considerably for all three traits when we included gene ontology (GO) category as an additional layer of information for both genomic variants and transcripts. We found strongly predictive GO terms for each of the three traits, some of which had a clear and plausible biological interpretation. For example, for starvation resistance in females, GO:0033500 (r = 0.39 for transcripts) and GO:0032870 (r = 0.40 for transcripts) have been implicated in carbohydrate homeostasis and cellular response to hormone stimulus (including the insulin receptor signaling pathway), respectively. In summary, this study shows that integrating different sources of information improved prediction accuracy and helped elucidate the genetic architecture of three Drosophila complex phenotypes.
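The r values above can be read as a measure of how well predictions match observed phenotypes. The following is a minimal sketch of one common way to obtain such a number, assuming r is the Pearson correlation between predicted and observed phenotypes in held-out lines under cross-validation; the abstract does not state the exact procedure. The ridge regression model, the simulated data, and all function and parameter names (`ridge_fit_predict`, `cv_prediction_accuracy`, `lam`, `n_folds`) are illustrative assumptions, not the authors' method or the DGRP data.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, lam=1.0):
    """Fit ridge regression on training lines and predict held-out lines.

    Hypothetical stand-in for a whole-genome or transcriptome model.
    """
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    # Dual form: solve an n x n system, cheaper when features outnumber lines.
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), y_train - y_mean)
    beta = Xc.T @ alpha
    return (X_test - x_mean) @ beta + y_mean


def cv_prediction_accuracy(X, y, n_folds=5, lam=1.0, seed=0):
    """Mean Pearson r between predicted and observed values over CV folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    fold_rs = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        y_hat = ridge_fit_predict(X[train_mask], y[train_mask], X[test_idx], lam)
        fold_rs.append(np.corrcoef(y_hat, y[test_idx])[0, 1])
    return float(np.mean(fold_rs))


# Simulated example (NOT real data): 200 lines, 500 features, 20 with an effect.
rng = np.random.default_rng(42)
n_lines, n_features = 200, 500
X = rng.standard_normal((n_lines, n_features))
true_effects = np.zeros(n_features)
true_effects[:20] = rng.standard_normal(20)
y = X @ true_effects + 2.0 * rng.standard_normal(n_lines)

accuracy = cv_prediction_accuracy(X, y, n_folds=5, lam=100.0)
print(f"cross-validated prediction accuracy r = {accuracy:.2f}")
```

Under this reading, comparing genotypes with transcripts amounts to calling the same routine with a different feature matrix `X` for the same lines and phenotype `y`, then comparing the resulting r values.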

Original language: English
Journal: G3 (Bethesda, Md.)
Pages (from-to): 4599-4613
Number of pages: 15
Publication status: Published - Dec 2020

    Research areas

  • Drosophila Genetic Reference Panel, Gene Ontology informed prediction, Genomic prediction, GenPred, Shared data resources, Transcriptomic prediction


ID: 202548390