Abstract
This dissertation falls in two parts. The first part discusses statistical modelling
of association studies within the field of epidemiology. A special focus
is given to genome-wide association studies (GWAS), which are able to investigate
specific associations between positions in the genome and different
diseases. In the second part, statistical methods for inferring population history
are discussed. Knowledge of e.g. the common ancestor of the human
species, possible bottlenecks back in time, and the expected number of rare
variants in each genome may be factors in the full picture of any disease
aetiology.
Epidemiology
In epidemiology the term "odds ratio" is used for the estimator from any
case-control study, regardless of how the controls are sampled. The term
is ambiguous unless the sampling scheme of the controls is specified.
When controls are sampled among the non-diseased individuals at the end of
follow-up, i.e. the classical case-control study, the estimator consistently
estimates the odds ratio (OR). If controls are sampled among those at
risk when each case is diagnosed, i.e. the matched case-control study, the
estimator consistently estimates the incidence rate ratio (IRR). The OR is
interpreted as the effect of an exposure on the probability of being diseased
at the end of follow-up, while the IRR is interpreted as the effect of
an exposure on the probability of becoming diseased.
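The distinction between the two estimands can be illustrated with a small worked calculation. The sketch below assumes constant disease rates in the exposed and unexposed groups, which is an illustrative assumption and not a model taken from the dissertation; the function names and numeric rates are likewise hypothetical. It computes the OR at the end of follow-up and the IRR for the same pair of rates, for several follow-up lengths.

```python
import math


def cumulative_risk(rate, follow_up):
    """Probability of being diseased by the end of follow-up (constant rate)."""
    return 1.0 - math.exp(-rate * follow_up)


def odds_ratio(rate_exposed, rate_unexposed, follow_up):
    """Odds ratio of being diseased at the end of follow-up."""
    p1 = cumulative_risk(rate_exposed, follow_up)
    p0 = cumulative_risk(rate_unexposed, follow_up)
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


def incidence_rate_ratio(rate_exposed, rate_unexposed):
    """Ratio of the rates of becoming diseased."""
    return rate_exposed / rate_unexposed


if __name__ == "__main__":
    # Hypothetical rates: the exposed group has twice the disease rate.
    rate_exposed, rate_unexposed = 0.2, 0.1
    irr = incidence_rate_ratio(rate_exposed, rate_unexposed)
    for follow_up in (0.01, 1.0, 10.0):
        value = odds_ratio(rate_exposed, rate_unexposed, follow_up)
        print(f"follow-up={follow_up:>5}: OR={value:.3f}, IRR={irr:.3f}")
```

Under these assumptions the two quantities nearly coincide when follow-up is short and the disease is rare, and they drift apart as follow-up lengthens.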
Through a simulation study, the OR from a classical case-control study
is shown to be an inconsistent estimator of the IRR. The difference between
the OR and the IRR is reflected in the p-value of the null hypothesis of
no exposure effect. For multiple testing scenarios, e.g. in a GWAS, these
differences in estimators imply a change in comparison between the null
hypotheses for different sampling schemes of controls.
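A simulation of this kind can be sketched as follows. This is not the dissertation's simulation design: the cohort model (exponential event times, 50% exposure prevalence), the 1:1 risk-set sampling with a discordant-pair estimator, and all function names are assumptions made for illustration.

```python
import random


def simulate_cohort(n, rate_unexposed, rate_exposed, rng):
    """Return (event_time, exposed) pairs with exponential event times."""
    cohort = []
    for _ in range(n):
        exposed = rng.random() < 0.5  # assumed exposure prevalence
        rate = rate_exposed if exposed else rate_unexposed
        cohort.append((rng.expovariate(rate), exposed))
    return cohort


def classical_or(cohort, follow_up):
    """Cross-product ratio: cases versus those disease-free at end of follow-up."""
    a = b = c = d = 0
    for time, exposed in cohort:
        if time <= follow_up:
            if exposed:
                a += 1
            else:
                b += 1
        elif exposed:
            c += 1
        else:
            d += 1
    return (a * d) / (b * c)


def risk_set_ratio(cohort, follow_up, rng):
    """1:1 risk-set sampling; returns the ratio of discordant pairs."""
    ordered = sorted(cohort)
    n = len(ordered)
    case_only = control_only = 0
    for i, (time, case_exposed) in enumerate(ordered):
        if time > follow_up or i == n - 1:
            break
        # Everyone later in the ordering is still at risk at this event time,
        # including individuals who become cases later on.
        _, control_exposed = ordered[rng.randrange(i + 1, n)]
        if case_exposed and not control_exposed:
            case_only += 1
        elif control_exposed and not case_exposed:
            control_only += 1
    return case_only / control_only


if __name__ == "__main__":
    rng = random.Random(2016)
    cohort = simulate_cohort(50_000, 0.1, 0.2, rng)  # true rate ratio is 2
    print("classical case-control estimate:", classical_or(cohort, 10.0))
    print("risk-set sampling estimate:     ", risk_set_ratio(cohort, 10.0, rng))
```

With long follow-up and a common disease, the classical estimate settles well away from the rate ratio used to generate the data, while the risk-set estimate stays close to it.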
Population genetics
In population genetics two methods concerning the inference of the population
size back in time are described. Both methods are based on the site
frequency spectrum (SFS), and the fact that the expected SFS only depends
on the time between coalescent events back in time.
The first method provides a simple goodness-of-fit test by comparing the
observed SFS with the expected SFS under a given model of population size
changes. Using Monte Carlo estimation, the expected time between
coalescent events can be estimated and the expected SFS can thereby be
evaluated. Using the classical chi-square statistic we are able to infer
single-parameter models. Multiple-parameter models, e.g. multiple epochs, are
harder to identify.
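The steps of the first method can be sketched in a few lines. The sketch assumes a two-epoch model (a single change in population size at a fixed time), uses a standard coalescent identity that expresses the expected SFS as a weighted sum of the expected times with k lineages, and fits one parameter by a grid search. The model, the function names, the grid, and the synthetic "observed" counts are all illustrative assumptions, not the dissertation's implementation.

```python
import math
import random


def simulate_mean_times(n, t_change, ratio, reps, rng):
    """Monte Carlo estimate of E[T_k], k = 2..n, under a two-epoch model.

    Relative population size is 1 before t_change (looking back in time)
    and `ratio` afterwards. Index k of the result holds E[T_k].
    """
    totals = [0.0] * (n + 1)
    for _ in range(reps):
        u = 0.0       # time on the rescaled (constant-size) scale
        t_prev = 0.0  # previous coalescent time on the real scale
        for k in range(n, 1, -1):
            u += rng.expovariate(k * (k - 1) / 2.0)
            # Invert the integrated intensity to get back to real time.
            t = u if u < t_change else t_change + ratio * (u - t_change)
            totals[k] += t - t_prev
            t_prev = t
    return [x / reps for x in totals]


def expected_sfs(n, mean_times):
    """Normalised expected SFS for classes i = 1..n-1, given E[T_k]."""
    raw = []
    for i in range(1, n):
        s = 0.0
        for k in range(2, n - i + 2):
            # Probability that one of k lineages has i descendants in the sample.
            p = math.comb(n - i - 1, k - 2) / math.comb(n - 1, k - 1)
            s += k * p * mean_times[k]
        raw.append(s)
    total = sum(raw)
    return [x / total for x in raw]


def chi_square(observed, expected_props):
    """Classical chi-square statistic for observed counts vs. expected proportions."""
    total = sum(observed)
    return sum((o - total * e) ** 2 / (total * e)
               for o, e in zip(observed, expected_props))


if __name__ == "__main__":
    n = 10
    rng = random.Random(1)
    # Synthetic counts shaped like a constant-size SFS (illustration only).
    observed = [round(1000 / i) for i in range(1, n)]
    for ratio in (0.25, 0.5, 1.0, 2.0, 4.0):
        times = simulate_mean_times(n, 0.5, ratio, 5000, rng)
        stat = chi_square(observed, expected_sfs(n, times))
        print(f"ratio={ratio:>4}: chi-square={stat:.2f}")
```

The ratio giving the smallest statistic would be the point estimate in this one-parameter setting; with several free parameters, different combinations can give nearly the same expected SFS, which is one way to read the identifiability remark above.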
By treating the inference of population size back in time as an inverse
problem, the second procedure applies the theory of smoothing splines
to infer the changes in population size. By adding a penalising term to
the goodness-of-fit statistic described above, we are able to estimate the integrated
intensity of the coalescent process by a twice continuously differentiable
piecewise cubic polynomial.
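One common way to write such a penalised criterion is the classical smoothing-spline objective shown below. The abstract does not state the exact form, so this is an assumption, and the symbols are introduced here for illustration only: $\Lambda$ is the integrated intensity of the coalescent process, $X^2(\Lambda)$ the goodness-of-fit statistic, and $\lambda \ge 0$ a smoothing parameter.

```latex
\hat{\Lambda} \;=\; \arg\min_{\Lambda}
  \left\{ X^2(\Lambda) \;+\; \lambda \int \bigl(\Lambda''(t)\bigr)^2 \, dt \right\}
```

A penalty on the squared second derivative is the classical choice that leads to cubic splines in standard regression settings. A larger $\lambda$ favours smoother estimates, while $\lambda = 0$ reduces the criterion to the unpenalised goodness-of-fit statistic.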
Original language  English

Publisher  Århus Universitet

Number of pages  180
Status  Published  30 Sep 2016
Projects
 1 Active

iPSYCH: The Lundbeck Foundation Initiative for Integrative Psychiatric Research
Bang Rasmussen, A. (Participant)
01/01/2015 → …
Projects: Project › Research