Addressing sources of variance in large-scale metabolomics

Research output: PhD thesis

Abstract

Metabolomics offers direct insight into the biochemical processes of a biological sample, providing a
snapshot of its metabolic state. By analyzing the complete set of metabolites (the small molecules
involved in cellular processes), metabolomics allows researchers to study organisms and to uncover
biomarkers for disease diagnosis, progression, and treatment response. The approach integrates complex
data from analytical techniques such as liquid chromatography-mass spectrometry (LC-MS) and
matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS), enabling screening of both
known and unknown molecules in a broad search for new markers. A significant limitation of
metabolomics is the difficulty of separating variance arising from technical and biological
confounders from the variance associated with the outcome of interest. Biological variance can stem
from genetic diversity, environmental factors, and day-to-day fluctuations between samples. Technical
variance, on the other hand, originates from factors such as inconsistencies in sample preparation,
instrument performance, and computational data processing. In this thesis, I aim to explore and
mitigate these sources of variance to enhance the reliability and reproducibility of metabolomics
studies. By employing robust statistical models, machine learning techniques, and tailored
experimental protocols, I optimize the extraction of true biological signal from noise. I show that
large-scale metabolomics studies are viable, and I apply them to gain insights into aging,
osteoporosis progression, and the prediction of antimicrobial resistance. These studies illustrate
three methodological perspectives: large-scale untargeted metabolomics, longitudinal sampling, and
end-to-end workflows. Specifically, I have modeled the age of individuals with a root mean squared
error (RMSE) of 5.77 years across ten thousand untargeted metabolomics samples, predicted
osteoporosis one year ahead of the original diagnosis with an area under the receiver operating
characteristic curve (ROC-AUC) of 0.72, and demonstrated that end-to-end modeling of MALDI-MS data
may predict antibiotic resistance with near-perfect accuracy. The proposed approaches not only improve
the precision of metabolic profiling but also pave the way for more accurate biomarker discovery and a
better understanding of disease mechanisms. Finally, I highlight that the presented work requires
methodological and biological validation in future studies to establish its generalizability.
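The two headline metrics in the abstract, RMSE for age prediction and ROC-AUC for osteoporosis prediction, can be made concrete with a short sketch. This is not the thesis code; it is a minimal, standard-library illustration of how the metrics are conventionally defined, assuming age is a continuous target in years and osteoporosis is a binary label (1 = diagnosed) scored by a model. The function names `rmse` and `roc_auc` and all toy values are hypothetical and invented for illustration; they are not data from the thesis.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the units of y (e.g. years of age)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC by exhaustive pairwise comparison.

    Equivalent to the normalized Mann-Whitney U statistic: the probability
    that a randomly chosen positive case scores higher than a randomly
    chosen negative case, with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy inputs, for illustration only.
ages_true = [30.0, 45.0, 60.0, 75.0]
ages_pred = [34.0, 41.0, 63.0, 70.0]
print(round(rmse(ages_true, ages_pred), 2))  # → 4.06

diagnosed = [1, 1, 0, 0, 1, 0]
risk_scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(round(roc_auc(diagnosed, risk_scores), 2))  # → 0.78
```

A perfect ranking of cases above controls gives a ROC-AUC of 1.0 and uninformative scores give about 0.5, which is the scale on which a value such as 0.72 should be read. The pairwise loop is quadratic in sample size; for large cohorts a rank-based computation of the same statistic is the usual choice.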
Translated title of the contribution: Identificering af varianskilder i storskala metabolomics
Original language: English
Qualification: PhD
Awarding Institution
  • Aarhus University
Supervisors/Advisors
  • Nielsen, Kirstine Lykke, Co-supervisor
  • Villesen, Palle, Supervisor
Award date: 8 Apr 2025
Publication status: Published, 8 Apr 2025