TY - JOUR
T1 - Compositionally aware estimation of crosscorrelations for microbiome data
AU - Jensen, Ib Thorsgaard
AU - Janss, Luc
AU - Radutoiu, Simona
AU - Waagepetersen, Rasmus
N1 - Publisher Copyright:
© 2024 Jensen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2024/6
Y1 - 2024/6
N2 - In the field of microbiome studies, it is of interest to infer correlations between abundances of different microbes (here referred to as operational taxonomic units, OTUs). Several methods taking the compositional nature of the sequencing data into account exist. However, these methods cannot infer correlations between OTU abundances and other variables. In this paper we introduce the novel methods SparCEV (Sparse Correlations with External Variables) and SparXCC (Sparse Cross-Correlations between Compositional data) for quantifying correlations between OTU abundances and either continuous phenotypic variables or components of other compositional datasets, such as transcriptomic data. Spar- CEV and SparXCC both assume that the average correlation in the dataset is zero. Iterative versions of SparCEV and SparXCC are proposed to alleviate bias resulting from deviations from this assumption. We compare these new methods to empirical Pearson cross-correlations after applying naive transformations of the data (log and log-TSS). Additionally, we test the centered log ratio transformation (CLR) and the variance stabilising transformation (VST). We find that CLR and VST outperform naive transformations, except when the correlation matrix is dense. SparCEV and SparXCC outperform CLR and VST when the number of OTUs is small and perform similarly to CLR and VST for large numbers of OTUs. Adding the iterative procedure increases accuracy for SparCEV and SparXCC for all cases, except when the average correlation in the dataset is close to zero or the correlation matrix is dense. These results are consistent with our theoretical considerations.
AB - In the field of microbiome studies, it is of interest to infer correlations between abundances of different microbes (here referred to as operational taxonomic units, OTUs). Several methods taking the compositional nature of the sequencing data into account exist. However, these methods cannot infer correlations between OTU abundances and other variables. In this paper we introduce the novel methods SparCEV (Sparse Correlations with External Variables) and SparXCC (Sparse Cross-Correlations between Compositional data) for quantifying correlations between OTU abundances and either continuous phenotypic variables or components of other compositional datasets, such as transcriptomic data. Spar- CEV and SparXCC both assume that the average correlation in the dataset is zero. Iterative versions of SparCEV and SparXCC are proposed to alleviate bias resulting from deviations from this assumption. We compare these new methods to empirical Pearson cross-correlations after applying naive transformations of the data (log and log-TSS). Additionally, we test the centered log ratio transformation (CLR) and the variance stabilising transformation (VST). We find that CLR and VST outperform naive transformations, except when the correlation matrix is dense. SparCEV and SparXCC outperform CLR and VST when the number of OTUs is small and perform similarly to CLR and VST for large numbers of OTUs. Adding the iterative procedure increases accuracy for SparCEV and SparXCC for all cases, except when the average correlation in the dataset is close to zero or the correlation matrix is dense. These results are consistent with our theoretical considerations.
UR - http://www.scopus.com/inward/record.url?scp=85197194840&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0305032
DO - 10.1371/journal.pone.0305032
M3 - Journal article
C2 - 38941272
AN - SCOPUS:85197194840
SN - 1932-6203
VL - 19
JO - PLOS ONE
JF - PLOS ONE
IS - 6
M1 - e0305032
ER -