Compositionality, sparsity, spurious heterogeneity, and other data-driven challenges for machine learning algorithms within plant microbiome studies

Sebastiano Busato, Max Gordon, Meenal Chaudhari, Ib Thorsgaard Jensen, Turgut Yigit Akyol, Stig Uggerhøj Andersen, Cranos Williams

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper › Review › Research › peer-review

Abstract

The plant-associated microbiome is a key component of plant systems, contributing to their health, growth, and productivity. The application of machine learning (ML) in this field promises to help untangle the relationships involved. However, measurements of microbial communities by high-throughput sequencing pose challenges for ML. Noise from low sample sizes, soil heterogeneity, and technical factors can impact the performance of ML. Additionally, the compositional and sparse nature of these datasets can impact the predictive accuracy of ML. We review recent literature from plant studies to illustrate that these properties often go unmentioned. We expand our analysis to other fields to quantify the degree to which mitigation approaches improve the performance of ML and describe the mathematical basis for this. With the advent of accessible analytical packages for microbiome data including learning models, researchers must be familiar with the nature of their datasets.
Original language: English
Article number: 102326
Journal: Current Opinion in Plant Biology
Volume: 71
Number of pages: 11
ISSN: 1369-5266
Publication status: Published - Feb 2023

Keywords

  • Machine learning
  • Plant-associated microbiome
  • Compositional data analysis
  • Deep learning
  • Microbiota
  • Algorithms
  • Plants
