Deploying a Single Strand of Life to Unravel the Environmental Microbiomes in Action

Research output: Book/anthology/dissertation/report; Ph.D. thesis; Research

  • Muhammad Zohaib Anwar
Metatranscriptomics allows unprecedented insight into the complex functional dynamics of microbial communities in various environments. The method is typically used to identify, quantify, and compare the functional response of microbial communities in natural habitats or in relation to environmental or physicochemical impacts. Using High Throughput Sequencing (HTS) techniques such as Illumina, metatranscriptomics offers a non-PCR-biased, state-of-the-art method for identifying and quantifying the transcriptional activity occurring within a complex and diverse microbial population at a specific point in time. To the best of my knowledge, there is no gold standard or standardized bioinformatics road map for analyzing the very large datasets produced by total RNA-based metatranscriptomics, which is an evident gap in the study of active microbial communities. The foci of this thesis were to benchmark, design, implement and validate an open and reproducible bioinformatics workflow to analyze environmental metatranscriptomes. Furthermore, this thesis highlights the potential caveats and future prospects of metatranscriptomics with regard to game-changing third-generation sequencers. The Introduction, in the Overview section, paints a comprehensive picture of the evolution of culture-independent, sequencing-based structural and functional microbial ecology, from amplicon-based analysis and shotgun metagenomics to state-of-the-art metatranscriptomics, along with their usability and caveats. Moreover, the Introduction emphasizes benchmarking as a mainstay in taking microbial bioinformatics forward, given the increase in throughput of Next Generation Sequencing (NGS). The Discussion section mainly elaborates on the methodological considerations, challenges and a few initial failures in benchmarking, developing and implementing the workflow.
The Discussion section also highlights the significance of an open, transparent and reproducible implementation structure that allows structured modifications and upgrades. Finally, the Discussion section identifies caveats of total RNA-based metatranscriptomics and points towards future prospects for improving the method.

Overall, six manuscripts (published, under review and in preparation) and a computational capsule are enclosed in this thesis. Conceptually, this section is divided into three areas: i) bioinformatics method development, ii) analysis of the infidelity of Reverse Transcriptase (RT) and formulation of its potential impact on metatranscriptomes, and iii) case studies of the bioinformatics method developed.

Manuscript I focuses on benchmarking two widely used alternatives for metatranscriptomic analysis: "assembly-based" and "assembly-free". This manuscript investigates these two contrasting approaches by analyzing a simulated dataset and two real-world metatranscriptomes from different environments. It also shows how the choice of approach significantly affects the interpretation and understanding of the transcriptional changes in the respective environment. Furthermore, based on the benchmarking, this manuscript presents a thorough road map and a standardized workflow -- Comparative Metatranscriptomics Workflow (CoMW) -- that helps in making informed decisions when analyzing complex metatranscriptomes. Manuscript II is a peer-reviewed computational capsule that presents the workflow in an open, modular and reproducible structure. Manuscript II strengthens the implementation structure by making the workflow available as a platform-independent, easy-to-install (Docker, Anaconda container) gold standard workflow. It can be structurally modified to accommodate expected future improvements in sequencing technologies.

Manuscript III touches on a potential caveat of metatranscriptomics by investigating the infidelity of the Reverse Transcriptase (RT) enzyme. It provides an overview of the types of errors and their potential association with GC content. Manuscript III highlights these errors and their potential implications for metatranscriptomes by using deep sequencing of DNA and cDNA of two strains, Salmonella enterica subsp. enterica serovar Enteritidis PT1 and Sphingobium herbicidovorans MH. Manuscript IV presents the first complete circular genome of Salmonella enterica subsp. enterica serovar Enteritidis PT1, which was assembled using a hybrid Illumina + Oxford Nanopore sequencing approach. This closed circular genome was used in Manuscript III to evaluate the assembly and the errors of the RT enzyme.

Manuscripts V, VI and VII are three case studies of the Comparative Metatranscriptomics Workflow (CoMW). These three independent studies were designed to study the transcriptional response of microbial communities to external stimuli using CoMW. Manuscript V presents a study designed to investigate the transcriptional response during warming from -10 °C to 2 °C and subsequent cooling from 2 °C to -10 °C of an Arctic tundra active layer soil from Svalbard, Norway. Manuscript VI presents a study of compositional and transcriptional effects on microbial communities in Danish forest and agricultural soils in response to wood ash amendment. Manuscript VII demonstrates the heat shock response of the active microbial communities from perennial cave ice. In Manuscript I, data from Manuscripts V and VI were also analyzed with the alternative "assembly-free" approach to highlight the impact of the choice of approach on interpretation.
Original language: English
Publisher: Aarhus Universitet
Number of pages: 183
Publication status: Published - Oct 2019

Note re. dissertation

Defence date: 24-10-2019

ID: 166967605