Description
The development of culture-independent methods has borne fruit: metagenomics and related fields are thriving thanks to well-developed pipelines and decades of work. Given the growing volume of whole-genome sequencing data, we can characterise metagenomic environments by function and taxonomy. Comparative metagenomics typically identifies features of interest as those with differentially abundant counts. However, differential abundance does not imply relevance to a given study, and it can arise simply from differences in diversity between samples. A key question in comparative metagenomics is how the environment shapes the functionome: which gene functions are under selection pressure in a given environment? We present MetaDEx, a simple Python package for identifying environmentally relevant functions and subsystems in comparative metagenomic data. The package takes annotations from MG-RAST as input and applies a simple workflow to identify genes and subsystems of interest. To validate the tool, we ran MetaDEx on MG-RAST data from two Moroccan lagoons, Nador and Oualidia, and used it to identify environmentally relevant functional categories, including genes for photosynthesis and bilin biosynthesis. MetaDEx combines a variety of data-wrangling methods in a simple, sequential workflow, simplifying the comparative metagenomic goal of identifying important features. It is available through PyPI under the MIT license.
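Raw count differences can mislead because of sample composition. The sketch below illustrates this in plain Python. It is not part of MetaDEx, and the function names and toy counts are hypothetical, not data from the Nador or Oualidia lagoons. The code converts per-function counts to relative abundances and computes a log2 fold change, adding a pseudocount so that functions absent from one sample do not produce log(0). In the toy example, photosynthesis has the same raw count in both samples. It still appears depleted in sample B, only because B contains an additional abundant function.

```python
import math


def relative_abundance(counts):
    """Convert raw per-function counts to proportions of the sample total."""
    total = sum(counts.values())
    return {func: n / total for func, n in counts.items()}


def log2_fold_change(counts_a, counts_b, pseudocount=1.0):
    """Log2 ratio of relative abundances (B over A) for every function seen in either sample.

    Hypothetical helper for illustration only. A pseudocount is added to every raw
    count so that a function missing from one sample does not cause division by zero.
    """
    functions = set(counts_a) | set(counts_b)
    a = {f: counts_a.get(f, 0) + pseudocount for f in functions}
    b = {f: counts_b.get(f, 0) + pseudocount for f in functions}
    rel_a = relative_abundance(a)
    rel_b = relative_abundance(b)
    return {f: math.log2(rel_b[f] / rel_a[f]) for f in functions}


# Toy counts (hypothetical): identical except that sample B has one extra abundant function.
sample_a = {"photosynthesis": 30, "bilin_biosynthesis": 10, "other": 60}
sample_b = {"photosynthesis": 30, "bilin_biosynthesis": 10, "other": 60, "new_function": 100}

lfc = log2_fold_change(sample_a, sample_b)
print(round(lfc["photosynthesis"], 2))  # -0.97
```

Here `photosynthesis` receives a log2 fold change of about -0.97, although its raw count is 30 in both samples. The shift reflects only the change in sample composition. For this reason, a difference in abundance on its own says little about whether a function is environmentally relevant.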