Diazotroph community characterization via a high-throughput nifH amplicon sequencing and analysis pipeline

Abstract

The dinitrogenase reductase gene (nifH) is the most widely established molecular marker for the study of nitrogen-fixing prokaryotes in nature. A large number of PCR primer sets have been developed for nifH amplification, and the effective deployment of these approaches should be guided by a rapid, easy-to-use analysis protocol. Bioinformatic analysis of marker gene sequences also requires considerable expertise. In this study, we advance the state of the art for nifH analysis by evaluating nifH primer set performance, developing an improved amplicon sequencing workflow, and implementing a user-friendly bioinformatics pipeline. The developed amplicon sequencing workflow is a three-stage PCR-based approach that uses established technologies for incorporating sample-specific barcode sequences and sequencing adapters. Based on our primer evaluation, we recommend that the Ando primer set be used with a modified annealing temperature of 58°C, as this approach captured the largest diversity of nifH templates, including paralog cluster IV/V sequences. To improve nifH sequence analysis, we developed a computational pipeline that infers taxonomy and optionally filters out paralog sequences. In addition, we employed an empirical model to derive optimal operational taxonomic unit (OTU) cutoffs for the nifH gene at the species, genus, and family levels. A comprehensive workflow script named TaxADivA (TAXonomy Assignment and DIVersity Assessment) is provided to ease processing and analysis of nifH amplicons. Our approach is then validated through characterization of diazotroph communities across environmental gradients in beach sands impacted by the Deepwater Horizon oil spill in the Gulf of Mexico, in a peat moss-dominated wetland, and in various plant compartments of a sugarcane field.

IMPORTANCE Nitrogen availability often limits ecosystem productivity, and nitrogen fixation, exclusive to prokaryotes, comprises a major source of nitrogen input that sustains food webs. The nifH gene, which codes for the iron protein of the nitrogenase enzyme, is the most widely established molecular marker for the study of nitrogen-fixing microorganisms (diazotrophs) in nature. In this study, a flexible sequencing/analysis pipeline, named TaxADivA, was developed for nifH amplicons produced by Illumina paired-end sequencing; it infers taxonomy, performs clustering, and produces output in formats that may be used by programs that facilitate data exploration and analysis. Diazotroph diversity and community composition are linked to ecosystem functioning, and our results advance the phylogenetic characterization of diazotroph communities by providing empirically derived nifH similarity cutoffs for species, genus, and family levels. The utility of our pipeline is validated for diazotroph communities in a variety of ecosystems, including contaminated beach sands, peatland ecosystems, living plant tissues, and rhizosphere soil.

Publication
Applied and Environmental Microbiology