Sequence reprocessing
We obtained raw 16S rRNA gene amplicon data and metadata from the NCBI
Sequence Read Archive, with the exception of two datasets: one came from
another database, and the other was obtained directly from the authors
(see Table S1 for accession numbers). We processed sequences in
R 3.4.3 (R Core Team 2017) using the dada2 package
(Callahan et al. 2016a). Before processing, we visually inspected the
quality profiles of two sequence files per study with the
plotQualityProfile function to determine whether the reads had been
merged before archiving and to confirm that primers were not present. We
used only forward reads because reverse reads were not available for all
studies. Following inspection, we
trimmed and truncated sequences on a study-by-study basis (see Table S1
for trimming and truncation lengths) to preserve a 90 bp segment, the
minimum recommended in the Earth Microbiome Project protocols
(Thompson et al. 2017). We filtered, dereplicated, and chimera-checked
each read using standard workflow parameters (Callahan et al. 2016b),
resolved reads into exact amplicon sequence variants (ASVs; 100%
sequence identity across the 90 bp segment), and assigned taxonomy to
each ASV with the SILVA v.132 training set (Quast et al. 2013). We
removed reads not assigned to the domain
Bacteria. The percentage of reads lost at each step of sequence
processing is reported per study in Figure S1. Because sequencing depth
varied widely across samples, we standardized all samples to 1,500
reads. To ensure that our findings were not affected
by observation depth, we also repeated all analyses using the deepest
observation depth possible for each study (with a lower bound of 1,500
reads per sample; Table S1). Because our findings were consistent
regardless of standardization (Figure S2), we present only the results
from the global standardization. To examine the completeness of
each sample relative to its community, we calculated sample coverage
(Chao and Jost 2012), the estimated fraction of the community's
individuals that belong to taxa detected in the sample, using the BetaC
package (Engel et al. 2020). On average, our samples represented
0.96 ± 0.05 (mean ± sd) of the community. Within each time series, we
removed any time point with fewer than three experimental replicates. We
coded each time series so that time (days) ≥ 0 corresponded to samples
taken after the disturbance and time < 0 to the pre-disturbance
community. In line with our
conservative approach to sequence processing, our analyses focused on
the dominant portion of the community.
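The trimming, truncation, and exact-variant steps described above were run with dada2 in R. Purely as an illustration (not the authors' code and not dada2's API), the Python sketch below removes a per-study number of leading bases, keeps a fixed 90 bp segment, discards reads too short to supply one, and tallies identical segments. The names `trim_and_truncate`, `tabulate_exact_variants`, and the `trim_left` parameter are hypothetical; the per-study values actually used are in Table S1.

```python
from collections import Counter
from typing import Iterable

SEGMENT_LEN = 90  # retained segment length (90 bp, from the text)


def trim_and_truncate(reads: Iterable[str], trim_left: int,
                      seg_len: int = SEGMENT_LEN) -> list[str]:
    """Drop `trim_left` leading bases (hypothetical per-study value),
    keep the next `seg_len` bases, and discard reads that are too short
    to supply a full-length segment."""
    kept = []
    for read in reads:
        segment = read[trim_left:trim_left + seg_len]
        if len(segment) == seg_len:
            kept.append(segment)
    return kept


def tabulate_exact_variants(segments: Iterable[str]) -> Counter:
    """Count identical segments: each distinct string is one exact variant
    (100% identity over the segment). This is dereplication only; dada2's
    error-model denoising is not reproduced here."""
    return Counter(segments)


if __name__ == "__main__":
    reads = ["NN" + "A" * 90, "NN" + "A" * 90 + "GGG", "NN" + "C" * 50]
    segments = trim_and_truncate(reads, trim_left=2)
    print(tabulate_exact_variants(segments).most_common())
```

Counting identical strings only collapses duplicates. dada2 additionally uses a learned error model to decide which unique sequences are genuine variants rather than sequencing errors, which this sketch deliberately omits.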
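The text does not name the standardization procedure. A common choice is rarefaction, i.e. randomly subsampling each sample's reads without replacement to a fixed depth, and the Python sketch below assumes that approach. It also encodes one assumed reading of "deepest possible observation depth" per study: the smallest library among that study's samples that reach the 1,500-read floor. The function names `rarefy` and `study_depth` are hypothetical.

```python
import random
from collections import Counter

GLOBAL_DEPTH = 1500  # global standardization depth (from the text)


def rarefy(counts: dict[str, int], depth: int, rng: random.Random) -> Counter:
    """Randomly subsample `depth` reads without replacement from one
    sample's variant counts (rarefaction; assumed procedure)."""
    total = sum(counts.values())
    if total < depth:
        raise ValueError(f"sample has {total} reads, fewer than {depth}")
    # Expand counts into one entry per read, then draw without replacement.
    pool = [variant for variant, n in counts.items() for _ in range(n)]
    return Counter(rng.sample(pool, depth))


def study_depth(sample_totals: list[int], floor: int = GLOBAL_DEPTH) -> int:
    """Assumed reading of 'deepest possible observation depth': the smallest
    library size among a study's samples that reach `floor` reads."""
    eligible = [t for t in sample_totals if t >= floor]
    if not eligible:
        raise ValueError("no sample in this study reaches the floor")
    return min(eligible)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the subsample is reproducible
    sub = rarefy({"asv1": 3000, "asv2": 1000}, GLOBAL_DEPTH, rng)
    print(sum(sub.values()))                # 1500
    print(study_depth([900, 1600, 5000]))   # 1600
```

Passing an explicit `random.Random` instance keeps each run reproducible, which matters when analyses are repeated at two depths and compared.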
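Sample coverage can be estimated from the number of singletons (f1) and doubletons (f2) in a sample of n reads. The Python sketch below implements the Chao and Jost (2012) estimator, Ĉ = 1 − (f1/n)·[(n − 1)f1 / ((n − 1)f1 + 2f2)], with the bias-corrected form commonly used when f2 = 0. It is an independent illustration and is not claimed to reproduce the BetaC package's implementation; `sample_coverage` is a hypothetical name.

```python
import statistics


def sample_coverage(abundances: list[int]) -> float:
    """Chao & Jost (2012) sample-coverage estimate from one sample's read
    counts per variant."""
    x = [a for a in abundances if a > 0]
    n = sum(x)
    if n == 0:
        raise ValueError("sample contains no reads")
    f1 = sum(1 for a in x if a == 1)  # singletons
    f2 = sum(1 for a in x if a == 2)  # doubletons
    if f1 == 0:
        return 1.0  # no singletons: estimated coverage is complete
    if f2 > 0:
        w = (n - 1) * f1 / ((n - 1) * f1 + 2 * f2)
    else:
        # Bias-corrected form used when there are no doubletons.
        w = (n - 1) * (f1 - 1) / ((n - 1) * (f1 - 1) + 2)
    return 1.0 - (f1 / n) * w


if __name__ == "__main__":
    samples = [[1, 1, 2, 6], [5, 3, 2], [1, 1, 5]]
    cov = [sample_coverage(s) for s in samples]
    # Summarize across samples as mean ± sd, as in the text.
    print(f"{statistics.mean(cov):.2f} ± {statistics.stdev(cov):.2f}")
```

For example, a sample with abundances [1, 1, 2, 6] has n = 10, f1 = 2, and f2 = 1, giving Ĉ = 1 − 0.2 × 0.9 = 0.82.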
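The replicate filter and time coding described above can be sketched in Python as follows. The record layout (`Sample` with `series`, `day`, and `replicate` fields), the per-series `disturbance_day` mapping, and the function name `code_and_filter` are hypothetical. The only values taken from the text are the three-replicate minimum and the sign convention (time ≥ 0 after disturbance, time < 0 before).

```python
from collections import defaultdict
from typing import NamedTuple

MIN_REPLICATES = 3  # minimum replicates per time point (from the text)


class Sample(NamedTuple):
    series: str      # time-series identifier (hypothetical field)
    day: float       # sampling day on the study's own calendar (hypothetical)
    replicate: str   # experimental replicate label (hypothetical)


def code_and_filter(samples: list[Sample],
                    disturbance_day: dict[str, float]) -> list[tuple[str, float, str]]:
    """Recode time relative to each series' disturbance (time >= 0 after,
    time < 0 before) and drop time points with fewer than MIN_REPLICATES
    distinct replicates within their series."""
    def rel_time(s: Sample) -> float:
        return s.day - disturbance_day[s.series]

    # Collect the distinct replicates observed at each (series, time) pair.
    replicates = defaultdict(set)
    for s in samples:
        replicates[(s.series, rel_time(s))].add(s.replicate)

    return [(s.series, rel_time(s), s.replicate)
            for s in samples
            if len(replicates[(s.series, rel_time(s))]) >= MIN_REPLICATES]


if __name__ == "__main__":
    samples = ([Sample("A", 5, r) for r in ("r1", "r2", "r3")]
               + [Sample("A", 10, r) for r in ("r1", "r2")]
               + [Sample("A", 12, r) for r in ("r1", "r2", "r3")])
    for row in code_and_filter(samples, {"A": 10}):
        print(row)
```

In this toy series the disturbance falls on day 10, which has only two replicates, so time 0 is dropped while times −5 and 2 are kept.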