Sequence reprocessing
We obtained raw 16S rRNA gene amplicon data and metadata from the NCBI Sequence Read Archive, with the exception of two datasets: one came from another database, and the other was obtained directly from the authors (see Table S1 for accession numbers). We processed sequences in R 3.4.3 (Team 2017) using the dada2 package (Callahan et al. 2016a). Prior to processing, we visually inspected two sequences per study with the plotQualityProfile function to determine whether the reads had been merged prior to archiving and to confirm that primers were not present. We used only forward reads because reverse reads were not available for all studies. Following inspection, we trimmed and truncated sequences on a study-by-study basis (see Table S1 for trimming and truncation lengths) to preserve a 90 bp segment, the minimum recommended in the Earth Microbiome Project protocols (Thompson et al. 2017). We filtered, dereplicated, and chimera-checked the reads using standard workflow parameters (Callahan et al. 2016b), and assigned reads to exact amplicon sequence variants (ASVs; 100% sequence identity in the 90 bp segment), with taxonomy assigned using the SILVA v.132 training set (Quast et al. 2013). We removed reads not assigned to the domain Bacteria. The percentage of reads lost at each step of sequence processing, per study, is shown in Figure S1. Because sequencing depth varied widely across samples, we standardized all samples to 1,500 reads per sample. To ensure that our findings were not affected by observation depth, we ran all analyses in parallel using the deepest possible observation depth (with a lower bound of 1,500 reads per sample) for each study (Table S1). Because our findings were consistent regardless of standardization (Figure S2), we present only the results from the global standardization. To examine the completeness of each sample relative to the total richness in a community, we calculated coverage (Chao and Jost 2012) using the BetaC package (Engel et al. 2020).
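The depth standardization and coverage calculation described above can be illustrated with a minimal Python sketch. It is not the code used in this study, which ran in R with dada2 and BetaC. The sketch assumes that standardization means random subsampling without replacement, and it uses the abundance-based sample coverage estimator of Chao and Jost (2012) with a commonly used fallback when no doubletons are present. The function names `rarefy` and `sample_coverage` are hypothetical.

```python
import random
from collections import Counter


def rarefy(counts, depth=1500, seed=0):
    """Subsample a sample's ASV counts to `depth` reads without replacement.

    `counts` maps ASV identifiers to read counts. Returns None when the
    sample holds fewer than `depth` reads (assumed here to be discarded).
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand counts into one entry per read, then draw `depth` reads.
    pool = [asv for asv, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return Counter(rng.sample(pool, depth))


def sample_coverage(counts):
    """Estimate sample coverage from singleton (f1) and doubleton (f2) counts.

    Uses C = 1 - (f1 / n) * A, with
    A = (n - 1) * f1 / ((n - 1) * f1 + 2 * f2) when f2 > 0.
    The f2 == 0 branch is an assumed bias-corrected fallback.
    """
    abundances = [n for n in counts.values() if n > 0]
    n = sum(abundances)
    if n == 0:
        raise ValueError("sample contains no reads")
    f1 = sum(1 for a in abundances if a == 1)
    f2 = sum(1 for a in abundances if a == 2)
    if f1 == 0:
        return 1.0
    if f2 > 0:
        a = (n - 1) * f1 / ((n - 1) * f1 + 2 * f2)
    else:
        a = (n - 1) * (f1 - 1) / ((n - 1) * (f1 - 1) + 2)
    return 1.0 - (f1 / n) * a
```

For example, `sample_coverage(rarefy(counts))` would give the estimated coverage of one sample at the 1,500-read depth, provided the sample is deep enough for `rarefy` to return a result.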
On average, our samples represented 0.96 ± 0.05 (mean ± SD) of the community. For each time series, we removed any time points that had fewer than three experimental replicates. We coded time series so that time (days) ≥ 0 occurred after disturbance, and time < 0 denoted the pre-disturbance community. In line with our conservative approach to sequence processing, our analyses focused on the dominant portion of the community.
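The replicate filter and time coding can be sketched in the same way. This is an illustrative Python example only. The record fields `series` and `day` and the function names are hypothetical and do not come from the study's metadata.

```python
from collections import defaultdict


def filter_time_points(samples, min_replicates=3):
    """Keep samples from time points with at least `min_replicates` replicates.

    `samples` is a list of dicts with hypothetical keys "series" (time
    series identifier) and "day" (days relative to disturbance).
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[(sample["series"], sample["day"])].append(sample)
    return [s for group in groups.values()
            if len(group) >= min_replicates
            for s in group]


def phase(day):
    """Label a time point: day >= 0 is post-disturbance, day < 0 is pre-disturbance."""
    return "post" if day >= 0 else "pre"
```

Applying the filter within each time series, rather than across the pooled dataset, means that a time point dropped from one series does not affect any other series.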