Materials and Methods
Zoopagomycota fungi in the mock community – A mock
community comprised of Zoopagomycota fungi was created by combining
equilibrated aliquots of genomic DNA from species of Dimargaritales,
Kickxellales, and Zoopagales. DNA of mock community members was obtained
from cultures grown from the University of Florida Gerald Benny culture
collection or received from collaborators (TABLE 1). Species of
Dimargaritales, Piptocephalis, and Syncephalis are
haustorial mycoparasites and were grown in dual cultures with their host
fungi (Benny et al., 2016b). Accordingly, the DNA extracts from these
cultures also contained an unknown quantity of host DNA. Similarly,
genomic DNAs from the Davis et al. (2019) study were single cell genomes
amplified by multiple displacement methods and also contained DNA from
both the fungi and their host organisms. Species of Kickxellales are
saprotrophic and were grown axenically (Benjamin, 1958). A preliminary
test of ITS1F/ITS2 primers on DNA from Piptocephalis andSyncephalis species mixed with soil DNA indicated that increased
ITS1 length in these taxa resulted in reduced amplification and
sequencing (SUPP FIG 1). We included those taxa and additional taxa with
known length variation in our mock community to further examine these
potential biases across different markers. The final community contained
30 isolates of Zoopagomycota fungi (TABLE 1). We also generated
reference Sanger sequences from individual mock community members using
primer pair LROR/LR5 (Hopple and Vilgalys 1994; Vilgalys and Hester
1990) for LSU and primer pair ITS1F/ITS4 (Gardes & Bruns, 1993; White
et al., 1990) for ITS. Reference sequences were verified by BLAST
analysis against NCBI GenBank and with phylogenetic reconstruction (data
not shown). A non-biological, equimolar DNA mock community which
consisted of a mixture of 12 synthetic single-copy sequences (SYNMO) was
included alongside the ITS1 samples to help detect index bleed between
samples and evaluate bioinformatic parameters (Palmer et al., 2018). All
OTUs recovered from the mock community samples were submitted for BLAST
searches to assess the taxonomic identity of the OTUs for comparison
against the bioinformatic output.
Environmental samples – We collected environmental
samples from five sites in CA and two in FL (SUPP TABLE 1). At each site
five substrates were collected: 1) bulk water from a freshwater stream
or pond (water), 2) saturated sediment from the edge of the water body
(mud), 3) the upper soil layers and leaf litter, consisting of the
visible organic layer (topsoil), 4) the mineral soil layers below the
topsoil (deep soil), and 5) microinvertebrates collected from the soil
samples using Baermann funnels (invertebrates). At each site five
replicates of each sample type were collected. The topsoil and deep soil
replicates were collected at increasing distances from the water source
along a 25 m transect (i.e. sample 1 was closest to the water and soil
sample 5 was furthest from the water). For each sample approximately
15-25 mL of soil was collected into sterile 50 mL tubes and filled with
sterile 2x Cetyl Trimethyl Ammonium Bromide lysis buffer (CTAB) to 30-35
mL. Water samples were collected in 950 mL sterilized Mason jars by
dipping the jar into the water from the embankment. Water samples
included some sediment and debris present on the bottom or floating on
the top of the water. Large debris such as sticks, rocks, and clumps of
leaves were removed. Vacuum filtration and a sterile Büchner funnel were
used to filter the water through filter paper (6 µm pore size). After
filtration, the filter papers were immediately placed in sterile 50 mL
tubes filled with CTAB. No water samples were collected from the
“Sweeney wash” site in California because there was no standing water
although mud samples were collected from a wet depression between rocks.
Microinvertebrates (protists, nematodes, tardigrades, etc.) were
collected using Baermann funnels. A 50/50 mixture of topsoil and deep
soil was added to a funnel, filled with sterilized water, and covered
with Parafilm. One mL of water was collected in a sterile tube after 24
hours incubation and stored in a -20 C freezer. A second mL of water was
collected during the second 24-hour period. The two samples were
centrifuged at 15000 x g for 15 minutes, the excess water was drained,
and then the samples were combined into one tube with CTAB for DNA
extraction. For the three “Sweeney” desert sites in California, three
rather than five Baermann funnel replicates were obtained due to limited
funnel availability. Five microinvertebrate replicates were performed
for all other sites. Baermann funnel samples are enriched for potential
hosts of Zoopagales parasites and were used to target these taxa.
DNA Extraction – DNA extraction followed a modified
CTAB protocol (Gardes & Bruns, 1993). Topsoil, deep soil, and filter
papers from the water samples were subjected to several cycles of
freezing and thawing in 50 mL tubes with CTAB prior to DNA extraction.
Several sterilized glass beads were then added, and samples were shaken
for 1 minute at 1500 RPM in a 1600 MiniG tissue homogenizer (SPEX,
Metuchen, NJ, USA). Two mL of CTAB was collected from each 50 mL tube
and placed in sterile microcentrifuge tubes. Samples in CTAB were
incubated with a 1:1 mixture of phenol:chloroform overnight and then
washed with an additional chloroform step. The remainder of the CTAB
protocol was performed without modification. Extractions from cultures
followed the methods of Reynolds et al. (2019) and also used the CTAB
protocol. Following extraction, DNA yield was estimated by a Nanodrop
2000 spectrophotometer (ThermoFisher Scientific, Waltham, MA, USA).
Samples with genomic DNA concentrations >200 ng/ul were
further cleaned with the DNeasy PowerClean Pro cleanup kit (Qiagen,
Germantown, MD, USA); samples with low DNA yield (<200 ng/uL)
were not cleaned due to loss of DNA during the clean-up process.
Primer Selection – A multiple sequence alignment
containing Dikarya, Mucoromycota, and Zoopagomycota sequences from
GenBank, as well as the newly sequenced Zoopagomycota species, was
created in Mesquite (Maddison & Maddison, 2019) and aligned with MUSCLE
(Edgar, 2004). This reference alignment was used to evaluate the number
of mismatches between published primer sequences and Zoopagomycota
fungi. After testing several modified primer combinations on four
samples, we chose the primers ITS1F and ITS2 (White et al., 1990) for
the ITS1 region for further comparisons. For LSU, we chose the LROR/LR3
(Hopple & Vilgalys, 1994) primer set, which has been successfully used
in metagenomics studies by processing only the forward reads (Benucci et
al., 2019; Bonito et al., 2014; Johansen et al., 2016). We also included
a modified forward primer LR22F (5´-GAGACCGATAGHRHACAAG-3´) used in
combination with reverse primer LR3. LR22F is the reverse compliment of
primer LR22 to which we added three degenerate positions to maximize
compatibility with EDF. This primer is similar to LR22R (Mueller et al.,
2015, 2016) but shifted upstream 8 bp because target Zoopagomycota taxa
in our alignment had mismatches to LR22R (SUPP FIG 2). Hereafter we
refer to these primer sets by the forward primer: ITS1F, LROR or LR22F.
Library Preparation – We prepared the ITS1F Illumina
library using the thermocycling protocols of Truong et al. (2019).
Briefly, we used Phusion high fidelity polymerase (ThermoFisher
Scientific, Waltham, MA, USA) and a dual-indexing approach with the
following modifications. Amplification and sequencing with LROR were the
same as for ITS1F except that the thermocycling program for the initial
amplification step was 95°C for 1 minute, step down of -0.1°C per cycle
from 55°C, and 72°C for 1:30 with a total of 30 cycles and a final
elongation step of 72°C for 10 minutes. Temperature gradient tests were
performed for LR22F to determine the best annealing temperature. The
thermocycling program for LR22F was the same as for LROR except that the
annealing temperature was a step down of -0.2°C per cycle starting from
65°C. All samples were amplified in three separate replicates and the
replicates were combined prior to the indexing reaction. Negative PCR
controls were included in all reactions. Amplicons from both the initial
amplification and index attachment reaction were verified on 1.5%
agarose gels. Indices were added using the Nextera XT index kit v2
(Illumina, San Diego, CA, USA) in a separate PCR step with the following
protocol: GoTaq Green master mix (Promega, Madison, WI, USA) was used
for the reaction and the thermocycling program included nine cycles at
94°C for 1 minute, 55°C for 1 minute, 72°C for 1:30, and a final
elongation step of 72°C for 10 minutes. Indexed PCR products were pooled
into groups of three based on similar band intensity and then cleaned
with the Select-a-Size DNA Clean and Concentrator kit (Zymo, Irvine, CA,
USA). All samples were then quantified using a Qubit 4.0 Fluorometer
(Invitrogen, Waltham, MA, USA), equilibrated, and combined for the final
library. Libraries were further purified with the Agencourt AMPure XP
kit (Beckman Coulter, Brea, CA, USA) to remove primer dimers and then
sequenced with the Illumina MiSeq 300 bp PE protocol using V3 chemistry
(Illumina, San Diego, CA, USA) at either the UF Interdisciplinary Center
for Biotechnology Research or the UC Riverside Institute for Integrative
Genome Biology. Raw data are available at NCBI’s Sequence Read Archive
(BioProject PRJNA660245).
Bioinformatics – All sequencing data were processed
with the AMPtk pipeline v 1.4.1 (Palmer et al., 2018). Up to two
nucleotide mismatches were allowed in each primer, a maximum of two
expected errors were allowed during demultiplexing and quality
filtering, the maximum length was set at 600 bp, a standard index bleed
of 0.5% was set, and reference-based chimera filtering was used during
clustering. The 600 bp maximum length refers to the truncation of reads
for downstream processing in AMPtk. Reads are only truncated if they are
above the length cutoff. We tested several different length cutoffs and
found that 600 bp returned the best results for consistent contig
formation. For the ITS1F data, a minimum length of 150 bp was used, the
clustering threshold was 97% using the UNOISE3 (Edgar 2010) algorithm,
and the built-in UNITE ITS database in AMPtk was used for taxonomic
identification via the hybrid method. The ITS OTUs were also assigned
taxonomy with the RDP classifier for comparisons across primer sets. The
LSU data from both primer sets were processed with a minimum length of
225 bp and a clustering threshold of 98% using the UNOISE3 method.
Because some fungal species have ITS1 sequences under 200 bp, a shorter
minimum length was used for ITS1F than LSU. A custom LSU database was
installed in AMPtk for taxonomy assignment (see below) because the
built-in LSU database was outdated (RDP training set 8) and no other
comprehensive LSU database compatible with AMPtk was available. The
clustering thresholds for the LROR and LR22F datasets were determined by
evaluating which mock community output best matched the true community
composition after processing the data at different cutoffs (97, 98, and
99% for both primer sets) and comparing UNOISE3 against DADA2 (Callahan
et al., 2016). For both LSU data sets, UNOISE3 was used instead of DADA2
because DADA2 doubled the number of OTUs in the output and did not
significantly improve recovery of the mock community. Additionally,
using UNOISE3 enabled direct comparisons between datasets generated with
the three primer sets. Any remaining OTUs detected in the negative
controls after filtering were deleted from the OTU tables.
To create the LSU database, the RDP (training set 11) (Liu et al., 2012;
Wang et al., 2007) and SILVA (LSU Ref 132) (Quast et al., 2013) fungal
LSU reference FASTA files were downloaded, concatenated, and the
taxonomy strings reformatted for use in AMPtk. Reference sequences from
Zoopagomycota in the mock community were added to the database along
with >200 sequences of animals, fungi, plants, and protists
from GenBank. We selected these additional sequences based on BLAST
results of OTUs in the dataset that were initially unidentified or
misidentified by the databases. Due to limitations in formatting, the
database is only compatible with the UTAX or DADA2 taxonomy assignment
methods in AMPtk. The final updated database consisted of 115,382
dereplicated sequences and is freely available on OSF
(https://osf.io/cz3mh/). We refer to this modified database as
RDP+SILVA, and the program used for taxonomy assignment was UTAX. All
three OTU datasets were also assigned taxonomy with the RDP classifier v
2.12 (Wang et al., 2007) executed in AMPtk using the ITS UNITE and LSU
training sets, a commonly used methodology for metabarcoding studies.
Statistical analyses – Once OTU tables were obtained,
further analyses were conducted in R (R Core Team, 2019) with scripts
available at OSF (https://osf.io/cz3mh/). To check sampling coverage,
rarefaction plots were made using the package iNEXT (Chao et al., 2014;
Hsieh et al., 2020). Community comparisons were performed on the subset
of OTUs identified as EDF and compared to all fungi. To visualize
community differences among primer sets, the OTU table was converted
into presence/absence format and B sim dissimilarity matrices were
calculated with the betadiver function in the vegan package (Oksanen et
al., 2019) using the “w” method (Koleff et al., 2003). Non-metric
multidimensional scaling (NMDS) ordinations were performed with the
metaMDS function in vegan and plotted using ggplot2 (Wickham, 2016).
These ordinations were visualized for the entire fungal community and on
subsets of the EDF OTUs. NMDS ordinations were performed for each primer
set separately and the scores were extracted as dataframe objects which
were then overlayed onto a single plot without modification of the
coordinates. To compare the effect of the sampling location and sample
type on fungal and EDF community composition, the dissimilarity matrices
were analyzed with permutational multivariate analysis of variance
(PERMANOVA - Anderson, 2001) of the B sim distance matrices
(Anderson et al., 2006) using the adonis2 function in the vegan package.
Because PERMANOVA tests are sensitive to dispersion of the samples
within groups, an ANOVA analysis of the B sim distance matrices
was conducted to test for significant differences between groups of
interest. Significant ANOVA results of B sim output indicate that
dispersion between groups may confound results from PERMANOVA analyses.
Centroid differences were used for B sim (type = “centroid”),
and both B sim and PERMANOVA used 9,999 permutations. To further
examine similarities between primer sets, Mantel tests of correlation
between the dissimilarity matrices of each primer set were conducted
using Spearman’s rank correlation and 999 permutations. Finally,
taxonomy bar plots and alpha diversity metrics for all fungi and the
subset of EDF were conducted in phyloseq (McMurdie & Holmes, 2013) or
the R base functions. Alpha diversity measures were evaluated for fungal
community differences based on site and sample type with ANOVA and
Tukey’s Honest Significant Differences using the agricolae package (de
Mendiburu, 2020). Mantel tests of fungal OTUs utilized a subset of the
data that contained only the 127 samples (118 for EDF) that worked for
all three primer sets.
We selected 50 OTUs from the LROR dataset and 50 OTUs from the LR22F
dataset to study using phylogenetic analyses. We selected those that had
the greatest number of reads, were detected in more than one sample, and
were identified only as kingdom Fungi with our pipeline (i.e. the UTAX
algorithm and RDP+SILVA database). The OTUs were also subjected to BLAST
searches and OTUs with high matches to non-fungal sequences were not
included. These 100 unknown OTUs were added to a sequence alignment of
Zoopagomycota, other EDF taxa, and additional Dikarya. As a check on the
taxonomic identification by our pipeline, OTUs classified within the
Zoopagomycota (Kickxellomycotina or Zoopagomycotina) were included in a
smaller alignment and processed the same way as the larger alignment.
The sequences were aligned with MUSCLE, ambiguously aligned regions were
excluded, and Maximum Likelihood analyses were performed in RAxML v 8
(Stamatakis, 2014) using the GTRGAMMA model and 1,000 bootstraps. The
unknown OTUs were also examined using BLAST searches against GenBank
using both default parameters and excluding uncultured/environmental
sample sequences and the results were compared to the placement of the
OTUs in the phylogeny. Resulting figures were modified in FigTree v
1.4.3 (Rambaut, 2012) and InkScape v 0.92.2 (https://inkscape.org/en/).