Materials and Methods
Zoopagomycota fungi in the mock community A mock community comprised of Zoopagomycota fungi was created by combining equilibrated aliquots of genomic DNA from species of Dimargaritales, Kickxellales, and Zoopagales. DNA of mock community members was obtained from cultures grown from the University of Florida Gerald Benny culture collection or received from collaborators (TABLE 1). Species of Dimargaritales, Piptocephalis, and Syncephalis are haustorial mycoparasites and were grown in dual cultures with their host fungi (Benny et al., 2016b). Accordingly, the DNA extracts from these cultures also contained an unknown quantity of host DNA. Similarly, genomic DNAs from the Davis et al. (2019) study were single cell genomes amplified by multiple displacement methods and also contained DNA from both the fungi and their host organisms. Species of Kickxellales are saprotrophic and were grown axenically (Benjamin, 1958). A preliminary test of ITS1F/ITS2 primers on DNA from Piptocephalis andSyncephalis species mixed with soil DNA indicated that increased ITS1 length in these taxa resulted in reduced amplification and sequencing (SUPP FIG 1). We included those taxa and additional taxa with known length variation in our mock community to further examine these potential biases across different markers. The final community contained 30 isolates of Zoopagomycota fungi (TABLE 1). We also generated reference Sanger sequences from individual mock community members using primer pair LROR/LR5 (Hopple and Vilgalys 1994; Vilgalys and Hester 1990) for LSU and primer pair ITS1F/ITS4 (Gardes & Bruns, 1993; White et al., 1990) for ITS. Reference sequences were verified by BLAST analysis against NCBI GenBank and with phylogenetic reconstruction (data not shown). A non-biological, equimolar DNA mock community which consisted of a mixture of 12 synthetic single-copy sequences (SYNMO) was included alongside the ITS1 samples to help detect index bleed between samples and evaluate bioinformatic parameters (Palmer et al., 2018). All OTUs recovered from the mock community samples were submitted for BLAST searches to assess the taxonomic identity of the OTUs for comparison against the bioinformatic output.
Environmental samples We collected environmental samples from five sites in CA and two in FL (SUPP TABLE 1). At each site five substrates were collected: 1) bulk water from a freshwater stream or pond (water), 2) saturated sediment from the edge of the water body (mud), 3) the upper soil layers and leaf litter, consisting of the visible organic layer (topsoil), 4) the mineral soil layers below the topsoil (deep soil), and 5) microinvertebrates collected from the soil samples using Baermann funnels (invertebrates). At each site five replicates of each sample type were collected. The topsoil and deep soil replicates were collected at increasing distances from the water source along a 25 m transect (i.e. sample 1 was closest to the water and soil sample 5 was furthest from the water). For each sample approximately 15-25 mL of soil was collected into sterile 50 mL tubes and filled with sterile 2x Cetyl Trimethyl Ammonium Bromide lysis buffer (CTAB) to 30-35 mL. Water samples were collected in 950 mL sterilized Mason jars by dipping the jar into the water from the embankment. Water samples included some sediment and debris present on the bottom or floating on the top of the water. Large debris such as sticks, rocks, and clumps of leaves were removed. Vacuum filtration and a sterile Büchner funnel were used to filter the water through filter paper (6 µm pore size). After filtration, the filter papers were immediately placed in sterile 50 mL tubes filled with CTAB. No water samples were collected from the “Sweeney wash” site in California because there was no standing water although mud samples were collected from a wet depression between rocks. Microinvertebrates (protists, nematodes, tardigrades, etc.) were collected using Baermann funnels. A 50/50 mixture of topsoil and deep soil was added to a funnel, filled with sterilized water, and covered with Parafilm. One mL of water was collected in a sterile tube after 24 hours incubation and stored in a -20 C freezer. A second mL of water was collected during the second 24-hour period. The two samples were centrifuged at 15000 x g for 15 minutes, the excess water was drained, and then the samples were combined into one tube with CTAB for DNA extraction. For the three “Sweeney” desert sites in California, three rather than five Baermann funnel replicates were obtained due to limited funnel availability. Five microinvertebrate replicates were performed for all other sites. Baermann funnel samples are enriched for potential hosts of Zoopagales parasites and were used to target these taxa.
DNA Extraction DNA extraction followed a modified CTAB protocol (Gardes & Bruns, 1993). Topsoil, deep soil, and filter papers from the water samples were subjected to several cycles of freezing and thawing in 50 mL tubes with CTAB prior to DNA extraction. Several sterilized glass beads were then added, and samples were shaken for 1 minute at 1500 RPM in a 1600 MiniG tissue homogenizer (SPEX, Metuchen, NJ, USA). Two mL of CTAB was collected from each 50 mL tube and placed in sterile microcentrifuge tubes. Samples in CTAB were incubated with a 1:1 mixture of phenol:chloroform overnight and then washed with an additional chloroform step. The remainder of the CTAB protocol was performed without modification. Extractions from cultures followed the methods of Reynolds et al. (2019) and also used the CTAB protocol. Following extraction, DNA yield was estimated by a Nanodrop 2000 spectrophotometer (ThermoFisher Scientific, Waltham, MA, USA). Samples with genomic DNA concentrations >200 ng/ul were further cleaned with the DNeasy PowerClean Pro cleanup kit (Qiagen, Germantown, MD, USA); samples with low DNA yield (<200 ng/uL) were not cleaned due to loss of DNA during the clean-up process.
Primer Selection A multiple sequence alignment containing Dikarya, Mucoromycota, and Zoopagomycota sequences from GenBank, as well as the newly sequenced Zoopagomycota species, was created in Mesquite (Maddison & Maddison, 2019) and aligned with MUSCLE (Edgar, 2004). This reference alignment was used to evaluate the number of mismatches between published primer sequences and Zoopagomycota fungi. After testing several modified primer combinations on four samples, we chose the primers ITS1F and ITS2 (White et al., 1990) for the ITS1 region for further comparisons. For LSU, we chose the LROR/LR3 (Hopple & Vilgalys, 1994) primer set, which has been successfully used in metagenomics studies by processing only the forward reads (Benucci et al., 2019; Bonito et al., 2014; Johansen et al., 2016). We also included a modified forward primer LR22F (5´-GAGACCGATAGHRHACAAG-3´) used in combination with reverse primer LR3. LR22F is the reverse compliment of primer LR22 to which we added three degenerate positions to maximize compatibility with EDF. This primer is similar to LR22R (Mueller et al., 2015, 2016) but shifted upstream 8 bp because target Zoopagomycota taxa in our alignment had mismatches to LR22R (SUPP FIG 2). Hereafter we refer to these primer sets by the forward primer: ITS1F, LROR or LR22F.
Library Preparation We prepared the ITS1F Illumina library using the thermocycling protocols of Truong et al. (2019). Briefly, we used Phusion high fidelity polymerase (ThermoFisher Scientific, Waltham, MA, USA) and a dual-indexing approach with the following modifications. Amplification and sequencing with LROR were the same as for ITS1F except that the thermocycling program for the initial amplification step was 95°C for 1 minute, step down of -0.1°C per cycle from 55°C, and 72°C for 1:30 with a total of 30 cycles and a final elongation step of 72°C for 10 minutes. Temperature gradient tests were performed for LR22F to determine the best annealing temperature. The thermocycling program for LR22F was the same as for LROR except that the annealing temperature was a step down of -0.2°C per cycle starting from 65°C. All samples were amplified in three separate replicates and the replicates were combined prior to the indexing reaction. Negative PCR controls were included in all reactions. Amplicons from both the initial amplification and index attachment reaction were verified on 1.5% agarose gels. Indices were added using the Nextera XT index kit v2 (Illumina, San Diego, CA, USA) in a separate PCR step with the following protocol: GoTaq Green master mix (Promega, Madison, WI, USA) was used for the reaction and the thermocycling program included nine cycles at 94°C for 1 minute, 55°C for 1 minute, 72°C for 1:30, and a final elongation step of 72°C for 10 minutes. Indexed PCR products were pooled into groups of three based on similar band intensity and then cleaned with the Select-a-Size DNA Clean and Concentrator kit (Zymo, Irvine, CA, USA). All samples were then quantified using a Qubit 4.0 Fluorometer (Invitrogen, Waltham, MA, USA), equilibrated, and combined for the final library. Libraries were further purified with the Agencourt AMPure XP kit (Beckman Coulter, Brea, CA, USA) to remove primer dimers and then sequenced with the Illumina MiSeq 300 bp PE protocol using V3 chemistry (Illumina, San Diego, CA, USA) at either the UF Interdisciplinary Center for Biotechnology Research or the UC Riverside Institute for Integrative Genome Biology. Raw data are available at NCBI’s Sequence Read Archive (BioProject PRJNA660245).
Bioinformatics – All sequencing data were processed with the AMPtk pipeline v 1.4.1 (Palmer et al., 2018). Up to two nucleotide mismatches were allowed in each primer, a maximum of two expected errors were allowed during demultiplexing and quality filtering, the maximum length was set at 600 bp, a standard index bleed of 0.5% was set, and reference-based chimera filtering was used during clustering. The 600 bp maximum length refers to the truncation of reads for downstream processing in AMPtk. Reads are only truncated if they are above the length cutoff. We tested several different length cutoffs and found that 600 bp returned the best results for consistent contig formation. For the ITS1F data, a minimum length of 150 bp was used, the clustering threshold was 97% using the UNOISE3 (Edgar 2010) algorithm, and the built-in UNITE ITS database in AMPtk was used for taxonomic identification via the hybrid method. The ITS OTUs were also assigned taxonomy with the RDP classifier for comparisons across primer sets. The LSU data from both primer sets were processed with a minimum length of 225 bp and a clustering threshold of 98% using the UNOISE3 method. Because some fungal species have ITS1 sequences under 200 bp, a shorter minimum length was used for ITS1F than LSU. A custom LSU database was installed in AMPtk for taxonomy assignment (see below) because the built-in LSU database was outdated (RDP training set 8) and no other comprehensive LSU database compatible with AMPtk was available. The clustering thresholds for the LROR and LR22F datasets were determined by evaluating which mock community output best matched the true community composition after processing the data at different cutoffs (97, 98, and 99% for both primer sets) and comparing UNOISE3 against DADA2 (Callahan et al., 2016). For both LSU data sets, UNOISE3 was used instead of DADA2 because DADA2 doubled the number of OTUs in the output and did not significantly improve recovery of the mock community. Additionally, using UNOISE3 enabled direct comparisons between datasets generated with the three primer sets. Any remaining OTUs detected in the negative controls after filtering were deleted from the OTU tables.
To create the LSU database, the RDP (training set 11) (Liu et al., 2012; Wang et al., 2007) and SILVA (LSU Ref 132) (Quast et al., 2013) fungal LSU reference FASTA files were downloaded, concatenated, and the taxonomy strings reformatted for use in AMPtk. Reference sequences from Zoopagomycota in the mock community were added to the database along with >200 sequences of animals, fungi, plants, and protists from GenBank. We selected these additional sequences based on BLAST results of OTUs in the dataset that were initially unidentified or misidentified by the databases. Due to limitations in formatting, the database is only compatible with the UTAX or DADA2 taxonomy assignment methods in AMPtk. The final updated database consisted of 115,382 dereplicated sequences and is freely available on OSF (https://osf.io/cz3mh/). We refer to this modified database as RDP+SILVA, and the program used for taxonomy assignment was UTAX. All three OTU datasets were also assigned taxonomy with the RDP classifier v 2.12 (Wang et al., 2007) executed in AMPtk using the ITS UNITE and LSU training sets, a commonly used methodology for metabarcoding studies.
Statistical analyses – Once OTU tables were obtained, further analyses were conducted in R (R Core Team, 2019) with scripts available at OSF (https://osf.io/cz3mh/). To check sampling coverage, rarefaction plots were made using the package iNEXT (Chao et al., 2014; Hsieh et al., 2020). Community comparisons were performed on the subset of OTUs identified as EDF and compared to all fungi. To visualize community differences among primer sets, the OTU table was converted into presence/absence format and B sim dissimilarity matrices were calculated with the betadiver function in the vegan package (Oksanen et al., 2019) using the “w” method (Koleff et al., 2003). Non-metric multidimensional scaling (NMDS) ordinations were performed with the metaMDS function in vegan and plotted using ggplot2 (Wickham, 2016). These ordinations were visualized for the entire fungal community and on subsets of the EDF OTUs. NMDS ordinations were performed for each primer set separately and the scores were extracted as dataframe objects which were then overlayed onto a single plot without modification of the coordinates. To compare the effect of the sampling location and sample type on fungal and EDF community composition, the dissimilarity matrices were analyzed with permutational multivariate analysis of variance (PERMANOVA - Anderson, 2001) of the B sim distance matrices (Anderson et al., 2006) using the adonis2 function in the vegan package. Because PERMANOVA tests are sensitive to dispersion of the samples within groups, an ANOVA analysis of the B sim distance matrices was conducted to test for significant differences between groups of interest. Significant ANOVA results of B sim output indicate that dispersion between groups may confound results from PERMANOVA analyses. Centroid differences were used for B sim (type = “centroid”), and both B sim and PERMANOVA used 9,999 permutations. To further examine similarities between primer sets, Mantel tests of correlation between the dissimilarity matrices of each primer set were conducted using Spearman’s rank correlation and 999 permutations. Finally, taxonomy bar plots and alpha diversity metrics for all fungi and the subset of EDF were conducted in phyloseq (McMurdie & Holmes, 2013) or the R base functions. Alpha diversity measures were evaluated for fungal community differences based on site and sample type with ANOVA and Tukey’s Honest Significant Differences using the agricolae package (de Mendiburu, 2020). Mantel tests of fungal OTUs utilized a subset of the data that contained only the 127 samples (118 for EDF) that worked for all three primer sets.
We selected 50 OTUs from the LROR dataset and 50 OTUs from the LR22F dataset to study using phylogenetic analyses. We selected those that had the greatest number of reads, were detected in more than one sample, and were identified only as kingdom Fungi with our pipeline (i.e. the UTAX algorithm and RDP+SILVA database). The OTUs were also subjected to BLAST searches and OTUs with high matches to non-fungal sequences were not included. These 100 unknown OTUs were added to a sequence alignment of Zoopagomycota, other EDF taxa, and additional Dikarya. As a check on the taxonomic identification by our pipeline, OTUs classified within the Zoopagomycota (Kickxellomycotina or Zoopagomycotina) were included in a smaller alignment and processed the same way as the larger alignment. The sequences were aligned with MUSCLE, ambiguously aligned regions were excluded, and Maximum Likelihood analyses were performed in RAxML v 8 (Stamatakis, 2014) using the GTRGAMMA model and 1,000 bootstraps. The unknown OTUs were also examined using BLAST searches against GenBank using both default parameters and excluding uncultured/environmental sample sequences and the results were compared to the placement of the OTUs in the phylogeny. Resulting figures were modified in FigTree v 1.4.3 (Rambaut, 2012) and InkScape v 0.92.2 (https://inkscape.org/en/).