Introduction
Fungi are inconspicuous organisms, only a
proportion of which sporadically reveal their presence through the
formation of tangible morphological structures such as fruiting
bodies1. The scientific study of fungi has therefore
benefited immensely from the development of molecular (DNA) sequencing
tools during the last 30 years. However, even with the use of molecular
tools, studies involving the tropics have neglected fungi despite the
fact that the majority of the undescribed fungi are thought to occur in
the tropics2, 3, 4. Among all tropical biomes,
rainforests provide the widest range of ecosystem services through high
above- and below-ground biodiversity5, including water
cycling and carbon storage6, 7. The largest and most
diverse of those forests is Amazonia8,9, which comprises approximately 40% of the area
occupied by rainforest habitats around the world. Amazonian ecosystem
services can only be maintained through abiotic and biotic processes,
many of which are mediated by
fungi.
To better characterize fungal communities in Amazonia, short-read
High-Throughput Sequencing (HTS) platforms such as Illumina are being
increasingly used10 ,11 ,12 ,13 ,14. These approaches are often used together with PCR
techniques to amplify individual markers. In particular, the nuclear
ribosomal Internal Transcribed Spacer (ITS) region has been selected as
the best DNA region to identify the widest possible range of fungal
groups and is therefore commonly used as a universal DNA barcode for
fungi15. However, this region is typically 500–600
bases long, preventing it from being sequenced under some sequencing
technologies. The use of partial sequencing (targeting only a sub-region
such as ITS1 or ITS2) has at times limited the taxonomic coverage and
identification of fungi by not providing enough variation to tell
species apart16. Furthermore, even though HTS
approaches produce hundreds of thousands or millions of sequences per
sample, the limited length of these sequences can introduce critical
biases to the precise taxonomic identification of the underlying
lineages17, 18.
Long-read HTS has the potential to overcome some of these limitations,
but they have rarely been used in environmental
studies18,19. One of the most well-developed platforms
is the single-molecule real-time sequencing platform of Pacific
Biosciences (PacBio)20. Although the PacBio platform
had a high error rate at the time it was launched, the error rate is
currently less than 1%21. Recent studies have shown
that the potential of the PacBio platform for the identification of
fungal communities using environmental samples is
high18, 19, but so far they have not
been widely applied to any ecosystems.
Taken together, the use of short- and long-sequence HTS techniques
offers the potential to overcome the challenges of characterizing fungal
diversity in species-rich ecosystems, such as Amazonia in northern South
America. Amazonia is a heterogeneous biome, and its biodiversity has
been shown to vary considerably across geographical ranges. On a large
scale, a west (more diverse) to east (less diverse) diversity gradient
has been observed in many animal and plant groups22,
23, 24 and also in micro-organisms, including
fungi10, 25. Another source of
heterogeneity in Amazonia is the presence of distinct habitats types.
Each phytophysiognomy comprises a largely distinct biota and own soil
characteristics, flooding regimes, and nutrient
availability11, 26. Four widespread
and important habitats, here given in the order of decreasing plant and
animal diversity25, 26, are: unflooded tropical
forests (terra-firme); forests seasonally flooded by fertile white-water
rivers (várzeas); forests seasonally flooded by unfertile black water
rivers (igapós); and naturally open areas associated with white-sand
soils (campinas). The richness gradient for micro-organisms has been
found to differ from this general trend, as campinas harbour the highest
microbial richness10, 25.
Soil physicochemical characteristics are often considered crucial for
biotic dynamics, vegetation, and diversity patterns at local to regional
scales across Amazonia14, 27, 28, 29. Although several
studies have reported on the importance of soil characteristics in
shaping community structure, no unified pattern has emerged. In a recent
study using HTS with short reads from environmental samples in Amazonia,
members of our team showed a mixed effect of soil properties on the
microorganism richness and community turnover11. In
that study, we used general primers to target all eukaryotes, and we did
not address specifically these effects on Fungi.
This study seeks to characterize fungal communities across Amazonia
using environmental samples of soil and litter. For the first time (to
our knowledge) in an Amazonian context, we use a long-read approach to
sequence the full fungal ITS region on the PacBio platform. In addition,
we combine our novel long-read data with our previously released
short-read HTS data of the nuclear ribosomal 18S rRNA small subunit
(18S) gene and the mitochondrial cytochrome c oxidase subunit I
(COI) gene produced in a Illumina sequencing platform. We discuss the
patterns of fungal richness and community turnover across Amazonia and
compare the results obtained from different genes and platforms.
Methods
Study area and sampling
design
We sampled four localities across Brazilian Amazonia (Fig. 1) following
the sampling design described by Tedersoo et al.12.
Detailed locality descriptions are available in Ritter et
al.10. Briefly, we sampled all depths of the litter
layer above the mineral soil (all organic matter, including leaves,
roots, and animal debris) and the top 5 cm of the mineral soil in a
total of 39 circular plots, each with a radius of 28 m. We chose 20
random trees inside each plot and collected litter and soil on both
sides of each tree. We then pooled the samples by substrate to produce
one litter sample and one soil sample per plot. The soil physicochemical
properties were determined by a Brazilian company (EMBRAPA); additional
details of the soil analysis can be found in Ritter et
al.11.
Data
generation
For the nuclear ribosomal small subunit
(SSU) 18S rRNA (18S) and the mitochondrial cytochrome c oxidase
subunit I (COI) genes, we used the OTU table produced in Ritter et
al.25. We selected the OTUs assigned to the fungal
kingdom based on SILVA30 for 18S and
GenBank31 for COI datasets, respectively, for all our
analyses. We present here the results of both markers in light of the
fact that the previous publication did not analyse fungi separately,
which imposed limits on the fungal richness and community structure
analyses employed at the
time.
For the ITS, we used the approach described in Tedersoo et
al.18. We used the forward primers ITS9MUNngs
(5′‐TACACACCGCCCGTCG‐3’32) and ITS4ngsUni
(5′‐CCTSCSCTTANTDATATGC‐3’32) to target the full ITS
region (ITS1 - 5.8S - ITS2)”. For amplification, we used a PCR mixture
comprised 5 ul of FirePol DNA polymerase master mix (Solis Biodyne,
Tartu, Estonia), 0.5 ul of each forward and reverse primer (20 mM), 1 ul
of DNA extract and 18 μl ddH2O. Thermal cycling included an initial
denaturation at 95 °C for 15 min; cycles of denaturation for 30 s at 95
°C, annealing for 30 s, elongation at 72 °C for 1min; final elongation
at 72 °C for 10 min and storage at 4 °C. The duplicate PCR samples were
pooled; their relative quantity was estimated by running 5 μl DNA on 1%
agarose gel stained with ethidium bromide (Sigma-Aldrich, St Louis, MO,
USA). We used negative (for DNA extraction and PCR) controls throughout
the experiment. The amplicons were purified with FavorPrep PCR Clean Kit
(FavorGen Biotech Corporation, Vienna, Austria). The libraries were
prepared using PacBio amplicon library preparation protocol (Pacific
Biosciences, Inc, Menlo Park, Ca, USA) and loaded to seven SMRT cells
using the MagBead method. The libraries were sequenced using the PacBio
RS II instrument using P6-C4 chemistry following the manufacturer´s
protocol.
Bioinformatics analyses were performed using the PipeCraft
platform33. PacBio circular consensus reads (CCS,
reads_of_insert) were quality filtered with
vsearch34 (maxee = 2, maxns = 0, minlen = 150).
Filtered reads were demultiplexed based on the unique sequence
identifiers using mothur35 (bdiffs = 1). Putative
chimeric reads were filtered using denovo and reference-database-based
methods in vsearch. Additionally, sequences where the full PCR primer
was found anywhere in the read were filtered out using the PipeCraft
built-in module, as these reads represent additional chimeras not
detected by the vsearch method. The full ITS region was extracted using
ITSx36 and clustered using the UPARSE
algorithm37 with a 98% similarity threshold.
Additionally, the post-clustering curation method
LULU38 was applied (minimum_ratio_type = “min”,
minimum_match = 98) to merge consistently co-occurring ‘daughter’ OTUs.
Taxonomy annotation was performed using BLASTn39against the UNITE40, 41 and INSDC42databases.
Statistical
analysis
We performed all statistical analyses in R v.3.6.043.
Two samples (SCUICAMP3 and LCUITFP3) had a very low number of reads in
the ITS results, and were excluded from subsequent analyses of all
markers. We use as a diversity estimative the effective number of OTUs,
calculated with the unrarefied read counts as OTU abundance, using the
exponential of the Shannon entropy diversity of order q =
144. This measure is more robust against biases
arising from uneven sampling depth than the simple number of
OTUs45. For the abundance-based community matrices, we
transformed read counts using the “varianceStabilizingTransformation”
function in DESeq246 as suggested by McMurdie and
Holmes45. This transformation normalizes the count
data with respect to sample size (number of reads in each sample) and
variances, based on fitted dispersion-mean
relationships46.
We tested the correlation between diversity of each marker through a
Pearson correlation between each pair of markers. To test between the
community composition correlation, we performed a Mantel test with the
Jaccard dissimilarity matrices, using the Pearson correlation and 999
permutations for significance. Both analyses were performed using the
vegan v.2.5.5 R package47.
For soil physicochemical analysis, we first normalized all variables to
mean = 0 and variance = 1. We then performed two principal component
analyses (PCA), one for soil grain size and the other for chemical
compounds, using the vegan package. We used the first axis of each PCA
(explaining 56% and 69% of the total variation, respectively) in the
subsequent linear models and multiple regressions analysis. Given the
expected importance of soil organic carbon content11,48 and pH11, 49,
we used these as independent variables.
/
To test the effect of soil properties on fungal OTU richness, we
performed a Bayesian general linear model (GLM) analysis, as implemented
in the R-INLA v.17.6.20 R package50. The response
variables were the OTU diversity by soil layer (litter and soils) and
marker (18S, ITS and COI), giving a total of six models. In each case,
the soil properties (PC1 for the physical, PC1 for the chemical, organic
carbon content, and pH both standardized to mean = 0 and variance = 1)
were used as explanatory variables. We tested the effect of spatial
autocorrelation by comparing analyses of standard GLMs with GLM analysis
using stochastic partial differential equations (SPDE) that explicitly
consider spatial correlation.
To test the effect of soil properties on fungal community turnover, we
used multiple regressions on dissimilarity matrices (MRM) with the R
package ecodist v.2.0.151. The response variables were
dissimilarity matrices calculated using the Jaccard dissimilarity. In
each case, the explanatory variables were the distance matrices based on
soil properties (physical PC1, chemical PC1, organic carbon, and pH) and
one geographical distance matrix (all calculated using Euclidean
distances). Statistical significance of the regression coefficients was
determined using 10,000 permutations.
For the analysis of differences of community composition by locality and
habitat, we performed a non-metric multi-dimensional scaling (NMDS)
analysis using the Jaccard dissimilarity matrix and tested the
significance of groups using the envfit test, which fits vectors of
continuous variables - in this case the NMDS axes - and centroids of
levels of class variables (locality, habitat, and soil layer) using the
vegan package. Additionally, we performed a permutational analysis of
variance (PERMANOVA) to test the significance of each factor (locality,
habitat, soil layer, first PC of both PCAs, pH, and carbon) in the
community composition of each dataset (18S, COI, and ITS) using the
vegan package. We furthermore performed an analysis of indicator OTUs of
each locality, habitat, and soil layer using the R package indicspecies
v.1.7.652 using the matrix of relative abundance. This
analysis identifies the species, in our case the OTUs, that are
associated with a determined group. We performed the analysis three
times with each dataset (18S, COI, and ITS): the first grouping the OTUs
by locality, the second by habitat, and the third by soil layer. We
tested the significance with 9,999 permutations, from which we
quantified the number of indicator OTUs for each group with an alpha
< 0.05. Following this, based on literature and experience,
V.X.L. classified all possible indicator OTUs in their functional guild
group, such as mycorrhizae, phytopathogen, and saprobe based in their
assigned taxonomy (Table S3). We calculated the mean number of OTUs by
each factor (locality, habitat, and soil layer) in each dataset (18S,
COI, and ITS) using the vegan R package. We produced a Venn diagram for
visualization of the number and proportion of exclusive and shared OTUs
for each factor (locality, habitat, and soil layer) in each dataset
(18S, COI, and ITS) using the online tool Venny 2.053.
Additional R packages used for data curation were tidyverse
v.1.2.154 and ggplot2 v.3.1.155. All
scripts and data used in the analyses are available as supplementary
material.
Results