IV. DISCUSSION
The adoption of multiple complementary scaffolding approaches resulted
in an assembly of similar quality to the best available salmonid
genomes. Multiple lines of evidence suggest that the genome presented
here represents a nearly complete and accurate model of the Lake Trout
genome. First, the total size of the finished genome was slightly
greater than the genome size estimate obtained from GenomeScope.
Pflug et al. (2020) found that
k-mer based methods for genome size estimation tend to underestimate
genome size by 4.5% on average, so this result is not entirely
unexpected. Additionally, BUSCO scores were similar to those obtained
for the highest quality salmonid genomes available at the time of
analysis (e.g. Coho Salmon, Brown Trout, Rainbow Trout). Scores were
highly similar between Brown Trout and Lake Trout genomes; however, the
proportion of missing BUSCOs was 1.9% higher for Lake Trout and the
proportion of complete duplicated BUSCOs was 2% lower, suggesting that
some duplicated regions might be missing from the Lake Trout genome.
Nonetheless, these two assemblies had the highest percentage of complete
BUSCOs and the highest percentage of complete duplicated BUSCOs out of
the genome assemblies examined. Furthermore, the order of loci on the
Lake Trout linkage map and the order of loci on Lake Trout chromosomes
were shown to be highly concordant, suggesting that contigs are
accurately ordered and properly oriented. The genome presented here is
also highly contiguous, with a contig N50 higher than any published
salmonid genome (but see the recently released assembly for Arlee Strain
Rainbow Trout; GCF_013265735.2). Interestingly, the PacBio data used
for assembly were of similar coverage to the data used for assembling
the European Whitefish genome (De-Kayne et al. 2020); however, the Lake
Trout genome contig N50 is >3X higher (although scaffold
N50 is lower). There are two reasonable explanations for this. First,
the European Whitefish genome was assembled using DNA from a
wild-caught, outbred individual rather than a double haploid. Second,
the European Whitefish genome was not gap filled after scaffolding. Gap
filling the Lake Trout genome with PBJelly increased contig N50 by
561,496 bp.
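To illustrate why gap filling can raise contig N50, the following minimal Python sketch computes N50 for a toy set of contig lengths before and after a single gap is closed. The lengths are hypothetical and chosen only for illustration; they are not taken from the Lake Trout assembly, and the function is not part of the assembly pipeline described here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (bp) within a scaffold before gap filling.
before = [50, 40, 30, 20, 10]

# Assume gap filling closes a 5 bp gap between the 30 bp and 20 bp
# contigs, merging them into a single 55 bp contig.
after = [30 + 5 + 20, 50, 40, 10]

print(n50(before))  # 40
print(n50(after))   # 50
```

Because closing a gap joins two contigs into one longer contig, fewer sequences are needed to reach half of the assembled bases, and the N50 can increase even though very little sequence is added.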
The Lake Trout genome will likely be sufficient for the majority of
downstream uses; however, improvements could be made using
supplementary scaffolding resources such as a higher density linkage map
or optical map (Pan et al. 2020). The annotation could also be improved
by generating additional RNA-seq data. The number of annotated genes and
pseudogenes (n=49,668) is similar to what has been obtained for other
salmonids (e.g. Chum Salmon Oncorhynchus keta, Sockeye Salmon Oncorhynchus nerka, and Dolly Varden) using the same annotation
pipeline. However, it is important to note that annotation completeness
is markedly reduced relative to other assemblies with similar BUSCO
scores such as Atlantic Salmon (57,783; GCF_000233375.1; Annotation
Release 100), Coho Salmon (63,465; GCF_002021735.2; Annotation Release
101), Brown Trout (61,583; GCF_901001165.1; Annotation Release 100),
Rainbow Trout (55,630; GCF_002163495.1; Annotation Release 100), and
Chinook Salmon (53,685; GCF_002872995.1; Annotation Release 100). These
annotations were produced using RNA-seq evidence from a greater
diversity of tissue types, which likely explains this discrepancy. The
Lake Trout annotation, as well as annotations for other salmonids, could
also be further improved by directly sequencing full length transcripts
using long-read sequencing technologies
(Workman et al. 2018). We predict
that the completeness of the Lake Trout genome annotation will be
improved as more gene expression data from a greater diversity of tissue
types become available for the species
(Salzberg 2019). Nonetheless, the
current genome annotation will undoubtedly aid in the interpretation of
future findings by allowing researchers to link signals of selection and
loci associated with phenotypes with putatively causal genes and
biological processes. Publicly available gene expression and functional
annotation resources, like those being developed by the Functional
Annotation of All Salmonid Genomes (FAASG) initiative, will also aid in
this effort (Macqueen et al. 2017).
The availability of a second high-quality assembly for a Salvelinus species will likely benefit comparative genomic
research aimed at understanding the evolutionary consequences of genome
duplication. Salmonids have long been appreciated as a model system for
understanding evolution following whole genome duplication events (Ohno
1970) and the wealth of genomic resources for salmonids will hopefully
continue to shed light on the evolutionary processes at play following
autotetraploid genome duplication events. Additionally, multiple recent
studies have highlighted the importance of structural genetic variation
for promoting adaptive diversification within salmonid species (Pearse
et al. 2019; Bertolotti et al.
2020), and chromosome-anchored genome assemblies are typically needed
for detecting and genotyping structural variants
(Merot et al. 2020).
Genomic methods have dramatically increased the precision of population
genetic analyses and have enabled researchers to address qualitatively
unique questions that require some knowledge of genome structure and
function (Waples et al. 2020).
Lake Trout have undergone repeated parallel adaptive radiations and
ecotypic diversity appears to be heritable (Goetz et al. 2010); however,
the genetic or epigenetic basis for ecotypic diversity is still unclear
(Perreault-Payette et al. 2017). A
genome assembly will greatly simplify the process of mapping loci
associated with ecophenotypic differentiation and could enable
identification of loci associated with reproductive isolation among
ecotypes in populations where multiple ecotypes exist. Anecdotal
evidence suggests that Lake Superior once harbored as many as ten
ecotypes (Goodier 1981). Three ecotypes are currently recognized
(lean, siscowet and humper) and a fourth ecotype was recently identified
(redfin; Muir et al. 2014). Interestingly, Muir et al. (2014) found that
ecotypes collected near Isle Royale were moderately distinct, which is
at odds with historical records suggesting that they were easy to
identify visually (Rakestraw
1967). An improved understanding of the genetic basis for ecotypic
differentiation could help determine if this is due to phenotypic
plasticity, increased levels of hybridization between ecotypes, or other
processes (Baillie et al. 2016).
The ability to genotype historical collections and quantify levels of
adaptive differentiation at different time points
(Guinand et al. 2003) provides a
particularly exciting avenue for future research on Lake Trout.
The Lake Trout genome assembly could also have important implications
for ongoing Lake Trout restoration activities throughout the Great
Lakes. The resources presented here will allow for the identification of
loci associated with variation in fitness between Lake Trout hatchery
strains in contemporary Great Lakes environments
(Scribner et al. 2018) and the
identification of loci that are adaptively diverged between hatchery
strains. This information could help fisheries managers to maximize
adaptive genetic diversity in re-emerging wild populations and
prioritize hatchery populations for continued propagation.