Target selection and capture
Our screening illustrates the robustness of identifying ORFs that are likely single-copy and orthologous from existing databases, followed by paralog detection based on ingroup transcriptomes. Our enrichment strategy performed well and was little affected by the fact that intron-exon boundaries were unknown at the time of probe design. This result opens opportunities to use ORF predictions from transcriptome assemblies in other Metazoa as direct targets for probe design, provided orthology is verified. Our strategy thereby substantially simplifies the development of genomic datasets, especially for non-model organisms. If many short exons are expected, using shorter probes (e.g. 80 nt) or tiling probes more densely across targets (e.g. at 3× or 4×) may further enhance capture efficiency.
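To make the suggestion about probe length and tiling density concrete, the following minimal sketch computes probe start positions along a single target. It assumes that k× tiling means consecutive probes are offset by the probe length divided by k, and that a final probe is placed flush with the end of the target; the function name `tile_probes` and its parameters are hypothetical and do not describe the software used for the probe design in this study.

```python
def tile_probes(target_len, probe_len=80, density=3):
    """Return 0-based start positions of probes tiled across one target.

    Assumption: at k-fold tiling density, consecutive probes are offset
    by probe_len // k nucleotides (e.g. 80 nt probes at 4x -> 20 nt step).
    """
    if target_len <= 0 or probe_len <= 0 or density <= 0:
        raise ValueError("target_len, probe_len and density must be positive")
    # A target no longer than one probe is covered by a single probe.
    if target_len <= probe_len:
        return [0]
    step = max(1, probe_len // density)
    last_start = target_len - probe_len
    starts = list(range(0, last_start + 1, step))
    # Add a final probe flush with the target end if the step skipped it.
    if starts[-1] != last_start:
        starts.append(last_start)
    return starts


# Hypothetical 200 nt exon: compare sparse and dense tiling with 80 nt probes.
for density in (1, 2, 4):
    starts = tile_probes(200, probe_len=80, density=density)
    print(f"{density}x tiling: {len(starts)} probes, starts = {starts}")
```

In this sketch, raising the density adds probes that overlap each exon at more offsets, which is the intuition behind denser tiling for short exons; the trade-off is a larger probe set for the same number of targets.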
The effectiveness of selecting UCEs for unionids with the PHYLUCE pipeline was somewhat hampered because few and only distant genomes were available for mollusks (Sigwart et al., 2021; Sun et al., 2019) compared to other taxa for which UCE sets have been developed. Nevertheless, we recovered hits for 1,895 (46%) of our target UCEs, which is comparable to values obtained in some previous UCE studies (e.g. Kulkarni et al., 2020; Starrett et al., 2017; Streicher et al., 2018), indicating that our design worked. As is regularly the case in UCE studies (Buenaventura et al., 2021; Faircloth et al., 2012; Kulkarni et al., 2020; Quattrini et al., 2018; Starrett et al., 2017), the number of UCEs that could eventually be included in the alignment for phylogenetic inference was restricted to a subset of UCEs with high recovery across all ingroup taxa (but see Branstetter et al., 2017). Phylogenetic analysis of 276 UCEs allowed us to unambiguously reconstruct the backbone phylogeny of Coelaturini, and estimates of population genetic diversity from 309 UCEs were comparable to those obtained from ORFs, although they were more similar to the diversity at non-synonymous than at synonymous sites.
Combining ORFs and UCEs in the same probe set resulted in competition: although UCEs account for over 25% of the probes, only ~1% of our reads cover UCE targets. One potentially contributing factor is the phylogenetic distance between the genomes used to identify UCEs and our ingroup, whereas ORFs were selected based on ingroup transcriptomes. However, the recovery of 1,895 UCEs across our samples, despite only ~1% of our reads mapping to UCE targets, indicates that the issue results from hybridization efficiency rather than probe design. This result was unexpected based on a previous integration of multiple types of markers (Hutter et al., 2019), where no such competition was observed, but in that study the average length of UCE targets was >700 bp, compared to ~145 bp in ours. As we did not find a relationship between the length and recovery of UCEs (Fig. S2), the most likely explanation is that differences in inherent properties of UCE and ORF targets (e.g. mismatches to genomic libraries or differences in melting temperatures) cause variation in sensitivity and specificity during hybridization. This hypothesis is corroborated by the more restricted recovery of UCEs compared to ORFs upon in silico mapping of reads onto the Venustaconcha genome. UCE recovery could be enhanced by altering the temperatures of the hybridization and washing reactions, but further work is required to better understand the balance of enrichment across UCEs and ORFs.
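The imbalance between probe share and read share can be expressed as a simple ratio. The sketch below uses the approximate values stated above (UCEs as roughly 25% of probes and roughly 1% of reads); because UCEs make up "over 25%" of the probes, the resulting ratio is an upper bound. The function name `relative_representation` is hypothetical and serves only as an illustration.

```python
def relative_representation(probe_share, read_share):
    """Share of reads on a marker class divided by its share of probes.

    A value of 1 means reads are proportional to probe investment;
    values below 1 mean the marker class is under-represented.
    """
    if not 0 < probe_share <= 1:
        raise ValueError("probe_share must be in (0, 1]")
    if not 0 <= read_share <= 1:
        raise ValueError("read_share must be in [0, 1]")
    return read_share / probe_share


# Approximate values from the text: >25% of probes, ~1% of reads for UCEs.
uce_ratio = relative_representation(probe_share=0.25, read_share=0.01)
# The remaining probes and reads are attributed to ORFs (an assumption
# that ignores off-target reads).
orf_ratio = relative_representation(probe_share=0.75, read_share=0.99)

print(f"UCE relative representation: {uce_ratio:.2f}")
print(f"ORF relative representation: {orf_ratio:.2f}")
print(f"UCE fold under-representation: {1 / uce_ratio:.0f}x")
```

Under these assumptions, UCE targets receive at least about 25-fold fewer reads than their share of the probe set would suggest, which quantifies the competition described above.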