Challenges associated with genotyping SVs
Accurately genotyping SVs is in some ways more challenging than SV discovery (see Alkan et al., 2011; Antaki et al., 2018; Jiang et al., 2023 preprint; Nguyen et al., 2023). This is in part because the approaches implemented for SV discovery are not immediately transferable to genotyping individuals (Alkan et al., 2011). Although some discovery tools provide complementary genotyping packages (e.g., Delly, Smoove), many do not (e.g., CuteSV, Manta, Sniffles). Like SV discovery tools, genotyping tools may implement different approaches for different SV types and for scoring quality (S. Chen et al., 2019; Jiang et al., 2023 preprint; Nguyen et al., 2023; Sibbesen et al., 2017). This in turn makes it challenging to identify appropriate quality thresholds across a range of tools. In addition, there is growing evidence that short-read data is simply insufficient to accurately genotype some SV types (e.g., Sibbesen et al., 2018). For example, relatively few duplications and inversions passed genotyping quality thresholds in any of the six datasets, including the long-read-based datasets, which were genotyped using short-read data.
Assessing the performance of genotyping tools and approaches is challenging for most non-model species, as there is a paucity of data for independently validating SVs (i.e., truth call sets; Cameron et al., 2019; Kosugi et al., 2019). Because it is next to impossible to estimate false discovery rates or verify genotypes without a truth call set (but see Belyeu et al., 2021), we leveraged a well-curated pedigree to assess the proportion of SVs in each of the genotype-filtered datasets that adhered to Mendelian inheritance in parent-offspring trios. Although concordance across all (100%) trios was low for some tools, it is promising that call- and genotype-filtered SVs had between 72 and 100% concordance in at least 80% of trios. Where robust pedigree data is available, this additional filtering step is likely to enrich datasets for true positives and to facilitate subsequent validation.
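To make the trio-based check concrete, the following is a minimal sketch of how Mendelian concordance could be scored for a single SV. It is not the pipeline used in this study. It assumes diploid, autosomal genotypes encoded as allele pairs (0 = reference, 1 = alternate, None = missing), ignores de novo mutation, and uses hypothetical function names (`is_mendelian_consistent`, `trio_concordance`, `passes_trio_filter`). The 0.8 default is an illustrative parameter only.

```python
from itertools import product


def is_mendelian_consistent(child, father, mother):
    """True if the child's genotype can be formed from one paternal
    and one maternal allele (diploid, autosomal site assumed)."""
    target = sorted(child)
    return any(sorted((f, m)) == target for f, m in product(father, mother))


def trio_concordance(trios):
    """Fraction of fully genotyped trios that are Mendelian-consistent.

    `trios` is a list of (child, father, mother) genotype tuples for one SV.
    Trios with any missing allele (None) are skipped. Returns None when no
    trio is fully genotyped.
    """
    informative = [
        trio for trio in trios
        if all(allele is not None for genotype in trio for allele in genotype)
    ]
    if not informative:
        return None
    consistent = sum(is_mendelian_consistent(*trio) for trio in informative)
    return consistent / len(informative)


def passes_trio_filter(trios, min_fraction=0.8):
    """Keep an SV if at least `min_fraction` of informative trios are
    consistent (threshold is an assumption for illustration)."""
    fraction = trio_concordance(trios)
    return fraction is not None and fraction >= min_fraction


# Example: one SV genotyped in four trios, (child, father, mother).
example_trios = [
    ((0, 1), (0, 0), (1, 1)),        # consistent
    ((1, 1), (0, 0), (0, 1)),        # inconsistent: father has no alternate allele
    ((0, 0), (0, 1), (0, 1)),        # consistent
    ((None, None), (0, 0), (0, 0)),  # missing child genotype, skipped
]
example_fraction = trio_concordance(example_trios)
```

Skipping trios with missing genotypes, rather than counting them as failures, is a design choice; a stricter filter could also require a minimum number of informative trios per SV.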