Structural variant analyses

Structural variants were counted for each SV discovery tool before and after filtering. To explore the level of call consensus between these outputs, the number of overlapping SVs was identified using SURVIVOR v1.0.7 (Jeffares et al., 2017) in 1 kb, 500 bp and 50 bp windows and for exact overlaps. To count as a consensus call, SV type and strand were required to match, and a minimum variant length of 50 bp was required. To assess whether some chromosomes carried more SVs relative to their size, we estimated the number of SVs per chromosome and the proportion of base pairs of each chromosome within an SV (i.e., the sum of all SV lengths for a given chromosome / chromosome size).
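The per-chromosome summary described above can be illustrated with a minimal Python sketch. It is not the code used in this study. The function name `sv_burden_per_chromosome` and the input layout (pairs of chromosome name and SV length, plus a dictionary of chromosome sizes) are assumptions made for illustration. As in the definition above, SV lengths are simply summed, so overlapping SVs would be counted more than once.

```python
from collections import defaultdict


def sv_burden_per_chromosome(svs, chrom_sizes):
    """Return {chromosome: (SV count, proportion of base pairs within an SV)}.

    svs         -- iterable of (chromosome, SV length in bp) pairs (hypothetical layout)
    chrom_sizes -- dict mapping chromosome name to its size in bp
    """
    counts = defaultdict(int)
    summed_length = defaultdict(int)
    for chrom, length in svs:
        counts[chrom] += 1
        # Assumption: deletions may be recorded with a negative length,
        # so the absolute value is used.
        summed_length[chrom] += abs(length)
    # Chromosomes without any SV are reported with a count and proportion of zero.
    return {
        chrom: (counts[chrom], summed_length[chrom] / size)
        for chrom, size in chrom_sizes.items()
    }


example_svs = [("chr1", 100), ("chr1", -300), ("chr2", 50)]
example_sizes = {"chr1": 1000, "chr2": 500, "chr3": 200}
burden = sv_burden_per_chromosome(example_svs, example_sizes)
```

The chromosome names and sizes in the example are invented and do not correspond to the kākāpō reference genome.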
Following SV discovery across the six strategies, all individuals were genotyped using the aligned Kākāpō125+ short-read dataset. The genotype-filtered SV data for all six variant call sets were used to compare individual and generational variability. Due to population structure previously identified using SNPs (Guhlin et al., 2022 preprint) and demonstrated here using SVs (Supplemental Figure 1), generational variability was assessed for Fiordland and Rakiura separately. When reporting the number of SVs per individual and the number of SVs among kākāpō generations, we use the presence or absence of SVs per individual. That is, we consider genotypes as evidence of whether or not the individual carries the SV (0/1 and 1/1 = carrier; 0/0 = non-carrier). Both Fiordland- and Rakiura-derived birds were used for comparisons across three generations (n = 1, 3, 4 for Fiordland F0, F1 and F2 and n = 40, 59, 10 for Rakiura F0, F1 and F2, respectively). Due to the lek mating system and a relatively long lifespan, the kākāpō population has had significant backcrossing through the generations. Therefore, the F1 and F2 generations represented here excluded all individuals with backcrossed lineages, as backcrossing may bias true generational patterns in SVs carried by individuals. Because both the number of individuals representing each generation and the number of SVs carried by individuals were highly variable in both lineages, individual SV counts were natural-log transformed and the mean was estimated for each generation. Trends between generations, lineages and data sets were then plotted for comparison.
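The carrier definition and the log-transformed generational means can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code. The function names and input dictionaries are hypothetical. Treating any genotype that contains a missing allele as uninformative, and skipping individuals with a count of zero (whose natural log is undefined), are assumptions not stated in the text.

```python
import math


def is_carrier(gt):
    """Return True for 0/1 and 1/1, False for 0/0, and None for a missing genotype."""
    alleles = gt.replace("|", "/").split("/")
    if any(allele == "." for allele in alleles):
        return None  # assumption: missing genotypes are not counted either way
    return any(allele != "0" for allele in alleles)


def mean_log_sv_count(genotypes_by_individual, generation_of):
    """Return {generation: mean of ln(number of SVs carried per individual)}.

    genotypes_by_individual -- dict of individual ID to a list of GT strings
    generation_of           -- dict of individual ID to a generation label
    """
    logs = {}
    for individual, genotypes in genotypes_by_individual.items():
        carried = sum(1 for gt in genotypes if is_carrier(gt))
        if carried > 0:  # assumption: ln(0) is undefined, so such individuals are skipped
            logs.setdefault(generation_of[individual], []).append(math.log(carried))
    return {generation: sum(values) / len(values) for generation, values in logs.items()}


example_genotypes = {
    "bird_a": ["0/1", "1/1", "0/0"],
    "bird_b": ["0/1", "./.", "0/0"],
    "bird_c": ["1/1", "0/1", "0/1"],
}
example_generations = {"bird_a": "F0", "bird_b": "F0", "bird_c": "F1"}
means = mean_log_sv_count(example_genotypes, example_generations)
```

In practice each lineage (Fiordland or Rakiura) would be passed through such a function separately, in keeping with the separate assessment described above.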
In the absence of a previously validated catalogue of SVs, neither a ‘true’ nor ‘false’ positive rate of detection could be assessed. Despite not being able to estimate the precision and accuracy of SVs called in our data, we aimed to test the consistency of genotyping results using Mendelian Inheritance tests with parent-offspring trios. Although this does not eliminate the possibility of systematic error, nor does it provide an indication of the precision or accuracy of SV detection, departures from Mendelian Inheritance may indicate inconsistency of genotyping within a given SV call. Genotyping consistency is an important consideration for population studies as patterns of population structure or inferences about local adaptation may be impacted by inconsistencies.
To identify SVs that violate Mendelian Inheritance patterns, the BCFtools +mendelian plugin was used. Pedigree data provided by the New Zealand Department of Conservation identified 120 parent-offspring trios consisting of 158 unique individuals in the Kākāpō125+ sequence data. We tested SV genotypes by calculating the proportion of Mendelian Inheritance errors relative to the number of non-missing genotypes (i.e., GT != “mis”). Four thresholds were tested, at which 100%, ≥95%, ≥90% or ≥80% of genotypes were required to adhere to Mendelian Inheritance expectations. It is important to note that not all 169 sequenced individuals were represented in pedigree trios, as they may not have descendants or ancestors represented in the short-read data analysed here. In addition, some individuals are represented multiple times in different family groups. Because of this bias towards highly represented individuals in the kākāpō breeding population, the trios may not adequately capture all SVs called within the population. As such, we did not filter SVs using Mendelian Inheritance errors for downstream analysis. Rather, these tests may provide some insights into the relative performance of genotyping strategies among the pipelines used here.
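The logic of the Mendelian Inheritance test and the four thresholds can be sketched in Python as below. This does not reproduce the BCFtools +mendelian plugin. It is a simplified illustration that assumes biallelic, diploid genotypes encoded as tuples of allele indices, uses hypothetical function names, and assumes that a trio is only evaluated when the child and both parents have non-missing genotypes.

```python
def is_mendelian_consistent(child, father, mother):
    """True if one child allele can come from the father and the other from the mother."""
    first, second = child
    return (first in father and second in mother) or (second in father and first in mother)


def mendelian_summary(trios):
    """Return (number of trios tested, number of Mendelian errors) for one SV.

    trios -- list of (child, father, mother) genotypes, with None for a missing genotype
    """
    tested = errors = 0
    for child, father, mother in trios:
        if child is None or father is None or mother is None:
            continue  # assumption: trios with any missing genotype are not evaluated
        tested += 1
        if not is_mendelian_consistent(child, father, mother):
            errors += 1
    return tested, errors


def thresholds_met(tested, errors, percents=(100, 95, 90, 80)):
    """Return the thresholds (in percent of passing genotypes) that an SV satisfies."""
    passed = tested - errors
    # Integer arithmetic avoids floating-point issues at the threshold boundaries.
    return [p for p in percents if tested > 0 and passed * 100 >= p * tested]


example_trios = [
    ((0, 1), (0, 0), (1, 1)),  # consistent
    ((1, 1), (0, 0), (0, 1)),  # error: the father cannot transmit allele 1
    ((0, 0), (0, 1), (0, 1)),  # consistent
    (None, (0, 1), (0, 1)),    # missing child genotype, skipped
    ((0, 1), (0, 1), (0, 0)),  # consistent
]
tested, errors = mendelian_summary(example_trios)
```

With three of four evaluated trios passing (75%), this example SV would meet none of the four thresholds.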