Introduction
The increased accessibility of whole-genome sequencing (WGS) technology has revolutionised population genetic/genomic studies in non-model species, and continues to provide valuable insights into the mechanisms underpinning genome divergence during speciation as well as the interplay between mutation, genetic drift, selection, and gene flow in the context of population demography (Campbell et al., 2018; Chueca et al., 2021; Cruickshank & Hahn, 2014; Formenti et al., 2022; Lado et al., 2020; Mathur & DeWoody, 2021). To date, the vast majority of these studies have used single nucleotide polymorphisms (SNPs) to investigate these processes, yet there is growing interest in the evolutionary and adaptive significance of structural variants (SVs), which are genomic rearrangements that include deletions, duplications, insertions, translocations, and inversions (Mérot et al., 2020; Wellenreuther & Bernatchez, 2018). SVs have been shown to influence the evolutionary trajectory of populations by determining traits associated with reproductive strategies (Huynh et al., 2011; Küpper et al., 2016), local adaptation, and adaptive potential (Berdan et al., 2021; Cayuela et al., 2021; Dorant et al., 2020; Huang et al., 2020; Kess et al., 2021; Tigano et al., 2021). There is also growing evidence that SVs may lead to speciation (Davey et al., 2011; Funk et al., 2021; Todesco et al., 2020).
Previous studies exploring SV diversity in natural populations have generally combined multiple sequencing technologies (e.g., short- and long-read sequencing, optical mapping) and large sample sizes (reviewed in Wold et al., 2021). Further, many studies to date have aimed to identify SVs in close association with specific traits of interest and subsequently validate them with more traditional approaches (e.g., vonHoldt et al., 2017). There remains ample opportunity to develop ‘good’ practice for reliably investigating population-level differences in SV frequency, location, or size distributions in non-model species. However, agricultural and human genomics studies have identified caveats to consider before using short-read sequence data to call SVs. For example, a high false-positive rate and biases in the type and size range of SVs detected are expected (Cameron et al., 2019; English et al., 2015; Ho et al., 2020; Mahmoud et al., 2019). This is in part because SV discovery tools commonly use discordant reads (i.e., those that are improperly aligned and/or depart from the expected insert length) and read depth to identify putative variants (Alkan et al., 2011; Cameron et al., 2017; X. Chen et al., 2016; Layer et al., 2014; Rausch et al., 2012). Although discordant reads do occur as a result of ‘true’ SVs, they may also arise from mapping or sequencing error, or from errors in the reference genome (Bayer et al., 2020; Hurgobin & Edwards, 2017).
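To make the notion of a discordant read pair concrete, the following is a minimal illustrative sketch rather than the method of any tool cited above. The `ReadPair` class, the `classify_pair` function, the category labels, and the threshold of three standard deviations are all hypothetical choices made for this example. It assumes a standard paired-end library in which mates are expected to map to the same chromosome in forward-reverse (`FR`) orientation.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical summary of one aligned read pair (not a real library type)."""
    chrom1: str        # chromosome of the first mate
    chrom2: str        # chromosome of the second mate
    insert_size: int   # distance spanned by the pair on the reference, in bp
    orientation: str   # "FR" is assumed to be the expected layout


def classify_pair(pair: ReadPair, expected_mean: float, expected_sd: float,
                  n_sd: float = 3.0) -> str:
    """Label a read pair as concordant or as one of four discordant classes.

    The n_sd cut-off is an assumption for illustration; real tools estimate
    and apply library-specific thresholds in their own ways.
    """
    if pair.chrom1 != pair.chrom2:
        # Mates on different chromosomes: consistent with a translocation.
        return "interchromosomal"
    if pair.orientation != "FR":
        # Unexpected orientation: consistent with an inversion or duplication.
        return "orientation"
    if pair.insert_size > expected_mean + n_sd * expected_sd:
        # Mates further apart than expected: consistent with a deletion.
        return "insert_too_large"
    if pair.insert_size < expected_mean - n_sd * expected_sd:
        # Mates closer together than expected: consistent with an insertion.
        return "insert_too_small"
    return "concordant"


if __name__ == "__main__":
    # Made-up pairs for a library with mean insert 400 bp and SD 50 bp.
    examples = [
        ReadPair("chr1", "chr1", 410, "FR"),
        ReadPair("chr1", "chr1", 900, "FR"),
        ReadPair("chr1", "chr2", 0, "FR"),
    ]
    for p in examples:
        print(p, classify_pair(p, expected_mean=400, expected_sd=50))
```

Each discordant label is only consistent with a class of SV. The same signal can be produced by mapping, sequencing, or reference error, which is why independent evidence is needed to separate the two.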
Distinguishing between the underlying sources of discordant read mapping generally requires independent data, such as extensive long-read sequencing, PCR amplification and Sanger sequencing, or optical mapping (Ho et al., 2020). Such resource-intensive approaches may not be feasible for many non-model species, especially those of conservation concern. Given that long-read sequences have been shown to outperform short-read data for SV discovery (Alkan et al., 2011; Chaisson et al., 2019; Mahmoud et al., 2019), researchers may choose a strategic approach that combines long-read sequencing for SV discovery with short-read sequencing for population-scale genotyping (e.g., Chander et al., 2019; Huddleston et al., 2017; Jun et al., 2021). Guidelines for genotyping SVs with short-read data in non-model species remain somewhat unclear (e.g., target sequence depth, ideal read insert size distribution, considerations for polyploids). This is in large part due to the lack of datasets, other than human genomic datasets, that are suitable for benchmarking SV discovery and genotyping strategies (e.g., Cameron et al., 2019; Kosugi et al., 2019).
The critically endangered kākāpō is a nocturnal ground parrot endemic to Aotearoa New Zealand. Once widely distributed throughout the North and South Islands of Aotearoa, kākāpō populations rapidly declined as a result of anthropogenic disturbances and introduced mammalian predators (Lloyd & Powlesland, 1994; Veltman, 1996; Williams, 1956). Populations continued to decline across the mainland and are believed to have gone extinct on the North Island in the 1930s. The last known South Island population was lost in the 1980s (Lloyd & Powlesland, 1994). A relict population was discovered on Rakiura (Stewart Island) in 1977, and a translocation of a handful of kākāpō found in Fiordland National Park on the West Coast of the South Island was attempted (Best & Powlesland, 1985; Lloyd & Powlesland, 1994). However, only one individual from Fiordland successfully bred with individuals from Rakiura. After intensive conservation management interventions, the kākāpō population has grown from a record low of 51 individuals in 1995 to ~200 individuals as of the 2021/2022 breeding season (Kākāpō Recovery Group, 2017; Kākāpō Recovery Team personal communications). In fact, of the ~200 birds discovered on Rakiura and in Fiordland National Park, only 35 are founders to which the extant kākāpō population can be traced (Kākāpō Recovery Team personal communications). In an effort to mitigate the effects of small population size and inbreeding in kākāpō, island translocations are partially informed by pedigree data and, more recently, by genomic estimates of relatedness generated by the Kākāpō125+ consortium (Guhlin et al., 2022 preprint). Briefly, as described in Guhlin et al. (2022), the Kākāpō125+ project was initiated in 2015 to inform kākāpō conservation efforts by sequencing all 125 kākāpō alive at the time.
Between 2015 and 2018, whole-genome short-read sequence data were generated for these 125 individuals and an additional 44 deceased adults and chicks, for a total of 169 sequenced individuals. The Kākāpō125+ project has established a near-whole species, high-quality variant dataset for a species of conservation concern and presents an exciting opportunity to explore strategies for SV discovery and genotyping in a non-model species. Here, we combine these data with long-read sequence data for a subset of individuals highly represented in the kākāpō pedigree, a highly contiguous reference genome (Rhie et al., 2021), and extensive life history data for all individuals, including verified pedigree relationships (Bergner et al., 2014; Guhlin et al., 2022 preprint). We compare four short-read and two long-read SV discovery and genotyping strategies and assess how each affects inferences about SV frequency and size distributions in kākāpō. This study represents a critical first step towards understanding the eco-evolutionary dynamics of SVs in small populations (Wold et al., 2021).