Katarina Stuart -

Abstract: Whole genome sequencing (WGS) has greatly expanded researchers' ability to study structural variants (SVs), i.e. the variation in the presence, number, orientation, or position of a DNA sequence. This has paved the way to study the eco-evolutionary dynamics of SVs across the tree of life and within a population genomics framework. In this review, we provide the necessary fundamentals to help researchers generate and analyze population-level SV data. We discuss the unique properties of different SV groups, and how these fundamental differences interact with important biological and evolutionary processes using both empirical results and theory. This includes discussion of unresolved issues around SVs, such as technical difficulties in identification, accounting for diversity, and evaluating functional effects. We explicitly integrate into this discussion transposable elements, which are an important component of SVs often identified in population-level variant data. Finally, we focus on the practical side of SV analysis, offering a framework for SV identification and data analysis. In particular, we examine the heterogeneous nature of SV properties (type, length, sequence identity) that should be considered when studying them in ecology and evolution. This review aims to provide resources and guidelines to help researchers navigate the complexities of a relatively new field of eco-evolutionary genomics research.

Keywords: inversions, chromosomal rearrangements, copy number variants, transposable elements, distribution of fitness effects, rapid adaptation

1. Introduction

Characterizing genomic variation is fundamental to addressing a wide array of ecological and evolutionary questions. The advancement of DNA sequencing methods over time has enabled the discovery of new aspects of genetic diversity at every step.
In particular, ecological and evolutionary genomics have flourished with the increased availability of high-quality reference genomes (Formenti et al., 2022) and the attainability of whole genome sequencing (WGS) (Fuentes-Pardo & Ruzzante, 2017). Resequencing entire genomes, rather than a small portion through reduced-representation approaches, provides a rich source of information and has led to a proliferation of methods to investigate evolutionary and demographic processes. We can now identify signatures of balancing selection in the genome (Stern & Lee, 2020), reconstruct demographic history in the near and distant past with unprecedented resolution (Nadachowska-Brzyska et al., 2022; Santiago et al., 2020), and characterize the roles of genome structure and recombination in the levels and distribution of genomic variation across the genome (Akopyan et al., 2025; Tigano et al., 2021). Many genomic analysis methods are currently tailored to SNP variation, and WGS in particular has greatly expanded our view of genome-wide variation within and between species.

The increasing accessibility and coverage of WGS data has also enabled the direct identification of larger genetic variants, known as structural variants (SVs) (Alkan et al., 2011), enabling a deeper understanding of their role in ecology and evolution (Mérot, Oomen, et al., 2020). The growing breadth of population-level SV studies has quickly revealed both the ubiquity and magnitude of SVs’ contributions towards genomic diversity. Several population genomics studies have shown that SVs generally cover more base pairs than sequence variation (i.e., Single Nucleotide Polymorphisms, SNPs) by a factor of 3 to 8 across investigated species (Catanach et al., 2019; Mérot et al., 2023; Hämälä et al., 2021; Tigano et al., 2020). SVs can also strongly affect fitness and phenotypes (Wellenreuther et al., 2025).
For example, the estimated heritability of agronomically-important traits in tomatoes increased by 24% when SVs were also considered compared to analyses based on SNPs only (Zhou et al., 2022). The inclusion of SVs into research fields that have previously focused heavily on SNPs will aid with the interpretation of complex genomic patterns and processes (e.g., biogeography, Dallaire et al., 2023; eco-evolutionary dynamics, Oomen et al., 2020) and provide researchers with a more complete picture of intraspecific and interspecific genetic variation.

Because of this growing appreciation of SVs, researchers are increasingly interested in reanalysing existing WGS datasets or obtaining new data to examine SVs in their species of interest. This can be a daunting task because of the genetic resources required, as well as the technical and biological complexity of SV data analysis and interpretation. Furthermore, SVs are an extremely diverse category of variants, and analysing SVs like SNPs (as a single group) limits insights from their diverse subtypes and complex roles in genetic diversity.

In this review, we aim to answer practical questions for those new to the study of SVs, guide study design and analytical best practices for those seeking to analyse population-level SVs within eco-evolutionary studies, and suggest future avenues of inquiry. First, we summarise the differences between SVs and SNPs, as well as between diverse types of SVs, and discuss how these specific properties may interact with eco-evolutionary processes. We then focus on the practical side of SV analysis, providing a framework for identifying and analysing SVs from WGS data. Accompanying the global movement providing high-quality reference genomes (e.g. Ebenezer et al., 2022; Lewin et al., 2022; Mc Cartney et al., 2024), this review aims to reduce barriers to analysing the full spectrum of genomic variation by incorporating SVs in eco-evolutionary studies.

2. How do we define and classify SVs, including TEs?

‘Structural Variant’ is a broad term that encompasses all variation in the DNA sequence other than single nucleotide variants (SNV, which include single nucleotide polymorphisms [SNPs]). SVs are generally defined as variation in the presence, absence, number, orientation, or position of a DNA sequence. Some studies classify a variant as structural if it exceeds a minimum length threshold (typically 50 bp), which is usually an arbitrary cutoff inherited from SV detection software (Mérot, Oomen, et al., 2020). At the extreme, SVs can be whole chromosomes and genome duplications (Scherer et al., 2007). Earlier methods for identifying small structural variants, such as insertion–deletions (indels), were limited by short-read sequencing and ignored variants longer than this threshold. Modern SV detection tools now often target this intermediate length range. In reality, however, such variants occur along a continuous length spectrum, extending from just a few base pairs to many megabases (Mérot, Oomen, et al., 2020; Wellenreuther et al., 2025). Any length threshold is thus inherently arbitrary and constrains the detection and interpretation of evolutionarily relevant SVs (Recuerda & Campagna, 2024). In practice, the apparent length range of SVs in any given study is determined not by biology but by the technical limits of variant-calling algorithms and the characteristics of the sequencing data. Longer reads generally enable the detection of larger and more complex variants (Mahmoud et al., 2019). Consequently, good scientific practice requires authors to clearly specify the length range of SVs targeted in their analyses, ideally grounded in the empirically defined detection limits established by benchmarking studies of the chosen SV caller(s) (e.g., Helal et al., 2024). This is particularly important when the operational definition of ‘SV’ in a study depends on those methodological constraints.
For example, studies based on assembly alignment will perform well at characterising variants spanning hundreds of kilobases, whereas short-read based approaches capture those below a few kilobases (He et al., 2025). Long-read (LR) platforms such as PacBio HiFi or Oxford Nanopore are theoretically able to recover variants of any length, but it is worth keeping in mind that most of the tools have been designed and tested on simulated or curated databases with a majority of SVs between 50 bp and a few kilobases.

SVs are also defined by their sequence change relative to a reference genome. This usually includes deletions (DEL), insertions (INS), duplications, inversions (INV), fusions, and translocations (Alkan et al., 2011). While this categorization is meaningful from a bioinformatic perspective (the variant is classified by comparing with the reference genome), other intrinsic characteristics of SVs may be more relevant from a biological point of view, such as how these variants originate and evolve over time. SVs can originate by many mechanisms, including errors in meiotic recombination like incomplete crossover (e.g., due to age or toxins), improper DNA repair, or replication issues like template switching or slippage (Carvalho & Lupski, 2016; Currall et al., 2013). Some SVs (but not all), when their sequence is examined, will be identified as a repeat, for example microsatellites or transposable elements (TEs). All these major classifications of SVs can also be caused by the activity of TEs (Almojil et al., 2021), which are repetitive genetic elements that can originate from the genome itself or from external viruses and have the ability to move and replicate themselves within the genome (Bourque et al., 2018). When a TE replicates or relocates within the host genome, it creates structural genetic variation, thus making this TE an SV.
While SVs are identified by their differences from a reference genome, TEs are identified by their recurrent sequence motifs, which are generally phylogenetically grouped into 'families' sharing similar sequences that diversify alongside their host genomes (Bourque et al., 2018). When TEs become fixed due to selection or drift within the population or species, they are no longer a polymorphic variant, so they should not be considered an SV despite showing TE-like sequences. Thus, not all SVs are TEs, and conversely, not all TEs are SVs. Whether TEs are SVs or not, they can promote SV formation by creating similar genomic regions that trigger non-allelic homologous recombination (Klein & O’Neill, 2018; Harringmeyer & Hoekstra, 2022; Meyer et al., 2024).

In this review, we use 'SV' to refer collectively to structural variants of all types and lengths, including those of TE and non-TE origin, unless otherwise specified. Note that many studies focus on specific subsets of SVs, using different terms such as copy number variation (CNV), indels (insertions-deletions), presence-absence variation (PAV), chromosomal rearrangements (CRs, e.g. inversions, translocations, fusions, usually longer than 100s of kb), or microsatellites (which are CNVs, which are in turn indels). Similarly, TEs are referred to as transposons, jumping genes, mobile genetic elements (MGEs), mobile DNA, retrotransposons or DNA transposons. Overlooking this diverse terminology may lead to important research being missed.

2.1 Why do we study SV diversity when we already have genome wide SNP data?

One might question whether identifying SVs is necessary: can SNP sequence variation alone capture the patterns of genetic variation necessary for genomic analysis?
Broadly, patterns of diversity and differentiation (e.g., population structure) across sequence (SNPs) and structural (SVs) variation often correlate (Tigano et al., 2024; Tigano & Russello, 2022), although differential patterns are sometimes observed (Dorant et al., 2020; Tigano et al., 2024). SVs can affect patterns of population structure if they harbour a concentration of highly differentiated SNPs that capture an axis of differentiation, for example local adaptation, different from the “neutral” patterns of population differentiation (Tepolt et al., 2022). Moreover, because large structural rearrangements often underlie ecotypic differentiation, they frequently harbour loci of large effect that contribute to local adaptation. Such variants are therefore important to consider in the context of species management and conservation (Wold et al., 2021; Schneller et al., 2025). Their relevance extends to population viability, as demonstrated in wolves (Canis lupus lupus), where structural genomic variation in the inbred Scandinavian population contributes to realized genetic load (the accumulation of deleterious variants) but can be mitigated by immigration (Smeds et al., 2024). Similarly, in Atlantic salmon (Salmo salar), population-specific structural variants have been associated with differences in local adaptation between ecotypes (Lecomte et al., 2024). Throughout the rest of this review we will discuss many more examples, across a wide variety of SV types and different taxa.

Much of the theory and many of the analytical approaches used in population genomics have been developed around SNPs, and expanding the genomic toolkit to include SVs is still in its infancy (e.g. Barton & Zeng, 2018). Due to their dense and genome-wide distribution, SNPs remain ideal for some applications such as linkage disequilibrium (LD) studies (e.g. inferring recombination landscapes, effective population size, etc.) and for quantitative trait loci (QTL) mapping.
However, because SVs are larger and encompass more base pair changes overall, limiting population genomic inference to SNP-based analysis will misrepresent overall genetic diversity. SNPs located within SVs may not be independent markers, meaning the patterns they capture may be overrepresented in population genetic data, and complex SVs are often not captured by SNPs at all. This limitation disproportionately affects the detection of large-effect variants in the genome, as larger, more complex SVs are more likely to have functional impacts on the organism (see Section 3.2). SVs can be the genomic basis of discrete morphotypes (Lamichhaney et al., 2016) and ecotypes (Li et al., 2024), and underlie many human diseases, likely contributing to the missing heritability issue when overlooked (i.e. where trait heritability estimates are much lower than expected when calculated using SNPs only) (Groza, Chen, et al., 2024). An increasing number of studies on commercially-relevant species demonstrate that SVs underlie traits of economic interest (Jayakodi et al., 2020; Leonard et al., 2024). Therefore, important sources of genomic variation will be missed in ecology, evolution, and in applied research if SVs are ignored.

One may wonder to what extent the effect of SVs can be predicted from neighbouring SNPs. This approach may be cost-efficient in some cases such as well-documented catalogues of variants (Blaj et al., 2022) or for diagnostic SNPs associated with large inversions (Fang & Edwards, 2024). However, in general, SNP calling pipelines often exclude SV signals inadvertently (or intentionally) by masking repeats or removing SNPs with anomalously high depth or systematic missingness, which removes false SNP signals (e.g. Dallaire et al., 2023; Jaegle et al., 2023) but ultimately underrepresents SV variation.
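Whether a given SV is tagged by nearby SNPs can be checked empirically. The following is a minimal sketch, not a method taken from the studies cited here: it assumes unphased genotypes coded as alternate-allele dosages (0, 1, 2, with `None` for missing calls) and uses the squared Pearson correlation of dosages as a simple proxy for LD. The function names `genotype_r2` and `best_tag` and the example data are hypothetical.

```python
import math

def genotype_r2(a, b):
    """Squared Pearson correlation between two dosage vectors (0/1/2).

    Samples with a missing call (None) in either vector are skipped.
    Returns NaN if fewer than two samples remain or either site is
    monomorphic among the shared samples.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return math.nan
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov * cov / (var_x * var_y)

def best_tag(sv_dosage, snp_dosages):
    """Return (snp_id, r2) for the SNP most correlated with the SV.

    snp_dosages maps a SNP identifier to its dosage vector.
    Returns (None, NaN) if no SNP yields a defined r2.
    """
    best_id, best_r2 = None, math.nan
    for snp_id, dosage in snp_dosages.items():
        r2 = genotype_r2(sv_dosage, dosage)
        if not math.isnan(r2) and (best_id is None or r2 > best_r2):
            best_id, best_r2 = snp_id, r2
    return best_id, best_r2

# Hypothetical example: one SV and two flanking SNPs genotyped in six individuals.
sv = [0, 1, 2, 1, 0, 2]
flanking_snps = {
    "snp_far": [0, 2, 0, 2, 1, 1],
    "snp_near": [0, 1, 2, 1, 0, 2],
}
tag_id, tag_r2 = best_tag(sv, flanking_snps)
print(tag_id, tag_r2)
```

Summarising the best-tag r² separately for each SV type and length class, rather than across all SVs at once, would show which classes SNP-only datasets are most likely to miss.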
Further, SVs are often found in regions of high recombination (Currall et al., 2013; Stuart, Tan, et al., 2025) and may respond to selection differently than SNPs (discussed below), which will impact the patterns of linkage disequilibrium between SVs and adjacent SNPs (Kato et al., 2006). The interplay of both biological and technical reasons means that whether SVs are anchored to nearby SNPs is likely highly dependent on the SV properties (e.g., type, length) and genomic context (Chia et al., 2012; Geibel et al., 2022).

3. What are the properties of SVs that matter for population genomics?

Understanding the population genomics of SVs requires recognizing their properties and how they can interact with eco-evolutionary processes (Fig. 1). Most knowledge about genome-wide SVs comes from model organisms such as humans (often in disease research), focusing on variants with large effects or easily identifiable SVs like short deletions or large inversions. However, understanding how a study system's unique history interacts with SV properties can guide the development of evolutionarily relevant hypotheses.

3.1 The origins and dynamics of SVs are diverse and poorly characterized

The formation rates of SVs, including TEs, have long been studied at the macro-evolutionary scale within the context of speciation and genome size evolution (Talla et al., 2017). The vast diversity in genome sizes across the tree of life reflects the highly variable rates of SV origin and accumulation, and consequently, their contributions to genome content (Chalopin et al., 2015). The rate at which SVs arise is expected to be more variable than for SNPs, reflecting the wide diversity in SV types and lengths (Fig. 1a; Ho & Schaack, 2021). SVs are thought to arise at a lower frequency than SNPs, on average, in terms of singular SNP mutation/SV formation events (e.g. 0.16 vs 70 events per genome; Belyeu et al., 2021), even though novel SV formation can impact a greater number of nucleotides.
However, some types of SVs may arise more frequently than SNPs (e.g. very small microsatellites (Vigouroux et al., 2002) or duplications (Katju & Bergthorsson, 2013)). The subset of SVs that are TEs are known for their high activity levels (Biémont & Vieira, 2006), though this rate varies considerably across TE families and across time (Ho et al., 2021). Many eco-evolutionary and population genomics models for SNPs assume constant mutation rates (e.g. Nadachowska-Brzyska et al., 2022), which is known to be a convenient simplification (Bergeron et al., 2023; Heasley et al., 2021). However, SVs are more evolutionarily varied than SNPs, as they form and evolve in a variety of different ways, and any assumption about fixed mutation/formation rates appears even less realistic and reliable than when applied to sequence variation (Loewenthal et al., 2022; Petrov, 2002). TEs, for example, are well known to often multiply in bursts when triggered by population expansion or hybridization events (Bergman & Bensasson, 2007), leading to many new TE insertions. In such instances, the data would show a general excess of rare variants that would traditionally be attributed to population expansion or purifying selection, an inaccurate interpretation in this case (Bourgeois & Boissinot, 2019). Thus, when trying to infer patterns from SV mutation rates, identifying their mechanism of creation (e.g., what type of repeat motif characterises an SV) is an essential analytical step, and interpreting such data alongside comparisons with SNP spectra will likely offer evolutionary research nuanced insights into the interplay between mutation rate, drift, selection, and demographic history (Fig. 1e).

3.2 SVs are less likely to be neutral due to their length

Theoretical predictions state that the larger an SV, the higher the likelihood that it will induce a functional, likely negative, impact on fitness (Hämälä et al., 2021; Scott et al., 2021).
Thus, we may expect the distribution of fitness effects (the range and frequency of fitness consequences of new mutations, from deleterious to neutral to beneficial) of SVs to have a lower proportion of nearly neutral variants than that of SNPs (Fig. 1b). The maximum negative fitness consequence will be the same for SNPs and SVs (i.e. the variant is lethal); however, the rarity of large SVs suggests lethal deleterious SVs may be more common than deleterious SNPs (Eichler, 2019). Conversely, the magnitude of potential beneficial effect of SVs most likely extends beyond that of SNPs (Fig. 1b), with many large inversion polymorphisms being examples of beneficial variation (Berdan et al., 2023). The distribution of fitness effects reflects the interaction of formation rates, genome interactions, and selection regime. Even closely related species can exhibit differences in the fitness effects of SNPs, and SVs may be even more diverse in this regard (James et al., 2023). Larger variants are more likely to have functional impacts, and thus effect size generally scales with variant length (Collins et al., 2020), with smaller SVs behaving more similarly to SNPs, though exceptions exist (Metzgar et al., 2000). The fitness effects of an SV also depend on the type of sequence change, with the addition, removal, or rearrangement of genomic sequence likely to be under varying strengths of selection in different contexts (Gaut et al., 2018; Loewenthal et al., 2022). The location of the variant is also important; for example, intronic INS are less likely to be disruptive compared to exonic deletions (Petrov, 2002). Finally, the impact of an SV may change over time.
All SVs, and inversions in particular, are prone to genetic load, owing to localised suppressed recombination and reduced Ne (Hämälä et al., 2021), which may increase the mutational load they confer over generations (Jay et al., 2021).

This relationship between length and fitness consequence for SVs is also likely to interact with many biological processes in different ways compared to SNPs. Large-effect variants may better support local adaptation in the face of gene flow, which can dilute adaptive alleles, so SVs may play a greater role than SNPs in such scenarios (Yeaman & Whitlock, 2011). Large SVs may also help maintain strong local adaptation under high mutation rates, as clusters of adaptive variants can be disrupted by frequent mutations whereas the SV will keep co-adapted variants within its length together (Sakamoto et al., 2024), and the SV itself may suppress recombination (see Section 3.3 for further discussion). Because of differences in underlying mutation rates and subsequent fitness effects, the evolutionary dynamics of SVs may vary across population and species divergence continua. For example, within some plant lineages, duplications and translocations have been found to accumulate with increasing phylogenetic distance, suggesting they may be common SV classes that differentiate sister taxa, whereas differences in inversions were more stochastic and highly variable (Ferguson et al., 2024; Hirabayashi & Owens, 2023). Consequently, the different characteristics of SVs such as length and type are important to consider, alongside selective and demographic processes, when trying to understand the fitness impacts of changes to the genome (Collins et al., 2020). However, due to the unavoidable detection biases present in many SV studies to date (see Section 5.2), a comprehensive framework for population-level expectations of variant types, lengths, and frequencies has yet to be developed.
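As a minimal illustration of treating SVs as distinct classes rather than a single group, the sketch below stratifies variants by type and length bin and reports, for each class, the fraction of segregating variants that are rare, with SNPs as a baseline. This is an illustrative summary under stated assumptions, not a method from the studies cited above: the length bins, the 5% minor allele frequency threshold, the input tuple layout, and the function names are all hypothetical choices. An excess of rare variants in a class is consistent with purifying selection, but may equally reflect demography, recent TE bursts, or genotyping error.

```python
import math
from collections import defaultdict

# Hypothetical length bins in bp (lower bound inclusive, upper bound exclusive).
# These should be matched to the detection range of the SV caller(s) used.
LENGTH_BINS = [(50, 1_000), (1_000, 10_000), (10_000, math.inf)]

def length_bin(length):
    """Return a label such as '50-999bp', or None if below the first bin."""
    for low, high in LENGTH_BINS:
        if low <= length < high:
            return f">={low}bp" if high == math.inf else f"{low}-{high - 1}bp"
    return None

def minor_allele_frequency(alt_count, n_chromosomes):
    """Folded frequency, so reference polarity does not matter."""
    freq = alt_count / n_chromosomes
    return min(freq, 1 - freq)

def rare_fraction_by_class(variants, maf_threshold=0.05):
    """Summarise the fraction of rare variants per variant class.

    variants: iterable of (kind, length, alt_count, n_chromosomes), where
    kind is 'SNP' or an SV type such as 'DEL', 'INS', 'INV'.
    Returns {class_label: (n_segregating, fraction_with_maf_at_or_below_threshold)}.
    """
    totals = defaultdict(int)
    rare = defaultdict(int)
    for kind, length, alt_count, n_chrom in variants:
        if alt_count == 0 or alt_count == n_chrom:
            continue  # monomorphic in this sample, not informative
        if kind == "SNP":
            label = "SNP"
        else:
            bin_label = length_bin(length)
            if bin_label is None:
                continue  # shorter than the smallest SV bin
            label = f"{kind}:{bin_label}"
        totals[label] += 1
        if minor_allele_frequency(alt_count, n_chrom) <= maf_threshold:
            rare[label] += 1
    return {label: (n, rare[label] / n) for label, n in totals.items()}

# Hypothetical example: 20 diploid individuals (40 chromosomes).
example = [
    ("SNP", 1, 1, 40),
    ("SNP", 1, 20, 40),
    ("DEL", 300, 1, 40),
    ("DEL", 300, 10, 40),
    ("DEL", 25_000, 1, 40),
    ("INV", 2_000_000, 39, 40),
    ("INS", 30, 5, 40),  # dropped: below the 50 bp bin
]
summary = rare_fraction_by_class(example)
for label in sorted(summary):
    print(label, summary[label])
```

Because detection power differs among SV types and lengths, differences among classes in such a summary are only interpretable alongside the detection limits of the caller used.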
3.3 SVs can alter population level dynamics in ways that SNPs cannot

Large SVs create non-homologous sequences when occurring in the heterozygous form, which may interfere with, or entirely inhibit, recombination (Fig. 1c). Although inversions are the most studied in this regard (Hoffmann & Rieseberg, 2008), other types of SVs, such as fusions, large CNVs or complex rearrangements, may have similar effects on recombination; this is evidenced by the reduction or relocation of chiasma or locally elevated LD (Dumas & Britton-Davidian, 2002; Trickett & Butlin, 1994; Wellband et al., 2019). SVs frequently arise in recombination hotspots (Currall et al., 2013), but can subsequently impede or inhibit recombination (Morgan et al., 2017). Regions with high recombination rates are more effective at purging deleterious variants (Kent et al., 2017; Morgan et al., 2017); however, selection within gene-rich regions may favour the persistence of SVs that suppress recombination, to maintain advantageous haplotypes. Recombination suppression has minimal effects on genome functionality but significantly influences population dynamics and the level of genetic variation (Yeaman, 2013). Whereas the rest of the genome is homogenized by recombination, the rearranged region exhibits two sets of haplotypes, with reduced or no gene flux between them and locally reduced effective population size (Ne) (see Faria et al., 2019 for further discussion). Large SVs inhibiting recombination may accumulate additional variants over time, often increasing deleterious effects (Berdan et al., 2022; Jay et al., 2021; Mahmoud et al., 2019). Conversely, SVs can link beneficial variants into ‘supergenes,’ particularly in inversions (Wellenreuther & Bernatchez, 2018). Reduced recombination can also enhance the spread of beneficial variants, such as from range cores to edges in expanding populations (Peischl et al., 2015).
By preventing the breakage of complexes of co-adaptive alleles, SVs can maintain their overall fitness effect, thus promoting the spread of adaptive alleles both within species, resulting in parallel adaptation, and across species, when reproductive isolation barriers are still permeable (Battlay, Craig, et al., 2024; Jay et al., 2018; Nicolas et al., 2025; Westram et al., 2022).

3.4 SVs can alter genome organization and functionality in ways that SNPs cannot

Large SVs can alter the 3D organization of DNA, for example by altering the boundaries of topologically associating domains (TADs), which help to segregate interacting sequence features such as target genes and their cis-regulatory sequences (Fig. 1d; Spielmann et al., 2018). More generally, SVs may alter chromatin structure, changing gene accessibility to transcription factors and/or RNA polymerase, thus affecting gene regulation without impacting the gene sequence itself (Bourque et al., 2018; Kim et al., 2019). The role of SVs in chromatin accessibility can be examined using ATAC-seq (Buenrostro et al., 2013). Although most applications of this technique have focused on disease and developmental studies, its use in evolutionary biology is growing. For example, Ruggieri et al. (2022) found that SVs accounted for approximately 30% of regions with altered chromatin accessibility across several Heliconius butterfly species. Many of these SVs corresponded to transposable element (TE) insertions from different TE families among species and were often located near gene transcription start sites. Another method of interrogating DNA for changes in organisation is chromosome conformation capture, which investigates 3D interactions between separate regions of the genome, and has been used to demonstrate changes in 3D structure and TAD boundaries, which in turn impact the position of chromosomes during meiosis, encouraging the formation of fusion SVs (Vara et al., 2021).
In addition to large-scale 3D organisation changes, even small SVs may cause alterations to genome functionality through gene expression changes via alterations to regulatory or coding regions. Here, SVs that are TEs play an important role due to their unique properties. They may in fact confer change because the host genome's defence mechanisms may modify DNA structure or methylation, which may in turn alter expression in off-target regions (Klein & O’Neill, 2018). Because of the arms race between TEs and their host genomes, admixture of genetically distinct lineages can lead to ‘transcriptome shock,’ resulting in deregulation of gene expression in hybrids driven by the reactivation of silenced TEs, thus showing a potential role of TEs in the reinforcement of reproductive barriers between diverging taxa (Dion-Côté et al., 2014). These genome defences against TE invasions may also be disrupted through stress, for example from a change in selection regime, which weakens regulation, triggers bursts of activity, and potentially causes genetic innovation (Capy et al., 2000; Stapley et al., 2015; Klein & O’Neill, 2018). Because TEs carry sequence motifs capable of driving transcription within the host genome (Bourque et al., 2018), TE-derived sequences may also elicit functional change by undergoing co-option, a process in which TE-derived DNA sequences are repurposed by the host genome in response to selection, potentially creating new genes or reactivating previously inactive genes (Jangam et al., 2017). Practically, the evolutionary impact of TE variation can be evaluated by combining population genomics with RNA-seq data to identify TE families with high transcription rates (Jin et al., 2015) or identify TE-derived genes resulting from co-option events (Oliveira et al., 2023) (Fig. 1h).
Through such mechanisms, SVs, especially TEs, can shape both evolution and plasticity through the generation of functional variation throughout the genome (e.g., Catlin et al., 2025).