Microevolutionary analyses
Variant calling. Assessments of population genetic diversity and population structure are based on ORF targets without intronic/intergenic flanking regions, and UCEs with flanking regions. Variant calling was performed by alignment of the raw reads against custom-built consensus sequences for ORFs and UCEs with their respective flanking regions using Bowtie2 on a sample-by-sample basis, after which .sam files were converted to .bam using SAMTools, modified with Picard v. 2.21.4 (available at http://broadinstitute.github.io/picard), and merged across individuals with SAMTools. These .bam files, one for ORFs, another for UCEs, were used to call and annotate single nucleotide polymorphisms (SNPs) with GATK v. 4.1.9.0 (McKenna et al., 2010) using HaplotypeCaller. Individual GVCF files were merged per marker type and subjected to joint genotyping to obtain a .vcf file with information on all sites, both variant and invariant. From the .vcf file for ORFs we extracted information on exonic sites using an EXONERATE-generated .gff annotation file. The resulting .vcf files were filtered with functions of VCFtools v. 0.1.16 (Danecek et al., 2011) and BCFtools v. 1.12 (Danecek et al., 2021) depending on downstream analyses as described below.
Nucleotide diversity and population differentiation. For subsequent analyses of nucleotide diversity and population differentiation, the .vcf files for ORFs and UCEs were individually filtered to retain sites with a mean genotype depth of 8 that have been successfully genotyped in 80% of the individuals. Because invariant sites do not have quality scores when using condensed non-variant blocks, we created individual .vcf files for variant and invariant sites. Invariant sites were identified by setting the minor allele frequency to zero (–max-maf), whereas variant sites have a minor allele count ≥1 (–mac). Only variant sites with a minimum quality score of 30 (-minQ) were retained. Subsequently, we indexed both .vcf files with tabix of SAMtools and combined them with BCFtools to estimate nucleotide diversity (π ) within populations per ORF and UCE, the average, absolute nucleotide divergence between population pairs (DXY ) and population differentiation (FST ) using Pixy v. 1.0.4 (Korunes & Samuk, 2021). We also converted the filtered .vcf file for ORFs into a multifasta file with random (unphased) assignment of variants to the two alleles. Each multifasta file was aligned to its ORF target, after which reading frames were manually verified in SeaView. Gap-only sites in protein sequences were removed. The verified multifasta files were combined to calculate synonymous and non-synonymous nucleotide diversity (πS and πN, respectively) with dNdSpiNpiS v. 1.0 (available at http://kimura.univ-montp2.fr/PopPhyl).
Genetic structure. Analyses of genetic structure were undertaken for variant sites of ORFs only, which were obtained from the total .vcf file by filtering for a minimum allele count of 3, a minor allele frequency ≥0.05, a minimum quality score of 30 and a mean genotype depth of 8 for at least 80% of the individuals. Furthermore, indels and variants that were not bi-allelic were removed, as were sites with a Hardy-Weinberg p -value<0.001. Finally, we removed individuals for which >50% of the sites displayed missing data. The resulting .vcf file was converted to a .bed file for principal component analysis (PCA) using PLINK v. 1.90b6.18 (Purcell et al., 2007), and to examine population structure with fastSTRUCTURE (Raj et al., 2014) using K=1-6 with 30 independent runs per K. The underlying genetic structure and the appropriate number of clusters were examined with the ΔK method (Evanno et al., 2005) and others that are more robust when sampling is uneven (Puechmaille, 2016), as implemented in StructureSelector (Y.-L. Li & Liu, 2018).