2.3 RAD data processing, SNP calling and filtering
Reads from the six libraries were demultiplexed
using process_radtags (Stacks v1.13) (Catchen, Amores,
Hohenlohe, Cresko, & Postlethwait, 2011). Quality assessment and
filtering were performed using FASTQC (Andrews, 2012) and low-quality
terminal sites were removed using Fastx_trimmer to produce reads
of exactly 87 bp. A total number of 55,043,269 reads were aligned using
the bwa mem algorithms (Li & Durbin, 2009) to the updated
reference genome of C. capitata (NW_019376232.1) (Papanicolaou
et al., 2016). Finally, the bam files were sorted and indexed using
Samtools (Li et al., 2009).
The STACKS program v2.2 (Rochette, Rivera-Colon, & Catchen, 2019) was
used for SNP detection and genotype calling under the Marukilow model
using default parameters, except for the minimum mapping quality
(–min-mapq 20), which resulted in 35,345 loci from 42,8333,799
remaining reads for a mean coverage of 78.2x.
The populations program of the STACKS v2.2 package was applied to
retain only loci with SNPs present in at least 80% of the individuals
in a specific population (r = 0.8) and the
option order_export was activated because the reads were mapped
to a reference genome. The dataset was filtered by selecting the most
informative SNP per locus based on the options write-single-snps ,
to reduce the probability of linkage disequilibrium among loci in the
further analyses.
Deviation from Hardy-Weinberg Equilibrium (HWE) for each locus was
examined using populations, and those loci that deviated from HWE
at a significance level of 0.005 were removed. Furthermore, we
identified outlier SNPs using ARLEQUIN v.3.5.2.1 (Excoffier & Lischer,
2010) and BAYESCAN v2.1 (Foll & Gaggiotti, 2008). For ARLEQUIN, we used
the “no hierarchical island model”, 100,000 simulations and 1,000
demes. The p- values from this analysis were adjusted for each
locus using the FDR method to calculate the q- values in R
(Team, 2020). As for BAYESCAN we ran the analysis with 10,000
output iterations and 100 prior odds. The final putative neutral SNPs
dataset consisted of 1,907 SNPs in 92 individuals (Table 1).