Stacks parameter optimization

The protocol of Paris et al. (2017) was followed to identify the optimal parameters for the de novo analysis. This optimization is particularly important here because the data set includes two different species, whose sequences are expected to be more variable than those of samples from a single species. To capture as much of the genetic diversity of the full sample as possible, a subset of one randomly selected individual from each population was used. The Stacks pipeline was run several times on this reduced data set, varying only one parameter per run. The first run used the default parameters: M, the maximum distance (in nucleotides) allowed between stacks (default 2); m, the minimum depth of coverage required to create a stack (default 3); and N, the maximum distance allowed when aligning secondary reads to primary stacks (default M + 2). The ustacks parameters were then increased, m from 3 to 4 (m3-m4) and M from 2 to 5 (M2-M5). In addition, the cstacks parameter n (the number of mismatches allowed between sample loci when building the catalog) was tested from 1 to 5 (n1-n5). In each series, all parameters other than the one being varied were held at their default values (m3, M2, and n1). Parameter values were chosen to maximize the number of recovered polymorphic loci: for each parameter, the value selected was the one beyond which further increases yielded no additional polymorphic loci. To incorporate interspecific polymorphisms, the populations parameter -R was set to 90%, so that only SNPs present in at least 90% of the samples, including at least one A. alalia individual, were kept.
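The one-parameter-at-a-time design and the selection rule described above can be summarized in a short sketch. This is a minimal Python illustration under stated assumptions, not part of Stacks or of the original protocol: the names `sweep_settings`, `select_plateau`, and `keep_snp` are hypothetical, the loci counts passed to `select_plateau` would have to come from actual Stacks runs (none are executed here), and `select_plateau` interprets "further increases yielded no additional polymorphic loci" as the first tested value whose next increase does not raise the count. `keep_snp` only restates the retention criterion in code; it is not how the populations program implements -R.

```python
from typing import Dict, Iterator, Mapping, Set, Tuple

# Defaults used as the baseline here: ustacks -m 3, ustacks -M 2, cstacks -n 1
DEFAULTS: Dict[str, int] = {"m": 3, "M": 2, "n": 1}

# Ranges tested in the text: m3-m4, M2-M5, n1-n5 (range end is exclusive)
SWEEPS: Dict[str, range] = {"m": range(3, 5), "M": range(2, 6), "n": range(1, 6)}


def sweep_settings(defaults: Mapping[str, int],
                   sweeps: Mapping[str, range]) -> Iterator[Tuple[str, Dict[str, int]]]:
    """Yield (varied parameter, settings) per run; only one parameter differs from the defaults.

    Note: the all-default run (m3, M2, n1) appears once at the start of each series.
    """
    for name, values in sweeps.items():
        for value in values:
            settings = dict(defaults)
            settings[name] = value
            # N is left at its default, M + 2
            settings["N"] = settings["M"] + 2
            yield name, settings


def select_plateau(loci_by_value: Mapping[int, int]) -> int:
    """Return the first tested value beyond which the next increase adds no polymorphic loci."""
    values = sorted(loci_by_value)
    for current, following in zip(values, values[1:]):
        if loci_by_value[following] <= loci_by_value[current]:
            return current
    # No plateau within the tested range: keep the largest value tested
    return values[-1]


def keep_snp(samples_with_snp: Set[str], all_samples: Set[str],
             species_of: Mapping[str, str], min_fraction: float = 0.9,
             required_species: str = "A. alalia") -> bool:
    """Hypothetical restatement of the retention rule: the SNP is present in at least
    min_fraction of all samples and in at least one individual of required_species."""
    fraction = len(samples_with_snp & all_samples) / len(all_samples)
    has_required = any(species_of[s] == required_species for s in samples_with_snp)
    return fraction >= min_fraction and has_required
```

With the tested ranges, `sweep_settings` produces 11 settings (2 for m, 4 for M, 5 for n). Given toy counts such as `{1: 10, 2: 14, 3: 14, 4: 14}`, `select_plateau` returns 2, the smallest value after which the count stops rising. Treating a drop in loci the same as no gain is a design choice of this sketch; the original protocol only mentions the case where counts stay the same.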