a. WGA test dataset
Samples treated with WGA prior to GBS library preparation yielded more raw sequence reads than the non-WGA samples, but these reads were not evenly distributed across individuals (Table S1). The WGA test dataset (24 individuals sequenced with and without WGA = 48 sub-libraries) produced a total of 445.5 million raw sequence reads; 57.2 million reads were attributed to the non-WGA-treated samples and the remaining 388.4 million to the WGA-treated samples (Appendix 1: Table A1). After quality filtering, the number of retained reads dropped to 8.7 million and 80.4 million, respectively. Approximately 68% of the total sequencing reads were discarded during quality filtering due to adapter contamination, while only 2.2% of the total reads were discarded due to low quality. Within the WGA treatment, 8 of the 24 samples accounted for approximately 80% of the raw sequence reads (min: 21.6 million, max: 70.3 million, mean: 38.9 million, Appendix 1: Table A2). The remaining 16 samples contained markedly fewer raw sequence reads (min: 2.7 million, max: 8.7 million, mean: 4.8 million). Although raw reads were more evenly distributed across the non-WGA samples, the same proportion of samples (8 of 24) still contained the majority (55%) of the non-WGA raw reads (min: 2.9 million, max: 6 million, mean: 3.9 million, Appendix 1: Table A2), and 5 of these highly sequenced individuals were the same in both treatments.
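The proportions quoted above can be recovered from the reported totals. The short Python sketch below is only an arithmetic check on the figures given in the text (read counts in millions); the variable names are illustrative, and the top-8 share is approximated as 8 times the reported mean for those samples rather than taken from the per-sample counts in Table A2.

```python
# Read counts in millions, as reported in the text.
raw = {"non_wga": 57.2, "wga": 388.4}
retained = {"non_wga": 8.7, "wga": 80.4}

# Fraction of raw reads retained after quality filtering, per treatment.
retained_frac = {k: retained[k] / raw[k] for k in raw}

# Share of each treatment's raw reads held by its 8 most deeply sequenced
# samples, approximated as 8 x the reported mean for those samples.
top8_mean = {"non_wga": 3.9, "wga": 38.9}
top8_share = {k: 8 * top8_mean[k] / raw[k] for k in raw}

for k in raw:
    print(f"{k}: retained {retained_frac[k]:.1%}, top-8 share {top8_share[k]:.1%}")
```

The top-8 shares computed this way agree with the approximately 80% (WGA) and 55% (non-WGA) stated above.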
Next, we assessed the number of invariant loci, polymorphic loci, and SNPs for each tested value of M and n using the 48 libraries in the WGA test dataset (24 with WGA and 24 without). Following Paris et al. (2017), we chose parameter values for M and n that optimized both the number of polymorphic loci and SNPs; for both the WGA and non-WGA treatments, these values were maximized at M = 2 and n = 2. In the resulting dataset, we observed large differences in the number of polymorphic loci, SNPs, and overall read depth between the two treatments. The non-WGA samples had more than twice as many loci and SNPs as the samples treated with WGA, and their mean depth of coverage was approximately 30% of that of the WGA samples (Appendix 1: Table A3). However, the mean number of SNPs per locus (non-WGA = 2.4, WGA = 2.1, Table A3) and observed heterozygosity (non-WGA = 0.15, WGA = 0.13, Table A3) were similar between treatments. Additionally, pairwise FST values between the WGA and non-WGA sequences for each population were zero (Appendix 1: Table A4), and a PCA of this dataset clustered libraries by sample, not by WGA treatment (Appendix 1: Fig. A1).
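The parameter choice described above amounts to picking the (M, n) combination that maximizes the locus and SNP counts. A minimal Python sketch of that selection step follows. The function name `pick_parameters` and the input layout are hypothetical, and ranking by polymorphic loci first with SNP count as a tie-breaker is an assumption made for illustration, not necessarily the exact criterion applied here. No counts are filled in because the per-parameter values are not reported in this section.

```python
from typing import Dict, Tuple

# (M, n) -> (number of polymorphic loci, number of SNPs)
Counts = Dict[Tuple[int, int], Tuple[int, int]]


def pick_parameters(counts: Counts) -> Tuple[int, int]:
    """Return the (M, n) pair with the most polymorphic loci.

    Ties on polymorphic loci are broken by SNP count. This ordering is an
    illustrative assumption, not a documented rule from the study.
    """
    if not counts:
        raise ValueError("no parameter combinations supplied")
    # Tuples compare lexicographically: loci first, then SNPs.
    return max(counts, key=lambda mn: counts[mn])
```

Applied separately to the counts for each treatment, a selection of this kind would return (2, 2) for both the WGA and non-WGA libraries if both counts peak at that combination, as reported above.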