a. WGA test dataset
Samples treated with WGA prior to GBS library preparation had more raw
sequence reads than the non-WGA samples; however, these reads were not
evenly distributed across individuals (Table S1). The WGA test dataset
(24 individuals sequenced with and without WGA, for a total of 48
sub-libraries) produced 445.5 million raw sequence reads: 57.2 million
from the non-WGA sequences and the remaining 388.4 million from the WGA
sequences (Appendix 1: Table A1). After quality filtering, the number of
retained reads dropped to 8.7 million and 80.4 million, respectively.
Approximately 68% of the total sequencing reads were discarded during
quality filtering due to adapter contamination, whereas only 2.2% were
discarded due to low quality. Across samples, 8 of the 24 WGA samples
accounted for approximately 80% of the WGA raw sequence reads (min: 21.6
million, max: 70.3 million, mean: 38.9 million; Appendix 1: Table A2).
The remaining 16 samples contained markedly fewer raw sequence reads
(min: 2.7 million, max: 8.7 million, mean: 4.8 million). Although raw
reads were more evenly distributed among the non-WGA samples, the same
proportion of samples (8 of 24) still contained the majority (55%) of
the non-WGA raw reads (min: 2.9 million, max: 6 million, mean: 3.9
million; Appendix 1: Table A2), and 5 of these highly sequenced
individuals were the same in both treatments.
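As a rough consistency check on the read-distribution figures above, the share of reads held by the most-sequenced samples can be recomputed. The sketch below is illustrative only: `top_k_share` and `summarize` are hypothetical helper names, and because the per-sample counts are in Appendix 1: Table A2 rather than in this section, the check uses only the reported means and totals (in millions of reads), so the results are approximate.

```python
def top_k_share(read_counts, k):
    """Fraction of all reads contributed by the k most-sequenced samples."""
    if not 0 <= k <= len(read_counts):
        raise ValueError("k must be between 0 and the number of samples")
    ranked = sorted(read_counts, reverse=True)
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


def summarize(read_counts):
    """Min, max, and mean of a list of per-sample read counts."""
    return {
        "min": min(read_counts),
        "max": max(read_counts),
        "mean": sum(read_counts) / len(read_counts),
    }


# Approximate check from the reported aggregates (millions of reads):
# share held by the top 8 samples = 8 * (mean of those 8) / treatment total.
wga_share = 8 * 38.9 / 388.4       # WGA: about 0.80
non_wga_share = 8 * 3.9 / 57.2     # non-WGA: about 0.55
retained = (8.7 + 80.4) / 445.5    # fraction of all raw reads kept after filtering
print(round(wga_share, 3), round(non_wga_share, 3), round(retained, 3))
```

With the per-sample counts from Table A2, `top_k_share(counts, 8)` would give the exact share rather than this approximation from rounded means.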
Next, we assessed the number of invariant loci, polymorphic loci, and
SNPs for each tested value of M and n using the 48 libraries in the WGA
test dataset (24 with WGA and 24 without). Following Paris et al.
(2017), we chose values of M and n that optimized both the number of
polymorphic loci and SNPs; for both the WGA and non-WGA treatments,
these were maximized at M = 2 and n = 2. In the resulting dataset, we observed
large differences in the number of polymorphic loci, SNPs, and overall
read depth between the two treatments. The non-WGA samples had more than
twice as many loci and SNPs as the samples treated with WGA, and their
mean depth of coverage was approximately 30% of that of the WGA samples
(Appendix 1: Table A3). However, the mean number of SNPs per locus
(non-WGA = 2.4, WGA = 2.1, Table A3) and observed heterozygosity
(non-WGA = 0.15, WGA = 0.13, Table A3) were similar between treatments.
Additionally, pairwise FST values between the WGA and non-WGA sequences
for each population were zero (Appendix 1: Table A4), and a PCA of this
dataset clustered libraries by sample rather than by WGA treatment
(Appendix 1: Fig. A1).
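The heterozygosity and FST comparisons above can be illustrated with a minimal sketch. It assumes biallelic SNPs coded as 0, 1, or 2 copies of the alternate allele, with a hypothetical `MISSING = -1` code, and it uses a simple Nei-style FST, (H_T - H_S) / H_T, without sample-size correction; this is not necessarily the estimator used to produce Appendix 1: Table A4. Under this definition H_T - H_S equals (p1 - p2)^2 / 2, so two groups with identical allele frequencies (for example, WGA and non-WGA libraries from the same individuals) yield an FST of zero.

```python
from statistics import mean

MISSING = -1  # hypothetical code for a missing genotype call


def observed_heterozygosity(genotypes):
    """Mean per-individual proportion of heterozygous calls.

    genotypes: one list per individual; each entry is 0, 1, or 2 copies of
    the alternate allele at a biallelic SNP, or MISSING. Individuals with no
    called genotypes are skipped.
    """
    per_individual = []
    for row in genotypes:
        called = [g for g in row if g != MISSING]
        if called:
            per_individual.append(sum(g == 1 for g in called) / len(called))
    return mean(per_individual)


def _expected_het(p):
    """Expected heterozygosity 2p(1 - p) for allele frequency p."""
    return 2 * p * (1 - p)


def nei_fst(p1, p2):
    """Nei-style FST for one SNP from alternate-allele frequencies in two groups."""
    h_t = _expected_het((p1 + p2) / 2)
    h_s = (_expected_het(p1) + _expected_het(p2)) / 2
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def multilocus_fst(freqs1, freqs2):
    """Combine SNPs as a ratio of sums: sum(H_T - H_S) / sum(H_T)."""
    num = den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        h_t = _expected_het((p1 + p2) / 2)
        h_s = (_expected_het(p1) + _expected_het(p2)) / 2
        num += h_t - h_s
        den += h_t
    return num / den if den else 0.0
```

Combining SNPs as a ratio of sums, rather than averaging per-SNP values, keeps loci with very low diversity from dominating the genome-wide estimate.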