2.4 Population structure analysis
Because linkage disequilibrium (LD) can bias the inference of population
structure, the diploid SNPs were first pruned to pairwise
LD (r²) < 0.2 using PLINK 1.9 (Chang
et al., 2015) with the parameter ‘--indep-pairwise 100 10 0.2’ (a window of 100 SNPs, a step of 10 SNPs and an r² threshold of 0.2). To
analyze data of diploids and triploids together, the pruned diploid SNPs
were then compared with the triploid SNPs using the isec function in
BCFtools (Danecek et al., 2021), and the intersection of SNPs was used
for population structure analysis. Three methods were used to infer
population genetic structure: principal component analysis
(PCA), structure analysis and genetic distance analysis. PCA
was applied to diploids and triploids separately and to all
samples together, whereas the other two methods were applied to diploids and
triploids separately.
For PCA, the genotype at each locus was first converted into the
frequency of the reference allele, that is, 0/0.5/1 for diploids and
0/0.33/0.67/1 for triploids. PCA was then performed using the R
built-in function prcomp with default parameters. The STRUCTURE
software version 2.3.4 (Pritchard et al., 2000) was used for genetic
structure analysis, with five runs for each value of K
from 1 to 12. The optimal K, which indicates the most likely
number of genetic clusters, was determined using the method
of Evanno et al. (2005). For genetic
distance analysis, identity-by-state (IBS), which describes the
genetic relationship among individuals, was calculated using a custom R
script. Minimum-evolution phylogenetic trees were constructed from
the genetic distance matrix of 1-IBS values using the FastME program
(Lefort et al., 2015) and visualized using the online tool iTOL
(http://itol.embl.de) (Letunic & Bork, 2019).
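The frequency encoding, PCA and 1-IBS distance described above can be illustrated with a minimal sketch. It is written in Python with NumPy purely for illustration; the analyses in this study were run in R (prcomp and a custom script), and the code below is not that script. The function names and the toy genotype counts are hypothetical. The PCA step centers but does not scale each locus, which matches the prcomp defaults. The distance function assumes that 1-IBS between two individuals equals the mean absolute difference in reference-allele frequency across loci; this is the usual definition for diploids, and extending it to triploids is an assumption. Missing genotypes are not handled.

```python
import numpy as np


def ref_allele_frequency(ref_counts, ploidy):
    """Convert reference-allele counts (samples x loci) to frequencies.

    A diploid yields 0, 0.5 or 1; a triploid yields 0, 1/3, 2/3 or 1.
    """
    counts = np.asarray(ref_counts, dtype=float)
    ploidy = np.asarray(ploidy, dtype=float).reshape(-1, 1)
    return counts / ploidy


def pca_scores(freq, n_components=2):
    """PCA by SVD on column-centered, unscaled data."""
    centered = freq - freq.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                      # sample coordinates on each PC
    explained = s**2 / np.sum(s**2)     # proportion of variance per PC
    return scores[:, :n_components], explained[:n_components]


def one_minus_ibs(freq):
    """Pairwise distance: mean absolute frequency difference over loci.

    Assumed definition of 1-IBS (see the note above).
    """
    diff = np.abs(freq[:, None, :] - freq[None, :, :])
    return diff.mean(axis=2)


# Hypothetical toy data: two diploids and two triploids at four loci,
# given as counts of the reference allele.
ref_counts = [
    [2, 0, 1, 2],
    [0, 2, 1, 0],
    [3, 0, 2, 3],
    [0, 3, 1, 1],
]
ploidy = [2, 2, 3, 3]

freq = ref_allele_frequency(ref_counts, ploidy)
scores, explained = pca_scores(freq)
dist = one_minus_ibs(freq)
```

Dividing by ploidy places diploid and triploid genotypes on a common 0 to 1 scale, which is what allows the two groups to be analyzed in one PCA. A symmetric matrix such as `dist` is the kind of input a distance-based tree builder expects.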
Pairwise genetic differentiation (FST) was estimated and tested
for significance for invasive populations and regionally
defined genetic clusters of native populations using the R package StAMPP (Pembleton et al., 2013). Additionally, genetic
differentiation among populations was analyzed by analysis of
molecular variance (AMOVA) with 100 permutations, in which the variance
components were partitioned between regions (invasive and source
ranges), among populations within regions, and within populations.