Genome-wide association mapping for dewlap traits
The genome-wide association study (GWAS) was based on all LD-filtered markers. Prior to GWAS, missing data was imputed using BEAGLE v5.0 (Browning & Browning, 2016) at default settings. To investigate correlations among the 11 traits that capture dewlap size (i.e., area and perimeter) and color (i.e., hue, total brightness, mean brightness, UV chroma, red chroma, yellow chroma, and composition of red, orange, or yellow), we first used the ggcorrplot R package (v. 0.1.3; Kassambara, 2019). The GWAS was then performed using two complementary approaches, which we refer to as the “standard GWAS” and the “ancestry-specific GWAS”. The standard GWAS consisted of a linear mixed model implemented in GEMMA (Zhou & Stephens, 2012). We first used GEMMA to calculate 10 relatedness matrices, by excluding SNPs from one chromosome at a time (i.e., each matrix was based on SNPs from 9 chromosomes). The GWAS for markers on a given chromosome was then performed using the relatedness matrix that excluded markers on that same chromosome. This approach should improve power to detect associations, given that the relatedness matrix does not include markers that are in linkage disequilibrium with the marker being tested (Cheng et al., 2013). We then obtained P values from the Wald test as outputted by GEMMA.
The ancestry-specific GWAS was conducted in asaMap (Skotte et al., 2019). To control for the confounding effect of population structure, we relied on the first 10 PCs from a PCA computed using the adegenetR package (v. 2.1.1; Jombart & Ahmed, 2011), which were included as covariates in the asaMap analysis. The PCA was based on all LD-filtered autosomal SNPs. The asaMap software performs eight association tests, with and without different ancestry-specific effects considered for two ancestral populations (Skotte et al., 2019). Following asaMap manual recommendations, we estimated genome-wide ancestry proportions of each sample, as well as SNP allele frequencies for each ancestral population using ADMIXTURE v. 1.3.0 (Alexander et al., 2009). For each dewlap trait, we obtained genome-wide P values for all tests and selected the test with the strongest association (i.e., the lowest observed P value among all tested SNPs), following Bock et al., (2021).
For both the standard and the ancestry-specific GWAS results, we used the GenABEL R package (Aulchenko et al., 2007) to calculate the lambda genomic inflation factor (λ), and account for any remainingP value inflation, which can be caused by unaccounted population stratification. Genome-wide significance thresholds were then set using the Bonferroni method (0.05/total SNPs used for the association test), which is the most conservative option used in association studies (e.g., Duggal et al., 2008; Kaler & Purcell, 2019). We also considered a suggestive genome-wide significance threshold (1/total SNPs used for the association test; Duggal et al., 2008), for which one false positive per GWAS is expected. To visualize GWAS results, we used Manhattan plots, made in R v.4.1.0 using publicly available scripts (https://danielroelfs.com/blog/how-i-create-manhattan-plots-using-ggplot/). Finally, to estimate the effect size of associations, we build linear models in R. These models had number of alternative alleles (i.e., alleles different from those incorporated in the reference genome) at the lead GWA SNP as the predictor variable, and the corresponding dewlap trait as the response variable. Effect sizes were then estimated as R2 values, using the “summary” function in R.
To better understand the observed effect of ancestry on dewlap traits, we next focused on results of the ancestry-specific GWAS. Specifically, we investigated how dewlap traits vary among genotype classes for ancestry-specific associations that passed the Bonferroni or suggestive significance thresholds. For this, we stratified the invasive A. sagrei samples based on ancestry, using a cutoff of 70% Western Cuba ancestry, as inferred using the STRUCTURE analysis. Because of the genetic makeup of invasive A. sagrei populations across the southeastern United States, this approach partitions samples into two groups, one with limited hybridization (i.e., Western Cuba ancestry samples) and another with frequent hybrids for which Central-eastern Cuba ancestry predominates (see also Bock et al., 2021).
Identifying the signature of local adaptation for