2.7 Gene set overlap analysis
In addition to the multivariate analysis, which was expected to reveal
broad genome-wide parallelism (in genetic divergence or gene expression)
or tradeoffs (in gene expression only), we used a simpler approach
based on counting the outlier genes shared between two or more
contrasts, as well as the contrast-specific outliers. To
determine how many genes would be expected in each set by chance alone,
we generated null distributions from 5,000 permutations of the test statistic.
We calculated the p-value for observing a given number of genes in an
overlapping or contrast-specific gene set as the fraction of permutations
yielding an equal or more extreme number of genes in the same set,
multiplied by two because the test is two-tailed.
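The permutation test described above can be illustrated with a minimal Python sketch. The exact permutation scheme is not specified here, so the code assumes a hypothetical one: in each permutation, every contrast receives a random outlier set of its observed size, drawn from the same gene universe, and the number of genes shared by all contrasts is recorded. The two-tailed p-value doubles the smaller tail fraction; capping it at 1 is an added assumption. The function names null_shared_counts and two_tailed_p are illustrative only.

```python
import random


def null_shared_counts(universe, set_sizes, n_perm=5000, seed=1):
    """Null distribution of the number of genes shared by all contrasts.

    Hypothetical scheme (an assumption): each permutation draws, for every
    contrast, a random outlier set of the observed size from the same
    gene universe, then counts the genes common to all drawn sets.
    """
    rng = random.Random(seed)
    genes = sorted(universe)
    null = []
    for _ in range(n_perm):
        draws = [set(rng.sample(genes, k)) for k in set_sizes]
        null.append(len(set.intersection(*draws)))
    return null


def two_tailed_p(observed, null):
    """Empirical two-tailed p-value: twice the smaller tail fraction.

    Each tail counts permutations at least as extreme as the observed
    value (ties included); the result is capped at 1 (an assumption).
    """
    n = len(null)
    upper = sum(x >= observed for x in null) / n
    lower = sum(x <= observed for x in null) / n
    return min(1.0, 2 * min(upper, lower))


# Usage: 1,000 genes, two contrasts with 250 outliers each (top 25%).
null = null_shared_counts(range(1000), [250, 250])
p = two_tailed_p(90, null)  # observed overlap of 90 genes
```

With independent random draws the expected overlap here is 250 × 250 / 1000 = 62.5 genes, so an observed overlap of 90 falls far in the upper tail. Counts of contrast-specific genes could be tested the same way by recording a different statistic in each permutation.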
The genes representing each contrast were those in the top 25% quantile
of genetic divergence (FST), or in the top and bottom
12.5% quantiles of log-fold change in gene expression. We calculated
gene set overlaps with the function venn in the R package gplots and visualized the results as UpSet plots using the R
package UpSetR (Conway et al., 2017).
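The outlier definitions and overlap counting can be sketched in Python as follows. This is not the gplots or UpSetR code. It is an illustrative equivalent under stated assumptions: quantile cut-offs are applied by rank (ties are broken arbitrarily), and each gene is assigned to exactly one combination of contrasts, similar to the region counts of a Venn diagram. The helper names fst_outliers, logfc_outliers and exclusive_intersections are hypothetical.

```python
from collections import Counter


def fst_outliers(fst, frac=0.25):
    """Genes in the top `frac` of FST values (dict: gene -> FST)."""
    n = round(len(fst) * frac)
    return set(sorted(fst, key=fst.get, reverse=True)[:n])


def logfc_outliers(logfc, tail=0.125):
    """Genes in the bottom and top `tail` of log-fold changes."""
    n = round(len(logfc) * tail)
    ranked = sorted(logfc, key=logfc.get)  # ascending log-fold change
    return set(ranked[:n]) | set(ranked[len(ranked) - n:])


def exclusive_intersections(outlier_sets):
    """Count genes by the exact combination of contrasts they are outliers in.

    A gene that is an outlier in contrasts A and B is counted only under
    {A, B}, not under {A} or {B}, so every gene is counted once.
    """
    membership = {}
    for name, genes in outlier_sets.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())


# Usage with eight hypothetical genes and two contrasts.
fst = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.2,
       "e": 0.05, "f": 0.3, "g": 0.15, "h": 0.12}
logfc = {"a": -3.0, "b": 0.1, "c": 4.0, "d": 0.2,
         "e": -0.1, "f": 0.3, "g": 0.0, "h": 0.5}
sets = {"FST": fst_outliers(fst), "expr": logfc_outliers(logfc)}
counts = exclusive_intersections(sets)
# "a" is shared, "b" is FST-specific and "c" is expression-specific.
```

Counting by exact combination keeps shared and contrast-specific genes disjoint, which is the kind of tally an UpSet plot displays as intersection bars.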