2.9 Tandem Duplications
Tandemly duplicated genes were identified using all-vs-all blastp searches within each genome for biotic stress-associated intervals (Table S4). Gene pairs with blastp E-values < 1e-180 that were located within 500 kb of one another were considered to be recent tandem duplications. The window size was chosen by testing a range of values and selecting the size at which the number of newly discovered tandem duplicates began to decline (Figure 3). The stringent E-value cutoff was intended to focus the analysis on genes that were recently duplicated and are therefore potentially differentially duplicated between the species. The QTL intervals were tested for significant enrichment of tandem duplicates using a Monte Carlo simulation. For each QTL interval, a set of contiguous genes equal in number to the genes in that interval was randomly selected from the whole genome, and the number of tandem duplicates in the sampled set was counted. This sampling was repeated 10,000 times, and the observed number of tandem duplicates was compared to the simulated distribution to derive an empirical P-value.
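The two steps described above can be illustrated with a minimal Python sketch. It is not the code used in this study, and several details are assumptions made only for illustration: the function names `tandem_duplicates` and `enrichment_p`, the input layout (gene tuples and parsed blastp hits), measuring distance between gene start coordinates, treating the genome as a single ordered gene list that ignores chromosome boundaries, a one-sided test, and the add-one correction in the empirical P-value.

```python
import random
from typing import Iterable, List, Set, Tuple

Gene = Tuple[str, str, int]   # (gene_id, chromosome, start in bp); assumed layout
Hit = Tuple[str, str, float]  # (query_id, subject_id, blastp E-value); assumed layout


def tandem_duplicates(genes: Iterable[Gene], hits: Iterable[Hit],
                      max_evalue: float = 1e-180,
                      window: int = 500_000) -> Set[str]:
    """Return IDs of genes with a strong hit to a nearby gene on the same chromosome."""
    pos = {gid: (chrom, start) for gid, chrom, start in genes}
    dups: Set[str] = set()
    for query, subject, evalue in hits:
        # Skip self-hits, weak hits, and hits to genes without coordinates.
        if query == subject or evalue >= max_evalue:
            continue
        if query not in pos or subject not in pos:
            continue
        (chrom_q, start_q), (chrom_s, start_s) = pos[query], pos[subject]
        # Assumption: distance is measured between gene start coordinates.
        if chrom_q == chrom_s and abs(start_q - start_s) <= window:
            dups.update((query, subject))
    return dups


def enrichment_p(ordered_ids: List[str], dup_ids: Set[str],
                 interval_size: int, observed: int,
                 n_iter: int = 10_000, seed: int = 0) -> float:
    """One-sided empirical P-value for tandem duplicate enrichment in one interval."""
    if not 0 < interval_size <= len(ordered_ids):
        raise ValueError("interval_size must be between 1 and the number of genes")
    # Prefix sums let each sampled block of contiguous genes be counted in O(1).
    prefix = [0]
    for gid in ordered_ids:
        prefix.append(prefix[-1] + (gid in dup_ids))
    rng = random.Random(seed)
    last_start = len(ordered_ids) - interval_size
    at_least_observed = 0
    for _ in range(n_iter):
        i = rng.randint(0, last_start)  # random block of contiguous genes
        if prefix[i + interval_size] - prefix[i] >= observed:
            at_least_observed += 1
    # Assumption: add-one correction so the estimate is never exactly zero.
    return (at_least_observed + 1) / (n_iter + 1)
```

Under these assumptions, `tandem_duplicates` would be run once on the whole genome, and `enrichment_p` would then be called once per QTL interval with that interval's gene count and observed number of tandem duplicates. A sampler that respects chromosome boundaries would be a natural refinement, since the source does not say how blocks spanning two chromosomes were handled.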