Figure captions
Figure 1. Schematic representation of our workflow, indicating the main steps of the analytical pipeline. The four main phases are shown in different colors: transcriptomic steps for the selection of open reading frames (ORFs, blue), comparative genomics for the selection of ultraconserved elements (UCEs, dark purple), probe design and the sequencing of genomic libraries (light purple), and analyses after the processing of sequencing data (red).
Figure 2. Percentage recovery of the total length of 1,114 ORFs for 96 sequenced taxa and an outgroup transcriptome. The total size of ORFs, on average ~1,500 nt, and the number of filtered reads per sample (in millions) are also indicated. The red line for the number of reads indicates a cut-off value of 500,000 reads. Samples with fewer reads displayed a drastic decrease in ORF recovery. A drastic decrease in recovery was also observed for the outgroup sample dna0240, which belongs to the family Iridinidae, even though it yielded 3,900,000 clean reads.
Figure 3. UCE recovery depending on combinations of coverage and identity as specified in the PHYLUCE pipeline. Boxplots indicate the contig ratio for all 96 individuals, i.e. the number of unique contigs divided by the maximum number of unique contigs per individual. The total number of UCEs recovered for 96 individuals varied between 1,905 (combination cov50 and id50-60) and 926 (combination cov80 and id80). Several scenarios in which coverage and identity are between 50 and 65% yielded similar results, whereas both the number of unique contigs and UCE recovery decrease substantially for scenarios with identity >70%.
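The contig ratio plotted in Figure 3 can be illustrated with a minimal sketch. It assumes that the maximum is taken per individual across all coverage/identity scenarios, which is our reading of the caption; the function name, scenario labels and counts are hypothetical and are not taken from the study or from PHYLUCE.

```python
def contig_ratio(unique_contigs_by_scenario):
    """Scale one individual's unique-contig counts by its own maximum.

    `unique_contigs_by_scenario` maps a scenario label (hypothetical labels
    such as "cov50_id50") to the number of unique contigs recovered for a
    single individual under that scenario.
    """
    best = max(unique_contigs_by_scenario.values())
    if best == 0:
        raise ValueError("no contigs recovered under any scenario")
    return {label: n / best for label, n in unique_contigs_by_scenario.items()}


# Made-up counts for one individual, for illustration only.
example_counts = {"cov50_id50": 200, "cov65_id65": 190, "cov80_id80": 100}
example_ratios = contig_ratio(example_counts)
```

Each boxplot would then summarize these ratios over all individuals for one scenario.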
Figure 4. Gene map of the maternally-inherited mitochondrial genome for Coelaturini from the Malawi Basin. Genes positioned on the 5’ to 3’ (positive/heavy) strand are indicated on the inner circle, whereas those on the 3’ to 5’ (negative/light) strand on the outer circle.
Figure 5. Maximum likelihood phylogeny of Coelaturini based on a concatenated dataset of 1,109 open reading frames (2,348,614 bp with 515,219 parsimony-informative sites; left), including exons and their intronic/intergenic flanking regions, and based on a concatenated dataset of 276 ultraconserved elements (119,105 bp with 11,001 parsimony-informative sites; right). Red nodes are fully supported by both a Shimodaira-Hasegawa approximate likelihood ratio test and ultrafast bootstrapping. Both datasets result in highly congruent topologies, which are fully or mainly supported for ORFs and UCEs, respectively. More details are provided in Figs. S3 and S4.
Figure 6. A) Square-root-transformed nucleotide diversity (π) within six populations of Coelaturini from the Malawi Basin as inferred from ORFs (without intronic/intergenic flanking regions). The nucleotide diversity averaged over all six populations is indicated with a dashed red line, whereas the mean pairwise sequence divergence for each of the 15 population pairs, i.e. the mean, square-root-transformed DXY value, is indicated with a blue line (the values are highly similar for all population pairs, so the lines overlap and appear as a single bold blue line). B) The density distribution of population pairwise FST values for 1,104 ORFs (one value per ORF) for two of the 15 population pairs (these distributions are representative of the other population pairs).
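For readers unfamiliar with the two statistics in panel A, the sketch below computes π as the mean proportion of differing sites over all sequence pairs within a population, and DXY as the same mean over all pairs drawn from two different populations. It is an illustration under simplifying assumptions (aligned sequences of equal length, no gaps or missing data) and not necessarily the procedure used for the figure; all function names and sequences are hypothetical.

```python
from itertools import combinations, product
from math import sqrt


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal-length, non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(seqs):
    """pi: mean pairwise difference per site within one population."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def dxy(pop_x, pop_y):
    """DXY: mean pairwise difference per site between two populations."""
    pairs = list(product(pop_x, pop_y))
    if not pairs:
        raise ValueError("both populations must contain sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy alignments, for illustration only (not data from the study).
pop_a = ["ACGT", "ACGA", "ACGT"]
pop_b = ["TCGT", "TCGT"]

pi_a = nucleotide_diversity(pop_a)
pi_b = nucleotide_diversity(pop_b)
dxy_ab = dxy(pop_a, pop_b)

# Square-root transformation, as applied to the values plotted in panel A.
sqrt_pi_a = sqrt(pi_a)
sqrt_dxy_ab = sqrt(dxy_ab)
```

In this toy case DXY exceeds π in both populations, the pattern expected when populations are differentiated.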
Figure 7. Structure of molecular diversity in Coelaturini from the Malawi Basin. A,B) Principal component analysis on genome-wide SNP data, with 95% convex hulls on sampling localities. A bathymetric map of Lake Malawi, showing its outflow and the studied populations, is provided in the inset. C) Bayesian clustering with fastSTRUCTURE on the same SNP dataset gave the strongest support for a four-cluster solution separating the northern and southern regions of the Malawi Basin, and additionally the populations of Likoma Island (MLW8_032) and the Shire River (MLW8_010).