Phylogeographic analysis

Admixture results for different numbers of clusters (K = 2 to 8) [Fig. S1] provide compelling information about the genetic structure of the 12 sites sampled. When using [K = 2], the sites from the Rio Negro and the Rio Solimões are strongly differentiated. Only CAT and JAR, two sites located downstream from the Rio Negro, show weak admixture with Rio Negro sites. When using [K = 3], the two sites located upstream on the Rio Negro (BAR and NEG) are differentiated, but still show some admixture with sites downstream (CEM and ANA). Nevertheless, BRA does not share posterior membership probability with BAR and NEG even though these sites are geographically close. When using [K = 4], sites located far upstream on the Rio Solimões (SOL and TEF) are differentiated from the rest of the Rio Solimões sites. However, PIR, a site even further upstream on the Rio Solimões, is not differentiated from the downstream lakes. Each subsequent increase of K up to eight led to the differentiation of a single new site: [K = 5] differentiates MAN, [K = 6] differentiates BRA, [K = 7] differentiates PIR and [K = 8] differentiates JAR.
According to the cross-validation error values from ADMIXTURE [Fig. S2], the optimal number of clusters for our genetic data is three. Nevertheless, the cross-validation error for three clusters (0.20966) is close to the one obtained with four clusters (0.20997). The “find.clusters” function from Adegenet led to a similar result, since its goodness-of-fit (BIC) values decreased more slowly at the fourth cluster [Fig. S3]. Additionally, the posterior membership probability plots [Fig. S1] stopped forming biologically significant clusters after [K = 4], differentiating only one sampling site at a time when adding more clusters. For this reason, we completed the phylogeographic analysis assuming that our full SNP dataset is represented by four genetic clusters.
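The closeness of the two cross-validation errors can be made explicit with a short calculation. This is only an illustrative sketch: it uses the two values reported in the text and nothing else, and the variable names are our own.

```python
# Cross-validation errors reported in the text for K = 3 and K = 4.
# Errors for the other K values are not given here, so they are left out.
cv_error = {3: 0.20966, 4: 0.20997}

# K with the lowest cross-validation error among the reported values.
best_k = min(cv_error, key=cv_error.get)

# Absolute and relative difference between the two candidate K values.
gap = cv_error[4] - cv_error[3]
relative_gap = gap / cv_error[best_k]

print(f"best K = {best_k}, gap = {gap:.5f}, relative gap = {relative_gap:.2%}")
```

The relative difference is well below 1 %, which is why the choice between three and four clusters relies on the additional criteria described above rather than on the cross-validation error alone.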
According to the pie chart map [Fig. 2], some sites that are close to each other show strong genetic differentiation from one another. For instance, there is a genetic gap, that is, a genetic distance disproportionate to the river course distance separating the sites, at the confluence of the Rio Negro and Rio Solimões. Additionally, there is a similar genetic gap between BRA and NEG. These genetic gaps are detectable in every neutral population structure analysis that we have produced: the pie chart map [Fig. 2], the high pairwise Fst values [Fig. 3] and the admixture barplots [Fig. S1]. These two genetic gaps involve sites that are separated by a short river course distance, but that are isolated by a strong downstream water flow and experience drastically different environmental conditions, as shown by the differences in physicochemical properties recorded at these sites [Table 1]. However, BRA (white water) is more closely related to ANA (black water) than NEG (black water) is [Fig. 2 and 3]. This is despite BRA and NEG being at similar river course distances from ANA. This pattern, with migration occurring preferentially downstream from white water to black water, is the inverse of the one observed at the Rio Negro-Solimões confluence, where the white water sites are more closely related to other white water sites. Similarly, ANA and CEM, sites from the Rio Negro, share common posterior memberships with sites downstream (JAR and CAT) [Fig. 2] even though they are from drastically divergent environments.
We detected a third genetic gap at the confluence of the Rio Tefé and Solimões. Indeed, the genetic distance between TEF and PIR is disproportionately large when compared to the small geographic distance separating the sites. Additionally, SOL (white water) is more closely related to TEF (black water) than PIR (white water). This result is detectable in the pie chart map [Fig. 2], where PIR shares a common posterior membership with other downstream Rio Solimões sites, and TEF and SOL are clustered apart. This result is also present in the pairwise Fst heatmap [Fig. 3], where SOL and TEF are almost identical. However, according to the pairwise Fst heatmap, TEF and SOL are not much more genetically distant from other downstream Rio Solimões sites than PIR is [Fig. 3].
The multiple regression on distance matrices (MRM) analysis detected a significant association between the pairwise Fst matrix and both the river course distance (p-value = 0.021) and the connectivity (p-value = 0.001) matrices. The relation between the genetic distance and the water type similarity matrix was not significant (p-value = 0.571). When using both the river course distance and the connectivity matrices, the resulting linear model explains 59.23 % of the variation in the dependent matrix. According to the one-by-one Mantel tests [Fig. S4], the pairwise genetic distances between sites are moderately correlated with the pairwise river course distances (correlation coefficient of 0.54 with a p-value of 0.004). In the same way, there is a strong correlation between the pairwise genetic distances and downstream water flow connectivity (correlation coefficient of 0.71 with a p-value of 0.001) and a non-significant correlation between genetic distances and the water type similarity matrix (correlation coefficient of 0.25 with a p-value > 0.05).
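For readers unfamiliar with the Mantel test, the principle behind the correlations reported above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function name `mantel` is our own, the test is one-sided with 999 permutations, and both matrices are synthetic toy values rather than the study data.

```python
import numpy as np

def mantel(x, y, permutations=999, seed=0):
    """One-sided permutation Mantel test between two symmetric distance matrices.

    Returns the Pearson correlation of the upper-triangle entries and a
    p-value obtained by jointly permuting the rows and columns of y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    iu = np.triu_indices(n, k=1)              # upper triangle, diagonal excluded
    r_obs = np.corrcoef(x[iu], y[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        p = rng.permutation(n)
        y_perm = y[np.ix_(p, p)]              # relabel sites: rows and columns together
        if np.corrcoef(x[iu], y_perm[iu])[0, 1] >= r_obs:
            hits += 1
    p_value = (hits + 1) / (permutations + 1)  # the observed value counts as one draw
    return r_obs, p_value

# Synthetic toy data (NOT the study data): six hypothetical sites along a river.
positions = np.array([0.0, 10.0, 25.0, 40.0, 70.0, 100.0])
river_dist = np.abs(positions[:, None] - positions[None, :])

noise_rng = np.random.default_rng(1)
noise = noise_rng.normal(0.0, 0.002, size=river_dist.shape)
noise = (noise + noise.T) / 2                 # keep the toy matrix symmetric
toy_fst = 0.002 * river_dist + noise          # genetic distance grows with river distance
np.fill_diagonal(toy_fst, 0.0)

r, p = mantel(toy_fst, river_dist)
print(f"Mantel r = {r:.2f}, p-value = {p:.3f}")
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations; shuffling site labels preserves the structure of each matrix while breaking the association between them.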

Environmental association study

As seen in the physicochemical parameters biplot using the five selected environmental parameters [Fig. 4], differences in water physicochemical characteristics can differentiate the two water types. Black water sites were characterized by higher DOC and Al concentrations and lower pH, while white water sites had higher amounts of silicate in suspension, as well as higher conductivity and Chl a concentration [Table 1 and Fig. 4].
All six axes of the RDA were significant (p-value < 0.05) and were used for the detection of associations between the genotypes and environmental predictors. The corrected sum of the variance explained by the environmental predictors in the redundancy analysis is 4.93 %. Sample representation in the RDA according to the explanatory variables was unrelated to their respective genetic clusters [Fig. S6]. A total of 584 unique SNPs were associated with the environmental predictors in the RDA. Of these, 45 were associated with aluminum concentration, 29 with productivity, 74 with conductivity, 44 with DOC concentration, 357 with silicate concentration and 35 directly with water type. For the LFMM2, a total of 367 unique SNPs had a significant p-value after the Bonferroni correction. Of these, 13 were associated with aluminum concentration, 215 with productivity, 107 with conductivity, 4 with DOC concentration, 117 with silicate concentration and 24 directly with water type. For Baypass2, the neutral genetic structure estimated by the program [Fig. S8] is concordant with the Fst heatmap previously produced [Fig. 3]. A total of 307 unique SNPs had an eBPis greater than 1.5 and were considered as putatively under selection. Of these, 178 were associated with aluminum concentration, 63 with productivity, 60 with conductivity, 5 with DOC concentration, 21 with silicate concentration and 15 directly with water type. Across the three methods, 172 SNPs were detected by at least two methods and were kept for the following analyses [Fig. 5].
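The two filtering steps described above, a Bonferroni-corrected significance threshold and the retention of SNPs detected by at least two methods, can be sketched as follows. This is an illustrative sketch only: the function names, the SNP identifiers, the p-values and the significance level alpha = 0.05 are all hypothetical and do not come from the study.

```python
from collections import Counter

def bonferroni_significant(p_values, alpha=0.05):
    """Return the SNP IDs whose p-value is below alpha divided by the number of tests.

    alpha = 0.05 is an assumption made for this sketch.
    """
    threshold = alpha / len(p_values)
    return {snp for snp, p in p_values.items() if p < threshold}

def consensus(hits_by_method, min_methods=2):
    """Return the SNP IDs detected by at least `min_methods` methods."""
    counts = Counter(snp for hits in hits_by_method.values() for snp in set(hits))
    return {snp for snp, n in counts.items() if n >= min_methods}

# Hypothetical toy inputs (NOT study data).
lfmm_p_values = {"snp1": 1e-6, "snp2": 0.03, "snp4": 1e-4, "snp5": 0.5}
hits_by_method = {
    "RDA": {"snp1", "snp2", "snp3"},
    "LFMM2": bonferroni_significant(lfmm_p_values),   # threshold = 0.05 / 4
    "Baypass2": {"snp3", "snp4", "snp6"},
}

retained = consensus(hits_by_method, min_methods=2)
print(sorted(retained))
```

Because a SNP can be associated with several predictors within one method, each method contributes a set of unique SNPs, so a SNP is counted at most once per method.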
Yet, the 172 selected SNPs resulting from our EAS do not structure the samples according to their water type. According to the PCA using the water type-associated SNPs [Fig. 6], samples cluster according to their watershed of origin [Fig. 6B] and not according to their water type [Fig. 6C]. Samples from the two main Amazonian watersheds are well differentiated by PC1, which retains 26.56 % of the variation in the genetic matrix. Additionally, BRA (white water) clusters with black water sites from the Rio Negro (i.e., ANA, CEM, NEG and BAR). In contrast, TEF and SOL (respectively black and white water sites) appear isolated from the other Solimões River sites, which is concordant with our previous results [Fig. 2 and 3]. When compared to a PCA using the full 41,268 SNPs [Fig. S10], the general clusters stay the same. The only major differences are the clustering of SOL and TEF with the other sites from the Solimões watershed and the higher dispersion of the sites from the Rio Negro along PC2. Again, the differences in water type between sites do not seem to be the main structuring factor in the data.