Results

Phylogeographic analysis

            Admixture results for different numbers of clusters (2-4) reveal the genetic structure of the 12 sampled sites [Fig. 2]. With [K = 2], the sites from the Rio Negro and the Rio Solimões are strongly differentiated. Only CAT and JAR, two sites located downstream from the Rio Negro, show weak admixture with Rio Negro sites. With [K = 3], the two sites located upstream on the Rio Negro (BAR and NEG) are differentiated, but still show some admixture with sites downstream (CEM and ANA). Nevertheless, BRA does not share posterior membership probability with BAR and NEG, even though these sites are geographically close. With [K = 4], sites located far upstream on the Rio Solimões (SOL and TEF) are differentiated from the rest of the Rio Solimões sites. However, PIR, a site even further upstream on the Rio Solimões, is not differentiated from the downstream lakes. Each subsequent increase of K up to eight led to the differentiation of a single new site: [K = 5] differentiates MAN, [K = 6] differentiates BRA, [K = 7] differentiates PIR and [K = 8] differentiates JAR [Fig. S1].
            According to the cross-validation error values from ADMIXTURE [Fig. S2], the optimal number of clusters for our genetic data is three. Nevertheless, the cross-validation error for three clusters (0.20966) is close to the one obtained with four clusters (0.20997). The “find.clusters” function from Adegenet led to a similar result, since its goodness-of-fit (BIC) values decreased more slowly at the fourth cluster [Fig. S3]. Additionally, the posterior membership probability plots [Fig. S1] stopped forming biologically meaningful clusters after [K = 4], differentiating only one sampling site at a time as more clusters were added. For this reason, we completed the phylogeographic analysis assuming that our full SNP dataset is represented by four genetic clusters.
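The comparison of cross-validation errors described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name and the tolerance of 0.001 are hypothetical, and only the two error values reported in the text are used, since the errors for other values of K are not given here.

```python
def rank_k_by_cv_error(cv_errors, tolerance):
    """Return the K with the lowest CV error and every K within `tolerance` of it."""
    best_k = min(cv_errors, key=cv_errors.get)
    best_err = cv_errors[best_k]
    # K values whose error exceeds the minimum by no more than `tolerance`
    near_optimal = sorted(
        k for k, err in cv_errors.items() if err - best_err <= tolerance
    )
    return best_k, near_optimal


# Only the two values reported in the text (K = 3 and K = 4).
cv_errors = {3: 0.20966, 4: 0.20997}

# The tolerance is an arbitrary, illustrative threshold (assumption).
best_k, near_optimal = rank_k_by_cv_error(cv_errors, tolerance=0.001)
print(best_k, near_optimal)
```

Listing the near-optimal values of K alongside the minimum makes explicit why a second criterion, such as the biological interpretability of the clusters, is needed to choose between K = 3 and K = 4.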
            According to the Admixture results [Fig. 2], some geographically close sites are strongly differentiated from each other. For instance, there is a first genetic gap (i.e., a disproportionate genetic distance relative to the river course distance separating the sites) at the confluence of the Rio Negro and Rio Solimões. There is a second genetic gap between BRA and NEG, and a third one between TEF and PIR. These genetic gaps are detectable in every neutral population structure analysis that we produced: the Admixture results [Fig. 2] and the high pairwise Fst values [Fig. 3].
            In the first genetic gap, white water sites are always more closely related to other white water sites, and vice versa [Fig. 2]. However, the inverse is observed in the second genetic gap, where BRA (white water) is more closely related to ANA (black water) than NEG (black water) is [Fig. 2 and 3]. This migration pattern, with preferential downstream migration from white water to black water, is the inverse of the one observed at the Rio Negro-Solimões confluence (the first genetic gap). The third genetic gap is at the confluence of the Rio Tefé and Solimões. Indeed, the genetic distance between SOL and PIR is disproportionately large compared to the small geographic distance separating the sites. Additionally, SOL (white water) is more closely related to TEF (black water) than to PIR (white water). This result is detectable in the Admixture results [Fig. 2], where PIR shares a common posterior membership with other downstream Rio Solimões sites, and TEF and SOL are clustered apart. However, according to the pairwise linearized Fst/(1-Fst) heatmap, TEF and SOL are not much more genetically distant from other downstream Rio Solimões sites than PIR is [Fig. 3].
            The multiple regression on distance matrices (MRM) detected a significant association between the pairwise linearized Fst/(1-Fst) matrix and both the river course distance matrix (p-value = 0.021) and the connectivity matrix (p-value = 0.001). The relation between the genetic distance and the water type similarity matrix was not significant (p-value = 0.571). When using both the river course distance and the downstream connectivity matrices, the resulting linear model explains 59.23 % of the variation in the dependent matrix. According to the one-by-one Mantel tests [Fig. S4], the pairwise genetic distances between sites are moderately correlated with the pairwise river course distances (correlation coefficient of 0.54, p-value = 0.004). Similarly, there is a strong correlation between the pairwise genetic distances and downstream water flow connectivity (correlation coefficient of 0.71, p-value = 0.001) and a non-significant correlation between genetic distances and the water type similarity matrix (correlation coefficient of 0.25, p-value > 0.05).
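To make the Mantel tests above concrete, the following sketch linearizes a pairwise Fst matrix as Fst/(1-Fst) and correlates it with a river distance matrix using a permutation test. It is a generic illustration under stated assumptions, not the implementation used in the study: the function names, the number of permutations, the five site positions and the toy Fst values are all hypothetical.

```python
import numpy as np


def linearize_fst(fst):
    """Convert pairwise Fst to the linearized form Fst / (1 - Fst)."""
    fst = np.asarray(fst, dtype=float)
    return fst / (1.0 - fst)


def mantel_test(dist_a, dist_b, n_perm=999, seed=0):
    """Pearson Mantel correlation with a one-sided permutation p-value."""
    dist_a = np.asarray(dist_a, dtype=float)
    dist_b = np.asarray(dist_b, dtype=float)
    n = dist_a.shape[0]
    upper = np.triu_indices(n, k=1)  # count each pair of sites once
    a = dist_a[upper]
    r_obs = np.corrcoef(a, dist_b[upper])[0, 1]
    rng = np.random.default_rng(seed)
    n_extreme = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Permute rows and columns of one matrix together (relabel the sites).
        b_perm = dist_b[np.ix_(perm, perm)][upper]
        if np.corrcoef(a, b_perm)[0, 1] >= r_obs:
            n_extreme += 1
    p_value = (n_extreme + 1) / (n_perm + 1)
    return r_obs, p_value


# Hypothetical positions (km along the river) of five made-up sites.
positions = np.array([0.0, 10.0, 25.0, 60.0, 100.0])
river_dist = np.abs(positions[:, None] - positions[None, :])

# Toy Fst values that increase with river distance (illustration only).
toy_fst = 0.002 * river_dist

r, p = mantel_test(linearize_fst(toy_fst), river_dist)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

Permuting rows and columns together preserves the dependence among pairwise distances that share a site, which is why an ordinary correlation test on the vectorized matrices would not be appropriate here.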

Environmental Association Study

As seen in the physicochemical parameters biplot using the five selected environmental parameters [Fig. 4], differences in water physicochemical characteristics can differentiate the two water types. Black water sites were characterized by higher DOC and Al concentrations and lower pH, while white water sites had higher amounts of silicate in suspension, as well as higher conductivity and Chl a concentration [Table 1 and Fig. S5].
All six axes of the RDA were significant (p-value < 0.05) and were used to detect associations between the genotypes and the environmental predictors. The corrected sum of the variance explained by the environmental predictors in the redundancy analysis is 4.93 %. Sample representation in the RDA according to the explanatory variables was unrelated to their respective genetic clusters [Fig. S6]. A total of 584 unique SNPs were associated with the environmental predictors in the RDA. Of these, 45 were associated with aluminum concentration, 29 with productivity, 74 with conductivity, 44 with DOC concentration, 357 with silicate concentration and 35 directly with water type [Fig. S7]. For Baypass2, the neutral genetic structure estimated by the program [Fig. S8] is concordant with the Fst heatmap previously produced [Fig. 3]. A total of 307 unique SNPs had an eBPis greater than 1.5 and were considered as putatively under selection. Of these, 178 were associated with aluminum concentration, 63 with productivity, 60 with conductivity, 5 with DOC concentration, 21 with silicate concentration and 15 directly with water type. In 35 occurrences, SNPs were associated with two environmental variables. For LFMM2, a total of 367 unique SNPs had a significant p-value after the Bonferroni correction. Of these, 13 were associated with aluminum concentration, 215 with productivity, 107 with conductivity, 4 with DOC concentration, 117 with silicate concentration and 24 directly with water type [Fig. S9]. In 113 occurrences, SNPs were associated with multiple environmental variables. Across the three methods, 172 SNPs were detected by at least two methods and were kept for the following analyses [Fig. 5].
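The rule of keeping SNPs detected by at least two of the three methods can be sketched as a simple count over sets of SNP identifiers. The function name and the SNP identifiers below are hypothetical placeholders, not data from the study.

```python
from collections import Counter


def snps_in_at_least(method_hits, min_methods=2):
    """Return the SNPs reported by at least `min_methods` detection methods."""
    # Each method contributes at most one count per SNP.
    counts = Counter(snp for hits in method_hits.values() for snp in set(hits))
    return {snp for snp, n in counts.items() if n >= min_methods}


# Made-up SNP identifiers for illustration only.
method_hits = {
    "RDA": {"snp_001", "snp_002", "snp_003"},
    "Baypass2": {"snp_002", "snp_004"},
    "LFMM2": {"snp_003", "snp_004", "snp_005"},
}

consensus = snps_in_at_least(method_hits)
print(sorted(consensus))
```

Counting per method rather than per environmental variable means that a SNP associated with several variables within one method is still counted once for that method.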
            However, the 172 selected SNPs resulting from our EAS do not structure the samples according to their water type. According to the PCA using the water type-associated SNPs [Fig. 6], samples cluster according to their watershed of origin [Fig. 6B] and not according to their water type [Fig. 6C]. Samples from the two main Amazonian watersheds are well differentiated by PC1, which explains 26.56 % of the variation in the genetic matrix. Additionally, BRA (white water) clusters with black water sites from the Rio Negro (i.e., ANA, CEM, NEG and BAR). In contrast, TEF and SOL (black and white water sites, respectively) are isolated from the other Solimões River sites, which is concordant with our previous results [Fig. 2 and 3]. When compared to a PCA using the full set of 41,268 SNPs [Fig. S10], the general clusters stay the same. The only major differences are the clustering of SOL and TEF with the other sites from the Solimões watershed and the higher dispersion of the Rio Negro sites along PC2. Again, differences in water type between sites do not seem to be the main structuring factor in the data.
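As an illustration of how a PCA on a genotype matrix yields sample scores and the share of variation explained by each axis, the sketch below uses a singular value decomposition of the centered matrix. It is not necessarily how the PCA in this study was computed: the function name, the simulated genotypes, the group sizes and the allele frequencies are all assumptions made for the example.

```python
import numpy as np


def pca_genotypes(genotypes, n_components=2):
    """PCA of a samples-by-SNPs matrix coded as 0/1/2 allele counts."""
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)  # center each SNP column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # proportion of variance per axis
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]


# Simulated data (assumption): two groups of 10 samples, 50 SNPs,
# with allele frequencies of 0.2 and 0.8 respectively.
rng = np.random.default_rng(1)
group_a = rng.binomial(2, 0.2, size=(10, 50))
group_b = rng.binomial(2, 0.8, size=(10, 50))
genotypes = np.vstack([group_a, group_b])

scores, explained = pca_genotypes(genotypes)
print(f"PC1 explains {100 * explained[0]:.2f} % of the variation")
```

Coloring the resulting scores first by watershed and then by water type, as in the figure panels described above, is what allows the two candidate structuring factors to be compared on the same axes.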