3.2 Similarity in gene expression among samples
Similarity in gene expression among biological replicates (i.e., individuals belonging to the same treatment group) provides a measure of the reproducibility of our data and of the overall variation among samples. Similarity in gene expression within and among groups can be estimated using sample correlations or Euclidean distances (see Materials and Methods for further details). Pearson correlation coefficients (r) for biological replicates (same tissue and same group) were above 0.9 for the majority of comparisons, with only a few pairwise comparisons having values of 0.8 < r < 0.9 (Supporting Information Table S2). The lower values near 0.8 were mostly due to one sample (blood, group 2) that differed from the rest. This indicates that, although gene expression varies among individuals, biological replicates are generally very similar.
Pearson r values among groups for each tissue type are slightly lower than those obtained for individuals belonging to the same group, but are generally above 0.85, with the majority of pairwise comparisons above 0.90 (Supporting Information Table S2), indicating comparable levels of gene expression across the tested groups for the same tissue. Here too, the sample mentioned above (blood, group 2) shows lower r values (>0.73) (Supporting Information Table S2). Pearson r values between the two sequencing platforms for whole mRNA-Seq samples (hereafter NEB) are all above 0.87, except for the comparisons involving the blood sample from group 2 (>0.77) (Supporting Information Table S2), indicating that different sequencing methods did not influence the number of uniquely mapped reads. Finally, r values among different tissues (for 3’ Tag-Seq) and between 3’ Tag-Seq and NEB are generally <0.5 and sometimes negative, suggesting different levels of gene expression among tissues and, for the same mapped genes, between the two library methods.
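As an illustration of the two similarity measures reported above, the following is a minimal Python sketch that computes the pairwise Pearson r and Euclidean distance among samples. It is not the pipeline used in this study (see Materials and Methods). The function names, sample labels, and expression values are hypothetical and serve only to show the calculation.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_similarity(samples):
    """samples: dict mapping sample name -> list of per-gene expression values.

    Returns {(name_a, name_b): (Pearson r, Euclidean distance)} for every
    unordered pair of samples.
    """
    out = {}
    for a, b in combinations(samples, 2):
        out[(a, b)] = (
            pearson_r(samples[a], samples[b]),
            math.dist(samples[a], samples[b]),
        )
    return out


# Hypothetical toy values, NOT data from this study: two replicates of one
# tissue plus one sample from a different tissue, measured on five genes.
toy = {
    "blood_rep1": [5.0, 2.0, 8.0, 1.0, 6.0],
    "blood_rep2": [5.2, 2.1, 7.7, 1.3, 6.1],
    "other_tissue_rep1": [1.0, 7.0, 2.0, 6.0, 3.0],
}
result = pairwise_similarity(toy)
for pair, (r, d) in result.items():
    print(pair, round(r, 3), round(d, 3))
```

With these toy values, the two replicates have r above 0.9 and a small Euclidean distance, whereas the cross-tissue pair has a negative r and a much larger distance. This mirrors, qualitatively only, the pattern described in the text.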
Heatmaps of the distance matrices for the different group comparisons provide hierarchical clustering based on sample distances. When heatmaps were built by combining data from the three different tissues for 3’ Tag-Seq, we found three clusters corresponding to the three tissues (Figure 1a). However, within each cluster, as also shown by the heatmaps built with data from each tissue separately, samples belonging to different groups cluster together, indicating no clear difference in gene expression among the tested groups (Supporting Information Figure S2). The NEB data likewise showed no difference in gene expression among the different groups (Figure 1). Finally, the comparison of 3’ Tag-Seq vs. NEB revealed differences in gene expression between the two methods; these differences, however, were not associated with any of the groups (Figure 1). Principal component analysis (PCA), another way to visualize variation in gene expression among samples, further supports both the lack of differences among sampling methods and times of tissue harvesting and the differentiation between 3’ Tag-Seq and NEB and among the three sampled tissues (Figure 2 and Supporting Information Figure S3).
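The clustering logic behind such distance heatmaps can be sketched as follows. This is a minimal Python example of agglomerative hierarchical clustering on Euclidean sample distances. The linkage method used for the figures is not stated in this section, so average linkage is an assumption here. The function name, sample labels, and values are likewise hypothetical.

```python
import math


def average_linkage(points, n_clusters):
    """Agglomerative clustering with average linkage on Euclidean distances.

    points: dict mapping sample name -> expression vector.
    Returns a list of sets of sample names, one set per cluster.
    """
    clusters = [{name} for name in points]

    def linkage(c1, c2):
        # Mean of all pairwise distances between members of the two clusters.
        total = sum(math.dist(points[a], points[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j) and merge j into i.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy values, NOT data from this study: three tissues, two
# treatment groups each, with large tissue effects and small group effects.
toy = {
    "tissueA_g1": [1.0, 1.0, 9.0],
    "tissueA_g2": [1.2, 0.9, 9.1],
    "tissueB_g1": [9.0, 1.0, 1.0],
    "tissueB_g2": [8.8, 1.1, 1.2],
    "tissueC_g1": [1.0, 9.0, 1.0],
    "tissueC_g2": [1.1, 9.2, 0.8],
}
clusters = average_linkage(toy, 3)
for c in clusters:
    print(sorted(c))
```

In this toy case the three clusters correspond to the three tissues, and samples from different groups fall within the same cluster. This is the kind of structure the heatmaps are described as showing.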