6. Addressing the overinterpretation of sequencing data

Amplicon sequencing data are well suited for exploratory analysis and hypothesis generation in soil research, but they can also be used for targeted hypothesis testing if appropriate complementary and statistical methods are selected (\citealt{Gloor_2017}; sections 3 and 4). Because amplicon datasets from soil are compositional, heterogeneous and sparse, standard statistical methods (e.g., Pearson correlations or t-tests on proportions) can yield very high false discovery rates (up to 100\%; \citealt{Mandal_2015,Morton_2017}). Moreover, because these datasets comprise thousands of individual variables, almost any soil microbiome dataset will show significant correlations by chance alone if no correction for multiple testing is applied. The ease of obtaining significant results may therefore also encourage misuse of statistical significance (so-called ``p-hacking''). These problems are further compounded by spatio-temporal dynamics that complicate statistical inference from amplicon sequencing data in soils (see section 5). Consequently, we urge researchers to exercise caution when inferring effects or associations solely on the basis of statistical significance. The recent debate surrounding the misuse of p-values has produced both alternatives to p-values and proposals to adopt more stringent significance thresholds (e.g., lowering the conventional threshold from 0.05 to 0.005) to reduce the false-positive rate \citep{Nuzzo_2014,Amrhein_2019,Wasserstein_2019,Benjamin_2017}. Such thresholds would require a substantial increase in sample size (an estimated up to 70\%), which would be costly, but could save money in the long run that would otherwise be spent on unsubstantiated research.
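To make the multiple-testing and sample-size arguments concrete, the following minimal Python sketch (our own illustration, not taken from the cited studies) does two things. First, it correlates a few thousand simulated, mutually independent ``taxa'' with an unrelated environmental variable, so every hit with $p < \alpha$ is a false positive and roughly $\alpha$ times the number of taxa are expected. Gaussian data and a Fisher $z$ approximation for the p-value are simplifying assumptions; real amplicon data are additionally sparse and compositional. Second, it approximates the increase in sample size needed to keep 80\% power when the threshold is lowered from 0.05 to 0.005, assuming the normal approximation $n \propto (z_{1-\alpha/2} + z_{1-\beta})^2$ for a two-sided test. All function names and defaults are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_pvalue(r, n):
    """Two-sided p-value for H0: rho = 0 using Fisher's z transform (normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def spurious_hits(n_samples=20, n_taxa=2000, alpha=0.05, seed=1):
    """Count 'significant' correlations between independent random taxa and an unrelated
    environmental variable. No true association exists, so every hit is a false positive."""
    rng = random.Random(seed)
    env = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_taxa):
        taxon = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        if fisher_z_pvalue(pearson_r(env, taxon), n_samples) < alpha:
            hits += 1
    return hits


def relative_sample_size(alpha_new=0.005, alpha_old=0.05, power=0.8):
    """Factor by which n must grow to keep the same power for the same effect size when
    alpha is lowered, assuming a two-sided z-test: n ~ (z_{1-alpha/2} + z_{power})**2."""
    z = NormalDist().inv_cdf

    def factor(alpha):
        return (z(1.0 - alpha / 2.0) + z(power)) ** 2

    return factor(alpha_new) / factor(alpha_old)


# Same seed, same simulated data: hits at 0.005 are a subset of hits at 0.05.
print(spurious_hits(alpha=0.05), spurious_hits(alpha=0.005))
print(round(relative_sample_size(), 2))  # 1.7
```

Under this simplified model the required sample size grows by a factor of about 1.70, in line with the roughly 70\% increase quoted above; the exact figure depends on the test and design.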
We explored the impact of sample replication on statistical power in soil microbiome analysis following the approach described in \citet{Kelly_2015}, using a published dataset on bacterial and fungal communities that spans a range of soils representative of their heterogeneity and biological diversity \citep{Zheng_2019} (see Supplementary Methods). We simulated OTU/ASV tables (see Supplementary Information for a description of data processing) and determined how the statistical power of permutational multivariate analysis of variance (PERMANOVA) depends on effect size by bootstrapping the simulated matrices with varying numbers of replicates (4, 5, 8 and 10 replicates; Fig.~5). The procedure is briefly described in the Supplementary Information; we refer the reader to \citet{Kelly_2015} for further details and for how to implement the analysis with the R package \texttt{micropower}.
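The general logic of a simulation-based PERMANOVA power analysis can be sketched in Python as follows. This is not the \texttt{micropower} implementation and does not reproduce our analysis; as simplifying assumptions, it draws two groups of Gaussian ``community profiles'' whose means differ by a shift applied to every dimension, uses Euclidean distances, computes the PERMANOVA pseudo-$F$ with a permutation test, and estimates power as the fraction of simulated datasets with $p < 0.05$. The effect size is computed with the usual estimator $\omega^2 = (SS_A - (a-1)\,MS_W)/(SS_T + MS_W)$. All function names and defaults (e.g. \texttt{shift}, \texttt{n\_dims}) are hypothetical.

```python
import math
import random


def _sums_of_squares(dist, labels):
    """Return (SS_T, SS_W, a, N) for PERMANOVA from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total SS: (1/N) * sum of squared distances over all pairs i < j.
    ss_t = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group SS: the same formula applied within each group.
    ss_w = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_w += pairs / len(idx)
    return ss_t, ss_w, len(groups), n


def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F = (SS_A / (a - 1)) / (SS_W / (N - a))."""
    ss_t, ss_w, a, n = _sums_of_squares(dist, labels)
    return ((ss_t - ss_w) / (a - 1)) / (ss_w / (n - a))


def omega_squared(dist, labels):
    """omega^2 = (SS_A - (a - 1) * MS_W) / (SS_T + MS_W)."""
    ss_t, ss_w, a, n = _sums_of_squares(dist, labels)
    ms_w = ss_w / (n - a)
    return ((ss_t - ss_w) - (a - 1) * ms_w) / (ss_t + ms_w)


def permanova_p(dist, labels, n_perm, rng):
    """Permutation p-value: share of label shuffles with pseudo-F >= observed."""
    f_obs = pseudo_f(dist, labels)
    perm = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if pseudo_f(dist, perm) >= f_obs - 1e-12:  # tolerance for float round-off
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def simulate_groups(n_per_group, shift, n_dims, rng):
    """Hypothetical toy data: two groups of Gaussian profiles, group B shifted by `shift`."""
    points, labels = [], []
    for label, mu in (("A", 0.0), ("B", shift)):
        for _ in range(n_per_group):
            points.append([rng.gauss(mu, 1.0) for _ in range(n_dims)])
            labels.append(label)
    dist = [[math.dist(p, q) for q in points] for p in points]
    return dist, labels


def estimate_power(n_per_group, shift, n_sim=50, n_perm=99, alpha=0.05, n_dims=10, seed=0):
    """Fraction of simulated datasets in which PERMANOVA rejects H0 at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        dist, labels = simulate_groups(n_per_group, shift, n_dims, rng)
        if permanova_p(dist, labels, n_perm, rng) < alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    # Power for the replicate numbers used in the text, at one arbitrary toy shift.
    for reps in (4, 5, 8, 10):
        print(reps, estimate_power(reps, shift=0.5))
```

Repeating \texttt{estimate\_power} over a grid of shifts (and recording $\omega^2$ for each) would give power curves of the kind shown in Fig.~5a, although with toy Gaussian data rather than simulated OTU/ASV tables.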
Figure~5a shows how the statistical power to detect significant differences increases with effect size for different numbers of replicates. Even a small increase in sample size increases the power to detect small differences. These results agree with the findings of \citet{Kelly_2015}, who used 16S rRNA gene data sampled at multiple body sites in the Human Microbiome Project (HMP) dataset. To better visualize these differences, we further calculated the average statistical power for ranges of effect size ($\omega^2$), defined as ``Low'' (0.001--0.04), ``Medium'' (0.04--0.08) and ``High'' (0.08--0.12). The number of replicates had little influence on statistical power when the treatment/site effect was strong (Fig.~5b, ``High''). However, when the simulated treatment/site effect was weaker, increasing the number of replicates from 4 to 5 almost doubled the statistical power for small effect sizes (``Low'') and raised the power above the commonly recommended threshold of 0.8 for medium effect sizes (Fig.~5b, ``Low'' and ``Medium''). These gains were even more pronounced when the number of replicates was doubled (from 4 to 8; Fig.~5b). The same patterns were observed for the fungal dataset (Fig.~S1b,c).
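The averaging of power within $\omega^2$ categories can be sketched as below. The (omega2, power) pairs are made-up toy values, not results from this study, and because the published ranges share their end points, treating them as half-open intervals $[\text{lower}, \text{upper})$ is our assumption. The function name and the 0.8 target constant are illustrative.

```python
from statistics import mean

# Effect-size categories for omega^2 as defined in the text. Treating the ranges as
# half-open [lower, upper) intervals is an assumption; values outside all ranges are ignored.
CATEGORIES = {"Low": (0.001, 0.04), "Medium": (0.04, 0.08), "High": (0.08, 0.12)}
TARGET_POWER = 0.8  # commonly recommended minimum power


def mean_power_by_category(results):
    """Average simulated power per omega^2 category.

    results: iterable of (omega2, power) pairs, e.g. one pair per simulated effect
    size for a fixed number of replicates.
    """
    binned = {name: [] for name in CATEGORIES}
    for omega2, power in results:
        for name, (lower, upper) in CATEGORIES.items():
            if lower <= omega2 < upper:
                binned[name].append(power)
                break
    # Categories without any simulated effect size are omitted from the summary.
    return {name: mean(values) for name, values in binned.items() if values}


# Toy (omega2, power) pairs for illustration only; NOT results from the study.
toy_results = [(0.01, 0.25), (0.03, 0.5), (0.05, 0.75), (0.07, 1.0), (0.10, 1.0)]
for name, avg in mean_power_by_category(toy_results).items():
    print(name, avg, "meets target" if avg >= TARGET_POWER else "below target")
```

Computing this summary separately for each replicate number yields a comparison in the spirit of Fig.~5b.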
In practice, estimating the magnitude of differences between soil microbial communities a priori is difficult. If preliminary sequencing data are available, we encourage researchers to perform such power analyses when planning their experiments. These considerations should also include the number of technical replicates that will be pooled to alleviate the spatial heterogeneity of soils (see section 5). For further guidance on experimental planning and robust statistical analyses, we refer the reader to the literature (e.g., \citealt{Coenen_2020,Kelly_2015,Johnson_2014}).