Abel Sánchez-Jiménez

Statistical inference traditionally relies on p-values, which assess the compatibility of the data with the absence of an effect. In large datasets, however, p-values lose relevance, flagging even trivial differences as statistically significant. It therefore becomes imperative, when working with large data, to evaluate the practical, clinical, or biological magnitude of an effect. Non-dimensional metrics such as Cohen's d allow for general comparisons but can obscure practical meaning, while dimensional metrics, such as confidence intervals, lack standardization and may complicate practical interpretation. We propose a novel approach, termed the similarity structure, for characterizing differences in large samples. It is based on the probability distribution of the size N of two subsamples, given that they are similar (statistically non-different). This quantifies the effect size as the expected sample size when similarity exists, irrespective of the nature of the data, its dimensionality, or the hypothesis test employed. Additionally, it can be translated into common measures such as Cohen's d and the sample size required for a statistical power of 0.9. Furthermore, the similarity structure makes it possible to statistically compare effect sizes, assessing the importance of the factors involved in sample differences. The similarity structure thus allows for the transparent and versatile assessment, interpretation, and comparison of effect sizes, contributing to more comprehensible and reproducible scientific research. The approach is demonstrated with real-world examples.
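
To make the idea concrete, the sketch below estimates, by Monte Carlo, the probability that two subsamples of size N drawn from two large groups are statistically non-different, and summarises the effect size as the expected N conditional on similarity. This is a minimal illustrative sketch only: the function names `similarity_probability` and `expected_similar_size`, the use of Welch's t-test as the similarity criterion, the significance level, and the grid of candidate sizes are assumptions for illustration, not the paper's exact construction of the similarity structure.

```python
import numpy as np
from scipy import stats


def similarity_probability(x, y, n, alpha=0.05, n_draws=500, rng=None):
    """Monte Carlo estimate of the probability that two subsamples of size n
    (one from each group) are 'similar', i.e. a two-sample Welch t-test does
    not reject the null hypothesis at level alpha.

    Hypothetical construction for illustration; the paper's definition of
    similarity may differ (e.g. other tests, multivariate data).
    """
    rng = np.random.default_rng() if rng is None else rng
    similar = 0
    for _ in range(n_draws):
        xs = rng.choice(x, size=n, replace=False)
        ys = rng.choice(y, size=n, replace=False)
        _, p = stats.ttest_ind(xs, ys, equal_var=False)
        similar += p > alpha
    return similar / n_draws


def expected_similar_size(x, y, sizes, **kwargs):
    """Summarise the effect size as the expected subsample size N conditional
    on similarity, weighting each candidate N by its similarity probability."""
    probs = np.array([similarity_probability(x, y, n, **kwargs) for n in sizes])
    if probs.sum() == 0:
        return float("nan")  # no similarity observed at any candidate size
    return float(np.sum(np.asarray(sizes) * probs) / probs.sum())


# Example with synthetic data: two large groups with a small mean difference,
# the setting in which p-values alone become uninformative.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, 100_000)
group_b = rng.normal(0.1, 1.0, 100_000)
sizes = [10, 20, 50, 100, 200, 500, 1000]
print(expected_similar_size(group_a, group_b, sizes, rng=rng))
```

A smaller expected similar size would indicate a larger effect (the groups stop looking alike at modest subsample sizes), whereas a large expected size would indicate a weak effect; under assumptions such as normality, this summary could then be mapped onto familiar measures like Cohen's d or the sample size needed for a statistical power of 0.9, as the abstract describes.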