
The frequent insignificance of a “significant” P-value
  • David McGiffin,
  • Geoff Cumming,
  • Paul Myles
David McGiffin
Alfred Hospital
Corresponding Author: d.mcgiffin@alfred.org.au

Geoff Cumming
La Trobe University

Paul Myles
Alfred Hospital

Abstract

Null hypothesis significance testing (NHST) and p-values are widespread in the cardiac surgical literature but are frequently misunderstood and misused. The purpose of this review is to discuss the major disadvantages of p-values and to suggest alternatives. To help clarify the meaning of p-values, we describe diagnostic tests, the prosecutor’s fallacy in the courtroom, and NHST, all of which involve inter-related conditional probabilities, and we discuss the enormous sampling variability, or unreliability, of p-values. Finally, we use a cardiac surgical database and simulations to explore further issues involving p-values. In clinical studies, p-values provide a poor summary of the observed treatment effect, whereas the three-number summary provided by an effect estimate and its confidence interval is more informative and minimises over-interpretation of a “significant” result. P-values are an unreliable measure of strength of evidence; if used at all, they give at best a very rough guide to decision making. Researchers should adopt Open Science practices to improve the trustworthiness of research and, where possible, use estimation (three-number summaries) or other better techniques.
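The sampling variability of p-values mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation or data. It assumes a hypothetical two-group study with normally distributed outcomes, a true difference of 0.5 standard deviations, 32 patients per group, and a known standard deviation (so that a simple z-test applies). All function and parameter names are illustrative. For each replication it reports the p-value alongside the three-number summary (point estimate and 95% confidence limits).

```python
import math
import random
import statistics


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_once(rng, n=32, true_diff=0.5, sd=1.0):
    """Simulate one two-group study (hypothetical parameters).

    Returns (p_value, estimate, ci_lower, ci_upper), where the last
    three values form the three-number summary.
    """
    control = [rng.gauss(0.0, sd) for _ in range(n)]
    treated = [rng.gauss(true_diff, sd) for _ in range(n)]
    estimate = statistics.fmean(treated) - statistics.fmean(control)
    # Simplifying assumption: sd is known, so a z-test is used.
    se = sd * math.sqrt(2.0 / n)
    half_width = 1.96 * se
    p_value = two_sided_p(estimate / se)
    return p_value, estimate, estimate - half_width, estimate + half_width


def run_replications(reps=1000, seed=1, **kwargs):
    """Repeat the same study `reps` times with a fixed random seed."""
    rng = random.Random(seed)
    return [simulate_once(rng, **kwargs) for _ in range(reps)]


if __name__ == "__main__":
    results = run_replications()
    p_values = sorted(r[0] for r in results)
    significant = sum(p < 0.05 for p in p_values) / len(p_values)
    covered = sum(lo <= 0.5 <= hi for _, _, lo, hi in results) / len(results)
    print(f"smallest p: {p_values[0]:.4g}, largest p: {p_values[-1]:.4g}")
    print(f"proportion with p < 0.05: {significant:.2f}")
    print(f"proportion of 95% CIs covering the true difference: {covered:.2f}")
    for p, est, lo, hi in results[:5]:
        print(f"p = {p:.3f}   estimate = {est:.2f}  [{lo:.2f}, {hi:.2f}]")
```

Under these assumptions, identical replications of the same study should yield p-values spread over a wide range, while the confidence intervals shift from study to study but mostly overlap and cover the true difference at roughly the nominal rate.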
04 Aug 2021: Submitted to Journal of Cardiac Surgery
04 Aug 2021: Submission Checks Completed
04 Aug 2021: Assigned to Editor
04 Aug 2021: Editorial Decision: Accept
Nov 2021: Published in Journal of Cardiac Surgery, volume 36, issue 11, pages 4322-4331. DOI: 10.1111/jocs.15960