Mindstretcher 1: Power and confidence

Many readers will remember the sense of satisfaction they felt when they first mastered the principle of the P-value and could either assert for themselves, or understand when others asserted, that a particular result was "significant" or "not significant". The P-value has since fallen from favour, and other ways of reporting results have emerged.

The rise of the confidence interval

As the P-value has declined, the confidence interval has risen. Usually set at 95% (though other levels may be used), the confidence interval is a visually dramatic and comprehensible way of describing the confidence that can be placed in "the result".

Confidence intervals figure largely in many systematic reviews and meta-analyses, and Bandolier has used them in previous editions.

The rise of power

More recently there has been growing interest in the power of a study. This is "the probability that a study of a given size would detect a statistically significant real difference of a given magnitude" [1]. If the difference expected is a 100% reduction in mortality, a small study will have sufficient power; if the expected reduction in mortality is much smaller, say 5%, then a very much larger study must be conducted to produce a result which will have statistical significance.
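The effect of the expected difference on study size can be illustrated with a rough calculation. The sketch below uses the standard normal-approximation formula for comparing two proportions; the 20% control-group mortality, the 5% two-sided significance level and the 80% power are assumptions chosen for illustration, and the function name is hypothetical rather than taken from reference [1].

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to compare two proportions.

    Normal-approximation formula, two-sided test. Illustrative only.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_beta = z.inv_cdf(power)            # quantile for the desired power
    p_bar = (p_control + p_treated) / 2  # pooled proportion under the null
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control) + p_treated * (1 - p_treated))
    ) ** 2
    return ceil(numerator / (p_control - p_treated) ** 2)


baseline = 0.20  # assumed control-group mortality (not from the article)

# 100% relative reduction: mortality falls from 20% to 0%
n_total_reduction = sample_size_per_group(baseline, 0.0)

# 5% relative reduction: mortality falls from 20% to 19%
n_small_reduction = sample_size_per_group(baseline, baseline * 0.95)

print("Per arm, 100% reduction:", n_total_reduction)
print("Per arm, 5% reduction:", n_small_reduction)
```

Under these assumptions the first study needs a few dozen patients per arm and the second needs tens of thousands, which is the contrast the paragraph above describes.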

Of course, the question still needs to be addressed as to whether a statistically significant result is clinically important.

Post hoc power

As the concept of power becomes more widely diffused, it is becoming increasingly common to ask in journal clubs or in letters to the editors of journals "what was the power of the study to detect the observed difference?" This may seem a sensible question to ask, but an excellent and mindstretching article by Goodman & Berlin [2] points out why this question is inappropriate, and how to rely on confidence intervals instead. Those who wish to read about trials in the original reports are well-advised to read this paper, as are those who are sufficiently well-adjusted to realise that their statistical know-how needs a brush-up.
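One way to see the difficulty with "power to detect the observed difference" is that, for a simple two-sided z-test, this quantity is fixed entirely by the P-value and so adds no new information. The sketch below is an illustration under that normal-approximation assumption; it is not taken from reference [2], and the function name is hypothetical.

```python
from statistics import NormalDist


def observed_power(p_value, alpha=0.05):
    """Power computed as if the true effect equalled the observed one.

    Assumes a two-sided z-test; depends on nothing but the P-value.
    """
    z = NormalDist()
    z_observed = z.inv_cdf(1 - p_value / 2)  # test statistic implied by P
    z_critical = z.inv_cdf(1 - alpha / 2)    # threshold for significance
    # Probability of landing in either rejection tail
    return z.cdf(z_observed - z_critical) + z.cdf(-z_observed - z_critical)


for p in (0.05, 0.20, 0.50):
    print(f"P = {p:.2f} -> post hoc power = {observed_power(p):.2f}")
```

A result that only just reaches P = 0.05 has post hoc power of about one half, and under this approximation any non-significant result has less than that, so a "low power" finding for a negative trial is guaranteed in advance.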

Another important paper is that by David Moher and his colleagues in Ottawa [3], who reviewed 383 randomised controlled trials published in 1975, 1980, 1985 and 1990 in JAMA, Lancet and New England Journal of Medicine. Of these, 102 (27%) were classified as having negative results, but only a small fraction of them had sufficient power to detect relative differences of 25% or even 50%, and only 20 of the 102 reports made any statement related to the clinical significance of the observed differences.
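To give a feel for what "sufficient power to detect a relative difference of 25% or 50%" means, the sketch below estimates power for a trial of fixed size using the normal approximation for two proportions. The 20% control event rate and 100 patients per arm are invented for illustration and are not figures from the review; the function name is hypothetical, and the negligible chance of significance in the wrong direction is ignored.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treated, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    p_bar = (p_control + p_treated) / 2
    # Standard error under the null (pooled) and under the alternative
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt(
        p_control * (1 - p_control) / n_per_group
        + p_treated * (1 - p_treated) / n_per_group
    )
    difference = abs(p_control - p_treated)
    return z.cdf((difference - z_alpha * se_null) / se_alt)


control_rate = 0.20  # assumed event rate in the control arm
n = 100              # assumed patients per arm

power_25 = power_two_proportions(control_rate, control_rate * 0.75, n)
power_50 = power_two_proportions(control_rate, control_rate * 0.50, n)

print(f"Power for a 25% relative reduction: {power_25:.2f}")
print(f"Power for a 50% relative reduction: {power_50:.2f}")
```

With these assumed numbers the trial has only about an even chance of detecting even a halving of the event rate, and much less for a 25% reduction, so a negative result from it says little about whether the treatment works.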

Educational objectives

Those who wish to mindstretch should set themselves some objectives and here are some to consider. Someone who wishes to be an interpreter of the evidence should be able:

• to define and describe to people what is meant by the power of a study
• to describe what is meant by confidence intervals.

References:

1. DG Altman. Practical Statistics for Medical Research. Chapman & Hall, 1991 p455.
2. SN Goodman, JA Berlin. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Annals of Internal Medicine 1994 121: 200-6.
3. D Moher, CS Dulberg, GA Wells. Statistical power, sample size, and their reporting in randomized controlled trials. Journal of the American Medical Association 1994 272: 121-3.