Bias in critical appraisal - the example of acupuncture for stroke rehabilitation

Bandolier 80 contained a short summary of some of the ways that bias can creep into studies and distort the results. With few exceptions, the bias runs one way: it makes the results look better than they are.

The knowledge that bias can be operating on individual studies, or on systematic reviews if they do not take potential bias into account, means that each of us has to keep our own bias detectors operating continually. In meta-analysis, where data are pooled, sensitivity analysis often tests whether trials with different populations, or trials with different characteristics, have different results. In systematic reviews where data are not pooled, an impression about whether a technology 'works' is often derived from vote-counting: the number of papers that say it works is bigger than the number that say it doesn't.

Suppose we have a proposal to purchase acupuncture for patients with stroke, because it apparently improves their rehabilitation. Being evidence-based, we ask whether there are any randomised trials showing efficacy. The answer we receive is that searching databases found seven trials, and that six of the seven show acupuncture works in stroke. It's a no-brainer: we should purchase it.

We asked one of our colleagues who has been through critical appraisal training to read the papers. The papers tell a different story (Table 1).

None of the trials was double blind, and all six trials reported as positive scored 2 or less for reporting quality. Our colleague tells us that this means that all the positive studies were subject to potential bias. Moreover, when they read the papers, they disagreed with the original authors for four of the 'positive' trials: these trials were actually negative.

Table 1: Acupuncture for stroke: vote-counting with and without allowance for bias

                              Conclusion of original authors    Conclusion of reviewers
Potential source of bias      Positive    Negative              Positive    Negative
No source of bias considered  6           1                     2           5
Double blind trials           0           0                     0           0
Observer blind trials         2           1                     0           3
Non blind trials              4           0                     2           2
Reporting quality 3 or more   0           1                     0           1
Reporting quality 2 or less   6           0                     2           4
Validity score 9 or more      1           1                     0           2
Validity score 8 or less      5           0                     2           3
European studies              2           1                     1           2
Far East studies              4           0                     1           3
Reporting quality using 0-5 scale [Jadad et al, 1996]; validity scoring using 0-16 scale [Smith et al, 2000]; geographical definitions [Vickers et al, 1998].
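
For readers who like to check the arithmetic, the vote-counting in Table 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the counts are copied from Table 1, but the names (TABLE_1, PARTITIONS, LOWER_BIAS_STRATA and the functions) are our own inventions, and treating double blinding, reporting quality of 3 or more, and validity score of 9 or more as the "lower risk of bias" strata is an assumption made for the example.

```python
# Counts copied from Table 1, in the order:
# (authors positive, authors negative, reviewers positive, reviewers negative)
TABLE_1 = {
    "All trials": (6, 1, 2, 5),
    "Double blind": (0, 0, 0, 0),
    "Observer blind": (2, 1, 0, 3),
    "Non blind": (4, 0, 2, 2),
    "Reporting quality 3 or more": (0, 1, 0, 1),
    "Reporting quality 2 or less": (6, 0, 2, 4),
    "Validity score 9 or more": (1, 1, 0, 2),
    "Validity score 8 or less": (5, 0, 2, 3),
    "European": (2, 1, 1, 2),
    "Far East": (4, 0, 1, 3),
}

# Hypothetical grouping: each list should split the seven trials
# into non-overlapping strata.
PARTITIONS = {
    "blinding": ["Double blind", "Observer blind", "Non blind"],
    "reporting quality": ["Reporting quality 3 or more",
                          "Reporting quality 2 or less"],
    "validity": ["Validity score 9 or more", "Validity score 8 or less"],
    "geography": ["European", "Far East"],
}

# Assumption for this sketch: these strata carry the lower risk of bias.
LOWER_BIAS_STRATA = [
    "Double blind",
    "Reporting quality 3 or more",
    "Validity score 9 or more",
]


def vote_count(row):
    """Return (positive votes, total trials) for authors and for reviewers."""
    a_pos, a_neg, r_pos, r_neg = row
    return {
        "authors": (a_pos, a_pos + a_neg),
        "reviewers": (r_pos, r_pos + r_neg),
    }


def partition_is_consistent(labels):
    """Check that the strata in one grouping add up to the overall row."""
    sums = tuple(sum(TABLE_1[label][i] for label in labels) for i in range(4))
    return sums == TABLE_1["All trials"]


def lower_bias_positive_votes():
    """Reviewers' positive votes within each lower-risk stratum."""
    return {label: TABLE_1[label][2] for label in LOWER_BIAS_STRATA}


if __name__ == "__main__":
    for label, row in TABLE_1.items():
        votes = vote_count(row)
        a_pos, a_total = votes["authors"]
        r_pos, r_total = votes["reviewers"]
        print(f"{label}: authors {a_pos}/{a_total} positive, "
              f"reviewers {r_pos}/{r_total} positive")
    for name, labels in PARTITIONS.items():
        print(f"{name} strata add up:", partition_is_consistent(labels))
    print("Reviewer-positive votes in lower-risk strata:",
          lower_bias_positive_votes())
```

The point of the sketch is that the same seven trials give a very different vote depending on whose conclusion is counted and which stratum is examined. It says nothing about the size of any effect, which vote-counting cannot measure.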

So how many trials do we have that were free from potential bias and were positive? The answer is none. Not only that, but our colleague tells us that they are not really sure that it was acupuncture that was being tested. All the trials used "electro-acupuncture". Three studies mentioned that they turned up the voltage sufficiently to get muscle twitching, and the other studies probably did the same. Is this acupuncture, or electrical stimulation of muscles to maintain tone?


This is a rapid appreciation of a topic that took some time for experienced people to get their heads around. To help readers, the full review is published on the Bandolier Internet site in HTML and downloadable PDF formats. When it comes to bias, it's worth remembering that there is a lot of it about.


