Most of us can recall the exam question that starts "Compare and contrast...." This month Bandolier has selected two systematic reviews which make good reading for comparing and contrasting those things which make them good and bad. Below we give Bandolier's views, but others will have their own opinions, and the two together could make good teaching material.

Dynamic exercise therapy

This review, from Holland, examined dynamic exercise therapy in rheumatoid arthritis [1]. Using an extensive search, it sought randomised trials of exercise lasting at least six weeks, and all studies were assessed with a 10-point quality score that Dutch reviewers use.

They found six studies, and tell us all about them. In particular they examined the different outcome measures: surrogate measures like aerobic capacity, measures of disease activity like pain or ESR, and functional outcomes like walking distance.

The results are disappointing: some statistically significant differences for surrogate markers, but no consistent beneficial effects for measures of disease activity or functional ability. What the authors do, however, in discussing the results, is to suggest that the numerous (75) methods of describing outcomes in trials like this need to be sorted out, so that results may be interpreted better in the future.

Acupuncture in dental pain

Possibly the first systematic review of acupuncture in dental pain [2] again used a comprehensive searching strategy for relevant papers. Papers were scored using a reliable five-point scale. Included studies had to be "controlled, conducted in humans and if they tested acupuncture as a treatment of dental pain".

They found 16 trials, of which 11 were randomised and 3 were double blind. Five studies scored zero on the five-point scale, while three scored three points, and none scored four or five points. Detailed tables give information about the 16 trials, their design, the outcomes and the main results for pain and adverse effects.

The conclusion, reached without pooling data, was that acupuncture can alleviate dental pain, and that future research should concentrate on the best technique and on the efficacy of acupuncture relative to conventional treatment.

Compare and contrast

What we have here are two systematic reviews, both well searched, one examining an intervention in a chronic painful condition and the other an intervention in an acute painful condition. They use different quality scores, but in much the same way. One, on exercise therapy, accepts only randomised studies but includes those where allocation is not concealed (allocation by date of birth, for example). Because the intervention cannot be blinded, this may be less of a problem than in other circumstances.

Both, in effect, use a vote count, noting the number of studies that found a statistically significant result for some outcome in favour of the intervention. Neither comments on the balance between positive and negative findings for different outcomes, and that omission may bias us. The review on dynamic exercise therapy at least gives us all the results on which to make up our own minds.


The review on dynamic exercise therapy bemoans the lack of consistent criteria by which to judge a successful outcome. This is essentially a lack of rules or validity criteria. The review of acupuncture ignores established validity criteria for acute pain measurements.

In acute pain we know that for assays to be valid pain has to be moderate or severe in intensity, that established pain measuring methods must be used for between four and six hours, that trials have to be both randomised and double blind, that pain during a procedure may be different from pain after a procedure, and that experimental pain is not clinical pain.

For acupuncture that leaves us with three randomised, double-blind studies out of the 16. Of those, one is of three hours' duration only, one measures pain during drilling, and one uses experimental dental pain. Not one of these studies would get into a review of trials held to the standards of experimentation that we expect for a new analgesic.
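
To make that screening step concrete, here is a minimal sketch in Python of applying the acute pain validity rules to the three randomised, double-blind trials. It is an illustration only, not the method of either review. The labels A, B and C, the field names and the `validity_failures` helper are all hypothetical, and each trial carries only the single detail given above; anything not stated is left as `None` and is not counted against the trial.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_HOURS = 4  # lower bound of the four-to-six-hour measurement window


@dataclass
class Trial:
    label: str                               # hypothetical label, not a citation
    randomised: bool
    double_blind: bool
    baseline_pain: Optional[str] = None      # "mild", "moderate" or "severe"
    duration_hours: Optional[float] = None   # None means not stated
    pain_setting: Optional[str] = None       # "after", "during" or "experimental"


def validity_failures(trial: Trial) -> List[str]:
    """List the criteria a trial is known to fail; unstated fields are skipped."""
    failures = []
    if not trial.randomised:
        failures.append("not randomised")
    if not trial.double_blind:
        failures.append("not double blind")
    if trial.baseline_pain == "mild":
        failures.append("baseline pain below moderate")
    if trial.duration_hours is not None and trial.duration_hours < MIN_HOURS:
        failures.append("measured for less than four hours")
    if trial.pain_setting == "during":
        failures.append("pain during, not after, the procedure")
    if trial.pain_setting == "experimental":
        failures.append("experimental, not clinical, pain")
    return failures


# Only the one detail reported for each trial is filled in.
trials = [
    Trial("A", True, True, duration_hours=3),
    Trial("B", True, True, pain_setting="during"),
    Trial("C", True, True, pain_setting="experimental"),
]

for t in trials:
    print(t.label, validity_failures(t))

eligible = [t.label for t in trials if not validity_failures(t)]
print("Eligible:", eligible)
```

Each trial fails on exactly one rule, so the eligible list comes out empty, which is the point made above.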


There are deficiencies in these reviews. We know how important things like randomisation and blinding are for eliminating bias. For instance, we reported in Bandolier 37 on transcutaneous electrical nerve stimulation (TENS) that 17 out of 19 non-random studies showed it worked, while 15 out of 17 randomised studies said it didn't.
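
The size of that reversal is easier to see as percentages. The short Python sketch below does nothing more than the arithmetic on the four TENS counts quoted above; the variable names and the `percent` helper are ours, and the two remaining randomised studies are described only as "not negative" because their results are not given here.

```python
from fractions import Fraction

# Counts quoted from Bandolier 37 for TENS
nonrandom_positive = Fraction(17, 19)    # non-random studies showing it worked
randomised_negative = Fraction(15, 17)   # randomised studies saying it didn't
randomised_other = 1 - randomised_negative  # the remaining 2 of 17


def percent(share: Fraction) -> float:
    """Convert a proportion to a percentage, rounded to one decimal place."""
    return round(float(share) * 100, 1)


print(percent(nonrandom_positive))   # 89.5
print(percent(randomised_negative))  # 88.2
print(percent(randomised_other))     # 11.8
```

So roughly nine in ten non-random studies favoured TENS, while roughly nine in ten randomised studies did not. A vote count that pools both kinds of study would hide exactly this difference.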

Yet both these reviews were inclined to accept information from either non-randomised or inadequately randomised studies. That is inappropriate, particularly for subjective measures. Where there is bias, it is not surprising that it all goes the same way, so vote-counting of biased studies is again inappropriate.

It all comes down to quality. The use of quality standards, including known rules for valid trials, is like using a searchlight on a dark night. These two reviews both fail to use the searchlight, in interesting, but different, ways.


