Outcomes in arthritis trials

Arthritis trials of NSAIDs have been poorly reported, and are usually too small to distinguish any real difference in effect or harm.


PC Gøtzsche. Reporting of outcomes in arthritis trials measured on ordinal and interval scales is inadequate in relation to meta-analysis. Ann Rheum Dis 2001 60: 349-352.


A search was made for all randomised double-blind trials, published up to August 1998, that compared two of the 22 NSAIDs available in Denmark. The final sample was 144 trial reports.

These were examined for a number of outcomes commonly used in arthritis research that are measured on ordinal or interval scales, and are therefore potentially useful for meta-analysis (global evaluation, pain, number of tender joints and grip strength). Each report was assessed to see whether the information was usable in meta-analysis, and the reporting was classed as optimal or usable. Optimal reporting gave the number of patients in each of the original ordered categories. Usable reporting gave information on patients in two or more ordered categories.


The main results are shown in the Table for the 144 reports.

Arthritis outcomes reported in clinical trials

Columns: Global evaluation, Joint count, Grip strength
Rows: Number of trials, Percent optimal, Percent usable
The median sample size was 60 patients, rising over time to 110 patients per trial. The most common problem was the lack of standard deviations, confidence intervals, or other useful statistical measures.


Gøtzsche comments that many of the studies are full of misleading statements unsupported by data, statements that are biases in themselves. Claims of "non-significantly greater than" effects, and the like, are to be found in many trials of conventional and unconventional treatments. Moreover, these trials are so small that even large differences might be missed. He calculated that, to avoid overlooking one drug being only half as effective as another, a trial would need nearly 300 patients. To detect a difference of 25%, it would need nearly 1200 patients.
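The fourfold step between those two sample sizes follows from a general rule: the number of patients needed grows with the inverse square of the difference to be detected. The sketch below illustrates that rule with the standard normal-approximation formula for comparing two means. It is not a reconstruction of Gøtzsche's own calculation, whose assumptions are not given here. The function name `n_per_group`, the two-sided significance level of 0.05, the power of 80%, and the expression of the difference in standard deviation units are all assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd=1.0, alpha=0.05, power=0.80):
    """Approximate patients needed per group to detect a mean difference.

    Hypothetical helper using the normal-approximation formula
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    for a two-sided test comparing two groups with a common
    standard deviation. All defaults are illustrative assumptions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Halving the difference to be detected roughly quadruples the sample size.
large_difference = n_per_group(delta=0.50)
small_difference = n_per_group(delta=0.25)
print(large_difference, small_difference, small_difference / large_difference)
```

Under these assumptions the ratio between the two results is close to four, the same step as from nearly 300 to nearly 1200 patients. The absolute numbers differ from those in the paper because the inputs here are illustrative.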

We have to be vigilant. Even meta-analysis can be useless if trials themselves are useless. Gathering small piles of junk together gives one large pile of junk. Systematic review and meta-analysis should be about picking the nuggets of gold from the dross.