
Mindstretcher 1 - quality and size

Early in the reign of Augustus, Dionysius of Halicarnassus commented that "history is philosophy from examples". We think of evidence in much the same way, seeking examples from the archaeology of medicine to learn what constitutes good science and what bad, perhaps leavened here and there with a bit of real philosophy and science. Theory tells us that randomisation is good, and examples from reviews frequently confirm it. Yet we are condemned to re-learn the lessons because so many systematic reviews include trials whose architecture potentially misleads.

And it's not just about architecture, because size is also important. Indeed, the two are linked, so revisiting the issues of trial quality [1] and size [1,4] should help to revivify our knowledge of these matters.

Quality and size [1]

Two Danish researchers looked for large clinical trials with at least 1,000 patients together with meta-analyses of small trials. They were asking the sensible question about how possible discrepancies between large trials and meta-analyses could be affected by methodological quality.

They found 14 meta-analyses, pulled all the original papers, subjected those to quality review, and examined outcomes in terms of odds ratios. They then used the ratio of the odds ratio in the large randomised trial to that from the meta-analysis of small trials to produce a "ratio of odds ratios" as the final outcome. When the ratio of odds ratios was significantly less than 1, that indicated that small trials with particular quality criteria exaggerated the effect of an intervention compared with the large trial.
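The arithmetic behind a ratio of odds ratios can be sketched on the log scale. This is a minimal illustration, not the authors' method: the function name and the input values are hypothetical, and it assumes the two estimates are independent with known standard errors for their log odds ratios.

```python
import math

def ratio_of_odds_ratios(or_numerator, se_log_numerator,
                         or_denominator, se_log_denominator, z=1.96):
    """Return (ratio, lower, upper) for the ratio of two odds ratios.

    Assumes the two estimates are independent, so the variances of
    their log odds ratios add. z=1.96 gives an approximate 95% interval.
    """
    log_ratio = math.log(or_numerator) - math.log(or_denominator)
    se = math.sqrt(se_log_numerator ** 2 + se_log_denominator ** 2)
    return (math.exp(log_ratio),
            math.exp(log_ratio - z * se),
            math.exp(log_ratio + z * se))

# Hypothetical inputs, chosen only to show the calculation.
ratio, lower, upper = ratio_of_odds_ratios(0.5, 0.1, 1.0, 0.1)
print(f"ratio of odds ratios = {ratio:.2f} ({lower:.2f} to {upper:.2f})")
```

An interval lying wholly below 1 corresponds to a ratio that is significantly less than 1 in the sense used above.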

The quality criteria they tested for were generation of the allocation sequence, allocation concealment, double blinding, and withdrawals or dropouts. The relevant criteria are in Table 1.

Table 1: Quality criteria tested

| Quality feature | Adequate | Inadequate |
|---|---|---|
| Generation of the allocation sequence | computer-generated random number or similar | not described |
| Allocation concealment | central independent unit, sealed envelope, or similar | not described, or open table of random numbers |
| Double blinding | identical placebo or similar | not described, or tablets versus injection without double dummy |
| Withdrawals or dropouts | number and reasons for dropouts | not described |


They used 23 large trials and 167 small trials with 136,000 patients. Compared with large trials, small trials with inadequate generation of the allocation sequence, inadequate allocation concealment, or inadequate double blinding over-estimated the effect of treatment (Table 2). When methodological quality was compared across large and small trials together, inadequate generation of the allocation sequence and inadequate double blinding were associated with over-estimation of the treatment effect (Table 3), and much the same was found for a similar analysis of small trials alone.

Table 2: Comparison of large trials with small trials with different quality criteria

| Common comparator | Comparison | Ratio of odds ratios |
|---|---|---|
| Large trials | Small trials with inadequate generation of allocation sequence | 0.46 (0.25 to 0.83) |
| Large trials | Small trials with adequate generation of allocation sequence | 0.90 (0.47 to 1.76) |
| Large trials | Small trials with inadequate allocation concealment | 0.49 (0.27 to 0.86) |
| Large trials | Small trials with adequate allocation concealment | 1.01 (0.48 to 2.11) |
| Large trials | Small trials with inadequate or no double blinding | 0.52 (0.28 to 0.96) |
| Large trials | Small trials with adequate double blinding | 0.84 (0.43 to 1.66) |
| Large trials | Small trials with inadequate follow up | 0.72 (0.30 to 1.71) |
| Large trials | Small trials with adequate follow up | 0.58 (0.32 to 1.02) |

When the ratio of the odds ratios is less than 1, it indicates that the feature (inadequate blinding, for example) exaggerates the intervention effect.

Table 3: Comparison of adequate versus inadequate quality criteria in large and small trials

| Common comparator | Comparison | Ratio of odds ratios |
|---|---|---|
| Adequate | Inadequate generation of allocation sequence | 0.49 (0.30 to 0.81) |
| Adequate | Inadequate allocation concealment | 0.60 (0.31 to 1.15) |
| Adequate | Inadequate or no double blinding | 0.56 (0.33 to 0.98) |
| Adequate | Inadequate follow up | 1.50 (0.80 to 2.78) |

When the ratio of the odds ratios is less than 1, it indicates that the feature (inadequate blinding, for example) exaggerates the intervention effect.
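One way to read Tables 2 and 3 is to pick out the rows whose whole interval lies below 1. The sketch below does that with the figures copied from the two tables; it assumes the bracketed ranges are confidence intervals, and the tuple layout and function name are illustrative choices rather than anything from the paper.

```python
# (table, quality feature, group compared, ratio of odds ratios, lower, upper)
ROWS = [
    ("Table 2", "generation of allocation sequence", "inadequate", 0.46, 0.25, 0.83),
    ("Table 2", "generation of allocation sequence", "adequate", 0.90, 0.47, 1.76),
    ("Table 2", "allocation concealment", "inadequate", 0.49, 0.27, 0.86),
    ("Table 2", "allocation concealment", "adequate", 1.01, 0.48, 2.11),
    ("Table 2", "double blinding", "inadequate", 0.52, 0.28, 0.96),
    ("Table 2", "double blinding", "adequate", 0.84, 0.43, 1.66),
    ("Table 2", "follow up", "inadequate", 0.72, 0.30, 1.71),
    ("Table 2", "follow up", "adequate", 0.58, 0.32, 1.02),
    ("Table 3", "generation of allocation sequence", "inadequate vs adequate", 0.49, 0.30, 0.81),
    ("Table 3", "allocation concealment", "inadequate vs adequate", 0.60, 0.31, 1.15),
    ("Table 3", "double blinding", "inadequate vs adequate", 0.56, 0.33, 0.98),
    ("Table 3", "follow up", "inadequate vs adequate", 1.50, 0.80, 2.78),
]

def interval_below_one(lower, upper):
    """True when the whole interval lies below 1."""
    return lower < 1.0 and upper < 1.0

flagged = [(table, feature, group)
           for table, feature, group, _ratio, lower, upper in ROWS
           if interval_below_one(lower, upper)]

for table, feature, group in flagged:
    print(f"{table}: {feature} ({group})")
```

The rows picked out are the ones the text names: inadequate sequence generation, allocation concealment, and double blinding in Table 2, and inadequate sequence generation and double blinding in Table 3.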

Quality scoring using the Oxford system [2], perhaps one of the most commonly used scoring systems in systematic reviews, produced sensible results. Small trials with lower quality scores over-estimated treatment effects compared with large trials. Small trials with higher quality scores did not. With both large and small trials, treatment effects were exaggerated with low versus high quality scores.

Size [3]

It is obvious that if we have a very small amount of information, from few patients, the effects of random chance can be large. As the amount of information or number of patients increases, the effects of chance diminish. In some circumstances, like acute pain trials, we can define how much information is needed for us to be confident not just that a treatment works, but also of how big the effect of that treatment is [3].
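The shrinking influence of chance can be seen in a small simulation. This is only a sketch under invented assumptions: the event rates (30% with control, 20% with treatment), the trial sizes, the number of simulated trials, and the function name are all hypothetical, and nothing here comes from the cited papers.

```python
import math
import random
import statistics

def log_or_spread(n_per_arm, p_control=0.3, p_treat=0.2, n_sims=200, seed=0):
    """Standard deviation of the log odds ratio across simulated trials."""
    rng = random.Random(seed)
    log_ors = []
    for _ in range(n_sims):
        events_t = sum(rng.random() < p_treat for _ in range(n_per_arm))
        events_c = sum(rng.random() < p_control for _ in range(n_per_arm))
        # Add 0.5 to every cell so a zero count cannot break the logarithm.
        a, b = events_t + 0.5, n_per_arm - events_t + 0.5
        c, d = events_c + 0.5, n_per_arm - events_c + 0.5
        log_ors.append(math.log(a * d / (b * c)))
    return statistics.stdev(log_ors)

# The true effect is identical at every size; only the scatter changes.
spreads = {n: log_or_spread(n) for n in (50, 500, 2500)}
for n, spread in spreads.items():
    print(f"{2 * n:>5} patients: SD of log odds ratio = {spread:.2f}")
```

The scatter of the estimated odds ratio narrows as the simulated trials get bigger, even though the underlying treatment effect never changes.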

Confirmation that our estimate of the effect of treatment can be heavily dependent on size comes from a study from the USA and Greece [4]. Researchers looked at 60 meta-analyses of randomised trials where there were at least five trials published in more than three different calendar years. They were in either pregnancy and perinatal medicine or myocardial infarction.

For each meta-analysis, trials were ordered chronologically by publication year, and a cumulative meta-analysis was performed to arrive at a pooled odds ratio at the end of each calendar year. The relative change in treatment effect was calculated for each successive calendar year by dividing the odds ratio of the new assessment with more patients by the odds ratio of the previous assessment with fewer patients. This gives a "relative odds ratio", in which a number greater than 1 indicates more treatment effect, and one less than 1 indicates less treatment effect.
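The procedure just described can be sketched in a few lines. This is an illustration under stated assumptions, not the method of the paper: it uses inverse-variance fixed effect pooling of log odds ratios with a 0.5 continuity correction in every cell, and the function names and trial counts are hypothetical.

```python
import math
from collections import defaultdict

def log_odds_ratio(events_t, n_t, events_c, n_c):
    """Return (log odds ratio, variance) for one trial, 0.5 added to each cell."""
    a, b = events_t + 0.5, n_t - events_t + 0.5
    c, d = events_c + 0.5, n_c - events_c + 0.5
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def cumulative_relative_odds_ratios(trials):
    """trials: iterable of (year, events_t, n_t, events_c, n_c).

    Returns one row per calendar year:
    (year, patients so far, pooled odds ratio, relative odds ratio).
    The relative odds ratio is None for the first year.
    """
    by_year = defaultdict(list)
    for year, *counts in trials:
        by_year[year].append(counts)

    sum_weights = sum_weighted = 0.0
    patients = 0
    previous_or = None
    rows = []
    for year in sorted(by_year):
        for events_t, n_t, events_c, n_c in by_year[year]:
            log_or, variance = log_odds_ratio(events_t, n_t, events_c, n_c)
            sum_weights += 1 / variance
            sum_weighted += log_or / variance
            patients += n_t + n_c
        pooled_or = math.exp(sum_weighted / sum_weights)
        relative = None if previous_or is None else pooled_or / previous_or
        rows.append((year, patients, pooled_or, relative))
        previous_or = pooled_or
    return rows

# Hypothetical trials: (year, events treated, n treated, events control, n control)
example = [(1990, 4, 20, 9, 20), (1991, 12, 60, 15, 60), (1993, 40, 400, 52, 400)]
for year, patients, pooled, relative in cumulative_relative_odds_ratios(example):
    print(year, patients, round(pooled, 2), None if relative is None else round(relative, 2))
```

Each row after the first compares the pooled odds ratio with the one from the previous year, which is the quantity plotted against the accumulated number of patients.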

The relative odds ratio can be plotted against the number of patients included. The expected result is a horizontal funnel, with less change with more patients, and the relative odds ratio settling down to 1.


In the paper, the two graphs for pregnancy/perinatal medicine and myocardial infarction showed exactly this expected pattern, but are impossible to reproduce here. Below 100 patients the relative odds ratios varied between 0.2 and 6. By the time 1,000 patients were included they were between 0.5 and 2. By 5,000 patients they had settled down close to 1. The 95% prediction intervals for the relative change in the odds ratio at different numbers of patients are shown for both examples in Table 4.

Table 4: 95% prediction interval for relative change in odds ratio for different numbers of accumulated patients randomised

Fixed effect prediction interval for relative change in odds ratio
| Number of patients | Pregnancy/perinatal | Myocardial infarction |
|---|---|---|
| 100 | 0.32 to 2.78 | 0.18 to 5.51 |
| 500 | 0.59 to 1.71 | 0.60 to 1.67 |
| 1,000 | 0.67 to 1.49 | 0.74 to 1.35 |
| 2,000 | 0.74 to 1.35 | 0.83 to 1.21 |
| 15,000 | 0.85 to 1.14 | 0.96 to 1.05 |

When evidence was based on only a few patients there was substantial uncertainty about how much the pooled treatment effect would change in the future. With only 100 patients randomised, additional information from more trials could multiply or divide the odds ratio at that point by three or more.
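The "multiply or divide" reading of Table 4 can be checked with simple arithmetic: for each interval, take the larger of the upper limit and the reciprocal of the lower limit. The figures below are copied from Table 4; the function name and layout are illustrative only.

```python
# (patients, pregnancy/perinatal interval, myocardial infarction interval)
TABLE_4 = [
    (100, (0.32, 2.78), (0.18, 5.51)),
    (500, (0.59, 1.71), (0.60, 1.67)),
    (1000, (0.67, 1.49), (0.74, 1.35)),
    (2000, (0.74, 1.35), (0.83, 1.21)),
    (15000, (0.85, 1.14), (0.96, 1.05)),
]

def max_fold_change(lower, upper):
    """Largest factor by which the odds ratio might be multiplied or divided."""
    return max(upper, 1.0 / lower)

for patients, pregnancy, infarction in TABLE_4:
    print(f"{patients:>6} patients: "
          f"pregnancy/perinatal x{max_fold_change(*pregnancy):.2f}, "
          f"myocardial infarction x{max_fold_change(*infarction):.2f}")
```

At 100 patients the factor is about three for pregnancy and perinatal medicine and more than five for myocardial infarction, and it falls steadily as patients accumulate.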


At first look this is all complicated pointy-head stuff, but actually it's no more than simple common sense. If trials are not done properly, they might be wrong. If trials are small, they might be wrong. To be sure of what we know we need large data sets of high quality, whether from single trials or meta-analyses. The corollary is that if we have small amounts of information, or information of poor quality, the chance of that result being incorrect is substantial, and then we need to be cautious and conservative.

Cynics might say that much decision-making in healthcare is done on small amounts of inadequate information. They may be right, but knowing that that information may be misleading is still helpful, because we know that we need to examine what we do in practice to check that it conforms with what we thought we started out with. Suspending belief is not an option.


  1. LL Kjaergard & C Gluud. Reported methodologic quality and discrepancies between large and small randomised trials in meta-analyses. Annals of Internal Medicine 2001 135: 982-989.
  2. AR Jadad et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Controlled Clinical Trials 1996 17: 1-12.
  3. RA Moore et al. Size is everything - large amounts of information are needed to overcome random effects in estimating direction and magnitude of treatment effects. Pain 1998 78: 209-216.
  4. JP Ioannidis & J Lau. Evolution of treatment effects over time: empirical insight from recursive metaanalyses. Proceedings of the National Academy of Sciences 2001 98: 831-836.