
Class and equivalence


Class (noun): any set of people or things grouped together or differentiated from others. An increasingly common question is whether a set of drugs forms a class, and whether there is a 'class effect'. Class effect is usually taken to mean similar therapeutic effects and similar adverse effects, both in nature and extent. If such a 'class effect' exists, then it makes decision-making easy: you choose the cheapest.

Criteria for drugs to be grouped together as a class involve some or all of the following:

Declaring a class effect requires a bit of thought, though. How much thought, and of what type, has been considered in one of that brilliant JAMA series of Users' Guides to the Medical Literature [1]. No one should declare a class effect and choose the cheapest without reference to the rules of evidence set out in this paper.

Levels of evidence for efficacy

These are shown in Table 1, though if it comes down to level 3 and 4 evidence for efficacy, the ground is pretty shaky. Level 1 evidence is what we always want and almost never get: the large randomised head-to-head comparison. By the time there are enough compounds around to form a class, there is almost no organisation interested in funding expensive new trials to test whether A is truly better than B.

Table 1: Levels of evidence for efficacy for class effect

| Level | Comparison | Patients | Outcomes | Criteria for validity |
|---|---|---|---|---|
| 1 | RCT direct comparison | Identical | Clinically important | Randomisation concealment; complete follow up; outcome assessment must be sound |
| 2 | RCT direct comparison | Identical | Valid surrogate | Level 1 plus: validity of surrogate outcome |
| 2 | Indirect comparison with placebo from RCTs | Similar or different in disease severity or risk | Clinically important or valid surrogate | Level 1 plus: differences in methodological quality; end points; baseline risk |
| 3 | Subgroup analyses from indirect comparisons of RCTs with placebo | Similar or different in disease severity or risk | Clinically important or valid surrogate | Level 1 plus: multiple comparisons, post hoc data dredging; underpowered subgroups; misclassification into subgroups |
| 3 | Indirect comparison with placebo from RCTs | Similar or different in disease severity or risk | Unvalidated surrogate | Surrogate outcomes may not capture all good or bad effects of treatment |
| 4 | Indirect comparison of nonrandomised studies | Similar or different in disease severity or risk | Clinically important | Confounding by indication, compliance, or time; unknown or unmeasured confounders; measurement error; limited database, or coding systems not suitable for research |

Most of the time we will be dealing with randomised trials of A versus placebo or standard treatment and B versus placebo or standard treatment. This will be level 2 evidence based on clinically important outcomes (a healing event) or validated surrogate outcomes (reduction of cholesterol with a statin). So establishing a class effect will likely involve quality systematic review or meta-analysis of quality randomised trials.
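To make the idea of an indirect comparison concrete, here is a minimal sketch of one commonly used approach, the adjusted indirect comparison, in which the A versus placebo and B versus placebo results are compared on the log relative risk scale. This method is not described in the papers discussed here; it is offered only as an illustration. The function names and all the trial counts are hypothetical.

```python
import math
from statistics import NormalDist


def log_rr_and_se(events_treated, n_treated, events_control, n_control):
    """Log relative risk and its standard error for one placebo-controlled trial."""
    log_rr = math.log((events_treated / n_treated) / (events_control / n_control))
    se = math.sqrt(1 / events_treated - 1 / n_treated
                   + 1 / events_control - 1 / n_control)
    return log_rr, se


def indirect_comparison(a_vs_placebo, b_vs_placebo, confidence=0.95):
    """Relative risk of A versus B, with confidence interval, via a common comparator.

    Each argument is a (log_rr, se) pair. The variances add, so the indirect
    estimate is always less precise than either trial on its own.
    """
    log_a, se_a = a_vs_placebo
    log_b, se_b = b_vs_placebo
    diff = log_a - log_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.exp(diff), math.exp(diff - z * se), math.exp(diff + z * se)


# Hypothetical counts, invented purely for illustration.
trial_a = log_rr_and_se(80, 1000, 100, 1000)   # drug A versus placebo
trial_b = log_rr_and_se(85, 1000, 100, 1000)   # drug B versus placebo
rr, lower, upper = indirect_comparison(trial_a, trial_b)
print(f"Indirect RR of A versus B: {rr:.2f} ({lower:.2f} to {upper:.2f})")
```

Because the uncertainty from both trials is carried into the comparison, the interval around the A versus B estimate is wide even when each drug has been tested in a reasonably large trial, which is one reason indirect comparisons rank below direct ones.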

What constitutes quality in general is captured in Table 1, though there will be some situation-dependent factors. One thing missing from Table 1 is size. There probably needs to be some prior estimate of how many patients or events constitute a reasonable number for analysis.

Levels of evidence for safety

These are shown in Table 2. There are always going to be problems concerning rare, but serious, adverse events. The inverse rule of three tells us that if we have seen no serious adverse events in 1500 exposed patients, then we can be 95% sure that they do not occur more frequently than 1 in 500 patients.
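The arithmetic behind the rule of three can be checked in a few lines. This is a minimal sketch: the function names are our own, and the "exact" version assumes independent patients with a constant event probability.

```python
def rule_of_three_upper_bound(n_exposed):
    """Approximate upper 95% limit on the event rate when no events were seen."""
    if n_exposed <= 0:
        raise ValueError("n_exposed must be positive")
    return 3.0 / n_exposed


def exact_upper_bound(n_exposed, confidence=0.95):
    """Upper limit from solving (1 - p) ** n = 1 - confidence for p."""
    if n_exposed <= 0:
        raise ValueError("n_exposed must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_exposed)


n = 1500
approx = rule_of_three_upper_bound(n)
exact = exact_upper_bound(n)
print(f"Rule of three: at most 1 in {1 / approx:.0f} patients")
print(f"Exact binomial: at most 1 in {1 / exact:.0f} patients")
```

For 1500 exposed patients the approximation gives 3/1500, or 1 in 500, and the exact calculation is only slightly lower, so the shortcut is good enough for practical purposes.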

Table 2: Levels of evidence for safety for class effect

| Level | Type of study | Advantages | Criteria for validity |
|---|---|---|---|
| 1 | RCT | Only design that permits detection of adverse effects when the adverse effect is similar to the event the treatment is trying to prevent | Underpowered for detecting adverse events unless specifically designed to do so |
| 2 | Cohort | Prospective data collection, defined cohort | Critically depends on follow up, classification and measurement accuracy |
| 3 | Case-control | Cheap and usually fast to perform | Selection and recall bias may provide problems, and temporal relationships may not be clear |
| 4 | Phase 4 studies | Can detect rare but serious adverse events if large enough | No control or unmatched control; critically depends on follow up, classification and measurement accuracy |
| 5 | Case series | Cheap and usually fast | Often small sample size, selection bias may be a problem, no control group |
| 6 | Case report(s) | Cheap and usually fast | Often small sample size, selection bias may be a problem, no control group |

Randomised trials of efficacy will usually be underpowered to detect rare, serious adverse events, and we will usually have to use other study designs. In practice the difficulty will be that soon after new treatments are introduced there will be a paucity of data for these other types of study. Only rarely will randomised trials powered to detect rare adverse events be conducted.

Most new treatments are introduced after being tested on perhaps a few thousand patients in controlled trials. Caution is needed with treatments for chronic conditions, especially where trials are only short-term and where other diseases and treatments are likely.


Compliance is a difficult issue, with a fragmented literature. But we do know that while compliance is usually high in clinical trials, it may be lower in practice. Treatment schedules that are likely to improve compliance (once a day, for instance) might be important.


Economic studies are complicated beasts, and we need to treat this evidence with caution. Assumption of a class effect is usually done to justify choosing the cheapest drug in terms of acquisition (prescribing) costs. Terrific if this means that the costs of achieving the same ends are minimised. It may not be like that, and health economics in class effects need to be carefully thought through.


The JAMA paper [1] uses statins as an example, with a decision being taken by clinician and policymaker between older, more expensive statins and newer, cheaper statins. Tactfully, one chooses the cheaper statin with less information, and the other the older and more expensive statin with masses of patient experience. Can you guess who chose what?

Bandolier 47 examined the evidence for some of the older statins, with up to 27,000 years of patient experience and made the point that weight of evidence should be as important as acquisition cost. Having this paper to hand at the time would have been a great help.


McAlister & Sackett extend their thoughts on class effects to the particular example of equivalence trials, and provide some useful guides about what features of equivalence trials are important in determining their validity [2]. The intellectual problem with equivalence (A versus B) trials is that the same result is consistent with three conclusions:

  1. Both A and B are equally effective
  2. Both A and B are equally ineffective
  3. Trials inadequate to detect differences between A and B

To combat the problems posed by the latter two conclusions, McAlister & Sackett suggest several criteria in addition to those used for superiority trials (A and/or B versus placebo). These are shown in Table 3.

Table 3: Evidence quality for superiority and active-control equivalence trials

| Superiority trials | Active-control equivalence trials |
|---|---|
| Randomised allocation | Randomised allocation |
| Randomisation concealed | Randomisation concealed |
| All patients randomised accounted for | All patients randomised accounted for |
| Intention to treat analysis | Intention to treat analysis and on-treatment analysis |
| Clinicians and patients blinded to treatment received | Clinicians and patients blinded to treatment received |
| Groups treated equally | Groups treated equally |
| Groups identical at baseline | Groups identical at baseline |
| Clinically important outcomes | Clinically important outcomes |
| | Active control previously shown to be effective |
| | Patients and outcomes similar to trials previously showing efficacy |
| | Both regimens applied in an optimal fashion |
| | Appropriate null hypothesis tested |
| | Equivalence margin pre-specified |
| Trial of sufficient size | Trial of sufficient size |

Control shown previously to be effective?

Ideally documented in a systematic review of placebo controlled trials with benefits on active drug exceeding a clinically important effect. Without this information both may be equally ineffective.

Patients and outcomes similar to original trials?

Obvious, this one. If they are not, then any conclusion about equivalence is doomed. Beware, though, trials designed to show equivalent efficacy being used to demonstrate differences in harm or toxicity, for which they were not powered.

Regimens applied in identical fashion?

The most common example is that of choosing the best dose of A versus an ineffective dose of B (no names, no pack drill, but no prizes for picking out numerous examples especially from pharmaceutical company sponsored trials showing 'our drug is better than yours'). Should be OK if licensed doses are chosen.

Other pitfalls to look out for are low compliance or frequent treatment changes, incomplete follow up, disproportionate use of cointerventions and lack of blinding.

Appropriate statistical analysis?

Equivalence trials are designed to rule out meaningful differences between two treatments. Often one-sided tests of difference are used. Lack of significant superiority is not necessarily the same as defining an appropriate level of equivalence and testing for it.

Intention to treat analysis confers the risk of making a false-negative conclusion that treatments have the same efficacy when they do not. In equivalence trials the conservative approach may be to compare patients actually on treatment. Both analyses should probably be used.

Prespecified equivalence margin?

How different is different? Equivalence trials should have a prior definition of how big a difference is a difference, and justify it. Even more than that, they have to convince you that the lack of that difference means that treatments would, in fact, be equivalent.
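One way to see how a pre-specified margin is used is to compare the confidence interval for the difference between treatments against it. The sketch below is illustrative only: the function names, the normal-approximation interval, the 95% confidence level, the margin of 5 percentage points and the event counts are all assumptions, not taken from the papers discussed here.

```python
import math
from statistics import NormalDist


def risk_difference_ci(events_a, n_a, events_b, n_b, confidence=0.95):
    """Normal-approximation confidence interval for the risk difference A minus B."""
    p_a, p_b = events_a / n_a, events_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * se, diff + z * se


def equivalence_verdict(ci, margin):
    """Compare the interval with the pre-specified margin (as an absolute risk)."""
    lower, upper = ci
    if -margin < lower and upper < margin:
        return "equivalent within margin"
    if lower > margin or upper < -margin:
        return "different beyond margin"
    return "inconclusive"


margin = 0.05  # hypothetical pre-specified margin: 5 percentage points

# Hypothetical trials with identical event rates (20%) but very different sizes.
small_trial = risk_difference_ci(20, 100, 20, 100)
large_trial = risk_difference_ci(2000, 10000, 2000, 10000)

print("Small trial:", equivalence_verdict(small_trial, margin))
print("Large trial:", equivalence_verdict(large_trial, margin))
```

Both hypothetical trials observe no difference at all, yet only the large one can claim equivalence. The small trial's interval spills well beyond the margin, so 'no significant difference' there means only that the trial was inadequate to detect one.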


Most equivalence trials do not have enough power to detect even a 50% difference between treatments, and a 1994 review [3] found that 84% were too small to detect a 25% difference. Size is everything when we want to show no difference, and the smaller the difference that is important, the larger the trial has to be.
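A rough feel for how trial size grows as the important difference shrinks comes from the standard sample size formula for comparing two proportions. This is a sketch under stated assumptions: a hypothetical 10% event rate on the control treatment, two-sided alpha of 0.05 and 80% power, none of which comes from the papers cited.

```python
import math
from statistics import NormalDist


def patients_per_group(control_rate, relative_difference, alpha=0.05, power=0.80):
    """Approximate patients needed per group to detect a relative difference in event rates."""
    p1 = control_rate
    p2 = control_rate * (1 - relative_difference)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


control_rate = 0.10  # hypothetical event rate on the control treatment
for relative_difference in (0.50, 0.25, 0.10):
    n = patients_per_group(control_rate, relative_difference)
    print(f"{relative_difference:.0%} difference: about {n} patients per group")
```

Because the difference enters the formula squared, halving the difference that matters roughly quadruples the number of patients needed.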


McAlister & Sackett apply their methodological criteria to four large equivalence trials in hypertension. All had failings, and none could detect a 10% difference between treatments. Readers of equivalence trials should beware.

Designating a class effect on a group of drugs, and judging them to be equivalent on inadequate evidence, is something most of us do at some time or another. Because prescribing costs often drive decisions, 'cheapest is best' thinking often applies. Much of the time we will make incorrect decisions, but fortunately won't have the evidence to know that we are wrong. This is important and tricky territory that needs more work.


  1. FA McAlister et al. Users' guides to the medical literature: XIX. Applying clinical trial results. B. Guidelines for determining whether a drug is exerting (more than) a class effect. JAMA 1999 282: 1371-1377.
  2. FA McAlister & DL Sackett. Active-control equivalence trials and antihypertensive agents. American Journal of Medicine 2001 111: 553-558.
  3. D Moher et al. Statistical power, sample size, and their reporting in randomized controlled trials. JAMA 1994 272: 122-124.