Describing results of trials and reviews
Most of the outputs that we use for reporting trials and reviews have their origins in epidemiology, the world where we look for small effects in large populations: things like aspirin after a heart attack, or reducing cholesterol. Conversely, most of the activity of medicine is about large effects in small populations, like hip replacements for osteoarthritic joints, or anaesthesia, or pain relief, or antibiotics for infection. So our example is one most of us will be familiar with.
Table 1 is a hypothetical trial of ibuprofen in acute pain. Not worrying too much at this stage about any other features or even the result itself, we will use this trial to present some of the more common definitions for presentation of results where information is available in dichotomous form. Dichotomous means the patient had the outcome or did not, and we have the numbers for each. In this trial, for instance, 22 of 40 patients given ibuprofen had adequate pain relief compared with only 7 of 40 given placebo. The term experimental event rate (EER) is used to describe the rate that good events occur with ibuprofen (22/40, or 55%) and control event rate (CER) to describe the rate that good events occur with placebo (7/40 or 18%).
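The two event rates can be worked out in a couple of lines. This is a sketch in Python; the variable names are mine, not the article's:

```python
# Counts from the hypothetical trial in Table 1:
# 22 of 40 on ibuprofen and 7 of 40 on placebo had adequate pain relief.
exp_events, exp_total = 22, 40
ctl_events, ctl_total = 7, 40

eer = exp_events / exp_total  # experimental event rate: 0.55, i.e. 55%
cer = ctl_events / ctl_total  # control event rate: 0.175, quoted as 18%
```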
Odds ratios
Table 1 shows first how to compute odds. Odds refers to the ratio of the number of people having the good event to the number not having it, so the experimental event odds are 22/18 or 1.2, and the control event odds are 7/33 or 0.21. The odds ratio is the ratio of the odds with experimental treatment to the odds with control, here 1.2/0.21 = 5.7. There are lots of different ways of computing odds ratios that give slightly different answers in different circumstances. Values greater than 1 show that experimental is better than control, and if a 95% confidence interval is calculated, statistical significance is assumed if the interval does not include 1.
Some would change this around and compute the odds ratios from the point of view of the patients not having adequate pain relief. The experimental event odds would be 18/22 or 0.82, and the control event odds would be 33/7 or 4.7. The odds ratio then would be 0.82/4.7 = 0.17.
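Both directions of the odds-ratio calculation can be checked in the same way (again a sketch; the variable names are mine):

```python
relief_ibu, no_relief_ibu = 22, 18   # ibuprofen: relief / no relief
relief_pla, no_relief_pla = 7, 33    # placebo: relief / no relief

exp_odds = relief_ibu / no_relief_ibu   # 22/18, about 1.2
ctl_odds = relief_pla / no_relief_pla   # 7/33, about 0.21
or_relief = exp_odds / ctl_odds         # about 5.7

# Viewed from the "no adequate relief" side, the odds ratio is the reciprocal:
or_no_relief = (no_relief_ibu / relief_ibu) / (no_relief_pla / relief_pla)
```

Multiplying the two together gives exactly 1, which is why the same trial can honestly be reported as an odds ratio of either 5.7 or 0.17.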
For ibuprofen versus placebo the odds ratio is 5.7 or 0.17, depending on the direction chosen. Pick the bones out of that. How would you use it, other than knowing that an odds ratio far from 1 means that ibuprofen is better than placebo?
Table 1: Results of hypothetical randomised trial

Number with adequate pain relief: Ibuprofen 400 mg 22/40; Placebo 7/40

Experimental event rate (EER, event rate with ibuprofen): 22/40 = 0.55 (55%)
Control event rate (CER, event rate with placebo): 7/40 = 0.18 (18%)
Experimental event odds: 22/18 = 1.2
Control event odds: 7/33 = 0.21
Odds ratio: 1.2/0.21 = 5.7
Relative risk (EER/CER): 0.55/0.18 = 3.1
Relative risk increase (100 x (EER - CER)/CER) as a percentage: 206%
Absolute risk increase or reduction (EER - CER): 0.37
NNT (1/(EER - CER)): 2.7

Relative risk or benefit
Relative risk is a bit easier on the brain. It is simply the ratio of EER to CER, here 0.55/0.18 (or 55/18 for percentages), and is 3.1. Again values greater than 1 show that experimental is better than control, and if a 95% confidence interval is calculated, statistical significance is assumed if the interval does not include 1. Odds ratios and relative risks often give much the same numerical value when event rates are low, but not when they are high. There is disagreement between eminent statisticians about which of these is best. We use relative risk, but wouldn't pick a fight with someone who preferred odds ratios.
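As a quick check on the arithmetic (a sketch, using the raw 22/40 and 7/40 counts from Table 1 rather than the rounded percentages):

```python
# Event rates from Table 1.
eer, cer = 22 / 40, 7 / 40

relative_risk = eer / cer  # (22/40) / (7/40) = 22/7, about 3.1
```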
Again, knowing that the relative risk is 3.1 is not intuitively useful. Both relative risk and odds ratio are important ways of ensuring that there is statistical significance in our result. Unless there is statistical significance, we should not be using a treatment except in exceptional circumstances. So whatever else we do in the way of data manipulation, one or other of these tests has primacy for giving us the right to move on.
Relative risk reduction or increase
The relative risk increase is the difference between the EER and CER (EER - CER) divided by the CER, and is usually expressed as a percentage. In Table 1 the relative risk increase is 206%. If events occur less often with treatment than with control, the relative risk reduction is calculated instead by subtracting the EER from the CER (CER - EER) in the equation.
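Using the rounded percentages the article works with (55 and 18), the calculation looks like this (a sketch):

```python
# Rounded percentages from Table 1: 55% with ibuprofen, 18% with placebo.
eer_pct, cer_pct = 55, 18

rri = 100 * (eer_pct - cer_pct) / cer_pct  # relative risk increase, about 206%
```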
Absolute risk increase or reduction
If we subtract the CER from the EER (EER - CER) then we have the absolute risk increase (ARI), the effect due solely to ibuprofen, and nothing else. The language here doesn't quite work because it was originally taken from the world of epidemiology, where reducing risk (cholesterol lowering and so on) is all. The absolute risk reduction (ARR) is CER - EER, used when events occur more often with control than they do with treatment.
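In code the absolute risk increase is a single subtraction (a sketch with the rounded rates from Table 1):

```python
# Rounded event rates from Table 1.
eer, cer = 0.55, 0.18

ari = eer - cer  # absolute risk increase: 0.37, i.e. 37 per 100 patients
```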
Number needed to treat (NNT)
For every 100 patients with acute pain treated with ibuprofen, 37 (55 - 18) will have adequate pain relief because of the ibuprofen we have given them. Clearly, then, we have to treat 100/37, or 2.7, patients with ibuprofen for one to benefit because of the ibuprofen they have been given. That is what the NNT is (Table 1). It has immediate clinical relevance, because it tells us directly what clinical and other effort is being made to produce one result with a particular intervention.
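The NNT is just the reciprocal of the absolute risk increase (a sketch, again with the rounded rates):

```python
# Rounded event rates from Table 1.
eer, cer = 0.55, 0.18

nnt = 1 / (eer - cer)  # 1/0.37, about 2.7 patients treated for one to benefit
```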
The best NNT would be 1, where everyone got better with treatment and nobody got better with control, and NNTs close to 1 can be found with antibiotic treatments for susceptible organisms, for instance. Higher NNTs represent less effective treatments, and the NNT is a useful tool for comparing two similar treatments. When doing so the NNT must always specify the comparator (e.g., placebo, no treatment, or some other treatment), the therapeutic outcome, and the duration of treatment necessary to achieve that outcome. If these are different, you probably should not be comparing NNTs. It is also worth mentioning that prophylactic interventions that produce small effects in large numbers of patients will have high NNTs, perhaps 20-100. Just because an NNT is large does not mean it will not be a useful treatment.
We can use the same methods for adverse events, when numbers needed to treat become numbers needed to harm (NNH). Here small numbers are bad (more frequent harm) and larger numbers good. When making comparisons between treatments, the same provisos apply as for NNT, especially that for definition.
For both NNT and NNH we should recognise that we are working with an unusual scale which runs from 1 (everyone has the outcome with treatment and no one with control) to -1 (no one has the outcome with treatment and everyone has it with control), with infinity as the mid-point, where we divide by zero because EER equals CER. Once NNTs or NNHs are much above 10, the upper confidence interval gets closer to infinity and the upper and lower intervals look unbalanced.
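One way to see this lopsidedness is to put a confidence interval on the absolute risk increase and then invert it. The normal-approximation method below is an assumption of mine (the article does not specify how its intervals are calculated), but it shows the effect:

```python
import math

# Hypothetical trial from Table 1: 22/40 vs 7/40 with adequate relief.
n_exp = n_ctl = 40
eer, cer = 22 / n_exp, 7 / n_ctl

arr = eer - cer  # absolute risk increase, about 0.37
se = math.sqrt(eer * (1 - eer) / n_exp + cer * (1 - cer) / n_ctl)
arr_low, arr_high = arr - 1.96 * se, arr + 1.96 * se  # 95% CI on the ARR

nnt = 1 / arr
nnt_low, nnt_high = 1 / arr_high, 1 / arr_low  # inverting swaps the limits

# The interval around the NNT is lopsided: the upper limit sits much
# further from the point estimate than the lower limit does.
```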
Other outputs
There are masses of other outputs that people use for trials and epidemiological studies. These include effect size, relative risk reductions and so on. We don't find these useful, but there will always be circumstances in which they are the appropriate outputs.