
Ottawa ankle rules revisited


As long ago as Bandolier 21 the Ottawa ankle rules were featured in these pages as an example of how clinical diagnostic tests can be assessed and evaluated. Since then the rules have been evaluated in numerous studies, so that we now have a meta-analysis [1]. Bandolier also thought that this might be a useful opportunity to contrast traditional methods of describing diagnostic tests (sensitivity, specificity, likelihood ratios) with the natural frequency methods suggested by Gerd Gigerenzer (Bandolier 109).


The review used a systematic search for studies of the Ottawa ankle and foot rules, using several databases and without language restriction. For each study, information was sought on methodological issues, such as whether enrolment was consecutive, whether radiologists were blinded, and whether radiography was used in all patients. Sensitivity was pooled, but specificity was not, and negative likelihood ratios were calculated, with sensitivity analyses.


In total 32 studies were found, some looking at the ankle rules, some at the foot rules, some at both, and while most were in adults, some were in children. In the 27 studies with data for a pooled analysis, 47 of 15,581 patients (0.3%) had a false negative result: the Ottawa test was negative, but they actually had a fracture.

Overall the pooled sensitivity (percentage with a fracture testing positive and correctly classified as such) was 98%, and most studies achieved very high levels of sensitivity. Specificity (the percentage without a fracture who tested negative) was highly variable, some studies being as low as about 10%, with most at about 40%, and some as high as about 70%.
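The two definitions above can be written out as a short Python sketch. The function names and the counts used in the example are illustrative assumptions, not figures taken from any one study in the review.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of people WITH a fracture who test positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of people WITHOUT a fracture who test negative."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts chosen only to illustrate the definitions:
# 100 people with a fracture, 98 of whom test positive;
# 900 people without a fracture, 360 of whom test negative.
print(sensitivity(true_positives=98, false_negatives=2))
print(specificity(true_negatives=360, false_positives=540))
```

With these made-up counts the sketch gives a sensitivity of 98% and a specificity of 40%, which is roughly the pattern described for a typical study.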

The likelihood ratio for a negative test was about 0.1, meaning that with a fracture prevalence of about 10%, the chance of there actually being a fracture after a negative test was about 1 in 100.
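The step from likelihood ratio to chance of fracture goes through odds. A minimal Python sketch of that standard conversion follows; the function name is our own, and the inputs are the rounded figures quoted above (prevalence 10%, negative likelihood ratio 0.1).

```python
def post_test_probability(prevalence, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_test_odds = prevalence / (1 - prevalence)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Rounded figures from the text: 10% prevalence, negative LR of 0.1.
p = post_test_probability(prevalence=0.10, likelihood_ratio=0.1)
print(f"{p:.3f}")                   # 0.011
print(f"about 1 in {round(1 / p)}")  # about 1 in 91
```

The result, a little over 1%, is the "about 1 in 100" quoted for a negative test.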


The Ottawa ankle and foot rules were designed to minimise the number of radiographs needed. Specificity is the key to this. Table 1 shows the calculations for a cohort of 1,000 persons, in whom there were 100 actual fractures, applied to the best, average, and worst specificity values found in the review. As specificity declines, many more positive tests are found, most of them false, and each requires a radiograph, so the case for using the clinical diagnostic test weakens. An ideal specificity of about 0.9 would yield only about 200 positive tests out of 1,000 people. The specificities actually found would yield between about 350 and 900.

Table 1: Results of studies of Ottawa ankle and foot rules, using an ideal scenario, and best, average, and worst specificities from systematic review

                                                  Number of tests       Chance of a fracture in test
Scenario              Sensitivity   Specificity   Positive   Negative   Positive   Negative
Ideal specificity     0.98          0.9           198        802        1 in 2     1 in 400
Best specificity      0.98          0.7           348        652        1 in 4     1 in 326
Average specificity   0.98          0.4           648        352        1 in 7     1 in 176
Worst specificity     0.98          0.1           898        102        1 in 9     1 in 51
Outcomes predicted from a cohort of 1000 people presenting with possible fractured ankle, in which 100 (10%) actually have a broken ankle

With the best specificity found in the review, the probability of a fracture with a negative test was about 1 in 330, and with a positive test about 1 in 4. What we want is a low probability of fracture with a negative test and a high probability with a positive test. The ideal result would mean that 1 in 2 people with a positive test would actually have a fracture, and only 1 in 400 with a negative test.
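The "1 in N" figures come straight from the counts in Table 1. A rough Python sketch of that arithmetic follows, taking the table's numbers of positive and negative tests as given; the variable and function names are ours.

```python
def one_in(cases, total):
    """Express cases out of total as the N in 'about 1 in N'."""
    return round(total / cases)


FRACTURES = 100      # fractures in the cohort of 1,000 (Table 1)
SENSITIVITY = 0.98   # pooled sensitivity from the review

true_positives = round(FRACTURES * SENSITIVITY)  # fractures testing positive
false_negatives = FRACTURES - true_positives     # fractures testing negative

# (positive tests, negative tests) copied from Table 1
scenarios = {
    "Ideal": (198, 802),
    "Best": (348, 652),
    "Average": (648, 352),
    "Worst": (898, 102),
}

for name, (positive, negative) in scenarios.items():
    print(
        f"{name}: 1 in {one_in(true_positives, positive)} with a positive test, "
        f"1 in {one_in(false_negatives, negative)} with a negative test"
    )
```

For the best specificity this gives 1 in 4 with a positive test and 1 in 326 with a negative test, which the text rounds to 1 in 330.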

Figure 1: Ottawa ankle and foot rules using natural frequencies and "best" results for specificity from systematic review


Figure 1 shows how this looks for the example of the best specificity found in the review, using natural frequencies. It allows the calculation of these probabilities rather easily. Now Bandolier has always had problems with sensitivity, specificity, and likelihood ratio definitions. Each time we come to them we open David Sackett's books and start from scratch, and have to use our bespoke Excel sheet to do the calculations. And even then it needs several cups of strong coffee and some aching neurones before we get it. Natural frequencies seem easier to Bandolier, and the output, of chance of disease with positive or negative tests, seems intuitive and useful. Worth persevering with.


  1. LM Bachmann et al. Accuracy of Ottawa ankle rules to exclude fractures of the ankle and mid-foot: a systematic review. BMJ 2003 326: 417-423.