Mindstretcher on Diagnostics

Bandolier is interested in better ways of making diagnoses, and in particular looks for systematic reviews of diagnostic tests. We can be reasonably comfortable about ways of assessing the evidence for treatment trials, knowing about sources of bias from failure to randomise or to blind trials.

But for diagnostic tests the rules are still being written. And worse, we have few examples where such evidence as exists shows the test in a particularly good light. This month, therefore, Bandolier concentrates on some news from the diagnostic test front - some good, some hopeful, but some giving cause for concern. There are two "eat-your-heart-out" reviews, one on foetal fibronectin [1] predicting preterm delivery which we feature. The other on the diagnosis of left-sided heart failure [2] is one of those articles to which no précis can do justice. So for those of you who are interested - read it!

1 Preterm delivery prediction

To some people it is the destination that matters (does this work, and how well does it work?). Others are more interested in the route we travel (how do we know if it works, could we be wrong in our conclusions?). So all the better when one finds an exceptional piece of work which satisfies both these appetites.

A systematic review of the use of cervico-vaginal foetal fibronectin from Dundee [1] is one of the best examples of a diagnostic test review Bandolier has seen to date. It's one of those papers we wish had our name on. Anyone interested in doing a systematic review of a diagnostic test should read this paper. Anyone whose profession involves diagnostics should be ashamed if they don't read it.


Foetal fibronectin is found in amniotic fluid and placental tissue. Mechanical or inflammatory damage to the membranes and placenta before preterm delivery may result in release of foetal fibronectin into cervico-vaginal fluid. So if you take a swab and analyse the fluid for foetal fibronectin, then presto! If it is raised, preterm delivery may be imminent. Well, that's the theory. So the chaps from Dundee searched the literature for appropriate studies. Their searching and methods section, which explains the great lengths they went to in order to legitimise the analysis, is highly detailed and of the highest quality, as is their reporting of the quality issues in the primary studies they found.

So does the test work?

Well sort of. The results are shown in the Table, with pre-test probabilities calculated from the pooled prevalence in the studies and post-test probabilities calculated by applying the likelihood ratios for a positive test (>=50 ng/mL) or negative test (<50 ng/mL) using either laboratory or bedside tests.
Outcome                 Test result  Pretest probability (%)  Likelihood ratio  Post-test probability (%)

Symptomatic women
Delivery <37 weeks      Positive     34 (30 - 37)             4.6 (3.5 - 6.1)   70 (63 - 76)
                        Negative                              0.5 (0.4 - 0.6)   21 (17 - 25)
Delivery <34 weeks      Positive     33 (24 - 41)             2.6 (1.8 - 3.7)   56 (43 - 67)
                        Negative                              0.2 (0.1 - 0.5)   8 (3 - 20)
Delivery within 1 week  Positive     7 (4 - 9)                5.0 (3.8 - 6.4)   26 (18 - 36)
                        Negative                              0.2 (0.1 - 0.4)   1 (0.4 - 3.1)

Asymptomatic women, low risk
Delivery <37 weeks      Positive     25 (22 - 28)             3.2 (2.2 - 4.8)   52 (41 - 63)
                        Negative                              0.8 (0.7 - 0.9)   22 (19 - 26)

Asymptomatic women, high risk
Delivery <37 weeks      Positive     32 (23 - 40)             2.0 (1.5 - 2.6)   48 (37 - 43)
                        Negative                              0.4 (0.2 - 0.8)   17 (9 - 28)
Delivery <34 weeks      Positive     16 (10 - 21)             2.4 (1.8 - 3.2)   31 (21 - 43)
                        Negative                              0.6 (0.4 - 0.9)   10 (6 - 17)

Values in parentheses are 95% confidence intervals.
The key question, according to the authors, is whether delivery is likely within one week of the test being done. Few studies addressed that, but the results were not encouraging. The best that can be said is that in symptomatic women, the combination of low delivery rates in the week following the test, plus a low likelihood ratio of a negative test meant that a negative test in these women gave about a 1% chance of delivery in the following week.
Of course, the test could be used in conjunction with other independent tests, chemical, clinical, or physical to generate better diagnostic accuracy.
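
The post-test probabilities in the table follow from converting the pretest probability to odds, multiplying by the likelihood ratio, and converting back to a probability. A minimal Python sketch of that arithmetic is given here; the function name is ours, for illustration only, and small differences from the table are to be expected because the published figures come from pooled data.

```python
def post_test_probability(pretest, likelihood_ratio):
    """Convert a pretest probability (0 to 1) into a post-test probability."""
    pretest_odds = pretest / (1 - pretest)          # probability -> odds
    post_odds = pretest_odds * likelihood_ratio     # apply the likelihood ratio
    return post_odds / (1 + post_odds)              # odds -> probability


# Symptomatic women, delivery <37 weeks, positive test: pretest 34%, LR 4.6
positive = post_test_probability(0.34, 4.6)
# Symptomatic women, delivery within 1 week, negative test: pretest 7%, LR 0.2
negative = post_test_probability(0.07, 0.2)

print(f"{positive:.0%}")  # about 70%
print(f"{negative:.1%}")  # about 1.5%
```

The second figure is the "about 1%" chance of delivery within a week after a negative test in symptomatic women.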

2 More about Kappa

Some of our correspondents were concerned that Bandolier was unfair to pathologists in emphasising the subjective variation that can occur in what was formerly regarded as the gold standard, namely histopathological diagnosis. In Bandolier 37 we pointed out that there was considerable variation both in how pathologists classified phenomena they were looking at and the meaning they attached to the name they had given particular phenomena they had observed.

Need for Kappa

Kappa is a measure of agreement which takes into account the probability that some agreement will occur by chance. Imagine a situation in which 98% of the population are known to be free from tuberculosis. Anyone looking at 100 X-rays could call all of them negative, safe in the knowledge that they would have at least a 98% agreement with the best radiologist in the world. This is manifestly absurd. So we use Kappa as one way of telling how well we agree. The Kappa scale runs from zero (no better agreement than would be expected by chance) to 1 (perfect agreement); negative values, meaning agreement worse than chance, are possible but unusual.
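
The tuberculosis example can be worked through with a few lines of Python. This is a sketch of Cohen's kappa for two readers, assuming the lazy reader calls all 100 films negative and the expert finds two positives; the function name and the handling of the undefined case are our own choices.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table; table[i][j] counts cases that
    reader A put in category i and reader B put in category j."""
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the two readers' marginal totals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    if expected == 1:
        return 0.0  # assumption: treat the undefined case as "no better than chance"
    return (observed - expected) / (1 - expected)


# Categories: [positive, negative]. Reader A calls all 100 X-rays negative;
# reader B (the expert) finds 2 positives among them.
tb = [[0, 0],
      [2, 98]]
print(cohen_kappa(tb))  # 0.0
```

Raw agreement is 98%, but because 98% agreement is also what chance alone would produce here, kappa is zero.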

3 Diagnosing gastric cancer

The incidence of gastric cancer is reported to be high in Japan, put down to factors like genetics and diet. But no-one has tested the idea that Japanese and Western pathologists may differ about what constitutes gastric cancer.
Eight pathologists from Japan, North America and Europe individually reviewed 35 microscope slides of specimens from 17 Japanese patients [3]. There wasn't a great deal of agreement - as the table shows when suspected carcinomas are grouped with definite carcinomas.

Rows: Western pathologists. Columns: Japanese pathologists.

Western diagnosis                 Adenoma or reactive epithelium   Suspected or definite carcinoma   Total
Adenoma or reactive epithelium    4                                17                                21
Suspected or definite carcinoma   0                                14                                14
Total                             4                                31                                35

When the opinion of the majority of pathologists was taken as a final diagnosis, there was agreement between Japanese and Western pathologists in only 11 of 35 slides. This gave a Kappa of 0.15 (95% confidence interval 0.01 to 0.29). In seven slides, Western pathologists diagnosed low-grade adenoma or dysplasia, whereas the Japanese pathologists diagnosed definite carcinoma in four slides and suspected carcinoma in one. Of the 12 slides which Western pathologists graded as having "high-grade adenoma and dysplasia", the Japanese gave the diagnosis of definite carcinoma in 11 and suspected carcinoma in one.
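
For comparison, chance-corrected agreement can also be computed for the grouped two-by-two table. The sketch below assumes the rows are the Western diagnoses and the columns the Japanese ones (kappa is the same either way round). It is not a reproduction of the published figure of 0.15, which was based on the majority diagnoses across the finer categories.

```python
# Grouped table from the text (assumed layout: rows = Western, columns = Japanese).
# Categories: [adenoma or reactive epithelium, suspected or definite carcinoma]
table = [[4, 17],
         [0, 14]]

n = sum(map(sum, table))                      # 35 slides
observed = (table[0][0] + table[1][1]) / n    # raw agreement on the diagonal
rows = [sum(r) for r in table]                # Western totals: 21, 14
cols = [sum(c) for c in zip(*table)]          # Japanese totals: 4, 31
expected = sum(r * c for r, c in zip(rows, cols)) / n**2
kappa = (observed - expected) / (1 - expected)

print(f"raw agreement {observed:.0%}, by chance {expected:.0%}, kappa {kappa:.2f}")
```

Even with suspected and definite carcinomas lumped together, raw agreement of about half the slides shrinks to a kappa of roughly 0.16 once chance is allowed for.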
There is a "so what?" question hanging here. Does it matter? An accompanying editorial [4] concludes that some Japanese patients may have unnecessary resections, but that some Western patients with lesions at high risk of progressing to advanced cancer may remain untreated.

4 More prostate cancer than you think

The gold standard for diagnosis of prostate cancer has become the sextant biopsy. This is a technique where men suspected of having cancer (raised PSA, or symptoms, or abnormal rectal examination) have the gland biopsied with six needles. Usually done under sedation, this is eye-wateringly painful. Pathologists then look to see if they can find cancerous tissue in these six samples of prostate gland.
But there is an old adage, that the more you look, the more you find. If you were to have your prostate biopsied in North Carolina, the chances are that a different biopsy technique would be used, one that takes 13 samples of the prostate gland [5]. In a series of 119 men having biopsies, 31 were shown to have cancer using biopsies from the usual sextant position, but an additional 17 were found using all 13 biopsy specimens (35% of the 48 cancers detected, or 55% more than the sextant biopsies found alone). Most of the additional 17 cancers detected were fairly advanced.
This is mind-blowing stuff if repeated. It means re-thinking much of what we think we know about prostate cancer screening, because all those strategies which have been, or are being, tested are predicated upon the sextant biopsy as the gold standard. But if the gold standard is tarnished, what do we think then?

5 New PSA test may help

PSA in plasma is predominantly bound to alpha-1-antichymotrypsin. Tests for measuring the percentage of free PSA (that fraction not bound to proteins like alpha-1-antichymotrypsin) are being used in making the differential diagnosis of prostate cancer and BPH. This may have real potential, not least because the Baltimore Longitudinal Study of Ageing has indicated, albeit in a small number of men, that the percentage of free PSA in serum is predictive of tumour aggression [6].

Combining percentage free PSA and prostate volume appears to have exciting prospects, though only two studies have looked at this as yet. One [7] showed that while only 11% of men with BPH and a prostate volume below 30 mL had a percentage of free PSA below 15%, 52% of men with prostate cancer did so. Even better results were reported in the same issue of the same journal [8]. Using a different analysis system, researchers found that the combination of a prostate volume below 40 mL and a percentage of free PSA below 15% misclassified only one of 16 men with BPH and only one of 26 men with prostate cancer, giving a sensitivity of 94%, specificity of 96% and likelihood ratio of 23.5. We need these observations confirmed in larger prospective studies, because with a prevalence of prostate cancer of 30-40% in patients referred to urologists, a positive result gives a post-test probability of over 90%, and the test could therefore save some unnecessary biopsies.
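
The likelihood ratio and post-test probability quoted above can be checked with a short Python sketch. It takes the stated sensitivity and specificity at face value rather than recomputing them from the counts, and the function names are ours.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)


def post_test_probability(pretest, likelihood_ratio):
    """Pretest probability -> odds, times the LR, -> post-test probability."""
    odds = pretest / (1 - pretest) * likelihood_ratio
    return odds / (1 + odds)


lr = positive_likelihood_ratio(0.94, 0.96)   # figures as quoted in the text
for prevalence in (0.30, 0.40):              # referral prevalence range from the text
    p = post_test_probability(prevalence, lr)
    print(f"prevalence {prevalence:.0%}: LR+ {lr:.1f}, post-test {p:.0%}")
```

With these inputs the likelihood ratio is 23.5 and the post-test probability runs from about 91% at a prevalence of 30% to about 94% at 40%.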

6 Colonoscopy can miss the point

Now you don't see it, now you do! That is the message from a carefully conducted study from Indiana [9]. Patients (186 of them) who needed colonoscopy, and who were able to take two examinations in one day, underwent a second examination following a first examination done in a standard way. All examiners had done more than 500 colonoscopies. The second examination, to find the number of missed adenomas, was randomised between four strategies:
  • same position, same examiner
  • different position, same examiner
  • same position, different examiner
  • different position, different examiner

In the initial examination 289 adenomas were found. The second colonoscopy found 89 more, a "miss rate" of 24% (89/378), or rather a miss rate of 31% expressed as a percentage of the first examination. How the second examination was carried out did not seem to matter. The misses included two large (>1 cm) adenomas, and the miss rate increased with decreasing size, as the diagram below demonstrates. Rather different from the marked agreement we saw between endoscopists in Bandolier 38.
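
The two miss rates differ only in their denominator, as this small Python sketch shows. The function name is ours, and "all adenomas" here means all those found across the two examinations, since the true total is unknown.

```python
def miss_rates(found_first, found_second_only):
    """Return (misses as a share of all adenomas found,
    misses relative to those found at the first examination)."""
    total = found_first + found_second_only
    return found_second_only / total, found_second_only / found_first


of_all, of_first = miss_rates(289, 89)
print(f"{of_all:.0%} of all adenomas; {of_first:.0%} relative to the first examination")
```

This gives 24% (89/378) and 31% (89/289) respectively.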

The interesting question is whether much that is important is being missed. A commentary to the paper in the ACP Journal Club suggests not [10]. But it does ask a really interesting question about how trials to prevent adenomatous polyp growth are judged. If the end point of such trials is detecting polyps on colonoscopy after treatment, but colonoscopy can miss 24% of adenomas (or 31% relative to those seen initially) without anyone knowing, then maybe the negative results of such trials aren't really negative at all.


So there are mixed messages coming from the diagnostics world. But perhaps that is not unexpected with people working hard to try and find ways forward in tricky territory. The case of the new PSA test is not atypical. Studies which look at a new test, but where the gold standard verification is flawed, may be judged unjustly.

There may be the odd step backwards, but at least it is accompanied by two steps forward. Anyone wanting to get their brains around some of the difficult issues involved with using an 'evidence-based' prefix in diagnostics could do little better than read Adrian Dixon's comments on diagnostic radiology [11].


  1. PFW Chien, KS Khan, S Ogston, P Owen. The diagnostic accuracy of cervico-vaginal fetal fibronectin in predicting pre-term delivery: an overview. British Journal of Obstetrics and Gynaecology 1997 104: 436-44.
  2. RG Badgett, CR Lucey, CD Mulrow. Can the clinical examination diagnose left-sided heart failure in adults? Journal of the American Medical Association 1997 277: 1712-9.
  3. RS Schlemper, M Itabashi, Y Kato, et al. Difference in diagnostic criteria for gastric carcinoma between Japanese and Western Pathologists. Lancet 1997 349: 1725-29.
  4. J Sakamoto, M Yasue. Do Japanese statistics on gastric carcinoma need to be revised? Lancet 1997 349: 1711.
  5. LA Eskew, RL Bare, DL McCullough. Systematic 5 region prostate biopsy is superior to sextant method for diagnosing carcinoma of the prostate. Journal of Urology 1997 157: 199-203.
  6. HB Carter, AW Partin, AA Luderer et al. Percentage of free prostate-specific antigen in sera predicts aggressiveness of prostate cancer a decade before diagnosis. Urology 1997 49: 379-84.
  7. S Egawa, S Soh, M Ohori et al. The ratio of free to total serum prostate specific antigen and its use in differential diagnosis of prostate carcinoma in Japan. Cancer 1997 79: 90-8.
  8. C Stephan, M Lein, K Jung, D Schnorr, SA Loening. The influence of prostate volume on the ratio of free to total prostate specific antigen in serum of patients with prostate carcinoma and benign prostate hyperplasia. Cancer 1997 79: 104-9.
  9. DK Rex, CS Cutler, GT Lemmel et al. Colonoscopic miss rates of adenomas determined by back-to-back colonoscopies. Gastroenterology 1997 112: 24-8.
  10. AI Neugent. Commentary on Rex et al. ACP Journal Club 1997 July/August, 16.
  11. AK Dixon. Evidence-based diagnostic radiology. Lancet 1997 350: 509-12.
