
Bias in diagnostic testing

Levels of evidence


One commonly used description of levels of evidence is shown below. The keys to good quality are independence, masked comparison with a reference standard, and consecutive patients from an appropriate population. Lower quality comes from inappropriate populations and comparisons that are not masked or that use different reference standards. Other standards have been applied to diagnostic tests, as reported in Bandolier 26.

Levels of evidence for studies of diagnostic methods

Level Criteria
1 An independent, masked comparison with a reference standard among an appropriate population of consecutive patients.
2 An independent, masked comparison with a reference standard among non-consecutive patients, or confined to a narrow population of study patients.
3 An independent, masked comparison in an appropriate population of patients, but with the reference standard not applied to all study patients.
4 Reference standard not applied independently or masked.
5 Expert opinion with no explicit critical appraisal, based on physiology, bench research, or first principles.

Bias in diagnostic test studies


What we have lacked up to now is proof that poor study design is associated with bias. A new contribution from Holland [1] provides the missing link.

The study searched for and found 26 systematic reviews of diagnostic tests, each with at least five included studies. Only 11 could be used in the analysis: 15 were excluded because they were either not systematic in their searching or did not report sensitivity or specificity. Data from the remaining 11 reviews were subjected to mathematical analysis, to investigate whether the presence or absence of a proposed item of study quality made a difference to the perceived value of the test.

There were 218 studies, only 15 of which satisfied all eight quality criteria used in the analysis; 30% fulfilled at least six of the eight. The relative diagnostic odds ratio compares the diagnostic performance of a test in studies failing to satisfy a methodological criterion with its performance in studies satisfying it. Over-estimation of the effectiveness of a diagnostic test (positive bias) was shown when the lower bound of the 95% confidence interval for the relative diagnostic odds ratio was greater than 1.
Study characteristic           Relative diagnostic odds ratio (95% CI)   Description
Case-control design            3.0 (2.0 to 4.5)   A group of patients already known to have the disease compared with a separate group of normal patients
Different reference tests      2.2 (1.5 to 3.3)   Different reference tests used for patients with and without the disease
Not blinded                    1.3 (1.0 to 1.9)   Interpretation of the test and the reference standard not blinded to each other's results
No description of test         1.7 (1.1 to 2.5)   Test not properly described
No description of population   1.4 (1.1 to 1.7)   Population under investigation not properly described
No description of reference    0.7 (0.6 to 0.9)   Reference standard not properly described

The relative diagnostic odds ratio indicates the diagnostic performance of a test in studies failing to satisfy a methodological criterion relative to its performance in studies satisfying it.

The results are shown in the Table. Use of different reference tests, lack of blinding, and lack of a description of either the test or the population in which it was studied all led to positive bias. But the largest source of positive bias was evaluating a test in a group of patients already known to have the disease and a separate group of normal patients, called a case-control study here.
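A brief sketch may help make the relative diagnostic odds ratio concrete. The sensitivity and specificity figures below are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
# Sketch of the diagnostic odds ratio (DOR) and the relative DOR.
# All sensitivity/specificity values here are hypothetical.

def diagnostic_odds_ratio(sensitivity, specificity):
    """DOR = (sens / (1 - sens)) / ((1 - spec) / spec)."""
    return (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity)

# The same test might look sharper in a flawed case-control design...
dor_flawed = diagnostic_odds_ratio(0.90, 0.85)
# ...than in a study of consecutive patients from a relevant population.
dor_sound = diagnostic_odds_ratio(0.80, 0.75)

# A relative DOR above 1 means the flawed design made the test look better.
relative_dor = dor_flawed / dor_sound
print(f"flawed DOR {dor_flawed:.1f}, sound DOR {dor_sound:.1f}, "
      f"relative DOR {relative_dor:.1f}")
```

With these illustrative numbers the flawed design yields a diagnostic odds ratio roughly four times that of the sound one, which is the kind of inflation the table quantifies across real studies.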

Comment


The amount of positive bias in poorly conducted studies of diagnostic tests is extremely worrying. Most information for most laboratory tests is available only in the form of case-control studies, the design with the highest bias.

Take one example, that of the fashionable free-PSA test [2]. The likelihood ratios from the early studies were 2 to 7. This might be useful in a population of men referred to a urology clinic with prostate cancer or benign prostatic hyperplasia (BPH), but most of the studies were case-control studies. If the likelihood ratios were biased, and in truth were lower, the test may be of no use even in a high-prevalence setting.
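The impact of a biased likelihood ratio can be sketched by converting a pre-test probability to a post-test probability via Bayes' theorem in odds form. The 30% pre-test probability and the specific likelihood ratio values below are illustrative assumptions, not figures from the paper.

```python
# Why a bias-inflated likelihood ratio matters: it exaggerates how much
# a positive test shifts the probability of disease. Prevalence and LR
# values here are illustrative only.

def post_test_probability(pre_test_prob, likelihood_ratio):
    """Bayes via odds: post-test odds = pre-test odds * LR."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Assume a high-prevalence clinic population with 30% pre-test probability.
for lr in (7, 2, 1.2):  # reported high, reported low, and a bias-deflated value
    print(f"LR {lr}: post-test probability {post_test_probability(0.30, lr):.2f}")
```

Under these assumptions a likelihood ratio of 7 raises a 30% pre-test probability to 75%, a ratio of 2 takes it only to about 46%, and a bias-deflated ratio of 1.2 barely moves it at all, which is why inflated likelihood ratios from case-control studies matter clinically.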

It is all very worrying. It is time someone in academe, or the NHS, or industry sat up and took notice. The problem is not just with treatment; it is knowing who is to be treated. The message is that we need to get back to first principles and do some large, high-quality, real-life studies. CARE has started that for the clinical examination, but there is no reason why similar studies could not be performed in other settings for laboratory tests and clinical examinations combined.

References:

  1. JG Lijmer et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999; 282: 1061-6.
  2. RA Moore. Free PSA as a percentage of the total: where do we go from here? Clinical Chemistry 1997; 43: 1561-2.