
Pathology as art appreciation: melanoma diagnosis

Have you ever considered just how much a pathologist is like an art expert? Just as a Lord Clark (of Civilisation) could spot a 17th century Italian masterpiece, know a Cézanne by the particular character of its blues, or pick out an original Seurat from a pile of copies by the nature of the little blobs the artist employed, so a pathologist has to recognise one set of visual impressions as being cancer or not.

Histopathology has long been regarded as providing a "gold standard" diagnosis against which all others have to be measured. This gold standard is falling in value after being tested and found to be less than 24 carat. There are two main weaknesses - both pretty fundamental - inter-observer variability and the lack of statistically based predictive power of its "diagnoses".

Impressionism in pathology

Inter-observer variability (how different pathologists interpret a slide) has to be distinguished from intra-observer variability (how the same pathologist interprets the same slide at different times). Even if the same specimen is shown to the same observer there is not always perfect agreement between the two opinions.

The major problem, though, is the variability between observers. When any paper assesses the accuracy of a diagnostic test which depends upon an individual's perception, on their vision or hearing, or on some subjective decision-making process, it is essential to ensure that the study includes an assessment of the degree of inter-observer variability. Any such paper should use a statistical technique - of which kappa is the most widely used - to assess the degree of inter-observer variability.

Agreement by chance

Techniques such as kappa allow for agreement by chance, taking into account the prevalence of the condition being assessed, so that the degree of agreement can be measured where agreement or disagreement actually matters. Kappa (κ) runs from 0 (agreement no better than chance) to 1 (perfect agreement); negative values, meaning agreement worse than chance, are possible but rare in practice. A kappa score greater than 0.4 is taken to indicate that agreement is becoming reasonable, while 0.6 or above indicates good agreement.
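For two observers the calculation is simple: compare the observed proportion of agreement with the agreement expected by chance from each observer's marginal rates. A minimal sketch in Python (the labels below are invented for illustration, not taken from any of the studies discussed here):

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters.

    rater_a, rater_b: equal-length sequences of category labels,
    one entry per case (e.g. 'M' for malignant, 'B' for benign).
    """
    assert len(rater_a) == len(rater_b) and len(rater_a) > 0
    n = len(rater_a)

    # Observed proportion of cases on which the raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_e = sum((count_a[l] / n) * (count_b[l] / n) for l in labels)

    # Chance-corrected agreement.
    return (p_o - p_e) / (1 - p_e)

# Four invented cases: the raters agree on 3 of 4.
print(cohen_kappa(['M', 'M', 'B', 'B'], ['M', 'B', 'B', 'B']))  # 0.5
```

With these made-up labels the observed agreement is 0.75 but half of that is expected by chance, so κ comes out at 0.5 - the same "moderate" level reported for the melanoma panel below 0.6.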

A recent editorial in the Journal of Pathology [1] pointed out that apart from grouping lesions into broad categories (benign versus malignant), poor agreement is not infrequent. Some examples from the literature (albeit generally with small numbers of pathologists, but with perhaps more attention given than is the norm in clinical practice) quoted in the editorial are reproduced in the table below.

Diagnosis of melanoma

As part of the National Institutes of Health consensus conferences on the diagnosis and treatment of early melanoma, a study was conducted to review pathology specimens and measure inter-pathologist agreement [2]. A panel of eight pathologists expert in melanoma diagnosis was selected. Each submitted five cases (slide plus clinical history and eventual diagnosis), which they considered "classic" cases of melanomas or melanocytic nevi that shared histological features with melanomas.

From these, 37 cases were selected and anonymised. Slides with histories (but not eventual diagnoses) were then randomised independently of the organising group. The slides were sent to each panel member in turn, so that the same glass slides were used by each panel member. Each case was to be scored as benign, malignant, or indeterminate. Any description other than this had to be defined.


All 37 cases were reviewed, without loss or breakage. Eight benign cases and five malignant cases were agreed unanimously. Lack of unanimous agreement occurred in the remaining 24 cases (65%). Two or more discordant diagnoses were made in 14 cases (38%) and discordance was three or more in 8 cases (22%). The kappa was 0.5, indicating only moderate agreement.
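The study's kappa of 0.5 pooled agreement across eight raters, for which Fleiss' kappa is the standard generalisation of the two-rater statistic. A minimal sketch, with invented counts rather than the study's actual per-case data:

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for m raters assigning n subjects to k categories.

    ratings: one dict per subject mapping category -> number of raters
    choosing it; every subject must be rated by the same m raters.
    """
    n = len(ratings)
    m = sum(ratings[0].values())  # raters per subject (assumed constant)
    categories = {c for r in ratings for c in r}

    # Overall proportion of all assignments falling in each category.
    p_j = {c: sum(r.get(c, 0) for r in ratings) / (n * m) for c in categories}

    # Per-subject agreement: proportion of rater pairs that agree.
    P_i = [(sum(cnt * cnt for cnt in r.values()) - m) / (m * (m - 1))
           for r in ratings]
    P_bar = sum(P_i) / n                      # mean observed agreement
    P_e = sum(p * p for p in p_j.values())    # chance agreement

    return (P_bar - P_e) / (1 - P_e)

# Three invented cases, eight raters each, all unanimous -> kappa = 1.
print(fleiss_kappa([{'M': 8}, {'B': 8}, {'B': 8}]))  # 1.0
```

Feeding in the real vote counts per slide (malignant/benign/indeterminate, eight votes per case) would reproduce the panel's pooled figure; the counts above are placeholders only.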

It is illuminating to look at the extremes. One expert (and these were all experts in melanoma, don't forget) thought 21 cases were malignant and 16 were benign. Another thought 10 were malignant, 26 benign, and one indeterminate. Between them, these two pathologists disagreed on 12 out of 37 cases, and in 11 cases one pathologist identified a case as malignant while the other identified the same case as benign.

An accompanying editorial [3] speaks of "shattered illusions". It comments that "the conclusions of the article ... should be chilling not only to physicians, but to patients, and sobering to lawyers for plaintiffs."

Names or diagnoses

Even if there is perfect agreement that a collection of cells or a piece of tissue can be given a certain name, the name itself may be meaningless. It may simply not convey that a person is at risk of disease, or has a disease. In the past these names were called "diagnoses". A diagnosis implies that a patient has a disease with a certain prognosis, established scientifically. This is not always the case. A name may be just a label for appearances that histopathologists name that way - like "impressionism" or "cubism" - not a diagnosis like "tuberculosis".

An excellent leader in The Lancet [4] emphasised the danger of labelling something with a name which might have harmful consequences for the patient, both physical and psychological, without any benefit. The title, of which Bandolier would have been proud, was "Carcinoma-in-situ of the breast: have pathologists run amok?" The author, a pathologist in New Mexico, delivers a trenchant attack on "in-situ diagnosis" and suggests that pathologists' monopoly on giving names to the things they see should be challenged. He argues that "their diagnoses of carcinoma-in-situ have transmitted more fear than knowledge into the clinical arena". He calls for "terminology consumers" to start saying what names and classifications they would find most helpful. He finishes up with the striking sentence that "it does not require specialist training in pathology to recognise that the patient's diagnosis should not be an anachronism sustained by anecdotes, conjecture and tradition".

Eels in grease

Diagnostic tests, of whatever type and done in any specialty, have their problems. Tackling these problems (and particularly trying to describe ways in which they can be dealt with) is like trying to hold on to eels in grease. Similar problems occur in other areas, and in diagnosis done in test tubes. Bandolier was moved to run this article because of the realistic attitude being taken by pathologists about their profession and by the forceful attitudes expressed in both the British and American pathology journals. Do our readers have examples of other diagnostic tests, of whatever sort, that can lead us astray or do extremely well?


  1. KA Fleming. Evidence-based pathology. Journal of Pathology 1996 179: 127-8.
  2. ER Farmer, R Gonin, MP Hanna. Discordance in the histopathologic diagnosis of melanoma and melanocytic nevi between expert pathologists. Human Pathology 1996 27: 528-31.
  3. AB Ackerman. Discordance among expert pathologists in diagnosis of melanocytic neoplasms. Human Pathology 1996 27: 1115-6.
  4. E Foucar. Carcinoma-in-situ of the breast: have pathologists run amok? Lancet 1996 347: 707-8.
