# Mindstretcher: Logistic regression explained

A tutorial from Professor Peter Freeman in sunny Cornwall:

So Bandolier confesses to finding logistic regression analysis impenetrable. Well said! So, I suspect, does the majority of the medical community, so let's have a go at explaining.

The example that Bandolier reported was typical. There was one outcome variable, liver cirrhosis confirmed by biopsy, and a whole host of explanatory variables, clinical observations as well as biochemistry results. The aim was to find a subset of the explanatory variables that can be combined to predict the value of the outcome variable. This is the usual technique that statisticians call regression analysis. It is done using tons of complicated arithmetic, but the result is a regression equation with the outcome variable on the left-hand side and a combination of the explanatory variables on the right. For any future patient, the values of their explanatory variables can be fed into the equation to predict the value of their outcome variable.

The important result was that liver cirrhosis can be predicted with high accuracy using selected clinical observations only, and that biochemistry readings are of little or no extra help in the diagnosis. Even better, only six of the 25 clinical factors are important, so good diagnosis needs observation of just abdominal wall veins, facial telangiectasia, fatness, peripheral oedema, vascular spiders and white nails.

For technically-minded readers who are curious about the mysterious word "logistic" that pops up before "regression", this is simply a little mathematical trick to get over the snag that in this example the risk of liver cirrhosis can only take values between 0 and 1. Something called a logistic transformation is used to change this into something more numerically convenient, with a reverse transformation to get us back again to the real world afterwards.
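For readers who like to see the trick in action, here is a minimal sketch of the transformation and its reverse in Python. The function names `logit` and `inverse_logit` are my own labels, not anything taken from the paper, and the three risks are arbitrary illustrative values.

```python
import math

def logit(risk):
    """Logistic transformation: a risk strictly between 0 and 1 becomes
    log-odds, which can be any number, positive or negative."""
    return math.log(risk / (1.0 - risk))

def inverse_logit(log_odds):
    """Reverse transformation: log-odds back to a risk between 0 and 1."""
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)

# Round trip for a few illustrative risks (values chosen arbitrarily).
for risk in (0.05, 0.5, 0.95):
    z = logit(risk)
    print(f"risk={risk:.2f}  log-odds={z:+.2f}  back again={inverse_logit(z):.2f}")
```

The point of the detour is that log-odds are free to roam over the whole number line, which is what ordinary regression arithmetic wants, while the reverse step always lands back between 0 and 1.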

### Some sums (not too hard)

Readers who are still game for more enlightenment might like to try working through the following details:

Table 3 of the paper by Hamberg et al gives the clinical model logistic regression equation as

LOGODDS = -4.18 + 3.01 × FAC + 1.80 × VAS + 1.75 × WHI + 1.48 × ABD + 1.07 × FAT - 1.28 × PER

where:
• FAC = facial telangiectasia
• VAS = vascular spiders
• WHI = white nails
• ABD = abdominal wall veins
• FAT = fatness
• PER = peripheral oedema

and the funny word on the left-hand side of the equation is just the mathematical way of writing the logistic transformation: it is the natural logarithm of the odds, where ODDS = RISK / (1 - RISK).
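The equation is easy to write down as a few lines of Python. This is only a sketch: the coefficients are copied from the equation above, but the names `INTERCEPT`, `COEFFICIENTS` and `log_odds` are mine, and the coding of each sign as 1 for present and 0 for absent is an assumption, since the paper does not state it explicitly.

```python
# Coefficients copied from the regression equation (Table 3 of the paper).
INTERCEPT = -4.18
COEFFICIENTS = {
    "FAC": 3.01,   # facial telangiectasia
    "VAS": 1.80,   # vascular spiders
    "WHI": 1.75,   # white nails
    "ABD": 1.48,   # abdominal wall veins
    "FAT": 1.07,   # fatness
    "PER": -1.28,  # peripheral oedema (note the negative sign)
}

def log_odds(signs):
    """Return LOGODDS for one patient.

    `signs` maps a sign code to 1 (present) or 0 (absent). This 0/1 coding
    is an assumption; codes left out of the dict are treated as absent.
    """
    return INTERCEPT + sum(coef * signs.get(code, 0)
                           for code, coef in COEFFICIENTS.items())

print(round(log_odds({}), 2))          # no signs at all: just the constant term
print(round(log_odds({"FAC": 1}), 2))  # facial telangiectasia only
```

A patient with none of the six signs is left with the constant term alone, which is why that term is so strongly negative: it represents the low baseline risk before any sign is seen.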

Now I think (although the paper does not say so explicitly) that the variables on the right-hand side take the value 1 if the clinical sign is present and the value 0 if it is absent. If this is so, then a patient who showed all six clinical signs would have a risk of liver cirrhosis given by

LOGODDS = -4.18 + 3.01 + 1.80 + 1.75 + 1.48 + 1.07 - 1.28

= 3.65

and so

ODDS = e^3.65 = 38.47
(any scientific calculator has a button for this)

and hence

RISK = ODDS / (1 + ODDS) = 38.47/39.47 = 0.97 or 97%
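In case the last step looks like sleight of hand, here is where it comes from. It is nothing more than the definition of odds turned inside out. The symbol p is my own shorthand for RISK, and the block assumes the `amsmath` package for `align*` and `\text`.

```latex
\begin{align*}
\text{ODDS} &= \frac{p}{1 - p} \\
\text{ODDS}\,(1 - p) &= p \\
\text{ODDS} &= p\,(1 + \text{ODDS}) \\
p &= \frac{\text{ODDS}}{1 + \text{ODDS}}
\end{align*}
```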

while a patient who showed only white nails and fatness would have risk

LOGODDS = -4.18 + 1.75 + 1.07 = -1.36

ODDS = e^-1.36 = 0.257

RISK = 0.257 / 1.257 = 0.20 or 20%
QED
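Readers without a scientific calculator to hand can check both sets of sums with a short Python sketch. The coefficients come from the equation quoted earlier; the function name `risk_of_cirrhosis` and the idea of passing a list of the signs that are present are my own choices, and the sketch rests on the same assumption that each sign is coded 1 if present and 0 if absent.

```python
import math

# Coefficients copied from the regression equation (Table 3 of the paper).
INTERCEPT = -4.18
COEFFICIENTS = {
    "FAC": 3.01, "VAS": 1.80, "WHI": 1.75,
    "ABD": 1.48, "FAT": 1.07, "PER": -1.28,
}

def risk_of_cirrhosis(signs_present):
    """Return (log-odds, odds, risk) given the codes of the signs observed.

    Assumes 0/1 coding: a sign that is present adds its coefficient,
    a sign that is absent adds nothing.
    """
    log_odds = INTERCEPT + sum(COEFFICIENTS[code] for code in signs_present)
    odds = math.exp(log_odds)     # undo the logarithm
    risk = odds / (1.0 + odds)    # convert odds back to a probability
    return log_odds, odds, risk

patients = [
    ("all six signs", list(COEFFICIENTS)),
    ("white nails and fatness", ["WHI", "FAT"]),
]
for label, signs in patients:
    log_odds, odds, risk = risk_of_cirrhosis(signs)
    print(f"{label}: LOGODDS={log_odds:.2f}  ODDS={odds:.2f}  RISK={risk:.0%}")
```

The code carries the odds forward unrounded, so the risks it prints may differ in the last digit from sums done by hand with figures rounded at each step.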

Professor Peter Freeman