Astrology, illness, and chance

We depend a lot on statistical testing to tell us what to think about a result and how much weight to put on it, but often forget that statistics (and chance) are themselves subject to rules. For instance, by setting a 95% confidence limit on “normal” values, we automatically define 5% of results as “abnormal”. In another example, Bandolier 105 examined the DICE studies. One showed that when dice were rolled to simulate trials, about 1 in 20 of the simulated trials was statistically significant at the 5% level (a p value below 0.05), which is what one would expect by chance alone when there is no real difference. Another showed that subgroup analysis of homogeneous data produced spurious results of high statistical significance.
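
The 1 in 20 figure can be checked with a small simulation. The sketch below is not the design of the DICE studies: it assumes two arms of 200 dice rolls each, counts a rolled six as an "event", and compares the arms with a pooled two-proportion z test. The arm size, number of trials, seed, and choice of test are all illustrative assumptions.

```python
import math
import random


def two_proportion_p_value(events_a, events_b, n_per_arm):
    """Two-sided p value from a pooled two-proportion z test (equal arm sizes)."""
    pooled = (events_a + events_b) / (2 * n_per_arm)
    variance = pooled * (1 - pooled) * (2 / n_per_arm)
    if variance == 0:
        return 1.0  # no events, or all events: no evidence of a difference
    z = (events_a - events_b) / n_per_arm / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_null_trials(n_trials=2000, n_per_arm=200, seed=105):
    """Fraction of simulated trials with p < 0.05 when both arms use the same fair die."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_trials):
        # An "event" is rolling a six; both arms have the same true event rate.
        events_a = sum(rng.randint(1, 6) == 6 for _ in range(n_per_arm))
        events_b = sum(rng.randint(1, 6) == 6 for _ in range(n_per_arm))
        if two_proportion_p_value(events_a, events_b, n_per_arm) < 0.05:
            significant += 1
    return significant / n_trials


fraction = simulate_null_trials()
print(f"Fraction of null trials significant at 0.05: {fraction:.3f}")
```

Because both arms are rolled from the same die, every "significant" trial here is a false positive, and the fraction should sit close to 0.05.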

The perils of multiple statistical testing might have been drummed into us during our education, but as researchers we often forget them in the search for “results”, especially when such testing confirms our pre-existing biases. A large and thorough examination of multiple statistical tests [1] underscores the problems this can pose.

Study


This population-based retrospective cohort study used linked administrative databases that covered 10.7 million residents of Ontario aged 18-100 years who were alive and had a birthday in the year 2000. Before any analyses, the database was split in two to provide derivation and validation cohorts of about 5.3 million persons each, so that associations found in one cohort could be checked in the other.

All admissions to Ontario hospitals classified as urgent (but not elective or planned) were used, with diagnoses defined using DSM criteria and ranked by frequency. For each diagnosis, the data were used to determine which persons were admitted within the 365 days following their birthday in 2000, and the proportion admitted under each astrological sign. The astrological sign with the highest hospital admission rate was then tested statistically against the rate for all 11 other signs combined, using a significance level of 0.05. Diagnoses were worked through in this way until two statistically significant diagnoses had been identified for each astrological sign.
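
The selection step described above, picking the sign with the highest admission rate and only then testing it against the rest, can be sketched on simulated data in which sign has no effect at all. Everything in the sketch is an assumption made for illustration: 12 equal-sized sign groups of 100,000 people, 600 admissions per diagnosis scattered across signs at random, and a pooled two-proportion z test (the article does not say which test the study used).

```python
import math
import random

N_SIGNS = 12
PEOPLE_PER_SIGN = 100_000        # hypothetical: equal-sized sign groups
ADMISSIONS_PER_DIAGNOSIS = 600   # hypothetical: admissions for one diagnosis


def p_value_top_sign_vs_rest(counts, people_per_sign):
    """Two-sided p value for the highest-rate sign against all other signs combined."""
    top = max(counts)
    rest = sum(counts) - top
    n_top = people_per_sign
    n_rest = people_per_sign * (len(counts) - 1)
    pooled = (top + rest) / (n_top + n_rest)
    variance = pooled * (1 - pooled) * (1 / n_top + 1 / n_rest)
    if variance == 0:
        return 1.0
    z = (top / n_top - rest / n_rest) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


def fraction_significant(n_diagnoses=1000, seed=2000):
    """Fraction of null diagnoses where the top sign tests as significant at 0.05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_diagnoses):
        # Each admission lands on a sign purely at random: no true association.
        counts = [0] * N_SIGNS
        for _ in range(ADMISSIONS_PER_DIAGNOSIS):
            counts[rng.randrange(N_SIGNS)] += 1
        if p_value_top_sign_vs_rest(counts, PEOPLE_PER_SIGN) < 0.05:
            hits += 1
    return hits / n_diagnoses


fraction = fraction_significant()
print(f"Null diagnoses with a 'significant' top sign: {fraction:.2f}")
```

Under these assumptions the fraction flagged comes out well above the nominal 5%, because the sign being tested was chosen for having the highest rate. Choosing the comparison after looking at the data is itself a form of multiple testing.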

Results


In all, 223 diagnoses (accounting for 92% of all urgent admissions) had to be examined to find two statistically significant results for each astrological sign. Of these 223, 72 (32%) were statistically significant for at least one sign compared with all the others combined. The extremes were Scorpio with two significant results, and Taurus with 10, with significance levels of 0.0003 to 0.048.

The two most frequent diagnoses for each sign were used to select 24 significant associations in the derivation cohort. These included, for instance, intestinal obstructions and anaemia for people with the astrological sign of Cancer, and head and neck symptoms and fracture of the humerus for Sagittarius. Levels of statistical significance ranged from 0.0006 to 0.048, and relative risk from 1.1 to 1.8 (Figure 1), with most being modest.


Figure 1: Relative risk of associations between astrological sign and illness for the 24 chosen associations, using a statistical significance of 0.05, uncorrected for multiple comparisons

Protection against spurious statistical significance from multiple comparisons was tested in several ways.

  1. When the 24 associations were tested in the validation cohort, only two remained significant: gastrointestinal haemorrhage for Leo (relative risk 1.2), and fractured humerus for Sagittarius (relative risk 1.4).
  2. Preserving an overall error rate of 5% across the 24 comparisons meant using a significance level of 0.002. This would have left 9 of 24 comparisons significant in the derivation cohort, but none significant in both the derivation and validation cohorts.
  3. Correcting for the 14,718 comparisons used in the derivation cohort would have meant using a significance level of 0.000003, and no comparison would have been significant.
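
The two stricter significance levels quoted above are consistent with dividing the overall 5% error rate by the number of comparisons, which is the Bonferroni correction. The article does not name the method, so treating it as Bonferroni is an assumption; the function name below is also just illustrative.

```python
def bonferroni_threshold(overall_alpha, n_comparisons):
    """Per-comparison significance level that keeps the overall error rate at overall_alpha."""
    return overall_alpha / n_comparisons


# 24 associations selected in the derivation cohort
threshold_24 = bonferroni_threshold(0.05, 24)
# 14,718 comparisons made in the derivation cohort
threshold_14718 = bonferroni_threshold(0.05, 14718)

print(f"{threshold_24:.4f}")     # 0.0021, quoted as 0.002
print(f"{threshold_14718:.7f}")  # 0.0000034, quoted as 0.000003
```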

Comment


This study is a sobering reminder that statistical significance can mislead when we don't use statistics properly: don't blame statistics or statisticians, blame our use of them. There is no biological plausibility for a relationship between astrological sign and illness, yet many could be found in this huge data set when using standard levels of statistical significance without thinking about the problem of multiple comparisons. Even using a derivation and validation set did not offer complete protection against spurious results in enormous data sets.

Multiple subgroup analyses are common in published articles in our journals, usually without any adjustment for multiple testing. The authors of the study examined 131 randomised trials published in top journals over six months in 2004. These trials had an average of 5 subgroup analyses and 27 significance tests for efficacy and safety. The danger is that we may react to results whose statistical significance is spurious, especially when the size of the effect is not large.
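
To give a feel for what 5 or 27 tests per trial means, the sketch below works out the chance of at least one false positive when every test is run at the 0.05 level and there is no true effect anywhere. It assumes the tests are independent, which tests within one trial usually are not, so the figures are a rough guide rather than an exact rate.

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 27):
    print(n_tests, f"{chance_of_any_false_positive(n_tests):.2f}")
# 1 0.05
# 5 0.23
# 27 0.75
```

On this simplified reckoning, a trial reporting 27 unadjusted significance tests has roughly a three in four chance of showing at least one "significant" result even when nothing is going on.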

Reference:

  1. PC Austin et al. Testing multiple statistical hypotheses resulted in spurious associations: a study of astrological signs and health. Journal of Clinical Epidemiology 2006 59:964-969.
