In systematic reviews, heterogeneity refers to variability or differences between studies in the estimates of effects. A distinction should be made between "statistical heterogeneity" (differences in the reported effects), "methodological heterogeneity" (differences in study design) and "clinical heterogeneity" (differences between studies in key characteristics of the participants, interventions or outcome measures). Where studies differ substantially in clinical or methodological respects, the first question to ask is whether there is any good reason to pool their data in a meta-analysis at all, given that heterogeneity is known to exist.

More difficult is the occurrence of statistical heterogeneity where there is methodological and clinical homogeneity. Statistical tests of heterogeneity are used to assess whether the observed variability in study results (effect sizes) is greater than would be expected by chance. Because these tests have low statistical power, the boundary for statistical significance is usually set at 10% (0.1). Some argue that, if these tests are used at all, a value of 1% (0.01) makes more sense.
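One commonly used test of the kind described above is Cochran's Q, which compares each study's effect estimate against an inverse-variance-weighted pooled estimate; the statistic is referred to a chi-squared distribution with k − 1 degrees of freedom. The sketch below is an illustration only (it is not the specific test the text later recommends), and the study effects and variances are hypothetical:

```python
def cochran_q(effects, variances):
    """Cochran's Q statistic for between-study heterogeneity.

    effects: per-study effect estimates (e.g. log odds ratios)
    variances: per-study variances of those estimates
    Returns (Q, degrees_of_freedom).
    """
    weights = [1.0 / v for v in variances]          # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    return q, len(effects) - 1

# Hypothetical log odds ratios and variances from five studies
effects = [0.10, 0.35, -0.05, 0.20, 0.60]
variances = [0.04, 0.09, 0.05, 0.06, 0.12]
q, df = cochran_q(effects, variances)
# Compare Q against a chi-squared distribution with df = k - 1;
# with the 10% boundary discussed above, p < 0.1 would suggest heterogeneity.
print(q, df)
```

Note that the 10% (rather than 5%) significance boundary discussed above is applied when interpreting the resulting p-value, reflecting the test's low power.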

An analysis of the performance of commonly used tests shows that the Breslow-Day test performs most consistently (DJ Gavaghan et al. An evaluation of homogeneity tests in meta-analysis in pain using simulations of individual patient data. Pain 2000; 85: 415-24).
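For binary outcomes, the Breslow-Day test assesses whether the odds ratio is constant across a set of 2×2 tables. The sketch below implements the standard form of the statistic (expected counts under the Mantel-Haenszel common odds ratio, compared to chi-squared with k − 1 degrees of freedom); it is an illustration assuming the usual textbook formulation, not the implementation used in the cited paper, and the demo tables are hypothetical:

```python
import math

def breslow_day(tables):
    """Breslow-Day test statistic for homogeneity of odds ratios.

    tables: list of 2x2 tables as (a, b, c, d) tuples, where a/b are
    events/non-events in one group and c/d in the other.
    Returns (statistic, degrees_of_freedom).
    """
    # Mantel-Haenszel estimate of the common odds ratio
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    psi = num / den

    stat = 0.0
    for a, b, c, d in tables:
        n, r1, m1 = a + b + c + d, a + b, a + c
        # Expected count E for cell a under the common odds ratio psi solves
        # E(n - r1 - m1 + E) / ((r1 - E)(m1 - E)) = psi, a quadratic in E.
        qa = 1.0 - psi
        qb = n - r1 - m1 + psi * (r1 + m1)
        qc = -psi * r1 * m1
        if abs(qa) < 1e-12:                     # psi == 1: quadratic degenerates
            e = -qc / qb
        else:
            disc = math.sqrt(qb * qb - 4 * qa * qc)
            roots = [(-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa)]
            lo, hi = max(0.0, r1 + m1 - n), min(r1, m1)
            e = next(r for r in roots if lo <= r <= hi)  # admissible root
        # Asymptotic variance of cell a given the margins and psi
        var = 1.0 / (1 / e + 1 / (r1 - e) + 1 / (m1 - e) + 1 / (n - r1 - m1 + e))
        stat += (a - e) ** 2 / var
    return stat, len(tables) - 1

# Two hypothetical trials reporting events/non-events in each arm
stat, df = breslow_day([(15, 85, 10, 90), (30, 70, 12, 88)])
print(stat, df)  # refer the statistic to chi-squared with k - 1 df
```

When the common odds ratio is estimated, most software applies this test with the Mantel-Haenszel estimate as here; Tarone's adjustment, not shown, corrects a small bias in that case.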