[Image: the BMC Medical Research Methodology article "Meta-analysis models relaxing the random-effects normality assumption: methodological systematic review and simulation study", with author names listed.]

Most meta-analyses are based on the random-effects model, the standard framework built on the assumption that each study estimates a different “true” effect and that these true effects follow a normal distribution.

This assumption underpins nearly every random-effects analysis performed in clinical and public-health research.
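To make the normal–normal model concrete, here is a minimal sketch (our own illustration, not code from the paper): simulate study effects theta_i ~ N(mu, tau²), observe y_i ~ N(theta_i, s_i²), and fit the model with the classical DerSimonian–Laird method-of-moments estimator. All numbers are hypothetical.

```python
import random
import math

random.seed(1)

# Hypothetical meta-analysis under the normal-normal random-effects model
mu_true, tau_true, k = 0.5, 0.3, 14  # true mean, sd of true effects, no. of studies

se = [random.uniform(0.1, 0.4) for _ in range(k)]            # within-study SEs
theta = [random.gauss(mu_true, tau_true) for _ in range(k)]  # true study effects
y = [random.gauss(t, s) for t, s in zip(theta, se)]          # observed estimates

# Fixed-effect weights and Cochran's Q statistic
w = [1 / s**2 for s in se]
mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
Q = sum(wi * (yi - mu_fe)**2 for wi, yi in zip(w, y))

# DerSimonian-Laird estimate of the between-study variance tau^2
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2_hat = max(0.0, (Q - (k - 1)) / c)

# Random-effects pooled mean with heterogeneity-adjusted weights
w_re = [1 / (s**2 + tau2_hat) for s in se]
mu_hat = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
se_mu = math.sqrt(1 / sum(w_re))
print(f"mu_hat={mu_hat:.3f}, tau_hat={math.sqrt(tau2_hat):.3f}, se(mu)={se_mu:.3f}")
```

Everything downstream (confidence intervals, prediction intervals) inherits the normality assumption baked into the `random.gauss(mu_true, tau_true)` line; the question the paper asks is what happens when that line is wrong.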

  • But what if that assumption isn’t true?
  • What if study results follow a skewed, bimodal, or heavy-tailed pattern?
  • Could such deviations distort our pooled estimates or make the “average effect” misleading?

These questions motivated Panagiotopoulou et al. (Panagiotopoulou, K., Evrenoglou, T., Schmid, C.H., Metelli, S., & Chaimani, A. (2025). Meta-analysis models relaxing the random-effects normality assumption: methodological systematic review and simulation study. BMC Medical Research Methodology, 25, 231. https://doi.org/10.1186/s12874-025-02658-3), who systematically reviewed all available meta-analysis models that relax the normality assumption and conducted a large simulation study comparing their performance.

Phase 1: Systematic review of alternative models

The authors identified 27 methodological papers describing 24 alternative random-effects models that deviate from the usual normal assumption. These fell into three main families:

  1. Long-tailed or skewed distributions such as t, skew-normal, or skew-t models, which accommodate outliers and asymmetry.
  2. Mixture models combining two or more distributions to represent hidden subgroups or latent clusters of studies.
  3. Dirichlet process (DP) priors: flexible Bayesian, semi-parametric models capable of automatically discovering clusters in the data.
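To give a feel for what these families look like, here is a hedged sketch (our own illustration, not the authors' code) of how random effects could be drawn from each: a heavy-tailed t, a skew-normal via the Azzalini representation, and a two-component normal mixture. The parameter values are arbitrary.

```python
import random
import math

random.seed(2)

def rt(df, scale=1.0):
    """Student-t draw (heavy tails): normal divided by sqrt(chi2/df)."""
    z = random.gauss(0, 1)
    chi2 = sum(random.gauss(0, 1)**2 for _ in range(df))
    return scale * z / math.sqrt(chi2 / df)

def rskewnorm(alpha):
    """Skew-normal draw (asymmetry) via the Azzalini representation."""
    delta = alpha / math.sqrt(1 + alpha**2)
    z0, z1 = random.gauss(0, 1), random.gauss(0, 1)
    return delta * abs(z0) + math.sqrt(1 - delta**2) * z1

def rmixture(p, mu1, mu2, sd=0.2):
    """Two-component normal mixture (latent clusters of studies)."""
    mu = mu1 if random.random() < p else mu2
    return random.gauss(mu, sd)

heavy_tailed = [rt(df=3) for _ in range(1000)]
skewed = [rskewnorm(alpha=4) for _ in range(1000)]
bimodal = [rmixture(p=0.5, mu1=-1.0, mu2=1.0) for _ in range(1000)]
```

The Dirichlet process models in the third family go further: rather than fixing the number of mixture components in advance, they let the data determine how many clusters are present.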

Most approaches were Bayesian and available only through specialized software packages (for example, metaplus, flexmeta, bspmma); despite their conceptual richness, few have been applied in practice.

Phase 2: Simulation study

To test these models head-to-head, the authors simulated 22 distinct scenarios, varying:

  • Heterogeneity levels: moderate vs. large random-effects variance
  • Number of studies: 14 or 26 (plus additional tests with only 5 studies)
  • True distribution shape: normal, skew-normal, or mixture of two normal distributions (bimodal)

They evaluated 15 models (11 Bayesian and 4 frequentist) using mean absolute bias, coverage probability, and mean squared error (MSE) for both the mean (mu) and heterogeneity (tau).
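The three performance metrics are straightforward to compute from replicated estimates; a small illustrative helper (names and toy numbers are ours, not the authors'):

```python
def performance(estimates, ci_lows, ci_highs, truth):
    """Mean absolute bias, coverage probability, and MSE of an
    estimator across simulation replications (illustrative helper)."""
    n = len(estimates)
    mab = sum(abs(e - truth) for e in estimates) / n
    coverage = sum(lo <= truth <= hi for lo, hi in zip(ci_lows, ci_highs)) / n
    mse = sum((e - truth)**2 for e in estimates) / n
    return mab, coverage, mse

# Toy usage with made-up replicate results for mu (true value 0.5)
est = [0.48, 0.55, 0.40, 0.52]
los = [0.30, 0.35, 0.22, 0.33]
his = [0.66, 0.75, 0.58, 0.71]
mab, cov, mse = performance(est, los, his, truth=0.5)
print(mab, cov, mse)
```

In the actual study these quantities were computed for both mu and tau under every combination of scenario and model.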

Key Findings

1. No model escapes the limits of high heterogeneity

When between-study variance was large, all models, normal and non-normal alike, struggled to estimate the true mean accurately.
Bias in mu increased sharply with heterogeneity, and differences among models were modest. Even sophisticated methods could not overcome this fundamental problem: when studies are highly inconsistent, the pooled average becomes unstable and potentially meaningless.

Importantly, the conventional normal–normal model performed comparably to more complex alternatives in most realistic scenarios.

2. Non-normal models can reveal structure

When data were truly non-normal, alternative models provided meaningful insights:

  • DP models uncovered latent clusters of studies when the true distribution was bimodal, offering improved coverage and more accurate estimation of study-specific effects
  • The Bayesian skew-normal model performed best when the underlying effects were skewed, correctly recovering the direction and magnitude of asymmetry

These models are therefore less about producing a “better mean” and more about revealing structure: identifying subgroups, asymmetry, or multimodality that the normal model obscures.

3. Estimating heterogeneity: priors matter

Within the Bayesian framework:

  • Using a half-normal prior on tau reduced bias and MSE.
  • Using a uniform prior yielded coverage probabilities closer to the nominal level.

DP models (when convergent) consistently exhibited the lowest relative bias for tau, particularly under high heterogeneity.
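How a prior on tau pulls the estimate around can be seen even in a toy one-dimensional example (our own sketch, far simpler than the paper's Bayesian models): put a grid posterior on tau using the marginal likelihood y_i ~ N(mu, tau² + s_i²), with mu profiled out at its precision-weighted mean, and compare a half-normal prior against a uniform one. All data values and the HN(0.5) scale are hypothetical.

```python
import math

y  = [0.1, 0.4, 0.9, 0.3, 0.7, -0.2]   # hypothetical study estimates
se = [0.2, 0.3, 0.25, 0.2, 0.3, 0.25]  # hypothetical standard errors

def log_lik(tau):
    """Marginal log-likelihood with mu profiled at its weighted mean."""
    v = [s**2 + tau**2 for s in se]
    mu = sum(yi / vi for yi, vi in zip(y, v)) / sum(1 / vi for vi in v)
    return sum(-0.5 * math.log(2 * math.pi * vi) - (yi - mu)**2 / (2 * vi)
               for yi, vi in zip(y, v))

def posterior(log_prior, grid):
    """Normalized grid posterior for tau."""
    logp = [log_lik(t) + log_prior(t) for t in grid]
    m = max(logp)
    w = [math.exp(lp - m) for lp in logp]
    z = sum(w)
    return [wi / z for wi in w]

grid = [i / 100 for i in range(1, 201)]       # tau in (0, 2]
half_normal = lambda t: -t**2 / (2 * 0.5**2)  # HN(0.5) log-density, up to a constant
uniform     = lambda t: 0.0                   # U(0, 2) log-density, up to a constant

post_hn = posterior(half_normal, grid)
post_un = posterior(uniform, grid)
mean_hn = sum(t * p for t, p in zip(grid, post_hn))
mean_un = sum(t * p for t, p in zip(grid, post_un))
print(f"posterior mean of tau: half-normal {mean_hn:.2f}, uniform {mean_un:.2f}")
```

Because the half-normal prior concentrates mass near zero, it shrinks the posterior for tau downward relative to the uniform prior, which is the mechanism behind the bias/coverage trade-off the authors observed.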

4. Other practical points

  • Convergence issues: Non-parametric models (especially DP) often failed when there were very few studies (n = 5).
  • Small meta-analyses: for small datasets, the simple normal model remains not only practical but often optimal.

Guidance for Practice

The authors emphasize a pragmatic hierarchy:

  1. Start with the normal model, but check its assumptions
  2. Use alternative models as sensitivity analyses, particularly when skewness, outliers, or clustering are suspected
  3. Look beyond the mean: when heterogeneity is large, interpret prediction intervals and distribution shape instead of relying solely on a single summary estimate.
  4. Explore structure: mixture and semi-parametric models can uncover meaningful subgroups of studies, helping to explain heterogeneity rather than simply averaging over it.
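Point 3 above is easy to operationalize. A 95% prediction interval widens the confidence interval by the between-study variance, mu ± q · sqrt(tau² + se(mu)²); the sketch below (our own, with toy numbers) uses a normal quantile for simplicity, whereas the usual Higgins-style interval uses a t quantile with k − 2 degrees of freedom.

```python
import math
from statistics import NormalDist

def prediction_interval(mu_hat, se_mu, tau2_hat, level=0.95):
    """Approximate prediction interval for the effect in a new study.
    Uses a normal quantile for simplicity; a t_{k-2} quantile is the
    more common choice in practice."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(tau2_hat + se_mu**2)
    return mu_hat - half, mu_hat + half

# Toy numbers: pooled mean 0.5 (SE 0.1) with substantial heterogeneity
lo, hi = prediction_interval(mu_hat=0.5, se_mu=0.1, tau2_hat=0.09)
print(f"95% PI: ({lo:.2f}, {hi:.2f})")  # prints "95% PI: (-0.12, 1.12)"
```

Note how the prediction interval crosses zero even though the confidence interval (roughly 0.30 to 0.70) does not: the average effect looks convincing, yet a future study could plausibly find a null or negative effect.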

Take-Home Messages

1. A single average effect can be misleading

When study results differ widely, the pooled average may be spurious.

2. Don’t ignore variation

When between-study variance is large, the estimated mean is less representative of the studies themselves.

3. Focus on the range, not just the mean

The spread of effects (hence the importance of reporting prediction intervals) is often more informative than the average: it reflects what future studies are likely to find and the inherent uncertainty in generalizing results.

4. Look for subgroups, not just outliers

Large heterogeneity signals that studies may not be comparable. Instead of forcing them into one model, researchers should seek subgroups or contexts that explain the observed variation, although the usual “problems” of subgroup analyses based on aggregate data (e.g., the ecological fallacy) remain.

What Does “Random Effects” Really Mean?

This study is a good reminder that assumptions in meta-analysis are not mere technicalities; they shape how we think about evidence and its synthesis.

The distribution of random effects is a modelling tool, a way to represent uncertainty and variation among studies. Choosing its form is not simply a computational decision but a reflection of how broadly one wishes to generalize findings.

Different estimands and inferential goals call for different perspectives: the mean effect may be meaningful within a narrow, well-defined clinical context but less so for global or policy-level inference. The shape of heterogeneity (its tails, skewness, or clusters) directly affects how far we can responsibly generalize beyond the included studies.

Thus, the choice of random-effects distribution is not just a statistical matter but a scientific and interpretive one.

In Summary

Panagiotopoulou and colleagues highlight that:

  • The normal model remains a solid and defensible starting point.
  • Complex alternatives help us explore, not replace, the conventional framework.
  • What matters most is recognizing when the average effect ceases to be meaningful and when the structure of variation tells a richer story.

Discover more from The Bayesian Meta-Analysis Network

Subscribe to get the latest posts sent to your email.

One response to “Going beyond the normal assumptions in random effects models in meta-analysis?”

  1. […] assume here that the heterogeneity is normal (see Gian Luca’s recent post on this); you can change that in the […]
