
You can start reading about Stan at Section 2.4.4 of our book.
This is a meta-analysis in which we consider all studies to have data drawn from distributions where the contrast between intervention and control has the same true population value. Some writers call this fixed-effects, which we think is potentially confusing for beginners; others use that term to mean something else. Michael Borenstein calls it “fixed effect (singular)”, which is better, but we prefer common effect to avoid any confusion. The code here relates to Section 3.6.2 of the book, where you will find R code for the cmdstanr and rstan packages.
Caveat: some meta-analysts, including Robert and Gian Luca, don’t believe that common effect meta-analysis (CE MA) is really a defensible model for studies, unless they were done in-house to exactly the same conditions. However, it is an important first step in learning about MA as a statistical model, rather than a mysterious estimator that has some justification in terms of a weighted average.
Each study reports an estimate of intervention effect and a standard error. These intervention effects, or contrasts, will have asymptotically normal sampling distributions, but for studies with small sample sizes you should consider a t-distribution instead (see below).
See Chapter 5 of the book if you have to calculate the standard errors from other reported statistics. In another post, we show models for unreported standard errors, but in this example, we assume all the stats are known.
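As a concrete illustration of the kind of input these models expect, here is how a log odds ratio and its standard error follow from a 2×2 table using the standard large-sample formula; this is a sketch in Python with made-up counts, not code from the book:

```python
import math

def log_or_and_se(a, b, c, d):
    """Log odds ratio and its standard error from a 2x2 table:
    a/b = events/non-events in the intervention arm,
    c/d = events/non-events in the control arm."""
    logor = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # large-sample SE
    return logor, se

# Made-up study: 12/100 events on intervention, 20/100 on control
logor, se = log_or_and_se(12, 88, 20, 80)
```

These two numbers per study are the logor and se_logor inputs used in the Stan models below.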
Simple contrast-based models
Let’s assume there are m studies. Each study reports a log odds ratio in a variable called logor, and its standard error in a variable called se_logor. You don’t have to put the lower=0 constraint on se_logor in the data block, as a negative se_logor would trigger an error when the normal density is evaluated anyway. In this simple example, the rationale for doing a Bayesian analysis is to have an informative prior on theta.
data {
int m;
array[m] real logor;
array[m] real<lower=0> se_logor;
}
parameters {
real theta;
}
model {
theta ~ ADD_YOUR_PRIOR_HERE;
for(j in 1:m) {
logor[j] ~ normal(theta, se_logor[j]);
}
}
Our Stan code uses more arrays and looping than some people might like. This is because we think it helps most learners to see exactly what’s going on.
Although this shows a log odds ratio as the contrast statistic, the same model is applicable to any statistic that: (1) has an asymptotically normal sampling distribution and the sample size is big enough to rely on this (otherwise, see Exact Likelihoods below), and (2) we are content to treat the standard errors as perfectly known. You could use a mean difference or log hazard ratio, for example.
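To see the “weighted average” connection mentioned earlier: with a flat prior on theta, this model’s posterior is normal with mean equal to the classical inverse-variance weighted estimate. A quick check in Python, with made-up study results:

```python
import math

# Made-up study results: log odds ratios and their standard errors
logor = [-0.3, -0.5, -0.1, -0.4]
se_logor = [0.20, 0.35, 0.25, 0.30]

# Inverse-variance weights; under a flat prior, the posterior for theta
# is normal with this mean and standard deviation
w = [1 / se ** 2 for se in se_logor]
theta_hat = sum(wj * yj for wj, yj in zip(w, logor)) / sum(w)
post_sd = math.sqrt(1 / sum(w))
```

With an informative prior, the posterior shifts away from this weighted average towards the prior, which is exactly the point of doing the analysis in Stan.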
Why don’t we give priors in our online code? Well, an important stimulus to writing our book was the widespread adoption of the network meta-analysis BUGS code given in 2011 by the NICE Decision Support Unit report. However, Robert’s scoping review of Bayesian MAs 2005-16 found that many authors of NMAs had simply copied and pasted code including the priors. Sometimes, what looks like a sensible “default prior” can be unexpectedly informative. So, we force you to think about the priors.
We give R code for the interfaces cmdstanr and rstan in the book. You can find equivalents for other interfaces at the Stan website.
Sensible initial values here would be close to zero, in line with the clinical trial principle of equipoise, unless you have strong information otherwise, or are meta-analysing other study designs. For example, for two chains, -0.5 and 0.5.
Exact likelihoods and small sample sizes
When studies have small sample sizes (n), we can easily switch from the asymptotic normal likelihood to an “exact” alternative. Remember that “exact” has a specific meaning in statistical theory; choosing an exact likelihood does not imply that the normal is unreliable.
Means have a Student’s t sampling distribution with small numbers, and we can use that for the likelihood. Counts of how many participants had an event will come from a binomial distribution, and if the study counts events at a location or over a period of time, like road traffic accidents, it might come from a Poisson distribution. If you are uncertain about which sampling distribution is right for your evidence base, ask a statistician (they don’t have to be familiar with Bayesian methods to know this).
Let’s start with the mean differences (md) and their standard errors (se_md) first. The t-distribution will have n[j]-2 degrees of freedom, so we can supply that as transformed data. We need to send md, se_md (or precision), and n (or df).
data {
int m;
array[m] real md;
array[m] real<lower=0> se_md;
array[m] int<lower=0> n;
}
transformed data {
array[m] real df;
for(j in 1:m) {
df[j] = n[j] - 2.0;
}
}
parameters {
real theta;
}
model {
theta ~ ADD_YOUR_PRIOR_HERE;
for(j in 1:m) {
md[j] ~ student_t(df[j], theta, se_md[j]);
}
}
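To see what the t likelihood changes in practice, here is a pure-Python sketch comparing the normal and Student-t log densities for one made-up study; the helper functions just implement the standard formulas, with the t parameterised to match Stan’s student_t(df, mu, sigma):

```python
import math

def normal_logpdf(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def student_t_logpdf(x, df, mu, sigma):
    # location-scale Student-t, same argument order as Stan's student_t
    z = (x - mu) / sigma
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(sigma)
            - (df + 1) / 2 * math.log(1 + z * z / df))

# Made-up study: mean difference 1.2, SE 0.5, n = 20 (so df = 18)
md, se_md, df = 1.2, 0.5, 18.0
theta = 0.0  # a candidate common-effect value 2.4 SEs from the estimate

ll_normal = normal_logpdf(md, theta, se_md)
ll_t = student_t_logpdf(md, df, theta, se_md)
# the t's heavier tails give more support to values far from the estimate
```

With small df, the t likelihood penalises discrepant values less severely, so an apparently extreme study pulls the posterior around a little less than it would under the normal.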
For log odds ratios, Alan Agresti suggested that the asymptotics behave well in small samples. Our experimentation with simulated data suggests that, although the empirical sampling distribution can become discrete and lumpy for rare events and very small sample sizes, it still has a symmetric normal-like shape. Using the normal to infer the true population log odds ratio is justifiable. It is also possible to deal with each arm’s proportion of events using an arm-based model, which is addressed in another post.
Other statistics might need to be assessed by simulating pseudo-studies and looking at the distribution of their stats. The same applies to well-known statistics on unusually-distributed outcome variables. This will be the subject of a future post.
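A minimal sketch of that kind of simulation check, for the log odds ratio case discussed above; the arm sizes and event probabilities are made up, and replicates with a zero cell are simply skipped rather than corrected:

```python
import math
import random

random.seed(1)

# Made-up pseudo-study design: two arms of 50 participants,
# event probabilities 0.2 (intervention) and 0.3 (control)
n, p_int, p_ctrl = 50, 0.2, 0.3
logors = []
for _ in range(5000):
    a = sum(random.random() < p_int for _ in range(n))   # intervention events
    c = sum(random.random() < p_ctrl for _ in range(n))  # control events
    if a in (0, n) or c in (0, n):
        continue  # skip zero-cell replicates
    logors.append(math.log((a / (n - a)) / (c / (n - c))))

logors.sort()
mean = sum(logors) / len(logors)
median = logors[len(logors) // 2]
# a symmetric, normal-like shape shows up as mean close to median,
# both near the true value log((0.2/0.8)/(0.3/0.7)), about -0.54
```

In practice you would also plot a histogram of the simulated statistics against a fitted normal curve before deciding whether the normal likelihood is acceptable.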
Uncertainty in standard errors
The t-distribution is the exact sampling distribution of a mean when the standard deviation has to be estimated from the data (and outside of textbooks it always does), so using it as the likelihood automatically allows for uncertainty in the standard error. The standard deviation of a sampling distribution is called a standard error. Just remember to set df to n-2, as above.
Other “fully Bayesian” models have been suggested, going back to 2000, where sampling distributions are used for both the mean given the standard deviation, and the standard deviation alone. These can be useful for imputing missing (unreported) study statistics, but in most circumstances, they will not add more information compared to the simpler exact likelihood approach. We present one in the post on arm-based models.
tags: #stan, #stan-repo, #repo

