You can start reading about brms at Section 4.4.5 of our book. Full documentation is at the website.

This is a meta-analysis in which we consider all studies to have data drawn from distribution(s) where the contrast between intervention and control has the same true population value. Some writers call this fixed-effects, which we think is potentially confusing for beginners; others use that term to mean something else. Michael Borenstein calls it “fixed effect (singular)”, which is better, but we prefer common effect to avoid any confusion. The code here relates to Section 3.6.2 of the book, where you will find R code for the cmdstanr and rstan packages.

Caveat: some meta-analysts, including Robert and Gian Luca, don’t believe that common effect meta-analysis (CE MA) is really a defensible model for studies, unless they were done in-house under exactly the same conditions. However, it is an important first step in learning about MA as a statistical model, rather than a mysterious estimator that has some justification in terms of a weighted average.

Each study reports an estimate of intervention effect and a standard error. These intervention effects, or contrasts, have asymptotically normal sampling distributions, but when studies have small sample sizes you should consider a t-distribution instead (see below).

See Chapter 5 of the book if you have to calculate the standard errors from whatever other statistics are reported. In another post, we show models for unreported standard errors, but in this example, we assume all the statistics are known.
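As a reminder of where these inputs come from, here is a minimal sketch (with made-up counts) of computing a log odds ratio and its standard error from a study's 2x2 table:

```r
# Sketch: log odds ratio and its standard error from 2x2 counts.
# The counts below are made up for illustration.
events_t <- 12; n_t <- 100   # intervention arm: events, sample size
events_c <- 20; n_c <- 100   # control arm

# log odds ratio: log( (a*d) / (b*c) ) for the 2x2 table
logor <- log((events_t * (n_c - events_c)) /
             ((n_t - events_t) * events_c))

# delta-method standard error: sqrt(1/a + 1/b + 1/c + 1/d)
se_logor <- sqrt(1/events_t + 1/(n_t - events_t) +
                 1/events_c + 1/(n_c - events_c))
```

The standard error here is the usual delta-method approximation, which is part of why the asymptotic normal likelihood is reasonable for moderate sample sizes.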

Common effect, contrast-based models

Let’s assume there are m studies. Each study reports its log odds ratio in a variable called logor and the standard error in a variable called se_logor, and these are in the data frame mydata. In this simple example, the rationale for doing a Bayesian analysis is to have an informative prior on theta, the common effect.

brms uses a regression formula notation in which an outcome (in our case, the reported contrast, logor) is predicted by one or more predictors. A tilde (~) specifies the predictive relationship. Meta-analysis is a special case because we do not have any predictor variable, so we type 1 in the place of predictor variable names. This is then an intercept-only model. To fix the standard error of logor (which will then be used in likelihood calculations), we can use the function se(se_logor).

library(brms)

priors <- c(prior(ADD_THETA_PRIOR_HERE,
                  class=Intercept))

inits_list <- list(list(b_Intercept = -0.5),
                   list(b_Intercept =  0.5))

meta_brms <- brm(logor | se(se_logor) ~ 1,
                 data = mydata,
                 prior = priors,
                 family = gaussian(),
                 iter = 5000,
                 chains = 2,
                 cores = 2,
                 init = inits_list)

summary(meta_brms)

Although this shows a log odds ratio as the contrast statistic, the same model is applicable to any statistic that: (1) has an asymptotically normal sampling distribution and the sample size is big enough to rely on this (otherwise, see Exact Likelihoods below), and (2) we are content to treat the standard errors as perfectly known. You could use a mean difference or log hazard ratio, for example.
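When the standard errors are treated as perfectly known and the prior on theta is normal, the posterior for theta is available in closed form, which makes a handy sanity check on the MCMC output. A sketch with made-up study estimates and an illustrative N(0, 1) prior:

```r
# Illustration (made-up data): with likelihood logor_j ~ N(theta, se_j^2)
# and prior theta ~ N(m0, s0^2), the posterior for theta is normal with
# precision 1/s0^2 + sum(1/se_j^2).
logor    <- c(-0.61, -0.20, -0.45)
se_logor <- c(0.40, 0.28, 0.33)
m0 <- 0; s0 <- 1                       # example N(0, 1) prior on theta

w         <- 1 / se_logor^2            # inverse-variance weights
post_prec <- 1 / s0^2 + sum(w)
post_mean <- (m0 / s0^2 + sum(w * logor)) / post_prec
post_sd   <- sqrt(1 / post_prec)
```

If brms is sampling properly, its posterior mean and SD for the intercept should agree closely with post_mean and post_sd under the matching prior.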

Why don’t we give priors in our online code? Well, an important stimulus to writing our book was the widespread adoption of the network meta-analysis BUGS code given in 2011 by the NICE Decision Support Unit report. However, Robert’s scoping review of Bayesian MAs 2005-16 found that many authors of NMAs had simply copied and pasted code including the priors. Sometimes, what looks like a sensible “default prior” can be unexpectedly informative. So, we force you to think about the priors.

Sensible initial values here would be close to zero, in line with the clinical trial principle of equipoise, unless you have strong information otherwise, or are meta-analysing other study designs. For example, for two chains, -0.5 and 0.5.

Exact likelihoods and small sample sizes

When studies have small sample sizes (n), we can easily switch from the asymptotic normal likelihood to an “exact” alternative. Remember that “exact” has a specific meaning in statistical theory; using an exact likelihood does not imply that the normal approximation was unreliable.

Means have a Student’s t sampling distribution with small numbers, and we can use that for the likelihood. Counts of how many participants had an event will come from a binomial distribution, and if the study counts events at a location or over a period of time, like road traffic accidents, it might come from a Poisson distribution. If you are uncertain about which sampling distribution is right for your evidence base, ask a statistician (they don’t have to be familiar with Bayesian methods to know this).

Let’s start with the mean differences (md) and their standard errors (se_md). The t-distribution will have n[j]-2 degrees of freedom, so we can supply that as transformed data. We need to send md, se_md (or the precision), and n (or df).

mydata$df <- mydata$n - 2

meta_brms <- brm(md | se(se_md,
                         sigma = FALSE,
                         df = df) ~ 1,
                 data = mydata,
                 family = student(),
                 prior = priors,
                 iter = 5000,
                 chains = 2,
                 cores = 2,
                 init = inits_list)

For log odds ratios, Alan Agresti suggested that the asymptotics behave well in small samples. Our experimentation with simulated data suggests that, although the empirical sampling distribution can become discrete and lumpy for rare events and very small sample sizes, it still has a symmetric normal-like shape. Using the normal to infer the true population log odds ratio is justifiable. It is also possible to deal with each arm’s proportion of events using an arm-based model, which is addressed in another post.

Other statistics might need to be assessed by simulating pseudo-studies and looking at the distribution of their stats. The same applies to well-known statistics on unusually-distributed outcome variables. This will be the subject of a future post.
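A minimal sketch of that simulation idea, using the log odds ratio with made-up arm sizes and event probabilities:

```r
# Sketch: simulate pseudo-studies to inspect the empirical sampling
# distribution of a statistic (here, the log odds ratio).
# Arm size and event probabilities are made up for illustration.
set.seed(1)
n_sims <- 5000
n_arm  <- 30                     # small per-arm sample size
p_t <- 0.2; p_c <- 0.3           # true event probabilities

sim_logor <- replicate(n_sims, {
  e_t <- rbinom(1, n_arm, p_t)
  e_c <- rbinom(1, n_arm, p_c)
  # 0.5 continuity correction so zero cells do not produce +/- Inf
  log(((e_t + 0.5) * (n_arm - e_c + 0.5)) /
      ((n_arm - e_t + 0.5) * (e_c + 0.5)))
})

hist(sim_logor, breaks = 50)     # lumpy but roughly symmetric
```

With settings like these, the histogram is discrete and lumpy but still roughly symmetric and centred near the true log odds ratio, which is the behaviour described above.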

Uncertainty in standard errors

Because the t-distribution is the exact likelihood for a mean when the standard deviation has to be estimated (and it always does outside of textbooks), we can use it as the likelihood for other sampling distributions too. The standard deviation of a sampling distribution is, after all, just a standard error. Just remember to set df to n-2, as above.

Other “fully Bayesian” models have been suggested, going back to 2000, where sampling distributions are used for both the mean given the standard deviation, and the standard deviation alone. These can be useful for imputing missing (unreported) study statistics, but in most circumstances, they will not add more information compared to the simpler exact likelihood approach. We present one in the post of arm-based models.

Random effects models

The basic random effects meta-analysis extends the common effect model by allowing studies to sample from their own populations, so that the true contrasts or intervention effects in the study populations differ. It seems reasonable to represent this between-study variation or heterogeneity with a normal distribution because of the Central Limit Theorem (discussed in Chapter 4 of our book). This produces a multilevel model:

priors <- c(prior(ADD_THETA_PRIOR_HERE,
                  class=Intercept),
            prior(ADD_TAU_PRIOR_HERE,
                  class=sd, lb=0))

meta_brms <- brm(logor | se(se_logor) ~ 1 + (1|study),
                 data = mydata,
                 prior = priors,
                 family = gaussian(),
                 iter = 5000,
                 chains = 2,
                 cores = 2,
                 init = inits_list)

We apply a lower bound to the prior (whatever it is going to be) for tau with the ‘lb’ argument. The studies are identified by integers in the variable ‘study’.

Here, we again start with a contrast-based model and use the log odds ratio as an example of a study statistic with an asymptotically normal sampling distribution. We are also taking the study estimates of the standard errors as totally correct and certain. That makes the model a Bayesian match for the frequentist DerSimonian-Laird estimator.
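For readers who want to make that comparison concrete, here is a sketch of the DerSimonian-Laird moment estimator in base R, with made-up data:

```r
# Sketch: the frequentist DerSimonian-Laird estimate of tau^2,
# for comparison with the posterior for tau. Data are illustrative.
logor    <- c(-0.90, -0.10, -0.60, 0.25)
se_logor <- c(0.40, 0.28, 0.33, 0.30)

w     <- 1 / se_logor^2                   # fixed-effect weights
k     <- length(logor)
y_bar <- sum(w * logor) / sum(w)          # inverse-variance pooled mean
Q     <- sum(w * (logor - y_bar)^2)       # Cochran's Q

# DL moment estimator, truncated at zero
tau2_dl <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))
```

Note the truncation at zero, which a Bayesian posterior for tau handles quite differently: the posterior spreads over small positive values rather than piling up at an exact zero.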

We have included only the basic implementation of prior and likelihood, omitting the “generated quantities” block, which is useful to store draws from the study-specific parameters theta+u[j] for graphics. Also, that is where we create random draws for prior and posterior predictive checking. We illustrate these methods in another post.

Divergent transitions

brms runs Stan in the background. Stan will identify “divergent transitions”: iterations where the sampler’s attempt to move to a new vector of parameter values (the unknowns) breaks down because of sharp changes in the gradients of the log-posterior density, so the resulting draws are potentially biased. In multilevel models this happens when the heterogeneity standard deviation approaches zero, and this is referred to as a funnel (sometimes Neal’s funnel) problem.

The Stan website’s page on warnings and diagnostics is the go-to resource to learn more about these challenges to MCMC algorithms, and there is a case study by Mike Betancourt on the Stan website about this. One way to avoid the problem is to use a prior on tau that gives low prior density to values close to zero. Half-normal, half-t, or half-Cauchy priors are prone to it, because they place their highest prior density at values close to zero, while gamma, lognormal, chi-squared, or bespoke elicited priors can avoid the problem, but only if you are willing to be more informative about tau. Other methods are explained in the case study. The slow mixing traceplots in Figure 2.12 of the book are in fact caused by funnel behaviour.
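To see the point about prior density at zero numerically, compare a half-normal with a gamma whose shape parameter exceeds 1 (the specific parameters here are illustrative, not recommendations):

```r
# Quick check of the claim above: prior density at tau = 0 for a
# half-normal(0, 1) versus a gamma(shape = 2, rate = 4) prior.
half_normal_0 <- 2 * dnorm(0, mean = 0, sd = 1)   # highest density is at zero
gamma_0       <- dgamma(0, shape = 2, rate = 4)   # density is zero at zero
```

Any gamma with shape > 1 has zero density at tau = 0, nudging the sampler away from the neck of the funnel; with shape <= 1 it does not.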

Including uncertainty in the standard errors

When the sampling distribution is asymptotically normal, the natural way to acknowledge uncertainty in the standard error is to swap the normal likelihood for the t-distribution, using the family=student() option in brms, with n-2 degrees of freedom, as shown above. This model is analogous to the Sidik-Jonkman estimator with the Hartung-Knapp adjustment, and we address this in Section 4.6 of the book. This also has the effect of making the meta-analysis a little more robust to outlier studies.

mydata$df <- mydata$n - 2

meta_brms <- brm(logor | se(se_logor,
                            sigma = FALSE,
                            df = df) ~ 1 + (1|study),
                 data = mydata,
                 prior = priors,
                 family = student(),
                 iter = 5000,
                 chains = 2,
                 cores = 2,
                 init = inits_list)

This Student’s t distribution is the exact likelihood for means, but for log odds ratios it is a heuristic, chosen simply for its higher probability in the tails; other options could be tried, including changing the degrees of freedom of the t.
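As a quick numerical illustration of those heavier tails (the choice of 5 degrees of freedom here is arbitrary):

```r
# Tail probability beyond 2 standard errors: normal versus
# Student's t with 5 degrees of freedom (df chosen for illustration).
p_norm <- pnorm(-2)          # ~0.0228
p_t5   <- pt(-2, df = 5)     # ~0.0510, more than double the normal tail
```

So an observation 2 standard errors from the centre is far less surprising under the t, which is what makes the model more forgiving of outlier studies.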


Discover more from The Bayesian Meta-Analysis Network