CC-BY William Warby. Close-up photograph of a seven-spot ladybird on a dandelion flower

Arm-based pairwise meta-analysis models are a precursor to coding arm-based network meta-analysis, and can also be extended to deal with some problems of the evidence base, such as unreported statistics. This post expands on Chapter 8 of our book.

Let’s give each study a unique number j, and number the arms across all studies with k. Although we work with arm-level study statistics, we can’t simply pool all the control arms in one batch and all the intervention arms in another and compare the two. The arms are paired within studies and we need to preserve that pairing. To do this, we estimate mu[j], the control mean in study j, and compare study j’s intervention mean to that (not to the mean of all studies’ control arms).

There is therefore a vector mu of length m (the number of studies). Its elements should have a diffuse or flat prior, not a heterogeneity distribution: we don’t want them pulled toward one another or “sharing information”.

Common effect, arm-based

Let’s start with studies reporting means and mean differences. We are ultimately interested in theta, the mean difference. For m studies, there will be K=2m rows in your dataset. Each row reports a mean in “arm_mean”, a standard error of the mean in “arm_se”, and the number of participants/subjects in “arm_n”. Each also has a variable called “active”, which is 1 if that arm received the intervention of interest and 0 if it is the control. Finally, each arm belongs to a study, and these consecutive study numbers 1, 2, 3… are in a variable called study_id.
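To make that layout concrete, here is a toy version of the long-format dataset in plain Python (all numbers are made up, purely for illustration):

```python
# Hypothetical long-format dataset: m = 2 studies, K = 4 arms.
# Each row is one arm; study_id links the two arms of a study together.
arm_mean = [10.2, 8.7, 11.5, 9.9]   # observed arm means
arm_se   = [0.8, 0.9, 1.1, 1.0]     # standard errors of those means
arm_n    = [40, 38, 25, 27]         # participants per arm
active   = [0, 1, 0, 1]             # 1 = intervention arm, 0 = control
study_id = [1, 1, 2, 2]             # consecutive study numbers

m = max(study_id)   # number of studies
K = len(arm_mean)   # number of arms
assert K == 2 * m   # two arms per study in the pairwise setting
```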

model{
  theta ~ ADD_YOUR_PRIOR_HERE
  for(j in 1:m){
    mu[j] ~ dnorm(0, 0.000001) # very diffuse
  }
  for(k in 1:K){
    precision[k] <- 1/(arm_se[k]*arm_se[k])
    df[k] <- arm_n[k]-1
    predicted[k] <- mu[study_id[k]] + theta*active[k]
    arm_mean[k] ~ dt(predicted[k], precision[k], df[k])
  }
}

Notice that we use double indexing in mu[study_id[k]]. Each row in the data is an arm, and it belongs to the study given in study_id[k]. That study has an estimated control arm mean stored in mu[study_id[k]].
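A toy illustration of this lookup in Python (hypothetical values, mirroring the predicted[k] line of the JAGS code):

```python
# Double indexing: each arm k looks up its own study's control mean.
study_id = [1, 1, 2, 2]
active   = [0, 1, 0, 1]
mu    = {1: -0.25, 2: 0.5}  # hypothetical control-arm means, keyed by study
theta = 0.75                 # common intervention effect

# predicted[k] <- mu[study_id[k]] + theta*active[k], as in the JAGS code
predicted = [mu[study_id[k]] + theta * active[k] for k in range(4)]
# each intervention arm is compared to its own study's control mean
```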

Warning! Using lower case k and upper case K in the same program increases the risk of a typo (Robert has done it, and been baffled for an hour or so because it is hard to spot). This matches the mathematical notation in the book, but practically, it would be better to use a name like n_arms instead of K.

Because individual arms have smaller sample sizes than whole studies, we have switched to a Student t likelihood above, although that is not essential.

In the case of binomial likelihoods, studies report numerators (arm_d) and denominators (arm_n) (for example, the ACE inhibitor versus ARB dataset). Here, mu is the log odds in the control arms, and theta is the log odds ratio:

model{
  theta ~ ADD_YOUR_PRIOR_HERE
  for(j in 1:m){
    mu[j] ~ dnorm(0, 0.000001) # very diffuse
  }
  for(k in 1:K){
    logit(prob[k]) <- mu[study_id[k]] + theta*active[k]
    arm_d[k] ~ dbin(prob[k], arm_n[k])
  }
  oddsratio <- exp(theta)
}

Note that we can model the log odds and obtain the corresponding risk (probability) by assigning the linear predictor to logit(prob[k]). This is a BUGS and JAGS convention and is generally not possible in other software. If you need to do the inverse logistic transformation manually (from log odds to risk), it is:

$$p = \frac{1}{1 + e^{-\log \text{odds}}}$$

prob <- 1 / (1 + exp(-1*logodds))
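The same transformation written as a small Python helper (a hypothetical function name, shown for illustration):

```python
import math

def inv_logit(logodds):
    """Inverse logistic transformation: log odds -> risk (probability)."""
    return 1.0 / (1.0 + math.exp(-logodds))

# log odds of 0 correspond to a risk of 0.5
```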

You might prefer the equivalent alternative code below. Instead of making the dataset “long”, and having one linear predictor formula for all, regression-style, we can keep it “wide” and have two lines of code that contribute to the likelihood, one for each arm. Here, d0 and n0 are the counts for the control arm, and d1 and n1 are for the intervention arm. You should use whichever code template you feel most comfortable with, as your task is to understand deeply what it does and be able to explain it.

model {
  theta ~ ADD_YOUR_PRIOR_HERE
  for(j in 1:m) {
    mu[j] ~ dnorm(0, 0.000001) # very diffuse
    logit(prob0[j]) <- mu[j]
    logit(prob1[j]) <- mu[j] + theta
    d0[j] ~ dbin(prob0[j], n0[j])
    d1[j] ~ dbin(prob1[j], n1[j])
  }
  oddsratio <- exp(theta)
}
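Whichever template you choose, the two layouts carry exactly the same information. A small sketch in plain Python (made-up counts) of how the long format reshapes into the wide d0/n0/d1/n1 format:

```python
# Long-format arm-level data (hypothetical counts).
arm_d    = [12, 8, 30, 21]
arm_n    = [100, 100, 250, 260]
active   = [0, 1, 0, 1]
study_id = [1, 1, 2, 2]

# Reshape to wide: one row per study, control and intervention side by side.
m = max(study_id)
d0, n0, d1, n1 = [0]*m, [0]*m, [0]*m, [0]*m
for k in range(len(arm_d)):
    j = study_id[k] - 1          # 0-based study index
    if active[k] == 1:
        d1[j], n1[j] = arm_d[k], arm_n[k]
    else:
        d0[j], n0[j] = arm_d[k], arm_n[k]
```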

Connection to fixed effects

Now you have seen code for the mu vector with a flat or diffuse prior. This is in fact the same coding approach that we would apply to a vector of theta[j] if we needed to fit a “fixed effects” meta-analysis, where each study has its own theta[j], but they are not “shrunken”, “pulled together”, or “sharing information”.

Random effects, arm-based

We have seen that, in order to respect the pairing of arms within studies, we have to introduce one more unknown into our model for each study. Further, these can be expected to correlate negatively with the intervention effect in the posterior distribution: if you under-estimate all the studies’ control arm means or log odds, then you have to over-estimate the mean difference or log odds ratio in order to fit the intervention arms. This makes the task more challenging for Bayesian algorithms like Metropolis-Hastings and the Gibbs sampler, which we find in BUGS, JAGS, Stata, and parts of JASP. In contrast, the Hamiltonian Monte Carlo algorithm in Stan (hence brms and multinma) and PyMC copes better with a higher-dimensional, correlated posterior. Adding random effects makes the trouble worse, but we can still get useful outputs as long as we are willing to specify good initial values, consider stronger priors, and run the algorithm for many more iterations.

We start again with arms reporting means. For small sample sizes, you can use a Student t likelihood with arm_n[k]-1 degrees of freedom, where k indexes all the arms, and arm k belongs to study study_id[k].

model{
  theta ~ ADD_YOUR_PRIOR_HERE
  tau ~ ADD_YOUR_PRIOR_HERE
  tau_prec <- 1 / (tau*tau)

  for(j in 1:m){
    mu[j] ~ dnorm(0, 0.000001) # very diffuse
    u[j] ~ dnorm(0, tau_prec) # heterogeneity
  }

  for(k in 1:K){
    precision[k] <- 1/(arm_se[k]*arm_se[k])
    df[k] <- arm_n[k]-1
    predicted[k] <- mu[study_id[k]] + (theta+u[study_id[k]])*active[k]
    arm_mean[k] ~ dt(predicted[k], precision[k], df[k])
  }
}
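Before trusting this model on real data, it can help to simulate a dataset with known theta and tau and check that the code recovers them. A minimal sketch in plain Python (all parameter values made up):

```python
import math
import random

# Simulate arm-level data from the random-effects model above.
random.seed(1)
m, theta, tau = 20, 1.0, 0.3                      # made-up "true" values
mu = [random.gauss(50, 5) for _ in range(m)]      # control-arm means
u = [random.gauss(0, tau) for _ in range(m)]      # study-specific deviations

arm_mean, arm_se, arm_n, active, study_id = [], [], [], [], []
for j in range(m):
    for a in (0, 1):                              # control arm, then active arm
        y = [random.gauss(mu[j] + (theta + u[j]) * a, 8) for _ in range(40)]
        ybar = sum(y) / 40
        s = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / 39)
        arm_mean.append(ybar)
        arm_se.append(s / math.sqrt(40))
        arm_n.append(40)
        active.append(a)
        study_id.append(j + 1)
```

Fitting the JAGS model to a dataset like this, where the truth is known, is a quick way to spot coding mistakes before moving to a real evidence base.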

For binary outcomes, the only changes needed are the ones shown previously: a binomial likelihood with the linear predictor on the log odds scale.

model{
  theta ~ dnorm(0, 0.111)
  tau ~ dgamma(1.2, 1)
  tau_prec <- 1 / (tau*tau)

  for(j in 1:m){
    mu[j] ~ dnorm(0, 0.000001) # very diffuse
    u[j] ~ dnorm(0, tau_prec) # heterogeneity
  }

  for(k in 1:K){
    logit(prob[k]) <- mu[study_id[k]] + (theta+u[study_id[k]])*active[k]
    arm_d[k] ~ dbin(prob[k], arm_n[k])
  }
  oddsratio <- exp(theta)
}

This code is the equivalent with separate likelihood lines for the two arms:

model{
  theta ~ dnorm(0, 0.111)
  tau ~ dgamma(1.2, 1)
  tau_prec <- 1 / (tau*tau)

  for(j in 1:m){
    mu[j] ~ dnorm(0, 0.000001) # very diffuse
    u[j] ~ dnorm(0, tau_prec) # heterogeneity
    logit(prob0[j]) <- mu[j]
    logit(prob1[j]) <- mu[j] + u[j] + theta
    d0[j] ~ dbin(prob0[j], n0[j])
    d1[j] ~ dbin(prob1[j], n1[j])
  }
  oddsratio <- exp(theta)
}

Once we have mu[j] and u[j], theta and tau, we have 2m+2 unknowns but 2m arms, so we are slicing the study information very thinly indeed, and trusting in the sharing of information in the model to get us through. When the studies are all the same size, none is too small, and the events not too rare or too common, which we can achieve in simulation (see the “all pairwise models” posts), this model will work. However, real-life evidence bases are not so obliging. The ACEI vs ARB dataset has one very large study and some of the others are too small to contribute much information at all. Further, the event is fairly rare, so the absolute numbers get very small. If you try these models with that dataset, you may see some problems in MCMC convergence.

Fully Bayesian models

The “fully Bayesian” model has a likelihood contribution from the arm mean and another from the arm precision (see Section 8.4 in the book for an explanation of the sampling distributions of precisions and variances).

There are important modelling assumptions to be chosen and justified regarding the population standard deviation or precision of the outcome variable. In this random effects example, we have heterogeneity on the study means but we assume a common population SD shared by all control arms (sigma_ctl), and another for all intervention arms (sigma_int). This can be changed in various ways, as long as the assumptions can be justified. For example, there could be one sigma common to all arms in all studies, or some form of heterogeneity distribution could be applied across studies.
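The likelihood for the sigmas rests on a standard result: for normally distributed data, (n-1)s²/σ² follows a chi-squared distribution with n-1 degrees of freedom. A quick simulation check in plain Python (n and sigma are made-up values):

```python
import random

random.seed(0)
n, sigma = 30, 2.0
stats = []
for _ in range(5000):
    y = [random.gauss(0.0, sigma) for _ in range(n)]
    ybar = sum(y) / n
    s2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)  # sample variance
    stats.append((n - 1) * s2 / sigma**2)             # scaled statistic

# the mean of a chi-squared with n-1 df is n-1 = 29,
# so the average of stats should land close to 29
```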

model {
  # priors:
  theta ~ ADD_YOUR_PRIOR_HERE
  tau ~ ADD_YOUR_PRIOR_HERE
  sigma_int ~ ADD_YOUR_PRIOR_HERE
  sigma_ctl ~ ADD_YOUR_PRIOR_HERE

  # heterogeneity precision
  tau_prec <- 1 / (tau*tau)

  # very diffuse mu, heterogeneity for u
  for(j in 1:m) {
    mu[j] ~ dnorm(0, 0.000001)
    u[j] ~ dnorm(0, tau_prec)
  }

  for(k in 1:K) {
    # common population SD, by arm type
    arm_sigma[k] <- active[k]*sigma_int +
                      (1-active[k])*sigma_ctl

    # likelihood of mu, u, theta, given the sigmas
    arm_prec[k] <- arm_n[k] / (arm_sigma[k]*arm_sigma[k])
    arm_mean[k] ~ dnorm(mu[study_id[k]] + (theta+u[study_id[k]])*active[k], arm_prec[k])

    # likelihood of the sigmas: the sample variance arm_var (arm_sd squared,
    # supplied as data) follows a scaled chi-squared, i.e. gamma, distribution
    arm_var[k] ~ dgamma((arm_n[k]-1)/2, (arm_n[k]-1)/(2*arm_sigma[k]*arm_sigma[k]))
  }
}
  

tags: #bugs, #bugs-repo, #jags, #jags-repo, #repo

