Arm-based pairwise MA models are a precursor to coding arm-based network meta-analysis, and can also be extended to deal with some problems of the evidence base, such as unreported statistics. This post expands on Chapter 8 of our book.

To reshape the green tea data, for example, to be arm-based:

import delimited "cochrane_green_tea_weight_loss.csv", clear

keep id n_* mean_* sd_*

rename n_tea n2
rename mean_tea mean2
rename sd_tea sd2
rename n_control n1
rename mean_control mean1
rename sd_control sd1

reshape long n mean sd, i(id) j(active)
replace active = active-1
// active now contains 0=control, 1=active

gen se2 = (sd^2) / n
gen df = n - 1

Each study now has a unique number in the variable id.

Although we work with arm-level study statistics, we can’t just mix up all the control arms in one batch and all the intervention arms in another, and compare them. The arms are paired with one another and we need to preserve that pairing.

To do this, we include a “random effect” in Stata’s parlance, MU[id]. Don’t worry about how a “random effect” belongs in a common effect meta-analysis just yet; we will clarify the connections below. The nocons option means that MU[] acts as the control mean in each study, and we compare each study’s intervention mean to that (not to the mean of all studies’ control arms). MU[] should have a diffuse or flat prior, not a heterogeneity distribution: we don’t want the studies’ control means pulled toward one another or “sharing information”.

For binary data, like the ACE inhibitor vs ARB dataset, we can reshape to arm-based like this:

import delimited "ARBvACEI.csv", clear

// ACEI is the "intervention" here

gen id = _n
rename arb_ae ae1
rename arb_n n_pts1
rename acei_ae ae2
rename acei_n n_pts2

keep ae1 n_pts1 ae2 n_pts2 id

reshape long ae n_pts, i(id) j(active)
replace active = active-1
/* active==1 means ACE inhibitor,
   0 means ARB */

Common effect, arm-based

Let’s start with studies reporting means. We are ultimately interested in theta, the mean difference. For m studies, there will be K=2m rows in your dataset. Each row reports a mean in “mean”, the squared standard error of the mean in “se2”, and the number of participants in “n”. Each also has a variable called “active”, which is 1 if that arm received the intervention of interest, and 0 if it is the control.

bayesmh mean active MU[id], ///
   nocons ///
   likelihood(t(se2, df)) ///
   prior({mean:active}, ADD_YOUR_PRIOR_HERE) ///
   prior({MU[id]}, normal(0, 10000)) ///
   nchains(2) ///
   burnin(2500) ///
   mcmcsize(10000) ///
   init1({mean:active} 0.5 ) ///
   init2({mean:active} -0.5)

For the prior on {MU[id]} in this example, we use normal with variance of 10,000 (SD of 100), which is diffuse in comparison to the changes in body mass index in the green tea data. Remember to assess the scale required in the context of your own evidence base.

Because arms are smaller in sample size than whole studies, we have switched to the Student t distribution in the likelihood above, although that is not essential.
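If your arms are large enough for the normal approximation to be fine, the same model can be written with the known-variance normal likelihood instead. This is a sketch only; it assumes that bayesmh accepts a variable as the variance argument of normal(), as it does for t() above, and you still need to supply your own prior:

bayesmh mean active MU[id], ///
   nocons ///
   likelihood(normal(se2)) ///
   prior({mean:active}, ADD_YOUR_PRIOR_HERE) ///
   prior({MU[id]}, normal(0, 10000)) ///
   nchains(2) ///
   burnin(2500) ///
   mcmcsize(10000) ///
   init1({mean:active} 0.5 ) ///
   init2({mean:active} -0.5)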

Try this variant, which omits {MU[id]} and so assumes complete exchangeability of arms across studies. Some people describe this as breaking randomisation, and it induces a small bias:

bayesmh mean active, ///
        likelihood(t(se2, df)) ///
        prior({mean:_cons}, ADD_YOUR_PRIOR_HERE) /// 
        prior({mean:active}, ADD_YOUR_PRIOR_HERE) ///
        nchains(2) ///
        burnin(2500) ///
        mcmcsize(10000) ///
        init1({mean:_cons} -0.2 ///
              {mean:active} 0.2 ) ///
        init2({mean:_cons} 0.5 ///
              {mean:active} -0.4)

To be clear, that’s not recommended! But it is useful to compare its output with that of the correct model.

In the case of binomial likelihoods, studies report numerators (here, ae) and denominators (n_pts). The corresponding code is similar, but uses the binomial() likelihood specification:

bayesmh ae active MU[id], nocons ///
          likelihood(binomial(n_pts)) ///
	  prior({ae:active}, ADD_YOUR_PRIOR_HERE) ///
	  prior({MU[id]}, normal(0, 10000)) ///
          dots ///
	  showreffects ///
	  rseed(4567) ///
	  nchains(2) ///
	  burnin(10000) ///
	  mcmcsize(10000) ///
	  init1({ae:active} 0.5 ) ///
	  init2({ae:active} -0.5)

Here, {MU[id]} is the log odds in each control arm, and {ae:active} is the log odds ratio. If you need to do the inverse logistic transformation manually (from log odds to risk), it is:

gen risk = 1 / (1 + exp(-1*logodds))

However, inference with small numbers of events and variation in study size can prove difficult using the old Metropolis-Hastings algorithm. That is in fact the case for the ACE inhibitor vs ARB dataset! A simple solution is to expand the data to one row per participant (easy for binary data, but not an option for continuous outcomes like weight loss), and then use the “logistic” likelihood specification:

gen arm_id = _n
expand n_pts
sort arm_id
gen event = 0
bysort arm_id: replace event = 1 if _n<=ae
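Before fitting, it’s worth checking that the expansion reproduced the arm totals. This optional snippet (illustrative; total_ev is a temporary variable name of our choosing) should produce no assertion failures:

// each expanded arm should contain exactly ae events across n_pts rows
egen total_ev = total(event), by(arm_id)
assert total_ev == ae
bysort arm_id: assert _N == n_pts
drop total_ev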

bayesmh event MU[id] active, nocons ///
           likelihood(logistic) ///
	   prior({event:active}, ADD_YOUR_PRIOR) ///
	   prior({MU[id]}, normal(0, 10000)) ///
	   dots ///
	   showreffects ///
           rseed(8901) ///
	   nchains(2) ///
	   burnin(10000) ///
	   mcmcsize(10000) ///
	   init1({event:active} 0.5 ) ///
	   init2({event:active} -0.5) 

Note the long burnin, which is there to make sure that adaptation has finished. The dots option is on too, because we want to see the a’s (adaptation iterations) turn into dots (sampling iterations) in the output. This pseudo-participant-level computation will of course take longer than the arm-level one.

Connection to fixed effects

Now you have seen code for {MU[id]} with a diffuse prior. This is in fact the same coding approach that we would apply to a vector of study-specific contrasts if we needed to fit a “fixed effects” meta-analysis, where each study has its own control arm population mean or proportion, but they are not “shrunken”, “pulled together”, or “sharing information”.

Another way of doing it is to make study id into a categorical predictor and put the constant term back. In Stata, these two parameterisations look very different, and different computational machinery is used on them behind the scenes, while in full probabilistic programming languages like Stan, JAGS, or PyMC, there is no difference.
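As a sketch of that second parameterisation (illustrative only; the first study’s control mean is absorbed into {mean:_cons}, the other studies get their own offsets via i.id, and those study terms need diffuse priors too):

bayesmh mean active i.id, ///
   likelihood(t(se2, df)) ///
   prior({mean:_cons}, normal(0, 10000)) ///
   prior({mean:i.id}, normal(0, 10000)) ///
   prior({mean:active}, ADD_YOUR_PRIOR_HERE) ///
   nchains(2) ///
   burnin(2500) ///
   mcmcsize(10000) ///
   init1({mean:active} 0.5 ) ///
   init2({mean:active} -0.5)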

Random effects, arm-based

We have seen that, in order to respect the pairing of arms within studies, we have to introduce one more unknown into our model for each study. Furthermore, these unknowns can be expected to correlate negatively with the intervention effect in the posterior distribution: if you under-estimate all the studies’ control arm means or log odds, then you have to over-estimate the mean difference or log odds ratio in order to fit the intervention arms. All of this makes the task more challenging for Bayesian algorithms like Metropolis-Hastings and the Gibbs sampler, which we find in BUGS, JAGS, Stata, and parts of JASP. In contrast, the Hamiltonian Monte Carlo algorithm in Stan (hence brms and multinma) and PyMC copes better with a higher-dimensional, correlated posterior. Adding random effects to the model makes matters worse still. However, we can still get useful outputs as long as we are willing to specify good initial values, consider stronger priors, and run the algorithm for many more iterations.

We start again with arms reporting means. For small sample sizes, you can use the Student t likelihood with n-1 degrees of freedom, as shown above. If we begin with data in one row per study, we need to reshape it first. Then, note that we include ‘active’ as a predictor, along with its interaction with the random effects {U[id]} (but not U[id] by itself): this ensures that {MU[id]} affects both arms of each study, while {U[id]} affects the intervention arm only.

// rename the arm variables to n1/n2 etc. as before, then:
keep id n* mean* sd*
reshape long n mean sd, i(id) j(active)
replace active = active-1
gen se2 = (sd^2) / n
gen df = n - 1

bayesmh mean i.active 1.active#U[id] MU[id], ///
        nocons ///
        likelihood(t(se2, df)) ///
	prior({MU[id]}, normal(0, 10000)) ///
	prior({mean:1.active}, ADD_YOUR_PRIOR_HERE) ///
	prior({U[id]}, normal(0, {tau2})) ///
	prior({tau2}, ADD_YOUR_PRIOR_HERE) ///
	dots ///
	nchains(2) ///
	showreffects ///
	rseed(5678) ///
	burnin(20000) ///
	mcmcsize(10000) ///
	thin(10) ///
	init1({mean:1.active} -0.2 ///
	      {tau2} 0.05) ///
	init2({mean:1.active} 0.2 ///
	      {tau2} 0.15)

bayesstats grubin
bayesgraph diagnostics {mean:1.active}, ///
     title("Mean difference (theta)") ///
     name(diag_theta, replace)
bayesgraph diagnostics {MU[1]}, ///
     title("MU[1]") ///
     name(diag_mu1, replace)
bayesgraph diagnostics {U[1]}, ///
     title("U[1]") ///
     name(diag_u1, replace)
bayesgraph diagnostics {tau2}, ///
     title("tau squared") ///
     name(diag_tau2, replace)

The changes necessary for binary outcomes are only those shown previously, thanks to the way that Stata assembles the dependent variable, independent variables, and random effects into a linear predictor regardless of the likelihood type (the binomial likelihood applies the logistic link function automatically):

bayesmh ae i.active 1.active#U[id] MU[id], ///
        nocons ///
        likelihood(binomial(n_pts)) ///
        prior({MU[id]}, normal(0, 10000)) ///
        prior({ae:1.active}, ADD_YOUR_PRIOR_HERE) ///
        prior({U[id]}, normal(0, {tau2})) ///
        prior({tau2}, ADD_YOUR_PRIOR_HERE) ///
        nchains(2) ///
        showreffects ///
        rseed(5678) ///
        burnin(20000) ///
        mcmcsize(10000) ///
        thin(10) ///
        init1({ae:1.active} -0.2 ///
              {tau2} 0.05) ///
        init2({ae:1.active} 0.2 ///
              {tau2} 0.15)

bayesstats grubin
bayesgraph diagnostics {ae:1.active}, ///
     title("Log odds ratio") ///
     name(diag_lor, replace)
bayesgraph diagnostics {MU[1]}, ///
     title("MU[1]") ///
     name(diag_mu1, replace)
bayesgraph diagnostics {U[1]}, ///
     title("U[1]") ///
     name(diag_u1, replace)
bayesgraph diagnostics {tau2}, ///
     title("tau squared") ///
     name(diag_tau2, replace)

…or you can run it at pseudo-participant level:

gen arm_id = _n
expand n_pts
sort arm_id
gen event = 0
bysort arm_id: replace event = 1 if _n<=ae

bayesmh event MU[id] i.active 1.active#U[id], ///
          nocons ///
          likelihood(logistic) ///
	  prior({event:1.active}, ADD_YOUR_PRIOR) ///
	  prior({MU[id]}, normal(0, 10000)) ///
	  prior({U[id]}, normal(0, {tau2})) ///
	  prior({tau2}, ADD_YOUR_PRIOR_HERE) ///
	  dots ///
	  rseed(0246) ///
	  showreffects ///
	  nchains(2) ///
	  burnin(15000) ///
	  mcmcsize(10000) ///
	  thin(10) ///
	  init1({event:1.active} 0.5  ///
		{tau2} 0.05) ///
	  init2({event:1.active} -0.5  ///
		{tau2} 0.15) 

bayesstats grubin
bayesgraph diagnostics {event:1.active}, ///
     title("Log odds ratio") ///
     name(diag_lor, replace)
bayesgraph diagnostics {MU[1]}, ///
     title("MU[1]") ///
     name(diag_mu1, replace)
bayesgraph diagnostics {U[1]}, ///
     title("U[1]") ///
     name(diag_u1, replace)
bayesgraph diagnostics {tau2}, ///
     title("tau squared") ///
     name(diag_tau2, replace)

Once we have MU[id] and U[id], theta and tau, we have 2m+2 unknowns but only 2m arms, so we are slicing the study information very thinly indeed, and trusting the sharing of information in the model to get us through. When the studies are all the same size, none is too small, and the event is neither too rare nor too common, which we can arrange in simulation (see the “all pairwise models” posts), this model will work. However, real-life evidence bases are not so obliging. The ACEI vs ARB dataset has one very large study, and some of the others are too small to contribute much information at all. Furthermore, the event is fairly rare, so the absolute numbers get very small. If you try these models on that dataset, you may see problems in MCMC convergence.

Fully Bayesian models

The “fully Bayesian” model has a likelihood contribution from the arm mean and another from the arm variance (see Section 8.4 in the book for explanation of the sampling distribution of variances).

There are important modelling assumptions to be chosen and justified regarding the population standard deviation or precision of the outcome variable. In this random effects example, we have heterogeneity on the study means but we assume a common population SD shared by all control arms, {sigma_ctl}, and another for all intervention arms, {sigma_int}. This can be changed in various ways, as long as the assumptions can be justified. For example, there could be one {sigma} common to all arms in all studies, or some form of heterogeneity distribution could be applied across studies.
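In notation, and consistent with the assumptions just described (writing ybar_jk, s_jk and n_jk for the observed mean, SD and sample size of arm k of study j, with k indexing control or intervention arms), the two likelihood contributions can be sketched as:

ybar_jk ~ Normal( MU_j + (theta + U_j) * active_jk , sigma_k^2 / n_jk )

(n_jk - 1) * s_jk^2 / sigma_k^2 ~ chi-squared with n_jk - 1 degrees of freedom

where sigma_k is {sigma_ctl} for control arms and {sigma_int} for intervention arms.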

tags: #stata, #stata-repo, #repo

