This post expands on the subject matter of Chapter 5 of our book, on extracting study statistics for meta-analysis.

Uncertainty about a published statistic is one of the most common reasons to consider going Bayesian in your meta-analysis. Because probability can represent any uncertainty, we can use a prior distribution for the uncertain statistic instead of taking it at face value.

You and your non-statistical colleagues would probably be comfortable with sensitivity analysis, where you try out a limited number of possible values, plug them in, and see what happens. Using a prior distribution is just like sensitivity analysis++: we can allow the unknown statistic to move around with each iteration of our Bayesian sampling algorithm, and see what the final effect is.

Sometimes people feel that this “injects” uncertainty into the analysis, but it is really about acknowledging the uncertainty that is already there.

The extreme instance of this uncertainty is when the statistic is not reported at all! We discuss this in Chapter 11 of our book, and code for those models will be provided in other posts. Here, we want to focus on the possibility of a good old typographic error. We’ve probably all done it, or typesetters and colleagues have done it for us. Sometimes it is glaringly obvious, at other times it requires a bit of exploratory data analysis to reveal it.

Let’s consider a simple example in the green tea dataset.

We can plot the means and SDs in each arm of each study:

library(dplyr)

greentea <- read.csv("cochrane_green_tea_weight_loss.csv")

plot(greentea$mean_tea, greentea$sd_tea, 
     ylim=c(0, 3), xlim=c(-3.5, 0.5), 
     xaxs='i', yaxs='i', 
     xlab="Mean weight loss (BMI)", 
     ylab="Standard deviation")
points(greentea$mean_control, 
       greentea$sd_control, 
       pch=19)
for(i in 1:NROW(greentea)) { 
  lines(c(greentea$mean_tea[i],
          greentea$mean_control[i]), 
        c(greentea$sd_tea[i], 
          greentea$sd_control[i]), 
        col='#00000060') 
}
legend("topleft", 
       pch=c(1,19), 
       legend=c("Green tea", "Control"))

Each study generally shows greater weight loss (a mean further below zero) in the tea arm than in the control arm, and the SDs are about the same. One study, shown on the left of the graph, has unusually large weight loss and high SDs, but its arms are consistent with each other. We also don’t have much information about the left region (larger weight losses), and for all we know, something about different treatment preparations and sub-populations (heterogeneity) could induce a negative correlation between the mean and the standard deviation, joining that study to the bulk of statistics in the bottom-right corner of the graph.

The study that is more concerning, and which we focus on here, is the one joined by a nearly vertical line: the tea arm has an SD of 2.8 while the control arm has an SD of 0.8.

The paper is Hsu et al (2008) (unfortunately not open access), and we use it here as a coding example. We have read it in detail and found no compelling evidence of a typo, though 2.8 looks strange. In future posts, we will dig deeper into papers and do some detective work.

Hsu et al have an unusual value in the variable sd_tea. It might plausibly be a typo; perhaps it should be 0.8 instead of 2.8? Of course, the first thing to do is to write to the authors, but let’s assume you don’t solve the issue that way (one rarely does). The second thing to do is to ask project experts for their take on it, and document those discussions. We could include a prior on that one statistic. But first, let’s look at the impact of the extreme cases we are willing to consider: 0.8 and 2.8.

library(meta)

re_2_8 <- metacont(n_tea,
                   mean_tea,
                   sd_tea,
                   n_control,
                   mean_control,
                   sd_control,
                   data=greentea,
                   sm="MD",
                   studlab=Study,
                   common=FALSE, random=TRUE,
                   method.tau="DL")
print(summary(re_2_8))
                       MD             95%-CI %W(random)
Kozuma 2005       -1.3000 [-1.4345; -1.1655]        9.5
Takase 2008       -1.2000 [-1.3718; -1.0282]        9.4
Nagao 2007        -0.6000 [-0.7519; -0.4481]        9.5
Takeshita 2008    -0.4000 [-0.5970; -0.2030]        9.3
Kajimoto 2005     -0.4000 [-0.6114; -0.1886]        9.3
Suzuki 2009       -0.2000 [-0.5500;  0.1500]        8.6
Kataoka 2004      -0.1000 [-0.3632;  0.1632]        9.0
Takashima 2004     0.0000 [-0.5403;  0.5403]        7.4
Auvichayapat 2008 -1.1000 [-1.9860; -0.2140]        5.2
Hill 2007         -0.2000 [-0.4248;  0.0248]        9.2
Hsu 2008          -0.1000 [-0.9950;  0.7950]        5.2
Diepvens 2005      0.0000 [-0.3768;  0.3768]        8.4

Number of studies: k = 12
Number of observations: o = 1252 (o.e = 655, o.c = 597)

                          MD             95%-CI     z p-value
Random effects model -0.4753 [-0.7736; -0.1770] -3.12  0.0018

Quantifying heterogeneity:
 tau^2 = 0.2385 [0.0945; 0.7413]; tau = 0.4884 [0.3075; 0.8610]
 I^2 = 94.4% [91.9%; 96.1%]; H = 4.22 [3.51; 5.09]

Test of heterogeneity:
      Q d.f.  p-value
 196.31   11 < 0.0001

Details on meta-analytical method:
- Inverse variance method
- DerSimonian-Laird estimator for tau^2
- Jackson method for confidence interval of tau^2 and tau

We are running a non-Bayesian meta-analysis for this, but you could also do a basic random effects Bayesian model.

Now make a copy of the data frame and set Hsu’s sd_tea to 0.8:

greentea2 <- greentea
greentea2$sd_tea[11] <- 0.8

re_0_8 <- metacont(n_tea,
                   mean_tea,
                   sd_tea,
                   n_control,
                   mean_control,
                   sd_control,
                   data=greentea2,
                   sm="MD",
                   studlab=Study,
                   common=FALSE, random=TRUE,
                   method.tau="DL")
print(summary(re_0_8))
                       MD             95%-CI %W(random)
Kozuma 2005       -1.3000 [-1.4345; -1.1655]        9.2
Takase 2008       -1.2000 [-1.3718; -1.0282]        9.1
Nagao 2007        -0.6000 [-0.7519; -0.4481]        9.2
Takeshita 2008    -0.4000 [-0.5970; -0.2030]        9.0
Kajimoto 2005     -0.4000 [-0.6114; -0.1886]        9.0
Suzuki 2009       -0.2000 [-0.5500;  0.1500]        8.3
Kataoka 2004      -0.1000 [-0.3632;  0.1632]        8.7
Takashima 2004     0.0000 [-0.5403;  0.5403]        7.1
Auvichayapat 2008 -1.1000 [-1.9860; -0.2140]        5.1
Hill 2007         -0.2000 [-0.4248;  0.0248]        8.9
Hsu 2008          -0.1000 [-0.4555;  0.2555]        8.3
Diepvens 2005      0.0000 [-0.3768;  0.3768]        8.1

Number of studies: k = 12
Number of observations: o = 1252 (o.e = 655, o.c = 597)

                          MD             95%-CI     z p-value
Random effects model -0.4631 [-0.7582; -0.1680] -3.08  0.0021

Quantifying heterogeneity:
 tau^2 = 0.2415 [0.0982; 0.7285]; tau = 0.4914 [0.3133; 0.8535]
 I^2 = 94.6% [92.3%; 96.3%]; H = 4.32 [3.60; 5.19]

Test of heterogeneity:
      Q d.f.  p-value
 205.41   11 < 0.0001

Details on meta-analytical method:
- Inverse variance method
- DerSimonian-Laird estimator for tau^2
- Jackson method for confidence interval of tau^2 and tau

Comparing the tau and theta estimates and 95% confidence intervals:

sd_tea value   Parameter or Unknown   Estimate   Lower CI   Upper CI
2.8            theta                  -0.48      -0.77      -0.18
0.8            theta                  -0.46      -0.76      -0.17
2.8            tau                     0.49       0.31       0.86
0.8            tau                     0.49       0.31       0.85

This change has a negligible effect, so we can justify not including a prior for it. Record any decisions and sensitivity analyses like this for posterity.

(In fact, you can see a small change at the second decimal place. Below, we will explain the shift towards a lower theta and a wider CI as sd_tea grows bigger. It is entirely what we would expect from general statistical theory.)
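To see the mechanism, we can plug the two candidate SDs into the pooled-variance and standard-error formulas. Here is a minimal sketch with hypothetical arm sizes of 40 each (the point is the relative change in the SE, not Hsu's exact numbers):

```r
# Hedged sketch: how Hsu's SE of the mean difference responds to sd_tea.
# n_t and n_c are hypothetical (40 per arm); sd_control is 0.8 as reported.
se_md <- function(sd_tea, sd_control = 0.8, n_t = 40, n_c = 40) {
  pooled_var <- ((n_t - 1) * sd_tea^2 + (n_c - 1) * sd_control^2) /
                (n_t + n_c - 2)
  sqrt((n_t + n_c) * pooled_var / (n_t * n_c))
}

se_md(2.8)  # large SE: Hsu's study gets little weight
se_md(0.8)  # much smaller SE: Hsu's study gets more weight
```

The SE shrinks to well under half its size when sd_tea drops from 2.8 to 0.8, which is why the study's weight (and hence the pooled estimate) moves as it does.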

But for learning purposes, let’s construct a Bayesian random-effects meta-analysis that has weakly informative priors for theta and tau but treats Hsu’s sd_tea as another unknown, with a uniform prior over the range [0.8, 2.8]. This treats it as Coarsened Completely At Random, in the Heitjan-Rubin terminology: no other statistics we have to hand can give us any information about its true value, but we are willing to model it with that prior, which represents either a degree of belief or (more plausibly) a weakly informative prior for sensitivity analysis.

One could make a coherent argument for, say, extending the prior a little beyond [0.8, 2.8] on either side. Another option might be a U-shaped rescaled beta prior that prefers the two extremes as plausible typo explanations, but also allows the possibility of an in-between value, for example:

$$\frac{\sigma_{\mathbf{Hsu}, \mathbf{tea}} - 0.8}{2.8 - 0.8} \sim \mathrm{Beta}(0.9, 0.9)$$

but this is asking for non-convergence of chains and difficulty in interpretation.
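To see what that rescaled beta prior looks like, we can sample from it directly; a quick sketch:

```r
# Sketch: the rescaled Beta(0.9, 0.9) prior on [0.8, 2.8].
# Density piles up near the two endpoints (the candidate typo explanations)
# but still allows in-between values.
set.seed(1)
draws <- 0.8 + (2.8 - 0.8) * rbeta(10000, 0.9, 0.9)
hist(draws, breaks = 40,
     main = "Rescaled Beta(0.9, 0.9) prior",
     xlab = "sd_tea for Hsu")
```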

These are issues to discuss with the whole project team and to get approval for. Don’t let any subject experts duck that meeting as too technical. You need their sign-off so everyone can justify choices later, and so you don’t get scope creep.

First, make md and se_md. This is the dplyr (but non-piping) alternative to the base code in the book (page 98).

greentea <- mutate(greentea, 
       md=mean_tea-mean_control)
greentea <- mutate(greentea,
       pooled_var=((n_tea-1)*(sd_tea^2) +
                   (n_control-1)*(sd_control^2)) /  
                  (n_tea+n_control-2))
greentea <- mutate(greentea, 
       var_md=(n_tea+n_control) * pooled_var / 
              (n_tea*n_control))
greentea <- mutate(greentea, 
       se_md=sqrt(var_md))

not_hsu <- greentea[-11,]

If you really want to pipe left-to-right:

library(magrittr)
greentea %>% 
   mutate(md=mean_tea-mean_control) %>%
   mutate(pooled_var=((n_tea-1)*(sd_tea^2) +
                   (n_control-1)*(sd_control^2)) /  
                  (n_tea+n_control-2)) %>%
   mutate(var_md=(n_tea+n_control) * pooled_var / 
              (n_tea*n_control)) %>%
   mutate(se_md=sqrt(var_md)) -> greentea

not_hsu <- greentea[-11,]

You can replace the magrittr pipe with the newer native pipe |> in this code chunk if you prefer, and you don’t have to right-assign with -> at the end if that is not your style. Write your code the way you can most easily read and understand (and debug!) it.
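For instance, the md column can be built with the native pipe and base R’s transform(), with no packages needed. This is a toy sketch on a made-up two-row data frame, just to show the |> syntax (requires R >= 4.1):

```r
# Minimal sketch of the native pipe |>, using base transform()
# on a toy data frame (hypothetical values, not the green tea data).
toy <- data.frame(mean_tea = c(-1.3, -0.1), mean_control = c(0, 0))
toy <- toy |>
  transform(md = mean_tea - mean_control)
toy$md
```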

If we run a Bayesian random effects meta-analysis with weakly informative priors on theta and tau, we can check that we obtain almost identical answers. We use a uniform prior on tau that excludes values under 0.1. This is not always a good idea: make sure the upper limit is well above any reasonable range of uncertainty in the context of your meta-analysis. If you are unsure, go with something like a gamma prior instead. We clip off the small values to prevent funnel behaviour, which could be a problem, especially as we will use JAGS via the rjags package (see the introductory post on random effects MA models in BUGS/JAGS if you are not familiar with this).
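If you did want a gamma prior on tau, the relevant lines of the JAGS model could be sketched like this; the shape and rate values here are purely illustrative assumptions, to be chosen so that the prior mass covers plausible heterogeneity on your outcome scale:

```
tau ~ dgamma(1.5, 3)    # illustrative shape and rate, not a recommendation
tau_prec <- 1/(tau*tau)
```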

jagsdata <- list(m = NROW(greentea),
                 md = greentea$md,
                 se_md = greentea$se_md)

jagscode <- "
model {
  theta ~ dnorm(0, 0.25) # SD=2
  tau ~ dunif(0.1, 10)
  tau_prec <- 1/(tau*tau)

  for(j in 1:m) {
    u[j] ~ dnorm(0, tau_prec)
    theta_u[j] <- theta+u[j]
    prec_md[j] <- 1 / (se_md[j]*se_md[j])
    md[j] ~ dnorm(theta_u[j], prec_md[j])
  }
}
"

write(jagscode, "JAGS-hsu-basic.txt")

library(rjags)

jags.seed <- 12345
# Seeds only take effect if passed to each chain via .RNG.seed in the inits
jagsinits <- list(list(theta=(-0.4), tau=1,
                       .RNG.name="base::Mersenne-Twister",
                       .RNG.seed=jags.seed),
                  list(theta=(-0.2), tau=1.2,
                       .RNG.name="base::Mersenne-Twister",
                       .RNG.seed=jags.seed+1))
jagsmod <- jags.model("JAGS-hsu-basic.txt",
                      data=jagsdata,
                      inits=jagsinits,
                      n.chains=2)

update(jagsmod, n.iter=1000)
jagsdraws <- coda.samples(jagsmod,
                        variable.names=c("theta", 
                                         "tau", 
                                         "theta_u[1]",
                                         "theta_u[2]"),
                        n.iter=100000,
                        thin=10)

summary(jagsdraws)

gelman.diag(jagsdraws)

par(mfcol=c(2,2))

autocorr.plot(jagsdraws)
traceplot(jagsdraws, col=c("#b6252580", "#25b6b680"))

par(mfcol=c(1,1))

pairs(as.matrix(jagsdraws[[1]]), cex=0.4, col="#b6252510")
pairs(as.matrix(jagsdraws[[2]]), cex=0.4, col="#25b6b610")

Iterations = 2010:102000
Thinning interval = 10 
Number of chains = 2 
Sample size per chain = 10000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

              Mean      SD  Naive SE Time-series SE
tau         0.5091 0.14053 0.0009937      0.0011795
theta      -0.4742 0.16147 0.0011418      0.0025701
theta_u[1] -1.2829 0.06639 0.0004694      0.0004703
theta_u[2] -1.1758 0.08637 0.0006107      0.0006107

2. Quantiles for each variable:

              2.5%     25%     50%     75%   97.5%
tau         0.3062  0.4109  0.4841  0.5813  0.8523
theta      -0.7953 -0.5772 -0.4761 -0.3710 -0.1500
theta_u[1] -1.4130 -1.3276 -1.2831 -1.2385 -1.1528
theta_u[2] -1.3454 -1.2334 -1.1759 -1.1185 -1.0068

Now let’s run the Bayesian meta-analysis, with the extra prior on Hsu’s SD, in JAGS via the rjags package. We will make use of the formulas for the pooled variance and the standard error of the mean difference, which appear as Equations 3.1 and 3.3 in our book. Note that we send the other 11 studies’ statistics as vectors and send Hsu’s separately in the scalars md_hsu, nt, nc and sc.
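For reference, with n_t and n_c the arm sizes and s_t and s_c the arm SDs, the pooled variance and the standard error of the mean difference (as coded in the model below) are:

$$s_p^2 = \frac{(n_t-1)s_t^2 + (n_c-1)s_c^2}{n_t+n_c-2}, \qquad \mathrm{SE}(\mathrm{MD}) = \sqrt{\frac{(n_t+n_c)\,s_p^2}{n_t\,n_c}} = \sqrt{s_p^2\left(\frac{1}{n_t}+\frac{1}{n_c}\right)}$$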

jagsdata <- list(m = NROW(not_hsu),
                 md = not_hsu$md,
                 se_md = not_hsu$se_md,
                 md_hsu = greentea$md[11],
                 nt = greentea$n_tea[11],
                 nc = greentea$n_control[11],
                 sc = greentea$sd_control[11])

jagscode <- "
model {
  theta ~ dnorm(0, 0.25) # SD=2
  tau ~ dunif(0.1, 10)
  tau_prec <- 1 / (tau*tau)

  for(j in 1:m) {
    u[j] ~ dnorm(0, tau_prec)
    prec_md[j] <- 1 / (se_md[j]*se_md[j])
    md[j] ~ dnorm(theta+u[j], prec_md[j])
  }
  sd_tea_hsu ~ dunif(0.8, 2.8)
  pooled_var_hsu <- ((nt-1)*(sd_tea_hsu^2) + (nc-1)*(sc^2)) / (nt+nc-2)
  var_md_hsu <- (nt+nc) * pooled_var_hsu / (nt*nc)
  prec_md_hsu <- 1 / var_md_hsu
  
  u_hsu ~ dnorm(0, tau_prec)
  md_hsu ~ dnorm(theta+u_hsu, prec_md_hsu)
}
"

write(jagscode, "JAGS-hsu-sensitivity.txt")

jags.seed <- 12345
jagsinits <- list(list(theta=0.2, tau=1,
                       .RNG.name="base::Mersenne-Twister",
                       .RNG.seed=jags.seed),
                  list(theta=(-0.2), tau=1.2,
                       .RNG.name="base::Mersenne-Twister",
                       .RNG.seed=jags.seed+1))
jagsmod <- jags.model("JAGS-hsu-sensitivity.txt",
                      data=jagsdata,
                      inits=jagsinits,
                      n.chains=2)

update(jagsmod, n.iter=1000)
jagsdraws <- coda.samples(jagsmod,
                        variable.names=c("theta", 
                                         "tau",
                                         "sd_tea_hsu", 
                                         "u[1]", 
                                         "u[2]", 
                                         "u_hsu"),
                        n.iter=100000,
                        thin=10)

summary(jagsdraws)

gelman.diag(jagsdraws)

par(mfcol=c(3,2))

autocorr.plot(jagsdraws)
traceplot(jagsdraws, col=c("#b6252580", "#25b6b680"))

for(i in 1:6) {
  plot(density(as.numeric(jagsdraws[[1]][,i]), bw="SJ"),
       main=colnames(jagsdraws[[1]])[i])
}

par(mfcol=c(1,1))

pairs(as.matrix(jagsdraws[[1]]), cex=0.4, col="#b6252510")
pairs(as.matrix(jagsdraws[[2]]), cex=0.4, col="#25b6b610")
Iterations = 2010:102000
Thinning interval = 10 
Number of chains = 2 
Sample size per chain = 10000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

              Mean     SD Naive SE Time-series SE
sd_tea_hsu  1.7789 0.5748 0.004065       0.004065
tau         0.5104 0.1440 0.001018       0.001563
theta      -0.4678 0.1685 0.001191       0.002976
u[1]       -0.8146 0.1797 0.001271       0.003075
u[2]       -0.7061 0.1871 0.001323       0.003108
u_hsu       0.2546 0.2997 0.002119       0.003032

2. Quantiles for each variable:

              2.5%      25%     50%     75%   97.5%
sd_tea_hsu  0.8493  1.27587  1.7710  2.2737  2.7438
tau         0.3094  0.41143  0.4852  0.5784  0.8599
theta      -0.7933 -0.57078 -0.4703 -0.3677 -0.1393
u[1]       -1.1695 -0.92420 -0.8110 -0.7009 -0.4758
u[2]       -1.0703 -0.82098 -0.7013 -0.5863 -0.3549
u_hsu      -0.3432  0.06334  0.2579  0.4479  0.8479

Now that we allow the possibility of Hsu’s sd_tea being smaller than was reported, we see a very small shift in the putative effect of green tea extracts (theta) upward toward zero, while the width of the credible interval is essentially unchanged. This is because Hsu et al reported a mean difference close to zero, a higher value than most studies. By shrinking the standard error, we give this result more credence and more influence over the meta-analysis (compare the DerSimonian-Laird weights (%) given to it in the two non-Bayesian meta-analyses at the beginning of this post). The heterogeneity (tau) is not convincingly changed compared to the size of the Monte Carlo error (seen here as “Time-series SE”).

Bottom line: even if it is wrong, it does not make a difference to the conclusions that is worth worrying about. (We knew this already from the crude sensitivity analysis above…) You should always record these sorts of decisions and exploratory extended models.

Caution: don’t be tempted to put priors like this on everything in case something might be wrong. They will not tell you some kind of unobtainable truth; they just wiggle the model and show what happens. And if you try priors for more and more possible problems, you might make an incorrect decision in model building by chance, just like p-hacking in frequentist statistics.

We can also use the dcat distribution to allow only two possible values, 0.8 or 2.8. This is not possible in Stan (hence brms), because the Hamiltonian Monte Carlo algorithm uses gradients of the log posterior density and so requires all parameters (unknowns) to be “continuous” (to jointly define a metric space). However, if you are reasonably mathematically capable, see the Stan User’s Guide on marginalising out discrete parameters.
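In JAGS, a two-point prior on Hsu’s SD could be sketched as below, replacing the dunif line in the sensitivity model; the 50/50 weights are our illustrative assumption, not something from the paper:

```
# Hypothetical two-point prior on Hsu's sd_tea (weights are illustrative)
p_typo[1] <- 0.5            # probability the true SD is 0.8
p_typo[2] <- 0.5            # probability the true SD is 2.8
k ~ dcat(p_typo[])
sd_vals[1] <- 0.8
sd_vals[2] <- 2.8
sd_tea_hsu <- sd_vals[k]
```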


Other algorithms that use gradients are the more exotic MALA, the bouncy particle sampler, and the zig-zag sampler. At the time of writing, bayesmeta and JASP simply would not allow this kind of extended Bayesian model. BUGS, JAGS and Stata will permit discrete unknowns, and PyMC can run the algorithms (the Gibbs sampler or random-walk Metropolis-Hastings) that would accommodate them. We will explore the dcat prior in the context of data extraction in other posts.

tags: #greentea, #jags, #rjags, #extraction

