Minimum detectable effect size for main outcomes (accounting for sample
design and clustering)
Power calculations are based on observing ‘statistically significant’ differences between study arms. However, ‘statistical significance’ is not a quantity used in Bayesian analyses, nor, we argue, is it the appropriate criterion for determining the effectiveness of the intervention. Instead, we conduct an ‘assurance’ analysis to determine the probability of estimating an effect size to within a given degree of certainty once we have observed the data and updated the model. This has a more natural interpretation and allows us to take into account our prior uncertainty about the effect size. Because the assurance calculations are mathematically intractable, we simulate trial data to evaluate the design. The simulation includes no individual or cluster-level covariates and staggers the trial start over three time periods.
The West Midlands Combined Authority will recruit 132 workplaces to be randomised equally between the different arms of the trial (33 per trial arm). Based on the demographics of small-medium enterprises (SMEs) in the West Midlands, we anticipate a mean size of 35 employees per SME, so approximately 4,620 employees (132 × 35) would be eligible to be sampled.
A minimum of 10 employees (at a range of different levels) will be randomly selected from each small-medium enterprise for interview. Employees will be resampled for each subsequent assessment.
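The sample sizes implied by these figures can be checked with a few lines of arithmetic. This is a minimal sketch: the number of arms (four) is not stated directly and is inferred here from 132 workplaces at 33 per arm, and the variable names are illustrative only.

```python
# Figures taken from the text; n_arms is inferred (132 / 33), not stated.
n_workplaces = 132
workplaces_per_arm = 33
mean_employees = 35             # anticipated mean workplace size
min_sampled_per_workplace = 10  # minimum interviews per workplace

n_arms = n_workplaces // workplaces_per_arm
eligible_employees = n_workplaces * mean_employees
min_interviews_per_assessment = n_workplaces * min_sampled_per_workplace
min_interviews_per_arm = workplaces_per_arm * min_sampled_per_workplace

print("arms:", n_arms)
print("eligible employees:", eligible_employees)
print("minimum interviews per assessment:", min_interviews_per_assessment)
print("minimum interviews per arm per assessment:", min_interviews_per_arm)
```

Because employees are resampled at each assessment, the per-assessment minimum applies separately to every assessment round.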
For the purposes of the design evaluation and the generation of simulation data we use informative priors, akin to a prior belief about effect size in frequentist power calculations. These are distinct from the analysis priors used at the analysis stage for model estimation. The treatment effects for the 50% and 100% incentive arms are considered likely to fall in the ranges (in terms of percentage point increases) of [0, 20] and [0, 30] respectively, which translate into approximate odds ratio intervals of [1.00, 2.33] and [1.00, 3.50]. We therefore conduct our simulation with distributions on the log odds ratio scale of N(0.44, 0.22²) and N(0.83, 0.32²). The baseline was assumed to be between 0% and 50%, and we used a simulation distribution of N(-1.1, 0.5²) on the logit scale. We also assumed the ICC would be uniformly distributed between 0.02 and 0.05. Finally, we assumed the measurement effect, as an odds ratio, to lie between 0.9 and 1.1, and thus simulated it on the log scale from a N(0, 0.1²) distribution.
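The conversion from percentage-point increases to odds ratios, and the drawing of simulation parameters, can be sketched as follows. Two assumptions are made here that the text does not state: the baseline used for the conversion is taken to be 30% (a value that happens to reproduce the quoted upper limits of 2.33 and 3.50), and the second argument of each normal distribution is read as a squared standard deviation. The dictionary keys and function names are hypothetical.

```python
import numpy as np

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

def odds_ratio_from_pp(baseline, increase):
    """Odds ratio implied by an absolute increase over a baseline probability."""
    return odds(baseline + increase) / odds(baseline)

# Assumption: a 30% baseline is used for the conversion (not stated in the text).
baseline = 0.30
or_50 = odds_ratio_from_pp(baseline, 0.20)   # upper limit, 50% incentive arm
or_100 = odds_ratio_from_pp(baseline, 0.30)  # upper limit, 100% incentive arm
print(round(or_50, 2), round(or_100, 2))

# Draw simulation parameters from the design distributions.
# Assumption: standard deviations are 0.22, 0.32, 0.5 and 0.1.
rng = np.random.default_rng(1)
n_draws = 10_000
draws = {
    "log_or_50": rng.normal(0.44, 0.22, n_draws),
    "log_or_100": rng.normal(0.83, 0.32, n_draws),
    "baseline_logit": rng.normal(-1.1, 0.5, n_draws),
    "icc": rng.uniform(0.02, 0.05, n_draws),
    "log_or_measurement": rng.normal(0.0, 0.1, n_draws),
}

# Central 95% interval of each treatment effect on the odds ratio scale.
for key in ("log_or_50", "log_or_100"):
    lo, hi = np.exp(np.percentile(draws[key], [2.5, 97.5]))
    print(key, round(lo, 2), round(hi, 2))
```

Printing the central intervals of the draws is a quick way to compare the stated distributions against the intended odds ratio ranges.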
Based on the design of the trial, we determined the probability that there would be at least a 95% posterior probability that each treatment effect (as an odds ratio) would be >1 and that the absolute difference between the treatment effects would be >0. The respective probabilities for the 50% and 100% treatment conditions were 85% and 94%, and the probability for the difference was >99%.
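The logic of an assurance calculation can be illustrated with a deliberately simplified simulation. This is a rough sketch and not the model used for the figures above: it compares one incentive arm with a control arm at a single assessment, ignores the staggered start and the measurement effect, and replaces the full Bayesian model with a normal approximation to the posterior of the log odds ratio under a flat prior, inflated by a design effect. The function name, the plug-in ICC of 0.05 and the latent-variable conversion from ICC to a cluster-level variance are all assumptions made for illustration, so the output should not be expected to match the reported 85% and 94%.

```python
import math
import numpy as np

def simulate_assurance(mu, sd, n_sims=1000, clusters_per_arm=33, m=10, seed=0):
    """Share of simulated trials with approximate Pr(log OR > 0 | data) >= 0.95."""
    rng = np.random.default_rng(seed)
    n = clusters_per_arm * m  # interviews per arm
    successes = 0
    for _ in range(n_sims):
        # Draw the "true" parameters from the design distributions.
        log_or = rng.normal(mu, sd)
        base = rng.normal(-1.1, 0.5)
        icc = rng.uniform(0.02, 0.05)
        # Assumption: latent-variable ICC for a logistic model,
        # icc = s2 / (s2 + pi^2 / 3), solved for the cluster SD.
        sigma = math.sqrt(icc * (math.pi ** 2 / 3) / (1 - icc))
        counts = []
        for effect in (0.0, log_or):  # control arm, then incentive arm
            u = rng.normal(0.0, sigma, clusters_per_arm)
            p = 1.0 / (1.0 + np.exp(-(base + effect + u)))
            counts.append(int(rng.binomial(m, p).sum()))
        y0, y1 = counts
        # Log odds ratio with a 0.5 continuity correction.
        est = (math.log((y1 + 0.5) / (n - y1 + 0.5))
               - math.log((y0 + 0.5) / (n - y0 + 0.5)))
        var = (1 / (y1 + 0.5) + 1 / (n - y1 + 0.5)
               + 1 / (y0 + 0.5) + 1 / (n - y0 + 0.5))
        # Assumption: conservative plug-in ICC of 0.05 for the design effect.
        deff = 1 + (m - 1) * 0.05
        se = math.sqrt(var * deff)
        # Normal approximation to Pr(log OR > 0 | data) under a flat prior.
        post_prob = 0.5 * (1 + math.erf(est / (se * math.sqrt(2))))
        successes += post_prob >= 0.95
    return successes / n_sims

assurance_50 = simulate_assurance(0.44, 0.22)
assurance_100 = simulate_assurance(0.83, 0.32)
print("approximate assurance, 50% arm:", assurance_50)
print("approximate assurance, 100% arm:", assurance_100)
```

The structure mirrors the description above: parameters are drawn from the design distributions, a trial is simulated, the posterior probability of a positive effect is evaluated, and the assurance is the proportion of simulated trials in which that probability reaches 95%.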