Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Outcome unit:
Assessment scores are measured on each component's official raw scale (that is, the points recorded for the unit). For the purpose of illustrating statistical power, minimum detectable effects (MDEs) below are expressed in standard deviation units.
Main continuous outcomes (format and encouragement effects, intention to treat):
The power calculations follow the pre-analysis plan: a two-sided test at the 5 per cent significance level, 80 per cent power, a balanced 2x2 factorial design, and individual-level randomisation. Main effects are estimated by pooling across the other factor, so each arm contains roughly half of the total sample.
Without covariates, the minimum detectable difference between arms is about 0.13 standard deviations at total N = 1,800, 0.13 at N = 2,000, and 0.12 at N = 2,200.
For the interaction effect (format x encouragement), which is estimated using roughly one quarter of the sample in each cell, the corresponding MDEs are about 0.26 at N = 1,800, 0.25 at N = 2,000, and 0.24 at N = 2,200.
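The figures above can be reproduced with a short calculation. The sketch below is illustrative only: it assumes outcomes standardised to unit variance, equal allocation across cells, a normal approximation, and no adjustment for clustering (consistent with individual-level randomisation). The function names are hypothetical and are not taken from the pre-analysis plan.

```python
from math import sqrt
from statistics import NormalDist


def mde_multiplier(alpha=0.05, power=0.80):
    """Sum of the two-sided critical value and the power quantile (about 2.80)."""
    z = NormalDist().inv_cdf
    return z(1 - alpha / 2) + z(power)


def main_effect_mde(n_total, alpha=0.05, power=0.80):
    """MDE in SD units for a main effect: two pooled arms of n_total / 2 each.

    The standard error of the difference in means is sqrt(4 / n_total).
    """
    return mde_multiplier(alpha, power) * sqrt(4 / n_total)


def interaction_mde(n_total, alpha=0.05, power=0.80):
    """MDE in SD units for the 2x2 interaction: four cells of n_total / 4 each.

    The interaction contrast sums four cell-mean variances, so its
    standard error is sqrt(16 / n_total), twice that of a main effect.
    """
    return mde_multiplier(alpha, power) * sqrt(16 / n_total)


for n in (1800, 2000, 2200):
    print(n, round(main_effect_mde(n), 2), round(interaction_mde(n), 2))
```

Under these assumptions the interaction MDE is exactly twice the main-effect MDE at any given sample size, which is why the two sets of figures differ by a factor of two.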
Including pre-randomisation covariates (such as baseline assessment performance) should reduce residual variance. If these controls explain around 20 to 30 per cent of the variation in outcomes, the MDEs scale with the square root of the unexplained share of variance (a factor of about 0.89 to 0.84), implying reductions of roughly 11 to 16 per cent. This would place the main-effect MDEs at around 0.10 to 0.12, and the interaction MDEs at around 0.20 to 0.24.
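The covariate adjustment can be sketched as follows. This assumes the standard approximation that the MDE scales with the residual standard deviation, that is, with the square root of one minus the R-squared of the covariates. The function name and the example inputs are illustrative assumptions rather than part of the plan.

```python
from math import sqrt


def adjusted_mde(unadjusted_mde, r_squared):
    """Scale an MDE by sqrt(1 - R^2), the ratio of residual to total SD."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must lie in [0, 1)")
    return unadjusted_mde * sqrt(1 - r_squared)


# Proportional reduction in the MDE for the two illustrative R^2 values.
for r2 in (0.20, 0.30):
    reduction = 1 - sqrt(1 - r2)
    print(f"R^2 = {r2:.2f}: MDE falls by {reduction:.1%}")
```

Because the scaling is by a square root, covariates that explain a given share of variance reduce the MDE by noticeably less than that share.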
Usage (instrumental):
If encouragement shifts the usage measure (for example, the log of one plus total minutes) by around 0.3 standard deviations in the first stage, the minimum detectable local average effect of usage on assessment scores is roughly the intention-to-treat MDE divided by 0.3, or a little over three times as large. With main-effect MDEs around 0.12 to 0.13, this corresponds to about 0.40 to 0.43 outcome standard deviations per standard deviation of usage. A weaker first stage would increase this considerably and may leave the instrumental estimates imprecise.
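The sensitivity to first-stage strength can be illustrated with a simple ratio. The sketch below assumes the usual Wald-type approximation, in which the minimum detectable local average effect is the intention-to-treat MDE divided by the first-stage effect, and it ignores sampling error in the first stage itself. The function name and the alternative first-stage values of 0.2 and 0.1 are hypothetical, chosen only for illustration.

```python
def late_mde(itt_mde, first_stage_effect):
    """Approximate minimum detectable LATE: ITT MDE over the first-stage effect.

    Both inputs are in standard deviation units; the result is in outcome
    SDs per SD of usage.
    """
    if first_stage_effect <= 0:
        raise ValueError("first_stage_effect must be positive")
    return itt_mde / first_stage_effect


# How the detectable effect grows as the first stage weakens.
for first_stage in (0.3, 0.2, 0.1):
    for itt in (0.12, 0.13):
        print(first_stage, itt, round(late_mde(itt, first_stage), 2))
```

Halving the first-stage effect doubles the minimum detectable local average effect, so the instrumental estimates are informative only if encouragement moves usage substantially.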