Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
We conduct power calculations both for our current sample and for a sample twice as large, since we hope to double the sample size with the second wave. For simplicity, we base the calculations on the randomization sample of 36 schools and assume that each school has 2 grades (following our planned design, but not accounting for peculiarities of individual schools) with 26 children per grade on average; the resulting power is therefore an estimate. For wave 1 (2), we thus calculate power with 36 (72) schools and 72 (144) grades. Throughout, figures in parentheses refer to the doubled sample.
We use Optimal Design Software for all power calculations. The main interest of the project is the combined effect of the treatments. To test it, we compare the 18 (36) grades in TPD schools that receive the FEEDBACK treatment to the 18 (36) grades in NO TPD schools that do not receive the FEEDBACK treatment. We account for clustering at the school level and assume a conservative intra-school correlation coefficient of 0.1. We also account for the 18 (36) blocks created by our pairwise design and assume an effect size variability of 0.01. With 26 children in each grade, and hence 52 children in each block, we are able to detect an effect of 0.35 (0.24) standard deviations with a power of 80% at α = 0.05. Including student characteristics and pre-treatment assessment scores as controls is likely to improve statistical power. We are able to detect the same effect size for the TPD treatment by comparing the 36 (72) NO FEEDBACK grades between the 18 (36) TPD and the 18 (36) NO TPD schools. We are also able to detect a similar effect size for the FEEDBACK treatment, where we consider only the 18 (36) NO TPD schools/blocks and compare the 18 (36) FEEDBACK grades to the 18 (36) NO FEEDBACK grades within school.
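As a rough cross-check on the figures above, the minimum detectable effect size can be approximated by hand. The sketch below is not the Optimal Design routine: it assumes the standard variance expression for a blocked (multisite) cluster-randomized design with a standardized outcome, treats each block as one treated and one control cluster, and uses a normal approximation instead of the noncentral distribution that dedicated software would use. The function name and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mdes_blocked_crt(n_blocks, clusters_per_block=2, cluster_size=26,
                     icc=0.1, effect_size_variability=0.01,
                     alpha=0.05, power=0.80):
    """Approximate MDES (in standard deviations) for a blocked
    cluster-randomized design, using a normal approximation.

    Assumption: the variance of the standardized effect estimate is
    [effect_size_variability + 4 * (icc + (1 - icc) / n) / J] / K,
    with K blocks, J clusters per block and n children per cluster.
    """
    # Variance of a cluster mean in standardized units.
    cluster_var = icc + (1 - icc) / cluster_size
    # Variance of the estimated average treatment effect across blocks.
    var_effect = (effect_size_variability
                  + 4 * cluster_var / clusters_per_block) / n_blocks
    # Two-sided test at level alpha with the requested power.
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * sqrt(var_effect)


if __name__ == "__main__":
    for blocks in (18, 36):
        print(blocks, round(mdes_blocked_crt(blocks), 2))
```

With these inputs the approximation gives roughly 0.35 for 18 blocks and 0.25 for 36 blocks. Small departures from the Optimal Design results are to be expected, because the software's exact computation and degrees-of-freedom treatment are not reproduced here.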
As described in more detail in the experimental design, we will also use OLS regressions with interaction terms to measure treatment effects, controlling for baseline measures and block fixed effects, which should improve power.
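For illustration only, one possible form of such a regression is sketched below. The notation is assumed here and is not taken from the experimental design: $Y_{igsb}$ is the outcome of child $i$ in grade $g$ of school $s$ in block $b$, $Y^{0}_{igsb}$ is the baseline measure, and $\mu_b$ is a block fixed effect.

```latex
Y_{igsb} = \beta_0
         + \beta_1 \,\mathrm{TPD}_{s}
         + \beta_2 \,\mathrm{FEEDBACK}_{gs}
         + \beta_3 \,(\mathrm{TPD}_{s} \times \mathrm{FEEDBACK}_{gs})
         + \gamma \, Y^{0}_{igsb}
         + \mu_b
         + \varepsilon_{igsb}
```

Under this hypothetical specification, $\beta_1$ and $\beta_2$ capture the effects of each treatment on its own, and the combined effect of both treatments relative to neither is $\beta_1 + \beta_2 + \beta_3$. Standard errors would be clustered at the school level, in line with the clustering assumed in the power calculations.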