Minimum detectable effect size for main outcomes (accounting for sample
design and clustering)
We estimate that we need a sample of at least 128. We use "Dynamic Treatment Effects of Job Training" by Jorge Rodriguez, Fernando Saltiel, and Sergio Urzua, forthcoming in the Journal of Applied Econometrics, as an analogue to our setting. They focus on earnings as a means of recovering the returns to job training. Given a desired power of 80% and a significance level of 0.05, applying their sample averages of logged earnings with and without the treatment (6.23 and 6.14, respectively), together with their reported standard deviation of 0.57 and treatment effect of 1.7%, implies an N of 630. However, this estimate is highly sensitive to the standard deviation and the treatment effect. First, their upper bound on the treatment effect is 3.4%, and substituting that value gives a much smaller N = 128. Second, their standard deviation of earnings is large and comes from a much less developed country, where earnings are more heterogeneous. The American Time Use Survey from 2020 suggests that the standard deviation of logged earnings for full-time workers is 0.57. If we furthermore assume that our sample will be more homogeneous, with a standard deviation closer to 0.40 (and a treatment effect of 3.7%), then we only need N = 68. For these reasons, we believe that a sample of N = 128 or more is a reasonable target.
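The sensitivity described above can be illustrated with a minimal sketch of the standard normal-approximation sample size formula for a two-sided comparison of two means, n = 2 (z_{1-alpha/2} + z_{power})^2 sd^2 / delta^2. The function name n_per_arm and its design_effect argument are hypothetical and introduced only for illustration; the text does not state which formula or software was used, nor whether N is per arm or in total. Taking delta as the raw difference in logged earnings means (6.23 - 6.14) with a standard deviation of 0.57, this formula gives roughly 630 per arm. The N = 128 and N = 68 figures depend on how the percentage effects are converted into log points, which is not specified, so they are not reproduced here.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta, sd, alpha=0.05, power=0.80, design_effect=1.0):
    """Approximate sample size per arm for a two-sided, two-sample
    comparison of means (normal approximation, equal arms, common sd).

    design_effect is an assumed inflation factor for clustering,
    e.g. 1 + (m - 1) * icc; the default of 1.0 means no clustering.
    """
    if delta == 0 or sd <= 0 or design_effect < 1:
        raise ValueError("need delta != 0, sd > 0, design_effect >= 1")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = std_normal.inv_cdf(power)          # quantile for the desired power
    n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / delta ** 2
    return ceil(n * design_effect)


# Inputs taken from the text: logged earnings means of 6.23 and 6.14, sd of 0.57.
baseline = n_per_arm(delta=6.23 - 6.14, sd=0.57)
print(baseline)
```

Because the required n scales with (sd / delta)^2, doubling the effect size or shrinking the standard deviation reduces the required sample sharply, which is why the estimate moves so much across the scenarios considered. If randomization is clustered, a design effect above 1 would inflate these figures proportionally.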