Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Power calculations are for students’ subject-specific proficiency rates based on standardized test scores, by academic level (Grades 3-8 and Grade 11), and use school-level administrative data for AY 2015-16 and 2016-17. Given the cluster-randomized design, with student test scores measured at baseline and endline, the power calculations (power = 0.9, α = 0.05) suggest that with 237 elementary/middle schools per arm (assuming equal treatment and control group sample sizes), it will be possible to detect a year-specific increase in Grades 3-8 students’ academic proficiency in Spanish, English, and Math of 0.11σ, 0.10σ, and 0.09σ, respectively. (See power analysis appendix and response for details.) These MDEs are similar to estimated short-term impacts of principal training programs in the US using student-level outcomes (e.g., Fryer 2017, Gates et al. 2014). Comparable MDEs for Grade 11 test scores are somewhat larger, in the range of 0.27σ (English) to 0.31σ (Spanish), given the more limited number of high schools (61 per experimental arm). We expect to gain statistical power for both grade-level groups because we will have access to individual students’ (continuous) standardized test scores rather than only (binary) proficiency levels, individual-level baseline outcomes, and a larger sample of schools in the control group in Year 1 [e.g., 474 (= 237 x 2) K-8 schools and 122 (= 61 x 2) high schools].
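For readers who want to see how an MDE of this kind is typically computed, the following is a minimal sketch using the standard normal-approximation formula for a two-level cluster-randomized design. It is not the calculation from the power analysis appendix. The function name, the parameter names, and the R-squared values in the usage lines are illustrative assumptions; only the school counts (474 and 122), power = 0.9, and α = 0.05 come from the text above. When the outcome is a school-level rate, the school is the unit of analysis, which corresponds to icc = 1 and n_per_cluster = 1.

```python
from math import sqrt
from statistics import NormalDist


def mde_cluster(j_total, p_treat=0.5, alpha=0.05, power=0.9,
                icc=1.0, n_per_cluster=1, r2_between=0.0, r2_within=0.0):
    """Approximate MDE, in outcome standard deviations, for a
    cluster-randomized trial (two-sided test, normal approximation).

    j_total       -- total number of clusters (schools) across both arms
    p_treat       -- share of clusters assigned to treatment
    icc           -- intraclass correlation (1.0 for school-level outcomes)
    n_per_cluster -- units per cluster (1 for school-level outcomes)
    r2_between    -- share of between-cluster variance explained by
                     covariates such as baseline scores (assumed value)
    r2_within     -- share of within-cluster variance explained (assumed)
    """
    z = NormalDist().inv_cdf
    multiplier = z(1 - alpha / 2) + z(power)
    # Residual variance of the cluster mean, after covariate adjustment.
    residual = (icc * (1 - r2_between)
                + (1 - icc) * (1 - r2_within) / n_per_cluster)
    variance = residual / (p_treat * (1 - p_treat) * j_total)
    return multiplier * sqrt(variance)


# Usage with the school counts from the text and a HYPOTHETICAL baseline
# R-squared of 0.8 (not a value reported in the source).
print(round(mde_cluster(474, r2_between=0.8), 3))  # elementary/middle schools
print(round(mde_cluster(122, r2_between=0.8), 3))  # high schools
```

The sketch makes the two levers mentioned in the text explicit: adding schools raises j_total, and adding baseline outcomes raises the R-squared terms. Both shrink the MDE. It ignores the degrees-of-freedom correction that a t-based multiplier would apply, which matters little at these cluster counts.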