Minimum detectable effect size for main outcomes (accounting for sample
design and clustering)

EXPERIMENT 1
We designed Experiment 1 before knowing the actual capacity at CPS for the Summer Learning program. Roughly 23,000 students in the district were eligible for the program. We make a series of assumptions in order to carry out our power calculations under different scenarios.
In the first scenario, we assume that not all eligible students will want to participate in the program and that the district does not have the capacity to serve all eligible students, so we set program capacity to 11,000 students served in Summer Learning. Assuming that 3,300 students are already registered by the time Experiment 1 starts, and that the control group’s take-up rate is 50 percent, we calculate the following minimum detectable effect sizes (MDEs) for changes in program registration. We vary the number of students called by an outreach worker to be 3,500 or 7,000, and we set power = 0.8. The following table, which summarizes our findings, suggests that if outreach workers called 3,500 students, we would be able to detect a 2.8 percentage point difference in registration between treatment and control students. Alternatively, if outreach workers called 7,000 students, we would be able to detect a 2.4 percentage point difference in the probability of registering for Summer Learning between the treatment and control groups.
MDE (3,500 students called, 8,100 students in control): 0.0283227
MDE (7,000 students called, 6,350 students in control): 0.0242665
In the second scenario, we set program capacity to 23,000 students (all those who were eligible) served in Summer Learning and repeat the above exercise. We further assume that 8,000 students are already registered by the time Experiment 1 starts, and that the control group take-up rate is 15 percent, in order to calculate the following MDEs. The outcome here is registering for the program. We vary the number of students called by an outreach worker to be 4,000, 2,000, or 1,000, and we set power = 0.8. The following table, which summarizes our findings, suggests that if outreach workers called 4,000 students, we would be able to detect a 2.1 percentage point difference in registration between treatment and control students. Alternatively, if outreach workers called 2,000 or 1,000 students, we would be able to detect differences in registration rates between the treatment and control groups of 2.6 and 3.5 percentage points, respectively.
MDE (4,000 students called, 5,500 students in control): 0.0213398
MDE (2,000 students called, 6,500 students in control): 0.0262759
MDE (1,000 students called, 7,000 students in control): 0.034877
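The MDEs in both scenarios can be approximated with a standard two-sample test of proportions. The sketch below shows one such formulation and is not necessarily the exact procedure used for the values above. It assumes a two-sided test at a 5 percent significance level (the significance level is not stated in this section), uses the pooled variance under the null and the unpooled variance under the alternative, and solves for the detectable increase over the control rate by bisection. The function name mde_two_proportions and its parameters are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def mde_two_proportions(p_control, n_control, n_treat, power=0.8, alpha=0.05):
    """Smallest increase over p_control detectable by a two-sided z-test.

    alpha=0.05 is an assumption; the text only fixes power = 0.8.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)

    def gap(delta):
        p_treat = p_control + delta
        # Pooled rate, used for the standard error under the null.
        p_bar = (n_control * p_control + n_treat * p_treat) / (n_control + n_treat)
        se_null = sqrt(p_bar * (1 - p_bar) * (1 / n_control + 1 / n_treat))
        # Unpooled standard error under the alternative.
        se_alt = sqrt(p_control * (1 - p_control) / n_control
                      + p_treat * (1 - p_treat) / n_treat)
        return delta - (z_alpha * se_null + z_power * se_alt)

    # Bisection: gap is negative at delta = 0 and positive for large delta.
    lo, hi = 0.0, 1.0 - p_control
    for _ in range(100):
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# (control take-up rate, students in control, students called)
scenarios = [
    (0.50, 8100, 3500),
    (0.50, 6350, 7000),
    (0.15, 5500, 4000),
    (0.15, 6500, 2000),
    (0.15, 7000, 1000),
]
for p, n_c, n_t in scenarios:
    print(p, n_c, n_t, round(mde_two_proportions(p, n_c, n_t), 4))
```

Under these assumptions the results land very close to the MDEs listed above, which illustrates how the detectable difference grows as the treatment group shrinks relative to the control group.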
EXPERIMENT 2 (Texting Experiment):
We ran the following power calculations for the individual-level texting treatment, in which we assign half of the 7,608 students in Summer Learning to the treatment group and the other half to the control group, giving 3,804 students in each group. Because we did not have SY20 data on which specific students were eligible for Summer Learning when calculating the minimum detectable effect, we constructed two samples of SY18 students that we assumed would resemble those eligible for Summer Learning this year: the students with the most absent days in SY18, and the students with the most math and reading course failures in SY18. We then look at these same students’ outcomes in the following school year, SY19.
Absences refers to the number of days absent per school year. Math and reading course failures refers to the total number of math- or reading-related courses that a student failed in the fall quarter. Test scores are from the fall quarter math and reading assessments and are standardized at the grade and subject level, by year.
To interpret the results: using the mock sample of students with the most absent days in SY18, and assuming that 7,608 students register for Summer Learning, with 3,804 treatment students and 3,804 control students and power = 0.8, we are able to detect a change of 1.48 days in the number of absent days; a change of 0.03 in the number of courses failed; a change of 0.06 standard deviations in standardized math score; and a change of 0.07 standard deviations in standardized reading score. Using the mock sample of students with the most math and reading course failures in SY18, under the same assumptions, we are able to detect a change of 1.06 days in the number of absent days; a change of 0.03 in the number of courses failed; a change of 0.06 standard deviations in standardized math score; and a change of 0.06 standard deviations in standardized reading score. We also test whether we can detect a change in the summer program failure rate. Assuming the control group has a failure rate of 50 percent, we are able to detect a 3 percentage point change in the summer program failure rate.
MDE (for 7,608 students, 3,804 in treatment and 3,804 in control):
SY19 absences using most SY18 absences sample: 1.47966
SY19 course failures using most SY18 absences sample: 0.0303899
SY19 standardized math score using most SY18 absences sample: 0.0625444
SY19 standardized reading scores using most SY18 absences sample: 0.0655973
SY19 absences using most SY18 course failures sample: 1.055417
SY19 course failures using most SY18 course failures sample: 0.0349519
SY19 standardized math score using most SY18 course failures sample: 0.0613209
SY19 standardized reading score using most SY18 course failures sample: 0.0616284
Summer program failure rate, control rate 50 percent: 0.0320979
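For the continuous outcomes (absences, course failures, standardized scores), an individual-level MDE is commonly approximated as the sum of the two normal critical values times the standard error of a difference in means. The sketch below assumes a two-sided test at a 5 percent significance level and no covariate adjustment, neither of which is stated in this section. The outcome standard deviations from the SY18 mock samples are not reported here, so the value sd=1.0 is a placeholder and the function name mde_continuous is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def mde_continuous(sd, n_treat, n_control, power=0.8, alpha=0.05):
    """Normal-approximation MDE for a difference in means.

    sd is the outcome standard deviation (assumed equal across arms);
    alpha=0.05 is an assumption, the text only fixes power = 0.8.
    """
    multiplier = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return multiplier * sd * sqrt(1 / n_treat + 1 / n_control)


# Placeholder sd of 1.0, i.e. an outcome measured in standard deviation units.
print(round(mde_continuous(1.0, 3804, 3804), 4))
```

With a standard deviation of exactly 1, this gives roughly 0.064 standard deviations, in the same range as the standardized test-score MDEs listed above. An outcome whose standard deviation in the mock sample is below 1 would have a proportionally smaller MDE.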
EXPERIMENT 3 (Resident Teacher Experiment):
We assume that there are 120 treatment clusters and 341 control clusters, and 8,000 students in the program (the total number in the program, not only the students who consented to receive texts). At the time of designing the experiment and writing the pre-analysis plan, we did not have classroom roster data and therefore had no information on how students were distributed across classrooms; we assume that every classroom has the same number of students. We calculate the intra-cluster correlation coefficient for each SY19 outcome based on SY18 year-school-grade groupings. Because we did not have SY20 data on which specific students were eligible for Summer Learning when calculating the minimum detectable effect, we constructed two samples of SY18 students that we assumed would resemble those eligible for Summer Learning this year: the students with the most absent days in SY18, and the students with the most math and reading course failures in SY18. We then look at these same students’ outcomes in the following school year, SY19.
Absences refers to the number of days absent per school year. Math and reading course failures refers to the total number of math- or reading-related courses that a student failed in the fall quarter. Test scores are from the fall quarter math and reading assessments and are standardized at the grade and subject level, by year.
To interpret the results: using the mock sample of students with the most absent days in SY18, and assuming that 8,000 students register for Summer Learning, with 120 treatment classrooms and 341 control classrooms and power = 0.8, we are able to detect a change of 2.62 days in the number of absent days; a change of 0.08 in the number of courses failed; a change of 0.12 standard deviations in standardized math score; and a change of 0.13 standard deviations in standardized reading score. Using the mock sample of students with the most math and reading course failures in SY18, under the same assumptions, we are able to detect a change of 2.04 days in the number of absent days; a change of 0.09 in the number of courses failed; a change of 0.16 standard deviations in standardized math score; and a change of 0.14 standard deviations in standardized reading score. We also test whether we can detect a change in the summer program failure rate. Assuming the control group has a failure rate of 50 percent, we are able to detect an 11 percentage point change in the summer program failure rate.
MDE (for 8,000 students, 120 treatment classrooms and 341 control classrooms):
SY19 absences using most SY18 absences sample: 2.617067
SY19 course failures using most SY18 absences sample: 0.0758736
SY19 standardized math score using most SY18 absences sample: 0.1239422
SY19 standardized reading scores using most SY18 absences sample: 0.1253976
SY19 absences using most SY18 course failures sample: 2.04071
SY19 course failures using most SY18 course failures sample: 0.0859007
SY19 standardized math score using most SY18 course failures sample: 0.1587865
SY19 standardized reading score using most SY18 course failures sample: 0.143222
Summer program failure rate, control rate 50 percent: 0.1067743
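Because treatment in Experiment 3 is assigned at the classroom level, the MDEs are larger than in Experiment 2 for a similar number of students. A common way to see why is the design effect 1 + (m - 1) × ICC, where m is the number of students per classroom. The sketch below applies that adjustment to a normal-approximation MDE. It assumes equal classroom sizes (as in the text), a two-sided test at a 5 percent significance level, and no covariate adjustment. The ICC values and the standard deviation of 1.0 are hypothetical placeholders, since the estimated ICCs are not reported in this section, and the function name mde_clustered is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def mde_clustered(sd, icc, n_students, treat_clusters, control_clusters,
                  power=0.8, alpha=0.05):
    """Normal-approximation MDE under cluster-level assignment.

    Assumes every cluster has the same size; alpha=0.05 is an assumption.
    """
    clusters = treat_clusters + control_clusters
    m = n_students / clusters            # students per classroom
    design_effect = 1 + (m - 1) * icc    # variance inflation from clustering
    multiplier = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = design_effect * (1 / (treat_clusters * m)
                                + 1 / (control_clusters * m))
    return multiplier * sd * sqrt(variance)


# Hypothetical ICC values, shown only to illustrate the sensitivity.
for icc in (0.0, 0.1, 0.2):
    print(icc, round(mde_clustered(1.0, icc, 8000, 120, 341), 4))
```

With an ICC of zero the formula collapses to the individual-level case, and the MDE rises as the ICC increases, so the reported values depend heavily on the ICC estimated from the SY18 data.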