Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
We ran 1,000 power simulations to estimate the statistical power of our experiment. Each simulation used site-specific estimates of the charging rate and of the proportions of Black or Hispanic arrestees and of all other arrestees, drawn from each site's historical data. Based on recent conversations with each site, we also estimated the number of cases each site will send to the experiment each year and the proportion of those cases it will send to the control arm. We assumed that pre-existing charging rates for Black and Hispanic arrestees were 2.75 percentage points (pp) higher than the charging rate for white arrestees, and that the treatment arm closed 90% of this gap, leaving a residual gap of 0.275pp.
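As a worked example of what "closing 90% of the gap" implies for the four race × treatment cells, let p_0 denote a site's baseline charging rate for arrestees with race = 0 (a symbol we introduce here). The text does not say how the gap is closed; this sketch assumes the treatment lowers the rate for race = 1 rather than raising the rate for race = 0.

```latex
\begin{aligned}
\Pr(Y=1 \mid \text{race}=0,\ \text{treatment}=0) &= p_0 \\
\Pr(Y=1 \mid \text{race}=1,\ \text{treatment}=0) &= p_0 + 0.0275 \\
\Pr(Y=1 \mid \text{race}=0,\ \text{treatment}=1) &= p_0 \\
\Pr(Y=1 \mid \text{race}=1,\ \text{treatment}=1) &= p_0 + (1 - 0.9)(0.0275) = p_0 + 0.00275
\end{aligned}
```

On the probability scale the difference-in-differences is 0.00275 - 0.0275 = -0.02475, a 2.475pp reduction in the gap. Because the regression estimates β_3 on the log-odds scale, the value of β_3 that corresponds to this reduction depends on p_0.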
For each simulated experiment, we generated a synthetic population using these parameters and fit a logistic regression model of the form logit(Pr(Y = 1)) = β_0 + β_1 * race + β_2 * treatment + β_3 * race * treatment, where Y = 1 if the case was charged; "race" is 1 if the arrestee was Black or Hispanic and 0 otherwise; and "treatment" is 1 if the case was randomly assigned to the treatment arm and 0 otherwise. We counted the effect as detected if the 95% confidence interval for β_3 excluded zero. Under this setup, we expect to detect the primary effect, a reduction in racial bias in charging decisions, in 81.3% of experiments of this size and design. This exceeds the conventional 80% power threshold, indicating adequate power to detect a reduction of this size in the charging-rate gap for Black and Hispanic arrestees.
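The primary-outcome simulation can be sketched in standard-library Python. The site parameters in `SITES` are hypothetical placeholders, not the study's estimates, and the helper names are ours. Because race and treatment are both binary, the interaction model has exactly one parameter per race × treatment cell, so instead of calling a regression library the sketch uses the closed-form maximum-likelihood estimate and Wald standard error of β_3. It pools cases across sites and omits the prosecutor random effect and case covariates.

```python
import math
import random

# HYPOTHETICAL site parameters for illustration only (not the study's values):
# cases per year, share of Black/Hispanic arrestees, share of cases sent to
# control, and the baseline charging rate for all other arrestees.
SITES = [
    {"n": 6000, "p_minority": 0.6, "p_control": 0.5, "base": 0.55},
    {"n": 4000, "p_minority": 0.4, "p_control": 0.3, "base": 0.65},
]
Z_95 = 1.959964  # two-sided 95% standard-normal critical value


def simulate_cells(rng, sites, gap, reduction):
    """Simulate one experiment; return counts[(race, treatment)] = [charged, not_charged]."""
    counts = {(r, t): [0, 0] for r in (0, 1) for t in (0, 1)}
    for site in sites:
        for _ in range(site["n"]):
            race = 1 if rng.random() < site["p_minority"] else 0
            treat = 0 if rng.random() < site["p_control"] else 1
            # Assumption: treatment closes the gap by lowering the race == 1 rate.
            p = site["base"] + gap * race * (1 - reduction * treat)
            counts[(race, treat)][0 if rng.random() < p else 1] += 1
    return counts


def interaction_estimate(counts):
    """MLE and Wald SE of beta_3 in logit Pr(Y=1) = b0 + b1*race + b2*treat + b3*race*treat.

    The model is saturated in the four race x treatment cells, so the MLE of
    beta_3 is a difference-in-differences of cell log-odds, and its Wald
    variance is the sum of 1/charged + 1/not_charged over the cells.
    """
    def log_odds(cell):
        charged, not_charged = counts[cell]
        return math.log(charged / not_charged)

    beta3 = (log_odds((1, 1)) - log_odds((0, 1))) - (log_odds((1, 0)) - log_odds((0, 0)))
    var = sum(1 / c + 1 / nc for c, nc in counts.values())
    return beta3, math.sqrt(var)


def power(sites, gap, reduction, n_sims=1000, seed=1):
    """Share of simulated experiments whose 95% Wald CI for beta_3 excludes zero."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_sims):
        beta3, se = interaction_estimate(simulate_cells(rng, sites, gap, reduction))
        if abs(beta3) > Z_95 * se:
            detected += 1
    return detected / n_sims
```

For example, `power(SITES, gap=0.0275, reduction=0.9)` estimates power for these made-up sites; the result will not match the 81.3% reported in the text, which depends on the sites' actual parameters. In pure Python, 1,000 simulations of 10,000 cases each can take a while to run.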
For our secondary outcome, we ran a similar set of simulations with no pre-existing disparity, instead assuming that charging rates for all arrestees were 2.25pp higher or lower in the treatment arm. We fit a logistic regression model of the form logit(Pr(Y = 1)) = β_0 + β_1 * race + β_2 * treatment. This is the primary model without the race × treatment interaction, which we dropped to simplify the power analysis. We counted the effect as detected if the 95% confidence interval for β_2 excluded zero. Under this setup, we expect to detect the effect in 80.5% of experiments of this size and design, again indicating adequate power to detect a 2.25pp change in the overall charging rate.
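A comparable sketch for the secondary outcome, again in standard-library Python with hypothetical site parameters and helper names of our own, is below. The main-effects model is no longer saturated, so it is fit by Newton-Raphson on the four race × treatment cell totals, which yield the same maximum-likelihood estimates as case-level data. The prosecutor random effect and case covariates are again omitted.

```python
import math
import random

# HYPOTHETICAL site parameters for illustration only (not the study's values).
SITES = [
    {"n": 6000, "p_minority": 0.6, "p_control": 0.5, "base": 0.55},
    {"n": 4000, "p_minority": 0.4, "p_control": 0.3, "base": 0.65},
]
Z_95 = 1.959964  # two-sided 95% standard-normal critical value


def solve(a, b):
    """Solve a x = b for a small dense system (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_main_effects(cells, iters=25):
    """Newton-Raphson MLE of logit Pr(Y=1) = b0 + b1*race + b2*treatment.

    `cells` maps (race, treatment) -> [n_cases, n_charged]. Returns (b2, Wald SE of b2).
    """
    beta = [0.0, 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]  # Fisher information at current beta
        for (race, treat), (n, y) in cells.items():
            x = (1.0, race, treat)
            p = 1 / (1 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            w = n * p * (1 - p)
            for i in range(3):
                grad[i] += x[i] * (y - n * p)
                for j in range(3):
                    info[i][j] += w * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    # Var(b2) is the [2][2] entry of the inverse information; after convergence the
    # last iterate's information equals its value at the MLE to numerical precision.
    var_b2 = solve(info, [0.0, 0.0, 1.0])[2]
    return beta[2], math.sqrt(var_b2)


def simulate_cells(rng, sites, shift):
    """One experiment with no pre-existing disparity; treatment shifts every rate by `shift`."""
    cells = {(r, t): [0, 0] for r in (0, 1) for t in (0, 1)}
    for site in sites:
        for _ in range(site["n"]):
            race = 1 if rng.random() < site["p_minority"] else 0
            treat = 0 if rng.random() < site["p_control"] else 1
            p = site["base"] + shift * treat
            cell = cells[(race, treat)]
            cell[0] += 1
            cell[1] += 1 if rng.random() < p else 0
    return cells


def power(sites, shift, n_sims=1000, seed=2):
    """Share of simulated experiments whose 95% Wald CI for beta_2 excludes zero."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_sims):
        b2, se = fit_main_effects(simulate_cells(rng, sites, shift))
        if abs(b2) > Z_95 * se:
            detected += 1
    return detected / n_sims
```

Calling `power(SITES, shift=0.0225)` and `power(SITES, shift=-0.0225)` estimates power for the "higher" and "lower" scenarios under these made-up parameters.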
We expect to gain additional statistical power by including a random effect for the prosecutor assigned to make the charging decision and by adjusting for case covariates, including arrestee sex and age; the day, month, and year of the arrest; flags on the incident report indicating, e.g., domestic violence, elderly victims, gang involvement, weapons, or the use of a body-worn camera; the Census-derived racial composition of the area where the incident occurred, if the address is available; the precinct or police department where the arrest occurred; the suspect's arrest and felony arrest counts over the preceding two years; the alleged charges; and the total number of alleged charges.
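One way to write the adjusted primary model is as a mixed-effects logistic regression with a random intercept for each prosecutor. The notation is ours and the exact specification is an assumption: i indexes cases, j[i] is the prosecutor assigned to case i, x_i collects the case covariates listed above, and γ holds their coefficients.

```latex
\operatorname{logit}\Pr(Y_i = 1) = \beta_0 + \beta_1\,\mathrm{race}_i + \beta_2\,\mathrm{treatment}_i
  + \beta_3\,\mathrm{race}_i \cdot \mathrm{treatment}_i + \gamma^\top x_i + u_{j[i]},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

The random intercept u_j captures differences in prosecutors' baseline propensity to charge, and the covariate terms capture variation explained by case characteristics; accounting for both can leave less unexplained variation in charging outcomes, which is the intended source of the power gain.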