Minimum detectable effect size for main outcomes (accounting for sample
design and clustering)
To give our audit study adequate power to detect, at the 95% confidence level, effects of the approximate size reported in the existing literature (including Darolia et al., 2014 and Deming et al., 2016), we aim to send at least 6,000 resumes. Our power calculations use the following parameters: based on prior audit studies, we assume a baseline callback rate of 8% and aim to detect differences of 3-4 percentage points between degree types, with power of 0.80 and a significance level of 0.05 (using a Bonferroni correction for multiple comparisons among the three degree types).

For our proposed study, detecting a minimum effect size of 4 percentage points would require applying to 2,278 different job openings with AA, CCBA, and traditional BA resumes (2,278 applications per degree type), for a total of 6,834 applications. To detect a smaller effect size of 3 percentage points while maintaining 80% power, these sample sizes would need to increase by approximately 78%, requiring 4,054 applications per degree type and 12,162 applications in total.

Detecting differences in callback rates across degree types is the focus of this pilot, but we will also randomize perceived race and ethnicity (Black, Hispanic, and White) across resume submissions. Given the proposed number of resumes, race-ethnicity by degree interaction effects will be detectable only at a slightly larger minimum effect size. The final number of applications sent will be constrained by the number of available job openings, but given these power calculations we aim to send a minimum of 6,000 resumes across the two states.
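The stated parameters can be plugged into a standard sample-size formula as a rough cross-check. The sketch below is illustrative only: it uses the textbook formula for a two-sided test of two independent proportions with unpooled variance, and the `design_effect` argument is a hypothetical placeholder for a clustering adjustment. The proposal does not state which variance formula or clustering adjustment underlies its figures, so this sketch should not be expected to reproduce 2,278 or 4,054 exactly. The arithmetic at the end checks only the totals and the 78% increase quoted above.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80, n_comparisons=1, design_effect=1.0):
    """Per-group sample size for a two-sided test of two independent proportions.

    Textbook unpooled-variance formula. `design_effect` is a hypothetical
    inflation factor for clustering; 1.0 means no clustering adjustment.
    """
    alpha_adj = alpha / n_comparisons  # Bonferroni-adjusted significance level
    z_alpha = NormalDist().inv_cdf(1 - alpha_adj / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n * design_effect)


# Parameters taken from the text: 8% baseline callback rate, 80% power,
# alpha = 0.05 with a Bonferroni correction over three pairwise comparisons.
n_4pp = n_per_group(0.08, 0.12, n_comparisons=3)
n_3pp = n_per_group(0.08, 0.11, n_comparisons=3)
print("unclustered n per degree type, 4 pp:", n_4pp)
print("unclustered n per degree type, 3 pp:", n_3pp)

# Arithmetic check of the totals quoted in the text (three degree types).
total_4pp = 2278 * 3                       # 6,834 applications
total_3pp = 4054 * 3                       # 12,162 applications
pct_increase = (4054 / 2278 - 1) * 100     # roughly 78%
print(total_4pp, total_3pp, round(pct_increase))
```

Passing a `design_effect` greater than 1 scales the required sample proportionally, which is one simple way to see how clustering of applications within job openings or labor markets would raise the number of resumes needed.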