
Fields Changed

Registration

Field Before After
Last Published
Before: September 14, 2022 11:03 PM
After: September 19, 2022 10:08 AM
Randomization Unit
Before: Workers are grouped into random groups of 120, conditional on having quiz scores in adjacent quintiles. These groups are randomly assigned to treatment (i.e., to the job assignment mechanism by which their members will be evaluated).
After: Workers are grouped into random groups of 40, conditional on having quiz scores in adjacent quintiles. These groups are randomly assigned to treatment (i.e., to the job assignment mechanism by which their members will be evaluated).
Planned Number of Clusters
Before: 3,600 workers will initially be recruited. They form 30 groups of 120 workers each, and each group is assigned as a whole to a particular treatment.
After: 3,600 workers will initially be recruited. They form 90 groups of 40 workers each, and each group is assigned as a whole to a particular treatment.
Planned Number of Observations
Before: 3,600 workers will initially be recruited. 2,640 are expected to be assigned to the easier task and return for the experimental session. 2,304 are expected to be in the subsample that would have been assigned to the easy task by all three mechanisms.
After: 3,600 workers will initially be recruited. 2,664 are expected to be assigned to the easier task and return for the experimental session. 2,304 of these are expected to be in the subsample that would have been assigned to the easy task by all three mechanisms.
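The counts in this row follow from rates stated in the power-calculation entry: 80 percent follow-up, 92.5 percent assigned to the easier task, and 20 percent assigned to the hard task by at least one mechanism. A minimal Python sketch of that arithmetic follows; the constant and function names are illustrative, and the independence benchmark additionally assumes that each mechanism sends 7.5 percent of workers to the hard task.

```python
# Expected sample flow for the "After" version of the registration.
# Rates are kept as integer percentages / per-mille so the arithmetic is exact.
RECRUITED = 3600
FOLLOW_UP_PCT = 80          # share completing the follow-up survey
EASY_TASK_PER_MILLE = 925   # 92.5 percent assigned to the easier task
EASY_BY_ALL_PCT = 80        # 100 minus the 20 percent assumed hard under >= 1 mechanism


def expected_samples():
    """Return (completed follow-up, easy under own mechanism, easy under all three)."""
    followed_up = RECRUITED * FOLLOW_UP_PCT // 100                      # 2,880
    easy_own = followed_up * EASY_TASK_PER_MILLE // 1000               # 2,664
    easy_all = followed_up * EASY_BY_ALL_PCT // 100                    # 2,304
    return followed_up, easy_own, easy_all


def hard_by_any_if_independent(p_hard=0.075, mechanisms=3):
    """P(hard task under at least one mechanism) if mechanisms decided independently.

    p_hard = 0.075 is an assumption: each mechanism sends 7.5 percent to the hard task.
    """
    return 1 - (1 - p_hard) ** mechanisms


print(expected_samples())
print(round(hard_by_any_if_independent(), 4))
```

Under independence the union comes out at about 21 percent; positively correlated decisions shrink it, so the assumed 20 percent is in line with the registration's description of the mechanisms' decisions as slightly correlated.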
Sample size (or number of clusters) by treatment arms
Before: 10 groups control; 10 groups treatment 1; 5 groups treatment 2 who see the demographics of historical workers; 5 groups treatment 2 who do not see the demographics of historical workers.
After: 30 groups control; 30 groups treatment 1; 15 groups treatment 2 who see the demographics of historical workers; 15 groups treatment 2 who do not see the demographics of historical workers.
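The registration does not spell out how "random groups conditional on having quiz scores in adjacent quintiles" are formed. One possible reading, sketched below in Python, ranks workers by quiz score (breaking ties at random), shuffles workers within each quintile, lays the quintiles end to end, and cuts the sequence into blocks of 40; whole blocks are then randomized to arms using the 30/30/15/15 group counts from the "After" allocation. The arm labels, the tie-breaking rule, and the simulated quiz scores are all hypothetical.

```python
import random
from collections import Counter

GROUP_SIZE = 40
# Group counts per arm from the "After" allocation; the labels are hypothetical.
ARM_GROUP_COUNTS = {
    "control": 30,
    "treatment_1": 30,
    "treatment_2_sees_demographics": 15,
    "treatment_2_no_demographics": 15,
}


def quiz_quintiles(scores, rng):
    """Quintile (0-4) of each worker by quiz-score rank, breaking ties at random."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: (scores[i], rng.random()))
    quintile = [0] * n
    for rank, worker in enumerate(order):
        quintile[worker] = rank * 5 // n
    return quintile


def form_groups(scores, rng):
    """Shuffle within quintiles, lay quintiles end to end, cut into GROUP_SIZE blocks.

    A block straddles a quintile boundary only when cumulative quintile sizes are not
    multiples of GROUP_SIZE; given at least GROUP_SIZE workers per quintile, such a
    block mixes two adjacent quintiles.
    """
    quintile = quiz_quintiles(scores, rng)
    ordered = []
    for q in range(5):
        members = [w for w in range(len(scores)) if quintile[w] == q]
        rng.shuffle(members)
        ordered.extend(members)
    groups = [ordered[k:k + GROUP_SIZE] for k in range(0, len(ordered), GROUP_SIZE)]
    return groups, quintile


def assign_arms(groups, rng):
    """Randomly assign whole groups to arms with the fixed group counts above."""
    labels = [arm for arm, count in ARM_GROUP_COUNTS.items() for _ in range(count)]
    if len(labels) != len(groups):
        raise ValueError(f"expected {len(labels)} groups, got {len(groups)}")
    rng.shuffle(labels)
    return labels  # labels[g] is the arm of group g


rng = random.Random(2022)
scores = [rng.randint(0, 10) for _ in range(3600)]  # simulated quiz scores
groups, quintile = form_groups(scores, rng)
arms = assign_arms(groups, rng)
print(Counter(arms))
```

With 3,600 workers each quintile holds 720 workers, a multiple of 40, so under this reading every group falls inside a single quintile.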
Power calculation: Minimum Detectable Effect Size for Main Outcomes
Before:
All calculations assume 80 percent power and a significance level of 0.05. I will recruit 3,600 participants to complete the screening survey. I expect take-up of the follow-up survey to be high, since the screening survey's initial description will indicate that there is a well-paid follow-up survey. Assuming that 80 percent of workers complete the follow-up survey and 92 percent of workers are assigned to the easier task, the final analysis sample will be around 2,640 workers assigned to the easier task by their randomly assigned mechanism, or around 2,304 workers assigned to the easy task by all three mechanisms (assuming 20 percent of workers are assigned to the hard task by at least one mechanism, i.e., that the mechanisms' decisions are only slightly correlated). The analysis may use either of these samples to deal with “selection” issues, as described above.
1. First-stage regressions
1.1. Main effects. Regressions testing whether the demographic-blind manager reduces perceived discrimination relative to the manager who knows demographics are powered to detect effects larger than 6.5 or 7 percentage points (for sample sizes of 2,640 or 2,304, respectively). Similarly, tests of whether one of the algorithm sub-groups reduces perceived discrimination relative to the manager who knows demographics are powered to detect effects larger than 8 percentage points for either sample size. Given results from a pilot study, the effects are expected to be larger than these MDEs.
1.2. Treatment effect heterogeneity. Tests of whether the effect of the algorithm depends on whether the worker knows the race of the historically assigned workers are powered to detect differences larger than 9.5 or 10 percentage points (for sample sizes of 2,640 or 2,304, respectively).
Tests of whether the effect of the demographic-blind human differs from that of one algorithm sub-group are powered to detect differences of 8 percentage points, and tests of whether it differs from both algorithm sub-groups (pooled, assuming the two do not differ from each other) are powered to detect differences larger than 6.5 or 7 percentage points (for sample sizes of 2,640 or 2,304, respectively).
1.3. Racial and gender heterogeneity. When testing for heterogeneity in the effects of the blinded manager, gender heterogeneity among non-white participants and racial heterogeneity among men are powered to detect differences in the treatment effect of 15 percentage points, gender heterogeneity among white participants is powered to detect differences of 18 percentage points, and racial heterogeneity among women is powered to detect differences of 20 percentage points. For each group, tests for heterogeneity in the effects of the algorithm have MDEs about 3 percentage points larger than the corresponding MDEs for the blinded manager. These MDEs come from simulations with a sample size of 2,640; with a sample size of 2,304, each MDE is about 1 percentage point larger.
2. Reduced-form regressions
2.1. Binary outcomes. Regressions testing the effects of the demographic-blind manager on the binary measures of retention (completing only the minimum 6 paragraphs, or completing all 18 paragraphs) are powered to detect effects larger than 5 or 5.5 percentage points (for sample sizes of 2,640 or 2,304, respectively), and tests of the effect of one algorithm sub-group are powered to detect effects larger than 6 or 6.5 percentage points (for sample sizes of 2,640 or 2,304, respectively).
In pilot data, 12 percent of workers completed only 6 paragraphs and 68 percent completed all 18 paragraphs; these rates are assumed in the calculations.
2.2. Continuous outcomes. All other outcomes are continuous. Regressions testing the effects of the demographic-blind manager are powered to detect effects larger than 0.14 SD or 0.15 SD (for sample sizes of 2,640 or 2,304, respectively), and tests of the effect of one algorithm sub-group are powered to detect effects larger than 0.17 SD (for either sample size).
3. Two-stage least squares regressions
3.1. The two-stage least squares power calculations assume that the treatments have very large effects on perceived discrimination, effectively reducing the rate of perceived discrimination to zero in the demographic-blind manager group and in both algorithm sub-groups. This is consistent with piloting (though in very small samples).
3.2. Under this assumption, two-stage least squares regressions are powered to detect effects of reducing perceived discrimination that are larger than 4 or 5 percentage points on the binary outcomes (for sample sizes of 2,640 or 2,304, respectively) and larger than 0.12 SD on the continuous outcomes (for either sample size).
After:
All calculations assume 80 percent power and a significance level of 0.05. I will recruit 3,600 participants to complete the screening survey. I expect take-up of the follow-up survey to be high, since the screening survey's initial description will indicate that there is a well-paid follow-up survey. Assuming that 80 percent of workers complete the follow-up survey and 92.5 percent of workers are assigned to the easier task, the final analysis sample will be around 2,664 workers assigned to the easier task by their randomly assigned mechanism, or around 2,304 workers assigned to the easy task by all three mechanisms (assuming 20 percent of workers are assigned to the hard task by at least one mechanism, i.e., that the mechanisms' decisions are only slightly correlated). The analysis may use either of these samples to deal with “selection” issues, as described above.
1. First-stage regressions
1.1. Main effects. Regressions testing whether the demographic-blind manager reduces perceived discrimination relative to the manager who knows demographics are powered to detect effects larger than 6.5 or 7 percentage points (for sample sizes of 2,664 or 2,304, respectively). Similarly, tests of whether one of the algorithm sub-groups reduces perceived discrimination relative to the manager who knows demographics are powered to detect effects larger than 8 percentage points for either sample size. Given results from a pilot study, the effects are expected to be larger than these MDEs.
1.2. Treatment effect heterogeneity. Tests of whether the effect of the algorithm depends on whether the worker knows the race of the historically assigned workers are powered to detect differences larger than 9.5 or 10 percentage points (for sample sizes of 2,664 or 2,304, respectively). Tests of whether the effect of the demographic-blind human differs from that of one algorithm sub-group are powered to detect differences of 8 percentage points, and tests of whether it differs from both algorithm sub-groups (pooled, assuming the two do not differ from each other) are powered to detect differences larger than 6.5 or 7 percentage points (for sample sizes of 2,664 or 2,304, respectively).
1.3. Racial and gender heterogeneity. When testing for heterogeneity in the effects of the blinded manager, gender heterogeneity among non-white participants and racial heterogeneity among men are powered to detect differences in the treatment effect of 15 percentage points, gender heterogeneity among white participants is powered to detect differences of 18 percentage points, and racial heterogeneity among women is powered to detect differences of 20 percentage points. For each group, tests for heterogeneity in the effects of the algorithm have MDEs about 3 percentage points larger than the corresponding MDEs for the blinded manager. These MDEs come from simulations with a sample size of 2,664; with a sample size of 2,304, each MDE is about 1 percentage point larger.
2. Reduced-form regressions
2.1. Binary outcomes. Regressions testing the effects of the demographic-blind manager on the binary measures of retention (completing only the minimum 6 paragraphs, or completing all 18 paragraphs) are powered to detect effects larger than 5 or 5.5 percentage points (for sample sizes of 2,664 or 2,304, respectively), and tests of the effect of one algorithm sub-group are powered to detect effects larger than 6 or 6.5 percentage points (for sample sizes of 2,664 or 2,304, respectively). In pilot data, 12 percent of workers completed only 6 paragraphs and 68 percent completed all 18 paragraphs; these rates are assumed in the calculations.
2.2. Continuous outcomes. All other outcomes are continuous. Regressions testing the effects of the demographic-blind manager are powered to detect effects larger than 0.14 SD or 0.15 SD (for sample sizes of 2,664 or 2,304, respectively), and tests of the effect of one algorithm sub-group are powered to detect effects larger than 0.17 SD (for either sample size).
3. Two-stage least squares regressions
3.1. The two-stage least squares power calculations assume that the treatments have very large effects on perceived discrimination, effectively reducing the rate of perceived discrimination to zero in the demographic-blind manager group and in both algorithm sub-groups. This is consistent with piloting (though in very small samples).
3.2. Under this assumption, two-stage least squares regressions are powered to detect effects of reducing perceived discrimination that are larger than 4 or 5 percentage points on the binary outcomes (for sample sizes of 2,664 or 2,304, respectively) and larger than 0.12 SD on the continuous outcomes (for either sample size).
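As a rough cross-check on the registered main-effect MDEs, the textbook normal-approximation formula MDE = (z_{1-α/2} + z_{1-β}) · σ · sqrt(Σ 1/n_k) can be coded in a few lines. This is not the registration's method (its MDEs come from simulations), and the sketch ignores clustering and covariate adjustment. It also assumes equal follow-up across arms, so an arm with 30 of the 90 groups contributes 888 of the 2,664 workers and an algorithm sub-group with 15 groups contributes 444; treating the control arm as the comparison for both the blind manager and the algorithm sub-group is likewise an assumption.

```python
import math
from statistics import NormalDist


def mde(sd, cell_sizes, alpha=0.05, power=0.80):
    """Normal-approximation MDE for a contrast of independent cell means.

    Two cells give a difference in means (a main effect); four cells give a
    difference in treatment effects between two subgroups (an interaction).
    Ignores clustering, covariate adjustment, and estimation of other arms.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * sd * math.sqrt(sum(1 / n for n in cell_sizes))


def bernoulli_sd(p):
    """Standard deviation of a binary outcome with base rate p."""
    return math.sqrt(p * (1 - p))


n = 2664                  # "After" analysis sample
per_arm = n * 30 // 90    # arm with 30 of 90 groups: 888 workers
per_sub = n * 15 // 90    # algorithm sub-group with 15 of 90 groups: 444 workers

main_sd = mde(1.0, [per_arm, per_arm])                 # continuous outcome, SD units
sub_sd = mde(1.0, [per_arm, per_sub])                  # one algorithm sub-group vs control
binary = mde(bernoulli_sd(0.68), [per_arm, per_arm])   # 68% pilot completion rate
half = per_arm // 2                                    # two equal subgroups within each arm
interaction = mde(1.0, [half, half, half, half])
print(round(main_sd, 3), round(sub_sd, 3), round(binary, 3), round(interaction, 3))
```

With these inputs the continuous-outcome MDEs come out near 0.13 SD and 0.16 SD, close to the registered 0.14 SD and 0.17 SD; the binary figure need not match the registered simulation-based values. When each subgroup is half of each arm, the four-cell interaction MDE is exactly twice the two-cell main-effect MDE, which helps explain why the heterogeneity tests have much larger MDEs than the main effects.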
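The registration states that its MDEs come from simulations but does not include the simulation code. The Python sketch below is a generic Monte Carlo power calculation for one binary outcome: it simulates two arms, applies a two-sided two-proportion z-test with pooled variance, and records the rejection rate, then searches a grid of effect sizes for the smallest one reaching 80 percent power. The 68 percent base rate is the pilot completion rate quoted above; the 888 workers per arm, the grid step, and the function names are assumptions for illustration.

```python
import random
from statistics import NormalDist


def simulated_power(p_control, effect, n_per_arm, reps=1000, alpha=0.05, seed=0):
    """Share of simulated two-arm experiments in which a two-sided
    two-proportion z-test (pooled variance) rejects at level alpha."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_treat = p_control + effect
    rejections = 0
    for _ in range(reps):
        hits_c = sum(rng.random() < p_control for _ in range(n_per_arm))
        hits_t = sum(rng.random() < p_treat for _ in range(n_per_arm))
        pooled = (hits_c + hits_t) / (2 * n_per_arm)
        se = (2 * pooled * (1 - pooled) / n_per_arm) ** 0.5
        diff = (hits_t - hits_c) / n_per_arm
        if se > 0 and abs(diff) / se > crit:
            rejections += 1
    return rejections / reps


def simulated_mde(p_control, n_per_arm, target=0.80, step=0.005, **kwargs):
    """Smallest effect on a grid of `step` whose simulated power reaches `target`."""
    k = 1
    while p_control + k * step < 1:
        effect = k * step
        if simulated_power(p_control, effect, n_per_arm, **kwargs) >= target:
            return effect
        k += 1
    return None


# Size check: with no true effect, the rejection rate should be close to alpha.
print(simulated_power(0.68, 0.0, 888, reps=500))
```

`simulated_mde(0.68, 888)` runs the full grid search; it is slow in pure Python at 1,000 replications per grid point, and it will not reproduce the registered figures, which come from the author's own simulations.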
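Section 3 instruments perceived discrimination with random assignment. With a single binary randomized instrument, two-stage least squares reduces to the Wald ratio: the reduced-form effect on the outcome divided by the first-stage effect on perceived discrimination. The Python sketch below illustrates that estimator on simulated data. The data-generating process, the 30 percent control-group rate of perceived discrimination, and the effect size are hypothetical; the simulation imposes the registration's assumption that treatment takes perceived discrimination to zero. The registered regressions may differ, for example by including several arms.

```python
import random


def wald_iv(z, d, y):
    """Just-identified 2SLS (Wald) estimate with one binary instrument z.

    Returns (iv_estimate, first_stage, reduced_form).
    """
    def mean_where(values, flag):
        chosen = [v for v, zi in zip(values, z) if zi == flag]
        return sum(chosen) / len(chosen)

    first_stage = mean_where(d, 1) - mean_where(d, 0)
    reduced_form = mean_where(y, 1) - mean_where(y, 0)
    return reduced_form / first_stage, first_stage, reduced_form


def naive_difference(d, y):
    """Outcome gap between workers with and without perceived discrimination (not causal)."""
    with_d = [yi for di, yi in zip(d, y) if di == 1]
    without_d = [yi for di, yi in zip(d, y) if di == 0]
    return sum(with_d) / len(with_d) - sum(without_d) / len(without_d)


def simulate(n, beta, base_rate, seed=0):
    """Hypothetical data. Treatment (z=1) eliminates perceived discrimination (d).
    An unobserved trait u makes low-u workers both likelier to perceive discrimination
    under control and less likely to have a positive binary outcome (y)."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        zi = 1 if rng.random() < 0.5 else 0
        u = rng.random()
        di = 1 if (zi == 0 and u < base_rate) else 0
        p_y = 0.5 + 0.3 * u + beta * di   # stays inside [0.3, 0.8] for beta = -0.2
        z.append(zi)
        d.append(di)
        y.append(1 if rng.random() < p_y else 0)
    return z, d, y


z, d, y = simulate(100_000, beta=-0.2, base_rate=0.3, seed=7)
iv, fs, rf = wald_iv(z, d, y)
print(f"first stage {fs:.3f}, reduced form {rf:.3f}, IV {iv:.3f}, "
      f"naive {naive_difference(d, y):.3f}")
```

Because the first stage here is about -0.3 rather than -1, the IV estimate rescales a small reduced-form effect into the effect of perceived discrimination itself, while the naive comparison overstates that effect because of the unobserved trait.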