Fields Changed

Registration

Trial Start Date
Before: September 05, 2022
After: October 05, 2022

Last Published
Before: June 26, 2022 05:25 AM
After: September 14, 2022 10:59 PM

Intervention (Public)
Before: The intervention varies what participants believe about how they were assigned to the easier, lower-paying of two tasks related to scientific communication. In the status quo arm, workers are told that managers who know worker demographics made the job assignment decisions. In two treatment arms, workers are told that the assignments were made using other mechanisms that are unable to discriminate.
After: The intervention varies how participants are assigned to the easier, lower-paying of two tasks related to scientific communication. In the status quo arm, workers are evaluated by managers who know worker demographics. In two treatment arms, workers are evaluated by other mechanisms that are unable to discriminate.

Intervention Start Date
Before: October 03, 2022
After: November 03, 2022

Primary Outcomes (End Points)
Before: Perceived discrimination; cooperation with and reciprocity towards managers; effort, retention, and performance; and future labor supply
After: Perceived discrimination; effort, retention, and performance; and future labor supply

Primary Outcomes (Explanation)
Before: Several of the above variables can be combined into indices, which is described in more detail in the uploaded pre-analysis plan.
After: Several of the above variables can be combined into indices, which is described in more detail in the uploaded pre-analysis plan. Perceived discrimination is measured in three ways:
(1) Implicit perceived discrimination: the difference between how many stars a worker thinks they would have needed to earn on the screening quiz to be assigned to the hard task and how many stars they think comparable workers of different races and genders would have needed to earn. At the aggregate level, gender and racial differences in the number of stars workers think they would have needed to earn to be assigned to the hard task (conditional on how many they did earn) are another measure of perceived discrimination.
(2) Explicit perceived discrimination: answering "Yes, I think so" or "Yes, I am sure" to the two questions "Do you think you would have been assigned to the harder task if your gender [race] was different?"
(3) General (population) perceived discrimination: whether workers think that their own group (or others) is under-represented among workers assigned to the hard task.
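
As a rough formalization of the implicit measure above (a sketch in our notation, not the registry's):

% s_i^own: stars worker i believes they needed on the screening quiz to get the hard task
% s_i^other: stars worker i believes a comparable worker of a different race or gender needed
\[
  \mathrm{PD}^{\mathrm{implicit}}_i \;=\; s_i^{\mathrm{own}} - s_i^{\mathrm{other}},
\]

so a positive value means the worker believes the bar was higher for them than for comparable workers from other groups.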

Experimental Design (Public)
Before: Workers will be recruited with a screening survey and then evaluated by three job assignment mechanisms. Workers who are assigned to the harder task by any of the mechanisms will be assigned to the harder task and exit the sample of interest. The remaining workers will be randomly assigned which mechanism they are told was the one responsible for assigning them to the easier, lower-paying task. They will then answer questions about their interest in future work, do the easy task, answer survey questions, and finish the experiment.
After: Workers will be recruited with a screening survey and then evaluated by three job assignment mechanisms. Workers are assigned to either a harder or an easier task by one randomly assigned mechanism. Workers assigned to the hard task are not in the sample of interest. The remaining workers, who are assigned to the easy task by their randomly assigned mechanism, do the easier, lower-paying task. They will then answer questions about their interest in future work, answer survey questions, and finish the experiment.

Randomization Method
Before: Randomization is done in an office using Stata on a computer, and treatment values are uploaded to Qualtrics for each participant when they return for the follow-up (experimental) survey. This allows stratification by race and gender, which is not possible when randomizing in Qualtrics directly.
After: Randomization is done in an office using Stata on a computer, and treatment values are uploaded to Qualtrics for each participant when they return for the follow-up (experimental) survey. This allows clustering, which is not possible when randomizing in Qualtrics directly.

Randomization Unit
Before: Individual
After: Workers are grouped into random groups of 120, conditional on having quiz scores in adjacent quintiles. These groups are randomly assigned to treatment (which job assignment mechanism they will be evaluated by).
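
To make the clustered design concrete, here is a minimal sketch of the group formation and assignment (Python rather than the registered Stata program; the column names, arm labels, and seed are illustrative assumptions):

import numpy as np
import pandas as pd

def assign_treatment(workers: pd.DataFrame, group_size: int = 120, seed: int = 0) -> pd.DataFrame:
    """Form groups of workers with adjacent quiz scores, then randomize
    treatment at the group level (a sketch of the registered design)."""
    rng = np.random.default_rng(seed)
    # Sorting by quiz score before chunking approximates "groups of 120,
    # conditional on having quiz scores in adjacent quintiles".
    workers = workers.sort_values("quiz_score").reset_index(drop=True)
    workers["group"] = workers.index // group_size
    # 30 groups in total: 10 control, 10 treatment 1, and 5 + 5 for the two
    # treatment-2 sub-arms (counts taken from the registration).
    arms = ["control"] * 10 + ["treat1"] * 10 + ["treat2_demo"] * 5 + ["treat2_nodemo"] * 5
    assert workers["group"].nunique() == len(arms)
    shuffled = rng.permutation(arms)
    workers["arm"] = workers["group"].map(dict(enumerate(shuffled)))
    return workers

Each worker's treatment value would then be uploaded to Qualtrics when they return for the follow-up survey, as the Randomization Method field describes.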

Was the treatment clustered?
Before: No
After: Yes

Planned Number of Clusters
Before: 3,500 workers will be initially recruited. 2,100 are expected to be assigned to the easier task and return for the experimental session.
After: 3,600 workers will be initially recruited. There are 30 groups (of 120 workers) that are assigned together to a particular treatment.

Planned Number of Observations
Before: 3,500 workers will be initially recruited. 2,100 are expected to be assigned to the easier task and return for the experimental session.
After: 3,600 workers will be initially recruited. 2,640 are expected to be assigned to the easier task and return for the experimental session. 2,304 are expected to be in the subsample that would have been assigned to the easy task by all three mechanisms.
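
For reference, these figures follow from the attrition assumptions stated in the power-calculation field below (our arithmetic; the registration reports the first figure as "around 2,640"):

3,600 recruited × 0.80 follow-up completion × 0.92 assigned easy by own mechanism ≈ 2,650
3,600 recruited × 0.80 follow-up completion × 0.80 assigned easy by all three mechanisms = 2,304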

Sample size (or number of clusters) by treatment arms
Before: 840 workers status quo, 840 workers manager treatment, 420 workers algorithm treatment
After: 10 groups control; 10 groups treatment 1; 5 groups treatment 2 that see demographics of historical workers; 5 groups treatment 2 that do not see demographics of historical workers.

Power calculation: Minimum Detectable Effect Size for Main Outcomes
Before: All calculations assume a power of 80% and a confidence level of 0.95. Regressions to test whether the manager treatment (algorithm treatment) reduces perceived discrimination relative to the status quo are powered to detect effects larger than 10pp (14pp). The percent of the population perceiving discrimination in the status quo group is assumed to match the rate of perceived discrimination in each race*gender cell in a pilot study describing the main experiment as a hypothetical scenario -- see the uploaded pre-analysis plan for details. Given the results from a pilot study, the effect sizes are expected to be larger than these MDEs. The study will also be powered to detect a difference in these effects for the two treatments of 14pp or more, assuming the algorithm treatment reduces perceived discrimination by up to 4pp and the manager treatment is more effective. The study is also powered to detect that the effect of the manager treatment (algorithm treatment) differs for whites and non-whites or men and women by 16pp (20pp). Regressions to test whether the manager treatment increases labor supply (whether a worker completes all eighteen paragraphs and whether they opt in to future work) or changes any continuous outcome that is standardized to be zero in the control group (e.g. reservation wages for working more closely with their manager, willingness to pay to choose one's own manager, job satisfaction, effort, proofreading quality) are powered to detect effects of at least 16pp and 0.06-0.12sd, respectively (where control group means and distributions are predicted from the pilot data where possible, including that 60 percent of workers in the status quo group are assumed to complete all 18 paragraphs).
After: All calculations assume power of 80 percent and a confidence level of 0.95. I will recruit 3,600 participants to complete the screening survey. I expect take-up for the follow-up survey to be high, since the screening survey's initial description will indicate that there is a well-paid follow-up survey. Assuming that 80 percent of workers complete the follow-up survey and 92 percent of workers are assigned to the easier task, the final analysis sample will be around 2,640 workers assigned to the easier task by their randomly assigned mechanism, or around 2,304 workers assigned to the easy task by all three mechanisms (assuming 20 percent of workers are assigned to the hard task by at least one mechanism, i.e. that the mechanisms’ decisions are slightly but not strongly correlated). The analysis might use either of these samples to deal with “selection” issues, as described above.
1. First-stage regressions
1.1. Main effects. Regressions to test whether the demographic-blinded manager reduces perceived discrimination relative to the manager who knows demographics are powered to detect effects larger than 6.5 or 7 percentage points (for sample sizes of 2,640 or 2,304, respectively). Similarly, testing whether one of the algorithm sub-groups reduces perceived discrimination relative to the manager who knows demographics is powered to detect effects larger than 8 percentage points for either sample size. Given the results from a pilot study, the effect sizes are expected to be larger than these MDEs.
1.2. Treatment effect heterogeneity. Tests of whether the effect of the algorithm depends on whether the worker knows the race of the historically assigned workers are powered to detect differences larger than 9.5 or 10 percentage points (for sample sizes of 2,640 or 2,304, respectively). Tests of whether the effect of the demographic-blind human differs from one algorithm sub-group are powered to detect differences of 8 percentage points, and tests that the effect of the demographic-blind human differs from both algorithm sub-groups (which are pooled and don't differ from each other) are powered to detect differences larger than 6.5 or 7 percentage points (for sample sizes of 2,640 or 2,304, respectively).
1.3. Racial and gender heterogeneity. When testing for heterogeneity in the effects of the blinded manager, gender heterogeneity among non-white participants and racial heterogeneity among men are powered to detect differences in the treatment effect of 15 percentage points; gender heterogeneity among white participants is powered to detect differences of 18 percentage points; and racial heterogeneity among women is powered to detect differences of 20 percentage points. For each group, tests for heterogeneity in the effects of the algorithm are powered to detect MDEs about 3 percentage points larger than the MDEs for differences in the effects of the blinded manager. These MDEs come from simulations with a sample size of 2,640; with a sample size of 2,304, each MDE is about 1 percentage point larger.
2. Reduced-form regressions
2.1. Binary outcomes. Regressions to test the effects of the demographic-blind manager on the binary measures of retention (completing only the minimum 6 paragraphs, or completing all 18 paragraphs) are powered to detect effects larger than 5 or 5.5 percentage points (for sample sizes of 2,640 or 2,304, respectively), and tests of the effect of one algorithm sub-group are powered to detect effects larger than 6 or 6.5 percentage points (for sample sizes of 2,640 or 2,304, respectively). In pilot data, 12 percent of workers completed only 6 paragraphs and 68 percent completed all 18 paragraphs, which is assumed in these calculations.
2.2. Continuous outcomes. All other outcomes are continuous. Regressions to test the effects of the demographic-blind manager are powered to detect effects larger than 0.14sd or 0.15sd (for sample sizes of 2,640 or 2,304, respectively), and tests of the effect of one algorithm sub-group are powered to detect effects larger than 0.17sd (for either sample size).
3. Two-stage least squares regressions
3.1. These power calculations assume that the effects of the treatments on perceived discrimination are quite large, effectively taking the rate of perceived discrimination to zero in the demographic-blind manager group and both algorithm sub-groups. This is consistent with piloting (though in very small samples).
3.2. Two-stage least squares regressions are then powered to detect effects of reducing perceived discrimination that are larger than 4 or 5 percentage points on the binary outcomes (for sample sizes of 2,640 or 2,304, respectively), and effects larger than 0.12sd on the continuous outcomes (for either sample size).
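
As an illustration of how MDEs of this kind can be computed (a sketch with statsmodels, not the registered simulation code; the baseline rate, arm sizes, and design-effect value below are assumptions):

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

def mde_two_proportions(p0, n1, n2, alpha=0.05, power=0.80, deff=1.0):
    """Minimum detectable increase over a baseline rate p0 in a two-sample
    test of proportions, deflating sample sizes by a design effect to
    approximate the power loss from clustered assignment."""
    n1_eff, n2_eff = n1 / deff, n2 / deff
    # Solve for the detectable effect size in Cohen's h units...
    h = NormalIndPower().solve_power(nobs1=n1_eff, ratio=n2_eff / n1_eff,
                                     alpha=alpha, power=power)
    # ...then invert Cohen's h back to a difference in proportions.
    p1 = p0
    while p1 < 1.0 and abs(proportion_effectsize(p1, p0)) < h:
        p1 += 0.0005
    return p1 - p0

# Example: roughly 880 workers per arm (2,640 split over three arms, assumed)
# and a hypothetical 25 percent baseline rate of perceived discrimination.
print(round(mde_two_proportions(0.25, 880, 880, deff=1.1), 3))

The registration's MDEs come from simulations that account for the exact group structure; this analytic approximation only shows the moving parts.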

Intervention (Hidden)
Before: The intervention varies what participants believe about how they were assigned to the easier, lower-paying of two tasks related to scientific communication. In the status quo group, workers are told that managers who know worker demographics made the job assignment decisions. Two treatment arms reduce the scope for discrimination. First, in one arm workers are told that a screening algorithm that only uses worker screening quiz scores and education assigned the tasks. Second, in another arm workers are told that a manager made the assignments but the manager did not know worker demographics. In truth, all workers were assigned to the easier task by all three mechanisms. Any worker assigned to the harder task by any of the three mechanisms is assigned to the harder task and is not in the experimental sample.
After: The intervention varies how participants were assigned to the easier, lower-paying of two tasks related to scientific communication. In the status quo group, workers are evaluated by managers who know worker demographics. Two treatment arms reduce the scope for discrimination. First, in one arm workers are evaluated by a screening algorithm that only uses worker screening quiz scores and education to assign the tasks. Second, in another arm workers are evaluated by a manager who does not know worker demographics. There are two possible ways that the job assignment mechanisms could change average worker outcomes among the workers randomly assigned to be evaluated by them:
1. The mechanisms have different potential to discriminate against workers, and thus may affect perceptions of discrimination (the effect of interest).
2. The three mechanisms could assign different types of workers to the hard versus easy tasks (a “selection effect”).
Two methods will be used to isolate the effect of interest by accounting for the possible selection effect; luckily, this is easier than dealing with a similar type of selection effect in observational data. (1) We will control for all observable characteristics that managers or the algorithm could have used when making their decisions (unlike in the real world, we know that this is all (or more than) the information the manager or algorithm had about each worker when deciding whom to assign to the harder task). (2) The experiment is designed so that we have collected data on how each mechanism would have assigned each worker, so restricting to the sample of workers who would have been assigned to the easier task by all three mechanisms should also account for any selection effect.
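
The first correction amounts to a specification roughly like the following (a sketch in our notation; the uploaded pre-analysis plan has the authoritative specifications):

\[
  Y_i = \alpha + \beta_1\,\mathrm{BlindManager}_i + \beta_2\,\mathrm{Algorithm}_i + X_i'\gamma + \varepsilon_i,
\]

where X_i collects everything the mechanisms could have used (screening quiz score and education). The second correction re-estimates the same equation on the subsample that all three mechanisms would have assigned to the easy task.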

Secondary Outcomes (End Points)
Before: Self-efficacy in the work task and task-related skills, job satisfaction, affective well-being
After: Self-efficacy in the work task and task-related skills, job satisfaction, affective well-being, cooperation with and reciprocity towards managers, and beliefs about the likelihood of future discrimination.

IRBs

IRB Approval Date
Before: June 22, 2022
After: September 14, 2022