
Fields Changed

Registration

Field Before After
Abstract
Before: This pre-analysis plan documents the intended analysis for an experiment that follows up on AEARCTR-0009592. This follow-up randomized experiment examines how individuals perceive discrimination, further (relative to the original experiment) varying the methods used to hire workers and what workers know about them to understand certain mechanisms behind the original treatments that reduce perceptions of discrimination. The main outcome is the rate of perceived discrimination in each of 9 treatment arms (four of which replicate the original experiment). The follow-up will also replicate and extend the results of the original experiment on the effects of perceived discrimination on future labor supply, and, unlike the original experiment, will measure comprehension of the various treatments. This plan outlines the study design and hypotheses, outcomes of interest, and empirical specifications.
After: This pre-analysis plan documents the intended analysis for an experiment that follows up on AEARCTR-0009592. This follow-up randomized experiment examines how individuals perceive discrimination, further (relative to the original experiment) varying the methods used to hire workers and what workers know about them to understand certain mechanisms behind the original treatments that reduce perceptions of discrimination. The main outcome is the rate of perceived discrimination in each of 6 treatment arms (four of which replicate the original experiment). The follow-up will also replicate and extend the results of the original experiment on the effects of perceived discrimination on future labor supply, and, unlike the original experiment, will measure comprehension of the various treatments. This plan outlines the study design and hypotheses, outcomes of interest, and empirical specifications.
Last Published
Before: August 10, 2023 12:33 PM
After: September 11, 2023 03:55 PM
Intervention Start Date
Before: August 15, 2023
After: September 12, 2023
Intervention End Date
Before: September 01, 2023
After: October 30, 2023
Randomization Unit
Before: Workers are grouped into random groups of 40, conditional on having quiz scores in adjacent quintiles. These groups are randomly assigned to treatment (which hiring method they will be evaluated by). Each group of 40 is evaluated by the same manager and thus sees the same information about their manager and their past hiring decisions.
After: Workers are grouped into random groups of 40, conditional on having quiz scores in adjacent quintiles. These groups are randomly assigned to treatment (which hiring method they will be evaluated by). Each group of 40 sees the same historically-hired workers and, in the manager arms, the same manager profile.
Planned Number of Clusters
Before: 3960 workers are originally recruited. There are 99 groups (of 40 workers each) that are assigned together to a particular treatment. Conditional on their quintile quiz scores, these workers are randomly grouped together.
After: 3960 workers are originally recruited. There are 99 groups (of 40 workers each) that are assigned together to a particular treatment. Conditional on their quintile quiz scores, these workers are randomly grouped together. Some groups of 40 are randomly further paired with another group to make a "super group" that sees the same historically-promoted workers and, in the manager arms, the same manager demographics. This yields approximately 66 total clusters, with half the clusters having 80 workers and the other half having 40.
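The grouping step described in this field can be sketched as follows. This is an illustrative reconstruction, not the study's actual code: the seed, the random tie-breaking, and the round-robin group-to-arm assignment are all assumptions. Ranking workers by quiz score and cutting consecutive blocks of 40 guarantees each group's members have scores in at most two adjacent quintiles, matching the "adjacent quintiles" condition.

```python
import random

def assign_groups(quiz_scores, group_size=40, n_arms=6, seed=0):
    """Sketch: rank workers by quiz score (ties broken at random) so each
    consecutive block of `group_size` workers spans at most two adjacent
    score quintiles, then randomly assign each block to a treatment arm.
    Hypothetical reconstruction; not the registered protocol's code."""
    rng = random.Random(seed)
    # Sort worker indices by score, with a random tiebreaker.
    order = sorted(range(len(quiz_scores)),
                   key=lambda i: (quiz_scores[i], rng.random()))
    # Cut the ranked list into consecutive groups of `group_size`.
    groups = [order[i:i + group_size]
              for i in range(0, len(order), group_size)]
    rng.shuffle(groups)  # randomize which group lands in which arm
    return [(group, g % n_arms) for g, group in enumerate(groups)]

# 3960 recruited workers -> 99 groups of 40, spread over 6 arms
groups = assign_groups(list(range(3960)))
print(len(groups))  # 99
```

With 3960 workers the blocks divide evenly (99 × 40); real quiz scores with many ties would be handled by the random tiebreaker.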
Sample size (or number of clusters) by treatment arms
Before: 11 groups of 40 workers assigned to each of the 9 treatment arms described above.
After: There are 15 or 16 groups of 40 workers assigned to each of the 6 treatment arms described above.
Power calculation: Minimum Detectable Effect Size for Main Outcomes
Before: Approximately 350 workers are expected to return in each of the nine treatment arms, and approximately 315 per arm are expected to be in the subsample not hired by any hiring method. For each power calculation I will specify the MDE for the 350 case, followed by the 315 case in parentheses. The primary research question in this follow-up is what types of algorithms are effective at reducing perceptions of discrimination. Thus, I am most interested in comparing each of the other treatment groups to the non-blind manager, and I calculate conservative MDEs by focusing on comparisons of two groups at a time without control variables; pooling arms and adding controls would improve the precision of the estimates. With this sample size, I will be powered (confidence level = 0.05, power = 80 percent) to detect a 10pp (11pp) difference in either direction between each treatment group and the non-blind manager group (based on analytical power calculations with a total sample size of 700 (630), and assuming that 40 percent of participants perceive discrimination in the non-blind manager group, as in the original experiment, among women and racial minority men, who will make up the whole sample for the follow-up). Given the sample size needed to obtain the power described above, I can also calculate the MDEs for differences between other treatment groups, depending on the rate of perceived discrimination in the less-discriminatory group. For example, I may also be interested in testing whether the algorithm that uses demographics is perceived to discriminate more than the algorithm that doesn't, or whether providing information about algorithmic fairness reduces perceptions of algorithmic bias. Here, the relevant control mean is 20%, not 40%, so I would be powered to detect differences larger than 9.1pp (9.6pp). When one group has near zero percent of participants perceiving discrimination, I can detect differences larger than 3.4pp (3.7pp). The second outcome is reservation wages for future work, which, between the manager arms, is a replication of the original experiment and will only be possible in the algorithm arms if there are still positive rates of perceived discrimination in some of the algorithm arms. Again focusing on comparing just two arms, the MDE for the effect on a continuous variable is about 0.2sd for either sample size. Instead, pooling the three arms where there will almost certainly be no perceived discrimination (based on the results of the original experiment) and pooling the three arms where there will most likely be positive rates of perceived discrimination between 20-40 percent (based on the results of the original experiment), the MDE is about 0.12sd for either sample size (N = 2100 or 1890).
After: Approximately 530 workers are expected to return in each of the six treatment arms, and approximately 475 per arm are expected to be in the subsample not hired by any hiring method. For each power calculation I will specify the MDE for the 530 case, followed by the 475 case in parentheses. The primary research question in this follow-up is what types of algorithms are effective at reducing perceptions of discrimination. Thus, I am most interested in comparing each of the other treatment groups to the non-blind manager, and I calculate conservative MDEs by focusing on comparisons of two groups at a time without control variables; pooling arms and adding controls would improve the precision of the estimates. With this sample size, I will be powered (confidence level = 0.05, power = 80 percent) to detect an 8.5pp (9pp) difference in either direction between each treatment group and the non-blind manager group (based on analytical power calculations with a total sample size of 1060 (950), and assuming that 40 percent of participants perceive discrimination in the non-blind manager group, as in the original experiment, among women and racial minority men, who will make up the whole sample for the follow-up). Given the sample size needed to obtain the power described above, I can also calculate the MDEs for differences between other treatment groups, depending on the rate of perceived discrimination in the less-discriminatory group, all of which would be better-powered based on the results from the main experiment. For example, I am interested in testing whether the algorithm that uses demographics is perceived to discriminate more than the algorithm that doesn't, as well as the difference between the arms with the blind manager and the algorithm without demographics in which workers know that mostly white men were hired in the past. Here, the relevant control mean is 20%, not 40%, so I would be powered to detect differences larger than 7.3pp (7.7pp). When one group has near zero percent of participants perceiving discrimination, I can detect differences larger than 2.5pp (2.7pp). The second outcome is reservation wages for future work, which, between the manager arms, is a replication of the original experiment and will only be possible in the algorithm arms if there are still positive rates of perceived discrimination in some of the algorithm arms. Again focusing on comparing just two arms, the MDE for the effect on a continuous variable is about 0.17sd for either sample size. Instead, pooling the two arms where there will almost certainly be no perceived discrimination (based on the results of the original experiment) and pooling the two arms where there will most likely be positive rates of perceived discrimination between 20-40 percent (based on the results of the original experiment), the MDE is about 0.12sd for either sample size (N = 2120 or 1900).
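The analytical MDEs quoted in this field can be reproduced approximately with a standard two-sample normal approximation. This is a sketch under stated assumptions (pooled variance, equal arm sizes, two-sided alpha = 0.05, power = 0.80), not the registrant's actual calculation, so the figures agree with the reported MDEs only up to the choice of approximation.

```python
from statistics import NormalDist

Z = NormalDist()

def mde_proportion(p, n_per_arm, alpha=0.05, power=0.80):
    """MDE for a two-sample test of proportions, comparing two equal
    arms of n_per_arm workers (pooled normal approximation)."""
    z_crit = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return z_crit * (2 * p * (1 - p) / n_per_arm) ** 0.5

def mde_continuous(n_per_arm, alpha=0.05, power=0.80):
    """MDE in standard-deviation units for a continuous outcome."""
    z_crit = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return z_crit * (2 / n_per_arm) ** 0.5

# 530 returning workers per arm, 40% control mean: ~0.084 (8.4pp),
# close to the 8.5pp reported; 475 per arm gives ~0.089 (8.9pp).
print(round(mde_proportion(0.40, 530), 3), round(mde_proportion(0.40, 475), 3))
# Continuous outcome (reservation wages): ~0.17sd comparing two single
# arms, ~0.12sd after pooling to 1060 per side (N = 2120).
print(round(mde_continuous(530), 2), round(mde_continuous(1060), 2))
```

The continuous-outcome figures match the reported 0.17sd and 0.12sd; the proportion figures land within about 0.1pp of the reported values, with the remaining gap attributable to the pooled-variance shortcut.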
Intervention (Hidden)
Before: The intervention varies how participants are hired to do a difficult proofreading and summarizing task related to scientific communication, as well as the information participants know about the hiring methods used. The follow-up experiment replicates the four arms of the original experiment and adds five more that will clarify mechanisms and extend the findings of the original study. The three original hiring methods are:
1. A manager who knows demographics when deciding who to hire
2. A manager who does not know demographics when deciding who to hire
3. An algorithm that does not use demographic information
The original experiment also varied whether workers evaluated by the algorithm knew the demographics of the workers the algorithm promoted in the past (thus, four arms). The follow-up adds one new hiring mechanism: an algorithm that uses demographics to make sure that it is equally predictive of performance for all race x gender groups. The follow-up also adds arms that provide differential information to workers about the hiring method they were randomly assigned to be evaluated by. In each algorithm arm, half of the workers are given information about how the algorithm was trained (the demographics of the training sample) and explicit information that the algorithm is not biased (that, conditional on initial performance, there are no race, gender, or age differences in the predicted performance measure the algorithm outputs). Additionally, in each of the arms where the hiring method does not use demographics, half of the workers are shown the demographics of the workers hired historically and half are not. In the arms where the hiring method does use demographics, workers always know the demographics of the historically hired workers. Thus, there are nine arms total, each equally-sized.
After: The intervention varies how participants are hired to do a difficult proofreading and summarizing task related to scientific communication, as well as the information participants know about the hiring methods used. The follow-up experiment replicates the four arms of the original experiment and adds two more that will clarify mechanisms and results in the main study. The three original hiring methods are:
1. A manager who knows demographics when deciding who to hire
2. A manager who does not know demographics when deciding who to hire
3. An algorithm that does not use demographic information
The original experiment also varied whether workers evaluated by the algorithm knew the demographics of the workers the algorithm promoted in the past (thus, four arms). The follow-up adds one new hiring mechanism: an algorithm that allows the relationship between performance and test scores plus education to vary by race and gender. Workers assigned to this hiring mechanism are also randomly assigned either to see the demographics of workers hired in the past or not. The final new arm likewise randomizes whether workers evaluated by the blind manager are shown the demographics of the workers hired in the past. Thus, there are six arms, each equally-sized.