Experimental Design Details
(1) Part of the experimental design includes showing workers the profiles of other workers who were assigned to the hard task by their manager/the algorithm (depending on their treatment assignment) in a previous round. This “historical sample” round is designed to be more representative of STEM and other industries in which white men are particularly over-represented. Generally, the study procedures for the historical sample will be the same as for the main sample, with some logistical differences.
(2) The screening survey. Workers will be recruited with a “screening survey,” common on Prolific, that qualifies the worker to participate in future high-paying surveys. In this screening survey, their first session, workers will complete the three screening tasks and answer questions about their demographics and job history. After all workers have completed the screening survey, their scores and demographics will be aggregated into worker profiles. Workers will be grouped with others who have similar education levels and average quiz scores before being evaluated and assigned to the harder or easier task. Quiz scores are shown on worker profiles as 1-5 stars, corresponding to approximate quintiles of the average score distribution.
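As an illustration of how the star ratings could be constructed, the sketch below bins hypothetical average quiz scores into approximate quintiles. The score scale, sample size, and column names are assumptions for illustration, not the study's actual data or code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical screening data: three task scores per worker (0-100 scale assumed).
workers = pd.DataFrame({
    "worker_id": range(1000),
    "score_1": rng.integers(0, 101, 1000),
    "score_2": rng.integers(0, 101, 1000),
    "score_3": rng.integers(0, 101, 1000),
})
workers["avg_score"] = workers[["score_1", "score_2", "score_3"]].mean(axis=1)

# Approximate quintiles of the average score mapped to 1-5 stars.
workers["stars"] = pd.qcut(workers["avg_score"], q=5, labels=[1, 2, 3, 4, 5]).astype(int)
print(workers["stars"].value_counts().sort_index())
```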
(3) Manager randomization. In the historical sample, each worker profile will be evaluated by one manager. Managers will be randomly assigned to evaluate workers with or without demographic information in their profile (profiles always include average test score stars and education). Managers will get the same type of information about their workers when they return to evaluate the main study sample. They will also be randomly assigned to evaluate workers of a certain average quiz score level (1-2 stars, 2-3 stars, 3-4 stars, 4-5 stars) and will evaluate workers of the same average quiz score level when they return to evaluate the main study sample.
After managers return to evaluate the main study sample workers, they will be randomly paired with another manager who has the opposite information assignment (receiving demographics when they do not, or vice versa) and was assigned to evaluate workers of the same average quiz score level. These manager pairs will evaluate the same 360 workers (3 sets of 120, say sets A, B, and C). All three sets will also be evaluated by the algorithm. Managers know that their decisions will be implemented for one of the three sets and that the performance of the workers they assign to the harder task in that set will determine their bonus payment.
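A minimal sketch of how this pairing could be formed, using hypothetical manager records; the number of managers, field names, and the handling of unmatched managers are illustrative assumptions rather than the study's implementation.

```python
import random

random.seed(1)

levels = ["1-2", "2-3", "3-4", "4-5"]
# Hypothetical manager records; in the study these assignments were made in the historical round.
managers = [
    {"id": i, "sees_demographics": (i % 2 == 0), "level": random.choice(levels)}
    for i in range(40)
]

pairs = []
for level in levels:
    with_demo = [m for m in managers if m["level"] == level and m["sees_demographics"]]
    without_demo = [m for m in managers if m["level"] == level and not m["sees_demographics"]]
    random.shuffle(with_demo)
    random.shuffle(without_demo)
    # Pair managers with opposite information treatments within the same score level;
    # any manager left without a counterpart is simply not paired in this sketch.
    pairs.extend(zip(with_demo, without_demo))

print(f"{len(pairs)} manager pairs formed across {len(levels)} score levels")
```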
(4) Worker randomization. Sets A, B, and C will then each be randomly matched with one of the three job assignment mechanisms, without replacement. For example, set B could be assigned to jobs following the decisions of the demographic-blind manager, set A could be assigned to jobs following the decisions of the manager with access to demographics, and set C could be assigned to jobs following the decisions of the algorithm. Workers are only told about the mechanism that determined their assignment. The assignments made by the other mechanisms only generate data to be used in the analysis of the experiment.
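One way to read the "without replacement" matching is as a random one-to-one assignment of the three sets to the three mechanisms, as in this sketch (labels are illustrative, not the study's):

```python
import random

random.seed(2)

worker_sets = ["A", "B", "C"]
mechanisms = ["blind_manager", "manager_with_demographics", "algorithm"]

# Shuffle the mechanisms and pair them one-to-one with the sets,
# so each mechanism determines assignments for exactly one set.
random.shuffle(mechanisms)
matching = dict(zip(worker_sets, mechanisms))
print(matching)
```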
(5) Work task and survey. After all workers have been evaluated, 2-3 weeks after they complete the screening survey, any worker who is assigned to the harder task by their manager or the algorithm (depending on their treatment assignment) will be offered the harder task. These workers will complete the harder task and finish the experiment (about 8 percent of workers).
At the same time, any worker who is assigned to the easier task by their manager or the algorithm (depending on their treatment assignment) will be offered the easier task. Among this sample of interest, after agreeing to take the follow-up survey, workers will be told how they were assigned to the easier, lower-paying task. If they were assigned by a manager, they will see some demographic characteristics of their manager. All workers will see three profiles of the workers that their manager/the algorithm (depending on their treatment assignment) assigned to the harder task in a previous round (i.e., the historical sample). If their manager knew their and other workers' demographics, those demographics will be included in the profiles along with quiz scores and education. If their manager did not know anyone's demographics, the profiles will only include test score stars and education. Workers assigned to be evaluated by the algorithm will be randomized into two sub-groups: those who are shown the race/gender/avatar of the workers previously assigned by the algorithm to the hard task, and those who are not. All workers in the algorithm arm will know that the algorithm used only data on education and quiz scores, and they will see this information. Comparing these two sub-groups isolates a difference in perceived discrimination within the algorithm arm, and provides data on how common perceived discrimination might be even when a worker knows that an algorithm was fair, just from seeing that the algorithm has historically hired or promoted (almost) entirely white men.
One-third of the sample will be randomized into the demographic-blind manager arm, another one-third will be randomized into the arm where managers knew demographics, one-sixth will be assigned by the algorithm and shown the race and gender of those who were previously assigned to the hard task, and another one-sixth will be assigned by the algorithm and not shown the race and gender of those who were previously assigned to the hard task.
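A sketch of these shares, assuming a sample size divisible by six and illustrative arm labels; it assigns arms directly at the worker level, abstracting from the set-based design described in (4).

```python
import numpy as np

rng = np.random.default_rng(3)
n_workers = 1200  # assumed sample size, divisible by 6

# Fixed shares of 1/3, 1/3, 1/6, 1/6, shuffled across workers.
arms = (["manager_blind"] * (n_workers // 3)
        + ["manager_demographics"] * (n_workers // 3)
        + ["algorithm_demographics_shown"] * (n_workers // 6)
        + ["algorithm_demographics_not_shown"] * (n_workers // 6))
rng.shuffle(arms)

labels, counts = np.unique(arms, return_counts=True)
print(dict(zip(labels, counts)))
```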
After they are told about how they were assigned, workers will be asked how many stars they think they would have needed to score on the screening quizzes in order to be assigned to the harder task by their manager. After they answer this question, they will be asked to imagine that they are a worker with a different (fictitious) profile with randomly assigned characteristics, and asked how many stars they think they would have needed to score on the screening quizzes in order to be assigned to the harder task. Differences between these answers for fictitious workers of different races and genders provide a measure of implicit perceived discrimination.
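To illustrate how the implicit measure could be summarized, the sketch below averages the stated star thresholds by the randomly assigned race and gender of the fictitious profiles; the responses and column names are invented for illustration.

```python
import pandas as pd

# Invented example responses: each worker states the star threshold they believe a
# fictitious profile with randomly assigned characteristics would have needed.
responses = pd.DataFrame({
    "worker_id": [1, 2, 3, 4, 5, 6],
    "fictitious_race": ["white", "Black", "white", "Black", "white", "Black"],
    "fictitious_gender": ["man", "man", "woman", "woman", "woman", "man"],
    "stated_required_stars": [3, 4, 4, 5, 4, 4],
})

# Mean stated threshold by the fictitious profile's race and gender; gaps across
# cells serve as the implicit perceived-discrimination measure.
print(responses.groupby(["fictitious_race", "fictitious_gender"])["stated_required_stars"].mean())
```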
Then, workers will do the easier proofreading task. Workers will know that they have to proofread at least six paragraphs to receive their completion payment and that they can proofread up to eighteen paragraphs (each for a bonus). They will also know that they will be eligible to be evaluated again to do the harder task for a higher wage in a future survey (though they could also be assigned again to the easier task). After either the first, third, or fifth paragraph (randomly assigned), workers will complete an affective well-being scale.
After finishing the easier proofreading task, workers' reservation wages to do this work again will be elicited, first under the assumption that the assignment mechanism is the same as in their experimental treatment, and again under the assumption that the workers with the top screening quiz scores will be offered the harder job (a cutoff rule). One set of wages will be selected, and approximately five percent of workers will be randomly selected to have their choices implemented, for each assignment type.
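A minimal sketch of the random-implementation step, under the assumption that the counting scenario is drawn per worker and roughly five percent of workers are selected within each scenario; the study's exact selection procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(4)
n_workers = 600  # assumed size of the easier-task sample

# For each worker, draw which elicitation scenario counts: the original
# assignment mechanism, or the top-score cutoff rule.
scenario = rng.choice(["same_mechanism", "cutoff_rule"], size=n_workers)

# Select roughly five percent of workers for implementation.
selected = rng.random(n_workers) < 0.05
for s in ["same_mechanism", "cutoff_rule"]:
    print(s, int(np.sum(selected & (scenario == s))))
```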
Next, workers in the manager arms will be asked at what wage they would want to work together with their manager on a similar but collaborative (instead of hierarchical) task in the future, how much they would be willing to give up in wages to be able to choose their own manager (instead of a default of working with the same manager who assigned them in the main experiment), and how they would share a thank-you bonus with their manager. Each of these choices will be implemented for a randomly selected subset of participants.
Then, workers will answer questions about their self-efficacy to do the easier or harder job, job satisfaction, complaints about the promotion process, and whether they think they would have been assigned to the harder task if they had a different race or gender but were assigned by the same mechanism (explicit measures of perceived discrimination). They will also answer incentivized questions about whether they think workers in each race-by-gender group were over-represented, under-represented, or neither, conditional on their quiz scores, among workers assigned to the harder task (measures of the perceived existence of discrimination).