Experimental Design Details
(1) Part of the experimental design includes showing workers the profiles of other workers who were assigned to the hard task by their manager/the algorithm (depending on their treatment assignment) in a previous round. This “historical sample” round is designed to be more representative of STEM and other industries in which white men are particularly over-represented. Generally, the study procedures for the historical sample will be the same as for the main sample, with some logistical differences.
(2) The screening survey. Workers will be recruited with a “screening survey,” common on Prolific, that qualifies the worker to participate in future high-paying surveys. In this screening survey, their first session, workers will complete the three screening tasks and answer questions about their demographics and job history. After all workers have completed the screening survey, their scores and demographics will be aggregated into worker profiles. Workers will be grouped with others who have similar education levels and average quiz scores before being evaluated and assigned to the harder or easier task. Quiz scores are shown on worker profiles as 1-5 stars, corresponding to approximate quintiles of the average score distribution.
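As an illustration of how the star ratings could be constructed, the sketch below bins hypothetical average quiz scores into approximate quintiles. The score scale, sample size, and column names are assumptions for illustration, not the study's actual data or code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical screening data: three task scores per worker (0-100 scale assumed).
workers = pd.DataFrame({
    "worker_id": range(1000),
    "score_1": rng.integers(0, 101, 1000),
    "score_2": rng.integers(0, 101, 1000),
    "score_3": rng.integers(0, 101, 1000),
})
workers["avg_score"] = workers[["score_1", "score_2", "score_3"]].mean(axis=1)

# Approximate quintiles of the average score mapped to 1-5 stars.
workers["stars"] = pd.qcut(workers["avg_score"], q=5, labels=[1, 2, 3, 4, 5]).astype(int)
print(workers["stars"].value_counts().sort_index())
```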
(3) Manager randomization. In the historical sample, each worker profile will be evaluated by one manager. Managers will be randomly assigned to evaluate workers with or without demographic information in their profile (profiles always include average test score stars and education). Managers will get the same type of information about their workers when they return to evaluate the main study sample. They will also be randomly assigned to evaluate workers of a certain average quiz score level (1-2 stars, 2-3 stars, 3-4 stars, 4-5 stars) and will evaluate workers of the same average quiz score level when they return to evaluate the main study sample.
After managers return to evaluate the main study sample workers, they will be randomly paired with another manager who has the opposite information assignment (receiving demographics when they do not, or vice versa) and was assigned to evaluate workers of the same average quiz score level. These manager pairs will evaluate the same 360 workers (3 sets of 120, say sets A, B, and C). All three sets will also be evaluated by the algorithm. Managers know that their decisions will be implemented for one of the three sets and that the performance of the workers they assign to the harder task in that set will determine their bonus payment.
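A minimal sketch of how this pairing could be formed, using hypothetical manager records; the number of managers, field names, and the handling of unmatched managers are illustrative assumptions rather than the study's implementation.

```python
import random

random.seed(1)

levels = ["1-2", "2-3", "3-4", "4-5"]
# Hypothetical manager records; in the study these assignments were made in the historical round.
managers = [
    {"id": i, "sees_demographics": (i % 2 == 0), "level": random.choice(levels)}
    for i in range(40)
]

pairs = []
for level in levels:
    with_demo = [m for m in managers if m["level"] == level and m["sees_demographics"]]
    without_demo = [m for m in managers if m["level"] == level and not m["sees_demographics"]]
    random.shuffle(with_demo)
    random.shuffle(without_demo)
    # Pair managers with opposite information treatments within the same score level;
    # any manager left without a counterpart is simply not paired in this sketch.
    pairs.extend(zip(with_demo, without_demo))

print(f"{len(pairs)} manager pairs formed across {len(levels)} score levels")
```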
(4) Worker randomization. Sets A, B, and C will then each be randomly matched with one of the three job assignment mechanisms, without replacement. For example, set B could be assigned to jobs following the decisions of the demographic-blind manager, set A could be assigned to jobs following the decisions of the manager with access to demographics, and set C could be assigned to jobs following the decisions of the algorithm. Workers are only told about the mechanism that determined their assignment. The assignments made by the other mechanisms only generate data to be used in the analysis of the experiment.
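One way to read the "without replacement" matching is as a random one-to-one assignment of the three sets to the three mechanisms, as in this sketch (labels are illustrative, not the study's):

```python
import random

random.seed(2)

worker_sets = ["A", "B", "C"]
mechanisms = ["blind_manager", "manager_with_demographics", "algorithm"]

# Shuffle the mechanisms and pair them one-to-one with the sets,
# so each mechanism determines assignments for exactly one set.
random.shuffle(mechanisms)
matching = dict(zip(worker_sets, mechanisms))
print(matching)
```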
(5) Work task and survey. After all workers have been evaluated, 2-3 weeks after they complete the screening survey, any worker who is assigned to the harder task by their manager or the algorithm (depending on their treatment assignment) will be offered the harder task. These workers will complete the harder task and finish the experiment (about 8 percent of workers).
At the same time, any worker who is assigned to the easier task by their manager or the algorithm (depending on their treatment assignment) will be offered the easier task. Among this sample of interest, after agreeing to take the follow-up survey, workers will be told how they were assigned to the easier, lower-paying task. If they were assigned by a manager, they will see some demographic characteristics of their manager. All workers will see three profiles of the workers that their manager/the algorithm (depending on their treatment assignment) assigned to the harder task in a previous round (i.e., the historical sample). If their manager knew their and other workers' demographics, those demographics will be included in the profiles along with quiz scores and education. If their manager did not know anyone's demographics, the profiles will only include test score stars and education. Workers assigned to be evaluated by the algorithm will be randomized into two sub-groups: those who are shown the race/gender/avatar of the workers previously assigned by the algorithm to the hard task, and those who are not. All workers in the algorithm arm will know that the algorithm used only data on education and quiz scores, and they will see this information. Comparing these two sub-groups isolates a difference in perceived discrimination within the algorithm arm, and provides data on how common perceived discrimination might be even when a worker knows that an algorithm was fair, just from seeing that the algorithm has historically hired or promoted (almost) entirely white men.
One-third of the sample will be randomized into the demographic-blind manager arm, another one-third will be randomized into the arm where managers knew demographics, one-sixth will be assigned by the algorithm and shown the race and gender of those who were previously assigned to the hard task, and another one-sixth will be assigned by the algorithm and not shown the race and gender of those who were previously assigned to the hard task.
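A sketch of these shares, assuming a sample size divisible by six and illustrative arm labels; it assigns arms directly at the worker level, abstracting from the set-based design described in (4).

```python
import numpy as np

rng = np.random.default_rng(3)
n_workers = 1200  # assumed sample size, divisible by 6

# Fixed shares of 1/3, 1/3, 1/6, 1/6, shuffled across workers.
arms = (["manager_blind"] * (n_workers // 3)
        + ["manager_demographics"] * (n_workers // 3)
        + ["algorithm_demographics_shown"] * (n_workers // 6)
        + ["algorithm_demographics_not_shown"] * (n_workers // 6))
rng.shuffle(arms)

labels, counts = np.unique(arms, return_counts=True)
print(dict(zip(labels, counts)))
```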
After they are told about how they were assigned, workers will be asked how many stars they think they would have needed to score on the screening quizzes in order to be assigned to the harder task by their manager. After they answer this question, they will be asked to imagine that they are a worker with a different (fictitious) profile with randomly assigned characteristics, and asked how many stars they think they would have needed to score on the screening quizzes in order to be assigned to the harder task. Differences between these answers for fictitious workers of different races and genders provide a measure of implicit perceived discrimination.
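To illustrate how the implicit measure could be summarized, the sketch below averages the stated star thresholds by the randomly assigned race and gender of the fictitious profiles; the responses and column names are invented for illustration.

```python
import pandas as pd

# Invented example responses: each worker states the star threshold they believe a
# fictitious profile with randomly assigned characteristics would have needed.
responses = pd.DataFrame({
    "worker_id": [1, 2, 3, 4, 5, 6],
    "fictitious_race": ["white", "Black", "white", "Black", "white", "Black"],
    "fictitious_gender": ["man", "man", "woman", "woman", "woman", "man"],
    "stated_required_stars": [3, 4, 4, 5, 4, 4],
})

# Mean stated threshold by the fictitious profile's race and gender; gaps across
# cells serve as the implicit perceived-discrimination measure.
print(responses.groupby(["fictitious_race", "fictitious_gender"])["stated_required_stars"].mean())
```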
Then, workers will do the easier proofreading task. Workers will know that they have to proofread at least six paragraphs to receive their completion payment and that they can proofread up to eighteen paragraphs (each for a bonus). They will also know that they will be eligible to be evaluated again to do the harder task for a higher wage in a future survey (though they could also be assigned again to the easier task). After either the first, third, or fifth paragraph (randomly assigned), workers will complete an affective well-being scale.
After finishing the easier proofreading task, workers' reservation wages to do this work again will be elicited, first under the assumption that the assignment mechanism is the same as in their experimental treatment, and again under the assumption that the workers with the top screening quiz scores will be offered the harder job (a cutoff rule). One set of wages will be selected, and approximately five percent of workers will be randomly selected to have their choices implemented, for each assignment type.
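A minimal sketch of the random-implementation step, under the assumption that the counting scenario is drawn per worker and roughly five percent of workers are selected within each scenario; the study's exact selection procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(4)
n_workers = 600  # assumed size of the easier-task sample

# For each worker, draw which elicitation scenario counts: the original
# assignment mechanism, or the top-score cutoff rule.
scenario = rng.choice(["same_mechanism", "cutoff_rule"], size=n_workers)

# Select roughly five percent of workers for implementation.
selected = rng.random(n_workers) < 0.05
for s in ["same_mechanism", "cutoff_rule"]:
    print(s, int(np.sum(selected & (scenario == s))))
```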
Next, workers in the manager arms will be asked at what wage they would want to work together with their manager on a similar but collaborative (instead of hierarchical) task in the future, how much they would be willing to give up in wages to be able to choose their own manager (instead of a default of working with the same manager who assigned them in the main experiment), and how they would share a thank-you bonus with their manager. Each of these choices will be implemented for a randomly selected subset of participants.
Then, workers will answer questions about their self-efficacy to do the easier or harder job, job satisfaction, complaints about the promotion process, and whether they think they would have been assigned to the harder task if they had a different race or gender but were assigned by the same mechanism (explicit measures of perceived discrimination). They will also answer incentivized questions about whether they think workers in each race-by-gender group were over-represented, under-represented, or neither, conditional on their quiz scores, among workers assigned to the harder task (measures of the perceived existence of discrimination).