Reducing perceptions of discrimination (follow-up to AEARCTR-0009592)

Last registered on March 01, 2024

Pre-Trial

Trial Information

General Information

Title

Reducing perceptions of discrimination (follow-up to AEARCTR-0009592)

RCT ID

AEARCTR-0011806

Initial registration date

July 31, 2023

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published

August 10, 2023, 12:33 PM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Last updated

March 01, 2024, 12:32 PM EST

Last updated is the most recent time when changes to the trial's registration were published.

Locations

Country

United States of America

Region

Primary Investigator

Name

Hannah Ruebeck

Affiliation

MIT

Other Primary Investigator(s)

Additional Trial Information

Status

Completed

Start date

2023-07-31

End date

2024-02-29

Keywords

Behavior, Gender, Labor

Additional Keywords

Race, Discrimination, Beliefs

JEL code(s)

J71, J53, J22

Secondary IDs

Prior work

This trial is based on or builds upon one or more prior RCTs.

Abstract

This pre-analysis plan documents the intended analysis for an experiment that follows up on AEARCTR-0009592. This follow-up randomized experiment examines how individuals perceive discrimination. Relative to the original experiment, it further varies the methods used to hire workers and what workers know about those methods, in order to understand certain mechanisms behind the original treatments that reduce perceptions of discrimination. The main outcome is the rate of perceived discrimination in each of 6 treatment arms (four of which replicate the original experiment). The follow-up will also replicate and extend the results of the original experiment on the effects of perceived discrimination on future labor supply and, unlike the original experiment, will measure comprehension of the various treatments. This plan outlines the study design and hypotheses, outcomes of interest, and empirical specifications.

External Link(s)

The pre-registration of the original experiment that this follow-up builds on

Registration Citation

Citation

Ruebeck, Hannah. 2024. "Reducing perceptions of discrimination (follow-up to AEARCTR-0009592)." AEA RCT Registry. March 01. https://doi.org/10.1257/rct.11806-2.1

Sponsors & Partners

Experimental Details

Interventions

Intervention(s)

The intervention varies how participants are hired to do a difficult proofreading and summarizing task related to scientific communication, as well as the information participants know about the hiring methods used (the original experiment used the same evaluation methods but was in the context of job assignment to one of two tasks, i.e. promotion). The follow-up experiment replicates the three methods used in the original experiment and adds one more that will extend the findings of the original study. To further understand the mechanisms behind the results of the original study, the experiment will further vary the information that workers have about the hiring methods.

Intervention Start Date

2023-09-12

Intervention End Date

2023-10-30

Primary Outcomes

Primary Outcomes (end points)

Perceived discrimination and future labor supply (reservation wages)

Primary Outcomes (explanation)

The main measure of perceived discrimination is the explicit measure of perceived discrimination used in the original experiment: Mentioning race, gender, bias, or discrimination in a free-response question about what needed to be different about their profile to be hired to do the proofreading task.

Secondary Outcomes

Secondary Outcomes (end points)

Comprehension of the various hiring mechanisms, secondary measures of perceived discrimination, beliefs about the likelihood of being hired in the future, self-efficacy, affective well-being

Secondary Outcomes (explanation)

Secondary measures of perceived discrimination are complaints about discrimination or bias, and answering “yes” to questions about being hired if their gender or race was different. Self-efficacy and affective well-being are the same indices used in the original experiment.

Experimental Design

Workers will be recruited with a screening survey and then evaluated by four hiring methods. Workers are hired or not hired to do a difficult task by one randomly-assigned mechanism. Workers who are hired are not the sample of interest. The remaining workers who were not hired by their randomly-assigned hiring method are the sample of interest. They learn about their randomly-assigned method used to determine if they were hired, answer questions about their interest in future work, and answer some survey questions.

Experimental Design Details

The experimental design follows almost exactly the design of the original experiment, with the following differences:
1. The same managers are brought back for the follow-up experiment, eliminating the need to re-recruit a "historical sample" since I can tell workers about their manager's/the algorithm's decisions in the original historical sample
2. The screening survey is shortened to reduce costs, but the same measures used for building worker evaluation profiles are collected as in the main experiment
3. Managers are paired in the same way and do the same worker evaluation task, but they evaluate 9 groups of 40 rather than 3
4. When the workers who aren't hired are brought back, there is no "easier task" for them to do (i.e. I am not measuring effects on effort and performance). I measure only the outcomes described in the previous section (further shortening the experimental survey)
5. One-sixth of the sample will be randomized into each of the arms described above.

Randomization Method

Randomization is done in an office using Stata on a computer and treatment values are uploaded to Qualtrics for each participant when they return for the experimental survey. This allows clustering, which is not possible when randomizing in Qualtrics directly. Workers are linked across surveys by their Prolific ID.

Randomization Unit

Workers are grouped into random groups of 40, conditional on having quiz scores in adjacent quintiles. These groups are randomly assigned to treatment (which hiring method they will be evaluated by). Each group of 40 sees the same historically-hired workers and, in the manager arms, the same manager profile.
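
The grouping and cluster-level assignment described above can be illustrated with a short sketch. This is not the registered Stata procedure: the function name `assign_clusters`, the integer arm labels, the toy sample of 480 workers, and the round-robin assignment of groups to arms are all assumptions made for illustration. It also assumes every quintile holds at least 40 workers, so that a group formed from quintile-sorted workers spans at most two adjacent quintiles.

```python
import random

def assign_clusters(quintiles, group_size=40, n_arms=6, seed=0):
    """Toy cluster randomization (hypothetical helper, not the study's code).

    quintiles: dict mapping worker ID -> quiz-score quintile (1..5).
    Returns a dict mapping worker ID -> (group index, arm index).
    """
    rng = random.Random(seed)
    ids = list(quintiles)
    rng.shuffle(ids)                       # random order of workers
    ids.sort(key=lambda w: quintiles[w])   # stable sort: random order kept within quintile
    # Consecutive chunks of the sorted list form the groups of 40.
    groups = [ids[i:i + group_size] for i in range(0, len(ids), group_size)]
    # Randomize at the group level: shuffle groups, then deal them across arms.
    order = list(range(len(groups)))
    rng.shuffle(order)
    arm_of_group = {g: k % n_arms for k, g in enumerate(order)}
    return {w: (g, arm_of_group[g])
            for g, members in enumerate(groups) for w in members}

# Toy data: 480 hypothetical workers spread evenly over five quintiles.
toy_quintiles = {f"w{i}": i % 5 + 1 for i in range(480)}
assignment = assign_clusters(toy_quintiles)
```

Because assignment happens at the group level, every worker in a group shares the same arm, which is the property that randomizing outside Qualtrics is meant to preserve.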

Was the treatment clustered?

Yes

Experiment Characteristics

Sample size: planned number of clusters

3960 workers are originally recruited. There are 99 groups (of 40 workers each) that are assigned together to a particular treatment. Workers are randomly grouped together conditional on their quiz-score quintiles. Some groups of 40 are randomly paired with another group to make a "super group" that sees the same historically-promoted workers and, if they are in the manager arms, the same manager demographics. This yields approximately 66 total clusters, with half the clusters having 80 workers and the other half having 40 (33 super groups of two groups each plus 33 unpaired groups account for all 99 groups).

Sample size: planned number of observations

3960 workers are originally recruited. 3166 are expected to be not hired and return for the experimental session. 2850 of these are expected to be in the subsample that would have been not hired by all four hiring methods.

Sample size (or number of clusters) by treatment arms

There are 15 or 16 groups of 40 workers assigned to each of the 6 treatment arms described above.

Minimum detectable effect size for main outcomes (accounting for sample design and clustering)

Approximately 530 workers are expected to return in each of the six treatment arms, and approximately 475 per arm are expected to be in the subsample not hired by any hiring method. For each power calculation I specify the MDE for the 530 case, followed by the 475 case in parentheses. The primary research question in this follow-up is what types of algorithms are effective at reducing perceptions of discrimination. Thus, I am most interested in comparing each of the other treatment groups to the non-blind manager, and I calculate conservative MDEs by focusing on comparisons of two groups at a time without control variables; pooling arms and adding controls would improve the precision of the estimates. With this sample size, I will be powered (significance level = 0.05, power = 80 percent) to detect an 8.5pp (9pp) difference in either direction between each treatment group and the non-blind manager group. This is based on analytical power calculations with a total sample size of 1060 (950), assuming that 40 percent of participants perceive discrimination in the non-blind manager group, as in the original experiment among women and racial minority men, who will make up the whole sample for the follow-up. Given the sample size needed to obtain the power described above, I can also calculate the MDEs for the differences between other treatment groups, depending on the rate of perceived discrimination in the less-discriminatory group, all of which would be better-powered based on the results from the main experiment. For example, I am interested in testing whether the algorithm that uses demographics is perceived to discriminate more than the algorithm that doesn't, as well as the difference between the arms with the blind manager and the algorithm without demographics in which workers know that mostly white men were hired in the past. Here, the relevant control mean is 20%, not 40%, so I would be powered to detect differences larger than 7.3pp (7.7pp).
When one group has near zero percent of participants perceiving discrimination, I can detect differences larger than 2.5pp (2.7pp). The second outcome is reservation wages for future work. Between the manager arms, this is a replication of the original experiment; in the algorithm arms, it will only be possible if there are still positive rates of perceived discrimination in some of those arms. Again focusing on comparing just two arms, the MDE for the effect on a continuous variable is about 0.17sd for either sample size. Instead, pooling the two arms where there will almost certainly be no perceived discrimination (based on the results of the original experiment) and pooling the two arms where there will most likely be positive rates of perceived discrimination between 20 and 40 percent (based on the results of the original experiment), the MDE is about 0.12sd for either sample size (N=2120 or 1900).
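
The MDE figures above can be approximated with a standard normal-approximation formula for comparing two arms. The sketch below is an assumption about how such numbers might be computed, not the registered calculation: the registration does not state its exact formula, the function names are hypothetical, and the sketch ignores clustering and covariates. For proportions it solves for the smallest difference delta satisfying delta = (z_{1-alpha/2} + z_{power}) * sqrt((p0(1-p0) + p1(1-p1)) / n), with p1 = p0 + delta; for a standardized continuous outcome it uses (z_{1-alpha/2} + z_{power}) * sqrt(2 / n).

```python
from math import sqrt
from statistics import NormalDist

def z_total(alpha=0.05, power=0.80):
    """Sum of the two-sided critical value and the power quantile."""
    std = NormalDist()
    return std.inv_cdf(1 - alpha / 2) + std.inv_cdf(power)

def mde_proportions(p_control, n_per_arm, alpha=0.05, power=0.80):
    """Unclustered MDE for a difference in proportions (illustrative only).

    The variance depends on the treated-arm rate p_control + delta,
    so delta is found by fixed-point iteration.
    """
    z = z_total(alpha, power)
    delta = 0.05  # starting guess
    for _ in range(100):
        p_treat = p_control + delta
        variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
        delta = z * sqrt(variance / n_per_arm)
    return delta

def mde_continuous_sd(n_per_arm, alpha=0.05, power=0.80):
    """Unclustered MDE in standard-deviation units for two equal-sized arms."""
    return z_total(alpha, power) * sqrt(2 / n_per_arm)

if __name__ == "__main__":
    for p0 in (0.40, 0.20):
        for n in (530, 475):
            print(f"p0={p0:.2f}, n={n}: MDE = {100 * mde_proportions(p0, n):.1f}pp")
    for n in (530, 1060):
        print(f"continuous, n={n} per side: MDE = {mde_continuous_sd(n):.2f}sd")
```

Under these assumptions the results land close to the quoted values for the 40 percent and 20 percent control means and for the continuous outcome with 530 and 1060 workers per side. A design-effect adjustment for the clusters of 40 or 80 workers would increase these MDEs.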

Supporting Documents and Materials

IRB