Going Beyond the Mean: How Gender Affects the Distribution of Evaluations and How the Distribution of Evaluations Affects Hiring Decisions

Last registered on August 10, 2023

Pre-Trial

Trial Information

General Information

Title
Going Beyond the Mean: How Gender Affects the Distribution of Evaluations and How the Distribution of Evaluations Affects Hiring Decisions
RCT ID
AEARCTR-0011877
Initial registration date
August 02, 2023

First published
August 10, 2023, 12:59 PM EDT

Locations

Region

Primary Investigator

Affiliation
Monash University

Other Primary Investigator(s)

PI Affiliation
Monash University
PI Affiliation
University of Gothenburg

Additional Trial Information

Status
In development
Start date
2023-08-15
End date
2024-08-15
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
In this project, we investigate the impact of various distributional metrics of job applicants’ evaluations on how applicants are assessed and on their likelihood of being selected for hiring. We also study whether these distributional metrics have similar effects depending on the gender of the applicants and on whether the gender of the applicants is known.
External Link(s)

Registration Citation

Citation
Avery, Mallory, Andreas Leibbrandt and Joseph Vecci. 2023. "Going Beyond the Mean: How Gender Affects the Distribution of Evaluations and How the Distribution of Evaluations Affects Hiring Decisions." AEA RCT Registry. August 10. https://doi.org/10.1257/rct.11877-1.0
Experimental Details

Interventions

Intervention(s)
In this project, we conduct multiple interventions to investigate the impact of various distributional metrics of job applicants’ evaluations on how applicants are assessed and on their likelihood of being selected for hiring. We also study whether these distributional metrics have similar effects depending on the gender of the applicants and on whether the gender of the applicants is known.
Intervention Start Date
2023-08-15
Intervention End Date
2024-08-15

Primary Outcomes

Primary Outcomes (end points)
We collect the following primary outcomes:
- Choose to hire: defined as which applicant, out of a pair, the evaluator chooses to recommend for hiring
Primary Outcomes (explanation)
In an earlier survey study, we found evidence that larger means and lower variances and minimums, relative to the other applicant, led to an increased chance of being hired. We thus hypothesize that these patterns will also hold in the experiment:

Hypothesis 1a: When mean differences are small, metrics other than the mean (i.e., variance, range, maximum, minimum, outliers, skew) will have predictive power in determining which of a pair of applicants is chosen to be hired.

Hypothesis 1b: When mean differences are small, an applicant having a higher mean or maximum will increase their likelihood of being hired, while a higher variance will decrease it.

Small mean differences are defined based on the three evaluation sets per pair that generated the smallest mean difference while retaining trade-offs between the two applicants, i.e., neither applicant strictly or weakly dominated the other in terms of evaluations. The largest mean difference in our sample is 5.33 on a possible range of 0-100. Because we focus on this case of small mean differences, we acknowledge that our results may not generalize to cases where the difference in the means of the evaluations is more substantial.

Furthermore, based on this earlier evidence, we hypothesize the following treatment interactions:

Hypothesis 2: The predictive power of the evaluation metrics will be diminished when gender is known, compared to when gender is not known.

Hypothesis 3: When gender is known, we expect that, conditional on the mean, the other metrics will have a less positive (or more negative) impact on hiring decisions for women than for men.

Hypothesis 4: The benefit (cost) from having a higher (lower) mean and a lower (higher) variance will be amplified for men (women) in mixed-gender pairs relative to when gender is not known.
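To make these hypotheses concrete, below is a minimal sketch of the kind of choice model they imply, written in Python. This is an illustration under our own assumptions: the registration does not specify an estimating equation, and the variable names (mean_diff, var_diff, max_diff, gender_known) and the simulated data are placeholders rather than the authors' specification.

```python
# Illustrative sketch only: the registration does not specify an estimating
# equation. Variable names and simulated data are assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 750  # hypothetical number of pair-level observations per treatment

# Pair-level data; "diff" variables are applicant A minus applicant B.
df = pd.DataFrame({
    "mean_diff": rng.normal(0, 2, 2 * n),   # difference in mean evaluation
    "var_diff": rng.normal(0, 5, 2 * n),    # difference in evaluation variance
    "max_diff": rng.normal(0, 5, 2 * n),    # difference in maximum evaluation
    "gender_known": np.repeat([0, 1], n),   # treatment indicator
})
df["chose_a"] = rng.integers(0, 2, 2 * n)   # placeholder: 1 if applicant A chosen

# Hypotheses 1a/1b: metrics beyond the mean predict the hiring choice.
# Hypothesis 2: their predictive power is weaker when gender is known
# (captured by the interaction terms with gender_known).
model = smf.logit(
    "chose_a ~ (mean_diff + var_diff + max_diff) * gender_known", data=df
).fit()
print(model.summary())
```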

Secondary Outcomes

Secondary Outcomes (end points)
To understand possible mechanisms, we elicit the following secondary outcome variables:
- Time spent: as a proxy for attention, we will measure how long evaluators spend on each pair. Greater time spent is interpreted as greater attention devoted to making the right decision.
- Gender: we will measure the likelihood that the chosen applicant is female, given whether gender is known and the gender composition of the pair.
- Quality: we will measure the relative quality of the chosen applicant, as measured by their qualifications, the average of all evaluation scores they received, and the AI-generated scores assigned to each applicant.
Secondary Outcomes (explanation)
See above

Experimental Design

Experimental Design
Our design aims to measure the impact of the distribution of applicants’ evaluations on hiring decisions.
Experimental Design Details
In this study, we will provide expert subjects, called evaluators, with information about applicants collected in Avery et al. (2023). They will be told, truthfully, that their decisions will help us decide to whom to offer the position.

Evaluators will be recruited to perform a freelance hiring activity. As this is a natural field experiment, they will not be told that they are in an experiment. Each evaluator will be randomized into one of two treatments: Gender-Known or Gender-Unknown. Each evaluator will receive information about the job and the recruitment process and will then be shown a series of three pairs of applicants. Two of these pairs will be mixed-gender, while in one pair both applicants will be male. For each pair, the evaluator's task is to choose who should be considered for the position. For each applicant, they will be provided with the following information: the applicant’s years of experience, their education (whether they hold at least a university degree), where they learned coding, which coding languages they know, their answers to four interview questions, and three evaluations provided by other evaluators as part of Avery et al. (2023). In addition to the above information, evaluators in the Gender-Known treatment will also be provided with the first name and last initial of the applicants. The ordering of the applicants on the screen will be determined randomly. The exact same pairs will be shown in the Gender-Known and Gender-Unknown treatments.

In a given pair, the evaluation scores of the applicants, referred to as the set of evaluations, may vary across evaluators. For example, in evaluation set 1, Applicant 1 could have scores of 10, 20, and 30, while Applicant 2 could have scores of 20, 80, and 100. In evaluation set 2, the scores could be different, such as Applicant 1: 80, 20, and 30, and Applicant 2: 70, 80, and 50. This design enables us to examine how evaluators rate the same applicant when the evaluation metrics, such as variance, differ. All evaluations shown will be real evaluations given to that applicant; we exploit the fact that each applicant received at least three evaluations, which allows us to vary the evaluation scores shown to each evaluator.
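Below is a small sketch, in Python, of the distributional metrics this design varies, computed for the example evaluation sets above. The function name and metric set are ours; the skew and outlier measures mentioned in Hypothesis 1a are omitted for brevity.

```python
# Distributional metrics of the example evaluation sets described above.
from statistics import mean, pvariance

def metrics(scores):
    return {
        "mean": round(mean(scores), 2),
        "variance": round(pvariance(scores), 2),  # population variance of the 3 scores
        "min": min(scores),
        "max": max(scores),
        "range": max(scores) - min(scores),
    }

# Evaluation set 1
print(metrics([10, 20, 30]))   # Applicant 1
print(metrics([20, 80, 100]))  # Applicant 2
# Evaluation set 2
print(metrics([80, 20, 30]))   # Applicant 1
print(metrics([70, 80, 50]))   # Applicant 2
```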

For each pair of applicants, evaluators will choose which applicant they recommend to be hired. Then, they will be presented with the applicants they chose from the three pairs and asked which of those three they would recommend most. While these data will not be used in the primary analysis, this step justifies the presentation of three pairs rather than one.

To select the pairs of applicants, we use the following procedure:
• Starting with the sample of applicants from Avery et al. (2023), we first drop individuals with non-typical Western-sounding names, those with gender-neutral names, and those with fewer than three evaluations.
• We also drop applicants whose answers to the interview questions were unusually short or unusually long relative to the average.
• We then focus on applicants whose average evaluation score places them in the top part of the distribution, as this is the group from which the selected applicant is likely to come.
• We then create two groups of pairs: exact-match pairs, where the CVs of the two applicants within the pair are identical or very similar; and trade-off pairs, where the CVs of the two applicants within the pair are close in quality but differ in areas of relative strength (e.g., one applicant might have more education, while the other has more years of experience).
• For each type of pair, we have one male-male pair; the remainder are mixed-gender pairs.
• For each pair, we create three sets of evaluation scores. We aim to select sets of scores with a low mean difference between the two applicants but, where possible, with variation in the variance. We also aim for a low correlation between variance and mean (a sketch of these selection checks appears after this list).
• In total, we selected 10 pairs and 3 evaluation sets per pair.
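A minimal sketch, in Python, of the kind of within-pair checks this selection procedure implies. The helper names, the example scores, and in particular the operationalization of dominance (comparing sorted scores element-wise) are our assumptions for illustration, not the authors' actual procedure.

```python
# Illustrative within-pair checks: small mean difference, no (weak) dominance,
# and variation in variance. Names, thresholds, and example scores are assumptions.
from statistics import mean, pvariance

def mean_difference(a, b):
    return abs(mean(a) - mean(b))

def weakly_dominates(a, b):
    # One simple operationalization: compare the sorted scores element-wise.
    return all(x >= y for x, y in zip(sorted(a), sorted(b)))

def retains_trade_off(a, b):
    # Neither applicant should weakly (and hence strictly) dominate the other.
    return not weakly_dominates(a, b) and not weakly_dominates(b, a)

a = [55, 70, 85]  # hypothetical scores for applicant A
b = [60, 78, 68]  # hypothetical scores for applicant B

print(mean_difference(a, b))        # about 1.33 (the study's largest is 5.33)
print(retains_trade_off(a, b))      # True: neither applicant dominates the other
print(pvariance(a), pvariance(b))   # the variance differs across the two applicants
```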

In order not to disadvantage the subjects who were not chosen to appear in the pairs, we will use a system similar to that of Kessler et al. (2019): the decisions made by our evaluators over the sample pairs will be used to identify the applicants in the full sample whom the evaluators would have chosen. Applicants who are predicted to be selected will then be invited for a further interview and, from there, potentially hired.
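A rough sketch, in Python, of what such a Kessler et al. (2019)-style step could look like: fit a simple choice model on the evaluators' decisions over the sample pairs and use it to score applicants in the full pool. The model form, feature set, and simulated data are our assumptions rather than the authors' actual procedure.

```python
# Sketch: fit a choice model on evaluators' decisions over the sample pairs and
# score the full applicant pool, in the spirit of Kessler, Low, and Sullivan (2019).
# Features and data are simulated placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical applicant features (e.g., years of experience, degree, mean evaluation).
X_shown = rng.normal(size=(1500, 3))       # applicants shown to evaluators in pairs
y_chosen = rng.integers(0, 2, size=1500)   # 1 if that applicant was chosen

model = LogisticRegression().fit(X_shown, y_chosen)

# Score every applicant in the full pool and flag the top-ranked ones for interviews.
X_pool = rng.normal(size=(200, 3))
scores = model.predict_proba(X_pool)[:, 1]
top_applicants = np.argsort(scores)[::-1][:10]  # indices of the 10 highest-scoring
print(top_applicants)
```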

Our sample will be drawn from MTurk, Prolific, and, if viable, Upwork. We plan to collect 45% of our sample from MTurk, 45% from Prolific, and 10% from Upwork, although the latter depends on our ability to recruit viable evaluators.

References
Kessler, Judd B., Corinne Low, and Colin D. Sullivan. "Incentivized resume rating: Eliciting employer preferences without deception." American Economic Review 109, no. 11 (2019): 3713-3744.
Randomization Method
Randomization will be carried out by a computer.
Randomization Unit
There are multiple forms of randomization:
1) Randomization into the Gender-Known and Gender-Unknown treatments is across subjects, and thus at the individual level.
2) We randomize which pairs are shown to each evaluator; this randomization takes place at the individual evaluator level.
3) For each evaluator, we also randomize which set of evaluation scores is shown. While the evaluation sets are pre-determined, which evaluation set is shown to each individual is randomized at the individual evaluator level (a sketch of this randomization appears below).
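A minimal sketch, in Python, of what this individual-level randomization could look like. The pair identifiers, the split into male-male and mixed-gender pairs, and the function name are placeholders, not the study's actual implementation.

```python
# Sketch of individual-level randomization: treatment assignment, which pairs an
# evaluator sees, and which pre-determined evaluation set is shown for each pair.
# Identifiers and structure are placeholders.
import random

MALE_MALE_PAIRS = ["pair_MM_1", "pair_MM_2"]           # placeholder male-male pairs
MIXED_PAIRS = [f"pair_MX_{i}" for i in range(1, 9)]    # placeholder mixed-gender pairs
EVAL_SETS = [0, 1, 2]                                  # 3 pre-set evaluation sets per pair

def assign_evaluator(seed=None):
    rng = random.Random(seed)
    treatment = rng.choice(["gender_known", "gender_unknown"])  # across-subject treatment
    # Each evaluator sees one male-male pair and two mixed-gender pairs.
    shown_pairs = [rng.choice(MALE_MALE_PAIRS)] + rng.sample(MIXED_PAIRS, 2)
    rng.shuffle(shown_pairs)
    # For each pair, one of its pre-determined evaluation sets is drawn at random.
    shown_sets = {pair: rng.choice(EVAL_SETS) for pair in shown_pairs}
    return {"treatment": treatment, "pairs": shown_pairs, "eval_sets": shown_sets}

print(assign_evaluator(seed=42))
```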
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
See above; in most cases randomization occurs at the individual level, so we cluster at the individual level.
Sample size: planned number of observations
Each evaluator is shown 3 pairs. We have 10 pairs of applicants and 3 evaluation sets per pair, so there are 30 possible pair-evaluation-set combinations (10*3) that could be shown to evaluators. If we collect 500 evaluators who each see 3 pairs, we have at most 1500 pair-level observations; dividing by 2 for the Gender-Known and Gender-Unknown treatments gives 750 observations per treatment. This means we have 25 observations for each possible combination within a treatment (750/30 = 25). The total number of pair-level observations will be 1500.
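The arithmetic above, spelled out in a short Python snippet (the 500-evaluator figure is taken from the text above):

```python
# Observation counts as described above.
evaluators = 500
pairs_per_evaluator = 3
pair_eval_set_combinations = 10 * 3                   # 10 pairs x 3 evaluation sets

observations = evaluators * pairs_per_evaluator       # 1500 pair-level observations
per_treatment = observations // 2                     # 750 per treatment arm
per_combination = per_treatment // pair_eval_set_combinations  # 25 per combination
print(observations, per_treatment, per_combination)   # 1500 750 25
```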
Sample size (or number of clusters) by treatment arms
Gender Known: 750
Gender Unknown: 750
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Assuming that the proportion of the study population with a value of 1 for the binary outcome in the absence of treatment is 0.50, and conservatively treating each evaluator as a single observation, we have an MDE of 0.11 for our main outcome, “Choose to hire”.
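For reference, a standard two-sample proportion MDE calculation of the kind this statement implies is sketched below. The per-arm sample size, significance level, and power are our assumptions (they are not stated in the registration), so the output is illustrative and will not necessarily reproduce the reported 0.11.

```python
# Illustrative two-sample proportion MDE calculation. n_per_arm, alpha, and power
# are assumptions not stated in the registration, so this need not reproduce 0.11.
from scipy.stats import norm

p0 = 0.50          # baseline proportion for the binary "Choose to hire" outcome
n_per_arm = 250    # e.g., 500 evaluators split evenly across two treatments (assumption)
alpha, power = 0.05, 0.80

z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
mde = z * (2 * p0 * (1 - p0) / n_per_arm) ** 0.5
print(round(mde, 3))
```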
IRB

Institutional Review Boards (IRBs)

IRB Name
Monash University Ethics Committee
IRB Approval Date
2018-09-18
IRB Approval Number
14985

Post-Trial

Post Trial Information

Study Withdrawal

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials