The impact of AI and Debiasing on job applicants

Last registered on November 30, 2022

Pre-Trial

Trial Information

General Information

Title
The impact of AI and Debiasing on job applicants
RCT ID
AEARCTR-0010470
Initial registration date
November 23, 2022

First published
November 30, 2022, 3:27 PM EST

Locations

Region

Primary Investigator

Affiliation
Gothenburg University

Other Primary Investigator(s)

PI Affiliation
University of Exeter
PI Affiliation
Gothenburg University
PI Affiliation
University of Exeter
PI Affiliation
University of Exeter

Additional Trial Information

Status
In development
Start date
2022-11-24
End date
2022-12-01
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
In this project, we study how the type of evaluator (AI or human) and its nature (biased or debiased) affect applicants' application decisions and the quality of applicants.
External Link(s)

Registration Citation

Citation
Awad, Edmond et al. 2022. "The impact of AI and Debiasing on job applicants." AEA RCT Registry. November 30. https://doi.org/10.1257/rct.10470-1.0
Experimental Details

Interventions

Intervention(s)
We vary the type of evaluator (human or AI), the nature of the evaluator (biased or debiased), and the nature of the job (competitiveness).
Intervention Start Date
2022-11-24
Intervention End Date
2022-11-30

Primary Outcomes

Primary Outcomes (end points)
We collect the following primary outcomes:
- Application decisions

In particular, we are interested in how decisions differ by:
- Confidence of the applicants
- Skills of the applicants
- Gender
- Treatment
Primary Outcomes (explanation)
- The application decision is a binary choice between different jobs.
- Confidence is measured on a scale of 0 to 100, where 100 means believing one is better than all other applicants, 50 means better than 50% of applicants, and so on.
- Skills are measured using a quantitative reasoning task (the total number of correct responses in 4 minutes).
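The heterogeneity questions above (how decisions differ by skill, confidence, gender and treatment) amount to comparing choice shares across cells. The sketch below is illustrative only and assumes a hypothetical long-format table with one row per participant and condition, plus an invented coding `chose_a = 1` when the participant picks "job A". The registration specifies neither the data layout nor the estimator.

```python
from collections import defaultdict
from statistics import median

# Hypothetical long-format data: one row per participant and condition.
# chose_a = 1 if the participant picked job A in that choice (illustrative coding).
records = [
    {"pid": 1, "skill": 5,  "condition": "C1", "chose_a": 1},
    {"pid": 1, "skill": 5,  "condition": "C2", "chose_a": 0},
    {"pid": 2, "skill": 10, "condition": "C1", "chose_a": 1},
    {"pid": 2, "skill": 10, "condition": "C2", "chose_a": 1},
    {"pid": 3, "skill": 15, "condition": "C1", "chose_a": 0},
    {"pid": 3, "skill": 15, "condition": "C2", "chose_a": 0},
    {"pid": 4, "skill": 20, "condition": "C1", "chose_a": 1},
    {"pid": 4, "skill": 20, "condition": "C2", "chose_a": 0},
]

def median_split(rows, var):
    """Label each participant 'above' or 'at_or_below' the sample median of var."""
    per_person = {r["pid"]: r[var] for r in rows}  # one value per participant
    m = median(per_person.values())
    return {pid: "above" if v > m else "at_or_below" for pid, v in per_person.items()}

def choice_shares(rows, var):
    """Share choosing job A, by (condition, median-split group of var)."""
    group = median_split(rows, var)
    tally = defaultdict(lambda: [0, 0])  # key -> [number choosing A, number of choices]
    for r in rows:
        key = (r["condition"], group[r["pid"]])
        tally[key][0] += r["chose_a"]
        tally[key][1] += 1
    return {key: a / n for key, (a, n) in tally.items()}

print(choice_shares(records, "skill"))
```

The same call with a confidence column gives the confidence split. Gender would be grouped directly rather than split at a median.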

Analysis:
To study the research questions, we will compare across treatments. In particular, for each experiment we will do the following:
Experiment 1:
In addition to the choices within each treatment, we are also interested in studying the choices across treatments. We study:

• C1 vs C2: human vs debiased human
• C3 vs C4: AI vs debiased AI
• C1 vs C3: human vs AI
• C2 vs C4: debiased human vs debiased AI
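Because every participant makes all four Experiment 1 choices, each contrast can be computed as a within-person difference. Below is a minimal sketch that assumes a paired mean-difference estimator, which the registration does not specify. The condition labels (C1 human, C2 debiased human, C3 AI, C4 debiased AI) are inferred from the contrast list and are not stated explicitly. The choice data are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Labels inferred from the contrast list (an assumption, not stated in the registration).
EXP1_LABELS = {"C1": "human", "C2": "debiased human", "C3": "AI", "C4": "debiased AI"}
CONTRASTS = [("C1", "C2"), ("C3", "C4"), ("C1", "C3"), ("C2", "C4")]

def paired_contrast(choices, a, b):
    """choices: pid -> {condition: 0/1 outcome}.
    Returns (mean of a minus b across participants, standard error of that mean)."""
    diffs = [c[a] - c[b] for c in choices.values() if a in c and b in c]
    return mean(diffs), stdev(diffs) / sqrt(len(diffs))

# Invented example data for four participants.
example = {
    1: {"C1": 1, "C2": 0, "C3": 1, "C4": 1},
    2: {"C1": 1, "C2": 1, "C3": 0, "C4": 0},
    3: {"C1": 0, "C2": 0, "C3": 0, "C4": 1},
    4: {"C1": 1, "C2": 0, "C3": 1, "C4": 0},
}

for a, b in CONTRASTS:
    d, se = paired_contrast(example, a, b)
    print(f"{EXP1_LABELS[a]} vs {EXP1_LABELS[b]}: diff={d:.2f}, se={se:.3f}")
```

Differencing within person removes stable individual tastes from each comparison, which is the usual motivation for analysing a within-subject design this way.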

Experiment 2:

In addition to the choices within each treatment, we are also interested in examining choices across treatments. In particular, we are interested in comparing:

• C1 vs C2 vs C3: this allows us to study the status quo (human) against AI, debiased AI, and debiased human
• C1 vs C4 vs C6: the status quo (AI) against human, debiased AI, and debiased human
• C3 vs C4 vs C5: the status quo (debiased human) against human, debiased AI, and AI
• C2 vs C5 vs C6: the status quo (debiased AI) against human, debiased human, and AI
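The registration does not say which two evaluators each Experiment 2 condition offers. However, six choices over four evaluators match the six possible pairs, and exactly one assignment of pairs to C1 to C6 reproduces all four groupings listed above. The sketch below encodes that inferred mapping as a hypothetical `EXP2_CONDITIONS` and checks it. Treat the mapping as an assumption.

```python
from itertools import combinations

EVALUATORS = ["human", "AI", "debiased human", "debiased AI"]

# Hypothetical mapping, inferred so that it reproduces the four groupings above.
EXP2_CONDITIONS = {
    "C1": frozenset({"human", "AI"}),
    "C2": frozenset({"human", "debiased AI"}),
    "C3": frozenset({"human", "debiased human"}),
    "C4": frozenset({"AI", "debiased human"}),
    "C5": frozenset({"debiased human", "debiased AI"}),
    "C6": frozenset({"AI", "debiased AI"}),
}

def conditions_for(status_quo):
    """Conditions in which `status_quo` is one of the two evaluators on offer."""
    return sorted(c for c, pair in EXP2_CONDITIONS.items() if status_quo in pair)

# Every pair of evaluators appears exactly once across the six conditions.
all_pairs = {frozenset(p) for p in combinations(EVALUATORS, 2)}
assert set(EXP2_CONDITIONS.values()) == all_pairs

for sq in EVALUATORS:
    print(sq, conditions_for(sq))
```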

Secondary Outcomes

Secondary Outcomes (end points)
We are interested in a number of additional explanatory variables:

We study demographic variables and the quality of the resume collected as part of the experiment (self-reported).

We study possible mechanisms, including beliefs about how good each evaluator (human evaluators, human evaluators trained on equal opportunities for all genders, AI, and debiased AI) is at selecting the applicants with the best quantitative skills and at ensuring gender diversity.
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
In this experiment, we ask online subjects to apply for “jobs” with different types of evaluators.
Experimental Design Details
In this project, we use an online experiment to study how using evaluation algorithms rather than human evaluators affects applicants' application decisions and choices, and how different evaluators affect the quality and gender composition of applicants. Through a controlled experiment, we also examine the impact of debiasing human evaluators or algorithms on applicant decisions and on the quality of applicants of different genders. This experiment contributes to the literature on diversity and inclusion, the ethics of artificial intelligence, and human-computer interaction. More importantly, it offers practical insights for practitioners on how best to use technology and diversity and inclusion training to attract the best applicants in an inclusive way.

To study this, we run an online economic experiment with a sample of Prolific subjects. In the experiment, subjects answer a range of questions about their employment and skills, which we use to generate a resume for each individual. Subjects then take part in 2 experiments (in random order). In one experiment, subjects choose between jobs that differ in their attributes (reward amount, evaluation, competitiveness) under different evaluators (see above). In the other experiment, subjects choose between different evaluators for jobs whose other attributes are similar (see above). These experiments mimic real-life candidate shortlisting, in which candidates are typically screened and evaluated on this information by humans or AI. After the experiments, we measure subjects' quantitative skills and confidence. We also include survey questions, comprehension questions to check understanding, and attention checks.
Randomization Method
Randomization will be carried out by a computer.
Randomization Unit
The randomization unit will be the individual for all treatments.
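Randomizing at the individual level here means each participant sees both experiments, and all conditions within them, in their own random order. The sketch below is one way a computer could draw that order. The per-participant seed (so that a draw can be reproduced) and the names `EXPERIMENTS` and `assign_order` are assumptions for illustration, not the study's actual software.

```python
import random

# Conditions per experiment, as described in this registration (4 + 6 choices).
EXPERIMENTS = {
    "experiment_1": ["C1", "C2", "C3", "C4"],
    "experiment_2": ["C1", "C2", "C3", "C4", "C5", "C6"],
}

def assign_order(participant_id, base_seed=2022):
    """Randomized sequence of (experiment, condition) pairs for one participant.
    Seeding per participant makes the draw reproducible (an assumption)."""
    rng = random.Random(f"{base_seed}-{participant_id}")
    exp_order = list(EXPERIMENTS)
    rng.shuffle(exp_order)                 # random order of the two experiments
    sequence = []
    for exp in exp_order:
        conds = EXPERIMENTS[exp][:]        # copy so the template list is untouched
        rng.shuffle(conds)                 # random order of conditions within it
        sequence.extend((exp, c) for c in conds)
    return sequence

print(assign_order(7))
```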
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
Since randomization occurs at the individual level, we cluster at the level of the individual. We will collect data from 1000 individuals.
Sample size: planned number of observations
We plan to collect data from 1000 individuals. We use a within-subject design with 2 experiments: the first collects 4 choices and the second 6 choices, so we will have 10 choices in total for each individual (10,000 choices across the full sample). Exclusion: participants are excluded if they fail two attention checks or complete the survey in less than a third of the median completion time from the pilot launch. Participants are also excluded if they provide nonsense responses, in particular to the open-ended questions asking them to explain their choices.
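The exclusion rules and the observation count can be written as a simple filter. This is a sketch under stated assumptions. "Fail two attention checks" is read as two or more. The nonsense-response flag is a hypothetical field assumed to come from manual review of the open-ended answers. The pilot median time is an input the registration does not report.

```python
from dataclasses import dataclass

CHOICES_PER_PERSON = 4 + 6  # Experiment 1 + Experiment 2

@dataclass
class Participant:
    pid: int
    failed_attention_checks: int
    duration_sec: float
    flagged_nonsense: bool  # hypothetical: set by manual review of open-ended answers

def keep(p, pilot_median_sec):
    """Apply the three exclusion rules; True means the participant is retained."""
    too_fast = p.duration_sec < pilot_median_sec / 3
    return p.failed_attention_checks < 2 and not too_fast and not p.flagged_nonsense

def analysis_sample(participants, pilot_median_sec):
    """Retained participants and the number of choice observations they contribute."""
    kept = [p for p in participants if keep(p, pilot_median_sec)]
    return kept, len(kept) * CHOICES_PER_PERSON

# With no exclusions, 1000 individuals give 1000 * 10 = 10,000 choices.
print(1000 * CHOICES_PER_PERSON)
```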
Sample size (or number of clusters) by treatment arms
We use a within-subject design, meaning all subjects are assigned to all treatments (in random order). The sample size in each treatment arm is therefore the same as the number of individuals in the experiment.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
A sample of 1000 individuals gives a minimum detectable effect (MDE) of around 9 percentage points when we split our analyses by gender or by above/below-median skill or confidence level.
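The registration does not state how the 9 percentage point figure was computed. The sketch below shows one standard back-of-the-envelope calculation that lands close to it. It assumes a comparison of two independent groups of 500 (a 50/50 split of 1000), a binary outcome with a 50% baseline, a two-sided 5% test, and 80% power, using MDE = (z_{1-α/2} + z_{power}) · sqrt(2p(1-p)/n). These inputs are assumptions. A paired within-subject analysis would generally give a different figure.

```python
from math import sqrt
from statistics import NormalDist

def mde_two_proportions(n_per_group, p=0.5, alpha=0.05, power=0.80):
    """MDE for a difference in proportions between two independent equal-sized groups."""
    z = NormalDist()                       # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)     # two-sided critical value (about 1.96)
    z_power = z.inv_cdf(power)             # about 0.84 for 80% power
    se = sqrt(2 * p * (1 - p) / n_per_group)
    return (z_alpha + z_power) * se

mde = mde_two_proportions(500)
print(f"{mde:.3f}")  # 0.089, i.e. roughly 9 percentage points
```

A baseline of 50% maximises the outcome variance p(1 - p), so this is the conservative case for a binary choice.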
IRB

Institutional Review Boards (IRBs)

IRB Name
University of Exeter
IRB Approval Date
2022-11-17
IRB Approval Number
NA

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials