The impact of AI and Debiasing on job applicants

Last registered on January 07, 2024


Trial Information

General Information

The impact of AI and Debiasing on job applicants
Initial registration date
November 23, 2022


First published
November 30, 2022, 3:27 PM EST


Last updated
January 07, 2024, 10:16 PM EST




Primary Investigator

Gothenburg University

Other Primary Investigator(s)

PI Affiliation
University of Exeter
PI Affiliation
Gothenburg University
PI Affiliation
University of Exeter
PI Affiliation
University of Exeter

Additional Trial Information

Start date
End date
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
In this project, we study how different evaluators (AI or human) and their nature (biased or debiased) affect application decisions and the quality of applicants.
External Link(s)

Registration Citation

Awad, Edmond et al. 2024. "The impact of AI and Debiasing on job applicants." AEA RCT Registry. January 07.
Experimental Details


We vary the type of evaluator (human or AI), the nature of the evaluator (biased or debiased), and the nature of the job (competitiveness).
Intervention Start Date
Intervention End Date

Primary Outcomes

Primary Outcomes (end points)
We collect the following primary outcomes:
- Application decisions

In particular, we are interested in how decisions differ based on:
- Confidence of the applicants
- Skills of the applicants
Primary Outcomes (explanation)
- The application decision is the binary choice between different jobs.
- Confidence is measured on a scale of 0 to 100 (with 100 being better than all other applicants, 50 being better than 50% of all applicants, etc.).
- Skills are measured using a quantitative reasoning task (total number of correct responses in 4 minutes).

To study the research questions, we will compare across treatments. In particular, for each experiment we will do the following:
Experiment 1:
In addition to the choices within each treatment, we are also interested in studying the choices across treatments. We study:

• C1 vs C2: human vs debiased human
• C3 vs C4: debiased AI vs AI
• C1 vs C3: AI vs human
• C2 vs C4: debiased AI vs debiased human

Experiment 2:

In addition to the choices within each treatment, we are also interested in examining choices across treatments. In particular, we are interested in comparing:

• C1 vs C2 vs C3: this allows us to study the status quo (humans) vs AI vs debiased AI vs debiased human
• C1 vs C4 vs C6: status quo (AI) vs human vs debiased AI vs debiased human
• C3 vs C4 vs C5: status quo (debiased human) vs human, debiased AI and AI
• C2 vs C5 vs C6: status quo (debiased AI) vs human, debiased human and AI

Secondary Outcomes

Secondary Outcomes (end points)
We are interested in a number of additional explanatory variables:

We study demographic variables and the (self-reported) quality of the resume collected as part of the experiment.

We study possible mechanisms, including beliefs about how good each evaluator (i.e., human evaluators, human evaluators trained on equal opportunities for all genders, AI, and debiased AI) is at selecting applicants with the best quantitative skills and, similarly, at ensuring gender diversity.
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
In this experiment, we ask online subjects to apply for “jobs” with different types of evaluators.
Experimental Design Details
In this project we use an online experiment to study how the use of evaluation algorithms, compared to human evaluators, affects applicants' application decisions and choices, and how the use of different evaluators affects the quality and gender of applicants. Through a controlled experiment, we also examine the impact of debiasing human evaluators or algorithms on application decisions and the quality of applicants of different genders. This experiment contributes to the literature on diversity and inclusion, the ethics of artificial intelligence, and human-computer interaction. More importantly, it offers practical insights for practitioners on how to best use technology and diversity-and-inclusion training to attract the best applicants in an inclusive way.

To study this, we run an online economic experiment with a sample of Prolific subjects. In the experiment, subjects answer a range of questions about their employment and skills, which we use to generate a resume for each individual. Subjects then complete 2 experiments (in random order). In one experiment, subjects must choose between different jobs that differ in their attributes (reward amount, evaluation, competitiveness) under different evaluators (see above). In the other, subjects must choose between different evaluators for jobs that are otherwise similar in their attributes (see above). These experiments mimic real-life candidate shortlisting, in which applicants are typically evaluated and screened on this information by humans or AI. After the experiments, we measure subjects' quantitative skills and confidence. We also ask survey questions and comprehension questions to check understanding, and include attention checks.
Randomization Method
Randomization will be carried out by a computer.
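Since the design is within-subject with treatments seen in random order, the computer randomization amounts to independently shuffling the treatment order for each individual. A minimal sketch (the function name, treatment labels, and seed are illustrative assumptions, not details from the registration):

```python
import random

def assign_treatment_orders(participant_ids, treatments, seed=42):
    """Independently shuffle the treatment order for each individual.

    Within-subject design: every subject sees every treatment, so
    randomization only determines the order of presentation.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible assignment
    orders = {}
    for pid in participant_ids:
        order = list(treatments)
        rng.shuffle(order)
        orders[pid] = order
    return orders

orders = assign_treatment_orders(range(3), ["C1", "C2", "C3", "C4"])
```

Each participant's list is a permutation of the full treatment set, so assignment is at the individual level, consistent with the randomization unit below.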
Randomization Unit
The randomization unit will be the individual for all treatments.
Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters
Since randomization occurs at the individual level, we cluster at the level of the individual. We will collect data from 1000 individuals.
Sample size: planned number of observations
We plan to collect data from 1000 individuals. We utilize a within-subject design: since we have 2 experiments, the first collecting data from 4 choices and the second from 6, we will have 10 choices in total for each individual. Exclusions: participants are excluded if they fail two attention checks or complete the survey in less than a third of the median completion time from the pilot launch. Participants are also excluded for providing nonsense responses, in particular to the open-ended questions asking them to explain their choices.
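The quantitative exclusion rules above (two failed attention checks, or a completion time under one third of the pilot median) can be sketched as a simple filter; field names and sample values here are hypothetical, and the nonsense-response exclusion would remain a manual coding step:

```python
import statistics

def apply_exclusions(responses, pilot_durations):
    """Keep participants who pass the pre-registered quantitative rules:
    fewer than two failed attention checks, and a completion time of at
    least one third of the pilot's median completion time (seconds)."""
    cutoff = statistics.median(pilot_durations) / 3
    return [r for r in responses
            if r["attention_fails"] < 2 and r["duration"] >= cutoff]

pilot = [300, 360, 420]  # median = 360 s, so cutoff = 120 s
sample = [
    {"id": 1, "attention_fails": 0, "duration": 350},
    {"id": 2, "attention_fails": 2, "duration": 400},  # fails attention rule
    {"id": 3, "attention_fails": 1, "duration": 90},   # too fast
]
kept = apply_exclusions(sample, pilot)
```

With these illustrative values, only participant 1 survives both rules.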
Sample size (or number of clusters) by treatment arms
We utilize a within subject design, meaning all subjects are assigned to all treatments (in a random order). This means the total sample size is the same as the number of individuals in the experiment.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
A sample of 1000 observations gives us a minimum detectable effect of around 9 percentage points if we split our analyses by gender or by above/below-median skill or confidence level.
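The 9-percentage-point figure is consistent with a standard two-sample test of proportions on subgroups of about 500 each (e.g. splitting 1000 observations by gender). A sketch of that calculation, assuming a baseline proportion of 0.5, two-sided alpha of 0.05, and 80% power (none of which are stated in the registration):

```python
import math

def mde_two_proportions(n_per_group, p=0.5, alpha=0.05, power=0.80):
    """Minimum detectable difference in proportions for a two-sample
    test with equal group sizes, using the normal approximation."""
    z_alpha = 1.959964  # Phi^{-1}(1 - alpha/2) for alpha = 0.05
    z_beta = 0.841621   # Phi^{-1}(power) for power = 0.80
    se = math.sqrt(2 * p * (1 - p) / n_per_group)
    return (z_alpha + z_beta) * se

print(round(mde_two_proportions(500), 3))  # -> 0.089, i.e. about 9 pp
```

The MDE is most conservative at p = 0.5, since p(1 - p) is maximized there.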

Institutional Review Boards (IRBs)

IRB Name
University of Exeter
IRB Approval Date
IRB Approval Number


Post Trial Information

Study Withdrawal



Is the intervention completed?
Intervention Completion Date
December 01, 2022, 12:00 +00:00
Data Collection Complete
Data Collection Completion Date
December 01, 2022, 12:00 +00:00
Final Sample Size: Number of Clusters (Unit of Randomization)
Was attrition correlated with treatment status?
Final Sample Size: Total Number of Observations
Final Sample Size (or Number of Clusters) by Treatment Arms
Data Publication

Data Publication

Is public data available?

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials