Understanding the Implications of AI in the Job Application Process.

Last registered on January 07, 2024

Pre-Trial

Trial Information

General Information

Title
Understanding the Implications of AI in the Job Application Process.
RCT ID
AEARCTR-0008296
Initial registration date
September 27, 2021

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
September 30, 2021, 11:00 PM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Last updated
January 07, 2024, 10:42 PM EST

Last updated is the most recent time when changes to the trial's registration were published.

Locations

Region

Primary Investigator

Affiliation
Gothenburg University

Other Primary Investigator(s)

PI Affiliation
PI Affiliation
PI Affiliation

Additional Trial Information

Status
Completed
Start date
2021-09-29
End date
2021-10-30
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
In this project, we study whether candidates for a real job behave differently when they are informed their application will be evaluated by AI relative to a human evaluator. We also study the differences in the characteristics of candidates recommended by AI relative to a human evaluator and perceptions of AI recruitment tools. Finally, we attempt to understand any potential differences between AI and human evaluations.
External Link(s)

Registration Citation

Citation
Adamovic, Mladen et al. 2024. "Understanding the Implications of AI in the Job Application Process." AEA RCT Registry. January 07. https://doi.org/10.1257/rct.8296-1.2
Sponsors & Partners

There is information in this trial unavailable to the public.
Experimental Details

Interventions

Intervention(s)
In this project, we study whether candidates for a real job behave differently when their applications are evaluated by AI relative to a human evaluator.
Intervention Start Date
2021-09-29
Intervention End Date
2021-10-30

Primary Outcomes

Primary Outcomes (end points)
We collect the following primary outcomes:

- Proportion who start the candidate assessment: this is defined as the number of candidates who start the candidate assessment divided by the number who receive the candidate assessment. A candidate is considered to have received the assessment if they are sent an email with details on the assessment.

- Proportion who complete: this is defined as the number who complete the candidate assessment divided by the number who receive the candidate assessment. A candidate is considered to have received the assessment if they are sent an email with details on the assessment. A candidate is considered to have completed if they answer all questions.

- Overall assessment evaluation score: this is made up of two parts. (1) Overall AI evaluation: the AI will evaluate all candidates and score them on a scale from 0 to 100 (where 0 means low and 100 means high). (2) Overall human evaluation: our human evaluators will score all candidates on a scale from 0 to 100.

Note: Any metrics calculated by the AI algorithm, such as the candidate's overall score, are set by the tech firm we are collaborating with and are not influenced by the researchers. The algorithm will not change throughout the project.
Primary Outcomes (explanation)
Below we outline how we will use our primary outcomes and state the key hypotheses.
We split the outcomes into the supply and demand side:

Supply Side: To understand whether AI evaluation impacts the behaviour of candidates relative to human evaluation, we study the proportion who complete and the proportion who start the assessment. We generate the following hypotheses:

Hypothesis 1: Application/Completion rates are lower in the treatment where candidates are informed they will be evaluated by AI relative to humans.

Hypothesis 2: The female/male ratio in application rates is higher in the AI-info treatment than in the human-info treatment.

Demand Side: To understand the differences between the AI evaluation score and the human evaluation score we study the overall assessment evaluation score (described above). In particular, our primary hypothesis is the following:

Hypothesis 3: The gender difference in human evaluation scores is greater than the gender difference in AI evaluation scores if and only if human evaluators are aware of the applicant's gender.

To study this, we use the overall assessment evaluation score but keep the information treatments constant. In other words, we study the evaluation score by humans relative to AI within the “Informed AI evaluator treatment” and, separately, within the “Informed Human evaluator treatment”. We do this to avoid confounding behaviour driven by candidates' knowledge about the evaluator with evaluator effects. However, if we find, based on the supply-side results, that the AI evaluator treatment leads to changes in the proportion who complete or start (relative to the human treatment), we will focus only on the sample taken from the Informed Human evaluator treatment. We do this to avoid selection confounds.
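For illustration only (the registered analysis plan is not public), the following is a minimal sketch of how Hypothesis 1 could be tested with a two-sided two-proportion z-test. The counts and variable names (completed_ai, n_ai, completed_human, n_human) are hypothetical placeholders, not study data.

```python
# Minimal sketch: two-proportion z-test for Hypothesis 1 (completion rates).
# All counts below are hypothetical placeholders, not study data.
from statsmodels.stats.proportion import proportions_ztest

completed_ai, n_ai = 190, 360        # completions / invitations, Inform AI arm (hypothetical)
completed_human, n_human = 540, 840  # completions / invitations, Inform Human arm (hypothetical)

stat, pval = proportions_ztest(
    count=[completed_ai, completed_human],
    nobs=[n_ai, n_human],
    alternative="two-sided",
)
print(f"completion rate (AI):    {completed_ai / n_ai:.3f}")
print(f"completion rate (human): {completed_human / n_human:.3f}")
print(f"z = {stat:.2f}, p = {pval:.3f}")
```

The same test applies to the proportion who start, with start counts in place of completion counts.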

Secondary Outcomes

Secondary Outcomes (end points)
To understand possible mechanisms we elicit the following secondary outcome variables:

- Belief about bias: Candidates are asked to rate their expected level of bias when a human evaluates their assessment and also when an AI evaluates their assessment. In this analysis we compare these two responses.

- Belief about fairness: Candidates are asked to rate their expected and experienced level of fairness when a human evaluates their assessment and also when an AI evaluates their assessment. In this analysis we compare these two responses.

- Overall satisfaction with the assessment process: All candidates are asked to rate the assessment process on a scale between 1 and 10 (where 10 represents most satisfied).

- Beliefs about the value of the applicant in the organization: All candidates are asked to rate whether the application process made them feel valued by the company.

- Beliefs about the status of the applicant in the organization: All candidates are asked to rate whether they believe they will have high or low status within the company.

- Time to complete the questions: The candidate's completion time is recorded.

- Number of words: This outcome is the number of words written in the assessment.

- Second-order overall evaluation of candidates (human): Our human evaluators will also be asked to score candidates based on their expectation of the median human evaluator's score. The score will be on a scale between 0 and 100.

- We also ask human evaluators for the rationale behind their overall evaluation score. We use this to understand potential mechanisms.

Heterogeneity Analysis:
We will also use a number of variables to understand the influence of AI relative to humans on candidate diversity. Our main focus will be on gender, but as a secondary analysis, conditional on obtaining sufficient observations, we will also study ethnicity. To focus on diversity, we will interact the treatment variables with the candidate's gender and possibly ethnicity (separately), as in the illustrative sketch below. We will also include a variable indicating whether the evaluator works in HR or as a web designer.
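As an illustration of the heterogeneity analysis described above, the sketch below estimates a treatment-by-gender interaction by OLS with robust standard errors. The data frame and column names (score, ai_treatment, female) are assumptions for the example, not the study's actual codebook, and the data are synthetic.

```python
# Illustrative sketch: treatment-by-gender interaction on the evaluation score.
# Column names and synthetic data are assumptions for this example only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "ai_treatment": rng.integers(0, 2, n),  # 1 = Inform AI, 0 = Inform Human (hypothetical coding)
    "female": rng.integers(0, 2, n),        # 1 = female candidate (hypothetical coding)
})
df["score"] = 60 + 3 * df["ai_treatment"] - 2 * df["female"] + rng.normal(0, 10, n)

# OLS with main effects and the interaction term; HC1 robust standard errors.
model = smf.ols("score ~ ai_treatment * female", data=df).fit(cov_type="HC1")
print(model.summary().tables[1])
```

The coefficient on the interaction term captures whether the AI treatment effect differs by gender; an analogous specification with an ethnicity indicator would serve the secondary analysis.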

We also conduct a number of robustness tests:

- We use the double-lasso method to select possible control variables (a sketch of this procedure is shown after this list).

- To understand whether the gendered language of the assessment text influences the human and AI evaluation scores, we will use an AI algorithm to rate the probability that the assessment text was written by a male or a female.

- Experimenter demand effects may influence the behaviour of human evaluators (i.e., human evaluators may guess the aim of the project, or, because they are observed, may inflate the scores of minorities/females to avoid being perceived as behaving in a socially undesirable manner). To measure this, we ask human evaluators what they think the aim of the study is.

- As a robustness check for completion and application rates, we also track the proportion of people who open the email.
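As a rough illustration of the double-lasso (post-double-selection) step mentioned above, the sketch below selects controls via two lasso regressions and then runs OLS of the outcome on the treatment plus the union of selected controls. The synthetic data, the linear-probability lasso for the binary treatment, and all variable names are simplifying assumptions for the example.

```python
# Illustrative sketch of post-double-selection ("double lasso") control choice.
# Synthetic placeholder data; a logistic lasso could replace the linear one for the binary treatment.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n, p = 800, 30
X = rng.normal(size=(n, p))                            # candidate control variables
d = (X[:, 0] + rng.normal(size=n) > 0).astype(float)   # treatment related to X[:, 0]
y = 0.5 * d + X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)

# Step 1: lasso of the outcome on the controls.
sel_y = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)
# Step 2: lasso of the treatment on the controls.
sel_d = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)
# Step 3: OLS of the outcome on the treatment plus the union of selected controls.
controls = sorted(set(sel_y) | set(sel_d))
ols = sm.OLS(y, sm.add_constant(np.column_stack([d, X[:, controls]]))).fit()
print("selected controls:", controls)
print("treatment coefficient:", round(float(ols.params[1]), 3))
```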

Secondary Outcomes (explanation)
See above

Experimental Design

Experimental Design
Our design aims to measure the impact of AI relative to human evaluation of job candidates.
Experimental Design Details
The design consists of two stages.

In stage 1, we will post a job ad for a real temporary web designer position across the United States on several job portals (e.g., joinhandshake.com, dice.com, indeed.com). To apply, job applicants must send their CV and fill out a short survey (e.g., years of programming experience, demographics). Applicants must reside in the United States. After applications close, we will invite all applicants to take part in an assessment. The assessment involves responding to 4 text-based questions. The questions are mainly “behavioural interview questions” that generally ask candidates to provide an example from their personal experience of when they have demonstrated a particular work-based trait (or behaviour). Candidates are expected to write between 50 and 150 words per question. Prior to taking part in the assessment, candidates are randomly assigned to one of our treatments (described above).

In stage 2, our human evaluators will evaluate the assessments of all candidates. We will use Qualtrics or another data collection company to recruit the human evaluators. Our sample of human evaluators consists of 1) people who work as web designers and who are responsible for hiring, and 2) people who work in recruitment (including recruiting web developers). Each evaluator will be shown the responses to the assessment questions of several candidates (this number will be updated prior to the commencement of stage 2, see Appendix 1). The evaluators are given a brief description of the context and their task. For each candidate, they are shown the responses to the assessment questions and information taken from the candidate's CV (education, first name, etc.). Evaluators must then rate each candidate on a scale between 0 and 100 (where 100 is high). To incentivise evaluators, they are told (correctly) that their evaluation score will be used when deciding whom to hire. As a secondary measure, we also elicit second-order beliefs: evaluators are asked to score candidates based on their expectation of the median human evaluator's score. Evaluators are paid a bonus based on the median rating of the other human evaluators using a binarized scoring rule.
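The registration does not spell out the parameters of the binarized scoring rule used for the second-order belief bonus, so the following is only an illustrative sketch in the spirit of that rule: the bonus is paid with probability one minus the squared (normalized) distance between the evaluator's report and the realized median rating, which makes truthful reporting optimal regardless of risk preferences. The bonus amount and normalization below are assumptions.

```python
# Illustrative sketch of a binarized scoring rule; the bonus amount and the
# 0-100 normalization are assumptions, not the study's actual parameters.
import random

def binarized_bonus(report, realized_median, bonus=1.0, scale=100.0, rng=random):
    """Pay `bonus` with probability 1 - ((report - realized_median) / scale) ** 2."""
    win_prob = max(0.0, 1.0 - ((report - realized_median) / scale) ** 2)
    return bonus if rng.random() < win_prob else 0.0

# Example: an evaluator reports 72 and the realized median rating is 65.
print(binarized_bonus(report=72, realized_median=65))
```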

After evaluating the candidates, all evaluators complete a short survey. The survey collects additional information related to the research (e.g., whether they think women/ethnic groups would perform worse on these kinds of assessments, job experience, demographics, etc.). This survey will be used to help understand why differences between AI and human evaluators may exist.

Both the AI and the human evaluators will evaluate all candidates. All metrics calculated by the AI algorithm, such as the candidate's overall score, are set by the tech firm we are collaborating with and are not influenced by the researchers. The algorithm will not change throughout the project.

We will also conduct additional treatments on the side of the evaluator to help understand behaviour and possibly to test solutions. However, these will be added in Appendix 2 after the first stage of the experiment.

Randomization Method
Randomization will be carried out by a computer.
Randomization Unit
The randomization unit will be the individual for all treatments.
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
The number of clusters will be equal to the number of legitimate candidates. Based on previous experience we expect to have 1200 legitimate job applicants, but this will likely vary depending on a number of factors. A legitimate candidate is someone who resides in the United States and completes the initial application form. We plan to assign 30% of the sample to the Inform AI treatment and 70% to the Inform Human evaluation treatment. Further, we will stratify by gender such that there is the same proportion of men/women in each information treatment.

For stage 2, we plan to recruit enough human evaluators so that each candidate is assessed by several human evaluators. We will decide on the exact number of potential human evaluators based on the number of candidates in stage 1. The updated number can be found in Appendix 1.
Sample size: planned number of observations
For the applicant sample (stage 1), the number of observations is the same as the number of clusters. We hope to have up to 1200 observations. For stage 2, the AI will evaluate all candidates. The candidate pool will also be evaluated by human evaluators; each candidate will be evaluated by several human evaluators. The number of human evaluators and the number of candidates assessed by each evaluator will be added to the pre-analysis plan before the start of stage 2 (see Appendix 1).
Sample size (or number of clusters) by treatment arms
See above. We plan to assign 30% of the sample to the Inform AI treatment and 70% to the Inform Human evaluation treatment. Further, we will stratify our sample by gender such that 70% of the overall male and female sample will be assigned to the Inform Human evaluation treatment and 30% to the Inform AI treatment.
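For illustration, a minimal sketch of the planned assignment (30% Inform AI, 70% Inform Human, stratified by gender) is below. The data frame, column names, and applicant counts are hypothetical.

```python
# Illustrative sketch: 30/70 treatment assignment stratified by gender.
# The applicant data and column names are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
applicants = pd.DataFrame({
    "id": range(1200),
    "gender": rng.choice(["female", "male"], size=1200),
})

def assign_within_stratum(group, share_ai=0.30):
    """Randomly assign 30% of a gender stratum to the Inform AI arm."""
    n = len(group)
    n_ai = int(round(share_ai * n))
    arms = np.array(["inform_ai"] * n_ai + ["inform_human"] * (n - n_ai))
    return pd.Series(rng.permutation(arms), index=group.index)

applicants["treatment"] = (
    applicants.groupby("gender", group_keys=False).apply(assign_within_stratum)
)
print(applicants.groupby(["gender", "treatment"]).size())
```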
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
The MDEs for the primary outcomes are as follows. Proportion who complete and proportion who start: using a significance level of 0.05, power of 0.8, and a baseline of 0.6 for the proportion who complete/start in the absence of AI, we can detect a minimum effect size of 0.079. We will add the other MDEs before the start of stage 2, once we have confirmed the stage 2 sample size.
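The registration does not state the arm sizes behind the 0.079 figure. As an illustration only, the normal-approximation sketch below reproduces an MDE of roughly 0.079 for a difference in proportions when assuming two arms of 600 candidates each; with other arm sizes the same formula gives a different MDE.

```python
# Illustrative MDE sketch (normal approximation, difference in proportions).
# The 600-per-arm sizes are an assumption for this example only.
from scipy.stats import norm

alpha, power = 0.05, 0.80
p0 = 0.60            # baseline proportion who complete/start
n1, n2 = 600, 600    # assumed arm sizes

z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
mde = z * (p0 * (1 - p0) * (1 / n1 + 1 / n2)) ** 0.5
print(f"MDE = {mde:.3f}")   # about 0.079 under these assumptions
```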
IRB

Institutional Review Boards (IRBs)

IRB Name
Monash University
IRB Approval Date
2021-07-16
IRB Approval Number
14985
Analysis Plan

There is information in this trial unavailable to the public.

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
Yes
Intervention Completion Date
October 30, 2021, 12:00 +00:00
Data Collection Complete
Yes
Data Collection Completion Date
October 30, 2021, 12:00 +00:00
Final Sample Size: Number of Clusters (Unit of Randomization)
Was attrition correlated with treatment status?
Final Sample Size: Total Number of Observations
Final Sample Size (or Number of Clusters) by Treatment Arms
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials