Understanding How Information about AI affects Response to AI in the Recruitment Process

Last registered on October 23, 2023

Pre-Trial

Trial Information

General Information

Title
Understanding How Information about AI affects Response to AI in the Recruitment Process
RCT ID
AEARCTR-0012314
Initial registration date
October 17, 2023

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
October 23, 2023, 9:21 AM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Locations

Some information in this trial is not available to the public.

Primary Investigator

Affiliation
Monash University

Other Primary Investigator(s)

PI Affiliation
Monash University

Additional Trial Information

Status
In development
Start date
2023-10-23
End date
2024-10-23
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
In this project, we study how providing information about AI to candidates and evaluators for a real job affects their application and evaluation behavior.
External Link(s)

Registration Citation

Citation
Avery, Mallory and Andreas Leibbrandt. 2023. "Understanding How Information about AI affects Response to AI in the Recruitment Process." AEA RCT Registry. October 23. https://doi.org/10.1257/rct.12314-1.0
Experimental Details

Interventions

Intervention(s)
In this project, we study how providing information about AI to candidates and evaluators for a real job affects their application and evaluation behavior.
Intervention Start Date
2023-10-23
Intervention End Date
2024-10-23

Primary Outcomes

Primary Outcomes (end points)
We collect the following primary outcomes:

- Proportion who start the candidate assessment: This is defined as the number of candidates who start the candidate assessment divided by the number who receive the candidate assessment. A candidate is considered to have received the assessment if they are sent an email with details on the assessment.

- Proportion who complete the candidate assessment: This is defined as the number of candidates who complete the candidate assessment divided by the number who receive the candidate assessment. A candidate is considered to have received the assessment if they are sent an email with details on the assessment. A candidate is considered to have completed if they answer all questions.

- Overall assessment evaluation score: This is made up of two parts. Overall AI evaluation of candidates: the AI will evaluate all candidates and score them on a scale from 0 to 100 (where 0 means low and 100 means high). Overall human evaluation of candidates: our human evaluators will score all candidates on a scale from 0 to 100.

Note: Any metrics calculated by the AI algorithm, such as the candidate's overall score, are set by the tech firm we are collaborating with and are not influenced by the researchers. The algorithm will not change throughout the project.
Primary Outcomes (explanation)
In all of the following, we will focus primarily on the minority-majority gap (mmg), which we define as the gap in starting, completion, or score between a minority and majority group. These groups can be defined by gender, where women and non-binary people are the minority and men are the majority; by race, where underrepresented minorities are the minority and white and Asian people are the majority; or by both, where either a gender or race minority category means minority and the remainder are majority. We anticipate that the mmg will favor majorities in all of our measures.
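As a worked formulation of this definition, for an outcome y (starting, completion, or score), the mmg is the difference in group means. The sign convention below is an assumption consistent with the stated expectation that the mmg favors majorities; the registration does not state it explicitly:

$$ \mathrm{mmg}_y \;=\; \bar{y}_{\text{majority}} - \bar{y}_{\text{minority}} $$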

Below we outline the primary comparisons and what conclusions will be drawn from them:

1) AI-Explained vs. AI-Biased: by comparing these two groups, we will evaluate the impact that a disclaimer about the possible existence of bias in AI has on the mmg.

The mmg may be higher in AI-Biased due to greater concern from minorities about the presence of bias in the AI system; on the other hand, the mmg may be lower if the presence of a disclaimer makes minority candidates believe that transparency indicates something positive about the AI provider or employer, such as their goal to reduce disparities.

2) AI-Biased vs. AI-Debiased: by comparing these two groups, we will evaluate the impact that providing a statement asserting the unbiasedness of the AI, in the presence of knowledge that AI can be biased, has on the mmg.

3) AI-Biased vs. Human-Oversight: by comparing these two groups, we will evaluate the impact that providing a statement asserting human oversight, in the presence of knowledge that AI can be biased, has on the mmg.

For both 2 and 3 above, providing this additional information may lead to a smaller mmg by assuaging minorities' concerns about bias in the AI. However, this information may be ineffective, or even counterproductive, if it is perceived as a band-aid rather than a true intervention to curb disparities between majority and minority groups.

4) AI-Debiased vs. Human-Oversight: by comparing these two groups, we will evaluate the relative efficacy of providing statements of debiased AI vs. statements of human oversight in reducing the mmg.

Using this comparison, we will understand the relative efficacy of providing information about efforts to debias the AI versus having human oversight in final decision making. These are two proposed ideas for how to assuage concerns generated by required disclaimers of the potential bias in AI. Assertions of human oversight may be more effective if people believe that humans make better final decisions or are uncomfortable with AI making important decisions like hiring. On the other hand, minorities may be concerned about bias against them among human evaluators, making claims of human oversight less effective.

5) AI-Explained vs. AI-Debiased: by comparing these two groups, we will evaluate the efficacy of providing statements of debiased AI relative to the benchmark of no explanation of bias for the mmg.

6) AI-Explained vs. Human-Oversight: by comparing these two groups, we will evaluate the efficacy of providing statements of human oversight relative to the benchmark of no explanation of bias for the mmg.

For both 5 and 6, we will assess whether the impact of informing applicants about the potential bias in AI can be mitigated by information about efforts to undo that bias (in AI-Debiased) or information about human oversight in the decision-making process (in Human-Oversight).
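The registration does not publish the estimation approach (the experimental design details are not public). One natural way to operationalize a comparison of the mmg across two arms is a regression with a treatment-by-minority interaction; the sketch below is illustrative only, with hypothetical variable names and simulated data.

```python
# Hypothetical sketch: how a treatment shifts the minority-majority difference
# in an outcome such as completion. Variable names, arm pairing, and the
# simulated data are assumptions; this is not the study's actual analysis code.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1200
df = pd.DataFrame({
    "minority": rng.integers(0, 2, n),   # 1 = minority applicant
    "treated": rng.integers(0, 2, n),    # 1 = e.g. AI-Biased, 0 = e.g. AI-Explained
})
# Simulated completion indicator with a baseline gap and a treatment effect on that gap.
df["completed"] = (
    rng.random(n) < 0.6 - 0.05 * df["minority"] - 0.03 * df["minority"] * df["treated"]
).astype(int)

# The treated:minority coefficient measures how the treatment changes the
# minority-majority difference in the outcome, relative to the comparison arm.
model = smf.ols("completed ~ treated * minority", data=df).fit(cov_type="HC1")
print(model.summary().tables[1])
```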

Secondary Outcomes

Secondary Outcomes (end points)
To understand possible mechanisms, we elicit the following secondary outcome variables:

Time to complete the questions: the candidate's completion time is recorded.

Number of words: This outcome is the number of words written in the assessment.

Second-order overall human evaluation of candidates: Our human evaluators will also be asked to score candidates based on the score they expect the median human evaluator to give. The score will be on a scale between 0 and 100.
Secondary Outcomes (explanation)
See above

Experimental Design

Experimental Design
Our design aims to measure the impact of providing information about AI and AI-generated bias on diversity in recruitment outcomes.
Experimental Design Details
Not available
Randomization Method
Randomization will be carried out by a computer.
Randomization Unit
The randomization unit will be the individual for all treatments.
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
The number of clusters will be equal to the number of legitimate candidates. Based on previous experience, we expect to have 1200 legitimate job applicants, but this will likely vary depending on a number of factors. A legitimate candidate is someone who resides in the US or Australia and completes the initial application form. We plan to assign 25% of the applications to each treatment. We will cross-stratify by country, gender, and racial minority status.

For stage 2, we plan to recruit enough human evaluators so that each candidate is assessed by several human evaluators. We will decide on the exact number of potential human evaluators based on the number of candidates in stage 1. The updated number can be found in Appendix 1.
Sample size: planned number of observations
For the applicant sample (stage 1), the number of observations is the same as the number of clusters. We hope to have up to 1200 observations. For stage 2, the AI will evaluate all candidates. The candidate pool will also be evaluated by human evaluators. Each candidate will be evaluated by several human evaluators. The number of human evaluators and the number of candidates assessed by each evaluator will be added to the pre-analysis plan before the start of stage 2 (see Appendix 1).
Sample size (or number of clusters) by treatment arms
See above. We plan to assign 25% of the sample to each treatment. Further, we will cross-stratify by gender, racial minority status, and country.
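As a concrete illustration of the assignment procedure described above (equal 25% shares per arm, cross-stratified by country, gender, and racial minority status), the sketch below shows one possible implementation. The column names, arm labels, and the shuffle-and-cycle balancing step are assumptions, not the research team's actual code.

```python
# Illustrative stratified assignment: within each country x gender x racial
# minority stratum, shuffle applicants and cycle through the four arms so each
# arm receives roughly 25% of the stratum.
import numpy as np
import pandas as pd

ARMS = ["AI-Explained", "AI-Biased", "AI-Debiased", "Human-Oversight"]  # assumed labels

def assign_treatments(applicants: pd.DataFrame, seed: int = 2023) -> pd.DataFrame:
    rng = np.random.default_rng(seed)
    out = applicants.copy()
    out["treatment"] = None
    for _, stratum in out.groupby(["country", "gender_minority", "racial_minority"]):
        shuffled = rng.permutation(stratum.index.to_numpy())
        for i, row_id in enumerate(shuffled):
            out.loc[row_id, "treatment"] = ARMS[i % len(ARMS)]
    return out

# Hypothetical usage with 200 simulated applicants:
applicants = pd.DataFrame({
    "country": ["US", "US", "AU", "AU"] * 50,
    "gender_minority": [True, False] * 100,
    "racial_minority": [True, True, False, False] * 50,
})
print(assign_treatments(applicants)["treatment"].value_counts())
```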
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
The MDEs for the main outcomes are as follows. Proportion who complete and proportion who start: using a significance level of 0.05, power of 0.8, and a value of 0.6 as the proportion who complete/start in the absence of AI, we can detect a minimum effect size of 0.058. We will add the other MDEs before the start of stage 2, once we have confirmed the sample size of stage 2.
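For reference, a minimal sketch of a standard two-sided, two-sample proportion MDE calculation of the kind described above. The significance level, power, and baseline proportion follow the text; the per-group sample sizes are hypothetical, since the registration reports the 0.058 figure without stating which group sizes enter it.

```python
# Minimal sketch of a two-sided, two-sample proportion MDE calculation.
# Per-group sample sizes below are illustrative assumptions.
from scipy.stats import norm

def mde_two_proportions(p0: float, n1: int, n2: int,
                        alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate minimum detectable difference in proportions."""
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = norm.ppf(power)            # quantile corresponding to the target power
    se = (p0 * (1 - p0) * (1 / n1 + 1 / n2)) ** 0.5
    return (z_alpha + z_beta) * se

# Example with a hypothetical 600 applicants in each compared group:
print(round(mde_two_proportions(p0=0.60, n1=600, n2=600), 3))
```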
IRB

Institutional Review Boards (IRBs)

IRB Name
Monash University Ethics Committee
IRB Approval Date
2018-11-14
IRB Approval Number
2023-14985-99498