An Online Experiment on Compliance to Recommendation Algorithms in a Hiring Task

Last registered on August 10, 2023

Pre-Trial

Trial Information

General Information

Title
An Online Experiment on Compliance to Recommendation Algorithms in a Hiring Task
RCT ID
AEARCTR-0011857
Initial registration date
August 04, 2023

First published
August 10, 2023, 1:26 PM EDT

Locations

Primary Investigator

Affiliation
Stanford Graduate School of Business

Other Primary Investigator(s)

PI Affiliation
Stanford University

Additional Trial Information

Status
In development
Start date
2023-08-10
End date
2023-08-17
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
Our study uses an online lab experiment to understand how human decision-makers use recommendations provided by an algorithm, and how their use of the recommendation changes as the design of the algorithm varies. The experiment will assess whether human participants can effectively use a recommendation algorithm that (i) ignores all observations for which the human has perfect certainty and/or (ii) withholds recommendations selectively. In theory, these targeted recommendations are more effective than a generic recommendation, since the participant should ignore the recommendation when they have perfect certainty (or when no recommendation is given).

We have designed a small hiring game to test whether participants are able to properly ignore the algorithm when given certain information while taking the algorithm's recommendation in other instances. Participants see the role applicants are applying for (Engineering, Sales, or Communications), while the recommendation algorithm assisting them can assess each applicant's personality type. Participants need to hire Good applicants and not hire Bad applicants. Interventions vary the structure of the recommendation algorithm assisting the participants.
External Link(s)

Registration Citation

Citation
McLaughlin, Bryce and Jann Spiess. 2023. "An Online Experiment on Compliance to Recommendation Algorithms in a Hiring Task." AEA RCT Registry. August 10. https://doi.org/10.1257/rct.11857-1.0
Experimental Details

Interventions

Intervention(s)
Our intervention takes the form of different recommendation algorithms given to the participant. We list the names of the treatments here for reference alongside the sample size distribution, but describe these treatments in the non-public section of the PAP.

Baseline
Compliance
Prior
Compliance-Prior
No Recommendation
Adjusted 3 Level
3 Level
Intervention (Hidden)
The Baseline algorithm sends a Hire recommendation if the probability an applicant of a given personality type is Good is at least 1/2 and sends a Don’t Hire recommendation otherwise.

The Compliance algorithm applies the threshold structure of the Baseline recommendation to a version of the applicant pool that excludes the Engineering position.

The Prior algorithm applies a safety check to the Baseline algorithm; if the safety check fails, the recommendation is Not Sent. The safety check makes sure at least 1/2 of the subpool of Communications applicants are Good before sending a Hire recommendation, and makes sure at least 1/2 of the subpool of Engineering and Sales applicants are Bad before sending a Don't Hire recommendation. These subpools consist of the applicants over whom the decision-maker is likely to take the opposite decision from the recommendation, meaning that these are the applicants for whom we may expect the recommendation to impact the decision.

The Compliance-Prior algorithm applies the safety checks of the Prior algorithm to the Compliance algorithm.

The No Recommendation treatment allows us to verify participants act as expected when not given the recommendation.

The Adjusted 3 Level algorithm sends the same recommendation for each personality type as the Compliance-Prior algorithm. However, the Adjusted 3 Level arrives at this structure differently: it sends a Hire recommendation if the probability an applicant of a given personality type is Good is at least 9/10, a Don't Hire recommendation if the probability an applicant of a given personality type is Bad is at least 1/3, and otherwise the recommendation is Not Sent.

The 3 Level treatment provides an alternative algorithm, constructed in a similar way (a double threshold), against which to measure the performance of Adjusted 3 Level. This algorithm sends a Hire recommendation if the probability an applicant of a given personality type is Good is at least 2/3, a Don't Hire recommendation if the probability an applicant of a given personality type is Bad is at least 2/3, and otherwise the recommendation is Not Sent.
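
For concreteness, the following Python sketch (our own illustration; the function names are hypothetical and this is not the experiment's actual code) shows how the threshold policies above map the probability that an applicant of a given personality type is Good into a recommendation.

def baseline(p_good):
    # Baseline: Hire iff P(Good) >= 1/2, otherwise Don't Hire.
    return "Hire" if p_good >= 1 / 2 else "Don't Hire"

def adjusted_3_level(p_good):
    # Adjusted 3 Level: Hire iff P(Good) >= 9/10, Don't Hire iff
    # P(Bad) = 1 - P(Good) >= 1/3, otherwise the recommendation is Not Sent.
    if p_good >= 9 / 10:
        return "Hire"
    if 1 - p_good >= 1 / 3:
        return "Don't Hire"
    return "Not Sent"

def three_level(p_good):
    # 3 Level: a symmetric double threshold at 2/3 on P(Good) and P(Bad).
    if p_good >= 2 / 3:
        return "Hire"
    if 1 - p_good >= 2 / 3:
        return "Don't Hire"
    return "Not Sent"

For example, under this sketch an applicant type with P(Good) = 0.6 would receive Hire from baseline, Don't Hire from adjusted_3_level, and Not Sent from three_level.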
Intervention Start Date
2023-08-10
Intervention End Date
2023-08-17

Primary Outcomes

Primary Outcomes (end points)
The difference in performance on our hiring task between participants who received different recommendation algorithms. Performance for us is defined as (Number of Good applicants Hired) + (Number of Bad applicants Not Hired).
Specifically, our hypotheses are:

H1 - The average performance of the Compliance group exceeds the average performance of the Baseline group.

H2 - The average performance of the Prior group exceeds the average performance of the Baseline group.

H3a - The average performance of the Compliance-Prior group exceeds the average performance of the Prior group.

H3b - The average performance of the Compliance-Prior group exceeds the average performance of the Compliance group.

H4 - The average performance of the Compliance-Prior group exceeds the average performance of the Adjusted 3 Level group.
Primary Outcomes (explanation)
Performance will be measured as the total number of Good applicants hired plus the total number of Bad applicants not hired.
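
As an illustration (a minimal sketch with hypothetical names, not our analysis code), the performance score for one participant could be computed as:

def performance(decisions):
    # `decisions` is a list of (quality, hired) pairs, where quality is
    # "Good" or "Bad" and hired is a bool; one point per correct decision.
    return sum(
        (quality == "Good") == hired
        for quality, hired in decisions
    )

# Example: a correct Hire, a correct pass, and an incorrect Hire score 2.
assert performance([("Good", True), ("Bad", False), ("Bad", True)]) == 2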

Secondary Outcomes

Secondary Outcomes (end points)
The probability each candidate is hired in each intervention group.
Secondary Outcomes (explanation)
We will look at which applicants (and thus which features) the different treatment groups perform differently on.

Experimental Design

Experimental Design

Participants will be informed they are being placed in the role of a hiring manager who must make 25 assisted hiring decisions. Each candidate (all of whom are hypothetical) is either Good or Bad; Good candidates should be hired and Bad candidates should not be hired. The participant knows the role each candidate is interviewing for (Engineering, Sales, or Communications) and the probability a candidate for that role is Good, based on the pool their company hires from (100%, 60%, or 40%). Participants are informed that to earn the most money they should hire if and only if they believe the probability a candidate is Good is 50% or higher.

Participants then answer comprehension questions.

In addition, the participant may receive access to a recommendation algorithm (differentiated by treatment) that recommends whether or not to hire the candidate. This algorithm sees a different attribute of the candidate (their personality type) rather than the role they are hiring for, and then reports Hire, Don't Hire, or no recommendation, according to the algorithm's information and the signaling mechanism described by the participant's condition.

Participants again answer comprehension questions.

Participants then make 25 hiring decisions with their recommendation algorithm. For each decision they see what the recommendation says and which role they are hiring for; they do not see the personality type. To generate these candidates, we take a 25-member applicant pool (shown to the participants) and present these 25 candidates to the participant in a random order.
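
A minimal sketch of this decision stage, assuming hypothetical role probabilities and pool composition (the real 25-member pool is fixed and shown to participants; which probability attaches to which role is our illustration, not a statement of the actual design):

import random

# P(Good) by role, as described to participants (mapping is illustrative).
P_GOOD = {"Engineering": 1.0, "Sales": 0.6, "Communications": 0.4}

def payoff_maximizing_decision(role):
    # Participants are told to hire iff they believe P(Good) >= 50%.
    return P_GOOD[role] >= 0.5

# Hypothetical pool composition; candidates are presented in random order.
pool = ["Engineering"] * 8 + ["Sales"] * 9 + ["Communications"] * 8
random.shuffle(pool)
decisions = [(role, payoff_maximizing_decision(role)) for role in pool]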
Experimental Design Details
Randomization Method
Randomization will be done through the Qualtrics randomizer, which balances the treatment distribution across incoming participants according to weights we choose.
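
Setting aside the exact balancing mechanics Qualtrics uses, the weighted assignment amounts to a draw like the following sketch (weights taken from the planned arm sizes under Experiment Characteristics; the code is our illustration, not the survey logic itself):

import random

# Weights proportional to the planned arm sizes (illustration only;
# Qualtrics performs the actual assignment within the survey flow).
ARM_WEIGHTS = {
    "Baseline": 250,
    "Compliance": 250,
    "Prior": 250,
    "Compliance-Prior": 250,
    "No Recommendation": 125,
    "Adjusted 3 Level": 250,
    "3 Level": 125,
}

def assign_treatment():
    arms = list(ARM_WEIGHTS)
    weights = list(ARM_WEIGHTS.values())
    return random.choices(arms, weights=weights, k=1)[0]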
Randomization Unit
Randomization is performed at the study subject level.
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
1500 subjects
Sample size: planned number of observations
1500 subjects, each making 25 decisions (37,500 decisions in total).
Sample size (or number of clusters) by treatment arms

Baseline - 250 participants
Compliance - 250 participants
Prior - 250 participants
Compliance-Prior - 250 participants
No Recommendation - 125 participants
Adjusted 3 Level - 250 participants
3 Level - 125 participants
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB

Institutional Review Boards (IRBs)

IRB Name
Stanford University's Administrative Panels for the Protection of Human Subjects
IRB Approval Date
2023-07-28
IRB Approval Number
71013

Post-Trial

Post Trial Information

Study Withdrawal

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials