An Online Experiment on Compliance to Recommendation Algorithms in a Hiring Task

Last registered on August 10, 2023

Pre-Trial

Trial Information

General Information

Title

An Online Experiment on Compliance to Recommendation Algorithms in a Hiring Task

RCT ID

AEARCTR-0011857

Initial registration date

August 04, 2023

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published

August 10, 2023, 1:26 PM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Locations

Country

United States of America

Region

Online

Primary Investigator

Name

Bryce McLaughlin

Affiliation

Stanford Graduate School of Business

Other Primary Investigator(s)

PI Name

Jann Spiess

PI Affiliation

Stanford University

Additional Trial Information

Status

In development

Start date

2023-08-10

End date

2023-08-17

Keywords

Behavior, Lab

Additional Keywords

Human Algorithm Interaction, Risk Assessments

JEL code(s)

Secondary IDs

Prior work

This trial does not extend or rely on any prior RCTs.

Abstract

Our study uses an online lab experiment to understand how human decision-makers use recommendations provided by an algorithm, and how their use of those recommendations changes as the design of the algorithm varies. The experiment will assess whether participants can effectively use a recommendation algorithm that (i) ignores all observations for which the human has perfect certainty and/or (ii) withholds recommendations selectively. In theory, these targeted recommendations are more effective than a generic recommendation, because the participant should ignore the recommendation when they have perfect certainty (or are not given a recommendation).

We have designed a small hiring game that tests whether participants properly ignore the algorithm when given certain information while taking the algorithm's recommendation in other instances. Participants see the role each applicant is applying for (Engineering, Sales, or Communications), while the recommendation algorithm assisting them is able to assess the applicant's personality type. Participants need to hire good applicants and not hire bad applicants. Interventions vary the structure of the recommendation algorithm assisting the participants.

External Link(s)

Registration Citation

Citation

McLaughlin, Bryce and Jann Spiess. 2023. "An Online Experiment on Compliance to Recommendation Algorithms in a Hiring Task." AEA RCT Registry. August 10. https://doi.org/10.1257/rct.11857-1.0

Sponsors & Partners

Experimental Details

Interventions

Intervention(s)

Our intervention takes the form of different recommendation algorithms given to participants. We list the treatment names here for reference alongside the sample size distribution; the treatments themselves are described in the non-public section of the PAP.

Baseline
Compliance
Prior
Compliance-Prior
No Recommendation
Adjusted 3 Level
3 Level

Intervention (Hidden)

The Baseline algorithm sends a Hire recommendation if the probability an applicant of a given personality type is Good is at least 1/2 and sends a Don’t Hire recommendation otherwise.
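
The Baseline rule is a single threshold on the probability that a personality type is Good. A minimal sketch follows; the function name and the example probabilities are illustrative assumptions, since the registration does not list the actual values per personality type.

```python
from fractions import Fraction

def baseline_recommendation(p_good: Fraction) -> str:
    """Baseline rule: Hire when P(Good | personality type) is at least 1/2.

    p_good is a hypothetical input; the actual per-type probabilities
    are not given in the registration.
    """
    return "Hire" if p_good >= Fraction(1, 2) else "Don't Hire"

# Illustrative probabilities only (not taken from the study).
for p in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    print(p, "->", baseline_recommendation(p))
```

Exact fractions are used so that the boundary case of exactly 1/2 is not affected by floating-point rounding.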

The Compliance algorithm applies the threshold structure of the baseline recommendation to a version of the applicant pool which excludes the Engineering position.

The Prior algorithm applies a safety check to the Baseline algorithm; if the safety check fails, the recommendation is Not Sent. The safety check requires that at least 1/2 of the subpool of Communications applicants are Good before sending a Hire recommendation, and that at least 1/2 of the subpool of Engineering and Sales applicants are Bad before sending a Don’t Hire recommendation. Each subpool consists of the applicants for whom the decision-maker, acting alone, is likely to take the opposite decision to the recommendation, so these are the applicants for whom we may expect the recommendation to change the decision.
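
One way to read the safety check is sketched below. It assumes the check is evaluated within each personality type, that an empty subpool counts as a failed check, and that the pool for a type is non-empty; none of these details are stated in the registration, and the example pools are invented for illustration.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
ALL_ROLES = {"Engineering", "Sales", "Communications"}

def applicant(role, good):
    """Hypothetical applicant record: the role applied for and true quality."""
    return {"role": role, "good": good}

def share(applicants, roles, good):
    """Share of applicants in `roles` whose quality equals `good`.

    Returns None when the subpool is empty.
    """
    subpool = [a for a in applicants if a["role"] in roles]
    if not subpool:
        return None
    return Fraction(sum(a["good"] == good for a in subpool), len(subpool))

def prior_recommendation(type_pool):
    """Baseline threshold plus safety check, for one personality type.

    Assumes type_pool is non-empty. An empty check subpool is treated
    as a failed check (an assumption, not stated in the source).
    """
    if share(type_pool, ALL_ROLES, good=True) >= HALF:
        rec = "Hire"
        check = share(type_pool, {"Communications"}, good=True)
    else:
        rec = "Don't Hire"
        check = share(type_pool, {"Engineering", "Sales"}, good=False)
    return rec if check is not None and check >= HALF else "Not Sent"

# Invented pools for two hypothetical personality types.
type_a = [applicant("Engineering", True), applicant("Engineering", True),
          applicant("Sales", True), applicant("Communications", False)]
type_b = [applicant("Sales", False), applicant("Sales", False),
          applicant("Communications", False), applicant("Communications", True)]

print(prior_recommendation(type_a))  # Baseline says Hire, Communications check fails
print(prior_recommendation(type_b))  # Baseline says Don't Hire, check passes
```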

The Compliance-Prior algorithm applies the safety checks of the Prior algorithm to the Compliance algorithm.

The No Recommendation treatment allows us to verify participants act as expected when not given the recommendation.

The Adjusted 3 Level algorithm sends the same recommendation for each personality type as the Compliance-Prior algorithm. However, it arrives at these recommendations through a double threshold: it sends a Hire recommendation if the probability an applicant of a given personality type is Good is at least 9/10, a Don’t Hire recommendation if the probability an applicant of a given personality type is Bad is at least 1/3, and otherwise a recommendation is Not Sent.

The 3 Level treatment provides an alternative algorithm, constructed in a similar way (a double threshold), against which to measure the performance of Adjusted 3 Level. This algorithm sends a Hire recommendation if the probability an applicant of a given personality type is Good is at least 2/3, a Don’t Hire recommendation if the probability an applicant of a given personality type is Bad is at least 2/3, and otherwise a recommendation is Not Sent.
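
Both the 3 Level and Adjusted 3 Level algorithms can be written as one double-threshold rule with different cutoffs. The sketch below uses the thresholds stated above; the function names and the example probabilities are illustrative assumptions. Checking the Hire threshold first is also an assumption, although for both stated threshold pairs the two conditions cannot hold at once.

```python
from fractions import Fraction

def double_threshold(p_good, hire_at, dont_hire_at):
    """Three-outcome rule on P(Good | personality type).

    Hire if P(Good) >= hire_at, Don't Hire if P(Bad) >= dont_hire_at,
    otherwise no recommendation is sent.
    """
    p_bad = 1 - p_good
    if p_good >= hire_at:
        return "Hire"
    if p_bad >= dont_hire_at:
        return "Don't Hire"
    return "Not Sent"

def three_level(p_good):
    return double_threshold(p_good, Fraction(2, 3), Fraction(2, 3))

def adjusted_three_level(p_good):
    return double_threshold(p_good, Fraction(9, 10), Fraction(1, 3))

# Illustrative probabilities only (not taken from the study).
for p in (Fraction(1, 3), Fraction(1, 2), Fraction(3, 4), Fraction(9, 10)):
    print(p, three_level(p), adjusted_three_level(p), sep=" | ")
```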

Intervention Start Date

2023-08-10

Intervention End Date

2023-08-17

Primary Outcomes

Primary Outcomes (end points)

The difference in performance on our hiring task between participants who received different recommendation algorithms. We define performance as (Number of Good applicants Hired) + (Number of Bad applicants Not Hired).
Specifically, our hypotheses are:

H1 - The average performance of the Compliance group exceeds the average performance of the Baseline group.

H2 - The average performance of the Prior group exceeds the average performance of the Baseline group.

H3a - The average performance of the Compliance-Prior group exceeds the average performance of the Prior group.

H3b - The average performance of the Compliance-Prior group exceeds the average performance of the Compliance group.

H4 - The average performance of the Compliance-Prior group exceeds the average performance of the Adjusted 3 Level group.

Primary Outcomes (explanation)

Performance will be measured as the total number of good applicants hired + the total number of bad applicants not hired.
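
Following the primary outcome definition, (Number of Good applicants Hired) + (Number of Bad applicants Not Hired), performance is the count of decisions that match the applicant's true quality. A minimal sketch, with a hypothetical data layout of (is_good, hired) pairs:

```python
def performance(decisions):
    """Count correct decisions: Good applicants hired plus Bad applicants not hired.

    decisions: iterable of (is_good, hired) boolean pairs, one per applicant.
    The pair layout is an illustrative assumption.
    """
    return sum(1 for is_good, hired in decisions if is_good == hired)

# Four invented decisions: two correct, two incorrect.
example = [
    (True, True),    # Good applicant hired: correct
    (True, False),   # Good applicant not hired: incorrect
    (False, False),  # Bad applicant not hired: correct
    (False, True),   # Bad applicant hired: incorrect
]
print(performance(example))
```

With 25 decisions per participant, the score ranges from 0 to 25.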

Secondary Outcomes

Secondary Outcomes (end points)

The probability each candidate is hired in each intervention group.

Secondary Outcomes (explanation)

We will examine on which applicants (and thus on which features) performance differs across treatment groups.

Experimental Design

Participants will be informed they are being placed in the role of a hiring manager who must make 25 assisted hiring decisions. Each candidate (all of whom are hypothetical) is good or bad; the good candidates should be hired, while the bad candidates should not be hired. The participant knows the role the individual is interviewing for (Engineering, Sales, or Communications) and the probability the candidate is good for that role based on the pool their company hires from (100%, 60%, 40%, respectively). Participants are informed that to earn the most money they should hire if and only if they believe the probability a candidate is good is 50% or higher.

Participants answer comprehension questions.

In addition, the participant may receive access to a recommendation algorithm (differentiated by treatment) that recommends they hire or not hire the candidate. This algorithm sees a different attribute of the candidate than the role they are applying for, and then reports (according to the algorithm's information and the signaling mechanism described by the participant's condition) hire, not hire, or no recommendation.

Participants answer comprehension questions.

Participants then make 25 hiring decisions with their recommendation algorithm. For each decision they see what the recommendation says and what role they are hiring for. They do not see the personality type. To generate these candidates, we will take a 25-member applicant pool (given to the participants) and present these 25 candidates to the participant in a random order.
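
Presenting a fixed 25-member pool in random order amounts to drawing a random permutation per participant. A minimal sketch, assuming placeholder candidate labels (the actual pool composition is not given here) and a hypothetical per-participant seed for reproducibility:

```python
import random

def presentation_order(pool, seed=None):
    """Return the pool in a random order without modifying the original list."""
    rng = random.Random(seed)
    order = list(pool)
    rng.shuffle(order)
    return order

# Placeholder labels; the real pool's roles and personality types are not listed here.
pool = [f"candidate_{i:02d}" for i in range(1, 26)]
order = presentation_order(pool, seed=0)
print(order[:5])
```

Every participant sees the same 25 candidates, so differences in performance across arms cannot come from differences in the candidates drawn.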

Experimental Design Details

Randomization Method

Randomization will be done through the Qualtrics randomizer, which will balance the treatment distribution across incoming participants (using weights we choose).
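
For intuition, weighted balanced assignment can be imitated with permuted blocks. This sketch is not the Qualtrics randomizer's implementation; it is a stand-in technique, and the weights (2:2:2:2:1:2:1) are assumed to be proportional to the planned arm sizes.

```python
import random

# Assumed weights, proportional to the planned arm sizes.
WEIGHTS = {
    "Baseline": 2,
    "Compliance": 2,
    "Prior": 2,
    "Compliance-Prior": 2,
    "No Recommendation": 1,
    "Adjusted 3 Level": 2,
    "3 Level": 1,
}

def assign_in_blocks(n_participants, weights, seed=None):
    """Permuted-block assignment: each block holds every arm `weight` times."""
    rng = random.Random(seed)
    block = [arm for arm, w in weights.items() for _ in range(w)]
    assignments = []
    while len(assignments) < n_participants:
        shuffled = block[:]
        rng.shuffle(shuffled)
        assignments.extend(shuffled)
    return assignments[:n_participants]

assignments = assign_in_blocks(1500, WEIGHTS, seed=0)
print({arm: assignments.count(arm) for arm in WEIGHTS})
```

Because each block has 12 slots and 1500 is a multiple of 12, this sketch reproduces the planned arm sizes exactly; a live survey with dropouts would only approximate them.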

Randomization Unit

Randomization is performed at the study subject level.

Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters

1500 subjects

Sample size: planned number of observations

1500 subjects each making 25 decisions.

Sample size (or number of clusters) by treatment arms

Baseline - 250 participants
Compliance - 250 participants
Prior - 250 participants
Compliance-Prior - 250 participants
No Recommendation - 125 participants
Adjusted 3 Level - 250 participants
3 Level - 125 participants
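
As a quick consistency check, the arm sizes above sum to the planned 1500 subjects, and 25 decisions per subject gives the total number of decisions. The variable names below are illustrative.

```python
from fractions import Fraction

planned = {
    "Baseline": 250,
    "Compliance": 250,
    "Prior": 250,
    "Compliance-Prior": 250,
    "No Recommendation": 125,
    "Adjusted 3 Level": 250,
    "3 Level": 125,
}

total_subjects = sum(planned.values())
total_decisions = total_subjects * 25  # each subject makes 25 decisions
shares = {arm: Fraction(n, total_subjects) for arm, n in planned.items()}

print(total_subjects, total_decisions)
for arm, share in shares.items():
    print(f"{arm}: {share}")
```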

Minimum detectable effect size for main outcomes (accounting for sample design and clustering)

Supporting Documents and Materials

IRB