An Online Experiment on Compliance to Recommendation Algorithms in a Hiring Task

Last registered on August 10, 2023


Trial Information

General Information

An Online Experiment on Compliance to Recommendation Algorithms in a Hiring Task
Initial registration date
August 04, 2023

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
August 10, 2023, 1:26 PM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.


Primary Investigator

Stanford Graduate School of Business

Other Primary Investigator(s)

PI Affiliation
Stanford University

Additional Trial Information

In development
Start date
End date
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Our study uses an online lab experiment to understand how human decision-makers use recommendations provided by an algorithm, and how their use of the recommendation changes as the design of the algorithm varies. The experiment will assess whether human participants can effectively use a recommendation algorithm that (i) ignores all observations for which the human has perfect certainty and/or (ii) withholds recommendations selectively. In theory, these targeted recommendations are more effective than a generic recommendation because the participant should ignore the recommendation when they have perfect certainty (or when no recommendation is given).

We have designed a small hiring game that tests whether participants can properly ignore the algorithm when given certain information while following the algorithm's recommendation in other instances. Participants see the role each applicant is applying for (Engineering, Sales, or Communications), while the recommendation algorithm assisting them can assess each applicant's personality type. Participants need to hire good applicants and not hire bad applicants. Interventions vary the structure of the recommendation algorithm assisting the participants.
External Link(s)

Registration Citation

McLaughlin, Bryce and Jann Spiess. 2023. "An Online Experiment on Compliance to Recommendation Algorithms in a Hiring Task." AEA RCT Registry. August 10.
Experimental Details


Our intervention takes the form of different recommendation algorithms given to the participant. We list the treatment names here for reference alongside the sample-size distribution, but describe these treatments in the non-public section of the PAP.

No Recommendation
Adjusted 3 Level
3 Level
Intervention Start Date
Intervention End Date

Primary Outcomes

Primary Outcomes (end points)
The difference in performance on our hiring task between participants who received different recommendation algorithms. Performance is defined as (Number of Good Applicants Hired) + (Number of Bad Applicants Not Hired).
Specifically our hypotheses are:

H1 - The average performance of the Compliance group exceeds the average performance of the Baseline group.

H2 - The average performance of the Prior group exceeds the average performance of the Baseline group.

H3a - The average performance of the Compliance-Prior group exceeds the average performance of the Prior group.

H3b - The average performance of the Compliance-Prior group exceeds the average performance of the Compliance group.

H4 - The average performance of the Compliance-Prior group exceeds the average performance of the Adjusted 3 Level group.
Primary Outcomes (explanation)
Performance will be measured as the total number of good applicants hired plus the total number of bad applicants not hired.
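As a minimal sketch (the function name and inputs are illustrative, not part of the PAP), the score for one participant's session can be computed as:

```python
def performance(decisions, labels):
    """Score one participant's session.

    decisions[i] is True if candidate i was hired;
    labels[i] is True if candidate i is a good applicant.
    One point per good applicant hired, plus one point
    per bad applicant not hired.
    """
    good_hired = sum(d and g for d, g in zip(decisions, labels))
    bad_not_hired = sum((not d) and (not g) for d, g in zip(decisions, labels))
    return good_hired + bad_not_hired

# Example: hire a good candidate, hire a bad one, pass on a bad one.
score = performance([True, True, False], [True, False, False])  # -> 2
```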

Secondary Outcomes

Secondary Outcomes (end points)
The probability each candidate is hired in each intervention group.
Secondary Outcomes (explanation)
We will look at which applicants (and thus which features) the different treatment groups perform differently on.

Experimental Design

Experimental Design

Participants will be informed that they are being placed in the role of a hiring manager who must make 25 assisted hiring decisions. Each candidate (all of whom are hypothetical) is either good or bad; good candidates should be hired and bad candidates should not be hired. The participant knows the role each candidate is interviewing for (Engineering, Sales, or Communications) and the probability that a candidate for that role is good, based on the pool their company hires from (100%, 60%, or 40%). Participants are informed that to earn the most money they should hire if and only if they believe the probability a candidate is good is 50% or higher.
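The stated decision rule can be sketched as follows; the role priors come from the description above, while the code itself is only an illustration, not part of the experiment:

```python
# Prior probability a candidate is good, by role, per the design above.
ROLE_PRIOR = {"Engineering": 1.00, "Sales": 0.60, "Communications": 0.40}

def should_hire(p_good):
    """Hire if and only if the believed probability the candidate is good is >= 50%."""
    return p_good >= 0.5

# Absent any recommendation, the role prior is the participant's only signal:
baseline_choice = {role: should_hire(p) for role, p in ROLE_PRIOR.items()}
# Engineering and Sales candidates are hired; Communications candidates are not.
```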

Participants answer comprehension questions.

In addition, the participant may receive access to a recommendation algorithm (differentiated by treatment) that recommends hiring or not hiring the candidate. This algorithm sees a different attribute of the candidate than the role they are hiring for, and then reports hire, not hire, or no recommendation, according to the algorithm's information and the signaling mechanism described by the participant's condition.

Participants answer comprehension questions.

Participants then make 25 hiring decisions with their recommendation algorithm. For each decision they see the algorithm's recommendation and the role they are hiring for; they do not see the personality type. To generate these candidates, we will take a 25-member applicant pool (given to the participants) and present its candidates to each participant in a random order.
Experimental Design Details
Randomization Method
Randomization will be done through the Qualtrics randomizer, which will balance the treatment distribution across incoming participants using weights we choose.
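For illustration only (the actual assignment is handled inside Qualtrics), weighted assignment matching the planned arm sizes could look like:

```python
import random

# Planned arm sizes from the registration, used as assignment weights.
ARM_WEIGHTS = {
    "Baseline": 250,
    "Compliance": 250,
    "Prior": 250,
    "Compliance Prior": 250,
    "No Recommendation": 125,
    "Adjusted 3 Level": 250,
    "3 Level": 125,
}

def assign_treatment(rng=random):
    """Draw one incoming participant's arm with the chosen weights."""
    arms = list(ARM_WEIGHTS)
    weights = list(ARM_WEIGHTS.values())
    return rng.choices(arms, weights=weights, k=1)[0]
```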
Randomization Unit
Randomization is performed at the study subject level.
Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters
1,500 subjects
Sample size: planned number of observations
1,500 subjects, each making 25 decisions (37,500 decisions in total).
Sample size (or number of clusters) by treatment arms

Baseline - 250 participants
Compliance - 250 participants
Prior - 250 participants
Compliance Prior - 250 participants
No Recommendation - 125 participants
Adjusted 3 Level - 250 participants
3 Level - 125 participants
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)

Institutional Review Boards (IRBs)

IRB Name
Stanford University's Administrative Panels for the Protection of Human Subjects
IRB Approval Date
IRB Approval Number


Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.


Is the intervention completed?
Data Collection Complete
Data Publication

Data Publication

Is public data available?

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials