Algorithmic Fairness: The Role of Beliefs

Last registered on September 26, 2024

Pre-Trial

Trial Information

General Information

Title
Algorithmic Fairness: The Role of Beliefs
RCT ID
AEARCTR-0014169
Initial registration date
August 21, 2024

First published
August 27, 2024, 3:37 PM EDT

Last updated
September 26, 2024, 1:03 PM EDT

Locations

Region

Primary Investigator

Affiliation
University of Hamburg

Other Primary Investigator(s)

Additional Trial Information

Status
In development
Start date
2024-09-06
End date
2024-09-30
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
Algorithmic decision support tools used in high-stakes domains such as criminal justice, hiring, and lending often exclude protected variables from input data to comply with anti-discrimination regulations and fairness principles. While these interventions are typically assessed based on quantitative disparity measures, their behavioral implications are less explored. This experiment examines how human decision makers, who are assisted by these tools in making accurate decisions about others, respond to algorithmic predictions that omit protected group membership, and whether these reactions are related to their prior statistical beliefs about protected groups. It examines how such interventions affect belief updating, and how this in turn affects discrimination in subsequent decision outcomes.

The experiment uses a hiring context in which participants predict the performance of individual workers on a math and science quiz, both before and after receiving algorithmic performance predictions for each worker. The workers are selected to be balanced by gender, with otherwise identical characteristics, allowing for the measurement of prior statistical beliefs about gender differences in quiz performance. The algorithm's input data varies between subjects in whether the gender variable is included, and participants are informed about the input data. Prediction results remain constant, as gender is neither a significant predictor nor correlated with significant predictors. Another treatment excludes month of birth (even vs. odd) from the training data while keeping gender included. By excluding a non-stereotypical variable, the experiment differentiates whether participants' reactions are specifically driven by the omission of the gender variable or whether their behavior changes when any variable is excluded.
External Link(s)

Registration Citation

Citation
Woemmel, Arna. 2024. "Algorithmic Fairness: The Role of Beliefs." AEA RCT Registry. September 26. https://doi.org/10.1257/rct.14169-1.2
Experimental Details

Interventions

Intervention(s)
(i) Exclusion of the gender variable from the algorithm's data used for performance predictions. The input data is disclosed to participants. The prediction results and the algorithm's accuracy remain constant, as gender is neither a predictor of performance nor correlated with any of the predictors.

(ii) Exclusion of the month of birth (even vs. odd) variable from the algorithm's data used for performance predictions. The input data is disclosed to participants. The prediction results and the algorithm's accuracy remain constant, as month of birth (even vs. odd) is neither a predictor of performance nor correlated with any of the predictors.
Intervention (Hidden)
Intervention Start Date
2024-09-06
Intervention End Date
2024-09-30

Primary Outcomes

Primary Outcomes (end points)
Treatment effects on (i) belief updating about worker performance; and (ii) discrimination in subsequent hiring decisions; and whether these are related to prior statistical beliefs about gender differences in quiz performance.
Primary Outcomes (explanation)
(i) Belief updating is measured as the difference between beliefs about worker performance before (prior) and after (posterior) receiving the algorithmic performance prediction.
(ii) Hiring decisions are made sequentially, with a yes/no decision for each individual worker.

Secondary Outcomes

Secondary Outcomes (end points)
- Estimated accuracy of the algorithm (used to calculate the implied Bayesian posterior)
- Failures in Bayesian updating; cognitive biases (conservatism bias, confirmation bias)
- Preferences not explained by beliefs (e.g., tastes, social-image concerns) in hiring decisions
- Asymmetric updating w.r.t. positive vs. negative signals (i.e., low vs. top prediction) and female vs. male workers
Secondary Outcomes (explanation)
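
The implied Bayesian posterior listed among the endpoints can be computed from a participant's prior and the algorithm's binary signal. A minimal sketch, assuming a top/not-top prediction with symmetric accuracy (the study's exact signal structure and elicited accuracy may differ):

```python
def bayes_posterior(prior, accuracy, signal_top):
    """Implied Bayesian posterior that a worker is a top performer.

    prior:      participant's prior belief P(top), in [0, 1]
    accuracy:   assumed P(signal correct | state), symmetric across states
    signal_top: True if the algorithm predicts "top performer"
    """
    # Likelihood of the observed signal under each state of the world
    p_signal_given_top = accuracy if signal_top else 1 - accuracy
    p_signal_given_not = (1 - accuracy) if signal_top else accuracy
    numerator = prior * p_signal_given_top
    return numerator / (numerator + (1 - prior) * p_signal_given_not)
```

For example, a prior of 0.5 combined with an 80%-accurate "top" signal implies a posterior of 0.8; deviations of participants' revised estimates from this benchmark can be read as conservatism (under-updating) or over-updating.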

Experimental Design

Experimental Design
At the start of the experiment, participants are informed that 400 U.S. adults, representative of the U.S. population, took part in an online 20-question Math and Science Quiz, and they are briefed on the topics covered in the quiz. The experiment is divided into two parts, in each of which participants can earn an additional $5. At the end of the study, one of these parts is randomly selected for the bonus payment.

In Part 1, participants review short CVs of eight pre-selected workers and estimate, for each worker, the likelihood that their performance is in the top 50% relative to the performance of the other 400 workers ("top performer"). The order in which the workers are presented is randomized. The CVs include information on gender (female/male), level of education (no bachelor's degree / bachelor's degree or higher), and month of birth (even/odd). Participants then receive performance predictions for each worker from a machine learning algorithm and have the opportunity to revise their initial estimates. They are informed that the algorithm is trained on the remaining workers from the full 400-worker sample and uses all the CV variables in the baseline condition (excluding gender in the first treatment condition, and excluding month of birth, even vs. odd numbered, in the second treatment), along with the workers' performance on a prior math and science quiz, to make the performance predictions. Performance on the prior quiz is strongly correlated with Math and Science Quiz performance, making the predictions informative. Prediction results remain constant, as gender (respectively, month of birth) is neither a significant predictor nor correlated with significant predictors. Participants are shown the eight workers in the same order as before and must submit their final estimates, which are used to determine their $5 bonus payment.

Part 2 follows immediately: participants first complete the hiring task and then a simple logic task. In the hiring task, participants are again presented with the eight workers and asked to decide whether to hire them, making a yes/no decision for each individual. Participants are told that they will receive $2.50 if they solve the logic task correctly. For the bonus payment, one of the hiring decisions is randomly selected. If the selected decision involves hiring a top performer, the participant earns $5; if it involves hiring a worker who is not a top performer, the participant earns $0. If the selected decision is not to hire the worker, the participant keeps their $2.50. The hired worker in the randomly selected decision receives $2.50 regardless of performance.
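
The payment rule for the randomly selected hiring decision can be summarized as a small function (a sketch of the rule as described above; variable names are illustrative, and the $2.50 endowment presumes the logic task was solved correctly):

```python
def participant_payoff(hired, is_top_performer, endowment=2.50, bonus=5.00):
    """Participant's payoff for the randomly selected hiring decision.

    Not hiring keeps the $2.50 endowment from the logic task; hiring
    stakes it on whether the worker turns out to be a top performer.
    """
    if not hired:
        return endowment                      # keep the $2.50
    return bonus if is_top_performer else 0.0  # $5 on success, $0 otherwise

def worker_payoff(hired, wage=2.50):
    """The worker in the selected decision is paid regardless of performance."""
    return wage if hired else 0.0
```

Hiring is therefore optimal in expected value exactly when the participant's posterior belief that the worker is a top performer exceeds 2.50 / 5.00 = 0.5, which links the hiring decisions directly to the elicited beliefs.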

Participants are then asked to estimate the accuracy of the algorithm (incentivized). The experiment concludes with a brief survey on attitudes toward algorithms and technology, knowledge about algorithms, perceptions of gender discrimination in the U.S., and a demographic questionnaire.
Comprehension questions are presented throughout the experiment, which participants must answer to proceed. All beliefs are elicited using the stochastic Becker-DeGroot-Marschak method.
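
One common implementation of the stochastic Becker-DeGroot-Marschak mechanism mentioned above works as follows (a sketch under assumed parameters; the study's exact prize and randomization details are not specified here):

```python
import random

def stochastic_bdm(reported_belief, event_occurred, prize=5.00, rng=random):
    """One round of a stochastic BDM belief elicitation (illustrative).

    A random cutoff p ~ U[0,1] is drawn. If p exceeds the reported belief,
    the participant is paid via a lottery that wins with probability p;
    otherwise payment depends on whether the event actually occurred.
    Under this rule, reporting one's true subjective probability
    maximizes expected payoff.
    """
    p = rng.random()
    if p > reported_belief:
        # Lottery arm: win the prize with probability p
        return prize if rng.random() < p else 0.0
    # Event arm: paid if the event (e.g., "worker is a top performer") occurred
    return prize if event_occurred else 0.0
```

In simulation, truthfully reporting the event's true frequency yields a higher average payoff than any misreport, which is the incentive-compatibility property the mechanism relies on.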

The quiz was previously administered in a separate online study involving 400 U.S. adults, representative of the U.S. adult population, and consisted exclusively of ASVAB questions. These adults completed two similar 20-question quizzes consecutively. The variables presented in the CVs and used for the algorithm's predictions were selected based on results from a separate survey of 300 U.S. adults (representative sample) that measured beliefs about differences in Math and Science Quiz performance by gender, education level, and month of birth. The algorithm is trained on the sample of 400 workers, excluding the eight selected workers, and is based on a logistic regression model.
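
The registration describes the predictor as a logistic regression trained on the 392 non-selected workers, with the prior-quiz score as the informative input and gender, education, and month-of-birth parity as uninformative inputs. A minimal pure-Python sketch on synthetic data (the feature encoding, sample, and hyperparameters here are illustrative assumptions, not the study's actual pipeline):

```python
import math
import random

def train_logistic(X, y, lr=0.5, epochs=1500):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
            for j in range(d):
                gw[j] += (p - yi) * xi[j]
            gb += p - yi
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

# Synthetic training sample mimicking the 392 non-selected workers:
# features = [gender, education, month_parity, prior_quiz_score];
# only the prior-quiz score is related to top-performer status.
rng = random.Random(1)
X, y = [], []
for _ in range(392):
    gender, edu, parity = rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1)
    prior_score = rng.random()
    X.append([gender, edu, parity, prior_score])
    y.append(1 if prior_score > 0.5 else 0)

w, b = train_logistic(X, y)

# Training accuracy; the weights on the uninformative variables stay near zero
preds = [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X]
acc = sum(p == t for p, t in zip(preds, y)) / len(y)
```

Because gender and month parity are independent of the outcome, dropping either column leaves the fitted predictions essentially unchanged, which is the constancy property the treatment design relies on.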
Experimental Design Details
Randomization Method
Randomization done by a computer
Randomization Unit
Individual randomization in all treatments
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
Baseline: All variables in training data included
Treatment 1: Gender excluded
Treatment 2: Month of birth (even vs. odd numbered) excluded
Sample size: planned number of observations
1100-1350 participants
Sample size (or number of clusters) by treatment arms
350-450 participants per treatment
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB

Institutional Review Boards (IRBs)

IRB Name
Ethics Committee for the Faculty of Business, Economics and Social Sciences at the University of Hamburg (WISO)
IRB Approval Date
2024-05-14
IRB Approval Number
2024-011

Post-Trial

Post Trial Information

Study Withdrawal

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials