Biased Programmers? Or Biased Training Data? A Field Experiment about Algorithmic Bias
Last registered on November 11, 2019

Pre-Trial

Trial Information
General Information
Title
Biased Programmers? Or Biased Training Data? A Field Experiment about Algorithmic Bias
RCT ID
AEARCTR-0003574
Initial registration date
May 20, 2019
Last updated
November 11, 2019 10:51 AM EST
Location(s)

This section is unavailable to the public.
Primary Investigator
Affiliation
Columbia Business School
Other Primary Investigator(s)
PI Affiliation
Columbia Business School
Additional Trial Information
Status
Ongoing
Start date
2019-05-03
End date
2020-01-20
Secondary IDs
Abstract
Why does “algorithmic bias” occur? The two most frequently cited reasons are “biased programmers” and “biased training data.” We quantify the effects of each using a field experiment.
External Link(s)
Registration Citation
Citation
Cowgill, Bo and Fabrizio Dell'Acqua. 2019. "Biased Programmers? Or Biased Training Data? A Field Experiment about Algorithmic Bias." AEA RCT Registry. November 11. https://doi.org/10.1257/rct.3574-1.1.
Former Citation
Cowgill, Bo and Fabrizio Dell'Acqua. 2019. "Biased Programmers? Or Biased Training Data? A Field Experiment about Algorithmic Bias." AEA RCT Registry. November 11. https://www.socialscienceregistry.org/trials/3574/history/56727.
Experimental Details
Interventions
Intervention(s)
Participants will be assigned to one of four conditions, receiving either biased or unbiased training data. We treat the biased-data condition (with no other interventions) as the control group.
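
The registration describes computer randomization with equal division across four arms; the following is a minimal sketch of how such an assignment could be implemented. The arm labels, seed, and function name are our own placeholders, not details from the trial.

```python
# Minimal sketch of seeded, balanced assignment to four arms.
# Arm labels and the seed are placeholders, not details from the trial.
import random

ARMS = [
    "biased_data_control",   # biased training data, no other intervention
    "unbiased_data",
    "arm_3_placeholder",     # the remaining arms' contents are not public
    "arm_4_placeholder",
]

def assign_arms(participant_ids, seed=2019):
    """Deal participants into four (near-)equal arms, reproducibly."""
    rng = random.Random(seed)      # fixed seed makes the draw auditable
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: ARMS[i % len(ARMS)] for i, pid in enumerate(ids)}

print(assign_arms([f"p{i:03d}" for i in range(8)]))
```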
Intervention Start Date
2019-05-03
Intervention End Date
2019-11-23
Primary Outcomes
Primary Outcomes (end points)
- The performance of the algorithms developed for the data science competition. Performance will be evaluated based on accuracy of prediction (for both groups).
Primary Outcomes (explanation)
We use performance to measure and compare bias in the proposed algorithms.
Note that even if programmers were given training data with sample selection issues, they would be evaluated using a sample without selection bias. We feel that this is how many real-world applications work: models trained using unrepresentative datasets (sometimes “datasets of convenience”) are often deployed in practice on a larger population.

Additionally, we will measure the effects of different incentives: two threshold levels for passing the assignment, and two levels of extra credit awarded for the same accuracy improvement.
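
As a concrete illustration of the scoring logic described above (evaluation on a holdout sample without selection bias, with accuracy compared across groups), here is a minimal sketch. The column names `outcome` and `group` and the accuracy-gap measure are our illustrative assumptions, not the trial's actual scoring code.

```python
import pandas as pd

def score_submission(holdout: pd.DataFrame, predictions: pd.Series) -> dict:
    """Score one contestant's predictions on an unbiased holdout sample."""
    correct = predictions == holdout["outcome"]        # per-row hit/miss
    per_group = correct.groupby(holdout["group"]).mean()
    return {
        "overall_accuracy": float(correct.mean()),
        "per_group_accuracy": per_group.to_dict(),
        # spread between best- and worst-served groups: one simple way
        # to compare bias in the submitted algorithms across arms
        "accuracy_gap": float(per_group.max() - per_group.min()),
    }
```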
Secondary Outcomes
Secondary Outcomes (end points)
Secondary Outcomes (explanation)
Experimental Design
Experimental Design
We will recruit a diverse set of programmers into a data science contest. As participants register, they will receive a training dataset to use in predicting an outcome. The outcome will be related to an event that historically exhibited bias against a particular group.
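
To make the biased-training-data treatment concrete, here is one hedged sketch of how a selection-biased training sample could be drawn from a representative source dataset. The column name, group label, and keep rate are all illustrative assumptions.

```python
import pandas as pd

def selection_biased_sample(df: pd.DataFrame, group_col: str = "group",
                            underrepresented: str = "B",
                            keep_rate: float = 0.2,
                            seed: int = 0) -> pd.DataFrame:
    """Under-sample one group, mimicking a 'dataset of convenience'
    whose sample selection under-represents that group."""
    kept_minority = df[df[group_col] == underrepresented].sample(
        frac=keep_rate, random_state=seed)
    rest = df[df[group_col] != underrepresented]
    # reshuffle so the selection pattern is not visible from row order
    return pd.concat([rest, kept_minority]).sample(frac=1.0, random_state=seed)
```

An unbiased arm would instead receive a sample drawn representatively from the same source, and all arms would be evaluated on an unbiased holdout as described under Primary Outcomes.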
Experimental Design Details
Not available
Randomization Method
The randomization will be done in-office by a computer.
Randomization Unit
Individual programmers
Was the treatment clustered?
Yes
Experiment Characteristics
Sample size: planned number of clusters
464 programmers
Sample size: planned number of observations
464 programmers
Sample size (or number of clusters) by treatment arms
264 programmers in the ML class, equally divided among the four treatment arms.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Supporting Documents and Materials

There are documents in this trial that are unavailable to the public.
IRB
INSTITUTIONAL REVIEW BOARDS (IRBs)
IRB Name
Columbia University
IRB Approval Date
2019-04-18
IRB Approval Number
AAAS2100