What Drives Different Assessments of Performance on Coding Tasks?

Last registered on February 17, 2023

Pre-Trial

Trial Information

General Information

Title
What Drives Different Assessments of Performance on Coding Tasks?
RCT ID
AEARCTR-0009816
Initial registration date
December 14, 2022

First published
January 03, 2023, 4:24 PM EST

Last updated
February 17, 2023, 8:38 AM EST

Locations

There is information in this trial unavailable to the public.

Primary Investigator

Affiliation
University of Michigan

Other Primary Investigator(s)

PI Affiliation
University of Toronto

Additional Trial Information

Status
In development
Start date
2022-10-01
End date
2025-12-01
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
This study focuses on evaluations of performance in coding interviews, which are used to hire computer programmers. We aim to shed light on the mechanisms underlying differences in these ratings, including differences in code quality and style, and in coder effort.
External Link(s)

Registration Citation

Citation
Craig, Ashley and Clémentine Van Effenterre. 2023. "What Drives Different Assessments of Performance on Coding Tasks?." AEA RCT Registry. February 17. https://doi.org/10.1257/rct.9816-1.1
Experimental Details

Interventions

Intervention(s)
We aim to assess how software developers evaluate pieces of code written by computer programmers. Our experiment uses a large set of de-identified code blocks from an online coding platform, spanning coders of different skill levels and problems of different levels of difficulty. For each code block, we have access to objective measures of the code's performance, including sub-test results (e.g., whether it runs and whether it produces correct answers to unit tests). Using these data, we will ask evaluators to judge the quality of the code using the same Likert scales as on the platform.
Intervention Start Date
2023-01-15
Intervention End Date
2023-02-15

Primary Outcomes

Primary Outcomes (end points)
Subjective ratings of code quality
Primary Outcomes (explanation)
Our primary outcome is evaluators’ subjective ratings of code quality, and whether these ratings differ by the gender of the coder and by treatment condition. For each block of code, respondents will be asked to rate its quality on a scale from 1 to 4.

Secondary Outcomes

Secondary Outcomes (end points)
Evaluators’ prediction of the candidate’s score from the automated evaluation tool.
Secondary Outcomes (explanation)
This is a continuous variable on [0,1]. A third outcome variable is evaluators’ prediction of the candidate’s hireability score. This is measured on a Likert scale from 1 to 4, and allows us to draw a more direct link between our findings and hiring outcomes. Additionally, we will measure how much time respondents spend on each question to capture fatigue and inattention, and how this varies over time.

Experimental Design

Experimental Design
Our design relies on multiple observations per subject. Each participant will evaluate 4 code blocks.
Experimental Design Details
Not available
Randomization Method
Randomization will be done by a computer.
Randomization Unit
We use a within-subject design: half of the code blocks seen by an evaluator will be in the treated condition and half in the control condition. Whether the treated half is seen first or second will be randomized, as will the order of code blocks within each condition (see the sketch below).
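
As a rough illustration of this scheme, the Python sketch below assigns each evaluator four code blocks, splits them evenly between treatment and control, and randomizes both which condition is shown first and the order of blocks within each condition. The function name, block identifiers, and seeding are hypothetical assumptions for illustration; the registry does not specify how the randomization is implemented.

    import random

    def assign_blocks(evaluator_id, code_block_ids, seed="rct-9816"):
        """Hypothetical within-subject assignment: 2 treated + 2 control blocks
        per evaluator, with randomized condition order and within-condition order."""
        rng = random.Random(f"{seed}-{evaluator_id}")   # reproducible per evaluator
        blocks = rng.sample(code_block_ids, 4)          # 4 code blocks per evaluator
        treated, control = blocks[:2], blocks[2:]       # half treated, half control
        rng.shuffle(treated)                            # order within each condition
        rng.shuffle(control)
        halves = [("treatment", treated), ("control", control)]
        rng.shuffle(halves)                             # which condition is seen first
        return [(cond, block) for cond, half in halves for block in half]

    # 400 evaluators x 4 blocks each = 1,600 observations (800 per arm).
    schedule = {e: assign_blocks(e, list(range(1000))) for e in range(1, 401)}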
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
400 evaluators.
Sample size: planned number of observations
1,600 observations (400 evaluators, each rating 4 code blocks).
Sample size (or number of clusters) by treatment arms
800 treated code blocks, 800 non-treated code blocks.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Supporting Documents and Materials

There is information in this trial unavailable to the public.
IRB

Institutional Review Boards (IRBs)

IRB Name
University of Toronto Research Oversight and Compliance Office — Human Research Ethics Program
IRB Approval Date
2022-10-06
IRB Approval Number
41662
Analysis Plan

There is information in this trial unavailable to the public.