What Drives Different Assessments of Performance on Coding Tasks?

Last registered on January 08, 2025

Pre-Trial

Trial Information

General Information

Title
What Drives Different Assessments of Performance on Coding Tasks?
RCT ID
AEARCTR-0009816
Initial registration date
December 14, 2022

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
January 03, 2023, 4:24 PM EST

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Last updated
January 08, 2025, 2:21 PM EST

Last updated is the most recent time when changes to the trial's registration were published.

Locations

Region

Primary Investigator

Affiliation
University of Michigan

Other Primary Investigator(s)

PI Affiliation
University of Toronto

Additional Trial Information

Status
In development
Start date
2022-10-01
End date
2025-12-01
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
This study focuses on evaluations of performance in coding interviews, which are used to hire computer programmers. We aim to shed light on the mechanisms underlying differences in these ratings, including differences in code quality and style, and in coder effort.
External Link(s)

Registration Citation

Citation
Craig, Ashley and Clémentine Van Effenterre. 2025. "What Drives Different Assessments of Performance on Coding Tasks?" AEA RCT Registry. January 08. https://doi.org/10.1257/rct.9816-1.2
Experimental Details

Interventions

Intervention(s)
We aim to assess how software developers evaluate pieces of code written by computer programmers. Our experiment uses a large set of de-identified code blocks from an online coding platform, which span coders of different skill levels and problems of different levels of difficulty. For each code block, we will have access to objective measures of the code's performance, including sub-test results (e.g., whether it runs, whether it produces correct answers to unit tests, etc.). Using these data, we will ask evaluators to judge the quality of the code using the same Likert scales as on the platform.
Intervention (Hidden)
We aim to assess whether software developers evaluate pieces of code differently if they know the gender of the coder, controlling for the objective quality of the code. Our experiment uses a large set of de-identified code blocks written by a set of men and women on an online platform. This set spans coders of different skill levels and problems of different levels of difficulty. For each code block, we will have access to objective measures of the code's performance, including sub-test results (e.g., whether it runs, whether it produces correct answers to unit tests, etc.). Using these data, we will ask evaluators to judge the quality of the code using the same Likert scales as on the platform.

Depending on the treatment condition, the evaluator will be aware of the gender or other basic information about the programmer who wrote the code (but will never be given identifying information). Using these evaluations, we will ask: (i) whether there are perceived differences in the quality of the code written by men and women; and (ii) how those perceived differences change when the evaluator is aware of the gender of the coder. This will let us test whether there are any unobservable dimensions of performance that are correlated with gender and driving the residual gender gaps that we see in our data. We will attempt to understand any such residual gaps by examining particular dimensions of performance as laid out in our pre-analysis plan.
Intervention Start Date
2023-01-15
Intervention End Date
2023-02-15

Primary Outcomes

Primary Outcomes (end points)
Subjective ratings of code quality
Primary Outcomes (explanation)
Our primary outcome is evaluators’ subjective rating of code quality; we examine whether it differs by the gender of the coder and by treatment condition. For each block of code, respondents will be asked to rate the code on a scale from 1 to 4.

Secondary Outcomes

Secondary Outcomes (end points)
Evaluators’ prediction of the candidate’s score from the automated evaluation tool.
Secondary Outcomes (explanation)
This is a continuous variable in [0,1]. A third outcome variable is evaluators’ prediction of the candidate’s hireability score. This is measured on a Likert scale from 1 to 4, and allows us to draw a more direct link between our findings and hiring outcomes. Additionally, we will measure how much time respondents spend on each question to capture fatigue and inattention, and how this varies over the course of the study.

Experimental Design

Experimental Design
Our design relies on multiple observations per subject. Each participant will evaluate 4 code blocks.
Experimental Design Details
Let N be the number of evaluators and P the number of problems per evaluator. As mentioned before, our sample of code blocks is stratified by gender and performance, such that P/2 code blocks are written by women, among which P/4 are high-scoring code blocks according to the platform's objective scoring tool. Each evaluator i is assigned a set of P problems in a random order. We use a within-subject design. We define Rj = 0 for a blind problem j (the gender of the coder is not revealed) and Rj = 1 for a non-blind problem j (the gender of the coder is revealed). For each evaluator i, the gender of the coder will be revealed for half of the problems. To account for potential priming effects, we randomize whether the gender of the coder is revealed in the first or in the second half of the study.
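The assignment described above can be sketched as follows. This is a minimal, hypothetical illustration of the stated design (N = 400 evaluators, P = 4 blocks each, stratified by gender and performance, gender revealed for half the problems with the revealed half randomly placed first or second); all names and the block pool are assumptions, not the authors' actual code.

```python
import random

N = 400  # planned number of evaluators
# One stratified draw per evaluator: P = 4 blocks, crossing coder gender
# (P/2 by women) with the platform's objective score (P/4 high-scoring per gender).
def draw_stratified_blocks(rng):
    blocks = [{"gender": g, "score": s}
              for g in ("woman", "man")
              for s in ("high", "low")]
    rng.shuffle(blocks)  # random presentation order
    return blocks

def assign_evaluator(rng):
    blocks = draw_stratified_blocks(rng)
    # Reveal gender (Rj = 1) for exactly half of the problems; to address
    # priming, randomize whether the revealed half comes first or second.
    reveal_first = rng.random() < 0.5
    for j, block in enumerate(blocks):
        in_first_half = j < len(blocks) // 2
        block["revealed"] = (in_first_half == reveal_first)
    return blocks

rng = random.Random(0)  # randomization done by computer; seeded for reproducibility
assignments = [assign_evaluator(rng) for _ in range(N)]

n_obs = sum(len(a) for a in assignments)                          # 400 * 4 = 1600
n_revealed = sum(b["revealed"] for a in assignments for b in a)   # half of 1600 = 800
print(n_obs, n_revealed)  # prints: 1600 800
```

By construction, each evaluator sees exactly two revealed and two blind blocks, which reproduces the planned split of 800 treated and 800 control observations.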
Randomization Method
Randomization will be done by a computer.
Randomization Unit
We use a within-subject design: half of the code blocks seen by an evaluator will be in the treated condition and half in the control condition. Whether the treated half is seen first or second will be randomized, as will the order of the code blocks within each condition.
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
400 evaluators.
Sample size: planned number of observations
1600 observations.
Sample size (or number of clusters) by treatment arms
800 treated code blocks, 800 non-treated.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Supporting Documents and Materials

There is information in this trial unavailable to the public; access can be requested through the Registry.
IRB

Institutional Review Boards (IRBs)

IRB Name
University of Toronto Research Oversight and Compliance Office — Human Research Ethics Program
IRB Approval Date
2022-10-06
IRB Approval Number
41662
Analysis Plan

Analysis Plan Documents

Pre-analysis plan

MD5: dcda8c268891601c2a8a769d18346b2e

SHA1: dbe57ea15edb7fc0dae1fbf4c24eb14bce7cbe8d

Uploaded At: February 17, 2023

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public; access can be requested through the Registry.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials