Performance pay for introspection

Pre-Trial

Trial Information

General Information

Title
Performance pay for introspection
RCT ID
AEARCTR-0007425
Initial registration date
March 27, 2021

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
March 29, 2021, 11:00 AM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Last updated
October 02, 2021, 10:57 AM EDT

Last updated is the most recent time when changes to the trial's registration were published.

Locations

Region

Primary Investigator

Affiliation
University of Chicago

Other Primary Investigator(s)

PI Affiliation
Harvard University

Additional Trial Information

Status
In development
Start date
2021-03-30
End date
2021-11-30
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
We examine the effects of different incentive schemes on effort in a subjective classification task on the labor market platform MTurk.
Subjective classification tasks are a common strategy for gathering information on tastes and attitudes and for developing training datasets for artificial intelligence. When the output of the task is subjective, designing effective incentive schemes is challenging because effort is difficult to observe. However, opinions are often given freely, suggesting that the cost of providing a subjective opinion is low (or even negative), so incentivizing effort may be ineffective or even counterproductive if monitoring and incentives crowd out altruistic motives, generate multi-tasking problems, or encourage gaming.

This pilot broadly investigates how to incentivize respondents to perform a simple introspective task: classifying responses to an open-ended question according to “originality”. MTurk workers are asked to report both their first-order beliefs (what they think) and their second-order beliefs (what they think others think). Using several measures of respondent effort, we will compare the performance of workers under a range of incentive schemes, including fixed-wage schemes, various forms of attention checks, and the Bayesian Truth Serum (BTS).

For each incentive scheme, we will examine the level of effort using a range of novel outcomes, including an incentivized measure of the degree of disutility associated with the task. We will use these measures to examine whether performance incentives for subjective tasks increase the level of effort, whether linking performance pay to particular sub-tasks crowds out effort on other tasks, and whether the effects of different incentive schemes are heterogeneous across people with different predispositions to exert effort when there are no performance incentives.
External Link(s)

Registration Citation

Citation
Gray-Lobe, Guthrie and Peter Hickman. 2021. "Performance pay for introspection." AEA RCT Registry. October 02. https://doi.org/10.1257/rct.7425-1.1
Experimental Details

Interventions

Intervention(s)
We examine the effects of different incentive schemes on effort in a subjective classification task on the labor market platform MTurk.
Subjective classification tasks are a common strategy for gathering information on tastes and attitudes and for developing training datasets for artificial intelligence. When the output of the task is subjective, designing effective incentive schemes is challenging because effort is difficult to observe. However, opinions are often given freely, suggesting that the cost of providing a subjective opinion is low (or even negative), so incentivizing effort may be ineffective or even counterproductive if monitoring and incentives crowd out altruistic motives, generate multi-tasking problems, or encourage gaming.

This pilot broadly investigates how to incentivize respondents to perform a simple introspective task: classifying responses to an open-ended question according to “originality”. MTurk workers are asked to report both their first-order beliefs (what they think) and their second-order beliefs (what they think others think). Using several measures of respondent effort, we will compare the performance of workers under a range of incentive schemes, including fixed-wage schemes, various forms of attention checks, and the Bayesian Truth Serum (BTS).

For each incentive scheme, we will examine the level of effort using a range of novel outcomes, including an incentivized measure of the degree of disutility associated with the task. We will use these measures to examine whether performance incentives for subjective tasks increase the level of effort, whether linking performance pay to particular sub-tasks crowds out effort on other tasks, and whether the effects of different incentive schemes are heterogeneous across people with different predispositions to exert effort when there are no performance incentives.
Intervention Start Date
2021-04-01
Intervention End Date
2021-04-27

Primary Outcomes

Primary Outcomes (end points)
Time on task
Internal consistency (share of repeated items classified the same way)
Group consistency (average absolute difference between a worker's report and the mean group report)
Negative reservation base wage (exact amount of the bid)
Negative payout-adjusted reservation base wage (bid minus an adjustment for the amount the worker was paid out)

We will define gaming behaviors for the internal consistency strategy as:
The maximum first-order belief option share (e.g., out of three options, the share of the option used most frequently)
Negative similarity of individual responses to those of other workers
Negative time on task
Degenerate second-order beliefs (e.g., reporting 100 percent in the second-order beliefs elicitation module)
Primary Outcomes (explanation)
Time on task, internal consistency, and group consistency will be measured for the full task (first- and second-order beliefs) and separately for first-order and second-order beliefs. We will refer to a generic measure of effort as e, and to effort on first- (second-) order beliefs as e_1 (e_2).
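For reference, here is a minimal sketch of how these effort measures could be computed from item-level response data. The column names (worker_id, item_id, first_order, response_time) are hypothetical placeholders, and the same calculations would be run for the full task and separately for first- and second-order beliefs; this is not the authors' analysis code.

```python
import pandas as pd

def effort_measures(df: pd.DataFrame) -> pd.DataFrame:
    """Worker-level effort measures from item-level responses.

    Expects hypothetical columns: worker_id, item_id, first_order (1-3 rating),
    response_time (seconds). Repeated items share the same item_id.
    """
    # Time on task: total seconds spent by each worker.
    time_on_task = df.groupby("worker_id")["response_time"].sum()

    # Internal consistency: among items a worker saw twice, the share
    # classified the same way in both occurrences.
    def internal(worker: pd.DataFrame) -> float:
        reps = worker.groupby("item_id")["first_order"].agg(["count", "nunique"])
        reps = reps[reps["count"] > 1]
        return float((reps["nunique"] == 1).mean()) if len(reps) else float("nan")

    internal_consistency = df.groupby("worker_id").apply(internal)

    # Group consistency: average absolute difference between a worker's report
    # and the mean report on the same item (the registration does not say
    # whether the worker's own report is excluded from the item mean).
    item_mean = df.groupby("item_id")["first_order"].transform("mean")
    abs_diff = (df["first_order"] - item_mean).abs()
    group_consistency = abs_diff.groupby(df["worker_id"]).mean()

    return pd.DataFrame({
        "time_on_task": time_on_task,
        "internal_consistency": internal_consistency,
        "group_consistency": group_consistency,
    })
```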

Secondary Outcomes

Secondary Outcomes (end points)
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
The data for this study come from an MTurk subjective classification task. Workers are asked to classify reports from Kenyan pupils on possible uses of a spoon in terms of their degree of “originality”. Workers report their first-order beliefs about whether the use is “original” (three-point scale) and their second-order beliefs about the percentage of other MTurk workers who report that the use is “original”. Each round, workers grade 100 items. Of the 100 items, 20 are repeats (10 unique items x 2 occurrences). Workers complete two to three 100-item rounds.
Treatments vary how the task is incentivized. There are six treatment arms:
TA: Control (N=200) - fixed payment
TB: Intrinsic motivation (N=160) - fixed payment
TC: Internal consistency (N=160) - Respondents receive a bonus for classifying repeated items the same way.
TD: Group consistency (N=160) - Workers receive an incentive based on the similarity of their second-order beliefs to others' reported first-order beliefs.
TE: Bayesian Truth Serum (BTS) (N=160) - Workers are incentivized using the BTS mechanism (sketched below this list).
TF: Attention checks (N=160) - Workers are incentivized to pay attention by giving a pre-specified answer for items containing a pre-specified word.
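For treatment TE, the following is a minimal sketch of the Bayesian Truth Serum score from Prelec (2004), which rewards answers that are more common than collectively predicted plus accurate predictions of the answer distribution. This is the textbook scoring rule, not necessarily the exact bonus formula used in this experiment; in practice the score would presumably be mapped into the bounded bonus described below.

```python
import numpy as np

def bts_scores(x: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Bayesian Truth Serum scores (Prelec 2004).

    x : (R, K) one-hot matrix of first-order answers (R respondents, K options).
    y : (R, K) matrix of second-order predictions (each row sums to one).
    alpha : weight on the prediction score.
    """
    eps = 1e-9                                   # avoid log(0)
    xbar = x.mean(axis=0) + eps                  # empirical answer frequencies
    ybar = np.exp(np.log(y + eps).mean(axis=0))  # geometric mean of predictions

    # Information score: reward answers that are "surprisingly common",
    # i.e., more frequent than the group predicted.
    info = (x * np.log(xbar / ybar)).sum(axis=1)

    # Prediction score: reward predictions close to the empirical frequencies.
    pred = (xbar * np.log((y + eps) / xbar)).sum(axis=1)

    return info + alpha * pred
```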
Workers first complete a practice round of 10 items. They then complete round 1 with 100 items, all unincentivized.
Before round 2, workers are randomized into treatments and then complete another 100 items.
After completing the second round, workers participate in a Becker-DeGroot-Marschak (BDM) auction for participation in a third round. Workers submit the lowest base wage for which they would agree to work another round under the same incentive scheme. Workers who bid below a randomized offer are required to participate in the third round at a base wage equal to the randomized offer.
The base wage will be US$6, and the total bonus will be up to US$4.
The MDE for a comparison between any two treatment arms will be 0.29 SDs (alpha = 0.05, power = 0.80).
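The third-round participation decision is a standard Becker-DeGroot-Marschak mechanism, so bidding one's true reservation wage is optimal. Below is a minimal sketch of the allocation rule described above; the uniform distribution of the randomized offer is a placeholder, since the registration does not specify it.

```python
import random

def bdm_third_round(bid: float, offer_low: float = 0.0, offer_high: float = 6.0):
    """BDM rule for round-3 participation.

    bid: lowest base wage at which the worker agrees to work another round.
    The uniform offer distribution is an assumption, not from the registration.
    Returns (participates, base_wage_paid).
    """
    offer = random.uniform(offer_low, offer_high)  # randomized wage offer
    if bid <= offer:
        # Reservation wage is at or below the offer: the worker completes the
        # third round and is paid the randomized offer, not the bid itself.
        return True, offer
    return False, 0.0
```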
Experimental Design Details
Randomization Method
By computer.
Randomization Unit
Individual
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
1000
Sample size: planned number of observations
1000
Sample size (or number of clusters) by treatment arms
TA: Control (N=200) - fixed payment
TB: Intrinsic motivation (N=160) - fixed payment
TC: Internal consistency (N=160) - Respondents receive a bonus for classifying repeated items the same way.
TD: Group consistency (N=160) - Workers receive an incentive based on the similarity of their second-order beliefs to others' reported first-order beliefs.
TE: Bayesian Truth Serum (BTS) (N=160) - Workers are incentivized using the BTS mechanism.
TF: Attention checks (N=160) - Workers are incentivized to pay attention by giving a pre-specified answer for items containing a pre-specified word.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
0.29 SDs (alpha = 0.05, power = 0.80).
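As a rough check on this figure, here is a sketch of a standard two-sample power calculation using statsmodels, assuming a two-sided t-test with equal variances at the arm sizes listed above; it gives values in the neighborhood of the registered 0.29 SDs but is not necessarily the calculation the authors performed.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()  # two-sided two-sample t-test

# Control (N=200) vs. one treatment arm (N=160); ratio = nobs2 / nobs1.
mde_ctrl = analysis.solve_power(nobs1=200, ratio=160 / 200, alpha=0.05, power=0.80)

# Two treatment arms of 160 workers each.
mde_treat = analysis.solve_power(nobs1=160, ratio=1.0, alpha=0.05, power=0.80)

print(f"MDE, control vs. treatment: {mde_ctrl:.2f} SD")     # roughly 0.30
print(f"MDE, treatment vs. treatment: {mde_treat:.2f} SD")  # roughly 0.31
```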
IRB

Institutional Review Boards (IRBs)

IRB Name
Innovations for Poverty Action
IRB Approval Date
2020-06-27
IRB Approval Number
7401
Analysis Plan

There is information in this trial unavailable to the public.

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials