Performance pay for introspection

Pre-Trial

Trial Information

General Information

Title
Performance pay for introspection
RCT ID
AEARCTR-0007425
Initial registration date
March 27, 2021

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
March 29, 2021, 11:00 AM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Last updated
October 02, 2021, 10:57 AM EDT

Last updated is the most recent time when changes to the trial's registration were published.

Locations

Region

Primary Investigator

Affiliation
University of Chicago

Other Primary Investigator(s)

PI Affiliation
Harvard University

Additional Trial Information

Status
In development
Start date
2021-03-30
End date
2021-11-30
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
We examine the effects of different incentive schemes on effort in a subjective classification task on the labor market platform MTurk.
Subjective classification tasks are a common strategy for gathering information on tastes and attitudes and for developing training datasets for artificial intelligence. When the output of the task is subjective, designing effective incentive schemes is challenging because effort is difficult to observe. However, opinions are often given freely, suggesting that the cost of providing a subjective opinion is low (or even negative), so incentivizing effort may be ineffective or even counterproductive if monitoring and incentives crowd out altruistic motives, generate multi-tasking problems, or encourage gaming.

This pilot broadly investigates how to incentivize respondents to perform a simple introspective task: classifying responses to an open-ended question according to “originality”. MTurk workers are asked to report both their first-order beliefs (what they think) and their second-order beliefs (what they think others think). Using several measures of respondent effort, we will compare the performance of workers under a range of incentive schemes, including fixed-wage schemes, various forms of attention checks, and the Bayesian Truth Serum (BTS).

For each incentive scheme, we will examine the level of effort using a range of novel outcomes, including an incentivized measure of the degree of disutility associated with the task. We will use these measures to examine whether performance incentives for subjective tasks increase the level of effort, whether linking performance pay to particular sub-tasks crowds out effort on other tasks, and whether the effects of different incentive schemes are heterogeneous across people with different predispositions to exert effort when there are no performance incentives.
External Link(s)

Registration Citation

Citation
Gray-Lobe, Guthrie and Peter Hickman. 2021. "Performance pay for introspection." AEA RCT Registry. October 02. https://doi.org/10.1257/rct.7425-1.1
Experimental Details

Interventions

Intervention(s)
We examine the effects of different incentive schemes on effort in a subjective classification task on the labor market platform MTurk.
Subjective classification tasks are a common strategy for gathering information on tastes and attitudes and for developing training datasets for artificial intelligence. When the output of the task is subjective, designing effective incentive schemes is challenging because effort is difficult to observe. However, opinions are often given freely, suggesting that the cost of providing a subjective opinion is low (or even negative), so incentivizing effort may be ineffective or even counterproductive if monitoring and incentives crowd out altruistic motives, generate multi-tasking problems, or encourage gaming.

This pilot broadly investigates how to incentivize respondents to perform a simple introspective task: classifying responses to an open-ended question according to “originality”. MTurk workers are asked to report both their first-order beliefs (what they think) and their second-order beliefs (what they think others think). Using several measures of respondent effort, we will compare the performance of workers under a range of incentive schemes, including fixed-wage schemes, various forms of attention checks, and the Bayesian Truth Serum (BTS).

For each incentive scheme, we will examine the level of effort using a range of novel outcomes, including an incentivized measure of the degree of disutility associated with the task. We will use these measures to examine whether performance incentives for subjective tasks increase the level of effort, whether linking performance pay to particular sub-tasks crowds out effort on other tasks, and whether the effects of different incentive schemes are heterogeneous across people with different predispositions to exert effort when there are no performance incentives.
Intervention Start Date
2021-04-01
Intervention End Date
2021-04-27

Primary Outcomes

Primary Outcomes (end points)
Time on task
Internal consistency (share of repeated items classified the same way)
Group consistency (average absolute difference between a worker's report and the mean group report)
Negative reservation base wage (exact amount of the bid)
Negative payout-adjusted reservation base wage (bid minus an adjustment for the amount the worker was paid out)

We will define gaming behaviors for the internal consistency strategy as:
The maximum first-order belief option share (e.g., out of three options, the share of the option used most frequently)
Negative similarity of individual responses to those of other workers
Negative time on task
Degenerate second-order beliefs (e.g., reporting 100 percent in the second-order beliefs elicitation module)
Primary Outcomes (explanation)
Time on task, internal consistency, and group consistency will be measured for the full task (first- and second-order beliefs) and separately for first-order and second-order beliefs. We will refer to a generic measure of effort as e, and to effort on first- (second-) order beliefs as e_1 (e_2).
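For reference, here is a minimal sketch of how these effort measures could be computed from item-level response data. The column names (worker_id, item_id, first_order, response_time) are hypothetical placeholders, and the same calculations would be run for the full task and separately for first- and second-order beliefs; this is not the authors' analysis code.

```python
import pandas as pd

def effort_measures(df: pd.DataFrame) -> pd.DataFrame:
    """Worker-level effort measures from item-level responses.

    Expects hypothetical columns: worker_id, item_id, first_order (1-3 rating),
    response_time (seconds). Repeated items share the same item_id.
    """
    # Time on task: total seconds spent by each worker.
    time_on_task = df.groupby("worker_id")["response_time"].sum()

    # Internal consistency: among items a worker saw twice, the share
    # classified the same way in both occurrences.
    def internal(worker: pd.DataFrame) -> float:
        reps = worker.groupby("item_id")["first_order"].agg(["count", "nunique"])
        reps = reps[reps["count"] > 1]
        return float((reps["nunique"] == 1).mean()) if len(reps) else float("nan")

    internal_consistency = df.groupby("worker_id").apply(internal)

    # Group consistency: average absolute difference between a worker's report
    # and the mean report on the same item (the registration does not say
    # whether the worker's own report is excluded from the item mean).
    item_mean = df.groupby("item_id")["first_order"].transform("mean")
    abs_diff = (df["first_order"] - item_mean).abs()
    group_consistency = abs_diff.groupby(df["worker_id"]).mean()

    return pd.DataFrame({
        "time_on_task": time_on_task,
        "internal_consistency": internal_consistency,
        "group_consistency": group_consistency,
    })
```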

Secondary Outcomes

Secondary Outcomes (end points)
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
The data for this study come from an MTurk subjective classification task. Workers are asked to classify reports from Kenyan pupils on possible uses of a spoon in terms of their degree of “originality”. Workers report their first-order beliefs about whether the use is “original” (three-point scale) and their second-order beliefs about the percentage of other MTurk workers who report that the use is “original”. Each round, workers grade 100 items. Of the 100 items, 20 are repeats (10 unique items x 2 occurrences). Workers complete two to three 100-item rounds.
Treatments vary how the task is incentivized. There are six treatment arms:
TA: Control (N=200) - fixed payment
TB: Intrinsic motivation (N=160) - fixed payment
TC: Internal consistency (N=160) - Respondents receive a bonus for classifying repeated items the same way.
TD: Group consistency (N=160) - Workers receive an incentive based on the similarity of their second-order beliefs to others' reported first-order beliefs.
TE: Bayesian Truth Serum (BTS) (N=160) - Workers are incentivized using the BTS mechanism (sketched below this list).
TF: Attention checks (N=160) - Workers are incentivized to pay attention by giving a pre-specified answer for items containing a pre-specified word.
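For treatment TE, the following is a minimal sketch of the Bayesian Truth Serum score from Prelec (2004), which rewards answers that are more common than collectively predicted plus accurate predictions of the answer distribution. This is the textbook scoring rule, not necessarily the exact bonus formula used in this experiment; in practice the score would presumably be mapped into the bounded bonus described below.

```python
import numpy as np

def bts_scores(x: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Bayesian Truth Serum scores (Prelec 2004).

    x : (R, K) one-hot matrix of first-order answers (R respondents, K options).
    y : (R, K) matrix of second-order predictions (each row sums to one).
    alpha : weight on the prediction score.
    """
    eps = 1e-9                                   # avoid log(0)
    xbar = x.mean(axis=0) + eps                  # empirical answer frequencies
    ybar = np.exp(np.log(y + eps).mean(axis=0))  # geometric mean of predictions

    # Information score: reward answers that are "surprisingly common",
    # i.e., more frequent than the group predicted.
    info = (x * np.log(xbar / ybar)).sum(axis=1)

    # Prediction score: reward predictions close to the empirical frequencies.
    pred = (xbar * np.log((y + eps) / xbar)).sum(axis=1)

    return info + alpha * pred
```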
Workers first complete a practice round of 10 items. They then complete round 1 with 100 items, all unincentivized.
Before round 2, workers are randomized into treatments and then complete another 100 items.
After completing the second round, workers participate in a Becker-DeGroot-Marschak (BDM) auction for participation in a third round. Workers submit the lowest base wage for which they would agree to work another round under the same incentive scheme. Workers who bid below a randomized offer are required to participate in the third round at a base wage equal to the randomized offer.
The base wage will be US$6, and the total bonus will be up to US$4.
The MDE for a comparison between any two treatment arms will be 0.29 SDs (alpha = 0.05, power = 0.80).
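The third-round participation decision is a standard Becker-DeGroot-Marschak mechanism, so bidding one's true reservation wage is optimal. Below is a minimal sketch of the allocation rule described above; the uniform distribution of the randomized offer is a placeholder, since the registration does not specify it.

```python
import random

def bdm_third_round(bid: float, offer_low: float = 0.0, offer_high: float = 6.0):
    """BDM rule for round-3 participation.

    bid: lowest base wage at which the worker agrees to work another round.
    The uniform offer distribution is an assumption, not from the registration.
    Returns (participates, base_wage_paid).
    """
    offer = random.uniform(offer_low, offer_high)  # randomized wage offer
    if bid <= offer:
        # Reservation wage is at or below the offer: the worker completes the
        # third round and is paid the randomized offer, not the bid itself.
        return True, offer
    return False, 0.0
```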
Experimental Design Details
Randomization Method
By computer.
Randomization Unit
Individual
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
1000
Sample size: planned number of observations
1000
Sample size (or number of clusters) by treatment arms
TA: Control (N=200) - fixed payment
TB: Intrinsic motivation (N=160) - fixed payment
TC: Internal consistency (N=160) - Respondents receive a bonus for classifying repeated items the same way.
TD: Group consistency (N=160) - Workers receive an incentive based on the similarity of their second-order beliefs to others' reported first-order beliefs.
TE: Bayesian Truth Serum (BTS) (N=160) - Workers are incentivized using the BTS mechanism.
TF: Attention checks (N=160) - Workers are incentivized to pay attention by giving a pre-specified answer for items containing a pre-specified word.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
0.29 SDs (alpha = 0.05, power = 0.80).
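As a rough check on this figure, here is a sketch of a standard two-sample power calculation using statsmodels, assuming a two-sided t-test with equal variances at the arm sizes listed above; it gives values in the neighborhood of the registered 0.29 SDs but is not necessarily the calculation the authors performed.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()  # two-sided two-sample t-test

# Control (N=200) vs. one treatment arm (N=160); ratio = nobs2 / nobs1.
mde_ctrl = analysis.solve_power(nobs1=200, ratio=160 / 200, alpha=0.05, power=0.80)

# Two treatment arms of 160 workers each.
mde_treat = analysis.solve_power(nobs1=160, ratio=1.0, alpha=0.05, power=0.80)

print(f"MDE, control vs. treatment: {mde_ctrl:.2f} SD")     # roughly 0.30
print(f"MDE, treatment vs. treatment: {mde_treat:.2f} SD")  # roughly 0.31
```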
IRB

Institutional Review Boards (IRBs)

IRB Name
Innovations for Poverty Action
IRB Approval Date
2020-06-27
IRB Approval Number
7401
Analysis Plan

There is information in this trial unavailable to the public.

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials