Response of Output to Varying Incentive Structures on Amazon Turk
Last registered on April 07, 2017

Pre-Trial

Trial Information
General Information
Title
Response of Output to Varying Incentive Structures on Amazon Turk
RCT ID
AEARCTR-0000714
Initial registration date
May 14, 2015
Last updated
April 07, 2017 10:14 AM EDT
Location(s)
Primary Investigator
Affiliation
UC Berkeley
Other Primary Investigator(s)
PI Affiliation
University of Chicago Booth School of Business
Additional Trial Information
Status
Completed
Start date
2015-05-14
End date
2015-09-30
Secondary IDs
Abstract
This study will add to the body of research on incentive-induced behavior by quantifying the effect of various incentives on output. While a wide range of incentives has been shown to be effective (see Campbell, 2006), direct comparisons between different incentive types have been rare. This study aims to quantify some of the differences between types of incentives, particularly non-monetary ones (such as goal setting and comparison to others). The results of this study will form the basis for the elicitation of expert forecasts as registered in "Expert Forecasts of Amazon Turk Treatments" (AEARCTR-0000731).
External Link(s)
Registration Citation
Citation
DellaVigna, Stefano and Devin Pope. 2017. "Response of Output to Varying Incentive Structures on Amazon Turk." AEA RCT Registry. April 07. https://doi.org/10.1257/rct.714-4.0.
Former Citation
DellaVigna, Stefano and Devin Pope. 2017. "Response of Output to Varying Incentive Structures on Amazon Turk." AEA RCT Registry. April 07. http://www.socialscienceregistry.org/trials/714/history/15968.
Experimental Details
Interventions
Intervention(s)
After Amazon Mechanical Turk users select our task and complete the prerequisite survey, they will be taken to a page that contains the text below:

***

You have 5 minutes maximum to read this page. If you finish early, you may proceed to the next page at your discretion.

On the next page you will complete a simple button-pressing task. The object of this task is to alternately press the 'a' and 'b' buttons on your keyboard as quickly as possible for 10 minutes. Every time you successfully press the 'a' and then the 'b' button, you will receive a point. Note that points will only be rewarded when you alternate button pushes: just pressing the 'a' or 'b' button without alternating between the two will not result in points.

Buttons must be pressed by hand only (key bindings or automated button-pushing programs/scripts cannot be used) or task will not be approved.

Feel free to score as many points as you can.

***

At this point, the participant will be shown exactly one of the eighteen treatment sentences below, depending on the treatment to which they were assigned. These treatments are designed to provide users with different types of incentives.

Baseline:

1. Your score will not affect your payment in any way.

Basic Incentives:

2. As a bonus, you will be paid an extra 1 cent for every 100 points that you score. This bonus will be paid to your account within 24 hours.
3. As a bonus, you will be paid an extra 10 cents for every 100 points that you score. This bonus will be paid to your account within 24 hours.
4. As a bonus, you will be paid an extra 4 cents for every 100 points that you score. This bonus will be paid to your account within 24 hours.

Paying Too Little:

5. As a bonus, you will be paid an extra 1 cent for every 1,000 points that you score. This bonus will be paid to your account within 24 hours.

Charitable Giving:

6. As a bonus, the Red Cross charitable fund will be given 1 cent for every 100 points that you score.
7. As a bonus, the Red Cross charitable fund will be given 10 cents for every 100 points that you score.

Time Preferences:

8. As a bonus, you will be paid an extra 1 cent for every 100 points that you score. This bonus will be paid to your account two weeks from today.
9. As a bonus, you will be paid an extra 1 cent for every 100 points that you score. This bonus will be paid to your account four weeks from today.

Loss Aversion:

10. As a bonus, you will be paid an extra 40 cents if you score at least 2,000 points. This bonus will be paid to your account within 24 hours.
11. As a bonus, you will be paid an extra 40 cents. This bonus will be paid to your account within 24 hours. However, you will lose this bonus (it will not be placed in your account) unless you score at least 2,000 points.
12. As a bonus, you will be paid an extra 80 cents if you score at least 2,000 points. This bonus will be paid to your account within 24 hours.

Probability-Weighting:

13. As a bonus, you will have a 1% chance of being paid an extra $1 for every 100 points that you score. Approximately one out of every 100 participants who perform this task will be randomly chosen to be paid this reward. The bonus will be put in the winner's account within 24 hours.
14. As a bonus, you will have a 50% chance of being paid an extra 2 cents for every 100 points that you score. Approximately one out of two participants who perform this task will be randomly chosen to be paid this reward. The bonus will be put in the winner's account within 24 hours.

Social Comparison:

15. Your score will not affect your payment in any way. In a previous version of this task, many participants were able to score more than 2,000 points.
16. Your score will not affect your payment in any way. After you play, we will show you how well you did relative to other participants who have previously done this task.

Task Significance:

17. Your score will not affect your payment in any way. We are interested in how fast people choose to press digits and we would like you to do your very best. So please try as hard as you can.

Gift Exchange:

18. In appreciation to you for performing this task, you will be paid a bonus of 40 cents. This bonus will be paid to your account within 24 hours. Your score will not affect your payment in any way.
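To make the payoff structure of the paid arms easier to compare, the sketch below encodes each treatment's personal bonus as a function of the score. It is a hypothetical illustration, not the study's payment code: the table names and `expected_bonus_cents` are invented here, and it assumes that only complete blocks of points are paid (e.g. 250 points earn two blocks of 100). Charity arms (6, 7) are treated as paying no personal bonus, and delays (8, 9) are ignored.

```python
# Hypothetical encoding of the paid treatment arms; not the study's own code.
# Piece-rate arms: (cents per block, points per block, probability of being paid).
PIECE_RATE_ARMS = {
    2: (1, 100, 1.0),
    3: (10, 100, 1.0),
    4: (4, 100, 1.0),
    5: (1, 1000, 1.0),
    8: (1, 100, 1.0),      # paid two weeks later (delay ignored here)
    9: (1, 100, 1.0),      # paid four weeks later (delay ignored here)
    13: (100, 100, 0.01),  # $1 per 100 points with a 1% chance
    14: (2, 100, 0.5),     # 2 cents per 100 points with a 50% chance
}
# Threshold arms: (bonus in cents, required score).
THRESHOLD_ARMS = {10: (40, 2000), 11: (40, 2000), 12: (80, 2000)}
# Unconditional gift arm: bonus in cents.
GIFT_ARMS = {18: 40}


def expected_bonus_cents(treatment: int, points: int) -> float:
    """Expected personal bonus in cents, assuming only complete blocks are paid."""
    if treatment in PIECE_RATE_ARMS:
        cents, block, prob = PIECE_RATE_ARMS[treatment]
        return prob * cents * (points // block)
    if treatment in THRESHOLD_ARMS:
        cents, threshold = THRESHOLD_ARMS[treatment]
        return float(cents) if points >= threshold else 0.0
    if treatment in GIFT_ARMS:
        return float(GIFT_ARMS[treatment])
    # Baseline (1), charity (6, 7), and non-monetary arms (15-17): no personal bonus.
    return 0.0
```

Under these assumptions, a score of 2,050 points yields the same expected bonus (20 cents) in arms 2, 13, and 14, so the probability-weighting arms differ from arm 2 only in risk. Arms 10 and 11 have identical payoffs and differ only in gain versus loss framing.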
Intervention Start Date
2015-05-14
Intervention End Date
2015-06-14
Primary Outcomes
Primary Outcomes (end points)
The key outcome variable is the number of points scored by the subject, where each point is scored by pressing 'a' and then 'b'. Additionally, for each subject, we store data on their button presses over time, second by second.
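A minimal sketch of how the outcome could be computed from a stream of key presses. It assumes a point is awarded whenever 'b' is pressed immediately after 'a' (any other key, or a repeated key, breaks the pair) and that presses are logged as hypothetical `(seconds_since_start, key)` tuples. The registry does not publish the task software, so both function names are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def score(keys: Iterable[str]) -> int:
    """One point each time 'b' is pressed immediately after 'a'."""
    points = 0
    previous = None
    for key in keys:
        if key == "b" and previous == "a":
            points += 1
        previous = key
    return points


def presses_per_second(events: List[Tuple[float, str]]) -> Counter:
    """Bucket (seconds_since_start, key) events into whole-second bins."""
    return Counter(int(t) for t, _ in events)
```

For example, "abab" scores 2, while "aabb" scores 1 because only the middle 'a' is directly followed by a 'b'.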
Primary Outcomes (explanation)
Secondary Outcomes
Secondary Outcomes (end points)
Secondary Outcomes (explanation)
Experimental Design
Experimental Design
Subjects will choose to participate in this study by selecting it on Amazon's Mechanical Turk service. Before choosing to participate, they will see a brief description of the study, which also states the guaranteed flat payment for a successful submission and an estimated completion time. Once in the survey, participants will answer a brief set of demographic questions (such as age, sex, and education). Participants will then be directed to complete a task under a randomly assigned incentive structure. The task is to see how many times a participant can alternately press two keyboard buttons ('a' and 'b') within a fixed time period (10 minutes). There are 18 different randomly assigned treatments with varying levels and types of incentives. For example, some participants will receive bonus payments based on points scored, others will raise money for charity, and some will receive no bonus. All participants are clearly informed of their incentive structure and bonus opportunities before performing the task. There is no deception at any point in this task.

Upon completion of the task, participants will be thanked for their contribution, and the flat payment of $1, along with any additional money earned, will be paid within the designated time period for their treatment (typically within 24 hours). An example of the sequence of pages subjects see is attached ("SurveyWalkThrough.pdf").

The final sample will exclude subjects that (i) do not complete the MTurk task within 30 minutes of starting or (ii) exit then re-enter the task as a new subject (as these individuals might see multiple treatments) or (iii) score 4000 or more points (as we have learned from a pilot study of ~300 participants that it is physically impossible to score more than 3500 points, so it is likely that these individuals are using bots).
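The three exclusion rules can be sketched as a simple filter. The record layout (`Attempt`, `worker_id`, `minutes_to_complete`, `points`) is hypothetical, and detecting re-entry as the same MTurk worker ID appearing more than once is an assumption; the registry does not say how re-entry is identified.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attempt:
    worker_id: str                        # hypothetical: MTurk worker ID for this attempt
    minutes_to_complete: Optional[float]  # None if the task was never completed
    points: int


def final_sample(attempts: List[Attempt]) -> List[Attempt]:
    """Apply the three pre-registered exclusion rules."""
    # Assumption: re-entry shows up as the same worker ID appearing more than once.
    counts = Counter(a.worker_id for a in attempts)
    kept = []
    for a in attempts:
        if a.minutes_to_complete is None or a.minutes_to_complete > 30:
            continue  # (i) did not complete within 30 minutes of starting
        if counts[a.worker_id] > 1:
            continue  # (ii) exited and re-entered, so may have seen multiple treatments
        if a.points >= 4000:
            continue  # (iii) implausibly high score, likely automated
        kept.append(a)
    return kept
```

Dropping every attempt by a repeated worker, rather than keeping the first one, follows the registry's wording that such individuals are excluded because they might have seen multiple treatments.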

The average score in each of these 18 treatments will form the basis for the elicitation of expert forecasts as registered in "Expert Forecasts of Amazon Turk Treatments". To ensure that the principal investigators can make their own forecasts and that they will not interfere in the forecasting of other experts, the principal investigators will not access the experimental results until after the first round of forecasting. Results will be stored by Michael Sheldon (at Chicago Booth) and Don Moore (at UC Berkeley).
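As a hypothetical illustration of the per-treatment averages that forecasters are asked to predict, the sketch below groups `(treatment, points)` pairs and reports the count, mean, and standard error of the mean for each arm. The input format is assumed; the registry states only that average scores per treatment are used.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev
from typing import Dict, Iterable, List, Tuple


def treatment_means(rows: Iterable[Tuple[int, int]]) -> Dict[int, Tuple[int, float, float]]:
    """Map each treatment to (n, mean points, standard error of the mean)."""
    by_arm: Dict[int, List[int]] = defaultdict(list)
    for arm, points in rows:
        by_arm[arm].append(points)
    out = {}
    for arm, pts in sorted(by_arm.items()):
        # The standard error is undefined with a single observation.
        se = stdev(pts) / sqrt(len(pts)) if len(pts) > 1 else float("nan")
        out[arm] = (len(pts), mean(pts), se)
    return out
```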
Experimental Design Details
Randomization Method
A random number generator will assign each participant to a treatment group, subject to the constraint that each group receives an equal number of participants.
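The registry does not name the randomization algorithm. One standard way to guarantee equal group sizes is permuted-block randomization, sketched below under that assumption: each block contains every treatment exactly once in random order, so arm sizes never differ by more than one at any point in recruitment.

```python
import random
from typing import Iterator, List


def balanced_assignments(n_arms: int, seed: int = 0) -> Iterator[int]:
    """Yield treatment numbers 1..n_arms in shuffled blocks (permuted-block randomization)."""
    rng = random.Random(seed)
    while True:
        block: List[int] = list(range(1, n_arms + 1))
        rng.shuffle(block)  # random order within each block
        yield from block    # every arm appears exactly once per block
```

Drawing 540 assignments for 18 arms, for instance, gives exactly 30 subjects per arm.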
Randomization Unit
Individual subject.
Was the treatment clustered?
No
Experiment Characteristics
Sample size: planned number of clusters
n/a
Sample size: planned number of observations
The number of clusters is the same as the number of observations. 10,000 people is the ideal number of subjects planned for the study. The hope is to obtain at least 5500 subjects. The task will be kept open on Amazon Mechanical Turk until either (i) two weeks have passed or (ii) 10,000 subjects have completed the study, whichever comes first. If two weeks pass without 5500 subjects completing the task, then the task will be kept open (up to six weeks) until 5500 subjects are obtained.
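The stopping rule above can be written as a small decision function. This is a sketch, and `should_close` is a hypothetical name; it assumes the six-week limit closes the task even if 5,500 subjects have not been reached, which is one reading of "up to six weeks".

```python
def should_close(days_open: float, completed: int) -> bool:
    """Return True once the pre-registered stopping rule says to close the task."""
    if completed >= 10_000:
        return True  # ideal sample reached
    if days_open >= 14 and completed >= 5_500:
        return True  # two weeks passed and minimum sample reached
    if days_open >= 42:
        return True  # assumed hard cap at six weeks
    return False
```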
Sample size (or number of clusters) by treatment arms
It was determined that we wanted to reach a sample of at least 300 subjects per treatment for a total of 300*18 = 5400 subjects so as to attain sufficiently precise estimates of the productivity per treatment. The ideal sample that we hope to achieve is approximately twice as large as that.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
We ran a small pilot to ensure that our protocol was working and that our online task could generate differential effort. The pilot also gave us an estimate of the standard deviation of points scored in our task. Based on 393 pilot participants, the standard deviation of points scored was around 740 and was similar across treatments. Assuming that this is approximately the standard deviation of each treatment in the experiment, and assuming a sample size of 5,500 (305 per treatment), there is 80% power to reject the null hypothesis of zero difference in average points between two treatments when the actual difference between them is 168.1 points. Assuming instead a sample size of 10,000 (555 per treatment), there is 80% power to reject the null hypothesis of zero difference when the actual difference is 124.6 points. Based on our pilot, different treatments can create differences in average points scored of as much as 400-500 points, and differences of that size can easily be detected statistically given the preceding calculations.
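The figures above can be approximately reproduced with the textbook minimum-detectable-effect formula for two equal-sized arms, MDE = (z_{1-α/2} + z_{1-β}) · σ · sqrt(2/n). The sketch assumes a two-sided test at the 5% level and a normal approximation; the registry does not state the exact formula used.

```python
from math import sqrt
from statistics import NormalDist


def mde(sd: float, n_per_arm: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Minimum detectable difference in means, two-sided two-sample test (normal approx.)."""
    z = NormalDist()
    se = sd * sqrt(2 / n_per_arm)  # standard error of a difference in two means
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * se


def power_for(diff: float, sd: float, n_per_arm: int, alpha: float = 0.05) -> float:
    """Approximate power to detect a true difference `diff` (ignores the far tail)."""
    z = NormalDist()
    se = sd * sqrt(2 / n_per_arm)
    return z.cdf(abs(diff) / se - z.inv_cdf(1 - alpha / 2))
```

With σ = 740, `mde(740, 305)` and `mde(740, 555)` come out within a fraction of a point of the registry's 168.1 and 124.6. Small gaps may reflect the use of a t-distribution rather than the normal approximation. `power_for(400, 740, 305)` is essentially 1, consistent with the claim that 400-500 point differences are easy to detect.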
Supporting Documents and Materials

There are documents in this trial unavailable to the public.
IRB
INSTITUTIONAL REVIEW BOARDS (IRBs)
IRB Name
Social and Behavioral Sciences Institutional Review Board (SBS-IRB) at the University of Chicago
IRB Approval Date
2014-11-03
IRB Approval Number
IRB14-1321
Post-Trial
Post Trial Information
Study Withdrawal
Intervention
Is the intervention completed?
Yes
Intervention Completion Date
June 14, 2015, 12:00 AM +00:00
Is data collection complete?
Yes
Data Collection Completion Date
September 30, 2015, 12:00 AM +00:00
Final Sample Size: Number of Clusters (Unit of Randomization)
N/A
Was attrition correlated with treatment status?
No
Final Sample Size: Total Number of Observations
The final sample includes 9,861 subjects.
Final Sample Size (or Number of Clusters) by Treatment Arms
Approximately 550 subjects for each treatment arm (with 18 treatment arms)
Data Publication
Data Publication
Is public data available?
No
Program Files
Program Files
No
Reports, Papers & Other Materials
Relevant Paper(s)
Abstract
Academic experts frequently recommend policies and treatments. But how well do they anticipate the impact of different treatments? And how do their predictions compare to the predictions of non-experts? We analyze how 208 experts forecast the results of 15 treatments involving monetary and non-monetary motivators in a real-effort task. We compare these forecasts to those made by PhD students and non-experts: undergraduates, MBAs, and an online sample. We document seven main results. First, the average forecast of experts predicts quite well the experimental results. Second, there is a strong wisdom-of-crowds effect: the average forecast outperforms 96 percent of individual forecasts. Third, correlates of expertise (citations, academic rank, field, and contextual experience) do not improve forecasting accuracy. Fourth, experts as a group do better than non-experts, but not if accuracy is defined as rank ordering treatments. Fifth, measures of effort, confidence, and revealed ability are predictive of forecast accuracy to some extent, especially for non-experts. Sixth, using these measures we identify ‘superforecasters’ among the non-experts who outperform the experts out of sample. Seventh, we document that these results on forecasting accuracy surprise the forecasters themselves. We present a simple model that organizes several of these results and we stress the implications for the collection of forecasts of future experimental results.
Citation
"Predicting Experimental Results: Who Knows What?" This version: August 16, 2016
Abstract
How much do different monetary and non-monetary motivators induce costly effort? Does the effectiveness line up with the expectations of researchers and with results in the literature? We conduct a large-scale real-effort experiment with 18 treatment arms. We examine the effect of (i) standard incentives; (ii) behavioral factors like social preferences and reference dependence; and (iii) non-monetary inducements from psychology. We find that (i) monetary incentives work largely as expected, including a very low piece rate treatment which does not crowd out effort; (ii) the evidence is partly consistent with standard behavioral models, including warm glow, though we do not find evidence of probability weighting; (iii) the psychological motivators are effective, but less so than incentives. We then compare the results to forecasts by 208 academic experts. On average, the experts anticipate several key features, like the effectiveness of psychological motivators. A sizeable share of experts, however, expects crowd-out, probability weighting, and pure altruism, counterfactually. As a further comparison, we present a meta-analysis of similar treatments in the literature. Overall, predictions based on the literature are correlated with, but underperform, the expert forecasts.
Citation
"What Motivates Effort? Evidence and Expert Forecasts." This version: March 15, 2017.
REPORTS & OTHER MATERIALS