
Expert Forecasts of Real-Effort Experiment
Last registered on July 05, 2019

Pre-Trial

Trial Information
General Information
Title
Expert Forecasts of Real-Effort Experiment
RCT ID
AEARCTR-0000731
Initial registration date
July 09, 2015
Last updated
July 05, 2019 12:58 AM EDT
Location(s)
Primary Investigator
Affiliation
UC Berkeley
Other Primary Investigator(s)
PI Affiliation
University of Chicago Booth School of Business
Additional Trial Information
Status
Completed
Start date
2015-05-14
End date
2015-12-31
Secondary IDs
Abstract
This study uses data on how output responds to various behavioral conditions in a real-effort experiment. The data were gathered as part of the project registered as "Response of Output to Varying Incentive Structures on Amazon Turk" (AEARCTR-0000714). The behavioral treatments examine, among other things, the response to incentives, altruistic motives, loss aversion, and gift exchange. A group of forecasters, including experts in economics, psychology, and decision-making, will be asked to predict the output resulting from each of the conditions. We then compare these forecasts to the actual results to examine the relevance of expertise to forecasting experimental results.
External Link(s)
Registration Citation
Citation
DellaVigna, Stefano and Devin Pope. 2019. "Expert Forecasts of Real-Effort Experiment." AEA RCT Registry. July 05. https://doi.org/10.1257/rct.731-3.0.
Former Citation
DellaVigna, Stefano and Devin Pope. 2019. "Expert Forecasts of Real-Effort Experiment." AEA RCT Registry. July 05. https://www.socialscienceregistry.org/trials/731/history/49372.
Experimental Details
Interventions
Intervention(s)
Participants will be directed to a survey on Qualtrics. The survey first describes the MTurk task performed in the project registered as "Response of Output to Varying Incentive Structures on Amazon Turk". Specifically, participants will be shown the text below:

We ran a large, pre-registered experiment using Amazon's Mechanical Turk (MTurk), an online marketplace where individuals around the world are paid to complete small tasks. The MTurk participants who completed our study are primarily from the United States (85%) or India (12%) and are well-educated (55% self-report having a college degree). The gender and age breakdown of the participants closely mirrors the US population as a whole (54% female, 52% age 18-30, 39% age 31-50).

The MTurk participants in our study initially agreed to perform a simple task that takes 10 minutes in return for a fixed participation fee of $1.00. They were not given information about the task or about possible bonus money before agreeing to participate. As part of the experiment, they were offered different bonus payments to encourage them to perform well.

In bold below is the task exactly as it was described to the MTurk participants:

On the next page you will play a simple button-pressing task. The object of this task is to alternately press the 'a' and 'b' buttons on your keyboard as quickly as possible for 10 minutes. Every time you successfully press the 'a' and then the 'b' button, you will receive a point. Note that points will only be rewarded when you alternate button pushes: just pressing the 'a' or 'b' button without alternating between the two will not result in points.

Buttons must be pressed by hand only (key-bindings or automated button-pushing programs/scripts cannot be used) or the task will not be approved.

Feel free to score as many points as you can.

[The participant would then see a different final paragraph depending on the condition to which they were randomly assigned]

The survey-takers are then given the opportunity to try a practice version of the button-pushing task, to get a better sense of the task the MTurk workers actually performed, and to view the instructions/walkthrough for that task as a PDF. Neither the practice task nor the instructions are mandatory.
Next, the 18 treatments listed below are displayed. The actual average scores of participants in the first three treatments are shown as reference points, and survey-takers are then asked to predict the average number of points scored in each of the remaining treatments. Each prediction is recorded with a draggable slider.

1. Your score will not affect your payment in any way. [1522 points]
2. As a bonus, you will be paid an extra 1 cent for every 100 points you score. This bonus will be paid to your account within 24 hours. [2028 points]
3. As a bonus, you will be paid an extra 10 cents for every 100 points you score. This bonus will be paid to your account within 24 hours. [2175 points]
4. As a bonus, you will be paid an extra 4 cents for every 100 points that you score. This bonus will be paid to your account within 24 hours.
5. As a bonus, you will be paid an extra 1 cent for every 1,000 points that you score. This bonus will be paid to your account within 24 hours.
6. As a bonus, the Red Cross charitable fund will be given 1 cent for every 100 points that you score.
7. As a bonus, the Red Cross charitable fund will be given 10 cents for every 100 points that you score.
8. As a bonus, you will be paid an extra 1 cent for every 100 points that you score. This bonus will be paid to your account two weeks from today.
9. As a bonus, you will be paid an extra 1 cent for every 100 points that you score. This bonus will be paid to your account four weeks from today.
10. As a bonus, you will be paid an extra 40 cents if you score at least 2,000 points. This bonus will be paid to your account within 24 hours.
11. As a bonus, you will be paid an extra 40 cents. This bonus will be paid to your account within 24 hours. However, you will lose this bonus (it will not be placed in your account) unless you score at least 2,000 points.
12. As a bonus, you will be paid an extra 80 cents if you score at least 2,000 points. This bonus will be paid to your account within 24 hours.
13. As a bonus, you will have a 1% chance of being paid an extra $1 for every 100 points that you score. One out of every 100 participants who perform this task will be randomly chosen to be paid this reward. The bonus will be put in the winner's account within 24 hours.
14. As a bonus, you will have a 50% chance of being paid an extra 2 cents for every 100 points that you score. One out of two participants who perform this task will be randomly chosen to be paid this reward. The bonus will be put in the winner's account within 24 hours.
15. Your score will not affect your payment in any way. In a previous version of this task, many participants were able to score more than 2,000 points.
16. Your score will not affect your payment in any way. After you play, we will show you how well you did relative to other participants who have previously done this task.
17. Your score will not affect your payment in any way. We are interested in how fast people choose to press digits and we would like you to do your very best. So please try as hard as you can.
18. In appreciation to you for performing this task, you will be paid a bonus of 40 cents. This bonus will be paid to your account within 24 hours. Your score will not affect your payment in any way.
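The treatments differ mainly in how a score maps to pay. The Python sketch below encodes a few of them, assuming that only completed blocks of points are paid (the registry does not say how partial blocks are handled); all function names are hypothetical. It makes explicit that treatments 10 and 11 pay the same amount at every score and differ only in framing.

```python
def piece_rate_bonus(score: int, cents_per_block: int, block: int = 100) -> int:
    """Bonus in cents under a piece rate of `cents_per_block` per `block` points.

    Assumption: only completed blocks are paid (floor division); the
    registry does not say how a partial block of points is handled.
    """
    return cents_per_block * (score // block)


def gain_framed_bonus(score: int, cents: int = 40, threshold: int = 2000) -> int:
    """Treatment 10: the bonus is paid only if the score reaches the threshold."""
    return cents if score >= threshold else 0


def loss_framed_bonus(score: int, cents: int = 40, threshold: int = 2000) -> int:
    """Treatment 11: the bonus is promised up front and then forfeited
    unless the score reaches the threshold."""
    promised = cents
    forfeited = 0 if score >= threshold else cents
    return promised - forfeited


# Treatment 2 (1 cent per 100 points) at the reported 2,028-point average:
print(piece_rate_bonus(2028, 1))  # 20 (cents, under the floor assumption)
# The two framings pay the same amount at every score from 0 to 4,000:
print(all(gain_framed_bonus(s) == loss_framed_bonus(s) for s in range(4001)))  # True
```

Under this sketch, treatments 10 and 11 are payoff-equivalent, so any difference in output between them would reflect the loss framing rather than the incentive itself.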

As added encouragement, five survey-takers will be chosen at random and paid according to the accuracy of their predictions. Specifically, each of these five individuals will receive $1,000 - (MSE/200), where MSE, the mean squared error, is the average of the squared differences between his or her forecasts and the actual average scores.
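The payment rule above can be checked with a short Python sketch. The function name and the example scores are hypothetical; the registry does not mention a floor at zero, so none is applied.

```python
def accuracy_payment(forecasts, actuals):
    """Accuracy payment in dollars: 1,000 - MSE/200.

    forecasts and actuals are average scores (points) for the same
    treatments, in the same order. No floor at zero is applied, since
    the registry text does not mention one.
    """
    if len(forecasts) != len(actuals) or not actuals:
        raise ValueError("need one forecast per treatment")
    mse = sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)
    return 1000 - mse / 200


# Made-up example: every forecast off by 100 points.
# MSE = 10,000, so the payment is $1,000 - $50.
print(accuracy_payment([1600, 1900], [1500, 2000]))  # 950.0
```

By the same arithmetic, missing every treatment by roughly 447 points (MSE of about 200,000) would reduce the payment to zero.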

Next, the second page of the survey asks how many of his or her forecasts the participant believes fell within 100 points of the actual values. The participant is then asked a parallel question: on average, how many forecasts within 100 points of the actual values does he or she think each of the following nine groups of survey-takers made?

1. Professors with expertise in behavioral economics or decision making who recently presented at or served on a program committee for select behavioral economics or decision making conferences (e.g., SITE and BDRM)
2. The 15 most-cited professors from Group #1 who respond to our survey
3. Professors from Group #1 with a PhD in economics
4. Professors from Group #1 with a PhD in psychology or decision making
5. PhD students in economics from UC Berkeley and the University of Chicago
6. PhD students from Group #5 who are specializing in behavioral economics
7. MBA students from the Booth School of Business at the University of Chicago
8. MTurk workers who make predictions after completing the button-pushing task in one of the conditions
9. MTurk workers who make predictions without participating in the button-pushing task
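The self-assessment questions above count forecasts that land within 100 points of the actual values. A minimal Python sketch of that count follows; the function name and example numbers are hypothetical, and it assumes a miss of exactly 100 points counts as "within".

```python
def count_within(forecasts, actuals, tolerance=100):
    """Number of forecasts within `tolerance` points of the actual value.

    Assumption: a miss of exactly `tolerance` points counts as "within".
    """
    return sum(abs(f - a) <= tolerance for f, a in zip(forecasts, actuals))


# Made-up example: misses of 100, 100, and 300 points.
print(count_within([1600, 1900, 2300], [1500, 2000, 2000]))  # 2
```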

Finally, the participant will be asked if he/she has ever heard of or used Amazon Mechanical Turk in the past.
Intervention Start Date
2015-05-14
Intervention End Date
2015-07-09
Primary Outcomes
Primary Outcomes (end points)
The forecasts themselves.
Primary Outcomes (explanation)
Secondary Outcomes
Secondary Outcomes (end points)
Secondary Outcomes (explanation)
Experimental Design
Experimental Design
We will contact approximately 300 experts by email to ask for their forecasts. The email will include a link to a Qualtrics survey that explains the study and reports the results of the first three treatments. The experts are then invited to forecast the average effort in each of the remaining 15 treatments using a slider scale. We also elicit the experts' confidence in their forecasts and, to encourage accuracy, pay up to $1,000 to each of five randomly selected forecasters.

We determined the group of experts as follows. We collected the list of all authors of papers presented at the Psychology and Economics and the Experimental Economics sessions of the Stanford Institute for Theoretical Economics (SITE) from their inception until 2014 (for all years in which the program is online). We combined this list with the participants of the Behavioral Economics Annual Meeting (BEAM) conferences from 2009 to 2014, with the program committee members and keynote speakers of the Behavioral Decision Research in Management Conference (BDRM) in 2010, 2012, and 2014, and with the invitees to the Russell Sage Foundation 2014 Workshop on "Behavioral Labor Economics". We also added researchers with at least five highly cited papers (at least 100 Google Scholar citations each) in relevant keywords, plus a small number of additional experts whom the criteria above would have missed. This yielded an initial list of over 600 people. Since we did not want to be seen as spamming researchers, we pared the list down to about 300 researchers with whom at least one of the two PIs had some connection.
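The list construction above amounts to pooling several source lists, removing duplicates, and filtering on a PI connection. A minimal Python sketch follows; the inputs and the `has_pi_connection` predicate are hypothetical stand-ins, and exact name matching is a simplification.

```python
def build_invite_list(source_lists, has_pi_connection):
    """Pool several candidate lists, de-duplicate, and keep connected names.

    source_lists: iterable of name lists (hypothetical stand-ins for the
    SITE, BEAM, BDRM, Russell Sage, and citation-based lists).
    has_pi_connection: hypothetical predicate, True if a PI knows the person.
    Names are matched exactly after trimming whitespace; real data would
    need more careful matching (initials, accents, name changes).
    """
    pool = set()
    for names in source_lists:
        pool.update(name.strip() for name in names)
    return sorted(name for name in pool if has_pi_connection(name))


site = ["Researcher A", "Researcher B"]
beam = ["Researcher B ", "Researcher C"]
print(build_invite_list([site, beam], lambda n: n != "Researcher C"))
```

Using a set for the pool means a researcher who appears on several source lists is invited only once.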

We notify each of these researchers by email, inviting them to forecast the results of a real-effort experiment and providing a link to a unique version of the survey. We then store the forecasts of those who respond and send a reminder email to those who have not responded within one week.
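The one-week reminder rule can be expressed as a small predicate. This Python sketch is illustrative only; the function name and the dates are hypothetical.

```python
from datetime import date, timedelta


def needs_reminder(invited_on: date, responded: bool, today: date,
                   wait: timedelta = timedelta(weeks=1)) -> bool:
    """True if the invitee has not responded and at least `wait` has passed."""
    return not responded and today - invited_on >= wait


# Hypothetical invitation sent on June 1, 2015:
print(needs_reminder(date(2015, 6, 1), False, date(2015, 6, 8)))  # True
print(needs_reminder(date(2015, 6, 1), False, date(2015, 6, 5)))  # False
```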

In a second round of survey collection, we also collect forecasts from a broader group of individuals, including PhD students in economics, MBA students at the University of Chicago Booth School of Business, and a group of MTurk subjects who are recruited for this purpose.
Experimental Design Details
Randomization Method
The order in which the groups of treatments are presented to each survey-taker will be randomly drawn from six possible orders.
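One way to implement this draw is sketched below in Python. It assumes each survey-taker receives an order label from 1 to 6 independently and uniformly. The registry does not say whether the draw was uniform or balanced, and the function name is hypothetical.

```python
import random


def assign_orders(subject_ids, n_orders=6, seed=None):
    """Map each subject to an order label 1..n_orders.

    Assumption: labels are drawn uniformly and independently; the
    registry only says one of six orders is randomly drawn.
    """
    rng = random.Random(seed)
    return {sid: rng.randint(1, n_orders) for sid in subject_ids}


orders = assign_orders(["s1", "s2", "s3"], seed=2015)
print(orders)
```

Passing a seed makes the assignment reproducible. A balanced alternative would cycle through a shuffled sequence of the six labels so that each order is used almost equally often.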
Randomization Unit
Individual subject
Was the treatment clustered?
No
Experiment Characteristics
Sample size: planned number of clusters
Please see below. The number of clusters is the same as the number of observations.
Sample size: planned number of observations
About 300 experts will be sent this survey, with additional participation by the aforementioned groups of non-experts.
Sample size (or number of clusters) by treatment arms
N/A; there is a single treatment arm, with the only source of variation originating from the randomization of question order.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
N/A; not relevant here.
Supporting Documents and Materials

Some documents for this trial are not available to the public.
IRB
INSTITUTIONAL REVIEW BOARDS (IRBs)
IRB Name
Social and Behavioral Sciences Institutional Review Board (SBS-IRB) at the University of Chicago
IRB Approval Date
2015-06-05
IRB Approval Number
IRB15-0757
Analysis Plan
Analysis Plan Documents
Pre-Analysis Plan

MD5: 25abd06e6855c2b7b13829555a272e8e

SHA1: 425f4a110657ca65d51dea481c1e4ddf5f0a5b33

Uploaded At: July 09, 2015
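The MD5 and SHA1 values above let a reader check that a copy of the pre-analysis plan matches the registered upload. A minimal Python sketch using the standard `hashlib` module follows; the local file name is hypothetical.

```python
import hashlib

EXPECTED_MD5 = "25abd06e6855c2b7b13829555a272e8e"
EXPECTED_SHA1 = "425f4a110657ca65d51dea481c1e4ddf5f0a5b33"


def digests_of_chunks(chunks):
    """Return (md5_hex, sha1_hex) for a byte stream given as chunks."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    for chunk in chunks:
        md5.update(chunk)
        sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def matches_registry(path, chunk_size=1 << 16):
    """True if the file at `path` has the MD5 and SHA1 listed in the registry."""
    with open(path, "rb") as fh:
        # Read the file in fixed-size chunks until read() returns b"".
        chunks = iter(lambda: fh.read(chunk_size), b"")
        return digests_of_chunks(chunks) == (EXPECTED_MD5, EXPECTED_SHA1)


# "pre_analysis_plan.pdf" is a hypothetical local file name:
# print(matches_registry("pre_analysis_plan.pdf"))
```

Reading the file in chunks keeps memory use constant, and checking both digests guards against a coincidental match on either one alone.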

Post-Trial
Post Trial Information
Study Withdrawal
Intervention
Is the intervention completed?
Yes
Intervention Completion Date
December 31, 2015, 12:00 AM +00:00
Is data collection complete?
Yes
Data Collection Completion Date
September 30, 2015, 12:00 AM +00:00
Final Sample Size: Number of Clusters (Unit of Randomization)
N/A
Was attrition correlated with treatment status?
No
Final Sample Size: Total Number of Observations
The final sample includes 9,861 subjects.
Final Sample Size (or Number of Clusters) by Treatment Arms
Approximately 550 subjects for each treatment arm (with 18 treatment arms)
Data Publication
Data Publication
Is public data available?
No
Program Files
Program Files
No
Reports and Papers
Preliminary Reports
Relevant Papers
Abstract
How much do different monetary and non-monetary motivators induce costly effort? Does the effectiveness line up with the expectations of researchers and with results in the literature? We conduct a large-scale real-effort experiment with 18 treatment arms. We examine the effect of (i) standard incentives; (ii) behavioral factors like social preferences and reference dependence; and (iii) non-monetary inducements from psychology. We find that (i) monetary incentives work largely as expected, including a very low piece rate treatment which does not crowd out effort; (ii) the evidence is partly consistent with standard behavioral models, including warm glow, though we do not find evidence of probability weighting; (iii) the psychological motivators are effective, but less so than incentives. We then compare the results to forecasts by 208 academic experts. On average, the experts anticipate several key features, like the effectiveness of psychological motivators. A sizeable share of experts, however, expects crowd-out, probability weighting, and pure altruism, counterfactually. As a further comparison, we present a meta-analysis of similar treatments in the literature. Overall, predictions based on the literature are correlated with, but underperform, the expert forecasts.
Citation
"What Motivates Effort? Evidence and Expert Forecasts." This version: March 15, 2017.
Abstract
Academic experts frequently recommend policies and treatments. But how well do they anticipate the impact of different treatments? And how do their predictions compare to the predictions of non-experts? We analyze how 208 experts forecast the results of 15 treatments involving monetary and non-monetary motivators in a real-effort task. We compare these forecasts to those made by PhD students and non-experts: undergraduates, MBAs, and an online sample. We document seven main results. First, the average forecast of experts predicts quite well the experimental results. Second, there is a strong wisdom-of-crowds effect: the average forecast outperforms 96 percent of individual forecasts. Third, correlates of expertise (citations, academic rank, field, and contextual experience) do not improve forecasting accuracy. Fourth, experts as a group do better than non-experts, but not if accuracy is defined as rank ordering treatments. Fifth, measures of effort, confidence, and revealed ability are predictive of forecast accuracy to some extent, especially for non-experts. Sixth, using these measures we identify ‘superforecasters’ among the non-experts who outperform the experts out of sample. Seventh, we document that these results on forecasting accuracy surprise the forecasters themselves. We present a simple model that organizes several of these results and we stress the implications for the collection of forecasts of future experimental results.
Citation
"Predicting Experimental Results: Who Knows What?" This version: August 16, 2016.
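The wisdom-of-crowds result in the second abstract compares the average forecast with each individual forecast. The Python sketch below shows one way to compute the share of individuals the average beats. It assumes accuracy is measured by mean squared error across treatments, the same criterion as the payment rule; the abstract does not state the exact metric here. Function names and data are hypothetical.

```python
def mse(forecast, actual):
    """Mean squared error of one forecast vector against the actual scores."""
    return sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual)


def share_beaten_by_average(individual_forecasts, actual):
    """Fraction of individual forecasters whose MSE is strictly worse than
    that of the treatment-by-treatment average forecast."""
    n = len(individual_forecasts)
    average = [sum(row[j] for row in individual_forecasts) / n
               for j in range(len(actual))]
    avg_err = mse(average, actual)
    beaten = sum(mse(row, actual) > avg_err for row in individual_forecasts)
    return beaten / n


# Made-up forecasts for two treatments from three forecasters:
forecasts = [[1400, 2200], [1600, 1800], [1500, 2000]]
print(share_beaten_by_average(forecasts, [1500, 2000]))
```

Because squared error is convex, the average forecast can never have a higher MSE than the average of the individual MSEs, which is one reason averaging tends to beat most individuals.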