Expert Forecasts of Real-Effort Experiment

Last registered on July 05, 2019

Pre-Trial

Trial Information

General Information

Title
Expert Forecasts of Real-Effort Experiment
RCT ID
AEARCTR-0000731
Initial registration date
July 09, 2015

First published
July 09, 2015, 11:43 PM EDT

Last updated
July 05, 2019, 12:58 AM EDT

Locations

Primary Investigator

Affiliation
UC Berkeley

Other Primary Investigator(s)

PI Affiliation
University of Chicago Booth School of Business

Additional Trial Information

Status
Completed
Start date
2015-05-14
End date
2015-12-31
Secondary IDs
Abstract
This study uses data on the response of output to various behavioral conditions in a real-effort experiment, gathered as part of the project registered as "Response of Output to Varying Incentive Structures on Amazon Turk" (AEARCTR-0000714). The behavioral treatments include examinations of the response to incentives, altruistic motives, loss aversion, and gift exchange, among others. A group of forecasters, including experts in economics, psychology, and decision-making, will be asked to predict the output resulting from each condition. We then compare these forecasts to the actual results to examine the relevance of expertise to forecasting experimental results.
External Link(s)

Registration Citation

Citation
DellaVigna, Stefano and Devin Pope. 2019. "Expert Forecasts of Real-Effort Experiment." AEA RCT Registry. July 05. https://doi.org/10.1257/rct.731-3.0
Former Citation
DellaVigna, Stefano and Devin Pope. 2019. "Expert Forecasts of Real-Effort Experiment." AEA RCT Registry. July 05. https://www.socialscienceregistry.org/trials/731/history/49372
Experimental Details

Interventions

Intervention(s)
Participants will be directed to a survey on Qualtrics. In this survey, participants will first be given a description of the MTurk task performed in the project registered as "Response of Output to Varying Incentive Structures on Amazon Turk". Specifically, they will be shown the following text:

We ran a large, pre-registered experiment using Amazon's Mechanical Turk (MTurk), an online marketplace where individuals around the world are paid to complete small tasks. The MTurk participants who completed our study are primarily from the United States (85%) or India (12%) and are well-educated (55% self report having a college degree). The gender and age breakdown of the participants closely mirrors the US population as a whole (54% female, 52% age 18-30, 39% age 31-50).

The MTurk participants in our study initially agreed to perform a simple task that takes 10 minutes in return for a fixed participation fee of $1.00. They were not given information about the task or about possible bonus money before agreeing to participate. As part of the experiment, they were offered different bonus payments to encourage them to perform well.

In bold below is the task exactly as it was described to the MTurk participants:

On the next page you will play a simple button-pressing task. The object of this task is to alternately press the 'a' and 'b' buttons on your keyboard as quickly as possible for 10 minutes. Every time you successfully press the 'a' and then the 'b' button, you will receive a point. Note that points will only be rewarded when you alternate button pushes: just pressing the 'a' or 'b' button without alternating between the two will not result in points.

Buttons must be pressed by hand only (key-bindings or automated button-pushing programs/scripts cannot be used) or the task will not be approved.

Feel free to score as many points as you can.

[The participant would then see a different final paragraph depending on the condition to which they were randomly assigned]
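
As a point of reference, the sketch below gives one plausible reading of the scoring rule quoted in the task instructions above: a point is earned whenever an 'a' press is immediately followed by a 'b' press. The handling of repeated or out-of-order presses is an assumption; this is not the actual task code, which ran in participants' browsers.

```python
# Illustrative sketch of the point-counting rule described in the task
# instructions above (an assumption, not the task's actual implementation).

def count_points(keypresses: str) -> int:
    """Count the number of 'a' presses immediately followed by a 'b' press."""
    return sum(
        1
        for prev, cur in zip(keypresses, keypresses[1:])
        if prev == "a" and cur == "b"
    )

print(count_points("ababab"))  # 3 points
print(count_points("aabbba"))  # 1 point: only one 'a' is directly followed by 'b'
```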

The survey-takers will then be given the opportunity to participate in a practice button-pushing task, to get a better sense of the actual task the MTurkers performed, and to view the instructions/walkthrough for that task in PDF format. They are not, however, required to complete the practice task or view the instructions.

At this point, the 18 treatment descriptions listed below will be displayed. The actual average scores received by participants in the first three treatments are listed to provide a reference point, and participants are then asked to predict the number of points scored in each of the remaining treatments. Their predictions are recorded using a draggable slider.

1. Your score will not affect your payment in any way. [1522 points]
2. As a bonus, you will be paid an extra 1 cent for every 100 points you score. This bonus will be paid to your account within 24 hours. [2028 points]
3. As a bonus, you will be paid an extra 10 cents for every 100 points you score. This bonus will be paid to your account within 24 hours. [2175 points]
4. As a bonus, you will be paid an extra 4 cents for every 100 points that you score. This bonus will be paid to your account within 24 hours.
5. As a bonus, you will be paid an extra 1 cent for every 1,000 points that you score. This bonus will be paid to your account within 24 hours.
6. As a bonus, the Red Cross charitable fund will be given 1 cent for every 100 points that you score.
7. As a bonus, the Red Cross charitable fund will be given 10 cents for every 100 points that you score.
8. As a bonus, you will be paid an extra 1 cent for every 100 points that you score. This bonus will be paid to your account two weeks from today.
9. As a bonus, you will be paid an extra 1 cent for every 100 points that you score. This bonus will be paid to your account four weeks from today.
10. As a bonus, you will be paid an extra 40 cents if you score at least 2,000 points. This bonus will be paid to your account within 24 hours.
11. As a bonus, you will be paid an extra 40 cents. This bonus will be paid to your account within 24 hours. However, you will lose this bonus (it will not be placed in your account) unless you score at least 2,000 points.
12. As a bonus, you will be paid an extra 80 cents if you score at least 2,000 points. This bonus will be paid to your account within 24 hours.
13. As a bonus, you will have a 1% chance of being paid an extra $1 for every 100 points that you score. One out of every 100 participants who perform this task will be randomly chosen to be paid this reward. The bonus will be put in the winner's account within 24 hours.
14. As a bonus, you will have a 50% chance of being paid an extra 2 cents for every 100 points that you score. One out of two participants who perform this task will be randomly chosen to be paid this reward. The bonus will be put in the winner's account within 24 hours.
15. Your score will not affect your payment in any way. In a previous version of this task, many participants were able to score more than 2,000 points.
16. Your score will not affect your payment in any way. After you play, we will show you how well you did relative to other participants who have previously done this task.
17. Your score will not affect your payment in any way. We are interested in how fast people choose to press digits and we would like you to do your very best. So please try as hard as you can.
18. In appreciation to you for performing this task, you will be paid a bonus of 40 cents. This bonus will be paid to your account within 24 hours. Your score will not affect your payment in any way.

As added encouragement, five people who complete this survey will be chosen at random to be paid, with the payment based on the accuracy of their predictions. Specifically, each of these five individuals will receive $1,000 - (mean squared error / 200), where the mean squared error is the average of the squared differences between their forecasts and the actual average scores.
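
For concreteness, the sketch below reproduces this payment rule. It assumes the mean squared error is computed over the 15 forecasted treatments; the function name and example inputs are illustrative, and the registration does not state whether the payment is floored at $0.

```python
# Illustrative sketch of the accuracy-based payment described above:
# $1,000 minus the mean squared error divided by 200.

def forecast_payment(forecasts, actual_scores):
    """Dollar payment for one selected forecaster."""
    squared_errors = [(f - a) ** 2 for f, a in zip(forecasts, actual_scores)]
    mse = sum(squared_errors) / len(squared_errors)
    return 1000 - mse / 200

# A forecaster who is off by 150 points on every one of the 15 treatments
# would receive 1000 - 150**2 / 200 = $887.50.
print(forecast_payment([2000] * 15, [1850] * 15))  # 887.5
```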

Next, the second page of the survey will ask the participant how many of his/her forecasts he/she thinks are correct to within 100 points of the actual values (a short illustration of this accuracy measure follows the list below). The participant will then be asked a parallel question about how many forecasts, on average, he/she thinks 9 different groups of survey-takers got correct to within 100 points of the actual values:

1. Professors with expertise in behavioral economics or decision making who recently presented at or served on a program committee for select behavioral economics or decision making conferences (e.g. SITE and BDRM)
2. The 15 most-cited professors from Group #1 who respond to our survey
3. Professors from Group #1 with a PhD in economics
4. Professors from Group #1 with a PhD in psychology or decision making
5. PhD students in economics from UC Berkeley and the University of Chicago
6. PhD students from Group #5 who are specializing in behavioral economics
7. MBA students from the Booth School of Business at the University of Chicago
8. MTurk workers who make predictions after completing the button-pushing task in one of the conditions
9. MTurk workers who make predictions without participating in the button-pushing task
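
As a short illustration of the "correct to within 100 points" measure used in these questions, the sketch below counts such forecasts; the function name and inputs are hypothetical.

```python
# Illustrative sketch of the "within 100 points" accuracy measure.

def n_within_100(forecasts, actual_scores):
    """Number of forecasts within 100 points of the corresponding actual score."""
    return sum(abs(f - a) <= 100 for f, a in zip(forecasts, actual_scores))

print(n_within_100([1900, 2100, 1700], [1850, 2300, 1750]))  # 2
```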

Finally, the participant will be asked if he/she has ever heard of or used Amazon Mechanical Turk in the past.
Intervention (Hidden)
Although this information is not made available within the survey, we internally categorize the various treatments into groups that reflect their relation to one another.

Baseline:

1. Your score will not affect your payment in any way. [1522 points]

Basic Incentives:

2. As a bonus, you will be paid an extra 1 cent for every 100 points you score. This bonus will be paid to your account within 24 hours. [2028 points]
3. As a bonus, you will be paid an extra 10 cents for every 100 points you score. This bonus will be paid to your account within 24 hours. [2175 points]
4. As a bonus, you will be paid an extra 4 cents for every 100 points that you score. This bonus will be paid to your account within 24 hours.

Paying Too Little:

5. As a bonus, you will be paid an extra 1 cent for every 1,000 points that you score. This bonus will be paid to your account within 24 hours.

Charitable Giving:

6. As a bonus, the Red Cross charitable fund will be given 1 cent for every 100 points that you score.
7. As a bonus, the Red Cross charitable fund will be given 10 cents for every 100 points that you score.

Time Preferences:

8. As a bonus, you will be paid an extra 1 cent for every 100 points that you score. This bonus will be paid to your account two weeks from today.
9. As a bonus, you will be paid an extra 1 cent for every 100 points that you score. This bonus will be paid to your account four weeks from today.

Loss Aversion:

10. As a bonus, you will be paid an extra 40 cents if you score at least 2,000 points. This bonus will be paid to your account within 24 hours.
11. As a bonus, you will be paid an extra 40 cents. This bonus will be paid to your account within 24 hours. However, you will lose this bonus (it will not be placed in your account) unless you score at least 2,000 points.
12. As a bonus, you will be paid an extra 80 cents if you score at least 2,000 points. This bonus will be paid to your account within 24 hours.

Probability-Weighting:

13. As a bonus, you will have a 1% chance of being paid an extra $1 for every 100 points that you score. One out of every 100 participants who perform this task will be randomly chosen to be paid this reward. The bonus will be put in the winner's account within 24 hours.
14. As a bonus, you will have a 50% chance of being paid an extra 2 cents for every 100 points that you score. One out of two participants who perform this task will be randomly chosen to be paid this reward. The bonus will be put in the winner's account within 24 hours.

Social Comparison:

15. Your score will not affect your payment in any way. In a previous version of this task, many participants were able to score more than 2,000 points.
16. Your score will not affect your payment in any way. After you play, we will show you how well you did relative to other participants who have previously done this task.

Task Significance:

17. Your score will not affect your payment in any way. We are interested in how fast people choose to press digits and we would like you to do your very best. So please try as hard as you can.

Gift Exchange:

18. In appreciation to you for performing this task, you will be paid a bonus of 40 cents. This bonus will be paid to your account within 24 hours. Your score will not affect your payment in any way.

The range (1,000 to 2,500) of the sliders with which the responses are recorded was set as follows. The maximum is set to the nearest multiple of 500 points that is at least 200 points higher than the average score in the highest-scoring treatment. For example, if the highest average score across the 18 treatments were 2,200, the maximum would be set to 2,500; if the highest average were instead 2,350, the maximum would be set to 3,000. The minimum, similarly, is set to the nearest multiple of 500 points that is at least 200 points below the average in the lowest-scoring treatment. In other words, no treatment resulted in an average score of less than 1,200 or more than 2,300 points.
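
As an illustration of this rounding rule, the sketch below computes the slider bounds from the lowest and highest treatment averages; the function name and example inputs are hypothetical, and the rule itself is as described in the paragraph above.

```python
import math

# Illustrative sketch of the slider-bound rule described above: the maximum is
# the smallest multiple of 500 at least 200 points above the highest treatment
# average, and the minimum is the largest multiple of 500 at least 200 points
# below the lowest treatment average.

def slider_bounds(lowest_avg, highest_avg):
    upper = math.ceil((highest_avg + 200) / 500) * 500
    lower = math.floor((lowest_avg - 200) / 500) * 500
    return lower, upper

print(slider_bounds(1400, 2200))  # (1000, 2500)
print(slider_bounds(1400, 2350))  # (1000, 3000)
```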

The principal investigators have not seen the actual data resulting from the button-pressing task; access to the data has been limited to Michael Sheldon, a research assistant for Devin Pope. The principal investigators have, however, been given the above information (that the average for all treatments is between 1,200 and 2,300 points) and the average scores in the three reference treatments (which all survey participants see as well).


12,838 Mechanical Turk workers started our experimental task.

Of these, 721 were dropped because they experienced technical problems with the survey. This technical problem occurred over a several-hour period when the survey platform Qualtrics moved to a new server. Individuals participating in the survey during this time period experienced a malfunction in the counter that kept track of their scores.

48 workers were dropped for scoring above 4,000 points. During a small pilot, we determined that scoring more than 4,000 points was essentially physically impossible, and thus we worried that any score above 4,000 was due to cheating (e.g., a key-binding program).

1,543 workers were dropped because they failed to complete the experiment (for example, many participants only filled out the demographics portion of the experiment and were never assigned a treatment).

364 workers were dropped because they stopped the task and logged in again. We stated in the instructions to the workers that they could not stop the task and log in again. This restriction was put into place so as to discourage workers who may want to log in and obtain a different treatment.

187 workers were dropped because their HIT was not approved for some reason (e.g. they did not have a valid MTurk ID).

114 workers were dropped because they never did a single button press. We were concerned that these participants may have experienced a technical malfunction or that their results were simply not recorded for some reason.

After these sample restrictions, we are left with 9,861 completed tasks with valid results. This is the sample used for the computation of the mean score within each treatment.
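
The arithmetic behind these restrictions is summarized in the sketch below; the category labels are shorthand for the reasons listed above.

```python
# Accounting for the sample restrictions described above.
started = 12838
dropped = {
    "technical problems during the Qualtrics server move": 721,
    "scored above 4,000 points (suspected cheating)": 48,
    "did not complete the experiment": 1543,
    "stopped the task and logged in again": 364,
    "HIT not approved": 187,
    "no button presses recorded": 114,
}
remaining = started - sum(dropped.values())
print(remaining)  # 9,861 completed tasks with valid results
```
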
Intervention Start Date
2015-05-14
Intervention End Date
2015-07-09

Primary Outcomes

Primary Outcomes (end points)
The forecasts themselves.
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
We will contact a group of approximately 300 experts via email to ask for their forecasts. The email will provide a link to a Qualtrics survey in which the experts find an explanation of the study as well as the results of the first three treatments. The experts are then invited to forecast the average effort in the remaining 15 treatments using a slider scale. We also elicit the experts' confidence in their forecasts and offer a payment of up to $1,000 to five randomly selected forecasters as an incentive for accuracy.

We determined the group of experts as follows. We collected the list of all authors of papers presented at the Stanford Institute for Theoretical Economics (SITE) sessions in Psychology and Economics and in Experimental Economics from their inception until 2014 (for all years in which the program is available online). We combined this list with participants in the Behavioral Economics Annual Meeting (BEAM) conferences from 2009 to 2014, and with the program committee and keynote speakers for the Behavioral Decision Research in Management (BDRM) conference in 2010, 2012, and 2014. We also included the list of invitees to the Russell Sage Foundation 2014 workshop on "Behavioral Labor Economics". In addition, we included researchers with at least 5 highly-cited papers (at least 100 Google Scholar citations) on relevant keywords, and a small number of additional experts who would have been missed by the criteria above. This yields an initial list of over 600 people. Since we did not want to be seen as spamming researchers, we further pared down the list to about 300 researchers to whom at least one of the two PIs had some connection.

We notify each of these researchers via an email inviting them to make forecasts of the results of a real-effort experiment, and we provide them with a link to a unique version of the survey. We store the forecasts of those who respond and send a reminder email to those who have not responded within one week.

In a second round of survey collection, we also collect forecasts from a broader group of individuals, including PhD students in economics, MBA students at the University of Chicago Booth School of Business, and a group of MTurk subjects that are recruited for this purpose.
Experimental Design Details
Randomization Method
The order in which the groups of treatments are presented to the survey-taker will be randomly drawn from six possible orders.
Randomization Unit
Individual subject
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
Please see below. The number of clusters is the same as the number of observations.
Sample size: planned number of observations
About 300 experts will be sent this survey, with additional participation by the aforementioned groups of non-experts.
Sample size (or number of clusters) by treatment arms
N/A; there is a single treatment arm, with the only source of variation originating from the randomization of question order.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
N/A; not relevant here.
Supporting Documents and Materials

There is information in this trial unavailable to the public.
IRB

Institutional Review Boards (IRBs)

IRB Name
Social and Behavioral Sciences Institutional Review Board (SBS-IRB) at the University of Chicago
IRB Approval Date
2015-06-05
IRB Approval Number
IRB15-0757
Analysis Plan

Analysis Plan Documents

Pre-Analysis Plan

MD5: 25abd06e6855c2b7b13829555a272e8e

SHA1: 425f4a110657ca65d51dea481c1e4ddf5f0a5b33

Uploaded At: July 09, 2015

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
Yes
Intervention Completion Date
December 31, 2015, 12:00 +00:00
Data Collection Complete
Yes
Data Collection Completion Date
September 30, 2015, 12:00 +00:00
Final Sample Size: Number of Clusters (Unit of Randomization)
N/A
Was attrition correlated with treatment status?
No
Final Sample Size: Total Number of Observations
The final sample includes 9,861 subjects.
Final Sample Size (or Number of Clusters) by Treatment Arms
Approximately 550 subjects for each treatment arm (with 18 treatment arms)
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
No
Reports, Papers & Other Materials

Relevant Paper(s)

Abstract
How much do different monetary and non-monetary motivators induce costly effort? Does the effectiveness line up with the expectations of researchers and with results in the literature? We conduct a large-scale real-effort experiment with 18 treatment arms. We examine the effect of (i) standard incentives; (ii) behavioral factors like social preferences and reference dependence; and (iii) non-monetary inducements from psychology. We find that (i) monetary incentives work largely as expected, including a very low piece rate treatment which does not crowd out effort; (ii) the evidence is partly consistent with standard behavioral models, including warm glow, though we do not find evidence of probability weighting; (iii) the psychological motivators are effective, but less so than incentives. We then compare the results to forecasts by 208 academic experts. On average, the experts anticipate several key features, like the effectiveness of psychological motivators. A sizeable share of experts, however, expects crowd-out, probability weighting, and pure altruism, counterfactually. As a further comparison, we present a meta-analysis of similar treatments in the literature. Overall, predictions based on the literature are correlated with, but underperform, the expert forecasts.
Citation
"What Motivates Effort? Evidence and Expert Forecasts." This version: March 15, 2017.
Abstract
Academic experts frequently recommend policies and treatments. But how well do they anticipate the impact of different treatments? And how do their predictions compare to the predictions of non-experts? We analyze how 208 experts forecast the results of 15 treatments involving monetary and non-monetary motivators in a real-effort task. We compare these forecasts to those made by PhD students and non-experts: undergraduates, MBAs, and an online sample. We document seven main results. First, the average forecast of experts predicts quite well the experimental results. Second, there is a strong wisdom-of-crowds effect: the average forecast outperforms 96 percent of individual forecasts. Third, correlates of expertise (citations, academic rank, field, and contextual experience) do not improve forecasting accuracy. Fourth, experts as a group do better than non-experts, but not if accuracy is defined as rank ordering treatments. Fifth, measures of effort, confidence, and revealed ability are predictive of forecast accuracy to some extent, especially for non-experts. Sixth, using these measures we identify 'superforecasters' among the non-experts who outperform the experts out of sample. Seventh, we document that these results on forecasting accuracy surprise the forecasters themselves. We present a simple model that organizes several of these results and we stress the implications for the collection of forecasts of future experimental results.
Citation
"Predicting Experimental Results: Who Knows What?" This version: August 16, 2016.

Reports & Other Materials