We conduct an online experiment on Amazon MTurk. The experiment consists of three parts; each part is completed before the next begins. The individual parts are described in more detail below.
In part 1, subjects (called workers in this document) work on a real effort task. In part 2, after part 1 is completed, another set of subjects (called supervisors in this document) receives noisy information about the performance of one of the workers from part 1 and rates their work. In part 3, after part 2 is completed, subjects from part 1 are invited again to learn their rating. At the beginning of parts 1 and 2, participants agree to a consent form, read instructions, and answer comprehension questions.
Part 1: Entry Task
Workers perform a real effort task. The task consists of entering text contained in hard-to-read images (similar to so-called ‘captchas’). Workers see 10 pages with 10 images on each page. Each page has one of five different time limits: 17, 19, 21, 23, or 25 seconds. Each of these time limits occurs exactly twice in randomized order. The order is the same for all subjects. The time limit for the next page is announced during a 5-second countdown before the page starts. Workers are also asked to state their belief about their performance across all 10 pages.
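The fixed page sequence could be generated as in this minimal sketch; a seeded shuffle simply makes the randomized order identical for all subjects (the seed value and the use of Python's `random` module are illustrative assumptions):

```python
import random

# The five time limits in seconds; each appears on exactly two of the 10 pages.
TIME_LIMITS = [17, 19, 21, 23, 25]

def make_page_order(seed=42):
    """Return one fixed, randomized order of 10 page time limits.

    Shuffling with a fixed seed yields the same order for every subject,
    as in the design above. The seed value is an illustrative assumption.
    """
    pages = TIME_LIMITS * 2            # 10 pages, each limit twice
    random.Random(seed).shuffle(pages)
    return pages
```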
There are no treatments in part 1. Workers learn in the instructions that their work will be rated by other MTurk worker(s) and that they will receive a payment which may depend on the rating they receive in part 2.
Workers fill in a demographics questionnaire after completing the Entry Task.
Part 2: Rating Task
First, supervisors work on two pages of the Entry Task to become familiar with the real effort task. One of the pages has the shortest time limit of 17 seconds, while the other page has the longest time limit of 25 seconds. Supervisors are also asked to state their belief about their performance on the two example pages.
Then, supervisors are matched to a random worker from part 1. Matching is anonymous and participants never receive information on the identity of other subjects. Supervisors receive a noisy signal about the number of correctly entered images by the matched worker and are asked to give a rating to the worker. They are told that the rating should reflect performance on all 10 pages of the Entry Task.
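One way to sketch the signal draw, assuming the observed pages are sampled uniformly at random (the function name and the sampling rule are illustrative assumptions, not the exact procedure):

```python
import random

def noisy_signal(correct_per_page, n_observed, seed=None):
    """Sample a supervisor's noisy signal about one matched worker.

    The supervisor sees the number of correctly entered images on
    n_observed of the worker's 10 pages. Uniform random sampling of
    pages is an assumption; the exact signal structure may differ.
    """
    rng = random.Random(seed)
    pages = rng.sample(range(len(correct_per_page)), n_observed)
    return {page: correct_per_page[page] for page in pages}
```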
We vary whether supervisors are paid according to accuracy or not (A and NA), whether workers are paid according to the rating or not (R and NR), and whether supervisors observe 1 or 4 of the 10 pages the workers have worked on (S1 and S4). Altogether, we conduct six treatments.
We randomly assign workers (and hence matched supervisors) to the six treatments, stratifying the assignment to obtain comparable performance distributions across treatments.
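The stratified assignment can be sketched as follows, assuming workers are blocked by sorted performance and treatments are permuted within each block (one common stratification scheme, not necessarily the exact procedure used):

```python
import random

def stratified_assignment(performances, n_treatments=6, seed=0):
    """Map worker index -> treatment index, stratified by performance.

    Workers are sorted by performance and split into consecutive blocks
    of n_treatments; within each block, treatments are assigned in
    random order. This keeps the performance distribution comparable
    across treatments. A sketch of one common scheme, under assumptions.
    """
    rng = random.Random(seed)
    by_perf = sorted(range(len(performances)), key=lambda i: performances[i])
    assignment = {}
    for start in range(0, len(by_perf), n_treatments):
        block = by_perf[start:start + n_treatments]
        labels = rng.sample(range(n_treatments), len(block))
        assignment.update(zip(block, labels))
    return assignment
```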
In all treatments, supervisors receive a payment that is increasing in their matched worker’s actual performance. At the end of part 2, supervisors complete the Social Value Orientation (SVO) slider measure (Murphy et al., 2011) with a random worker (but not the one they rated) as the recipient in order to measure their social preferences towards the worker population. They also fill in a reciprocity questionnaire (Dohmen et al., 2009), the Big Five Inventory (Rammstedt and John, 2005), and a demographics questionnaire. After this, they learn their total payment and leave the study.
Part 3: Feedback
Workers who completed part 1 and who received a rating in part 2 are invited by email to participate in part 3.
First, they learn their rating and their actual performance and are asked to report their satisfaction with their performance and with the rating. Second, they learn whether their payment depends on the rating, and learn their payment. They complete the SVO slider measure with their supervisor as the recipient to measure their social preferences towards their supervisor. After this, they learn their total payment, which consists of the payments from the Entry Task, the SVO task they completed, and the SVO task another supervisor completed in part 2. This concludes the experiment.
We restrict participation to MTurk workers who have completed at least 1000 HITs (Human Intelligence Tasks) on MTurk and who have an approval rate of at least 98%. These restrictions are standard in the literature and ensure high data quality. Subjects are excluded from payment (and further participation in the study) if they do not enter a single image in part 1.
In parts 1 and 2, participants have to answer comprehension questions to make sure they understand the instructions. If they do not answer a question correctly within three attempts, they are excluded from further participation.
Before participants agree to participate in the study in part 1, they are made aware that they will only receive their payment if they also participate in the third part of the study within 4 weeks of receiving the invitation email.
We will regress performance ratings on treatment dummies, the standardized aggregated signal observed by the respective supervisor, and interaction terms between the signal and the respective treatment dummies. The treatment dummies thus capture between-treatment differences in rating leniency, and the interaction terms capture differences in rating compression. Furthermore, we will compare the average ratings between prosocial and individualistic supervisors (according to SVO) within treatments.
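The specification can be sketched with a hand-built design matrix; the variable names and simulated data below are placeholders. With six treatments and one baseline cell, the matrix has an intercept, five treatment dummies, the signal, and five signal-by-dummy interactions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 300, 6                         # observations, treatments
treatment = rng.integers(0, k, n)     # treatment cell per rating (simulated)
signal = rng.normal(size=n)           # standardized aggregated signal (simulated)
rating = rng.normal(size=n)           # placeholder outcome

# Columns: intercept | 5 treatment dummies (baseline = cell 0) |
# signal | 5 signal-by-dummy interactions. The dummy coefficients pick up
# leniency differences; the interaction coefficients pick up compression.
dummies = (treatment[:, None] == np.arange(1, k)).astype(float)
X = np.column_stack([np.ones(n), dummies, signal, dummies * signal[:, None]])
beta, *_ = np.linalg.lstsq(X, rating, rcond=None)
```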
References
T. Dohmen, A. Falk, D. Huffman, and U. Sunde. Homo reciprocans: Survey evidence on behavioural outcomes. Economic Journal, 119(536):592–612, 2009.
R. O. Murphy, K. A. Ackermann, and M. J. J. Handgraaf. Measuring social value orientation. Judgment and Decision Making, 6(8):771–781, 2011.
B. Rammstedt and O. P. John. Kurzversion des Big Five Inventory (BFI-K): Entwicklung und Validierung eines ökonomischen Inventars zur Erfassung der fünf Faktoren der Persönlichkeit [Short version of the Big Five Inventory (BFI-K): Development and validation of an economical inventory for assessing the five factors of personality]. Diagnostica, 51(4):195–206, 2005.