On Rating Scales in Subjective Performance Evaluations – The Effect of a Dummy Category II

Last registered on May 30, 2018

Pre-Trial

Trial Information

General Information

Title

On Rating Scales in Subjective Performance Evaluations – The Effect of a Dummy Category II

RCT ID

AEARCTR-0003029

Initial registration date

May 30, 2018

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published

May 30, 2018, 9:58 PM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Locations

Country

United States of America

Region

Primary Investigator

Name

Thomas Vogt

Affiliation

University of Cologne

Other Primary Investigator(s)

PI Name

Tobias Stangl

PI Affiliation

University of Cologne

PI Name

Dirk Sliwka

PI Affiliation

University of Cologne

PI Name

Ulrich W. Thonemann

PI Affiliation

University of Cologne

Additional Trial Information

Status

In development

Start date

2018-06-04

End date

2018-08-01

Keywords

Labor

Additional Keywords

Leniency Bias, Performance Appraisals, Incentives, Happiness and Productivity, Reciprocity, Rank Preferences

JEL code(s)

Secondary IDs

Abstract

A natural field experiment is conducted to investigate the influence of a dummy evaluation category at the bottom of a feedback scale on effort provision and performance. Subjects work on a real-effort task in two successive periods. A performance-dependent bonus is paid for both periods. Performance is evaluated using three evaluation categories: category 1 is awarded to the highest-performing subjects, and category 3 to the lowest-performing subjects.

Subjects are randomly assigned to one of three treatments. In treatment ND, no dummy evaluation category is shown, i.e., subjects are shown the three evaluation categories that are actually used. In treatment CD, a fourth (dummy) evaluation category is shown. In treatment TD, a fourth (dummy) evaluation category is shown and it is made transparent that this category is not used.

We hypothesize that, in the second period, average effort provision and performance are higher in treatment CD than in treatment ND, higher in treatment CD than in treatment TD, and higher in treatment TD than in treatment ND. We expect these effects to be strongest for the subjects ranked lowest (third category).

External Link(s)

Registration Citation

Citation

Sliwka, Dirk et al. 2018. "On Rating Scales in Subjective Performance Evaluations – The Effect of a Dummy Category II." AEA RCT Registry. May 30. https://doi.org/10.1257/rct.3029-1.0

Former Citation

Sliwka, Dirk et al. 2018. "On Rating Scales in Subjective Performance Evaluations – The Effect of a Dummy Category II." AEA RCT Registry. May 30. https://www.socialscienceregistry.org/trials/3029/history/30165

Sponsors & Partners

Experimental Details

Interventions

Intervention(s)

Intervention (Hidden)

Intervention Start Date

2018-06-04

Intervention End Date

2018-08-01

Primary Outcomes

Primary Outcomes (end points)

The number of cover sheets entered correctly at the individual level (individual performance); the number of cover sheets entered at the individual level (individual effort provision); questionnaire data (post)

Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)

Secondary Outcomes (explanation)

Experimental Design

The first period is the same across treatments: subjects are asked to work on a real-effort task. Subjects are informed that they will receive a performance-based bonus payment, but no information on the evaluation scale is given.

In the second period, subjects learn about the evaluation scale and receive private feedback on the evaluation of their first-period performance before they can work again. Actual evaluations are based on subjects' relative performance and follow exactly the same procedure in all three treatments, so that only categories 1-3 are actually awarded. Subjects are not informed about the specific details of the evaluation procedure, but they learn that about 30% of subjects are awarded the best grade and 40% the second-best grade. In treatments ND and TD, they are informed that 30% of the evaluations are in category 3. In treatment CD, they learn that 30% of the evaluations are in either category 3 or category 4.
Subjects are informed that the evaluation scale of the first period is used for the bonus payments in the second period.
We vary whether a fourth (dummy) evaluation category is shown in the evaluation scale in the second period (treatment ND vs. CD/TD). Additionally, we vary whether the non-usage of the fourth (dummy) evaluation category is made transparent (treatment CD vs. TD).
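
The relative evaluation described above can be illustrated with a minimal sketch. The registration does not publish the exact evaluation procedure, so everything beyond the approximate 30% / 40% / 30% split is an assumption made for illustration: the function name `assign_categories`, the rounding of group sizes, and the tie-breaking rule are all hypothetical.

```python
def assign_categories(scores, top_share=0.30, middle_share=0.40):
    """Map each first-period score to category 1 (best), 2, or 3 by relative rank.

    Assumptions (not from the registration): group sizes are rounded to the
    nearest integer, and ties are broken by original order.
    """
    n = len(scores)
    # Indices ordered from highest to lowest score; sorted() is stable,
    # so tied subjects keep their original relative order.
    order = sorted(range(n), key=lambda i: -scores[i])
    n_top = round(n * top_share)
    n_mid = round(n * middle_share)

    categories = [3] * n  # everyone starts in the lowest awarded category
    for rank, i in enumerate(order):
        if rank < n_top:
            categories[i] = 1
        elif rank < n_top + n_mid:
            categories[i] = 2
    return categories


# Categories shown to subjects in period 2. Category 4 is displayed in CD and
# TD but never awarded; in TD its non-usage is additionally made transparent.
DISPLAYED_SCALE = {"ND": [1, 2, 3], "CD": [1, 2, 3, 4], "TD": [1, 2, 3, 4]}

if __name__ == "__main__":
    example_scores = [10, 50, 30, 20, 40, 60, 70, 80, 90, 100]
    print(assign_categories(example_scores))
```

Because the awarded categories come from the same function in every treatment, only the displayed scale differs between arms, which is the variation the design relies on.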

Experimental Design Details

A natural field experiment is conducted on Amazon MTurk. As a university department, we ask subjects to update a database on class grades in two successive periods. We provide 200 scanned exam cover sheets in period 1 and 400 in period 2, each containing six handwritten grades.

The first period is the same across treatments. After short instructions, subjects must pass a quiz on the task and payment structure. They can then work for 20 minutes. Subjects are informed that a performance-based bonus is paid in addition to a fixed wage. Performance is defined as the number of correctly entered cover sheets. However, no information on the number of feedback categories is given.

After the first period, subjects are invited by e-mail to work again. When entering the second period, subjects are given private feedback on their performance in the first period. Across treatments, performance is evaluated using three evaluation categories. Subjects must pass a quiz on the task and payment structure to work in the second period. Working time is not restricted. Subjects are paid a fixed wage and an additional performance-based bonus in the second period. Subjects are informed that the evaluation scale of the first period is used for the evaluation of performance in the second period. Performance is defined as in the first period.

We randomly assign subjects to treatment CD, treatment TD, or the control group ND, stratifying treatment assignment on first-period performance.
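
One common way to stratify assignment on first-period performance is blocked randomization, sketched below. The registration states only that assignment is stratified on first-period performance; the block size of three, the helper name `stratified_assignment`, and the fixed seed are assumptions for illustration and not the authors' documented procedure.

```python
import random

ARMS = ["ND", "CD", "TD"]


def stratified_assignment(performance, seed=0):
    """Assign subjects to arms within blocks of similar first-period performance.

    performance: dict mapping subject id -> first-period score.
    Returns a dict mapping subject id -> treatment arm.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    # Rank subjects from best to worst first-period performance.
    ranked = sorted(performance, key=lambda s: performance[s], reverse=True)

    assignment = {}
    # Walk through the ranking in blocks of three neighbours and give each
    # block one subject per arm, in random order.
    for start in range(0, len(ranked), len(ARMS)):
        block = ranked[start:start + len(ARMS)]
        arms = ARMS[:]
        rng.shuffle(arms)
        for subject, arm in zip(block, arms):
            assignment[subject] = arm
    return assignment


if __name__ == "__main__":
    scores = {f"worker_{i}": s for i, s in enumerate([12, 45, 33, 27, 51, 8, 40, 19, 36])}
    print(stratified_assignment(scores))
```

Blocking on the ranking keeps the arms balanced on first-period performance, which matters here because the feedback each subject receives depends on that performance.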

Randomization Method

Stratification method

Randomization Unit

Individual

Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters

1665

Sample size: planned number of observations

1665

Sample size (or number of clusters) by treatment arms

Control (ND): 555 subjects
CD: 555 subjects
TD: 555 subjects
Note: The number of subjects in each treatment may change slightly due to attrition in the second period. We document drop-outs and test whether they are systematic.

Minimum detectable effect size for main outcomes (accounting for sample design and clustering)

MDES = 8 [cover sheets entered correctly], standard deviation = 32 [cover sheets entered correctly], 11%
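
For readers who want to relate these figures to each other, the sketch below expresses the MDES in standard-deviation units and evaluates a textbook minimum-detectable-effect formula for a two-arm comparison with equal group sizes. The registration does not state the significance level, power, or attrition assumptions behind its figure, so the defaults used here (two-sided alpha of 0.05, power of 0.80, no attrition) are assumptions and the result need not match the registered MDES.

```python
from statistics import NormalDist


def standardized_effect(mdes, sd):
    """Express a minimum detectable effect in standard-deviation units."""
    return mdes / sd


def two_arm_mde(n_per_arm, sd, alpha=0.05, power=0.80):
    """Textbook MDE for a difference in means between two equally sized arms.

    MDE = (z_{1 - alpha/2} + z_{power}) * sd * sqrt(2 / n_per_arm)
    alpha and power are assumed defaults, not values from the registration.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * sd * (2 / n_per_arm) ** 0.5


if __name__ == "__main__":
    # Registered values: MDES of 8 cover sheets, standard deviation of 32.
    print(standardized_effect(8, 32))
    # Planned 555 subjects per arm, before any second-period attrition.
    print(two_arm_mde(n_per_arm=555, sd=32))
```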

Supporting Documents and Materials

IRB