On Rating Scales in Subjective Performance Evaluations – The Effect of a Dummy Category II

Last registered on May 30, 2018

Pre-Trial

Trial Information

General Information

Title
On Rating Scales in Subjective Performance Evaluations – The Effect of a Dummy Category II
RCT ID
AEARCTR-0003029
Initial registration date
May 30, 2018

First published
May 30, 2018, 9:58 PM EDT

Locations

Region

Primary Investigator

Affiliation
University of Cologne

Other Primary Investigator(s)

PI Affiliation
University of Cologne
PI Affiliation
University of Cologne
PI Affiliation
University of Cologne

Additional Trial Information

Status
In development
Start date
2018-06-04
End date
2018-08-01
Secondary IDs
Abstract
A natural field experiment is conducted to investigate how a dummy evaluation category at the bottom of a feedback scale influences effort provision and performance. Subjects work on a real-effort task in two successive periods. A performance-dependent bonus is paid for both periods. Performance is evaluated using three evaluation categories. Category 1 is awarded to the highest-performing subjects, while the lowest-performing subjects are placed in category 3.

Subjects are randomly assigned to one of three treatments. In treatment ND, no dummy evaluation category is shown, i.e., subjects are shown only the three evaluation categories that are actually used. In treatment CD, a fourth, dummy evaluation category is shown. In treatment TD, a fourth, dummy evaluation category is shown and it is made transparent that this fourth category is not used.

We hypothesize that, in the second period, average effort provision and performance are higher in treatment CD than in treatment ND, higher in treatment CD than in treatment TD, and higher in treatment TD than in treatment ND. We expect these effects to be strongest for subjects evaluated in the lowest category (category 3).
External Link(s)

Registration Citation

Citation
Sliwka, Dirk et al. 2018. "On Rating Scales in Subjective Performance Evaluations – The Effect of a Dummy Category II." AEA RCT Registry. May 30. https://doi.org/10.1257/rct.3029-1.0
Former Citation
Sliwka, Dirk et al. 2018. "On Rating Scales in Subjective Performance Evaluations – The Effect of a Dummy Category II." AEA RCT Registry. May 30. https://www.socialscienceregistry.org/trials/3029/history/30165
Experimental Details

Interventions

Intervention(s)
Intervention Start Date
2018-06-04
Intervention End Date
2018-08-01

Primary Outcomes

Primary Outcomes (end points)
The number of cover sheets entered correctly at the individual level (individual performance); the number of cover sheets entered at the individual level (individual effort provision); post-experiment questionnaire data.
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
The first period is identical across treatments: subjects are asked to work on a real-effort task. They are informed that they will receive a performance-based bonus payment, but no information on the evaluation scale is given.

In the second period, subjects learn about the evaluation scale and receive private feedback on how their first-period performance was evaluated before they can work again. Actual evaluations are based on subjects' relative performance and follow exactly the same procedure in all three treatments, so that only categories 1-3 are actually awarded. Subjects are not informed about the specific details of the evaluation procedure, but they learn that about 30% of subjects are awarded the best grade and 40% the second-best grade. In treatments ND and TD, they are informed that 30% of the evaluations are in category 3. In treatment CD, they learn that 30% of the evaluations are in either category 3 or category 4.
Subjects are informed that the evaluation scale of the first period is used for the bonus payments in the second period.
We vary whether a fourth, dummy evaluation category is shown in the evaluation scale in the second period (treatment ND vs. CD/TD). Additionally, we vary whether the non-usage of this dummy category is made transparent (treatment CD vs. TD).
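The relative-performance rating described above can be sketched as follows. This is only an illustration under stated assumptions: the registration does not disclose the exact evaluation procedure, so the function name `assign_categories`, the rounding of the 30% and 40% shares, the tie-breaking by subject id, and the example scores are all hypothetical.

```python
from typing import Dict


def assign_categories(scores: Dict[str, int],
                      top_share: float = 0.30,
                      middle_share: float = 0.40) -> Dict[str, int]:
    """Map each subject to category 1, 2 or 3 by relative performance."""
    # Rank subjects from best to worst. Ties are broken by subject id,
    # which is an assumption; the registration does not describe ties.
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    n_top = round(len(ranked) * top_share)
    n_middle = round(len(ranked) * middle_share)

    categories = {}
    for position, subject in enumerate(ranked):
        if position < n_top:
            categories[subject] = 1   # best grade, about 30% of subjects
        elif position < n_top + n_middle:
            categories[subject] = 2   # second-best grade, about 40%
        else:
            categories[subject] = 3   # remaining subjects, about 30%
    return categories


# Hypothetical first-period scores (correctly entered cover sheets).
example_scores = {
    f"s{i:02d}": score
    for i, score in enumerate([90, 85, 80, 70, 65, 60, 55, 40, 30, 20])
}
example_categories = assign_categories(example_scores)
```

In this sketch category 4 is never returned, which mirrors the design: the fourth category is only displayed in treatments CD and TD and is not awarded in any treatment.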
Experimental Design Details
A natural field experiment is conducted on Amazon MTurk. Acting as a university department, we ask subjects to update a database of class grades in two successive periods. We provide 200 scanned exam cover sheets in period 1 and 400 in period 2; each cover sheet contains six handwritten grades.

The first period is identical across treatments. After short instructions, subjects must pass a quiz on the task and payment structure. They can then work for 20 minutes. Subjects are informed that a performance-based bonus is paid in addition to a fixed wage. Performance is defined as the number of correctly entered cover sheets. However, no information on the number of feedback categories is given.

After the first period, subjects are invited by e-mail to work again. On entering the second period, subjects are given private feedback on their performance in the first period. Across treatments, performance is evaluated using three evaluation categories. Subjects must pass a quiz on the task and payment structure before they can work in the second period. Working time is not restricted. Subjects are paid a fixed wage and an additional performance-based bonus in the second period. Subjects are informed that the evaluation scale of the first period is used to evaluate performance in the second period. Performance is defined as in the first period.

We randomly assign subjects to treatment CD, treatment TD, or the control group ND, stratifying treatment assignment by first-period performance.
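One common way to stratify assignment on a baseline outcome is to rank subjects by that outcome, form blocks of three neighbouring subjects, and shuffle the three arms within each block. The sketch below illustrates that approach only; the registration does not say how strata are defined, so the block size, the function name `stratified_assignment`, the seed, and the example scores are assumptions.

```python
import random
from typing import Dict

TREATMENTS = ["ND", "CD", "TD"]


def stratified_assignment(scores: Dict[str, int], seed: int = 0) -> Dict[str, str]:
    """Assign arms within blocks of subjects with similar first-period scores."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    # Order subjects by first-period performance (ties broken by id).
    ranked = sorted(scores, key=lambda s: (-scores[s], s))

    assignment = {}
    for start in range(0, len(ranked), len(TREATMENTS)):
        block = ranked[start:start + len(TREATMENTS)]
        arms = TREATMENTS[:]      # copy, so the constant is not mutated
        rng.shuffle(arms)         # random order of arms within this block
        for subject, arm in zip(block, arms):
            assignment[subject] = arm
    return assignment


# Hypothetical first-period scores for twelve subjects.
example_scores = {f"s{i:02d}": 100 - 3 * i for i in range(12)}
example_assignment = stratified_assignment(example_scores, seed=42)
```

With a sample size divisible by three, each arm receives the same number of subjects and every block of similar performers contains one subject per arm, so first-period performance is balanced across ND, CD and TD by construction.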
Randomization Method
Stratification method
Randomization Unit
Individual
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
1665
Sample size: planned number of observations
1665
Sample size (or number of clusters) by treatment arms
Control (ND): 555 subjects
CD: 555 subjects
TD: 555 subjects
Note: The number of subjects in each treatment may change slightly due to selective attrition in the second period. We document drop-outs and test whether they are systematic.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
MDES = 8 cover sheets entered correctly; standard deviation = 32 cover sheets entered correctly; 11%
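To show how a minimum detectable effect relates to the standard deviation and the sample size per arm, the sketch below uses the textbook normal-approximation formula for a two-sided comparison of two equally sized arms. The registration does not state the significance level, the power, or any allowance for attrition, so `alpha=0.05`, `power=0.80` and the use of all 555 subjects per arm are assumptions, and the sketch is not meant to reproduce the registered figure.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(sd: float, n_per_arm: int,
                              alpha: float = 0.05, power: float = 0.80) -> float:
    """Normal-approximation MDE for a two-arm comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for target power
    # Standard error of a difference in means with equal arms and equal SD.
    standard_error = sd * sqrt(2 / n_per_arm)
    return (z_alpha + z_power) * standard_error


# Figures taken from the registration.
registered_mdes = 8   # cover sheets entered correctly
registered_sd = 32    # cover sheets entered correctly

# The registered MDES expressed in standard-deviation units.
standardized_mdes = registered_mdes / registered_sd

# MDE under the assumed alpha and power with 555 subjects per arm.
mde_full_sample = minimum_detectable_effect(sd=registered_sd, n_per_arm=555)
```

Here `standardized_mdes` expresses the registered effect as a quarter of a standard deviation, while `mde_full_sample` depends entirely on the assumed inputs and would change with a different power target or a smaller analysis sample.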
IRB

Institutional Review Boards (IRBs)

IRB Name
IRB Approval Date
IRB Approval Number

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials