Self-Assessments in Subjective Performance Evaluation

Last registered on June 12, 2025

Pre-Trial

Trial Information

General Information

Title
Self-Assessments in Subjective Performance Evaluation
RCT ID
AEARCTR-0015223
Initial registration date
January 27, 2025

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
January 27, 2025, 10:38 AM EST

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Last updated
June 12, 2025, 4:35 AM EDT

Last updated is the most recent time when changes to the trial's registration were published.

Locations

Region

Primary Investigator

Affiliation
Leibniz University Hannover

Other Primary Investigator(s)

PI Affiliation
Erasmus University Rotterdam
PI Affiliation
University of Cologne

Additional Trial Information

Status
In development
Start date
2025-08-04
End date
2025-12-23
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
Many organizations ask employees to provide a self-assessment of their own performance before their supervisors evaluate that performance. We study the effect of using self-assessments on the accuracy of overall performance evaluations in an online experiment in which subjects work on a real-effort task and other subjects rate their performance. In a 2x2 experimental design, we vary (i) whether agents provide a self-assessment to supervisors and (ii) whether agents receive a bonus based on the rating provided by the supervisor.
External Link(s)

Registration Citation

Citation
Kusterer, David, Marina Schröder and Dirk Sliwka. 2025. "Self-Assessments in Subjective Performance Evaluation." AEA RCT Registry. June 12. https://doi.org/10.1257/rct.15223-2.0
Experimental Details

Interventions

Intervention(s)
In a 2x2 experimental design, we vary (i) whether agents provide a self-assessment to supervisors and (ii) whether agents receive a bonus based on the rating provided by the supervisor.
Intervention (Hidden)
See experimental design.
Intervention Start Date
2025-08-04
Intervention End Date
2025-12-23

Primary Outcomes

Primary Outcomes (end points)
- Informativeness of ratings
- Rating
- Self-Assessment
Primary Outcomes (explanation)
We measure the informativeness of ratings by (i) the squared deviation between rating and actual performance and (ii) the profit achieved from a fictitious job assignment decision.
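For illustration, the first measure can be computed as in the following minimal sketch (column names are hypothetical placeholders, not the registered analysis code; the job-assignment profit measure is not sketched):

# Minimal sketch of the first informativeness measure: the squared deviation
# between a supervisor's rating and the agent's actual performance.
# Column names are hypothetical placeholders, not the registered analysis code.
import pandas as pd

def squared_rating_error(df: pd.DataFrame) -> pd.Series:
    """Squared deviation between rating and actual performance (0-100 words)."""
    return (df["rating"] - df["performance"]) ** 2

# Example (with a hypothetical data file):
# df = pd.read_csv("ratings.csv")
# df["sq_error"] = squared_rating_error(df)
# print(df.groupby("treatment")["sq_error"].mean())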

Secondary Outcomes

Secondary Outcomes (end points)
Prediction quality
Secondary Outcomes (explanation)
We regress performance on the rating and compare the respective coefficients of determination (R²) between the treatments.
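A minimal sketch of this comparison, assuming a long-format data set with hypothetical column names, could look as follows:

# Sketch: regress performance on rating within each treatment cell and
# compare the resulting coefficients of determination (R²).
# Data layout and column names are assumptions, not the registered analysis code.
import pandas as pd
import statsmodels.formula.api as smf

def r_squared_by_treatment(df: pd.DataFrame) -> dict:
    """Return the R² of an OLS of performance on rating for each treatment."""
    out = {}
    for treatment, cell in df.groupby("treatment"):
        fit = smf.ols("performance ~ rating", data=cell).fit()
        out[treatment] = fit.rsquared
    return out

# Example (with a hypothetical data file):
# df = pd.read_csv("ratings.csv")
# print(r_squared_by_treatment(df))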

Experimental Design

Experimental Design
We study the effect of using self-assessments on the accuracy of overall performance evaluations in an online experiment in which subjects work on a real-effort task and other subjects rate their performance. In a 2x2 experimental design, we vary (i) whether agents provide a self-assessment to supervisors and (ii) whether agents receive a bonus based on the rating provided by the supervisor.
Experimental Design Details
We have developed a formal model in which a supervisor and an agent both receive signals on the agent's performance. The agent then reports a self-assessment to the supervisor, and the supervisor updates her beliefs on the agent's performance and makes the performance assessment. The supervisor is paid based on the accuracy of her rating (as measured by the squared difference between true performance and rating). Agents can receive a bonus, which depends on the supervisor's rating. Agents trade off potential monetary gains against disutility from dishonesty in self-assessments, and supervisors have incomplete information on the relative weights each agent assigns to both objectives. We show in the model that (i) supervisors' ratings are increasing in agents' self-assessments, (ii) less honest agents inflate their self-assessments to a stronger extent and thus receive higher bonuses, but (iii) the use of self-assessments still leads to a higher overall expected accuracy of ratings.

We test the predictions of the model by conducting an experiment on CloudResearch Connect. Subjects participate in the experiment either as agents (called workers in the experiment) or as supervisors (called raters in the experiment). In a first stage, subjects in the role of agents perform a real-effort entry task. The task consists of entering text contained in hard-to-read images (similar to so-called 'captchas'). Agents see 10 pages with 10 images on each page. Each page has one of five different time limits: 17, 19, 21, 23, or 25 seconds. Each of these time limits occurs exactly twice in randomized order. The order is the same for all agents. The time limit for the next page is announced during a 5-second countdown before the page starts. The sum of correctly entered words out of the 100 words across all 10 pages constitutes an agent's true performance. After the entry task, agents review all their entries across the 10 pages and are then asked to submit a self-assessment of the number (out of 100) of correctly entered words. Agents also see the distribution of performance of agents from another experiment with the same real-effort task (the same data that were shown to participants in Kusterer and Sliwka, forthcoming). They are informed whether this self-assessment will be revealed to the supervisor.
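For illustration, the page structure described above can be sketched as follows (a reconstruction of the described schedule, not the experiment's actual software):

# Sketch of the entry-task page schedule: 10 pages with 10 images each,
# five time limits (17, 19, 21, 23, 25 seconds) occurring exactly twice,
# drawn once in random order and then held fixed for all agents.
import random

TIME_LIMITS = [17, 19, 21, 23, 25]

def draw_page_schedule(seed: int = 0) -> list[int]:
    """Return the time limit (in seconds) for each of the 10 pages."""
    schedule = TIME_LIMITS * 2      # each time limit occurs exactly twice
    rng = random.Random(seed)
    rng.shuffle(schedule)           # randomized order, identical for all agents
    return schedule

print(draw_page_schedule())         # e.g. [23, 17, 25, 19, ...]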

In a second stage, agents' performance is rated by subjects in the role of supervisors. For each rated agent, the supervisor sees the number of correctly entered words on one randomly selected screen. Each supervisor rates five agents, and one of these ratings is payoff-relevant for the supervisor and the respective agent. Supervisors also see the distribution of performance that was shown to the agents on their self-assessment page. Ratings are given on a scale of 0-100, and supervisors are told that they should provide a “rating for the number of correctly entered words out of the 100 displayed words across all 10 pages for this worker”. In all treatments, a supervisor's payment depends on the accuracy of the rating, i.e., the squared deviation between the agent's performance on all 10 pages of the Entry Task and the supervisor's assessment of this performance.

We elicit agents' preferences for honesty with different survey measures (Grosch et al., 2020; Necker and Paetzel, 2023; Schudy et al., 2024).

We run a 2x2 experimental design, varying whether or not
(a) the agent's self-assessment is revealed to the supervisor after the agent has performed the data-entry task, and
(b) the agent receives a bonus based on the rating provided by the supervisor.

Procedure of the experiment:

Participants are randomly assigned to a role (agent or supervisor) and to a treatment. The identity of participants is never revealed to other participants or to the experimenter. Payment is assigned based on the Connect ID, which the experimenter cannot link to the identity of participants. Before the experiment starts, participants provide informed consent. Specifically, participants are informed that they have the right to withdraw from the study at any time.

Participants are also informed that they must complete the task within 60 minutes and correctly answer a set of comprehension questions in order to be eligible for payment.
Supervisors receive a fixed wage of $0.80 plus an accuracy bonus of max{$2.50 - $0.001*(Rating - Performance)^2, $0}. Agents receive a fixed wage of $0.80 + $1.70; in the treatments with a bonus, the bonus is $1*Rating*0.01 (i.e., $0.01 per rating point).
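These payment rules can be illustrated with a minimal sketch (amounts in US dollars; the flooring of the supervisor bonus at $0 reflects the accuracy formula above):

# Sketch of the payoff rules described above (amounts in US dollars).
# Rating and performance are both on a 0-100 scale.

def supervisor_pay(rating: float, performance: float) -> float:
    """Fixed wage of $0.80 plus max(2.50 - 0.001 * (rating - performance)**2, 0)."""
    bonus = max(2.50 - 0.001 * (rating - performance) ** 2, 0.0)
    return 0.80 + bonus

def agent_pay(rating: float, bonus_treatment: bool) -> float:
    """Fixed wage of $0.80 + $1.70; in bonus treatments, $0.01 per rating point."""
    pay = 0.80 + 1.70
    if bonus_treatment:
        pay += 1.00 * rating * 0.01
    return pay

# A perfectly accurate rating earns the supervisor the full $2.50 bonus;
# a deviation of 50 words wipes it out (0.001 * 50**2 = 2.50).
print(supervisor_pay(rating=60, performance=60))    # 3.3
print(agent_pay(rating=60, bonus_treatment=True))   # 3.1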

The treatments are summarized below (an illustrative assignment sketch follows the list). The number n in each cell reflects the number of agents. There is an equal number of supervisors in each cell.
1. No self-assessment, no bonus, n ≈ 1000
2. Self-assessment, no bonus, n ≈ 1000
3. No self-assessment, bonus, n ≈ 1000
4. Self-assessment, bonus, n ≈ 1000
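As a minimal sketch, individual-level assignment to the four cells could look as follows (illustrative only; the treatment labels are placeholders, and the actual assignment is implemented by the experimental software):

# Sketch: individual-level random assignment to the 2x2 design
# (self-assessment yes/no crossed with bonus yes/no). Labels are placeholders.
import random

TREATMENTS = [
    ("no_self_assessment", "no_bonus"),
    ("self_assessment", "no_bonus"),
    ("no_self_assessment", "bonus"),
    ("self_assessment", "bonus"),
]

def assign_treatment(rng: random.Random) -> tuple[str, str]:
    """Draw one of the four cells with equal probability for one participant."""
    return rng.choice(TREATMENTS)

rng = random.Random(2025)
print(assign_treatment(rng))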

Exclusion criteria
On CloudResearch Connect, we exclude participants with an approval rate below 95%. Agents are excluded from payment (and from further participation in the study) if they do not enter text for a single image (i.e., they make no entries at all); they are informed about this exclusion. For the analysis, we also drop all observations from supervisors who spent less than 10 seconds in total on all 5 evaluation screens.
All participants must answer comprehension questions to make sure they understand the instructions. If they do not answer a question correctly after the second attempt, they are excluded from further participation.
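The analysis-stage exclusion of inattentive supervisors could be implemented along the lines of the following sketch (column names are hypothetical placeholders, not the registered analysis code):

# Sketch: drop all observations from supervisors who spent less than
# 10 seconds in total on their 5 evaluation screens.
# Column names are hypothetical placeholders, not the registered analysis code.
import pandas as pd

def drop_fast_supervisors(ratings: pd.DataFrame, min_seconds: float = 10.0) -> pd.DataFrame:
    """Keep only ratings from supervisors whose total screen time is at least min_seconds."""
    total_time = ratings.groupby("supervisor_id")["screen_seconds"].transform("sum")
    return ratings[total_time >= min_seconds]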

References:
Grosch, K., Müller, S., Rau, H., & Zhurakhovska, L. (2020). Measuring (social) preferences with simple and short questionnaires. Mimeo.
Kusterer, D. J., & Sliwka, D. (forthcoming). Social Preferences and the Informativeness of Subjective Performance Evaluations. Management Science.
Necker, S., & Paetzel, F. (2023). The effect of losing and winning on cheating and effort in repeated competitions. Journal of Economic Psychology, 98, 102655.
Schudy, S., Grundmann, S., & Spantig, L. (2024). Individual Preferences for Truth-Telling. CESifo Working Paper No. 11521.


Randomization Method
Randomization by computer
Randomization Unit
Individual
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
4000 agents
4000 supervisors
Sample size: planned number of observations
4000 agents, 4000 supervisors
Sample size (or number of clusters) by treatment arms
1000 agents per treatment
1000 supervisors per treatment
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
We conducted a power analysis based on data from Kusterer and Sliwka (forthcoming), which studied subjective performance evaluations in a related setting without self-assessments. To achieve 80% power for a sensible effect size of an increase in the accuracy of ratings (i.e., a reduction in the squared rating error), we will have about 1000 supervisors per treatment; equivalently, the MDE for the squared rating error is estimated at about 108. Since, in contrast to Kusterer and Sliwka, each supervisor now evaluates 5 agents, we expect greater power (i.e., a lower MDE).
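The two-sample calculation behind such an MDE can be sketched as follows (the standard deviation of the squared rating error used here is a hypothetical placeholder rather than the value estimated from the Kusterer and Sliwka data, and the clustering of 5 ratings per supervisor is ignored):

# Sketch: MDE for a two-sample comparison of mean squared rating errors with
# about 1000 supervisors per treatment, 80% power, and two-sided alpha = 0.05.
# Ignores the clustering of 5 ratings per supervisor, so the actual design
# should be at least as powerful. The standard deviation below is a placeholder.
from statsmodels.stats.power import TTestIndPower

n_per_arm = 1000
sd_sq_error = 860.0   # hypothetical SD of the squared rating error

# Smallest detectable difference in standard-deviation units (Cohen's d).
d = TTestIndPower().solve_power(effect_size=None, nobs1=n_per_arm,
                                alpha=0.05, power=0.80, ratio=1.0)
print(round(d, 3))             # about 0.125
print(round(d * sd_sq_error))  # about 108 squared-error units with this SD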
IRB

Institutional Review Boards (IRBs)

IRB Name
Ethical Review Board of the Faculty of Management, Economics and Social Sciences of the University of Cologne
IRB Approval Date
2024-07-15
IRB Approval Number
240036DS
Analysis Plan

There is information in this trial unavailable to the public.

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials