Reciprocity in Peer Assessments

Last registered on April 29, 2026

Pre-Trial

Trial Information

General Information

Title
Reciprocity in Peer Assessments
RCT ID
AEARCTR-0018376
Initial registration date
April 22, 2026


First published
April 29, 2026, 3:32 PM EDT


Locations

Region

Primary Investigator

Affiliation
Xiangtan University

Other Primary Investigator(s)

PI Affiliation
University of Cyprus
PI Affiliation
University of Southampton
PI Affiliation
University of Cyprus

Additional Trial Information

Status
In development
Start date
2026-04-23
End date
2026-12-31
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
Peer assessment is widely used in academic settings, workplace evaluations, and collaborative contexts as a scalable alternative to expert grading. A well-documented concern is that grades assigned by peers may be driven not only by the objective quality of the work being evaluated, but also by strategic and social considerations — most notably, reciprocity.
This study examines two related phenomena: (i) whether evaluators who expect their own grade to be influenced by the grade they assign (i.e., sequential first movers in a dyadic grading exchange) inflate their assessments in anticipation of reciprocal reward, and (ii) whether evaluators who have already received a grade adjust their own assessment in response to the surprise component of the grade they received.
We exploit a controlled laboratory design in which participants first complete an analytical task, then grade each other's response. In the sequential condition, the second grader (the responder) observes the grade they received before assigning a grade in return, creating an identifiable window for reciprocal adjustment. In the simultaneous condition, both grades are submitted without knowledge of the other's assessment. Comparing grading behaviour across these two conditions allows us to isolate strategic anticipation and reciprocal adjustment from other determinants of peer grading.
External Link(s)

Registration Citation

Citation
Maniadis, Zacharias et al. 2026. "Reciprocity in Peer Assessments." AEA RCT Registry. April 29. https://doi.org/10.1257/rct.18376-1.0
Experimental Details

Interventions

Intervention(s)
Intervention Start Date
2026-04-23
Intervention End Date
2026-12-31

Primary Outcomes

Primary Outcomes (end points)
The grade assigned to one's peer for the response they submitted for the main task.
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
**Participants and Setting**
Participants will be recruited from a university subject pool and will participate in sessions of approximately 30 to 45 minutes. Participants are randomly assigned to either the simultaneous condition (control) or the sequential condition (treatment). The unit of randomisation is the individual; within each session, participants are randomly paired into dyads. Informed consent will be obtained from all participants prior to the experiment.
**CRT**
Participants first complete a Cognitive Reflection Test (CRT) consisting of three multiple-choice questions. We use the three-item CRT MCQ-4 from Sirota & Juanchich (2018), keeping the order of items fixed across participants (bat and ball, widgets, lily pads) and randomizing the order of answer options within each item for each participant. After completing the CRT, participants proceed to the main task.
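The presentation rule described above (fixed item order, per-participant shuffling of answer options) can be sketched as follows. This is only an illustration: the function name `build_crt_form`, the seeding scheme, and the placeholder option labels are assumptions, and the actual CRT answer options are not reproduced here.

```python
import random

# Item order is fixed across participants, as stated in the design.
ITEM_ORDER = ["bat_and_ball", "widgets", "lily_pads"]

# Placeholder labels for the four answer options of each item (hypothetical;
# the real MCQ-4 options are not reproduced in this sketch).
OPTIONS = {
    item: [f"{item}_option_{k}" for k in range(1, 5)] for item in ITEM_ORDER
}


def build_crt_form(participant_seed):
    """Return the CRT items in fixed order with shuffled answer options.

    `participant_seed` is an assumed per-participant seed used only to make
    the shuffle reproducible in this example.
    """
    rng = random.Random(participant_seed)
    form = []
    for item in ITEM_ORDER:
        shuffled = list(OPTIONS[item])  # copy so the master list is untouched
        rng.shuffle(shuffled)
        form.append((item, shuffled))
    return form
```

Calling `build_crt_form` with different seeds yields the same three items in the same order, with the four options of each item presented in a participant-specific order.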
**The Main Task**
Each participant acts as a hospital administrator and is presented with a table of patient recovery times across nine hospital units over six months. Some units implemented a new staff training programme in March; others continued with the old procedures throughout the period. Participants are asked to evaluate, based on the data, whether the training programme improved health outcomes and to write a short response (maximum 500 characters). They are given 10 minutes to submit their response.
The correct conclusion is that the training had no detectable effect: recovery times fell across all nine units — including the four that never received training — suggesting a confounding time trend rather than a genuine treatment effect. Reaching this conclusion requires comparing trained and untrained units rather than looking only at within-unit trends.
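The reasoning above can be illustrated with a small difference-in-differences calculation. The numbers below are made up purely for illustration and are not the table shown to participants; the choice of January to June as the six months, the pre and post windows, and all names in the code are assumptions.

```python
from statistics import mean

# Assumed calendar: six months with the training rollout in March.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
PRE = ("Jan", "Feb")          # assumed pre-training window
POST = ("Apr", "May", "Jun")  # assumed post-training window


def make_unit(baseline):
    """Synthetic recovery times (days) falling by 0.5 per month in every unit."""
    return {month: baseline - 0.5 * i for i, month in enumerate(MONTHS)}


# Five trained and four untrained units, matching the nine-unit setup.
trained = {f"T{k}": make_unit(10 + k) for k in range(5)}
untrained = {f"U{k}": make_unit(9 + k) for k in range(4)}


def change(unit):
    """Post-period mean minus pre-period mean for one unit."""
    return mean(unit[m] for m in POST) - mean(unit[m] for m in PRE)


def mean_change(units):
    """Average pre-to-post change across a group of units."""
    return mean(change(u) for u in units.values())


# Looking only at trained units suggests an improvement ...
within_trend = mean_change(trained)
# ... but subtracting the change in untrained units removes the common trend.
did_estimate = mean_change(trained) - mean_change(untrained)
```

With these synthetic numbers, `within_trend` is negative (recovery times fall in trained units), while `did_estimate` is zero because untrained units fall by the same amount.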
After submitting their response, participants report their own assessment of the training programme's effectiveness on a seven-point scale from −3 (strongly harmful) to +3 (strongly beneficial), with 0 indicating no effect. They also record their expected grade (the grade they expect to receive from their partner) on a 1–10 scale. These measures are not reported to their partner.
**Grading Procedure**
After completing the task, each participant grades their partner's response on a 1–10 scale. In the sequential condition, one participant in each pair is randomly designated the first mover and submits their grade first. The responder then observes their received grade before assigning their own. The responder's window for adjustment is thus identified by the timing of information revelation. In the simultaneous condition, both participants submit grades without knowledge of the other's assessment.
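One way to operationalise the surprise component for responders is the gap between the grade received and the grade the participant said they expected. The registration does not give a formula, so the definition below (received minus expected) and the function name `grade_surprise` are assumptions made for illustration only.

```python
def grade_surprise(received_grade, expected_grade):
    """Assumed surprise measure: received grade minus expected grade.

    Both grades are on the 1-10 scale used in the experiment. Positive values
    mean the responder was graded better than they expected.
    """
    for grade in (received_grade, expected_grade):
        if not 1 <= grade <= 10:
            raise ValueError(f"grade {grade} is outside the 1-10 scale")
    return received_grade - expected_grade
```

For example, a responder who expected a 6 and received an 8 has `grade_surprise(8, 6) == 2`, while one who expected a 9 and received a 5 has a surprise of -4.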
After submitting a grade, participants are asked to provide feedback on the grading criteria they applied. This feedback will not be used for analysis and serves only to inform improvements to the design and its implementation in future studies.
**Survey**
After grading, participants complete a short post-task survey including demographic questions (age, gender, study major, and year of study) and a validated measure of subjective numeracy (SNS-3 from McNaughton et al., 2015).
Experimental Design Details
Not available
Randomization Method
Participants are randomly assigned to condition at the session level, with equal allocation targeted across conditions. Within sessions, dyad formation and first-mover designation are determined by random draw at the time of pairing.
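A minimal sketch of the within-session pairing step, taking the session's condition as given. The function name `form_dyads`, the dictionary layout, and the optional seed are hypothetical; the actual software used to run the sessions is not specified in this registration.

```python
import random


def form_dyads(participant_ids, sequential, seed=None):
    """Randomly pair participants and, if sequential, pick a first mover.

    `participant_ids` must contain an even number of participants.
    `seed` is an assumed convenience for reproducibility in this example.
    """
    ids = list(participant_ids)
    if len(ids) % 2 != 0:
        raise ValueError("an even number of participants is required")

    rng = random.Random(seed)
    rng.shuffle(ids)  # random draw determines who is paired with whom

    dyads = []
    for i in range(0, len(ids), 2):
        a, b = ids[i], ids[i + 1]
        if sequential:
            # Separate random draw for the first-mover designation.
            first, second = (a, b) if rng.random() < 0.5 else (b, a)
            dyads.append({"first_mover": first, "responder": second})
        else:
            dyads.append({"members": (a, b)})
    return dyads
```

For a sequential session with 12 participants, `form_dyads(range(1, 13), sequential=True)` returns six dyads, each with one first mover and one responder.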
Randomization Unit
Individual participant
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
0
Sample size: planned number of observations
180 participants
Sample size (or number of clusters) by treatment arms
60 participants (30 dyads) in the simultaneous condition and 120 participants (60 dyads) in the sequential condition, of whom 60 will be first movers and 60 will be second movers.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB

Institutional Review Boards (IRBs)

IRB Name
Cyprus National Bioethics Committee
IRB Approval Date
2020-07-24
IRB Approval Number
ΕΕΒΚ ΕΠ 2020.01.166