Experimental Design
**Participants and Setting**
Participants will be recruited from a university subject pool and will participate in sessions of approximately 30 to 45 minutes. Participants are randomly assigned to either the simultaneous condition (control) or the sequential condition (treatment). The unit of randomisation is the individual; within each session, participants are randomly paired into dyads. Informed consent will be obtained from all participants prior to the experiment.
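The assignment and pairing step can be sketched as follows. This is a minimal illustration, not the study's implementation: the function name `assign_and_pair`, the condition labels, and the seed handling are hypothetical. It also assumes that dyads are formed among participants in the same condition, which the design does not state explicitly.

```python
import random


def assign_and_pair(participant_ids, seed=None):
    """Randomly assign individuals to a condition, then pair them into dyads.

    Assumption (not stated in the design): dyads are formed among
    participants who share a condition.
    """
    rng = random.Random(seed)
    conditions = ("simultaneous", "sequential")

    # Individual-level randomisation: each participant draws a condition.
    assignment = {pid: rng.choice(conditions) for pid in participant_ids}

    dyads, unpaired = [], []
    for condition in conditions:
        members = [pid for pid in participant_ids if assignment[pid] == condition]
        rng.shuffle(members)
        # Walk the shuffled list two at a time to form dyads.
        for i in range(0, len(members) - 1, 2):
            dyads.append((condition, members[i], members[i + 1]))
        # With an odd count, the last participant has no partner.
        if len(members) % 2 == 1:
            unpaired.append(members[-1])
    return assignment, dyads, unpaired
```

How unpaired participants are handled when a condition has an odd number of members is left open here; the sketch simply reports them.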
**CRT**
Participants first complete a Cognitive Reflection Test (CRT) consisting of three multiple-choice questions. We use the three-item CRT MCQ-4 from Sirota & Juanchich (2018), keeping the order of items fixed across participants (bat and ball, widgets, lily pads) and randomising the order of answer options within each item for each participant. After completing the CRT, participants proceed to the main task.
**The Main Task**
Each participant acts as a hospital administrator and is presented with a table of patient recovery times across nine hospital units over six months. Some units implemented a new staff training programme in March; others continued with the old procedures throughout the period. Participants are asked to evaluate, based on the data, whether the training programme improved health outcomes and to write a short response (maximum 500 characters). They are given 10 minutes to submit their response.
The correct conclusion is that the training had no detectable effect: recovery times fell across all nine units — including the four that never received training — suggesting a confounding time trend rather than a genuine treatment effect. Reaching this conclusion requires comparing trained and untrained units rather than looking only at within-unit trends.
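The comparison described above amounts to a difference-in-differences calculation. The sketch below illustrates it on invented numbers: the unit labels, recovery times, and function names are hypothetical and are not the table shown to participants. Each unit is summarised by its mean recovery time before and after March.

```python
from statistics import mean

# Hypothetical (before, after) mean recovery times in days per unit.
# Invented for illustration only; not the data used in the study.
trained = {
    "A": (12.0, 10.0),
    "B": (14.0, 12.0),
    "C": (11.0, 9.0),
    "D": (13.0, 11.0),
    "E": (15.0, 13.0),
}
untrained = {
    "F": (12.0, 10.0),
    "G": (13.0, 11.0),
    "H": (14.0, 12.0),
    "I": (11.0, 9.0),
}


def mean_change(units):
    """Average within-unit change (after minus before) across units."""
    return mean(after - before for before, after in units.values())


def difference_in_differences(treated, control):
    """Change in treated units net of the change in control units."""
    return mean_change(treated) - mean_change(control)


within_trained = mean_change(trained)  # negative: recovery times fell
did = difference_in_differences(trained, untrained)  # near zero: no effect
```

In this invented example the within-unit change for trained units is negative, which on its own looks like an improvement, while the difference-in-differences estimate is zero because untrained units improved by the same amount.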
After submitting their response, participants report their own assessment of the training programme's effectiveness on a seven-point scale from −3 (strongly harmful) to +3 (strongly beneficial), with 0 indicating no effect. They also record their expected grade (the grade they expect to receive from their partner) on a 1–10 scale. These measures are not reported to their partner.
**Grading Procedure**
After completing the task, each participant grades their partner's response on a 1–10 scale. In the sequential condition, one participant in each pair is randomly designated the first mover and submits their grade first. The responder then observes the grade they received before assigning their own. Any adjustment by the responder is thus identified by the timing of information revelation: only the responder grades with knowledge of the grade they received. In the simultaneous condition, both participants submit grades without knowledge of the other's assessment.
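The information structure of the two conditions can be summarised in a short sketch. The function names `assign_roles` and `observed_grade` and the role labels are hypothetical and chosen only for illustration; the actual experimental software may be organised differently.

```python
import random


def assign_roles(dyad, condition, rng):
    """Return a {participant: role} mapping for one dyad.

    In the sequential condition one member is drawn at random as the
    first mover; in the simultaneous condition both share one role.
    """
    if condition == "sequential":
        first, second = rng.sample(list(dyad), 2)
        return {first: "first_mover", second: "responder"}
    return {participant: "simultaneous" for participant in dyad}


def observed_grade(role, partner_grade):
    """Grade a participant can see before submitting their own grade.

    Only the responder sees the grade they received; first movers and
    simultaneous graders see nothing (None).
    """
    return partner_grade if role == "responder" else None
```

For example, `observed_grade("responder", 7)` returns the received grade, while the same call with `"first_mover"` or `"simultaneous"` returns `None`.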
After submitting a grade, participants are asked to describe the grading criteria they applied. These responses will not be used for analysis; they serve only to inform potential improvements to the design and its implementation in future studies.
**Survey**
After grading, participants complete a short post-task survey comprising demographic questions (age, gender, study major, and year of study) and a validated measure of subjective numeracy (SNS-3 from McNaughton et al., 2015).