How Context Shapes Teachers’ Use of AI Grading

Last registered on October 01, 2025

Pre-Trial

Trial Information

General Information

Title
How Context Shapes Teachers’ Use of AI Grading
RCT ID
AEARCTR-0016662
Initial registration date
September 21, 2025

First published
September 22, 2025, 6:52 AM EDT

Last updated
October 01, 2025, 9:46 PM EDT

Locations

Region

Primary Investigator

Affiliation
The University of Queensland

Other Primary Investigator(s)

PI Affiliation
Yale University
PI Affiliation
Curtin University

Additional Trial Information

Status
In development
Start date
2025-09-21
End date
2026-01-31
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
Schools are beginning to deploy artificial intelligence (AI) tools that suggest grades or feedback, yet little is known about how context and explanations shape teachers’ use of these recommendations. We study this with an anonymous, preregistered online survey experiment of K–12 teachers. Participants grade brief, realistic vignettes and are randomly assigned to (i) no AI suggestion (control), (ii) an AI grade recommendation, or (iii) the same recommendation accompanied by task/rubric context and a concise rationale (explainability). For a subset of vignettes, we manipulate the accuracy of the AI suggestion, enabling tests of calibration (accepting correct and rejecting incorrect advice). Primary outcomes are AI uptake/override, accuracy-conditioned uptake, perceived trust, fairness/transparency, workload, and the between-teacher variance of assigned grades. We further explore heterogeneity by experience, subject area, and prior AI use. The design isolates whether contextual and explanatory cues help teachers discriminate between good and bad AI advice and whether such cues improve perceptions without increasing cognitive burden. Findings will inform the design and training of human-in-the-loop assessment systems that support, rather than substitute for, professional judgment.
External Link(s)

Registration Citation

Citation
Goulas, Sofoklis, Rigissa Megalokonomou and Panagiotis Sotirakopoulos. 2025. "How Context Shapes Teachers’ Use of AI Grading." AEA RCT Registry. October 01. https://doi.org/10.1257/rct.16662-2.0
Experimental Details

Interventions

Intervention(s)
K–12 teachers complete a short, anonymous online survey (~9 minutes) with brief grading vignettes. For each vignette, teachers are randomly assigned to one of three conditions: (1) Control – no AI input; (2) AI recommendation – an on-screen suggested grade; (3) AI + context/explanation – the same suggestion plus brief task/rubric context and a one-sentence rationale. Teachers choose a grade and answer a few items on usefulness, fairness/transparency, and workload. No identifying information is collected, and no real student records are involved. Participation is voluntary.
Intervention (Hidden)
Intervention Start Date
2025-09-21
Intervention End Date
2025-11-29

Primary Outcomes

Primary Outcomes (end points)
teacher-assigned grade for each vignette
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
Experimental Design (Public)
Anonymous online survey experiment with K–12 teachers (~9 minutes). Each participant grades several short, realistic student-work vignettes. For each vignette, the platform randomly assigns one of three conditions: (i) Control (no AI input), (ii) AI recommendation (an on-screen suggested grade), or (iii) AI + context/explanation (the same suggestion plus brief task/rubric context and a one-sentence rationale). The order of vignettes is randomized.

Teachers choose a grade and answer a few brief questions on usefulness, fairness/transparency, and workload. Primary comparisons are across the three conditions (intent-to-treat). No identifying data are collected and no real student records are involved; participation is voluntary.
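
As a hedged illustration only: the registration states that Qualtrics performs the randomization, and a minimal Python sketch of the vignette-level assignment logic described above might look as follows. The function name, condition labels, and seed handling are hypothetical; Qualtrics implements the equivalent logic internally via its Randomizer element.

```python
import random

# Hypothetical sketch of the assignment logic described above; in the
# actual study, Qualtrics' built-in Randomizer performs this server-side.
CONDITIONS = ["control", "ai_recommendation", "ai_plus_explanation"]

def assign_participant(vignette_ids, seed=None):
    """Randomize vignette order for one participant and draw one of the
    three conditions independently for each vignette."""
    rng = random.Random(seed)
    order = list(vignette_ids)
    rng.shuffle(order)  # vignette order is randomized per participant
    return [(v, rng.choice(CONDITIONS)) for v in order]

# Example: one participant grading six vignettes
print(assign_participant(range(1, 7)))
```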
Experimental Design Details
Randomization Method
Randomization by Qualtrics
Randomization Unit
individuals
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
1,500 teachers (target)
Sample size: planned number of observations
1,500 teachers (target)
Sample size (or number of clusters) by treatment arms
500 teachers in each arm
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
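This field is left blank in the registration. Purely as a back-of-envelope sketch, not a registered value, the conventional two-arm minimum detectable effect for a continuous outcome with 500 teachers per arm can be computed as below; the standardized outcome scale (SD = 1), two-sided α = 0.05, and 80% power are assumptions, not stated in the registration.

```python
from scipy.stats import norm

def mde_two_arm(n_per_arm, sd=1.0, alpha=0.05, power=0.80):
    """Minimum detectable effect for a two-sample mean comparison:
    MDE = (z_{1-alpha/2} + z_{power}) * sd * sqrt(2 / n_per_arm)."""
    z_alpha = norm.ppf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)           # 0.84 for 80% power
    return (z_alpha + z_beta) * sd * (2 / n_per_arm) ** 0.5

# 500 teachers per arm, outcome standardized to SD = 1
print(round(mde_two_arm(500), 3))  # ~0.177 SD
```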
IRB

Institutional Review Boards (IRBs)

IRB Name
Monash University
IRB Approval Date
2025-10-01
IRB Approval Number
49289

Post-Trial

Post Trial Information

Study Withdrawal

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials