Teacher Biases and AI

Last registered on January 22, 2026

Pre-Trial

Trial Information

General Information

Title
Teacher Biases and AI
RCT ID
AEARCTR-0017679
Initial registration date
January 17, 2026


First published
January 22, 2026, 7:03 AM EST


Locations

Location information in this trial is not available to the public.


Primary Investigator

Affiliation
Yale University

Other Primary Investigator(s)

PI Affiliation
Heriot-Watt University
PI Affiliation
Monash University

Additional Trial Information

Status
In development
Start date
2026-01-23
End date
2026-07-10
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
Artificial intelligence (AI) tools are increasingly used in education, including for providing feedback and supporting assessment. This randomized experiment studies how access to an AI chatbot affects teachers’ grading decisions and potential biases when marking real student work. Teachers recruited to an online Qualtrics study are each assigned one authentic student script consisting of a question prompt and handwritten responses. The experiment randomizes two dimensions: (i) the name attached to the student work, which serves as a signal of student characteristics, and (ii) whether teachers have access to an AI assistant while marking. One-third of teachers mark the script without AI support, one-third receive access to an untrained AI chatbot, and one-third receive access to a trained AI assistant that is randomly assigned to provide either systematically “fair” or systematically “harsh” guidance.

The study has two primary objectives. First, it estimates the extent to which student name cues affect teachers’ evaluations and grades for identical work. Second, it tests whether AI support changes teachers’ perceptions, grading behavior, and potential bias, and whether trained AI guidance can reduce grading errors or attenuate name-based disparities relative to both no-AI grading and untrained AI access. Follow-up survey measures collect information on teachers’ perceptions of the script, confidence, decision-making process, and use of the AI tool, allowing exploration of mechanisms underlying any observed effects. The study contributes to evidence on how AI tools interact with human judgment and bias in high-stakes educational evaluation.
External Link(s)

Registration Citation

Citation
Goulas, Sofoklis, Rigissa Megalokonomou and Hedier Tajali. 2026. "Teacher Biases and AI." AEA RCT Registry. January 22. https://doi.org/10.1257/rct.17679-1.0
Experimental Details

Interventions

Intervention(s)
Intervention Start Date
2026-01-23
Intervention End Date
2026-07-10

Primary Outcomes

Primary Outcomes (end points)
1. Assigned grade / score (continuous): The numeric mark the teacher assigns to the student script (converted to a standardized score where needed).
2. Name-based grading gap (treatment effect): Differences in assigned grades across randomly assigned student names for identical work, measured as the average score difference by name condition.
3. AI effect on grading level and dispersion: Differences in assigned grades between teachers with no AI, untrained AI, and trained AI access (including whether AI support increases or decreases average grades and reduces between-teacher variability).
4. Bias mitigation / amplification via AI: Interaction between student name condition and AI condition, capturing whether AI access attenuates or magnifies name-based differences in grading (an illustrative specification is sketched below this list).
5. Teacher perception outcomes (scale-based endpoints): Post-marking survey measures of perceived student ability/effort/quality of work and confidence in the assigned grade (measured on pre-specified Likert scales).
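As a loose illustration of outcome 4, the name-by-AI interaction could be estimated with a specification along the lines below; the notation is assumed for exposition and is not the study’s pre-specified model.

\[
Y_i = \alpha + \beta\,\mathrm{Name}_i + \gamma_1\,\mathrm{AI}^{\mathrm{untrained}}_i + \gamma_2\,\mathrm{AI}^{\mathrm{trained}}_i + \delta_1\left(\mathrm{Name}_i \times \mathrm{AI}^{\mathrm{untrained}}_i\right) + \delta_2\left(\mathrm{Name}_i \times \mathrm{AI}^{\mathrm{trained}}_i\right) + \varepsilon_i
\]

Here Y_i is teacher i’s (standardized) assigned grade, Name_i indicates the randomly assigned name condition, and the AI indicators mark the untrained- and trained-AI arms relative to no AI. The coefficient β recovers the name-based grading gap without AI support, while δ_1 and δ_2 capture whether each form of AI access attenuates or magnifies that gap.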
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
This study is an online randomized experiment implemented via Qualtrics with a sample of teachers. Each participant is asked to mark one authentic student script consisting of a prompt and handwritten responses. The experiment uses a randomized factorial design with two components. First, teachers are randomly assigned to receive an otherwise identical script labeled with different student names. Second, teachers are randomly assigned to one of three marking conditions: (i) marking without access to any AI tool, (ii) marking with access to an untrained AI chatbot, or (iii) marking with access to a trained AI assistant. In the trained-AI condition, participants are further randomly assigned to receive one of two versions of the assistant. After grading, teachers complete a short follow-up survey measuring perceptions of the student work and the marking process. The primary outcomes are teachers’ assigned grades and related perception measures, and the design allows estimation of how name cues and AI access affect grading decisions and potential disparities.
Experimental Design Details
Not available
Randomization Method
Random assignment is implemented automatically by Qualtrics using a computer-generated randomization procedure. Participants are randomly assigned at the individual (teacher) level to (i) the student name condition and (ii) the AI access condition (no AI vs untrained AI vs trained AI), with further random assignment within the trained-AI group to the specific AI assistant version.
Randomization Unit
Randomization is conducted at the individual participant (teacher) level. Each teacher is independently randomized to the student-name condition and the AI-access condition (and, if assigned to trained AI, to the specific assistant version). There is no cluster-level randomization (e.g., school-level or session-level).
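As a rough Python sketch of the individual-level assignment described above (the arm labels, placeholder student names, and helper function are illustrative assumptions; the actual randomization is implemented by the Qualtrics randomizer, not by this code):

import random

def assign_teacher(rng, names=("Name A", "Name B")):
    """Illustrative factorial assignment at the teacher level.
    Placeholder names: the registration does not disclose the actual name cues."""
    name_condition = rng.choice(names)                           # name attached to the otherwise identical script
    ai_condition = rng.choice(["no_ai", "untrained_ai", "trained_ai"])
    assistant_version = None
    if ai_condition == "trained_ai":                             # nested randomization within the trained-AI arm
        assistant_version = rng.choice(["fair", "harsh"])
    return {"name": name_condition, "ai": ai_condition, "assistant": assistant_version}

rng = random.Random(2026)                                        # fixed seed for a reproducible illustration
assignments = [assign_teacher(rng) for _ in range(3000)]         # planned sample of 3,000 teachers

Independent draws like these hit the planned 1,000/1,000/1,000 split only in expectation; a randomizer with evenly presented elements (or a pre-balanced, shuffled assignment list) would hold the arm counts exactly at the planned sizes.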
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
N/A
Sample size: planned number of observations
3,000 teachers (individual teacher participants).
Sample size (or number of clusters) by treatment arms
1. No AI (script only): 1,000 teachers
2. Untrained AI chatbot: 1,000 teachers
3. Trained AI assistant: 1,000 teachers. Within this arm: “Fair” trained AI: 500 teachers; “Harsh” trained AI: 500 teachers
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB

Institutional Review Boards (IRBs)

IRB Name
Monash University
IRB Approval Date
2025-12-18
IRB Approval Number
50505