Abstract
Artificial intelligence (AI) tools are increasingly used in education, including for providing feedback and supporting assessment. This randomized experiment studies how access to an AI chatbot affects teachers’ grading decisions and potential biases when marking real student work. Teachers recruited to an online Qualtrics study are each assigned one authentic student script consisting of a question prompt and handwritten responses. The experiment randomizes two dimensions: (i) the name attached to the student work, which serves as a signal of student characteristics, and (ii) whether teachers have access to an AI assistant while marking. One-third of teachers mark the script without AI support, one-third receive access to an untrained AI chatbot, and one-third receive access to a trained AI assistant that is randomly assigned to provide either systematically “fair” or systematically “harsh” guidance.
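As a minimal sketch of this assignment scheme (hypothetical: the abstract does not specify implementation details, and the name pool, seed, and condition labels below are illustrative assumptions), the cross-randomization could be implemented as follows:

```python
import random

def assign_conditions(teacher_ids, name_cues, seed=0):
    """Balanced random assignment matching the design sketched above:
    thirds to no-AI / untrained-AI / trained-AI, with the trained-AI arm
    further split between 'fair' and 'harsh' guidance; the name cue is
    drawn independently of the AI condition.
    """
    rng = random.Random(seed)
    ids = list(teacher_ids)
    rng.shuffle(ids)
    n = len(ids)
    arms = (["no_ai"] * (n // 3) + ["untrained_ai"] * (n // 3)
            + ["trained_ai"] * (n - 2 * (n // 3)))  # remainder to trained arm
    assignments = {}
    for tid, arm in zip(ids, arms):
        guidance = rng.choice(["fair", "harsh"]) if arm == "trained_ai" else None
        assignments[tid] = {
            "name_cue": rng.choice(name_cues),  # dimension (i): name signal
            "ai_arm": arm,                      # dimension (ii): AI access
            "guidance": guidance,               # only set for trained-AI arm
        }
    return assignments

# Illustrative usage with placeholder names (not the study's actual cues).
print(assign_conditions(range(9), ["Name A", "Name B"]))
```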
The study has two primary objectives. First, it estimates the extent to which student name cues affect teachers’ evaluations and grades for identical work. Second, it tests whether AI support changes teachers’ perceptions, grading behavior, and potential bias, and whether trained AI guidance can reduce grading errors or attenuate name-based disparities relative to both no-AI grading and untrained AI access. A follow-up survey collects information on teachers’ perceptions of the script, their confidence, their decision-making process, and their use of the AI tool, allowing exploration of the mechanisms underlying any observed effects. The study contributes to evidence on how AI tools interact with human judgment and bias in high-stakes educational evaluation.