Expert vs. AI Feedback: Evidence from a Large-Scale Math Experiment

Last registered on December 01, 2025

Pre-Trial

Trial Information

General Information

Title
Expert vs. AI Feedback: Evidence from a Large-Scale Math Experiment
RCT ID
AEARCTR-0017364
Initial registration date
November 28, 2025

First published
December 01, 2025, 11:31 AM EST

Locations

Region

Primary Investigator

Affiliation
UNIVERSITÀ COMMERCIALE LUIGI BOCCONI

Other Primary Investigator(s)

PI Affiliation
IDB
PI Affiliation
GRADE
PI Affiliation
Northwestern University

Additional Trial Information

Status
In development
Start date
2025-12-01
End date
2025-12-07
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
We evaluate how different forms of feedback delivered through a large-scale digital math platform affect the learning outcomes of primary school students in public schools in Peru. The platform “Apprendemos” currently provides weekly math activities consisting of 30 exercises to students in grades 3–6 across Peru. In a one-week experiment, students will be individually randomized into one of three feedback conditions for the 30-item activity: (i) status quo feedback, which indicates only whether an answer is correct or incorrect; (ii) expert-based feedback, which provides explanations designed by a team of content experts that vary with the answer choice selected by the student; or (iii) AI-generated feedback, which provides explanations produced by a generative AI chatbot. For all students, items 1–5 will serve as a pre-test and items 26–30 as a post-test, while items 6–25 will be used to deliver the differentiated feedback.
External Link(s)

Registration Citation

Citation
Aulagnon, Raphaelle et al. 2025. "Expert vs. AI Feedback: Evidence from a Large-Scale Math Experiment." AEA RCT Registry. December 01. https://doi.org/10.1257/rct.17364-1.0
Experimental Details

Interventions

Intervention(s)
The intervention modifies the feedback that students receive while completing a 30-exercise math activity on the Apprendemos platform.
Control – Status quo Apprendemos feedback
For all 30 items, students receive the standard platform feedback, which simply indicates whether an answer is correct or incorrect.
Treatment 1 – Expert-based feedback
Items 1–5: standard Apprendemos feedback.
Items 6–25: rule-based feedback. Messages depend on the answers chosen for that particular item and are pre-specified by content experts.
Items 26–30: standard Apprendemos feedback.
Treatment 2 – AI feedback
Items 1–5: standard Apprendemos feedback.
Items 6–25: AI-generated feedback based on the student’s current response and past responses, produced via a generative AI model integrated into the platform.
Items 26–30: standard Apprendemos feedback.

Within the two treatment arms, feedback messages share the same structure (sketched in the code below):
If the student answers correctly on the first or second attempt, they receive the same “correct” message as in the status quo Apprendemos feedback (kept constant across arms).
On the first incorrect attempt, the student receives a hint. On the second incorrect attempt, the student receives an explanation and the correct answer.
Messages combine pedagogical content (e.g., explanations of the relevant math concept or procedure) with motivational content (e.g., encouragement to keep trying).
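
For illustration only, the following is a minimal Python sketch of this two-attempt logic. All names, types, and message texts are hypothetical assumptions; the actual messages are expert-written (Treatment 1) or AI-generated (Treatment 2), and the production logic was implemented by Pyxis.

    from dataclasses import dataclass

    @dataclass
    class Item:
        prompt: str
        answer: str
        hint: str         # expert-written (T1) or AI-generated (T2)
        explanation: str  # expert-written (T1) or AI-generated (T2)

    def feedback(item: Item, attempt: int, is_correct: bool) -> str:
        """Message shown after attempt 1 or 2 on treated items 6-25."""
        if is_correct:
            # Same "correct" message as the status quo, constant across arms.
            return "Correct! Well done."
        if attempt == 1:
            # First incorrect attempt: a hint plus encouragement.
            return f"{item.hint} Keep trying!"
        # Second incorrect attempt: explanation plus the correct answer.
        return f"{item.explanation} The correct answer is {item.answer}."

    # Example with a hypothetical item:
    item = Item("3 x 4 = ?", "12", "Think of 3 groups of 4.",
                "3 x 4 means adding 4 three times: 4 + 4 + 4 = 12.")
    print(feedback(item, attempt=1, is_correct=False))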
The intervention is implemented in collaboration with the Inter-American Development Bank (IDB), GRADE, and Apprendemos, with software development provided by the firm Pyxis to enable the three feedback modalities.
Intervention (Hidden)
Intervention Start Date
2025-12-01
Intervention End Date
2025-12-07

Primary Outcomes

Primary Outcomes (end points)
Academic achievement in the post-test (items 26–30)
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
The experiment is embedded in the regular operation of the Apprendemos digital math platform for public primary schools in Peru.
The intervention consists of a single 30-exercise math activity delivered during one week.
Students are individually randomized into three groups: Control, Expert-Based Feedback, and AI-Based Feedback.
The sample of users selected to participate in the experiment includes those who (see the filter sketch after this list):
a. Do not take part in other special interventions in the Apprendemos program;
b. Connected at least once in the preceding year;
c. Use the app in Online mode (necessary to display the generative AI feedback).
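
A minimal Python sketch of this eligibility filter, assuming hypothetical column names for the platform's user records (the actual data schema is not described in the registration):

    import pandas as pd

    # Hypothetical user records; all column names are illustrative assumptions.
    users = pd.DataFrame({
        "student_id":            [101, 102, 103, 104],
        "in_other_intervention": [False, True, False, False],
        "connected_last_year":   [True, True, False, True],
        "online_mode":           [True, True, True, False],
    })

    eligible = users[
        ~users["in_other_intervention"]   # (a) no other special interventions
        & users["connected_last_year"]    # (b) connected in the preceding year
        & users["online_mode"]            # (c) Online mode, required for AI feedback
    ]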

Experimental Design Details
Randomization Method
Randomization was done using Stata on the selected sample described above (a hedged sketch of the procedure follows):
a. Sample of 43,003 students meeting the requirements.
b. We randomly remove one student to obtain three equally sized groups.
c. We divide the sample into three equal groups of 14,334 each, with no stratification.
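
The assignment itself was done in Stata; for illustration only, an equivalent procedure in Python (with an arbitrary seed, since the actual seed is not reported) might look like:

    import random

    random.seed(0)  # arbitrary illustrative seed; the Stata seed is not reported

    # Stand-in for the 43,003 eligible student IDs.
    student_ids = list(range(43_003))

    random.shuffle(student_ids)
    student_ids.pop()  # drop one student so 43,002 splits into three equal groups

    third = len(student_ids) // 3  # 14,334 per arm
    assignment = {
        "control": student_ids[:third],
        "expert":  student_ids[third:2 * third],
        "ai":      student_ids[2 * third:],
    }
    assert all(len(ids) == 14_334 for ids in assignment.values())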
Randomization Unit
Student
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
43,002 students (randomization is at the student level, so each student is one cluster)
Sample size: planned number of observations
43,002 students (one of the 43,003 eligible students is randomly removed to allow three equal groups)
Sample size (or number of clusters) by treatment arms
14,334 control, 14,334 expert-based feedback, 14,334 AI feedback
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB

Institutional Review Boards (IRBs)

IRB Name
ViaLibre
IRB Approval Date
2025-10-25
IRB Approval Number
13225

Post-Trial

Post Trial Information

Study Withdrawal

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials