AI in the Classroom: Evaluating the Impact of Teacher Training on Practices and Student Outcomes

Last registered on July 07, 2025

Pre-Trial

Trial Information

General Information

Title
AI in the Classroom: Evaluating the Impact of Teacher Training on Practices and Student Outcomes
RCT ID
AEARCTR-0016336
Initial registration date
July 04, 2025


First published
July 07, 2025, 3:16 PM EDT


Locations

There is information in this trial unavailable to the public.

Primary Investigator

Affiliation
World Bank

Other Primary Investigator(s)

PI Affiliation
University of Toronto
PI Affiliation
World Bank

Additional Trial Information

Status
Ongoing
Start date
2025-01-01
End date
2026-12-01
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
Can artificial intelligence (AI) improve teaching practices and student learning in developing countries? This study evaluates whether training primary school teachers in Peru to integrate Large Language Models (LLMs) into their daily work can enhance instructional quality and student outcomes. We conduct a large-scale, multi-arm field experiment across public schools in Lima, Peru. The main intervention is a structured teacher training program focused on the use of AI for educational practices. In addition to the primary treatment, the study includes two cross-randomized variations. The first compares two modes of ongoing support—peer-based networks versus authority-led guidance—to assess which approach better sustains AI adoption. The second introduces a complementary training that emphasizes tailoring instruction to students’ learning levels, following “Teaching at the Right Level” (TaRL) principles. By examining both implementation and impacts on teaching practices and student learning, this study provides timely evidence on the potential for AI to equitably and effectively support educators and learners in low- and middle-income contexts.
External Link(s)

Registration Citation

Citation
Lopez, Carolina, Ezequiel Molina and Roman Andres Zarate. 2025. "AI in the Classroom: Evaluating the Impact of Teacher Training on Practices and Student Outcomes." AEA RCT Registry. July 07. https://doi.org/10.1257/rct.16336-1.0
Experimental Details

Interventions

Intervention(s)
The intervention evaluates a teacher training program designed to integrate Large Language Models (LLMs) into educational practices. The program has been developed in close collaboration with local teachers and is tailored to the Peruvian context to ensure relevance and practical application.

The training begins with an initial eight-hour in-person session covering:
1. AI literacy fundamentals
2. Principles of responsible AI use
3. Hands-on practice with real-world classroom scenarios. These scenarios, developed through a teacher innovation contest in Lima, include:
3.1. Adapting lesson plans for diverse student levels
3.2. Creating engaging learning materials in Spanish
3.3. Designing formative assessments aligned with Peru’s curriculum
3.4. Drafting effective parent communications
3.5. Streamlining administrative tasks

In addition to the main intervention, the program will consider the following subcomponents:
1. Cross-randomized support structures: To sustain and reinforce the program’s impact, the intervention explores the effectiveness of two types of ongoing support:
1.1. Peer-based support: Teachers are organized into WhatsApp groups moderated by teachers who encourage weekly participation.
1.2. Authority-led support: Similar WhatsApp groups are monitored by education supervisors designated by the regional education authority (DRELM).

This nine-month continuous reinforcement phase includes regular check-ins, troubleshooting sessions, and active documentation of teacher experiences. Real-time barriers to adoption are identified and addressed to ensure the program’s effectiveness and scalability.

2. Teaching at the Right Level (TaRL) subtreatment: A second cross-randomized component focuses on differentiated instruction. A subset of teachers will receive additional online training in Teaching at the Right Level (TaRL) principles, emphasizing how to use AI tools to tailor instruction to students’ starting learning levels and promote individualized support. The comparison group will receive general AI training without this targeted emphasis on instructional differentiation.
Intervention Start Date
2025-02-03
Intervention End Date
2025-12-01

Primary Outcomes

Primary Outcomes (end points)
1. Teachers’ time use and their practices both inside and outside the classroom.
2. Standardized test scores in math and reading comprehension for 6th-grade students.
Primary Outcomes (explanation)
Our implementation includes a comprehensive data collection strategy to measure both final and intermediate outcomes for students and teachers, providing insights into the mechanisms driving potential impacts. We will collect data through:
1. Baseline and endline surveys administered to students and teachers.
2. Administrative records provided by the Dirección Regional de Educación de Lima Metropolitana (DRELM), including student academic records, attendance, and standardized test scores.

Primary Student Outcomes: standardized test scores in math and reading comprehension (using national or regionally validated assessments).

Primary Teacher Outcomes:
1. Teaching practices inside the classroom, evaluated using structured classroom observations.

Secondary Outcomes

Secondary Outcomes (end points)
Secondary Student Outcomes:
1. Student academic progress based on administrative records.

Secondary Teacher Outcomes:
1. AI engagement metrics, including frequency of use and types of AI tools adopted, potentially complemented by anonymized usage data from Microsoft accounts (subject to final data-sharing agreements).
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
The randomization was conducted in three phases, all stratified by district to ensure balance across treatment arms.

Phase 1: Network-Level Randomization
A network consists of approximately six geographically proximate schools that share a supervisor and are grouped for administrative purposes by DRELM. To avoid spillovers within networks, 59 of the 143 networks were assigned to the pure control group, while the remaining 84 networks were selected for the school-level randomization.

Phase 2: School-Level Randomization
Within the selected networks, 390 schools were randomized at the school level, stratified by district and by prior performance (high- or low-performing). A total of 196 schools were assigned to the main treatment group, in-person AI training for teachers, and 194 to the control group.

Phase 3: Cross-Randomized Sub-Treatments
Two sub-treatment arms were randomized at the school level:
Support Structure: Schools were assigned to either a peer-based network or authority-led guidance. In the peer-based arm, a teacher will monitor and encourage weekly participation among their peers. In the authority-led arm, this role is taken by a supervisor designated by the DRELM. A total of 97 and 99 schools were assigned to the peer- and authority-led treatment arms, respectively.

Teaching at the Right Level (TaRL): One group of teachers receives additional online training focused on the role of AI tools in tailoring instruction to students’ initial learning levels, promoting differentiated instruction based on students’ diverse abilities. The comparison group receives general AI training for classroom practices, with less emphasis on adapting content to individual learning levels. A total of 97 and 99 schools were assigned to the general vs. TaRL treatment arms, respectively.

This multi-level, stratified randomization strategy is designed to rigorously evaluate both the overall effect of AI training and the relative effectiveness of alternative support and instructional approaches.
Experimental Design Details
Not available
Randomization Method
The randomization was conducted using a computer. Each phase of the randomization process was executed 100 times, and we selected the randomization iteration that achieved the best balance across treatment arms using a max-min p-value criterion. This method maximizes the smallest p-value across covariates to ensure the best possible balance on observable characteristics.

Randomization was stratified by district and conducted in three phases:
1. Network-Level Randomization: Entire school networks were randomly assigned to either the pure control group or the eligible treatment pool to minimize spillovers within networks.
2. School-Level Randomization: Within selected networks, schools were randomly assigned to the main treatment (AI training) or the control group.
3. Cross-Randomized Sub-Treatments: Schools in the treatment group were further randomized into:
3.1. Peer-based versus authority-led support structures.
3.2. General AI training versus Teaching at the Right Level (TaRL)–focused training.
This multi-level, stratified randomization strategy ensures rigorous internal validity and balance across treatment arms.
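The max-min p-value selection described above can be sketched as follows. This is a minimal illustration with synthetic data, not the study's actual code: the district labels, covariates, and the normal-approximation t-test are all assumptions made for the example.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Synthetic school-level data (illustrative only, not the study's data):
# a district stratum and two baseline covariates per school.
n_schools = 390
districts = rng.integers(0, 10, size=n_schools)
covariates = rng.normal(size=(n_schools, 2))

def stratified_assignment(districts, rng):
    """Assign roughly half the schools in each district to treatment."""
    treat = np.zeros(len(districts), dtype=bool)
    for d in np.unique(districts):
        idx = np.flatnonzero(districts == d)
        chosen = rng.choice(idx, size=len(idx) // 2, replace=False)
        treat[chosen] = True
    return treat

def balance_pvalue(x_t, x_c):
    """Two-sided p-value for a difference in means (Welch statistic,
    normal approximation -- adequate with ~195 schools per arm)."""
    se = math.sqrt(x_t.var(ddof=1) / len(x_t) + x_c.var(ddof=1) / len(x_c))
    z = (x_t.mean() - x_c.mean()) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def min_pvalue(treat, covariates):
    """Worst (smallest) balance p-value across all covariates."""
    return min(balance_pvalue(covariates[treat, k], covariates[~treat, k])
               for k in range(covariates.shape[1]))

# Max-min p-value criterion: draw 100 candidate randomizations and keep
# the one whose worst covariate imbalance is least severe.
draws = [stratified_assignment(districts, rng) for _ in range(100)]
best = max(draws, key=lambda t: min_pvalue(t, covariates))
```

The same loop would be repeated at each phase (networks, schools, sub-treatments), replacing the synthetic covariates with the administrative baseline variables used for balance checks.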
Randomization Unit
The randomization unit was the school.
Was the treatment clustered?
Yes

Experiment Characteristics

Sample size: planned number of clusters
1. 143 networks of 4.7 schools on average: 84 networks are part of the sample and 59 networks are pure control.
2. Within the 84 sample networks, there is a total of 390 schools.
Sample size: planned number of observations
We plan to have around three teachers per school for a total of 1,170 teachers, and around 90 students per school for a total of 35,100 students.
Sample size (or number of clusters) by treatment arms
The 390 schools were classified as follows:
1. 194 control schools
2. 196 treated schools:
2.1. 47 schools assigned to the peer-based and general AI training arms.
2.2. 49 schools assigned to the peer-based and TaRL arms.
2.3. 50 schools assigned to the authority-led and general AI training arms.
2.4. 50 schools assigned to the authority-led and TaRL arms.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
The study is powered to detect minimum effect sizes that are meaningful for educational outcomes. Power calculations were conducted assuming a residual variance of 0.8 and an intra-cluster correlation (ICC) of 0.10, based on administrative data from Peruvian schools. The design assumes an average of 3 teachers and 90 students per school, with 196 treated schools and 194 control schools, for a total of 390 schools.
1. The MDE of the general treatment effect is 0.0753σ for student outcomes and 0.1436σ for teacher outcomes.
2. The MDE for the two sub-treatment arms, assuming 97 schools in one and 99 schools in the other, is 0.1062σ for student outcomes and 0.2025σ for teacher outcomes.
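For reference, a textbook MDE formula for a cluster-randomized design can be computed as below. This is a generic sketch under the stated assumptions (residual variance 0.8, ICC 0.10, 90 students per school); the multiplier of 2.8 (5% size, 80% power) and the exact formula are assumptions on my part, so the output illustrates the calculation rather than reproducing the registered figures.

```python
import math

def cluster_mde(sigma2, icc, m, j_treat, j_control, mult=2.8):
    """Minimum detectable effect (in SD units) for a cluster-randomized
    trial. `mult` ~= 2.8 corresponds to 5% size and 80% power. This is a
    standard textbook formula, not necessarily the exact one behind the
    registered MDEs, which may model covariate adjustment differently."""
    j = j_treat + j_control
    p = j_treat / j                      # treated share of clusters
    design_effect = 1 + (m - 1) * icc    # variance inflation from clustering
    var = sigma2 * design_effect / (m * j * p * (1 - p))
    return mult * math.sqrt(var)

# Main contrast: 196 treated vs. 194 control schools, 90 students each.
mde_students = cluster_mde(0.8, 0.10, 90, 196, 194)
# Sub-treatment contrast: 97 vs. 99 schools.
mde_sub = cluster_mde(0.8, 0.10, 90, 97, 99)
```

Under these assumptions the formula gives roughly 0.084σ and 0.119σ, somewhat above the registered 0.0753σ and 0.1062σ; the gap suggests the registered calculations incorporate additional precision gains (for example, baseline covariate adjustment) not modeled in this sketch.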
IRB

Institutional Review Boards (IRBs)

IRB Name
University of Toronto
IRB Approval Date
2025-06-30
IRB Approval Number
48010