Scalable Mentoring for Teachers: Evidence from an AI Intervention

Last registered on March 10, 2026

Pre-Trial

Trial Information

General Information

Title

Scalable Mentoring for Teachers: Evidence from an AI Intervention

RCT ID

AEARCTR-0017934

Initial registration date

March 03, 2026

First published

March 10, 2026, 10:13 AM EDT

Locations

There is information in this trial unavailable to the public.

Primary Investigator

Name

Elizabeth Jaramillo-Rojas

Affiliation

Northwestern University & Inter-American Development Bank

Other Primary Investigator(s)

PI Name

Gregory Elacqua

PI Affiliation

Inter-American Development Bank

PI Name

Ana Teresa Del Toro

PI Affiliation

Inter-American Development Bank

PI Name

Catalina Hermosilla

PI Affiliation

Inter-American Development Bank

Additional Trial Information

Status

In development

Start date

2026-03-02

End date

2026-12-20

Keywords

Education

Additional Keywords

generative AI, mentoring, teachers, education policy

JEL code(s)

I21, J24, O33

Secondary IDs

Prior work

This trial does not extend or rely on any prior RCTs.

Abstract

Teacher attrition is high during the first years of the profession, when novice teachers face steep learning curves and limited access to timely instructional support. Intensive coaching interventions have been shown to improve teaching practices, but they are costly and difficult to scale within public education systems. We study whether generative artificial intelligence can provide scalable professional support to early-career teachers. We evaluate Profe Gabi, an AI-powered mentoring chatbot delivered via WhatsApp, in a randomized controlled trial with approximately 9,000 novice teachers in Chile. Teachers assigned to treatment receive continuous, on-demand pedagogical guidance throughout the school year. Using administrative and survey data, we examine impacts on teacher retention, instructional practices, and professional development.

External Link(s)

Registration Citation

Citation

Del Toro, Ana Teresa et al. 2026. "Scalable Mentoring for Teachers: Evidence from an AI Intervention." AEA RCT Registry. March 10. https://doi.org/10.1257/rct.17934-1.0

Sponsors & Partners

There is information in this trial unavailable to the public.

Experimental Details

Interventions

Intervention(s)

Teacher quality is among the most powerful determinants of student learning, yet education systems struggle to develop and retain effective teachers, particularly in the early years of the profession. In countries like the United States, the United Kingdom, and Canada, nearly 40% of teachers leave within their first five years (UNESCO, 2024), a pattern driven by high workloads, inadequate support systems, and limited teaching experience (Hanushek et al., 2004; Boyd et al., 2008; Carver-Thomas and Darling-Hammond, 2019). The costs are substantial: attrition disrupts instructional continuity, lowers student achievement, and forces repeated investments in recruitment and training (Ronfeldt et al., 2013). In the United States alone, states spend an estimated $1-2.2 billion annually replacing teachers who exit the profession (Alliance for Excellent Education, 2014).

Induction and mentoring programs to support novice teachers have been widely adopted as a response, and the evidence on their effectiveness is promising (Keese et al., 2023; Naylor et al., 2019; Elacqua et al., 2018). But these programs are prohibitively expensive to scale, requiring sustained mentor involvement and ongoing monitoring. In Chile, where 16% of novice teachers leave the profession within their first year, induction and mentoring were institutionalized under Law 20.903. In 2017, the Ministry of Education launched the National Induction and Mentoring System, which pairs first- and second-year teachers with experienced mentors throughout the school year. Participants report high satisfaction with the mentoring experience, citing increased teaching confidence and stronger professional relationships (UNDP, 2023). Yet, due to institutional capacity constraints and high implementation costs, the program reaches fewer than 1% of eligible novice teachers annually, leaving the vast majority without structured support during a critical professional period.

Profe Gabi was designed to address this gap, leveraging AI to provide personalized, on-demand mentoring that can reach all novice teachers at a fraction of the cost of traditional programs. Profe Gabi is an AI-powered mentoring chatbot delivered via WhatsApp, developed jointly by the Chilean Ministry of Education (MINEDUC/CPEIP) and the Inter-American Development Bank (IDB). The tool provides novice teachers in publicly funded Chilean schools with continuous, on-demand pedagogical and socioemotional support throughout the school year. Teachers can ask for help with lesson planning, classroom management, assessment design, administrative tasks, and professional well-being at any time. The chatbot draws on official curricular and pedagogical materials as well as Chile's National Induction and Mentoring System framework, and sends weekly messages to encourage engagement. We evaluate the effectiveness of the tool through a school-level randomized controlled trial with approximately 9,000 first- and second-year teachers across two hiring cohorts (2025 and 2026).

Intervention Start Date

2026-03-02

Intervention End Date

2026-12-20

Primary Outcomes

Primary Outcomes (end points)

- Teacher attrition from the Chilean education system
- Teacher attrition from the publicly funded system
- Teacher turnover
- Exit from the classroom

Primary Outcomes (explanation)

Outcomes are measured using national administrative records from the Chilean Ministry of Education that cover the universe of teachers employed in Chile. These outcomes reflect the core challenges novice teachers face during their initial years, challenges that contribute to high turnover and reduced effectiveness and that traditional support systems have struggled to address at scale.

Teacher attrition from the Chilean education system is defined as a binary indicator equal to 1 if the teacher is not observed in a teaching role in any school in Chile (public municipal, private subsidized, or private fee-paying) in March 2027, at the start of the Chilean school year. This captures whether the teacher left the teaching profession.

Teacher attrition from the publicly funded system is defined as a binary indicator equal to 1 if the teacher is not observed in any public municipal or private subsidized school in March 2027. A teacher who moved to a private fee-paying school would take a value of 1 on this measure but 0 on the broader attrition measure above, allowing us to distinguish between exits from the profession and exits from the publicly funded sector specifically.

Teacher turnover is defined as a binary indicator equal to 1 if, in March 2027, the teacher is not observed in the school where they were employed at the start of the 2026 school year. This includes both teachers who left the system entirely and those who switched to a different school.

Exit from the classroom is defined as a binary indicator equal to 1 if the teacher is observed in a school in March 2027 but is no longer recorded in a teaching role, having transitioned to an administrative or non-teaching position. This captures a form of attrition that represents a loss of classroom capacity.
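A minimal sketch of how the four primary outcomes could be coded from a teacher's March 2027 record, written in Python for illustration only (the registration does not describe how the variables are built). The record layout, the sector labels, and the names `March2027Record` and `outcomes` are hypothetical. The sketch assumes at most one school record per teacher and follows each definition above literally, so only the first measure conditions on holding a teaching role.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sector labels for publicly funded schools.
PUBLICLY_FUNDED = {"public_municipal", "local_education_service", "private_subsidized"}


@dataclass
class March2027Record:
    """A teacher's status in March 2027 (hypothetical schema)."""
    school_id: Optional[str]  # None if the teacher is not observed in any school
    sector: Optional[str]     # e.g. "private_subsidized" or "private_fee_paying"
    teaching_role: bool       # True if recorded in a teaching role


def outcomes(school_2026: str, rec: March2027Record) -> dict:
    """Return the four binary primary outcomes for one teacher."""
    observed = rec.school_id is not None
    return {
        # Not observed in a teaching role in any Chilean school.
        "attrition_system": int(not (observed and rec.teaching_role)),
        # Not observed in any public municipal or private subsidized school.
        "attrition_public": int(not (observed and rec.sector in PUBLICLY_FUNDED)),
        # Not observed in the school of employment at the start of 2026.
        "turnover": int(not (observed and rec.school_id == school_2026)),
        # Observed in a school, but no longer in a teaching role.
        "exit_classroom": int(observed and not rec.teaching_role),
    }
```

For example, a teacher who moves from school "A" to a private fee-paying school and keeps teaching scores 0 on system attrition but 1 on public-sector attrition and turnover, matching the distinction drawn above.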

Together, these four outcomes capture different dimensions of workforce instability among early-career teachers. Attrition from the Chilean education system reflects exits from the profession, the most severe form of teacher loss. Attrition from the publicly funded system is particularly relevant from a policy standpoint: the government's ability to design interventions, allocate resources, and deploy tools like Profe Gabi operates within public municipal and private subsidized schools, and it is within this sector that the costs of teacher attrition, such as lost training investments and disrupted instructional continuity, fall on the public budget. A teacher who moves to a private fee-paying school represents a different kind of loss than one who leaves the profession entirely, and distinguishing between the two has direct implications for targeting, scaling, and cost-benefit analysis of the intervention. Turnover captures a broader form of disruption that attrition measures alone would miss: even teachers who remain in the profession impose costs on students and schools when they switch, breaking instructional continuity and preventing the accumulation of school-specific knowledge and experience. Exit from the classroom captures the quietest form of attrition: teachers who remain employed in the school system but step away from teaching roles, a loss of classroom capacity that would be invisible in standard retention statistics. All four measures are defined over a one-year horizon: they capture absence in the following school year and do not distinguish between permanent exits and temporary interruptions.

Secondary Outcomes

Secondary Outcomes (end points)

- Instructional self-efficacy
- Socio-emotional well-being and professional development
- Teacher well-being
- Intention to remain in teaching
- Student outcomes
- Use and perception of AI tools for teaching

Secondary Outcomes (explanation)

Teacher-level secondary outcomes are measured via a survey administered at baseline and endline through WhatsApp. The survey captures information on (1) teachers’ current use and perception of AI tools for teaching; (2) teachers’ knowledge and self-reported capacity on topics related to pedagogical practice, socioemotional well-being and career development; (3) teachers’ well-being; (4) teachers’ use of time for lesson planning and other preparations for their teaching; and (5) teachers’ intention to remain in the teaching profession.
Use and perception of AI tools for teaching is measured through five survey items. The first two capture the frequency with which teachers use AI tools such as ChatGPT, Gemini, or Copilot, first in general and then specifically for their teaching work. The third captures the range of teaching tasks for which teachers use AI, including lesson planning, assessment design, classroom management, and administrative tasks, among others. The fourth and fifth items capture teachers' perceived usefulness of AI tools for their teaching work and their self-assessed capacity to use these tools effectively.
Teachers’ knowledge and self-reported capacity on topics related to pedagogical practice, socioemotional well-being, and career development are captured through nine items: three on pedagogical practice, three on socioemotional well-being, and three on professional and career development. These items correspond to topics that Profe Gabi has been trained and designed to cover through its weekly messages to participating teachers. All nine items are rated on a 7-point scale ranging from 1 (Strongly disagree) to 7 (Strongly agree). The three pedagogical items build on the Teachers' Sense of Efficacy Scale (TSES; Tschannen-Moran and Hoy, 2001) and cover lesson planning aligned to national learning objectives and curricular standards, differentiation of learning activities across student needs and learning paces, and design of assessment instruments such as rubrics and scoring guides. The remaining six items measure socioemotional well-being and professional development. Three capture self-reported capacity to manage work-related stress, maintain a balance between professional responsibilities and self-care, and reconnect with professional motivation during difficult moments. The other three measure self-reported knowledge of the Chilean teaching career structure, the national teacher performance evaluation system (Law 21.625), and available pathways for continuing professional development and training.
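The registration does not specify how these items are aggregated. One common approach, shown here purely as an assumption, averages the three 7-point items within each domain and then standardizes the domain score against the control-group mean and standard deviation. The function names are hypothetical.

```python
from statistics import mean, stdev


def domain_scores(responses):
    """Average each teacher's three 7-point items (1 = Strongly disagree, 7 = Strongly agree)."""
    for items in responses:
        if len(items) != 3 or not all(1 <= x <= 7 for x in items):
            raise ValueError("expected three responses on a 1-7 scale")
    return [mean(items) for items in responses]


def standardize_to_control(scores, treated):
    """Express scores in control-group standard deviations (control mean = 0)."""
    control = [s for s, t in zip(scores, treated) if not t]
    mu, sd = mean(control), stdev(control)
    return [(s - mu) / sd for s in scores]
```

Standardizing against the control group expresses treatment effects in control-group standard deviation units, which makes the three domains comparable with each other.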
Teacher well-being is measured using the Teacher Subjective Wellbeing Questionnaire (TSWQ; Renshaw, Long, and Cook, 2015), an 8-item self-report instrument designed to assess teachers’ work-related well-being. It covers two dimensions: school connectedness (4 items) and teaching efficacy (4 items). Items are rated on a 4-point frequency scale ranging from 1 (almost never) to 4 (almost always).
Teachers’ use of time for lesson planning and other preparations for their teaching is captured through a single survey item asking teachers to select the range of hours they dedicate to these activities during a typical work week with no holidays or leave.
Intention to remain in teaching is captured through a single survey item asking teachers about their most likely professional situation in the following year. Response options are continuing at the same school, continuing at a different school, remaining in education but outside the classroom, leaving to study, and leaving the education system entirely.
Student outcomes are measured using linked administrative records from the Chilean Ministry of Education, including end-of-year academic achievement, grade progression and repetition, and dropout from the school system. For students in tested grades, we measure performance on the SIMCE national standardized assessments. Student outcomes are aggregated at the classroom level and linked to teachers through administrative records.

Experimental Design

We evaluate Profe Gabi through a school-level randomized controlled trial with two arms: treatment and control. Eligible schools are all public municipal (or Local Education Service) and private subsidized schools in Chile that hired at least one first- or second-year teacher in 2025 or 2026. Teachers in early childhood education, special education, and adult education are excluded. Schools are divided into two mutually exclusive cohorts based on hiring year. Cohort 1 includes schools that hired at least one first-year teacher in 2025. Cohort 2 includes schools that hired at least one first-year teacher in 2026 and were not part of Cohort 1.
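The cohort rule above can be sketched as simple set logic. This assumes the eligibility filters (school sector and the exclusion of early childhood, special, and adult education) have already been applied upstream; `assign_cohorts` and its inputs are hypothetical names.

```python
def assign_cohorts(hired_2025, hired_2026):
    """Map eligible school IDs to Cohort 1 or Cohort 2.

    hired_2025 / hired_2026: sets of IDs of eligible schools that hired at
    least one first-year teacher in that year (hypothetical inputs).
    """
    cohorts = {school: 1 for school in hired_2025}
    # Cohort 2: hired a first-year teacher in 2026 and not already in Cohort 1.
    for school in hired_2026 - hired_2025:
        cohorts[school] = 2
    return cohorts
```

A school that hired first-year teachers in both years stays in Cohort 1, which keeps the two cohorts mutually exclusive.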

Within each cohort, schools are randomly assigned to treatment (60%) or control (40%). In treatment schools, all first- and second-year teachers identified in administrative records are invited to access Profe Gabi via a personalized WhatsApp message. Teachers in control schools do not receive access to the chatbot during the study period. The study includes approximately 9,000 novice teachers across both cohorts.

Novice teachers are identified using administrative records from the Centro de Perfeccionamiento, Experimentación e Investigaciones Pedagógicas (CPEIP) in the Chilean Ministry of Education. These records capture all teachers employed across all school types and are updated throughout the year. First-year teachers are identified as those appearing in the administrative system for the first time in a given year. Second-year teachers are identified as those whose first year of recorded employment was the preceding year. Teachers are recruited by sending a personalized WhatsApp message to their registered phone number, using contact information drawn from the same administrative records. The rollout is staggered to reflect the timing of administrative data availability. The first wave begins in March 2026 with second-year teachers in Cohort 1, the only group for whom records are already available at that point. The second wave begins in May 2026, once updated hiring records become available, and incorporates first-year teachers in both cohorts and second-year teachers in Cohort 2. As a result, second-year teachers in Cohort 1 receive approximately 35 weeks of access to Profe Gabi, while all other groups receive approximately 28 weeks.
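A sketch of the career-stage and rollout rules described above, assuming each teacher's first year of recorded employment in the CPEIP records is available as a single integer (a hypothetical field). The function names are illustrative.

```python
from typing import Optional


def career_stage(first_year_in_records: int, year: int = 2026) -> Optional[str]:
    """Classify a teacher by the first year they appear in the administrative records."""
    if first_year_in_records == year:
        return "first_year"
    if first_year_in_records == year - 1:
        return "second_year"
    return None  # more experienced teachers are not part of the study population


def rollout_wave(cohort: int, stage: str) -> int:
    """Wave 1 (March 2026): second-year teachers in Cohort 1; wave 2 (May 2026): all others."""
    return 1 if (cohort == 1 and stage == "second_year") else 2
```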

The 60/40 allocation favoring treatment reflects two considerations. First, since Chile's National Induction and Mentoring System reaches less than 1% of novice teachers annually, assigning a larger share of schools to treatment reduces the number of early-career teachers left without any form of structured support during the study period. Second, given the low take-up rates observed in the pilot study, assigning more schools to treatment increases the expected number of compliers, improving the precision of the LATE estimates.

The main analysis estimates intention-to-treat (ITT) effects using OLS regressions with standard errors clustered at the school level. The baseline specification pools both cohorts and regresses each outcome on an indicator for treatment assignment, controlling for a pre-specified set of baseline covariates drawn from administrative records. Teacher-level covariates include gender, age, contract type, contract hours, teaching hours, subject taught, school level taught, quality of the teacher training institution, scores on the university admissions test (PSU/PAES), scores on the Evaluación Nacional Diagnóstica (a standardized exam administered during the final year of initial teacher education programs), and an indicator for whether the teacher received in-person mentoring through the National Induction and Mentoring System. School-level covariates include school type (public municipal or Local Education Service vs. private subsidized), geographic location (Metropolitan Region of Santiago vs. other regions), school setting (urban vs. rural), school vulnerability index (IVE), school academic performance (SIMCE scores), school size measured by total enrollment and number of teachers, and number of novice teachers in the school. A second specification interacts treatment assignment with an indicator for second-year teachers to allow effects to vary by career stage. Separate regressions are also estimated by cohort.
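As an illustration of the ITT estimator, the sketch below computes the coefficient on treatment assignment in a regression without the covariates listed above, which reduces to a difference in means, together with a school-clustered standard error. The finite-sample factor mirrors the CR1 adjustment Stata applies with `vce(cluster)`; the covariate-adjusted specification would need a full OLS routine. The function name is hypothetical.

```python
from collections import defaultdict
from math import sqrt


def itt_clustered(y, treat, cluster):
    """ITT as a difference in means, with a CR1 school-clustered standard error.

    Equivalent to OLS of y on a constant and a 0/1 treatment indicator that is
    constant within each school (no covariates).
    """
    n = len(y)
    n1 = sum(treat)
    n0 = n - n1
    mean1 = sum(yi for yi, t in zip(y, treat) if t) / n1
    mean0 = sum(yi for yi, t in zip(y, treat) if not t) / n0
    # Sum OLS residuals within each school.
    resid_sum = defaultdict(float)
    school_arm = {}
    for yi, t, g in zip(y, treat, cluster):
        resid_sum[g] += yi - (mean1 if t else mean0)
        school_arm[g] = t
    # Sandwich variance of the treatment coefficient: each treated school
    # contributes (residual sum / n1)^2, each control school (residual sum / n0)^2.
    var = sum((s / n1) ** 2 if school_arm[g] else (s / n0) ** 2
              for g, s in resid_sum.items())
    n_clusters, k = len(resid_sum), 2
    var *= (n_clusters / (n_clusters - 1)) * ((n - 1) / (n - k))  # CR1 factor
    return mean1 - mean0, sqrt(var)
```

Because every teacher in a school shares the school's assignment, residuals are summed within schools before squaring, which is what allows outcomes to be correlated within a school.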

Control teachers cannot access Profe Gabi without a direct personalized invitation, so we expect full compliance on the control side: no control teacher should take up the tool. On the treatment side, we anticipate partial compliance, since some teachers may not engage with the tool despite receiving the invitation. Because of this imperfect compliance, we will also estimate the Local Average Treatment Effect (LATE) by two-stage least squares (2SLS), using treatment assignment (receipt of a personalized WhatsApp invitation to access Profe Gabi) as an instrument for take-up. We consider two definitions of take-up. The first, the extensive margin, defines take-up as a binary indicator equal to 1 if the teacher sent at least one message to the chatbot beyond the initial onboarding message. This captures whether the teacher engaged with the tool at all. The second, the intensive margin, uses the total number of messages sent to Profe Gabi as a continuous measure of engagement, capturing the average effect of an additional message across the observed distribution of use. This specification assumes a linear relationship between the number of messages sent and outcomes, which may not hold in practice if, as seems likely, the returns to engagement diminish. To assess the robustness of this assumption, we complement the continuous specification with a set of 2SLS regressions using binary indicators for whether the teacher crossed pre-specified engagement thresholds: at least 2, at least 4, and at least 8 messages. These thresholds are informed by engagement patterns observed in a pilot study conducted prior to the RCT, in which 35.9% of teachers used the tool only once, 39.2% sent between 2 and 3 messages, 19.4% sent between 4 and 7 messages, and 5.5% sent 8 or more messages. They distinguish between teachers who engaged minimally, those who used the tool on at least a few occasions, and those who sustained engagement over time.
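The pilot distribution above translates directly into the share of pilot teachers who crossed each pre-specified threshold. The sketch assumes that "used the tool only once" means exactly one message, so the four bins start at 1, 2, 4, and 8 messages.

```python
# Pilot engagement distribution reported above (% of pilot teachers).
# Keys are the smallest message count in each bin: 1, 2-3, 4-7, 8+.
pilot_share_by_min_messages = {1: 35.9, 2: 39.2, 4: 19.4, 8: 5.5}


def share_at_least(k):
    """Percent of pilot teachers who sent at least k messages (k must be a bin edge)."""
    if k not in pilot_share_by_min_messages:
        raise ValueError("k must be one of the pilot bin edges: 1, 2, 4, 8")
    return round(sum(p for lo, p in pilot_share_by_min_messages.items() if lo >= k), 1)


for k in (2, 4, 8):
    print(f"at least {k} messages: {share_at_least(k)}%")
```

This gives 64.1%, 24.9%, and 5.5% of pilot teachers for the 2-, 4-, and 8-message thresholds.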
We note that the pilot was conducted with volunteer teachers, who are likely more motivated than the average novice teacher in the RCT, and engagement rates in the RCT may therefore be lower. If the distribution of engagement in the RCT differs from that observed in the pilot, additional thresholds may be defined based on the realized distribution of usage to capture meaningful variation in engagement intensity. All specifications are instrumented by treatment assignment and estimate the effect of the intervention among compliers, that is, teachers who reached each level of engagement because they received the invitation.
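With a single binary instrument and no covariates, 2SLS reduces to the Wald estimator: the ITT effect on the outcome divided by the ITT effect on take-up. The sketch below can be applied to the extensive margin, the intensive margin (passing message counts directly as `d`), and the threshold indicators. It assumes `messages` counts messages sent beyond the onboarding message (zero for all control teachers) and that the thresholds apply to that same count; both are assumptions, as are the function names. It returns point estimates only; inference would come from a full 2SLS routine with school-clustered standard errors.

```python
def wald_late(y, z, d):
    """Wald estimator: ITT effect on y divided by ITT effect on take-up d.

    Numerically equal to 2SLS with a single binary instrument z and no covariates.
    """
    def itt(v):
        treated = [vi for vi, zi in zip(v, z) if zi]
        control = [vi for vi, zi in zip(v, z) if not zi]
        return sum(treated) / len(treated) - sum(control) / len(control)

    first_stage = itt(d)
    if first_stage == 0:
        raise ValueError("take-up does not differ by assignment")
    return itt(y) / first_stage


def reached(messages, k):
    """1 if the teacher sent at least k messages, else 0."""
    return [int(m >= k) for m in messages]
```

For the extensive margin, `reached(messages, 1)` is the take-up indicator; for the thresholds, `reached(messages, k)` with k = 2, 4, 8.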

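We address potential violations of the stable unit treatment value assumption (SUTVA), in particular spillovers from treated to control teachers, through three strategies.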

First, the intervention is delivered individually via personalized WhatsApp invitations. Control teachers cannot access Profe Gabi without a direct invitation. Even if they hear about the tool from colleagues, they cannot use it, which limits the scope for spillovers.

Second, we use administrative data to identify teachers who transfer from a treatment school to a control school during the study period. We assess whether the presence of these transferred teachers is associated with changes in outcomes among other teachers in the receiving control schools, by comparing outcomes in control schools that received a transfer from a treatment school against outcomes in control schools that did not receive any such transfers. If control schools that received transferred teachers show systematically better outcomes, this suggests that transferred teachers brought knowledge or practices from Profe Gabi into the control group through peer diffusion. We also conduct robustness checks excluding control schools that received these transfers from the main analysis.
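The transfer-based check can be sketched as follows: flag control schools that received at least one teacher from a treatment school, then compare them with the remaining control schools (or drop them in the robustness check). The `(origin, destination)` move records and the function names are hypothetical.

```python
def transfer_exposed_controls(moves, arm):
    """Control schools that received at least one teacher from a treatment school.

    moves: iterable of (origin_school, destination_school) pairs for teachers who
    changed schools during the study period; arm: school_id -> "treatment" or "control".
    """
    return {dest for origin, dest in moves
            if arm.get(origin) == "treatment" and arm.get(dest) == "control"}


def split_controls(moves, arm):
    """Split control schools into (received a transfer, did not receive one)."""
    exposed = transfer_exposed_controls(moves, arm)
    controls = {school for school, a in arm.items() if a == "control"}
    return exposed, controls - exposed
```

Moves between two control schools, or from a control school into a treatment school, do not flag the destination, since only transfers out of treatment schools could carry Profe Gabi practices into the control group.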

Third, the endline survey includes items asking all teachers, regardless of treatment assignment, whether they have heard about or used Profe Gabi. Among control group teachers, we examine whether reported exposure to the tool is associated with changes in outcomes from baseline to endline. Because we have baseline survey measures of the same outcomes collected prior to the intervention, we can difference out time-invariant teacher characteristics. This analysis remains descriptive given selection concerns, but it serves as a useful check: if control teachers with and without reported exposure show similar changes from baseline to endline, that is reassuring evidence that spillovers are not a major concern.
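A minimal version of the descriptive comparison above: the mean baseline-to-endline change among control teachers who report exposure to Profe Gabi, minus the mean change among those who do not. The input layout and function name are hypothetical; a value close to zero is the reassuring pattern described above.

```python
from statistics import mean


def change_gap(baseline, endline, exposed):
    """Mean baseline-to-endline change for exposed control teachers minus unexposed ones."""
    changes = [e - b for b, e in zip(baseline, endline)]  # removes time-invariant levels
    exposed_changes = [c for c, x in zip(changes, exposed) if x]
    unexposed_changes = [c for c, x in zip(changes, exposed) if not x]
    return mean(exposed_changes) - mean(unexposed_changes)
```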

To the extent that spillovers remain after these checks, they are expected to bias ITT estimates toward zero, making our estimates conservative relative to the true effect of the intervention.

Finally, in addition to cross-school spillovers, we study within-school spillovers in treatment schools by examining whether more experienced teachers who were not invited to use Profe Gabi show changes in outcomes relative to their counterparts in control schools.

Experimental Design Details

Not available

Randomization Method

Randomization is conducted by computer using a random number generator in Stata, separately within each cohort. Schools are stratified by region and school type (public municipal vs. private subsidized) prior to randomization to ensure balance across treatment and control groups on these key characteristics.
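The registration conducts the randomization in Stata; the sketch below illustrates the same idea, assigning 60% of schools to treatment within each region-by-school-type stratum, in Python instead. The seed, the input layout, and the rounding rule are assumptions. Rounding within strata is one reason a realized share can differ slightly from exactly 60%, as in Cohort 1 (1,562 of 2,607 schools, about 59.9%).

```python
import random
from collections import defaultdict


def stratified_assignment(schools, share_treated=0.6, seed=12345):
    """Assign schools to treatment (1) or control (0) within region x school-type strata.

    schools: list of (school_id, region, school_type) tuples (hypothetical layout).
    seed: hypothetical; fixing it makes the draw reproducible.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for school_id, region, school_type in schools:
        strata[(region, school_type)].append(school_id)
    assignment = {}
    for key in sorted(strata):      # fixed stratum order for reproducibility
        ids = sorted(strata[key])   # fixed order before shuffling
        rng.shuffle(ids)
        n_treat = round(share_treated * len(ids))
        for rank, school_id in enumerate(ids):
            assignment[school_id] = int(rank < n_treat)
    return assignment
```

Running the procedure separately for each cohort, as the registration describes, amounts to calling the function once per cohort's list of schools.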

Randomization Unit

The unit of randomization is the school. All first- and second-year teachers employed in a given school are assigned to the same treatment condition as their school.

Was the treatment clustered?

Yes

Experiment Characteristics

Sample size: planned number of clusters

Randomization is conducted separately within each cohort. For Cohort 1, which includes schools that hired at least one first-year teacher in 2025, the total number of eligible schools is 2,607. For Cohort 2, which includes schools that hired at least one first-year teacher in 2026 and were not part of Cohort 1, the number of eligible schools is not yet known, as administrative hiring records for 2026 will be available in April 2026. Within Cohort 1, 1,562 schools are assigned to the treatment group (60%) and 1,045 schools are assigned to the control group (40%). The total planned number of clusters across both cohorts will be updated once 2026 hiring records become available in April 2026.

Sample size: planned number of observations

The study population consists of first- and second-year teachers employed in eligible schools across two cohorts. For Cohort 1 (schools that hired at least one first-year teacher in 2025), second-year teachers in 2026 are observed in current administrative records and total 4,591 teachers. First-year teachers in Cohort 1 schools, meaning teachers newly hired in 2026 in those same schools, will be observed in April 2026 when updated hiring records become available. For Cohort 2 (schools that hired at least one first-year teacher in 2026, excluding Cohort 1 schools), both first- and second-year teachers will be identified once the 2026 hiring records become available in April 2026. The total planned number of observations across both cohorts is approximately 9,000 novice teachers, pending confirmation of 2026 hiring data.

Sample size (or number of clusters) by treatment arms

The trial has two arms: treatment and control. Within each cohort, 60% of eligible schools are assigned to the treatment arm and 40% are assigned to the control arm.

For Cohort 1, 1,562 schools are assigned to the treatment arm and 1,045 schools are assigned to the control arm. For Cohort 2, the number of schools by treatment arm will be determined once 2026 hiring records become available in April 2026.

In treatment schools, all first- and second-year teachers are invited to access Profe Gabi. Teachers in control schools do not receive access to the chatbot during the study period.

Minimum detectable effect size for main outcomes (accounting for sample design and clustering)

Power calculations are based on the Cohort 1 sample of second-year teachers in 2026, since this is the only group fully observed in administrative records at the time of registration. These figures are therefore conservative. The pooled sample across both cohorts and all teacher types will be larger, and statistical power for the main analysis will be higher than reported here. Cohort 1 includes 4,591 second-year teachers distributed across 2,607 schools, randomly assigned to treatment (60%) and control (40%). Calculations assume an intracluster correlation coefficient (ICC) of 0.10 and a significance level of 0.05. Under these assumptions and with an average of 1.76 teachers per school, the study is powered at 80% to detect a minimum detectable effect (MDE) of 2.79 percentage points in teacher attrition from the Chilean education system (baseline: 11.5%), 2.88 percentage points in teacher attrition from the publicly funded system (baseline: 12.4%), 3.62 percentage points in teacher turnover (baseline: 21.9%), and 1.16 percentage points in exit from the classroom (baseline: 1.8%). All baseline rates are drawn from historical administrative records (2021-2024) for second-year teachers. Power calculations for first-year teachers in Cohort 1 and all teachers in Cohort 2 will be updated once 2026 hiring records become available in April 2026.
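The MDEs above can be approximated with the textbook formula for a two-arm cluster-randomized design, MDE = (z_{1-α/2} + z_{power}) × sqrt(p(1-p) / (P(1-P)N) × [1 + (m-1)ρ]), where p is the baseline rate, P the treated share, N the number of teachers, m the average cluster size, and ρ the ICC. This is a reconstruction, not necessarily the authors' exact routine: it reproduces three of the four registered values and gives 2.89 rather than 2.88 percentage points for attrition from the publicly funded system. It also ignores covariate adjustment, which would typically lower the MDE.

```python
from math import sqrt
from statistics import NormalDist


def mde_pp(p, n=4591, clusters=2607, icc=0.10, share_treated=0.6,
           alpha=0.05, power=0.80):
    """Minimum detectable effect, in percentage points, for a binary outcome."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    m = n / clusters                   # average teachers per school (about 1.76)
    design_effect = 1 + (m - 1) * icc  # variance inflation from clustering
    var = p * (1 - p) / (share_treated * (1 - share_treated) * n) * design_effect
    return 100 * z * sqrt(var)


baseline_rates = {
    "attrition_system": 0.115,
    "attrition_public": 0.124,
    "turnover": 0.219,
    "exit_classroom": 0.018,
}
for outcome, p in baseline_rates.items():
    print(f"{outcome}: {mde_pp(p):.2f} pp")
```

Running it prints 2.79, 2.89, 3.62, and 1.16 pp for the four outcomes.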

Supporting Documents and Materials

IRB

Institutional Review Boards (IRBs)

IRB Name

Health Media Lab Institutional Review Board

IRB Approval Date

2025-04-30

IRB Approval Number

HML IRB #2850

Analysis Plan