Experimental Design
We evaluate Profe Gabi through a school-level randomized controlled trial with two arms: treatment and control. Eligible schools are all public municipal (or Local Education Service) and private subsidized schools in Chile that hired at least one first- or second-year teacher in 2025 or 2026. Teachers in early childhood education, special education, and adult education are excluded. Schools are divided into two mutually exclusive cohorts based on hiring year. Cohort 1 includes schools that hired at least one first-year teacher in 2025. Cohort 2 includes schools that hired at least one first-year teacher in 2026 and were not part of Cohort 1.
Within each cohort, schools are randomly assigned to treatment (60%) or control (40%). In treatment schools, all first- and second-year teachers identified in administrative records are invited to access Profe Gabi via a personalized WhatsApp message. Teachers in control schools do not receive access to the chatbot during the study period. The study includes approximately 9,000 novice teachers across both cohorts.
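As an illustration of the assignment step, the following minimal sketch draws a 60/40 split separately within each cohort. The function name, the seed, and the school identifiers are hypothetical, and the sketch assumes simple randomization within cohort with no further stratification, which the text above does not specify.

```python
import random

def assign_schools(schools_by_cohort, treat_share=0.6, seed=2026):
    """Assign each school to 'treatment' or 'control' within its cohort.

    schools_by_cohort maps a cohort label to a list of school identifiers.
    The seed value is an arbitrary placeholder, not the study's actual seed.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    assignment = {}
    for cohort in sorted(schools_by_cohort):
        # Sort first so the result does not depend on input ordering.
        schools = sorted(schools_by_cohort[cohort])
        rng.shuffle(schools)
        n_treat = round(treat_share * len(schools))
        for position, school in enumerate(schools):
            assignment[school] = "treatment" if position < n_treat else "control"
    return assignment

# Hypothetical school identifiers, for illustration only.
example = {
    1: [f"C1-{i:03d}" for i in range(10)],
    2: [f"C2-{i:03d}" for i in range(5)],
}
arms = assign_schools(example)
```

Because assignment is at the school level, every novice teacher in a school inherits that school's arm.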
Novice teachers are identified using administrative records from the Centro de Perfeccionamiento, Experimentación e Investigaciones Pedagógicas (CPEIP) in the Chilean Ministry of Education. These records capture all teachers employed across all school types and are updated throughout the year. First-year teachers are identified as those appearing in the administrative system for the first time in a given year. Second-year teachers are identified as those whose first year of recorded employment was the preceding year. Teachers are recruited by sending a personalized WhatsApp message to their registered phone number, using contact information drawn from the same administrative records.
The rollout is staggered to reflect the timing of administrative data availability. The first wave begins in March 2026 with second-year teachers in Cohort 1, the only group for whom records are already available at that point. The second wave begins in May 2026, once updated hiring records become available, and incorporates first-year teachers in both cohorts and second-year teachers in Cohort 2. As a result, second-year teachers in Cohort 1 receive approximately 35 weeks of access to Profe Gabi, while all other groups receive approximately 28 weeks.
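The identification and rollout rules above can be sketched as two small functions. The function names and argument names are hypothetical, and the sketch assumes that each teacher's first year of recorded employment is already available as an integer.

```python
def career_stage(first_year_in_records, reference_year):
    """Classify a teacher relative to reference_year, following the rules in the text.

    Returns 'first-year', 'second-year', or None for teachers who are not novices.
    """
    if first_year_in_records == reference_year:
        return "first-year"
    if first_year_in_records == reference_year - 1:
        return "second-year"
    return None

def rollout_wave(stage, cohort):
    """Wave 1 (March 2026) covers second-year teachers in Cohort 1.

    Wave 2 (May 2026) covers all other novice teachers.
    """
    if stage not in ("first-year", "second-year"):
        raise ValueError("rollout waves are only defined for novice teachers")
    return 1 if (stage == "second-year" and cohort == 1) else 2

# A teacher first recorded in 2025 is a second-year teacher in 2026.
stage = career_stage(2025, 2026)
wave = rollout_wave(stage, cohort=1)
```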
The 60/40 allocation in favor of treatment reflects two considerations. First, because Chile's National Induction and Mentoring System reaches fewer than 1% of novice teachers annually, assigning a larger share of schools to treatment reduces the number of early-career teachers left without any form of structured support during the study period. Second, given the low take-up rates observed in the pilot study, assigning more schools to treatment increases the expected number of compliers, which improves the precision of the LATE estimates.
The main analysis estimates intention-to-treat (ITT) effects using OLS regressions with standard errors clustered at the school level. The baseline specification pools both cohorts and regresses each outcome on an indicator for treatment assignment, controlling for a pre-specified set of baseline covariates drawn from administrative records. Teacher-level covariates include gender, age, contract type, contract hours, teaching hours, subject taught, school level taught, quality of teacher training institution, scores on the university admissions test (PSU/PAES), scores on the Evaluación Nacional Diagnóstica (a standardized exam administered during the final year of initial teacher education programs), and an indicator for whether the teacher received in-person mentoring through the National Induction and Mentoring System. School-level covariates include school type (public municipal or Local Education Service vs. private subsidized), geographic location (Metropolitan Region of Santiago vs. other regions), school setting (urban vs. rural), school vulnerability index (IVE), school academic performance (SIMCE scores), school size measured by total enrollment and number of teachers, and number of novice teachers in the school. A second specification interacts treatment assignment with an indicator for second-year teachers to allow effects to vary by career stage. Separate regressions are also estimated by cohort.
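One way to write the two ITT specifications described above is given below. The notation is illustrative and is an assumption of this sketch: $Y_{is}$ is an outcome for teacher $i$ in school $s$, $T_s$ is the school-level treatment assignment indicator, $S_{is}$ is an indicator for second-year teachers, and $X_{is}$ and $W_s$ collect the teacher-level and school-level baseline covariates.

```latex
\begin{align}
Y_{is} &= \alpha + \beta\, T_s + X_{is}'\gamma + W_s'\delta + \varepsilon_{is}, \\
Y_{is} &= \alpha + \beta_1 T_s + \beta_2\,(T_s \times S_{is}) + \beta_3 S_{is}
          + X_{is}'\gamma + W_s'\delta + \varepsilon_{is}.
\end{align}
```

Under this notation, $\beta$ is the pooled ITT effect, $\beta_1$ is the ITT effect for first-year teachers, and $\beta_1 + \beta_2$ is the ITT effect for second-year teachers. In both equations the error term $\varepsilon_{is}$ is allowed to be correlated within school $s$, which corresponds to clustering standard errors at the school level.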
Control teachers cannot access Profe Gabi without a direct personalized invitation, so we expect full compliance on the control side. On the treatment side, we anticipate partial compliance, since some teachers may not engage with the tool despite receiving the invitation. Because compliance is imperfect, we will also estimate the Local Average Treatment Effect (LATE) by two-stage least squares (2SLS), using treatment assignment (receipt of a personalized WhatsApp invitation to access Profe Gabi) as an instrument for take-up. We consider two definitions of take-up. The first, the extensive margin, defines take-up as a binary indicator equal to 1 if the teacher sent at least one message to the chatbot beyond the initial onboarding message. This captures whether the teacher engaged with the tool. The second, the intensive margin, uses the total number of messages sent to Profe Gabi as a continuous measure of engagement, capturing the average effect of an additional message across the observed distribution of use. We acknowledge that this specification assumes a linear relationship between the number of messages sent and outcomes, which may not hold in practice given that the returns to engagement are likely diminishing. To assess the sensitivity of our results to this assumption, we complement the continuous specification with a set of 2SLS regressions using binary indicators for whether the teacher crossed pre-specified engagement thresholds: at least 2 messages, at least 4 messages, and at least 8 messages. These thresholds are informed by engagement patterns observed in a pilot study conducted prior to the RCT, in which 35.9% of teachers used the tool only once, 39.2% sent between 2 and 3 messages, 19.4% sent between 4 and 7 messages, and 5.5% sent 8 or more messages. They distinguish between teachers who engaged minimally, those who used the tool on at least a few occasions, and those who sustained engagement over time.
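To make the thresholds concrete, the short calculation below converts the pilot distribution reported above into the share of pilot teachers who would cross each threshold. It assumes that "used the tool only once" corresponds to sending exactly one message; the variable and function names are illustrative.

```python
# Pilot shares (percent of teachers) by number of messages sent, as reported in the text.
# Assumption: "used the tool only once" is treated as exactly one message.
pilot_shares = {"1": 35.9, "2-3": 39.2, "4-7": 19.4, "8+": 5.5}

def share_at_or_above(bins):
    """Sum the pilot shares of the listed bins, rounded to one decimal place."""
    return round(sum(pilot_shares[b] for b in bins), 1)

at_least_2 = share_at_or_above(["2-3", "4-7", "8+"])  # 39.2 + 19.4 + 5.5 = 64.1
at_least_4 = share_at_or_above(["4-7", "8+"])         # 19.4 + 5.5 = 24.9
at_least_8 = share_at_or_above(["8+"])                # 5.5
```

In the pilot, then, roughly 64% of users crossed the 2-message threshold, 25% crossed the 4-message threshold, and 5.5% crossed the 8-message threshold.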
We note that the pilot was conducted with volunteer teachers, who are likely more motivated than the average novice teacher in the RCT, and engagement rates in the RCT may therefore be lower. If the distribution of engagement in the RCT differs from that observed in the pilot, additional thresholds may be defined based on the realized distribution of usage to capture meaningful variation in engagement intensity. All specifications are instrumented by treatment assignment and estimate the effect of the intervention among compliers, that is, teachers who reached each level of engagement because they received the invitation.
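The logic of these estimates can be illustrated with the Wald ratio: with a single binary instrument and no covariates, the 2SLS point estimate equals the ITT effect on the outcome divided by the ITT effect on take-up. The sketch below applies this ratio to each take-up definition. It omits covariates and clustered standard errors, the function names are hypothetical, and the data are made-up toy numbers used only to show the arithmetic. Message counts are assumed to exclude the initial onboarding message.

```python
from statistics import mean

def wald_late(outcome, take_up, assigned):
    """Wald estimator: (ITT on outcome) / (ITT on take-up), without covariates."""
    treated = [i for i, z in enumerate(assigned) if z == 1]
    control = [i for i, z in enumerate(assigned) if z == 0]
    itt = mean(outcome[i] for i in treated) - mean(outcome[i] for i in control)
    first_stage = mean(take_up[i] for i in treated) - mean(take_up[i] for i in control)
    if first_stage == 0:
        raise ValueError("assignment does not shift take-up; the ratio is undefined")
    return itt / first_stage

def crossed(messages, threshold):
    """Binary indicator for sending at least `threshold` messages."""
    return [1 if m >= threshold else 0 for m in messages]

# Toy data (not study results): four invited teachers, four control teachers.
assigned = [1, 1, 1, 1, 0, 0, 0, 0]
messages = [0, 1, 3, 8, 0, 0, 0, 0]   # control teachers cannot send messages
outcome = [2.0, 3.0, 3.0, 4.0, 2.0, 2.0, 3.0, 3.0]

late_extensive = wald_late(outcome, crossed(messages, 1), assigned)  # any engagement
late_per_message = wald_late(outcome, messages, assigned)            # intensive margin
late_at_least_2 = wald_late(outcome, crossed(messages, 2), assigned) # threshold version
```

In this toy example the ITT effect is 0.5 in every case; only the first stage changes with the take-up definition, so the rescaled estimates differ.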
We address potential violations of the stable unit treatment value assumption (SUTVA) through three strategies.
First, the intervention is delivered individually via personalized WhatsApp invitations. Control teachers cannot access Profe Gabi without a direct invitation. Even if they hear about the tool from colleagues, they cannot use it, which limits the scope for spillovers.
Second, we use administrative data to identify teachers who transfer from a treatment school to a control school during the study period. We assess whether the presence of these transferred teachers is associated with changes in outcomes among other teachers in the receiving control schools, by comparing outcomes in control schools that received a transfer from a treatment school against outcomes in control schools that did not receive any such transfers. If control schools that received transferred teachers show systematically better outcomes, this suggests that transferred teachers brought knowledge or practices from Profe Gabi into the control group through peer diffusion. We also conduct robustness checks excluding control schools that received these transfers from the main analysis.
Third, the endline survey includes items asking all teachers, regardless of treatment assignment, whether they have heard about or used Profe Gabi. Among control group teachers, we examine whether reported exposure to the tool is associated with changes in outcomes from baseline to endline. Because we have baseline survey measures of the same outcomes collected prior to the intervention, we can difference out time-invariant characteristics. This analysis remains descriptive given selection concerns, but it serves as a useful diagnostic: if control teachers with and without reported exposure show similar changes from baseline to endline, this provides reassuring evidence that spillovers are not a major concern.
To the extent that spillovers remain after these checks, they are expected to bias ITT estimates toward zero, making our estimates conservative relative to the true effect of the intervention.
Finally, in addition to cross-school spillovers, we study within-school spillovers in treatment schools by examining whether more experienced teachers who were not invited to use Profe Gabi show changes in outcomes relative to their counterparts in control schools.