Fidelity vs. Scale: Delivering In-Service Teacher Training Through NGO and Government Teacher Colleges - Evidence from a Three-Arm Randomized Evaluation in Tanzania.

Last registered on May 11, 2026

Pre-Trial

General Information

Title

RCT ID

AEARCTR-0018554

Initial registration date

May 11, 2026

First published

May 11, 2026, 9:35 AM EDT

Locations

Primary Investigator

Name

Lukas Hauck

Affiliation

University of Bern

Other Primary Investigator(s)

PI Name

Konstantin Büchel

PI Affiliation

University of Bern

PI Name

Tim Stettler

PI Affiliation

University of Bern

Additional Trial Information

Status

On going

Start date

2025-10-01

End date

2026-10-31

Keywords

Behavior, Education

Additional Keywords

Training of Trainers (ToT), Scaling, Teacher Training

JEL code(s)

C93, I20, I28, O15

Secondary IDs

Prior work

This trial does not extend or rely on any prior RCTs.

Abstract

In-service teacher training programs offer a potential way to alleviate the learning crisis in many low- and middle-income countries, where student learning remains low despite high enrolment. However, significant supply-side barriers limit the scalability of high-quality in-service training. This project scientifically assesses two scaling approaches for teacher professional development in Tanzania. The evaluation is designed as a randomized controlled trial (RCT) to identify the causal effect of the two scaling approaches on student learning outcomes. Additionally, the project incorporates teachers' perspectives on pedagogical innovations, as well as teachers' job satisfaction and self-efficacy as potential mediators. By collecting an extensive body of survey data across three waves, the project aims to provide evidence on how the two scaling approaches change teacher beliefs and how these beliefs manifest in student performance.

External Link(s)

Registration Citation

Citation

Büchel, Konstantin, Lukas Hauck and Tim Stettler. 2026. "Fidelity vs. Scale: Delivering In-Service Teacher Training Through NGO and Government Teacher Colleges - Evidence from a Three-Arm Randomized Evaluation in Tanzania." AEA RCT Registry. May 11. https://doi.org/10.1257/rct.18554-1.0

Sponsors & Partners

Experimental Details

Interventions

Intervention(s)

The intervention is an in-service teacher training program developed by the Swiss NGO Helvetas in close collaboration with Tanzanian stakeholders. The School-based In-service Teacher Training (SITT) program has been implemented in over 1,500 primary schools in Tanzania since 2016, with additional pilots in Kenya, South Sudan and Zambia. The program is endorsed by the Tanzania Teachers' Union (TTU), the Ministry of Education, Science and Technology (MoEST), and the President's Office, Regional Administration and Local Government (PO-RALG), and is supported by district education authorities.

Intervention content. The SITT program promotes a student-centered teaching approach through four simple strategies. First, teachers are encouraged to actively involve pupils during lessons, for example through group work, by asking open-ended questions, and by having pupils explain concepts in front of the class. Second, to make learning more accessible, the program promotes practice-based learning, for example by using examples from daily life, by connecting lessons to prior content knowledge, or by engaging pupils in experiments and hands-on learning activities. Third, to address the scarcity of teaching resources, the program trains teachers to use readily available local materials to create teaching aids; for example, berries and toothpicks can be used to form simple geometric shapes. Fourth, the program emphasizes teachers' accountability by reminding them that each choice they make is deliberate: being late or not engaging pupils during the lesson is a choice of their own. This is reflected in the program's guiding motto: "MAKSUDI MAKSUDI" (intentional, intentional). Furthermore, participating teachers are asked to share their new knowledge and skills with peer teachers at their school. To achieve this, the program introduces various collaborative activities, such as inviting peers to model lessons that showcase the new skills, organizing peer learning groups to reflect on implementation, and conducting feedback sessions.

Two prior RCTs by our research team establish that direct SITT participation shifts classroom practice by 0.5–0.8 SD on structured observation measures and raises student math and science scores by 0.13–0.15 SD on nationally standardized assessments (Jakob et al., 2026; Büchel & Hauck, 2026). These effect sizes place the program in the top quartile of experimentally evaluated educational interventions in developing countries (Evans & Yuan, 2022).

The program is delivered through a centrally organized 5-day on-site workshop and targets primary school teachers who teach mathematics and science to Grade 7 pupils, as well as the schools' head teachers. This study experimentally evaluates two distinct delivery models and compares their effectiveness against each other:

Arm 1 - SITT-Intensive: NGO-Led Delivery (N = 100 schools). From each school in this arm, the head teacher and two subject teachers (one mathematics and one science teacher, nominated by district authorities) attend the 5-day SITT workshop delivered directly by the NGO facilitation team. These facilitators are experienced SITT practitioners who have trained teachers from over 1,000 schools across Tanzania, Kenya, and Zambia. This arm represents the program's standard operating model and serves as the benchmark against which the training-of-trainers approach is evaluated. Semi-annual refresher workshops reinforce adoption of newly acquired techniques throughout the study period. This arm replicates and refines our previous experimental evaluations, which documented substantial shifts in instructional practice among trained teachers and substantial improvements in pupils' learning outcomes.

Arm 2 — SITT-Extensive: Teacher College-Led Delivery (N = 100 schools). Intervention Arm 2 provides the same 5-day SITT workshop, but delivered by master trainers from Tanzanian government teacher colleges rather than by NGO facilitators. The Helvetas SITT team first trains three master trainers at each of eight government teacher colleges, following a "training-of-trainers" approach. To reinforce fidelity to the program's pedagogical approach, the master trainers subsequently observe one full week of Arm 1 workshops delivered by the NGO facilitation team. These college-based master trainers then deliver the full 5-day SITT workshop to the headmaster and two subject teachers per assigned school, following the identical curriculum and materials used in Arm 1. Semi-annual refresher workshops, likewise delivered by the college-based master trainers, reinforce adoption of newly acquired techniques throughout the study period. This arm tests whether the Training-of-Trainers model maintains the positive effects on teaching practice and student learning outcomes established under direct NGO delivery. If training quality is preserved, the ToT model would constitute a feasible and cost-effective pathway for nationwide scale-up, leveraging existing government infrastructure without continued dependence on NGO facilitation capacity.

Intervention Start Date

2026-04-20

Intervention End Date

2026-08-31

Primary Outcomes

Primary Outcomes (end points)

Pupils' mathematics and science skills at the end of the 2026 school year, based on the national standardized mathematics and science examinations in Grade 4 (Standard Four National Assessment, SFNA) and Grade 7 (Primary School Leaving Examination, PSLE), administered by the National Examinations Council of Tanzania (NECTA).

The 2026 cycle constitutes the experimental endline, aligned with the participation criterion under which treated teachers commit to teaching Grade 7 — but not Grade 4 — in mathematics and science in 2026. This pre-defined teaching schedule allows us to identify (i) direct effects through the Grade 7 PSLE cohort, taught by trained teachers, and (ii) cascading effects through the Grade 4 SFNA cohort, taught by untrained peer teachers within the same treatment schools.

Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)

At baseline events, we collect (i) sociodemographic and professional background data from all participating teachers; (ii) a 90-minute paper-based mathematics content assessment from all targeted mathematics teachers; (iii) a corresponding science content assessment from all targeted science teachers; and (iv) a tablet-based survey eliciting teachers' job satisfaction, self-efficacy, and beliefs about the usefulness and implementation effort of SITT pedagogical techniques, using response-time-based measures and pairwise choice experiments (described in detail below). Both subject assessments mirror Tanzania's primary school curriculum for Grades 2–7 and replicate the instruments used in our prior RCTs (Jakob et al., 2023; Büchel & Hauck), permitting direct comparability across study iterations.

We conduct two waves of structured classroom observations: one baseline wave in early 2026 and one post-intervention wave. Observations are based on the World Bank's "Teach" observation tool, adapted to the Tanzanian primary school context, and capture teacher attendance as well as pedagogical practice across multiple instructional dimensions.

We complement classroom observations with extensive survey data collected from teachers across all sample schools using tablets. During the classroom observation visits, enumerators administer these tablet-based surveys to all targeted teachers in the same school visit. To measure teacher job satisfaction and self-efficacy, we exploit response times as indicators of preference intensity — faster responses to evaluative questions are indicative of stronger underlying attitudes. This approach is based on state-of-the-art methods to elicit economically relevant but unobserved variables like preference intensity, self-assessment, or satisfaction using response times (see Netzer & Liu, 2023; Alós-Ferrer et al., 2021; Benkert et al., 2024). It will allow us to recover nuanced variation in latent teacher dispositions that standard ordinal survey scales fail to capture.

In addition, we implement pairwise choice experiments to elicit two key dimensions of teacher beliefs: (i) the perceived usefulness of specific SITT pedagogical techniques for student learning, and (ii) the perceived effort required to implement each technique in regular classroom practice (Hainmueller et al., 2015). By comparing belief distributions across treatment arms and over time, we can assess whether the two delivery models differentially shift teacher beliefs — and whether those belief shifts mediate adoption of new instructional practices. These measures are collected across the two in-school data collection waves, enabling within-teacher analysis of belief updating in response to the interventions.

Long-run pupil outcomes. As a secondary outcome, we pre-specify pupils' mathematics and science skills on NECTA assessments administered beyond the 2026 school year — the Standard Four National Assessment (SFNA) for Grade 4 and the Primary School Leaving Examination (PSLE) for Grade 7 — to measure the medium to long-term impacts of the intervention.

Secondary Outcomes (explanation)

Experimental Design

The starting point is the set of 849 government-run primary schools in seven districts in Tanzania: 158 schools in Iringa DC (Iringa region), 46 schools in Iringa MC (Iringa region), 178 schools in Kilosa DC (Morogoro region), 100 schools in Kondoa DC (Dodoma region), 155 schools in Mvomero DC (Morogoro region), 137 schools in Mpwapwa DC (Dodoma region) and 75 schools in Morogoro MC (Morogoro region).
Six of the seven districts in our sample host one teacher college, and Kilosa DC hosts two. This yields a total of eight teacher colleges located in seven districts. For each teacher college, we select the schools closest to the college for participation in our experiment. To prevent cross-school spillovers, we account for the presence of so-called "twin schools", i.e. pairs of schools that share the same name with an A/B suffix and typically operate on the same or an adjacent compound. In such cases, we retain only one of the two twin schools in the sample. The selection cutoff per teacher college is 36 schools in Kilosa DC (which hosts two teacher colleges), Morogoro MC, and Iringa MC, and 39 schools in Iringa DC, Kondoa DC, Mpwapwa DC, and Mvomero DC. With four colleges at a cutoff of 36 and four at a cutoff of 39, this procedure yields the final sample of 4 × 36 + 4 × 39 = 300 schools.
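The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the research team used: the function names, the (name, distance) input format, and the rule of keeping the nearer school of a twin pair are hypothetical (the registration only states that one twin is retained), and the labels for the eight colleges are placeholders.

```python
import re

def twin_key(name):
    """Strip a trailing ' A' / ' B' suffix so both twin schools map to one key."""
    return re.sub(r"\s+[AB]$", "", name.strip(), flags=re.IGNORECASE).lower()

def select_schools(schools, cutoff):
    """Return the `cutoff` schools nearest to one teacher college.

    `schools` is a list of (name, distance_km) tuples. Assumption: when both
    twins are candidates, the nearer one is kept and the other is skipped.
    """
    chosen, seen = [], set()
    for name, _dist in sorted(schools, key=lambda s: s[1]):
        key = twin_key(name)
        if key in seen:
            continue  # twin of a school that was already selected
        seen.add(key)
        chosen.append(name)
        if len(chosen) == cutoff:
            break
    return chosen

# Per-college cutoffs as stated in the registration (labels are placeholders).
CUTOFFS = {
    "Kilosa college 1": 36, "Kilosa college 2": 36,
    "Morogoro": 36, "Iringa MC": 36,
    "Iringa DC": 39, "Kondoa DC": 39, "Mpwapwa DC": 39, "Mvomero DC": 39,
}
TOTAL_SCHOOLS = sum(CUTOFFS.values())
```

Summing the per-college cutoffs reproduces the stated sample of 300 schools.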
For these 300 schools the head teacher is nominated for participation in the SITT program. Furthermore, in consultation with the District Education Officer, each school could nominate two regular teachers (one math and one science teacher) for participation in the program. When nominating the two regular teachers, schools are asked to adhere to the following selection criteria: (C1) Both nominees must continue teaching at their current school until the end of 2026. (C2) Both nominees should teach pupils of grade 7 in 2026. (C3) Both nominees should not teach pupils of grade 4 in 2026. All nominees (head teacher and the two regular teachers) are invited to participate in a baseline event scheduled in early October 2025.
During the baseline events in October 2025, the nominated head teachers (300), math teachers (300) and science teachers (300) are introduced to the SITT program. The introduction covers (i) a general presentation of the SITT program, (ii) information on the scope and goals of the evaluation study, and (iii) the criteria for participating in the study (see (C1) to (C3) in the previous paragraph) together with participants' option to opt out of the study at any time. The presentation emphasizes that all participants must adhere to these criteria, regardless of whether their school is assigned to the control group or to one of the treatment groups. After the presentations, the teachers are asked to provide informed written consent to participate in the program and its evaluation. Teachers who provide written consent are asked to fill in a socio-demographic survey. In addition, baseline survey data on job satisfaction and self-efficacy, the pairwise choice experiment, and the math/science assessment as described in section 4 are collected.
The 300 participating schools are then randomly assigned to treatment Arm 1 (SITT-Intensive, 100 schools), treatment Arm 2 (SITT-Extensive, 100 schools) or the control group (not participating in the SITT program, 100 schools). Teachers who do not give consent to participate in the study are not part of the evaluation. Random assignment is implemented in Stata and stratified along two dimensions. The first is the school cluster in the vicinity of each of the eight teacher colleges, which yields eight strata. The second is the school-level average score across mathematics and science in the 2023 Standard Four National Assessment (SFNA), along which we split schools into above-median and below-median performers. This yields a total of 16 strata, each containing between 18 and 20 schools.
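The stratified assignment can be illustrated with a short sketch. The registration states that randomization is run in Stata; the Python below is only an analogue under assumptions. The field names (`id`, `college`, `sfna`), the seed, and in particular the handling of strata whose size is not divisible by three (leftover schools are pooled and allocated in a second round) are hypothetical choices, since the registration does not describe how such cases are resolved.

```python
import random
from collections import defaultdict

ARMS = ("arm1_intensive", "arm2_extensive", "control")

def assign_arms(schools, seed=2026):
    """Assign schools to arms within college x SFNA-half strata.

    `schools` is a list of dicts with keys 'id', 'college', 'sfna'.
    Returns a dict mapping school id to arm label.
    """
    rng = random.Random(seed)
    by_college = defaultdict(list)
    for s in schools:
        by_college[s["college"]].append(s)

    assignment, leftovers = {}, []
    for college in sorted(by_college):
        ranked = sorted(by_college[college], key=lambda s: s["sfna"])
        half = len(ranked) // 2
        # Lower and upper half of the college cluster form two strata.
        for stratum in (ranked[:half], ranked[half:]):
            rng.shuffle(stratum)
            n_full = len(stratum) - len(stratum) % len(ARMS)
            for i, s in enumerate(stratum[:n_full]):
                assignment[s["id"]] = ARMS[i % len(ARMS)]
            leftovers.extend(stratum[n_full:])

    # Assumption: schools left over within strata are pooled and split evenly.
    rng.shuffle(leftovers)
    for i, s in enumerate(leftovers):
        assignment[s["id"]] = ARMS[i % len(ARMS)]
    return assignment
```

With four college clusters of 36 schools and four of 39, the strata contain 18, 19 or 20 schools, and this scheme gives exactly 100 schools per arm. Fixing the seed makes the draw reproducible.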
The main contribution of this project is to evaluate the relative effectiveness of two alternative delivery models for a previously validated teacher professional development program, namely direct NGO-led delivery (Arm 1, SITT-Intensive) and delivery through local government teacher colleges (Arm 2, SITT-Extensive), and to compare them against each other and against a pure control group. The central estimand is the impact of each delivery model on pupil learning outcomes, measured through Tanzania's nationally standardized examinations. In addition, we evaluate impacts on intermediate outcomes (teachers' classroom practice, their beliefs about the usefulness and implementation effort of specific SITT techniques, and their job satisfaction and self-efficacy) and test whether beliefs elicited directly after training predict subsequent changes in classroom practice observed in the field.
Direct effect on targeted pupils. The participation criteria commit treated teachers to teaching Grade 7 mathematics and science throughout 2026. We therefore identify the direct effect of the intervention on pupils taught by trained teachers using the Primary School Leaving Examination (PSLE) administered by the National Examinations Council of Tanzania (NECTA) at the end of 2026. Because schools are randomly assigned across Arm 1, Arm 2, and the control group, any difference in post-intervention outcomes can be causally attributed to the intervention.
Spillovers to peer teachers' pupils. The same participation criteria specify that treated teachers do not teach Grade 4 in 2026. We therefore use the 2026 Standard Four National Assessment (SFNA) to measure cascading effects on pupils taught by untrained peer teachers within the same treatment schools. This design, which partially replicates the identification strategy of Büchel & Hauck (forthcoming), isolates within-school knowledge transmission from direct teacher effects.
Intermediate outcomes and mechanisms. To understand the causal mechanisms through which the two delivery models affect pupil learning, we conduct structured pre- and post-intervention classroom observations (adapted from the World Bank's Teach tool) together with tablet-based teacher surveys eliciting job satisfaction, self-efficacy, and beliefs about SITT techniques (see previous section). A particular methodological test is whether teacher beliefs elicited shortly after the training predict subsequent changes in classroom practice observed in later waves. The baseline survey wave, administered during the sensitization event and complemented by in-school surveys conducted prior to the intervention, provides a pre-intervention benchmark for all survey-based measures.
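For orientation, the three pairwise contrasts can be written as one regression. The registration does not state an estimating equation, so the following is only one plausible specification and every symbol in it is an assumption: $Y_{is}$ is the standardized exam score of pupil $i$ in school $s$, $\mathrm{Arm1}_s$ and $\mathrm{Arm2}_s$ are assignment indicators, and $\gamma_{k(s)}$ are fixed effects for the 16 randomization strata.

```latex
Y_{is} = \alpha + \beta_1\,\mathrm{Arm1}_s + \beta_2\,\mathrm{Arm2}_s + \gamma_{k(s)} + \varepsilon_{is}
```

Under this sketch, $\beta_1$ and $\beta_2$ capture the effect of each delivery model relative to control, $\beta_1 - \beta_2$ captures the NGO-led versus college-led contrast, and standard errors would be clustered at the school level because assignment is by school. The same form would apply to the Grade 7 PSLE (direct effects) and the Grade 4 SFNA (spillovers).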

Experimental Design Details

Not available

Randomization Method

Randomization is done in the office using Stata.

Randomization Unit

We randomize at the school level. Within each teacher-college stratum, we split schools into a lower and an upper half based on the school-level average of fourth-graders' mathematics and science scores in the 2025 Standard Four National Assessment (SFNA). Within each of these strata — defined by the interaction of teacher college and SFNA half — schools are then randomly assigned to the three experimental arms. Note that, from the perspective of pupil-level outcomes, treatment is clustered at the school level.

Was the treatment clustered?

Yes

Experiment Characteristics

Sample size: planned number of clusters

There are 300 schools which participate in the evaluation.
Each school nominates the head teacher, one math and one science teacher for participation, i.e. 300 head teachers, 300 math teachers and 300 science teachers participate in the evaluation.
Schools are then randomly assigned to one of two treatment arms (100 schools in each arm, i.e. 100 head teachers, 100 math teachers and 100 science teachers per arm) or to the control group (100 schools, i.e. 100 head teachers, 100 math teachers and 100 science teachers).

Sample size: planned number of observations

300 schools yield 900 teachers: 300 head teachers, 300 mathematics teachers, and 300 science teachers. The mathematics and science abilities of their pupils are assessed using nationally standardized examinations administered by the National Examinations Council of Tanzania (NECTA). Class sizes vary across schools and may change across school years, but we expect on average approximately 60–70 pupils per school per grade to sit the assessments, yielding roughly 18,000–21,000 pupils for the Grade 7 Primary School Leaving Examination (PSLE) and 18,000–21,000 pupils for the Grade 4 Standard Four National Assessment (SFNA).

Sample size (or number of clusters) by treatment arms

Treatment arm 1: 100 schools (resp. 100 head teachers, 100 math and 100 science teachers)
Treatment arm 2: 100 schools (resp. 100 head teachers, 100 math and 100 science teachers)
Control group: 100 schools (resp. 100 head teachers, 100 math and 100 science teachers)
As we expect on average 60–70 pupils per school and grade, we expect to evaluate treatment effects on pupils' math and science abilities based on the results of 6,000–7,000 pupils per experimental arm and grade.

Minimum detectable effect size for main outcomes (accounting for sample design and clustering)

Power calculations follow Bloom (2007) for clustered designs, with power = 80%, α = 0.05 (two-sided), J = 200 clusters per pairwise contrast, and P = 0.5 (even split between compared arms). The MDE ranges reported below apply to each of the three pairwise contrasts of interest: Arm 1 vs Control, Arm 2 vs Control, and Arm 1 vs Arm 2. We assume n = 65 pupils per school taking the assessment, consistent with enrollment records. Conservatively assuming an ICC of 0.19, R^2(between) = 0.3 and R^2(within) = 0.2, our MDE is 0.15 SD. Based on pupil data from previous experiments (ICC = 0.11, R^2(between) = 0.3, R^2(within) = 0.25), our MDE is 0.11 SD. Other configurations fall in between, so the MDE on pupil learning outcomes ranges from 0.11 to 0.15 SD.
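As a rough check, the MDE for a two-level clustered design can be computed with the standard formula: a multiplier times the square root of the sum of a between-cluster and a within-cluster variance term. The sketch below is an approximation under assumptions, not the authors' calculation. The function name is hypothetical, and the multiplier uses normal quantiles (about 2.8) rather than a t-distribution with cluster-level degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist

def mde(j, n, icc, r2_between, r2_within, p=0.5, alpha=0.05, power=0.80):
    """Minimum detectable effect (in SD units) for a cluster-randomized design.

    j: clusters in the contrast, n: pupils per cluster, p: share of clusters treated.
    """
    z = NormalDist()
    # Normal-approximation multiplier for a two-sided test (an assumption).
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    between = icc * (1 - r2_between) / (p * (1 - p) * j)
    within = (1 - icc) * (1 - r2_within) / (p * (1 - p) * j * n)
    return multiplier * sqrt(between + within)

# The two parameter sets stated in the registration.
conservative = mde(j=200, n=65, icc=0.19, r2_between=0.3, r2_within=0.2)
prior_data = mde(j=200, n=65, icc=0.11, r2_between=0.3, r2_within=0.25)
print(f"conservative: {conservative:.3f} SD, prior data: {prior_data:.3f} SD")
```

Under these assumptions both scenarios land close to the registered range of 0.11 to 0.15 SD. Small differences in the last digit can arise from the choice of multiplier or from rounding.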

Supporting Documents and Materials

IRB

Institutional Review Boards (IRBs)

IRB Name

Ethikkommission der Wirtschafts- und Sozialwissenschaftliche Fakultät

IRB Approval Date

2025-06-23

IRB Approval Number

262025

Analysis Plan