Experimental Design
The starting point is the set of 849 government-run primary schools in seven districts in Tanzania: 158 schools in Iringa DC (Iringa region), 46 schools in Iringa MC (Iringa region), 178 schools in Kilosa DC (Morogoro region), 100 schools in Kondoa (Dodoma region), 155 schools in Mvomero DC (Morogoro region), 137 schools in Mpwapwa DC (Dodoma region) and 75 schools in Morogoro MC (Morogoro region).
Six of the seven districts in our sample host one teacher college each, while Kilosa DC hosts two. This yields a total of eight teacher colleges located in seven districts. For each teacher college, we select the schools closest to the college for participation in our experiment. To prevent cross-school spillovers, we account for the presence of so-called "twin schools": pairs of schools that share the same name with an A/B suffix and typically operate on the same or an adjacent compound. In such cases, we retain only one of the two twin schools in the sample. The selection cutoff per teacher college is 36 schools in Kilosa DC (which hosts two teacher colleges), Morogoro MC, and Iringa MC, and 39 schools in Iringa DC, Kondoa DC, Mpwapwa DC, and Mvomero DC. With four teacher colleges at each cutoff, this procedure yields the final sample of 4 × 36 + 4 × 39 = 300 schools.
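The selection rule for a single teacher college can be sketched as follows. This is a minimal illustration, not the procedure actually used: the input format (school name plus distance to the college), the function names, and the tie-breaking rule for twin schools (keep whichever twin is closer to the college) are assumptions, since the text does not state which twin is retained.

```python
import re

# Twin schools share a name and differ only by a trailing " A" / " B" suffix.
TWIN_SUFFIX = re.compile(r"\s+[AB]$")


def twin_key(name):
    """Return the shared base name for a twin school, or None if no A/B suffix."""
    cleaned = name.strip()
    match = TWIN_SUFFIX.search(cleaned)
    return cleaned[:match.start()].lower() if match else None


def select_schools(schools, cutoff):
    """Pick the `cutoff` schools closest to one teacher college.

    schools: list of (name, distance_km) tuples (hypothetical input format).
    Assumption: of two twin schools, the one closer to the college is kept.
    """
    selected = []
    seen_twins = set()
    # Walk through schools from nearest to farthest (name breaks distance ties).
    for name, _distance in sorted(schools, key=lambda s: (s[1], s[0])):
        key = twin_key(name)
        if key is not None:
            if key in seen_twins:
                continue  # the other twin is already in the sample
            seen_twins.add(key)
        selected.append(name)
        if len(selected) == cutoff:
            break
    return selected
```

Applied once per teacher college with a cutoff of 36 or 39, a rule of this kind would produce the 300-school sample. In practice, twin pairs would need to be confirmed against location data, because unrelated schools can share a name.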
In each of these 300 schools, the head teacher is nominated for participation in the SITT program. Furthermore, in consultation with the District Education Officer, each school nominates two regular teachers (one math and one science teacher) for participation in the program. When nominating the two regular teachers, schools are asked to adhere to the following selection criteria: (C1) Both nominees must continue teaching at their current school until the end of 2026. (C2) Both nominees should teach pupils of grade 7 in 2026. (C3) Neither nominee should teach pupils of grade 4 in 2026. All nominees (the head teacher and the two regular teachers) are invited to participate in a baseline event scheduled for early October 2025.
During the baseline events in October 2025, the nominated head teachers (300), the nominated math teachers (300) and the nominated science teachers (300) are introduced to the SITT program. The introduction covers the following points: (i) a general presentation of the SITT program, (ii) information on the scope and the goals of the evaluation study, and (iii) the criteria for participating in the study (see (C1) to (C3) of the previous paragraph) together with the participants' option to opt out of the study at any time. The presentation emphasizes that all participants must adhere to these criteria, regardless of whether their school is assigned to the control group or to one of the treatment groups. After the presentations, the teachers are asked to provide informed written consent to participate in the program and its evaluation. Teachers who provide written consent are asked to fill in a socio-demographic survey. In addition, we collect baseline survey data on job satisfaction and self-efficacy, and administer the pairwise choice experiment and the math/science assessment described in section 4.
The 300 participating schools are then randomly assigned to treatment Arm 1 (SITT Intensive, 100 schools), treatment Arm 2 (SITT Extensive, 100 schools) or the control group (not participating in the SITT program, 100 schools). Random assignment is implemented in Stata and stratified along two dimensions. The first dimension is the cluster of schools in the vicinity of each of the eight teacher colleges, which yields eight strata. The second dimension is the school-level average score across mathematics and science in the 2023 Standard Four National Assessment (SFNA), along which we split schools into above-median and below-median performers. Together, this yields a total of 16 strata, each containing between 18 and 20 schools. Teachers who do not consent to participate in the study are not part of the evaluation study.
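The stratified assignment can be illustrated with a short sketch. The actual randomization is implemented in Stata; the Python code below is only an illustration under several assumptions not stated in the text: the median split is taken within each teacher-college cluster, ties are broken by school ID, and schools left over when a stratum size is not divisible by three are pooled and distributed evenly across arms. The field names and the seed are hypothetical.

```python
import random
from collections import defaultdict

ARMS = ("arm1_intensive", "arm2_extensive", "control")


def assign_arms(schools, seed=12345):
    """Stratified random assignment of schools to three arms.

    schools: list of dicts with keys 'school_id', 'college', 'sfna_score'
             (hypothetical field names).
    Returns a dict mapping school_id -> arm.
    """
    rng = random.Random(seed)  # fixed seed makes the draw reproducible

    # Dimension 1: teacher-college cluster.
    by_college = defaultdict(list)
    for school in schools:
        by_college[school["college"]].append(school)

    # Dimension 2: below/above median SFNA score within each cluster (assumption).
    strata = defaultdict(list)
    for college, members in by_college.items():
        ranked = sorted(members, key=lambda s: (s["sfna_score"], s["school_id"]))
        half = len(ranked) // 2
        for rank, school in enumerate(ranked):
            half_label = "below" if rank < half else "above"
            strata[(college, half_label)].append(school["school_id"])

    assignment = {}
    leftovers = []
    for key in sorted(strata):
        ids = list(strata[key])
        rng.shuffle(ids)
        per_arm = len(ids) // len(ARMS)
        # Equal-sized blocks per arm within the stratum.
        for j, arm in enumerate(ARMS):
            for school_id in ids[j * per_arm:(j + 1) * per_arm]:
                assignment[school_id] = arm
        # Schools that do not fit into equal blocks are handled jointly below.
        leftovers.extend(ids[per_arm * len(ARMS):])

    # Assumption: pooled leftovers are shuffled and dealt out evenly across arms.
    rng.shuffle(leftovers)
    for i, school_id in enumerate(leftovers):
        assignment[school_id] = ARMS[i % len(ARMS)]
    return assignment
```

With four clusters of 36 schools and four clusters of 39 schools, this sketch produces strata of 18, 19 or 20 schools and exactly 100 schools per arm, because the 12 leftover schools divide evenly across the three arms.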
The main contribution of this project is to evaluate the relative effectiveness of two alternative delivery models for a previously validated teacher professional development program: direct NGO-led delivery (Arm 1, SITT-Intensive) and delivery through local government teacher colleges (Arm 2, SITT-Extensive). We compare the two models against each other and against a pure control group. The central estimand is the impact of each delivery model on pupil learning outcomes, measured through Tanzania's nationally standardized examinations. In addition, we evaluate impacts on intermediate outcomes (teachers' classroom practice, their beliefs about the usefulness and implementation effort of specific SITT techniques, and their job satisfaction and self-efficacy) and test whether beliefs elicited directly after training predict subsequent changes in classroom practice observed in the field.
Direct effect on targeted pupils. The participation criteria commit treated teachers to teaching Grade 7 mathematics and science throughout 2026. We therefore identify the direct effect of the intervention on pupils taught by trained teachers using the Primary School Leaving Examination (PSLE) administered by the National Examinations Council of Tanzania (NECTA) at the end of 2026. Because schools are randomly assigned to Arm 1, Arm 2, or the control group, systematic differences in post-intervention outcomes between the groups can be causally attributed to the intervention.
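One way to make this comparison concrete is an intention-to-treat regression with strata fixed effects. The specification below is a sketch under assumptions and is not taken from the text: the notation, the inclusion of strata fixed effects, and the clustering of standard errors at the school level are all illustrative choices.

```latex
% Illustrative ITT specification (assumed, not stated in the source):
%   Y_{is}        : PSLE outcome of pupil i in school s
%   Arm1_s, Arm2_s: indicators for assignment of school s to Arm 1 / Arm 2
%   \gamma_{k(s)} : fixed effect for the randomization stratum k of school s
%   \varepsilon_{is}: error term, clustered at the school level
Y_{is} = \alpha + \beta_1 \,\mathrm{Arm1}_s + \beta_2 \,\mathrm{Arm2}_s
         + \gamma_{k(s)} + \varepsilon_{is}
```

Under this sketch, \beta_1 and \beta_2 capture the effect of each delivery model relative to the control group, and the difference \beta_1 - \beta_2 captures the comparison of the two delivery models against each other.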
Spillovers to peer teachers' pupils. The same participation criteria specify that treated teachers do not teach Grade 4 in 2026. We therefore use the 2026 Standard Four National Assessment (SFNA) to measure cascading effects on pupils taught by untrained peer teachers within the same treatment schools. This design, which partially replicates the identification strategy of Büchel & Hauck (forthcoming), isolates within-school knowledge transmission from direct teacher effects.
Intermediate outcomes and mechanisms. To understand the causal mechanisms through which the two delivery models affect pupil learning, we conduct structured pre- and post-intervention classroom observations (adapted from the World Bank's Teach tool) together with tablet-based teacher surveys eliciting job satisfaction, self-efficacy, and beliefs about SITT techniques (see previous section). A particular methodological test is whether teacher beliefs elicited shortly after the training predict subsequent changes in classroom practice observed in later waves. The baseline survey wave, administered during the sensitization event and complemented by in-school surveys conducted prior to the intervention, provides a pre-intervention benchmark for all survey-based measures.