Experimental Design
Of the 82 schools in our study sample, 41 were randomly assigned to the treatment condition and 41 to a business-as-usual control. Random assignment occurred within blocks formed by two school-level measures: racial/ethnic composition and student test score growth attributable to the school (school value-added). As detailed below, additional randomization of teacher roles and of the frequency of observations occurred within the 41 treatment schools.
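As an illustration only, within-block assignment of schools could be sketched as follows. The block labels, school identifiers, seed, and the function name `assign_schools` are hypothetical, and the sketch assumes every block contains an even number of schools. The text above does not state block sizes, so this is not a description of the procedure actually used.

```python
import random
from collections import defaultdict


def assign_schools(block_of, seed=0):
    """Assign half of each block to treatment and half to control.

    block_of maps a school id to its block label (hypothetical inputs).
    Assumption: every block has an even number of schools.
    """
    rng = random.Random(seed)

    # Group schools by block.
    blocks = defaultdict(list)
    for school, block in block_of.items():
        blocks[block].append(school)

    assignment = {}
    for block in sorted(blocks):
        members = sorted(blocks[block])
        if len(members) % 2 != 0:
            raise ValueError(f"block {block!r} has an odd number of schools")
        # Shuffle within the block, then split the shuffled list in two.
        rng.shuffle(members)
        half = len(members) // 2
        for school in members[:half]:
            assignment[school] = "treatment"
        for school in members[half:]:
            assignment[school] = "control"
    return assignment


# Illustrative data only: 82 hypothetical schools in 41 blocks of two.
example_blocks = {f"school_{i:02d}": i // 2 for i in range(82)}
example_assignment = assign_schools(example_blocks, seed=1)
```

Because the split happens inside each block, treatment and control groups are balanced on the blocking measures by construction.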
In treatment schools, English and Maths teachers are asked to implement a program of peer observation and performance description for two school years. The program is not, however, a formal evaluation: no explicit incentives or penalties are attached to teachers’ scores.
Within treatment schools, English and Maths teachers were randomly assigned to one of three role conditions, each with probability 1/3: (i) the “observer” role, (ii) the “observee” role, or (iii) both the “observer” and “observee” roles. Assignment to role was within school-by-subject blocks, where subject is either Maths or English.
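A minimal sketch of role assignment within school-by-subject blocks is given below. It assumes that roles are balanced as evenly as possible within each department, which is one way of giving each teacher a 1/3 chance of each role; the text above does not say whether draws were balanced or independent. The department keys, teacher identifiers, and function names are hypothetical.

```python
import random

ROLES = ("observer", "observee", "both")


def assign_roles(teachers, rng):
    """Assign roles within one school-by-subject block.

    Teachers are shuffled and the role order is shuffled, then roles are
    dealt in turn. Role counts differ by at most one, and each teacher has
    the same 1/3 chance of each role.
    """
    teachers = list(teachers)
    rng.shuffle(teachers)
    order = list(ROLES)
    rng.shuffle(order)
    return {teacher: order[i % 3] for i, teacher in enumerate(teachers)}


def assign_all(departments, seed=0):
    """Apply assign_roles to every (school, subject) block."""
    rng = random.Random(seed)
    return {
        key: assign_roles(members, rng)
        for key, members in sorted(departments.items())
    }


# Illustrative data only: one hypothetical school with two departments.
example_departments = {
    ("school_A", "English"): [f"eng_{i}" for i in range(7)],
    ("school_A", "Maths"): [f"mat_{i}" for i in range(6)],
}
example_roles = assign_all(example_departments, seed=3)
```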
Throughout the school year, “observers” periodically spend time watching “observees” teach in the observees’ classes. Each observation lasts 20-30 minutes. During each visit, observers are asked to pay particular attention to the observee’s performance in several specific teaching skills, and to score those skills using an evaluation rubric and a tablet computer program. Each skill is first scored as “Ineffective (1-3)”, “Basic (4-6)”, “Effective (7-9)” or “Highly Effective (10-12)”. The rubric provides a concrete description of what an observer should see happening in the classroom to warrant allocation to each of these categories. After choosing one of the four categories, observers can choose a numeric score from within that category, creating a final 12-point score scale. Observers and observees were encouraged to meet and discuss the observation results; however, the form and nature of these feedback sessions were not prescribed. Note that, given the assignment of roles, some teachers will be the observer in one pair interaction and the observee in a different interaction.
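The two-step scoring described above (category first, then a number within that category) can be represented as a simple lookup. The sketch below is illustrative: the band boundaries come from the rubric labels in the text, while the names `BANDS`, `category_for`, and `is_valid` are hypothetical and do not describe the tablet program itself.

```python
# Score bands taken from the rubric labels in the text.
BANDS = {
    "Ineffective": (1, 3),
    "Basic": (4, 6),
    "Effective": (7, 9),
    "Highly Effective": (10, 12),
}


def category_for(score):
    """Return the rubric category that contains a 1-12 score."""
    for name, (low, high) in BANDS.items():
        if low <= score <= high:
            return name
    raise ValueError(f"score {score!r} is outside the 1-12 scale")


def is_valid(category, score):
    """Check that a numeric score falls inside the chosen category."""
    low, high = BANDS[category]
    return low <= score <= high
```

For example, a skill first judged “Basic” can receive a 4, 5, or 6, so `is_valid("Basic", 5)` holds while `is_valid("Basic", 7)` does not.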
In treatment schools, English and Maths departments were randomly assigned to a “high frequency” observation condition or to a “low frequency” condition. In half of treatment schools, the English department was assigned to “high frequency” and the Maths department to “low frequency.” In the other half, the department assignments were reversed. In the high-frequency condition, observees are to be observed 12 times per year. In the low-frequency condition, observees are to be observed 6 times per year.
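The crossed department assignment can be sketched as below. With 41 treatment schools an exact half split is not possible, so this sketch assumes a 20/21 split with the larger group receiving the Maths high-frequency arrangement; that choice, the school identifiers, the seed, and the function name `assign_frequency` are assumptions for illustration, not details from the text.

```python
import random

# Required observations per observee per year, from the text.
OBSERVATIONS_PER_YEAR = {"high": 12, "low": 6}


def assign_frequency(treatment_schools, seed=0):
    """Give each school one high-frequency and one low-frequency department.

    The first half of the shuffled list gets English=high and Maths=low;
    the remainder gets the reverse. With an odd number of schools the
    second group is larger by one (an assumption of this sketch).
    """
    rng = random.Random(seed)
    schools = sorted(treatment_schools)
    rng.shuffle(schools)
    half = len(schools) // 2

    plan = {}
    for i, school in enumerate(schools):
        if i < half:
            plan[school] = {"English": "high", "Maths": "low"}
        else:
            plan[school] = {"English": "low", "Maths": "high"}
    return plan


# Illustrative data only: 41 hypothetical treatment schools.
example_plan = assign_frequency([f"school_{i:02d}" for i in range(41)], seed=2)
```

Because every school contains one department in each condition, the frequency contrast is made within schools rather than between them.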
These design features create three key “treatment effect” estimates of interest. The first is the broad contrast in student achievement outcomes between (i) peer observation treatment schools and (ii) business-as-usual control schools. The second is the set of contrasts in teacher performance, as measured by student test scores, among (i) teachers in the observer role; (ii) teachers in the observee role; (iii) teachers in treatment schools with no role, but who might have gained through spillovers; and (iv) teachers in control schools with no role and no exposure to treatment. The third is the contrast in teacher performance between teachers who observed or were observed with (i) high frequency or (ii) low frequency. The design also permits estimates of the interactions among these three main contrasts.