Teacher observation, teacher effectiveness, and pupil attainment: An RCT in England’s secondary schools
Last registered on February 01, 2016


Trial Information
General Information
Teacher observation, teacher effectiveness, and pupil attainment: An RCT in England’s secondary schools
Initial registration date
February 01, 2016
Last updated
February 01, 2016 2:25 PM EST
Primary Investigator
University of Bristol
Other Primary Investigator(s)
PI Affiliation
PI Affiliation
Harvard Graduate School of Education
Additional Trial Information
Ongoing
Start date
End date
Secondary IDs
We study a program of informal, peer performance evaluation among classroom teachers who work in the same school. Maths and English teachers who teach these subjects in years 10 and 11 in “treatment” schools, all of which are secondary schools in areas of high deprivation in England, are asked to implement a program of peer observation for a period of two school years. The experiment is designed to estimate the effect of peer observation on teacher performance in the classroom, as measured by teachers’ contributions to student achievement and attainment, and by teachers’ observed teaching practices. The experiment is also designed to estimate different effects for teachers who are the “observer” and teachers who are the “observee”, and the influence of the frequency of observations. Observer teachers conduct an observation of their peer observee teacher and provide an evaluation; observers assign scores on specific skills using a structured rubric. However, the program is not a formal evaluation; there are no formal incentives or stakes attached to the scores.
External Link(s)
Registration Citation
Burgess, Simon, Shenila Rawal and Eric Taylor. 2016. "Teacher observation, teacher effectiveness, and pupil attainment: An RCT in England’s secondary schools." AEA RCT Registry. February 01. https://doi.org/10.1257/rct.1010-1.0.
Former Citation
Burgess, Simon et al. 2016. "Teacher observation, teacher effectiveness, and pupil attainment: An RCT in England’s secondary schools." AEA RCT Registry. February 01. http://www.socialscienceregistry.org/trials/1010/history/6709.
Experimental Details
Intervention Start Date
Intervention End Date
Primary Outcomes
Primary Outcomes (end points)
Teacher performance, as measured by teachers’ contributions to student achievement and teachers’ observed teaching practices in the classroom.
Primary Outcomes (explanation)
Student achievement is measured in both control and treatment schools by: (a) GCSE exams in English and Maths, typically completed at the end of year 11; and (b) a bespoke exam at the end of year 10 which will be created, administered, and marked by NFER specifically for the purposes of this study.

Teaching practices are measured by classroom observation scores recorded by peer observers using the study rubric. (These scores are available only in schools assigned to the treatment, that is, schools implementing the peer observation program.)
Secondary Outcomes
Secondary Outcomes (end points)
Secondary Outcomes (explanation)
Experimental Design
Experimental Design
Of the 82 schools in our study sample, 41 were randomly assigned to the treatment condition and 41 to a business-as-usual control. Random assignment occurred within blocks formed by two school-level measures: racial/ethnic composition and student test score growth attributable to the school (school value-added). As detailed below, additional randomization of teacher roles and the frequency of observations occurred within the 41 treatment schools.
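
The blocked school-level assignment described above can be illustrated with a minimal sketch. It is not the study's actual randomization code: the function name `assign_schools`, the block identifiers, the seed, and the rule for schools left over in odd-sized blocks are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def assign_schools(schools, seed=0):
    """Assign schools to 'treatment' or 'control' within blocks.

    `schools` is a list of (school_id, block_id) pairs. The block_id is a
    hypothetical label standing in for the cross of the two blocking
    measures (racial/ethnic composition and school value-added).
    """
    rng = random.Random(seed)

    # Group schools by block.
    by_block = defaultdict(list)
    for school_id, block_id in schools:
        by_block[block_id].append(school_id)

    assignment = {}
    leftovers = []
    for block_id in sorted(by_block):
        members = sorted(by_block[block_id])
        rng.shuffle(members)
        half = len(members) // 2
        for school_id in members[:half]:
            assignment[school_id] = "treatment"
        for school_id in members[half:2 * half]:
            assignment[school_id] = "control"
        # An odd-sized block leaves one school unassigned at this stage.
        leftovers.extend(members[2 * half:])

    # Assumed rule (not from the registry): pool the leftover schools,
    # shuffle them, and alternate arms so the overall split stays balanced.
    rng.shuffle(leftovers)
    for i, school_id in enumerate(leftovers):
        assignment[school_id] = "treatment" if i % 2 == 0 else "control"
    return assignment


if __name__ == "__main__":
    # 82 hypothetical schools spread over 4 hypothetical blocks.
    demo = [(f"S{i:02d}", i % 4) for i in range(82)]
    arms = assign_schools(demo, seed=1)
    print(sum(arm == "treatment" for arm in arms.values()), "treatment schools")
```

Fixing the seed makes the assignment reproducible, which is one common way to document a computer-based randomization.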

In treatment schools, English and Maths teachers are asked to implement a program of peer observation and performance description for a duration of two school years. However, the program is not a formal evaluation. No explicit incentives or penalties are attached to teachers’ scores.

Within treatment schools, English and Maths teachers were randomly assigned to one of three role conditions: (i) teachers were assigned to the “observer” role with probability 1/3, (ii) to the “observee” role with probability 1/3, or (iii) to participate in both “observer” and “observee” roles with probability 1/3. Assignment to role was within school-by-subject blocks, where subject is either Maths or English.
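
As a rough illustration of role assignment within school-by-subject blocks, the sketch below shuffles the teachers in each block and deals out the three roles in turn. The function name `assign_roles`, the input layout, and the dealing rule are assumptions; the registry states only the 1/3 probabilities and the blocking.

```python
import random

ROLES = ("observer", "observee", "both")


def assign_roles(teachers_by_block, seed=0):
    """Assign each teacher one of three roles within school-by-subject blocks.

    `teachers_by_block` maps a (school_id, subject) pair to a list of
    teacher ids. Both the ids and this layout are hypothetical.
    """
    rng = random.Random(seed)
    roles = {}
    for block in sorted(teachers_by_block):
        teachers = sorted(teachers_by_block[block])
        rng.shuffle(teachers)
        # Shuffle the role order too, so that when the block size is not a
        # multiple of three, no role is systematically favoured.
        order = list(ROLES)
        rng.shuffle(order)
        for i, teacher_id in enumerate(teachers):
            roles[teacher_id] = order[i % len(order)]
    return roles


if __name__ == "__main__":
    demo = {
        ("S01", "Maths"): [f"M{i}" for i in range(7)],
        ("S01", "English"): [f"E{i}" for i in range(6)],
    }
    for teacher_id, role in sorted(assign_roles(demo, seed=3).items()):
        print(teacher_id, role)
```

Dealing roles from a shuffled list gives each teacher a 1/3 chance of each role while keeping role counts within a block as even as possible. Independent draws per teacher would also satisfy the stated probabilities but could produce unbalanced blocks.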

Throughout the school year, “observers” periodically spend time watching “observees” teach in the observees’ classes. Each observation lasts 20-30 minutes. During each visit, observers are asked to pay particular attention to the observee’s performance in several specific teaching skills, and to score those skills using an evaluation rubric and a tablet computer program. Each skill is first scored as “Ineffective (1-3)”, “Basic (4-6)”, “Effective (7-9)”, or “Highly Effective (10-12)”. The rubric provides a concrete description of what an observer should see happening in the classroom to warrant each of these categories. After choosing one of the four categories, observers choose a numeric score from within that category, creating a final 12-point scale. Observers and observees were encouraged to meet and discuss the observation results; however, the form and nature of these feedback sessions were not prescribed. Note that, given the assignment of roles, some teachers will be the observer in one pair interaction and the observee in a different interaction.
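
The two-step scoring scale can be expressed compactly. The sketch below maps a numeric score to its rubric category using the ranges quoted above; the names `BANDS` and `band_for_score` are hypothetical and do not describe the study's tablet program.

```python
# Category name, lowest score, highest score (ranges as stated in the rubric).
BANDS = (
    ("Ineffective", 1, 3),
    ("Basic", 4, 6),
    ("Effective", 7, 9),
    ("Highly Effective", 10, 12),
)


def band_for_score(score):
    """Return the rubric category for an integer score from 1 to 12."""
    if isinstance(score, bool) or not isinstance(score, int):
        raise ValueError("score must be an integer")
    for name, low, high in BANDS:
        if low <= score <= high:
            return name
    raise ValueError("score must be between 1 and 12")


if __name__ == "__main__":
    for score in (2, 6, 7, 12):
        print(score, band_for_score(score))
```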

In treatment schools, English and Maths departments were randomly assigned to a “high frequency” observation condition or to a “low frequency” condition. In half of treatment schools the English department was assigned to “high frequency” and the Maths department to “low frequency.” In half of schools the department assignments were reversed. In the high-frequency condition, observees are required to be observed 12 times per year. In the low-frequency condition, the observees are to be observed 6 times per year.
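
A minimal sketch of the department-level frequency assignment follows. The function name `assign_frequency` and the school ids are hypothetical. Because 41 is odd, this sketch splits schools 20/21 between the two configurations; how the study handled the odd school is not stated in the registry.

```python
import random

# Required observations per observee per year in each condition.
OBSERVATIONS_PER_YEAR = {"high": 12, "low": 6}


def assign_frequency(school_ids, seed=0):
    """Give each treatment school one high- and one low-frequency department."""
    rng = random.Random(seed)
    schools = sorted(school_ids)
    rng.shuffle(schools)
    cut = len(schools) // 2  # assumed handling of an odd number of schools

    conditions = {}
    for i, school_id in enumerate(schools):
        english_high = i < cut
        conditions[(school_id, "English")] = "high" if english_high else "low"
        conditions[(school_id, "Maths")] = "low" if english_high else "high"
    return conditions


if __name__ == "__main__":
    demo = assign_frequency([f"S{i:02d}" for i in range(41)], seed=2)
    n_high = sum(c == "high" for c in demo.values())
    print(n_high, "high-frequency departments of", len(demo))
```

Whatever the split between configurations, every school contributes exactly one department to each condition, which yields 41 high-frequency and 41 low-frequency departments.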

These design features create three key “treatment effect” estimates of interest. The first is the broad contrast in student achievement outcomes between (i) peer observation treatment schools and (ii) business-as-usual control schools. The second is the set of contrasts in teacher performance, as measured by student test scores, between (i) teachers in the observer role; (ii) teachers in the observee role; (iii) teachers in treatment schools with no role, but who might have gained through spillovers; and (iv) teachers in control schools with no role and no exposure to treatment. The third is the contrast in teacher performance between teachers who observed or were observed with (i) high frequency or (ii) low frequency. The design also permits estimates of the interactions between these three main contrasts.
Experimental Design Details
Randomization Method
In office by a computer.
Randomization Unit
Please see description of Experimental Design.
Was the treatment clustered?
Experiment Characteristics
Sample size: planned number of clusters
The extensive margin of treatment is assigned at the school level. There are 82 schools (41 treatment, 41 control). Since all other clusters are nested within school, this is the level of clustering most relevant for statistical inference.

The frequency of peer observation is assigned at the department level within each treatment school. There are 82 clusters in this case as well: 41 treatment schools × 2 departments per school. The observer/observee conditions are assigned at the teacher level within each department and school. There are approximately 600 teachers: on average 7 teachers per department × 2 departments × 41 schools.
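
The cluster counts above follow from simple arithmetic, reproduced here as a quick check. The variable names are illustrative; the figure of 7 teachers per department is the stated average, so the product is only an approximation to the "approximately 600" quoted.

```python
TREATMENT_SCHOOLS = 41
CONTROL_SCHOOLS = 41
DEPARTMENTS_PER_SCHOOL = 2          # English and Maths
AVG_TEACHERS_PER_DEPARTMENT = 7     # stated average, not an exact count

school_clusters = TREATMENT_SCHOOLS + CONTROL_SCHOOLS
department_clusters = TREATMENT_SCHOOLS * DEPARTMENTS_PER_SCHOOL
treatment_teachers = department_clusters * AVG_TEACHERS_PER_DEPARTMENT

print("school-level clusters:", school_clusters)
print("department-level clusters:", department_clusters)
print("approximate teachers in treatment schools:", treatment_teachers)
```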
Sample size: planned number of observations
We observe outcome measures at two levels: teacher and student. The study sample includes approximately 25,000 students and 1,200 teachers.
Sample size (or number of clusters) by treatment arms
41 treatment schools and 41 control schools. Within treatment schools, 41 "high frequency" departments and 41 "low frequency" departments. Within treatment school departments, approximately 200 observers, 200 observees, and 200 taking both observer and observee roles; and in control schools approximately 600 teachers without a role.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB Name
Research Ethics Committee, School of Economics, Finance and Management, University of Bristol
IRB Approval Date
IRB Approval Number
Post Trial Information
Study Withdrawal
Is the intervention completed?
Is data collection complete?
Data Publication
Data Publication
Is public data available?
Program Files
Program Files
Reports, Papers & Other Materials
Relevant Paper(s)