
Teacher observation, teacher effectiveness, and pupil attainment: An RCT in England’s secondary schools

Last registered on February 01, 2016


Trial Information

General Information

Teacher observation, teacher effectiveness, and pupil attainment: An RCT in England’s secondary schools
Initial registration date
February 01, 2016


First published
February 01, 2016, 2:25 PM EST



Primary Investigator

University of Bristol

Other Primary Investigator(s)

PI Affiliation
Harvard Graduate School of Education
PI Affiliation

Additional Trial Information

Ongoing
Start date
End date
Secondary IDs
We study a program of informal, peer performance evaluation among classroom teachers who work in the same school. Maths and English teachers who teach these subjects in years 10 and 11 in “treatment” schools, all of which are secondary schools in areas of high deprivation in England, are asked to implement a program of peer observation for a period of two school years. The experiment is designed to estimate the effect of peer observation on teacher performance in the classroom, as measured by teachers’ contributions to student achievement and attainment, and by teachers’ observed teaching practices. The experiment is also designed to estimate different effects for teachers who are the “observer” and teachers who are the “observee”, and the influence of the frequency of observations. Observer teachers conduct an observation of their peer observee teacher and provide an evaluation; observers assign scores on specific skills using a structured rubric. However, the program is not a formal evaluation; there are no formal incentives or stakes attached to the scores.
External Link(s)

Registration Citation

Burgess, Simon, Shenila Rawal and Eric Taylor. 2016. "Teacher observation, teacher effectiveness, and pupil attainment: An RCT in England’s secondary schools." AEA RCT Registry. February 01.
Experimental Details


Intervention Start Date
Intervention End Date

Primary Outcomes

Primary Outcomes (end points)
Teacher performance, as measured by teachers’ contributions to student achievement and teachers’ observed teaching practices in the classroom.
Primary Outcomes (explanation)
Student achievement is measured in both control and treatment schools by: (a) GCSE exams in English and Maths, typically completed at the end of year 11; and (b) a bespoke exam at the end of year 10 which will be created, administered, and marked by NFER specifically for the purposes of this study.

Teaching practices are measured by classroom observation scores recorded by peer observers using the study rubric. (These scores are only available in schools assigned to the treatment, that is, schools implementing the peer observation program.)

Secondary Outcomes

Secondary Outcomes (end points)
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
Of the 82 schools in our study sample, 41 were randomly assigned to the treatment condition and 41 to a business-as-usual control. Random assignment occurred within blocks formed by two school-level measures: racial/ethnic composition and student test score growth attributable to the school (school value-added). As detailed below, additional randomization of teacher roles and the frequency of observations occurred within the 41 treatment schools.

In treatment schools, English and Maths teachers are asked to implement a program of peer observation and performance description for a duration of two school years. However, the program is not a formal evaluation. No explicit incentives or penalties are attached to teachers’ scores.

Within treatment schools, English and Maths teachers were randomly assigned to one of three role conditions: (i) teachers were assigned to the “observer” role with probability 1/3, (ii) to the “observee” role with probability 1/3, or (iii) to participate in both “observer” and “observee” roles with probability 1/3. Assignment to role was within school-by-subject blocks, where subject is either Maths or English.
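The role assignment described above can be illustrated with a minimal Python sketch. It is not the study's actual procedure: the registry does not say whether roles were balanced within blocks or drawn independently, so the balanced rotation below, the function name `assign_roles`, and the `(teacher_id, school, subject)` input format are all assumptions made for illustration.

```python
import random
from collections import defaultdict

ROLES = ("observer", "observee", "both")

def assign_roles(teachers, seed=0):
    """Assign each teacher one of three roles within school-by-subject blocks.

    teachers: iterable of (teacher_id, school, subject) tuples (hypothetical format).
    Returns a dict mapping teacher_id -> role.
    Assumption: roles are balanced within each block, so counts differ by at most one.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for teacher_id, school, subject in teachers:
        blocks[(school, subject)].append(teacher_id)

    assignment = {}
    for key in sorted(blocks):
        members = blocks[key]
        rng.shuffle(members)
        # A random starting role keeps each teacher's marginal probability at 1/3
        # even when the block size is not a multiple of three.
        offset = rng.randrange(len(ROLES))
        for i, teacher_id in enumerate(members):
            assignment[teacher_id] = ROLES[(i + offset) % len(ROLES)]
    return assignment
```

Under this scheme each teacher has probability 1/3 of landing in each role, and a block of six teachers always contains two observers, two observees, and two teachers holding both roles.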

Throughout the school year, “observers” periodically spend time watching “observees” teach in the observees’ classes. Each observation lasts 20-30 minutes. During each visit, observers are asked to pay particular attention to the observee’s performance in several specific teaching skills, and to score those skills using an evaluation rubric and a tablet computer program. Each skill is first scored as being “Ineffective (1-3)”, “Basic (4-6)”, “Effective (7-9)” or “Highly Effective (10-12)”. The rubric provides a concrete description of what an observer should see happening in the classroom to warrant allocation to each of these categories. After choosing one of these four categories, observers can choose a numeric score from within that category, creating a final 12-point score scale. Observers and observees were encouraged to meet and discuss the observation results; however, the form and nature of these feedback sessions were not prescribed. Note that, given the assignment of roles, some teachers will be the observer in one pair interaction and the observee in a different interaction.
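The two-step scoring scale (category first, then a numeric score within it) can be sketched in Python as follows. The category names and score ranges come from the description above; the function names and the validation behaviour are illustrative assumptions, not a description of the study's tablet program.

```python
# Category names and ranges as listed in the rubric description.
CATEGORY_RANGES = {
    "Ineffective": (1, 3),
    "Basic": (4, 6),
    "Effective": (7, 9),
    "Highly Effective": (10, 12),
}

def category_for_score(score):
    """Return the rubric category that contains a score on the 12-point scale."""
    for name, (low, high) in CATEGORY_RANGES.items():
        if low <= score <= high:
            return name
    raise ValueError(f"score must be between 1 and 12, got {score}")

def record_score(category, score):
    """Check that a numeric score lies within the chosen category (hypothetical helper)."""
    low, high = CATEGORY_RANGES[category]
    if not low <= score <= high:
        raise ValueError(f"{category} only allows scores {low}-{high}, got {score}")
    return score
```

For example, `category_for_score(8)` returns `"Effective"`, while `record_score("Basic", 7)` raises a `ValueError` because 7 falls outside the 4-6 range.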

In treatment schools, English and Maths departments were randomly assigned to a “high frequency” observation condition or to a “low frequency” condition. In half of treatment schools the English department was assigned to “high frequency” and the Maths department to “low frequency.” In half of schools the department assignments were reversed. In the high-frequency condition, observees are required to be observed 12 times per year. In the low-frequency condition, the observees are to be observed 6 times per year.
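A minimal Python sketch of the department-level frequency assignment follows. It assumes a simple shuffle-and-split across treatment schools; the registry does not describe the exact mechanism, and with 41 schools an even split is impossible, so which arrangement receives the extra school is also an assumption. The names `assign_frequency` and `OBSERVATIONS_PER_YEAR` are hypothetical.

```python
import random

# Required observations per observee per year in each condition (from the design text).
OBSERVATIONS_PER_YEAR = {"high": 12, "low": 6}

def assign_frequency(school_ids, seed=0):
    """Assign one department per school to "high" and the other to "low".

    Assumption: schools are shuffled and the first half get English = high,
    Maths = low; the remaining schools get the reverse.
    """
    rng = random.Random(seed)
    schools = sorted(school_ids)
    rng.shuffle(schools)
    half = len(schools) // 2
    assignment = {}
    for i, school in enumerate(schools):
        english_high = i < half
        assignment[school] = {
            "English": "high" if english_high else "low",
            "Maths": "low" if english_high else "high",
        }
    return assignment
```

Because the two departments in a school always receive opposite conditions, the frequency contrast is made within schools rather than across them.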

These design features create three key “treatment effect” estimates of interest. First, the broad contrast of student achievement outcomes between the (i) peer observation treatment schools, and (ii) business-as-usual control schools. Second, the contrasts in teacher performance—as measured by student test scores—between (i) teachers in the observer role; (ii) teachers in the observee role; (iii) teachers in treatment schools with no role, but who might have gained through spillovers; and (iv) teachers in control schools with no role and no exposure to treatment. Third, the contrast in teacher performance between teachers who observed or were observed with (i) high frequency, or (ii) low frequency. Moreover, the design also permits estimates of the interactions between these three main contrasts.
Experimental Design Details
Randomization Method
In office by a computer.
Randomization Unit
Please see description of Experimental Design.
Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters
The extensive margin of treatment is assigned at the school level. There are 82 schools (41 treatment, 41 control). Since all other clusters are nested within school, this is the level of clustering most relevant for statistical inference.

The frequency of peer observation is assigned at the department level within each treatment school. There are 82 clusters in this case as well: 41 treatment schools × 2 departments per school. The observer/observee conditions are assigned at the teacher level within each department and school. There are approximately 600 teachers: on average 7 teachers per department × 2 departments × 41 schools.
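The cluster counts above can be checked with a short Python calculation. The variable names are illustrative; the inputs are the figures stated in the text.

```python
treatment_schools = 41
control_schools = 41
departments_per_school = 2          # English and Maths
avg_teachers_per_department = 7     # stated average, not an exact count

# School-level clusters (treatment vs. control assignment).
school_clusters = treatment_schools + control_schools

# Department-level clusters (high vs. low frequency, treatment schools only).
department_clusters = treatment_schools * departments_per_school

# Teachers in treatment schools (observer/observee role assignment).
treatment_teachers = (
    avg_teachers_per_department * departments_per_school * treatment_schools
)

print(school_clusters, department_clusters, treatment_teachers)  # 82 82 574
```

The product gives 574 teachers, which the registration rounds to approximately 600.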
Sample size: planned number of observations
We observe outcome measures at two levels: teacher and student. The study sample includes approximately 25,000 students and 1,200 teachers.
Sample size (or number of clusters) by treatment arms
41 treatment schools and 41 control schools. Within treatment schools, 41 "high frequency" departments and 41 "low frequency" departments. Within treatment school departments, approximately 200 observers, 200 observees, and 200 taking both observer and observee roles; and in control schools approximately 600 teachers without a role.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)

Institutional Review Boards (IRBs)

IRB Name
Research Ethics Committee, School of Economics, Finance and Management, University of Bristol
IRB Approval Date
IRB Approval Number


Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.


Is the intervention completed?
Data Collection Complete
Data Collection Completion Date
Final Sample Size: Number of Clusters (Unit of Randomization)
Was attrition correlated with treatment status?
Final Sample Size: Total Number of Observations
Final Sample Size (or Number of Clusters) by Treatment Arms
Data Publication

Data Publication

Is public data available?

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

This paper reports on a field experiment in 82 high schools trialing a low-cost intervention in schools’ operations: teachers working in the same school observed and scored each other’s teaching. Students in treatment schools scored 0.07 student standard deviations higher on math and English exams. Teachers were further randomly assigned to roles—observer and observee—and students of both types benefited, observers’ students perhaps more so. Doubling the number of observations produced no difference in student outcomes. Treatment effects were larger for otherwise low-performing teachers.
Teacher Peer Observation and Student Test Scores: Evidence from a Field Experiment in English Secondary Schools. Journal of Labor Economics, Volume 39, Number 4, October 2021.
We study teachers’ choices about how to allocate class time across different instructional activities, for example, lecturing, open discussion, or individual practice. Our data come from secondary schools in England, specifically classes preceding GCSE exams. Students score higher in math when their teacher devotes more class time to individual practice and assessment. In contrast, students score higher in English if there is more discussion and work with classmates. Class time allocation predicts test scores separate from the quality of the teacher's instruction during the activities. These results suggest opportunities to improve student achievement without changes in teachers’ skills.
Teachers’ use of class time and student achievement. Economics of Education Review, 2023, vol. 94, issue C.

Reports & Other Materials