Experimental Design
First, researchers garnered support from the district superintendent and other key district personnel. The district then provided a list of 62 schools that were eligible for randomization into the teacher specialization experiment. Researchers removed twelve of these schools, either because they were part of another experiment or because their school model was antithetical to teacher specialization (e.g., Montessori). The final experimental sample consists of 50 schools (twenty-five treatment and twenty-five control), allocated using a matched-pair randomization procedure. Researchers first ranked all fifty schools by the sum of their mean reading and math test scores in the previous two years. They then designated each consecutive pair of schools in this ordered list as a “matched pair” and used a computer random number generator to assign one member of each pair to the treatment group and the other to the control group.
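The matched-pair procedure can be sketched in Python as follows. This is a minimal illustration, not the researchers’ code: the function name `matched_pair_assignment`, the input format (one pre-averaged mean reading score and mean math score per school), the made-up school IDs, and the fixed seed are all assumptions. Schools are sorted by the sum of the two means, consecutive schools form a pair, and one member of each pair is drawn at random for treatment. The pair index is returned too, since the analysis later uses matched-pair fixed effects.

```python
import random
from typing import Dict, Tuple

def matched_pair_assignment(
    baseline: Dict[str, Tuple[float, float]], seed: int = 0
) -> Dict[str, Tuple[str, int]]:
    """Return {school: (arm, pair_index)} from a matched-pair randomization.

    `baseline` maps a hypothetical school ID to its (mean reading, mean math)
    score; averaging over the two prior years is assumed to happen upstream.
    """
    if len(baseline) % 2 != 0:
        raise ValueError("an even number of schools is needed to form pairs")
    rng = random.Random(seed)  # seeded so the draw can be reproduced
    # Rank schools by the sum of their mean reading and math scores.
    ranked = sorted(baseline, key=lambda s: sum(baseline[s]), reverse=True)
    assignment: Dict[str, Tuple[str, int]] = {}
    # Consecutive schools in the ranked list form one matched pair.
    for pair_index, i in enumerate(range(0, len(ranked), 2)):
        pair = (ranked[i], ranked[i + 1])
        treated = rng.choice(pair)  # random draw within the pair
        for school in pair:
            arm = "treatment" if school == treated else "control"
            assignment[school] = (arm, pair_index)
    return assignment

# Usage with 50 made-up schools and made-up baseline means.
gen = random.Random(42)
schools = {f"S{k:02d}": (gen.gauss(0, 1), gen.gauss(0, 1)) for k in range(50)}
plan = matched_pair_assignment(schools, seed=2013)
print(sum(arm == "treatment" for arm, _ in plan.values()))  # 25
```

Randomizing within pairs guarantees exactly one treated school per pair, so the two arms are balanced on the ranking variable by construction.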
After treatment and control schools were chosen, treatment schools were notified that they would restructure their schedules so that teachers specialized in a subset of four subjects (math, science, social studies, and reading) based on each teacher’s strengths. Schools then submitted specialization plans along with a written justification for each plan. Principals assigned teachers to subjects based on their perception of each teacher’s comparative advantage. This perception was based on Teacher Value-Added (TVA) measures, classroom observations, or, for teachers new to the district or new to teaching, recommendations.
Because teachers were not allowed to switch grades or language of instruction, each school was limited to the teachers it already had in a given grade and language. Under these constraints, 2-4 teachers were available to teach a given grade and language group in over 80% of cases. Based on this availability, teams of teachers were designated within schools and grades. No teacher was permitted to teach both math and reading. In the modal case of a two-teacher team, one teacher taught math and science and the other taught reading and social studies. Otherwise, one teacher taught reading, one taught math, and the teachers shared social studies and science. Some teams had three teachers: one taught math, one taught reading, and the third taught science and social studies. Students had different teachers for different subjects but stayed with the same group of classmates for all subjects. After reviewing schools’ departmentalization plans, researchers recommended further changes in teaching assignments for 25 of 520 teachers. They recommended changes only where the principal’s decision seemed to contradict Houston’s calculated TVA for the 2011-2012 school year or the authors’ own TVA calculations for the 2012-2013 school year. Schools then sent updated departmentalization plans, adopting 14 of the recommended changes. In the remaining eleven cases, principals explained their choices and justified their decisions.
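For a two-teacher team, one way to make “comparative advantage” concrete is the rule below. It is an illustrative assumption, not the procedure principals or researchers actually used, and the teacher names and TVA values are hypothetical. The teacher whose math-minus-reading TVA gap is larger takes math and science, which maximizes the team’s combined TVA. The `contradicts_tva` helper, also hypothetical, mimics the kind of check that would flag a principal’s plan for a recommended change.

```python
from dataclasses import dataclass
from typing import Dict, List

Plan = Dict[str, List[str]]

@dataclass
class Teacher:
    name: str
    tva_math: float     # hypothetical value-added estimate in math
    tva_reading: float  # hypothetical value-added estimate in reading

def tva_implied_plan(a: Teacher, b: Teacher) -> Plan:
    """Split a two-teacher team by comparative advantage in math vs. reading.

    a.math + b.reading >= b.math + a.reading is equivalent to
    a.math - a.reading >= b.math - b.reading, so the teacher with the larger
    gap should take math to maximize the team's summed TVA.
    """
    if a.tva_math - a.tva_reading >= b.tva_math - b.tva_reading:
        math_t, reading_t = a, b
    else:
        math_t, reading_t = b, a
    return {math_t.name: ["math", "science"],
            reading_t.name: ["reading", "social studies"]}

def contradicts_tva(principal_plan: Plan, a: Teacher, b: Teacher) -> bool:
    """True when the principal's plan differs from the TVA-implied split."""
    return principal_plan != tva_implied_plan(a, b)

# Teacher A is stronger in both subjects, but B's relative edge is in math.
a = Teacher("Teacher A", tva_math=0.30, tva_reading=0.25)
b = Teacher("Teacher B", tva_math=0.10, tva_reading=-0.20)
principal_plan = {"Teacher A": ["math", "science"],
                  "Teacher B": ["reading", "social studies"]}
print(tva_implied_plan(a, b))
print(contradicts_tva(principal_plan, a, b))
```

The example shows why absolute advantage is the wrong criterion: giving math to the stronger teacher yields a team total of 0.10, while the comparative-advantage split yields 0.35.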
The descriptive differences between participating (treatment and control) and nonparticipating schools are consistent with the fact that the leadership of the Houston Independent School District (HISD) preferred that predominantly minority, low-achieving elementary schools enter the experimental sample. Relative to students in nonparticipating schools, students in experimental schools are less likely to be white or Asian, more likely to be black, more likely to be economically disadvantaged, more likely to be in a special education program, less likely to be gifted, and have lower pre-treatment test scores in math and reading. Thus, the estimated effects are likely more applicable to urban schools with high concentrations of minority students.
Researchers use administrative data provided by HISD. The main HISD data file contains student-level administrative data on approximately 200,000 students across the Houston metropolitan area in a given year. The data include information on student race, gender, free and reduced-price lunch status, behavior, and attendance for all students; state math and reading test scores for students in third through fifth grades; and Stanford 10 subject scores in math and reading for elementary school students. Behavior data record student behavioral incidents that result in a serious disciplinary action, such as a suspension or an expulsion. The HISD data span the 2010-2011 to 2014-2015 school years.
To supplement HISD’s administrative data, researchers administered a survey to all teachers in both treatment and control schools at the end of the 2013-2014 school year. The survey includes questions about lesson planning, relationships with students, and interactions with students’ parents and guardians. Teachers received a $20 Amazon.com gift card for completing the survey, and principals were told they would also receive a $40 Amazon.com gift card if teacher participation at their campus exceeded 80%. In total, 418 treatment teachers (an 80% response rate) and 343 control teachers (a 70% response rate) completed the survey.
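As a rough consistency check, the reported completions and response rates imply how many teachers were surveyed in each arm. Because the rates are rounded, the implied counts are approximate. The treatment figure is close to the 520 teachers covered by the departmentalization plans, although the text does not state that these are exactly the same group.

```python
# Completed surveys and (rounded) response rates reported in the text.
completed = {"treatment": 418, "control": 343}
response_rate = {"treatment": 0.80, "control": 0.70}

# Teachers surveyed = completions / response rate (approximate, since rates are rounded).
implied_surveyed = {arm: completed[arm] / response_rate[arm] for arm in completed}
for arm, n in implied_surveyed.items():
    print(f"{arm}: roughly {n:.1f} teachers surveyed")
```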
The state math and reading tests, developed by the Texas Education Agency (TEA), are statewide high-stakes exams administered each spring to students in grades three through eleven. All public school students are required to take the math and reading tests unless they are medically excused or have a severe disability. Researchers use test scores that are normalized across the school district. Fifth graders must score proficient or above on both tests to advance to the next grade, so fifth graders who do not pass are allowed to retake the tests approximately one month after the first administration. Researchers use a student’s first score unless it is missing.
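The two scoring rules in this paragraph can be sketched together. The text says only that scores are normalized across the district; this sketch assumes the common choice of standardizing to mean zero and standard deviation one within each year and grade. The record layout (a first attempt and an optional retake per student-year), the function names, and the scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (student_id, year, grade, first_attempt, retake)
Record = Tuple[str, int, int, Optional[float], Optional[float]]

def score_used(first: Optional[float], retake: Optional[float]) -> Optional[float]:
    """Use the first administration; fall back to the retake only if it is missing."""
    return first if first is not None else retake

def normalize(records: List[Record]) -> Dict[Tuple[str, int], float]:
    """Standardize scores to mean 0, SD 1 within each (year, grade) cell (assumed unit)."""
    cells = defaultdict(list)
    for student, year, grade, first, retake in records:
        score = score_used(first, retake)
        if score is not None:
            cells[(year, grade)].append((student, year, score))
    z: Dict[Tuple[str, int], float] = {}
    for members in cells.values():
        scores = [s for _, _, s in members]
        mu, sd = mean(scores), pstdev(scores)
        for student, year, score in members:
            z[(student, year)] = (score - mu) / sd if sd > 0 else 0.0
    return z

records = [
    ("s1", 2014, 5, 1500.0, None),
    ("s2", 2014, 5, None, 1600.0),    # first attempt missing: retake is used
    ("s3", 2014, 5, 1700.0, 1750.0),  # retake ignored: first attempt exists
]
z = normalize(records)
print(z)
```

Applying the first-score rule before normalizing keeps retakes, which only failing fifth graders take, from shifting the cell mean upward.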
The most important controls are baseline test scores, i.e., reading and math achievement test scores from the three years prior to the start of the experiment. Researchers also control for whether the test was taken in Spanish. Baseline scores are State of Texas Assessments of Academic Readiness (STAAR) scores for students in grades three through five in the baseline year and Stanford 10 scores for students in grades K-2 in the baseline year. Other individual-level controls include gender; a mutually exclusive and collectively exhaustive set of race indicator variables; and indicators for whether a student is eligible for free or reduced-price lunch or other forms of federal assistance, receives accommodations for limited English proficiency, receives special education accommodations, or is enrolled in the district’s gifted and talented program.
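A mutually exclusive and collectively exhaustive set of indicators means each student falls in exactly one category, so the full set sums to one and is perfectly collinear with a regression intercept; the usual fix is to omit one base category. The sketch below builds one student’s control vector under that convention. The category list, field names, base group, and values are hypothetical, and a single baseline score per subject stands in for the three prior years for brevity.

```python
from typing import Dict

RACE_CATEGORIES = ["white", "black", "hispanic", "asian", "other"]  # hypothetical

def race_indicators(race: str, base: str = "white") -> Dict[str, int]:
    """One 0/1 indicator per category except the omitted base group."""
    if race not in RACE_CATEGORIES:
        raise ValueError(f"unknown race category: {race}")
    return {f"race_{c}": int(race == c) for c in RACE_CATEGORIES if c != base}

def control_vector(student: Dict) -> Dict[str, float]:
    """Individual-level controls for one student (field names are hypothetical)."""
    row = {
        "baseline_math": student["baseline_math"],
        "baseline_reading": student["baseline_reading"],
        "tested_in_spanish": int(student["tested_in_spanish"]),
        "female": int(student["female"]),
        "econ_disadvantaged": int(student["econ_disadvantaged"]),
        "lep": int(student["lep"]),
        "special_ed": int(student["special_ed"]),
        "gifted": int(student["gifted"]),
    }
    row.update(race_indicators(student["race"]))
    return row

student = {"baseline_math": -0.4, "baseline_reading": -0.1,
           "tested_in_spanish": True, "female": False,
           "econ_disadvantaged": True, "lep": True, "special_ed": False,
           "gifted": False, "race": "hispanic"}
row = control_vector(student)
print(row)
```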
To estimate the causal impact of the treatment on outcomes, researchers estimate both intent-to-treat (ITT) effects and local average treatment effects (LATEs). The regressions also include grade-level fixed effects and matched-pair fixed effects. A student is considered treated (resp. control) if they attended a treatment (resp. control) school in the pre-treatment year and were not in an exit grade (e.g., 5th grade). All student mobility after treatment assignment is ignored, so each student keeps their original assignment regardless of where they later enroll.
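A minimal version of the ITT regression can be written with NumPy least squares: the outcome is regressed on the random-assignment indicator plus grade and matched-pair fixed effects (one dummy per level, dropping a base level). This is a sketch under assumptions: the data are simulated, the function names are hypothetical, and covariates and clustered standard errors are left out. Because the simulated outcome has no noise, the coefficient on assignment recovers the built-in effect of 0.05.

```python
import numpy as np

def dummies(labels) -> np.ndarray:
    """Fixed-effect indicators, one column per level except the first."""
    labels = np.asarray(labels)
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, 1:]).astype(float)

def itt_estimate(y, assigned, grade, pair, controls=None) -> float:
    """OLS coefficient on random assignment with grade and matched-pair FEs."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    columns = [np.ones((n, 1)),
               np.asarray(assigned, dtype=float).reshape(n, 1),
               dummies(grade), dummies(pair)]
    if controls is not None:
        columns.append(np.asarray(controls, dtype=float).reshape(n, -1))
    X = np.hstack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1])  # column 1 is the assignment indicator

# Simulated, noise-free data: an effect of 0.05 plus pair and grade effects.
rng = np.random.default_rng(0)
n = 400
pair = rng.integers(0, 25, n)
grade = rng.integers(3, 6, n)
assigned = rng.integers(0, 2, n)  # 1 = pre-treatment school was a treatment school
y = 0.05 * assigned + rng.normal(size=25)[pair] + 0.1 * grade
print(round(itt_estimate(y, assigned, grade, pair), 4))
```

The pair dummies absorb everything shared by the two schools in a pair, so the assignment coefficient is identified only from within-pair comparisons, mirroring the randomization design.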
LATE measures the average effect of attending a treatment school on students who attend one because their school was randomly selected for treatment. Researchers estimate two LATE parameters using two-stage least squares (2SLS), with random assignment serving as the instrumental variable in the first stage. The first LATE parameter uses an indicator variable, EVER, which equals one if a student attended a treatment school for at least one day in a given school year. The second uses a count variable, TREATED, which is the number of years a student was present at a treatment school. In both cases, the first-stage equations take the same form as the ITT specification but with EVER and TREATED, respectively, as the dependent variables; the second stage then regresses the outcome on the fitted value of EVER or TREATED. Researchers conduct the analysis on the 2014 test results, the 2015 test results, and the pooled 2014 and 2015 test results.
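The two-stage procedure can be sketched directly with NumPy. The first stage regresses the attendance measure (EVER here; TREATED would be passed the same way) on random assignment and the exogenous columns, and the second stage regresses the outcome on the fitted values and the same columns. This is an illustrative sketch on simulated data: the function name and data-generating process are assumptions, an intercept stands in for the fixed effects and controls, and the naive second-stage standard errors are not valid 2SLS standard errors, so only the point estimate is returned. With a single binary instrument and only an intercept, the estimate reduces to the Wald ratio cov(z, y) / cov(z, EVER). The unobserved term `u` is built to be uncorrelated with assignment in the sample but to drive attendance, so naive OLS is biased while 2SLS recovers the built-in effect of 0.2.

```python
import numpy as np

def two_stage_least_squares(y, endog, instrument, exog) -> float:
    """Just-identified 2SLS point estimate: one endogenous regressor, one instrument."""
    y = np.asarray(y, dtype=float)
    W = np.asarray(exog, dtype=float)
    # First stage: attendance measure on random assignment plus exogenous columns.
    Z = np.column_stack([np.asarray(instrument, dtype=float), W])
    first, *_ = np.linalg.lstsq(Z, np.asarray(endog, dtype=float), rcond=None)
    fitted = Z @ first
    # Second stage: outcome on fitted attendance plus the same exogenous columns.
    X = np.column_stack([fitted, W])
    second, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(second[0])

rng = np.random.default_rng(1)
n = 2000
z = rng.integers(0, 2, n).astype(float)  # random assignment (1 = treatment school)
u = rng.normal(size=n)                   # unobserved factor
# Demean u within each assignment group so it is uncorrelated with z in this sample.
u -= np.where(z == 1, u[z == 1].mean(), u[z == 0].mean())
# EVER depends on assignment and on u (imperfect compliance driven by u).
ever = ((0.9 * z + 0.3 * u + rng.normal(scale=0.1, size=n)) > 0.5).astype(float)
y = 0.2 * ever + u                       # true effect of attendance is 0.2
W = np.ones((n, 1))                      # intercept; FEs and controls would go here

late = two_stage_least_squares(y, ever, z, W)
ols = float(np.linalg.lstsq(np.column_stack([ever, W]), y, rcond=None)[0][0])
print(f"2SLS: {late:.3f}  naive OLS: {ols:.3f}")
```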