
Fields Changed

Registration

Field Before After
Abstract
Before: We study how to equalize students’ educational opportunities in a tracked high school system. We will use a randomized control trial in around 400 Italian middle schools to test two conceptually different ways to close gender gaps in high school track recommendations. To help teachers predict future performance, our first treatment provides teachers with a simple algorithmic recommendation that indicates whether students are likely to excel in the more rigorous scientific high school tracks. To address behavioral biases and awareness issues, our second treatment provides teachers with real-time feedback on the diversity of their track recommendations compared to an algorithmic benchmark, which they can then adjust before communicating their recommendations to students. We hypothesize that the first intervention will lead to fairer track recommendations if teacher biases are driven by inaccurate mental models of how individual students will perform in high school. Meanwhile, the second intervention will lead to fairer recommendations if teachers are best able to correct behavioral biases when considering the aggregate diversity of their track recommendations.
After: We study how to equalize students’ educational opportunities in a tracked high school system. We will use a randomized control trial in nearly 370 Italian middle schools to test two conceptually different ways to close gender gaps in high school track recommendations. To help teachers predict future performance, our first treatment provides teachers with a simple algorithmic recommendation that indicates whether students are likely to excel in the more rigorous scientific high school tracks. To address behavioral biases and awareness issues, our second treatment provides teachers with real-time feedback on the diversity of their track recommendations compared to an algorithmic benchmark, which they can then adjust before communicating their recommendations to students.

We hypothesize that the first intervention will lead to fairer track recommendations if teacher biases are driven by inaccurate mental models of how individual students will perform in high school. Meanwhile, the second intervention will lead to fairer recommendations if teachers are best able to correct behavioral biases when considering the aggregate diversity of their track recommendations.
Last Published
Before: December 16, 2025 04:00 AM
After: January 08, 2026 11:24 AM
Randomization Unit
Before: The unit of randomization is the school system. Because some teachers reported teaching classes in multiple participating middle schools, we linked some schools to form school systems before randomizing, to avoid exposure to multiple treatments. The school systems were then randomized into a control group or one of three treatment groups, with 25% probability each.
After: The unit of randomization is the school. When teachers report teaching in multiple participating middle school buildings, these buildings were treated as a single school to prevent exposure to multiple treatments. Schools were then randomly assigned to the control group or to one of three treatment groups, with equal probability (25% each).
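The linking step described under Randomization Unit can be illustrated with a short sketch. It assumes a mapping from each teacher to the schools (or buildings) where they teach, merges any schools that share a teacher into one randomization unit, and then draws one of four arms with equal probability for each unit. The arm labels, the function names, and the use of independent equal-probability draws are illustrative assumptions; the registration does not state whether assignment was stratified or balanced across arms.

```python
import random
from collections import defaultdict

# Hypothetical arm labels; the registration only says one control and three treatment groups.
ARMS = ["control", "recommendation", "benchmark", "both"]


def link_schools(teacher_schools):
    """Merge schools that share at least one teacher into a single unit (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for schools in teacher_schools.values():
        if not schools:
            continue
        first = schools[0]
        find(first)  # register schools that have no shared teachers
        for other in schools[1:]:
            union(first, other)

    groups = defaultdict(set)
    for school in list(parent):
        groups[find(school)].add(school)
    return sorted(sorted(group) for group in groups.values())


def assign_arms(units, seed):
    """Give every school in a unit the same arm, drawn with 25% probability each."""
    rng = random.Random(seed)
    assignment = {}
    for unit in units:
        arm = rng.choice(ARMS)
        for school in unit:
            assignment[school] = arm
    return assignment
```

In this sketch, a teacher who works in schools A and B and another who works in B and C cause A, B, and C to be randomized together, so no teacher is exposed to more than one treatment.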
Planned Number of Clusters
Before: Around 400 school systems
After: Around 368 schools
Planned Number of Observations
Before: Around 3,000 eligible eighth-grade teachers and 40,000 eighth-graders in participating school systems.
After: Our estimation sample consists of 8th-grade students in treatment-eligible classes within the 368 randomized schools. We expect approximately 39,600 8th-grade students to be in treatment-eligible classes. Furthermore, we will assess spillover effects by comparing the gender gaps in track recommendation and track choice using administrative outcomes of 8th-grade students in ineligible classes in treated and control schools. We expect approximately 19,000 8th-grade students to be in ineligible classes in treated or control schools.
Sample size (or number of clusters) by treatment arms
Before: The school systems will be randomized into the control group or one of three treatment groups, with 25% probability each.
After: The schools will be randomized into the control group or one of three treatment groups, with 25% probability each.
Power calculation: Minimum Detectable Effect Size for Main Outcomes
Before: Considering an ICC of 0.07, with approximately 400 school systems (100 school systems per treatment arm) and around 100 students per school system (approximately 30 baseline high-achievers and 70 baseline low-achievers), the MDE for the high-achieving students is 0.127 SD for each treatment arm without including covariates, or 0.06 percentage points change, assuming that 42% of high-achieving students are recommended for a scientific track.
After: Considering an ICC of 0.07, approximately 368 schools, and around 108 students per school in treatment-eligible classes (about 32 high-achieving students per school), the minimum detectable effect (MDE) for the gender interaction among high achievers is 7.5 percentage points, corresponding to approximately 0.16 standard deviations, assuming mean recommendation rates of 55% for high-achieving boys and 34% for high-achieving girls. The MDE for the average treatment effect on all students in treatment-eligible classes is around 4.5 percentage points, or 0.12 standard deviations. We expect to achieve higher power once we control for teacher characteristics (from surveys) and rich baseline information on students (from INVALSI administrative and survey data).
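As a rough cross-check of the standardized figures in the power calculation, the sketch below applies a common textbook approximation for the MDE of a cluster-randomized comparison between one treatment arm and the control group. It assumes a two-sided 5% test, 80% power, equal-sized clusters, no covariates, and a normal approximation; none of these settings is stated in the registration, and the function name and parameters are hypothetical. It does not attempt the gender-interaction MDE, whose derivation is not given.

```python
from math import sqrt
from statistics import NormalDist


def mde_in_sd(n_clusters, cluster_size, icc,
              share_treated=0.5, alpha=0.05, power=0.80):
    """Approximate MDE (in standard deviations) for a two-arm cluster-randomized comparison.

    n_clusters counts the clusters in the two arms being compared, not the whole trial.
    alpha and power are assumed values, not taken from the registration.
    """
    normal = NormalDist()
    multiplier = normal.inv_cdf(1 - alpha / 2) + normal.inv_cdf(power)
    # Variance inflation from clustering: between-cluster share plus within-cluster share.
    clustering = icc + (1 - icc) / cluster_size
    allocation = share_treated * (1 - share_treated) * n_clusters
    return multiplier * sqrt(clustering / allocation)


# After: 368 schools over four arms gives 92 per arm, so 184 in a pairwise comparison.
mde_all_students = mde_in_sd(n_clusters=184, cluster_size=108, icc=0.07)

# Before: 100 school systems per arm, about 30 high achievers per school system.
mde_high_achievers_before = mde_in_sd(n_clusters=200, cluster_size=30, icc=0.07)
```

Under these assumptions the approximation gives values close to the 0.12 SD (After, all students) and 0.127 SD (Before, high achievers) reported in the field; small differences would be expected if the authors used a t-distribution or a different variance formula.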
Intervention (Hidden)
Before: We evaluate the effectiveness of two interventions, described in detail below. We randomized schools into four treatment arms: one control group and three treatment groups. All teachers in participating schools received a letter by email with an aggregate statistics report about their past students' academic performance in high school. This report did not include any information disaggregated by gender or any comparison to other schools, and it was provided as an incentive to participate in the study. In the first treatment group, the letter also included a simple algorithmic track recommendation for each student in the teacher's class, as well as information on the students' grades in the core subjects (mathematics, literature, and English) in grade 7. In the second treatment group, the letter also included information on the number of male and female students in the teacher's class who should be encouraged to attend the scientific track, based on students' core subject grades in grade 7. In the third treatment group, teachers received both sets of information in their letters. The interventions are described in more detail below:

(1) Simple Algorithmic Track Recommendations. We designed the first intervention to help teachers generate more accurate and objective track recommendations, grounded in their students' actual past academic performance. For each class of eighth graders where the class coordinator or one core subject teacher joined the study, we asked the class coordinator to fill out a spreadsheet with the students' most recent end-of-year core subject grades in grade 7. We emailed a blank version of the spreadsheet to the class coordinator with built-in settings. As the coordinator added the students' grades to the spreadsheet, the spreadsheet automatically generated a simple algorithmic track recommendation for each student based on their core subject grades in grade 7. Before the track recommendation meeting, the teachers received this spreadsheet, as well as additional information from the research team explaining the contents of the spreadsheet. The direct inclusion of grades in the spreadsheet was to address any memory issues that teachers may experience when attempting to recall students' past academic performance. The inclusion of a simple algorithmic track recommendation was to help teachers more accurately translate the grades into future academic performance. We asked the class coordinator to send in the filled-in spreadsheet after the track recommendation meeting.

(2) Algorithmic Benchmark. We designed the second intervention to address teachers' behavioral biases. Teachers received information on the number of male and female eighth graders in their class who should be encouraged to attend the scientific track, based on their school's administrative records. By design, this information was consistent with the simple algorithmic track recommendations to the scientific track. Teachers received this information before the track recommendation meeting. The teachers were then asked to write down how many male and female students they planned to recommend to the scientific track, allowing them to assess the bias in their track recommendations relative to the algorithmic benchmark.

After: We evaluate the effectiveness of two interventions, described in detail below. We randomized schools into four treatment arms: one control group and three treatment groups. All teachers participating in the baseline received a letter by email with aggregate statistics about their past students' academic performance in high school. This report did not include any information disaggregated by gender or any comparison to other schools, and it was provided as an incentive for participation in the study. In the first treatment group, the letter also included a simple algorithmic recommendation for each student in the teacher's class.

In the second treatment group, the letter also included information on the number of male and female students in the teacher's class who should be encouraged to attend the scientific track, based on students' core subject grades in grade 7. In the third treatment group, teachers received both sets of information in their letters. The interventions are described in more detail below:

(1) Simple Algorithmic Track Recommendations. We designed the first intervention to help teachers generate more accurate and objective track recommendations, grounded in their students' actual past academic performance. For each class of eighth graders where the class coordinator or one core subject teacher joined the study, we asked the class coordinator to fill out a spreadsheet with the students' most recent end-of-year core subject grades in grade 7. We emailed a blank version of the spreadsheet to the class coordinator with built-in settings. As the coordinator added the students' grades to the spreadsheet, the spreadsheet automatically generated a simple algorithmic track recommendation for each student based on their core subject grades in grade 7. Before the track recommendation meeting, the teachers received this spreadsheet, as well as additional information from the research team explaining the contents of the spreadsheet. The direct inclusion of grades in the spreadsheet was to address any memory issues that teachers may experience when attempting to recall students' past academic performance. The inclusion of a simple algorithmic track recommendation was to help teachers more accurately translate the grades into future academic performance. We asked the class coordinator to send in the filled-in spreadsheet after the track recommendation meeting.

(2) Algorithmic Benchmark. We designed the second intervention to address teachers' behavioral biases. Teachers received information on the number of male and female eighth graders in their class who should be encouraged to attend the scientific track, based on their school's administrative records. By design, this information was consistent with the simple algorithmic track recommendations to the scientific track. Teachers received this information before the track recommendation meeting. The teachers were then asked to write down how many male and female students they planned to recommend to the scientific track, allowing them to assess the bias in their track recommendations relative to the algorithmic benchmark.
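To make the relationship between the two interventions concrete, the sketch below shows one way a grade-based rule could produce both a per-student recommendation and the class-level benchmark counts by gender. The registration does not disclose the actual rule, so the cutoff (an average of at least 8 across the three core subjects, on an assumed 10-point scale), the class names, and the function names are all hypothetical placeholders.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical cutoff; the real rule used in the study is not described in the registration.
ASSUMED_THRESHOLD = 8.0


@dataclass
class Student:
    name: str
    gender: str        # "F" or "M"
    math: float        # grade 7 end-of-year grades, assumed 10-point scale
    literature: float
    english: float


def recommend_scientific(student, threshold=ASSUMED_THRESHOLD):
    """Placeholder rule: recommend the scientific track if the core-grade average meets the cutoff."""
    average = (student.math + student.literature + student.english) / 3
    return average >= threshold


def benchmark_counts(students):
    """Count recommended students by gender, so the benchmark matches the per-student rule."""
    counts = Counter({"F": 0, "M": 0})
    for student in students:
        if recommend_scientific(student):
            counts[student.gender] += 1
    return dict(counts)


def gap_vs_benchmark(planned, benchmark):
    """Planned recommendations minus the benchmark, by gender (negative means fewer than the benchmark)."""
    return {gender: planned.get(gender, 0) - benchmark[gender] for gender in benchmark}
```

Because the counts are derived from the same rule as the individual recommendations, the two pieces of information stay consistent by construction, which mirrors the design described in the Algorithmic Benchmark paragraph.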