Experimental Design Details
The eligible population for our study consists of the parents of students applying to an entry grade (Pre-K, Kindergarten, first grade, ninth grade) in Chile’s dense urban areas for the academic year beginning in March 2021. As mentioned before, we will be able to include nearly the entire eligible population in our sample because we are working directly with the school choice agency. Because of political restrictions, it is not possible to limit the implementation to these grades, so in practice we will include applicants from any grade (Pre-K through 12). In the analysis, however, we will consider only applicants to entry grades, who make up the majority of applicants, as expected. See the Pre-Analysis Plan for more details.
From the beginning of August 2020, the school choice portal will display a banner that reads “access personalized information here”. Parents who click on the banner will land on the page dedicated to this policy pilot (tuinformacion.mineduc.cl), where they are prompted to sign up. This constitutes the first channel of recruitment. By signing up, they become part of the RCT sample, provided they live inside our selected clusters. During sign-up, we will request their contact information (email, phone number for SMS and WhatsApp) and, importantly for the randomization, their home address. Once logged in, they will be asked to list all the schools they know and to indicate how they wish to rank them in their actual application. Everyone recruited in this way belongs to sample 1 and will be randomized into treatment by cluster.
In addition, anyone who completes a real application will also be part of the experiment, provided they live inside our selected clusters. The difference is that these applicants enter the experiment after they have already submitted a real application. They will be randomized into treatment by cluster, and we will refer to this group as sample 2.
Finally, on day three of the application process, we will take all the students from sample 2 who fall inside the control clusters and randomly select a subsample that will be randomized into treatment at the individual level. We refer to this subsample as sample 3. To keep the control clusters relatively clean, this sample cannot be very large. In practice, this means that there are few “slots” for sample 3, so we will prioritize students who do not have a particularly high switching cost. To select this sample, we will proceed as follows. First, we will restrict the universe of applicants to those whose decision process is especially relevant to the questions our study seeks to answer: applicants choosing schools for the first time (i.e., without a secured spot in their current school), with no siblings enrolled in any school, and with at least 5 schools within a 2 km radius. Second, we will assign a random number to each individual who meets these criteria and rank individuals by that number. Third, we will add individuals to sample 3 one by one, following the rank. An individual will be added only if they are at least 0.5 km away from every individual already in the sample.
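The three selection steps can be illustrated with a minimal Python sketch. It is not the study's implementation. The record fields (`id`, `lat`, `lon`, `eligible`), the function names, the `max_slots` cap, the seed, and the use of great-circle distance are all assumptions made for illustration; only the 0.5 km separation comes from the text.

```python
import math
import random

MIN_SEPARATION_KM = 0.5  # minimum separation stated in the text


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def select_sample3(applicants, max_slots, seed=0):
    """Greedy selection of a spatially separated subsample.

    applicants: list of dicts with hypothetical keys 'id', 'lat', 'lon' and
    'eligible' (True if the applicant meets the step-one criteria).
    max_slots: hypothetical cap on the size of the subsample.
    """
    rng = random.Random(seed)

    # Step 1: restrict the universe to applicants who meet the criteria.
    eligible = [a for a in applicants if a["eligible"]]

    # Step 2: assign a random number to each individual and rank by it.
    draw = {a["id"]: rng.random() for a in eligible}
    ranked = sorted(eligible, key=lambda a: draw[a["id"]])

    # Step 3: walk down the ranking and keep a candidate only if it is at
    # least MIN_SEPARATION_KM away from everyone already selected.
    selected = []
    for cand in ranked:
        if len(selected) >= max_slots:
            break
        pos = (cand["lat"], cand["lon"])
        if all(haversine_km(pos, (s["lat"], s["lon"])) >= MIN_SEPARATION_KM
               for s in selected):
            selected.append(cand)
    return selected
```

Because candidates are processed in random rank order, two nearby eligible applicants have the same chance of being the one that is kept. Fixing the seed makes the draw reproducible.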
Last year (2019), there were 483,814 applicants. Of these, 429,476 were from urban areas, and almost all of them fell within the clusters we are considering for our experiment. We expect a similar number of participants this year (divided into 3 samples, as explained above).
The unit of randomization is the cluster for samples 1 and 2 and the parent for sample 3. Since the cluster is the innovative part of our design, we describe it in detail here.
The clusters are constructed using the geographic distribution of applications in the previous year. This is done market by market, in two steps. In the first step, we use the density-based spatial clustering of applications with noise (DBSCAN) algorithm to classify locations into those that are closely packed together (points with many nearby neighbors) and outlier points that lie alone in low-density regions (whose nearest neighbors are too far away). In the second step, we partition the N observations in the first group into K clusters using a k-means algorithm, which minimizes the within-cluster variance.
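The two-step construction can be sketched in plain Python as follows. This is a simplified illustration, not the code used in the study, which would more plausibly rely on a library implementation. The parameter values (`eps`, `min_pts`, `k`), the use of planar Euclidean distance, and the deterministic farthest-point initialization of k-means are all assumptions; the text does not specify any of them.

```python
import math


def dense_mask(points, eps, min_pts):
    """Step 1 (DBSCAN-style noise filter).

    A point is 'core' if at least min_pts points (itself included) lie within
    eps of it. A point is kept if it is core or lies within eps of a core
    point; every other point is treated as a low-density outlier.
    """
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]
    return [core[i] or any(core[j] for j in neighbors[i]) for i in range(n)]


def kmeans(points, k, n_iter=100):
    """Step 2 (Lloyd's k-means) with farthest-point initialization (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))

    def assign(cents):
        # Label each point with the index of its nearest centroid.
        return [min(range(k), key=lambda c: math.dist(p, cents[c])) for p in points]

    for _ in range(n_iter):
        labels = assign(centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return assign(centroids), centroids


def build_clusters(points, eps, min_pts, k):
    """Return one label per input point (None for outliers) and the k centroids."""
    keep = dense_mask(points, eps, min_pts)
    dense = [p for p, m in zip(points, keep) if m]
    labels, centroids = kmeans(dense, k)
    it = iter(labels)
    return [next(it) if m else None for m in keep], centroids
```

Filtering outliers before running k-means keeps isolated locations from pulling centroids away from the dense areas. In this sketch the function would be called once per market, with points given as coordinate tuples in a projected (planar) coordinate system.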
Each cluster is then divided into three areas. The first is the core of the cluster (see white areas in Figure 5 of the Appendix), which will be our unit of observation for the simulated effects of the policy on school congestion and school quality. The second is an intermediate zone, which will be our unit for policy implementation: in treated clusters, all the nodes that fall inside the intermediate zone will be assigned to the treatment for the simulations. The third is a buffer zone that surrounds the intermediate zone. This buffer between treated and control neighborhoods is needed for a clean experiment: the smaller the buffer, the more likely spillovers between treatment and control clusters become.
For samples 1 and 2, randomization will occur at the level of the clusters defined above. Additionally, parents from sample 2 who were assigned to control clusters and who meet certain criteria (sample 3) will go through a second randomization at the individual level. To qualify for sample 3, the applicant must be currently applying for Pre-K, Kindergarten, or first grade, must have no older siblings enrolled in a school, and must have at least 5 schools within a 2 km radius. To avoid spillovers between treatment and control individuals, we will block a circular area of 0.5 km radius around each individual in this sample.
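As an illustration of cluster-level assignment, the following Python sketch draws a treatment arm for each cluster and then passes that arm on to every applicant in the cluster. The function names, the seed, and the 50/50 split are hypothetical; the text does not state the treated share or whether the draw is stratified (for example, by market).

```python
import random


def randomize_clusters(cluster_ids, seed=2020, share_treated=0.5):
    """Assign each cluster to 'treatment' or 'control'.

    seed and share_treated are illustrative assumptions, not study parameters.
    """
    rng = random.Random(seed)
    ids = sorted(cluster_ids)  # fixed starting order so the draw is reproducible
    rng.shuffle(ids)
    n_treated = int(len(ids) * share_treated)
    return {cid: ("treatment" if i < n_treated else "control")
            for i, cid in enumerate(ids)}


def assign_applicants(applicant_cluster, cluster_arm):
    """Give every applicant the arm of the cluster they live in.

    applicant_cluster: dict mapping applicant id -> cluster id.
    cluster_arm: dict mapping cluster id -> arm, as returned above.
    """
    return {applicant: cluster_arm[cluster]
            for applicant, cluster in applicant_cluster.items()}
```

Under this scheme, all applicants in the same cluster share one arm, which is the defining feature of randomizing at the cluster level for samples 1 and 2. The individual-level draw for sample 3 would be a separate step applied only to applicants in control clusters.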