Experimental Design
For Cohort 1, our core experimental sample consists of 1,476 moderate- to high-risk boys who completed our survey in 52 schools. There was a 2-stage randomization. Half (26) of the schools were randomly assigned to have their pupils be eligible for treatment. Within schools eligible for treatment, half of the moderate- to high-risk boys enrolled at the time of randomization were assigned to treatment. This results in three experimental conditions: treatment boys, spillover boys (control boys in treatment schools), and pure control boys (control boys in control schools).
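The two-stage assignment described above can be illustrated with a minimal sketch. It assumes simple, unstratified randomization at both stages (the text does not say whether Cohort 1 was stratified), and the data structure, function name, and seed are hypothetical, chosen only for illustration.

```python
import random


def two_stage_assignment(pupils_by_school, seed=0):
    """Assign each pupil to 'treatment', 'spillover', or 'pure_control'.

    pupils_by_school: dict mapping a school id to a list of pupil ids
    (hypothetical structure; assumed to hold only the eligible
    moderate- to high-risk boys).
    """
    rng = random.Random(seed)

    # Stage 1: half of the schools are randomly made treatment schools.
    schools = sorted(pupils_by_school)
    rng.shuffle(schools)
    treatment_schools = set(schools[: len(schools) // 2])

    assignment = {}
    for school in sorted(pupils_by_school):
        pupils = list(pupils_by_school[school])
        if school not in treatment_schools:
            # Control boys in control schools are pure controls.
            for pupil in pupils:
                assignment[pupil] = "pure_control"
            continue
        # Stage 2: within a treatment school, half of the pupils are treated;
        # the remaining pupils form the spillover group.
        rng.shuffle(pupils)
        n_treated = len(pupils) // 2
        for i, pupil in enumerate(pupils):
            assignment[pupil] = "treatment" if i < n_treated else "spillover"
    return treatment_schools, assignment


# Toy example: 4 schools with 6 eligible pupils each (made-up ids).
toy = {f"school_{s}": [f"s{s}_p{p}" for p in range(6)] for s in range(4)}
treatment_schools, assignment = two_stage_assignment(toy, seed=1)
counts = {}
for condition in assignment.values():
    counts[condition] = counts.get(condition, 0) + 1
print(counts)
```

With an even number of schools and an even number of pupils per school, the sketch yields one quarter treatment boys, one quarter spillover boys, and one half pure controls.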
We expect Cohort 2 to follow a similar design, yielding a pooled experimental sample of approximately 3,000 pupils in 100 schools.
Finally, prior to Cohorts 1 and 2, we conducted a pilot program and evaluation with an experimental sample of 250 youth in 6 schools. Four schools were assigned to treatment, and within those schools students were assigned to the program at random, stratified by baseline risk level. We had access to 9-month post-intervention results of the pilot before registering the full experimental design. We did not have access to any post-randomization data on Cohort 1 prior to registration (Cohort 1 is undergoing the intervention at the time of registration).
ESTIMATION OF DIRECT TREATMENT EFFECTS
We are primarily interested in direct treatment effects. Spillover effects are of secondary interest, and more exploratory, given that we do not have strong priors about the direction or magnitude of spillovers. We will return to this below.
For average treatment effects, we will estimate the simple intent-to-treat (ITT) effect of treatment as well as the effect of treatment on the treated (TOT). TOT uses individual random assignment as an instrument for a “treated” indicator. We define “treated” as equaling one if the pupil attended at least one session of the program and zero if they declined to participate or never attended (e.g. because they transferred to another school before the program began).
We plan to run the following OLS regression:
y = 𝛿T + 𝜃S + 𝛽X + 𝜀
where:
y is the outcome
T = 1 for students assigned to treatment and is 0 for spillover and pure controls
S is a vector of spillover variables, discussed below
X is a vector of baseline covariates (school and pupil level) selected via a double lasso method, plus fixed effects for the cohort
For the ITT we are interested in 𝛿. As mentioned, we will also estimate the TOT, using T as an instrument for the “treated” indicator.
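To make the relationship between the ITT and the TOT concrete, the following is a simplified sketch that drops the spillover variables S and the covariates X. In that special case the ITT is a difference in mean outcomes by assignment, and the instrumental-variables estimate with a single binary instrument reduces to the Wald ratio: the ITT divided by the difference in take-up rates. The function name and the toy data are hypothetical; the planned analysis would instead use the full regression with S and X.

```python
def mean(values):
    return sum(values) / len(values)


def itt_and_tot(outcome, assigned, treated):
    """Return (itt, first_stage, tot) with no covariates.

    outcome:  list of outcomes y
    assigned: list of 0/1 random-assignment indicators (T)
    treated:  list of 0/1 indicators for attending at least one session
    """
    y1 = [y for y, t in zip(outcome, assigned) if t == 1]
    y0 = [y for y, t in zip(outcome, assigned) if t == 0]
    d1 = [d for d, t in zip(treated, assigned) if t == 1]
    d0 = [d for d, t in zip(treated, assigned) if t == 0]

    itt = mean(y1) - mean(y0)          # effect of being assigned
    first_stage = mean(d1) - mean(d0)  # effect of assignment on take-up
    tot = itt / first_stage            # Wald (IV) estimate
    return itt, first_stage, tot


# Toy data (made up): 4 assigned pupils, 3 of whom attend; 4 controls.
outcome = [1.0, 2.0, 4.0, 5.0, 1.0, 2.0, 2.0, 3.0]
assigned = [1, 1, 1, 1, 0, 0, 0, 0]
treated = [0, 1, 1, 1, 0, 0, 0, 0]
itt, first_stage, tot = itt_and_tot(outcome, assigned, treated)
print(itt, first_stage, round(tot, 3))
```

Because take-up among assigned pupils is below one, the TOT is larger in magnitude than the ITT by the factor 1 / first_stage.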
HETEROGENEITY ANALYSIS
Our main heterogeneity analysis will look at impacts by baseline criminal recruitment risk level. We will use baseline self-reported risk and a set of risk factors (all other baseline survey and baseline school variables) to train a predictive model to develop a measure of predicted recruitment risk. We will estimate treatment heterogeneity by interacting treatment status with an indicator of “moderate” recruitment risk (as opposed to “high” risk). We expect this to be an indicator of being below-median predicted risk within the experimental sample. We will also explore alternative subgroups/specifications as supplemental analyses.
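As an illustration of the interaction described above, the sketch below builds the “moderate” indicator as strictly below-median predicted risk and compares treatment effects across the two groups. It assumes the predicted risk scores are already available (the predictive model itself is not shown), omits S and X, and treats ties at the median as “high” risk; all of these are assumptions for illustration. Without other covariates, the coefficient on the treatment-by-moderate interaction equals the difference between the two subgroup differences in means.

```python
from statistics import median


def moderate_risk_indicator(predicted_risk):
    """1 if predicted risk is strictly below the sample median, else 0."""
    cutoff = median(predicted_risk)
    return [1 if r < cutoff else 0 for r in predicted_risk]


def subgroup_effects(outcome, assigned, moderate):
    """Return (effect_high, effect_moderate, interaction)."""

    def diff_in_means(group):
        y1 = [y for y, t, m in zip(outcome, assigned, moderate)
              if m == group and t == 1]
        y0 = [y for y, t, m in zip(outcome, assigned, moderate)
              if m == group and t == 0]
        return sum(y1) / len(y1) - sum(y0) / len(y0)

    effect_high = diff_in_means(0)
    effect_moderate = diff_in_means(1)
    # Interaction: how much the effect differs for moderate-risk pupils.
    return effect_high, effect_moderate, effect_moderate - effect_high


# Toy data (made up).
predicted_risk = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
assigned = [1, 0, 1, 0, 1, 0, 1, 0]
outcome = [3.0, 2.0, 3.0, 2.0, 5.0, 2.0, 5.0, 2.0]
moderate = moderate_risk_indicator(predicted_risk)
effect_high, effect_moderate, interaction = subgroup_effects(
    outcome, assigned, moderate
)
print(moderate, effect_high, effect_moderate, interaction)
```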
ESTIMATION OF SPILLOVERS
Spillovers are a secondary analysis partly because we probably have the statistical power to detect only relatively large spillovers (see power analysis below). What’s more, there may be countervailing effects that reduce estimated spillovers on net. On the one hand, positive informational and behavioral spillovers could reduce school dropout, criminal engagement, and gang entry among untreated peers. On the other hand, if demand for labor in the gangs is relatively inelastic, gangs may recruit untreated boys in place of treated ones, which would produce adverse spillovers on criminal careers and gang membership.
There are three options for improving statistical power and disentangling these countervailing effects. The first is to take advantage of the fact that we have partial social network data for surveyed students (up to 10 friends in the school, a subset of whom would have answered our survey and also be eligible for the experimental sample based on risk). The second is to take advantage of the fact that we will have administrative data (such as school dropout, and eventually arrest data) on all classmates, including the lower-risk students and the students who did not answer the survey. The third is to exploit the random variation in how many of a pupil's friends are treated.
Thus we expect the vector S to include:
1. An indicator for the spillover condition
2. A measure of friends treated
3. The total number of friends in the experimental sample and the total number of friends they listed in their cohort
Note that the coefficients on #3 are not of substantive interest. As for #2, the number of friends treated and the proportion of friends treated should lead to substantively the same conclusions. We can also use an indicator for any friends treated. Since, for Cohort 1, the majority of the experimental sample have only one friend treated, this indicator should yield qualitatively similar results to the proportion and, if so, is more easily interpreted. But the default would be to use the proportion for accuracy, especially if the results are not qualitatively the same.
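The candidate friend-based measures can be sketched as follows. The function and field names are hypothetical, and the denominator of the proportion (friends in the experimental sample rather than all friends listed) is an assumption made for illustration, since the text does not fix it.

```python
def friend_exposure(friends, assignment):
    """Compute friend-based spillover measures for one pupil.

    friends:    list of friend ids the pupil listed (up to 10)
    assignment: dict mapping pupil ids in the experimental sample to
                'treatment', 'spillover', or 'pure_control'
    """
    n_listed = len(friends)
    in_sample = [f for f in friends if f in assignment]
    n_in_sample = len(in_sample)
    n_treated = sum(1 for f in in_sample if assignment[f] == "treatment")
    # Assumption: proportion is taken over friends in the experimental
    # sample, and set to 0 when the pupil has no such friends.
    proportion = n_treated / n_in_sample if n_in_sample else 0.0
    return {
        "n_friends_listed": n_listed,
        "n_friends_in_sample": n_in_sample,
        "n_friends_treated": n_treated,
        "prop_friends_treated": proportion,
        "any_friend_treated": 1 if n_treated > 0 else 0,
    }


# Toy example (made-up ids): two of four listed friends are in the sample.
toy_assignment = {"a": "treatment", "b": "spillover", "c": "treatment"}
example = friend_exposure(["a", "b", "x", "y"], toy_assignment)
no_friends = friend_exposure([], toy_assignment)
print(example)
```

Including the counts of listed and in-sample friends alongside the exposure measure reflects the role of #3: the number of friends is not randomly assigned, so it is held fixed when comparing pupils with more or fewer treated friends.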
We will consider alternate specifications. Note that our primary estimate of interest, the direct treatment effect 𝛿, should not be materially affected by the specific vector of spillover variables used.
We will calculate spillover effects first within the “core” experimental sample, but also within the broader sample. The core sample of 1,476 excludes pupils in the same classrooms who were designated “low risk”, or who did not complete the risk assessment survey. A broader experimental sample would include them, along with indicators for each of these strata.