Experimental Design Details
The intervention will be implemented in schools where at least 30 percent of students are Syrian refugees. Schools will be stratified by gender, which allows us to estimate the average effect of the intervention while controlling for confounding variables that might be systematically associated with boys' and girls' schools, and also to test for heterogeneous effects between the two types of schools. Schools will be randomly assigned to a Control Group, Treatment 1, or Treatment 2. The rationale for the two treatment arms, one with 5 sessions of CBT and one with 20, is to assess the sustainability and cost-effectiveness of the interventions as part of the evaluation.
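The assignment procedure described above can be sketched as follows. This is a minimal illustration, not the study's actual randomization code: the school identifiers, the seed, and the function name `assign_arms` are hypothetical, and equal allocation across the three arms within each gender stratum is an assumption.

```python
import random
from collections import Counter, defaultdict

ARMS = ("control", "treatment_1", "treatment_2")


def assign_arms(schools, seed=2024):
    """Randomly assign schools to arms within each stratum.

    schools: list of (school_id, stratum) pairs, e.g. ("S001", "girls").
    Returns a dict mapping school_id to one of ARMS.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_stratum = defaultdict(list)
    for school_id, stratum in schools:
        by_stratum[stratum].append(school_id)

    assignment = {}
    for stratum in sorted(by_stratum):
        ids = sorted(by_stratum[stratum])
        rng.shuffle(ids)
        # Shuffle the arm order too, so that when a stratum's size is not a
        # multiple of three, no arm is systematically favored for the extras.
        arm_order = list(ARMS)
        rng.shuffle(arm_order)
        for position, school_id in enumerate(ids):
            assignment[school_id] = arm_order[position % len(arm_order)]
    return assignment


# Hypothetical example: 12 schools, 6 boys' schools and 6 girls' schools.
schools = [(f"S{i:03d}", "girls" if i % 2 else "boys") for i in range(12)]
assignment = assign_arms(schools)
stratum_of = dict(schools)
counts = Counter((stratum_of[s], arm) for s, arm in assignment.items())
```

With six schools per stratum, each arm receives two schools in each stratum, so gender is balanced across arms by construction.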
Econometric analyses: The evaluation design is a stratified cluster randomized controlled trial, so the main analytical approach will follow the standard for that design: regressing the individual student-level dependent (outcome) variables on observable covariates and dummies for assigned treatment status (i.e. intent-to-treat), with standard errors clustered at the school level. The estimation therefore takes the form:
$$Y_{ijk} = \beta_0 + \beta_{jk} X_{jk} + \sum_{t=1}^{2} \beta_t \, \mathrm{Treatment}_{kt} + \beta_{z_{st}} z_{st} + \epsilon_{ijk}$$
where Yijk is the outcome variable of interest for student i (potentially of type j; see below) in school k; Xjk are mean school-level control variables; and zst are stratum-level fixed effects. The central parameters of interest are the βt for treatments t = 1,2; these are estimated from dummies Treatmentkt which take on the value 1 precisely when school k is assigned to treatment t. The main analysis will not use type j, but we plan to do some exploratory analysis in which students are disaggregated by gender and/or refugee status.
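The specification above can be illustrated on simulated data. The sketch below is not the study's analysis code. It assumes NumPy is available, uses a single girls'-school dummy as the stratum fixed effect, omits the school-level controls, and applies a common small-sample (CR1) correction to the cluster-robust variance. All effect sizes, sample sizes, and function names are hypothetical.

```python
import numpy as np


def itt_cluster_ols(y, X, clusters):
    """OLS coefficients with cluster-robust (CR1) standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    clusters = np.asarray(clusters)
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over clusters of (X_g' e_g)(X_g' e_g)'.
    meat = np.zeros((p, p))
    ids = np.unique(clusters)
    for g in ids:
        mask = clusters == g
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)
    n_clusters = len(ids)
    correction = (n_clusters / (n_clusters - 1)) * ((n - 1) / (n - p))
    cov = correction * xtx_inv @ meat @ xtx_inv
    # Clip guards against tiny negative values from floating-point rounding.
    se = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return beta, se


def simulate_trial(n_schools=60, students_per_school=30, seed=0):
    """Simulate a stratified cluster trial with made-up parameters."""
    rng = np.random.default_rng(seed)
    girls_school = np.arange(n_schools) % 2  # stratum: 0 = boys, 1 = girls
    arm = np.empty(n_schools, dtype=int)     # 0 = control, 1 = T1, 2 = T2
    for s in (0, 1):
        ids = rng.permutation(np.flatnonzero(girls_school == s))
        arm[ids] = np.arange(len(ids)) % 3   # equal allocation within stratum
    school = np.repeat(np.arange(n_schools), students_per_school)
    t1 = (arm[school] == 1).astype(float)
    t2 = (arm[school] == 2).astype(float)
    girls = girls_school[school].astype(float)
    school_shock = rng.normal(0.0, 0.4, n_schools)[school]  # induces ICC
    noise = rng.normal(0.0, 1.0, school.size)
    # Hypothetical true effects: 0.2 SD for Treatment 1, 0.3 SD for Treatment 2.
    y = 1.0 + 0.2 * t1 + 0.3 * t2 + 0.5 * girls + school_shock + noise
    X = np.column_stack([np.ones(school.size), t1, t2, girls])
    return y, X, school


y, X, school = simulate_trial()
beta, se = itt_cluster_ols(y, X, school)
for name, b, s in zip(["const", "treatment_1", "treatment_2", "girls_school"], beta, se):
    print(f"{name:>12}: {b: .3f} (SE {s:.3f})")
```

The coefficients on `treatment_1` and `treatment_2` play the role of the βt in the equation. Because outcomes are correlated within schools, the clustered standard errors are typically larger than conventional OLS ones.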
The primary outcome variables (i.e. Y in the equation above) are an index of socioemotional skills and – conditional on seeing positive impacts in that first step – ultimately attendance/dropout status. Secondary outcome variables include math performance and (teacher-level) adherence assessments, which will be analyzed in part as potential mediation mechanisms for the main outcomes.
We will not have baseline data on socioemotional skills, because we believe that limited resources will be better spent developing and piloting a comprehensive endline survey that faithfully captures a variety of psychosocial constructs among this specific population. If we had such baseline data, we could include it as a control in the regression specification, as we can for some of the other variables.
Statistical power calculations: After minimizing within-school attrition, we expect approximately 50 students per cluster (school), although the power calculations do not vary much if this is reduced to 40 or even 30. As is standard in education settings – see Hutchison and Styles (2010) for a discussion of this as well as the intra-cluster correlation (ICC) values within schools – we aim for a minimum detectable effect (MDE) of 0.2 standard deviations. We use standard levels for power (0.8) and significance (0.05) in two-sided hypothesis tests.
The key remaining variable is the ICC. Hutchison and Styles (2010) suggest that “attitudinal and lifestyle” variables have an ICC of roughly 0.05, whereas “academic attainment” variables have an ICC of roughly 0.2. We expect socioemotional skills to be closer to the former, while attendance should be somewhere in between but perhaps closer to the latter. Hence, we use an ICC of 0.15, which yields a required sample size of 67 schools per arm, implying roughly 200 schools in total (3 × 67 = 201) for our design with a control arm and two treatment arms. As a robustness check, we also ran the calculation with an ICC of 0.2, which yields 86 schools per arm. That number comes back down to 67 under one-sided hypothesis testing, and we do have a clear directional hypothesis: that the intervention will increase attendance and reduce dropout.
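The calculation above can be approximated with the standard normal-approximation formula for a two-arm comparison of cluster means, in which the number of clusters per arm is 2(z_α + z_power)² [ICC + (1 − ICC)/m] / MDE², with m students per school. The sketch below is a rough check under that assumption, not a reproduction of the software used for the figures in the text. The function name `clusters_per_arm` is hypothetical, and the formula ignores covariates, stratification, and any adjustment for multiple treatment arms.

```python
import math
from statistics import NormalDist


def clusters_per_arm(icc, mde=0.2, cluster_size=50, alpha=0.05,
                     power=0.8, two_sided=True):
    """Clusters needed per arm to detect `mde` (in SD units), normal approximation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_power = z.inv_cdf(power)
    # Variance of a cluster mean when the outcome is standardized to SD 1.
    var_cluster_mean = icc + (1 - icc) / cluster_size
    return math.ceil(2 * (z_alpha + z_power) ** 2 * var_cluster_mean / mde ** 2)


main_case = clusters_per_arm(icc=0.15)                       # two-sided, ICC 0.15
robustness = clusters_per_arm(icc=0.20)                      # two-sided, ICC 0.20
one_sided = clusters_per_arm(icc=0.20, two_sided=False)      # one-sided, ICC 0.20
print(main_case, robustness, one_sided)
```

Under this approximation the three scenarios come out at 66, 85, and 67 schools per arm, within one school of the 67, 86, and 67 reported above. The reason for the small gap is not stated in the text; it may reflect a t-distribution or similar finite-sample adjustment in the original calculation.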
We estimate that around 5,000 total refugee children will be included in the treatment arms, with a similar number of Jordanians. The inclusion of Jordanians is not only desirable but also cost-effective, as they attend the same schools. Since the randomization is being done at the school level, and all of the schools in our sample have both refugee and Jordanian children, we should be powered to test for effects separately across the two groups. The only limitation would arise if attrition were especially high for one of the groups, although even in that case the magnitudes are likely to be suggestive of any potential differences.