Experimental Design
We adopt the experimental design that we pre-registered for a separate study based on the same RCT (AEARCTR-0012924). That pre-registration evaluates the short-term effects of the intervention on a different set of outcome domains, focusing on income, subjective wellbeing, cognitive skills, and aspirations. This pre-registration instead examines how international migration shapes perceptions of the self and others, with an emphasis on universalism, gender attitudes, and various aspects of identity.
Since Malengo receives more qualified applications than it can support, it randomizes admission among all shortlisted applicants. We can hence conduct a randomized controlled trial. We use stratified randomization to improve the precision of the estimates. We form strata based on the gender of the applicant, whether they come from the Greater Kampala region or not, and whether they attended the arts or science stream in secondary school. Within each stratum, we form octuplets based on applicants’ standardized test scores in the final secondary school exams. Within each octuplet, we assign up to half of the applicants to the treatment group and the remaining applicants to the control group. Our ability to oversample the control group depends on the number of qualified applicants to the Malengo program, as well as Malengo’s operational budget and recruitment schedule. The intervention is the same for all treated applicants. There is one treatment and one control group.
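The assignment procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the field names (id, female, kampala, science, score), the choice to form octuplets from adjacent ranks of the test score, and the rule of treating exactly half of each block (rounded down) are hypothetical and are not taken from Malengo's actual procedure.

```python
import random
from collections import defaultdict

def assign_treatment(applicants, block_size=8, seed=0):
    """Stratify, form score-ranked blocks of up to eight, randomize within blocks.

    `applicants` is a list of dicts with the hypothetical keys
    id, female, kampala, science, and score.
    """
    rng = random.Random(seed)

    # Strata: gender x Greater Kampala x arts/science stream.
    strata = defaultdict(list)
    for a in applicants:
        strata[(a["female"], a["kampala"], a["science"])].append(a)

    assignment = {}
    for key in sorted(strata):
        # Assumption: octuplets group applicants with adjacent score ranks.
        members = sorted(strata[key], key=lambda a: a["score"], reverse=True)
        for start in range(0, len(members), block_size):
            block = members[start:start + block_size]
            # "Up to half" treated: here exactly floor(len(block) / 2).
            n_treat = len(block) // 2
            treated = set(rng.sample(range(len(block)), n_treat))
            for pos, a in enumerate(block):
                assignment[a["id"]] = 1 if pos in treated else 0
    return assignment

# Simulated shortlist, purely illustrative.
demo_rng = random.Random(1)
applicants = [
    {
        "id": i,
        "female": demo_rng.random() < 0.5,
        "kampala": demo_rng.random() < 0.4,
        "science": demo_rng.random() < 0.5,
        "score": demo_rng.gauss(0, 1),
    }
    for i in range(60)
]
assignment = assign_treatment(applicants)
```

Incomplete blocks at the bottom of a stratum receive fewer treated applicants in this sketch, which is one way the control group can end up oversampled.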
We follow Malengo’s recruitment schedule and interview shortlisted applicants from the 2021-2026 cohorts of Malengo students. We also interview applicants’ parents (or alternative caregivers if they do not live with their parents), siblings, and friends in Uganda to identify spillover effects.
The analysis is based on a survey that tracks respondents over space and time. We conduct baseline interviews with all respondents. To avoid anticipation effects, these interviews take place before Malengo informs applicants whether their application was successful. We plan to conduct follow-up interviews with applicants every year and with other types of respondents at least once within the first three years after applicants’ planned arrival in Germany (toward the end of this period). To keep the length of the interview manageable and maximize the time of exposure to Germany, we will collect some outcomes only in the last year (i.e., three years after applicants’ planned arrival in Germany).
We will use the following equation to estimate the impact of the intervention:
Y_it = a + b Malengo_i + X’_i c + u_it
where Y_it is the outcome variable of interest for applicant i in year t after the applicant’s planned arrival in Germany. Malengo_i is the treatment dummy indicating whether the applicant has been admitted to the Malengo program. Based on experience with existing cohorts of Malengo students, we expect compliance to be high. X_i is a vector of baseline control variables. It includes the baseline value of the respective outcome variable wherever possible. It also includes randomization strata, Malengo cohort, survey wave, year of observation, and type of respondent fixed effects. Our focus will be on year 3 after the applicant’s planned arrival in Germany (t=3).
We will use the post-double-selection lasso estimation proposed by Belloni et al. (2014) to select additional control variables. We will consider the following baseline variables as inputs for the procedure (including parents’ values where appropriate): age, gender, tribe, educational attainment, enrollment status, marital status, household size, number of children 0-5, number of children 6-18, UACE/UCE scores, physical health index, self-efficacy index, remittances received at baseline, remittances sent at baseline, business ownership, value of real estate owned, house ownership, number of bedrooms, number of bathrooms, house quality index, frequency of praying, importance of family/friends/leisure time/politics/work/religion/tradition in life, number of close friends, role of luck vs. effort for economic outcomes, desired level of redistribution of income, economic preferences, Big-5 personality traits, curiosity index, social desirability index, worries index, lived abroad for at least three months, having been overseas, number of people known abroad, number of Malengo scholars known, Facebook/Twitter/Instagram/TikTok account ownership, district, rural/urban, and baseline values of all primary and secondary outcomes (including those specified in AEARCTR-0012924). We will use dummies to indicate missing baseline data and replace missing values with zero, including both variables in the set of potential control variables for the post-double-selection lasso estimation.
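The treatment of missing baseline data described above can be illustrated with a short sketch. It covers only the missing-value step, not the lasso selection itself. The function name, the "_missing" suffix, and the example variables are hypothetical.

```python
def add_missing_indicators(rows, variables):
    """For each variable, add a 0/1 missing dummy and replace missing values with zero.

    `rows` is a list of dicts; a value of None marks missing baseline data.
    Both the zero-filled variable and its dummy would then enter the set of
    candidate controls.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        for var in variables:
            missing = row.get(var) is None
            new_row[f"{var}_missing"] = 1 if missing else 0
            if missing:
                new_row[var] = 0
        out.append(new_row)
    return out

# Two illustrative respondents with one missing value each.
baseline = [
    {"age": 21, "household_size": None},
    {"age": None, "household_size": 4},
]
candidates = add_missing_indicators(baseline, ["age", "household_size"])
```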
We will make the following adjustments to variables if needed. First, some variables might have minimal variation and thus reduce the power to detect an impact. We will therefore exclude all variables for which at least 95 percent of observations in the relevant sample have the same value. Second, we will winsorize continuous variables that are heavily skewed (e.g., incomes) at the 99th percentile and apply the inverse hyperbolic sine transformation to mitigate the influence of outliers. Third, we will consider replacing missing outcome data (e.g., due to attrition) with observed data from a previous follow-up interview or a proxy interview with a knowledgeable family member or friend.
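The first two adjustments can be sketched as below. The sketch assumes that winsorization applies to the upper tail only and that percentiles are computed by linear interpolation. Neither detail is specified in the text, and all function names are illustrative.

```python
import math

def has_minimal_variation(values, threshold=0.95):
    """True if at least `threshold` of observations share the same value."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return max(counts.values()) / len(values) >= threshold

def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def winsorize_top(values, q=99):
    """Cap values above the q-th percentile (upper tail only, by assumption)."""
    cap = percentile(values, q)
    return [min(v, cap) for v in values]

def ihs(values):
    """Inverse hyperbolic sine, asinh(x) = ln(x + sqrt(x^2 + 1)); defined at zero."""
    return [math.asinh(v) for v in values]

# Illustrative skewed income variable with one extreme value.
incomes = list(range(100)) + [10_000]
adjusted = ihs(winsorize_top(incomes))
```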
We will use the same specification to analyze spillover effects and estimate it for the pooled sample of the different groups of non-applicants (as specified for the various primary and secondary outcomes above). We will also report treatment effects estimated separately for the different types of non-applicants, but our focus remains on the pooled sample. We will use OLS to estimate the equation above and cluster standard errors at the applicant level. For outcomes that take zero and positive values, such as income, we will also consider using Poisson regressions to express the treatment effect on levels in percentage terms (Chen and Roth, Logs with Zeros? Some Problems and Solutions, Quarterly Journal of Economics, 2024).
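As a minimal sketch of the estimation step, the following code fits OLS and computes standard errors clustered at the applicant level on simulated data. It assumes NumPy is available and applies a common small-sample correction to the cluster-robust variance. The data-generating process, the effect size of 0.5, and all names are illustrative and say nothing about expected results.

```python
import numpy as np

def ols_cluster(y, X, clusters):
    """OLS coefficients with cluster-robust (sandwich) standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    clusters = np.asarray(clusters)
    n, k = X.shape

    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # "Meat": sum over clusters of the outer product of within-cluster scores.
    groups = np.unique(clusters)
    meat = np.zeros((k, k))
    for g in groups:
        idx = clusters == g
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)

    # Small-sample correction (an assumption; conventions differ).
    n_groups = len(groups)
    correction = n_groups / (n_groups - 1) * (n - 1) / (n - k)
    vcov = correction * xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(vcov))

# Simulated spillover sample: three non-applicant respondents per applicant.
rng = np.random.default_rng(0)
n_applicants, per_applicant = 400, 3
applicant_id = np.repeat(np.arange(n_applicants), per_applicant)
malengo = (np.arange(n_applicants) % 2)[applicant_id]        # treatment dummy
applicant_shock = rng.normal(size=n_applicants)[applicant_id]  # shared within cluster
y = 0.5 * malengo + applicant_shock + rng.normal(size=applicant_id.size)

X = np.column_stack([np.ones_like(y), malengo])
beta, se = ols_cluster(y, X, applicant_id)
```

Here beta[1] is the estimate of b in the equation above. Clustering matters in this setting because respondents linked to the same applicant share both the treatment status and unobserved shocks.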
We will test for effect heterogeneity along the following dimensions: (i) gender, (ii) ability (based on baseline grades), (iii) socio-economic status (based on per-capita consumption expenditures of parents’ households). We will do so by interacting the treatment dummy with a variable that captures the respective dimension of heterogeneity. We may also consider exploring effect heterogeneity using modern machine-learning methods.
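One way to write the interacted specification is shown below. The symbols H_i (the heterogeneity variable), d, and e are notation introduced here for illustration and do not appear in the equation above.

```latex
Y_{it} = a + b\,\mathrm{Malengo}_i
           + d\,(\mathrm{Malengo}_i \times H_i)
           + e\,H_i
           + X_i' c + u_{it}
```

In this notation, b is the treatment effect for applicants with H_i = 0, and d is the difference in the treatment effect per unit of H_i. Where H_i is already spanned by the fixed effects in X_i (for example, gender through the randomization strata), its main effect e is not separately identified.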
We will rely on outcome indices, as defined by Anderson (2008), to reduce the number of hypotheses. These indices are inverse-covariance-weighted averages of the z-scores of individual outcomes, where individual outcomes are recoded so that higher values correspond to “more favorable” outcomes. In addition, we will adjust for multiple testing across the primary outcomes within types of respondents, controlling for the false discovery rate. We will not adjust for multiple testing across secondary outcomes, individual outcomes within domains, types of respondents, or dimensions of heterogeneity, as we put less emphasis on these results.
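The index construction and the multiple-testing adjustment can be sketched as follows. Several choices are assumptions rather than part of the plan: outcomes are standardized with the control group's mean and standard deviation, the covariance matrix is estimated on the full sample, and the Benjamini-Hochberg step-up procedure stands in for the unspecified false discovery rate method. The code assumes NumPy is available.

```python
import numpy as np

def anderson_index(outcomes, control_mask):
    """Inverse-covariance-weighted average of z-scores.

    `outcomes` is an (n, k) array already recoded so that higher is more favorable.
    """
    outcomes = np.asarray(outcomes, dtype=float)
    control_mask = np.asarray(control_mask, dtype=bool)

    # Standardize each outcome using control-group moments (assumption).
    mu = outcomes[control_mask].mean(axis=0)
    sd = outcomes[control_mask].std(axis=0, ddof=1)
    z = (outcomes - mu) / sd

    # Weights are the row sums of the inverse covariance matrix of the z-scores,
    # so highly correlated outcomes receive less weight.
    sigma_inv = np.linalg.inv(np.cov(z, rowvar=False))
    weights = sigma_inv.sum(axis=1)
    return z @ weights / weights.sum()

def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            max_rank = rank  # largest rank meeting the step-up threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[i] = True
    return rejected

# Simulated data: three correlated outcomes for 200 respondents.
rng = np.random.default_rng(0)
n = 200
common = rng.normal(size=n)
outcomes = 0.5 * common[:, None] + rng.normal(size=(n, 3))
control = np.arange(n) % 2 == 0
index = anderson_index(outcomes, control)

# Illustrative p-values for four primary outcomes.
decisions = benjamini_hochberg([0.001, 0.02, 0.04, 0.30], q=0.05)
```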
We will consider replacing any methods mentioned above with superior methods if they become available by the time of analysis.
Note that we follow the guidance provided by Duflo et al. (2020) on pre-analysis plans and only use these fields in the AEA RCT Registry rather than a separate document.