Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
We surveyed the literature on cash transfers to gauge the range of plausible effect sizes that we want to be powered to detect. Transfers are around 15% of household income, similar in magnitude to those studied in Schady and Rosero (2008) and Paxson and Schady (2010) in Ecuador, Amarante et al. (2011) in Uruguay, and Macours et al. (2012) in Nicaragua. The most comparable is the Ecuadorian program, which improved development by 0.18 standard deviations among the poorest quartile of children in its sample; given the higher income levels in Ecuador, that group should be roughly comparable to our sample in Jharkhand. We wish to detect effects of that size for each of the questions of interest. To be slightly conservative, our calculations focus on a minimum detectable effect size of 0.15 standard deviations.
Intracluster Correlation Calculations
Since our randomization is clustered at the AWC level, we must account for intracluster correlation in our power calculations. We use the 2006 wave of the Indian National Family Health Survey to calculate the intra-village correlation in child height-for-age and weight-for-age in Jharkhand and neighboring states (Bihar, Madhya Pradesh, Orissa). For these variables, the intracluster correlation is relatively high, at 0.06 and 0.09. To be conservative, we use 0.09 in our calculations.
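As an illustration of how an intracluster correlation can be computed from survey data, the sketch below uses the standard one-way ANOVA estimator. This is an assumption about method: the text does not say which estimator was applied to the survey data, and a real analysis would likely also handle survey weights. The function name `anova_icc` and the toy z-scores are hypothetical.

```python
from collections import defaultdict


def anova_icc(values, clusters):
    """One-way ANOVA estimate of the intracluster correlation (ICC).

    values   -- outcome per child (e.g. a height-for-age z-score)
    clusters -- cluster label per child (e.g. a village identifier)
    """
    groups = defaultdict(list)
    for value, cluster in zip(values, clusters):
        groups[cluster].append(value)

    k = len(groups)                      # number of clusters
    n_total = len(values)                # number of children
    grand_mean = sum(values) / n_total

    ss_between = 0.0
    ss_within = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        ss_between += len(members) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in members)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Adjusted average cluster size, which allows for unequal cluster sizes.
    n0 = (n_total - sum(len(g) ** 2 for g in groups.values()) / n_total) / (k - 1)

    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Hypothetical toy data: three villages with a few children each.
toy_scores = [-1.9, -2.1, -1.7, -0.8, -1.0, -1.2, -2.6, -2.4]
toy_villages = ["v1", "v1", "v1", "v2", "v2", "v2", "v3", "v3"]
toy_icc = anova_icc(toy_scores, toy_villages)
```

The estimator can return negative values when children within a cluster differ more than the cluster means do; in practice such estimates are usually treated as zero.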
Number of Participants Per Cluster
The average number of women per AWC is estimated to be 5, but in the first round of registrations, we observed substantial heterogeneity across AWCs. In some clusters, there were as many as 17 women registered, while in others, none registered. We calculate the coefficient of variation in number of registrations across AWCs (0.665), and take this into account in the power calculations. When we randomize, we will also stratify by number of registrations in an AWC to maximize power.
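To show how variation in cluster size enters a power calculation, the sketch below uses a commonly cited design-effect approximation for unequal cluster sizes, DEFF = 1 + ((1 + CV^2) * m - 1) * ICC, where m is the mean cluster size. That this is the adjustment used here is an assumption; the text only states that the coefficient of variation is taken into account. The function name `design_effect` is hypothetical.

```python
def design_effect(mean_cluster_size, icc, cv=0.0):
    """Variance inflation from clustering, allowing for unequal cluster sizes.

    cv is the coefficient of variation of cluster size; cv=0 gives the
    familiar equal-size design effect 1 + (m - 1) * icc.
    """
    return 1.0 + ((1.0 + cv ** 2) * mean_cluster_size - 1.0) * icc


# Values taken from the text: 5 women per AWC, ICC of 0.09, CV of 0.665.
deff_equal = design_effect(5, 0.09)              # ignores size variation
deff_unequal = design_effect(5, 0.09, cv=0.665)  # accounts for size variation
```

Under these assumptions the design effect rises from 1.36 with equal cluster sizes to about 1.56 once the variation in registrations is included, which is why ignoring it would overstate power.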
One way of increasing power is to collect baseline data on variables that will be correlated with the outcome at endline (such as the outcome at baseline). We will not conduct a baseline survey, for two reasons. First, it is not possible to identify beneficiaries until after they have registered, and the transfers must go out as soon as possible after registration, which leaves no time for a baseline survey. Second, the primary outcomes of interest, such as child weight and height, cannot be measured at baseline, since the child is still in utero. In the power calculations, we thus do not assume any power boost from a baseline survey.
The power calculations are different for each of the questions of interest. For example, to calculate the effect of a single year of cash transfers, we can combine the two treatment groups that receive a year of transfers and compare them to the control group. However, to determine the effect of receiving two years of transfers, we can only compare that single treatment group to the control group. For all of the questions of interest, we seek to detect an effect of 0.15 standard deviations with 80% power. We thus set the sample size using the comparisons that involve the fewest observations, and then calculate our power for the other questions of interest under this sample size.
Based on this, a sample size of 240 AWCs per treatment arm allows us to detect an effect size of 0.15 standard deviations at the 5% significance level with 80% power. For this sample size, the table below shows the level of power for different effect sizes.
| Question of interest | Effect size of 0.1 SD | Effect size of 0.12 SD | Effect size of 0.15 SD |
| --- | --- | --- | --- |
| Comparison of 1 treatment arm to 1 treatment arm (e.g. effect of 1 year of transfers in utero vs. after child is born) | 0.48 | 0.63 | 0.82 |
| Comparison of 1 treatment arm to 2 treatment arms (e.g. effect of 1 year of transfers vs. 2 years of transfers) | 0.66 | 0.82 | 0.95 |
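The following is a rough sketch of how power figures of this kind can be approximated. It assumes a two-sided test at the 5% level, a normal approximation, outcomes standardized to unit variance, 5 women per AWC, and the unequal-cluster-size design effect 1 + ((1 + CV^2) * m - 1) * ICC. None of these choices is confirmed by the text, and all function names are hypothetical, so the output should not be expected to match the table exactly.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def standard_error(clusters_a, clusters_b, mean_cluster_size, deff):
    """Standard error of a difference in means, in SD units, for two groups."""
    n_eff_a = clusters_a * mean_cluster_size / deff  # effective sample sizes
    n_eff_b = clusters_b * mean_cluster_size / deff
    return sqrt(1.0 / n_eff_a + 1.0 / n_eff_b)


def power(effect_sd, se, alpha=0.05):
    """Approximate power of a two-sided test (ignores the far rejection tail)."""
    z_crit = Z.inv_cdf(1.0 - alpha / 2.0)
    return Z.cdf(effect_sd / se - z_crit)


def mde(se, alpha=0.05, target_power=0.80):
    """Minimum detectable effect, in SD units, at the target power."""
    return (Z.inv_cdf(1.0 - alpha / 2.0) + Z.inv_cdf(target_power)) * se


# Assumed inputs from the text: ICC 0.09, CV 0.665, 5 women per AWC, 240 AWCs per arm.
deff = 1.0 + ((1.0 + 0.665 ** 2) * 5 - 1.0) * 0.09

se_one_vs_one = standard_error(240, 240, 5, deff)  # one arm vs. one arm
se_one_vs_two = standard_error(240, 480, 5, deff)  # one arm vs. two pooled arms

mde_one_vs_one = mde(se_one_vs_one)
power_table = {
    effect: (power(effect, se_one_vs_one), power(effect, se_one_vs_two))
    for effect in (0.10, 0.12, 0.15)
}
```

Under these assumptions the minimum detectable effect for a one-arm to one-arm comparison is roughly 0.14 SD, in line with the 0.15 SD target. The approximate power values are close to, but not identical to, those in the table, which may reflect different software or distributional assumptions.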