Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Power for biomarker outcomes:
We calculate power for the individually randomized design, controlling for block indicators and pre-trial values of the outcomes. Given the persistence of biomarkers, our power calculations assume that these covariates explain 60 percent of the variation in these outcomes at endline. This yields a minimum detectable effect size of 0.046 standard deviations.
To translate this MDE into the natural units of the outcomes, we use available data from the 2012-2016 participating cohorts in DC Greens' earlier programs. In that closely related population, the standard deviation (SD) for A1c was 2.89; the SD for BMI was 9.12; and the SD for SBP was 17.25. These imply minimum detectable effects of 0.32 for A1c, 1.02 for BMI, and 1.93 for SBP.
Power for medical expenditure:
We propose to use a two-part model for medical costs, as well as a Kolmogorov-Smirnov test for equality of distributions (with the latter as primary, and with randomization inference used to conduct inference in both cases). Simulations demonstrate that these approaches offer power advantages over OLS regressions for such skewed outcomes. However, since closed-form power solutions do not exist for these alternative specifications, we present power calculations for medical expenditures based on an ordinary least squares model. We consider these estimates to be conservative as a consequence.
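As an illustration of how the Kolmogorov-Smirnov statistic can be combined with randomization inference, the sketch below permutes treatment labels to build the null distribution of the KS statistic. It is a simplified, unblocked version (the study's actual analysis would permute assignments within randomization blocks); the function name and the lognormal example data are hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

def ks_randomization_test(y_treat, y_ctrl, n_perm=1000):
    """KS statistic with a permutation (randomization inference) p-value.

    Re-randomizes treatment labels over the pooled sample and counts how
    often the permuted KS statistic is at least as large as the observed one.
    """
    obs = ks_2samp(y_treat, y_ctrl).statistic
    pooled = np.concatenate([y_treat, y_ctrl])
    n_t = len(y_treat)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        count += ks_2samp(perm[:n_t], perm[n_t:]).statistic >= obs
    # Add-one correction keeps the p-value strictly positive.
    return obs, (count + 1) / (n_perm + 1)

# Illustrative skewed (lognormal) cost data with a hypothetical treatment shift.
y_ctrl = rng.lognormal(mean=6.0, sigma=1.5, size=300)
y_treat = rng.lognormal(mean=5.9, sigma=1.5, size=300)
stat, pval = ks_randomization_test(y_treat, y_ctrl, n_perm=500)
```

Because the permutation distribution conditions on the realized data, this test is exact under the sharp null regardless of how skewed the outcome is.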
Medical costs will be analyzed on a monthly basis, with the primary analysis using 12 months of pre-intervention data and 8 months of post-intervention data. Projections about power depend on assumptions about two features of the data-generating process:
(1) the share of variation in medical expenditures that can be explained by the combination of baseline covariates and block-month indicators; and
(2) the autocorrelation in medical expenditure outcomes.
Assuming that covariates explain 40 percent of the variation in outcomes (baseline biomarkers and other pre-intervention health measures can be combined with survey data to predict medical expenditures) and that the degree of autocorrelation in medical expenditures is 0.6, the study would be powered to detect an impact of 0.06 standard deviations.
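The kind of Monte Carlo calculation underlying this projection can be sketched as follows. The sketch assumes unit-variance monthly outcomes with a persistent individual component that generates autocorrelation ρ, and uses the pre-period mean as the sole covariate in an ANCOVA-style OLS regression; the sample size shown is hypothetical, and the sketch is not calibrated to reproduce the 0.06 figure exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_power(n_per_arm=500, t_pre=12, t_post=8, rho=0.6,
                   effect_sd=0.06, n_sims=200):
    """Monte Carlo power for an ANCOVA-style OLS estimator on monthly panel data.

    Unit-variance outcomes are split into a persistent individual component
    (variance rho) and month-level noise (variance 1 - rho), so the
    month-to-month autocorrelation is rho.
    """
    n = 2 * n_per_arm
    treat = np.repeat([0.0, 1.0], n_per_arm)
    rejections = 0
    for _ in range(n_sims):
        mu = rng.normal(0.0, np.sqrt(rho), n)
        pre = mu[:, None] + rng.normal(0.0, np.sqrt(1 - rho), (n, t_pre))
        post = mu[:, None] + rng.normal(0.0, np.sqrt(1 - rho), (n, t_post))
        post = post + effect_sd * treat[:, None]
        y = post.mean(axis=1)
        # OLS of the post-period mean on treatment and the pre-period mean.
        X = np.column_stack([np.ones(n), treat, pre.mean(axis=1)])
        XtX_inv = np.linalg.inv(X.T @ X)
        beta = XtX_inv @ (X.T @ y)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        t_stat = beta[1] / np.sqrt(sigma2 * XtX_inv[1, 1])
        rejections += abs(t_stat) > 1.96
    return rejections / n_sims
```

In practice the simulation would be run over a grid of effect sizes, with the MDE read off as the smallest effect detected with 80 percent probability.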
To map the MDE expressed in standard deviations to dollar terms, we note that, according to the Kaiser Family Foundation, average per member per month (PMPM) Medicaid spending is $750. We expect the distribution of monthly costs to be highly skewed, with a high fraction of zeros and a long right tail. Based on other studies of the Medicaid population, the standard deviation of monthly medical costs is approximately $2,000. An impact of 0.06 standard deviations therefore corresponds to $120 per month, or 16 percent of average monthly expenditure. A reduction in costs of approximately $100 per month is the effect size required for the program to justify its cost.
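The dollar mapping is simple arithmetic on the figures above:

```python
# Back-of-envelope mapping from SD units to dollars (figures from the text).
mde_sd = 0.06        # minimum detectable effect, in standard deviations
sd_monthly = 2000    # approx. SD of monthly medical costs ($), prior Medicaid studies
pmpm = 750           # average per member per month Medicaid spending ($), KFF

mde_dollars = mde_sd * sd_monthly      # $120 per month
share_of_mean = mde_dollars / pmpm     # 0.16, i.e. 16 percent of PMPM spending
```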
As noted above, in the presence of such skewed outcomes, the alternative specifications are likely to be substantially better powered than an OLS regression to detect impacts on average expenditure. For example, a Kolmogorov-Smirnov test can reduce the MDE for skewed data to as little as a third of the MDE in an OLS regression, as the PIs have shown elsewhere.
Power for grocery expenditure:
Power for food expenditure outcomes, measured using point-of-sale data from Giant, is strengthened by the multiplicity of post-intervention time periods: we have a total of 36 post-intervention weeks (or the equivalent, for individuals in the control group). This compensates considerably for the absence of pre-intervention point-of-sale data. We further assume that grocery expenditure data will be available only for a subset of 200 individuals who can be enrolled (with the encouragement of a one-off voucher) to share their grocery expenditure data within the study.
Statistical power for these outcomes is determined by
(1) the share of expenditure outcomes explained by baseline covariates and by block-week fixed effects, which we denote collectively by R_X^2; and
(2) the extent of autocorrelation in an individual’s post-treatment expenditure outcomes, which we denote by ρ.
We anticipate that food expenditures are highly correlated for a given household from week to week. Applying values of ρ=R_X^2=0.6 gives a minimum detectable effect size of 0.32 standard deviations in food expenditure.
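Under a simplified design (two equal arms, an exchangeable autocorrelation ρ across an individual's post-period weeks, and covariates absorbing a share R_X^2 of outcome variance), the generic ingredients of this calculation can be written as below. This stylized formula ignores design details such as blocking and pre-period data, so it will not reproduce the study's exact MDE; it is meant only to show how ρ and R_X^2 enter.

```python
import numpy as np
from scipy.stats import norm

def mde_post_means(n, t_post, rho, r2_x, alpha=0.05, power=0.80):
    """Stylized MDE (in SD units) for comparing post-period means across
    two equal arms of n/2 individuals each.

    The variance of an individual's mean over t_post weeks under an
    exchangeable correlation rho is (1 + (t_post - 1) * rho) / t_post;
    covariates and block-week fixed effects absorb a share r2_x of variance.
    """
    m = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    var_mean = (1 - r2_x) * (1 + (t_post - 1) * rho) / t_post
    return m * np.sqrt(4.0 * var_mean / n)
```

As the formula makes explicit, higher autocorrelation raises the MDE (extra weeks add less independent information), while better-predicting covariates lower it.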
To map this minimum detectable effect size, expressed in standard deviations of food expenditure, into a dollar value, we require an estimate of the standard deviation of food expenditure. We derive this by combining data from two sources. First, the USDA-financed Evaluation of the Healthy Incentives Pilot (HIP): Final Report (Bartlett et al., 2014), which studies a similarly low-income (SNAP-eligible) population, reports a standard deviation of weekly household fruit and vegetable purchases, measured in cup equivalents, of 10.9. Second, USDA guidelines on fruit and vegetable consumption imply a current price of $1.00 per cup equivalent. We therefore estimate the standard deviation of weekly fruit and vegetable expenditure as $10.90.
This estimate of the standard deviation of weekly fruit and vegetable expenditure, combined with our estimated minimum detectable effect size (in standard deviation units), implies that we are powered to detect an increase in fruit and vegetable expenditure of $3.54 with 80 percent probability. Compared with a weekly voucher of $20, this is a very small effect. It suggests that, even under a range of alternative assumptions about the properties of measured expenditure, we will be highly powered to detect policy-relevant impacts.