Minimum detectable effect size for main outcomes (accounting for sample
design and clustering)
Experiment 1: We conduct ex ante power calculations for the three pairwise treatment comparisons (Incentives vs. Information, Incentives vs. Control, and Information vs. Control) under a two-sided test with 5% significance and 80% power. The sample consists of 1,200 observations in the pooled Incentives group and 400 observations each in the Information and Control groups. For the primary binary outcome, whether a respondent calls back (control mean = 0.148), we estimate minimum detectable effects (MDEs) of 5.7 percentage points (38.8% of the control mean) for the two comparisons involving the Incentives group (Incentives vs. Information and Incentives vs. Control) and 7.0 percentage points (47.5% of the control mean) for the Information vs. Control comparison. For call duration (measured in seconds, unconditional on callback; control mean = 22.22, SD = 76.53), the corresponding MDEs are 12.4 seconds (0.16 SD) for comparisons involving the Incentives group and 15.2 seconds (0.20 SD) for Information vs. Control.
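The Experiment 1 figures can be checked with a short calculation. The sketch below is an illustration only: it assumes the standard normal-approximation formula for a difference in means between two independent arms, with the binary outcome's SD evaluated at the control mean and no adjustment for clustering or covariates. The text does not state the exact formula used, and the function and variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mde(sd, n1, n2, alpha=0.05, power=0.80):
    """Minimum detectable difference in means between two independent arms.

    Assumes a two-sided test, a normal approximation, a common outcome SD
    in both arms, and no clustering or covariate adjustment.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    return (z_alpha + z_power) * sd * sqrt(1 / n1 + 1 / n2)


# Inputs reported in the text for Experiment 1.
p0 = 0.148                          # control mean of the callback indicator
sd_callback = sqrt(p0 * (1 - p0))   # Bernoulli SD evaluated at the control mean
sd_duration = 76.53                 # control SD of call duration, in seconds

results = {}
for label, n1, n2 in [("Incentives vs. other arm", 1200, 400),
                      ("Information vs. Control", 400, 400)]:
    results[label] = (mde(sd_callback, n1, n2), mde(sd_duration, n1, n2))

for label, (mde_callback, mde_duration) in results.items():
    print(f"{label}: callback MDE = {100 * mde_callback:.1f} pp "
          f"({100 * mde_callback / p0:.1f}% of control mean), "
          f"duration MDE = {mde_duration:.1f} s "
          f"({mde_duration / sd_duration:.2f} SD)")
```

Under these assumptions the calculation gives 5.7 percentage points and 12.4 seconds for the 1,200 vs. 400 comparisons, and 7.0 percentage points and 15.2 seconds for the 400 vs. 400 comparison, matching the reported values.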
Experiment 2: We conduct ex ante power calculations for three pairwise comparisons: Female-tailored message vs. Control, Male-tailored message vs. Control, and Female- vs. Male-tailored messages. We assume a total sample of 2,000 observations allocated equally across the three arms (approximately 667 observations per arm) and a two-sided test with 5% significance and 80% power. For the binary outcome of picking up a push call (control mean = 0.48), the MDE is 7.7 percentage points, or approximately 16.0% of the control mean. For the length of the push call message heard (in seconds, not conditional on picking up; control mean = 63.46, SD = 86.35), the MDE is 13.3 seconds, equivalent to 0.15 SD. Under equal allocation, these MDEs apply to each of the three pairwise comparisons. Because the SD of the duration outcome exceeds its mean, the distribution is likely right-skewed, with a mass at short durations and a long upper tail; we therefore interpret power for this outcome as conservative and will complement the main analysis with transformations of the outcome or robustness checks (e.g., excluding very short call durations).
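As a rough check on the Experiment 2 figures, the sketch below applies the equal-allocation form of the same normal-approximation MDE formula. It is an assumption that this is the formula behind the reported numbers; the helper name and the choice to evaluate both 666 and 667 observations per arm (2,000 does not divide evenly by three) are illustrative. It also reports the ratio of SD to mean for the duration outcome, which is the dispersion measure behind the right-skew remark above.

```python
from math import sqrt
from statistics import NormalDist


def mde_equal_arms(sd, n_per_arm, alpha=0.05, power=0.80):
    """MDE for a difference in means between two equally sized arms.

    Assumes a two-sided test, a normal approximation, and no clustering
    or covariate adjustment.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * sd * sqrt(2 / n_per_arm)


# Inputs reported in the text for Experiment 2.
p0 = 0.48                              # control mean of the pickup indicator
sd_pickup = sqrt(p0 * (1 - p0))        # Bernoulli SD evaluated at the control mean
mean_listen, sd_listen = 63.46, 86.35  # seconds of message heard (unconditional)

# 2,000 / 3 is not an integer, so evaluate both neighbouring arm sizes.
mde_by_arm_size = {}
for n_per_arm in (666, 667):
    mde_by_arm_size[n_per_arm] = (
        mde_equal_arms(sd_pickup, n_per_arm),
        mde_equal_arms(sd_listen, n_per_arm),
    )

for n_per_arm, (mde_pickup, mde_listen) in mde_by_arm_size.items():
    print(f"n = {n_per_arm} per arm: pickup MDE = {100 * mde_pickup:.2f} pp "
          f"({100 * mde_pickup / p0:.1f}% of control mean), "
          f"duration MDE = {mde_listen:.2f} s "
          f"({mde_listen / sd_listen:.2f} SD)")

# Ratio of SD to mean; a value above 1 for a non-negative outcome
# is consistent with a right-skewed distribution.
cv_listen = sd_listen / mean_listen
print(f"SD / mean for duration = {cv_listen:.2f}")
```

Under these assumptions the pickup MDE is about 7.7 percentage points for either arm size, and the duration MDE is about 13.25 seconds, falling just either side of the rounding boundary (closer to 13.3 with 666 per arm, closer to 13.2 with 667), so the reported 13.3 seconds is reproduced to within rounding.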