Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
We conduct two types of power calculations. First, we use observed effect sizes from our pilot in May 2019, together with theoretical predictions from Ericson (2017) for the cases we did not test in the pilot, to determine the minimum sample size needed in each arm for each pairwise comparison of interest. We use these power calculations to determine how to optimally allocate our fixed total sample (30,000 firms to be allocated to arms 2 through 15). Second, after conducting the randomization, we take as given the sample size in each arm and calculate the minimum detectable effect for each pairwise comparison of interest. Note that for measuring the effect of a deadline, an anticipated reminder, and an unanticipated reminder, in the main results we will pool the 2.75% and 3.00% cross-randomized groups, as both are relevant potential fee reductions. Thus, the treatment effects will be a weighted average of the effect of a deadline or reminder with a 2.75% fee offer and the effect of a deadline or reminder with a 3.00% fee offer.
Minimum sample size per arm: In the pilot we estimated take-up among merchants paying a 3.75% merchant fee who were offered a reduction to a 3.50% merchant fee. Relative to a control group that received no offer, we estimated the treatment effects on acceptance of the offer of an email with no deadline and of an email with a 24-hour deadline. We did not test deadlines of different lengths, as we will do in the RCT. Thus, for the expected effect of both the one-week and the one-day deadline we use the effect of this 24-hour deadline from our pilot.
After three days, we also sent unanticipated reminders to merchants, but did not randomize these reminders. We thus estimate the expected effect of a reminder by comparing take-up right before the reminder to 24 hours after the reminder. With these groups, we use the equation for minimum sample size from List, Sadoff, and Wagner (2011, equation 8), plugging in the observed P0 and P1 from the experiment to calculate n*. Doing so, we calculate the following minimum sample sizes per arm for each pairwise comparison of interest.
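To make this calculation concrete, the sketch below uses the standard two-proportion sample size formula. The alpha = 0.05 and power = 0.80 conventions and the exact variance term are assumptions for illustration; the figures reported below come from List, Sadoff, and Wagner's equation 8 applied to the pilot data and may differ slightly due to the variance term and rounding used.

```python
from scipy.stats import norm

def min_n_per_arm(p0, p1, alpha=0.05, power=0.80):
    """Minimum sample size per arm to detect a change in take-up from p0 to p1
    with a two-sided test. A sketch of the standard two-proportion formula;
    alpha = 0.05 and power = 0.80 are assumed conventions."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    variance = p0 * (1 - p0) + p1 * (1 - p1)  # sum of Bernoulli variances in the two arms
    return z ** 2 * variance / (p1 - p0) ** 2

# Example: control vs. an offer with no deadline in the pilot (P0 = 0.01, P1 = 0.18)
print(min_n_per_arm(0.01, 0.18))  # on the order of the ~46 reported below
```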
Because we will be pooling the 2.75% and 3.00% fee cross-randomization within each treatment arm defined by the combination of deadline type and reminder type, for the description below we consider the following 8 (pooled) treatment arms:
T1) Control
T2) No deadline, no reminder
T3) No deadline, anticipated reminder
T4) No deadline, unanticipated reminder
T5) One-week deadline, no reminder
T6) One-week deadline, anticipated reminder
T7) One-week deadline, unanticipated reminder
T8) One-day deadline, no reminder
Control (no offer) vs. No Deadline (T1 vs. T2)
• P0 = 0.01, P1 = 0.18. Minimum sample size per arm to detect this effect = 46.
Control (no offer) vs. Deadline (T1 vs. T5)
• P0 = 0.01, P1 = 0.28. Minimum sample size per arm to detect this effect = 26.
No Deadline vs. Deadline (T2 vs. T5)
• P0 = 0.18, P1 = 0.28. Minimum sample size per arm to detect this effect = 277.
No Deadline vs. No Deadline with Unanticipated Reminder (T2 vs. T4)
• P0 = 0.10, P1 = 0.14. Minimum sample size per arm to detect this effect = 1472.
Deadline vs. Deadline with Unanticipated Reminder (T5 vs. T7)
• P0 = 0.20, P1 = 0.24. Minimum sample size per arm to detect this effect = 1529.
In the pilot there was no anticipated reminder treatment group. To obtain an estimate of the expected effect size of the anticipated reminder and perform power calculations, we benchmark results from the pilot with model simulations based on the model in Ericson (2017), assuming standard magnitudes for present-bias and forgetfulness from the literature, and assuming full naïveté about present-bias but accurate beliefs about memory (using Ericson's terminology, beta = 0.9, beta_hat = 1, rho = 0.95, rho_hat = 0.95). The model simulations predict the ratio of the difference in take-up between the unanticipated-reminder and anticipated-reminder groups to the difference in take-up between the unanticipated-reminder and no-reminder groups. From the pilot we take the difference in take-up between the unanticipated-reminder and no-reminder groups and apply this ratio to obtain an estimated take-up rate for the anticipated-reminder groups. The ratio from the model is measured in the period when the reminder is sent, one period before the deadline. With these simulated take-up rates for groups with an anticipated reminder, we obtain a treatment effect ratio of 1.23, which implies the following necessary sample sizes to detect differences in the relevant pairwise comparisons (the imputation step is sketched after the list below):
No Deadline with Unanticipated Reminder vs. No Deadline with Anticipated Reminder (T4 vs. T3)
• P0 = 0.14, P1 = 0.18. Minimum sample size per arm to detect this effect = 1222.
Deadline with Unanticipated Reminder vs. Deadline with Anticipated Reminder (T7 vs. T6)
• P0 = 0.24, P1 = 0.29. Minimum sample size per arm to detect this effect = 1152.
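As referenced above, the following is a minimal sketch of one way the imputation of anticipated-reminder take-up could be carried out. The function name, the exact way the 1.23 ratio enters, and the rounding are illustrative assumptions rather than the pilot or simulation code; the P1 values used above come from the actual pilot data and model output.

```python
def impute_anticipated_takeup(p_no_reminder, p_unanticipated, model_ratio=1.23):
    """Scale the pilot reminder effect (unanticipated reminder vs. no reminder) by the
    model-implied ratio to impute take-up under an anticipated reminder.
    How the ratio is applied here is an assumption for illustration."""
    reminder_effect = p_unanticipated - p_no_reminder
    return p_unanticipated + model_ratio * reminder_effect

# No-deadline arms from the pilot (see the comparisons above):
print(impute_anticipated_takeup(0.10, 0.14))  # roughly in line with the 0.18 used for T3
# One-week-deadline arms:
print(impute_anticipated_takeup(0.20, 0.24))  # roughly in line with the 0.29 used for T6
```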
To obtain a minimum sample size per arm for our study, we select for each group the largest sample size required across its relevant pairwise comparisons above. For group T8, we assume we need the same number of observations as for group T5, since both include a deadline and no reminder. We calculate the following minimum sample sizes per arm to detect the expected effect sizes based on our pilot and simulations:
T1) Control: 46
T2) No deadline, no reminder: 1472
T3) No deadline, anticipated reminder: 1222
T4) No deadline, unanticipated reminder: 1472
T5) One-week deadline, no reminder: 1529
T6) One-week deadline, anticipated reminder: 1152
T7) One-week deadline, unanticipated reminder: 1529
T8) One-day deadline, no reminder: 1529
Thus, we need 9,951 observations in total across all treatment arms to statistically detect the expected differences in take-up between the treatment groups of interest, based on outcomes in our pilot and simulations of the Ericson (2017) model. Our available sample for the experiment is 34,010 firms, with 4,010 assigned to control per our partner’s preferences and the remaining 30,000 available to allocate across T2 through T8. Since this is much larger than the 9,951 needed, we scale up the sample sizes of treatment arms T2-T8 proportionally to arrive at the per-arm sample sizes shown under “Sample size by treatment arm.”
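The proportional scaling can be sketched as follows. The final per-arm counts are those listed under “Sample size by treatment arm” and may differ slightly from this illustration depending on the rounding rule applied.

```python
# Minimum per-arm sizes for T2-T8 from the power calculations above (9,905 in total)
min_n = {"T2": 1472, "T3": 1222, "T4": 1472, "T5": 1529,
         "T6": 1152, "T7": 1529, "T8": 1529}
total_available = 30_000                       # firms available for the treatment arms

scale = total_available / sum(min_n.values())  # roughly 3.03
allocation = {arm: round(n * scale) for arm, n in min_n.items()}
print(allocation, sum(allocation.values()))    # sums to ~30,000 up to rounding
```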
MDE: For each pairwise comparison of interest between arms T1 through T8, given the allocation of merchants across treatment arms listed above under “Sample size by treatment arm,” we can calculate the minimum detectable effect we are powered to detect. To do so, we use the formula for the minimum detectable effect from List, Sadoff, and Wagner (2011, equation 5), plugging in sigma^2_0 = P0*(1 - P0) and sigma^2_1 = P1*(1 - P1). We plug in n_0 and n_1 as the sample sizes of the two arms in the comparison, plug in P0 from our pilot data for the relevant group, and solve for P1 and thus for the MDE (= P1 - P0). We express the MDE in percentage points and also divide by the standard deviation to obtain the MDE in standard deviations. The standard deviation we divide by is the standard deviation for all merchants in the comparison of interest, i.e. sqrt(p_bar*(1-p_bar)), where p_bar = (P0 + P1)/2.
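As a sketch of this calculation, the code below solves for the MDE numerically, since the treatment-arm variance depends on P1 itself. The alpha = 0.05 and power = 0.80 conventions and the example sample sizes are illustrative assumptions; the figures below use the actual per-arm allocations.

```python
from scipy.stats import norm
from scipy.optimize import brentq

def mde(p0, n0, n1, alpha=0.05, power=0.80):
    """Minimum detectable difference in take-up (P1 - P0), in the spirit of the
    MDE formula described above. Solved numerically because sigma^2_1 depends on P1;
    alpha = 0.05 and power = 0.80 are assumed conventions."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)

    def gap(delta):
        p1 = p0 + delta
        se = (p0 * (1 - p0) / n0 + p1 * (1 - p1) / n1) ** 0.5
        return delta - z * se

    delta = brentq(gap, 1e-6, 1 - p0 - 1e-6)
    p_bar = p0 + delta / 2
    sd = (p_bar * (1 - p_bar)) ** 0.5   # SD for all merchants in the comparison
    return delta, delta / sd            # MDE in levels and in SD units

# Hypothetical example: P0 = 0.18 with roughly 4,500 merchants in each arm
print(mde(0.18, 4500, 4500))  # around 2.3-2.4 pp, in line with the T5 vs. T2 figure below
```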
Effect of the offer (conditional on no deadline).
• T2 vs. T1: MDE = 0.73 percentage points. MDE (SD) = 0.07 SD.
Effect of the offer (conditional on deadline).
• T5 vs. T1: MDE = 0.72 percentage points. MDE (SD) = 0.07 SD.
Effect of a deadline (conditional on no reminder).
• T5 vs. T2: MDE = 2.32 percentage points. MDE (SD) = 0.06 SD. (Note the MDE in SD went down relative to the previous comparison even though the MDE in percentage points went up, since p_bar is closer to 0.5 and hence the standard deviation we divide by is larger for this comparison.)
Effect of an unanticipated reminder (conditional on no deadline).
• T4 vs. T2: MDE = 1.88 percentage points. MDE (SD) = 0.06 SD.
Effect of an anticipated reminder (conditional on reminder, no deadline).
• T3 vs. T4: MDE = 2.22 percentage points. MDE (SD) = 0.06 SD.
Effect of an unanticipated reminder (conditional on deadline).
• T7 vs. T5: MDE = 2.37 percentage points. MDE (SD) = 0.06 SD.
Effect of an anticipated reminder (conditional on reminder, deadline).
• T6 vs. T7: MDE = 2.74 percentage points. MDE (SD) = 0.06 SD.