Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Sample sizes for each group were informed by the results of a pilot conducted in May 2019 and by pairwise power calculations for the comparisons of interest. In the pilot, we offered a reduction in the merchant fee charged to the firm for card payments processed through the FinTech payments company, from 3.75% to 3.5%. The pilot had four groups: a control group, a placebo group that received an email from the FinTech payments company with no messages, a group that received the offer with no deadline, and a group that received the offer with a 24-hour deadline. We also sent two non-randomized reminders to both groups that received the offer after the deadline had passed (the deadline was not binding, so firms could still sign up after it).
For the purpose of the power calculations, we assume that take-up is similar under a 24-hour deadline and a one-week deadline, absent other treatments. The pilot take-up rates for each pairwise comparison, and the sample size needed to detect the corresponding difference, are listed below (using the treatment group numbers above). In each comparison, P0 refers to take-up in the second group listed and P1 to take-up in the first group listed (e.g. if the comparison is 2 vs 1, P0 refers to take-up in group 1 and P1 refers to take-up in group 2). For the reminders, which were not randomized in our pilot, we estimate P1 as cumulative take-up of the offer 24 hours after the reminder was sent and P0 as cumulative take-up immediately before the reminder was sent.
- 2 vs 1: P0 = 0.01, P1 = 0.18. Minimum sample size per arm = 46.
- 5 vs 1: P0 = 0.01, P1 = 0.28. Minimum sample size per arm = 26.
- 5 vs 2: P0 = 0.18, P1 = 0.28. Minimum sample size per arm = 277.
- 4 vs 2: P0 = 0.14, P1 = 0.18. Minimum sample size per arm = 1472.
- 7 vs 5: P0 = 0.20, P1 = 0.24. Minimum sample size per arm = 1529.
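As an illustration of how a per-arm sample size follows from P0 and P1, the sketch below uses the standard normal-approximation formula for a two-sided test of two proportions with a pooled variance under the null. The significance level (0.05) and power (0.80) are assumptions, not values stated in this section, and the function name `n_per_arm` is hypothetical. Under these assumptions the formula gives 46, 26 and 277 for the first three comparisons. Comparisons with small differences in take-up are very sensitive to rounding in the displayed rates, so the sketch checks only those three.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm(p0, p1, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided test of two proportions.

    Assumed (not stated in the text): alpha = 0.05, power = 0.80,
    equal allocation, pooled variance under the null hypothesis.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    # Standard deviation terms under the null (pooled) and the alternative.
    sd_null = sqrt(2 * p_bar * (1 - p_bar))
    sd_alt = sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    n = (z_alpha * sd_null + z_beta * sd_alt) ** 2 / (p1 - p0) ** 2
    return ceil(n)


# Comparisons with large differences, using the displayed pilot rates.
for label, p0, p1 in [("2 vs 1", 0.01, 0.18),
                      ("5 vs 1", 0.01, 0.28),
                      ("5 vs 2", 0.18, 0.28)]:
    print(label, n_per_arm(p0, p1))
```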
The pilot had no anticipated reminder treatment group. To estimate the expected effect size of the anticipated reminder and perform power calculations, we benchmark the pilot results against simulations of the model in Ericson (2017), assuming standard magnitudes for present-bias and forgetfulness from the literature, full naïveté about present-bias, and accurate beliefs about memory (using Ericson’s terminology, beta = 0.9; beta_hat = 1; rho = 0.95; rho_hat = 0.95). The simulations predict the ratio of the difference in take-up between the groups with unanticipated and anticipated reminders to the difference in take-up between the groups with unanticipated and no reminders. We take the difference in take-up between the pilot groups with unanticipated and no reminders and apply this ratio to obtain an estimated take-up rate for groups with an anticipated reminder. The ratio from the model is measured in the period when the reminder is sent, right before the deadline. With these simulated take-up rates for groups with an anticipated reminder, we get a treatment effect ratio of 1.23, which means the sample sizes necessary to detect differences in the relevant pairwise comparisons are:
- 4 vs 3: P0 = 0.14, P1 = 0.18. Minimum sample size per arm = 1222.
- 7 vs 6: P0 = 0.24, P1 = 0.29. Minimum sample size per arm = 1152.
To obtain a minimum sample size per arm for our study, we select the largest sample size needed for each group depending on its relevant pairwise power calculations above. For group 8, we assume we need the same number of observations as for group 5, since both include a deadline and no reminder. We calculate the following minimum sample sizes per arm to detect the expected effect sizes based on our pilot and simulations:
- Control, no messages: 46
- Loan messages with no deadline, no reminder: 1,472
- Loan messages with no deadline, anticipated reminder: 1,529
- Loan messages with no deadline, unanticipated reminder: 1,222
- Loan messages with 1-week deadline, no reminder: 1,152
- Loan messages with 1-week deadline, anticipated reminder: 1,472
- Loan messages with 1-week deadline, unanticipated reminder: 1,529
- Loan messages with 24-hour deadline, no reminder: 1,529
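The selection rule described above (take, for each group, the largest sample size among the pairwise comparisons it appears in, and give group 8 the same requirement as group 5) can be sketched as follows. Results are keyed by the group numbers used in the comparisons, since the mapping from group numbers to arm labels is defined elsewhere in the registration. The variable names are illustrative only.

```python
# Minimum per-arm sample sizes from the pairwise comparisons listed above,
# keyed by (first group, second group).
pairwise_n = {
    (2, 1): 46,
    (5, 1): 26,
    (5, 2): 277,
    (4, 2): 1472,
    (7, 5): 1529,
    (4, 3): 1222,
    (7, 6): 1152,
}

# For each group, keep the largest requirement across its comparisons.
min_n = {}
for (g_a, g_b), n in pairwise_n.items():
    for g in (g_a, g_b):
        min_n[g] = max(min_n.get(g, 0), n)

# Group 8 is not part of any pilot comparison; per the text it is assumed
# to need the same number of observations as group 5.
min_n[8] = min_n[5]

total = sum(min_n.values())
for g in sorted(min_n):
    print(f"group {g}: {min_n[g]}")
print("total:", total)
```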
Thus, we need a total of 9,951 observations across all treatment arms to statistically detect the expected differences in take-up between the treatment groups of interest, based on outcomes in our pilot and simulations of the Ericson (2017) model. Because our available sample for the experiment is 70,020 firms, far more than the 9,951 needed, we scale the sample size of each treatment arm up proportionally to arrive at the per-arm sample sizes shown under “Sample size by treatment arm.”
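One way to carry out such a proportional adjustment is sketched below. It is only an illustration: the rounding rule (largest remainder, so that the arms sum exactly to the available sample) and the function name `scale_proportionally` are assumptions, and the registered per-arm sizes are those reported under “Sample size by treatment arm,” which may have been rounded differently.

```python
def scale_proportionally(minimums, available):
    """Scale per-arm minimums so they sum exactly to `available`.

    Uses integer arithmetic and a largest-remainder rule (an assumption,
    not a rule stated in the text) to distribute rounding leftovers.
    """
    total = sum(minimums)
    # Integer part of each arm's proportional share, and its remainder.
    sizes = [m * available // total for m in minimums]
    remainders = [m * available % total for m in minimums]
    leftover = available - sum(sizes)
    # Give one extra observation to the arms with the largest remainders.
    order = sorted(range(len(minimums)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


# Per-arm minimums in the order listed in the text, and the available sample.
minimums = [46, 1472, 1529, 1222, 1152, 1472, 1529, 1529]
scaled = scale_proportionally(minimums, 70_020)
print(scaled, sum(scaled))
```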