Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
We use results from a pilot study on Prolific to calculate the minimum detectable effect size, assuming a Type I error rate of 5%, 80% power, a mean score of 50 (out of 100), and a standard deviation of 30. To be conservative, we use a slightly higher standard deviation than what was observed in the pilot study. The treatment-to-control ratio is 3:1, reflecting our design with three treatment arms and one control group.
With a sample size of 800, the minimum detectable effect size is 10 percentage points (p.p.), meaning the treated group must improve by at least 10 p.p. more than the control group for the effect to be detectable. As the sample size increases, the minimum detectable effect decreases: with 1,000 observations, it drops to 9 p.p.; with 1,200 and 1,400 observations, it decreases further to 8 p.p.; and with 1,600 observations, it reaches 7 p.p.
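The calculation above can be sketched with the standard normal-approximation formula for a two-sample mean comparison. This is a minimal illustration under assumed equal arm sizes (N/4 per arm, with three treatment arms plus one control) and the stated inputs (5% two-sided Type I error, 80% power, SD of 30); the exact formula and any adjustments (for example, a multiple-comparison correction across the three treatment contrasts) used in the registered calculation may differ, so the outputs are illustrative only.

```python
# Sketch of a minimum detectable effect (MDE) calculation for comparing
# one treatment arm against the control arm, using the standard
# normal-approximation formula. Arm sizes of N/4 are an assumption.
from statistics import NormalDist

def mde(n_per_arm: int, sd: float = 30.0,
        alpha: float = 0.05, power: float = 0.80) -> float:
    """MDE (in score points) for a two-sample comparison of means
    with n_per_arm observations in each of the two groups."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * sd * (2 / n_per_arm) ** 0.5

for total_n in (800, 1000, 1200, 1400, 1600):
    print(total_n, round(mde(total_n // 4), 1))
```

With these defaults the unadjusted formula yields MDEs somewhat below the figures reported above, which would be consistent with the reported numbers incorporating an additional adjustment; the sketch is meant only to make the inputs explicit.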
Due to funding constraints that limit our sample size, we do not expect to be adequately powered to detect differences in selection patterns across the different versions of the education material at conventional significance levels. Instead, we plan to examine the correlation between participants' willingness to pay for each treatment and their initial financial task scores, and then test whether these correlations differ across treatments. Based on our power calculations, with a sample of 800 to 1,000 participants we can only detect differences in correlations of approximately 15–20 percentage points or more; that is, the correlation between the baseline financial literacy score and willingness to pay would need to differ by at least 15–20 percentage points across treatments to be statistically detectable. We believe such large differences are unlikely given the similar nature of the education materials.
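One common way to frame the detectable difference between two correlations is a Fisher z-test for two independent samples. The sketch below is an illustrative assumption, not the registered procedure: it assumes equal arm sizes and reports the smallest detectable difference on the Fisher z scale, which approximates the difference in correlations when the correlations themselves are small.

```python
# Sketch: smallest detectable difference between two independent
# correlations via Fisher's z-transform, under assumed equal arm sizes.
from math import sqrt
from statistics import NormalDist

def detectable_corr_diff(n1: int, n2: int,
                         alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest detectable difference on the Fisher z scale
    (close to the raw correlation difference for small correlations)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # SE of z1 - z2
    return (z_alpha + z_power) * se

# Two arms of 200 participants each (e.g. total N = 800 split 4 ways):
print(round(detectable_corr_diff(200, 200), 2))
```

The registered calculation may rest on different assumptions (arm sizes, one- versus two-sided tests, or a different test of correlation equality), so this sketch serves only to show the mechanics of such a power calculation.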