|
Field
Abstract
|
Before
Persistent disparities in financial literacy contribute to income and wealth inequality. This project investigates how individuals self-select into financial education and explores the welfare impacts of such selection. We pair rich administrative data with a lab experiment that incentivizes participation in financial education among individuals with different levels of demand. To examine how different types of education material influence take-up, we experimentally vary the format and perceived difficulty of the education material.
|
After
Persistent disparities in financial literacy contribute to income and wealth inequality. This project investigates how individuals self-select into financial education and explores the welfare impacts of such selection. We pair rich administrative data with a lab experiment that incentivizes participation in financial education among individuals with different levels of demand. To examine how different types of education material influence take-up, we experimentally vary the format and perceived difficulty of the education material.
|
|
Field
Last Published
|
Before
September 04, 2025 11:34 AM
|
After
September 04, 2025 04:40 PM
|
|
Field
Intervention (Public)
|
Before
We will conduct a lab experiment to study selection patterns in financial education. Participants are offered access to different types of education material which are designed to improve their financial decision-making ability. We study how demand (willingness to pay) for education varies with baseline ability and with the format and perceived difficulty of the education material. To better understand the role of confidence in education investment decisions, we also elicit each participant’s perceived baseline ability and the perceived benefits of education.
|
After
We will conduct a lab experiment to study selection patterns in financial education. Participants are offered access to different types of education material which are designed to improve their financial decision-making ability. We study how demand (willingness to pay) for education varies with baseline ability and with the format and perceived difficulty of the education material. To better understand the role of confidence in education investment decisions, we also elicit each participant’s perceived baseline ability and the perceived benefits of education.
|
|
Field
Randomization Method
|
Before
The treatment and control conditions will be randomly assigned by a computer. Treatment conditions will be assigned at the individual level. Within treatment status, the different versions of the education material will be randomly assigned at the individual level as well. The size of the treatment and control group will not be balanced in order to maximize power.
|
After
The treatment and control conditions will be randomly assigned by a computer. Treatment conditions will be assigned at the individual level. Within treatment status, the different versions of the education material will be randomly assigned at the individual level as well.
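As a sketch, computer-based individual-level assignment of treatment arms and material versions could look like the following. The arm and version labels are hypothetical placeholders, and the equal draw probabilities are an assumption for illustration; the registered design may weight arms differently:

```python
import random

# Hypothetical labels: one control group plus treatment arms,
# each treatment arm paired with a randomly drawn material version.
ARMS = ["control", "treatment_A", "treatment_B", "treatment_C"]
MATERIAL_VERSIONS = ["version_1", "version_2"]

def assign(participant_ids, seed=42):
    """Independently assign each individual to an arm and, if treated,
    to a version of the education material."""
    rng = random.Random(seed)  # seeded for a reproducible assignment
    assignments = {}
    for pid in participant_ids:
        arm = rng.choice(ARMS)
        version = rng.choice(MATERIAL_VERSIONS) if arm != "control" else None
        assignments[pid] = (arm, version)
    return assignments
```

Because each individual is an independent draw, realized group sizes will vary around their expected shares; a registry implementation might instead use stratified or blocked assignment to fix the ratio exactly.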
|
|
Field
Randomization Unit
|
Before
Treatment conditions will be randomly assigned at the individual level. A computer will choose a random subset of individuals to have one of their choices in the willingness to pay elicitation implemented. For the majority of participants, the treatment and control conditions will be randomly assigned. Within treatment status, the different versions of the education material will be randomly assigned at the individual level as well. The size of the treatment and control group will not be balanced in order to maximize power.
|
After
Treatment conditions will be randomly assigned at the individual level. A computer will choose a random subset of individuals to have one of their choices in the willingness to pay elicitation implemented. For the majority of participants, the treatment and control conditions will be randomly assigned. Within treatment status, the different versions of the education material will be randomly assigned at the individual level as well.
|
|
Field
Planned Number of Observations
|
Before
With our current funding we will target 800-1,000 complete responses; however, the final sample size will ultimately depend on the attrition rate. Complete responses are defined as surveys that have been fully answered with only coherent willingness to pay responses. Incoherent responses include those in which participants switch multiple times on the multiple price list or switch from right to left. Incoherent willingness to pay responses will be coded as missing and excluded from the analysis.
Our ideal sample size is 1,200-1,600 responses, and we will target larger sample sizes if we are able to obtain additional funding. Treatment will not be clustered. If we conduct a version of the experiment with UC Berkeley students, we will also ask about interest in a real personal finance class at UC Berkeley. For this population, we will exclude responses from participants who have already taken the class if their responses differ from those of other participants. We will also exclude participants who provide incoherent willingness to pay responses, which will contribute to attrition rates. Based on pilots of the survey on Prolific, we expect the attrition rate from incoherent willingness to pay responses to be approximately 5-15%; however, the exact attrition rate may be higher or lower for different participant groups. We will also include a version of the analysis where we drop participants who have their education material choices implemented based on their willingness-to-pay.
|
After
With our current funding we will target 1,000-1,200 responses; however, the final sample size will ultimately depend on the attrition rate. Complete responses are defined as surveys that have been fully answered, contain only coherent willingness to pay responses, and come from participants who passed all relevant comprehension and attention checks. Incoherent responses include those in which participants switch multiple times on the multiple price list, switch from right to left, or give willingness to pay answers that otherwise do not make sense. Incoherent willingness to pay responses will be coded as missing and excluded from the analysis. Responses flagged as potentially generated by artificial intelligence software (such as ChatGPT) or copied from external sources will also be excluded from the analysis. We may use automated plagiarism detection tools and web page navigation tracking, among other methods, to identify such behavior.
Our ideal sample size is 1,200-1,600 responses, and we will target larger sample sizes if we are able to obtain additional funding. Treatment will not be clustered. If we conduct a version of the experiment with UC Berkeley students, we will also ask about interest in a real personal finance class at UC Berkeley. For this population, we will exclude responses from participants who have already taken the class if their responses differ from those of other participants. We will also exclude participants who provide incoherent willingness to pay responses, which will contribute to attrition rates. Based on pilots of the survey on Prolific, we expect the attrition rate from incoherent willingness to pay responses to be approximately 5-15%; however, the exact attrition rate may be higher or lower for different participant groups. We will also include a version of the analysis where we drop participants who have their education material choices implemented based on their willingness-to-pay.
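The coherence screen on the multiple price list can be sketched as a simple rule, assuming each list is recorded as an ordered sequence of choices (labels `'A'`/`'B'` and the left-to-right orientation are illustrative assumptions, not the registered coding):

```python
def is_coherent(choices):
    """Screen one multiple price list for coherence.

    choices: list of 'A'/'B' picks down the rows of the list, ordered
    from the lowest to the highest price. A coherent response switches
    from 'A' to 'B' at most once and never switches back.
    """
    switches = sum(1 for prev, cur in zip(choices, choices[1:]) if prev != cur)
    if switches > 1:
        return False  # multiple switch points
    if switches == 1 and choices[0] == "B":
        return False  # single switch in the wrong direction ("right to left")
    return True
```

For example, `['A', 'A', 'B', 'B']` passes, while `['A', 'B', 'A']` (multiple switches) and `['B', 'A', 'A']` (right-to-left switch) would be coded as missing under this rule.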
|
|
Field
Sample size (or number of clusters) by treatment arms
|
Before
We will target 800-1,000 complete responses. The treatment-to-control ratio is 3:1, reflecting our design with three treatment arms and one control group. Our ideal sample size is 1,200-1,600 responses, but the final sample size will ultimately depend on the attrition rate and our ability to recruit enough participants to complete the survey. We may add additional versions of the education material in future versions of the study. We will target larger sample sizes if we are able to obtain additional funding. Treatment will not be clustered.
|
After
We will target 1,000-1,200 responses. The treatment-to-control ratio is 3:1, reflecting our design with three treatment arms and one control group. Our ideal sample size is 1,200-1,600 responses, but the final sample size will ultimately depend on the attrition rate and our ability to recruit enough participants to complete the survey. We may add additional versions of the education material with different wordings in future iterations of the study to disentangle mechanisms. We will target larger sample sizes if we are able to obtain additional funding. Treatment will not be clustered.
|
|
Field
Power calculation: Minimum Detectable Effect Size for Main Outcomes
|
Before
We use results from a pilot study on Prolific to calculate the minimum detectable effect size, assuming a Type I error rate of 5%, 80% power, a mean score of 50 (out of 100), and a standard deviation of 30. To be conservative, we use a slightly higher standard deviation than what was observed in the pilot study. The treatment-to-control ratio is 3:1, reflecting our design with three treatment arms and one control group.
We will target 800-1,000 complete responses. With a sample size of 800, the minimum detectable effect size is 10 percentage points (p.p.), meaning the treated group must improve by at least 10 p.p. more than the control group for the effect to be detectable. As the sample size increases, the minimum detectable effect decreases: with 1,000 observations, it drops to 9 p.p.; with 1,200 and 1,400 observations, it decreases further to 8 p.p.; and with 1,600 observations, it reaches 7 p.p.
We plan to examine the correlation between participants' willingness to pay for each treatment and their baseline score in the financial decision-making task, then test whether these correlations differ across treatments. Due to funding constraints that limit our sample size, we do not expect to be adequately powered to detect differences in the slope of the demand curve across the different versions of the education material at conventional significance levels. Based on our power calculations, with a sample of 800 to 1,000 participants, we can only detect differences in correlations of approximately 15-20 percentage points or more; that is, the correlation between the baseline financial literacy score and the willingness to pay would need to differ by at least 15-20 percentage points across treatments to be statistically detectable. We believe such large differences are unlikely given the similar nature of the education materials. We may be able to detect differences in demand for the different types of education materials when we bin the groups (for example by high and low baseline ability).
|
After
We use results from a pilot study on Prolific to calculate the minimum detectable effect size, assuming a Type I error rate of 5%, 80% power, a mean score of 50 (out of 100), and a standard deviation of 30. To be conservative, we use a slightly higher standard deviation than what was observed in the pilot study. The treatment-to-control ratio is 3:1, reflecting our design with three treatment arms and one control group.
We will target 1,000-1,200 responses. According to our power calculations, with a sample size of 800 the minimum detectable effect size is 10 percentage points (p.p.), meaning the treated group must improve by at least 10 p.p. more than the control group for the effect to be detectable. As the sample size increases, the minimum detectable effect decreases: with 1,000 observations, it drops to 9 p.p.; with 1,200-1,400 observations, it decreases further to 8 p.p.; and with approximately 1,600 observations, it reaches 7 p.p.
We plan to examine the correlation between participants' willingness to pay for each treatment and their baseline score in the financial decision-making task, then test whether these correlations differ across treatments. Due to funding constraints that limit our sample size, we do not expect to be adequately powered to detect differences in the slope of the demand curve across the different versions of the education material at conventional significance levels. Based on our power calculations, with a sample of 1,000 participants, we can only detect differences in correlations of approximately 15-20 percentage points or more; that is, the correlation between the baseline financial literacy score and the willingness to pay would need to differ by at least 15-20 percentage points across the treatment arms for the differences to be statistically significant. We believe such large differences are unlikely since we highlight to participants the similar nature of the education materials. We may be able to detect differences in demand for the different types of education materials when we bin the participant groups (for example by high and low baseline ability) or when pooling the AI and non-AI versions of the education material.
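Under a standard two-sample normal approximation, the stated assumptions (5% two-sided Type I error, 80% power, standard deviation of 30) imply minimum detectable effects that can be sketched as follows. This is a simple textbook formula; the registered figures may differ because of additional adjustments (for example, multiple-hypothesis corrections) not captured here:

```python
from statistics import NormalDist

def mde(n_treat, n_control, sd, alpha=0.05, power=0.80):
    """Minimum detectable effect for a two-sample comparison of means
    under a normal approximation with a two-sided test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_power = z.inv_cdf(power)          # quantile for the target power
    return (z_alpha + z_power) * sd * (1 / n_treat + 1 / n_control) ** 0.5

# With 800 participants split 3:1 (600 treated pooled across the three
# arms, 200 control) and SD = 30:
# mde(600, 200, 30)  -> roughly 6.9 points for the pooled comparison
# mde(200, 200, 30)  -> roughly 8.4 points for a single arm vs. control
```

The same function can be evaluated at each planned sample size to trace how the detectable effect shrinks as recruitment grows.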
|