
Fields Changed

Registration

Field Before After
Abstract
Before: Persistent disparities in financial literacy contribute to income and wealth inequality. This project investigates how individuals self-select into financial education and the welfare impacts of such selection. We pair rich administrative data with a lab-in-field experiment that incentivizes participation in financial education among individuals with low willingness to pay. To examine how different types of education material may influence take-up, we experimentally vary the format and perceived difficulty of the education materials.
After: Persistent disparities in financial literacy contribute to income and wealth inequality. This project investigates how individuals self-select into financial education and explores the welfare impacts of such selection. We pair rich administrative data with a lab experiment that incentivizes participation in financial education among individuals with different levels of demand. To examine how different types of education material influence take-up, we experimentally vary the format and perceived difficulty of the education material.
Trial Start Date
Before: August 01, 2025
After: September 04, 2025
Last Published
Before: August 01, 2025 02:29 PM
After: September 04, 2025 11:34 AM
Intervention (Public)
Before: We will conduct a lab-in-field experiment to study selection patterns in financial education. Participants are offered access to education material designed to improve their financial decision-making ability within the experiment. We study how demand (willingness to pay) for education varies with baseline ability and with the format and perceived difficulty of the education materials. To better understand the role of confidence in education investment decisions, we also elicit each participant’s perceived baseline ability and perceived benefits of education.
After: We will conduct a lab experiment to study selection patterns in financial education. Participants are offered access to different types of education material, which are designed to improve their financial decision-making ability. We study how demand (willingness to pay) for education varies with baseline ability and with the format and perceived difficulty of the education material. To better understand the role of confidence in education investment decisions, we also elicit each participant’s perceived baseline ability and the perceived benefits of education.
Intervention Start Date
Before: August 01, 2025
After: September 04, 2025
Primary Outcomes (End Points)
Before: We measure how demand for different types of education materials varies with an individual's starting level of financial knowledge. We measure an individual’s financial decision-making ability, demand for financial education (willingness to pay), and the treatment effect of education (or no education).
After: We measure how demand for different types of educational materials varies with an individual's starting level of financial knowledge. We measure an individual’s financial decision-making ability, demand for financial education (willingness to pay), and the treatment effect of education (or no education).
Primary Outcomes (Explanation)
Before: To measure financial knowledge, we design scenarios where there is a financial decision that maximizes a participant's monetary payout in the experiment. We then measure the mistake, or the difference between the participant's choices in the scenario and the choice that would have maximized their payout. We construct our measure of financial knowledge as a rescaled function of the mistake. We base our methodology on Ambuehl, Bernheim, and Lusardi (2022), which uses the difference between two equivalently valued complex and simply-framed financial instruments as the outcome. We collect both a baseline and an endline financial decision-making score. We also measure the change in the score, or the treatment effect under different education materials or under no education. We use a multiple price list to measure demand (willingness to pay/willingness to accept) for different types of education. We exclude observations where individuals switch at multiple points in the multiple price list. We also conduct a version of our analysis using only the willingness to pay/willingness to accept elicitation for the first version of the education material as a robustness check.
After: To measure financial knowledge, we design scenarios where there is a financial decision that maximizes a participant's monetary payout in the experiment. We then measure the mistake, or the difference between the participant's choices in the scenario and the choice that would have maximized their payout. We construct our measure of financial knowledge as a rescaled function of the mistake. We base our methodology on Ambuehl, Bernheim, and Lusardi (2022), which uses the difference between two equivalently valued complex and simply-framed financial instruments as the outcome. We collect both a baseline and an endline financial decision-making score. We also measure the change in the score, or the treatment effect under the different versions of the education material and under the control condition of no education material. We conduct pilots on the relevant populations to check the difficulty of the questions and to make sure there is room for a treatment effect from the education material. We use a multiple price list to measure demand (willingness to pay/willingness to accept) for different types of education. We exclude observations where individuals switch at multiple points in the multiple price list. As a robustness check, we will also conduct a version of the analysis using only the willingness to pay/accept elicitation for the first version of the education material presented to account for potential anchoring.
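The mistake-based score described in this field can be sketched in code. This is an illustrative rescaling only: the registration describes the score as "a rescaled function of the mistake" without fixing a functional form, so the linear rescaling and the `max_mistake` bound below are assumptions.

```python
def knowledge_score(chosen_value: float, optimal_value: float,
                    max_mistake: float) -> float:
    """Rescale a participant's mistake into a 0-100 knowledge score.

    The mistake is the payout gap between the participant's choice and the
    payout-maximizing choice; larger mistakes imply lower scores. The linear
    form is illustrative -- the registration does not fix a functional form.
    """
    mistake = optimal_value - chosen_value
    mistake = min(max(mistake, 0.0), max_mistake)  # clamp to [0, max_mistake]
    return 100.0 * (1.0 - mistake / max_mistake)
```

A participant whose choice matches the payout-maximizing one scores 100; a mistake at or beyond `max_mistake` scores 0.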
Planned Number of Observations
Before: With our current funding we will target a minimum of 1,000 complete responses; however, the final sample size will ultimately depend on the attrition rate. Complete responses are defined as surveys that have been fully completed with only coherent willingness to pay responses. Incoherent responses include participants who switch from willing-to-pay to unwilling-to-pay more than once on the multiple price list, or participants who switch from unwilling-to-pay back to willing-to-pay. Incoherent willingness to pay measures will be coded as missing and excluded from the analysis. Our ideal sample size is 1,200-1,600 responses, and we will target larger sample sizes if we are able to obtain additional funding. Treatment will not be clustered. Since we connect our experiment questions to a real personal finance class at UC Berkeley, we may decide to exclude responses from participants who have already taken the class if their responses look different from other participants. We will also exclude participants who provide incoherent willingness to pay responses. Based on pilots of the survey on Prolific, we expect the attrition rate from incoherent willingness to pay responses to be approximately 5-15%; however, the exact attrition rate may be higher or lower for different participant groups.
After: With our current funding we will target 800-1,000 complete responses; however, the final sample size will ultimately depend on the attrition rate. Complete responses are defined as surveys that have been fully answered with only coherent willingness to pay responses. Incoherent responses include participants who switch multiple times on the multiple price list or participants who switch from right to left. Incoherent willingness to pay responses will be coded as missing and excluded from the analysis. Our ideal sample size is 1,200-1,600 responses, and we will target larger sample sizes if we are able to obtain additional funding. Treatment will not be clustered. If we conduct a version of the experiment with UC Berkeley students, we will also ask about interest in a real personal finance class at UC Berkeley. For this population, we will exclude responses from participants who have already taken the class if their responses look different from other participants. We will also exclude participants who provide incoherent willingness to pay responses, which will contribute to attrition rates. Based on pilots of the survey on Prolific, we expect the attrition rate from incoherent willingness to pay responses to be approximately 5-15%; however, the exact attrition rate may be higher or lower for different participant groups. We will also include a version of the analysis where we drop participants who have their education material choices implemented based on their willingness-to-pay.
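The coherence rule for multiple price list responses can be sketched as follows, assuming each response is recorded as a willing/unwilling flag at increasing prices. The list representation and function name are illustrative, not from the registration.

```python
def is_coherent(willing: list[bool]) -> bool:
    """Check a multiple-price-list response for coherence.

    `willing[i]` is True if the participant accepts the offer at the i-th
    price, with prices increasing across the list. A coherent response
    switches at most once, and only from willing to unwilling.
    """
    switches = sum(willing[i] != willing[i + 1] for i in range(len(willing) - 1))
    if switches > 1:
        return False  # multiple switch points -> coded as missing
    # a single switch must run from willing (True) to unwilling (False)
    return switches == 0 or (willing[0] and not willing[-1])
```

Responses that fail this check would be coded as missing and excluded, per the text above.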
Sample size (or number of clusters) by treatment arms
Before: We will target a minimum of 1,000 complete responses. The treatment-to-control ratio is 3:1, reflecting our design with three treatment arms and one control group. Our ideal sample size is 1,200-1,600 responses, but the final sample size will ultimately depend on the attrition rate and our ability to recruit enough participants to complete the survey. We may add additional versions of the education material in future versions of the study. We will target larger sample sizes if we are able to obtain additional funding. Treatment will not be clustered.
After: We will target 800-1,000 complete responses. The treatment-to-control ratio is 3:1, reflecting our design with three treatment arms and one control group. Our ideal sample size is 1,200-1,600 responses, but the final sample size will ultimately depend on the attrition rate and our ability to recruit enough participants to complete the survey. We may add additional versions of the education material in future versions of the study. We will target larger sample sizes if we are able to obtain additional funding. Treatment will not be clustered.
Power calculation: Minimum Detectable Effect Size for Main Outcomes
Before: We use results from a pilot study on Prolific to calculate the minimum detectable effect size, assuming a Type I error rate of 5%, 80% power, a mean score of 50 (out of 100), and a standard deviation of 30. To be conservative, we use a slightly higher standard deviation than what was observed in the pilot study. The treatment-to-control ratio is 3:1, reflecting our design with three treatment arms and one control group. With a sample size of 800, the minimum detectable effect size is 10 percentage points (p.p.), meaning the treated group must improve by at least 10 p.p. more than the control group for the effect to be detectable. As the sample size increases, the minimum detectable effect decreases: with 1,000 observations, it drops to 9 p.p.; with 1,200 and 1,400 observations, it decreases further to 8 p.p.; and with 1,600 observations, it reaches 7 p.p. Due to funding constraints that limit our sample size, we do not expect to be adequately powered to detect differences in selection patterns across the different versions of the education material at conventional significance levels. Instead, we plan to examine the correlation between participants' willingness to pay for each treatment and their initial financial task scores, then test whether these correlations differ across treatments. Based on our power calculations, with a sample of 800 to 1,000 participants, we can only detect differences in correlations of approximately 15–20 percentage points or more; that is, the correlation between the baseline financial literacy score and the willingness to pay would need to differ by at least 15–20 percentage points across treatments to be statistically detectable. We believe such large differences are unlikely given the similar nature of the education materials.
After: We use results from a pilot study on Prolific to calculate the minimum detectable effect size, assuming a Type I error rate of 5%, 80% power, a mean score of 50 (out of 100), and a standard deviation of 30. To be conservative, we use a slightly higher standard deviation than what was observed in the pilot study. The treatment-to-control ratio is 3:1, reflecting our design with three treatment arms and one control group. We will target 800-1,000 complete responses. With a sample size of 800, the minimum detectable effect size is 10 percentage points (p.p.), meaning the treated group must improve by at least 10 p.p. more than the control group for the effect to be detectable. As the sample size increases, the minimum detectable effect decreases: with 1,000 observations, it drops to 9 p.p.; with 1,200 and 1,400 observations, it decreases further to 8 p.p.; and with 1,600 observations, it reaches 7 p.p. We plan to examine the correlation between participants' willingness to pay for each treatment and their baseline score in the financial decision-making task, then test whether these correlations differ across treatments. Due to funding constraints that limit our sample size, we do not expect to be adequately powered to detect differences in the slope of the demand curve across the different versions of the education material at conventional significance levels. Based on our power calculations, with a sample of 800 to 1,000 participants, we can only detect differences in correlations of approximately 15–20 percentage points or more; that is, the correlation between the baseline financial literacy score and the willingness to pay would need to differ by at least 15–20 percentage points across treatments to be statistically detectable. We believe such large differences are unlikely given the similar nature of the education materials. We may be able to detect differences in demand for the different types of education materials when we bin the groups (for example by high and low baseline ability).
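Minimum detectable effects of this kind are commonly computed with the two-sample formula MDE = (z_{1-alpha/2} + z_power) * sd * sqrt(1/nT + 1/nC). The sketch below uses the stated parameters (5% two-sided test, 80% power, SD 30); the registration's reported figures also depend on how arms are compared and on attrition, so this is illustrative, not a reproduction of the authors' calculation.

```python
from statistics import NormalDist

def mde(n_treat: int, n_control: int, sd: float,
        alpha: float = 0.05, power: float = 0.80) -> float:
    """Minimum detectable effect for a two-sample comparison of means.

    Standard approximation: (z_{1-alpha/2} + z_power) * sd * sqrt(1/nT + 1/nC).
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return crit * sd * (1 / n_treat + 1 / n_control) ** 0.5

# Pooling all treated participants against control at n = 800 with a 3:1 ratio
print(round(mde(600, 200, 30), 1))  # ~6.9 p.p. under these assumptions
```

Comparing a single treatment arm of 200 against the 200 controls instead gives roughly 8.4 p.p., which, together with attrition, may be closer to how the 10 p.p. figure at n = 800 arises.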
Intervention (Hidden)
Before: We investigate selection patterns in financial education and their implications for welfare. Using two complementary approaches, we examine how demand for financial education relates to the benefits that individuals receive. We pair rich administrative data with a lab-in-field experiment that incentivizes participation in financial education among individuals with low willingness to pay for such education. We measure demand for education, confidence (i.e., the participant's beliefs about their starting level of knowledge), and the perceived benefit of different types of education materials. To understand how the proliferation of new technologies, such as AI and online learning, can influence selection patterns, we vary the format and perceived cognitive difficulty of the education materials. Participants first complete a baseline task in which they choose between different financial instruments. Then, we ask participants about their willingness to pay for three different types of education materials. The three types are a university lecture format, a short video format, and an interactive AI-powered learning assistant. We emphasize that all materials are based on the same content and vary only in presentation. A subset of participants is then randomly assigned to complete one version of the education material. The education material is designed to teach participants the financial concepts needed to maximize their payment (bonus) in the experiment. Finally, participants complete a very similar set of endline tasks after either completing or not completing the education materials. This experiment allows us to identify selection patterns, test for selection on treatment effects, and explore ways to change selection patterns. Our findings will shed light on how self-selection impacts the efficacy of financial literacy programs and inform efforts to target financial literacy education to those who benefit most.
After: We investigate selection patterns in financial education and their implications for welfare. Using two complementary approaches, we examine how demand for financial education relates to the benefits that individuals receive. We pair rich administrative data with a lab experiment that incentivizes participation in financial education among individuals with low willingness to pay for such education. We measure demand for education, confidence (i.e., the participant's beliefs about their starting level of knowledge), and the perceived benefit of different types of education materials. To understand how the proliferation of new technology, such as AI and online learning, can influence selection patterns, we intentionally vary the format and perceived difficulty of the education materials. Participants first complete a baseline task in which they choose between different financial instruments. Then, we ask participants about their willingness to pay for three different types of education materials. The three types are a university lecture format, a short video format, and an interactive AI-powered learning assistant. We emphasize that all versions of the education material are based on the same content and vary only in presentation. Since we are primarily interested in how AI-powered education material can change selection patterns, we also conduct a version of the analysis where we pool the video and university lecture material to compare the AI and non-AI versions of the education material. A subset of participants is then randomly assigned to complete one version of the education material. We design the education material to teach participants financial concepts that can help participants maximize their bonus payment in the experiment. Participants assigned to the education material must complete the material and pass an attention check before moving on in the experiment. After either completing or not completing the education materials, participants are asked to complete an endline financial decision-making task that is very similar to the baseline task. This experiment allows us to identify self-selection patterns, test for selection on treatment effects, and explore ways that we can change self-selection patterns. Our findings will shed light on how self-selection impacts the efficacy of financial literacy programs and inform efforts to target financial literacy education to those who benefit most.
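The 3:1 random assignment with three equally sized treatment arms and one control group can be sketched as below. The arm labels and the shuffled-cycle scheme are illustrative assumptions, not the authors' actual randomization code.

```python
import random

# Hypothetical arm labels; the registration names three material formats
# (university lecture, short video, AI learning assistant) plus a control.
ARMS = ["lecture", "video", "ai_assistant", "control"]

def assign(participant_ids: list[str], seed: int = 42) -> dict[str, str]:
    """Assign participants to four equally sized arms.

    Three treatment arms of equal size against one control arm yields the
    3:1 treatment-to-control ratio described in the registration.
    """
    rng = random.Random(seed)
    ids = participant_ids[:]
    rng.shuffle(ids)  # random order, fixed seed for reproducibility
    # cycling through the four arms gives each arm 1/4 of participants
    return {pid: ARMS[i % 4] for i, pid in enumerate(ids)}
```

With 800 participants this yields 600 treated (200 per arm) and 200 controls, matching the counts used in the power calculation above.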
Secondary Outcomes (End Points)
Before: We also measure confidence and perceived ability by asking participants to predict their performance on the baseline and endline financial decision-making questions. We count an estimated score as being correct if it is within a reasonable range (e.g. +/- 5 percentage points) of the actual score.
After: We also measure the confidence, perceived ability, and perceived benefit of the different versions of the education material by asking participants to predict their performance on the baseline and endline financial decision-making questions. We count an estimated score as being correct if it is within a reasonable range of the actual score. To understand the drivers of demand for the different types of education materials, we ask individuals to explain their preferences and classify the responses.
Secondary Outcomes (Explanation)
Before: We measure confidence and perceived ability by asking participants to predict their performance on the baseline financial decision-making questions. We also ask participants to predict their performance on the endline questions for each version of the education materials and for no education material. We count an estimate as being correct if it is within a reasonable range (e.g., +/- 5 percentage points) of the actual score. We also measure overconfidence using the difference between the participant's actual ability and their perceived ability.
After: We measure the confidence, perceived ability, and perceived benefit of the different versions of the education material by asking participants to predict their performance on the baseline financial decision-making questions. We also ask participants to predict their performance on the endline questions for each version of the education materials and for no education material. We count an estimate as being correct if it is within a reasonable range of the actual score. We also measure overconfidence using the difference between the participant's actual ability and their perceived ability. To understand the drivers of demand for the different types of education materials, we ask individuals to explain their preferences and classify the responses.
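The prediction-accuracy and overconfidence measures can be sketched as below. The +/- 5-point window comes from the earlier version of this field (the updated text leaves the range unspecified, so it is a parameter here), and the sign convention perceived minus actual is an assumption.

```python
def estimate_correct(predicted: float, actual: float,
                     window: float = 5.0) -> bool:
    """True if a predicted score falls within +/- `window` points of the
    actual score (the earlier registration text used +/- 5 p.p.)."""
    return abs(predicted - actual) <= window

def overconfidence(perceived: float, actual: float) -> float:
    """Gap between perceived and actual ability; positive values indicate
    overconfidence under this (assumed) sign convention."""
    return perceived - actual
```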
Pi as first author
Before: No
After: Yes