
Fields Changed

Registration

Field Before After
Trial Status
Before: in_development
After: completed
Abstract
Before: This paper studies the tradeoffs of personalizing treatments based on heterogeneity analysis. We do so in the setting of a learning platform (Conecta Ideas) for primary school students in Peru. In the first phase, we randomize students' parents who have previously downloaded the app into 16 treatment groups and a control group. The parents in each group receive a combination of 2 messages out of 4 possible messages through an app notification. These messages seek to increase the use of the learning app by students. At the end of this phase, we conduct two analyses to determine the "best" message. First, we choose the best combination of messages based on their (estimated) average treatment effect. Second, we use machine learning techniques (a la Athey & Wager, 2018) to estimate the best message for each person based on (pre-treatment) observable characteristics. In the second phase, we run an experiment in which parents are randomly assigned to a treatment group in which they all receive the "best overall" message (based on the estimated ATE), a treatment group in which each parent is assigned the "best personalized" message based on the heterogeneity analysis, or a control group. This allows us to compare the gains (if any) of personalization. Intuitively, the tradeoff in phase 2 is that the "best overall" message is estimated with more precision but may be suboptimal for some, while the "best personalized" message is "optimal" among the set of messages for each individual, but this "optimality" is estimated with less precision.
After: This study asks whether machine learning can improve the targeting and personalization of low-cost digital "nudges": push-notification messages sent to parents to encourage their children to use an educational app. The setting is Conecta Ideas, a free smartphone-based mathematics platform used by primary-school students across Peru.
Sustaining voluntary use of such platforms is a central challenge, and short motivational messages are a cheap, scalable lever. We study two distinct decisions a platform faces: (1) personalization — WHICH message to send to each person, choosing among several alternative messages the one predicted to work best for that individual; and (2) targeting — WHETHER to message a given person at all, given a single common message, concentrating effort on those predicted to respond most. The study works with a population of roughly 100,000 parents who had used the app at least once in the prior year. Each parent receives push notifications drawn from four behavioral message types — a teacher recommendation (social norms), peer/usage norms, the parent's role in their child's learning (identity), and the future opportunities that learning math creates (present bias). Treated parents receive two notifications per week (Mondays and Thursdays) over three weeks; the control group receives none. The primary outcome is whether the student logs into the platform in the weeks following the messaging; secondary outcomes are time spent on the platform and the number of exercises completed. The design has two phases. In Phase 1, parents are randomly assigned (at the individual level, stratified by whether the student logged in the previous week) across the candidate messages and a no-message control. Using the Phase 1 outcome data, we estimate both the average effect of each message and how those effects vary across individuals, using two machine-learning estimators — a causal forest and a k-nearest-neighbor matcher — fed a common set of pre-treatment characteristics (grade, location, prior platform use, baseline achievement, and school characteristics). 
In Phase 2, the same population is re-randomized into four arms of roughly equal size: a uniform "best" arm (everyone receives the single highest-average-effect message from Phase 1); two "personalized" arms (each person receives the message that the causal forest, or the nearest-neighbor model, predicts is best for them); and a "random" arm that serves as a no-personalization benchmark. Phase 2 is therefore an out-of-sample experimental test of the personalized and targeted assignment rules learned in Phase 1: the rules are fixed using one experiment's data and then evaluated on a fresh draw of the same population. The study is run twice in contrasting engagement environments. The first implementation takes place during the school year (October–December 2023), when baseline weekly login rates are high. A second implementation (added as an addendum; see below) replicates the same two-phase design during the summer vacation (January–February 2024), when schools are out of session and baseline engagement is far lower, so that the value of personalization and targeting can be compared across a high-engagement and a low-engagement context. Addendum disclosure: the summer implementation was not pre-registered in advance. It was designed and fielded after the school-year Phase 1 results were known and is documented here for completeness and transparency.
Trial Start Date
Before: October 16, 2023
After: October 09, 2023
Last Published
Before: October 17, 2023 01:55 PM
After: June 19, 2026 12:03 PM
Intervention (Public)
Before: This paper studies the tradeoffs of personalizing treatments based on heterogeneity analysis. We do so in the setting of a learning platform (Conecta Ideas) in Peru. In the first phase, we randomize parents who have previously downloaded the app into 16 groups or a pure control group. These groups cover all 16 possible combinations of the messages used to encourage platform use. Parents receive the messages in their group once between October 12 and October 13, 2023. We then observe whether they connect and use the platform or not in the next couple of weeks. In the second phase, we run an experiment in which parents are assigned to a treatment group in which they all receive the "best overall" message (based on the estimated ATE) or a treatment group in which each parent is assigned the "best personalized" message based on the heterogeneity analysis of the first phase. Parents can also be randomly assigned to a control group. This will take place between October 30 and November 3, 2023.
After: The intervention is a series of motivational push notifications delivered through the Conecta Ideas mobile app to parents of primary-school students, encouraging them to have their child use the app's mathematics exercises. Notifications are the only treatment; the platform's default is to send none.
Message content. We designed four base messages, each built around a single behavioral channel:
- Teacher recommendation (social norms): the child's teacher recommends regular use of the app.
- Peer usage (social norms): many students across the country already use the app.
- Parental support (identity): parents play a central role in their child's learning.
- Future opportunities (present bias): learning math now opens later academic and job opportunities.
Each message is lightly personalized by inserting the child's name (and, for the teacher message, the teacher's name); the catalog of four messages is held fixed across all phases and both implementations.
Delivery. Treated parents receive two notifications per week — on Mondays and Thursdays at 5pm local time — for three weeks. The control group receives no notifications. Messages are sent as in-app push notifications and are short and low-bandwidth, consistent with the platform's design for low-end smartphones. How the intervention varies across phases. In Phase 1, the four base messages are combined into the experimental arms used to estimate effects (in the school-year experiment, the 16 arms are all ordered pairs of the four base messages — the message sent on Monday and the message sent on Thursday of each week; in the summer experiment, four single-message arms are used). In Phase 2, each parent is assigned a message according to one of the assignment rules under test — a single uniform "best" message for everyone, a message individually selected by a causal-forest model, a message individually selected by a nearest-neighbor model, or a randomly drawn message — and is then sent that message on the same twice-weekly, three-week cadence. The intervention is intentionally low-cost and high-frequency: the marginal cost of an additional message is effectively zero, which is why the study focuses on which message to send and to whom, rather than on the cost of messaging itself.
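As an illustration of how the school-year arms follow from the four base messages, the short sketch below enumerates them. The message labels are shorthand introduced here, not names used by the platform. Note that 16 arms requires counting pairs in which the same message is sent on both days (4 x 4 = 16); pairs of distinct messages alone would give only 12.

```python
from itertools import product

# Shorthand labels for the four base messages (hypothetical names, for illustration only).
BASE_MESSAGES = ["teacher", "peers", "parent_role", "future"]

# School-year Phase 1: every ordered (Monday message, Thursday message) pair,
# including pairs that repeat the same message on both days.
school_year_arms = list(product(BASE_MESSAGES, repeat=2))

# Summer Phase 1: one arm per base message.
summer_arms = [(message,) for message in BASE_MESSAGES]

print(len(school_year_arms))  # 16
print(len(summer_arms))       # 4
```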
Intervention Start Date
Before: October 16, 2023
After: October 09, 2023
Intervention End Date
Before: November 30, 2023
After: December 03, 2023
Primary Outcomes (End Points)
Before: Login to the platform, and the amount of time spent on it
After: Platform login (extensive margin of use): a binary indicator equal to one if the student logs into the Conecta Ideas platform at least once during the post-intervention observation window (weekly login).
Experimental Design (Public)
Before: Our sample consists of 100,000 parents. 80,000 of these are assigned to the first phase of the experiment and are evenly divided among 16 groups. We then re-randomize all parents (including the 20,000 not involved in phase 1) into three groups. XXX of these parents will be assigned to the "best overall" treatment, XXX to the "best personalized" treatment, and XX to the control. The treatment in both cases will be stratified by whether the parents connected in the previous week or not.
After: The study is a two-phase, individual-level randomized controlled trial, run with a population of roughly 100,000 parents of primary-school students who had used the Conecta Ideas mathematics app at least once in the prior year. The same population is carried across both phases so that assignment rules learned in the first phase can be tested out of sample in the second.
PHASE 1 (estimation). Parents are randomly assigned across a set of message arms plus a no-message control. Randomization is at the individual level and stratified by whether the student logged in during the week before the experiment begins. In the school-year experiment, there are 16 message arms, formed as all ordered pairs of the four base messages (the message delivered on Monday and the message delivered on Thursday of each week); in the summer experiment, there are four single-message arms. Using the Phase 1 outcome data, we estimate the average effect of each message by OLS and estimate how treatment effects vary across individuals using two machine-learning methods (a causal forest and a k-nearest-neighbor matcher), each given a common vector of pre-treatment characteristics (e.g., grade, urban/rural location, prior platform use, baseline math achievement, and school characteristics).
PHASE 2 (out-of-sample test).
The same population is re-randomized, with roughly equal allocation, into four arms:
- Best: every parent receives the single message with the highest average effect in Phase 1 (a uniform, non-personalized policy).
- Causal forest (personalized): each parent receives the message that the causal forest predicts is best for that individual.
- Nearest neighbor (personalized): each parent receives the message that the k-nearest-neighbor model predicts is best for that individual.
- Random: each parent receives a randomly drawn message, serving as the no-personalization benchmark.
Randomization is again at the individual level, stratified by prior-week login. Phase 2 is thus an experimental evaluation of the Phase 1 machine-learning predictions: the personalized and targeted assignment rules are fixed using Phase 1 data and then implemented on a fresh draw of the same population.
The design lets us separate two questions. Personalization asks WHICH message to send to each individual (does tailoring the message beat a single uniform best message?). Targeting asks WHETHER to message a given individual at all, given a common message (can we identify who responds most and concentrate messaging on them?). The two phases together allow both questions to be answered out of sample rather than only on held-out folds of a single experiment.
CONTEXTS. The design is implemented twice in contrasting engagement environments: once during the school year (high baseline platform use) and once during the summer vacation (low baseline platform use). Comparing the two lets us assess how the value of personalization and targeting depends on the engagement level of the target population.
ADDENDUM DISCLOSURE. The summer implementation was not pre-registered in advance; it was designed and fielded after the school-year Phase 1 results were known and is described here for completeness and transparency.
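The Phase 2 logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the function names, the message labels, the round-robin dealing scheme, and the representation of model output as a dictionary of predicted effects per message are all hypothetical. The registration specifies only that randomization is individual-level and stratified by prior-week login, and that personalized arms send the message each model predicts is best.

```python
import random
from collections import Counter, defaultdict

# Hypothetical shorthand labels for the four base messages and the four Phase 2 arms.
MESSAGES = ["teacher", "peers", "parent_role", "future"]
ARMS = ["best", "causal_forest", "nearest_neighbor", "random"]


def stratified_assign(parents, arms, seed=0):
    """Shuffle parents within each prior-week-login stratum, then deal arms round-robin.

    Within a stratum, arm sizes differ by at most one.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for parent in parents:
        by_stratum[parent["logged_in_prior_week"]].append(parent["id"])
    assignment = {}
    for stratum in sorted(by_stratum):
        ids = by_stratum[stratum]
        rng.shuffle(ids)
        for position, parent_id in enumerate(ids):
            assignment[parent_id] = arms[position % len(arms)]
    return assignment


def choose_message(arm, best_overall, cf_effects, knn_effects, rng):
    """Pick the message a parent is sent, given the arm's assignment rule.

    cf_effects / knn_effects map message -> predicted effect for this parent
    (assumed to come from models fitted on Phase 1 data).
    """
    if arm == "best":
        return best_overall
    if arm == "causal_forest":
        return max(cf_effects, key=cf_effects.get)
    if arm == "nearest_neighbor":
        return max(knn_effects, key=knn_effects.get)
    return rng.choice(MESSAGES)  # "random" arm: no-personalization benchmark


# Toy population: 80 parents, half of whom logged in during the prior week.
parents = [{"id": i, "logged_in_prior_week": i % 2 == 0} for i in range(80)]
assignment = stratified_assign(parents, ARMS, seed=42)
arm_counts = Counter(assignment.values())
print(arm_counts)
```

Dealing arms round-robin after a within-stratum shuffle is one simple way to get near-equal arm sizes inside every stratum; other blocking schemes would serve the same purpose.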
Sample size (or number of clusters) by treatment arms
Before: In the first phase, there are 5,000 parents in each of the 16 possible groups. In the second phase, there are 30,000 parents in each of the 2 treatment groups (best overall, and best personalized), and 40,000 in the control group.
After: Total sample: ~100,000 parent-student pairs, held fixed across phases.
EXPERIMENT 1: SCHOOL YEAR
Phase 1: 16 message arms at ~5,000 each (~80,000) + no-message control (~20,000) = 100,000.
Phase 2: 4 arms at ~25,000 each: Best, Causal Forest, Nearest Neighbor, Random (within each arm, ~half treated / ~half control).
EXPERIMENT 2: SUMMER (addendum; not pre-registered)
Phase 1: 4 message arms + no-message control at ~20,000 per cell = ~100,000.
Phase 2: 4 arms at ~25,000 each: Best, Causal Forest, Nearest Neighbor, Random (within each arm, ~half treated / ~half control).
Power calculation: Minimum Detectable Effect Size for Main Outcomes
Before: We estimate that we are able to detect a treatment effect (or a difference in treatments) of XX percentage points in phase 2. This is with power 80% and size 5%.
After: EXPERIMENT 1: SCHOOL YEAR (Phase 2, ~25,000 per arm)
MDE for the personalized-vs-uniform-best (Causal Forest minus Best) contrast: ~1.2 p.p.
Individual-arm standard error: ~0.31 p.p.
Control-arm login rate: ~36% (Phase 2); ~42% (Phase 1).
Outcome (Bernoulli) standard deviation at the control mean: ~0.48.
MDE expressed relative to the control mean: ~3% of the control login rate (~0.025 SD).
EXPERIMENT 2: SUMMER (Phase 2, ~25,000 per arm; addendum, not pre-registered)
MDE for the same contrast: ~0.5 p.p.
Individual-arm standard error: ~0.12 p.p.
Control-arm login rate: ~2%.
Outcome (Bernoulli) standard deviation at the control mean: ~0.14.
MDE relative to the control mean: ~25% of the control login rate (~0.036 SD).
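The registration does not state the formula behind these figures. One standard calculation that comes close to them is sketched below, as an assumption rather than the authors' method: for a difference between two arms with equal, independent standard errors, the MDE at two-sided size 5% and power 80% is (z_0.975 + z_0.80) x sqrt(2) x SE_arm. The function name is introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def mde_two_arm(se_arm_pp, alpha=0.05, power=0.80):
    """MDE (in p.p.) for a difference between two arms with equal, independent SEs.

    Assumed formula: (z_{1 - alpha/2} + z_{power}) * sqrt(2) * SE_arm.
    """
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    se_difference = sqrt(2) * se_arm_pp
    return z_sum * se_difference


# Individual-arm standard errors as reported in the registration (p.p.).
school_year_mde = mde_two_arm(0.31)
summer_mde = mde_two_arm(0.12)

print(round(school_year_mde, 2))  # about 1.23, versus the reported ~1.2 p.p.
print(round(summer_mde, 2))       # about 0.48, versus the reported ~0.5 p.p.

# Relative magnitudes, using the reported MDEs and control-arm login rates.
print(round(1.2 / 36, 3))  # share of the school-year control login rate
print(round(0.5 / 2, 3))   # share of the summer control login rate
```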
Secondary Outcomes (End Points) Intensive margin of platform use: (1) time spent on the platform (minutes) and (2) the number of exercises/modules completed during the post-intervention window.