Intervention(s)
The intervention is NUMI, a research-oriented computer-assisted learning (CAL) platform that delivers weekly math practice to students in Grades 4-9. Each week, teachers assign a 30-90 minute NUMI assignment consisting of short instructional videos, exercises, and a brief exit-ticket assessment that contributes to a participation grade. Students who do not finish in class complete the remainder at home or during extra time.
Students are randomized at the individual level within their classroom to one of four conditions in a 2x2 factorial design:
- CAL only (control): non-Mastery progression, no AI tutor, step-by-step solutions on mistakes.
- CAL + AI: non-Mastery progression, plus access to NUMI's guard-railed AI tutor.
- CAL + Mastery: Mastery progression (three consecutive correct answers required before advancing), no AI tutor.
- CAL + Mastery + AI: Mastery progression plus AI tutor.
In Mastery mode, students must answer three consecutive problems correctly before advancing to the next exercise; in non-Mastery mode, students decide when they feel ready to take the exit ticket.
In the AI arms, the AI tutor becomes available in a safe, domain-bounded chat space when a student makes a mistake or requests help. It elicits the student's reasoning and walks through the steps together with the student without revealing the final answer. NUMI primarily poses questions requiring binary (yes/no) or multiple-choice responses, with an occasional "Help Me Get Started" option (option A/B). After a student errs, NUMI suggests the likely misconception, reviews the worked solution step by step, and invites further questions. A text box is available for open-ended questions; filters and classifiers suppress off-topic or personal content. No student inputs are stored by any external AI provider. In the non-AI arms, students instead see step-by-step worked solutions when they make a mistake.
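The Mastery advancement rule (three correct answers in a row) can be illustrated with a minimal sketch. The function name `mastered` and the parameter `streak_needed` are hypothetical, and the sketch assumes that any incorrect answer resets the streak to zero, which is how "in a row" is read here; it is not NUMI's actual implementation.

```python
def mastered(responses, streak_needed=3):
    """Return True once `responses` contains `streak_needed` consecutive correct answers.

    `responses` is an iterable of booleans in the order the student answered
    (True = correct). Assumption: an incorrect answer resets the streak to zero.
    """
    streak = 0
    for correct in responses:
        streak = streak + 1 if correct else 0
        if streak >= streak_needed:
            return True
    return False


# Two correct, one miss, then three correct: the student may advance.
print(mastered([True, True, False, True, True, True]))
# Two correct, one miss, then only two correct: not yet.
print(mastered([True, True, False, True, True]))
```

In non-Mastery mode no such gate applies, since students choose when to take the exit ticket.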
Trimester rotation: Each student is assigned a different condition in each of the three trimesters and therefore experiences three of the four conditions, with no condition repeated. The rotation sequence is pre-randomized at the start of the year.
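One way such a pre-randomized rotation could be drawn is sketched below. This is an illustration under stated assumptions, not the study's randomization code: the function `draw_rotation`, the seed value, and the student IDs are hypothetical, and the sketch draws each student's sequence independently without any blocking or balancing within classrooms, which the description above does not specify.

```python
import random

CONDITIONS = ["CAL only", "CAL + AI", "CAL + Mastery", "CAL + Mastery + AI"]


def draw_rotation(student_ids, seed):
    """Map each student to an ordered list of three distinct conditions,
    one per trimester. A fixed seed makes the draw reproducible."""
    rng = random.Random(seed)
    # Sorting the IDs keeps the result independent of input order.
    return {sid: rng.sample(CONDITIONS, 3) for sid in sorted(student_ids)}


# Hypothetical classroom of four students.
rotation = draw_rotation(["s01", "s02", "s03", "s04"], seed=2024)
for sid, sequence in rotation.items():
    print(sid, sequence)
```

Because `random.sample` draws without replacement, no student can receive the same condition twice, and each student skips exactly one of the four conditions.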
Embedded A/B test within AI arms (pre-specified): Students in the two AI arms are independently randomized to either standard prompting or a humor / relatable-examples variant. Additional embedded A/B tests may be added in registry updates before each trimester begins.
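The embedded assignment could be drawn along the following lines. This is a sketch only: the function `assign_prompt_variant`, the variant labels, and the 50/50 split are assumptions (the allocation ratio is not stated above). Using a separate seeded generator is one simple way to keep this draw independent of the main condition assignment.

```python
import random

VARIANTS = ["standard", "humor_relatable"]  # hypothetical labels


def assign_prompt_variant(ai_arm_student_ids, seed):
    """Assign each AI-arm student a prompting variant by an independent draw.

    Assumption: equal probability for each variant.
    """
    rng = random.Random(seed)  # separate stream from the main randomization
    return {sid: rng.choice(VARIANTS) for sid in sorted(ai_arm_student_ids)}


# Hypothetical IDs of students currently in an AI arm.
variants = assign_prompt_variant(["s01", "s03"], seed=99)
print(variants)
```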
Improvement checks: Every 2-4 weeks, teachers deliver short in-class "improvement check" assessments that revisit prior content after a lag to gauge retention.