NUMI, Mastery Learning, and Short-Duration Math Practice

Last registered on April 14, 2026

Pre-Trial

Trial Information

General Information

Title
NUMI, Mastery Learning, and Short-Duration Math Practice
RCT ID
AEARCTR-0018315
Initial registration date
April 09, 2026

First published
April 14, 2026, 9:01 AM EDT

Locations

Primary Investigator

Affiliation
University of Toronto

Other Primary Investigator(s)

PI Affiliation
University of Toronto
PI Affiliation
University of Pennsylvania

Additional Trial Information

Status
Completed
Start date
2026-03-30
End date
2026-04-03
Secondary IDs
None
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
We are studying whether short classroom math practice activities can be improved by combining mastery learning with an AI tutor. Middle school students are randomly assigned to different versions of an online math practice activity during class. Some students receive access to an AI tutor, while others do not. Some students complete the activity under mastery-learning rules, meaning they must watch enough of the lesson video and answer questions correctly before moving on, while others can move through the activity more freely. Students are also randomly assigned which math topic they practice.

About one week later, students take a test that includes questions on both the topic they practiced and a topic they did not practice. This allows us to compare how much students learned from the activity and whether the effects differ depending on AI access and mastery learning. We also study how far students progress through the activity, how they interact with the AI tutor, and whether technical issues such as website speed may affect results.
External Link(s)

Registration Citation

Citation
Liut, Michael, Philip Oreopoulos and Alp Sangu. 2026. "NUMI, Mastery Learning, and Short-Duration Math Practice." AEA RCT Registry. April 14. https://doi.org/10.1257/rct.18315-1.0
Experimental Details

Interventions

Intervention(s)
Students participated in an online classroom math practice activity during a regular class period. They were randomly assigned to receive access to an AI math tutor or not, to complete the activity either under mastery-learning rules or under a more flexible non-mastery format, and to practice one of two math topics. In the mastery condition, students had to watch at least part of the instructional video and answer questions correctly before moving on. In the non-mastery condition, students could move through the material more freely. Approximately one week later, students completed a follow-up test.
Intervention (Hidden)
Students in grades 6, 7, and 8 completed an online math practice activity during a 50-minute class period. Students were randomized along three dimensions: (i) AI tutor access, (ii) mastery-learning rules, and (iii) topic assignment. Each student was assigned one of two topic bundles, each consisting of two sets of “one video, one exercise,” followed by an exit ticket.

In the mastery condition, students had to watch at least one minute of the video and answer three questions in a row correctly before moving to the next set, and again before reaching the exit ticket. In the non-mastery condition, students had to attempt at least one question but could otherwise move through videos, exercises, and the exit ticket more freely.

Students assigned to AI had access to NUMI, an AI math tutor. NUMI provided several forms of support, including “help get me started” hints, structured step-by-step walkthroughs after mistakes, optional explanations of selected solution steps, and limited open-ended interaction. Students not assigned to AI could view solutions after mistakes but did not receive tutoring support.

Approximately one week later, students completed a test with one question corresponding to each exercise type for both the practiced and unpracticed topic bundles. This design allows estimation of the effect of practice by comparing each student’s performance on the topic they practiced to their performance on the topic they did not practice.
Intervention Start Date
2026-03-30
Intervention End Date
2026-04-03

Primary Outcomes

Primary Outcomes (end points)
The primary outcome is student performance on the week-later test, measured as the within-student difference between performance on questions corresponding to the topic assigned for practice and performance on questions corresponding to the topic not assigned for practice.
Primary Outcomes (explanation)
Each student's follow-up test contains, for both the practiced and the unpracticed topic bundle, one question corresponding to exercise 1 and one question corresponding to exercise 2. The main outcome is constructed as:

(score on practiced-topic questions) minus (score on unpracticed-topic questions).

We will also examine this separately for the question corresponding to exercise 1 and the question corresponding to exercise 2.
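
For concreteness, the outcome could be assembled from question-level scores along the following lines (a minimal sketch; the data layout and column names such as practiced_e1 are hypothetical, not the platform's actual schema):

```python
import pandas as pd

# Hypothetical follow-up test data: one row per student, each question
# scored 0/1. Column names are illustrative, not the platform's schema.
tests = pd.DataFrame({
    "student_id":     [1, 2, 3],
    "practiced_e1":   [1, 0, 1],   # practiced topic, exercise-1 question
    "practiced_e2":   [1, 1, 0],   # practiced topic, exercise-2 question
    "unpracticed_e1": [0, 0, 1],   # unpracticed topic, exercise-1 question
    "unpracticed_e2": [1, 0, 0],   # unpracticed topic, exercise-2 question
})

# Primary outcome: within-student practiced-minus-unpracticed score.
tests["practice_effect"] = (
    (tests["practiced_e1"] + tests["practiced_e2"])
    - (tests["unpracticed_e1"] + tests["unpracticed_e2"])
)

# Exercise-specific versions, examined separately as noted above.
tests["practice_effect_e1"] = tests["practiced_e1"] - tests["unpracticed_e1"]
tests["practice_effect_e2"] = tests["practiced_e2"] - tests["unpracticed_e2"]
```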

Secondary Outcomes

Secondary Outcomes (end points)
Secondary outcomes include:

- total follow-up test score
- score on the practiced-topic questions
- score on the unpracticed-topic questions
- exit ticket score
- progression through the practice activity, including whether students reached the second set and the exit ticket
- mastery-specific progression outcomes, including whether students cleared the first and second mastery gates
- AI usage and engagement measures, including whether students used the AI tutor, number of interactions, and time spent interacting with the tutor
Secondary Outcomes (explanation)
The exit ticket score is based on two questions completed during the practice session, one corresponding to the first exercise and one corresponding to the second. Progression outcomes are constructed from platform logs and include whether the student reached or completed major steps in the activity. Mastery-gate outcomes are based on whether a student satisfied the platform’s progression requirements for each set. AI usage outcomes are constructed from interaction logs, including counts and timing of tutor use.
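
As an illustration, the per-student AI engagement measures might be aggregated from raw interaction logs roughly as follows (a sketch under an assumed log schema; the actual NUMI logs may be structured differently):

```python
import pandas as pd

# Hypothetical interaction log: one row per exchange with the tutor.
logs = pd.DataFrame({
    "student_id": [1, 1, 2],
    "timestamp": pd.to_datetime(
        ["2026-03-30 10:01", "2026-03-30 10:04", "2026-03-30 10:02"]),
    "duration_s": [40, 75, 30],   # seconds spent on the exchange
})

# Per-student engagement: interaction count and total time with the tutor.
usage = logs.groupby("student_id").agg(
    n_interactions=("timestamp", "size"),
    time_spent_s=("duration_s", "sum"),
).reset_index()
usage["used_ai"] = usage["n_interactions"] > 0
print(usage)
```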

Experimental Design

Experimental Design
This study uses randomized assignment of middle school students to different versions of an in-class online math practice activity. Students are randomly assigned to: (i) access to an AI tutor or not, (ii) mastery-learning rules or not, and (iii) one of two math topics to practice. About one week later, students complete a follow-up test covering both the topic they practiced and the topic they did not practice. The main analysis compares, within student, performance on practiced versus unpracticed material across treatment groups.
Experimental Design Details
The design is a 2 x 2 factorial experiment crossing AI tutor access with mastery learning, with an additional random assignment of practice topic. The four main treatment groups are: Mastery + AI, Non-mastery + AI, Mastery + no AI, and Non-mastery + no AI. Topic assignment is randomized independently. The primary estimand is the within-student practice effect, defined as performance on follow-up test questions corresponding to the practiced topic minus performance on follow-up test questions corresponding to the unpracticed topic. Secondary analyses examine progression through the activity, exit ticket performance, mastery-gate completion, AI engagement, and heterogeneity by grade and topic. Exploratory robustness analyses will examine whether website slowness and low-effort test-taking affect results.
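
Under assumed variable names, the factorial comparison can be read as a saturated regression of the within-student practice effect on the two treatment indicators and their interaction. The sketch below, with simulated placeholder data, is one way to write it, not the registered estimation code:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 8000

# Simulated placeholder data; `practice_effect` stands in for the
# within-student practiced-minus-unpracticed score.
df = pd.DataFrame({
    "ai":      rng.integers(0, 2, n),   # 1 = AI tutor access
    "mastery": rng.integers(0, 2, n),   # 1 = mastery-learning rules
})
df["practice_effect"] = 0.10 + 0.05 * df["ai"] + rng.normal(0, 1, n)

# Saturated 2 x 2 model: AI and mastery main effects plus interaction.
fit = smf.ols("practice_effect ~ ai * mastery", data=df).fit(cov_type="HC1")
print(fit.params)
```
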
Randomization Method
Randomization was done by computer through the online platform at the student level.
Randomization Unit
Individual student. Topic assignment, AI assignment, and mastery assignment were all randomized at the individual student level.
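
Because all three dimensions are independent student-level draws, the assignment mechanism amounts to three coin flips per student, as in this illustrative sketch (the platform's actual implementation is not described in the registration):

```python
import numpy as np

rng = np.random.default_rng(seed=2026)
n_students = 8000

# Three independent Bernoulli(0.5) draws per student.
ai_access = rng.integers(0, 2, n_students)   # 1 = AI tutor available
mastery   = rng.integers(0, 2, n_students)   # 1 = mastery-learning rules
topic     = rng.integers(0, 2, n_students)   # 0/1 = which topic bundle
```
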
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
Not applicable; randomization is at the individual student level.
Sample size: planned number of observations
Approximately 8,000 students in grades 6–8
Sample size (or number of clusters) by treatment arms
Approximately a 50/50 split on each randomized dimension (AI access, mastery rules, topic), implying roughly 2,000 students in each of the four main treatment arms.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Assuming approximately 8,000 students, individual-level randomization, 80 percent power, and a two-sided 5 percent significance level, the study is powered to detect effects of about 0.06 standard deviations for a main 50/50 treatment comparison on standardized test-score outcomes. For pairwise comparisons across the four main treatment arms, the corresponding minimum detectable effect is about 0.09 standard deviations.
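
These figures are consistent with the standard two-sample formula for the minimum detectable effect on a unit-variance outcome; a quick back-of-the-envelope check, assuming equal splits:

```python
from scipy.stats import norm

def mde(n1, n2, alpha=0.05, power=0.80):
    """Minimum detectable effect, in SD units, for comparing the means
    of two groups on an outcome with unit variance."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z * (1 / n1 + 1 / n2) ** 0.5

print(round(mde(4000, 4000), 3))  # ~0.063: main 50/50 split, n = 8,000
print(round(mde(2000, 2000), 3))  # ~0.089: pairwise, two arms of ~2,000
```
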
IRB

Institutional Review Boards (IRBs)

IRB Name
IRB Approval Date
IRB Approval Number
Analysis Plan

Analysis Plan Documents

full preanalysis plan

MD5: e950d863ce9841d6ee630e095f9a5fb6

SHA1: ababd29e615cce1d928975fa065bc3d991ee17d9

Uploaded At: April 09, 2026

Post-Trial

Post Trial Information

Study Withdrawal

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials