AI-Powered Tutoring: Unleashing the Full Potential of Personalized Learning with Khanmigo

Last registered on May 09, 2024

Pre-Trial

Trial Information

General Information

Title
AI-Powered Tutoring: Unleashing the Full Potential of Personalized Learning with Khanmigo
RCT ID
AEARCTR-0013519
Initial registration date
April 28, 2024


First published
May 09, 2024, 1:52 PM EDT


Locations

There is information in this trial unavailable to the public.

Primary Investigator

Affiliation
University of Toronto

Other Primary Investigator(s)

Additional Trial Information

Status
In development
Start date
2024-06-01
End date
2026-06-30
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
Tutoring stands out as a highly effective educational policy for improving student outcomes, but its implementation is hindered by issues of scalability and cost. One solution involves equipping teachers with enhanced skills in using Computer Assisted Learning (CAL) to simulate the tutoring experience at lower cost. A recent evaluation of this approach, called "Khoaching with Khan Academy" (KWiK), demonstrated meaningful improvements in standardized math test scores for elementary students whose teachers were randomly assigned to receive such assistance, compared with students whose teachers did not receive it. Although this intervention raised average math scores, the study also highlighted significant variation in CAL practice time both within and across treated classrooms. To improve engagement with the Khan Academy platform and increase practice time, the KWiK program will leverage Khanmigo, a ChatGPT-based virtual assistant developed by Khan Academy, as a personalized tutor to support students' progress with Khan Academy assignments. A further improvement to the program comes from dedicated practice time: participating schools will set aside 30-45 minutes at the end of the school day, during independent learning time, for students to practice on Khan Academy with Khanmigo.

We will conduct a grade-level RCT, randomizing Grades 6-8 within each of about 20 schools to determine which grades' Tier 2 and 3 students receive the program. Teachers would receive mandatory training before the beginning of the school year with simplified instructions, and a project manager within the district would be hired to manage khoach meetings and field visits to help ensure fidelity. All students would be given Khanmigo access to maximize power for estimating the effect of the full program. The analysis will include impacts both at the end of the program in year 1 and a year after the program has concluded, at the end of year 2.

External Link(s)

Registration Citation

Citation
Oreopoulos, Philip. 2024. "AI-Powered Tutoring: Unleashing the Full Potential of Personalized Learning with Khanmigo." AEA RCT Registry. May 09. https://doi.org/10.1257/rct.13519-1.0
Sponsors & Partners

There is information in this trial unavailable to the public.
Experimental Details

Interventions

Intervention(s)
Educational disparities rooted in socioeconomic status (SES) continue to be a major public policy issue, with significant impacts on long-term economic opportunities for young people. Traditional classroom settings, characterized by a wide range of academic levels among students, present substantial challenges to effective teaching and learning. Teachers cannot easily navigate diverse individual educational needs while being required to present a fixed set of topics over a limited period of time. Variance in academic preparedness and learning comprehension means that advancing new material can leave some students struggling to keep up, not due to a lack of capability but because of increasing foundational gaps. This heterogeneity is further exacerbated by factors such as the COVID-19 pandemic, which has widened the educational gaps and underscored the urgency of finding scalable solutions to provide personalized instruction.

Mastery learning, a pedagogical approach where students must demonstrate a high level of understanding before moving to the next topic, offers a framework for addressing these disparities. However, the traditional model of one-on-one tutoring, while effective, may be too costly and operationally challenging to scale. This project aims to explore the roles that Computer Assisted Learning (CAL) and Large Language Model (LLM) technologies might play as scalable solutions to these challenges to support mastery learning across diverse educational settings.

The main intervention is a program to support regular CAL practice for 30-45 minutes a day during a school's "What I Need" time at the end of each school day for Tier 2 and 3 students in Grades 6-8. Randomization is at the grade level across approximately 50 schools. Schools receive training, support, and weekly guidance on successful implementation, with a focus on ensuring high-dosage practice and adequate progress. Students follow an incremental roadmap of mastery learning for both math and English, with teachers and supervisors choosing particular units and topics for students to follow. Coaches help teachers and administrative staff ensure fidelity.
Intervention Start Date
2024-08-01
Intervention End Date
2025-05-31

Primary Outcomes

Primary Outcomes (end points)
Primary outcomes are math and English assessment scores. Standardized tests are administered multiple times during the school year and at the end. These outcomes will be normalized so that, within each grade, the control group has a mean of zero and a standard deviation of one. We also aim to collect subjective survey data from both control and treated students on satisfaction with math and English and on attitudes towards school and life in general.
Primary Outcomes (explanation)
Subjective questions will be measured on a 7 point Likert scale.
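
The normalization of assessment scores described above can be sketched as follows. This is a minimal Python illustration, not the study's own code: the record layout (`grade`, `treated`, `score`), the function name, and the use of the sample standard deviation are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardize_by_control(rows):
    """Return z-scores computed from each grade's control-group mean and SD.

    `rows` is a list of dicts with hypothetical keys "grade", "treated"
    (bool) and "score". Each grade needs at least two control scores
    with some variation, otherwise the SD is undefined or zero.
    """
    # Collect control-group scores separately for each grade.
    control_scores = defaultdict(list)
    for row in rows:
        if not row["treated"]:
            control_scores[row["grade"]].append(row["score"])

    # Per-grade control mean and sample standard deviation (an assumption;
    # the registration does not say which SD estimator is used).
    stats = {g: (mean(s), stdev(s)) for g, s in control_scores.items()}

    # Apply the control-group scaling to every student in that grade.
    return [
        (row["score"] - stats[row["grade"]][0]) / stats[row["grade"]][1]
        for row in rows
    ]


# Made-up scores for a single grade, for illustration only.
example_rows = [
    {"grade": 6, "treated": False, "score": 40},
    {"grade": 6, "treated": False, "score": 50},
    {"grade": 6, "treated": False, "score": 60},
    {"grade": 6, "treated": True, "score": 65},
]
z_scores = standardize_by_control(example_rows)
```

Because the scaling uses control-group moments only, a treated student's standardized score can be read directly as a distance from the control mean in control-group standard deviations.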

Secondary Outcomes

Secondary Outcomes (end points)
We will also examine intermediate outcomes, looking at impacts on Khan Academy practice time and progress. These outcomes will be measured as total practice minutes and proficiency levels attained over the school year. In follow-up years we will also examine second-year impacts on assessment scores similar to the first-year primary outcomes.
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
The target population is Grade 6-8 students classified as Tier 3 at the beginning of the school year, along with similar Tier 2 students that schools choose to include in the program. Grades within schools would be randomized to determine which grades receive the program and support and which serve as comparison grades. Each school would have at least one grade selected, and not more than two. Grades selected through randomization will have their dedicated daily "What-I-Need" (WIN) time used for Khan Academy practice in both math and ELA. Teachers would receive mandatory training before the beginning of the school year with simplified instructions, and a project manager would be hired to manage khoach meetings and field visits to help ensure fidelity. Supervisors during WIN time will meet weekly with khoaches or RTI leads to assess progress. All students would be given Khanmigo access to maximize power for estimating the effect of the full program. The analysis will include impacts both at the end of the program in year 1 and a year after the program has concluded, at the end of year 2.
Experimental Design Details
Not available
Randomization Method
Randomization is carried out in R with a fixed seed, as follows:

1) Draw a random number between 0 and 1 for each school. Whether it falls above or below 0.5 determines whether the school has 1 or 2 treated grades.
2) Draw a random number between 0 and 1 for each grade in each school, sort the grades by this number, and treat the top 1 or 2 grades, depending on the result of (1).
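
The two steps above can be sketched in a few lines. This is a Python illustration rather than the authors' R script, and several details are assumptions not stated in the registration: the function name, the seed value, the school identifiers, and which side of 0.5 maps to one versus two treated grades.

```python
import random

GRADES = (6, 7, 8)


def assign_treated_grades(school_ids, seed):
    """Map each school to a sorted list of its treated grades."""
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible
    assignment = {}
    for school in school_ids:
        # Step 1: one uniform draw per school decides one vs. two treated
        # grades. Mapping "below 0.5" to one grade is an assumption.
        n_treated = 1 if rng.random() < 0.5 else 2

        # Step 2: one uniform draw per grade; rank grades by their draw
        # and treat the top one or two.
        draws = {grade: rng.random() for grade in GRADES}
        ranked = sorted(GRADES, key=lambda grade: draws[grade], reverse=True)
        assignment[school] = sorted(ranked[:n_treated])
    return assignment


# Hypothetical school identifiers 1..22 and a hypothetical seed.
assignment = assign_treated_grades(range(1, 23), seed=2024)
```

Under this scheme every school contributes at least one treated and at least one comparison grade, so treatment effects can be estimated from within-school contrasts.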
Randomization Unit
Grades are randomized within each of 22 schools. Each school will have either one or two of its three grades (Grades 6-8) treated.
Was the treatment clustered?
Yes

Experiment Characteristics

Sample size: planned number of clusters
22*3 = 66 for the first year. The second-year analysis will include some students treated once, some twice, and some not at all, which will help increase power and allow analysis of multiple dosages.
Sample size: planned number of observations
66 grades, with about 50 Tier 2 and 3 students in each grade (a total of 3,300 students in Year 1).
Sample size (or number of clusters) by treatment arms
33 treated grades
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
With an ICC of 0.1 (in part from conditioning on previous test scores), the minimum detectable effect (MDE) is 0.17 standard deviations at 80% power and a 5% significance level.
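
For readers who want to explore how the design parameters feed into a minimum detectable effect, the sketch below implements the generic textbook formula for a two-arm cluster-randomized design with equal cluster sizes. It is not the authors' calculation: the function name and the `r2` argument (share of outcome variance explained by baseline covariates, applied uniformly as a simplification) are assumptions, and the formula ignores within-school blocking, so it is not expected to reproduce the registered 0.17 exactly.

```python
from math import sqrt
from statistics import NormalDist


def cluster_mde(n_clusters, cluster_size, icc, share_treated=0.5,
                alpha=0.05, power=0.80, r2=0.0):
    """Minimum detectable effect in SD units for a clustered two-arm design.

    Textbook approximation:
        (z_{1-alpha/2} + z_{power})
        * sqrt((1 - r2) * (icc + (1 - icc) / m) / (P * (1 - P) * J))
    with J clusters of size m and treated share P.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    design_variance = (icc + (1 - icc) / cluster_size) / (
        share_treated * (1 - share_treated) * n_clusters
    )
    return multiplier * sqrt((1 - r2) * design_variance)


# Sample-size arithmetic taken from the registration.
n_clusters = 22 * 3           # 22 schools x 3 grades
n_students = n_clusters * 50  # about 50 Tier 2 and 3 students per grade

mde_unadjusted = cluster_mde(n_clusters, 50, icc=0.1)
mde_with_covariates = cluster_mde(n_clusters, 50, icc=0.1, r2=0.5)  # r2 is hypothetical
```

Comparing the two calls shows how sensitive the MDE is to assumptions about covariate adjustment, which is one reason a generic formula and a study-specific power calculation can differ.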
Supporting Documents and Materials

There is information in this trial unavailable to the public.
IRB

Institutional Review Boards (IRBs)

IRB Name
University of Toronto
IRB Approval Date
2024-04-23
IRB Approval Number
46104