Adoption and Effectiveness of AI Tutoring in Tertiary Education

Last registered on December 20, 2024

Pre-Trial

Trial Information

General Information

Title
Adoption and Effectiveness of AI Tutoring in Tertiary Education
RCT ID
AEARCTR-0014959
Initial registration date
December 19, 2024


First published
December 20, 2024, 2:25 PM EST


Locations

Location information in this trial is not publicly available.

Primary Investigator

Affiliation
ifo Institute

Other Primary Investigator(s)

PI Affiliation
IZA Bonn

Additional Trial Information

Status
In development
Start date
2024-12-09
End date
2026-12-31
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
We study whether and how much AI technologies improve learning and educational outcomes in a tertiary education context. We test this question using a large-scale randomized field experiment. Using an encouragement design, we evaluate which students select into usage of a targeted AI assistant within an online educational program, and assess the effects of this AI assistant on educational outcomes using administrative data from the university on study progress and grades.
External Link(s)

Registration Citation

Citation
Hermes, Henning and Ingo Isphording. 2024. "Adoption and Effectiveness of AI Tutoring in Tertiary Education." AEA RCT Registry. December 20. https://doi.org/10.1257/rct.14959-1.0
Experimental Details

Interventions

Intervention(s)
Our RCT investigates the effects of encouraging students to use an AI assistant on their study success. The AI assistant, integrated into the university's online learning platform, is a Chatbot that supports students by answering clarifying questions, providing examples, looking up definitions, and referencing relevant study materials. The Chatbot, built on the current GPT-4 framework, has been trained with direct references to relevant sources such as study letters, further literature, and presentations in order to assist students in their individual learning process. Students in the intervention group will receive specific encouragement to use the Chatbot more intensively, while a control group will not receive such encouragement.
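The registration does not describe the Chatbot's internals beyond its GPT-4 basis and its grounding in course sources. As a purely illustrative sketch of that grounding idea, one could rank course materials by lexical overlap with a student's question before passing the most relevant ones to the model; all document names and contents below are hypothetical, not the actual system.

```python
def top_sources(question: str, sources: dict[str, str], k: int = 2) -> list[str]:
    """Rank source documents by word overlap with the question.

    A deliberately simple stand-in for the retrieval step that grounds
    a course Chatbot's answers in study letters, literature, and slides.
    """
    q_words = set(question.lower().split())
    scored = sorted(
        sources,
        key=lambda name: len(q_words & set(sources[name].lower().split())),
        reverse=True,
    )
    return scored[:k]

# Hypothetical course materials
materials = {
    "study_letter_3": "elasticity measures how demand responds to price changes",
    "lecture_slides_1": "marginal utility is the added benefit of one more unit",
}
print(top_sources("what is price elasticity of demand", materials, k=1))
```

In a production system, the retrieved passages would be prepended to the model prompt so that answers can reference the actual study materials.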
Intervention Start Date
2025-01-02
Intervention End Date
2025-05-08

Primary Outcomes

Primary Outcomes (end points)
First, we will analyze the usage of the AI assistant (Chatbot), selection into usage, and describe the type of usage. Specific focus will be given to selection based on initial levels of skills, in particular, digital skills.

Second, we will evaluate effects of Chatbot use on educational success, measured as study progress (completed modules) and academic performance (grades).
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
- Chatbot usage vs. classical tutor support
- Heterogeneity analysis: Comparing usage of the Chatbot, study progress and academic performance by (i) gender, (ii) critical thinking and problem solving skills, (iii) digital literacy / competencies, (iv) socioeconomic background, and (v) usage type (see below).
- Perceived Benefits: Assessed through survey questions regarding students' perceptions of the benefits and returns from using the Chatbot.
- Satisfaction: Evaluated through surveys measuring satisfaction with the Chatbot.
Secondary Outcomes (explanation)
We plan to use detailed chat protocols to examine heterogeneous effects by type of the AI tutor usage. We will operationalize the usage type through several components:
1. Frequency of Usage: The number of sessions during which a student interacts with the Chatbot, measured as unique login events within the study period.
2. Extent of Usage: The cumulative length of interactions, quantified by the total number of words or characters exchanged in the chat sessions.
3. Intent of Chats: We will employ large language models (LLMs) to classify the intent behind student interactions with the Chatbot, e.g., usage for definitions or clarifications, generation of real-world examples, or other, more general inquiries.
4. Sophistication: We further employ LLMs to assess the sophistication of prompts regarding complexity, specificity, etc.
5. Interactiveness: We assess the interactiveness of chats by the number of follow-up prompts within a single session, indicating iterative engagement with the Chatbot.
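The first, second, and fifth components above can be computed directly from chat protocols. A minimal sketch, assuming a hypothetical log format of (student, session, role, text) records — the actual protocol schema is not specified in the registration:

```python
from collections import defaultdict

# Hypothetical chat-log records: (student_id, session_id, role, text)
chat_log = [
    ("s1", "sess1", "student", "What is marginal utility?"),
    ("s1", "sess1", "student", "Can you give a real-world example?"),
    ("s1", "sess2", "student", "Define elasticity please"),
    ("s2", "sess3", "student", "Summarize chapter two"),
]

def usage_metrics(log):
    """Per-student frequency (unique sessions), extent (total words typed),
    and interactiveness (follow-up prompts per session)."""
    sessions = defaultdict(set)             # student -> set of session ids
    words = defaultdict(int)                # student -> total words typed
    prompts = defaultdict(int)              # (student, session) -> prompt count
    for student, session, role, text in log:
        if role != "student":               # count only student prompts
            continue
        sessions[student].add(session)
        words[student] += len(text.split())
        prompts[(student, session)] += 1
    metrics = {}
    for student, sess_ids in sessions.items():
        n_sessions = len(sess_ids)
        n_prompts = sum(v for (s, _), v in prompts.items() if s == student)
        metrics[student] = {
            "frequency": n_sessions,
            "extent": words[student],
            # prompts beyond the first in each session, averaged over sessions
            "interactiveness": (n_prompts - n_sessions) / n_sessions,
        }
    return metrics

print(usage_metrics(chat_log))
```

Components 3 and 4 (intent and sophistication) would instead require LLM-based classification of each prompt, as described above.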

Experimental Design

Experimental Design
We will conduct a field experiment in which participants are randomly assigned to either the treatment or control group. Participants in the treatment group will receive encouragement to use an AI assistant (Chatbot), while those in the control group will not receive this encouragement. The AI assistant is generally available to all students, and all students will be part of the experiment, either in the treatment group or the control group. The experiment will also include a baseline survey for all students to gather information on their perceptions and usage of the Chatbot, as well as personal characteristics. Main outcomes will be measured using administrative data from the university as well as an endline survey.
Experimental Design Details
Not available
Randomization Method
Treatment assigned by odd/even contract number at sign-up.
Randomization Unit
Individual student
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
no clustering
Sample size: planned number of observations
7000 students
Sample size (or number of clusters) by treatment arms
3500 treatment, 3500 control
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
We estimate a small share of students actively using the AI assistant absent an intervention (<10%, n < 350). Through the encouragement design, we hope to increase the share of active users up to 20% (n = 700). Based on this, we estimate an MDE of about 18% of a standard deviation with 80% power and alpha = .05.
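The stated MDE can be reproduced approximately with a standard two-sample power formula, assuming the relevant comparison is between the ~700 expected active users under encouragement and the ~350 baseline active users — a reading of the figures above, not an explicit statement in the registration.

```python
from statistics import NormalDist

def mde_sd_units(n1: int, n2: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Minimum detectable effect in standard-deviation units for a
    two-sample comparison of means with unequal group sizes."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * (1 / n1 + 1 / n2) ** 0.5

# ~700 active users in treatment vs. ~350 in control
print(round(mde_sd_units(700, 350), 2))  # 0.18, matching the ~18% of a SD above
```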
IRB

Institutional Review Boards (IRBs)

IRB Name
Gesellschaft für experimentelle Wirtschaftsforschung e.V.
IRB Approval Date
2024-07-25
IRB Approval Number
L7hwsL39