Adoption and Effectiveness of AI Tutoring in Tertiary Education

Last registered on December 20, 2024

Pre-Trial

Trial Information

General Information

Title

Adoption and Effectiveness of AI Tutoring in Tertiary Education

RCT ID

AEARCTR-0014959

Initial registration date

December 19, 2024

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published

December 20, 2024, 2:25 PM EST

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Locations

There is information in this trial unavailable to the public.

Primary Investigator

Name

Henning Hermes

Affiliation

ifo Institute

Other Primary Investigator(s)

PI Name

Ingo Isphording

PI Affiliation

IZA Bonn

Additional Trial Information

Status

In development

Start date

2024-12-09

End date

2026-12-31

Keywords

Behavior, Education

Additional Keywords

JEL code(s)

Secondary IDs

Prior work

This trial does not extend or rely on any prior RCTs.

Abstract

We study whether, and by how much, AI technologies improve learning and educational outcomes in tertiary education. We address this question with a large-scale randomized field experiment. Using an encouragement design, we evaluate which students select into using a targeted AI assistant within an online educational program, and we assess the effects of this AI assistant on educational outcomes using university administrative data on study progress and grades.

External Link(s)

Registration Citation

Citation

Hermes, Henning and Ingo Isphording. 2024. "Adoption and Effectiveness of AI Tutoring in Tertiary Education." AEA RCT Registry. December 20. https://doi.org/10.1257/rct.14959-1.0

Sponsors & Partners

Experimental Details

Interventions

Intervention(s)

Our RCT investigates how encouraging students to use an AI assistant affects their study success. The AI assistant, integrated into the university's online learning platform, is a Chatbot that supports students by answering clarifying questions, providing examples, looking up definitions, and referencing relevant study materials. The Chatbot is built on the current GPT-4 framework and has been trained with direct references to relevant sources, such as study letters, further literature, or presentations, to assist students in their individual learning process. Students in the intervention group will receive specific encouragement to use the Chatbot more intensively, while the control group will receive no such encouragement.

Intervention Start Date

2025-01-02

Intervention End Date

2025-05-08

Primary Outcomes

Primary Outcomes (end points)

First, we will analyze usage of the AI assistant (Chatbot) and selection into usage, and describe the type of usage. We will focus in particular on selection based on initial skill levels, especially digital skills.

Second, we will evaluate the effects of Chatbot use on educational success, measured as study progress (completed modules) and academic performance (grades).

Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)

- Chatbot usage vs. classical tutor support
- Heterogeneity analysis: Comparing usage of the Chatbot, study progress and academic performance by (i) gender, (ii) critical thinking and problem solving skills, (iii) digital literacy / competencies, (iv) socioeconomic background, and (v) usage type (see below).
- Perceived Benefits: Assessed through survey questions regarding students' perceptions of the benefits and returns from using the Chatbot.
- Satisfaction: Evaluated through surveys measuring satisfaction with the Chatbot.

Secondary Outcomes (explanation)

We plan to use detailed chat protocols to examine heterogeneous effects by type of AI tutor usage. We will operationalize the usage type through several components:
1. Frequency of Usage: The number of sessions during which a student interacts with the Chatbot, measured as unique login events within the study period.
2. Extent of Usage: The cumulative length of interactions, quantified by the total number of words or characters exchanged in the chat sessions.
3. Intent of Chats: We will employ large language models (LLMs) to classify the intent behind student interactions with the Chatbot, e.g. usage for definitions or clarifications, generation of real-world examples, or more general inquiries or other purposes.
4. Sophistication: We will further employ LLMs to assess the sophistication of prompts regarding complexity, specificity, etc.
5. Interactiveness: We will assess the interactiveness of chats by the number of follow-up prompts within a single session, indicating iterative engagement with the Chatbot.
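
The rule-based components above (frequency, extent, interactiveness) could be computed from a prompt-level chat log roughly as follows. This is a minimal sketch, not the study's pipeline: the record layout and field names (student_id, session_id, prompt) are assumptions, session IDs stand in for login events, and extent counts only words in student prompts rather than all words exchanged. The LLM-based components (intent, sophistication) are not covered.

```python
from collections import defaultdict

# Hypothetical chat log: one record per student prompt.
# Field names (student_id, session_id, prompt) are assumptions for illustration.
chat_log = [
    {"student_id": "s1", "session_id": "a", "prompt": "What is opportunity cost?"},
    {"student_id": "s1", "session_id": "a", "prompt": "Can you give an example?"},
    {"student_id": "s1", "session_id": "b", "prompt": "Define marginal utility."},
    {"student_id": "s2", "session_id": "c", "prompt": "Summarize study letter 3."},
]


def usage_metrics(log):
    """Per-student frequency, extent, and interactiveness from prompt-level records."""
    prompts_per_session = defaultdict(lambda: defaultdict(int))
    words = defaultdict(int)
    for row in log:
        prompts_per_session[row["student_id"]][row["session_id"]] += 1
        words[row["student_id"]] += len(row["prompt"].split())

    metrics = {}
    for student, sessions in prompts_per_session.items():
        # Every prompt after the first one in a session counts as a follow-up.
        follow_ups = sum(count - 1 for count in sessions.values())
        metrics[student] = {
            "frequency": len(sessions),      # number of distinct sessions
            "extent_words": words[student],  # total words in student prompts
            "follow_ups_per_session": follow_ups / len(sessions),
        }
    return metrics


metrics = usage_metrics(chat_log)
```

In this toy log, student s1 has two sessions with one follow-up prompt in total, so the interactiveness measure is 0.5 follow-ups per session.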

Experimental Design

We will conduct a field experiment in which participants are randomly assigned to either the treatment or the control group. Participants in the treatment group will receive encouragement to use an AI assistant (Chatbot), while those in the control group will not. The AI assistant is available to all students, and all students will be part of the experiment, in either the treatment or the control group. The experiment will also include a baseline survey of all students to gather information on their perceptions and usage of the Chatbot, as well as personal characteristics. Main outcomes will be measured using administrative data from the university and an endline survey.
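
The registration does not name an estimator. One common way to analyze an encouragement design, shown here only as an illustrative sketch, is the Wald (instrumental-variable) estimator: the outcome difference by assignment divided by the usage difference by assignment. The function name, variable names, and toy data below are all hypothetical.

```python
from statistics import mean


def wald_estimate(encouraged, used, outcome):
    """Wald (instrumental-variable) estimate of the effect of Chatbot use.

    encouraged: 0/1 random assignment to encouragement
    used:       0/1 indicator of actual Chatbot use
    outcome:    numeric outcome, e.g. completed modules
    """
    treat = [i for i, z in enumerate(encouraged) if z == 1]
    ctrl = [i for i, z in enumerate(encouraged) if z == 0]

    # Intention-to-treat effect: outcome difference by assignment.
    itt = mean(outcome[i] for i in treat) - mean(outcome[i] for i in ctrl)
    # First stage: how much encouragement raises usage.
    first_stage = mean(used[i] for i in treat) - mean(used[i] for i in ctrl)
    if first_stage == 0:
        raise ValueError("encouragement did not change usage; estimate undefined")
    return itt / first_stage


# Made-up toy data, not study results.
encouraged = [1, 1, 1, 1, 0, 0, 0, 0]
used = [1, 1, 0, 0, 1, 0, 0, 0]
outcome = [5, 4, 3, 3, 4, 3, 3, 2]
late = wald_estimate(encouraged, used, outcome)
```

Under the usual instrumental-variable assumptions, this ratio is interpreted as the effect for students whose usage responds to the encouragement, not for all students.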

Experimental Design Details

Not available

Randomization Method

Treatment is assigned according to whether the running contract number a student receives at sign-up is odd or even.
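
A minimal sketch of parity-based assignment, assuming the contract number is an integer. The registration does not say which parity is treated, so mapping even numbers to treatment is purely an illustrative assumption, and the function name `assign_arm` is hypothetical.

```python
def assign_arm(contract_number: int, treated_parity: int = 0) -> str:
    """Return 'treatment' or 'control' from the parity of a running contract number.

    treated_parity=0 (even numbers treated) is an assumption for illustration;
    the registration does not state which parity receives the encouragement.
    """
    if treated_parity not in (0, 1):
        raise ValueError("treated_parity must be 0 (even) or 1 (odd)")
    return "treatment" if contract_number % 2 == treated_parity else "control"


# Consecutive sign-ups alternate between arms, giving near-equal group sizes.
arms = [assign_arm(n) for n in range(1001, 1009)]
```

Because running numbers alternate in parity, this rule splits consecutive sign-ups evenly between arms, which is consistent with the planned 3500/3500 split.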

Randomization Unit

Individual student

Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters

no clustering

Sample size: planned number of observations

7000 students

Sample size (or number of clusters) by treatment arms

3500 treatment, 3500 control

Minimum detectable effect size for main outcomes (accounting for sample design and clustering)

We expect only a small share of students to actively use the AI assistant absent an intervention (<10%, n < 350). Through the encouragement design, we hope to increase the share of active users to as much as 20% (n = 700). Based on this, we estimate an MDE of about 18% of a standard deviation with 80% power and alpha = .05.
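
The registration does not spell out the formula behind this figure. As a rough check, the sketch below applies the textbook two-group formula, MDE = (z_{1-alpha/2} + z_{power}) * sqrt(1/n1 + 1/n2), in standard-deviation units. Comparing 700 with 350 active users is an assumption made here for illustration, not something stated in the registration; under that reading the result is close to 0.18.

```python
from math import sqrt
from statistics import NormalDist


def mde_in_sd(n1: int, n2: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Minimum detectable difference in means, in standard-deviation units,
    for a two-sided test comparing two independent groups with equal variance."""
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * sqrt(1 / n1 + 1 / n2)


# Assumed reading: 700 active users (treatment) vs. 350 active users (control).
mde_users = mde_in_sd(700, 350)

# For comparison: intention-to-treat contrast of the two full arms.
mde_itt = mde_in_sd(3500, 3500)
```

The same function applied to the two full arms of 3500 students gives a much smaller intention-to-treat MDE, since that comparison uses the whole sample.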

Supporting Documents and Materials

IRB

Institutional Review Boards (IRBs)

IRB Name

Gesellschaft für experimentelle Wirtschaftsforschung e.V.

IRB Approval Date

2024-07-25

IRB Approval Number

L7hwsL39

Analysis Plan