Primary Outcomes (explanation)
Survey outcomes:
- Use of prompt engineering techniques (index): an index aggregating multiple Likert-scale questions (steps, context, example, practice), so that the scale is increasing in the use of the prompting techniques covered in the AI training; a sketch of one possible construction follows.
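One way to build the index is sketched below, assuming the common construction of z-scoring each Likert item and then averaging; the plan specifies only that the items are aggregated, and the column names are illustrative.

```python
import pandas as pd

# Assumed survey column names for the four Likert items.
LIKERT_ITEMS = ["steps", "context", "example", "practice"]

def prompting_index(df: pd.DataFrame) -> pd.Series:
    """Z-score each Likert item, then average, so the index increases
    with reported use of the trained prompting techniques."""
    z = (df[LIKERT_ITEMS] - df[LIKERT_ITEMS].mean()) / df[LIKERT_ITEMS].std(ddof=1)
    return z.mean(axis=1)
```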
Administrative outcomes:
- Percent of prompts that do not copy-paste the question: we will use AI to identify the proportion of prompts that are not a low-effort copy-paste of the question, leaving prompts in which the student engages in a productive discussion with the AI (e.g., trying to understand the problem, clarifying a concept, or double-checking an answer). Because Franco, Irmert, and Isaksson (2026, SSRN) find that students perform worse when over 80% of the prompt resembles the original question, we will classify copy-paste prompts using a similarity threshold of close to 80% (see the first sketch after this list). We will build on the classification prompts in Fischer, Rau, and Rilke (2025, IZA) to classify prompts in our context.
- Percent of prompts that utilize at least one prompt engineering technique (role, context, steps, example, or practice): we will use AI to identify prompts that utilize one of the techniques covered in the AI training module (see the second sketch after this list). If students use these techniques only rarely, we may instead consider the binary outcome of whether a student employs any prompt engineering technique at all.
- We will standardize the closed-book exam score using the control group's mean and standard deviation, as is standard, so that effect sizes are expressed as SD changes relative to the control group (see the third sketch after this list).
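The copy-paste screen could look like the sketch below, which uses a character-level similarity ratio as a stand-in for the AI-based resemblance measure; the 80% cutoff follows the discussion above, and the function names are illustrative.

```python
from difflib import SequenceMatcher

COPY_PASTE_THRESHOLD = 0.80  # ~80% resemblance cutoff discussed above

def is_copy_paste(prompt: str, question: str) -> bool:
    """Flag a prompt whose text largely duplicates the original question."""
    ratio = SequenceMatcher(None, prompt.lower(), question.lower()).ratio()
    return ratio >= COPY_PASTE_THRESHOLD

def pct_non_copy_paste(prompts: list[str], question: str) -> float:
    """Share of a student's prompts that are not low-effort copies."""
    if not prompts:
        return float("nan")
    return sum(not is_copy_paste(p, question) for p in prompts) / len(prompts)
```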
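For the technique flag, the sketch below uses simple keyword heuristics purely as a stand-in for the AI classifier described above; the patterns are illustrative, not the classifier we will deploy.

```python
import re

# Illustrative cues for each technique covered in the training module.
TECHNIQUE_PATTERNS = {
    "role": re.compile(r"\b(you are|act as|pretend to be)\b", re.I),
    "context": re.compile(r"\b(in my course|for this assignment|given that)\b", re.I),
    "steps": re.compile(r"\b(step[- ]by[- ]step|walk me through)\b", re.I),
    "example": re.compile(r"\b(for example|give (me )?an example)\b", re.I),
    "practice": re.compile(r"\b(practice (problem|question)s?|quiz me)\b", re.I),
}

def uses_any_technique(prompt: str) -> bool:
    """True if the prompt shows at least one prompt engineering technique."""
    return any(p.search(prompt) for p in TECHNIQUE_PATTERNS.values())
```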
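Control-group standardization of the exam score is sketched below; the column names are assumed for illustration.

```python
import pandas as pd

def standardize_by_control(df: pd.DataFrame,
                           score_col: str = "exam_score",
                           treat_col: str = "treated") -> pd.Series:
    """Express exam scores in control-group SD units, centered at the
    control-group mean, so treatment effects read as SD changes."""
    ctrl = df.loc[df[treat_col] == 0, score_col]
    return (df[score_col] - ctrl.mean()) / ctrl.std(ddof=1)
```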
To validate the AI-generated outcomes, a student research assistant will hand-check a random subset of the data to assess whether the AI classifications are reliable (a sketch of the audit follows). If the AI classifications prove unreliable, we will instead infer prompt quality through manual inspection.
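The audit could proceed as in the sketch below: draw a reproducible random subset for the research assistant to re-code, then summarize AI-versus-RA agreement. Summarizing agreement with Cohen's kappa is our assumption here, as the plan does not fix a reliability statistic.

```python
import random

def audit_sample(items: list, k: int = 100, seed: int = 0) -> list:
    """Reproducible random subset of prompts for manual re-coding."""
    rng = random.Random(seed)
    return rng.sample(items, min(k, len(items)))

def cohens_kappa(ai_labels: list[int], ra_labels: list[int]) -> float:
    """Cohen's kappa for two parallel lists of binary (0/1) labels."""
    n = len(ai_labels)
    p_o = sum(a == r for a, r in zip(ai_labels, ra_labels)) / n  # observed agreement
    p_ai, p_ra = sum(ai_labels) / n, sum(ra_labels) / n
    p_e = p_ai * p_ra + (1 - p_ai) * (1 - p_ra)  # chance agreement
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)
```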