Primary Outcomes (explanation)
Survey outcomes:
- Use of prompt engineering techniques (index): an index aggregating multiple Likert-scale questions (steps, context, example, practice), so that the scale is increasing in the use of the prompting techniques covered in the AI training; a sketch of one possible construction follows.
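One way to build the index is sketched below, assuming the common construction of z-scoring each Likert item and then averaging; the plan specifies only that the items are aggregated, and the column names are illustrative.

```python
import pandas as pd

# Assumed survey column names for the four Likert items.
LIKERT_ITEMS = ["steps", "context", "example", "practice"]

def prompting_index(df: pd.DataFrame) -> pd.Series:
    """Z-score each Likert item, then average, so the index increases
    with reported use of the trained prompting techniques."""
    z = (df[LIKERT_ITEMS] - df[LIKERT_ITEMS].mean()) / df[LIKERT_ITEMS].std(ddof=1)
    return z.mean(axis=1)
```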
Administrative outcomes:
- Percent of prompts that do not copy-paste the question: we will use AI to identify the proportion of prompts that are not a low-effort copy-paste of the question, leaving prompts in which the student engages in a productive discussion with the AI (e.g., trying to understand the problem, clarifying a concept, or double-checking an answer). Because Franco, Irmert, and Isaksson (2026, SSRN) find that students perform worse when over 80% of the prompt resembles the original question, we will classify copy-paste prompts using a similarity threshold of close to 80% (see the first sketch after this list). We will build on the classification prompts in Fischer, Rau, and Rilke (2025, IZA) to classify prompts in our context.
- Percent of prompts that utilize at least one prompt engineering technique (role, context, steps, example, or practice): we will use AI to identify prompts that utilize one of the techniques covered in the AI training module (see the second sketch after this list). If students use these techniques only rarely, we may instead consider the binary outcome of whether a student employs any prompt engineering technique at all.
- We will standardize the closed-book exam score using the control group's mean and standard deviation, as is standard, so that effect sizes are expressed as SD changes relative to the control group (see the third sketch after this list).
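The copy-paste screen could look like the sketch below, which uses a character-level similarity ratio as a stand-in for the AI-based resemblance measure; the 80% cutoff follows the discussion above, and the function names are illustrative.

```python
from difflib import SequenceMatcher

COPY_PASTE_THRESHOLD = 0.80  # ~80% resemblance cutoff discussed above

def is_copy_paste(prompt: str, question: str) -> bool:
    """Flag a prompt whose text largely duplicates the original question."""
    ratio = SequenceMatcher(None, prompt.lower(), question.lower()).ratio()
    return ratio >= COPY_PASTE_THRESHOLD

def pct_non_copy_paste(prompts: list[str], question: str) -> float:
    """Share of a student's prompts that are not low-effort copies."""
    if not prompts:
        return float("nan")
    return sum(not is_copy_paste(p, question) for p in prompts) / len(prompts)
```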
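For the technique flag, the sketch below uses simple keyword heuristics purely as a stand-in for the AI classifier described above; the patterns are illustrative, not the classifier we will deploy.

```python
import re

# Illustrative cues for each technique covered in the training module.
TECHNIQUE_PATTERNS = {
    "role": re.compile(r"\b(you are|act as|pretend to be)\b", re.I),
    "context": re.compile(r"\b(in my course|for this assignment|given that)\b", re.I),
    "steps": re.compile(r"\b(step[- ]by[- ]step|walk me through)\b", re.I),
    "example": re.compile(r"\b(for example|give (me )?an example)\b", re.I),
    "practice": re.compile(r"\b(practice (problem|question)s?|quiz me)\b", re.I),
}

def uses_any_technique(prompt: str) -> bool:
    """True if the prompt shows at least one prompt engineering technique."""
    return any(p.search(prompt) for p in TECHNIQUE_PATTERNS.values())
```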
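Control-group standardization of the exam score is sketched below; the column names are assumed for illustration.

```python
import pandas as pd

def standardize_by_control(df: pd.DataFrame,
                           score_col: str = "exam_score",
                           treat_col: str = "treated") -> pd.Series:
    """Express exam scores in control-group SD units, centered at the
    control-group mean, so treatment effects read as SD changes."""
    ctrl = df.loc[df[treat_col] == 0, score_col]
    return (df[score_col] - ctrl.mean()) / ctrl.std(ddof=1)
```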
To validate the AI-generated outcomes, a student research assistant will hand-check a random subset of the data to assess whether the AI classifications are reliable (a sketch of the audit follows). If the AI classifications prove unreliable, we will instead infer prompt quality through manual inspection.
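The audit could proceed as in the sketch below: draw a reproducible random subset for the research assistant to re-code, then summarize AI-versus-RA agreement. Summarizing agreement with Cohen's kappa is our assumption here, as the plan does not fix a reliability statistic.

```python
import random

def audit_sample(items: list, k: int = 100, seed: int = 0) -> list:
    """Reproducible random subset of prompts for manual re-coding."""
    rng = random.Random(seed)
    return rng.sample(items, min(k, len(items)))

def cohens_kappa(ai_labels: list[int], ra_labels: list[int]) -> float:
    """Cohen's kappa for two parallel lists of binary (0/1) labels."""
    n = len(ai_labels)
    p_o = sum(a == r for a, r in zip(ai_labels, ra_labels)) / n  # observed agreement
    p_ai, p_ra = sum(ai_labels) / n, sum(ra_labels) / n
    p_e = p_ai * p_ra + (1 - p_ai) * (1 - p_ra)  # chance agreement
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)
```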