Experimental Design
This study uses a between-subjects randomized experiment with two conditions. Participants are randomly assigned in Qualtrics to either a treatment group or a control group. The study is administered online and includes adults aged 18 or older who are transitioning to the labor market or in early career stages, including higher-education students nearing graduation, job seekers, and individuals who have recently started working. Participants in the treatment group are allowed to use generative AI and are given in-survey access to OpenAI’s o4-mini model during the experimental tasks via an integrated ChatGPT interface. Participants in the control group are not allowed to use generative AI or other external tools.
All participants complete the same four core productivity tasks under otherwise similar conditions. Each task measures one dimension of knowledge work: writing, information synthesis, creativity, or data interpretation. Performance on these four tasks is combined into the primary global productivity index. In addition, all participants complete a separate fifth task: an unannounced recall (memory) task administered at the very end of the experiment, after treatment-group participants no longer have access to the integrated AI interface. Because this task captures memory rather than contemporaneous task productivity, it is analyzed separately and is not part of the primary four-task index.
The creativity task is always presented first and the recall task is always presented last. To limit order and fatigue effects among the remaining three core tasks (writing, information synthesis, and data interpretation), these are presented in randomized order. The creativity task is fixed in the first position because it is linked to the later recall measure; keeping it fixed maintains a comparable retention interval between initial idea generation and the recall task across participants, whereas fully randomizing it would introduce avoidable variation in that delay and could bias treatment-control comparisons on the memory outcome. This design allows the study to examine whether earlier AI use affects later recall of self-generated ideas, consistent with theories of cognitive offloading and with evidence that expectations of external information availability can reduce later recall.
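The ordering rule described here can be sketched in Python. This is an illustration only: in the study itself the randomization is handled within Qualtrics, and the task labels, the function name task_order, and the seed are assumptions introduced for this example.

```python
import random

# Hypothetical labels for the three core tasks whose order is randomized.
RANDOMIZED_CORE_TASKS = ["writing", "information_synthesis", "data_interpretation"]


def task_order(rng):
    """Return one participant's task sequence.

    Creativity is fixed first and recall is fixed last; only the three
    remaining core tasks are shuffled.
    """
    middle = list(RANDOMIZED_CORE_TASKS)
    rng.shuffle(middle)
    return ["creativity"] + middle + ["recall"]


# Seeded generator so the illustration is reproducible (the seed is arbitrary).
rng = random.Random(2024)
orders = [task_order(rng) for _ in range(1000)]
print(orders[0])
```

Because only three tasks are shuffled, there are six possible sequences, and the interval between the creativity task and the recall task always spans the same three intervening tasks.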
The primary analysis is an intention-to-treat (ITT) comparison: it includes all participants who complete the study and provide outcome data, analyzed according to their randomized assignment, with no exclusions based on post-randomization behavior. Participants are considered non-analyzable only for reasons unrelated to treatment, such as failure to complete the survey, technical issues, or duplicate participation. Indicators of likely prohibited tool use in the control group are defined ex ante from Qualtrics embedded paradata (window-focus / tab-switch events and copy/paste behavior on task pages); because these indicators are measured after randomization, any analysis that excludes participants on this basis is reported as a secondary sensitivity analysis rather than as the primary specification. All such exclusions are reported transparently.
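The distinction between the primary ITT sample and the secondary sensitivity sample can be sketched in Python. Everything named here is hypothetical: the Participant fields, the single flagged_tool_use indicator standing in for the paradata-based indicators, and the made-up index values, which are placeholders rather than study data. A simple difference in means stands in for whatever estimator the analysis actually uses.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Participant:
    pid: str
    arm: str                  # "treatment" or "control", as randomized
    completed: bool           # finished the survey
    technical_issue: bool     # treatment-unrelated technical problem
    duplicate: bool           # duplicate participation
    flagged_tool_use: bool    # hypothetical paradata-based indicator
    index: Optional[float]    # global productivity index, if observed


def itt_sample(participants):
    """Primary sample: drop only for reasons unrelated to treatment."""
    return [
        p for p in participants
        if p.completed and not p.technical_issue
        and not p.duplicate and p.index is not None
    ]


def sensitivity_sample(participants):
    """Secondary sample: additionally drop flagged control participants."""
    return [
        p for p in itt_sample(participants)
        if not (p.arm == "control" and p.flagged_tool_use)
    ]


def mean_difference(sample):
    """Treatment minus control mean, grouped by randomized assignment."""
    treated = [p.index for p in sample if p.arm == "treatment"]
    control = [p.index for p in sample if p.arm == "control"]
    return mean(treated) - mean(control)


# Made-up records for illustration only.
participants = [
    Participant("P1", "treatment", True, False, False, False, 0.6),
    Participant("P2", "treatment", True, False, False, False, 0.2),
    Participant("P3", "control", True, False, False, True, 0.1),
    Participant("P4", "control", True, False, False, False, -0.3),
    Participant("P5", "control", False, False, False, False, None),
    Participant("P6", "treatment", True, False, True, False, 0.9),
]

itt_diff = mean_difference(itt_sample(participants))
sens_diff = mean_difference(sensitivity_sample(participants))
print(round(itt_diff, 3), round(sens_diff, 3))
```

In this sketch the flagged control participant P3 stays in the ITT sample and is removed only in the sensitivity sample, mirroring the rule that post-randomization indicators never define the primary specification.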
The global productivity index is the single confirmatory outcome. The task-level and component (time, quality) analyses, the moderation analysis by prior AI experience, and the recall analysis are secondary. Tests within each of these families are either corrected for multiple hypothesis testing or reported as explicitly exploratory.
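As an illustration of a within-family correction, the sketch below applies the Holm-Bonferroni step-down adjustment to one family of secondary tests. The choice of Holm is an assumption made for this example, since the design does not name a specific correction method, and the p-values are placeholders rather than results.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Placeholder p-values for a hypothetical family of four task-level tests.
task_level_p = [0.01, 0.04, 0.03, 0.20]
task_level_adjusted = holm_adjust(task_level_p)
print([round(p, 3) for p in task_level_adjusted])
```

The adjustment is applied separately within each family, so the number of tests m is the size of that family rather than the total number of secondary tests.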