|
Field
Abstract
|
Before
This randomized controlled trial (RCT) evaluates the causal impact of an AI-powered Research Development Tool on the academic productivity and well-being of researchers. Participants, primarily PhD students and junior female economists, will be randomly assigned to one of two groups: a control group receiving feedback from a general-purpose AI, or a treatment group gaining access to a comprehensive AI-driven "Research Development Suite." This suite offers detailed, structured feedback on research papers and integrated workflow features. Over a 24-month intervention period, we will measure changes in objective productivity metrics (e.g., papers submitted/published, co-author networks) and subjective well-being (e.g., job satisfaction, work-life balance). The study aims to determine whether advanced AI tools can enhance career development, foster equity in the research community by making tacit knowledge more accessible, and ultimately improve the quality of academic output.
|
After
This randomized controlled trial (RCT) evaluates the causal impact of an AI-powered Research Development Tool on the academic productivity and well-being of early-career researchers. Participants — PhD students and junior economists within five years of their doctorate — will be randomly assigned, within career-stage strata, to either a control group with access to a general-purpose AI or a treatment group with access to a comprehensive AI-driven Research Development Suite offering structured, expert-level feedback on research papers. The control group will receive access 12 months after the experiment's start date. Over a 24-month intervention period, we measure changes in externally evaluated research quality, paper submission rates, job satisfaction, and work-life balance.
Our central question is whether the productivity effects of AI access vary systematically with the researcher's level of accumulated tacit knowledge and structural advantage — specifically evaluating heterogeneity by career stage and controlling for baseline institutional prestige. We test whether AI acts as a substitute for elite networks (compressing inequality) or a complement to existing advantages (amplifying inequality).
|
|
Field
Last Published
|
Before
April 09, 2026 05:01 PM
|
After
April 20, 2026 05:28 PM
|
|
Field
Intervention (Public)
|
Before
This is a two-arm, parallel-group, single-blind randomized controlled trial (RCT). Participants will be randomly allocated to either a control group or a treatment group in a 1:1 ratio. The intervention will last for 24-26 months. Recruitment will target PhD students and junior female economists via email campaigns and professional development workshops. Data will be collected through baseline and endline surveys and continuous behavioral logging from the AI platform. The study aims to evaluate the tool's impact on academic productivity and well-being.
|
After
This is a two-arm, parallel-group, single-blind randomized controlled trial (RCT). Participants — PhD students and junior economists within five years of their doctorate — will be randomly allocated to either a control group or a treatment group in a 1:1 ratio, using stratified randomization by career stage. The intervention will last for 24 months. Data will be collected through baseline and endline surveys and continuous behavioral logging from the AI platform. The study aims to evaluate whether the productivity effects of AI access vary systematically with the researcher's level of accumulated tacit knowledge.
|
|
Field
Intervention Start Date
|
Before
February 01, 2026
|
After
May 11, 2026
|
|
Field
Primary Outcomes (End Points)
|
Before
Change in the number of papers submitted for publication.
Change in the number of papers published (working papers, R&Rs, accepted/published journal articles).
Change in self-reported job satisfaction.
Change in self-reported work-life balance satisfaction.
|
After
1. Change in objective academic progression milestones over the 24-month period, specifically measured by: (a) rate of desk rejections, (b) invitations to Revise and Resubmit (R&R), and (c) total number of working papers publicly circulated (e.g., NBER, CEPR, SSRN, or university working paper series)
2. Heterogeneity of the treatment effect on objective publication milestones by career stage: whether the impact of AI access on reducing desk rejections, increasing R&R invitations, and overall research output differs systematically between PhD students and junior economists.
3. Change in the number of papers submitted for publication.
4. Change in self-reported job satisfaction.
5. Change in self-reported work-life balance satisfaction.
|
|
Field
Experimental Design (Public)
|
Before
This is a two-arm, parallel-group, single-blind randomized controlled trial (RCT). Participants will be randomly allocated to either a control group or a treatment group in a 1:1 ratio. The intervention will last for 24-26 months. Recruitment will target PhD students and junior female economists via email campaigns and professional development workshops. Data will be collected through baseline and endline surveys and continuous behavioral logging from the AI platform. The study aims to evaluate the tool's impact on academic productivity and well-being.
|
After
This is a two-arm, parallel-group, single-blind randomized controlled trial (RCT) evaluated over a 24-month period. The study investigates the causal impact of a structured, AI-powered Research Development Tool on the academic productivity of early-career researchers. Participants will be stratified by career stage (PhD students vs. junior economists within five years of their doctorate) and randomized in a 1:1 ratio into either a treatment group or a waitlist control group.
The treatment group will receive immediate access to the AI tool, which provides expert-level feedback on working papers. The waitlist control group will conduct their research as usual (which may include ad-hoc use of general-purpose LLMs) and will receive access to the treatment tool after the first 12 months of the intervention. Our primary objective is to measure changes in real-world publication milestones (e.g., desk rejection rates, Revise & Resubmit invitations, and working paper circulation). Secondarily, we will test whether the treatment effect varies systematically with the researcher's level of accumulated tacit knowledge (career stage), identifying whether AI access compresses or amplifies existing inequalities in research output.
|
|
Field
Randomization Method
|
Before
Individual participants will be randomly assigned to one of the two arms using a computerized random assignment algorithm, ensuring a 1:1 allocation ratio. To mitigate potential contamination and spillovers, each participant will receive a unique, tokenized access link to their assigned tool. The platform will prevent simultaneous logins from different IP addresses using the same token. Furthermore, participants in the control group will be informed about a waitlist control design, where they will be granted full access to the treatment tool after the intervention period concludes, reducing any incentive to seek access to the treatment arm during the study. Heterogeneous treatment effects will be explored for PhD students versus junior professors.
|
After
Individual participants will be randomly assigned to one of the two arms using a computerized stratified random assignment. Stratification is based on one binary variable: Career Stage (PhD Student vs. Junior Economist).
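The stratified 1:1 assignment described above can be sketched as follows. This is an illustrative sketch only, not the study's actual assignment code; the function and variable names (`stratified_assign`, `participants`, the stratum labels) are hypothetical, and the real implementation would additionally log the seed and assignment for audit.

```python
import random

def stratified_assign(participants, seed=2026):
    """Assign participants 1:1 to treatment/control within each
    career-stage stratum. `participants` is a list of
    (participant_id, career_stage) tuples, with career_stage in
    {"phd", "junior"}."""
    rng = random.Random(seed)  # fixed seed for a reproducible allocation
    assignment = {}
    for stage in ["phd", "junior"]:
        ids = [pid for pid, s in participants if s == stage]
        rng.shuffle(ids)
        half = len(ids) // 2
        for pid in ids[:half]:
            assignment[pid] = "treatment"
        for pid in ids[half:]:
            assignment[pid] = "control"
    return assignment
```

Because the shuffle-and-split happens separately inside each stratum, the 1:1 ratio holds exactly within career stage, which is what guarantees balance on the key moderator.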
|
|
Field
Planned Number of Clusters
|
Before
Not applicable as randomization is individual
|
After
Not applicable (individual-level randomization).
|
|
Field
Planned Number of Observations
|
Before
The primary planned sample size is 128 participants. This number is derived to detect a medium effect size. Sample size will be sensitive to the observed effect size; a study targeting smaller effects may require up to 786 participants.
|
After
Our primary recruitment target is 504 participants (252 per arm), which provides 80% power to detect a moderate effect (d = 0.50) for the treatment × career-stage interaction, and more than adequate power for the main ATE at that magnitude. We treat this as our baseline operational target; because the existing literature on AI and cognitively demanding tasks (Brynjolfsson et al., 2023; Noy & Zhang, 2023) reports effects in the small-to-medium range, detecting smaller effects would require a larger sample. Gender interactions are pre-specified as exploratory: detecting a triple interaction (treatment × career stage × gender) requires approximately four times the sample needed for the main interaction.
|
|
Field
Sample size (or number of clusters) by treatment arms
|
Before
Arm 1 (Control Group): 64 participants
Arm 2 (Treatment Group - RefereeAI Suite): 64 participants
|
After
Arm 1 (Control Group - waitlist/status quo): 252 participants
Arm 2 (Treatment Group - RefereeAI Suite): 252 participants
|
|
Field
Power calculation: Minimum Detectable Effect Size for Main Outcomes
|
Before
Accounting for a two-arm design with individual randomization, the power analysis for 80% power and a significance level of 0.05 indicates the following sample size requirements per arm:
1. Large Effect (Cohen's d = 0.8): 25 participants per arm (50 total).
2. Medium Effect (Cohen's d = 0.5): 64 participants per arm (128 total). This is our primary target sample size.
3. Small Effect (Cohen's d = 0.2): 393 participants per arm (786 total).
Our planned sample size of 128 (64 per arm) is powered to detect a medium effect. A larger sample would be required to detect smaller effects with the same statistical power. If a sample of 2000 researchers (1000 per arm) were achieved, the study would be powered to detect a very small effect size of approximately Cohen's d = 0.18. This high statistical power implies that we would be able to detect even a minimal, subtle difference between the two groups. However, detecting a very small effect forces a critical interpretation of the results: while statistically significant, an effect of this magnitude (d=0.18) may lack practical significance in terms of meaningful real-world impact on researcher productivity or well-being.
|
After
Power calculations assume a two-arm, individual-level randomized trial with no clustering (α = 0.05, power = 80%).
Crucially, our primary hypothesis investigates whether the AI tool democratizes tacit knowledge, meaning we are testing an interaction effect (Treatment × Career Stage) rather than just an Average Treatment Effect (ATE).
Our baseline recruitment target of 504 participants is powered under the Optimistic scenario. Assuming an approximately 50/50 stratified split between PhD students and junior economists, the Minimum Detectable Effect Size (MDES) for the interaction under different enrollment scenarios is as follows:
- Optimistic Target (Moderate Effect, MDES = 0.50 standard deviations): Requires 504 total participants (252 per arm). This is our baseline operational target. (Note: detecting the main ATE alone at this magnitude would require just 126 total participants).
- Reference Scenario (Small-to-Moderate Effect, MDES = 0.30 SD): Requires 1,396 total participants (698 per arm). If recruitment exceeds our baseline target, this is our extended goal.
- Conservative Scenario (Small Effect, MDES = 0.20 SD): Requires 3,140 total participants (1,570 per arm).
Operational Constraints on Power:
These calculations depend on the demographic balance of our strata. Statistical power for an interaction is constrained by the size of the smallest cell: the precision of the interaction estimate scales with p(1−p), which is maximized at p = 0.5.
If our real-world recruitment pool deviates from a 50/50 split, the required sample size increases significantly to maintain the same MDES. For example, to detect an effect of d = 0.50, a 60/40 demographic split increases the required sample from 504 to 524 participants. To transparently pre-register our power constraints, we outline the expected sample requirements under varying degrees of demographic imbalance (all at d = 0.50):
- Optimal Split (50/50): Requires 504 total participants (252/arm). The minimum cell size is 126. This is our baseline operational target.
- Mild Imbalance (60/40 or 40/60): Requires 524 total participants (262/arm).
- Moderate Imbalance (70/30 or 30/70): Requires 598 total participants (299/arm).
- Severe Imbalance (80/20 or 20/80): Requires 786 total participants (393/arm).
- Extreme Imbalance (90/10 or 10/90): Requires 1,396 total participants (698/arm).
If the true effect of the AI tool is smaller (d = 0.30), the required sample sizes increase substantially:
- Optimal Split (50/50): Requires 1,396 total participants (698/arm).
- Moderate Imbalance (70/30 or 30/70): Requires 1,662 total participants (831/arm).
- Severe Imbalance (80/20 or 20/80): Requires 2,182 total participants (1,091/arm).
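The scenario tables above follow the standard normal-approximation formula for a two-sided test: the interaction requires four times the main-effect sample, inflated by 0.25 / (p(1−p)) for stratum imbalance. The sketch below reproduces those figures under that assumption (a few table entries round the continuous solution down rather than up, so individual rows may differ by a participant or two); the function name is illustrative, not part of the registered analysis code.

```python
import math
from statistics import NormalDist

def interaction_n_total(d, p=0.5, alpha=0.05, power=0.80):
    """Approximate total N (both arms combined) to detect a
    treatment x stratum interaction of size d (in SD units), with
    stratum proportion p and 1:1 treatment allocation."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n_main = 4 * z**2 / d**2                 # total N for the main ATE
    n = 4 * n_main * (0.25 / (p * (1 - p)))  # 4x interaction penalty, imbalance inflation
    return 2 * math.ceil(n / 2)              # round up to an even total (equal arms)
```

For example, `interaction_n_total(0.5)` returns 504 (the baseline target, with the main ATE alone needing only ~126), and `interaction_n_total(0.3)` returns 1,396, matching the Reference Scenario.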
|
|
Field
Intervention (Hidden)
|
Before
Individual participants will be randomly assigned to one of the two arms using a computerized random assignment algorithm, ensuring a 1:1 allocation ratio. To mitigate potential contamination and spillovers, each participant will receive a unique, tokenized access link to their assigned tool. The platform will prevent simultaneous logins from different IP addresses using the same token. Furthermore, participants in the control group will be informed about a waitlist control design, where they will be granted full access to the treatment tool after the intervention period concludes, reducing any incentive to seek access to the treatment arm during the study. Heterogeneous treatment effects will be explored for PhD students versus junior professors.
|
After
Individual participants will be randomly assigned to one of the two arms using a computerized stratified random assignment algorithm, ensuring a 1:1 allocation ratio within each stratum. Randomization is stratified by career stage (PhD students versus junior economists) to ensure balance across groups that are structurally distinct in their level of accumulated tacit knowledge — the key moderator variable in our primary research question. Within each stratum, assignment is individual and independent. To mitigate potential contamination and spillovers, each participant will receive a unique, tokenized access link to their assigned tool. The platform will prevent simultaneous logins from different IP addresses using the same token. Furthermore, participants in the control group will be informed about a waitlist control design, where they will be granted full access to the treatment tool after the intervention period concludes.
|
|
Field
Secondary Outcomes (End Points)
|
Before
|
After
1. Change in algorithmic paper quality (0-100 score). At the endline, working papers from both the treatment and control groups will be processed blindly through the 'Editor Agent' prompt of the RefereeAI tool to calculate a standardized quality score. This serves as a mechanistic measure of algorithmic compliance rather than a measure of ultimate scientific validity.
|