AI in Writing

Last registered on February 19, 2026

Pre-Trial

Trial Information

General Information

Title
AI in Writing
RCT ID
AEARCTR-0017909
Initial registration date
February 17, 2026

First published
February 19, 2026, 7:35 AM EST

Locations

Some information in this trial is unavailable to the public.

Primary Investigator

Affiliation

Other Primary Investigator(s)

Additional Trial Information

Status
Ongoing
Start date
2025-09-25
End date
2027-04-28
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
Does the degree of AI involvement in writing affect the success of written output? Specifically, we ask two questions. First, for social media posts (e.g., LinkedIn, X/Twitter), does AI-assisted writing improve engagement metrics such as impressions, likes, comments, shares, and follower growth? Second, for academic papers, does AI-assisted writing improve citation counts, journal acceptance rates, peer review quality scores, and altmetric attention?
The deeper question is whether AI assistance generates a voltage gain or voltage drop when scaled from the author’s initial draft to published output. AI-polished writing may gain surface-level engagement but lose the idiosyncratic voice and rough edges that drive deeper resonance. Or it may do the opposite. We don’t know yet—which is precisely why we need an experiment.
External Link(s)

Registration Citation

Citation
August, John. 2026. "AI in Writing." AEA RCT Registry. February 19. https://doi.org/10.1257/rct.17909-1.0
Experimental Details

Interventions

Intervention(s)
5a. Treatment Arms
We employ a three-arm design with within-subject randomization:
No AI
Definition: The author writes entirely without AI tools: no ChatGPT, Claude, Grammarly AI, or any LLM-based assistance. Spell-check is allowed.
Production process: The author drafts, revises, and finalizes using only their own judgment and traditional writing tools (word processor, manual style guides, human feedback if desired).

Moderate AI
Definition: The author writes the initial draft without AI, then uses AI for targeted improvements: grammar, clarity, structural suggestions, tone calibration. The author retains final editorial control and makes all acceptance/rejection decisions on AI suggestions.
Production process: The author drafts independently. AI is used as an editor/consultant. The author reviews each suggestion and decides what to incorporate. The voice remains the author's.

Heavy AI
Definition: The author provides the topic/thesis and key points; AI generates a complete draft that the author reviews for factual accuracy but does not substantially rewrite. AI handles structure, phrasing, transitions, and rhetorical strategy.
Production process: The author provides a brief (bullet points or outline). AI produces the draft. The author checks facts and makes minimal edits. The voice is substantially the AI's.

5b. Domain 1: Social Media Posts
Design: Within-subject randomized trial. A single author (the PI) produces social media posts on LinkedIn and X/Twitter over a 6-month period. Each week, the platform (LinkedIn or X) and AI treatment level are randomly assigned according to a pre-specified randomization schedule generated via a random number generator before data collection begins. The author posts several times per week per platform.
Topic stratification: We stratify randomization by topic category (economics/policy, personal/motivational, research findings, current events) to ensure each treatment arm covers similar content. This addresses the concern that some topics are inherently more engaging than others.
Blinding: Followers do not know which treatment arm produced any given post. The author does not disclose AI usage on individual posts during the study period. Post-study disclosure will occur.
5c. Domain 2: Academic Papers
Design: Between-paper randomized trial. Over a 12-month period, the PI will produce new working papers, referee reports, and editorial letters. Each document is randomly assigned to a treatment arm before writing begins. Because academic papers are heterogeneous in topic and complexity, we use a matched-triple design: papers are grouped into triples of similar scope/topic, and within each triple, one paper is randomly assigned to each treatment arm.
Key constraint: The research content, empirical analysis, and intellectual contribution must be held constant across arms. The treatment affects the writing process only—not the ideas, data, or analytical choices. This is enforceable because the author determines all research decisions before the writing phase begins.
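As an illustration of the matched-triple assignment, the sketch below (in Python; the registration specifies a sealed envelope method or reproducible code, so treat this as one possible implementation with an assumed seed and placeholder paper IDs):

```python
import random

ARMS = ["No AI", "Moderate AI", "Heavy AI"]

def assign_triples(triples, seed=20260217):
    """Randomly assign one treatment arm per paper within each matched triple.

    `triples` is a list of 3-element lists of paper IDs (placeholders here);
    reusing the registered social media seed is an illustrative assumption.
    """
    rng = random.Random(seed)
    assignment = {}
    for triple in triples:
        arms = ARMS[:]
        rng.shuffle(arms)  # one random permutation of arms per triple
        for paper, arm in zip(triple, arms):
            assignment[paper] = arm
    return assignment

# Example with hypothetical paper IDs:
triples = [["paper_01", "paper_02", "paper_03"],
           ["paper_04", "paper_05", "paper_06"]]
print(assign_triples(triples))
```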
Intervention Start Date
2025-09-30
Intervention End Date
2027-04-28

Primary Outcomes

Primary Outcomes (end points)
• Social Media Engagement Rate = (likes + comments + shares + reposts) / impressions. Source: platform analytics API, measured at 72 hours post-publication (to allow engagement to stabilize).
• Academic Composite Quality Index: weighted average of (a) citations, (b) revision turnaround time, and (c) journal acceptance on first submission. Source: journal editorial system data; self-reported review scores standardized within journal.
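In notation (a sketch: z(·) denotes within-journal standardization and the weights w_1, w_2, w_3 are placeholders, since the registration does not specify their values):

$$\text{ER} = \frac{\text{likes} + \text{comments} + \text{shares} + \text{reposts}}{\text{impressions}}$$

$$\text{ACQI} = w_1\, z(\text{citations}) + w_2\, z(\text{revision turnaround}) + w_3\, \mathbf{1}\{\text{accepted on first submission}\}$$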

Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Social Media:
• Impressions (reach)
• Comment depth (average word count per comment, as a proxy for substantive engagement vs. superficial reactions)
• Follower growth rate (net new followers in the 7 days following each post)
• Sentiment of comments (positive/negative/neutral, coded by ML classifier)
• AI detection score (GPTZero, Originality.ai) as a process check
• Profile visit rate post-publication
Academic:
• Citation count at 6, 12, and 24 months post-publication
• Altmetric attention score
• NBER/SSRN download counts at 30 and 90 days
• Referee report tone (NLP-coded constructiveness, specificity of suggestions)
• Number of revision rounds before acceptance
• AI detection score (process check)
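For the comment-sentiment outcome, one possible implementation (the registration does not name a specific classifier; this sketch assumes the Hugging Face transformers library's default sentiment model and maps its binary output to the positive/negative/neutral scheme with an assumed confidence cutoff):

```python
from transformers import pipeline

# The default sentiment-analysis pipeline returns POSITIVE/NEGATIVE labels
# with a confidence score; low-confidence predictions are coded "neutral"
# (the 0.75 cutoff is an assumption, not a registered parameter).
classifier = pipeline("sentiment-analysis")

def code_sentiment(comments, neutral_cutoff=0.75):
    coded = []
    for result in classifier(comments):
        if result["score"] < neutral_cutoff:
            coded.append("neutral")
        else:
            coded.append(result["label"].lower())  # "positive" or "negative"
    return coded

print(code_sentiment(["Great thread, learned a lot!", "This is wrong."]))
```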
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
5a. Treatment Arms
We employ a three-arm design with within-subject randomization:
No AI
Definition: The author writes entirely without AI tools: no ChatGPT, Claude, Grammarly AI, or any LLM-based assistance. Spell-check is allowed.
Production process: The author drafts, revises, and finalizes using only their own judgment and traditional writing tools (word processor, manual style guides, human feedback if desired).

Moderate AI
Definition: The author writes the initial draft without AI, then uses AI for targeted improvements: grammar, clarity, structural suggestions, tone calibration. The author retains final editorial control and makes all acceptance/rejection decisions on AI suggestions.
Production process: The author drafts independently. AI is used as an editor/consultant. The author reviews each suggestion and decides what to incorporate. The voice remains the author's.

Heavy AI
Definition: The author provides the topic/thesis and key points; AI generates a complete draft that the author reviews for factual accuracy but does not substantially rewrite. AI handles structure, phrasing, transitions, and rhetorical strategy.
Production process: The author provides a brief (bullet points or outline). AI produces the draft. The author checks facts and makes minimal edits. The voice is substantially the AI's.
Experimental Design Details
Not available
Randomization Method
8. Randomization Procedure
Social media: Complete randomization schedule generated before data collection using a reproducible seed (Stata: set seed 20260217). Each week’s 18 posts are assigned to 6 per arm, stratified by platform (LinkedIn/X) and topic category. The full schedule is sealed in the AEA registry before the first post is published.
Academic papers: Matched-triple design. Papers are grouped into triples of similar scope and topic. Within each triple, treatment assignment is determined by random draw (sealed envelope method or reproducible code) before writing begins.
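A sketch of the social media schedule generation follows, in Python rather than the registered Stata code. The seed is taken from the description above, but how the 18 weekly slots divide across platforms and topics is not fully specified in the registration; here we assume 9 posts per platform per week (3 per arm) and a rotating topic assignment:

```python
import random

ARMS = ["No AI", "Moderate AI", "Heavy AI"]
PLATFORMS = ["LinkedIn", "X"]
TOPICS = ["economics/policy", "personal/motivational",
          "research findings", "current events"]

def make_schedule(n_weeks=26, seed=20260217):
    """Generate the full pre-registered-style schedule: 18 posts/week, 6 per arm."""
    rng = random.Random(seed)  # reproducible, mirroring Stata's `set seed`
    schedule = []
    topic_idx = 0
    for week in range(1, n_weeks + 1):
        for platform in PLATFORMS:
            arms = ARMS * 3          # 9 slots: each arm 3 times per platform-week
            rng.shuffle(arms)        # random order within the platform-week
            for arm in arms:
                schedule.append({"week": week, "platform": platform,
                                 "arm": arm, "topic": TOPICS[topic_idx % 4]})
                topic_idx += 1
    return schedule

schedule = make_schedule()
print(len(schedule), schedule[0])  # 468 posts in total
```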
Randomization Unit
Individual post (social media domain); individual document (academic domain)
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
6a. Social Media
Target sample: N = 468 posts.
Power calculation: Using engagement rate ((likes + comments + shares + reposts) / impressions) as the primary outcome and assuming a baseline engagement rate of 3% with a standard deviation of 2.5%, we are powered at 80% (α = 0.05, two-sided) to detect a minimum detectable effect (MDE) of 0.8 percentage points (d = 0.32) for pairwise comparisons between any two arms. For the omnibus F-test across all three arms, power exceeds 85%.
Adjustments: We account for serial correlation within platform-weeks using cluster-robust standard errors at the week level. We also allow for heteroskedasticity across platforms.
6b. Academic Papers
Target sample: N = 30 documents (10 per arm), organized in 10 matched triples. This is a smaller sample, and we acknowledge that power is limited for detecting moderate effects on citation counts or acceptance rates. This domain is better understood as generating informative descriptive statistics and suggestive evidence than definitive causal estimates. We calculate power for a permutation-based test under the sharp null, acknowledging that with 10 triples we can detect only very large effects (d > 0.8) at conventional significance levels.
Supplementary approach: Given the power limitation, we supplement with machine learning analysis of referee reports (sentiment, specificity of criticism, revision requests) as continuous outcomes that increase statistical precision.
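The stated social media MDE can be reproduced with a standard two-sample normal approximation (a sketch that ignores the week-level clustering noted above and assumes 156 posts per arm, i.e. 468/3):

```python
from scipy.stats import norm

alpha, power = 0.05, 0.80
sd = 2.5                 # SD of engagement rate, in percentage points
n_per_arm = 468 // 3     # 156 posts per arm (three arms, within-subject)

z_alpha = norm.ppf(1 - alpha / 2)   # ~1.96
z_beta = norm.ppf(power)            # ~0.84
mde = (z_alpha + z_beta) * sd * (2 / n_per_arm) ** 0.5

print(f"MDE = {mde:.2f} pp, d = {mde / sd:.2f}")  # about 0.79 pp, d about 0.32
```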
Sample size: planned number of observations
Social media: 468 posts. Academic papers: 30 documents (working papers, referee reports, and editorial letters). Power calculations for both domains are given in the previous field.
Sample size (or number of clusters) by treatment arms
Social media: 156 posts per arm (468 posts across three arms).
Academic papers: 10 documents per arm (30 documents, organized in 10 matched triples).
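For the academic-paper domain, the permutation test under the sharp null described above can be sketched as follows (the test statistic, number of draws, and outcome values are illustrative, not registered choices; with 10 triples there are 6^10, roughly 60 million, within-triple arm orderings, so we sample):

```python
import random

def permutation_test(outcomes, n_draws=10_000, seed=0):
    """Randomization inference for matched triples under the sharp null.

    `outcomes` is a list of (no_ai, moderate_ai, heavy_ai) outcome triples,
    one per matched triple; the statistic is the mean Heavy-AI-minus-No-AI
    difference (an illustrative choice).
    """
    rng = random.Random(seed)
    observed = sum(t[2] - t[0] for t in outcomes) / len(outcomes)
    extreme = 0
    for _ in range(n_draws):
        stat = 0.0
        for t in outcomes:
            perm = rng.sample(t, 3)   # re-randomize arm labels within the triple
            stat += perm[2] - perm[0]
        stat /= len(outcomes)
        if abs(stat) >= abs(observed):
            extreme += 1
    return extreme / n_draws          # two-sided permutation p-value

# Hypothetical outcome data (e.g., a standardized quality index per paper):
fake = [(0.1, 0.3, 0.6), (0.0, 0.2, 0.5), (0.2, 0.1, 0.7), (0.3, 0.4, 0.9),
        (0.1, 0.0, 0.4), (0.2, 0.3, 0.8), (0.0, 0.1, 0.5), (0.1, 0.2, 0.6),
        (0.3, 0.2, 0.7), (0.2, 0.4, 0.8)]
print(permutation_test(fake))
```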
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Social media: 0.8 percentage points in engagement rate (d = 0.32) for pairwise arm comparisons. Academic papers: effects of d > 0.8 at conventional significance levels, given 10 matched triples.
IRB

Institutional Review Boards (IRBs)

IRB Name
IRB Approval Date
IRB Approval Number