A Parallel-World Experiment on Ranking Rules, Quality Allocation, Concentration, and Path Dependence in the Graze Trending News feed

Last registered on May 04, 2026

Pre-Trial

Trial Information

General Information

Title
A Parallel-World Experiment on Ranking Rules, Quality Allocation, Concentration, and Path Dependence in the Graze Trending News feed
RCT ID
AEARCTR-0018523
Initial registration date
April 30, 2026

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
May 04, 2026, 8:11 AM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Locations

There is information in this trial unavailable to the public.

Primary Investigator

Affiliation
Cornell Tech

Other Primary Investigator(s)

PI Affiliation
Cornell University
PI Affiliation
Graze Social
PI Affiliation
Independent
PI Affiliation
Cornell Tech

Additional Trial Information

Status
Ongoing
Start date
2026-03-27
End date
2026-10-31
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
Online news feeds are responsive attention markets: the content shown to users affects the interactions that determine future rankings. This feedback loop can generate cumulative advantage, concentrating attention among incumbent publishers and making outcomes sensitive to early luck. We propose a randomized parallel-world field experiment on the Graze Trending News feed to test how ranking design shapes these dynamics. Users are persistently assigned to one of nine worlds implementing three ranking rules: a status-quo local-interaction rule, a lightly smoothed interaction-per-impression rule, and a heavily smoothed interaction-per-impression rule. Because items decay out of the feed within 12 hours, the experiment consists of repeated short-lived attention markets. The primary outcomes are user engagement, position-weighted quality allocation, variance in exposure among similarly high-quality posts, outlet-level concentration, and cross-world divergence in market shares. Quality is measured using a pre-specified external label estimated from chronological-feed interactions. The central question is whether exposure-normalized ranking can reduce concentration and path dependence while preserving or improving the quality of attention allocated to news.
External Link(s)

Registration Citation

Citation
Gaffney, Devin et al. 2026. "A Parallel-World Experiment on Ranking Rules, Quality Allocation, Concentration, and Path Dependence in the Graze Trending News feed." AEA RCT Registry. May 04. https://doi.org/10.1257/rct.18523-1.0
Sponsors & Partners

There is information in this trial unavailable to the public.
Experimental Details

Interventions

Intervention(s)
The intervention is a randomized change to the ranking rule used on the Graze Trending News feed. Users are persistently assigned to one of nine experimental cells: three ranking designs crossed with three parallel worlds per design.
Intervention Start Date
2026-04-27
Intervention End Date
2026-06-30

Primary Outcomes

Primary Outcomes (end points)
1. User engagement metrics: positive interactions (likes, replies, reposts, quote posts, see more), sessions, and session depth. A session is defined as a request to the Graze Trending News feed, and session depth as the number of items viewed within a request. Positive interactions and sessions are the primary engagement outcomes; session depth is a secondary engagement outcome. The primary engagement outcomes are measured on a user-day panel and analyzed intent-to-treat. Included users are those who used the Graze Trending News feed at least once in the month before the experimental period; each user's panel begins on the calendar day of that user's first observed Graze Trending News feed impression during the experiment. For positive interactions, the primary measure is "likes"; a secondary measure uses the full set of interaction types, weighted in the same way as in the experimental ranking rule.

2. User-facing quality allocation. The main quality outcome is the average external quality of the visibility allocated by the ranking rule, using position-weighted exposure rather than raw impressions. This will primarily be defined as the average (unweighted) quality of posts in the Top 3 positions sent to users. For these Top 3 exposure measures, an opportunity is a Graze Trending News feed request while the post is eligible. The inference unit is the post-entry cohort: we will compute the quality-allocation metric for each design x cohort, averaging across worlds within each design, and then compare designs across cohorts, accounting for how averaging across worlds affects the variance of the estimate.

3. Within-quality variance in post visibility. This measures how differently the ranking rule treats posts of similarly high quality. We will bin posts into quality quantiles and measure the variance in exposure among posts in each quantile. The primary exposure measure is the fraction of Graze Trending News feed requests while eligible in which a post appears in the Top 3, with a secondary measure based on the number of impressions. As with user-facing quality allocation, the inference unit is the post-entry cohort, and we will average across the 3 worlds within a design before comparing designs across cohorts.

4. Outlet concentration. The main concentration outcome is the Gini coefficient of outlet exposure shares among outlets with at least one eligible post in a given design x world x post-entry cohort. Outlet exposure is defined using the same Top 3 opportunity-based exposure measure as above, averaged across all posts from the outlet in that cohort. The inferential unit is the post-entry cohort: we compute the Gini coefficient separately for each design x cohort, and then compare designs across cohorts. We average across worlds within each design.

5. Cross-world unpredictability. This measures how much item-level market shares diverge across the three parallel worlds within a design. This is defined as the average pairwise absolute difference in item-level market shares across the three worlds within a design. Here, the inference unit is the post-entry cohort; for each design x cohort, we will compute the average pairwise divergence across the 3 worlds for each post, then average across posts within the cohort, and then compare designs across cohorts.
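The cross-world unpredictability endpoint can be illustrated with a minimal sketch for a single design x cohort. The function name, the input layout (a dict of world label to per-post market shares), and the choice to treat a post absent from a world as having share 0.0 are assumptions made for illustration, not details taken from the registration.

```python
from itertools import combinations
from statistics import mean


def cross_world_divergence(shares_by_world):
    """Average pairwise absolute difference in item-level market shares.

    shares_by_world maps a world label (e.g. "a", "b", "c") to a dict of
    post_id -> market share within one design x cohort.
    Assumption: a post missing from a world has share 0.0 there.
    """
    worlds = sorted(shares_by_world)
    posts = set().union(*(shares_by_world[w].keys() for w in worlds))
    per_post = []
    for post in posts:
        # Absolute share difference for every pair of worlds, for this post.
        diffs = [
            abs(shares_by_world[a].get(post, 0.0) - shares_by_world[b].get(post, 0.0))
            for a, b in combinations(worlds, 2)
        ]
        per_post.append(mean(diffs))
    # Average the per-post divergences across posts in the cohort.
    return mean(per_post)


# Hypothetical shares for two posts in the three worlds of one design.
example = {
    "a": {"p1": 0.5, "p2": 0.5},
    "b": {"p1": 0.7, "p2": 0.3},
    "c": {"p1": 0.6, "p2": 0.4},
}
divergence = cross_world_divergence(example)
```

Comparing designs would then repeat this calculation for every design x cohort and treat the cohort-level values as the observations.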
Primary Outcomes (explanation)
The experiment is designed to measure both user value and producer-side allocation. The quality and fairness outcomes focus on how the ranking rule allocates visibility across posts and outlets. The unpredictability outcome measures path dependence by asking whether nominally identical worlds diverge because of feedback and early luck. The user-engagement outcome measures whether a ranking rule preserves or improves the value of the feed for users.
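As an illustration of the outlet concentration endpoint, the following sketch computes a Gini coefficient over outlet exposure for one design x world x cohort. The function names and the `(outlet, top3_rate)` input layout are hypothetical, and the discrete sorted-rank Gini formula is one standard choice; the registration does not specify which estimator will be used.

```python
from collections import defaultdict
from statistics import mean


def gini(values):
    """Gini coefficient of non-negative values using the sorted-rank formula."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here for empty or all-zero input
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


def outlet_gini(post_rows):
    """post_rows: iterable of (outlet, top3_rate) pairs, one per eligible post.

    Outlet exposure is the mean Top 3 rate across the outlet's posts.
    The Gini coefficient is scale-invariant, so exposures need not be
    normalized to shares before computing it.
    """
    by_outlet = defaultdict(list)
    for outlet, rate in post_rows:
        by_outlet[outlet].append(rate)
    exposure = [mean(rates) for rates in by_outlet.values()]
    return gini(exposure)


# Hypothetical cohort: outlets A and B each average 0.3, outlet C gets nothing.
rows = [("A", 0.4), ("A", 0.2), ("B", 0.3), ("C", 0.0)]
concentration = outlet_gini(rows)
```

With n outlets, this formula is bounded above by (n - 1) / n, which is worth keeping in mind when cohorts differ in the number of eligible outlets.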

Quality is measured using an external proxy constructed from the chronological news feed rather than from the experimental Graze Trending News feed. For post i, the primary quality label is the rate of positive interactions per chronological-feed impression, computed with the same positive interaction types and impression normalization as the experimental ranking design and with equal weights across interaction types. The primary quality analyses will use these impression-normalized quality labels rather than raw interaction counts. This quality proxy is intended to be less contaminated by treatment-induced feedback on the Graze Trending News feed than interactions observed directly on the Graze Trending News feed. We will only include posts with at least 10 impressions in the chronological feed to ensure a minimum level of reliability in the quality proxy. We will also smooth these scores additively toward the overall interaction rate on the chronological news feed, similar to the treatment. In particular, we will compute the average rate of interactions per impression r. For each post, we will then add 10 x r and 10 to the interaction and impression counts respectively, so that the smoothed label is (interactions + 10 x r) / (impressions + 10).
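The smoothing step described above amounts to adding 10 pseudo-impressions at the overall rate r. A minimal sketch follows; the function and parameter names are hypothetical, and returning None for posts below the 10-impression threshold is an illustrative convention rather than something the registration specifies.

```python
def smoothed_quality(interactions, impressions, base_rate,
                     prior_weight=10, min_impressions=10):
    """Additively smoothed interactions-per-impression quality label.

    base_rate is r, the overall chronological-feed interaction rate.
    Posts with fewer than min_impressions impressions are excluded
    (signalled here by None, an assumption made for this sketch).
    """
    if impressions < min_impressions:
        return None
    return (interactions + prior_weight * base_rate) / (impressions + prior_weight)


# Hypothetical overall rate of 0.05 interactions per impression.
r = 0.05
well_observed = smoothed_quality(5, 100, r)   # raw rate 0.05 is left unchanged
thinly_observed = smoothed_quality(3, 10, r)  # raw rate 0.30 is pulled toward r
excluded = smoothed_quality(1, 5, r)          # below the impression threshold
```

A post whose raw rate already equals r is unaffected, while a post with few impressions is pulled strongly toward r, which is the intended reliability adjustment.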

The primary confirmatory hypotheses are as follows. We expect that user engagement will not change significantly across the three designs, but that there will be a tradeoff between good performance on the second endpoint (high user-facing quality) and good performance on the remaining three (low within-quality variance, fair outlet exposure, and low cross-world unpredictability). In particular, relative to Design 1, we expect both Designs 2 and 3 to decrease user-facing quality and within-quality variance, increase fairness, and decrease unpredictability. We expect Design 3 to interpolate between Designs 1 and 2: relative to Design 2, we expect Design 3 to increase user-facing quality but also increase within-quality variance, decrease fairness, and increase cross-world unpredictability.
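The within-quality variance endpoint that these hypotheses refer to can be sketched as follows. The number of quantile bins, the rank-based binning, and the use of population variance are all assumptions made for illustration; the registration does not fix these choices.

```python
from statistics import pvariance


def variance_by_quality_quantile(posts, n_bins=4):
    """Exposure variance within each quality quantile, lowest bin first.

    posts: list of (quality, exposure) pairs for one design x cohort,
    where exposure could be the Top 3 rate per eligible request.
    Assumptions: equal-count rank-based bins and population variance.
    """
    ranked = sorted(posts, key=lambda p: p[0])
    n = len(ranked)
    bins = [[] for _ in range(n_bins)]
    for rank, (_, exposure) in enumerate(ranked):
        # Map the quality rank 0..n-1 onto a bin index 0..n_bins-1.
        bins[rank * n_bins // n].append(exposure)
    return [pvariance(b) if b else None for b in bins]


# Hypothetical cohort: low-quality posts are treated alike,
# high-quality posts receive very different exposure.
cohort = [
    (0.1, 0.1), (0.2, 0.1), (0.3, 0.1), (0.4, 0.1),
    (0.5, 0.0), (0.6, 0.2), (0.7, 0.4), (0.8, 0.6),
]
low_var, high_var = variance_by_quality_quantile(cohort, n_bins=2)
```

Under the stated hypotheses, the variance in the top quality bin would be expected to be lower under Designs 2 and 3 than under Design 1.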

Secondary Outcomes

Secondary Outcomes (end points)
1. Small-high-quality lift: whether high-quality content from smaller outlets receives more visibility.

2. Turnover: how long a post remains in the Top 3.

3. Dynamic event-time trajectories for the main market-level outcomes over the 12-hour life of a post.

4. Early-interaction trajectory plots that compare later trajectories for posts with different early interaction signals.

5. Secondary user-experience measures such as "show more / show less" and an explicit survey satisfaction measure.

6. Coverage and exploration outcomes: cohort-level coverage of eligible posts, and user-level exploration outcomes measuring how much visibility goes to outlets not recently seen by the same user.

7. New follows to outlets surfaced in the Graze Trending News feed during the experimental period, including the number of newly followed outlets and the share or count of those new follows going to small outlets.

8. Relative-prominence engagement: for user-post first impressions, whether users are more likely to engage with or follow content when that post is ranked more prominently in their own world than in the other parallel worlds under the same design.

9. Behavioral-convergence classification: whether post-treatment user behavior vectors can predict a user's assigned design, and within design the user's assigned cell, more accurately than chance using out-of-sample regularized multinomial regression with permutation inference.

As in the primary outcomes, market-level secondary outcomes are largely constructed within each design x world x post-entry cohort and use the post-entry cohort as the inferential unit; secondary user-experience and follow outcomes use user-day or user-level analyses, and relative-prominence engagement uses the user-post first-impression as the analysis unit.
Secondary Outcomes (explanation)
These outcomes are intended to clarify mechanism and user experience. In particular, the trajectory analyses are meant to show whether early interaction differences are amplified into later exposure differences, while coverage, exploration, new follows to surfaced outlets, and relative-prominence engagement speak to novelty, discovery, and within-design reinforcement of what a user's own feed promotes. The behavioral-convergence classification outcome asks whether users in the same design or cell become behaviorally distinguishable from users in other designs or cells based on post-treatment interaction and follow patterns. The primary version predicts design from post-treatment behavior vectors; a secondary version predicts cell within design. Performance will be measured out of sample using multinomial log loss, and label permutations will be conducted within design for the cell-classification test.
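The scoring and permutation logic for the classification outcome can be sketched without committing to a particular classifier. The helper names below are hypothetical, the predicted probabilities are assumed to come from an out-of-sample fitted model that is not shown, and the add-one permutation p-value is one common convention rather than a detail from the registration.

```python
import math


def multinomial_log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log probability assigned to the true label.

    predicted_probs is a list of dicts mapping label -> predicted probability.
    Probabilities are clipped at eps to avoid log(0).
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        p = min(max(probs[label], eps), 1.0)
        total += -math.log(p)
    return total / len(true_labels)


def permutation_p_value(observed_loss, permuted_losses):
    """Share of permuted-label losses at least as good (low) as the observed loss.

    Uses the add-one correction so the p-value is never exactly zero.
    """
    at_least_as_good = sum(1 for loss in permuted_losses if loss <= observed_loss)
    return (1 + at_least_as_good) / (1 + len(permuted_losses))


# A classifier that guesses uniformly over three designs scores log(3).
uniform = {"1": 1 / 3, "2": 1 / 3, "3": 1 / 3}
chance_loss = multinomial_log_loss(["1", "2", "3"], [uniform] * 3)

# Hypothetical observed loss compared against four permuted-label losses.
p_value = permutation_p_value(0.9, [1.1, 1.0, 0.8, 1.2])
```

For the cell-within-design test, the permuted losses would come from refitting after shuffling cell labels within each design, as the registration describes.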

Experimental Design

Experimental Design
Users are randomly and persistently assigned to one of nine cells:

- 1a, 1b, 1c
- 2a, 2b, 2c
- 3a, 3b, 3c

The numeric index denotes the ranking design and the letter index denotes the parallel world within design. Assignment is fixed for the duration of the study.
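One common way to obtain assignment that is both random and persistent is to hash a stable user identifier. The sketch below assumes that approach; the registration does not state how assignment is implemented, and the salt string, function name, and identifier format are hypothetical.

```python
import hashlib

# The nine cells: design index 1-3 crossed with world letter a-c.
CELLS = [f"{design}{world}" for design in "123" for world in "abc"]


def assign_cell(user_id, salt="graze-parallel-worlds"):
    """Deterministically map a user ID to one of the nine cells.

    Assumption: a salted SHA-256 hash stands in for the study's actual
    randomization procedure, which is not described in the registration.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(CELLS)
    return CELLS[index]


cell = assign_cell("user-123")
design, world = cell[0], cell[1]
```

Because the mapping depends only on the user ID and the salt, repeated requests from the same user always resolve to the same cell without storing any assignment table.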

The primary item unit is the post. Some analyses, especially concentration and follow behavior, additionally aggregate posts to the outlet level.

Because posts decay out of the feed after 12 hours, the primary market unit is an entry cohort rather than a fixed calendar window. Posts are grouped by first eligibility time into cohorts of width Delta. The primary cohort width Delta will be 1 hour, with robustness analyses using 30 minutes, 2 hours, and 4 hours.
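Grouping posts into entry cohorts of width Delta can be sketched as flooring each post's first eligibility time to a bucket boundary. Aligning buckets to the Unix epoch in UTC and the function name are assumptions made for illustration; the registration does not specify the alignment.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def cohort_start(first_eligible, width=timedelta(hours=1)):
    """Start time of the entry cohort containing first_eligible.

    first_eligible must be a timezone-aware datetime.
    Assumption: cohort boundaries are aligned to the Unix epoch in UTC.
    """
    buckets = (first_eligible - EPOCH) // width  # whole widths since the epoch
    return EPOCH + buckets * width


# Hypothetical post that first became eligible at 13:47 UTC.
t = datetime(2026, 5, 1, 13, 47, tzinfo=timezone.utc)
primary = cohort_start(t)                              # 1-hour cohorts
half_hour = cohort_start(t, timedelta(minutes=30))     # robustness width
four_hour = cohort_start(t, timedelta(hours=4))        # robustness width
```

Passing a different `width` reproduces the 30-minute, 2-hour, and 4-hour robustness cohorts without changing any downstream code.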
Experimental Design Details
Not available
Randomization Method
Randomization uses persistent user-level assignment.
Randomization Unit
User
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
Not applicable for randomization, since treatment is assigned at the user level.
Sample size: planned number of observations
All users who used the Graze Trending News feed at least once in the month before the experimental period are included. For user-level outcomes, counting begins on the calendar day of the user's first observed impression on the Graze Trending News feed during the experiment. The exact number of users is not fixed in advance but is expected to be about 9,000, or about 1,000 per cell.
Sample size (or number of clusters) by treatment arms
About 1,000 per cell.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB

Institutional Review Boards (IRBs)

IRB Name
Cornell University Institutional Review Board for Human Participant Research
IRB Approval Date
2025-10-07
IRB Approval Number
IRB0150005
Analysis Plan

There is information in this trial unavailable to the public.