A Parallel-World Experiment on Ranking Rules, Quality Allocation, Concentration, and Path Dependence in the Graze Trending News feed

Last registered on June 22, 2026

Pre-Trial

Trial Information

General Information

Title
A Parallel-World Experiment on Ranking Rules, Quality Allocation, Concentration, and Path Dependence in the Graze Trending News feed
RCT ID
AEARCTR-0018523
Initial registration date
April 30, 2026

First published
May 04, 2026, 8:11 AM EDT

Last updated
June 22, 2026, 9:56 PM EDT

Locations

There is information in this trial unavailable to the public.

Primary Investigator

Affiliation
Cornell Tech

Other Primary Investigator(s)

PI Affiliation
Cornell University
PI Affiliation
Graze Social
PI Affiliation
Independent
PI Affiliation
Cornell Tech

Additional Trial Information

Status
In development
Start date
2026-06-11
End date
2027-03-23
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
Online news feeds are responsive attention markets: the content shown to users affects the interactions that determine future rankings. This feedback loop can generate cumulative advantage, concentrating attention among incumbent publishers and making outcomes sensitive to early luck. We propose a randomized parallel-world field experiment on the Graze Trending News feed to test how ranking design shapes these dynamics. Users are persistently assigned to one of nine worlds implementing three ranking rules: a status-quo local-interaction rule, a lightly smoothed interaction-per-impression rule, and a heavily smoothed interaction-per-impression rule. Because items decay out of the feed within 48 hours, the experiment consists of repeated short-lived attention markets. The primary outcomes are user engagement, position-weighted quality allocation, variance in exposure among similarly high-quality posts, outlet-level concentration, and cross-world divergence in market shares. Quality is measured using a pre-specified external label estimated from chronological-feed interactions. The central question is how designs, such as exposure-normalized ranking, trade off concentration, unpredictability, and user-facing quality/engagement metrics. The central hypothesis is that interventions that reduce concentration may do so by increasing unpredictability.
External Link(s)

Registration Citation

Citation
Gaffney, Devin et al. 2026. "A Parallel-World Experiment on Ranking Rules, Quality Allocation, Concentration, and Path Dependence in the Graze Trending News feed." AEA RCT Registry. June 22. https://doi.org/10.1257/rct.18523-2.0
Sponsors & Partners

There is information in this trial unavailable to the public.
Experimental Details

Interventions

Intervention(s)
The intervention is a randomized change to the ranking rule used on the Graze Trending News feed. Users are persistently assigned to one of nine experimental cells: three ranking designs crossed with three parallel worlds per design.
Intervention Start Date
2026-06-23
Intervention End Date
2026-09-23

Primary Outcomes

Primary Outcomes (end points)
We will group posts into post-entry cohorts: sets of posts that were created in the same time interval.

We will primarily examine the render log, which tracks the time, user identifier, and the posts sent to the user for each request of the Trending News Feed (a “session”). In particular, we examine news posts sent in the top 6 positions for requests with post limit > 1. We use the absolute position in the feed; that is, we include the offset caused by non-news posts, such as announcements and surveys. Since the feed always includes a pinned post, the top 6 positions typically contain 5 news posts if the survey is not shown.

We also construct a “shadow global” render log. For each render log entry at time t with n news posts, we create an entry in the shadow global render log at time t with the n posts that would have been shown by the “Shadow Global” Design 0 based on global interaction counts up to time t.

**Relevant definitions for outcomes**
1. *Exposure market share for posts.* For each post, an exposure is defined as being in the top 6 positions of posts sent to users. The market share of a post is the number of exposures of a post, normalized by the total exposures of posts in the same post-entry cohort.

2. *Residual from shadow global.* For each post, in a given world, its residual from the shadow global is the difference in its market share based on the actual posts sent to users, and what its market share would have been if the shadow global ranking was used to rank posts in that world.

3. *Item quality from reverse chronological feed.* For each post, we will define its quality as explained in detail below. Given a measure of quality, we will bin eligible posts (posts rendered to any user across any of Designs 0-3) into quality quintiles (5 buckets), where each quintile has the same number of posts.
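
The first two definitions can be sketched in a few lines of Python. This is a minimal illustration, not the study's pipeline: the render-log layout (one list of `(position, post_id)` pairs per request), the `cohort_of` mapping, and the toy data are all assumptions made for the example.

```python
from collections import Counter, defaultdict

TOP_K = 6  # an exposure = a news post in the top 6 absolute feed positions


def exposure_counts(render_log, top_k=TOP_K):
    """Count exposures per post.

    render_log: list of requests; each request is a list of
    (absolute_position, post_id) pairs for its news posts (hypothetical layout).
    """
    counts = Counter()
    for request in render_log:
        for position, post_id in request:
            if position <= top_k:
                counts[post_id] += 1
    return counts


def market_shares(counts, cohort_of):
    """Normalize each post's exposures by total exposures in its entry cohort."""
    totals = defaultdict(int)
    for post_id, n in counts.items():
        totals[cohort_of[post_id]] += n
    return {p: n / totals[cohort_of[p]] for p, n in counts.items()}


def residuals(actual, shadow):
    """Actual share minus shadow-global share; unexposed posts count as 0."""
    posts = set(actual) | set(shadow)
    return {p: actual.get(p, 0.0) - shadow.get(p, 0.0) for p in posts}


# Toy data: three posts in one cohort. Position 1 is the pinned non-news post,
# so news posts start at position 2.
cohort_of = {"a": 0, "b": 0, "c": 0}
actual_log = [[(2, "a"), (3, "b")], [(2, "a"), (7, "c")]]  # "c" falls below top 6
shadow_log = [[(2, "b"), (3, "c")], [(2, "b"), (3, "c")]]

actual = market_shares(exposure_counts(actual_log), cohort_of)
shadow = market_shares(exposure_counts(shadow_log), cohort_of)
resid = residuals(actual, shadow)
```

Because shares sum to one within a cohort under both rankings, the residuals within a cohort sum to zero.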


**Primary outcomes**

1. *Movement from Shadow Global.* How much each design's post-level market shares differ from those under shadow global. Defined as the sum of absolute values of residuals in post-level market share in each cohort.

2. *Inequality (concentration) of post-level market shares in each cohort.* This measures how differently similarly high-quality posts are treated by the ranking rule. Our primary measure of inequality is the Gini coefficient of exposure market shares in each cohort. We will measure this both across all posts, and within post-quality groups, focusing on the top 20% of posts by estimated quality. The inference unit is the post-entry cohort, and we will average across the 3 worlds within a design before comparing designs across cohorts. As a secondary illustration, we will plot this metric for quality groups beyond the top 20%, including more granular bins.

3. *Cross-world unpredictability (arbitrariness) of post-level market shares across worlds in the same design.* This measures how much item-level market shares diverge across the three parallel worlds within a design. For each post i, unpredictability is the Gini Mean Difference (GMD) of its market share across the 3 worlds in that design. We will report the GMD across all posts and across the top 20% of posts by estimated quality. As a secondary illustration, we will plot this metric for quality groups beyond the top 20%, including more granular bins. We will also report the GMD conditional on the mean movement (the residual from shadow global, bucketed into quintiles), to separate unpredictability caused by a design changing average exposures relative to shadow global from unpredictability caused by variance in that change. More precisely, we will primarily test the difference in GMD averaged with uniform weight over movement quintiles. Here, the inference unit is the post-entry cohort; for each design x cohort, we will compute the GMD across the 3 worlds for each post, then average across posts within the cohort, and then compare designs across cohorts.

4. *User-facing outcomes.*
(a) *Exposure quality:* The main quality outcome is the average external quality of the visibility allocated by the ranking rule, using position-weighted exposure. This will primarily be defined as the average (unweighted) quality of the top-6 posts sent to users, as in the exposure metric. The inference unit is the post-entry cohort: we will average across worlds within each design, compute the quality-allocation metric for each design x cohort, and then compare designs across cohorts (properly accounting for how world averaging affects the variance of the estimate).
(b) *User engagement metrics:* positive interactions (likes, replies, reposts, quote posts, see more), sessions, and session depth. A session is defined as a request to the Graze Trending News feed. Session depth is defined as the number of items viewed within a request. Positive interactions and sessions are the primary engagement outcomes, while session depth is a secondary engagement outcome. For positive interactions, our primary measure is "on-feed likes per session", with secondary measures using “on-feed likes per exposure” and the full set of interaction types, weighted in the same way as the ranking rule in the experiment.
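
As a rough illustration of the three market-level metrics above (movement, Gini concentration, and cross-world GMD), the following Python sketch computes each on toy inputs. The function names and data are hypothetical, and the GMD here uses the mean absolute difference over distinct pairs, which is one common convention; the registration does not specify the exact estimator.

```python
from itertools import combinations


def movement(residual_by_post):
    """Sum of absolute market-share residuals within one cohort."""
    return sum(abs(r) for r in residual_by_post.values())


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    rank_weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * rank_weighted / (n * total) - (n + 1) / n


def gini_mean_difference(values):
    """Mean absolute difference over all distinct pairs (assumed convention)."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def cohort_unpredictability(shares_by_world):
    """Per-post GMD across worlds, averaged over posts in the cohort.

    shares_by_world: one {post_id: market_share} dict per parallel world;
    a post missing from a world is treated as having share 0.
    """
    posts = set().union(*shares_by_world)
    per_post = [
        gini_mean_difference([w.get(p, 0.0) for w in shares_by_world])
        for p in posts
    ]
    return sum(per_post) / len(per_post)


# Toy cohort with two posts observed in three parallel worlds of one design.
worlds = [{"a": 0.6, "b": 0.4}, {"a": 0.4, "b": 0.6}, {"a": 0.5, "b": 0.5}]
unpredictability = cohort_unpredictability(worlds)
concentration = gini([0.0, 0.0, 1.0])  # all exposure on one of three posts
moved = movement({"a": 0.5, "b": -0.5})
```

With three worlds, each post's GMD averages only three pairwise differences, so the cohort-level average over posts is what carries the signal.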
Primary Outcomes (explanation)
The experiment is designed to measure both user value and producer-side allocation. The quality and fairness outcomes focus on how the ranking rule allocates visibility across posts and outlets. The unpredictability outcome measures path dependence by asking whether nominally identical worlds diverge because of feedback and early luck. The user-engagement outcome measures whether a ranking rule preserves or improves the value of the feed for users.

Quality is measured using an external proxy constructed from the chronological news feed rather than from the experimental Graze Trending News feed. This proxy is intended to be less contaminated by treatment-induced feedback than interactions observed directly on the Graze Trending News feed. For post i, the primary quality label is the rate of positive interactions per chronological-feed impression. It uses the same positive interaction types and impression normalization as the experimental ranking design, with equal weights across interaction types. The primary quality analyses will use these impression-normalized quality labels rather than raw interaction counts. We will only include posts with at least 10 impressions in the chronological feed to ensure a minimum level of reliability in the quality proxy, and we will exclude engagement from users with over 600 on-feed views and interactions per day (informed by the 99.9th percentile of on-feed engagement). We will also smooth these scores additively toward the overall interaction rate on the chronological news feed, similar to the treatment. In particular, we will compute the average interaction score per impression, r. For each post, we will then add 10 × r to the interaction count and 10 to the impression count. We will only include the first feed load per user per 5-minute window, and we will exclude a user-post pair once the user has already liked the post.
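
The additive smoothing step can be written out as a short Python sketch. It assumes the user-level and feed-load filters have already been applied upstream, and it computes r as the pooled interaction rate over all posts passed in; the registration does not state whether r is pooled before or after the 10-impression filter, so that choice is an assumption, as are the function name and toy counts.

```python
MIN_IMPRESSIONS = 10    # posts below this are dropped from the quality label
PRIOR_IMPRESSIONS = 10  # pseudo-impressions added to every post


def smoothed_quality(stats, min_impressions=MIN_IMPRESSIONS,
                     prior=PRIOR_IMPRESSIONS):
    """Smoothed positive interactions per chronological-feed impression.

    stats: {post_id: (positive_interactions, impressions)}, already filtered
    for heavy users, repeat feed loads, and previously liked posts.
    """
    total_interactions = sum(i for i, _ in stats.values())
    total_impressions = sum(m for _, m in stats.values())
    r = total_interactions / total_impressions  # pooled rate (assumption)
    return {
        post: (i + prior * r) / (m + prior)
        for post, (i, m) in stats.items()
        if m >= min_impressions
    }


# Toy counts: "c" has only 5 impressions and is excluded from the labels.
stats = {"a": (5, 40), "b": (0, 10), "c": (1, 5)}
quality = smoothed_quality(stats)
```

A post with no interactions and exactly 10 impressions ends up at r / 2, halfway between its raw rate of 0 and the overall rate.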

As a robustness check metric of quality, we will conduct the following regression, at the post-impression level:

Positive interaction ~ Post_fixed_effect + post_age + position_in_ranking + {current_global_engagement_counts_per_interaction_type}.

Then, we will use the post_fixed_effect, regularized toward the mean, as a measure of quality. This regression attempts to control for social influence effects, in which a post may be more likely to be engaged with if it already has many engagements. We will also report the coefficients on the other components of the regression, as a measure of such social influence and algorithmic ranking effects on interaction rates.

As a secondary robustness check, for both the primary metric and the regression-based metric above, we will also calculate quality using the treatment feeds themselves (all worlds and designs together). This definition measures quality on the same population as the treatment population, but it is contaminated by the experiment itself. We will report the correlation between the primary and these secondary/robustness quality metrics, and evaluate some of our primary hypotheses using these metrics.

Secondary Outcomes

Secondary Outcomes (end points)
- Rich-get-richer effects, measured more directly: do early interactions predict future exposure and outcomes, including after controlling for post quality? This will be measured for both the shadow global feed and each of the designs. Within a design, we will further compare post outcomes across worlds, as a function of the early interactions in each world.
- How many local-feed interactions are needed in each design for a post to have different outcomes than it would under shadow global. This will be an illustration of the mechanism that Design 3 “moves faster” based on local information, increasing arbitrariness but also being more effective at reducing inequality.
- High-quality global underdog exposure. For posts that are in the highest quality buckets but have 0 shadow global exposure: what is the overall exposure of these posts?
- Regression coefficients for explanatory variables in the regression on post-impression level interaction rates, as measures of social influence and algorithm ranking effects.
- Using the above regression coefficients (alongside uncertainty), we may develop a simulator to evaluate different ranking algorithms given such feedback effects.
- Outlet concentration and arbitrariness. Same as post-level, but first aggregated at the outlet (poster) level. Are smaller outlets by pre-experiment follow counts prioritized in some designs?

Other explanatory measures
- Turnover: how long a post remains in the top 6.
- Dynamic event-time trajectories for the main market-level outcomes over the 48-hour life of a post.
- Coverage and exploration outcomes: cohort-level coverage of eligible posts, and user-level exploration outcomes measuring how much visibility goes to outlets not recently seen by the same user.
- New follows to outlets surfaced in the Graze Trending News feed during the experimental period, including the number of newly followed outlets and the share or count of those new follows going to small outlets, where outlet size is defined at experiment start.

Robustness/additional measures on primary outcomes
- Alternative parameters/definitions of primary metrics: e.g., the quality metric; defining exposure as a different N for Top N posts; different bucketing of post quality; for movement, the unique number of posts with changed exposure.
- Repeating the analyses using the view/engagement data reported to Graze by Bluesky instead of the render log to define exposure.
- Secondary user-experience measures such as "show more / show less" and an explicit survey satisfaction measure.
- We will analyze top-4 exposure and turnover in addition to top-6.

As in the primary outcomes, market-level secondary outcomes are largely constructed within each design x world x post-entry cohort and use the post-entry cohort as the inferential unit; secondary user-experience and follow outcomes use user-day or user-level analyses.
Secondary Outcomes (explanation)
These outcomes are intended to clarify mechanisms and provide robustness analyses.

Experimental Design

Experimental Design
Users are randomly and persistently assigned to one of nine cells:

- 1a, 1b, 1c
- 2a, 2b, 2c
- 3a, 3b, 3c

The numeric index denotes the ranking design and the letter index denotes the parallel world within design. Assignment is fixed for the duration of the study.
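
The registration does not say how persistence is implemented. One common way to get a fixed, reproducible assignment is to hash the user identifier with a study-specific salt; the sketch below shows that approach purely as an assumption, with a hypothetical salt string and example identifier.

```python
import hashlib

# Nine cells: three ranking designs crossed with three parallel worlds.
CELLS = [f"{design}{world}" for design in (1, 2, 3) for world in "abc"]


def assign_cell(user_id, salt="example-study-salt"):
    """Map a user to one of the nine cells, stably across sessions.

    Hash-based assignment and the salt value are assumptions for this sketch;
    they are not taken from the registration.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return CELLS[int(digest, 16) % len(CELLS)]


cell = assign_cell("did:plc:example-user")  # hypothetical identifier
```

Since the cell depends only on the identifier and the salt, repeated requests from the same user always land in the same design and world without storing any assignment table.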

The primary item unit is the post. Some analyses, especially concentration and follow behavior, additionally aggregate posts to the outlet level.

Because posts decay out of the feed after 48 hours, the primary market unit is an entry cohort rather than a fixed calendar window. Posts are grouped by first eligibility time into cohorts of width Delta. The primary Delta unit will be 1 hour, with robustness analyses using 30 minutes, 2 hours, and 4 hours.
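
Grouping posts into entry cohorts amounts to flooring each post's first eligibility time to a multiple of Delta. A minimal Python sketch follows; the origin timestamp (midnight UTC on the intervention start date) and the function name are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed cohort origin: midnight UTC on the intervention start date.
ORIGIN = datetime(2026, 6, 23, tzinfo=timezone.utc)


def cohort_index(first_eligible, delta=timedelta(hours=1), origin=ORIGIN):
    """Index of the width-delta entry cohort containing first_eligible."""
    return (first_eligible - origin) // delta  # timedelta // timedelta -> int


post_time = datetime(2026, 6, 23, 0, 59, tzinfo=timezone.utc)
primary = cohort_index(post_time)                               # 1-hour cohorts
robustness = cohort_index(post_time, timedelta(minutes=30))     # 30-minute cohorts
```

Changing `delta` to 30 minutes, 2 hours, or 4 hours reproduces the robustness cohort widths without touching the rest of the analysis.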
Experimental Design Details
Not available
Randomization Method
Randomization uses persistent user-level assignment.
Randomization Unit
User
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
Not applicable for randomization, since treatment is assigned at the user level.
Sample size: planned number of observations
For post-level aggregate metrics (such as cohort market share), we consider all posts that had at least one exposure in any of the 9 treatment cells or in the shadow global. On average, a cohort (1-hour window) has about 170 posts. As discussed below, analyses are largely at the cohort level, with appropriate bootstrap sampling. For each post, we consider all of its interactions and exposures in the treatment cell, by any user. For user-level outcomes such as sessions, all users who used the Graze Trending News feed at least once in the month before the pilot period are included. Counting begins on the calendar day of the user's first observed request on the Graze Trending News feed during the pilot period or experiment. We cannot predict in advance how many users will use the feed during the experiment, but we expect the number of included users to be about 9k, and so about 1k per cell.
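
The registration leaves the bootstrap procedure unspecified. As one possible reading, the sketch below resamples cohorts with replacement and forms a percentile interval for the mean difference in a cohort-level metric between two designs. The function name, toy values, and the choice of an i.i.d. cohort bootstrap are assumptions; because posts live for 48 hours and adjacent cohorts overlap in time, a block bootstrap over contiguous cohorts may be more appropriate in practice.

```python
import random


def bootstrap_diff_ci(metric_a, metric_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference across cohorts.

    metric_a[c], metric_b[c]: the same cohort-level metric (e.g. Gini) for
    two designs, already averaged over worlds, aligned by cohort c.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(metric_a, metric_b)]
    n = len(diffs)
    boot_means = []
    for _ in range(n_boot):
        # Resample whole cohorts with replacement (cohort = inference unit).
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        boot_means.append(sum(sample) / n)
    boot_means.sort()
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(diffs) / n, lower, upper


# Toy cohort-level Gini values for two designs over four cohorts.
design_a = [0.50, 0.62, 0.58, 0.66]
design_b = [0.45, 0.60, 0.50, 0.61]
point, lower, upper = bootstrap_diff_ci(design_a, design_b)
```

Resampling cohorts rather than posts or users keeps the resampling unit aligned with the stated inference unit.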
Sample size (or number of clusters) by treatment arms
About 1k per cell.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB

Institutional Review Boards (IRBs)

IRB Name
Cornell University Institutional Review Board for Human Participant Research
IRB Approval Date
2025-10-07
IRB Approval Number
IRB0150005
Analysis Plan

There is information in this trial unavailable to the public.