Primary Outcomes (end points)
We will group posts into post-entry cohorts: sets of posts that were created in the same time interval.
We will primarily examine the render log, which records the time, the user identifier, and the posts sent to the user for each request to the Trending News Feed (a “session”). In particular, we examine news posts sent in the top 6 positions for requests with post limit > 1. (We consider the absolute position in the feed; that is, we include the offset caused by non-news posts, such as announcements and surveys. Since the feed always includes a pinned post, the top 6 positions typically contain 5 news posts when the survey is not shown.)
We also construct a “shadow global” render log. For each render log entry at time t with n news posts, we create an entry in the shadow global render log at time t with the n posts that would have been shown by the “Shadow Global” Design 0 based on global interaction counts up to time t.
**Relevant definitions for outcomes**
1. *Exposure market share for posts.* For each post, an exposure is one appearance in the top 6 positions of the posts sent to a user in a session. The market share of a post is its number of exposures, normalized by the total exposures of posts in the same post-entry cohort.
2. *Residual from shadow global.* For each post, in a given world, its residual from the shadow global is the difference between its market share based on the actual posts sent to users and the market share it would have had if the shadow global ranking had been used to rank posts in that world.
3. *Item quality from reverse chronological feed.* For each post, we will define its quality as explained in detail below. Given a measure of quality, we will bin eligible posts (posts rendered to any user in any of Designs 0-3) into quality quintiles (5 buckets), where each quintile contains the same number of posts.
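As a minimal sketch of the three definitions above, the following Python computes exposure market shares, residuals from the shadow global, and quality quintiles. The data layout is an assumption made for illustration only: each session is a list of `(position, post_id)` pairs with 1-based absolute feed positions, and `cohort_of` maps news posts to their post-entry cohort (non-news posts are simply absent from it). The function names, the sign convention (actual minus shadow), and the tie-breaking rule in the quintile assignment are likewise hypothetical and are not specified by this plan.

```python
from collections import Counter


def market_shares(render_log, cohort_of, top_k=6):
    """Exposure market share of each news post within its post-entry cohort.

    render_log: iterable of sessions; each session is a list of
        (position, post_id) pairs with 1-based absolute positions (assumed layout).
    cohort_of: dict mapping news post_id -> cohort label.
    """
    exposures = Counter()
    for session in render_log:
        for position, post_id in session:
            # Count only news posts that land in the top_k absolute positions.
            if position <= top_k and post_id in cohort_of:
                exposures[post_id] += 1

    cohort_totals = Counter()
    for post_id, count in exposures.items():
        cohort_totals[cohort_of[post_id]] += count

    return {p: n / cohort_totals[cohort_of[p]] for p, n in exposures.items()}


def residuals_from_shadow(actual_log, shadow_log, cohort_of, top_k=6):
    """Actual market share minus shadow-global market share (assumed sign)."""
    actual = market_shares(actual_log, cohort_of, top_k)
    shadow = market_shares(shadow_log, cohort_of, top_k)
    posts = set(actual) | set(shadow)
    # A post never exposed in one of the logs is treated as having share 0 there.
    return {p: actual.get(p, 0.0) - shadow.get(p, 0.0) for p in posts}


def quality_quintiles(quality):
    """Assign posts to 5 (near-)equal-sized buckets; 1 = lowest quality.

    Ties are broken by post_id, which is an arbitrary illustrative choice.
    """
    ranked = sorted(quality, key=lambda p: (quality[p], p))
    n = len(ranked)
    return {p: (i * 5) // n + 1 for i, p in enumerate(ranked)}
```

For example, with `cohort_of = {"a": "c1", "b": "c1"}` and a post "a" exposed twice while "b" is exposed once, the market shares within cohort `c1` are 2/3 and 1/3.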
**Primary outcomes**
1. *Movement from Shadow Global.* How different the designs are from shadow global in terms of post-level market share. Defined as the sum of the absolute values of the residuals in post-level market share within each cohort.
2. *Inequality (concentration) of post-level market shares in each cohort.* This measures how differently the ranking rule treats posts of similarly high quality. Our primary measure of inequality is the Gini coefficient of exposure market shares in each cohort. We will measure this both across all posts and within post-quality groups, focusing on the top 20% of posts by estimated quality. The inference unit is the post-entry cohort, and we will average across the 3 worlds within a design before comparing designs across cohorts. As a secondary illustration, we will plot this metric for quality groups beyond the top 20%, including more granular bins.
3. *Cross-world unpredictability (arbitrariness) of post-level market shares across worlds in the same design.* This measures how much post-level market shares diverge across the three parallel worlds within a design. The unpredictability of a design is measured by the Gini Mean Difference (GMD) of each post i’s market share across the 3 worlds in that design. We will report the GMD across all posts and across the top 20% of posts by estimated quality. As a secondary illustration, we will plot this metric for quality groups beyond the top 20%, including more granular bins. We will also report the GMD conditional on the mean movement (the residual from shadow global, bucketed into quintiles) to separate unpredictability caused by a design changing average exposures relative to shadow global from unpredictability caused by variance in that change. More precisely, we will primarily test the difference in GMD averaged with uniform weight over movement quintiles. Here, the inference unit is the post-entry cohort; for each design x cohort, we will compute the GMD across the 3 worlds for each post, then average across posts within the cohort, and then compare designs across cohorts.
4. *User facing outcomes.*
(a) *Exposure quality:* The main quality outcome is the average external quality of the visibility allocated by the ranking rule, using position-weighted exposure. This will primarily be defined as the average (unweighted) quality of the top-6 posts sent to users, as in the exposure metric. The inference unit is the post-entry cohort: we will compute the quality-allocation metric for each design x cohort, averaging across worlds within each design, and then compare designs across cohorts (properly accounting for how world averaging affects the variance of the estimate).
(b) *User engagement metrics:* positive interactions (likes, replies, reposts, quote posts, see more), sessions, and session depth. A session is defined as a request to the Graze Trending News Feed. Session depth is defined as the number of items viewed within a request. Positive interactions and sessions are the primary engagement outcomes, while session depth is a secondary engagement outcome. For positive interactions, our primary measure is “on-feed likes per session”, with secondary measures using “on-feed likes per exposure” and the full set of interaction types, weighted in the same way as the ranking rule in the experiment.
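The movement, inequality, and unpredictability measures in primary outcomes 1-3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis code: the function names are hypothetical, the Gini Mean Difference is taken here as the mean absolute difference over all unordered pairs of distinct worlds (other normalizations exist), and a post that receives no exposure in a world is assumed to have market share 0 there.

```python
from itertools import combinations
from statistics import mean


def movement(residuals):
    """Sum of absolute residuals from shadow global within one cohort."""
    return sum(abs(r) for r in residuals.values())


def gini(shares):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(shares)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula on the ascending-sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


def gini_mean_difference(values):
    """Mean absolute difference over all unordered pairs (assumed normalization)."""
    return mean(abs(a - b) for a, b in combinations(values, 2))


def cohort_unpredictability(shares_by_world):
    """Average per-post GMD across worlds for one design x cohort.

    shares_by_world: list of dicts (one per world) mapping post_id -> market share.
    """
    posts = set().union(*shares_by_world)
    return mean(
        # Assumption: a post absent from a world has market share 0 there.
        gini_mean_difference([world.get(p, 0.0) for world in shares_by_world])
        for p in posts
    )
```

In this sketch, each function returns one number per cohort (per world for `gini` and `movement`, per design for `cohort_unpredictability`); averaging across worlds and comparing designs across cohorts would follow as described in the outcome definitions.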