Prosocial Ranking Challenge

Last registered on October 22, 2024

Pre-Trial

Trial Information

General Information

Title
Prosocial Ranking Challenge
RCT ID
AEARCTR-0014274
Initial registration date
August 29, 2024

First published
September 12, 2024, 5:19 PM EDT

Last updated
October 22, 2024, 8:43 AM EDT

Locations

Region

Primary Investigator

Affiliation
University of Warwick

Other Primary Investigator(s)

PI Affiliation
UC Berkeley
PI Affiliation
Columbia University
PI Affiliation
University of Michigan
PI Affiliation
Civic Health Project
PI Affiliation
Columbia University

Additional Trial Information

Status
In development
Start date
2024-07-08
End date
2025-06-30
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
More information about the study will be available after the completion of the trial.
External Link(s)

Registration Citation

Citation
Beknazar-Yuzbashev, George et al. 2024. "Prosocial Ranking Challenge." AEA RCT Registry. October 22. https://doi.org/10.1257/rct.14274-1.1
Sponsors & Partners

Sponsors

Experimental Details

Interventions

Intervention(s)
The experimental intervention is delivered through a browser extension that all participants must install. The extension makes various modifications to which content people see on Facebook, X, and Reddit.
Intervention (Hidden)
There will be multiple treatment arms, each testing a different ranking approach. For each arm, the first 50 content items displayed to a user, on each page load and each platform, are subject to re-ranking. The algorithm processes these items and adjusts their order and selection according to the procedures described below. Items can be re-ordered, removed, or supplemented with public content posted elsewhere on the platform.

In the control group, participants will not have their feed orders changed. Two-thirds of the control group will receive in-feed survey questions; the remaining one-third will receive no questions.

At this stage, we pre-register two treatment arms (we will amend the pre-registration to add new arms ahead of switching them on for participants).

The algorithms below depend on a set of classifiers in Jigsaw’s Perspective API, which score text based on a variety of attributes in three categories:

BRIDGING classifiers: constructive, nuance, personal_story, affinity, compassion, respect, curiosity.
PERSUASION classifiers: fearmongering, power_appeal, generalization, scapegoating, moral_outrage, alienation.
TOXICITY classifiers: toxicity (Perspective API classic), severe toxicity, identity attack, insult, profanity, threat, sexually explicit, flirtation.

Treatment Arm 1: Upranks Bridging content
Treatment Arm 2: Upranks Bridging content and downranks both Toxicity and Persuasion
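The ranking logic for these two arms can be sketched as a weighted combination of classifier scores. The field names, weights, and score values below are illustrative assumptions for exposition, not the study's actual parameters:

```python
# Illustrative sketch of score-based re-ranking (hypothetical fields and weights).
# Each item carries classifier scores in [0, 1] for the three categories.

def prosocial_score(item, w_bridge=1.0, w_toxic=0.0, w_persuade=0.0):
    """Combine category scores into a single ranking key (higher = ranked higher)."""
    return (w_bridge * item["bridging"]
            - w_toxic * item["toxicity"]
            - w_persuade * item["persuasion"])

def rerank(feed, top_k=50, **weights):
    """Re-rank only the first top_k items, as in the registered design."""
    head, tail = feed[:top_k], feed[top_k:]
    head = sorted(head, key=lambda it: prosocial_score(it, **weights), reverse=True)
    return head + tail

# Arm 1: uprank bridging only; Arm 2: also downrank toxicity and persuasion.
feed = [
    {"id": 1, "bridging": 0.2, "toxicity": 0.9, "persuasion": 0.1},
    {"id": 2, "bridging": 0.8, "toxicity": 0.1, "persuasion": 0.0},
    {"id": 3, "bridging": 0.5, "toxicity": 0.2, "persuasion": 0.7},
]
arm1 = rerank(feed)                               # bridging only
arm2 = rerank(feed, w_toxic=1.0, w_persuade=1.0)  # also penalize toxicity/persuasion
```

In this sketch, only the visible head of the feed is sorted, mirroring the "first 50 items per page load" rule above.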

As previously indicated, we are amending the pre-registration to add three new treatment arms, described below. Please note that the intervention for Treatments 3-5 started later than for Treatments 1-2 (which were originally pre-registered).

Treatment Arm 3 (Heartbreak): This algorithm recommends content based on the ideological alignment of source and user, i.e., content that users likely disagree with from sources they usually agree with (and vice versa), to study the impact on users’ perceptions of out-groups and their inclination to consider other perspectives. The ideology of users is determined from nine issue position questions in the baseline survey, while the ideology of content and domains is evaluated using GPT-4o prompts.

Treatment Arm 4 (Feedspan): Detects civic content that doesn’t attract diverse engagement (not ‘bridging’) and replaces it with similar content that is expected to be more bridging, hopefully reducing divisiveness without decreasing civic content. Bridgingness is determined using a BERT-based model that is trained on LLM labels that indicate whether a post is likely to be found interesting and engaging by both average Republicans and Democrats.

Treatment Arm 5 (Personalized Quality News): Recommends news posts from credible and ideologically diverse news sources, which are tailored to a specific user's interests to increase political knowledge and make users more resilient to democratic threats.


Intervention Start Date
2024-08-30
Intervention End Date
2025-03-31

Primary Outcomes

Primary Outcomes (end points)
1. Mental Health

2. Support for partisan violence

3. Meta-perceptions on support for partisan violence

4. Affective polarization

5. Intergroup empathy

6. Political knowledge

7. Meaningful connection on platforms

8. Experiences on platforms

9. Negative Emotions

10. Social media use

a. Active time spent
- We measure active time following the methodology specified in Beknazar-Yuzbashev et al. (2022), summed across all three treated platforms.

b. Engagement rate
- Measured as: total number of engagement signals submitted, divided by number of posts seen, across all platforms.
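As a minimal sketch, the engagement-rate definition above amounts to the following; the per-platform counts are hypothetical examples:

```python
def engagement_rate(engagements, posts_seen):
    """Engagement rate: total engagement signals divided by posts seen (0 if none seen)."""
    return engagements / posts_seen if posts_seen else 0.0

# Counts are summed across platforms before dividing, per the definition above.
signals = {"facebook": 12, "x": 5, "reddit": 3}   # hypothetical engagement counts
seen = {"facebook": 200, "x": 150, "reddit": 50}  # hypothetical posts seen
rate = engagement_rate(sum(signals.values()), sum(seen.values()))  # 20 / 400
```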

Outcomes 1-9 will be measured in all surveys (baseline, midline, endline). Some questions from outcomes 3, 4, 5, and 9 will be measured using in-feed surveys injected by the extension. Lastly, outcome 10 will be measured through usage data.

Heterogeneity:

We will look at heterogeneity with respect to:
- Political affiliation (Democrats and Independents leaning towards Democrats vs. Independents leaning towards Republicans and Republicans)
- Minority status (female respondents, ethnic minorities)
- High/low political content exposure (upper quartile compared to lower three quartiles) based on the benchmark period
- High/low toxic content exposure (compared to median) based on the benchmark period

For outcome 10, we will additionally look at heterogeneity by social media platform.
Primary Outcomes (explanation)
Re 1: Mental Health will be measured using the following survey questions (All of the time, Most of the time, More than half the time, Some of the time, At no time):
- I have felt cheerful and in good spirits
- I have felt calm and relaxed
- I have felt active and vigorous
- I woke up feeling fresh and rested
- My daily life has been filled with things that interest me

We will standardize scores for each statement and add them up to create a single index measure.
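The "standardize and sum" index construction used throughout this section can be sketched generically as follows; the item names and scores are illustrative:

```python
import statistics

def standardized_index(responses):
    """Z-score each item across respondents, then sum per respondent into one index.

    `responses` maps item name -> list of numeric scores (one entry per respondent).
    A generic sketch of the 'standardize and sum' construction described above.
    """
    z = {}
    for item, scores in responses.items():
        mu = statistics.mean(scores)
        sd = statistics.pstdev(scores)  # population SD; zero-variance items score 0
        z[item] = [(s - mu) / sd if sd else 0.0 for s in scores]
    n = len(next(iter(responses.values())))
    return [sum(z[item][i] for item in responses) for i in range(n)]

# Two hypothetical items, two respondents:
index = standardized_index({"cheerful": [1, 3], "calm": [2, 4]})
```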

Re 2: Support for partisan violence will be measured using the following survey questions:

- When, if ever, is it OK for [inparty] to send threatening and intimidating messages to [outparty] leaders? (101 pt scale, Never, Always)
- How much do you feel it is justified for [inparty members] to use violence in advancing their political goals these days? (101 pt scale, Not at all, Extremely)

We will standardize scores for both statements and add them up to create a single index measure.

Re 3: Meta-perceptions on support for partisan violence will be measured using two survey questions (All of the time, Most of the time, More than half the time, Some of the time, At no time):
- How do you think an average [outparty] would respond to the following question? How much do you feel it is justified for [outparty] to use violence in advancing their political goals these days?
- How do you think an average [outparty] would respond to the following question? When, if ever, is it OK for [outparty] to send threatening and intimidating messages to [inparty] leaders?

We will standardize scores for each statement and add them up to create a single index measure.

Re 4: Affective polarization will be measured using the following survey questions:
- [APOUT1] Please indicate how you feel toward [outparty members] using the scale below. 100 means that you feel very favorably or warm toward them, 0 that you feel very unfavorable or cold, and 50 are neutral.
- [APIN1] Please indicate how you feel toward [inparty members] using the scale below. 100 means that you feel very favorably or warm toward them, 0 that you feel very unfavorable or cold, and 50 are neutral.
- [APOUT2] How comfortable are you having friends who are [outgroup members]? (101 pt scale Not at all to Completely)
- [APIN2] How comfortable are you having friends who are [ingroup members]? (101 pt scale Not at all to Completely)

We will standardize scores for each statement and add them up to create a single index measure.

Re 5: Intergroup empathy will be measured using two survey questions (7-point scale ranging from Strongly disagree to Strongly agree):
- I find it difficult to see things from [outparty] point of view.
- I think it is important to understand [outparty] by imagining how things look from their perspective.

We will standardize scores for each statement and add them up to create a single index measure.

Re 6: Participants will be asked questions about their political knowledge in each of the surveys.

Of the following news events, which ones do you think are true events that occurred in the last month, and which ones do you think are false and did not occur? (True, False, Unsure)

There will be five statements in each survey, taken from headlines 2 weeks before the survey, with 2-3 modified to be false.

Re 7: Meaningful connection on platforms will be measured using six survey questions:
- In the last two weeks, have you experienced a meaningful connection with others on Facebook?
- In the last two weeks, have you experienced a meaningful connection with others on X (Twitter)?
- In the last two weeks, have you experienced a meaningful connection with others on Reddit?
- In the last two weeks, have you personally witnessed or experienced something that affected you negatively on Facebook?
- In the last two weeks, have you personally witnessed or experienced something that affected you negatively on X (Twitter)?
- In the last two weeks, have you personally witnessed or experienced something that affected you negatively on Reddit?

Re 8: Experiences on platforms will be measured using six survey questions:
- In the last two weeks, have you learned something that was useful or helped you understand something important on Facebook?
- In the last two weeks, have you learned something that was useful or helped you understand something important on X (Twitter)?
- In the last two weeks, have you learned something that was useful or helped you understand something important on Reddit?
- In the last two weeks, have you witnessed or experienced content that you would consider bad for the world on Facebook?
- In the last two weeks, have you witnessed or experienced content that you would consider bad for the world on X (Twitter)?
- In the last two weeks, have you witnessed or experienced content that you would consider bad for the world on Reddit?

We will standardize scores for each statement and add them up to create a single index measure.

Re 9: Negative Emotions will be measured using the following question (note: this question will only be asked through in-feed surveys, not in the baseline, midline, and endline surveys).
- Reading my {platform} feed makes me feel angry, sad, or disgusted.

Secondary Outcomes

Secondary Outcomes (end points)
1. Social Trust

2. Further measures of user engagement

a. Total number of posts seen
b. Total number of political/civic posts seen
c. Average toxicity (Jigsaw) of posts seen
d. Engagement rate with toxicity
- Measured as: average toxicity of posts weighted by share of total viewport time
- Alternatively measured as: average toxicity of posts engaged with divided by average toxicity of all rendered posts, where engaged can be shares, clicks, reactions
e. Engagement rate with political/civic posts
- Political/civic posts will be classified using the classifier described in https://arxiv.org/abs/2403.13362
f. Average toxicity of posts created (Jigsaw)
g. Attrition, per platform, defined as the fraction of users who had at least one session in month 5 as compared to month 1, controlling for extension uninstallation.
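The viewport-time-weighted toxicity measure in 2d can be sketched as below; the tuple layout and values are illustrative assumptions:

```python
def viewport_weighted_toxicity(posts):
    """Average toxicity of posts, weighted by each post's share of total viewport time.

    `posts` is a list of (toxicity_score, viewport_seconds) pairs; this layout is
    an illustrative assumption, not the study's actual data schema.
    """
    total = sum(secs for _, secs in posts)
    if total == 0:
        return 0.0
    return sum(tox * secs for tox, secs in posts) / total

# A toxic post watched longer pulls the weighted average toward its score:
x = viewport_weighted_toxicity([(0.8, 30), (0.2, 10)])  # (24 + 2) / 40
```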

Heterogeneity:
We will look at the same angles of heterogeneity as for the primary outcomes. For outcome 2, we will additionally look at heterogeneity with respect to social media platform.
Secondary Outcomes (explanation)
Re 1: Social Trust will be measured using the following survey questions:

- "Generally speaking, would you say that most people can be trusted, or that you can't be too careful in dealing with people?"
- Outparty friends (Druckman and Levendusky 2019; Rajadesingan et al. 2023): "How comfortable are you having close personal friends who are [Outparty]?"

We will standardize scores for each statement and add them up to create a single index measure.

Experimental Design

Experimental Design
Information on the experimental design is hidden until the end of the trial.
Experimental Design Details
The primary purpose of the experiment is to measure the effect of various prosocial ranking approaches on outcomes surrounding polarization, information, and wellbeing.

We recruit participants to install a browser extension called "Social Media Lab", which can re-order, insert, and remove content on Twitter, Facebook, and Reddit. The extension was designed specifically for the Google Chrome browser, and we screen out and exclude other browsers. Recruitment is conducted through survey research and market research companies (such as CloudResearch, Positly, PureSpectrum, Cint, and several others). We will recruit up to 15,000 participants.

Users are randomized into one of the experimental groups. In the treatment groups, during the intervention period, we will employ a number of prosocial ranking algorithms to adjust the order and composition of content each participant receives (see Intervention section for details). In the control group, content will be processed, but not re-ordered or otherwise affected. All participants in treatment groups will also receive periodic in-feed survey questions. Two-thirds of the control group will also receive the in-feed surveys.

After installation and completion of a baseline survey, the user enters a baseline period, where no content will be altered (but in-feed surveys will be sent). After at least 3 weeks, but no earlier than August 29, 2024, treatment participants will enter an intervention period of 18 weeks, where the extension re-ranks content in the groups where this is required, according to the procedures outlined for that arm. On October 22, 2024, we will start inviting participants to take a midline survey, to be completed prior to the Nov 5 election. Near the end of the intervention period, the respondents will be sent an endline survey.

The primary outcomes and secondary survey outcomes will be measured in all three surveys (baseline, midline, endline), with the exception of primary outcome 9 (negative emotions), which will only be measured in-feed. We will use a canonical difference-in-differences specification for testing hypotheses. For primary outcome 10 (social media use) and secondary outcome 2 (further measures of user engagement) we adopt the two-way fixed effects (TWFE) model as our main specification. First, for each participant, we define time periods t as days in the study relative to their individual start date. Second, we generate a treatment dummy Dit, indicating whether the intervention was on for individual i in period t. Lastly, we regress the outcome variable Yit on the treatment dummy Dit with individual fixed effects αi and period fixed effects δt. We will use Driscoll and Kraay standard errors to account for serial and cross-sectional dependence for all measures where we have a sufficiently long panel (Cameron and Miller, 2015). We will perform robustness checks using DiD specifications alternative to TWFE.
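A textbook sketch of the TWFE point estimate via double demeaning is shown below. This is not the study's actual estimation code (which additionally computes Driscoll-Kraay standard errors); it only illustrates the specification Y_it = b*D_it + a_i + d_t + e_it:

```python
def twfe_beta(data):
    """Two-way fixed effects estimate for a single treatment dummy.

    `data` is a list of (i, t, D, Y) tuples for a balanced panel. Double-demeaning
    D and Y (subtract individual and period means, add back the grand mean) and
    regressing the residuals recovers the TWFE coefficient on D.
    """
    ids = sorted({r[0] for r in data})
    ts = sorted({r[1] for r in data})
    def mean(xs):
        return sum(xs) / len(xs)
    def demean(col):
        g = mean([r[col] for r in data])
        mi = {i: mean([r[col] for r in data if r[0] == i]) for i in ids}
        mt = {t: mean([r[col] for r in data if r[1] == t]) for t in ts}
        return [r[col] - mi[r[0]] - mt[r[1]] + g for r in data]
    d, y = demean(2), demean(3)
    return sum(di * yi for di, yi in zip(d, y)) / sum(di * di for di in d)

# Noiseless example: Y = 2*D + a_i + d_t, so the estimate should recover 2.0.
panel = [
    (1, 1, 0, 0),  # a_1=0, d_1=0
    (1, 2, 1, 3),  # treated: 2*1 + 0 + 1
    (2, 1, 0, 5),  # a_2=5
    (2, 2, 0, 6),  # a_2=5, d_2=1
]
beta = twfe_beta(panel)
```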

We will perform three types of hypothesis testing. First, we will pool all treatment arms as a “prosocial ranking” treatment and compare it to the control. Second, we will pool the two arms using Jigsaw prosocial classifiers. Lastly, we will look at individual arm treatment effects compared to the control.
Randomization Method
The browser extension assigns the user to one of the experimental groups using a random number generator. The control group will be twice as large as each treatment group.
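Such a weighted assignment can be sketched as follows; the arm labels and RNG choice are illustrative assumptions, since the extension's internal assignment code is not public:

```python
import random

def assign_group(n_treatments=5, seed=None):
    """Assign a participant to 'control' or one of n_treatments arms.

    Listing 'control' twice makes it twice as likely as any single treatment arm,
    matching the 2:1 control-to-arm ratio described above.
    """
    rng = random.Random(seed)
    arms = ["control"] * 2 + [f"treatment_{k}" for k in range(1, n_treatments + 1)]
    return rng.choice(arms)
```

Over many assignments, the control share converges to 2/(n_treatments + 2), e.g. 2/7 with five treatment arms.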
Randomization Unit
Individual
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
N/A
Sample size: planned number of observations
The number of observations depends on how successful we are in recruiting users to install the browser extension and how many of them stay in the study through the endline. We are recruiting through a variety of online research and market research companies (CloudResearch, Forthright, PureSpectrum, Cint, and several others). We estimate that we will be able to recruit approximately 15,000 users at baseline, and we put an upper bound on retention to the endline at 80%, i.e., 12,000 participants.
Sample size (or number of clusters) by treatment arms
We will randomly assign individuals to treatment groups with equal probabilities, except that the probability of assignment to the control group is twice that of assignment to any individual treatment group.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB

Institutional Review Boards (IRBs)

IRB Name
Committee for Protection of Human Subjects University of California, Berkeley
IRB Approval Date
2024-05-06
IRB Approval Number
2024-03-17285

Post-Trial

Post Trial Information

Study Withdrawal

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials