Improving User Experience on Social Media. A Field Experiment

Last registered on July 08, 2022

Pre-Trial

Trial Information

General Information

Title

Improving User Experience on Social Media. A Field Experiment

RCT ID

AEARCTR-0009628

Initial registration date

July 05, 2022

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published

July 08, 2022, 9:48 AM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Locations

Country

United States of America

Region

Primary Investigator

Name

Mateusz Stalinski

Affiliation

University of Warwick

Other Primary Investigator(s)

PI Name

George Beknazar-Yuzbashev

PI Affiliation

Columbia University

PI Name

Jesse McCrosky

PI Affiliation

The Mozilla Foundation

PI Name

Rafael Jimenez

PI Affiliation

The University of Chicago

Additional Trial Information

Status

In development

Start date

2022-07-06

End date

2023-02-01

Keywords

Behavior, Welfare

Additional Keywords

JEL code(s)

Secondary IDs

Prior work

This trial does not extend or rely on any prior RCTs.

Abstract

This is a field experimental study on social media. More details will be available after the trial is completed.

External Link(s)

Registration Citation

Citation

Beknazar-Yuzbashev, George et al. 2022. "Improving User Experience on Social Media. A Field Experiment." AEA RCT Registry. July 08. https://doi.org/10.1257/rct.9628-1.0

Sponsors & Partners

Experimental Details

Interventions

Intervention(s)

Intervention Start Date

2022-07-20

Intervention End Date

2022-09-30

Primary Outcomes

Primary Outcomes (end points)

Average time (per day) spent on Twitter, Facebook, and YouTube (in minutes)

Heterogeneity:
We will examine heterogeneity with respect to the median level of toxicity that the user is exposed to during the observation period preceding the intervention. In particular, we will analyze users with above-median and below-median toxicity separately.

Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)

1. A measure of the average amount of content consumed on Twitter, Facebook, and YouTube.
2. Average time (in minutes) spent on other social media platforms, where we do not conduct the hiding intervention.
3. Average number of ads users see on Twitter and Facebook (if available).
4. User engagement, measured as the average number of the user's reactions (such as likes), posts, and retweets (Twitter only) on Twitter, Facebook, and YouTube.
5. Average toxicity score of content that the user posts, retweets, or reacts to (e.g., likes) on Twitter, Facebook, and YouTube.
6. Proportion of content on Twitter, Facebook, and YouTube that is hidden and that remains in the user's feed, as well as the average toxicity of that content.
7. Long-term effects of the intervention (see the explanation below).

Heterogeneity:
As an additional measure, in the case of outcomes combining multiple platforms, we will also look at each platform separately.
We will also look at heterogeneity with respect to the median level of toxicity in the observation period (as in the case of the primary outcome).

Secondary Outcomes (explanation)

In the case of outcomes 1-5, we will compute averages per day for each individual (unit of observation).

Re 2: The social media platforms on which we measure users' time are listed in the attached PDF.

Several of the variables of interest are related to our main direction of inquiry and represent various facets of user engagement. Therefore, we intend to create a summary index consisting of the primary outcome variable as well as the secondary outcomes number 1 and 4, as the equally weighted average of z-scores of its components (following Kling, Liebman, and Katz 2007).
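A minimal sketch of how such a summary index could be computed. It assumes, as in Kling, Liebman, and Katz (2007), that each component is standardized using the control-group mean and standard deviation; the registration does not state the reference group, and the function name, variable names, and numbers below are hypothetical.

```python
from statistics import mean, stdev

def summary_index(components, is_control):
    """Equally weighted average of component z-scores, one value per user.

    components: dict mapping an outcome name to a list of per-user values
                (all lists in the same user order).
    is_control: list of bools marking control-group users.
    Assumption: z-scores use the control-group mean and standard deviation.
    """
    standardized = []
    for values in components.values():
        control_values = [v for v, c in zip(values, is_control) if c]
        mu = mean(control_values)
        sigma = stdev(control_values)
        standardized.append([(v - mu) / sigma for v in values])
    # Average the z-scores across components for each user.
    return [mean(user_z) for user_z in zip(*standardized)]

# Hypothetical per-user daily averages for four users (illustrative only).
outcomes = {
    "time_minutes": [10.0, 20.0, 30.0, 40.0],      # primary outcome
    "content_consumed": [100.0, 300.0, 200.0, 500.0],  # secondary outcome 1
    "engagement": [1.0, 3.0, 2.0, 6.0],            # secondary outcome 4
}
control_flags = [True, True, False, False]
index = summary_index(outcomes, control_flags)
```

By construction the index averages to zero in the control group, so a treatment-group mean can be read in control-group standard deviation units.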

Re 7: While the intervention period for the study is 6 weeks (the primary outcome and secondary outcomes 1-6 are evaluated at that point), we are also interested in the long-term effects of hiding toxic content. Therefore, we will encourage our users to keep the extension installed for a period of up to 6 months, and additionally report the outcome values (for the remaining sample) measured at that point.

Experimental Design

The experimental design description is hidden until the end of the trial.

Experimental Design Details

The primary purpose of the experiment is to measure the effect on online engagement of an intervention that mimics a policy reducing toxic content.

We recruit participants to install a browser extension called "Social Media Research", which has the ability to hide toxic content (posts, comments, replies) on Twitter, Facebook, and YouTube.

Users are randomized into one of two groups. In the treatment group, all content exceeding the threshold level of toxicity is hidden during the intervention period (see the Intervention section for a description of how we assign the toxicity scores). In the control group, no content is hidden. In both groups, the extension loads the replies that Twitter places under “Show more replies” at the bottom of comment sections (where Twitter places more toxic content). This increases the variation in exposure to toxic content between the groups.

After installation, the user enters an observation period of 2 weeks (measured individually from the day they first open one of the three platforms in their browser with the extension installed), during which the extension collects data on their activity but no content is hidden in either group. Subsequently, the intervention begins for the treatment group.
We will use a difference-in-differences specification to identify the effect of the intervention on our outcome variables. In particular, our specification will include two periods (the observation period, in which no intervention takes place, and the experimental period with the hiding intervention) and two groups (with hiding and without hiding).
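The two-period, two-group comparison can be illustrated with a short sketch. The registration does not give the estimating equation, covariates, or standard errors, so this only computes the basic difference of mean differences; in a regression with group, period, and interaction indicators and no other covariates, the interaction coefficient equals this quantity. The function name and all data are hypothetical.

```python
from statistics import mean

def did_estimate(rows):
    """Difference-in-differences from (treated, post, outcome) records.

    rows: iterable of (treated: bool, post: bool, outcome: float), where
          post=False is the observation period and post=True is the
          intervention period.
    """
    cells = {}
    for treated, post, outcome in rows:
        cells.setdefault((treated, post), []).append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    change_treated = m[(True, True)] - m[(True, False)]
    change_control = m[(False, True)] - m[(False, False)]
    return change_treated - change_control

# Hypothetical average daily minutes on the three platforms.
records = [
    (True, False, 60.0), (True, False, 62.0),    # treatment, observation period
    (True, True, 50.0), (True, True, 54.0),      # treatment, intervention period
    (False, False, 58.0), (False, False, 60.0),  # control, observation period
    (False, True, 57.0), (False, True, 59.0),    # control, intervention period
]
effect = did_estimate(records)
```

Subtracting the control group's change removes time trends common to both groups, which is what the observation period is used for.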

For each user, we will compute the proportion of content displayed to the user on Twitter, Facebook, and YouTube that was in a language our extension does not support (and for which we therefore cannot properly assign a toxicity score). We will discard all observations where that proportion exceeds 50%. Language detection is performed as part of toxicity score assignment (as explained in the Intervention section).
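A minimal sketch of this exclusion rule, assuming per-user counts of displayed items and of items in unsupported languages are available. The function name, the data layout, and the handling of users with no displayed content are assumptions, not part of the registration.

```python
MAX_UNSUPPORTED_SHARE = 0.5  # the 50% threshold stated above

def keep_observation(n_unsupported, n_total):
    """Return True if the observation stays in the analysis sample.

    n_unsupported: items displayed in a language the extension cannot score.
    n_total: all items displayed on Twitter, Facebook, and YouTube.
    """
    if n_total == 0:
        # Assumption: with no displayed content the share is undefined,
        # so the observation is dropped.
        return False
    # Discard only when the share strictly exceeds the threshold.
    return n_unsupported / n_total <= MAX_UNSUPPORTED_SHARE

# Hypothetical users: (unsupported items, total items).
users = {"u1": (10, 100), "u2": (50, 100), "u3": (51, 100)}
kept = [uid for uid, (bad, total) in users.items() if keep_observation(bad, total)]
```

In this illustration the user at exactly 50% is kept, because the rule discards only proportions that exceed 50%.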

We will also estimate an event-study specification of the impact of the intervention on a week-by-week basis.

For each individual, the duration of the intervention is 6 weeks. The exact intervention end date for the study will be the date of recruiting the last individual plus 8 weeks (the 2-week observation period followed by the 6-week intervention). In accordance with the consent form, the latest possible end date is 9/30/2022, at which point we intend to collect the outcomes of interest and reimburse participants.

Randomization Method

After the user installs the browser extension (and agrees to the data collection), the extension (using JavaScript) generates a random number between 0 and 1. If the number exceeds 0.5, the user is assigned to one of the two groups. If it is below 0.5, the user is assigned to the other.
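The assignment rule can be sketched as follows. The extension itself uses JavaScript; this Python version only mirrors the logic. Which side of 0.5 maps to the treatment group is not stated in the registration, so the mapping below, the function name, and the treatment of a draw of exactly 0.5 are assumptions.

```python
import random

def assign_group(draw):
    """Map a uniform draw in [0, 1) to an experimental group.

    Assumption: draws above 0.5 go to treatment; all other draws,
    including exactly 0.5, go to control.
    """
    return "treatment" if draw > 0.5 else "control"

# Simulate many installations with a seeded generator to check balance.
rng = random.Random(0)
groups = [assign_group(rng.random()) for _ in range(10_000)]
treatment_share = groups.count("treatment") / len(groups)
```

Because each user is assigned independently, the realized group sizes are close to equal but not exactly so.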

Randomization Unit

Individual

Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters

The number of clusters/observations depends on how successful we are in promoting the browser extension (it will be promoted through advertisements on Twitter and by the Mozilla Foundation). In particular, the reach of the second channel is difficult to estimate. However, a rough prediction is that the number of observations should be in the range of 1000-2000.

Sample size: planned number of observations

The same as the number of clusters.

Sample size (or number of clusters) by treatment arms

We randomly assign units to treatments with equal probabilities. Therefore, half of the observations should be in the treatment group and the other half in the control.

Minimum detectable effect size for main outcomes (accounting for sample design and clustering)

With the planned number of observations (1000-2000), we should be able to detect effect sizes between 0.125 and 0.177 standard deviations. In a (very small) sample from a pilot, we observed an effect size of 0.5 standard deviations, so we expect to have enough power ex ante.
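The quoted range is consistent with a standard two-sample calculation with equal-sized arms. The sketch below assumes a two-sided 5% test and 80% power; the registration does not state these values, but they reproduce the reported numbers for 2000 and 1000 observations. It ignores attrition and any precision gained from the observation-period data.

```python
from math import sqrt
from statistics import NormalDist

def mde_in_sd(n_total, alpha=0.05, power=0.80):
    """Minimum detectable effect in standard deviation units.

    Assumes two equal arms, a two-sided test at level alpha, and the
    given power (both values are assumptions, not from the registration).
    """
    standard_normal = NormalDist()
    multiplier = standard_normal.inv_cdf(1 - alpha / 2) + standard_normal.inv_cdf(power)
    n_per_arm = n_total / 2
    # Standard error of a difference in means with unit variance per arm.
    return multiplier * sqrt(2 / n_per_arm)

mde_large_sample = mde_in_sd(2000)  # about 0.125
mde_small_sample = mde_in_sd(1000)  # about 0.177
```

Under the same assumptions, the pilot effect size of 0.5 standard deviations lies well above either bound.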

Supporting Documents and Materials

There is information in this trial unavailable to the public.

IRB