Social Media Toxicity and Mental Health

Last registered on October 21, 2024

Pre-Trial

Trial Information

General Information

Title
Social Media Toxicity and Mental Health
RCT ID
AEARCTR-0013987
Initial registration date
July 09, 2024

First published
July 16, 2024, 2:31 PM EDT

Last updated
October 21, 2024, 8:35 AM EDT

Locations

Region

Primary Investigator

Affiliation
University of Warwick

Other Primary Investigator(s)

PI Affiliation
WZB and Berlin School of Economics
PI Affiliation
Columbia University
PI Affiliation
Columbia University

Additional Trial Information

Status
In development
Start date
2024-06-03
End date
2025-06-30
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
More information about the study will be available after the completion of the trial.
External Link(s)

Registration Citation

Citation
Beknazar-Yuzbashev, George et al. 2024. "Social Media Toxicity and Mental Health." AEA RCT Registry. October 21. https://doi.org/10.1257/rct.13987-1.1
Sponsors & Partners

Sponsors

Experimental Details

Interventions

Intervention(s)
Information on the intervention is hidden until the end of the trial.
Intervention Start Date
2024-07-10
Intervention End Date
2025-03-31

Primary Outcomes

Primary Outcomes (end points)
1. Mental health of respondents, measured with two validated scales: PHQ8 (depression) and GAD7 (anxiety).

Heterogeneity:
We will look at heterogeneity with respect to baseline level of mental health, political affiliation, minority status, and the interaction of minority status with baseline level of mental health. Details are provided below.
Primary Outcomes (explanation)
Re 1: For each scale, we will sum the points across all questions. We will also create a composite index equal to the average of the PHQ8 score and the GAD7 score, and we will z-score this index. For completeness, we list the PHQ8 and GAD7 items used in our surveys.

PHQ8: Over the last 2 weeks, how often have you been bothered by any of the following?
- Little interest or pleasure in doing things.
- Feeling down, depressed, or hopeless.
- Trouble falling or staying asleep, or sleeping too much.
- Feeling tired or having little energy.
- Poor appetite or overeating.
- Feeling bad about yourself — or that you are a failure or have let yourself or your family down.
- Trouble concentrating on things, such as reading the newspaper or watching television.
- Moving or speaking so slowly that other people could have noticed. Or the opposite – being so fidgety or restless that you have been moving around a lot more than usual.

For each statement, participants select one of four options: not at all, several days, more than half of the days, nearly every day. These will be coded as 1, 2, 3, and 4 respectively when the outcomes are computed (as explained above).

GAD7: Over the last 2 weeks, how often have you been bothered by any of the following?
- Feeling nervous, anxious, or on edge.
- Not being able to stop or control worrying.
- Worrying too much about different things.
- Trouble relaxing.
- Being so restless that it is hard to sit still.
- Becoming easily annoyed or irritable.
- Feeling afraid, as if something awful might happen.

For each statement, participants select one of four options: not at all, several days, more than half of the days, nearly every day. These will be coded as 1, 2, 3, and 4 respectively when the outcomes are computed (as explained above).
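A minimal sketch of the scoring described in "Re 1", assuming responses are stored as the option labels above and that "z-score" means subtracting the full-sample mean and dividing by the population standard deviation (the registration does not state which sample or SD convention is used). The function names are hypothetical.

```python
from statistics import mean, pstdev

# Response options coded 1-4, as stated in the registration.
CODES = {
    "not at all": 1,
    "several days": 2,
    "more than half of the days": 3,
    "nearly every day": 4,
}

def scale_score(responses: list[str]) -> int:
    """Sum the coded answers across the items of one scale (PHQ8 or GAD7)."""
    return sum(CODES[r] for r in responses)

def composite_index(phq8: list[list[str]], gad7: list[list[str]]) -> list[float]:
    """Average each respondent's PHQ8 and GAD7 sums, then z-score across respondents.

    Higher values indicate more frequent symptoms. Assumption: full-sample mean
    and population SD are used for the z-score.
    """
    raw = [(scale_score(p) + scale_score(g)) / 2 for p, g in zip(phq8, gad7)]
    mu, sd = mean(raw), pstdev(raw)
    return [(x - mu) / sd for x in raw]
```

For example, a respondent answering "not at all" to every item and one answering "nearly every day" to every item end up at -1 and +1 on the composite index.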

Heterogeneity:
- baseline level of mental health (respondents will be split into two groups: above and below the median score of the composite mental health index),
- political affiliation (Democrats and Independents leaning towards Democrats vs. Independents leaning towards Republicans and Republicans),
- minority status (female respondents, members of the LGBTQ+ community, ethnic minorities),
- interaction of minority status and baseline level of mental health (above/below median baseline mental health score × having/not having minority status).
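The subgroups listed above can be sketched as labels attached to each respondent. Several details are assumptions not stated in the registration: the party codes are hypothetical, pure Independents are left out of the political split, respondents exactly at the median are placed in the "below" group, and "minority status" means belonging to at least one of the listed groups.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional

@dataclass
class Respondent:
    baseline_index: float   # composite PHQ8/GAD7 index at baseline
    female: bool
    lgbtq: bool
    ethnic_minority: bool
    party: str              # hypothetical codes: "D", "lean D", "I", "lean R", "R"

def political_group(party: str) -> Optional[str]:
    """Democrats and Democratic leaners vs. Republican leaners and Republicans."""
    if party in ("D", "lean D"):
        return "Democratic"
    if party in ("lean R", "R"):
        return "Republican"
    return None  # assumption: pure Independents are not assigned to either side

def heterogeneity_labels(sample: list[Respondent]) -> list[dict]:
    """Attach the subgroup labels used for the heterogeneity analyses."""
    cutoff = median([r.baseline_index for r in sample])
    labels = []
    for r in sample:
        # Assumption: respondents exactly at the median go to the "below" group.
        mh = "above" if r.baseline_index > cutoff else "below"
        minority = r.female or r.lgbtq or r.ethnic_minority
        labels.append({
            "mental_health": mh,
            "political": political_group(r.party),
            "minority": minority,
            "mh_x_minority": (mh, minority),
        })
    return labels
```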

Secondary Outcomes

Secondary Outcomes (end points)
1. Beliefs about the likelihood of discrimination

2. Moral superiority

3. Beliefs about altruism of others

4. Willingness to follow a bot/page/group offering particular content (midline)

5. Willingness to accept temporary deactivation of social media accounts after the endline

Heterogeneity:
We will look at the same angles of heterogeneity as for the primary outcomes.
Secondary Outcomes (explanation)
Re 1: For each of the following statements, respondents indicate on a scale from 0 to 100 how prevalent they believe the described situation is.

How prevalent do you think are the following situations?
1) Women facing sexual harassment (ranging from inappropriate jokes to unwanted advances) in the workplace.
2) Members of ethnic minorities and LGBTQ+ individuals experiencing hate crime (ranging from abusive language to assault).
3) Despite having equal qualifications, women are not getting hired for a managerial role.
4) Despite having equal qualifications, straight white men are not getting hired because they do not add to the diversity of the workplace.
5) Supervisors in a workplace being excessively critical and demeaning towards employees.
6) Older individuals are denied equal opportunities because of their age.

To compute the outcome, we will take the mean of the standardized scores across all statements. Separately, we will compute the mean of the standardized scores only for the scenarios that describe discrimination against a group to which the respondent belongs (for older individuals, we will use a cutoff of 50 years of age).
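The two versions of this outcome can be sketched as follows. The mapping from respondent traits to statements is our own illustration, not taken from the registration: statement 5 has no clear target group, so it is left out of the own-group mean; "older" is taken as age 50 or above; and each statement is standardized with the full-sample mean and population SD. Field and function names are hypothetical.

```python
from statistics import mean, pstdev

def standardize_columns(rows: list[list[float]]) -> list[list[float]]:
    """z-score each statement's 0-100 rating using the full-sample mean and population SD."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]

def own_group_items(t: dict) -> list[int]:
    """Hypothetical mapping from respondent traits to 0-based statement indices."""
    items = []
    if t["female"]:
        items += [0, 2]                 # statements 1 and 3
    if t["ethnic_minority"] or t["lgbtq"]:
        items.append(1)                 # statement 2
    if t["straight_white_man"]:
        items.append(3)                 # statement 4
    if t["age"] >= 50:
        items.append(5)                 # statement 6 (assumed cutoff: age >= 50)
    return sorted(items)                # statement 5 has no specific target group

def discrimination_indices(rows, traits):
    """Return (mean over all statements, mean over own-group statements or None)."""
    z = standardize_columns(rows)
    overall = [mean(r) for r in z]
    own = []
    for r, t in zip(z, traits):
        idx = own_group_items(t)
        own.append(mean([r[i] for i in idx]) if idx else None)
    return overall, own
```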

Re 2: First, we ask participants to decide whether each of the following statements applies to them (“yes” or “no”).
1) I sometimes feel resentful when I don’t get my way.
2) I’m always willing to admit it when I make a mistake.
3) There have been times when I was quite jealous of the good fortune of others.
4) I am always courteous, even to people who are disagreeable.
5) There have been occasions when I took advantage of someone.

To compute an index of absolute moral superiority, we first standardize the proportion of “yes” answers for each statement. Then, the scores for statements that indicate lower moral superiority (1, 3, 5) are multiplied by −1. Lastly, we compute the mean of the sign-adjusted standardized scores.
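A sketch of the absolute moral superiority index, assuming each answer is stored as a boolean (True for "yes") and that each statement's 0/1 indicator is standardized with the full-sample mean and population SD. The function name and data layout are hypothetical.

```python
from statistics import mean, pstdev

# 0-based indices of statements 1, 3 and 5, which indicate lower moral superiority.
REVERSED = {0, 2, 4}

def moral_superiority_index(answers: list[list[bool]]) -> list[float]:
    """Mean of sign-adjusted, standardized yes/no indicators per respondent."""
    cols = [[int(a) for a in col] for col in zip(*answers)]
    stats = [(mean(c), pstdev(c)) for c in cols]
    index = []
    for row in answers:
        scores = []
        for j, a in enumerate(row):
            m, s = stats[j]
            z = (int(a) - m) / s
            # Flip the sign for statements that indicate lower moral superiority.
            scores.append(-z if j in REVERSED else z)
        index.append(mean(scores))
    return index
```

A respondent who admits every flaw (yes to 1, 3, 5 and no to 2, 4) therefore scores lower than one who claims every virtue.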

Separately, we look at relative moral superiority. Participants are asked the following question:

Considering the statements listed above, over the last 2 weeks, do you think that you represented more or fewer positive qualities than an average person?

They can indicate agreement from 0 to 100 using a slider.

Re 3: In a different survey (with a representative sample of U.S. adults), we offered respondents who use social media a surprise bonus payment of $1 at the very end.

They could anonymously split the bonus between themselves and Save the Children USA – a charity offering aid and support to children with economic and emotional needs.

We ask “On average, what proportion of the bonus payment do you think the respondents who use social media donated to the charity?” ($0-$1 slider with $0.01 increments).

Re 4: We describe the outcome using the example of a bot on X (formerly Twitter). Participants decide whether or not they want to follow the bot.

The bot will share content with some toxicity. We signal this by showing screenshots of the bot's first posts, but we will not mention toxicity directly.

Overall, the bot is designed to post tweets that have been carefully curated to drive high engagement. The bot tweets five times a day, and each tweet is crafted to spark discussions, generate likes, and be shared widely.

These tweets might include a mix of the following (we will provide screenshots of toxic tweets as an example):

- Thought-provoking questions that encourage people to share their opinions.
- Interesting facts or statistics that prompt conversations.
- Humorous content that is likely to be retweeted and liked.
- Trending topics or current events that are relevant and timely.

Re 5: We will use the same approach as Butera et al. (2022) to measure WTP/WTA for deactivating participants' social media accounts for 4 weeks after the endline. We will inform participants that a random subset of respondents will have their choices implemented. The number of implemented choices will depend on funding availability.

Experimental Design

Experimental Design
Information on the experimental design is hidden until the end of the trial.
Experimental Design Details
Not available
Randomization Method
After the user installs the browser extension (and agrees to the data collection), the extension assigns the user to one of the experimental groups. The following example illustrates our approach. The extension generates a random number between 0 and 1. If the number is below 0.4, the user is assigned to a control group: to the control group with no hiding if the number is between 0 and 0.2, and to the control group with random hiding if it is between 0.2 and 0.4. If the number exceeds 0.4, the user is assigned to one of the treatments: to the treatment that uses Perspective API as the classifier if the number is between 0.4 and 0.7, and to the treatment that additionally uses Alienation as a classifier if it is above 0.7.
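These cutoffs map a single uniform draw to four arms with probabilities 0.2, 0.2, 0.3 and 0.3, which matches the 40/30/30 split reported under sample size. The extension's own code is not part of this registration; the Python sketch below only illustrates the mapping, with hypothetical arm labels and the assumption that a draw of exactly 0.2, 0.4 or 0.7 falls into the higher interval.

```python
import random

def assign_arm(u: float) -> str:
    """Map a uniform draw in [0, 1) to an experimental arm (hypothetical labels)."""
    if u < 0.2:
        return "control: no hiding"
    if u < 0.4:
        return "control: random hiding"
    if u < 0.7:
        return "treatment: Perspective API"
    return "treatment: Perspective API + Alienation"

def randomize(rng: random.Random) -> str:
    """Draw once per user, as the extension does after installation."""
    return assign_arm(rng.random())
```

Simulating many users, for instance `[randomize(random.Random(0)) ...]` with a shared generator, gives arm shares close to 20%, 20%, 30% and 30%.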

Between the midline and the endline, we plan to cross-randomize respondents in both control groups into two groups that either will or will not experience an engagement intervention. This randomization will also be performed by the extension.
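The cross-randomization could be sketched as below. The registration gives no assignment probability for the engagement intervention, so the 50/50 default is an assumption, as are the arm labels and the function name.

```python
import random
from typing import Optional

def cross_randomize(arm: str, rng: random.Random, p_engagement: float = 0.5) -> Optional[bool]:
    """Return True/False (engagement intervention or not) for control arms, None otherwise.

    p_engagement = 0.5 is an assumption; the registration does not state the split.
    """
    if not arm.startswith("control"):
        return None  # treatment arms are not cross-randomized
    return rng.random() < p_engagement
```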
Randomization Unit
Individual
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
N/A
Sample size: planned number of observations
The number of observations depends on how successfully we promote the browser extension (it will be promoted through social media advertising, in addition to recruitment via survey companies). We expect to recruit approximately 2,500 users.
Sample size (or number of clusters) by treatment arms
We will randomly assign individuals to treatment groups with the following probabilities: 40% to the control group (equal split between pure control with no hiding and control with random hiding), 30% to a treatment group that uses Perspective API classifier, and 30% to a treatment group that additionally uses Alienation classifier.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
We use the software G*Power to derive the required sample size for the experiment. The test family is t tests (one-sided), and the statistical test is linear bivariate regression with two groups and a difference between slopes (diff-in-diff). We set α = 0.05 and power (1 − β) = 0.8, fix the allocation ratio of treatment to control respondents at 1.5, assume that the outcome has the same standard deviation in both groups, equal to 1, and target a minimum detectable effect of 0.1 SD. This implies a total sample of 2,578 respondents (1,547 in the treatment group and 1,031 in the control group).

This minimum detectable effect is reasonable given the results of similar experiments. For example, Braghieri et al. (2022) report that the effect of Facebook adoption on an index of college students' mental health (based on the NCHA survey) is around 0.085 standard deviation units. Similarly, in a Facebook deactivation experiment, Allcott et al. (2020) report an effect on mental health of approximately 0.09 standard deviation units.

Furthermore, our intervention, hiding toxic content on three social media platforms, is likely to be stronger than the interventions cited above. First, we hide toxicity on three key social media platforms: X, Facebook, and Reddit. Second, we remove arguably “the worst” part of social media (toxic content) while keeping many of its benefits (such as interacting with friends). Lastly, some interventions (such as providing access to mental health support apps) substantially improve an index of anxiety, depression, and stress even when they last only several weeks: Shreekumar and Vautrey (2024) report effects of 0.38 standard deviations at two weeks and 0.46 SDs at four weeks, with effects persisting three months later.
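As a rough cross-check on the G*Power figures, the sketch below uses a normal approximation for a one-sided comparison of two group means, not G*Power's noncentral-t routine for a difference between slopes. With α = 0.05, power 0.8, a treatment-to-control ratio of 1.5, SD = 1 and a minimum detectable effect of 0.1 SD, it returns 1,547 treated and 1,031 control respondents. The rounding rule (round the control group up, then scale and round up) is our own choice.

```python
from math import ceil
from statistics import NormalDist

def sample_sizes(mde: float = 0.1, alpha: float = 0.05, power: float = 0.8,
                 ratio: float = 1.5, sd: float = 1.0) -> tuple[int, int]:
    """Normal-approximation sizes for a one-sided two-group test.

    ratio = n_treatment / n_control. Returns (n_treatment, n_control).
    """
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1 - alpha) + std_normal.inv_cdf(power)
    # Var(difference in means) = sd^2 * (1/n_t + 1/n_c) = sd^2 * (1 + 1/ratio) / n_c
    n_control = ceil((z_sum * sd / mde) ** 2 * (1 + 1 / ratio))
    n_treatment = ceil(ratio * n_control)
    return n_treatment, n_control

n_t, n_c = sample_sizes()
print(n_t, n_c, n_t + n_c)  # prints "1547 1031 2578"
```

The approximation ignores the small-sample t correction, which is negligible at these sample sizes.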
IRB

Institutional Review Boards (IRBs)

IRB Name
Morningside IRB, Columbia University
IRB Approval Date
2024-05-15
IRB Approval Number
IRB-AAAU8117
IRB Name
Committee for Protection of Human Subjects University of California, Berkeley
IRB Approval Date
2024-01-31
IRB Approval Number
2023-10-16782