Investigating Algorithmic Bias on Spotify

Last registered on October 08, 2024

Pre-Trial

Trial Information

General Information

Title
Investigating Algorithmic Bias on Spotify
RCT ID
AEARCTR-0014495
Initial registration date
September 30, 2024

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
October 07, 2024, 7:08 PM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Last updated
October 08, 2024, 8:43 AM EDT

Last updated is the most recent time when changes to the trial's registration were published.

Locations

Region

Primary Investigator

Affiliation
Rennes University

Other Primary Investigator(s)

PI Affiliation
Sorbonne Paris Nord University
PI Affiliation
ENSAE
PI Affiliation
Sorbonne Paris Nord University

Additional Trial Information

Status
In development
Start date
2024-10-08
End date
2024-11-04
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
In this “sock puppets” experiment, we test whether the recommendation algorithm of a music streaming platform is biased in favor of titles from a large economy (here, the USA) to the detriment of titles from a small economy (here, France). To do this, we created fictitious users that vary along four dimensions. First, they vary according to their preference for small-economy titles (No, Small, Large, Total). In addition, the profiles vary according to their location (France or USA), their preference for recent or old titles, and whether or not they consume titles recommended by the algorithm. Each day of the experiment, users consume titles in the Get Recommendations API according to their profile (input data). After each consumption phase, the recommended titles are recorded (output data). We then compare the proportion of titles of a given nationality in a user's input data with the proportion of titles of the same nationality in the recommendation list (output data). We can test whether the result varies according to the preference for small-economy music, as well as the user's location, the age of the titles, and whether the user consumes recommended titles.
External Link(s)

Registration Citation

Citation
Aly-Tovar, Ramadan Jose et al. 2024. "Investigating Algorithmic Bias on Spotify." AEA RCT Registry. October 08. https://doi.org/10.1257/rct.14495-1.1
Sponsors & Partners

There is information in this trial unavailable to the public.
Experimental Details

Interventions

Intervention(s)
Several countries have implemented broadcasting quotas for radio and TV, requiring stations to allocate a percentage of airtime to locally produced music. Proponents defend this strategy as a channel to promote and preserve local cultural identities.
The rise of digital streaming platforms has raised new concerns regarding the supply of cultural goods.
As streaming platforms rely on recommendation systems (editorial or algorithmic), advocates of government regulation are especially concerned with whether and to what extent these systems steer consumers, potentially harming the consumption and production of local content.
For example, platforms could intentionally recommend certain content, placing it in a spotlight position such as playlists (Aguiar & Waldfogel, 2021) or other editorial recommendation systems.
Additionally, algorithms used to automatically recommend content to consumers could be biased, either intentionally or unintentionally, towards certain types of content, potentially undermining the representation of local content. For example, if a streaming service's algorithm predominantly recommends international hits over local music, it can reduce the visibility and reach of local artists. This could happen, for example, if streaming platforms intentionally favoured certain artists when such recommendations increased the platforms’ profitability (Bourreau & Gaudin, 2022). Recommendations could also be biased due to a lack of representativeness of data at a global level (Hesmondhalgh et al., 2023).
In this context, the aim is to study whether algorithmic recommendations introduce a bias against local titles and in favor of titles from a large economy, such as the USA.
To answer this question, we will run a “sock puppets” experiment on Spotify's Get Recommendations API, based on the creation of fictitious users (bots). Each fictitious user is assigned to a profile, varying mainly according to their preference for small-economy titles (high, medium, zero). Here, the small economy is France.
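As a hedged sketch (not the authors' actual program), a request to Spotify's Get Recommendations endpoint (GET /v1/recommendations) can be assembled as below. The API accepts up to 5 seed tracks and returns 20 recommendations by default, matching the design; `build_recommendations_url` is an illustrative helper name, and a real call would additionally require an OAuth bearer token.

```python
from urllib.parse import urlencode

API_BASE = "https://api.spotify.com/v1/recommendations"

def build_recommendations_url(seed_track_ids, limit=20, market=None):
    """Assemble the query URL for one 5-title "consumption" request.

    The endpoint caps seed tracks at 5; 20 recommendations are returned
    by default unless `limit` is changed.
    """
    params = {"seed_tracks": ",".join(seed_track_ids[:5]), "limit": limit}
    if market:  # e.g. "FR" or "US", matching the bot's registered location
        params["market"] = market
    return f"{API_BASE}?{urlencode(params)}"
```

The returned URL would then be fetched once per bot per day, and the 20 recommended tracks logged as output data.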
Intervention (Hidden)
Intervention Start Date
2024-10-08
Intervention End Date
2024-11-04

Primary Outcomes

Primary Outcomes (end points)
For each user on each day of the experiment, we will use a distance metric to quantify the discrepancy between the proportion of items in a user’s input data that belong to a particular category (e.g., "US-based music") and the proportion of items in the recommendation list (output data) belonging to the same category.
Primary Outcomes (explanation)
For each fictitious user, 5 titles are randomly selected according to her profile and "played" in the Get Recommendations API. For each 5-title request, the API returns 20 recommended titles by default (output data).
We will then calculate a distance metric to quantify the discrepancy between the proportion of input titles that belong to a particular category (e.g., "US-based music") and the proportion of output titles belonging to the same category.
For each day of the experiment (14 days), we will collect these metrics for each fictitious user.
The nationality of a track can be defined by referring to its ISRC (International Standard Recording Code), a unique identifier assigned to each musical track; its first two characters encode the registrant country.
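The registration does not fix a functional form for the distance metric, so the sketch below uses a simple signed difference in category shares, with nationality read from the ISRC's two-letter country prefix. The function names are illustrative assumptions, not the authors' code.

```python
def isrc_country(isrc: str) -> str:
    """The first two characters of an ISRC encode the registrant country
    (e.g. 'FR...' for France, 'US...' for the USA)."""
    return isrc.replace("-", "")[:2].upper()

def category_share(isrcs, country):
    """Proportion of tracks in a list whose ISRC matches the given country."""
    return sum(isrc_country(i) == country for i in isrcs) / len(isrcs)

def distance_metric(input_isrcs, output_isrcs, country):
    """Signed gap between the recommended (output) share and the input share
    of a category -- one illustrative choice of discrepancy measure."""
    return category_share(output_isrcs, country) - category_share(input_isrcs, country)
```

For example, if 2 of a bot's 5 input titles are US-registered (share 0.4) and 10 of the 20 recommended titles are (share 0.5), the metric is +0.1, indicating over-representation of US titles in the output.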

Secondary Outcomes

Secondary Outcomes (end points)
Beyond nationality, we can also refer to the language of the titles. A procedure will be created to determine the language from Spotify data.

As a robustness check, we also create fictitious users from Quebec (Canada), where French is also the local language. We will run the same experiment in this area but with a different list of input titles.
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
We will run a “sock puppets” experiment on the Spotify's API Get Recommendations. This type of experiment is based on the creation of fictitious users.
Each fictitious user is assigned to a profile varying along four dimensions. First, profiles vary according to their preference for small-economy titles: No (0 French titles and 5 US titles in input data), Small (2 French titles and 3 US titles), Large (3 French titles and 2 US titles), Total (5 French titles and 0 US titles).
In addition, the profiles vary according to their location (France versus USA), their preference for recent music (Yes, No), and whether they "consume" titles recommended by the algorithm (Yes, No). For location, we register each user in the API according to her assigned nationality and use a proxy (VPN).
Each day of the experiment, users will "consume" 5 titles in the Get Recommendations API according to their profile. These titles are randomly chosen from a list of titles matching their profile. These lists have been created using Spotify's Top Songs charts and data from the SNEP (the dominant union in the French music industry); this ensures that the input titles are commercially successful. During the first seven days, the procedure is identical whether or not the fictitious user is assigned to the “consume recommendations” condition. From the eighth day onwards, however, the 5 titles of fictitious users in the “consume recommendations” condition are drawn at random from the list of titles recommended to them the previous day (d-1). For the others, the experiment continues as before.
After each consumption phase, we will record the 20 recommended titles.
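The day-by-day selection rule above can be sketched as follows. This is a minimal illustration with hypothetical names; the actual program is not published in the registration.

```python
import random

def daily_seed_titles(day, profile_titles, follows_recommendations, previous_recs):
    """Pick the 5 titles a bot "consumes" on a given day.

    Days 1-7: every bot draws from its profile-specific title list.
    Day 8 onward: bots in the "consume recommendations" condition draw
    instead from the 20 titles recommended to them the previous day (d-1).
    """
    if day >= 8 and follows_recommendations and previous_recs:
        return random.sample(previous_recs, 5)
    return random.sample(profile_titles, 5)
```

After each call, the 20 recommended titles would be logged and passed back as `previous_recs` the next day for bots in the "consume recommendations" arm.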
Experimental Design Details
Randomization Method
Each fictitious user (bot) is randomly assigned to a profile by a computer program.
Randomization Unit
The unit of randomization is the fictitious user.
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
In total, 960 fictitious users (bots) have been created: 320 for France, 320 for USA and 320 for Quebec.
Sample size: planned number of observations
For each fictitious user, 5 titles - randomly selected according to profile - will be used as input data. By default, the API recommends 20 titles (output data). We will then calculate a distance metric to quantify the discrepancy between the proportion of items in a user’s input data that belong to a particular category (e.g., "US-based music") and the proportion of items in the recommendation list (output data) belonging to the same category. For each day of the experiment (14 days), we will collect this metric for each fictitious user. In total, we will obtain 1 metric x 14 days x 960 bots = 13,440 observations.
Sample size (or number of clusters) by treatment arms
Each profile varies according to 4 dimensions:
- Preference for small economy (No, Small, Large, Total)
- Localisation (France, USA)
- Preference for recent music (Yes, No)
- Follow recommendations (Yes, No)

In total, we have 32 profiles (4 Preference for small economy x 2 Localisation x 2 Preference for recent music x 2 Follow recommendations), or 48 if we include a third location (Quebec, Canada).

For each profile, we create 20 user accounts.

In total, the experiment is based on 640 fictitious users (bots) for France and USA and 320 more for Quebec.
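The profile counts above can be reproduced directly; the dimension labels below mirror the list in the registration, and the variable names are hypothetical.

```python
from itertools import product

preference = ["No", "Small", "Large", "Total"]   # preference for small economy
location = ["France", "USA"]                     # add "Quebec" for the robustness arm
recent = ["Yes", "No"]                           # preference for recent music
follow = ["Yes", "No"]                           # follow recommendations

# Every combination of the four dimensions defines one profile.
profiles = list(product(preference, location, recent, follow))

ACCOUNTS_PER_PROFILE = 20
total_bots = len(profiles) * ACCOUNTS_PER_PROFILE  # 32 profiles -> 640 bots
```

Adding Quebec as a third location yields 4 x 3 x 2 x 2 = 48 profiles, i.e. 960 bots at 20 accounts per profile.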
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB

Institutional Review Boards (IRBs)

IRB Name
IRB Approval Date
IRB Approval Number

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials