Can Customer Reviews Reduce Statistical Discrimination? Implications for Online Marketplaces

Last registered on May 27, 2025

View Trial History

Pre-Trial

Trial Information

General Information

Title

Can Customer Reviews Reduce Statistical Discrimination? Implications for Online Marketplaces

RCT ID

AEARCTR-0015282

Initial registration date

May 22, 2025

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published

May 27, 2025, 7:04 AM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Locations

Country

United Kingdom of Great Britain and Northern Ireland

Region

Cambridge

Primary Investigator

Name

Konstantinos Ioannidis

Affiliation

University of Cambridge

Contact Primary Investigator

Other Primary Investigator(s)

PI Name

Noriko Amano-Patiño

PI Affiliation

University of Cambridge

Contact Investigator

PI Name

James Morris

PI Affiliation

University of Cambridge

Contact Investigator

Additional Trial Information

Status

In development

Start date

2025-06-01

End date

2025-12-31

Keywords

Behavior, Labor, Welfare

Additional Keywords

Statistical Discrimination, Online Rental Markets, Sharing Economy

JEL code(s)

D83, J15, L84

Secondary IDs

Prior work

This trial does not extend or rely on any prior RCTs.

Abstract

We investigate the role of reviews in statistical discrimination in the sharing economy (specifically in online rental markets). Using a controlled experiment in an Airbnb-like setting, we measure how quantitative and qualitative customer review information affects accommodation demand across minority and non-minority hosts. We create fictitious listings using scraped data from Airbnb and systematically vary host characteristics (through photos and names), the number of available customer reviews, and the informativeness and quality of the available reviews. Our experimental design consists of three between participants treatments: one treatment varying host race (minority/non-minority) and the number of reviews (few/many, keeping quality of reviews fixed), one varying host race and informativeness of reviews when all reviews are positive, and another treatment varying host race and review informativeness when the reviews include one negative. This approach allows us to isolate the specific mechanisms through which customer reviews influence statistical discrimination. Our findings will provide insights for platform design to reduce racial discrimination in the sharing economy, complementing existing observational studies on discriminatory behaviour in online markets.

External Link(s)

Registration Citation

Citation

Amano-Patiño, Noriko, Konstantinos Ioannidis and James Morris. 2025. "Can Customer Reviews Reduce Statistical Discrimination? Implications for Online Marketplaces." AEA RCT Registry. May 27. https://doi.org/10.1257/rct.15282-1.0

Sponsors & Partners

Experimental Details

Interventions

Intervention(s)

Intervention (Hidden)

Intervention Start Date

2025-06-01

Intervention End Date

2025-12-31

Primary Outcomes

Primary Outcomes (end points)

Ranking of target properties

Primary Outcomes (explanation)

Each participant will be presented with 4 sets of 6 fictitious properties. In each set, there will be one target property that we will vary between participants according to the treatment they are in as detailed in our description of the treatments in the Experimental design section.
The participant’s ranking of the target property in each set (a number between 1 and 6) is our primary outcome.

Secondary Outcomes

Secondary Outcomes (end points)

Secondary Outcomes (explanation)

Experimental Design

General structure of the experiment

The experiment will run online on Prolific. After obtaining informed consent, participants will first report details about their most recent rental experience, which will be used to customize the price range of properties shown to them. The main task consists of four rounds, where in each round participants rank six fictitious properties in order of preference. To encourage participants to reveal how they believe others perceive the properties, participants are incentivised with bonus payments based on how closely their rankings align with the modal ranking (the most common ordering chosen by other participants). The experiment concludes with a post-experimental survey comprising an Implicit Association Test to measure implicit biases and basic demographic questions.

Treatments

Our experimental design consists of three treatments, each showing participants 4 sets of 6 fictitious properties. In each set, there will be one target property that we will vary between participants according to the treatment they are in. In the first treatment, we will vary the host race (minority/non-minority) and review quantity (low/high, keeping quality of reviews fixed) of such property. In the second treatment, we will vary the host race and informativeness of reviews (low/high, keeping number of reviews fixed) when all reviews are positive. In the third treatment, we will vary the host race and informativeness of reviews (low/high, keeping number of reviews fixed) when one of the reviews is negative. Participants are randomly assigned to one treatment and see each target property configuration exactly once, ensuring they cannot compare different versions of the same property. Within their assigned treatment, participants evaluate four different sets of properties, with the target property's characteristics systematically varied across sets.

Hypotheses

For all treatments, a benchmark hypotheses is that properties with minority hosts will receive lower rankings compared to identical properties with majority hosts. After establishing the existence of a ranking difference due to race, we are interested in each treatment to study the effect of reviews on this difference.

We hypothesize that:
i. Controlling for host characteristics and review quality, the quantity of reviews will affect participants' ranking.
ii. Controlling for host characteristics and review quantity, the informativeness of reviews will affect participants' ranking.
Hypothesis ii will be tested separately for treatments 2 and 3, so we can study how informativeness affects the ranking gap in the presence and absence of a negative review. This design allows us to isolate the effects of host race, review quantity, and review quality on property rankings while minimising potential confounds.

Analysis of main effects

We will run the following regressions for participants in the first and second treatments, respectively:
Prob(Rank_{ijt} ≤ k) = Λ(κₖ - β₀ + β₁Minority_i + β₂ LowReviews_i β₁₂(Minority_i x LowReviews_i) + γₚ + δₜ)

Where:
Rank_ijt is the ranking (1-6) given to property i by participant j in set t
Minority_i is a dummy variable indicating whether the host is a minority
LowReviews_i is a dummy for low quantity/informativeness of reviews (1 if low, 0 if high)
γₚ are participant fixed effects
δₜ are set fixed effects

This specification would test:
1) Whether minority hosts receive lower rankings: H1: β₁ < 0
2) Whether low quantity/quality of reviews leads to lower rankings: H2: β₂ < 0

Exploratory analysis

While not a main hypothesis, we implicitly assume that the baseline effect of minority host status (β₁) is consistent across both review quantity and quality treatments. This additional hypothesis could provide interesting insights about whether discrimination against minority hosts varies depending on the type of information (quantity vs. quality of reviews) being considered. Therefore, we will also test this hypothesis (H4) by comparing the coefficients across the two regressions using a statistical test (like a Chow test or z-test for equality of coefficients from separate regressions).

We also aim to investigate whether the main eﬀects tested above (in H1, H2) interact with experimental variables. One plausible interaction eﬀect would be that for non-minority hosts there is little or no statistical discrimination to start with, so higher number/quality of reviews does not change the ranking much, whereas for minority hosts, the eﬀect may be stronger. We test this hypothesis separately for each treatment arm. For each of the treatment arms, this hypothesis is captured by:
H5: |β₂ + β₁₂| > |β₂|
- where β₂ represents the effect of low quantity/informativeness reviews for non-minority hosts, and
- (β₂ + β₁₂) represents the effect of low quantity/informativeness reviews for minority hosts
In other words, we expect the interaction terms (β₁₂) to be negative and significant, indicating that minority hosts are more heavily penalised for having few or low-quality reviews compared to non-minority hosts.

For the exploratory hypothesis, we will apply Benjamini-Hochberg corrections to exploratory hypotheses (H₄ – H₅) to control the false discovery rate at α = 0.10.

Robustness Checks

We will assess robustness by re-estimating models without random effects (clustering SEs at the participant level), including set-level random effects, and with different covariance structures.

We will also formally test the proportional odds assumption using a Brant test. If violated, we will consider partial proportional odds models or multinomial logistic regression as alternatives.

Experimental Design Details

Randomization Method

Computerized randomization

Randomization Unit

Individual randomisation

Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters

5,250 (1,750 per treatment arm). Based on a pilot study, we may adjust the sample size up or down by up to 250 individuals.

Sample size: planned number of observations

21,000 (7,000 per treatment arm, 4 from each participant)

Sample size (or number of clusters) by treatment arms

5,250 (1,750 per treatment arm). Based on a pilot study, we may adjust the sample size up or down by up to 250 individuals.

Minimum detectable effect size for main outcomes (accounting for sample design and clustering)

We simulate data following our model specification across a grid of parameter values, including the minority penalty coefficient, review quantity/quality effects, and interaction terms for different sample sizes. For each parameter combination, we generated 1,000 synthetic datasets to estimate statistical power for detecting discrimination effects. With the chosen sample size, we should be able to pick up a minority penalty coefficient of at least 0.1 in magnitude for 80% power. We verified that smaller magnitudes of the minority penalty coefficient would lead to a standardised effect size that would not be economically meaningful.

Supporting Documents and Materials

IRB