Experimental Design
General structure of the experiment
The experiment will run online on Prolific. After giving informed consent, participants will first report details about their most recent rental experience, which will be used to customise the price range of the properties shown to them. The main task consists of four rounds; in each round, participants rank six fictitious properties in order of preference. To encourage participants to reveal how they believe others perceive the properties, they are incentivised with bonus payments based on how closely their rankings align with the modal ranking (the most common ordering chosen by other participants). The experiment concludes with a post-experimental survey comprising an Implicit Association Test to measure implicit bias and basic demographic questions.
Treatments
Our experimental design consists of three treatments, each showing participants four sets of six fictitious properties. Each set contains one target property whose characteristics vary between participants according to their treatment. In the first treatment, we vary the host race (minority/non-minority) and the review quantity (low/high, holding review quality fixed) of the target property. In the second treatment, we vary the host race and the informativeness of reviews (low/high, holding the number of reviews fixed) when all reviews are positive. In the third treatment, we vary the host race and the informativeness of reviews (low/high, holding the number of reviews fixed) when one of the reviews is negative. Participants are randomly assigned to one treatment and see each target property configuration exactly once, so they cannot compare different versions of the same property; within their assigned treatment, they evaluate four different sets of properties, with the target property's characteristics systematically varied across sets.
Hypotheses
For all treatments, a benchmark hypothesis is that properties with minority hosts will receive lower rankings than identical properties with majority hosts. After establishing the existence of a ranking difference due to race, we study within each treatment how reviews affect this difference.
We hypothesize that:
i. Controlling for host characteristics and review quality, the quantity of reviews will affect participants' ranking.
ii. Controlling for host characteristics and review quantity, the informativeness of reviews will affect participants' ranking.
Hypothesis ii will be tested separately for treatments 2 and 3, so we can study how informativeness affects the ranking gap in the presence and absence of a negative review. This design allows us to isolate the effects of host race, review quantity, and review quality on property rankings while minimising potential confounds.
Analysis of main effects
We will run the following regressions for participants in the first and second treatments, respectively:
Prob(Rank_{ijt} ≤ k) = Λ(κₖ + β₁ Minority_i + β₂ LowReviews_i + β₁₂ (Minority_i × LowReviews_i) + γⱼ + δₜ)
Where:
Rank_{ijt} is the ranking (1-6) given to property i by participant j in set t
Minority_i is a dummy variable indicating whether the host is a minority
LowReviews_i is a dummy for low quantity/informativeness of reviews (1 if low, 0 if high)
γⱼ are participant fixed effects
δₜ are set fixed effects
This specification would test:
1) Whether minority hosts receive lower rankings: H1: β₁ < 0
2) Whether low quantity/quality of reviews leads to lower rankings: H2: β₂ < 0
Exploratory analysis
While not a main hypothesis, we implicitly assume that the baseline effect of minority host status (β₁) is the same across the review-quantity and review-quality treatments. Testing this assumption could provide insight into whether discrimination against minority hosts varies with the type of information (quantity vs. quality of reviews) being considered. We will therefore test this hypothesis (H4) by comparing the coefficients across the two regressions using a statistical test, such as a Chow test or a z-test for the equality of coefficients from separate regressions.
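The z-test version of this comparison can be sketched as follows. The point estimates and standard errors below are hypothetical placeholders, and the test assumes the two treatment samples are independent, which holds here because each participant is assigned to a single treatment.

```python
# z-test for equality of a coefficient estimated in two separate regressions.
# The numbers below are hypothetical placeholders, not results.
import math
from scipy import stats

def coef_equality_z(b_a, se_a, b_b, se_b):
    """Two-sided z-test of H0: the coefficient is equal across the two
    (independent) samples."""
    z = (b_a - b_b) / math.sqrt(se_a**2 + se_b**2)
    p = 2 * (1 - stats.norm.cdf(abs(z)))
    return z, p

# hypothetical beta_1 estimates from treatments 1 and 2
z, p = coef_equality_z(-0.40, 0.15, -0.10, 0.12)
print(f"z = {z:.2f}, p = {p:.3f}")
```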
We also aim to investigate whether the main effects tested above (H1, H2) interact with the experimental variables. One plausible interaction is that non-minority hosts face little or no statistical discrimination to begin with, so a higher number or quality of reviews changes their ranking little, whereas for minority hosts the effect may be stronger. We will test this hypothesis separately for each treatment arm, where it is captured by:
H5: |β₂ + β₁₂| > |β₂|
- where β₂ represents the effect of low quantity/informativeness reviews for non-minority hosts, and
- (β₂ + β₁₂) represents the effect of low quantity/informativeness reviews for minority hosts
In other words, we expect the interaction terms (β₁₂) to be negative and significant, indicating that minority hosts are more heavily penalised for having few or low-quality reviews compared to non-minority hosts.
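H5 amounts to a Wald test on the linear combination β₂ + β₁₂. The mechanics can be sketched as follows, using a linear (OLS) model as a stand-in for the ordered logit for brevity, on simulated data with the hypothesised interaction built in; the effect sizes are illustrative assumptions.

```python
# Wald test of the combined effect beta_2 + beta_12 (the low-reviews effect
# for minority hosts), illustrated with an OLS stand-in on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({"minority": rng.integers(0, 2, n),
                   "low_reviews": rng.integers(0, 2, n)})
# built-in true effects: beta_2 = -0.3, beta_12 = -0.5
df["rank_score"] = (-0.3 * df["low_reviews"]
                    - 0.5 * df["minority"] * df["low_reviews"]
                    + rng.normal(0, 1, n))

res = smf.ols("rank_score ~ minority * low_reviews", data=df).fit()
# contrast vector picking out beta_2 + beta_12
# (params order: Intercept, minority, low_reviews, minority:low_reviews)
wald = res.t_test([[0, 0, 1, 1]])
print(wald)
```

On the real data the same t_test call would be applied to the fitted ordered-logit results, with the contrast positioned on the LowReviews and interaction coefficients.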
We will apply Benjamini-Hochberg corrections to the exploratory hypotheses (H4-H5) to control the false discovery rate at α = 0.10.
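The correction itself is a one-liner in statsmodels; the p-values below are hypothetical placeholders.

```python
# Benjamini-Hochberg FDR correction across the exploratory hypotheses.
from statsmodels.stats.multitest import multipletests

pvals = [0.04, 0.09]  # hypothetical p-values for H4 and H5
reject, p_adj, _, _ = multipletests(pvals, alpha=0.10, method="fdr_bh")
print(reject, p_adj)
```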
Robustness Checks
We will assess robustness by re-estimating the models without participant effects (instead clustering standard errors at the participant level), with set-level random effects, and under different covariance structures.
We will also formally test the proportional odds assumption using a Brant test. If violated, we will consider partial proportional odds models or multinomial logistic regression as alternatives.