
Fields Changed

Registration

Field Before After
Trial Title
Before: Can Customer Reviews Reduce Statistical Discrimination? Implications for Online Marketplaces
After: The impact of qualitative reviews in online markets: Empirical and experimental evidence on statistical discrimination
Abstract
Before: We investigate the role of reviews in statistical discrimination in the sharing economy (specifically in online rental markets). Using a controlled experiment in an Airbnb-like setting, we measure how quantitative and qualitative customer review information affects accommodation demand across minority and non-minority hosts. We create fictitious listings using scraped data from Airbnb and systematically vary host characteristics (through photos and names), the number of available customer reviews, and the informativeness and quality of the available reviews. Our experimental design consists of three between-participant treatments: one varying host race (minority/non-minority) and the number of reviews (few/many, keeping the quality of reviews fixed); one varying host race and the informativeness of reviews when all reviews are positive; and one varying host race and review informativeness when the reviews include one negative review. This approach allows us to isolate the specific mechanisms through which customer reviews influence statistical discrimination. Our findings will provide insights for platform design to reduce racial discrimination in the sharing economy, complementing existing observational studies on discriminatory behaviour in online markets.
After: We investigate the role of customer reviews and host demographics in statistical discrimination within the sharing economy (specifically in online rental markets). Using a controlled experiment in an Airbnb-like setting, we measure how a host's race, a host's gender, and customer reviews interact to affect accommodation demand. We create fictitious listings using scraped data from Airbnb and systematically vary host characteristics across three primary dimensions in a fully crossed 2x2x2 factorial design: Host Race (Black/White), Host Gender (Man/Woman), and a Review factor (High/Low). To isolate specific mechanisms of review-based discrimination, the exact nature of the Review factor varies across three between-participant treatments, manipulating either review quantity, positive informativeness, or negative informativeness. Our experimental design uses a forced-choice pairwise mechanism across three budget blocks (Low, Mid, High). To ensure perfect orthogonality and counterbalancing, the pairings are drawn from a comprehensive property map, with participants assigned to one of 56 block-randomised survey versions. This approach allows us to estimate the causal main effects of each attribute, as well as their interactions, to understand whether specific types of high-quality reviews can mitigate intersectional demographic penalties. Our findings will provide insights for platform design to reduce discrimination in the sharing economy.
Trial Start Date
Before: June 01, 2025
After: March 30, 2026

Trial End Date
Before: December 31, 2025
After: July 31, 2026

Last Published
Before: May 27, 2025 07:04 AM
After: March 24, 2026 12:57 PM

Intervention Start Date
Before: June 01, 2025
After: March 30, 2026

Intervention End Date
Before: December 31, 2025
After: July 31, 2026
Primary Outcomes (End Points)
Before: Ranking of target properties
After: Whether target property was chosen
Primary Outcomes (Explanation)
Before: Each participant will be presented with 4 sets of 6 fictitious properties. In each set, there will be one target property that we will vary between participants according to the treatment they are in, as detailed in our description of the treatments in the Experimental design section. The participant's ranking of the target property in each set (a number between 1 and 6) is our primary outcome.
After: Each participant will be presented with 33 pairs of fictitious properties across three budget blocks (Low, Mid, High) and asked to select their most preferred property in each pair. Of these 33 rounds, 28 are the primary experimental rounds containing the fully counterbalanced 2x2x2 attribute variations. The remaining 5 rounds are fixed filler/attention-check rounds. For the analysis, the data from the 28 experimental rounds will be reshaped into a "long format," resulting in 56 observations per participant (two competing properties per round). Our primary outcome is a binary indicator (0 or 1) for whether a specific property variant was selected by the participant.
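The "long format" reshape described in the updated outcome definition can be sketched as follows. This is not the authors' code: the column names (`participant`, `round`, `prop_left`, `prop_right`, `choice`) are hypothetical, chosen only to illustrate how each pairwise round yields two rows with a binary `chosen` indicator.

```python
# Hedged sketch of the wide-to-long reshape: one input row per pairwise
# round, two output rows per round (one per competing property).
import pandas as pd

def to_long(wide: pd.DataFrame) -> pd.DataFrame:
    """Turn one-row-per-round pairwise-choice data into long format."""
    rows = []
    for rec in wide.itertuples(index=False):
        for side, prop in (("L", rec.prop_left), ("R", rec.prop_right)):
            rows.append({
                "participant": rec.participant,
                "round": rec.round,
                "property": prop,
                "chosen": int(rec.choice == side),  # binary primary outcome
            })
    return pd.DataFrame(rows)
```

With 28 experimental rounds per participant, this yields 28 × 2 = 56 observations each, matching the figure in the registration.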
Experimental Design (Public)
Before:
General structure of the experiment. The experiment will run online on Prolific. After obtaining informed consent, participants will first report details about their most recent rental experience, which will be used to customise the price range of properties shown to them. The main task consists of four rounds; in each round, participants rank six fictitious properties in order of preference. To encourage participants to reveal how they believe others perceive the properties, participants are incentivised with bonus payments based on how closely their rankings align with the modal ranking (the most common ordering chosen by other participants). The experiment concludes with a post-experimental survey comprising an Implicit Association Test to measure implicit biases and basic demographic questions.
Treatments. Our experimental design consists of three treatments, each showing participants 4 sets of 6 fictitious properties. In each set, there is one target property that we vary between participants according to their treatment. In the first treatment, we vary the host race (minority/non-minority) and review quantity (low/high, keeping the quality of reviews fixed) of that property. In the second treatment, we vary the host race and the informativeness of reviews (low/high, keeping the number of reviews fixed) when all reviews are positive. In the third treatment, we vary the host race and the informativeness of reviews (low/high, keeping the number of reviews fixed) when one of the reviews is negative. Participants are randomly assigned to one treatment and see each target-property configuration exactly once, ensuring they cannot compare different versions of the same property. Within their assigned treatment, participants evaluate four different sets of properties, with the target property's characteristics systematically varied across sets.
Hypotheses. For all treatments, a benchmark hypothesis is that properties with minority hosts will receive lower rankings than identical properties with majority hosts. After establishing the existence of a ranking difference due to race, in each treatment we are interested in studying the effect of reviews on this difference. We hypothesise that: (i) controlling for host characteristics and review quality, the quantity of reviews will affect participants' rankings; (ii) controlling for host characteristics and review quantity, the informativeness of reviews will affect participants' rankings. Hypothesis (ii) will be tested separately for treatments 2 and 3, so we can study how informativeness affects the ranking gap in the presence and absence of a negative review. This design allows us to isolate the effects of host race, review quantity, and review quality on property rankings while minimising potential confounds.
Analysis of main effects. We will run the following regression separately for participants in the first and second treatments:
Prob(Rank_{ijt} ≤ k) = Λ(κₖ − β₀ + β₁ Minority_i + β₂ LowReviews_i + β₁₂ (Minority_i × LowReviews_i) + γₚ + δₜ)
where Rank_{ijt} is the ranking (1-6) given to property i by participant j in set t; Minority_i is a dummy variable indicating whether the host is a minority; LowReviews_i is a dummy for low quantity/informativeness of reviews (1 if low, 0 if high); γₚ are participant fixed effects; and δₜ are set fixed effects. This specification tests: (1) whether minority hosts receive lower rankings, H1: β₁ < 0; and (2) whether low quantity/quality of reviews leads to lower rankings, H2: β₂ < 0.
Exploratory analysis. While not a main hypothesis, we implicitly assume that the baseline effect of minority host status (β₁) is consistent across both the review quantity and quality treatments. This additional hypothesis could provide interesting insights into whether discrimination against minority hosts varies depending on the type of information (quantity vs. quality of reviews) being considered. Therefore, we will also test this hypothesis (H4) by comparing the coefficients across the two regressions using a statistical test (such as a Chow test or a z-test for the equality of coefficients from separate regressions). We also aim to investigate whether the main effects tested above (H1, H2) interact with experimental variables. One plausible interaction is that for non-minority hosts there is little or no statistical discrimination to start with, so a higher number/quality of reviews does not change the ranking much, whereas for minority hosts the effect may be stronger. We test this hypothesis separately for each treatment arm, where it is captured by H5: |β₂ + β₁₂| > |β₂|, with β₂ representing the effect of low quantity/informativeness reviews for non-minority hosts and (β₂ + β₁₂) the corresponding effect for minority hosts. In other words, we expect the interaction terms (β₁₂) to be negative and significant, indicating that minority hosts are more heavily penalised for having few or low-quality reviews than non-minority hosts. We will apply Benjamini-Hochberg corrections to the exploratory hypotheses (H4-H5) to control the false discovery rate at α = 0.10.
Robustness Checks. We will assess robustness by re-estimating models without random effects (clustering SEs at the participant level), including set-level random effects, and with different covariance structures. We will also formally test the proportional odds assumption using a Brant test. If it is violated, we will consider partial proportional odds models or multinomial logistic regression as alternatives.
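The cumulative-logit specification in the original registration can be made concrete with a small numerical sketch. This is not the authors' estimation code: the coefficient and cutpoint values below are arbitrary, and the participant and set fixed effects (γₚ, δₜ) are collapsed into scalar offsets for illustration.

```python
# Hedged sketch of P(Rank_ijt <= k) under the registered ordered-logit model:
# Lambda(kappa_k - beta0 + b1*Minority + b2*LowReviews
#        + b12*Minority*LowReviews + gamma_p + delta_t)
import numpy as np

def prob_rank_leq(kappa, beta0, b1, b2, b12, minority, low_reviews,
                  gamma_p=0.0, delta_t=0.0):
    """Cumulative probability of ranking at or above cutpoint(s) kappa."""
    index = (np.asarray(kappa, dtype=float) - beta0
             + b1 * minority + b2 * low_reviews
             + b12 * minority * low_reviews + gamma_p + delta_t)
    return 1.0 / (1.0 + np.exp(-index))  # logistic CDF, Lambda(.)

kappa = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # 5 cutpoints for ranks 1..6
# With b1 < 0 (H1), a minority host lowers P(Rank <= k) at every cutoff k,
# i.e. a smaller chance of landing in the top-k ranks.
base = prob_rank_leq(kappa, beta0=0.0, b1=-0.3, b2=-0.2, b12=-0.1,
                     minority=0, low_reviews=0)
minority_host = prob_rank_leq(kappa, beta0=0.0, b1=-0.3, b2=-0.2, b12=-0.1,
                              minority=1, low_reviews=0)
```

The proportional-odds structure is visible here: the same linear index shifts every cumulative probability, which is exactly the assumption the Brant test in the robustness checks would probe.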
After:
General structure of the experiment. The experiment will run online on Prolific. After obtaining informed consent, participants will first report details about their most recent rental experience. The main task consists of 33 rounds divided into three budget blocks. In each round, participants are presented with two fictitious properties and must choose the one they would prefer to rent. To encourage truthful revelation of preferences and mitigate social desirability bias, participants are incentivised with bonus payments based on how closely their choices align with the modal choice (the option most frequently selected by other participants). The experiment concludes with a post-experimental survey comprising an Implicit Association Test to measure implicit biases and standard demographic questions.
Treatments. Our experimental design consists of three between-participant treatments. In all treatments, we use a within-subjects orthogonal design in which participants evaluate 28 pairs of properties across three primary dimensions.
• Treatment 1 (Quantity): We vary the host race (minority/non-minority), host gender (man/woman), and review quantity (low/high, keeping the quality of reviews fixed).
• Treatment 2 (Positive Informativeness): We vary the host race, host gender, and the informativeness of reviews (low/high, keeping the number of reviews fixed) when all reviews are positive.
• Treatment 3 (Negative Informativeness): We vary the host race, host gender, and the informativeness of reviews (low/high, keeping the number of reviews fixed) when one of the reviews is negative.
Participants are randomly assigned to one of the three treatments. To prevent participants from seeing the exact same property twice while ensuring that all combinations of traits are tested against each other, the 28 experimental rounds are constructed using a predefined "property map". To achieve perfect counterbalancing, participants within each treatment are randomly assigned to one of 56 distinct participant types. This rotation ensures that property attributes are orthogonal to the round number and budget tier, minimising order effects.
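The authors' actual property map and 56-version rotation are predefined and not reproduced here, but the underlying 2x2x2 attribute space and the idea of rotating cell order across participant types can be sketched. The `rotate` helper below is a hypothetical illustration of how a rotation decouples cell identity from round position; it is not the registered assignment scheme.

```python
# Illustrative sketch only: enumerate the 2x2x2 factorial cells and show a
# simple round-robin rotation across participant types. The real "property
# map" and 56 survey versions are predefined by the authors.
from itertools import product

RACE = ("Black", "White")
GENDER = ("Man", "Woman")
REVIEW = ("High", "Low")

cells = list(product(RACE, GENDER, REVIEW))  # 8 attribute combinations

def rotate(seq, k):
    """Participant type k sees the same cells in a shifted order, so
    attribute cells are decoupled from round position (order effects)."""
    k = k % len(seq)
    return seq[k:] + seq[:k]
```

Every rotation contains exactly the same 8 cells, which is the sense in which attributes stay orthogonal to round number under such a scheme.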
Planned Number of Clusters
Before: 5,250 (1,750 per treatment arm). Based on a pilot study, we may adjust the sample size up or down by up to 250 individuals.
After: 1,200 individuals (400 per treatment arm)
Planned Number of Observations
Before: 21,000 (7,000 per treatment arm, 4 from each participant)
After: Because participants make 28 valid experimental choices evaluated in pairs, the effective number of observations is 400 × 28 × 2 = 22,400 per treatment (67,200 total).
Sample size (or number of clusters) by treatment arms
Before: 5,250 (1,750 per treatment arm). Based on a pilot study, we may adjust the sample size up or down by up to 250 individuals.
After: 400 individuals per treatment
Power calculation: Minimum Detectable Effect Size for Main Outcomes
Before: We simulate data following our model specification across a grid of parameter values, including the minority penalty coefficient, review quantity/quality effects, and interaction terms for different sample sizes. For each parameter combination, we generate 1,000 synthetic datasets to estimate statistical power for detecting discrimination effects. With the chosen sample size, we should be able to pick up a minority penalty coefficient of at least 0.1 in magnitude with 80% power. We verified that smaller magnitudes of the minority penalty coefficient would correspond to a standardised effect size that would not be economically meaningful.
After: We base our sample size on the hardest-to-detect main effect identified in our pilot (the demographic penalties, which showed a roughly 3.0% difference in selection rates). We simulate data using the 28-round, long-format structure and the group means/standard deviations from the pilot. By running the full LPM on synthetic datasets across a grid of sample sizes, we determine that N = 400 per treatment arm is required to achieve 90% statistical power to detect a main effect coefficient of 0.032 at α = 0.05.
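A stripped-down version of the power-simulation logic can be sketched as below. This is not the registered simulation code: it treats choices as independent Bernoulli draws and compares selection rates between two groups with a simple two-proportion z-test, ignoring the within-participant clustering that the authors' LPM would account for (so it will be optimistic). The baseline rate, effect size, and sample split are illustrative assumptions.

```python
# Simplified power simulation in the spirit of the registration's approach:
# draw synthetic datasets, test for a ~3.2pp difference in selection rates,
# and report the rejection rate. Parameters are illustrative only.
import numpy as np

def simulated_power(n_per_group, effect=0.032, base=0.5,
                    reps=500, seed=0):
    rng = np.random.default_rng(seed)
    crit = 1.96  # two-sided 5% normal critical value
    hits = 0
    for _ in range(reps):
        a = rng.binomial(1, base, n_per_group).mean()
        b = rng.binomial(1, base + effect, n_per_group).mean()
        p_pool = (a + b) / 2
        se = np.sqrt(2 * p_pool * (1 - p_pool) / n_per_group)
        if abs(b - a) / se > crit:
            hits += 1
    return hits / reps

# 400 participants x 28 choices, split evenly over the two review levels:
power = simulated_power(n_per_group=400 * 28 // 2)
```

Because clustering is ignored, a design-faithful simulation (as described in the registration) would need more participants for the same nominal power than this naive sketch suggests.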