The Limits of Rating Systems in Healthcare Credence Goods Markets

Last registered on November 18, 2021

Pre-Trial

Trial Information

General Information

Title
The Limits of Rating Systems in Healthcare Credence Goods Markets
RCT ID
AEARCTR-0008572
Initial registration date
November 15, 2021
Last updated
November 18, 2021, 11:58 AM EST

Locations

There are documents in this trial unavailable to the public.

Primary Investigator

Affiliation
University of Innsbruck

Other Primary Investigator(s)

PI Affiliation
University of Innsbruck
PI Affiliation
UMIT Tirol
PI Affiliation
ETH Zurich

Additional Trial Information

Status
In development
Start date
2021-11-16
End date
2022-12-31
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
A key characteristic of healthcare markets is the information asymmetry between patients and physicians: physicians know more than patients about the disease and its appropriate treatment. This may result in different forms of physician misbehavior: providing more treatment than necessary (overtreatment), providing less treatment than necessary (undertreatment), or charging for more treatment than was provided (overcharging). Patients have to trust physicians to provide appropriate treatment, which is why health services are often referred to as credence goods (Darby and Karni 1973, Dulleck and Kerschbamer 2006).

Over the past two decades, feedback on rating platforms and the associated reputation building have gained increasing attention in the context of physician-patient interactions. In Germany, for instance, about 70% of physician-rating website users are influenced by the ratings in their choice of physician (Emmert and Meszmer 2018). However, patients often base their ratings on characteristics unrelated to the quality of care (Emmert et al. 2020), which introduces noise into the quality ratings. We take up these developments and use a laboratory experiment to investigate the effect of public rating systems on the quality of care.

Based on the credence goods framework established by Dulleck and Kerschbamer (2006) and Dulleck et al. (2011), we introduce a toy model from which we derive hypotheses that we test in a laboratory experiment. In total, we plan three market conditions with 144 undergraduate students, each in the role of either physician or patient. In the baseline condition (B), no reputation building between physicians and patients is possible. In the rating condition (R), patients can rate physicians on a scale from zero to five stars, based on the payoff information they receive from their interaction with the physician. In the random rating condition (R-Random), we additionally add noise to the average rating, which is publicly visible to all market participants: for each rating provided by a patient, an additional random rating between zero and five stars is introduced.
Our design allows us to investigate the effect of a public rating mechanism on outcomes in healthcare credence goods markets. Furthermore, it enables us to explore the robustness of public rating mechanisms to noise by introducing additional random ratings.
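The difference between the publicly displayed rating in conditions R and R-Random can be illustrated with a short Python sketch. The helper name displayed_rating is hypothetical, and the sketch assumes that ratings are whole stars and that each random rating is drawn uniformly from 0 to 5 stars; the registration does not specify the distribution, and the actual experiment is implemented in z-Tree.

```python
import random
from statistics import mean


def displayed_rating(patient_ratings, add_noise=False, rng=None):
    """Public average star rating of one physician (hypothetical helper).

    patient_ratings: integer ratings from 0 to 5 stars given by patients.
    add_noise: True mimics the R-Random condition, where one random rating
        is added for every patient rating. Assumption: each random rating is
        an integer drawn uniformly from 0 to 5 stars.
    Returns None while the physician has not been rated yet.
    """
    rng = rng or random.Random()
    ratings = list(patient_ratings)
    if add_noise:
        # One extra random rating per patient rating (length taken before extending).
        ratings += [rng.randint(0, 5) for _ in range(len(ratings))]
    return mean(ratings) if ratings else None
```

For example, displayed_rating([5, 4, 5]) returns the plain patient average for condition R, while displayed_rating([5, 4, 5], add_noise=True) averages the same ratings together with three random ones. Under the uniform assumption a random rating has an expected value of 2.5 stars, so with many ratings the displayed average in R-Random tends toward the midpoint between the patients' average and 2.5 stars, which compresses differences between physicians.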

References
Darby, M. R. and E. Karni (1973). "Free Competition and the Optimal Amount of Fraud." Journal of Law & Economics 16(1): 67-88.
Dulleck, U. and R. Kerschbamer (2006). "On Doctors, Mechanics, and Computer Specialists: The Economics of Credence Goods." Journal of Economic Literature 44(1): 5-42. DOI: https://doi.org/10.1257/002205106776162717.
Dulleck, U., R. Kerschbamer and M. Sutter (2011). "The Economics of Credence Goods: An Experiment on the Role of Liability, Verifiability, Reputation, and Competition." American Economic Review 101(2): 526-555. DOI: https://doi.org/10.1257/aer.101.2.526.
Emmert, M., S. Becker, N. Meszmer and U. Sander (2020). "Spiegeln Facebook-Bewertungen die Versorgungsqualität und Patientenzufriedenheit von Krankenhäusern wider? Eine Querschnittstudie am Beispiel der Geburtshilfe in Deutschland" [Do Facebook ratings reflect the quality of care and patient satisfaction of hospitals? A cross-sectional study of obstetrics in Germany]. Gesundheitswesen 82(06): 541-547. DOI: https://doi.org/10.1055/a-0774-7874.
Emmert, M. and N. Meszmer (2018). "Eine Dekade Arztbewertungsportale in Deutschland: Eine Zwischenbilanz zum aktuellen Entwicklungsstand" [A decade of physician-rating websites in Germany: an interim review of the current state of development]. Gesundheitswesen 80(10): 851-858. DOI: https://doi.org/10.1055/s-0043-114002.

External Link(s)

Registration Citation

Citation
Angerer, Silvia et al. 2021. "The Limits of Rating Systems in Healthcare Credence Goods Markets." AEA RCT Registry. November 18. https://doi.org/10.1257/rct.8572-1.0
Sponsors & Partners

Experimental Details

Interventions

Intervention(s)
We experimentally investigate the effect and limits of a public rating system in healthcare credence goods markets. To this end, we plan to run a laboratory experiment framed in a healthcare context, in which experts are called physicians and consumers are called patients, using a student sample from the University of Innsbruck.

To start with, we plan to run three experimental conditions. In the baseline condition, no feedback mechanism is in place. Next, we introduce a public rating mechanism into the market, in which patients can rate their interactions with physicians on a five-star rating scale. If the feedback mechanism improves market outcomes, we plan to run a follow-up treatment that introduces noise into the feedback mechanism: physicians receive one random rating (from zero to five stars) on top of each patient rating.

Our design allows us to investigate the effect of a public rating mechanism on outcomes in healthcare credence goods markets. Furthermore, it enables us to explore the robustness of public rating mechanisms to noise by introducing additional random ratings.
Intervention Start Date
2021-11-17
Intervention End Date
2022-01-31

Primary Outcomes

Primary Outcomes (end points)
Overtreatment rates
Primary Outcomes (explanation)
Overtreatment occurs when a patient needs the mild treatment ($q_L$) but receives the major treatment ($q_H$). The overtreatment rate is the number of overtreatment decisions divided by the number of interactions with patients who have a mild health problem.
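The overtreatment rate defined above can be computed from interaction-level data as in the following Python sketch. The record layout (the Interaction class and its two fields) is a hypothetical assumption for illustration, not the format of the experimental data.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One physician-patient interaction (hypothetical record layout)."""
    mild_problem: bool      # True if the patient needs only the mild treatment
    major_treatment: bool   # True if the physician provided the major treatment


def overtreatment_rate(interactions):
    """Overtreatment decisions divided by interactions with mild-problem patients."""
    mild = [i for i in interactions if i.mild_problem]
    if not mild:
        return None  # the rate is undefined without mild-problem patients
    return sum(i.major_treatment for i in mild) / len(mild)
```

For instance, with three mild-problem interactions, two of which receive the major treatment, plus one severe-problem interaction, the rate is 2/3: the severe-problem interaction is excluded from the denominator because overtreatment is only possible when the mild treatment suffices.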

Secondary Outcomes

Secondary Outcomes (end points)
Market efficiency, measured as the sum of patient, physician, and insurance payoffs.
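As a minimal sketch, market efficiency can be aggregated over interactions as follows. The function name and the dictionary keys 'patient', 'physician', and 'insurance' are hypothetical; how payoffs are recorded in the actual z-Tree data is not specified here.

```python
def market_efficiency(interactions):
    """Sum of patient, physician and insurance payoffs (hypothetical helper).

    Each interaction is a dict with the hypothetical keys 'patient',
    'physician' and 'insurance', each holding that party's payoff.
    """
    return sum(i["patient"] + i["physician"] + i["insurance"] for i in interactions)
```

Summing all three parties' payoffs means that pure transfers between them, such as a higher price paid by the insurance to the physician, leave efficiency unchanged, whereas costly unnecessary treatment lowers it.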
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
We plan to use a student sample from the University of Innsbruck and to run each experimental condition with 48 subjects, as suggested by our power analysis. We therefore plan to run two sessions of 24 subjects each per experimental condition. All sessions are computerized using z-Tree, and participants are recruited using hroot. When they register, participants do not know which experiment they will take part in; they are only informed of its expected duration (2 hours).
Our experiment is structured as follows for all our conditions:

Stage 1: The experimenter explains the experiment and participants read the instructions.
Stage 2: Participants answer several control questions to ensure they understood the game.
Stage 3: The computer randomly assigns roles and markets to participants.
Stage 4: Participants play the game for 16 periods.
Stage 5: Participants take part in additional tasks: an individual risk preference task, a dictator game, a lying task, and a trust game.
Stage 6: Participants fill out a questionnaire.
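The random assignment in Stage 3 can be sketched in Python as below. The sketch assumes markets of 8 subjects (matching the clusters of 8 in the sample-size plan) and an equal split of 4 physicians and 4 patients per market; the role split is not stated in this registration, and the function name is hypothetical.

```python
import random


def assign_roles_and_markets(subject_ids, market_size=8, rng=None):
    """Randomly split a session into markets and assign roles within each market.

    Assumptions (not stated in the registration): markets of `market_size`
    subjects, with the first half of each market as physicians and the
    second half as patients after shuffling.
    Returns a dict mapping subject id -> (market index, role).
    """
    ids = list(subject_ids)
    if len(ids) % market_size != 0:
        raise ValueError("session size must be a multiple of market_size")
    rng = rng or random.Random()
    rng.shuffle(ids)  # random order determines both market and role
    assignment = {}
    half = market_size // 2
    for m in range(len(ids) // market_size):
        market = ids[m * market_size:(m + 1) * market_size]
        for k, sid in enumerate(market):
            assignment[sid] = (m, "physician" if k < half else "patient")
    return assignment
```

With a session of 24 subjects, assign_roles_and_markets(range(24)) yields three markets, each containing four physicians and four patients under these assumptions.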
Experimental Design Details
Not available
Randomization Method
Randomization is carried out in the experiment by a computer.
Randomization Unit
at the session level
Was the treatment clustered?
Yes

Experiment Characteristics

Sample size: planned number of clusters
6 clusters of 8 individuals each per experimental condition.
Sample size: planned number of observations
48 (6 x 8) individuals per experimental condition.
Sample size (or number of clusters) by treatment arms
144 (3 x 48) individuals (students at the University of Innsbruck).
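The planned sample figures in this registration (6 clusters of 8 subjects per condition, two sessions of 24 subjects per condition, three conditions) can be cross-checked with a few lines of arithmetic. The constant names are illustrative, and treating each cluster as a market of 8 subjects within a session is an assumption.

```python
# Figures taken from the registration; constant names are illustrative.
CONDITIONS = ("B", "R", "R-Random")
CLUSTERS_PER_CONDITION = 6
SUBJECTS_PER_CLUSTER = 8
SESSIONS_PER_CONDITION = 2
SUBJECTS_PER_SESSION = 24

per_condition = CLUSTERS_PER_CONDITION * SUBJECTS_PER_CLUSTER
# The cluster view and the session view must give the same number of subjects.
assert per_condition == SESSIONS_PER_CONDITION * SUBJECTS_PER_SESSION
# Assumption: each session is split into clusters (markets) of 8 subjects.
clusters_per_session = SUBJECTS_PER_SESSION // SUBJECTS_PER_CLUSTER
total = len(CONDITIONS) * per_condition
print(per_condition, clusters_per_session, total)  # prints: 48 3 144
```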
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Based on previous findings, we performed a power calculation indicating that we need six clusters of 8 subjects each per experimental condition to achieve a power of 80%.
IRB

Institutional Review Boards (IRBs)

IRB Name
Leopold-Franzens-Universität Innsbruck, Certificate of good standing
IRB Approval Date
2017-10-18
IRB Approval Number
40/2017