The Limits of Rating Systems in Healthcare Credence Goods Markets

Last registered on May 23, 2022

Pre-Trial

Trial Information

General Information

Title
The Limits of Rating Systems in Healthcare Credence Goods Markets
RCT ID
AEARCTR-0008572
Initial registration date
November 15, 2021


First published
November 18, 2021, 11:58 AM EST


Last updated
May 23, 2022, 7:39 AM EDT


Locations

Region

Primary Investigator

Affiliation
Technical University of Munich, School of Management

Other Primary Investigator(s)

PI Affiliation
University of Innsbruck
PI Affiliation
UMIT Tirol
PI Affiliation
ETH Zurich
PI Affiliation
ESCP Business School

Additional Trial Information

Status
In development
Start date
2022-05-24
End date
2022-12-31
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
A key characteristic of health care markets is the information asymmetry between patients and physicians: physicians know more about the disease and the appropriate treatment than patients do. This asymmetry can give rise to three forms of physician misbehavior: providing more treatment than necessary (overtreatment), providing less treatment than necessary (undertreatment), or charging for more treatment than was provided (overcharging). Patients therefore have to trust that their physician provides appropriate treatment. This is why health services are often referred to as credence goods (Darby and Karni 1973, Dulleck and Kerschbamer 2006).

The provision of feedback on rating platforms, and the reputation building associated with it, has received growing attention over the past two decades in the context of physician-patient interactions. In Germany, for instance, about 70% of physician-rating website users are influenced by the ratings when choosing a physician (Emmert and Meszmer 2018). However, patients often base their ratings on characteristics unrelated to the quality of care (Emmert et al. 2020), which introduces noise into the quality ratings. We capture these developments and use a laboratory experiment to investigate how effectively public rating systems improve the quality of care.

Based on the credence goods framework established by Dulleck and Kerschbamer (2006) and Dulleck et al. (2011), we introduce a toy model that enables us to derive hypotheses and test them in a laboratory experiment. We plan to run at least four conditions of market interactions, each with 48 undergraduate students in the role of either physician or patient. In the baseline condition, no reputation building is possible between physicians and patients. In the rating conditions, patients can rate physicians on a scale from zero to five stars. Ratings are based on the payoff information that patients receive from their interaction with the physician. In the (at least two) random-rating conditions, we add noise to the average rating that is publicly visible to all market participants: for each rating provided by a patient, additional random ratings between zero and five stars enter the average.

Our design allows us to investigate the effect of a public rating mechanism on outcomes in healthcare credence goods markets. Furthermore, it enables us to explore the robustness of public rating mechanisms to noise by introducing additional random ratings.

References
Darby, M. R. and E. Karni (1973). "Free Competition and the Optimal Amount of Fraud." Journal of Law & Economics 16(1): 67-88.
Dulleck, U. and R. Kerschbamer (2006). "On Doctors, Mechanics, and Computer Specialists: The Economics of Credence Goods." Journal of Economic Literature 44(1): 5-42. DOI: https://doi.org/10.1257/002205106776162717.
Dulleck, U., R. Kerschbamer and M. Sutter (2011). "The Economics of Credence Goods: An Experiment on the Role of Liability, Verifiability, Reputation, and Competition." American Economic Review 101(2): 526-555. DOI: https://doi.org/10.1257/aer.101.2.526.
Emmert, M., S. Becker, N. Meszmer and U. Sander (2020). "Spiegeln Facebook-Bewertungen Die Versorgungsqualität Und Patientenzufriedenheit Von Krankenhäusern Wider? Eine Querschnittstudie Am Beispiel Der Geburtshilfe in Deutschland." Gesundheitswesen 82(06): 541-547. DOI: https://doi.org/10.1055/a-0774-7874.
Emmert, M. and N. Meszmer (2018). "Eine Dekade Arztbewertungsportale in Deutschland: Eine Zwischenbilanz Zum Aktuellen Entwicklungsstand." Gesundheitswesen 80(10): 851-858. DOI: https://doi.org/10.1055/s-0043-114002.

External Link(s)

Registration Citation

Citation
Angerer, Silvia et al. 2022. "The Limits of Rating Systems in Healthcare Credence Goods Markets." AEA RCT Registry. May 23. https://doi.org/10.1257/rct.8572-2.0
Sponsors & Partners

Experimental Details

Interventions

Intervention(s)
We experimentally investigate the effect and limits of a public rating system in healthcare credence goods markets. To this end, we plan to run a laboratory experiment framed in a healthcare context, in which experts are called physicians and consumers are called patients, using a student sample from the University of Innsbruck.

We plan to run at least four experimental conditions. In the baseline condition, there is no feedback mechanism in place. Next, we introduce a public rating mechanism into the market, where patients can rate their interactions with physicians on a scale from zero to five stars. Provided that the feedback mechanism improves market outcomes, we plan to run at least two follow-up conditions in which we introduce noise into the feedback mechanism. We implement noise by giving physicians random ratings (from zero to five stars) on top of each patient rating. The conditions with noise differ in the number of additional ratings. We will start with one random rating for each patient rating and, depending on its effect on market outcomes, will increase or decrease the amount of noise (i.e. the number of random ratings) in the following condition(s).

Our design allows us to investigate the robustness of public rating mechanisms to noise by introducing additional random ratings.
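
The noise mechanism described above can be illustrated with a minimal sketch. It is not the z-Tree implementation used in the experiment. The function names are hypothetical, and three points are assumptions not stated in the registration: random ratings are whole stars, they are drawn uniformly, and the public rating is a simple mean.

```python
import random
from statistics import mean

def displayed_ratings(patient_ratings, noise_per_rating, rng):
    """Return all ratings entering the public average.

    For every patient rating, `noise_per_rating` random ratings are added.
    Assumption: random ratings are whole stars drawn uniformly from 0 to 5.
    """
    shown = []
    for rating in patient_ratings:
        shown.append(rating)
        shown.extend(rng.randint(0, 5) for _ in range(noise_per_rating))
    return shown

def public_average(patient_ratings, noise_per_rating, rng):
    """Mean of patient and random ratings (assumed aggregation rule)."""
    shown = displayed_ratings(patient_ratings, noise_per_rating, rng)
    return mean(shown) if shown else None

rng = random.Random(2022)          # fixed seed only for reproducibility
ratings = [5, 4, 5, 3]             # hypothetical patient ratings
print(public_average(ratings, 0, rng))  # rating condition: no noise
print(public_average(ratings, 1, rng))  # one random rating per patient rating
```

With `noise_per_rating=1`, half of the ratings behind the public average carry no information about the physician, which is the first noise level the design proposes to test.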
Intervention Start Date
2022-05-24
Intervention End Date
2022-07-15

Primary Outcomes

Primary Outcomes (end points)
Overtreatment rates
Primary Outcomes (explanation)
Overtreatment occurs when the patient needs the mild treatment (q_L) but receives the major treatment (q_H). The overtreatment rate is the number of actual overtreatment decisions divided by the number of interactions with patients with a mild health problem.
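
As a worked illustration of this definition, the following sketch computes the overtreatment rate from a list of interactions. The data layout (pairs of needed and provided treatment, labeled "mild" and "major") and the function name are assumptions made for the example, not the structure of the experimental data.

```python
def overtreatment_rate(interactions):
    """Share of mild-problem interactions in which the major treatment was provided.

    `interactions` is an iterable of (needed, provided) pairs, each value
    being "mild" or "major" (hypothetical labels for this example).
    Returns None if no patient had a mild problem.
    """
    mild_cases = [(needed, provided) for needed, provided in interactions
                  if needed == "mild"]
    if not mild_cases:
        return None
    overtreated = sum(1 for _, provided in mild_cases if provided == "major")
    return overtreated / len(mild_cases)

# Hypothetical interactions: three mild problems, two of them overtreated.
sample = [
    ("mild", "major"),
    ("mild", "mild"),
    ("major", "major"),
    ("mild", "major"),
    ("major", "mild"),   # undertreatment; not counted in this rate
]
print(overtreatment_rate(sample))
```

Interactions with patients who need the major treatment are excluded from the denominator, so undertreatment does not affect the rate.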

Secondary Outcomes

Secondary Outcomes (end points)
Market Efficiency as the sum of patient, physician and insurance payoffs.
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
We plan to use a student sample from the University of Innsbruck and run each experimental condition with 48 subjects (as suggested by our power analysis). We therefore plan to run two sessions with 24 subjects each in every experimental condition. All sessions are computerized using z-Tree, and students are recruited via hroot. When they register, participants do not know which experiment they will take part in. They only receive information about the expected duration of the experiment (1 hour 45 minutes).

Our experiment is structured as follows for all our conditions:
Stage 1: The experimenter explains the experiment and participants read the instructions.
Stage 2: Participants answer several control questions to ensure they understood the game.
Stage 3: The computer randomly assigns roles and markets to participants.
Stage 4: Participants play the game for 16 periods.
Stage 5: Participants complete additional tasks: an individual risk preference task, a dictator game, a lying task, and a trust game.
Stage 6: Participants fill out a questionnaire.
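
The random assignment in Stage 3 could look roughly like the sketch below. It assumes that each session of 24 participants is split into three markets of eight, which follows from the planned cluster size. The registration does not state how many physicians and patients form a market, so the 4/4 split is purely hypothetical, as are the function and parameter names.

```python
import random

def assign_session(subject_ids, market_size=8, physicians_per_market=4, seed=None):
    """Randomly assign subjects to markets and roles.

    Assumptions (not from the registration): `physicians_per_market`
    physicians per market, the remaining members are patients.
    Returns a dict mapping subject id -> (market number, role).
    """
    ids = list(subject_ids)
    if len(ids) % market_size != 0:
        raise ValueError("number of subjects must be a multiple of market_size")
    rng = random.Random(seed)
    rng.shuffle(ids)  # random order determines both market and role
    assignment = {}
    for start in range(0, len(ids), market_size):
        market = start // market_size + 1
        for position, subject in enumerate(ids[start:start + market_size]):
            role = "physician" if position < physicians_per_market else "patient"
            assignment[subject] = (market, role)
    return assignment

session = assign_session(range(1, 25), seed=1)
for subject in sorted(session)[:5]:
    print(subject, session[subject])
```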
Experimental Design Details
Randomization Method
Randomization is carried out in the experiment by a computer.
Randomization Unit
at the session level
Was the treatment clustered?
Yes

Experiment Characteristics

Sample size: planned number of clusters
6 clusters of 8 individuals per experimental condition.
Sample size: planned number of observations
48 (6 x 8) individuals per experimental condition.
Sample size (or number of clusters) by treatment arms
at least 192 (4 x 48) individuals (students at the University of Innsbruck).
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Based on previous findings, we performed a power calculation, which indicates that we need six clusters of eight subjects per experimental condition when aiming for a power of 80%.
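
The sample-size figures in this section fit together as follows. The short check below only reproduces the arithmetic stated in the registration; the variable names are chosen for the example, and the power calculation itself is not reproduced because its inputs are not reported here.

```python
CLUSTER_SIZE = 8             # subjects per cluster (market)
CLUSTERS_PER_CONDITION = 6   # clusters per experimental condition
SESSION_SIZE = 24            # subjects per session
MIN_CONDITIONS = 4           # "at least four" experimental conditions

subjects_per_condition = CLUSTERS_PER_CONDITION * CLUSTER_SIZE
sessions_per_condition = subjects_per_condition // SESSION_SIZE
clusters_per_session = SESSION_SIZE // CLUSTER_SIZE
min_total_subjects = MIN_CONDITIONS * subjects_per_condition

print(subjects_per_condition)   # 48
print(sessions_per_condition)   # 2
print(clusters_per_session)     # 3
print(min_total_subjects)       # 192
```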
IRB

Institutional Review Boards (IRBs)

IRB Name
Leopold-Franzens-Universität Innsbruck, Certificate of good standing
IRB Approval Date
2017-10-18
IRB Approval Number
40/2017

Post-Trial

Post Trial Information

Study Withdrawal


Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials