Cheating the Rating Systems in Healthcare Credence Goods Markets

Last registered on May 23, 2022


Trial Information

General Information

Cheating the Rating Systems in Healthcare Credence Goods Markets
Initial registration date
November 15, 2021


First published
November 18, 2021, 12:06 PM EST


Last updated
May 23, 2022, 7:34 AM EDT




Primary Investigator

Technical University of Munich, School of Management

Other Primary Investigator(s)

PI Affiliation
University of Innsbruck
PI Affiliation
PI Affiliation
ETH Zurich
PI Affiliation
ESCP Business School

Additional Trial Information

In development
Start date
End date
Secondary IDs
Prior work
This trial is based on or builds upon one or more prior RCTs.
A key characteristic of health care markets is the information asymmetry between patients and physicians: physicians know more about the disease and the appropriate treatment than patients do. This may result in different forms of physician misbehavior: providing more treatment than necessary, i.e., overtreatment; providing less treatment than necessary, i.e., undertreatment; or charging for more treatment than was provided, i.e., overcharging. Patients have to trust physicians to provide appropriate treatment. This is why health services are often referred to as credence goods (Darby and Karni 1973, Dulleck and Kerschbamer 2006).

Over the past two decades, feedback on rating platforms and the associated reputation building have gained increasing attention in the context of physician-patient interactions. In Germany, for instance, about 70% of physician-rating website users are influenced by the ratings when choosing a physician (Emmert and Meszmer 2018). However, patients often base their ratings on characteristics unrelated to the quality of care (Emmert et al. 2020), which introduces noise into the quality ratings. We capture these recent developments and use a laboratory experiment to investigate the effect of public rating systems on the quality of care.

Based on the credence goods framework established by Dulleck and Kerschbamer (2006) and Dulleck et al. (2011), we introduce a toy model that enables us to derive hypotheses and test them in a laboratory experiment. We plan to run at least four conditions of market interactions, each with 48 undergraduate students in the role of either physicians or patients. In the baseline condition, no reputation building is possible between physicians and patients. In the rating conditions, we introduce the possibility of rating physicians on a scale from zero to five stars. The rating is based on the payoff information that patients receive from their interaction with the physician. In the two or more buy-rating conditions, on top of the ratings provided by patients, we allow physicians to buy up to four additional five-star ratings at the beginning of each playing period. These buy-rating conditions vary in the cost of the additional ratings.

Our design allows us to investigate the robustness of public rating mechanisms to fraud by introducing the possibility to cheat.

Darby, M. R. and E. Karni (1973). "Free Competition and the Optimal Amount of Fraud." Journal of Law & Economics 16(1): 67-88.
Dulleck, U. and R. Kerschbamer (2006). "On Doctors, Mechanics, and Computer Specialists: The Economics of Credence Goods." Journal of Economic Literature 44(1): 5-42.
Dulleck, U., R. Kerschbamer and M. Sutter (2011). "The Economics of Credence Goods: An Experiment on the Role of Liability, Verifiability, Reputation, and Competition." American Economic Review 101(2): 526-555.
Emmert, M., S. Becker, N. Meszmer and U. Sander (2020). "Spiegeln Facebook-Bewertungen Die Versorgungsqualität Und Patientenzufriedenheit Von Krankenhäusern Wider? Eine Querschnittstudie Am Beispiel Der Geburtshilfe in Deutschland." Gesundheitswesen 82(06): 541-547.
Emmert, M. and N. Meszmer (2018). "Eine Dekade Arztbewertungsportale in Deutschland: Eine Zwischenbilanz Zum Aktuellen Entwicklungsstand." Gesundheitswesen 80(10): 851-858.

External Link(s)

Registration Citation

Angerer, Silvia et al. 2022. "Cheating the Rating Systems in Healthcare Credence Goods Markets." AEA RCT Registry. May 23.
Sponsors & Partners

There is information in this trial unavailable to the public.
Experimental Details


We experimentally investigate the impact of cheating a public rating system in healthcare credence goods markets. To this end, we plan to run a laboratory experiment framed in a healthcare context, where experts are called physicians and consumers are called patients, using a student sample from the University of Innsbruck.

To start with, we plan to run at least four experimental conditions. In the baseline condition, there is no feedback mechanism in place. Next, we introduce a public rating mechanism into the market, where patients can rate their interactions with physicians on a five-star rating scale. Provided that the feedback mechanism enhances market outcomes, we plan to run at least two follow-up conditions in which we investigate the robustness of the feedback mechanism to cheating, i.e., we allow physicians to buy up to four additional five-star ratings per period to improve their public rating. The conditions in which physicians can buy additional ratings vary in the cost of additional ratings. We will start with a cost of 1 ECU per additional rating and, depending on its effect on market outcomes, will increase (or decrease) the cost of additional ratings in the following condition(s).

Our design allows us to investigate the robustness of public rating mechanisms to fraud by introducing the possibility to cheat.
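
As a rough illustration of how bought ratings could distort a public rating, the following Python sketch combines patient ratings with purchased five-star ratings for a single period. The registration does not state how the public rating is aggregated or whether ratings carry over between periods; the simple mean used here, and the names public_rating and purchase_cost, are assumptions made only for this example.

```python
MAX_BOUGHT_PER_PERIOD = 4   # from the registration: up to four bought ratings per period
BOUGHT_RATING_STARS = 5     # from the registration: bought ratings are five-star ratings


def public_rating(patient_ratings, bought):
    """Mean of patient ratings plus bought five-star ratings.

    ASSUMPTION: the aggregation rule (a simple mean) is hypothetical;
    the registration does not specify how the public rating is computed.
    """
    if not 0 <= bought <= MAX_BOUGHT_PER_PERIOD:
        raise ValueError("bought must be between 0 and 4")
    all_ratings = list(patient_ratings) + [BOUGHT_RATING_STARS] * bought
    if not all_ratings:
        return None  # no ratings yet, so nothing to display
    return sum(all_ratings) / len(all_ratings)


def purchase_cost(bought, cost_per_rating=1):
    """Cost in ECU; 1 ECU per rating is the starting cost named in the registration."""
    return bought * cost_per_rating


# Hypothetical patient ratings for one physician.
honest = public_rating([2, 3, 1], bought=0)
inflated = public_rating([2, 3, 1], bought=4)
cost = purchase_cost(4)
```

With these made-up numbers, four bought ratings raise the displayed mean from 2.0 to 26/7 (about 3.7) at a cost of 4 ECU.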
Intervention Start Date
Intervention End Date

Primary Outcomes

Primary Outcomes (end points)
Primary Outcomes (explanation)
Overtreatment occurs when the patient needs the mild treatment (q_L) but receives the major treatment (q_H). The overtreatment rate is the number of overtreatment decisions divided by the number of interactions with patients who have a mild health problem.
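
The overtreatment rate defined above can be sketched in Python as follows. The Interaction record and its field names are hypothetical and are not taken from the experiment's software; the example data are invented.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    needed: str    # "mild" or "major": the treatment the patient needs
    provided: str  # "mild" or "major": the treatment the physician provides


def overtreatment_rate(interactions):
    """Overtreatment decisions divided by interactions with mild-problem patients."""
    mild_cases = [i for i in interactions if i.needed == "mild"]
    if not mild_cases:
        return None  # undefined when no patient had a mild problem
    overtreated = sum(1 for i in mild_cases if i.provided == "major")
    return overtreated / len(mild_cases)


# Invented example: four mild-problem patients, two of them overtreated.
example = [
    Interaction("mild", "major"),
    Interaction("mild", "mild"),
    Interaction("major", "major"),
    Interaction("mild", "mild"),
    Interaction("mild", "major"),
]
rate = overtreatment_rate(example)  # 2 / 4 = 0.5
```

Interactions with patients who need the major treatment are excluded from the denominator, as in the definition.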

Secondary Outcomes

Secondary Outcomes (end points)
Market Efficiency as the sum of patient, physician and insurance payoffs.
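
A minimal sketch of this efficiency measure, assuming payoffs are recorded in ECU as one list per party; the function name and the numbers are illustrative only.

```python
def market_efficiency(patient_payoffs, physician_payoffs, insurance_payoffs):
    """Sum of patient, physician, and insurance payoffs (hypothetical ECU values)."""
    return sum(patient_payoffs) + sum(physician_payoffs) + sum(insurance_payoffs)


# Invented payoffs for one market in one period.
efficiency = market_efficiency([5, 3], [4, 6], [-2])  # 8 + 10 - 2 = 16
```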
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
We plan to use a student sample from the University of Innsbruck and run each experimental condition with 48 subjects (as suggested by our power analysis). Accordingly, we plan to run two sessions with 24 subjects each in every experimental condition. All sessions are computerized and run with z-Tree; students are recruited via hroot. When they register, participants do not know which experiment they will take part in. They only receive information about the expected duration of the experiment (1 h 45 min).

Our experiment is structured as follows for all our conditions:
Stage 1: The experimenter explains the experiment and participants read the instructions.
Stage 2: Participants answer several control questions to ensure they understood the game.
Stage 3: The computer randomly assigns roles and markets to participants.
Stage 4: Participants play the game for 16 periods.
Stage 5: Participants complete additional tasks: an individual risk preference task, a dictator game, a lying task, and a trust game.
Stage 6: Participants fill out a questionnaire.
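
Stage 3 could be sketched in Python as below. The actual assignment is done in z-Tree and is not described in detail here. The market size of 8 is taken from the cluster size reported under the sample-size fields, on the assumption that a cluster corresponds to a market. The even split of four physicians and four patients per market, and the function name, are assumptions.

```python
import random


def assign_roles_and_markets(subject_ids, market_size=8,
                             physicians_per_market=4, seed=None):
    """Randomly place subjects into markets and give each a role.

    ASSUMPTIONS: market_size=8 mirrors the reported cluster size;
    the 4/4 split of physicians and patients is hypothetical.
    """
    subjects = list(subject_ids)
    if len(subjects) % market_size != 0:
        raise ValueError("number of subjects must be a multiple of market_size")
    rng = random.Random(seed)
    rng.shuffle(subjects)
    assignment = {}
    for start in range(0, len(subjects), market_size):
        market = start // market_size
        members = subjects[start:start + market_size]
        for position, subject in enumerate(members):
            role = "physician" if position < physicians_per_market else "patient"
            assignment[subject] = (market, role)
    return assignment


# One session with 24 subjects yields three markets of eight.
session = assign_roles_and_markets(range(1, 25), seed=1)
```

Passing a seed makes the draw reproducible, which can help when documenting a session; omitting it gives a fresh random assignment.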
Experimental Design Details
Randomization Method
Randomization is carried out in the experiment by a computer.
Randomization Unit
at the session level
Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters
6 clusters of 8 individuals per experimental condition.
Sample size: planned number of observations
48 (6 x 8) individuals per experimental condition.
Sample size (or number of clusters) by treatment arms
At least 192 (4 x 48) individuals (students at the University of Innsbruck).
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Based on previous findings, we performed a power calculation, which indicates that we need six clusters of 8 subjects per experimental condition when aiming for a power of 80%.
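
The sample-size figures above fit together as shown in this short Python check. All input numbers come from this registration; the variable names are illustrative.

```python
SUBJECTS_PER_CLUSTER = 8      # individuals per cluster
CLUSTERS_PER_CONDITION = 6    # clusters per experimental condition
SESSIONS_PER_CONDITION = 2    # two sessions per condition
MIN_CONDITIONS = 4            # at least four conditions are planned

subjects_per_condition = SUBJECTS_PER_CLUSTER * CLUSTERS_PER_CONDITION
subjects_per_session = subjects_per_condition // SESSIONS_PER_CONDITION
clusters_per_session = subjects_per_session // SUBJECTS_PER_CLUSTER
min_total_subjects = MIN_CONDITIONS * subjects_per_condition

print(subjects_per_condition, subjects_per_session,
      clusters_per_session, min_total_subjects)
```

This gives 48 subjects per condition, 24 per session, three clusters per session, and at least 192 subjects overall.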

Institutional Review Boards (IRBs)

IRB Name
Leopold-Franzens-Universität Innsbruck, Certificate of good standing.
IRB Approval Date
IRB Approval Number


Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.


Is the intervention completed?
Data Collection Complete
Data Publication

Data Publication

Is public data available?

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials