Mirror Mirror on the Wall (Additional data collection)

Last registered on July 08, 2022

Pre-Trial

Trial Information

General Information

Title
Mirror Mirror on the Wall (Additional data collection)
RCT ID
AEARCTR-0009694
Initial registration date
July 04, 2022

Initial registration date is when the trial was registered. It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
July 08, 2022, 9:47 AM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Locations

Region

Primary Investigator

Affiliation
Leibniz Institute for Financial Research SAFE

Other Primary Investigator(s)

Additional Trial Information

Status
Completed
Start date
2022-07-04
End date
2022-07-05
Secondary IDs
Prior work
This trial is based on or builds upon one or more prior RCTs.
Abstract
This project examines the existence and mechanisms of potential side effects that rendering the outcome of algorithmic assessments transparent may entail. Our main objective is to test whether such algorithmic transparency can steer an assessee's behavior in the direction of the assessment, i.e., create a self-fulfilling prophecy. We use an incentivized online experiment to test hypotheses derived from psychological and economic theories. Specifically, participants engage in incentivized investment games, where we vary investors' and recipients' access to an assessment of recipients' likelihood to repay an investment. The assessment originates either from a human expert or from a trained machine learning model.
External Link(s)

Registration Citation

Citation
Bauer, Kevin. 2022. "Mirror Mirror on the Wall (Additional data collection)." AEA RCT Registry. July 08. https://doi.org/10.1257/rct.9694-1.0
Experimental Details

Interventions

Intervention(s)
We vary the disclosure of different types of predictions to the assessed individuals. Overall, there are 4 treatments (plus one baseline from wave 1): (1) predictions made by a machine learning model, privately disclosed to the assessee; (2) predictions made by a machine learning model, publicly disclosed to the assessee and the other person the assessee interacts with; (3) predictions made by human experts, privately disclosed to the assessee; (4) predictions made by human experts, publicly disclosed to the assessee and the other person the assessee interacts with. Notably, the human experts' predictions come from a model of human expert behavior, for scalability.
Intervention Start Date
2022-07-04
Intervention End Date
2022-07-05

Primary Outcomes

Primary Outcomes (end points)
Recipients’ repayment amount, Recipients’ beliefs about investors’ repayment expectations
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
Our experiment comprises two consecutive stages. In stage 1, participants fill out a questionnaire. Their answers serve as the basis of the algorithmic assessment and as control variables. In stage 2, participants engage in a one-shot investment game, a reliable and widely used experimental paradigm that mirrors the fundamental structure of sequential interactions.
Experimental Design Details
Stage 1

In the first stage of the experiment, we use a questionnaire to elicit 15 personal characteristics from participants; 12 of these characteristics serve as the basis for predictive assessments. At this point, we do not inform participants about the purpose of the questionnaire. In this way, we allay concerns that participants give intentionally inaccurate, self-serving answers in order to (perceivably) outsmart or game the system. The questionnaire further contains items that directly measure participants' positive reciprocity, using established measures from the literature. In our analyses, these measures will serve as the ground truth that helps us examine important treatment heterogeneities related to the accuracy of predictive assessments. Specifically, we will consider the actual empirical correlation in our sample between baseline participants' answers to these items and their actual repayment behavior in stage 2. We list the 12 traits in use in the appendix.

Stage 2
Stage 2 comprises a one-shot investment game with the following basic structure. There are two parties: an investor and a recipient. The investor has 10 monetary units (MU) and begins by deciding whether to keep the entire 10 MU or invest it with the recipient. If she keeps the 10 MU, the game ends, leaving her and the recipient with payoffs of 10 MU and 0 MU, respectively. If she decides to invest, the recipient receives triple the amount, i.e., 30 MU. The recipient is free to keep the whole amount without repercussion. Crucially, however, the recipient has the option to repay the investor x ∈ [0, 30] MU, thereby reciprocating the investor's initial trust. The investor's and recipient's payoffs then equal x MU and 30 − x MU, respectively. With this structure, the investment game closely mirrors sequential human transactions that require both trust by the first-moving party (e.g., loan officer, HR manager, supplier) and reciprocity by the second-moving party (e.g., borrower, worker, buyer), especially in incomplete-contract situations. Notably, participants in our main study always play the recipient role and must indicate how many MU they will return to an investor who initially invests 10 MU with them. After making the repayment decision, we ask participants to indicate what they believe the investor expects to be repaid. If their guess does not deviate from the actual belief of the investor by more than 5 units, participants earn 5 MU.
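As a purely illustrative sketch (not the study's actual software), the game's payoff rule and the belief-elicitation bonus described above can be written out in Python as follows; all constant and function names are our own.

ENDOWMENT = 10        # investor's endowment in monetary units (MU)
MULTIPLIER = 3        # an investment is tripled before reaching the recipient
BELIEF_BONUS = 5      # MU earned for an accurate belief guess
BELIEF_TOLERANCE = 5  # maximum allowed deviation (in MU) for the bonus

def game_payoffs(invest: bool, repayment: int) -> tuple[int, int]:
    """Return (investor payoff, recipient payoff) in MU."""
    if not invest:
        return ENDOWMENT, 0            # game ends: 10 MU vs. 0 MU
    pot = ENDOWMENT * MULTIPLIER       # recipient receives 30 MU
    assert 0 <= repayment <= pot       # repayment x must lie in [0, 30]
    return repayment, pot - repayment  # x MU vs. 30 - x MU

def belief_bonus(guess: int, investor_expectation: int) -> int:
    """5 MU if the guess is within 5 MU of the investor's actual expectation."""
    return BELIEF_BONUS if abs(guess - investor_expectation) <= BELIEF_TOLERANCE else 0

print(game_payoffs(invest=True, repayment=12))          # -> (12, 18)
print(belief_bonus(guess=11, investor_expectation=14))  # -> 5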

We introduce our between-subject treatment variation before participants make their decisions. In our baseline condition (NoDisc), participants do not receive any additional information. Participants in our two main treatment conditions, however, observe an actual ML model's predictive assessment of whether or not they are a selfish person who is not expected to repay trust. To provide a reference point and control beliefs, we explain to participants that a reciprocal person, in the game they play, typically repays more than 10 MU, so that a trusting investor is strictly better off than if she had not invested. Before we reveal their assessment, we ask participants to guess the model's prediction accuracy across experimental participants. If their guess does not deviate from the actual accuracy by more than 15 percentage points, participants earn 5 MU.

Participants in the public disclosure treatment (PubDisc) learn the ML model's predictive assessment and know that the investor also sees this prediction before making his decision, though he is not bound to adhere to it. In our private disclosure treatment (PrivDisc), participants are aware of the ML model's predictive assessment of them, but know that we did not reveal it to the investor. We employ this second treatment to disentangle first- and second-order belief effects. The key feature of our design is that we ask treatment participants to make their repayment decisions and state their beliefs at both possible prediction information sets rather than only the one actually reached. Put differently, participants have to indicate their decisions twice: (i) assuming they are predicted to be a reciprocal person (who typically repays more than 10 MU) and (ii) assuming they are predicted not to be a reciprocal person (who typically repays at most 10 MU). Participants only learn their actual assessment afterwards.

We employ this strategy-method elicitation for three reasons. First, it allows us to observe counterfactuals and measure participants' beliefs and behaviors conditional on the assessment; this way, we can examine individual-level heterogeneities conditional on the accuracy of the prediction. Second, we can use predictions generated by a pre-trained ML model instead of a mock-up and examine aggregate-level equilibrium outcomes conditional on different predictive performance levels. Third, it provides more data, because we observe two outcomes per participant.
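To make the strategy-method logic concrete, here is a minimal, hypothetical Python sketch of the data it yields: each recipient states a repayment and a belief for both possible predictions, and only the decision at the realized prediction is payoff-relevant, while the counterfactual branch is retained for analysis. The class and function names are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class ConditionalDecision:
    repayment: int     # MU returned, given the assumed prediction
    belief_guess: int  # guessed investor expectation, given the assumed prediction

@dataclass
class RecipientResponses:
    if_reciprocal: ConditionalDecision      # decision assuming a "reciprocal" prediction
    if_not_reciprocal: ConditionalDecision  # decision assuming a "not reciprocal" prediction

def realized_decision(responses: RecipientResponses,
                      predicted_reciprocal: bool) -> ConditionalDecision:
    """Select the decision at the information set the actual prediction reaches."""
    return responses.if_reciprocal if predicted_reciprocal else responses.if_not_reciprocal

# Two observations per participant; only one is payoff-relevant.
r = RecipientResponses(ConditionalDecision(repayment=15, belief_guess=12),
                       ConditionalDecision(repayment=5, belief_guess=8))
print(realized_decision(r, predicted_reciprocal=True))  # -> ConditionalDecision(repayment=15, belief_guess=12)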

To investigate the role of knowing that it is a machine performing the predictive assessment, we run two control treatments: PrivDisc-H and PubDisc-H. These control treatments perfectly mirror the PrivDisc and PubDisc conditions, respectively, except that recipients learn that it is not a machine learning model making the prediction, but a human expert. This way, we are able to isolate any idiosyncratic effects driven by the computerized nature of algorithmic assessments.

Once participants finish stage 2, the experiment ends with a questionnaire containing several items on trust in the predictive assessment and a manipulation check (in all but the baseline treatment). These variables will serve as additional controls in our regression analyses. On the final screen, we inform participants about the game outcomes in each stage, the actual assessment of themselves, and their earnings.
Randomization Method
We randomize assignment to treatments using a random number generator in Python. Assignment occurs with equal probability, so as to obtain the same number of observations per treatment. Overall, there are 4 conditions (all treatment conditions); we do not collect additional data for the baseline condition because we already have observations for it from the first wave.
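The registry states only that a Python random number generator was used, so the following sketch is an assumption about how such an assignment might look. One way to guarantee exactly equal cell sizes (rather than equality only in expectation, as with independent equal-probability draws) is to shuffle a balanced block of treatment labels:

import random

TREATMENTS = ["PrivDisc", "PubDisc", "PrivDisc-H", "PubDisc-H"]

def assign_balanced(n_participants: int, seed: int = 42) -> list[str]:
    """Shuffle a balanced block of treatment labels so every cell is the same size."""
    assert n_participants % len(TREATMENTS) == 0
    block = TREATMENTS * (n_participants // len(TREATMENTS))
    rng = random.Random(seed)
    rng.shuffle(block)
    return block

assignments = assign_balanced(200)                    # ~50 per treatment arm
print({t: assignments.count(t) for t in TREATMENTS})  # -> 50 each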
Randomization Unit
Participants are assigned randomly on the individual level and remain in one treatment over the course of the entire experiment.
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
1
Sample size: planned number of observations
About 200 additional observations.
Sample size (or number of clusters) by treatment arms
About 50 observations per treatment arm.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB

Institutional Review Boards (IRBs)

IRB Name
Gemeinsame Ethikkommission Wirtschaftswissenschaften der Goethe-Universität Frankfurt und der Johannes Gutenberg-Universität Mainz
IRB Approval Date
2022-04-22
IRB Approval Number
N/A

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials