The role of social identity in algorithmic transparency

Last registered on September 02, 2022

Pre-Trial

Trial Information

General Information

Title
The role of social identity in algorithmic transparency
RCT ID
AEARCTR-0010001
Initial registration date
August 31, 2022

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
September 02, 2022, 4:09 PM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Locations

Region

Primary Investigator

Affiliation
Leibniz Institute for Financial Research SAFE

Other Primary Investigator(s)

PI Affiliation
Goethe University
PI Affiliation
Goethe University
PI Affiliation
Goethe University

Additional Trial Information

Status
In development
Start date
2022-09-01
End date
2022-09-08
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
Recent literature explores the interaction between AI predictions and users’ decision-making. However, little is known about how the AI developer’s identity influences users’ acceptance or rejection of algorithmically generated predictions. This study aims to fill this gap. Drawing on social identity theory, we analyze how disclosing the developer’s identity to users affects their demand for and processing of algorithmic advice. We developed a novel experimental design in which we disclose the developer’s identity to participants in an online experiment with approximately 800 participants. The two popular constructs of willingness to pay (WTP) and weight on advice (WOA) will operationalize information demand and processing, respectively. Moreover, we aim to disentangle the effect attributable to expected accuracy from social identity effects. The findings of this study contribute to the literature on advice taking, algorithm aversion, and social identity, and may provide practical guidance to organizations for defining strategies that aid the successful adoption of, and value creation from, algorithmic applications.
External Link(s)

Registration Citation

Citation
Bauer, Kevin et al. 2022. "The role of social identity in algorithmic transparency." AEA RCT Registry. September 02. https://doi.org/10.1257/rct.10001-1.0
Experimental Details

Interventions

Intervention(s)
Our experiment comprises three consecutive stages (see Figure 1 for an overview). In stage 1, participants fill out a questionnaire containing items on their personal characteristics. Stages 2 and 3 respectively measure participants’ demand for and processing of AI advice in the context of two distinct guessing tasks. As our main treatment variation, we disclose the algorithm developer’s identity to participants before they engage with the advice. In additional control treatments, we further vary whether participants also learn about the AI’s prediction accuracy. Overall, we employ four experimental conditions.
Intervention Start Date
2022-09-01
Intervention End Date
2022-09-08

Primary Outcomes

Primary Outcomes (end points)
We consider two primary outcome measures. First, we look at the money participants spend to adjust the probability of receiving the AI prediction of the apartment price or of the probability of default (information demand).
Second, we look at the difference between the initial and the revised prediction, operationalized through regression analyses or the weight on advice (WOA) (information processing).
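
For reference, WOA is conventionally computed as the share of the distance between the initial guess and the advice that the revised guess covers. The registration does not spell out the exact formula, so the following is a minimal sketch of the standard operationalization (the function name and the truncation rule are our assumptions):

```python
# A minimal sketch (our assumption, not the registered formula) of the
# conventional weight-on-advice measure:
#   WOA = (revised guess - initial guess) / (advice - initial guess),
# usually truncated to [0, 1] and undefined when the advice equals the prior.

def weight_on_advice(prior, posterior, advice):
    """Share of the distance towards the advice that the participant moved."""
    if advice == prior:
        return None  # undefined: the advice coincides with the initial guess
    woa = (posterior - prior) / (advice - prior)
    return min(max(woa, 0.0), 1.0)  # truncate to the unit interval

# Example: initial guess 40%, AI prediction 70%, revised guess 60% -> WOA = 2/3
print(weight_on_advice(40, 60, 70))
```
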
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Cognitive trust in competence: Trust in the AI’s competence after the prediction recommendation.
Cognitive trust in integrity: Trust in the AI’s integrity after the prediction recommendation.
Emotional trust: Sense of security with the AI system’s prediction recommendation.
Transparency: Comprehensibility of how the AI system arrived at its prediction recommendation.
Anthropomorphism: The attribution of human characteristics or behaviour to the AI system.
Social distance: Perceived social distance to the developer.
Perceived accuracy: Participants’ expected performance of the AI system.
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
Our experiment comprises three consecutive stages. In stage 1, participants fill out a questionnaire containing items on their personal characteristics. Stages 2 and 3 respectively measure participants’ demand for and processing of AI advice in the context of two distinct guessing tasks. As our main treatment variation, we disclose the algorithm developer’s identity to participants before they engage with the advice. In additional control treatments, we further vary whether participants also learn about the AI’s prediction accuracy. Overall, we employ four experimental conditions.

In stage 1, we elicit participants’ demographics, risk aversion, delayed gratification, norm obedience, and disposition to engage with novel technological applications using a questionnaire. We use these measures as control variables in our analyses. Importantly, among the demographic items, we elicit participants’ gender, immigration background, and political orientation and ask them to indicate on a 5-point scale how central each of these attributes is to their identity. We do so because these are the three characteristics of a developer of an algorithmic decision support system that participants eventually observe. By eliciting these characteristics and their importance for participants’ social identity, we can compute a weighted measure of participants’ perceived affinity and closeness to the developer, i.e., the social distance.
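
The exact construction of this weighted social-distance measure is not specified in the registration. The sketch below is therefore a purely hypothetical illustration of how per-attribute (dis)similarity between participant and developer could be weighted by the participant’s identity-centrality ratings; the attribute names, coding, and weighting scheme are our assumptions.

```python
# Hypothetical illustration only: the registration does not spell out the formula.
# Idea: weight each attribute's dissimilarity between participant and developer
# by how central the participant says that attribute is to their identity.

def social_distance(dissimilarity, centrality):
    """dissimilarity: per-attribute distance in [0, 1] (e.g., 0 = same gender,
    1 = different; a scaled gap for political orientation).
    centrality: the participant's 1-5 identity-centrality rating per attribute.
    Returns a centrality-weighted average dissimilarity in [0, 1]."""
    total_weight = sum(centrality.values())
    return sum(dissimilarity[a] * centrality[a] for a in dissimilarity) / total_weight

# Example: same gender, different immigration background, moderate political gap
print(social_distance(
    {"gender": 0.0, "immigration": 1.0, "politics": 0.5},
    {"gender": 2, "immigration": 5, "politics": 3},
))
```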

Prior guess elicitation. At the beginning of stage 2, we ask participants to guess the probability of a borrower defaulting on credit. The borrower is a real person from a publicly available, historical credit default dataset from Lending Club. We randomize the borrower at the individual level. Participants can enter a guess on a scale from 0 to 100 percent in steps of 10, i.e., there are eleven possible answers. We also ask them to indicate their confidence in their guess on a 5-point scale. Participants earn a 5€ bonus if their guess does not deviate from the borrower’s actual credit default assessment recorded in the data. To make an informed guess, participants see ten borrower characteristics. We selected these ten characteristics based on the criteria that they are (i) relevant to the credit default assessment and (ii) sufficiently familiar/accessible to people. This initial guess serves as a prior.

Demand for advice. After we elicit participants’ prior guesses, we inform them that they will have the chance to update their initial guess for the given borrower. To possibly improve their guess, participants might observe the prediction of a previously trained machine learning model. As a default, participants observe the prediction with 50% probability. We then ask them to allocate, via a slider, how much of a 0.30€ endowment they will spend to adjust the probability of observing the prediction, where each 10-percentage-point adjustment costs 0.10€. Participants retain any unallocated portion of the endowment. If they spend their entire endowment, participants can increase the probability of observing the prediction to at most 80% or decrease it to at least 20%. Because participants cannot ensure that they definitely do not obtain the prediction, we may observe, at least for a subset of participants, how they behave if they obtain the prediction despite having revealed a preference against it. Notably, before participants make their slider decision, we ask them to indicate their belief about the machine learning model’s prediction accuracy, which serves as a control measure in our analyses. The share of the endowment a participant spends to increase her probability of observing the machine learning prediction reflects her demand for algorithmic advice.
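
To make the slider mechanics concrete, the following sketch maps spending to the probability of observing the prediction under the parameters described above (the rounding of spending to whole 10-point steps and the function name are our assumptions; the experiment itself implements this via an oTree slider):

```python
# Sketch based on the mechanics described above: a 0.30 euro endowment, a 50%
# default probability, 0.10 euro per 10-percentage-point adjustment, bounds of
# 20% and 80%. Parameter names are ours, not taken from the experiment code.

ENDOWMENT = 0.30      # euros available to the participant
STEP_PRICE = 0.10     # euros per 10-percentage-point adjustment
DEFAULT_PROB = 0.50   # probability of seeing the prediction if nothing is spent

def observation_probability(spend, direction):
    """direction = +1 to increase, -1 to decrease the chance of seeing the AI prediction."""
    assert 0.0 <= spend <= ENDOWMENT and direction in (+1, -1)
    steps = round(spend / STEP_PRICE)              # whole 10-point adjustments purchased
    prob = DEFAULT_PROB + direction * 0.10 * steps
    return round(min(max(prob, 0.20), 0.80), 2)

print(observation_probability(0.30, +1))  # full endowment spent to increase -> 0.8
print(observation_probability(0.30, -1))  # full endowment spent to decrease -> 0.2
print(observation_probability(0.10, +1))  # 0.10 euro spent -> 0.6, 0.20 euro kept
```
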
Posterior guess elicitation. We then randomly determine, according to the chosen probability, whether participants see the prediction. Participants can then update their initial guess, observing the same ten borrower characteristics, their initial guess, and possibly the prediction of the machine learning model. We inform participants that we will use their final guess to determine whether they earn a 5€ bonus. Participants again have to indicate their level of confidence in their guess. Once participants have made their demand-for-advice decision, we ask them to answer several questions measuring their perception of and attitudes toward the machine learning model and its developer.

Prior guess elicitation. At the beginning of stage 3, we ask participants to guess the listing price of an apartment in Berlin. The apartment is an actual real estate object from a dataset that we scraped from a large online platform. We randomize the apartment at the individual level. Mirroring the guessing task for credit defaults, participants can enter a guess on a scale from 300,000€ to 700,000€ in steps of 40,000€, i.e., there are eleven possible answers. We also ask them to indicate their confidence in their guess on a 5-point scale. Participants earn a 5€ bonus if their guess does not deviate from the real listing price recorded in the data. To make an informed guess, participants observe ten apartment characteristics. We selected these ten characteristics based on the criteria that they are (i) relevant to the apartment evaluation and (ii) sufficiently familiar/accessible to people. This initial guess serves as a prior.

Posterior guess elicitation. After we elicit participants’ prior guesses, we inform them that they will have the chance to update their initial guess for the given apartment. To possibly improve their guess, participants observe the prediction of a previously trained machine learning model. To alleviate concerns about spillover effects, we explicitly inform participants that the two machine learning models in the current and the previous stage are independent. Notably, in stage 3 we always show the AI prediction to the participant (in contrast to stage 2). Before we show the prediction to participants and allow them to update their initial guess, they need to indicate their belief about the machine learning model’s prediction accuracy, which serves as a control measure in our analyses. Participants can then update their initial guess, knowing that we will use their second guess to determine whether they earn a 5€ bonus. Participants are again required to indicate their level of confidence in their guess. Mirroring the demand stage, we ask participants to answer several questions measuring their perception of and attitudes toward the machine learning model and its developer. A participant’s adjustment of her prior guess in the direction of the observed prediction reflects her processing of algorithmic advice.

We employ four experimental conditions that we implement in a between-subject fashion. Our baseline condition works exactly as outlined above. For participants in our main treatment (Dev condition), we disclose the social identity of the developers of the two machine learning models in stages 2 and 3 before participants engage with them. More specifically, right after we elicit participants’ prior guesses and before they estimate the ML model’s accuracy, we inform them about the developer’s gender, immigration background, and political orientation. Notably, both machine learning models were developed jointly by three members of our research team. At the participant level, we randomly select two of the three developers and disclose the identity of one of them in the demand stage and of the other in the processing stage. By doing so, we can control for developer fixed effects. The developer identities are based on the authors of the study.

We employ two additional control treatments in which we inform participants about the machine learning models’ prediction accuracies. The DevAcc condition differs from the Dev condition in only one aspect: after participants indicate their belief about the machine learning models’ prediction accuracies in the demand and processing stages, we inform them about the models’ actual accuracies. Finally, participants in our Acc condition also learn about the prediction accuracies of the two machine learning models, just as in the DevAcc condition; however, we do not disclose the developer identities to them. By comparing results from the DevAcc and Acc conditions, we can isolate the effect of disclosing developer identities from effects operating through accuracy beliefs.
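
For reference, the four between-subject conditions can be summarized by what is disclosed to participants before they engage with the AI advice (a schematic summary, not experiment code):

```python
# Schematic summary of the four conditions described above: whether the developer
# identity and the model's actual prediction accuracy are disclosed.

CONDITIONS = {
    "Baseline": {"developer_identity": False, "model_accuracy": False},
    "Dev":      {"developer_identity": True,  "model_accuracy": False},
    "DevAcc":   {"developer_identity": True,  "model_accuracy": True},
    "Acc":      {"developer_identity": False, "model_accuracy": True},
}
```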

To avoid ordering effects, we randomize the order of stage 2 and stage 3 at the participant level. Relatedly, to control for task fixed effects, we randomly determine whether participants face the credit default guessing task in the demand stage and the apartment price guessing task in the processing stage, or vice versa. Participants in the baseline or Acc condition also indicate their social distance to the developers on the penultimate page, after they have completed both stages, because we want to compare effects directly across treatments. We plan to run our experiment with 800 participants whom we recruit on the Prolific platform and randomly assign to one of our four experimental conditions with equal probability. We implement the experiment using oTree (Chen et al., 2016). We host the software on commercial servers from Heroku. In addition to a fixed participation fee of 1.75€, we pay participants according to their performance in either the demand stage or the processing stage. At the end of the experiment, we determine which of the two stages is payoff-relevant by means of a computerized coin flip.
Experimental Design Details
Randomization Method
We randomly assign participants to one of our four treatment conditions with equal probability using a computerized random number generator. Additionally, we randomize the order of the two stages, the assignment of the guessing tasks to the demand and processing stages, and which developers participants observe.
Randomization Unit
The randomization occurs at the individual level.
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
1 session on Prolific
Sample size: planned number of observations
800 individuals (200 per treatment)
Sample size (or number of clusters) by treatment arms
200 individuals per treatment condition
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Supporting Documents and Materials

There is information in this trial unavailable to the public.
IRB

Institutional Review Boards (IRBs)

IRB Name
Gemeinsame Ethikkommission Wirtschaftswissenschaften der Goethe-Universität Frankfurt und der Johannes Gutenberg-Universität Mainz
IRB Approval Date
2022-07-12
IRB Approval Number
N/A
Analysis Plan

There is information in this trial unavailable to the public.

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials