Experimental Design
Our experiment comprises three consecutive stages. In stage 1, participants fill out a questionnaire containing items on their personal characteristics. Stages 2 and 3 respectively measure participants’ demand for and processing of AI advice in the context of two distinct guessing tasks. As our main treatment variation, we disclose the algorithm developer’s identity to participants before they engage with the advice. In additional control treatments, we further vary whether participants also learn about the AI’s prediction accuracy. Overall, we employ four experimental conditions.
In stage 1, we use a questionnaire to elicit participants’ demographics, risk aversion, delayed gratification, norm obedience, and disposition to engage with novel technological applications. We use these measures as control variables in our analyses. Importantly, among the demographic items, we elicit participants’ gender, immigration background, and political orientation and ask them to indicate on a 5-point scale how central each of these attributes is to their identity. We do so because these are the three developer characteristics of the algorithmic decision support system that participants eventually observe. By eliciting these characteristics and their importance for participants’ social identity, we can compute a weighted measure of participants’ perceived affinity and closeness to the developer, i.e., their social distance to the developer.
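For illustration, one possible way to aggregate the three attributes into a single social distance score is to weight the participant–developer mismatch on each attribute by the attribute’s stated identity centrality. The following Python sketch shows this aggregation; the variable names and the specific weighting scheme are illustrative assumptions, not our exact operationalization.

```python
# Illustrative sketch of a weighted social distance measure; the weighting
# scheme below is an assumption, not necessarily the exact operationalization.

def social_distance(participant: dict, developer: dict) -> float:
    """Weight the participant-developer mismatch on each attribute by how
    central that attribute is to the participant's identity (1-5 scale)."""
    attributes = ["gender", "immigration_background", "political_orientation"]
    weighted_mismatch = 0.0
    total_weight = 0.0
    for attr in attributes:
        weight = participant["centrality"][attr]  # 1 (not central) .. 5 (very central)
        mismatch = 0.0 if participant[attr] == developer[attr] else 1.0
        weighted_mismatch += weight * mismatch
        total_weight += weight
    return weighted_mismatch / total_weight       # 0 = identical, 1 = maximally distant


participant = {
    "gender": "female",
    "immigration_background": False,
    "political_orientation": "left",
    "centrality": {"gender": 4, "immigration_background": 2, "political_orientation": 5},
}
developer = {"gender": "male", "immigration_background": False, "political_orientation": "left"}

print(round(social_distance(participant, developer), 2))  # 0.36 (only gender mismatches)
```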
Prior guess elicitation. At the beginning of stage 2, we ask participants to guess the probability of a borrower defaulting on credit. The borrower is a real person from a publicly available, historical credit default dataset from Lending Club. We randomize the borrower at the individual level. Participants can enter a guess on a scale from 0 to 100 percent in steps of 10, i.e., there are eleven possible answers. We also ask them to indicate their confidence in their guess on a 5-point scale. Participants earn a 5€ bonus if their guess matches the borrower’s actual credit default assessment recorded in the data. To make an informed guess, participants see ten borrower characteristics. We selected these ten characteristics based on the criteria that they are (i) relevant to the credit default assessment and (ii) sufficiently familiar and accessible to laypeople. This initial guess serves as a prior.
Demand for advice. After we elicit participants’ prior guesses, we inform them that they will have the chance to update their initial guess for the given borrower. To possibly improve their guess, participants may observe the prediction of a previously trained machine learning model. By default, participants observe the prediction with 50% probability. We then ask them to allocate, via a slider, how much of a 0.3€ endowment they want to spend to adjust the probability of observing the prediction, where each adjustment of 10 percentage points costs 0.1€. Participants keep any unspent portion of the endowment. If they spend their entire endowment, participants can increase the probability of observing the prediction to at most 80% or decrease it to at least 20%. Because participants cannot ensure that they definitely do not obtain the prediction, we may observe – at least for a subset of participants – how they behave if they eventually obtain the prediction despite having revealed a preference against it. Notably, before participants make their slider decision, we ask them to indicate their belief about the machine learning model’s prediction accuracy, which serves as a control measure in our analyses. The share of the endowment a participant spends to increase her probability of observing the machine learning prediction reflects her demand for algorithmic advice.
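The pricing rule of the slider task and the resulting demand measure can be summarized in the following sketch; the function names are illustrative and do not reflect our oTree implementation.

```python
# Minimal sketch of the demand-for-advice pricing rule; illustrative only.

ENDOWMENT_CENTS = 30      # 0.30 euro endowment, in cents to avoid rounding issues
COST_PER_10PP_CENTS = 10  # each 10-percentage-point adjustment costs 0.10 euro
BASE_PROBABILITY = 50     # default chance (in %) of observing the prediction

def observation_probability(spent_cents: int, increase: bool) -> int:
    """Translate the cents spent on the slider into the probability (in %)
    of observing the machine learning prediction."""
    assert 0 <= spent_cents <= ENDOWMENT_CENTS, "cannot spend more than the endowment"
    shift = spent_cents // COST_PER_10PP_CENTS * 10    # percentage points bought
    prob = BASE_PROBABILITY + shift if increase else BASE_PROBABILITY - shift
    return max(20, min(80, prob))                      # bounded between 20% and 80%

def demand_for_advice(spent_cents: int, increase: bool) -> float:
    """Share of the endowment spent to *increase* the observation probability."""
    return spent_cents / ENDOWMENT_CENTS if increase else 0.0

# Spending the full endowment to raise the probability reaches the 80% cap,
# and the demand measure equals 1.
print(observation_probability(30, increase=True))  # 80
print(demand_for_advice(30, increase=True))        # 1.0
```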
Posterior guess elicitation. According to the chosen probability, we then randomly determine whether participants see the prediction. Participants can then update their initial guess, observing the same ten borrower characteristics, their initial guess, and possibly the prediction of the machine learning model. We inform participants that we will use their final guess to determine whether they earn the 5€ bonus. Participants again have to indicate their level of confidence in their guess. Once participants have made their demand-for-advice decision, we ask them to answer several questions measuring their perception of and attitudes toward the machine learning model and its developer.
Prior guess elicitation. At the beginning of stage 3, we ask participants to guess the listing price of an apartment in Berlin. The apartment is an actual real estate object from a dataset that we scraped from a large online platform. We randomize the apartment at the individual level. Mirroring the credit default guessing task, participants can enter a guess on a scale from 300,000€ to 700,000€ in steps of 40,000€, i.e., there are eleven possible answers. We also ask them to indicate their confidence in their guess on a 5-point scale. Participants earn a 5€ bonus if their guess matches the actual listing price recorded in the data. To make an informed guess, participants observe ten apartment characteristics. We selected these ten characteristics based on the criteria that they are (i) relevant to the apartment evaluation and (ii) sufficiently familiar and accessible to laypeople. This initial guess serves as a prior.
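For concreteness, both response scales contain exactly eleven options, as the following snippet illustrates.

```python
# Both guessing tasks offer eleven possible answers.
credit_default_options = list(range(0, 101, 10))                  # 0%, 10%, ..., 100%
apartment_price_options = list(range(300_000, 700_001, 40_000))   # 300,000, 340,000, ..., 700,000 euros

assert len(credit_default_options) == 11
assert len(apartment_price_options) == 11
```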
Posterior guess elicitation. After we elicit participants’ prior guesses, we inform them that they will have the chance to update their initial guess for the given apartment. To possibly improve their guess, participants observe the prediction of a previously trained machine learning model. To alleviate concerns about spillover effects, we explicitly inform participants that the machine learning models in the current and the previous stage are independent. Notably, in contrast to stage 2, we always show the AI prediction to participants in stage 3. Before we show the prediction and allow participants to update their initial guess, they need to indicate their belief about the machine learning model’s prediction accuracy, which serves as a control measure in our analyses. Participants can then update their initial guess, knowing that we will use their second guess to determine whether they earn a 5€ bonus. Participants are again required to indicate their level of confidence in their guess. Mirroring the demand stage, we ask participants to answer several questions measuring their perception of and attitudes toward the machine learning model and its developer. A participant’s adjustment of her prior guess in the direction of the observed prediction reflects her processing of algorithmic advice.
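One standard way to quantify such an adjustment is the weight-on-advice measure from the advice-taking literature; the formula below is an illustrative sketch, not necessarily our exact specification.

```python
# Hedged sketch: the "weight on advice" (WOA) is a common convention for
# measuring advice processing; this formula is illustrative, not the paper's.

def weight_on_advice(prior: float, posterior: float, advice: float) -> float:
    """Share of the distance between prior and advice that the participant
    covers when updating: 0 = ignores the advice, 1 = adopts it fully."""
    if advice == prior:
        return 0.0  # no room to move toward the advice
    return (posterior - prior) / (advice - prior)

# A participant with a prior of 300,000 euros who sees a prediction of 500,000
# euros and updates to 420,000 euros places a weight of 0.6 on the advice.
print(weight_on_advice(prior=300_000, posterior=420_000, advice=500_000))  # 0.6
```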
We employ four different experimental conditions that we implement in a between-subject fashion. Our baseline condition works exactly as outlined above. For participants in our main treatment (Dev condition), we disclose the social identity of the developers of the two machine learning models in stages 2 and 3 before participants engage with them. More specifically, right after we elicit participants’ prior guesses and before they estimate the models’ accuracy, we inform them about the developers’ gender, immigration background, and political orientation. Notably, both machine learning models were developed jointly by three members of our research team, and the disclosed developer identities are based on the authors of the study. At the participant level, we randomly select two of the three developers and disclose the identity of one of them in the demand stage and of the other one in the processing stage. By doing so, we can control for developer fixed effects.
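A minimal sketch of the per-participant developer assignment follows; the developer labels are placeholders.

```python
# Minimal sketch of the per-participant developer assignment; labels are placeholders.
import random

developers = ["developer_A", "developer_B", "developer_C"]  # three research-team members

def assign_developers() -> dict:
    """Randomly draw two of the three developers and show one per stage."""
    demand_dev, processing_dev = random.sample(developers, k=2)
    return {"demand": demand_dev, "processing": processing_dev}
```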
We employ two additional control treatments in which we inform participants about the machine learning models’ prediction accuracies. The DevAcc condition differs from the Dev condition in only one aspect: after participants indicate their belief about the machine learning models’ prediction accuracies in the demand and processing stages, we inform them about the models’ actual accuracies. Finally, participants in our Acc condition learn about the prediction accuracies of the two machine learning models exactly as in the DevAcc condition; however, we do not disclose the developer identities to them. By comparing results from the DevAcc and Acc conditions, we can isolate the impact of disclosed developer identities on accuracy beliefs.
To avoid ordering effects, we randomize the order of stages 2 and 3 at the participant level. Relatedly, to control for task fixed effects, we randomly determine whether participants engage in the credit default guessing task in the demand stage and the apartment price guessing task in the processing stage, or vice versa. Participants in the baseline or Acc condition also have to indicate their social distance on the penultimate page of the experiment, after they have completed both stages, because we want to directly compare effects across treatments. We plan to run our experiment with 800 participants whom we recruit on the Prolific platform and randomly assign to one of our four experimental conditions with equal probability. We implement the experiment using oTree (Chen et al., 2016) and host the software on commercial servers from Heroku. In addition to a fixed participation fee of 1.75€, we pay participants according to their performance in either the demand stage or the processing stage. At the end of the experiment, we determine which of the two stages is payoff-relevant by means of a computerized coin flip.
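A hedged sketch of the randomization and payoff logic described above follows; it is not our actual oTree code, and the labels are placeholders.

```python
# Illustrative sketch of the randomization and payoff rule; not the oTree implementation.
import random

PARTICIPATION_FEE = 1.75  # euros, paid to every participant
BONUS = 5.00              # euros, earned only in the payoff-relevant stage

def assign_randomization() -> dict:
    """Each participant receives a condition, a stage order, and a task order."""
    return {
        "condition": random.choice(["baseline", "Dev", "DevAcc", "Acc"]),
        "stage_order": random.sample(["demand", "processing"], k=2),
        "task_order": random.sample(["credit_default", "apartment_price"], k=2),
    }

def total_payment(bonus_earned_demand: bool, bonus_earned_processing: bool) -> float:
    """A computerized coin flip determines which stage's bonus is paid out."""
    payoff_relevant = random.choice(["demand", "processing"])
    earned = bonus_earned_demand if payoff_relevant == "demand" else bonus_earned_processing
    return PARTICIPATION_FEE + (BONUS if earned else 0.0)
```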