Experimental Design
The experiment has two phases. In the first phase, we collect data that will be used to incentivize the main experiment (the second phase). First-phase participants form the pool of experts from which decision makers in the main experiment may choose. In the main experiment, we measure the accuracy on the experimental task that an expert needs in order for decision makers to delegate their decision agency to that expert. The two phases use separate participant samples, both recruited on Prolific.
Before explaining the objectives of the two phases, we first describe the experimental task in detail. The task of interest is a balls-in-urns-style task. There are two jars, each containing 10 balls in some combination of green, orange, and blue. The color composition of each jar is known and displayed on the screen. One of the two jars is selected at random (each with 50% probability), and three balls are drawn from it without replacement. The participant does not know which jar was chosen. These three draws are not random: they are fixed ex ante to manipulate task difficulty, and they are shown to the participant. A fourth ball is then drawn uniformly at random from the 7 balls remaining in the chosen jar, and the participant's objective is to guess its color.
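To fix ideas, the Bayesian benchmark for this task can be written as follows; the notation is ours and is not shown to participants. Let $J \in \{A, B\}$ denote the randomly chosen jar, let $n_J^c$ be the number of balls of color $c$ in jar $J$, let $d$ denote the three displayed draws, and let $k^c$ be the number of color-$c$ balls among them. Since each jar is chosen with probability 1/2, Bayes' rule gives
\[
P(J = A \mid d) = \frac{P(d \mid A)}{P(d \mid A) + P(d \mid B)},
\]
where $P(d \mid J)$ is the probability of drawing the sequence $d$ without replacement from jar $J$. The probability that the fourth draw is color $c$ is then
\[
P(c_4 = c \mid d) = \sum_{J \in \{A,B\}} P(J \mid d)\,\frac{n_J^c - k^c}{7},
\]
since 7 balls remain in the chosen jar after the three displayed draws.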
There are three versions of this baseline task: easy, medium, and hard. These difficulty levels were determined prior to running the experiment based on how obvious the best answer is (i.e., how strongly one color dominates the jar) and how much additional accuracy a participant gains from correctly performing the Bayesian updating calculation relative to guessing at random. The realized difficulty ordering may differ once we analyze the data, and it is not integral to the experiment that the versions retain their intended difficulty levels. Rather, the different versions of the task provide robustness: they ensure that any results of the main experiment are not an artifact of a single parameterization of the balls-in-urns task.
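One way to formalize the second criterion, in the notation above: a participant guessing uniformly at random over the three colors is correct with probability 1/3, while a Bayesian participant guesses $\arg\max_c P(c_4 = c \mid d)$ and is correct with probability $\max_c P(c_4 = c \mid d)$, so the gain from updating is
\[
\Delta(d) = \max_c P(c_4 = c \mid d) - \tfrac{1}{3}.
\]
The easy, medium, and hard versions differ both in this gain and in how obvious the modal color is.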
In the first phase of the experiment, participants (whom we also call experts) perform 100 repetitions of the experimental task. Each participant sees only one difficulty level of the task. More explicitly, they see the same two jars with the same color compositions and the same three ball draws, and they guess the color of the fourth draw 100 times. The randomly chosen jar and the color of the fourth draw are re-randomized and may change between repetitions of the task. Participants in this phase are told all of this information and are informed that they will answer the same question, with potentially different outcomes, 100 times. The first phase serves solely to collect the data used to incentivize the main experiment. One of the 100 repetitions is randomly chosen, and participants receive a bonus if their guess on that repetition is correct.
In addition, there are two manipulations of each difficulty level of the experimental task. The jar compositions are the same across manipulations, but the ball draws shown vary. The first manipulation changes the number of ball draws a participant sees. A participant may see the same first three ball draws as in the original version, but additionally see a fifth and a sixth ball draw; the objective is still to guess the color of the fourth draw. Alternatively, a participant may see the same first ball draw but not the second and third, which are replaced with grayed-out balls; the objective is again to guess the fourth draw. The additional ball draws are the same for every participant, and the grayed-out balls correspond to the same draws as in the original version of the task.
The second manipulation changes how informative the three displayed draws are: they are either more or less informative than in the original version of the task. Informativeness is determined by the Bayesian probability that each color is the fourth draw. Essentially, less informative draws yield Bayesian probabilities closer to uniform, while more informative draws make one color more certain than in the original version. In total, there are 15 versions of the task: each of the three difficulty levels has the original version plus the four manipulated variants (additional draws, hidden draws, more informative draws, and less informative draws). Participants in the first phase are assigned to exactly one of the 15 versions and answer that version 100 times. We recruit 20 participants per version, for a total of 300 participants in the first phase.
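In the same notation, this ordering can be summarized (for exposition only) by the Bayesian predictive distribution: draws $d'$ are less informative than the original draws $d$ if $\max_c P(c_4 = c \mid d')$ is closer to the uniform benchmark of 1/3 than $\max_c P(c_4 = c \mid d)$, and more informative if it is closer to 1.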
In the main experiment, or second phase, participants first complete 10 repetitions of one difficulty level of the original, non-manipulated task. The main question of interest is whether participants would rather rely on their own performance on the task or have a first-phase participant with X% accuracy on the task replace their decisions for payment. Their answer is the X% accuracy needed for the decision maker to replace their own performance with that of a first-phase participant. They may answer any X% from 0% to 100% in one-percent increments. The question is displayed in list format, where the decision maker chooses the row at which to switch from preferring their own performance to preferring that of the first-phase participant. We enforce a single switch-point, so for any accuracy above the switch-point, the decision maker must also prefer the first-phase participant's choices. For example, a decision maker who states a 60% switch-point automatically prefers first-phase participants with 61%, 62%, ..., 100% accuracy on the task to their own performance. In addition to answering this question for the original version of the task, decision makers report the necessary accuracy for all four manipulations of the task. There are also two further manipulations in which we show decision makers whether a first-phase participant agreed or disagreed with their own guess on the task: we randomly choose one of the 10 repetitions and display the decision maker's answer on that repetition alongside the answer of a first-phase participant who agreed (or disagreed) with them on that repetition.
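Formally, in our notation, the list elicits a single threshold $X^* \in \{0, 1, \dots, 100\}$ such that the decision maker retains their own performance for every expert accuracy $X \le X^*$ and delegates whenever $X > X^*$:
\[
\text{delegate to an expert with accuracy } X\% \iff X > X^*.
\]
The single switch-point restriction rules out preference reversals within the list.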
There are two treatments in the main experiment. In the first treatment, decision makers do not know their own accuracy on the task when answering the main questions. Before they answer, we elicit their beliefs about how accurate they were on the task and then display this believed accuracy to them as they answer the main questions. We call this treatment subjective accuracy. In the second treatment, we elicit beliefs between the task and the main questions, just as in the first treatment; however, participants are then told their actual accuracy on the task after reporting their beliefs, and this actual accuracy is displayed on the main question decision screens. We call this treatment objective accuracy. Treatments are between-subject, and beliefs in both treatments are incentivized using a Multiple Price List, with an explanation available to decision makers.
In addition to the main experiment, we extend the design to determine how gender and race/ethnicity group may affect the expertise premium. This extension uses the same original task, with the same ball draws for first-phase participants and decision makers. We use the same first-phase participants, whose race/ethnicity group and gender we collected through Prolific. We do not ask them to provide these characteristics in the experiment; instead, we export them from their Prolific profiles.
We use only the medium difficulty level of the task, without the six manipulations described above, and we run only the objective accuracy treatment. In this additional experiment, our manipulations are whether the gender and race/ethnicity group of the first-phase participant match the decision maker's own gender and race/ethnicity group. We examine the effects of race/ethnicity group and gender on the expertise premium separately. For our gender categories, we use men and women. For our race/ethnicity group categories, we use White (non-Hispanic), Black (non-Hispanic), Asian (non-Hispanic), and Hispanic individuals; we chose these groups because they comprise 96% of the United States population according to the United States Census. For gender, the manipulation is whether the gender of the first-phase participant is the same as the decision maker's gender. For race/ethnicity group, the manipulation is likewise whether the group of the first-phase participant is the same as the decision maker's. As there are more than two race/ethnicity groups, we do not compare all possible combinations: when the groups are not the same, one of the other three groups is randomly chosen.
The design of this extension is otherwise the same as that of the main experiment. First, decision makers complete 10 repetitions of the original, medium-difficulty task. We then ask them their beliefs about their own performance and reveal their actual accuracy on the 10 repetitions. They then answer the necessary accuracy questions. The only difference is that the necessary accuracy question screen includes the specific demographic characteristic in an excerpt about the first-phase participants. This excerpt also appears in the main experiment, but without the characteristic of interest.