Experimental Design Details
I hired participants on Prolific to answer ten math questions from the Armed Services Vocational Aptitude Battery (ASVAB). Each of these experimental workers received a score out of 10, the number of questions answered correctly, which serves as my measure of ability. I then formed three groups of workers, each uniformly distributed over four possible scores: a reference group, a mean shift of the reference group, and a mean-preserving spread of the reference group.
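The text does not report which four scores each group uses. For illustration only, the sketch below assumes hypothetical supports: (4, 5, 6, 7) for the reference group, (6, 7, 8, 9) for the mean shift, and (2, 4, 7, 9) for the mean-preserving spread. It then checks the two defining properties. Here a mean shift is assumed to move every score by the same constant. For two uniform distributions on the same number of points, being a mean-preserving spread is equivalent to majorization, which the helper tests on sorted partial sums. The constants and function names are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

# Hypothetical score supports (scores out of 10); the actual supports are not given.
REFERENCE = (4, 5, 6, 7)
MEAN_SHIFT = (6, 7, 8, 9)
SPREAD = (2, 4, 7, 9)


def mean(scores):
    """Mean of a uniform distribution over `scores`."""
    return Fraction(sum(scores), len(scores))


def variance(scores):
    """Variance of a uniform distribution over `scores`."""
    m = mean(scores)
    return sum((s - m) ** 2 for s in scores) / len(scores)


def is_mean_shift(base, other):
    """Assumed definition: every sorted score in `other` equals the matching
    sorted score in `base` plus one common constant."""
    if len(base) != len(other):
        return False
    shifts = {o - b for b, o in zip(sorted(base), sorted(other))}
    return len(shifts) == 1


def is_mean_preserving_spread(base, other):
    """For uniform distributions on the same number of points, `other` is a
    mean-preserving spread of `base` iff it majorizes `base`: equal totals, and
    the sum of the k lowest scores is never larger in `other` than in `base`."""
    if len(base) != len(other) or sum(base) != sum(other):
        return False
    pairs = zip(accumulate(sorted(base)), accumulate(sorted(other)))
    return all(o <= b for b, o in pairs)


print(mean(REFERENCE), mean(MEAN_SHIFT), mean(SPREAD))
print(variance(REFERENCE), variance(MEAN_SHIFT), variance(SPREAD))
```

With these supports, the reference group and the spread share a mean of 11/2. The reference group and the mean shift share a variance of 5/4, and the spread raises the variance to 29/4.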
In this experiment, "employers" are tasked with learning about the pool of experimental "workers" just described. Employers are randomized to one of three environments in a between-subjects design:
Treatment 1: One worker
In the baseline treatment, IND, I randomize the employer to one of the three groups of workers and show them the distribution of scores in their group. This gives the employer an accurate prior about the score of each worker they evaluate. One worker is then drawn at random from the group. The employer observes three random draws, made with replacement, from this worker’s quiz. After each draw, the employer (1) learns whether the worker answered the drawn question correctly or incorrectly and (2) reports a full posterior distribution over the worker’s score.
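As a point of reference for these reports, here is a minimal sketch of the Bayesian posterior an employer could form in IND. It assumes that each draw is a uniformly random question from the worker's ten, so a worker with score s answers the drawn question correctly with probability s/10. The uniform prior on the hypothetical scores (4, 5, 6, 7) and the function names are illustrative, not taken from the design.

```python
from fractions import Fraction

NUM_QUESTIONS = 10  # each worker answered ten questions


def bayes_update(belief, correct):
    """One Bayesian step. `belief` maps score -> probability. A draw made
    uniformly with replacement from a worker with score s is correct with
    probability s / NUM_QUESTIONS."""
    weights = {}
    for s, p in belief.items():
        hit = Fraction(s, NUM_QUESTIONS)
        weights[s] = p * (hit if correct else 1 - hit)
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def posterior_path(prior, signals):
    """Benchmark posterior after each draw; `signals` holds True for a correct answer."""
    path, belief = [], dict(prior)
    for correct in signals:
        belief = bayes_update(belief, correct)
        path.append(belief)
    return path


# Hypothetical group: uniform prior over scores 4, 5, 6, 7.
prior = {s: Fraction(1, 4) for s in (4, 5, 6, 7)}
for step, belief in enumerate(posterior_path(prior, (True, True, False)), start=1):
    print(step, {s: str(p) for s, p in belief.items()})
```

For the sequence correct, correct, incorrect, the final posterior places probability 3/16, 125/512, 9/32, and 147/512 on scores 4, 5, 6, and 7. Exact fractions are used so that benchmark posteriors can be compared with reports without rounding error.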
Treatment 2: Two workers from same group
Next, I ask whether receiving information about multiple workers simultaneously, a natural feature of many evaluation environments, distorts inference. In a second treatment, COMP-SAME, the employer is again randomly assigned to a group of workers and sees the distribution of scores in the group. However, rather than one worker, two workers are drawn with replacement from the group. The employer then completes three rounds of evaluation, each covering both workers simultaneously.
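Because the two workers are drawn with replacement, their scores are independent draws from the group distribution. A Bayesian's posterior about one worker should therefore not depend on the other worker's signals. The sketch below checks this by enumeration. It computes worker A's posterior from the joint distribution of both scores and compares it with the posterior that uses A's signals alone. The prior and the signal sequences are hypothetical, and the draw model (correct with probability s/10) is the same assumption as in the IND benchmark.

```python
from fractions import Fraction
from itertools import product

NUM_QUESTIONS = 10


def likelihood(score, signals):
    """Probability of a sequence of correct/incorrect draws, made with
    replacement, from a worker with the given score."""
    hit = Fraction(score, NUM_QUESTIONS)
    result = Fraction(1)
    for correct in signals:
        result *= hit if correct else 1 - hit
    return result


def single_posterior(prior, signals):
    """Posterior over one worker's score using only that worker's signals."""
    weights = {s: p * likelihood(s, signals) for s, p in prior.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def joint_posterior_of_first(prior, signals_a, signals_b):
    """Posterior over worker A's score derived from the joint distribution of
    both scores. Sampling with replacement makes the scores i.i.d. draws from
    `prior`, so the joint prior is a product."""
    joint = {
        (a, b): prior[a] * prior[b] * likelihood(a, signals_a) * likelihood(b, signals_b)
        for a, b in product(prior, prior)
    }
    total = sum(joint.values())
    marginal = {s: Fraction(0) for s in prior}
    for (a, _b), w in joint.items():
        marginal[a] += w / total
    return marginal


prior = {s: Fraction(1, 4) for s in (4, 5, 6, 7)}  # hypothetical group
worker_a = (True, True, False)
worker_b = (False, False, False)
print(joint_posterior_of_first(prior, worker_a, worker_b) == single_posterior(prior, worker_a))
```

The comparison holds exactly for any signals given to worker B. Under this benchmark, any dependence of an employer's report for one worker on the other worker's signals is a departure from Bayesian updating.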
Treatment 3: Two workers from different groups
Finally, I ask how well people can solve a simple, abstract statistical discrimination problem. In a third treatment, COMP-DIFF, the employer is randomly assigned to a pair of worker groups in which one group is either a mean shift or a mean-preserving spread of the other. One worker is drawn from each group, and the employer completes three rounds of evaluation for both workers simultaneously.
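The statistical discrimination logic can be made concrete. If the two workers in COMP-DIFF produce identical signals, a Bayesian employer should still report different posteriors, because the workers come from groups with different priors. The sketch computes the benchmark posterior mean for the same hypothetical triplet under each of the three hypothetical groups. It uses the same assumed draw model as above, and the group supports are illustrative only.

```python
from fractions import Fraction

NUM_QUESTIONS = 10

# Hypothetical uniform groups; the actual supports are not given in the text.
GROUPS = {
    "reference": (4, 5, 6, 7),
    "mean shift": (6, 7, 8, 9),
    "spread": (2, 4, 7, 9),
}


def posterior_mean(support, signals):
    """Bayesian posterior mean score for a worker drawn from a uniform group,
    given draws with replacement (True = answered correctly)."""
    weights = {}
    for s in support:
        hit = Fraction(s, NUM_QUESTIONS)
        w = Fraction(1, len(support))
        for correct in signals:
            w *= hit if correct else 1 - hit
        weights[s] = w
    total = sum(weights.values())
    return sum(s * w for s, w in weights.items()) / total


signals = (True, True, False)  # identical evidence for both workers
for name, support in GROUPS.items():
    print(name, float(posterior_mean(support, signals)))
```

After the sequence correct, correct, incorrect, the benchmark posterior means are 1451/256 (about 5.67) for the reference group, 1823/250 (about 7.29) for the mean shift, and 1103/178 (about 6.20) for the spread. Identical evidence therefore maps to different beliefs.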
The set of signal triplets (or, in the two-worker treatments, paired signal triplets) that an employer observes in a given treatment is fixed in advance. The triplets are generated in proportion to their expected frequency under the prior. The IND and COMP-SAME treatments each have one signal set per prior, and the COMP-DIFF treatment has two signal sets per pair of groups. This gives 10 sets in total (3 for IND, 3 for COMP-SAME, and 4 for COMP-DIFF).
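The text does not say how many triplets each set contains or how expected frequencies were converted into whole counts. The sketch below shows one possible reading. It computes the predictive probability of every ordered triplet under the prior, then rounds n_draws × probability to integers with the largest-remainder method. The draw count, the rounding rule, the hypothetical prior, and the function names are all assumptions.

```python
from fractions import Fraction
from itertools import product
from math import floor

NUM_QUESTIONS = 10


def sequence_probability(prior, seq):
    """Predictive probability of an ordered sequence of draws, averaging the
    with-replacement likelihood over the prior on the worker's score."""
    total = Fraction(0)
    for s, p in prior.items():
        hit = Fraction(s, NUM_QUESTIONS)
        like = Fraction(1)
        for correct in seq:
            like *= hit if correct else 1 - hit
        total += p * like
    return total


def allocate_sequences(prior, n_draws, length=3):
    """Hypothetical allocation rule: assign n_draws sequences in proportion to
    their predictive probability, rounding with the largest-remainder method.
    Ties keep the enumeration order because `sorted` is stable."""
    seqs = list(product((True, False), repeat=length))
    quotas = {seq: sequence_probability(prior, seq) * n_draws for seq in seqs}
    counts = {seq: floor(q) for seq, q in quotas.items()}
    leftover = n_draws - sum(counts.values())
    by_remainder = sorted(seqs, key=lambda seq: quotas[seq] - counts[seq], reverse=True)
    for seq in by_remainder[:leftover]:
        counts[seq] += 1
    return counts


prior = {s: Fraction(1, 4) for s in (4, 5, 6, 7)}  # hypothetical group
for seq, k in allocate_sequences(prior, 20).items():
    print("".join("C" if c else "I" for c in seq), k)
```

With 1,000 hypothetical draws, the quotas for this prior are whole numbers: 187 all-correct sequences, 128 of each sequence with two correct answers, 107 of each sequence with one, and 108 all-incorrect sequences. Ordered sequences are kept distinct because employers report a posterior after each draw.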
In the IND treatment, half of the posterior reports are chosen at random to count for payment. In the COMP-SAME and COMP-DIFF treatments, in each round, the posterior report for one worker in the pair is chosen at random to count for payment. Payment is calculated using a binarized scoring rule.
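The text says only that payment uses a binarized scoring rule. A common construction, assumed here, binarizes the quadratic scoring rule. It computes the quadratic loss of the reported distribution against the realized score and pays a fixed prize if that loss falls below a uniform draw on [0, 2]. The win probability is then linear in the quadratic score, so truthful reporting maximizes the chance of winning for any expected-utility maximizer who prefers more money. The choice of the quadratic rule, the prize structure, and the function names are hypothetical.

```python
import random
from fractions import Fraction

MAX_LOSS = 2  # the quadratic loss of a probability report never exceeds 2


def quadratic_loss(report, true_score):
    """Sum over scores of (reported probability - outcome indicator)^2.
    `report` maps score -> probability; omitted scores count as zero."""
    scores = set(report) | {true_score}
    return sum((report.get(s, 0) - (1 if s == true_score else 0)) ** 2 for s in scores)


def win_probability(report, true_score):
    """Chance of winning the prize when the report wins iff its loss falls
    below a uniform draw on [0, MAX_LOSS]."""
    return 1 - Fraction(quadratic_loss(report, true_score)) / MAX_LOSS


def expected_win_probability(report, belief):
    """Expected chance of winning for an employer whose true belief is `belief`."""
    return sum(p * win_probability(report, s) for s, p in belief.items())


def pays_prize(report, true_score, rng):
    """Simulate one binarized payment draw."""
    return quadratic_loss(report, true_score) < rng.uniform(0, MAX_LOSS)


belief = {4: Fraction(1, 2), 5: Fraction(1, 2)}
print(expected_win_probability(belief, belief))  # truthful report: 3/4
print(expected_win_probability({4: 1}, belief))  # all mass on score 4: 1/2
```

In this example, shading the report toward one score lowers the expected chance of winning from 3/4 to 1/2. The rule is built to remove exactly this incentive to misreport.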