Experimental Design Details

Experimental Design:

In this experiment, Experiment 2, subjects are asked to evaluate the mathematical performance of a second group of individuals, henceforth called candidates, who participated in Experiment 1. Subjects will only observe the joint score of pairs of candidates from Experiment 1. Hence, the individual performance of each candidate must be inferred from the joint score of a pair.

We start by describing the procedures of Experiment 1, which will explain what exercises the candidates performed and how we matched them into pairs. We then proceed to explain how information about the performance of candidates in Experiment 1 will be presented to and evaluated by the subjects in Experiment 2.

Finding candidates and constructing pairs:

Experiment 1 was conducted in the spring and fall of 2017. In the experiment, candidates completed a series of mathematical quizzes (Wiborg, Brekke & Nyborg, 2020). In each quiz, candidates were asked to answer as many mathematical exercises as possible in 60 seconds. The exercises were variations of adding and subtracting two- and three-digit numbers. We use data from five of these quizzes to generate the information observable to the subjects in Experiment 2. The candidates’ performance on a sixth quiz will be used to incentivise subjects in Experiment 2.

To generate pairs, we randomly match candidates within each quiz so that every candidate belongs to exactly one pair; with 60 candidates, for instance, there would be 30 unique pairs in each quiz. Hence, a candidate has five partners in total.
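The matching step can be sketched as follows. This is a minimal illustration, assuming candidates are identified by hypothetical integer IDs and that each quiz uses an independent shuffle; the function name `match_pairs` and the seed are illustrative and are not taken from the actual procedure.

```python
import random

def match_pairs(candidates, n_quizzes=5, seed=0):
    """Return one list of disjoint pairs per quiz (hypothetical helper)."""
    candidates = list(candidates)
    if len(candidates) % 2 != 0:
        raise ValueError("need an even number of candidates")
    rng = random.Random(seed)  # the seed is an illustrative assumption
    matching = []
    for _ in range(n_quizzes):
        shuffled = candidates[:]
        rng.shuffle(shuffled)
        # Consecutive entries of the shuffled list form a pair, so every
        # candidate appears in exactly one pair in this quiz.
        pairs = [(shuffled[i], shuffled[i + 1])
                 for i in range(0, len(shuffled), 2)]
        matching.append(pairs)
    return matching

# Example from the text: 60 candidates give 30 pairs in each of five quizzes.
matching = match_pairs(range(60))
```

Independent shuffles of this kind do not rule out a candidate meeting the same partner in more than one quiz; the text does not say whether such rematches were excluded.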

Prior to performing the math quizzes in Experiment 1, candidates were asked to provide a nickname for anonymisation purposes. We asked candidates the following question to preserve information about gender: “Imagine that you were to have a different first name. What name would you prefer to have?” In Experiment 2 we convey information about the gender of the candidates by using these nicknames.

For the purposes of Experiment 2, we only use data on 42 candidates (out of 148) with American-sounding nicknames that, according to the Social Security database and babynames.com, convey one gender. Four of the names are close variants of common American names, such as Viktoria, and are included to obtain a sufficiently large pool of candidates. Hence, each of the 42 candidates participates in one of 21 unique pairs in each of the five quizzes. Only one of the 42 candidates had a name that did not correspond to self-reported sex.

Design of online experiment:

Subjects in Experiment 2 will be given incentives to evaluate the mathematical performance of candidates in Experiment 1. We inform the subjects about the nature of the five mathematical quizzes, the random matching of candidates, and the fact that they will only observe the joint score of the candidates in a pair, that is, the combined number of correct answers in a pair on a given quiz.

The evaluation of candidates’ performance will be conducted by having the subjects pick one candidate out of four based on the information about the joint scores of pairs in the five preceding quizzes. Subjects receive cash for each correct answer their chosen candidate provided individually on a sixth quiz. We ask subjects to make eight such choices sequentially. Hence, out of 42 candidates, we present subjects with information about 32 (= 4 × 8) candidates in total. All subjects are presented with the same 32 candidates, although the order of presentation and the amount of information vary across treatments. To convey information about the gender of each candidate, we rely on the signalling effect of the nicknames chosen by candidates in Experiment 1. The procedure leading to these nicknames, which is described above, will be thoroughly explained to the subjects.

Treatments:

To study the effect of the informativeness of within-pair name ordering on the selection of female candidates, we vary whether subjects observe tables in which pairs are ordered alphabetically or according to individual performance. In the Alphabetical treatment, names in a pair are ordered alphabetically in the first four tables presented to the subjects. In the last four tables, names are ordered by the number of correct answers of each pair member, with the member who obtained the higher number of correct answers listed first. The First Author treatment is the exact opposite, ordering pair members according to score in the first four tables and alphabetically in the last four. Subjects are not informed about the ordering of pairs until they are presented with each type of table. Four of the eight sets of candidates also contain different numbers of men and women to reduce the chances of subjects realising that the study is concerned with gender.

Apart from the ordering of pair members, the first four tables are the same across treatments, as are the last four tables. To reduce the potential impact of the ordering of tables (i.e. the effect of observing one set of four candidates before another rather than after it), we vary the order of the first four and last four tables across treatments.

The following information is provided to the subjects prior to each decision depending on whether the order is alphabetical or first author:

Alphabetical order: For each pair the names are ordered alphabetically.

First Author order: For each pair the names are ordered according to score so that the one with the highest score is listed first. If both have equal scores, the computer randomly draws the order.
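The two ordering rules can be written out as a short sketch. The function `order_pair`, its parameters, and the example names and scores are hypothetical and serve only to illustrate the rules stated above, including the random draw for ties.

```python
import random

def order_pair(name_a, score_a, name_b, score_b, rule, rng=random):
    """Return the two names of a pair in display order (hypothetical helper)."""
    if rule == "alphabetical":
        # Case-insensitive alphabetical order of the two nicknames.
        return tuple(sorted([name_a, name_b], key=str.casefold))
    if rule == "first_author":
        if score_a > score_b:
            return (name_a, name_b)
        if score_b > score_a:
            return (name_b, name_a)
        # Equal scores: the order is drawn at random.
        pair = [name_a, name_b]
        rng.shuffle(pair)
        return tuple(pair)
    raise ValueError(f"unknown rule: {rule}")

# Illustrative names and scores, not taken from the data.
alphabetical = order_pair("Ben", 9, "Anna", 7, "alphabetical")
first_author = order_pair("Ben", 9, "Anna", 7, "first_author")
```

In this example the alphabetical rule lists Anna first, while the First Author rule lists Ben first because his individual score is higher.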

After subjects have made these eight decisions, we ask them an incentivised question regarding the performance of male and female candidates, to elicit their beliefs about gender differences in mathematical abilities. Specifically, we tell subjects that the 32 candidates they have encountered are a subset of the 148 candidates who participated in Experiment 1. We then inform them about the average combined individual score on the five quizzes in the whole sample, and ask them to guess the difference in average score between men and women in the whole sample. Subjects first indicate who they think did best and then guess the difference in score; they are compensated if their answer lies within ±2 of the correct answer. The average in the whole sample is 39.64865, and men score on average 2.93086 better than women. In addition to this question, we ask several questions regarding the evaluation of individual contributions to joint work in order to draw attention away from the issue of gender.
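The payment rule for the belief question can be illustrated with a small sketch. It assumes that the direction and the size of the guess are combined into one signed difference (men minus women) and that the ±2 interval includes its endpoints; neither detail is specified in the text, and the function name `guess_is_paid` is hypothetical.

```python
TRUE_GAP = 2.93086   # men minus women, as reported in the text
TOLERANCE = 2.0      # the guess must lie within +/-2 of the true gap

def guess_is_paid(guessed_better, guessed_gap):
    """Return True if the guess earns the payment (hypothetical helper).

    guessed_better: "men" or "women", the group the subject thinks did best.
    guessed_gap: non-negative guessed difference in average score.
    """
    if guessed_better not in ("men", "women"):
        raise ValueError("guessed_better must be 'men' or 'women'")
    # Assumption: a guess in favour of women counts as a negative gap.
    signed_gap = guessed_gap if guessed_better == "men" else -guessed_gap
    return abs(signed_gap - TRUE_GAP) <= TOLERANCE

paid_close = guess_is_paid("men", 3.0)       # within the interval
paid_far = guess_is_paid("men", 5.0)         # more than 2 above the true gap
paid_wrong_side = guess_is_paid("women", 1.0)
```

Under these assumptions, guesses that men did better by between roughly 0.93 and 4.93 points are paid, and any guess that women did better by a positive amount is not.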