Experimental Design Details
In this study, we will provide expert subjects, called evaluators, with information about applicants collected in Avery et al. (2023). They will be told, truthfully, that their decisions will help us decide to whom to offer the position.
Evaluators will be recruited to perform a freelance hiring activity. As this is a natural field experiment, they will not be told that they are in an experiment. Each evaluator will be randomized into one of two treatments: gender-known or gender-unknown. Each evaluator will be provided with information about the job and the recruitment process. Then, evaluators will be shown a series of three pairs of applicants. Two of these pairs will be mixed-gender, while in one pair both applicants will be male. For each pair, their task will be to choose who should be considered for the position. For each applicant, they will be provided with the following information: the applicant's years of experience, their education (whether they hold at least a university degree), where they learned coding, which coding languages they know, their answers to four interview questions, and three evaluations provided by other evaluators as part of Avery et al. (2023). In addition to the above information, evaluators in the gender-known treatment will also be provided with the first name and last initial of each applicant. The ordering of the applicants on the screen will be determined randomly. The exact same pairs will be shown in the gender-known and gender-unknown treatments.
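As an illustration of the presentation logic, the following Python sketch shows one way the treatment assignment and on-screen ordering could work; all function and field names are hypothetical and are not taken from the study's materials.

```python
import random

# Hypothetical sketch of the assignment and display logic described above.
TREATMENTS = ["gender_known", "gender_unknown"]

def assign_treatment(rng: random.Random) -> str:
    """Randomize an evaluator into one of the two treatments."""
    return rng.choice(TREATMENTS)

def present_pair(pair, treatment, rng: random.Random):
    """Return the two applicant profiles in random on-screen order,
    adding first name and last initial only in the gender-known arm."""
    shown = []
    for applicant in rng.sample(pair, k=2):  # random left/right order
        profile = {
            "experience_years": applicant["experience_years"],
            "university_degree": applicant["university_degree"],
            "learned_coding_at": applicant["learned_coding_at"],
            "languages": applicant["languages"],
            "interview_answers": applicant["interview_answers"],
            "evaluations": applicant["evaluations"],  # three prior scores
        }
        if treatment == "gender_known":
            profile["name"] = f"{applicant['first_name']} {applicant['last_name'][0]}."
        shown.append(profile)
    return shown
```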
In a given pair, the evaluation scores shown for the applicants, referred to as the set of evaluations, may vary across evaluators. For example, in evaluation set 1, Applicant 1 could have scores of 10, 20, and 30, while Applicant 2 could have scores of 20, 80, and 100. In evaluation set 2, the scores could differ: Applicant 1 could have 80, 20, and 30, and Applicant 2 could have 70, 80, and 50. This design enables us to examine how evaluators rate the same applicant when properties of the evaluations, such as their variance, differ. All evaluations shown will be real evaluations given to that applicant; because each applicant received at least three evaluations, we can vary the evaluation scores shown to each evaluator.
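To make the example concrete, a short Python snippet computing the mean and variance of the two illustrative evaluation sets above (the numbers are those from the text, not real data):

```python
from statistics import mean, pvariance

# The two illustrative evaluation sets from the text above.
set_1 = {"Applicant 1": [10, 20, 30], "Applicant 2": [20, 80, 100]}
set_2 = {"Applicant 1": [80, 20, 30], "Applicant 2": [70, 80, 50]}

for label, ev_set in [("Set 1", set_1), ("Set 2", set_2)]:
    for applicant, scores in ev_set.items():
        print(f"{label}, {applicant}: mean={mean(scores):.1f}, variance={pvariance(scores):.1f}")
```

Note, for instance, that Applicant 2's mean is the same (66.7) in both sets while the variance differs sharply (1155.6 versus 155.6); this is the kind of variation the design exploits.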
For each pair of applicants, evaluators will choose which applicant they recommend to be hired. Then, they will be presented with the applicant they chose from each of the three pairs and asked which of those three they would recommend most. While these data will not be used in the primary analysis, this final question serves to justify presenting three pairs rather than a single pair.
To select the pairs of applicants, we use the following procedure (illustrative code sketches of the filtering and evaluation-set steps follow the list):
• Starting with the sample of applicants from Avery et al. (2023), we first drop individuals with names that are not typically Western-sounding, those with gender-neutral names, and those with fewer than three evaluations.
• We also drop applicants whose answers to the interview questions were unusually short or unusually long relative to the average.
• We then focus on the sample of applicants whose average evaluation score places them in the top part of the distribution, as this is the sample from which the selected applicant is likely to come.
• We then create two groups of pairs: exact-match pairs, where the CVs of the two applicants within the pair are identical or very similar; and trade-off pairs, where the CVs of the two applicants within the pair are close in quality but differ in areas of relative strength (e.g., one applicant might have higher education, while the other has more years of experience).
• For each type of pair, one pair is male-male and the remainder are mixed-gender pairs.
• For each pair, we create three sets of evaluation scores. We aim to select sets with a low mean difference between the two applicants in a pair while, where possible, varying the variance of the scores across sets. We also aim to keep the correlation between the mean and the variance of the scores low.
• In total, we select 10 pairs and three evaluation sets per pair.
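For concreteness, here is a hedged Python sketch of the sample-restriction steps, assuming a pandas DataFrame of the Avery et al. (2023) applicants with illustrative column names; the two-standard-deviation length cutoff and the median cutoff for the score distribution are assumptions, not registered thresholds.

```python
import pandas as pd

def restrict_sample(df: pd.DataFrame, top_quantile: float = 0.5) -> pd.DataFrame:
    """Apply the drops described in the list above; column names are illustrative."""
    out = df[df["name_typically_western"] & ~df["name_gender_neutral"]]
    out = out[out["n_evaluations"] >= 3]
    # Drop answers whose total length is an outlier relative to the average
    # (two standard deviations is an assumed cutoff).
    length = out["answer_length"]
    out = out[(length - length.mean()).abs() <= 2 * length.std()]
    # Keep applicants in the top part of the mean-evaluation distribution
    # (the quantile used here is an assumption).
    return out[out["mean_evaluation"] >= out["mean_evaluation"].quantile(top_quantile)]
```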
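Similarly, a minimal sketch of the evaluation-set step: enumerate three-score draws from each applicant's real evaluations, keep draws with a small within-pair mean difference, and spread the chosen sets across low, middle, and high variance. The mean-gap threshold and the spacing rule are illustrative assumptions.

```python
from itertools import combinations
from statistics import mean, pvariance

def admissible_sets(scores_a, scores_b, k=3, max_mean_gap=10.0):
    """All k-score draws per applicant whose within-pair mean gap is small
    (the 10-point gap is an assumed threshold)."""
    return [
        (a, b)
        for a in combinations(scores_a, k)
        for b in combinations(scores_b, k)
        if abs(mean(a) - mean(b)) <= max_mean_gap
    ]

def spread_by_variance(candidates, n=3):
    """Pick n admissible draws spanning low to high dispersion, measured as
    the variance of the six combined scores, so evaluators see the same
    pair under different score dispersion."""
    ranked = sorted(candidates, key=lambda ab: pvariance(ab[0] + ab[1]))
    idx = [0, len(ranked) // 2, len(ranked) - 1][:n]
    return [ranked[i] for i in idx]
```

Applied to each of the 10 pairs, this yields the three evaluation sets per pair; the chosen sets would then be checked for a low correlation between mean and variance, as described above.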
So as not to disadvantage the applicants who were not chosen to appear in the pairs, we will use a system similar to that of Kessler et al. (2019): the decisions made by our evaluators over the sample pairs will be used to identify the applicants in the full sample whom the evaluators would have chosen. We will then invite the applicants who are predicted to be selected for a further interview and, from there, to be hired.
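As a rough illustration of this step (not Kessler et al.'s actual estimator), one could fit a simple discrete-choice model on the evaluators' within-pair decisions and score the full applicant sample with the fitted preference weights; the feature construction below is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_choice_model(X_left, X_right, chose_left):
    """Fit P(left applicant chosen) on within-pair attribute differences.
    X_left, X_right: (n_decisions, n_features) arrays; chose_left: 0/1."""
    model = LogisticRegression()
    model.fit(X_left - X_right, chose_left)
    return model

def preference_scores(model, X_full):
    """Score every applicant in the full sample; applicants with the
    highest scores are the ones evaluators are predicted to select."""
    return np.asarray(X_full) @ model.coef_.ravel()
```

Applicants with the highest predicted scores would then be the ones invited for the further interview described above.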
Our sample will be drawn from MTurk, Prolific, and, if viable, Upwork. We plan to collect 45% of our sample from MTurk, 45% from Prolific, and 10% from Upwork, although the latter depends on our ability to recruit viable evaluators.
References
Kessler, Judd B., Corinne Low, and Colin D. Sullivan. "Incentivized resume rating: Eliciting employer preferences without deception." American Economic Review 109, no. 11 (2019): 3713-3744.