Experimental Design Details
We run our experiment on Amazon’s Mechanical Turk (MTurk), an online labor market where employers offer real-wage tasks and exercises to a large pool of potential workers (see Horton et al., 2011). MTurk allows us to reach a pool of participants that is diverse in terms of demographic characteristics (age, gender, ethnicity, education) as well as risk aversion, competitiveness, and personality traits (measured with a five-factor model as in Gerlitz and Schupp, 2005). MTurk participants are the job-applicants in our setting. Job-applicants’ quality is measured by their performance in a number-finding task: Finding the two numbers in a 3x3 matrix that add up to one hundred (see also Buser et al., 2014). The job-applicants perform the task 10 times; the faster a job-applicant is, the better her performance. Both the algorithmic and the human recruiter have imperfect information: They partially observe the job-applicants’ performance and some of their demographic characteristics (age, gender, education, ethnicity).

The experiment consists of two phases: A pilot experiment, which took place in spring 2019, and the main experiment, which will take place in June 2020. The pilot experiment is used to train the algorithmic and human recruiters. The algorithm is trained on the performance of 345 MTurk job-applicants: It is an OLS regression whose coefficients are used to predict the performance of the job-applicants who will choose the algorithmic recruiter in the main experiment. For the human recruiter, we asked 22 Utrecht University students to evaluate the performance of 83 MTurk job-applicants, each student individually evaluating half of them. For each of the 22 human recruiters, we run an OLS regression to compute the weights that recruiter assigns to the demographic characteristics and to speed. We then randomly assign one of the 22 resulting sets of weights to each job-applicant who chooses to be evaluated by a human.
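To fix ideas, the sketch below mirrors this two-step construction in Python. The file names, column names, and the numeric coding of the demographic characteristics are hypothetical illustrations of the pilot data, not the study's actual code.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(seed=0)
FEATURES = ["age", "gender", "education", "ethnicity", "speed"]  # assumed numeric coding

# Algorithmic recruiter: a single OLS fit on the 345 pilot job-applicants.
pilot = pd.read_csv("pilot_applicants.csv")            # hypothetical file name
X = sm.add_constant(pilot[FEATURES])
algo_weights = sm.OLS(pilot["performance"], X).fit().params

# Human recruiters: one OLS per student evaluator, yielding 22 weight sets.
evals = pd.read_csv("student_evaluations.csv")         # hypothetical file name
human_weights = {
    student: sm.OLS(grp["score"], sm.add_constant(grp[FEATURES])).fit().params
    for student, grp in evals.groupby("student_id")
}

def predict_score(applicant: pd.Series, weights: pd.Series) -> float:
    """Linear prediction: intercept plus weighted observable characteristics."""
    x = np.concatenate(([1.0], applicant[FEATURES].to_numpy(dtype=float)))
    return float(x @ weights.to_numpy())

# In the main experiment, an applicant who chooses the human recruiter is
# scored with one of the 22 weight sets, drawn at random.
drawn_recruiter = rng.choice(list(human_weights))
```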
In the main experiment we recruit around 500 job-applicants. As in the pilot experiment, the job-applicants perform the number-finding task and then choose between the algorithmic and the human recruiter. The computer then assigns one of the two recruiters to each job-applicant. We elicit the job-applicants’ willingness to pay for their favorite recruiter by giving them the opportunity to change the recruiter assigned by the computer. The job-applicants also have the opportunity to explain their choice of recruiter, to report their beliefs about how well the algorithmic and the human recruiter would each score them, and to explain their willingness to pay.
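The elicitation step can be summarized with a short sketch. The random assignment probability and the switching rule below are assumptions for illustration; the exact pricing mechanism is not described in this section.

```python
import random

def assign_and_elicit(preferred: str, wtp: float, switch_price: float) -> str:
    """Assign a recruiter at random, then let the applicant pay to switch.

    `switch_price` is a hypothetical parameter standing in for whatever
    pricing rule the experiment uses. Returns the recruiter that finally
    evaluates the applicant.
    """
    assigned = random.choice(["algorithm", "human"])   # assumed 50/50 draw
    if assigned != preferred and wtp >= switch_price:
        return preferred    # the applicant pays to get her favorite recruiter
    return assigned         # otherwise the computer's assignment stands
```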
We also study how job-applicants' perception of their own ability in the number-finding task affects their choice of recruiter and their willingness to pay. To do this, we randomly provide information about the median performance in the number-finding task to one sub-sample of participants (the treatment group) and withhold it from the other sub-sample (the control group).
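A minimal sketch of this information treatment follows; the 50/50 split, the wording of the message, and the median value used in the usage example are illustrative assumptions.

```python
from typing import Optional

import numpy as np

rng = np.random.default_rng(seed=1)

def information_message(median_time: float, treated: bool) -> Optional[str]:
    """Message shown before the recruiter choice, or None for the control group."""
    if treated:
        return (f"In a previous session, the median participant solved a "
                f"matrix in {median_time:.1f} seconds.")
    return None

n_applicants = 500
treated = rng.random(n_applicants) < 0.5   # assumed 50/50 random assignment

print(information_message(median_time=12.3, treated=True))   # hypothetical median
```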