Experimental Design Details
<p>Our design aims to measure the perceived and the actual gender gap in programming skill between male and female programmers. The design consists of two stages.</p>
<p>In stage 1, we will collect a measure of the actual programming skill gap for a pool of job applicants. We will post a job ad for a real programming job (1 month, 80 hours) across the United States on several job portals (e.g. joinhandshake.com, dice.com, crunchboard.com, github.com) and ask applicants to send their CV and fill out a short survey (e.g. years of programming experience). After applications close, we will invite a subset of applicants to take two tests: either a Python programming test and an aptitude test (50% of applicants) or a Python programming test and a personality test (50% of applicants). The scores on the programming test will be the basis for our measure of the programming skill gap. The tests will be administered to a conditional random sample, conditioning on applicants' gender, to ensure that we elicit similar shares of men and women in our sample.</p>
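<p>To illustrate the conditional random sampling on gender, the Python sketch below draws a gender-balanced random subset of applicants to invite to the tests. The function and field names are hypothetical and the sketch is illustrative only; it is not the registered sampling script.</p>
<pre>
import random

def sample_balanced_by_gender(applicants, n_per_gender, seed=2019):
    """Draw a gender-balanced random subset of applicants to invite to the tests.

    `applicants` is a list of dicts with at least a "gender" key;
    `n_per_gender` is the target number of invitations per gender.
    Field names and structure are illustrative only.
    """
    rng = random.Random(seed)
    invited = []
    for gender in ("female", "male"):
        pool = [a for a in applicants if a["gender"] == gender]
        k = min(n_per_gender, len(pool))      # invite everyone if the pool is small
        invited.extend(rng.sample(pool, k))
    rng.shuffle(invited)
    return invited

# Example: invite 100 applicants of each gender from a larger pool
# invited = sample_balanced_by_gender(applicant_pool, n_per_gender=100)
</pre>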
<p>In a concurrent experiment (AEA RCT Registry title: Pay for applications ) we incentivise a random sample of applicants to complete the two tests (Programming & Aptitude and Programming & Personality). We will only use the non-incentivised group for the experiment described here as the actual skill measure for this group will reflect an applicant pool when using a standard procedure (i.e. when applicants are not paid to apply). </p>
<p>In stage 2, we will collect a measure of the perceived skill gap by having a sample of potential employers guess the programming test scores of randomly selected male and female applicants, and by examining how the two information treatments affect these guesses. We explain this in more detail below.</p>
<p>We will use Qualtrics to recruit the potential employers who will guess the test scores of applicants. Our sample of potential employers consists of 1) people who work as programmers, and 2) people working in recruitment, including the recruitment of programmers (HR professionals/hiring agencies).</p>
<p>The sample of potential employers will then be block randomly assigned into six blocks: </p>
<p>1. Control / Male first </p>
<p>2. Control / Female first </p>
<p>3. Information (Personality) / Male first </p>
<p>4. Information (Personality) / Female first </p>
<p>5. Information (Aptitude) / Male first </p>
<p>6. Information (Aptitude) / Female first </p>
<p>The block random assignment ensures that each of the six blocks has the same number of potential employers. In all variants potential employers first rate profiles of one gender and then profiles of the other gender. Each potential employer will be shown 20 profiles (10 male and 10 female), randomly selected from the pool of applicants. We plan to compare the guesses for male versus female profiles across all 20 profiles. However, if we find evidence of order effects, we will perform additional analyses focusing on each potential employer's first guess and on each potential employer's first 10 guesses (recall that the first 10 profiles are all of the same gender).</p>
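<p>As an illustration of the block random assignment, the Python sketch below assigns potential employers to the six variants in complete blocks of six, so that every variant receives the same number of employers. The function name is hypothetical; the sketch is illustrative only and not the registered randomization code.</p>
<pre>
import random

ARMS = [
    "Control / Male first",
    "Control / Female first",
    "Information (Personality) / Male first",
    "Information (Personality) / Female first",
    "Information (Aptitude) / Male first",
    "Information (Aptitude) / Female first",
]

def block_randomize(employer_ids, seed=2019):
    """Assign potential employers to the six variants in complete blocks of six,
    so that every variant ends up with the same number of employers.
    Assumes len(employer_ids) is a multiple of six; illustrative only.
    """
    rng = random.Random(seed)
    assignment = {}
    for start in range(0, len(employer_ids), len(ARMS)):
        block = employer_ids[start:start + len(ARMS)]
        arms = ARMS[:]
        rng.shuffle(arms)                      # random order of arms within the block
        for employer, arm in zip(block, arms):
            assignment[employer] = arm
    return assignment

# Example: 600 employers -> 100 per variant
# assignment = block_randomize(list(range(600)))
</pre>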
<p>Figure 1 (see docs and materials) shows an illustration of the experimental procedure. All potential employers first see a description of the applicants' tests, followed by a short profile of the applicant that includes gender and other relevant factors. Employers then guess the applicant's programming test performance. Each employer will be shown 20 profiles and will make one guess per profile. Employers in the first information treatment (treatment 1) will guess applicants' test performance with additional information on the applicant's personality test score. Employers in the second information treatment (treatment 2) will guess applicants' test performance with additional information on the applicant's aptitude test score (recall that each applicant completed either the personality test (50%) or the aptitude test (50%)) [1][2]. Employers in all treatments are paid according to the accuracy of their guesses using a quadratic scoring rule. After guessing test performances, all employers fill in a short survey. The survey collects additional information related to the research (e.g. whether they have received voluntary or mandatory anti-bias training, whether they think women would perform worse on these kinds of application tests because they are competitive, whether they had a female professor at university, and whether they have female siblings).</p>
<p>
Notes:
[1] Further details about the information given to employers, including the details of the information treatments, will be provided in Appendix 2, after stage 1 data has been collected but before stage 2. <br>
[2] The aptitude and personality tests are taken from the Mettl test bank. The personality test is a version of the Mettl Personality Profiler, which reports scores on the Big 5 personality traits. The aptitude test is a 25-minute version of Mettl's General Aptitude test (see https://mettl.com/en/job-aptitude-test/). </p>
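<p>As an illustration of how payments under the quadratic scoring rule described above could be computed, the Python sketch below pays a bonus that declines with the squared (normalised) distance between the guess and the true programming test score. The bonus amount and score scale are hypothetical and not the registered payment parameters.</p>
<pre>
def quadratic_scoring_payment(guess, true_score, max_score=100.0, max_bonus=1.0):
    """Pay an employer for one guess under a quadratic scoring rule.

    The payment equals `max_bonus` for a perfect guess and falls with the
    squared normalised error; the parameters here are illustrative, not
    the registered payment scheme.
    """
    error = (guess - true_score) / max_score   # normalised error
    payment = max_bonus * (1.0 - error ** 2)   # quadratic penalty
    return max(payment, 0.0)                   # no negative bonus

# Example: guessing 70 when the true programming score is 80 pays 0.99
print(round(quadratic_scoring_payment(70, 80), 3))
</pre>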
<p>
[Update September 25th, 2019]
</p>
<p>
We have fewer than expected female applicants, which has prompted us to make two amendments to our experimental design. First, we will oversample male applicants relative to female applicants at a ratio of 3 to 1 to increase our sample size. Second, we will collect additional applicant data by posting a second job. This job will be posted on the same job sites and will contain the same job description. We will split the sample of job 1 and job 2 (50% of each gender) between this project and the incentive treatment in the pay for applications project (AEARCTR-0004625). We decided to take this additional step in the data collection before inviting any applicants to take the tests, to ensure that our data collection efforts are driven exclusively by concerns about low sample size and not by our results.
</p>
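<p>A minimal Python sketch of the amended invitation procedure is given below, assuming hypothetical applicant lists and function names: male applicants are drawn at three times the number of female applicants, and each gender is then split evenly between this project and the pay for applications project. This is illustrative only and not the registered procedure.</p>
<pre>
import random

def amended_invitation_split(male_pool, female_pool, seed=2019):
    """Oversample male applicants 3:1 relative to female applicants and split
    each gender 50/50 between this project and the pay for applications
    project (AEARCTR-0004625). Pool names and structure are illustrative only.
    """
    rng = random.Random(seed)
    n_male = min(3 * len(female_pool), len(male_pool))   # 3:1 oversampling of males
    males = rng.sample(male_pool, n_male)
    females = list(female_pool)

    def halve(group):
        rng.shuffle(group)
        half = len(group) // 2
        return group[:half], group[half:]

    male_here, male_pay = halve(males)
    female_here, female_pay = halve(females)
    return {"this_project": male_here + female_here,
            "pay_for_applications": male_pay + female_pay}
</pre>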