Understanding Skill Gap

Last registered on December 07, 2020


Trial Information

General Information

Understanding Skill Gap
Initial registration date
August 26, 2019

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
August 30, 2019, 10:25 AM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Last updated
December 07, 2020, 9:56 AM EST

Last updated is the most recent time when changes to the trial's registration were published.



Primary Investigator

Gothenburg University

Other Primary Investigator(s)

PI Affiliation
Monash University
PI Affiliation
Victoria University of Wellington
PI Affiliation
Griffith University

Additional Trial Information

Start date
End date
Secondary IDs
In this project, we study whether employers believe women are less skilled programmers and whether these beliefs are justified. We also study different ways of reducing potential misconceptions about women’s skill.
External Link(s)

Registration Citation

Feld, Jan et al. 2020. "Understanding Skill Gap." AEA RCT Registry. December 07. https://doi.org/10.1257/rct.4227
Experimental Details


Intervention Start Date
Intervention End Date

Primary Outcomes

Primary Outcomes (end points)
Perceived gender skill gap: difference in guesses of programming test scores of randomly selected male and female applicants. These guesses come from our sample of potential employers in the control group.

Actual gender skill gaps: difference in programming test scores of randomly selected male and female applicants.

Gender bias: Difference between perceived and actual gender skill gap (as described above).

Perceived Gender Gap with Second Order Beliefs
The average difference in second order beliefs (i.e. what employers believe other employers guessed for that profile) of male profiles and female profiles. These second order beliefs can also be considered as a descriptive norm.

Gender Bias in Second Order Beliefs
Difference between gender gap in second order beliefs and actual gender skill gap (as described above).
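
The outcome definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sign convention (male mean minus female mean), the function name `gender_gap`, and all numbers are hypothetical and are not taken from the registration or from study data. The same function would apply to second order beliefs by passing those guesses instead.

```python
from statistics import mean

def gender_gap(male_values, female_values):
    """Mean of the male values minus mean of the female values (assumed sign convention)."""
    return mean(male_values) - mean(female_values)

# Hypothetical illustrative numbers, NOT study data.
actual_scores = {"male": [60, 70, 80], "female": [65, 70, 75]}
guessed_scores = {"male": [70, 75, 80], "female": [60, 65, 70]}

# Actual gap: difference in test scores between male and female applicants.
actual_gap = gender_gap(actual_scores["male"], actual_scores["female"])
# Perceived gap: difference in employers' guesses for male and female profiles.
perceived_gap = gender_gap(guessed_scores["male"], guessed_scores["female"])
# Gender bias: perceived gap minus actual gap.
gender_bias = perceived_gap - actual_gap
```

With these made-up inputs, a positive `gender_bias` would mean that employers' guesses favour male applicants by more than the actual scores do.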

Primary Outcomes (explanation)
The actual skill measure is based on a Python programming skills test. In this test, applicants are asked to complete two coding tasks. We use multiple metrics to assess the code in order to determine the programming skill of the candidates. We outline in detail in Appendix 1 (see docs and materials) how the final score will be calculated.

The perceived skill measure is based on the potential employers’ incentive compatible guesses of applicants’ scores on the programming test.

We will add Appendix 2 which will contain details of the elicitation of the guesses and how the perceived skill gaps will be calculated before stage 2 (i.e., before employers make their guesses) of the experiment (see below).

Secondary Outcomes

Secondary Outcomes (end points)
We will explore secondary questions about whether perceptions of skill differ depending on whether the potential employers’ job functions are human-resource related or programming oriented. Human resource professionals may have more experience and training in hiring; however, fellow programmers may have a better understanding of, and more experience with, evaluating programmers’ actual skills and on-the-job performance.

Further, we will investigate heterogeneity in the perceived and actual skill gaps. For instance, we will study the age skill gap gradient and the number of years of experience skill gap gradient.

We will collect other predictors of performance from the applicants’ job applications, such as CVs (education, experience), university grade point average, competitiveness and risk, and results on the aptitude and personality tests which they complete as part of the application process.

Finally, to measure experimenter demand effects, we will follow de Quidt et al. (2018) and include a few questions at the end of the employer survey.

de Quidt, Jonathan, Johannes Haushofer, and Christopher Roth. 2018. "Measuring and Bounding Experimenter Demand." American Economic Review, 108 (11): 3266-3302.
Secondary Outcomes (explanation)
We recruit participants who are programmers and participants whose role is more human-resource driven and are not necessarily programmers.

Experimental Design

Experimental Design
Our design aims to measure the perceived and actual gender skill gaps between male and female programmers.
Experimental Design Details
Our design aims to measure the perceived and actual gender skill gaps between male and female programmers. The design consists of two stages.

In stage 1, we will collect a measure of the actual programming skill gap for a pool of job applicants. We will post a job ad for a real programming job (1 month, 80 hours) across the United States at several job portals (e.g. joinhandshake.com, dice.com, crunchboard.com, github.com) and ask job applicants to send their CV and fill out a short survey (e.g. years of programming experience). After applications close, we will invite a subset of applicants to take part in two tests: either a Python programming test and an aptitude test (50% of applicants), or a Python programming test and a personality test (50% of applicants). The scores on the programming test will be the basis for our measure of the programming skill gap. The test will be administered to a conditional random sample based on applicants’ gender to ensure that we obtain similar shares of men and women in our sample.

In a concurrent experiment (AEA RCT Registry title: Pay for applications), we incentivise a random sample of applicants to complete the two tests (Programming & Aptitude and Programming & Personality). We will only use the non-incentivised group for the experiment described here, as the actual skill measure for this group will reflect an applicant pool under a standard procedure (i.e. when applicants are not paid to apply).

In stage 2, we will collect a measure of the perceived skill gap by having a sample of potential employers guess the programming test scores of randomly selected male and female applicants, and we will see how the two treatments affect these guesses. We explain this in more detail below.

We will use Qualtrics to recruit the potential employers to take part in guessing the test scores of applicants. Our sample of potential employers consists of 1) people who work as programmers, and 2) people working in recruitment, including that of programmers (HR professionals/hiring agencies).

The sample of potential employers will then be block randomly assigned into six blocks:
1. Control / Male first
2. Control / Female first
3. Information (Personality) / Male first
4. Information (Personality) / Female first
5. Information (Aptitude) / Male first
6. Information (Aptitude) / Female first

The block random assignment ensures that each of the six blocks has the same number of potential employers. In all variants, employers first rate profiles of one gender and then profiles of the other gender. Each potential employer will be shown 20 profiles (10 male and 10 female profiles). The profiles will be randomly selected from the pool of applicants. We plan to compare the guesses (male versus female) of all 20 profiles. However, if we find evidence of order effects, we will perform additional analyses focusing on the first guess of each potential employer and the first 10 guesses of each potential employer (recall that the first 10 profiles are all of the same gender).
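
The block random assignment and profile ordering described above could be sketched as follows. This is a minimal illustration, not the procedure actually used: the function names `block_assign` and `draw_profiles`, the seeds, and the shuffle-then-deal approach are assumptions.

```python
import random
from collections import Counter

# The six randomization blocks listed in the registration.
BLOCKS = [
    ("Control", "Male first"),
    ("Control", "Female first"),
    ("Information (Personality)", "Male first"),
    ("Information (Personality)", "Female first"),
    ("Information (Aptitude)", "Male first"),
    ("Information (Aptitude)", "Female first"),
]

def block_assign(employer_ids, blocks=BLOCKS, seed=0):
    """Shuffle employers, then deal them out to blocks in turn.

    Block sizes are equal whenever the number of employers is a
    multiple of the number of blocks (hypothetical helper).
    """
    rng = random.Random(seed)
    shuffled = list(employer_ids)
    rng.shuffle(shuffled)
    return {emp: blocks[i % len(blocks)] for i, emp in enumerate(shuffled)}

def draw_profiles(male_pool, female_pool, order, n_per_gender=10, rng=None):
    """Draw 10 male and 10 female profiles; show one gender first."""
    rng = rng or random.Random()
    males = rng.sample(male_pool, n_per_gender)
    females = rng.sample(female_pool, n_per_gender)
    return males + females if order == "Male first" else females + males

assignment = block_assign(range(450))
block_sizes = Counter(assignment.values())
```

With 450 employers, `block_sizes` contains 75 employers in each of the six blocks.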

Figure 1 (see docs and materials) illustrates the experimental procedure. All potential employers first see a description of the applicants’ test followed by a short profile of the applicant that includes gender and other relevant factors. Employers will then guess the applicant’s programming test performance. Employers will be shown 20 profiles and will make one guess for each profile. The employers in the first information treatment (treatment 1) will guess applicants’ test performance with additional information on applicants’ personality. The employers in the second information treatment (treatment 2) will guess applicants’ test performance with additional information on applicants’ scores on the aptitude test (either the applicant’s score on a personality test (50%) or on an aptitude test (50%)) [1][2]. Employers in all treatments are paid according to the accuracy of their guesses using the quadratic scoring rule. After guessing test performances, all employers fill in a short survey. The survey collects additional information related to the research (e.g. whether they have received voluntary/mandatory anti-bias training, whether they think women would perform worse on these kinds of application tests because the tests are competitive, whether they had a female professor at university, whether they have female siblings).
[1] Further details about the information given to employers, including the details on the information treatment, will be provided in Appendix 2, after stage 1 data has been collected but before stage 2.
[2] The aptitude and personality tests are taken from the Mettl test bank. The personality test is a version of the Mettl Personality Profiler, which shows the scores on the Big 5 personality traits. The aptitude test is a 25-minute version of Mettl’s General Aptitude test (see https://mettl.com/en/job-aptitude-test/).
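
To illustrate how a quadratic scoring rule rewards accurate guesses, the sketch below shows one common form for point guesses. It is an assumption, not the payment scheme used in the study: the function name `quadratic_payment`, the maximum bonus, and the 0 to 100 score range are hypothetical, and the actual elicitation details are deferred to Appendix 2.

```python
def quadratic_payment(guess, actual, max_bonus=1.0, score_range=100.0):
    """Pay the full bonus for an exact guess and less as the squared error grows.

    max_bonus and score_range are hypothetical parameters chosen for
    illustration only.
    """
    error = (guess - actual) / score_range  # error as a share of the score range
    return max(0.0, max_bonus * (1.0 - error ** 2))

exact = quadratic_payment(70, 70)      # guess equals the true score
off_by_ten = quadratic_payment(60, 70) # guess misses by 10 points
```

Because the penalty grows with the square of the error, large misses cost disproportionately more than small ones, and over- and under-estimates of the same size are paid the same.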
[Update September 25th, 2019]
We have fewer female applicants than expected, which has prompted us to make two amendments to our experimental design. First, we will oversample male applicants relative to female applicants in a ratio of 3 to 1 to increase our sample size. Second, we will collect additional applicant data by posting a second job. This job will be posted on the same job sites and will contain the same job description. We will split the sample of job 1 and job 2 (50% of each gender) between this project and the incentive treatment in the pay for applicants project (AEARCTR-0004625). We decided on this additional data collection step before inviting any applicants to take the tests to ensure that our data collection efforts are not driven by our results but exclusively by concerns about low sample size.
Randomization Method
Randomization will be carried out by a computer.
Randomization Unit
The randomization unit will be the individual for all treatments.
Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters
For stage 1, we plan to invite up to 360 applicants to complete the Python skills assessment. All applicants will be given the test or, if there are more than 360 applicants, a conditional random sample of applicants will be given the test (depending on the number and gender composition of the applicants). The number of observations depends on the number of applicants for the programming job as well as the share of invited applicants who actually complete the assessments.

For stage 2, we plan to have between 450 and 780 potential employers (50% programmers and 50% HR professionals) depending on availability and funding. We will decide on the exact number of potential employers before the start of stage 2. This will be stated in Appendix 2.
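
The stage 1 invitation rule (invite everyone, or a gender-conditional random sample if there are more than 360 applicants) could look roughly like the sketch below. It assumes equal target shares of men and women, as in the original plan; the function name `conditional_sample`, the data layout, and the seed are hypothetical.

```python
import random

def conditional_sample(applicants, n_invites=360, seed=0):
    """Return the applicant IDs to invite.

    applicants: list of (applicant_id, gender) tuples, gender in {"male", "female"}.
    If there are at most n_invites applicants, everyone is invited; otherwise
    up to n_invites / 2 applicants are drawn at random within each gender
    (equal shares are an assumption of this sketch).
    """
    if len(applicants) <= n_invites:
        return [applicant_id for applicant_id, _ in applicants]
    rng = random.Random(seed)
    per_gender = n_invites // 2
    invited = []
    for gender in ("male", "female"):
        ids = [applicant_id for applicant_id, g in applicants if g == gender]
        # If a gender has fewer applicants than the target, invite all of them.
        invited += rng.sample(ids, min(per_gender, len(ids)))
    return invited

# Hypothetical applicant pool: 500 men and 200 women.
pool = [(f"m{i}", "male") for i in range(500)] + [(f"f{i}", "female") for i in range(200)]
invited = conditional_sample(pool)
```

If one gender falls short of its target, the sketch simply invites fewer people in total rather than rebalancing, which is one of several possible design choices.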
Sample size: planned number of observations
For the applicant sample (stage 1), the number of observations is the same as the number of clusters. We hope to have up to 360 observations (180 male applicants and 180 female applicants). For the potential employer sample (stage 2), each employer will guess 20 different profiles (10 for each gender). This leaves us with between 9,000 and 15,600 employer-applicant-profile observations (depending on the number of potential employers). For parts of the analysis where we focus on between-employer comparisons (due to order effects), we will only use the guesses for the first gender shown to each employer, which leaves us with a maximum of 7,800 employer-applicant-profile observations.
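
The observation counts follow directly from the planned number of employers and the number of profiles each one rates. A short check, with the helper name `n_observations` being hypothetical:

```python
def n_observations(n_employers, profiles_per_employer):
    """Employer-applicant-profile observations: one guess per profile shown."""
    return n_employers * profiles_per_employer

lower_bound = n_observations(450, 20)        # smallest planned employer sample
upper_bound = n_observations(780, 20)        # largest planned employer sample
first_gender_only = n_observations(780, 10)  # only the first 10 (same-gender) guesses
```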
Sample size (or number of clusters) by treatment arms
In the first stage, we plan to have up to 180 applicants (90 male and 90 female) who complete the Python test and the personality test, and the same number who complete the Python test and the aptitude test.

In the second stage, each randomization block will have between 75 and 130 potential employers who will make 20 guesses each. The final number will depend on the availability of participants and budget [1]. The number of observations will be the same in each randomization block. These are the 6 randomization blocks:

1. Control / Male first
2. Control / Female first
3. Information (Personality) / Male first
4. Information (Personality) / Female first
5. Information (Aptitude) / Male first
6. Information (Aptitude) / Female first

[1] Further details about the information given to employers, including the details on the information treatment, will be provided in Appendix 2, after stage 1 data has been collected but before stage 2.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Supporting Documents and Materials


Document Name
Appendix 2
Document Type
Document Description
As mentioned in the original pre-analysis plan, this document contains Appendix 2, which was submitted prior to stage 2 data collection.

MD5: 78115101ef07de2dea4ab265129338c8

SHA1: 8b63d7832403cc2267e1335b64b0474aa4e96349

Uploaded At: December 07, 2020

Document Name
Calculating the test score
Document Type
Document Description
Detailed information on how the test score is calculated

MD5: 226b6ac5f2819a577c3aa36ee5618760

SHA1: 86836b294880ebc061cbb463950023c73715bb38

Uploaded At: August 26, 2019

Document Name
Figure 1
Document Type
Document Description
Figure 1

MD5: 4e30c1917041871faa20a42ca9b69b5e

SHA1: f64951abe4e3e8cf3e08700f75d8709ffe168aab

Uploaded At: August 26, 2019


Institutional Review Boards (IRBs)

IRB Name
Monash University
IRB Approval Date
IRB Approval Number


Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.


Is the intervention completed?
Data Collection Complete
Data Publication

Data Publication

Is public data available?

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials