Experimental Design Details
<p>Our design aims to measure the perceived and the actual gender gap in programming skill between male and female programmers. The design consists of two stages.</p>
<p>In stage 1, we will collect a measure of the actual programming skill gap for a pool of job applicants. We will post a job ad for a real programming job (1 month, 80 hours) across the United States on several job portals (e.g. joinhandshake.com, dice.com, crunchboard.com, github.com) and ask applicants to send their CV and fill out a short survey (e.g. years of programming experience). After applications close, we will invite a subset of applicants to take two tests: either a Python programming test and an aptitude test (50% of applicants) or a Python programming test and a personality test (50% of applicants). The scores on the programming test will be the basis for our measure of the programming skill gap. The tests will be administered to a conditional random sample, conditioning on applicants' gender, to ensure that we elicit similar shares of men and women in our sample.</p>
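<p>To illustrate the conditional random sampling on gender, the Python sketch below draws a gender-balanced random subset of applicants to invite to the tests. The function and field names are hypothetical and the sketch is illustrative only; it is not the registered sampling script.</p>
<pre>
import random

def sample_balanced_by_gender(applicants, n_per_gender, seed=2019):
    """Draw a gender-balanced random subset of applicants to invite to the tests.

    `applicants` is a list of dicts with at least a "gender" key;
    `n_per_gender` is the target number of invitations per gender.
    Field names and structure are illustrative only.
    """
    rng = random.Random(seed)
    invited = []
    for gender in ("female", "male"):
        pool = [a for a in applicants if a["gender"] == gender]
        k = min(n_per_gender, len(pool))      # invite everyone if the pool is small
        invited.extend(rng.sample(pool, k))
    rng.shuffle(invited)
    return invited

# Example: invite 100 applicants of each gender from a larger pool
# invited = sample_balanced_by_gender(applicant_pool, n_per_gender=100)
</pre>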
<p>In a concurrent experiment (AEA RCT Registry title: Pay for applications ) we incentivise a random sample of applicants to complete the two tests (Programming & Aptitude and Programming & Personality). We will only use the non-incentivised group for the experiment described here as the actual skill measure for this group will reflect an applicant pool when using a standard procedure (i.e. when applicants are not paid to apply). </p>
<p>In stage 2, we will collect a measure of the perceived skill gap by having a sample of potential employers guess the programming test scores of randomly selected male and female applicants, and by examining how the two information treatments affect these guesses. We explain this in more detail below.</p>
<p>We will use Qualtrics to recruit the potential employers who will guess the test scores of applicants. Our sample of potential employers consists of 1) people who work as programmers, and 2) people working in recruitment, including the recruitment of programmers (HR professionals/hiring agencies).</p>
<p>The sample of potential employers will then be block randomly assigned into six blocks: </p>
<p>1. Control / Male first </p>
<p>2. Control / Female first </p>
<p>3. Information (Personality) / Male first </p>
<p>4. Information (Personality) / Female first </p>
<p>5. Information (Aptitude) / Male first </p>
<p>6. Information (Aptitude) / Female first </p>
<p>The block random assignment ensures that each of the six blocks has the same number of potential employers. In all variants potential employers first rate profiles of one gender and then profiles of the other gender. Each potential employer will be shown 20 profiles (10 male and 10 female), randomly selected from the pool of applicants. We plan to compare the guesses for male versus female profiles across all 20 profiles. However, if we find evidence of order effects, we will perform additional analyses focusing on each potential employer's first guess and on each potential employer's first 10 guesses (recall that the first 10 profiles are all of the same gender).</p>
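<p>As an illustration of the block random assignment, the Python sketch below assigns potential employers to the six variants in complete blocks of six, so that every variant receives the same number of employers. The function name is hypothetical; the sketch is illustrative only and not the registered randomization code.</p>
<pre>
import random

ARMS = [
    "Control / Male first",
    "Control / Female first",
    "Information (Personality) / Male first",
    "Information (Personality) / Female first",
    "Information (Aptitude) / Male first",
    "Information (Aptitude) / Female first",
]

def block_randomize(employer_ids, seed=2019):
    """Assign potential employers to the six variants in complete blocks of six,
    so that every variant ends up with the same number of employers.
    Assumes len(employer_ids) is a multiple of six; illustrative only.
    """
    rng = random.Random(seed)
    assignment = {}
    for start in range(0, len(employer_ids), len(ARMS)):
        block = employer_ids[start:start + len(ARMS)]
        arms = ARMS[:]
        rng.shuffle(arms)                      # random order of arms within the block
        for employer, arm in zip(block, arms):
            assignment[employer] = arm
    return assignment

# Example: 600 employers -> 100 per variant
# assignment = block_randomize(list(range(600)))
</pre>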
<p>Figure 1 (see docs and materials) shows an illustration of the experimental procedure. All potential employers first see a description of the applicants' tests, followed by a short profile of the applicant that includes gender and other relevant factors. Employers then guess the applicant's programming test performance. Each employer will be shown 20 profiles and will make one guess per profile. Employers in the first information treatment (treatment 1) will guess applicants' test performance with additional information on the applicant's personality test score. Employers in the second information treatment (treatment 2) will guess applicants' test performance with additional information on the applicant's aptitude test score (recall that each applicant completed either the personality test (50%) or the aptitude test (50%)) [1][2]. Employers in all treatments are paid according to the accuracy of their guesses using a quadratic scoring rule. After guessing test performances, all employers fill in a short survey. The survey collects additional information related to the research (e.g. whether they have received voluntary or mandatory anti-bias training, whether they think women would perform worse on these kinds of application tests because they are competitive, whether they had a female professor at university, and whether they have female siblings).</p>
<p>
Notes:
[1] Further details about the information given to employers, including the details of the information treatments, will be provided in Appendix 2, after stage 1 data has been collected but before stage 2. <br>
[2] The aptitude and personality tests are taken from the Mettl test bank. The personality test is a version of the Mettl Personality Profiler, which reports scores on the Big 5 personality traits. The aptitude test is a 25-minute version of Mettl's General Aptitude test (see https://mettl.com/en/job-aptitude-test/). </p>
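<p>As an illustration of how payments under the quadratic scoring rule described above could be computed, the Python sketch below pays a bonus that declines with the squared (normalised) distance between the guess and the true programming test score. The bonus amount and score scale are hypothetical and not the registered payment parameters.</p>
<pre>
def quadratic_scoring_payment(guess, true_score, max_score=100.0, max_bonus=1.0):
    """Pay an employer for one guess under a quadratic scoring rule.

    The payment equals `max_bonus` for a perfect guess and falls with the
    squared normalised error; the parameters here are illustrative, not
    the registered payment scheme.
    """
    error = (guess - true_score) / max_score   # normalised error
    payment = max_bonus * (1.0 - error ** 2)   # quadratic penalty
    return max(payment, 0.0)                   # no negative bonus

# Example: guessing 70 when the true programming score is 80 pays 0.99
print(round(quadratic_scoring_payment(70, 80), 3))
</pre>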
<p>
[Update September 25th, 2019]
</p>
<p>
We have fewer than expected female applicants, which has prompted us to make two amendments to our experimental design. First, we will oversample male applicants relative to female applicants at a ratio of 3 to 1 to increase our sample size. Second, we will collect additional applicant data by posting a second job. This job will be posted on the same job sites and will contain the same job description. We will split the sample of job 1 and job 2 (50% of each gender) between this project and the incentive treatment in the pay for applications project (AEARCTR-0004625). We decided to take this additional step in the data collection before inviting any applicants to take the tests, to ensure that our data collection efforts are driven exclusively by concerns about low sample size and not by our results.
</p>
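<p>A minimal Python sketch of the amended invitation procedure is given below, assuming hypothetical applicant lists and function names: male applicants are drawn at three times the number of female applicants, and each gender is then split evenly between this project and the pay for applications project. This is illustrative only and not the registered procedure.</p>
<pre>
import random

def amended_invitation_split(male_pool, female_pool, seed=2019):
    """Oversample male applicants 3:1 relative to female applicants and split
    each gender 50/50 between this project and the pay for applications
    project (AEARCTR-0004625). Pool names and structure are illustrative only.
    """
    rng = random.Random(seed)
    n_male = min(3 * len(female_pool), len(male_pool))   # 3:1 oversampling of males
    males = rng.sample(male_pool, n_male)
    females = list(female_pool)

    def halve(group):
        rng.shuffle(group)
        half = len(group) // 2
        return group[:half], group[half:]

    male_here, male_pay = halve(males)
    female_here, female_pay = halve(females)
    return {"this_project": male_here + female_here,
            "pay_for_applications": male_pay + female_pay}
</pre>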