Gender gaps in exam performance at the top

Last registered on January 03, 2023


Trial Information

General Information

Gender gaps in exam performance at the top
Initial registration date
January 02, 2023


First published
January 03, 2023, 5:31 PM EST




Primary Investigator

Norwegian School of Economics

Other Primary Investigator(s)

PI Affiliation

Additional Trial Information

In development
Start date
End date
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
The starting point for the current research project is to understand why there is a gender gap at the top of the performance distribution for a high-stakes college entrance exam in Colombia. It has been previously documented that women’s academic performance declines when stakes are high (Cai et al., 2019) and that this has consequences for their admission to their preferred academic program (Jurajda & Münich, 2011). Some research has looked at gender differences in willingness to take risks and in self-confidence to explain the gender gap (Riener & Wagner, 2018), and a number of studies have documented that women are less willing to guess in exams and therefore leave more questions unanswered, resulting in lower test scores (Baldiga, 2014; Coffman & Klinowski, 2020). In this paper, we investigate whether differences in performance between top-performing women and men result from gender differences in their test-taking strategies. The current pre-analysis plan details the treatments proposed for a pilot which will be run before scaling up the intervention in spring 2022.
External Link(s)

Registration Citation

Franco, Catalina and Ingvild Lindgren Skarpeid. 2023. "Gender gaps in exam performance at the top." AEA RCT Registry. January 03.
Experimental Details


In this pilot we study gender differences in test-taking strategies under pressure. We simulate a real-life situation of an important university entrance exam where the participant is either close to, or far away from, a cutoff that can give her access to the academic program of her preference. The number of allowed mistakes mimics how far students are from a cutoff in a test. Being close to a cutoff means that any mistake is costly because students may just miss the cutoff to get access to their preferred academic option. Being far from a cutoff means that mistakes are less costly because it is unlikely that the outcome of the test changes whether they get access to their preferred academic option.

We will test the effect of this manipulation and also test for possible incentive concerns that must be addressed before scaling up the experiment to a larger sample.
Intervention Start Date
Intervention End Date

Primary Outcomes

Primary Outcomes (end points)
Fraction of participants missing the cutoff, by gender.
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
The main purpose of the pilot is to test an experimental simulation of being near or far from the cutoff. We create a setting where the student is close to a cutoff by manipulating the number of mistakes she can make before losing her experimental earnings. In the treatment where few mistakes are allowed, or "close to the cutoff", every mistake is costly: one mistake might move the student from just above to just below the cutoff and cost her the experimental earnings. In the "far from the cutoff" treatment, many more mistakes are allowed. We also manipulate when in the test the students face a very hard question that they are likely to get wrong, to see whether test strategies are affected by whether the hard question, whose position is randomized, appears early or late in the test.
Experimental Design Details
We first run a pilot where we test our proposed level of allowed mistakes before losing experimental earnings. We also test whether the level of incentives has an effect on performance.

The pilot will consist of two work parts and a questionnaire.

The questionnaire will be divided into two parts: a pre-survey demographic questionnaire, which is filled out as students sign up for the experiment, and a post-survey questionnaire. The post-experiment survey measures several factors: exam preparation, subject ability, motivation in the task, and anxiety in the task.

In the work parts, participants are asked to solve seven mathematical tasks in five minutes. The questions are similar to the Raven's test, where the participants see a sequence of images and must complete the pattern by choosing one of five alternatives. After choosing an option, participants answer questions about their confidence in their answers and whether they guessed or not.

During both part one and part two, a clock with the remaining time is visible on participants' screens. To simulate the real examination, participants also have the possibility of skipping questions and returning to them at the end of the test. At the end of each part they see a screen with thumbnails of each question, and they can click on a thumbnail to return to a question they skipped. After each submitted response, participants are taken to a new page where the timer is stopped, and they have twenty seconds to answer how confident they feel that their submitted answer was correct. If the timer runs out before participants have answered a question, that response is coded as wrong. There is no punishment for wrong answers.

After the first work part, the participants face one of the treatment conditions, which are explained in detail below. After finishing this, they answer a post-survey questionnaire.

Near versus far from cutoff
In the near-cutoff condition, if the participants get two or more questions wrong, they obtain a payoff of zero. That is, they can get a maximum of two questions wrong before they "miss the cutoff". In the far-from-cutoff condition, they can get up to 5 questions wrong before they fall below the cutoff. In this way we are able to study whether the cost of making a mistake has an effect on the performance of the participants. Participants do not know whether they have answered correctly until the test is completed.
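
The payoff rule described here can be illustrated with a minimal Python sketch. It is not the experiment's actual software: the function names, the bonus amount, and the near-cutoff threshold used in the example are assumptions for illustration, and the number of allowed mistakes is kept as a parameter so that the values from the pre-analysis plan can be plugged in. Unanswered questions are counted as wrong, as stated in the design.

```python
from typing import Optional, Sequence


def count_wrong(responses: Sequence[Optional[int]], answer_key: Sequence[int]) -> int:
    """Count wrong answers; an unanswered question (None) is coded as wrong."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    return sum(1 for given, correct in zip(responses, answer_key) if given != correct)


def payoff(num_wrong: int, max_wrong_allowed: int, bonus: float) -> float:
    """Pay the full bonus if the participant stays above the cutoff, else zero."""
    if num_wrong < 0 or max_wrong_allowed < 0:
        raise ValueError("counts must be non-negative")
    return bonus if num_wrong <= max_wrong_allowed else 0.0


# Illustrative values only (assumptions, not taken from the registration).
answer_key = [3, 1, 4, 2, 5, 1, 2]       # seven tasks, five alternatives each
responses = [3, 1, 4, None, 5, 2, 2]     # one skipped, one incorrect
wrong = count_wrong(responses, answer_key)

near_payoff = payoff(wrong, max_wrong_allowed=1, bonus=10.0)  # hypothetical near threshold
far_payoff = payoff(wrong, max_wrong_allowed=5, bonus=10.0)   # up to 5 wrong allowed
```

With the same two mistakes, the participant in this example loses the bonus under the strict threshold and keeps it under the lenient one, which is the contrast the treatment is designed to create.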

One main concern is that the above treatments may induce different behaviour because they have different expected payoffs. We want to test whether any behavioural differences we observe are driven by incentive levels rather than by the pressure from costly mistakes. We propose to test for incentive concerns in the following way:

Incentive manipulation 1
First, we address the fact that in the treatments where mistakes are costly, the expected payoff for each participant is lower than in those treatments where mistakes are less costly. To address these concerns, we introduce one treatment where we increase the bonus payment in the "near-cutoff" condition such that its expected payoff is equal to that of the "far-from-cutoff" condition. The calculations for this are shown in the attached pre-analysis plan.
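
One way such an equalization could be computed is sketched below in Python. This is not the calculation from the pre-analysis plan, which is not reproduced here. It assumes, purely for illustration, that each of the seven questions is answered correctly independently with the same probability `p_correct`, so the number of mistakes is binomial; the function names and all numeric inputs are hypothetical.

```python
from math import comb


def prob_pass(n_questions: int, max_wrong_allowed: int, p_correct: float) -> float:
    """P(number of wrong answers <= max_wrong_allowed) under a binomial model."""
    q = 1.0 - p_correct
    upper = min(max_wrong_allowed, n_questions)
    return sum(
        comb(n_questions, j) * q**j * p_correct ** (n_questions - j)
        for j in range(upper + 1)
    )


def equalizing_bonus(bonus_far: float, n_questions: int, near_allowed: int,
                     far_allowed: int, p_correct: float) -> float:
    """Near-cutoff bonus that gives the same expected payoff as the far condition."""
    p_near = prob_pass(n_questions, near_allowed, p_correct)
    p_far = prob_pass(n_questions, far_allowed, p_correct)
    if p_near == 0.0:
        raise ValueError("passing the near cutoff has probability zero")
    return bonus_far * p_far / p_near


# Hypothetical inputs: bonus of 10, thresholds of 1 and 5 mistakes, p_correct = 0.7.
p_near = prob_pass(7, 1, 0.7)
p_far = prob_pass(7, 5, 0.7)
bonus_near = equalizing_bonus(10.0, 7, 1, 5, 0.7)
```

Because passing the near cutoff is less likely than passing the far cutoff, the equalizing bonus is always at least as large as the far-condition bonus under this model.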

Incentive manipulation 2
Second, we add a treatment where participants are paid their experimental earnings only with a ten percent probability. This is to address concerns that the stakes will be too low in the main survey experiment. We will not have the funding to pay every participant their full experimental earnings in the main survey experiment, so this treatment tests for any behavioural effects on a sample similar to the main experiment sample.
Randomization Method
Computer randomized
Randomization Unit
Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters
For the pilot, approximately 400 individuals who are current students at Universidad Nacional de Colombia. For the main intervention we will recruit as many as possible from a sample of 3,000 applicants to the 2020 cycle at Universidad Nacional de Colombia who scored either just above or just below the cutoff of their preferred study field.
Sample size: planned number of observations
For the pilot, approximately 400 individuals who are current students at Universidad Nacional de Colombia. For the main intervention we will recruit as many as possible from a sample of 3,000 applicants to the 2020 cycle at Universidad Nacional de Colombia who scored either just above or just below the cutoff of their preferred study field.
Sample size (or number of clusters) by treatment arms
We will have five treatments in the pilot: 80 students in each treatment arm.
In the experiment we will have four treatments and aim for approximately 750 in each treatment arm.
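
A balanced computer randomization consistent with the pilot numbers (about 400 participants, five arms, 80 per arm) could look like the Python sketch below. The arm labels, participant identifiers, and seed are placeholders, and the procedure shown (shuffling a balanced list of labels) is an assumption, not necessarily the method used in the study.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def assign_arms(participant_ids: Sequence[str], arms: Sequence[str],
                seed: int) -> Dict[str, str]:
    """Assign participants to arms of equal size using a seeded shuffle."""
    n, k = len(participant_ids), len(arms)
    if k == 0 or n % k != 0:
        raise ValueError("number of participants must be a multiple of the number of arms")
    # Build a balanced list of labels, then shuffle it reproducibly.
    labels: List[str] = [arm for arm in arms for _ in range(n // k)]
    rng = random.Random(seed)
    rng.shuffle(labels)
    return dict(zip(participant_ids, labels))


# Placeholder identifiers and arm names (hypothetical).
ids = [f"participant_{i:03d}" for i in range(400)]
arms = ["arm_1", "arm_2", "arm_3", "arm_4", "arm_5"]
assignment = assign_arms(ids, arms, seed=2023)
arm_sizes = Counter(assignment.values())
```

Fixing the seed makes the assignment reproducible, which can help when the randomization needs to be audited later.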
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)

Institutional Review Boards (IRBs)

IRB Name
IRB Approval Date
IRB Approval Number
Analysis Plan

There is information in this trial unavailable to the public.


Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.


Is the intervention completed?
Data Collection Complete
Data Publication

Data Publication

Is public data available?

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials