Gender Gaps at the Top: Exam Performance and Choking Under Pressure

Last registered on June 28, 2023


Pre-Trial

Trial Information

General Information

Title

Gender Gaps at the Top: Exam Performance and Choking Under Pressure

RCT ID

AEARCTR-0011604

Initial registration date

June 23, 2023


First published

June 28, 2023, 4:31 PM EDT


Locations

Country

United States of America

Region

online

Primary Investigator

Name

Ingvild Lindgren Skarpeid

Affiliation

Norwegian School of Economics


Other Primary Investigator(s)

PI Name

Catalina Franco

PI Affiliation

Center for Applied Research (SNF), Norwegian School of Economics


Additional Trial Information

Status

In development

Start date

2023-06-23

End date

2024-06-30

Keywords

Behavior, Education, Gender

Additional Keywords

Choking, stress, exams, education, gender, stakes

JEL code(s)

C9, D01, D02, D9, I2, J16

Secondary IDs

Prior work

This trial is based on or builds upon one or more prior RCTs.

Abstract

Being admitted to top-quality education is highly consequential for long-term outcomes. In many countries around the world, admission to top schools is determined by performance in high-stakes exams. Yet a growing body of evidence shows that men outperform women in these important exams. To date, there is scarce evidence on how and why gender differences in performance arise in high-pressure, high-stakes situations. We design an experiment to explore, in a controlled environment, female underperformance in exams when pressure goes up. In the experiment, Prolific workers take two short Raven's-type tests containing easy, medium-difficulty and hard questions. After the baseline test (Test 1), we introduce treatments in Test 2 that vary the level of pressure participants face in two ways: (1) we introduce a cutoff that determines whether participants receive a bonus payment, and (2) we increase the monetary payment per correct answer by a factor of fifteen. We measure whether the treatments induce underperformance relative to a control group, and whether there are gender differences in underperformance. We will collect the data for this experiment in June 2023.

External Link(s)

Registration Citation

Citation

Franco, Catalina and Ingvild Lindgren Skarpeid. 2023. "Gender Gaps at the Top: Exam Performance and Choking Under Pressure." AEA RCT Registry. June 28. https://doi.org/10.1257/rct.11604-1.0

Sponsors & Partners

There is information in this trial unavailable to the public.

Experimental Details

Interventions

Intervention(s)

Participants are asked to complete two 3-minute tests, each with 7 questions of varying difficulty. The score in one of the tests will be randomly selected to determine experimental earnings on top of their show-up fee.

Test 1 is identical for all participants. In Test 2, participants are randomly assigned to one of three treatment conditions. The treatments aim to increase the pressure that participants face in two ways: relative to a pure control group, treated participants (treatment arms two and three) all face the introduction of a cutoff. To increase pressure even further, participants in the third treatment arm also face higher stakes, through a higher monetary payment per correct answer in addition to the cutoff.

Intervention (Hidden)

Treatment overview:
Treatment arm 1: Control
Participants in the pure control condition receive piece-rate incentives of 0.20 GBP per correct answer in both Test 1 and Test 2.

Treatment arm 2: Low incentives with cutoff
Test 1 is as described above. In Test 2, participants face higher pressure through the introduction of a cutoff: if they score fewer than 5 correct answers, they earn zero; otherwise they earn 0.20 GBP per correct answer.

Treatment arm 3: High incentives with cutoff
Test 1 is as described above. In Test 2, participants in treatment arm 3 face the same cutoff as in treatment arm 2. In addition, pressure is increased further by raising the monetary stakes fifteen-fold: participants earn 3 GBP per correct answer if they score at or above the cutoff, and zero otherwise.

Intervention Start Date

2023-06-23

Intervention End Date

2023-06-30

Primary Outcomes

Primary Outcomes (end points)

1) Whether the participant missed the cutoff, 2) total score, and 3) number of omitted questions. All specifications are estimated by OLS, regressing the outcome variable on treatment dummies, a gender dummy and the interactions between the treatment and gender dummies. The alternative specifications run the same OLS regression as the main specification but add a set of background controls.

Primary Outcomes (explanation)

The "missed cutoff" variable equals 1 if the participant scored fewer than 5 correct responses, and 0 otherwise.
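A minimal Python sketch of the main specification, assuming the data are held in NumPy arrays with arms coded 1 to 3 (arm 1, the control, is the omitted category) and a 0/1 indicator that equals 1 for women; these codings and the function names are illustrative assumptions. With this coding, the coefficients on the two interaction terms measure how the treatment effects for women differ from those for men. The sketch returns OLS point estimates only: the registration does not specify how standard errors are computed, so inference is left out.

```python
import numpy as np

CUTOFF = 5  # minimum number of correct answers


def missed_cutoff(score):
    """1 if the participant scored fewer than CUTOFF correct answers, 0 otherwise."""
    return (np.asarray(score) < CUTOFF).astype(int)


def design_matrix(arm, female, controls=None):
    """Columns: constant, T2, T3, female, T2 x female, T3 x female (+ optional controls).

    arm: array of 1 (control), 2 (cutoff, low stakes) or 3 (cutoff, high stakes).
    female: array of 0/1 indicators (coding assumed for this sketch).
    controls: optional array of background controls for the alternative specification.
    """
    arm = np.asarray(arm)
    f = np.asarray(female, dtype=float)
    t2 = (arm == 2).astype(float)
    t3 = (arm == 3).astype(float)
    X = np.column_stack([np.ones_like(f), t2, t3, f, t2 * f, t3 * f])
    if controls is not None:
        X = np.column_stack([X, np.asarray(controls, dtype=float)])
    return X


def ols(y, X):
    """OLS point estimates via least squares (no standard errors)."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta
```

For instance, `ols(missed_cutoff(score2), design_matrix(arm, female))` would estimate the missed-cutoff specification, where `score2` is a hypothetical array of Test 2 scores.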

Secondary Outcomes

Secondary Outcomes (end points)

We also collect a number of secondary measures that may act as drivers of performance under pressure, e.g., self-reported stress, motivation, effort, worry/anxiety and time use.

Secondary Outcomes (explanation)

Experimental Design

The purpose of this project is to understand how individuals make decisions and perform in activities similar to examinations. The project aims to study different factors that influence decision-making and performance. Participants complete an online study that takes approximately 15 minutes. The survey includes exercises where they complete picture patterns and answer questions about the exercises and themselves. They are informed that they have the opportunity to earn extra money during the study, in addition to the show-up fee. They are asked to read the instructions carefully to understand how this compensation will be determined. If participants navigate away from the survey page, they will be excluded from the study without pay.

Experimental Design Details

Summary of experiment:
We first collect background information on gender, age, geographical location, household income and educational attainment; gender is used to randomize equal numbers of women and men into each treatment arm. Participants then complete two timed 3-minute tests, each containing 7 Raven-style IQ questions of varying difficulty (easy, medium, hard). The first test is paid with piece-rate incentives; in the second test, participants are paid according to the incentives of their assigned treatment arm. Participants are informed that earnings from one of the tests will be randomly drawn for the bonus payout, and that they should do their best on both tests to maximize their payoff. In both tests, questions follow a set order of difficulty.

Participants can choose to answer the questions as they appear or to move questions to the end of the test, which leaves some room for individual test strategies. After answering a question, participants see a wait page that lasts at most 20 seconds. On this page they answer a set of questions about their experience solving the question: how difficult they found it, whether they are certain they got it right, and whether they guessed completely. In addition, after each test we elicit overconfidence and how much effort participants exerted in the test.

Finally, participants answer an end survey that measures traits that may be associated with performance under pressure, such as neuroticism, anxiety, and familiarity with Raven-style exercises. We also ask a number of stress and motivation questions that are identical to a questionnaire administered after a high-stakes national exam in Colombia, which allows us to compare the self-reported stress, motivation and strategy responses of Prolific workers with those of real students in the field in Colombia.

Treatment overview:
Treatment arm 1: Control
Participants in the pure control condition receive piece-rate incentives of 0.20 GBP per correct answer in both Test 1 and Test 2.

Treatment arm 2: Low incentives with cutoff
Test 1 is as described above. In Test 2, participants face higher pressure through the introduction of a cutoff: if they score fewer than 5 correct answers, they earn zero; otherwise they earn 0.20 GBP per correct answer.

Treatment arm 3: High incentives with cutoff
Test 1 is as described above. In Test 2, participants in treatment arm 3 face the same cutoff as in treatment arm 2. In addition, pressure is increased further by raising the monetary stakes by a factor of fifteen: participants earn 3 GBP per correct answer if they score at or above the cutoff, and zero otherwise.

Earlier work:
This RCT builds on a previously registered RCT conducted in August 2022 (RCT ID AEARCTR-0009873). The August 2022 experiment ran over budget and had to be interrupted, because the fraction of participants who reached the cutoff was substantially higher than anticipated and budgeted for. To stay within budget in the current round of data collection, we made two design changes: 1) making the cutoff harder to reach by shortening the tests to 3 minutes, down from 4 minutes, and 2) lowering earnings per correct answer in treatment arm three to 3 GBP, down from 5 GBP.

To save costs, we also reduced the number of treatment arms from four to three. These budget concerns were also discussed in the previous registration, which preregistered that if the budget was exceeded, we would stop data collection for treatment arm four; we have now decided not to include that arm at all.

Randomization Method

Randomization is done in Qualtrics using two separate randomization procedures, one for each gender, in which participants are randomly assigned to one of the three treatment conditions.

Randomization Unit

Since the gender dimension is of particular importance to us, randomization is done at the individual level, separately for males and females, ensuring that we have equal proportions of males and females in each treatment group.
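Qualtrics handles the assignment internally; the following Python sketch only illustrates the logic of gender-stratified assignment, under the simplifying assumption that the full list of participants is known in advance (in the live survey, participants arrive one at a time). Within each gender, arms are dealt out as evenly as possible and then shuffled, so each gender is split equally across the three arms whenever its count is divisible by three. The function names are hypothetical.

```python
import random
from collections import Counter

ARMS = (1, 2, 3)  # 1 = control, 2 = cutoff with low stakes, 3 = cutoff with high stakes


def balanced_arms(n, rng):
    """Deal n participants of one gender across the arms as evenly as possible, then shuffle."""
    arms = [ARMS[i % len(ARMS)] for i in range(n)]
    rng.shuffle(arms)
    return arms


def stratified_assignment(genders, seed=None):
    """Randomize separately within each gender; returns one arm per participant."""
    rng = random.Random(seed)
    assignment = [None] * len(genders)
    for g in sorted(set(genders)):  # sorted so a fixed seed gives reproducible results
        idx = [i for i, x in enumerate(genders) if x == g]
        for i, arm in zip(idx, balanced_arms(len(idx), rng)):
            assignment[i] = arm
    return assignment


def cell_counts(genders, assignment):
    """Number of participants in each (gender, arm) cell, to check balance."""
    return Counter(zip(genders, assignment))
```

Dealing arms round-robin before shuffling guarantees balance within each gender, which a simple independent draw per participant would not.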

Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters

N/A

Sample size: planned number of observations

2322 participants, 1161 males and 1161 females, budget permitting (see below)

Sample size (or number of clusters) by treatment arms

774 in each of the three treatment conditions. The final number of participants will not depend on the detected effect size but on the budget.

Minimum detectable effect size for main outcomes (accounting for sample design and clustering)

Previous literature suggests that in examinations, small margins can have large effects, which implies that even small differences in our experiment may be meaningful. We set the minimum meaningful effect to a difference of 0.2 points in the difference-in-differences between gender and treatment (the treatment × gender interaction). For 80% power, we need 771 participants in each treatment group.
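The registration does not report the outcome standard deviation, significance level or test underlying the power calculation, so the Python sketch below is not meant to reproduce the figure of 771. It shows a standard normal-approximation formula for a difference-in-differences (interaction) estimate, assuming a two-sided test at the 5% level, a common outcome standard deviation `sigma` in all cells, and equal numbers of women and men in each arm. With m participants per gender-by-arm cell, the estimator's variance is 4σ²/m, so m ≥ 4σ²(z₁₋α/₂ + z₁₋β)²/δ², and each treatment group needs 2m participants. The function names and `sigma` are assumptions of this sketch.

```python
from math import ceil
from statistics import NormalDist


def n_per_cell(delta, sigma, alpha=0.05, power=0.80):
    """Participants per gender-by-arm cell to detect a DiD of size delta (two-sided test)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Var(DiD) = 4 * sigma**2 / m with m per cell; require delta / SE >= z
    return ceil(4 * (z * sigma / delta) ** 2)


def n_per_treatment_group(delta, sigma, alpha=0.05, power=0.80):
    """Each treatment group holds one cell of women and one cell of men."""
    return 2 * n_per_cell(delta, sigma, alpha, power)
```

For example, with a purely illustrative `sigma = 1` and `delta = 0.2`, the sketch gives 785 participants per cell, or 1570 per treatment group; the required sample scales with σ², so the registered figure depends on the variance the authors assumed.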

Supporting Documents and Materials

IRB