Extending the Behavioral Explanation of the Gender Grade Gap: Evidence from a Vignette Experiment

Last registered on May 21, 2025

Pre-Trial

Trial Information

General Information

Title
Extending the Behavioral Explanation of the Gender Grade Gap: Evidence from a Vignette Experiment
RCT ID
AEARCTR-0015829
Initial registration date
May 09, 2025

First published
May 21, 2025, 11:54 AM EDT

Locations

Region

Primary Investigator

Affiliation
Research Center for Educational and Network Studies, Hungarian Academy of Sciences, Centre for Social Sciences; TÁRKI Social Research Institute, Budapest

Other Primary Investigator(s)

Additional Trial Information

Status
Ongoing
Start date
2025-04-16
End date
2026-06-30
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
The behavioral explanation for the gender gap in school grades suggests that girls receive higher grades than boys because their behavior is more aligned with schools’ expectations. This study extends the understanding of this explanation in three key ways. First, it examines whether teachers assess the same school (mis)behavior differently when exhibited by a girl rather than a boy, leading to gender bias in behavior assessment. Second, it provides an experimental test of whether behavioral differences influence subject grades and examines whether this effect varies by gender, specifically testing if the gender gap in subject grades is moderated by school behavior. Third, it investigates whether differential treatment in behavior assessment mediates the gender gap in subject grades, demonstrating a spillover effect from behavior assessment bias to overall school performance.
External Link(s)

Registration Citation

Citation
Keller, Tamas. 2025. "Extending the Behavioral Explanation of the Gender Grade Gap: Evidence from a Vignette Experiment." AEA RCT Registry. May 21. https://doi.org/10.1257/rct.15829-1.0
Experimental Details

Interventions

Intervention(s)
Intervention Start Date
2025-04-16
Intervention End Date
2026-02-28

Primary Outcomes

Primary Outcomes (end points)
Behavior grade, subject grade
Primary Outcomes (explanation)
There are two primary outcomes, both representing teacher-assigned grades. The behavior grade is measured using the standard grading scale commonly used in daily school practice, ranging from 5 (exemplary behavior) to 2 (poor behavior). The subject grade, here the grade given to the student's writing (essay), is assessed on the traditional five-point academic scale: excellent (5), good (4), average (3), satisfactory (2), and unsatisfactory (1).

Secondary Outcomes

Secondary Outcomes (end points)
Behavior assessment, Essay evaluation, Track recommendation, Social status assessment
Secondary Outcomes (explanation)
There are five secondary outcomes. Behavior and writing grades are also measured on a continuous 0–100 scale.

Furthermore, track recommendation is a binary variable coded as 1 if the teacher recommends the student for the elite eight-year academic secondary track, and 0 if the teacher either does not recommend the track or answers “I do not know.”
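The coding rule for the track recommendation can be sketched as a small function. This is only an illustration: the function name `code_track_recommendation` and the exact response strings it accepts are assumptions, not part of the registered protocol.

```python
def code_track_recommendation(response: str) -> int:
    """Code a teacher's answer as 1 (recommends the track) or 0 (otherwise).

    Assumption: responses arrive as the literal option labels
    "Yes", "No", or "I do not know" (also accepting "I don't know").
    """
    # Normalize case, surrounding whitespace, and typographic apostrophes.
    normalized = response.strip().lower().replace("’", "'")
    if normalized == "yes":
        return 1
    if normalized in {"no", "i do not know", "i don't know"}:
        # Both a refusal and an undecided answer are coded as 0.
        return 0
    raise ValueError(f"Unexpected response option: {response!r}")
```

Coding undecided answers as 0 means the variable captures an explicit positive recommendation rather than the absence of a refusal.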

Lastly, teachers are asked how likely they find it that the described vignette student comes from a working-class family and how likely they find it that the student comes from an intellectual family.

Experimental Design

Experimental Design
In the experiment, real teachers read short descriptions of fictitious vignette students whose characteristics are randomly varied across four dimensions: gender, ability, school behavior, and essay quality. The introductory sentence conveys the student’s gender—signaled by their first name and further clarified through explicit mention—as well as their general ability level (high, middle, or low). For example, the sentence reads: Anna is a middle-ability girl in the class.

After this introduction, teachers first review the student’s behavior profile (presented in the behavior vignette) and then assess a sample of the student’s writing (presented in the essay vignette).

Teachers are first primed with the student’s behavior described in the behavior vignette. After reviewing the behavior vignette, they grade the student’s behavior (Task 1). Once this is completed, they revisit the behavior vignette and then read the student’s writing piece (essay). After reviewing both materials, teachers evaluate the essay (Task 2). Finally, in Task 3, teachers are asked whether they would recommend the student for the elite academic secondary track, followed by an assessment of the student’s potential social status in Task 4.

In Task 1, the respondent teacher is asked to grade the described (fictitious) student’s behavior using the standard grading scale, ranging from 5 (exemplary behavior) to 2 (poor behavior). After the traditional behavior assessment, school behavior is additionally evaluated on a scale from 0 to 100.

In Task 2, the respondent teacher is asked to grade the student’s essay using a standard five-point scale: excellent (5), good (4), average (3), satisfactory (2), or unsatisfactory (1). After the traditional assessment, the student’s essay is additionally evaluated on a scale from 0 to 100.

In Task 3, respondent teachers are asked whether they would recommend that the student apply for the elite academic secondary track after completing fourth grade. This eight-year academic track is often considered an elite pathway reserved for the most talented students; most students are tracked only after eighth grade and then attend the standard four-year, rather than eight-year, secondary education. Teachers can express their recommendation using the response options: “Yes,” “No,” or “I do not know.”

In Task 4, teachers are asked to rate, on a scale from 0 to 100, how likely it is that the described student comes from a working-class family and, separately, how likely it is that the student comes from an intellectual family.

Teachers evaluate six fictitious students, so this procedure will be repeated for each student.
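The four tasks imply a fixed set of responses per evaluated student, each with its own admissible range. A minimal sketch of such a record with range checks follows; the class name `VignetteResponse` and all field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

TRACK_OPTIONS = ("Yes", "No", "I do not know")


def _check_percent(name: str, value: float) -> None:
    """Raise if a 0-100 scale value is out of range."""
    if not 0 <= value <= 100:
        raise ValueError(f"{name} must lie between 0 and 100, got {value}")


@dataclass(frozen=True)
class VignetteResponse:
    """One teacher's answers for one fictitious student (hypothetical layout)."""
    behavior_grade: int              # Task 1: 5 (exemplary) down to 2 (poor)
    behavior_score: float            # Task 1: 0-100 scale
    essay_grade: int                 # Task 2: 5 (excellent) down to 1 (unsatisfactory)
    essay_score: float               # Task 2: 0-100 scale
    track_recommendation: str        # Task 3: one of TRACK_OPTIONS
    working_class_likelihood: float  # Task 4: 0-100 scale
    intellectual_likelihood: float   # Task 4: 0-100 scale

    def __post_init__(self) -> None:
        if self.behavior_grade not in (2, 3, 4, 5):
            raise ValueError("behavior_grade must be between 2 and 5")
        if self.essay_grade not in (1, 2, 3, 4, 5):
            raise ValueError("essay_grade must be between 1 and 5")
        if self.track_recommendation not in TRACK_OPTIONS:
            raise ValueError("track_recommendation must be one of TRACK_OPTIONS")
        _check_percent("behavior_score", self.behavior_score)
        _check_percent("essay_score", self.essay_score)
        _check_percent("working_class_likelihood", self.working_class_likelihood)
        _check_percent("intellectual_likelihood", self.intellectual_likelihood)
```

Under this layout, each teacher would contribute six such records, one per fictitious student.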

Experimental Design Details
Not available
Randomization Method
Randomly generated numbers
Randomization Unit
Randomization is conducted at the level of the student profile rated by the teacher. Within each profile, four dimensions are randomized: the vignette student's gender, ability level, school behavior, and essay quality are each randomly selected from the available options.
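A minimal sketch of profile-level randomization, under several assumptions that the registration does not confirm: the behavior vignette is modeled as six binary items (suggested only by the 2⁶ = 64 combinations mentioned in the sample-size section), essay quality is given two placeholder levels, and every dimension is drawn independently with equal probability. All names are hypothetical.

```python
import random

GENDERS = ["girl", "boy"]
ABILITY_LEVELS = ["high", "middle", "low"]
N_BEHAVIOR_ITEMS = 6                     # assumption: 2**6 = 64 behavior vignettes
ESSAY_QUALITIES = ["higher", "lower"]    # hypothetical placeholder levels
PROFILES_PER_TEACHER = 6


def draw_profile(rng: random.Random) -> dict:
    """Draw one fictitious student profile, each dimension independently."""
    return {
        "gender": rng.choice(GENDERS),
        "ability": rng.choice(ABILITY_LEVELS),
        # One 0/1 flag per behavior item (assumed structure).
        "behavior": tuple(rng.randint(0, 1) for _ in range(N_BEHAVIOR_ITEMS)),
        "essay_quality": rng.choice(ESSAY_QUALITIES),
    }


def draw_profiles_for_teacher(seed: int) -> list:
    """Draw the six profiles shown to one teacher, reproducibly from a seed."""
    rng = random.Random(seed)
    return [draw_profile(rng) for _ in range(PROFILES_PER_TEACHER)]


profiles = draw_profiles_for_teacher(seed=2025)
```

Seeding a separate generator per teacher keeps each teacher's set of profiles reproducible without relying on global random state.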
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
The goal is to collect responses from 200 to 500 teachers, with each teacher evaluating six fictitious students. This will result in a total sample of 1,200 to 3,000 students.
Sample size: planned number of observations
1,200 to 3,000 students
Sample size (or number of clusters) by treatment arms
Since there are 2⁶ = 64 possible combinations of behavior vignettes, a sample of 200 teachers evaluating 1,200 students results in each vignette being assessed approximately 19 times (1,200 / 64). In a larger sample of 500 teachers evaluating 3,000 students, each vignette is evaluated about 47 times (3,000 / 64).
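The arithmetic above can be checked with a few lines of Python. The constant and function names are illustrative; the figures (six profiles per teacher, 64 behavior vignettes, 200 to 500 teachers) come from the registration.

```python
PROFILES_PER_TEACHER = 6
N_BEHAVIOR_VIGNETTES = 2 ** 6  # 64 combinations


def evaluations_per_vignette(n_teachers: int) -> float:
    """Average number of times each behavior vignette is evaluated."""
    n_profiles = n_teachers * PROFILES_PER_TEACHER
    return n_profiles / N_BEHAVIOR_VIGNETTES


lower = evaluations_per_vignette(200)  # 1,200 profiles / 64 = 18.75
upper = evaluations_per_vignette(500)  # 3,000 profiles / 64 = 46.875
```

These are averages; with random assignment, the realized count per vignette will vary around them.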
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
10% to 16% of a standard deviation
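The registration does not state how the minimum detectable effect was computed. As a rough cross-check only, the sketch below applies a textbook approximation for a two-sided test comparing two equally sized groups on a standardized outcome, MDE = (z_{1-α/2} + z_{power}) × sqrt(4 / N), with α = 0.05 and 80% power. These parameter values, the equal split, and the omission of any adjustment for profiles being clustered within teachers are all assumptions.

```python
from math import sqrt
from statistics import NormalDist


def mde_in_sd_units(n_obs: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate minimum detectable effect in standard-deviation units.

    Assumes a binary treatment split 50/50, a standardized outcome, and
    independent observations (no clustering adjustment).
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * sqrt(4 / n_obs)


mde_small_sample = mde_in_sd_units(1200)  # roughly 0.16 SD
mde_large_sample = mde_in_sd_units(3000)  # roughly 0.10 SD
```

Under these assumptions the approximation lands close to the range stated above, with the larger value corresponding to 1,200 observations and the smaller to 3,000. Accounting for within-teacher correlation would generally increase these values.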
IRB

Institutional Review Boards (IRBs)

IRB Name
HUN-REN KRTK
IRB Approval Date
2024-11-10
IRB Approval Number
1 FŐIG/ 76-1/ 2024
Analysis Plan

There is information in this trial unavailable to the public.