Can blind grading on online platforms reduce grade gaps in education?

Last registered on July 10, 2023

Pre-Trial

Trial Information

General Information

Title
Can blind grading on online platforms reduce grade gaps in education?
RCT ID
AEARCTR-0011694
Initial registration date
June 29, 2023

Initial registration date is when the trial was registered; it corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
July 10, 2023, 8:46 PM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Locations

Region

Primary Investigator

Affiliation
University of Oregon

Other Primary Investigator(s)

Additional Trial Information

Status
In development
Start date
2023-06-29
End date
2023-10-31
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
I conduct an online laboratory experiment with Prolific and Qualtrics to evaluate whether blind grading on online platforms like Canvas might promote grade equity. I report the average impact of blind grading on grade gaps by race and gender and the underlying behaviors that drive those gaps. In particular, I measure information seeking, time spent grading, order effects, and self-reported ratings. Motivated by rational inattention theory, consistent with a Bayesian learning model with costly signal extraction, this experiment provides empirical evidence on whether statistical discrimination and endogenous inattention can explain grade differences.
External Link(s)

Registration Citation

Citation
Ren, Tamara. 2023. "Can blind grading on online platforms reduce grade gaps in education?" AEA RCT Registry. July 10. https://doi.org/10.1257/rct.11694-1.0
Experimental Details

Interventions

Intervention(s)
Participants grade in either a blind or nonblind environment.
Intervention (Hidden)
I will recruit participants from Prolific to complete a Qualtrics survey. Participants will grade a sequence of homework submissions on the same prompt, “Should we tax companies that use artificial intelligence in place of human workers?” In the blind group, participants only see the homework submissions. In the nonblind group, participants see the submissions and signals of student demographics.
Intervention Start Date
2023-06-29
Intervention End Date
2023-07-31

Primary Outcomes

Primary Outcomes (end points)
- Grades given to each submission
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
- Time spent grading each submission
- Whether participants opened the grading rubric
- Whether participants opened the lecture notes
- Self-reported ratings: agreement with "grading is exhausting," whether the grading rubric and lecture notes helped with assessing the homework, and whether grading became easier over the course of the survey
- Accuracy of grading (difference in scores relative to pre-determined scores)
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
I will recruit participants online. Participants will grade a sequence of homework submissions on the same prompt. Participants in the treatment group will see a randomly generated fictitious identity attached to each submission. The fictitious identities include names and photos to mimic the information graders can see on grading platforms like Canvas. Participants in the blind control group will not observe any fictitious student information. In both groups, participants can choose to view the grading rubric and the class notes on every submission.

Experimental Design Details
I will recruit participants from Prolific to complete a Qualtrics survey. Participants will grade a sequence of homework submissions on the same prompt, “Should we tax companies that use artificial intelligence in place of human workers?” The Qualtrics survey mimics grading on platforms like Canvas in a blind and a nonblind environment. Participants will be given contextual information on the homework question before the grading portion of the survey. They can also review fictitious class notes and a grading rubric at any time during the study. Participants will be told in advance that everything is fictitious. I use AI to generate all materials.

Participants in the treatment group will see a randomly generated fictitious identity attached to each submission. The fictitious identities include names and photos to mimic the information graders can see on grading platforms like Canvas. The remaining participants in the blind group will not observe any fictitious student information. Fictitious identities will be randomly assigned to each submission each participant sees in the nonblind group. The identities include a name and a photo that reflects the fictitious student's gender and race.

To identify the potential impact of blind grading on gender and racial grade gaps in education, I include four racial groups and two genders. The fictitious students' racial and ethnic makeup is White, Black, Hispanic, and Asian. I will pool the Black, Hispanic, and Asian identities into a non-White group. There will always be equal representation of White and non-White identities and of female and male identities.

Each participant will grade eight randomly ordered submissions drawn from a larger pool of twenty submissions. Each submission is graded out of five points. Participants can allocate points in half-point increments to reflect partial credit, a common practice in real classrooms. Once a submission is graded, participants cannot go back to review or change their previous assessments, ensuring clean identification.
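
As a concrete illustration, the following Python sketch shows one way the per-participant randomization described above could work. The pool size, the eight-submission draw, and the balance constraints come from the text; the function and variable names, and the use of Python's random module, are my own assumptions rather than the study's actual implementation (which runs in Qualtrics).

    import random

    SUBMISSION_POOL = list(range(20))  # twenty homework submissions on the same prompt
    NONWHITE_RACES = ["Black", "Hispanic", "Asian"]

    def draw_grading_task(blind, seed=None):
        """Draw eight of the twenty submissions in random order; in the
        nonblind arm, attach balanced fictitious identities (4 White /
        4 non-White, 4 female / 4 male). In the actual survey each
        identity is shown as a name and photo matching its race and gender."""
        rng = random.Random(seed)
        submissions = rng.sample(SUBMISSION_POOL, 8)  # random subset, random order
        if blind:
            return [(s, None) for s in submissions]
        races = ["White"] * 4 + [rng.choice(NONWHITE_RACES) for _ in range(4)]
        genders = ["Female", "Male"] * 4
        rng.shuffle(races)
        rng.shuffle(genders)
        return list(zip(submissions, zip(races, genders)))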

Participants can look at the grading rubric or reference the class notes while they grade. On every submission, there will be two gray buttons that say “View Answer Key” and “View Lecture Notes.” If the participant chooses to reference these materials, they can click the buttons to reveal images of the grading rubric and the class notes within the page.

Once all eight submissions have been graded, participants will answer self-rating questions on their agreement with “grading is exhausting,” “the grading rubric helped with assessing the homework,” “the class lecture notes helped with assessing the homework,” and “deciding how many points to allocate became easier over the duration of the survey.” Afterward, they will provide their demographic information and finish the study.

Participants will receive both a flat fee for completing this study and a potential bonus for "accurately" grading each submission. Participants will know there exists a "correct" value associated with each submission, but they will not be told this value.

The "correct" values come from a survey I conducted where faculty and graduate students from the economics department at the University of Oregon blindly graded the homework submissions that will be used in this experiment. The average scores from this pre-experiment survey will act as the correct value and benchmark the bonus payments for each submission.

In this study, participants will be told that a group of teachers previously graded each submission and each submission is associated with the average score from the teachers' grades. In addition, participants will be explicitly told they can earn a bonus if they grade each submission within half a point of the pre-determined score.

The bonus amounts will be shown in the instructions before the grading portion of the study begins. The bonus will range between $0.05 and $0.45 in intervals of ten cents. Participants will receive a flat fee of $1.75 for completing the survey plus the bonus. Because I am using Prolific, the expected completion time and the flat fee are set prior to the launch of the experiment.
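
For concreteness, here is a minimal sketch of the payout rule implied by the stated half-point accuracy window. The $1.75 flat fee and the $0.05-$0.45 bonus range follow the text; how the bonus amounts map to the eight submissions is not stated in the registration, so the example assignment below is hypothetical.

    def payout(grades, benchmarks, bonuses, flat_fee=1.75):
        """Flat fee plus each submission's bonus when the participant's
        grade falls within half a point of the pre-determined benchmark
        (the faculty and graduate-student average from the pre-experiment survey)."""
        earned = sum(b for g, k, b in zip(grades, benchmarks, bonuses)
                     if abs(g - k) <= 0.5)
        return flat_fee + earned

    # Hypothetical per-submission bonuses: $0.05-$0.45 in ten-cent steps.
    bonuses = [0.05, 0.15, 0.25, 0.35, 0.45, 0.05, 0.15, 0.25]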

Randomization Method
I use Qualtrics to randomize participants into a treatment or control group.
Randomization Unit
Individual
Was the treatment clustered?
Yes

Experiment Characteristics

Sample size: planned number of clusters
500 participants
Sample size: planned number of observations
4,000 (500 participants × 8 graded submissions each)
Sample size (or number of clusters) by treatment arms
250 individuals in the blind group; 250 individuals in the nonblind group.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Using an intraclass correlation of 0.1 estimated from the data provided by Hanna and Linden (2012) and a covariate R² of 0.4, I will need 500 participants to achieve a minimum detectable effect of 0.09 standard deviations in grade points.
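
The stated MDE can be reproduced with the standard formula for a design in which observations are clustered within participants. The sketch below uses the parameters reported above; the 50/50 treatment split, 5% two-sided test, and 80% power are my assumptions, since the registration does not state them.

    from math import sqrt
    from statistics import NormalDist

    def mde_sd_units(J=500, m=8, icc=0.1, r2=0.4, p=0.5, alpha=0.05, power=0.8):
        """MDE in standard-deviation units with J participants (clusters),
        m submissions graded per participant, intraclass correlation icc,
        covariate R^2 r2, and treated share p."""
        z = NormalDist().inv_cdf
        multiplier = z(1 - alpha / 2) + z(power)  # ~2.80
        design_effect = 1 + (m - 1) * icc         # 1.7 here
        variance = (1 - r2) * design_effect / (p * (1 - p) * J * m)
        return multiplier * sqrt(variance)

    print(round(mde_sd_units(), 3))  # ~0.089 standard deviations
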
IRB

Institutional Review Boards (IRBs)

IRB Name
University of Oregon's Committee for the Protection of Human Subjects
IRB Approval Date
2023-06-22
IRB Approval Number
STUDY00000848
Analysis Plan

There is information in this trial unavailable to the public.

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials