Discrimination in Grading
Last registered on April 04, 2016

Pre-Trial

Trial Information
General Information
Title
Discrimination in Grading
RCT ID
AEARCTR-0001086
Initial registration date
April 04, 2016
Last updated
April 04, 2016 2:38 PM EDT
Location(s)
Region
Primary Investigator
Affiliation
Harvard University
Other Primary Investigator(s)
PI Affiliation
University of Texas at Austin
Additional Trial Information
Status
Completed
Start date
2007-01-01
End date
2007-12-31
Secondary IDs
Abstract
We report the results of an experiment that was designed to test for discrimination in grading in India. We recruited teachers to grade exams. We randomly assigned child “characteristics” (age, gender, and caste) to the cover sheets of the exams to ensure that there is no relationship between these observed characteristics and the exam quality. We find that teachers score exams assigned lower-caste characteristics about 0.03 to 0.08 standard deviations lower than exams assigned high-caste characteristics. The teachers’ behavior appears consistent with statistical discrimination.
External Link(s)
Registration Citation
Citation
Hanna, Rema and Leigh Linden. 2016. "Discrimination in Grading." AEA RCT Registry. April 04. https://doi.org/10.1257/rct.1086-1.0.
Former Citation
Hanna, Rema and Leigh Linden. 2016. "Discrimination in Grading." AEA RCT Registry. April 04. https://www.socialscienceregistry.org/trials/1086/history/7493.
Experimental Details
Interventions
Intervention(s)
We designed an exam competition in which children recruited from the community competed for a prize of INR 2500 (US$58), which was equivalent to 56 percent of parents’ average monthly income. In April 2007, 69 children between the ages of 7-14 years completed exams that tested their math, language, and artistic skills. Afterwards, we removed any information that would reveal the test-taker’s identity and randomly assigned a different set of child “characteristics” (age, gender, and caste) to each exam’s cover sheet. Since the characteristics of the child were randomly assigned, there should be no relationship between the assigned characteristics and test scores. Any correlation between the characteristics and exam scores would be evidence of discrimination.

A total of 120 teachers were recruited from local schools and were paid to grade the exams. Teachers were given 25 exams to grade and were informed that a substantial prize would be awarded to the students with the top scores in each age group. Teachers were given an answer sheet to use for grading but were permitted to offer partial credit at their discretion. In all, each exam was graded by an average of 43 teachers. To ensure that teachers viewed the cover sheets, they were asked to copy the child’s characteristics from the cover sheet onto the grade roster.
Intervention Start Date
2007-04-01
Intervention End Date
2007-05-01
Primary Outcomes
Primary Outcomes (end points)
Student exam grades
Primary Outcomes (explanation)
Secondary Outcomes
Secondary Outcomes (end points)
Secondary Outcomes (explanation)
Experimental Design
Experimental Design
The experiment consists of three components: child testing sessions, the creation of grading packets, and teacher grading sessions. First, we ran exam tournaments for children between 7 and 14 years of age in April 2007. Our project team went door to door to invite parents to allow their children to attend a testing session to compete for a 2,500 INR prize (about 58 USD). Over a two-week period, 69 children attended four testing sessions. During the testing sessions, the project team obtained informed consent and then administered a short survey to the parents in order to collect information on the child and the basic demographic characteristics of the family. Next, the project team administered the exam. We included questions that tested standard math and language skills, as well as an art section.

Second, we randomized the demographic characteristics observed by teachers on each exam so that these characteristics are uncorrelated with exam quality. Each teacher was asked to grade a packet of exams. To form these packets, each completed test was stripped of identifying information, assigned an ID number, and photocopied. Twenty-five exams were then randomly selected to form each packet, without replacement, in order to ensure that the teacher did not grade the same photocopied test more than once. Each exam in the packet was then given a cover sheet, which contained the randomly assigned characteristics: child’s first name, last name, gender, caste information, and age. The assigned characteristics were each drawn from an independent distribution. Each exam was graded by an average of 43 teachers.
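The packet construction described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's actual procedure or code: the category lists, the uniform draws, the seed, and the function name build_packets are all assumptions made for the example (the registration says only that each characteristic was drawn from an independent distribution, and names are omitted here for brevity). The counts of 69 exams, 120 teachers, and 25 exams per packet come from the text.

```python
import random

N_EXAMS = 69        # completed exams (from the text)
N_TEACHERS = 120    # recruited graders (from the text)
PACKET_SIZE = 25    # exams per teacher packet (from the text)

# Hypothetical category values, used only for illustration.
GENDERS = ["female", "male"]
CASTES = ["high caste", "lower caste"]
AGES = list(range(7, 15))  # ages 7 through 14


def build_packets(seed=0):
    """Return one packet per teacher; each packet is a list of cover sheets."""
    rng = random.Random(seed)
    exam_ids = list(range(1, N_EXAMS + 1))
    packets = []
    for teacher_id in range(1, N_TEACHERS + 1):
        # Sampling without replacement: no teacher sees the same exam twice.
        chosen = rng.sample(exam_ids, PACKET_SIZE)
        packet = [
            {
                "teacher_id": teacher_id,
                "exam_id": exam_id,
                # Each characteristic is drawn independently of the others
                # and of the exam itself (uniform draws are an assumption).
                "gender": rng.choice(GENDERS),
                "caste": rng.choice(CASTES),
                "age": rng.choice(AGES),
            }
            for exam_id in chosen
        ]
        packets.append(packet)
    return packets


packets = build_packets()
total = sum(len(packet) for packet in packets)
print(f"{total} graded exams, {total / N_EXAMS:.1f} graders per exam on average")
```

With 120 packets of 25 exams, the sketch produces 3,000 graded exams, or about 43 graders per exam, which matches the figures reported in this registration.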

Lastly, we recruited teachers to grade the exams. We obtained a listing of the city’s schools from the local government and divided them into government and private schools. For each category, we ranked the schools using a random number generator. In total, the project team visited about 167 schools to recruit 120 teachers, 67 from government schools and 53 from private schools. Teachers were invited to participate in a study to understand grading practices, where they were told that they would grade 25 exams in return for a 250 INR (about 5.80 USD) payment. Each grading session lasted about two hours. The project team provided the teachers with a complete set of answers for the math and language sections of the test, and the maximum points allotted for each question for all three test sections. Teachers were told that partial credit was allowed, but the team did not describe how it should be allocated. The teachers each received 25 randomly selected exams (with the randomly assigned cover sheets) to grade, as well as a “testing roster” to fill out. To ensure that teachers viewed the cover sheets, we asked them to copy the cover sheet information onto the grade roster. They were then asked to grade the exam and enter the grades onto the roster. When a teacher finished grading, the project team administered a short survey to the teacher, which was designed to learn their demographic characteristics and teaching philosophy.

After all the grading sessions were complete, we computed the average grade for each child across all teachers who graded his or her exam. We then awarded the prize to the highest scoring child in each of the age categories based on these average grades.
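The prize rule above (average each child's grades across teachers, then take the top average in each age category) can be illustrated with a short sketch. The function name pick_winners, the record layout, the age-category labels, and the toy scores are all hypothetical; the registration does not define the age categories or say how ties were handled, so here the first exam encountered keeps the prize in a tie.

```python
from collections import defaultdict
from statistics import mean


def pick_winners(grades, age_group):
    """Average each exam's scores, then pick the top exam per age category.

    grades: iterable of (exam_id, teacher_id, score) tuples.
    age_group: dict mapping exam_id to the child's (real) age category.
    """
    scores_by_exam = defaultdict(list)
    for exam_id, _teacher_id, score in grades:
        scores_by_exam[exam_id].append(score)

    # Average grade for each child across all teachers who graded the exam.
    averages = {exam_id: mean(s) for exam_id, s in scores_by_exam.items()}

    winners = {}
    for exam_id, avg in averages.items():
        group = age_group[exam_id]
        # Strict comparison: on a tie, the earlier exam is kept (assumption).
        if group not in winners or avg > averages[winners[group]]:
            winners[group] = exam_id
    return averages, winners


# Toy data, invented for illustration only.
toy_grades = [(1, "t1", 10), (1, "t2", 14), (2, "t1", 11), (3, "t2", 9)]
toy_groups = {1: "younger", 2: "younger", 3: "older"}
averages, winners = pick_winners(toy_grades, toy_groups)
print(averages, winners)
```

Because the prize depends on the average across many graders, a single teacher's grade has only a small effect on which child wins.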
Experimental Design Details
Randomization Method
Random number generator for school selection
Randomization Unit
Individual randomization
Was the treatment clustered?
No
Experiment Characteristics
Sample size: planned number of clusters
3,000 graded exams (graded in sets of 25 by 120 teachers)
Sample size: planned number of observations
3,000 graded exams (graded in sets of 25 by 120 teachers)
Sample size (or number of clusters) by treatment arms
treatment: 3,000 teacher-graded exams, control: 69 blindly graded exams (graded by research staff)
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB
INSTITUTIONAL REVIEW BOARDS (IRBs)
IRB Name
NYU and Columbia
IRB Approval Date
2007-01-01
IRB Approval Number
On File at NYU and Columbia
Post-Trial
Post Trial Information
Study Withdrawal
Intervention
Is the intervention completed?
Yes
Intervention Completion Date
May 01, 2007, 12:00 AM +00:00
Is data collection complete?
Yes
Data Collection Completion Date
December 31, 2007, 12:00 AM +00:00
Final Sample Size: Number of Clusters (Unit of Randomization)
3,000 graded exams
Was attrition correlated with treatment status?
Final Sample Size: Total Number of Observations
3,000 graded exams
Final Sample Size (or Number of Clusters) by Treatment Arms
treatment: 3,000 teacher-graded exams, control: 69 blindly graded exams (graded by research staff)
Data Publication
Data Publication
Is public data available?
No
Program Files
Program Files
No
Reports and Papers
Preliminary Reports
Relevant Papers
Abstract
We report the results of an experiment that was designed to test for discrimination in grading in India. We recruited teachers to grade exams. We randomly assigned child “characteristics” (age, gender, and caste) to the cover sheets of the exams to ensure that there is no relationship between these observed characteristics and the exam quality. We find that teachers score exams assigned lower-caste characteristics about 0.03 to 0.08 standard deviations lower than exams assigned high-caste characteristics. The teachers’ behavior appears consistent with statistical discrimination.
Citation
Hanna, Rema, and Leigh Linden. 2012. "Discrimination in Grading." American Economic Journal: Economic Policy 4(4): 146-68