The experiment comprises three components: child testing sessions, the creation of grading packets, and teacher grading sessions. First, we ran exam tournaments for children between 7 and 14 years of age in April 2007. Our project team went door to door inviting parents to send their children to a testing session, where the children would compete for a 2,500 INR prize (about 58 USD). Over a two-week period, 69 children attended four testing sessions. During the testing sessions, the project team obtained informed consent and then administered a short survey to the parents to collect information on the child and the basic demographic characteristics of the family. Next, the project team administered the exam, which included questions testing standard math and language skills, as well as an art section.
Second, we randomized the demographic characteristics that teachers observed on each exam so that these characteristics are uncorrelated with exam quality. Each teacher was asked to grade a packet of exams. To form these packets, each completed test was stripped of identifying information, assigned an ID number, and photocopied. Each packet was then formed by randomly selecting 25 exams without replacement, which ensured that no teacher graded the same test more than once. Each exam in the packet was given a cover sheet containing the randomly assigned characteristics: the child’s first name, last name, gender, caste, and age. Each characteristic was drawn from its own distribution, independently of the others. On average, each exam was graded by 43 teachers.
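The packet construction above can be sketched in a few lines. This is a minimal illustration, not the project's actual procedure: the attribute pools, the seed, and the helper names (`random_cover_sheet`, `make_packet`) are hypothetical, since the paper does not publish the distributions it drew from. The sketch does reflect the two properties the text describes. Exams within a packet are sampled without replacement, and each cover-sheet characteristic is drawn independently of the others and of the exam itself. It also checks the "43 teachers" figure: 120 teachers × 25 exams = 3,000 gradings, and 3,000 / 69 ≈ 43.5.

```python
import random
from dataclasses import dataclass

# Hypothetical placeholder pools; the study's actual lists are not published.
FIRST_NAMES = ["first_a", "first_b", "first_c"]
LAST_NAMES = ["last_a", "last_b", "last_c"]
GENDERS = ["male", "female"]
CASTES = ["caste_1", "caste_2", "caste_3"]
AGES = list(range(7, 15))  # children in the study were 7 to 14 years old

PACKET_SIZE = 25


@dataclass
class CoverSheet:
    exam_id: int
    first_name: str
    last_name: str
    gender: str
    caste: str
    age: int


def random_cover_sheet(exam_id: int, rng: random.Random) -> CoverSheet:
    # Each characteristic is drawn on its own, independently of the others
    # and of the exam's content, so it carries no information about quality.
    return CoverSheet(
        exam_id=exam_id,
        first_name=rng.choice(FIRST_NAMES),
        last_name=rng.choice(LAST_NAMES),
        gender=rng.choice(GENDERS),
        caste=rng.choice(CASTES),
        age=rng.choice(AGES),
    )


def make_packet(exam_ids: list[int], rng: random.Random) -> list[CoverSheet]:
    # Sampling without replacement: no exam appears twice in one packet.
    chosen = rng.sample(exam_ids, PACKET_SIZE)
    return [random_cover_sheet(exam_id, rng) for exam_id in chosen]


if __name__ == "__main__":
    rng = random.Random(2007)  # hypothetical seed
    exam_ids = list(range(1, 70))  # 69 anonymized exams
    packets = [make_packet(exam_ids, rng) for _ in range(120)]  # one per teacher
    total_gradings = sum(len(p) for p in packets)
    print(total_gradings, round(total_gradings / len(exam_ids), 1))
```

Because a fresh cover sheet is drawn every time an exam lands in a packet, the same exam appears under different assigned identities across teachers. This is what allows grading differences to be attributed to the assigned characteristics.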
Lastly, we recruited teachers to grade the exams. We obtained a listing of the city’s schools from the local government and divided them into government and private schools. Within each category, we ranked the schools in random order using a random number generator. In total, the project team visited about 167 schools to recruit 120 teachers: 67 from government schools and 53 from private schools. Teachers were invited to participate in a study of grading practices and were told that they would grade 25 exams in return for a 250 INR (about 5.80 USD) payment. Each grading session lasted about two hours. The project team provided the teachers with a complete set of answers for the math and language sections of the test, as well as the maximum points allotted for each question in all three sections. Teachers were told that partial credit was allowed, but the team did not describe how it should be allocated. Each teacher received 25 randomly selected exams, each with its randomly assigned cover sheet, along with a grade roster to fill out. To ensure that teachers viewed the cover sheets, we asked them to copy the cover sheet information onto the grade roster. They then graded each exam and entered the grades onto the roster. When a teacher finished grading, the project team administered a short survey on the teacher’s demographic characteristics and teaching philosophy.
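The stratified random ranking of schools can be sketched as follows. This is a minimal illustration under stated assumptions. It assumes the team assigned each school a uniform random number and sorted on it, separately for government and private schools, and then approached schools in that order. The school names, the seed, and the helper names (`random_ranking`, `rank_within_strata`) are hypothetical.

```python
import random


def random_ranking(schools: list[str], rng: random.Random) -> list[str]:
    # Pair each school with a uniform random number, then sort on that number.
    # The result is a uniformly random ordering of the list.
    keyed = [(rng.random(), school) for school in schools]
    keyed.sort(key=lambda pair: pair[0])
    return [school for _, school in keyed]


def rank_within_strata(listing: dict[str, list[str]], seed: int) -> dict[str, list[str]]:
    # Rank each category (e.g. government, private) separately, so the
    # random order never mixes schools across categories.
    rng = random.Random(seed)
    return {category: random_ranking(schools, rng) for category, schools in listing.items()}


if __name__ == "__main__":
    # Hypothetical school names; the actual city listing is not published.
    listing = {
        "government": [f"gov_school_{i}" for i in range(1, 6)],
        "private": [f"pvt_school_{i}" for i in range(1, 6)],
    }
    for category, order in rank_within_strata(listing, seed=42).items():
        print(category, order)
```

Sorting on random keys is equivalent to `rng.shuffle`, but it mirrors the "rank by random number" description more directly. Fixing the seed makes the ordering reproducible for auditing.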
After all the grading sessions were complete, we computed the average grade for each child across all teachers who graded his or her exam. We then awarded the prize to the highest-scoring child in each age category based on these average grades.
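The prize rule can be expressed as a short computation: average each exam's scores across all of its graders, then take the maximum within each age category. This is a minimal sketch, not the authors' code. The text does not define the age categories, so the category labels and the exam-to-category mapping below are hypothetical. The roster layout, with one `(exam_id, score)` row per teacher grading, is also an assumption.

```python
from collections import defaultdict
from statistics import mean


def average_grades(roster: list[tuple[int, float]]) -> dict[int, float]:
    # roster rows: (exam_id, total score one teacher gave that exam)
    scores: dict[int, list[float]] = defaultdict(list)
    for exam_id, score in roster:
        scores[exam_id].append(score)
    # Average over every teacher who graded the exam.
    return {exam_id: mean(vals) for exam_id, vals in scores.items()}


def prize_winners(averages: dict[int, float], age_category: dict[int, str]) -> dict[str, int]:
    # age_category maps exam_id -> a hypothetical age-group label.
    # Ties keep whichever exam was seen first; the text does not say how ties were broken.
    best: dict[str, tuple[float, int]] = {}
    for exam_id, avg in averages.items():
        cat = age_category[exam_id]
        if cat not in best or avg > best[cat][0]:
            best[cat] = (avg, exam_id)
    return {cat: exam_id for cat, (_, exam_id) in best.items()}


if __name__ == "__main__":
    roster = [(1, 70), (1, 80), (2, 90), (2, 66), (3, 50)]
    averages = average_grades(roster)
    print(averages)
    print(prize_winners(averages, {1: "7-10", 2: "7-10", 3: "11-14"}))
```

Averaging over roughly 43 graders per exam also averages out the effect of any single teacher's reaction to a randomly assigned cover sheet. This keeps the prize tied to exam quality rather than to the assigned characteristics.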