Experimental Design
Experiments were conducted in 203 schools across three cities (27,000 students). All experiments had a similar implementation plan. First, researchers garnered support from the district superintendent. Second, a letter was sent to principals of schools that served the desired grade levels. Third, researchers met with principals to discuss the details of the programs. After principals were given information about the experiment, there was a brief sign-up period – typically five to ten days. Schools that signed up to participate serve as the basis for randomization. All randomization was done at the school level. Students had to have a parental consent form signed. Students received their first payments the second week of October and their last payment was disseminated over the summer. All experiments lasted one academic year.
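School-level randomization of this kind can be illustrated with a short sketch. The function name, the seed, and the use of integer school identifiers are assumptions made for illustration; the source does not describe the actual randomization procedure beyond stating that it was done at the school level.

```python
import random

def assign_treatment(school_ids, n_treated, seed=0):
    """Randomly pick n_treated schools for treatment; the rest are controls.

    Hypothetical helper: the seed and ID format are illustrative only.
    """
    schools = sorted(school_ids)          # fixed order so the seed is reproducible
    rng = random.Random(seed)
    treated = set(rng.sample(schools, n_treated))
    # Every student in a school shares that school's assignment.
    return {school: (school in treated) for school in schools}

# Example with Dallas-sized numbers: 42 schools signed up, 21 treated.
assignment = assign_treatment(range(42), 21)
n_treated = sum(assignment.values())
n_control = len(assignment) - n_treated
```

Because assignment is made per school rather than per student, later analysis has to account for correlation among students within the same school.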
In Dallas, 42 schools signed up to participate in the experiment (3,718 second graders with test scores), and researchers randomly chose twenty-one of those schools to be treated. Participating schools received $1,500 to lower the cost of implementation. Upon finishing a book, each student took an Accelerated Reader (AR) computer-based comprehension quiz. The student earned a $2/book reward for scoring eighty percent or better on the book quiz for up to 20 books per semester. Students were allowed to select and read books of their choice at the appropriate reading level and at their leisure, not as a classroom assignment. The books came from the existing stock available at their school. Quizzes were taken in the library on a computer and students were only allowed one chance to take a quiz. An important caveat of the Dallas experiment is that researchers combine Accelerated Reader (a known software program) with the use of incentives. If the Accelerated Reader program has an independent (positive) effect on student achievement, the impact of incentives would be overstated. Three times a year (twice in the fall and once in the spring) teachers in the program tallied the total amount of incentive dollars earned by each student based on the number of passing quiz scores. A check was then written to each student for the total amount of incentive dollars earned.
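The Dallas payment rule can be written out as a small sketch. The function name and the representation of quiz scores as fractions between 0 and 1 are assumptions; the dollar amount, the pass mark, and the cap come from the description above.

```python
def dallas_semester_payout(quiz_scores, reward_per_book=2, pass_mark=0.80, max_books=20):
    """Dollars earned in one semester from a list of AR quiz scores in [0, 1].

    A book pays $2 only if its quiz score is 80 percent or better,
    and at most 20 books per semester are rewarded.
    """
    passed = sum(1 for score in quiz_scores if score >= pass_mark)
    return reward_per_book * min(passed, max_books)

# Three quizzes, one below the pass mark: two books are rewarded.
example = dallas_semester_payout([0.80, 0.79, 1.00])
# Twenty-five passing quizzes are capped at twenty books.
capped = dallas_semester_payout([1.0] * 25)
```

Under this rule a student can earn at most $40 per semester.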
In New York City, 121 schools signed up to participate; researchers randomly chose 63 schools (thirty-three fourth grades and thirty-one seventh grades) to be treated (15,883 fourth and seventh graders with test scores). A participating school received $2,500 if eighty percent of eligible students were signed up to participate and if the school had administered the first four assessments. The school received another $2,500 later in the year if eighty percent of students were signed up and if the school had administered all six assessments. Students were given incentives for their performance on six computerized exams (three in reading and three in math) and four predictive assessments that were pencil and paper tests. For each test, fourth graders earned $5 for completing the exam and $25 for a perfect score. The incentive scheme was strictly linear – each marginal increase in score was associated with a constant marginal benefit. The magnitude of the incentive was doubled for seventh graders – $10 for completing each exam and $50 for a perfect score – yielding the potential to earn $500 in a school year. Approximately sixty-six percent of students opened student savings accounts with Washington Mutual as part of the experiment and money was directly deposited into these accounts. Certificates were distributed in school to make the earnings public. Students who did not participate because they did not return consent forms took identical exams but were not paid.
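One way to read the linear New York City scheme is sketched below. The interpolation formula is an assumption: the source gives only the completion payment, the perfect-score payment, and the statement that the scheme was strictly linear, so the sketch draws a straight line between those two points. The function name and the score-as-fraction convention are also illustrative.

```python
# (completion payment, perfect-score payment) by grade, from the text above.
NYC_PAY = {4: (5, 25), 7: (10, 50)}

def nyc_test_payout(score_fraction, grade):
    """Payment for one completed test, given the fraction of points earned.

    Assumes a straight line from the completion payment (score 0)
    to the perfect-score payment (score 1).
    """
    if not 0.0 <= score_fraction <= 1.0:
        raise ValueError("score_fraction must be between 0 and 1")
    base, top = NYC_PAY[grade]
    return base + (top - base) * score_fraction

# Ten incentivized tests per year (six computerized, four predictive).
max_seventh_grade = 10 * nyc_test_payout(1.0, grade=7)
max_fourth_grade = 10 * nyc_test_payout(1.0, grade=4)
```

The seventh-grade maximum reproduces the $500 per year stated above; the same arithmetic gives $250 for fourth graders.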
In Chicago, of the 70 schools that opted to participate, researchers selected the forty smallest schools (7,655 ninth graders) and randomly selected twenty to treat, with the other twenty serving as controls. Participating schools received up to $1,500 to provide a bonus for the school liaison who served as the main contact for the implementation team. Students in Chicago were given grade incentives in five courses: English, mathematics, science, social science, and gym. Researchers rewarded each student with $50 for an A, $35 for a B, $20 for a C, and $0 for a D. If a student failed a core course, s/he received $0 for that course and temporarily “lost” all other monies earned from other courses in the grading period. Once the student made up the failing grade through credit recovery, night school, or summer school, all the money “lost” was reimbursed. Students could earn $250 every five weeks and $2,000 per year. Half of the rewards were given immediately after the five-week grading periods ended and the other half was supposed to be held in an account and given in a lump sum conditional on high school graduation.
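The Chicago rule for a single five-week grading period can be sketched as follows. Two simplifications are assumptions rather than facts from the source: any F among the five listed courses is treated as triggering the temporary loss (the source says "core course" without listing which courses count), and the reimbursement after a made-up grade is not modeled. The function and variable names are illustrative.

```python
REWARD = {"A": 50, "B": 35, "C": 20, "D": 0, "F": 0}

def chicago_period_payout(grades):
    """Split one grading period's earnings for a student.

    grades: dict mapping course name -> letter grade.
    Returns (paid_now, held_until_graduation, withheld_until_made_up).
    """
    earned = sum(REWARD[letter] for letter in grades.values())
    if "F" in grades.values():
        # Assumption: any F withholds everything earned this period.
        return (0.0, 0.0, float(earned))
    # Half is paid after the grading period, half is held for graduation.
    return (earned / 2, earned / 2, 0.0)

courses = ["English", "mathematics", "science", "social science", "gym"]
straight_as = chicago_period_payout({c: "A" for c in courses})
one_failure = chicago_period_payout({**{c: "A" for c in courses}, "gym": "F"})
```

Straight A's yield the $250 period maximum stated above, split evenly between the immediate payment and the graduation account.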
Researchers collected administrative and survey data. The data include information on each student’s first and last name, birth date, address, race, gender, free lunch eligibility, attendance, matriculation with course grades, special education status, and English Language Learner (ELL) status. In Dallas and New York, researchers are able to link students to their classroom teachers. New York City administrative files contain teacher value-added data for teachers in grades four through eight, as well as data on student suspensions and behavioral incidents.
The main outcome variable is an achievement test unique to each city. All Chicago tenth graders take the PLAN assessment, an ACT college-readiness exam, in October. In May of every school year, students in regular classes in Dallas elementary schools take the Iowa Tests of Basic Skills (ITBS) if they are in kindergarten, first grade, or second grade. Students in bilingual classes in Dallas take a different exam, called Logramos. In New York City, McGraw-Hill mathematics and English Language Arts tests are administered each winter to students in grades three through eight.
Researchers use a parsimonious set of controls to aid in precision and to correct for any potential imbalance between treatment and control. The most important controls are reading and math achievement test scores from the previous two years; they are included in all regressions along with their squares. Previous years’ test scores are available for most students who were in the district in previous years. Researchers also include an indicator variable that is one if a student is missing a test score from a previous year and zero otherwise. Other individual-level controls include a mutually exclusive and collectively exhaustive set of race dummies pulled from each school district’s administrative files, indicators for free lunch eligibility, special education status, and whether a student is an English Language Learner. Researchers also construct three school-level control variables: percent of student body that is black, percent Hispanic, and percent free lunch eligible.
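The treatment of prior test scores described above (levels, squares, and a missing-score indicator) can be sketched for a single student. Filling a missing score with zero is an assumption: the source says only that a missing indicator is included, not how the missing value itself is coded. The function and column names are hypothetical.

```python
def build_score_controls(prior_scores):
    """Build regressors from prior test scores for one student.

    prior_scores: dict mapping a score name to a float, or None if missing.
    For each score this returns the level, its square, and a 0/1 indicator
    that the score is missing.
    """
    row = {}
    for name, score in prior_scores.items():
        missing = score is None
        value = 0.0 if missing else float(score)  # assumption: zero-fill
        row[name] = value
        row[name + "_sq"] = value ** 2
        row[name + "_missing"] = 1 if missing else 0
    return row

controls = build_score_controls({"reading_lag1": 0.5, "math_lag1": None})
```

With a zero fill, the missing indicator absorbs the mean difference for students without a prior score, so those students can stay in the regression sample.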
To supplement each district’s administrative data, researchers administered a survey in each of the three school districts. These surveys include basic demographics of each student such as family structure and parental education, time use, effort and behavior in school, and the Intrinsic Motivation Inventory.
Researchers assess the effects of incentives on state test scores – an indirect and non-incentivized outcome. They estimate intent-to-treat (ITT) effects using a regression equation that includes an indicator for assignment to treatment, a vector of baseline covariates measured at the individual level, and school-level variables (the parsimonious set of controls described above). All of these variables are measured pre-treatment. ITT provides an estimate of the impact of being offered a chance to participate in a financial incentive program. All student mobility between schools after random assignment is ignored. Researchers only include students who were in treatment and control schools as of October 1 in the year of treatment. For most districts, school begins in early September; the first student payments were distributed mid-October. All standard errors are clustered at the school level. To ensure that they do not overfit the model, researchers also estimate treatment effects with school-level regressions.
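The regression described in words above could be written roughly as follows. The notation is an assumption introduced here for illustration and is not taken from the source: $y_{is}$ is the test score of student $i$ in school $s$, $Z_s$ is the indicator for the school's assignment to treatment, $X_i$ is the vector of individual baseline covariates, and $W_s$ is the vector of school-level controls.

```latex
y_{is} = \alpha + \pi Z_s + X_i \beta + W_s \gamma + \varepsilon_{is}
```

In this notation $\pi$ is the ITT effect, and the error term $\varepsilon_{is}$ is allowed to be correlated among students in the same school, which is why standard errors are clustered at the school level.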
Researchers also assess the effect of incentives on outcomes for which students were given direct incentives (i.e., books in Dallas, predictive tests in NYC, and report card grades in Chicago), on their self-reported effort, and on intrinsic motivation. They also assess treatment effects for subsamples – gender, race/ethnicity, previous year’s test score, an income proxy, whether a student is an English language learner, and, in Dallas only, whether a student took the English or Spanish test. All categories are mutually exclusive and collectively exhaustive. Standard errors are clustered at the school level.