Measuring Success in Education: The Role of Effort on the Test Itself
Last registered on December 13, 2018

Pre-Trial

Trial Information
General Information
Title
Measuring Success in Education: The Role of Effort on the Test Itself
RCT ID
AEARCTR-0003657
Initial registration date
December 12, 2018
Last updated
December 13, 2018 12:45 AM EST
Location(s)
Region
Primary Investigator
Affiliation
Bentley University
Other Primary Investigator(s)
PI Affiliation
Shanghai Jiao Tong University
PI Affiliation
University of California-San Diego
PI Affiliation
University of Chicago
PI Affiliation
University of California-San Diego
Additional Trial Information
Status
Completed
Start date
2015-11-04
End date
2018-12-10
Secondary IDs
Abstract
Standardized tests comparing educational achievements are an important policy tool. U.S. students often rank poorly on such assessments. We propose that this is due not only to differences in ability but also to differences in effort on the test itself. We experimentally show that offering U.S. students incentives to put forth effort improves test performance substantially. In contrast, Shanghai students, who are top performers on assessments, are not affected by incentives. Our findings suggest that ranking countries based on low-stakes assessments does not reflect only differences in ability, but also intrinsic motivation to perform well on the test.
External Link(s)
Registration Citation
Citation
Gneezy, Uri et al. 2018. "Measuring Success in Education: The Role of Effort on the Test Itself." AEA RCT Registry. December 13. https://www.socialscienceregistry.org/trials/3657/history/38875
Experimental Details
Interventions
Intervention(s)
10th grade students at two high schools in the U.S.A. and four high schools in Shanghai take a 25-question mathematics test made up of multiple-choice and free-answer questions that appeared on past editions of the Programme for International Student Assessment (PISA). Students are randomized into either treatment or control. Members of the treatment group are given a financial incentive based on their test performance. Right before taking the test, they are given an envelope with $25 in cash (or the equivalent in RMB in Shanghai) and are told that $1 will be taken away for each question that is answered incorrectly. The control group takes the test with no financial incentives.

The test is administered and graded by computer, so the payments to subjects in the treatment group are processed immediately at the conclusion of the experiment.
Intervention Start Date
2016-03-25
Intervention End Date
2018-04-26
Primary Outcomes
Primary Outcomes (end points)
The study examines four outcome variables, all related to performance on the test:
Student level outcomes (N = 447 in U.S., N = 656 in Shanghai)
1. Number of questions answered correctly (out of 25), standardized by subtracting the sample mean and dividing by the sample standard deviation.

Question response level outcomes (N = 447 students x 25 questions = 11,175 observations in U.S., N = 656 students x 25 questions = 16,400 observations in Shanghai)
2. The probability that question i is attempted by student j:
a) over all 25 questions
b) over questions 1-13
c) over questions 14-25

3. The probability that student j answered question i correctly (sample of attempted questions only)
a) over all 25 questions
b) over questions 1-13
c) over questions 14-25

4. The probability that student j answered question i correctly (all questions, both attempted and not attempted)
a) over all 25 questions
b) over questions 1-13
c) over questions 14-25
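The outcome families above can all be computed per student from a vector of 25 question responses. The sketch below is an illustrative reconstruction, not the authors' code: the function names (`outcome_summary`, `standardize`) are hypothetical, and it assumes each response is recorded as "correct", "incorrect", or None for an unattempted question.

```python
def outcome_summary(responses):
    """Per-student outcomes from 25 responses, each 'correct',
    'incorrect', or None (question not attempted).
    Hypothetical helper for illustration only."""
    def block(part):
        attempted = [r for r in part if r is not None]
        correct = sum(r == "correct" for r in part)
        return {
            # probability a question in this block is attempted
            "p_attempt": len(attempted) / len(part),
            # accuracy among attempted questions only
            "p_correct_attempted": (correct / len(attempted)) if attempted else None,
            # accuracy over all questions, attempted or not
            "p_correct_all": correct / len(part),
        }
    return {
        "n_correct": sum(r == "correct" for r in responses),
        "all": block(responses),        # outcomes 2a/3a/4a
        "q1_13": block(responses[:13]),  # outcomes 2b/3b/4b
        "q14_25": block(responses[13:]), # outcomes 2c/3c/4c
    }

def standardize(scores):
    """Z-score raw test scores against the sample mean and SD (outcome 1)."""
    n = len(scores)
    m = sum(scores) / n
    sd = (sum((s - m) ** 2 for s in scores) / (n - 1)) ** 0.5
    return [(s - m) / sd for s in scores]
```

For example, a student who answers questions 1-10 correctly, 11-15 incorrectly, and skips the rest has an attempt rate of 0.6, accuracy of 10/15 among attempted questions, and accuracy of 0.4 over all 25 questions.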
Primary Outcomes (explanation)
Secondary Outcomes
Secondary Outcomes (end points)
Secondary Outcomes (explanation)
Experimental Design
Experimental Design
The working paper linked to this registration provides full details of the experimental design (NBER working paper No. 24004). Key details are summarized below.

We recruited two high schools in the U.S.A. and four high schools in Shanghai to participate. In each school, student subjects take a 25-minute, 25-question mathematics test made up of multiple-choice and free-answer questions that appeared on past editions of the Programme for International Student Assessment (PISA). Students are randomized into either treatment or control. Members of the treatment group are given a financial incentive based on their test performance. Right before taking the test, they are given an envelope with $25 in cash (or the equivalent in RMB in Shanghai) and are told that $1 will be taken away for each question that is answered incorrectly. The students had no advance notice of the task they would be doing or of the financial incentives. The control group takes the test with no financial incentives.

The exam is administered by computer, so scores are available immediately after the test is completed.

The main experiment was conducted in 2016 at two high schools in the U.S. and at three schools in Shanghai.
U.S. School 1 is a high performing private boarding school.
U.S. School 2 is a low performing public school.
At both of these schools, all 10th grade students were required to participate.

Shanghai schools 1 through 3 include one below-average performing school, one school whose performance is just above average, and one school whose performance is well above average. Two classes of 10th grade math students at each of schools 1 and 2, and four classes of 10th grade math students at school 3, were randomly selected to participate.

In 2018, we reran the experiment in Shanghai at schools 2 and 3 and at a new school, school 4, whose performance is also well above average.

Logistical constraints required different procedures for randomizing students into treatment or control (described in more detail below):
U.S. School 1: students were randomized into treatment or control at the individual level.
U.S. School 2: students were randomized into treatment or control at the class level.
At Shanghai schools 1-3 in 2016, students were randomized into treatment or control at the class level.
At Shanghai schools 2-4 in 2018, students were randomized into treatment or control at the individual level.


Experimental Design Details
Randomization Method
The randomization is stratified by school. Logistical constraints required different procedures for randomizing students into treatment or control.

We randomized at the class level in the lower performing school (school 2) in the U.S. and in the 2016 sessions in Shanghai. We randomized at the individual level in the higher performing school in the U.S. and in the 2018 sessions in Shanghai. In the U.S., we stratified by school and re-randomized to achieve balance on the following baseline characteristics: gender, ethnicity, and mathematics class level/track (low, regular, or honors). For each school's randomization, we re-randomized until the p-values of all tests of differences between treatment and control were above 0.4. In the 2016 Shanghai sessions, we stratified the randomization by school (baseline demographics were not available at the time of randomization). In the 2018 Shanghai sessions, we stratified the randomization by class, gender, and senior entrance exam score quartile.
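The re-randomization step can be sketched as follows. This is a minimal, stdlib-only illustration, not the authors' code: the function names (`t_stat`, `rerandomize`) are hypothetical, and the balance criterion approximates "all p-values above 0.4" by requiring each covariate's two-sample |t| statistic to fall below 0.8416, the normal-approximation cutoff for a two-sided p-value of 0.4.

```python
import random

def t_stat(x, y):
    """Welch two-sample t statistic for a difference in means."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return (mx - my) / ((vx / nx + vy / ny) ** 0.5)

def rerandomize(students, covariates, max_tries=10_000, t_cut=0.8416):
    """Re-draw a 50/50 treatment assignment until every covariate's
    |t| statistic is below t_cut (~ two-sided p > 0.4 under a normal
    approximation). Hypothetical sketch of the balance procedure."""
    ids = list(range(len(students)))
    for _ in range(max_tries):
        random.shuffle(ids)
        treat = set(ids[: len(ids) // 2])
        balanced = True
        for cov in covariates:
            x = [students[i][cov] for i in range(len(students)) if i in treat]
            y = [students[i][cov] for i in range(len(students)) if i not in treat]
            if abs(t_stat(x, y)) >= t_cut:
                balanced = False
                break
        if balanced:
            return treat
    raise RuntimeError("no balanced assignment found")
```

In practice the same idea extends to stratified draws (shuffle within each stratum) and to categorical covariates coded as indicator variables.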
Randomization Unit
The randomization is stratified by school. Within each school the level of randomization varied because of logistical constraints.
U.S. School 1: students were randomized into treatment or control at the individual level.
U.S. School 2: students were randomized into treatment or control at the class level.
At Shanghai schools 1-3 in 2016, students were randomized into treatment or control at the class level.
At Shanghai schools 2-4 in 2018, students were randomized into treatment or control at the individual level.
Was the treatment clustered?
Yes
Experiment Characteristics
Sample size: planned number of clusters
U.S.: 131 clusters. In U.S. School 1, the randomization was done at the individual level, so the size of each cluster is 1 for that school.
Shanghai: 384 clusters. In the 2018 sessions, the randomization was done at the individual level so the size of each cluster is 1 for those sessions.
Sample size: planned number of observations
The U.S. sample includes 447 students (227 in control and 220 in treatment). The Shanghai sample includes 656 students (333 in control and 323 in treatment).
Sample size (or number of clusters) by treatment arms
U.S.:
227 students organized into 64 clusters (12 of size n>1, 52 of size n=1) in control.
220 students organized into 67 clusters (13 of size n>1, 54 of size n=1) in treatment.

Shanghai:
333 students organized into 196 clusters (4 of size n>1, 190 of size n=1) in control.
323 students organized into 188 clusters (4 of size n>1, 184 of size n=1) in treatment.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
U.S.: For the main outcome, test score (out of 25), the control mean is 10.22 with a standard deviation of 5.63. With 131 clusters of size 3 (the actual average size is 3.41, and clusters vary widely in size), at power 0.8 and a significance level of 0.05, the minimum detectable effect size is 1.13. With 131 clusters of size 4, the minimum detectable effect size is 0.98.

Shanghai: For the main outcome, test score (out of 25), the control mean is 20.50 with a standard deviation of 2.95. With 384 clusters of size 2 (the actual average size is 1.71; most clusters are size 1, with four clusters that are much larger), at power 0.8 and a significance level of 0.05, the minimum detectable effect size is 1.05. With 384 clusters of size 1, the minimum detectable effect size is 1.00.
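A standard back-of-the-envelope MDE formula for a two-arm comparison of means inflates the variance by the design effect 1 + (m - 1)ρ, where m is the cluster size and ρ the intracluster correlation. The sketch below is a generic illustration only: the function name is hypothetical, the z-values assume power 0.8 and a two-sided 0.05 test, and it is not guaranteed to reproduce the registry's exact numbers, which depend on the ICC and software used.

```python
def mde(sd, n_treat, n_control, cluster_size=1, icc=0.0,
        z_alpha=1.96, z_power=0.8416):
    """Minimum detectable effect for a difference in means.
    z_alpha ~ two-sided alpha = 0.05; z_power ~ power = 0.8.
    The design effect 1 + (m - 1)*icc inflates the variance for
    clustered assignment. Hypothetical sketch, not the registry's
    actual calculation."""
    deff = 1 + (cluster_size - 1) * icc
    return (z_alpha + z_power) * sd * (deff * (1 / n_treat + 1 / n_control)) ** 0.5
```

With icc = 0 (or clusters of size 1) this reduces to the familiar individual-level formula; larger clusters or a higher ICC raise the MDE.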
IRB
INSTITUTIONAL REVIEW BOARDS (IRBs)
IRB Name
University of Chicago IRB
IRB Approval Date
2015-05-27
IRB Approval Number
IRB15-0448
Post-Trial
Post Trial Information
Study Withdrawal
Intervention
Is the intervention completed?
Yes
Intervention Completion Date
April 26, 2018, 12:00 AM +00:00
Is data collection complete?
Yes
Data Collection Completion Date
April 26, 2018, 12:00 AM +00:00
Final Sample Size: Number of Clusters (Unit of Randomization)
U.S.: 447 students, 131 clusters
Shanghai: 656 students, 384 clusters
Was attrition correlated with treatment status?
No
Final Sample Size: Total Number of Observations
U.S.: 447 students, 131 clusters
Shanghai: 656 students, 384 clusters
Final Sample Size (or Number of Clusters) by Treatment Arms
U.S.:
227 students organized into 64 clusters (12 of size n>1, 52 of size n=1) in control.
220 students organized into 67 clusters (13 of size n>1, 54 of size n=1) in treatment.

Shanghai:
333 students organized into 196 clusters (4 of size n>1, 190 of size n=1) in control.
323 students organized into 188 clusters (4 of size n>1, 184 of size n=1) in treatment.
Data Publication
Data Publication
Is public data available?
No

Program Files
Program Files
No
Reports and Papers
Preliminary Reports
Relevant Papers
Abstract
Tests measuring and comparing educational achievement are an important policy tool. We experimentally show that offering students extrinsic incentives to put forth effort on such achievement tests has differential effects across cultures. Offering incentives to U.S. students, who generally perform poorly on assessments, improved performance substantially. In contrast, Shanghai students, who are top performers on assessments, were not affected by incentives. Our findings suggest that in the absence of extrinsic incentives, ranking countries based on low-stakes assessments is problematic because test scores reflect differences in intrinsic motivation to perform well on the test itself, and not just differences in ability.
Citation
Uri Gneezy, John A. List, Jeffrey A. Livingston, Sally Sadoff, Xiangdong Qin, Yang Xu (2017) "Measuring Success in Education: The Role of Effort on the Test Itself." NBER Working Paper No. 24004