Across-test human performance in mathematics

Last registered on October 26, 2023


Trial Information

General Information

Across-test human performance in mathematics
Initial registration date
October 06, 2023

First published
October 17, 2023, 10:55 AM EDT

Last updated
October 26, 2023, 7:19 PM EDT

There is information in this trial unavailable to the public.

Primary Investigator

Harvard University

Other Primary Investigator(s)

PI Affiliation
Harvard University

Additional Trial Information

In development
Start date
End date
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
We study how human subjects perform on a series of math questions sampled from different standardized tests of varying difficulty. The study will take place online, with subjects recruited through the Prolific platform.
External Link(s)

Registration Citation

Dreyfuss, Bnaya and Raphael Raux. 2023. "Across-test human performance in mathematics." AEA RCT Registry. October 26.
Experimental Details


We administer a math test online in which subjects attempt to solve 30 randomly sampled questions drawn from 3 standardized tests of different difficulty levels (4th grade, 8th grade, and high school). These tests belong to the TIMSS family of tests, which are administered every four years in several countries. We restrict our attention to multiple-choice questions, which make up the large majority of available questions and are more directly comparable with one another. The question pool will contain approximately 420 questions.
Intervention Start Date
Intervention End Date

Primary Outcomes

Primary Outcomes (end points)
Subject performance on the test.
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
Subjects see a total of 30 questions, randomly sampled from the full question pool. At this stage there are no "treatments".
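The sampling step can be illustrated with a short Python simulation. This is a minimal sketch, not the Qualtrics implementation used in the study: it assumes each subject receives 30 distinct questions drawn uniformly at random (without replacement) from a pool of 420, the approximate pool size given in this registration. The function names are hypothetical.

```python
import random
from collections import Counter

POOL_SIZE = 420             # approximate pool size (assumption: exactly 420 here)
QUESTIONS_PER_SUBJECT = 30  # questions shown to each subject

def draw_questions(pool_size=POOL_SIZE, k=QUESTIONS_PER_SUBJECT, rng=random):
    """Draw k distinct question IDs uniformly at random, without replacement."""
    return rng.sample(range(pool_size), k)

def simulate_coverage(n_subjects, seed=0):
    """Count how many responses each question receives across n_subjects.

    Questions that are never drawn do not appear in the returned Counter.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_subjects):
        counts.update(draw_questions(rng=rng))
    return counts

counts = simulate_coverage(280)
print(min(counts.values()), max(counts.values()))
```

Because each subject's draw is independent, the number of responses per question varies around its mean rather than being equal across questions; the simulation gives a rough sense of that spread for a given sample size.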

The goal of this exercise is to create a difficulty index across all questions, based on average success rates. We expect (tautologically) that a given subject's performance will decrease with question difficulty (computed from average performance). At the extremes of the ability distribution, a flatter slope can be expected, since performance is bounded between 0 and 1.
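A minimal Python sketch of how such an index and the per-subject relationship could be computed, under assumptions not specified in the registration: responses are stored as hypothetical `(subject_id, question_id, correct)` tuples with `correct` coded 0/1, difficulty is defined here as one minus a question's average success rate, and each subject's relationship to difficulty is summarized by an ordinary least-squares slope.

```python
from collections import defaultdict

def difficulty_index(responses):
    """Map each question to 1 - (average success rate); higher means harder.

    responses: iterable of (subject_id, question_id, correct), correct in {0, 1}.
    """
    totals = defaultdict(lambda: [0, 0])  # question_id -> [n_correct, n_attempts]
    for _, q, correct in responses:
        totals[q][0] += correct
        totals[q][1] += 1
    return {q: 1 - c / n for q, (c, n) in totals.items()}

def subject_slope(responses, subject_id, difficulty):
    """OLS slope of one subject's 0/1 correctness on question difficulty.

    Returns NaN if the subject's questions all share the same difficulty.
    """
    pts = [(difficulty[q], c) for s, q, c in responses if s == subject_id]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx if sxx else float("nan")

# Toy data: two subjects, three questions.
data = [("a", 1, 1), ("b", 1, 1), ("a", 2, 1), ("b", 2, 0), ("a", 3, 0), ("b", 3, 0)]
d = difficulty_index(data)
print(d)                            # {1: 0.0, 2: 0.5, 3: 1.0}
print(subject_slope(data, "a", d))
```

In this toy example both subjects have a negative slope, matching the expectation stated above; for subjects who answer nearly everything right or wrong, the bounded 0/1 outcome leaves little room for the slope to move away from zero.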
Experimental Design Details
Not available
Randomization Method
Randomization is done through the Qualtrics survey flow.
Randomization Unit
Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters
We want to obtain enough responses for each item to create a reliable difficulty index. We would need between 20 and 40 responses per question. Since each subject answers 30 of the approximately 420 questions, this requires roughly 280 to 560 subjects (420 × 20 / 30 = 280; 420 × 40 / 30 = 560).
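The arithmetic behind these figures can be written as a short Python sketch: N subjects answering 30 questions each generate 30N responses, or 30N / 420 per question on average, so reaching r responses per question requires 420r / 30 subjects. The function name is hypothetical.

```python
import math

def subjects_needed(responses_per_question, pool_size=420, questions_per_subject=30):
    """Subjects needed so each question gets the target number of responses on average."""
    return math.ceil(pool_size * responses_per_question / questions_per_subject)

for r in (20, 40):
    print(r, subjects_needed(r))  # prints "20 280", then "40 560"
```

This is an average: with independent random draws, individual questions will receive somewhat more or fewer responses than the target.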
Sample size: planned number of observations
Between 280 and 560 subjects
Sample size (or number of clusters) by treatment arms
Between 280 and 560 subjects
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)

Institutional Review Boards (IRBs)

IRB Name
Committee on the Use of Human Subjects - Harvard
IRB Approval Date
IRB Approval Number