One Truth, Many Datasets: How Context Shapes Model Extraction

Last registered on May 04, 2026

Pre-Trial

Trial Information

General Information

Title
One Truth, Many Datasets: How Context Shapes Model Extraction
RCT ID
AEARCTR-0018501
Initial registration date
April 29, 2026

The initial registration date corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
May 04, 2026, 8:00 AM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Locations

There is information in this trial that is not available to the public.

Primary Investigator

Affiliation
Universitat Pompeu Fabra

Other Primary Investigator(s)

PI Affiliation
Universitat Pompeu Fabra

Additional Trial Information

Status
Ongoing
Start date
2026-04-01
End date
2026-09-30
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
This study investigates how people extract information from data when they have no prior knowledge of the underlying relationship between explanatory variables and the outcome. Participants observe datasets consisting of colored lights (on or off) paired with a sound, and must infer a rule (a mapping from the lights' status to the sound) to predict when the sound occurs. The study evaluates theoretical predictions about how the statistical properties of a dataset influence rule extraction. Participants are recruited online via Prolific from the United Kingdom.
External Link(s)

Registration Citation

Citation
Salvanti, Andrea and Patrick Sewell. 2026. "One Truth, Many Datasets: How Context Shapes Model Extraction." AEA RCT Registry. May 04. https://doi.org/10.1257/rct.18501-1.0
Experimental Details

Interventions

Intervention(s)
This experiment tests a cost–benefit model of rule extraction. Participants observe datasets consisting of two or three lights (explanatory variables) that vary in color and can be either on or off, paired with a sound (dependent variable). Participants receive no information about the underlying statistical relationships. We examine how the probability of extracting each rule (i.e., a mapping from the light configurations to the prediction of the sound) depends on the statistical properties of the dataset.
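For intuition, here is a minimal sketch (not the experiment's code) of the rule space this design implies, assuming the sound is binary (it either occurs or not) for each light configuration: with two lights there are 2^2 = 4 configurations, and hence 2^4 = 16 deterministic candidate rules.

```python
from itertools import product

# Hypothetical illustration: two lights, each on (1) or off (0), and a
# binary sound outcome (an assumption; the registration does not state
# the sound's coding).
configs = list(product([0, 1], repeat=2))           # 4 light configurations
rules = list(product([0, 1], repeat=len(configs)))  # 16 deterministic rules

# One candidate rule: "the sound occurs iff the first light is on".
rule = {c: c[0] for c in configs}
print(len(configs), len(rules), rule)
```

With three lights the space grows to 2^3 = 8 configurations and 2^8 = 256 deterministic rules.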
Intervention Start Date
2026-05-01
Intervention End Date
2026-09-30

Primary Outcomes

Primary Outcomes (end points)
The main dependent variable is the share of subjects extracting each rule, as a function of the rule's relative value and the number of explanatory variables (lights).


Primary Outcomes (explanation)
Rule extraction is inferred using a maximum likelihood classification method that assigns participants to strategy types based on their observed choice patterns. We additionally report robustness analyses in the appendix using a binomial testing approach following Kendall and Oprea (2025) to validate the main classification results.
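The registration does not specify the likelihood formulation. The following is a minimal sketch of one standard approach (an assumption, not the authors' code), in which a participant follows a single candidate rule subject to an i.i.d. implementation error of unknown rate, profiled out over a grid:

```python
import numpy as np

def classify_participant(choices, rules, error_grid=np.linspace(0.01, 0.49, 49)):
    """Assign a participant to the candidate rule that maximizes the
    likelihood of their observed 0/1 predictions.

    choices : 0/1 predictions, one per prediction trial
    rules   : dict mapping a rule label to the 0/1 predictions that
              rule implies on the same trials
    """
    choices = np.asarray(choices)
    best_label, best_ll = None, -np.inf
    for label, implied in rules.items():
        hits = int(np.sum(choices == np.asarray(implied)))
        misses = len(choices) - hits
        # Profile the error rate over a grid and keep the best fit.
        ll = max(hits * np.log(1 - e) + misses * np.log(e) for e in error_grid)
        if ll > best_ll:
            best_label, best_ll = label, ll
    return best_label, best_ll

# Example with hypothetical rule labels on 12 predictions (the two-light
# setting); the study's actual rule set is not public.
choices = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
rules = {"light1_on": [1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1],
         "never_sound": [0] * 12}
print(classify_participant(choices, rules))  # -> ('light1_on', ...)
```

A binomial-test alternative in the spirit of the Kendall and Oprea (2025) robustness check would instead test whether each participant's match rate with a candidate rule exceeds chance.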

Secondary Outcomes

Secondary Outcomes (end points)
We also measure participants' confidence and time spent learning.
Secondary Outcomes (explanation)
Using the measure of subjective complexity proposed by Agranov et al. (2025), we will examine the relationship between perceived task complexity and rule extraction outcomes.

Experimental Design

Experimental Design
Each participant observes five datasets, each associated with a unique rule that attains the highest prediction accuracy. Participants first read instructions and complete comprehension checks. They then observe each dataset in turn, taking notes as they wish, before predicting the sound. We collect 12 predictions per dataset in the main setting with two lights and 16 predictions per dataset in the setting with three lights. At the end of the experiment, we also elicit measures of self-confidence and the strategy participants used to solve the tasks.
Experimental Design Details
Not available
Randomization Method
Participants are recruited through Prolific in sessions of 50 subjects each. Each session includes five datasets, each associated with an optimal decision rule. The composition of datasets within a session is determined ex-ante to balance cognitive load across participants (e.g., ensuring variation in task difficulty within a session). Participants have no prior knowledge of which datasets they will encounter.

Every participant is exposed to a single condition for each dataset, so comparisons across conditions are between-subjects within each dataset. Condition assignments are balanced to ensure equal sample sizes per condition within each dataset.
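As a sketch of what such balanced assignment could look like (the condition labels are taken from the power-analysis section below; the exact procedure is not public):

```python
import random

def balanced_assignment(n_subjects=50, conditions=("First", "Second"), seed=None):
    """Shuffle a fixed-composition list of condition labels so each
    condition gets exactly n_subjects / len(conditions) participants."""
    assert n_subjects % len(conditions) == 0
    labels = list(conditions) * (n_subjects // len(conditions))
    random.Random(seed).shuffle(labels)
    return labels  # labels[i] is participant i's condition
```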
Randomization Unit
The unit of randomization is the individual participant.
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
0
Sample size: planned number of observations
We will recruit 250 participants in total. No observations will be excluded from the main analysis among participants who complete the experiment; however, participants who drop out before completion will be replaced by new subjects, and responses with excessively fast completion times will be excluded and similarly replaced. Robustness checks reported in the appendix will restrict the sample to: (1) participants whose maximum-likelihood strategy estimate achieves a posterior accuracy above 80%, and (2) participants who pass all comprehension checks. The target of 250 participants refers to the full sample, before these restrictions. Participants will be recruited on Prolific from the United Kingdom and will be required to be native English speakers, aged 18–50, with at least a high school education (including higher education up to the doctoral level). To ensure data quality, only participants with a Prolific approval rate of at least 99% will be included.
Sample size (or number of clusters) by treatment arms
Each of the three treatments with two lights includes 50 participants, for a total of 150. The two treatments with three lights likewise include 50 participants each, bringing the overall total to 250.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Sample size was determined based on results from a pilot study. For between-subject comparisons within each dataset, detecting a 25 percentage-point difference (from a baseline of 40%) between the First and Second conditions at a 5% significance level with 80% power requires approximately 50 participants per group under a one-sided (directional) test. For the within-subject analysis pooling across conditions, a sample of 150 participants provides approximately 90% power at a 1% significance level under the effect sizes observed in the pilot study.
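As a check, the stated per-group size can be reproduced with a standard two-proportion power calculation. A minimal sketch using statsmodels (assuming the normal approximation with a Cohen's h effect size; the authors' actual calculation may differ):

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Baseline 40% vs. 40% + 25pp = 65%, one-sided alpha = 0.05, power = 0.80.
h = proportion_effectsize(0.65, 0.40)  # Cohen's h, roughly 0.51
n_per_group = NormalIndPower().solve_power(
    effect_size=h, alpha=0.05, power=0.80, ratio=1.0, alternative="larger")
print(round(n_per_group))  # about 48, consistent with ~50 per group
```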
IRB

Institutional Review Boards (IRBs)

IRB Name
Institutional Committee for Ethical Review of Projects (CIREP) at Universitat Pompeu Fabra
IRB Approval Date
2026-04-09
IRB Approval Number
468