How People Use Statistics

Last registered on April 08, 2024

Pre-Trial

Trial Information

General Information

Title
How People Use Statistics
RCT ID
AEARCTR-0011166
Initial registration date
April 05, 2023


First published
April 13, 2023, 3:36 PM EDT


Last updated
April 08, 2024, 8:36 AM EDT


Locations

Region

Primary Investigator

Affiliation
Harvard University

Other Primary Investigator(s)

PI Affiliation

Additional Trial Information

Status
Completed
Start date
2023-04-06
End date
2023-07-01
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
We study how people solve statistical problems. We test a model in which selective attention to different features of problems yields multimodality in beliefs. Our experiments are designed to change the distribution of participants' beliefs by manipulating the contrast or prominence of various features while (typically) holding constant the underlying statistical problem.
External Link(s)

Registration Citation

Citation
Bordalo, Pedro et al. 2024. "How People Use Statistics." AEA RCT Registry. April 08. https://doi.org/10.1257/rct.11166-1.1
Experimental Details

Interventions

Intervention(s)
Intervention Start Date
2023-04-06
Intervention End Date
2023-04-14

Primary Outcomes

Primary Outcomes (end points)
Our primary questions of interest are variants on the "balls-and-urns" inference paradigm and on questions about coin flips ("Gambler's Fallacy" problems).
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
We will also elicit from participants free-text responses of how they solved each problem and self-reports of which features they felt they were paying attention to. Next, we will have participants rate the similarity between pairs of sequences of coin flips, to back out which features they attend to, as well as their beliefs about the absolute frequency of various sequences. We plan to correlate judgments of average similarity with these frequency judgments.
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
The experiment has two main questions—one inference problem and one gambler's fallacy problem—that occur in a random order at the beginning of the experiment. After each of them, we elicit self-reports of how participants solved each question as well as what features of the problem they felt they were paying attention to.

Experimental Design Details
There are 15 inference-problem treatments, described briefly below. Our primary focus is on the fraction of participants whose beliefs fall into different "modes". We hypothesize that these modes will be the following: the base rate, 50-50, (close to) the Bayesian answer, and the "likelihood" (i.e., P(Signal | Hypothesis)). We also hypothesize that some respondents will answer with P(Signal & Hypothesis) (i.e., failing to renormalize given the likelihood of the signal conditional on the alternative hypothesis).

1. Balls-and-urns control condition
2. Blue-cab green-cab problem. We hypothesize that this will increase the mode at the likelihood, compared to treatment 1.
3. "Undermine" cabs. We hypothesize this will reduce the mode at the likelihood compared to treatment 2.
4. "Cabified" balls-and-urns. We hypothesize this will increase the mode at the likelihood compared to treatment 1.
5. Balls-and-urns with less extreme likelihood.
6. Balls-and-urns with more extreme likelihood. We hypothesize this will increase the modes at the likelihood and the Bayesian answer, relative to the modes at the base rate and P(Signal & Hypothesis), compared to treatment 5.
7. Complicated signal (5 green balls and 4 blue, rather than just 1 green ball). We hypothesize this will boost the mode at the base rate compared to treatment 1.
8. 2 Green Signals. We hypothesize multimodality in these beliefs, but are not comparing them to another treatment.
9. 1 Green Signal, 1 Irrelevant signal.
10. 1 Green Signal, No Irrelevant Signal. We hypothesize that treatment 9 will have an increased mode at the base rate or at 50-50 compared to this treatment.
11. Balls and urns but only explicitly asking about one hypothesis. We hypothesize that this will increase the mode at P(Signal & Hypothesis) compared to treatment 1.
12. "Small Green Urn". Base rate = 50%, P(Green | Jar A) = 50%. P(Green | Jar B) = 100%. But, problem is described in terms of frequencies (how many marbles in each jar). Jar B has 5 green marbles, and Jar A has 5 green and 5 blue.
13. "Big Green Urn". Same as treatment 12, but Jar B has 15 green marbles. We hypothesize a shift away from 50-50, and toward 25% (the ratio of green marbles in Jar B compared to Jar A), compared to treatment 12.
14. "Elementary description". Same statistical problem as treatment 1, but the probability of each event (e.g., a green marble from Jar A) is described individually. We hypothesize an increased mode around the Bayesian answer compared to treatment 1.
15. Elementary description with alternative implicit. We hypothesize this will increase the mode at P(Signal and Hypothesis) compared to treatment 14.

Our model suggests that attention to different features of the problem correlates with which mode beliefs sort into. We will measure attention using both participants' free-text responses describing how they solved the problem and their answers to the questions asking directly which features they were paying attention to.

We hypothesize that attention to: 1) the color of the blue-vs-green marble/cab will correlate with answering with the likelihood, 2) the urn/cab company will correlate with the base rate, 3) the "match" between urn and signal or to whether the witness's report is correct will correlate with answering with the likelihood, 4) both color/match and urn correlate with the Bayesian answer, 5) nothing correlates with 50-50, and 6) the irrelevant signal correlates with 50-50 or the base rate.
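For concreteness, the candidate modes named above can be computed side by side for any parameterization of a two-hypothesis inference problem. The sketch below is illustrative only: the specific numbers are the classic taxicab-problem parameters (15% base rate, 80% reliable witness), which are an assumption, not necessarily the parameterization used in any treatment.

```python
def candidate_modes(base_rate, p_signal_given_h, p_signal_given_alt):
    """Candidate belief 'modes' for P(Hypothesis | Signal) in a two-hypothesis problem.

    base_rate:         prior P(Hypothesis)
    p_signal_given_h:  likelihood P(Signal | Hypothesis)
    p_signal_given_alt: P(Signal | alternative hypothesis)
    """
    joint = base_rate * p_signal_given_h  # P(Signal & Hypothesis), not renormalized
    bayes = joint / (joint + (1 - base_rate) * p_signal_given_alt)
    return {
        "base rate": base_rate,
        "50-50": 0.5,
        "Bayesian": round(bayes, 3),
        "likelihood": p_signal_given_h,
        "P(Signal & Hypothesis)": joint,
    }

# Classic cab-problem numbers, used here purely for illustration:
modes = candidate_modes(base_rate=0.15, p_signal_given_h=0.80, p_signal_given_alt=0.20)
```

With these numbers the modes are well separated (0.15, 0.5, 0.414, 0.8, and 0.12), which is what makes multimodality in elicited beliefs detectable.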

We have six gambler's fallacy treatments:

1. TH vs HH: Asks the relative frequency of these 2-flip sequences of coin flips.
2. THTHHT vs HHHHHH: we hypothesize this will decrease the mode at 50-50 and shift the mean belief down (where lower beliefs correspond to committing the gambler's fallacy more) compared to treatment 1.
3. HHHHHT vs HHHHHH.
4. P(H | HHHHH): Same as treatment 3, except it is emphasized how the problem is asking for the probability that the final flip is heads vs tails conditional on the first five flips all being heads (rather than just asking about the likelihood of each of these sequences as a whole). We hypothesize this will increase the mode at 50-50 compared to treatment 3.
5. Priming control condition. Before participants answer the main question (which will be about THTHHT vs HHHTHH), they will rate 15 pairs of sequences by how similar they are to each other, where this means how many individual flips differ between them (e.g., first flip is heads in one but tails in the other).
6. Priming share heads: Same as treatment 5 but the ratings questions will ask about what share of flips in each sequence are heads vs tails and ask participants to rate differences between them on this basis. We intend this treatment to boost attention paid to share heads in the main gambler's fallacy question (about THTHHT vs HHHTHH), which we will measure using self-reported attention. If it succeeds in sufficiently boosting attention to share heads, we hypothesize that it will reduce the fraction of participants who answer with 50-50 and lower the mean belief (where lower beliefs correspond to exhibiting the gambler's fallacy more).

Our model suggests that attention to the share of heads vs tails will correlate with committing the gambler's fallacy.
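The normative benchmark behind all six treatments, assuming a fair coin, is that every specific flip sequence of a given length is equally likely, so the correct answer to each pairwise comparison is 50-50; the conditional framing in treatment 4 is mathematically equivalent. A minimal sketch:

```python
def seq_prob(seq, p_heads=0.5):
    """Probability of one specific flip sequence, e.g. 'THTHHT', given P(heads)."""
    probs = {"H": p_heads, "T": 1 - p_heads}
    p = 1.0
    for flip in seq:
        p *= probs[flip]
    return p

# Any two specific sequences of equal length are equally likely for a fair coin:
assert seq_prob("THTHHT") == seq_prob("HHHHHH") == 0.5 ** 6

# Treatment 4's conditional framing is equivalent:
# P(H | HHHHH) = P(HHHHHH) / P(HHHHH) = 0.5
p_cond = seq_prob("HHHHHH") / seq_prob("HHHHH")
```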

In April 2024, we will run additional treatments with the following design. All participants will answer two inference problems and two "compound probability reduction" problems. These will allow us to see whether answers are correlated across problems within person. The second inference problem is always an identical balls-and-urns problem. The first inference problem has either a low (60%) or a high (90%) likelihood, meant to vary the contrast of the signal and thereby shift the share of participants who respond with the likelihood. This design will allow us to test the extent to which treatment effects from altering the first inference problem spill over onto participants' answers to the second inference problem, which is held fixed. We will also vary whether the first inference problem has a balls-and-urns (N=2,500) or taxicabs (N=500) framing, to test whether answers are more strongly correlated within than across frames. We use a larger sample for the balls-and-urns framing to maximize precision for testing spillovers of the first problem's contrast manipulation onto the second problem (which is always a balls-and-urns problem, and we expect spillovers, if there are any, to arise within rather than across frames).

The compound probability problems are meant to test whether people's answers cluster on the modes our model predicts, as well as to investigate correlation across problems. The problems simply tell participants a prior (the odds that a computer will choose the "orange" vs. the "purple" deck of cards) and likelihoods for each deck (the share of cards whose suit is spades). They then ask, given these numbers, for the probability that a spade is drawn. Participants will be evenly split across 4 parameterizations.
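The normative reduction in these problems is the law of total probability. The sketch below uses a hypothetical parameterization for illustration; the study's four actual parameterizations are not specified in this registration.

```python
def p_spade(p_orange_deck, spade_share_orange, spade_share_purple):
    """Law of total probability:
    P(spade) = P(orange) * P(spade | orange) + P(purple) * P(spade | purple)."""
    return (p_orange_deck * spade_share_orange
            + (1 - p_orange_deck) * spade_share_purple)

# Hypothetical parameterization, for illustration only:
# 70% chance of the orange deck; 80% of its cards are spades vs. 30% in purple.
answer = p_spade(p_orange_deck=0.70, spade_share_orange=0.80, spade_share_purple=0.30)
```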
Randomization Method
All randomization will be done within Qualtrics, using either its survey-flow randomizer or JavaScript embedded in the survey.
Randomization Unit
Treatment assignment is randomized at the individual level.
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
We are randomizing at the individual level, so there are no clusters.
Sample size: planned number of observations
We hope to recruit 4,800 participants through Prolific. In addition, in our April 2024 experiments, we will recruit another 3,000 participants.
Sample size (or number of clusters) by treatment arms
Intended sample sizes for each inference treatment are below:

Treatment 1: 500
Treatment 2: 200
Treatment 3: 200
Treatment 4: 200
Treatment 5: 500
Treatment 6: 500
Treatment 7: 200
Treatment 8: 200
Treatment 9: 500
Treatment 10: 500
Treatment 11: 500
Treatment 12: 200
Treatment 13: 200
Treatment 14: 200
Treatment 15: 200

For the gambler's fallacy treatments, intended sample sizes are as follows:

Treatment 1: 400
Treatment 2: 400
Treatment 3: 1000
Treatment 4: 1000
Treatment 5: 1000
Treatment 6: 1000

Treatment is independently randomly assigned, so final sample sizes may differ somewhat from the above numbers due to chance.
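Independent individual-level assignment with the intended arm proportions can be sketched as follows. This is illustrative only: the study randomizes inside Qualtrics, and the weights below simply reuse the intended inference-treatment sample sizes from the table above (which sum to 4,800).

```python
import random

# Intended inference-treatment sample sizes, used as assignment weights so that
# expected counts match the plan while realized counts vary by chance.
INTENDED = {1: 500, 2: 200, 3: 200, 4: 200, 5: 500, 6: 500, 7: 200, 8: 200,
            9: 500, 10: 500, 11: 500, 12: 200, 13: 200, 14: 200, 15: 200}

def assign_treatment(rng):
    """Independently assign one participant to an inference treatment."""
    arms = list(INTENDED)
    weights = [INTENDED[a] for a in arms]
    return rng.choices(arms, weights=weights, k=1)[0]

rng = random.Random(0)
draws = [assign_treatment(rng) for _ in range(4800)]
```

Because each draw is independent, the realized count in each arm fluctuates around its intended size, which is why final sample sizes may differ somewhat from the table.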

See above for sample sizes in our April 2024 experiment.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB

Institutional Review Boards (IRBs)

IRB Name
Harvard
IRB Approval Date
2020-11-09
IRB Approval Number
IRB20-1759

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials