How People Use Statistics

Last registered on May 23, 2024

View Trial History

Pre-Trial

Trial Information

General Information

Title

How People Use Statistics

RCT ID

AEARCTR-0011166

Initial registration date

April 05, 2023

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published

April 13, 2023, 3:36 PM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Last updated

May 23, 2024, 11:43 AM EDT

Last updated is the most recent time when changes to the trial's registration were published.

Locations

Country

United States of America

Region

Primary Investigator

Name

John Conlon

Affiliation

Harvard University

Contact Primary Investigator

Other Primary Investigator(s)

PI Name

Pedro Bordalo

PI Affiliation

PI Name

Nicola Gennaioli

PI Affiliation

PI Name

Spencer Kwon

PI Affiliation

PI Name

Andrei Shleifer

PI Affiliation

Additional Trial Information

Status

Completed

Start date

2023-04-06

End date

2023-07-01

Keywords

Behavior, Lab

Additional Keywords

JEL code(s)

Secondary IDs

Prior work

This trial does not extend or rely on any prior RCTs.

Abstract

We study how people solve statistical problems. We test a model in which selective attention to different features of problems yields multimodality in beliefs. Our experiments are designed to change the distribution of participants beliefs by manipulating the contrast or prominence of various features while (typically) holding constant the underlying statistical problem.

External Link(s)

Registration Citation

Citation

Bordalo, Pedro et al. 2024. "How People Use Statistics." AEA RCT Registry. May 23. https://doi.org/10.1257/rct.11166-1.2

Sponsors & Partners

Experimental Details

Interventions

Intervention(s)

Intervention Start Date

2023-04-06

Intervention End Date

2023-04-14

Primary Outcomes

Primary Outcomes (end points)

Our primary questions of interest are variants on the "balls-and-urns" inference paradigm and on questions about coin flips ("Gambler's Fallacy" problems).

Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)

We will also elicit from participants free-text responses of how they solved each problem and self-reports of which features they felt they were paying attention to. Next, we will have participants rate the similarity between pairs of sequences of coin flips, to back out which features they attend to, as well as their beliefs about the absolute frequency of various sequences. We plan to correlate judgments of average similarity with these frequency judgments.

Secondary Outcomes (explanation)

Experimental Design

The experiment has two main questions—one inference problem and one gambler's fallacy problem—that occur in a random order at the beginning of the experiment. After each of them, we elicit self-reports of how participants solved each question as well as what features of the problem they felt they were paying attention to.

Experimental Design Details

There are 15 inference-problem treatments, described briefly below. Our primary focus is on the fraction of participants whose beliefs fall into different "modes". We hypothesize that these modes will be the following: the base rate, 50-50, (close to) the bayesian answer, and the "likelihood" (i.e., P(Signal | Hypothesis)). We also hypothesize that some respondents will answer with P(Signal & Hypothesis) (i.e., failing to renormalize given the likelihood of the signal conditional on the alternative hypothesis).

1. Balls-and-urns control condition
2. Blue-cab green-cab problem. We hypothesize that this will increase the mode at the likelihood, compared to treatment 1.
3. "Undermine" cabs. We hypothesize this will reduce the mode at the likelihood compared to treatment 2.
4. "Cabified" balls-and-urns. We hypothesize this will increase the mode at the likelihood compared to treatment 1.
5. Balls-and-urns will less extreme likelihood.
6. Balls-and-urns with more extreme likelihood. We hypothesize this will increase the mode at the likelihood and bayesian answer compared to treatment 5, compared to the base rate and P(Signal & Hypothesis).
7. Complicated signal (5 green balls and 4 blue, rather than just 1 green ball). We hypothesize this will boost the mode at the base rate compared to treatment 1.
8. 2 Green Signals. We hypothesize multimodality in these beliefs, but are not comparing them to another treatment.
9. 1 Green Signal, 1 Irrelevant signal.
10. 1 Green Signal, No Irrelevant Signal. We hypothesize that treatment 9 will have an increased mode at the base rate or at 50-50 compared to this treatment.
11. Balls and urns but only explicitly asking about one hypothesis. We hypothesize that this will increase the mode at P(Signal & Hypothesis) compared to treatment 1.
12. "Small Green Urn". Base rate = 50%, P(Green | Jar A) = 50%. P(Green | Jar B) = 100%. But, problem is described in terms of frequencies (how many marbles in each jar). Jar B has 5 green marbles, and Jar A has 5 green and 5 blue.
13. "Big Green Urn" Same as treatment 12, but Jar B has 15 green marbles. We hypothesize a shift away from 50-50, and toward 25% (the ratio of green marbles in Jar B compared to Jar A), compared to treatment 12.
14. "Elementary description" Same statistical problem as Treatment 1, but the probability of each event (e.g. a green marble from Jar A) is described individually. We hypothesize an increased mode around the bayesian answer compared to treatment 1.
15. Elementary description with alternative implicit. We hypothesize this will increase the mode at P(Signal and Hypothesis) compared to treatment 14.

Our model suggests that attention to different features of the problem correlate with which mode beliefs sort into. We will measure attention using both participants' free-text responses of how they solved the problem as well as their answers to the questions asking directly which features they were paying attention to.

We hypothesize that attention to: 1) the color of the blue-vs-green marble/cab will correlate with answering with the likelihood, 2) the urn/cab company will correlate with the base rate, 3) the "match" between urn and signal or to whether the witness's report is correct will correlate with answering with the likelihood, 4) both color/match and urn correlate with the Bayesian answer, 5) nothing correlates with 50-50, and 6) the irrelevant signal correlates with 50-50 or the base rate.

We have six gambler's fallacy treatments:

1. TH vs HH: Asks the relative frequency of these 2-flip sequences of coin flips.
2. THTHHT vs HHHHHH: we hypothesize this will decrease the mode at 50-50 and shift the mean belief down (where lower beliefs correspond to committing the gambler's fallacy more) compared to treatment 1.
3. HHHHHT vs HHHHHH.
4. P(H | HHHHH): Same as treatment 3, except it is emphasized how the problem is asking for the probability that the final flip is heads vs tails conditional on the first five flips all being heads (rather than just asking about the likelihood of each of these sequences as a whole). We hypothesize this will increase the mode at 50-50 compared to treatment 3.
5. Priming control condition. Before participants answer the main question (which will be about THTHHT vs HHHTHH), they will rate 15 pairs of sequences by how similar they are to each other, where this means how many individual flips differ between them (e.g., first flip is heads in one but tails in the other).
6. Priming share heads: Same as treatment 5 but the ratings questions will ask about what share of flips in each sequence are heads vs tails and ask participants to rate differences between them on this basis. We intend this treatment to boost attention paid to share heads in the main gambler's fallacy question (about THTHHT vs HHHTHH), which we will measure using self-reported attention. If it succeeds in sufficiently boosting attention to share heads, we hypothesize that it will reduce the fraction of participants who answer with 50-50 and lower the mean belief (where lower beliefs correspond to exhibiting the gambler's fallacy more).

Our model suggests that attention to the share of heads vs tails will correlate with committing the gambler's fallacy.

In April 2024, we will run additional treatments with the following design. All participants will answer two inference problems and two "compound probability reduction" problems. These will allow us to see whether answers are correlated across problems within person. The second inference problem is always an identical balls-and-urns problem. The first inference problem is either has a low (60%) or high (90%) likelihood, meant to vary the contrast of the signal and therefore boost the share of participants who respond with the likelihood. This design will allow us to test the extent to which treatment effects stemming from altering the first inference problem spillover onto participants' answers to the second inference problem, which is held fixed. We will also vary whether the first inference problem has a balls-and-urns (N=2,500) or taxicabs (N=500) framing to test whether there is a greater correlation between answers within vs across frames. We include a larger sample size for balls-and-urns to maximize precision to test the spillover effects of varying the contrast in the first problem on the second problem (which is always a balls-and-urns problem, and we expect spillovers, if there are any, to arise within rather than across frames).

The compound probability problems are meant to test whether people's answers tend to cluster on the modes our model predicts as well as investigate correlation across problems. The problems simply tell participants a prior (the odds a computer will choose the "orange" vs the "purple" deck of cards) and likelihoods for each deck (the share of cards whose suit is spades). They then ask given these numbers the probability that a spades is drawn. Participants will be evenly split across 4 parameterizations.

In May 2024, we will run additional treatments with the following design. Participants will solve two gambler's fallacy problems separated by a distraction task (Raven's matrices). The second problem is held constant (comparing ththht vs hhhhhh) and we randomize the first problem (either comparing th vs hh or hthtth vs hhhhhh). We will run N = 1000 people total, evenly split between these two first-problem treatments. Like the April experiment, this is to test for correlation in answers across problems that vary in their similarity to each other, as well as spillover effects of inducing one mode of answer in the first problem (th vs hh tends to produce 50-50 answers more than ththht vs hhhhhh) has spillover effects on later problems.

Randomization Method

All randomization will be done within Qualtrics using either its survey flow randomizer or javascript embedded in the survey.

Randomization Unit

Treatment assignment is randomized at the individual level.

Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters

We are randomizing at the individual level, so there are no clusters.

Sample size: planned number of observations

We hope to recruit 4800 participants through prolific. In addition, in our April 2024 experiments, we will recruit another 3,000 participants.

Sample size (or number of clusters) by treatment arms

Intended sample sizes for each inference treatment are below:

Treatment 1: 500
Treatment 2: 200
Treatment 3: 200
Treatment 4: 200
Treatment 5: 500
Treatment 6: 500
Treatment 7: 200
Treatment 8: 200
Treatment 9: 500
Treatment 10: 500
Treatment 11: 500
Treatment 12: 200
Treatment 13: 200
Treatment 14: 200
Treatment 15: 200

For the gambler's fallacy treatments, intended sample sizes are as follows:

Treatment 1: 400
Treatment 2: 400
Treatment 3: 1000
Treatment 4: 1000
Treatment 5: 1000
Treatment 6: 1000

Treatment is independently randomly assigned, so final sample sizes may differ somewhat from the above numbers due to chance.

See above for sample sizes in our April and May 2024 experiments.

Minimum detectable effect size for main outcomes (accounting for sample design and clustering)

Supporting Documents and Materials

IRB