Belief reactions to sequential AI predictions

Last registered on April 22, 2025

Pre-Trial

Trial Information

General Information

Title
Belief reactions to sequential AI predictions
RCT ID
AEARCTR-0011057
Initial registration date
April 16, 2025

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
April 22, 2025, 9:42 AM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Locations

Region

Primary Investigator

Affiliation
University of Stavanger

Other Primary Investigator(s)

PI Affiliation
Hanken School of Economics & Helsinki GSE
PI Affiliation
Lund University; Hanken School of Economics & Helsinki GSE

Additional Trial Information

Status
In development
Start date
2025-04-21
End date
2025-05-04
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
Artificial intelligence (AI) has become integral to decision-making processes, with tools such as ChatGPT increasingly shaping outcomes. These AI systems are inanimate but operate based on human-coded algorithms. In this study, we explore how previous success and streaks influence beliefs over sequential AI predictions, comparing them to beliefs over sequences generated by humans and inanimate objects. We also examine how outcomes (the frequency of correct decisions) affect participants' willingness to switch to an alternative source of data generation. These insights may guide AI policy, such as efforts to address 'cognitive behavioral manipulation.'
External Link(s)

Registration Citation

Citation
Herold, Theo, Marco Lambrecht and Erik Wengström. 2025. "Belief reactions to sequential AI predictions." AEA RCT Registry. April 22. https://doi.org/10.1257/rct.11057-1.0
Experimental Details

Interventions

Intervention(s)
We conduct a two-stage experiment to investigate participants' beliefs over sequential AI predictions.

In the first stage, we compare belief reactions across three different treatments in a between-subjects design.

In the second stage, we compare across two treatments how participants react to information regarding their correct answers in the first stage.
Intervention (Hidden)
Our design features three treatments in the first stage: (i) human prediction outcomes, (ii) algorithmic prediction outcomes, and (iii) dice roll outcomes.

In the second stage, we reveal information about first stage success in two treatments: (i) correct answers align with modal answers in our pilot (whenever possible given the data from our previous experiment), and (ii) incorrect answers align with modal answers in our pilot (whenever possible given the data from our previous experiment).
Intervention Start Date
2025-04-21
Intervention End Date
2025-05-04

Primary Outcomes

Primary Outcomes (end points)
Belief reaction ("decision" in pre-analysis plan)
Primary Outcomes (explanation)
For each sequence, we elicit a binary decision variable equal to 1 if the subject counts on the subsequent ninth outcome being successful, and 0 otherwise. For each hypothesis, we define reactions by comparing the corresponding decisions for inversed/reversed sequences (see our pre-analysis plan for more details).
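For illustration only, a minimal Python sketch of how such a reaction could be computed from the elicited decisions; the identifiers and comparison pairs below are hypothetical, and the exact comparison rules are specified in the pre-analysis plan.

# Hypothetical coding sketch; identifiers are illustrative, not taken from the study code.
decisions = {
    ("seq_01", "normal"):   1,  # subject counts on the ninth outcome being successful
    ("seq_01", "reversed"): 0,
    ("seq_01", "inversed"): 1,
}

def reaction(seq_id, variant_a, variant_b, decisions):
    """Belief reaction: the difference between the binary decisions made on two
    corresponding orderings of the same underlying sequence (+1, 0, or -1)."""
    return decisions[(seq_id, variant_a)] - decisions[(seq_id, variant_b)]

print(reaction("seq_01", "normal", "reversed", decisions))  # -> 1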

Secondary Outcomes

Secondary Outcomes (end points)
Choose AI-generated sequence ("choose_ai" in pre-analysis plan)
Secondary Outcomes (explanation)
Binary variable equal to 1 if the subject chooses to be provided with a sequence generated by an algorithm, and 0 otherwise. Reactions take into account which sequences subjects have seen during the first stage.

Experimental Design

Experimental Design
Our design builds on a previous experiment and consists of two stages. In the previous experiment, we acquired binary outcomes from three different sources: (i) humans, (ii) an algorithm, and (iii) an inanimate pure-chance object in the form of dice rolls.

In the first stage, we provide participants with outcomes from the previous experiment. Depending on the treatment, outcomes originate from one of the three sources of the previous experiment. Participants are tasked with predicting whether the ninth outcome in each sequence is correct or not.

In the second stage, participants receive information about their correct decisions made during the first stage and choose between a human- or algorithm-generated sequence of outcomes.
Experimental Design Details
We ran a preparatory stage, which we refer to as the "previous experiment". We invited participants to create sequences of binary outcomes that we use in our main experiment. We used daily historical S&P 500 trading data and showed subjects information about the past performance of a stock, i.e., whether the stock had gone up or down on each of 5 consecutive trading days. We then asked them to predict whether the stock went up or down on the 6th day. Each participant made nine such predictions for different, randomly chosen stocks from random points in time. Analogously, we performed the same task with AI algorithms (ChatGPT and Microsoft Copilot). We also generated sequences from dice rolls, where success depended on pre-defined thresholds.
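For illustration only, a minimal Python sketch of how dice-roll sequences of this kind could be generated; the actual generation was implemented in Stata, and the threshold and sequence length below are assumptions made purely for illustration.

import random

# Illustrative sketch: a success is recorded whenever a six-sided die roll
# meets or exceeds a pre-defined threshold. The threshold of 4 and the
# sequence length of 9 are assumptions, not the study's actual parameters.
def dice_sequence(length=9, threshold=4, rng=None):
    rng = rng or random.Random()
    return [int(rng.randint(1, 6) >= threshold) for _ in range(length)]

print(dice_sequence(rng=random.Random(42)))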

In the first stage of our main experiment, we invite a new set of participants and randomly allocate them to one of three treatments. Participants are provided with 24 sequences from the preparatory stage. The sequences are identical across treatments, but the source of outcomes varies, i.e., sequences originate from a human, an AI algorithm, or dice rolls depending on treatment. Each sequence consists of 8 subsequent outcomes. Subjects see the sequences in random order and choose whether to count on the ninth outcome in each sequence being correct or not. In 16 of these sequences, either the first half or the second half is a streak of 4 identical outcomes. The other 8 sequences alternate between successes and failures more frequently. Each sequence is provided in normal, reversed, and inversed order. Our sequences thus differ with respect to successes, streaks of successes or failures, and whether streaks are in the first or the second half of the sequence. This allows us to investigate reactions across treatments.
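For illustration, a minimal Python sketch of the reversed and inversed orderings described above; the function names and the example sequence are illustrative and not taken from the experiment code.

# Outcomes are coded 1 = success, 0 = failure.
def reversed_variant(seq):
    """The same outcomes presented in reverse order."""
    return list(reversed(seq))

def inversed_variant(seq):
    """Every success flipped to a failure and vice versa."""
    return [1 - x for x in seq]

# Example: an 8-outcome sequence whose second half is a streak of four successes.
normal = [1, 0, 1, 0, 1, 1, 1, 1]
print(reversed_variant(normal))  # the streak moves to the first half
print(inversed_variant(normal))  # the streak of successes becomes a streak of failures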

In the second stage, we randomly allocate participants to one of two treatments. We provide information regarding success in the first stage and let participants choose between another sequence originating from a human or from an algorithmic source. For half of our participants, the correct answer in the first stage is aligned with the modal answer of a previously run pilot. Naturally, one would expect a high rate of correct answers for participants in this treatment. For the other half, the incorrect answer is aligned with the modal answer of the pilot. Note that this treatment-dependent distinction of correct answers affects only 23 of the 24 sequences: we can only utilize variation in the correct answer for a sequence if we found such variation in its ninth outcome in the preparatory stage (i.e., no deception). This intervention allows us to study how participants react to variation in their correct answers across treatments. We analyze whether participants in the human/algorithmic treatments switch away from their first stage source, with the dice treatment serving as an informative benchmark for comparison.

The experiment concludes by eliciting a set of control variables. In particular, we measure statistical literacy, CRT scores, self-reported AI expertise and beliefs, gambler's fallacy, and self-reported demographics such as age and education level.
Randomization Method
Treatment allocation is done by a computer within oTree. Similarly, any randomization during our experiment is done within our oTree code. Randomization for the preparatory stage (e.g. the random selection of stock market data) was implemented in Stata. Likewise, the generation of sequences from dice rolls was implemented in Stata.
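For illustration only, a minimal Python sketch of a balanced assignment to the six cells of the 3x2 design (150 subjects per cell); the actual allocation is handled within the oTree code and is not reproduced here, and the seed is purely illustrative.

import random

# Illustrative balanced assignment: 6 cells x 150 = 900 slots, shuffled.
first_stage = ["human", "algorithm", "dice"]
second_stage = ["modal_correct", "modal_incorrect"]
cells = [(f, s) for f in first_stage for s in second_stage]

assignments = cells * 150                  # 900 treatment slots in total
random.Random(2025).shuffle(assignments)   # randomize the order of assignment

print(len(assignments), assignments[:3])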
Randomization Unit
Individual
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
900 total subjects
Sample size: planned number of observations
900 individuals
Sample size (or number of clusters) by treatment arms
300 per first stage treatment, 450 per second stage treatment

Given our 3x2 treatment design, we elicit data from 150 individuals in each of the 6 groups (150 in human+modal correct, 150 in human+modal incorrect, 150 in AI+modal correct, 150 in AI+modal incorrect, 150 in dice+modal correct, 150 in dice+modal incorrect)
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
We conduct non-parametric power calculations using the Wilcoxon Mann-Whitney U test for two independent samples, which we perform using the Sample Size Calculator version 1.063 developed and maintained by Robin Ristl (University of Vienna, Department for Statistics and Operations Research). We assume 300 participants per treatment group and an alpha level of 0.05 (to emulate the first stage experiment), and investigate the minimum shift between groups X and Y that can be detected at 80 percent power. Under these conditions, we can detect a shift of at least 6.6 percentage points (P(X > Y) = 0.434 or P(X > Y) = 0.566) in either direction with 80 percent power. We use simulations to estimate the minimum detectable effect size using a permutation test for n = 150 observations per treatment group (to emulate the second stage experiment). At an alpha of 0.05 with 80 percent power, and assuming that the mean in three out of four groups is 0.5, we estimate that a shift of at least 0.15 in the fourth group would be detectable. See the pre-analysis plan for a more detailed account.
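For illustration, a simplified Python sketch of a simulation along these lines, using a two-sided permutation test on the difference in means between two groups of n = 150 binary outcomes (baseline mean 0.5, shifted mean 0.65); the statistic and grouping used in the pre-analysis plan may differ, and all numbers below are assumptions.

import numpy as np

# Simplified power simulation: reject when the permutation p-value is <= alpha.
rng = np.random.default_rng(1)
n, alpha, shift = 150, 0.05, 0.15
n_sims, n_perms = 500, 1000

rejections = 0
for _ in range(n_sims):
    x = rng.binomial(1, 0.5, n)          # baseline group
    y = rng.binomial(1, 0.5 + shift, n)  # group with the shifted mean
    observed = abs(y.mean() - x.mean())
    pooled = np.concatenate([x, y])
    exceed = 0
    for _ in range(n_perms):
        rng.shuffle(pooled)
        exceed += abs(pooled[n:].mean() - pooled[:n].mean()) >= observed
    p_value = (exceed + 1) / (n_perms + 1)
    rejections += p_value <= alpha

print(f"estimated power for a shift of {shift}: {rejections / n_sims:.2f}")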
Supporting Documents and Materials

Documents

Document Name
Research Ethics Committee (IEC/IRB-body) – Protocol
Document Type
irb_protocol
Document Description
File
Research Ethics Committee (IEC/IRB-body) – Protocol

MD5: 118129c31c3b3817959699404f6c7a6c

SHA1: 32b2e6cbeaf38f4abe2cfff211ac45769244aa3f

Uploaded At: December 12, 2023

There is information in this trial unavailable to the public.

IRB

Institutional Review Boards (IRBs)

IRB Name
Hanken Research Ethics Committee
IRB Approval Date
2023-12-05
IRB Approval Number
N/A
Analysis Plan

There is information in this trial unavailable to the public.

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials