Do people trust generative AI, and is it trustworthy? An experiment with ChatGPT

Last registered on July 16, 2024

Pre-Trial

Trial Information

General Information

Title
Do people trust generative AI, and is it trustworthy? An experiment with ChatGPT
RCT ID
AEARCTR-0013792
Initial registration date
July 11, 2024

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
July 16, 2024, 3:31 PM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Locations

Region

Primary Investigator

Affiliation
Bentley University

Other Primary Investigator(s)

PI Affiliation
University of Minnesota
PI Affiliation
University of Washington

Additional Trial Information

Status
In development
Start date
2024-07-15
End date
2024-08-30
Secondary IDs
n/a
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
We measure whether and how much people trust generative AI by having them play Trust Games (Berg et al. 1995) with ChatGPT. Subjects are given $10 and can send any portion of it to ChatGPT; the amount sent is tripled. ChatGPT then decides how much of what it receives to return to the subject. There are three treatments: Treatment Self-Interest, Treatment Fairness, and Treatment No Prompt.

In each treatment, we vary the prompt that instructs ChatGPT how to play the game. In Treatment Self-Interest, ChatGPT is told to determine how much to return to the subject "as a rational, self-interested person would." In Treatment Fairness, ChatGPT is told to determine how much to return to the subject as "a person who believes in fairness and reciprocity" would. In Treatment No Prompt, ChatGPT is given no instructions on how it should decide how much to return to the subject.

The experiment is within-subject; each person plays the game three times (once for each treatment). We randomize the order of the treatments. One game is randomly chosen to determine payments to the subject.
External Link(s)

Registration Citation

Citation
Livingston, Jeffrey, Kobe Rankich and Samson Shen. 2024. "Do people trust generative AI, and is it trustworthy? An experiment with ChatGPT." AEA RCT Registry. July 16. https://doi.org/10.1257/rct.13792-1.0
Experimental Details

Interventions

Intervention(s)
Experiment 1:
We measure whether and how much people trust generative AI by having them play Trust Games (Berg et al. 1995) with ChatGPT. Subjects are given $10 and can send any portion of it to ChatGPT; the amount sent is tripled. ChatGPT then decides how much of what it receives to return to the subject. There are three treatments: Treatment Self-Interest, Treatment Fairness, and Treatment No Prompt.

In each treatment, we vary the prompt that instructs ChatGPT how to play the game.

In Treatment Self-Interest, ChatGPT is told to determine how much to return to the subject "as a rational, self-interested person would."

In Treatment Fairness, ChatGPT is told to determine how much to return to the subject "as a person who believes in fairness and reciprocity" would.

In Treatment No Prompt, ChatGPT is given no instructions on how it should decide how much to return to the subject.

The experiment is within-subject; each person plays the game three times (once for each treatment). We randomize the order of the treatments. One game is randomly chosen to determine payments to the subject.

The full experiment interface and all instructions can be viewed in the document attached in the Docs & Materials section.

Experiment 2:
With the same set of treatments, we run the same trust game where we send ChatGPT $10 in every play and record how much ChatGPT decides to return. We do this with several different versions of ChatGPT:
1. ChatGPT 3.5
2. ChatGPT 4
3. ChatGPT 4o
4-6: The same versions, but through the ChatGPT API instead of the main interface at chat.openai.com.
Intervention (Hidden)
Intervention Start Date
2024-07-15
Intervention End Date
2024-08-30

Primary Outcomes

Primary Outcomes (end points)
Experiment 1:
1. The amount of money (out of $10) that the subject decides to send to ChatGPT in the first move of the trust game.

Experiment 2:
1. The amount that ChatGPT decides to return to the subject (raw amount, out of $30).
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Experiment 1:
1. The amount that ChatGPT decides to return to the subject, both the raw amount and as a percentage of the amount received.

Secondary Outcomes (explanation)

Experimental Design

Experimental Design
Experiment 1:
We measure whether and how much people trust generative AI by having them play Trust Games (Berg et al. 1995) with ChatGPT. The subject is the first mover. They are given $10 and can send any portion of it to ChatGPT; the amount sent is tripled. ChatGPT is the second mover. It then decides how much of what it receives to return to the subject. There are three treatments: Treatment Rational, Treatment Fairness, and Treatment No Prompt. The experiment is within-subject; each person plays the game three times (once for each treatment). We randomize the order of the treatments. One game is randomly chosen to determine payments to the subject.

In each treatment, we vary the prompt that instructs ChatGPT how to play the game.

In Treatment Rational, ChatGPT is told to determine how much to return to the subject "as a rational, self-interested person would." The prompt that is entered to get ChatGPT to play the game reads as follows:
"Play a game with me where I am given $10. I will first tell you how much of this $10 I want to send to you. Whatever I send is tripled, so if I send you $10, you receive $30 and I have $0. If I send $5 to you, you receive $15 and I have $5. You can then choose to return any amount of what you receive back to me, including keeping all of the money for yourself, but this amount is not tripled. We will play the game one time. I will start by telling you how much I wish to send to you. At the end, state how much money each of us end up with. My payment should be the amount of money that I keep instead of sending it to you, plus the amount that you send back to me. When deciding how much to return to me, you should play as a rational, self-interested person would."

In Treatment Fairness, ChatGPT is told to determine how much to return to the subject "as a person who believes in fairness and reciprocity" would. The prompt that is entered to get ChatGPT to play the game reads as follows:
"Play a game with me where I am given $10. I will first tell you how much of this $10 I want to send to you. Whatever I send is tripled, so if I send you $10, you receive $30 and I have $0. If I send $5 to you, you receive $15 and I have $5. You can then choose to return any amount of what you receive back to me, including keeping all of the money for yourself, but this amount is not tripled. We will play the game one time. I will start by telling you how much I wish to send to you. At the end, state how much money each of us end up with. My payment should be the amount of money that I keep instead of sending it to you, plus the amount that you send back to me. When deciding how much to return to me, you should play as a person who believes in fairness and reciprocity would."

In Treatment No Prompt, ChatGPT is given no instructions on how it should decide how much to return to the subject. The prompt that is entered to get ChatGPT to play the game reads as follows:
"Play a game with me where I am given $10. I will first tell you how much of this $10 I want to send to you. Whatever I send is tripled, so if I send you $10, you receive $30 and I have $0. If I send $5 to you, you receive $15 and I have $5. You can then choose to return any amount of what you receive back to me, including keeping all of the money for yourself, but this amount is not tripled. We will play the game one time. I will start by telling you how much I wish to send to you. At the end, state how much money each of us end up with. My payment should be the amount of money that I keep instead of sending it to you, plus the amount that you send back to me."
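The payoff arithmetic described in the prompts can be sketched as follows. This is an illustrative helper, not part of the experiment software; the function and parameter names are ours:

```python
def trust_game_payoffs(sent, returned, endowment=10, multiplier=3):
    """Compute final payoffs in the trust game described in the prompts.

    sent: amount the subject sends (0..endowment)
    returned: amount ChatGPT sends back (0..multiplier*sent)
    """
    received = multiplier * sent             # ChatGPT receives the tripled amount
    subject = (endowment - sent) + returned  # subject keeps the remainder plus the return
    chatgpt = received - returned            # the returned amount is not tripled
    return subject, chatgpt

# If the subject sends $5 and ChatGPT returns $7:
# subject ends with $12, ChatGPT with $8.
```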

After completing the three plays of the game, the subject completes a 16-question survey that asks about how trusting the person is, demographic information, the subject's previous experiences with ChatGPT, and their expectations for the impacts ChatGPT will have in the future. The survey questions can be viewed in the document attached in the Docs & Materials section.

This document also displays the full experiment interface.

Experiment 2:
With the same set of treatments (No Prompt, Rational, and Fairness & Reciprocity), we run the same trust game where we send ChatGPT $10 in every play and record how much ChatGPT decides to return. We do this with several different versions of ChatGPT:
1. ChatGPT 3.5
2. ChatGPT 4
3. ChatGPT 4o
4. ChatGPT 3.5 API version
5. ChatGPT 4 API version
6. ChatGPT 4o API version
Experimental Design Details
Randomization Method
The order in which the subjects do each treatment is determined via simple randomization. Each subject's order is determined randomly by computer and independently for each subject.
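A minimal sketch of this randomization scheme, for illustration only (the actual draw is done by the experiment software; the names here are ours):

```python
import random

TREATMENTS = ["Self-Interest", "Fairness", "No Prompt"]

def assign_treatment_order(rng=None):
    """Independently draw a uniformly random treatment order for one subject."""
    rng = rng or random.Random()
    order = list(TREATMENTS)
    rng.shuffle(order)  # each of the 6 possible orders is equally likely
    return order
```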
Randomization Unit
Individual subject.
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
n/a
Sample size: planned number of observations
Experiment 1: 360
Experiment 2: 1800
Sample size (or number of clusters) by treatment arms
Experiment 1: 120 per treatment order, though since we are doing a simple randomization, there may be a few more or a few fewer than 120 assigned to each treatment order.

Experiment 2: 100 per treatment condition (300 total) for each of the 6 versions of ChatGPT, so 1800 total.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
In a meta-analysis of trust game studies, Johnson and Mislin (2011) report an average amount sent in the first move of 50% of the endowment, with a standard deviation of 0.12. With a $10 endowment, this implies a mean amount sent of $5 with a standard deviation of 1.2. For our power calculations, we assume this mean and standard deviation will apply to our sample as well. For the main outcome variable - the amount sent by the subject in the first move of the trust game - we plan on using both the within-subject variation and the between-subject variation achieved by our design, which randomizes the order of the treatments. The between-subject analysis will use only the data from the subjects' first plays, since the treatment assigned to each subject in the first play is effectively randomized by randomizing the treatment order. For the within-subject analysis, using a simple t-test, the minimum detectable effect size assuming a 5% significance level and 80% power is 0.258 standard deviations. This implies an MDE of $0.31 between a pair of treatments. For the between-subject analysis, using a simple t-test, the minimum detectable effect size assuming a 5% significance level and 80% power is 0.363 standard deviations. This implies an MDE of $0.44 between a pair of treatments.
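The stated MDEs can be approximately reproduced with a standard power formula. This sketch uses the normal approximation (slightly below the t-based figures above) and assumes 120 paired observations for the within-subject test and 120 subjects per arm for the between-subject test, sample sizes inferred from the figures above rather than stated explicitly:

```python
from math import sqrt
from statistics import NormalDist

def mde_sd_units(n, paired, alpha=0.05, power=0.80):
    """Minimum detectable effect in standard-deviation units (normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # paired t-test: SE of the mean difference; two-sample: SE of a difference in means
    return z / sqrt(n) if paired else z * sqrt(2 / n)

sd_dollars = 1.2  # SD of the amount sent with a $10 endowment (Johnson and Mislin 2011)
within = mde_sd_units(120, paired=True)    # ~0.256 SD, i.e. roughly $0.31
between = mde_sd_units(120, paired=False)  # ~0.362 SD, i.e. roughly $0.43
```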
Supporting Documents and Materials

Documents

Document Name
Experiment interface
Document Type
survey_instrument
Document Description
This document contains screenshots of each screen that a subject can view during the experiment.
File
Experiment interface

MD5: 743f77e6db43ed77d5043266bb41dc93

SHA1: 015e16f91fa6dce589d090757a8b898e41a9b3cf

Uploaded At: July 11, 2024

IRB

Institutional Review Boards (IRBs)

IRB Name
Bentley University
IRB Approval Date
2023-09-05
IRB Approval Number
230905006
Analysis Plan

Analysis Plan Documents

PAP

MD5: b467f8a49901fb5c6ae158864b9ce649

SHA1: c4895bf223177ce5be1e077e8efcf9581354c1d1

Uploaded At: July 11, 2024

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials