Coordination and Leadership: the impact of Artificial Intelligence

Last registered on January 12, 2024

Pre-Trial

Trial Information

General Information

Title
Coordination and Leadership: the impact of Artificial Intelligence
RCT ID
AEARCTR-0012434
Initial registration date
November 06, 2023

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
November 15, 2023, 4:03 PM EST

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Last updated
January 12, 2024, 2:11 PM EST

Last updated is the most recent time when changes to the trial's registration were published.

Locations

Primary Investigator

Affiliation
Alma Mater Studiorum - Università di Bologna

Other Primary Investigator(s)

PI Affiliation
Alma Mater Studiorum - Università di Bologna

Additional Trial Information

Status
In development
Start date
2023-11-20
End date
2024-05-18
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
We investigate the effect of Artificial Intelligence as a source of help for the leader of a group facing a coordination dilemma. In an online experiment, subjects play a one-shot minimum effort game with leadership. The group leader must send a short message to their teammates to enhance coordination. First, the leader writes the message; then they see ChatGPT's output for the same task and decide whether to send their own text or the one produced by the chatbot. Followers are informed whether the leader sent their own message or the one produced by ChatGPT before making their decision. With this approach, we compare coordination levels and elicited beliefs between groups that receive a human-written leader's message and those that receive an AI-generated one.
External Link(s)

Registration Citation

Citation
Bigoni, Maria and Damiano Paoli. 2024. "Coordination and Leadership: the impact of Artificial Intelligence." AEA RCT Registry. January 12. https://doi.org/10.1257/rct.12434-4.0
Experimental Details

Interventions

Intervention(s)
Participants will play a one-shot Minimum Effort Game with a leader. The leader has to send a message to the other group members, trying to entice them to choose the highest effort level. The leader can send his/her own human-written message, or the message generated by ChatGPT-4 for the exact same task.
Intervention Start Date
2023-11-20
Intervention End Date
2023-11-24

Primary Outcomes

Primary Outcomes (end points)
The first key outcome variable is the Individual Effort Level chosen by participants.
Given that participants will be divided into groups of 5, the second key outcome variable will be the Minimum Effort in the group, defined as the lowest Individual Effort Level chosen among group members.
This experiment aims to measure the Average Treatment Effect of receiving a human-written or an AI-generated leader's message on the Individual Effort Level chosen.
We will also elicit the beliefs of Leaders and Followers about other group members' choices, in order to understand the potential mechanism behind the effect that we expect to detect.

Main hypotheses:
1. H0: Leadership with AI's message does not worsen coordination with respect to leadership with a human message.
H1: Leadership with AI's message leads to a lower Pareto-ranked equilibrium.

2. H0: Followers do not differentiate between the AI and the human message.
H1: Followers show algorithm aversion, choosing a lower effort level when the message is AI-generated with respect to when it is human-written.

3. H0: Subjects’ beliefs about others’ actions do not change based on whether the message is AI-generated or human-written.
H1: Leaders and/or followers update their beliefs when the message is AI-generated, expecting lower effort levels chosen by their teammates than when the message is human-written.
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Given that the leader's choice between the human and the AI message is completely endogenous, we will observe leaders' preferences over the two options. We will assess the quality of the texts written by leaders and check whether there is a significant difference between those who choose to keep their own text and those who prefer ChatGPT's output.
The quality of the texts will be assessed by a different sample of subjects, who are not informed about the purpose of the experiment and who do not know whether the messages have been written by a human participant or by ChatGPT.
Moreover, we will ask leaders to what extent they think they have been convincing, how much pressure they felt in the role they were given, and how suited they believe they are for a role of responsibility. Followers will also be asked about the persuasiveness of the leader's message.
Finally, we will study whether leaders' and followers' behavior is affected by gender, familiarity with and trust in AI, risk aversion, education level, and standard demographics.
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
The study is an asynchronous online survey experiment in which participants will answer questions without any real-time interaction with the others. The study is conducted entirely on Qualtrics.

Participants will play a one-shot Minimum Effort Game with leadership. The leader of the group has to send a message to the other group members; the message can be written by the leader himself/herself or generated by Artificial Intelligence. Followers will receive the message before choosing their effort level.

We will elicit leaders' and followers' beliefs about the effort chosen by others.
Experimental Design Details
The study is an asynchronous online survey experiment in which participants will answer questions without any real-time interaction with the others. The study is conducted entirely on Qualtrics.

The experiment is based on a simple Minimum Effort Game (MEG), a coordination game with multiple Pareto-ranked equilibria. In its baseline version, each subject has to choose an effort level; then, the lowest effort level chosen among all group members is the one implemented in the group; payoffs are increasing with the group effort level, but decreasing with the individual effort (keeping group effort fixed). In our design, subjects play in groups of 5 players and can choose an effort level from 0 to 3.
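
For concreteness, below is a minimal sketch of this payoff structure in Python, assuming a generic linear specification; the parameters a, b, and c are hypothetical and are not the values used in the experiment.

def meg_payoff(own_effort, group_efforts, a=1.0, b=0.5, c=0.25):
    # Illustrative Minimum Effort Game payoff: increasing in the group
    # minimum, decreasing in own effort. Parameters a, b, c are hypothetical.
    group_min = min(group_efforts)  # the lowest effort is the one implemented
    return a + b * group_min - c * own_effort

# Example: a group of 5 players choosing efforts in {0, 1, 2, 3};
# the player choosing 1 drags the group minimum down for everyone.
efforts = [3, 3, 2, 3, 1]
payoffs = [meg_payoff(e, efforts) for e in efforts]

With b > c > 0, any common effort level is a Nash equilibrium and higher common effort levels yield higher payoffs for everyone, which produces the multiple Pareto-ranked equilibria described above.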

In the baseline treatment, subjects will just play the standard MEG. Then, each "experienced" player will play the MEG again, filling the role of group leader, matched with 4 new participants who have never played the MEG before. The leaders must write a text in which they try to entice the other group members to choose the highest effort level. Then, the leaders see ChatGPT's output for the same task, and they can decide whether to keep their own text or use the one generated by the AI. There is no direct interaction between the leader and the AI: leaders are only shown ChatGPT's answer, first by means of a recorded video, then as written text. Then, the message (written by the leader or generated by ChatGPT) is shown to all other group members before they decide their effort level. Followers know whether the message is human-written or AI-generated, and the leader knows that followers will have this piece of information: there is complete transparency.

In this way, we can observe three different treatments:
• Baseline MEG: the 1st part, played by the (future) leaders
• Leader/Human Message: 2nd part where the leader chooses to use his/her own text
• Leader/AI Message: 2nd part where the leader chooses to use ChatGPT’s output

However, the choice between the AI and the human message is obviously endogenous. Thus, to have balanced samples and sufficient statistical power to test a potential difference between treatments, we may need to discard some observations, in particular redundant 2nd-part leaders' choices. To do so without affecting subjects' payoffs, at the beginning of the experiment we will tell the leaders that their earnings will depend on their results in either the 1st or the 2nd part.

In order to understand the mechanisms behind a potential difference between groups where the advice is human-written and those where it is AI-generated, we need to elicit subjects' beliefs. To do so, we ask followers two questions, after they have chosen their effort level but before they see the result:
• Which effort level do you think the leader will choose?
• Of the other three followers, how many will choose the highest effort level (3)?
We are interested in leaders’ beliefs too: in a similar way, we elicit their beliefs about how many followers will choose the maximum effort.

The quality of the text written by leaders could be a key variable in explaining both leaders' and followers' choices. To evaluate it, we will run a follow-up survey on a different sample of subjects, asking them to rate the messages written by leaders. Each subject will rate only a subsample (e.g. 20 out of 165) of the total messages. Moreover, we will also include the AI-generated message among the messages to be rated, and ask the subjects to identify it (similar to a Turing Test).

Regarding subjects' earnings, we will pay only 20% of the participants. The last question of the end-of-study survey will be a Beauty Contest: subjects must choose a number between 0 and 100, and the 20% of them whose choices are closest to 15 + 2/3 of the average number chosen will be selected to receive the payment.
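
For illustration only, a short sketch of this selection rule; the function name and the example guesses below are hypothetical.

def select_paid_subjects(guesses, share=0.20):
    # Beauty Contest payment rule: the given share of subjects whose guesses
    # are closest to 15 + 2/3 of the average guess is selected for payment.
    target = 15 + (2 / 3) * (sum(guesses) / len(guesses))
    n_paid = round(share * len(guesses))
    # rank subjects by distance from the target and keep the closest ones
    ranked = sorted(range(len(guesses)), key=lambda i: abs(guesses[i] - target))
    return ranked[:n_paid], target

# Example with ten hypothetical guesses between 0 and 100
winners, target = select_paid_subjects([10, 25, 33, 47, 50, 62, 71, 80, 90, 100])
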
Randomization Method
Randomization will be done at the recruitment level through ORSEE.
We will identify a sample of eligible subjects and randomize them into the three treatments.
Randomization Unit
Randomization at the individual level (participants in the experiment).
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
We will have 33 groups of 5 subjects playing the game in each treatment.
Sample size: planned number of observations
429 subjects.
Sample size (or number of clusters) by treatment arms
165 subjects control, 132 AI message, 132 Human message.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
A one-tailed Wilcoxon-Mann-Whitney test on the Individual Effort Level in the two treatments (AI message and Human message), including only followers' observations (i.e. 132 per treatment), with alpha equal to 0.05 and power equal to 0.8, yields a minimum detectable effect size (Cohen's d) of 0.314.
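
One way to obtain a figure of this order (an assumption about the computation, not a description of the registered procedure) is a two-sample t-test power analysis with the sample size deflated by the Wilcoxon-Mann-Whitney test's asymptotic relative efficiency of 3/pi under normality, e.g. with statsmodels:

import math
from statsmodels.stats.power import TTestIndPower

# Assumed approach: t-test power calculation with the effective sample size
# shrunk by the WMW asymptotic relative efficiency (3/pi) under normality.
n_per_arm = 132
effective_n = n_per_arm * 3 / math.pi

mde = TTestIndPower().solve_power(
    effect_size=None,
    nobs1=effective_n,
    alpha=0.05,
    power=0.8,
    ratio=1.0,
    alternative='larger',  # one-tailed test
)
print(round(mde, 3))  # roughly 0.314
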
Supporting Documents and Materials

There is information in this trial unavailable to the public.
IRB

Institutional Review Boards (IRBs)

IRB Name
Comitato di Bioetica Alma Mater Studiorum - Università di Bologna
IRB Approval Date
2023-09-27
IRB Approval Number
0289950
Analysis Plan

There is information in this trial unavailable to the public.

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials