Ability, Calibration, and Leadership in AI-Augmented Teams

Last registered on May 11, 2026

Pre-Trial

Trial Information

General Information

Title
Ability, Calibration, and Leadership in AI-Augmented Teams
RCT ID
AEARCTR-0018593
Initial registration date
May 10, 2026

First published
May 11, 2026, 9:32 AM EDT

Locations

There is information in this trial unavailable to the public.

Primary Investigator

Affiliation

Other Primary Investigator(s)

PI Affiliation
Nanyang Technological University
PI Affiliation
Nanyang Technological University
PI Affiliation
Nanyang Technological University

Additional Trial Information

Status
In development
Start date
2026-05-18
End date
2026-05-31
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
This project investigates how AI integration modulates team dynamics and performance in both flat (leaderless) and hierarchical (leader-led) organizational structures. In leaderless settings, the research evaluates the impact of AI on collective output and on individual members' confidence, specifically examining whether the introduction of AI obscures the "star effect" typically driven by consistently high-performing individuals. Within the leadership framework, the study examines how the sophistication of AI assistance (standard AI, certainty AI, or personalized AI) interacts with the method of leader selection. By comparing teams with self-promoted versus randomly assigned leaders, the research analyzes how leadership legitimacy and AI modality jointly influence team outcomes and decision-making efficacy.
External Link(s)

Registration Citation

Citation
Fan, Yujie et al. 2026. "Ability, Calibration, and Leadership in AI-Augmented Teams." AEA RCT Registry. May 11. https://doi.org/10.1257/rct.18593-1.0
Experimental Details

Interventions

Intervention(s)
Intervention Start Date
2026-05-18
Intervention End Date
2026-05-31

Primary Outcomes

Primary Outcomes (end points)
Team performance: average score for image recognition tasks at the team level
Individual performance: individual score for image recognition tasks
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
Study 1: Experiment with AI in Leaderless Groups
This study investigates collective performance in the absence of a designated leader. The procedure is divided into two sequential stages:
Stage I: Baseline Ability and Calibration Assessment
Recruitment: Participants are invited to the university laboratory to perform the experimental tasks.
Individual Assessment: To establish a baseline for ability and calibration, participants first complete 15 individual versions of the image recognition task.
Control Condition: For the control group, these individual tasks are performed independently without AI assistance to measure pure human diagnostic accuracy.
Stage II: Group Performance and Exit Survey
Group Tasks: Participants are organized into groups of four to complete 4 rounds of collective image recognition tasks. To ensure robustness, question difficulty is randomized within subjects across these rounds.
Debriefing: The session concludes with a post-experiment survey capturing demographic information and subjective feedback.

Study 2: Experiment with AI with Designated Leadership
This study explores the impact of leadership and AI-assisted calibration on group decision-making.
Stage I: Managerial Selection and AI-Human Calibration
Initial Testing: Participants complete 10 individual image recognition tasks. In the control condition, participants perform these tasks with the assistance of AI to facilitate calibration.
Validation: Following the AI-assisted phase, participants perform the same 10 tasks alone to assess their independent judgment after AI exposure.
Manager Preference Elicitation: To assign a group leader (manager), the study employs two distinct methods:
Volunteer Selection: Based on participants' willingness to lead.
Random Selection: To serve as a baseline for leadership comparison.
Stage II: Group Performance and Exit Survey
Group Tasks: Participants engage in 4 rounds of 4-person group tasks, with question difficulty randomized across rounds to control for order effects.
Debriefing: The session concludes with a comprehensive post-experiment survey focusing on leadership effectiveness and task experience.
Experimental Design Details
Not available
Randomization Method
The first study uses a leaderless design following Weidmann and Deming (2026), with one control group and one treatment group. We estimate the Teamplayer Index for each participant.

The Teamplayer Index is a novel measure of individual performance in the context of collective problem solving. A detailed description of how the index is conceptualized and calculated is provided in the attached Statistical Analysis Plan.

In brief: the Teamplayer Index for participant i is the average performance of the groups to which i was allocated, conditional on the individual skill of the members of each of i's groups. Group performance is measured using the image recognition task adapted from Fügener et al. (2022).
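A minimal sketch of the idea in the preceding paragraph, assuming a deliberately simple linear adjustment: regress each group's score on the summed stage-1 skill of its members, then average the skill-adjusted residuals over the groups each participant belonged to. The function names, the single summed-skill regressor, and the scores in the usage example are hypothetical; the registered calculation is the one in the Statistical Analysis Plan and may differ.

```python
from statistics import mean


def ols_fit(x, y):
    """Ordinary least squares with a single regressor; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def teamplayer_index(groups, skill):
    """Hypothetical Teamplayer-style index.

    groups: list of (members, group_score), one entry per group-round.
    skill:  dict mapping participant id -> individual (stage-1) score.
    Returns a dict mapping participant id -> mean skill-adjusted group residual.
    """
    # Expected group score given members' individual skill (assumed linear in the sum).
    x = [sum(skill[m] for m in members) for members, _ in groups]
    y = [score for _, score in groups]
    intercept, slope = ols_fit(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Average the residuals over every group each participant joined.
    per_person = {}
    for (members, _), r in zip(groups, residuals):
        for m in members:
            per_person.setdefault(m, []).append(r)
    return {m: mean(rs) for m, rs in per_person.items()}


# Usage with the two sets of groups from the 9-person allocation example given later
# in this section; each string lists the member letters, and all scores are made up.
skill = {"A": 10, "B": 9, "C": 8, "D": 6, "E": 5, "F": 4, "G": 2, "H": 1, "I": 0}
groups = [
    ("ADG", 12.0), ("BEH", 7.5), ("CFI", 6.0),
    ("AEI", 10.5), ("BFG", 7.5), ("CDH", 7.5),
]
index = teamplayer_index(groups, skill)
print(index["A"], index["B"])  # 1.25 -1.0
```

In this toy data the groups containing A score above what their summed skill predicts, so A receives the highest index. With only two group-rounds per person, other participants (here I) can tie, which is one reason the design rotates members across several rounds.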

The experiment has two stages. In the first stage, participants in the control group will individually complete 30 image-recognition tasks; participants in the treatment group will complete the same 30 tasks with the help of AI. At the end of stage 1, participants will complete a self-assessment in which they provide a point estimate and a 90% confidence interval for the number of questions they answered correctly in stage 1.
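The registration does not specify how the point estimates and 90% intervals are scored. A sketch of two common calibration measures, assuming overestimation is defined as estimate minus realized score and interval accuracy as the share of intervals that contain the realized score (about 0.90 under good calibration, clearly lower suggesting overprecision). The class, functions, and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SelfAssessment:
    point_estimate: int  # reported number of correct answers
    lower: int           # lower bound of the reported 90% interval
    upper: int           # upper bound of the reported 90% interval
    actual_correct: int  # realized number of correct answers


def overestimation(a: SelfAssessment) -> int:
    """Positive values mean the participant overestimated their own score."""
    return a.point_estimate - a.actual_correct


def interval_hit(a: SelfAssessment) -> bool:
    """True if the reported 90% interval contains the realized score."""
    return a.lower <= a.actual_correct <= a.upper


def coverage_rate(assessments) -> float:
    """Share of intervals that contain the realized score."""
    return sum(interval_hit(a) for a in assessments) / len(assessments)


# Made-up stage-1 self-assessments out of 30 tasks.
sample = [
    SelfAssessment(22, 18, 26, 20),
    SelfAssessment(25, 24, 27, 19),
    SelfAssessment(15, 10, 20, 17),
    SelfAssessment(28, 26, 30, 21),
]
print([overestimation(a) for a in sample])  # [2, 6, -2, 7]
print(coverage_rate(sample))                # 0.5
```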

In the second stage, participants will complete group image-recognition tasks. Participants form groups of three and complete 4 rounds of 3 questions each, for a total of 4 × 3 = 12 group tasks; group composition does not change within a round. Following Weidmann et al. (2026), the randomization rule ensures that players who have been grouped together before are never grouped together again in later rounds.

For example:

- The n participants in each session are divided into three equal blocks of size n/3 based on their overall score in stage 1 (leaderless experiment) or part 1C (leader experiment) [if n is not a multiple of three, excess participants are paid for their time and asked to return for another session]. Blocks can loosely be thought of as ‘high’, ‘medium’, and ‘low’ in terms of individual skill.

- There are three bags, one for each block. Each bag contains n/3 balls, each marked with a letter, and the letters run consecutively from one bag to the next. For example, in a 9-person session, the ‘high’ bag has balls labeled A, B, C; the ‘medium’ bag has balls labeled D, E, F; and the ‘low’ bag has balls labeled G, H, I.

- Each participant draws a ball. This ball defines their 2 groups for that session. To return to the example of a 9-person session, allocations would be as follows: first set of groups: {ADG, BEH, CFI}; second set of groups: {AEI, BFG, CDH}.
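The allocation in the example above follows a cyclic (Latin-square) pattern: in round r, position k of the high block meets position k + r of the medium block and position k + 2r of the low block, modulo the block size m. A sketch generalizing the 9-person example under that assumption; the function names are hypothetical, and shuffling within each block stands in for the ball draw. Under this construction no pair meets twice for up to m rounds when m is odd and m // 2 rounds when m is even, so a 24-person session (m = 8) supports exactly the 4 rounds described above.

```python
import random
from itertools import combinations


def cyclic_rounds(blocks, n_rounds):
    """Build triads from three equal blocks (high, medium, low), one list per round."""
    high, medium, low = blocks
    m = len(high)
    limit = m if m % 2 else m // 2  # repeat-free rounds this construction allows
    if n_rounds > limit:
        raise ValueError(f"at most {limit} repeat-free rounds for blocks of size {m}")
    return [
        [(high[k], medium[(k + r) % m], low[(k + 2 * r) % m]) for k in range(m)]
        for r in range(n_rounds)
    ]


def allocate_session(scores, n_rounds, seed=None):
    """scores: dict participant -> stage-1 score; session size must be a multiple of 3."""
    n = len(scores)
    if n % 3:
        raise ValueError("session size must be a multiple of three")
    m = n // 3
    ranked = sorted(scores, key=scores.get, reverse=True)
    blocks = [ranked[i * m:(i + 1) * m] for i in range(3)]  # high, medium, low
    rng = random.Random(seed)
    for block in blocks:
        rng.shuffle(block)  # plays the role of drawing lettered balls from each bag
    return cyclic_rounds(blocks, n_rounds)


def no_repeat_pairs(rounds):
    """True if no two participants share a group more than once."""
    seen = set()
    for groups in rounds:
        for triad in groups:
            for pair in combinations(triad, 2):
                key = frozenset(pair)
                if key in seen:
                    return False
                seen.add(key)
    return True


example = cyclic_rounds([list("ABC"), list("DEF"), list("GHI")], 2)
print(["".join(t) for t in example[0]], ["".join(t) for t in example[1]])
# ['ADG', 'BEH', 'CFI'] ['AEI', 'BFG', 'CDH']
```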

In the control group, participants first submit their individual answers and then have the chance to discuss the questions with their teammates via a live chatbox. In the treatment group, participants first submit their answers with the help of AI and then have the chance to discuss the questions with their teammates via a live chatbox. Finally, a randomly selected representative submits the answer that the whole group agrees on. At the end of stage 2, participants complete a self-assessment in which they provide a point estimate and a 90% confidence interval for the number of questions they answered correctly in stage 2.

In the second study, we follow the randomization method of Weidmann et al. (2026). The study has 2 control groups (C1: no AI, random leader; C2: no AI, self-promoted leader) and 6 treatment groups (T1: AI, random leader; T2: personalized AI, random leader; T3: certainty AI, random leader; T4: AI, self-promoted leader; T5: personalized AI, self-promoted leader; T6: certainty AI, self-promoted leader).
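The eight arms listed above form a 4 × 2 grid (AI condition × leader selection). A sketch mapping the labels onto that grid and randomly ordering whole sessions so that each arm receives the planned 3 sessions, assuming assignment happens at the session level for both factors (the registration states this explicitly only for leader selection). The names and the shuffled-slot scheme are hypothetical.

```python
import random
from collections import Counter
from itertools import product

AI_CONDITIONS = ["no AI", "AI", "personalized AI", "certainty AI"]
LEADER_SELECTION = ["random leader", "self-promoted leader"]

# Arm labels as registered: C1-C2 are the no-AI controls, T1-T6 the AI treatments.
ARM_LABELS = {
    ("no AI", "random leader"): "C1",
    ("no AI", "self-promoted leader"): "C2",
    ("AI", "random leader"): "T1",
    ("personalized AI", "random leader"): "T2",
    ("certainty AI", "random leader"): "T3",
    ("AI", "self-promoted leader"): "T4",
    ("personalized AI", "self-promoted leader"): "T5",
    ("certainty AI", "self-promoted leader"): "T6",
}


def study2_arms():
    """All (ai_condition, leader_selection) cells of the 4 x 2 design."""
    return list(product(AI_CONDITIONS, LEADER_SELECTION))


def assign_sessions(sessions_per_arm=3, seed=None):
    """Return a random session order in which every arm appears sessions_per_arm times."""
    slots = [arm for arm in study2_arms() for _ in range(sessions_per_arm)]
    random.Random(seed).shuffle(slots)
    return slots


schedule = assign_sessions(seed=2026)
print(len(schedule))                                                  # 24
print(Counter(schedule)[("certainty AI", "self-promoted leader")])   # 3
print([ARM_LABELS[arm] for arm in schedule[:3]])
```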

We first randomize sessions to either the ‘self-selection’ or the ‘random’ condition. Participants in the self-selection condition rate, on a scale of 1 to 10, their preference for being a manager for the remainder of the experiment. Those with the strongest preference are assigned the manager role (ties are broken at random). Participants in the ‘random’ condition are assigned to their roles at random.
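A sketch of the two role-assignment rules, assuming a 24-person session needs 8 managers (one per three-person team); that count and the function names are assumptions. Random tie-breaking is implemented by shuffling first and then applying a stable sort on the rating, so participants with equal ratings stay in random order.

```python
import random


def self_selected_managers(ratings, n_managers, seed=None):
    """Pick the n_managers participants with the highest 1-10 preference rating."""
    rng = random.Random(seed)
    ids = list(ratings)
    rng.shuffle(ids)  # random order among equal ratings
    ids.sort(key=lambda p: ratings[p], reverse=True)  # stable: ties keep shuffled order
    return set(ids[:n_managers])


def randomly_assigned_managers(participants, n_managers, seed=None):
    """Random-leader sessions: managers drawn uniformly without replacement."""
    return set(random.Random(seed).sample(list(participants), n_managers))


# Made-up ratings: six participants answer 10, four answer 7, the rest 3.
ratings = {f"p{i}": (10 if i < 6 else 7 if i < 10 else 3) for i in range(24)}
managers = self_selected_managers(ratings, 8, seed=7)
# All six 10s are chosen; two of the four 7s are picked at random to fill the tie.
```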

We also randomly assign each manager to eight different three-person teams over the course of the experiment. Workers are randomly assigned to managers, subject to two constraints: first, where possible, we avoid groupings in which workers are placed with a manager or fellow worker they have worked with before; second, each worker is allocated to a given manager at most once.
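One way to satisfy both constraints by construction, sketched under assumptions: shuffle the workers once, split them into two halves, and in round r give manager k worker k + r from the first half and worker k + 2r from the second half (modulo the number of managers m). This cyclic scheme is a swapped-in illustration, not necessarily the registered procedure; randomness enters only through the initial shuffle, and it supports at most m // 2 repeat-free rounds for even m (4 rounds with 8 managers). Function names are hypothetical.

```python
import random
from itertools import combinations


def assign_workers(managers, workers, n_rounds, seed=None):
    """Return n_rounds lists of (manager, worker, worker) teams with no repeated pairing."""
    m = len(managers)
    if len(workers) != 2 * m:
        raise ValueError("need exactly two workers per manager")
    limit = m if m % 2 else m // 2
    if n_rounds > limit:
        raise ValueError(f"at most {limit} repeat-free rounds with {m} managers")
    pool = list(workers)
    random.Random(seed).shuffle(pool)
    first, second = pool[:m], pool[m:]
    return [
        [(managers[k], first[(k + r) % m], second[(k + 2 * r) % m]) for k in range(m)]
        for r in range(n_rounds)
    ]


def repeated_pairings(schedule):
    """Count manager-worker or worker-worker pairings that occur more than once."""
    seen, repeats = set(), 0
    for teams in schedule:
        for team in teams:
            for pair in combinations(team, 2):
                key = frozenset(pair)
                if key in seen:
                    repeats += 1
                seen.add(key)
    return repeats


managers = [f"m{i}" for i in range(8)]
workers = [f"w{i}" for i in range(16)]
schedule = assign_workers(managers, workers, 4, seed=3)
print(repeated_pairings(schedule))  # 0
```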

In the first stage, participants will individually complete 18 image-recognition tasks divided into 3 parts. Part 1 (6 questions) is completed without AI; Part 2 (6 questions) is completed with AI; Part 3 also has 6 questions. At the end of stage 1, participants will complete a self-assessment in which they provide a point estimate and a 90% confidence interval for the number of questions they answered correctly in stage 1.

In the second stage, participants will complete group image-recognition tasks. Each group has three members (1 leader and 2 workers) and completes 4 rounds of 3 questions each, for a total of 4 × 3 = 12 group tasks; group composition does not change within a round.

Following Weidmann (2024), the randomization rule ensures that players who have worked together before are never grouped together again in later rounds. In the control groups, leaders first submit their answers and then discuss the questions with their teammates via a live chatbox. In the treatment groups, leaders first submit their answers using the AI variant assigned to their arm and then have the chance to discuss the questions with their teammates via a live chatbox. Teammates cannot chat with each other and have no access to AI at any point in stage 2.

Finally, after discussing with the team members, a randomly chosen representative (in the random-leader treatments) or the self-selected leader (in the self-promoted treatments) submits the group's answer. At the end of stage 2, participants complete a self-assessment in which they provide a point estimate and a 90% confidence interval for the number of questions they answered correctly in stage 2.
Randomization Unit
For the Without-Leader Study:

Randomization occurs at the group level in each round. We use a rotation protocol for triads that ensures no participant interacts with the same individual more than once during a session, maintaining the independence of observations.

For the With-Leader Study:

Randomization is implemented at the group level with fixed roles. Leaders remain constant across all rounds, while workers are randomly reassigned to leaders in each round. The group composition ensures that no worker encounters the same leader or peer twice.
Was the treatment clustered?
Yes

Experiment Characteristics

Sample size: planned number of clusters
Groups consist of three participants. In the no-leader study, groups are composed of three equal members, whereas in the leader study, each group includes one leader and two members.
Sample size: planned number of observations
The study aims to recruit 720-850 participants.
Sample size (or number of clusters) by treatment arms
We plan to run 6 sessions for the leaderless experiment, 3 each for the control and treatment groups, with 24 participants per session.

For the leader experiment, we plan to run a total of 8 × 3 = 24 sessions, 3 for each of the 8 arms, with 24 participants per session.

With 3 sessions per condition and 24 participants per session, this yields a total of 144 participants for the leaderless experiment and 576 for the leader experiment.
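The session counts combine as follows; the total of 720 equals the lower bound of the planned recruitment range.

```latex
\begin{aligned}
\text{Leaderless study:} \quad & 2 \text{ arms} \times 3 \text{ sessions} \times 24 = 144 \\
\text{Leader study:} \quad & 8 \text{ arms} \times 3 \text{ sessions} \times 24 = 576 \\
\text{Total:} \quad & 144 + 576 = 720
\end{aligned}
```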
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Study 1 (leaderless design): To achieve a statistical power of 0.95 when comparing one control group and one treatment group (a two-arm design), the recruitment target is approximately 252 to 264 participants, organized into 84 to 88 independent triads (3 participants per cluster). This estimate assumes a medium effect size (f = 0.25) and a significance level of 0.05. To account for the nested data structure, the sample size was inflated by a design effect of 1.2 (assuming an intraclass correlation coefficient of 0.10). Consequently, each experimental condition requires at least 42 groups for robust statistical inference.

Study 2 (leader design): A power analysis was conducted for a 4 × 2 factorial design, targeting a statistical power of 0.95 for a medium effect size (f = 0.25) at a significance level of 0.05. To account for the nested structure of the data (3 participants per cluster), the required sample size was inflated by a design effect of 1.2, assuming a conservative intraclass correlation coefficient (ICC) of 0.10. Consequently, the study aims to recruit 336 to 360 participants, organized into 112 to 120 triads, to detect both main effects and interaction effects.
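The registration does not say which software or formula produced these figures. As a cross-check, the sketch below reproduces the Study 1 numbers with a normal approximation to the two-group test plus the common z²/4 small-sample correction; this is a swapped-in method, not necessarily the authors' calculation. For two equal arms Cohen's f relates to Cohen's d by d = 2f. The design effect uses the standard formula DE = 1 + (m - 1) × ICC for equal clusters of size m, which gives 1 + 2 × 0.10 = 1.2. For Study 2 only the inflation step is shown: the base of 280 participants is back-derived from the registered 336 / 1.2, not computed here. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_two_arm(f=0.25, alpha=0.05, power=0.95):
    """Approximate per-arm sample size for a two-sided two-arm comparison."""
    d = 2 * f  # Cohen's d for two equal arms
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    # Normal approximation plus the z_a**2 / 4 small-sample correction.
    n = 2 * (z_a + z_b) ** 2 / d ** 2 + z_a ** 2 / 4
    return math.ceil(n)


def design_effect(cluster_size=3, icc=0.10):
    """Design effect for equal clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def inflate(n_total, de, cluster_size=3):
    """Inflate a total sample size by the design effect; round up to whole clusters."""
    inflated = math.ceil(round(n_total * de, 9))  # round() guards against float noise
    clusters = math.ceil(inflated / cluster_size)
    return clusters * cluster_size, clusters


de = design_effect()
base = 2 * n_per_group_two_arm()
print(base, inflate(base, de))  # 210 (252, 84)
print(inflate(280, de))         # (336, 112)
```

The 84 inflated Study 1 triads split evenly into 42 per condition, matching the registered minimum.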
IRB

Institutional Review Boards (IRBs)

IRB Name
The NTU Institutional Review Board
IRB Approval Date
2026-05-07
IRB Approval Number
IRB-2026-280
Analysis Plan

There is information in this trial unavailable to the public.