Systematic determinants of group and team performance in a portfolio allocation task

Last registered on June 03, 2026

Pre-Trial

Trial Information

General Information

Title
Systematic determinants of group and team performance in a portfolio allocation task
RCT ID
AEARCTR-0018014
Initial registration date
May 25, 2026

First published
June 03, 2026, 8:25 AM EDT

Locations

There is information in this trial unavailable to the public.

Primary Investigator

Affiliation
University of Bonn

Other Primary Investigator(s)

PI Affiliation
University of Bonn

Additional Trial Information

Status
In development
Start date
2026-05-25
End date
2026-09-02
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
In a 2x3 design, this study investigates three factors that may matter for group and team decision-making and performance in a portfolio allocation task: cognitive composition, independence, and aggregation. Participants are recruited via Prolific, with an emphasis on recruiting retail investors. They are given a hypothetical $1,000 and financially incentivized to design portfolios that maximize realized Sharpe ratios after 3 months. They design a portfolio alone in Round 1 of the task and again as part of a two-member team in Round 2. We plan to recruit 600 participants to form 300 teams.

The two-condition treatment variable targets team cognitive composition, i.e., the distribution of thinking styles among teammates. We define thinking styles as approaches to processing information and making decisions. These are measured psychometrically using instruments for analytic and intuitive thinking styles, which we will validate with incentivized choices in a separate study. In Round 2, participants will be randomized into one of two treatment conditions that are balanced in terms of analytic and intuitive composition but differ in how teams are constructed. The team assignment algorithm in the diversity condition pairs each participant with a teammate so as to maximize the sum of (a function of) within-team differences in thinking styles, while that in the homogeneity condition aims to minimize within-team differences.

The three-condition treatment variable targets independence and aggregation by modifying the format of team interaction and is cross-randomized with the cognitive composition treatment. In particular, subjects are assigned to one of three groups in Round 2:
1. Control: This condition is identical to Round 1, except as follows. Each teammate still designs a portfolio alone, but they are now paid according to the performance of a joint portfolio created by pooling their and their teammate's portfolios together. We explain portfolio pooling to participants as follows: if they invest $100 in Stock A and their teammate invests $100 in Stock B, then their pooled portfolio will have $50 invested in Stock A and $50 invested in Stock B. More generally, the combined holdings are rescaled proportionally so that the pooled portfolio totals $1,000.
2. Communication: This condition is identical to the Control condition, except that teammates are able to have a live discussion with each other while designing their portfolios. A chatbox appears on their screen for this purpose. They can also view their teammate's Round 1 portfolio. This intervention pierces each subject's independence by exposing them to the influence of their teammate.
3. Full team: This condition is identical to the Communication condition, except that teammates work with each other on a shared decision screen. The screen features jointly editable input fields and teammates must unanimously agree on a team portfolio in order to be eligible for bonus payments. This intervention targets aggregation by allowing teammates to organically aggregate a portfolio, rather than imposing an aggregation procedure (as done with portfolio pooling in the previous two conditions).

We will use linear regression to check for treatment effects of each condition on our primary outcomes, which we group into those that target portfolio returns (e.g., the Sharpe ratio and historical returns) and those that target portfolio risk (e.g., volatility over the evaluation period and proxies for diversification). We expect treatment effects to work primarily through the portfolio risk channel rather than the returns channel. An additional primary outcome is the share of each portfolio that participants wish to re-allocate to an ETF that tracks the S&P 500. Our secondary outcomes include participants' stated confidence in their portfolios and measures of discussion quality (e.g., chat sentiment), aggregation quality (e.g., whether subjects use a naive diversification heuristic), and the quality of the team experience.

We pre-specify a number of exploratory analyses in the analysis plan. In particular, we plan to estimate interaction effects between cognitive composition and team format, transformations of the cognitive composition treatment variable, and alternative evaluation windows for realized market-based outcomes. We will also exploit any other resulting variation in cognitive composition that emerges from exogenous treatment assignment. For example, it may be that subjects higher in self-described rationality perform differently from other participants, have higher levels of subjective confidence in their portfolio, and have more success at convincing their teammate to adopt their preferred inputs.
External Link(s)

Registration Citation

Citation
Evans, Daniel and Georg Schneider. 2026. "Systematic determinants of group and team performance in a portfolio allocation task." AEA RCT Registry. June 03. https://doi.org/10.1257/rct.18014-1.0
Experimental Details

Interventions

Intervention(s)
All participants experience Round 1 of the portfolio allocation task identically: they design a portfolio alone by selecting up to 10 stocks from the S&P 500 and allocating a hypothetical $1,000 between them. They are incentivized to maximize the realized Sharpe ratio of this portfolio after three months. In Round 2, we assign them to a team and introduce randomized interventions that target (i) team cognitive composition and (ii) the format of team interaction.

Team cognitive composition is targeted by constructing teams that are cognitively diverse (homogeneous) in thinking style in the diversity (homogeneity) condition. Thinking styles are elicited as described in the Experimental Design section below. To implement this intervention, subjects are first randomly assigned to one of the two conditions, which are balanced in terms of analytic and intuitive composition in expectation (as well as on other pre-treatment covariates). They are then assigned to a teammate within that condition. Teammate assignment is conducted using a condition-specific, two-step algorithm that aims either to maximize or to minimize the Euclidean distance in thinking styles between teammates.

We employ an algorithm that strikes a balance between performance and speed in pairing teammates in our live setting. The first step in the algorithm lists all possible pairings between teammates, calculates their distances, and makes greedy pairings starting with the lowest- and highest-distance pairs in the homogeneity and diversity conditions, respectively. Assignment continues with the next-best pair that remains available until all teammates are assigned. Since this procedure is not likely to produce globally optimal matches, the next step in the algorithm aims to improve match quality while remaining computationally affordable.

It does so by scanning through all possible teammate swaps in a fixed order and accepting those that increase (decrease) the sum of a concave (convex) function of Euclidean distance in the diversity (homogeneity) condition as they are encountered. After a swap, both of the affected teams remain eligible for additional beneficial swaps in the same pass using the updated pairing of teammates. This swapping exercise is repeated for up to 50 passes or until no more beneficial swaps are available. The concavity (convexity) of the distance function is intended to reduce the number of outlier teams with abnormally high or low distance in each condition. If the number of eligible subjects is odd, the unmatched participant is assigned to a solo "team" and excluded from team-level analyses.
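
The two-step procedure described above can be sketched as follows. This is a minimal illustration and not the study's implementation: the function names are hypothetical, and the specific transforms (square root as the concave function, squaring as the convex function) are assumptions, since the registration does not name them.

```python
import itertools
import math

def transformed(distance, diversity):
    # Assumed transforms (not specified in the registration):
    # sqrt is concave (diversity), squaring is convex (homogeneity).
    return math.sqrt(distance) if diversity else distance ** 2

def greedy_pairs(scores, diversity):
    """Step 1: pair subjects greedily, most extreme distances first."""
    candidates = sorted(
        itertools.combinations(range(len(scores)), 2),
        key=lambda pair: math.dist(scores[pair[0]], scores[pair[1]]),
        reverse=diversity,  # highest distance first in the diversity condition
    )
    matched, pairs = set(), []
    for i, j in candidates:
        if i not in matched and j not in matched:
            pairs.append((i, j))
            matched.update((i, j))
    return pairs  # with an odd number of subjects, one index stays unmatched

def refine(pairs, scores, diversity, max_passes=50):
    """Step 2: accept beneficial teammate swaps as encountered, in a fixed order."""
    pairs = list(pairs)
    sign = 1 if diversity else -1  # maximize or minimize the summed objective

    def value(i, j):
        return transformed(math.dist(scores[i], scores[j]), diversity)

    for _ in range(max_passes):
        improved = False
        for x in range(len(pairs)):
            for y in range(x + 1, len(pairs)):
                (a, b), (c, d) = pairs[x], pairs[y]
                current = value(a, b) + value(c, d)
                # The two other ways of pairing these four subjects.
                for new_x, new_y in (((a, c), (b, d)), ((a, d), (b, c))):
                    proposed = value(*new_x) + value(*new_y)
                    if sign * (proposed - current) > 1e-12:
                        pairs[x], pairs[y] = new_x, new_y
                        current, improved = proposed, True
        if not improved:
            break
    return pairs
```

In this sketch, scores is a list of thinking-style vectors (for example, one tuple of rational and intuitive scores per subject), and pairs are tuples of list indices. The greedy step can lock in a poor final pair, which is the situation the swap passes are meant to repair.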

The format of team interaction is modified by randomizing subjects into one of three conditions:
1. Control: this condition is identical to Round 1 except in how subjects are incentivized. Subjects still design portfolios alone but are now told that they will be paid according to the realized Sharpe ratio of a joint portfolio that is created by pooling their and their teammate's portfolios together. We explain portfolio pooling to participants as follows: if they invest $100 in Stock A and their teammate invests $100 in Stock B, then their pooled portfolio will have $50 invested in Stock A and $50 invested in Stock B. More generally, the combined holdings are rescaled proportionally so that the pooled portfolio totals $1,000.
2. Communication: this condition is identical to the Control condition, except as follows. Each subject is now able to have a live discussion with their teammate while designing their portfolio. A chatbox appears on their screen for this purpose. Furthermore, teammates can view each other’s Round 1 portfolios. This intervention pierces each subject's independence by exposing them to the influence of their teammate.
3. Full team: this condition is identical to the Communication condition, except that teammates must now design a portfolio together on a shared decision screen. Subjects’ screens feature jointly editable input fields and an agreement button that records each teammate’s assent to the portfolio currently on the screen. We require unanimity: each teammate must press the agreement button for a given portfolio in order for it to be eligible for bonus payments. To equalize the capacity for diversification between conditions, full teams are able to allocate portfolio holdings to up to 20 tickers (instead of just 10). This intervention targets aggregation by allowing teammates to organically and collaboratively aggregate a portfolio, rather than imposing the pooling aggregation procedure as done in the other conditions.
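
The portfolio pooling rule used in the Control and Communication conditions can be illustrated with a short sketch. The function names are hypothetical, and proportional rescaling of the combined holdings is our reading of the description above rather than a confirmed implementation detail.

```python
def rescale(portfolio, total=1000.0):
    """Proportionally rescale dollar allocations so that they sum to `total`."""
    invested = sum(portfolio.values())
    if invested <= 0:
        raise ValueError("portfolio has no positive allocations")
    return {ticker: amount * total / invested for ticker, amount in portfolio.items()}

def pool(own, teammate, total=1000.0):
    """Combine two portfolios ticker by ticker, then rescale to `total`."""
    combined = dict(own)
    for ticker, amount in teammate.items():
        combined[ticker] = combined.get(ticker, 0.0) + amount
    return rescale(combined, total)

# The example given to participants, scaled to a $100 pooled total.
example = pool({"Stock A": 100}, {"Stock B": 100}, total=100)
```

Tickers held by both teammates simply have their amounts added before rescaling, so a stock that both teammates favor keeps a correspondingly large weight in the pooled portfolio.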
Intervention Start Date
2026-06-02
Intervention End Date
2026-06-03

Primary Outcomes

Primary Outcomes (end points)
Return variables:
- Realized Sharpe ratios*
- Realized returns
- Portfolio performance ranking in terms of realized Sharpe ratios and returns
- Past returns of tickers in portfolio from historical data

Risk variables:
- Realized volatility
- Portfolio performance ranking in terms of realized volatility
- Average pairwise correlation of tickers in portfolio from historical data
- The Herfindahl-Hirschman Index (HHI) of allocations to tickers and industries
- Maximum weight on a single asset or industry
- The number of unique assets and industries that received allocations

ETF re-allocation variable:
- The share of each portfolio that participants wish to re-allocate to an ETF that tracks the S&P 500

*We list this outcome under Return variables for organizational simplicity, but it depends on both returns and risk.
Primary Outcomes (explanation)
All return- and volatility-based outcomes are calculated in standard ways using realized asset values from the S&P 500 over a 3-month period. The portfolios are managed using a "buy and hold" strategy without rebalancing. These quantities will be annualized for reporting in the paper. Each portfolio is ranked against other subjects' portfolios on all realized measures.
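
As an illustration of one standard way to compute these quantities for a buy-and-hold portfolio, consider the sketch below. Several details are assumptions that the registration does not specify: daily price data, a zero risk-free rate by default, the sample standard deviation, and annualization with 252 trading days. The function names are hypothetical.

```python
import math
import statistics

def portfolio_values(allocations, prices):
    """Daily value of a buy-and-hold portfolio (shares fixed on day 0, no rebalancing).

    allocations: {ticker: dollars invested on day 0}
    prices: {ticker: [price on day 0, price on day 1, ...]}
    """
    n_days = min(len(prices[t]) for t in allocations)
    shares = {t: amount / prices[t][0] for t, amount in allocations.items()}
    return [sum(shares[t] * prices[t][day] for t in shares) for day in range(n_days)]

def annualized_stats(values, risk_free_per_period=0.0, periods_per_year=252):
    """Annualized volatility and Sharpe ratio from a series of portfolio values."""
    returns = [values[i] / values[i - 1] - 1 for i in range(1, len(values))]
    mean_excess = statistics.mean(r - risk_free_per_period for r in returns)
    volatility = statistics.stdev(returns)  # sample standard deviation (assumption)
    scale = math.sqrt(periods_per_year)
    return {
        "volatility": volatility * scale,
        "sharpe": mean_excess / volatility * scale,
    }
```

Because shares are fixed at the start, portfolio weights drift with prices over the evaluation window, which is what distinguishes buy and hold from a rebalanced portfolio.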

Past returns and the average pairwise correlation of tickers will be calculated based on historical data from the two years prior to the main experiment.

We expect treatment effects to work primarily through the channel of portfolio risk rather than returns. This motivates our inclusion of various proxies for diversification as primary outcomes. The HHI of allocations to tickers and industries is calculated according to the standard formula. We will construct simple numerical variables tracking the number of unique assets and industries that appear in each submitted portfolio.
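
The diversification proxies named above can be computed along the following lines. This is a sketch with hypothetical function names; it assumes the HHI is expressed on a 0 to 1 scale (the sum of squared portfolio shares) and that industry totals are obtained by summing ticker allocations within each industry.

```python
def diversification_metrics(allocations):
    """HHI, maximum weight, and count of positions for {name: dollars}."""
    total = sum(allocations.values())
    weights = [amount / total for amount in allocations.values() if amount > 0]
    return {
        "hhi": sum(w ** 2 for w in weights),  # 1/n for equal weights, 1 for one asset
        "max_weight": max(weights),
        "n_unique": len(weights),
    }

def aggregate_by_industry(allocations, industry_of):
    """Sum ticker-level dollars into industry-level dollars."""
    totals = {}
    for ticker, amount in allocations.items():
        sector = industry_of[ticker]
        totals[sector] = totals.get(sector, 0.0) + amount
    return totals
```

Applying diversification_metrics to the output of aggregate_by_industry yields the industry-level versions of the same three measures.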

In a final task near the end of the main experiment, we give participants the option to re-allocate between 0% and 100% of their Round 1 and Round 2 portfolios to an exchange-traded fund (ETF) that tracks the S&P 500. For a small share of participants, their preferred allocation will be implemented and the final portfolio that is eligible for bonus payments will be a mixture between their selected assets and the ETF.

Secondary Outcomes

Secondary Outcomes (end points)
- The number of allocations received by each individual asset and industry
- Levels and differences in participants' stated confidence in each portfolio
- Measures of discussion quality (see below)
- Measures of aggregation quality (see below)
- Subjective reports about the quality of the team experience (see below)
Secondary Outcomes (explanation)
We will track the number of allocations that specific tickers (e.g., NVDA, JNJ) and industries (e.g., consumer, healthcare) receive from subjects.

After each round, we elicit subjects' subjective confidence (0-100) in their portfolio's ability to generate a high Sharpe ratio.

Our measures of discussion quality include:
- Chat sentiment (as calculated using standard packages available for R and Python)
- The number of messages between teammates, their length, and time elapsed between messages
- Imbalance between subjects in terms of the number and length of messages sent
- Appearances of keywords like "Sharpe", "diversify", "diversification", etc. in the team chat that demonstrate a correct understanding of the task objective
- Additional information about the features and structure of these discussions coded by LLMs for exploratory analyses
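
A few of the count-based discussion measures listed above could be computed as in the following sketch. The function name, the input format, and the imbalance definition (absolute difference between teammates divided by the team total) are assumptions for illustration; the keyword list contains only the examples named above.

```python
import re

KEYWORDS = ("sharpe", "diversify", "diversification")  # examples named in the text

def chat_metrics(messages, teammates):
    """messages: list of (sender, text) tuples; teammates: the two sender ids."""
    counts = {t: 0 for t in teammates}
    chars = {t: 0 for t in teammates}
    keyword_hits = 0
    for sender, text in messages:
        counts[sender] += 1
        chars[sender] += len(text)
        lowered = text.lower()
        keyword_hits += sum(len(re.findall(rf"\b{k}\b", lowered)) for k in KEYWORDS)
    first, second = teammates
    n_messages = sum(counts.values())
    n_chars = sum(chars.values())
    return {
        "n_messages": n_messages,
        "keyword_hits": keyword_hits,
        # Assumed imbalance measure: 0 means an even split, 1 means one-sided.
        "message_imbalance": abs(counts[first] - counts[second]) / n_messages if n_messages else 0.0,
        "length_imbalance": abs(chars[first] - chars[second]) / n_chars if n_chars else 0.0,
    }
```

Whole-word matching means that variants such as "diversifying" are not counted unless they are added to the keyword list.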

Our measures of aggregation quality include:
- A classification of whether a portfolio was naively diversified (i.e., equal amounts were allocated to each asset in the portfolio)
- How much overlap a full team's portfolio has with one created using the portfolio pooling heuristic. This heuristic refers to taking the assets in each teammate's Round 1 portfolio and re-allocating them proportionately to $1,000 in the team portfolio.
- A dummy variable capturing whether an individual submitted an eligible portfolio and/or a full team unanimously agreed upon an eligible portfolio
- Time elapsed between the start of the round, editing actions, the submission of eligible portfolios, and the end of the round
- Total number of edits, share of edits made by each teammate
- The number and share of assets whose allocated amounts are at typical focal points, e.g., 100, 250, 500, and multiples of 50 and 100 more generally
- Whether the teammates agreed (in the chatbox) to follow a particular aggregation procedure, e.g., each teammate can allocate half of the amounts in the portfolio
- The extent of overlap of a team's portfolio with each teammate's individual Round 1 portfolio. Higher levels of overlap with one teammate's Round 1 portfolio can be interpreted as a proxy for that teammate's dominance and/or influence over the team's decision-making procedure, especially in full teams.
- Additional information about the features and structure of allocations and aggregations coded by LLMs for exploratory analyses
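
Three of the aggregation measures above lend themselves to a short sketch: the naive diversification flag, the share of allocations at focal points, and portfolio overlap. The function names, the one-dollar tolerance, the requirement of at least two assets, and the overlap definition (sum of the smaller of the two portfolio weights on each ticker) are all assumptions, since the registration does not fix these details.

```python
def is_naively_diversified(portfolio, tolerance=1.0):
    """True if at least two assets receive (near-)equal dollar amounts."""
    amounts = [a for a in portfolio.values() if a > 0]
    return len(amounts) >= 2 and max(amounts) - min(amounts) <= tolerance

def focal_point_share(portfolio, step=50):
    """Share of positive allocations that are exact multiples of `step` dollars."""
    amounts = [a for a in portfolio.values() if a > 0]
    return sum(1 for a in amounts if a % step == 0) / len(amounts)

def overlap(first, second):
    """Weight shared by two portfolios: 0 means disjoint, 1 means identical weights."""
    total_first, total_second = sum(first.values()), sum(second.values())
    tickers = set(first) | set(second)
    return sum(
        min(first.get(t, 0) / total_first, second.get(t, 0) / total_second)
        for t in tickers
    )
```

The tolerance lets a three-way split such as 333.33, 333.33, and 333.34 count as naive diversification. The same overlap function can compare a team portfolio with the pooled portfolio or with each teammate's Round 1 portfolio.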

Our measures of the subjective quality of the team experience include:
- An individual’s willingness to work with the same teammate on a similar task on a scale from 0 to 100
- Text sentiment in an open-text description of how an individual’s "team (i) communicated, (ii) decided which stocks to include, and (iii) allocated the $1,000" (as calculated using standard packages available for R and Python)
- Appearances of keywords in the open-text descriptions that signify (dys)functional team dynamics
- Additional information about the features and structure of these open-text descriptions coded by LLMs

Experimental Design

Experimental Design
Our data collection is divided into two parts. In the intake survey, we recruit participants and have them commit to showing up for the main experiment at a pre-specified date and time. We also collect data on their thinking styles, demographics, preferences, and beliefs, and give them a comprehension check that they must pass in order to be eligible to participate in the portfolio allocation task. In the main experiment, subjects complete the portfolio allocation task first individually (Round 1) and then as part of a team that we assign them to (Round 2).

INTAKE SURVEY:

To recruit subjects who are intrinsically motivated to work on the portfolio allocation task and who hold relevant background knowledge, we advertise the task as an "investment tournament" on Prolific. At first, we will restrict eligibility to subjects who self-report owning investments in their private holdings, i.e., retail investors. If the number of sign-ups falls short of our target sample size (discussed below), we will open up eligibility to subjects who do not meet this criterion. Other criteria we impose on our participants include that they are native speakers of English who live in the United States, Australia, Canada, Ireland, New Zealand, or the United Kingdom, and that they have not participated in any prior studies that either co-author has posted on Prolific.

Eligible participants can sign up via the intake survey form, which informs them of the date and time of the main experiment. The flat fee for completing the survey is $5 USD. In order to allow teammates to work together in real time, we elicit a commitment from each subject to show up on time and participate for the full duration of the main experiment. Abiding by this commitment earns them a show-up fee of $6.

Next, we conduct a psychometric evaluation of analytic and intuitive thinking styles using 22 items from the Rational-Experiential Inventory. To keep the survey short, we select only items with high factor loadings (absolute value > 0.3) on the rationality and intuition components in the principal component analysis of Norris and Epstein (2011). This results in 12 items on the Rational scale and 10 items on the Intuition scale. For potential use in robustness checks, we also elicit subjects' one-dimensional scores on the Cognitive Style Index (Allinson and Hayes, 1996).

After this, we measure subject numeracy using the Berlin Numeracy Test (Cokely et al., 2012). Additional information elicited from participants includes risk preferences, demographics, investing experience, belief in the efficient market hypothesis, a forecast of the level of the S&P 500 in three months, and self-assessed reliability using a question from Dohmen and Jagelka (2024).

The final module in the intake survey is a comprehension check. In this check, we provide subjects with information on Sharpe ratios to prepare them for the main experiment. This information includes both an intuitive definition of Sharpe ratios and a formula that explains how we will calculate them. Participants also see an example calculation and are given tips on maximizing Sharpe ratios, including that diversifying across multiple stocks is a more sensible strategy for this task than choosing "winners." Three questions test their understanding of the definition of Sharpe ratios, how they are calculated, and reliable strategies for maximizing them. To comply with Prolific's rules on comprehension checks, participants who do not answer these questions correctly within two attempts still receive their payment for completing the survey but are not eligible for participation in the main experiment.

MAIN EXPERIMENT:

The intake survey is open for approximately one week in the lead-up to the main experiment. We will close it a few hours before the pre-specified time in order to assess eligibility and prepare the main task for launch.

The portfolio allocation task comprises two rounds. In Round 1, subjects work alone to design a hypothetical portfolio worth $1,000 by selecting tickers from a drop-down list and entering amounts in open boxes. They are required to select a ticker and enter an amount to invest for at least one and at most 10 individual companies in the S&P 500. For each ticker in the drop-down list, subjects are given the full name of the company and its GICS sector-level industry classification (e.g., Financials). They are warned that portfolios that do not sum to $1,000 are not eligible for bonus payments.

Subjects are incentivized to design a portfolio with a high Sharpe ratio. In particular, they are told that the Sharpe ratio of their portfolio will be calculated after three months using real returns and volatility data and that they can receive a bonus payment of up to $10 depending on how well it performs. To implement this, we calculate and rank the realized Sharpe ratios of every eligible participant in Round 1. We then randomly select 10% of participants to receive bonus payments and randomly choose either their Round 1 or Round 2 portfolio to be evaluated for payment. Selected subjects whose Sharpe ratios fall into the top decile of all subjects receive $10, while those in the next decile receive $9. This pattern continues through the bottom decile, where subjects receive $1. Portfolios are only eligible for bonus payments if their allocations add up to $1,000. We use this incentive scheme, rather than one with large prizes for winners, to minimize the role of luck and discourage subjects from gambling their portfolios on "winner" stocks.
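
The decile-based bonus scheme can be sketched as follows. The function names are hypothetical, and tie handling and rounding of the 10% selection are assumptions that the registration does not address.

```python
import random

def decile_bonus(sharpe_by_subject):
    """Map each subject to the bonus ($10 top decile ... $1 bottom decile)
    they would receive if selected for payment."""
    ranked = sorted(sharpe_by_subject, key=sharpe_by_subject.get, reverse=True)
    n = len(ranked)
    bonus = {}
    for position, subject in enumerate(ranked):
        decile = position * 10 // n  # 0 = top decile, 9 = bottom decile
        bonus[subject] = 10 - decile
    return bonus

def select_for_payment(subjects, share=0.10, seed=None):
    """Randomly pick a share of subjects and the round evaluated for each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(subjects), round(len(subjects) * share))
    return {subject: rng.choice(("round1", "round2")) for subject in chosen}
```

Because payments step down by one dollar per decile, a marginal improvement in rank changes the payoff only slightly, which matches the stated aim of discouraging bets on "winner" stocks.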

After Round 1 is complete, we elicit subjects' subjective confidence in the portfolio they just designed on a 0-100 scale. We also take steps to ensure that we do not assign inattentive and inactive participants to teams in Round 2. In particular, we restrict participation and treatment assignment to subjects who successfully designed an eligible portfolio in Round 1. We also require them to register their presence by pressing a button within two minutes of completing Round 1. This effectively filters out subjects who are no longer paying attention to the survey and prevents us from creating teams with inactive teammates.

The portfolio allocation task in Round 2 is similar to that of Round 1, except for condition-specific differences as described in the Interventions section above. Portfolio incentives are also nearly identical: if a subject is selected for bonus payments and their Round 2 portfolio is chosen to be evaluated, they are paid based on the decile of the portfolio's Sharpe ratio relative to other subjects in the same round. The evaluated portfolios in the Control and Communication conditions are subjects' pooled portfolios with their teammates, while those in the Full team condition are the ones that teammates unanimously agreed upon (if they did so successfully).

There may be cases in which one teammate submits an eligible portfolio and the other does not, or a team in the Full team condition is unable to unanimously agree on a portfolio due to the non-participation of one teammate. In these cases, we will use the available data to recover an eligible portfolio to evaluate for the teammate who participated and followed instructions. In case one teammate submits a portfolio that does not sum up to $1,000, we will re-allocate it to $1,000 before pooling it with their eligible teammate's portfolio. If no portfolio is submitted by one teammate, we will use the eligible teammate's portfolio. Finally, if a teammate is inactive in the Full team condition, we ask the active teammate to still design a portfolio, which will be eligible even without agreement.
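
For the Control and Communication conditions, the recovery rules above might be implemented along these lines. This is a sketch under assumptions: the function name is hypothetical, a missing submission is represented by None, and a sole submitted portfolio that does not sum to $1,000 is also rescaled, a case the text does not cover.

```python
def evaluated_portfolio(own, teammate, total=1000.0):
    """Return the portfolio to evaluate for a team, or None if nothing was submitted.

    own, teammate: {ticker: dollars}, or None when no portfolio was submitted.
    """
    def rescale(portfolio):
        invested = sum(portfolio.values())
        return {t: amount * total / invested for t, amount in portfolio.items()}

    # Re-allocate every submitted portfolio to the full amount before pooling.
    submitted = [rescale(p) for p in (own, teammate) if p]
    if not submitted:
        return None
    pooled = {}
    for portfolio in submitted:
        for ticker, amount in portfolio.items():
            pooled[ticker] = pooled.get(ticker, 0.0) + amount / len(submitted)
    return pooled
```

With two submissions each portfolio contributes half of the pooled total, and with one submission the result is simply that teammate's portfolio.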

We elicit a battery of measures after Round 2 is complete. These include subjects' subjective confidence in their Round 2 portfolio, open-text explanations of their confidence ratings from both rounds and their decision-making procedure and team experience, their willingness to work with this teammate on a similar task again, and their relative attention to returns vs. risk when designing the Round 2 portfolio.

In a final task, we give participants the option to re-allocate between 0% and 100% of their Round 1 and Round 2 portfolios to SPY, which is an ETF that tracks the S&P 500. Participants who select 70% are indicating that they prefer to allocate $700 to their own chosen stocks from a given round and re-allocate $300 to the ETF. We will then implement these choices for a randomly selected 10% of participants and one of their two portfolios. If a choice is implemented, we will calculate the Sharpe ratio of the re-allocated portfolio as usual, and it will be eligible for bonus payments in the respective round for subjects who are selected to receive them.
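
The re-allocation can be illustrated with a small sketch that follows the worked example in the text, where a selection of 70% keeps $700 in the participant's own stocks and moves $300 to the ETF. The function and parameter names are hypothetical.

```python
def mix_with_etf(portfolio, own_share, etf_ticker="SPY"):
    """Scale the chosen stocks to `own_share` of the total and move the rest to the ETF.

    own_share: fraction in [0, 1] that stays in the participant's own stocks.
    """
    if not 0.0 <= own_share <= 1.0:
        raise ValueError("own_share must lie between 0 and 1")
    total = sum(portfolio.values())
    mixed = {ticker: amount * own_share for ticker, amount in portfolio.items()}
    mixed[etf_ticker] = mixed.get(etf_ticker, 0.0) + total * (1.0 - own_share)
    return mixed

mixed = mix_with_etf({"A": 600, "B": 400}, own_share=0.70)
```

Relative weights among the participant's own stocks are unchanged by the mix; only their combined share of the $1,000 shrinks.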

FORECASTING SURVEY:

After data collection is complete, we will launch a survey that tests whether forecasters are able to predict our main results (DellaVigna and Pope, 2018). More information will follow in a survey-specific preregistration.

References
Allinson, C.W. and Hayes, J. (1996), The Cognitive Style Index: A Measure of Intuition-Analysis For Organizational Research. Journal of Management Studies, 33: 119-135. https://doi.org/10.1111/j.1467-6486.1996.tb00801.x
Cokely, E.T., Galesic, M., Schulz, E., Ghazal, S. and Garcia-Retamero, R. (2012), Measuring Risk Literacy: The Berlin Numeracy Test. Judgment and Decision Making, 7: 25-47. https://doi.org/10.1017/S1930297500001819
DellaVigna, S. and Pope, D. (2018), Predicting Experimental Results: Who Knows What? Journal of Political Economy, 126: 2410-2456. https://doi.org/10.1086/699976
Dohmen, T. and Jagelka, T. (2024), Accounting for Individual-Specific Reliability of Self-Assessed Measures of Economic Preferences and Personality Traits. Journal of Political Economy Microeconomics, 2: 399-462.
Norris, P. and Epstein, S. (2011), An Experiential Thinking Style: Its Facets and Relations With Objective and Subjective Criterion Measures. Journal of Personality, 79: 1043-1080. https://doi.org/10.1111/j.1467-6494.2011.00718.x
Experimental Design Details
Not available
Randomization Method
Random assignment is performed by a computer using the random.shuffle() method in Python.
Randomization Unit
Subjects are first randomly assigned to either the homogeneity condition or the diversity condition. These conditions determine how their teammate is selected, as discussed in the Interventions section above.

Once a team is formed, treatment assignment for team format is then conducted at the team level.
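
A shuffle-based assignment of this kind might look like the sketch below. Only the use of random.shuffle() comes from the registration; the function name and the round-robin step that equalizes arm sizes are assumptions.

```python
import random

def assign_balanced(units, arms, rng=random):
    """Shuffle the units, then deal them into arms in turn so arm sizes stay balanced."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: arms[i % len(arms)] for i, unit in enumerate(shuffled)}

# Stage 1: individuals to composition conditions. Stage 2: teams to formats.
composition = assign_balanced(range(12), ["diversity", "homogeneity"], random.Random(1))
team_format = assign_balanced(range(6), ["Control", "Communication", "Full team"], random.Random(2))
```

Passing a seeded random.Random instance, as in the usage lines, makes an assignment reproducible for auditing.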
Was the treatment clustered?
Yes

Experiment Characteristics

Sample size: planned number of clusters
300 teams
Sample size: planned number of observations
600 individuals
Sample size (or number of clusters) by treatment arms
Cognitive composition conditions: 300 individuals and 150 teams in diversity condition, 300 individuals and 150 teams in homogeneity condition

Team format conditions (cross-randomized with cognitive composition): 200 individuals and 100 teams in Control condition, 200 individuals and 100 teams in Communication condition, 200 individuals and 100 teams in Full team condition
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
We report power calculations for three between-treatment comparisons that we plan to make for two outcomes. The comparisons are as follows:
Comparison 1. Diversity vs. Homogeneity, 150 teams per treatment arm
Comparison 2. Control vs. Communication, 100 teams per treatment arm
Comparison 3. Communication vs. Full team, 100 teams per treatment arm

The outcomes used in these calculations are the realized volatility and Sharpe ratio of a team's Round 2 portfolio. We take standard values of α = 0.05 and 1 − β = 0.80 and calculate MDEs assuming a balanced two-sample test of the treatment coefficient. The basis of our power calculations is a team-level OLS specification with heteroskedasticity-robust standard errors, a treatment indicator, and a control for the team's mean Round 1 value of the corresponding outcome. As reported in the analysis plan, we plan to run additional specifications that may vary in precision.

Residual SDs from a version of this specification omitting the treatment indicator are 0.034 in annualized volatility units and 1.334 in annualized Sharpe ratio units. This allows us to calculate MDEs of 0.011 and 0.432 for each outcome in Comparison 1, respectively. For Comparisons 2 and 3, the MDEs are somewhat higher at 0.013 and 0.529.

For ease of interpretation, we note that an MDE of 0.011 for volatility corresponds to a 1.1 percentage-point difference in annualized portfolio volatility between arms. Meanwhile, the Sharpe MDE of 0.432 corresponds to a difference in annualized excess returns of approximately 6.9 percentage points evaluated at the median Round 2 portfolio volatility of 16% from the pilot.

As discussed elsewhere in the pre-registration, we expect treatment effects to work mostly through the risk channel rather than through the returns channel. For similar reasons, we anticipate enhanced precision in regressions of portfolio volatility on treatment indicators.
Our pilot provides a preliminary confirmation of this expectation: our team-level control for Round 1 performance explains 71% of the variance in volatility and 27% in Sharpe ratios. This is consistent with volatility being determined to a large degree by subjects' decisions about how to structure portfolios that persist across rounds, while the Sharpe ratio is more sensitive to asset-specific and market-driven noise. We pre-register further specifications in the analysis plan that we expect to improve precision. Power calculations are not reported for these specifications because these additional covariates would absorb too many degrees of freedom to informatively estimate the SD from our pilot data.
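The MDEs reported above are consistent with the usual normal-approximation formula for a balanced two-sample comparison, MDE = (z_{1-α/2} + z_{1-β}) × SD × sqrt(2 / n), with n teams per arm. The sketch below assumes that this is the formula used, which the registration does not state explicitly; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def mde(residual_sd, teams_per_arm, alpha=0.05, power=0.80):
    """Minimum detectable effect for a balanced two-sample test (normal approximation)."""
    z_total = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z_total * residual_sd * math.sqrt(2 / teams_per_arm)

# Residual SDs and arm sizes as reported above.
mde_volatility_150 = mde(0.034, 150)
mde_sharpe_150 = mde(1.334, 150)
mde_volatility_100 = mde(0.034, 100)
mde_sharpe_100 = mde(1.334, 100)
```

Multiplying the Sharpe MDE of 0.432 by the pilot's median volatility of 0.16 gives about 0.069, which is the 6.9 percentage-point difference in annualized excess returns quoted above.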
IRB

Institutional Review Boards (IRBs)

IRB Name
German Association for Experimental Economic Research e.V.
IRB Approval Date
2026-02-06
IRB Approval Number
gE1F4rmG
Analysis Plan

There is information in this trial unavailable to the public.