Evaluating Written Arguments in Group Decision-Making: How Payoff Relevance and Decision-Maker Type Shape Argument Characteristics

Last registered on November 19, 2025

Pre-Trial

Trial Information

General Information

Title
Evaluating Written Arguments in Group Decision-Making: How Payoff Relevance and Decision-Maker Type Shape Argument Characteristics
RCT ID
AEARCTR-0017247
Initial registration date
November 14, 2025

First published
November 19, 2025, 1:41 PM EST

Locations

Region

Primary Investigator

Affiliation
Otto-von-Guericke-Universität (OVGU) Magdeburg

Other Primary Investigator(s)

Additional Trial Information

Status
Ongoing
Start date
2025-11-14
End date
2025-11-28
Secondary IDs
Prior work
This trial is based on or builds upon one or more prior RCTs.
Abstract
This study analyzes written arguments that participants submitted in a previous experiment on group decision-making. In the original study, participants were members of groups in which a selection mechanism chose one option from several alternatives in each of 30 rounds. For each topic, participants individually ranked three options by preference (most preferred, second most preferred, third most preferred) and were invited to provide a written argument explaining why their most preferred option should be chosen for the group. Payments depended on whether the chosen option aligned with a participant's preference, with higher payoffs for alignment.
Three decision-making systems were tested: one in which a single person (dictator) made the final decision after reading all arguments and seeing all preferences; one in which decisions were made by a voting rule (Borda count), with an observer who read the arguments and saw the preferences; and one in which an AI (ChatGPT) made decisions based on the arguments and preferences, again with an observer who read the arguments and saw the preferences.
In this follow-up study, we collect independent evaluations of these written arguments. Each argument is rated by 3-4 evaluators who are incentivized for their assessments through additional payment if their rating falls within the median ±1 of all evaluations for that argument on each dimension. Evaluators rate each argument on four characteristics: how emotional it is, how convincing it is, how self-oriented it is (focusing on personal experiences or preferences rather than group or societal ones), and how vivid the language is (using descriptive language and metaphors).
Understanding how people craft arguments under different decision-making systems is important because the perceived relevance of one's voice may influence communication strategies. By analyzing whether arguments differ systematically when they directly influence outcomes versus when they are merely observed, and when the decision-maker is human versus AI, we can gain insights into how institutional design shapes voice expression. These insights will help organizations and policymakers design participatory decision-making processes that effectively incorporate individual voice in group decision contexts.

Registration Citation

Citation
Bechdolf, Mathilde Lea Editha. 2025. "Evaluating Written Arguments in Group Decision-Making: How Payoff Relevance and Decision-Maker Type Shape Argument Characteristics." AEA RCT Registry. November 19. https://doi.org/10.1257/rct.17247-1.0
Experimental Details

Interventions

Intervention(s)
Participants evaluate written arguments from a previous group decision-making experiment. Each argument is independently rated by 3-4 evaluators on four dimensions: emotionality, convincingness, self-orientation, and vividness of language. Evaluators are incentivized through additional payment if their ratings fall within the median ±1 of all ratings for that argument on each dimension.
Intervention (Hidden)
Evaluators are recruited online and randomly assigned to rate subsets of written arguments collected from a previous laboratory experiment. The arguments were produced by participants across three decision-making systems (Dictator, Borda Count, AI) during a group policy selection task. Evaluators are blind to the decision system and treatment conditions under which the arguments were generated.

Each argument is evaluated by 3-4 independent raters using four 7-point Likert scales:
1. Emotionality: "The argument is emotional" (1 = not at all emotional, 7 = very emotional)
2. Convincingness: "The argument is convincing" (1 = not at all convincing, 7 = very convincing)
3. Self-orientation: "The argument is self-oriented (focuses on personal experiences or preferences rather than group or societal ones)" (1 = not at all self-oriented, 7 = very self-oriented)
4. Vividness: "The argument uses vivid language with adjectives and metaphors" (1 = not at all vivid, 7 = very vivid)

Evaluators receive a base payment plus a bonus payment for each dimension where their rating falls within the median ±1 of all ratings for that argument. This incentive mechanism encourages careful and consensus-oriented evaluation. After completing all evaluations, demographic information is collected.
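
A minimal sketch of this bonus rule, assuming integer ratings on the 1-7 scales; the bonus amount and the inclusion of the rater's own score in the median are illustrative assumptions, not parameters taken from the study:

from statistics import median

def bonus_for_rating(all_ratings, own_rating, bonus=0.10):
    # Pay the (placeholder) bonus if the rater's score lies within +/- 1
    # of the median of all ratings for this argument on this dimension.
    return bonus if abs(own_rating - median(all_ratings)) <= 1 else 0.0

# Example: four raters scored one argument's emotionality as 3, 4, 4, 6 (median 4).
print(bonus_for_rating([3, 4, 4, 6], own_rating=6))  # 0.0 - more than 1 point from the median
print(bonus_for_rating([3, 4, 4, 6], own_rating=3))  # 0.1 - within 1 point of the median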
Intervention Start Date
2025-11-14
Intervention End Date
2025-11-21

Primary Outcomes

Primary Outcomes (end points)
1. Emotionality ratings of arguments (7-point Likert scale)
2. Convincingness ratings of arguments (7-point Likert scale)
3. Self-orientation ratings of arguments (7-point Likert scale)
4. Vividness ratings of arguments (7-point Likert scale)
Primary Outcomes (explanation)
For each argument, the mean rating across all evaluators (3-4 raters per argument) will be calculated for each of the four dimensions. These mean ratings will be analyzed by decision system (Dictator, Borda, AI) and decision-maker type (Human vs. AI) to test whether argument characteristics differ systematically across treatments. Inter-rater reliability will be assessed using intraclass correlation coefficients (ICC).
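
A rough sketch of this aggregation and of a one-way random-effects ICC(1), assuming a long-format data frame with one row per (argument, rater) pair; the column names are hypothetical and the analysis may use a different ICC variant:

import pandas as pd

def argument_means(df, dim):
    # Mean rating per argument on one dimension (3-4 raters per argument).
    return df.groupby('argument_id')[dim].mean()

def icc1(df, dim):
    # One-way random-effects ICC(1) from a one-way ANOVA,
    # allowing unequal numbers of raters per argument.
    groups = df.groupby('argument_id')[dim]
    n_i = groups.size()                        # raters per argument
    N, g = n_i.sum(), len(n_i)
    grand_mean = df[dim].mean()
    ss_between = (n_i * (groups.mean() - grand_mean) ** 2).sum()
    ss_within = ((df[dim] - groups.transform('mean')) ** 2).sum()
    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (N - g)
    k0 = (N - (n_i ** 2).sum() / N) / (g - 1)  # adjusted raters-per-argument
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)

# e.g. icc1(ratings_long, 'emotionality')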

Secondary Outcomes

Secondary Outcomes (end points)
1. Inter-rater agreement/reliability measures (ICC)
2. Evaluator characteristics (demographics)
3. Relationship between argument length and ratings
Secondary Outcomes (explanation)
Inter-rater reliability metrics will assess the consistency of evaluations across raters using intraclass correlation coefficients (ICC). We will examine whether certain types of arguments (e.g., from specific decision systems) show higher or lower inter-rater agreement. Evaluator demographic characteristics will be collected to describe the sample of raters. The relationship between argument characteristics such as word count and ratings will be explored as a robustness check.
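
A small sketch of the planned length robustness check, assuming one row per argument with its text and mean ratings in a pandas data frame; the column names are hypothetical:

from scipy import stats

def length_robustness(df_args, dims=('emotionality', 'convincingness',
                                     'self_orientation', 'vividness')):
    # Spearman correlation between argument word count and mean ratings,
    # expecting columns 'argument_text' and 'mean_<dim>' (hypothetical names).
    words = df_args['argument_text'].str.split().str.len()
    return {dim: stats.spearmanr(words, df_args[f'mean_{dim}']) for dim in dims}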

Experimental Design

Experimental Design
Independent evaluators rate written arguments from a previous group decision-making experiment. Each argument is rated by 3-4 evaluators on four dimensions using 7-point Likert scales. Evaluators are incentivized for ratings that fall within the median ±1 of all ratings for each argument-dimension combination.
Experimental Design Details
Evaluators are recruited online and presented with arguments in randomized order to prevent order effects. Each evaluator rates a subset of arguments to ensure each argument receives 3-4 independent ratings while keeping the task manageable for individual evaluators.
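
One possible way to construct such an assignment (not necessarily the study's actual procedure); the 80-arguments-per-evaluator figure follows the registration, while the seed is arbitrary:

import random

def assign_arguments(argument_ids, n_evaluators, per_evaluator=80, seed=0):
    # Deal shuffled copies of the argument list into evaluator-sized chunks so
    # that rating counts per argument differ by at most one (e.g. 3 vs 4) and
    # each evaluator sees the arguments in a random order. With far more
    # arguments than slots per evaluator, within-evaluator duplicates are rare;
    # a production implementation would also enforce uniqueness explicitly.
    rng = random.Random(seed)
    slots = n_evaluators * per_evaluator
    pool = []
    while len(pool) < slots:
        batch = list(argument_ids)
        rng.shuffle(batch)
        pool.extend(batch)
    pool = pool[:slots]
    assignment = [pool[i * per_evaluator:(i + 1) * per_evaluator]
                  for i in range(n_evaluators)]
    for subset in assignment:
        rng.shuffle(subset)   # randomized presentation order per evaluator
    return assignment

# e.g. assign_arguments(range(5627), n_evaluators=212) yields 212 lists of 80 IDs.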

The arguments being evaluated were generated in a previous laboratory experiment with a 3x2 design:
- Three decision systems: Dictator (a human makes the decision after seeing arguments and preferences), Borda Count (a voting rule determines the decision, with an observer who sees the arguments; a minimal Borda tally is sketched after this list), and AI (ChatGPT makes the decision based on arguments and preferences, with an observer who sees the arguments)
- Two communication conditions: with and without voice (opportunity to write arguments)
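
A minimal Borda tally for three ranked options, illustrating the voting rule named above; the point scheme and tie handling here are illustrative assumptions, not necessarily those of the original experiment:

def borda_winner(rankings, points=(2, 1, 0)):
    # `rankings` is a list of per-member preference orders, most preferred first.
    scores = {}
    for order in rankings:
        for option, pts in zip(order, points):
            scores[option] = scores.get(option, 0) + pts
    # max() breaks ties arbitrarily; the original experiment's tie rule is not stated here.
    return max(scores, key=scores.get), scores

# Example with three members ranking options A, B, C:
winner, tally = borda_winner([['A', 'B', 'C'], ['B', 'A', 'C'], ['B', 'C', 'A']])
# tally == {'A': 3, 'B': 5, 'C': 1}; winner == 'B'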

Arguments analyzed in this study come from rounds where participants had the opportunity to write arguments. Only valid arguments with more than 4 characters were included in the evaluation. Evaluators are blind to:
- Which decision system generated each argument
- Whether the argument writer's preference was selected
- Any outcome information from the original experiment

After completing all argument evaluations, evaluators complete demographic questions. Payment is calculated as base payment plus bonuses for ratings within the median ±1 range.
Randomization Method
Each argument is randomly assigned to 3-4 evaluators, and the order in which arguments are presented is randomized independently for each evaluator.
Randomization Unit
Individual arguments (for assignment to evaluators) and individual evaluators (for the presentation order of arguments); the individual argument is the unit of analysis.
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
Not applicable - no clustering design
Sample size: planned number of observations
5,627 arguments evaluated by approximately 212 evaluators, resulting in 16,960 total evaluations (each argument rated 3-4 times, each evaluator rates 80 arguments).
Sample size (or number of clusters) by treatment arms
Dictator: 1,934 arguments
Borda: 1,929 arguments
AI: 1,764 arguments

Each argument is evaluated 3-4 times by independent raters, with each rater evaluating 80 arguments. Total of approximately 212 evaluators providing 16,960 evaluations.
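
A quick consistency check of these planned numbers:

arguments = 1934 + 1929 + 1764     # 5,627 arguments in total
evaluations = 212 * 80             # 16,960 ratings from 212 evaluators x 80 each
print(arguments, evaluations, round(evaluations / arguments, 2))  # 5627 16960 3.01 -> 3-4 ratings per argument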
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
With approximately 1,900 arguments per decision system, we have high statistical power to detect small to medium effect sizes in comparisons between decision systems.
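
A rough back-of-the-envelope version of this claim, assuming independent arguments (ignoring that each participant wrote multiple arguments), a two-sided two-sample t-test, 5% significance, and 80% power; none of these settings are taken from the registration:

from statsmodels.stats.power import TTestIndPower

# Minimum detectable standardized difference (Cohen's d) for a pairwise
# comparison of two decision systems with roughly 1,900 arguments each.
mde = TTestIndPower().solve_power(effect_size=None, nobs1=1900, ratio=1.0,
                                  alpha=0.05, power=0.80, alternative='two-sided')
print(round(mde, 3))  # about 0.09 standard deviations under these assumptions

Because multiple arguments come from the same writer and session, the effective minimum detectable effect will be somewhat larger once that dependence is taken into account.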
IRB

Institutional Review Boards (IRBs)

IRB Name
Gesellschaft für experimentelle Wirtschaftsforschung (GfeW)
IRB Approval Date
2024-08-15
IRB Approval Number
oFCWnQCM

Post-Trial

Post Trial Information

Study Withdrawal

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials