
Human Oversight and Aversion to AI Redistributive Decisions

Last registered on September 12, 2025

Pre-Trial

Trial Information

General Information

Title
Human Oversight and Aversion to AI Redistributive Decisions
RCT ID
AEARCTR-0016680
Initial registration date
September 08, 2025

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
September 12, 2025, 10:18 AM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Locations

Region

Primary Investigator

Affiliation
Alma Mater Studiorum - Università di Bologna

Other Primary Investigator(s)

Additional Trial Information

Status
In development
Start date
2025-09-21
End date
2026-05-18
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
In many situations, people make decisions that affect others, and these other-regarding choices are shaped by their fairness preferences. At the same time, artificial intelligence (AI) is increasingly integrated into high-stakes decision-making—public benefits allocation, hiring, healthcare, and military operations—making human oversight crucial and normatively required. This paper investigates whether (i) individuals are willing to accept an other-regarding decision made by someone else, and (ii) they are more or less willing to accept a decision made by an AI system rather than a human. Specifically, it examines whether individuals revise redistributive choices differently depending on whether they were made by a human or by AI, and seeks to disentangle two behavioral mechanisms: (i) the black-box effect, stemming from uncertainty about the AI’s decision-making process, and (ii) intrinsic AI aversion, reflecting a fundamental reluctance to rely on algorithmic judgment. To address these questions, the study combines a stylized theoretical framework of the revision of redistributive choices under incomplete information with an online experiment. The design comprises three stages: (i) workers earn money through a real-effort task; (ii) a spectator—either human or AI—makes a redistribution decision; and (iii) a reviewer evaluates and may pay to reveal information and revise this decision. Reviewers observe both AI- and human-made decisions in randomized order, allowing for within- and between-subject comparisons. The findings offer policy-relevant insights for oversight strategies and regulatory frameworks—such as the EU AI Act—by identifying behavioral barriers to effective AI integration.
External Link(s)

Registration Citation

Citation
Paoli, Damiano. 2025. "Human Oversight and Aversion to AI Redistributive Decisions." AEA RCT Registry. September 12. https://doi.org/10.1257/rct.16680-1.0
Experimental Details

Interventions

Intervention(s)
Participants evaluate redistributive allocations that determine other people’s payoffs. Each allocation is labeled as being made by either a human or an AI system. Participants can either accept the allocation or pay a small fee to obtain additional information and revise it. The sole intervention is the type of decision-maker (human vs. AI) whose choice is evaluated.
Intervention (Hidden)
Participants act as reviewers of redistributive allocations that were previously set by a spectator (either a human participant or an AI agent implemented via a GPT-based model). The intervention is the type of decision-maker whose allocation the reviewer observes (AI vs. human), holding everything else constant.
When deciding whether to intervene or not, the reviewer only observes the spectator’s final allocation for a pair of workers; they do not observe whether initial earnings were determined by Merit or Luck.
Reviewers choose to accept the allocation or intervene by paying a fixed fee ($0.50) to reveal the Merit/Luck criterion and freely revise the allocation that determines the workers’ payoffs.
Each reviewer completes two blocks (AI and Human), with four allocations per block (strategy method over {(6,0), (5,1), (4,2), (3,3)}) for a total of eight decisions. Block order (AI+Human vs Human+AI) is randomized at the individual level to control for order effects.
One of the eight decisions is randomly selected for payment; if the reviewer intervened in that decision, the fee is deducted, and the revised allocation is implemented for the workers.
Additionally, I elicit participants' beliefs about how the spectator (AI or Human) redistributes. Specifically, I collect an incentivized measure of the probability that the workers were in the Merit scenario, given the observed allocation and the spectator's type.
Intervention Start Date
2025-09-21
Intervention End Date
2025-10-21

Primary Outcomes

Primary Outcomes (end points)
The main outcome is the intervention rate.
Intervene(i,a): Outcome variable on the extensive margin. It is a dummy variable equal to 1 if reviewer i decides to pay the fee and intervene on allocation a, revealing the workers’ initial criterion (Luck or Merit), and 0 otherwise. I collect one observation for each allocation a in {(6,0), (5,1), (4,2), (3,3)} through the strategy method. Because each reviewer is exposed to both treatment conditions (human and AI), this yields eight observations per reviewer.
∆G(i,a): Outcome variable on the intensive margin. It measures the change in the Gini index before and after the reviewer’s intervention. It is equal to 0 if the reviewer decides not to intervene.
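As an illustration of the intensive-margin outcome, the sketch below computes a Gini index for a two-worker allocation and its change after a revision. The registration does not state which Gini formula or sign convention is used: the mean-absolute-difference form and the "after minus before" convention are assumptions, and the function names are hypothetical.

```python
from itertools import product

def gini(allocation):
    """Gini index via the mean absolute difference (assumed formula)."""
    n = len(allocation)
    total = sum(allocation)
    if total == 0:
        return 0.0
    abs_diffs = sum(abs(x - y) for x, y in product(allocation, repeat=2))
    return abs_diffs / (2 * n * total)

def delta_gini(before, after, intervened):
    """Change in Gini, after minus before (assumed sign); 0 if no intervention."""
    if not intervened:
        return 0.0
    return gini(after) - gini(before)
```

Under this formula, the four canonical allocations range from a Gini of 0.5 for (6,0) down to 0 for (3,3), so a reviewer who equalizes a (6,0) allocation produces a change of -0.5.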
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Probability of Merit scenario (pM): elicited probability that the workers are in the Merit scenario, conditional on observed allocation (a) and spectator’s type (tau): pM = P(M | a, tau ). Consequently, pL = 1 − pM is the elicited probability that workers are in the Luck scenario. I elicit eight probabilities per subject: (AI or Human) X {(6,0), (5,1), (4,2), (3,3)}
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
The experiment has four parts. First, workers earn money in a real-effort task. Second, spectators decide how to redistribute earnings within a randomly drawn worker pair. Third, reviewers observe a spectator’s decision and decide whether to intervene and modify the allocation. Fourth, workers are paid based on the final redistribution. The study focuses on reviewers’ decisions; workers and spectators create a consequential economic environment.

Workers
Workers are recruited on Prolific. After the task, they are randomly paired; in each pair, one worker receives an additional reward determined either by Merit (higher performance) or by Luck (random draw), with equal probability across pairs. They are informed that a third party (the spectator) will see the initial earnings and the criterion and may redistribute earnings within the pair.

Spectators
There are two spectator types: human (participants recruited on Prolific) and AI (ChatGPT-5). Both receive identical instructions and information. Each worker pair is evaluated by both spectator types. Spectators choose whether and how to redistribute the initial earnings. Each spectator makes one decision for a pair in the Merit condition and one for a pair in the Luck condition.

Reviewers
The same human participants return one week later as reviewers. They evaluate a spectator’s redistribution under incomplete information: they see only the final allocation and whether it was made by a human or an AI spectator (not whether initial earnings came from Merit or Luck). Reviewers can either accept the allocation or intervene by paying a small fee to reveal the criterion and then revise the allocation. Choices are elicited for a fixed set of canonical allocations in both spectator types, with randomized order. One decision is randomly selected for payment; participants remain anonymous and are never matched to their own earlier spectator decisions.

Belief Elicitation
Beliefs about the source of inequality (Merit vs Luck), conditional on the observed allocation and spectator type, are elicited with an incentivized measure.
Experimental Design Details
The experiment consists of four parts. In the first part, workers earn real money through a real-effort task. In the second part, spectators decide how to redistribute the earnings between a randomly drawn pair of workers. In the third part, reviewers observe the spectator's decision and determine whether they want to intervene and modify the earnings allocation. In the fourth part, workers receive payments based on the final redistribution determined by spectators and reviewers. This study primarily focuses on reviewers' decisions, while workers and spectators establish a real economic setting with tangible consequences.

Workers:
600 workers are recruited on Prolific. When recruited, workers are promised a participation fee of 0.50 USD, and they are told that they could earn additional money, depending on the actions they and others will take in the experiment. After completing a real-effort assignment, workers are randomly paired. In each pair, one worker receives an additional reward of 6 USD, while the other receives nothing. The assignment follows one of two possible criteria:
- Merit: The worker with the higher performance in the pair receives 6 USD.
- Luck: The worker who receives 6 USD is randomly selected.
The criterion for each pair is determined randomly with a 50/50 probability. Thus, half of the workers (150 pairs) have their initial earnings assigned based on performance, while the other half receive earnings based on luck. For clarity, I henceforth refer to the worker who receives 6 USD as Blue Worker, regardless of the adopted criterion—merit or luck. Workers are informed about the allocation mechanism, but do not know which criterion was used in their specific case. After completing the effort task, they are told that a third party—the spectator—will observe the initial distribution of earnings and the criterion (merit or luck) that determined the earnings. The spectator will then have the opportunity to redistribute the earnings between the two workers in the pair.

Spectators:
There are two types of spectators:
- Human spectators: 300 participants, recruited online via Prolific. They receive a fixed payment for participation and do not overlap with the workers' sample. Their choice has real consequences for the workers' payoffs and is therefore incentivized (this assumes that spectators care about the earnings of others; purely selfish spectators would never intervene).
- AI spectators: 300 artificial agents represented by ChatGPT-5, receiving as a prompt the same instructions that the human spectators see.
Each unique pair of workers is assigned to both a human spectator and an AI spectator. The spectators decide whether and how to redistribute the initial earnings. Spectators are fully informed about the effort task completed by workers, the criterion used to assign initial earnings (Merit or Luck), and the fact that workers were unaware that their performance would be observed for redistribution purposes. Each spectator completes the assignment for two pairs of workers: one in the Merit condition and one in the Luck condition. The order of these conditions is randomized to control for order effects. I elicit 1200 redistributions, but there are only 300 unique pairs of workers. Therefore, there is a 25% probability (300 out of 1200) that a redistribution choice made by a spectator is actually implemented and evaluated by a reviewer.

Reviewers:
The same 300 subjects who participated as human spectators are invited to a follow-up session one week later, where they act as reviewers. Their task is to evaluate and, if desired, revise a redistribution decision made by a spectator. Initially, reviewers have incomplete information: they only observe the final earnings of both workers after the spectator's redistribution, without knowing whether the initial allocation of earnings was determined by merit or luck. They are, however, informed about the nature of the spectator (human or AI) responsible for the decision. Reviewers have two possible choices:
- No intervention: They accept the current earnings allocation of workers.
- Intervention: They pay a small but non-negligible fee (0.50 USD) deducted from their own earnings to reveal the original payoff criterion (Merit or Luck). If they pay the fee, they are also allowed to modify the earnings allocation as they prefer.
The decision is elicited using the strategy method. Each reviewer observes all four allocations {(6,0), (5,1), (4,2), (3,3)} and, for each, decides whether to intervene and potentially redistribute the earnings. Reviewers complete this task for both a human spectator and an AI spectator. The sequence of these two treatment conditions is randomized to mitigate order effects. Hence, each reviewer faces the decision task eight times: (AI or Human) X {(6,0), (5,1), (4,2), (3,3)}. One of these choices is payoff-relevant for a pair of workers and for the reviewer herself, but reviewers do not know which one when they make their decisions. To prevent strategic behavior, all participants remain fully anonymous, and reviewers are never matched with decisions they previously made as spectators.
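The reviewer task described above can be sketched as follows. This is only an illustration of the eight-decision structure and the random selection of one decision for payment. The function names and the data layout are hypothetical; the allocations and the 0.50 USD fee come from the design.

```python
import random
from itertools import product

ALLOCATIONS = [(6, 0), (5, 1), (4, 2), (3, 3)]  # canonical allocations from the design
SPECTATOR_TYPES = ["AI", "Human"]
FEE_USD = 0.50  # intervention fee stated in the design

def decision_tasks(first_block):
    """Return the eight (spectator type, allocation) tasks, given block first."""
    order = [first_block] + [t for t in SPECTATOR_TYPES if t != first_block]
    return list(product(order, ALLOCATIONS))

def settle(choices, rng):
    """Pick one task at random for payment.

    `choices` maps each task to None (accept) or to a revised allocation.
    Returns (task, implemented allocation, fee charged to the reviewer).
    """
    task = rng.choice(sorted(choices))
    revised = choices[task]
    if revised is None:
        return task, task[1], 0.0
    return task, revised, FEE_USD
```

For example, a reviewer who accepts everything except the AI-made (6,0) allocation pays the fee only if that particular task is the one drawn for payment.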

Belief Elicitation:
I elicit reviewers' beliefs about the source of inequality, conditional on redistribution choices and the type of spectator. After completing their redistribution decisions as spectators, subjects are presented with a redistribution made by another spectator (human or AI) and must estimate the probability that the workers were in the Merit condition (pM) or the Luck condition (pL), given the spectator's type and the observed allocation.
Each subject evaluates eight different scenarios: (AI or Human) X {(6,0), (5,1), (4,2), (3,3)}. To incentivize truthful reporting, I use the binarized scoring rule. I do not provide explicit details about the scoring mechanism to minimize distortions in reported beliefs. Participants are simply informed that their best estimate maximizes their expected earnings, with further details on the payment rule available in a clickable link that opens a PDF containing a description of the scoring rule. I randomize the order of the spectator's type (AI or Human) that subjects will evaluate.
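The registration names the binarized scoring rule without giving its parameters. A minimal sketch of one common quadratic variant follows: the subject wins a fixed prize with probability 1 - (outcome - report)^2, which makes truthful reporting optimal regardless of risk attitudes. The prize amount and all function names are placeholders, not details taken from the study.

```python
import random

def win_probability(report, merit):
    """Chance of winning the prize under a quadratic binarized scoring rule.

    `report` is the stated probability of Merit in [0, 1];
    `merit` is True if the pair was actually in the Merit scenario.
    """
    outcome = 1.0 if merit else 0.0
    return 1.0 - (outcome - report) ** 2

def expected_win_probability(report, belief):
    """Expected chance of winning for a subject whose true belief is `belief`."""
    return (belief * win_probability(report, True)
            + (1 - belief) * win_probability(report, False))

def pay(report, merit, rng, prize_usd=1.0):  # prize amount is a placeholder
    """Pay the prize if a uniform draw falls below the win probability."""
    return prize_usd if rng.random() < win_probability(report, merit) else 0.0
```

Scanning reports on a grid for a subject who believes Merit has probability 0.7 shows that the expected chance of winning peaks at a report of 0.7, which is the property the participant instructions summarize.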
Randomization Method
Randomization is done through the experimental software (oTree). Half of the reviewers (150) will first see the redistributions made by AI, then those made by humans; the other half, the opposite order.
Randomization Unit
Randomization happens at the individual level. First, human spectators are randomized in the belief elicitation: half (150) will see AI first, half (150) human first. Then, when they participate as reviewers one week later, these two subgroups are equally randomized in the treatment order: half (75 + 75) will first see the redistributions made by AI, then those made by humans; the other half, the opposite order.
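The two-stage balanced assignment described above can be sketched in plain Python. This is not the study's oTree code: the function name, the seed, and the use of integer participant IDs are assumptions made for illustration.

```python
import random
from collections import Counter

def assign_orders(participant_ids, rng):
    """Balanced two-stage assignment.

    Stage 1 splits participants by belief-elicitation order (AI or Human first).
    Stage 2 splits each subgroup by reviewer block order (AI or Human first).
    Returns {participant_id: (belief_first, review_first)}.
    """
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    assignment = {}
    for belief_first, group in (("AI", ids[:half]), ("Human", ids[half:])):
        group = list(group)
        rng.shuffle(group)
        quarter = len(group) // 2
        for pid in group[:quarter]:
            assignment[pid] = (belief_first, "AI")
        for pid in group[quarter:]:
            assignment[pid] = (belief_first, "Human")
    return assignment

# Hypothetical seed; 300 participants as in the planned sample.
cells = Counter(assign_orders(range(300), random.Random(2025)).values())
```

With 300 participants, each of the four order combinations contains 75 people, so every treatment order holds 150 reviewers (75 + 75).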
Was the treatment clustered?
Yes

Experiment Characteristics

Sample size: planned number of clusters
300 individuals.
Sample size: planned number of observations
2400 observations = 300 individuals X 4 allocations X 2 treatment conditions.
Sample size (or number of clusters) by treatment arms
150 participants (600 observations) with AI first, 150 participants (600 observations) with human first (between-subjects).
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
I consider the more conservative between-subjects comparison. Assuming a sample size of 300 subjects (150 independent observations per treatment), a non-parametric test such as the Wilcoxon-Mann-Whitney test achieves a minimum detectable effect of 0.35 s.d. with a two-tailed test, alpha=0.05, and power=0.80, without any assumption on the parent distribution, i.e., using the minimum asymptotic relative efficiency (min A.R.E.).
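The stated figure can be approximately reproduced with a back-of-the-envelope calculation. The sketch below is an assumption about the method, not the registration's documented procedure: it uses the normal approximation for a two-sample comparison and deflates the per-group sample size by 0.864, the lower bound on the asymptotic relative efficiency of the Wilcoxon-Mann-Whitney test relative to the t-test. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mde_wmw(n_per_group, alpha=0.05, power=0.80, are=0.864):
    """Approximate minimum detectable effect (in s.d.) for a two-sided WMW test.

    Uses the normal approximation for a two-sample t-test and deflates the
    per-group sample size by the asymptotic relative efficiency `are`.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    effective_n = n_per_group * are
    return (z_alpha + z_power) * sqrt(2 / effective_n)
```

Calling `mde_wmw(150)` gives roughly 0.35 s.d., in line with the figure reported above. Dedicated power software may differ slightly because it uses the noncentral t distribution rather than the normal approximation.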
IRB

Institutional Review Boards (IRBs)

IRB Name
Comitato di Bioetica Alma Mater Studiorum - Università di Bologna
IRB Approval Date
2025-05-22
IRB Approval Number
0153983

Post-Trial

Post Trial Information

Study Withdrawal


Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials