Human Oversight and Aversion to AI Redistributive Decisions

Last registered on October 03, 2025

Pre-Trial

Trial Information

General Information

Title
Human Oversight and Aversion to AI Redistributive Decisions
RCT ID
AEARCTR-0016680
Initial registration date
September 08, 2025

First published
September 12, 2025, 10:18 AM EDT

Last updated
October 03, 2025, 1:04 PM EDT

Locations

Not available

Primary Investigator

Damiano Paoli
Affiliation
Alma Mater Studiorum - Università di Bologna

Other Primary Investigator(s)

Additional Trial Information

Status
In development
Start date
2025-10-02
End date
2026-05-18
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
In many situations, people make decisions that affect others, and these other-regarding choices are shaped by their fairness preferences. At the same time, artificial intelligence (AI) is increasingly integrated into high-stakes decision-making—public benefits allocation, hiring, healthcare, and military operations—making human oversight crucial and normatively required. This paper investigates (i) whether individuals are willing to accept an other-regarding decision made by someone else, and (ii) whether they are more or less willing to accept a decision made by an AI system than by a human. Specifically, it examines whether individuals revise redistributive choices differently depending on whether those choices were made by a human or by an AI, and it seeks to disentangle two behavioral mechanisms: (i) the black-box effect, stemming from uncertainty about the AI’s decision-making process, and (ii) intrinsic AI aversion, reflecting a fundamental reluctance to rely on algorithmic judgment. To address these questions, the study combines a stylized theoretical framework of the revision of redistributive choices under incomplete information with an online experiment. The design comprises three stages: (i) workers earn money through a real-effort task; (ii) a spectator—either human or AI—makes a redistribution decision; and (iii) a reviewer evaluates this decision and may pay to reveal information and revise it. Reviewers observe both AI- and human-made decisions in randomized order, allowing for within- and between-subject comparisons. The findings offer policy-relevant insights for oversight strategies and regulatory frameworks—such as the EU AI Act—by identifying behavioral barriers to effective AI integration.
External Link(s)

Registration Citation

Citation
Paoli, Damiano. 2025. "Human Oversight and Aversion to AI Redistributive Decisions." AEA RCT Registry. October 03. https://doi.org/10.1257/rct.16680-3.0
Experimental Details

Interventions

Intervention(s)
Participants evaluate redistributive allocations that determine other people’s payoffs. Each allocation is labeled as being made by either a human or an AI system. Participants can either accept the allocation or pay a small fee to obtain additional information and revise it. The sole intervention is the nature of the decision-maker (human vs. AI) whose choice is evaluated.
Intervention Start Date
2025-10-03
Intervention End Date
2025-10-17

Primary Outcomes

Primary Outcomes (end points)
The main outcome is the intervention rate.
Intervene(i,a): Outcome variable on the extensive margin. A dummy variable equal to 1 if reviewer i pays the fee and intervenes, revealing the workers’ initial criterion (Luck or Merit), and 0 otherwise. One observation is collected for each observed allocation a ∈ {(6,0), (5,1), (4,2), (3,3)} through the strategy method; since each reviewer is exposed to both treatment conditions (human and AI), there are eight observed allocations per reviewer.
∆G(i,a): Outcome variable on the intensive margin. It measures the change in the Gini index between the allocation before and after reviewer i’s intervention, and is equal to 0 if the reviewer decides not to intervene.
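
For concreteness, here is a minimal sketch of how ∆G(i,a) can be computed for a two-worker pair, assuming the standard Gini formula for n = 2; the function names and the example revision are illustrative, not the study’s actual code.

```python
def gini_pair(x1, x2):
    """Gini index of a two-person allocation: |x1 - x2| / (2 * (x1 + x2))."""
    return abs(x1 - x2) / (2 * (x1 + x2))

def delta_g(before, after, intervened):
    """Change in the Gini index; 0 by construction if the reviewer does not intervene."""
    if not intervened:
        return 0.0
    return gini_pair(*after) - gini_pair(*before)

# Hypothetical example: the reviewer reveals the criterion and revises (5, 1) to (3, 3).
print(delta_g((5, 1), (3, 3), intervened=True))  # -> -0.333..., inequality reduced
```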
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Probability of the Merit scenario (pM): the elicited probability that the workers are in the Merit scenario, conditional on the observed allocation a and the spectator’s type τ: pM = P(M | a, τ). Consequently, pL = 1 − pM is the elicited probability that the workers are in the Luck scenario. I elicit eight probabilities per subject: {AI, Human} × {(6,0), (5,1), (4,2), (3,3)}.
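
As a reference point, the Bayesian benchmark for pM follows from Bayes’ rule with the design’s 50/50 prior over Merit and Luck. The sketch below assumes that benchmark; the likelihood values are placeholders to be replaced by spectators’ observed choice frequencies, not quantities from the study.

```python
def posterior_merit(lik_merit, lik_luck, prior_merit=0.5):
    """P(M | a, tau) = P(a | M, tau) P(M) / [P(a | M, tau) P(M) + P(a | L, tau) (1 - P(M))]."""
    num = lik_merit * prior_merit
    return num / (num + lik_luck * (1 - prior_merit))

# Hypothetical likelihoods: if a spectator chooses the observed allocation twice
# as often under Luck as under Merit, the Bayesian posterior on Merit is 1/3.
print(posterior_merit(lik_merit=0.2, lik_luck=0.4))  # -> 0.333...
```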
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
The experiment has four parts. First, workers earn money in a real-effort task. Second, spectators decide how to redistribute earnings within a randomly drawn worker pair. Third, reviewers observe a spectator’s decision and decide whether to intervene and modify the allocation. Fourth, workers are paid based on the final redistribution. The study focuses on reviewers’ decisions; workers and spectators create a consequential economic environment.

Workers
Workers are recruited on Prolific. After the task, they are randomly paired; in each pair, one worker receives an additional reward determined either by Merit (higher performance) or by Luck (random draw), with equal probability across pairs. They are informed that a third party (the spectator) will see the initial earnings and the criterion and may redistribute earnings within the pair.

Spectators
There are two spectator types: human (participants recruited on Prolific) and AI (ChatGPT-4.1). Both receive identical instructions and information. Each worker pair is evaluated by both spectator types. Spectators choose whether and how to redistribute the initial earnings; each spectator makes one decision for a pair in the Merit condition and one for a pair in the Luck condition.
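
Purely as an illustration, an AI spectator of this kind could be queried through the OpenAI Python client roughly as follows; the prompt wording, model parameters, and output format are hypothetical, not the study’s actual instructions.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical prompt: in the study, the AI spectator receives the same
# instructions and information as the human spectators.
prompt = (
    "Two workers completed a real-effort task. Worker A has initial earnings of 6 "
    "and Worker B of 0, assigned by Merit (higher performance). You may redistribute "
    "the total of 6 between them. Reply with the final allocation as 'A,B'."
)

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # e.g. "4,2"
```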

Reviewers
The same human participants return one week later as reviewers. They evaluate a spectator’s redistribution under incomplete information: they see only the final allocation and whether it was made by a human or an AI spectator (not whether initial earnings came from Merit or Luck). Reviewers can either accept the allocation or intervene by paying a small fee to reveal the criterion and then revise the allocation. Choices are elicited for a fixed set of canonical allocations under both spectator types, in randomized order. One decision is randomly selected for payment; participants remain anonymous and are never matched to their own earlier spectator decisions.

Belief Elicitation
Beliefs about the source of inequality (Merit vs Luck), conditional on the observed allocation and spectator type, are elicited with an incentivized measure.
Experimental Design Details
Not available
Randomization Method
Randomization is done through the experimental software (oTree). Half of the reviewers (150) will first see the redistributions made by AI, then those made by humans; the other half, the opposite order.
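
In oTree, this kind of order randomization is typically implemented in creating_session; the sketch below assumes that pattern, and the variable names are illustrative rather than the study’s actual code.

```python
import itertools

def creating_session(subsession):
    # Alternate treatment order across reviewers: half see AI-made
    # redistributions first, half see human-made ones first.
    orders = itertools.cycle(["AI_first", "Human_first"])
    for player in subsession.get_players():
        player.participant.vars["treatment_order"] = next(orders)
```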
Randomization Unit
Randomization happens at the individual level. First, human spectators are randomized in the belief elicitation: half (150) see AI first and half (150) see human first. Then, when they return as reviewers one week later, each of these two subgroups is randomized evenly over treatment order: half (75 + 75) first see the redistributions made by AI, then those made by humans; the other half, the opposite order.
Was the treatment clustered?
Yes

Experiment Characteristics

Sample size: planned number of clusters
300 individuals. The sample is representative of the U.S. population in terms of sex, age, and political affiliation, as determined by Prolific’s representative-sample distribution.
Sample size: planned number of observations
2,400 observations = 300 individuals × 4 allocations × 2 treatment conditions.
Sample size (or number of clusters) by treatment arms
150 participants see AI-made decisions first and 150 see human-made decisions first (between-subjects); each arm contributes 600 observations in its first treatment block (150 participants × 4 allocations).
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
I consider the more conservative between-subjects comparison. Assuming a sample size of 300 subjects (150 independent observations per treatment), a non-parametric test such as the Wilcoxon-Mann-Whitney test achieves a minimum detectable effect of 0.35 s.d. with a two-tailed test, alpha = 0.05, power = 0.80, and no assumption on the parent distribution (worst-case asymptotic relative efficiency, A.R.E.).
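
This MDE can be reproduced by sizing the Wilcoxon-Mann-Whitney test through the t-test and deflating the per-arm sample by the worst-case A.R.E. of 0.864; the sketch below assumes that standard approach.

```python
from statsmodels.stats.power import TTestIndPower

n_per_arm = 150
are_min = 0.864  # worst-case A.R.E. of Wilcoxon-Mann-Whitney relative to the t-test

mde = TTestIndPower().solve_power(
    nobs1=n_per_arm * are_min,  # efficiency-adjusted sample size per arm
    alpha=0.05,
    power=0.80,
    ratio=1.0,
    alternative="two-sided",
)
print(round(mde, 2))  # -> 0.35 (standard deviations)
```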
Supporting Documents and Materials

Not available
IRB

Institutional Review Boards (IRBs)

IRB Name
Comitato di Bioetica Alma Mater Studiorum - Università di Bologna
IRB Approval Date
2025-05-22
IRB Approval Number
0153983
Analysis Plan

Not available