Human Oversight and Aversion to AI Redistributive Decisions

Last registered on October 02, 2025


Pre-Trial

Trial Information

General Information

Title

Human Oversight and Aversion to AI Redistributive Decisions

RCT ID

AEARCTR-0016680

Initial registration date

September 08, 2025

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published

September 12, 2025, 10:18 AM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Last updated

October 02, 2025, 9:59 AM EDT

Last updated is the most recent time when changes to the trial's registration were published.

Locations

Country

Italy

Region

Primary Investigator

Name

Damiano Paoli

Affiliation

Alma Mater Studiorum - Università di Bologna


Other Primary Investigator(s)

Additional Trial Information

Status

In development

Start date

2025-10-02

End date

2026-05-18

Keywords

Behavior

Additional Keywords

Artificial Intelligence, Fairness, Redistribution

JEL code(s)

C91, D63, O33

Secondary IDs

Prior work

This trial does not extend or rely on any prior RCTs.

Abstract

In many situations, people make decisions that affect others, and these other-regarding choices are shaped by their fairness preferences. At the same time, artificial intelligence (AI) is increasingly integrated into high-stakes decision-making—public benefits allocation, hiring, healthcare, and military operations—making human oversight crucial and normatively required. This paper investigates whether (i) individuals are willing to accept an other-regarding decision made by someone else, and (ii) they are more or less willing to accept a decision made by an AI system rather than a human. Specifically, it examines whether individuals revise redistributive choices differently depending on whether they were made by a human or by AI, and seeks to disentangle two behavioral mechanisms: (i) the black-box effect, stemming from uncertainty about the AI’s decision-making process, and (ii) intrinsic AI aversion, reflecting a fundamental reluctance to rely on algorithmic judgment. To address these questions, the study combines a stylized theoretical framework of the revision of redistributive choices under incomplete information with an online experiment. The design comprises three stages: (i) workers earn money through a real-effort task; (ii) a spectator—either human or AI—makes a redistribution decision; and (iii) a reviewer evaluates and may pay to reveal information and revise this decision. Reviewers observe both AI- and human-made decisions in randomized order, allowing for within- and between-subject comparisons. The findings offer policy-relevant insights for oversight strategies and regulatory frameworks—such as the EU AI Act—by identifying behavioral barriers to effective AI integration.

External Link(s)

Registration Citation

Citation

Paoli, Damiano. 2025. "Human Oversight and Aversion to AI Redistributive Decisions." AEA RCT Registry. October 02. https://doi.org/10.1257/rct.16680-2.0

Sponsors & Partners

Experimental Details

Interventions

Intervention(s)

Participants evaluate redistributive allocations that determine other people’s payoffs. Each allocation is labeled as being made by either a human or an AI system. Participants can either accept the allocation or pay a small fee to obtain additional information and revise it. The sole intervention is the nature of the decision-maker (human vs. AI) whose choice is evaluated.

Intervention (Hidden)

Participants act as reviewers of redistributive allocations that were previously set by a spectator (either a human participant or an AI agent implemented via a GPT-based model). The intervention is the type of decision-maker whose allocation the reviewer observes (AI vs. human), holding everything else constant.
When deciding whether to intervene or not, the reviewer only observes the spectator’s final allocation for a pair of workers; they do not observe whether initial earnings were determined by Merit or Luck.
Reviewers choose to accept the allocation or intervene by paying a fixed fee ($0.50) to reveal the Merit/Luck criterion and freely revise the allocation that determines the workers’ payoffs.
Each reviewer completes two blocks (AI and Human), with four allocations per block (strategy method over {(6,0), (5,1), (4,2), (3,3)}) for a total of eight decisions. Block order (AI+Human vs Human+AI) is randomized at the individual level to control for order effects.
One of the eight decisions is randomly selected for payment; if the reviewer intervened in that decision, the fee is deducted, and the revised allocation is implemented for the workers.
Additionally, I elicit the beliefs of participants about what the spectator (AI or Human) would do when deciding how to redistribute. Specifically, I collect an incentivized measure of the probability that the workers were in the Merit scenario, given an observed allocation and spectator's type.

Intervention Start Date

2025-10-03

Intervention End Date

2025-10-17

Primary Outcomes

Primary Outcomes (end points)

The main outcome is the intervention rate.
Intervene(i,a): Outcome variable on the extensive margin. It is a dummy variable equal to 1 if reviewer i pays the fee and intervenes on allocation a, revealing the workers’ initial criterion (Luck or Merit), and 0 otherwise. I collect one observation for each allocation a in {(6,0), (5,1), (4,2), (3,3)} through the strategy method. Because each reviewer is exposed to both treatment conditions (human and AI), this yields eight observations per reviewer.
∆G(i,a): Outcome variable on the intensive margin. It measures the change in the Gini index before and after the reviewer’s intervention. It is equal to 0 if the reviewer decides not to intervene.
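The intensive-margin outcome can be illustrated with a minimal sketch. It assumes the standard mean-absolute-difference Gini index without a small-sample correction, and it assumes the sign convention "Gini after minus Gini before"; neither choice is stated in the registration, and the function names are hypothetical.

```python
from itertools import product

def gini(allocation):
    """Gini index as mean absolute difference divided by twice the mean."""
    n = len(allocation)
    total = sum(allocation)
    if total == 0:
        return 0.0
    mean_abs_diff = sum(abs(x - y) for x, y in product(allocation, repeat=2)) / (n * n)
    return mean_abs_diff / (2 * total / n)

def delta_gini(before, after=None):
    """Change in the Gini index; 0 when the reviewer does not intervene."""
    if after is None:  # no intervention, allocation unchanged
        return 0.0
    # Assumed sign convention: negative values mean the revision reduced inequality.
    return gini(after) - gini(before)

print(delta_gini((6, 0)))          # reviewer accepts the allocation
print(delta_gini((6, 0), (3, 3)))  # reviewer equalizes earnings
```

Under these assumptions the four canonical allocations (6,0), (5,1), (4,2), (3,3) have Gini values 1/2, 1/3, 1/6, and 0.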

Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)

Probability of Merit scenario (pM): elicited probability that the workers are in the Merit scenario, conditional on the observed allocation (a) and the spectator’s type (tau): pM = P(M | a, tau). Consequently, pL = 1 − pM is the elicited probability that workers are in the Luck scenario. I elicit eight probabilities per subject: (AI or Human) X {(6,0), (5,1), (4,2), (3,3)}.

Secondary Outcomes (explanation)

Experimental Design

The experiment has four parts. First, workers earn money in a real-effort task. Second, spectators decide how to redistribute earnings within a randomly drawn worker pair. Third, reviewers observe a spectator’s decision and decide whether to intervene and modify the allocation. Fourth, workers are paid based on the final redistribution. The study focuses on reviewers’ decisions; workers and spectators create a consequential economic environment.

Workers
Workers are recruited on Prolific. After the task, they are randomly paired; in each pair, one worker receives an additional reward determined either by Merit (higher performance) or by Luck (random draw), with equal probability across pairs. They are informed that a third party (the spectator) will see the initial earnings and the criterion and may redistribute earnings within the pair.

Spectators
There are two spectator types: human (participants recruited on Prolific) and AI (ChatGPT-4.1). Both receive identical instructions and information. Each worker pair is evaluated by both spectator types. Spectators choose whether and how to redistribute the initial earnings. Each spectator decides for one pair in the Merit condition and one pair in the Luck condition.

Reviewers
The same human participants return one week later as reviewers. They evaluate a spectator’s redistribution under incomplete information: they see only the final allocation and whether it was made by a human or an AI spectator (not whether initial earnings came from Merit or Luck). Reviewers can either accept the allocation or intervene by paying a small fee to reveal the criterion and then revise the allocation. Choices are elicited for a fixed set of canonical allocations for both spectator types, with randomized order. One decision is randomly selected for payment; participants remain anonymous and are never matched to their own earlier spectator decisions.

Belief Elicitation
Beliefs about the source of inequality (Merit vs Luck), conditional on the observed allocation and spectator type, are elicited with an incentivized measure.

Experimental Design Details

The experiment consists of four parts. In the first part, workers earn real money through a real-effort task. In the second part, spectators decide how to redistribute the earnings between a randomly drawn pair of workers. In the third part, reviewers observe the spectator's decision and determine whether they want to intervene and modify the earnings allocation. In the fourth part, workers receive payments based on the final redistribution determined by spectators and reviewers. This study primarily focuses on reviewers' decisions, while workers and spectators establish a real economic setting with tangible consequences.

Workers:
600 workers are recruited on Prolific. When recruited, workers are promised a participation fee of 0.50 USD, and they are told that they could earn additional money, depending on the actions they and others will take in the experiment. After completing a real-effort assignment, workers are randomly paired. In each pair, one worker receives an additional reward of 6 USD, while the other receives nothing. The assignment follows one of two possible criteria:
- Merit: The worker with the higher performance in the pair receives 6 USD.
- Luck: The worker who receives 6 USD is randomly selected.
The criterion for each pair is determined randomly with a 50/50 probability. Thus, half of the workers (150 pairs) have their initial earnings assigned based on performance, while the other half receive earnings based on luck. For clarity, I henceforth refer to the worker who receives 6 USD as Blue Worker, regardless of the adopted criterion—merit or luck. Workers are informed about the allocation mechanism, but do not know which criterion was used in their specific case. After completing the effort task, they are told that a third party—the spectator—will observe the initial distribution of earnings and the criterion (merit or luck) that determined the earnings. The spectator will then have the opportunity to redistribute the earnings between the two workers in the pair.
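The pairing and criterion assignment described above can be sketched as follows. This is an illustration only: the function name, the record layout, the balanced (exactly half Merit, half Luck) split, and the random tie-breaking for equal scores are assumptions, not details taken from the registration.

```python
import random

BONUS_USD = 6  # additional reward stated in the registration

def pair_and_assign(scores, seed=0):
    """scores: dict mapping worker id to task score. Returns one record per pair."""
    rng = random.Random(seed)
    workers = list(scores)
    rng.shuffle(workers)  # random pairing
    pairs = [(workers[i], workers[i + 1]) for i in range(0, len(workers) - 1, 2)]
    # Balanced split of criteria across pairs (assumed), in random order.
    criteria = ["Merit", "Luck"] * (len(pairs) // 2)
    criteria += [rng.choice(["Merit", "Luck"])] * (len(pairs) % 2)
    rng.shuffle(criteria)
    records = []
    for (a, b), criterion in zip(pairs, criteria):
        if criterion == "Merit" and scores[a] != scores[b]:
            blue = a if scores[a] > scores[b] else b  # higher performer gets the bonus
        else:
            # Luck criterion, or a performance tie (tie-breaking rule is assumed).
            blue = rng.choice((a, b))
        records.append({"pair": (a, b), "criterion": criterion,
                        "blue_worker": blue, "bonus_usd": BONUS_USD})
    return records
```

With 600 workers this produces 300 pairs, 150 under each criterion, matching the counts given above.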

Spectators:
There are two types of spectators:
- Human spectators: 300 participants, recruited online via Prolific. They receive a fixed payment for participation and do not overlap with the workers' sample. The choice they make has real consequences for the workers and is therefore incentive-compatible, under the assumption that spectators care about the earnings of others (purely selfish spectators would never intervene).
- AI spectators: 300 artificial agents represented by ChatGPT-4.1, receiving as a prompt the same instructions that the human spectators see.
Each unique pair of workers is assigned to both a human spectator and an AI spectator. The spectators decide whether and how to redistribute the initial earnings. Spectators are fully informed about the effort task completed by workers, the criterion used to assign initial earnings (Merit or Luck), and the fact that workers were unaware that their performance would be observed for redistribution purposes. Each spectator completes the assignment for two pairs of workers: one in the Merit condition and one in the Luck condition. The order of these conditions is randomized to control for order effects. I elicit 1200 redistributions (600 spectators, two pairs each), but there are only 300 unique pairs of workers. Therefore, there is a 25% probability (300/1200) that a redistribution choice made by the spectator is actually implemented and evaluated by a reviewer.

Reviewers:
The same 300 subjects who participated as human spectators are invited to a follow-up session one week later, where they act as reviewers. Their task is to evaluate and, if desired, revise a redistribution decision made by a spectator. Initially, reviewers have incomplete information: they only observe the final earnings of both workers after the spectator's redistribution, without knowing whether the initial allocation of earnings was determined by merit or luck. They are, however, informed about the nature of the spectator (human or AI) responsible for the decision. Reviewers have two possible choices:
- No intervention: They accept the current earnings allocation of workers.
- Intervention: They pay a small but non-negligible fee (0.50 USD) deducted from their own earnings to reveal the original payoff criterion (Merit or Luck). If they pay the fee, they are also allowed to modify the earnings allocation as they prefer.
The decision is elicited using the strategy method. Each reviewer observes all four allocations {(6,0), (5,1), (4,2), (3,3)} and, for each, decides whether to intervene and potentially redistribute the earnings or not. Reviewers complete this task for both a human spectator and an AI spectator. The sequence of these two treatment conditions is randomized to mitigate order effects. Hence, each reviewer faces the decision task eight times: (AI or Human) X {(6,0), (5,1), (4,2), (3,3)}. One of these choices is payoff-relevant for a pair of workers and for the reviewer herself, but reviewers do not know which one when they make their decisions. To prevent any strategic behavior, all participants remain fully anonymous, and reviewers are never matched with decisions they previously made as spectators.
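The reviewer's eight decisions and the payment rule can be sketched as below. The `Decision` record, the `settle` function, and the `reviewer_earnings` argument are hypothetical names; how the selected decision is matched to an actual spectator allocation is not specified here, and the sketch ignores that step.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple

FEE_USD = 0.50                                   # intervention fee from the registration
ALLOCATIONS = [(6, 0), (5, 1), (4, 2), (3, 3)]   # strategy-method allocations
SPECTATORS = ["AI", "Human"]                     # two blocks per reviewer

@dataclass
class Decision:
    spectator: str
    observed: Tuple[int, int]
    intervene: bool
    revised: Optional[Tuple[int, int]] = None    # set only if the reviewer revises

def settle(decisions, reviewer_earnings, rng):
    """Pick one decision at random and apply the fee and any revision."""
    chosen = rng.choice(decisions)
    if chosen.intervene:
        final = chosen.revised if chosen.revised is not None else chosen.observed
        return chosen, final, round(reviewer_earnings - FEE_USD, 2)
    return chosen, chosen.observed, reviewer_earnings

# Example: a reviewer who intervenes only on the most unequal allocation.
decisions = [
    Decision(s, a, intervene=(a == (6, 0)), revised=(3, 3) if a == (6, 0) else None)
    for s in SPECTATORS for a in ALLOCATIONS
]
chosen, final_allocation, payout = settle(decisions, 2.00, random.Random(0))
```

Because only one of the eight decisions is drawn, the fee is charged at most once per reviewer in this sketch.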

Belief Elicitation:
I elicit reviewers' beliefs about the source of inequality, conditional on redistribution choices and the type of spectator. After completing their redistribution decisions as spectators, subjects are presented with a redistribution made by another spectator (human or AI) and must estimate the probability that the workers were in the Merit condition (pM) or the Luck condition (pL), given the spectator's type and the observed allocation.
Each subject evaluates eight different scenarios: (AI or Human) X {(6,0), (5,1), (4,2), (3,3)}. To incentivize truthful reporting, I use the binarized scoring rule. I do not provide explicit details about the scoring mechanism to minimize distortions in reported beliefs. Participants are simply informed that their best estimate maximizes their expected earnings, with further details on the payment rule available in a clickable link that opens a PDF containing a description of the scoring rule. I randomize the order of the spectator's type (AI or Human) that subjects will evaluate.
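For readers unfamiliar with the binarized scoring rule, a minimal sketch follows. It assumes the common quadratic-loss version, in which the subject wins a fixed prize whenever a uniform random draw is at least the squared error of the report; the exact variant and prize used in this study are not stated here, and all function names are hypothetical.

```python
import random

def bsr_win_probability(report, merit_is_true):
    """Chance of winning the prize given the reported P(Merit) and the true state."""
    outcome = 1.0 if merit_is_true else 0.0
    quadratic_loss = (outcome - report) ** 2
    return 1.0 - quadratic_loss

def bsr_wins_prize(report, merit_is_true, rng):
    """Resolve the lottery: win if a uniform draw on [0, 1) is at least the loss."""
    quadratic_loss = 1.0 - bsr_win_probability(report, merit_is_true)
    return rng.random() >= quadratic_loss

def expected_win_probability(report, belief):
    """Expected chance of winning when the subject's true P(Merit) is `belief`."""
    return (belief * bsr_win_probability(report, True)
            + (1.0 - belief) * bsr_win_probability(report, False))

# Truthful reporting maximizes the chance of winning, whatever the risk attitude.
grid = [i / 100 for i in range(101)]
best_report = max(grid, key=lambda r: expected_win_probability(r, 0.7))
```

Because the reward is a lottery over a fixed prize rather than a variable amount, the incentive to report truthfully does not depend on risk preferences, which is the usual reason for choosing this rule.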

Randomization Method

Randomization is done through the experimental software (oTree). Half of the reviewers (150) will first see the redistributions made by AI, then those made by humans; the other half, the opposite order.

Randomization Unit

Randomization happens at the individual level. First, human spectators are randomized in the belief elicitation: half (150) will see AI first, half (150) human first. Then, when they participate as reviewers one week later, these two subgroups are equally randomized in the treatment order: half (75 + 75) will first see the redistributions made by AI, then those made by humans; the other half, the opposite order.
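The two-stage order randomization can be illustrated with a short sketch. It is not the oTree implementation used in the study; the function names and the cell labels are hypothetical, and the sketch simply assumes a balanced assignment of 300 participants to the four (belief order, review order) cells.

```python
import random
from collections import Counter

ORDERS = ("AI first", "Human first")

def assign_orders(participant_ids, seed=0):
    """Assign each participant a (belief order, review order) cell, balanced across cells."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)  # random order, then deal participants into the four cells
    cells = [(belief, review) for belief in ORDERS for review in ORDERS]
    return {pid: cells[position % len(cells)] for position, pid in enumerate(ids)}

def cell_sizes(assignment):
    """Count participants per (belief order, review order) cell."""
    return Counter(assignment.values())

assignment = assign_orders(range(300))
sizes = cell_sizes(assignment)  # four cells of 75 participants each
```

Summing over cells recovers the 150/150 split in each stage: 75 + 75 participants see AI first and 75 + 75 see human first.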

Was the treatment clustered?

Yes

Experiment Characteristics

Sample size: planned number of clusters

300 individuals. The sample is representative of the U.S. population in terms of sex, age, and political affiliation, as determined by Prolific's representative sample study distribution.

Sample size: planned number of observations

2400 observations = 300 individuals X 4 allocations X 2 treatment conditions.

Sample size (or number of clusters) by treatment arms

150 participants (600 observations) with AI first, 150 participants (600 observations) with human first (between-subjects).

Minimum detectable effect size for main outcomes (accounting for sample design and clustering)

I consider the more conservative between-subjects comparison. Assuming a sample size of 300 subjects (150 independent observations per treatment), a non-parametric test such as the Wilcoxon-Mann-Whitney test achieves a minimum detectable effect of 0.35 s.d. with a two-tailed test, alpha=0.05, power=0.80, and no assumption on the parent distribution (using the minimum asymptotic relative efficiency, min A.R.E.).
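The 0.35 s.d. figure can be approximately reproduced with a back-of-the-envelope calculation. The sketch below assumes a normal approximation to the two-sample test and deflates the per-group sample size by 0.864, the known lower bound on the asymptotic relative efficiency of the Wilcoxon-Mann-Whitney test relative to the t-test. The registration does not say which software or formula was used, so this is a plausibility check rather than the original computation, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

MIN_ARE_WMW = 0.864  # lower bound on the A.R.E. of Wilcoxon-Mann-Whitney vs. the t-test

def mde_two_sample(n_per_group, alpha=0.05, power=0.80, are=1.0):
    """Minimum detectable effect in s.d. units (two-sided test, normal approximation)."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    effective_n = n_per_group * are  # efficiency loss shrinks the effective sample
    return (z_alpha + z_power) * sqrt(2 / effective_n)

mde_t_test = mde_two_sample(150)                # about 0.32 s.d.
mde_wmw = mde_two_sample(150, are=MIN_ARE_WMW)  # about 0.35 s.d.
```

An exact calculation based on the noncentral t distribution would give slightly larger values than this approximation.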

Supporting Documents and Materials

There is information in this trial unavailable to the public.

IRB