AI, Belief Calibration, and Human Decision-Making: Evidence from a Large-Scale Field Experiment with Taxi Drivers

Last registered on September 15, 2025

Pre-Trial

Trial Information

General Information

Title
AI, Belief Calibration, and Human Decision-Making: Evidence from a Large-Scale Field Experiment with Taxi Drivers
RCT ID
AEARCTR-0016775
Initial registration date
September 13, 2025

First published
September 15, 2025, 9:48 AM EDT

Locations

There is information in this trial unavailable to the public.

Primary Investigator

Affiliation
The University of Hong Kong

Other Primary Investigator(s)

PI Affiliation
The University of Hong Kong

Additional Trial Information

Status
In development
Start date
2025-09-15
End date
2026-02-28
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
This study investigates how artificial intelligence (AI) and behavioral interventions can improve decision-making and productivity among gig workers. We partner with a leading mobility platform to conduct a large-scale randomized controlled trial (RCT) with 8,000 taxi drivers in a major city in China. The experiment tests two complementary channels: (1) ability improvement through AI-generated real-time demand predictions and (2) belief calibration through personalized feedback that corrects misperceptions about earnings across different strategies. We examine whether these interventions reduce inefficient behaviors—such as prolonged queuing at transport hubs despite lower earnings—and improve drivers’ income. The study contributes to understanding human-AI complementarity in labor markets and informs the design of technology systems that augment rather than replace human judgment.
External Link(s)

Registration Citation

Citation
He, Guojun and Qinrui Xiahou. 2025. "AI, Belief Calibration, and Human Decision-Making: Evidence from a Large-Scale Field Experiment with Taxi Drivers." AEA RCT Registry. September 15. https://doi.org/10.1257/rct.16775-1.0
Experimental Details

Interventions

Intervention(s)
The study involves a series of in-app interventions delivered through a widely used taxi driver application in a major city in China. These interventions are designed to improve driver outcomes by providing real-time information and personalized feedback on earnings. There is also a control group that receives no new features.
Intervention Start Date
2025-10-01
Intervention End Date
2025-11-30

Primary Outcomes

Primary Outcomes (end points)
1. Average hourly income (net of fees).
2. Time allocation (fraction of hours spent at transport hubs vs. city cruising).
3. Average queue duration at major transport hubs.
Primary Outcomes (explanation)
1. Average hourly income: Calculated from the high-frequency app log data on fares and total logged working hours for each driver.
2. Time allocation: Calculated from high-frequency GPS data to determine time spent in designated airport/hub queuing zones versus time spent actively cruising in the city.
3. Average queue duration: Calculated from the duration of time a driver is stationary within a designated queuing zone.
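
A minimal sketch of how the three primary outcomes could be computed for one driver, assuming the app logs have already been reduced to net fares and to GPS-derived time segments. The `Segment` record, its field names, and the rule that each stationary in-zone segment counts as one queue spell are illustrative assumptions, not the platform's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One contiguous block of logged working time (hypothetical schema)."""
    hours: float        # length of the segment in hours
    in_hub_zone: bool   # GPS places the driver inside a designated queuing zone
    stationary: bool    # vehicle is not moving during the segment


def hourly_income(net_fares, segments):
    """Total fares net of fees divided by total logged working hours."""
    total_hours = sum(s.hours for s in segments)
    return sum(net_fares) / total_hours


def hub_time_share(segments):
    """Fraction of logged hours spent inside hub queuing zones."""
    total_hours = sum(s.hours for s in segments)
    hub_hours = sum(s.hours for s in segments if s.in_hub_zone)
    return hub_hours / total_hours


def mean_queue_duration(segments):
    """Mean length of stationary in-zone segments; None if the driver never queued."""
    spells = [s.hours for s in segments if s.in_hub_zone and s.stationary]
    return sum(spells) / len(spells) if spells else None


# Toy day for one driver: cruising, two queue spells, and moving inside a hub zone.
segments = [
    Segment(2.0, False, False),
    Segment(1.5, True, True),
    Segment(3.0, False, False),
    Segment(0.5, True, True),
    Segment(1.0, True, False),
]
net_fares = [120.0, 80.0, 200.0]

print(hourly_income(net_fares, segments))   # 50.0
print(hub_time_share(segments))             # 0.375
print(mean_queue_duration(segments))        # 1.0
```

Separating "in zone" from "stationary" keeps outcome 2 (all time in hub zones) distinct from outcome 3 (stationary queuing time only), which mirrors the wording of the definitions above.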

Secondary Outcomes

Secondary Outcomes (end points)
1. Driver belief accuracy.
2. Compliance with AI recommendations (usage intensity of AI features).
3. Externalities (hub congestion, passenger wait times).
4. Work duration and total working hours.
Secondary Outcomes (explanation)
1. Driver belief accuracy: This will be measured by comparing self-reported beliefs from an in-app baseline survey with the actual outcomes from driver-level app data.
2. Compliance with AI recommendations: Measured by app telemetry data, such as clicks on recommendations and the frequency of using the new features.
3. Externalities: Measured by analyzing aggregate data on queue lengths, vehicle approach times, and passenger wait times at hubs.
4. Work duration: Measured as the total hours a driver is logged into the app per day or week.

Experimental Design

Experimental Design
The study is a large-scale, multi-arm randomized controlled trial (RCT) conducted in a major city in China. The interventions are delivered through a widely used mobile application for taxi drivers. Drivers are randomly assigned to a control group or to one of several treatment arms, which are designed to enhance driver productivity through AI-assisted ability improvement, belief calibration, or a combination of both.
Experimental Design Details
Not available
Randomization Method
Randomization will be computer-generated and stratified by driver income level and baseline characteristics to ensure balance across the treatment and control groups.
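
A minimal sketch of stratified assignment at the driver level, under stated assumptions: strata are hypothetical income terciles, the arm labels `T1` to `T9` are placeholders (the registry does not disclose the arms), and the 3:1 allocation ratio is inferred from the planned group sizes. It is not the study's actual randomization code.

```python
import random
from collections import Counter, defaultdict

# Assumed allocation block of 12: control gets 3 slots, each of nine arms gets 1.
ARMS = ["control"] * 3 + [f"T{k}" for k in range(1, 10)]


def stratified_assign(stratum_by_driver, seed=2025):
    """Shuffle drivers within each stratum, then deal arms out in fixed proportions."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    by_stratum = defaultdict(list)
    for driver_id, stratum in stratum_by_driver.items():
        by_stratum[stratum].append(driver_id)

    assignment = {}
    for stratum in sorted(by_stratum):
        ids = sorted(by_stratum[stratum])
        rng.shuffle(ids)
        pattern = ARMS[:]
        rng.shuffle(pattern)  # so leftover drivers do not always favor the same arms
        for i, driver_id in enumerate(ids):
            assignment[driver_id] = pattern[i % len(pattern)]
    return assignment


# Toy sample: 360 drivers in three equal strata of 120.
drivers = {f"d{i:03d}": ("low", "mid", "high")[i % 3] for i in range(360)}
assignment = stratified_assign(drivers)
counts = Counter((drivers[d], arm) for d, arm in assignment.items())
print(counts[("low", "control")], counts[("low", "T1")])  # 30 10
```

Because arms are dealt out within each stratum, every stratum carries the same treatment shares, which is what delivers balance on the stratifying variables.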
Randomization Unit
Individual driver
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
8,000 drivers
Sample size: planned number of observations
8,000 drivers × daily observations for ~180 days ≈ 1,440,000 driver-day observations.
Sample size (or number of clusters) by treatment arms
Control group: 2,000 drivers
Nine treatment arms (from the 3×3 factorial design): approximately 667 drivers each (6,000 drivers in total).
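
As a quick arithmetic check on the planned sample, nine arms of exactly 667 drivers would total 6,003, slightly more than the 6,000 drivers left after the control group. The sketch below shows one way to split the remainder evenly. How the rounding is resolved in practice is not stated in the registry, so the split is an assumption.

```python
TOTAL_DRIVERS = 8000
CONTROL = 2000
N_ARMS = 9
DAYS = 180  # approximate observation window from the planned number of observations

# Spread the non-control drivers as evenly as possible across the nine arms.
base, extra = divmod(TOTAL_DRIVERS - CONTROL, N_ARMS)
arm_sizes = [base + 1 if k < extra else base for k in range(N_ARMS)]

print(arm_sizes)                  # [667, 667, 667, 667, 667, 667, 666, 666, 666]
print(CONTROL + sum(arm_sizes))   # 8000
print(TOTAL_DRIVERS * DAYS)       # 1440000
```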
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
With 8,000 drivers, assuming an intra-class correlation of 0, 80% power, and a 5% significance level, the minimum detectable effect on the main outcome, hourly income, is 4–5%.
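
For orientation, the sketch below applies the standard two-sample minimum detectable effect formula for a difference in means with individual-level randomization (intra-class correlation of 0). The registry does not report the formula or the outcome variance behind the 4–5% figure, so the coefficient of variation `cv` is a hypothetical input and the result is not a reproduction of the authors' calculation.

```python
from math import sqrt
from statistics import NormalDist


def mde_sd_units(n_treat, n_control, alpha=0.05, power=0.80):
    """MDE in standard-deviation units for a two-sided test, equal variances."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) * sqrt(1 / n_treat + 1 / n_control)


def mde_percent_of_mean(n_treat, n_control, cv, alpha=0.05, power=0.80):
    """MDE as a percentage of the mean, given cv = SD / mean of hourly income."""
    return 100 * cv * mde_sd_units(n_treat, n_control, alpha, power)


# One treatment arm (667 drivers) against the control group (2,000 drivers).
print(round(mde_sd_units(667, 2000), 3))
# Hypothetical coefficient of variation of 0.35 for hourly income.
print(round(mde_percent_of_mean(667, 2000, cv=0.35), 1))
```

Pooling arms, using repeated daily observations, or controlling for baseline income would each lower the detectable effect, so a single-arm comparison like this one is the most conservative case.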
IRB

Institutional Review Boards (IRBs)

IRB Name
IRB Approval Date
IRB Approval Number