Human Experts and Artificial Intelligence: The Value of Human Input in Diagnostic Imaging

Last registered on June 28, 2023


January 13, 2022

January 14, 2022, 1:56 PM EST

June 28, 2023, 10:43 AM EDT

Purdue University

Massachusetts Institute of Technology
Massachusetts Institute of Technology
Harvard Medical School

This trial does not extend or rely on any prior RCTs.
We plan to investigate how human experts combine their own information with AI predictions when making assessments and decisions in the radiology domain. See the attached pre-analysis plan for full details.
Agarwal, Nikhil et al. 2023. "Human Experts and Artificial Intelligence: The Value of Human Input in Diagnostic Imaging." AEA RCT Registry. June 28.
See the attached pre-analysis plan for full details.
To measure the quality of diagnostic assessments and decisions, we will focus on the following primary outcome variables for each pathology group.
1. Error in probability assessment
2. Incorrect treatment/follow-up recommendation

The primary pathology groups we will consider are:
1. Pooled outcomes for all pathologies
2. Pooled outcomes for all AI-assisted pathologies
See the attached pre-analysis plan for full details.

1. Time taken and measures of effort exerted to parse the information in the X-ray and the clinical history, with and without AI
2. Heterogeneity of treatment effects by pathology prevalence and AI performance
See the attached pre-analysis plan for full details.

We ask radiologists to read chest x-rays, randomizing whether they have access to the patient's clinical history and an AI tool. See the attached pre-analysis plan for full details of the experiment and analysis plan.
The attached pre-analysis plan contains the full randomization details. The order in which radiologists go through the different treatments in each session will be randomized to account for order effects. For each radiologist, we will randomly sample 60 cases to be read under each experimental condition. We will then randomly select a sequence of images from the set of image sequences satisfying the following two criteria: (1) there are 15 cases in each treatment arm per round and (2) each image is read in all treatment arms across the rounds.

The advantage of the within-subject design is that we observe each radiologist making many decisions under each treatment arm, which makes effects easier to detect because the design controls for across-subject heterogeneity. In addition, this design facilitates estimation of an economic model of decision making, described later on, that is used to study automation bias or neglect.
Randomization occurs at the patient case level for each radiologist.
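The two sequence criteria above can be illustrated with a minimal sketch. It is not the registered procedure, which is specified only in the pre-analysis plan. It assumes four rounds (implied by 60 cases, four arms, and 15 cases per arm per round), hypothetical arm labels for the clinical history by AI design, and a rotation of four 15-case blocks across arms. The rotation draws from a subset of the valid sequences rather than uniformly from all of them, and presenting arms one after another within a round is also an assumption.

```python
import random

# Hypothetical labels for the four arms (clinical history x AI access).
ARMS = ["no_history_no_ai", "history_no_ai", "no_history_ai", "history_ai"]


def assign_sequence(case_pool, n_cases=60, per_arm_per_round=15, seed=None):
    """Return one radiologist's schedule as a list of rounds of (case, arm) reads.

    Assumed construction: split the sampled cases into blocks of 15 and rotate
    the blocks through the arms, so each round has 15 cases per arm and every
    case is read once under every arm across the rounds.
    """
    rng = random.Random(seed)
    n_arms = len(ARMS)
    assert n_cases == n_arms * per_arm_per_round

    cases = rng.sample(case_pool, n_cases)          # cases for this radiologist
    blocks = [cases[i * per_arm_per_round:(i + 1) * per_arm_per_round]
              for i in range(n_arms)]
    arm_order = rng.sample(ARMS, n_arms)            # random block-to-arm start
    round_shifts = rng.sample(range(n_arms), n_arms)  # random round order

    schedule = []
    for shift in round_shifts:
        arm_to_block = {arm_order[(b + shift) % n_arms]: blocks[b]
                        for b in range(n_arms)}
        reads = []
        for arm in rng.sample(ARMS, n_arms):        # random arm order in round
            block = arm_to_block[arm][:]
            rng.shuffle(block)                      # random case order in arm
            reads.extend((case, arm) for case in block)
        schedule.append(reads)
    return schedule
```

Calling `assign_sequence(list(range(500)), seed=1)` with a hypothetical pool of 500 case identifiers returns four rounds of 60 reads each.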
Sample size: planned number of clusters
We expect between 30 and 35 radiologists to complete the experiment.
Sample size: planned number of observations
Each radiologist reads 60 cases under each of the four treatment arms. This will result in 8,400 total reads if 35 radiologists complete the experiment.
Sample size (or number of clusters) by treatment arms
Each radiologist will read 60 cases under each of the four treatment arms. This will result in 2,100 observations per arm if 35 radiologists complete the experiment.
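The planned counts follow from simple multiplication, and a short sketch shows how they scale over the expected range of 30 to 35 completers. The function and constant names are illustrative and do not come from the registration.

```python
CASES_PER_ARM = 60   # cases each radiologist reads under each arm
N_ARMS = 4           # treatment arms


def planned_reads(n_radiologists):
    """Return planned observations per arm and in total."""
    per_arm = n_radiologists * CASES_PER_ARM
    return {"per_arm": per_arm, "total": per_arm * N_ARMS}


for n in (30, 35):
    print(n, planned_reads(n))
# 30 {'per_arm': 1800, 'total': 7200}
# 35 {'per_arm': 2100, 'total': 8400}
```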
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
See the attached pre-analysis plan for full details of the power calculations.

MIT Committee on the Use of Humans as Experimental Subjects
Pre-Analysis Plan

MD5: 697da19a77bb639ba35d78d48c23599b

SHA1: d09e135b1be6f5a056a534a5c9329add237a9fd7

Uploaded At: January 13, 2022



