The Impact of Large Language Models on Diagnostic Reasoning Among Medical Doctors

Last registered on January 18, 2025

Pre-Trial

Trial Information

General Information

Title
The Impact of Large Language Models on Diagnostic Reasoning Among Medical Doctors
RCT ID
AEARCTR-0015117
Initial registration date
January 04, 2025

First published
January 10, 2025, 1:05 PM EST

Last updated
January 18, 2025, 1:24 PM EST

Locations

There is information in this trial unavailable to the public.

Primary Investigator

Affiliation
Lahore University of Management Sciences (LUMS)

Other Primary Investigator(s)

PI Affiliation
Lahore University of Management Sciences
PI Affiliation
King Edward Medical University
PI Affiliation
Lahore General Hospital
PI Affiliation
Children's Hospital, Lahore
PI Affiliation
Lahore University of Management Sciences

Additional Trial Information

Status
In development
Start date
2025-01-10
End date
2025-09-30
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
Diagnostic errors are a major source of preventable patient harm. Large language models (LLMs) have shown promise in assisting with clinical decision-making, potentially improving diagnostic accuracy and efficiency. However, the impact of LLMs on medical doctors' diagnostic reasoning compared to conventional diagnostic resources remains unclear. This study aims to evaluate whether providing medical doctors (including physicians and surgeons) with access to ChatGPT-4o, in addition to standard resources, enhances their diagnostic reasoning performance. All participating doctors will have completed at least a 10-hour training program covering ChatGPT-4o usage, prompt engineering techniques, and output evaluation strategies.
External Link(s)

Registration Citation

Citation
Akhtar, Muhammad Junaid et al. 2025. "The Impact of Large Language Models on Diagnostic Reasoning Among Medical Doctors." AEA RCT Registry. January 18. https://doi.org/10.1257/rct.15117-1.1
Sponsors & Partners

There is information in this trial unavailable to the public.
Experimental Details

Interventions

Intervention(s)
The treatment group will be given access to ChatGPT-4o in addition to standard diagnostic resources.
Intervention Start Date
2025-01-10
Intervention End Date
2025-07-31

Primary Outcomes

Primary Outcomes (end points)
The primary outcome will be the diagnostic reasoning score, calculated as a percentage score (0-100%) for each case.
Primary Outcomes (explanation)
For each case, participants will be asked for their three top diagnoses and, for each diagnosis, the findings from the case that support it and the findings that oppose it. Participants will receive 1 point for each plausible diagnosis. The supporting and opposing findings will also be graded for correctness, with 1 point for a partially correct response and 2 points for a completely correct response. Participants will then be asked to name their top diagnosis, earning 1 point for a reasonable response and 2 points for the most correct response. Finally, participants will be asked to name up to three next steps to further evaluate the patient, with 1 point awarded for a partially correct response and 2 points for a completely correct response. The primary outcome will be compared at the case level between the randomized groups.
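The rubric above can be sketched in code. This is only an illustrative sketch: the registration does not state the maximum number of points per case, so the denominator used here (23 points, assuming each of three diagnoses can earn 1 + 2 + 2 points, the top diagnosis 2 points, and each of three next steps 2 points) is an assumption, as are all class and function names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiagnosisItem:
    plausible: bool   # 1 point if the diagnosis is plausible
    supporting: int   # 0, 1 (partially correct) or 2 (completely correct)
    opposing: int     # 0, 1 (partially correct) or 2 (completely correct)


@dataclass
class CaseResponse:
    differential: List[DiagnosisItem]  # up to three diagnoses
    final_diagnosis: int               # 0, 1 (reasonable) or 2 (most correct)
    next_steps: List[int] = field(default_factory=list)  # up to three, each 0-2


# ASSUMPTION: maximum points per case; not stated in the registration.
MAX_POINTS = 3 * (1 + 2 + 2) + 2 + 3 * 2  # = 23


def _check_grade(value: int) -> int:
    """Reject grades outside the 0-2 range used by the rubric."""
    if value not in (0, 1, 2):
        raise ValueError(f"grade must be 0, 1 or 2, got {value!r}")
    return value


def diagnostic_reasoning_score(resp: CaseResponse) -> float:
    """Return the case-level score as a percentage (0-100)."""
    if len(resp.differential) > 3 or len(resp.next_steps) > 3:
        raise ValueError("at most three diagnoses and three next steps")
    points = 0
    for item in resp.differential:
        points += int(item.plausible)
        points += _check_grade(item.supporting)
        points += _check_grade(item.opposing)
    points += _check_grade(resp.final_diagnosis)
    points += sum(_check_grade(step) for step in resp.next_steps)
    return 100.0 * points / MAX_POINTS


# Hypothetical graded response for one case (made-up grades, not study data).
example = CaseResponse(
    differential=[
        DiagnosisItem(plausible=True, supporting=2, opposing=1),
        DiagnosisItem(plausible=True, supporting=1, opposing=0),
        DiagnosisItem(plausible=False, supporting=0, opposing=0),
    ],
    final_diagnosis=2,
    next_steps=[2, 1],
)
example_score = diagnostic_reasoning_score(example)  # 11 of 23 points
```

If the investigators use a different maximum, only `MAX_POINTS` would need to change.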

Secondary Outcomes

Secondary Outcomes (end points)
Time Spent on Diagnosis
Secondary Outcomes (explanation)
The total time (in seconds) that each participant spends on each case.

Experimental Design

Experimental Design
The trial is designed as a randomized, two-arm, single-blind, parallel-group study.
Experimental Design Details
Not available
Randomization Method
Randomization done in office by a computer
Randomization Unit
Individual medical doctor.
Was the treatment clustered?
No
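As an illustration of computer-based randomization at the level of the individual doctor, the following minimal sketch assigns an even number of participants to two equal arms with a seeded shuffle. The registration does not describe the actual procedure or software. The doctor identifiers, the seed, and the function name are hypothetical.

```python
import random
from typing import Dict, Iterable


def randomize_two_arms(doctor_ids: Iterable[str], seed: int) -> Dict[str, str]:
    """Assign doctors 1:1 to 'treatment' or 'control' using a seeded shuffle."""
    ids = sorted(doctor_ids)  # fixed starting order so the seed reproduces the result
    if len(ids) % 2 != 0:
        raise ValueError("an even number of participants is needed for equal arms")
    rng = random.Random(seed)  # dedicated generator, independent of global state
    rng.shuffle(ids)
    half = len(ids) // 2
    assignment = {doc: "treatment" for doc in ids[:half]}
    assignment.update({doc: "control" for doc in ids[half:]})
    return assignment


# Hypothetical identifiers for 50 doctors; the seed is arbitrary.
doctors = [f"D{i:02d}" for i in range(1, 51)]
allocation = randomize_two_arms(doctors, seed=20250110)
```

Fixing the seed and the starting order makes the allocation reproducible for audit.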

Experiment Characteristics

Sample size: planned number of clusters
No clustering. Individual medical doctors will be randomized.
Sample size: planned number of observations
50 medical doctors
Sample size (or number of clusters) by treatment arms
25 medical doctors in each of the treatment and control arms
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
The minimum detectable effect size for the primary outcome (i.e., the difference in mean diagnostic reasoning scores) is 8 percentage points between groups.
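To show how a minimum detectable effect relates to sample size, the sketch below uses the standard normal-approximation formula for a two-sided comparison of two equal-sized groups, MDE = (z_{1-alpha/2} + z_{power}) * sd * sqrt(2 / n). The registration does not report the significance level, power, or outcome variance behind the 8-point figure, so the defaults here (alpha = 0.05, power = 0.80) are assumptions. The formula also treats each doctor as a single observation and ignores repeated cases per doctor.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(sd: float, n_per_arm: int,
                              alpha: float = 0.05, power: float = 0.80) -> float:
    """Normal-approximation MDE for a two-sided, two-sample comparison of means."""
    z = NormalDist()  # standard normal distribution
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * sd * sqrt(2 / n_per_arm)


def implied_sd(mde: float, n_per_arm: int,
               alpha: float = 0.05, power: float = 0.80) -> float:
    """Outcome standard deviation that would produce the given MDE."""
    return mde / minimum_detectable_effect(1.0, n_per_arm, alpha, power)


# With 25 doctors per arm, the MDE under these assumptions is roughly
# 0.79 standard deviations of the outcome.
mde_in_sd_units = minimum_detectable_effect(sd=1.0, n_per_arm=25)

# Standard deviation (in percentage points) consistent with an 8-point MDE,
# under the assumed alpha and power only.
sd_for_8_points = implied_sd(mde=8.0, n_per_arm=25)
```

Under these assumed inputs, an 8-point MDE corresponds to an outcome standard deviation of about 10 percentage points. The investigators' actual power calculation may differ.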
IRB

Institutional Review Boards (IRBs)

IRB Name
Institutional Review Board, Lahore University of Management Sciences
IRB Approval Date
2024-12-06
IRB Approval Number
IRB-0342
Analysis Plan

There is information in this trial unavailable to the public.