Can Feedback from a Large Language Model Improve Health Care Quality?

Last registered on January 27, 2025

Trial Information

General Information

Title
Can Feedback from a Large Language Model Improve Health Care Quality?
RCT ID
AEARCTR-0015226
Initial registration date
January 23, 2025

First published
January 27, 2025, 10:05 AM EST

Locations

This information is not available to the public; access can be requested through the Registry.

Primary Investigator

Affiliation
World Bank

Other Primary Investigator(s)

PI Affiliation
Yale University
PI Affiliation
University of Pennsylvania
PI Affiliation
George Washington University
PI Affiliation
EHA Clinics

Additional Trial Information

Status
In development
Start date
2025-01-21
End date
2025-07-31
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
The quality of health care in low- and middle-income countries is notoriously low (Banerjee et al., 2023). In Nigeria, only a small proportion of primary care patients are seen by a Medical Officer (MO) rather than by a mid-level provider such as a Community Health Extension Worker (CHEW). Limited access to qualified medical doctors is thus a barrier to good health care (Okeke, 2023). This project tests whether Large Language Models (LLMs) can improve patient care in Nigerian primary care clinics. An LLM-based tool provides "second opinions" to CHEWs at two clinics in Nigeria. These second opinions are intended to mirror what a reviewing physician might advise the CHEWs after seeing or hearing their initial report on a patient. For our main analysis, we use a within-patient comparison of two patient notes created by the CHEW: one during the initial patient consultation, and one after the LLM feedback was received. Each patient is also seen by a fully trained MO who is in charge of patient care. The MO conducts a blinded review of the CHEW's patient notes to measure changes in the CHEW's care as a result of the LLM feedback.
External Link(s)

Registration Citation

Citation
Abaluck, Jason et al. 2025. "Can Feedback from a Large Language Model Improve Health Care Quality?" AEA RCT Registry. January 27. https://doi.org/10.1257/rct.15226-1.0
Sponsors & Partners

This information is not available to the public; access can be requested through the Registry.
Experimental Details

Interventions

Intervention(s)
An LLM is prompt-engineered to provide specific and concrete feedback on structured patient notes created by community health extension workers (CHEWs) at two clinics in Nigeria as part of their usual data entry into an electronic patient management system.
Intervention Start Date
2025-01-21
Intervention End Date
2025-04-30

Primary Outcomes

Primary Outcomes (end points)
Indicators for "any treatment error," "severe treatment error," and "better treatment plan," plus a treatment misallocation indicator for malaria, anemia, and urinary tract infection (UTI); see the pre-analysis plan for details.
Primary Outcomes (explanation)
See pre-analysis plan.

Secondary Outcomes

Secondary Outcomes (end points)
See pre-analysis plan.
Secondary Outcomes (explanation)
See pre-analysis plan.

Experimental Design

Experimental Design
This project tests whether Large Language Models (LLMs) can improve patient care in Nigerian primary care clinics by giving the provider customized, instant feedback in natural language. An LLM-based tool integrated into an electronic patient management system provides “second opinions” to community health extension workers (CHEWs) at two clinics in Nigeria. These second opinions are intended to mirror what a reviewing physician might advise the CHEWs after seeing or hearing their initial report on a patient. For our main analysis, we use a within-patient comparison of two patient notes created by the CHEW: one during the initial patient consultation, and one after the LLM feedback was received. Each patient is also seen by a fully trained medical officer (MO) who is in charge of patient care. The MO conducts a randomized, blinded review of the CHEW’s patient notes to measure changes in the CHEW’s care as a result of the LLM feedback.
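
The full specification is in the pre-analysis plan, which is not public. As illustration only, here is a minimal sketch of the within-patient comparison the design implies, assuming a hypothetical long-format dataset (the column names patient_id, llm_assisted, and any_error are ours, not the trial's):

```python
# Hedged sketch of the within-patient (paired) comparison; the trial's
# actual specification is in its non-public pre-analysis plan.
import pandas as pd
from scipy import stats

# Hypothetical data: one row per SOAP note, two notes per patient.
df = pd.DataFrame({
    "patient_id":   [1, 1, 2, 2, 3, 3],
    "llm_assisted": [0, 1, 0, 1, 0, 1],  # 0 = initial note, 1 = post-feedback note
    "any_error":    [1, 0, 1, 1, 0, 0],  # MO's blinded error rating
})

# Reshape to one row per patient: error indicator before vs. after feedback.
wide = df.pivot(index="patient_id", columns="llm_assisted", values="any_error")

# Paired comparison: the mean within-patient change after LLM feedback.
diff = wide[1] - wide[0]
print("Mean change in error rate:", diff.mean())
print(stats.ttest_rel(wide[1], wide[0]))
```

In practice a regression with patient fixed effects would recover the same point estimate while allowing covariates; the pre-analysis plan governs the actual estimator.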
Experimental Design Details
Not available
Randomization Method
Simple randomization integrated into the data interface
Randomization Unit
Patient SOAP note (Subjective, Objective, Assessment, Plan; the provider's patient encounter record)
Was the treatment clustered?
No
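
The registration does not describe the mechanism beyond simple randomization integrated into the data interface. A minimal sketch, assuming (our assumption, not the registration's) that randomization determines the order in which a patient's two notes are presented for the MO's blinded review:

```python
# Hedged sketch of simple randomization at the SOAP-note level, as it might
# be embedded in a data interface. The function name and note labels are
# hypothetical; the trial's actual implementation is not public.
import random

def randomize_note_order(patient_id: str, seed: int | None = None) -> list[str]:
    """Return a patient's two note labels in random order for blinded review."""
    rng = random.Random(seed)  # seed only for reproducible testing
    notes = [f"{patient_id}-unassisted", f"{patient_id}-llm-assisted"]
    rng.shuffle(notes)  # simple randomization: both orders equally likely
    return notes

# Example: the reviewing MO would see the notes in this order, unlabeled.
print(randomize_note_order("patient-042"))
```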

Experiment Characteristics

Sample size: planned number of clusters
500 patients with 2 SOAP notes each.
Sample size: planned number of observations
1000 (2 per patient)
Sample size (or number of clusters) by treatment arms
500 per arm (two notes per patient: one unassisted and one LLM-assisted)
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
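The registration leaves this field blank. For illustration only, and with every number below assumed rather than taken from the registration, here is a sketch of a minimum detectable effect calculation for 500 within-patient pairs, treating the analysis as a paired t-test at 80% power and a 5% two-sided significance level:

```python
# Hedged MDE sketch: 500 paired observations, alpha = 0.05, power = 0.80.
# None of these inputs come from the registration; the trial's own power
# calculation may use a different estimator and different assumptions.
from statsmodels.stats.power import TTestPower

mde = TTestPower().solve_power(
    nobs=500,            # 500 patients, each contributing one paired difference
    alpha=0.05,
    power=0.80,
    alternative="two-sided",
)
# Roughly 0.13 standard deviations of the within-patient difference.
print(f"MDE (standardized, paired): {mde:.3f} SD")
```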
IRB

Institutional Review Boards (IRBs)

IRB Name
Yale University IRB # IRB00011725
IRB Approval Date
2025-01-23
IRB Approval Number
2000035990/MOD00069989
IRB Name
Kano State of Nigeria Ministry of Health, Health Research Ethics Committee
IRB Approval Date
2024-10-17
IRB Approval Number
SHREC/2024/5464
Analysis Plan

This information is not available to the public; access can be requested through the Registry.