Can Feedback from a Large Language Model Improve Health Care Quality?

Last registered on January 16, 2026

Pre-Trial

Trial Information

General Information

Title
Can Feedback from a Large Language Model Improve Health Care Quality?
RCT ID
AEARCTR-0015226
Initial registration date
January 23, 2025

First published
January 27, 2025, 10:05 AM EST

Last updated
January 16, 2026, 9:30 PM EST

Locations

Region

Primary Investigator

Affiliation
World Bank

Other Primary Investigator(s)

PI Affiliation
Yale University
PI Affiliation
University of Pennsylvania
PI Affiliation
George Washington University
PI Affiliation
EHA Clinics

Additional Trial Information

Status
Completed
Start date
2025-01-30
End date
2025-10-17
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
The quality of health care in low- and middle-income countries is notoriously low (Banerjee et al., 2023). In Nigeria, only a small proportion of primary care patients are seen by a Medical Officer (MO) rather than a mid-level provider such as a Community Health Extension Worker (CHEW), and limited access to qualified medical doctors is a barrier to good health care (Okeke, 2023). This project tests whether Large Language Models (LLMs) can improve patient care in Nigerian primary care clinics. An LLM-based tool provides "second opinions" to CHEWs at two clinics in Nigeria. These second opinions are intended to mirror what a reviewing physician might advise the CHEWs after seeing or hearing their initial report on a patient. For our main analysis, we use a within-patient comparison of two patient notes created by the CHEW: one during the initial patient consultation and one after the LLM feedback was received. The patient is also seen by a fully trained MO who is in charge of patient care. The MO conducts a blinded review of the CHEW's patient notes to measure changes in the CHEW's care as a result of the LLM feedback.
External Link(s)

Registration Citation

Citation
Abaluck, Jason et al. 2026. "Can Feedback from a Large Language Model Improve Health Care Quality?" AEA RCT Registry. January 16. https://doi.org/10.1257/rct.15226-1.1
Experimental Details

Interventions

Intervention(s)
An LLM is prompt-engineered to provide specific and concrete feedback on structured patient notes created by community health extension workers (CHEWs) at two clinics in Nigeria as part of their usual data entry into an electronic patient management system.
Intervention (Hidden)
A community health extension worker (CHEW) conducted the patient consultation and prepared a "conditional" SOAP note by completing the fields of the patient EMR. This conditional record specified the provisional diagnosis and prescriptions conditional on laboratory test results. The health worker then submitted this "unassisted" SOAP note to the LLM through the EMR system for feedback. The LLM prompt was developed through extensive piloting and review of simulated feedback and was aimed in particular at producing concise responses that did not rely on excessive laboratory testing (McPeak et al., 2024). In all cases, we used the o1-mini model snapshot (o1-mini-2024-09-12). Upon receiving the feedback from the LLM in the EMR system, the health worker had the option to update the EMR record, which created the "assisted" (conditional) SOAP note.
The intervention and data collection at the clinics took place from January 30, 2025 to May 26, 2025.
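The registration does not reproduce the EMR integration or the actual prompt. Purely as a hedged sketch of the round trip described above, assuming the OpenAI Python client, with placeholder names (`FEEDBACK_INSTRUCTIONS`, `get_llm_feedback`, `soap_note`) and instruction text that are illustrative rather than the study's code:

```python
# Illustrative sketch only: the trial's actual EMR integration and prompt are
# described in the pre-analysis plan, not reproduced here. The instruction
# text and names below are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

FEEDBACK_INSTRUCTIONS = (
    "You are reviewing a structured SOAP note written by a community health "
    "extension worker. Give concise, concrete feedback on the provisional "
    "diagnosis and the prescriptions conditional on laboratory results, "
    "without recommending excessive laboratory testing."
)

def get_llm_feedback(soap_note: str) -> str:
    """Send an unassisted SOAP note to the model snapshot named in the trial."""
    response = client.chat.completions.create(
        model="o1-mini-2024-09-12",  # snapshot specified in the registration
        # o1-mini did not accept a system role at release, so the
        # instructions are prepended to the single user message.
        messages=[{
            "role": "user",
            "content": FEEDBACK_INSTRUCTIONS + "\n\n" + soap_note,
        }],
    )
    return response.choices[0].message.content
```

In this sketch, the health worker would then see the returned feedback in the EMR and decide whether to revise the record into the assisted note.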
Intervention Start Date
2025-01-30
Intervention End Date
2025-05-26

Primary Outcomes

Primary Outcomes (end points)
Indicator for "any treatment error", indicator for "severe treatment error", indicator for "better treatment plan", and a treatment misallocation indicator for malaria, anemia, and urinary tract infection (UTI); see the pre-analysis plan for details.
Primary Outcomes (explanation)
See pre-analysis plan.

Secondary Outcomes

Secondary Outcomes (end points)
See pre-analysis plan.
Secondary Outcomes (explanation)
See pre-analysis plan.

Experimental Design

Experimental Design
This project tests whether Large Language Models (LLMs) can improve patient care in Nigerian primary care clinics by giving customized and instant feedback to the provider in natural language. An LLM-based tool integrated into an electronic patient management system provides "second opinions" to community health extension workers (CHEWs) at two clinics in Nigeria. These second opinions are intended to mirror what a reviewing physician might advise the CHEWs after seeing or hearing their initial report on a patient. For our main analysis, we use a within-patient comparison of two patient notes created by the CHEW: one during the initial patient consultation and one after the LLM feedback was received. The patient is also seen by a fully trained medical officer (MO) who is in charge of patient care. The MO conducts a randomized blinded review of the CHEW's patient notes to measure changes in the CHEW's care as a result of the LLM feedback.
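The registered estimating equations are in the pre-analysis plan. Purely as an illustration of the within-patient design, a paired comparison of the two notes could be estimated as in the sketch below, where the column names (`patient_id`, `assisted`, `outcome`) and the file name are hypothetical placeholders:

```python
# Illustrative only: the registered specification is in the pre-analysis plan.
# Column and file names are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

# Two rows per patient: the unassisted note (assisted=0) and the LLM-assisted
# note (assisted=1), each scored in the blinded MO review.
notes = pd.read_csv("soap_note_reviews.csv")

# Patient fixed effects absorb everything constant within a patient, so the
# coefficient on `assisted` is the average within-patient change in the
# outcome after LLM feedback.
fit = smf.ols("outcome ~ assisted + C(patient_id)", data=notes).fit(
    cov_type="cluster", cov_kwds={"groups": notes["patient_id"]}
)
print(fit.params["assisted"], fit.bse["assisted"])
```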
Experimental Design Details
See pre-analysis plan.
Randomization Method
Simple randomization integrated into the data interface.
Randomization Unit
Patient SOAP note (provider's patient encounter record)
Was the treatment clustered?
No
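The randomization here governs the order in which a patient's two SOAP notes are shown to the reviewing MO, drawn inside the data interface. A minimal sketch of that presentation-order draw, with hypothetical names (the actual implementation lived inside the EMR):

```python
# Minimal sketch of the presentation-order randomization described above;
# the actual draw was built into the EMR data interface.
import random

def randomize_note_order(unassisted_note: str, assisted_note: str) -> list[str]:
    """Return the patient's two SOAP notes in random order for blinded review."""
    notes = [unassisted_note, assisted_note]
    random.shuffle(notes)  # simple (unstratified) randomization per patient
    return notes
```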

Experiment Characteristics

Sample size: planned number of clusters
500 patients with 2 SOAP notes each.
Sample size: planned number of observations
1000 (2 per patient)
Sample size (or number of clusters) by treatment arms
500 per arm (2 per patient, unassisted and LLM-assisted patient note)
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB

Institutional Review Boards (IRBs)

IRB Name
Yale University IRB # IRB00011725
IRB Approval Date
2025-01-23
IRB Approval Number
2000035990/MOD00069989
IRB Name
Kano State of Nigeria Ministry of Health, Health Research Ethics Committee
IRB Approval Date
2024-10-17
IRB Approval Number
SHREC/2024/5464
Analysis Plan

Analysis Plan Documents

Can Feedback from a Large Language Model Improve Health Care Quality? Pre-Analysis Plan

MD5: 7ae64df64e0b886dcc60dcd3cd5fa3f6

SHA1: 1db7352e9cbb624b90c5dae6a596768bcddf8cae

Uploaded At: January 22, 2025

Post-Trial

Post Trial Information

Study Withdrawal

Intervention

Is the intervention completed?
Yes
Intervention Completion Date
May 26, 2025, 12:00 +00:00
Data Collection Complete
Yes
Data Collection Completion Date
October 17, 2025, 12:00 +00:00
Final Sample Size: Number of Clusters (Unit of Randomization)
There were 660 patient visits in the full sample (Jan 30-May 26) and 491 patient visits in the correctly randomized sample (Feb 25-May 26). From Jan 30 to Feb 24, the SOAP notes presented to the MO for assessment were not correctly randomized, so we discard the subjective MO evaluations for these observations. As a result, there are 982 observations for the subjective evaluation data in the correctly randomized sample (Feb 25-May 26) and 1,320 observations for the objective data, such as test results (Jan 30-May 26).
Was attrition correlated with treatment status?
No
Final Sample Size: Total Number of Observations
See "Number of Clusters".
Final Sample Size (or Number of Clusters) by Treatment Arms
There were 660 patient visits in the full sample (Jan 30-May 26) and 491 patient visits in the correctly randomized sample (Feb 25-May 26); the unassisted and assisted SOAP notes were presented to the MO in random order for evaluation.
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Yes
Reports, Papers & Other Materials

Relevant Paper(s)

Abstract
We deployed large language model (LLM) decision support for health workers at two outpatient clinics in Nigeria. For each patient, health workers drafted care plans that were optionally revised after LLM feedback. We compared unassisted and assisted plans using blinded randomized assessments by on-site physicians who evaluated and treated the same patients, as well as results from laboratory tests for common conditions. Academic physicians performed blinded retrospective reviews of a subset of notes. In response to LLM feedback, health workers changed their prescribing for more than half of the patients and reported high satisfaction with the recommendations, and retrospective academic reviewers rated LLM-assisted plans more favorably. However, on-site physicians observed little to no improvement in diagnostic alignment or treatment decisions. Laboratory testing showed mixed effects of LLM assistance, which removed negative tests for malaria but added them for urinary tract infection and anemia, with no significant increase in the detection rates for the tested conditions.
Citation
Jason Abaluck, Robert Pless, Nirmal Ravi, Anja Sautmann, and Aaron Schwartz, "Does LLM Assistance Improve Healthcare Delivery? An Evaluation Using On-site Physicians and Laboratory Tests," NBER Working Paper 34660 (2026), https://doi.org/10.3386/w34660.

Reports & Other Materials