The Impact of Specialized AI tools for Lawyering Tasks

Last registered on December 23, 2024

Pre-Trial

Trial Information

General Information

Title
The Impact of Specialized AI tools for Lawyering Tasks
RCT ID
AEARCTR-0014957
Initial registration date
December 20, 2024


First published
December 23, 2024, 1:27 PM EST


Locations

Primary Investigator

Affiliation

Other Primary Investigator(s)

PI Affiliation
University of Minnesota
PI Affiliation
University of Michigan
PI Affiliation
University of Minnesota
PI Affiliation
University of Michigan

Additional Trial Information

Status
Completed
Start date
2024-10-01
End date
2024-11-04
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
We seek to study the impact of large language models (LLMs) and LLMs with additional domain-specific scaffolding on legal work. While pre-trained LLMs have been shown to increase worker productivity across a number of tasks, there is limited empirical research on how domain-specific software integrations and post-training model enhancements could drive additional impacts on worker productivity and quality. In real-world work applications, LLMs are often enhanced with “scaffolding” that includes tool-use capabilities, retrieval-augmented generation (RAG), access to specialized datasets, backend prompt optimization, and user interfaces tailored for specific use cases. Recent research suggests that such scaffolding can significantly enhance the capabilities of pre-trained models, but there is a lack of empirical studies quantifying the effect this has on different kinds of work. As AI progress continues, understanding the marginal impact of scaffolding built around LLMs will be essential to gauging AI’s impact on the future of work. This study aims to generate new evidence on this question, using legal work as an experimental setting. To do so, we propose an RCT to evaluate the impact of LLM-powered legal assistant software on legal labor. We will randomly assign second- and third-year law students into a no-AI control group and two treatment groups: one with access to OpenAI’s o1 model and the other with access to vLex’s Vincent AI, a legal assistant software built on GPT-4o with additional features specific to legal work. We will then measure productivity and work quality differences on a set of six legal work tasks that participants complete, and estimate the causal impact of using LLMs and LLM-powered tools to assist with these tasks.

External Link(s)

Registration Citation

Citation
Barry, Patrick et al. 2024. "The Impact of Specialized AI tools for Lawyering Tasks." AEA RCT Registry. December 23. https://doi.org/10.1257/rct.14957-1.0
Experimental Details

Interventions

Intervention(s)
Participants will be asked to complete six different legal research tasks that are illustrative of the type of work a junior associate normally needs to perform on the job. For each task, access to an AI tool (OpenAI's o1-preview model, Vincent AI's legal research tool, or no AI tool) will be randomly assigned. The trial aims to estimate the causal impact of access to each AI tool on worker productivity and quality on legal tasks.
Intervention (Hidden)
Intervention Start Date
2024-10-01
Intervention End Date
2024-11-04

Primary Outcomes

Primary Outcomes (end points)
Work Quality:
For each of the six tasks, we will evaluate treatment effects on the following criteria that make up work quality:
Accuracy
Analysis
Organization
Clarity
Professionalism

Each will be graded on a 1-7 scale by law professors who are given detailed grading rubrics and instructions.

We will also assess an overall quality measure, computed as the average of the scores on each component of quality.

Time:
A self-reported measure of the number of minutes it took to complete each task.

Productivity:
Overall quality divided by number of minutes it took to complete the task.
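As a minimal sketch of how the primary outcomes described above could be constructed (the component names mirror the rubric criteria listed here; the data layout is hypothetical, not taken from the study's actual files):

```python
# Sketch of the primary-outcome construction: overall quality is the mean of
# the five rubric components, and productivity is quality per minute.
# The dictionary layout is illustrative, not the study's actual data format.
QUALITY_COMPONENTS = ["accuracy", "analysis", "organization", "clarity", "professionalism"]

def overall_quality(scores):
    """Average the five 1-7 component scores into one overall quality score."""
    return sum(scores[c] for c in QUALITY_COMPONENTS) / len(QUALITY_COMPONENTS)

def productivity(scores, minutes):
    """Overall quality divided by self-reported minutes to complete the task."""
    return overall_quality(scores) / minutes

example = {"accuracy": 5, "analysis": 6, "organization": 4, "clarity": 5, "professionalism": 5}
print(overall_quality(example))   # 5.0
print(productivity(example, 50))  # 0.1
```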
Primary Outcomes (explanation)


Secondary Outcomes

Secondary Outcomes (end points)
We may also analyze outcomes separately for subgroups of similar tasks (as opposed to an average quality rating across all tasks).

We may also perform heterogeneity analysis by participant gender, university, or other demographic variables.

We may also compare our results to those of Choi et al. (2023), which examined the impact of GPT-4 on legal work tasks; this analysis would be exploratory, and its interpretation will need to account for the fact that Choi et al. (2023) used a different set of legal work tasks.
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
Participants consent to enroll in the study and are then randomly assigned to one of three groups: A, B, or C. Each group rotates through the three conditions (Vincent AI, OpenAI's o1-preview model, or no AI) in a different order across the six assigned legal work tasks.

Group A:

Assignment 1: No AI
Assignment 2: Vincent AI
Assignment 3: o1
Assignment 4: No AI
Assignment 5: Vincent AI
Assignment 6: o1

Group B:

Assignment 1: o1
Assignment 2: No AI
Assignment 3: Vincent AI
Assignment 4: o1
Assignment 5: No AI
Assignment 6: Vincent AI

Group C:

Assignment 1: Vincent AI
Assignment 2: o1
Assignment 3: No AI
Assignment 4: Vincent AI
Assignment 5: o1
Assignment 6: No AI
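The counterbalanced rotation above can be written compactly as a lookup table. A sketch (group and condition labels follow the registration; the code itself is illustrative):

```python
# The counterbalanced schedule above as a lookup table: each group starts at a
# different condition and cycles through all three twice over six assignments.
CONDITIONS = ["No AI", "Vincent AI", "o1"]

SCHEDULE = {
    "A": [CONDITIONS[(0 + i) % 3] for i in range(6)],  # starts at No AI
    "B": [CONDITIONS[(2 + i) % 3] for i in range(6)],  # starts at o1
    "C": [CONDITIONS[(1 + i) % 3] for i in range(6)],  # starts at Vincent AI
}

print(SCHEDULE["A"])  # ['No AI', 'Vincent AI', 'o1', 'No AI', 'Vincent AI', 'o1']
```

This layout ensures every task is attempted under every condition by some group, so task difficulty is balanced across conditions.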
Experimental Design Details
Randomization Method
Randomization was done by a computer: observations were randomly sorted and then split into thirds to assign participants to groups.
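A hedged sketch of the described procedure (randomly sorting participants, then splitting the sorted list into thirds); the function name and seed are illustrative, not from the study's actual code:

```python
import random

def assign_groups(participant_ids, seed=None):
    """Randomly sort participants and split the sorted list into thirds,
    assigning Groups A, B, and C -- a sketch of the described procedure."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    third = len(shuffled) // 3
    return {
        "A": shuffled[:third],
        "B": shuffled[third:2 * third],
        "C": shuffled[2 * third:],
    }

groups = assign_groups(range(153), seed=0)
print([len(groups[g]) for g in "ABC"])  # [51, 51, 51]
```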
Randomization Unit
Individual
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
1, this was not a clustered randomization.
Sample size: planned number of observations
153
Sample size (or number of clusters) by treatment arms
51 in Group A, 51 in Group B, and 51 in Group C. Because conditions rotate across groups, each of the six tasks was completed by 51 participants with no AI, 51 with access to o1-preview, and 51 with Vincent AI.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB

Institutional Review Boards (IRBs)

IRB Name
University of Minnesota Human Research Protection Program
IRB Approval Date
2024-08-15
IRB Approval Number
STUDY00023073

Post-Trial

Post Trial Information

Study Withdrawal


Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials