AI EVALUATIONS AND SCREENING - A Detailed Study on Human-AI Collaboration in Screening Efficiency and Decision-Making, 2024

Last registered on May 09, 2024

Pre-Trial

Trial Information

General Information

Title
AI EVALUATIONS AND SCREENING - A Detailed Study on Human-AI Collaboration in Screening Efficiency and Decision-Making, 2024
RCT ID
AEARCTR-0013525
Initial registration date
April 29, 2024

First published
May 09, 2024, 1:56 PM EDT

Locations

There is information in this trial unavailable to the public.

Primary Investigator

Affiliation
HBS

Other Primary Investigator(s)

PI Affiliation
HBS
PI Affiliation
UW
PI Affiliation
HBS
PI Affiliation
HBS

Additional Trial Information

Status
In development
Start date
2024-04-30
End date
2025-06-30
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
This study investigates the integration of artificial intelligence (AI) in the screening processes of early-stage innovations, traditionally conducted by human evaluators, across various professional and competitive settings. Through a randomized controlled trial involving around 400 participants from the MIT Solve expert internal screener team and from community-leveraged startup screeners, this research explores whether AI-assisted human evaluators or AI-only evaluations enhance the efficiency and quality of decision-making compared to traditional human-only evaluations. Outcomes measured include the time efficiency of evaluations, consistency and convergence of decisions, evaluator confidence, and the overall quality of decisions across three conditions: control (no AI assistance), Treatment A (basic AI assistance), and Treatment B (advanced AI assistance providing detailed rationales). The findings aim to delineate the conditions under which human-AI collaboration optimizes evaluation outcomes of early-stage innovations, contributing to the broader discourse on effectively combining human intuition with AI's processing capabilities. This could have significant implications for fields requiring precise and timely assessments, such as academic research, grant funding, and competitive selection processes, enhancing both theoretical understanding and practical applications of AI in evaluative tasks.
External Link(s)

Registration Citation

Citation
Ayoubi, Charles et al. 2024. "AI EVALUATIONS AND SCREENING - A Detailed Study on Human-AI Collaboration in Screening Efficiency and Decision-Making, 2024." AEA RCT Registry. May 09. https://doi.org/10.1257/rct.13525-1.0
Experimental Details

Interventions

Intervention(s)
This study employs three distinct intervention strategies to assess the impact of AI-assisted decision-making in the evaluation of early-stage innovations:

Control Group: Participants will conduct evaluations without any technological assistance, mirroring traditional human-only evaluation processes.
Treatment A: This group involves the use of a generative AI tool that provides basic pass/fail recommendations to assist participants in their decision-making. This is intended to assess whether simple AI guidance can enhance the efficiency of evaluations compared to the traditional approach.
Treatment B: Participants in this group receive a more sophisticated form of AI assistance, which includes not only pass/fail recommendations but also detailed rationales behind each decision. This intervention aims to explore whether increased explainability and depth in AI guidance can improve decision quality and evaluator confidence more significantly than basic AI assistance.
Intervention Start Date
2024-04-30
Intervention End Date
2024-12-31

Primary Outcomes

Primary Outcomes (end points)
The primary outcomes for this study are designed to measure the direct effects of AI integration on the evaluation process:

Efficiency: Time taken to reach a decision for each submission, recorded in seconds.
Decision Consistency: The degree of uniformity in decisions across different evaluators, measured using inter-rater reliability statistics (see the illustrative sketch after this list).
Evaluator Confidence: A quantitative assessment of how confident evaluators feel about their decisions, measured on a Likert scale from 1 (not confident) to 7 (very confident).
Alignment with AI Recommendations: The degree to which participants in Treatments A and B align with the AI's recommendations.
Decision Quality: Evaluated by the alignment of the screeners' decisions with the selection decisions for the next stage of the evaluation process and with expert judges' quality evaluations of the submissions.
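
As an illustration of how the decision-consistency endpoint could be computed, the sketch below calculates Fleiss' kappa, one standard inter-rater reliability statistic, over hypothetical pass/fail screening decisions. The registration does not specify the statistic or data layout, so the long-format schema, the column names, and the choice of Fleiss' kappa are assumptions.

    # Illustrative sketch only: Fleiss' kappa for pass/fail screening decisions.
    # The long-format layout and column names below are hypothetical.
    import pandas as pd
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    # Hypothetical data: one row per screener-solution decision.
    decisions = pd.DataFrame({
        "solution_id": [1, 1, 1, 2, 2, 2],
        "screener_id": ["s1", "s2", "s3", "s1", "s2", "s3"],
        "decision":    ["pass", "pass", "fail", "fail", "fail", "fail"],
    })

    # Reshape to a solutions-by-screeners matrix of categorical ratings,
    # convert to per-solution category counts, and compute Fleiss' kappa.
    wide = decisions.pivot(index="solution_id", columns="screener_id", values="decision")
    counts, _categories = aggregate_raters(wide.to_numpy())
    print(fleiss_kappa(counts, method="fleiss"))

In practice, the same statistic could be computed separately within the control, Treatment A, and Treatment B arms to compare decision consistency across conditions.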
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Secondary outcomes of the study will investigate the broader impacts of AI integration on evaluators' perceptions and the decision-making process:

Perception of AI Utility: Evaluators' subjective ratings of how useful the AI was in assisting their decision-making, collected through post-evaluation surveys.
Trust in AI: Changes in evaluators' trust in AI technology, measured before and after the interventions.
Adoption Willingness: Evaluators' willingness to incorporate AI assistance in future evaluations, assessed at the end of the study.
Secondary Outcomes (explanation)
These outcomes will be assessed through a combination of quantitative surveys and qualitative interviews, designed to capture both the measurable changes in perception and the nuanced personal experiences of the evaluators with the AI tools.

Experimental Design

Experimental Design
The experimental design for this study is structured as a three-arm randomized controlled trial (RCT) to evaluate the impact of AI-assisted decision-making in the screening of submissions for a global health equity challenge. The design is intended to test the efficiency and effectiveness of AI interventions compared to traditional human-only evaluation processes.

Control Group: Participants in this group perform evaluations manually, without any AI tools, serving as the baseline against which the AI-assisted groups are compared. This mimics the traditional process currently employed in most evaluative settings, allowing for a direct assessment of the impact of AI interventions.
Treatment A: In this arm, participants receive basic AI assistance, which provides binary (pass/fail) recommendations for each submission. This treatment tests the hypothesis that even minimal AI involvement can streamline the evaluation process and reduce the time and cognitive load on human evaluators.
Treatment B: Participants in this group use a more advanced AI tool that not only suggests binary outcomes but also provides detailed rationales for each recommendation. This treatment is designed to explore whether deeper AI integration, which includes providing context and explanations, can improve the quality of decisions and increase evaluator confidence and alignment with expert decisions.
The study's design allows for a controlled comparison across different levels of AI assistance, providing insights into how different AI models might enhance or interfere with human cognitive processes in evaluative tasks.
Experimental Design Details
Not available
Randomization Method
Random assignment of participants to the intervention groups is achieved through a computerized random number generator, ensuring that the allocation is both random and concealed until the point of assignment.
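
For concreteness, a minimal sketch of such a computerized assignment appears below, assuming a seeded pseudo-random shuffle followed by round-robin allocation to keep the three arms balanced; the function name, seed, and balancing scheme are illustrative assumptions, not the study's actual implementation.

    # Minimal sketch of seeded random assignment of evaluators to three arms.
    # The seed value, function name, and round-robin balancing are assumptions.
    import random

    CONDITIONS = ["control", "treatment_a", "treatment_b"]

    def assign_conditions(evaluator_ids, seed=2024):
        # Seeded generator keeps the assignment reproducible.
        rng = random.Random(seed)
        shuffled = list(evaluator_ids)
        rng.shuffle(shuffled)
        # Deal shuffled evaluators out round-robin so arms stay roughly balanced.
        return {eid: CONDITIONS[i % len(CONDITIONS)] for i, eid in enumerate(shuffled)}

    # Example: 400 hypothetical evaluator IDs.
    assignment = assign_conditions(range(1, 401))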
Randomization Unit
The units of randomization in this study are the individual evaluators and the individual solutions, with each participant independently assigned to one or two of the three study conditions.
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
48 solutions and 400 screener participants.
Sample size: planned number of observations
Each evaluator (400 total) screens an average of 20 solutions (between 15 and 30), for a total of 8,000 screener-solution pairs.
Sample size (or number of clusters) by treatment arms
Balancing across the three conditions leads to around 2,650 observations per condition (control, Treatment A, and Treatment B).
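
For reference, the arithmetic implied by these figures (an illustrative check, assuming an even split across arms): 400 evaluators × 20 solutions each = 8,000 screener-solution pairs, and 8,000 / 3 ≈ 2,667 pairs per arm, in line with the roughly 2,650 reported above.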
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB

Institutional Review Boards (IRBs)

IRB Name
Harvard University-Area Committee on the Use of Human Subjects
IRB Approval Date
2024-04-26
IRB Approval Number
IRB24-0614