
Upskilling in Data Science with GenAI: How far can we go?

Last registered on March 14, 2024


Trial Information

General Information

Upskilling in Data Science with GenAI: How far can we go?
Initial registration date
March 01, 2024

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
March 13, 2024, 11:13 AM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Last updated
March 14, 2024, 12:48 AM EDT

Last updated is the most recent time when changes to the trial's registration were published.


There is information in this trial unavailable to the public.

Primary Investigator

University of Michigan - Ann Arbor

Other Primary Investigator(s)

PI Affiliation
PI Affiliation
PI Affiliation
PI Affiliation
PI Affiliation
PI Affiliation
PI Affiliation

Additional Trial Information

In development
Start date
End date
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Generative Artificial Intelligence (GenAI) is rapidly transforming the workplace. In a recent study, researchers showed that GenAI can improve worker productivity and output quality on tasks that are within GenAI capabilities (Dell'Acqua et al., 2023). In this study, we explore a new facet of human-GenAI interaction: the potential of using GenAI chatbots to upskill participants in a domain with which they are not familiar. We will use data science as the test case because it lies within the domain of expertise of ChatGPT/GPT-4 with Code Interpreter, which will be the tool used in the experiment. Moreover, we seek to identify areas where GenAI might fall short as an effective coach and where participants using ChatGPT will be unable to substitute for data science domain knowledge and expertise.
External Link(s)

Registration Citation

Abbadi, Mohamed et al. 2024. "Upskilling in Data Science with GenAI: How far can we go?" AEA RCT Registry. March 14.
Sponsors & Partners

There is information in this trial unavailable to the public.
Experimental Details


We plan to give participants in the treatment group access to ChatGPT-4, along with basic training on how to use it (less than 30 minutes). Participants in the control group will have access to general search tools (e.g., Google) but will not have access to ChatGPT.
Intervention Start Date
Intervention End Date

Primary Outcomes

Primary Outcomes (end points)
We measure the impact of the intervention on two types of outcomes: task-related and self-reported. For the task-related outcomes, we will measure the impact of ChatGPT on accuracy (correctness and approach) and time taken to complete three data science tasks (problem solving, statistical knowledge, and coding) by associates and consultants, compared to their counterparts in the control group without ChatGPT. We will also benchmark both the control and treatment groups against data scientists who do not have access to ChatGPT but do have access to general search engines (e.g., Google). For the self-reported outcomes, we will measure the impact of the intervention on participants' professional identity, beliefs about their career development, and perceived ability to complete data science tasks. We will compare self-reported outcomes between the pre- and post-surveys and across the control and treatment groups.
Primary Outcomes (explanation)
Upskilling in coding

There is one distinct correct answer for the coding assignment. Correctness is a binary measure: 0 if wrong and 1 if correct.
Code quality will be captured by a coding effort score, where a higher effort score means lower code quality:
Coding Effort Score = Σ (Errors encountered × (1 − ε/2))    (1)
where ε is 0 if the bug was not resolved and 1 if the bug was resolved.
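A minimal sketch of Equation (1), assuming each encountered error is logged with a 0/1 resolution flag (the data format is our assumption, not specified in the registration):

```python
def coding_effort_score(resolution_flags):
    """Coding effort score (Equation 1): sum of (1 - eps/2) over all
    errors encountered, where eps = 0 if the bug was not resolved and
    eps = 1 if it was. An unresolved bug contributes 1, a resolved one 0.5,
    so a higher score indicates lower code quality."""
    return sum(1 - eps / 2 for eps in resolution_flags)
```

For example, a session with two resolved bugs and one unresolved bug scores 0.5 + 0.5 + 1 = 2.0.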
Overall time to complete each task will be measured using Qualtrics functionality.

Upskilling in data-driven problem solving
The problem-solving task is designed to have numerous possible answers, some of which are better than others. We will use the answers submitted by the data scientists as the baseline/benchmark against which to grade the results of the associates and consultants. The data scientists' results will be manually graded by humans and categorically assigned a weight of good, better, or best to establish three baselines.
Specifically, the participants submit a predictability score for each soccer match. We will normalize the participants' answers for each match and calculate a loss score for the answers submitted by the associates and consultants relative to the data science benchmarks:

Loss Score = (1/n) Σ_{i=1}^{n} | Associate or consultant normalized predictability score_i − Baseline normalized predictability score_i |    (2)
where n is the number of soccer matches in the dataset.
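Equation (2) is a mean absolute deviation, which can be sketched as follows (assuming both score lists are already normalized and match-aligned):

```python
def loss_score(participant_scores, baseline_scores):
    """Loss score (Equation 2): mean absolute deviation between a
    participant's normalized predictability scores and one of the
    data-science baselines, averaged over the n matches."""
    if len(participant_scores) != len(baseline_scores):
        raise ValueError("score lists must cover the same matches")
    n = len(participant_scores)
    return sum(abs(p - b) for p, b in zip(participant_scores, baseline_scores)) / n
```

A lower loss score means the participant's predictions track the benchmark more closely.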

Upskilling in statistics
Each question in the statistics task will be graded against a rubric (shown in appendix section 8.3). The rubric scores are weighted correctness scores, so the final score is a weighted sum across all answers:

Total correctness = Σ_{i=1}^{n} (Correctness of answer_i × Complexity weight_i)    (3)

where n is the total number of distinct questions, Correctness of answer_i is the rubric score for question i, and Complexity weight_i reflects the level of complexity of question i. The complexity weights were determined by asking several lead data scientists, each with more than five years of experience, to rank the complexity of every question, and averaging across their rankings.
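Equation (3) reduces to a dot product of rubric scores and complexity weights, sketched here (the list representation is our assumption):

```python
def total_correctness(rubric_scores, complexity_weights):
    """Total correctness (Equation 3): weighted sum of per-question
    rubric correctness scores, where each weight is the averaged
    complexity ranking assigned by lead data scientists."""
    if len(rubric_scores) != len(complexity_weights):
        raise ValueError("one weight is required per question")
    return sum(s * w for s, w in zip(rubric_scores, complexity_weights))
```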

Secondary Outcomes

Secondary Outcomes (end points)
In addition to the primary outcomes, we plan to explore various factors that may influence the performance and outcomes of associates and consultants. These factors include grit, mental presence, and background in STEM or other data science-related disciplines such as statistics, mathematics, and economics.
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
We will conduct an online randomized controlled trial to assess the impact of GenAI on upskilling in data science. Initially, we will invite associates and consultants from a large global consulting firm to join a GenAI study via email. Associates and consultants who register will be randomly assigned to either the Control or the ChatGPT experimental condition. In parallel, we plan to recruit data scientists to participate in a similar exercise, serving as a benchmark for the typical performance of a data scientist.
After consenting, each participant will fill out a pre-survey, go through a short training session depending on experimental condition, complete two of three data science tasks (randomly assigned), with or without the help of ChatGPT depending on experimental condition, and fill out a post-survey.
Experimental Design Details
Not available
Randomization Method
We used computer-generated numbers to stratify and randomly assign our participants to a control or treatment arm.
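Stratified random assignment of this kind can be sketched as below; the stratum definition and the balancing rule are illustrative assumptions, since the registration does not specify them:

```python
import random
from collections import defaultdict

def stratified_assign(participants, stratum_of, seed=42):
    """Shuffle participants within each stratum using a seeded PRNG
    (computer-generated numbers) and split each stratum as evenly as
    possible between the control and treatment arms."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for p in participants:
        strata[stratum_of(p)].append(p)
    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        half = len(members) // 2
        for i, p in enumerate(members):
            assignment[p] = "control" if i < half else "treatment"
    return assignment
```

Stratifying before assignment guarantees that arm sizes within each stratum differ by at most one participant.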
Randomization Unit
Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters
No clustering
Sample size: planned number of observations
Around 900 participants
Sample size (or number of clusters) by treatment arms
Around 900
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
1/3 S.D.
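For reference, the textbook minimum detectable effect for a two-arm comparison of means is (z_{1-α/2} + z_{1-β}) · √(1/n_T + 1/n_C) in standard-deviation units. This is a generic illustration only; the registered 1/3 S.D. figure may additionally account for task-level randomization (each participant completes two of three tasks) and attrition:

```python
from statistics import NormalDist

def mde_sd_units(n_treat, n_control, alpha=0.05, power=0.80):
    """Textbook MDE in SD units for a two-sided test comparing two
    means, at significance level alpha and the given power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)          # ~0.84 for 80% power
    return (z_alpha + z_power) * (1 / n_treat + 1 / n_control) ** 0.5
```

Under these default parameters, an MDE of roughly 1/3 S.D. corresponds to about 142 participants per arm per comparison.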
Supporting Documents and Materials

There is information in this trial unavailable to the public.

Institutional Review Boards (IRBs)

IRB Name
IRB Approval Date
IRB Approval Number
Analysis Plan

There is information in this trial unavailable to the public.