Upskilling in Data Science with GenAI: How far can we go?

Generative Artificial Intelligence (GenAI) is changing the workplace rapidly. A recent study shows that GenAI can improve worker productivity and output quality on tasks that fall within GenAI capabilities (Dell'Acqua, et al., 2023). In this study, we explore a new facet of human-GenAI interaction: the potential of GenAI chatbots to upskill participants in a domain with which they are not familiar. We use Data Science as the test case because it is within the domain of expertise of ChatGPT/GPT-4 with Code Interpreter, the tool used in the experiment. Moreover, we seek to identify areas where GenAI falls short as an effective coach and where participants using ChatGPT cannot substitute for data science domain knowledge and expertise.

External Link(s)

Registration Citation

Citation

Abbadi, Mohamed et al. 2024. "Upskilling in Data Science with GenAI: How far can we go?." AEA RCT Registry. March 14. https://doi.org/10.1257/rct.13123-2.1

Sponsors & Partners

Experimental Details

Interventions

Intervention(s)

We plan to give participants in the treatment group access to ChatGPT-4 with basic training on how to use it (less than 30 minutes). Participants in the control group will have access to general search tools (e.g., Google) but won't have access to ChatGPT.

Intervention (Hidden)

Besides providing access to ChatGPT, we will give a short training on how to use it. The training includes an introduction to GPT, standard prompting, and analyzing data with ChatGPT Data Analyst. Details are included in the protocol.

Intervention Start Date

2024-03-15

Intervention End Date

2024-04-30

Primary Outcomes

Primary Outcomes (end points)

We measure the impact of the intervention on two types of outcomes: task-related and self-reported. For the task-related outcomes, we will measure the impact of ChatGPT on the accuracy (correctness and approach) and the time taken to complete three data science tasks (problem solving, statistical knowledge, and coding) by associates and consultants, compared to their counterparts in the control group without ChatGPT. We will also benchmark the task-related outcomes of both the control and the treatment groups against data scientists who do not have access to ChatGPT but have access to general search engines (e.g., Google). For the self-reported outcomes, we will measure the impact of the intervention on participants' professional identity, beliefs about their career development, and ability to complete data science tasks. We will compare self-reported outcomes between the pre- and post-survey and across control and treatment groups.

Primary Outcomes (explanation)

Upskilling in coding

There is one distinct correct answer for the coding assignment. Correctness is a binary measure: 0 if wrong and 1 if correct.
Code quality will be calculated through a coding effort score, where a higher effort score means lower code quality:
$$\text{Coding Effort Score} = \sum \left( \text{Errors encountered} \times \left(1 - \frac{\varepsilon}{2}\right) \right) \tag{1}$$
where ε is 0 if the bug was not resolved and 1 if the bug was resolved.
Overall time to complete each task will be measured using Qualtrics functionality
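
As a rough illustration of equation (1), the sketch below sums the error counts, halving the contribution of any bug that was resolved. The input layout (one `(errors_encountered, resolved)` pair per bug) and the function name are assumptions made for this example; the registration does not specify how errors are logged.

```python
from typing import Iterable, Tuple


def coding_effort_score(bugs: Iterable[Tuple[int, bool]]) -> float:
    """Sum errors_encountered * (1 - eps / 2) over all bugs.

    eps is 1 if the bug was resolved and 0 otherwise, so a resolved
    bug counts half as much as an unresolved one.
    The (errors_encountered, resolved) layout is an assumption.
    """
    total = 0.0
    for errors_encountered, resolved in bugs:
        eps = 1 if resolved else 0
        total += errors_encountered * (1 - eps / 2)
    return total


# Hypothetical log: two resolved bugs and one unresolved bug.
example_bugs = [(2, True), (1, False), (3, True)]
example_score = coding_effort_score(example_bugs)
print(example_score)  # 2*0.5 + 1*1.0 + 3*0.5 = 3.5
```

A higher value indicates more effort and, by the definition above, lower code quality.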

Upskilling in data-driven problem solving
The problem-solving task is designed to have numerous possible answers, some of which are better than others. We will use the answers submitted by the data scientists as the baseline/benchmark by which to grade the results of the associates and consultants. The data scientists' results will be manually graded and assigned a category of good, better, or best to establish 3 baselines.
Specifically, the participants submit a predictability score for each soccer match. We will normalize the participants' answers for each match and calculate a loss score for the answers submitted by the associates and consultants when compared to the data science benchmarks:
$$\text{Loss Score} = \frac{1}{n} \sum_{i=1}^{n} \left| \text{Associate or Consultant normalized predictability score}_i - \text{Baseline normalized predictability score}_i \right| \tag{2}$$
where n is the number of soccer matches in the dataset.
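
The loss score in equation (2) is a mean absolute difference between normalized scores. The sketch below shows one way to compute it. The registration does not state how scores are normalized, so the min-max scaling in `min_max_normalize` is purely an assumption for illustration, as are the function names and the sample values.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Scale scores to [0, 1]. ASSUMPTION: the source does not name a method."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: no spread to scale, so map everything to 0.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def loss_score(participant: Sequence[float], baseline: Sequence[float]) -> float:
    """Mean absolute difference between two normalized score vectors."""
    if len(participant) != len(baseline) or len(participant) == 0:
        raise ValueError("score vectors must be non-empty and of equal length")
    n = len(participant)
    return sum(abs(p - b) for p, b in zip(participant, baseline)) / n


# Hypothetical predictability scores for four matches.
participant_scores = min_max_normalize([10, 20, 30, 40])
baseline_scores = min_max_normalize([1, 2, 4, 4])
example_loss = loss_score(participant_scores, baseline_scores)
print(round(example_loss, 4))
```

Under this reading, the same participant vector would be scored once against each of the three baselines (good, better, best).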

Upskilling in statistics
Each question in the statistics task will be graded against the rubric (shown in appendix section 8.3). The rubric scores are a weighted correctness score such that the final score will be determined by a weighted sum across all answers:
$$\text{Total correctness} = \sum_{i=1}^{n} \text{Correctness of answer}_i \times \text{Complexity weight}_i \tag{3}$$
where n is the total number of distinct questions, the correctness of answer i is the rubric score for question i, and the complexity weight is the level of complexity of the question. The complexity weights were determined by asking several lead data scientists, each with more than 5 years of experience, to rank the complexity of each question and averaging across their answers.
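
A minimal sketch of equation (3) follows, including the averaging of expert rankings into complexity weights. The function names, the numeric scale of the rankings, and the sample values are assumptions for illustration; the actual rubric is in the appendix referenced above.

```python
from typing import List, Sequence


def average_complexity_weights(rankings: Sequence[Sequence[float]]) -> List[float]:
    """Average each question's complexity ranking across experts.

    rankings[e][q] is expert e's ranking for question q (hypothetical layout).
    """
    n_experts = len(rankings)
    return [sum(per_question) / n_experts for per_question in zip(*rankings)]


def total_correctness(correctness: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-question correctness scores."""
    if len(correctness) != len(weights):
        raise ValueError("need one weight per question")
    return sum(c * w for c, w in zip(correctness, weights))


# Two hypothetical experts ranking three questions.
example_weights = average_complexity_weights([[1, 2, 3], [1, 3, 2]])
# Hypothetical rubric scores: full, half, and no credit.
example_total = total_correctness([1.0, 0.5, 0.0], example_weights)
print(example_weights, example_total)
```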

Secondary Outcomes

Secondary Outcomes (end points)

In addition to the primary outcomes, we plan to explore various factors that may influence the performance and outcomes of associates and consultants. These factors include grit, mental presence, and background in STEM or other data science-related disciplines such as statistics, mathematics, and economics.

Secondary Outcomes (explanation)

Experimental Design

We will conduct an online randomized controlled trial to assess the impact of GenAI on upskilling in data science. Initially, we will invite associates and consultants from a large global consulting firm to join a GenAI study via email. Associates and consultants who register will be randomly assigned to either a Control or a ChatGPT experimental condition group. In parallel, we plan to recruit data scientists to participate in a similar exercise, serving as a benchmark for the typical performance of a data scientist.
After consenting, each participant will fill out a pre-survey, go through a short training session that depends on the experimental condition, complete 2 out of 3 data science tasks (randomly assigned) with or without the help of ChatGPT depending on the experimental condition, and fill out a post-survey.

Experimental Design Details

We will conduct an online randomized controlled trial to assess the impact of GenAI on upskilling in data science. Initially, we will invite Boston Consulting Group (BCG) associates and consultants to join a GenAI study via email. Associates and consultants who register will be randomly assigned to either a Control or a ChatGPT experimental condition group. In parallel, we plan to recruit data scientists to participate in a similar exercise, serving as a benchmark for the typical performance of a data scientist.

In the recruitment phase, we sent a survey to BCG's associates and consultants to gauge their interest in the study, offering participation as a contribution towards their career development. This survey collected information on demographics, programming and ChatGPT skills, technology openness, creativity, and learning orientation (Agarwal & Prasad, 1998; Miron, Erez, & Naveh, 2004; Jha & Bhattacharyya, 2013). Details of the survey are available in the Appendix (Registration survey).

Associates and consultants will be randomly assigned to a Control or ChatGPT experimental condition. We plan to stratify our sample across multiple dimensions, including gender, location, role (i.e., associate or consultant), coding skills, college degree (i.e., bachelor's, master's, Ph.D.), and experience with ChatGPT for coding. Prior to participation in the experiment, all subjects will be asked to consent to participate. We indicate that participation in the study is voluntary and that the time will count as an "office contribution" to their career development committee to reflect our appreciation for their efforts. We also provide additional incentives to encourage an ‘honest effort’ in the tasks. Top performers in each group will receive recognition among BCG leadership, an invitation to a small group chat with OpenAI, and OpenAI merchandise.

After consenting, participants will complete pre- and post-experiment surveys and engage in data science tasks designed to evaluate their knowledge. Control group members will not use ChatGPT or similar tools to complete these tasks, although they may use Google or other resources. Conversely, the ChatGPT group will be provided a brief ChatGPT training (15-20 minutes) and asked to use ChatGPT to assist with their responses.

The experiment includes four stages, starting with a pre-experiment survey on subjective coding skills, GenAI usage, professional identity, and career aspirations (Pre-survey). We developed three independent tasks to test knowledge in data science: a coding task (Coding task), a problem-solving task (Problem-solving task), and a statistical knowledge task (Statistical knowledge task). However, because each task requires substantial effort to complete (~90 minutes each), we will assign each participant two randomly selected tasks out of the three, with task order randomized to prevent ordering effects. Following task completion, a post-survey similar to the pre-survey will measure any change in participants' perceptions (Post-survey).
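
The task assignment described above (two of three tasks, in random order) could be sketched as follows. The task labels, the function name, and the seed are illustrative assumptions and do not describe the survey platform's actual randomizer.

```python
import random
from typing import List

# Hypothetical labels for the three tasks.
TASKS = ("coding", "problem_solving", "statistics")


def assign_tasks(rng: random.Random) -> List[str]:
    """Pick two of the three tasks without replacement.

    random.sample returns items in selection order, so the result is
    also a random ordering of the two chosen tasks.
    """
    return rng.sample(TASKS, 2)


rng = random.Random(2024)  # fixed seed only to make the sketch reproducible
example_assignments = [assign_tasks(rng) for _ in range(5)]
for tasks in example_assignments:
    print(tasks)
```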

Following participation in the experiment, we will conduct qualitative interviews with a selected sample of participants. In particular, we will target individuals who over-performed or under-performed in the data science tasks to better understand the underlying factors that influenced their outcomes.

Randomization Method

We used computer-generated numbers to stratify and randomly assign our participants to a control or treatment arm.
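
One common way to implement stratified random assignment is to group participants by stratum, shuffle within each group, and alternate arms. The sketch below assumes that approach; the registration does not describe the exact procedure, and the field names (`id`, `role`, `gender`), function name, and seed are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_assign(
    participants: Sequence[dict], strata_keys: Sequence[str], seed: int = 0
) -> Dict[str, str]:
    """Assign each participant id to 'treatment' or 'control' within strata."""
    rng = random.Random(seed)
    strata: Dict[tuple, List[str]] = defaultdict(list)
    for p in participants:
        strata[tuple(p[k] for k in strata_keys)].append(p["id"])

    assignment: Dict[str, str] = {}
    for key in sorted(strata):  # sorted for a reproducible iteration order
        ids = strata[key]
        rng.shuffle(ids)
        # Randomize which arm receives the extra person in odd-sized strata.
        offset = rng.randrange(2)
        for i, pid in enumerate(ids):
            assignment[pid] = "treatment" if (i + offset) % 2 == 0 else "control"
    return assignment


# Hypothetical roster: two strata of four participants each.
roster = [
    {"id": f"p{i}", "role": "associate" if i < 4 else "consultant", "gender": "f"}
    for i in range(8)
]
example_assignment = stratified_assign(roster, ["role", "gender"], seed=42)
print(example_assignment)
```

Alternating arms after the shuffle keeps the two arms balanced to within one participant inside every stratum.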

Randomization Unit

Individual

Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters

No clustering

Sample size: planned number of observations

Around 900 participants

Sample size (or number of clusters) by treatment arms

Around 900

Minimum detectable effect size for main outcomes (accounting for sample design and clustering)

1/3 S.D.
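
For orientation, the sketch below applies the standard normal-approximation formula for a two-arm comparison of means to an effect of 1/3 S.D. The significance level (0.05, two-sided) and power (0.80) are assumptions chosen for illustration; the registration does not state the values used, and the actual calculation may account for stratification and the two-of-three task design.

```python
import math
from statistics import NormalDist


def n_per_arm(effect_sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants needed per arm to detect a standardized mean difference.

    Uses n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2 for two equal arms.
    alpha and power defaults are ASSUMPTIONS, not values from the source.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_sd ** 2)


example_n = n_per_arm(1 / 3)
print(example_n)  # 142 per arm under these assumptions
```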

Supporting Documents and Materials

IRB

Institutional Review Boards (IRBs)

IRB Name

WCG IRB

IRB Approval Date

2024-03-05

IRB Approval Number

SUB-1988325

Analysis Plan

Post-Trial

Post Trial Information

Study Withdrawal

Intervention

Is the intervention completed?

Data Collection Complete

Data Publication

Is public data available?

Program Files

Reports, Papers & Other Materials
