Incentivizing Innovation in Open Source: Evidence from the GitHub Sponsors Program -- Survey Evidence

Last registered on July 17, 2024

Pre-Trial

Trial Information

General Information

Title
Incentivizing Innovation in Open Source: Evidence from the GitHub Sponsors Program -- Survey Evidence
RCT ID
AEARCTR-0013605
Initial registration date
July 14, 2024

Initial registration date is when the trial was registered. It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
July 17, 2024, 1:48 PM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Locations

Region

Primary Investigator

Affiliation
Columbia University

Other Primary Investigator(s)

PI Affiliation
Harvard Business School
PI Affiliation
IE University
PI Affiliation
Charles River Associates

Additional Trial Information

Status
In development
Start date
2024-05-13
End date
2024-09-30
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
Open source is key to innovation, but we know little about how to incentivize it. In this paper, we examine the impact of a program providing monetary incentives to motivate innovators to contribute to open source. The Sponsors program was introduced by GitHub in May 2019 and enabled organizations and individuals alike to reward developers for their open source work on the platform. To study this program, we collect fine-grained data on about 100,000 GitHub users, their activities, and sponsorship events. Using a difference-in-differences approach, we document two main effects. The first is that developers who opted into the program, a step that does not in itself entail receiving a financial reward, increased their output after the program’s launch. The second is that the actual receipt of sponsorship has a long-lasting negative effect on innovation, as measured by new repository creation, regardless of the amount of money received. We estimate a similar decline in other community-oriented tasks, but not in coding effort. While the program’s net effect on users’ innovative output appears to be positive, our study shows that receiving an extrinsic reward may crowd out developers’ intrinsic motivation, diverting their effort away from community- and service-oriented activities in open source.
External Link(s)

Registration Citation

Citation
Conti, Annamaria et al. 2024. "Incentivizing Innovation in Open Source: Evidence from the GitHub Sponsors Program -- Survey Evidence." AEA RCT Registry. July 17. https://doi.org/10.1257/rct.13605-1.0
Experimental Details

Interventions

Intervention(s)
Intervention (Hidden)
Our intervention focuses on a randomized experiment where survey-takers are provided with the following scenario:
"You have listed your profile under GitHub Sponsors, an initiative by GitHub that allows developers to receive financial payment for their open source contributions."

They are then randomly assigned to one of four scenarios:
1. After 6 months in the program, they receive no sponsorship.
2. After 6 months in the program, they receive a $20 sponsorship.
3. After 6 months in the program, they receive a $1,000 sponsorship.
4. After 6 months in the program, they receive a sponsorship from an organization.

They are then asked how motivated they would feel toward each of the following [Likert scale: 1 (Least Motivated) to 5 (Most Motivated)]:
1. Committing new code to existing open source repositories.
2. Resolving issues you find in others’ open source repositories.
3. Starting new original open source repositories from scratch.
4. Forking other open source repositories.

Because respondents are randomized across scenarios, we can observe how motivation for the same action varies with the extent of the sponsorship received, if it varies at all. Since we know whether each respondent is a sponsorable or a non-sponsorable, we can further examine how this effect differs across the two groups.

We also report and study survey information more generally.
Intervention Start Date
2024-07-11
Intervention End Date
2024-08-20

Primary Outcomes

Primary Outcomes (end points)
(Descriptive Question) The first question in our survey asks respondents to rate how important different benefits of contributing to open source software are in motivating them to contribute more, on a Likert scale from 1 (Not Important At All) to 5 (Very Important). This information is collected for each of the following:

1. Advancement of technology and the benefit of society.
2. Recognition from other open-source developers.
3. Improving opportunities for future jobs and promotions.
4. Making an extra income.


(Intervention) Additionally, as described in the intervention above, we aim to measure how motivated respondents are (on a Likert scale from 1 [Least Motivated] to 5 [Most Motivated]) to take each of the actions below, depending on the treatment they were assigned:

1. Committing new code to existing open source repositories.
2. Resolving issues you find in others’ open source repositories.
3. Starting new original open source repositories from scratch.
4. Forking other open source repositories.

These outcome variables will be analyzed using linear regression models, treating the Likert scores as continuous and regressing them on treatment-arm indicators, to assess the impact of the randomized sponsorship scenarios on respondents' motivation to engage in each open-source activity.
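As an illustration only (simulated, hypothetical data; the variable names and effect sizes below are our own and not drawn from the survey), the treatment-arm regression can be sketched as an OLS of a Likert outcome on arm indicators, with the no-sponsorship arm as the reference category:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical arms: 0 = no sponsorship (reference), 1 = $20, 2 = $1,000, 3 = organization
n_per_arm = 346  # planned size per arm under the optimistic estimate
arm = np.repeat(np.arange(4), n_per_arm)

# Simulated Likert responses (1-5); the real effects come from the survey, not this draw
latent = 3.0 + 0.1 * (arm == 1) + 0.3 * (arm == 2) + 0.2 * (arm == 3)
y = np.clip(np.round(latent + rng.normal(0, 1.2, arm.size)), 1, 5)

# Design matrix: intercept plus dummies for the three treated arms
X = np.column_stack([np.ones(arm.size)] + [(arm == k).astype(float) for k in (1, 2, 3)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# beta[0] is the mean Likert score in the reference arm; beta[1:] are arm effects
print(dict(zip(["const", "usd20", "usd1000", "org"], beta.round(2))))
```

Because assignment is randomized, the arm coefficients are unbiased estimates of the average treatment effects on stated motivation under this linear specification.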
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
We seek to understand GitHub users' familiarity with, and opinions of, the GitHub Sponsors Program through free-text responses and other post-hoc survey questions.

We also report splits of our data by sponsorables vs non-sponsorables.

Additionally, we ask respondents what best describes how they contribute to open source, choosing from the following options:
1. I maintain the code in projects written by others, such as closing the issues created by others.
2. I am an opinion leader in my domain of expertise, and I actively contribute to discussions where critical decisions are made for some open-source projects.
3. I invest in documenting and solving issues in the code, including those that others experience.
4. I start new projects and identify new needs open source can address; sometimes these projects may be carried forward by others later.
5. Other (Please Describe): __________

For those individuals who have received sponsorships under the GitHub Sponsors program, we ask how the sponsorship changed their contributions to open source. For those who have not, we ask whether they know anyone who received sponsorships under the program, and how observing someone else receive sponsorships changed their own contributions to open source.

Through our intervention, which varies the hypothetical amount and type of sponsorship received by the respondent, we aim to understand the effect of compensation on multiple labor outcomes (i.e., how respondents' motivation changes for the different ways one can contribute to open source). This is evaluated on a Likert scale from 1 (Least Motivated) to 5 (Most Motivated).

The analyses will also compare differences between groups (e.g., sponsorables vs. non-sponsorables) to understand how motivation levels vary based on the respondent's background.
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
This is a survey that aims to understand the motivations behind GitHub users' contributions to open source. The experimental design randomly assigns a treatment to each individual, where the treatments are scenarios with varying amounts and types of sponsorship received 6 months after enrolling in the GitHub Sponsors program.
Experimental Design Details
The experiment randomly assigns each individual to one of four intervention arms. Three of the arms involve scenarios in which the individual is instructed to assume they received varying amounts of sponsorship money ($0, $20, $1,000); the fourth specifies a sponsorship from an organization rather than a dollar amount. Respondents are then asked about their motivation to contribute to open source in various ways (such as starting a new open source repository), on a Likert scale from 1 (Least Motivated) to 5 (Most Motivated). We will use regression models to understand how motivation for different tasks varies with the sponsorship received.
Randomization Method
Randomization is performed online by Qualtrics, which decides which experimental arm to show each user.
Randomization Unit
Each individual is randomized into a treatment arm. The randomization does not depend on any individual attributes (i.e., no clustering affects the randomization).
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
Our survey focuses on individuals, and they are not clustered at any level.
Sample size: planned number of observations
We have email addresses for over 10,000 GitHub users who enrolled in the GitHub Sponsors Program and another 34,000 active GitHub users who did not enroll, all extracted from publicly available GitHub data. This puts the set of users we intend to survey at 44,238 individuals. However, not all of them are expected to respond. To estimate our expected number of responses, we randomly sampled 1,000 emails from our dataset and emailed the survey to those individuals. After a week, we had 17 responses in total; after a final reminder, the total rose to 36. Since all our questions were optional, we base our estimated response sizes on respondents who answered the primary outcome questions: between 16 and 32 people did so. Based on this range, we created two estimates of our expected number of observations:
[a] Optimistic estimate: (32/1000) * (44,238 - 1,000) = 1,384 individuals
[b] Pessimistic estimate: (16/1000) * (44,238 - 1,000) = 692 individuals
We plan to continue the approach of emailing individuals at 9 AM local time on a weekday, followed by a final reminder email one week later to those who have not responded. Due to the large size of the set, we plan to randomly split the remaining 43,238 users into 4 batches.
We will then email our users on the following staggered schedule:
[Thursday, first week] Initial email sent to the first batch of 10,809 people.
[Thursday, second week] Initial email sent to the second batch of 10,810 people; reminder email sent to the first batch.
[Thursday, third week] Initial email sent to the third batch of 10,809 people; reminder email sent to the second batch.
[Thursday, fourth week] Initial email sent to the fourth batch of 10,810 people; reminder email sent to the third batch.
[Thursday, fifth week] Reminder email sent to the fourth batch.
[Thursday, sixth week] We use the responses collected as of this date for our analysis.
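The sample-size estimates above reduce to a few lines of arithmetic; as a sketch of that calculation (not part of the survey pipeline), with batch sizes computed only up to ordering:

```python
# Pilot: 1,000 emails sent; 16-32 usable responses to the primary outcome questions
total = 44_238
pilot = 1_000
remaining = total - pilot  # 43,238 users left to contact

optimistic = round(32 / pilot * remaining)   # expected respondents, upper estimate
pessimistic = round(16 / pilot * remaining)  # expected respondents, lower estimate
per_arm = optimistic // 4                    # even split across the 4 treatment arms

# Four roughly equal weekly batches of the remaining users (order of the
# 10,809 / 10,810 sizes is arbitrary here)
batches = [remaining // 4 + (1 if i < remaining % 4 else 0) for i in range(4)]
print(optimistic, pessimistic, per_arm, batches)
```

This reproduces the figures in the registration: 1,384 and 692 expected observations, 346 per arm, and four batches summing to 43,238.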
Sample size (or number of clusters) by treatment arms
Our intervention involves four treatment arms, as described in the registration above. Since each arm is randomly assigned, we expect an even split of individuals across treatments. Based on the optimistic estimate described above of 1,384 individuals, we expect 346 individuals per treatment group:

346 individuals are told to assume they receive no sponsorship after 6 months of being sponsorable.
346 individuals are told to assume they receive a $20 sponsorship after 6 months of being sponsorable.
346 individuals are told to assume they receive a $1,000 sponsorship after 6 months of being sponsorable.
346 individuals are told to assume they receive a sponsorship from an organization after 6 months of being sponsorable.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Our power analysis estimates the minimum detectable effects (MDEs) for our primary outcomes based on the expected sample size of the survey. We take two approaches to estimating the expected sample size: [a] an optimistic approach, assuming that everyone who completed the survey answered all the outcome questions, and [b] a pessimistic approach, using the minimum number of people who answered the questions relevant to each outcome. Both approaches yield reasonable MDEs, allowing us to continue with our approach of treating the Likert scores as continuous variables and estimating effects using linear models. We do not have enough power when treating the Likert scale as binary. A copy of our calculations, using the means and standard deviations from our pilot, is attached to this registration in Excel format.

For the observational (descriptive) section of our survey, the 4 main outcomes (as described in the registration above) are:

1. Advancement of technology and the benefit of society: mean response 4.3 (Likert rating), SD 0.9; percentage-increase MDE 0.5% (optimistic), 0.7% (pessimistic).
2. Recognition from other open-source developers: mean response 3.1, SD 1.1; MDE 1.2% (optimistic), 1.7% (pessimistic).
3. Improving opportunities for future jobs and promotions: mean response 3.4, SD 1.2; MDE 1.1% (optimistic), 1.6% (pessimistic).
4. Making an extra income: mean response 2.6, SD 1.3; MDE 2.1% (optimistic), 2.9% (pessimistic).

For the intervention section of our survey, the 4 primary outcomes (as described in the registration above) are:

1. Committing new code to existing open source repositories: mean response 2.9, SD 1.2; MDE 2.2% (optimistic), 3.1% (pessimistic).
2. Resolving issues you find in others’ open source repositories: mean response 3.6, SD 1.2; MDE 1.4% (optimistic), 2.0% (pessimistic).
3. Starting new original open source repositories from scratch: mean response 3.5, SD 1.5; MDE 1.8% (optimistic), 2.6% (pessimistic).
4. Forking other open source repositories: mean response 3.3, SD 1.2; MDE 1.8% (optimistic), 2.5% (pessimistic).
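For orientation, a generic two-sided two-sample MDE formula (normal approximation) can be sketched as below. This is a textbook formula applied to the pilot standard deviations, not a reproduction of the Excel calculations attached to this registration, so its figures need not match the percentages reported above.

```python
from statistics import NormalDist

def mde_two_sample(sd, n1, n2, alpha=0.05, power=0.8):
    """Minimum detectable difference in means for a two-sided two-sample test
    (normal approximation): (z_{1-alpha/2} + z_{power}) * sd * sqrt(1/n1 + 1/n2)."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) * sd * (1 / n1 + 1 / n2) ** 0.5

# Example using the pilot SD for "making an extra income" (sd = 1.3) and the
# optimistic per-arm size of 346 respondents in each of two arms being compared
delta = mde_two_sample(sd=1.3, n1=346, n2=346)
print(round(delta, 2), "Likert points")  # about 0.28 under these assumptions
```

Larger arms shrink the MDE: comparing two pooled groups of 692 respondents each, for instance, yields a smaller detectable difference than the 346-per-arm case.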
Supporting Documents and Materials

There is information in this trial unavailable to the public.
IRB

Institutional Review Boards (IRBs)

IRB Name
Columbia University
IRB Approval Date
2024-04-02
IRB Approval Number
IRB-AAAV1428

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials