Motivating metadata contributions for data re-use and reproducibility
Initial registration date
August 02, 2020
Last updated
August 03, 2020 12:30 PM EDT
Primary Investigator
This information is not available to the public.
University of Michigan
Other Primary Investigator(s)
University of Michigan
University of Michigan
Additional Trial Information
Metadata is data about data. Originally introduced for archival purposes, it now serves a more active role: machine-readable metadata is crucial for making documented studies "findable" by modern search engines. We investigate different motivational treatments for soliciting metadata contributions. In our experiment, we approach authors who have published papers with online datasets and ask them to provide study-level metadata for the studies that they have published. Our investigation will also shed light on crowdsourcing as an approach to supplying high-quality metadata at scale.
Participants will receive personalized emails generated from templates corresponding to the different experimental conditions, all of which encourage them to contribute metadata for their published studies.
Intervention Start Date
Intervention End Date
Primary Outcomes (end points)
Individual's willingness to participate;
Article-level metadata contribution.
Primary Outcomes (explanation)
We measure outcomes based on the quantity and quality of the metadata content. Subjects fill in the metadata fields for their assigned article (their own paper) through a Supplemental Metadata interface, distributed via a unique link embedded in the treatment email.
Willingness to participate is recorded on an individual basis. In the body of the email, subjects choose either to participate or to opt out completely. We count those who proceed to our survey interface as willing to participate.
The other primary outcome is captured at the article level, as a count of the distinct metadata fields populated across all subjects associated with the article.
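As a minimal illustration of this article-level count (the field names, values, and data layout below are hypothetical, not the actual interface schema):

```python
# Hypothetical per-subject metadata submissions, keyed by article.
submissions = [
    {"article": "A1", "fields": {"abstract": "text", "keywords": "x; y", "units": ""}},
    {"article": "A1", "fields": {"abstract": "", "keywords": "", "units": "households"}},
    {"article": "A2", "fields": {"abstract": "text", "keywords": "", "units": ""}},
]

# Article-level outcome: number of distinct metadata fields populated
# by any subject belonging to that article.
populated: dict[str, set[str]] = {}
for sub in submissions:
    filled = {name for name, value in sub["fields"].items() if value.strip()}
    populated.setdefault(sub["article"], set()).update(filled)

counts = {article: len(fields) for article, fields in populated.items()}
print(counts)  # {'A1': 3, 'A2': 1}
```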
Secondary Outcomes (end points)
Willingness to directly update metadata; self-reported motivation for contributing metadata.
Secondary Outcomes (explanation)
The metadata for articles in this experiment is stored on a centralized platform to which none of the subjects currently has editing access. Through the survey interface, subjects can request direct access and supply more detailed metadata along with updated datasets.
We ask subjects to self-report their motivation through a checklist after they have finished editing the metadata fields. More details are documented in the Analysis Plan document.
Subjects will receive a personalized email inviting them to provide metadata for one of their published studies. We introduce one control message and two treatment variants. In the baseline template, we describe the task and invite subjects to provide metadata through a customized survey interface. In treatments 1 and 2, we insert one extra paragraph into the baseline template that explains the findability implications of metadata. Treatments 1 and 2 differ in how this added paragraph concludes.
Experimental Design Details
Randomization was done in office by a computer.
Relying on the coauthorship network, we extract a set of independent articles such that no two articles included in the experiment share any common authors. In practice, we remove "bridging" edges from the original network to generate a reduced network with fewer edges. We then randomly pick one article from each connected component of the reduced network and proceed to the randomization step.
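A minimal sketch of this reduction step, assuming networkx and an illustrative article-level graph in which an edge joins two articles that share an author (node labels are hypothetical):

```python
import random
import networkx as nx

# Illustrative article-level network: an edge joins two articles
# that have at least one author in common.
G = nx.Graph()
G.add_edges_from([("A1", "A2"), ("A2", "A3"), ("A3", "A1"), ("A3", "A4")])

# Remove "bridging" edges to split the network into more components.
G.remove_edges_from(list(nx.bridges(G)))

# Randomly pick one article from each connected component of the reduced network.
random.seed(0)
picked = [random.choice(sorted(component)) for component in nx.connected_components(G)]
print(picked)  # one article per component
```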
Because the articles are chosen from components of the reduced network, we block by the "origin" of the components as well as by the articles' year of publication and number of authors. Within each block, we reshuffle the list of articles; the shuffled blocks are then stacked into a single master list. Lastly, in the master list, we assign experimental conditions based on each article's index modulo 3.
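A minimal sketch of this block-then-stack assignment, with hypothetical block keys standing in for the combination of component origin, publication year, and author count:

```python
import random
from itertools import groupby

# Hypothetical (article_id, block_key) pairs; block_key encodes the
# component "origin", publication year, and number of authors.
articles = [("A1", "b1"), ("A2", "b1"), ("A3", "b2"), ("A4", "b2"), ("A5", "b2")]

random.seed(1)
master = []
# Reshuffle within each block, then stack the blocks into one master list.
for _, block in groupby(sorted(articles, key=lambda a: a[1]), key=lambda a: a[1]):
    block = list(block)
    random.shuffle(block)
    master.extend(block)

# Assign conditions by index modulo 3:
# 0 = baseline, 1 = treatment 1, 2 = treatment 2.
assignment = {article_id: index % 3 for index, (article_id, _) in enumerate(master)}
print(assignment)
```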
We cluster treatment at the article level: all authors of the same article are assigned to the same experimental condition.
Was the treatment clustered?
Yes
Sample size: planned number of clusters
1,458 articles (3 treatment arms × 486 clusters each)
Sample size: planned number of observations
Sample size (or number of clusters) by treatment arms
Each treatment arm has 486 clusters (articles).
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
10 percentage points
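As an illustrative back-of-the-envelope check (not the authors' power calculation), the 10-percentage-point figure is roughly consistent with 486 clusters per arm if one assumes a 50% baseline rate, α = 0.05, 80% power, and one effective observation per cluster, i.e. ignoring within-article clustering:

```python
from math import asin, sin, sqrt
from statsmodels.stats.power import NormalIndPower

# Smallest detectable effect size (Cohen's h) for a two-sample comparison
# with 486 units per arm, alpha = 0.05, power = 0.80.
h = NormalIndPower().solve_power(
    effect_size=None, nobs1=486, alpha=0.05, power=0.8, ratio=1.0,
    alternative="two-sided",
)

# Convert Cohen's h back into a proportion, assuming a 50% baseline rate.
p0 = 0.5
p1 = sin(asin(sqrt(p0)) + h / 2) ** 2
print(f"h = {h:.3f}, detectable rate ≈ {p1:.2f} vs. baseline {p0}")
# ≈ 0.59 vs. 0.50, i.e. about 9-10 percentage points.
```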
Institutional Review Boards (IRBs)
Health Sciences and Behavioral Sciences Institutional Review Board (IRB-HSBS)
IRB Approval Date
IRB Approval Number