
NIH Data Management and Sharing Policy: Repository Selection

This guide contains resources, contact information, and step-by-step instructions for MSK staff who engage with the NIH Data Management and Sharing Policy.

Selecting a Repository

A key component of the new NIH policy is support for data sharing. De-identified data and other research outputs should be submitted to FAIR-compliant public repositories. As part of the NIH DMSP, grant seekers must identify the specific repositories they will use for their data.

NIH offers suggested approaches to sharing, as well as recommendations for repository selection, on its Sharing Scientific Data site.

Some data types have discipline-specific repositories; for example, genomic data can be deposited in the Gene Expression Omnibus (GEO) or dbGaP. For other types of data, such as REDCap form outputs, there isn't necessarily an obvious repository to use. To support FAIR compliance, we strongly recommend selecting a repository that can assign your data submission a persistent identifier, such as a DOI.

If you are unsure which repository would be best for your research output, the following sites can help you search for and understand your options.

NIH Repository Selection page
https://sharing.nih.gov/data-management-and-sharing-policy/sharing-scientific-data/repositories-for-sharing-scientific-data

Registry of Research Data Repositories 
https://www.re3data.org/ 

Repository Finder 
https://repositoryfinder.datacite.org/ 

DataPortals.org 
http://dataportals.org/ 

Generalist Repositories 
If you know that your data doesn't fit into a discipline-specific repository, you'll likely want to submit it to a generalist repository. Some of these repositories are free, while others charge per submission. The table below shows the costs (as of 01/08/2023) and size limitations of some of the NIH-recommended generalist repositories. (Source: Nature Data Repository Guidance)

Repository name | Fees/costs | Size limits
Dryad | $120 USD for the first 20 GB, plus $50 USD for each additional 10 GB | None stated
figshare | 100 GB free per Scientific Data manuscript | 1 TB per dataset
Harvard Dataverse | Contact the repository for datasets over 1 TB | 2.5 GB per file, 10 GB per dataset
Mendeley Data | Open and secure cloud-based communal repository | 10 GB maximum per dataset for personal accounts
OSF (Open Science Framework) | Free of charge | 5 GB per file; multiple files can be uploaded
Zenodo | Donations toward sustainability encouraged | 50 GB per dataset
Science Data Bank | Free of charge | 8 GB per file; no limit on dataset size
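To compare options for a specific dataset, the fee and size figures in the table can be turned into a quick check. The sketch below uses hypothetical helpers (`dryad_fee_usd`, `repositories_that_fit`), not a tool any repository provides. It assumes that Dryad bills a partial 10 GB block as a full block, treats 1 TB as 1000 GB, and uses the Mendeley Data personal-account limit. Fees and limits change, so confirm current figures with each repository before budgeting.

```python
import math
from typing import Dict, List, Optional, Tuple

# Limits copied from the table above (as of 01/08/2023); confirm with each
# repository before relying on them. Values are in GB; None = no limit stated.
# Tuple layout: (max size per file, max size per dataset)
LIMITS_GB: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "Dryad": (None, None),
    "figshare": (None, 1000.0),      # 1 TB treated as 1000 GB (assumption)
    "Harvard Dataverse": (2.5, 10.0),
    "Mendeley Data": (None, 10.0),   # personal-account limit
    "OSF": (5.0, None),
    "Zenodo": (None, 50.0),
    "Science Data Bank": (8.0, None),
}


def dryad_fee_usd(dataset_gb: float) -> int:
    """Estimate Dryad's charge: $120 covers the first 20 GB, then $50 for
    each additional 10 GB. ASSUMPTION: a partial 10 GB block is billed in full."""
    extra_gb = max(0.0, dataset_gb - 20.0)
    return 120 + 50 * math.ceil(extra_gb / 10.0)


def repositories_that_fit(file_sizes_gb: List[float]) -> List[str]:
    """List repositories whose stated per-file and per-dataset limits
    accommodate the largest file and the dataset total."""
    largest, total = max(file_sizes_gb), sum(file_sizes_gb)
    fits = []
    for name, (max_file, max_dataset) in LIMITS_GB.items():
        if max_file is not None and largest > max_file:
            continue  # at least one file is too large
        if max_dataset is not None and total > max_dataset:
            continue  # the dataset as a whole is too large
        fits.append(name)
    return fits


if __name__ == "__main__":
    files = [3.0, 4.0]  # hypothetical dataset: two files, 7 GB in total
    print(repositories_that_fit(files))
    print(dryad_fee_usd(25.0))  # 20 GB base fee plus one extra 10 GB block
```

For the hypothetical 7 GB dataset, Harvard Dataverse drops out because one file exceeds its 2.5 GB per-file limit, and a 25 GB Dryad submission comes to $170 under the rounding assumption above.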

The General Repository Comparison Chart and FAIRsharing Collection is another useful resource. It is an outcome of the NIH Workshop on the Role of Generalist Repositories to Enhance Data Discoverability and Reuse, held in early February 2020. After the workshop, representatives of the participating generalist repositories collaborated to develop a tool researchers can use when deciding which generalist repository to select. The content is intended to be updated dynamically through their partnership with FAIRsharing. The American Geophysical Union designed the document and should be acknowledged for it. Version 3.0 (May 17, 2023) reflects the repositories' current services and supporting policies.

FAIR Principles

The FAIR principles (Findability, Accessibility, Interoperability, and Reusability) are a set of aspirational goals that all data should aim to meet. To learn more about the FAIR principles and how they can be applied to every aspect of your research, including repository selection, to improve data sharing and reuse, check out:

FAIRsharing.org 
GO FAIR

Digital Object Identifiers (DOI)

DOIs are a type of persistent identifier (PID) that can be applied to datasets as well as to published articles. They are issued through registration agencies, such as DataCite and Crossref, and recorded in an internationally recognized, searchable registry.
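A DOI has the form `10.<registrant code>/<suffix>`, and the central resolver at https://doi.org/ turns it into a link to the object's landing page. Because the same DOI is often written several ways ("doi:10...", "https://doi.org/10..."), reference tools typically store the bare DOI and build the link from it. The sketch below assumes only the listed prefixes need stripping; its regular expression is a loose sanity check rather than the full DOI syntax, and the example DOIs are hypothetical.

```python
import re
from urllib.parse import quote

# Loose sanity check for "10.<registrant code>/<suffix>"; this is NOT a full
# implementation of the DOI syntax rules.
DOI_PATTERN = re.compile(r"^10\.\d+(\.\d+)*/\S+$")

# Common forms that may precede the bare "10.xxxx/..." string.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(text: str) -> str:
    """Return the bare DOI in lowercase (DOI names are case-insensitive)."""
    doi = text.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    doi = doi.lower()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {text!r}")
    return doi


def resolver_url(text: str) -> str:
    """Build the doi.org link, percent-encoding characters unsafe in a URL path."""
    return "https://doi.org/" + quote(normalize_doi(text), safe="/")


if __name__ == "__main__":
    # One hypothetical DOI, written three different ways.
    for raw in ("10.1234/Example.5678", "doi:10.1234/example.5678",
                "https://doi.org/10.1234/EXAMPLE.5678"):
        print(resolver_url(raw))
```

All three inputs normalize to the same link, which is what lets a single dataset accumulate citations regardless of how authors wrote its DOI.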

DOIs for datasets:

  • Support the FAIR principles: Findability, Accessibility, Interoperability, and Reusability
  • Facilitate discovery, citation, and sharing of data
  • Enable preservation and planning beyond initial publication, improving the reproducibility of research
  • Show the broader picture of research by exposing the many-to-many relationship between research data and publications
  • Enhance reporting of data use and reuse for authors as well as data contributors
  • Provide disambiguation for datasets (similar to ORCID iDs for authors)
  • Philosophically, treat data as an entity with its own value distinct from associated publication(s)
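To make the citation benefit concrete: a dataset DOI lets a reference point at the data itself rather than only at an associated article. The sketch below assembles a citation string loosely following the pattern suggested in the DataCite Metadata Schema documentation (Creator (PublicationYear): Title. Version. Publisher. (ResourceType). Identifier). The `DatasetRecord` class, `format_citation` function, and all example values are hypothetical, and journals may require a different citation style.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical container for the metadata a dataset citation needs."""
    creators: List[str]          # "Family, Given" strings
    publication_year: int
    title: str
    publisher: str               # usually the repository, e.g. "Zenodo"
    doi: str                     # bare DOI, e.g. "10.1234/example.5678"
    version: Optional[str] = None
    resource_type: str = "Dataset"


def format_citation(rec: DatasetRecord) -> str:
    """Creator (PublicationYear): Title. Version. Publisher. (ResourceType). Identifier"""
    parts = [f"{'; '.join(rec.creators)} ({rec.publication_year}): {rec.title.rstrip('.')}."]
    if rec.version:
        parts.append(f"Version {rec.version}.")
    parts.append(f"{rec.publisher}.")
    parts.append(f"({rec.resource_type}).")
    # The DOI link is what makes the citation resolvable and trackable.
    parts.append(f"https://doi.org/{rec.doi}")
    return " ".join(parts)


if __name__ == "__main__":
    record = DatasetRecord(
        creators=["Doe, Jane", "Roe, Richard"],  # hypothetical authors
        publication_year=2023,
        title="Example imaging dataset",
        publisher="Zenodo",
        doi="10.1234/example.5678",              # hypothetical DOI
        version="1.0",
    )
    print(format_citation(record))
```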

At MSK, the Library maintains an institutional membership in DataCite, an organization through which we can register DOIs. We can also integrate this service with a local MSK repository to help you adhere to the FAIR principles.
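For readers curious about what DOI registration involves behind the scenes, the sketch below prepares (but does not send) a request to DataCite's REST API, which accepts JSON:API documents describing a DOI. The field names reflect our understanding of DataCite's public REST API guide and its mandatory metadata (identifier, creators, titles, publisher, publication year, resource type); check the current DataCite documentation, since the schema evolves. The DOI, repository ID, password, and landing URL are hypothetical placeholders. This is illustrative only; DOIs under MSK's membership are registered through the Library.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, List

# DataCite's test API; the production endpoint is https://api.datacite.org/dois.
DATACITE_TEST_URL = "https://api.test.datacite.org/dois"


def build_doi_payload(doi: str, title: str, creators: List[str], publisher: str,
                      year: int, landing_url: str,
                      publish: bool = False) -> Dict[str, Any]:
    """Assemble a JSON:API document with the core DataCite metadata fields."""
    attributes: Dict[str, Any] = {
        "doi": doi,
        "creators": [{"name": name} for name in creators],
        "titles": [{"title": title}],
        "publisher": publisher,
        "publicationYear": year,
        "types": {"resourceTypeGeneral": "Dataset"},
        "url": landing_url,  # landing page the DOI will resolve to
    }
    if publish:
        # Without an event the DOI stays a draft; "publish" makes it findable.
        attributes["event"] = "publish"
    return {"data": {"type": "dois", "attributes": attributes}}


def build_request(payload: Dict[str, Any], repository_id: str,
                  password: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated POST to the test API."""
    token = base64.b64encode(f"{repository_id}:{password}".encode()).decode()
    return urllib.request.Request(
        DATACITE_TEST_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/vnd.api+json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Leaving `publish` off creates a draft that can still be edited or deleted, which is a sensible default while metadata is being reviewed. Actually sending the request would be a separate `urllib.request.urlopen(request)` call with real credentials.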