
Data Repository Selection: What to look for

This guide contains information on how to compare and select repositories for research data.

Evaluating a repository

There are many factors to weigh when evaluating a repository for potential use. Ask yourself the following questions as you make your decision:

  • Does the repository accept, and ideally already hold a substantial collection of, data similar to the data you've generated?
  • What are the storage size limits? Keep in mind that many repositories set one limit for individual files and another for entire projects or datasets. Some repositories also let you buy or request additional storage.
  • Does the repository charge submission fees? Are they one-time or recurring? Will you bear the cost of storage, or will your institution? Does your institution have (or could it acquire) a subscription or premium service for the repository?
  • Is the repository FAIR compliant? 

Read more about how to find information to answer these questions in the boxes below.

Discipline-specific vs. generalist repositories

Some data types have discipline-specific repositories. If this applies to your data, it is generally better to deposit in an appropriate discipline-specific repository. You can learn more about these repositories and the data types they tend to cover on the Discipline Specific Repositories tab at the top of this guide.

If your data doesn't have a dedicated subject repository, or if you're working with many different types of data, you will probably be depositing in a generalist repository. There are many generalist repositories, so comparing your options is especially important in this scenario. You can read more about how to choose among them on the Generalist Repositories tab at the top of this guide.

Be FAIR (and CARE)

One of the most important things to look for in a repository is FAIR compliance. FAIR stands for Findable, Accessible, Interoperable, and Reusable, and refers to a set of guiding principles for data storage and management published by a group of researchers, librarians, and scientists in 2016. FAIR compliance is meant to help address the reproducibility crisis and to ensure that data are properly stored and cited.

CARE is another set of principles, created by the Global Indigenous Data Alliance, that focuses on the right and ability of Indigenous peoples to control how their data is used and shared. CARE stands for Collective Benefit, Authority to Control, Responsibility, and Ethics. Although CARE was developed to address tribal and Indigenous data sovereignty, researchers have advocated for its use in fields such as genomics and biodiversity.

Fees, storage space, and institutional subscriptions

Many repositories charge fees for submitting data, which help offset their storage and maintenance costs. Fees can range from a few dollars to a few hundred dollars, and may be a flat rate, tied to the amount of storage your project uses, or part of a subscription model. Other repositories encourage an institutional model, in which organizations pay for additional storage or services on behalf of their affiliated researchers.

If you cannot or don't want to pay for a repository, you still have options. Free repositories often have storage caps, but these can be generous: at Harvard Dataverse, for example, the limit can be as high as 1 TB.

You can use the comparison chart under the Generalist Repositories tab to compare the costs of nine major generalist repositories, many of which are entirely free.