Data best practices and case studies

How to use best practices for managing your research data, along with case studies and examples to help you use these techniques.

Share data

Data sharing is often a natural part of the research process. In addition, your funding agency may require that you share your data or make them publicly accessible. The US Federal Government requires that Federal agencies that fund research develop plans for increasing public access to the results of that research. Comprehensive information on all of these plans can be found at SPARC's Data Sharing Requirements by Federal Agency page.

Before sharing your data, consider not only the metadata you will need to provide to make the data easily understood, but also any intellectual property, privacy, copyright, and licensing issues that sharing may raise.

  • The Research Policy Handbook contains Stanford-specific information regarding intellectual property, including inventions, patents, licensing, copyright, and tangible research property.
  • If your research involves human subjects, you will need to consider confidentiality and privacy issues before sharing your data. See more about working with sensitive data.
  • If your research involves data created by others, you may or may not have the right to share these data. Check the license associated with these data, or ask the data owners, to verify what you can and can't do with them.

When it is time to preserve your data, you will need to carefully consider exactly which components of your research need to be preserved. Answering "yes" to any of the following questions about a particular data file or set of data may indicate that those data should be preserved for the long term.

  • Do the data support published research?
  • Are the data vulnerable?
  • Are the data required for your research but from another source (i.e., not your original research data)? If so, is the future availability of those data from the original source uncertain?
  • Do you wish, or are you required, to share your data?
  • Are the data historically significant?

In addition, consider whether you will need to preserve multiple versions of a file or whether the most recent version will be sufficient. It may also be important to consider whether the project is still in progress or complete. Long-term projects, such as those that involve sampling a single site repeatedly over months or years, may require periodic preservation of data before the project is actually considered "finished."
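The checklist above can be sketched as a small helper that records the answers for one dataset and reports which of them suggest long-term preservation. This is only an illustration: the class, field, and function names are hypothetical and are not part of any Stanford tool, and the two-part question about data from another source is folded into a single yes/no answer.

```python
from dataclasses import dataclass, fields


@dataclass
class PreservationChecklist:
    """Yes/no answers to the preservation questions (hypothetical field names)."""
    supports_published_research: bool = False
    vulnerable: bool = False
    # True only if the data come from another source AND their future
    # availability from that source is uncertain.
    external_source_at_risk: bool = False
    sharing_wanted_or_required: bool = False
    historically_significant: bool = False


def reasons_to_preserve(answers: PreservationChecklist) -> list[str]:
    """Return the names of all questions answered 'yes'."""
    return [f.name for f in fields(answers) if getattr(answers, f.name)]


def should_preserve(answers: PreservationChecklist) -> bool:
    """A single 'yes' is enough to flag the data for possible preservation."""
    return bool(reasons_to_preserve(answers))


if __name__ == "__main__":
    survey_data = PreservationChecklist(supports_published_research=True,
                                        sharing_wanted_or_required=True)
    print(should_preserve(survey_data))
    print(reasons_to_preserve(survey_data))
```

A "yes" here is a prompt to look more closely, not a final decision; version history and project status still need to be weighed by the researcher.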

A license defines what others may or may not do with your data. You may choose to assign a broad license that allows anyone to do whatever they like with your data, or you may assign a narrower license that restricts use to strictly non-commercial activities and requires attribution of the data creator whenever the data are used.

The two primary current sources for licenses are Creative Commons and Open Data Commons. If you elect to deposit data into the Stanford Digital Repository (see below), you will have the option to choose any of the licenses below or one of several software licenses. See the full list of licenses available in the SDR.

A number of licenses are also available specifically for open source software (OSS). If you are interested in an OSS license, choosealicense.com may be of help to you.

Preserve data at Stanford

The Stanford Digital Repository (SDR), a service offered by the Stanford University Libraries, provides digital preservation, hosting, and access services that enable Stanford researchers to preserve, manage, and share research data in a secure environment for long-term citation, access, and reuse.

Please note that the SDR is intended for use by Stanford researchers only. A valid SUNet ID is required to access our online deposit application.

By depositing data into the SDR you will:

  • Obtain a permanent, reliable web link (persistent URL) to your deposit that will not change over time
  • Control when research data are made public
  • Establish licensing rules around your data
  • Make your data easy for other researchers to discover
  • Ensure that the information is preserved into the future

As part of your data management plan, you may wish to preserve your research output in the Stanford Digital Repository. If so, please note the following important information:

Depositing data into the Stanford Digital Repository requires your acceptance of the Terms of Deposit set forth by the University and may incur associated fees. Before you commit to using the Stanford Digital Repository for your ongoing data preservation and access needs, please contact sdr-contact@lists.stanford.edu for detailed information regarding terms of deposit, terms of use, fee structure, as well as embargo and licensing options.

Download SDR Terms of Deposit (pdf)

If you do choose to use the Stanford Digital Repository, the following text describing the service and how your data will be safeguarded and made accessible may be copied and pasted directly into your data management plan's section on data preservation.

"Data will be preserved and shared via the Stanford Digital Repository (SDR), a service built and run by Stanford Libraries. Datasets and other content in any format can be deposited into the SDR and made available immediately to the world via open access licenses and at no cost to the end user. Depositors can choose to have a Digital Object Identifier (DOI) assigned to their dataset, which is available at a persistent URL along with the required metadata describing the content. ORCID iDs may be included for every author. Datasets are replicated multiple times and stored in geo-diverse locations on different media types, providing long-term data management and data integrity. The stable technical infrastructure and dedicated staffing by digital preservation experts, funded by the University, ensure the long-term availability of the datasets."

Note that this language is available within the Data Management Planning Tool, if you opt to use that method for writing your data management plan.

Preserve in a domain-specific repository

The FAIRsharing website has information about some journal requirements and recommendations for data repositories and standards. To talk with us about preparing data for submission to a repository, please contact us.

Search one of the resources below to find a suitable domain-specific repository for your research data.

  • re3data.org: A global registry of research data repositories from different academic disciplines.
  • FAIRsharing databases: A catalog of databases, described according to the BioDBcore guidelines, along with the standards used within them. Compiled in collaboration with the Nucleic Acids Research Database.

Case study

When you publish information about where others can access your data online, you want to make sure that the information you are giving them will always be correct. If you publish your data online in a place that is not going to be accessible in the future, others may discover that their efforts to find, view, and reuse your data are futile.

Excerpt from published article, image by Amy Hodge

Take the example of former Stanford researcher Malin Pinsky. In 2009, Malin published an article in the journal Conservation Biology (see bottom of page for reference). In this article, he provided information about how to find an additional list of sources and a database with citations. These items could be downloaded from a Stanford web site located in his personal AFS space.

Forbidden access message screen shot by Amy Hodge

This system worked fine until Malin left Stanford. When someone leaves Stanford, web sites within their AFS space are permanently disabled. Because of that, the links in the Conservation Biology article no longer directed researchers to the data they were looking for. Instead, they saw only a message that access to the site was forbidden. Contacting the site administrator or HelpSU, as the message suggested, would not have gotten anyone the data they were looking for.

Example persistent URL (PURL) page, screen shot by Amy Hodge

When the Stanford Digital Repository began accepting data from Stanford researchers, Malin's data files were deposited. They are now available on their own persistent URL (PURL) page. This URL is designed to be persistent, so Malin's data should be accessible here for a long time to come, no matter where he heads to next. (We even got the published link redirected to the PURL!)

If you have data that you would like to make easily accessible to others now and in the future, contact us about using the Stanford Digital Repository.

* Article reference: Pinsky, M. L., Springmeyer, D. B., Goslin, M. N. and Augerot, X. (2009), Range-Wide Selection of Catchments for Pacific Salmon Conservation. Conservation Biology, 23: 680–691. doi: 10.1111/j.1523-1739.2008.01156.x. Available at https://doi.org/10.1111/j.1523-1739.2008.01156.x
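The article reference above pairs a DOI with its https://doi.org/ link, which is another form of persistent link. As a minimal sketch, a DOI written in any of several common forms can be turned into that resolver link as shown below. The function name `doi_to_url` and its simple validity check are assumptions made for illustration; the code only builds the link and does not contact the resolver.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a doi.org link from a DOI given bare, as 'doi:...', or as a URL."""
    doi = doi.strip()
    # Strip a leading resolver address or 'doi:' label if one is present.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Rough sanity check only: DOIs start with '10.' and contain a '/'.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a recognizable DOI: {doi!r}")
    # Percent-encode any characters that are unsafe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi: 10.1111/j.1523-1739.2008.01156.x"))
```

Citing the resolver link rather than a personal web address avoids the broken-link problem described in the case study, because the identifier can be pointed at a new location if the content moves.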