
Data best practices and case studies

How to use best practices for managing your research data, along with case studies and examples to help you use these techniques.

Overview

Storage

Storage of data is typically done in an easily accessible secondary location. The data are usually mirrored, which means that the data in the secondary location are identical to the original version.
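As an illustration of mirroring, here is a minimal Python sketch that assumes both the original and the secondary location are reachable as ordinary folders on your computer. The function names `mirror` and `trees_match` are hypothetical. The sketch copies the original folder into the secondary one and then compares every file byte for byte to confirm that the two copies are identical.

```python
import filecmp
import shutil
from pathlib import Path


def trees_match(left: Path, right: Path) -> bool:
    """Return True if both folders contain the same files with identical bytes."""
    # ignore=[] so that no names (e.g. ".git") are skipped by filecmp's defaults.
    comparison = filecmp.dircmp(left, right, ignore=[])
    if (comparison.left_only or comparison.right_only
            or comparison.common_funny or comparison.funny_files):
        return False
    # shallow=False forces a byte-by-byte comparison instead of trusting
    # file size and modification time.
    _, mismatch, errors = filecmp.cmpfiles(
        left, right, comparison.common_files, shallow=False
    )
    if mismatch or errors:
        return False
    # Recurse into subfolders present on both sides.
    return all(trees_match(left / d, right / d) for d in comparison.common_dirs)


def mirror(original: Path, secondary: Path) -> bool:
    """Copy `original` into `secondary`, then verify the two are identical.

    Hypothetical helper: files deleted from `original` are not removed from
    `secondary`, so in that case the check reports False.
    """
    shutil.copytree(original, secondary, dirs_exist_ok=True)
    return trees_match(original, secondary)
```

Dedicated tools such as `rsync` also handle deleted files and interrupted transfers, which this sketch does not.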

When deciding what type of storage solution to use, you will need to think about several things, such as how much storage you need, what your budget is for storage, what platform you are using, and whether your data have security requirements.

Stanford's IT Services has an easy-to-use table of storage solutions that are available on campus. 

Backup

Backups are typically stored in a separate physical location, which may (but need not) be harder to access than your regular storage space. Backups are snapshots of the information in your files at a given point in time. Some backup systems keep only the most recent snapshot, while others keep multiple versions; the Time Machine software on a Mac, for example, keeps a series of dated backups so that you can restore earlier versions of a file.
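A point-in-time snapshot can be sketched in Python as a copy of your folder into a new, timestamped folder. This is a minimal illustration, not how Time Machine or any campus service works; the `snapshot` function, the folder naming scheme, and the `keep` parameter are hypothetical. Setting `keep=1` retains only the latest snapshot, while a larger value keeps a history of earlier versions.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot(source: Path, backup_root: Path, keep: int = 1) -> Path:
    """Copy `source` into a timestamped folder under `backup_root`.

    Hypothetical helper: only the newest `keep` snapshots are retained;
    older ones are deleted.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_root.mkdir(parents=True, exist_ok=True)
    # Fixed-width timestamp (down to microseconds), so sorting the folder
    # names alphabetically also sorts them from oldest to newest.
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S%f")
    target = backup_root / f"snapshot-{stamp}"
    shutil.copytree(source, target)  # raises FileExistsError if the name is taken

    snapshots = sorted(
        p for p in backup_root.iterdir()
        if p.is_dir() and p.name.startswith("snapshot-")
    )
    # Delete everything except the newest `keep` snapshots.
    for old in snapshots[:-keep]:
        shutil.rmtree(old)
    return target
```

Whichever retention rule you choose, `backup_root` should sit on a different physical device or in a different location than `source`, so that a single accident cannot destroy both.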

When deciding on what type of backup solution to use for your files, you will need to consider how often you intend to back up your files, whether you need one or multiple backups, and what your budget is for backups.

Stanford's IT Services has an easy-to-use table of backup solutions that are available on campus.

Preservation

Keep in mind that backing up your data is not the same as, nor is it a replacement for, long-term preservation. See our page on sharing files, which covers the Stanford Digital Repository, to learn more about preserving your data for the long term.

Considerations for sensitive data

Check out our page on working with sensitive data to learn more about data risk classifications and working with sensitive data.

University IT's website on risk classifications includes a table at the bottom of the page that shows which storage and backup options can be used for which categories of data.

Case study


[Image: Center for Advanced Study in the Behavioral Sciences at Stanford. Image by Linda A. Cicero/Stanford News Service]

"As Dr. Srinivas watched the charred fragments of his lifetime's work crumble in a fireman's hands and fall to the ground, the scholar wept."

Over fifty years later, the details of the story may be different, but a similar fate could be awaiting any researcher who lacks good data management practices. 

It was shortly after 5 a.m. on April 24, 1970, and the offices of Dr. Srinivas and nine other scholars at the Center for Advanced Study in the Behavioral Sciences at Stanford had just been firebombed.

Dr. Srinivas spent 1948, the year following India's independence from Britain, living without electricity, piped water, or privacy in the small rural village of Rampura. He contended with heat, dust, insects, smells, and vermin in order to study the complex interpersonal relationships of a multi-caste village, a way of life that was quickly disappearing as members of certain castes, particularly Brahmins, moved to urban areas.

Dr. Srinivas learned about rural village life, agriculture, social structures, relationships, and caste tensions. He also learned more about his own ancestral roots, since several generations earlier his family had moved from the nearby state of Tamil Nadu and settled in a village only three miles from Rampura. When his studies were complete, he spent another twenty years or so compiling and analyzing the comprehensive set of field notes that he had carefully collected.

By the time he came to Stanford, he had created three copies of his processed notes, handwritten on 5,000 notecards. His original field notes remained in India, but the three copies of his processed notes were, unknown to him, at risk.

It's a gut-wrenching tale. 

As he later recalled, "By a strange quirk of fate all the three copies of my fieldwork notes, processed over a period of eighteen years, were in my study at the Center when a fire was started by arsonists." 

Creating multiple backup copies of your research data is an important part of data management, but it is far less effective when all those copies are stored in the same place and are not well protected. No researcher wants to suffer the loss of decades of their hard work, but good data management does take time and effort. Data Management Services at Stanford University Libraries can help you create and execute a solid data management plan that will help ensure your data are around for as long as you need them. 

Remember, it doesn't take an arsonist or Hurricane Sandy to ruin years of research. One dropped laptop or spilled cup of coffee can be just as damaging if your data haven't been properly protected. 

Once your research project is completed, consider taking advantage of another Library service to preserve your data for the long term and share it with others: the Stanford Digital Repository (SDR). Content in the SDR is replicated multiple times, stored in geographically diverse locations on different types of media, and systematically audited to verify that every bit is precisely maintained. For each data deposit you'll receive a persistent URL that can be cited in publications (or elsewhere) and at which you and others will be able to access your data.
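Audits of this kind are commonly done with checksums. The Python sketch below is a simplified illustration, not the SDR's actual process: it records a SHA-256 checksum for every file in a JSON manifest, and a later audit reports any file that is missing or whose bytes have changed. The function names `write_manifest` and `audit` and the manifest format are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root: Path, manifest: Path) -> None:
    """Record a checksum for every file under `root`.

    Hypothetical format: {"relative/path": "sha256 hex digest", ...}.
    The manifest itself should be stored outside `root`.
    """
    sums = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    manifest.write_text(json.dumps(sums, indent=2))


def audit(root: Path, manifest: Path) -> list[str]:
    """Return the files listed in the manifest that are missing or altered.

    Files added after the manifest was written are not reported.
    """
    expected = json.loads(manifest.read_text())
    problems = []
    for rel, digest in expected.items():
        path = root / rel
        if not path.is_file() or sha256_of(path) != digest:
            problems.append(rel)
    return problems
```

An empty list from `audit` means every recorded file is still bit-for-bit intact; running it on a schedule is what turns a one-time copy into ongoing verification.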

In the end, Dr. Srinivas published The Remembered Village based largely on memories from his time in Rampura. Salvaging operations were undertaken: one to preserve the remains of the notecards in individual plastic sleeves for photographing, and another to microfilm the original handwritten diaries in India so they could be shipped to the United States and compared with the notes. But nothing could replace the years of original work.

So, whether it's the next tsunami or just a broken water line down the hall, make sure your data are safe by employing good data management practices!

Sources: