These can trip up other software that are trying to read a contiguous data table. They also make it unclear if the data were not gathered, were not reported for some reason, or were accidentally deleted. If there is no data for the cell, indicate why.
These make it appear as if data are missing. They may also indicate that there are multiple tables in the same worksheet; every table should be in its own worksheet.
These may not be readable by other software programs, and they may be modified when you export your data.
Ideally, every table should have a single header row in the first row. Each column header should be unique. One row of headers is best; some software may have problems with multiple header rows.
These are not maintained when data are exported as a .csv file, which may cause information to be lost or confused. It is best not to use merged cells.
These are not maintained when data are exported as a .csv file. Add an extra column where information indicated by colors can be captured.
These may be misinterpreted when data are exported as a .csv file. They may also indicate multiple pieces of data in the same cell (like “city, state”) that should be captured in separate columns. Use other separators if necessary.
These are not maintained when data are exported as a .csv file or if the file is opened in other programs. Create a new column for listing comments.
This can be very confusing, as can mixing numbers and text in the same column. Split data into multiple columns if necessary.
OpenRefine (formerly Google Refine) is a desktop application that can help you clean up messy or inconsistent data by helping you locate those inconsistencies and providing easy ways to fix them. It can also help you transform your data and extend it with web services. All editing operations are tracked, allowing you to roll back to previous versions.
This example is from a collection of research data collected by Science Data Librarian Amy Hodge from 1997-1999 for her dissertation research. At least five best practices are ignored in this first spreadsheet. See if you can spot the problems before you scroll down to reveal the answers!
This example is from a collection of research data collected by Science Data Librarian Amy Hodge from 1997-1999 for her dissertation research. At least three best practices are ignored in this second spreadsheet. See if you can spot the problems before you scroll down to reveal the answers!
Do not use colored text or cell shading. This formatting is lost when the data are exported or the file is opened in another program. Consider adding another column where the information indicated by the coloring and shading can be included in text form.
Do not use commas. When data are exported as a comma separated value (.csv) file, the commas within your data cells can cause confusion. They also sometimes indicate that multiple pieces of data (like city, state) are included in a single column. Separate data into multiple fields if appropriate or use another kind of separator (colon, semicolon -- but not a symbol!) if one is necessary.
Do not merge cells. Cell merging is typically lost when data are exported and is often misread by other software programs. This may result in the shifting of data cells in the affected rows or columns. In the case shown here, the cells in column 1 should be unmerged and the date information entered into the first cell in each row.