
Create metadata for your research project

How to create metadata for your research projects.

Create advanced metadata

To submit your research to a data repository, you may be required to format your metadata according to a metadata standard. Consult the repository you plan to use to determine its metadata requirements.

Metadata structures are often referred to as "schemas." A schema defines a set of elements for describing the data. The completed metadata are often recorded in a machine-readable format such as JSON or XML.

As an example, the Dublin Core Metadata Element Set contains 15 basic properties, listed below. Comments and explanations for all of the terms are available at the Dublin Core website.

If you are not using a standard metadata schema whose details are widely known and easily accessible to other researchers, be sure that you preserve the schema itself and its documentation, along with the data and metadata. By doing so, you will help ensure that you and others are able to fully understand and reuse your data in the future.

Contributor: An entity responsible for making contributions to the resource.
Coverage: The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant.
Creator: An entity primarily responsible for making the resource.
Date: A point or period of time associated with an event in the life cycle of the resource.
Description: An account of the resource.
Format: The file format, physical medium, or dimensions of the resource.
Identifier: An unambiguous reference to the resource within a given context.
Language: A language of the resource.
Publisher: An entity responsible for making the resource available.
Relation: A related resource.
Rights: Information about rights held in and over the resource.
Source: A related resource from which the described resource is derived.
Subject: The topic of the resource.
Title: A name given to the resource.
Type: The nature or genre of the resource.
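
To make the idea of machine-readable metadata concrete, here is a minimal Python sketch that takes a handful of the Dublin Core properties listed above and writes them out as both JSON and XML. The record values, the function names, and the "metadata" wrapper element are invented for illustration; a real repository will specify its own required fields and file layout, so treat this as a sketch rather than a submission template.

```python
import json
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record: every value below is made up for this example.
record = {
    "title": "Soil moisture measurements, example field site",
    "creator": "Doe, Jane",
    "date": "2023-06-01",
    "format": "text/csv",
    "language": "en",
    "subject": "soil moisture",
    "type": "Dataset",
}


def to_json(rec):
    """Serialize the record as a JSON object with sorted keys."""
    return json.dumps(rec, indent=2, sort_keys=True)


def to_xml(rec):
    """Serialize the record as XML, one dc:<property> element per entry.

    The outer "metadata" element is an assumption of this sketch, not
    something the Dublin Core standard requires.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in rec.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


json_text = to_json(record)
xml_text = to_xml(record)
print(json_text)
print(xml_text)
```

Both outputs carry the same information. The XML form attaches each property to the Dublin Core namespace, which is one way software can recognize which schema the element names come from.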

The following are several well-known and frequently used metadata standards.

Ontologies are shared vocabularies that describe the components of a particular discipline and the relationships among those components. By using ontologies, you make it easier for others (or even your future self) to understand your data. Controlled vocabularies, by contrast, are simply lists of predefined, authorized terms.

In addition to using a metadata standard, you may wish (or be required) to use ontologies or controlled vocabularies to create your metadata. For example, if you use the Dublin Core as your metadata schema, it is recommended that you use the list of Internet Media Types (MIME types), a controlled vocabulary, to enter information in the "Format" element. It is also recommended that you use a controlled vocabulary to enter the subject terms, but it is up to you to choose which vocabulary to use.
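
As a rough illustration of what using a controlled vocabulary means in practice, the following Python sketch accepts a "Format" value only if it appears in a list of authorized terms. The set below is a deliberately tiny, hand-picked subset of Internet Media Types, and the names ALLOWED_FORMATS and check_format are hypothetical; a real workflow would check against the full registry or whatever list your repository specifies.

```python
# Hypothetical, deliberately small subset of Internet Media Types.
ALLOWED_FORMATS = {
    "application/json",
    "application/pdf",
    "image/png",
    "text/csv",
    "text/plain",
}


def check_format(value, vocabulary=ALLOWED_FORMATS):
    """Return the normalized term if it is in the vocabulary.

    Raises ValueError for anything outside the authorized list, which is
    the point of a controlled vocabulary: free-text variants are rejected.
    """
    term = value.strip().lower()
    if term not in vocabulary:
        raise ValueError(f"{value!r} is not an authorized Format term")
    return term


print(check_format(" Text/CSV "))  # normalized to the authorized spelling

try:
    check_format("csv file")
except ValueError as err:
    print(err)
```

Because every record uses the same spelling for the same format, a search for "text/csv" finds all matching datasets instead of missing those labeled "CSV", "csv file", or "comma separated".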

Here are some examples of ontologies and controlled vocabularies currently in use in a variety of disciplines:

Case study

Domain-specific repositories, such as the Protein Data Bank (PDB), often require the submission of highly structured metadata along with data files. This is what enables users to perform specialized searches within these data repositories. For example, in PDB you can search for all the ligases from mice that were determined by X-ray crystallography at a resolution of 2.5 angstroms or better (that is, a resolution value of 2.5 Å or lower). If everyone submitted data in whatever format they wanted, this kind of searching would not be possible.
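
The search described above can be sketched in a few lines of Python to show why structured fields matter. Everything here is an assumption made for illustration: the Entry fields, the entry identifiers, and the values are invented, and this is not how the PDB stores or queries its records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical structured metadata for one deposited structure."""
    entry_id: str
    enzyme_class: str
    organism: str
    method: str
    resolution_angstrom: Optional[float]  # None when no resolution is reported


# Invented example entries, not real PDB records.
ENTRIES = [
    Entry("example-1", "ligase", "Mus musculus", "X-ray crystallography", 1.9),
    Entry("example-2", "ligase", "Mus musculus", "X-ray crystallography", 3.1),
    Entry("example-3", "ligase", "Homo sapiens", "X-ray crystallography", 2.0),
    Entry("example-4", "ligase", "Mus musculus", "solution NMR", None),
    Entry("example-5", "hydrolase", "Mus musculus", "X-ray crystallography", 1.5),
]


def search(entries, enzyme_class, organism, method, max_resolution):
    """Return entries matching every field, with resolution at or below the cutoff."""
    return [
        e for e in entries
        if e.enzyme_class == enzyme_class
        and e.organism == organism
        and e.method == method
        and e.resolution_angstrom is not None
        and e.resolution_angstrom <= max_resolution
    ]


hits = search(ENTRIES, "ligase", "Mus musculus", "X-ray crystallography", 2.5)
print([e.entry_id for e in hits])
```

The filter works only because each entry records organism, method, and resolution in the same named field and the same unit. If one depositor wrote "mouse" and another wrote the resolution as free text, the comparison would silently miss matching entries.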

Protein Data Bank search results

 

The image below shows a very small part of the metadata file for the crystal structure shown above. Some of these metadata files contain over 20,000 lines, many of which contain structure information generated during the experimental data capture. You can see that the metadata file includes specific categories that are filled in with specific data in defined formats.

Protein Data Bank metadata