Documentation or Metadata?

Many datasets and products are documented using approaches and tools developed by data collectors to support their analysis and understanding. This documentation exists in notebooks, scientific papers, web pages, user guides, word processing documents, spreadsheets, data dictionaries, PDF’s, databases, custom binary and ASCII formats, and almost any other conceivable form, each with associated storage and preservation strategies. This custom, often unstructured, approach may work well for independent investigators or in the confines of a particular laboratory or community, but it makes it difficult for users outside of these small groups to discover, use, and understand the data without consulting with its creators.

Metadata are standard and structured documentation.

Metadata are standard and structured documentation.

Metadata, in contrast to documentation, helps address discovery, use, and understanding by providing well-defined, structured content. This makes it possible for users to access and quickly understand many aspects of datasets that they have not collected. It also makes it possible to integrate information into discovery and analysis tools, and to provide consistent references from the metadata to external documentation.

Metadata standards provide standard element names and associated structures that can describe a wide variety of digital resources. The definitions and domain values are intended to be sufficiently generic to satisfy the metadata needs of various disciplines. These standards also include references to external documentation and well-defined mechanisms for adding structured information to address specific community needs.

Another important difference between documentation and metadata is the target audience. Documentation is targeted at humans and it relies heavily on our capability to make sense out of a variety of unstructured information. Metadata, on the other hand, is typically targeted at applications. Many of these applications facilitate searching metadata and displaying it in a way that facilitates data discovery by humans. As tools mature and, more importantly, the breadth of existing metadata increases, we will see more and more applications creating and using metadata to facilitate more sophisticated metadata and data driven discovery, comparisons between multiple datasets, and other analyses.

Of course, the audience is also very important when we create metadata. Humans like descriptions that help them understand the resources being described and citations to more, likely unstructured, information. Applications are generally much more demanding when it comes to consistency and completeness. It is important to consider both audiences when creating and improving metadata.

Note added: It is interesting to see that the word “documentation” has a much longer history than the word “metadata”. Metadata is really the new kid on the block.