Metadata standards (and schemas)

What is a metadata standard and schema?

You have likely already read the term FAIR principles on our website or heard about it in other research data management contexts.
One key ingredient for achieving FAIRness is metadata.
Metadata is data about data: it provides information that helps other users work with the data in various contexts. A schema defines how this metadata is structured, that is, which fields exist and what kind of information belongs in each, so that data is described in a consistent, machine-readable way.

Why do metadata standards matter?

If everyone uses their own language and format, metadata quickly becomes difficult to understand and reuse. To solve this, various metadata standards have been developed to provide a shared structure for describing data, concepts, and entities. Each standard defines a schema, a formal description of what classes and types exist and which properties are used to describe and connect them. In this way, metadata schemas create a common language for data and help to support all four FAIR principles: making data Findable, Accessible, Interoperable, and Reusable.

What kinds of metadata standards and schemas exist?

Metadata schemas exist in many forms: some are quite old, while others rely on modern technologies.
Metadata schemas were originally developed in libraries, for example to describe where to find a book by a certain author. These catalogues were books themselves, but they followed a clever structure that made searching fast and efficient.

Today, much metadata is stored in separate files alongside the data, such as READMEs or other descriptive documents. These files often do not follow a formal schema; when they do, strict rules define how the information must be represented. Common schema languages include JSON Schema, OWL, RDFS, and SHACL, and simpler schemas are sometimes maintained as spreadsheets. In semantic technologies, a metadata schema can also serve as the data model of a knowledge graph.
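To make the idea of a schema more concrete, here is a minimal sketch in Python. It assumes a hypothetical dataset schema written with a few JSON Schema keywords ("type", "properties", "required") and checks a record with a small hand-written function rather than a full JSON Schema validator. The field names, the sample record, and the helper check_record are illustrative assumptions and are not taken from any particular standard.

```python
import json

# Hypothetical schema using a small subset of JSON Schema keywords.
# The field names are illustrative, not part of any real standard.
DATASET_SCHEMA = {
    "type": "object",
    "required": ["title", "creator", "license"],
    "properties": {
        "title": {"type": "string"},
        "creator": {"type": "string"},
        "license": {"type": "string"},
        "keywords": {"type": "array"},
    },
}

# Mapping from JSON Schema type names to Python types (only the ones used here).
PYTHON_TYPES = {"string": str, "array": list, "object": dict}


def check_record(record, schema):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    # Every required field must be present.
    for field in schema["required"]:
        if field not in record:
            problems.append(f"missing required field: {field}")
    # Every field that is present must have the declared type.
    for field, rules in schema["properties"].items():
        expected = PYTHON_TYPES[rules["type"]]
        if field in record and not isinstance(record[field], expected):
            problems.append(f"field {field} should be of type {rules['type']}")
    return problems


# Made-up metadata record: no license, and keywords given as text, not a list.
record = json.loads(
    '{"title": "Hourly load profiles", "creator": "Example Lab", "keywords": "energy"}'
)
problems = check_record(record, DATASET_SCHEMA)
for problem in problems:
    print(problem)
```

The sample record fails on two counts: the license field is missing, and keywords is a single string instead of a list. A real platform would more likely rely on an established validator for its schema language, but the principle is the same: the schema states what is expected, and the check reports where a record deviates.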

A metadata standard uses one of these schema formats to encode the information needed to describe a thing, concept, or individual (e.g., a dataset) in line with that standard. Metadata standards themselves also vary: they may be defined by a standardization body (such as ISO), developed as an open-source project, or created by a company. They often target specific domains or topics, such as medicine or research publications.

How to use a metadata standard and schema?

The first step is to select an appropriate metadata standard. As a user, you typically choose a metadata standard rather than a schema as such, although the underlying technology of the schema may also influence your decision. Selecting the right standard depends largely on your use case; there is no single solution for all situations.

If you simply want to publish a dataset and provide basic metadata, you can choose a generic metadata standard such as DCAT-AP, DataCite, or schema.org. These are considered fundamental building blocks for describing research data or online data in general. They cover the key aspects of publicly available data but do not capture detailed, domain-specific information about your dataset. For that level of detail, you need either a domain-specific metadata standard or an extension profile that allows for more specialized descriptions. One example from the energy domain is the Open Energy Metadata Standard.
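For the generic case described above, the following sketch shows what a basic dataset description could look like when written with schema.org vocabulary and serialized as JSON-LD. It uses only Python's standard json module. All values (dataset name, description, creator) are made-up placeholders, and the selection of properties is an assumption about what a basic description might contain, not a required minimum.

```python
import json

# Basic dataset description using schema.org terms.
# All values are placeholders invented for this illustration.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Hourly load profiles (example)",
    "description": "Placeholder description of an example dataset.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Organization", "name": "Example Lab"},
    "keywords": ["energy", "load profile"],
}

# Serialize to a JSON-LD string that could be saved next to the data.
jsonld = json.dumps(dataset, indent=2)
print(jsonld)
```

A description like this covers who created the data, what it is about, and under which license it may be reused. It says nothing about domain-specific details such as units or temporal resolution, which is where a domain-specific standard or extension profile comes in.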

When you upload data to a platform, you use metadata standards or schemas to describe your dataset. The platform will guide you on how to provide this information. In some cases, you will need to upload a metadata file that the platform validates to ensure all required fields are completed. In others, you may fill out a web form before or after uploading your data.

A well-designed standard typically includes example values or helpful hints for providing the correct information. If further clarification is needed, most standards offer online documentation with detailed definitions and descriptions for each field to support you in creating accurate and complete metadata.

Relevance in the NFDI4Energy context

As NFDI4Energy, we want to empower you to do FAIR research, so we aim to provide guidance and tooling for your use cases.
To do so, we are building tools for handling metadata, which includes evaluating different types of metadata standards to see what works best for the energy domain.
Moreover, we develop domain-specific metadata schemas and extension profiles that enable precise descriptions of datasets. Services can use these descriptions to offer you better metadata and features.

Related task areas

  • Task Area 4 is responsible for the development of ontologies and metadata standards. This covers the choice of relevant metadata standards, the creation of domain-specific extension profiles, and the development of metadata-specific tooling.
  • Task Area 8 develops the services of NFDI4Energy and is therefore tasked with integrating the relevant standards and extension profiles into those services, so that the information can be processed for the users’ benefit.

Related NFDI4Energy services

At the moment, the Open Energy Platform and the Leibniz Data Manager for Energy are data platforms that require you to provide metadata about your dataset in a given schema.
The DBpedia Databus is a registry for metadata about datasets; it indexes this metadata to make datasets easier to search.

Further resources