
dirschema




A directory structure and metadata linter based on JSON Schema.

JSON Schema is great for validating (files containing) JSON objects that, for example, contain metadata, but these are only the smallest pieces in the organization of a whole directory structure, such as that of a dataset or project. A dataset of a certain kind might contain various types of data, with each file requiring different accompanying metadata based on its file type and/or location.

DirSchema combines JSON Schemas and regexes into a solution to enforce structural dependencies and metadata requirements in directories and directory-like archives. With it you can, for example, check that:

  • only files of a certain type are in a location (e.g. only jpg files in directory img)
  • for each data file there exists a metadata file (e.g. test.jpg has test.jpg_meta.json)
  • each metadata file is valid according to some JSON Schema

If validating these kinds of constraints looks appealing to you, this tool is for you!
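
To make these three kinds of constraints concrete, here is a minimal hand-written sketch of what checking them involves. It uses only the Python standard library and does not use DirSchema or its rule format. The names `check_directory`, `IMG_PATTERN`, `META_SUFFIX` and `REQUIRED_META_KEYS` are hypothetical, and the required-keys check is only a stand-in for real JSON Schema validation.

```python
import json
import re
from pathlib import Path

# Hypothetical rules for illustration only; this is NOT the DirSchema rule format.
IMG_PATTERN = re.compile(r"img/[^/]+\.jpg")   # files allowed directly in img/
META_SUFFIX = "_meta.json"                    # test.jpg -> test.jpg_meta.json
REQUIRED_META_KEYS = {"author", "license"}    # stand-in for a real JSON Schema


def check_directory(root: Path) -> list[str]:
    """Return human-readable violations found under root (empty if none)."""
    errors = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if rel.endswith(META_SUFFIX):
            continue  # metadata files are checked together with their data file

        # Constraint 1: only jpg files are allowed in directory img.
        if rel.startswith("img/") and not IMG_PATTERN.fullmatch(rel):
            errors.append(f"{rel}: only .jpg files are allowed in img/")
            continue

        # Constraint 2: each data file has an accompanying metadata file.
        meta = path.with_name(path.name + META_SUFFIX)
        if not meta.is_file():
            errors.append(f"{rel}: missing metadata file {meta.name}")
            continue

        # Constraint 3 (simplified): the metadata is a JSON object with the
        # required keys. A real check would validate against a JSON Schema.
        try:
            data = json.loads(meta.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            errors.append(f"{rel}: metadata file {meta.name} is not valid JSON")
            continue
        if not isinstance(data, dict):
            errors.append(f"{rel}: metadata in {meta.name} is not a JSON object")
            continue
        missing = REQUIRED_META_KEYS - data.keys()
        if missing:
            errors.append(f"{rel}: metadata lacks keys {sorted(missing)}")
    return errors
```

Calling `check_directory(Path("my_dataset"))` on a directory of your own returns a list of violation messages. Hard-coding rules like this gets unwieldy quickly, which is the gap a declarative combination of regexes and JSON Schemas is meant to fill.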

DirSchema features:

  • Built-in support for schemas and metadata stored as JSON or YAML
  • Built-in support for checking contents of ZIP and HDF5 archives
  • Extensible validation interface for advanced needs beyond JSON Schema
  • Both a Python library and a CLI tool to perform the validation

Usage

To get started, please check out the quickstart guide.

How to Cite

If you want to cite this project in your scientific work, please use the citation file in the repository.

Acknowledgements

We kindly thank all authors and contributors.



This project was developed at the Institute for Materials Data Science and Informatics (IAS-9) of the Jülich Research Center and funded by the Helmholtz Metadata Collaboration (HMC), an incubator-platform of the Helmholtz Association within the framework of the Information and Data Science strategic initiative.