Writing Metador Plugins
Your First Metador Plugin¶
In this tutorial you will learn how Metador plugins are defined by creating a new Python package that provides a simple custom metadata schema. We will keep it simple and focus on the general aspects which apply to any kind of Metador plugin. In later tutorials, we will discuss all the specifics for different kinds of plugins in much more depth.
Prerequisites:
- Working knowledge of the Linux shell (basic navigation and using CLI tools)
- Working knowledge of version control with git (creating and managing a repository)
Learning Goals:
- Learn how to create a new Python package that provides new Metador plugins
- Learn to define and register a new plugin providing a simple metadata schema
Creating a New Python Package¶
Metador plugins use the standard entrypoint system, which is widely supported and used in various Python projects. In order to use this system and correctly register plugins, you cannot just write some Python files, but must organize them as a proper Python package.
We recommend creating Python packages using a tool called poetry. It is also the tool used for the development of metador-core.
If you have no experience with entrypoints and Python packages, don't worry - we will guide you through the process of setting up a new package.
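To get a feeling for what the entrypoint system does, the following minimal sketch lists the names registered in an entry point group using only the Python standard library. The helper name list_entry_points is made up for this illustration and is not part of metador-core. The group name metador_schema is the one used later in this tutorial; the list is simply empty if no installed package has registered anything under it.

```python
from importlib.metadata import entry_points


def list_entry_points(group):
    """Return the sorted names of all entry points registered in a group.

    Hypothetical helper for illustration, not part of metador-core.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and newer: selectable interface
        selected = eps.select(group=group)
    else:
        # Python 3.8 / 3.9: plain dict mapping group name -> entry points
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


# Prints an empty list as long as no schema plugin package is installed.
print(list_entry_points("metador_schema"))
```

Running this again after you have completed the tutorial should show the entry point of your own plugin in the list.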
Install Poetry¶
First, make sure that you have poetry available. To install it, follow the steps in the documentation (this is important - it is not supposed to be installed using pip install). Check that poetry is installed by running poetry --version, which should reply with something like Poetry version 1.1.14.
Create the Package¶
First, navigate on the command line to the directory where you want to create your new metador plugin package, e.g.:
cd ~/projects # <- replace with path where your project should live
poetry new my-metador-plugin
cd my-metador-plugin # <- this directory was created by poetry for you
You should find yourself in your fresh project directory which already contains:
- a directory my_metador_plugin where your code will live
- a directory tests where you can place all the tests for your code (hopefully many!)
- an empty file README.rst (as a reminder that you should write a README)
- a pyproject.toml file that tracks all dependencies on other packages (!!!)
Put the directory under version control¶
We will not discuss this further, but at this point you should probably run git init in the project directory, add a .gitignore which is suitable for Python projects (e.g. this one) and do the first commit. In the future, also make sure to include changes to a file called poetry.lock in your commits (if there is one).
We assume that you are able to take care of proper version control for your project and will not mention it anymore.
Add general information to pyproject.toml¶
Make sure that the automatic authors entry is correct and has an e-mail address that can be used to contact you. Write a brief description of what this package will provide.
If you have already connected your local directory to a remote public git repository hosting service, such as GitLab or GitHub, also add the URL as repository_url just under the other fields under [tool.poetry] - this information is used by Metador to help users get your plugin in case they need it.
If you have some time, feel free to also add other package metadata.
Enter the virtual environment¶
Run poetry shell to activate a project-specific virtual environment. Poetry will create one for you if it does not exist yet.
You can see that you are in a virtual environment, because your command prompt in the terminal will begin with something like (my-metador-plugin-KQKVg0oX-py3.8). The name of every virtual environment that poetry creates starts with your project name and contains an automatic identifier (like KQKVg0oX).
In case you are used to running activate and deactivate for your virtual environments - these commands are not used together with poetry. Instead:
- Use poetry shell inside the project directory to activate (whenever you want to work on your package)
- Use exit anywhere to deactivate (e.g. when you want to switch to another project or some custom environment)
Add metador-core as a dependency¶
Run the following command to add metador-core as a dependency to your project:
poetry add git+ssh://git@github.com:Materials-Data-Science-and-Informatics/metador-core.git
As an early adopter, you install the current version from the main branch.
Make sure that your public SSH key is properly registered on GitHub to access the private repository.
If everything went smoothly (it can take a couple of minutes), then:
- the output of the poetry command contains a line like:
    * Installing metador-core (...)
- the pyproject.toml has a new entry for metador-core under [tool.poetry.dependencies]
Now we are ready to get started with development!
The Metador Plugin System¶
Before we go on and define our schema plugin, let us first take a closer look at the Metador plugin system.
Plugin Groups: Bags of Similar Plugins¶
Every Metador plugin belongs to a plugin group. Even plugin groups themselves are just plugins of a plugin group called plugingroup, and they can define new kinds of objects that can be provided as plugins (but this is an advanced topic that you will most likely never need to worry about). Probably the most important plugin group is the schema plugin group, as everything in Metador depends on schemas.
In general, if you want to define a new well-behaved plugin, but do not know the specifics of the relevant plugin group yet and no other guidance is provided, a good place to start is the documentation of the class where the plugin group itself is defined. It should contain all important information for writing a suitable plugin, that is, one that provides the expected interface and behavior.
The plugin system will try to check all plugins and validate them, to catch simple but common implementation mistakes. This includes things such as forgetting to implement a method or to set a required attribute. Nevertheless, you cannot rely on the automatic superficial checks. You are responsible for the correctness of your plugins, because most higher-level requirements for plugins cannot be checked automatically.
Anatomy of a Plugin¶
Each plugin group defines its own requirements for what each plugin must provide, but there is a minimum of requirements shared by all plugins - they must provide a special inner class that defines at least the plugin name and version. So the common skeleton of a plugin looks like this:
class MyNewPlugin:
    class Plugin:
        name = "my.newplugin"
        version = (0, 1, 0)
        # (... possibly other plugin group specific declarations ...)

    # (... all methods required by the plugin group ...)

    # (... implementation details of YOUR plugin ...)
Plugin naming¶
The name of a plugin, aside from being meaningful, must satisfy two properties:
The plugin entrypoint must be equal to name__x.y.z, i.e. name and version of the plugin¶
Luckily, this is one property that the plugin system will check for all plugins and warn you when you try to load them. Soon you will see how to declare the entrypoint for your plugin.
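As a small illustration of this convention, the sketch below builds the expected entrypoint name from the name and version declared in the inner Plugin class. The function entrypoint_name is a hypothetical helper written for this tutorial, not a function provided by metador-core.

```python
from typing import Tuple


def entrypoint_name(name: str, version: Tuple[int, int, int]) -> str:
    """Join plugin name and version as NAME__MAJOR.MINOR.PATCH.

    Hypothetical helper for illustration, not part of metador-core.
    """
    return f"{name}__{'.'.join(str(part) for part in version)}"


# Name and version taken from the plugin skeleton above.
print(entrypoint_name("my.newplugin", (0, 1, 0)))  # my.newplugin__0.1.0
```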
The plugin name must be unique within its plugin group¶
This property cannot be checked automatically. To avoid problems, you should not use too general or too short names for your plugins. Otherwise, there is a risk that someone else will pick the same name for their plugin, which can lead to serious problems. To avoid or at least constrain this problem, you must add a "namespace prefix" to the names of all plugins that you define.
This means: All your plugins should have names of the form MYPREFIX.PLUGINNAME, where MYPREFIX is your chosen "namespace prefix".
The prefix that you pick should be something that most likely other people will not use, but something short enough that people are not too annoyed by typing out the name of your plugin by hand (remember that metadata in containers is inspected by accessing the metadata based on the schema plugin name).
Suitable choices for a prefix are:
- your last name
- your username on GitHub
- the short name of your employer organization or institute
If you work for a larger organization, feel free to refine this namespacing approach to a suitable level that will prevent plugin name collisions, e.g. use plugin names such as my-org.my-dept.my-plugin.
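Since uniqueness cannot be checked automatically, you may want a simple sanity check of your own for the MYPREFIX.PLUGINNAME form. The following is only a sketch under the assumption that a dot separates the prefix from the rest of the name; has_namespace_prefix is a made-up helper and does not reflect any check that Metador itself performs.

```python
def has_namespace_prefix(plugin_name: str) -> bool:
    """Check that a name consists of at least two non-empty, dot-separated parts.

    Hypothetical helper for illustration, not a check performed by Metador.
    """
    parts = plugin_name.split(".")
    return len(parts) >= 2 and all(parts)


print(has_namespace_prefix("my-org.my-dept.my-plugin"))  # True
print(has_namespace_prefix("my-plugin"))  # False
```

A check like this cannot tell whether the prefix is actually unlikely to be used by others; that remains your responsibility.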
Plugin versioning¶
All plugins must respect semantic versioning, so a version is a triple MAJOR.MINOR.PATCH. For different plugin groups this translates to slightly different requirements, which are usually explicitly spelled out for the specific context of a plugin group. In general, semantic versioning means:
- You increase MAJOR by 1 whenever other things could break if they update to the new version
- You increase MINOR by 1 whenever you added new features without breaking anything
- You increase PATCH by 1 whenever you fixed problems in your plugin without changing its intended behaviour
The initial version you pick for your plugin is not important, but for consistency we recommend setting the first version to 0.1.0 (written as (0, 1, 0) in the example above), which is common practice in most software projects.
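The rules above can be sketched as a small function operating on version triples. This is a generic illustration of semantic versioning, where increasing one component resets the less significant ones to zero; the function bump is hypothetical and not part of metador-core.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def bump(version: Version, part: str) -> Version:
    """Return the next version after increasing the given part by 1.

    Hypothetical helper for illustration, not part of metador-core.
    """
    major, minor, patch = version
    if part == "major":
        # breaking change: reset MINOR and PATCH
        return (major + 1, 0, 0)
    if part == "minor":
        # new feature without breakage: reset PATCH
        return (major, minor + 1, 0)
    if part == "patch":
        # fix without change of intended behaviour
        return (major, minor, patch + 1)
    raise ValueError(f"unknown version part: {part}")


print(bump((0, 1, 0), "patch"))  # (0, 1, 1)
print(bump((0, 1, 1), "minor"))  # (0, 2, 0)
print(bump((0, 2, 0), "major"))  # (1, 0, 0)
```

Remember that whenever the version of a plugin changes, the entrypoint name must change with it.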
Defining the Schema Plugin¶
Now that we have gained insight into the general workings of the Metador plugin system, we are finally ready to get to work.
Write the Schema¶
In your project, create a new file schemas.py inside the my_metador_plugin directory (which currently only contains an __init__.py file) and add the following contents:
"""Metador schema plugins provided by this package."""
from metador_core.schema import MetadataSchema
from metador_core.schema.types import Int, Str


class MyFirstSchema(MetadataSchema):
    class Plugin:
        name = "dummy.my-first-schema"
        version = (0, 1, 0)

    magic_number: Int
    some_text: Str = "(no text)"
You can see that the minimal schema plugin is very close to the general "plugin skeleton" we discussed above.
What makes this a schema plugin is that our plugin class is a subclass of MetadataSchema and the inner Plugin class is a subclass of SchemaPlugin (which itself is a subclass of PluginBase that you have seen above). These are the minimal requirements imposed by the schema plugin group.
Schemas are defined mostly by using Python type hints - if you are familiar with dataclasses, then think of schemas as very fancy dataclasses. In another tutorial we will take a deep dive into schema development, but for now it is enough to know that our schema expects a metadata object that requires a field called magic_number, which must be an integer value, and also supports an optional field some_text, which if provided will override the default value "(no text)".
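To make the dataclass analogy concrete, here is a rough standard-library counterpart of the schema above. It is only an analogy: FirstSchemaSketch is a made-up name, it is not a Metador schema, and unlike a schema a plain dataclass does not validate the types of its fields.

```python
from dataclasses import asdict, dataclass


@dataclass
class FirstSchemaSketch:
    """Plain dataclass analogue of MyFirstSchema (illustration only)."""

    magic_number: int  # required, because it has no default value
    some_text: str = "(no text)"  # optional, falls back to the default


print(asdict(FirstSchemaSketch(magic_number=42)))
# {'magic_number': 42, 'some_text': '(no text)'}
print(asdict(FirstSchemaSketch(magic_number=1, some_text="hello")))
# {'magic_number': 1, 'some_text': 'hello'}
```

Leaving out magic_number raises a TypeError here, which mirrors the idea of a required field.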
Declare the Entrypoint¶
Now open your pyproject.toml file and define the entry point by adding these two lines (e.g. just before the [build-system] section):
[tool.poetry.plugins.metador_schema]
'dummy.my-first-schema__0.1.0' = "my_metador_plugin.schemas:MyFirstSchema"
The first line says that we want to declare a schema plugin (for plugin group X, the section would be [tool.poetry.plugins.metador_X]).
The second line declares the entry point, with our plugin name and version on the left and the location of our plugin on the right.
The location string on the right corresponds to how the class is imported: from my_metador_plugin.schemas import MyFirstSchema.
Finally, run poetry install, which will make poetry re-register your package and thus make the entrypoint known to the environment.
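The correspondence between a location string and an import can be sketched with the standard library. The helper resolve_location below is hypothetical and only mimics, in simplified form, what happens when an entry point of the form module:object is loaded; it is demonstrated on a standard library class so that it runs without your package being installed.

```python
from importlib import import_module


def resolve_location(location: str):
    """Import the module left of the colon and return the attribute right of it.

    Hypothetical helper for illustration, not part of metador-core.
    """
    module_name, _, attribute = location.partition(":")
    return getattr(import_module(module_name), attribute)


# Equivalent to: from collections import OrderedDict
ordered_dict_class = resolve_location("collections:OrderedDict")
print(ordered_dict_class.__name__)  # OrderedDict
```

In the same way, "my_metador_plugin.schemas:MyFirstSchema" would resolve to your schema class once the package is installed in the environment.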
We are done!¶
In order to see if everything worked, make sure that you are still inside poetry shell (remember that we said that you should work inside of it!) and try to import your schema in the python3 interpreter (lines A and B below).
If you run this notebook in the same virtual environment which is used for the plugin package, you can just restart the notebook and evaluate the following cell:
try:
    # run these two lines in your python interpreter:
    from metador_core.plugins import schemas  # A
    MyFirstSchema = schemas["dummy.my-first-schema"]  # B
    # ----
    print("Congratulations, your new plugin was registered correctly! :)")
except KeyError:
    print("Your plugin was not found :(")
Your plugin was not found :(
Assuming that you test in the regular python3 shell:
- If line A fails, then you are probably not in the correct virtual environment, because the interpreter cannot find metador_core.
- If line B fails, then you either made a mistake with the entry point in the pyproject.toml file, or forgot to run poetry install.
You might wonder why we always access a schema through the plugin system, instead of simply importing it. One reason is that we wanted to verify that the plugin is registered correctly. Another reason is that plugin code can be moved by developers within a package, or even into a completely different package. You should not need to know where a plugin comes from in order to use it. This is one of the main purposes of the Metador plugin system.
If you want, now you can try instantiating your schema in various ways to get a feeling for it:
print(MyFirstSchema(magic_number=42))
print(MyFirstSchema(magic_number=23, nonsense=True))
print(MyFirstSchema(magic_number=1, some_text="hello"))
{ "magic_number": 42, "some_text": "(no text)" }
{ "magic_number": 23, "some_text": "(no text)", "nonsense": true }
{ "magic_number": 1, "some_text": "hello" }
As an exercise, you can create a MetadorContainer and attach an instance of your metadata schema to a node - your schema is on equal footing with the default schemas you have already seen, and it can be used in exactly the same ways. In a later tutorial, you will learn how to make use of schema inheritance and make your schema aligned with semantic standards.
Summary¶
Python Packages¶
- Metador plugins must be provided by Python packages using the entrypoint system
- The easiest and most modern way to create a package is using the poetry tool
- Entrypoints are declared within the pyproject.toml file that contains all package metadata
Metador Plugins¶
- Each plugin belongs to a plugin group, which corresponds to an entry point group
- All plugins must have a unique name and possess a version tag that follows semantic versioning
- Plugins must satisfy additional requirements that depend on the respective plugin group