Skip to content

pg

Schema plugin group.

SchemaPlugin

Bases: PluginBase

Schema-specific Plugin section.

Source code in src/metador_core/schema/pg.py
18
19
20
21
22
23
24
25
26
27
28
29
30
class SchemaPlugin(PluginBase):
    """Schema-specific Plugin section."""

    auxiliary: bool = False
    """If set to True, the schema is considered auxiliary.

    The consequence is that metadata objects based on this schema cannot be
    attached to containers.

    The intended for schema plugins that are too general or not self-contained,
    but could be useful in a larger context, e.g. as a parent schema or nested
    schema.
    """

auxiliary class-attribute instance-attribute

auxiliary: bool = False

If set to True, the schema is considered auxiliary.

The consequence is that metadata objects based on this schema cannot be attached to containers.

The intended for schema plugins that are too general or not self-contained, but could be useful in a larger context, e.g. as a parent schema or nested schema.

PGSchema

Bases: PluginGroup[MetadataSchema]

Interface to access installed schema plugins.

All registered schema plugins can be used anywhere in a Metador container to annotate any group or dataset with metadata objects following that schema.

If you don't want that, do not register the schema as a plugin, but just use the schema class as a normal Python dependency. Schemas that are not registered as plugins still must inherit from MetadataSchema, to ensure that all required methods are available and work as expected by the system.

Unregistered schemas can be used as "abstract" parent schemas that cannot be instantiated in containers because they are too general to be useful, or for schemas that are not intended to be used on their own in the container, but model a meaningful metadata object that can be part of a larger schema.

Guidelines for field definition:

  • Stick to the following types to construct your field annotation:
  • basic types: (bool, int, float, str)
  • basic hints from typing: Optional, Literal, Union, Set, List, Tuple
  • default pydantic types (such as AnyHttpUrl)
  • default classes supported by pydantic (e.g. enum.Enum, datetime, etc.)
  • constrained types defined using the phantom package
  • valid schema classes (subclasses of MetadataSchema)

  • Optional is for values that are semantically missing, You must not assume that a None value represents anything else than that.

  • Prefer Set over List when order is irrelevant and duplicates are not needed

  • Avoid using plain Dict, always define a schema instead if you know the keys, unless you really need to "pass through" whatever is given, which is usually not necessary for schemas that you design from scratch.

  • Prefer types from phantom over using pydantic Field settings for expressing simple value constraints (e.g. minimal/maximal value or collection length, etc.), because phantom types can be subclassed to narrow them down.

  • In general, avoid using Field at all, except for defining an alias for attributes that are not valid as Python variables (e.g. @id or $schema).

  • When using Field, make sure to annotate it with typing_extensions.Annotated, instead of assigning the Field object to the field name.

Rules for schema versioning:

All schemas must be direct or indirect subclass of MetadataSchema.

Semantic versioning (MAJOR, MINOR, PATCH) is to be followed. Bumping a version component means incrementing it and resetting the later ones to 0. When updating a schema, you must bump:

  • PATCH, if you do not modify the set of parsable instances,

  • MINOR, if if your changes strictly increase parsable instances,

  • MAJOR otherwise, i.e. some older metadata might not be valid anymore.

If you update a nested or inherited schema to a version with higher X (MAJOR, MINOR or PATCH), the version of your schema must be bumped in X as well.

Rules for schema subclassing:

A child schema that only extends a parent with new fields is safe. To schemas that redefine parent fields additional rules apply:

EACH instance of a schema MUST also be parsable by the parent schema

This means that a child schema may only override parent fields with more specific types, i.e., only RESTRICT the set of acceptable values compared to the parent field (safe examples include adding new or narrowing existing bounds and constraints, or excluding some values that are allowed by the parent schema).

As automatically verifying this in full generality is not feasible, but the ability to "restrict" fields is very much needed in practical use, the schema developer MUST create suitable represantative test cases that check whether this property is satisfied.

Try expressing field value restrictions by:

  • removing alternatives from a Union
  • using a subclass of a schema or phantom type that was used in the parent

These can be automatically checked most of the time.

Source code in src/metador_core/schema/pg.py
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
class PGSchema(pg.PluginGroup[MetadataSchema]):
    """Interface to access installed schema plugins.

    All registered schema plugins can be used anywhere in a Metador container to
    annotate any group or dataset with metadata objects following that schema.

    If you don't want that, do not register the schema as a plugin, but just use
    the schema class as a normal Python dependency. Schemas that are not
    registered as plugins still must inherit from MetadataSchema, to ensure that
    all required methods are available and work as expected by the system.

    Unregistered schemas can be used as "abstract" parent schemas that cannot be
    instantiated in containers because they are too general to be useful, or for
    schemas that are not intended to be used on their own in the container, but
    model a meaningful metadata object that can be part of a larger schema.


    Guidelines for field definition:

    * Stick to the following types to construct your field annotation:
      - basic types: (`bool, int, float, str`)
      - basic hints from `typing`: `Optional, Literal, Union, Set, List, Tuple`
      - default pydantic types (such as `AnyHttpUrl`)
      - default classes supported by pydantic (e.g. `enum.Enum`, `datetime`, etc.)
      - constrained types defined using the `phantom` package
      - valid schema classes (subclasses of `MetadataSchema`)

    * `Optional` is for values that are semantically *missing*,
      You must not assume that a `None` value represents anything else than that.

    * Prefer `Set` over `List` when order is irrelevant and duplicates are not needed

    * Avoid using plain `Dict`, always define a schema instead if you know the keys,
      unless you really need to "pass through" whatever is given, which is usually
      not necessary for schemas that you design from scratch.

    * Prefer types from `phantom` over using pydantic `Field` settings for expressing
      simple value constraints (e.g. minimal/maximal value or collection length, etc.),
      because `phantom` types can be subclassed to narrow them down.

    * In general, avoid using `Field` at all, except for defining an `alias` for
      attributes that are not valid as Python variables (e.g. `@id` or `$schema`).

    * When using `Field`, make sure to annotate it with `typing_extensions.Annotated`,
      instead of assigning the `Field` object to the field name.


    Rules for schema versioning:

    All schemas must be direct or indirect subclass of `MetadataSchema`.

    Semantic versioning (MAJOR, MINOR, PATCH) is to be followed.
    Bumping a version component means incrementing it and resetting the
    later ones to 0. When updating a schema, you must bump:

    * PATCH, if you do not modify the set of parsable instances,

    * MINOR, if if your changes strictly increase parsable instances,

    * MAJOR otherwise, i.e. some older metadata might not be valid anymore.

    If you update a nested or inherited schema to a version
    with higher X (MAJOR, MINOR or PATCH), the version
    of your schema must be bumped in X as well.


    Rules for schema subclassing:

    A child schema that only extends a parent with new fields is safe.
    To schemas that redefine parent fields additional rules apply:

    EACH instance of a schema MUST also be parsable by the parent schema

    This means that a child schema may only override parent fields
    with more specific types, i.e., only RESTRICT the set of acceptable
    values compared to the parent field (safe examples include
    adding new or narrowing existing bounds and constraints,
    or excluding some values that are allowed by the parent schema).

    As automatically verifying this in full generality is not feasible, but the
    ability to "restrict" fields is very much needed in practical use, the
    schema developer MUST create suitable represantative test cases that check
    whether this property is satisfied.

    Try expressing field value restrictions by:

    * removing alternatives from a `Union`
    * using a subclass of a schema or `phantom` type that was used in the parent

    These can be automatically checked most of the time.
    """

    class Plugin:
        name = SCHEMA_GROUP_NAME
        version = (0, 1, 0)
        requires = [plugingroups.Plugin.ref()]
        plugin_class = MetadataSchema
        plugin_info_class = SchemaPlugin

    def __post_init__(self):
        self._parent_schema: Dict[
            Type[MetadataSchema], Optional[Type[MetadataSchema]]
        ] = {}
        self._parents: Dict[AnyPluginRef, List[AnyPluginRef]] = {}  # base plugins
        self._children: Dict[AnyPluginRef, Set[AnyPluginRef]] = {}  # subclass plugins

        # used schemas inside schemas
        self._field_types: Dict[
            Type[MetadataSchema], Dict[str, Set[Type[MetadataSchema]]]
        ] = {}
        self._subschemas: Dict[MetadataSchema, Set[MetadataSchema]] = {}

        # partial schema classes
        self._partials: Dict[MetadataSchema, PartialModel] = {}
        self._forwardrefs: Dict[str, MetadataSchema] = {}

    def plugin_deps(self, plugin) -> Set[AnyPluginRef]:
        self._parent_schema[plugin] = infer_parent(plugin)
        if pcls := self._parent_schema[plugin]:
            # make sure a parent schema plugin is initialized before the child
            info = pcls.Plugin
            return {self.PluginRef(name=info.name, version=info.version)}
        else:
            return set()

    def check_plugin(self, name: str, plugin: Type[MetadataSchema]):
        check_types(plugin)  # ensure that (overrides of) fields are valid

    def _compute_parent_path(self, plugin: Type[MetadataSchema]) -> List[AnyPluginRef]:
        ref = plugin.Plugin.ref()
        ret = [ref]
        curr = plugin
        parent = self._parent_schema[curr]
        while parent is not None:
            p_ref = parent.Plugin.ref()
            ret.append(p_ref)
            curr = self._get_unsafe(p_ref.name, p_ref.version)
            parent = self._parent_schema[curr]

        ret.reverse()
        return ret

    def init_plugin(self, plugin):
        # pre-compute parent schema path
        ref = plugin.Plugin.ref()
        self._parents[ref] = self._compute_parent_path(plugin)
        if ref not in self._children:
            self._children[ref] = set()

        # collect children schema set for all parents
        parents = self._parents[ref][:-1]
        for parent in parents:
            self._children[parent].add(ref)

    # ----

    def parent_path(
        self, schema, version: Optional[SemVerTuple] = None
    ) -> List[AnyPluginRef]:
        """Get sequence of registered parent schema plugins leading to the given schema.

        This sequence can be a subset of the parent sequences in the actual class
        hierarchy (not every subclass must be registered as a plugin).
        """
        name, vers = pg.plugin_args(schema, version, require_version=True)
        ref = self.PluginRef(name=name, version=vers)
        self._ensure_is_loaded(ref)
        return list(self._parents[ref])

    def children(
        self, schema, version: Optional[SemVerTuple] = None
    ) -> Set[AnyPluginRef]:
        """Get set of names of registered (strict) child schemas."""
        name, vers = pg.plugin_args(schema, version, require_version=True)
        ref = self.PluginRef(name=name, version=vers)
        self._ensure_is_loaded(ref)
        return set(self._children[ref])

parent_path

parent_path(
    schema, version: Optional[SemVerTuple] = None
) -> List[AnyPluginRef]

Get sequence of registered parent schema plugins leading to the given schema.

This sequence can be a subset of the parent sequences in the actual class hierarchy (not every subclass must be registered as a plugin).

Source code in src/metador_core/schema/pg.py
189
190
191
192
193
194
195
196
197
198
199
200
def parent_path(
    self, schema, version: Optional[SemVerTuple] = None
) -> List[AnyPluginRef]:
    """Get sequence of registered parent schema plugins leading to the given schema.

    This sequence can be a subset of the parent sequences in the actual class
    hierarchy (not every subclass must be registered as a plugin).
    """
    name, vers = pg.plugin_args(schema, version, require_version=True)
    ref = self.PluginRef(name=name, version=vers)
    self._ensure_is_loaded(ref)
    return list(self._parents[ref])

children

children(
    schema, version: Optional[SemVerTuple] = None
) -> Set[AnyPluginRef]

Get set of names of registered (strict) child schemas.

Source code in src/metador_core/schema/pg.py
202
203
204
205
206
207
208
209
def children(
    self, schema, version: Optional[SemVerTuple] = None
) -> Set[AnyPluginRef]:
    """Get set of names of registered (strict) child schemas."""
    name, vers = pg.plugin_args(schema, version, require_version=True)
    ref = self.PluginRef(name=name, version=vers)
    self._ensure_is_loaded(ref)
    return set(self._children[ref])