packer
Definition of HDF5 packer plugin interface.
Packer
Bases: ABC, EnforceOverrides
Interface to be implemented by Metador HDF5 packer plugins.
These plugins are how support for wildly different domain-specific use-cases can be added to Metador in an opt-in and loosely-coupled way.
Users can install only the packer plugins they need for their use-cases, and such plugins can be easily developed independently from the rest of the Metador tooling, as long as this interface is respected.
Carefully read the documentation for the required attributes and methods and implement them for your use-case in a subclass.
See metador_core.packer.example.GenericPacker for an example plugin.
Requirements for well-behaved packers:
- No closing of the container: The packer gets a writable record and is only responsible for performing the necessary additions, deletions and modifications. It is not allowed to close() the container.
- No access to data in the container: Data in the container MUST NOT be read or relied on for doing an update, as the nodes could be dummy stubs. One MAY rely on the existence or absence of Groups, Datasets, Attributes and Metadata in the container (e.g. using in or keys).
- Source directory is read-only: Files or directories inside of data_dir MUST NOT be created, deleted or modified by the packer.
- Exceptional termination: In case packing must be aborted, an exception MUST be raised. If the exception happened due to invalid data or metadata, it MUST be a DirValidationError object as in the other methods, helping to find and fix the problem. Otherwise, a different appropriate exception may be used.
- Semantic correctness: Packing a directory into a fresh container and updating an existing container MUST lead to the same observable result. If you cannot guarantee this in full generality, do not implement update. In that case, if a container is updated, it will be cleared and then pack is called on it, as if it were a fresh container. In this case, there is no space advantage gained over a fresh container (but it will keep its UUID).
- Semantic versioning of packers:
A packer MUST be able to update records that were created by this packer of the same or an earlier MINOR version.
More formally, the version MAJOR.MINOR.PATCH MUST adhere to the following contract:
  - increasing MAJOR means a break in backward-compatibility for older datasets (i.e. new packer cannot work with old records),
  - increasing MINOR means a break in forward-compatibility for newer datasets (i.e. older packers will not work with newer records),
  - increasing PATCH does not affect compatibility for datasets with the same MAJOR and MINOR version.
When the packer is updated, the Python package version MUST increase in a suitable way. As usual, whenever an earlier number is increased, the following numbers are reset to zero.
This means that PATCH should increase for changes such as bugfixes that do not change the structure or metadata stored in the dataset. MINOR should increase whenever older versions of the packer would no longer be able to produce a valid update for a dataset created with this version, while upgrading older existing datasets with this version still works. Finally, MAJOR is increased when all compatibility guarantees are off and the resulting container cannot be migrated or updated automatically.
You SHOULD provide tooling to migrate datasets between major versions.
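The versioning contract above can be read as a simple compatibility predicate. The following is a minimal sketch of that reading, not code from Metador itself; the names parse_version and can_update are hypothetical and only illustrate the rule that a packer can update records created with the same MAJOR and the same or an earlier MINOR version.

```python
from typing import Tuple

Version = Tuple[int, int, int]  # (MAJOR, MINOR, PATCH)


def parse_version(text: str) -> Version:
    """Parse a 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def can_update(packer: Version, record: Version) -> bool:
    """Return True if a packer of version `packer` may update a record
    that was created by the same packer at version `record`.

    Hypothetical helper illustrating the contract:
    - a different MAJOR breaks compatibility in both directions,
    - a record with a newer MINOR cannot be handled by an older packer,
    - PATCH never affects compatibility.
    """
    same_major = packer[0] == record[0]
    packer_not_older = packer[1] >= record[1]
    return same_major and packer_not_older
```

For instance, under this reading a packer at 1.2.0 may update a record created by 1.1.5, while a packer at 1.1.0 may not update a record created by 1.2.0.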
Source code in src/metador_core/packer/__init__.py
check_dir (abstractmethod, classmethod)
check_dir(data_dir: Path) -> DirValidationErrors
Check whether the given directory is suitable for packing with this plugin.
This method will be called before pack or update and MUST detect all problems (such as missing or invalid data or metadata) that can be expected to be fixed by the user in preparation for the packing.
More specifically, it MUST cover all metadata that is to be provided directly by the user (i.e. is not inferred or extracted from generated data) for the purpose of packing and SHOULD try to cover as many problems with data and metadata as possible to avoid failure during the actual packing process.
Files or directories inside of data_dir MUST NOT be created, deleted or modified by this method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_dir | Path | Directory containing all the data to be packed. | required |
Returns:
Type | Description |
---|---|
DirValidationErrors | DirValidationError initialized with a dict mapping file paths (relative to …). The errors must be either a string (containing a human-readable summary of all problems with that file), or another dict with more granular error messages, in case that the file is e.g. a JSON-compatible file subject to validation with JSON Schemas. |
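To illustrate the shape of such a check, here is a minimal sketch that collects problems into a plain dict mapping file paths to error strings. It deliberately does not use the real DirValidationErrors class; the function name collect_dir_errors, the required file meta.json, the required key title, and the choice to express paths relative to data_dir are all assumptions made for this example. Note that it only reads from the directory.

```python
import json
from pathlib import Path
from typing import Dict


def collect_dir_errors(data_dir: Path) -> Dict[str, str]:
    """Map file paths (here: relative to data_dir, an assumption) to a
    human-readable summary of the problems found in that file.

    The directory is only read, never modified.
    """
    errors: Dict[str, str] = {}
    meta_file = data_dir / "meta.json"  # hypothetical user-provided metadata
    rel_path = meta_file.relative_to(data_dir).as_posix()

    if not meta_file.is_file():
        errors[rel_path] = "required metadata file is missing"
        return errors

    try:
        meta = json.loads(meta_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        errors[rel_path] = f"not valid JSON: {exc.msg}"
        return errors

    # Hypothetical requirement: the metadata must be an object with a title.
    if not isinstance(meta, dict) or "title" not in meta:
        errors[rel_path] = "missing required key 'title'"
    return errors
```

An empty result means that no problems were found. A real plugin would wrap such a mapping in the error type expected by the interface.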
update (abstractmethod, classmethod)
update(mc: MetadorContainer, data_dir: Path, diff: DirDiff)
Update a MetadorContainer with changes done to the data source directory.
The container is assumed to be writable, and is either empty or was previously packed by a compatible version of the packer.
The data_dir is assumed to be suitable (according to check_dir).
The diff structure contains information about changed paths.
If not implemented, updates will be created by clearing the provided container and using pack on it.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mc | MetadorContainer | Metador IH5 record to pack the data into or update | required |
data_dir | Path | Directory containing all the data to be packed | required |
diff | DirDiff | Diff tree of dirs and files in data_dir compared to a previous state | required |
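The fallback described above (clear the container, then pack as if it were fresh) can be sketched as follows. This is an illustration only: a plain mutable mapping stands in for the container, and pack_files and update_or_repack are hypothetical names, not part of the Metador API.

```python
from pathlib import Path
from typing import Callable, MutableMapping, Optional

# Stand-in for a container: maps relative paths to file contents.
Container = MutableMapping[str, bytes]


def pack_files(container: Container, data_dir: Path) -> None:
    """Hypothetical 'pack': store every file of data_dir in the container."""
    for path in sorted(data_dir.rglob("*")):
        if path.is_file():
            key = path.relative_to(data_dir).as_posix()
            container[key] = path.read_bytes()


def update_or_repack(
    container: Container,
    data_dir: Path,
    update_fn: Optional[Callable[[Container, Path], None]] = None,
) -> None:
    """Use a dedicated update routine if one exists, otherwise fall back
    to clearing the container and packing it from scratch."""
    if update_fn is not None:
        update_fn(container, data_dir)
    else:
        container.clear()
        pack_files(container, data_dir)
```

With this fallback, updating a stale container yields the same observable content as packing into a fresh one, which is the property that a dedicated update implementation is required to preserve.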
pack (abstractmethod, classmethod)
pack(mc: MetadorContainer, data_dir: Path)
Pack a directory into a MetadorContainer.
The container is assumed to be writable and empty.
The data_dir is assumed to be suitable (according to check_dir).
If not implemented, initial packing is done using update with an empty container and a diff containing all the files.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mc | MetadorContainer | Metador IH5 record to pack the data into or update | required |
data_dir | Path | Directory containing all the data to be packed | required |
PackerInfo
Bases: MetadataSchema
Schema for info about the packer that was used to create a container.
source_dir (class-attribute, instance-attribute)
source_dir: DirHashsums = {}
Directory skeleton with hashsums of files at the time of packing.
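As a rough illustration of what a directory skeleton with hashsums could look like, the sketch below builds a nested dict that mirrors the directory tree and stores a checksum for each file. The actual structure of DirHashsums is not described here, so the nested-dict layout, the function name dir_hashsums, and the "sha256:" prefix are assumptions made for this example.

```python
import hashlib
from pathlib import Path
from typing import Any, Dict


def dir_hashsums(root: Path) -> Dict[str, Any]:
    """Return a nested dict mirroring the tree below `root`.

    Subdirectories map to nested dicts, files map to a string of the
    form 'sha256:<hexdigest>' (layout assumed for illustration).
    """
    tree: Dict[str, Any] = {}
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            tree[entry.name] = dir_hashsums(entry)
        elif entry.is_file():
            digest = hashlib.sha256(entry.read_bytes()).hexdigest()
            tree[entry.name] = f"sha256:{digest}"
    return tree
```

Comparing two such skeletons, one stored at packing time and one computed from the current directory, is one way to find out which paths have changed.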
Unclosable
Bases: wrapt.ObjectProxy
Wrapper to prevent packers from closing/completing a container file.
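The idea behind such a wrapper can be shown with a small hand-written proxy. This sketch does not use wrapt and is not the actual implementation; the class name UnclosableProxy and the choice of RuntimeError are assumptions for illustration. All attribute access is forwarded to the wrapped object, except close(), which is refused.

```python
from typing import Any


class UnclosableProxy:
    """Forward everything to the wrapped object, but refuse close()."""

    def __init__(self, wrapped: Any) -> None:
        self._wrapped = wrapped

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails, i.e. for attributes that
        # are not defined on the proxy itself.
        return getattr(self._wrapped, name)

    def close(self) -> None:
        # Exception type chosen for this sketch only.
        raise RuntimeError("packers must not close the container")
```

A packer that receives such a proxy can use the container as usual, while the caller keeps control over when the underlying file is actually closed.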
PGPacker
Bases: pg.PluginGroup[Packer]
Packer plugin group interface.
pack
pack(
packer_name: str,
data_dir: Path,
target: Path,
h5like_cls: Callable,
)
Pack a directory into a container using an installed packer.
packer_name must be an installed packer plugin.
data_dir must be an existing directory suitable for the packer.
target must be a non-existing path and will be passed into h5like_cls as-is.
h5like_cls must be a class compatible with MetadorContainer.
In case an exception happens during packing, notice that no cleanup is done.
The user is responsible for removing inconsistent files that were created.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
packer_name | str | installed packer plugin name | required |
data_dir | Path | data source directory | required |
target | Path | target path for resulting container | required |
h5like_cls | Callable | class to use for creating the container | required |
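Because no cleanup is done on failure, callers may want to wrap the packing call themselves. The following is a minimal sketch of such a wrapper under the assumption that the container ends up as a single file or directory at target; pack_with_cleanup and pack_fn are hypothetical names, where pack_fn stands for any callable that performs the actual packing into the given path.

```python
import shutil
from pathlib import Path
from typing import Callable


def pack_with_cleanup(pack_fn: Callable[[Path], None], target: Path) -> None:
    """Run pack_fn(target) and remove whatever was created at `target`
    if packing fails. The original exception is re-raised."""
    if target.exists():
        raise FileExistsError(f"target already exists: {target}")
    try:
        pack_fn(target)
    except BaseException:
        # Remove the partially written result, then propagate the error.
        if target.is_dir():
            shutil.rmtree(target)
        elif target.exists():
            target.unlink()
        raise
```

If the container class writes more than one file, the cleanup step would have to be adapted to remove all of them.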
update
update(
packer_name: str,
data_dir: Path,
target: Path,
h5like_cls: Path,
)
Update a container from its source directory using an installed packer.
Like pack, but the target must be a container which can be opened with the h5like_cls and was packed by a compatible packer.
In case an exception happens during packing, notice that no cleanup is done and if the container has been written to, the changes persist.
The user is responsible for removing inconsistent files that were created and ensuring that the previous state can be restored, e.g. from a backup.
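One way to make sure the previous state can be restored is to take a backup before updating. The sketch below assumes, for simplicity, that the container is a single file at target; update_with_backup and update_fn are hypothetical names, where update_fn stands for any callable that performs the actual update in place.

```python
import shutil
from pathlib import Path
from typing import Callable


def update_with_backup(update_fn: Callable[[Path], None], target: Path) -> None:
    """Copy `target` aside, run update_fn(target), and restore the copy
    if the update fails. The original exception is re-raised."""
    backup = target.with_name(target.name + ".bak")
    shutil.copy2(target, backup)
    try:
        update_fn(target)
    except BaseException:
        # Put the previous state back before propagating the error.
        shutil.copy2(backup, target)
        raise
    finally:
        # The backup is only needed while the update is running.
        backup.unlink()
```

For containers that consist of several files, the same approach applies, but every file belonging to the container has to be backed up and restored.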