Coverage for src/metador_core/container/__init__.py: 100%

4 statements  

coverage.py v7.3.2, created at 2023-11-02 09:33 +0000

"""Metador interface to manage metadata in HDF5 containers.

Works with plain h5py.File and IH5Record subclasses and can be extended to work
with any type of archive providing the required functions in an h5py-like interface.

When assembling a container, compliance with the Metador container specification is
ensured by using it through the MetadorContainer interface.

Technical Metador container specification (not required for users):

Metador uses only HDF5 Groups and Datasets. We call both kinds of objects Nodes.
Note that HardLinks, SymLinks, ExternalLinks or region references cannot be used.

Users are free to lay out data in the container as they please, with one exception:
a user-defined Node MUST NOT have a name starting with "metador_".
"metador_" is a reserved prefix for Group and Dataset names used to manage
technical bookkeeping structures that are needed for providing all container features.
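
As an illustration of the naming rule above, the following minimal sketch rejects
user-chosen paths that use the reserved prefix. The helper name is hypothetical and not
part of the Metador API; it assumes "/"-separated absolute paths.

```python
RESERVED_PREFIX = "metador_"


def is_valid_user_path(path):
    # Hypothetical helper: every component of the path names a node, so none
    # of them may start with the reserved prefix.
    components = [part for part in path.split("/") if part]
    return not any(part.startswith(RESERVED_PREFIX) for part in components)


print(is_valid_user_path("/foo/bar"))               # True
print(is_valid_user_path("/foo/metador_meta_bar"))  # False
```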

For each HDF5 Group or Dataset there MAY exist a corresponding
Group for Metador-compatible metadata that is prefixed with "metador_meta_".

For "/foo/bar" the metadata is to be found...
    ...in a group "/foo/metador_meta_bar", if "/foo/bar" is a dataset,
    ...in a group "/foo/bar/metador_meta_" if it is a group.
We write meta("/foo/bar") to denote that group.
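
The meta(...) notation can be sketched as a small path function. This is only an
illustration under the assumption of normalized absolute paths without a trailing
slash; the function name and the is_dataset flag are hypothetical, not Metador API.

```python
import posixpath

META_PREFIX = "metador_meta_"


def meta_path(node_path, is_dataset):
    # Hypothetical helper mirroring the two cases described above.
    if is_dataset:
        # Dataset: the metadata group is a sibling named "metador_meta_<name>".
        parent, name = posixpath.split(node_path)
        return posixpath.join(parent, META_PREFIX + name)
    # Group: the metadata group is a child named exactly "metador_meta_".
    return posixpath.join(node_path, META_PREFIX)


print(meta_path("/foo/bar", is_dataset=True))   # /foo/metador_meta_bar
print(meta_path("/foo/bar", is_dataset=False))  # /foo/bar/metador_meta_
```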

Given schemas with entrypoint names X, Y and Z such that X is the parent schema of Y,
and Y is the parent schema of Z and a node "/foo/bar" annotated by a JSON object of
type Z, that JSON object MUST be stored as a newline-terminated, utf-8 encoded byte
sequence at the path meta("/foo/bar")/X/Y/Z/=UUID, where the UUID is unique in the
container.
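
A minimal sketch of the storage rule above, using only the standard library. Both
function names are hypothetical, and the use of a random UUID4 in its standard textual
form is an assumption; the text only requires that the UUID is unique in the container.

```python
import json
import uuid


def metadata_location(meta_group, schema_chain):
    # schema_chain lists entrypoint names from the root schema down to the
    # schema of the object itself, e.g. ["X", "Y", "Z"].
    return "/".join([meta_group, *schema_chain, "=" + str(uuid.uuid4())])


def encode_metadata(obj):
    # UTF-8 encoded JSON followed by a single newline byte (0x0A).
    return json.dumps(obj, ensure_ascii=False).encode("utf-8") + bytes([0x0A])


location = metadata_location("/foo/metador_meta_bar", ["X", "Y", "Z"])
payload = encode_metadata({"title": "Run 1"})
print(location)
print(payload)
```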

For metadata attached to an object we expect the following to hold:

Node instance uniqueness:
Each schema MAY be instantiated explicitly for each node at most ONCE.
Collections thus must be represented on schema-level whenever needed.

Parent Validity:
Any object of a subschema MUST also be a valid instance of all its parent schemas.
The schema developers are responsible for ensuring this through correct implementation
of subschemas.

Parent Consistency:
Any objects of a subtype of schema X that are stored at the same node SHOULD result
in the same object when parsed as X (they agree on the "common" information).
Thus, any child object can be used to retrieve the same parent view on the data.
The container creator is responsible for ensuring this property. If it is not
fulfilled, retrieving data for a more abstract type will yield it from ANY present subtype
instance (but always the same one, as long as the container does not change)!
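
The Parent Consistency property can be illustrated with a deliberately simplified
model in which a schema is just a set of field names and "parsing as X" keeps only the
fields of X. Real Metador schemas are richer than this; all names below are
hypothetical.

```python
def parent_view(obj, parent_fields):
    # Simplified stand-in for "parse this object as the parent schema".
    return {key: value for key, value in obj.items() if key in parent_fields}


def parent_consistent(children, parent_fields):
    # All child objects at one node must yield the same parent view.
    views = [parent_view(child, parent_fields) for child in children]
    return all(view == views[0] for view in views)


X_FIELDS = {"title"}
first = {"title": "Run 1", "author": "A"}
second = {"title": "Run 1", "size": 3}
other = {"title": "Run 2"}

print(parent_consistent([first, second], X_FIELDS))  # True
print(parent_consistent([first, other], X_FIELDS))   # False
```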

If at least one metadata object is stored, a container MUST have a "/metador_toc" Group,
containing a lookup index of all metadata objects following a registered metadata schema.
This index structure MUST be in sync with the node metadata annotations.
Keeping this structure in sync is the responsibility of the container interface.

This means (using the previous example) that for "/foo/bar" annotated by Z there also
exists a dataset "/metador_toc/X/Y/Z/=UUID" containing the full path to the metadata node,
i.e. meta("/foo/bar")/X/Y/Z/=UUID. Conversely, there must not be any empty Groups
named after entrypoints, and all listed paths in the TOC must point to an existing node.
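
The synchronization requirement can be sketched as a consistency check over a plain
in-memory model: a set of metadata object paths and a dict mapping TOC dataset paths to
the path stored in them. This is a hypothetical illustration, not the check performed
by the container interface, and it does not cover the rule about empty Groups.

```python
TOC_ROOT = "/metador_toc"
META_PREFIX = "metador_meta_"


def toc_key(metadata_path):
    # Keep everything after the metadata group, i.e. "X/Y/Z/=UUID".
    parts = metadata_path.split("/")
    index = next(i for i, part in enumerate(parts) if part.startswith(META_PREFIX))
    return "/".join([TOC_ROOT, *parts[index + 1:]])


def toc_problems(metadata_paths, toc):
    problems = []
    for path in sorted(metadata_paths):
        if toc.get(toc_key(path)) != path:
            problems.append("missing or wrong TOC entry for " + path)
    for key, target in sorted(toc.items()):
        if target not in metadata_paths:
            problems.append("dangling TOC entry " + key)
    return problems


stored = {"/foo/metador_meta_bar/X/Y/Z/=1234"}
good_toc = {"/metador_toc/X/Y/Z/=1234": "/foo/metador_meta_bar/X/Y/Z/=1234"}
print(toc_problems(stored, good_toc))  # []
```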

A valid container MUST contain a dataset "/metador_version" holding a string of the
form "X.Y".

A correctly implemented library supporting an older minor version MUST be able to open a
container with increased minor version without problems (by ignoring unknown data),
so for a minor update of this specification only new entities may be defined.
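
The minor-version rule can be sketched as follows. The supported version (1, 0) and
both function names are hypothetical, and treating a different major version as not
openable is an assumption that the text above does not state explicitly.

```python
SUPPORTED_VERSION = (1, 0)  # hypothetical version implemented by the reader


def parse_version(text):
    # Parse a version string of the form "X.Y" into a pair of integers.
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def can_open(container_version, supported=SUPPORTED_VERSION):
    # Any minor version is acceptable within the same major version,
    # because unknown data from newer minor versions is ignored.
    major, _minor = parse_version(container_version)
    return major == supported[0]


print(can_open("1.3"))  # True
print(can_open("2.0"))  # False
```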

Known technical limitations:

Because versioning of plugins such as schemas is coupled to the versioning
of the respective Python packages, it is not (directly) possible to use two different
versions of the same schema in the same environment (with the exception of mappings, as
they may bring their own equivalent schema classes).

Minor version updates of packages providing schemas must ensure that the classes providing
schemas are backward-compatible (i.e. can parse instances of older minor versions).

Major version updates must also provide mappings migrating between the old and new schema
versions. If the schema did not change, the mapping is simply the identity.
"""

from .interface import MetadorContainerTOC, MetadorMeta
from .provider import ContainerProxy
from .wrappers import MetadorContainer, MetadorDataset, MetadorGroup, MetadorNode

__all__ = [
    "MetadorContainer",
    "MetadorNode",
    "MetadorGroup",
    "MetadorDataset",
    "MetadorMeta",
    "MetadorContainerTOC",
    "ContainerProxy",
]