User Manual

This is the reference manual for somesy providing details about its configuration and behavior.

Before you proceed, make sure you have read the introduction and the quick-start guide.

Somesy Metadata Schemas

Because the same information is represented in different ways and in more or less detail in different files, somesy requires all project information to be placed in a somesy-specific input section located in a supported input file. Somesy uses this section as the single source of truth for the supported project metadata fields and can synchronize this information into different target files.

Info

The somesy schemas are designed to express metadata in the most useful and rich form, i.e. the best form that any of the target formats supports.

Somesy project metadata consists of two main schemas - one schema for describing people (authors, maintainers, contributors), and a schema describing the project.

The somesy metadata fields (especially for people) are mostly based on Citation File Format 1.2, with some custom extensions. Somesy will try to stay aligned with future revisions of CITATION.cff, unless a deviation or extension is justified for technical or practical reasons.

One useful difference between somesy and many target formats is that somesy lets you state all people in one place. If a person is both an author and a maintainer, that person does not need to be listed with all information twice.

This is done by adding the author and maintainer flags that can be set for every listed person. Somesy will take care of duplicating the entries where this is needed.
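The duplication step can be pictured with a small sketch. The function name `split_people` and the plain-dictionary representation are assumptions made for illustration and are not part of the somesy API; the field names follow the input file examples in this manual.

```python
# Hypothetical sketch, not somesy's actual implementation.
FLAGS = ("author", "maintainer")


def split_people(people):
    """Derive separate author and maintainer lists from one people list."""

    def strip_flags(person):
        # The somesy-specific flags are dropped from the derived entries.
        return {key: value for key, value in person.items() if key not in FLAGS}

    # Both flags default to false, so unflagged people land in neither list.
    authors = [strip_flags(p) for p in people if p.get("author", False)]
    maintainers = [strip_flags(p) for p in people if p.get("maintainer", False)]
    return authors, maintainers


people = [
    {"given-names": "Jane", "family-names": "Doe",
     "email": "j.doe@example.com", "author": True, "maintainer": True},
    {"given-names": "Another", "family-names": "Contributor",
     "email": "a.contributor@example.com"},
]
authors, maintainers = split_people(people)
print(len(authors), len(maintainers))
```

Jane is listed once in the input but appears in both derived lists, while the second person, who has neither flag set, appears in neither.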

Furthermore, somesy allows you to provide more fine-grained information about a person's contribution and to acknowledge contributors who are neither full authors nor maintainers.

Note

Currently, information about contributors who are neither authors nor maintainers, as well as all detailed information on contributions, is not used.

Nevertheless, we encourage tracking granular contributor and contribution information in order to motivate and acknowledge minor or invisible contributions to a project as well.

Once CITATION.cff introduces corresponding mechanisms, somesy will be aligned with the corresponding capabilities. Furthermore, somesy might support the allcontributors tool as a target in the future.

Schemas Overview

Here is an overview of the schemas used in somesy.

The complete somesy input file (somesy.toml) or section (pyproject.toml).

| Field | Type | Required? | Default | Description |
| --- | --- | --- | --- | --- |
| project | ProjectMetadata | yes | | Project metadata to be used and synchronized. |
| config | SomesyConfig | no | | somesy tool configuration (matches CLI flags). |

Pydantic model for Project Metadata Input.

| Field | Type | Required? | Default | Description |
| --- | --- | --- | --- | --- |
| name | str | yes | | Project name. |
| description | str | yes | | Project description. |
| version | str | no | | Project version. |
| license | LicenseEnum | yes | | SPDX License string. |
| repository | URL | no | | URL of the project source code repository. |
| homepage | URL | no | | URL of the project homepage. |
| keywords | list[str] | no | | Keywords that describe the project. |
| people | list[Person] | yes | | Project authors, maintainers and contributors. |

Metadata about a person in the context of a software project.

This schema is based on CITATION.cff 1.2, modified and extended for the needs of somesy.

| Field | Type | Required? | Default | Description |
| --- | --- | --- | --- | --- |
| orcid | URL | no | | The person's ORCID url (not required, but highly suggested). |
| email | str | yes | | The person's email address. |
| family_names | str | yes | | The person's family names. |
| given_names | str | yes | | The person's given names. |
| name_particle | str | no | | The person's name particle, e.g., a nobiliary particle or a preposition meaning 'of' or 'from' (for example 'von' in 'Alexander von Humboldt'). |
| name_suffix | str | no | | The person's name-suffix, e.g. 'Jr.' for Sammy Davis Jr. or 'III' for Frank Edwin Wright III. |
| alias | str | no | | The person's alias. |
| affiliation | str | no | | The person's affiliation. |
| address | str | no | | The person's address. |
| city | str | no | | The person's city. |
| country | Country | no | | The person's country. |
| fax | str | no | | The person's fax number. |
| post_code | str | no | | The person's post-code. |
| region | str | no | | The person's region. |
| tel | str | no | | The person's phone number. |
| author | bool | no | false | Indicates whether the person is an author of the project (i.e. for citations). |
| maintainer | bool | no | false | Indicates whether the person is a maintainer of the project (i.e. for contact). |
| contribution | str | no | | Summary of how the person contributed to the project. |
| contribution_types | list[ContributionTypeEnum] | no | | Relevant types of contributions (see https://allcontributors.org/docs/de/emoji-key). |
| contribution_begin | date | no | | Beginning date of the contribution. |
| contribution_end | date | no | | Ending date of the contribution. |

Pydantic model for somesy tool configuration.

Note that all fields match CLI options, and CLI options will override the values declared in a somesy input file (such as somesy.toml).

| Field | Type | Required? | Default | Description |
| --- | --- | --- | --- | --- |
| show_info | bool | no | false | Show basic information messages on run (-v flag). |
| verbose | bool | no | false | Show verbose messages on run (-vv flag). |
| debug | bool | no | false | Show debug messages on run (-vvv flag). |
| input_file | Path | no | "somesy.toml" | Project metadata input file path. |
| no_sync_pyproject | bool | no | false | Do not sync with pyproject.toml. |
| pyproject_file | Path | no | "pyproject.toml" | pyproject.toml file path. |
| no_sync_package_json | bool | no | false | Do not sync with package.json. |
| package_json_file | Path | no | "package.json" | package.json file path. |
| no_sync_cff | bool | no | false | Do not sync with CFF. |
| cff_file | Path | no | "CITATION.cff" | CFF file path. |
| no_sync_codemeta | bool | no | false | Do not sync with codemeta.json. |
| codemeta_file | Path | no | "codemeta.json" | codemeta.json file path. |

Metadata Mapping

From its own schema somesy must convert the information into the target formats. The following tables sketch how fields are mapped to corresponding other fields in some of the currently supported formats.

Person fields:

| Field Name | Poetry Config | SetupTools Config | CITATION.cff | package.json | Requirement |
| --- | --- | --- | --- | --- | --- |
| given-names | name+email | name | given-names | name | required |
| family-names | name+email | name | family-names | name | required |
| email | name+email | email | email | email | required |
| orcid | - | - | orcid | url | optional |
| (many others) | - | - | (same) | - | optional |

Project fields:

| Field Name | Poetry Config | SetupTools Config | CITATION.cff | package.json | Requirement |
| --- | --- | --- | --- | --- | --- |
| name | name | name | title | name | required |
| description | description | description | abstract | description | required |
| license | license | license | license | license | required |
| version | version | version | version | version | optional |
| author=true | authors | authors | authors | author | required |
| maintainer=true | maintainers | maintainers | contact | maintainers | optional |
| people | - | - | - | contributors | optional |
| keywords | keywords | keywords | keywords | keywords | optional |
| repository | repository | urls.repository | repository_code | repository | optional |
| homepage | homepage | urls.homepage | url | homepage | optional |

Note that the mapping is often not 1-to-1. For example, CITATION.cff allows rich specification of author contact information and complex names. In contrast, poetry only supports a simple string with a name and email (like in git commits) to list authors and maintainers. Therefore somesy sometimes must do much more than just move or rename fields. This means that giving a clean and complete mapping overview is not feasible. In case of doubt or confusion, please open an issue or consult the somesy code.
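As an illustration of such a lossy conversion, the following sketch renders a person record as a single "name and email" string of the kind poetry expects. The helper `to_name_email` is hypothetical, and the choice to include the name particle and suffix in the rendered name is an assumption; the actual somesy conversion may differ.

```python
# Hypothetical sketch, not somesy's actual conversion code.
def to_name_email(person):
    """Render a person record as 'Full Name <email>' (as in git commits)."""
    parts = [
        person.get("given-names", ""),
        person.get("name-particle", ""),   # assumption: particle is kept
        person.get("family-names", ""),
        person.get("name-suffix", ""),     # assumption: suffix is kept
    ]
    full_name = " ".join(part for part in parts if part)
    # Everything except name and email (e.g. the ORCID) has no place
    # in this target representation and is lost.
    return f"{full_name} <{person['email']}>"


jane = {
    "given-names": "Jane",
    "family-names": "Doe",
    "email": "j.doe@example.com",
    "orcid": "https://orcid.org/0000-0000-0000-0001",
}
print(to_name_email(jane))
```

The ORCID cannot be expressed in this string form, which is one reason why somesy needs its own richer input schema as the source of truth.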

The somesy CLI tool

You can see all options supported by the somesy sync command using somesy sync --help:

somesy sync --help
                                                                                
 Usage: somesy sync [OPTIONS] COMMAND [ARGS]...                                 
                                                                                
 Sync project metadata input with metadata files.                               

╭─ Options ────────────────────────────────────────────────────────────────────╮
 --input-file         -i      FILE  Somesy input file path (default:          
                                    .somesy.toml)                             
                                    [default: None]                           
 --no-sync-pyproject  -P            Do not sync pyproject.toml file (default: 
                                    False)                                    
 --pyproject-file     -p      FILE  Existing pyproject.toml file path         
                                    (default: pyproject.toml)                 
                                    [default: None]                           
 --sync-package-json  -J            Do not sync package.json file (default:   
                                    False)                                    
 --package-json-file  -j      FILE  Existing package.json file path (default: 
                                    package.json)                             
                                    [default: None]                           
 --no-sync-cff        -C            Do not sync CITATION.cff file (default:   
                                    False)                                    
 --cff-file           -c      FILE  CITATION.cff file path (default:          
                                    CITATION.cff)                             
                                    [default: None]                           
 --no-sync-codemeta   -M            Do not sync codemeta.json file            
 --codemeta-file      -m      FILE  Custom codemeta.json file path            
                                    [default: None]                           
 --help                             Show this message and exit.               
╰──────────────────────────────────────────────────────────────────────────────╯

Everything that can be configured as a CLI flag can also be set in a somesy.toml file in the [config] section. Options set in an input file override the defaults, while options passed as CLI arguments override the configuration in the input file.
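The precedence can be sketched as three layers merged in order. This is a minimal illustration and not somesy's code: `resolve_config` is a hypothetical helper, only a few options from the configuration table are included, and treating `None` as "not passed on the command line" is an assumption.

```python
# Hypothetical sketch of the precedence: defaults < input file < CLI.
DEFAULTS = {
    "verbose": False,
    "no_sync_cff": False,
    "cff_file": "CITATION.cff",
}


def resolve_config(file_config, cli_args):
    """Merge the three configuration layers, lowest priority first."""
    resolved = dict(DEFAULTS)
    resolved.update(file_config)  # [config] section overrides defaults
    # Assumption: an option that was not passed on the CLI is None.
    passed = {key: value for key, value in cli_args.items() if value is not None}
    resolved.update(passed)       # CLI arguments override everything else
    return resolved


file_config = {"verbose": True, "cff_file": "docs/CITATION.cff"}
cli_args = {"cff_file": None, "no_sync_cff": True}
print(resolve_config(file_config, cli_args))
```

Here `verbose` and `cff_file` come from the input file, while `no_sync_cff` comes from the command line.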

Without an input file specifically provided, somesy will check if it can find a valid

  • .somesy.toml
  • somesy.toml
  • pyproject.toml (in tool.somesy section)
  • package.json (in somesy section)

which is located in the current working directory. If you want to provide the somesy input file from a different location, you can pass it with the -i option.
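The lookup can be sketched as follows. The function `find_input_file` is hypothetical, and the sketch assumes that the candidates are tried in the order listed above and that the first valid one wins. How somesy decides that a file is "valid" (for example, whether pyproject.toml contains a tool.somesy section) is left to a caller-supplied check here.

```python
from pathlib import Path

# Candidate file names, in the order listed above (order is an assumption).
CANDIDATES = (".somesy.toml", "somesy.toml", "pyproject.toml", "package.json")


def find_input_file(directory=".", is_valid_input=lambda path: True):
    """Return the first existing, valid candidate in 'directory', or None.

    'is_valid_input' is a placeholder for the real validity check, e.g.
    whether the file actually contains a somesy section.
    """
    for name in CANDIDATES:
        candidate = Path(directory) / name
        if candidate.is_file() and is_valid_input(candidate):
            return candidate
    return None


print(find_input_file("."))
```

Passing a path with the -i option bypasses this search entirely.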

Somesy Input File

Here is an example of how project metadata and somesy can be configured using each of the supported input formats.

somesy.toml:

[project]
name = "my-amazing-project"
version = "0.1.0"
description = "Brief description of my amazing software."

keywords = ["some", "descriptive", "keywords"]
license = "MIT"
repository = "https://github.com/username/my-amazing-project"

# This is you, the proud author of your project
[[project.people]]
given-names = "Jane"
family-names = "Doe"
email = "j.doe@example.com"
orcid = "https://orcid.org/0000-0000-0000-0001"
author = true      # is a full author of the project (i.e. appears in citations)
maintainer = true  # currently maintains the project (i.e. is a contact person)

# this person is an acknowledged contributor, but not author or maintainer:
[[project.people]]
given-names = "Another"
family-names = "Contributor"
email = "a.contributor@example.com"
orcid = "https://orcid.org/0000-0000-0000-0002"

[config]
verbose = true     # show detailed information about what somesy is doing

pyproject.toml:

[tool.poetry]
name = "my-amazing-project"
version = "0.1.0"
...

[tool.somesy.project]
name = "my-amazing-project"
version = "0.1.0"
description = "Brief description of my amazing software."

keywords = ["some", "descriptive", "keywords"]
license = "MIT"
repository = "https://github.com/username/my-amazing-project"

# This is you, the proud author of your project
[[tool.somesy.project.people]]
given-names = "Jane"
family-names = "Doe"
email = "j.doe@example.com"
orcid = "https://orcid.org/0000-0000-0000-0001"
author = true      # is a full author of the project (i.e. appears in citations)
maintainer = true  # currently maintains the project (i.e. is a contact person)

# this person is an acknowledged contributor, but not author or maintainer:
[[tool.somesy.project.people]]
given-names = "Another"
family-names = "Contributor"
email = "a.contributor@example.com"
orcid = "https://orcid.org/0000-0000-0000-0002"

[tool.somesy.config]
verbose = true     # show detailed information about what somesy is doing

package.json:

{
  "name": "my-amazing-project",
  "version": "0.1.0",
  ...

  "somesy": {
    "project": {
      "name": "my-amazing-project",
      "version": "0.1.0",
      "description": "Brief description of my amazing software.",
      "keywords": ["some", "descriptive", "keywords"],
      "license": "MIT",
      "repository": "https://github.com/username/my-amazing-project",
      "people": [
        {
          "given-names": "Jane",
          "family-names": "Doe",
          "email": "j.doe@example.com",
          "orcid": "https://orcid.org/0000-0000-0000-0001",
          "author": true,
          "maintainer": true
        },
        {
          "given-names": "Another",
          "family-names": "Contributor",
          "email": "a.contributor@example.com",
          "orcid": "https://orcid.org/0000-0000-0000-0002"
        }
      ]
    },
    "config": {
      "verbose": true
    }
  }
}

All possible metadata fields and configuration options are explained further above.

pre-commit hook

We highly recommend using somesy as a pre-commit hook. A pre-commit hook runs on every commit to automatically point out issues or fix them on the spot, so if you do not use pre-commit in your project yet, you should start today! When used this way, somesy can fix most typical issues with your project metadata even before your changes leave your computer.

To add somesy as a pre-commit hook, add it to your .pre-commit-config.yaml file in the root folder of your repository:

repos:
  # ... (your other hooks) ...
  - repo: https://github.com/Materials-Data-Science-and-Informatics/somesy
    rev: "0.1.0"
    hooks:
      - id: somesy

Note that pre-commit gives somesy the staged version of files, so when using somesy with pre-commit, keep in mind that

  • if somesy changed some files, you need to git add them again (and rerun pre-commit)
  • if you explicitly run pre-commit, make sure to git add all changed files (just like before a commit)

Synchronization

Unless configured otherwise, somesy will create CITATION.cff and codemeta.json files if they do not exist. Other supported files (such as pyproject.toml or package.json) are updated if they already exist in your repository.

If you do not want somesy to create or synchronize these files, you can disable them via CLI options or in your somesy configuration.

Metadata Update Logic

In this section we explain a few details about how somesy updates metadata.

Somesy Inputs Override Target Values

In general, somesy updates fields in target files and formats based on the information stated in the somesy.toml.

It will convert the metadata into the form needed for a target, while trying to preserve as much information as possible. Then it carefully updates the file, while keeping all other fields in the target file unchanged.

In most cases, somesy will try not to interfere with other values, metadata and configuration you might have in a target file.
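The update behavior described above can be pictured with plain dictionaries. This is a deliberately simplified sketch: `update_target` is a hypothetical helper, and the real implementation also has to convert values into the target's form and preserve comments.

```python
# Hypothetical sketch, not somesy's actual update code.
def update_target(target, managed_fields):
    """Return a copy of 'target' with somesy-managed fields overwritten."""
    updated = dict(target)          # keep every field already in the target
    updated.update(managed_fields)  # somesy input wins for fields it manages
    return updated


target_section = {
    "name": "old-name",            # managed by somesy: will be overwritten
    "version": "0.1.0",
    "dependencies": ["requests"],  # unknown to somesy: left untouched
}
from_somesy = {"name": "my-amazing-project", "version": "0.1.0"}

result = update_target(target_section, from_somesy)
print(result["name"], result["dependencies"])
```

A manual edit to `name` in the target is lost on the next run, while `dependencies` survives because somesy does not manage it.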

Info

For typically manually-edited files, it will even make sure that the comments stay in place! (currently works for TOML)

Tip

Only edit target files manually to add or update fields that somesy does not understand or care about!

Warning

Any changes you make in target files to fields that somesy understands will be overwritten the next time you run somesy.

Tip

  • update all project metadata supported by somesy in the somesy.toml
  • update other metadata directly in the target files

Checking Somesy Results

Note that somesy generally tries to do a good job and hopefully will in most cases, but in certain tricky situations it might not be able to figure out the needed changes correctly.

Danger

Always check that the files somesy synchronized look right before committing/pushing the changes!

Doing what somesy does both conveniently and correctly is (maybe surprisingly to you) quite difficult, so while somesy should save you quite some tedious work, you should not use it blindly. You have been warned!

Person Identification Heuristics

One frequent source of high-level project metadata changes is fluctuation of authors, maintainers and contributors and eventual changes of the respective contact and identification information.

Somesy will try its best to keep track of persons involved in your software project, but to avoid possible problems and unexpected behavior, it might be helpful to understand how somesy determines whether two metadata records describe the same real person.

When somesy compares two metadata records about a person, it will proceed as follows:

  1. If both records contain an ORCID, then the person is the same if the attached ORCIDs are equal, and different if they are not.
  2. Otherwise, if both records have an attached email address, and it is the same email, then they are the same person.
  3. Otherwise, the records are considered to be about the same person if they agree on the full name (i.e. given, middle and family name sequence).
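The three steps above can be sketched as a comparison function. The names `same_person` and `full_name` are hypothetical, records are represented as plain dictionaries, and the exact way the full name is assembled is an assumption.

```python
# Hypothetical sketch of the comparison steps, not somesy's actual code.
def full_name(person):
    """Join the name parts that are present into one string."""
    keys = ("given-names", "name-particle", "family-names", "name-suffix")
    return " ".join(person[key] for key in keys if person.get(key))


def same_person(a, b):
    """Decide whether two records describe the same real person."""
    # Step 1: if both have an ORCID, the ORCIDs alone decide.
    if a.get("orcid") and b.get("orcid"):
        return a["orcid"] == b["orcid"]
    # Step 2: otherwise, an identical email address is sufficient.
    if a.get("email") and b.get("email") and a["email"] == b["email"]:
        return True
    # Step 3: otherwise, fall back to comparing the full name.
    return full_name(a) == full_name(b)


old = {"given-names": "Jane", "family-names": "Doe", "email": "j.doe@example.com"}
new = {"given-names": "Jane", "family-names": "Doe", "email": "jane@example.org",
       "orcid": "https://orcid.org/0000-0000-0000-0001"}
print(same_person(old, new))
```

In the usage example the email changed and an ORCID was added, but the records still match through the name in step 3, because only one of them carries an ORCID.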

Tip

State ORCIDs for persons whenever possible (i.e. the person has an ORCID)!

Tip

If a person does not have an ORCID, suggest that they should create one!

Somesy will usually correctly understand cases such as:

  1. An ORCID being added to a person (i.e. if it was not present before)
  2. A changed email address (if the name stays the same)
  3. A changed name (if the email address stays the same)
  4. Changes to any other relevant metadata attached to the person

Nevertheless, you should check the changes somesy makes before committing them to your repository, especially after you have significantly modified your project metadata.

Warning

Note that changing the ORCID will not be recognized, because ORCIDs are assumed to be unique per person.

If you initially have stated an incorrect ORCID for a person and then change it, somesy will think that this is a new person. Therefore, in such a case you will need to fix the ORCID in all configured somesy targets either before running somesy (so somesy will not create new person entries), or after running somesy (to remove the duplicate entries with the incorrect ORCID).

Codemeta

While somesy modifies existing files for most supported formats, CodeMeta is handled differently.

In order to avoid redundant work, somesy relies on existing tools to generate codemeta.json files. So when you synchronize the metadata and the codemeta target is enabled, somesy will generate your codemeta.json by:

  • synchronizing metadata to a pyproject.toml or package.json (if enabled)
  • synchronizing metadata to a CITATION.cff (if enabled)
  • running cffconvert and codemetapy to combine both sources into a final codemeta.json

Warning

The codemeta.json is overwritten and regenerated from scratch every time you sync, so do not edit it if you have the codemeta target enabled in somesy!

As codemeta.json is considered a technical "backend format" derived from other inputs, in most cases you probably neither need to nor should edit it by hand anyway.

FAQ

Somesy introduces its own metadata format... isn't this counter-productive?

We don't propose to use somesy as a new "standard". On the contrary, the whole point of somesy is to help maintain standard-compliant metadata alongside other representations. To do its job, somesy needs to introduce a canonical format to express the metadata it tries to manage for you, because otherwise building such a tool is practically impossible. Should you decide after some time that you do not want to use it anymore, nothing is lost: you keep all your CITATION.cff, codemeta.json and pyproject.toml files and can continue to maintain them however you see fit.

The somesy-specific format is just a nice and convenient interface to make everybody's life easier. Furthermore, nobody needs to care whether, under the hood, you use somesy (or anything like it) or not, because they can use the corresponding files they already know to get the information they need. So there is no "risk" involved in adopting somesy, because it does not try to abolish any other formats or standards, or to become one itself.