Developer API reference

Warning

The APIs documented here are not stable and may change from one version to another. This reference is intended for developers of papis itself and of external plugins.

papis.bibtex

A set of utilities for working with BibTeX and BibLaTeX (as described in the manual).

papis.bibtex.bibtex_standard_types = frozenset({'article', 'book', 'bookinbook', 'booklet', 'collection', 'dataset', 'inbook', 'incollection', 'inproceedings', 'inreference', 'manual', 'misc', 'mvbook', 'mvcollection', 'mvproceedings', 'mvreference', 'online', 'patent', 'periodical', 'proceedings', 'reference', 'report', 'software', 'suppbook', 'suppcollection', 'suppperiodical', 'thesis', 'unpublished'})

Regular BibLaTeX types (Section 2.1.1).

papis.bibtex.bibtex_type_aliases = {'conference': 'inproceedings', 'electronic': 'online', 'masterthesis': 'thesis', 'phdthesis': 'thesis', 'techreport': 'report', 'www': 'online'}

BibLaTeX type aliases (Section 2.1.2).

papis.bibtex.bibtex_non_standard_types = frozenset({'artwork', 'audio', 'bibnote', 'commentary', 'image', 'jurisdiction', 'legal', 'legislation', 'letter', 'movie', 'music', 'performance', 'review', 'standard', 'video'})

Non-standard BibLaTeX types (Section 2.1.3).

papis.bibtex.biblatex_software_types = frozenset({'codefragment', 'software', 'softwaremodule', 'softwareversion'})

BibLaTeX Software types (Section 2).

papis.bibtex.bibtex_types = frozenset({'article', 'artwork', 'audio', 'bibnote', 'book', 'bookinbook', 'booklet', 'codefragment', 'collection', 'commentary', 'conference', 'dataset', 'electronic', 'image', 'inbook', 'incollection', 'inproceedings', 'inreference', 'jurisdiction', 'legal', 'legislation', 'letter', 'manual', 'masterthesis', 'misc', 'movie', 'music', 'mvbook', 'mvcollection', 'mvproceedings', 'mvreference', 'online', 'patent', 'performance', 'periodical', 'phdthesis', 'proceedings', 'reference', 'report', 'review', 'software', 'softwaremodule', 'softwareversion', 'standard', 'suppbook', 'suppcollection', 'suppperiodical', 'techreport', 'thesis', 'unpublished', 'video', 'www'})

A set of known BibLaTeX types (as described in Section 2.1 of the manual). These types are a union of the types above and can be extended with extra-bibtex-types.

papis.bibtex.bibtex_standard_keys = frozenset({'abstract', 'addendum', 'afterword', 'annotation', 'annotator', 'author', 'authortype', 'bookauthor', 'bookpagination', 'booksubtitle', 'booktitle', 'booktitleaddon', 'chapter', 'commentator', 'date', 'doi', 'edition', 'editor', 'editora', 'editoratype', 'editorb', 'editorbtype', 'editorc', 'editorctype', 'editortype', 'eid', 'entrysubtype', 'eprint', 'eprintclass', 'eprinttype', 'eventdate', 'eventtitle', 'eventtitleaddon', 'file', 'foreword', 'holder', 'howpublished', 'indextitle', 'institution', 'introduction', 'isan', 'isbn', 'ismn', 'isrn', 'issn', 'issue', 'issuesubtitle', 'issuetitle', 'issuetitleaddon', 'iswc', 'journalsubtitle', 'journaltitle', 'journaltitleaddon', 'label', 'language', 'library', 'location', 'mainsubtitle', 'maintitle', 'maintitleaddon', 'month', 'nameaddon', 'note', 'number', 'organization', 'origdate', 'origlanguage', 'origlocation', 'origpublisher', 'origtitle', 'pages', 'pagetotal', 'pagination', 'part', 'publisher', 'pubstate', 'reprinttitle', 'series', 'shortauthor', 'shorteditor', 'shorthand', 'shorthandintro', 'shortjournal', 'shortseries', 'shorttitle', 'subtitle', 'title', 'titleaddon', 'translator', 'url', 'urldate', 'venue', 'version', 'volume', 'volumes', 'year'})

BibLaTeX data fields (Section 2.2.2).

papis.bibtex.bibtex_key_aliases = {'address': 'location', 'annote': 'annotation', 'archiveprefix': 'eprinttype', 'journal': 'journaltitle', 'key': 'sortkey', 'pdf': 'file', 'primaryclass': 'eprintclass', 'school': 'institution'}

BibLaTeX field aliases (Section 2.2.5).

papis.bibtex.bibtex_special_keys = frozenset({'crossref', 'entryset', 'execute', 'gender', 'ids', 'indexsorttitle', 'keywords', 'langid', 'langidopts', 'options', 'presort', 'related', 'relatedoptions', 'relatedstring', 'relatedtype', 'sortkey', 'sortname', 'sortshorthand', 'sorttitle', 'sortyear', 'xdata', 'xref'})

Special BibLaTeX fields (Section 2.2.3).

papis.bibtex.biblatex_software_keys = frozenset({'abstract', 'author', 'date', 'doi', 'editor', 'eprint', 'eprintclass', 'eprinttype', 'file', 'hal_id', 'hal_version', 'institution', 'introducedin', 'license', 'month', 'note', 'organization', 'publisher', 'related', 'relatedstring', 'relatedtype', 'repository', 'subtitle', 'swhid', 'title', 'url', 'urldate', 'version', 'year'})

BibLaTeX software keys (Section 3). Most of these keys are already standard BibLaTeX keys from bibtex_standard_keys.

papis.bibtex.bibtex_keys = frozenset({'abstract', 'addendum', 'address', 'afterword', 'annotation', 'annotator', 'annote', 'archiveprefix', 'author', 'authortype', 'bookauthor', 'bookpagination', 'booksubtitle', 'booktitle', 'booktitleaddon', 'chapter', 'commentator', 'crossref', 'date', 'doi', 'edition', 'editor', 'editora', 'editoratype', 'editorb', 'editorbtype', 'editorc', 'editorctype', 'editortype', 'eid', 'entryset', 'entrysubtype', 'eprint', 'eprintclass', 'eprinttype', 'eventdate', 'eventtitle', 'eventtitleaddon', 'execute', 'file', 'foreword', 'gender', 'hal_id', 'hal_version', 'holder', 'howpublished', 'ids', 'indexsorttitle', 'indextitle', 'institution', 'introducedin', 'introduction', 'isan', 'isbn', 'ismn', 'isrn', 'issn', 'issue', 'issuesubtitle', 'issuetitle', 'issuetitleaddon', 'iswc', 'journal', 'journalsubtitle', 'journaltitle', 'journaltitleaddon', 'key', 'keywords', 'label', 'langid', 'langidopts', 'language', 'library', 'license', 'location', 'mainsubtitle', 'maintitle', 'maintitleaddon', 'month', 'nameaddon', 'note', 'number', 'options', 'organization', 'origdate', 'origlanguage', 'origlocation', 'origpublisher', 'origtitle', 'pages', 'pagetotal', 'pagination', 'part', 'pdf', 'presort', 'primaryclass', 'publisher', 'pubstate', 'related', 'relatedoptions', 'relatedstring', 'relatedtype', 'repository', 'reprinttitle', 'school', 'series', 'shortauthor', 'shorteditor', 'shorthand', 'shorthandintro', 'shortjournal', 'shortseries', 'shorttitle', 'sortkey', 'sortname', 'sortshorthand', 'sorttitle', 'sortyear', 'subtitle', 'swhid', 'title', 'titleaddon', 'translator', 'url', 'urldate', 'venue', 'version', 'volume', 'volumes', 'xdata', 'xref', 'year'})

A set of known BibLaTeX fields (as described in Section 2.2 of the manual). These fields are a union of the above fields and can be extended with extra-bibtex-keys.

papis.bibtex.bibtex_type_required_keys = {'article': ({'author'}, {'title'}, {'eprinttype', 'journaltitle'}, {'date', 'year'}), 'book': ({'author'}, {'title'}, {'date', 'year'}), 'booklet': ({'author', 'editor'}, {'title'}, {'date', 'year'}), 'codefragment': ({'url'},), 'collection': ({'editor'}, {'title'}, {'date', 'year'}), 'dataset': ({'author', 'editor'}, {'title'}, {'date', 'year'}), 'inbook': ({'author'}, {'title'}, {'booktitle'}, {'date', 'year'}), 'incollection': ({'author'}, {'title'}, {'editor'}, {'booktitle'}, {'date', 'year'}), 'inproceedings': ({'author'}, {'title'}, {'booktitle'}, {'date', 'year'}), 'manual': ({'author', 'editor'}, {'title'}, {'date', 'year'}), 'misc': ({'author', 'editor'}, {'title'}, {'date', 'year'}), 'online': ({'author', 'editor'}, {'title'}, {'date', 'year'}, {'doi', 'eprint', 'url'}), 'patent': ({'author'}, {'title'}, {'number'}, {'date', 'year'}), 'periodical': ({'editor'}, {'title'}, {'date', 'year'}), 'proceedings': ({'title'}, {'date', 'year'}), 'report': ({'author'}, {'title'}, {'type'}, {'institution'}, {'date', 'year'}), 'software': ({'author', 'editor'}, {'title'}, {'url'}, {'year'}), 'softwaremodule': ({'author'}, {'subtitle'}, {'url'}, {'year'}), 'softwareversion': ({'author', 'editor'}, {'title'}, {'url'}, {'version'}, {'year'}), 'thesis': ({'author'}, {'title'}, {'type'}, {'institution'}, {'date', 'year'}), 'unpublished': ({'author'}, {'title'}, {'date', 'year'}), None: ()}

A mapping of supported BibLaTeX entry types (see bibtex_types) to their required BibLaTeX fields (see bibtex_keys). Each value is a tuple of disjoint sets. Each set lists alternative fields, one of which is required for the particular type, e.g. an article may require either a year or a date field.

papis.bibtex.bibtex_type_required_keys_aliases = {'bookinbook': 'inbook', 'inreference': 'incollection', 'mvbook': 'book', 'mvcollection': 'collection', 'mvproceedings': 'proceedings', 'mvreference': 'collection', 'reference': 'collection', 'suppbook': 'book', 'suppcollection': 'collection', 'suppperiodical': 'periodical'}

A mapping for additional BibLaTeX types that have the same required fields. This mapping can be used to convert types before looking into bibtex_type_required_keys.
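
As a rough illustration of how the two mappings above fit together, the following sketch reports which groups of required fields are missing from an entry. It assumes that each set in a tuple lists alternatives, at least one of which must be present. The constants are short excerpts copied from the values above, and missing_required_groups is a hypothetical helper that is not part of papis.bibtex.

```python
from typing import Any, Dict, List

# Excerpts of the documented mappings (not the full values).
REQUIRED_KEYS = {
    "article": ({"author"}, {"title"}, {"eprinttype", "journaltitle"}, {"date", "year"}),
    "book": ({"author"}, {"title"}, {"date", "year"}),
}
REQUIRED_KEYS_ALIASES = {"mvbook": "book", "suppbook": "book"}


def missing_required_groups(entry: Dict[str, Any]) -> List[List[str]]:
    """Hypothetical helper: list the groups of alternatives with no field set."""
    bibtype = entry.get("type", "")
    # Types such as "mvbook" share the required fields of another type.
    bibtype = REQUIRED_KEYS_ALIASES.get(bibtype, bibtype)
    groups = REQUIRED_KEYS.get(bibtype, ())
    # A group is satisfied as soon as the entry has any one of its fields.
    return [sorted(group) for group in groups if group.isdisjoint(entry)]


entry = {"type": "article", "author": "Einstein, A.", "title": "...", "year": 1905}
print(missing_required_groups(entry))  # [['eprinttype', 'journaltitle']]
```

Types without an entry in either mapping yield an empty list in this sketch, which mirrors the empty tuple stored under None above.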

papis.bibtex.bibtex_type_converter: Dict[str, str] = {'OriginalPaper': 'article', 'annotation': 'misc', 'attachment': 'misc', 'audioRecording': 'audio', 'bill': 'legislation', 'blogPost': 'online', 'bookSection': 'inbook', 'case': 'jurisdiction', 'computerProgram': 'software', 'conferencePaper': 'inproceedings', 'dictionaryEntry': 'misc', 'document': 'article', 'email': 'online', 'encyclopediaArticle': 'article', 'film': 'video', 'forumPost': 'online', 'hearing': 'jurisdiction', 'instantMessage': 'online', 'interview': 'article', 'journal': 'article', 'journalArticle': 'article', 'magazineArticle': 'article', 'manuscript': 'unpublished', 'map': 'misc', 'monograph': 'book', 'newspaperArticle': 'article', 'note': 'misc', 'podcast': 'audio', 'preprint': 'unpublished', 'presentation': 'misc', 'radioBroadcast': 'audio', 'statute': 'jurisdiction', 'tvBroadcast': 'video', 'videoRecording': 'video', 'webpage': 'online'}

A mapping of arbitrary types to BibLaTeX types in bibtex_types. This mapping can be used when translating from other software, e.g. Zotero has custom fields in its schema.

papis.bibtex.bibtex_key_converter: Dict[str, str] = {'abstractNote': 'abstract', 'conferenceName': 'eventtitle', 'place': 'location', 'proceedingsTitle': 'booktitle', 'publicationTitle': 'journal', 'university': 'school'}

A mapping of arbitrary fields to BibLaTeX fields in bibtex_keys. This mapping can be used when translating from other software.
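
The two converter mappings could be applied along the following lines. This is only a sketch: the constants are excerpts of the values above, translate_record is a hypothetical helper, and it is an assumption that the foreign record stores its type under a "type" key.

```python
from typing import Any, Dict

# Excerpts of bibtex_type_converter and bibtex_key_converter.
TYPE_CONVERTER = {"journalArticle": "article", "blogPost": "online"}
KEY_CONVERTER = {"abstractNote": "abstract", "publicationTitle": "journal"}


def translate_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: rename foreign types and fields, keep the rest."""
    result = {}
    for key, value in record.items():
        if key == "type":
            # Unknown types are passed through unchanged.
            result[key] = TYPE_CONVERTER.get(value, value)
        else:
            result[KEY_CONVERTER.get(key, key)] = value
    return result


record = {"type": "journalArticle", "publicationTitle": "Annalen der Physik", "title": "..."}
print(translate_record(record))
```

Note that the converted "journal" field is itself listed in bibtex_key_aliases as an alias of "journaltitle".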

papis.bibtex.bibtex_ignore_keys = frozenset({'file'})

A set of BibLaTeX fields to ignore when exporting from the Papis database. These can be extended with bibtex-ignore-keys.

papis.bibtex.ref_allowed_characters = '([^a-zA-Z0-9._]+|(?<!\\\\)[._])'

A regex for acceptable characters to use in a reference string. These are used by ref_cleanup() to remove any undesired characters.
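
To see what the pattern matches, it can be tried with re.sub directly. The sketch below only demonstrates the regular expression itself. It does not reproduce how ref_cleanup() applies it, and the input strings are made up.

```python
import re

# Same pattern as ref_allowed_characters, written as a raw string.
PATTERN = re.compile(r"([^a-zA-Z0-9._]+|(?<!\\)[._])")

# Runs of other characters are matched, as are "." and "_" unless
# they are preceded by a backslash.
print(PATTERN.sub("", "Einstein 1905: a.b_c"))  # Einstein1905abc
print(PATTERN.sub("", r"a\.b"))  # a.b
```

In the second call the backslash is removed, while the dot that follows it is kept.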

papis.bibtex.bibtex_verbatim_fields = frozenset({'doi', 'eprint', 'file', 'pdf', 'url', 'urlraw'})

A set of fields that should not be escaped. In general, these will be escaped by the BibTeX engine and should not be modified (e.g. Verbatim fields and URI fields in Section 2.2.1).

papis.bibtex.exporter(documents: List[Document]) str[source]

Convert documents into a list of BibLaTeX entries.

class papis.bibtex.Importer(**kwargs: Any)[source]

Importer that parses BibTeX files.

classmethod match(uri: str) Importer | None[source]

Check if the importer can process the given URI.

For example, an importer that supports links from the arXiv can check that the given URI matches using:

re.match(r".*arxiv.org.*", uri)

This can then be used to instantiate and return a corresponding Importer object.

Parameters:

uri – A URI where the document information should be retrieved from.

Returns:

An importer instance if the match to the URI is successful or None otherwise.

fetch_data() Any[source]

Fetch metadata from the given uri.

The imported metadata is stored in ctx.

papis.bibtex.bibtexparser_entry_to_papis(entry: Dict[str, Any]) Dict[str, Any][source]

Convert the keys of a BibTeX entry parsed by bibtexparser to a papis-compatible format.

Parameters:

entry – a dictionary with keys parsed by bibtexparser.

Returns:

a dictionary with keys converted to a papis-compatible format.

papis.bibtex.bibtex_to_dict(bibtex: str) List[Dict[str, str]][source]

Convert a BibTeX file (or string) to a list of papis-compatible dictionaries.

This will convert an entry like

@article{ref,
    author = { ... },
    title = { ... },
    ...,
}

to a dictionary such as

{ "type": "article", "author": "...", "title": "...", ...}
Parameters:

bibtex – a path to a BibTeX file or a string containing BibTeX formatted data. If it is a file, its contents are passed to BibTexParser.

Returns:

a list of entries from the BibTeX data in a compatible format.

papis.bibtex.ref_cleanup(ref: str) str[source]

Clean up reference strings so that they are accepted by BibLaTeX.

This uses ref_allowed_characters to remove any disallowed characters from the given ref. Furthermore, slugify is used to remove unicode characters and ensure consistent use of the underscore _ as a separator.

Returns:

a reference without any disallowed characters.

papis.bibtex.create_reference(doc: Dict[str, Any], force: bool = False) str[source]

Try to create a reference for the document doc.

If the document doc does not have a "ref" key, this function attempts to create one, otherwise the existing key is returned. When creating a new reference:

  • the ref-format key is used, if available,

  • the document DOI is used, if available,

  • a string is constructed from the document data (author, title, etc.).

Parameters:

force – if True, the reference is re-created even if the document already has a "ref" key.

Returns:

a clean (see ref_cleanup()) reference for the document.

papis.bibtex.to_bibtex(document: Document, *, indent: int = 2) str[source]

Convert a document to a BibTeX entry containing only valid metadata.

To convert a document, it must have a valid BibTeX type (see bibtex_types) and a valid reference under the "ref" key (see create_reference()). Valid BibTeX keys (see bibtex_keys) are exported, while other keys are ignored (see bibtex_ignore_keys) with the following rules:

  • bibtex-unicode is used to control whether the field values can contain unicode characters.

  • bibtex-journal-key is used to define the field name for the journal.

  • bibtex-export-file is used to also add a "file" field to the BibTeX entry, which can be used by e.g. Zotero to import documents.

Parameters:
  • document – a papis document.

  • indent – set indentation for the BibTeX fields.

Returns:

a string containing the document metadata in a BibTeX format.
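
The filtering described above can be pictured with a small, self-contained sketch. It is not the output format or implementation of to_bibtex(): format_entry is a hypothetical helper, KNOWN_KEYS and IGNORE_KEYS are tiny stand-ins for bibtex_keys and bibtex_ignore_keys, and the unicode and journal-key rules are left out.

```python
from typing import Any, Dict

KNOWN_KEYS = frozenset({"author", "title", "year", "journaltitle", "file"})
IGNORE_KEYS = frozenset({"file"})


def format_entry(doc: Dict[str, Any], indent: int = 2) -> str:
    """Hypothetical helper: write known, non-ignored fields as a BibTeX entry."""
    pad = " " * indent
    lines = ["@{}{{{},".format(doc["type"], doc["ref"])]
    for key in sorted(doc):
        # Keys outside the known set, and explicitly ignored keys, are skipped.
        if key in KNOWN_KEYS and key not in IGNORE_KEYS:
            lines.append("{}{} = {{{}}},".format(pad, key, doc[key]))
    lines.append("}")
    return "\n".join(lines)


doc = {"type": "article", "ref": "ein05", "author": "Einstein, A.",
       "title": "T", "file": "paper.pdf", "tags": "physics"}
print(format_entry(doc))
```

Here "tags" is dropped because it is not a known field, and "file" is dropped because it is in the ignore set.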

papis.citations

papis.citations.Citation

A citation for an existing document.

alias of Dict[str, Any]

papis.citations.Citations

A list of citations for an existing document.

alias of List[Dict[str, Any]]

papis.citations.get_metadata_citations(doc: Document | Dict[str, Any]) List[Dict[str, Any]][source]

Get the citations in the metadata that contain a DOI.

papis.citations.fetch_citations(doc: Document) List[Dict[str, Any]][source]

Retrieve citations for the document.

Citation retrieval mainly relies on querying Crossref metadata using the DOI of the document. If the document does not have a DOI, this function will fail to retrieve any citations.

Returns:

a list of citations that have a DOI.

papis.citations.get_citations_from_database(dois: Sequence[str]) List[Dict[str, Any]][source]

Look for document DOIs in the database.

Parameters:

dois – a sequence of DOIs to look for in the current library database.

Returns:

a sequence of documents from the current library that match the given dois, if any.
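
Conceptually, this lookup amounts to filtering the library by DOI. The sketch below uses a plain list of dictionaries in place of the library database. find_by_doi is a hypothetical helper, and the case-insensitive comparison is an assumption of this sketch, not documented behavior.

```python
from typing import Any, Dict, List, Sequence


def find_by_doi(dois: Sequence[str],
                documents: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: return the documents whose DOI is in dois."""
    wanted = {doi.lower() for doi in dois}
    # Documents without a "doi" key never match.
    return [doc for doc in documents
            if str(doc.get("doi", "")).lower() in wanted]


library = [
    {"title": "First", "doi": "10.1000/ABC"},
    {"title": "Second"},
    {"title": "Third", "doi": "10.1000/xyz"},
]
print(find_by_doi(["10.1000/abc"], library))
```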

papis.citations.update_and_save_citations_from_database_from_doc(doc: Document) None[source]

Update the citations file of an existing document.

This function will get any existing citations in the document, update them as appropriate and save them back to the citation file.

papis.citations.update_citations_from_database(citations: List[Dict[str, Any]]) List[Dict[str, Any]][source]

Update a list of citations with data from the database.

Parameters:

citations – a list of existing citations to update.

papis.citations.save_citations(doc: Document, citations: List[Dict[str, Any]]) None[source]

Save the citations to the document’s citation file.

papis.citations.fetch_and_save_citations(doc: Document) None[source]

Retrieve citations from available sources and save them to the citations file.

papis.citations.get_citations_file(doc: Document) str | None[source]

Get the document’s citation file path (see citations-file-name).

Returns:

an absolute path to the citations file for doc.

papis.citations.has_citations(doc: Document) bool[source]
Returns:

True if the document has an existing citations file and False otherwise.

papis.citations.get_citations(doc: Document) List[Dict[str, Any]][source]

Retrieve citations from the document’s citation file.

papis.citations.get_cited_by_file(doc: Document) str | None[source]

Get the document’s cited-by file (see cited-by-file-name).

Returns:

an absolute path to the cited-by file for doc.

papis.citations.has_cited_by(doc: Document) bool[source]
Returns:

True if the document has a cited-by file and False otherwise.

papis.citations.save_cited_by(doc: Document, citations: List[Dict[str, Any]]) None[source]

Save the cited-by list citations to the document’s cited-by file.

papis.citations.fetch_cited_by_from_database(cit: Dict[str, Any]) List[Dict[str, Any]][source]

Fetch a list of documents that cite cit from the database.

Parameters:

cit – a citation to look for in the database.

Returns:

a list of documents that cite cit.

papis.citations.fetch_and_save_cited_by_from_database(doc: Document) None[source]

Call fetch_cited_by_from_database() and save_cited_by().

papis.citations.get_cited_by(doc: Document) List[Dict[str, Any]][source]

Get cited-by citations for the given document.

papis.cli

papis.cli.bool_flag(*args: Any, **kwargs: Any) Callable[[...], Any][source]

A wrapper to click.option() that hardcodes a boolean flag option.

papis.cli.query_argument(**attrs: Any) Callable[[...], Any][source]

Adds a query argument as a click decorator.

papis.cli.query_option(**attrs: Any) Callable[[...], Any][source]

Adds a -q, --query option as a click decorator.

papis.cli.sort_option(**attrs: Any) Callable[[...], Any][source]

Adds a --sort and a --reverse option as a click decorator.

papis.cli.doc_folder_option(**attrs: Any) Callable[[...], Any][source]

Adds a --doc-folder argument as a click decorator.

papis.cli.all_option(**attrs: Any) Callable[[...], Any][source]

Adds a --all option as a click decorator.

papis.cli.git_option(**attrs: Any) Callable[[...], Any][source]

Adds a --git option as a click decorator.

papis.cli.handle_doc_folder_or_query(query: str, doc_folder: str | Tuple[str, ...] | None) List[Document][source]

Query database for documents.

This handles the query_option() and doc_folder_option() command-line arguments. If a doc_folder is given, then the document at that location is loaded, otherwise the database is queried using query.

papis.cli.handle_doc_folder_query_sort(query: str, doc_folder: str | Tuple[str, ...] | None, sort_field: str | None, sort_reverse: bool) List[Document][source]

Query database for documents.

Similar to handle_doc_folder_or_query(), but also handles the sort_option() arguments. It sorts the resulting documents according to sort_field and sort_reverse.

Parameters:
  • sort_field – field by which to sort the resulting documents (see papis.document.sort()).

  • sort_reverse – if True, the fields are sorted in reverse order.

papis.cli.handle_doc_folder_query_all_sort(query: str, doc_folder: str | Tuple[str, ...] | None, sort_field: str | None, sort_reverse: bool, _all: bool) List[Document][source]

Query database for documents.

Similar to handle_doc_folder_query_sort(), but also handles the all_option() argument.

Parameters:

_all – if False, the user is prompted to pick a subset of documents (see papis.api.pick_doc()).

papis.cli.bypass(group: Group, command: Command, command_name: str) Callable[[...], Any][source]

Overwrite existing papis commands.

This function is especially important for developing scripts in papis.

For example, consider augmenting the add command, as seen when using papis add. In this case, we may want to add some additional options or behavior before calling papis.commands.add, but would like to avoid writing it from scratch. This function can then be used as follows:

import click
import papis.cli
import papis.commands.add

@click.group()
def main():
    """Your main app"""
    pass

@papis.cli.bypass(main, papis.commands.add.cli, "add")
def add(**kwargs):
    # do some logic here...
    # and call the original add command line function by
    papis.commands.add.cli.bypassed(**kwargs)

papis.commands

class papis.commands.AliasedGroup(name: str | None = None, commands: MutableMapping[str, Command] | Sequence[Command] | None = None, **attrs: Any)[source]

A click.Group that accepts command aliases.

This group command is taken from here and is to be used for groups with aliases. In this case, aliases are defined as prefixes of the command, so for a command named remove, rem is also accepted as long as it is unique.

format_commands(ctx: Context, formatter: HelpFormatter) None[source]

Overwrite the default formatting.

get_command(ctx: Context, cmd_name: str) Command | None[source]
Returns:

given a context and a command name, this returns a click.Command object if it exists or returns None.
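
The prefix-based alias lookup can be sketched without click as follows. resolve_command is a hypothetical stand-in for the lookup logic. Preferring exact matches and returning None for ambiguous prefixes are assumptions made for this illustration.

```python
from typing import Optional, Sequence


def resolve_command(name: str, commands: Sequence[str]) -> Optional[str]:
    """Hypothetical helper: resolve a name or unique prefix to a command."""
    if name in commands:
        return name
    matches = [cmd for cmd in commands if cmd.startswith(name)]
    # Only a unique prefix is accepted as an alias.
    return matches[0] if len(matches) == 1 else None


commands = ["add", "remove", "rename"]
print(resolve_command("rem", commands))  # remove
print(resolve_command("re", commands))   # None
```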

class papis.commands.Script(command_name: str, path: str | None, plugin: Command | None)[source]

A papis command plugin or script.

These plugins are made available through the main papis command-line as subcommands.

command_name: str

The name of the command.

path: str | None

The path to the script if it is a separate executable.

plugin: Command | None

A click.Command if the script is registered as an entry point.

papis.commands.get_external_scripts() Dict[str, Script][source]

Get a mapping of all external scripts that should be registered with papis.

An external script is an executable that can be found in the papis.config.get_scripts_folder() folder or in the user’s PATH. External scripts are recognized if they are prefixed with papis-.

Returns:

a mapping of scripts that have been found.
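
A discovery step of this kind might look roughly like the sketch below. find_prefixed_scripts is a hypothetical helper that returns plain paths and not Script objects. It is an assumption of this sketch that the command name is the file name without the papis- prefix and that the first folder providing a name wins.

```python
import os
from typing import Dict, Iterable


def find_prefixed_scripts(folders: Iterable[str],
                          prefix: str = "papis-") -> Dict[str, str]:
    """Hypothetical helper: map command names to prefixed executables."""
    scripts: Dict[str, str] = {}
    for folder in folders:
        if not os.path.isdir(folder):
            continue
        for filename in sorted(os.listdir(folder)):
            path = os.path.join(folder, filename)
            if (filename.startswith(prefix)
                    and os.path.isfile(path)
                    and os.access(path, os.X_OK)):
                # Keep the first match for each command name.
                scripts.setdefault(filename[len(prefix):], path)
    return scripts


# Search the folders listed in the user's PATH.
print(find_prefixed_scripts(os.environ.get("PATH", "").split(os.pathsep)))
```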

papis.commands.get_scripts() Dict[str, Script][source]

Get a mapping of commands that should be registered with papis.

This finds all the commands that are registered as entry points in the namespace "papis.command".

Returns:

a mapping of scripts that have been found.

papis.commands.get_all_scripts() Dict[str, Script][source]

Get a mapping of all commands that should be registered with papis.

This includes the results from get_external_scripts() and get_scripts(). Entrypoint-based scripts take priority, so if an external script with the same name is found it is silently ignored.

Returns:

a mapping of scripts that have been found.
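
The stated priority corresponds to a dictionary merge in which entry points are applied last. The values below are hypothetical placeholders and stand in for the Script objects used by papis.

```python
# Hypothetical results of the two discovery functions.
external = {"foo": "external: papis-foo", "bar": "external: papis-bar"}
entry_points = {"foo": "entry point: foo"}

# Later mappings win, so the entry point replaces the external "foo".
all_scripts = {**external, **entry_points}
print(all_scripts)
```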

papis.config

papis.config.get_general_settings_name() str[source]

Get the section name of the general settings.

>>> get_general_settings_name()
'settings'
class papis.config.Configuration[source]

A subclass of configparser.ConfigParser with custom defaults.

This class automatically reads the configuration file and imports any required scripts. If no file exists, a default one is created.

Use get_configuration() to instantiate this class instead of calling it directly.

papis.config.get_default_settings() Dict[str, Dict[str, Any]][source]

Get the default settings for all non-user variables.

Additional user variables can be registered using register_default_settings() and will be included in this dictionary.

papis.config.register_default_settings(settings_dictionary: Dict[str, Dict[str, Any]]) None[source]

Register configuration settings into the global configuration registry.

Note that both sections and global options can be defined. For instance, suppose that a script called foobar defines some configuration options. The script could then contain:

import papis.config

options = {"foobar": { "command": "open"}}
papis.config.register_default_settings(options)

which can then be accessed globally through

papis.config.get("command", section="foobar")
Parameters:

settings_dictionary – a dictionary of configuration settings, where the first level of keys defines the sections and the second level defines the actual configuration settings.

papis.config.get_config_home() str[source]
Returns:

the base directory relative to which user specific configuration files should be stored.

papis.config.get_config_dirs() List[str][source]
Returns:

a list of directories where the configuration files might be stored.

papis.config.get_config_folder() str[source]

Get the main configuration folder.

Returns:

the folder where the configuration files are stored, e.g. $HOME/.config/papis, by looking in the directories returned by get_config_dirs().

papis.config.get_config_file() str[source]

Get the main configuration file.

This file can be changed by set_config_file().

Returns:

the path of the main configuration file, e.g. $CONFIG_FOLDER/config, in the directory returned by get_config_folder().

papis.config.set_config_file(filepath: str) None[source]

Override the main configuration file.

papis.config.get_configpy_file() str[source]
Returns:

the path of the main Python configuration file, e.g. $CONFIG_FOLDER/config.py.

papis.config.get_scripts_folder() str[source]
Returns:

the folder where the scripts are stored, e.g. $CONFIG_FOLDER/scripts.

papis.config.set(key: str, value: Any, section: str | None = None) None[source]

Set a key in the configuration.

Parameters:
  • key – the name of the key to set.

  • value – the value to set it to, which can be any value understood by the Configuration.

  • section – the name of the section to set the key in.

papis.config.general_get(key: str, section: str | None = None, data_type: type | None = None) Any | None[source]

Get the value for a given key in section.

This function is a bit more general than the get from Configuration (see configparser.ConfigParser.get()). In particular, it supports:

  • Providing both the key and the section, in which case the key is retrieved from that section directly.

  • Providing a key of the form <section>-<key> without a section, in which case the full key is expected to be in the general settings section or a library section.

The priority of the search is given by:

  1. The key is retrieved from a library section.

  2. The key is retrieved from the given section, if any.

  3. The key is retrieved from the general section.

Parameters:
  • key – a key in the configuration file to retrieve.

  • section – a section from which to retrieve the key, which defaults to get_general_settings_name().

  • data_type – the data type that should be expected for the value of the variable.
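
The search priority described above can be sketched with nested dictionaries. This illustrates the documented order only and is not the implementation: lookup is a hypothetical helper, "papers" is a made-up library name, and default values are not handled.

```python
from typing import Any, Dict, Optional

Config = Dict[str, Dict[str, Any]]


def lookup(config: Config, key: str, section: Optional[str] = None,
           library: str = "papers", general: str = "settings") -> Optional[Any]:
    """Hypothetical helper: library section, then given section, then general."""
    full_key = "{}-{}".format(section, key) if section else key
    candidates = [(library, full_key)]
    if section:
        candidates.append((section, key))
    candidates.append((general, full_key))
    for name, option in candidates:
        if option in config.get(name, {}):
            return config[name][option]
    return None


config = {
    "settings": {"foobar-command": "open", "editor": "vim"},
    "foobar": {"command": "xdg-open"},
    "papers": {"editor": "nano"},
}
print(lookup(config, "command", section="foobar"))  # xdg-open
print(lookup(config, "editor"))  # nano
```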

papis.config.get(key: str, section: str | None = None) Any | None[source]

Retrieve a general value (can be None) from the configuration file.

papis.config.getint(key: str, section: str | None = None) int | None[source]

Retrieve an integer value from the configuration file.

>>> set("something", 42)
>>> getint("something")
42
papis.config.getfloat(key: str, section: str | None = None) float | None[source]

Retrieve a floating point value from the configuration file.

>>> set("something", 0.42)
>>> getfloat("something")
0.42
papis.config.getboolean(key: str, section: str | None = None) bool | None[source]

Retrieve a boolean value from the configuration file.

>>> set("add-open", True)
>>> getboolean("add-open")
True
papis.config.getstring(key: str, section: str | None = None) str[source]

Retrieve a string value from the configuration file.

>>> set("add-open", "hello world")
>>> getstring("add-open")
'hello world'
papis.config.getlist(key: str, section: str | None = None) List[str][source]

Retrieve a list value from the configuration file.

This function uses eval() to evaluate the string present in the configuration file into a Python list. This can be unsafe if the list contains unknown code.

>>> set("tags", "['a', 'b', 'c']")
>>> getlist("tags")
['a', 'b', 'c']
Raises:

SyntaxError – Whenever the parsed syntax is either not a valid Python object or not a valid Python list.
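
Plugin code that parses list-like strings from untrusted input could use ast.literal_eval, which only accepts Python literals. This is an alternative technique and not what getlist() does. parse_list is a hypothetical helper.

```python
import ast
from typing import List


def parse_list(text: str) -> List[str]:
    """Hypothetical helper: parse a list literal without executing code."""
    # literal_eval rejects function calls and other non-literal expressions.
    value = ast.literal_eval(text)
    if not isinstance(value, list):
        raise SyntaxError("expected a Python list, got: {!r}".format(text))
    return [str(item) for item in value]


print(parse_list("['a', 'b', 'c']"))  # ['a', 'b', 'c']
```

An input such as "__import__('os')" is rejected with a ValueError and is never evaluated.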

papis.config.get_configuration() Configuration[source]

Get the configuration object.

If no configuration has been initialized, it initializes one. Only one configuration per process should ever be configured.

papis.config.merge_configuration_from_path(path: str | None, configuration: Configuration) None[source]

Merge information of a configuration file found in path into configuration.

Parameters:
  • path – a path to a configuration file.

  • configuration – an existing Configuration object.

papis.config.set_lib(library: Library) None[source]

Set the current library.

papis.config.set_lib_from_name(libname: str) None[source]

Set the current library from a name.

Parameters:

libname – the name of a library in the configuration file or a path to an existing folder that should be considered a library.

papis.config.get_lib_from_name(libname: str) Library[source]

Get a library object from a name.

Parameters:

libname – the name of a library in the configuration file or a path to an existing folder that should be considered a library.

papis.config.get_lib_dirs() List[str][source]

Get the directories of the current library.

papis.config.get_lib_name() str[source]

Get the name of the current library.

papis.config.get_lib() Library[source]

Get the current library.

If no library has been set before, the default library will be retrieved. If the PAPIS_LIB environment variable is defined, this is the library name (or path) that will be taken as a default.

papis.config.get_libs() List[str][source]

Get all the library names from the configuration file.

papis.config.get_libs_from_config(config: Configuration) List[str][source]

Get all library names from the given configuration.

In the configuration file, any sections that contain a "dir" or a "dirs" key are considered to be libraries.
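
The rule above can be reproduced with the standard configparser module. In this sketch, library_names is a hypothetical helper and the paths in the sample configuration are made up.

```python
import configparser
from typing import List

CONFIG_TEXT = """
[settings]
default-library = papers

[papers]
dir = ~/Documents/papers

[books]
dirs = ["~/Documents/books", "~/Documents/more-books"]
"""


def library_names(config: configparser.ConfigParser) -> List[str]:
    """Hypothetical helper: sections with a "dir" or "dirs" key are libraries."""
    return [name for name in config.sections()
            if "dir" in config[name] or "dirs" in config[name]]


config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
print(library_names(config))  # ['papers', 'books']
```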

papis.config.reset_configuration() Configuration[source]

Resets the existing configuration and returns a new one without any user settings.

papis.docmatcher

class papis.docmatcher.ParseResult(search: str, pattern: Pattern[str], doc_key: str | None)[source]

Result from parsing a search string.

For example, a search string such as "author:einstein" will result in

r = ParseResult(search="einstein", pattern=<...>, doc_key="author")
search: str

A search string that was matched for this result.

pattern: Pattern[str]

A regex pattern constructed from the search using get_regex_from_search().

doc_key: str | None

A document key that was matched for this result, if any.

class papis.docmatcher.MatcherCallable(*args, **kwargs)[source]

A callable typing.Protocol used to match a document for a given search.

__call__(document: Document, search: Pattern[str], match_format: str | None = None, doc_key: str | None = None) Any[source]

Match a document’s keys to a given search pattern.

The matcher can decide whether the match_format or the doc_key take priority when matching against the given pattern in search. If possible, doc_key should be given priority as the more specific choice.

Parameters:
  • search – a regex pattern to match the query against (see ParseResult.pattern).

  • match_format – a format string (see papis.format.format()) to match against.

  • doc_key – a specific key in the document to match against.

Returns:

None if the match fails and anything else otherwise.
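
A callable with this shape might look like the sketch below. It assumes that documents behave like plain dictionaries and expands match_format with str.format, which is a simplification, since papis itself uses papis.format.format(). simple_matcher is a hypothetical name.

```python
import re
from typing import Any, Dict, Optional, Pattern


def simple_matcher(document: Dict[str, Any],
                   search: Pattern[str],
                   match_format: Optional[str] = None,
                   doc_key: Optional[str] = None) -> Any:
    """Hypothetical matcher: return a match object, or None if nothing matches."""
    if doc_key is not None:
        # The more specific document key takes priority.
        text = str(document.get(doc_key, ""))
    elif match_format is not None:
        text = match_format.format(doc=document)
    else:
        text = " ".join(str(value) for value in document.values())
    return search.search(text)


doc = {"title": "Relativity", "author": "Einstein"}
pattern = re.compile("einst", re.IGNORECASE)
print(simple_matcher(doc, pattern, doc_key="author") is not None)  # True
print(simple_matcher(doc, pattern, doc_key="title") is not None)   # False
```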

class papis.docmatcher.DocMatcher[source]

This class implements the mini query language for papis.

The (static) methods should be used as follows:

  • First, the search string has to be set:

    DocMatcher.set_search(search_string)
    
  • Then, the parse method should be called in order to decipher the search_string:

    DocMatcher.parse()
    
  • Finally, the DocMatcher is ready to match documents with the input query via:

    DocMatcher.return_if_match(doc)
    
search: ClassVar[str] = ''

Search string from which the matcher is constructed.

parsed_search

A parsed version of the search string using parse_query().

matcher: ClassVar[MatcherCallable | None] = None

A MatcherCallable used to match the document to the parsed_search.

match_format: ClassVar[str] = '{doc[tags]}{doc.subfolder}{doc[title]}{doc[author]}{doc[year]}'

A format string (defaulting to match-format) used to match the parsed search results if no document key is present.

classmethod return_if_match(doc: Document) Document | None[source]

Use DocMatcher.parsed_search to match the doc against the query.

>>> import papis.document
>>> from papis.database.cache import match_document
>>> doc = papis.document.from_data({'title': 'einstein'})
>>> DocMatcher.set_matcher(match_document)
>>> result = DocMatcher.parse('einste')
>>> DocMatcher.return_if_match(doc) is not None
True
>>> result = DocMatcher.parse('heisenberg')
>>> DocMatcher.return_if_match(doc) is not None
False
>>> result = DocMatcher.parse('title : ein')
>>> DocMatcher.return_if_match(doc) is not None
True
Parameters:

doc – a papis document to match against.

classmethod set_search(search: str) None[source]

Set the search for this instance of the matcher.

>>> DocMatcher.set_search('author:Hummel')
>>> DocMatcher.search
'author:Hummel'
classmethod set_matcher(matcher: MatcherCallable) None[source]

Set the matcher callable for the search.

>>> from papis.database.cache import match_document
>>> DocMatcher.set_matcher(match_document)
classmethod parse(search: str | None = None) List[ParseResult][source]

Parse the main query text.

This method will also set DocMatcher.parsed_search to the resulting parsed query and it will return it too.

>>> print(DocMatcher.parse('hello author : einstein'))
[['hello'], ['author', 'einstein']]
>>> print(DocMatcher.parse(''))
[]
>>> print(DocMatcher.parse('"hello world whatever :" tags : \'hello ::::\''))
[['hello world whatever :'], ['tags', 'hello ::::']]
>>> print(DocMatcher.parse('hello'))
[['hello']]
Parameters:

search – a custom search text string that overwrites search.

Returns:

a parsed query.

papis.docmatcher.get_regex_from_search(search: str) Pattern[str][source]

Creates a default regex from a search string.

>>> get_regex_from_search(' ein 192     photon').pattern
'.*ein.*192.*photon.*'
>>> get_regex_from_search('{1234}').pattern
'.*\\{1234\\}.*'
Parameters:

search – a valid search string.

Returns:

a regular expression representing the search string, which is properly escaped and allows for multiple spaces.
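
The behaviour shown in the doctest above can be reproduced with the standard library alone. The following is a sketch under that assumption, not the papis implementation; regex_from_search is a hypothetical name, and any regex flags papis may apply are left out.

```python
import re
from typing import Pattern


def regex_from_search(search: str) -> Pattern[str]:
    """Hypothetical stand-in for get_regex_from_search()."""
    # Escape each whitespace-separated word and allow anything in between.
    words = [re.escape(word) for word in search.split()]
    return re.compile(".*" + ".*".join(words) + ".*")


photon = regex_from_search(" ein 192     photon")
braces = regex_from_search("{1234}")
```

Splitting on whitespace first is what makes runs of multiple spaces irrelevant, and escaping each word keeps characters such as braces literal.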

papis.docmatcher.parse_query(query_string: str) List[ParseResult][source]

Parse a query string using pyparsing.

The query language implemented by this function for papis supports strings of the form:

'hello author : Einstein    title: "Fancy Title: Part 1" tags'

which will result in

results = [
    ParseResult(search="hello", pattern=<...>, doc_key=None),
    ParseResult(search="Einstein", pattern=<...>, doc_key="author"),
    ParseResult(search="Fancy Title: Part 1", pattern=<...>, doc_key="title"),
    ParseResult(search="tags", pattern=<...>, doc_key=None),
]

We can see there that constructs of the form "key:value", with the colon as a separator, are recognized and parsed into document keys. The colon can be escaped by enclosing the text in quotes. Otherwise, each individual word in the search query will give another ParseResult. Each search term can contain additional regex characters.

Parameters:

query_string – a search string to parse into a structured format.

Returns:

a list of parsing results for each token in the query string.
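
To make the tokenization rules concrete, here is a rough sketch that mimics them with the standard re module instead of pyparsing. The name parse_sketch, the token regex, and the handling of stray colons are assumptions for illustration. It returns plain (doc_key, search) tuples rather than ParseResult objects.

```python
import re
from typing import List, Optional, Tuple

# A token is a double-quoted string, a single-quoted string,
# a colon or a bare word.
TOKEN = re.compile(r'''"([^"]*)"|'([^']*)'|(:)|([^\s:"']+)''')


def parse_sketch(query: str) -> List[Tuple[Optional[str], str]]:
    """Return (doc_key, search) pairs; doc_key is None for plain words."""
    tokens: List[Tuple[bool, str]] = []
    for match in TOKEN.finditer(query):
        double, single, colon, word = match.groups()
        if colon is not None:
            tokens.append((True, ":"))
        else:
            text = next(t for t in (double, single, word) if t is not None)
            tokens.append((False, text))

    results: List[Tuple[Optional[str], str]] = []
    i = 0
    while i < len(tokens):
        is_colon, text = tokens[i]
        if is_colon:
            # Assumption: a stray colon without a key is simply skipped.
            i += 1
        elif (i + 2 < len(tokens)
                and tokens[i + 1][0] and not tokens[i + 2][0]):
            # "key : value" becomes a single result with a document key.
            results.append((text, tokens[i + 2][1]))
            i += 3
        else:
            results.append((None, text))
            i += 1

    return results


parsed = parse_sketch(
    'hello author : Einstein    title: "Fancy Title: Part 1" tags')
```

The quoted title keeps its inner colon because quoted strings are consumed as a single token before colons are considered.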

papis.document

Module defining the main document type.

papis.document.DocumentLike

A union of types that can be converted to a document.

alias of Union[Document, Dict[str, Any]]

class papis.document.KeyConversion[source]

A dict that contains a key and an action.

key: str | None

Name of a key in a foreign dictionary to convert.

action: Callable[[Any], Any] | None

Action to apply to the value at key for pre-processing.

papis.document.EmptyKeyConversion = {'action': None, 'key': None}

A default KeyConversion.

class papis.document.KeyConversionPair(from_key, rules)[source]
from_key: str

A string denoting the key in the input data.

rules: List[KeyConversion]

A list of KeyConversion key mapping rules used to rename and post-process the from_key and its value.

papis.document.keyconversion_to_data(conversions: Sequence[KeyConversionPair], data: Dict[str, Any], keep_unknown_keys: bool = False) Dict[str, Any][source]

Function to convert between dictionaries.

This can be used to define a fixed set of translation rules between, e.g., JSON data obtained from a website API and standard papis key names and formatting. The implementation is completely generic.

For example, we have the simple dictionary

data = {"id": "10.1103/physrevb.89.140501"}

which contains the DOI of a document with the wrong key. We can then write the following rules

conversions = [
    KeyConversionPair("id", [
        {"key": "doi", "action": None},
        {"key": "url", "action": lambda x: "https://doi.org/{}".format(x)}
    ])
]

new_data = keyconversion_to_data(conversions, data)

to rename the "id" key to the standard "doi" key used by papis and to add a "url" key built from the same value. Any number of such rules can be written, depending on the complexity of the incoming data. Note that any errors raised on the application of the action will be silently ignored and the corresponding key will be skipped.

Parameters:
  • conversions – a sequence of KeyConversionPairs used to convert the data.

  • data – a dict to be converted according to conversions.

  • keep_unknown_keys – if True, unknown keys from data are kept in the resulting dictionary. Otherwise, only keys from conversions are present.

Returns:

a new dict containing the entries from data converted according to conversions.
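
The conversion logic described above can be sketched in a few lines of plain Python. This is an illustrative reimplementation, not the papis code: Pair and convert are hypothetical names, and the behaviour for a rule without a "key" (keeping the original key name) is an assumption.

```python
from typing import Any, Dict, List, NamedTuple, Optional, Sequence


class Pair(NamedTuple):
    """Hypothetical stand-in for KeyConversionPair."""
    from_key: str
    rules: List[Dict[str, Optional[Any]]]


def convert(conversions: Sequence[Pair],
            data: Dict[str, Any],
            keep_unknown_keys: bool = False) -> Dict[str, Any]:
    known = {pair.from_key for pair in conversions}
    # Optionally start from the keys that no rule mentions.
    result = ({k: v for k, v in data.items() if k not in known}
              if keep_unknown_keys else {})

    for pair in conversions:
        if pair.from_key not in data:
            continue
        for rule in pair.rules:
            # Assumption: a missing "key" keeps the original key name.
            new_key = rule.get("key") or pair.from_key
            action = rule.get("action")
            try:
                value = data[pair.from_key]
                result[new_key] = action(value) if action else value
            except Exception:
                # Errors in the action are ignored and the key is skipped.
                continue

    return result


data = {"id": "10.1103/physrevb.89.140501", "note": "unrelated"}
conversions = [
    Pair("id", [
        {"key": "doi", "action": None},
        {"key": "url", "action": lambda x: "https://doi.org/{}".format(x)},
    ])
]
new_data = convert(conversions, data)
```

With the default keep_unknown_keys=False, the "note" entry is dropped and only "doi" and "url" remain.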

papis.document.author_list_to_author(data: Dict[str, Any]) str[source]

Convert a list of authors into a single author string.

This uses the multiple-authors-separator and the multiple-authors-format settings to construct the concatenated authors.

Parameters:

data – a dict that contains an "author_list" key to be converted into a single author string.

>>> author1 = {"given": "Some", "family": "Author"}
>>> author2 = {"given": "Other", "family": "Author"}
>>> author_list_to_author({"author_list": [author1, author2]})
'Author, Some and Author, Other'
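
A minimal sketch of the joining step follows. The values used for the separator and the per-author format are assumptions inferred from the doctest output above, not read from the papis configuration, and join_authors is a hypothetical name.

```python
from typing import Any, Dict, List

# Assumed values for multiple-authors-separator and multiple-authors-format,
# inferred from the doctest output rather than from the configuration.
SEPARATOR = " and "
AUTHOR_FORMAT = "{au[family]}, {au[given]}"


def join_authors(author_list: List[Dict[str, Any]]) -> str:
    # Format each author on its own, then join with the separator.
    return SEPARATOR.join(AUTHOR_FORMAT.format(au=au) for au in author_list)


authors = join_authors([
    {"given": "Some", "family": "Author"},
    {"given": "Other", "family": "Author"},
])
```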
papis.document.guess_authors_separator(authors: str) str[source]

Attempt to determine the separator for various non-BibTeX author lists.

Parameters:

authors – author string to determine the separator for.

Returns:

a regex that can be used to split the authors string.

For example:

>>> s = "Sanger, F. and Nicklen, S. and Coulson, A. R."
>>> assert guess_authors_separator(s) == "and"
>>> s = "Fabian Sanger and Steven Nicklen and Alexander R. Coulson"
>>> assert guess_authors_separator(s) == "and"
>>> s = "Fabian Sanger, Steven Nicklen, Alexander R. Coulson"
>>> assert guess_authors_separator(s) == ","
>>> s = "Fabian Sanger, and Steven Nicklen, and Alexander R. Coulson"
>>> import re
>>> sep = guess_authors_separator(s)
>>> assert re.match(sep, ", and")
>>> s = "Dagobert Duck and von Beethoven, Ludwig and Ford, Jr., Henry"
>>> assert guess_authors_separator(s) == "and"
>>> s = "Turing, A. M."
>>> assert guess_authors_separator(s) == "and"
papis.document.split_author_name(author: str) Dict[str, Any][source]

Split an author name into a given and family name.

This uses bibtexparser.customization.splitname() to correctly split and determine the first and last names of an author in the list. Note that this is just a heuristic and can give incorrect results for certain author names.

Parameters:

author – a string containing an author name.

Returns:

a dict with the family and given name of the author.

papis.document.split_authors_name(authors: str | List[str], separator: str | None = None) List[Dict[str, Any]][source]

Convert list of authors to a fixed format.

Uses split_author_name() to construct the individual authors and the separator to split the authors in the list.

Parameters:
  • authors – a list of author names, where each entry can consist of multiple authors separated by separator.

  • separator – a separator for entries in authors that contain multiple authors. If None, a separator is guessed using guess_authors_separator().

class papis.document.DocHtmlEscaped(doc: Document)[source]

Small helper class to escape HTML elements in a document.

>>> DocHtmlEscaped(from_data({"title": '> >< int & "" "'}))['title']
'&gt; &gt;&lt; int &amp; &quot;&quot; &quot;'
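
The same escaping can be obtained from html.escape() in the standard library. The wrapper below is only a sketch of the idea; HtmlEscaped is a hypothetical name and wraps a plain dict rather than a Document.

```python
import html


class HtmlEscaped(dict):
    """Hypothetical stand-in: escape values when they are accessed."""

    def __getitem__(self, key: str) -> str:
        # Convert to a string first so that non-string values work too.
        return html.escape(str(super().__getitem__(key)))


escaped = HtmlEscaped({"title": '> >< int & "" "'})["title"]
```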
class papis.document.Document(folder: str | None = None, data: Dict[str, Any] | None = None)[source]

An abstract document in a papis library.

This class inherits from a standard dict and implements some additional functionality.

html_escape

A DocHtmlEscaped instance that can be used to escape keys in the document for use in HTML documents.

has(key: str) bool[source]

Check if key is in the document.

copy() Document[source]

Make a shallow copy of the Document.

set_folder(folder: str) None[source]

Set the document’s main folder.

This also updates the location of the info file and other attributes. Note, however, that it will not load any data from the given folder even if it contains another info file (see from_folder() for this functionality).

Parameters:

folder – an absolute path to a new main folder for the document.

get_main_folder() str | None[source]
Returns:

the root path in the filesystem where the document is stored, if any.

get_main_folder_name() str | None[source]
Returns:

the folder name of the document, i.e. the basename of the path returned by get_main_folder().

get_info_file() str[source]
Returns:

path to the info file, which can also be an empty string if no such file has been created.

get_files() List[str][source]

Get the files linked to the document.

The files in a document are stored relative to its main folder. If no main folder is set on the document (see set_folder()), then this function will not return any files. To retrieve the relative file paths only, access doc["files"] directly.

Returns:

a list of absolute file paths in the document’s main folder, if any.

get_notes() List[str][source]

Get all notes linked to the document.

Returns:

a list of absolute file paths in the document’s main folder, if any, similar to get_files().

save() None[source]

Saves the current document fields into the info file.

load() None[source]

Load information from the info file.

papis.document.from_data(data: Dict[str, Any]) Document[source]

Construct a Document from a dictionary.

Parameters:

data – a dictionary to be made into a new document.

papis.document.from_folder(folder_path: str) Document[source]

Construct a Document from a folder.

Parameters:

folder_path – absolute path to a valid papis folder.

papis.document.to_json(document: Document) str[source]

Export the document to JSON.

Returns:

a JSON string corresponding to all the entries in the document.

papis.document.to_dict(document: Document) Dict[str, Any][source]

Convert a document back into a standard dict.

Returns:

a dict corresponding to all the entries in the document.

papis.document.dump(document: Document) str[source]

Dump the document into a formatted string.

The format of the string is not fixed and is meant to be used to display the document entries in a consistent way across papis.

Returns:

a string containing all the entries in the document.

>>> doc = from_data({'title': 'Hello World'})
>>> dump(doc)
'title: Hello World'
papis.document.delete(document: Document) None[source]

Delete a document from the filesystem.

This function deletes the main folder of the document (recursively), but it does not delete the in-memory version of the document.

papis.document.describe(document: Document | Dict[str, Any]) str[source]
Returns:

a string description of the current document using document-description-format.

papis.document.move(document: Document, path: str) None[source]

Move the document to a new main folder at path.

This supposes that the document exists in the location document.get_main_folder() and will change the folder in the input document as a result.

Parameters:

path – absolute path where the document should be moved to. This path is expected to not exist yet and will be created by this function.

>>> doc = from_data({'title': 'Hello World'})
>>> doc.set_folder('path/to/folder')
>>> import tempfile; newfolder = tempfile.mkdtemp()
>>> move(doc, newfolder)
Traceback (most recent call last):
...
FileExistsError: There is already...
papis.document.sort(docs: Sequence[Document], key: str, reverse: bool = False) List[Document][source]

Sort a list of documents by the given key.

The sort is performed on the key with a priority given to the type of the value. If the key does not exist in a document, that document is given the lowest priority and placed at the end of the list.

Parameters:
  • docs – a sequence of documents.

  • key – a key in the documents by which to sort.

  • reverse – if True, the sorting is done in reverse order (descending instead of ascending).

Returns:

a list of documents sorted by key.
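
One way to implement a type-aware sort is to map every value to a tuple whose first entry ranks the type. The sketch below does this for plain dicts. The concrete ranking (integers, then strings, then missing keys) is an assumption for illustration and may differ from the priorities used by papis. The reverse option is left out.

```python
from typing import Any, Dict, List, Sequence, Tuple


def sort_key(doc: Dict[str, Any], key: str) -> Tuple[int, int, str]:
    value = doc.get(key)
    if value is None:
        return (2, 0, "")        # missing keys go last
    if isinstance(value, int):
        return (0, value, "")    # assumption: numbers before strings
    return (1, 0, str(value))


def sort_docs(docs: Sequence[Dict[str, Any]], key: str) -> List[Dict[str, Any]]:
    return sorted(docs, key=lambda doc: sort_key(doc, key))


docs = [
    {"year": "unknown"},
    {"title": "No year at all"},
    {"year": 2001},
    {"year": 1905},
]
by_year = sort_docs(docs, "year")
```

Because every key has the same tuple shape, values of different types never get compared with each other directly.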

papis.document.new(folder_path: str, data: Dict[str, Any], files: Sequence[str] | None = None) Document[source]

Creates a complete document with data and existing files.

The document is saved to the filesystem at folder_path and all the given files are copied over to the main folder.

Parameters:
  • folder_path – a main folder for the document.

  • data – a dict with key and values to be used as metadata in the document.

  • files – a sequence of files to add to the document.

Raises:

FileExistsError – if folder_path already exists.

papis.downloaders

class papis.downloaders.Importer(uri: str = '')[source]

Importer that tries to get data and files from implemented downloaders.

This importer simply calls get_info_from_url() on the given URI.

classmethod match(uri: str) Importer | None[source]

Check if the importer can process the given URI.

For example, an importer that supports links from the arXiv can check that the given URI matches using:

re.match(r".*arxiv.org.*", uri)

This can then be used to instantiate and return a corresponding Importer object.

Parameters:

uri – A URI where the document information should be retrieved from.

Returns:

An importer instance if the match to the URI is successful or None otherwise.

fetch() None[source]

Fetch metadata and files for the given uri.

This method calls Importer.fetch_data() and Importer.fetch_files() to get all the information available for the document. It is recommended to implement the two methods separately, if possible, for maximum flexibility.

The imported data is stored in ctx and it is not queried again on subsequent calls to this function.

fetch_data() None[source]

Fetch metadata from the given uri.

The imported metadata is stored in ctx.

fetch_files() None[source]

Fetch files from the given uri.

The imported files are stored in ctx.

class papis.downloaders.Downloader(uri: str = '', name: str = '', ctx: Context | None = None, expected_document_extension: str | Sequence[str] | None = None, cookies: Dict[str, str] | None = None, priority: int = 1)[source]

A base class for downloader instances implementing common functionality.

In general, downloaders are expected to implement a subset of the methods below, depending on how general they are. A simple downloader could implement only get_bibtex_url() and get_document_url().

expected_document_extension

A single extension or a list of extensions supported by the downloader. The extensions do not contain the leading dot, e.g. ["pdf", "djvu"].

priority

A priority given to the downloader. This is used when trying to automatically determine a preferred downloader for a given URL.

session

A requests.Session that is used for all the requests made by the downloader.

classmethod match(url: str) Downloader | None[source]

Check if the downloader can process the given URL.

For example, a downloader that supports links from the arXiv can check that the given URL matches using:

re.match(r".*arxiv.org.*", url)

This can then be used to instantiate and return a corresponding Downloader object.

Parameters:

url – A URL where the document information should be retrieved from.

Returns:

A downloader instance if the match to the URL is successful or None otherwise.

fetch() None[source]

Fetch metadata and files for the given uri.

This method calls Downloader.fetch_data() and Downloader.fetch_files() to get all the information available for the document. It is recommended to implement the two methods separately, if possible, for maximum flexibility.

The imported data is stored in ctx and it is not queried again on subsequent calls to this function.

fetch_data() None[source]

Fetch metadata for the given URL.

The imported metadata is stored in ctx. To fetch the metadata, the following steps are followed

Note that previous steps overwrite any information, i.e. the BibTeX data will take priority.

fetch_files() None[source]

Fetch files from the given uri.

The imported files are stored in ctx. The file is downloaded with download_document() and stored as a temporary file.

get_bibtex_url() str | None[source]
Returns:

a URL to a valid BibTeX file that can be used to extract metadata about the document.

get_bibtex_data() str | None[source]

Get BibTeX data available at get_bibtex_url(), if any.

Returns:

a string containing the BibTeX data, which can be parsed.

download_bibtex() None[source]

Download and store the BibTeX data from get_bibtex_url().

Use get_bibtex_data() to access the metadata from the BibTeX URL.

get_data() Dict[str, Any][source]

Retrieve general metadata from the given URL.

This function is meant to be as general as possible and should not contain data imported from BibTeX (use get_bibtex_data() instead). For example, this can be used for web scraping or calling other website APIs to gather metadata about the document.

get_doi() str | None[source]
Returns:

a DOI for the document, if any.

get_document_url() str | None[source]
Returns:

a URL to a file that should be downloaded.

get_document_data() bytes | None[source]

Get data for the downloaded file that is given by get_document_url().

Returns:

the bytes (stored in memory) for the downloaded file.

get_document_extension() str[source]
Returns:

a guess for the extension of get_document_data(). This is based on filetype and uses magic file signatures to determine the type. If no guess is valid, an empty string is returned.

download_document() None[source]

Download and store the file that is given by get_document_url().

Use get_document_data() to access the file binary contents.

check_document_format() bool[source]

Check if the document downloaded by download_document() has a file type supported by the downloader.

If the downloader has no preferred type, then all files are accepted.

Returns:

True if the document has a supported file type and False otherwise.

papis.downloaders.get_available_downloaders() List[Type[Downloader]][source]

Get all declared downloader classes.

papis.downloaders.get_matching_downloaders(url: str) List[Downloader][source]

Get downloaders matching the given url.

Parameters:

url – a URL to match.

Returns:

a list of downloaders (sorted by priority).

papis.downloaders.get_downloader_by_name(name: str) Type[Downloader][source]

Get a specific downloader by its name.

Parameters:

name – the name of the downloader. Note that this is the name of the entry point used to define the downloader. In general, this should be the same as its name, but this is not enforced.

Returns:

a downloader class.

papis.downloaders.get_info_from_url(url: str, expected_doc_format: str | None = None) Context[source]

Get information directly from the given url.

Parameters:
  • url – the URL of a resource.

  • expected_doc_format – an expected document file type, that is used to override the file type defined by the chosen downloader.

papis.downloaders.download_document(url: str, expected_document_extension: str | None = None, cookies: Dict[str, Any] | None = None) str | None[source]

Download a document from url and store it in a local file.

Parameters:
  • url – the URL of a remote file.

  • expected_document_extension – an expected file type. If None, then an extension is guessed from the file contents, but this can also fail.

Returns:

a path to a local file containing the data from url.

papis.exceptions

This module implements custom exceptions used to make the code more readable.

exception papis.exceptions.DefaultSettingValueMissing(key: str)[source]

Exception raised when a configuration setting is missing and has no default value.

exception papis.exceptions.DocumentFolderNotFound(doc: str)[source]

Exception raised when a document has no main folder.

papis.filetype

class papis.filetype.DjVu[source]

Implements a custom DjVu type matcher for filetype.

papis.filetype.guess_content_extension(content: bytes) str | None[source]

Guess the extension from (potential) file contents.

This method attempts to look at known file signatures to determine the file type. This is not always possible, as it is hard to determine a unique type.

Parameters:

content – contents of a file.

Returns:

an extension string (e.g. “pdf” without the dot) or None if the file type cannot be determined.

papis.filetype.guess_document_extension(document_path: str) str | None[source]

Guess the extension of a given file at document_path.

Parameters:

document_path – path to an existing file.

Returns:

an extension string (e.g. “pdf” without the dot) or None if the file type cannot be determined.

papis.filetype.get_document_extension(document_path: str) str[source]

Get an extension for the file at document_path.

This uses guess_document_extension() and returns a default extension “data” if no specific type can be determined from the file.

Parameters:

document_path – path to an existing file.

Returns:

an extension string.
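
The guess-then-fall-back pattern can be sketched with a tiny table of magic signatures. This example works on bytes instead of a file path and uses a hand-written table instead of the filetype library. The names guess_extension and get_extension, and the two-entry table, are assumptions for illustration.

```python
from typing import Optional

# Assumed signature table for this sketch; a real matcher knows many more.
SIGNATURES = {
    b"%PDF": "pdf",
    b"AT&TFORM": "djvu",
}


def guess_extension(content: bytes) -> Optional[str]:
    # Compare the start of the content against each known signature.
    for magic, extension in SIGNATURES.items():
        if content.startswith(magic):
            return extension
    return None


def get_extension(content: bytes) -> str:
    # Fall back to the generic "data" extension if the guess fails.
    return guess_extension(content) or "data"


pdf_ext = get_extension(b"%PDF-1.4 ...")
unknown_ext = get_extension(b"plain text")
```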

papis.format

papis.format.FORMATTER_EXTENSION_NAME = 'papis.format'

The entry point name for formatter plugins.

exception papis.format.InvalidFormatterError[source]

An exception that is thrown when an invalid formatter is selected.

exception papis.format.FormatFailedError[source]

An exception that is thrown when a format string fails to be interpolated.

This can happen due to lack of data (e.g. missing fields in the document) or invalid format strings (e.g. passed to the wrong formatter).

class papis.format.Formatter[source]

A generic formatter that works on templated strings using a document.

format(fmt: str, doc: Document | Dict[str, Any], doc_key: str = '', additional: Dict[str, Any] | None = None, default: str | None = None) str[source]
Parameters:
  • fmt – a format string understood by the formatter.

  • doc – an object convertible to a document.

  • doc_key – the name of the document in the format string. By default, this falls back to format-doc-name.

  • default – an optional string to use as a default value if the formatting fails. If no default is given, a FormatFailedError will be raised.

  • additional – a dict of additional entries to pass to the formatter.

Returns:

a string with all the replacement fields filled in.

class papis.format.PythonFormatter[source]

Construct a string using a PEP 3101 (str.format based) format string.

This formatter is named "python" and can be set using the formatter setting in the configuration file. The formatted string has access to the doc variable, that is always a papis.document.Document. A string using this formatter can look like

"{doc[year]} - {doc[author_list][0][family]} - {doc[title]}"

Note, however, that according to PEP 3101 some simple formatting is not possible. For example, the following is not allowed

"{doc[title].lower()}"

and should be replaced with

"{doc[title]!l}"

The following special conversions are implemented: “l” for str.lower(), “u” for str.upper(), “t” for str.title(), “c” for str.capitalize(), “y” that uses slugify. Additionally, the following syntax is available to select subsets from a string

"{doc[title]:1.3S}"

which will select the words[1:3] from the title (words are split by single spaces).
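
Custom conversions and format specifications like these can be built on string.Formatter from the standard library by overriding convert_field() and format_field(). The class below is a sketch of that technique, not the papis implementation. SketchFormatter is a hypothetical name and the slugify-based “y” conversion is left out.

```python
import string
from typing import Any


class SketchFormatter(string.Formatter):
    CONVERSIONS = {
        "l": str.lower,
        "u": str.upper,
        "t": str.title,
        "c": str.capitalize,
    }

    def convert_field(self, value: Any, conversion: str) -> Any:
        # Handle the extra "!l", "!u", "!t" and "!c" conversions.
        if conversion in self.CONVERSIONS:
            return self.CONVERSIONS[conversion](str(value))
        return super().convert_field(value, conversion)

    def format_field(self, value: Any, format_spec: str) -> str:
        # "start.endS" selects words[start:end], split on single spaces.
        if isinstance(value, str) and format_spec.endswith("S"):
            start, _, end = format_spec[:-1].partition(".")
            words = value.split(" ")
            first = int(start) if start else 0
            last = int(end) if end else len(words)
            return " ".join(words[first:last])
        return super().format_field(value, format_spec)


formatter = SketchFormatter()
doc = {"title": "A Brief History of Time", "year": 1988}

lowered = formatter.format("{doc[title]!l}", doc=doc)
subset = formatter.format("{doc[title]:1.3S}", doc=doc)
```

Standard conversions such as “!r” and regular format specifications still work, since both overrides defer to the base class for anything they do not recognize.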

format(fmt: str, doc: Document | Dict[str, Any], doc_key: str = '', additional: Dict[str, Any] | None = None, default: str | None = None) str[source]
Parameters:
  • fmt – a format string understood by the formatter.

  • doc – an object convertible to a document.

  • doc_key – the name of the document in the format string. By default, this falls back to format-doc-name.

  • default – an optional string to use as a default value if the formatting fails. If no default is given, a FormatFailedError will be raised.

  • additional – a dict of additional entries to pass to the formatter.

Returns:

a string with all the replacement fields filled in.

class papis.format.Jinja2Formatter[source]

Construct a string using Jinja2 templates.

This formatter is named "jinja2" and can be set using the formatter setting in the configuration file. The formatted string has access to the doc variable, that is always a papis.document.Document. A string using this formatter can look like

"{{ doc.year }} - {{ doc.author_list[0].family }} - {{ doc.title }}"

This formatter supports the whole range of Jinja2 control structures and filters so more advanced string processing is possible. For example, we can titlecase the title using

"{{ doc.title | title }}"

or give a default value if a key is missing in the document using

"{{ doc.isbn | default('ISBN-NONE', true) }}"
format(fmt: str, doc: Document | Dict[str, Any], doc_key: str = '', additional: Dict[str, Any] | None = None, default: str | None = None) str[source]
Parameters:
  • fmt – a format string understood by the formatter.

  • doc – an object convertible to a document.

  • doc_key – the name of the document in the format string. By default, this falls back to format-doc-name.

  • default – an optional string to use as a default value if the formatting fails. If no default is given, a FormatFailedError will be raised.

  • additional – a dict of additional entries to pass to the formatter.

Returns:

a string with all the replacement fields filled in.

papis.format.get_formatter(name: str | None = None) Formatter[source]

Initialize and return a formatter plugin.

Note that the formatter is cached and all subsequent calls to this function will return the same formatter.

Parameters:

name – the name of the desired formatter, by default this uses the value of formatter.

papis.format.format(fmt: str, doc: Document | Dict[str, Any], doc_key: str = '', additional: Dict[str, Any] | None = None, default: str | None = None) str[source]

Format a string using the selected formatter.

This is the user-facing function that should be called when formatting a string. The formatters should not be called directly.

Arguments match those of Formatter.format().

papis.git

This module serves as a lightweight interface for git-related functions.

papis.git.commit(path: str, message: str) None[source]

Commits changes in the path with a message.

Parameters:
  • path – a folder with an existing git repository.

  • message – a commit message.

papis.git.add(path: str, resource: str) None[source]

Adds changes in the path to the git index.

Parameters:
  • path – a folder with an existing git repository.

  • resource – a resource (e.g. info.yaml file) to add to the index.

papis.git.remove(path: str, resource: str, recursive: bool = False, force: bool = True) None[source]

Remove a resource from the git repository at path.

Parameters:
  • path – a folder with an existing git repository.

  • resource – a resource (e.g. info.yaml file) to remove from git.

  • recursive – if True, the given resource is removed recursively.

  • force – if True, the removal is forced so any errors (e.g. file does not exist) are silently ignored.

papis.git.add_and_commit_resource(path: str, resource: str, message: str) None[source]

Adds and commits a single resource.

Parameters:
  • path – a folder with an existing git repository.

  • resource – a resource (e.g. info.yaml file) to add and commit.

  • message – a commit message.

papis.git.add_and_commit_resources(path: str, resources: Sequence[str], message: str) None[source]

Add and commit multiple resources (see add_and_commit_resource()).

Note that a single commit message is generated for all the resources.

papis.id

papis.id.compute_an_id(doc: Document, separator: str | None = None) str[source]

Make an id for the input document doc.

This is a non-deterministic function if separator is None (a random value is used). For a given value of separator, the result is deterministic.

Parameters:
  • doc – a document for which to generate an id.

  • separator – a string used to separate the document fields that go into constructing the id.

Returns:

a (hexadecimal) id for the document that is unique to high probability.
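
The deterministic and non-deterministic modes can be illustrated with a hash over the document fields. The sketch below is a guess at the general shape only. The choice of SHA-256, the way fields are joined, and the name compute_id_sketch are all assumptions, not the papis implementation.

```python
import hashlib
import random
from typing import Any, Dict, Optional


def compute_id_sketch(doc: Dict[str, Any],
                      separator: Optional[str] = None) -> str:
    if separator is None:
        # A random separator makes the result differ between calls.
        separator = str(random.random())

    # Assumption: every key and value of the document goes into the hash.
    fields = ["{}{}{}".format(key, separator, value)
              for key, value in sorted(doc.items())]
    text = separator.join(fields)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


doc = {"title": "Hello World", "year": 2024}
first = compute_id_sketch(doc, separator="|")
second = compute_id_sketch(doc, separator="|")
```

With a fixed separator, first and second are identical; omitting the separator yields a different id on (almost) every call.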

papis.id.key_name() str[source]

Reserved key name for databases and documents.

papis.id.has_id(doc: Document | Dict[str, Any]) bool[source]

Check if the given doc has an id.

papis.id.get(doc: Document | Dict[str, Any]) str[source]

Get the id from a document.

papis.importer

class papis.importer.ImporterT

Invariant TypeVar bound to the Importer class.

alias of TypeVar(‘ImporterT’, bound=Importer)

papis.importer.cache(meth: Callable[[ImporterT], None]) Callable[[ImporterT], None][source]

Decorator used to cache Importer methods.

The data is cached in the Importer.ctx of each importer instance. The method meth is only called if the context is empty.

Parameters:

meth – a method of an Importer.
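
A decorator with this behaviour can be sketched as follows. All names here (SketchContext, cache_sketch, CountingImporter) are hypothetical stand-ins, and treating a context as empty when it has neither data nor files is an assumption.

```python
import functools
from typing import Any, Callable, Dict, List


class SketchContext:
    """Hypothetical stand-in for Context."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.files: List[str] = []

    def __bool__(self) -> bool:
        # Assumption: the context is empty if it has no data and no files.
        return bool(self.data) or bool(self.files)


def cache_sketch(meth: Callable[[Any], None]) -> Callable[[Any], None]:
    @functools.wraps(meth)
    def wrapper(self: Any) -> None:
        # Only call the wrapped method while the context is still empty.
        if not self.ctx:
            meth(self)

    return wrapper


class CountingImporter:
    def __init__(self) -> None:
        self.ctx = SketchContext()
        self.calls = 0

    @cache_sketch
    def fetch(self) -> None:
        self.calls += 1
        self.ctx.data["title"] = "Hello World"


importer = CountingImporter()
importer.fetch()
importer.fetch()  # skipped: the context is already populated
```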

class papis.importer.Context[source]
data

A dict of fields retrieved by the Importer. These are generally not processed.

files

A list of files retrieved by the Importer.

class papis.importer.Importer(uri: str = '', name: str = '', ctx: Context | None = None)[source]
name

A name given to the importer (that is not necessarily unique).

uri

The URI (Uniform Resource Identifier) that the importer is to extract data from. This can be a URL, a local or remote file name, an object identifier (e.g. DOI), etc.

ctx

A Context that stores the data retrieved by the importer.

classmethod match(uri: str) Importer | None[source]

Check if the importer can process the given URI.

For example, an importer that supports links from the arXiv can check that the given URI matches using:

re.match(r".*arxiv.org.*", uri)

This can then be used to instantiate and return a corresponding Importer object.

Parameters:

uri – A URI where the document information should be retrieved from.

Returns:

An importer instance if the match to the URI is successful or None otherwise.
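
Put together, a match() implementation along these lines might look like the sketch below. ArxivSketchImporter is a hypothetical, self-contained class that does not derive from Importer, and the URLs are made up for the example.

```python
import re
from typing import Optional


class ArxivSketchImporter:
    """Hypothetical importer that only knows how to match arXiv links."""

    def __init__(self, uri: str = "") -> None:
        self.uri = uri

    @classmethod
    def match(cls, uri: str) -> Optional["ArxivSketchImporter"]:
        # Return an instance on success and None otherwise.
        if re.match(r".*arxiv.org.*", uri):
            return cls(uri=uri)
        return None


matched = ArxivSketchImporter.match("https://arxiv.org/abs/1234.56789")
unmatched = ArxivSketchImporter.match("https://example.com/paper.pdf")
```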

classmethod match_data(data: Dict[str, Any]) Importer | None[source]

Check if the importer can process the given metadata.

This method can be used to search for valid URIs inside the data that can then be processed by the importer. For example, if the metadata contains a DOI field, this can be used to import additional information.

Parameters:

data – A dict with metadata to inspect and match against.

Returns:

An importer instance if matching metadata is found or None otherwise.

fetch() None[source]

Fetch metadata and files for the given uri.

This method calls Importer.fetch_data() and Importer.fetch_files() to get all the information available for the document. It is recommended to implement the two methods separately, if possible, for maximum flexibility.

The imported data is stored in ctx and it is not queried again on subsequent calls to this function.

fetch_data() None[source]

Fetch metadata from the given uri.

The imported metadata is stored in ctx.

fetch_files() None[source]

Fetch files from the given uri.

The imported files are stored in ctx.

papis.importer.get_import_mgr() stevedore.extension.ExtensionManager[source]

Retrieve the stevedore.extension.ExtensionManager for importer plugins.

papis.importer.available_importers() List[str][source]

Get a list of available importer names.

papis.importer.get_importers() List[Type[Importer]][source]

Get a list of available importer classes.

papis.importer.get_importer_by_name(name: str) Type[Importer][source]

Get an importer class by name.

papis.library

class papis.library.Library(name: str, paths: Sequence[str])[source]

A class containing library information.

name: str

The name of the library, as it appears in the configuration file if defined there.

paths: List[str]

A list of paths with documents that form the library.

path_format() str[source]
Returns:

a string containing all the paths in the library concatenated using a colon.
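A minimal sketch of that behaviour, using a hypothetical `SketchLibrary` class in place of the real one:

```python
from typing import List, Sequence


class SketchLibrary:
    """Hypothetical stand-in for papis.library.Library."""

    def __init__(self, name: str, paths: Sequence[str]) -> None:
        self.name = name
        self.paths: List[str] = list(paths)

    def path_format(self) -> str:
        # Join all library paths with a colon, as described above.
        return ":".join(self.paths)
```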

papis.library.from_paths(paths: Sequence[str]) Library[source]

Create a library from a list of paths.

papis.logging

class papis.logging.ColoramaFormatter(log_format: str, full_tb: bool = False)[source]

A custom logging formatter that uses colorama.

full_tb: bool

A flag to denote whether a full traceback should be displayed when used with logger.info(..., exc_info=exc).

formatException(exc_info: Tuple[Any, ...]) str[source]

Format and return the specified exception information as a string.

If full_tb is True, then the full traceback is shown. Otherwise, a short inline description is given.

format(record: LogRecord) str[source]

Format the specified record as text.

This adds color coding to the logging levels, includes the exception into the message, removes the papis namespace from the name, etc. Any formatting of the logging output is made here.

papis.logging.setup(level: int | str | None = None, color: str | None = None, logfile: str | None = None, verbose: bool | None = None) None[source]

Set up formatting and handlers for the root level Papis logger.

Parameters:
  • level – default logging level (see logging). By default, this takes values from the PAPIS_LOG_LEVEL environment variable and falls back to "INFO".

  • color – flag to control logging colors. It should be one of ("always", "auto", "no"). By default, this takes values from the PAPIS_LOG_COLOR environment variable and falls back to "auto".

  • logfile – a path for a file in which to write log messages. By default, this takes values from the PAPIS_LOG_FILE environment variable and falls back to None.

  • verbose – make logger verbose (including debug information) regardless of the level. By default, this takes values from the PAPIS_DEBUG environment variable and falls back to False.
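The "argument, then environment variable, then default" order described for level can be sketched as follows. `resolve_level` is a hypothetical helper; the conversion through the logging module's level names is an assumption about how string values might be interpreted.

```python
import logging
import os
from typing import Optional, Union


def resolve_level(level: Optional[Union[int, str]] = None) -> int:
    # An explicit argument wins; otherwise use the environment, then "INFO".
    if level is None:
        level = os.environ.get("PAPIS_LOG_LEVEL", "INFO")
    if isinstance(level, str):
        # Map a level name such as "debug" to its numeric value.
        level = getattr(logging, level.upper())
    return int(level)
```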

papis.logging.reset(level: int | str | None = None, color: str | None = None, logfile: str | None = None, verbose: bool | None = None) None[source]

Reset the root level Papis logger.

This function removes all the custom handlers and resets the logger before calling setup().

papis.logging.get_logger(name: str | None = None) Logger[source]

Get a logger instance for the given name under the papis namespace.

Parameters:

name – the provisional name of the logger instance.

Returns:

a logging.Logger under the papis namespace, i.e. with a name such as papis.<name>.
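The namespacing can be approximated with the standard logging module alone. In this sketch `get_namespaced_logger` is a hypothetical helper, and leaving names that already start with `papis.` untouched is an assumption.

```python
import logging
from typing import Optional


def get_namespaced_logger(name: Optional[str] = None) -> logging.Logger:
    # Assumption: names are prefixed with "papis." unless already there.
    if name is None:
        return logging.getLogger("papis")
    if not name.startswith("papis."):
        name = "papis." + name
    return logging.getLogger(name)
```

Because the logging module treats dotted names as a hierarchy, handlers configured on the `papis` logger also receive records from loggers created this way.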

papis.notes

This module controls the notes for every papis document.

papis.notes.has_notes(doc: Document) bool[source]

Checks if the document has notes.

papis.notes.notes_path(doc: Document) str[source]

Get the path to the notes file corresponding to doc.

If the document does not have attached notes, a filename is constructed (using the notes-name setting) in the document’s main folder.

Returns:

an absolute filename that corresponds to the attached notes for doc (this file does not necessarily exist).

papis.notes.notes_path_ensured(doc: Document) str[source]

Get the path to the notes file corresponding to doc or create it if it does not exist.

If the notes do not exist, a new file is created using notes_path() and filled with the contents of the template given by the notes-template configuration option.

Returns:

an absolute filename that corresponds to the attached notes for doc.

papis.pick

papis.pick.PICKER_EXTENSION_NAME = 'papis.picker'

Name of the entry points for Picker plugins.

class papis.pick.T

Invariant TypeVar with no bounds.

alias of TypeVar(‘T’)

class papis.pick.Picker[source]

An interface used to select items from a list.

abstract __call__(items: Sequence[T], header_filter: Callable[[T], str], match_filter: Callable[[T], str], default_index: int = 0) List[T][source]
Parameters:
  • items – a sequence of items from which to pick a subset.

  • header_filter – (optional) a callable that takes an item from items and returns a string representation shown to the user.

  • match_filter – (optional) a callable that takes an item from items and returns a string representation that is used when searching or filtering the items.

  • default_index – (optional) sets the selected item when the picker is first shown to the user.

Returns:

a subset of items that were picked.
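To make the interface concrete, here is a sketch of a non-interactive implementation that simply returns the default item. `SketchPicker` mirrors the signature above but is a hypothetical stand-in for Picker, and `DefaultItemPicker` is likewise invented for illustration.

```python
from abc import ABC, abstractmethod
from typing import Callable, Generic, List, Sequence, TypeVar

T = TypeVar("T")


class SketchPicker(ABC, Generic[T]):
    """Hypothetical stand-in for papis.pick.Picker."""

    @abstractmethod
    def __call__(self, items: Sequence[T],
                 header_filter: Callable[[T], str] = str,
                 match_filter: Callable[[T], str] = str,
                 default_index: int = 0) -> List[T]:
        ...


class DefaultItemPicker(SketchPicker[T]):
    """Non-interactive picker that returns the default item, if any."""

    def __call__(self, items, header_filter=str, match_filter=str,
                 default_index=0):
        if not items:
            return []
        # Clamp the index so an out-of-range default cannot raise.
        index = max(0, min(default_index, len(items) - 1))
        return [items[index]]
```

A picker of this kind can be handy in scripts or tests where no user is present to make a choice.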

papis.pick.get_picker(name: str) Type[Picker[Any]][source]

Get a picker by its plugin name.

Parameters:

name – the name of an entrypoint to load a Picker plugin from.

Returns:

a Picker subclass implemented in the plugin.

papis.pick.pick(items: Sequence[T], header_filter: Callable[[T], str] = str, match_filter: Callable[[T], str] = str, default_index: int = 0) List[T][source]

Load a Picker plugin and select a subset of items.

The arguments to this function match those of Picker.__call__(). The specific picker is chosen through the picktool configuration option.

Returns:

a subset of items that were picked.

papis.pick.pick_doc(documents: Sequence[Document]) List[Document][source]

Pick from a sequence of documents using pick().

This function uses the header-format-file setting or, if not available, the header-format setting to construct a header_filter for the picker. It also uses the configuration setting match-format to construct a match_filter.

Parameters:

documents – a sequence of documents.

Returns:

a subset of documents that was picked.

papis.pick.pick_subfolder_from_lib(lib: str) List[str][source]

Pick subfolders from all existing subfolders in lib.

Note that this includes document folders in lib as well as nested library folders.

Parameters:

lib – the name of an existing library to search in.

Returns:

a subset of the subfolders in the library.

papis.plugin

papis.plugin.get_extension_manager(namespace: str) ExtensionManager[source]
Parameters:

namespace – the namespace for the entry points.

Returns:

an extension manager for the given entry point namespace.

papis.plugin.get_available_entrypoints(namespace: str) List[str][source]
Returns:

a list of all available entry points in the given namespace.

papis.plugin.get_available_plugins(namespace: str) List[Any][source]
Returns:

a list of all available plugins in the given namespace.

papis.sphinx_ext

A collection of Papis-specific Sphinx extensions.

This can be included directly into the conf.py file as a normal extension, i.e.

extensions = [
    ...,
    "papis.sphinx_ext",
]

It will include a custom CustomClickDirective for documenting Papis commands and a PapisConfig directive for documenting Papis configuration values.

These are included by default when adding it to the extensions list in your Sphinx configuration.

class papis.sphinx_ext.CustomClickDirective(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]

A custom sphinx_click.ClickDirective that removes the automatic title from the generated documentation. Otherwise it can be used in the exact same way, e.g.:

.. click:: papis.commands.add:cli
    :prog: papis add

class papis.sphinx_ext.PapisConfig(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]

A directive for describing Papis configuration values.

The directive is given as:

.. papis-config:: config-value-name

and has the following optional arguments.

  • :section:: The section in which the configuration value is given. The section defaults to get_general_settings_name().

  • :type:: The type of the configuration value, e.g. a string or an integer. If not provided, the type of the default value is used.

  • :default:: The default value for the configuration value. If not provided, this is taken from the default Papis settings.

It can be used as:

.. papis-config:: info-file
    :default: info.yml
    :type: str
    :section: settings

    This is the file name for where the document metadata should be
    stored. It is a relative path in the document's main folder.

In text, these configuration values can be referenced using standard role references, e.g.

The document metadata is found in its :confval:`info-file`.

has_content: bool = True

The directive can have a longer description.

optional_arguments: int = 3

Number of optional arguments to the directive.

required_arguments: int = 1

Number of required arguments to the directive.

option_spec: Dict[str, type] = {'default': <class 'str'>, 'section': <class 'str'>, 'type': <class 'str'>}

A description of the arguments, mapping names to validator functions.

papis.sphinx_ext.make_link_resolve(github_project_url, revision)

Create a function that can be used with sphinx.ext.linkcode.

This can be used in the conf.py file as

linkcode_resolve = make_link_resolve("https://github.com/papis/papis", "main")

Parameters:
  • github_project_url – the URL to a GitHub project to which to link.

  • revision – the revision to which to point to, e.g. main.

papis.testing

papis.testing.create_random_file(filetype: str | None = None, prefix: str | None = None, suffix: str | None = None, dir: str | None = None) str[source]

Create a random file with the correct magic signature.

This function creates random empty files that can be used for testing. It supports creating PDF, EPUB, DjVu or simple text files. These are constructed in such a way that they are recognized by papis.filetype.guess_content_extension().

Parameters:

papis.testing.populate_library(libdir: str) None[source]

Add temporary documents with random files into the folder libdir.

Parameters:

libdir – an existing empty library directory.

class papis.testing.TemporaryConfiguration(settings: Dict[str, Any] | None = None, overwrite: bool = False)[source]

A context manager used to create a temporary papis configuration.

This configuration is created in a temporary directory and all the required paths are set to point to that directory (e.g. XDG_CONFIG_HOME and XDG_CACHE_HOME). This is meant to be used by tests to create a default environment in which to run.

It can be used in the standard way as

import papis.config
from papis.testing import TemporaryConfiguration

# Set the configuration option `picktool`
papis.config.set("picktool", "fzf")

with TemporaryConfiguration() as config:
    # In this block, it is back to its default value
    value = papis.config.get("picktool")
    assert value == "papis"

libname: ClassVar[str] = 'test'

Name of the default library.

settings: Dict[str, Any] | None

A set of settings to be added to the configuration on creation.

overwrite: bool

If True, any configuration settings are overwritten by settings.

libdir: str

When entering the context manager, this will contain the directory of a temporary library to run tests on. The library is unpopulated by default.

configdir: str

When entering the context manager, this will contain the config directory used by papis.

configfile: str

When entering the context manager, this will contain the config file used by papis.

property tmpdir: str

Base temporary directory name.

create_random_file(filetype: str | None = None, prefix: str | None = None, suffix: str | None = None) str[source]

Create a random file in the tmpdir using create_random_file.

class papis.testing.TemporaryLibrary(settings: Dict[str, Any] | None = None, use_git: bool = False, populate: bool = True)[source]

A context manager used to create a temporary papis configuration with a library.

This extends TemporaryConfiguration with more support for creating and maintaining a temporary library. This can be used by tests that specifically require handling documents in a library.

use_git

If True, a git repository is created in the library directory.

populate

If True, the library is prepopulated with a set of documents that contain random files and keys, which can be used for testing.

class papis.testing.PapisRunner(**kwargs: Any)[source]

A wrapper around click.testing.CliRunner.

invoke(cli: Command, args: Sequence[str], **kwargs: Any) Result[source]

A simple wrapper around the click.testing.CliRunner.invoke() method that does not catch exceptions by default.

class papis.testing.ResourceCache(cachedir: str)[source]

A class that handles retrieving local and remote resources for tests from default folders.

This class mainly exists to test importers and downloaders that require getting a remote resource and testing it against results of the papis converters.

It can be controlled by the PAPIS_UPDATE_RESOURCES environment variable, which takes the values:

  • "none": no resources are downloaded or updated (default).

  • "remote": remote resources are downloaded and the on-disk files are updated (used in get_remote_resource()).

  • "local": local resources are updated with the results of the papis conversion (used in get_local_resource()).

  • "both": both local and remote resources are updated.

Resources can then be retrieved as

# Call some function that retrieves and converts remote data
local = papis.arxiv.get_data(...)

# Check that the expected cached resource matches the result
expected_local = cache.get_local_resource("resources/test.json", local)
assert local == expected_local

cachedir

The location of the resource directory.

session

A requests.Session used to download remote resources.

get_remote_resource(filename: str, url: str, force: bool = False, params: Dict[str, str] | None = None, headers: Dict[str, str] | None = None, cookies: Dict[str, str] | None = None) bytes[source]

Retrieve a remote resource from the resource cache.

If force is True, the filename does not exist, or PAPIS_UPDATE_RESOURCES is set to "remote" or "both", then the resource is downloaded from the remote location at url. Otherwise, it is retrieved from the locally cached version at filename.

Parameters:
  • filename – a file where to store the remote resource.

  • url – a remote URL from which to retrieve the resource.

  • force – if True, force updating the resource cached at filename.

  • params – additional params passed to requests.get().

  • headers – additional headers passed to requests.get().

  • cookies – additional cookies passed to requests.get().
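The three conditions that trigger a download can be captured in a small predicate. `should_download` is a hypothetical helper written for this sketch; it only models the decision and performs no network access.

```python
import os


def should_download(filename: str, force: bool = False) -> bool:
    """Decide whether a remote resource must be (re)downloaded."""
    mode = os.environ.get("PAPIS_UPDATE_RESOURCES", "none")
    # Any one of the three conditions is enough to trigger a download.
    return force or not os.path.exists(filename) or mode in ("remote", "both")
```

With the default setting of "none", an existing cached file is therefore reused unless force is given.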

get_local_resource(filename: str, data: Any, force: bool = False) Any[source]

Retrieve a local resource from the resource cache.

If force is True, the filename does not exist, or PAPIS_UPDATE_RESOURCES is set to "local" or "both", then the local resource is updated using data. Otherwise, it is retrieved from the locally cached version at filename.

Parameters:
  • filename – a file where to store the local resource.

  • data – data that should be retrieved from the resource.

  • force – if True, force updating the resource cached at filename.

papis.testing.tmp_config(request: SubRequest) Iterator[TemporaryConfiguration][source]

A fixture that creates a TemporaryConfiguration.

Additional keyword arguments can be passed using the config_setup marker

@pytest.mark.config_setup(overwrite=True)
def test_me(tmp_config: TemporaryConfiguration) -> None:
    ...

papis.testing.tmp_library(request: SubRequest) Iterator[TemporaryLibrary][source]

A fixture that creates a TemporaryLibrary.

Additional keyword arguments can be passed using the library_setup marker

@pytest.mark.library_setup(use_git=False)
def test_me(tmp_library: TemporaryLibrary) -> None:
    ...

papis.testing.resource_cache(request: SubRequest) ResourceCache[source]

A fixture that creates a ResourceCache.

Additional keyword arguments can be passed using the resource_setup marker

@pytest.mark.resource_setup(cachedir="resources")
def test_me(resource_cache: ResourceCache) -> None:
    ...

papis.utils

class papis.utils.A

Invariant typing.TypeVar

alias of TypeVar(‘A’)

class papis.utils.B

Invariant typing.TypeVar

alias of TypeVar(‘B’)

papis.utils.get_session() requests.Session[source]

Create a requests.Session for papis.

This session has the expected User-Agent (see user-agent), proxy (see downloader-proxy) and other settings used for papis. It is recommended to use it instead of creating a requests.Session at every call site.

papis.utils.parmap(f: Callable[[A], B], xs: Iterable[A], np: int | None = None) List[B][source]

Apply the function f to all elements of xs.

When available, this function uses the multiprocessing module to apply the function in parallel. This can have a noticeable performance impact when the number of elements of xs is large, but can also be slower than a sequential map().

The number of processes can also be controlled using the PAPIS_NP environment variable. Setting this variable to 0 will disable the use of multiprocessing on all platforms.

Parameters:
  • f – a callable to apply to a list of elements.

  • xs – an iterable of elements to apply the function f to.

  • np – number of processes to use when applying the function f in parallel. This value defaults to PAPIS_NP or os.cpu_count().
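A simplified version of this behaviour might look like the sketch below. `parmap_sketch` is a hypothetical name, and treating any non-positive process count as "run sequentially" is an assumption that goes slightly beyond the PAPIS_NP=0 case documented above.

```python
import os
from multiprocessing import Pool
from typing import Callable, Iterable, List, Optional, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def parmap_sketch(f: Callable[[A], B], xs: Iterable[A],
                  np: Optional[int] = None) -> List[B]:
    if np is None:
        # Fall back to PAPIS_NP, then to the number of CPUs.
        np = int(os.environ.get("PAPIS_NP", os.cpu_count() or 1))
    if np <= 0:
        # Multiprocessing disabled: plain sequential map.
        return list(map(f, xs))
    with Pool(np) as pool:
        return pool.map(f, xs)
```

When processes are used, f has to be picklable (for example, a function defined at module level), and on platforms that spawn rather than fork, the call should sit under an `if __name__ == "__main__":` guard.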

papis.utils.run(cmd: Sequence[str], wait: bool = True, env: Dict[str, Any] | None = None, cwd: str | None = None) None[source]

Run a given command with subprocess.

This is a simple wrapper around subprocess.Popen with custom defaults used to call Papis commands.

Parameters:
  • cmd – a sequence of arguments to run, where the first entry is expected to be the command name and the remaining entries its arguments.

  • wait – if True wait for the process to finish, otherwise detach the process and return immediately.

  • env – a mapping that defines additional environment variables for the child process.

  • cwd – current working directory in which to run the command.
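The parameters above map fairly directly onto subprocess.Popen. The following sketch shows one plausible arrangement; `run_sketch` is hypothetical, and the way the detached case is handled (discarded streams and a new session) is an assumption rather than the actual papis defaults.

```python
import os
import subprocess
from typing import Any, Dict, Optional, Sequence


def run_sketch(cmd: Sequence[str], wait: bool = True,
               env: Optional[Dict[str, Any]] = None,
               cwd: Optional[str] = None) -> None:
    # Extend (rather than replace) the current environment.
    full_env = dict(os.environ)
    if env:
        full_env.update({key: str(value) for key, value in env.items()})

    if wait:
        process = subprocess.Popen(list(cmd), env=full_env, cwd=cwd)
        process.wait()
    else:
        # Detach: discard the streams and start a new session (POSIX).
        subprocess.Popen(list(cmd), env=full_env, cwd=cwd,
                         stdin=subprocess.DEVNULL,
                         stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL,
                         start_new_session=True)
```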

papis.utils.general_open(file_name: str, key: str, default_opener: str | None = None, wait: bool = True) None[source]

Open a file with a configured open tool (executable).

Parameters:
  • file_name – a file path to open.

  • key – a key in the configuration file to determine the opener used, e.g. opentool.

  • default_opener – an existing executable that can be used to open the file given by file_name. By default, the opener given by key, if any, or the default papis opener are used.

  • wait – if True wait for the process to finish, otherwise detach the process and return immediately.

papis.utils.open_file(file_path: str, wait: bool = True) None[source]

Open file using the configured opentool.

Parameters:
  • file_path – a file path to open.

  • wait – if True wait for the process to finish, otherwise detach the process and return immediately.

papis.utils.get_folders(folder: str) List[str][source]

Get all folders with papis documents inside of folder.

This is the main indexing routine. It looks inside folder and crawls the whole directory structure in search of subfolders containing an info file. The name of the file must match the configured info-name.

Parameters:

folder – root folder to look into.

Returns:

List of folders containing an info file.
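The crawl can be pictured with os.walk. In this sketch `find_document_folders` is a hypothetical function, and the default file name `info.yaml` is an assumption standing in for the configured info-name; the result is sorted only to make the output deterministic.

```python
import os
from typing import List


def find_document_folders(folder: str,
                          info_name: str = "info.yaml") -> List[str]:
    """Collect every directory under ``folder`` that contains ``info_name``."""
    found = []
    for root, _dirs, files in os.walk(folder):
        if info_name in files:
            found.append(root)
    return sorted(found)
```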

papis.utils.create_identifier(input_list: str | None = None, skip: int = 0) Iterator[str][source]

Creates an infinite list of identifiers based on input_list.

This creates a generator object capable of iterating over lists to create unique products of increasing cardinality. This is mainly intended to create suffixes for existing strings, e.g. file names, to ensure uniqueness.

Parameters:
  • input_list – list to iterate over

  • skip – number of identifiers to skip.
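The "products of increasing cardinality" idea can be sketched with itertools. `identifiers` is a hypothetical re-implementation written for illustration and is not taken from the papis source.

```python
import itertools
import string
from typing import Iterator


def identifiers(alphabet: str = string.ascii_lowercase,
                skip: int = 0) -> Iterator[str]:
    def generate() -> Iterator[str]:
        # Products of length 1, then 2, then 3, ... never terminate.
        for length in itertools.count(1):
            for combo in itertools.product(alphabet, repeat=length):
                yield "".join(combo)

    # Drop the first ``skip`` identifiers lazily.
    return itertools.islice(generate(), skip, None)
```

With a two-letter alphabet "ab", the sequence starts with a, b, aa, ab, ba, bb, so every suffix is produced exactly once.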

>>> import string
>>> m = create_identifier(string.ascii_lowercase)
>>> next(m)
'a'

papis.utils.clean_document_name(doc_path: str, is_path: bool = True) str[source]

Clean a string to only contain visible ASCII characters.

This function uses slugify to create ASCII strings that can be used safely as file names or printed to consoles that do not necessarily support full unicode.

By default, it assumes that the input is a path and will only look at its basename. This can have unintended results for other strings and can be disabled by setting is_path to False.

Parameters:
  • doc_path – a string to be cleaned.

  • is_path – if True, only the basename of doc_path is cleaned, as obtained from os.path.basename().

Returns:

a cleaned ASCII string.
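To show the effect of is_path without depending on slugify, the sketch below uses only the standard library. `ascii_slug` is a hypothetical, much cruder substitute, so its exact output will differ from what papis produces.

```python
import os
import re
import unicodedata


def ascii_slug(text: str, is_path: bool = True) -> str:
    if is_path:
        # Only the last path component is cleaned.
        text = os.path.basename(text)
    # Decompose accented characters, then drop anything non-ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of other characters into a single hyphen.
    return re.sub(r"[^A-Za-z0-9.]+", "-", ascii_text).strip("-")
```

Passing a non-path string that happens to contain a slash with is_path left at True would silently drop everything before the last slash, which is the unintended result the description warns about.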

papis.utils.locate_document_in_lib(document: Document, library: str | None = None) Document[source]

Locate a document in a library.

This function uses the unique-document-keys to determine if the current document matches any document in the library. The first document for which a key matches exactly will be returned.

Parameters:
  • document – the document to search for.

  • library – the name of a valid papis library.

Returns:

a full document as found in the library.

Raises:

IndexError – No document found in the library.

papis.utils.locate_document(document: Document, documents: Iterable[Document]) Document | None[source]

Locate a document in a list of documents.

This function uses the unique-document-keys to determine if the current document matches any document in the list. The first document for which a key matches exactly will be returned.

Parameters:
  • document – the document to search for.

  • documents – an iterable of existing documents to match against.

Returns:

a document from documents which matches the given document or None if no document is found.
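The matching rule can be sketched over plain dictionaries. `locate_sketch` is hypothetical, and the key names passed as `unique_keys` are placeholders for whatever unique-document-keys is set to.

```python
from typing import Any, Dict, Iterable, Optional, Sequence

Doc = Dict[str, Any]  # stand-in for a papis Document


def locate_sketch(document: Doc, documents: Iterable[Doc],
                  unique_keys: Sequence[str] = ("doi", "ref")) -> Optional[Doc]:
    for candidate in documents:
        for key in unique_keys:
            value = document.get(key)
            # Only non-empty values can identify a document.
            if value and candidate.get(key) == value:
                return candidate
    return None
```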

papis.utils.folders_to_documents(folders: Iterable[str]) List[Document][source]

Load a list of documents from their respective folders.

Parameters:

folders – a list of folder paths to load from.

Returns:

a list of document objects.

papis.utils.update_doc_from_data_interactively(document: Document | Dict[str, Any], data: Dict[str, Any], data_name: str) None[source]

Shows a TUI to update the document interactively with fields from data.

Parameters:
  • document – a document (or a mapping convertible to a document) which is going to be updated.

  • data – additional data to select and merge into document.

  • data_name – an identifier for the data to show in the TUI.

papis.utils.get_cache_home() str[source]

Get default cache directory.

This will retrieve the cache-dir configuration setting. It is XDG standard compatible.

Returns:

the absolute path for the cache main folder.

papis.utils.get_matching_importer_or_downloader(uri: str, download_files: bool | None = None, only_data: bool | None = None) List[Importer][source]

Gets all the importers and downloaders that match uri.

This function tries to match the URI using match() and extract the data using fetch(). Only importers that fetch the data without issues are returned.

Parameters:
  • uri – a URI to match the importers against.

  • download_files – if True, importers and downloaders also try to download files (PDFs, etc.) instead of just metadata.

papis.utils.get_matching_importer_by_name(name_and_uris: Iterable[Tuple[str, str]], download_files: bool | None = None, only_data: bool | None = None) List[Importer][source]

Get importers that match the given URIs.

This function tries to match the URI using match() and extract the data using fetch(). Only importers that fetch the data without issues are returned.

Parameters:
  • name_and_uris – a list of (name, uri) tuples of importer names and the URIs to match them against.

  • download_files – if True, importers and downloaders also try to download files (PDFs, etc.) instead of just metadata.

papis.utils.collect_importer_data(importers: Iterable[Importer], batch: bool = True, use_files: bool | None = None, only_data: bool | None = None) Context[source]

Collect all data from the given importers.

It is assumed that the importers have called the needed fetch methods, so all data has been downloaded and converted. This function is meant to only do the aggregation.

Parameters:
  • batch – if True, overwrite data from previous importers, otherwise ask the user to manually merge.

  • use_files – if True, both metadata and files are collected from the importers.

papis.utils.is_relative_to(path: str, other: str) bool[source]

Check if paths are relative to each other.

This is equivalent to pathlib.PurePath.is_relative_to().

Returns:

True if path is relative to the other path.
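The semantics can be illustrated with pathlib directly. `is_relative_to_sketch` is a hypothetical helper that uses PurePosixPath so the behaviour is the same on every platform; the comparison is made per path component, not per character.

```python
from pathlib import PurePosixPath


def is_relative_to_sketch(path: str, other: str) -> bool:
    try:
        # relative_to raises ValueError when ``other`` is not a prefix.
        PurePosixPath(path).relative_to(other)
    except ValueError:
        return False
    return True
```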

papis.yaml

papis.yaml.data_to_yaml(yaml_path: str, data: Dict[str, Any], *, allow_unicode: bool | None = True) None[source]

Save data to yaml_path in the YAML format.

Parameters:
  • yaml_path – path to a file.

  • data – data to write to the file as a YAML document.

papis.yaml.list_to_path(data: Sequence[Dict[str, Any]], filepath: str, *, allow_unicode: bool | None = True) None[source]

Save a list of dicts to a YAML file.

Parameters:
  • data – a sequence of dictionaries to save as YAML documents.

  • filepath – path to a file.

papis.yaml.yaml_to_data(yaml_path: str, raise_exception: bool = False) Dict[str, Any][source]

Read a YAML document from yaml_path.

Parameters:
  • yaml_path – path to a file.

  • raise_exception – if True an exception is raised when loading the data has failed. Otherwise just a log message is emitted.

Returns:

a dict containing the data from the YAML document.

Raises:

ValueError – if the document cannot be loaded due to YAML parsing errors.

papis.yaml.yaml_to_list(yaml_path: str, raise_exception: bool = False) List[Dict[str, Any]][source]

Read a list of YAML documents.

This is analogous to yaml_to_data(), but uses yaml.load_all to read multiple documents (see PyYAML docs).

Parameters:
  • yaml_path – path to a file containing YAML documents.

  • raise_exception – if True an exception is raised when loading the data has failed. Otherwise just a log message is emitted.

Returns:

a list of dict objects, one for each YAML document in the file.

Raises:

ValueError – if the documents cannot be loaded due to YAML parsing errors.

papis.yaml.exporter(documents: List[Document]) str[source]

Convert a list of documents to the YAML format.

class papis.yaml.Importer(uri: str)[source]

Importer that parses a YAML file.

classmethod match(uri: str) Importer | None[source]

Check if the uri points to an existing YAML file.

fetch_data() Any[source]

Fetch metadata from the YAML file.

papis.commands.doctor

papis.commands.doctor.FixFn

Callable for automatic doctor fixers. This callable is constructed by a check and is expected to wrap all the required data, so it takes no arguments.

alias of Callable[[], None]

papis.commands.doctor.CheckFn

Callable for doctor document checks.

alias of Callable[[Document], List[Error]]

class papis.commands.doctor.Error(name: str, path: str, payload: str, msg: str, suggestion_cmd: str, fix_action: Callable[[], None] | None, doc: Document | None)[source]

A detailed error returned by a doctor check.

name: str

Name of the check generating the error.

path: str

Path to the document that generated the error.

payload: str

A value that caused the error.

msg: str

A short message describing the error that can be displayed to the user.

suggestion_cmd: str

A command to run to fix the error that can be suggested to the user.

fix_action: Callable[[], None] | None

A callable that can autofix the error (see FixFn). Note that this will change the attached doc.

doc: Document | None

The document that generated the error.

class papis.commands.doctor.Check(name, operate)[source]
name: str

Name of the check.

operate: Callable[[Document], List[Error]]

A callable that takes a document and returns a list of errors generated by the current check (see CheckFn).

papis.commands.doctor.register_check(name: str, check: Callable[[Document], List[Error]]) None[source]

Register a new check.

Registered checks are recognized by papis and can be used by users in their configuration files through doctor-default-checks or on the command line through the --checks flag.
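The registration mechanism amounts to a name-to-callable mapping. The sketch below imitates it with hypothetical stand-ins (`SketchError`, `REGISTERED`, `register`) so that it runs without papis; the real Error type carries more fields than shown here.

```python
from typing import Any, Callable, Dict, List, NamedTuple

Doc = Dict[str, Any]  # stand-in for a papis Document


class SketchError(NamedTuple):
    name: str     # name of the check that produced the error
    payload: str  # value that caused the error
    msg: str      # message shown to the user


CheckFn = Callable[[Doc], List[SketchError]]
REGISTERED: Dict[str, CheckFn] = {}


def register(name: str, check: CheckFn) -> None:
    REGISTERED[name] = check


def title_check(doc: Doc) -> List[SketchError]:
    # Report a single error when the title is missing or empty.
    if not doc.get("title"):
        return [SketchError("title", "title", "Document has no title")]
    return []


register("title", title_check)
```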

papis.commands.doctor.files_check(doc: Document) List[Error][source]

Check whether the files of a document actually exist in the filesystem.

Returns:

a list of errors, one for each file that does not exist.

papis.commands.doctor.keys_exist_check(doc: Document) List[Error][source]

Checks whether the keys provided in the configuration option doctor-keys-exist-keys exist in the document and are non-empty.

Returns:

a list of errors, one for each key that does not exist.

papis.commands.doctor.refs_check(doc: Document) List[Error][source]

Checks that a ref exists and if not it tries to create one according to the ref-format configuration option.

Returns:

an error if the reference does not exist or contains invalid characters (as required by BibTeX).

papis.commands.doctor.duplicated_keys_check(doc: Document) List[Error][source]

Check for duplicated keys in the list given by the doctor-duplicated-keys-keys configuration option.

Returns:

a list of errors, one for each key with a value that already exists in the documents from the current query.

papis.commands.doctor.duplicated_values_check(doc: Document) List[Error][source]

Check if the keys given by doctor-duplicated-values-keys contain any duplicate entries. These keys are expected to be lists of items.

Returns:

a list of errors, one for each key with a value that has duplicate entries.

papis.commands.doctor.bibtex_type_check(doc: Document) List[Error][source]

Check that the document type is compatible with BibTeX or BibLaTeX type descriptors.

Returns:

an error if the types are not compatible.

papis.commands.doctor.biblatex_type_alias_check(doc: Document) List[Error][source]

Check that the BibLaTeX type of the document is not a known alias.

The aliases are described by bibtex_type_aliases.

Returns:

an error if the type of the document is an alias.

papis.commands.doctor.biblatex_key_alias_check(doc: Document) List[Error][source]

Check that no BibLaTeX keys in the document are known aliases.

The aliases are described by bibtex_key_aliases. Note that these keys can also be converted on export to BibLaTeX.

Returns:

an error for each key of the document that is an alias.

papis.commands.doctor.biblatex_required_keys_check(doc: Document) List[Error][source]

Check that required BibLaTeX keys are part of the document based on its type.

The required keys are described by papis.bibtex.bibtex_type_required_keys. Note that most BibLaTeX processors will be quite forgiving if these keys are missing.

Returns:

an error for each key of the document that is missing.

papis.commands.doctor.get_key_type_check_keys() Dict[str, type][source]

Check the doctor-key-type-check-keys configuration entry for correctness.

The doctor-key-type-check-keys configuration entry defines a mapping of keys and their expected types. If the desired type is a list, the doctor-key-type-check-separator setting can be used to split an existing string (and, similarly, if the desired type is a string, it can be used to join a list of items).

Returns:

A dictionary mapping key names to types.

papis.commands.doctor.key_type_check(doc: Document) List[Error][source]

Check document keys have expected types.

Returns:

a list of errors, one for each key that does not have the expected type (if it exists).

papis.commands.doctor.html_codes_check(doc: Document) List[Error][source]

Checks that the keys in doctor-html-codes-keys configuration options do not contain any HTML codes like &amp; etc.

Returns:

a list of errors, one for each key that contains HTML codes.
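One simple way to detect such codes is to compare a value with its unescaped form. `keys_with_html_codes` is a hypothetical helper that returns key names instead of Error objects, and this detection strategy is an assumption, not necessarily what the check does internally.

```python
import html
from typing import Any, Dict, List, Sequence


def keys_with_html_codes(doc: Dict[str, Any],
                         keys: Sequence[str] = ("title",)) -> List[str]:
    flagged = []
    for key in keys:
        value = doc.get(key)
        # A string that changes when unescaped contained an HTML entity.
        if isinstance(value, str) and html.unescape(value) != value:
            flagged.append(key)
    return flagged
```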

papis.commands.doctor.html_tags_check(doc: Document) List[Error][source]

Checks that the keys in doctor-html-tags-keys configuration options do not contain any HTML tags like <href> etc.

Returns:

a list of errors, one for each key that contains HTML tags.

papis.commands.doctor.gather_errors(documents: List[Document], checks: List[str] | None = None) List[Error][source]

Run all checks over the list of documents.

Only checks registered with register_check() are supported and any unrecognized checks are automatically skipped.

Parameters:

checks – a list of checks to run over the documents. If not provided, the default doctor-default-checks are used.

Returns:

a list of all the errors gathered from the documents.
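The "skip what is not registered" behaviour can be sketched as below. `CHECKS` and `gather` are hypothetical, errors are reduced to plain strings, and falling back to every registered check when none are named is a simplification of the doctor-default-checks setting.

```python
from typing import Any, Callable, Dict, List, Optional

Doc = Dict[str, Any]  # stand-in for a papis Document
CheckFn = Callable[[Doc], List[str]]  # errors simplified to strings

CHECKS: Dict[str, CheckFn] = {
    "title": lambda doc: [] if doc.get("title") else ["missing title"],
    "year": lambda doc: [] if doc.get("year") else ["missing year"],
}


def gather(documents: List[Doc],
           checks: Optional[List[str]] = None) -> List[str]:
    names = list(CHECKS) if checks is None else checks
    errors: List[str] = []
    for name in names:
        check = CHECKS.get(name)
        if check is None:
            continue  # unrecognized checks are skipped
        for doc in documents:
            errors.extend(check(doc))
    return errors
```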

papis.commands.doctor.fix_errors(doc: Document, checks: List[str] | None = None) None[source]

Fix errors in doc for the given checks.

This function only applies existing auto-fixers to the document. This is not possible for many of the existing checks, but can be used to quickly clean up a document.

papis.commands.doctor.process_errors(errors: List[Error], fix: bool = False, explain: bool = False, suggest: bool = False, edit: bool = False) None[source]

Process a list of document errors from gather_errors().

Parameters:
  • fix – if True, any automatic fixes are applied to the document the error refers to.

  • explain – if True, a short explanation of the error is shown.

  • suggest – if True, a short suggestion for manual fixing of the error is shown.

  • edit – if True, the document is opened for editing.

papis.commands.doctor.run(doc: Document, checks: List[str] | None = None, fix: bool = True, explain: bool = False, suggest: bool = False, edit: bool = False) None[source]

Runner for papis doctor.

It runs all the checks given by the checks argument that have been registered through register_check(). It then proceeds with processing and fixing each error in turn.