Developer API reference

Warning

The APIs documented here are not stable and may change from one version to another. This reference is intended for developers of papis itself and of external plugins.

`papis.bibtex`

A set of utilities for working with BibTeX and BibLaTeX (as described in the manual).

papis.bibtex.bibtex_standard_types = frozenset({'article', 'book', 'bookinbook', 'booklet', 'collection', 'dataset', 'inbook', 'incollection', 'inproceedings', 'inreference', 'manual', 'misc', 'mvbook', 'mvcollection', 'mvproceedings', 'mvreference', 'online', 'patent', 'periodical', 'proceedings', 'reference', 'report', 'software', 'suppbook', 'suppcollection', 'suppperiodical', 'thesis', 'unpublished'}): Regular BibLaTeX types (Section 2.1.1).

papis.bibtex.bibtex_type_aliases = {'conference': 'inproceedings', 'electronic': 'online', 'mastersthesis': 'thesis', 'phdthesis': 'thesis', 'techreport': 'report', 'www': 'online'}: BibLaTeX type aliases (Section 2.1.2).

papis.bibtex.bibtex_non_standard_types = frozenset({'artwork', 'audio', 'bibnote', 'commentary', 'image', 'jurisdiction', 'legal', 'legislation', 'letter', 'movie', 'music', 'performance', 'review', 'standard', 'video'}): Non-standard BibLaTeX types (Section 2.1.3).

papis.bibtex.biblatex_software_types = frozenset({'codefragment', 'software', 'softwaremodule', 'softwareversion'}): BibLaTeX Software types (Section 2).

papis.bibtex.bibtex_types = frozenset({'article', 'artwork', 'audio', 'bibnote', 'book', 'bookinbook', 'booklet', 'codefragment', 'collection', 'commentary', 'conference', 'dataset', 'electronic', 'image', 'inbook', 'incollection', 'inproceedings', 'inreference', 'jurisdiction', 'legal', 'legislation', 'letter', 'manual', 'mastersthesis', 'misc', 'movie', 'music', 'mvbook', 'mvcollection', 'mvproceedings', 'mvreference', 'online', 'patent', 'performance', 'periodical', 'phdthesis', 'proceedings', 'reference', 'report', 'review', 'software', 'softwaremodule', 'softwareversion', 'standard', 'suppbook', 'suppcollection', 'suppperiodical', 'techreport', 'thesis', 'unpublished', 'video', 'www'}): A set of known BibLaTeX types (as described in Section 2.1 of the manual). These types are a union of the types above and can be extended with extra-bibtex-types.

papis.bibtex.bibtex_standard_keys = frozenset({'abstract', 'addendum', 'afterword', 'annotation', 'annotator', 'author', 'authortype', 'bookauthor', 'bookpagination', 'booksubtitle', 'booktitle', 'booktitleaddon', 'chapter', 'commentator', 'date', 'doi', 'edition', 'editor', 'editora', 'editoratype', 'editorb', 'editorbtype', 'editorc', 'editorctype', 'editortype', 'eid', 'entrysubtype', 'eprint', 'eprintclass', 'eprinttype', 'eventdate', 'eventtitle', 'eventtitleaddon', 'file', 'foreword', 'holder', 'howpublished', 'indextitle', 'institution', 'introduction', 'isan', 'isbn', 'ismn', 'isrn', 'issn', 'issue', 'issuesubtitle', 'issuetitle', 'issuetitleaddon', 'iswc', 'journalsubtitle', 'journaltitle', 'journaltitleaddon', 'label', 'language', 'library', 'location', 'mainsubtitle', 'maintitle', 'maintitleaddon', 'month', 'nameaddon', 'note', 'number', 'organization', 'origdate', 'origlanguage', 'origlocation', 'origpublisher', 'origtitle', 'pages', 'pagetotal', 'pagination', 'part', 'publisher', 'pubstate', 'reprinttitle', 'series', 'shortauthor', 'shorteditor', 'shorthand', 'shorthandintro', 'shortjournal', 'shortseries', 'shorttitle', 'subtitle', 'title', 'titleaddon', 'translator', 'url', 'urldate', 'venue', 'version', 'volume', 'volumes', 'year'}): BibLaTeX data fields (Section 2.2.2).

papis.bibtex.bibtex_key_aliases = {'address': 'location', 'annote': 'annotation', 'archiveprefix': 'eprinttype', 'journal': 'journaltitle', 'key': 'sortkey', 'pdf': 'file', 'primaryclass': 'eprintclass', 'school': 'institution'}: BibLaTeX field aliases (Section 2.2.5).

papis.bibtex.bibtex_special_keys = frozenset({'crossref', 'entryset', 'execute', 'gender', 'ids', 'indexsorttitle', 'keywords', 'langid', 'langidopts', 'options', 'presort', 'related', 'relatedoptions', 'relatedstring', 'relatedtype', 'sortkey', 'sortname', 'sortshorthand', 'sorttitle', 'sortyear', 'xdata', 'xref'}): Special BibLaTeX fields (Section 2.2.3).

papis.bibtex.biblatex_software_keys = frozenset({'abstract', 'author', 'date', 'doi', 'editor', 'eprint', 'eprintclass', 'eprinttype', 'file', 'hal_id', 'hal_version', 'institution', 'introducedin', 'license', 'month', 'note', 'organization', 'publisher', 'related', 'relatedstring', 'relatedtype', 'repository', 'subtitle', 'swhid', 'title', 'url', 'urldate', 'version', 'year'}): BibLaTeX software keys (Section 3). Most of these keys are already standard BibLaTeX keys from bibtex_standard_keys.

papis.bibtex.bibtex_keys = frozenset({'abstract', 'addendum', 'address', 'afterword', 'annotation', 'annotator', 'annote', 'archiveprefix', 'author', 'authortype', 'bookauthor', 'bookpagination', 'booksubtitle', 'booktitle', 'booktitleaddon', 'chapter', 'commentator', 'crossref', 'date', 'doi', 'edition', 'editor', 'editora', 'editoratype', 'editorb', 'editorbtype', 'editorc', 'editorctype', 'editortype', 'eid', 'entryset', 'entrysubtype', 'eprint', 'eprintclass', 'eprinttype', 'eventdate', 'eventtitle', 'eventtitleaddon', 'execute', 'file', 'foreword', 'gender', 'hal_id', 'hal_version', 'holder', 'howpublished', 'ids', 'indexsorttitle', 'indextitle', 'institution', 'introducedin', 'introduction', 'isan', 'isbn', 'ismn', 'isrn', 'issn', 'issue', 'issuesubtitle', 'issuetitle', 'issuetitleaddon', 'iswc', 'journal', 'journalsubtitle', 'journaltitle', 'journaltitleaddon', 'key', 'keywords', 'label', 'langid', 'langidopts', 'language', 'library', 'license', 'location', 'mainsubtitle', 'maintitle', 'maintitleaddon', 'month', 'nameaddon', 'note', 'number', 'options', 'organization', 'origdate', 'origlanguage', 'origlocation', 'origpublisher', 'origtitle', 'pages', 'pagetotal', 'pagination', 'part', 'pdf', 'presort', 'primaryclass', 'publisher', 'pubstate', 'related', 'relatedoptions', 'relatedstring', 'relatedtype', 'repository', 'reprinttitle', 'school', 'series', 'shortauthor', 'shorteditor', 'shorthand', 'shorthandintro', 'shortjournal', 'shortseries', 'shorttitle', 'sortkey', 'sortname', 'sortshorthand', 'sorttitle', 'sortyear', 'subtitle', 'swhid', 'title', 'titleaddon', 'translator', 'url', 'urldate', 'venue', 'version', 'volume', 'volumes', 'xdata', 'xref', 'year'}): A set of known BibLaTeX fields (as described in Section 2.2 of the manual). These fields are a union of the above fields and can be extended with extra-bibtex-keys.

papis.bibtex.bibtex_type_required_keys = {'article': ({'author'}, {'title'}, {'eprinttype', 'journaltitle'}, {'date', 'year'}), 'book': ({'author'}, {'title'}, {'date', 'year'}), 'booklet': ({'author', 'editor'}, {'title'}, {'date', 'year'}), 'codefragment': ({'url'},), 'collection': ({'editor'}, {'title'}, {'date', 'year'}), 'dataset': ({'author', 'editor'}, {'title'}, {'date', 'year'}), 'inbook': ({'author'}, {'title'}, {'booktitle'}, {'date', 'year'}), 'incollection': ({'author'}, {'title'}, {'editor'}, {'booktitle'}, {'date', 'year'}), 'inproceedings': ({'author'}, {'title'}, {'booktitle'}, {'date', 'year'}), 'manual': ({'author', 'editor'}, {'title'}, {'date', 'year'}), 'misc': ({'author', 'editor'}, {'title'}, {'date', 'year'}), 'online': ({'author', 'editor'}, {'title'}, {'date', 'year'}, {'doi', 'eprint', 'url'}), 'patent': ({'author'}, {'title'}, {'number'}, {'date', 'year'}), 'periodical': ({'editor'}, {'title'}, {'date', 'year'}), 'proceedings': ({'title'}, {'date', 'year'}), 'report': ({'author'}, {'title'}, {'type'}, {'institution'}, {'date', 'year'}), 'software': ({'author', 'editor'}, {'title'}, {'url'}, {'year'}), 'softwaremodule': ({'author'}, {'subtitle'}, {'url'}, {'year'}), 'softwareversion': ({'author', 'editor'}, {'title'}, {'url'}, {'version'}, {'year'}), 'thesis': ({'author'}, {'title'}, {'type'}, {'institution'}, {'date', 'year'}), 'unpublished': ({'author'}, {'title'}, {'date', 'year'}), None: ()}: A mapping of supported BibLaTeX entry types (see bibtex_types) to BibLaTeX fields (see bibtex_keys). Each value is a tuple of disjoint sets that can contain multiple fields required for the particular type, e.g. an article may require either a year or a date field.

papis.bibtex.bibtex_type_required_keys_aliases = {'bookinbook': 'inbook', 'inreference': 'incollection', 'mvbook': 'book', 'mvcollection': 'collection', 'mvproceedings': 'proceedings', 'mvreference': 'collection', 'reference': 'collection', 'suppbook': 'book', 'suppcollection': 'collection', 'suppperiodical': 'periodical'}: A mapping for additional BibLaTeX types that have the same required fields. This mapping can be used to convert types before looking into bibtex_type_required_keys.
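The "tuple of sets" layout can be read as follows: every set lists alternatives, and an entry satisfies the type when it has at least one field from each set. The sketch below illustrates that reading with local copies of a few entries from the two mappings above; the helper name `missing_required_keys` is hypothetical and not part of papis.

```python
from __future__ import annotations

# Local copies of a few documented entries; real code would read them from
# papis.bibtex.bibtex_type_required_keys and ..._required_keys_aliases.
REQUIRED_KEYS = {
    "article": ({"author"}, {"title"}, {"eprinttype", "journaltitle"}, {"date", "year"}),
    "book": ({"author"}, {"title"}, {"date", "year"}),
    "collection": ({"editor"}, {"title"}, {"date", "year"}),
}
REQUIRED_KEYS_ALIASES = {"mvbook": "book", "reference": "collection"}


def missing_required_keys(entry_type: str, fields: dict[str, str]) -> list[set[str]]:
    """Return every set of alternatives that *fields* does not satisfy.

    Hypothetical helper: the type is first converted through the alias
    mapping, then each set is checked for at least one present field.
    """
    canonical = REQUIRED_KEYS_ALIASES.get(entry_type, entry_type)
    groups = REQUIRED_KEYS.get(canonical, ())
    return [group for group in groups if group.isdisjoint(fields)]
```

For example, an `mvbook` entry with only `author` and `title` would report the `{'date', 'year'}` set as unsatisfied, since neither alternative is present.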

papis.bibtex.bibtex_type_converter: dict[str, str] = {'OriginalPaper': 'article', 'annotation': 'misc', 'attachment': 'misc', 'audioRecording': 'audio', 'bill': 'legislation', 'blogPost': 'online', 'bookSection': 'inbook', 'case': 'jurisdiction', 'computerProgram': 'software', 'conferencePaper': 'inproceedings', 'dictionaryEntry': 'misc', 'document': 'article', 'email': 'online', 'encyclopediaArticle': 'article', 'film': 'video', 'forumPost': 'online', 'hearing': 'jurisdiction', 'instantMessage': 'online', 'interview': 'article', 'journal': 'article', 'journalArticle': 'article', 'magazineArticle': 'article', 'manuscript': 'unpublished', 'map': 'misc', 'monograph': 'book', 'newspaperArticle': 'article', 'note': 'misc', 'podcast': 'audio', 'preprint': 'unpublished', 'presentation': 'misc', 'radioBroadcast': 'audio', 'statute': 'jurisdiction', 'tvBroadcast': 'video', 'videoRecording': 'video', 'webpage': 'online'}: A mapping of arbitrary types to BibLaTeX types in bibtex_types. This mapping can be used when translating from other software, e.g. Zotero has custom fields in its schema.

papis.bibtex.bibtex_key_converter: dict[str, str] = {'abstractNote': 'abstract', 'conferenceName': 'eventtitle', 'place': 'location', 'proceedingsTitle': 'booktitle', 'publicationTitle': 'journal', 'university': 'school'}: A mapping of arbitrary fields to BibLaTeX fields in bibtex_keys. This mapping can be used when translating from other software.
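As a rough illustration of how the two converter mappings might be applied when importing a record from other software, the sketch below uses local copies of a few documented entries. The function `to_biblatex` and the assumption that the foreign record stores its type under a `"type"` key are both hypothetical.

```python
from __future__ import annotations

from typing import Any

# Local copies of a few documented entries from bibtex_type_converter and
# bibtex_key_converter.
TYPE_CONVERTER = {
    "journalArticle": "article",
    "conferencePaper": "inproceedings",
    "webpage": "online",
}
KEY_CONVERTER = {
    "abstractNote": "abstract",
    "publicationTitle": "journal",
    "place": "location",
}


def to_biblatex(record: dict[str, Any]) -> dict[str, Any]:
    """Translate foreign type and field names (hypothetical helper).

    Unknown types and fields are passed through unchanged.
    """
    result: dict[str, Any] = {}
    for key, value in record.items():
        if key == "type":  # assumption: the type is stored under "type"
            result["type"] = TYPE_CONVERTER.get(value, value)
        else:
            result[KEY_CONVERTER.get(key, key)] = value
    return result
```

Note that `publicationTitle` maps to `journal`, which is itself an alias of `journaltitle` in bibtex_key_aliases, so a second lookup may be needed to reach the canonical BibLaTeX name.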

papis.bibtex.bibtex_ignore_keys = frozenset({'file'}): A set of BibLaTeX fields to ignore when exporting from the Papis database. These can be extended with bibtex-ignore-keys.

papis.bibtex.ref_allowed_characters = '([^a-zA-Z0-9._:]+|(?<!\\\\)[._:])': A regex for acceptable characters to use in a reference string. These are used by ref_cleanup() to remove any undesired characters.
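To see what this pattern matches, the sketch below applies it directly with the standard `re` module. This is only an approximation of ref_cleanup(), which additionally goes through slugify; the helper `replace_disallowed` is hypothetical.

```python
import re

# The documented pattern, written as a raw string. It matches runs of
# characters outside [a-zA-Z0-9._:] and any '.', '_' or ':' that is not
# preceded by a backslash.
REF_PATTERN = re.compile(r"([^a-zA-Z0-9._:]+|(?<!\\)[._:])")


def replace_disallowed(ref: str, separator: str = "_") -> str:
    """Replace every match of the pattern with *separator* (hypothetical helper)."""
    # A function is used as the replacement so that the separator is
    # inserted literally, without backslash-escape processing.
    return REF_PATTERN.sub(lambda _match: separator, ref)
```

Under these assumptions, whitespace and unescaped punctuation are both replaced by the separator, while letters and digits are left untouched.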

papis.bibtex.bibtex_verbatim_fields = frozenset({'doi', 'eprint', 'file', 'pdf', 'url', 'urlraw'}): A set of fields that should not be escaped. In general, these will be escaped by the BibTeX engine and should not be modified (e.g. Verbatim fields and URI fields in Section 2.2.1).

papis.bibtex.bibtexparser_entry_to_papis(entry: dict[str, Any]) → dict[str, Any][source]

Convert the keys of a BibTeX entry parsed by bibtexparser to a papis-compatible format.

Parameters:: entry – a dictionary with keys parsed by bibtexparser.
Returns:: a dictionary with keys converted to a papis-compatible format.
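A minimal sketch of the kind of key conversion involved is shown below. Version 1 of bibtexparser stores the entry type and citation key under `ENTRYTYPE` and `ID`. Mapping `ID` to `ref` is an assumption made for this illustration, and the real function may perform further conversions that are not shown. The helper `entry_to_papis_like` is hypothetical.

```python
from __future__ import annotations

from typing import Any

# Assumed renames: "ENTRYTYPE" -> "type" matches the documented output of
# bibtex_to_dict(); "ID" -> "ref" is an assumption for this sketch.
RENAMES = {"ENTRYTYPE": "type", "ID": "ref"}


def entry_to_papis_like(entry: dict[str, Any]) -> dict[str, Any]:
    """Rename parser-specific keys and keep all other keys (hypothetical helper)."""
    return {RENAMES.get(key, key): value for key, value in entry.items()}
```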

papis.bibtex.bibtex_to_dict(bibtex: str) → list[DocumentLike][source]

Convert a BibTeX file (or string) to a list of Papis-compatible dictionaries.

This will convert an entry like:

@article{ref,
    author = { ... },
    title = { ... },
    ...,
}

to a dictionary such as:

{ "type": "article", "author": "...", "title": "...", ...}

Parameters:: bibtex – a path to a BibTeX file or a string containing BibTeX formatted data. If it is a file, its contents are passed to BibTexParser.
Returns:: a list of entries from the BibTeX data in a compatible format.

papis.bibtex.ref_cleanup(ref: str, ref_word_separator: str | None = None) → str[source]

Clean up reference strings so that they are accepted by BibLaTeX.

This uses the ref_allowed_characters to remove any disallowed characters from the given ref. Furthermore, slugify is used to remove unicode characters and ensure consistent use of the underscore _ as a separator.

Returns:: a reference without any disallowed characters.

papis.bibtex.create_reference(doc: DocumentLike, *, ref_format: AnyString | None = None, ref_word_separator: str | None = None, force: bool = False) → str[source]

Try to create a reference for the document doc.

If the document doc does not have a "ref" key, this function attempts to create one, otherwise the existing key is returned. When creating a new reference:

the ref-format key is used, if available,
the document DOI is used, if available,
a string is constructed from the document data (author, title, etc.).

Parameters:

force – if True, the reference is re-created even if the document already has a "ref" key.
ref_word_separator – separator passed to ref_cleanup().

Returns:

a clean (see ref_cleanup()) reference for the document.
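The fallback order described above can be sketched as a plain function. Everything here is a hypothetical illustration: `choose_reference` is not a papis function, `formatted_ref` stands in for the result of applying ref-format, the fields used in the last step are an assumption, and the final ref_cleanup() pass is omitted.

```python
from __future__ import annotations

from typing import Any


def choose_reference(
    doc: dict[str, Any],
    formatted_ref: str | None = None,
    force: bool = False,
) -> str:
    """Pick a raw reference following the documented priority (sketch)."""
    # An existing "ref" key wins unless re-creation is forced.
    if doc.get("ref") and not force:
        return str(doc["ref"])
    # 1. The result of the ref-format setting, if available.
    if formatted_ref:
        return formatted_ref
    # 2. The document DOI, if available.
    if doc.get("doi"):
        return str(doc["doi"])
    # 3. A string built from document data (the field choice is assumed).
    parts = [str(doc[key]) for key in ("author", "title", "year") if doc.get(key)]
    return "_".join(parts)
```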

papis.bibtex.author_list_to_author(doc: Document, author_list: list[dict[str, Any]]) → str[source]

Construct the BibTeX author field from the document’s author_list.

This function is similar to papis.document.author_list_to_author(), but takes into account some BibTeX peculiarities:

The separator between the authors is always “and”.
Authors with only a family or given name are surrounded by curly brackets.

Returns:: an author string.
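The two peculiarities can be illustrated with a short sketch. The `"family, given"` layout used for complete names is an assumption (the actual layout depends on the configured author format), and `join_bibtex_authors` is a hypothetical name.

```python
from __future__ import annotations

from typing import Any


def join_bibtex_authors(author_list: list[dict[str, Any]]) -> str:
    """Join authors with "and", bracing single-part names (sketch)."""
    names = []
    for author in author_list:
        family = author.get("family")
        given = author.get("given")
        if family and given:
            # Assumed layout for complete names.
            names.append(f"{family}, {given}")
        elif family or given:
            # Braces keep BibTeX from splitting a single-part name.
            names.append("{" + str(family or given) + "}")
    return " and ".join(names)
```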

`papis.citations`

papis.citations.Citation

A citation for an existing document.

alias of dict[str, Any]

papis.citations.Citations

A list of citations for an existing document.

alias of list[dict[str, Any]]

papis.citations.get_metadata_citations(doc: DocumentLike) → Citations[source]: Get the citations in the metadata that contain a DOI.
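As a rough sketch of this filtering step: assuming the document metadata keeps its citations as a list of dictionaries under a `"citations"` key (an assumption, not confirmed by this reference), the entries with a DOI could be selected as follows. The name `citations_with_doi` is hypothetical.

```python
from __future__ import annotations

from typing import Any


def citations_with_doi(doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Keep only citation entries that carry a non-empty DOI (sketch)."""
    # Assumption: citations live under the "citations" metadata key.
    return [
        citation
        for citation in doc.get("citations", [])
        if isinstance(citation, dict) and citation.get("doi")
    ]
```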

papis.citations.fetch_citations(doc: Document) → Citations[source]

Retrieve citations for the document.

Citation retrieval is mainly based on querying Crossref metadata based on the DOI of the document. If the document does not have a DOI, this function will fail to retrieve any citations.

Returns:: a list of citations that have a DOI.

papis.citations.get_citations_from_database(dois: Sequence[str]) → Citations[source]

Look for document DOIs in the database.

Parameters:: dois – a sequence of DOIs to look for in the current library database.
Returns:: a sequence of documents from the current library that match the given dois, if any.

papis.citations.update_and_save_citations_from_database_from_doc(doc: Document) → None[source]

Update the citations file of an existing document.

This function will get any existing citations in the document, update them as appropriate and save them back to the citation file.

papis.citations.update_citations_from_database(citations: list[dict[str, Any]]) → list[dict[str, Any]][source]

Update a list of citations with data from the database.

Parameters:: citations – a list of existing citations to update.

papis.citations.save_citations(doc: Document, citations: Citations) → None[source]: Save the citations to the document’s citation file.

papis.citations.fetch_and_save_citations(doc: Document) → None[source]: Retrieve citations from available sources and save them to the citations file.

papis.citations.get_citations_file(doc: Document) → str | None[source]

Get the document’s citation file path (see citations-file-name).

Returns:: an absolute path to the citations file for doc.

papis.citations.has_citations(doc: Document) → bool[source]

Returns:: True if the document has an existing citations file and False otherwise.

papis.citations.get_citations(doc: Document) → Citations[source]: Retrieve citations from the document’s citation file.

papis.citations.get_cited_by_file(doc: Document) → str | None[source]

Get the document’s cited-by file (see cited-by-file-name).

Returns:: an absolute path to the cited-by file for doc.

papis.citations.has_cited_by(doc: Document) → bool[source]

Returns:: True if the document has a cited-by file and False otherwise.

papis.citations.save_cited_by(doc: Document, citations: Citations) → None[source]: Save the cited-by list citations to the document’s cited-by file.

papis.citations.fetch_cited_by_from_database(cit: dict[str, Any]) → list[dict[str, Any]][source]

Fetch a list of documents that cite cit from the database.

Parameters:: cit – a citation to look for in the database.
Returns:: a list of documents that cite cit.

papis.citations.fetch_and_save_cited_by_from_database(doc: Document) → None[source]: Call fetch_cited_by_from_database() and save_cited_by().

papis.citations.get_cited_by(doc: Document) → Citations[source]: Get cited-by citations for the given document.

`papis.cli`

class papis.cli.FormatPatternParamType[source]

name: str = 'pattern': Name of the parameter type (shown in the command-line).

convert(value: Any, param: Parameter | None, ctx: Context | None) → Any[source]: See click.ParamType.convert().

class papis.cli.LibraryParamType[source]

name: str = 'library': the descriptive name of this type

shell_complete(ctx: Context, param: Parameter, incomplete: str) → list[CompletionItem][source]

Return a list of CompletionItem objects for the incomplete value. Most types do not provide completions, but some do, and this allows custom types to provide custom completions as well.

Parameters:

ctx – Invocation context for this command.
param – The parameter that is requesting completion.
incomplete – Value being completed. May be empty.

Added in version 8.0.

papis.cli.bool_flag(*args: Any, **kwargs: Any) → Callable[[...], Any][source]: A wrapper to click.option() that hardcodes a boolean flag option.

papis.cli.query_argument(**attrs: Any) → Callable[[...], Any][source]: Adds a query argument as a click.argument() decorator.

papis.cli.query_option(**attrs: Any) → Callable[[...], Any][source]: Adds a -q, --query option as a click.option() decorator.

papis.cli.sort_option(**attrs: Any) → Callable[[...], Any][source]: Adds a --sort and a --reverse option as a click.option() decorator.

papis.cli.doc_folder_option(**attrs: Any) → Callable[[...], Any][source]: Adds a --doc-folder argument as a click.option() decorator.

papis.cli.all_option(**attrs: Any) → Callable[[...], Any][source]: Adds a --all option as a click.option() decorator.

papis.cli.git_option(**attrs: Any) → Callable[[...], Any][source]: Adds a --git option as a click.option() decorator.

papis.cli.handle_doc_folder_or_query(query: str, doc_folder: str | tuple[str, ...] | None, library_name: str | None = None) → list[Document][source]

Query database for documents.

This handles the query_option() and doc_folder_option() command-line arguments. If a doc_folder is given, then the document at that location is loaded, otherwise the database is queried using query.

Parameters:

query – a database query string.
doc_folder – existing document folder (see papis.document.from_folder()).
library_name – library database to query.

papis.cli.handle_doc_folder_query_sort(query: str, doc_folder: str | tuple[str, ...] | None, sort_field: str | None, sort_reverse: bool) → list[Document][source]

Query database for documents.

Similar to handle_doc_folder_or_query(), but also handles the sort_option() arguments. It sorts the resulting documents according to sort_field and sort_reverse.

Parameters:

sort_field – field by which to sort the resulting documents (see papis.document.sort()).
sort_reverse – if True, the fields are sorted in reverse order.

papis.cli.handle_doc_folder_query_all_sort(query: str, doc_folder: str | tuple[str, ...] | None, sort_field: str | None, sort_reverse: bool, _all: bool) → list[Document][source]

Query database for documents.

Similar to handle_doc_folder_query_sort(), but also handles the all_option() argument.

Parameters:: _all – if False, the user is prompted to pick a subset of documents (see papis.api.pick_doc()).

papis.cli.bypass(group: Group, command: Command, command_name: str) → Callable[[...], Any][source]

Overwrite existing papis commands.

This function is especially important for developing scripts in papis.

For example, consider augmenting the add command, as seen when using papis add. In this case, we may want to add some additional options or behavior before calling papis.commands.add, but would like to avoid writing it from scratch. This function can then be used as follows to allow this:

import click
import papis.cli
import papis.commands.add

@click.group()
def main():
    """Your main app"""
    pass

@papis.cli.bypass(main, papis.commands.add.cli, "add")
def add(**kwargs):
    # do some logic here...
    # and call the original add command line function by
    papis.commands.add.cli.bypassed(**kwargs)

`papis.commands`

papis.commands.COMMAND_NAMESPACE_NAME = 'papis.command': Name of the entry point namespace for Command plugins.

papis.commands.EXTERNAL_COMMAND_REGEX = re.compile('.*papis-([^ .]+)$'): Regex for determining external commands.
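To illustrate what this pattern accepts, the sketch below compiles the same expression with the standard `re` module and extracts the command name from a file path. The helper `external_command_name` is hypothetical.

```python
from __future__ import annotations

import re

# Same expression as documented above.
EXTERNAL_COMMAND_REGEX = re.compile(".*papis-([^ .]+)$")


def external_command_name(path: str) -> str | None:
    """Return the command name encoded in *path*, if any (hypothetical helper)."""
    match = EXTERNAL_COMMAND_REGEX.match(path)
    return match.group(1) if match else None
```

Because the captured group excludes spaces and dots, a file such as `papis-mail.sh` would not be recognized by this pattern, while `papis-mail` would yield the command name `mail`.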

papis.commands.make_short_help(text: str, fallback: str = 'No help message available.') → str[source]

Create a short help from the given text.

This will take the first paragraph of the text and remove any known reStructuredText markup so that it can be shown as a help string in the command line.

If the text is actually empty, the fallback will be returned.
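The behavior described here can be approximated with a few lines of standard-library code. This is only a sketch: the markup handling below covers simple roles and backticks, which is likely narrower than what make_short_help() removes, and `short_help` is a hypothetical name.

```python
import re


def short_help(text: str, fallback: str = "No help message available.") -> str:
    """Return the first paragraph of *text* without simple markup (sketch)."""
    text = text.strip()
    if not text:
        return fallback
    # Paragraphs are separated by one or more blank lines.
    first_paragraph = re.split(r"\n\s*\n", text, maxsplit=1)[0]
    # Collapse line breaks and repeated whitespace into single spaces.
    first_paragraph = " ".join(first_paragraph.split())
    # Turn roles such as :func:`name` into just "name".
    first_paragraph = re.sub(r":[a-zA-Z:]+:`([^`]*)`", r"\1", first_paragraph)
    # Drop any remaining inline literal backticks.
    return first_paragraph.replace("`", "")
```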

papis.commands.normalize_help(text: str | None) → str[source]

Clean up the given text so that it can be shown on the command-line.

Similarly to make_short_help(), this removes any known reStructuredText markup from the text and does additional normalizations so that it can be better displayed on the command-line.

class papis.commands.FullHelpCommand(name: str | None, context_settings: MutableMapping[str, Any] | None = None, callback: Callable[[...], Any] | None = None, params: list[Parameter] | None = None, help: str | None = None, epilog: str | None = None, short_help: str | None = None, options_metavar: str | None = '[OPTIONS]', add_help_option: bool = True, no_args_is_help: bool = False, hidden: bool = False, deprecated: bool | str = False)[source]

This is a simple wrapper around click.Command that does not truncate the short help messages.

We still very much prefer that these stay short if at all possible, but the default limit of 45 characters does not work well for many non-trivial commands.

format_help_text(ctx: Context, formatter: HelpFormatter) → None[source]: Writes the help text to the formatter if it exists.

get_short_help_str(limit: int = 45) → str[source]: Gets short help for the command or makes it by shortening the long help string.

class papis.commands.AliasedGroup(name: str | None = None, commands: MutableMapping[str, Command] | Sequence[Command] | None = None, invoke_without_command: bool = False, no_args_is_help: bool | None = None, subcommand_metavar: str | None = None, chain: bool = False, result_callback: Callable[[...], Any] | None = None, **kwargs: Any)[source]

A click.Group that accepts command aliases.

This group command is adapted from the command aliases example in the click documentation and is to be used for groups with aliases. In this case, aliases are defined as prefixes of the command. For example, for a command named remove, rem is also accepted as long as it is unique.

command_class[source]: alias of FullHelpCommand

get_command(ctx: Context, cmd_name: str) → Command | None[source]

Returns:: given a context and a command name, this returns a click.Command object if it exists or returns None.
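The prefix-based alias rule can be sketched independently of click. The function below is a hypothetical illustration of the lookup logic only, not the actual get_command() implementation.

```python
from __future__ import annotations


def resolve_command(names: list[str], typed: str) -> str | None:
    """Resolve *typed* to a command name by exact or unique prefix match."""
    # Exact matches always win.
    if typed in names:
        return typed
    # Otherwise accept a prefix only if it identifies exactly one command.
    matches = [name for name in names if name.startswith(typed)]
    return matches[0] if len(matches) == 1 else None
```

With commands `remove`, `rename` and `add`, the prefix `rem` resolves to `remove`, while `re` is ambiguous and resolves to nothing.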

class papis.commands.CommandPluginLoaderGroup(name: str | None = None, commands: MutableMapping[str, Command] | Sequence[Command] | None = None, invoke_without_command: bool = False, no_args_is_help: bool | None = None, subcommand_metavar: str | None = None, chain: bool = False, result_callback: Callable[[...], Any] | None = None, **kwargs: Any)[source]

A click.Group that loads additional commands from entry points.

Commands in this group are loaded using get_commands(). By default commands from the COMMAND_NAMESPACE_NAME namespace are loaded. Additional external scripts that are found in the path and match the EXTERNAL_COMMAND_REGEX are also loaded.

To override this behavior, create a subclass and override the command_plugins property to load commands from other namespaces.

command_class[source]: alias of FullHelpCommand

property command_plugins: dict[str, CommandPlugin]: A mapping of command names to available command plugins.

property command_plugin_names: list[str]: A list of all commands available through plugins.

list_commands(ctx: click.Context) → list[str][source]

List all matched commands in the command folder and in the path.

>>> group = CommandPluginLoaderGroup()
>>> rv = group.list_commands(None)
>>> len(rv) > 0
True

get_command(ctx: Context, name: str) → Command | None[source]

Get the command to be run.

>>> group = CommandPluginLoaderGroup()
>>> cmd = group.get_command(None, 'add')
>>> cmd.name, cmd.help
('add', 'Add...')
>>> group.get_command(None, 'this command does not exist')
Command ... is unknown!

class papis.commands.CommandPlugin(command_name: str, path: str | None, entrypoint: EntryPoint | None)[source]

A papis command plugin or script.

These plugins are made available through the main papis command-line interface as subcommands.

command_name: str: The name of the command.

path: str | None: The path to the script if it is a separate executable.

entrypoint: EntryPoint | None: The module the plugin is imported from if it is an entry point.

papis.commands.load_command(cmd: CommandPlugin) → Command | None[source]

Load a command based on the given information in cmd.

If the command is an entry point, then it is loaded through the mechanisms in importlib.metadata.
If the command is an external executable, it is wrapped as an external command and all command-line arguments are passed through to it.

Returns:: a click.Command that can be used by a click.Group.

papis.commands.get_external_scripts(matcher: Pattern[str] | None = None) → dict[str, CommandPlugin][source]

Get a mapping of all external scripts that should be registered with Papis.

An external script is an executable that can be found in the papis.config.get_scripts_folder() folder or in the user’s PATH. The scripts are recognized by their file name using the provided matcher regular expression. For example, default Papis commands are always recognized using EXTERNAL_COMMAND_REGEX.

Returns:: a mapping of scripts that have been found.

papis.commands.get_command_plugins(namespace: str) → dict[str, CommandPlugin][source]

Get a mapping of entry points that should be registered as Papis commands.

Parameters:: namespace – a namespace for the entry point commands to retrieve.
Returns:: a mapping of plugins that have been found.

papis.commands.get_commands(namespace: str, *, extern_matcher: Pattern[str] | Literal[False] | None = None) → dict[str, CommandPlugin][source]

Get a mapping of all commands that should be registered with Papis.

This includes the results from get_external_scripts() and get_command_plugins(). Entrypoint-based scripts take priority, so if an external script with the same name is found it is silently ignored.

Parameters:

namespace – a namespace for the entry point commands to retrieve.
extern_matcher – a regular expression that matches file names of external commands (see get_external_scripts()). If False, no external commands are loaded.

Returns:

a mapping of scripts that have been found.
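The priority rule between the two sources amounts to a dictionary merge in which entry points are applied last. The sketch below uses plain strings in place of CommandPlugin objects, and `merge_commands` is a hypothetical name.

```python
from __future__ import annotations


def merge_commands(
    external_scripts: dict[str, str],
    entry_points: dict[str, str],
) -> dict[str, str]:
    """Merge both sources so that entry points take priority (sketch)."""
    merged = dict(external_scripts)
    # Later updates overwrite earlier keys, so an external script with the
    # same name as an entry point is silently dropped.
    merged.update(entry_points)
    return merged
```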

`papis.config`

papis.config.get_general_settings_name() → str[source]

Get the section name of the general settings.

>>> get_general_settings_name()
'settings'

class papis.config.Configuration[source]

A subclass of configparser.ConfigParser with custom defaults.

This class automatically reads the configuration file and imports any required scripts. If no file exists, a default one is created.

Use get_configuration() to instantiate this class instead of calling it directly.

papis.config.get_default_settings() → dict[str, dict[str, Any]][source]

Get the default settings for all non-user variables.

Additional user variables can be registered using register_default_settings() and will be included in this dictionary.

papis.config.register_default_settings(settings_dictionary: dict[str, dict[str, Any]]) → None[source]

Register configuration settings into the global configuration registry.

Notice that you can define sections or global options. For instance, let us suppose that a script called foobar defines some configuration options. The script might define the following:

import papis.config

options = {"foobar": { "command": "open"}}
papis.config.register_default_settings(options)

which can then be accessed globally through:

papis.config.get("command", section="foobar")

Parameters:: settings_dictionary – a dictionary of configuration settings, where the first level of keys defines the sections and the second level defines the actual configuration settings.

papis.config.get_config_home() → str[source]

Returns:: a (platform dependent) base directory relative to which user specific configuration files should be stored.

papis.config.get_config_folder() → str[source]

Get the main configuration folder.

Returns:: a (platform dependent) folder where the configuration files are stored, e.g. $HOME/.config/papis on POSIX platforms.

papis.config.get_config_file() → str[source]

Get the main configuration file.

Returns:: the path of the main configuration file, which by default is in get_config_folder(), but can be overwritten using set_config_file().

papis.config.set_config_file(filepath: str) → None[source]: Override the main configuration file.

papis.config.get_configpy_file() → str[source]

Get the main Python configuration file.

This is a file that will get automatically eval()ed if it exists and allows for more dynamic configuration.

Returns:: the path of the main Python configuration file, which by default is in get_config_folder().

papis.config.get_scripts_folder() → str[source]

Returns:: the folder where additional scripts are stored, which by default is in get_config_folder().

papis.config.set(key: str, value: Any, section: str | None = None) → None[source]

Set a key in the configuration.

Parameters:

key – the name of the key to set.
value – the value to set it to, which can be any value understood by the Configuration.
section – the name of the section to set the key in.

papis.config.general_get(key: str, section: str | None = None, data_type: type | None = None) → Any | None[source]

Get the value for a given key in section.

This function is a bit more general than the get from Configuration (see configparser.ConfigParser.get()). In particular, it supports two ways of specifying a key:

Both the key and the section are provided, in which case the key is retrieved from that section directly.
The key has the format <section>-<key> and no section is specified, in which case the full key is expected to be in the general settings section or a library section.

The search proceeds in the following order of priority:

The key is retrieved from a library section.
The key is retrieved from the given section, if any.
The key is retrieved from the general section.

Parameters:

key – a key in the configuration file to retrieve.
section – a section from which to retrieve the key, which defaults to get_general_settings_name().
data_type – the data type that should be expected for the value of the variable.
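The search order above can be sketched with a plain `configparser.ConfigParser`. This is an illustration under stated assumptions, not the papis implementation: the function `lookup`, its `library` parameter, and the choice to return the first hit are all hypothetical, and type conversion via data_type is omitted.

```python
from __future__ import annotations

import configparser


def lookup(
    config: configparser.ConfigParser,
    key: str,
    section: str | None = None,
    library: str | None = None,
    general: str = "settings",
) -> str | None:
    """Search library, given and general sections in that order (sketch)."""
    # Outside of its own section, a key is spelled "<section>-<key>".
    qualified = f"{section}-{key}" if section is not None else key
    candidates = []
    if library is not None:
        candidates.append((library, qualified))
    if section is not None:
        candidates.append((section, key))
    candidates.append((general, qualified))
    for candidate_section, candidate_key in candidates:
        # has_option() returns False for sections that do not exist.
        if config.has_option(candidate_section, candidate_key):
            return config.get(candidate_section, candidate_key)
    return None
```

A value set in the library section therefore shadows the same option in the given section, which in turn shadows the general settings.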

papis.config.get(key: str, section: str | None = None) → Any | None[source]: Retrieve a general value (can be None) from the configuration file.

papis.config.getint(key: str, section: str | None = None) → int | None[source]

Retrieve an integer value from the configuration file.

>>> set("something", 42)
>>> getint("something")
42

papis.config.getfloat(key: str, section: str | None = None) → float | None[source]

Retrieve a floating point value from the configuration file.

>>> set("something", 0.42)
>>> getfloat("something")
0.42

papis.config.getboolean(key: str, section: str | None = None) → bool | None[source]

Retrieve a boolean value from the configuration file.

>>> set("add-open", True)
>>> getboolean("add-open")
True

papis.config.getstring(key: str, section: str | None = None) → str[source]

Retrieve a string value from the configuration file.

>>> set("add-open", "hello world")
>>> getstring("add-open")
'hello world'

papis.config.getformatpattern(key: str, section: str | None = None) → FormatPattern[source]

Retrieve a format pattern from the configuration file.

Format patterns use the FormatPattern class to define a string that should be formatted by a specific Formatter. For configuration options, such strings can be defined in the configuration file as:

[settings]
multiple-authors-format = {au[family]}, {au[given]}
multiple-authors-format.python = {au[family]}, {au[given]}
multiple-authors-format.jinja2 = {{ au[family] }}, {{ au[given] }}

i.e. like key[.formatter]. If no formatter is provided in the key name, the default formatter is used, as defined by formatter. Formatters are checked in alphabetical order and the last one is returned.

>>> from papis.strings import FormatPattern
>>> set("add-open", "hello world")
>>> r = getformatpattern("add-open")
>>> r.formatter
'python'

>>> set("add-open", FormatPattern("python", "hello world"))
>>> r = getformatpattern("add-open")
>>> r.formatter
'python'

>>> set("add-open.python", "hello world")
>>> r = getformatpattern("add-open")
>>> r.formatter
'python'

papis.config.getlist(key: str, section: str | None = None) → list[str][source]

Retrieve a list value from the configuration file.

This function uses eval() to evaluate the string present in the configuration file into a Python list. This can be unsafe if the string contains untrusted code.

>>> set("tags", "['a', 'b', 'c']")
>>> getlist("tags")
['a', 'b', 'c']

Raises:: papis.exceptions.UnexpectedSettingTypeError – Whenever the parsed syntax is either not a valid python object or not a valid python list.
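
For plugin code that needs to parse a list-valued string from an untrusted source, a safer alternative can be sketched with ast.literal_eval(), which only accepts Python literals. This is not what getlist() does (it uses eval(), as noted above); the helper name `parse_list_setting` and the use of ValueError are assumptions made for this example.

```python
import ast
from typing import List


def parse_list_setting(raw: str) -> List[str]:
    """Parse a list literal from a configuration string without running code."""
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError) as exc:
        # Anything that is not a plain Python literal ends up here.
        raise ValueError(f"not a valid Python literal: {raw!r}") from exc
    if not isinstance(value, list):
        raise ValueError(f"expected a list, got {type(value).__name__}")
    return [str(item) for item in value]


print(parse_list_setting("['a', 'b', 'c']"))
```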

papis.config.get_configuration() → Configuration[source]

Get the configuration object.

If no configuration has been initialized, it initializes one. Only one configuration per process should ever be configured.

papis.config.merge_configuration_from_path(path: str | None, configuration: Configuration) → None[source]

Merge information of a configuration file found in path into configuration.

Parameters:

path – a path to a configuration file.
configuration – an existing Configuration object.

papis.config.set_lib(library: Library) → None[source]: Set the current library.

papis.config.set_lib_from_name(libname: str) → None[source]

Set the current library from a name.

Parameters:: libname – the name of a library in the configuration file or a path to an existing folder that should be considered a library.

papis.config.get_lib_from_name(libname: str) → Library[source]

Get a library object from a name.

Parameters:: libname – the name of a library in the configuration file or a path to an existing folder that should be considered a library.

papis.config.get_lib_dirs() → list[str][source]: Get the directories of the current library.

papis.config.get_lib_name() → str[source]: Get the name of the current library.

papis.config.get_lib() → Library[source]

Get current library.

If there is no library set before, the default library will be retrieved. If the PAPIS_LIB environment variable is defined, this is the library name (or path) that will be taken as a default.

papis.config.get_libs() → list[str][source]: Get all the library names from the configuration file.

papis.config.get_libs_from_config(config: Configuration) → list[str][source]

Get all library names from the given configuration.

In the configuration file, any sections that contain a "dir" or a "dirs" key are considered to be libraries.
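
The rule above can be illustrated with a plain configparser.ConfigParser. This is a minimal sketch, assuming the configuration behaves like a standard parser; the helper `library_sections` and the sample sections are hypothetical.

```python
from configparser import ConfigParser
from typing import List


def library_sections(config: ConfigParser) -> List[str]:
    """Return the sections that define a "dir" or a "dirs" key."""
    return [
        name for name in config.sections()
        if config.has_option(name, "dir") or config.has_option(name, "dirs")
    ]


config = ConfigParser()
config.read_dict({
    "settings": {"editor": "vim"},
    "papers": {"dir": "~/Documents/papers"},
    "books": {"dirs": "['~/books', '~/more-books']"},
})

print(library_sections(config))
```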

papis.config.reset_configuration() → Configuration[source]: Resets the existing configuration and returns a new one without any user settings.

papis.config.escape_interp(path: str) → str[source]

Escape paths added to the configuration file.

By default, the papis.config.Configuration enables string interpolation in the key values (e.g. using key = %(other_key)s-suffix). Any paths added to the configuration should then be escaped so that they do not interfere with the interpolation.
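
To see why escaping matters, the following sketch uses a standard configparser.ConfigParser with its default interpolation. It assumes that escaping amounts to doubling every percent sign, which is how the standard library expects literal percent signs to be written; the helper `escape_percent` and the sample path are hypothetical.

```python
from configparser import ConfigParser


def escape_percent(path: str) -> str:
    """Double every percent sign so interpolation leaves it alone."""
    return path.replace("%", "%%")


config = ConfigParser()  # BasicInterpolation is enabled by default
config.add_section("papers")

raw_path = "/data/100%-done/papers"
config.set("papers", "dir", escape_percent(raw_path))

# Reading the value back undoes the escaping.
stored = config.get("papers", "dir")
print(stored)
```

Passing the unescaped path to config.set() raises a ValueError in the standard library, since a lone percent sign is not valid interpolation syntax.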

`papis.database`

papis.database.get_database(library_name: str | None = None) → Database[source]

Get the database for the library library_name.

If library_name is None, then the current database is retrieved from papis.config.get_lib(). The given library name must exist in the configuration file or it should be a path to a directory containing Papis documents (see papis.config.get_lib_from_name()).

Returns:: the caching database for the given library. The same database is returned on repeated calls to this function.

papis.database.get_all_query_string() → str[source]: Get the default query string for the current database.

papis.database.clear_cached() → None[source]

Clear cached databases.

After this function is called, all subsequent calls to get_database() will recreate the database for the given library.

papis.database.base.get_cache_file_name(libpaths: str) → str[source]

Create a cache file name out of the path of a given directory.

Parameters:: libpaths – folder names to be used as a seed for the cache name.
Returns:: a name for the cache file specific to libpaths.

>>> get_cache_file_name('path/to/my/lib')
'a8c689820a94babec20c5d6269c7d488-lib'
>>> get_cache_file_name('papers')
'a566b2bebc62611dff4cdaceac1a7bbd-papers'

papis.database.base.get_cache_file_path(libpaths: str) → str[source]

Get the full path to the cache file.

Parameters:: libpaths – a cache file specific for the given library paths.

class papis.database.base.Database(library: Library | None = None)[source]

Abstract base class for Papis caching database backends.

abstractmethod get_backend_name() → str[source]

Get the name of the database backend.

This name has to match the one used in the configuration file in the database-backend setting.

abstractmethod get_cache_path() → str[source]: Get the path to the database cache file (or directory).

abstractmethod get_all_query_string() → str[source]: Get the default query string that will match all documents.

abstractmethod initialize() → None[source]

Initialize the caching database backend.

This can involve creating any necessary directories, opening files, etc. This function should be called in the constructor of the database class, as needed.

abstractmethod clear() → None[source]

Clear the database by removing all files and directories.

After clearing the database, calling initialize() may be necessary to ensure that it is in the correct state.

abstractmethod add(document: Document) → None[source]: Add a new document to the database.

abstractmethod update(document: Document) → None[source]: Replace an existing document in the database.

abstractmethod delete(document: Document) → None[source]: Remove a document from the database.

abstractmethod query(query_string: str) → list[Document][source]

Find a document in the database by the given query_string.

The query string can have a more complex syntax based on the database backend.

abstractmethod query_dict(query: dict[str, str]) → list[Document][source]: Find a document in the database that matches the keys in query.

abstractmethod get_all_documents() → list[Document][source]: Get all documents in the database.

find_by_id(identifier: str) → Document | None[source]: Find a document in the library by its Papis ID identifier.

maybe_compute_id(doc: Document) → None[source]

Compute a Papis ID for the document doc.

If the document already has an ID, then the document is skipped and the ID is not checked for duplicates. Otherwise a new unique ID is created and the document info.yaml is updated accordingly.

`papis.database.cache`

papis.database.cache.filter_documents(documents: list[Document], search: str = '') → list[Document][source]

Filter documents based on the search string.

Parameters:: search – a search string that will be parsed by parse_query.
Returns:: a list of filtered documents.

>>> document = papis.document.from_data({'author': 'einstein'})
>>> len(filter_documents([document], search="einstein")) == 1
True
>>> len(filter_documents([document], search="author : ein")) == 1
True
>>> len(filter_documents([document], search="title : ein")) == 1
False

papis.database.cache.match_document(document: Document, search: re.Pattern[str], match_format: AnyString | None = None, doc_key: str | None = None) → re.Match[str] | None[source]

Match a document’s keys to a given search pattern.

See MatcherCallable.

>>> from papis.docmatcher import get_regex_from_search as regex
>>> document = papis.document.from_data({'author': 'einstein'})
>>> match_document(document, regex('e in'), '{doc[author]}') is None
False
>>> match_document(document, regex('ee in'), '{doc[author]}') is None
True
>>> match_document(document, regex('einstein'), '{doc[title]}') is None
True

class papis.database.cache.PickleDatabase(library: Library | None = None)[source]

A caching database backend for Papis based on pickle.

get_backend_name() → str[source]

Get the name of the database backend.

This name has to match the one used in the configuration file in the database-backend setting.

get_cache_path() → str[source]: Get the path to the database cache file (or directory).

get_all_query_string() → str[source]: Get the default query string that will match all documents.

initialize() → None[source]

Initialize the caching database backend.

This can involve creating any necessary directories, opening files, etc. This function should be called in the constructor of the database class, as needed.

clear() → None[source]

Clear the database by removing all files and directories.

After clearing the database, calling initialize() may be necessary to ensure that it is in the correct state.

add(document: Document) → None[source]: Add a new document to the database.

update(document: Document) → None[source]: Replace an existing document in the database.

delete(document: Document) → None[source]: Remove a document from the database.

query(query_string: str) → list[Document][source]

Find a document in the database by the given query_string.

The query string can have a more complex syntax based on the database backend.

query_dict(query: dict[str, str]) → list[Document][source]: Find a document in the database that matches the keys in query.

get_all_documents() → list[Document][source]: Get all documents in the database.

`papis.database.whoosh`

This is the Whoosh interface to Papis.

For future Papis developers here are some considerations.

Whoosh works with 3 main objects, the Index, the Writer and the Schema. The indices are stored in a subfolder of get_cache_home(). The name of the indices folders is similar to the cache files of the papis cache database.

Once the Index is created in the mentioned folder, a Schema is initialized, which is a declaration of the data prototype of the database, or the definition of the table in SQL parlance. This is controlled by the Papis configuration through the whoosh-schema-prototype. For instance if the database is supposed to only contain the key fields [author, title, year, tags], then the whoosh-schema-prototype string should look like the following:

{
    "author": TEXT(stored=True),
    "title": TEXT(stored=True),
    "year": TEXT(stored=True),
    "tags": TEXT(stored=True),
}

where all the fields are explained in the Whoosh documentation.

After this Schema is created, the folders of the library are traversed and the documents are added to the database. When adding documents, only the keys in the schema are stored. This means that, e.g., if publisher is not in the schema you will not be able to search for the publisher through a query.

papis.database.whoosh.WHOOSH_FOLDER_FIELD = 'papis-folder': Field name used to store the document main folder in the Whoosh database.

class papis.database.whoosh.WhooshDatabase(library: Library | None = None)[source]

get_backend_name() → str[source]

Get the name of the database backend.

This name has to match the one used in the configuration file in the database-backend setting.

get_cache_path() → str[source]: Get the path to the database cache file (or directory).

get_all_query_string() → str[source]: Get the default query string that will match all documents.

initialize() → None[source]

Initialize the caching database backend.

This can involve creating any necessary directories, opening files, etc. This function should be called in the constructor of the database class, as needed.

clear() → None[source]

Clear the database by removing all files and directories.

After clearing the database, calling initialize() may be necessary to ensure that it is in the correct state.

add(document: Document) → None[source]: Add a new document to the database.

update(document: Document) → None[source]: Replace an existing document in the database.

delete(document: Document) → None[source]: Remove a document from the database.

query(query_string: str) → list[Document][source]

Find a document in the database by the given query_string.

The query string can have a more complex syntax based on the database backend.

query_dict(query: dict[str, str]) → list[Document][source]: Find a document in the database that matches the keys in query.

get_all_documents() → list[Document][source]: Get all documents in the database.

`papis.docmatcher`

class papis.docmatcher.ParseResult(search: str, pattern: re.Pattern[str], doc_key: str | None)[source]

Result from parsing a search string using parse_query().

For example, a search string such as "author:einstein" will result in:

r = ParseResult(search="einstein", pattern=<...>, doc_key="author")

search: str: A search string that was matched for this result.

pattern: Pattern[str]: A regex pattern constructed from the search using get_regex_from_search().

doc_key: str | None: A document key that was matched for this result, if any.

class papis.docmatcher.MatcherCallable(*args, **kwargs)[source]

A callable typing.Protocol used to match a document for a given search.

__call__(document: Document, search: re.Pattern[str], match_format: AnyString | None = None, doc_key: str | None = None) → Any[source]

Match a document’s keys to a given search pattern.

The matcher can decide whether the match_format or the doc_key take priority when matching against the given pattern in search. If possible, doc_key should be given priority as the more specific choice.

Parameters:

search – a regex pattern to match the query against (see ParseResult.pattern).
match_format – a format pattern (see papis.format.format()) to match against.
doc_key – a specific key in the document to match against.

Returns:

None if the match fails and anything else otherwise.

class papis.docmatcher.DocumentMatcher(search: str, query: list[ParseResult], match_format: FormatPattern, matcher: MatcherCallable)[source]

A class that can be used to match documents to a query.

__call__(doc: Document) → Document | None[source]: Use the stored query to match the document.

search: str: Initial search string used for the matcher.

query: list[ParseResult]: The query resulting from parse_query().

match_format: FormatPattern: A format that is used to match a document against.

matcher: MatcherCallable: A callable used to match a document to the query using the match_format.

papis.docmatcher.make_document_matcher(search: str, *, matcher: MatcherCallable | None = None, match_format: AnyString | None = None) → Callable[[Document], Document | None][source]

Create a callable that can be used to match documents against the given search query.

>>> from papis.document import from_data
>>> doc = from_data({'title': 'einstein'})
>>> matcher = make_document_matcher('einste')
>>> matcher(doc) is not None
True
>>> matcher = make_document_matcher('heisenberg')
>>> matcher(doc) is not None
False
>>> matcher = make_document_matcher('title : ein')
>>> matcher(doc) is not None
True

Parameters:

matcher – a callable used to match the documents. This defaults to match_document().
match_format – a format used to match against the query. This defaults to match-format.

papis.docmatcher.get_regex_from_search(search: str) → Pattern[str][source]

Creates a default regex from a search string.

>>> get_regex_from_search(' ein 192     photon').pattern
'.*ein.*192.*photon.*'
>>> get_regex_from_search('{1234}').pattern
'.*\\{1234\\}.*'

Parameters:: search – a valid search string.
Returns:: a regular expression representing the search string, which is properly escaped and allows for multiple spaces.
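
A pattern of this shape can be built with the standard library as follows. This is a sketch that reproduces the examples above, not necessarily the actual implementation; the helper name `regex_from_search` and the case-insensitive flag are assumptions.

```python
import re


def regex_from_search(search: str) -> "re.Pattern[str]":
    """Escape each whitespace-separated word and join them with '.*'."""
    words = search.split()  # also collapses repeated spaces
    body = ".*".join(re.escape(word) for word in words)
    return re.compile(".*" + body + ".*", re.IGNORECASE)


print(regex_from_search(" ein 192     photon").pattern)
print(regex_from_search("e in").match("einstein") is not None)
```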

papis.docmatcher.parse_query(query_string: str) → list[ParseResult][source]

Parse a query string using pyparsing.

The query language implemented by this function for Papis supports strings of the form:

'hello author : Einstein    title: "Fancy Title: Part 1" tags'

which will result in:

results = [
    ParseResult(search="hello", pattern=<...>, doc_key=None),
    ParseResult(search="Einstein", pattern=<...>, doc_key="author"),
    ParseResult(search="Fancy Title: Part 1", pattern=<...>, doc_key="title"),
    ParseResult(search="tags", pattern=<...>, doc_key=None),
]

We can see there that constructs of the form "key:value", with the colon as a separator, are recognized and parsed into a document key and a search term. Colons can be escaped by enclosing the text in quotes. Otherwise, each individual word in the search query will give another ParseResult. Each search term can contain additional regex characters.

>>> print(parse_query('hello author : einstein'))
[['hello'], ['author', 'einstein']]
>>> print(parse_query(''))
[]
>>> print(parse_query('"hello world whatever :" tags : \'hello ::::\''))
[['hello world whatever :'], ['tags', 'hello ::::']]
>>> print(parse_query('hello'))
[['hello']]

Parameters:: query_string – a search string to parse into a structured format.
Returns:: a list of parsing results for each token in the query string.
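
The grammar described above can be approximated without pyparsing. The sketch below is a simplified stand-in that uses a regular-expression tokenizer instead of the real parser; `Term` and `parse_terms` are hypothetical names, and edge cases (such as nested quotes) are not handled.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Term(NamedTuple):
    search: str
    doc_key: Optional[str]


# A token is a quoted string, a lone colon, or a bare word.
_TOKEN = re.compile(r'"([^"]*)"|\'([^\']*)\'|(:)|([^\s:"\']+)')


def parse_terms(query: str) -> List[Term]:
    tokens: List[Tuple[str, bool]] = []  # (text, is_separator)
    for match in _TOKEN.finditer(query):
        double, single, colon, word = match.groups()
        if colon is not None:
            tokens.append((":", True))
        else:
            text = next(t for t in (double, single, word) if t is not None)
            tokens.append((text, False))

    terms: List[Term] = []
    i = 0
    while i < len(tokens):
        text, is_separator = tokens[i]
        if is_separator:
            i += 1  # stray colon: ignore it
            continue
        # "key : value" needs a separator followed by a non-separator token.
        has_value = (i + 2 < len(tokens)
                     and tokens[i + 1][1]
                     and not tokens[i + 2][1])
        if has_value:
            terms.append(Term(search=tokens[i + 2][0], doc_key=text))
            i += 3
        else:
            terms.append(Term(search=text, doc_key=None))
            i += 1
    return terms


for term in parse_terms('hello author : Einstein title: "Fancy Title: Part 1" tags'):
    print(term)
```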

`papis.document`

Module defining the main document type.

papis.document.DocumentLike: TypeAlias = 'Document | dict[str, Any]': A union of types that can be converted to a document.

class papis.document.KeyConversion[source]

A dict that contains a key and an action.

key: str | None: Name of a key in a foreign dictionary to convert.

action: Callable[[Any], Any] | None: Action to apply to the value at key for pre-processing.

papis.document.EmptyKeyConversion = {'action': None, 'key': None}: A default KeyConversion.

class papis.document.KeyConversionPair(from_key, rules)[source]

from_key: str: A string denoting the key in the input data.

rules: list[KeyConversion]: A list of KeyConversion key mapping rules used to rename and post-process the from_key and its value.

papis.document.keyconversion_to_data(conversions: Sequence[KeyConversionPair], data: dict[str, Any], keep_unknown_keys: bool = False) → dict[str, Any][source]

Function to convert between dictionaries.

This can be used to define a fixed set of translation rules between, e.g., JSON data obtained from a website API and standard papis key names and formatting. The implementation is completely generic.

For example, we have the simple dictionary:

data = {"id": "10.1103/physrevb.89.140501"}

which contains the DOI of a document with the wrong key. We can then write the following rules:

conversions = [
    KeyConversionPair("id", [
        {"key": "doi", "action": None},
        {"key": "url", "action": lambda x: "https://doi.org/{}".format(x)}
    ])
]

new_data = keyconversion_to_data(conversions, data)

to rename the "id" key to the standard "doi" key used by papis and to construct a "url" key from the same value. Any number of such rules can be written, depending on the complexity of the incoming data. Note that any errors raised on the application of the action will be silently ignored and the corresponding key will be skipped.

Parameters:

conversions – a sequence of KeyConversionPairs used to convert the data.
data – a dict to be converted according to conversions.
keep_unknown_keys – if True unknown keys from data are kept in the resulting dictionary. Otherwise, only keys from conversions are present.

Returns:

a new dict containing the entries from data converted according to conversions.
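
The conversion logic described above can be sketched in pure Python. This is a simplified stand-in, not the Papis implementation: the function `convert` is hypothetical, rules are given as plain (from_key, rules) tuples instead of KeyConversionPair, and the handling of unknown keys is an assumption.

```python
from typing import Any, Dict, List, Sequence, Tuple

Rule = Dict[str, Any]  # {"key": new name or None, "action": callable or None}


def convert(conversions: Sequence[Tuple[str, List[Rule]]],
            data: Dict[str, Any],
            keep_unknown_keys: bool = False) -> Dict[str, Any]:
    known = {from_key for from_key, _ in conversions}
    result: Dict[str, Any] = {}
    if keep_unknown_keys:
        # Keys that no rule mentions are copied over unchanged.
        result.update({k: v for k, v in data.items() if k not in known})

    for from_key, rules in conversions:
        if from_key not in data:
            continue
        for rule in rules:
            to_key = rule.get("key") or from_key
            action = rule.get("action")
            try:
                value = data[from_key]
                result[to_key] = action(value) if action else value
            except Exception:
                # Errors in an action are ignored and the key is skipped.
                continue
    return result


data = {"id": "10.1103/physrevb.89.140501", "note": "unrelated"}
conversions = [
    ("id", [
        {"key": "doi", "action": None},
        {"key": "url", "action": lambda x: "https://doi.org/{}".format(x)},
        {"key": "year", "action": int},  # fails on a DOI and is skipped
    ]),
]

print(convert(conversions, data))
```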

papis.document.author_list_to_author(data: dict[str, Any], separator: str | None = None, multiple_authors_format: AnyString | None = None) → str[source]

Convert a list of authors into a single author string.

This uses the multiple-authors-separator and the multiple-authors-format settings to construct the concatenated authors.

Parameters:: data – a dict that contains an "author_list" key to be converted into a single author string.

>>> author1 = {"given": "Some", "family": "Author"}
>>> author2 = {"given": "Other", "family": "Author"}
>>> author_list_to_author({"author_list": [author1, author2]})
'Author, Some and Author, Other'
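
The concatenation can be pictured as formatting each author and joining the results. The sketch below reproduces the example output with str.format(); the helper `join_authors` and its default separator and format are assumptions standing in for the two settings mentioned above.

```python
from typing import Dict, Sequence


def join_authors(author_list: Sequence[Dict[str, str]],
                 separator: str = " and ",
                 author_format: str = "{au[family]}, {au[given]}") -> str:
    """Format every author with author_format and join with separator."""
    return separator.join(author_format.format(au=author)
                          for author in author_list)


authors = [
    {"given": "Some", "family": "Author"},
    {"given": "Other", "family": "Author"},
]
print(join_authors(authors))
```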

papis.document.guess_authors_separator(authors: str) → str[source]

Attempt to determine the separator for various non-BibTeX author lists.

Parameters:: authors – author string to determine the separator for.
Returns:: a regex that can be used to split the authors string.

For example:

>>> s = "Sanger, F. and Nicklen, S. and Coulson, A. R."
>>> assert guess_authors_separator(s) == "and"
>>> s = "Fabian Sanger and Steven Nicklen and Alexander R. Coulson"
>>> assert guess_authors_separator(s) == "and"
>>> s = "Fabian Sanger, Steven Nicklen, Alexander R. Coulson"
>>> assert guess_authors_separator(s) == ","
>>> s = "Fabian Sanger, and Steven Nicklen, and Alexander R. Coulson"
>>> import re
>>> sep = guess_authors_separator(s)
>>> assert re.match(sep, ", and")
>>> s = "Dagobert Duck and von Beethoven, Ludwig and Ford, Jr., Henry"
>>> assert guess_authors_separator(s) == "and"
>>> s = "Turing, A. M."
>>> assert guess_authors_separator(s) == "and"

papis.document.split_author_name(author: str) → dict[str, Any][source]

Split an author name into a given and family name.

This uses bibtexparser.customization.splitname() to correctly split and determine the first and last names of an author in the list. Note that this is just a heuristic and can give incorrect results for certain author names.

Parameters:: author – a string containing an author name.
Returns:: a dict with the family and given name of the author.

papis.document.split_authors_name(authors: str | list[str], separator: str | None = None) → list[dict[str, Any]][source]

Convert list of authors to a fixed format.

Uses split_author_name() to construct the individual authors and the separator to split the authors in the list.

Parameters:

authors – a list of author names, where each entry can consist of multiple authors separated by separator.
separator – a separator for entries in authors that contain multiple authors. If None, a separator is guessed using guess_authors_separator().

class papis.document.DocHtmlEscaped(doc: Document)[source]

Small helper class to escape HTML elements in a document.

>>> DocHtmlEscaped(from_data({"title": '> >< int & "" "'}))['title']
'&gt; &gt;&lt; int &amp; &quot;&quot; &quot;'
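
The same escaping can be obtained with html.escape() from the standard library. The wrapper below is a hypothetical stand-in that works on a plain dict rather than a Document, and returning an empty string for missing keys is an assumption.

```python
import html
from typing import Any, Dict


class HtmlEscapedView:
    """Read-only view of a dict whose values are HTML-escaped on access."""

    def __init__(self, doc: Dict[str, Any]) -> None:
        self._doc = doc

    def __getitem__(self, key: str) -> str:
        # Missing keys give an empty string instead of raising KeyError.
        return html.escape(str(self._doc.get(key, "")))


view = HtmlEscapedView({"title": '> >< int & "" "'})
print(view["title"])
```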

class papis.document.Document(folder: str | None = None, data: dict[str, Any] | None = None)[source]

An abstract document in a papis library.

This class inherits from a standard dict and implements some additional functionality.

html_escape: A DocHtmlEscaped instance that can be used to escape keys in the document for use in HTML documents.

has(key: str) → bool[source]: Check if key is in the document.

copy() → Document[source]: Make a shallow copy of the Document.

set_folder(folder: str) → None[source]

Set the document’s main folder.

This also updates the location of the info file and other attributes. Note, however, that it will not load any data from the given folder even if it contains another info file (see from_folder() for this functionality).

Parameters:: folder – an absolute path to a new main folder for the document.

get_main_folder() → str | None[source]

Returns:: the root path in the filesystem where the document is stored, if any.

get_main_folder_name() → str | None[source]

Returns:: the folder name of the document, i.e. the basename of the path returned by get_main_folder().

get_info_file() → str[source]

Returns:: path to the info file, which can also be an empty string if no such file has been created.

get_files() → list[str][source]

Get the files linked to the document.

The files in a document are stored relative to its main folder. If no main folder is set on the document (see set_folder()), then this function will not return any files. To retrieve the relative file paths only, access doc["files"] directly.

Returns:: a list of absolute file paths in the document’s main folder, if any.
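
The relation between the relative entries in doc["files"] and the returned paths can be sketched as follows. The helper `absolute_files` is hypothetical and operates on a plain dict and an explicit folder argument.

```python
import os
from typing import Any, Dict, List, Optional


def absolute_files(doc: Dict[str, Any], main_folder: Optional[str]) -> List[str]:
    """Join the relative file names in doc["files"] with the main folder."""
    if main_folder is None:
        return []  # no main folder set: no files can be located
    files = doc.get("files", [])
    if isinstance(files, str):
        files = [files]
    return [os.path.join(main_folder, name) for name in files]


doc = {"title": "Hello World", "files": ["paper.pdf", "supplement.pdf"]}
print(absolute_files(doc, os.path.join(os.sep, "library", "hello-world")))
```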

get_notes() → list[str][source]

Get all notes linked to the document.

Returns:: a list of absolute file paths in the document’s main folder, if any, similar to get_files().

save() → None[source]: Saves the current document fields into the info file.

load() → None[source]: Load information from the info file.

papis.document.from_data(data: dict[str, Any]) → Document[source]

Construct a Document from a dictionary.

Parameters:: data – a dictionary to be made into a new document.

papis.document.from_folder(folder_path: str) → Document[source]

Construct a Document from a folder.

Parameters:: folder_path – absolute path to a valid papis folder.

papis.document.to_json(document: Document) → str[source]

Export the document to JSON.

Returns:: a JSON string corresponding to all the entries in the document.

papis.document.to_dict(document: Document) → dict[str, Any][source]

Convert a document back into a standard dict.

Returns:: a dict corresponding to all the entries in the document.

papis.document.dump(document: Document) → str[source]

Dump the document into a string.

The format of the string is not fixed and is meant to be used to display the document entries in a consistent way across papis.

Returns:: a string containing all the entries in the document.

>>> doc = from_data({'title': 'Hello World'})
>>> dump(doc)
'title: Hello World'

papis.document.delete(document: Document) → None[source]

Delete a document from the filesystem.

This function deletes the main folder of the document (recursively), but it does not delete the in-memory version of the document.

papis.document.describe(document: Document | dict[str, Any]) → str[source]

Returns:: a string description of the current document using document-description-format.

papis.document.move(document: Document, path: str) → None[source]

Move the document to a new main folder at path.

This supposes that the document exists in the location document.get_main_folder() and will change the folder in the input document as a result.

Parameters:: path – absolute path where the document should be moved to. This path is expected to not exist yet and will be created by this function.

>>> doc = from_data({'title': 'Hello World'})
>>> doc.set_folder('path/to/folder')
>>> import tempfile; newfolder = tempfile.mkdtemp()
>>> move(doc, newfolder)
Traceback (most recent call last):
...
FileExistsError: There is already...

papis.document.sort(docs: Sequence[Document], key: str, reverse: bool = False) → list[Document][source]

Sort a list of documents by the given key.

The sort is performed on the key with a priority given to the type of the value. If the key does not exist in the document, this is given the lowest priority and left at the end of the list.

Parameters:

docs – a sequence of documents.
key – a key in the documents by which to sort.
reverse – if True, the sorting is done in reverse order (descending instead of ascending).

Returns:

a list of documents sorted by key.
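
Sorting values of mixed types with missing keys last can be done with a tuple sort key, where the first element ranks the type. The sketch below covers the ascending case only; the function `sort_documents` is hypothetical and the particular ranking (numbers, then strings, then other values, then missing keys) is an assumption, not the order used by sort().

```python
from typing import Any, Dict, List, Sequence, Tuple


def sort_documents(docs: Sequence[Dict[str, Any]], key: str) -> List[Dict[str, Any]]:
    def sort_key(doc: Dict[str, Any]) -> Tuple[int, Any]:
        value = doc.get(key)
        if value is None:
            return (3, "")          # missing keys go last
        if isinstance(value, (int, float)):
            return (0, value)       # numbers first
        if isinstance(value, str):
            return (1, value)       # then strings
        return (2, str(value))      # then everything else, compared as text

    # Values are only compared with each other when their ranks are equal,
    # so different types never meet in a comparison.
    return sorted(docs, key=sort_key)


docs = [
    {"title": "no year"},
    {"title": "int year", "year": 2001},
    {"title": "str year", "year": "1999"},
    {"title": "early int year", "year": 1905},
]
print([doc["title"] for doc in sort_documents(docs, "year")])
```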

papis.document.new(folder_path: str, data: dict[str, Any], files: Sequence[str] | None = None) → Document[source]

Creates a complete document with data and existing files.

The document is saved to the filesystem at folder_path and all the given files are copied over to the main folder.

Parameters:

folder_path – a main folder for the document.
data – a dict with key and values to be used as metadata in the document.
files – a sequence of files to add to the document.

Raises:

FileExistsError – if folder_path already exists.

`papis.downloaders`

class papis.downloaders.WebImporter(uri: str = '')[source]

Importer that tries to get data and files from implemented downloaders.

This importer simply calls get_info_from_url() on the given URI.

classmethod match(uri: str) → Importer | None[source]

Check if the importer can process the given URI.

For example, an importer that supports links from arXiv can check that the given URI matches using:

re.match(r".*arxiv.org.*", uri)

This can then be used to instantiate and return a corresponding Importer object.

Parameters:: uri – A URI from which the document metadata should be retrieved.
Returns:: An importer instance if the match to the URI is successful or None otherwise.

fetch() → None[source]

Fetch metadata and files for the given uri.

This method calls fetch_data() and fetch_files() to get all the information available for the document. It is recommended to implement the two methods separately, if possible, for maximum flexibility.

The imported data is stored in ctx and it is not queried again on subsequent calls to this function.

fetch_data() → None[source]

Fetch metadata from the given uri.

The imported metadata is stored in ctx.

fetch_files() → None[source]

Fetch files from the given uri.

The imported files are stored in ctx.

class papis.downloaders.Downloader(uri: str = '', name: str = '', ctx: Context | None = None, expected_document_extension: str | Sequence[str] | None = None, cookies: dict[str, str] | None = None, priority: int = 1)[source]

A base class for downloader instances implementing common functionality.

In general, downloaders are expected to implement a subset of the methods below, depending on how general they are. A simple downloader could implement only get_bibtex_url() and get_document_url().

expected_document_extension: A single extension or a list of extensions supported by the downloader. The extensions do not contain the leading dot, e.g. ["pdf", "djvu"].

priority: A priority given to the downloader. This is used when trying to automatically determine a preferred downloader for a given URL.

session: A requests.Session that is used for all the requests made by the downloader.

classmethod match(url: str) → Downloader | None[source]

Check if the downloader can process the given URL.

For example, a downloader that supports links from arXiv can check that the given URL matches using:

re.match(r".*arxiv.org.*", url)

This can then be used to instantiate and return a corresponding Downloader object.

Parameters:: url – A URL where the document information should be retrieved from.
Returns:: A downloader instance if the match to the URL is successful or None otherwise.
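
The matching pattern described above can be sketched with a small stand-in class. `ArxivLikeDownloader` is hypothetical and does not inherit from Downloader, so that the example runs without Papis installed; a real downloader would subclass it and call its constructor.

```python
import re
from typing import Optional


class ArxivLikeDownloader:
    """Stand-in showing only the match() classmethod pattern."""

    def __init__(self, uri: str) -> None:
        self.uri = uri

    @classmethod
    def match(cls, url: str) -> Optional["ArxivLikeDownloader"]:
        # Return an instance when the URL looks supported, None otherwise.
        if re.match(r".*arxiv\.org.*", url):
            return cls(url)
        return None


down = ArxivLikeDownloader.match("https://arxiv.org/abs/1234.56789")
print(down.uri if down is not None else "no match")
print(ArxivLikeDownloader.match("https://example.com/paper"))
```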

fetch() → None[source]

Fetch metadata and files for the given uri.

This method calls Downloader.fetch_data() and Downloader.fetch_files() to get all the information available for the document. It is recommended to implement the two methods separately, if possible, for maximum flexibility.

The imported data is stored in ctx and it is not queried again on subsequent calls to this function.

fetch_data() → None[source]

Fetch metadata for the given URL.

The imported metadata is stored in ctx. To fetch the metadata, the following steps are followed

Call get_data() to import any scraped metadata.
Call get_bibtex_data() to import any metadata from BibTeX files available remotely.

Note that later steps overwrite information from earlier ones, i.e. the BibTeX data will take priority.
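
The effect of the two steps can be pictured as successive dictionary updates. This is a minimal sketch with made-up metadata; the function `merge_metadata` is hypothetical.

```python
from typing import Any, Dict


def merge_metadata(scraped: Dict[str, Any], bibtex: Dict[str, Any]) -> Dict[str, Any]:
    """Apply scraped metadata first, then let BibTeX data override it."""
    merged: Dict[str, Any] = {}
    merged.update(scraped)  # step 1: scraped metadata
    merged.update(bibtex)   # step 2: BibTeX data wins on shared keys
    return merged


scraped = {"title": "a scraped title", "url": "https://example.com/paper"}
bibtex = {"title": "A Proper Title", "year": "2020"}
print(merge_metadata(scraped, bibtex))
```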

fetch_files() → None[source]

Fetch files from the given uri.

The imported files are stored in ctx. The file is downloaded with download_document() and stored as a temporary file.

get_bibtex_url() → str | None[source]

Returns:: a URL to a valid BibTeX file that can be used to extract metadata about the document.

get_bibtex_data() → str | None[source]

Get BibTeX data available at get_bibtex_url(), if any.

Returns:: a string containing the BibTeX data, which can be parsed.

download_bibtex() → None[source]

Download and store the BibTeX data from get_bibtex_url().

Use get_bibtex_data() to access the metadata from the BibTeX URL.

get_data() → dict[str, Any][source]

Retrieve general metadata from the given URL.

This function is meant to be as general as possible and should not contain data imported from BibTeX (use get_bibtex_data() instead). For example, this can be used for web scraping or calling other website APIs to gather metadata about the document.

get_doi() → str | None[source]

Returns:: a DOI for the document, if any.

get_document_url() → str | None[source]

Returns:: a URL to a file that should be downloaded.

get_document_data() → bytes | None[source]

Get data for the downloaded file that is given by get_document_url().

Returns:: the bytes (stored in memory) for the downloaded file.

get_document_extension() → str[source]

Returns:: a guess for the extension of get_document_data(). This is based on filetype and uses magic file signatures to determine the type. If no guess is valid, an empty string is returned.

download_document() → None[source]

Download and store the file that is given by get_document_url().

Use get_document_data() to access the file binary contents.

check_document_format() → bool[source]

Check if the document downloaded by download_document() has a file type supported by the downloader.

If the downloader has no preferred type, then all files are accepted.

Returns:: True if the document has a supported file type and False otherwise.

papis.downloaders.get_available_downloaders() → list[type[Downloader]][source]: Get all declared downloader classes.

papis.downloaders.get_matching_downloaders(url: str) → list[Downloader][source]

Get downloaders matching the given url.

Parameters:: url – a URL to match.
Returns:: a list of downloaders (sorted by priority).

papis.downloaders.get_downloader_by_name(name: str) → type[Downloader][source]

Get a specific downloader by its name.

Parameters:: name – the name of the downloader. Note that this is the name of the entry point used to define the downloader. In general, this should be the same as its name, but this is not enforced.
Returns:: a downloader class.

papis.downloaders.get_info_from_url(url: str, expected_doc_format: str | None = None) → Context[source]

Get information directly from the given url.

Parameters:

url – the URL of a resource.
expected_doc_format – an expected document file type, that is used to override the file type defined by the chosen downloader.

papis.downloaders.download_document(url: str, expected_document_extension: str | None = None, cookies: dict[str, Any] | None = None, filename: str | None = None) → str | None[source]

Download a document from url and store it in a local file.

An appropriate filename is deduced from the HTTP response in most cases. If this is not possible, a temporary file is created instead. To ensure that the desired filename is chosen, provide the filename argument instead.

Parameters:

url – the URL of a remote file.
expected_document_extension – an expected file extension. If None, then an extension is guessed from the file contents or from the filename.
filename – a file name for the document, regardless of the given URL and extension.

Returns:

an absolute path to a local file containing the data from url.

`papis.exceptions`

This module implements custom exceptions used to make the code more readable.

exception papis.exceptions.UnexpectedSettingTypeError[source]: Exception raised when a configuration setting has an unexpected type.

exception papis.exceptions.DefaultSettingValueMissing(key: str)[source]: Exception raised when a configuration setting is missing and has no default value.

exception papis.exceptions.DocumentFolderNotFound(doc: str)[source]: Exception raised when a document has no main folder.

exception papis.exceptions.InvalidLibraryError[source]: Exception raised when a library is found to be invalid or in an invalid state.

exception papis.exceptions.MissingLibraryDirectoryError[source]: Exception raised when a library does not have ‘dir’ or ‘dirs’ set.

`papis.filetype`

class papis.filetype.DjVu[source]: Implements a custom DjVu type matcher for filetype.

papis.filetype.guess_content_extension(content: bytes) → str | None[source]

Guess the extension from (potential) file contents.

This method attempts to look at known file signatures to determine the file type. This is not always possible, as it is hard to determine a unique type.

Parameters:: content – contents of a file.
Returns:: an extension string (e.g. “pdf” without the dot) or None if the file type cannot be determined.
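
Signature-based detection can be sketched with two well-known magic numbers. This is a simplified illustration, not the detection performed by papis or the filetype package; `guess_extension_sketch` and `SIGNATURES` are hypothetical names.

```python
from __future__ import annotations

# Leading bytes ("magic numbers") for two common document formats.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"AT&TFORM": "djvu",
}


def guess_extension_sketch(content: bytes) -> str | None:
    # Hypothetical helper: compare the start of the content to known signatures.
    for magic, extension in SIGNATURES.items():
        if content.startswith(magic):
            return extension
    return None


pdf_guess = guess_extension_sketch(b"%PDF-1.7\n...")
unknown_guess = guess_extension_sketch(b"plain text without a signature")
```

Content without a recognised signature yields None, which matches the documented behaviour when the type cannot be determined.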

papis.filetype.guess_document_extension(document_path: str) → str | None[source]

Guess the extension of a given file at document_path.

Parameters:: document_path – path to an existing file.
Returns:: an extension string (e.g. “pdf” without the dot) or None if the file type cannot be determined.

papis.filetype.get_document_extension(document_path: str) → str[source]

Get an extension for the file at document_path.

This uses guess_document_extension() and returns a default extension “data” if no specific type can be determined from the file.

Parameters:: document_path – path to an existing file.
Returns:: an extension string.

`papis.format`

papis.strings.AnyString: A union of allowable formatting string types.

class papis.strings.FormatPattern(formatter: str | None, pattern: str)[source]

A tuple that defines a (formatter, string) pair.

In a configuration file, a format pattern can be defined as:

key = pattern
other_key.formatter = other_pattern

where the first key will use the default formatter and the second key will use the specified formatter. These keys can be read using papis.config.getformatpattern().
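
A rough sketch of how such keys could be read with the standard library follows. It does not use papis.config.getformatpattern(); `PatternSketch` and `read_pattern` are hypothetical names, and the section name and configuration contents are made up for illustration.

```python
from __future__ import annotations

import configparser
from typing import NamedTuple


class PatternSketch(NamedTuple):
    """Hypothetical stand-in for a (formatter, pattern) pair."""

    formatter: str | None
    pattern: str


def read_pattern(section: configparser.SectionProxy, key: str) -> PatternSketch | None:
    # A plain "key" uses the default formatter (formatter is None).
    if key in section:
        return PatternSketch(None, section[key])
    # Otherwise look for "key.<formatter>" and split off the formatter name.
    prefix = key + "."
    for option, value in section.items():
        if option.startswith(prefix):
            return PatternSketch(option[len(prefix):], value)
    return None


config = configparser.ConfigParser(interpolation=None)
config.read_string("""
[settings]
header-format = {doc[title]}
match-format.jinja2 = {{ doc.title }}
""")

plain = read_pattern(config["settings"], "header-format")
named = read_pattern(config["settings"], "match-format")
```

The first lookup returns a pair with no formatter, the second a pair whose formatter is taken from the key suffix.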

formatter: str | None: The formatter that should be used on the string pattern. If none is provided, the default formatter is used, as defined by formatter.

pattern: str: Pattern that should be evaluated by the formatter.

papis.format.FORMATTER_NAMESPACE_NAME = 'papis.format': Name of the entry point namespace for Formatter plugins.

exception papis.format.InvalidFormatterError[source]: Deprecated: Use papis.plugin.InvalidPluginTypeError instead.

exception papis.format.FormatFailedError[source]

An exception that is thrown when a format pattern fails to be interpolated.

This can happen due to lack of data (e.g. missing fields in the document) or invalid format patterns (e.g. passed to the wrong formatter).

class papis.format.Formatter[source]

A generic formatter that works on templated strings using a document.

name: ClassVar[str]: A name for the formatter.

format(fmt: str, doc: DocumentLike, doc_key: str = '', additional: dict[str, Any] | None = None, default: str | None = None) → str[source]

Parameters:

fmt – a format pattern understood by the formatter.
doc – an object convertible to a document.
doc_key – the name of the document in the format pattern. By default, this falls back to format-doc-name.
default – an optional pattern to use as a default value if the formatting fails. If no default is given, a FormatFailedError will be raised.
additional – a dict of additional entries to pass to the formatter.

Returns:

a string with all the replacement fields filled in.

papis.format.get_available_formatters() → list[str][source]: Get a list of all the available formatter plugins.

papis.format.get_formatter_by_name(name: str) → Formatter[source]

Initialize and return a formatter plugin.

Parameters:: name – the name of the desired formatter.

papis.format.get_cached_formatter(name: str | None = None) → Formatter[source]

A cached variant of get_formatter_by_name().

Parameters:: name – the name of the desired formatter, by default this uses the value of formatter.

papis.format.format(fmt: AnyString, doc: DocumentLike, doc_key: str = '', additional: dict[str, Any] | None = None, default: str | None = None) → str[source]

Format a string using the selected formatter.

This is the user-facing function that should be called when formatting a string. The formatters should not be called directly.

Arguments match those of Formatter.format().

class papis.format.python.PythonFormatter[source]

Construct a string using a PEP 3101 (str.format based) format pattern.

This formatter is named "python" and can be set using the formatter setting in the configuration file. The format pattern has access to the doc variable, that is always a Document. A pattern using this formatter can look like:

"{doc[year]} - {doc[author_list][0][family]} - {doc[title]}"

Note, however, that according to PEP 3101 some simple formatting is not possible. For example, the following is not allowed:

"{doc[title].lower()}"

and should be replaced with:

"{doc[title]!l}"

The following special conversions are implemented: “l” for str.lower(), “u” for str.upper(), “t” for str.title(), “c” for str.capitalize(), “y” that uses slugify (through papis.paths.normalize_path()). Additionally, the following syntax is available to select subsets from a string:

"{doc[title]:1.3S}"

which will select the words[1:3] from the title (words are split by single spaces).
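
The conversions and the word selection described above can be approximated by subclassing string.Formatter from the standard library. `ConversionSketch` is a hypothetical class written for illustration; it omits the "y" conversion and is not the PythonFormatter implementation.

```python
from __future__ import annotations

import string
from typing import Any


class ConversionSketch(string.Formatter):
    """Hypothetical formatter mimicking a few of the conversions above."""

    def convert_field(self, value: Any, conversion: str | None) -> Any:
        # Map the extra conversion characters onto str methods.
        if conversion == "l":
            return str(value).lower()
        if conversion == "u":
            return str(value).upper()
        if conversion == "t":
            return str(value).title()
        if conversion == "c":
            return str(value).capitalize()
        return super().convert_field(value, conversion)

    def format_field(self, value: Any, format_spec: str) -> str:
        # A spec such as "1.3S" selects words[1:3], splitting on single spaces.
        if format_spec.endswith("S"):
            start, _, stop = format_spec[:-1].partition(".")
            words = str(value).split(" ")
            first = int(start) if start else None
            last = int(stop) if stop else None
            return " ".join(words[first:last])
        return super().format_field(value, format_spec)


doc = {"title": "A Brief History of Time", "year": 1988,
       "author_list": [{"family": "Hawking"}]}
fmt = ConversionSketch()
lowered = fmt.format("{doc[title]!l}", doc=doc)
selected = fmt.format("{doc[title]:1.3S}", doc=doc)
plain = fmt.format("{doc[year]} - {doc[author_list][0][family]}", doc=doc)
```

With this document, `lowered` is the lowercased title, `selected` contains the second and third words of the title, and `plain` shows that nested lookups such as `doc[author_list][0][family]` need no special handling.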

name: ClassVar[str] = 'python': A name for the formatter.

format(fmt: str, doc: DocumentLike, doc_key: str = '', additional: dict[str, Any] | None = None, default: str | None = None) → str[source]

Parameters:

fmt – a format pattern understood by the formatter.
doc – an object convertible to a document.
doc_key – the name of the document in the format pattern. By default, this falls back to format-doc-name.
default – an optional pattern to use as a default value if the formatting fails. If no default is given, a FormatFailedError will be raised.
additional – a dict of additional entries to pass to the formatter.

Returns:

a string with all the replacement fields filled in.

class papis.format.jinja.Jinja2Formatter[source]

Construct a string using Jinja2 templates.

This formatter is named "jinja2" and can be set using the formatter setting in the configuration file. The format pattern has access to the doc variable, that is always a Document. A pattern using this formatter can look like:

"{{ doc.year }} - {{ doc.author_list[0].family }} - {{ doc.title }}"

This formatter supports the whole range of Jinja2 control structures and filters so more advanced string processing is possible. For example, we can titlecase the title using:

"{{ doc.title | title }}"

or give a default value if a key is missing in the document using:

"{{ doc.isbn | default('ISBN-NONE', true) }}"

name: ClassVar[str] = 'jinja2': A name for the formatter.

env: ClassVar[Any] = None: The jinja2 Environment used by the formatter. This should be obtained with get_environment() (cached) and modified as required (e.g. by adding filters).

classmethod get_environment(*, force: bool = False) → Any[source]

Construct and cache the jinja2 environment used by the formatter.

The environment is created on the first call to format() and cached for future use. If it should be recreated after that, this function can be called with force set to True.

Parameters:: force – if True, the environment will be recreated.

format(fmt: str, doc: DocumentLike, doc_key: str = '', additional: dict[str, Any] | None = None, default: str | None = None) → str[source]

Parameters:

fmt – a format pattern understood by the formatter.
doc – an object convertible to a document.
doc_key – the name of the document in the format pattern. By default, this falls back to format-doc-name.
default – an optional pattern to use as a default value if the formatting fails. If no default is given, a FormatFailedError will be raised.
additional – a dict of additional entries to pass to the formatter.

Returns:

a string with all the replacement fields filled in.

`papis.git`

This module serves as a lightweight interface for git-related functions.

papis.git.init(path: str) → None[source]: Initialize a git repository at path.

papis.git.add(path: str, resource: str) → None[source]

Adds the changes to resource in the repository at path to the git index.

Parameters:

path – a folder with an existing git repository.
resource – a resource (e.g. info.yaml file) to add to the index.

papis.git.commit(path: str, message: str) → None[source]

Commits changes in the path with a message.

Parameters:

path – a folder with an existing git repository.
message – a commit message.

papis.git.mv(from_path: str, to_path: str) → None[source]

Renames (moves) the path from_path to to_path.

Parameters:

from_path – path to be moved (the source).
to_path – destination where from_path is moved. If this is in the same parent directory as from_path, it is a simple rename.

papis.git.remove(path: str, resource: str, recursive: bool = False, force: bool = True) → None[source]

Remove a resource from the git repository at path.

Parameters:

path – a folder with an existing git repository.
resource – a resource (e.g. info.yaml file) to remove from git.
recursive – if True, the given resource is removed recursively.
force – if True, the removal is forced so any errors (e.g. file does not exist) are silently ignored.

papis.git.add_and_commit_resource(path: str, resource: str, message: str) → None[source]

Adds and commits a single resource.

Parameters:

path – a folder with an existing git repository.
resource – a resource (e.g. info.yaml file) to add and commit.
message – a commit message.
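
As a rough picture of what adding and committing a resource amounts to, the sketch below builds the corresponding standard git command lines. How papis actually invokes git is not described here, so `add_and_commit_commands` and `run_in_repository` are hypothetical helpers.

```python
from __future__ import annotations

import subprocess


def add_and_commit_commands(resource: str, message: str) -> list[list[str]]:
    # Hypothetical helper: the two git invocations, as argument lists.
    return [
        ["git", "add", resource],
        ["git", "commit", "-m", message],
    ]


def run_in_repository(path: str, commands: list[list[str]]) -> None:
    # Run each command inside the repository folder, failing on errors.
    for command in commands:
        subprocess.run(command, cwd=path, check=True)


commands = add_and_commit_commands("info.yaml", "Update metadata")
```

Calling `run_in_repository(path, commands)` on an existing repository would stage and commit the file; it is left uncalled here so the example has no side effects.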

papis.git.mv_and_commit_resource(from_path: str, to_path: str, message: str) → None[source]

Moves from_path and commits the change.

Parameters:

from_path – path to be moved (the source).
to_path – destination where from_path is moved.
message – a commit message.

papis.git.add_and_commit_resources(path: str, resources: Sequence[str], message: str) → None[source]

Add and commit multiple resources (see add_and_commit_resource()).

Note that a single commit message is generated for all the resources.

`papis.hooks`

papis.hooks.HOOKS_EXTENSION_FORMAT = 'papis.hook.{name}': Name format of the entrypoint group for hooks, e.g. papis.hook.on_edit_done.

papis.hooks.CUSTOM_LOCAL_HOOKS: dict[str, list[Callable[..., None]]] = {}: A dictionary of hooks added with add(). These can be added in config.py or from other places that do not use the entrypoint framework.

papis.hooks.run(name: str, *args: Any, **kwargs: Any) → None[source]

Run a hook given by its name.

Additional positional and keyword arguments are passed directly to the hook. If it does not support these arguments, the hook will be skipped.

Hooks are run in the following order:

The hooks defined by an entry point.
The hooks defined in CUSTOM_LOCAL_HOOKS.

papis.hooks.add(name: str, fun: Callable[..., None]) → None[source]

Add an additional callback to the hook given by name.

Any new callbacks are appended to the list and will be applied after existing ones.
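
The ordering and skipping behaviour of locally registered hooks can be sketched as follows. The registry and the functions `add_hook` and `run_hooks` are hypothetical; entry point hooks are left out, and the signature check is only one possible way to skip callbacks that do not support the given arguments.

```python
from __future__ import annotations

import inspect
from typing import Any, Callable

# Hypothetical local registry, mirroring the role of CUSTOM_LOCAL_HOOKS.
LOCAL_HOOKS: dict[str, list[Callable[..., None]]] = {}


def add_hook(name: str, fun: Callable[..., None]) -> None:
    # New callbacks are appended, so they run after the existing ones.
    LOCAL_HOOKS.setdefault(name, []).append(fun)


def run_hooks(name: str, *args: Any, **kwargs: Any) -> None:
    for fun in LOCAL_HOOKS.get(name, []):
        try:
            # Skip callbacks whose signature cannot accept the arguments.
            inspect.signature(fun).bind(*args, **kwargs)
        except TypeError:
            continue
        fun(*args, **kwargs)


calls: list[str] = []
add_hook("on_edit_done", lambda: calls.append("first"))
add_hook("on_edit_done", lambda doc: calls.append("needs a doc"))
add_hook("on_edit_done", lambda: calls.append("second"))
run_hooks("on_edit_done")
```

After the call, `calls` records the two compatible callbacks in registration order, while the callback that requires a `doc` argument is skipped.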

`papis.id`

papis.id.ID_KEY_NAME: str = 'papis_id': Key name used to store the Papis ID. This key name is reserved for use in Papis databases and documents. It can also change in the future, so it is recommended to use this variable instead of hardcoding the name.

papis.id.compute_an_id(doc: Document, separator: str | None = None) → str[source]

Make an ID for the input document doc.

This is a non-deterministic function if separator is None (a random value is used). For a given value of separator, the result is deterministic.

Parameters:

doc – a document for which to generate an ID.
separator – a string used to separate the document fields that go into constructing the ID.

Returns:

a (hexadecimal) ID for the document that is unique to high probability.
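
The role of separator can be illustrated with a small hashing sketch. The choice of hash function and of the fields that are hashed are assumptions made for this example; `compute_id_sketch` is a hypothetical helper and does not reproduce the IDs generated by papis.

```python
from __future__ import annotations

import hashlib
import random
from typing import Any


def compute_id_sketch(doc: dict[str, Any], separator: str | None = None) -> str:
    # Hypothetical helper: a random separator makes the result non-deterministic.
    if separator is None:
        separator = str(random.random())
    # Join the document fields with the separator and hash the result.
    fields = separator.join(f"{key}={value}" for key, value in sorted(doc.items()))
    return hashlib.sha256(fields.encode("utf-8")).hexdigest()


doc = {"title": "A Brief History of Time", "year": 1988}
first = compute_id_sketch(doc, separator="|")
second = compute_id_sketch(doc, separator="|")
```

With a fixed separator, `first` and `second` are identical; leaving separator as None produces a different hexadecimal string on each call.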

papis.id.key_name() → str[source]: Get Papis ID key name.

papis.id.get(doc: DocumentLike) → str[source]

Get the Papis ID from doc.

This function does additional checking on the ID and can raise an error if it does not exist. If the ID is known to exist, use ID_KEY_NAME directly.

`papis.importer`

papis.importer.IMPORTER_NAMESPACE_NAME = 'papis.importer': Name of the entry point namespace for Importer plugins.

class papis.importer.Context[source]

data: A dict of fields retrieved by the Importer. These are generally not processed.

files: A list of files retrieved by the Importer.

class papis.importer.Importer(uri: str = '', name: str = '', ctx: Context | None = None)[source]

A base class for Papis importer plugins.

name: str: A name given to the importer (that is not necessarily unique).

uri: str: The URI (Uniform Resource Identifier) that the importer is to extract data from. This can be an URL, a local or remote file name, an object identifier (e.g. DOI), etc.

ctx: Context: A Context that stores the data retrieved by the importer.

classmethod match(uri: str) → Importer | None[source]

Check if the importer can process the given URI.

For example, an importer that supports links from arXiv can check that the given URI matches using:

re.match(r".*arxiv.org.*", uri)

This can then be used to instantiate and return a corresponding Importer object.

Parameters:: uri – A URI from which the document metadata should be retrieved.
Returns:: An importer instance if the match to the URI is successful or None otherwise.

classmethod match_data(data: dict[str, Any]) → Importer | None[source]

Check if the importer can process the given metadata.

This method can be used to search for valid URIs inside the data that can then be processed by the importer. For example, if the metadata contains a DOI field, this can be used to import additional information.

Parameters:: data – A dict with metadata to inspect and match against.
Returns:: An importer instance if matching metadata is found or None otherwise.
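
The DOI case mentioned above can be sketched as follows. `DoiLikeImporter` is a hypothetical stand-in that does not subclass the real Importer, and the `doi` key name is an assumption about how the metadata is laid out.

```python
from __future__ import annotations

from typing import Any


class DoiLikeImporter:
    """Hypothetical stand-in for an Importer subclass (not part of papis)."""

    def __init__(self, uri: str = "", name: str = "doi-like") -> None:
        self.uri = uri
        self.name = name

    @classmethod
    def match_data(cls, data: dict[str, Any]) -> DoiLikeImporter | None:
        # Use the DOI stored in the metadata, if any, as the URI to import from.
        doi = data.get("doi")
        if isinstance(doi, str) and doi:
            return cls(uri=doi)
        return None


with_doi = DoiLikeImporter.match_data({"title": "Some paper", "doi": "10.1000/182"})
without_doi = DoiLikeImporter.match_data({"title": "Some paper"})
```

Metadata without a usable DOI yields None, so the caller can move on to the next importer.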

fetch() → None[source]

Fetch metadata and files for the given uri.

This method calls fetch_data() and fetch_files() to get all the information available for the document. It is recommended to implement the two methods separately, if possible, for maximum flexibility.

The imported data is stored in ctx and it is not queried again on subsequent calls to this function.

fetch_data() → None[source]

Fetch metadata from the given uri.

The imported metadata is stored in ctx.

fetch_files() → None[source]

Fetch files from the given uri.

The imported files are stored in ctx.

papis.importer.get_available_importers() → list[str][source]: Get a list of available importer names.

papis.importer.get_importer_by_name(name: str) → type[Importer][source]: Get an importer class by name.

papis.importer.get_matching_importers_by_name(name_and_uris: Iterable[tuple[str, str]], *, include_downloaders: bool = False) → list[Importer][source]

Get importers that match the given names.

This function tries to match the URI using match() for each importer in name_and_uris. All matching importers are then returned, but no data is fetched (see fetch_importers()).

Parameters:

name_and_uris – an iterable of (name, uri) tuples that describe the importer names and URIs to match them against.
include_downloaders – if True, downloader plugins are also included when matching the given names and URIs.

papis.importer.get_matching_importers_by_uri(uri: str, *, include_downloaders: bool = False) → list[Importer][source]

Get importers that match the given URI.

This function tries to match the URI using match() for all known importers. All matching importers are then returned, but no data is fetched (see fetch_importers()).

Parameters:: include_downloaders – if True, downloader plugins are also included when matching the given URI.

papis.importer.get_matching_importers_by_doc(doc: DocumentLike, *, include_downloaders: bool = False) → list[Importer][source]

Get importers that match the given document.

This function tries to match the document using match_data(). All matching importers are then returned, but no data is fetched (see fetch_importers()).

Parameters:

doc – a dictionary containing document metadata.
include_downloaders – if True, downloader plugins are also included when matching the given document.

papis.importer.fetch_importers(importers: Iterable[Importer], *, download_files: bool = True) → list[Importer][source]

Fetch data from the given importers.

Parameters:: download_files – if True, importers also try to download files (PDFs, etc.) instead of just metadata.
Returns:: a list of importers that have not failed to fetch their metadata.

papis.importer.collect_from_importers(importers: Iterable[Importer], *, batch: bool = True, use_files: bool = True) → Context[source]

Collect all data from the given importers.

It is assumed that the importers have called the needed fetch methods, so all data has been downloaded and converted (see fetch_importers()). This function is meant to only do the aggregation.

Parameters:

batch – if True, overwrite data from previous importers, otherwise ask the user to manually merge. Note that files are always kept, even if they potentially contain duplicates.
use_files – if True, both metadata and files are collected from the importers.

papis.importer.get_importers() → list[type[Importer]][source]: Get a list of available importer classes.

`papis.library`

class papis.library.Library(name: str, paths: Sequence[str])[source]

A class containing library information.

name: str: The name of the library, as it appears in the configuration file if defined there.

paths: list[str]: A list of paths with documents that form the library.

path_format() → str[source]

Returns:: a string containing all the paths in the library concatenated using a colon.

papis.library.from_paths(paths: Sequence[str]) → Library[source]: Create a library from a list of paths.

`papis.logging`

class papis.logging.ColoramaFormatter(log_format: str, full_tb: bool = False)[source]

A custom logging formatter that uses colorama.

full_tb: bool: A flag to denote whether a full traceback should be displayed when used with logger.info(..., exc_info=ext).

formatException(exc_info: tuple[Any, ...]) → str[source]

Format and return the specified exception information as a string.

If full_tb is True, then the full traceback is shown. Otherwise, a short inline description is given.

format(record: LogRecord) → str[source]

Format the specified record as text.

This adds color coding to the logging levels, includes the exception into the message, removes the papis namespace from the name, etc. Any formatting of the logging output is made here.

papis.logging.quiet(name: str, level: int = 30) → Iterator[None][source]: Temporarily sets the logging in the given module to WARNING.
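
A context manager with this behaviour can be sketched using only the standard logging module. `quiet_sketch` is a hypothetical re-implementation for illustration; whether the real function restores the previous level in exactly this way is an assumption.

```python
import contextlib
import logging
from typing import Iterator


@contextlib.contextmanager
def quiet_sketch(name: str, level: int = logging.WARNING) -> Iterator[None]:
    # Hypothetical helper: raise the level, then restore the previous one.
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield
    finally:
        logger.setLevel(previous)


noisy = logging.getLogger("noisy.module")
noisy.setLevel(logging.DEBUG)
with quiet_sketch("noisy.module"):
    level_inside = noisy.level
level_after = noisy.level
```

Inside the `with` block the logger only reports warnings and above; on exit the previous level is restored even if an exception was raised.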

Set up formatting and handlers for the root level Papis logger.

Parameters:

level – default logging level (see logging). By default, this takes values from the PAPIS_LOG_LEVEL environment variable and falls back to "INFO".
color – flag to control logging colors. It should be one of ("always", "auto", "no"). By default, this takes values from the PAPIS_LOG_COLOR environment variable and falls back to "auto".
logfile – a path for a file in which to write log messages. By default, this takes values from the PAPIS_LOG_FILE environment variable and falls back to None.
verbose – make logger verbose (including debug information) regardless of the level. By default, this takes values from the PAPIS_DEBUG environment variable and falls back to False.

Reset the root level Papis logger.

This function removes all the custom handlers and resets the logger before calling setup().

papis.logging.get_logger(name: str | None = None) → Logger[source]

Get a logger instance for the given name under the papis namespace.

Parameters:: name – the provisional name of the logger instance.
Returns:: a logging.Logger under the papis namespace, i.e. with a name such as papis.<name>.

`papis.notes`

This module controls the notes for every Papis document.

papis.notes.has_notes(doc: Document) → bool[source]: Checks if the document has notes.

papis.notes.notes_path(doc: Document) → str[source]

Get the path to the notes file corresponding to doc.

If the document does not have attached notes, a filename is constructed (using the notes-name setting) in the document’s main folder.

Returns:: an absolute filename that corresponds to the attached notes for doc (this file does not necessarily exist).

papis.notes.notes_path_ensured(doc: Document) → str[source]

Get the path to the notes file corresponding to doc or create it if it does not exist.

If the notes do not exist, a new file is created using notes_path() and filled with the contents of the template given by the notes-template configuration option.

Returns:: an absolute filename that corresponds to the attached notes for doc.

`papis.paths`

papis.paths.PathLike: TypeAlias = pathlib.Path | str: A union type for allowable paths.

papis.paths.unique_suffixes(chars: str | None = None, skip: int = 0) → Iterator[str][source]

Creates an infinite list of suffixes based on chars.

This creates a generator object capable of iterating over lists to create unique products of increasing cardinality. This is mainly intended to create suffixes for existing strings, e.g. file names, to ensure uniqueness.

Parameters:

chars – list to iterate over.
skip – number of suffixes to skip (negative integers are set to 0).

>>> import string
>>> s = unique_suffixes(string.ascii_lowercase)
>>> next(s)
'a'
>>> s = unique_suffixes(skip=3)
>>> next(s)
'd'
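
One way to produce such products of increasing cardinality is with itertools. The sketch below is a hypothetical re-implementation, `unique_suffixes_sketch`, and assumes that the default character set is the lowercase ASCII alphabet, which is consistent with the example above.

```python
from __future__ import annotations

import itertools
import string
from typing import Iterator


def unique_suffixes_sketch(chars: str | None = None, skip: int = 0) -> Iterator[str]:
    # Hypothetical re-implementation; defaults to lowercase ASCII letters.
    if chars is None:
        chars = string.ascii_lowercase

    def generate() -> Iterator[str]:
        # Yield all products of length 1, then length 2, and so on.
        for length in itertools.count(1):
            for letters in itertools.product(chars, repeat=length):
                yield "".join(letters)

    # Negative skips are treated as 0.
    return itertools.islice(generate(), max(skip, 0), None)


first = next(unique_suffixes_sketch())
fourth = next(unique_suffixes_sketch(skip=3))
twenty_seventh = next(unique_suffixes_sketch(skip=26))
```

Once the single-character suffixes are exhausted, the sketch continues with two-character suffixes, so the sequence never runs out.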

papis.paths.normalize_path(path: str, *, lowercase: bool | None = None, extra_chars: str | None = None, separator: str | None = None) → str[source]

Clean a path to only contain visible ASCII characters.

This function will create ASCII strings that can be safely used as file names or printed to consoles that do not necessarily support full unicode.

Parameters:

lowercase – if True, the resulting string will always be lowercased (defaults to doc-paths-lowercase).
extra_chars – extra characters that are allowed in the output path besides the default ASCII alphanumeric characters (defaults to doc-paths-extra-chars).
separator – word separator used to replace any non-allowed characters in the path (defaults to doc-paths-word-separator).

Returns:

a cleaned ASCII string.
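
The general idea can be sketched with the standard library alone. `normalize_sketch` is a hypothetical helper with hard-coded defaults; the real function reads its defaults from the configuration and may handle characters differently.

```python
import re
import unicodedata


def normalize_sketch(path: str, *, lowercase: bool = True,
                     extra_chars: str = "", separator: str = "-") -> str:
    # Hypothetical helper: decompose accents and drop non-ASCII characters.
    ascii_only = (
        unicodedata.normalize("NFKD", path).encode("ascii", "ignore").decode("ascii")
    )
    if lowercase:
        ascii_only = ascii_only.lower()
    # Replace each run of disallowed characters with the separator.
    disallowed = "[^A-Za-z0-9" + re.escape(extra_chars) + "]+"
    return re.sub(disallowed, separator, ascii_only).strip(separator)


slug = normalize_sketch("Café del Mar: Vol. 1")
kept = normalize_sketch("Café del Mar: Vol. 1", extra_chars=".")
```

Passing `extra_chars="."` keeps the dot in the output, while every other run of non-alphanumeric characters collapses into a single separator.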

papis.paths.is_relative_to(path: Path | str, other: Path | str) → bool[source]

Check if paths are relative to each other.

This is equivalent to pathlib.PurePath.is_relative_to().

Returns:: True if path is relative to the other path.

papis.paths.symlink(src: Path | str, dst: Path | str) → None[source]

Create a symbolic link pointing to src named dst.

This is a simple wrapper around os.symlink() that attempts to give better error messages on different platforms. For example, it offers suggestions for some missing privilege issues.

Parameters:

src – the existing file that dst points to.
dst – the name of the new symbolic link, pointing to src.

papis.paths.get_document_file_name(doc: DocumentLike, orig_path: PathLike, suffix: str = '', *, file_name_format: AnyString | Literal[False] | None = None, base_name_limit: int = 150) → str[source]

Generate a file name based on orig_path for the document doc.

This function will generate a file name for the given file path (that does not necessarily exist) based on the document data. If the document data does not provide the necessary keys for file_name_format, then the original path will be preserved.

The resulting path will have the same extension as orig_path and will be modified by normalize_path(). The extension is determined using get_document_extension().

Parameters:

orig_path – an input file path
suffix – a suffix to be appended to the end of the new file name.
file_name_format – a format pattern used to construct a new file name from the document data (see papis.format.format()). This value defaults to add-file-name if not provided.
base_name_limit – a maximum character length of the file name. This is important on operating systems or filesystems that do not support long file names.

Returns:

a new path based on the document data and the orig_path.

papis.paths.get_document_folder(doc: DocumentLike, dirname: PathLike, *, folder_name_format: AnyString | None = None) → str[source]

Generate a folder name for the document at dirname.

This function uses add-folder-name to generate a folder name for the doc at dirname. If no folder can be constructed from the format, then the document’s papis_id is used instead as a subfolder of dirname. The papis_id is guaranteed to be unique.

Parameters:

doc – the document used on the folder_name_format.
dirname – the base directory in which to generate the document main folder.
folder_name_format – a format to use for the folder name that will be filled in using the given doc. If no format is given, we default to add-folder-name. This format can have additional subfolders.

Returns:

a folder name for doc with the root at dirname.

papis.paths.get_document_unique_folder(doc: DocumentLike, dirname: PathLike, *, folder_name_format: AnyString | None = None) → str[source]

A wrapper around get_document_folder() that ensures that the folder is unique by adding suffixes.

Returns:: a folder name for doc with the root at dirname that does not yet exist on the filesystem.
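
Ensuring uniqueness by adding suffixes can be sketched as follows. `unique_folder_sketch` is a hypothetical helper; the "-a", "-b" suffix style is an assumption, and only single-letter suffixes are tried to keep the example short.

```python
import os
import string
import tempfile


def unique_folder_sketch(dirname: str, folder_name: str) -> str:
    # Hypothetical helper: return the plain name if it is still unused.
    candidate = os.path.join(dirname, folder_name)
    if not os.path.exists(candidate):
        return candidate
    # Otherwise try "-a", "-b", ... until an unused path is found.
    for suffix in string.ascii_lowercase:
        candidate = os.path.join(dirname, f"{folder_name}-{suffix}")
        if not os.path.exists(candidate):
            return candidate
    raise FileExistsError(f"no unused folder name found for {folder_name!r}")


with tempfile.TemporaryDirectory() as library:
    os.mkdir(os.path.join(library, "hawking-1988"))
    fresh = os.path.basename(unique_folder_sketch(library, "hawking-1988"))
    unused = os.path.basename(unique_folder_sketch(library, "sagan-1980"))
```

The existing folder name receives a suffix, while a name that is not yet taken is returned unchanged.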

papis.paths.download_remote_files(in_document_paths: Iterable[str]) → list[str | None][source]

Download all remote filepaths that are provided in the document list.

Parameters:: in_document_paths – a list of filename paths and URLs.
Returns:: a list of files, where each remote file is replaced with a temporary local file. If there is an error while downloading the remote file, None is used instead.

papis.paths.rename_document_files(doc: DocumentLike, in_document_paths: Iterable[str], *, allow_remote: bool | None = None, file_name_format: AnyString | Literal[False] | None = None) → list[str][source]

Rename in_document_paths according to file_name_format and ensure uniqueness.

Uniqueness is required with respect to the files in in_document_paths and those in the doc itself (under the files key). If a repeated file name is found, a suffix is generated using unique_suffixes() and appended to the new file.

Parameters:

file_name_format – a format pattern used to construct a new file name from the document data (see papis.format.format()). This value defaults to add-file-name if not provided.
allow_remote – if True, in_document_paths can also contain remote URLs, which will be downloaded to local files.

Returns:

a list of modified file names from in_document_paths that are renamed based on file_name_format and suffixed for uniqueness.

`papis.pick`

papis.pick.PICKER_NAMESPACE_NAME = 'papis.picker': Name of the entry point namespace for Picker plugins.

class papis.pick.T

Invariant TypeVar with no bounds.

alias of TypeVar(‘T’)

class papis.pick.Picker[source]

An interface used to select items from a list.

abstractmethod __call__(items: Sequence[T], header_filter: Callable[[T], str], match_filter: Callable[[T], str], default_index: int = 0) → list[T][source]

Parameters:

items – a sequence of items from which to pick a subset.
header_filter – a callable that takes an item from items and returns a string representation shown to the user.
match_filter – a callable that takes an item from items and returns a string representation that is used when searching or filtering the items.
default_index – sets the selected item when the picker is first shown to the user.

Returns:

a subset of items that were picked.
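
To make the calling convention concrete, here is a sketch of a non-interactive picker with the same call signature. `FirstMatchPicker` is a hypothetical class that does not subclass the real Picker, and its query-based selection is invented for this example.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class FirstMatchPicker:
    """Hypothetical non-interactive picker (not part of papis)."""

    def __init__(self, query: str) -> None:
        self.query = query.lower()

    def __call__(self, items: Sequence[T],
                 header_filter: Callable[[T], str],
                 match_filter: Callable[[T], str],
                 default_index: int = 0) -> list[T]:
        # header_filter is what an interactive picker would display;
        # this sketch only searches, so it relies on match_filter.
        matches = [item for item in items if self.query in match_filter(item).lower()]
        if matches:
            return matches[:1]
        # Fall back to the default item when nothing matches the query.
        return list(items[default_index:default_index + 1])


docs = [{"title": "Cosmos"}, {"title": "A Brief History of Time"}]
picker = FirstMatchPicker("history")
picked = picker(docs, header_filter=str, match_filter=lambda d: d["title"])
fallback = FirstMatchPicker("nothing")(docs, str, str, default_index=0)
```

An interactive picker would show the header_filter strings to the user instead, but the inputs and the returned subset have the same shape.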

papis.pick.get_available_pickers() → list[str][source]: Gets all registered pickers.

papis.pick.get_picker_by_name(name: str) → type[Picker[Any]][source]

Get a picker by its plugin name.

Parameters:: name – the name of an entrypoint to load a Picker plugin from.
Returns:: a Picker subclass implemented in the plugin.

papis.pick.pick(items: Sequence[T], header_filter: Callable[[T], str] = <class 'str'>, match_filter: Callable[[T], str] = <class 'str'>, default_index: int = 0, *, picktool: str | None = None) → list[T][source]

Load a Picker plugin and select a subset of items.

The arguments to this function match those of Picker.__call__(). The specific picker is chosen through the picktool configuration option.

Returns:: a subset of items that were picked.

papis.pick.pick_doc(documents: Sequence[Document], *, header_format_file: str | None = None, header_format: AnyString | None = None, match_format: AnyString | None = None) → list[Document][source]

Pick from a sequence of documents using pick().

This function uses the header-format-file setting or, if not available, the header-format setting to construct a header_filter for the picker. It also uses the configuration setting match-format to construct a match_filter. These configuration settings can also be passed by argument.

Parameters:: documents – a sequence of documents.
Returns:: a subset of documents that was picked.

papis.pick.pick_subfolder_from_lib(libname: str) → list[str][source]

Pick subfolders from all existing subfolders in lib.

Note that this includes document folders in lib as well as nested library folders.

Parameters:: libname – the name of an existing library to search in.
Returns:: a subset of the subfolders in the library.

papis.pick.pick_library(libs: list[str] | None = None, *, header_format: AnyString | None = None) → list[str][source]

Pick a library from the current configuration.

Parameters:: libs – a list of libraries to pick from.

`papis.plugin`

exception papis.plugin.PluginError[source]: A generic error raised by the plugin loader.

exception papis.plugin.PluginNotFoundError(namespace: str, name: str)[source]: An error raised when a plugin is not found.

exception papis.plugin.InvalidPluginTypeError(namespace: str, name: str)[source]: An error raised when the plugin is not the expected type.

papis.plugin.get_entrypoints(namespace: str) → list[EntryPoint][source]

Returns:: a list of available entrypoints in the given namespace.

papis.plugin.get_entrypoint_by_name(namespace: str, name: str) → EntryPoint | None[source]

Get the entrypoint called name from the given namespace.

If no such entrypoint exists, then None is returned. To load the plugin defined by the entrypoint, use EntryPoint.load().

papis.plugin.get_plugin_names(namespace: str) → list[str][source]

Returns:: a list of available entrypoint names in the given namespace.

papis.plugin.get_plugins(namespace: str) → dict[str, Any][source]: Load all available plugins from namespace.

papis.plugin.get_plugin_by_name(namespace: str, name: str) → Any[source]: Load a single plugin named name from namespace.
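
For orientation, entry point discovery of this kind can be sketched with the standard library alone. The helper names list_entrypoints and load_plugins are hypothetical, and this is not necessarily how papis.plugin is implemented.

```python
from importlib import metadata
from typing import Any, Dict, List


def list_entrypoints(namespace: str) -> List[metadata.EntryPoint]:
    """Return the installed entry points registered under namespace."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns an object with a select() method.
        return list(eps.select(group=namespace))
    # Older versions return a plain mapping from group name to entry points.
    return list(eps.get(namespace, []))


def load_plugins(namespace: str) -> Dict[str, Any]:
    """Load every entry point in namespace, keyed by entry point name."""
    return {ep.name: ep.load() for ep in list_entrypoints(namespace)}
```

Listing entry points is cheap, while load() imports the plugin module, so it can help to keep the two steps separate as the functions above do.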

`papis.sphinx_ext`

A collection of Papis-specific Sphinx extensions.

This can be included directly into the conf.py file as a normal extension, i.e.:

extensions = [
    ...,
    "papis.sphinx_ext",
]

It includes a CustomClickDirective for documenting papis commands and a PapisConfig directive for documenting Papis configuration values.

These are included by default when adding it to the extensions list in your Sphinx configuration.

class papis.sphinx_ext.CustomClickDirective(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]

A custom sphinx_click.ClickDirective that removes the automatic title from the generated documentation. Otherwise it can be used in the exact same way, e.g.:

.. click:: papis.commands.add:cli
    :prog: papis add

class papis.sphinx_ext.PapisConfig(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)[source]

A directive for describing Papis configuration values.

The directive is given as:

.. papis-config:: config-value-name

and has the following optional arguments:

:section:: The section in which the configuration value is given. The section defaults to get_general_settings_name().
:type:: The type of the configuration value, e.g. a string or an integer. If not provided, the type of the default value is used.
:default:: The default value for the configuration value. If not provided, this is taken from the default Papis settings.

It can be used as:

.. papis-config:: info-file
    :default: info.yml
    :type: str
    :section: settings

    This is the file name for where the document metadata should be
    stored. It is a relative path in the document's main folder.

In text, these configuration values can be referenced using standard role references, e.g.:

The document metadata is found in its :confval:`info-file`.

has_content: ClassVar[bool] = True: The directive can have a longer description.

optional_arguments: ClassVar[int] = 3: Number of optional arguments to the directive.

required_arguments: ClassVar[int] = 1: Number of required arguments to the directive.

option_spec: ClassVar[dict[str, Callable[[str], Any]]] = {'default': <class 'str'>, 'section': <class 'str'>, 'type': <class 'str'>}: A description of the arguments, mapping names to validator functions.

papis.sphinx_ext.make_link_resolve(github_project_url: str, revision: str) → Callable[[str, dict[str, Any]], str | None][source]

Create a function that can be used with sphinx.ext.linkcode.

This can be used in the conf.py file as:

linkcode_resolve = make_link_resolve("https://github.com/papis/papis", "main")

Parameters:

github_project_url – the URL to a GitHub project to which to link.
revision – the revision to point to, e.g. main.

papis.sphinx_ext.process_autodoc_missing_reference(app: Sphinx, env: BuildEnvironment, node: pending_xref, contnode: TextElement) → TextElement | None[source]

Fix missing references due to string annotations.

This uses an alias dictionary called papis_missing_reference_aliases that maps each unknown type to a reference type and actual type full name. For example, say that the Document reference is not recognized properly. We know that this object is in the papis.document module as a class. Then, we write:

papis_missing_reference_aliases: dict[str, str] = {
    "Document": "py:class:papis.document.Document",
}

`papis.testing`

papis.testing.create_random_file(filetype: str | None = None, prefix: str | None = None, suffix: str | None = None, dir: str | None = None) → str[source]

Create a random file with the correct magic signature.

This function creates random empty files that can be used for testing. It supports creating PDF, EPUB, DjVu or simple text files. These are constructed in such a way that they are recognized by papis.filetype.guess_content_extension().

Parameters:

filetype – the desired filetype of the result, which can be one of ("pdf", "epub", "djvu", "text").
prefix – a prefix passed to tempfile.NamedTemporaryFile().
suffix – a suffix passed to tempfile.NamedTemporaryFile().
dir – a base directory passed to tempfile.NamedTemporaryFile().
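
The idea of a file with a recognizable magic signature can be sketched as follows. This is a standalone illustration: make_fake_file and the MAGIC table are hypothetical, and the exact bytes written by papis.testing.create_random_file() may differ.

```python
import tempfile
from typing import Optional

# Leading bytes assumed for this sketch; real detectors may inspect more.
MAGIC = {"pdf": b"%PDF-1.5\n", "text": b"papis test file\n"}


def make_fake_file(filetype: str = "pdf", suffix: Optional[str] = None) -> str:
    """Write a tiny file that starts with the signature for filetype."""
    if suffix is None:
        suffix = "." + ("txt" if filetype == "text" else filetype)
    # delete=False keeps the file on disk after the handle is closed.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as f:
        f.write(MAGIC[filetype])
    return f.name


path = make_fake_file("pdf")
```

The caller is responsible for removing the file, which is one reason the real helper is typically used inside a temporary directory.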

papis.testing.populate_library(libdir: str) → None[source]

Add temporary documents with random files into the folder libdir.

Parameters:: libdir – an existing empty library directory.

class papis.testing.TemporaryConfiguration(prefix: str = 'papis-test-', settings: dict[str, Any] | None = None, overwrite: bool = False)[source]

A context manager used to create a temporary papis configuration.

This configuration is created in a temporary directory and all the required paths are set to point to that directory (e.g. XDG_CONFIG_HOME and XDG_CACHE_HOME). This is meant to be used by tests to create a default environment in which to run.

It can be used in the standard way as:

import papis.config
from papis.testing import TemporaryConfiguration

# Set the configuration option `picktool`
papis.config.set("picktool", "fzf")

with TemporaryConfiguration() as config:
    # In this block, it is back to its default value
    value = papis.config.get("picktool")
    assert value == "papis"
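
The environment isolation described above can be approximated with the standard library. The sketch below is not the papis implementation: temporary_xdg_home is a hypothetical helper that only redirects the two XDG variables and restores them on exit.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_xdg_home(prefix: str = "papis-test-") -> Iterator[str]:
    """Point XDG_CONFIG_HOME and XDG_CACHE_HOME at a throwaway directory."""
    names = ("XDG_CONFIG_HOME", "XDG_CACHE_HOME")
    saved = {name: os.environ.get(name) for name in names}
    with tempfile.TemporaryDirectory(prefix=prefix) as tmpdir:
        for name in names:
            os.environ[name] = tmpdir
        try:
            yield tmpdir
        finally:
            # Restore the previous environment, removing unset variables.
            for name, value in saved.items():
                if value is None:
                    os.environ.pop(name, None)
                else:
                    os.environ[name] = value
```

Restoring in a finally block means a failing test body does not leak the temporary paths into later tests.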

libname: ClassVar[str] = 'test': Name of the default library

settings: dict[str, Any] | None: A set of settings to be added to the configuration on creation

overwrite: bool: If True, any configuration settings are overwritten by settings.

libdir: str: When entering the context manager, this will contain the directory of a temporary library to run tests on. The library is unpopulated by default

configdir: str: When entering the context manager, this will contain the config directory used by Papis.

configfile: str: When entering the context manager, this will contain the config file used by Papis.

prefix: Prefix for the temporary directory created for the test.

property tmpdir: str: Base temporary directory name.

create_random_file(filetype: str | None = None, prefix: str | None = None, suffix: str | None = None) → str[source]: Create a random file in the tmpdir using create_random_file.

class papis.testing.TemporaryLibrary(settings: dict[str, Any] | None = None, use_git: bool = False, populate: bool = True)[source]

A context manager used to create a temporary papis configuration with a library.

This extends TemporaryConfiguration with more support for creating and maintaining a temporary library. This can be used by tests that specifically require handling documents in a library.

use_git: If True, a git repository is created in the library directory.

populate: If True, the library is prepopulated with a set of documents that contain random files and keys, which can be used for testing.

class papis.testing.PapisRunner(**kwargs: Any)[source]

A wrapper around click.testing.CliRunner.

invoke(cli: click.Command, args: Sequence[str], **kwargs: Any) → click.testing.Result[source]: A simple wrapper around the click.testing.CliRunner.invoke() method that does not catch exceptions by default.

class papis.testing.ResourceCache(cachedir: str)[source]

A class that handles retrieving local and remote resources for tests from default folders.

This class mainly exists to test importers and downloaders that require getting a remote resource and testing it against results of the papis converters.

It can be controlled by the PAPIS_UPDATE_RESOURCES environment variable, which takes the values:

"none": no resources are downloaded or updated (default).
"remote": remote resources are downloaded and the on-disk files are updated (used in get_remote_resource()).
"local": local resources are updated with the results of the papis conversion (used in get_local_resource()).
"both": both local and remote resources are updated.

Resources can then be retrieved as:

# Call some function that retrieves and converts remote data
local = papis.arxiv.get_data(...)

# Check that the expected cached resource matches the result
expected_local = cache.get_local_resource("resources/test.json", local)
assert local == expected_local

cachedir: The location of the resource directory.

session: A requests.Session used to download remote resources.

get_remote_resource(filename: str, url: str, force: bool = False, params: dict[str, str] | None = None, headers: dict[str, str] | None = None, cookies: dict[str, str] | None = None) → bytes[source]

Retrieve a remote resource from the resource cache.

If force is True, the filename does not exist or PAPIS_UPDATE_RESOURCES is set to ("remote", "both"), then the resource is downloaded from the remote location at url. Otherwise, it is retrieved from the locally cached version at filename.

Parameters:

filename – a file where to store the remote resource.
url – a remote URL from which to retrieve the resource.
force – if True, force updating the resource cached at filename.
params – additional params passed to requests.get().
headers – additional headers passed to requests.get().
cookies – additional cookies passed to requests.get().
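
The refresh rule for remote resources reduces to a three-way condition, sketched below. needs_download is a hypothetical helper that only mirrors the documented rule; it performs no download.

```python
import os


def needs_download(filename: str, force: bool = False) -> bool:
    """Decide whether a remote resource should be fetched again."""
    # "none" is the documented default when the variable is unset.
    mode = os.environ.get("PAPIS_UPDATE_RESOURCES", "none")
    return force or not os.path.exists(filename) or mode in ("remote", "both")
```

With the default setting, an existing cached file is reused, so tests can run without network access once the cache is populated.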

get_local_resource(filename: str, data: Any, force: bool = False) → Any[source]

Retrieve a local resource from the resource cache.

If force is True, the filename does not exist or PAPIS_UPDATE_RESOURCES is set to ("local", "both"), then the local resource is updated using data. Otherwise, it is retrieved from the locally cached version at filename.

Parameters:

filename – a file where to store the local resource.
data – data that should be retrieved from the resource.
force – if True, force updating the resource cached at filename.

papis.testing.tmp_config(request: SubRequest) → Iterator[TemporaryConfiguration][source]

A fixture that creates a TemporaryConfiguration.

Additional keyword arguments can be passed using the config_setup marker:

@pytest.mark.config_setup(overwrite=True)
def test_me(tmp_config: TemporaryConfiguration) -> None:
    ...

papis.testing.tmp_library(request: SubRequest) → Iterator[TemporaryLibrary][source]

A fixture that creates a TemporaryLibrary.

Additional keyword arguments can be passed using the library_setup marker:

@pytest.mark.library_setup(use_git=False)
def test_me(tmp_library: TemporaryLibrary) -> None:
    ...

papis.testing.resource_cache(request: SubRequest) → ResourceCache[source]

A fixture that creates a ResourceCache.

Additional keyword arguments can be passed using the resource_setup marker:

@pytest.mark.resource_setup(cachedir="resources")
def test_me(resource_cache: ResourceCache) -> None:
    ...

`papis.utils`

class papis.utils.A

Invariant typing.TypeVar

alias of TypeVar(‘A’)

class papis.utils.B

Invariant typing.TypeVar

alias of TypeVar(‘B’)

papis.utils.get_session() → requests.Session[source]

Create a requests.Session for papis.

This session has the expected User-Agent (see user-agent), proxy (see downloader-proxy) and other settings used for papis. It is recommended to use it instead of creating a requests.Session at every call site.

papis.utils.parmap(f: Callable[[A], B], xs: Iterable[A], np: int | None = None) → list[B][source]

Apply the function f to all elements of xs.

When available, this function uses the multiprocessing module to apply the function in parallel. This can have a noticeable performance impact when the number of elements of xs is large, but can also be slower than a sequential map().

The number of processes can also be controlled using the PAPIS_NP environment variable. Setting this variable to 0 will disable the use of multiprocessing on all platforms.

Parameters:

f – a callable to apply to a list of elements.
xs – an iterable of elements to apply the function f to.
np – number of processes to use when applying the function f in parallel. This value defaults to PAPIS_NP or os.cpu_count().
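
A minimal sketch of this behavior, assuming nothing about the actual papis implementation, is shown below. parmap_sketch is a hypothetical name; the handling of PAPIS_NP follows the description above.

```python
import multiprocessing
import os
from typing import Callable, Iterable, List, Optional, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def parmap_sketch(f: Callable[[A], B], xs: Iterable[A], np: Optional[int] = None) -> List[B]:
    """Apply f to xs, in parallel unless the process count resolves to 0."""
    if np is None:
        # PAPIS_NP takes precedence over the CPU count.
        np = int(os.environ.get("PAPIS_NP", os.cpu_count() or 1))
    if np == 0:
        # Sequential fallback: no worker processes are started.
        return list(map(f, xs))
    # f and the items must be picklable to cross process boundaries.
    with multiprocessing.Pool(processes=np) as pool:
        return pool.map(f, list(xs))


result = parmap_sketch(abs, [-1, 2, -3], np=0)
```

For short lists or cheap functions the process start-up and pickling cost can outweigh the gain, which is why a sequential path is worth keeping.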

papis.utils.run(cmd: Sequence[str], wait: bool = True, env: dict[str, Any] | None = None, cwd: str | None = None) → None[source]

Run a given command with subprocess.

This is a simple wrapper around subprocess.Popen with custom defaults used to call papis commands.

Parameters:

cmd – a sequence of arguments to run, where the first entry is expected to be the command name and the remaining entries its arguments.
wait – if True wait for the process to finish, otherwise detach the process and return immediately.
env – a mapping that defines additional environment variables for the child process.
cwd – current working directory in which to run the command.
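
The parameters above map closely onto subprocess.Popen. The following sketch shows one plausible wrapper; run_sketch is hypothetical, and it simply returns without waiting rather than truly detaching the child, since detaching is platform-specific.

```python
import os
import subprocess
import sys
from typing import Dict, Optional, Sequence


def run_sketch(
    cmd: Sequence[str],
    wait: bool = True,
    env: Optional[Dict[str, str]] = None,
    cwd: Optional[str] = None,
) -> None:
    """Start cmd and optionally block until it exits."""
    # Extra variables are layered on top of the parent's environment.
    full_env = {**os.environ, **(env or {})}
    process = subprocess.Popen(list(cmd), env=full_env, cwd=cwd)
    if wait:
        process.wait()


# Usage: run the current interpreter with one extra environment variable.
outcome = run_sketch([sys.executable, "-c", "pass"], env={"DEMO": "1"})
```

Passing the command as a sequence avoids shell quoting issues, which matters when file names contain spaces.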

papis.utils.general_open(file_name: str, key: str, default_opener: str | None = None, wait: bool = True) → None[source]

Open a file with a configured open tool (executable).

Parameters:

file_name – a file path to open.
key – a key in the configuration file to determine the opener used, e.g. opentool.
default_opener – an existing executable that can be used to open the file given by file_name. By default, the opener given by key, if any, or the default papis opener are used.
wait – if True wait for the process to finish, otherwise detach the process and return immediately.

papis.utils.open_file(file_path: str, wait: bool = True) → None[source]

Open file using the configured opentool.

Parameters:

file_path – a file path to open.
wait – if True wait for the process to finish, otherwise detach the process and return immediately.

papis.utils.get_folders(folder: str) → list[str][source]

Get all folders with papis documents inside of folder.

This is the main indexing routine. It looks inside folder and crawls the whole directory structure in search of subfolders containing an info file. The name of the file must match the configured info-name.

Parameters:: folder – root folder to look into.
Returns:: List of folders containing an info file.
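
The crawl can be pictured as a plain directory walk. In this sketch find_document_folders is a hypothetical helper and the default file name info.yaml is an assumption; papis reads the name from its configuration.

```python
import os
import tempfile
from typing import List


def find_document_folders(root: str, info_name: str = "info.yaml") -> List[str]:
    """Return every folder under root that directly contains info_name."""
    folders = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if info_name in filenames:
            folders.append(dirpath)
    return sorted(folders)


# Usage: build a small tree with one document folder and index it.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "nested"))
    os.makedirs(os.path.join(root, "b"))
    open(os.path.join(root, "a", "nested", "info.yaml"), "w").close()
    relative = [os.path.relpath(p, root) for p in find_document_folders(root)]
```

Folders without an info file, such as a and b in the example, are traversed but not reported.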

papis.utils.locate_document_in_lib(document: Document, library: str | None = None, *, unique_document_keys: list[str] | None = None) → Document[source]

Locate a document in a library.

This function falls back to unique-document-keys to determine if the current document matches any document in the library. The first document for which one of the keys in the list matches exactly will be returned.

Parameters:

library – the name of a valid Papis library.
unique_document_keys – a list of keys to match when locating a document.

Returns:

a full document as found in the library.

Raises:

IndexError – No document found in the library.

papis.utils.locate_document(document: Document, documents: Iterable[Document]) → Document | None[source]

Locate a document in a list of documents.

This function uses the unique-document-keys to determine if the current document matches any document in the list. The first document for which a key matches exactly will be returned.

Parameters:

document – the document to search for.
documents – an iterable of existing documents to match against.

Returns:

a document from documents which matches the given document or None if no document is found.
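
The matching rule can be sketched over plain dictionaries. The function locate and its default keys are hypothetical; papis takes the keys from unique-document-keys and may normalize values differently.

```python
from typing import Any, Dict, Iterable, Optional, Sequence

Doc = Dict[str, Any]


def locate(
    document: Doc,
    documents: Iterable[Doc],
    keys: Sequence[str] = ("doi", "ref"),
) -> Optional[Doc]:
    """Return the first candidate sharing a non-empty value for any key."""
    for candidate in documents:
        for key in keys:
            value = document.get(key)
            # Empty or missing values never count as a match.
            if value and candidate.get(key) == value:
                return candidate
    return None


library = [{"doi": "10.1/a", "title": "A"}, {"doi": "10.1/b", "title": "B"}]
match = locate({"doi": "10.1/b"}, library)
```

Skipping empty values avoids treating two documents as identical merely because both lack a DOI.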

papis.utils.folders_to_documents(folders: Iterable[str]) → list[Document][source]

Load a list of documents from their respective folders.

Parameters:: folders – a list of folder paths to load from.
Returns:: a list of document objects.

papis.utils.update_doc_from_data_interactively(document: DocumentLike, data: dict[str, Any], data_name: str) → None[source]

Shows a TUI to update the document interactively with fields from data.

Parameters:

document – a document (or a mapping convertible to a document) which is going to be updated.
data – additional data to select and merge into document.
data_name – an identifier for the data to show in the TUI.

papis.utils.get_cache_home() → str[source]

Get default cache directory.

This will retrieve the cache-dir configuration setting. If not provided, a platform-dependent cache folder is chosen instead.

Returns:: the absolute path for the cache main folder.

`papis.yaml`

papis.yaml.data_to_yaml(yaml_path: str, data: dict[str, Any], *, allow_unicode: bool | None = True) → None[source]

Save data to yaml_path in the YAML format.

Parameters:

yaml_path – path to a file.
data – data to write to the file as a YAML document.

papis.yaml.list_to_path(data: Sequence[dict[str, Any]], filepath: str, *, allow_unicode: bool | None = True) → None[source]

Save a list of dicts to a YAML file.

Parameters:

data – a sequence of dictionaries to save as YAML documents.
filepath – path to a file.

papis.yaml.yaml_to_data(yaml_path: str, raise_exception: bool = False) → dict[str, Any][source]

Read a YAML document from yaml_path.

Parameters:

yaml_path – path to a file.
raise_exception – if True an exception is raised when loading the data has failed. Otherwise just a log message is emitted.

Returns:

a dict containing the data from the YAML document.

Raises:

ValueError – if the document cannot be loaded due to YAML parsing errors.

papis.yaml.yaml_to_list(yaml_path: str, raise_exception: bool = False) → list[dict[str, Any]][source]

Read a list of YAML documents.

This is analogous to yaml_to_data(), but uses yaml.load_all to read multiple documents (see PyYAML docs).

Parameters:

yaml_path – path to a file containing YAML documents.
raise_exception – if True an exception is raised when loading the data has failed. Otherwise just a log message is emitted.

Returns:

a list of dict objects, one for each YAML document in the file.

Raises:

ValueError – if the documents cannot be loaded due to YAML parsing errors.

`papis.commands.doctor`

papis.commands.doctor.FixFn

Callable for automatic doctor fixers. This callable is constructed by a check and is expected to wrap all the required data, so it takes no arguments.

alias of Callable[[], None]

papis.commands.doctor.CheckFn: TypeAlias = 'Callable[[Document], list[Error]]': Callable for doctor document checks.

class papis.commands.doctor.Error(name: str, path: str, payload: str, msg: str, suggestion_cmd: str, fix_action: FixFn | None, doc: Document | None)[source]

A detailed error returned by a doctor check.

name: str: Name of the check generating the error.

path: str: Path to the document that generated the error.

payload: str: A value that caused the error (usually a document key).

msg: str: A short message describing the error that can be displayed to the user.

suggestion_cmd: str: A command to run to fix the error that can be suggested to the user.

fix_action: FixFn | None: A callable that can autofix the error (see FixFn). Note that this will change the attached doc.

doc: Document | None: The document that generated the error.

class papis.commands.doctor.Check(name, operate)[source]

name: str: Name of the check

operate: CheckFn: A callable that takes a document and returns a list of errors generated by the current check (see CheckFn).

papis.commands.doctor.register_check(name: str, check: CheckFn) → None[source]

Register a new check.

Registered checks are recognized by papis and can be used by users in their configuration files through doctor-default-checks or on the command line through the --checks flag.
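
To make the shape of a check concrete, the sketch below builds a tiny registry without importing papis. CheckError, REGISTRY, register, gather and title_check are all hypothetical stand-ins; the real Error class has more fields, and skipping unknown check names mirrors the documented behavior of gather_errors().

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


@dataclass
class CheckError:
    """Reduced stand-in for papis.commands.doctor.Error."""
    name: str
    payload: str
    msg: str


CheckFn = Callable[[Doc], List[CheckError]]
REGISTRY: Dict[str, CheckFn] = {}


def register(name: str, check: CheckFn) -> None:
    REGISTRY[name] = check


def title_check(doc: Doc) -> List[CheckError]:
    """Hypothetical check: report documents without a title."""
    if doc.get("title"):
        return []
    return [CheckError("title", "title", "Document has no title")]


register("title", title_check)


def gather(documents: List[Doc], checks: List[str]) -> List[CheckError]:
    # Unregistered check names are skipped rather than raising.
    active = [REGISTRY[c] for c in checks if c in REGISTRY]
    return [err for doc in documents for check in active for err in check(doc)]


errors = gather([{"title": "X"}, {"author": "Y"}], ["title", "unknown"])
```

Each check returns a list, so a single document can yield several errors from one check or none at all.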

papis.commands.doctor.files_check(doc: Document) → list[Error][source]

Check whether the files of a document actually exist in the filesystem.

Returns:: a list of errors, one for each file that does not exist.

papis.commands.doctor.keys_missing_check(doc: Document) → list[Error][source]

Checks whether the keys provided in the configuration option doctor-keys-missing-keys exist in the document and are non-empty.

Returns:: a list of errors, one for each missing key.

papis.commands.doctor.refs_check(doc: Document) → list[Error][source]

Checks that a ref exists and, if not, tries to create one according to the ref-format configuration option.

Returns:: an error if the reference does not exist or contains invalid characters (as required by BibTeX).

papis.commands.doctor.duplicated_keys_check(doc: Document) → list[Error][source]

Check for duplicated keys in the list given by the doctor-duplicated-keys-keys configuration option.

Returns:: a list of errors, one for each key with a value that already exists in the documents from the current query.

papis.commands.doctor.duplicated_values_check(doc: Document) → list[Error][source]

Check if the keys given by doctor-duplicated-values-keys contain any duplicate entries. These keys are expected to be lists of items.

Returns:: a list of errors, one for each key with a value that has duplicate entries.
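
Detecting repeated entries in a list-valued key can be sketched as follows. find_duplicates is a hypothetical helper and assumes hashable items; list entries that are dictionaries would need a different comparison.

```python
from typing import Hashable, List, Sequence


def find_duplicates(values: Sequence[Hashable]) -> List[Hashable]:
    """Return each value that occurs more than once, in first-repeat order."""
    seen = set()
    duplicates = []
    for value in values:
        if value in seen and value not in duplicates:
            duplicates.append(value)
        seen.add(value)
    return duplicates


repeated = find_duplicates(["a.pdf", "b.pdf", "a.pdf", "a.pdf"])
```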

papis.commands.doctor.bibtex_type_check(doc: Document) → list[Error][source]

Check that the document type is compatible with BibTeX or BibLaTeX type descriptors.

Returns:: an error if the types are not compatible.

papis.commands.doctor.biblatex_type_alias_check(doc: Document) → list[Error][source]

Check that the BibLaTeX type of the document is not a known alias.

The aliases are described by bibtex_type_aliases.

Returns:: an error if the type of the document is an alias.

papis.commands.doctor.biblatex_key_alias_check(doc: Document) → list[Error][source]

Check that no BibLaTeX keys in the document are known aliases.

The aliases are described by bibtex_key_aliases. Note that these keys can also be converted on export to BibLaTeX.

Returns:: an error for each key of the document that is an alias.

papis.commands.doctor.biblatex_required_keys_check(doc: Document) → list[Error][source]

Check that required BibLaTeX keys are part of the document based on its type.

The required keys are described by papis.bibtex.bibtex_type_required_keys. Note that most BibLaTeX processors will be quite forgiving if these keys are missing.

Returns:: an error for each key of the document that is missing.

papis.commands.doctor.biblatex_key_convert_check(doc: Document) → list[Error][source]

Check if any BibLaTeX keys in the document are incorrectly assigned.

Note that this is a heuristic in most cases, as we cannot always determine allowable values. Implemented checks include:

issue entries that should be number: issue is generally reserved for periodicals (e.g. “Spring” issue) and not meant as short designator for a publication (see Section 2.3.11 from the BibLaTeX manual).

Returns:: a list of errors for each key that appears misassigned.

papis.commands.doctor.get_key_type_check_keys() → dict[str, type][source]

Check the doctor-key-type-keys configuration entry for correctness.

The doctor-key-type-keys configuration entry defines a mapping of keys and their expected types. If the desired type is a list, the doctor-key-type-separator setting can be used to split an existing string (and, similarly, if the desired type is a string, it can be used to join a list of items).

Returns:: A dictionary mapping key names to types.

papis.commands.doctor.key_type_check(doc: Document) → list[Error][source]

Check document keys have expected types.

Returns:: a list of errors, one for each key that does not have the expected type (if it exists).

papis.commands.doctor.html_codes_check(doc: Document) → list[Error][source]

Checks that the keys in the doctor-html-codes-keys configuration option do not contain any HTML codes like &amp; etc.

Returns:: a list of errors, one for each key that contains HTML codes.

papis.commands.doctor.html_tags_check(doc: Document) → list[Error][source]

Checks that the keys in the doctor-html-tags-keys configuration option do not contain any HTML tags like <href> etc.

Returns:: a list of errors, one for each key that contains HTML tags.

papis.commands.doctor.string_cleaner_check(doc: Document) → list[Error][source]

Check string keys in the document for various errors.

This check goes through all the keys of the document that are known to be strings, according to doctor-key-type-keys, and fixes any obvious errors. For example (not exhaustive):

Double spacing or any repeated whitespace.
Unexpected new line characters.
Weirdly formatted names, e.g. “J R R Tolkien” should be “J. R. R. Tolkien”.

Returns:: a list of errors, one for each string-based key that has unexpected formatting.
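
The whitespace part of this cleanup can be sketched in a few lines. clean_whitespace is a hypothetical helper covering only the first two items above; name formatting is not handled.

```python
import re


def clean_whitespace(value: str) -> str:
    """Collapse whitespace runs, including new lines, into single spaces."""
    return re.sub(r"\s+", " ", value).strip()


cleaned = clean_whitespace("A  tale\nof   two\tcities ")
```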

papis.commands.doctor.gather_errors(documents: list[Document], checks: list[str] | None = None) → list[Error][source]

Run all checks over the list of documents.

Only checks registered with register_check() are supported and any unrecognized checks are automatically skipped.

Parameters:: checks – a list of checks to run over the documents. If not provided, the default doctor-default-checks are used.
Returns:: a list of all the errors gathered from the documents.

papis.commands.doctor.fix_errors(doc: Document, checks: list[str] | None = None) → None[source]

Fix errors in doc for the given checks.

This function only applies existing auto-fixers to the document. This is not possible for many of the existing checks, but can be used to quickly clean up a document.

papis.commands.doctor.process_errors(errors: list[Error], fix: bool = False, explain: bool = False, suggest: bool = False, edit: bool = False) → None[source]

Process a list of document errors from gather_errors().

Parameters:

fix – if True, any automatic fixes are applied to the document the error refers to.
explain – if True, a short explanation of the error is shown.
suggest – if True, a short suggestion for manual fixing of the error is shown.
edit – if True, the document is opened for editing.

papis.commands.doctor.run(doc: Document, checks: list[str] | None = None, fix: bool = True, explain: bool = False, suggest: bool = False, edit: bool = False) → None[source]

Runner for papis doctor.

It runs all the checks given by the checks argument that have been registered through register_check(). It then proceeds with processing and fixing each error in turn.
