Developer API reference
Warning
The APIs documented here are not stable and may change from one version to
another. They are meant to be used by developers of papis itself and of
external plugins.
papis.bibtex
A set of utilities for working with BibTeX and BibLaTeX (as described in the manual).
- papis.bibtex.bibtex_standard_types = frozenset({'article', 'book', 'bookinbook', 'booklet', 'collection', 'dataset', 'inbook', 'incollection', 'inproceedings', 'inreference', 'manual', 'misc', 'mvbook', 'mvcollection', 'mvproceedings', 'mvreference', 'online', 'patent', 'periodical', 'proceedings', 'reference', 'report', 'software', 'suppbook', 'suppcollection', 'suppperiodical', 'thesis', 'unpublished'})
Regular BibLaTeX types (Section 2.1.1).
- papis.bibtex.bibtex_type_aliases = {'conference': 'inproceedings', 'electronic': 'online', 'mastersthesis': 'thesis', 'phdthesis': 'thesis', 'techreport': 'report', 'www': 'online'}
BibLaTeX type aliases (Section 2.1.2).
- papis.bibtex.bibtex_non_standard_types = frozenset({'artwork', 'audio', 'bibnote', 'commentary', 'image', 'jurisdiction', 'legal', 'legislation', 'letter', 'movie', 'music', 'performance', 'review', 'standard', 'video'})
Non-standard BibLaTeX types (Section 2.1.3).
- papis.bibtex.biblatex_software_types = frozenset({'codefragment', 'software', 'softwaremodule', 'softwareversion'})
BibLaTeX Software types (Section 2).
- papis.bibtex.bibtex_types = frozenset({'article', 'artwork', 'audio', 'bibnote', 'book', 'bookinbook', 'booklet', 'codefragment', 'collection', 'commentary', 'conference', 'dataset', 'electronic', 'image', 'inbook', 'incollection', 'inproceedings', 'inreference', 'jurisdiction', 'legal', 'legislation', 'letter', 'manual', 'mastersthesis', 'misc', 'movie', 'music', 'mvbook', 'mvcollection', 'mvproceedings', 'mvreference', 'online', 'patent', 'performance', 'periodical', 'phdthesis', 'proceedings', 'reference', 'report', 'review', 'software', 'softwaremodule', 'softwareversion', 'standard', 'suppbook', 'suppcollection', 'suppperiodical', 'techreport', 'thesis', 'unpublished', 'video', 'www'})
A set of known BibLaTeX types (as described in Section 2.1 of the manual). These types are a union of the types above and can be extended with extra-bibtex-types.
- papis.bibtex.bibtex_standard_keys = frozenset({'abstract', 'addendum', 'afterword', 'annotation', 'annotator', 'author', 'authortype', 'bookauthor', 'bookpagination', 'booksubtitle', 'booktitle', 'booktitleaddon', 'chapter', 'commentator', 'date', 'doi', 'edition', 'editor', 'editora', 'editoratype', 'editorb', 'editorbtype', 'editorc', 'editorctype', 'editortype', 'eid', 'entrysubtype', 'eprint', 'eprintclass', 'eprinttype', 'eventdate', 'eventtitle', 'eventtitleaddon', 'file', 'foreword', 'holder', 'howpublished', 'indextitle', 'institution', 'introduction', 'isan', 'isbn', 'ismn', 'isrn', 'issn', 'issue', 'issuesubtitle', 'issuetitle', 'issuetitleaddon', 'iswc', 'journalsubtitle', 'journaltitle', 'journaltitleaddon', 'label', 'language', 'library', 'location', 'mainsubtitle', 'maintitle', 'maintitleaddon', 'month', 'nameaddon', 'note', 'number', 'organization', 'origdate', 'origlanguage', 'origlocation', 'origpublisher', 'origtitle', 'pages', 'pagetotal', 'pagination', 'part', 'publisher', 'pubstate', 'reprinttitle', 'series', 'shortauthor', 'shorteditor', 'shorthand', 'shorthandintro', 'shortjournal', 'shortseries', 'shorttitle', 'subtitle', 'title', 'titleaddon', 'translator', 'url', 'urldate', 'venue', 'version', 'volume', 'volumes', 'year'})
BibLaTeX data fields (Section 2.2.2).
- papis.bibtex.bibtex_key_aliases = {'address': 'location', 'annote': 'annotation', 'archiveprefix': 'eprinttype', 'journal': 'journaltitle', 'key': 'sortkey', 'pdf': 'file', 'primaryclass': 'eprintclass', 'school': 'institution'}
BibLaTeX field aliases (Section 2.2.5).
- papis.bibtex.bibtex_special_keys = frozenset({'crossref', 'entryset', 'execute', 'gender', 'ids', 'indexsorttitle', 'keywords', 'langid', 'langidopts', 'options', 'presort', 'related', 'relatedoptions', 'relatedstring', 'relatedtype', 'sortkey', 'sortname', 'sortshorthand', 'sorttitle', 'sortyear', 'xdata', 'xref'})
Special BibLaTeX fields (Section 2.2.3).
- papis.bibtex.biblatex_software_keys = frozenset({'abstract', 'author', 'date', 'doi', 'editor', 'eprint', 'eprintclass', 'eprinttype', 'file', 'hal_id', 'hal_version', 'institution', 'introducedin', 'license', 'month', 'note', 'organization', 'publisher', 'related', 'relatedstring', 'relatedtype', 'repository', 'subtitle', 'swhid', 'title', 'url', 'urldate', 'version', 'year'})
BibLaTeX software keys (Section 3). Most of these keys are already standard BibLaTeX keys from bibtex_standard_keys.
- papis.bibtex.bibtex_keys = frozenset({'abstract', 'addendum', 'address', 'afterword', 'annotation', 'annotator', 'annote', 'archiveprefix', 'author', 'authortype', 'bookauthor', 'bookpagination', 'booksubtitle', 'booktitle', 'booktitleaddon', 'chapter', 'commentator', 'crossref', 'date', 'doi', 'edition', 'editor', 'editora', 'editoratype', 'editorb', 'editorbtype', 'editorc', 'editorctype', 'editortype', 'eid', 'entryset', 'entrysubtype', 'eprint', 'eprintclass', 'eprinttype', 'eventdate', 'eventtitle', 'eventtitleaddon', 'execute', 'file', 'foreword', 'gender', 'hal_id', 'hal_version', 'holder', 'howpublished', 'ids', 'indexsorttitle', 'indextitle', 'institution', 'introducedin', 'introduction', 'isan', 'isbn', 'ismn', 'isrn', 'issn', 'issue', 'issuesubtitle', 'issuetitle', 'issuetitleaddon', 'iswc', 'journal', 'journalsubtitle', 'journaltitle', 'journaltitleaddon', 'key', 'keywords', 'label', 'langid', 'langidopts', 'language', 'library', 'license', 'location', 'mainsubtitle', 'maintitle', 'maintitleaddon', 'month', 'nameaddon', 'note', 'number', 'options', 'organization', 'origdate', 'origlanguage', 'origlocation', 'origpublisher', 'origtitle', 'pages', 'pagetotal', 'pagination', 'part', 'pdf', 'presort', 'primaryclass', 'publisher', 'pubstate', 'related', 'relatedoptions', 'relatedstring', 'relatedtype', 'repository', 'reprinttitle', 'school', 'series', 'shortauthor', 'shorteditor', 'shorthand', 'shorthandintro', 'shortjournal', 'shortseries', 'shorttitle', 'sortkey', 'sortname', 'sortshorthand', 'sorttitle', 'sortyear', 'subtitle', 'swhid', 'title', 'titleaddon', 'translator', 'url', 'urldate', 'venue', 'version', 'volume', 'volumes', 'xdata', 'xref', 'year'})
A set of known BibLaTeX fields (as described in Section 2.2 of the manual). These fields are a union of the above fields and can be extended with extra-bibtex-keys.
- papis.bibtex.bibtex_type_required_keys = {'article': ({'author'}, {'title'}, {'eprinttype', 'journaltitle'}, {'date', 'year'}), 'book': ({'author'}, {'title'}, {'date', 'year'}), 'booklet': ({'author', 'editor'}, {'title'}, {'date', 'year'}), 'codefragment': ({'url'},), 'collection': ({'editor'}, {'title'}, {'date', 'year'}), 'dataset': ({'author', 'editor'}, {'title'}, {'date', 'year'}), 'inbook': ({'author'}, {'title'}, {'booktitle'}, {'date', 'year'}), 'incollection': ({'author'}, {'title'}, {'editor'}, {'booktitle'}, {'date', 'year'}), 'inproceedings': ({'author'}, {'title'}, {'booktitle'}, {'date', 'year'}), 'manual': ({'author', 'editor'}, {'title'}, {'date', 'year'}), 'misc': ({'author', 'editor'}, {'title'}, {'date', 'year'}), 'online': ({'author', 'editor'}, {'title'}, {'date', 'year'}, {'doi', 'eprint', 'url'}), 'patent': ({'author'}, {'title'}, {'number'}, {'date', 'year'}), 'periodical': ({'editor'}, {'title'}, {'date', 'year'}), 'proceedings': ({'title'}, {'date', 'year'}), 'report': ({'author'}, {'title'}, {'type'}, {'institution'}, {'date', 'year'}), 'software': ({'author', 'editor'}, {'title'}, {'url'}, {'year'}), 'softwaremodule': ({'author'}, {'subtitle'}, {'url'}, {'year'}), 'softwareversion': ({'author', 'editor'}, {'title'}, {'url'}, {'version'}, {'year'}), 'thesis': ({'author'}, {'title'}, {'type'}, {'institution'}, {'date', 'year'}), 'unpublished': ({'author'}, {'title'}, {'date', 'year'}), None: ()}
A mapping of supported BibLaTeX entry types (see bibtex_types) to BibLaTeX fields (see bibtex_keys). Each value is a tuple of disjoint sets of alternative fields: at least one field from every set is required for the particular type, e.g. an article requires either a year or a date field.
- papis.bibtex.bibtex_type_required_keys_aliases = {'bookinbook': 'inbook', 'inreference': 'incollection', 'mvbook': 'book', 'mvcollection': 'collection', 'mvproceedings': 'proceedings', 'mvreference': 'collection', 'reference': 'collection', 'suppbook': 'book', 'suppcollection': 'collection', 'suppperiodical': 'periodical'}
A mapping for additional BibLaTeX types that have the same required fields as another type. This mapping can be used to convert types before looking into bibtex_type_required_keys.
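The following is a minimal sketch, not papis code, of how the two mappings above could be combined to find missing fields. It copies only a few entries of bibtex_type_required_keys and bibtex_type_required_keys_aliases, and the helper name missing_required_keys is hypothetical.

```python
# Excerpts copied from papis.bibtex; the real mappings contain many more types.
REQUIRED_KEYS = {
    "article": ({"author"}, {"title"}, {"eprinttype", "journaltitle"}, {"date", "year"}),
    "book": ({"author"}, {"title"}, {"date", "year"}),
    "collection": ({"editor"}, {"title"}, {"date", "year"}),
    None: (),
}
REQUIRED_KEYS_ALIASES = {"mvbook": "book", "reference": "collection"}


def missing_required_keys(doc: dict) -> list:
    """Return each group of alternative fields that *doc* does not satisfy."""
    doc_type = doc.get("type")
    # Types such as "mvbook" share the required fields of another type.
    doc_type = REQUIRED_KEYS_ALIASES.get(doc_type, doc_type)
    # Unknown types have no requirements in this sketch.
    groups = REQUIRED_KEYS.get(doc_type, ())
    # A group is satisfied as soon as any one of its fields is present.
    return [group for group in groups if not any(key in doc for key in group)]


doc = {"type": "article", "author": "A. Einstein", "title": "On ...", "year": "1905"}
print(missing_required_keys(doc))  # the journaltitle/eprinttype group is missing
```

Treating each set as "any one of these" is what lets a single entry cover the year-or-date case described above.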
- papis.bibtex.bibtex_type_converter: dict[str, str] = {'OriginalPaper': 'article', 'annotation': 'misc', 'attachment': 'misc', 'audioRecording': 'audio', 'bill': 'legislation', 'blogPost': 'online', 'bookSection': 'inbook', 'case': 'jurisdiction', 'computerProgram': 'software', 'conferencePaper': 'inproceedings', 'dictionaryEntry': 'misc', 'document': 'article', 'email': 'online', 'encyclopediaArticle': 'article', 'film': 'video', 'forumPost': 'online', 'hearing': 'jurisdiction', 'instantMessage': 'online', 'interview': 'article', 'journal': 'article', 'journalArticle': 'article', 'magazineArticle': 'article', 'manuscript': 'unpublished', 'map': 'misc', 'monograph': 'book', 'newspaperArticle': 'article', 'note': 'misc', 'podcast': 'audio', 'preprint': 'unpublished', 'presentation': 'misc', 'radioBroadcast': 'audio', 'statute': 'jurisdiction', 'tvBroadcast': 'video', 'videoRecording': 'video', 'webpage': 'online'}
A mapping of arbitrary types to BibLaTeX types in bibtex_types. This mapping can be used when translating from other software, e.g. Zotero uses its own item types (such as journalArticle or bookSection) in its schema.
- papis.bibtex.bibtex_key_converter: dict[str, str] = {'abstractNote': 'abstract', 'conferenceName': 'eventtitle', 'place': 'location', 'proceedingsTitle': 'booktitle', 'publicationTitle': 'journal', 'university': 'school'}
A mapping of arbitrary fields to BibLaTeX fields in bibtex_keys. This mapping can be used when translating from other software.
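As a hedged illustration, a foreign record could be translated with excerpts of bibtex_type_converter and bibtex_key_converter as below. The helper to_biblatex is hypothetical, and it assumes the record already stores its type under a "type" key.

```python
# Excerpts copied from papis.bibtex.bibtex_type_converter / bibtex_key_converter.
TYPE_CONVERTER = {"journalArticle": "article", "bookSection": "inbook", "webpage": "online"}
KEY_CONVERTER = {"abstractNote": "abstract", "publicationTitle": "journal", "place": "location"}


def to_biblatex(record: dict) -> dict:
    """Rename foreign fields and types; anything unknown is kept as-is."""
    result = {KEY_CONVERTER.get(key, key): value for key, value in record.items()}
    if "type" in result:
        result["type"] = TYPE_CONVERTER.get(result["type"], result["type"])
    return result


print(to_biblatex({"type": "journalArticle", "publicationTitle": "Nature", "title": "..."}))
# {'type': 'article', 'journal': 'Nature', 'title': '...'}
```

Note that the converted "journal" field is itself a BibLaTeX alias (of journaltitle, see bibtex_key_aliases), so a further normalization step may still be useful.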
- papis.bibtex.bibtex_ignore_keys = frozenset({'file'})
A set of BibLaTeX fields to ignore when exporting from the Papis database. These can be extended with bibtex-ignore-keys.
- papis.bibtex.ref_allowed_characters = '([^a-zA-Z0-9._:]+|(?<!\\\\)[._:])'
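A regex matching the characters that are not acceptable in a reference string: any run of characters other than ASCII letters, digits, ".", "_" and ":", as well as any ".", "_" or ":" that is not escaped with a backslash. It is used by ref_cleanup() to remove any undesired characters.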
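To see which characters the pattern targets, it can be applied directly with re.sub. This is only an approximation: ref_cleanup() hands the pattern to slugify, which also transliterates unicode and tidies up separators, so its output can differ. The helper strip_ref is hypothetical.

```python
import re

# Same pattern as papis.bibtex.ref_allowed_characters, written as a raw string.
REF_DISALLOWED = re.compile(r"([^a-zA-Z0-9._:]+|(?<!\\)[._:])")


def strip_ref(ref: str, separator: str = "_") -> str:
    """Replace each disallowed run (or unescaped '.', '_', ':') with *separator*."""
    return REF_DISALLOWED.sub(separator, ref)


print(strip_ref("doe:2020.paper"))  # doe_2020_paper
print(strip_ref("O'Neil & Co"))     # O_Neil_Co
```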
- papis.bibtex.bibtex_verbatim_fields = frozenset({'doi', 'eprint', 'file', 'pdf', 'url', 'urlraw'})
A set of fields that should not be escaped. In general, the BibTeX engine escapes these fields itself, so they should not be modified (e.g. verbatim fields and URI fields in Section 2.2.1).
- papis.bibtex.bibtexparser_entry_to_papis(entry: dict[str, Any]) dict[str, Any][source]
Convert the keys of a BibTeX entry parsed by bibtexparser to a papis-compatible format.
- Parameters:
entry – a dictionary with keys parsed by bibtexparser.
- Returns:
a dictionary with keys converted to a papis-compatible format.
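bibtexparser (version 1) stores the citation key under ID and the entry type under ENTRYTYPE, while papis uses ref and type. A minimal sketch of just that renaming follows; the real function may do more (for example, handling authors or LaTeX markup), and the name entry_to_papis_sketch is hypothetical.

```python
from typing import Any

# bibtexparser (v1) bookkeeping keys mapped to their papis equivalents.
RENAMES = {"ID": "ref", "ENTRYTYPE": "type"}


def entry_to_papis_sketch(entry: dict[str, Any]) -> dict[str, Any]:
    """Rename the bibtexparser bookkeeping keys and keep all other fields."""
    return {RENAMES.get(key, key): value for key, value in entry.items()}


entry = {"ENTRYTYPE": "article", "ID": "einstein1905", "title": "On the Electrodynamics of Moving Bodies"}
print(entry_to_papis_sketch(entry))
```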
- papis.bibtex.bibtex_to_dict(bibtex: str) list[DocumentLike][source]
Convert a BibTeX file (or string) to a list of Papis-compatible dictionaries.
This will convert an entry like:
@article{ref,
    author = { ... },
    title = { ... },
    ...,
}
to a dictionary such as:
{ "type": "article", "author": "...", "title": "...", ...}
- Parameters:
bibtex – a path to a BibTeX file or a string containing BibTeX formatted data. If it is a file, its contents are passed to BibTexParser.
- Returns:
a list of entries from the BibTeX data in a compatible format.
- papis.bibtex.ref_cleanup(ref: str, ref_word_separator: str | None = None) str[source]
Clean up a reference string so that it is accepted by BibLaTeX.
This uses ref_allowed_characters to remove any disallowed characters from the given ref. Furthermore, slugify is used to remove unicode characters and ensure consistent use of the underscore _ as a separator.
- Returns:
a reference without any disallowed characters.
- papis.bibtex.create_reference(doc: DocumentLike, *, ref_format: AnyString | None = None, ref_word_separator: str | None = None, force: bool = False) str[source]
Try to create a reference for the document doc.
If the document doc does not have a "ref" key, this function attempts to create one, otherwise the existing key is returned. When creating a new reference:
  1. the ref-format key is used, if available;
  2. otherwise, the document DOI is used, if available;
  3. otherwise, a string is constructed from the document data (author, title, etc.).
- Parameters:
force – if True, the reference is re-created even if the document already has a "ref" key.
ref_word_separator – separator passed to ref_cleanup().
- Returns:
a clean (see ref_cleanup()) reference for the document.
- papis.bibtex.author_list_to_author(doc: Document, author_list: list[dict[str, Any]]) str[source]
Construct the BibTeX author field from the document’s author_list.
This function is similar to papis.document.author_list_to_author(), but takes into account some BibTeX peculiarities:
  * the separator between the authors is always “and”, and
  * authors with only a family name or only a given name are surrounded by curly brackets.
- Returns:
an author string.
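A rough sketch of the two peculiarities listed above, assuming author_list entries are dictionaries with "family" and "given" keys and that full names are written as "family, given". The helper bibtex_authors is hypothetical and ignores any user formatting settings.

```python
def bibtex_authors(author_list: list) -> str:
    """Join authors BibTeX-style; names with a single part are braced."""
    names = []
    for author in author_list:
        family = author.get("family")
        given = author.get("given")
        if family and given:
            names.append(f"{family}, {given}")
        else:
            # Braces make BibTeX treat the whole string as one unit.
            names.append("{" + (family or given or "") + "}")
    # BibTeX always uses "and" between authors.
    return " and ".join(names)


print(bibtex_authors([{"family": "Einstein", "given": "Albert"}, {"family": "Aristotle"}]))
# Einstein, Albert and {Aristotle}
```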
papis.citations
- papis.citations.Citations
A list of citations for an existing document.
- papis.citations.get_metadata_citations(doc: DocumentLike) Citations[source]
Get the citations in the metadata that contain a DOI.
- papis.citations.fetch_citations(doc: Document) Citations[source]
Retrieve citations for the document.
Citation retrieval is mainly based on querying Crossref metadata based on the DOI of the document. If the document does not have a DOI, this function will fail to retrieve any citations.
- Returns:
a list of citations that have a DOI.
- papis.citations.get_citations_from_database(dois: Sequence[str]) Citations[source]
Look for document DOIs in the database.
- Parameters:
dois – a sequence of DOIs to look for in the current library database.
- Returns:
a sequence of documents from the current library that match the given dois, if any.
- papis.citations.update_and_save_citations_from_database_from_doc(doc: Document) None[source]
Update the citations file of an existing document.
This function will get any existing citations in the document, update them as appropriate and save them back to the citation file.
- papis.citations.update_citations_from_database(citations: list[dict[str, Any]]) list[dict[str, Any]][source]
Update a list of citations with data from the database.
- Parameters:
citations – a list of existing citations to update.
- papis.citations.save_citations(doc: Document, citations: Citations) None[source]
Save the citations to the document’s citation file.
- papis.citations.fetch_and_save_citations(doc: Document) None[source]
Retrieve citations from available sources and save them to the citations file.
- papis.citations.get_citations_file(doc: Document) str | None[source]
Get the document’s citation file path (see citations-file-name).
- Returns:
an absolute path to the citations file for doc.
- papis.citations.has_citations(doc: Document) bool[source]
- Returns:
True if the document has an existing citations file and False otherwise.
- papis.citations.get_citations(doc: Document) Citations[source]
Retrieve citations from the document’s citation file.
- papis.citations.get_cited_by_file(doc: Document) str | None[source]
Get the document’s cited-by file path (see cited-by-file-name).
- Returns:
an absolute path to the cited-by file for doc.
- papis.citations.has_cited_by(doc: Document) bool[source]
- Returns:
True if the document has a cited-by file and False otherwise.
- papis.citations.save_cited_by(doc: Document, citations: Citations) None[source]
Save the cited-by list citations to the document’s cited-by file.
- papis.citations.fetch_cited_by_from_database(cit: dict[str, Any]) list[dict[str, Any]][source]
Fetch a list of documents that cite cit from the database.
- Parameters:
cit – a citation to look for in the database.
- Returns:
a list of documents that cite cit.
papis.cli
- class papis.cli.LibraryParamType[source]
- shell_complete(ctx: Context, param: Parameter, incomplete: str) list[CompletionItem][source]
Return a list of CompletionItem objects for the incomplete value. Most types do not provide completions, but some do, and this allows custom types to provide custom completions as well.
- Parameters:
ctx – Invocation context for this command.
param – The parameter that is requesting completion.
incomplete – Value being completed. May be empty.
Added in version 8.0.
- papis.cli.bool_flag(*args: Any, **kwargs: Any) Callable[[...], Any][source]
A wrapper to click.option() that hardcodes a boolean flag option.
- papis.cli.query_argument(**attrs: Any) Callable[[...], Any][source]
Adds a query argument as a click.argument() decorator.
- papis.cli.query_option(**attrs: Any) Callable[[...], Any][source]
Adds a -q, --query option as a click.option() decorator.
- papis.cli.sort_option(**attrs: Any) Callable[[...], Any][source]
Adds a --sort and a --reverse option as a click.option() decorator.
- papis.cli.doc_folder_option(**attrs: Any) Callable[[...], Any][source]
Adds a --doc-folder option as a click.option() decorator.
- papis.cli.all_option(**attrs: Any) Callable[[...], Any][source]
Adds an --all option as a click.option() decorator.
- papis.cli.git_option(**attrs: Any) Callable[[...], Any][source]
Adds a --git option as a click.option() decorator.
- papis.cli.handle_doc_folder_or_query(query: str, doc_folder: str | tuple[str, ...] | None, library_name: str | None = None) list[Document][source]
Query database for documents.
This handles the query_option() and doc_folder_option() command-line arguments. If a doc_folder is given, then the document at that location is loaded, otherwise the database is queried using query.
- Parameters:
query – a database query string.
doc_folder – existing document folder (see papis.document.from_folder()).
library_name – library database to query.
- papis.cli.handle_doc_folder_query_sort(query: str, doc_folder: str | tuple[str, ...] | None, sort_field: str | None, sort_reverse: bool) list[Document][source]
Query database for documents.
Similar to handle_doc_folder_or_query(), but also handles the sort_option() arguments. It sorts the resulting documents according to sort_field and sort_reverse.
- Parameters:
sort_field – field by which to sort the resulting documents (see papis.document.sort()).
sort_reverse – if True, the documents are sorted in reverse order.
- papis.cli.handle_doc_folder_query_all_sort(query: str, doc_folder: str | tuple[str, ...] | None, sort_field: str | None, sort_reverse: bool, _all: bool) list[Document][source]
Query database for documents.
Similar to handle_doc_folder_query_sort(), but also handles the all_option() argument.
- Parameters:
_all – if False, the user is prompted to pick a subset of documents (see papis.api.pick_doc()).
- papis.cli.bypass(group: Group, command: Command, command_name: str) Callable[[...], Any][source]
Overwrite existing papis commands.
This function is especially important for developing scripts in papis.
For example, consider augmenting the add command, as seen when using papis add. In this case, we may want to add some additional options or behavior before calling papis.commands.add, but would like to avoid writing it from scratch. This function can then be used as follows to allow this:
    import click
    import papis.cli
    import papis.commands.add

    @click.group()
    def main():
        """Your main app"""
        pass

    @papis.cli.bypass(main, papis.commands.add.cli, "add")
    def add(**kwargs):
        # do some logic here...
        # and call the original add command line function by
        papis.commands.add.cli.bypassed(**kwargs)
papis.commands
- papis.commands.COMMAND_NAMESPACE_NAME = 'papis.command'
Name of the entry point namespace for Command plugins.
- papis.commands.EXTERNAL_COMMAND_REGEX = re.compile('.*papis-([^ .]+)$')
Regex for determining external commands.
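The regex can be checked directly against candidate file names. In this quick sketch, the helper external_command_name is hypothetical, and it assumes the captured group is what becomes the subcommand name.

```python
import re
from typing import Optional

# Copied from papis.commands.EXTERNAL_COMMAND_REGEX.
EXTERNAL_COMMAND_REGEX = re.compile(".*papis-([^ .]+)$")


def external_command_name(path: str) -> Optional[str]:
    """Return the subcommand name for a matching script path, else None."""
    match = EXTERNAL_COMMAND_REGEX.match(path)
    return match.group(1) if match else None


print(external_command_name("/usr/local/bin/papis-zotero"))  # zotero
print(external_command_name("/usr/local/bin/papis-foo.py"))  # None (dots are not allowed)
```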
- papis.commands.make_short_help(text: str, fallback: str = 'No help message available.') str[source]
Create a short help from the given text.
This will take the first paragraph of the text and remove any known reStructuredText markup so that it can be shown as a help string in the command line.
If the text is actually empty, the fallback will be returned.
- papis.commands.normalize_help(text: str | None) str[source]
Clean up the given text so that it can be shown on the command-line.
Similarly to make_short_help(), this removes any known reStructuredText markup from the text and does additional normalizations so that it can be better displayed on the command-line.
- class papis.commands.FullHelpCommand(name: str | None, context_settings: MutableMapping[str, Any] | None = None, callback: Callable[[...], Any] | None = None, params: list[Parameter] | None = None, help: str | None = None, epilog: str | None = None, short_help: str | None = None, options_metavar: str | None = '[OPTIONS]', add_help_option: bool = True, no_args_is_help: bool = False, hidden: bool = False, deprecated: bool | str = False)[source]
This is a simple wrapper around click.Command that does not truncate the short help messages.
We still very much prefer that these stay short if at all possible, but the default limit of 45 characters does not work well for many non-trivial commands.
- format_help_text(ctx: Context, formatter: HelpFormatter) None[source]
Writes the help text to the formatter if it exists.
- class papis.commands.AliasedGroup(name: str | None = None, commands: MutableMapping[str, Command] | Sequence[Command] | None = None, invoke_without_command: bool = False, no_args_is_help: bool | None = None, subcommand_metavar: str | None = None, chain: bool = False, result_callback: Callable[[...], Any] | None = None, **kwargs: Any)[source]
A click.Group that accepts command aliases.
This group command is taken from the command aliases example in the click documentation and is to be used for groups with aliases. In this case, aliases are defined as prefixes of the command. For example, for a command named remove, rem is also accepted as long as it is unique.
- command_class[source]
alias of FullHelpCommand
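The prefix matching described above can be sketched without click as follows. Giving an exact match priority over prefixes and raising an error on ambiguous prefixes are assumptions modelled on the click documentation's alias example, and resolve_alias is a hypothetical name.

```python
from typing import Optional, Sequence


def resolve_alias(name: str, commands: Sequence[str]) -> Optional[str]:
    """Map a (possibly abbreviated) command name to a full command name."""
    # An exact name always wins, even if it is also a prefix of another command.
    if name in commands:
        return name
    matches = sorted(cmd for cmd in commands if cmd.startswith(name))
    if not matches:
        return None
    if len(matches) > 1:
        raise ValueError(f"ambiguous command {name!r}: {', '.join(matches)}")
    return matches[0]


commands = ["add", "addto", "list", "remove", "rename"]
print(resolve_alias("rem", commands))  # remove
print(resolve_alias("add", commands))  # add
```

With these commands, "re" is rejected because it is a prefix of both remove and rename, which is what "as long as it is unique" requires.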
- class papis.commands.CommandPluginLoaderGroup(name: str | None = None, commands: MutableMapping[str, Command] | Sequence[Command] | None = None, invoke_without_command: bool = False, no_args_is_help: bool | None = None, subcommand_metavar: str | None = None, chain: bool = False, result_callback: Callable[[...], Any] | None = None, **kwargs: Any)[source]
A click.Group that loads additional commands from entry points.
Commands in this group are loaded using get_commands(). By default, commands from the COMMAND_NAMESPACE_NAME namespace are loaded. Additional external scripts that are found in the path and match the EXTERNAL_COMMAND_REGEX are also loaded.
To overwrite this behavior, create a subclass and override the command_plugins property to load commands from other namespaces.
- command_class[source]
alias of FullHelpCommand
- property command_plugins: dict[str, CommandPlugin]
A mapping of command names to available command plugins.
- list_commands(ctx: click.Context) list[str][source]
List all matched commands in the command folder and in path.
>>> group = CommandPluginLoaderGroup()
>>> rv = group.list_commands(None)
>>> len(rv) > 0
True
- class papis.commands.CommandPlugin(command_name: str, path: str | None, entrypoint: EntryPoint | None)[source]
A papis command plugin or script.
These plugins are made available through the main papis command-line interface as subcommands.
- entrypoint: EntryPoint | None
The module the plugin is imported from if it is an entry point.
- papis.commands.load_command(cmd: CommandPlugin) Command | None[source]
Load a command based on the given information in cmd.
If the command is an entry point, then it is loaded through the mechanisms in importlib.metadata.
If the command is an external executable, it is wrapped as an external command and all command-line arguments are passed through to it.
- Returns:
a click.Command that can be used by a click.Group.
- papis.commands.get_external_scripts(matcher: Pattern[str] | None = None) dict[str, CommandPlugin][source]
Get a mapping of all external scripts that should be registered with Papis.
An external script is an executable that can be found in the papis.config.get_scripts_folder() folder or in the user’s PATH. The scripts are recognized by their file name using the provided matcher regular expression. For example, the default Papis commands are recognized using EXTERNAL_COMMAND_REGEX.
- Returns:
a mapping of scripts that have been found.
- papis.commands.get_command_plugins(namespace: str) dict[str, CommandPlugin][source]
Get a mapping of entry points that should be registered as Papis commands.
- Parameters:
namespace – a namespace for the entry point commands to retrieve.
- Returns:
a mapping of plugins that have been found.
- papis.commands.get_commands(namespace: str, *, extern_matcher: Pattern[str] | Literal[False] | None = None) dict[str, CommandPlugin][source]
Get a mapping of all commands that should be registered with Papis.
This includes the results from get_external_scripts() and get_command_plugins(). Entry point-based scripts take priority, so if an external script with the same name is found it is silently ignored.
- Parameters:
namespace – a namespace for the entry point commands to retrieve.
extern_matcher – a regular expression that matches file names of external commands (see get_external_scripts()). If False, no external commands are loaded.
- Returns:
a mapping of scripts that have been found.
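The priority rule amounts to a dictionary merge in which the entry points are applied last. In this sketch, merge_commands is hypothetical and plain strings stand in for CommandPlugin objects.

```python
def merge_commands(external: dict, entry_points: dict) -> dict:
    """Combine both sources; on a name clash the entry point wins."""
    # In a dict merge, later mappings override earlier ones.
    return {**external, **entry_points}


external = {"add": "/usr/local/bin/papis-add", "zotero": "/usr/local/bin/papis-zotero"}
entry_points = {"add": "papis.commands.add:cli"}
print(merge_commands(external, entry_points))
# {'add': 'papis.commands.add:cli', 'zotero': '/usr/local/bin/papis-zotero'}
```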
papis.config
- papis.config.get_general_settings_name() str[source]
Get the section name of the general settings.
>>> get_general_settings_name()
'settings'
- class papis.config.Configuration[source]
A subclass of configparser.ConfigParser with custom defaults.
This class automatically reads the configuration file and imports any required scripts. If no file exists, a default one is created.
Use get_configuration() to instantiate this class instead of calling it directly.
- papis.config.get_default_settings() dict[str, dict[str, Any]][source]
Get the default settings for all non-user variables.
Additional user variables can be registered using register_default_settings() and will be included in this dictionary.
- papis.config.register_default_settings(settings_dictionary: dict[str, dict[str, Any]]) None[source]
Register configuration settings into the global configuration registry.
Notice that you can define sections or global options. For instance, let us suppose that a script called foobar defines some configuration options. The script might define the following:
    import papis.config
    options = {"foobar": {"command": "open"}}
    papis.config.register_default_settings(options)
which can then be accessed globally through:
    papis.config.get("command", section="foobar")
- Parameters:
settings_dictionary – a dictionary of configuration settings, where the first level of keys defines the sections and the second level defines the actual configuration settings.
- papis.config.get_config_home() str[source]
- Returns:
a (platform dependent) base directory relative to which user specific configuration files should be stored.
- papis.config.get_config_folder() str[source]
Get the main configuration folder.
- Returns:
a (platform dependent) folder where the configuration files are stored, e.g. $HOME/.config/papis on POSIX platforms.
- papis.config.get_config_file() str[source]
Get the main configuration file.
- Returns:
the path of the main configuration file, which by default is in get_config_folder(), but can be overwritten using set_config_file().
- papis.config.get_configpy_file() str[source]
Get the main Python configuration file.
This is a file that will get automatically eval()ed if it exists and allows for more dynamic configuration.
- Returns:
the path of the main Python configuration file, which by default is in get_config_folder().
- papis.config.get_scripts_folder() str[source]
- Returns:
the folder where additional scripts are stored, which by default is in get_config_folder().
- papis.config.set(key: str, value: Any, section: str | None = None) None[source]
Set a key in the configuration.
- Parameters:
key – the name of the key to set.
value – the value to set it to, which can be any value understood by the Configuration.
section – the name of the section to set the key in.
- papis.config.general_get(key: str, section: str | None = None, data_type: type | None = None) Any | None[source]
Get the value for a given key in section.
This function is a bit more general than the get method of Configuration (see configparser.ConfigParser.get()). In particular, it supports:
  * providing the key and section, in which case it will retrieve the key from that section directly;
  * providing a key of the format <section>-<key> with no section specified, in which case the full key is expected to be in the general settings section or a library section.
The priority of the search is given by:
  1. the key is retrieved from a library section,
  2. the key is retrieved from the given section, if any,
  3. the key is retrieved from the general section.
- Parameters:
key – a key in the configuration file to retrieve.
section – a section from which to retrieve the key, which defaults to get_general_settings_name().
data_type – the data type that should be expected for the value of the variable.
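The lookup order can be approximated with the standard configparser module. This sketch assumes that the library and general sections store section-specific keys as <section>-<key>, that the library section is named "papers" and that the general section is "settings"; lookup_with_priority is a hypothetical helper, not papis.config.general_get itself.

```python
import configparser
from typing import Optional


def lookup_with_priority(config: configparser.ConfigParser, key: str,
                         section: Optional[str] = None,
                         libname: str = "papers",
                         general: str = "settings") -> Optional[str]:
    """Look a key up in the library, given and general sections, in that order."""
    # Assumption: library/general sections spell section-specific keys "<section>-<key>".
    full_key = key if section is None else f"{section}-{key}"
    candidates = [(libname, full_key)]
    if section is not None:
        candidates.append((section, key))
    candidates.append((general, full_key))
    for sec, opt in candidates:
        # has_option() is simply False for sections that do not exist.
        if config.has_option(sec, opt):
            return config.get(sec, opt)
    return None


config = configparser.ConfigParser()
config.read_string("""
[settings]
add-open = False
editor = vim

[papers]
dir = ~/papers
add-open = True

[foobar]
command = open
""")
print(lookup_with_priority(config, "add-open"))  # True (library section wins)
print(lookup_with_priority(config, "editor"))    # vim
```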
- papis.config.get(key: str, section: str | None = None) Any | None[source]
Retrieve a general value (can be None) from the configuration file.
- papis.config.getint(key: str, section: str | None = None) int | None[source]
Retrieve an integer value from the configuration file.
>>> set("something", 42)
>>> getint("something")
42
- papis.config.getfloat(key: str, section: str | None = None) float | None[source]
Retrieve a floating-point value from the configuration file.
>>> set("something", 0.42)
>>> getfloat("something")
0.42
- papis.config.getboolean(key: str, section: str | None = None) bool | None[source]
Retrieve a boolean value from the configuration file.
>>> set("add-open", True)
>>> getboolean("add-open")
True
- papis.config.getstring(key: str, section: str | None = None) str[source]
Retrieve a string value from the configuration file.
>>> set("add-open", "hello world")
>>> getstring("add-open")
'hello world'
- papis.config.getformatpattern(key: str, section: str | None = None) FormatPattern[source]
Retrieve a format pattern from the configuration file.
Format patterns use the FormatPattern class to define a string that should be formatted by a specific Formatter. For configuration options, such strings can be defined in the configuration file as:
    [settings]
    multiple-authors-format = {au[family]}, {au[given]}
    multiple-authors-format.python = {au[family]}, {au[given]}
    multiple-authors-format.jinja2 = {{ au[family] }}, {{ au[given] }}
i.e. like key[.formatter]. If no formatter is provided in the key name, the default formatter is used, as defined by formatter. Formatters are checked in alphabetical order and the last one is returned.
>>> from papis.strings import FormatPattern
>>> set("add-open", "hello world")
>>> r = getformatpattern("add-open")
>>> r.formatter
'python'
>>> set("add-open", FormatPattern("python", "hello world"))
>>> r = getformatpattern("add-open")
>>> r.formatter
'python'
>>> set("add-open.python", "hello world")
>>> r = getformatpattern("add-open")
>>> r.formatter
'python'
- papis.config.getlist(key: str, section: str | None = None) list[str][source]
Retrieve a list value from the configuration file.
This function uses eval() to turn the string present in the configuration file into a Python list. This can be unsafe if the list contains unknown code.
>>> set("tags", "['a', 'b', 'c']")
>>> getlist("tags")
['a', 'b', 'c']
- Raises:
papis.exceptions.UnexpectedSettingTypeError – Whenever the parsed syntax is either not a valid python object or not a valid python list.
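If the eval() risk mentioned above is a concern in your own code, ast.literal_eval is a safer way to parse the same kind of string, since it only accepts Python literals. This swaps in a different technique from what getlist() does, and the hypothetical parse_list_setting raises ValueError rather than papis.exceptions.UnexpectedSettingTypeError.

```python
import ast


def parse_list_setting(value: str) -> list:
    """Parse a list-valued setting without executing arbitrary code."""
    try:
        # literal_eval only evaluates literals (strings, numbers, lists, ...).
        result = ast.literal_eval(value)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"not a valid Python literal: {value!r}") from exc
    if not isinstance(result, list):
        raise ValueError(f"expected a list, got {type(result).__name__}")
    return result


print(parse_list_setting("['a', 'b', 'c']"))  # ['a', 'b', 'c']
```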
- papis.config.get_configuration() -> Configuration
Get the configuration object.
If no configuration has been initialized, it initializes one. Only one configuration per process should ever be configured.
- papis.config.merge_configuration_from_path(path: str | None, configuration: Configuration) -> None
Merge information of a configuration file found in path into configuration.
- Parameters:
path – a path to a configuration file.
configuration – an existing Configuration object.
- papis.config.set_lib_from_name(libname: str) -> None
Set the current library from a name.
- Parameters:
libname – the name of a library in the configuration file or a path to an existing folder that should be considered a library.
- papis.config.get_lib_from_name(libname: str) -> Library
Get a library object from a name.
- Parameters:
libname – the name of a library in the configuration file or a path to an existing folder that should be considered a library.
- papis.config.get_lib() -> Library
Get the current library.
If no library has been set before, the default library will be retrieved. If the PAPIS_LIB environment variable is defined, this is the library name (or path) that will be taken as a default.
- papis.config.get_libs_from_config(config: Configuration) -> list[str]
Get all library names from the given configuration.
In the configuration file, any sections that contain a "dir" or a "dirs" key are considered to be libraries.
- papis.config.reset_configuration() -> Configuration
Resets the existing configuration and returns a new one without any user settings.
- papis.config.escape_interp(path: str) -> str
Escape paths added to the configuration file.
By default, papis.config.Configuration enables string interpolation in the key values (e.g. using key = %(other_key)s-suffix). Any paths added to the configuration should then be escaped so that they do not interfere with the interpolation.
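Both get_libs_from_config() and escape_interp() come down to standard configparser behaviour. The sketch below assumes that Configuration behaves like a plain configparser.ConfigParser with its default BasicInterpolation (the %(other_key)s syntax above is that interpolation's syntax); library_names() and escape_percent() are hypothetical helpers, not the papis implementations.

```python
import configparser
from typing import List


def library_names(config: configparser.ConfigParser) -> List[str]:
    # A section counts as a library if it defines a "dir" or a "dirs" key.
    return [
        name
        for name in config.sections()
        if config.has_option(name, "dir") or config.has_option(name, "dirs")
    ]


def escape_percent(path: str) -> str:
    # Double every "%" so BasicInterpolation reads it back as a literal "%".
    return path.replace("%", "%%")


config = configparser.ConfigParser()  # BasicInterpolation is the default
config.read_string("""
[settings]
default-library = papers

[papers]
dir = ~/Documents/papers

[books]
dirs = ['~/books', '~/ebooks']
""")
print(library_names(config))  # ['papers', 'books']

# A path containing "%" must be escaped before it is stored.
config["papers"]["dir"] = escape_percent("~/Documents/100%-papers")
print(config["papers"]["dir"])  # ~/Documents/100%-papers

try:
    config["papers"]["notes"] = "~/Documents/100%-notes"
except ValueError:
    # BasicInterpolation rejects a lone "%" as invalid interpolation syntax.
    print("unescaped path rejected")
```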
papis.database
- papis.database.get_database(library_name: str | None = None) -> Database
Get the database for the library library_name.
If library_name is None, then the current database is retrieved from papis.config.get_lib(). The given library name must exist in the configuration file or it should be a path to a directory containing Papis documents (see papis.config.get_lib_from_name()).
- Returns:
the caching database for the given library. The same database is returned on repeated calls to this function.
- papis.database.get_all_query_string() -> str
Get the default query string for the current database.
- papis.database.clear_cached() -> None
Clear cached databases.
After this function is called, all subsequent calls to get_database() will recreate the database for the given library.
- papis.database.base.get_cache_file_name(libpaths: str) -> str
Create a cache file name out of the path of a given directory.
- Parameters:
libpaths – folder names to be used as a seed for the cache name.
- Returns:
a name for the cache file specific to libpaths.
>>> get_cache_file_name('path/to/my/lib')
'a8c689820a94babec20c5d6269c7d488-lib'
>>> get_cache_file_name('papers')
'a566b2bebc62611dff4cdaceac1a7bbd-papers'
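The doctest outputs above consist of a 32-character hexadecimal prefix followed by the basename of libpaths. Assuming the prefix is an MD5 hex digest of the full path (which has exactly that length), the naming scheme can be sketched as follows; cache_file_name() is a hypothetical helper and the sketch does not claim to reproduce the exact values shown.

```python
import hashlib
import os


def cache_file_name(libpaths: str) -> str:
    # Hash the full path so different libraries never share a cache file,
    # then append the basename so the file is recognizable on disk.
    digest = hashlib.md5(libpaths.encode()).hexdigest()
    return f"{digest}-{os.path.basename(libpaths)}"


print(cache_file_name("path/to/my/lib"))
print(cache_file_name("papers"))
```

The hash only serves as a stable, collision-resistant name here; it is not used for any security purpose.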
- papis.database.base.get_cache_file_path(libpaths: str) -> str
Get the full path to the cache file.
- Parameters:
libpaths – folder names of the library, used to construct the name of the cache file (see get_cache_file_name()).
- class papis.database.base.Database(library: Library | None = None)
Abstract base class for Papis caching database backends.
- abstractmethod get_backend_name() -> str
Get the name of the database backend.
This name has to match the one used in the database-backend setting of the configuration file.
- abstractmethod get_cache_path() -> str
Get the path to the database cache file (or directory).
- abstractmethod get_all_query_string() -> str
Get the default query string that will match all documents.
- abstractmethod initialize() -> None
Initialize the caching database backend.
This can involve creating any necessary directories, opening files, etc. This function should be called in the constructor of the database class, as needed.
- abstractmethod clear() -> None
Clear the database by removing all files and directories.
After clearing the database, calling initialize() may be necessary to ensure that it is in the correct state.
- abstractmethod update(document: Document) -> None
Replace an existing document in the database.
- abstractmethod query(query_string: str) -> list[Document]
Find a document in the database by the given query_string.
The query string can have a more complex syntax based on the database backend.
- abstractmethod query_dict(query: dict[str, str]) -> list[Document]
Find a document in the database that matches the keys in query.
papis.database.cache
- papis.database.cache.filter_documents(documents: list[Document], search: str = '') -> list[Document]
Filter documents based on the search string.
- Parameters:
documents – a list of documents to filter.
search – a search string that will be parsed by parse_query().
- Returns:
a list of filtered documents.
>>> document = papis.document.from_data({'author': 'einstein'})
>>> len(filter_documents([document], search="einstein")) == 1
True
>>> len(filter_documents([document], search="author : ein")) == 1
True
>>> len(filter_documents([document], search="title : ein")) == 1
False
- papis.database.cache.match_document(document: Document, search: re.Pattern[str], match_format: AnyString | None = None, doc_key: str | None = None) -> re.Match[str] | None
Match a document’s keys to a given search pattern.
See MatcherCallable.
>>> from papis.docmatcher import get_regex_from_search as regex
>>> document = papis.document.from_data({'author': 'einstein'})
>>> match_document(document, regex('e in'), '{doc[author]}') is None
False
>>> match_document(document, regex('ee in'), '{doc[author]}') is None
True
>>> match_document(document, regex('einstein'), '{doc[title]}') is None
True
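A matcher following the MatcherCallable contract can be sketched with plain dictionaries and str.format in place of papis documents and papis.format.format(). This is a hypothetical re-implementation for illustration; the patterns are written by hand in the shape produced by get_regex_from_search(), and returning None when the format fails mirrors the {doc[title]} doctest above.

```python
import re
from typing import Any, Dict, Optional


def match_document(
    document: Dict[str, Any],
    search: "re.Pattern[str]",
    match_format: Optional[str] = None,
    doc_key: Optional[str] = None,
) -> Optional["re.Match[str]"]:
    if doc_key is not None:
        # A specific key is the more precise choice, so it takes priority.
        if doc_key not in document:
            return None
        text = str(document[doc_key])
    else:
        try:
            # Render the document with a str.format pattern such as "{doc[author]}".
            text = (match_format or "").format(doc=document)
        except (KeyError, IndexError, AttributeError, ValueError):
            # A missing field makes the format fail, so the match fails too.
            return None
    return search.match(text)


document = {"author": "einstein"}
print(match_document(document, re.compile(".*e.*in.*", re.I), "{doc[author]}") is None)
print(match_document(document, re.compile(".*ee.*in.*", re.I), "{doc[author]}") is None)
print(match_document(document, re.compile(".*einstein.*", re.I), "{doc[title]}") is None)
print(match_document(document, re.compile(".*ein.*", re.I), doc_key="author") is None)
```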
- class papis.database.cache.PickleDatabase(library: Library | None = None)
A caching database backend for Papis based on pickle.
- get_backend_name() -> str
Get the name of the database backend.
This name has to match the one used in the database-backend setting of the configuration file.
- initialize() -> None
Initialize the caching database backend.
This can involve creating any necessary directories, opening files, etc. This function should be called in the constructor of the database class, as needed.
- clear() -> None
Clear the database by removing all files and directories.
After clearing the database, calling initialize() may be necessary to ensure that it is in the correct state.
- query(query_string: str) -> list[Document]
Find a document in the database by the given query_string.
The query string can have a more complex syntax based on the database backend.
papis.database.whoosh
This is the Whoosh interface to Papis.
For future Papis developers here are some considerations.
Whoosh works with 3 main objects, the Index, the Writer and the Schema.
The indices are stored in a subfolder of get_cache_home().
The name of the indices folders is similar to the cache files of the papis
cache database.
Once the Index is created in the mentioned folder, a Schema is initialized,
which is a declaration of the data prototype of the database, or the
definition of the table in SQL parlance. This is controlled by the
Papis configuration through the whoosh-schema-prototype setting. For instance
if the database is supposed to only contain the key fields
[author, title, year, tags], then the whoosh-schema-prototype
string should look like the following:
{
"author": TEXT(stored=True),
"title": TEXT(stored=True),
"year": TEXT(stored=True),
"tags": TEXT(stored=True),
}
where all the fields are explained in the Whoosh documentation.
After this Schema is created, the folders of the library are traversed
and the documents are added to the database. When adding documents, only the
keys in the schema are stored. This means that, e.g., if publisher is not in
the schema you will not be able to search for the publisher through a query.
- papis.database.whoosh.WHOOSH_FOLDER_FIELD = 'papis-folder'
Field name used to store the document main folder in the Whoosh database.
- class papis.database.whoosh.WhooshDatabase(library: Library | None = None)
- get_backend_name() -> str
Get the name of the database backend.
This name has to match the one used in the database-backend setting of the configuration file.
- initialize() -> None
Initialize the caching database backend.
This can involve creating any necessary directories, opening files, etc. This function should be called in the constructor of the database class, as needed.
- clear() -> None
Clear the database by removing all files and directories.
After clearing the database, calling initialize() may be necessary to ensure that it is in the correct state.
- query(query_string: str) -> list[Document]
Find a document in the database by the given query_string.
The query string can have a more complex syntax based on the database backend.
papis.docmatcher
- class papis.docmatcher.ParseResult(search: str, pattern: re.Pattern[str], doc_key: str | None)
Result from parsing a search string using parse_query().
For example, a search string such as "author:einstein" will result in:
r = ParseResult(search="einstein", pattern=<...>, doc_key="author")
- pattern: Pattern[str]
A regex pattern constructed from the search using get_regex_from_search().
- class papis.docmatcher.MatcherCallable(*args, **kwargs)
A callable typing.Protocol used to match a document for a given search.
- __call__(document: Document, search: re.Pattern[str], match_format: AnyString | None = None, doc_key: str | None = None) -> Any
Match a document’s keys to a given search pattern.
The matcher can decide whether the match_format or the doc_key takes priority when matching against the given pattern in search. If possible, doc_key should be given priority as the more specific choice.
- Parameters:
search – a regex pattern to match the query against (see ParseResult.pattern).
match_format – a format pattern (see papis.format.format()) to match against.
doc_key – a specific key in the document to match against.
- Returns:
None if the match fails and anything else otherwise.
- class papis.docmatcher.DocumentMatcher(search: str, query: list[ParseResult], match_format: FormatPattern, matcher: MatcherCallable)
A class that can be used to match documents to a query.
- query: list[ParseResult]
The query resulting from parse_query().
- match_format: FormatPattern
A format that is used to match a document against.
- matcher: MatcherCallable
A callable used to match a document to the query using the match_format.
- papis.docmatcher.make_document_matcher(search: str, *, matcher: MatcherCallable | None = None, match_format: AnyString | None = None) -> Callable[[Document], Document | None]
Create a callable that can be used to match documents against the given search query.
>>> from papis.document import from_data
>>> doc = from_data({'title': 'einstein'})
>>> matcher = make_document_matcher('einste')
>>> matcher(doc) is not None
True
>>> matcher = make_document_matcher('heisenberg')
>>> matcher(doc) is not None
False
>>> matcher = make_document_matcher('title : ein')
>>> matcher(doc) is not None
True
- Parameters:
matcher – a callable used to match the documents. This defaults to match_document().
match_format – a format used to match against the query. This defaults to the match-format setting.
- papis.docmatcher.get_regex_from_search(search: str) -> Pattern[str]
Creates a default regex from a search string.
>>> get_regex_from_search(' ein 192 photon').pattern
'.*ein.*192.*photon.*'
>>> get_regex_from_search('{1234}').pattern
'.*\\{1234\\}.*'
- Parameters:
search – a valid search string.
- Returns:
a regular expression representing the search string, which is properly escaped and allows for multiple spaces.
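The two doctests above can be reproduced by escaping each whitespace-separated word with re.escape() and joining the pieces with .*. The helper name regex_from_search() is hypothetical, and compiling with re.IGNORECASE is an assumption about how the real function matches.

```python
import re


def regex_from_search(search: str) -> "re.Pattern[str]":
    # str.split() without arguments collapses runs of whitespace, so extra
    # spaces in the search string do not produce empty words.
    words = [re.escape(word) for word in search.split()]
    # Allow arbitrary text before, between and after the escaped words.
    return re.compile(".*" + ".*".join(words) + ".*", re.IGNORECASE)


print(regex_from_search(" ein 192 photon").pattern)
print(regex_from_search("{1234}").pattern)
```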
- papis.docmatcher.parse_query(query_string: str) -> list[ParseResult]
Parse a query string using pyparsing.
The query language implemented by this function for Papis supports strings of the form:
'hello author : Einstein title: "Fancy Title: Part 1" tags'
which will result in:
results = [
    ParseResult(search="hello", pattern=<...>, doc_key=None),
    ParseResult(search="Einstein", pattern=<...>, doc_key="author"),
    ParseResult(search="Fancy Title: Part 1", pattern=<...>, doc_key="title"),
    ParseResult(search="tags", pattern=<...>, doc_key=None),
]
We can see there that constructs of the form "key:value", with a colon as separator, are recognized and the part before the colon is used as the document key. A colon can be escaped by enclosing the term in quotes. Otherwise, each individual word in the search query will give another ParseResult. Each search term can contain additional regex characters.
>>> print(parse_query('hello author : einstein'))
[['hello'], ['author', 'einstein']]
>>> print(parse_query(''))
[]
>>> print(
...     parse_query(
...         '"hello world whatever :" tags : \'hello ::::\''))
[['hello world whatever :'], ['tags', 'hello ::::']]
>>> print(parse_query('hello'))
[['hello']]
- Parameters:
query_string – a search string to parse into a structured format.
- Returns:
a list of parsing results for each token in the query string.
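papis builds this grammar with pyparsing. As a rough, dependency-free stand-in, a single regular expression can split a query into (search, doc_key) pairs; tokenize_query() is hypothetical, returns plain tuples rather than ParseResult objects and is only meant to handle the examples shown above, not every edge case of the real grammar.

```python
import re
from typing import List, Optional, Tuple

# An optional "key :" prefix followed by a quoted string or a bare word.
_TOKEN = re.compile(r"""(?:([\w-]+)\s*:\s*)?("[^"]*"|'[^']*'|[^\s"']+)""")


def tokenize_query(query: str) -> List[Tuple[str, Optional[str]]]:
    """Split a query into (search, doc_key) pairs; doc_key is None for bare terms."""
    results = []
    for match in _TOKEN.finditer(query):
        key, value = match.group(1), match.group(2)
        # Quotes only protect spaces and colons; drop them from the search text.
        if len(value) >= 2 and value[0] in "\"'" and value[-1] == value[0]:
            value = value[1:-1]
        results.append((value, key))
    return results


print(tokenize_query('hello author : Einstein title: "Fancy Title: Part 1" tags'))
```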
papis.document
Module defining the main document type.
- papis.document.DocumentLike: TypeAlias = 'Document | dict[str, Any]'
A union of types that can be converted to a document.
- papis.document.EmptyKeyConversion = {'action': None, 'key': None}
A default KeyConversion.
- class papis.document.KeyConversionPair(from_key, rules)
- rules: list[KeyConversion]
A list of KeyConversion key mapping rules used to rename and post-process the from_key and its value.
- papis.document.keyconversion_to_data(conversions: Sequence[KeyConversionPair], data: dict[str, Any], keep_unknown_keys: bool = False) -> dict[str, Any]
Function to convert between dictionaries.
This can be used to define a fixed set of translation rules between, e.g., JSON data obtained from a website API and standard papis key names and formatting. The implementation is completely generic.
For example, consider the simple dictionary:
data = {"id": "10.1103/physrevb.89.140501"}
which contains the DOI of a document with the wrong key. We can then write the following rules:
conversions = [
    KeyConversionPair("id", [
        {"key": "doi", "action": None},
        {"key": "url", "action": lambda x: "https://doi.org/{}".format(x)},
    ])
]
new_data = keyconversion_to_data(conversions, data)
to rename the "id" key to the standard "doi" key used by papis and to construct a "url" key from it. Any number of such rules can be written, depending on the complexity of the incoming data. Note that any errors raised when applying an action are silently ignored and the corresponding key is skipped.
- Parameters:
conversions – a sequence of KeyConversionPairs used to convert the data.
data – a dict to be converted according to conversions.
keep_unknown_keys – if True, unknown keys from data are kept in the resulting dictionary. Otherwise, only keys from conversions are present.
- Returns:
a new dict containing the entries from data converted according to conversions.
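A generic converter along the lines described above can be sketched as follows. This is a hypothetical re-implementation with local stand-ins for KeyConversion and KeyConversionPair; in particular, treating a None key as "keep the original name" (as EmptyKeyConversion suggests) and the exact handling of keep_unknown_keys are assumptions.

```python
from typing import Any, Callable, Dict, List, NamedTuple, Optional, Sequence

# Hypothetical stand-in: {"key": str | None, "action": callable | None}.
KeyConversion = Dict[str, Any]


class KeyConversionPair(NamedTuple):
    from_key: str
    rules: List[KeyConversion]


def keyconversion_to_data(
    conversions: Sequence[KeyConversionPair],
    data: Dict[str, Any],
    keep_unknown_keys: bool = False,
) -> Dict[str, Any]:
    new_data: Dict[str, Any] = {}
    for pair in conversions:
        if pair.from_key not in data:
            continue
        for rule in pair.rules:
            # Assumption: a None key keeps the original name and a None
            # action keeps the value unchanged.
            key = rule.get("key") or pair.from_key
            action: Optional[Callable[[Any], Any]] = rule.get("action")
            try:
                value = action(data[pair.from_key]) if action else data[pair.from_key]
            except Exception:
                # Errors raised by an action are ignored and the key is skipped.
                continue
            new_data[key] = value

    if keep_unknown_keys:
        known = {pair.from_key for pair in conversions}
        for key, value in data.items():
            if key not in known and key not in new_data:
                new_data[key] = value

    return new_data


conversions = [
    KeyConversionPair("id", [
        {"key": "doi", "action": None},
        {"key": "url", "action": lambda x: "https://doi.org/{}".format(x)},
    ]),
    KeyConversionPair("year", [{"key": None, "action": int}]),
]
data = {"id": "10.1103/physrevb.89.140501", "year": "unknown", "note": "x"}
print(keyconversion_to_data(conversions, data))
print(keyconversion_to_data(conversions, data, keep_unknown_keys=True))
```

Here int("unknown") raises, so the year is silently dropped, matching the note on ignored errors.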
- papis.document.author_list_to_author(data: dict[str, Any], separator: str | None = None, multiple_authors_format: AnyString | None = None) -> str
Convert a list of authors into a single author string.
This uses the multiple-authors-separator and the multiple-authors-format settings to construct the concatenated authors.
- Parameters:
data – a dict that contains an "author_list" key to be converted into a single author string.
>>> author1 = {"given": "Some", "family": "Author"}
>>> author2 = {"given": "Other", "family": "Author"}
>>> author_list_to_author({"author_list": [author1, author2]})
'Author, Some and Author, Other'
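Assuming the default separator is " and " and the default format is the python pattern {au[family]}, {au[given]} (both consistent with the doctest above and the configuration example under getformatpattern()), the concatenation reduces to one str.format call per author followed by a join. join_authors() is a hypothetical helper, not the papis function.

```python
from typing import Any, Dict, List


def join_authors(
    author_list: List[Dict[str, Any]],
    separator: str = " and ",
    author_format: str = "{au[family]}, {au[given]}",
) -> str:
    # Format each author with a str.format pattern, then join the results.
    return separator.join(author_format.format(au=author) for author in author_list)


author1 = {"given": "Some", "family": "Author"}
author2 = {"given": "Other", "family": "Author"}
print(join_authors([author1, author2]))
```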
- papis.document.guess_authors_separator(authors: str) -> str
Attempt to determine the separator for various non-BibTeX author lists.
- Parameters:
authors – author string to determine the separator for.
- Returns:
a regex that can be used to split the authors string.
For example:
>>> s = "Sanger, F. and Nicklen, S. and Coulson, A. R."
>>> assert guess_authors_separator(s) == "and"
>>> s = "Fabian Sanger and Steven Nicklen and Alexander R. Coulson"
>>> assert guess_authors_separator(s) == "and"
>>> s = "Fabian Sanger, Steven Nicklen, Alexander R. Coulson"
>>> assert guess_authors_separator(s) == ","
>>> s = "Fabian Sanger, and Steven Nicklen, and Alexander R. Coulson"
>>> import re
>>> sep = guess_authors_separator(s)
>>> assert re.match(sep, ", and")
>>> s = "Dagobert Duck and von Beethoven, Ludwig and Ford, Jr., Henry"
>>> assert guess_authors_separator(s) == "and"
>>> s = "Turing, A. M."
>>> assert guess_authors_separator(s) == "and"
- papis.document.split_author_name(author: str) -> dict[str, Any]
Split an author name into a given and family name.
This uses bibtexparser.customization.splitname() to split the name and determine the first and last names of the author. Note that this is just a heuristic and can give incorrect results for certain author names.
- Parameters:
author – a string containing an author name.
- Returns:
a dict with the family and given name of the author.
- papis.document.split_authors_name(authors: str | list[str], separator: str | None = None) -> list[dict[str, Any]]
Convert a list of authors to a fixed format.
Uses split_author_name() to construct the individual authors and the separator to split the authors in the list.
- Parameters:
authors – a list of author names, where each entry can consist of multiple authors separated by separator.
separator – a separator for entries in authors that contain multiple authors. If None, a separator is guessed using guess_authors_separator().
- class papis.document.DocHtmlEscaped(doc: Document)
Small helper class to escape HTML elements in a document.
>>> DocHtmlEscaped(from_data({"title": '> >< int & "" "'}))['title']
'&gt; &gt;&lt; int &amp; &quot;&quot; &quot;'
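Assuming the escaping is done with the standard html.escape() (which produces exactly the entities in the doctest above), a minimal read-only view can be sketched as follows; HtmlEscapedView is a hypothetical name and not part of papis.

```python
import html
from typing import Any, Dict


class HtmlEscapedView:
    """Read-only view that HTML-escapes every value looked up in a document."""

    def __init__(self, doc: Dict[str, Any]) -> None:
        self.doc = doc

    def __getitem__(self, key: str) -> str:
        # html.escape also escapes quotes by default (quote=True).
        return html.escape(str(self.doc[key]))


view = HtmlEscapedView({"title": '> >< int & "" "'})
print(view["title"])
```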
- class papis.document.Document(folder: str | None = None, data: dict[str, Any] | None = None)
An abstract document in a papis library.
This class inherits from a standard dict and implements some additional functionality.
- html_escape
A DocHtmlEscaped instance that can be used to escape keys in the document for use in HTML documents.
- set_folder(folder: str) -> None
Set the document’s main folder.
This also updates the location of the info file and other attributes. Note, however, that it will not load any data from the given folder even if it contains another info file (see from_folder() for this functionality).
- Parameters:
folder – an absolute path to a new main folder for the document.
- get_main_folder() -> str | None
- Returns:
the root path in the filesystem where the document is stored, if any.
- get_main_folder_name() -> str | None
- Returns:
the folder name of the document, i.e. the basename of the path returned by get_main_folder().
- get_info_file() -> str
- Returns:
path to the info file, which can also be an empty string if no such file has been created.
- get_files() -> list[str]
Get the files linked to the document.
The files in a document are stored relative to its main folder. If no main folder is set on the document (see set_folder()), then this function will not return any files. To retrieve the relative file paths only, access doc["files"] directly.
- Returns:
a list of absolute file paths in the document’s main folder, if any.
- papis.document.from_data(data: dict[str, Any]) -> Document
Construct a Document from a dictionary.
- Parameters:
data – a dictionary to be made into a new document.
- papis.document.from_folder(folder_path: str) -> Document
Construct a Document from a folder.
- Parameters:
folder_path – absolute path to a valid papis folder.
- papis.document.to_json(document: Document) -> str
Export the document to JSON.
- Returns:
a JSON string corresponding to all the entries in the document.
- papis.document.to_dict(document: Document) -> dict[str, Any]
Convert a document back into a standard dict.
- Returns:
a dict corresponding to all the entries in the document.
- papis.document.dump(document: Document) -> str
Dump the document into a string.
The format of the string is not fixed and is meant to be used to display the document entries in a consistent way across papis.
- Returns:
a string containing all the entries in the document.
>>> doc = from_data({'title': 'Hello World'})
>>> dump(doc)
'title: Hello World'
- papis.document.delete(document: Document) -> None
Delete a document from the filesystem.
This function deletes the main folder of the document (recursively), but it does not delete the in-memory version of the document.
- papis.document.describe(document: Document | dict[str, Any]) -> str
- Returns:
a string description of the current document using the document-description-format setting.
- papis.document.move(document: Document, path: str) -> None
Move the document to a new main folder at path.
This supposes that the document exists in the location document.get_main_folder() and will change the folder in the input document as a result.
- Parameters:
path – absolute path where the document should be moved to. This path is expected not to exist yet and will be created by this function.
>>> doc = from_data({'title': 'Hello World'})
>>> doc.set_folder('path/to/folder')
>>> import tempfile; newfolder = tempfile.mkdtemp()
>>> move(doc, newfolder)
Traceback (most recent call last):
...
FileExistsError: There is already...
- papis.document.sort(docs: Sequence[Document], key: str, reverse: bool = False) -> list[Document]
Sort a list of documents by the given key.
The sort is performed on the key with a priority given to the type of the value. If the key does not exist in the document, this is given the lowest priority and left at the end of the list.
- Parameters:
docs – a sequence of documents.
key – a key in the documents by which to sort.
reverse – if True, the sorting is done in reverse order (descending instead of ascending).
- Returns:
a list of documents sorted by key.
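The description above does not spell out the type priority. The sketch below assumes numbers (including numeric strings such as "1999") sort before all other values, which are compared as strings, and keeps documents without the key at the end regardless of reverse. sort_documents() is hypothetical and operates on plain dicts.

```python
from typing import Any, Dict, List, Sequence, Tuple


def _sort_key(value: Any) -> Tuple[int, float, str]:
    # Numbers (and numeric strings) sort before other values; everything
    # else is compared by its string form.
    try:
        return (0, float(value), "")
    except (TypeError, ValueError):
        return (1, 0.0, str(value))


def sort_documents(
    docs: Sequence[Dict[str, Any]], key: str, reverse: bool = False
) -> List[Dict[str, Any]]:
    present = [doc for doc in docs if key in doc]
    missing = [doc for doc in docs if key not in doc]
    # Documents without the key always stay at the end, even when reversed.
    return sorted(present, key=lambda doc: _sort_key(doc[key]), reverse=reverse) + missing


docs = [{"year": 2001}, {"title": "no year"}, {"year": "1999"}, {"year": "n.d."}]
print([doc.get("year") for doc in sort_documents(docs, "year")])
print([doc.get("year") for doc in sort_documents(docs, "year", reverse=True)])
```

Encoding each value as a (group, number, text) tuple keeps every comparison between values of the same type, so mixed int and str values never raise a TypeError.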
- papis.document.new(folder_path: str, data: dict[str, Any], files: Sequence[str] | None = None) -> Document
Creates a complete document with data and existing files.
The document is saved to the filesystem at folder_path and all the given files are copied over to the main folder.
- Parameters:
folder_path – a main folder for the document.
data – a dict with keys and values to be used as metadata in the document.
files – a sequence of files to add to the document.
- Raises:
FileExistsError – if folder_path already exists.
papis.downloaders
- class papis.downloaders.WebImporter(uri: str = '')
Importer that tries to get data and files from implemented downloaders.
This importer simply calls get_info_from_url() on the given URI.
- classmethod match(uri: str) -> Importer | None
Check if the importer can process the given URI.
For example, an importer that supports links from arXiv can check that the given URI matches using:
re.match(r".*arxiv.org.*", uri)
This can then be used to instantiate and return a corresponding Importer object.
- Parameters:
uri – a URI from which the document metadata should be retrieved.
- Returns:
An importer instance if the match to the URI is successful or None otherwise.
- fetch() -> None
Fetch metadata and files for the given uri.
This method calls fetch_data() and fetch_files() to get all the information available for the document. It is recommended to implement the two methods separately, if possible, for maximum flexibility.
The imported data is stored in ctx and it is not queried again on subsequent calls to this function.
- class papis.downloaders.Downloader(uri: str = '', name: str = '', ctx: Context | None = None, expected_document_extension: str | Sequence[str] | None = None, cookies: dict[str, str] | None = None, priority: int = 1)
A base class for downloader instances implementing common functionality.
In general, downloaders are expected to implement a subset of the methods below, depending on how general they are. A simple downloader could only implement get_bibtex_url() and get_document_url().
- expected_document_extension
A single extension or a list of extensions supported by the downloader. The extensions do not contain the leading dot, e.g. ["pdf", "djvu"].
- priority
A priority given to the downloader. This is used when trying to automatically determine a preferred downloader for a given URL.
- session
A requests.Session that is used for all the requests made by the downloader.
- classmethod match(url: str) -> Downloader | None
Check if the downloader can process the given URL.
For example, a downloader that supports links from the arXiv can check that the given URL matches using:
re.match(r".*arxiv.org.*", url)
This can then be used to instantiate and return a corresponding Downloader object.
- Parameters:
url – a URL from which the document information should be retrieved.
- Returns:
A downloader instance if the match to the URL is successful or None otherwise.
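The match() classmethod pattern can be sketched with a hypothetical downloader class; only match() is shown, and the dot in the regex is escaped so that it matches a literal period (the inline example above leaves it unescaped, which also accepts any character there).

```python
import re
from typing import Optional


class ExampleArxivDownloader:
    """Hypothetical downloader; only the match() pattern is illustrated."""

    def __init__(self, uri: str) -> None:
        self.uri = uri

    @classmethod
    def match(cls, url: str) -> Optional["ExampleArxivDownloader"]:
        # Return a configured instance on success and None otherwise.
        if re.match(r".*arxiv\.org.*", url):
            return cls(url)
        return None


print(ExampleArxivDownloader.match("https://arxiv.org/abs/1234.5678") is not None)  # True
print(ExampleArxivDownloader.match("https://example.com/paper.pdf"))  # None
```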
- fetch() -> None
Fetch metadata and files for the given uri.
This method calls Downloader.fetch_data() and Downloader.fetch_files() to get all the information available for the document. It is recommended to implement the two methods separately, if possible, for maximum flexibility.
The imported data is stored in ctx and it is not queried again on subsequent calls to this function.
- fetch_data() -> None
Fetch metadata for the given URL.
The imported metadata is stored in ctx. To fetch the metadata, the following steps are followed:
Call get_data() to import any scraped metadata.
Call get_bibtex_data() to import any metadata from BibTeX files available remotely.
Note that later steps overwrite any information from earlier steps, i.e. the BibTeX data will take priority.
- fetch_files() -> None
Fetch files from the given uri.
The imported files are stored in ctx. The file is downloaded with download_document() and stored as a temporary file.
- get_bibtex_url() -> str | None
- Returns:
a URL to a valid BibTeX file that can be used to extract metadata about the document.
- get_bibtex_data() -> str | None
Get BibTeX data available at get_bibtex_url(), if any.
- Returns:
a string containing the BibTeX data, which can be parsed.
- download_bibtex() -> None
Download and store the BibTeX data from get_bibtex_url().
Use get_bibtex_data() to access the metadata from the BibTeX URL.
- get_data() -> dict[str, Any]
Retrieve general metadata from the given URL.
This function is meant to be as general as possible and should not contain data imported from BibTeX (use get_bibtex_data() instead). For example, this can be used for web scraping or calling other website APIs to gather metadata about the document.
- get_document_data() -> bytes | None
Get data for the downloaded file that is given by get_document_url().
- Returns:
the bytes (stored in memory) for the downloaded file.
- get_document_extension() -> str
- Returns:
a guess for the extension of get_document_data(). This is based on filetype and uses magic file signatures to determine the type. If no guess is valid, an empty string is returned.
- download_document() -> None
Download and store the file that is given by get_document_url().
Use get_document_data() to access the file binary contents.
- check_document_format() -> bool
Check if the document downloaded by download_document() has a file type supported by the downloader.
If the downloader has no preferred type, then all files are accepted.
- Returns:
True if the document has a supported file type and False otherwise.
- papis.downloaders.get_available_downloaders() -> list[type[Downloader]]
Get all declared downloader classes.
- papis.downloaders.get_matching_downloaders(url: str) -> list[Downloader]
Get downloaders matching the given url.
- Parameters:
url – a URL to match.
- Returns:
a list of downloaders (sorted by priority).
- papis.downloaders.get_downloader_by_name(name: str) -> type[Downloader]
Get a specific downloader by its name.
- Parameters:
name – the name of the downloader. Note that this is the name of the entry point used to define the downloader. In general, this should be the same as its name, but this is not enforced.
- Returns:
a downloader class.
- papis.downloaders.get_info_from_url(url: str, expected_doc_format: str | None = None) -> Context
Get information directly from the given url.
- Parameters:
url – the URL of a resource.
expected_doc_format – an expected document file type that is used to override the file type defined by the chosen downloader.
- papis.downloaders.download_document(url: str, expected_document_extension: str | None = None, cookies: dict[str, Any] | None = None, filename: str | None = None) -> str | None
Download a document from url and store it in a local file.
An appropriate filename is deduced from the HTTP response in most cases. If this is not possible, a temporary file is created instead. To force a specific filename, pass the filename argument.
- Parameters:
url – the URL of a remote file.
expected_document_extension – an expected file extension. If None, then an extension is guessed from the file contents or from the filename.
filename – a file name for the document, regardless of the given URL and extension.
- Returns:
an absolute path to a local file containing the data from url.
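The entry above does not say which parts of the HTTP response are used to deduce the filename. One common source is the Content-Disposition header; the sketch below assumes that header and parses it with email.message.Message.get_filename() from the standard library, falling back to a temporary file. filename_from_header() and local_path_for() are hypothetical helpers, not papis functions.

```python
import os
import tempfile
from email.message import Message
from typing import Optional


def filename_from_header(content_disposition: Optional[str]) -> Optional[str]:
    """Extract the filename parameter of a Content-Disposition header, if any."""
    if not content_disposition:
        return None
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    name = msg.get_filename()
    # Keep only the basename so a header cannot point outside the target folder.
    return os.path.basename(name) if name else None


def local_path_for(content_disposition: Optional[str], extension: Optional[str] = None) -> str:
    name = filename_from_header(content_disposition)
    if name:
        return os.path.join(tempfile.gettempdir(), name)
    # No usable name in the response: fall back to a fresh temporary file.
    fd, path = tempfile.mkstemp(suffix=f".{extension}" if extension else "")
    os.close(fd)
    return path


print(filename_from_header('attachment; filename="paper.pdf"'))
print(local_path_for(None, "pdf"))
```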
papis.exceptions
This module implements custom exceptions used to make the code more readable.
- exception papis.exceptions.UnexpectedSettingTypeError
Exception raised when a configuration setting has an unexpected type.
- exception papis.exceptions.DefaultSettingValueMissing(key: str)
Exception raised when a configuration setting is missing and has no default value.
- exception papis.exceptions.DocumentFolderNotFound(doc: str)
Exception raised when a document has no main folder.
papis.filetype
- papis.filetype.guess_content_extension(content: bytes) -> str | None
Guess the extension from (potential) file contents.
This method attempts to look at known file signatures to determine the file type. This is not always possible, as it is hard to determine a unique type.
- Parameters:
content – contents of a file.
- Returns:
an extension string (e.g. “pdf” without the dot) or None if the file type cannot be determined.
- papis.filetype.guess_document_extension(document_path: str) -> str | None
Guess the extension of a given file at document_path.
- Parameters:
document_path – path to an existing file.
- Returns:
an extension string (e.g. “pdf” without the dot) or None if the file type cannot be determined.
- papis.filetype.get_document_extension(document_path: str) -> str
Get an extension for the file at document_path.
This uses guess_document_extension() and returns a default extension “data” if no specific type can be determined from the file.
- Parameters:
document_path – path to an existing file.
- Returns:
an extension string.
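papis delegates signature detection to the filetype package. Purely as an illustration of the idea, a hand-rolled lookup over a few well-known magic numbers, with the "data" fallback of get_document_extension(), could look like this; the signature table is deliberately tiny and the helper names are hypothetical.

```python
from typing import Optional

# A few well-known magic signatures (leading bytes) mapped to extensions.
_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"AT&TFORM", "djvu"),
    (b"\x89PNG\r\n\x1a\n", "png"),
]


def guess_content_extension(content: bytes) -> Optional[str]:
    for magic, extension in _SIGNATURES:
        if content.startswith(magic):
            return extension
    return None


def get_extension(content: bytes) -> str:
    # Fall back to the generic "data" extension when no signature matches.
    return guess_content_extension(content) or "data"


print(get_extension(b"%PDF-1.7\n"))
print(get_extension(b"hello world"))
```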
papis.format
- papis.strings.AnyString
A union of allowable formatting string types.
- class papis.strings.FormatPattern(formatter: str | None, pattern: str)
A tuple that defines a (formatter, string) pair.
In a configuration file, a format pattern can be defined as:
key = pattern
other_key.formatter = other_pattern
where the first key will use the default formatter and the second key will use the specified formatter. These keys can be read using papis.config.getformatpattern().
- papis.format.FORMATTER_NAMESPACE_NAME = 'papis.format'
Name of the entry point namespace for Formatter plugins.
- exception papis.format.InvalidFormatterError
Deprecated: Use papis.plugin.InvalidPluginTypeError instead.
- exception papis.format.FormatFailedError[source]
An exception that is thrown when a format pattern fails to be interpolated.
This can happen due to lack of data (e.g. missing fields in the document) or invalid format patterns (e.g. passed to the wrong formatter).
- class papis.format.Formatter
A generic formatter that works on templated strings using a document.
- format(fmt: str, doc: DocumentLike, doc_key: str = '', additional: dict[str, Any] | None = None, default: str | None = None) → str
- Parameters:
fmt – a format pattern understood by the formatter.
doc – an object convertible to a document.
doc_key – the name of the document in the format pattern. By default, this falls back to format-doc-name.
additional – a dict of additional entries to pass to the formatter.
default – an optional pattern to use as a default value if the formatting fails. If no default is given, a FormatFailedError will be raised.
- Returns:
a string with all the replacement fields filled in.
- papis.format.get_available_formatters() → list[str]
Get a list of all the available formatter plugins.
- papis.format.get_formatter_by_name(name: str) → Formatter
Initialize and return a formatter plugin.
- Parameters:
name – the name of the desired formatter.
- papis.format.get_cached_formatter(name: str | None = None) → Formatter
A cached variant of get_formatter_by_name().
- Parameters:
name – the name of the desired formatter. By default, this uses the value of formatter.
- papis.format.format(fmt: AnyString, doc: DocumentLike, doc_key: str = '', additional: dict[str, Any] | None = None, default: str | None = None) → str
Format a string using the selected formatter.
This is the user-facing function that should be called when formatting a string. The formatters should not be called directly.
Arguments match those of Formatter.format().
- class papis.format.python.PythonFormatter
Construct a string using a PEP 3101 (str.format based) format pattern.
This formatter is named "python" and can be set using the formatter setting in the configuration file. The format pattern has access to the doc variable, which is always a Document. A pattern using this formatter can look like:
"{doc[year]} - {doc[author_list][0][family]} - {doc[title]}"
Note, however, that according to PEP 3101 some simple formatting is not possible. For example, the following is not allowed:
"{doc[title].lower()}"
and should be replaced with:
"{doc[title]!l}"
The following special conversions are implemented: “l” for str.lower(), “u” for str.upper(), “t” for str.title(), “c” for str.capitalize(), and “y”, which uses slugify (through papis.paths.normalize_path()). Additionally, the following syntax is available to select subsets from a string:
"{doc[title]:1.3S}"
which will select the words[1:3] from the title (words are split by single spaces).
- format(fmt: str, doc: DocumentLike, doc_key: str = '', additional: dict[str, Any] | None = None, default: str | None = None) → str
- Parameters:
fmt – a format pattern understood by the formatter.
doc – an object convertible to a document.
doc_key – the name of the document in the format pattern. By default, this falls back to format-doc-name.
additional – a dict of additional entries to pass to the formatter.
default – an optional pattern to use as a default value if the formatting fails. If no default is given, a FormatFailedError will be raised.
- Returns:
a string with all the replacement fields filled in.
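PEP 3101 conversions and format specifications can be customized by subclassing `string.Formatter`. The sketch below shows one way to obtain the “l”, “u”, “t”, “c” conversions and the `START.STOPS` word slicing described above. It is not the code behind PythonFormatter: the class name `ConversionFormatter` is hypothetical, and the “y” slug conversion is left out.

```python
from __future__ import annotations

import string
from typing import Any


class ConversionFormatter(string.Formatter):
    """A sketch of str.format-style custom conversions (not papis's code)."""

    # Hypothetical conversion table mirroring the "l", "u", "t", "c" flags.
    CONVERSIONS = {
        "l": str.lower,
        "u": str.upper,
        "t": str.title,
        "c": str.capitalize,
    }

    def convert_field(self, value: Any, conversion: str | None) -> Any:
        if conversion in self.CONVERSIONS:
            return self.CONVERSIONS[conversion](str(value))
        # Fall back to the standard "r", "s" and "a" conversions.
        return super().convert_field(value, conversion)

    def format_field(self, value: Any, format_spec: str) -> str:
        # Assumed syntax "START.STOPS": select words[START:STOP].
        if format_spec.endswith("S"):
            start, _, stop = format_spec[:-1].partition(".")
            words = str(value).split(" ")
            return " ".join(words[int(start or 0):int(stop) if stop else None])
        return super().format_field(value, format_spec)
```

For example, `ConversionFormatter().format("{doc[title]:1.3S}", doc={"title": "a study of formatting strings"})` selects the second and third words, giving `"study of"`.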
- class papis.format.jinja.Jinja2Formatter
Construct a string using Jinja2 templates.
This formatter is named "jinja2" and can be set using the formatter setting in the configuration file. The format pattern has access to the doc variable, which is always a Document. A pattern using this formatter can look like:
"{{ doc.year }} - {{ doc.author_list[0].family }} - {{ doc.title }}"
This formatter supports the whole range of Jinja2 control structures and filters, so more advanced string processing is possible. For example, we can titlecase the title using:
"{{ doc.title | title }}"
or give a default value if a key is missing in the document using:
"{{ doc.isbn | default('ISBN-NONE', true) }}"
- env: ClassVar[Any] = None
The jinja2 Environment used by the formatter. This should be obtained with get_environment() (cached) and modified as required (e.g. by adding filters).
- classmethod get_environment(*, force: bool = False) → Any
Construct and cache the jinja2 environment used by the formatter.
The environment is created on the first call to format() and cached for future use. If it should be recreated after that, this function can be called with force set to True.
- Parameters:
force – if True, the environment will be recreated.
- format(fmt: str, doc: DocumentLike, doc_key: str = '', additional: dict[str, Any] | None = None, default: str | None = None) → str
- Parameters:
fmt – a format pattern understood by the formatter.
doc – an object convertible to a document.
doc_key – the name of the document in the format pattern. By default, this falls back to format-doc-name.
additional – a dict of additional entries to pass to the formatter.
default – an optional pattern to use as a default value if the formatting fails. If no default is given, a FormatFailedError will be raised.
- Returns:
a string with all the replacement fields filled in.
papis.git
This module serves as a lightweight interface for git-related functions.
- papis.git.add(path: str, resource: str) → None
Adds changes in the given resource to the git index of the repository at path.
- Parameters:
path – a folder with an existing git repository.
resource – a resource (e.g. an info.yaml file) to add to the index.
- papis.git.commit(path: str, message: str) → None
Commits changes in the path with a message.
- Parameters:
path – a folder with an existing git repository.
message – a commit message.
- papis.git.mv(from_path: str, to_path: str) → None
Renames (moves) the path from_path to to_path.
- Parameters:
from_path – path to be moved (the source).
to_path – destination where from_path is moved. If this is in the same parent directory as from_path, it is a simple rename.
- papis.git.remove(path: str, resource: str, recursive: bool = False, force: bool = True) → None
Remove a resource from the git repository at path.
- Parameters:
path – a folder with an existing git repository.
resource – a resource (e.g. an info.yaml file) to remove from git.
recursive – if True, the given resource is removed recursively.
force – if True, the removal is forced so any errors (e.g. file does not exist) are silently ignored.
- papis.git.add_and_commit_resource(path: str, resource: str, message: str) → None
Adds and commits a single resource.
- Parameters:
path – a folder with an existing git repository.
resource – a resource (e.g. an info.yaml file) to add and commit.
message – a commit message.
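For illustration, adding and committing a single resource amounts to two git invocations. This is a sketch using `subprocess`, not the actual papis.git code, and the helper names are hypothetical. `git -C <path>` runs git as if it had been started inside `path`.

```python
from __future__ import annotations

import subprocess


def add_and_commit_commands(path: str, resource: str, message: str) -> list[list[str]]:
    """Build the git command lines for adding and committing one resource."""
    return [
        # "--" stops option parsing, so a resource name starting with "-" is safe.
        ["git", "-C", path, "add", "--", resource],
        ["git", "-C", path, "commit", "-m", message],
    ]


def add_and_commit(path: str, resource: str, message: str) -> None:
    """Run the commands in order, raising CalledProcessError on failure."""
    for cmd in add_and_commit_commands(path, resource, message):
        subprocess.run(cmd, check=True)
```

Keeping the command construction separate from execution makes it easy to test without a git repository.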
papis.hooks
- papis.hooks.HOOKS_EXTENSION_FORMAT = 'papis.hook.{name}'
Name format of the entry point group for hooks, e.g. papis.hook.on_edit_done.
- papis.hooks.CUSTOM_LOCAL_HOOKS: dict[str, list[Callable[..., None]]] = {}
A dictionary of hooks added with add(). These can be added in config.py or from other places that do not use the entry point framework.
- papis.hooks.run(name: str, *args: Any, **kwargs: Any) → None
Run a hook given by its name.
Additional positional and keyword arguments are passed directly to the hook. If it does not support these arguments, the hook will be skipped.
Hooks are run in the following order:
The hooks defined by an entry point.
The hooks defined in CUSTOM_LOCAL_HOOKS.
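The ordering and argument-checking rules above can be sketched with a plain registry. The dictionaries and function names below are hypothetical stand-ins (entry point discovery is replaced by a dict). The arguments are checked with `inspect.Signature.bind` before calling, so an error raised inside a hook is not mistaken for an incompatible signature.

```python
from __future__ import annotations

import inspect
from collections import defaultdict
from typing import Any, Callable

# Hypothetical stand-ins for the two hook sources described above.
ENTRY_POINT_HOOKS: dict[str, list[Callable[..., None]]] = defaultdict(list)
LOCAL_HOOKS: dict[str, list[Callable[..., None]]] = defaultdict(list)


def add_local_hook(name: str, hook: Callable[..., None]) -> None:
    LOCAL_HOOKS[name].append(hook)


def run_hooks(name: str, *args: Any, **kwargs: Any) -> None:
    """Run entry point hooks first, then local hooks, skipping incompatible ones."""
    for hook in [*ENTRY_POINT_HOOKS[name], *LOCAL_HOOKS[name]]:
        try:
            # Check the call against the hook signature without running it.
            inspect.signature(hook).bind(*args, **kwargs)
        except TypeError:
            continue  # the hook does not accept these arguments: skip it
        hook(*args, **kwargs)
```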
papis.id
- papis.id.ID_KEY_NAME: str = 'papis_id'
Key name used to store the Papis ID. This key name is reserved for use in Papis databases and documents. It can also change in the future, so it is recommended to use this variable instead of hardcoding the name.
- papis.id.compute_an_id(doc: Document, separator: str | None = None) → str
Make an ID for the input document doc.
This is a non-deterministic function if separator is None (a random value is used). For a given value of separator, the result is deterministic.
- Parameters:
doc – a document for which to generate an ID.
separator – a string used to separate the document fields that go into constructing the ID.
- Returns:
a (hexadecimal) ID for the document that is unique with high probability.
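A minimal sketch of the deterministic/random behavior described above, assuming the ID is a hash of the document fields joined by the separator. The hashing scheme (SHA-256 over sorted `key=value` pairs) and the function name are assumptions, not the papis algorithm.

```python
from __future__ import annotations

import hashlib
import secrets
from typing import Any


def compute_id_sketch(doc: dict[str, Any], separator: str | None = None) -> str:
    """Hash the document fields into a hexadecimal ID.

    If separator is None, a random separator is drawn, so repeated calls
    give different IDs; a fixed separator makes the result deterministic.
    """
    if separator is None:
        separator = secrets.token_hex(8)
    # Sort the keys so that the ID does not depend on insertion order.
    parts = [f"{key}={doc[key]}" for key in sorted(doc)]
    # Append the separator so it always influences the hash, even for
    # documents with zero or one field.
    payload = separator.join(parts) + separator
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```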
- papis.id.get(doc: DocumentLike) → str
Get the Papis ID from doc.
This function does additional checking on the ID and can raise an error if it does not exist. If the ID is known to exist, use ID_KEY_NAME directly.
papis.importer
- papis.importer.IMPORTER_NAMESPACE_NAME = 'papis.importer'
Name of the entry point namespace for Importer plugins.
- class papis.importer.Importer(uri: str = '', name: str = '', ctx: Context | None = None)
A base class for Papis importer plugins.
- uri: str
The URI (Uniform Resource Identifier) from which the importer extracts data. This can be a URL, a local or remote file name, an object identifier (e.g. a DOI), etc.
- classmethod match(uri: str) → Importer | None
Check if the importer can process the given URI.
For example, an importer that supports links from arXiv can check that the given URI matches using:
re.match(r".*arxiv.org.*", uri)
This can then be used to instantiate and return a corresponding Importer object.
- Parameters:
uri – a URI from which the document metadata should be retrieved.
- Returns:
An importer instance if the match to the URI is successful or None otherwise.
- classmethod match_data(data: dict[str, Any]) → Importer | None
Check if the importer can process the given metadata.
This method can be used to search for valid URIs inside the data that can then be processed by the importer. For example, if the metadata contains a DOI field, this can be used to import additional information.
- Parameters:
data – a dict with metadata to inspect and match against.
- Returns:
An importer instance if matching metadata is found or None otherwise.
- fetch() → None
Fetch metadata and files for the given uri.
This method calls fetch_data() and fetch_files() to get all the information available for the document. It is recommended to implement the two methods separately, if possible, for maximum flexibility.
The imported data is stored in ctx and it is not queried again on subsequent calls to this function.
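The match/match_data/fetch protocol can be sketched without depending on papis. The class below is hypothetical and does not subclass Importer: `ctx` is a plain dict, the `url` key checked in `match_data` is an assumption, and the fetch methods only store placeholders.

```python
from __future__ import annotations

import re
from typing import Any

# Assumed pattern for arXiv links, with the dot escaped.
ARXIV_RE = re.compile(r".*arxiv\.org.*")


class ArxivLikeImporter:
    """A standalone sketch of the match/match_data/fetch pattern."""

    def __init__(self, uri: str = "") -> None:
        self.uri = uri
        self.ctx: dict[str, Any] = {}

    @classmethod
    def match(cls, uri: str) -> ArxivLikeImporter | None:
        return cls(uri=uri) if ARXIV_RE.match(uri) else None

    @classmethod
    def match_data(cls, data: dict[str, Any]) -> ArxivLikeImporter | None:
        # Look for a field holding a usable URI (here a hypothetical "url").
        url = data.get("url")
        return cls.match(url) if isinstance(url, str) else None

    def fetch_data(self) -> None:
        self.ctx["data"] = {"url": self.uri}  # placeholder for real metadata

    def fetch_files(self) -> None:
        self.ctx["files"] = []  # placeholder for downloaded files

    def fetch(self) -> None:
        # Query only once: later calls reuse what is already in ctx.
        if not self.ctx:
            self.fetch_data()
            self.fetch_files()
```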
- papis.importer.get_importer_by_name(name: str) → type[Importer]
Get an importer class by name.
- papis.importer.get_matching_importers_by_name(name_and_uris: Iterable[tuple[str, str]], *, include_downloaders: bool = False) → list[Importer]
Get importers that match the given names.
This function tries to match the URI using match() for each importer in name_and_uris. All matching importers are then returned, but no data is fetched (see fetch_importers()).
- Parameters:
name_and_uris – an iterable of (name, uri) tuples that describe the importer names and URIs to match them against.
include_downloaders – if True, downloader plugins are also included when matching the given names and URIs.
- papis.importer.get_matching_importers_by_uri(uri: str, *, include_downloaders: bool = False) → list[Importer]
Get importers that match the given URI.
This function tries to match the URI using match() for all known importers. All matching importers are then returned, but no data is fetched (see fetch_importers()).
- Parameters:
include_downloaders – if True, downloader plugins are also included when matching the given URI.
- papis.importer.get_matching_importers_by_doc(doc: DocumentLike, *, include_downloaders: bool = False) → list[Importer]
Get importers that match the given document.
This function tries to match the document using match_data(). All matching importers are then returned, but no data is fetched (see fetch_importers()).
- Parameters:
doc – a dictionary containing document metadata.
include_downloaders – if True, downloader plugins are also included when matching the given document.
- papis.importer.fetch_importers(importers: Iterable[Importer], *, download_files: bool = True) → list[Importer]
Fetch data from the given importers.
- Parameters:
download_files – if True, importers also try to download files (PDFs, etc.) instead of just metadata.
- Returns:
a list of importers that have not failed to fetch their metadata.
- papis.importer.collect_from_importers(importers: Iterable[Importer], *, batch: bool = True, use_files: bool = True) → Context
Collect all data from the given importers.
It is assumed that the importers have called the needed fetch methods, so all data has been downloaded and converted (see fetch_importers()). This function is meant to only do the aggregation.
- Parameters:
batch – if True, overwrite data from previous importers; otherwise, ask the user to manually merge. Note that files are always kept, even if they potentially contain duplicates.
use_files – if True, both metadata and files are collected from the importers.
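The batch aggregation rule (later importers overwrite earlier metadata, files are always kept) can be sketched over plain `(data, files)` pairs. The function name and data layout are assumptions, and the interactive merge used when batch is False is not modelled.

```python
from __future__ import annotations

from typing import Any


def collect_sketch(
    results: list[tuple[dict[str, Any], list[str]]],
    *,
    use_files: bool = True,
) -> tuple[dict[str, Any], list[str]]:
    """Merge (data, files) pairs as in batch mode.

    Later results overwrite keys from earlier ones, while files are
    concatenated, duplicates included.
    """
    data: dict[str, Any] = {}
    files: list[str] = []
    for result_data, result_files in results:
        data.update(result_data)
        if use_files:
            files.extend(result_files)
    return data, files
```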
papis.library
papis.logging
- class papis.logging.ColoramaFormatter(log_format: str, full_tb: bool = False)
A custom logging formatter that uses colorama.
- full_tb: bool
A flag to denote whether a full traceback should be displayed when used with logger.info(..., exc_info=ext).
- papis.logging.quiet(name: str, level: int = 30) → Iterator[None]
Temporarily sets the logging level in the given module to level (WARNING by default).
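A context manager with the behavior described for quiet() can be written with `contextlib.contextmanager`. This sketch restores the previous level on exit; whether papis.logging.quiet() does exactly this is an assumption, and the function name is hypothetical.

```python
from __future__ import annotations

import logging
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def quiet_sketch(name: str, level: int = logging.WARNING) -> Iterator[None]:
    """Temporarily set the level of the logger ``name``, then restore it."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield
    finally:
        # Restore the old level even if the block raised an exception.
        logger.setLevel(previous)
```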
- papis.logging.setup(level: int | str | None = None, color: str | None = None, logfile: str | None = None, verbose: bool | None = None) → None
Set up formatting and handlers for the root level Papis logger.
- Parameters:
level – default logging level (see logging). By default, this takes values from the PAPIS_LOG_LEVEL environment variable and falls back to "INFO".
color – flag to control logging colors. It should be one of ("always", "auto", "no"). By default, this takes values from the PAPIS_LOG_COLOR environment variable and falls back to "auto".
logfile – a path for a file in which to write log messages. By default, this takes values from the PAPIS_LOG_FILE environment variable and falls back to None.
verbose – make logger verbose (including debug information) regardless of the level. By default, this takes values from the PAPIS_DEBUG environment variable and falls back to False.
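The fallback chain for setup() (explicit argument, then environment variable, then default) can be sketched as below. The function name is hypothetical, and how PAPIS_DEBUG is interpreted (here: any non-empty value enables verbose output) is an assumption.

```python
from __future__ import annotations

import os


def resolve_log_settings(
    level: int | str | None = None,
    color: str | None = None,
    logfile: str | None = None,
    verbose: bool | None = None,
) -> tuple[int | str, str, str | None, bool]:
    """Apply the fallback chain: explicit argument, environment, default."""
    if level is None:
        level = os.environ.get("PAPIS_LOG_LEVEL", "INFO")
    if color is None:
        color = os.environ.get("PAPIS_LOG_COLOR", "auto")
    if logfile is None:
        logfile = os.environ.get("PAPIS_LOG_FILE")
    if verbose is None:
        # Assumption: any non-empty PAPIS_DEBUG value enables verbose output.
        verbose = bool(os.environ.get("PAPIS_DEBUG"))
    return level, color, logfile, verbose
```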
papis.notes
This module controls the notes for every Papis document.
- papis.notes.notes_path(doc: Document) → str
Get the path to the notes file corresponding to doc.
If the document does not have attached notes, a filename is constructed (using the notes-name setting) in the document’s main folder.
- Returns:
an absolute filename that corresponds to the attached notes for doc (this file does not necessarily exist).
- papis.notes.notes_path_ensured(doc: Document) → str
Get the path to the notes file corresponding to doc or create it if it does not exist.
If the notes do not exist, a new file is created using notes_path() and filled with the contents of the template given by the notes-template configuration option.
- Returns:
an absolute filename that corresponds to the attached notes for doc.
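The create-if-missing behavior of notes_path_ensured() can be sketched with `pathlib`. Here the notes path and the already rendered template are passed in directly, and the function name and signature are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def ensure_notes_file(notes_path: str, template: str = "") -> str:
    """Create the notes file from a template if it does not exist yet.

    ``notes_path`` stands in for the result of notes_path() and
    ``template`` for the rendered notes-template contents.
    """
    path = Path(notes_path).absolute()
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(template, encoding="utf-8")
    # An existing file is left untouched.
    return str(path)
```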
papis.paths
- papis.paths.unique_suffixes(chars: str | None = None, skip: int = 0) → Iterator[str]
Creates an infinite iterator of suffixes based on chars.
This creates a generator object that iterates over chars to create unique products of increasing cardinality. This is mainly intended to create suffixes for existing strings, e.g. file names, to ensure uniqueness.
- Parameters:
chars – characters to iterate over.
skip – number of suffixes to skip (negative integers are set to 0).
>>> import string
>>> s = unique_suffixes(string.ascii_lowercase)
>>> next(s)
'a'
>>> s = unique_suffixes(skip=3)
>>> next(s)
'd'
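The doctest above is consistent with a generator over `itertools.product` of increasing length. The following sketch reproduces it, assuming `string.ascii_lowercase` as the default characters; it is not necessarily how papis implements unique_suffixes(), and the function name is hypothetical.

```python
from __future__ import annotations

import itertools
import string
from typing import Iterator


def unique_suffixes_sketch(chars: str | None = None, skip: int = 0) -> Iterator[str]:
    """Yield 'a', ..., 'z', 'aa', 'ab', ... (products of increasing length)."""
    if chars is None:
        chars = string.ascii_lowercase  # assumed default, matching the doctest

    def generate() -> Iterator[str]:
        for length in itertools.count(1):
            for combo in itertools.product(chars, repeat=length):
                yield "".join(combo)

    # Negative skip values are clamped to 0, as documented above.
    return itertools.islice(generate(), max(skip, 0), None)
```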
- papis.paths.normalize_path(path: str, *, lowercase: bool | None = None, extra_chars: str | None = None, separator: str | None = None) → str
Clean a path to only contain visible ASCII characters.
This function will create ASCII strings that can be safely used as file names or printed to consoles that do not necessarily support full Unicode.
- Parameters:
lowercase – if True, the resulting string will always be lowercased (defaults to doc-paths-lowercase).
extra_chars – extra characters that are allowed in the output path besides the default ASCII alphanumeric characters (defaults to doc-paths-extra-chars).
separator – word separator used to replace any non-allowed characters in the path (defaults to doc-paths-word-separator).
- Returns:
a cleaned ASCII string.
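One common way to reduce a string to ASCII file-name characters is Unicode NFKD decomposition followed by a regular-expression replacement. This is a swapped-in technique for illustration and papis may use a different approach; the function name is hypothetical, and the default values below are assumptions rather than the configuration defaults.

```python
from __future__ import annotations

import re
import unicodedata


def normalize_sketch(
    path: str,
    *,
    lowercase: bool = True,
    extra_chars: str = "",
    separator: str = "-",
) -> str:
    """Reduce path to ASCII alphanumerics, extra_chars and separator."""
    # NFKD splits accented letters into base letter + combining mark;
    # encoding to ASCII with "ignore" then drops the marks.
    text = unicodedata.normalize("NFKD", path).encode("ascii", "ignore").decode("ascii")
    if lowercase:
        text = text.lower()
    allowed = re.escape(extra_chars)
    # Replace every run of disallowed characters with a single separator.
    text = re.sub(f"[^A-Za-z0-9{allowed}]+", separator, text)
    return text.strip(separator)
```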
- papis.paths.is_relative_to(path: Path | str, other: Path | str) → bool
Check if path is relative to other.
This is equivalent to pathlib.PurePath.is_relative_to().
- Returns:
True if path is relative to the other path.
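PurePath.is_relative_to() only exists since Python 3.9. A minimal equivalent for older versions relies on `PurePath.relative_to()` raising ValueError; the function name is hypothetical.

```python
from __future__ import annotations

from pathlib import PurePath


def is_relative_to_sketch(path: PurePath | str, other: PurePath | str) -> bool:
    """Fallback for Python < 3.9, where PurePath.is_relative_to() is missing."""
    try:
        # relative_to() compares whole path components, not string prefixes.
        PurePath(path).relative_to(other)
    except ValueError:
        return False
    return True
```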
- papis.paths.symlink(src: Path | str, dst: Path | str) → None
Create a symbolic link pointing to src named dst.
This is a simple wrapper around os.symlink() that attempts to give better error messages on different platforms. For example, it offers suggestions for some missing privilege issues.
- Parameters:
src – the existing file that dst points to.
dst – the name of the new symbolic link, pointing to src.
- papis.paths.get_document_file_name(doc: DocumentLike, orig_path: PathLike, suffix: str = '', *, file_name_format: AnyString | Literal[False] | None = None, base_name_limit: int = 150) → str
Generate a file name based on orig_path for the document doc.
This function will generate a file name for the given file path (that does not necessarily exist) based on the document data. If the document data does not provide the necessary keys for file_name_format, then the original path will be preserved.
The resulting path will have the same extension as orig_path and will be modified by normalize_path(). The extension is determined using get_document_extension().
- Parameters:
orig_path – an input file path.
suffix – a suffix to be appended to the end of the new file name.
file_name_format – a format pattern used to construct a new file name from the document data (see papis.format.format()). This value defaults to add-file-name if not provided.
base_name_limit – a maximum character length of the file name. This is important on operating systems or filesystems that do not support long file names.
- Returns:
a new path based on the document data and the orig_path.
- papis.paths.get_document_folder(doc: DocumentLike, dirname: PathLike, *, folder_name_format: AnyString | None = None) → str
Generate a folder name for the document at dirname.
This function uses add-folder-name to generate a folder name for the doc at dirname. If no folder can be constructed from the format, then the document’s papis_id is used instead as a subfolder of dirname. The papis_id is guaranteed to be unique.
- Parameters:
doc – the document used to fill in folder_name_format.
dirname – the base directory in which to generate the document main folder.
folder_name_format – a format to use for the folder name that will be filled in using the given doc. If no format is given, we default to add-folder-name. This format can have additional subfolders.
- Returns:
a folder name for doc with the root at dirname.
- papis.paths.get_document_unique_folder(doc: DocumentLike, dirname: PathLike, *, folder_name_format: AnyString | None = None) → str
A wrapper around get_document_folder() that ensures that the folder is unique by adding suffixes.
- Returns:
a folder name for doc with the root at dirname that does not yet exist on the filesystem.
- papis.paths.download_remote_files(in_document_paths: Iterable[str]) → list[str | None]
Download all remote filepaths that are provided in the document list.
- Parameters:
in_document_paths – a list of filename paths and URLs.
- Returns:
a list of files, where each remote file is replaced with a temporary local file. If there is an error while downloading the remote file, None is used instead.
- papis.paths.rename_document_files(doc: DocumentLike, in_document_paths: Iterable[str], *, allow_remote: bool | None = None, file_name_format: AnyString | Literal[False] | None = None) → list[str]
Rename in_document_paths according to file_name_format and ensure uniqueness.
Uniqueness is required with respect to the files in in_document_paths and those in the doc itself (under the files key). If a repeated file name is found, a suffix is generated using unique_suffixes() and appended to the new file.
- Parameters:
file_name_format – a format pattern used to construct a new file name from the document data (see papis.format.format()). This value defaults to add-file-name if not provided.
allow_remote – if True, in_document_paths can also contain remote URLs, which will be downloaded to local files.
- Returns:
a list of modified file names from in_document_paths that are renamed based on file_name_format and suffixed for uniqueness.
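Suffix-based deduplication can be sketched as follows. The helper names are hypothetical, and placing `-a`, `-b`, ... before the extension is an assumption about the naming scheme, not documented papis behavior.

```python
from __future__ import annotations

import itertools
import os
import string
from typing import Iterable, Iterator


def _suffixes() -> Iterator[str]:
    # 'a', ..., 'z', 'aa', 'ab', ... as produced by unique_suffixes().
    for length in itertools.count(1):
        for combo in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(combo)


def make_unique_names(new_names: Iterable[str], existing: Iterable[str] = ()) -> list[str]:
    """Append '-a', '-b', ... before the extension until each name is unused."""
    taken = set(existing)
    result: list[str] = []
    for name in new_names:
        stem, ext = os.path.splitext(name)
        candidate = name
        suffixes = _suffixes()
        while candidate in taken:
            candidate = f"{stem}-{next(suffixes)}{ext}"
        # Reserve the chosen name so later files in the batch avoid it too.
        taken.add(candidate)
        result.append(candidate)
    return result
```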
papis.pick
- papis.pick.PICKER_NAMESPACE_NAME = 'papis.picker'
Name of the entry point namespace for Picker plugins.
- class papis.pick.Picker
An interface used to select items from a list.
- abstractmethod __call__(items: Sequence[T], header_filter: Callable[[T], str], match_filter: Callable[[T], str], default_index: int = 0) → list[T]
- Parameters:
items – a sequence of items from which to pick a subset.
header_filter – a callable that takes an item from items and returns a string representation shown to the user.
match_filter – a callable that takes an item from items and returns a string representation that is used when searching or filtering the items.
default_index – sets the selected item when the picker is first shown to the user.
- Returns:
a subset of items that were picked.
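A picker plugin implements `__call__` with the signature above. The sketch below mirrors that interface in a standalone abstract base class (it does not import papis) and adds a hypothetical non-interactive picker, of the kind that might be useful in scripts or tests.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Callable, Generic, Sequence, TypeVar

T = TypeVar("T")


class PickerSketch(ABC, Generic[T]):
    """A standalone mirror of the Picker interface (not the papis class)."""

    @abstractmethod
    def __call__(
        self,
        items: Sequence[T],
        header_filter: Callable[[T], str],
        match_filter: Callable[[T], str],
        default_index: int = 0,
    ) -> list[T]: ...


class FirstMatchPicker(PickerSketch[T]):
    """Hypothetical non-interactive picker returning the default item."""

    def __call__(self, items, header_filter, match_filter, default_index=0):
        if not items:
            return []
        # A real picker would show header_filter(item) to the user and use
        # match_filter(item) for searching; this one just returns one item.
        return [items[default_index]]
```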
- papis.pick.get_picker_by_name(name: str) → type[Picker[Any]]
Get a picker by its plugin name.
- papis.pick.pick(items: Sequence[T], header_filter: Callable[[T], str] = str, match_filter: Callable[[T], str] = str, default_index: int = 0, *, picktool: str | None = None) → list[T]
Load a Picker plugin and select a subset of items.
The arguments to this function match those of Picker.__call__(). The specific picker is chosen through the picktool configuration option.
- Returns:
- papis.pick.pick_doc(documents: Sequence[Document], *, header_format_file: str | None = None, header_format: AnyString | None = None, match_format: AnyString | None = None) → list[Document]
Pick from a sequence of documents using pick().
This function uses the header-format-file setting or, if not available, the header-format setting to construct a header_filter for the picker. It also uses the configuration setting match-format to construct a match_filter. These configuration settings can also be passed by argument.
- Parameters:
documents – a sequence of documents.
- Returns:
a subset of documents that was picked.
- papis.pick.pick_subfolder_from_lib(libname: str) → list[str]
Pick subfolders from all existing subfolders in lib.
Note that this includes document folders in lib as well as nested library folders.
- Parameters:
libname – the name of an existing library to search in.
- Returns:
a subset of the subfolders in the library.
papis.plugin
- exception papis.plugin.PluginNotFoundError(namespace: str, name: str)
An error raised when a plugin is not found.
- exception papis.plugin.InvalidPluginTypeError(namespace: str, name: str)
An error raised when the plugin is not the expected type.
- papis.plugin.get_entrypoints(namespace: str) → list[EntryPoint]
- Returns:
a list of available entrypoints in the given namespace.
- papis.plugin.get_entrypoint_by_name(namespace: str, name: str) → EntryPoint | None
Get the entry point called name from the given namespace.
If no such entry point exists, then None is returned. To load the plugin defined by the entry point, use EntryPoint.load().
- papis.plugin.get_plugin_names(namespace: str) → list[str]
- Returns:
a list of available entrypoint names in the given namespace.
papis.sphinx_ext
A collection of Papis-specific Sphinx extensions.
This can be included directly into the conf.py file as a normal extension, i.e.:
extensions = [
...,
"papis.sphinx_ext",
]
It provides a CustomClickDirective for documenting Papis commands and a
PapisConfig directive for documenting Papis configuration values.
These are included by default when the extension is added to the extensions
list in your Sphinx configuration.
- class papis.sphinx_ext.CustomClickDirective(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)
A custom sphinx_click.ClickDirective that removes the automatic title from the generated documentation. Otherwise, it can be used in exactly the same way, e.g.:
.. click:: papis.commands.add:cli
   :prog: papis add
- class papis.sphinx_ext.PapisConfig(name, arguments, options, content, lineno, content_offset, block_text, state, state_machine)
A directive for describing Papis configuration values.
The directive is given as:
.. papis-config:: config-value-name
and has the following optional arguments:
:section: The section in which the configuration value is given. The section defaults to get_general_settings_name().
:type: The type of the configuration value, e.g. a string or an integer. If not provided, the type of the default value is used.
:default: The default value for the configuration value. If not provided, this is taken from the default Papis settings.
It can be used as:
.. papis-config:: info-file
   :default: info.yml
   :type: str
   :section: settings

   This is the file name for where the document metadata should be stored.
   It is a relative path in the document's main folder.
In text, these configuration values can be referenced using standard role references, e.g.:
The document metadata is found in its :confval:`info-file`.
- papis.sphinx_ext.make_link_resolve(github_project_url: str, revision: str) → Callable[[str, dict[str, Any]], str | None]
Create a function that can be used with sphinx.ext.linkcode.
This can be used in the conf.py file as:
linkcode_resolve = make_link_resolve("https://github.com/papis/papis", "main")
- Parameters:
github_project_url – the URL to a GitHub project to which to link.
revision – the revision to point to, e.g. main.
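sphinx.ext.linkcode calls `linkcode_resolve(domain, info)` and expects a URL or None; for Python objects, `info` contains the `module` and `fullname` keys. The sketch below shows one way such a resolver could be built. The path computation relative to the top-level package is an assumption, and the function name is hypothetical.

```python
from __future__ import annotations

import importlib
import inspect
import os
from typing import Any, Callable


def make_link_resolve_sketch(
    github_project_url: str, revision: str
) -> Callable[[str, dict[str, Any]], str | None]:
    """Return a linkcode_resolve(domain, info) callable for sphinx.ext.linkcode."""

    def linkcode_resolve(domain: str, info: dict[str, Any]) -> str | None:
        if domain != "py" or not info.get("module"):
            return None
        try:
            obj: Any = importlib.import_module(info["module"])
            for part in info["fullname"].split("."):
                obj = getattr(obj, part)
            obj = inspect.unwrap(obj)
            filename = inspect.getsourcefile(obj)
            lines, start = inspect.getsourcelines(obj)
            top = importlib.import_module(info["module"].split(".")[0])
        except (ImportError, AttributeError, TypeError, OSError):
            return None
        top_file = getattr(top, "__file__", None)
        if filename is None or top_file is None:
            return None
        # Make the path relative to the directory containing the top-level package.
        root = os.path.dirname(top_file)
        if hasattr(top, "__path__"):  # a package: go one level up
            root = os.path.dirname(root)
        relpath = os.path.relpath(filename, root).replace(os.sep, "/")
        end = start + len(lines) - 1
        return f"{github_project_url}/blob/{revision}/{relpath}#L{start}-L{end}"

    return linkcode_resolve
```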
- papis.sphinx_ext.process_autodoc_missing_reference(app: Sphinx, env: BuildEnvironment, node: pending_xref, contnode: TextElement) → TextElement | None
Fix missing references due to string annotations.
This uses an alias dictionary called papis_missing_reference_aliases that maps each unknown type to a reference type and actual type full name. For example, say that the Document reference is not recognized properly. We know that this object is in the papis.document module as a class. Then, we write:
papis_missing_reference_aliases: dict[str, str] = {
    "Document": "py:class:papis.document.Document",
}
papis.testing
- papis.testing.create_random_file(filetype: str | None = None, prefix: str | None = None, suffix: str | None = None, dir: str | None = None) → str
Create a random file with the correct magic signature.
This function creates random empty files that can be used for testing. It supports creating PDF, EPUB, DjVu or simple text files. These are constructed in such a way that they are recognized by papis.filetype.guess_content_extension().
- Parameters:
filetype – the desired filetype of the result, which can be one of ("pdf", "epub", "djvu", "text").
prefix – a prefix passed to tempfile.NamedTemporaryFile().
suffix – a suffix passed to tempfile.NamedTemporaryFile().
dir – a base directory passed to tempfile.NamedTemporaryFile().
- papis.testing.populate_library(libdir: str) → None
Add temporary documents with random files into the folder libdir.
- Parameters:
libdir – an existing empty library directory.
- class papis.testing.TemporaryConfiguration(prefix: str = 'papis-test-', settings: dict[str, Any] | None = None, overwrite: bool = False)
A context manager used to create a temporary papis configuration.
This configuration is created in a temporary directory and all the required paths are set to point to that directory (e.g. XDG_CONFIG_HOME and XDG_CACHE_HOME). This is meant to be used by tests to create a default environment in which to run.
It can be used in the standard way as:
import papis.config
from papis.testing import TemporaryConfiguration

# Set the configuration option `picktool`
papis.config.set("picktool", "fzf")

with TemporaryConfiguration() as config:
    # In this block, it is back to its default value
    value = papis.config.get("picktool")
    assert value == "papis"
- libdir: str
When entering the context manager, this will contain the directory of a temporary library to run tests on. The library is unpopulated by default.
- configdir: str
When entering the context manager, this will contain the config directory used by Papis.
- configfile: str
When entering the context manager, this will contain the config file used by Papis.
- prefix
Prefix for the temporary directory created for the test.
- class papis.testing.TemporaryLibrary(settings: dict[str, Any] | None = None, use_git: bool = False, populate: bool = True)
A context manager used to create a temporary papis configuration with a library.
This extends TemporaryConfiguration with more support for creating and maintaining a temporary library. This can be used by tests that specifically require handling documents in a library.
- use_git
If True, a git repository is created in the library directory.
- populate
If True, the library is prepopulated with a set of documents that contain random files and keys, which can be used for testing.
- class papis.testing.PapisRunner(**kwargs: Any)
A wrapper around click.testing.CliRunner.
- invoke(cli: click.Command, args: Sequence[str], **kwargs: Any) → click.testing.Result
A simple wrapper around the click.testing.CliRunner.invoke() method that does not catch exceptions by default.
- class papis.testing.ResourceCache(cachedir: str)
A class that handles retrieving local and remote resources for tests from default folders.
This class mainly exists to test importers and downloaders that require getting a remote resource and testing it against results of the papis converters.
It can be controlled by the PAPIS_UPDATE_RESOURCES environment variable, which takes the values:
"none": no resources are downloaded or updated (default).
"remote": remote resources are downloaded and the on-disk files are updated (used in get_remote_resource()).
"local": local resources are updated with the results of the papis conversion (used in get_local_resource()).
"both": both local and remote resources are updated.
Resources can then be retrieved as:
# Call some function that retrieves and converts remote data
local = papis.arxiv.get_data(...)

# Check that the expected cached resource matches the result
expected_local = cache.get_local_resource("resources/test.json", local)
assert local == expected_local
- cachedir
The location of the resource directory.
- session
A requests.Session used to download remote resources.
- get_remote_resource(filename: str, url: str, force: bool = False, params: dict[str, str] | None = None, headers: dict[str, str] | None = None, cookies: dict[str, str] | None = None) → bytes
Retrieve a remote resource from the resource cache.
If force is True, filename does not exist, or PAPIS_UPDATE_RESOURCES is set to one of ("remote", "both"), then the resource is downloaded from the remote location at url. Otherwise, it is retrieved from the locally cached version at filename.
- Parameters:
filename – a file where to store the remote resource.
url – a remote URL from which to retrieve the resource.
force – if True, force updating the resource cached at filename.
params – additional params passed to requests.get().
headers – additional headers passed to requests.get().
cookies – additional cookies passed to requests.get().
- get_local_resource(filename: str, data: Any, force: bool = False) → Any
Retrieve a local resource from the resource cache.
If force is True, filename does not exist, or PAPIS_UPDATE_RESOURCES is set to one of ("local", "both"), then the local resource is updated using data. Otherwise, it is retrieved from the locally cached version at filename.
- Parameters:
filename – a file where to store the local resource.
data – data that is stored in the resource when it is updated.
force – if True, force updating the resource cached at filename.
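The update rule shared by get_remote_resource() and get_local_resource() can be summarized in one predicate. This is a sketch with a hypothetical function name; only the environment variable and its values come from the description above.

```python
from __future__ import annotations

import os


def should_update(kind: str, filename: str, force: bool = False) -> bool:
    """Decide whether a cached resource should be refreshed.

    kind is "remote" or "local"; PAPIS_UPDATE_RESOURCES defaults to "none".
    """
    mode = os.environ.get("PAPIS_UPDATE_RESOURCES", "none")
    return force or not os.path.exists(filename) or mode in (kind, "both")
```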
- papis.testing.tmp_config(request: SubRequest) -> Iterator[TemporaryConfiguration]
A fixture that creates a TemporaryConfiguration.
Additional keyword arguments can be passed using the config_setup marker:
    @pytest.mark.config_setup(overwrite=True)
    def test_me(tmp_config: TemporaryConfiguration) -> None:
        ...
- papis.testing.tmp_library(request: SubRequest) -> Iterator[TemporaryLibrary]
A fixture that creates a TemporaryLibrary.
Additional keyword arguments can be passed using the library_setup marker:
    @pytest.mark.library_setup(use_git=False)
    def test_me(tmp_library: TemporaryLibrary) -> None:
        ...
- papis.testing.resource_cache(request: SubRequest) -> ResourceCache
A fixture that creates a ResourceCache.
Additional keyword arguments can be passed using the resource_setup marker:
    @pytest.mark.resource_setup(cachedir="resources")
    def test_me(resource_cache: ResourceCache) -> None:
        ...
papis.utils
- class papis.utils.A
Invariant typing.TypeVar, alias of TypeVar('A').
- class papis.utils.B
Invariant typing.TypeVar, alias of TypeVar('B').
- papis.utils.get_session() -> requests.Session
Create a requests.Session for papis.
This session has the expected User-Agent (see user-agent), proxy (see downloader-proxy) and other settings used for papis. It is recommended to use it instead of creating a new requests.Session at every call site.
- papis.utils.parmap(f: Callable[[A], B], xs: Iterable[A], np: int | None = None) -> list[B]
Apply the function f to all elements of xs.
When available, this function uses the multiprocessing module to apply the function in parallel. This can noticeably improve performance when xs has many elements, but the overhead of starting processes can also make it slower than a sequential map().
The number of processes can also be controlled using the PAPIS_NP environment variable. Setting this variable to 0 disables the use of multiprocessing on all platforms.
- Parameters:
f – a callable to apply to a list of elements.
xs – an iterable of elements to apply the function f to.
np – number of processes to use when applying the function f in parallel. This value defaults to PAPIS_NP or os.cpu_count().
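A simplified, stdlib-only sketch of the behaviour described for parmap(). parmap_sketch() is hypothetical: the order in which np, PAPIS_NP and os.cpu_count() are consulted is an assumption based on the parameter description above, and papis may also skip multiprocessing on some platforms.

```python
import multiprocessing
import os
from typing import Callable, Iterable, List, Optional, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def parmap_sketch(f: Callable[[A], B], xs: Iterable[A],
                  np: Optional[int] = None) -> List[B]:
    """Hypothetical simplified stand-in for papis.utils.parmap."""
    if np is None:
        # Assumed fallback order: PAPIS_NP, then the number of CPUs
        env_np = os.environ.get("PAPIS_NP")
        np = int(env_np) if env_np is not None else (os.cpu_count() or 1)

    if np == 0:
        # A value of 0 disables multiprocessing: plain sequential map
        return list(map(f, xs))

    # Workers receive f by pickling, so it must be a module-level callable
    with multiprocessing.Pool(processes=np) as pool:
        return pool.map(f, xs)
```

On platforms that start workers with the spawn method (Windows, and macOS by default), the calling code should sit under an `if __name__ == "__main__":` guard.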
- papis.utils.run(cmd: Sequence[str], wait: bool = True, env: dict[str, Any] | None = None, cwd: str | None = None) -> None
Run a given command with subprocess.
This is a simple wrapper around subprocess.Popen with custom defaults used to call papis commands.
- Parameters:
cmd – a sequence of arguments to run, where the first entry is expected to be the command name and the remaining entries its arguments.
wait – if True, wait for the process to finish; otherwise, detach the process and return immediately.
env – a mapping that defines additional environment variables for the child process.
cwd – the working directory in which to run the command.
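A rough sketch of what a Popen wrapper with these parameters might look like. run_sketch() is hypothetical, and layering env on top of os.environ is an assumption about what "additional environment variables" means in practice.

```python
import os
import subprocess
from typing import Any, Dict, Optional, Sequence


def run_sketch(cmd: Sequence[str], wait: bool = True,
               env: Optional[Dict[str, Any]] = None,
               cwd: Optional[str] = None) -> None:
    """Hypothetical Popen wrapper following the documented parameters of run()."""
    # Assumption: extra variables are layered on top of the current environment
    # and stringified, since subprocess only accepts str values
    child_env = dict(os.environ)
    if env is not None:
        child_env.update({key: str(value) for key, value in env.items()})

    proc = subprocess.Popen(list(cmd), env=child_env, cwd=cwd)
    if wait:
        proc.wait()
    # With wait=False we return immediately; a real detach might also start a
    # new session (e.g. start_new_session=True on POSIX), which is not shown here
```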
- papis.utils.general_open(file_name: str, key: str, default_opener: str | None = None, wait: bool = True) -> None
Open a file with a configured open tool (executable).
- Parameters:
file_name – a file path to open.
key – a key in the configuration file to determine the opener used, e.g. opentool.
default_opener – an existing executable that can be used to open the file given by file_name. By default, the opener given by key, if any, or the default papis opener is used.
wait – if True, wait for the process to finish; otherwise, detach the process and return immediately.
- papis.utils.open_file(file_path: str, wait: bool = True) -> None
Open a file using the configured opentool.
- Parameters:
file_path – a file path to open.
wait – if True, wait for the process to finish; otherwise, detach the process and return immediately.
- papis.utils.get_folders(folder: str) -> list[str]
Get all folders with papis documents inside of folder.
This is the main indexing routine. It looks inside folder and crawls the whole directory structure in search of subfolders containing an info file. The name of the file must match the configured info-name.
- Parameters:
folder – root folder to look into.
- Returns:
List of folders containing an info file.
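The crawl described above can be approximated with os.walk. get_folders_sketch() is hypothetical, and its info_name default of "info.yaml" only stands in for the configured info-name value.

```python
import os
from typing import List


def get_folders_sketch(folder: str, info_name: str = "info.yaml") -> List[str]:
    """Return every directory under *folder* that contains an *info_name* file."""
    folders = []
    # os.walk visits folder itself and every subdirectory below it
    for root, _dirs, files in os.walk(folder):
        if info_name in files:
            folders.append(root)
    return folders
```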
- papis.utils.locate_document_in_lib(document: Document, library: str | None = None, *, unique_document_keys: list[str] | None = None) -> Document
Locate a document in a library.
If unique_document_keys is not given, this function falls back to the unique-document-keys setting to determine if the current document matches any document in the library. The first document for which one of the keys in the list matches exactly will be returned.
- Parameters:
document – the document to search for.
library – the name of a valid Papis library.
unique_document_keys – a list of keys to match when locating a document.
- Returns:
a full document as found in the library.
- Raises:
IndexError – No document found in the library.
- papis.utils.locate_document(document: Document, documents: Iterable[Document]) -> Document | None
Locate a document in a list of documents.
This function uses the unique-document-keys setting to determine if the current document matches any document in the list. The first document for which a key matches exactly will be returned.
- Parameters:
document – the document to search for.
documents – an iterable of existing documents to match against.
- Returns:
a document from documents that matches the given document, or None if no document is found.
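A sketch of the unique-key matching used by locate_document(), with plain dicts standing in for Document objects. The hypothetical locate_document_sketch() takes the key list explicitly instead of reading unique-document-keys, and trying keys in list order (so earlier keys take priority) is an assumption.

```python
from typing import Any, Dict, Iterable, List, Optional

Doc = Dict[str, Any]  # plain dicts stand in for papis.document.Document


def locate_document_sketch(document: Doc, documents: Iterable[Doc],
                           unique_keys: List[str]) -> Optional[Doc]:
    """Return the first document sharing an exact value for one of unique_keys."""
    candidates = list(documents)  # may be iterated once per key
    for key in unique_keys:
        value = document.get(key)
        if not value:
            continue  # missing or empty values never match
        for candidate in candidates:
            if candidate.get(key) == value:
                return candidate
    return None
```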
- papis.utils.folders_to_documents(folders: Iterable[str]) -> list[Document]
Load a list of documents from their respective folders.
- Parameters:
folders – a list of folder paths to load from.
- Returns:
a list of document objects.
- papis.utils.update_doc_from_data_interactively(document: DocumentLike, data: dict[str, Any], data_name: str) -> None
Show a TUI to update the document interactively with fields from data.
- Parameters:
document – a document (or a mapping convertible to a document) which is going to be updated.
data – additional data to select and merge into document.
data_name – an identifier for the data to show in the TUI.
papis.yaml
- papis.yaml.data_to_yaml(yaml_path: str, data: dict[str, Any], *, allow_unicode: bool | None = True) -> None
Save data to yaml_path in the YAML format.
- Parameters:
yaml_path – path to a file.
data – data to write to the file as a YAML document.
- papis.yaml.list_to_path(data: Sequence[dict[str, Any]], filepath: str, *, allow_unicode: bool | None = True) -> None
Save a list of dicts to a YAML file.
- Parameters:
data – a sequence of dictionaries to save as YAML documents.
filepath – path to a file.
- papis.yaml.yaml_to_data(yaml_path: str, raise_exception: bool = False) -> dict[str, Any]
Read a YAML document from yaml_path.
- Parameters:
yaml_path – path to a file.
raise_exception – if True, an exception is raised when loading the data fails. Otherwise, only a log message is emitted.
- Returns:
a dict containing the data from the YAML document.
- Raises:
ValueError – if the document cannot be loaded due to YAML parsing errors.
- papis.yaml.yaml_to_list(yaml_path: str, raise_exception: bool = False) -> list[dict[str, Any]]
Read a list of YAML documents.
This is analogous to yaml_to_data(), but uses yaml.load_all to read multiple documents (see PyYAML docs).
- Parameters:
yaml_path – path to a file containing YAML documents.
raise_exception – if True, an exception is raised when loading the data fails. Otherwise, only a log message is emitted.
- Returns:
a list of dict objects, one for each YAML document in the file.
- Raises:
ValueError – if the documents cannot be loaded due to YAML parsing errors.
papis.commands.doctor
- papis.commands.doctor.FixFn
Callable for automatic doctor fixers. This callable is constructed by a check and is expected to wrap all the required data, so it takes no arguments.
- papis.commands.doctor.CheckFn: TypeAlias = 'Callable[[Document], list[Error]]'
Callable for doctor document checks.
- class papis.commands.doctor.Error(name: str, path: str, payload: str, msg: str, suggestion_cmd: str, fix_action: FixFn | None, doc: Document | None)
A detailed error returned by a doctor check.
- papis.commands.doctor.register_check(name: str, check: CheckFn) -> None
Register a new check.
Registered checks are recognized by papis and can be used by users in their configuration files through doctor-default-checks or on the command line through the --checks flag.
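The registration mechanism can be illustrated with a self-contained registry. Everything below is a hypothetical stand-in: Error is trimmed to three fields, CHECKS replaces the internal papis registry, register_check() and gather_errors() only mimic the documented behaviour, and year_check() is an invented example. With papis installed, such a check would instead be passed to papis.commands.doctor.register_check() and return papis Error objects.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]  # plain dict standing in for papis.document.Document


@dataclass
class Error:
    """Trimmed-down stand-in for papis.commands.doctor.Error."""
    name: str
    payload: str
    msg: str


CheckFn = Callable[[Doc], List[Error]]
CHECKS: Dict[str, CheckFn] = {}  # hypothetical registry


def register_check(name: str, check: CheckFn) -> None:
    # Stand-in for papis.commands.doctor.register_check
    CHECKS[name] = check


def gather_errors(documents: List[Doc], checks: List[str]) -> List[Error]:
    errors: List[Error] = []
    for doc in documents:
        for name in checks:
            check = CHECKS.get(name)
            if check is None:
                continue  # unrecognized checks are skipped
            errors.extend(check(doc))
    return errors


def year_check(doc: Doc) -> List[Error]:
    """Invented example check: flag a missing or non 4-digit year."""
    year = str(doc.get("year", ""))
    if len(year) == 4 and year.isdigit():
        return []
    return [Error(name="year", payload=year, msg="missing or malformed year")]


register_check("year", year_check)
```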
- papis.commands.doctor.files_check(doc: Document) -> list[Error]
Check whether the files of a document actually exist in the filesystem.
- Returns:
a list of errors, one for each file that does not exist.
- papis.commands.doctor.keys_missing_check(doc: Document) -> list[Error]
Check whether the keys provided in the configuration option doctor-keys-missing-keys exist in the document and are non-empty.
- Returns:
a list of errors, one for each missing key.
- papis.commands.doctor.refs_check(doc: Document) -> list[Error]
Check that a ref exists and, if not, try to create one according to the ref-format configuration option.
- Returns:
an error if the reference does not exist or contains invalid characters (as required by BibTeX).
- papis.commands.doctor.duplicated_keys_check(doc: Document) -> list[Error]
Check for duplicated values of the keys given by the doctor-duplicated-keys-keys configuration option.
- Returns:
a list of errors, one for each key with a value that already exists in the documents from the current query.
- papis.commands.doctor.duplicated_values_check(doc: Document) -> list[Error]
Check whether the keys given by doctor-duplicated-values-keys contain any duplicate entries. These keys are expected to be lists of items.
- Returns:
a list of errors, one for each key with a value that has duplicate entries.
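One way to detect duplicate entries in list-valued keys, sketched with a hypothetical find_duplicated_values() helper (not the papis implementation). It compares items with == instead of hashing them, since list entries such as author records may be unhashable dicts.

```python
from typing import Any, Dict, List


def find_duplicated_values(doc: Dict[str, Any], keys: List[str]) -> Dict[str, List[Any]]:
    """Map each list-valued key to the entries that appear more than once."""
    result: Dict[str, List[Any]] = {}
    for key in keys:
        values = doc.get(key)
        if not isinstance(values, list):
            continue  # only list-valued keys are inspected
        seen: List[Any] = []
        dups: List[Any] = []
        for value in values:
            # Linear membership tests work for unhashable items too
            if value in seen and value not in dups:
                dups.append(value)
            seen.append(value)
        if dups:
            result[key] = dups
    return result
```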
- papis.commands.doctor.bibtex_type_check(doc: Document) -> list[Error]
Check that the document type is compatible with BibTeX or BibLaTeX type descriptors.
- Returns:
an error if the types are not compatible.
- papis.commands.doctor.biblatex_type_alias_check(doc: Document) -> list[Error]
Check that the BibLaTeX type of the document is not a known alias.
The aliases are described by bibtex_type_aliases.
- Returns:
an error if the type of the document is an alias.
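A minimal sketch of the alias lookup, using the papis.bibtex.bibtex_type_aliases mapping listed at the top of this reference (copied into a local dict so the example runs without papis). type_alias_fix() is hypothetical and assumes the type is stored under the document's "type" key.

```python
from typing import Any, Dict, Optional

# Copied from papis.bibtex.bibtex_type_aliases as listed in this reference
BIBTEX_TYPE_ALIASES = {
    "conference": "inproceedings",
    "electronic": "online",
    "mastersthesis": "thesis",
    "phdthesis": "thesis",
    "techreport": "report",
    "www": "online",
}


def type_alias_fix(doc: Dict[str, Any]) -> Optional[str]:
    """Return the canonical BibLaTeX type if doc["type"] is an alias, else None."""
    doc_type = doc.get("type")
    if not isinstance(doc_type, str):
        return None
    return BIBTEX_TYPE_ALIASES.get(doc_type)
```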
- papis.commands.doctor.biblatex_key_alias_check(doc: Document) -> list[Error]
Check that no BibLaTeX keys in the document are known aliases.
The aliases are described by bibtex_key_aliases. Note that these keys can also be converted when exporting to BibLaTeX.
- Returns:
an error for each key of the document that is an alias.
- papis.commands.doctor.biblatex_required_keys_check(doc: Document) -> list[Error]
Check that required BibLaTeX keys are part of the document based on its type.
The required keys are described by
papis.bibtex.bibtex_type_required_keys. Note that most BibLaTeX processors will be quite forgiving if these keys are missing.- Returns:
an error for each key of the document that is missing.
- papis.commands.doctor.biblatex_key_convert_check(doc: Document) -> list[Error]
Check if any BibLaTeX keys in the document are incorrectly assigned.
Note that this is a heuristic in most cases, as we cannot always determine allowable values. Implemented checks include:
issue entries that should be number: issue is generally reserved for periodicals (e.g. a “Spring” issue) and is not meant as a short designator for a publication (see Section 2.3.11 of the BibLaTeX manual).
- Returns:
a list of errors, one for each key that appears misassigned.
- papis.commands.doctor.get_key_type_check_keys() -> dict[str, type]
Check the doctor-key-type-keys configuration entry for correctness.
The doctor-key-type-keys configuration entry defines a mapping of keys and their expected types. If the desired type is a list, the doctor-key-type-separator setting can be used to split an existing string (and, similarly, if the desired type is a string, it can be used to join a list of items).
- Returns:
A dictionary mapping key names to types.
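The split/join behaviour of doctor-key-type-separator can be sketched as a small conversion function. convert_with_separator() is hypothetical; stripping whitespace around parts and dropping empty parts are assumptions, not documented papis behaviour.

```python
from typing import Any, Optional


def convert_with_separator(value: Any, expected: type,
                           separator: Optional[str]) -> Any:
    """Hypothetical split/join conversion in the spirit of doctor-key-type-separator."""
    if separator is None:
        return value  # without a separator, no conversion is attempted
    if expected is list and isinstance(value, str):
        # Assumption: parts are stripped and empty parts are dropped
        return [part.strip() for part in value.split(separator) if part.strip()]
    if expected is str and isinstance(value, list):
        return separator.join(str(item) for item in value)
    return value  # other type combinations are left for the check to report
```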
- papis.commands.doctor.key_type_check(doc: Document) -> list[Error]
Check that document keys have the expected types.
- Returns:
a list of errors, one for each key that does not have the expected type (if it exists).
- papis.commands.doctor.html_codes_check(doc: Document) -> list[Error]
Check that the keys in the doctor-html-codes-keys configuration option do not contain any HTML codes like &amp; etc.
- Returns:
a list of errors, one for each key that contains HTML codes.
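One possible way to detect HTML character references, using the standard library html.unescape(): if unescaping changes a value, it contained a reference. keys_with_html_codes() is hypothetical and not necessarily how papis implements the check.

```python
import html
from typing import Any, Dict, List


def keys_with_html_codes(doc: Dict[str, Any], keys: List[str]) -> List[str]:
    """Return the keys whose string value contains HTML character references."""
    found = []
    for key in keys:
        value = doc.get(key)
        # html.unescape decodes references such as "&amp;" or "&#233;", so any
        # change in the string means a reference was present
        if isinstance(value, str) and html.unescape(value) != value:
            found.append(key)
    return found
```

A plain ampersand that does not start a known reference, as in "AT&T", is left unchanged by html.unescape() and is therefore not reported.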
- papis.commands.doctor.html_tags_check(doc: Document) -> list[Error]
Check that the keys in the doctor-html-tags-keys configuration option do not contain any HTML tags like <href> etc.
- Returns:
a list of errors, one for each key that contains HTML tags.
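A regex-based sketch for spotting and stripping HTML tags. HTML_TAG_RE, keys_with_html_tags() and strip_html_tags() are hypothetical; the pattern is a heuristic rather than a full HTML parser, and papis may use a different rule.

```python
import re
from typing import Any, Dict, List

# Heuristic pattern for tags such as <i>, </sub>, <br/> or <a href="...">;
# a bare "<" followed by a space (as in "x < y") is not treated as a tag
HTML_TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*(?:\s[^<>]*)?/?>")


def keys_with_html_tags(doc: Dict[str, Any], keys: List[str]) -> List[str]:
    """Return the keys whose string value appears to contain HTML tags."""
    return [key for key in keys
            if isinstance(doc.get(key), str) and HTML_TAG_RE.search(doc[key])]


def strip_html_tags(text: str) -> str:
    """Remove anything matching HTML_TAG_RE, keeping the enclosed text."""
    return HTML_TAG_RE.sub("", text)
```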
- papis.commands.doctor.string_cleaner_check(doc: Document) -> list[Error]
Check string keys in the document for various errors.
This check goes through all the keys of the document that are known to be strings, according to doctor-key-type-keys, and fixes any obvious errors. For example (not exhaustive):
Double spacing or any repeated whitespace.
Unexpected new line characters.
Weirdly formatted names, e.g. “J R R Tolkien” should be “J. R. R. Tolkien”.
- Returns:
a list of errors, one for each string-based key that has unexpected formatting.
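Two of the cleanups listed above, sketched with regular expressions. clean_string() and fix_initials() are hypothetical helpers whose rules are assumptions; the papis string cleaner may handle more (or different) cases.

```python
import re


def clean_string(value: str) -> str:
    """Collapse newlines and repeated whitespace into single spaces."""
    return re.sub(r"\s+", " ", value).strip()


def fix_initials(name: str) -> str:
    """Add a period after bare capital initials, e.g. "J R R Tolkien"."""
    # A lone capital letter at a word start that is directly followed by
    # whitespace is assumed to be an initial missing its period
    return re.sub(r"\b([A-Z])(?=\s)", r"\1.", name)
```

Initials that already carry a period are not followed directly by whitespace, so fix_initials() leaves them unchanged and can be applied repeatedly.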
- papis.commands.doctor.gather_errors(documents: list[Document], checks: list[str] | None = None) -> list[Error]
Run all checks over the list of documents.
Only checks registered with register_check() are supported and any unrecognized checks are automatically skipped.
- Parameters:
documents – a list of documents to check.
checks – a list of checks to run over the documents. If not provided, the default doctor-default-checks are used.
- Returns:
a list of all the errors gathered from the documents.
- papis.commands.doctor.fix_errors(doc: Document, checks: list[str] | None = None) -> None
Fix errors in doc for the given checks.
This function only applies existing auto-fixers to the document. Many of the existing checks have no auto-fixer, but this can still be used to quickly clean up a document.
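Because a FixFn takes no arguments, a check has to capture everything the fix needs when it builds the Error. The sketch below illustrates that closure pattern with hypothetical names (whitespace_check(), _make_strip_fix(), fix_errors_sketch()) and a trimmed-down Error; it is not the papis implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Doc = Dict[str, Any]  # plain dict standing in for papis.document.Document
FixFn = Callable[[], None]  # no arguments: all data is captured by the check


@dataclass
class Error:
    """Trimmed-down stand-in for papis.commands.doctor.Error."""
    name: str
    msg: str
    fix_action: Optional[FixFn]


def _make_strip_fix(doc: Doc, key: str) -> FixFn:
    # The closure remembers doc and key, so the fixer needs no arguments
    def fix() -> None:
        doc[key] = doc[key].strip()
    return fix


def whitespace_check(doc: Doc, keys: List[str]) -> List[Error]:
    """Hypothetical check flagging string values with leading/trailing spaces."""
    errors = []
    for key in keys:
        value = doc.get(key)
        if isinstance(value, str) and value != value.strip():
            errors.append(Error(name="whitespace",
                                msg=f"'{key}' has surrounding whitespace",
                                fix_action=_make_strip_fix(doc, key)))
    return errors


def fix_errors_sketch(errors: List[Error]) -> None:
    """Apply every available auto-fixer; errors without one are left alone."""
    for error in errors:
        if error.fix_action is not None:
            error.fix_action()
```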
- papis.commands.doctor.process_errors(errors: list[Error], fix: bool = False, explain: bool = False, suggest: bool = False, edit: bool = False) -> None
Process a list of document errors from gather_errors().
- Parameters:
errors – a list of errors, e.g. as returned by gather_errors().
fix – if True, any automatic fixes are applied to the document the error refers to.
explain – if True, a short explanation of the error is shown.
suggest – if True, a short suggestion for manual fixing of the error is shown.
edit – if True, the document is opened for editing.
- papis.commands.doctor.run(doc: Document, checks: list[str] | None = None, fix: bool = True, explain: bool = False, suggest: bool = False, edit: bool = False) -> None
Runner for papis doctor.
It runs all the checks given by the checks argument that have been registered through register_check(). It then proceeds with processing and fixing each error in turn.