Document

Module defining the main document type.

class papis.document.KeyConversion

A dict that contains a key and an action. The key contains the name of a key in another dictionary and the action contains a callable that can pre-process the value.

class papis.document.KeyConversionPair

foreign_key

A string denoting the foreign key (in the input data).

list

A list of KeyConversion dictionaries used to rename and post-process the foreign_key and its value.

class papis.document.SortPriority(value)

An enumeration.

class papis.document.KeyConversion(_typename, _fields=None, /, **kwargs)

class papis.document.KeyConversionPair(foreign_key, list)

foreign_key: str

Alias for field number 0

list: List[KeyConversion]

Alias for field number 1

papis.document.keyconversion_to_data(conversions: Sequence[KeyConversionPair], data: Dict[str, Any], keep_unknown_keys: bool = False) -> Dict[str, Any]

Function to convert between dictionaries.

This can be used to define a fixed set of translation rules between, e.g., JSON data obtained from a website API and standard papis key names and formatting. The implementation is completely generic.

For example, we have the simple dictionary

data = {"id": "10.1103/physrevb.89.140501"}

which contains the DOI of a document under the wrong key. We can then write the following rules

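from papis.document import KeyConversionPair, keyconversion_to_data

conversions = [
    KeyConversionPair("id", [
        # keep the value unchanged under the standard "doi" key
        {"key": "doi", "action": None},
        # build a URL from the same value
        {"key": "url", "action": lambda x: "https://doi.org/{}".format(x)}
    ])
]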

new_data = keyconversion_to_data(conversions, data)

to rename the "id" key to the standard "doi" key used by papis and to derive a "url" entry from the same value. Any number of such rules can be written, depending on the complexity of the incoming data. Note that any errors raised while applying the action are silently ignored and the corresponding key is skipped.

Parameters:
  • conversions – a sequence of KeyConversionPairs used to convert the data.

  • data – a dict to be converted according to conversions.

  • keep_unknown_keys – if True, unknown keys from data are kept in the resulting dictionary. Otherwise, only keys from conversions are present.

Returns:

a new dict containing the entries from data converted according to conversions.
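
The conversion rules described above can be sketched in plain Python without importing papis. This is only an illustration of the documented behaviour, not the papis implementation: the name convert_keys is hypothetical, plain tuples stand in for KeyConversionPair, and treating "unknown" as "not listed as a foreign key" is an assumption.

```python
from typing import Any, Dict, List, Sequence, Set, Tuple

# Hypothetical stand-ins for KeyConversion and KeyConversionPair.
Rule = Dict[str, Any]            # {"key": str, "action": callable or None}
Pair = Tuple[str, List[Rule]]    # (foreign_key, list of rules)


def convert_keys(conversions: Sequence[Pair],
                 data: Dict[str, Any],
                 keep_unknown_keys: bool = False) -> Dict[str, Any]:
    """Illustrative sketch of the documented conversion semantics."""
    new_data: Dict[str, Any] = {}
    known_keys: Set[str] = set()

    for foreign_key, rules in conversions:
        known_keys.add(foreign_key)
        if foreign_key not in data:
            continue

        for rule in rules:
            action = rule.get("action")
            try:
                value = data[foreign_key]
                if action is not None:
                    value = action(value)
            except Exception:
                # Errors raised by an action are ignored; the key is skipped.
                continue
            new_data[rule.get("key", foreign_key)] = value

    if keep_unknown_keys:
        # Assumption: "unknown" means "never named as a foreign key".
        for key, value in data.items():
            if key not in known_keys:
                new_data.setdefault(key, value)

    return new_data


data = {"id": "10.1103/physrevb.89.140501", "extra": 1}
conversions = [
    ("id", [
        {"key": "doi", "action": None},
        {"key": "url", "action": lambda x: "https://doi.org/{}".format(x)},
        # int() raises ValueError on a DOI, so "year" is silently skipped.
        {"key": "year", "action": lambda x: int(x)},
    ])
]

result = convert_keys(conversions, data)
kept = convert_keys(conversions, data, keep_unknown_keys=True)
```

In this sketch, result holds only the "doi" and "url" entries, while kept additionally carries over the "extra" entry from the input.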

papis.document.author_list_to_author(data: Dict[str, Any]) -> str

Convert a list of authors into a single author string.

This uses the multiple-authors-separator and the multiple-authors-format configuration settings (see General settings) to construct the concatenated authors.

Parameters:

data – a dict that contains an "author_list" key to be converted into a single author string.

>>> authors = [
...     {"given": "Some", "family": "Author"},
...     {"given": "Other", "family": "Author"}]
>>> author_list_to_author({"author_list": authors})
'Author, Some and Author, Other'
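
The way a format and a separator combine into one string can be sketched as follows. The function name join_authors is hypothetical, and the default values for fmt and separator are assumptions chosen to reproduce the doctest output above; the real values come from the papis configuration.

```python
from typing import Dict, List


def join_authors(author_list: List[Dict[str, str]],
                 fmt: str = "{au[family]}, {au[given]}",
                 separator: str = " and ") -> str:
    """Format each author with ``fmt`` and join the results with ``separator``."""
    return separator.join(fmt.format(au=author) for author in author_list)


authors = [
    {"given": "Some", "family": "Author"},
    {"given": "Other", "family": "Author"},
]
joined = join_authors(authors)
```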

papis.document.split_authors_name(authors: List[str], separator: str = 'and') -> List[Dict[str, Any]]

Convert list of authors to a fixed format.

This uses bibtexparser.customization.splitname() to correctly split and determine the first and last names of an author in the list. Note that this is just a heuristic and can give incorrect results for certain author names.

Parameters:
  • authors – a list of author names, where each entry can consist of multiple authors separated by separator.

  • separator – a separator for entries in authors that contain multiple authors.

class papis.document.DocHtmlEscaped(doc: Document)

Small helper class to escape HTML elements in a document.

>>> DocHtmlEscaped(from_data({"title": '> >< int & "" "'}))['title']
'&gt; &gt;&lt; int &amp; &quot;&quot; &quot;'
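
A similar escaping wrapper can be sketched with the standard library alone. The class name HtmlEscapedView is hypothetical, and returning an empty string for missing keys is an assumption; this is not the papis implementation.

```python
import html
from typing import Any, Dict


class HtmlEscapedView:
    """Hypothetical read-only view that HTML-escapes values on access."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self.data = data

    def __getitem__(self, key: str) -> str:
        # Assumption: missing keys yield an empty string instead of raising.
        return html.escape(str(self.data.get(key, "")))


escaped = HtmlEscapedView({"title": '> >< int & "" "'})["title"]
```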

class papis.document.Document(folder: str | None = None, data: Dict[str, Any] | None = None)

An abstract document in a papis library.

This class inherits from a standard dict and implements some additional functionality.

html_escape

A DocHtmlEscaped instance that can be used to escape keys in the document for use in HTML documents.

has(key: str) -> bool

Check if key is in the document.

set_folder(folder: str) -> None

Set the document’s main folder.

This also updates the location of the info file and other attributes. Note, however, that it will not load any data from the given folder even if it contains another info file (see from_folder() for this functionality).

Parameters:

folder – an absolute path to a new main folder for the document.

get_main_folder() -> str | None
Returns:

the root path in the filesystem where the document is stored, if any.

get_main_folder_name() -> str | None
Returns:

the folder name of the document, i.e. the basename of the path returned by get_main_folder().

get_info_file() -> str
Returns:

path to the info file, which can also be an empty string if no such file has been created.

get_files() -> List[str]

Get the files linked to the document.

The files in a document are stored relative to its main folder. If no main folder is set on the document (see set_folder()), then this function will not return any files. To retrieve the relative file paths only, access doc["files"] directly.

Returns:

a list of absolute file paths in the document’s main folder, if any.
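
The relationship between the relative entries in doc["files"] and the absolute paths returned here can be sketched as below. The helper name absolute_files is hypothetical, and accepting a single string in place of a list is an assumption.

```python
import os
from typing import Any, Dict, List, Optional


def absolute_files(data: Dict[str, Any], main_folder: Optional[str]) -> List[str]:
    """Join each relative file name with the main folder, if one is set."""
    if main_folder is None:
        # Without a main folder there is nothing to resolve against.
        return []

    files = data.get("files", [])
    if isinstance(files, str):
        # Assumption: tolerate a single file name stored as a plain string.
        files = [files]

    return [os.path.join(main_folder, name) for name in files]


doc_data = {"title": "Hello World", "files": ["paper.pdf", "notes.txt"]}
resolved = absolute_files(doc_data, os.path.join(os.sep, "library", "hello"))
unresolved = absolute_files(doc_data, None)
```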

save() -> None

Saves the current document fields into the info file.

load() -> None

Load information from the info file.

papis.document.from_data(data: Dict[str, Any]) -> Document

Construct a Document from a dictionary.

Parameters:

data – a dictionary to be made into a new document.

papis.document.from_folder(folder_path: str) -> Document

Construct a Document from a folder.

Parameters:

folder_path – absolute path to a valid papis folder.

papis.document.to_json(document: Document) -> str

Export the document to JSON.

Returns:

a JSON string corresponding to all the entries in the document.

papis.document.to_dict(document: Document) -> Dict[str, Any]

Convert a document back into a standard dict.

Returns:

a dict corresponding to all the entries in the document.

papis.document.dump(document: Document) -> str

Dump the document into a formatted string.

The format of the string is not fixed and is meant to be used to display the document entries in a consistent way across papis.

Returns:

a string containing all the entries in the document.

>>> doc = from_data({'title': 'Hello World'})
>>> dump(doc)
'title:     Hello World'

papis.document.delete(document: Document) -> None

Delete a document from the filesystem.

This function deletes the main folder of the document (recursively), but it does not delete the in-memory version of the document.

papis.document.describe(document: Document | Dict[str, Any]) -> str
Returns:

a string description of the current document using document-description-format (see General settings).

papis.document.move(document: Document, path: str) -> None

Move the document to a new main folder at path.

This assumes that the document exists at the location document.get_main_folder() and updates the folder of the input document as a result.

Parameters:

path – absolute path where the document should be moved to. This path is expected to not exist yet and will be created by this function.

>>> doc = from_data({'title': 'Hello World'})
>>> doc.set_folder('path/to/folder')
>>> import tempfile; newfolder = tempfile.mkdtemp()
>>> move(doc, newfolder)
Traceback (most recent call last):
...
Exception: There is already...

papis.document.sort(docs: Sequence[Document], key: str, reverse: bool = False) -> List[Document]

Sort a list of documents by the given key.

The sort is performed on the key, with priority given to the type of the value. If the key does not exist in a document, that document is given the lowest priority and placed at the end of the list.

Parameters:
  • docs – a sequence of documents.

  • key – a key in the documents by which to sort.

  • reverse – if True, the sorting is done in reverse order (descending instead of ascending).

Returns:

a list of documents sorted by key.
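
One way to sort by value type first and value second is a tuple sort key, sketched below on plain dicts. The Priority enumeration, its ordering, and the helper names are hypothetical; the actual priorities are defined by SortPriority and may differ.

```python
from enum import IntEnum
from typing import Any, Dict, List, Sequence, Tuple


class Priority(IntEnum):
    """Hypothetical type priorities; lower values sort first."""
    INT = 0
    STR = 1
    OTHER = 2
    MISSING = 3


def sort_key(doc: Dict[str, Any], key: str) -> Tuple[int, int, str]:
    """Build a comparable tuple: (type priority, integer value, string value)."""
    value = doc.get(key)
    if value is None:
        return (Priority.MISSING, 0, "")
    if isinstance(value, int):
        return (Priority.INT, value, "")
    if isinstance(value, str):
        return (Priority.STR, 0, value)
    return (Priority.OTHER, 0, str(value))


def sort_docs(docs: Sequence[Dict[str, Any]], key: str,
              reverse: bool = False) -> List[Dict[str, Any]]:
    # In this sketch, reverse=True reverses the whole ordering, so documents
    # without the key move to the front.
    return sorted(docs, key=lambda doc: sort_key(doc, key), reverse=reverse)


docs = [
    {"title": "b", "year": 2001},
    {"title": "a"},
    {"title": "c", "year": "1999"},
    {"title": "d", "year": 1987},
]
titles = [doc["title"] for doc in sort_docs(docs, "year")]
```

Here the integer years sort first in ascending order, the string year follows, and the document without a "year" key ends up last.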

papis.document.new(folder_path: str, data: Dict[str, Any], files: Sequence[str] | None = None) -> Document

Creates a complete document with data and existing files.

The document is saved to the filesystem at folder_path and all the given files are copied over to the main folder.

Parameters:
  • folder_path – a main folder for the document.

  • data – a dict with keys and values to be used as metadata in the document.

  • files – a sequence of files to add to the document.

Raises:

FileExistsError – if folder_path already exists.
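
The documented steps (create the folder, copy the files, save the metadata) can be sketched with the standard library. The helper name create_document_folder and the JSON file name info.json are assumptions made to keep the example self-contained; they do not describe how papis stores its info file.

```python
import json
import os
import shutil
import tempfile
from typing import Any, Dict, Optional, Sequence


def create_document_folder(folder_path: str,
                           data: Dict[str, Any],
                           files: Optional[Sequence[str]] = None) -> Dict[str, Any]:
    """Create a new folder, copy the files into it and store the metadata."""
    # os.makedirs raises FileExistsError if folder_path already exists.
    os.makedirs(folder_path)

    stored = dict(data)
    stored["files"] = []
    for path in files or []:
        shutil.copy(path, folder_path)
        # Files are recorded relative to the main folder.
        stored["files"].append(os.path.basename(path))

    # Assumption: metadata is written as JSON for illustration only.
    info_file = os.path.join(folder_path, "info.json")
    with open(info_file, "w", encoding="utf-8") as f:
        json.dump(stored, f)

    return stored


root = tempfile.mkdtemp()
source = os.path.join(root, "paper.pdf")
with open(source, "w", encoding="utf-8") as f:
    f.write("placeholder")

target = os.path.join(root, "library", "hello-world")
created = create_document_folder(target, {"title": "Hello World"}, [source])
```

Calling create_document_folder a second time with the same target raises FileExistsError, matching the behaviour documented above.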