The database
Papis stores each document in its own folder inside a library directory, with
all metadata kept in an info.yaml file (see The library structure and
The info.yaml file). The database is a cache of that metadata: instead of reading
every info.yaml file from disk on every command, Papis builds a single index
and keeps it up to date as documents are added, edited, or removed.
To help with this, Papis implements a simple caching system. For each library,
it creates a database (as defined by database-backend) that holds
sufficient relevant information about the documents to avoid such slowdowns and
allow quick access and search for document metadata.
Right now, the following backends are available:
No database:
database-backend = papis
use-cache = False
Simple
pickle-based database:
database-backend = papis
Whoosh based database:
database-backend = whoosh
sqlite3-based database:
database-backend = sqlite
If you plan to have about <1000 documents in your library, the default
papis backend will offer ample performance. However, for larger
libraries, switching to the sqlite backend should work a lot better.
Using the databases
By default, these database files are stored in get_cache_home()
(on Linux, this will be something like ~/.cache/papis). You can
put the files next to your library by using cache-dir:
.. code:: ini
[papers] dir = /path/to/my/papers cache-dir = /path/to/my/papers
When switching database backends, make sure to also update the
default-query-string option to match. This will be used when no
query is provided to match “all” the documents. The value differs per backend:
papisbackend:default-query-string = .(the default)whooshbackend:default-query-string = *sqlitebackend:default-query-string = *
Note that most papis commands will update the cache if they modify the document. For
example, the edit command will let you edit your document’s metadata and, after you
are done editing, it will update the information for the given document in the cache.
Note
The cache is built automatically the first time you run any papis command
against a library. You do not need to initialize it manually.
If you go directly to the document and edit the info file without passing through
the papis edit command, the cache will not be updated and therefore Papis will
not know of these changes, although they will be there. In such cases you will have
to clear the cache manually.
Clearing the cache
To clear the cache for a given library you can use the cache command:
papis cache clear
In order to clear and rebuild the cache (i.e., reset it), you can simply run:
papis cache reset
Disabling the cache
You can disable the cache using the configuration setting use-cache and set
it to False, e.g.:
[settings]
use-cache = False
[books]
# Use cache for books but don't use for the rest of libraries
use-cache = True
Warning
The use-cache option is only used by the papis backend. The other
backends cannot be disabled if they are chosen using database-backend.
Papis backend
Since version v0.3, Papis implements a simple query language to search documents
when using the papis backend. Queries can contain any field of the info file, so
that author:einstein publisher:review will match documents that have author
match with einstein AND publisher match with review in a case-insensitive
fashion.
In general, the query syntax is formed of multiple [key:]"value" matches, where
the key is optional (searches all keys in this case)
and the value can be any string (with optional quotes required to include spaces).
the terms in the search query can be optionally separated by keywords such as
AND(default),ORorNOTto construct complex queries.the terms can also contain regex characters to extend the matching. If those characters should be part of the query (e.g. parentheses), they should be escaped.
Note
Free-form terms in the query, i.e. ones that are not prefixed by a key
name like author:einstein, are only matched against the values in
match-format. For example, if you want to search for Springer,
you need to include {doc[publisher]} in the format pattern.
For illustration, here are some examples:
Open documents where the author key matches ‘albert’ (ignoring case) and year matches ‘05’ (i.e. could be ‘1905’ or ‘2005’):
papis open 'author : albert year : 05'
Add the restriction to the previous search that the usual matching matches the substring ‘licht’ in addition to the previously selected:
papis open 'author : albert year : 05 licht'
This is not to be mixed with the restriction that the key year matches
'05 licht', which will not match any year, i.e.:
papis open 'author : albert year : "05 licht"'
Find documents by either ‘einstein’ or ‘bohr’:
papis open 'author:einstein OR author:bohr'
Find documents about ‘physics’ that are not by ‘einstein’:
papis open 'physics NOT author:einstein'
Use parentheses to group complex logic:
papis open 'author:einstein AND (year:1905 OR year:1915)'
Use regex character classes to match a range of years:
papis open 'year:20[0-1][0-9]'
Use regex anchors to match the beginning of a title:
papis open 'title:^Quantum'
Whoosh backend
Papis can alternatively use the Whoosh library. This backend can have better performance when using large libraries.
Of course, the performance comes at a cost. To achieve more performance, Whoosh needs to create an index with information about the documents. Parsing a user query means going to the index and matching the query to what is found in the index. This means that the index can not in general have all the information that the info file of the documents includes.
In other words, the Whoosh index will store only certain fields from the documents’ info files. The good news is that we can tell Papis exactly which fields we want to index. These flags are
The prototype is for advanced users. If you just want to, say, include the publisher to the fields that you can search in, then you can put:
whoosh-schema-fields = ['publisher']
and you will be able to find documents by their publisher. For example, without this line set for publisher, the query:
papis open publisher:*
will not return anything, since the publisher field is not being stored.
Query language
The Whoosh database uses the Whoosh query language which is much more advanced than the query language in the Papis backend.
The Whoosh query language supports both AND and OR and other wildcards.
For instance:
Find papers by Einstein from 1905, or any paper with “einstein” in the title:
papis open '(author:einstein AND year:1905) OR title:einstein'
Find all papers tagged “physics” or “quantum”:
papis open 'tags:physics OR tags:quantum'
Use a wildcard to find papers whose title starts with “rela”:
papis open 'title:rela*'
You can read more about the Whoosh query language here.
SQLite backend
This backend is similar to the Whoosh backend in the way that it functions. It is expected to be even more performant than the Whoosh backend and it comes with no additional dependencies. It should be a good first choice if you notice your library searches are getting sluggish.
To customize the searchable fields by the sqlite backend, you will also need
to define sqlite-schema-fields. A good default is in place, so this
should not be necessary unless you require complex queries.
Query language
To perform search queries, the sqlite backend uses the Full Text Search (FTS5) functionality. This allows using AND and OR
and various groupings of queries, as expected.
Warning
FTS5 has very limited support for regular expressions and substring matching. The only supported cases are: a prefix wildcard (e.g. “einst*” to match any strings starting with “einst”) and a initial token “^” (to match the first word in a string).
For illustration, here are some examples:
Find papers where the title contains “einstein” (searches all indexed fields):
papis open 'einstein'
Find papers that contain the prefix “einst” (note that FTS will not find “einstein” if given just “einst”, but requires the wildcard “*” to match it):
papis open 'einst*'
Find papers where the author field contains “einstein” and the year field contains “1905”:
papis open 'author : einstein AND year : 1905'
Find papers matching “einstein” or “bohr” anywhere in the indexed fields:
papis open 'einstein OR bohr'
For advanced users, FTS5 also supports NEAR queries, which match documents
where two or more terms appear within a specified number of tokens of each other.
The syntax is NEAR(phrase1 phrase2, N) where N is the maximum number of
tokens allowed between the end of the first phrase and the start of the last
(default: 10 if omitted).
Find papers where “quantum” and “gravity” appear within 5 tokens of each other in the title field:
papis open 'title : NEAR(quantum gravity, 5)'
Find papers where “general” and “relativity” appear close together anywhere in the indexed fields:
papis open 'NEAR(general relativity)'
The FTS5 module in sqlite3 has a lot more functionality for complex
queries that you can read about here.