The database

One of the things that makes Papis interesting is the fact that there can be many backends for the database system, including no database.

Right now there are three types of databases that the user can use:

  • No database:
    database-backend = papis
    use-cache = False
    
  • Simple cache based database:
    database-backend = papis
    
  • Whoosh based database:
    database-backend = whoosh
    

If you just plan to have up to 3000 documents in your library, you will have ample performance with the two first options. However if you’re reaching higher numbers, you’ll probably want to use the Whoosh backend for very good performance.

You can select a database by using the flag database-backend.

Papis database

Without a database, Papis would need to crawl through the library folders and see which subfolders have an info.yaml file. Repeatedly accessing the filesystem like this can be slow on older computers, remotely mounted partitions, etc.

To help with this, Papis implements a simple caching system. For each library, it creates a database (as defined by database-backend) that holds sufficient relevant information about the documents to avoid such slowdowns.

These cache files are stored per default in:

~/.cache/papis/

Notice that most papis commands will update the cache if it has to be the case. For instance the edit command will let you edit your document’s information and after you are done editing it will update the information for the given document in the cache. If you go directly to the document and edit the info file without passing through the papis edit command, the cache will not be updated and therefore Papis will not know of these changes, although they will be there. In such cases you will have to clear the cache.

Clearing the cache

To clear the cache for a given library you can use the command cache:

papis cache clear

In order to clear and rebuild the cache (i.e., reset it), you can simply run:

papis cache reset

Query language

Since version v0.3, Papis implements a query language to search documents. Queries can contain any field of the info file, so that author:einstein publisher:review will match documents that have author match with einstein AND publisher match with review. Note that only the AND filter is implemented in this simple query language and that OR is not supported. If you need this, consider using the Whoosh database.

For illustration, here are some examples:

  • Open documents where the author key matches ‘albert’ (ignoring case) and year matches ‘05’ (i.e. could be ‘1905’ or ‘2005’):

    papis open 'author : albert year : 05'
    
  • Add the restriction to the previous search that the usual matching matches the substring ‘licht’ in addition to the previously selected:

    papis open 'author : albert year : 05 licht'
    

    This is not to be mixed with the restriction that the key year matches '05 licht', which will not match any year, i.e.:

    papis open 'author : albert year : "05 licht"'
    

Disabling the cache

You can disable the cache using the configuration setting use-cache and set it to False, e.g.:

[settings]

use-cache = False

[books]
# Use cache for books but don't use for the rest of libraries
use-cache = True

Whoosh database

Papis can alternatively use the performant Whoosh library.

Of course, the performance comes at a cost. To achieve more performance, a database backend should create an index with information about the documents. Parsing a user query means going to the index and matching the query to what is found in the index. This means that the index can not in general have all the information that the info file of the documents includes.

In other words, the Whoosh index will store only certain fields from the documents’ info files. The good news is that we can tell Papis exactly which fields we want to index. These flags are

The prototype is for advanced users. If you just want to, say, include the publisher to the fields that you can search in, then you can put:

whoosh-schema-fields = ['publisher']

and you will be able to find documents by their publisher. For example, without this line set for publisher, the query:

papis open publisher:*

will not return anything, since the publisher field is not being stored.

Query language

The Whoosh database uses the Whoosh query language which is much more advanced than the query language in the Papis database.

The Whoosh query language supports both AND and OR, for instance:

papis open '(author:einstein AND year:1905) OR title:einstein'

will give papers of einstein in the year 1905 together with all papers where einstein appears in the title.

You can read more about the Whoosh query language here.