Make library card catalogue PDFs with Python scripts

Revision as of 20:56, 9 November 2021

Dependencies

git, software for managing development and versioning repositories of code
an activated python virtual environment
scripts cloned from a git repository
installation of calibrestekje, a python-bindings library
a valid metadata.db file as produced by an existing installation of Calibre

Getting started

If you don't have git on your system, you can install it by following the instructions on this website:

https://git-scm.com/book/en/v2/Getting-Started-Installing-Git

After installing git, you can clone repositories and work on the files on your own computer. There are many other things you can do with git, for example push and pull changes, check the version history, "fork" projects or work on them collaboratively.

First, clone the git repository and change to the new bootleg/ directory:

git clone https://git.xpub.nl/simoon/bootleg.git
cd bootleg/

In the bootleg/ directory, create and activate a python virtual environment. Once it is activated, the prompt in the terminal is prefixed with (venv), which indicates that the virtual environment is active.

python3 -m venv venv
source venv/bin/activate
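
Besides looking at the prompt, the interpreter itself can report whether it is running inside a virtual environment. The following is a minimal sketch using only the standard library; the function name in_virtualenv is made up for this example and is not part of the bootleg scripts.

```python
import sys


def in_virtualenv():
    """Return True when this interpreter runs inside a venv-style environment.

    Inside a virtual environment sys.prefix points at the venv directory,
    while sys.base_prefix still points at the system-wide installation.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

Run it with the same python3 you use for the scripts; if it reports False, the source venv/bin/activate step probably did not take effect in this terminal.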

Then, install dependencies in the python environment.

pip install reportlab calibrestekje pillow markdown html5lib
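
To confirm that the installation worked, you can check that each package is importable from the active environment. This is only a sketch: the helper missing_packages and the REQUIRED mapping are hypothetical names for this example. The mapping is needed because a pip name and an import name can differ (pillow is imported as PIL).

```python
from importlib.util import find_spec

# pip package name -> top-level module name used in "import ..."
REQUIRED = {
    "reportlab": "reportlab",
    "calibrestekje": "calibrestekje",
    "pillow": "PIL",
    "markdown": "markdown",
    "html5lib": "html5lib",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose modules cannot be found in this environment."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("still missing, try: pip install " + " ".join(missing))
    else:
        print("all dependencies found")
```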

With Etherpad

Next, run this command:

python3 reportlab_image_poster.py

This will produce a PDF, and also print a list of the contents. In my case, it produced a 1048-page PDF in seconds, with the title and author of each book on separate pages of a card catalogue.

How it works

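The python script lesssimplelayout.py first imports all the necessary modules and fetches the annotations from the pad. Then, pagewidth and pageheight are set to the dimensions of a landscape A6 page. Next, a SimpleDocTemplate document is created that writes to the output file "text.pdf". Finally, the script loops over every book in the Calibre database and adds a front side (title and authors) and a back side (annotations from the pad, or a blank page) to the document.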

from pprint import pprint
from xml.etree import ElementTree as ET

from reportlab.lib.pagesizes import A6, landscape
from reportlab.pdfgen import canvas
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from calibrestekje import Book, Publisher, init_session
from readfrompad import curl, parse_pad

# fetch the annotations pad as plain text and group its paragraphs by header
paragraphs_by_header = parse_pad(curl("https://pad.xpub.nl/p/bootleg_annotations/export/txt"))
pprint(paragraphs_by_header)

pagewidth, pageheight = landscape(A6)

doc = SimpleDocTemplate("text.pdf", pagesize=landscape(A6),
                        rightMargin=18, leftMargin=18,
                        topMargin=0, bottomMargin=18)

content = []
styles = getSampleStyleSheet()

session = init_session("sqlite:///metadata.db")

for book in session.query(Book).all():
    book_url = "https://hub.xpub.nl/bootleglibrary/book/{}".format(book.id)
    print(book.title)
    print(book_url)
    # print (book.authors)
    
    # c.drawString(10,pageheight-10, book.title)
    # c.showPage()

    # create a paragraph and append content to it - e.g. book.title, book.authors etc
    p = Paragraph('<font size=12>{}</font>'.format(book.title), styles["Italic"])

    content.append(p)
    # content.append(PageBreak())
    content.append(Spacer(1, 12))

    #import ipdb; ipdb.set_trace()

    format_string = '<font size=12>{}</font>'
    all_authors = [author.name for author in book.authors]
    glued_together = format_string.format(", ".join(all_authors))

    # ALTERNATIVE WAY... (without list comprehensions)
    first = True
    author_text = ""
    for author in book.authors:
        if not first:
            author_text += ", "
        author_text += "<font size=12>{}</font>".format(author.name)
        first = False

    #if all_authors==['John Markoff']:
    #	import ipdb; ipdb.set_trace()

    p = Paragraph(glued_together, styles["Normal"])
    content.append(p)
    content.append(PageBreak())
    content.append(Spacer(1, 12))

    # BACK SIDE
    if book_url in paragraphs_by_header:
        print ("FOUND ANNOTATIONS FOR BOOK", book_url)
        # ANNOTATIONS FROM PAD
        annotations = paragraphs_by_header[book_url]
        for annotation in annotations:
            # serialise the parsed element back to markup text for reportlab
            p_text = ET.tostring(annotation, method="html", encoding="unicode")
            content.append(Paragraph(p_text, styles["Normal"]))
            content.append(PageBreak())
            content.append(Spacer(1, 12))
    else:
        # BLANK BACK SIDE
        p = Paragraph("", styles["Normal"])
        content.append(p)
        content.append(PageBreak())
        content.append(Spacer(1, 12))


doc.build(content)
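
The script looks up annotations with paragraphs_by_header[book_url], so parse_pad evidently returns a mapping from a book URL to the paragraphs written under it. The real parse_pad lives in readfrompad.py in the repository and returns parsed elements (the script passes them to ET.tostring). The sketch below is not that implementation; it is a simplified, standard-library-only illustration of the grouping idea. It assumes a pad layout in which each book URL sits on its own line starting with "# ", and the function name group_paragraphs_by_header is made up for this example.

```python
def group_paragraphs_by_header(pad_text):
    """Group plain-text paragraphs under the most recent '# header' line.

    Assumed pad layout (an illustration, not the actual pad format):
        # https://hub.xpub.nl/bootleglibrary/book/1
        a paragraph of notes

        another paragraph
    """
    grouped = {}
    current = None   # header the following paragraphs belong to
    buffer = []      # lines of the paragraph being collected

    def flush():
        # store the collected paragraph under the current header, if any
        if current is not None and buffer:
            grouped[current].append(" ".join(buffer))
        buffer.clear()

    for line in pad_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("# "):
            flush()
            current = stripped[2:].strip()
            grouped.setdefault(current, [])
        elif not stripped:
            flush()          # a blank line ends the paragraph
        else:
            buffer.append(stripped)
    flush()
    return grouped
```

Using the book URL as the dictionary key is what makes the `if book_url in paragraphs_by_header` test in the script work: a book gets annotations on its back side only when the pad contains a header that matches its URL exactly.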

With Calibrestekje, Calibre and Reportlab

Calibrestekje works with an existing Calibre database, which means that Calibre must already be installed. If you don't have Calibre installed yet, do this before trying out this recipe. The contents of the PDF depend entirely on what is in the metadata.db file. Alternatively, you can install Calibre and have it create an empty but valid metadata.db file, which can then be written to using calibredb, a command-line tool that comes with the Calibre package. There is a handy guide for how to work with the Calibre database on the examples page for Calibrestekje.

Make sure you have a valid metadata.db file in the same bootleg/ directory. This file is usually produced the first time you run Calibre, at a path similar to /home/myusername/calibre/metadata.db on Debian and Unix-like systems, and it is usually kept together with the contents of the Calibre book collection.
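
A quick way to see whether the file in bootleg/ is usable is to open it as an SQLite database and look for the books table that Calibre stores its library in. This is a rough sanity check and not a full validation; the function name looks_like_calibre_db is made up for this example.

```python
import os
import sqlite3


def looks_like_calibre_db(path):
    """Return True if path is an SQLite file that contains a 'books' table."""
    if not os.path.isfile(path):
        return False
    try:
        conn = sqlite3.connect(path)
        try:
            row = conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name = 'books'"
            ).fetchone()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        # raised when the file is not an SQLite database at all
        return False
    return row is not None


if __name__ == "__main__":
    print("metadata.db looks usable:", looks_like_calibre_db("metadata.db"))
```

The os.path.isfile check comes first because sqlite3.connect would otherwise silently create a new, empty file at a path that does not exist.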
