From Parallel Library Services
Revision as of 10:48, 12 May 2022

Imagining librarianship & experiments with document conversion
Location: At Varia (Gouwstraat 3, Rotterdam), and online
Date: November 24th, 2021
Time: 16:00-19:00 CET
Pad: https://pad.simonbrowne.biz/p/pls-meeting-4
Tools: Calibre, Pandoc, ExifTool, WeasyPrint, curl

Context

PDF (Portable Document Format) is a widely used digital file format for ebooks. In this workshop, we created a PDF and then queried and embedded its metadata, using tools such as Pandoc, ExifTool and, of course, Calibre, "the Swiss army knife of document conversion".

Activities

After catching up on the context of our projects, we discussed the plan for the session:

  • a tour of Calibre
  • hybrid publishing workflows
  • embedding metadata in PDFs
  • making digital files (EPUB, PDF) with pandoc
  • converting between file formats in Calibre (.docx > .epub)
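The last item can also be done without the graphical interface, because Calibre installs a command-line converter, ebook-convert, which infers the input and output formats from the file extensions. Here is a minimal sketch, assuming Calibre's command-line tools are on the PATH. The function name docx_to_epub and the file report.docx are hypothetical:

```shell
# Hypothetical helper: convert a Word document to EPUB with Calibre's
# ebook-convert, which chooses formats from the file extensions.
docx_to_epub() {
    input="$1"                     # e.g. report.docx
    output="${input%.docx}.epub"   # report.docx -> report.epub
    ebook-convert "$input" "$output"
}
# Usage: docx_to_epub report.docx
```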

The first half of the workshop involved taking a close look at Calibre and hybrid publishing workflows using plain text file formats such as HTML and Markdown.

We followed a tutorial (originally written by Roel Roscam-Abbing) which shows how to inspect the metadata in PDFs with ExifTool and then embed metadata with a Calibre plugin. This plugin, like many others that extend Calibre's functionality, can easily be added to Calibre's main toolbar. Note that this is only possible in the Calibre desktop application: Calibre-web does not support this plugin, or any other.
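The tutorial embeds metadata through a Calibre plugin. For comparison, ExifTool can also read and write a PDF's document information directly from the command line. The sketch below assumes exiftool is installed. The helper names show_pdf_metadata and set_pdf_metadata are hypothetical, while the PDF:Title and PDF:Author tag names and the -overwrite_original option are standard ExifTool syntax:

```shell
# Hypothetical helpers around ExifTool for PDF metadata.
show_pdf_metadata() {
    # -PDF:all prints every tag ExifTool reads from the PDF Info dictionary
    exiftool -PDF:all "$1"
}

set_pdf_metadata() {
    # $1 = PDF file, $2 = title, $3 = author
    # -overwrite_original edits in place instead of keeping a *_original copy
    exiftool -overwrite_original -PDF:Title="$2" -PDF:Author="$3" "$1"
}
# Usage: set_pdf_metadata book.pdf "My Title" "A. Author"
```

ExifTool saves PDF edits as an incremental update, so the earlier metadata can still be recovered from the file.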

Calibre's main toolbar preferences

Our workshop was documented on a pad, using Markdown to create structure. Markdown is a lightweight markup language that is useful in hybrid publishing, where one input (plain text) can have many outputs (file formats). From a single document it is possible to create a variety of files, including EPUB, PDF, HTML and even Wikitext, the syntax MediaWiki uses.
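To make the single-source idea concrete, here is a minimal sketch that renders one Markdown file into several formats with pandoc. It assumes pandoc is installed. The function name publish_all and the list of formats are illustrative choices. PDF is left out because it needs a separate PDF engine such as WeasyPrint:

```shell
# Hypothetical "single source" build: one Markdown file, several outputs.
publish_all() {
    src="$1"                # e.g. notes.md
    base="${src%.md}"       # notes
    for fmt in html epub mediawiki; do
        case "$fmt" in
            mediawiki) ext=wiki ;;   # MediaWiki markup gets a .wiki extension
            *)         ext="$fmt" ;;
        esac
        # -s makes a complete standalone document; -t names the output format
        pandoc -s -t "$fmt" "$src" -o "$base.$ext"
    done
}
# Usage: publish_all notes.md   # writes notes.html, notes.epub, notes.wiki
```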

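Pandoc's Markdown supports YAML metadata blocks. A title should be given in the initial metadata block. For HTML-based output, including PDF made with WeasyPrint, pandoc otherwise prints a warning and uses the file name as the title: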

---
title: my new document
---

After the metadata block, Markdown uses a simple syntax for headings, paragraphs, bold and italic text, ordered and unordered lists, hyperlinks and many other elements that can easily be converted to multiple file formats. This is part of a Markdown publishing workflow, in which content is gathered and structured in plain text documents. These are usually a source Markdown file with the extension .md and a stylesheet (in CSS, for example) with the extension .css.
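Here is a minimal sketch of these two source files. The shell snippet below writes an example source.md (opening with the YAML title block) and a stylesheet.css. The file names and contents are placeholders. A missing title only produces a warning, so the snippet also defines a hypothetical helper, has_yaml_title (not part of pandoc), that checks the opening metadata block for a non-empty title: line:

```shell
# Hypothetical helper: succeed only if the file opens with a YAML block
# ("---" ... "---" or "...") that contains a non-empty title: line.
has_yaml_title() {
    awk '
        NR == 1 { if ($0 != "---") exit 1; next }   # must open with ---
        $0 == "---" || $0 == "..." { exit !found }  # end of the block
        /^title:[ \t]*[^ \t]/ { found = 1 }         # title: with a value
        END { if (!found) exit 1 }
    ' "$1"
}

# Example source document (placeholder content)
cat > source.md <<'EOF'
---
title: my new document
---

# A heading

A paragraph with **bold** and *italic* text, and a [link](https://example.org).

- an unordered list item
- another item

1. an ordered list item
2. another item
EOF

# Example stylesheet, applied when pandoc renders the Markdown as HTML
cat > stylesheet.css <<'EOF'
body { font-family: serif; max-width: 40em; margin: auto; }
h1   { font-size: 1.6em; }
EOF

if has_yaml_title source.md; then echo "source.md: title found"; fi
```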

A traditional publishing workflow, with linear content creation and intense design activity to produce many formats (image from the Digital Publishing Toolkit, pg 92)
A "single source" publishing workflow, using a markup language such as Markdown to create content and design in parallel, with multiple formats to export to (image from the Digital Publishing Toolkit, pg 97)

We began by catching up on our projects, recording notes in a pad:

Pls-workshop-04.png

We then exported the pad to a plain text format by running curl in a terminal:

curl https://pad.simonbrowne.biz/p/pls-meeting-4/export/txt -o pls-meeting-4.md
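The export URL follows the pattern Etherpad uses, /p/<pad name>/export/txt. The following hypothetical wrapper makes the command reusable for other pads on the same server. It also adds curl's --fail option, so an HTTP error (for example a misspelled pad name) makes curl exit with an error rather than saving the server's error page as the .md file:

```shell
# Hypothetical wrapper around the curl export command.
# Assumes the server keeps the /p/<pad>/export/txt URL pattern.
export_pad() {
    pad="$1"    # e.g. pls-meeting-4
    # --fail: report HTTP errors instead of writing the error page to disk
    curl --fail "https://pad.simonbrowne.biz/p/$pad/export/txt" -o "$pad.md"
}
# Usage: export_pad pls-meeting-4   # writes pls-meeting-4.md
```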

Because the pad was written in Markdown, the exported text file can be turned into a PDF with pandoc. A CSS stylesheet handles the styling, and WeasyPrint (an HTML/CSS-to-PDF renderer) serves as the PDF engine:

pandoc --pdf-engine=weasyprint -c stylesheet.css -s pls-meeting-4.md -o pls-meeting-4.pdf
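With -s, pandoc writes the YAML title (and any author) into the standalone HTML it passes to WeasyPrint. WeasyPrint should then carry the HTML title and author meta tags over into the PDF's metadata, and ExifTool can confirm what was embedded. Here is a minimal sketch, assuming pandoc, WeasyPrint and ExifTool are all installed. The name build_pdf is hypothetical:

```shell
# Hypothetical build step: Markdown -> PDF, then check the embedded metadata.
build_pdf() {
    src="$1"                 # e.g. pls-meeting-4.md
    pdf="${src%.md}.pdf"     # pls-meeting-4.pdf
    # Same pandoc command as above, with file names derived from the argument
    pandoc --pdf-engine=weasyprint -c stylesheet.css -s "$src" -o "$pdf" || return 1
    # Print only the PDF Info title and author tags
    exiftool -PDF:Title -PDF:Author "$pdf"
}
# Usage: build_pdf pls-meeting-4.md
```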

File:Workshop 04.md.pdf