Create portable libraries by embedding metadata in Calibre

From Parallel Library Services

https://test.roelof.info/portable-libraries.html

Calibre usually stores metadata in a separate file adjacent to a PDF, but its Embed metadata tool can write that metadata into the file itself. This not only saves future librarians work, but also lets future editors enrich the metadata further.

The following recipe is adapted from one made by Roel Roscam-Abbing in a post from September 14th, 2020. From the post:

"Tools like Calibre make it possible to organize digital publications into meaningful collections by adding a wealth of metadata to files. This makes it possible to organize and search publications according to theme, publisher, authors, year and more. There is also a ‘description’ field which allow for comments of different kinds, such as an abstract of the publication, or personal notes on its (ir)relevance. A well-tended collection which has been thoughtfully annotated can become a valuable research tool for individual or collective researchers.

The big downside however is that the annotations exist separately from the files in question. This means that if the document in question is shared, the rich annotations are lost. This is because by default Calibre saves all the meta-data in a separate file. It is however possible to embed metadata in digital publications, doing so can help turn a folder of PDFs into a portable library."

How is metadata stored in digital publications?

In PDFs, metadata is stored in so-called XMP headers. XMP is an XML-based scheme that can hold several metadata namespaces. Metadata in PDFs can be queried with exiftool or pdfinfo. For this recipe, we are using a copy of Silvio Lorusso's Entreprecariat, which is licensed CC BY-NC. Running

exiftool entreprecariat.pdf

produces the following:

ExifTool Version Number         : 12.30
File Name                       : entreprecariat.pdf
Directory                       : .
File Size                       : 7.1 MiB
File Modification Date/Time     : 2021:11:22 16:05:59+01:00
File Access Date/Time           : 2021:11:22 16:05:59+01:00
File Inode Change Date/Time     : 2021:11:22 16:06:54+01:00
File Permissions                : -rw-r--r--
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
Linearized                      : No
XMP Toolkit                     : XMP Core 5.4.0
Creator Tool                    : Adobe InDesign CC (Macintosh)
Metadata Date                   : 2019:09:30 14:07:09+02:00
Create Date                     : 2019:09:30 14:06:54+02:00
Modify Date                     : 2019:09:30 14:07:09+02:00
Format                          : application/pdf
Original Document ID            : xmp.did:77eb9a89-ab0d-4669-a071-891dd3c13e49
History Software Agent          : Adobe InDesign CC (Macintosh)
History Parameters              : from application/x-indesign to application/pdf
History Changed                 : /
History When                    : 2019:09:30 14:06:54+02:00
History Action                  : converted
Instance ID                     : uuid:d696f48c-7cbe-3b46-a3d2-48b7904b4eb4
Document ID                     : xmp.id:158e5a30-813c-4a76-9772-2ada50141a11
Derived From Rendition Class    : default
Derived From Document ID        : xmp.did:77eb9a89-ab0d-4669-a071-891dd3c13e49
Derived From Instance ID        : xmp.iid:872edb25-4a31-4dbf-8907-edc9701bb1f6
Derived From Original Document ID: xmp.did:77eb9a89-ab0d-4669-a071-891dd3c13e49
Rendition Class                 : proof:pdf
Trapped                         : False
Producer                        : Adobe PDF Library 11.0
Page Count                      : 264
PDF Version                     : 1.4
Creator                         : Adobe InDesign CC (Macintosh)
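The same query can be scripted. The following is a minimal sketch, assuming exiftool is installed and on the PATH; it relies on exiftool's -json option, and the function names and the sample string are hypothetical, introduced only for illustration.

```python
import json
import subprocess


def parse_exiftool_json(text):
    """Turn the output of `exiftool -json` into a plain dict for one file."""
    records = json.loads(text)  # exiftool emits a JSON list, one object per file
    return records[0] if records else {}


def read_metadata(pdf_path):
    """Run exiftool on a PDF and return its tags (needs exiftool on the PATH)."""
    result = subprocess.run(
        ["exiftool", "-json", pdf_path],
        capture_output=True, text=True, check=True,
    )
    return parse_exiftool_json(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample shaped like exiftool's JSON output, so the sketch
    # can be tried even where exiftool is not installed.
    sample = '[{"SourceFile": "entreprecariat.pdf", "PageCount": 264}]'
    print(parse_exiftool_json(sample)["PageCount"])
```

With exiftool available, read_metadata("entreprecariat.pdf") would return the tags listed above as a dictionary, which is easier to filter or compare than the plain-text listing.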

Similarly, if you want to see the raw XMP metadata, running

pdfinfo -meta entreprecariat.pdf

produces

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.4.0">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
            xmlns:stEvt="http://ns.adobe.com/xap/1.0/sType/ResourceEvent#"
            xmlns:stRef="http://ns.adobe.com/xap/1.0/sType/ResourceRef#"
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
         <xmp:CreatorTool>Adobe InDesign CC (Macintosh)</xmp:CreatorTool>
         <xmp:MetadataDate>2019-09-30T14:07:09+02:00</xmp:MetadataDate>
         <xmp:CreateDate>2019-09-30T14:06:54+02:00</xmp:CreateDate>
         <xmp:ModifyDate>2019-09-30T14:07:09+02:00</xmp:ModifyDate>
         <dc:format>application/pdf</dc:format>
         <xmpMM:OriginalDocumentID>xmp.did:77eb9a89-ab0d-4669-a071-891dd3c13e49</xmpMM:OriginalDocumentID>
         <xmpMM:History>
            <rdf:Seq>
               <rdf:li rdf:parseType="Resource">
                  <stEvt:softwareAgent>Adobe InDesign CC (Macintosh)</stEvt:softwareAgent>
                  <stEvt:parameters>from application/x-indesign to application/pdf</stEvt:parameters>
                  <stEvt:changed>/</stEvt:changed>
                  <stEvt:when>2019-09-30T14:06:54+02:00</stEvt:when>
                  <stEvt:action>converted</stEvt:action>
               </rdf:li>
            </rdf:Seq>
         </xmpMM:History>
         <xmpMM:InstanceID>uuid:d696f48c-7cbe-3b46-a3d2-48b7904b4eb4</xmpMM:InstanceID>
         <xmpMM:DocumentID>xmp.id:158e5a30-813c-4a76-9772-2ada50141a11</xmpMM:DocumentID>
         <xmpMM:DerivedFrom rdf:parseType="Resource">
            <stRef:renditionClass>default</stRef:renditionClass>
            <stRef:documentID>xmp.did:77eb9a89-ab0d-4669-a071-891dd3c13e49</stRef:documentID>
            <stRef:instanceID>xmp.iid:872edb25-4a31-4dbf-8907-edc9701bb1f6</stRef:instanceID>
            <stRef:originalDocumentID>xmp.did:77eb9a89-ab0d-4669-a071-891dd3c13e49</stRef:originalDocumentID>
         </xmpMM:DerivedFrom>
         <xmpMM:RenditionClass>proof:pdf</xmpMM:RenditionClass>
         <pdf:Trapped>False</pdf:Trapped>
         <pdf:Producer>Adobe PDF Library 11.0</pdf:Producer>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>
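To illustrate how the namespaces in this packet map to individual fields, here is a small sketch using Python's standard xml.etree.ElementTree module. The namespace URIs are copied from the packet above; SAMPLE is an abbreviated copy of it, and the function name simple_xmp_fields is an assumption made for this example. It only collects properties with plain text values and skips nested structures such as xmpMM:History.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the XMP packet shown above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}

# Abbreviated copy of the packet above, kept short for illustration.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 5.4.0">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
         <xmp:CreatorTool>Adobe InDesign CC (Macintosh)</xmp:CreatorTool>
         <dc:format>application/pdf</dc:format>
         <pdf:Producer>Adobe PDF Library 11.0</pdf:Producer>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>"""


def simple_xmp_fields(xmp_text):
    """Return {"prefix:name": text} for properties that hold plain text."""
    root = ET.fromstring(xmp_text)
    uri_to_prefix = {uri: prefix for prefix, uri in NS.items()}
    fields = {}
    for desc in root.iter("{%s}Description" % NS["rdf"]):
        for child in desc:
            # ElementTree tags look like "{namespace-uri}localname".
            uri, _, name = child.tag[1:].partition("}")
            # Skip nested structures and namespaces not listed in NS.
            if len(child) == 0 and uri in uri_to_prefix and child.text:
                fields[f"{uri_to_prefix[uri]}:{name}"] = child.text.strip()
    return fields


if __name__ == "__main__":
    for key, value in simple_xmp_fields(SAMPLE).items():
        print(key, "=", value)
```

Note that none of the fields in this packet are semantic ones such as dc:title or dc:creator, which is the gap the next section addresses.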

Embedding metadata

The metadata shown by exiftool is useful but mostly technical: the filename, file size, permissions, and details such as when the file was created or last modified. What is more meaningful to readers is semantic metadata, such as the title, author, keywords, or a summary of the text. Embedding these in a PDF means the work only needs to be done once: when files are added to a new Calibre library, they arrive with this metadata already written. It can still be edited at a later time.

Calibre preferences.png

To embed this metadata, open Calibre and open Preferences. Go to Interface > Toolbars & menus, where a drop-down list lets you choose which toolbar to customise. Choose "The main toolbar".

You will then see Available actions on the left, and Current actions on the right. Scroll down to the Embed metadata action, click it, and add it to the Current actions on the right.

Calibre customise main toolbar.png

Click Apply, and you will now see the Embed metadata action in the main toolbar of Calibre.

Edit the metadata of the PDF to give it more semantic information:

Calibre edit metadata.png

Save the metadata. Now, when you click the Embed metadata action, the metadata entered in Calibre will be embedded in the XMP headers of the PDF. Running the same exiftool command as before, we now see that more details are embedded within the file:

ExifTool Version Number         : 12.30
File Name                       : Entreprecariat_ Everyone is an Entrepreneu - Silvio Lorusso.pdf
Directory                       : .
File Size                       : 7.0 MiB
File Modification Date/Time     : 2021:11:22 16:25:11+01:00
File Access Date/Time           : 2021:11:22 16:25:11+01:00
File Inode Change Date/Time     : 2021:11:22 16:25:11+01:00
File Permissions                : -rw-r--r--
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
Linearized                      : No
Author                          : Silvio Lorusso
Create Date                     : 2019:09:30 12:07:29Z
Keywords                        : precarity, labour, gig economy, entrepreneurialism
Modify Date                     : 2021:11:22 16:25:10+01:00
Title                           : Entreprecariat: Everyone is an Entrepreneur. Nobody is Safe
Description                     : <div>.<p>“Entrepreneur or precarious worker? These are the terms of a cognitive dissonance that turns everyone’s life into a shaky project in perennial start-up phase. Silvio Lorusso guides us through the entreprecariat, a world where change is natural and healthy, whatever it may bring. A world populated by motivational posters, productivity tools, mobile offices and self-help techniques. A world in which a mix of entrepreneurial ideology and widespread precarity is what regulates professional social media, online marketplaces for self-employment and crowdfunding platforms for personal needs. The result? A life in permanent beta, with sometimes tragic implications.”</p>.<p>English edition Translated by Isobel Butters Publisher Onomatopee, Eindhoven, 2019 Creative Commons BY-NC 4.0 License ISBN 9789493148161, 9493148165 257 pages</p></div>
Creator                         : Silvio Lorusso
Subject                         : precarity, labour, gig economy, entrepreneurialism
Publisher                       : Onomatopee
Date                            : 2019:09:30 14:06:54+02:00
Language                        : en
Format                          : application/pdf
Metadata Date                   : 2021:11:22 16:25:10.507496+01:00
Creator Tool                    : Adobe InDesign CC (Macintosh)
Timestamp                       : 2021:11:22 16:14:08.388822+01:00
Author link map                 : {"Silvio Lorusso": ""}
Title sort                      : Entreprecariat: Everyone is an Entrepreneur. Nobody is Safe
Author sort                     : Lorusso, Silvio
Trapped                         : False
Producer                        : Adobe PDF Library 11.0
Original Document ID            : xmp.did:77eb9a89-ab0d-4669-a071-891dd3c13e49
History Software Agent          : Adobe InDesign CC (Macintosh)
History Parameters              : from application/x-indesign to application/pdf
History Changed                 : /
History When                    : 2019:09:30 14:06:54+02:00
History Action                  : converted
Instance ID                     : uuid:d696f48c-7cbe-3b46-a3d2-48b7904b4eb4
Document ID                     : xmp.id:158e5a30-813c-4a76-9772-2ada50141a11
Derived From Rendition Class    : default
Derived From Document ID        : xmp.did:77eb9a89-ab0d-4669-a071-891dd3c13e49
Derived From Instance ID        : xmp.iid:872edb25-4a31-4dbf-8907-edc9701bb1f6
Derived From Original Document ID: xmp.did:77eb9a89-ab0d-4669-a071-891dd3c13e49
Rendition Class                 : proof:pdf
Page Count                      : 264
PDF Version                     : 1.4
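To see at a glance which fields the Embed metadata action added, the two exiftool listings can be compared. The sketch below parses exiftool's default "Tag Name : value" text output; the function names are hypothetical, and BEFORE and AFTER are shortened excerpts of the two listings in this recipe rather than complete output.

```python
def parse_exiftool_text(output):
    """Parse exiftool's default 'Tag Name : value' lines into a dict."""
    tags = {}
    for line in output.splitlines():
        # Split on the first ": " only, so colons inside values
        # (dates, titles, document IDs) are left untouched.
        name, sep, value = line.partition(": ")
        if sep:
            tags[name.strip()] = value.strip()
    return tags


def added_tags(before, after):
    """Return tag names present after embedding but absent before, sorted."""
    return sorted(set(after) - set(before))


# Shortened excerpts of the two listings in this recipe.
BEFORE = """File Type                       : PDF
Creator                         : Adobe InDesign CC (Macintosh)
Page Count                      : 264"""

AFTER = """File Type                       : PDF
Author                          : Silvio Lorusso
Title                           : Entreprecariat: Everyone is an Entrepreneur. Nobody is Safe
Publisher                       : Onomatopee
Creator                         : Silvio Lorusso
Page Count                      : 264"""

if __name__ == "__main__":
    print(added_tags(parse_exiftool_text(BEFORE), parse_exiftool_text(AFTER)))
```

Run on the full listings, the same comparison would also surface fields such as Keywords, Description, Subject and Language.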