Difference between revisions of "Workshops"

From Parallel Library Services
[[First meeting, Introductions, Code of Conduct and Infrastructour]]
{{:First meeting, Introductions, Code of Conduct and Infrastructour}}


[[Introducing digital library types and setup]]
{{:Introducing digital library types and setup}}


[[Digitising, scanning, processing and republishing]]
{{:Organising library structure, classifying and cataloguing texts}}
 
{{:Digitising, scanning, processing and republishing}}

Revision as of 15:51, 21 March 2022

First meeting, Introductions, Code of Conduct and Infrastructour
Location: Online
Date: 13th October, 2021
Time: 16:00-19:00 CEST
Pad: https://pad.simonbrowne.biz/p/pls-meeting-1
Tools: Etherpad, Calibre-web, MediaWiki
Guests: {{{guests detail}}}

Context

We all met online for the first time, through a video call, to introduce ourselves as a group of 10 participants.

Tools

For the workshop series, the following ancillary infrastructure was created:

  • a dedicated MediaWiki instance, this wiki
  • a Calibre-web instance, called "parallel library"
  • an Etherpad instance for note-taking and communication purposes
  • an email mailing list for participants to communicate together: pls[at]post.lurk.org

Activities

We took an "infrastructour" of the digital resources (MediaWiki, Calibre-web and Etherpad): making user accounts and becoming familiar with wikitext, uploading to a digital library, and writing collaboratively in pads.

The "parallel-library" used for the workshop series

Following the wiki tutorial, we added text, links and images to our user pages with MediaWiki syntax:

MediaWiki syntax
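
As a rough illustration of the kind of syntax practised in the tutorial, the sketch below assembles a small user page from standard wikitext constructs: a heading, bold text, an internal link, an external link and an embedded image. The helper names, page title, URL and file name are hypothetical placeholders and are not taken from the workshop itself.

```python
# Minimal sketch: build a wikitext snippet for a user page.
# Every name, title, URL and file name here is an illustrative placeholder.


def bold(text: str) -> str:
    """Wrap text in wikitext bold markup (three apostrophes on each side)."""
    return f"'''{text}'''"


def internal_link(page: str, label: str = "") -> str:
    """Link to another page on the same wiki, optionally with a custom label."""
    return f"[[{page}|{label}]]" if label else f"[[{page}]]"


def external_link(url: str, label: str) -> str:
    """Link to an external URL; the label follows the URL after a space."""
    return f"[{url} {label}]"


def image(filename: str, caption: str) -> str:
    """Embed an uploaded file as a thumbnail with a caption."""
    return f"[[File:{filename}|thumb|{caption}]]"


def user_page(name: str) -> str:
    """Join the pieces into one block of wikitext."""
    lines = [
        f"== About {name} ==",
        f"Hello, I am {bold(name)} and I take part in the {internal_link('Workshops')}.",
        f"Some of my notes live on {external_link('https://example.org/notes', 'this page')}.",
        image("Example.png", "A placeholder image"),
    ]
    return "\n".join(lines)


print(user_page("Example User"))
```

Pasting the printed text into the edit box of a user page should render a heading, a short paragraph with two links, and a thumbnail, provided a file with that name has been uploaded to the wiki.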

Etherpad introductions

We also began drafting a Code of Conduct on a pad, which was completed by the next workshop on October 27th. The final draft was later added to this wiki as the PLS Code of Conduct, taking some inspiration from the IFF CoC.


Introducing digital library types and setup
Location: At Varia (Gouwstraat 3, Rotterdam) and online
Date: 27th October, 2021
Time: 16:00-19:00 CEST
Pad: https://pad.simonbrowne.biz/p/pls-meeting-2
Tools: Bibliotecha
Guests: Luke Murphy

Context

There are a few tried-and-tested tools for digital librarianship, many of which depend in some way on the open-source ebook management software Calibre. In this workshop, we were joined by guest Luke Murphy, one of the people working on the Bibliotecha project, which is a framework for using Calibre to serve files over a local area network.

Activities

Some participants brought minimal equipment to try installing Bibliotecha. It is designed to be self-hosted: it runs on a computer within a local area network, reachable through a wifi hotspot that is created during the installation.

Bibliotecha: digital books need libraries too

With Luke we discussed a few different options for hosting Calibre-based digital library systems, from homebrewed (self-hosted) servers to a VPS (Virtual Private Server). We also had the opportunity to see the type of hardware required for setting up a homebrewed server: a cheap single-board computer such as a Raspberry Pi can easily run a local instance of Bibliotecha.

A basic set of equipment is:

  • a cheap computer with a wifi interface (e.g. Raspberry Pi 3 or 4)
  • a screen with cables to connect to the computer
  • a keyboard
  • a router
  • an SD card (minimum 32GB)
  • an ethernet cable

The software installation process is well documented here: https://manual.bibliotecha.info

We went through an installation of the software on different computers in our various locations, screensharing a terminal window to show what commands to write and when.

Installation of Bibliotecha at Varia


Organising library structure, classifying and cataloguing texts
Location: At Varia (Gouwstraat 3, Rotterdam) and online
Date: 10th November, 2021
Time: 16:00-19:00 CET
Pad: https://pad.simonbrowne.biz/p/pls-meeting-3
Tools: Wikibase, Wikidata
Guests: Lozana Rossenova

Context

From self-managed libraries to open structured data, there are many ways to organise libraries. In this workshop we were joined by Lozana Rossenova, a designer and researcher, who presented some of her work on the Digital Archive of Artist's Publishing, on an archive of net art made for Rhizome, and on structuring the data using the Wikibase extension of MediaWiki.

Activities

After Lozana's presentation, we took a deep dive into open structured data. This began with looking at the Wikidata website (https://www.wikidata.org/wiki/Wikidata:Main_Page) and learning more about the project. Each item in the Wikidata database has a "Q number", a unique identifier for that item. Many of the low numbers are historical and reflect the order in which items were first entered ("universe" is Q1, "Earth" is Q2); Q42, however, is reserved for Douglas Adams as an inside joke.
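
To make the idea of Q numbers a little more concrete, here is a minimal sketch that checks whether a string looks like a Q number and builds the address of the matching item page and of its machine-readable JSON export. The function names are our own for this illustration; the two URL patterns are the public Wikidata ones as we understand them, and nothing is fetched over the network.

```python
import re

# URL patterns for an item page and for its JSON export on wikidata.org.
ITEM_PAGE = "https://www.wikidata.org/wiki/{qid}"
ITEM_JSON = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"

# A Q number is the letter Q followed by a positive integer without leading zeros.
_QID = re.compile(r"Q[1-9][0-9]*")


def is_qid(value: str) -> bool:
    """Return True if the whole string has the shape of a Q number."""
    return _QID.fullmatch(value) is not None


def item_number(qid: str) -> int:
    """Return the numeric part of a Q number, e.g. 42 for 'Q42'."""
    if not is_qid(qid):
        raise ValueError(f"not a Q number: {qid!r}")
    return int(qid[1:])


def item_url(qid: str, as_json: bool = False) -> str:
    """Build the URL of an item page, or of its JSON export."""
    if not is_qid(qid):
        raise ValueError(f"not a Q number: {qid!r}")
    return (ITEM_JSON if as_json else ITEM_PAGE).format(qid=qid)


# The items mentioned in the text above, listed in numeric order.
EXAMPLES = {"Q42": "Douglas Adams", "Q2": "Earth", "Q1": "universe"}
for qid in sorted(EXAMPLES, key=item_number):
    print(qid, EXAMPLES[qid], item_url(qid))
```

Sorting by the numeric part rather than by the string keeps Q2 ahead of Q42, which would also matter for larger numbers such as Q5598.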

Items such as Earth are labelled with aliases (listed under "Also known as"). They are also given properties, each of which links the item to an object. For example, Earth is listed as:

Earth; instance of; planet

Earth is the subject, instance of is the property, and planet is the object.
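
The subject, property and object pattern above can be sketched as a small data structure. This is only an illustration using plain labels, not how Wikibase stores statements internally; the class and function names are hypothetical, and the second statement (Douglas Adams; instance of; human) is added by us as a further example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Statement:
    """One subject; property; object statement, using plain labels."""
    subject: str  # the item being described
    prop: str     # the property that links subject and object
    obj: str      # the value the property points to


STATEMENTS = [
    Statement("Earth", "instance of", "planet"),
    Statement("Douglas Adams", "instance of", "human"),
]


def objects_of(statements: List[Statement], subject: str, prop: str) -> List[str]:
    """Return every object linked to the subject through the given property."""
    return [s.obj for s in statements if s.subject == subject and s.prop == prop]


print(objects_of(STATEMENTS, "Earth", "instance of"))
```

Because every statement has the same three-part shape, a question such as "what is Earth an instance of?" becomes a simple filter over the list.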

Earth (Q2)

Earth (Q2) does not have many aliases, unlike Rembrandt (Q5598):

Rembrandt (Q5598)

Exercises

We began by making accounts on WBStack, a platform from which you can create new wikis with the Wikibase extension installed. From here you can choose a name for your wiki and URL:

WBStack-register-wiki.png

The next step is to set the skin and registration details:

WBStack-wiki-settings.png

In the features tab you can choose how to map existing properties in Wikidata:

Wikidata-map-properties.png

One of the crucial choices you make when registering a wiki is whether to federate the 10,000+ properties of Wikidata, or to begin from scratch:

Wikidata-federate-properties.png

Finally, we started making items and adding properties on our wikis:

Wikibase-simon-item.png


Digitising, scanning, processing and republishing
Location: At Varia (Gouwstraat 3, Rotterdam) and online
Date: 8th December, 2021
Time: 16:00-19:00 CET
Pad: https://pad.simonbrowne.biz/p/pls-meeting-5
Tools: Tesseract, OCRmyPDF, PDFsandwich
Guests: Pedro Sá Couto

Context

Digitising printed matter involves more than scanning: to be searchable, a file needs a text layer. In this workshop, we were joined by guest speaker Pedro Sá Couto, a designer and PhD researcher based in Porto, Portugal, who is interested in surveillance in the publishing of digital and analog media. Pedro presented his work on projects such as Tactical Watermarks, an online republishing platform that adds user-generated watermarks to uploaded PDFs. His more recent PhD research follows copy shops located near Portuguese academic institutions, which act as “informal libraries”.

Activities

Pertinent to this topic, we explored the process of digitising printed books, from scan to a PDF with an OCR (Optical Character Recognition) layer. The second half of the workshop took a deep dive into using Tesseract, an open-source OCR engine. While Tesseract does a good job of recognising the characters in printed text, other software is needed to compile the PDF. Following some experiments with Tesseract, we trialled software such as OCRmyPDF and PDFsandwich, which can run OCR and compile the PDF in one command.
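
As a sketch of what "one command" can look like in practice, the snippet below wraps the OCRmyPDF command-line tool in a small Python helper. It assumes that `ocrmypdf` and the relevant Tesseract language data are installed; the function names and the file names `scan.pdf` and `scan_ocr.pdf` are hypothetical, and this is not a record of the exact commands used in the workshop.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_ocr_command(scan: Path, output: Path,
                      language: str = "eng", deskew: bool = True) -> list[str]:
    """Assemble the ocrmypdf call that adds a text layer to a scanned PDF."""
    cmd = ["ocrmypdf", "--language", language]
    if deskew:
        # Straighten pages that were scanned at a slight angle.
        cmd.append("--deskew")
    cmd += [str(scan), str(output)]
    return cmd


def add_text_layer(scan: Path, output: Path, language: str = "eng") -> bool:
    """Run OCR if ocrmypdf is available; return True when it was run."""
    if shutil.which("ocrmypdf") is None:
        print("ocrmypdf is not installed; nothing was run")
        return False
    # check=True raises CalledProcessError if ocrmypdf exits with an error.
    subprocess.run(build_ocr_command(scan, output, language), check=True)
    return True


if __name__ == "__main__":
    # Show the command that would be run for a hypothetical scan.
    print(" ".join(build_ocr_command(Path("scan.pdf"), Path("scan_ocr.pdf"))))
```

The language is given as a Tesseract language code, so a Dutch or Portuguese book would use `nld` or `por` in place of `eng`, assuming that language data is installed.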

Add_a_text_layer_to_a_PDF_with_OCRmyPDF

Make_searchable_PDFs