Pdfimages

From Parallel Library Services
Revision as of 15:31, 7 December 2021 by Simon

pdfimages is a command-line tool from the Poppler utilities that extracts the images embedded in a PDF file.

Syntax

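pdfimages [options] /path/to/file.pdf /path/to/output/image-root

The second argument is a filename prefix (the image root), not a directory: each image is written as image-root-NNN.ext, where NNN is the image number.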
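Because pdfimages uses its second argument as a filename prefix, one convenient pattern is to build that prefix from an output directory and the PDF's own name. The following is a minimal sketch; image_root and extract_into are hypothetical helper names, not part of Poppler.

```shell
# image_root DIR PDF: print an output prefix inside DIR named after the PDF.
# (Hypothetical helper for illustration; not part of Poppler.)
image_root() (
    dir=$1
    pdf=$2
    name=$(basename "$pdf" .pdf)        # drop the directory and the .pdf suffix
    printf '%s/%s\n' "${dir%/}" "$name" # avoid a doubled slash if DIR ends in /
)

# extract_into DIR PDF: create DIR if needed, then extract all images into it.
extract_into() (
    mkdir -p "$1" || exit 1
    pdfimages "$2" "$(image_root "$1" "$2")"
)
```

With these definitions, extract_into /tmp/bar-images bar.pdf would be expected to write files such as /tmp/bar-images/bar-000.ppm.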

To extract every image from bar.pdf and save them as /tmp/image-000.ppm, /tmp/image-001.ppm, and so on, enter:

pdfimages bar.pdf /tmp/image
ls /tmp/image*

A sample output:

image-000.ppm   image-1025.ppm  image-1140.ppm  image-1256.ppm  image-247.ppm  image-374.ppm  image-501.ppm  image-628.ppm  image-755.ppm  image-882.ppm
image-001.ppm   image-1026.ppm  image-1141.ppm  image-1257.ppm  image-248.ppm  image-375.ppm  image-502.ppm  image-629.ppm  image-756.ppm  image-883.ppm
image-002.ppm   image-1027.ppm  image-1142.ppm  image-1258.ppm  image-249.ppm  image-376.ppm  image-503.ppm  image-630.ppm  image-757.ppm  image-884.ppm
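The image numbers are zero-padded to three digits, so once a PDF yields more than 1000 images a plain ls no longer lists them in numeric order (image-1025.ppm sorts before image-247.ppm). A small sketch that lists the files by image number instead; list_images is a hypothetical helper name, and it assumes filenames contain no tabs or newlines.

```shell
# list_images PREFIX: print the files named PREFIX-NNN.ext, ordered by NNN.
# (Hypothetical helper for illustration; not part of Poppler.)
list_images() (
    prefix=$1
    for f in "$prefix"-*; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
        n=${f#"$prefix"-}         # strip the prefix and hyphen: "1025.ppm"
        n=${n%%.*}                # strip the extension: "1025"
        printf '%s\t%s\n' "$n" "$f"
    done | sort -n | cut -f2-     # sort on the number, then drop it again
)
```

For example, list_images /tmp/image | wc -l gives the number of extracted images.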

By default, all images are written as PBM (for monochrome images) or PPM (for non-monochrome images) files. With the -j option, images stored in DCT format are saved as JPEG files, while all non-DCT images are still saved in PBM/PPM format:

pdfimages -j bar.pdf /tmp/image
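After a run with -j, the output is typically a mix of .jpg, .ppm and .pbm files. To see how many images ended up in each format, the files can be tallied by extension. This is a minimal sketch; count_by_ext is a hypothetical helper name.

```shell
# count_by_ext PREFIX: count the files named PREFIX-* per file extension.
# (Hypothetical helper for illustration; not part of Poppler.)
count_by_ext() (
    prefix=$1
    for f in "$prefix"-*; do
        [ -e "$f" ] || continue    # skip the unexpanded pattern when nothing matches
        printf '%s\n' "${f##*.}"   # keep only the text after the last dot
    done | sort | uniq -c          # one line per extension: count, then extension
)
```

Running count_by_ext /tmp/image prints one line per extension with its count.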

The -f option specifies the first page to scan. To scan from page 5 through the end of the document, enter:

pdfimages -j -f 5 bar.pdf /tmp/image

The -l option specifies the last page to scan. To scan only the first 5 pages (pages 1 through 5), enter:

pdfimages -j -l 5 bar.pdf /tmp/image
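Since -f and -l take absolute page numbers, scanning the last N pages requires knowing the total page count first. One way to get it is pdfinfo, another Poppler utility, which prints a "Pages:" line. The following sketch combines the two; first_of_last_n, page_count and extract_last_n are hypothetical helper names.

```shell
# first_of_last_n TOTAL N: print the first page of the final N pages,
# clamped so it never drops below page 1.
first_of_last_n() (
    first=$(( $1 - $2 + 1 ))
    if [ "$first" -lt 1 ]; then
        first=1
    fi
    echo "$first"
)

# page_count PDF: read the page count from the "Pages:" line of pdfinfo.
page_count() {
    pdfinfo "$1" | awk '/^Pages:/ {print $2}'
}

# extract_last_n PDF N PREFIX: extract images from the last N pages of PDF.
extract_last_n() (
    total=$(page_count "$1")
    [ -n "$total" ] || exit 1   # pdfinfo failed or printed no page count
    pdfimages -j -f "$(first_of_last_n "$total" "$2")" -l "$total" "$1" "$3"
)
```

For a 12-page document, extract_last_n bar.pdf 5 /tmp/image would run pdfimages with -f 8 -l 12.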