Difference between revisions of "Pdfimages"

From Parallel Library Services
Revision as of 15:32, 7 December 2021

pdfimages is a command-line tool from the Poppler utilities. It extracts the images embedded in a PDF file and saves each one as a separate image file.

Syntax

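pdfimages [options] /path/to/file.pdf /path/to/output/prefix

The last argument is a file name prefix, not a directory: each image is written as prefix-NNN.ext, where NNN is the image number and ext depends on the image type.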

To extract every image from bar.pdf and save them as /tmp/image-000.ppm, /tmp/image-001.ppm, and so on, enter:

pdfimages bar.pdf /tmp/image
ls /tmp/image*

A sample output:

image-000.ppm   image-1025.ppm  image-1140.ppm  image-1256.ppm  image-247.ppm  image-374.ppm  image-501.ppm  image-628.ppm  image-755.ppm
image-001.ppm   image-1026.ppm  image-1141.ppm  image-1257.ppm  image-248.ppm  image-375.ppm  image-502.ppm  image-629.ppm  image-756.ppm
image-002.ppm   image-1027.ppm  image-1142.ppm  image-1258.ppm  image-249.ppm  image-376.ppm  image-503.ppm  image-630.ppm  image-757.ppm
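Because a large PDF can produce hundreds of files, it can help to collect them in a dedicated directory. The following is a minimal sketch, not part of pdfimages itself: extract_images is a hypothetical helper name, and the fixed prefix "image" is an assumption chosen to match the examples on this page.

```shell
# Hypothetical helper: extract all images from a PDF into their own directory.
# Usage: extract_images FILE.pdf OUTPUT_DIR
extract_images() {
    pdf=$1
    outdir=$2
    # pdfimages does not create directories, so make sure the target exists.
    mkdir -p "$outdir"
    # The second argument is a file name prefix: files are written as
    # OUTPUT_DIR/image-000.ppm, OUTPUT_DIR/image-001.ppm, and so on.
    pdfimages "$pdf" "$outdir/image"
}
```

For example, extract_images bar.pdf /tmp/bar-images would place the files under /tmp/bar-images/.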

Normally, all images are written as PBM (for monochrome images) or PPM (for non-monochrome images) files. With the -j option, images in DCT format are saved as JPEG files. All non-DCT images are saved in PBM/PPM format as usual:

pdfimages -j bar.pdf /tmp/image
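With -j the output is usually a mix of .jpg, .ppm, and .pbm files. As a rough sketch for checking how many images ended up in each format, the hypothetical helper below (count_by_ext is not a Poppler command) counts the files that share a given prefix:

```shell
# Hypothetical helper: count extracted files per format for a given prefix.
# Usage: count_by_ext /tmp/image
count_by_ext() {
    prefix=$1
    for ext in jpg ppm pbm; do
        count=0
        for f in "$prefix"-*."$ext"; do
            # If nothing matches, the pattern is left unexpanded,
            # so only count paths that really exist.
            if [ -e "$f" ]; then
                count=$((count + 1))
            fi
        done
        printf '%s %s\n' "$ext" "$count"
    done
}
```

Running count_by_ext /tmp/image after the command above prints one line per format with the number of matching files.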

The -f option specifies the first page to scan. To start scanning at page 5 and continue to the end of the document, enter:

pdfimages -j -f 5 bar.pdf /tmp/image

The -l option specifies the last page to scan. To scan only the first 5 pages (pages 1 through 5), enter:

pdfimages -j -l 5 bar.pdf /tmp/image
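The -f and -l options can be combined to scan a page range. pdfimages has no option for "the last N pages", so the page count has to come from elsewhere. The sketch below assumes pdfinfo (also part of the Poppler utilities) is installed and prints a "Pages:" line; the function names last_pages_range and extract_last_pages are hypothetical.

```shell
# Hypothetical helper: print "FIRST LAST" for the last N pages of a
# document with TOTAL pages, clamping FIRST to page 1.
last_pages_range() {
    total=$1
    n=$2
    first=$((total - n + 1))
    if [ "$first" -lt 1 ]; then
        first=1
    fi
    printf '%s %s\n' "$first" "$total"
}

# Hypothetical wrapper: extract images from the last N pages of a PDF.
# Usage: extract_last_pages FILE.pdf N PREFIX
extract_last_pages() {
    pdf=$1
    n=$2
    prefix=$3
    # Assumption: pdfinfo output contains a line such as "Pages:   12".
    total=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
    range=$(last_pages_range "$total" "$n")
    first=${range% *}   # text before the space
    last=${range#* }    # text after the space
    pdfimages -j -f "$first" -l "$last" "$pdf" "$prefix"
}
```

For example, extract_last_pages bar.pdf 5 /tmp/image scans pages 8 through 12 of a 12-page document.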