Pdfimages
pdfimages is part of the Poppler utilities for working with PDF files. It extracts the images embedded in a PDF.
Syntax
pdfimages [options] /path/to/file.pdf /path/to/image-root
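The last argument is a filename prefix (called image-root in the man page) rather than an output directory: pdfimages appends -NNN and an extension to it, and any directory in the prefix must already exist. A minimal sketch, assuming a POSIX-style shell, of a hypothetical helper extract_into_dir that creates a directory first and then uses image as the prefix inside it:

```shell
# Hypothetical helper (not part of Poppler): extract every image from a PDF
# into its own directory. pdfimages treats its last argument as a filename
# prefix, so the directory is created first and "image" is the prefix in it.
extract_into_dir() {
    local pdf=$1 dir=$2
    mkdir -p "$dir" || return 1
    pdfimages "$pdf" "$dir/image"
}
```

For example, extract_into_dir bar.pdf /tmp/bar-images writes /tmp/bar-images/image-000.ppm, /tmp/bar-images/image-001.ppm, and so on.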
To extract every image from bar.pdf and save each one as a numbered file (/tmp/image-000.ppm, /tmp/image-001.ppm, and so on), enter:
pdfimages bar.pdf /tmp/image
ls /tmp/image*
Sample output:
image-000.ppm image-1025.ppm image-1140.ppm image-1256.ppm image-247.ppm image-374.ppm image-501.ppm image-628.ppm image-755.ppm image-882.ppm
image-001.ppm image-1026.ppm image-1141.ppm image-1257.ppm image-248.ppm image-375.ppm image-502.ppm image-629.ppm image-756.ppm image-883.ppm
image-002.ppm image-1027.ppm image-1142.ppm image-1258.ppm image-249.ppm image-376.ppm image-503.ppm image-630.ppm image-757.ppm image-884.ppm
By default, all images are written as PBM (monochrome images) or PPM (non-monochrome images) files. With the -j option, images stored in DCT format are saved as JPEG files instead; all non-DCT images are still saved in PBM/PPM format:
pdfimages -j bar.pdf /tmp/image
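After a -j run the output is a mix of .jpg files (DCT images) and .ppm/.pbm files (everything else). To see how many of each were written, here is a minimal sketch of a hypothetical helper count_by_type, assuming a POSIX-style shell with sort and uniq; it takes the same image root that was passed to pdfimages:

```shell
# Hypothetical helper (not part of Poppler): count the files written for an
# image root such as /tmp/image, grouped by file extension.
count_by_type() {
    local root=$1 f
    for f in "$root"-*; do
        # Skip the literal pattern when no files match
        [ -e "$f" ] || continue
        # Print only the extension (text after the last dot)
        printf '%s\n' "${f##*.}"
    done | sort | uniq -c
}
```

For example, count_by_type /tmp/image prints one line per extension with its count, such as "2 jpg".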
The -f option specifies the first page to scan. To scan from page 5 to the end of the document, enter:
pdfimages -j -f 5 bar.pdf /tmp/image
The -l option specifies the last page to scan. To scan only the first 5 pages, enter:
pdfimages -j -l 5 bar.pdf /tmp/image
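Both -f and -l take absolute page numbers and can be combined, so scanning the last N pages requires the total page count first. A minimal sketch of a hypothetical helper extract_last_pages, assuming pdfinfo (also part of Poppler) is installed and reports a line such as "Pages:          12":

```shell
# Hypothetical helper (not part of Poppler): extract images from only the
# last N pages of a PDF, using pdfinfo to look up the page count.
extract_last_pages() {
    local pdf=$1 n=$2 root=$3 pages first
    pages=$(pdfinfo "$pdf" | awk '/^Pages:/ { print $2 }')
    first=$(( pages - n + 1 ))
    # Do not start before page 1 when N exceeds the page count
    if [ "$first" -lt 1 ]; then
        first=1
    fi
    # -f and -l together restrict the scan to pages first..pages
    pdfimages -j -f "$first" -l "$pages" "$pdf" "$root"
}
```

For example, for a 12-page bar.pdf, extract_last_pages bar.pdf 5 /tmp/image runs pdfimages -j -f 8 -l 12 bar.pdf /tmp/image.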