Difference between revisions of "Pdfimages"
Revision as of 15:32, 7 December 2021
pdfimages is part of the Poppler utilities for working with PDFs. It extracts the images embedded in a PDF file.
Syntax
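pdfimages [options] /path/to/file.pdf /path/to/image-root
The second argument is a filename prefix (the image root), not a directory. To extract every image from bar.pdf and save them as /tmp/image-000.ppm, /tmp/image-001.ppm, and so on, enter: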
pdfimages bar.pdf /tmp/image
ls /tmp/image*
A sample output:
image-000.ppm image-1025.ppm image-1140.ppm image-1256.ppm image-247.ppm image-374.ppm image-501.ppm image-628.ppm image-755.ppm
image-001.ppm image-1026.ppm image-1141.ppm image-1257.ppm image-248.ppm image-375.ppm image-502.ppm image-629.ppm image-756.ppm
image-002.ppm image-1027.ppm image-1142.ppm image-1258.ppm image-249.ppm image-376.ppm image-503.ppm image-630.ppm image-757.ppm
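The listing above suggests the naming scheme: the image root, a hyphen, a counter zero-padded to at least three digits, and an extension. A minimal sketch of that pattern follows; image_name is a hypothetical helper for illustration and is not part of Poppler.

```shell
# Hypothetical helper (not part of Poppler): print the filename expected
# for image number N, given an image root and an extension.
image_name() {
    root=$1    # image root as passed to pdfimages, e.g. /tmp/image
    index=$2   # image counter as a plain integer, starting at 0
    ext=$3     # ppm, pbm, or jpg
    printf '%s-%03d.%s\n' "$root" "$index" "$ext"
}

image_name /tmp/image 0 ppm
image_name /tmp/image 1025 ppm
```

Because the counter is padded only to three digits, a plain ls sorts image-1025.ppm before image-247.ppm, as in the sample output.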
Normally, all images are written as PBM (for monochrome images) or PPM (for non-monochrome images) files. With the -j option, images in DCT format are saved as JPEG files. All non-DCT images are saved in PBM/PPM format as usual:
pdfimages -j bar.pdf /tmp/image
The -f option specifies the first page to scan. To start scanning at page 5 and continue to the end of the document, enter:
pdfimages -j -f 5 bar.pdf /tmp/image
The -l option specifies the last page to scan. To scan only the first 5 pages (pages 1 through 5), enter:
pdfimages -j -l 5 bar.pdf /tmp/image
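Both -f and -l take absolute page numbers counted from the start of the document, so selecting the last N pages requires knowing the page count first. The sketch below computes the matching arguments; last_pages_args is a hypothetical helper, and obtaining the page count from pdfinfo (another Poppler utility) is an assumption about the local setup.

```shell
# Hypothetical helper: print the -f/-l arguments that select the last
# COUNT pages of a document with TOTAL pages.
last_pages_args() {
    total=$1
    count=$2
    first=$(( total - count + 1 ))
    # Clamp to page 1 when the document has fewer than COUNT pages.
    if [ "$first" -lt 1 ]; then
        first=1
    fi
    printf '%s %d %s %d\n' -f "$first" -l "$total"
}

# Example: a 20-page document, last 5 pages.
last_pages_args 20 5

# Possible use with a real file (assumes pdfinfo is installed):
#   total=$(pdfinfo bar.pdf | awk '/^Pages:/ {print $2}')
#   pdfimages -j $(last_pages_args "$total" 5) bar.pdf /tmp/image
```

The same two options can be combined directly to scan any range, for example -f 3 -l 7 for pages 3 through 7.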