cpdf -extract-images in.pdf [<range>] [-im <path>] [-p2p <path>]
[-dedup | -dedup-perpage] [-raw] -o <path>
cpdf -list-images[-json] in.pdf [<range>]
cpdf -image-resolution[-json] <minimum resolution> in.pdf [<range>]
cpdf -list-images-used[-json] in.pdf [<range>]
cpdf -process-images [-process-images-info] in.pdf [<range>]
[-im <filename>] [-jbig2enc <filename>]
[-lossless-resample[-dpi] <n> | -lossless-to-jpeg <n>]
[-jpeg-to-jpeg <n>] [-1bpp-method <method>]
[-jbig2-lossy-threshold <n>]
[-pixel-threshold <n>] [-length-threshold <n>]
[-percentage-threshold <n>] [-dpi-threshold <n>]
[-resample-interpolate]
-o out.pdf
Cpdf can extract the raster images in a PDF to a given location. JPEG, JPEG2000 and lossless JBIG2 images are extracted directly.
Lossy JBIG2 images are extracted likewise, but an extra __<n> is added, giving the number of the JBIG2Global stream for this image, which is extracted as <n>.jbig2global. You may reconstruct the individual images with, for example, jbig2dec.
Other images are written as PNGs, processed with either ImageMagick’s “magick” command, or NetPBM’s “pnmtopng” program, whichever is installed.
cpdf -extract-images in.pdf [<range>] [-im <path>] [-p2p <path>]
[-dedup | -dedup-perpage] [-raw] -o <path>
The -im or -p2p option gives the path to the external tool. One of the two tools must be installed, unless -raw is added, in which case only JPEG or plain .pnm files are output.
The output specifier, e.g. -o output/%%%, gives the number format for numbering the images. Output files are named serially from 0, and include the page number too. For example, output files might be called output/000-p1.jpg, output/001-p1.png, output/002-p3.jpg and so on. Here is an example invocation:
cpdf -extract-images in.pdf -im magick -o output/%%%
The output directory must already exist. The -dedup option deduplicates images across the whole document; the -dedup-perpage option deduplicates only within each page.
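As an illustration of the naming scheme described above, here is a minimal Python sketch that assembles the extraction command and decodes the resulting file names. The function names extract_images_command and parse_name are hypothetical, and the name pattern is inferred only from the example names given above (output/000-p1.jpg and so on); names carrying the extra __<n> used for lossy JBIG2 are not handled.

```python
import re
from pathlib import Path

def extract_images_command(pdf, out_dir, im="magick", dedup=False):
    """Return the argument list for a cpdf -extract-images run.

    The output directory must exist before cpdf runs; create it first,
    for example with Path(out_dir).mkdir(parents=True, exist_ok=True).
    """
    cmd = ["cpdf", "-extract-images", pdf, "-im", im]
    if dedup:
        cmd.append("-dedup")
    cmd += ["-o", f"{out_dir}/%%%"]
    return cmd

# Pattern inferred from the example names: <serial>-p<page>.<extension>
NAME = re.compile(r"^(\d+)-p(\d+)\.(\w+)$")

def parse_name(filename):
    """Split an extracted file name into (serial, page, extension), or None."""
    m = NAME.match(Path(filename).name)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), m.group(3)

print(extract_images_command("in.pdf", "output"))
print(parse_name("output/002-p3.jpg"))
```

The argument list can be passed to subprocess.run once the output directory has been created.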
The -list-images operation lists all images in the file:
6, 1, /Z_Im0, 3300, 2550, 13432, 1, /DeviceGray, /CCITTFaxDecode
9, 2 13 14 15, /Z_Im0, 3376, 2649, 37972, 1, /DeviceGray, /CCITTFaxDecode
The fields are object number, page numbers, image name, width, height, size in bytes, bits per pixel, colour space, filter (compression method). With -list-images-json, the same information is available in JSON format:
[
  {
    "Object": 6,
    "Pages": [ 1 ],
    "Name": "/Z_Im0",
    "Width": 3300,
    "Height": 2550,
    "Bytes": 13432,
    "BitsPerComponent": 1,
    "Colourspace": "/DeviceGray",
    "Filter": "/CCITTFaxDecode"
  },
  {
    "Object": 9,
    "Pages": [ 2, 13, 14, 15 ],
    "Name": "/Z_Im0",
    "Width": 3376,
    "Height": 2649,
    "Bytes": 37972,
    "BitsPerComponent": 1,
    "Colourspace": "/DeviceGray",
    "Filter": "/CCITTFaxDecode"
  }
]
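The JSON form is convenient for scripting. The following Python sketch, using only the sample data shown above, totals the image data and picks out images used on more than one page. The function name summarise_images is hypothetical; the keys are those appearing in the sample.

```python
import json

# Sample output of -list-images-json, copied from the example above.
SAMPLE = """
[ { "Object": 6, "Pages": [ 1 ], "Name": "/Z_Im0", "Width": 3300,
    "Height": 2550, "Bytes": 13432, "BitsPerComponent": 1,
    "Colourspace": "/DeviceGray", "Filter": "/CCITTFaxDecode" },
  { "Object": 9, "Pages": [ 2, 13, 14, 15 ], "Name": "/Z_Im0", "Width": 3376,
    "Height": 2649, "Bytes": 37972, "BitsPerComponent": 1,
    "Colourspace": "/DeviceGray", "Filter": "/CCITTFaxDecode" } ]
"""

def summarise_images(json_text):
    """Return (total bytes, object numbers of images used on several pages)."""
    images = json.loads(json_text)
    total_bytes = sum(img["Bytes"] for img in images)
    shared = [img["Object"] for img in images if len(img["Pages"]) > 1]
    return total_bytes, shared

print(summarise_images(SAMPLE))  # (51404, [9])
```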
To list all images in the given range of pages which fall below a given resolution (in dots-per-inch), use the -image-resolution function:
cpdf -image-resolution 300 in.pdf [<range>]
2, /Im5, 531, 684, 149.935297, 150.138267, 31
2, /Im6, 184, 164, 149.999988, 150.458710, 39
2, /Im7, 171, 156, 149.999996, 150.579145, 40
2, /Im9, 65, 91, 149.999986, 151.071856, 57
2, /Im10, 94, 60, 149.999990, 152.284285, 59
2, /Im15, 184, 139, 149.960011, 150.672060, 91
4, /Im29, 53, 48, 149.970749, 151.616446, 93
The information is also available in JSON format:
[
  {
    "Object": 240,
    "Page": 79,
    "XObject": "/Z_Im0",
    "W": 3326,
    "H": 2584,
    "Xdpi": 300.0,
    "Ydpi": 300.0
  },
  {
    "Object": 243,
    "Page": 80,
    "XObject": "/Z_Im0",
    "W": 3300,
    "H": 2550,
    "Xdpi": 300.0,
    "Ydpi": 300.0
  }
]
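The plain-text form can also be read by a script. The sketch below assumes the seven comma-separated fields are page number, image name, width and height in pixels, x and y resolution, and object number; this order is inferred by matching the sample text output against the JSON keys, and is not stated above. The function names are hypothetical, and image names containing commas are not handled.

```python
# Sample text output of -image-resolution, copied from the example above.
SAMPLE = """\
2, /Im5, 531, 684, 149.935297, 150.138267, 31
2, /Im6, 184, 164, 149.999988, 150.458710, 39
2, /Im7, 171, 156, 149.999996, 150.579145, 40
2, /Im9, 65, 91, 149.999986, 151.071856, 57
2, /Im10, 94, 60, 149.999990, 152.284285, 59
2, /Im15, 184, 139, 149.960011, 150.672060, 91
4, /Im29, 53, 48, 149.970749, 151.616446, 93
"""

def parse_resolution_line(line):
    """Parse one line into a dictionary keyed like the JSON output."""
    # Assumed field order: page, name, W, H, Xdpi, Ydpi, object number.
    page, name, w, h, xdpi, ydpi, obj = [f.strip() for f in line.split(",")]
    return {"Object": int(obj), "Page": int(page), "XObject": name,
            "W": int(w), "H": int(h), "Xdpi": float(xdpi), "Ydpi": float(ydpi)}

def lowest_resolution(text):
    """Return the record whose smaller resolution value is lowest overall."""
    records = [parse_resolution_line(l) for l in text.splitlines() if l.strip()]
    return min(records, key=lambda r: min(r["Xdpi"], r["Ydpi"]))

print(lowest_resolution(SAMPLE)["XObject"])  # /Im5
```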
To list all images regardless of resolution, use -list-images-used or -list-images-used-json instead.
To remove a particular image, find its name using -list-images, then apply the -draft and -draft-remove-only operations from Section 19.1.
Cpdf can process images within a PDF, replacing the original with the processed version. It does this by saving out the image data, putting it through an external process, and then reading it back in and re-inserting it. This is typically used to reduce the size of image data, and thus the size of the PDF.
There are a number of options to deal with lossy (e.g. JPEG) and lossless images, one or more of which must be specified. For example, the -jpeg-to-jpeg option reprocesses existing JPEG images at a given JPEG quality level:
cpdf -process-images -im magick -jpeg-to-jpeg 65 in.pdf -o out.pdf
ImageMagick is required. Use -im to supply it. If we specify -process-images-info too, we can see the work being done:
cpdf -process-images -process-images-info -jpeg-to-jpeg 65
-im magick in.pdf -o out.pdf
Here is sample output:
(20/344) Object 265 (JPEG)... JPEG to JPEG 40798 -> 33463 (82%)
(38/344) Object 278 (JPEG)... JPEG to JPEG 4382 -> 3482 (79%)
(87/344) Object 266 (JPEG)... JPEG to JPEG 37227 -> 30199 (81%)
(243/344) Object 209 (JPEG)... no size reduction
(246/344) Object 270 (JPEG)... JPEG to JPEG 202568 -> 191175 (94%)
(281/344) Object 280 (JPEG)... JPEG to JPEG 12255 -> 9825 (80%)
(312/344) Object 279 (JPEG)... JPEG to JPEG 4117 -> 3157 (76%)
Similar output appears for the other methods, when they are specified. You can see the counter of work being done, and the result for each image chosen for processing.
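To get an overall figure from such output, the per-image lines can be totalled. The following Python sketch works on the sample output shown above; the function name summarise_log and the regular expressions are illustrative, written against that sample only, and output from other methods may need different patterns.

```python
import re

# Sample -process-images-info output, copied from the example above.
SAMPLE = """\
(20/344) Object 265 (JPEG)... JPEG to JPEG 40798 -> 33463 (82%)
(38/344) Object 278 (JPEG)... JPEG to JPEG 4382 -> 3482 (79%)
(87/344) Object 266 (JPEG)... JPEG to JPEG 37227 -> 30199 (81%)
(243/344) Object 209 (JPEG)... no size reduction
(246/344) Object 270 (JPEG)... JPEG to JPEG 202568 -> 191175 (94%)
(281/344) Object 280 (JPEG)... JPEG to JPEG 12255 -> 9825 (80%)
(312/344) Object 279 (JPEG)... JPEG to JPEG 4117 -> 3157 (76%)
"""

LINE = re.compile(r"\((\d+)/(\d+)\) Object (\d+) \((\w+)\)\.\.\. (.*)")
SIZES = re.compile(r"(\d+) -> (\d+)")

def summarise_log(text):
    """Return (bytes before, bytes after, count of images left unchanged)."""
    before = after = unchanged = 0
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m is None:
            continue  # not a per-image progress line
        sizes = SIZES.search(m.group(5))
        if sizes:
            before += int(sizes.group(1))
            after += int(sizes.group(2))
        else:
            unchanged += 1  # e.g. "no size reduction"
    return before, after, unchanged

before, after, unchanged = summarise_log(SAMPLE)
print(before, after, unchanged)  # 301347 271301 1
```

For the sample, the replaced images shrink from 301347 to 271301 bytes in total, a saving of 30046 bytes, and one image is left as it was.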
The -lossless-to-jpeg option converts lossless images within PDFs to JPEG too, at the given quality level. It may be specified in addition to -jpeg-to-jpeg:
cpdf -process-images -jpeg-to-jpeg 65 -lossless-to-jpeg 80
-im magick in.pdf -o out.pdf
Images are only processed if they meet certain thresholds. The defaults may be changed with the -pixel-threshold, -length-threshold, -percentage-threshold and -dpi-threshold options.
Instead of compressing lossless images with lossy JPEG compression, we can resample losslessly:
cpdf -process-images -im magick -lossless-resample 80 in.pdf -o out.pdf
This will resample losslessly-compressed images to contain 80 percent of the original pixels. By default, there is no interpolation. To use interpolation, which may result in slightly larger data, add -resample-interpolate. To target a resolution in DPI rather than a percentage, use -lossless-resample-dpi:
cpdf -process-images -im magick -lossless-resample-dpi 300
in.pdf -o out.pdf
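When driving these operations from a script, it can help to check the option combination before calling cpdf. The sketch below is one possible wrapper, not part of cpdf: the function name process_images_command is hypothetical, and the rule that only one of -lossless-resample, -lossless-resample-dpi and -lossless-to-jpeg may be given is read from the synopsis at the start of this chapter.

```python
def process_images_command(pdf, out, im="magick", info=False,
                           jpeg_to_jpeg=None, lossless_to_jpeg=None,
                           lossless_resample=None, lossless_resample_dpi=None):
    """Return the argument list for a cpdf -process-images run."""
    lossless = {
        "-lossless-to-jpeg": lossless_to_jpeg,
        "-lossless-resample": lossless_resample,
        "-lossless-resample-dpi": lossless_resample_dpi,
    }
    chosen = [(flag, v) for flag, v in lossless.items() if v is not None]
    # Assumption from the synopsis: the lossless options are alternatives.
    if len(chosen) > 1:
        raise ValueError("give only one option for lossless images")
    if jpeg_to_jpeg is None and not chosen:
        raise ValueError("give at least one processing option")
    cmd = ["cpdf", "-process-images"]
    if info:
        cmd.append("-process-images-info")
    cmd += ["-im", im]
    if jpeg_to_jpeg is not None:
        cmd += ["-jpeg-to-jpeg", str(jpeg_to_jpeg)]
    for flag, value in chosen:
        cmd += [flag, str(value)]
    cmd += [pdf, "-o", out]
    return cmd

print(process_images_command("in.pdf", "out.pdf",
                             jpeg_to_jpeg=65, lossless_to_jpeg=80))
```

The returned list can be handed to subprocess.run; ImageMagick must be available under the name given for -im.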
The methods so far introduced do not operate on 1 bit per pixel data. Different compression mechanisms are typically in use, and we need a different approach. The -1bpp-method option specifies what to do with losslessly compressed 1 bit-per-pixel images.
These options require the jbig2enc program, whose location may be specified with -jbig2enc. For lossy JBIG2, the threshold for similarity of data may be set with -jbig2-lossy-threshold. For example:
cpdf -process-images -jbig2enc jbig2enc -1bpp-method JBIG2Lossy
-jbig2-lossy-threshold 75 in.pdf -o out.pdf
It is not currently possible to reprocess lossless JBIG2 into lossy JBIG2, nor is it possible to recompress into CCITT.
NB: CMYK images will be either converted to RGB or left untouched by some of these processes. A future version of cpdf will remove this limitation.