13 Images

cpdf -extract-images in.pdf [<range>] [-im <path>] [-p2p <path>] [-dedup | -dedup-perpage] [-raw] -o <path>

cpdf -list-images[-json] in.pdf [<range>]

cpdf -image-resolution[-json] <minimum resolution> in.pdf [<range>]

cpdf -list-images-used[-json] in.pdf [<range>]

cpdf -process-images [-process-images-info] in.pdf [<range>]
     [-im <filename>] [-jbig2enc <filename>]
     [-lossless-resample[-dpi] <n> | -lossless-to-jpeg <n>]
     [-jpeg-to-jpeg <n>] [-jpeg-to-jpeg-scale <n>]
     [-jpeg-to-jpeg-dpi <n>] [-1bpp-method <method>]
     [-jbig2-lossy-threshold <n>] [-pixel-threshold <n>]
     [-length-threshold <n>] [-percentage-threshold <n>]
     [-dpi-threshold <n>] [-resample-interpolate]
     -o out.pdf

cpdf -rasterize in.pdf [<range>] -o out.pdf
     [-rasterize-gray | -rasterize-1bpp | -rasterize-jpeg | -rasterize-jpeggray]
     [-rasterize-res <n>] [-rasterize-jpeg-quality <n>]
     [-rasterize-no-antialias | -rasterize-downsample]
     [-rasterize-annots]

cpdf -output-image in.pdf <range> -o <format>
     [-rasterize-gray | -rasterize-1bpp | -rasterize-jpeg | -rasterize-jpeggray]
     [-rasterize-res <n>] [-rasterize-jpeg-quality <n>]
     [-rasterize-no-antialias | -rasterize-downsample]
     [-rasterize-annots] [-tobox <BoxName>]

13.1 Extracting images

Cpdf can extract the raster images in a PDF to a given location. JPEG, JPEG2000 and lossless JBIG2 images are extracted directly.

Lossy JBIG2 images are extracted likewise, but an extra __<n> is added, giving the number of the JBIG2Global stream for this image, which is extracted as <n>.jbig2global. You may reconstruct the individual images with, for example, jbig2dec.

Other images are written as PNGs, processed with either ImageMagick’s “magick” command, or NetPBM’s “pnmtopng” program, whichever is installed.

cpdf -extract-images in.pdf [<range>] [-im <path>] [-p2p <path>] [-dedup | -dedup-perpage] [-raw] -o <path>

The -im or -p2p option gives the path to the external tool, one of which must be installed (unless -raw is added, which instead outputs just JPEG or plain .pnm files).

The output specifier, e.g. -o output/%%%, gives the number format for numbering the images. Output files are named serially from 0, and include the page number too. For example, output files might be called output/000-p1.jpg, output/001-p1.png, output/002-p3.jpg etc. The specification %objnum may also be used to insert the object number of the image. Here is an example invocation:

cpdf -extract-images in.pdf -im magick -o output/%%%

The output directory must already exist. The -dedup option deduplicates images entirely; the -dedup-perpage option only per page.
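
As a minimal sketch of scripting the steps above (this is not part of Cpdf), the following Python builds the example command line, creates the output directory first because it must already exist, and reads the serial and page numbers back out of the resulting file names. The helper names extract_argv, run_extract and parse_name are hypothetical, and the file-name pattern is inferred only from the example names given above.

```python
import re
import subprocess
from pathlib import Path

def extract_argv(pdf, outdir, im="magick", dedup=False):
    """Build (but do not run) a cpdf -extract-images command line."""
    argv = ["cpdf", "-extract-images", pdf, "-im", im]
    if dedup:
        argv.append("-dedup")
    argv += ["-o", f"{outdir}/%%%"]
    return argv

def run_extract(pdf, outdir, **kwargs):
    """Create the output directory, then run cpdf (which must be installed)."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    return subprocess.run(extract_argv(pdf, outdir, **kwargs), check=True)

# Assumed pattern, inferred from names such as 000-p1.jpg
NAME_RE = re.compile(r"^(\d+)-p(\d+)\.(\w+)$")

def parse_name(filename):
    """Return (serial, page, extension), or None if the name does not match."""
    m = NAME_RE.match(Path(filename).name)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), m.group(3)
```

For instance, parse_name("output/002-p3.jpg") gives serial 2, page 3 and extension "jpg", which is enough to group the extracted files by page.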

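13.2 Listing images

The -list-images operation lists the images in the file, one per line:

cpdf -list-images in.pdf [<range>]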

6, 1, /Z_Im0, 3300, 2550, 13432, 1, /DeviceGray, /CCITTFaxDecode
9, 2 13 14 15, /Z_Im0, 3376, 2649, 37972, 1, /DeviceGray, /CCITTFaxDecode

The fields are object number, page numbers, image name, width, height, size in bytes, bits per pixel, colour space, filter (compression method). With -list-images-json, the same information is available in JSON format:

[
  {
    "Object": 6,
    "Pages": [ 1 ],
    "Name": "/Z_Im0",
    "Width": 3300,
    "Height": 2550,
    "Bytes": 13432,
    "BitsPerComponent": 1,
    "Colourspace": "/DeviceGray",
    "Filter": "/CCITTFaxDecode"
  },
  {
    "Object": 9,
    "Pages": [ 2, 13, 14, 15 ],
    "Name": "/Z_Im0",
    "Width": 3376,
    "Height": 2649,
    "Bytes": 37972,
    "BitsPerComponent": 1,
    "Colourspace": "/DeviceGray",
    "Filter": "/CCITTFaxDecode"
  }
]
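
The JSON form is convenient for scripting. As a small sketch, the following Python loads the sample shown above and reports the total image data size and which image objects are used on more than one page. The function name summarize is hypothetical; the keys are those appearing in the sample.

```python
import json

# Sample copied from the -list-images-json output shown above
SAMPLE = """
[
  {"Object": 6, "Pages": [1], "Name": "/Z_Im0", "Width": 3300,
   "Height": 2550, "Bytes": 13432, "BitsPerComponent": 1,
   "Colourspace": "/DeviceGray", "Filter": "/CCITTFaxDecode"},
  {"Object": 9, "Pages": [2, 13, 14, 15], "Name": "/Z_Im0", "Width": 3376,
   "Height": 2649, "Bytes": 37972, "BitsPerComponent": 1,
   "Colourspace": "/DeviceGray", "Filter": "/CCITTFaxDecode"}
]
"""

def summarize(images):
    """Return (total bytes, object numbers of images used on several pages)."""
    total = sum(img["Bytes"] for img in images)
    shared = [img["Object"] for img in images if len(img["Pages"]) > 1]
    return total, shared

images = json.loads(SAMPLE)
print(summarize(images))  # prints (51404, [9])
```

In practice the JSON text would come from running cpdf -list-images-json and capturing its output, rather than from an embedded string.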

13.3 Listing images at point of use

To list all images in the given range of pages which fall below a given resolution (in dots-per-inch), use the -image-resolution function:

cpdf -image-resolution 300 in.pdf [<range>]

2, /Im5, 531, 684, 149.935297, 150.138267, 31
2, /Im6, 184, 164, 149.999988, 150.458710, 39
2, /Im7, 171, 156, 149.999996, 150.579145, 40
2, /Im9, 65, 91, 149.999986, 151.071856, 57
2, /Im10, 94, 60, 149.999990, 152.284285, 59
2, /Im15, 184, 139, 149.960011, 150.672060, 91
4, /Im29, 53, 48, 149.970749, 151.616446, 93

The format is page number, image name, x pixels, y pixels, x resolution, y resolution, object number. The resolutions refer to the image’s effective resolution at point of use (taking account of scaling, rotation etc.). With -image-resolution-json, the output is JSON, with entries of this form:

[
  {
    "Object": 240,
    "Page": 79,
    "XObject": "/Z_Im0",
    "W": 3326,
    "H": 2584,
    "Xdpi": 300.0,
    "Ydpi": 300.0
  },
  {
    "Object": 243,
    "Page": 80,
    "XObject": "/Z_Im0",
    "W": 3300,
    "H": 2550,
    "Xdpi": 300.0,
    "Ydpi": 300.0
  }
]

To list all images regardless of resolution, use -list-images-used or -list-images-used-json instead.
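
The comma-separated text output shown earlier in this section is also easy to read from a script. The following Python is a sketch that parses such lines into records and finds the use with the lowest effective resolution. The names ImageUse and parse_line are hypothetical, and the parser assumes image names contain no commas.

```python
from typing import NamedTuple

class ImageUse(NamedTuple):
    page: int
    name: str
    width: int
    height: int
    xdpi: float
    ydpi: float
    objnum: int

def parse_line(line):
    """Parse one line: page, name, x pixels, y pixels, x res, y res, object."""
    page, name, w, h, xdpi, ydpi, obj = [f.strip() for f in line.split(",")]
    return ImageUse(int(page), name, int(w), int(h),
                    float(xdpi), float(ydpi), int(obj))

# Two lines copied from the sample output above
SAMPLE = """\
2, /Im5, 531, 684, 149.935297, 150.138267, 31
4, /Im29, 53, 48, 149.970749, 151.616446, 93
"""

uses = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]
# The worst case for each use is the smaller of its two resolutions
lowest = min(uses, key=lambda u: min(u.xdpi, u.ydpi))
print(lowest.page, lowest.name)
```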

13.4 Removing an Image

To remove a particular image, find its name using -list-images then apply the -draft and -draft-remove-only operations from Section 20.1.

13.5 Processing Images

Cpdf can process images within a PDF, replacing the original with the processed version. It does this by saving out the image data, putting it through an external process, and then reading it back in and re-inserting it. This is typically used to reduce the size of image data, and thus the size of the PDF.

There are a number of options to deal with lossy (e.g. JPEG) and lossless images, one or more of which must be specified. For example, the -jpeg-to-jpeg option reprocesses existing JPEG images at a given JPEG quality level:

cpdf -process-images -im magick -jpeg-to-jpeg 65 in.pdf -o out.pdf

ImageMagick is required. Use -im to supply it. If we specify -process-images-info too, we can see the work being done:

cpdf -process-images -process-images-info -jpeg-to-jpeg 65
-im magick in.pdf -o out.pdf

(20/344) Object 265 (JPEG)... JPEG to JPEG 40798 -> 33463 (82%)
(38/344) Object 278 (JPEG)... JPEG to JPEG 4382 -> 3482 (79%)
(87/344) Object 266 (JPEG)... JPEG to JPEG 37227 -> 30199 (81%)
(243/344) Object 209 (JPEG)... no size reduction
(246/344) Object 270 (JPEG)... JPEG to JPEG 202568 -> 191175 (94%)
(281/344) Object 280 (JPEG)... JPEG to JPEG 12255 -> 9825 (80%)
(312/344) Object 279 (JPEG)... JPEG to JPEG 4117 -> 3157 (76%)

Similar output appears for the other methods, when they are specified. You can see the counter of work being done, and the result for each image chosen for processing.
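
To total the savings reported by -process-images-info, the lines can be parsed with a script. This Python sketch assumes the line shape seen in the sample output above (other methods may print differently); the names parse_info_line and bytes_saved are hypothetical.

```python
import re

# Assumed line shape, taken from the sample output above
LINE_RE = re.compile(r"^\((\d+)/(\d+)\) Object (\d+) \(([^)]*)\)\.\.\. (.*)$")
SIZE_RE = re.compile(r"(\d+) -> (\d+) \((\d+)%\)$")

def parse_info_line(line):
    """Return a dict for one progress line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    sizes = SIZE_RE.search(m.group(5))
    before, after = (int(sizes.group(1)), int(sizes.group(2))) if sizes else (None, None)
    return {"object": int(m.group(3)), "kind": m.group(4),
            "before": before, "after": after}

def bytes_saved(lines):
    """Sum the size reductions over all lines that report one."""
    saved = 0
    for line in lines:
        rec = parse_info_line(line)
        if rec and rec["before"] is not None:
            saved += rec["before"] - rec["after"]
    return saved

SAMPLE = [
    "(20/344) Object 265 (JPEG)... JPEG to JPEG 40798 -> 33463 (82%)",
    "(243/344) Object 209 (JPEG)... no size reduction",
    "(246/344) Object 270 (JPEG)... JPEG to JPEG 202568 -> 191175 (94%)",
]
print(bytes_saved(SAMPLE))  # prints 18728
```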

The -lossless-to-jpeg option converts lossless images within PDFs to JPEG too, at the given quality level. It may be specified in addition to -jpeg-to-jpeg:

cpdf -process-images -jpeg-to-jpeg 65 -lossless-to-jpeg 80
-im magick in.pdf -o out.pdf

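Images are only processed if they meet certain thresholds. The defaults may be changed with -pixel-threshold, -length-threshold, -percentage-threshold and -dpi-threshold, which are tabulated at the end of this chapter.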

Instead of compressing lossless images with lossy JPEG compression, we can resample losslessly:

cpdf -process-images -im magick -lossless-resample 80 in.pdf -o out.pdf

This will resample losslessly-compressed images to 80 percent of the original width and height. By default, there will be no interpolation. To use interpolation, which may result in slightly larger data, add -resample-interpolate. To use a DPI target rather than a percentage, use -lossless-resample-dpi:

cpdf -process-images -im magick -lossless-resample-dpi 300
in.pdf -o out.pdf
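
As a worked illustration of what these numbers mean: scaling both width and height to 80 percent leaves (80/100)^2 = 64 percent of the pixels. The Python below is a sketch of that arithmetic only. The rounding of the new dimensions, and the idea that a DPI target corresponds to a uniform scale of target/current, are assumptions made for illustration and are not taken from Cpdf itself.

```python
def scaled_size(width, height, percent):
    """Width and height after scaling both by percent (rounding is assumed)."""
    return round(width * percent / 100), round(height * percent / 100)

def pixel_fraction(percent):
    """Fraction of the original pixel count left after scaling both axes."""
    return percent ** 2 / 10000

def percent_for_dpi(current_dpi, target_dpi):
    """Uniform scale (in percent) taking an image from current to target DPI."""
    return 100 * target_dpi / current_dpi

print(scaled_size(3300, 2550, 80))   # prints (2640, 2040)
print(pixel_fraction(80))            # prints 0.64
print(percent_for_dpi(600, 300))     # prints 50.0
```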

We can also use resampling with -jpeg-to-jpeg, by specifying -jpeg-to-jpeg-scale or, for a DPI target, -jpeg-to-jpeg-dpi:

cpdf -process-images -im magick -jpeg-to-jpeg 70 -jpeg-to-jpeg-scale 50
in.pdf -o out.pdf

cpdf -process-images -im magick -jpeg-to-jpeg 70 -jpeg-to-jpeg-dpi 150
in.pdf -o out.pdf

The methods so far introduced do not operate on 1 bit-per-pixel data. Different compression mechanisms are typically in use, and we need a different approach. The -1bpp-method option specifies what to do with losslessly compressed 1 bit-per-pixel images. The available methods, JBIG2 and JBIG2Lossy, are tabulated at the end of this chapter.

Both methods require the jbig2enc program, whose location may be specified with -jbig2enc. For lossy JBIG2, the threshold for similarity of data may be set with -jbig2-lossy-threshold. For example:

cpdf -process-images -jbig2enc jbig2enc -1bpp-method JBIG2Lossy
-jbig2-lossy-threshold 75 in.pdf -o out.pdf

It is not currently possible to reprocess lossless JBIG2 into lossy JBIG2, nor is it possible to recompress into CCITT.

NB: CMYK images will be converted to RGB or left untouched by some of these processes. A future version of Cpdf will remove this limitation.

13.6 Rasterization (PDF to image conversion)

Cpdf can send individual pages of a PDF out to gs (Ghostscript) to rasterize them. They are then read back in, replacing the original page content:

cpdf -gs gs -rasterize in.pdf -o out.pdf

Other metadata (for example, bookmarks) is preserved. By default, the resolution is 144 dpi, and the raster data is losslessly compressed. It is the Crop Box which is rasterized, or the Media Box if the Crop Box is absent. The following options may be added:

Option	Effect

-rasterize-gray	Use grayscale instead of colour
-rasterize-1bpp	Use monochrome instead of colour
-rasterize-jpeg	Use JPEG instead of lossless compression
-rasterize-jpeggray	Use grayscale JPEG instead of lossless compression
-rasterize-jpeg-quality	Set JPEG image quality (0..100)
-rasterize-res	Set the resolution
-rasterize-annots	Rasterize annotations instead of retaining
-rasterize-no-antialias	Turn off antialiasing
-rasterize-downsample	Use better but slower antialiasing
-gs-quiet	Don’t show gs output
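
When rasterization is driven from a script, it can help to check the option combination before calling Cpdf. The following Python is a sketch of such a wrapper: it builds the command line in the order used by the synopsis at the start of this chapter, and rejects combinations that the synopsis and table rule out. The function name rasterize_argv and its parameters are hypothetical; only the cpdf options themselves come from this chapter.

```python
MODES = (None, "gray", "1bpp", "jpeg", "jpeggray")

def rasterize_argv(pdf, out, gs="gs", mode=None, res=None, jpeg_quality=None,
                   no_antialias=False, downsample=False, annots=False):
    """Build (but do not run) a cpdf -rasterize command line."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if no_antialias and downsample:
        # The synopsis lists these two as alternatives
        raise ValueError("choose at most one of no_antialias and downsample")
    if jpeg_quality is not None and not 0 <= jpeg_quality <= 100:
        raise ValueError("jpeg_quality must be in the range 0..100")

    argv = ["cpdf", "-gs", gs, "-rasterize", pdf, "-o", out]
    if mode is not None:
        argv.append(f"-rasterize-{mode}")
    if res is not None:
        argv += ["-rasterize-res", str(res)]
    if jpeg_quality is not None:
        argv += ["-rasterize-jpeg-quality", str(jpeg_quality)]
    if no_antialias:
        argv.append("-rasterize-no-antialias")
    if downsample:
        argv.append("-rasterize-downsample")
    if annots:
        argv.append("-rasterize-annots")
    return argv
```

The resulting list can be passed to subprocess.run once cpdf and gs are installed.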

In addition to rasterization of pages, we can export them in PNG or JPEG format, again by the use of gs:

cpdf -gs gs -output-image in.pdf 10-end -o image%%%.png

This will extract pages 10 onwards to the files image000.png, image001.png and so on. All the options above apply, and in addition we can choose which box is rasterized with -tobox <BoxName>.

Thresholds for -process-images (Section 13.5):

Option	Effect	Default value

-pixel-threshold	Images below this number of pixels not processed	25
-length-threshold	Images with less than this number of bytes of data not processed	100
-percentage-threshold	Results not below this percentage of original size discarded	99
-dpi-threshold	Only images above this threshold at all use points processed	(no dpi check)
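
The threshold rules in this table can be read as two checks: one before processing and one after. The Python below is a sketch of that reading, not Cpdf's own code. It assumes that the pixel threshold applies to width times height, and the function names eligible and keep_result are hypothetical. The -dpi-threshold check is omitted because it has no default.

```python
def eligible(width, height, nbytes, pixel_threshold=25, length_threshold=100):
    """Before processing: skip images that are too small to be worth it.

    Assumption: "number of pixels" means width * height.
    """
    return width * height >= pixel_threshold and nbytes >= length_threshold

def keep_result(original_bytes, new_bytes, percentage_threshold=99):
    """After processing: keep the result only if it is below the given
    percentage of the original size; otherwise it is discarded."""
    return new_bytes * 100 < original_bytes * percentage_threshold

# Sizes from the -process-images-info sample in Section 13.5: 94% of original
print(keep_result(202568, 191175))  # prints True
# A result at 99.5% of the original is not below 99%, so it is discarded
print(keep_result(1000, 995))       # prints False
```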

Methods for -1bpp-method (Section 13.5):

Method	Effect

JBIG2	Lossless JBIG2
JBIG2Lossy	Lossy JBIG2, sharing JBIG2Globals data amongst all images.