Chapter 13
Working with Images

cpdf -extract-images in.pdf [<range>] [-im <path>] [-p2p <path>] [-dedup | -dedup-perpage] -o <path>

cpdf -image-resolution <minimum resolution> in.pdf [<range>]

13.1 Extracting images

Cpdf can extract the raster images in a PDF file to a given location. JPEG, JPEG2000 and JBIG2 images are extracted directly. Other images are converted to PNG using either ImageMagick’s “magick” command or NetPBM’s “pnmtopng” program, whichever is installed.

cpdf -extract-images in.pdf [<range>] [-im <path>] [-p2p <path>] [-dedup | -dedup-perpage] -o <path>

The -im or -p2p option gives the path to the external tool, one of which must be installed. The output specifier, e.g. -o output/%%%, gives the number format used to number the images. Output files are numbered serially from 0 and also include the page number. For example, the output files might be called output/000-p1.jpg, output/001-p1.png, output/002-p3.jpg, and so on. Here is an example invocation:

cpdf -extract-images in.pdf -im magick -o output/%%%

The output directory must already exist. The -dedup option removes duplicate images across the whole document; the -dedup-perpage option removes duplicates only within each page.
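The naming scheme can be illustrated with a short C sketch. This is not cpdf's own code: the helper image_output_name is hypothetical, and it assumes that the run of % characters sits at the end of the specifier and that its length sets the zero-padded width of the serial number, which is what the example names above suggest.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration of the documented naming scheme, not cpdf code.
 * Assumes the run of '%' characters ends the specifier; its length gives the
 * zero-padded width of the serial number, followed by "-p<page>.<ext>". */
static void image_output_name(char *buf, size_t len, const char *spec,
                              int serial, int page, const char *ext)
{
    const char *pct = strchr(spec, '%');               /* start of the '%' run */
    size_t prefix = pct ? (size_t)(pct - spec) : strlen(spec);
    int width = 0;

    while (pct && pct[width] == '%')                   /* count the '%' characters */
        width++;

    /* e.g. "output/" + "002" + "-p3" + ".jpg" */
    snprintf(buf, len, "%.*s%0*d-p%d.%s",
             (int)prefix, spec, width, serial, page, ext);
}
```

With the specifier output/%%%, serial number 2 on page 3 gives output/002-p3.jpg, matching the third example name above.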

13.2 Detecting Low-resolution Images

To list all images in the given range of pages which fall below a given resolution (in dots per inch), use the -image-resolution operation:

cpdf -image-resolution 300 in.pdf [<range>]

2, /Im5, 531, 684, 149.935297, 150.138267
2, /Im6, 184, 164, 149.999988, 150.458710
2, /Im7, 171, 156, 149.999996, 150.579145
2, /Im9, 65, 91, 149.999986, 151.071856
2, /Im10, 94, 60, 149.999990, 152.284285
2, /Im15, 184, 139, 149.960011, 150.672060
4, /Im29, 53, 48, 149.970749, 151.616446

The format is page number, image name, x pixels, y pixels, x resolution, y resolution. The resolutions refer to the image’s effective resolution at each point of use (taking account of scaling, rotation etc.).
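Because the output is plain comma-separated text, it is easy to post-process. The following C sketch parses one line into its six fields. The struct image_use type and both functions are hypothetical helpers, not part of cpdf. The second function assumes the usual definition of effective resolution, with 72 PDF points to the inch, to recover the length over which the image's pixels are drawn along one axis.

```c
#include <stdio.h>

/* Hypothetical record for one line of -image-resolution output:
 * page, image name, x pixels, y pixels, x resolution, y resolution. */
struct image_use {
    int page;
    char name[64];
    int xpixels, ypixels;
    double xres, yres;
};

/* Parse a line such as "2, /Im5, 531, 684, 149.935297, 150.138267".
 * Returns 1 if all six fields were read, 0 otherwise. */
static int parse_image_use(const char *line, struct image_use *u)
{
    return sscanf(line, " %d , %63[^,] , %d , %d , %lf , %lf",
                  &u->page, u->name, &u->xpixels, &u->ypixels,
                  &u->xres, &u->yres) == 6;
}

/* Assumption: effective resolution = pixels / (drawn length in inches),
 * and one inch is 72 points, so the drawn length is pixels * 72 / dpi. */
static double displayed_points(int pixels, double dpi)
{
    return pixels * 72.0 / dpi;
}
```

For the /Im5 line above, displayed_points(531, 149.935297) is roughly 255 points, or about 3.5 inches.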

13.3 Removing an Image

To remove a particular image, find its name using -image-resolution with a sufficiently high resolution (so as to list all images), and then apply the -draft and -draft-remove-only operations from Section 18.1.

C Interface

/* CHAPTER 13. Images. */

/*
 * Get image data, including resolution at all points of use. Calling
 * cpdf_startGetImageResolution(pdf, min_required_resolution) begins the
 * process of obtaining data on all image uses below min_required_resolution,
 * and returns the total number of such uses. So, to return all image uses,
 * specify a very high min_required_resolution. Then call the other functions,
 * giving a serial number 0..<total number> - 1, to retrieve the data. Finally,
 * call cpdf_endGetImageResolution to clean up.
 */
int cpdf_startGetImageResolution(int, float);
int cpdf_getImageResolutionPageNumber(int);
char *cpdf_getImageResolutionImageName(int);
int cpdf_getImageResolutionXPixels(int);
int cpdf_getImageResolutionYPixels(int);
double cpdf_getImageResolutionXRes(int);
double cpdf_getImageResolutionYRes(int);
void cpdf_endGetImageResolution(void);