Chapter 13
Working with Images

cpdf -extract-images in.pdf [<range>] [-im <path>] [-p2p <path>] [-dedup | -dedup-perpage] -o <path>

cpdf -image-resolution <minimum resolution> in.pdf [<range>]

13.1 Extracting Images

Cpdf can extract the raster images to a given location. JPEG, JPEG2000 and JBIG2 images are extracted directly. Other images are written as PNGs, processed with either ImageMagick’s “magick” command, or NetPBM’s “pnmtopng” program, whichever is installed.

cpdf -extract-images in.pdf [<range>] [-im <path>] [-p2p <path>] [-dedup | -dedup-perpage] -o <path>

The -im or -p2p option gives the path to the external tool, one of which must be installed. The output specifier, e.g. -o output/%%%, gives the number format for numbering the images. Output files are named serially from 0, and include the page number too. For example, output files might be called output/000-p1.jpg, output/001-p1.png, output/002-p3.jpg etc. Here is an example invocation:

cpdf -extract-images in.pdf -im magick -o output/%%%

The output directory must already exist. The -dedup option deduplicates images across the whole document; the -dedup-perpage option deduplicates only within each page.
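When post-processing the extracted files, it can help to recover the serial number and page number from each file name. The following is a minimal sketch, assuming the names always follow the pattern seen in the examples above (serial number, then "-p", then page number, then an extension). The helper names parseExtractedName and groupByPage are hypothetical and are not part of Cpdf.

```javascript
// Hypothetical helper: split a name such as "output/000-p1.jpg" into parts.
// The pattern is inferred from the example names in the text, not from a
// published specification of the naming scheme.
function parseExtractedName(path) {
  const base = path.split("/").pop();
  const m = /^(\d+)-p(\d+)\.([A-Za-z0-9]+)$/.exec(base);
  if (m === null) return null; // name does not match the assumed pattern
  return { serial: Number(m[1]), page: Number(m[2]), extension: m[3] };
}

// Hypothetical helper: group extracted file paths by their page number.
function groupByPage(paths) {
  const pages = new Map();
  for (const p of paths) {
    const info = parseExtractedName(p);
    if (info === null) continue; // skip names that do not match
    if (!pages.has(info.page)) pages.set(info.page, []);
    pages.get(info.page).push(p);
  }
  return pages;
}
```

For instance, groupByPage applied to the three example names above would place two files under page 1 and one file under page 3.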

13.2 Detecting Low-resolution Images

To list all images in the given range of pages which fall below a given resolution (in dots-per-inch), use the -image-resolution function:

cpdf -image-resolution 300 in.pdf [<range>]

2, /Im5, 531, 684, 149.935297, 150.138267
2, /Im6, 184, 164, 149.999988, 150.458710
2, /Im7, 171, 156, 149.999996, 150.579145
2, /Im9, 65, 91, 149.999986, 151.071856
2, /Im10, 94, 60, 149.999990, 152.284285
2, /Im15, 184, 139, 149.960011, 150.672060
4, /Im29, 53, 48, 149.970749, 151.616446

The format is page number, image name, x pixels, y pixels, x resolution, y resolution. The resolutions refer to the image’s effective resolution at point of use (taking account of scaling, rotation etc).
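The listing above is easy to consume from a script. The following is a minimal sketch, assuming each line holds exactly six comma-separated fields in the order just described and that image names contain no commas. The function name parseImageResolutionLine and the property names of the returned object are hypothetical.

```javascript
// Hypothetical helper: turn one line of -image-resolution output into an
// object. Assumes six comma-separated fields in the documented order.
function parseImageResolutionLine(line) {
  const fields = line.split(",").map((f) => f.trim());
  if (fields.length !== 6) return null; // not a line in the expected format
  const [page, name, xPixels, yPixels, xRes, yRes] = fields;
  return {
    page: Number(page),
    name: name,
    xPixels: Number(xPixels),
    yPixels: Number(yPixels),
    xRes: Number(xRes),
    yRes: Number(yRes),
  };
}

// Parse a whole listing, ignoring blank or malformed lines.
function parseImageResolutionOutput(text) {
  return text
    .split("\n")
    .map(parseImageResolutionLine)
    .filter((entry) => entry !== null);
}
```

Applied to the first line of the listing, this yields page 2, name "/Im5", 531 by 684 pixels, and the two resolution values as numbers.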

13.3 Removing an Image

To remove a particular image, find its name using -image-resolution with a sufficiently high resolution (so as to list all images), and then apply the -draft and -draft-remove-only operations from Section 18.1.

JavaScript Interface

 
//CHAPTER 13. Images 
 
/** Gets image data, including resolution at all points of use. Call 
startGetImageResolution(pdf, min_required_resolution) to begin the process of 
obtaining data on all image uses below min_required_resolution, returning the 
total number. So, to return all image uses, specify a very high 
min_required_resolution. Then, call the other functions giving a serial number 
0..n - 1, to retrieve the data. Finally, call endGetImageResolution to clean 
up. 
@arg {pdf} pdf PDF document 
@arg {number} min_required_resolution minimum required resolution 
@return {number} number of uses */ 
function startGetImageResolution(pdf, min_required_resolution) {} 
 
/** Gets image data, including resolution at all points of use. Call 
startGetImageResolution(pdf, min_required_resolution) to begin the process of 
obtaining data on all image uses below min_required_resolution, returning the 
total number. So, to return all image uses, specify a very high 
min_required_resolution. Then, call the other functions giving a serial number 
0..n - 1, to retrieve the data. Finally, call endGetImageResolution to clean 
up. 
@arg {number} n serial number 
@return {number} page number */ 
function getImageResolutionPageNumber(n) {} 
 
/** Gets image data, including resolution at all points of use. Call 
startGetImageResolution(pdf, min_required_resolution) to begin the process of 
obtaining data on all image uses below min_required_resolution, returning the 
total number. So, to return all image uses, specify a very high 
min_required_resolution. Then, call the other functions giving a serial number 
0..n - 1, to retrieve the data. Finally, call endGetImageResolution to clean 
up. 
@arg {number} n serial number 
@return {string} image name */ 
function getImageResolutionImageName(n) {} 
 
/** Gets image data, including resolution at all points of use. Call 
startGetImageResolution(pdf, min_required_resolution) to begin the process of 
obtaining data on all image uses below min_required_resolution, returning the 
total number. So, to return all image uses, specify a very high 
min_required_resolution. Then, call the other functions giving a serial number 
0..n - 1, to retrieve the data. Finally, call endGetImageResolution to clean 
up. 
@arg {number} n serial number 
@return {number} X pixels */ 
function getImageResolutionXPixels(n) {} 
 
/** Gets image data, including resolution at all points of use. Call 
startGetImageResolution(pdf, min_required_resolution) to begin the process of 
obtaining data on all image uses below min_required_resolution, returning the 
total number. So, to return all image uses, specify a very high 
min_required_resolution. Then, call the other functions giving a serial number 
0..n - 1, to retrieve the data. Finally, call endGetImageResolution to clean 
up. 
@arg {number} n serial number 
@return {number} Y pixels */ 
function getImageResolutionYPixels(n) {} 
 
/** Gets image data, including resolution at all points of use. Call 
startGetImageResolution(pdf, min_required_resolution) to begin the process of 
obtaining data on all image uses below min_required_resolution, returning the 
total number. So, to return all image uses, specify a very high 
min_required_resolution. Then, call the other functions giving a serial number 
0..n - 1, to retrieve the data. Finally, call endGetImageResolution to clean 
up. 
@arg {number} n serial number 
@return {number} X Res */ 
function getImageResolutionXRes(n) {} 
 
/** Gets image data, including resolution at all points of use. Call 
startGetImageResolution(pdf, min_required_resolution) to begin the process of 
obtaining data on all image uses below min_required_resolution, returning the 
total number. So, to return all image uses, specify a very high 
min_required_resolution. Then, call the other functions giving a serial number 
0..n - 1, to retrieve the data. Finally, call endGetImageResolution to clean 
up. 
@arg {number} n serial number 
@return {number} Y Res */ 
function getImageResolutionYRes(n) {} 
 
/** Gets image data, including resolution at all points of use. Call 
startGetImageResolution(pdf, min_required_resolution) to begin the process of 
obtaining data on all image uses below min_required_resolution, returning the 
total number. So, to return all image uses, specify a very high 
min_required_resolution. Then, call the other functions giving a serial number 
0..n - 1, to retrieve the data. Finally, call endGetImageResolution to clean 
up. */ 
function endGetImageResolution() {}
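
The start / retrieve / end protocol described in the comments above can be wrapped in a single helper. The following is a minimal sketch, not part of the library: collectImageUses is a hypothetical name, and the api parameter is assumed to be an object exposing the functions listed above as properties. How the library actually exposes them is not stated here.

```javascript
// Hypothetical wrapper around the documented start/get/end sequence.
// `api` is assumed to expose the functions listed above as properties.
function collectImageUses(api, pdf, minRequiredResolution) {
  // Begin the process; the return value is the number of image uses found.
  const count = api.startGetImageResolution(pdf, minRequiredResolution);
  const uses = [];
  try {
    // Serial numbers run from 0 to count - 1.
    for (let n = 0; n < count; n++) {
      uses.push({
        page: api.getImageResolutionPageNumber(n),
        name: api.getImageResolutionImageName(n),
        xPixels: api.getImageResolutionXPixels(n),
        yPixels: api.getImageResolutionYPixels(n),
        xRes: api.getImageResolutionXRes(n),
        yRes: api.getImageResolutionYRes(n),
      });
    }
  } finally {
    // Always clean up, even if one of the getters throws.
    api.endGetImageResolution();
  }
  return uses;
}
```

Passing a very high minRequiredResolution, as the comments above suggest, would return every image use in the document. The try/finally is a design choice of this sketch so that the clean-up call is never skipped.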