Chapter 13
Working with Images

cpdf -extract-images in.pdf [<range>] [-im <path>] [-p2p <path>] [-dedup | -dedup-perpage] -o <path>

cpdf -image-resolution <minimum resolution> in.pdf [<range>]

13.1 Extracting Images

Cpdf can extract the raster images in a PDF file to a given location. JPEG, JPEG2000 and JBIG2 images are extracted directly. Other images are written as PNG files, converted using either ImageMagick’s “magick” command or NetPBM’s “pnmtopng” program, whichever is installed.
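Since one of the two converters must be present, a calling program may want to check which one is available before running cpdf. The following is a minimal sketch, not part of cpdf: it searches a PATH-style string for an executable named magick or pnmtopng and returns the matching cpdf option (-im for ImageMagick, -p2p for NetPBM's pnmtopng) together with the tool's path. The helper names (which, toolOption), the preference for ImageMagick, and the Unix-style executable naming (no ".exe" suffix) are all assumptions made for illustration.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

class FindPngTool {
    // Hypothetical helper: looks in each directory of a PATH-style string for an
    // executable file with the given name. Assumes Unix-like names (no ".exe").
    static Optional<Path> which(String pathEnv, String program) {
        for (String dir : pathEnv.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            Path candidate = Path.of(dir, program);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    // Returns the cpdf option and tool path, e.g. ["-im", "/usr/bin/magick"].
    // Preferring ImageMagick over NetPBM is an arbitrary choice for this sketch.
    static Optional<List<String>> toolOption(String pathEnv) {
        Optional<Path> magick = which(pathEnv, "magick");
        if (magick.isPresent()) {
            return Optional.of(List.of("-im", magick.get().toString()));
        }
        return which(pathEnv, "pnmtopng").map(p -> List.of("-p2p", p.toString()));
    }

    public static void main(String[] args) {
        String pathEnv = System.getenv().getOrDefault("PATH", "");
        System.out.println(toolOption(pathEnv)
                .map(opt -> "pass to cpdf: " + String.join(" ", opt))
                .orElse("neither magick nor pnmtopng found on PATH"));
    }
}
```

The returned pair can be spliced directly into the argument list of a cpdf invocation.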

cpdf -extract-images in.pdf [<range>] [-im <path>] [-p2p <path>] [-dedup | -dedup-perpage] -o <path>

The -im or -p2p option gives the path to the external tool (ImageMagick’s magick or NetPBM’s pnmtopng respectively), one of which must be installed. The output specifier, e.g. -o output/%%%, gives the number format for numbering the images. Output files are numbered serially from 0, and their names also include the page number. For example, output files might be called output/000-p1.jpg, output/001-p1.png, output/002-p3.jpg and so on. Here is an example invocation:

cpdf -extract-images in.pdf -im magick -o output/%%%

The output directory must already exist. The -dedup option removes duplicate images across the whole document, so that each distinct image is written only once; the -dedup-perpage option removes duplicates only within each page.
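When driving cpdf from a Java program, the directory has to be created before the process starts. The following minimal sketch, assuming cpdf and magick are both on the PATH, builds the example invocation above (with -dedup added), creates the output directory, runs cpdf with java.lang.ProcessBuilder, and then groups the extracted files by the page number embedded in their names (000-p1.jpg, 001-p1.png and so on). The class and method names are hypothetical, and the filename pattern is inferred from the examples given in this section.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class ExtractImages {
    // Matches names such as "000-p1.jpg": serial number, page number, extension.
    private static final Pattern NAME = Pattern.compile("(\\d+)-p(\\d+)\\.([A-Za-z0-9]+)");

    // Builds: cpdf -extract-images <input> -im <tool> [-dedup] -o <outDir>/%%%
    static List<String> buildArgs(String input, String imTool, boolean dedup, Path outDir) {
        List<String> args = new ArrayList<>(List.of("cpdf", "-extract-images", input, "-im", imTool));
        if (dedup) {
            args.add("-dedup");
        }
        args.add("-o");
        args.add(outDir.resolve("%%%").toString()); // three digits: 000, 001, ...
        return args;
    }

    // Groups extracted file names by page number; unrecognised names are skipped.
    static Map<Integer, List<String>> byPage(List<String> fileNames) {
        Map<Integer, List<String>> pages = new TreeMap<>();
        for (String name : fileNames) {
            Matcher m = NAME.matcher(name);
            if (m.matches()) {
                pages.computeIfAbsent(Integer.parseInt(m.group(2)), k -> new ArrayList<>()).add(name);
            }
        }
        return pages;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path outDir = Path.of("output");
        Files.createDirectories(outDir); // cpdf expects the directory to exist already
        Process p = new ProcessBuilder(buildArgs("in.pdf", "magick", true, outDir)).inheritIO().start();
        if (p.waitFor() != 0) {
            System.err.println("cpdf exited with an error");
            return;
        }
        List<String> names = new ArrayList<>();
        try (Stream<Path> files = Files.list(outDir)) {
            files.forEach(f -> names.add(f.getFileName().toString()));
        }
        names.sort(null); // serial numbers are zero-padded, so this sorts by serial
        System.out.println(byPage(names));
    }
}
```

Passing the arguments as a list, rather than a single shell string, keeps the % characters away from any shell interpretation.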

13.2 Detecting Low-resolution Images

To list all images in the given range of pages that fall below a given resolution (in dots per inch), use the -image-resolution operation:

cpdf -image-resolution 300 in.pdf [<range>]

2, /Im5, 531, 684, 149.935297, 150.138267
2, /Im6, 184, 164, 149.999988, 150.458710
2, /Im7, 171, 156, 149.999996, 150.579145
2, /Im9, 65, 91, 149.999986, 151.071856
2, /Im10, 94, 60, 149.999990, 152.284285
2, /Im15, 184, 139, 149.960011, 150.672060
4, /Im29, 53, 48, 149.970749, 151.616446

The fields are: page number, image name, width in pixels, height in pixels, x resolution and y resolution. The resolutions are the image’s effective resolution at the point of use, taking account of scaling, rotation and so on.
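Because each line is a simple comma-separated record, the output is easy to consume from another program. The following sketch parses lines such as those shown above into a small value class. It also derives an approximate displayed size in points (1/72 inch) from pixels and effective resolution: pixels divided by dots per inch gives inches, and multiplying by 72 gives points. That conversion is only exact for an image that is neither rotated nor skewed. The class names are hypothetical, and the parser assumes image names never contain a comma.

```java
import java.util.ArrayList;
import java.util.List;

class ImageResolutionParser {
    // One line of -image-resolution output.
    static final class ImageUse {
        final int page;
        final String name;
        final int xPixels;
        final int yPixels;
        final double xRes; // effective dots per inch, horizontally
        final double yRes; // effective dots per inch, vertically

        ImageUse(int page, String name, int xPixels, int yPixels, double xRes, double yRes) {
            this.page = page;
            this.name = name;
            this.xPixels = xPixels;
            this.yPixels = yPixels;
            this.xRes = xRes;
            this.yRes = yRes;
        }

        // Approximate displayed width/height in points, assuming no rotation or skew.
        double displayWidthPoints() {
            return 72.0 * xPixels / xRes;
        }

        double displayHeightPoints() {
            return 72.0 * yPixels / yRes;
        }
    }

    // Parses e.g. "2, /Im5, 531, 684, 149.935297, 150.138267".
    static ImageUse parseLine(String line) {
        String[] f = line.split(",");
        if (f.length != 6) {
            throw new IllegalArgumentException("expected 6 fields: " + line);
        }
        return new ImageUse(
                Integer.parseInt(f[0].trim()),
                f[1].trim(),
                Integer.parseInt(f[2].trim()),
                Integer.parseInt(f[3].trim()),
                Double.parseDouble(f[4].trim()),
                Double.parseDouble(f[5].trim()));
    }

    // Parses every non-blank line of the command's output.
    static List<ImageUse> parseAll(String output) {
        List<ImageUse> uses = new ArrayList<>();
        for (String line : output.split("\\R")) {
            if (!line.isBlank()) {
                uses.add(parseLine(line));
            }
        }
        return uses;
    }

    public static void main(String[] args) {
        String sample = "2, /Im5, 531, 684, 149.935297, 150.138267\n"
                + "4, /Im29, 53, 48, 149.970749, 151.616446\n";
        for (ImageUse u : parseAll(sample)) {
            System.out.printf("page %d %s: about %.1f x %.1f pt at %.0f dpi%n",
                    u.page, u.name, u.displayWidthPoints(), u.displayHeightPoints(), u.xRes);
        }
    }
}
```

For /Im5 above, 531 pixels at roughly 150 dpi works out to about 255 points, or a little over 3.5 inches on the page.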

13.3 Removing an Image

To remove a particular image, first find its name by running -image-resolution with a sufficiently high resolution (so that every image is listed), then apply the -draft operation with its -draft-remove-only option, described in Section 18.1.
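The two steps can be scripted. The following sketch collects the distinct image names on one page from -image-resolution output, then builds a candidate removal command for a chosen name. The argument order cpdf -draft -draft-remove-only <name> in.pdf <range> -o out.pdf, and passing the name exactly as listed (with its leading slash), are assumptions here; check both against the synopsis in Section 18.1. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class RemoveImage {
    // Distinct image names used on the given page, in order of first appearance,
    // taken from -image-resolution output (page, name, x px, y px, x res, y res).
    static List<String> namesOnPage(String output, int page) {
        List<String> names = new ArrayList<>();
        for (String line : output.split("\\R")) {
            String[] f = line.split(",");
            if (f.length == 6 && Integer.parseInt(f[0].trim()) == page) {
                String name = f[1].trim();
                if (!names.contains(name)) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    // ASSUMPTION: argument order as described in the lead-in; verify in Section 18.1.
    static List<String> draftRemoveArgs(String input, String range, String imageName, String output) {
        return List.of("cpdf", "-draft", "-draft-remove-only", imageName, input, range, "-o", output);
    }

    public static void main(String[] args) {
        String output = "2, /Im5, 531, 684, 149.935297, 150.138267\n"
                + "2, /Im6, 184, 164, 149.999988, 150.458710\n";
        // Prints one candidate command per image found on page 2.
        for (String name : namesOnPage(output, 2)) {
            System.out.println(String.join(" ", draftRemoveArgs("in.pdf", "2", name, "out.pdf")));
        }
    }
}
```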

Java Interface


/* CHAPTER 13. Images. */

/** Gets image data, including resolution at all points of use. Call
{@link #startGetImageResolution(Pdf, double) startGetImageResolution(pdf,
min_required_resolution)} to begin the process of obtaining data on all
image uses below <code>min_required_resolution</code> (the <code>res</code>
argument), returning the total number. So, to return all image uses, specify
a very high <code>min_required_resolution</code>. Then, call the other
functions giving a serial number <code>0...n - 1</code>, to retrieve the
data. Finally, call {@link #endGetImageResolution() endGetImageResolution}
to clean up. */
public native int startGetImageResolution(Pdf pdf, double res)
    throws CpdfError;

/* Each of the following returns one field of the image use with the given
serial number: page number, image name, width and height in pixels, and
x and y resolution. */
public native int getImageResolutionPageNumber(int serial)
    throws CpdfError;

public native String getImageResolutionImageName(int serial)
    throws CpdfError;

public native int getImageResolutionXPixels(int serial) throws CpdfError;

public native int getImageResolutionYPixels(int serial) throws CpdfError;

public native double getImageResolutionXRes(int serial) throws CpdfError;

public native double getImageResolutionYRes(int serial) throws CpdfError;

public native void endGetImageResolution() throws CpdfError;
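The start, read, end protocol is easy to get wrong if an exception skips the clean-up call. The following sketch shows one way to use it safely, with endGetImageResolution in a finally block. So that it compiles and runs without the native library, it codes against a hypothetical interface, ImageResolutionApi, whose methods mirror the declarations above (with Pdf replaced by Object), plus a local stand-in for CpdfError; with the real bindings you would call the same methods on the object that declares them. The in-memory fake is for demonstration only and ignores the resolution threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class ImageResolutionReport {
    // Local stand-in for the bindings' CpdfError, so the sketch is self-contained.
    static class CpdfError extends Exception {
        CpdfError(String message) {
            super(message);
        }
    }

    // Hypothetical interface mirroring the native methods listed above.
    interface ImageResolutionApi {
        int startGetImageResolution(Object pdf, double res) throws CpdfError;
        int getImageResolutionPageNumber(int serial) throws CpdfError;
        String getImageResolutionImageName(int serial) throws CpdfError;
        int getImageResolutionXPixels(int serial) throws CpdfError;
        int getImageResolutionYPixels(int serial) throws CpdfError;
        double getImageResolutionXRes(int serial) throws CpdfError;
        double getImageResolutionYRes(int serial) throws CpdfError;
        void endGetImageResolution() throws CpdfError;
    }

    // Start, read serials 0..n-1, and always end, even if a getter throws.
    // Lines are formatted like the command-line output of -image-resolution.
    static List<String> report(ImageResolutionApi api, Object pdf, double minRes) throws CpdfError {
        int n = api.startGetImageResolution(pdf, minRes);
        List<String> lines = new ArrayList<>();
        try {
            for (int serial = 0; serial < n; serial++) {
                lines.add(String.format(Locale.ROOT, "%d, %s, %d, %d, %f, %f",
                        api.getImageResolutionPageNumber(serial),
                        api.getImageResolutionImageName(serial),
                        api.getImageResolutionXPixels(serial),
                        api.getImageResolutionYPixels(serial),
                        api.getImageResolutionXRes(serial),
                        api.getImageResolutionYRes(serial)));
            }
        } finally {
            api.endGetImageResolution();
        }
        return lines;
    }

    // In-memory fake: returns every row it was given and records clean-up.
    static class InMemoryApi implements ImageResolutionApi {
        final int[] pages;
        final String[] names;
        final int[] xs;
        final int[] ys;
        final double[] xRes;
        final double[] yRes;
        boolean ended = false;

        InMemoryApi(int[] pages, String[] names, int[] xs, int[] ys, double[] xRes, double[] yRes) {
            this.pages = pages;
            this.names = names;
            this.xs = xs;
            this.ys = ys;
            this.xRes = xRes;
            this.yRes = yRes;
        }

        public int startGetImageResolution(Object pdf, double res) { return pages.length; }
        public int getImageResolutionPageNumber(int s) { return pages[s]; }
        public String getImageResolutionImageName(int s) { return names[s]; }
        public int getImageResolutionXPixels(int s) { return xs[s]; }
        public int getImageResolutionYPixels(int s) { return ys[s]; }
        public double getImageResolutionXRes(int s) { return xRes[s]; }
        public double getImageResolutionYRes(int s) { return yRes[s]; }
        public void endGetImageResolution() { ended = true; }
    }

    public static void main(String[] args) throws CpdfError {
        InMemoryApi api = new InMemoryApi(new int[] {2}, new String[] {"/Im5"},
                new int[] {531}, new int[] {684},
                new double[] {149.935297}, new double[] {150.138267});
        report(api, null, 300.0).forEach(System.out::println);
    }
}
```

Using Locale.ROOT keeps the decimal separator a full stop regardless of the JVM's default locale.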