Chapter 19

cpdf -draft [-boxes] [-draft-remove-only <n>] in.pdf [<range>] -o out.pdf

cpdf -remove-all-text in.pdf [<range>] -o out.pdf

cpdf -blacktext in.pdf [<range>] -o out.pdf

cpdf -blacklines in.pdf [<range>] -o out.pdf

cpdf -blackfills in.pdf [<range>] -o out.pdf

cpdf -thinlines <minimum thickness> in.pdf [<range>] -o out.pdf

cpdf -clean in.pdf -o out.pdf

cpdf -set-version <version number> in.pdf -o out.pdf

cpdf -copy-id-from source.pdf in.pdf -o out.pdf

cpdf -remove-id in.pdf -o out.pdf

cpdf -list-spot-colors in.pdf

cpdf -print-dict-entry <key> in.pdf

cpdf -remove-dict-entry <key> [-dict-entry-search <term>]
      in.pdf -o out.pdf

cpdf -replace-dict-entry <key> -replace-dict-entry-value <value>
     [-dict-entry-search <term>] in.pdf -o out.pdf

cpdf -remove-clipping in.pdf [<range>] -o out.pdf

cpdf -obj <obj num> in.pdf

cpdf -extract-stream[-decompress] <obj num> in.pdf [-o out.dat | -stdout]

19.1 Draft Documents

The -draft operation removes bitmap (photographic) images from a file, so that it can be printed with less ink. Optionally, the -boxes option can be added, filling the spaces left blank with a crossed box denoting where the image was. The box is not guaranteed to be fully visible in all cases (the bitmap may have been partially covered by vector objects or clipped in the original). For example:

cpdf -draft -boxes in.pdf -o out.pdf

To remove a single image only, specify -draft-remove-only, giving the name of the image obtained by a call to -image-resolution as described in Section 13.3 and giving the appropriate page. For example:

cpdf -draft -boxes -draft-remove-only "/Im1" in.pdf 7 -o out.pdf

To remove text instead of images, use the -remove-all-text operation:

cpdf -remove-all-text in.pdf -o out.pdf

19.2 Blackening Text, Lines and Fills

Sometimes PDF output from an application (for instance, a web browser) has text in colors which would not print well on a grayscale printer. The -blacktext operation blackens all text on the given pages so it will be readable when printed.

This will not work on text which has been converted to outlines, nor on text which is part of a form.

cpdf -blacktext in.pdf -o out.pdf

The -blacklines operation blackens all lines on the given pages.

cpdf -blacklines in.pdf -o out.pdf

The -blackfills operation blackens all fills on the given pages.

cpdf -blackfills in.pdf -o out.pdf

Contrary to their names, all these operations can use another color, if specified with -color.
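When all three operations are wanted on the same file, they can be run one after another through temporary files. The following is a minimal sketch only: the function name blacken_all is hypothetical, it assumes cpdf is on the PATH, and it uses nothing beyond the command forms shown above.

```shell
# Hypothetical helper: blacken text, then lines, then fills.
# Assumes cpdf is on PATH; not part of cpdf itself.
blacken_all() {
  src=$1
  dst=$2
  tmp1=$(mktemp)
  tmp2=$(mktemp)
  status=0
  {
    cpdf -blacktext "$src" -o "$tmp1" &&
    cpdf -blacklines "$tmp1" -o "$tmp2" &&
    cpdf -blackfills "$tmp2" -o "$dst"
  } || status=$?
  # Remove the intermediate files whether or not the steps succeeded.
  rm -f "$tmp1" "$tmp2"
  return "$status"
}

# Usage: blacken_all in.pdf out.pdf
```

The steps stop at the first failure, and the exit status of the failing cpdf call is returned to the caller.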

19.3 Hairline Removal

Quite often, applications will use very thin lines, or even a width of 0, which in PDF means “the thinnest possible line on the output device”. This might be fine for on-screen work, but when printed on a high-resolution device, such as by a commercial printer, such lines may be too faint, or disappear altogether. The -thinlines operation prevents this by changing all lines thinner than <minimum thickness> to the given thickness. For example:

cpdf -thinlines 0.2mm in.pdf [<range>] -o out.pdf

Thicken all lines thinner than 0.2mm to that value.

19.4 Garbage Collection

Sometimes incremental updates to a file by an application, or poorly behaved applications, can leave data in a PDF file which is no longer used. The -clean operation removes that unneeded data.

cpdf -clean in.pdf -o out.pdf

NB: This operation is deprecated. This work is now done by default upon writing any file.

19.5 Change PDF Version Number

To change the PDF version number, use the -set-version operation, giving the part of the version number after the decimal point. For example:

cpdf -set-version 4 in.pdf -o out.pdf

Change file to PDF 1.4.

This does not alter any of the actual data in the file, only the declared version number. For PDF versions starting with 2, add ten to the number. For example, for PDF version 2.0, use -set-version 10.
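The numbering rule can be captured in a small shell function. This is a sketch under the rule stated above: the name version_arg is hypothetical and is not part of cpdf, and only versions of the form 1.x and 2.x are handled.

```shell
# Hypothetical helper: convert a version string such as 1.4 or 2.0
# into the number expected by -set-version.
version_arg() {
  major=${1%%.*}   # text before the first dot
  minor=${1#*.}    # text after the first dot
  if [ "$major" = 2 ]; then
    # Versions starting with 2: add ten to the minor number.
    echo $((minor + 10))
  else
    echo "$minor"
  fi
}

# Usage (assumes cpdf is on PATH):
#   cpdf -set-version "$(version_arg 2.0)" in.pdf -o out.pdf
```

With this rule, version_arg 1.4 gives 4 and version_arg 2.0 gives 10, matching the two examples in this section.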

19.6 Copy ID

The -copy-id-from operation copies the ID from the given file to the input, writing to the output.

cpdf -copy-id-from source.pdf in.pdf -o out.pdf

Copy the ID from source.pdf to the contents of in.pdf, writing to out.pdf.

If there is no ID in the source file, the existing ID is retained. You cannot use -recrypt with -copy-id-from.

19.7 Remove ID

The -remove-id operation removes the ID from a document.

cpdf -remove-id in.pdf -o out.pdf

Remove the ID from in.pdf, writing to out.pdf.

You cannot use -recrypt with -remove-id.

19.8 List Spot Colours

This operation lists the name of any “separation” color space in the given PDF file.

cpdf -list-spot-colors in.pdf

List the spot colors in in.pdf, one per line, writing to stdout.
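Because the names are printed one per line, the output is easy to test in a script. The sketch below assumes cpdf is on the PATH and that the name is given exactly as cpdf prints it; the function name has_spot_color is hypothetical.

```shell
# Hypothetical helper: succeed if FILE lists a spot color called NAME.
# Assumes cpdf is on PATH and NAME matches a whole output line exactly.
has_spot_color() {
  cpdf -list-spot-colors "$1" | grep -Fxq -- "$2"
}

# Usage:
#   if has_spot_color in.pdf "Dieline"; then echo "found"; fi
```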

19.9 PDF Dictionary Entries

This is for editing data within the PDF’s internal representation. Use with caution. To print a dictionary entry:

cpdf -print-dict-entry /URI in.pdf

Print all URLs in annotation hyperlinks in in.pdf.

To remove a dictionary entry:

cpdf -remove-dict-entry /One in.pdf -o out.pdf

Remove the entry for /One in every dictionary in in.pdf, writing to out.pdf.

cpdf -remove-dict-entry /One -dict-entry-search "\{I : 1\}"
     in.pdf -o out.pdf

Remove the entry for /One in every dictionary in in.pdf if the key’s value is the given CPDFJSON value, writing to out.pdf.

To replace a dictionary entry, give the replacement value in CPDFJSON format:

cpdf -replace-dict-entry /One -replace-dict-entry-value "\{I : 2\}"
     in.pdf -o out.pdf

Replace the entry for /One in every dictionary in in.pdf with the given value, writing to out.pdf.

cpdf -replace-dict-entry /One -dict-entry-search "\{I : 1\}"
     -replace-dict-entry-value "\{I : 2\}" in.pdf -o out.pdf

Replace the entry for /One in every dictionary in in.pdf with the given value if the key’s current value is the given search value, writing to out.pdf.

19.10 Removing Clipping

The -remove-clipping operation removes any clipping paths on given pages from the file.

cpdf -remove-clipping in.pdf -o out.pdf

Remove clipping paths in in.pdf, writing to out.pdf.

19.11 Exploring PDFs

The -obj operation prints an object to standard output, given the object number. Number 0 is the trailer dictionary, so we begin there:

$ cpdf -obj 0 in.pdf
"<</Root 1256 0 R/Length 588/ID[('\029\\t>\249\157\182F_\153V\175z[\234\196)
('\029\\t>\249\157\182F_\153V\175z[\234\196)]/Info 1351 0 R/Size 1406>>"

$ cpdf -obj 1256 in.pdf
"<</OpenAction 1238 0 R/PageLabels<</Nums[0<</S/r>>16<</S/D>>]>>/PageMode
/UseOutlines/Names 924 0 R/Outlines 838 0 R/Pages 851 0 R/Type/Catalog>>"

$ cpdf -obj 1238 in.pdf
"<</D[1225 0 R/Fit]/S/GoTo>>"

A stream may be extracted with -extract-stream or -extract-stream-decompress, which decompresses it first where possible:

$ cpdf -obj 0 hello.pdf
"<</Size 4/Root 4 0 R/ID[(\232\20625\030\179/\176q:O\202\135\176u\137)

$ cpdf -obj 4 hello.pdf
"<</Type/Catalog/Pages 1 0 R>>"

$ cpdf -obj 1 hello.pdf
"<</Type/Pages/Kids[3 0 R]/Count 1>>"

$ cpdf -obj 3 hello.pdf
"<</Type/Page/Parent 1 0 R/Resources<</Font<</F0<</Type/Font/Subtype/Type1/BaseFont
/Times-Italic>>>>>>/MediaBox[0 0 595.275590551 841.88976378]/Rotate 0/Contents
[2 0 R]>>"

$ cpdf -extract-stream-decompress 2 hello.pdf -stdout
1 0 0 1 50 770 cm BT/F0 36 Tf(Hello, World!)Tj ET

By these mechanisms, ad-hoc exploration of PDF files is possible.
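Following references by hand, as in the -obj examples above, can be partly scripted. The sketch below pulls the object number out of an indirect reference such as /Root 1256 0 R. The function name ref_of is hypothetical, and the pattern assumes the output looks like the examples in this section, with a single space after the key.

```shell
# Hypothetical helper: read -obj output on stdin and print the object
# number that the given key refers to (first match only).
ref_of() {
  sed -n "s|.*$1 \([0-9][0-9]*\) [0-9][0-9]* R.*|\1|p" | head -n 1
}

# Usage (assumes cpdf is on PATH):
#   root=$(cpdf -obj 0 in.pdf | ref_of /Root)
#   cpdf -obj "$root" in.pdf
```

Applied to the trailer shown for in.pdf above, ref_of /Root prints 1256. Keys whose values are direct objects rather than references print nothing.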