Chapter 1
Basic Usage

The Coherent PDF tools provide a wide range of facilities for modifying PDF files created by other means. There is a single command-line program cpdf (cpdf.exe under Microsoft Windows). The rest of this manual describes the options that may be given to this program.

1.1 Input and Output Files

The typical pattern for usage is an optional operation, followed by one or more input files, followed by -o and the name of the output file. The simplest concrete example, assuming the existence of a file in.pdf, is cpdf in.pdf -o out.pdf, which copies in.pdf to out.pdf. The input and output may be the same file. Of course, we should like to do more interesting things to the PDF file than that!

Files on the command line are distinguished from other input by their containing a period. If an input file does not contain a period, it should be preceded by -i. For example:
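In a script, this rule might be applied as in the sketch below. The helper name copy_pdf and the file names are illustrative assumptions; only -i and -o come from the text, and cpdf is assumed to be on the PATH.

```shell
# copy_pdf IN OUT: copy IN to OUT (hypothetical helper, not part of cpdf).
copy_pdf() {
  case "$1" in
    *.*) cpdf "$1" -o "$2" ;;     # name contains a period: recognised as a file
    *)   cpdf -i "$1" -o "$2" ;;  # no period: the name must be preceded by -i
  esac
}

# Usage:
#   copy_pdf in.pdf out.pdf   # runs: cpdf in.pdf -o out.pdf
#   copy_pdf in out.pdf       # runs: cpdf -i in -o out.pdf
```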

A whole directory of files may be added (where a command supports multiple files) by using the -idir option, for example -idir myfiles.

The files in the directory myfiles are considered in alphabetical order. They must all be PDF files. If the names of the files are numeric, leading zeroes will be required for the order to be correct (e.g. 001.pdf, 002.pdf etc.).
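The ordering requirement can be met by renaming numeric files before merging. The following is a sketch under stated assumptions: both helper names are hypothetical, three digits of padding is an arbitrary choice, and -merge is an operation described elsewhere in the cpdf documentation rather than in this section.

```shell
# pad_numeric_names DIR: rename purely numeric names such as 1.pdf or 10.pdf
# to three-digit form (001.pdf, 010.pdf) so alphabetical order matches numeric order.
pad_numeric_names() {
  for f in "$1"/*.pdf; do
    base=$(basename "$f" .pdf)
    case "$base" in
      ''|*[!0-9]*) continue ;;                 # skip names that are not purely numeric
    esac
    n=$(printf '%s' "$base" | sed 's/^0*//')   # drop zeroes so printf does not read octal
    padded=$(printf '%03d' "${n:-0}")
    target="$1/$padded.pdf"
    [ -e "$target" ] || mv -- "$f" "$target"   # never overwrite an existing file
  done
}

# merge_dir DIR OUT: merge every PDF in DIR, in alphabetical order, into OUT.
merge_dir() {
  cpdf -merge -idir "$1" -o "$2"
}

# Usage:
#   pad_numeric_names myfiles
#   merge_dir myfiles out.pdf
```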

1.2 Input Ranges

An input range may be specified after each input file. This is treated differently by each operation. For instance, cpdf in.pdf 2-5 -o out.pdf extracts pages two, three, four and five from in.pdf, writing the result to out.pdf, assuming that in.pdf contains at least five pages. Here are the rules for building input ranges:

For example:
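The sketch below shows a few ranges in use. It assumes the range syntax documented for cpdf elsewhere: a dash for a run of pages, commas to join several ranges, and the keywords end, odd and reverse. Only the range 2-5 corresponds to the extraction described above; the other ranges and the helper name extract_pages are illustrative.

```shell
# extract_pages IN RANGE OUT: write the pages of IN selected by RANGE to OUT.
# Hypothetical helper; assumes cpdf is on the PATH.
extract_pages() {
  cpdf "$1" "$2" -o "$3"
}

# Usage (the range keywords are assumptions, see the note above):
#   extract_pages in.pdf 2-5 out.pdf         # pages two to five
#   extract_pages in.pdf 1,2,7-end out.pdf   # pages one, two, and seven to the last
#   extract_pages in.pdf 1-16odd out.pdf     # odd pages among the first sixteen
#   extract_pages in.pdf reverse out.pdf     # all pages, last first
```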

1.3 Working with Encrypted Documents

In order to perform many operations, encrypted input PDF files must be decrypted. Some require the owner password, some either the user or owner passwords. Either password is supplied by writing user=<password> or owner=<password> following each input file requiring it (before or after any range). The document will not be re-encrypted upon writing. For example:

To re-encrypt the file with its existing encryption upon writing, which is required if only the user password was supplied, but allowed in any case, add the -recrypt option:
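Both cases might look like the following sketch. The helper names, passwords and ranges are placeholders, and whether a given operation accepts the user password rather than the owner password depends on the operation, so treat the pairing shown here as an assumption.

```shell
# decrypt_range IN OWNER_PW RANGE OUT: extract RANGE using the owner password.
# The output is written unencrypted.
decrypt_range() {
  cpdf "$1" owner="$2" "$3" -o "$4"
}

# recrypt_range IN USER_PW RANGE OUT: extract RANGE using the user password.
# -recrypt keeps the existing encryption, as required when only the user
# password is supplied.
recrypt_range() {
  cpdf "$1" user="$2" "$3" -recrypt -o "$4"
}

# Usage:
#   decrypt_range in.pdf fred 2-5 out.pdf
#   recrypt_range in.pdf charles 2-5 out.pdf
```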

The password required (owner or user) depends upon the operation being performed. Separate facilities are provided to decrypt and encrypt files (see Section 4).

1.4 Standard Input and Standard Output

Thus far, we have assumed that the input PDF will be read from a file on disk, and the output written similarly. Often it’s useful to be able to read input from stdin (Standard Input) or write output to stdout (Standard Output) instead. The typical use is to join several programs together into a pipe, passing data from one to the next without the use of intermediate files. Use -stdin to read from standard input, and -stdout to write to standard output, either to pipe data between multiple programs, or multiple invocations of the same program.

For example, a sequence of three cpdf commands joined by pipes (all typed on one line) can extract the last five pages of in.pdf in the correct order, writing them to out.pdf. It does this by reversing the input, taking the first five pages and then reversing the result.
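The reverse, take, reverse pipeline can be sketched as a small shell function. The name last_pages is hypothetical, and the use of the range keyword reverse to reverse the page order is an assumption drawn from the wider cpdf documentation.

```shell
# last_pages IN N OUT: write the last N pages of IN to OUT, in their original order.
last_pages() {
  cpdf "$1" reverse -stdout |       # reverse the page order
    cpdf -stdin 1-"$2" -stdout |    # keep the first N pages of the reversed file
    cpdf -stdin reverse -o "$3"     # reverse again to restore the order
}

# Usage:
#   last_pages in.pdf 5 out.pdf
```

No intermediate files are created: each stage reads the previous stage's output directly from the pipe.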

To supply passwords for a file from -stdin, use -stdin-owner <password> and/or -stdin-user <password>.

Using -stdout on the final command in the pipeline to output the PDF to screen is not recommended, since PDF files often contain compressed sections which are not screen-readable.

Several cpdf operations write to standard output by default (for example, listing fonts). A useful feature of the command line (not specific to cpdf) is the ability to redirect this output to a file. This is achieved with the > operator:
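As a minimal sketch, the font listing mentioned above could be saved to a text file as follows. The helper name is hypothetical, and -list-fonts is taken from the wider cpdf documentation rather than from this section.

```shell
# save_font_list IN OUT: write the font listing for IN to the text file OUT.
save_font_list() {
  cpdf -list-fonts "$1" > "$2"
}

# Usage:
#   save_font_list in.pdf fonts.txt
```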

1.5 Doing Several Things at Once with AND

The keyword AND can be used to string together several commands in one. The advantage compared with using pipes is that the file need not be repeatedly parsed and written out, saving time.

To use AND, simply leave off the output specifier (e.g. -o) of one command, and the input specifier (e.g. filename) of the next. For instance:

To specify the range for each section, use -range:
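A two-stage command of this kind might look like the sketch below. The helper name, file names and the range 2-4 are illustrative, and -merge is an operation described elsewhere in the cpdf documentation. The first stage has no -o and the second has no input file, as the rule above requires.

```shell
# merge_and_label A B TEXT OUT: merge A and B, then add TEXT to pages 2-4 of
# the merged result, parsing and writing the document only once.
merge_and_label() {
  cpdf -merge "$1" "$2" AND -range 2-4 -add-text "$3" -o "$4"
}

# Usage:
#   merge_and_label in.pdf in2.pdf "Label" out.pdf
```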

1.6 Units

When measurements are given to cpdf, they are in points by default (1 point = 1/72 inch). A measurement may optionally be followed by letters naming a different unit. The following are supported:

pt  Points (72 points per inch). The default.
cm  Centimeters
mm  Millimeters
in  Inches

For example, one may write 14mm or 21.6in. In addition, the following letters stand, in some operations (-scale-page, -scale-to-fit, -scale-contents, -shift, -mediabox, -crop) for various page dimensions:

PW     Page width
PH     Page height
PMINX  Page minimum x coordinate
PMINY  Page minimum y coordinate
PMAXX  Page maximum x coordinate
PMAXY  Page maximum y coordinate
CW     Crop box width
CH     Crop box height
CMINX  Crop box minimum x coordinate
CMINY  Crop box minimum y coordinate
CMAXX  Crop box maximum x coordinate
CMAXY  Crop box maximum y coordinate

For example, we may write PMINX PMINY to stand for the coordinates of the lower left corner of the page.

Simple arithmetic may be performed using the words add, sub, mul and div to stand for addition, subtraction, multiplication and division. For example, one may write 14in sub 30pt or PMINX mul 2.
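As a worked example of the unit suffixes listed earlier in this section, the sketch below converts a measurement to points using 1 in = 72 pt and 1 in = 2.54 cm = 25.4 mm. The helper to_points is hypothetical and is only a way of checking values by hand; it is not part of cpdf and does not handle the page-dimension words or the arithmetic words.

```shell
# to_points MEASUREMENT: print MEASUREMENT converted to points, to four decimals.
to_points() {
  printf '%s\n' "$1" | awk '{
    value = $0; factor = 1                  # bare numbers are already points
    if (match(value, /(pt|cm|mm|in)$/)) {
      unit = substr(value, RSTART)
      value = substr(value, 1, RSTART - 1)
      if (unit == "in") factor = 72
      if (unit == "cm") factor = 72 / 2.54
      if (unit == "mm") factor = 72 / 25.4
    }
    printf "%.4f\n", value * factor
  }'
}

# Usage:
#   to_points 14mm    # prints 39.6850
#   to_points 21.6in  # prints 1555.2000
```

By the same conversion, subtracting 30pt from 14in comes to 14 × 72 − 30 = 978 points.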

1.7 Setting the Producer and Creator

The -producer and -creator options may be added to any cpdf command line to set the producer and/or creator of the PDF file. If the file was converted from another format, the creator is the program producing the original, the producer the program converting it to PDF.

1.8 PDF Version Numbers

When an operation uses a part of the PDF standard that was introduced in a later version than that of the input file, the PDF version in the output file is set to the later version (most PDF viewers will try to load any PDF file, even if it is marked with a later version number). However, this automatic version changing may be suppressed with the -keep-version flag.

Here is a list of Acrobat versions together with the maximum PDF version they are intended to support:

PDF 1.2  Acrobat 3.0
PDF 1.3  Acrobat 4.0
PDF 1.4  Acrobat 5.0
PDF 1.5  Acrobat 6.0
PDF 1.6  Acrobat 7.0
PDF 1.7  Acrobat 8.0, 9.0, 10.0

If you wish to manually alter the PDF version of a file, use the -set-version option described in Section 15.5.

1.9 File IDs

PDF files contain an ID (consisting of two parts), used by some workflow systems to uniquely identify a file. To change the ID, use the -change-id operation. This will create a new ID for the output file.

1.10 Linearization

Linearized PDF is a version of the PDF format in which the data is held in a special manner to allow content to be fetched only when needed. This means viewing a multipage PDF over a slow connection is more responsive. By default, cpdf does not linearize output files. To make it do so, add the -l option to the command line, in addition to any other command being used. For example:

This requires the existence of the external program cpdflin which is provided with commercial versions of cpdf. This must be installed as described in the installation documentation provided with your copy of cpdf. If you are unable to install cpdflin, you must use -cpdflin to let cpdf know where to find it:

In extremis, you may place cpdflin and its resources in the current working directory, though this is not recommended. For further help, refer to the installation instructions for your copy of cpdf.

To keep the existing linearization status of a file (produce linearized output if the input is linearized and the reverse), use -keep-l instead of -l.
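The linearization options of this section might be used as in the following sketch. The helper names are hypothetical, and passing the location of cpdflin as the argument immediately following -cpdflin is an assumption about that option's syntax.

```shell
# linearize IN OUT [CPDFLIN_PATH]: write a linearized copy of IN to OUT.
# Pass CPDFLIN_PATH only when cpdflin is not installed where cpdf expects it.
linearize() {
  if [ -n "${3:-}" ]; then
    cpdf -cpdflin "$3" -l "$1" -o "$2"
  else
    cpdf -l "$1" -o "$2"
  fi
}

# keep_linearization IN OUT: linearize OUT only if IN was linearized.
keep_linearization() {
  cpdf -keep-l "$1" -o "$2"
}

# Usage:
#   linearize in.pdf out.pdf
#   linearize in.pdf out.pdf /opt/cpdf/cpdflin   # hypothetical install path
#   keep_linearization in.pdf out.pdf
```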

1.11 Object Streams

PDF 1.5 introduced a new mechanism for storing objects to save space: object streams. By default, cpdf will preserve object streams in input files, creating no more. To prevent the retention of existing object streams, use -no-preserve-objstm:

To create new object streams if none exist, or augment the existing ones, use -create-objstm:

To create wholly new object streams, use both options together:

Files written with object streams will be set to PDF 1.5 or higher, unless -keep-version is used (see above).
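The three object stream variants described in this section can be gathered into one sketch. The helper name and its mode words (drop, create, renew) are hypothetical; the two options are those named in the text.

```shell
# rewrite_objstm MODE IN OUT, where MODE is one of:
#   drop    remove existing object streams
#   create  add object streams, augmenting any that exist
#   renew   discard existing object streams and build wholly new ones
rewrite_objstm() {
  case "$1" in
    drop)   cpdf -no-preserve-objstm "$2" -o "$3" ;;
    create) cpdf -create-objstm "$2" -o "$3" ;;
    renew)  cpdf -create-objstm -no-preserve-objstm "$2" -o "$3" ;;
    *)      echo "unknown mode: $1" >&2; return 2 ;;
  esac
}

# Usage:
#   rewrite_objstm create in.pdf out.pdf
```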

1.12 Malformed Files

There are many malformed PDF files in existence, including many produced by otherwise-reputable applications. cpdf attempts to correct these problems silently.

Grossly malformed files will be reconstructed. The reconstruction progress is shown on stderr (Standard Error):

Sometimes files can be technically well-formed but use inefficient PDF constructs. If you are sure the input files you are using are impeccably formed, the -fast option may be added to the command line (or, if using AND, to each section of the command line). This will use certain shortcuts which speed up processing, but would fail on badly-produced files.

The -fast option may be used with:

If problems occur, refrain from using -fast.

1.13 Error Handling

When cpdf encounters an error, it exits with code 2. An error message is displayed on stderr (Standard Error). In normal usage, this means it’s displayed on the screen. When a bad or inappropriate password is given, the exit code is 1.
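A calling script can branch on these exit codes. The wrapper below is a sketch with a hypothetical name; it assumes, as is conventional, that cpdf exits with code 0 on success, which the text does not state.

```shell
# run_cpdf ARGS...: run cpdf with ARGS, print a one-line summary of the outcome,
# and return the exit code cpdf gave.
run_cpdf() {
  cpdf "$@"
  status=$?
  case "$status" in
    0) echo "ok" ;;
    1) echo "bad or inappropriate password" ;;
    2) echo "cpdf reported an error (details on stderr)" ;;
    *) echo "unexpected exit code $status" ;;
  esac
  return "$status"
}

# Usage:
#   run_cpdf in.pdf user=charles 2-5 -recrypt -o out.pdf
```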

1.14 Control Files

Some operating systems have a limit on the length of a command line. To circumvent this, or simply for reasons of flexibility, a control file may be specified from which arguments are drawn. This file does not support the full syntax of the command line. Commands are separated by whitespace, quotation marks may be used if an argument contains a space, and the sequence \" may be used to introduce a genuine quotation mark in such an argument.

Several -control arguments may be specified, and may be mixed in with conventional command-line arguments. The commands in each control file are considered in the order in which they are given, after all conventional arguments have been processed.

To avoid interference between -control and AND, a new mechanism has been added. Using -args in place of -control will perform direct textual substitution of the file into the command line, prior to any other processing. It is recommended to use -args in all new applications. However, -control will be supported for legacy applications.
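A minimal sketch of the -args mechanism follows. The helper names and file names are hypothetical, -merge is an operation described elsewhere in the cpdf documentation, and it is an assumption that a file given to -args follows the same quoting rules as the control files described above.

```shell
# write_args_file FILE: store a whole cpdf command line in FILE.
write_args_file() {
  # Arguments are separated by whitespace; the quotation marks keep
  # "second file.pdf" together as a single argument.
  printf '%s\n' '-merge in.pdf "second file.pdf" -o out.pdf' > "$1"
}

# run_args_file FILE: let cpdf substitute the contents of FILE into its command line.
run_args_file() {
  cpdf -args "$1"
}

# Usage:
#   write_args_file merge.args
#   run_args_file merge.args
```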

1.15 String Arguments

Command lines are handled differently on each operating system. Some characters are reserved with special meanings, even when they occur inside quoted string arguments. To avoid this problem, cpdf performs processing on string arguments as they are read.

A backslash is used to indicate that a character which would otherwise be treated specially by the command line interpreter is to be treated literally. For example, Unix-like systems attribute a special meaning to the exclamation mark, so a command line whose string argument contains a bare exclamation mark would fail. We must escape the exclamation mark with a backslash, writing \! in its place.

It follows that backslashes intended to be taken literally must themselves be escaped (i.e. written \\).
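The escaping rules might be applied as in the sketch below. The helper names and the text strings are illustrative; -add-text is the text-stamping option referred to elsewhere in this chapter. Single quotes are used here so that the shell passes the backslashes through to cpdf unchanged.

```shell
# add_greeting IN OUT: stamp the text Hello! onto IN.
# cpdf receives Hello\! and treats \! as a literal exclamation mark.
add_greeting() {
  cpdf -add-text 'Hello\!' "$1" -o "$2"
}

# add_path_label IN OUT: stamp the text C:\docs onto IN.
# A backslash meant literally is doubled for cpdf.
add_path_label() {
  cpdf -add-text 'C:\\docs' "$1" -o "$2"
}

# Usage:
#   add_greeting in.pdf out.pdf
#   add_path_label in.pdf out.pdf
```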

1.16 Text Encodings

Some cpdf commands write text to standard output, or read text from the command line or configuration files. These are:

There are three options to control how the text is interpreted:

Add -utf8 to use Unicode UTF-8, -stripped to convert to 7-bit ASCII by dropping any high characters, or -raw to perform no processing. The default is -stripped.

1.17 Font Embedding

Use the -no-embed-font option to avoid embedding the Standard 14 Font metrics when adding text with -add-text.