Chapter 1
Basic Usage

  -help                    --help                   -version  
  -o                       -i                       -idir <directory>  
  -recrypt                 -stdout                  -stdin  
  -stdin-user <password>   -stdin-owner <password>  -producer <text>  
  -creator <text>          -change-id               -l  
  -cpdflin <filename>      -keep-l                  -no-preserve-objstm  
  -create-objstm           -control <filename>      -args <filename>  
  -utf8                    -stripped                -raw  
  -no-embed-font           -create-pdf              -create-pdf-pages  
  -create-pdf-papersize    -gs                      -gs-malformed  
  -gs-malformed-force      -gs-quiet                -error-on-malformed

The Coherent PDF tools provide a wide range of facilities for modifying PDF files created by other means. There is a single command-line program cpdf (cpdf.exe under Microsoft Windows). The rest of this manual describes the options that may be given to this program.

1.1 Documentation

The operation -help / --help prints each operation and option together with a short description. The operation -version prints the cpdf version string.

1.2 Input and Output Files

The typical pattern for usage is

cpdf [<operation>] <input file(s)>  -o <output file>

and the simplest concrete example, assuming the existence of a file in.pdf, is:

cpdf in.pdf -o out.pdf

which copies in.pdf to out.pdf. The input and output may be the same file. Of course, we should like to do more interesting things to the PDF file than that!

Files on the command line are distinguished from other input by their containing a period. If an input file does not contain a period, it should be preceded by -i. For example:

cpdf -i in -o out.pdf

A whole directory of files may be added (where a command supports multiple files) by using the -idir option:

cpdf -merge -idir myfiles -o out.pdf

The files in the directory myfiles are considered in alphabetical order. They must all be PDF files. If the names of the files are numeric, leading zeroes will be required for the order to be correct (e.g. 001.pdf, 002.pdf, etc.).
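Because the order is alphabetical rather than numeric, 10.pdf sorts before 2.pdf. The Python sketch below shows the problem and renames purely numeric file names such as 7.pdf to 007.pdf before running -idir. The helper names zero_pad_name and pad_directory are hypothetical, and the sketch assumes that cpdf's alphabetical order agrees with Python's default string sort for such names.

```python
import re
from pathlib import Path

# Purely numeric PDF names, e.g. "7.pdf" or "12.PDF".
_NUMERIC_PDF = re.compile(r"(\d+)(\.pdf)", re.IGNORECASE)


def zero_pad_name(name, width=3):
    """Return name with its numeric stem zero-padded, e.g. '7.pdf' -> '007.pdf'.

    Names that are not purely numeric are returned unchanged.
    """
    m = _NUMERIC_PDF.fullmatch(name)
    if not m:
        return name
    return m.group(1).zfill(width) + m.group(2)


def pad_directory(directory, width=3):
    """Rename numeric PDF files so that alphabetical order matches numeric order.

    Refuses to overwrite a file that already has the padded name.
    """
    for path in sorted(Path(directory).iterdir()):  # snapshot before renaming
        target = path.with_name(zero_pad_name(path.name, width))
        if target == path:
            continue
        if target.exists():
            raise FileExistsError(target)
        path.rename(target)


# Plain string sorting puts 10.pdf before 2.pdf:
print(sorted(["1.pdf", "2.pdf", "10.pdf"]))  # ['1.pdf', '10.pdf', '2.pdf']
print(sorted(zero_pad_name(n) for n in ["1.pdf", "2.pdf", "10.pdf"]))  # ['001.pdf', '002.pdf', '010.pdf']
```

After padding, cpdf -merge -idir myfiles -o out.pdf can be run as before.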

1.3 Input Ranges

An input range may be specified after each input file. This is treated differently by each operation. For instance:

cpdf in.pdf 2-5 -o out.pdf

extracts pages two, three, four and five from in.pdf, writing the result to out.pdf, assuming that in.pdf contains at least five pages. Here are the rules for building input ranges:

For example:

cpdf in.pdf 1,2,7-end -o out.pdf

Remove pages three, four, five and six from a document.

cpdf in.pdf 1-16odd -o out.pdf

Extract the odd pages 1,3,...,13,15.

cpdf in.pdf landscape -rotate 90 -o out.pdf

Rotate all landscape pages by ninety degrees.

cpdf in.pdf 1,all -o out.pdf

Duplicate the front page of a document, perhaps as a fax cover sheet.

cpdf in.pdf ~3-~1 -o out.pdf

Extract the last three pages of a document, in order.
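To make the examples above concrete, here is a small Python sketch that expands the range forms they use (single pages, a-b, end, ~n counted from the end, all, and an odd or even suffix) into a list of page numbers for a document of n pages. It is a hypothetical model written for illustration, not cpdf's parser: the function names are invented here, and other range forms (such as landscape or reverse) and descending ranges are not handled.

```python
def _page(ref, n):
    """Resolve one page reference: '7', 'end', or '~k' (k-th page from the end)."""
    if ref == "end":
        return n
    if ref.startswith("~"):
        return n - int(ref[1:]) + 1
    return int(ref)


def expand_range(spec, n):
    """Expand a comma-separated range such as '1,2,7-end' for an n-page document."""
    pages = []
    for part in spec.split(","):
        # Optional odd/even suffix, as in '1-16odd'.
        parity = None
        for suffix in ("odd", "even"):
            if part.endswith(suffix):
                parity, part = suffix, part[: -len(suffix)]
        if part == "all":
            chunk = list(range(1, n + 1))
        elif "-" in part:
            first, last = (_page(ref, n) for ref in part.split("-", 1))
            chunk = list(range(first, last + 1))
        else:
            chunk = [_page(part, n)]
        if parity == "odd":
            chunk = [p for p in chunk if p % 2 == 1]
        elif parity == "even":
            chunk = [p for p in chunk if p % 2 == 0]
        if any(p < 1 or p > n for p in chunk):
            raise ValueError(f"{part!r} refers to a page outside 1-{n}")
        pages.extend(chunk)
    return pages


print(expand_range("1,2,7-end", 10))  # [1, 2, 7, 8, 9, 10]
print(expand_range("~3-~1", 10))      # [8, 9, 10]
```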

1.4 Working with Encrypted Documents

In order to perform many operations, encrypted input PDF files must be decrypted. Some require the owner password, some either the user or owner passwords. Either password is supplied by writing user=<password> or owner=<password> following each input file requiring it (before or after any range). The document will not be re-encrypted upon writing. For example:

cpdf in.pdf user=charles -info
cpdf in.pdf owner=fred reverse -o out.pdf

To re-encrypt the file with its existing encryption upon writing, which is required if only the user password was supplied, but allowed in any case, add the -recrypt option:

cpdf in.pdf user=charles reverse -recrypt -o out.pdf

The password required (owner or user) depends upon the operation being performed. Separate facilities are provided to decrypt and encrypt files (see Section 4).

1.5 Standard Input and Standard Output

Thus far, we have assumed that the input PDF will be read from a file on disk, and the output written similarly. Often it’s useful to be able to read input from stdin (Standard Input) or write output to stdout (Standard Output) instead. The typical use is to join several programs together into a pipe, passing data from one to the next without the use of intermediate files. Use -stdin to read from standard input, and -stdout to write to standard output, either to pipe data between multiple programs or between multiple invocations of the same program. For example, this sequence of commands (all typed on one line)

   cpdf in.pdf reverse -stdout |  
   cpdf -stdin 1-5 -stdout |  
   cpdf -stdin reverse -o out.pdf

extracts the last five pages of in.pdf in the correct order, writing them to out.pdf. It does this by reversing the input, taking the first five pages and then reversing the result.
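The same pipeline can be driven from a program without a shell. The Python sketch below connects each command's standard output to the next command's standard input using subprocess.Popen, and raises an exception if any stage exits with a non-zero code. The run_pipeline helper is hypothetical, and running the cpdf example assumes cpdf is on your PATH.

```python
import subprocess


def run_pipeline(commands):
    """Run argument lists connected like `a | b | c`; return the last stdout as bytes."""
    procs = []
    upstream = None
    for argv in commands:
        proc = subprocess.Popen(argv, stdin=upstream, stdout=subprocess.PIPE)
        if upstream is not None:
            # The child now holds this pipe end; closing our copy lets the
            # upstream process see a broken pipe if the downstream one exits early.
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)
    output, _ = procs[-1].communicate()
    for proc in procs[:-1]:
        proc.wait()
    codes = [proc.returncode for proc in procs]
    if any(codes):
        raise RuntimeError(f"pipeline failed with exit codes {codes}")
    return output


# The three-stage example above: reverse, take pages 1-5, reverse again.
CPDF_LAST_FIVE = [
    ["cpdf", "in.pdf", "reverse", "-stdout"],
    ["cpdf", "-stdin", "1-5", "-stdout"],
    ["cpdf", "-stdin", "reverse", "-o", "out.pdf"],
]
# run_pipeline(CPDF_LAST_FIVE)
```

Passing argument lists rather than a command string means no shell quoting is involved.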

To supply passwords for a file from -stdin, use -stdin-owner <password> and/or -stdin-user <password>.

Using -stdout on the final command in the pipeline to output the PDF to screen is not recommended, since PDF files often contain compressed sections which are not screen-readable.

Several cpdf operations write to standard output by default (for example, listing fonts). A useful feature of the command line (not specific to cpdf) is the ability to redirect this output to a file. This is achieved with the > operator:

cpdf -info in.pdf > file.txt

Use the -info operation (See Section 11.1), redirecting the output to file.txt.

1.6 Doing Several Things at Once with AND

The keyword AND can be used to string together several commands in one. The advantage compared with using pipes is that the file need not be repeatedly parsed and written out, saving time.

To use AND, simply leave off the output specifier (e.g. -o) of one command, and the input specifier (e.g. the filename) of the next. For instance:

cpdf -merge in.pdf in2.pdf AND -add-text "Label" AND -merge in3.pdf -o out.pdf

Merge in.pdf and in2.pdf together, add text to both pages, append in3.pdf and write to out.pdf.

To specify the range for each section, use -range:

cpdf -merge in.pdf in2.pdf AND -range 2-4 -add-text "Label" AND -merge in3.pdf -o out.pdf
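When cpdf is invoked from another program, it can be convenient to assemble an AND command line from separate sections. The hypothetical Python helper below joins argument lists with the AND keyword, reproducing the previous example; the resulting list could be passed to subprocess.run without any shell quoting, assuming cpdf is on your PATH.

```python
def and_command(*sections, cpdf="cpdf"):
    """Join sections (lists of arguments) into one cpdf command line using AND."""
    argv = [cpdf]
    for i, section in enumerate(sections):
        if i > 0:
            argv.append("AND")
        argv.extend(section)
    return argv


argv = and_command(
    ["-merge", "in.pdf", "in2.pdf"],          # first section: no -o
    ["-range", "2-4", "-add-text", "Label"],  # middle section: no input, no -o
    ["-merge", "in3.pdf", "-o", "out.pdf"],   # last section writes the output
)
print(" ".join(argv))
# subprocess.run(argv, check=True) would run it.
```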

1.7 Units

When measurements are given to cpdf, they are in points (1 point = 1/72 inch). They may optionally be followed by a unit abbreviation to change the unit. The following are supported:


pt  Points (72 points per inch). The default.
cm  Centimeters
mm  Millimeters
in  Inches

For example, one may write 14mm or 21.6in. In addition, the following letters stand, in some operations (-scale-page, -scale-to-fit, -scale-contents, -shift, -mediabox,
-crop) for various page dimensions:


   PW   Page width
   PH   Page height
PMINX   Page minimum x coordinate
PMINY   Page minimum y coordinate
PMAXX   Page maximum x coordinate
PMAXY   Page maximum y coordinate
   CW   Crop box width
   CH   Crop box height
CMINX   Crop box minimum x coordinate
CMINY   Crop box minimum y coordinate
CMAXX   Crop box maximum x coordinate
CMAXY   Crop box maximum y coordinate

For example, we may write PMINX PMINY to stand for the coordinate of the lower left corner of the page.

Simple arithmetic may be performed using the words add, sub, mul and div to stand for addition, subtraction, multiplication and division. For example, one may write 14in sub 30pt or PMINX mul 2.
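The Python sketch below models how such measurements might be evaluated: each operand is a number with an optional unit suffix, or a page-dimension name, converted to points, and add, sub, mul and div are then applied. It is not cpdf's own parser. The function names are hypothetical, page dimensions are supplied as a dictionary, and evaluating strictly left to right is an assumption, since operator precedence is not specified here.

```python
import operator
import re

# Points per unit, using 72 points per inch.
POINTS_PER = {"pt": 1.0, "in": 72.0, "cm": 72.0 / 2.54, "mm": 72.0 / 25.4}
OPERATORS = {"add": operator.add, "sub": operator.sub,
             "mul": operator.mul, "div": operator.truediv}
MEASUREMENT = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(pt|in|cm|mm)?")


def operand(token, dims):
    """Convert '14mm', '30', or a dimension name such as 'PW' to points."""
    if token in dims:
        return dims[token]
    m = MEASUREMENT.fullmatch(token)
    if m is None:
        raise ValueError(f"not a measurement: {token!r}")
    number, unit = m.groups()
    return float(number) * POINTS_PER[unit or "pt"]


def evaluate(expression, dims=None):
    """Evaluate e.g. '14in sub 30pt' strictly left to right, giving points."""
    dims = dims or {}
    tokens = expression.split()
    if len(tokens) % 2 == 0:
        raise ValueError("expected: operand (operator operand)*")
    result = operand(tokens[0], dims)
    for op, token in zip(tokens[1::2], tokens[2::2]):
        if op not in OPERATORS:
            raise ValueError(f"unknown operator: {op!r}")
        result = OPERATORS[op](result, operand(token, dims))
    return result


print(evaluate("14in sub 30pt"))                 # 978.0
print(evaluate("PMINX mul 2", {"PMINX": 18.0}))  # 36.0
```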

1.8 Setting the Producer and Creator

The -producer and -creator options may be added to any cpdf command line to set the producer and/or creator of the PDF file. If the file was converted from another format, the creator is the program producing the original, the producer the program converting it to PDF.

cpdf -merge in.pdf in2.pdf -producer MyMerger -o out.pdf

Merge in.pdf and in2.pdf, setting the producer to MyMerger and writing the output to out.pdf.

1.9 PDF Version Numbers

When an operation uses a part of the PDF standard which was introduced in a later version than that of the input file, the PDF version in the output file is set to the later version (most PDF viewers will try to load any PDF file, even if it is marked with a later version number). However, this automatic version changing may be suppressed with the -keep-version option. If you wish to manually alter the PDF version of a file, use the -set-version operation described in Section 17.5.

1.10 File IDs

PDF files contain an ID (consisting of two parts), used by some workflow systems to uniquely identify a file. To change the ID, use the -change-id operation. This will create a new ID for the output file.

cpdf -change-id in.pdf -o out.pdf

Write in.pdf to out.pdf, changing the ID.

1.11 Linearization

Linearized PDF is a version of the PDF format in which the data is held in a special manner to allow content to be fetched only when needed. This means viewing a multipage PDF over a slow connection is more responsive. By default, cpdf does not linearize output files. To make it do so, add the -l option to the command line, in addition to any other command being used. For example:

cpdf -l in.pdf -o out.pdf

Linearize the file in.pdf, writing to out.pdf.

This requires the existence of the external program cpdflin which is provided with commercial versions of cpdf. This must be installed as described in the installation documentation provided with your copy of cpdf. If you are unable to install cpdflin, you must use -cpdflin to let cpdf know where to find it:

cpdf.exe -cpdflin "C:\\cpdflin.exe" -l in.pdf -o out.pdf

Linearize the file in.pdf, writing to out.pdf.

In extremis, you may place cpdflin and its resources in the current working directory, though this is not recommended. For further help, refer to the installation instructions for your copy of cpdf.

To keep the existing linearization status of a file (produce linearized output if the input is linearized and the reverse), use -keep-l instead of -l.

1.12 Object Streams

PDF 1.5 introduced a new mechanism for storing objects to save space: object streams. By default, cpdf will preserve object streams in input files, but will not create new ones. To prevent the retention of existing object streams, use -no-preserve-objstm:

cpdf -no-preserve-objstm in.pdf -o out.pdf

Write the file in.pdf to out.pdf, removing any object streams.

To create new object streams if none exist, or augment the existing ones, use -create-objstm:

cpdf -create-objstm in.pdf -o out.pdf

Write the file in.pdf to out.pdf, preserving any existing object streams, and creating any new ones for new objects which have been added.

To create wholly new object streams, use both options together:

cpdf -create-objstm -no-preserve-objstm in.pdf -o out.pdf

Write the file in.pdf to out.pdf with wholly new object streams.

Files written with object streams will be set to PDF 1.5 or higher, unless -keep-version is used (see above).

1.13 Malformed Files

There are many malformed PDF files in existence, including many produced by otherwise-reputable applications. cpdf attempts to correct these problems silently.

Grossly malformed files will be reconstructed. The reconstruction progress is shown on stderr (Standard Error):

$cpdf in.pdf -o out.pdf
couldnt lex object number
Attempting to reconstruct the malformed pdf in.pdf...
Read 5530 objects
Malformed PDF reconstruction succeeded!

If cpdf cannot reconstruct a malformed file, it is able to use the gs program to try to reconstruct the PDF file, if you have it installed. For example, if gs is installed and in your path, we might try:

cpdf -gs gs -gs-malformed in.pdf -o out.pdf

To suppress the output of gs use the -gs-quiet option.

If the malformity lies inside an individual page of the PDF, rather than in its gross structure, cpdf may appear to succeed in reconstruction, only to fail when processing a page (e.g when adding text). To force the use of gs to pre-process such files so cpdf cannot fail on them, use -gs-malformed-force:

cpdf in.pdf -gs gs -gs-malformed-force -o out.pdf [-gs-quiet]

The command line for -gs-malformed-force must be of precisely this form. Sometimes, on the other hand, we might wish cpdf to fail immediately on any malformed file, rather than try its own reconstruction process. The option -error-on-malformed achieves this.

Sometimes files can be technically well-formed but use inefficient PDF constructs. If you are sure the input files you are using are impeccably formed, the -fast option may be added to the command line (or, if using AND, to each section of the command line). This will use certain shortcuts which speed up processing, but would fail on badly-produced files. The -fast option may be used with:

Chapter 3:  -rotate-contents -upright -vflip -hflip
            -shift -scale-page -scale-to-fit -scale-contents
            -show-boxes -hard-box -trim-marks

Chapter 8:  -add-text -add-rectangle
            -stamp-on -stamp-under -combine-pages

Chapter 9:  -twoup -twoup-stack

If problems occur, refrain from using -fast.

1.14 Error Handling

When cpdf encounters an error, it exits with code 2. An error message is displayed on stderr (Standard Error). In normal usage, this means it’s displayed on the screen. When a bad or inappropriate password is given, the exit code is 1.
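Scripts that call cpdf can check these exit codes. The Python sketch below runs cpdf, returns its standard output, and otherwise raises an exception carrying the stderr message. The helper names are hypothetical, raising RuntimeError is this sketch's own choice, and treating 0 as success is the usual convention rather than something stated above.

```python
import subprocess

# Codes 1 and 2 as described above; 0 for success is assumed (usual convention).
EXIT_MEANINGS = {0: "success", 1: "bad or inappropriate password", 2: "error"}


def describe_exit(code):
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_cpdf(args, cpdf="cpdf"):
    """Run cpdf with args; return its stdout, or raise with the stderr message."""
    result = subprocess.run([cpdf, *args], capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"{describe_exit(result.returncode)}: {result.stderr.strip()}")
    return result.stdout


# Example, assuming cpdf is on PATH:
# info = run_cpdf(["-info", "in.pdf"])
```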

1.15 Control Files

cpdf -control <filename>
cpdf -args <filename>

Some operating systems have a limit on the length of a command line. To circumvent this, or simply for reasons of flexibility, a control file may be specified from which arguments are drawn. This file does not support the full syntax of the command line: arguments are separated by whitespace, quotation marks may be used if an argument contains a space, and the sequence \" may be used to introduce a genuine quotation mark in such an argument.

Several -control arguments may be specified, and may be mixed in with conventional command-line arguments. The commands in each control file are considered in the order in which they are given, after all conventional arguments have been processed. It is recommended to use -args in all new applications. However, -control will be supported for legacy applications.

To avoid interference between -control and AND, a new mechanism has been added. Using -args in place of -control will perform direct textual substitution of the file into the command line, prior to any other processing.
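These quoting rules are simple enough to generate an -args file programmatically. The Python sketch below quotes any argument containing whitespace or a quotation mark, escaping embedded quotation marks as \", and includes a tokenizer following the same rules to check the result. It models the rules as described here, not cpdf's own reader; the helper names are hypothetical, and writing the file as UTF-8 is an assumption.

```python
def quote_arg(arg):
    """Quote arg if it is empty or contains whitespace or a quotation mark."""
    if arg and not any(c.isspace() or c == '"' for c in arg):
        return arg
    # Limitation: an argument ending in a backslash cannot be quoted this way,
    # because its closing quotation mark would then read as \".
    return '"' + arg.replace('"', '\\"') + '"'


def args_file_text(args):
    return " ".join(quote_arg(a) for a in args) + "\n"


def split_args(text):
    """Split text on whitespace, honouring quotation marks and \\" inside them."""
    args, current = [], []
    in_quotes = have_token = False
    i = 0
    while i < len(text):
        c = text[i]
        if in_quotes and c == "\\" and text[i + 1:i + 2] == '"':
            current.append('"')  # escaped quotation mark inside quotes
            i += 2
            continue
        if c == '"':
            in_quotes = not in_quotes
            have_token = True
        elif c.isspace() and not in_quotes:
            if have_token:
                args.append("".join(current))
                current, have_token = [], False
        else:
            current.append(c)
            have_token = True
        i += 1
    if in_quotes:
        raise ValueError("unterminated quotation mark")
    if have_token:
        args.append("".join(current))
    return args


# Path("args.txt").write_text(args_file_text(args), encoding="utf-8"),
# then: cpdf -args args.txt
```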

1.16 String Arguments

Command lines are handled differently on each operating system. Some characters are reserved with special meanings, even when they occur inside quoted string arguments. To avoid this problem, cpdf performs processing on string arguments as they are read.

A backslash is used to indicate that a character which would otherwise be treated specially by the command line interpreter is to be treated literally. For example, Unix-like systems attribute a special meaning to the exclamation mark, so the command line

cpdf -add-text "Hello!" in.pdf -o out.pdf

would fail. We must escape the exclamation mark with a backslash:

cpdf -add-text "Hello\!" in.pdf -o out.pdf

It follows that backslashes intended to be taken literally must themselves be escaped (i.e. written \\).
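When cpdf is started from a program with an argument list rather than through a shell, no shell escaping (such as for the exclamation mark) is needed, but cpdf still processes backslashes itself as described above. The small Python sketch below, with a hypothetical cpdf_string helper, doubles every backslash so that text reaches cpdf literally; it relies only on the rule stated in this section.

```python
def cpdf_string(text):
    """Escape text for cpdf's own string processing: each backslash becomes two."""
    return text.replace("\\", "\\\\")


label = cpdf_string(r"C:\reports\Hello!")  # no shell, so '!' needs no escaping
argv = ["cpdf", "-add-text", label, "in.pdf", "-o", "out.pdf"]
# subprocess.run(argv, check=True)  # assumes cpdf is on PATH
```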

1.17 Text Encodings

Some cpdf commands write text to standard output, or read text from the command line or configuration files. These are:

-info
-list-bookmarks
-set-author et al.
-list-annotations
-dump-attachments

There are three options to control how the text is interpreted:

-utf8
-stripped
-raw

Add -utf8 to use Unicode UTF8, -stripped to convert to 7 bit ASCII by dropping any high characters, or -raw to perform no processing. Unless otherwise specified in the documentation for an individual operation, the default is -stripped.
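The effect of -stripped, as described, can be modelled in Python by discarding every character outside 7-bit ASCII. This is a sketch of the described behaviour for checking expectations, not cpdf's implementation; in particular, how cpdf decodes text from the PDF before stripping it is not modelled.

```python
def stripped(text):
    """Drop every character outside 7-bit ASCII, as -stripped is described."""
    return text.encode("ascii", errors="ignore").decode("ascii")


print(stripped("Café Müller"))  # Caf Mller
```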

1.18 Font Embedding

Use the -no-embed-font option to avoid embedding the Standard 14 Font metrics when adding text with -add-text.

1.19 Creating a new PDF

Cpdf can build a new PDF file, given a number of pages and a paper size. The default is one page, A4 portrait.

cpdf -create-pdf -create-pdf-pages 20
        -create-pdf-papersize usletterportrait -o out.pdf

The standard paper sizes are listed in Section 3.1, or you may specify the width and height directly, as described in the same chapter.

Python Interface

 
Loading the libpycpdf and libcpdf DLLs
-------------------------------------- 
 
Before using the library, you must load the ``libpycpdf`` and ``libcpdf`` DLLs.
This is achieved with the ``pycpdflib.loadDLL`` function, given the filename or
full path of the ``libpycpdf`` DLL.

On Windows, you may have to call ``os.add_dll_directory`` first. On MacOS, you
may need to give the full path, and you may need to install ``libcpdf.so`` in a
standard location such as ``/usr/local/lib/``, or use the ``install_name_tool``
command to tell ``libpycpdf.so`` where to find ``libcpdf.so``.
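A minimal Python sketch of this start-up sequence follows. Only ``pycpdflib.loadDLL`` and ``os.add_dll_directory`` come from the text above; the helper names, and the DLL file names (``libpycpdf.dll`` on Windows, ``libpycpdf.so`` elsewhere), are assumptions to adjust for your installation.

```python
import os
import sys

_dll_directory_handles = []  # keep add_dll_directory handles alive


def dll_filename(platform=sys.platform):
    """Assumed file name of the libpycpdf DLL on each platform."""
    return "libpycpdf.dll" if platform == "win32" else "libpycpdf.so"


def load_pycpdflib(dll_dir):
    """Import pycpdflib and load libpycpdf from dll_dir (libcpdf must be findable too)."""
    dll_dir = os.path.abspath(dll_dir)
    if sys.platform == "win32":
        # Python 3.8+ on Windows: let the loader find libcpdf next to libpycpdf.
        _dll_directory_handles.append(os.add_dll_directory(dll_dir))
    import pycpdflib  # imported here so this module loads even without pycpdflib

    pycpdflib.loadDLL(os.path.join(dll_dir, dll_filename()))
    return pycpdflib


# pycpdflib = load_pycpdflib("/path/to/python-libcpdf")  # placeholder path
```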
 
Conventions 
----------- 
 
Any function may raise the exception ``CPDFError``, carrying a string describing
the error.
 
A range is a list of integers specifying page numbers. Page numbers start at 
1. Range arguments are called r. 
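Because a range is just a list of page numbers, the behaviour of several range functions described later (``pageRange``, ``rangeUnion``, ``difference``, ``removeDuplicates`` and ``stringOfPagespec``) can be modelled on plain Python lists. The snake_case functions below are a hypothetical illustration, not pycpdflib itself; keeping first occurrences in order, and writing ``end`` only for the upper bound of a multi-page run, are choices made for this sketch.

```python
def page_range(f, t):
    """Pages f..t inclusive, like pageRange(3, 7) giving 3,4,5,6,7."""
    return list(range(f, t + 1))


def remove_duplicates(r):
    """Keep the first occurrence of each page, preserving order."""
    return list(dict.fromkeys(r))


def range_union(a, b):
    """Pages in a or b (this sketch keeps a's order, then b's new pages)."""
    return remove_duplicates(a + b)


def range_difference(a, b):
    """Pages in a that are not also in b."""
    excluded = set(b)
    return [p for p in a if p not in excluded]


def string_of_pagespec(r, num_pages):
    """Compress a range (sorted and deduplicated first) into e.g. '1-3,6-end'."""
    parts = []
    pages = sorted(set(r))
    i = 0
    while i < len(pages):
        j = i
        while j + 1 < len(pages) and pages[j + 1] == pages[j] + 1:
            j += 1  # extend the run of consecutive pages
        first, last = pages[i], pages[j]
        if first == last:
            parts.append(str(first))
        else:
            parts.append(f"{first}-{'end' if last == num_pages else last}")
        i = j + 1
    return ",".join(parts)


print(string_of_pagespec([1, 2, 3, 6, 7, 8], 8))  # 1-3,6-end
```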
 
Text arguments and results are in UTF8. 
 
Units are in PDF points (1/72 inch). 
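The conversion functions listed below (``ptOfCm``, ``ptOfMm``, ``ptOfIn`` and their inverses) follow from 72 points to the inch and 2.54 centimetres to the inch. A pure-Python sketch of the same arithmetic, with hypothetical snake_case names, is handy for checking values, for example the size of an A4 page (210 by 297 mm) to pass to ``blankDocument``.

```python
POINTS_PER_INCH = 72.0
CM_PER_INCH = 2.54


def pt_of_in(inches):
    return inches * POINTS_PER_INCH


def pt_of_cm(cm):
    return cm / CM_PER_INCH * POINTS_PER_INCH


def pt_of_mm(mm):
    return pt_of_cm(mm / 10.0)


def in_of_pt(pt):
    return pt / POINTS_PER_INCH


def cm_of_pt(pt):
    return in_of_pt(pt) * CM_PER_INCH


def mm_of_pt(pt):
    return cm_of_pt(pt) * 10.0


# A4 portrait is 210 mm by 297 mm:
print(round(pt_of_mm(210), 2), round(pt_of_mm(297), 2))  # 595.28 841.89
```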
 
Angles are in degrees. 
 
 
Built-in values 
--------------- 
 
Paper sizes: 
 
a0portrait a1portrait a2portrait a3portrait a4portrait a5portrait a0landscape 
a1landscape a2landscape a3landscape a4landscape a5landscape usletterportrait 
usletterlandscape uslegalportrait uslegallandscape 
 
Permissions: 
 
noEdit noPrint noCopy noAnnot noForms noExtract noAssemble noHqPrint 
 
Encryption methods: 
 
pdf40bit pdf128bit aes128bitfalse aes128bittrue aes256bitfalse aes256bittrue 
aes256bitisofalse aes256bitisotrue 
 
Positions: 
 
Positions with two numbers in a tuple, e.g. (posLeft, 10.0, 20.0)
 
posCentre posLeft posRight 
 
Positions with one number in a tuple, e.g. (top, 5.0)
 
top topLeft topRight left bottomLeft bottomRight right 
 
Positions with no numbers, e.g. diagonal
 
diagonal reverseDiagonal 
 
Fonts: 
 
timesRoman timesBold timesItalic timesBoldItalic helvetica helveticaBold 
helveticaOblique helveticaBoldOblique courier courierBold courierOblique 
courierBoldOblique 
 
Justification: 
 
leftJustify centreJustify rightJustify 
 
Page layouts: 
 
singlePage oneColumn twoColumnLeft twoColumnRight twoPageLeft twoPageRight 
 
Page modes: 
 
useNone useOutlines useThumbs useOC useAttachments 
 
Page label styles: 
 
decimalArabic uppercaseRoman lowercaseRoman uppercaseLetters lowercaseLetters 
 
 
# CHAPTER 0. Preliminaries 
 
class Pdf: 
    """The type of PDF documents.""" 
 
def loadDLL(f): 
    """Load the libpycpdf DLL from a given file, and set up pycpdflib. Must be 
    called prior to using any other function in the library.""" 
 
class CPDFError(Exception): 
    """Any function may raise an exception CPDFError, carrying a string 
    describing what went wrong.""" 
 
def lastError(): 
    """Return the last error. Not usually used directly, since pycpdflib 
    functions raise exceptions.""" 
 
def lastErrorString(): 
    """Return the last error string. Not usually used directly, since pycpdflib 
    functions raise exceptions.""" 
 
def checkerror(): 
    """Raise an exception if the last function call resulted in an error. Not 
    used directly, since pycpdflib functions will raise the exception 
    directly.""" 
 
def version(): 
    """Return the version number of the pycpdflib library.""" 
 
def setFast(): 
    """ Set fast mode. Some operations have a fast mode. The default is slow 
    mode, which works even on old-fashioned files. For more details, see 
    section 1.13 of the CPDF manual. This function sets the mode globally. """ 
 
def setSlow(): 
    """ Set slow mode. Some operations have a fast mode. The default is slow 
    mode, which works even on old-fashioned files. For more details, see 
    section 1.13 of the CPDF manual. This function sets the mode globally. """ 
 
def clearError(): 
    """ Clear the current error state. """ 
 
def onExit(): 
    """ A debug function which prints some information about 
    resource usage. This can be used to detect if PDFs or ranges are being 
    deallocated properly."""
 
# CHAPTER 1. Basics 
 
def fromFile(filename, userpw): 
    """ Load a PDF file from a given file. 
    Supply a user password (possibly blank) in case the file is encrypted. It
    won't be decrypted, but sometimes the password is needed just to load the
    file.""" 
 
def fromFileLazy(filename, userpw): 
    """ Loads a PDF from a file, doing only 
    minimal parsing. The objects will be read and parsed when they are actually 
    needed. Use this when the whole file won't be required. Also supply a user
    password (possibly blank) in case the file is encrypted. It won't be
    decrypted, but sometimes the password is needed just to load the file."""
 
def fromMemory(data, userpw): 
    """ Load a file from a byte array and the user password (blank if none).""" 
 
def fromMemoryLazy(data, userpw): 
    """ Load a file from from a byte array and the user password (blank if 
    none), but lazily like fromFileLazy.""" 
 
def blankDocument(w, h, pages): 
    """ Create a blank document 
    with pages of the given width (in points), height (in points), and number 
    of pages.""" 
 
def blankDocumentPaper(papersize, pages): 
    """Create a blank document with pages of the given paper size, and number 
    of pages. """ 
 
def ptOfCm(i): 
    """Convert a figure in centimetres to points (72 points to 1 inch).""" 
 
def ptOfMm(i): 
    """Convert a figure in millimetres to points (72 points to 1 inch).""" 
 
def ptOfIn(i): 
    """Convert a figure in inches to points (72 points to 1 inch).""" 
 
def cmOfPt(i): 
    """Convert a figure in points to centimetres (72 points to 1 inch).""" 
 
def mmOfPt(i): 
    """Convert a figure in points to millimetres (72 points to 1 inch).""" 
 
def inOfPt(i): 
    """Convert a figure in points to inches (72 points to 1 inch).""" 
 
def parsePagespec(pdf, pagespec): 
    """Parse a page specification such as "1-3,8-end" to a range with reference to 
    a given PDF (the PDF is supplied so that page ranges which reference pages 
    which do not exist are rejected).""" 
 
def validatePagespec(pagespec): 
    """Validate a page specification, returning True or False, so far as is 
    possible in the absence of the actual document.""" 
 
def stringOfPagespec(pdf, r): 
    """Build a page specification from a page 
    range. For example, the range containing 1,2,3,6,7,8 in a document of 8 
    pages might yield "1-3,6-end" """ 
 
def blankRange(): 
    """Create a range with no pages in.""" 
 
def pageRange(f, t): 
    """ Nuild a range from one page to another inclusive. 
    For example, pageRange(3,7) gives the range 3,4,5,6,7. """ 
 
def all(pdf): 
    """The range containing all the pages in a given document.""" 
 
def even(r): 
    """A range which contains just the even pages of another 
    range.""" 
 
def odd(r): 
    """A range which contains just the odd pages of another 
    range.""" 
 
def rangeUnion(a, b): 
    """The union of two ranges giving a range containing 
    the pages in range a and range b.""" 
 
def difference(a, b): 
    """The difference of two ranges, giving a range 
    containing all the pages in a except for those which are also in b.""" 
 
def removeDuplicates(r): 
    """Deduplicates a range, returning a new one.""" 
 
def rangeLength(r): 
    """The number of pages in a range.""" 
 
def rangeGet(r, n): 
    """Get the page number at position n in a range, where 
 
def rangeAdd(r, p): 
    """Add the page to a range, if it is not already 
    there.""" 
 
def isInRange(r, p): 
    """Returns True if the page p is in the range r, False otherwise.""" 
 
def pages(pdf): 
    """Return the number of pages in a PDF.""" 
    r = libc.pycpdf_pages(pdf.pdf) 
    checkerror() 
    return r 
 
def pagesFast(userpw, filename): 
    """Return the number of pages in a given 
    PDF, with given user password. It tries to do this as fast as 
    possible, without loading the whole file.""" 
 
def toFile(pdf, filename, linearize, make_id): 
    """Write the file to a given filename. If linearize is True, it will be 
    linearized, if supported by libcpdf. If make_id is True, it will be given a 
    new ID.""" 
 
def toFileExt(pdf, filename, linearize, make_id, preserve_objstm, 
              generate_objstm, compress_objstm): 
    """Write the file to a given filename. If linearize is True, it will be 
    linearized, if supported by libcpdf. If make_id is True, it will be given a 
    new ID.  If preserve_objstm is True, existing object streams will be 
    preserved. If generate_objstm is True, object streams will be generated 
    even if not originally present. If compress_objstm is True, object streams 
    will be compressed (what we usually want). WARNING: the pdf argument will 
    be invalid after this call and should not be used again.""" 
 
 
def toMemory(pdf, linearize, make_id): 
    """Write a file to memory, returning the buffer as a byte array of type 
    bytes.""" 
 
def isEncrypted(pdf): 
    """Returns True if a documented is encrypted, False otherwise.""" 
    r = libc.pycpdf_isEncrypted(pdf.pdf) 
    checkerror() 
    return r 
 
def toFileEncrypted(pdf, method, permissions, ownerpw, userpw, linearize, 
                    makeid, filename): 
    """Write the file to a given filename encrypted with the given encryption 
    method, permissions list, and owner and user passwords. If linearize is
    True, it will be linearized, if supported by libcpdf. If makeid is True,
    it will be given a new ID.""" 
 
def toFileEncryptedExt(pdf, method, permissions, ownerpw, userpw, linearize, 
                       makeid, preserve_objstm, generate_objstm, 
                       compress_objstm, filename): 
    """Write the file to a given filename encrypted with the given encryption 
    method, permissions list, and owner and user passwords. If linearize is
    True, it will be linearized, if supported by libcpdf. If makeid is True,
    it will be given a new ID. If preserve_objstm is True, existing object 
    streams will be preserved. If generate_objstm is True, object streams will 
    be generated even if not originally present. If compress_objstm is True, 
    object streams will be compressed (what we usually want). WARNING: the pdf 
    argument will be invalid after this call and should not be used again.""" 
 
def decryptPdf(pdf, userpw): 
    """Attempts to decrypt a PDF using the given user password. An exception is 
    raised in the event of a bad password.""" 
 
def decryptPdfOwner(pdf, ownerpw): 
    """Attempts to decrypt a PDF using the given owner password. An exception 
    is raised in the event of a bad password.""" 
 
def hasPermission(pdf, perm): 
    """Returns True if the given permission (restriction) is present.""" 
 
def encryptionKind(pdf): 
    """Return the encryption method currently in use on a document."""