Archive for June, 2008

Books for the PDF Programmer

Saturday, June 28th, 2008

PDF Reference Manual Front Cover

1200 pages of specification, referencing about fifty other documents. Not complete, not even self-consistent, but essential.

PDF Hacks Book Front Cover

One of O’Reilly’s Hacks series - a little book about generating, manipulating, annotating, and consuming PDF information. Lots of stuff which doesn’t need Acrobat, including free resources.

Fonts and Encodings Book Front Cover

All too rare these days: a book which pulls together a whole field, from detailed chapters of explanatory prose and historical information right down to the Type 1 and TrueType font formats in detail, with a few chapters on TeX fonts, too.

Computer Graphics Book Front Cover

Twelve years after its latest edition, and almost twenty since its first, still essential reference material.

Fax Modem Sourcebook Front Cover

Pretty much the only sensible source for information on CCITT fax encodings, which PDF uses for one-bit-per-pixel bitmaps. The standards documents are virtually impenetrable.

Compiling code under OCaml and F#

Tuesday, June 24th, 2008

I spent a couple of afternoons last week beginning to compile our CamlPDF library under F#, with the intention of making our PDF Command Line Tools available as a .NET library.

CamlPDF + the command line suite is about 20000 lines of OCaml, so I’m loath to fork it. I’m halfway through now.

Here’s how to deal with conditional compilation:

let digest =
 (*IF-OCAML*)
 Digest.string
 (*ENDIF-OCAML*)
 (*i*)(*F#
 function s ->
  let hasher =
   System.Security.Cryptography.MD5.Create ()
  in
   let ascii = Bytearray.string_to_ascii s in
    Bytearray.ascii_to_string (hasher.ComputeHash ascii)
 F#*)(*i*)

(In this instance, the F# branch makes up for the fact that the Digest module from the standard OCaml distribution isn’t available in F#.) The (*i*) markers are there to prevent OcamlWeb from asking TeX to interpret the F# part (TeX isn’t keen on the # character).
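For reference, here is a minimal sketch of what the OCaml branch gives you on its own. Digest.string and Digest.to_hex are from the standard library; digest_hex is a hypothetical helper added only for illustration, not part of CamlPDF.

```ocaml
(* The OCaml branch of the conditional: Digest.string returns the raw
   16-byte MD5 digest of its argument. *)
let digest : string -> string = Digest.string

(* Hypothetical helper (not part of CamlPDF): render the raw digest as
   lowercase hexadecimal so it can be printed or compared by eye. *)
let digest_hex (s : string) : string = Digest.to_hex (digest s)

let () =
  (* An MD5 digest is always 16 bytes, whatever the input length. *)
  assert (String.length (digest "hello") = 16);
  print_endline (digest_hex "abc")
```

Any F# replacement has to return the same 16 raw bytes for the same input, otherwise code that depends on the digest will behave differently on the two platforms.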

F# provides a library which gives some of the facilities of OCaml’s Pervasives library, and alternatives for Set and Map (F#’s functorial facilities differ from OCaml’s), so we have:

module PdfObjMap =
  Map.Make
   (struct
     type t = int
     let compare = compare
   end)

in OCaml, and in F#:

let PdfObjMap : (int, objectdata ref * int) Map.CMapOps = Map.Make compare
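To show how the OCaml version is used once the functor has been applied, here is a small sketch. The value type follows the F# annotation above (objectdata ref * int), but objectdata is a placeholder string here rather than CamlPDF’s real type, and the stored values are made up for illustration.

```ocaml
module PdfObjMap =
  Map.Make
    (struct
      type t = int
      let compare = compare
    end)

(* Placeholder for illustration only: CamlPDF's real objectdata type is
   not shown in this post. *)
type objectdata = string

(* A map from object number to (object data, generation number). *)
let objects : (objectdata ref * int) PdfObjMap.t =
  PdfObjMap.empty
  |> PdfObjMap.add 1 (ref "catalog", 0)
  |> PdfObjMap.add 2 (ref "pages", 0)

let () =
  let data, generation = PdfObjMap.find 2 objects in
  Printf.printf "object 2: %s (generation %d)\n" !data generation
```

In OCaml the key type and its ordering are fixed once, when the functor is applied, whereas the F# line passes the comparison function in as an ordinary value.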

There are also subtle differences in the type systems, leading to changes in .mli files, but normally not too extensive. Some extra type annotations are required in ML code too, but again - not many.

A couple of important OCaml libraries are missing, for instance Genlex. In that case, I just wrote a simple lexer myself to replace my uses of Genlex.
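A hand-written lexer of this kind needs very little code. The following is only a sketch of the general idea, not the lexer actually used in CamlPDF: the token type is loosely modelled on the standard library’s, and the names lex, span, is_digit, is_alpha and is_space are all made up for this example.

```ocaml
(* Tokens loosely modelled on the standard lexer's token type. *)
type token =
  | Int of int
  | Ident of string
  | Kwd of string

let is_digit c = c >= '0' && c <= '9'
let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c = '_'
let is_space c = c = ' ' || c = '\t' || c = '\n' || c = '\r'

(* Index of the first character at or after [i] that does not satisfy [p]. *)
let rec span p s i =
  if i < String.length s && p s.[i] then span p s (i + 1) else i

(* Split a string into tokens: runs of digits become Int, identifiers
   become Ident, and any other non-space character becomes a
   one-character Kwd. *)
let lex (s : string) : token list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      let c = s.[i] in
      if is_space c then go (i + 1) acc
      else if is_digit c then
        let j = span is_digit s i in
        go j (Int (int_of_string (String.sub s i (j - i))) :: acc)
      else if is_alpha c then
        let j = span (fun c -> is_alpha c || is_digit c) s i in
        go j (Ident (String.sub s i (j - i)) :: acc)
      else go (i + 1) (Kwd (String.make 1 c) :: acc)
  in
  go 0 []
```

Since it uses nothing beyond basic string operations, code in this style should need few changes to compile under both OCaml and F#, though I haven’t verified that for this particular sketch.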

The current stumbling block is a file that simply won’t compile in F# - the compiler freezes - but back to that tomorrow. Once the main CamlPDF library is converted, we can get on to building a nice API for using the PDF tools from .NET languages such as C#, and working out how to package it into a distributable product.

PDF Command Line Tools 1.1

Wednesday, June 4th, 2008

I’ve just uploaded the new version of our tools for merging, splitting, annotating, encrypting and stamping PDF files here.

New features:
  • -blacklines and -blackfills (Blacken lines and fills)
  • -idir (Add whole directory of files)
  • -scale-to-fit (Scale pages to fit a given paper size)
  • -scale-contents (Scale page contents)
  • Use page size names (e.g. a4paper)
  • Multiline text stamps
  • Print page information (media box etc)
Bugs fixed:
  • -stdin and -stdout now work on Windows
  • Encrypted files now viewable in Acrobat 5.0
  • Merge now produces smaller files when several parts are taken from a single input file
A bugfix release of CamlPDF (which forms the basis for these tools) will be released soon. More substantial feature additions for CamlPDF are in the works - more details later.