Coherent Graphics Ltd | Ocaml and PDF

New Reviews of Old Books #57

Posted on July 11, 2008 by coherent

Digital Typography, Donald E. Knuth, 1999 (Amazon)

Digital Typography Book Front Cover

This is a collection of more than thirty articles and notes covering Knuth’s foray into digital type in the late seventies and eighties. They range from font design (a whole chapter on the shape of the letter S), to the history of typography, to some TeX-related material (the entire exposition of TeX’s line-breaking algorithm, for instance). There is also plenty of the concrete mathematical analysis associated with Knuth, including a piece explaining why arrowheads plotted on bitmap displays can look asymmetrical even at quite high resolutions.

Perhaps the most appealing of the less-technical articles is Chapter 17 (AMS Euler), which describes the collaboration between Knuth and Hermann Zapf to produce a new maths font for the American Mathematical Society. It’s drawn directly from the correspondence between the two as the maths of MetaFont progresses along with the design of the font itself.

The book is, as you would expect, beautifully typeset and bound. It costs about sixteen pounds.

Posted in Uncategorized | Tagged fonts, TeX, typography | Leave a comment

Compiling Code Under OCaml and F# (Part Two)

Posted on July 4, 2008 by coherent

[Part One]

Twenty thousand lines of CamlPDF and cpdf later, here are some numbers:

Occasions on which conditional compilation is required: 22
Compilation warnings with fsc --no-warn 62: 15
Time taken: 22 hours

The current executable appears to be about 8 times slower than OCaml native compilation, but I haven’t examined this enough to know how much we might be able to improve upon it.

I’m planning to clean up the code to see how much of the conditional compilation we can get into a single Utility module.

What’s next is to repackage the command line tools as an API for .NET users. I know very little about this topic, so it’s going to require quite a bit of effort before I’m willing to put it on sale and support it.

If you’re familiar with packaging up libraries for .NET and would like to beta test, drop me a line in the comments, or via our website.

Posted in Uncategorized | Tagged camlpdf, F#, fsharp, Ocaml | Leave a comment

Books for the PDF Programmer

Posted on June 28, 2008 by coherent

PDF Reference Manual Front Cover

1200 pages of specification, referencing about fifty other documents. Not complete, not even self-consistent, but essential.

PDF Hacks Book Front Cover

One of O’Reilly’s Hacks series – a little book about generating, manipulating, annotating, and consuming PDF information. Lots of stuff which doesn’t need Acrobat, including free resources.

Fonts and Encodings Book Front Cover

All too rare these days, a book which pulls together a whole field, from detailed chapters of explanatory prose and historical information, right down to Type 1 and TrueType font formats in detail, and a few chapters on TeX fonts, too.

Computer Graphics Book Front Cover

Twelve years after its latest edition, and almost twenty since its first, still essential reference material.

Fax Modem Sourcebook Front Cover

Pretty much the only sensible source for information on CCITT fax encodings, which PDF uses for one-bit-per-pixel bitmaps. The standards documents are virtually impenetrable.

Posted in Uncategorized | Tagged Books, PDF | Leave a comment

Compiling code under OCaml and F#

Posted on June 24, 2008 by coherent

I spent a couple of afternoons last week beginning to compile our CamlPDF library under F#, with the intention of making our PDF Command Line Tools available as a .NET library.

CamlPDF + the command line suite is about 20000 lines of OCaml, so I’m loath to fork it. I’m half way through now.

Here’s how to deal with conditional compilation:


let digest =
 (*IF-OCAML*)
 Digest.string
 (*ENDIF-OCAML*)
 (*i*)(*F#
 function s ->
  let hasher =
   System.Security.Cryptography.MD5.Create ()
  in
   let ascii = Bytearray.string_to_ascii s in
    Bytearray.ascii_to_string (hasher.ComputeHash ascii)
 F#*)(*i*)

(In this instance, making up for the fact that the Digest module from the standard OCaml distribution isn’t available in F#). The (*i*) is to prevent OcamlWeb from asking TeX to interpret the F# part (TeX isn’t keen on the # character).

F# provides a library which gives some of the facilities of OCaml’s Pervasives library, and alternatives for Set and Map (F#’s functorial facilities differ from OCaml’s), so we have:


module PdfObjMap =
  Map.Make
   (struct
     type t = int
     let compare = compare
   end)

in OCaml, and in F#:


let PdfObjMap : (int, objectdata ref * int) Map.CMapOps = Map.Make compare
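For readers less familiar with the functor form, here is a minimal sketch of how a map built this way is used on the OCaml side. The string values are placeholders chosen for the example; CamlPDF stores object data here, and the name objects is hypothetical.

```ocaml
(* An integer-keyed map, built with the standard Map.Make functor. *)
module PdfObjMap =
  Map.Make
    (struct
      type t = int
      let compare = compare
    end)

(* Placeholder values: strings stand in for the real object data. *)
let objects =
  PdfObjMap.empty
  |> PdfObjMap.add 1 "catalog"
  |> PdfObjMap.add 2 "pages"

let () =
  (* Lookup by object number. *)
  print_endline (PdfObjMap.find 2 objects)
```

The functor application fixes the key type and ordering once, at module level, which is the part that has to be expressed differently in F#.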

There are also subtle differences in the type systems, leading to changes in .mli files, but normally not too extensive. Some extra type annotations are required in ML code too, but again – not many.

A couple of important OCaml libraries are missing – for instance GenLex – in that case, I just wrote a simple lexer myself, to replace my uses of GenLex.
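To give an idea of how little code such a replacement needs, here is a sketch of a hand-written lexer for cpdf-style page ranges (numbers, "-", "," and a handful of keywords). It is not the lexer used in CamlPDF: the token type, the keyword list and the function names are assumptions made for this example.

```ocaml
(* Tokens, loosely modelled on the two Genlex constructors we need. *)
type token = Int of int | Kwd of string

(* Assumed keyword set, taken from cpdf's page range syntax. *)
let keywords = ["all"; "reverse"; "even"; "odd"; "end"]

let is_digit c = c >= '0' && c <= '9'
let is_alpha c = c >= 'a' && c <= 'z'

let lex s =
  let n = String.length s in
  (* Advance j while p holds; return the first index where it fails. *)
  let rec span p j = if j < n && p s.[j] then span p (j + 1) else j in
  let rec go i acc =
    if i >= n then List.rev acc
    else match s.[i] with
      | '-' | ',' as c -> go (i + 1) (Kwd (String.make 1 c) :: acc)
      | c when is_digit c ->
          let j = span is_digit i in
          go j (Int (int_of_string (String.sub s i (j - i))) :: acc)
      | c when is_alpha c ->
          let j = span is_alpha i in
          let w = String.sub s i (j - i) in
          if List.mem w keywords then go j (Kwd w :: acc)
          else failwith ("unknown word: " ^ w)
      | c -> failwith (Printf.sprintf "unexpected character %C" c)
  in
  go 0 []
```

For example, lex "1,2,6-end" gives [Int 1; Kwd ","; Int 2; Kwd ","; Int 6; Kwd "-"; Kwd "end"]. Since it uses only strings and lists, nothing in it depends on a library that one of the two compilers lacks.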

The current stumbling block is a file that simply won’t compile in F# – the compiler freezes – but back to that tomorrow. Once the main CamlPDF library is converted, we can get on to building a nice API for using the PDF tools from .NET languages such as C#, and working out how to package it into a distributable product.

Posted in Uncategorized | Tagged F#, Ocaml | Leave a comment

PDF Command Line Tools 1.1

Posted on June 4, 2008 by coherent

I’ve just uploaded the new version of our tools for merging, splitting, annotating, encrypting and stamping PDF files here.

New features:

-blacklines and -blackfills (Blacken lines and fills)
-idir (Add whole directory of files)
-scale-to-fit (Scale pages to fit a given paper size)
-scale-contents (Scale page contents)
Use page size names (e.g. a4paper)
Multiline text stamps
Print page information (media box etc)

Bugs fixed:

-stdin and -stdout now work on Windows
Encrypted files now viewable in Acrobat 5.0
Merge now produces smaller files when several parts are taken from a single input file

A bugfix release of CamlPDF (which forms the basis for these tools) will be released soon. More substantial feature additions for CamlPDF are in the works – more details later.

Posted in Uncategorized | Tagged camlpdf, Coherent Graphics Ltd, cpdf, Ocaml, PDF | Leave a comment

Storing Colours in 31 bits (Part 2)

Posted on April 15, 2008 by coherent

Jean-Baptiste Rouquier rose to the challenge in my last post: to provide a fast way of storing premultiplied colours in OCaml’s 31-bit integers (see his post here).

With his kind permission, I’ve included this code (somewhat optimized) in the Colour module, together with the longstanding compositing code.

http://www.coherentgraphics.co.uk/colour.zip

It’s under the BSD license.

Posted in Uncategorized | Tagged Graphics, Ocaml | Leave a comment

A Simple Parser with Genlex

Posted on April 9, 2008 by coherent

(If you’re reading this via the OCaml planet, click the post title to see it with proper formatting – I shall fix this problem for the next post, I hope.)

When calling our command line PDF Tools to, for instance, merge files, one can use page specifiers with a simple grammar. For instance:

cpdf file.pdf 1,2,6-end -o out.pdf

cpdf in.pdf 1,all -o fax.pdf

(The second example is useful to duplicate the first page of a document as a fax cover sheet)

If the grammar is simple, OCaml’s Genlex plus a small recursive parser is sufficient.

Here’s an informal description of the mini language:

• A dash (-) defines ranges, e.g. 1-5 or 6-3.

• A comma (,) allows one to specify several ranges, e.g. 1-2,4-5.

• The word “end” represents the last page number.

• The words “odd” and “even” represent the odd and even pages.

• The word “reverse” is the same as end-1.

• The word “all” is the same as 1-end.

• A range must contain no spaces.

Our input is the string in this language, our output an ordered list of page numbers. (I’m using some functions from the Utility module available with CamlPDF)

First build a lexer:

let lexer =
  Genlex.make_lexer ["-"; ","; "all"; "reverse"; "even"; "odd"; "end"]

and define a utility function which lexes a string to a list of Genlex lexemes (no need to be lazy here):

let lexwith s =
  list_of_stream (lexer (Stream.of_string s))

Now, this language is quite simple to deal with. We’ll pattern-match for all the simple cases: there are only finitely many forms which don’t contain a comma. If nothing matches, we assume the string contains one or more commas, and attempt to split it up.

Here is the main function, which is given the end page of the PDF file to which the range applies (the start page is always 1), and a list of tokens to match against:

let rec mk_numbers endpage = function
  | [Genlex.Int n] -> [n]
  | [Genlex.Int n; Genlex.Kwd "-"; Genlex.Int n'] ->
      if n > n' then rev (ilist n' n) else ilist n n'
  | [Genlex.Kwd "end"; Genlex.Kwd "-"; Genlex.Int n] ->
      if n <= endpage
        then rev (ilist n endpage)
        else failwith "n > endpage"
  | [Genlex.Int n; Genlex.Kwd "-"; Genlex.Kwd "end"] ->
      if n <= endpage
        then ilist n endpage
        else failwith "n > endpage2"
  | [Genlex.Kwd "end"; Genlex.Kwd "-"; Genlex.Kwd "end"] ->
      [endpage]
  | [Genlex.Kwd "even"] ->
      drop_odds (ilist 1 endpage)
  | [Genlex.Kwd "odd"] ->
      really_drop_evens (ilist 1 endpage)
  | [Genlex.Kwd "all"] ->
      ilist 1 endpage
  | [Genlex.Kwd "reverse"] ->
      rev (ilist 1 endpage)
  | toks ->
      let ranges = splitat_commas toks in
        (* Check we've made progress *)
        if ranges = [toks] then error "Bad page range" else
          flatten (map (mk_numbers endpage) ranges)

(ilist x y produces the list [x; x + 1; …; y - 1; y], drop_odds [1;2;3;4;5;6;7] is [2;4;6], really_drop_evens [1;2;3;4;5;6] is [1;3;5])
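These three helpers come from CamlPDF’s Utility module and aren’t shown in this post. Here is a minimal sketch which is consistent with the examples just given; it is written for illustration and is not the Utility source.

```ocaml
(* ilist x y is [x; x + 1; ...; y]; this sketch returns [] when x > y. *)
let ilist x y =
  let rec go acc n = if n < x then acc else go (n :: acc) (n - 1) in
  go [] y

(* Drop the 1st, 3rd, 5th ... elements, keeping the 2nd, 4th, 6th ... *)
let rec drop_odds = function
  | _ :: x :: rest -> x :: drop_odds rest
  | _ -> []

(* Drop the 2nd, 4th, 6th ... elements, keeping the 1st, 3rd, 5th ... *)
let rec really_drop_evens = function
  | x :: _ :: rest -> x :: really_drop_evens rest
  | l -> l
```

Note that both dropping functions work on positions in the list, not on the values, which is why they are applied to ilist 1 endpage to select even and odd pages.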

The auxiliary function splitat_commas splits a token list at commas into a list of token lists:

let rec splitat_commas toks =
  match cleavewhile (neq (Genlex.Kwd ",")) toks with
  | [], _ -> []
  | some, [] -> [some]
  | _::_ as before, _::rest -> before::splitat_commas rest

(cleavewhile returns, in order, the elements at the beginning of a list until a predicate is false, paired with the rest of the elements, in order. neq is ( <> ))
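cleavewhile is another Utility function not shown here. A sketch matching that description follows, together with the same splitting logic applied to plain integers so it can be tried without Genlex. The name split_at and the use of 0 as a separator are made up for this example.

```ocaml
(* Split a list into the longest prefix satisfying p, and the remainder. *)
let cleavewhile p l =
  let rec go acc = function
    | x :: rest when p x -> go (x :: acc) rest
    | rest -> (List.rev acc, rest)
  in
  go [] l

(* The shape of splitat_commas, generalised over the separator. *)
let rec split_at sep l =
  match cleavewhile (fun x -> x <> sep) l with
  | [], _ -> []
  | some, [] -> [some]
  | before, _ :: rest -> before :: split_at sep rest
```

With 0 standing in for the comma, split_at 0 [1; 2; 0; 3; 0; 4] is [[1; 2]; [3]; [4]]. A list with no separator comes back as a single group, which is exactly the "no progress" case that mk_numbers checks for.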

And here’s the wrapper function, which unifies the error handling and tests for page numbers not within range.

let parse_pagespec pdf spec =
  let endpage = endpage_of_pdf pdf in
  let numbers =
    try mk_numbers endpage (lexwith spec) with
      _ -> error ("Bad page specification " ^ spec)
  in
    (* Every page must lie between 1 and the last page *)
    iter
      (fun n ->
         if n < 1 || n > endpage then
           error ("Page " ^ string_of_int n ^ " does not exist."))
      numbers;
    numbers

This is often easier to write and debug than using a separate tool intended for highly complex grammars.
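If you want to experiment with the mini language without Genlex or the Utility module, here is a self-contained sketch of the same grammar. It swaps in a different technique (String.split_on_char on commas and dashes instead of lexing to tokens), so it is not the cpdf implementation, and the function names are made up for this example. It takes the last page number directly rather than a PDF document.

```ocaml
(* ilist x y is [x; x + 1; ...; y], or [] when x > y. *)
let ilist x y = List.init (max 0 (y - x + 1)) (fun i -> x + i)

(* A page is either a number or the word "end". *)
let page_of endpage = function
  | "end" -> endpage
  | s -> int_of_string s (* raises Failure on anything else *)

(* Expand one comma-free range into page numbers. *)
let range endpage = function
  | "all" -> ilist 1 endpage
  | "reverse" -> List.rev (ilist 1 endpage)
  | "odd" -> List.filter (fun n -> n mod 2 = 1) (ilist 1 endpage)
  | "even" -> List.filter (fun n -> n mod 2 = 0) (ilist 1 endpage)
  | s ->
      match List.map (page_of endpage) (String.split_on_char '-' s) with
      | [n] -> [n]
      | [a; b] -> if a > b then List.rev (ilist b a) else ilist a b
      | _ -> failwith ("Bad page range " ^ s)

(* Expand a whole specification, then check every page exists. *)
let pages endpage spec =
  let numbers =
    List.concat (List.map (range endpage) (String.split_on_char ',' spec))
  in
  List.iter
    (fun n ->
       if n < 1 || n > endpage then
         failwith ("Page " ^ string_of_int n ^ " does not exist."))
    numbers;
  numbers
```

For a six-page document, pages 6 "1,2,6-end" is [1; 2; 6] and pages 6 "6-3" is [6; 5; 4; 3]; for a three-page one, pages 3 "1,all" is [1; 1; 2; 3], the fax cover sheet example from above.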

Posted in Uncategorized | Tagged GenLex, lexing, Ocaml | Leave a comment

Storing Colours in 31 bits (Part One)

Posted on April 4, 2008 by coherent

For rendering vector graphics scenes, colours are usually stored with what we call “premultiplied” or “associated” alpha. For instance, half-transparent red is stored as:

0.5, 0, 0, 0.5 (R * A, G * A, B * A, A)

instead of

1, 0, 0, 0.5 (R, G, B, A).

This originally had to do with making compositing algorithms fast (fewer multiplications), but it has other advantages – for instance there is only a single value for the clear colour (0, 0, 0, 0) instead of many (x, y, z, 0).
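To illustrate the point about multiplications, here is a small sketch of the Porter/Duff “over” operator on premultiplied colours, using floats for clarity. The record type and function names are made up for this example; this is not the code from our renderer.

```ocaml
(* A colour as premultiplied (r, g, b, a) floats in [0, 1]. *)
type colour = { r : float; g : float; b : float; a : float }

(* Convert from ordinary (non-premultiplied) components. *)
let premultiply r g b a = { r = r *. a; g = g *. a; b = b *. a; a }

(* Porter/Duff "over": every channel, alpha included, uses one formula. *)
let over src dst =
  let k = 1.0 -. src.a in
  { r = src.r +. dst.r *. k;
    g = src.g +. dst.g *. k;
    b = src.b +. dst.b *. k;
    a = src.a +. dst.a *. k }
```

Compositing half-transparent red over opaque blue, over (premultiply 1.0 0.0 0.0 0.5) (premultiply 0.0 0.0 1.0 1.0), gives (0.5, 0, 0.5, 1). With non-premultiplied storage the source colour would need an extra multiplication by its alpha at every pixel, and the result a division.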

We usually use 8 bits for each component, packing them into a 32 bit word.

Now, 32 bits can store 2^32 colours. In fact, in the premultiplied scheme, many bit patterns are unused (when R, G or B is > A). It turns out there are only slightly more than 2^30 unique premultiplied colours. In other words, with a suitable mapping, we should be able to store them in OCaml’s 31 bit integers. This is important so we can store them in native arrays unboxed, for example.
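The count is easy to check. For each alpha value A from 0 to 255, each of R, G and B can take any of the A + 1 values from 0 to A, so the total is the sum of (A + 1)^3, which is (256 * 257 / 2)^2. The sketch below simply adds it up; it assumes a 64-bit OCaml, since the total does not fit in a 31-bit signed integer.

```ocaml
(* Number of valid 8-bit premultiplied colours: for each alpha a,
   each of r, g, b ranges over 0..a, giving (a + 1)^3 combinations. *)
let premultiplied_count =
  let total = ref 0 in
  for a = 0 to 255 do
    let n = a + 1 in
    total := !total + n * n * n
  done;
  !total

let () = Printf.printf "%d\n" premultiplied_count
```

This prints 1082146816, compared with 2^30 = 1073741824 and 2^31 = 2147483648. So 30 bits are not quite enough, but 31 are, with room to spare.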

Such a mapping (together with a discussion of all this) is in Jim Blinn’s book “Dirty Pixels”, Chapter 20. Unfortunately, it’s too slow for practical use. Can you think of a fast one?

Meanwhile, here’s some code from our renderer which uses a lossy approach: throwing away the least significant red bit. (Which colour should lose the bit is not clear: theoretically the eyes are less sensitive to changes in blue, but my tests didn’t seem to bear that out.)

To build one of these colours (assertions left out for this post):

let colour_of_rgba r g b a =
  (a lsl 23) lor (b lsl 15) lor (g lsl 7) lor (r lsr 1)

Extraction of blue, green and alpha components is easy, but where we’ve dropped the LSB, we need to reconstruct carefully, at least making sure 254 reconstructs to 255 – otherwise we couldn’t represent full red. We must also make sure the invariant that a component can never be more than the alpha is obeyed.

let red_of_colour c =
  let red =
    match (c land 127) lsl 1 with
    | 254 -> 255
    | r -> r
  and alpha = c lsr 23 in
    if red > alpha then alpha else red
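For completeness, here is a sketch of the other three extractors, which follow directly from the bit layout used when packing (alpha in bits 23 to 30, blue in 15 to 22, green in 7 to 14, red in 0 to 6). The names alpha_of_colour, blue_of_colour and green_of_colour are guesses made for this example; the two functions from the post are repeated so the block stands alone.

```ocaml
(* Packing, as in the post: red loses its least significant bit. *)
let colour_of_rgba r g b a =
  (a lsl 23) lor (b lsl 15) lor (g lsl 7) lor (r lsr 1)

(* The three full-width components: shift down, then mask to 8 bits. *)
let alpha_of_colour c = c lsr 23
let blue_of_colour c = (c lsr 15) land 255
let green_of_colour c = (c lsr 7) land 255

(* Red, as in the post: restore 7 bits to 8, map 254 to 255, and
   never exceed alpha. *)
let red_of_colour c =
  let red =
    match (c land 127) lsl 1 with
    | 254 -> 255
    | r -> r
  and alpha = c lsr 23 in
    if red > alpha then alpha else red
```

Packing (255, 128, 64, 255) and unpacking gives the same four values back. An odd red value such as 101 comes back as 100, which is the lossy part, and a red of 254 with an alpha of 254 is first reconstructed as 255 and then clamped back to 254 by the invariant.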

In Part Two, I’ll release the Colour module, which provides for all this, and implements the standard Porter/Duff compositing arithmetic efficiently.

Posted in Uncategorized | Tagged Graphics, Ocaml | Leave a comment

A Proper GUI for OCaml (Part Two)

Posted on April 1, 2008 by coherent

I’ve packaged up the basic libraries described last time and you can get them here, together with a little example.

It should work on any platform / toolchain where OCaml, Python and wxPython can be installed. It allows you to build standalone Windows exes and Mac applications out of the box.

As I mentioned before, this is just a proof of concept. I have no intention of writing a generic GUI library for my renderer, instead sticking with the specialised one I’m using already.

That means I’m using a different (specialised) wxgui.ml and main.ml and main.py, but the same mltalk.py, pytalk.ml, camlpy.ml and pycaml.py — and I’ll only be keeping those basic four files up to date.

I’ve put it under the BSD license, so have fun.

Posted in Uncategorized | Tagged GUI, Ocaml, WXPython | Leave a comment

A Proper GUI for OCaml (Part One)

Posted on March 19, 2008 by coherent

Some time ago, I wrote a GUI for a large OCaml project in C++ using wxWidgets, linked to OCaml in a single executable. It was hard and slow to write and crashed all over the place.

So I’ve redone it in a particularly low-tech way. Consider the following:

A GUI layer in pure Python, using the wxPython library
A GUI library written in pure OCaml

We start the two in separate processes, and have them talk via a socket. Sounds horrible – works fine.
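One detail any such scheme has to settle is how to tell where one message ends and the next begins, since a stream socket only delivers bytes. A common answer is a length prefix. The sketch below assumes a hypothetical format of a 4-byte big-endian length followed by the payload, and works on strings so it can be tried without opening a socket. frame and unframe are illustrative names; this is not necessarily what mltalk.py and pytalk.ml do.

```ocaml
(* Prefix a payload with its length as 4 big-endian bytes. *)
let frame payload =
  let n = String.length payload in
  let header = Bytes.create 4 in
  for i = 0 to 3 do
    Bytes.set header i (Char.chr ((n lsr (8 * (3 - i))) land 255))
  done;
  Bytes.to_string header ^ payload

(* Take one message off the front of the received bytes. Returns the
   payload and the unread remainder, or None if it is incomplete. *)
let unframe buf =
  let len = String.length buf in
  if len < 4 then None
  else begin
    let n = ref 0 in
    for i = 0 to 3 do
      n := (!n lsl 8) lor Char.code buf.[i]
    done;
    if len < 4 + !n then None
    else Some (String.sub buf 4 !n, String.sub buf (4 + !n) (len - 4 - !n))
  end
```

The receiving side keeps appending whatever arrives on the socket to a buffer and calls unframe until it returns None. The same few lines are easy to mirror on the Python side.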

Advantages of this as a long-term approach to the building of a proper OCaml GUI library:

Pure. No FFI involved
wxWidgets is very good. It’s a whole platform and a mature one
Cross-platform by default. Native appearance on Windows, Unix, OS X
Effort required by OCaml community lower than for any other option
Build single executables for release with py2exe and py2app
Write OCaml interfaces to hundreds of python libraries

Disadvantages (please add more in the comments!)

Python and OCaml processes can fail independently
Might need to work on getting executable sizes down
Asynchronous event loops need care: work required to ensure mouse events don’t get out of sync etc – but this is fairly easy to overcome.

Here is a screenshot of our rendering engine using this approach on the Mac, on Windows and under Ubuntu.

I don’t have the time to write a generic OCaml / wxpython library at the moment, so I’m just going to stick with my special purpose implementation in our software for now.

In Part Two, I’ll release the basic libraries:

mltalk.py, pytalk.ml (Establishing a connection between Ocaml and Python processes)
camlpy.ml pycaml.py (Marshalling and Unmarshalling Ocaml and Python data)
Wxgui.ml (Event polling and synchronous events)
main.ml, main.py (Main programs)
Scripts for py2exe and py2app

Hopefully someone will have the time to build a proper library from this example.

Posted in Uncategorized | Tagged GUI, Ocaml, WXPython | Leave a comment
