The Structure of OCaml Programs

Now we're going to take some time out to take a high-level look at some real OCaml programs. I want to teach you about local and global definitions, modules, nested functions, and references. For this we're going to look at a lot of OCaml concepts which won't yet make sense because we haven't seen them before. Don't worry about the details for the moment. Concentrate instead on the overall shape of the programs and the features which I'll point out.

Local "variables" (really local expressions)

Let's take the average function and add a local variable in C. (Compare it to the first definition we had above).

double average (double a, double b)
{
  double sum = a + b;
  return sum / 2;
}

Now let's do the same to our OCaml version:

# let average a b =
    let sum = a +. b in
      sum /. 2.0;;
val average : float -> float -> float = <fun>

The standard phrase let name = expression in is used to define a named local expression, and name can then be used later on in the function instead of expression.

Now comparing C local variables and these named local expressions is a sleight of hand. In fact they are somewhat different. The C variable sum has a slot allocated for it on the stack. You can assign to sum later in the function if you want, or even take the address of sum. This is NOT true for the OCaml version. In the OCaml version, sum is just a shorthand name for the expression a +. b. There is no way to assign to sum or change its value in any way. (We'll see how you can do variables whose value changes in a minute).

Here's another example to make this clearer. The following two code snippets should return the same value (namely (a + b) + (a + b)²):

# let f a b =
    (a +. b) +. (a +. b) ** 2.;;
val f : float -> float -> float = <fun>
# let f a b =
    let x = a +. b in
    x +. x ** 2.;;
val f : float -> float -> float = <fun>

The second version might be faster (but most compilers ought to be able to perform this step of "common subexpression elimination" for you), and it is certainly easier to read. x in the second example is just shorthand for a +. b.

Global "variables" (really global expressions)

You can also define global names for things at the top level, and as with our local "variables" above, these aren't really variable at all, just shorthand names for things. Here's a real (but cut-down) example:

let html =
  let content = read_whole_file file in
  GHtml.html_from_string content

let menu_bold () =
  match bold_button#active with
  | true -> html#set_font_style ~enable:[`BOLD] ()
  | false -> html#set_font_style ~disable:[`BOLD] ()

let main () =
  (* code omitted *)
  factory#add_item "Cut" ~key:_X ~callback: html#cut

In this real piece of code, html is an HTML editing widget (an object from the lablgtk library) which is created once at the beginning of the program by the first let html = statement. It is then referred to in several later functions.

Note that the html name in the code snippet above shouldn't really be compared to a real global variable as in C or other imperative languages. There is no space allocated to "store" the "html pointer". Nor is it possible to assign anything to html, for example to reassign it to point to a different widget. In the next section we'll talk about references, which are real variables.

Let-bindings

Any use of let ..., whether at the top level (globally) or within a function, is often called a let-binding.

References: real variables

What happens if you want a real variable that you can assign to and change throughout your program? You need to use a reference. References are very similar to pointers in C/C++. In Java, all variables which store objects are really references (pointers) to the objects.

Here's how we create a reference to an int in OCaml:

# ref 0;;
- : int ref = {contents = 0}

Actually that statement wasn't really very useful. We created the reference and then, because we didn't name it, the garbage collector came along and collected it immediately afterwards! (actually, it was probably thrown away at compile-time.) Let's name the reference:

# let my_ref = ref 0;;
val my_ref : int ref = {contents = 0}

This reference is currently storing a zero integer. Let's put something else into it (assignment):

# my_ref := 100;;
- : unit = ()

And let's find out what the reference contains now:

# !my_ref;;
- : int = 100

So the := operator is used to assign to references, and the ! operator dereferences to get out the contents. Here's a rough-and-ready comparison with C/C++:

OCaml

# let my_ref = ref 0;;
val my_ref : int ref = {contents = 0} # my_ref := 100;;
- : unit = () # !my_ref;;
- : int = 100

C/C++

int a = 0; int *my_ptr = &a;
*my_ptr = 100;
*my_ptr;

References have their place, but you may find that you don't use references very often. Much more often you'll be using let name = expression in to name local expressions in your function definitions.

Nested functions

C doesn't really have a concept of nested functions. GCC supports nested functions for C programs but I don't know of any program which actually uses this extension. Anyway, here's what the gcc info page has to say about nested functions:

A "nested function" is a function defined inside another function. (Nested functions are not supported for GNU C++.) The nested function's name is local to the block where it is defined. For example, here we define a nested function named 'square', and call it twice:

foo (double a, double b)
{
  double square (double z) { return z * z; }

  return square (a) + square (b);
}

The nested function can access all the variables of the containing function that are visible at the point of its definition. This is called "lexical scoping". For example, here we show a nested function which uses an inherited variable named offset:

bar (int *array, int offset, int size)
{
  int access (int *array, int index)
    { return array[index + offset]; }
  int i;
  /* ... */
  for (i = 0; i < size; i++)
    /* ... */ access (array, i) /* ... */
}

You get the idea. Nested functions are, however, very useful and very heavily used in OCaml. Here is an example of a nested function from some real code:

# let read_whole_channel chan =
    let buf = Buffer.create 4096 in
    let rec loop () =
      let newline = input_line chan in
        Buffer.add_string buf newline;
        Buffer.add_char buf '\n';
        loop ()
    in
      try
        loop ()
      with
        End_of_file -> Buffer.contents buf;;
val read_whole_channel : in_channel -> string = <fun>

Don't worry about what this code does - it contains many concepts which haven't been discussed in this tutorial yet. Concentrate instead on the central nested function called loop which takes just a unit argument. You can call loop () from within the function read_whole_channel, but it's not defined outside this function. The nested function can access variables defined in the main function (here loop accesses the local names buf and chan).

The form for nested functions is the same as for local named expressions: let name arguments = function-definition in.

You normally indent the function definition on a new line as in the example above, and remember to use let rec instead of let if your function is recursive (as it is in that example).

Modules and open

OCaml comes with lots of fun and interesting modules (libraries of useful code). For example there are standard libraries for drawing graphics, interfacing with GUI widget sets, handling large numbers, data structures, and making POSIX system calls. These libraries are located in /usr/lib/ocaml/ (on Unix anyway).

For these examples we're going to use module called Graphics which can be installed with opam install graphics and the ocamlfind program installed with opam install ocamlfind.

If we want to use the functions in Graphics there are two ways we can do it. Either at the start of our program we have the open Graphics;; declaration. Or we prefix all calls to the functions like this: Graphics.open_graph. open is a little bit like Java's import statement.

To use Graphics in the interactive toplevel, you must first load the library with

# #use "topfind";;
- : unit = () Findlib has been successfully loaded. Additional directives: #require "package";; to load a package - : unit = () # #require "graphics";;
/Users/me/.opam/4.12.0/lib/graphics: added to search path /Users/me/.opam/4.12.0/lib/graphics/graphics.cma: loaded

A couple of examples should make this clear. (The two examples draw different things - try them out). Note the first example calls open_graph and the second one Graphics.open_graph.

open Graphics

open_graph " 640x480"

for i = 12 downto 1 do
  let radius = i * 20 in
    set_color (if i mod 2 = 0 then red else yellow);
    fill_circle 320 240 radius
done;;
read_line ()
Random.self_init ()

Graphics.open_graph " 640x480"

let rec iterate r x_init i =
  if i = 1 then x_init else
    let x = iterate r x_init (i - 1) in
      r *. x *. (1.0 -. x)

for x = 0 to 639 do
  let r = 4.0 *. (float_of_int x) /. 640.0 in
  for i = 0 to 39 do
    let x_init = Random.float 1.0 in
    let x_final = iterate r x_init 500 in
    let y = int_of_float (x_final *. 480.) in
      Graphics.plot x y
  done
done

read_line ()

Both of these examples make use of some features we haven't talked about yet: imperative-style for-loops, if-then-else and recursion. We'll talk about those later. Nevertheless you should look at these programs and try and find out (1) how they work, and (2) how type inference is helping you to eliminate bugs.

The Stdlib module

There's one module that you never need to "open". That is the Stdlib module. All of the symbols from the Stdlib module are automatically imported into every OCaml program.

Renaming modules

What happens if you want to use symbols in the Graphics module, but you don't want to import all of them and you can't be bothered to type Graphics each time? Just rename it like this:

module Gr = Graphics

Gr.open_graph " 640x480"

Gr.fill_circle 320 240 240

read_line ()

Actually this is really useful when you want to import a nested module (modules can be nested inside one another), but you don't want to type out the full path to the nested module name each time.

The sequence operator ;

The semi-colon ; is an operator, just like + is. Well, not quite just like + is, but conceptually the same. The operator + has type int -> int -> int — it takes two ints and returns an int (the sum). The semi-colon ; may be seen as having type unit -> 'b -> 'b — it takes two values and simply returns the second one, the first expression is guaranteed to be evaluated before the second. Rather like C's , (comma) operator. You can write a; b; c; d just as easily as you can write a + b + c + d.

This is one of those "mental leaps" which is never spelled out very well — in OCaml, nearly everything is an expression. if/then/else is an expression. a; b is an expression. match foo with ... is an expression. The following code is perfectly legal (and all do the same thing):

# let f x b y = if b then x + y else x + 0
   let f x b y = x + (if b then y else 0)
   let f x b y = x + (match b with true -> y | false -> 0)
   let f x b y = x + (let g z = function true -> z | false -> 0 in g y b)
   let f = fun x -> fun b -> fun y -> if b then x + y else x + 0
   let f x b y = x + (let _ = y + 3 in (); if b then y else 0);;
val f : int -> bool -> int -> int = <fun>

Note especially the last one — I'm using ; as an operator to "join" two statements. All functions in OCaml can be expressed as:

 let name [parameters] = expression

OCaml's definition of what is an expression is just a little wider than C's. In fact, C has the concept of "statements" — but all of C's statements are just expressions in OCaml of type unit (combined with the ; operator).

The one place that ; is different from + is that I can refer to + just like a function. For instance, I can define a sum_list function, to sum a list of ints, like:

# let sum_list = List.fold_left ( + ) 0;;
val sum_list : int list -> int = <fun>

Putting it all together: some real code

In this section we're going to show some real code fragments from the lablgtk 1.2 library. (Lablgtk is the OCaml interface to the native Unix Gtk widget library). A word of warning: these fragments contain a lot of ideas which we haven't discussed yet. Don't look at the details, look instead at the overall shape of the code, where they used open, how they indented the code, how they used local and global named expressions.

... However, I'll give you some clues so you don't get totally lost!

  • ?foo and ~foo is OCaml's way of doing optional and named arguments to functions. There is no real parallel to this in C-derived languages, but Python and Smalltalk all have this concept that you can name the arguments in a function call, omit some of them, and supply the others in any order you like.
  • foo#bar is a method invocation (calling a method called bar on an object called foo). It's similar to foo->bar or foo.bar in C++ or Java respectively.

First snippet: The programmer opens a couple of standard libraries. They then create a function called file_dialog. Inside this function they define a named expression called sel using a two-line let sel = ... in statement. Then they call several methods on sel.

(* First snippet *)
open StdLabels
open GMain

let file_dialog ~title ~callback ?filename () =
  let sel =
    GWindow.file_selection ~title ~modal:true ?filename ()
  in
    sel#cancel_button#connect#clicked ~callback:sel#destroy;
    sel#ok_button#connect#clicked ~callback:do_ok;
    sel#show ()

Second snippet: Just a long list of global names at the top level. Notice that the author elided every single one of the ;; because of Rule #2.

(* Second snippet *)
let window = GWindow.window ~width:500 ~height:300 ~title:"editor" ()
let vbox = GPack.vbox ~packing:window#add ()

let menubar = GMenu.menu_bar ~packing:vbox#pack ()
let factory = new GMenu.factory menubar
let accel_group = factory#accel_group
let file_menu = factory#add_submenu "File"
let edit_menu = factory#add_submenu "Edit"

let hbox = GPack.hbox ~packing:vbox#add ()
let editor = new editor ~packing:hbox#add ()
let scrollbar = GRange.scrollbar `VERTICAL ~packing:hbox#pack ()

Third snippet: The author imports all the symbols from the GdkKeysyms module. Now we have an unusual let-binding. let _ = expression means "calculate the value of the expression (with all the side-effects that may entail), but throw away the result". In this case, "calculate the value of the expression" means to run Main.main () which is Gtk's main loop, which has the side-effect of popping the window onto the screen and running the whole application. The "result" of Main.main () is insignificant - probably a unit return value, but I haven't checked - and it doesn't get returned until the application finally exits.

Notice in this snippet how we have a long series of essentially procedural commands. This is really a classic imperative program.

(* Third snippet *)
open GdkKeysyms

let _ =
  window#connect#destroy ~callback:Main.quit;
  let factory = new GMenu.factory file_menu ~accel_group in
  factory#add_item "Open..." ~key:_O ~callback:editor#open_file;
  factory#add_item "Save" ~key:_S ~callback:editor#save_file;
  factory#add_item "Save as..." ~callback:editor#save_dialog;
  factory#add_separator ();
  factory#add_item "Quit" ~key:_Q ~callback:window#destroy;
  let factory = new GMenu.factory edit_menu ~accel_group in
  factory#add_item "Copy" ~key:_C ~callback:editor#text#copy_clipboard;
  factory#add_item "Cut" ~key:_X ~callback:editor#text#cut_clipboard;
  factory#add_item "Paste" ~key:_V ~callback:editor#text#paste_clipboard;
  factory#add_separator ();
  factory#add_check_item "Word wrap" ~active:false
    ~callback:editor#text#set_word_wrap;
  factory#add_check_item "Read only" ~active:false
    ~callback:(fun b -> editor#text#set_editable (not b));
  window#add_accel_group accel_group;
  editor#text#event#connect#button_press
    ~callback:(fun ev ->
      let button = GdkEvent.Button.button ev in
      if button = 3 then begin
        file_menu#popup ~button ~time:(GdkEvent.Button.time ev); true
      end else false);
  editor#text#set_vadjustment scrollbar#adjustment;
  window#show ();
  Main.main ()