Saturday, 30 August 2008

Sharing a global variable between C and OCaml

I was asked today if it is possible to share a global variable between C and OCaml. This was my first response:

OCaml cannot just directly access C globals. At best you'd need to have a function that returns the address of the C global, _and_ the C global would need to be in a form that OCaml code could understand although that is pretty easy to arrange.
In a followup, I gave some example code:

------------------------------------------------------------ test_c.c
/* Variable shared between C and OCaml. */

#include <caml/mlvalues.h>

/* On the OCaml side, this will be made to look like a structure
* containing a single int (well, provided you don't look *too*
* closely at it).
value shared_var = Val_int (0);

get_struct_addr (value unitv)
/* Return the address of the 'structure'. */
return ((value) &shared_var);

/* Increment the shared variable. */
increment_it (value unitv)
int i = Int_val (shared_var);
shared_var = Val_int (i);
return (Val_unit);

(* Variable shared between C and OCaml. *)

type var = {
shared_var : int;

external get_struct_addr : unit -> var = "get_struct_addr" "noalloc"
external increment_it : unit -> unit = "increment_it" "noalloc"

let var = get_struct_addr () ;;

while true do
Printf.printf "value of the variable now is %d\n%!" var.shared_var;
increment_it ();
(* OCaml isn't expecting that increment_it modifies a variable, so
* there is no guarantee that we will see the changed value next
* time around.
Unix.sleep 1;

$ gcc -I /usr/lib/ocaml -c test_c.c
$ ocamlopt -c
$ ocamlopt unix.cmxa test_c.o test.cmx -o test
$ ./test
value of the variable now is 0
value of the variable now is 1
value of the variable now is 2
value of the variable now is 3
value of the variable now is 4

Before you use this code, be aware that it's a real hack.

Thursday, 21 August 2008

Tip: Calling C functions directly with "noalloc"

In OCaml you can call a C function by writing:

external call_1 : float -> float = "call_1"
then just using call_1. However these calls are not direct. They go via an OCaml runtime function called caml_c_call. This is a tiny bit of assembler, so the overhead isn't large, but it does use a computed jump which on many processors is quite slow.

Luckily this indirection is only needed in order to set up the garbage collector. If your C function won't perform any OCaml allocations, then you don't need this, and you can tell OCaml to jump directly to your C function like this:

external call_2 : float -> float = "call_2" "noalloc"

Let's compare the generated assembly code for the calls in both cases:

Normal "noalloc"

pushl %eax pushl %eax
movl $call_1, %eax call call_2
call caml_c_call addl $4, %esp
addl $4, %esp
movl (%esp), %edx
movl %edx, G(caml_last_return_address)
leal 4(%esp), %edx
movl %edx, G(caml_bottom_of_stack)
jmp *%eax

As you can see, the "noalloc" version is much shorter.

Tuesday, 19 August 2008

Just draw something on the f-ing screen

I don't believe that computers have got better over the past 40 years.

Case in point: I've spent at least 2 hours trying to debug a Gtk program which is supposed to plot some dots on the screen. Like this sort of thing in barely remembered ZX Spectrum BASIC:

10 FOR F = 0 TO 2*PI STEP 0.1
20 PLOT SIN(F)*200+100, COS(F)*200+100

The Gtk program is 20 times longer than this. And refuses to draw anything except a black window.

Computers have got worse in many ways since my first computer.


My angry late-night programming rant makes it to reddit.

Ob-awesome Wikipedia page of the week: List of 8 bit computer hardware palettes.

Saturday, 16 August 2008

3 things that will confuse you when reading functional programs

Here are 3 things that will confuse you when you read a program written in a functional language. They're just small stylistic differences compared to more mainstream languages, and once you understand them you'll be able to read the code much more easily.

1. Function calls

Calls to functions are written differently, without any brackets or commas.

In C you would write:

printf ("hello %s\n", name);

but in OCaml the same function call would be written like this:

printf "hello %s\n" name;

What's the difference? In functional languages you don't put brackets around the arguments, and you just put spaces between the function name and the arguments.

Why is it confusing? It doesn't look like a function call (unless you're used to this style). Functions are very common in functional programming, not surprising really, and they are often given short names, so you'll see plenty of code like this:

f (g a b) c

Just take it one step at a time and remember that g a b is a function call (in C it would be written as g (a, b)), and that f (...) c is a function call with two parameters (in C it would be written as f (g (a, b), c).)

Why is it done like this? This syntax is better because it's shorter. Functions and function calls are very common in functional programming languages, so we need to use the shortest possible syntax, which is this one.

2. Bindings are not variables

let x = foo

x is not a variable. It's just a name which refers to foo, and there is no way to change its value. The technical term is a let-binding.

You can create another name x with a different value, but that doesn't change the original x or foo:

let x = foo in
let x = bar in

One consequence of this is that the following code doesn't do what you think it does:

let quit = false in
while not quit do
let line = read_line () in
if line = "q" then let quit = true in ();
print_endline line

In fact this loop never exits. Why? Because the inner quit is just a different label from the outer one. In this case you would get a compiler warning because the inner quit label is never used.

What's the difference? Let bindings make labels, not variables.

Why is it confusing? Variables do exist in some functional languages, particularly ones based on ML like OCaml, but they aren't used very much. Most code you look at will use only let bindings, and you shouldn't confuse those with variables.

Why is it done like this? This encourages the use of immutable data structures, which is a giant topic in itself. In brief, immutable data structures make programming errors less likely because they remove the "who owns that data" problem that imperative languages have. (It's fair to say that immutable data structures also have disadvantages, which is why OCaml lets you drop down to mutable data when you need it).

3. Function types use lots of '->' (arrows)

To write the type of a function, you use an arrow notation. The parameters and the return type are separated by arrows like this:

val average : float -> float -> float

This means that there is a function called average which takes two parameters, both floating point numbers, and returns a floating pointer number.

Here are some more examples:

val print_string : string -> unit

(Takes one parameter, a string, and returns nothing -- unit is like void in C).

val int_of_string : string -> int

(Takes one parameter, a string, and returns an int).

val open_out_gen : open_flag list -> int -> string -> out_channel

(Takes three parameters: a list of flags, an integer and a string. Returns an output channel).

What's the difference? This syntax is common in functional programming, and almost completely unknown outside of it.

Why is it confusing? The parameters and the return type aren't separated from each other.

Why is it done like this? The reason is to do with a mathematical concept called currying. The practical reason is that functional languages let you generate new functions by partially applying some arguments. Thus:

add : int -> int -> int
(add 42) : int -> int
(add 42 2) : int

The first is the general adding function. The second is a partially applied function which adds 42 to any int. The third is the number 44.

Sunday, 10 August 2008

virt-mem 0.2.9

I'm pleased to announce the latest alpha release of the virt-mem tools, version 0.2.9.

These are tools for system administrators which let you find things like kernel messages, process lists and network information of your guests.

For example:

'uname' command, shows OS version, architecture, etc.
'dmesg' command, shows kernel messages
'ps' command, shows process list

Nothing needs to be installed in the guest for this to work, and the tools are specifically designed to allow easy scripting and integration with databases and monitoring systems.

Source is available from the web page here:

The latest version (0.2.9) reworks the internals substantially so that we have direct access to basically any kernel structure, and this will allow us to quickly add the remaining features that people have asked for (memory usage information, lists of network interfaces and so on).

As usual, patches, feedback, suggestions etc. are very welcome!

Binaries are available for Fedora through this link: