Monday, 26 May 2008

Phantom types

In libvirt we have a concept of a connection to the hypervisor, and every libvirt operation happens across this connection. Connections are normally read/write, which means that you can carry out any operation, even destructive ones like shutting down a virtual machine. But since programs normally need to have root (or "elevated") privileges to do these destructive operations, we also introduced the concept of a read-only connection which can just be used for non-destructive things such as listing out virtual machines or monitoring CPU loads.








Read-only operations Read/write operations

get hypervisor type
get version
get hostname
get URI
get num CPUs
list VMs
list networks
get VM CPU stats

suspend VM
resume VM
shutdown VM
destroy VM
coredump VM
create VM
change CPU pinning


The question is, what happens if a program tries to call a destructive operation using just a read-only connection? The answer, naturally, is that libvirt throws a runtime error if you try this.

Wouldn't it be nicer if the compiler could find these sorts of errors? That way you could be sure that your compiled program was completely correct, and there wasn't some sort of dark corner of untested code which might fail in rare circumstances. For the OCaml bindings to libvirt that was exactly the sort of guarantee I wanted to make, and to do it I used a language feature which is often thought of as being confusing or highly expert (but in fact is neither of those) ... phantom types.

Before we can understand phantom types, we need to set up a little bit of background: firstly asking what sort of errors is it conceivable for the compiler to find, and secondly understanding a bit more about polymorphic types and variants (which are the toolbox we use to construct phantom types).

Compile time or run time?


To ask if phantom types are useful we need to know first if they are applicable to our problem. Phantom types find only compile time errors. Let's understand what a compile time error could be in libvirt:

let conn = Libvirt.Connect.connect_readonly ()
let dom = Libvirt.Domain.lookup_by_name conn "test"
...
Libvirt.Domain.destroy dom (* fail! *)

Clearly here the dom ("domain" = virtual machine) is derived from a read-only connection (conn), so the call to destroy must fail. It is also clear that the compiler can trivially know that conn / dom are read-only, because of the special connect_readonly call.

Contrast that with the following code:

printf "Connect read-only?" ;;
let readonly_flag = read_line () = "y"
let conn = Libvirt.Connect.open readonly_flag
(* etc. *)

Now the compiler cannot possibly predict what the user will type in at runtime, so this is not a situation where the compiler or phantom types can help. (The Libvirt.Connect.open call is one I just made up, it's not a real call precisely because of this problem).

So libvirt has two separate calls to create connections, connect_readonly which makes read-only connections and connect which makes read/write connections. The connections returned have different but compatible types, and because of this "compatibility" the connections can still be passed to the common non-destructive functions. Even more amazingly, the extra "is read/write" flag which is attached to the connection simply disappears at runtime, which means there is zero loss of efficiency when using phantom types (I'll explain a bit more about that below).

It's important to keep a very clear distinction in your head about what happens at compile time versus what happens at run time.

Mysterious polymorphic types


You may be familiar with polymorphic types such as 'a list, lists of any type, where 'a (pronounced alpha) stands for all types. Here's another example of a polymorphic type, a structure which contains a "free" field that can store any particular type:

type 'a t = { data : 'a }

We can make a structure containing string data for example. Notice its type:

# { data = "hello" } ;;
- : string t = {data = "hello"}


What is perhaps not so well known is that you can add extra polymorphism to any type you like, even when it doesn't seem to need it.

This just makes t an alias for the floating point type:

type t = float

But the following (legal) definition is much more mysterious:

type 'a t = float


What does it do, and how can you create anything of this type? The answer is that any float has this type, as you can prove very easily in the OCaml toplevel:

# (3.0 : unit t);;
- : unit t = 3.
# (10.4 : string t);;
- : string t = 10.4

Because the 'a isn't needed, it can be set to any type (unit and string in the examples above).

This isn't yet the "phantom" type though. It's tempting to think we could write a function which only works on string t (I'll call them "stringies"):

# let add_stringies (a : string t) (b : string t) = (a +. b : string t) ;;
val add_stringies : string t -> string t -> string t = <fun>

But in fact this function does not work!

# add_stringies (3.0 : unit t) 5.0 ;;
- : string t = 8.

This is because unit t and string t can be freely unified with each other because the compiler knows that both are really just floats:

# ((3.0 : unit t) : string t) ;;
- : string t = 3.


To prevent this "rogue" unification and allow us to write a function like add_stringies correctly, we have to hide the real type of t inside a module, like this:

module T : sig
type 'a t
val add_stringies : string t -> string t -> string t
end = struct
type 'a t = float
let add_stringies a b = a +. b
end

Module T is completely useless by the way, it's just setting the stage for ...

Phantom types


OCaml might have saved NASA from losing a $125 million space probe if only they'd had the following Length module which prevents programmers from mixing imperial and metric measurements:

module Length : sig
type 'a t
val meters : float -> [`Meters] t
val feet : float -> [`Feet] t
val (+.) : 'a t -> 'a t -> 'a t
val to_float : 'a t -> float
end = struct
type 'a t = float
external meters : float -> [`Meters] t = "%identity"
external feet : float -> [`Feet] t = "%identity"
let (+.) = (+.)
external to_float : 'a t -> float = "%identity"
end

open Length
open Printf

let () =
let m1 = meters 10. in
let m2 = meters 20. in
printf "10m + 20m = %g\n" (to_float (m1 +. m2));
let f1 = feet 40. in
let f2 = feet 50. in
printf "40ft + 50ft = %g\n" (to_float (f1 +. f2));
(*printf "10m + 50ft = %g\n" (to_float (m1 +. f2)) (* error *) *)

(Try compiling and running this, then try uncommenting the very last line and compiling it again).

Just for readability, we're using polymorphic variant types instead of the basic types (unit and string from the previous section). This allows us to substitute meaningful type names like [`Meters] t which I hope is obvious as to what it contains. It also means the error messages from the compiler will be easy to read - that's important because we are expecting to get a compiler error each time the programmer makes a mistake.

Now look at the functions we have:

Two functions (meters and feet) convert floating point numbers into meters or feet respectively. But the implementation of these functions is completely efficient. They're just the identity operation (which the compiler turns into a null operation). At compile time, the values have this extra type information. But at run time the overhead evaporates completely. At run time, these are just floats.

The addition function is constrained: 'a t -> 'a t -> 'a t. This means you can use it on two meter measurements, or two feet measurements, but you cannot mix meters and feet. Furthermore the return type is the same as the input types, so this safety cascades through all code.

Finally we need a way to extract and print the results. I've included a rather naive to_float function, but for better safety we'd probably want to define special print functions which ensure that the output indicates the correct type back to the user.

Let's go back to our read-only/-write connections from the beginning. Remember that simple non-destructive status functions are safe for anyone to call, but destructive functions must only be used when you have a read/write connection. To express this we will use a slightly different feature of polymorphic variants, that is being able to express open types using [>...]. A read-only connection will have type [`Readonly] t and a read/write connection will have type [`Readonly|`Readwrite] t which means that it's compatible with the read-only type but has the extra read/write ability.

Status functions have type : [>`Readonly] t -> ... because they work with "read-only or greater".

Here is our module:

module Connection : sig
type 'a t
val connect_readonly : unit -> [`Readonly] t
val connect : unit -> [`Readonly|`Readwrite] t
val status : [>`Readonly] t -> int
val destroy : [>`Readwrite] t -> unit
end = struct
type 'a t = int
let count = ref 0
let connect_readonly () = incr count; !count
let connect () = incr count; !count
let status c = c
let destroy c = ()
end

open Connection
open Printf

let () =
let conn = connect_readonly () in
printf "status = %d\n" (status conn);
(*destroy conn; (* error *) *)
let conn = connect () in
printf "status = %d\n" (status conn);
destroy conn

Again, after compiling this, try uncommenting the rogue call to destroy and notice that the error is caught by the compiler.

Here's another example, again it comes from actual deployed code. A "memory map" is a view into the memory of a virtual machine. When we first load a memory map we don't know anything about the endianness or word size of the virtual machine. We have to inspect the memory map first to determine endianness (little or big endian) and word size (32 or 64 bit pointers). Once we've determined both, we offer additional operations which work on the integers and pointers contained in the memory map.

So there are four "subtypes" (states?) of memory map, summarized in the diagram on the left.

Can we use phantom types to ensure that the transitions happen before specific calls are allowed? Yes we can!

We start by defining our phantom type. This is how it appears in the module signature (the mli file). The specifics of the implementation (from the ml file) aren't important here. Note that because there are two degrees of freedom (word size and endianness), there are two phantom types attached to t:

type ('a,'b) t

The basic of_file function makes a memory map from a file descriptor and base address. It returns a memory map with no word size or endianness, which is pretty clearly expressed in the return type:

val of_file : Unix.file_descr -> addr -> ([`NoWordsize], [`NoEndian]) t

Some functions work with any memory map, whether or not word size and endianness have been set. For example, the find function searches for strings in the memory map:

val find : ('a, 'b) t -> ?start:addr -> string -> addr option


A related function is find_align which finds strings that are aligned to the word size. This function cares that word size has been set, but not endianness, and its type is therefore:

val find_align : ([`Wordsize], 'b) t -> ?start:addr -> string -> addr option


The find_pointer function looks for pointers appearing in the memory map. Pointers have both endianness and word size implicitly, so this function can only be used when both have been set on the memory map. Its type is:

val find_pointer : ([`Wordsize], [`Endian]) t -> ?start:addr -> addr ->
addr option


Finally of course we need a way to supply the word size and endianness to the memory map. Detecting these is a difficult and drawn-out process, so our main code will set each one individually by calling the following two functions:

val set_wordsize : ([`NoWordsize], 'b) t -> wordsize ->
([`Wordsize], 'b) t
val set_endian : ('a, [`NoEndian]) t -> endian ->
('a, [`Endian]) t

The compiler guarantees here are quite strict. Although the programmer can call the functions in either order, they must call each function only once. And of course the programmer won't be allowed to call functions like find_pointer until they have called both set_wordsize and set_endian exactly once (although the order they call the functions doesn't matter).

Personally I think this is a pretty awesome language feature. I'm not sure if any other non-functional languages can make these sorts of guarantees at compile time. (Perhaps Eiffel?)

One interesting thing is that the implementation of the memory map still has runtime checks in those functions (such as find_pointer) which need it. But you can be sure that the runtime checks will never fail. An improvement to the code would be to write it so it doesn't need any runtime checks at all.

Runtime and phantom types



My final example with phantom types will be to explain how to add runtime checks back in. Some situations simply cannot be checked at compile time, yet don't represent errors. For example in mlvirsh users can open read-only or read/write connections to the hypervisor. We deal with this by adding a limited amount of dynamic typing back into the system:

type conn_t =
| No_connection
| RO of Libvirt.ro Libvirt.Connect.t
| RW of Libvirt.rw Libvirt.Connect.t

Of course we need to also add runtime checks into the program.

I will argue that this is an acceptable trade-off and still much better than languages which only do dynamic typing. We get the benefits of static types / early compiler errors when it's possible to check for them, but we can slip back to dynamic typing in the situations where it's necessary. Out of the entire suite of virt tools, just one (mlvirsh) needs runtime types.

1 comment:

Unknown said...

Thanks Richard. A very nice and interesting post! I was pondering about doing the same kind of thing in my own program and your post is just at the right timing! :-)