Saturday, 20 September 2008

What can OCaml do that you can't do in other programming languages?

Depressingly predictable comments on the Haskell story in Slashdot yesterday such as:
"I have *never* seen it being used since. To my mind they both belong in the category 'interesting, but pointless'."
and
"The point is that there's nothing those languages can do that can't be done, often more easily, with the current crop of popular languages. Elegance cannot beat convenience in the workplace, or in most at any rate."
and so on.

Here are some useful things you can do in OCaml which you cannot do in the "current crop of popular languages".

1. Change the language to suit your task.

In our case the task was to parse binary formats.

If you have a few hours to spare, try writing a C program to parse tcpdump files. Don't forget that the endianness in these files is variable so you'll need lots of if (..) { field1 = htonl (field1); ... }. OK so that's a bit hard. Let's say you want to parse a 6 bit length field 'n' followed by an n+1 bit data field (as a 1-64 bit int). Go and write it in C now.

Using our bitstring extension to OCaml, parsing binary structures is really effortless:

let bits = Bitstring.bitstring_of_file "input.data" in
bitmatch bits with
| { n : 6;
data : n+1 } -> data

This is the complete tcpdump parser, just 113 lines of code.

And it's fast too. The resulting code compiles down to machine code and inlines direct calls to C functions at every opportunity, so in practice you can parse data at speeds approaching C.

Another task we had was to check that hundreds of SQL statements in a modest sized web application actually matched with fields in the database, in other words that they didn't reference non-existent fields or treat a string field as an integer and so forth. Doing this by testing is almost impossible because many SQL statements are only used on rare error paths. We could contemplate doing it manually once but not routinely.

So instead we extended the OCaml language to do the checking for us every time we compiled the code. The resulting PG'OCaml project gives you compiled-time checked type-safe access to your database. It's used by mod_caml and ocsigen, or you can just use it in standalone programs. I won't go into detail because Dario Teixeira wrote a great introduction and tutorial to PG'OCaml.

Martin Jambon, author of Micmatch which adds regular expressions directly into the language, has an excellent tutorial. Browse the list of syntax extensions here.

2. Get the compiler to check for errors in your logic

Why can't your compiler check for errors like when you use a read-only database connection in a SQL INSERT statement, or you are supposed to call library function set_word_size because you try to call get_word?

In OCaml, using phantom types [tutorial] you can do exactly this.

3. Drop down to imperative code when you need speed like C

Scripting languages are expressive but slow ... and getting slower. But there's no need to compromise expressiveness to get real speed. Static typed languages with type inference (which excludes C, C++, Java, C# and most others, but includes SML, OCaml and F#) give you the expressiveness to write compact code but without compromising on speed.

OCaml programs are more than 10 - 15 times faster and less memory intensive than Python programs.

More about speed and optimizing OCaml programs. Comparison of a raytracer written in C++ and OCaml.

4. Large well-established standard libraries

OCaml comes with hundreds of packages for many tasks now.

If that's not enough you can directly call out to Perl, Java, Python and .Net [on Windows] libraries. Or you can call C functions directly.

You can compile the same code on Unix, Mac OS X and Windows.

So enough of this "elegance can't beat convenience" stuff please.

Tuesday, 16 September 2008

Tip: Read all lines from a file (the most common OCaml newbie question?)

Is this the most common OCaml beginners question? It comes up every few weeks on the OCaml beginners list, and I have tried to answer it before.

It seems that everyone who learns OCaml comes away with the impression that functional programming is the New Cool Thing, and imperative programming is Bad and Must Be Avoided.

I'm going to say it now: programming fashions are stupid and counterproductive. The only things that matter are that your program is short, easy to write, easy to maintain and works correctly. How you achieve this has nothing to do with programming fads.

Reading all lines from a file is an imperative problem, and the shortest solution (easy to write, easy to maintain and correct1) uses a while loop, in OCaml or any other language:
let lines = ref [] in
let chan = open_in filename in
try
while true; do
lines := input_line chan :: !lines
done; []
with End_of_file ->
close_in chan;
List.rev !lines
Actually, no, I'm lying. The best solution is this:
Std.input_list chan
which is supplied by extlib. Don't bother to duplicate functions which are already provided in commonly available libraries.


1This is only strictly speaking correct if you handle clean-up if input_line throws some read error (exception). In the common case where you just exit the program, leaving the channel open is perfectly acceptable.

Home server, part 3, quick update

As I alluded to on the GLLUG mailing list, I got stuck installing Red Hat Enterprise Linux (RHEL) 5.2 on the Viglen MPC-L because of this bug when installing RHEL or CentOS on a Geode LX. Consequently I trashed the Xubuntu pre-install and managed to make the machine unbootable. The BIOS claims to support all sorts of useful boot methods such as booting from USB flash drives and over the network, but in practice nothing seemed to work for me except booting from the local hard disk.

This means I'll have to remove the hard disk (it's a 2.5" laptop-sized HDD) and transplant it into another machine, and to do this I'm waiting for the correct 2.5" - IDE adaptor cable from Maplin.

I think the most common question people have asked me is how fast the Viglen is, with its AMD Geode LX central processor. When I had Xubuntu on the machine and while I was failing to install RHEL, I played with it and the conclusions are mixed. Firstly it's fair to say that it is not a fast machine, but fast enough to be useful for (eg) light browsing. But I did observe that some specific things are really slow. For example the timezone / graphical map selection during the RHEL install is really slow, but the rest of the RHEL install was perfectly fine. According to Alan Cox the Geode is quite a strange beast. Some pieces of "hardware" are in reality emulated (certain PCI devices for one), so my theory is that when you hit a piece of code which uses these emulated devices in a certain way, suddenly everything slows down.

The true test of this will come when I get RHEL installed and have it running as a server.

Friday, 12 September 2008

Home server, part 2, just photos

I received the Viglen MPC-L which will be part of my home server.

The first, rather unfair impression is "cheap Mac Mini", but to Viglen's credit it is very much cheaper than a Mac, about one quarter of the price, at a mere £80 inc. tax and delivery.



Front and back:





The full set is shown below. I was surprised that Viglen even bothered supplying a mouse and keyboard, since they don't supply the obvious missing bits:



... but what about all this waste? Viglen could do better to reduce this:

Wednesday, 10 September 2008

MinGW - build Windows binaries from the comfort of your Fedora

MinGW is a Windows C/C++ compiler and toolchain based on gcc and binutils which runs on Linux, but generates Windows libraries (*.dll) and executables (*.exe). This means that when a customer asks you to write a Windows program, you can do it without leaving the comfort of your familiar Linux environment.

Today the Fedora MinGW Special Interest Group made a milestone release of the compiler toolchain and many basic libraries including Gtk and all its dependencies.

These aren't part of Fedora proper yet, but you can grab source RPMs and x86-64 binaries from my website here:

http://www.annexia.org/tmp/mingw/
(Still uploading as I write this)

If you prefer you can clone our Mercurial development repository. (Read the README file)

The current draft packaging guidelines may look complicated but if you start from our example spec file (mingw-example.spec) then that takes care of most packaging scenarios and dependency generation automatically.

Tuesday, 9 September 2008

Home server, part 1

This is going to be a series about how I built a home network storage server for all my photos, music, videos, software etc.

I now have many computers at home, no central storage for them, and an incoherent backup regime. I've been planning a home NAS (network attached storage) server for some time.

The basic requirements are that I have at least 1 terabyte of usable storage, it must be available on the network for everyone in the house to use, and it must be regularly backed up. Additionally I'm planning that it will use RAID to give it some resilience, it should be easily and cheaply upgradable, and it will be fully encrypted to prevent catastrophic loss in case of theft.

Now the architecture is a bit unusual. Dedicated NAS systems are quite expensive, so instead I'm planning to use USB 2.0 external drives attached to a tiny Viglen MPC-L computer. This setup sacrifices some performance for an exceptionally low price (but then again, performance isn't too important when the only way to access it will be over a wireless network).

The Viglen MPC-L is a small, cheap, AMD Geode-based Linux box, akin to a Mac Mini (but considerably cheaper at £80 inc. tax and delivery). I also bought three 500 GB USB drives. The total cost (inc. delivery and UK sales taxes) is £260 (approx. US $400). That doesn't yet include backup which will be another 1 TB USB drive for around another $150-200.

Here are the three drives. Externally these are rather elegant FreeCom cases, and internally they are standard Samsung drives:



Here I'm formatting and testing the drives using my laptop. I'll cover the precise setup in a future article:

Sunday, 7 September 2008

Writing printf-like functions

Here's a quick and useful tip: failwith is the standard function to raise a general error, but it's a little bit clumsy to use because it only takes a fixed string.
if temp >= 100 then
failwith "we've reached boiling point"
If you want to have the message contain useful debugging information, you need to use sprintf to generate the fixed string, like:
if temp >= 100 then
failwith (sprintf "%d degC: boiling point reached" temp)
(I'm assuming here that you have open Printf at the top of your file, something which you should almost always do so you don't need to write Printf.sprintf all the time,).

With this simple tip we can turn failwith into a function that automatically takes a printf-like format string, and we can learn a little bit about the arcana of polymorphic types too.

First of all, here is the code:
let failwith format = ksprintf failwith format
You can see in the toplevel that it works:
# let failwith format = ksprintf failwith format ;;
val failwith : ('a, unit, string, 'b) format4 -> 'a = <fun>
# failwith "hello, %s" "world" ;;
Exception: Failure "hello, world".
# failwith "error code %d" 3 ;;
Exception: Failure "error code 3".
ksprintf is the key function here. Like sprintf it takes a format string and a variable number of parameters, and makes a fixed result string. Unlike sprintf it doesn't return the string, but passes it to the function which is its first parameter — in this case, the standard failwith function. So ksprintf is useful because it can turn almost any fixed string function into a printf-like function.

Now how about the lesson on type arcana? Well if you know anything about currying you might think that you could write the new failwith function even shorter, like this:
let failwith = ksprintf failwith
If you try this, you'll find the new function works some of the time, but fails to type-check at other times. In fact, the first time you use it, it seems to "remember" the type of all the arguments, and then refuses to work if any of those types change:
# failwith "hello, %s" "world" ;;
Exception: Failure "hello, world".
# failwith "error code %d" 3 ;;
This expression has type (int -> 'a, 'b, 'c, 'd, 'd, 'a) format6
but is here used with type
(string -> 'e, unit, string, 'e) format4 =
(string -> 'e, unit, string, string, string, 'e) format6
If we take a close look at the inferred types of the wrong definition, we can see why:
# let failwith = ksprintf failwith ;;
val failwith : ('_a, unit, string, '_b) format4 -> '_a = <fun>
'_a (with an underscore) is not a polymorphic type, but a single type that the compiler just hasn't been able to infer fully yet. As soon as you give it more information (eg. calling the function), the compiler infers that type into some concrete type (like string -> ... above) and won't let you change it later.

A more advanced question is to work out why type inference fails to infer the more general polymorphic type. I suspect this FAQ may have the answer.