Saturday 20 September 2008

What can OCaml do that you can't do in other programming languages?

Yesterday's Haskell story on Slashdot drew depressingly predictable comments, such as:
"I have *never* seen it being used since. To my mind they both belong in the category 'interesting, but pointless'."
and
"The point is that there's nothing those languages can do that can't be done, often more easily, with the current crop of popular languages. Elegance cannot beat convenience in the workplace, or in most at any rate."
and so on.

Here are some useful things you can do in OCaml which you cannot do in the "current crop of popular languages".

1. Change the language to suit your task.

In our case the task was to parse binary formats.

If you have a few hours to spare, try writing a C program to parse tcpdump files. Don't forget that the endianness in these files is variable so you'll need lots of if (..) { field1 = htonl (field1); ... }. OK so that's a bit hard. Let's say you want to parse a 6 bit length field 'n' followed by an n+1 bit data field (as a 1-64 bit int). Go and write it in C now.
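To give a flavour of the manual work involved, here is a rough sketch of the 6 bit length example written by hand in plain OCaml, without any syntax extension. The names get_bits and parse_field are made up for illustration and are not part of any library, and the sketch assumes bits are numbered most significant first.

```ocaml
(* Hypothetical helper: read [len] bits (1..64) starting at bit offset [off],
   most significant bit first (an assumption about the format). *)
let get_bits (buf : string) (off : int) (len : int) : int64 =
  let result = ref 0L in
  for i = off to off + len - 1 do
    let byte = Char.code buf.[i / 8] in
    let bit = (byte lsr (7 - (i mod 8))) land 1 in
    result := Int64.logor (Int64.shift_left !result 1) (Int64.of_int bit)
  done;
  !result

(* Hypothetical parser: a 6 bit length field n, then an (n+1) bit data field.
   Returns None when the buffer is too short. *)
let parse_field (buf : string) : int64 option =
  let total_bits = String.length buf * 8 in
  if total_bits < 6 then None
  else
    let n = Int64.to_int (get_bits buf 0 6) in
    if total_bits < 6 + n + 1 then None
    else Some (get_bits buf 6 (n + 1))
```

Even this small case needs explicit bit offsets, shifts and bounds checks, and that is before any endianness handling.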

Using our bitstring extension to OCaml, parsing binary structures is really effortless:

let bits = Bitstring.bitstring_of_file "input.data" in
bitmatch bits with
| { n : 6;
    data : n+1 } -> data

This is the complete tcpdump parser, just 113 lines of code.

And it's fast too. The resulting code compiles down to machine code and inlines direct calls to C functions at every opportunity, so in practice you can parse data at speeds approaching C.

Another task we had was to check that hundreds of SQL statements in a modest sized web application actually matched with fields in the database, in other words that they didn't reference non-existent fields or treat a string field as an integer and so forth. Doing this by testing is almost impossible because many SQL statements are only used on rare error paths. We could contemplate doing it manually once but not routinely.

So instead we extended the OCaml language to do the checking for us every time we compiled the code. The resulting PG'OCaml project gives you compile-time checked, type-safe access to your database. It's used by mod_caml and ocsigen, or you can just use it in standalone programs. I won't go into detail because Dario Teixeira wrote a great introduction and tutorial to PG'OCaml.

Martin Jambon, author of Micmatch which adds regular expressions directly into the language, has an excellent tutorial. Browse the list of syntax extensions here.

2. Get the compiler to check for errors in your logic

Why can't your compiler catch errors such as using a read-only database connection in a SQL INSERT statement, or calling get_word before you have called the library function set_word_size as you are supposed to?

In OCaml, using phantom types [tutorial] you can do exactly this.
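Here is a minimal sketch of the read-only connection case. The Db module and its functions are hypothetical and only build strings instead of talking to a real database; the point is the extra type parameter on the connection type, which exists only at compile time.

```ocaml
(* Hypothetical module, for illustration only. *)
module Db : sig
  type ro                       (* marker: read-only *)
  type rw                       (* marker: read-write *)
  type 'a t                     (* 'a is a phantom parameter *)
  val connect_ro : string -> ro t
  val connect_rw : string -> rw t
  val select : 'a t -> string -> string   (* allowed on any connection *)
  val insert : rw t -> string -> string   (* read-write connections only *)
end = struct
  type ro
  type rw
  type 'a t = { name : string }  (* 'a never appears in the representation *)
  let connect_ro name = { name }
  let connect_rw name = { name }
  let select conn query = Printf.sprintf "[%s] SELECT: %s" conn.name query
  let insert conn stmt = Printf.sprintf "[%s] INSERT: %s" conn.name stmt
end

let reader = Db.connect_ro "replica"
let writer = Db.connect_rw "primary"

let rows = Db.select reader "select name from users"
let done_ = Db.insert writer "insert into users (name) values ('rich')"

(* The next line does not compile, because [reader] has type [Db.ro Db.t]
   but [Db.insert] expects a [Db.rw Db.t]:

   let oops = Db.insert reader "insert into users (name) values ('x')"
*)
```

The markers ro and rw cost nothing at run time. Both kinds of connection share the same representation, and the distinction lives purely in the types.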

3. Drop down to imperative code when you need speed like C

Scripting languages are expressive but slow ... and getting slower. But there's no need to compromise expressiveness to get real speed. Statically typed languages with type inference (which excludes C, C++, Java, C# and most others, but includes SML, OCaml and F#) give you the expressiveness to write compact code without compromising on speed.
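As a small illustration (a sketch, not a benchmark), here is the same dot product written as an imperative loop and in a functional style. The function names are made up, and both versions assume the two arrays have the same length. Neither needs a type annotation: the compiler infers float array -> float array -> float for both.

```ocaml
(* Imperative version: a mutable accumulator and a plain for loop. *)
let dot a b =
  let acc = ref 0.0 in
  for i = 0 to Array.length a - 1 do
    acc := !acc +. a.(i) *. b.(i)
  done;
  !acc

(* Functional version of the same computation. *)
let dot_functional a b =
  Array.fold_left ( +. ) 0.0 (Array.map2 ( *. ) a b)
```

You can start with whichever form is clearer and switch to the loop only in the places where profiling says it matters.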

OCaml programs are 10 to 15 times faster, or more, than equivalent Python programs, and use less memory.

More about speed and optimizing OCaml programs. Comparison of a raytracer written in C++ and OCaml.

4. Large well-established standard libraries

OCaml comes with hundreds of packages for many tasks now.

If that's not enough you can directly call out to Perl, Java, Python and .Net [on Windows] libraries. Or you can call C functions directly.

You can compile the same code on Unix, Mac OS X and Windows.

So enough of this "elegance can't beat convenience" stuff please.

9 comments:

gaius said...

There may be libraries for all sorts of tasks - but they have to be relevant tasks to the working programmer. I'd be using OCaml in production today if there were rock-solid native bindings to Oracle. There's a hack that lets you piggyback on Perl's DBI and another library that won't even compile on any of my Linux or Solaris boxes and that's it. My choices are a) write my own or b) just use Python. I have real work to do, so b wins hands down.

There's no mystery - from outside the academic world - why OCaml, Haskell, et al aren't popular. It's not that FP is "too hard". It's that the language developers simply aren't interested in the same things as commercial users. And that's fine, it's their language after all. But that's also the real reason.

Richard Jones said...

Oracaml is an OCaml interface to Oracle:

http://sourceforge.net/projects/oracaml

http://groups.google.com/group/fa.caml/browse_thread/thread/0b4a5caed6cf71af/1d2244da3af86fa7

Richard Jones said...

I should probably add to the preceding that I have done a huge amount of work to make OCaml useful for commercial users, but we tend to concentrate on things which make free software solutions better.

Oracle is a proprietary product, almost impossible to install on free systems, and any effort is going towards making the free solutions better (ie. PostgreSQL, MySQL) before trying to benefit proprietary ones.

This is nothing to do with the supposed "academic world" (mainly a myth) that is falsely associated with some functional languages.

Anonymous said...

Let me preface by saying I don't write OCaml myself (not because of anything it doesn't have, I just am not using it at the moment). I do want to point out that the arguments generally made against it are really similar to the arguments C guys had against C++, C++ guys had against Java and the Java guys today have against Ruby and Python. OCaml is completely ready to be used in most if not all environments ... it's just bright, scary and shiny to those who have to really work hard to learn new languages.

@gaius I'm a HUGE (weight and love of the language) Python fan myself ... and programming languages are tools themselves ... you use the best language for the job at hand ... sometimes it's Python, sometimes it's Ruby, sometimes it's Erlang, sometimes it's C, etc. Honestly the same argument can be made against Python ... the Python community is too focused on readability and style, which is not something most commercial entities care about. On the flip side Python isn't as fast as $LOWERLEVELLANG, which a lot of commercial entities do care about. Are the arguments really valid? I personally don't think so.

Jengu said...

Only the first two items in your list are found in OCaml but not traditional languages. Most scripting languages nowadays support dropping into C (e.g. Perl, Python, Lua, TCL) and the popular languages have a far greater number of established libraries than OCaml (e.g. Java, C++, C#).

Richard Jones said...

Jengu: The point of item 3 is that you don't need to drop into C to get the performance. On item 4, the libraries are there, and if any are missing you just call out to libraries written in the other languages.

xk0der said...

Thanks for a wonderful post.

I don't really get why people get scared of newer programming languages (or languages they don't know).

Even before trying language A out, to solve their "commercial" problems, they start whining.

Pramod said...

What's a good book on OCaml? One of the advantages of programming in, say, Java, is that you can find a "Learn Java in 30 days" at every street corner. I have a theory that the "hot" and "new" programming languages are partly decided by well-written books. For instance, the Ruby explosion happened around the time people started writing all these books about Rails.

Anyway, great post. You've motivated me to do some serious programming in OCaml as soon as this semester ends.

Richard Jones said...

Yes, I agree about the books.

I offered to write one for O'Reilly a few years back, but they turned me down. Instead I wrote the OCaml Tutorial which is moderately popular.

Don't get Practical OCaml. It's terrible.

Jason Hickey is publishing a book through Cambridge University Press soon. There is a draft version on his website.