Saturday, 16 August 2008

3 things that will confuse you when reading functional programs

Here are 3 things that will confuse you when you read a program written in a functional language. They're just small stylistic differences compared to more mainstream languages, and once you understand them you'll be able to read the code much more easily.

1. Function calls


Calls to functions are written differently, without any brackets or commas.

In C you would write:

printf ("hello %s\n", name);

but in OCaml the same function call would be written like this:

printf "hello %s\n" name;


What's the difference? In functional languages you don't put brackets around the arguments, and you just put spaces between the function name and the arguments.

Why is it confusing? It doesn't look like a function call (unless you're used to this style). Functions are very common in functional programming, not surprising really, and they are often given short names, so you'll see plenty of code like this:

f (g a b) c

Just take it one step at a time and remember that g a b is a function call (in C it would be written as g (a, b)), and that f (...) c is a function call with two parameters (in C it would be written as f (g (a, b), c).)

Why is it done like this? This syntax is better because it's shorter. Functions and function calls are very common in functional programming languages, so we need to use the shortest possible syntax, which is this one.

2. Bindings are not variables


let x = foo

x is not a variable. It's just a name which refers to foo, and there is no way to change its value. The technical term is a let-binding.

You can create another name x with a different value, but that doesn't change the original x or foo:

let x = foo in
let x = bar in
...

One consequence of this is that the following code doesn't do what you think it does:

let quit = false in
while not quit do
let line = read_line () in
if line = "q" then let quit = true in ();
print_endline line
done

In fact this loop never exits. Why? Because the inner quit is just a different label from the outer one. In this case you would get a compiler warning because the inner quit label is never used.

What's the difference? Let bindings make labels, not variables.

Why is it confusing? Variables do exist in some functional languages, particularly ones based on ML like OCaml, but they aren't used very much. Most code you look at will use only let bindings, and you shouldn't confuse those with variables.

Why is it done like this? This encourages the use of immutable data structures, which is a giant topic in itself. In brief, immutable data structures make programming errors less likely because they remove the "who owns that data" problem that imperative languages have. (It's fair to say that immutable data structures also have disadvantages, which is why OCaml lets you drop down to mutable data when you need it).

3. Function types use lots of '->' (arrows)


To write the type of a function, you use an arrow notation. The parameters and the return type are separated by arrows like this:

val average : float -> float -> float

This means that there is a function called average which takes two parameters, both floating point numbers, and returns a floating pointer number.

Here are some more examples:

val print_string : string -> unit

(Takes one parameter, a string, and returns nothing -- unit is like void in C).

val int_of_string : string -> int

(Takes one parameter, a string, and returns an int).

val open_out_gen : open_flag list -> int -> string -> out_channel

(Takes three parameters: a list of flags, an integer and a string. Returns an output channel).

What's the difference? This syntax is common in functional programming, and almost completely unknown outside of it.

Why is it confusing? The parameters and the return type aren't separated from each other.

Why is it done like this? The reason is to do with a mathematical concept called currying. The practical reason is that functional languages let you generate new functions by partially applying some arguments. Thus:

add : int -> int -> int
(add 42) : int -> int
(add 42 2) : int

The first is the general adding function. The second is a partially applied function which adds 42 to any int. The third is the number 44.

4 comments:

warren said...

One of the biggest things I've seen new users trip on is that you have to know a lot about operator precedence to understand what programs mean. E.g. is "fun () -> (), fun () -> ()" of type "(unit->unit)*(unit->unit)", or "unit->unit*(unit->unit)" (and what's the precedence of * vs ->)? I often trip on this myself when reading academic papers on the subject.

Richard Jones said...

To be honest Warren, I wouldn't know immediately what 'fun () -> (), fun () -> ()' means.

Typing it into the toplevel shows the true type, but it's still rather obscure to me:

# fun () -> (), fun () -> () ;;
- : unit -> unit * (unit -> unit) = <fun>

If this was real code, I'd tell people to fully bracket the expression and give some explanatory comment.

warren said...

I'm sorry my example was so over-simplified that it obscured my point... which was that I've found that the first thing that confuses new/potential users about functional programs is the immediate need to understand syntactic precedence of operators and expressions in general.

Unlike familiar languages like c and java, programs in functional languages like ocaml and haskell allow (even encourage) users to leave off many visual bracketing cues (parens) that are unnecessary with respect to precedence relations in the grammar. This raises the bar for new users who must understand this precedence to understand how the program will be understood by the compiler.

List processing code often exhibits this need to understand precedence. Consider this line from the pairwise-mapping function map2 in ocaml:

| a1 :: l1, a2 :: l2 -> let r = f a1 a2 in r :: map2 f l1 l2

One must not only understand that :: binds tighter than comma in the pattern, but also that function application ("map2 f l1 l2") binds tighter than :: (and consequently is performed first, providing the tail of the list to which r is prepended). All of this is alien to users that are new to functional programming. (Not to mention code with user-defined infix operators like this: a ++ b >>= fun v -> return (k v).)

The second thing that probably throws new users off is why many things are done by using recursive functions rather than the ubiquitous iteration used in traditional languages -- and this gets into composability, understanding tail recursion, functional data structures, etc. -- although this gets more into the value of functional programming rather than its confusing aspects.

Richard Jones said...

Warren:

Yes, I fully agree. It's both very convenient (for the experienced) and very confusing (for the newbie) to leave out brackets.

But you know it's just one of those things about getting used to a new language. I recall C appears equally confusing if all you've used before is BASIC.