then just using
external call_1 : float -> float = "call_1"
call_1. However, these calls are not direct. They go via an OCaml runtime function called
caml_c_call. This is a tiny bit of assembler, so the overhead isn't large, but it does use a computed jump, which on many processors is quite slow.
Luckily, this indirection is only needed to set things up for the garbage collector. If your C function won't perform any OCaml allocations, then you don't need this, and you can tell OCaml to jump directly to your C function like this:
external call_2 : float -> float = "call_2" "noalloc"
Let's compare the generated assembly code for the calls in both cases:
call_1 (via caml_c_call)                  call_2 ("noalloc")
pushl %eax                                pushl %eax
movl $call_1, %eax                        call call_2
call caml_c_call                          addl $4, %esp
addl $4, %esp
movl (%esp), %edx
movl %edx, G(caml_last_return_address)
leal 4(%esp), %edx
movl %edx, G(caml_bottom_of_stack)
As you can see, the "noalloc" version is much shorter.
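If you want to try the two styles side by side without writing a C stub, here is a minimal sketch. It makes a few assumptions that are not in the text above: it borrows "caml_sqrt_float" (a primitive that ships with the OCaml runtime) and "sqrt" (from the C math library) in place of call_1 and call_2, the names sqrt_indirect and sqrt_direct are made up for illustration, and it uses the [@@noalloc] attribute, which is how OCaml 4.03 and later spell the "noalloc" string flag. Note that a stub returning a boxed float has to allocate, so the direct version also uses [@@unboxed] to pass raw doubles.

```ocaml
(* Indirect call: the stub allocates a boxed float for its result,
   so native code reaches it through caml_c_call.
   "caml_sqrt_float" is a runtime primitive, used here only so the
   sketch links without a custom C file. *)
external sqrt_indirect : float -> float = "caml_sqrt_float"

(* Direct call: [@@noalloc] promises the C function never allocates
   on the OCaml heap, and [@@unboxed] passes and returns raw doubles.
   The first name is used by bytecode, the second ("sqrt") by native code. *)
external sqrt_direct : float -> float = "caml_sqrt_float" "sqrt"
  [@@unboxed] [@@noalloc]

let () =
  (* Both declarations compute the same value; only the call path differs. *)
  Printf.printf "%.1f %.1f\n" (sqrt_indirect 16.0) (sqrt_direct 16.0)
```

Compiling this with ocamlopt -S and looking at the generated assembly should show the same kind of difference as above: the first call is routed through caml_c_call, the second is a plain call to sqrt.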