And I won't recur (pun, intended) on the subject of how easy it is to write code to process these data structures.
type xml =
| Element of (string * (string * string) list * xml list)
| PCData of string
However there is one thing which is difficult to do, and that is to meaningfully extend or annotate such data.
Let's say as a practical example I want to write a CSS renderer for XML documents. A first step to doing this is to annotate each node in the XML document with the list of styles that apply to it. (Styles are inherited, so natural recursive algorithms work quite well for this). The second step is to take what CSS calls "replaced elements" (elements like <IMG> that have an intrinsic size not known to CSS itself) and compute their sizes.
xmltype defined above is not extensible at all. In order to allow it to store styles and sizes for replaced elements we'd need to change the type definition to something like this:
Unfortunately this is no use to us because all our existing code that worked on the old
type xml =
| Element of (string * (string * string) list * xml list
* style list * size option)
| PCData of string
xmltype no longer works on this new type. That is no mere theoretical concern either, because the old
xmltype at the top wasn't just chosen at random. It is in fact the type used by Xml-Light, a useful, simplified XML parser and printer, and we cannot change this type without stopping using this library.
PXP (another XML library -- OCaml has several) has a polymorphic XML type, allowing you to attach precisely one arbitrary datum to each node. In terms of the
xmltype defined above, it looks similar to this:
This is half-way useful, because one can now use the library functions (they assume polymorphism, so work fine), but has some serious shortcomings of its own. One is that you can't easily build up independent modules. If there is a PXP module which uses the
type 'a xml =
| Element of ('a, ...
'afor its own purpose then one cannot extend the XML document any further. If you avoid that, you can attach multiple data by using a tuple, but each module that wants to attach data had better know about all the other modules in advance and agree on a single tuple type.
For the CSS styles / sizes example, we could also imagine adding a
sizewhen that is needed. You'd end up with document types like this:
But you can't write a function that just operates on "an XML document annotated with sizes", unless it "knows" about all the other annotations and the order in which they were applied. (Is the type
unit xml # The basic document
unit styles xml # After annotating with styles (pass 1)
size styles xml # After annotating with sizes (pass 2)
size styles xmlor
styles size xmlif we happened to swap the two passes around?)
In part 2 (next week) I'll look at some radically different ways to annotate and extend immutable data structures.