[Haskell-cafe] Re: XML parser recommendation?

Wed Oct 24 07:06:06 EDT 2007

apfelmus wrote:

> Ah! I got struck by a trick: it's possible to store every tag name in
> full only once but still present a simple tree with full tag names to
> the user. You simply share all the tag names. For instance, in
>
>    let x = "mytagname" in Tree (Tag x) [Tree (Tag x) [Text "foobar"]]
>
> the string "mytagname" is stored in memory only once although there are
> two tags with this name.
>
> When parsing an XML-file, you look up every parsed tag name in a finite
> map (i.e. the trie). If it's already in, you don't store the parsed tag
> name in the XML tree but the one stored at the leaf in the trie. Of
> course, these two strings are equal. But they're not (pointer-)
> identical! After parsing, all equal tag names now are pointers to one
> and the same leaf in the finite map. You can drop the finite map
> structure afterwards, the pointers to the leaves will remain.
>
> That would be quite the memory reduction for large XML trees which are
> likely to have many many identical tag names.

I've also thought about this approach. It sounds a bit weired,
to built a map (or a trie) for the identity function. But it would
solve a part of the space problem, at least when building XML
documents by parsing.
So I guess, there is a new project:
A simple, small and lazy parser (no parsec), at least for the content
part of XML.

Uwe