[Haskell-cafe] Re: XML parser recommendation?

apfelmus apfelmus at quantentunnel.de
Wed Oct 24 05:24:09 EDT 2007


Uwe Schmidt wrote:
> The hashtable approach would of course reduce memory usage,

Note that hashtables are evil :) I'm all for tries instead.

> but this
> would require a global change of the processing model: A document then
> does not longer consist of a single tree, it alway consists of a pair of a tree and a map.

Ah! I got struck by a trick: it's possible to store every tag name in 
full only once but still present a simple tree with full tag names to 
the user. You simply share all the tag names. For instance, in

   let x = "mytagname" in Tree (Tag x) [Tree (Tag x) [Text "foobar"]]

the string "mytagname" is stored in memory only once although there are 
two tags with this name.

When parsing an XML-file, you look up every parsed tag name in a finite 
map (i.e. the trie). If it's already in, you don't store the parsed tag 
name in the XML tree but the one stored at the leaf in the trie. Of 
course, these two strings are equal. But they're not (pointer-) 
identical! After parsing, all equal tag names now are pointers to one 
and the same leaf in the finite map. You can drop the finite map 
structure afterwards, the pointers to the leaves will remain.

That would be quite the memory reduction for large XML trees which are 
likely to have many many identical tag names.

Regards,
apfelmus



More information about the Haskell-Cafe mailing list