[Haskell-cafe] Re: XML parser recommendation?

Rene de Visser Rene_de_Visser at hotmail.com
Tue Oct 23 12:37:59 EDT 2007


"Uwe Schmidt" <uwe at fh-wedel.de> wrote in message 
news:200710231717.47003.uwe at fh-wedel.de...
> it into HXT.
>
> This still does not solve the processing of "very very large"
> XML documents. I doubt whether we can do this with a DOM-like
> approach, as in HXT or HaXml. Lazy input does not solve all problems.
> A SAX-like parser could be a more useful choice for very large documents.
>
> Uwe

I think a step towards supporting medium-sized documents in HXT would be to 
store the tags and content more efficiently.
If I understand the code correctly, every tag is stored as a separate 
Haskell string. As each character of a string under GHC takes 12 bytes, this 
alone leads to high memory usage. Tags tend to repeat, so you could store 
each distinct tag only once using a hash table. Content could be stored in 
compressed byte strings.
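
A minimal sketch of the tag-sharing idea, not HXT code: the names 
InternTable, intern and internAll are made up for illustration, Data.Map 
stands in for the hash table, and the names are kept as packed strict 
ByteStrings (packed, not compressed).

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M
import Data.List (mapAccumL)

-- Hypothetical intern table: maps each distinct tag name to the single
-- shared copy kept for it. Data.Map is used here in place of a hash table.
type InternTable = M.Map B.ByteString B.ByteString

-- Return the shared copy of a tag name, adding it to the table the
-- first time it is seen.
intern :: InternTable -> B.ByteString -> (InternTable, B.ByteString)
intern table name =
  case M.lookup name table of
    Just shared -> (table, shared)
    Nothing     -> (M.insert name name table, name)

-- Intern a whole sequence of tag names, threading the table through.
internAll :: [B.ByteString] -> (InternTable, [B.ByteString])
internAll = mapAccumL intern M.empty

main :: IO ()
main = do
  let tags            = map B.pack ["item", "name", "item", "price", "item"]
      (table, shared) = internAll tags
  print (M.size table)  -- 3 distinct tag names are stored
  print shared          -- same sequence, repeats now point at one copy
```

With this, a tag that occurs a million times costs one stored name plus a 
pointer per occurrence, instead of a fresh string each time.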

As I mentioned in an earlier post, 2GB of memory is not enough to process a 
35MB XML document in HXT, as we have

30 x 2 x 12 = 720 MB for starters, just to store the string data (once in the 
parser and once in the DOM).

(Well, a machine with 2GB of memory.) I guess I had somewhere around 1GB free 
for the program. Other overheads most likely used up the remaining 300 MB.
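
The estimate above, written out as a small calculation. The function name 
estimateMB is made up for illustration, and the second figure assumes that a 
packed byte string needs roughly 1 byte per character, ignoring its fixed 
per-string overhead.

```haskell
-- Rough heap estimate in MB:
-- text size x number of copies x bytes per character.
estimateMB :: Int -> Int -> Int -> Int
estimateMB textMB copies bytesPerChar = textMB * copies * bytesPerChar

main :: IO ()
main = do
  -- Figures from the post: 30 MB of text, held twice, 12 bytes per Char.
  print (estimateMB 30 2 12)  -- 720
  -- Assumption: a packed byte string at about 1 byte per character.
  print (estimateMB 30 2 1)   -- 60
```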

Rene. 




