[Haskell-cafe] Re: XML parser recommendation?
Rene de Visser
Rene_de_Visser at hotmail.com
Tue Oct 23 12:37:59 EDT 2007
"Uwe Schmidt" <uwe at fh-wedel.de> wrote in message
news:200710231717.47003.uwe at fh-wedel.de...
it into HXT.
>
> This still does not solve the processing of "very very large"
> XML document. I doubt, whether we can do this with a DOM
> like approach, as in HXT or HaXml. Lazy input does not solve all problems.
> A SAX like parser could be a more useful choice for very large documents.
>
> Uwe
I think a step towards supporting medium-sized documents in HXT would be to
store the tags and content more efficiently.
If I understand the code correctly, every tag is stored as a separate
Haskell String. As each character of a String under GHC takes 12 bytes, this
alone leads to high memory usage. Tags tend to repeat, so you could store each
distinct tag only once, using a hash table. Content could be stored in
compressed byte strings.
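To make the idea concrete, here is a minimal sketch of storing each distinct tag name once. It is not HXT code: the names InternTable, intern and internAll are hypothetical, a Data.Map stands in for the hash table, and the names are held as packed ByteString values (no compression step is shown).

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M
import Data.List (mapAccumL)

-- Hypothetical intern table: maps a tag name to the one shared copy of it.
-- A Data.Map is used here in place of a hash table.
type InternTable = M.Map B.ByteString B.ByteString

-- Return the shared copy of a tag name, adding it to the table on first sight.
intern :: InternTable -> B.ByteString -> (InternTable, B.ByteString)
intern table name =
  case M.lookup name table of
    Just shared -> (table, shared)
    Nothing ->
      -- B.copy detaches the name from the (possibly large) input buffer.
      let owned = B.copy name
      in (M.insert owned owned table, owned)

-- Intern a whole sequence of tag names, threading the table through.
internAll :: [B.ByteString] -> (InternTable, [B.ByteString])
internAll = mapAccumL intern M.empty

main :: IO ()
main = do
  let tags = map B.pack ["item", "name", "item", "price", "name", "item"]
      (table, shared) = internAll tags
  putStrLn ("tags seen:    " ++ show (length shared))
  putStrLn ("unique names: " ++ show (M.size table))
```

With the six sample tags above the table ends up with three entries, so repeated tags all point at the same packed name instead of each carrying its own String.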
As I mentioned in an earlier post, 2GB of memory is not enough to process a 35MB
XML document in HXT, as we need
30 x 2 x 12 = 720 MB for starters just to store the string data (once in the
parser and once in the DOM).
(Well, a machine with 2GB of memory. I guess I had somewhere around 1GB free
for the program.) Other overheads most likely used up the remaining 300 MB.
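The estimate above can be written out as a small calculation. This is only a sketch of the arithmetic: stringCostMB is a hypothetical helper, and the 1 byte per character figure for packed byte strings is an assumption that ignores their fixed overhead.

```haskell
-- Estimated megabytes needed to hold docMB megabytes of text, given the
-- number of copies kept at once and the heap bytes used per character.
stringCostMB :: Int -> Int -> Int -> Int
stringCostMB docMB copies bytesPerChar = docMB * copies * bytesPerChar

main :: IO ()
main = do
  let asString = stringCostMB 30 2 12  -- figures from the estimate above
      asPacked = stringCostMB 30 2 1   -- assumption: 1 byte per character
  putStrLn ("As String:           " ++ show asString ++ " MB")
  putStrLn ("Left of 1024 MB:     " ++ show (1024 - asString) ++ " MB")
  putStrLn ("As packed (assumed): " ++ show asPacked ++ " MB")
```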
Rene.