[Haskell-cafe] Re: hxt memory useage

Fri Jan 25 14:49:53 EST 2008

"Matthew Pocock" <matthew.pocock at ncl.ac.uk> wrote in message 
news:200801241917.33281.matthew.pocock at ncl.ac.uk...
> On Thursday 24 January 2008, Albert Y. C. Lai wrote:
>> Matthew Pocock wrote:
>> > I've been using hxt to process xml files. Now that my files are getting
>> > a bit bigger (30m) I'm finding that hxt uses inordinate amounts of
>> > memory. I have 8g on my box, and it's running out. As far as I can tell,
>> > this memory is getting used up while parsing the text, rather than in any
>> > down-stream processing by xpickle.
>> >
>> > Is this a known issue?
>>
>> Yes, hxt calls parsec, which is not incremental.
>>
>> haxml offers the choice of non-incremental parsers and incremental
>> parsers. The incremental parsers offer finer control (and therefore also
>> require finer control).
>
> I've got a load of code using xpickle, which taken together are quite an
> investment in hxt. Moving to haxml may not be very practical, as I'll have
> to find some equivalent of xpickle for haxml and port thousands of lines of
> code over. Is there likely to be a low-cost solution to convincing hxt to be
> incremental that would get me out of this mess?
>
> Matthew

I don't think so. Even if you replace parsec, HXT is itself not incremental. 
(It stores the whole XML document in memory as a tree, and the tree is not 
memory efficient.)
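
To give a feel for why a tree built from ordinary Haskell Strings is costly, 
here is a rough sketch. The Node type below is a hypothetical, simplified 
stand-in and not HXT's actual definition, and the 24 bytes per character 
assumes a 64-bit GHC, where each list cell of a String takes three machine 
words. It only counts the String payloads, so the real cost of a tree would 
be higher.

```haskell
module Main where

-- Hypothetical, simplified document tree; NOT HXT's real data types.
data Node
  = Element String [Node]  -- tag name and child nodes
  | Text String            -- character data

-- Assumption: 64-bit GHC, where each (:) cell of a String occupies
-- three 8-byte words (header, pointer to the Char, pointer to the tail).
bytesPerChar :: Int
bytesPerChar = 3 * 8

-- Lower bound on the heap bytes taken by the String list cells alone,
-- ignoring the tree nodes and child lists themselves.
stringBytes :: Node -> Int
stringBytes (Text s) = bytesPerChar * length s
stringBytes (Element name children) =
  bytesPerChar * length name + sum (map stringBytes children)

main :: IO ()
main = print (stringBytes (Element "a" [Text "hello", Element "b" []]))
```

Under these assumptions a single character of text already costs 24 bytes 
before any tree structure is counted.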

Still, I am a bit surprised that you can't parse a 30m file with 8 gig of memory.

This was discussed here before, and I think someone benchmarked HXT as using 
roughly 50 bytes of memory per byte of input, i.e. HXT would then be using 
about 1.5 gig of memory for your 30m file.
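
The back-of-the-envelope estimate can be written out as a small sketch. The 
factor of 50 is the benchmark figure recalled above, not something measured 
here, and reading "30m" as 30 * 10^6 bytes is an assumption. The names are 
made up for illustration.

```haskell
module Main where

-- Assumed overhead, taken from the recalled benchmark:
-- roughly 50 bytes of heap per byte of XML input.
bytesPerInputByte :: Integer
bytesPerInputByte = 50

-- Estimated heap use in bytes for an input of the given size in bytes.
estimatedHeapBytes :: Integer -> Integer
estimatedHeapBytes inputBytes = bytesPerInputByte * inputBytes

main :: IO ()
main = do
  let inputBytes = 30 * 1000 * 1000  -- "30m" read as 30 MB (assumption)
      heapBytes  = estimatedHeapBytes inputBytes
      heapGB     = fromIntegral heapBytes / 1.0e9 :: Double
  putStrLn ("estimated heap: " ++ show heapGB ++ " GB")
```

If the ratio holds, that is well under 8 gig, which is why running out of 
memory on a file of this size is surprising.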

Rene.