[Haskell-cafe] Re: XML (HXML) parsing :: fixed GHC 6.8.3 space leak from 2000

Lev Walkin vlm at lionet.info
Tue Sep 23 10:47:33 EDT 2008

This solution seem to provide a practical alternative to pusing 
datatypes for streaming XML.


Lev Walkin wrote:
> Marc A. Ziegert wrote:
>>>> We don't know of a good way to fix this problem.  I'm going to 
>>>> record this example in a ticket for future reference, though.
>>> Simon,
>>> is there a way, perhaps, to rewrite this expression to avoid leaks?
>>> An ad-hoc will do, perhaps split in two modules to avoid intramodular
>>> optimizations?
>>> -- 
>>> Lev Walkin
>> finally... there is a way! :D
>> hmm... this was a nice puzzle ;)
>> i've tried several times (and hours!) to implement a Continuation (not 
>> monad) based solution, but finally i developed this tricky but elegant 
>> foldr solution...
>> i built the parser around this type:
>>   type FoldR_Builder = (TreeEvent,[UnconsumedEvent]) -> [Either 
>> [UnconsumedEvent] (Tree String)] -> [Either [UnconsumedEvent] (Tree 
>> String)]
>> it is based on the following thought:
>> the tuple
>>   (rs,ps)::([Rest],[Processed]) -- with the restriction, which forces 
>> the list ps to be processed entirely before rs.
>> is equipollent to
>>   (fmap Right ps++[Left rs])::[Either [Rest] Processed]
>> , but the latter is easier to handle ...at least if you can't trust 
>> the GC.
> Marc, you are my hero of the month!
> I can't say I understood this solution before applying it back to
> HXML-0.2, but it surely worked and made quite observable 20%
> difference in performance:
> 9.8 seconds on my 45 megabyte XML test, running in half the space
> (4m) compared to my parallel version based on Ketil Malde's suggestion
> (which was 12 seconds on two cores (though, one core was almost
> idling, `par` was used purely for its side-effect)).
> To those who wants to parse XML in constant space, attached find
> a patch to HXML-0.2 which fixes a space leak.
> However, I am still a bit surprized to discover there is not an order
> of magnitude difference between `par`-based version and your builder.
> While the foldr-based builder is clearly superior, one can't
> help but wonder whether is it `par` that is so efficient compared
> to crunching through Eithers, or there's some other bottleneck
> in the code. Will profile a bit later.
> The XML parsing space leak was declared in HXML back in 2000 and
> lingered in the code for 8 years. Good riddance!
> ------------------------------------------------------------------------
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe

More information about the Haskell-Cafe mailing list