[Haskell-cafe] Re: XML (HXML) parsing :: GHC 6.8.3 space leak from 2000

Marc A. Ziegert coeus at gmx.de
Tue Sep 23 07:21:35 EDT 2008


> > 
> >>     -- Lazily build a tree out of a sequence of tree-building events
> >>     build :: [TreeEvent] -> ([UnconsumedEvent], [Tree String])
> >>     build (Start str : es) =
> >>             let (es', subnodes) = build es
> >>                 (spill, siblings) = build es'
> >>             in (spill, (Tree str subnodes : siblings))
> >>     build (Leaf str : es) =
> >>             let (spill, siblings) = build es
> >>             in (spill, Tree str [] : siblings)
> >>     build (Stop : es) = (es, [])
> >>     build [] = ([], [])
> 
> [skip]
> 
> > We don't know of a good way to fix this problem.  I'm going to record 
> > this example in a ticket for future reference, though.
> 
> Simon,
> 
> is there a way, perhaps, to rewrite this expression to avoid leaks?
> An ad-hoc will do, perhaps split in two modules to avoid intramodular
> optimizations?
> 
> -- 
> Lev Walkin

finally... there is a way! :D

hmm... this was a nice puzzle ;)

i've tried several times (and hours!) to implement a Continuation (not monad) based solution, but finally i developed this tricky but elegant foldr solution...
i built the parser around this type:
  type FoldR_Builder = (TreeEvent,[UnconsumedEvent]) -> [Either [UnconsumedEvent] (Tree String)] -> [Either [UnconsumedEvent] (Tree String)]

it is based on the following thought:
the tuple
  (rs,ps)::([Rest],[Processed]) -- with the restriction, which forces the list ps to be processed entirely before rs.
is equipollent to
  (fmap Right ps++[Left rs])::[Either [Rest] Processed]
, but the latter is easier to handle ...at least if you can't trust the GC.


- marc

---------------example_context_free_grammar_parser.hs--------------------------
module Main where

import Data.List

data Tree a = Tree a [Tree a] deriving Show

data TreeEvent = Start String	-- Branch off a new subtree
        | Stop		-- Stop branching and return 1 level
        | Leaf String	-- A simple leaf without children
        deriving Show

main = print . snd . build $ Start "top" : cycle [Leaf "sub"]
--main = print . snd . build $ [Leaf "bla",Leaf "bla",Start "S(",Leaf "bli",Start "T(",Leaf "blu",Stop,Stop,Leaf "bla"]

type UnconsumedEvent = TreeEvent	-- Alias for program documentation



build :: [TreeEvent] -> ([UnconsumedEvent], [Tree String])
build tes = let (ts_,ue_,_) = splitAtLeftDefault [] $ foldr builder [] [(te,ue)|ue@(te:_)<-tails tes] in (ue_,ts_)
--                                                                              ^^^^^^^^^
-- a little change (bugfix?) to the space leaking solution...
-- [Stop,Leaf "x"]  now evaluates to  ([Stop,Leaf "x"],[])  instead of  ([Leaf "x"],[])
-- like this:   build ue@(Stop:_) = (ue,[])
-- instead of:  build (Stop : es) = (es,[])

type FoldR_Builder = (TreeEvent,[UnconsumedEvent]) -> [Either [UnconsumedEvent] (Tree String)] -> [Either [UnconsumedEvent] (Tree String)]
builder :: FoldR_Builder
builder (Stop,ue) euts = (Left ue:euts)
builder (Leaf str,_) euts = (Right (Tree str []):euts)
builder (Start str,_) euts = let (sub,_,euts') = splitAtLeftDefault [] euts in (Right (Tree str sub):euts')


-- default value is needed iff the list is finite and contains no (Left _).
splitAtLeftDefault :: a -> [Either a b] -> ([b],a,[Either a b])
splitAtLeftDefault a0 [] = ([],a0,[])
splitAtLeftDefault a0 (Right b:xs) = let (bs,a,es) = splitAtLeftDefault a0 xs in (b:bs,a,es)
splitAtLeftDefault _ (Left a:xs) = ([],a,xs)













-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part.
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20080923/f2995f86/attachment.bin


More information about the Haskell-Cafe mailing list