[Haskell-cafe] Re: XML (HXML) parsing :: GHC 6.8.3 space leak
from 2000
Marc A. Ziegert
coeus at gmx.de
Tue Sep 23 07:21:35 EDT 2008
> >
> >> -- Lazily build a tree out of a sequence of tree-building events
> >> build :: [TreeEvent] -> ([UnconsumedEvent], [Tree String])
> >> build (Start str : es) =
> >> let (es', subnodes) = build es
> >> (spill, siblings) = build es'
> >> in (spill, (Tree str subnodes : siblings))
> >> build (Leaf str : es) =
> >> let (spill, siblings) = build es
> >> in (spill, Tree str [] : siblings)
> >> build (Stop : es) = (es, [])
> >> build [] = ([], [])
>
> [skip]
>
> > We don't know of a good way to fix this problem. I'm going to record
> > this example in a ticket for future reference, though.
>
> Simon,
>
> is there a way, perhaps, to rewrite this expression to avoid leaks?
> An ad-hoc will do, perhaps split in two modules to avoid intramodular
> optimizations?
>
> --
> Lev Walkin
finally... there is a way! :D
hmm... this was a nice puzzle ;)
i've tried several times (and hours!) to implement a Continuation (not monad) based solution, but finally i developed this tricky but elegant foldr solution...
i built the parser around this type:
type FoldR_Builder = (TreeEvent,[UnconsumedEvent]) -> [Either [UnconsumedEvent] (Tree String)] -> [Either [UnconsumedEvent] (Tree String)]
it is based on the following thought:
the tuple
(rs,ps)::([Rest],[Processed]) -- with the restriction, which forces the list ps to be processed entirely before rs.
is equipollent to
(fmap Right ps++[Left rs])::[Either [Rest] Processed]
, but the latter is easier to handle ...at least if you can't trust the GC.
- marc
---------------example_context_free_grammar_parser.hs--------------------------
module Main where
import Data.List
data Tree a = Tree a [Tree a] deriving Show
data TreeEvent = Start String -- Branch off a new subtree
| Stop -- Stop branching and return 1 level
| Leaf String -- A simple leaf without children
deriving Show
main = print . snd . build $ Start "top" : cycle [Leaf "sub"]
--main = print . snd . build $ [Leaf "bla",Leaf "bla",Start "S(",Leaf "bli",Start "T(",Leaf "blu",Stop,Stop,Leaf "bla"]
type UnconsumedEvent = TreeEvent -- Alias for program documentation
build :: [TreeEvent] -> ([UnconsumedEvent], [Tree String])
build tes = let (ts_,ue_,_) = splitAtLeftDefault [] $ foldr builder [] [(te,ue)|ue@(te:_)<-tails tes] in (ue_,ts_)
-- ^^^^^^^^^
-- a little change (bugfix?) to the space leaking solution...
-- [Stop,Leaf "x"] now evaluates to ([Stop,Leaf "x"],[]) instead of ([Leaf "x"],[])
-- like this: build ue@(Stop:_) = (ue,[])
-- instead of: build (Stop : es) = (es,[])
type FoldR_Builder = (TreeEvent,[UnconsumedEvent]) -> [Either [UnconsumedEvent] (Tree String)] -> [Either [UnconsumedEvent] (Tree String)]
builder :: FoldR_Builder
builder (Stop,ue) euts = (Left ue:euts)
builder (Leaf str,_) euts = (Right (Tree str []):euts)
builder (Start str,_) euts = let (sub,_,euts') = splitAtLeftDefault [] euts in (Right (Tree str sub):euts')
-- default value is needed iff the list is finite and contains no (Left _).
splitAtLeftDefault :: a -> [Either a b] -> ([b],a,[Either a b])
splitAtLeftDefault a0 [] = ([],a0,[])
splitAtLeftDefault a0 (Right b:xs) = let (bs,a,es) = splitAtLeftDefault a0 xs in (b:bs,a,es)
splitAtLeftDefault _ (Left a:xs) = ([],a,xs)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part.
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20080923/f2995f86/attachment.bin
More information about the Haskell-Cafe
mailing list