[Haskell-cafe] Space leak when returning pairs?

Chris Kuklewicz haskell at list.mightyreason.com
Fri May 19 12:57:08 EDT 2006


Shin-Cheng Mu wrote:
> Dear members,
> 
> I am experiencing a space leak, which I suspect to be
> an instance of the problem addressed by Wadler before.
> I'd appreciate if someone here would take a look.
> 
> Given the following datatype:
> 
>  data XMLEvent = StartEvent String
>                | EndEvent   String
>                | TextEvent  String  deriving Show
> 
> where the three constructors represent the start tag
> (<a> where a is a string), end tag (</a>), and text data,
> respectively, of an XML stream. (They are actually from
> the library HXML). The following function
> simply returns the same stream while doing a minimal
> amount of validation (ignoring the closing tag).
> 
>   idX :: [XMLEvent] -> ([XMLEvent], [XMLEvent])
>   idX [] = ([], [])
>   idX (StartEvent a : strm) =
>     let (ts, strm') = idX strm
>         (us, strm'') = idX strm'
>     in (StartEvent a [] : ts ++ EndEvent a : us, strm'')
>   idX (EndEvent _: strm) = ([], strm)
>   idX (TextEvent s : strm) =
>     let (ts, strm') = idX strm
>     in (TextEvent s : ts, strm')
> 
> The function idX returns a pair, where the first component
> is the processed stream, while the second component is the
> rest of the input. The intention is to thread the input
> and release processed events.
> 
> If the function is used in a pipelined manner:
> 
>    print . fst . idX . parseInstance $ input
> 

I would not have written idX in this manner.  My version is

> data XMLEvent = StartEvent String
>               | EndEvent   String
>               | TextEvent  String  deriving Show 
> 
> idX' :: [XMLEvent] -> [XMLEvent]
> idX' = untilTags []
> 
> untilTags :: [String] -> [XMLEvent] -> [XMLEvent]
> untilTags [] [] = []
> untilTags tags [] = error ("Did not find closing EndEvents " ++ show tags)
> untilTags tags (x:xs) =
>   case x of
>     StartEvent tag' -> x : untilTags (tag':tags) xs
>     EndEvent tag' -> if null tags 
>                        then error ("Unexpected EndEvent with tag " ++ show tag')
>                        else let (tag:tags') = tags
>                             in if tag == tag'
>                                  then x : untilTags tags' xs
>                                  else error ("Expected EndEvents " ++ show tags ++ " but found EndEvent " ++ show tag')
>     TextEvent _ -> x : untilTags tags xs

Here the flow of the input "x : ..." to the output "x : ..." is more obvious,
and the open tags are kept in a stack and used to check the closing tags.

The memory usage will be only slightly higher than your idX due to keeping the
list of open tag names.  If you want to remove that overhead and assume the
ending closing tags have correct strings, then you can keep a mere count of open
tags:

> countTags :: Int -> [XMLEvent] -> [XMLEvent]
> countTags 0 [] = []
> countTags n [] = error ("Did not find the closing EndEvents " ++ show n)
> countTags n (x:xs) =
>   case x of
>     StartEvent tag' -> x : countTags (succ n) xs
>     EndEvent tag' -> if 0 == n
>                        then error ("Unexpected EndEvent with tag " ++ show tag')
>                        else x : countTags (pred n) xs
>     TextEvent _ -> x : countTags n xs


More information about the Haskell-Cafe mailing list