[Haskell-cafe] Space leak when returning pairs?

Shin-Cheng Mu scm at ipl.t.u-tokyo.ac.jp
Fri May 19 11:56:43 EDT 2006


On May 19, 2006, at 6:16 PM, Henning Thielemann wrote:
> let ~(ts, strm') = idX strm
>     ~(us, strm'') = idX strm'

I seem to have found a partial solution to the problem.
It's rather ugly, however, and I think there should be
a better way.

The original definition for one of the clauses was like
this:

   idX (StartEvent a : strm) =
       let (ts, strm') = idX strm
           (us, strm'') = idX strm'
       in (StartEvent a [] : ts ++ EndEvent a : us, strm'')

The intention was to thread strm through the recursive calls
and free the items in strm as soon as they are processed. However,
with this definition I needed about 8Mb of heap for a 1Mb input
file. Since I suspected that delayed evaluation of us and strm''
keeps the pointer to strm around for too long, the natural solution
seems to be forcing their evaluation. So I tried:

   idX (StartEvent a : strm) =
       case idX strm of
          (ts, strm') ->
              case idX strm' of
                (us, strm'') ->
                  (StartEvent a [] : ts ++ EndEvent a : us, strm'')

which made things worse. The heap size increased a bit more
and I get a typical "bell" shaped heap profile suggesting idX
cannot output anything before processing the whole input.

So I have to allow idX to output something while consuming
the input. What eventually worked was this messy mixture
of let and case:

   idX (StartEvent a : strm) =
     let (xs, rest) =
          case idX strm of
            (ts, strm') ->
               let (ws, rest) =
                     case idX strm' of
                       (us, strm'') -> (us, strm'')
               in (ts ++ EndEvent a : ws, rest)
     in (StartEvent a [] : xs, rest)

The heap residency reduced from about 8MB to around 80Kb.

This is rather tricky, however. The following "nicer looking"
version has a heap profile as bad as the worse case.

    idX (StartEvent a : strm) =
      let (ts,us,strm'') =
          case idXsp strm of
            (ts, strm') ->
                case idXsp strm' of
                  (us, strm'') -> (ts, us, strm'')
      in (StartEvent a [] : ts ++ EndEvent a : us, strm'')

And I cannot quite explain why (Anyone?).

Is there a more structured solution? Can I leave the hard
work to the compiler?

sincerely,
Shin-Cheng Mu


More information about the Haskell-Cafe mailing list