[Haskell-cafe] anti-memoization?

Sat Aug 28 08:45:31 EDT 2004

I've been thinking and wondering about the possiblility of
anti-memoization.  Memoization reduces computational time at the expense of
additional memory use.  I would like to be able to do the opposite.
Obviously, antimemoization is something you'd only want to do with "cheap"
functions having larger outputs than inputs.

In particular, I have (in darcs) a number of "parsing" functions, which use
up a lot of memory, and *also* hold onto their argument.  The simplest
example is linesPS, which breaks a PackedString (from my FastPackedString
library) into lines.  The contents of the argument are not copied, so the
original string is held in memory as long as its parsed version is also
held.

linesPS :: PackedString -> [PackedString]

I'd like to be able to tell the GC that it can "throw away" the parsed
lines, and replace the thunk of the [PackedString] with the original
function if necesary.

I think this can be approximated using weak pointers, but to do it "right"
would require support in the compiler.  Using weak pointers I imagine
something like:

antimemoize :: (a -> b) -> a -> AntiMemo a b
readAntiMemo :: AntiMemo a b -> b

with

data AntiMemo a b = AntiMemo (a -> b) a (IORef (Weak b))

where

readAntiMemo (AntiMemo f a iowb) =
  case unsafePerformIO (readIORef iowb >>= deRefWeak) of
  Nothing -> case f a of
             newb -> unsafePerformIO $
                     do wb <- mkWeakPtr newb Nothing
                        writeIORef iowb wb
                        return newb
  Just b -> b

antimemoize f a =
  unsafePerformIO $ do wb <- mkWeakPtr (f a) Nothing
                       iowb <- newIORef wb
                       return (AntiMemo f a iowb)

I haven't actually tested it, but to the extent that I understand weak
pointers, it ought to work.  Comments or suggestions are welcome.

It would be really nice if this could be done at the "thunk" level, so it
could be transparent to the caller.  I'd really like to be able to create
the function

antimemoize :: (a -> b) -> (a -> b)

which like "memo" would be completely transparent in the sense that it
would be a pure function and wouldn't modify the output of the function it
modifies, but would just make the output of that function a thunk that when
evaluated stores not only the output b, but also the function itself and
the input a, so that the b output could be garbage-collected.

On the other hand, perhaps there's a reason why all this wouldn't work.  I
certainly don't understand space usage in haskell as well as I wish.  But I
also *really* would like to reduce the space usage of darcs.  Some of this
I can acheive by strategically introducing laziness, but that's a rather
fragile way to reduce the memory footprint, since relatively small
modifications change what gets held in memory.  So it'd be nice to be able
to throw away the results of a parse that I can always re-perform.
-- 
David Roundy
http://www.abridgegame.org/darcs