[Haskell-cafe] Trying to reduce memory costs of String duplicates

Luke Palmer lrpalmer at gmail.com
Sat Sep 5 13:38:02 EDT 2009


2009/9/5 Günther Schmidt <gue.schmidt at web.de>:
> Hi all,
>
> I'm reading in a data of 216k records into a map of Key, Values Pairs, the
> values being strings.
>
> As it happens out of 216k String values there really are only about 6.6k
> distinct string values, so I could save a lot of RAM if I was able to
> "insert" only actually *new* string values into the map and use references
> to (string) values that already are in memory instead.
>
> Is there a container that would, if I wanted to insert an element, return a
> pair of either the previously inserted, equal value and the container
> unchanged, or the new, previously unknown value and the new container
> amended by that element?

I believe a memoization of the identity function will do what you want:

import qualified Data.MemoCombinators as Memo

share = Memo.list Memo.char id

Then pass any string through share to make/get a cached version.

You might want to limit the scope of share -- eg. put it in a where
clause for the function where you're using it -- so that it doesn't
eat memory for the lifetime of your program, only for when you need
it.

Luke


More information about the Haskell-Cafe mailing list