[Haskell-cafe] is 256M RAM insufficient for a 20 million element Int/Int map?

Bertram Felgenhauer bertram.felgenhauer at googlemail.com
Sat Oct 18 22:19:31 EDT 2008


Don Stewart wrote:
> tphyahoo:
> > I'm trying to run a HAppS web site with a large amount of data: stress
> > testing happstutorial.com.
> > Well, 20 million records doesn't sound that large by today's
> > standards, but anyway that's my goal for now.
> > I have a standard Data.Map.Map as the base structure for one of my
> > macid data tables (jobs), but I noticed something
> > that is probably causing problems for me.
> > Even a simple 20 million record with int/int key values causes an out
> > of memory error for me in ghci,
> 
> Int keys, Int values eh?
> 
> Does using IntMap help?

Interesting. Map Int Int, IntMap Int and [(Int, Int)] use pretty much
the same amount of memory, assuming that no keys or values are shared:

Map Int Int:

  data Map k a = Tip 
               | Bin {-# UNPACK #-} !Size !k a !(Map k a) !(Map k a) 

  - all 'Tip's are shared.
  - For each (key, value) pair, there is one 'Bin', needing
    tag[*] (1 word)
    size (unpacked, 1 word)
    key, value (pointer, tag, value = 3 words each)
    two more pointers (1 word each)
  for a total of 10 words per element.

IntMap Int:

  data IntMap a = Nil
     | Tip {-# UNPACK #-} !Key a
     | Bin {-# UNPACK #-} !Prefix {-# UNPACK #-} !Mask
           !(IntMap a) !(IntMap a) 

  - one 'Tip' per element:
    tag (1 word) + key (1 word) + value (3 words)
  - one 'Bin' per element (minus 1):
    tag (1 word) + prefix, mask (1 word each) + 2 pointers (1 word each)
  for a total of 10 words per element.

[(Int, Int)]:
  - one '(:)' per element:
    - tag (1 word) + 2 pointers (1 word each)
  - one '(Int, Int)' per element:
    - tag (1 word) + key (3 words) + value (3 words)
  for a total of 10 words per element, again.  

Now if we want to save memory, we can specialise Data.Map
to Int keys and values, saving 4 words per element, and encode the
external leaves (Tips) in the constructor, saving another word:

  data IntIntMap = Tip
      | BinBB !Size !Int !Int IntIntMap IntIntMap
      | BinBT !Size !Int !Int IntIntMap
      | BinTB !Size !Int !Int IntIntMap
      | BinTT !Size !Int !Int
  -- sprinkle {-# UNPACK #-} as needed

That's 5 words per elements. Would that be worthwhile?

Bertram

[*] actually, the info pointer in the Spineless, Tagless G-machine.


More information about the Haskell-Cafe mailing list