[Haskell-cafe] is 256M RAM insufficient for a 20 million
element Int/Int map?
Bertram Felgenhauer
bertram.felgenhauer at googlemail.com
Sat Oct 18 22:19:31 EDT 2008
Don Stewart wrote:
> tphyahoo:
> > I'm trying to run a HAppS web site with a large amount of data: stress
> > testing happstutorial.com.
> > Well, 20 million records doesn't sound that large by today's
> > standards, but anyway that's my goal for now.
> > I have a standard Data.Map.Map as the base structure for one of my
> > macid data tables (jobs), but I noticed something
> > that is probably causing problems for me.
> > Even a simple 20 million record with int/int key values causes an out
> > of memory error for me in ghci,
>
> Int keys, Int values eh?
>
> Does using IntMap help?
Interesting. Map Int Int, IntMap Int and [(Int, Int)] use pretty much
the same amount of memory, assuming that no keys or values are shared:
Map Int Int:
data Map k a = Tip
| Bin {-# UNPACK #-} !Size !k a !(Map k a) !(Map k a)
- all 'Tip's are shared.
- For each (key, value) pair, there is one 'Bin', needing
tag[*] (1 word)
size (unpacked, 1 word)
key, value (pointer, tag, value = 3 words each)
two more pointers (1 word each)
for a total of 10 words per element.
IntMap Int:
data IntMap a = Nil
| Tip {-# UNPACK #-} !Key a
| Bin {-# UNPACK #-} !Prefix {-# UNPACK #-} !Mask
!(IntMap a) !(IntMap a)
- one 'Tip' per element:
tag (1 word) + key (1 word) + value (3 words)
- one 'Bin' per element (minus 1):
tag (1 word) + prefix, mask (1 word each) + 2 pointers (1 word each)
for a total of 10 words per element.
[(Int, Int)]:
- one '(:)' per element:
- tag (1 word) + 2 pointers (1 word each)
- one '(Int, Int)' per element:
- tag (1 word) + key (3 words) + value (3 words)
for a total of 10 words per element, again.
Now if we want to save memory, we can specialise Data.Map
to Int keys and values, saving 4 words per element, and encode the
external leaves (Tips) in the constructor, saving another word:
data IntIntMap = Tip
| BinBB !Size !Int !Int IntIntMap IntIntMap
| BinBT !Size !Int !Int IntIntMap
| BinTB !Size !Int !Int IntIntMap
| BinTT !Size !Int !Int
-- sprinkle {-# UNPACK #-} as needed
That's 5 words per elements. Would that be worthwhile?
Bertram
[*] actually, the info pointer in the Spineless, Tagless G-machine.
More information about the Haskell-Cafe
mailing list