[Haskell-cafe] is 256M RAM insufficient for a 20 million element Int/Int map?

Derek Elkins derek.a.elkins at gmail.com
Sat Oct 18 16:40:43 EDT 2008


On Sat, 2008-10-18 at 22:26 +0200, Thomas Hartman wrote:
> {-# LANGUAGE BangPatterns #-}
> import qualified Data.Map as M
> import Debug.Trace
> {-
> I'm trying to run a HAppS web site with a large amount of data: stress
> testing happstutorial.com.
> Well, 20 million records doesn't sound that large by today's
> standards, but anyway that's my goal for now.
> I have a standard Data.Map.Map as the base structure for one of my
> macid data tables (jobs), but I noticed something
> that is probably causing problems for me.
> Even a simple 20 million record with int/int key values causes an out
> of memory error for me in ghci,
> on a computer with 256M of ram.
> I'm wondering if there is a data structure that might be more suitable
> for large recordsets.
> Or do you just have to use a database, or some sort of file-based
> serialization, once your records
> are in the millions?
> Or is this some weird subtlety of lazy evalution, or some other haskell gotcha?
> -}

In GHC, a linked list is about 12 bytes per cons (on a 32-bit computer).
Let's say you had a linked list of Ints, 20 million elements long.
That's 240 million bytes.  Data.Map probably has a higher per element
memory cost.  You could probably use/make a data structure that stores
the data more densely, but even a flat array of 20 million Ints on a
32-bit machine is approximately 80MB, on a 64-bit machine 160MB.  Note,
that by today's standards 256MB of memory is no memory at all.


> 
> size = 2 * 10^7
> 
> -- out of memory error
> t = (M.! size) . myFromList . map (\i->(i,i)) $ [1..size]
> 
> -- Lists are no problem
> {-
> *Main> :! time ghc -e tL testMap.hs
> (20000000,20000000)
> 3.38user 0.09system 0:03.53elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
> -}
> tL = (!! (size-1)) . map (\i->(i,i)) $ [1..size]
> 
> t2 = (M.fromList . map (\i->(i,i)) $ [1..10] )
>        M.\\  (M.fromList . map (\i->(i,i)) $ [6..15])
> 
> 
> -- does this evaluate all of list l, or just whnf?
> myFromList (!l) = M.fromList l

tL can garbage collect the list as it goes along and runs in constant
memory due to laziness.  It may even be deforested leading to no heap
allocation at all.  As I mentioned above, all of a 20 million element
long list probably wouldn't fit in (physical) memory on your computer.

In t2's case, the entire Map is built and in memory.



More information about the Haskell-Cafe mailing list