[Haskell-cafe] Constructing Data.Map from ByteString

Tue Mar 11 12:18:05 EDT 2008

On Mon, Mar 10, 2008 at 11:33:58PM +0000, Dave Tapley wrote:
>I've been plugging away at this all day and some discussion in
>#haskell has been fruitless. Perhaps you have the inspiration to see
>what's happening!
>
>Concerning this minimal example:
>http://hpaste.org/6268
>
>It works as required, loading K/V pairs into a Data.Map, the concern
>is the amount of memory used. Pausing (using a getLine) after the
>readFile one can see (through 'ps v') that the whole file 'f' is
>loaded in to memory.
>However once the Map is created the memory footprint swells to around
>five times the size. Surely this can't be just overhead from Map?
>My reasoning was that by using ByteString and getting the whole file
>into memory a small and linear increase would be seen for Map
>overhead..

Just looking at the code, I'd guess that it's making at least three
copies of the data in total, plus Map overhead, so five times larger
before garbage collection isn't completely outrageous. (Unless my
intuition of what ByteString is doing is completely off-base of
course.) Don't forget that Map construction is lazy in its value
argument by default: Lots of potential for thunks to hang around the
place.

>I have tried using both Data.ByteString.Char8 and
>Data.ByteString.Lazy.Char8 with negligible difference.  For a hoot I
>tried it with String and yes, it's ridiculous :)

Strings won't help, since you're modifying the data, so you get to pay
the string overhead without getting any sharing advantage from the
String list implementation.

You might do better feeding the input to a proper parser to build the
map -- At least then you'd only be copying the input ByteString data
once. I don't think you can do much better than that in an immutable
world.

Phil

-- 
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt