[Haskell-cafe] Constructing Data.Map from ByteString

Don Stewart dons at galois.com
Tue Mar 11 20:31:30 EDT 2008

> Hi all,
> I've been plugging away at this all day and some discussion in
> #haskell has been fruitless. Perhaps you have the inspiration to see
> what's happening!
> Concerning this minimal example:
> http://hpaste.org/6268
> It works as required, loading K/V pairs into a Data.Map, the concern
> is the amount of memory used. Pausing (using a getLine) after the
> readFile one can see (through 'ps v') that the whole file 'f' is
> loaded in to memory.
> However once the Map is created the memory footprint swells to around
> five times the size. Surely this can't be just overhead from Map?
> My reasoning was that by using ByteString and getting the whole file
> into memory a small and linear increase would be seen for Map
> overhead..
> I have tried using both Data.ByteString.Char8 and
> Data.ByteString.Lazy.Char8 with negligible difference.
> For a hoot I tried it with String and yes, it's ridiculous :)

Speaking to you on #haskell we worked out that the keys are integers,
and elements can be bytestrings, so an IntMap ByteString seems like
a good idea.

The attached builds the Map directly (avoiding the lines splitting of
the file), and seems to use around half the heap of the generic Map.
It also builds much faster.

Still, for big data, I'm not sure that a more specialised structure
wouldn't be better.

-- Don
-------------- next part --------------
{-# OPTIONS -fbang-patterns #-}

import System.Environment
import Control.Monad
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Unsafe as S
import qualified Data.IntMap as M

main = do
    [f] <- getArgs
    x <- loadToMap f
    print (M.size x)
    getChar -- wait

loadToMap :: String -> IO (M.IntMap S.ByteString)
loadToMap f = parseLines `fmap` S.readFile f

-- Build an IntMap as we traverse the NNN\tfoo\n lines of the file
parseLines :: S.ByteString -> M.IntMap S.ByteString
parseLines s = go s M.empty
     go s m
        | S.null s  = m
        | otherwise = case S.readInt s of
            Nothing     -> error "No integer field"
            Just (n, y) -> go (S.tail rest) m'

                where (str,rest) = S.break ('\n'==) (S.unsafeTail y)
                      m'         = M.insert (fromIntegral n) str m

More information about the Haskell-Cafe mailing list