[Haskell-cafe] Why does this program eat RAM?

Donald Bruce Stewart dons at cse.unsw.edu.au
Tue Sep 5 01:12:42 EDT 2006


jeremy.shaw:
> At Tue, 5 Sep 2006 03:03:51 +0000 (UTC),
> John Goerzen wrote:
> > 
> > I have the below program, and I'm trying to run it on an input of about
> > 90MB.  It eats RAM like crazy, and I can't figure out why.
> 
> I have not looked in detail at your code -- but it could simply be the
> fact that String requires gobs of memory to store a string. If you
> forced all 90MB into memory at once, I would expect it to take almost
> of gig of RAM. (Around a 10-11 fold increase in size).
> 
> I suspect this line could be forcing the whole thing into memory:
> 
> > wordfreq = map (\x -> (head x, length x)) . group . sort
> 
> because sort can not return the first element until it has looked at
> all the elements in the list to determine which one should be first.
> 
> If you fold a Data.Map or associative list over the word-list, then
> you could probably get the lazy behaviour you expect.

A quick hack up to use Data.ByteString uses a lot less ram, though
profiling still shows 95% of time spent in the building the Map.

    import System.Environment
    import Data.Char
    import Data.List
    import qualified Data.Map as Map

    import qualified Data.ByteString.Char8 as B
    import Data.ByteString (ByteString)

    wordfreq inp = Map.toList $ foldl' k m inp
        where
          m     = Map.empty :: Map.Map ByteString Int
          k n w = Map.insertWith f w 1 n
          f _ x = let y = x + 1 in y `seq` y

    freqsort (w1, c1) (w2, c2) | c1 == c2  = compare w1 w2
                               | otherwise = compare c2 c1

    showit (w, c) = B.join (B.singleton ' ') [B.pack(show c), w]

    main :: IO ()
    main = do args <- getArgs
              B.interact $ B.unlines . map showit . take (read . head $ args)
                         . sortBy freqsort . wordfreq . B.words

-- Don


More information about the Haskell-Cafe mailing list