[Haskell-cafe] Why does this program eat RAM?
Donald Bruce Stewart
dons at cse.unsw.edu.au
Tue Sep 5 01:12:42 EDT 2006
jeremy.shaw:
> At Tue, 5 Sep 2006 03:03:51 +0000 (UTC),
> John Goerzen wrote:
> >
> > I have the below program, and I'm trying to run it on an input of about
> > 90MB. It eats RAM like crazy, and I can't figure out why.
>
> I have not looked in detail at your code -- but it could simply be the
> fact that String requires gobs of memory to store a string. If you
> forced all 90MB into memory at once, I would expect it to take almost
> of gig of RAM. (Around a 10-11 fold increase in size).
>
> I suspect this line could be forcing the whole thing into memory:
>
> > wordfreq = map (\x -> (head x, length x)) . group . sort
>
> because sort can not return the first element until it has looked at
> all the elements in the list to determine which one should be first.
>
> If you fold a Data.Map or associative list over the word-list, then
> you could probably get the lazy behaviour you expect.
A quick hack up to use Data.ByteString uses a lot less ram, though
profiling still shows 95% of time spent in the building the Map.
import System.Environment
import Data.Char
import Data.List
import qualified Data.Map as Map
import qualified Data.ByteString.Char8 as B
import Data.ByteString (ByteString)
wordfreq inp = Map.toList $ foldl' k m inp
where
m = Map.empty :: Map.Map ByteString Int
k n w = Map.insertWith f w 1 n
f _ x = let y = x + 1 in y `seq` y
freqsort (w1, c1) (w2, c2) | c1 == c2 = compare w1 w2
| otherwise = compare c2 c1
showit (w, c) = B.join (B.singleton ' ') [B.pack(show c), w]
main :: IO ()
main = do args <- getArgs
B.interact $ B.unlines . map showit . take (read . head $ args)
. sortBy freqsort . wordfreq . B.words
-- Don
More information about the Haskell-Cafe
mailing list