[Haskell-cafe] How on Earth Do You Reason about Space?
brandon_m_moore at yahoo.com
Tue May 31 20:43:27 CEST 2011
I can't reproduce heap usage growing with the
size of the input file.
I made a word list from Project Gutenberg's
copy of "War and Peace" by running

  tr -sc '[[:alpha:]]' '\n' < pg2600.txt > words.txt

Using 1, 25, or 1000 repetitions of this ~3MB wordlist
shows about 100MB of address space used according
to top, and no more than 5MB or so of Haskell heap
used according to the memory profile, with a flat
profile.
Is your memory usage growing with the size of the input
file, or the size of the histogram?
I was worried data sharing might mean your keys
retain entire 64K chunks of the input. However, it
seems enumLines depends on the StringLike ByteString
instance, which just converts to and from String.
That can't be efficient, but I suppose it avoids excessive sharing.
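
To illustrate the sharing worry: this is only a sketch, not the code from this thread, and it does not use enumLines or the iteratee package at all. It assumes a strict ByteString input and a Data.Map.Strict histogram; countWords is a hypothetical name. B.words returns slices that point into the original buffer, so a key stored as-is can keep the whole chunk alive. B.copy gives the key its own buffer.

```haskell
import Data.List (foldl')
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M

-- Count occurrences of each word in a strict ByteString.
-- (countWords is a hypothetical helper, not part of the original program.)
countWords :: B.ByteString -> M.Map B.ByteString Int
countWords = foldl' step M.empty . B.words
  where
    step m w
      -- Key already present: bump the count. M.adjust keeps the key that is
      -- already stored in the map, so the new slice w is not retained.
      | M.member w m = M.adjust (+ 1) w m
      -- New key: copy the slice so the map does not hold a reference to
      -- the buffer the word was sliced from.
      | otherwise    = M.insert (B.copy w) 1 m
```

Copying only on first insertion keeps the cost proportional to the number of distinct words rather than to the size of the input.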
The other thing that occurs to me is that the total size of
your keys would also be approximately the size of the input
file if you were using plain text without each word split onto
a separate line.
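
A rough way to check this is to add up the bytes held in the keys. The sketch below assumes the input fits in a strict ByteString; histogram, keyBytes, wordKeyBytes and lineKeyBytes are names made up for illustration. It compares a histogram keyed on words with one keyed on whole lines.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as M

-- Build a histogram from a list of keys.
histogram :: [B.ByteString] -> M.Map B.ByteString Int
histogram = M.fromListWith (+) . map (\k -> (k, 1))

-- Total number of bytes held in the keys of a histogram.
keyBytes :: M.Map B.ByteString Int -> Int
keyBytes = sum . map B.length . M.keys

-- Keys are individual words: repeated words collapse into a single key.
wordKeyBytes :: B.ByteString -> Int
wordKeyBytes = keyBytes . histogram . B.words

-- Keys are whole lines: only identical lines collapse into a single key.
lineKeyBytes :: B.ByteString -> Int
lineKeyBytes = keyBytes . histogram . B.lines
```

With one word per line the two numbers coincide. On running prose nearly every line is distinct, so lineKeyBytes should come out close to the size of the file, while wordKeyBytes stays bounded by the vocabulary.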