[Haskell-cafe] Space leak with Data.Binary and decodeFile
Maxime Henrion
mhenrion at gmail.com
Sun Jan 11 17:00:09 EST 2009
Hello all,
I've been observing a huge space leak with some code using Data.Binary
that I cannot make sense of and I hope someone here can shed some light
on this, so I'll try to explain my problem as clearly as possible. I
qualify the space leak as huge because if I let the program run, it will
soon consume the whole memory available (~3G) and finally will get
killed by the system.
The code I'm writing implements a search algorithm using an inverted
index. This index is built from a Trie [Int] (from the bytestring-trie
package) and an Array Int ByteString. The trie maps each referenced
word to an integer list that is a list of indices into the array. Here
is the code for the Index datatype and the obvious Binary instance:
data Index = Index { entries :: Array Int ByteString
, invidx :: Trie [Int
}
instance Binary Index where
put (Index dirs idx) = put dirs >> put idx
get = liftM2 get get
I have no problems creating and seralizing this data structure to a
file. The huge leak appears instead when I'm reading this data
structure from a file and try to do something with it.
This is the smallest test case I came up with that can reproduce the
problem :
main = do idx <- decodeFile "list.idx"; mapM_ (B.putStrLn . snd) (assocs
(entries idx))
The space leak also appears when I try to touch the trie instead of the
array. I've been trying tons of combinations involving adding or
removing strictness annotations and seq calls in various places with no
luck. I have also been adding SCC annotations and tried to profile the
code. This seemed to suggest the space leak happens in the get method
of the Array instance of Binary :
instance (Binary i, Ix i, Binary e) => Binary (Array i e) where
[...]
get = do
bs <- get
n <- get -- read the length
xs <- replicateM n get -- now the elems.
return (listArray bs xs)
The output of the profiler tells me that all the space gets allocated
from the "replicateM n get" expression.
Now for the really weird part: if I load my code in GHCi and type
"main", I can observe the space leak. However, if I copy paste the
definition of main instead, the code runs fine! This is the only
circumstance I've seen this code work instead of eating all the RAM...
I have been using GHC 6.10.1, binary 0.4.4 and bytestring-trie 0.1.2.
If there's anything else that I can do to understand what's going on, I
would gladfully hear about it. Please also tell me if I should provide
more information.
Thanks,
--
Maxime Henrion
More information about the Haskell-Cafe
mailing list