[Haskell-cafe] hash tables, judy arrays, and more
Ketil Malde
ketil at malde.org
Fri Dec 6 12:45:31 UTC 2013
Hi,
As some of you may know¹, I've been playing around with associative data
structures, and in particular Judy arrays, using the "judy" package by
Don S. Now, I find these to be fast and (more importantly) frugal
concerning memory - but now I see what I can only interpret as memory
corruption - a bit depending on the ordering of the code and/or
insertion of print statements, output is truncated or contaminated.
The strange thing is that this only occurs when I also use Data.Vector
or Data.HashTable.IO in the same program - if it's all Judy, then things
work as expected - as far as I have seen, anyway.
First I thought it was my dubious "insertWith" function, but when I
reverted to separate "insert" and "lookup", I can still provoke the
behavior.
Has anybody else seen anything similar? Any idea how to debug something
like this?
Then I thought I'd look at hash tables, using the 'hashtables' package.
I haven't tested it much yet, but it appears to be a lot slower than
Judy (maybe as much as 10x), and uses a lot more memory (also perhaps a
factor of 10). I guess I might be able to improve things a bit by
judiciously applying strictness, but it seems to be storing both keys
and values unboxed, so I don't expect to come close to Judy - I guess
there isn't any unboxed hash table implementations around?
Anyway - I seem to have programmed myself into a corner here, so
suggestions welcome!
-k
¹ http://blog.malde.org/posts/frequency-counting.html and
http://blog.malde.org/posts/k-mer-counting.html
--
If I haven't seen further, it is by standing in the footprints of giants
More information about the Haskell-Cafe
mailing list