[Haskell-cafe] hash tables, judy arrays, and more

Fri Dec 6 12:45:31 UTC 2013

Hi,

As some of you may know¹, I've been playing around with associative data
structures, and in particular Judy arrays, using the "judy" package by
Don S.  Now, I find these to be fast and (more importantly) frugal
concerning memory - but now I see what I can only interpret as memory
corruption - a bit depending on the ordering of the code and/or
insertion of print statements, output is truncated or contaminated.

The strange thing is that this only occurs when I also use Data.Vector
or Data.HashTable.IO in the same program - if it's all Judy, then things
work as expected - as far as I have seen, anyway.

First I thought it was my dubious "insertWith" function, but when I
reverted to separate "insert" and "lookup", I can still provoke the
behavior.

Has anybody else seen anything similar?  Any idea how to debug something
like this?

Then I thought I'd look at hash tables, using the 'hashtables' package.
I haven't tested it much yet, but it appears to be a lot slower than
Judy (maybe as much as 10x), and uses a lot more memory (also perhaps a
factor of 10).  I guess I might be able to improve things a bit by
judiciously applying strictness, but it seems to be storing both keys
and values unboxed, so I don't expect to come close to Judy - I guess
there isn't any unboxed hash table implementations around?

Anyway - I seem to have programmed myself into a corner here, so
suggestions welcome!

-k

¹ http://blog.malde.org/posts/frequency-counting.html and
http://blog.malde.org/posts/k-mer-counting.html
-- 
If I haven't seen further, it is by standing in the footprints of giants