[Haskell-cafe] Bloom Filter
ajb at spamcop.net
ajb at spamcop.net
Mon Apr 30 22:39:28 EDT 2007
Quoting tom <tom at almostobsolete.net>:
> I'm pretty new to Haskell, I've been working on a Bloom filter
> implementation as a learning exercise.
Excellent! Sounds like a fun test.
> I'd really appreciate it if someone more experienced would comment on
> the code. I'm sure there's plenty of places where I'm doing things in
> silly or overly complex ways.
All in all, very well done. It works, and it looks pretty efficient.
My quibbles are mostly stylistic or syntactic in nature. Please
understand that the relative triviality of my quibbles is a sign that
there are really no major problems.
This is not a criticism, but more an advertisement: What are you using
for source control here? Darcs is nice, and as a bonus, it's trivially
browsable from a web browser, which saves downloading and unpacking.
You overuse parentheses. A lot. Definitions like this:
ary = (listArray (0, wordc-1) (repeat 0))
don't need parentheses around them, and just add to the general noise
And (.&. ((size b)-1)) is much more cleanly expressed as (.&. (size b - 1)).
Rather than carrying around a hash function, it might be better to use
a type class:
class BloomHash k where
bloomHash :: k -> [Word8]
You don't need to hard-code this. You can use:
wordsize = bitSize (undefined::Word32) -- Or Int, of course!
bitSize is defined in Data.Bits.
I got a bit confused by the local binding names. It's usual, especially
in generic code, to use "xs", "ys" etc for a list of "x" and "y".
Something like this might be more idiomatic:
splitup n xs = let (xs1, xs2) = splitAt n xs
in xs1 : splitup n xs2
(fromIntegral $ x `div` wordsize, fromIntegral $ x .&. (wordsize-1))
Seems intuitively wasteful. Either use divMod or bit operations.
Similarly, (hashfunc b) key is the same as hashfunc b key. But even
split bytecount . hashfunc b $ key
That makes it obvious that it's a pipeline of functions applied to the key.
This looks cool:
bytes2int = foldr ((. (256 *)) . (+)) 0 . (map toInteger)
but I'm not smart enough to parse it. This is both more readable and
bytes2int = foldr (\x r -> r*256 + fromInteger x) 0
Integer log2's are probably better done using integers only, or at least
abstracted out into a separate function.
Function guards are your friends! This:
bloom hf sz hc = if condition
else error "Badness"
is almost always better expressed as:
bloom hf sz hc
| condition = b
| otherwise = error "Badness"
You can now inline b. (I can see why you put it in a where clause; now
you don't have to.)
wordc, again, only needs integral arithmetic:
wordc = ceiling ((fromIntegral a) / (fromIntegral b :: Double))
is more or less:
wordc = (a+b-1) `div` b
And drop the parentheses around the definition of ary.
Try to use function names that are close to names in existing libraries,
like Data.Set. "insert" sounds better here.
Also, rather than this:
add :: Bloom a -> a -> Bloom a
a better argument order is this:
insert :: a -> Bloom a -> Bloom a
That way, you can use it with foldr.
Again, probably misnamed. Data.Set calls this "member". And again,
arguably the wrong argument ordering.
Once again, well done.
More information about the Haskell-Cafe