[Haskell-cafe] Re: [ANN] bloomfilter 1.0 - Fast immutable and
mutable Bloom filters
Scott Cruzen
sic at lerp.com
Tue Jun 3 19:03:33 EDT 2008
* Jim Snow <jsnow at cs.pdx.edu> [080531 15:23]:
> Without looking at the code to verify that this is how it has really
> been implemented, a bloom filter is like a series of hash tables, where
> the hash table entries are one bit. The bit is set if there is an item
> that hashes to that value in the bloom filter. So, assuming a single
> hash table where half the bits are set, there's a 50% false positive
> rate and no false negatives when you do a membership test.
>
> To reduce the false positives, we can add another hash table with a
> different hash function. Assuming it also is half full, we can check if
> an item is in both tables, and our false positive rate drops to 25%.
>
> In practice, one might use something like 32 hash tables. This yields a
> false positive rate of 1/(2^32). Their most obvious application is to
> store the dictionary for a spell checker in a space-efficient way,
> though I have a friend who wrote a paper on using them for router caches.
>
> There was a google tech talk on bloom filters awhile ago:
> http://www.youtube.com/watch?v=947gWqwkhu0
One other use:
LOAF is a simple extension to email that lets you append your entire
address book to outgoing mail message without compromising your privacy.
Correspondents can use this information to prioritize their mail, and
learn more about their social networks. The LOAF home page is at
http://loaf.cantbedone.org.
More information about the Haskell-Cafe
mailing list