[Haskell-cafe] Re: [ANN] bloomfilter 1.0 - Fast immutable and
mutable Bloom filters
Jim Snow
jsnow at cs.pdx.edu
Sat May 31 18:22:42 EDT 2008
Achim Schneider wrote:
> Aaron Denney <wnoise at ofb.net> wrote:
>
>
>> On 2008-05-30, Achim Schneider <barsoap at web.de> wrote:
>>
>>> Bryan O'Sullivan <bos at serpentine.com> wrote:
>>>
>>>
>>>> A Bloom filter is a probabilistic data
>>>> structure that provides a fast set membership querying capability.
>>>> It does not give false negatives, but has a tunable false positive
>>>> rate. (A false positive arises when the filter claims that an
>>>> element is present, but in fact it is not.)
>>>>
>>>>
>>> /me squints.
>>>
>>> Please tell me that this isn't reversible.
>>>
>> Tell me what you mean by "reversible". You can't, for instance,
>> extract the items in the set.
>>
>>
> I guess invertible would have been the right word, though it's still
> ambiguous.
>
> Turning it into something that does not give false positives, but has a
> tunable false negative rate.
>
> Without looking at the algorithm, I imagine it working somewhat like a
> hashtable, and this inversion would utterly destroy my intuition.
>
Without looking at the code to verify that this is how it has really
been implemented, a bloom filter is like a series of hash tables, where
the hash table entries are one bit. The bit is set if there is an item
that hashes to that value in the bloom filter. So, assuming a single
hash table where half the bits are set, there's a 50% false positive
rate and no false negatives when you do a membership test.
To reduce the false positives, we can add another hash table with a
different hash function. Assuming it also is half full, we can check if
an item is in both tables, and our false positive rate drops to 25%.
In practice, one might use something like 32 hash tables. This yields a
false positive rate of 1/(2^32). Their most obvious application is to
store the dictionary for a spell checker in a space-efficient way,
though I have a friend who wrote a paper on using them for router caches.
There was a google tech talk on bloom filters awhile ago:
http://www.youtube.com/watch?v=947gWqwkhu0
-jim
More information about the Haskell-Cafe
mailing list