Proposal: Add the unordered-containers package and the hashable package to the Haskell Platform
Thomas Schilling
nominolo at googlemail.com
Wed Mar 20 18:02:30 CET 2013
To make this more precise: the next version of hashable (say, 1.3)
would include the following:

    newtype SipHashed a = SipHashed a

    class SipHashable a where
        sipHashWithSalt :: Int -> a -> Int

    instance SipHashable a => Hashable (SipHashed a) where
        hashWithSalt salt (SipHashed x) = sipHashWithSalt salt x

The Hashable instances would then be the (fast) ones from hashable-1.1,
and the SipHash-based Hashable instances from hashable-1.2 would be
renamed to become instances of SipHashable.
Alternatively, hashable-1.2 could be renamed to hashable-sip-1.0 or so.
Note that I do not propose to make this change now. We include
hashable-1.1 now, and the above will be the upgrade path to include
secure hashing.
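
To illustrate how the wrapper would be used, here is a minimal,
self-contained sketch. It rests on several assumptions: the Hashable class
below is a local stand-in so the example compiles with base alone (real
code would import Data.Hashable), and both hash functions are simple
FNV-style placeholders, not the actual hashable-1.1 hash and not SipHash.
Only SipHashed, SipHashable and sipHashWithSalt come from the proposal
above; everything else is hypothetical.

```haskell
module Main where

import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')

-- Local stand-in for the Hashable class (assumption: real code would
-- import it from Data.Hashable in the hashable package).
class Hashable a where
    hashWithSalt :: Int -> a -> Int

-- The class and wrapper from the proposal.
class SipHashable a where
    sipHashWithSalt :: Int -> a -> Int

newtype SipHashed a = SipHashed a

-- Wrapping a key in SipHashed routes hashing to the SipHashable instance.
instance SipHashable a => Hashable (SipHashed a) where
    hashWithSalt salt (SipHashed x) = sipHashWithSalt salt x

-- "Fast default" instances. Placeholder mixing step (FNV-1 style),
-- NOT the real hashable-1.1 implementation.
instance Hashable Char where
    hashWithSalt salt c = (salt * 16777619) `xor` ord c

instance Hashable a => Hashable [a] where
    hashWithSalt = foldl' hashWithSalt

-- "Opt-in" instances. Placeholder mixing step (FNV-1a style),
-- NOT SipHash; it only shows where SipHash would plug in.
instance SipHashable Char where
    sipHashWithSalt salt c = (salt `xor` ord c) * 16777619

instance SipHashable a => SipHashable [a] where
    sipHashWithSalt = foldl' sipHashWithSalt

main :: IO ()
main = do
    let key = "example"
    -- Unwrapped key: uses the fast default instance.
    print (hashWithSalt 0 key)
    -- Wrapped key: uses the SipHashable instance via the newtype.
    print (hashWithSalt 0 (SipHashed key))
    -- By construction, the wrapper simply delegates to sipHashWithSalt.
    print (hashWithSalt 0 (SipHashed key) == sipHashWithSalt 0 key)
```

The point of the design is that the choice is visible in the key type:
code that hashes plain keys keeps the fast behaviour, and code that needs
the stronger hash opts in by wrapping its keys in SipHashed.
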
On 20 March 2013 16:47, Thomas Schilling <nominolo at googlemail.com> wrote:
> OK. I think a reasonable approach would be the following:
>
> - add hashable-1.1 (i.e., without SipHash) to the platform
> - later create a new release of hashable that is fast by default and
> provides SipHash functionality via a newtype wrapper (or it could be a
> new package that defines the newtype and all the standard instances)
>
> What's important is that the default behaviour (fast vs. secure) won't
> change in a later version of the platform.
>
> We should also make it clear that hashable (even with SipHash) does
> not aim to provide cryptographic hashing, i.e., it is no replacement
> for proper HMACs, SHA1, etc.
>
>
> On 19 March 2013 16:51, Johan Tibell <johan.tibell at gmail.com> wrote:
>> Hi Thomas,
>>
>> On Tue, Mar 19, 2013 at 8:41 AM, Thomas Schilling <nominolo at googlemail.com>
>> wrote:
>>>
>>> On 19 March 2013 16:01, Johan Tibell <johan.tibell at gmail.com> wrote:
>>> >
>>> > http://trac.haskell.org/haskell-platform/wiki/Proposals/unordered-containers
>>>
>>> The links to the repos are wrong. It should be "tibbe" instead of
>>> "tibbel".
>>
>>
>> Fixed.
>>
>>>
>>> Bryan's recent change to make "hashable" use SipHash is certainly
>>> the right default. There were some complaints about performance for
>>> use cases where security is not an issue. What are the options for
>>> users that wish to use a different hash function? According to the
>>> paper, SipHash is about 2x slower than CityHash.
>>
>>
>> 2x is *a lot*. 2x is about the performance difference between Map and
>> HashMap. Since the raison d'etre for HashMap is that it's faster than Map,
>> if we'd see a 2x slowdown in HashMap there would be little reason to use it.
>>
>> For example, 'delete' for HashMap ByteString got almost 2x slower with
>> hashable-1.2. Since 'delete' does more than just hashing, that means that
>> SipHash is quite a bit slower than the current (insecure) hash function.
>> Another example: with GHC 7.6.2 HashMap String is almost unusably slow (5x
>> slower than before). This is likely due to a GHC bug, but it's something we
>> need to investigate. At the moment I don't encourage people to upgrade to
>> hashable-1.2.0.5.
>>
>> The right way to go is probably to make this a user decision. Many
>> applications (e.g. data processing) have no need for the security guarantee,
>> so paying for it makes little sense.
>>
>> Cheers,
>> Johan
>>