[Haskell-cafe] cryptographic hash functions in darcs (re: announcing darcs 2.0.0pre3)

Thu Jan 24 15:15:31 EST 2008

On Jan 24, 2008, at 11:32 AM, David Roundy wrote:

> All data in the hashed format is hashed.  Darcs doesn't implement any
> checking of signatures, but you could (relatively) easily do so by
> hand.  Just sign _darcs/hashed_inventory, and if the signature is
> valid and the repository is consistent (which darcs automatically
> checks for any portion of the repository that it accesses), then the
> repository hasn't been tampered with (since it was signed, anyhow).
>
> As far as what the guarantee is, all contents of the repository
> (except _darcs/prefs/ and of course the working directory) are
> accessed by hashes stored in that one file.

So, let me see if I understand the issues here.

1.  Darcs needs to have short identifiers for a patch.  Is it  
required, or merely tolerable that the same patch (contents and  
metadata) yield the same identifier?  Or is it a bug, in which the  
independent generation of an identical patch will result in undefined  
behavior?  Let's call this property "patch id determined by patch".

2.  It would be nice, but isn't currently used, if one could rely on  
the property that for a given patch id, nobody can come up with  
another patch that has the same id.  This would be nice because in  
the future we might use this to securely identify patches and entire  
repositories.  This property is called "second pre-image resistance".

3.  Likewise, it would be nice if it were impossible for someone to  
come up with two different patches that had the same patch id.  This  
is called "collision resistance".

4.  Someday, someone might want to expose their patch ids without  
also thus exposing information about their patches.  This is called  
"pre-image resistance".

5.  Inasmuch as anyone might want to rely on any of these properties,  
they might want to do so years down the road.  They might want to  
take darcs patches or darcs repositories that were generated in 2008,  
and rely on these properties in 2015 or even later.

Now it is important to note that SHA-1 does *not* have collision  
resistance today, and there is reason to suspect that it doesn't have  
other security properties either.  The more years into the future we  
look, the less likely it is that the user will be willing to rely on  
SHA-1.

Let's enumerate a few options for what darcs can do now:

1.  Use randomly generated patch identifiers.  This means that the  
identifiers don't have any of the properties described, but they are  
easy to implement, fast to produce (practically instantaneous), and  
at least we know that they don't have these properties so we don't  
accidentally rely on them.

2.  Use a CRC like ZFS does.  This means that identifiers have "patch  
id determined by patch", but don't have the other properties.  Again,  
this is easy to implement, fast to compute (let's say around 1 GiB/ 
sec), and has the virtue of not fooling us into thinking we have a  
security guarantee that we don't.

3.  Use MD5.  This is easy to implement (because MD5 is widely used),  
fast to compute (370 MiB/sec), and has the dubious virtue of being  
mostly well understood as being insecure.

4.  Use SHA-1.  This is easy to implement (because SHA-1 is widely  
used), a bit slower to compute (220 MiB/sec), and has the regrettable  
misfeature of being widely misunderstood (e.g. by Linus) as being  
secure.

5.  Use SHA-256.  This is hard to implement (I'm assuming there  
aren't already Haskell bindings for it), even slower to compute (110  
MiB/sec), but it is the first of these options has a chance of  
allowing people in the future to rely on the security properties.

6.  Use Tiger.  This is hard to implement (I'm assuming there aren't  
already Haskell bindings for it), relatively fast (320 MiB/sec), and  
might or might not offer security ten or twenty years from now.

I must say that option 4 is the worst of the bunch.  Except for the  
implementation difficulty (of which I know little), any other option  
would provide a better trade-off.

Regards,

Zooko