[Haskell-cafe] cryptographic hash functions in darcs (re:
announcing darcs 2.0.0pre3)
zooko at zooko.com
Thu Jan 24 15:15:31 EST 2008
On Jan 24, 2008, at 11:32 AM, David Roundy wrote:
> All data in the hashed format is hashed. Darcs doesn't implement any
> checking of signatures, but you could (relatively) easily do so by
> hand. Just sign _darcs/hashed_inventory, and if the signature is
> valid and the repository is consistent (which darcs automatically
> checks for any portion of the repository that it accesses), then the
> repository hasn't been tampered with (since it was signed, anyhow).
> As far as what the guarantee is, all contents of the repository
> (except _darcs/prefs/ and of course the working directory) are
> accessed by hashes stored in that one file.
So, let me see if I understand the issues here.
1. Darcs needs to have short identifiers for a patch. Is it
required, or merely tolerable that the same patch (contents and
metadata) yield the same identifier? Or is it a bug, in which the
independent generation of an identical patch will result in undefined
behavior? Let's call this property "patch id determined by patch".
2. It would be nice, but isn't currently used, if one could rely on
the property that for a given patch id, nobody can come up with
another patch that has the same id. This would be nice because in
the future we might use this to securely identify patches and entire
repositories. This property is called "second pre-image resistance".
3. Likewise, it would be nice if it were impossible for someone to
come up with two different patches that had the same patch id. This
is called "collision resistance".
4. Someday, someone might want to expose their patch ids without
also thus exposing information about their patches. This is called
5. Inasmuch as anyone might want to rely on any of these properties,
they might want to do so years down the road. They might want to
take darcs patches or darcs repositories that were generated in 2008,
and rely on these properties in 2015 or even later.
Now it is important to note that SHA-1 does *not* have collision
resistance today, and there is reason to suspect that it doesn't have
other security properties either. The more years into the future we
look, the less likely it is that the user will be willing to rely on
Let's enumerate a few options for what darcs can do now:
1. Use randomly generated patch identifiers. This means that the
identifiers don't have any of the properties described, but they are
easy to implement, fast to produce (practically instantaneous), and
at least we know that they don't have these properties so we don't
accidentally rely on them.
2. Use a CRC like ZFS does. This means that identifiers have "patch
id determined by patch", but don't have the other properties. Again,
this is easy to implement, fast to compute (let's say around 1 GiB/
sec), and has the virtue of not fooling us into thinking we have a
security guarantee that we don't.
3. Use MD5. This is easy to implement (because MD5 is widely used),
fast to compute (370 MiB/sec), and has the dubious virtue of being
mostly well understood as being insecure.
4. Use SHA-1. This is easy to implement (because SHA-1 is widely
used), a bit slower to compute (220 MiB/sec), and has the regrettable
misfeature of being widely misunderstood (e.g. by Linus) as being
5. Use SHA-256. This is hard to implement (I'm assuming there
aren't already Haskell bindings for it), even slower to compute (110
MiB/sec), but it is the first of these options has a chance of
allowing people in the future to rely on the security properties.
6. Use Tiger. This is hard to implement (I'm assuming there aren't
already Haskell bindings for it), relatively fast (320 MiB/sec), and
might or might not offer security ten or twenty years from now.
I must say that option 4 is the worst of the bunch. Except for the
implementation difficulty (of which I know little), any other option
would provide a better trade-off.
More information about the Haskell-Cafe