[Haskell-cafe] Re: cryptohash and an incremental API

Thomas DuBuisson thomas.dubuisson at gmail.com
Wed Jul 14 14:43:45 EDT 2010

Vincent said:
> couple of comments around the hashes interface:
>
> * updateCtx works on blockLength, instead of working on arbitrary size...

So for performance reasons you seem to prefer Semantics 1.2?

"""
1.2 Multiple of blockSize bytes
Implementations are encouraged to consume data (continue updating,
encrypting, or decrypting) until there is less than blockSize bits
available.
"""

Also, I'll amend 1.2 to say that the hashUpdate/encrypt/decrypt functions
should only consume n * blockSize bytes; tracking the remainder will
be done at the higher level.  The higher-level default
implementations should likewise pass only n * blockSize inputs to these
functions.
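
To make the amended 1.2 concrete, here is a minimal sketch of a
higher-level wrapper that buffers the remainder and only ever hands
whole blocks to the low-level update.  The names BlockUpdate, Buffered
and updateBuffered are made up for illustration and are not part of any
proposed API; the block size is assumed to be given in bytes and to be
positive.

```haskell
import qualified Data.ByteString as B

-- Hypothetical low-level update: it only ever receives n * blockBytes bytes.
type BlockUpdate ctx = ctx -> B.ByteString -> ctx

-- Higher-level state: the low-level context plus the unconsumed remainder.
data Buffered ctx = Buffered ctx B.ByteString

-- Accept input of arbitrary length, pass only whole blocks down,
-- and keep the leftover bytes for the next call.
updateBuffered :: Int -> BlockUpdate ctx -> Buffered ctx -> B.ByteString -> Buffered ctx
updateBuffered blockBytes update (Buffered ctx pending) input =
  let avail         = pending `B.append` input
      usable        = B.length avail - (B.length avail `mod` blockBytes)
      (whole, rest) = B.splitAt usable avail
      -- Skip the low-level call entirely when no full block is available.
      ctx'          = if B.null whole then ctx else update ctx whole
  in Buffered ctx' rest
```

A finalize step at the higher level would then be responsible for
padding and consuming whatever is left in the buffer.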

I can see how that's reasonable and am strongly considering using

> * hash is a generic operation based on the class Hash. In my case, it improves
> performance by not running the exposed pure init/update/finalize, but using the hidden
> impure function. I realized yesterday it's not as much as I thought since I had
> a bug in my benchmark, but it's still there (100ms for 500MB of data).

Hmm, 0.2 seconds / GB is significant, so again I can be swayed - it
isn't like I can't have a default definition of hash (and others) when
it's part of the class instance.
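
A rough sketch of what that could look like: hash lives in the class
with a default definition, and an instance may override it with a
faster path.  ToyHash, XorDigest and SumDigest are toy stand-ins
invented for this example (they are not real hash functions and not
the actual Hash class), and finalize is left out for brevity.

```haskell
import qualified Data.ByteString as B
import Data.Bits (xor)
import Data.Word (Word8)

-- Toy class for illustration only.
class ToyHash d where
  initial :: d
  update  :: d -> B.ByteString -> d
  -- Default one-shot hash built from the pure interface;
  -- instances are free to override it.
  hash :: B.ByteString -> d
  hash = update initial

newtype XorDigest = XorDigest Word8 deriving (Eq, Show)

instance ToyHash XorDigest where
  initial = XorDigest 0
  update (XorDigest acc) = XorDigest . B.foldl' xor acc
  -- No definition of hash here: the default is used.

newtype SumDigest = SumDigest Word8 deriving (Eq, Show)

instance ToyHash SumDigest where
  initial = SumDigest 0
  update (SumDigest acc) = SumDigest . B.foldl' (+) acc
  -- Overridden hash, standing in for a faster implementation-specific path.
  hash = SumDigest . B.foldl' (+) 0
```

The caller picks the algorithm through the result type, for example
hash contents :: XorDigest, and cannot tell whether the default or an
override ran.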

> * Why is the digest of a specific type? I like representing different
> things with different types, but I'm not sure what you gain with digests
> though.

This I am less flexible on.  My thought on how people will use this
library is centered around the instantiation of classes on the keys
used or resulting digests.  Anyone wanting ByteString results can
simply use Data.[Serialize,Binary].encode.

Here is a user getting a sha256 hash:
let h = hash contents :: SHA256

or the type could be implicit due to context (not shown):
let h = hash contents

> * Is strength really useful in the Hash class? It might be accurate when the
> thing gets implemented, but I'm not sure what would happen over time as flaws
> are discovered. Would people actually update it?

Will people actually update it?  I hope so, but if they don't, are we
really worse off than not having any strength numbers?  People who
care about strength will likely keep track of the algorithms on which
they depend.  I added strength largely because the Hash class came
from DRBG (NIST SP 800-90) and that needed strength values.

If we don't have strength, then applications like DRBG need a way to
know which algorithm each data type represents, and then to look up that
algorithm in their own table of algorithm strengths - very messy.  I'd
imagine crypto-api would have to look something like:

\begin{code}
data HashAlgorithm = MD5 | SHA1 | SHA256 | SHA512 | ...

class Hash d c | d -> c, c -> d where
...
algorithm :: Tagged d HashAlgorithm
...
\end{code}

I don't consider this a win - crypto-api would now be enumerating all
hash algorithms that want Hash instances.
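
For contrast, a small sketch of the alternative where each digest type
carries its own strength, so a consumer such as DRBG needs no central
enumeration.  This uses Proxy from base rather than Tagged to stay
self-contained; HasStrength, ToyA, ToyB and the strength numbers are
all placeholders invented for the example.

```haskell
import Data.Proxy (Proxy (..))

-- Each digest type reports its own strength in bits.
class HasStrength d where
  strengthBits :: Proxy d -> Int

-- Placeholder digest types with made-up strength values.
data ToyA = ToyA
data ToyB = ToyB

instance HasStrength ToyA where
  strengthBits _ = 128

instance HasStrength ToyB where
  strengthBits _ = 256

-- A consumer can check a requirement for any instance,
-- including ones defined in packages it has never heard of.
meetsStrength :: HasStrength d => Proxy d -> Int -> Bool
meetsStrength p required = strengthBits p >= required
```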

> The blockCipher should expose the chaining modes as overridable typeclass
> functions, with default generic implementations that use encryptBlocks. For
> example the haskell AES package has different C implementations for each
> chaining mode (e.g. cbc, ecb), and I suspect that using a generic chaining
> implementation would slow things down.

As with "hash" being part of the hash typeclass, I don't have a strong
objection here.  It allows particular implementations to be slightly
higher performance and does not preclude default definitions.  This is
rather messier than I wanted, but the reasoning seems sound.

encryptBlocksCBC :: k -> ByteString -> (k, ByteString)
decryptBlocksCBC :: k -> ByteString -> (k, ByteString)

These I do object to.  The key does not change as the CBC algorithm
progresses, but contextual information does.  My initial mode
implementations have types like:

cbc :: (BlockCipher k) => k -> IV k -> ByteString -> (ByteString, IV k)

In other words, initialization vectors are explicit and separate from
the key.  The type parameter on IV allows us to build an IV of proper
size, something like:

buildIV :: (BlockCipher k, MonadRandom m) => m (IV k)

and it is always true that
iv :: IV k
iv <- buildIV
B.length (encode iv) == blockSize for (undefined :: k)
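
As a rough illustration of the explicit-IV style, here is a toy cbc
whose key stays fixed while the IV is threaded through and returned.
Everything here is a stand-in: ToyKey, the fixed blockBytes, and the
XOR "cipher" are invented for the example and are in no way secure,
and the BlockCipher class constraint is dropped to keep the sketch
self-contained.

```haskell
import qualified Data.ByteString as B
import Data.Bits (xor)
import Data.Word (Word8)

-- Toy stand-ins for a key type and an IV tagged with that key type.
newtype ToyKey = ToyKey Word8
newtype IV k = IV B.ByteString deriving (Eq, Show)

-- Block size in bytes for the toy cipher.
blockBytes :: Int
blockBytes = 4

-- Insecure toy block "cipher": XOR every byte with the key byte.
encryptBlock :: ToyKey -> B.ByteString -> B.ByteString
encryptBlock (ToyKey k) = B.map (xor k)

xorBytes :: B.ByteString -> B.ByteString -> B.ByteString
xorBytes a b = B.pack (B.zipWith xor a b)

-- CBC encryption over whole blocks.  The key is unchanged; the IV to
-- use for the next call is returned alongside the ciphertext.  Input is
-- assumed to be a multiple of blockBytes; trailing bytes are ignored.
cbc :: ToyKey -> IV ToyKey -> B.ByteString -> (B.ByteString, IV ToyKey)
cbc key = go []
  where
    go acc iv pt
      | B.length pt < blockBytes = (B.concat (reverse acc), iv)
      | otherwise =
          let (blk, rest) = B.splitAt blockBytes pt
              IV prev     = iv
              ct          = encryptBlock key (prev `xorBytes` blk)
          in go (ct : acc) (IV ct) rest
```

Because the returned IV is the last ciphertext block, encrypting a
message in two calls, feeding the IV from the first into the second,
gives the same ciphertext as encrypting it in one call.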

> and my last comment, is that i don't understand the streamcipher interface
> you're proposing.  I've got a (inefficient) RC4 implementation that has this
> interface:
>
> stream :: Ctx -> B.ByteString -> (Ctx, B.ByteString)
> streamlazy :: Ctx -> L.ByteString -> (Ctx, L.ByteString)

My interface was just a quick hack with me understanding it would
likely change -  I didn't know there was a Haskell RC4 binding or