[Haskell-cafe] Re: cryptohash and an incremental API

Wed Jul 14 04:22:14 EDT 2010

On Mon, Jul 12, 2010 at 02:52:10PM -0700, Thomas M. DuBuisson wrote:
> I've been working on a new crypto library specifically to provide a
> unified API for packages implementing cryptographic algorithms.  You can
> see the discussions on libraries at haskell.org [1] [2].  Please feel free
> to take a look, comment, contribute, and hopefully move to the
> interface.  I should be finishing up BlockCipher modes and adding hash
> tests soon.

Hi Thomas,

First, I think that's a great effort to standardize the crypto API!

A couple of comments on the hash interface:

* updateCtx works on blockLength instead of on arbitrary sizes: while this
does reflect what the underlying algorithm does, I think it is better to let
the algorithm implementation process input of any size. Chunking the
bytestring might have a significant cost (a rope-based implementation would
not suffer from this), and in my case, processing as much as possible at each
update call keeps down the marshalling/unmarshalling cost of the mutable
state.

* hash is a generic operation based on the class Hash. In my case, I get
better performance by not running the exposed pure init/update/finalize
functions and using the hidden impure function instead. I realized yesterday
that the gain is not as large as I thought, since I had a bug in my benchmark,
but it's still there (100ms for 500MB of data).

* Why does each digest have its own specific type? I like representing
different things with different types, but I'm not sure what you gain with
digests.

* Is strength really useful in the Hash class? It might be accurate when the
algorithm gets implemented, but I'm not sure what would happen over time as
flaws are discovered. Would people actually update it?
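
To make the first point concrete, here is a minimal sketch of an update that
accepts input of any length and buffers the remainder itself. Everything in it
(Ctx, blockLen, compressBlocks, update, finalize) is a hypothetical name for
illustration, and the "compression function" is a toy checksum, not a real
hash and not the code of any existing package:

```haskell
import qualified Data.ByteString as B
import Data.Bits (xor)
import Data.Word (Word8)

-- Hypothetical toy context: a running state plus the leftover bytes that do
-- not yet fill a whole block.
data Ctx = Ctx !Word8 !B.ByteString

-- Assumed block length in bytes (64, as for MD5/SHA-1).
blockLen :: Int
blockLen = 64

initCtx :: Ctx
initCtx = Ctx 0 B.empty

-- Stand-in for the real compression function (toy checksum, NOT secure).
compressBlocks :: Word8 -> B.ByteString -> Word8
compressBlocks = B.foldl' (\acc w -> (acc * 31) `xor` w)

-- Accepts input of any length: all whole blocks available are processed in
-- a single call, and the remainder is kept for the next update.
update :: Ctx -> B.ByteString -> Ctx
update (Ctx st left) input = Ctx (compressBlocks st blocks) rest
  where
    buf            = left `B.append` input
    whole          = (B.length buf `div` blockLen) * blockLen
    (blocks, rest) = B.splitAt whole buf

-- Toy finalization: fold in the leftover bytes (a real hash would pad here).
finalize :: Ctx -> Word8
finalize (Ctx st left) = compressBlocks st left
```

With this shape the caller never has to split its input on block boundaries;
feeding the data in one call or in several gives the same result.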

The BlockCipher class should expose the chaining modes as overridable typeclass
functions, with default generic implementations that use encryptBlocks. For
example, the Haskell AES package has different C implementations for each
chaining mode (e.g. cbc, ecb), and I suspect that using a generic chaining
implementation would slow things down.

What about something like:

-- each plaintext bytestring needs to be a multiple of blockSize
class (Binary k, Serialize k) => BlockCipher k where
  blockSize        :: Tagged k BitLength
  encryptBlocks    :: k -> ByteString -> ByteString
  decryptBlocks    :: k -> ByteString -> ByteString
  encryptBlocksCBC :: k -> ByteString -> (k, ByteString)
  encryptBlocksCBC = genericCBC encryptBlocks
  decryptBlocksCBC :: k -> ByteString -> (k, ByteString)
  -- ... same for ecb, ...
  buildKey        :: ByteString -> Maybe k
  keyLength       :: k -> BitLength       -- ^ keyLength may inspect...
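
As a rough sketch of the "default method that an instance can override" idea,
here is a self-contained toy version. The names (ToyBlockCipher, blockBytes,
encryptBlock, encryptCBC, xorBS, XorKey) are hypothetical, the explicit IV
argument is my assumption rather than part of the class above, and the XOR
"cipher" only exists to make the example runnable:

```haskell
import qualified Data.ByteString as B
import Data.Bits (xor)
import Data.Word (Word8)

-- Hypothetical, simplified class: one primitive plus an overridable mode.
class ToyBlockCipher k where
  blockBytes   :: k -> Int
  encryptBlock :: k -> B.ByteString -> B.ByteString
  -- key -> IV -> plaintext -> ciphertext. The default is the generic CBC
  -- below; an instance with a faster native CBC can override it.
  encryptCBC   :: k -> B.ByteString -> B.ByteString -> B.ByteString
  encryptCBC = genericCBC

-- Generic CBC: XOR each plaintext block with the previous ciphertext block
-- (the IV for the first one), then encrypt it. Assumes the plaintext length
-- is a multiple of the block size.
genericCBC :: ToyBlockCipher k
           => k -> B.ByteString -> B.ByteString -> B.ByteString
genericCBC k iv pt = B.concat (go iv pt)
  where
    n = blockBytes k
    go prev bs
      | B.null bs = []
      | otherwise =
          let (p, rest) = B.splitAt n bs
              c         = encryptBlock k (xorBS prev p)
          in c : go c rest

xorBS :: B.ByteString -> B.ByteString -> B.ByteString
xorBS a b = B.pack (B.zipWith xor a b)

-- Toy 4-byte "cipher" (XOR with one key byte, NOT secure) that relies on
-- the default encryptCBC.
newtype XorKey = XorKey Word8

instance ToyBlockCipher XorKey where
  blockBytes _            = 4
  encryptBlock (XorKey w) = B.map (xor w)
```

An instance backed by a C implementation of CBC would simply define
encryptCBC itself and never go through genericCBC.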

My last comment is that I don't understand the stream cipher interface
you're proposing. I've got an (inefficient) RC4 implementation that has this
interface:

stream :: Ctx -> B.ByteString -> (Ctx, B.ByteString)
streamlazy :: Ctx -> L.ByteString -> (Ctx, L.ByteString)

I'm not sure how it would fit this interface (some kind of state monad?):

encryptStream         :: k -> B.ByteString -> B.ByteString
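
For what it's worth, here is a minimal sketch of what I mean by the state
monad option: a context-passing stream function wrapped in State so the
context is threaded between chunks. It assumes the transformers package
(Control.Monad.Trans.State.Strict); the counter-based keystream is a toy
stand-in and not RC4, and streamS and encryptChunks are hypothetical names:

```haskell
import qualified Data.ByteString as B
import Control.Monad.Trans.State.Strict (State, state, evalState)
import Data.Bits (xor)
import Data.Tuple (swap)
import Data.Word (Word8)

-- Toy context: the next keystream byte (NOT a real cipher state).
newtype Ctx = Ctx Word8

-- Same shape as the RC4 interface above: XOR the input with the keystream
-- and return the advanced context together with the output.
stream :: Ctx -> B.ByteString -> (Ctx, B.ByteString)
stream (Ctx c) bs =
  let (c', out) = B.mapAccumL (\k w -> (k + 1, w `xor` k)) c bs
  in (Ctx c', out)

-- The same function as a State action; the context is threaded implicitly.
streamS :: B.ByteString -> State Ctx B.ByteString
streamS bs = state (\ctx -> swap (stream ctx bs))

-- Encrypt a sequence of chunks, carrying the context from one to the next.
encryptChunks :: Ctx -> [B.ByteString] -> [B.ByteString]
encryptChunks ctx chunks = evalState (mapM streamS chunks) ctx
```

The point is that the updated context has to go somewhere: a signature that
returns only the ByteString can encrypt one message per key, but cannot
continue the keystream across calls.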

I hope these comments are useful,
-- 
Vincent Hanquez