DRBG pre-announce and a discussion on RNG/Crypto infrastructure

Wed Jun 16 01:19:01 EDT 2010

All,
A new pair of typeclasses is below and in the repo [1].  Mostly this
is just me tweaking the Hash class and updating DRBG [2] to use the
new interface (tests not yet run, so I might have broken something, but
that wouldn't be the interface's fault).  The classes include:

Note: L. is Data.ByteString.Lazy while B. is the strict Data.ByteString.

=====
class (Binary d, Serialize d)
    => Hash ctx d | d -> ctx, ctx -> d where
  outputLength  :: Tagged d BitLength
  blockLength   :: Tagged d BitLength
  initialCtx    :: ctx
  updateCtx     :: ctx -> B.ByteString -> ctx
  finalize      :: ctx -> B.ByteString -> d
  strength      :: Tagged d BitLength
=====

I was considering having a 'needAlignment :: Tagged d ByteLength'
value for Hashes.  It is currently excluded; the reasoning is in [3].

=====
class BlockCipher k where
  blockSize     :: Tagged k BitLength
  encryptBlock  :: k -> B.ByteString -> B.ByteString
  decryptBlock  :: k -> B.ByteString -> B.ByteString
  buildKey      :: B.ByteString -> Maybe k
  keyLength     :: k -> BitLength  -- ^ keyLength may inspect its argument to return the length
=====

Other helper functions exist that build on the class primitives to
provide operations such as hash and hash'.
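
A minimal sketch of how hash and hash' could be layered on the class
primitives.  It is not the code in the repo: the class is a cut-down copy
(no Binary/Serialize superclasses, only blockLength), Tagged is a local
stand-in for the type from the tagged package, and XorCtx/XorDigest are a
hypothetical toy instance that is not a real hash.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, ScopedTypeVariables #-}
import qualified Data.ByteString      as B
import qualified Data.ByteString.Lazy as L
import Data.Bits (xor)
import Data.Word (Word8)

type BitLength = Int

-- Local stand-in for 'Tagged' from the tagged package (assumption).
newtype Tagged t a = Tagged { unTagged :: a }

-- Cut-down copy of the Hash class from the post.
class Hash ctx d | d -> ctx, ctx -> d where
  blockLength :: Tagged d BitLength
  initialCtx  :: ctx
  updateCtx   :: ctx -> B.ByteString -> ctx
  finalize    :: ctx -> B.ByteString -> d

-- Lazy helper: feed one full block at a time to updateCtx and hand the
-- last chunk (at most one block long) to finalize.
hash :: forall ctx d. Hash ctx d => L.ByteString -> d
hash = go initialCtx
  where
    blockBytes = fromIntegral (unTagged (blockLength :: Tagged d BitLength) `div` 8)
    go :: ctx -> L.ByteString -> d
    go ctx rest =
      let (blk, rest') = L.splitAt blockBytes rest
      in if L.null rest'
           then finalize ctx (L.toStrict blk)
           else ctx `seq` go (updateCtx ctx (L.toStrict blk)) rest'

-- Strict helper, defined in terms of the lazy one.
hash' :: Hash ctx d => B.ByteString -> d
hash' = hash . L.fromStrict

-- Hypothetical toy instance: XOR of all input bytes.  NOT a real hash.
newtype XorCtx    = XorCtx Word8
newtype XorDigest = XorDigest Word8 deriving (Eq, Show)

instance Hash XorCtx XorDigest where
  blockLength = Tagged 64
  initialCtx  = XorCtx 0
  updateCtx (XorCtx a) bs = XorCtx (B.foldl' xor a bs)
  finalize ctx bs = case updateCtx ctx bs of
                      XorCtx a -> XorDigest a
```

Loaded in GHCi, hash' (B.pack [1,2,3]) :: XorDigest evaluates to
XorDigest 0, and the lazy and strict helpers agree on the same input.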

The TODO list includes:

- Look harder at the other classes including "BlockCipher",
"AsymCipher", "StreamCipher"
- example instances of each class
- example uses of each class
- Collecting tests, building a test framework
- Move "for" and (.::.) into the Tagged library (?)
- Decide what we want on padding
- Decide what we want with crypto-related items that aren't directly a
cipher or hash (ex: pbkdf2).
- Decide on package name (replace "Crypto" or select a new name? Goes
with another recent thread's topic)
- Implement modes

Individual responses:

Bas said:
> Why not use the Edward Kmett's 'tagged'[1] package for these methods? As in:
> outputLength :: Tagged d BitLength

Done.  I like it.

Adam Wick <awick at galois.com> wrote:
> Why two libraries instead of n+1? Wouldn't it make sense to just have
> one library (what you call "Crypto") define the interface  as one
> package, and then have a number of packages that implement that
> interface as a series of other modules?

It will start as just 1 (crypto); after that I'm leaning toward n+2,
where n is the number of packages that have the desired interface and
testing (currently zero).  Crypto-Algs can simply re-export from
algorithm-specific packages (i.e. act as a meta package) when such a
package exists and is maintained.  I feel there is value in a
well-supported algorithm collection, namely a uniform inclusion policy
and maintenance; this doesn't stop algorithm-specific packages from
targeting the Crypto API, which is the whole point of keeping Crypto
and Crypto-Algs separate.

>> Enumerating principles I support:
>> * Lazy ByteStrings should be used for all input data
>>
> Really? Why? I've actually been considering going back to both the SHA
> and RSA packages and redoing them using strict ByteStrings. Recent
> experience has suggested that strict ByteStrings are almost always what
> I want, and building a fast lazy ByteString interface over strict
> ByteString routines seems like a pretty trivial task.

It was this comment that made me realize the class interface should
use strict bytestrings throughout and perform component operations
(which matches the crypto definitions better anyway), with helper
functions that use these component functions to provide strict and
lazy operations.  For example, the Hash class defines initialCtx,
updateCtx, and finalize, while helper functions use these to provide
hash and hash'.  This design was already the idea behind the cipher
class; I just hadn't consciously realized it.

Cheers,
Thomas

[1] http://code.haskell.org/~tommd/crypto/
[2] http://code.haskell.org/~tommd/DRBG/
[3] Reasoning behind the currently excluded 'needAlignment' value

The 'needAlignment' value is the byte alignment assumed by the Hash
for input data (presumably 1, 2, 4, or 8).  The 'hash' helper function
(or any user of 'finalize' or 'updateCtx') checks the alignment of the
input data; if it is not aligned, the data is copied into a newly
allocated bytestring, allowing the implementation to assume 64-bit
alignment (new allocation rule in Haskell 2010).  Implementations that
use alignment-safe word extraction (ex: Cereal) can just specify 1,
while other implementations (ex: for performance reasons pureMD5 used
to use an unsafePerformIO ... peekElem ...) can request proper
alignment.
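
The check-and-copy step might look roughly like the sketch below.  The
names isAligned and ensureAligned are hypothetical, not part of the
proposed API, and the sketch relies on the assumption stated above that a
freshly allocated bytestring starts at a suitably aligned address.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafeUseAsCString)
import Foreign.Ptr (alignPtr)
import System.IO.Unsafe (unsafePerformIO)

-- True when the first byte of the buffer sits at an address divisible
-- by n.  Only the address is inspected; the contents are never read.
isAligned :: Int -> B.ByteString -> Bool
isAligned n bs
  | n <= 1    = True
  | otherwise = unsafePerformIO $
      unsafeUseAsCString bs $ \p -> return (alignPtr p n == p)

-- Hypothetical helper a 'hash' wrapper could call before handing data to
-- an implementation that declared needAlignment = n.  Misaligned input is
-- copied into a fresh buffer (ASSUMPTION: new allocations are aligned).
ensureAligned :: Int -> B.ByteString -> B.ByteString
ensureAligned n bs
  | isAligned n bs = bs
  | otherwise      = B.copy bs
```

An implementation that specifies 1 never pays for a copy, while one that
specifies 8 pays only when handed a slice that starts at an odd offset.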

But this is a hack job; what we really need is a high-performance way
to extract unboxed words from a bytestring that falls back to a safe
method when the alignment isn't correct (Cereal is measurably slower
than unsafePerformIO with peekElem).