[Haskell-cafe] MD5?
Andrew Coppin
andrewcoppin at btinternet.com
Sat Nov 17 09:45:29 EST 2007
Neil Mitchell wrote:
> Hi
>
>
>> The MD5SUM.EXE file I have chokes if you ask it to hash a file in
>> another directory. It will hash from stdin, or from a file in the
>> current directory, but point-blank refuses to hash anything else.
>>
>
> Try http://www.cs.york.ac.uk/fp/yhc/dependencies/UnxUtils.zip - that
> has an MD5SUM program in it that seems to work fine on things in
> different directories. It also has many other great utilities in it.
>
Negative. It gives strange output if the pathname contains any
backslashes. (Each backslash appears twice, and an additional backslash
appears just before the hash value. Very odd...)
I spent a while playing with Google, and found many, many
implementations of MD5. Every single one of them did *something* strange
under certain conditions. Most frustrating! Well anyway, I eventually
settled on a program MD5DEEP.EXE, which seems to work just about well
enough to be useful.
> I'm trying to imagine what mistake the authors of your version of
> MD5SUM must have made to screw up files in different directories, but
> it eludes me...
>
It seems Unix tools are typically compiled for Windows with the aid of a
Unix emulator. These often do all sorts of strange path munging to make
Windows look like Unix. That's probably the source of the problem...
BTW, while I'm here... I sat down and wrote my own MD5 implementation.
It's now 95% working. (The padding algorithm goes wrong for certain
message lengths.) I doubt it'll ever be fast, but I wanted to see how
hard it would be to implement. The hard part, ridiculously enough,
wasn't MD5 itself. It was all the datatype conversions. Nowhere in the
Haskell libraries can I find any of these functions:

  pack8into16   :: [Word8] -> Word16
  pack8into32   :: [Word8] -> Word32
  unpack16into8 :: Word16 -> [Word8]
  unpack32into8 :: Word32 -> [Word8]
  pack8into16s  :: [Word8] -> [Word16]
  pack8into32s  :: [Word8] -> [Word32]
  etc.

I had to write all these myself, by hand, and then check that I got
everything the right way round and so forth. (And every now and then I
find an edge case where these functions go wrong.) Of course, on top of
that, MD5 uses something really stupid called "little-endian integers".
In other words, to interpret the data, you read the 32-bit words in
forward order, but the bytes within each word backwards (least
significant byte first). Really awkward to get right!
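
A minimal sketch of how the conversions listed above might be written
with Data.Bits, assuming the little-endian byte order that MD5 uses.
The helpers packLE, unpackLE and chunks are hypothetical names made up
for this sketch, and the handling of wrong-length lists (short lists
are zero-extended, surplus bytes are shifted out and lost) is an
assumption, not something taken from the implementation discussed here.

```haskell
import Data.Bits (Bits, shiftL, shiftR, (.|.))
import Data.Word (Word8, Word16, Word32)

-- Hypothetical helper: fold bytes into a word, with the first byte
-- of the list becoming the least significant byte (little-endian).
packLE :: (Bits a, Num a) => [Word8] -> a
packLE = foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- Hypothetical helper: split a word into n bytes, least significant
-- byte first. fromIntegral truncates to the low 8 bits.
unpackLE :: (Bits a, Integral a) => Int -> a -> [Word8]
unpackLE n w = [fromIntegral (w `shiftR` (8 * i)) | i <- [0 .. n - 1]]

-- Hypothetical helper: cut a list into pieces of length n
-- (the last piece may be shorter).
chunks :: Int -> [a] -> [[a]]
chunks _ [] = []
chunks n xs = let (h, t) = splitAt n xs in h : chunks n t

pack8into16 :: [Word8] -> Word16
pack8into16 = packLE

pack8into32 :: [Word8] -> Word32
pack8into32 = packLE

unpack16into8 :: Word16 -> [Word8]
unpack16into8 = unpackLE 2

unpack32into8 :: Word32 -> [Word8]
unpack32into8 = unpackLE 4

pack8into16s :: [Word8] -> [Word16]
pack8into16s = map packLE . chunks 2

pack8into32s :: [Word8] -> [Word32]
pack8into32s = map packLE . chunks 4
```

For example, pack8into32 [0x78, 0x56, 0x34, 0x12] gives 0x12345678,
and unpack32into8 0x12345678 gives the same four bytes back in the
same order.
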
But, after a few hours last night and a few more this morning, I was
able to get the main program to work properly. If I can just straighten
out the message padding code, I'll be all set... Then I can see about
measuring just how slow it is. :-}
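
For reference, the padding step can be sketched roughly as follows.
This follows the rule in the MD5 specification (RFC 1321): append a
single 0x80 byte, then zero bytes until the length is 56 mod 64, then
the original length in bits as a 64-bit little-endian integer. The name
md5Pad is hypothetical, and the sketch works on a plain byte list for
clarity rather than speed; it is not the padding code referred to
above.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word8, Word64)

-- Hypothetical name: pad a message to a multiple of 64 bytes.
md5Pad :: [Word8] -> [Word8]
md5Pad msg = msg ++ [0x80] ++ replicate zeros 0 ++ lengthBytes
  where
    len = length msg
    -- Number of zero bytes so that len + 1 + zeros is 56 mod 64.
    -- Haskell's mod is non-negative for a positive divisor, so this
    -- also covers messages longer than 55 bytes.
    zeros = (55 - len) `mod` 64
    -- Message length in bits, as a 64-bit value (wraps modulo 2^64).
    bitLen = fromIntegral len * 8 :: Word64
    -- The bit length as 8 bytes, least significant byte first.
    lengthBytes = [fromIntegral (bitLen `shiftR` (8 * i)) | i <- [0 .. 7]]
```

The awkward message lengths are the ones from 56 to 63 bytes mod 64:
there is no room left for the length field in the current block, so the
padding spills over into an extra 64-byte block.
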
Most amusing moment: Trying to run the GHC debugger, and then realising
that you have to actually install the new version of GHC first...