[Haskell-cafe] Architecturally flawed
Andrew Coppin
andrewcoppin at btinternet.com
Thu Jul 10 14:35:21 EDT 2008
OK, so I just spent an entire day trying to write some code, and after
hours of struggling I have something that's semi-working but very ugly
and rather unreliable. My usual guideline for Haskell programming is "if
it seems hard, you're doing it wrong"...
After many hours of effort, I came up with these:
data Writer x
instance Monad Writer
run_writer :: Writer () -> ByteString
write1 :: Bool -> Writer ()
write8 :: Word8 -> Writer ()
write16 :: Word16 -> Writer ()
write32 :: Word32 -> Writer ()
write64 :: Word64 -> Writer ()
writeN :: Word8 -> Word32 -> Writer ()
data Reader x
instance Monad Reader
run_reader :: Reader x -> ByteString -> x
is_EOF :: Reader Bool
read1 :: Reader Bool
read8 :: Reader Word8
read16 :: Reader Word16
read32 :: Reader Word32
read64 :: Reader Word64
readN :: Word8 -> Reader Word32
Notice that, unlike the Binary package, all of these are bit-aligned
rather than byte-aligned. This is what took so much effort. I strived to
make each operation as efficient as possible - by performing as few
steps as possible. It would have been *far* easier to, say, implement
read8 by calling read1 eight times and then packing the bits back into a
byte. But that wouldn't be very fast. So actually read8 reads a pair of
bytes, bit-shifts them as required, and then ANDs them together. And so
on. (Note that the Writer monad uses Data.Binary.Builder (only) from the
Binary package. I assume this is more efficient...)
I even sat down and wrote a battery of QuickCheck properties because -
as you can imagine - there's a hell of a lot to go wrong here! After
many hours, I managed to get it to pass all tests...
Next I decided to write an LZW compressor. Here's what I came up with:
data EncodeLZW symbol
eLZW_start :: (Ord symbol) => EncodeLZW symbol
eLZW_encode :: (Ord symbol) => EncodeLZW symbol -> symbol -> Writer
(EncodeLZW symbol)
eLZW_end :: (Ord symbol) => EncodeLZW symbol -> Writer ()
data DecodeLZW symbol
dLZW_start :: DecodeLZW symbol
dLZW_decode :: DecodeLZW symbol -> Reader (DecodeLZW symbol, symbol)
So each step of the encoder (possibly) writes some bits, and also
returns a new encoder state. The decoder is similar. (At least one
program bug was caused by the wrong version of the state being used
somewhere!)
Then I tried to run the code, and oh look, there's a bug somewhere.
Emphasis "somewhere". At this point I have several KB of code and no
idea how in hell to test it. What I *want* to do is single-step through
the encoder algorithm, watch it write bits out and update state, and
then single-step through the decoder watching it read bits in and update
its state, etc. But I can't. All the encoder does is return a giant
Writer object which you can't look inside, only run. And likewise for
the decoder.
I could try GHC's new debugger. But my experiences with it so far have
shown that for all but the most trivial programs possible, it becomes
intractably difficult to figure out what the debugger is actually
showing you. I actually tried to debug a "normal" LZW implementation
once - one that didn't involve two highly convoluted custom monads and a
stateful execution model with manually threaded state. This is *not* my
idea of a fun time...
In a "normal" programming language, at this point you would sprinkle a
few print statements around so you can see the intermediate steps the
program is taking. But I can't. I'm in the wrong monad. Curses!
In the end, the only solution I could come up with was to design a
second, "hacked" version of the two monads. Instead of importing one
module, you import another that provides the same interface. However,
now Reader and Writer are aliases to IO. All write requests cause
pretty-printed binary to be sent to stdout, and all read requests pop up
a prompt for input from stdin. It worked reasonably well in that I could
now add the vitally necessary print statements, and see intermediate
values and so forth... It wasn't very pretty though.
At this point I decided that lugging 5 individual functions around was
silly, and I should write a class:
class Encoder e where
encoder_start :: e x
encode :: e x -> x -> Writer (e x)
encoder_end :: e x -> Writer ()
class Decoder d where
decoder_start :: d x
decode :: d x -> Reader (d x, x)
10 points to the first person to spot the fatal flaw in this plan...
Yep, that's right. The type system won't accept this. The reason? x
needs an Ord context.
If you turn on multi-parameter type classes *and* flexible instances,
apparently this works:
class Encoder e x where...
instance (Ord x) => Encoder EncodeLZW x where...
(I haven't tried but I *presume* it means what I think it does...)
Alternatively, you can avoid using dangerous type system extensions and
just write
class Encoder e where
encode :: (Symbol x) => e x -> x -> Writer (e x)
...
and make Symbol encompass everything any possible encoder could want.
(In reality, x is almost always Word8 or something. But I dislike loss
of generality just for the hell of it...)
So at least I got that part fixed. Heh. But now I find myself worrying
about yet *another* problem: is Writer lazy enough?
I mean, the idea is that you can write a program that lazily reads from
a file, pushes it through a Writer, and lazily writes the result back to
another file. The thing should chug along reasonably fast and use a
constant amount of RAM. But all this is only true of Writer is actually
lazy enough. I have a horrible feeling that all the complicated Origami
inside makes it too strict. And I have no way to check!
Actually, thinking about it, is Reader lazy enough? You call run_reader
and it hands over your data, but if that data is, say, a list, will
run_reader build the entire damn list before it'll hand it over? Or will
the monadic code by called as-needed to generate the list elements?
Obviously I desire the latter, but I'm really not sure what the actual
behaviour is...
In summary, I've spent several days coding, and I still don't have a
single algorithm working. I've spent all my time wrestling with the
mundane details of infrastructure rather than the interesting algorithms
I actually set out to implement. This makes me very sad. :-(
If anybody can think of a better set of abstractions or another way to
do debugging, I'd be interested to hear about it.
(This would all be so trivial in an OO language. Just make an Encoder
object that updates its own state internally and talks to a Source
object and a Destination object for its data...)
More information about the Haskell-Cafe
mailing list