[Haskell-cafe] Abstraction leak
Andrew Coppin
andrewcoppin at btinternet.com
Fri Jun 29 14:39:28 EDT 2007
...and again today I found myself trying to do something that would be
very easy in an imperative language, but I cannot think of a single good
way of doing it in Haskell. Hopfully somebody can give me some hints.
I'm writing a whole bunch of data compression programs. In particular,
one of them was a Huffman encoder. The encoder scans the input file,
computes symbol frequencies, and builds a canonical Huffman table, which
it then dumps to file. The actual compressed data goes after that.
The encoder consists of two functions. One takes the Huffman table and
serializes it. The other does the actual encoding. The (++) operator
joins the two together before going to file. Getting the data back is
mildly more tricky. There's a decoder function that takes the complete
data stream, parses out the Huffman table, and returns a pair containing
the Huffman table and the remaining data. Then a second decoder function
actually does the Huffman decoding on this data.
So far, it's all pretty easy. The size of the Huffman table determins
how many bytes the decoder needs to read, so it's easy to figure out
where the table ends.
And then I had a brainwave. I altered the encoder so that the output
from the Huffman table serialize function codes through a simple RLE
encoder before going to file. Then a went and altered the decoder to...
uh... oops.
Now I have a problem. It's easy enough to pass the entire data stream
through an RLE decoder and feed that to the Huffman table deserialize
function, and it will give be back the table. But I now have *no clue*
where the table ends in the original stream!
If this was Java, you would write all these compression and
decompression stages as stream wrappers. So you wrap the raw input
stream with an RLE decoder, have the function read the Huffman table,
and then take off the RLE decoder and process the rest of the stream.
However, in Haskell, the reading and deserializing of the Huffman table
doesn't so anything detectable to the original data stream, so you can't
tell how much of it the RLE decoder actually processed. Um... help!
(What I did in the end was to write the size of the RLE encoded Huffman
table to the file and read it back in the decoder. But this shouldn't
actually be necessary...)
More information about the Haskell-Cafe
mailing list