[Haskell-cafe] Using Data.Binary for compression

Don Stewart dons at galois.com
Thu Nov 15 01:21:32 EST 2007


stefanor:
> On Wed, Nov 14, 2007 at 10:03:52PM -0800, Chad Scherrer wrote:
> > Hi,
> > 
> > I'd like to be able to use Data.Binary (or similar) for compression.
> > Say I have an abstract type Symbol, and for each value of Symbol I
> > have a representation in terms of some number of bits. For compression
> > to be efficient, commonly-used Symbols should have very short
> > representations, while less common ones can be longer.
> ...
> > (1) Am I reinventing the wheel? I haven't seen anything like this, but
> > it would be nice to be a bit more certain.
> > 
> > (2) This seems like it will work ok, but the feel is not as clean as
> > the current Data.Binary interface. Is there something I'm missing that
> > might make it easier to integrate this?
> > 
> > (3) Right now this is just proof of concept, but eventually I'd like
> > to do some performance tuning, and it would be nice to have a
> > representation that's amenable to this. Any thoughts on speeding this
> > up while keeping the interface reasonably clean would be much
> > appreciated.
> 
> Almost all 'real users' just use Codec.Compression.GZip.  It's very
> fast, very compositional, and (perhaps surprisingly) almost as effective
> as application-specific schemes.

I was about to say the same thing. It's so much simpler to use Duncan's
carefully written zlib binding,

    import Data.Binary
    import Codec.Compression.GZip
    import qualified Data.ByteString.Lazy as L

    main = L.writeFile "log.gz" . compress . encode $ [1..10::Int]

Simple, purely functional, fast.
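The round trip back from disk is just the mirror image: decompress the
bytes, then decode them. A small self-contained sketch (the file name
"log.gz" is only an example):

    import Data.Binary (decode, encode)
    import Codec.Compression.GZip (compress, decompress)
    import qualified Data.ByteString.Lazy as L

    main :: IO ()
    main = do
        -- write the compressed binary encoding of a list
        L.writeFile "log.gz" . compress . encode $ [1..10 :: Int]
        -- read it back: decompress the bytes, then decode the value
        xs <- decode . decompress <$> L.readFile "log.gz"
        print (xs :: [Int])    -- prints [1,2,3,4,5,6,7,8,9,10]

Note the type annotation on the result of decode: Data.Binary needs to
know which Binary instance to use when reading the bytes back.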

-- Don

