[Haskell-cafe] Re: [Haskell] installing streams library

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Sun May 28 13:04:54 EDT 2006


On Sun, 2006-05-28 at 20:40 +0400, Bulat Ziganshin wrote:
> Hello Duncan,
> 
> Sunday, May 28, 2006, 3:05:53 PM, you wrote:
> 
> >> createMemBuf does exactly this :)
> 
> > One of the areas where we found that Data.ByteString.Lazy was performing
> > better than the ordinary Data.ByteString is cases like this where we do
> > not know beforehand how big the buffer will be.
> 
> i like your idea of using ByteString.Lazy to implement fast and
> easy-to-use i/o, although i don't think that speed will be in 10% of C :)

Actually Donald recently posted a benchmark (to the libraries mailing
list) of ByteString.Lazy where we were getting within 6% of C. That was
on a 10GB file.

> ghc by itself generates code that is several times slower than
> gcc-generated and you can't do anything against this, except for
> implementing everything in C.

ByteString does use C code in places and ByteString.Lazy inherits the
benefits of that. Both modules also use array fusion to combine
pipelines of loops into a single loop. This has big performance
benefits. This is not something you can easily do in C. Using fusion
also means one doesn't have to allocate so many buffers, and some
transformations can work in-place on intermediate buffers.
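
As a rough illustration of what fusing a pipeline of loops means, here is
a minimal sketch. The names dashToSpace, pipeline and fused are made up
for this example, and whether a given version of the library actually
rewrites the first form into the second depends on its rewrite rules. The
sketch only shows the equivalence that map/map fusion relies on.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (toUpper)

-- Hypothetical helper for this example: replace '-' with a space.
dashToSpace :: Char -> Char
dashToSpace c = if c == '-' then ' ' else c

-- Written as two separate passes; taken literally, this allocates an
-- intermediate buffer between the two maps.
pipeline :: B.ByteString -> B.ByteString
pipeline = B.map toUpper . B.map dashToSpace

-- The single loop that fusion aims for: one pass, no intermediate
-- buffer. It computes the same result as 'pipeline'.
fused :: B.ByteString -> B.ByteString
fused = B.map (toUpper . dashToSpace)
```

Both functions give the same answer on any input; the difference is only
in how many traversals and buffers are needed to get there.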

You might be able to use similar fusion techniques for layering Streams.

> but, nevertheless, i think that this is a great idea - much faster
> than String-based hGetContents. it should help in numerous programs
> that need fast-and-dirty text processing, although it needs further
> development of library in order to implement for LazyByteString full
> String-like interface

Data.ByteString.Lazy implements more or less the same interface as
Data.ByteString which in turn implements almost the same interface as
Data.List. We're still working on improving the API.
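
To make the "same interface" point concrete, here is a small sketch of
one function written three times, once per type. The firstWord* names
are invented for this example; takeWhile and dropWhile are taken from
the Prelude and from the Char8 variants of the two ByteString modules.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Char (isSpace)

-- Skip leading whitespace, then take characters up to the next space.
firstWordList :: String -> String
firstWordList = takeWhile (not . isSpace) . dropWhile isSpace

-- Same definition over a strict ByteString; only the qualifier changes.
firstWordStrict :: B.ByteString -> B.ByteString
firstWordStrict = B.takeWhile (not . isSpace) . B.dropWhile isSpace

-- Same definition again over a lazy ByteString.
firstWordLazy :: BL.ByteString -> BL.ByteString
firstWordLazy = BL.takeWhile (not . isSpace) . BL.dropWhile isSpace
```

Porting String code is then mostly a matter of changing the import
qualifier, as long as the functions used are among those already
provided.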

> > If you have to use a single contiguous buffer then it involves guessing
> > and possible reallocation. With a 'chunked' representation like
> > ByteString.Lazy it's not a problem as we just allocate another chunk and
> > start to fill that.
> 
> > Obvious examples include concat and getContents.
> 
> > Would the same make sense for a MemBuf stream? Why does it need to be a
> > single large buffer? Couldn't it be a list of buffers?
> 
> i also had this idea and it can be implemented in 1 day, i think (when
> someone will need this). but this is not for Jeremy, he needs a
> contiguous buffer for interfacing with DBD.

The approach we're taking for Data.ByteString.Lazy is that when a
contiguous buffer is needed (e.g. for passing to foreign code) we
convert it to an ordinary strict Data.ByteString.
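
A minimal sketch of that conversion, assuming only toChunks, concat and
useAsCStringLen from the ByteString modules. The names toContiguous and
withContiguous are made up for this example and are not part of the
library.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Foreign.C.String (CStringLen)

-- Join all chunks into one contiguous strict ByteString.
-- This copies the data, so it costs time and memory proportional to the
-- total length.
toContiguous :: BL.ByteString -> B.ByteString
toContiguous = B.concat . BL.toChunks

-- Run an action that expects a pointer and a length, such as a wrapper
-- around a foreign call, on the contiguous copy.
withContiguous :: BL.ByteString -> (CStringLen -> IO a) -> IO a
withContiguous lazy = B.useAsCStringLen (toContiguous lazy)
```

The copy is only paid at the boundary where a single buffer is really
required; everywhere else the data can stay chunked.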

> btw, it's better to use UArray instead of list 

Not if you want to generate or consume the stream lazily.
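
A small sketch of why the list matters: because the chunks sit in an
ordinary lazy list, a consumer can take a prefix of a stream whose
producer never finishes. The names endless and prefix are made up for
this example.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as BL

-- An unbounded stream built from an infinite lazy list of chunks.
endless :: BL.ByteString
endless = BL.fromChunks (cycle [B.pack "abc"])

-- Only the chunks needed for the first 7 bytes are ever demanded.
prefix :: BL.ByteString
prefix = BL.take 7 endless
```

With an unboxed array of buffers the whole array would have to exist
before the first element could be read, so this kind of incremental
production is not available.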

Duncan


