[Haskell-cafe] Attoparsec concatenating combinator

Simon Meier iridcode at gmail.com
Tue Jun 7 21:41:48 CEST 2011


2011/6/7 Bryan O'Sullivan <bos at serpentine.com>:
> On Tue, Jun 7, 2011 at 1:40 AM, Simon Meier <iridcode at gmail.com> wrote:
>>
>> Why would you need 'unsafePerformIO'. You can scrutinise the 'PS'
>> constructors of the slice without dropping down to IO.
>
> True. Oops :-)
>
>>
>> Using a Builder for concatentation makes sense, if you want to exploit
>> that copying a slice of the input array is cheaper right after it has
>> been inspected (its fully cached) than later (as it is done when
>> collecting slices in a list).
>
> When I've measured this in the past, I've found that it's often faster to
> accumulate a list and then run concat at the end than to use blaze-builder
> directly. That was certainly the case wit GHC 6.12; I haven't remeasured
> with 7.0. That's why you'll see that some places in the aeson JSON library
> use blaze-builder, while others manipulate bytestrings directly.

When creating a Builder that you run afterwards, then you essentially
create a list of bytestring concatenations as a closure. It makes
sense that you don't win with such an approach, as the concatenation
still only happens after then end of this list is reached. What you'd
need is to nest attoparsec's Parser monad with the 'Put' monad
provided in the blaze-builder internals [1]. However, I'm not yet sure
how to achieve this in a modular fashion. Perhaps, using iteratee's
the right way might provide a good answer, but that's just guesswork.

The performance characteristics of blaze-builder are such that for
short output sequences working with bytestrings directly is sometimes
favorable, as the setup overhead before the Builder can be executed
needs to be amortized. However, Builders work with a single pass
through the input data, while many bytestring operations require two
passes. For example, 'Data.ByteString.pack' first determines the
length of the input and only afterwards writes the data to memory.
Using blaze-builder's 'fromWord8s' requires only a single traversal
and is faster for lists of length >64 on my 32-bit Core 2 Duo machine.
I expect a similar effect for concatenating lists of strict
bytestrings.

[1] http://hackage.haskell.org/packages/archive/blaze-builder/0.3.0.1/doc/html/Blaze-ByteString-Builder-Internal.html#g:3



More information about the Haskell-Cafe mailing list