runtime fusion for Data.ByteString.cons ?
Claus Reinke
claus.reinke at talk21.com
Sun Nov 19 12:54:14 EST 2006
I noticed that ByteString is drastically slower than String if I use
cons a lot. according to the source, that is expected because of
the memcpy for the second parameter.
but it seems to me that construction should be able to play the
dual trick to deconstruction (which does not copy the tail, but
returns an indirection into the original list).
roughly speaking: when constructing a ByteString via cons h t,
we know the length of the result (1+length t), and instead of
creating t in a separate storage location then copying it over to
the result, we could try to create it inplace. or rather, we could
delay construction of ByteStrings until we know whether we
need to allocate fresh memory for them or whether they can
be created by filling some context.
this rough idea runs into a couple of issues in practice:
- first of all, it seems to be a runtime fusion (unless we do whole
program optimization, simplifier rewrites won't do, although
unfolding recursions might still expose some opportunities for
a static variant of this fusion)
- if t is shared, we'd like to redirect these shared references to
an indirection into the tail of cons h t
ignoring the second point for now, if we look into the source
for cons, we find something like:
cons :: Word8 -> ByteString -> ByteString
cons c (PS x s l) = unsafeCreate (l+1) $ \p-> ... poke p c; memcpy (p+1)..
now, let's imagine a pre-ByteString as a not-yet allocated ByteString:
data PreBS = PrePS l f
so that
createBS :: PreBS -> ByteString
createBS (PrePS l f) = unsafeCreate l f
consPre :: Word8 -> ByteString -> PrePS
consPre c (PS x s l) = PrePS (l+1) $ \p-> ... poke p c; memcpy (p+1)..
cons :: Word8 -> ByteString -> ByteString
cons c bs = createBS (consPre c bs)
then we could express our fusion as
consPre c (create (PrePS l f)) = PrePS (l+1) $ \p-> ... poke p c; f (p+1)..
in other words, in a typical map-like recursion scheme, we do not create,
copy & release the tails recursively, but delay creation until we know where
to embed our ByteString, at which point we do a sequence of pokes, no
memcpy. but note that we are matching on a function application here, even
though we are not in a simplifier rule, so this doesn't work as it stands..
this is as far as I got so far.. (see attached example for a manual use of
pre-ByteStrings to speed up a map). now my questions for you:-)
- does this make sense?
- can it be made to work?
as we probably cannot redirect shared references to t in (cons h t), can
we identify the situations where t is not shared, as in a map, or can we
just ignore any shared references (they will point to a "create (PrePS ..)"
combination, and should just keep working, since we bypass those
combinations instead of rewriting them)?
a tentative idea would be to overload create so that it produces a proper,
allocated ByteString where such is expected, but can also just pass through
the PreBS where the context can handle it?
class Create r where
create :: PreBS -> r
instance Create ByteString where
create (PrePS l f) = unsafeCreate l f
instance Create PreBS where
create = id
consPre :: Word8 -> ByteString -> PrePS
consPre c (PS x s l) = PrePS (l+1) $ \p-> ... poke p c; memcpy (p+1)..
cons :: Word8 -> ByteString -> ByteString
cons c bs = create (consPre c bs)
consPre' :: Word8 -> PrePS -> PrePS
consPre' c (PrePS l f) = PrePS (l+1) $ \p-> ... poke p c; f (p+1)..
so that "consPre c (create (PrePS ..))" works as normal, while
"consPre' c (create (PrePS ..))" uses the fusion path.
or something like that;-)
Claus
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BScons.hs
Type: application/octet-stream
Size: 2066 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/glasgow-haskell-users/attachments/20061119/9af49e0c/BScons.obj
More information about the Glasgow-haskell-users
mailing list