[Haskell-cafe] blaze-builder and FlexibleInstances in code that aims to become part of the Haskell platform

Johan Tibell johan.tibell at gmail.com
Fri May 20 17:02:54 CEST 2011


Hi Simon,

On Thu, May 19, 2011 at 10:46 PM, Simon Meier <iridcode at gmail.com> wrote:
>> Write achieves this separation, but it has some costs which I'm not
>> entirely comfortable with.
>>
>> First, it leads to lots of API duplication. For every type (e.g. Word)
>> we want to be able serialize we have two morally identical functions
>>
>>    writeWordhost :: Word -> Write
>>    fromWordhost :: Word -> Builder
>>
>> in the API, where the latter simply calls the former and does some
>> additional "wrapping".
>
> Yes, I agree with this duplication. I'll explain below what we gain
> from it. Note that I factored out the whole Write stuff into its own
> library (system-io-write) for the bytestring integration. Therefore,
> an end-user of bytestring will only see the Builder versions except
> he's doing more low-level stuff to gain some extra performance.

There are (at least) two cases where I think the simple Builder API
must perform well for it to be usable on its own: simple loops and
sequential writes. To be specific, if the following two cases don't
compile into near optimal code, there's a compiler bug we should fix.
First, a simple loop:

    f :: [Word8] -> Builder
    f [] = mempty
    f (x:xs) = singleton x `mappend` xs

This code is already quite low level, there should be enough
information here for the compiler to emit a simple loop with one
buffer bounds check per iteration. Second, a bunch of sequential
writes:

    g :: Word8 -> Word8 -> Word8 -> Word8 -> Builder
    g a b c d = singleton `mappend` (b `mappend` (c `mappend` d))

This ought to compile to a single bounds check followed by for memory writes.

The user shouldn't have to get more low-level than this in these
simple examples. Today this is currently only true for the second
example, which we can solve using rewrite rules. The first example
doesn't work due to the GHC compiler bug I mentioned.


>> Simon, is the reason for this duplication this comment on top of
>> Blaze.ByteString.Builder.Word?
>>
>>    "<snip>"
>
> That's one of the reasons, but not the main one. The core reason is
> that Write's provide
> an interface between implementors of the low-level bit-twiddling
> required to efficiently implement encodings like UTF-8 or Base16 and
> the providers of efficient traversal functions through (streams of)
> Haskell values. For simple traversals like
>
>   fromWrite          :: Write a -> a -> Builder
>   fromWriteList      :: Write a -> [a] -> Builder
>   fromWriteUnfoldr   :: Write b -> (a -> Maybe (b, a)) -> a -> Builder
>
> there might be the option that GHC is clever enough and can find the
> efficient loop. However, for more complicated functions like
>
>   mapWriteByteString :: Write Word8 -> S.ByteString -> Builder
>
> That certainly isn't the case. I'm using quite a few tricks there [3]
> to enable a tight inner loop with few live variables.

Right. So this argues for having an escape hatch, and I agree we
should have one. Write at writeAtMost are both such escape hatches and
I believe them to equal in expressiveness. This shouldn't come as a
surprise as Write is writeAtMost with one argument reified into into a
constructor field:

    writeAtMost :: Int -> (Ptr Word8 -> IO Int) -> IO ()
    data Write = {-# UNPACK #-} !Int (Ptr Word8 -> IO (Ptr Word8))

(That the second argument of writeAtMost is an Int instead of a Ptr
Word8 as in Write is an unimportant difference.)

There are some operational differences.

* The argument to Write can be inspected at runtime, while the
argument to writeAtMost can only be inspected at compile time (by a
rewrite rule).

* Write might exist at runtime, if it's allocation site cannot be seen
by its use site, which hard to guarantee in general (it requires
serious staring at Core). This is not the case for writeAtMost, unless
it's partially applied.

* The second field of Write is lazy. I'm not sure what, if any,
implications this might have for how GHC optimizes the code.

> In my opinion, Writes and Builders have different use-cases and
> different semantics. Providing a type modeling Writes makes therefore
> sense to me. Moreover, note that Writes are built as a compile time
> abstraction. All their definitions are intended to be completely
> inlined and care is taken that the inliner also does so. Therefore,
> they incur no runtime cost.

This is up to the user of the Write abstraction to ensure, as any
function that takes a Write as an argument must have the correct
INLINE incantations applied to make this happen.

Cheers,
Johan



More information about the Haskell-Cafe mailing list