[Haskell-cafe] blaze-builder and FlexibleInstances in code that aims to become part of the Haskell platform

Fri May 20 09:38:01 CEST 2011

2011/5/19 Antoine Latter <aslatter at gmail.com>:
> On Thu, May 19, 2011 at 3:06 PM, Simon Meier <iridcode at gmail.com> wrote:
>
>> The core problem that drove me towards this solution is the abundance
>> of different IntX and WordX types. Each of them requiring a separate
>> Write for big-endian, little-endian, host-endian, lower-case-hex, and
>> upper-case-hex encodings; i.e., currently, there are
>>
>> int8BE   :: Write Int8
>> int16BE :: Write Int16
>> int32BE :: Write Int32
>> ...
>> hexLowerInt8 :: Write Int8
>> ...
>>
>> and so on. As you can see
>> (http://hackage.haskell.org/packages/archive/blaze-builder/0.3.0.1/doc/html/Blaze-ByteString-Builder-Word.html)
>> this approach clutters the public API quite a bit. Hence, I'm thinking
>> of using a separate type-class for each encoding; i.e.,
>>
>
> If Johan's work on Data.Binary and rewrite rules works out, then it
> would cut the exposed API in half, which helps.
>
> We could then use the module and package system to further keep the
> API clean: builders which output a specific encoding could live
> in separate modules. This could also keep the names of the functions
> short, as well.
>
> That would require coming up with logical divisions for the functions
> you're creating, and I don't understand the big picture enough to help
> with that.
>
>>  class BigEndian a where
>>    bigEndian :: Write a
>>
>> This collapses the big-endian encodings of all 10 bounded-size (signed
>> and unsigned) integer types under a single name with a well-defined
>> semantics. Moreover, it's standard Haskell 98. For the hex-encodings,
>> I'm thinking about providing type-classes
>>
>>  class HexLower a where
>>    hexLower :: Write a
>>
>>  class HexLowerNoLead a where
>>    hexLowerNoLead :: Write a
>>
>>  ...
>>
>> for ASCII encoding and each of the standard Unicode encodings in a
>> separate module. The user can then select the right ones using
>> qualified imports. In most cases, he won't even need qualification, as
>> mixing different character encodings is seldom needed.
>>
>
> I think we may be at cross-purposes here, and might not even be
> discussing the same thing - I would imagine that any sort of 'Builder'
> type included in the bytestring package would only provide the core
> combinators for packing data into low-level binary formats, so
> discussions about text encoding issues, converting to hexadecimal and
> HTML escaping are going over my head.
>
> This seems like what the 'text' package was written for - to separate
> out the construction of textual data from choosing its encoding.
>
> Are there use-cases where the 'text' package is too slow for this sort
> of approach?
>
> Take care,
> Antoine
>
>> What do you think about such an interface? Is there another hidden
>> catch that I'm not seeing? BTW, note that Writes are a pure compile-time
>> abstraction and are meant to be completely inlined. In typical use
>> cases, there's no efficiency overhead stemming from these typeclasses.
>>
>> best regards,
>> Simon
>>
>

Yes, for example, using the current 'text' package is sub-optimal for
dynamically generating UTF-8 encoded HTML pages. The job is simple: the
data, which is originally held in standard Haskell types (e.g., String),
needs to be HTML escaped, UTF-8 encoded, and interspersed with tags.

For blaze-html using blaze-builder, the cost for a tag is a memcpy of
the corresponding tag. The cost for a single character is one call to
a nested case statement that determines whether the char needs to be
escaped (one memcpy of its escaped version) or which bytes need to be
written to UTF-8 encode the char. This solution works with a single
output buffer.
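
To make the per-character decision concrete, here is a minimal sketch
using only the base library. It is not blaze-builder's actual code:
blaze-builder writes directly into an output buffer, whereas this sketch
returns a list of bytes for readability. The names htmlChar, htmlString,
utf8Char and ascii are hypothetical, and the set of escaped characters
is an assumption.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Bytes of an ASCII-only string (used for the escape sequences).
ascii :: String -> [Word8]
ascii = map (fromIntegral . ord)

-- | UTF-8 encode one character. Surrogate code points are not
-- treated specially in this sketch.
utf8Char :: Char -> [Word8]
utf8Char c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading bits of the code point, placed in the first byte.
    hi :: Int -> Word8
    hi s = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying six bits of the code point.
    cont :: Int -> Word8
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- | One decision per character: emit the escaped form, or else the
-- UTF-8 bytes of the character itself.
htmlChar :: Char -> [Word8]
htmlChar '<'  = ascii "&lt;"
htmlChar '>'  = ascii "&gt;"
htmlChar '&'  = ascii "&amp;"
htmlChar '"'  = ascii "&quot;"
htmlChar '\'' = ascii "&#39;"
htmlChar c    = utf8Char c

-- | Escape and encode a whole String in a single pass.
htmlString :: String -> [Word8]
htmlString = concatMap htmlChar
```

Escaping and encoding happen in the same traversal, so each character
of the dynamic data is inspected exactly once.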

For a solution using the text library, the cost of creating the
underlying UTF-16 array is similar to the cost for blaze-builder.
However, you now also need to UTF-8 encode the UTF-16 array. This
more than doubles the cost, because you now also have to inspect every
character of every tag. For ~50% of your data you suddenly have to
spend a lot more effort!

I agree that the text library is a good choice for representing the
Unicode data of an application. However, for high-performance
applications it pays off to think of the output in binary form and
exploit the shortcuts this offers. That's where blaze-builder and the
like come in.
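
As an illustration of the BigEndian class proposed in the quoted message
above, here is a minimal sketch using only the base library. The Write
type below is a hypothetical stand-in that maps a value to the bytes it
would emit; blaze-builder's real Write type is different. Only three
instances are shown. Note that none of them needs FlexibleInstances.

```haskell
import Data.Bits (shiftR)
import Data.Int (Int16)
import Data.Word (Word8, Word16, Word32)

-- Hypothetical stand-in for blaze-builder's Write type: it simply
-- maps a value to the list of bytes it would write.
newtype Write a = Write { runWrite :: a -> [Word8] }

-- One class, one name for the big-endian encoding of every
-- fixed-size integer type.
class BigEndian a where
  bigEndian :: Write a

instance BigEndian Word16 where
  -- Most significant byte first; fromIntegral truncates to 8 bits.
  bigEndian = Write $ \w -> [fromIntegral (w `shiftR` 8), fromIntegral w]

instance BigEndian Word32 where
  bigEndian = Write $ \w -> [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]

instance BigEndian Int16 where
  -- Reuse the unsigned encoding of the two's-complement bit pattern.
  bigEndian = Write $ \i -> runWrite bigEndian (fromIntegral i :: Word16)
```

With this in place, runWrite bigEndian picks the encoding from the
argument type, so int16BE, int32BE and so on collapse into one name.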

thanks for your input,
Simon