[Haskell-cafe] blaze-builder and FlexibleInstances in code that aims to become part of the Haskell platform

Thu May 19 22:06:38 CEST 2011

Hi Antoine, thanks for your feedback.

2011/5/18 Antoine Latter <aslatter at gmail.com>:
> On Wed, May 18, 2011 at 12:32 PM, Simon Meier <iridcode at gmail.com> wrote:
>> Hello Haskell-Cafe,
>>
>> There are many providers of Writes. Each bounded-length-encoding of a
>> standard Haskell value is likely to have a corresponding Write. For
>> example, encoding an Int32 as a big-endian, little-endian, and
>> host-endian byte-sequence is currently achieved with the following
>> three functions.
>>
>>  writeInt32BE :: Write Int32
>>  writeInt32LE :: Write Int32
>>  writeInt32HE :: Write Int32
>>
>> I would like to avoid naming all these encodings individually.
>> Especially, as the situation becomes worse for more elaborate
>> encodings like hexadecimal encodings. There, we encounter encodings
>> like the utf8-encoding of the hexadecimal-encoding with lower-case
>> letters of an Int32.
>>
>>  writeInt32HexLowerUtf8 :: Write Int32
>>
>> I really don't like that. Therefore, I'm thinking about the following
>> solution based on type-classes. We introduce a single typeclass
>>
>>  class Writable a where
>>      write :: Write a
>>
>> and use a bunch of newtypes to denote our encodings.
>>
>>  newtype Ascii7   a = Ascii7   { unAscii7   :: a }
>>  newtype Utf8     a = Utf8     { unUtf8     :: a }
>>  newtype HexUpper a = HexUpper { unHexUpper :: a }
>>  newtype HexLower a = HexLower { unHexLower :: a }
>>  ...
>>
>> Assuming FlexibleInstances, we can write encodings like the above
>> hex-encoding as instances
>>
>>  instance Writable (Utf8 (HexLower Int32)) where
>>    write = ...
>>
>> This composes rather nicely and allows the implementations to exploit
>> special properties of the involved data. For example, if we also had a
>> HTML escaping marker
>>
>>  newtype Html     a = Html     { unHtml     :: a }
>>
>> Then, the instance
>>
>>  instance Writable (Utf8 (Html (HexLower Int32))) where
>>    write (Utf8 (Html (HexLower i))) = write (Utf8 (HexLower i))
>
> If I were authoring the above code, I don't see why that code is any
> easier to write or easier to read than:
>
>> utf8HtmlHexLower i = utf8HexLower i
>
> And if I were using the encoding functions, I would much prefer to see:
>
>> utf8HtmlHexLower magicNumber
>
> In my code, instead of:
>
>> write $ Utf8 $ Html $ HexLower magicNumber
>
> In addition, this would be difficult for me as a developer using the
> proposed library, because I would have no way to know which
> combinations of newtypes are valid from reading the haddocks.
>
> Maybe I'm missing something fundamental, but this approach seems more
> cumbersome to me as a library author (more boilerplate) and as the
> user of the library (less clarity in the docs and in the resultant
> code).

Hmm, that's a valid point. The documentation issue in particular
bothers me.

The core problem that drove me towards this solution is the abundance
of different IntX and WordX types. Each of them requires a separate
Write for the big-endian, little-endian, host-endian, lower-case-hex,
and upper-case-hex encodings; i.e., currently, there are

int8BE   :: Write Int8
int16BE :: Write Int16
int32BE :: Write Int32
...
hexLowerInt8 :: Write Int8
...

and so on. As you can see
(http://hackage.haskell.org/packages/archive/blaze-builder/0.3.0.1/doc/html/Blaze-ByteString-Builder-Word.html)
this approach clutters the public API quite a bit. Hence, I'm thinking
of using a separate type-class for each encoding; i.e.,

  class BigEndian a where
    bigEndian :: Write a

This collapses the big-endian encodings of all 10 bounded-size (signed
and unsigned) integer types under a single name with well-defined
semantics. Moreover, it's standard Haskell 98. For the hex-encodings,
I'm thinking about providing type-classes

  class HexLower a where
    hexLower :: Write a

  class HexLowerNoLead a where
    hexLowerNoLead :: Write a

  ...

for ASCII encoding and each of the standard Unicode encodings in a
separate module. The user can then select the right ones using
qualified imports. In most cases, he won't even need qualification, as
mixing different character encodings is rarely necessary.
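
To make the class-per-encoding idea concrete, here is a minimal sketch.
It assumes a toy Write type (a plain function from a value to its
bytes), which is only a stand-in and not the actual blaze-builder
representation. The helper bytesBE and the choice of instances are
likewise hypothetical and only illustrate the shape of the interface.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (intToDigit)
import Data.Int (Int32)
import Data.Word (Word8, Word16, Word32)

-- Toy stand-in for the library's Write type (an assumption of this
-- sketch): a pure function from a value to its encoded bytes.
newtype Write a = Write { runWrite :: a -> [Word8] }

-- One class per encoding: a single name covers every integer type.
class BigEndian a where
  bigEndian :: Write a

-- Hypothetical helper: the n low-order bytes of x, most significant
-- byte first. Going through Integer gives two's-complement bytes for
-- negative values as well.
bytesBE :: Integral a => Int -> a -> [Word8]
bytesBE n x =
  [ fromIntegral ((toInteger x `shiftR` (8 * i)) .&. 0xff)
  | i <- [n - 1, n - 2 .. 0] ]

instance BigEndian Word16 where bigEndian = Write (bytesBE 2)
instance BigEndian Word32 where bigEndian = Write (bytesBE 4)
instance BigEndian Int32  where bigEndian = Write (bytesBE 4)

-- The ASCII variant of the lower-case hex class; a Utf16 variant
-- would live in a separate module under the same class name.
class HexLower a where
  hexLower :: Write a

-- Two lower-case hex digits per byte, as ASCII codes.
instance HexLower Word8 where
  hexLower = Write $ \w -> map hexDigit [w `shiftR` 4, w .&. 0x0f]
    where
      hexDigit = fromIntegral . fromEnum . intToDigit . fromIntegral
```

Loaded in GHCi, runWrite bigEndian (0x0102 :: Word16) yields [1,2], and
runWrite hexLower (0xaf :: Word8) yields [97,102], the ASCII codes of
'a' and 'f'. The instance is selected from the argument type alone, so
no per-type function name is needed, and the instance list in the
haddocks shows which types are supported.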

What do you think about such an interface? Is there another hidden
catch that I'm not seeing? BTW, note that Writes are a pure
compile-time abstraction and are meant to be completely inlined. In
typical use cases, there's no efficiency overhead stemming from these
typeclasses.
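
The inlining point can be sketched as follows, again assuming a toy
Write type rather than the real one. The function encodePort is a
hypothetical call site. Because its argument type is fixed, GHC can
choose the instance at compile time, and the INLINE pragma asks it to
inline the method there when optimisation is enabled. Whether this
removes all overhead in a given program is best checked by inspecting
the generated Core.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word8, Word16)

-- Toy stand-in for the library's Write type (an assumption of this
-- sketch), as a function from a value to its encoded bytes.
newtype Write a = Write { runWrite :: a -> [Word8] }

class BigEndian a where
  bigEndian :: Write a

instance BigEndian Word16 where
  -- Ask GHC to inline this method at its use sites.
  {-# INLINE bigEndian #-}
  bigEndian = Write $ \w -> [fromIntegral (w `shiftR` 8), fromIntegral w]

-- Hypothetical call site at a concrete type: the BigEndian Word16
-- instance is known statically, so no dictionary has to be passed
-- to encodePort at run time.
encodePort :: Word16 -> [Word8]
encodePort = runWrite bigEndian
```

Here encodePort 8080 evaluates to [31,144], i.e. the bytes 0x1F and
0x90.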

best regards,
Simon