Haskell Platform Proposal: add the 'text' library

Wed Sep 8 05:56:01 EDT 2010

On 8 September 2010 05:53, Bryan O'Sullivan <bos at serpentine.com> wrote:
> On Tue, Sep 7, 2010 at 8:58 PM, Krasimir Angelov <kr.angelov at gmail.com>
> wrote:
>>
>> Exactly. But it is probably possible to make a version of text which,
>> with 6.10, uses its own copy of the routines and, with 6.12, uses the
>> routines in base.
>
> It might be possible, but I am not going to do it :-)

In the longer term I would also like to see these unified, but I don't
think it has to be done immediately. It will require more changes in
the TextEncoding stuff than in the text package. In particular,
TextEncoding will need to be made pure, e.g. by using the ST monad
rather than the IO monad it currently uses. That way, I hope, the same
encoding stuff can be used both for IO Handles and for pure
conversions, and it can perform well in both use cases.

>> For now at least the API should be made compatible with base.
>
> I'm afraid not. The TextEncoding type ties encoding and decoding together,
> when in pure code you need just one or the other. The TextEncoding design is
> fine for read/write Handles, where you may need both, but it does not make
> sense for pure code, where the current API provided by text is more
> appropriate.

I have to say I don't understand this. It's easy to use just one
direction of encode/decode. Are you saying there are encodings where
it only makes sense to implement one direction? Or are you saying that
writing decodeUtf8 :: ByteString -> Text is just that much nicer than
writing decode utf8 :: ByteString -> Text ?

Here is a possible solution: keep the current encodeFoo/decodeFoo in
Data.Text.Encoding. Later, once we have a sensible, reusable
TextEncoding abstraction (e.g. by pulling it out of GHC.IO.* and making
it use ST so it can be pure), we add to Data.Text.Encoding:

encode :: TextEncoding -> Text -> ByteString
decode :: TextEncoding -> ByteString -> Text
decodeWith :: TextEncoding -> OnDecodeError -> ByteString -> Text

and internally redefine:

decodeUtf8 = decode utf8  -- or is it utf8_bom ?
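The layering proposed above can be sketched as follows. This is a hypothetical sketch, not the text package's or GHC's API. It uses base-only stand-ins (`[Word8]` for ByteString, `String` for Text), a made-up pure `TextEncoding` record with fields `teEncode`/`teDecode`, and a toy `ascii` encoding in place of `utf8` to keep it short. `OnDecodeError` is given the same shape as the synonym in text's Data.Text.Encoding.Error, and the default policy chosen for `decode` (replace bad bytes with U+FFFD) is an assumption.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Same shape as text's OnDecodeError: given a message and the offending
-- byte, return a replacement Char, or Nothing to drop the byte.
type OnDecodeError = String -> Maybe Word8 -> Maybe Char

-- Hypothetical pure TextEncoding: both directions in one value, no IO.
data TextEncoding = TextEncoding
  { teName   :: String
  , teEncode :: String -> [Word8]
  , teDecode :: OnDecodeError -> [Word8] -> String
  }

-- Toy ASCII encoding: chars above U+007F encode as '?';
-- bytes above 0x7F are decoding errors handed to the error policy.
ascii :: TextEncoding
ascii = TextEncoding "ASCII" enc dec
  where
    enc = map (\c -> if ord c < 0x80 then fromIntegral (ord c) else 0x3F)
    dec onErr = concatMap step
      where
        step b
          | b < 0x80  = [chr (fromIntegral b)]
          | otherwise = maybe [] (: []) (onErr "invalid ASCII byte" (Just b))

-- The generic operations proposed for Data.Text.Encoding.
encode :: TextEncoding -> String -> [Word8]
encode = teEncode

decodeWith :: TextEncoding -> OnDecodeError -> [Word8] -> String
decodeWith = teDecode

-- Plain decode fixes a default policy (assumed: replace with U+FFFD).
decode :: TextEncoding -> [Word8] -> String
decode enc = decodeWith enc (\_ _ -> Just '\xFFFD')

-- An existing per-encoding function becomes a one-line alias.
decodeAscii :: [Word8] -> String
decodeAscii = decode ascii
```

A caller that only ever decodes just uses `decode` (or the alias) and never touches the encoding direction. So bundling both directions in one value costs such a caller nothing beyond naming the encoding.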

Duncan