[Haskell-cafe] Re: String vs ByteString

John Millikin jmillikin at gmail.com
Tue Aug 17 21:29:10 EDT 2010


On Tue, Aug 17, 2010 at 12:30, Donn Cave <donn at avvanta.com> wrote:
> If Haskell had the development resources to make something like this
> work, would it actually take the form of a Haskell-level type like
> that - data Text = (Encoding, ByteString)?  I mean, I know that's
> just a very clear and convenient way to express it for the purposes
> of the present discussion, and actual design is a little premature -
> ... but, I think you could argue that from the Haskell level,
> `Text' should be a single type, if the encoding differences aren't
> semantically interesting.

It should be possible to create a Ruby-style Text in Haskell, using
the existing Text API. The constructor would be something like << data
Text = Text !Encoding !ByteString >>, but there's no need to export
it. The only significant improvements, performance-wise, would be that
1) "encoding" text to its internal encoding would be O(1) and 2)
"decoding" text would only have to perform validation, instead of
validation+copy+stream fusion muck. Downside: lazy decoding makes it
very difficult to reason about failures, since even simple operations
like 'append' might fail if you try to append two texts with
mutually-incompatible characters.
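
To make the trade-off concrete, here is a rough sketch of such a type. It is an illustration under assumptions, not the API of the existing text package: the three-member Encoding type and the names decode, toBytes and append are all hypothetical, and only single-byte, ASCII-compatible encodings are modelled so that validation stays trivial.

```haskell
import qualified Data.ByteString as B

-- Hypothetical encoding tag; a real library would carry many more.
data Encoding = ASCII | Latin1 | KOI8R
  deriving (Eq, Show)

-- The constructor from the post; a real module would not export it.
data Text = Text !Encoding !B.ByteString
  deriving (Eq, Show)

-- True when every byte is below 0x80, i.e. plain ASCII.
isAscii :: B.ByteString -> Bool
isAscii = B.all (< 0x80)

-- Validation only. Latin-1 and KOI8-R assign a character to every
-- byte value, so only ASCII can reject input in this sketch.
valid :: Encoding -> B.ByteString -> Bool
valid ASCII  = isAscii
valid Latin1 = const True
valid KOI8R  = const True

-- "Decoding" validates the bytes and wraps them; nothing is copied.
decode :: Encoding -> B.ByteString -> Maybe Text
decode enc bs
  | valid enc bs = Just (Text enc bs)
  | otherwise    = Nothing

-- "Encoding" to the internal encoding is O(1): return the stored bytes.
toBytes :: Text -> (Encoding, B.ByteString)
toBytes (Text enc bs) = (enc, bs)

-- Appending can fail: two texts in different encodings are only
-- compatible here when one of them consists solely of ASCII bytes.
append :: Text -> Text -> Either String Text
append (Text e1 b1) (Text e2 b2)
  | e1 == e2   = Right (Text e1 (B.append b1 b2))
  | isAscii b1 = Right (Text e2 (B.append b1 b2))
  | isAscii b2 = Right (Text e1 (B.append b1 b2))
  | otherwise  = Left ("incompatible encodings: "
                       ++ show e1 ++ " and " ++ show e2)
```

Note that append has to return an Either (or throw), which is exactly the reasoning problem described above: a Latin1 text holding byte 0xE9 and a KOI8R text holding byte 0xC1 cannot be joined without transcoding one of them, and this sketch deliberately leaves transcoding out.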

In any case, I suspect getting Haskell itself to support non-Unicode
characters is much more difficult than writing an appropriate Text
type.

