add utf8-string in haskell platform

Duncan Coutts duncan.coutts at worc.ox.ac.uk
Fri May 15 14:37:02 EDT 2009


On Fri, 2009-05-15 at 12:02 +0100, Simon Marlow wrote:
> On 15/05/2009 03:07, Bryan O'Sullivan wrote:
> > On Thu, May 14, 2009 at 4:23 PM, Simon Michael <simon at joyful.com> wrote:
> >
> >     I'd like to request that utf8-string be added to the haskell
> >     platform, so that HP users can work with non-ascii text.
> >
> >
> > I'd rather this wasn't added. It's an acceptable crutch for the short
> > term, but we shouldn't be using String for text manipulation, and
> > bundling utf8-string implicitly blesses that approach. The text library
> > needs a few weeks of polish and some more testing work for QA, but it'll
> > be the right answer well before the end of this year.
> 
> We ought to think about the interaction between text (and bytestring) 
> and the new Unicode IO library.

Yes absolutely. We should (re-)design the IO functions for text and
bytestring in parallel with the new IO stuff.

> What does text have in the way of IO operations?

It doesn't have any at the moment. The plan, I think, is to have it work
with the Unicode support in the new IO system.

So yes, we should be designing System.IO properly, with consideration
for types like Text and ByteString, rather than standardising on interim
solutions like utf8-string.

> I've been wondering about what bytestring's hGetLine should do.

ByteString is always binary so hGetLine for ByteString only makes sense
on binary handles. Going via any Unicode decoding would be wrong. When
we consider ByteString via the Char8 view it's still binary but we
assume there is some ASCII mixed in so we can look for the ASCII
encodings of '\n' etc.
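
To make the binary view concrete, here is a minimal sketch, assuming the
hGetLine exported by Data.ByteString.Char8 and a handle opened with
openBinaryFile. The file name is a hypothetical scratch file. The line
is split on the byte 0x0A and the remaining bytes come back untouched,
so a UTF-8 encoded 'é' stays as two bytes:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import System.IO (IOMode (ReadMode), hClose, openBinaryFile)

main :: IO ()
main = do
  -- Hypothetical scratch file, used only for this illustration.
  let path = "bytestring-hgetline-demo.bin"
  -- The bytes of "café\nx\n" in UTF-8: 'é' is the pair 0xC3 0xA9.
  B.writeFile path (B.pack [0x63, 0x61, 0x66, 0xC3, 0xA9, 0x0A, 0x78, 0x0A])
  h <- openBinaryFile path ReadMode
  -- Reads raw bytes up to (not including) the first 0x0A; no decoding.
  line <- BC.hGetLine h
  hClose h
  print (B.unpack line)  -- [99,97,102,195,169]
  print (B.length line)  -- 5 bytes, although the text has 4 characters
```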

On the other hand hGetLine for the String or Text type only makes sense
for text handles where we're decoding and translating into Unicode.
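
As a rough sketch of the text-handle case, assuming the hSetEncoding and
utf8 API that System.IO gained with the new Unicode IO work (the file
name is again a hypothetical scratch file), hGetLine on a text handle
decodes the bytes and returns characters rather than bytes:

```haskell
import qualified Data.ByteString as B
import System.IO (IOMode (ReadMode), hClose, hGetLine, hSetEncoding,
                  openFile, utf8)

main :: IO ()
main = do
  -- Hypothetical scratch file, used only for this illustration.
  let path = "text-hgetline-demo.txt"
  -- Write the UTF-8 bytes of "café\n" directly, so the test input does
  -- not depend on the locale encoding.
  B.writeFile path (B.pack [0x63, 0x61, 0x66, 0xC3, 0xA9, 0x0A])
  h <- openFile path ReadMode
  -- Declare how the bytes on this text handle are to be decoded.
  hSetEncoding h utf8
  line <- hGetLine h
  hClose h
  print (length line)        -- 4 characters, decoded from 5 bytes
  print (line == "caf\233")  -- True: '\233' is U+00E9, 'é'
```

The same bytes read through a binary handle would give five Word8
values, which is why the two kinds of hGetLine belong to different
kinds of handle.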

Duncan


