HTTP and character encodings

Christian Maeder Christian.Maeder at dfki.de
Thu Sep 13 10:21:51 CEST 2012


On 12.09.2012 23:57, Ganesh Sittampalam wrote:
> On 12/09/2012 11:09, Christian Maeder wrote:
>
>> My main use-case is simpleHTTP that is bound to the String instance,
>> currently. There are no such short-cuts for byte-strings, are there?
>
> That's a good point. I guess I would make simpleHTTP overloaded while I
> was making breaking changes anyway.

Ah, I was thinking of something like "simpleByteStringHTTP".

>> I'd suggest making a proper byte-string interface first
>
> What do you mean by "proper"? Unfortunately I don't really have time to
> do any substantial refactoring in the near future.
>
> Given lots of time now, I'd immediately make high-level and low-level
> interfaces with encoding only handled in the high-level one.
>
>> and then deprecate the String stuff.
>
> Is it possible to deprecate an instance?

I believe not. So forget deprecation (just document it), but consider
remaining backward compatible.

> I could perhaps instead provide an escape hatch with a newtype like
> UnsafeChar8String or something, either temporarily or permanently.
>
>> (before calling Char8.pack, strings could be checked or filtered for
>> "isAscii")
>
> The problem is more on the download side; if it's a wide encoding like
> UTF-16, even 7-bit cleanliness isn't enough to make Char8.unpack safe.
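To illustrate the quoted point, here is a minimal sketch using only the
bytestring package. The byte sequence is a hand-written UTF-16LE encoding
of "hi" without a byte-order mark, and the names utf16leHi, sevenBitClean
and naive are made up for this example; nothing here is part of the HTTP
package.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C8

-- Hand-written UTF-16LE encoding of the text "hi" (illustrative input, no BOM).
utf16leHi :: B.ByteString
utf16leHi = B.pack [0x68, 0x00, 0x69, 0x00]

-- True when every byte is below 128, i.e. the payload is "7-bit clean".
sevenBitClean :: B.ByteString -> Bool
sevenBitClean = B.all (< 128)

-- Char8.unpack maps each byte to one Char, so the NUL padding bytes
-- of the wide encoding end up inside the resulting String.
naive :: String
naive = C8.unpack utf16leHi
```

The payload passes the 7-bit check, yet the unpacked String has four
characters rather than two, because every second byte is a NUL.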

Just to make the String instance work, it is enough to ignore the encoding
and return only ASCII bytes as chars, or to change bytes 128-255 to an
ASCII replacement char (e.g. '?').

For proper encodings, other functions or (text) instances must be used.
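
A rough sketch of what I mean, assuming only the bytestring package. The
names lossyAsciiUnpack and checkedAsciiPack are hypothetical and do not
exist in the HTTP package; this only shows the byte-level idea, not how
the String instance would actually be wired up.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C8
import Data.Char (isAscii)
import Data.Word (Word8)

-- Download side: ignore the encoding entirely. Bytes 0..127 become the
-- matching Char, bytes 128..255 become the replacement char '?'.
lossyAsciiUnpack :: B.ByteString -> String
lossyAsciiUnpack = map toAsciiChar . B.unpack
  where
    toAsciiChar :: Word8 -> Char
    toAsciiChar w
      | w < 128   = toEnum (fromIntegral w)
      | otherwise = '?'

-- Upload side: check with isAscii before calling Char8.pack, and refuse
-- Strings that Char8.pack would otherwise silently truncate to 8 bits.
checkedAsciiPack :: String -> Maybe B.ByteString
checkedAsciiPack s
  | all isAscii s = Just (C8.pack s)
  | otherwise     = Nothing
```

With this, a UTF-8 encoded "hi" followed by an e-acute (bytes 104, 105,
195, 169) would come back as "hi??", which is lossy but never produces
chars outside ASCII.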

> On the upload side, automatically using UTF-8 would probably be good enough.
>
> Cheers,
>
> Ganesh
>
>



