[Haskell-cafe] Encoding the encoding type of a string into its type

Felipe Lessa felipe.lessa at gmail.com
Fri Jun 11 10:34:32 EDT 2010


On Fri, Jun 11, 2010 at 04:17:25PM +0200, Christopher Done wrote:
> There are a lot of issues with string encoding type mismatches.
> Especially "automatic" conversions. This mailing list gets enough
> posts about encoding confusions.
>
> Would it make sense to make the string depend on its encoding type?

I think our String type doesn't have semantic problems: a String
really is a list of Unicode code points.  However, this
representation has serious performance drawbacks.

Now we have Data.Text, which should have better performance and
maintain nice semantics.  However, it uses a single internal
encoding for various reasons.  So, if your input and your output
both use the same encoding X, where X isn't UTF-16 (IIRC), then you
will have to reencode twice (once on the way in and once on the
way out), perhaps unnecessarily.

So maybe annotating the encoding *could* be useful in some
applications.  But I can't imagine how hairy the implementation
of such a generalised Data.Text would be, nor the performance
impact if the type class dictionary isn't inlined/specialized for
the case at hand.
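
To make the idea concrete, here is a minimal sketch of what such an
annotation might look like.  It is not a generalised Data.Text, only
a phantom-typed wrapper around raw bytes.  The names UTF8, UTF16LE,
Encoded, Encoding, fromText, toText, recode and putUtf8Ln are all
made up for illustration; only the Data.Text.Encoding functions are
real.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import qualified Data.ByteString    as B
import           Data.Proxy         (Proxy (..))
import qualified Data.Text          as T
import qualified Data.Text.Encoding as TE

-- Hypothetical encoding tags; they exist only at the type level.
data UTF8
data UTF16LE

-- Hypothetical wrapper: raw bytes tagged with the encoding they are in.
newtype Encoded enc = Encoded B.ByteString

-- The "dictionary": one instance per supported encoding.
class Encoding enc where
  decodeWith :: Proxy enc -> B.ByteString -> T.Text
  encodeWith :: Proxy enc -> T.Text -> B.ByteString

instance Encoding UTF8 where
  decodeWith _ = TE.decodeUtf8      -- throws on invalid input
  encodeWith _ = TE.encodeUtf8

instance Encoding UTF16LE where
  decodeWith _ = TE.decodeUtf16LE   -- throws on invalid input
  encodeWith _ = TE.encodeUtf16LE

fromText :: forall enc. Encoding enc => T.Text -> Encoded enc
fromText = Encoded . encodeWith (Proxy :: Proxy enc)

toText :: forall enc. Encoding enc => Encoded enc -> T.Text
toText (Encoded bs) = decodeWith (Proxy :: Proxy enc) bs

-- Changing the encoding is an explicit step that shows up in the types.
recode :: (Encoding a, Encoding b) => Encoded a -> Encoded b
recode = fromText . toText

-- An output function that only accepts UTF-8 tagged bytes.
putUtf8Ln :: Encoded UTF8 -> IO ()
putUtf8Ln (Encoded bs) = B.putStr bs >> B.putStr (B.singleton 10)

main :: IO ()
main = do
  let s16 = fromText (T.pack "hello") :: Encoded UTF16LE
  -- putUtf8Ln s16         -- would be rejected by the type checker
  putUtf8Ln (recode s16)   -- the conversion has to be spelled out
```

Every polymorphic use of fromText/toText goes through the Encoding
dictionary, which is where the inlining/specialization concern
comes from; GHC's INLINABLE and SPECIALIZE pragmas would be the
usual tools to try, but I haven't measured anything here.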

> E.g. a String UTF16 cannot be used with putStrLn :: String UTF8, it
> has to be used with putStrLn :: String UTF16. Provided the fundamental
> functions that read and write strings are type safe, there'll be no
> mix-ups?

Note that right now you don't need this extra burden to get the
safety you want.  Just use Data.Text everywhere.  The problem
isn't Data.Text but the Prelude IO functions, which use String
where they should use [Word8].
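
As a rough sketch of that style: do the IO on bytes, decode
explicitly at the boundary, and keep everything in between as Text.
The function shout is a made-up placeholder for the real
processing, and treating stdin as UTF-8 is an assumption of this
example.

```haskell
import qualified Data.ByteString    as B
import qualified Data.Text          as T
import qualified Data.Text.Encoding as TE
import           System.IO          (hPutStrLn, stderr)

-- Hypothetical pure processing step; it only ever sees Text,
-- so no encoding question can arise here.
shout :: T.Text -> T.Text
shout = T.toUpper

main :: IO ()
main = do
  raw <- B.getContents               -- bytes in, no implicit decoding
  -- Assumption: the input is UTF-8.  decodeUtf8' reports invalid
  -- input as a Left value instead of throwing.
  case TE.decodeUtf8' raw of
    Left err  -> hPutStrLn stderr ("invalid UTF-8 input: " ++ show err)
    Right txt -> B.putStr (TE.encodeUtf8 (shout txt))  -- bytes out
```

The decode and the encode are the two reencodes mentioned earlier,
but at least they are visible in the code rather than automatic.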

Cheers,

--
Felipe.

