Proposal #3337: expose Unicode and newline translation from System.IO

Judah Jacobson judah.jacobson at gmail.com
Thu Jul 2 18:04:16 EDT 2009


On Tue, Jun 30, 2009 at 5:03 AM, Simon Marlow<marlowsd at gmail.com> wrote:
> Ticket:
>
>  http://hackage.haskell.org/trac/ghc/ticket/3337
>
> For the proposed new additions, see:
>
>  * http://www.haskell.org/~simonmar/base/System-IO.html#23
>   System.IO (Unicode encoding/decoding)
>
>  * http://www.haskell.org/~simonmar/base/System-IO.html#25
>   System.IO (Newline conversion)
>
> Discussion period: 2 weeks (14 July).

Three points:

1) It would be good to have an hGetEncoding function, so that we can
temporarily set the encoding of a Handle like stdin without affecting
the rest of the program.
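
As a rough illustration of point 1, here is a minimal sketch of a
save-and-restore wrapper. It assumes hGetEncoding has the type
Handle -> IO (Maybe TextEncoding), with Nothing meaning the Handle is in
binary mode; withEncoding is a hypothetical helper name, not part of the
proposal.

```haskell
import Control.Exception (bracket)
import System.IO

-- | Run an action with a temporary encoding on a Handle, then restore
-- whatever encoding the Handle had before (even if the action throws).
-- 'withEncoding' is a hypothetical helper; it assumes
-- hGetEncoding :: Handle -> IO (Maybe TextEncoding).
withEncoding :: Handle -> TextEncoding -> IO a -> IO a
withEncoding h enc action =
  bracket (hGetEncoding h) restore (\_ -> hSetEncoding h enc >> action)
  where
    -- A previous text encoding is simply set again.
    restore (Just old) = hSetEncoding h old
    -- Nothing is assumed to mean binary mode, so switch back to it.
    restore Nothing    = hSetBinaryMode h True
```

With something like this, a library could write
withEncoding stdin utf8 getLine without changing how the rest of the
program reads from stdin.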

2) It looks like your API always throws an error on invalid input; it
would be great if there were some way to customize this behavior.
Nothing complicated, maybe just an enum which specifies one of the
following behaviors:

- throw an error
- ignore (i.e., drop) invalid bytes/Chars
- replace undecodable bytes with U+FFFD and unencodable Chars with '?'

My preference for the API change would be to add a function in
GHC.IO.Encoding.Iconv; for example,

mkTextEncodingError :: String -> ErrorHandling -> IO TextEncoding

since this is similar to how GHC.IO.Encoding.Latin1 allows error
handling by providing latin1 and latin1_checked as separate encoders.

Any more complicated behavior is probably best handled by something
like the text package.
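
To make the three behaviors in point 2 concrete, here is a small pure
sketch over 7-bit ASCII. The constructor names and the decodeAscii and
encodeAscii functions are hypothetical and only illustrate the intended
semantics; they are not the iconv-based implementation.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- | Hypothetical enum for the three behaviors listed above.
data ErrorHandling = ThrowError | IgnoreInvalid | ReplaceInvalid
  deriving (Show, Eq)

-- | Decode ASCII bytes; any byte >= 0x80 counts as invalid input.
decodeAscii :: ErrorHandling -> [Word8] -> Either String String
decodeAscii mode = go
  where
    go [] = Right []
    go (b:bs)
      | b < 0x80  = (chr (fromIntegral b) :) <$> go bs
      | otherwise = case mode of
          ThrowError     -> Left ("invalid byte " ++ show b)
          IgnoreInvalid  -> go bs                   -- drop the byte
          ReplaceInvalid -> ('\xFFFD' :) <$> go bs  -- U+FFFD

-- | Encode Chars as ASCII; any Char >= U+0080 counts as unencodable.
encodeAscii :: ErrorHandling -> String -> Either String [Word8]
encodeAscii mode = go
  where
    go [] = Right []
    go (c:cs)
      | ord c < 0x80 = (fromIntegral (ord c) :) <$> go cs
      | otherwise    = case mode of
          ThrowError     -> Left ("unencodable Char " ++ show c)
          IgnoreInvalid  -> go cs                                  -- drop the Char
          ReplaceInvalid -> (fromIntegral (ord '?') :) <$> go cs   -- '?'
```

Here the error case is modeled with Either to keep the sketch pure; a
real TextEncoding would presumably raise an IO exception instead.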


3) How hard would it be to get Windows code page support working?  I'd
like that a lot since it would further simplify the code in Haskeline.
I can help out with the implementation if it's just a question of
time.

Thanks again for taking care of all this,
-Judah

