Proposal #3455: Add a setting to change how Unicode encoding errors are handled

Simon Marlow marlowsd at gmail.com
Tue Aug 25 08:10:09 EDT 2009


On 23/08/2009 17:22, Judah Jacobson wrote:
> I propose that we augment ghc-6.12.1's support for Unicode Handles
> by adding the following functions to System.IO:
>
> hSetOnEncodingError :: Handle -> OnEncodingError -> IO ()
> hGetOnEncodingError :: Handle -> IO OnEncodingError
>
> as well as the enumeration `OnEncodingError` with three constructors:
>
>   - `ThrowEncodingError`: Throw an exception at the first encoding or
>     decoding error.
>   - `SkipEncodingError`: Skip all invalid bytes or characters.
>   - `TranslitEncodingError`: Replace undecodable bytes with U+FFFD, and
>     unencodable characters with '?'.
>
> I have implemented this functionality in a patch attached to the
> ticket.  Haddock docs are here:
> http://code.haskell.org/~judah/new-io-docs/System-IO.html#23
>
>
> The choice of error handler is orthogonal to the choice of encoder.
> Additionally, the same setting is used for both read and write modes.  For
> portability, the handlers are written in pure Haskell rather than using
> GNU iconv's //TRANSLIT feature.
>
> Note that the text package, for example, provides more sophisticated
> error-handling options.  However, I think the above choices are useful
> enough without making the API too complicated.
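
For readers who haven't looked at the patch, the three policies can be
pictured with a small pure model. This is only a sketch and not the code
from the patch: `encodeAscii` is a hypothetical helper, the codec is
simplified to plain ASCII, and the "throw" case is modelled as a `Left`
value rather than a real exception.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- The three constructors named in the proposal.
data OnEncodingError
  = ThrowEncodingError
  | SkipEncodingError
  | TranslitEncodingError
  deriving (Eq, Show)

-- Hypothetical helper: encode a String as ASCII bytes under a policy.
-- 'Left c' stands in for throwing at the first unencodable character c.
encodeAscii :: OnEncodingError -> String -> Either Char [Word8]
encodeAscii policy = go
  where
    go [] = Right []
    go (c:cs)
      | ord c < 128 = (fromIntegral (ord c) :) <$> go cs
      | otherwise = case policy of
          ThrowEncodingError    -> Left c              -- stop at first error
          SkipEncodingError     -> go cs               -- drop the character
          TranslitEncodingError -> (63 :) <$> go cs    -- 63 is '?'
```

The point of the model is that the policy is chosen independently of the
codec: the same three-way case could sit in front of any encoder.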

I replied on the ticket; my reply is reproduced here for readers of libraries@:

It looks like the main question here is whether the IOError should be 
returned explicitly (as in your patch), or whether we should just catch 
the exception. All things being equal, catching the exception would be 
simpler, as it wouldn't require any changes in the codecs. Is there a 
reason why you didn't do it that way? Perhaps because you want to be 
sure that the exception is really an encoding error, and not some other 
kind of exception? If that's the case, then we should introduce a new 
exception for encoding errors (that's probably a good idea anyway).
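
To make the second option concrete, here is a rough sketch of what
catching a dedicated exception might look like. The `EncodingError`
type, its field, and both functions are hypothetical names invented for
illustration; nothing like this exists in base as of this message, and
the "codec" is a toy that rejects any byte above 127.

```haskell
import Control.Exception (Exception, throwIO, try)

-- Hypothetical exception type for encoding/decoding failures.
newtype EncodingError = EncodingError String
  deriving Show

instance Exception EncodingError

-- Stand-in for a codec that signals failure by throwing.
decodeStep :: [Int] -> IO String
decodeStep bytes
  | any (> 127) bytes = throwIO (EncodingError "invalid byte")
  | otherwise         = return (map toEnum bytes)

-- The handle layer catches only EncodingError, so any other kind of
-- exception raised by the codec still propagates to the caller.
-- For brevity the whole input is replaced by a single U+FFFD on error.
decodeOrReplace :: [Int] -> IO String
decodeOrReplace bytes = do
  r <- try (decodeStep bytes)
  case r of
    Left (EncodingError _) -> return "\xFFFD"
    Right s                -> return s
```

Because `try` is used at the `EncodingError` type, the codec itself needs
no changes beyond throwing that exception, which is the simplification
I had in mind above.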

Cheers,
	Simon

