implementation of UTF-8 conversion for text I/O: iconv vs hand-made

Bulat Ziganshin bulat.ziganshin at gmail.com
Thu Apr 20 11:10:29 EDT 2006


Hello Einar,

Thursday, April 20, 2006, 6:24:14 PM, you wrote:

> Does Data.CharEncoding work with encodings that have state associated
> with them? One example is ISO-2022-JP.

no. so the list of things that are impossible in principle with the
current design of Data.CharEncoding is: error processing/masking and
handling of stateful encodings

> Maybe with using a suitable monad transformer?

how do you imagine that? we have the following classes:

class ByteStream m h where
  vGetByte :: h -> m Word8
  vPutByte :: h -> Word8 -> m ()

class TextStream m h where
  vGetChar :: h -> m Char
  vPutChar :: h -> Char -> m ()

and a char encoding transformer should implement the latter via the former:

instance ByteStream m h => TextStream m (CharEncoding h) where ...
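
A minimal sketch of what such an instance could look like for UTF-8, not the actual Data.CharEncoding code. The `UTF8` constructor, the `MemHandle` test handle and the helper names `decodeUtf8Char`, `continuation` and `encodeUtf8Char` are assumptions made up for this illustration. Note that the converters carry no state between calls and cannot report malformed input, which is exactly the limitation discussed above.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)

class ByteStream m h where
  vGetByte :: h -> m Word8
  vPutByte :: h -> Word8 -> m ()

class TextStream m h where
  vGetChar :: h -> m Char
  vPutChar :: h -> Char -> m ()

-- Hypothetical wrapper: a byte stream viewed as UTF-8 text.
newtype CharEncoding h = UTF8 h

instance (Monad m, ByteStream m h) => TextStream m (CharEncoding h) where
  vGetChar (UTF8 h) = decodeUtf8Char (vGetByte h)
  vPutChar (UTF8 h) = encodeUtf8Char (vPutByte h)

-- The (vGetByte -> vGetChar) converter. No validation: malformed input
-- yields a wrong character or a runtime error from `chr`.
decodeUtf8Char :: Monad m => m Word8 -> m Char
decodeUtf8Char getByte = do
  b <- getByte
  let lead = fromIntegral b :: Int
  case () of
    _ | lead < 0x80 -> return (chr lead)
      | lead < 0xE0 -> continuation getByte 1 (lead .&. 0x1F)
      | lead < 0xF0 -> continuation getByte 2 (lead .&. 0x0F)
      | otherwise   -> continuation getByte 3 (lead .&. 0x07)

-- Read n continuation bytes, adding 6 payload bits from each.
continuation :: Monad m => m Word8 -> Int -> Int -> m Char
continuation _       0 acc = return (chr acc)
continuation getByte n acc = do
  b <- getByte
  continuation getByte (n - 1) ((acc `shiftL` 6) .|. (fromIntegral b .&. 0x3F))

-- The (vPutByte -> vPutChar) converter.
encodeUtf8Char :: Monad m => (Word8 -> m ()) -> Char -> m ()
encodeUtf8Char putByte c
  | n < 0x80    = putByte (fromIntegral n)
  | n < 0x800   = mapM_ putByte [lead 0xC0 6, cont 0]
  | n < 0x10000 = mapM_ putByte [lead 0xE0 12, cont 6, cont 0]
  | otherwise   = mapM_ putByte [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    n          = ord c
    lead tag s = fromIntegral (tag .|. (n `shiftR` s))
    cont s     = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

-- Hypothetical in-memory handle, only here so the sketch can be tried out.
newtype MemHandle = MemHandle (IORef [Word8])

instance ByteStream IO MemHandle where
  vGetByte (MemHandle ref) = do
    bytes <- readIORef ref
    case bytes of
      []         -> ioError (userError "end of stream")
      (b : rest) -> writeIORef ref rest >> return b
  vPutByte (MemHandle ref) b = modifyIORef ref (++ [b])
```

To try it in GHCi, create a handle with `ref <- newIORef []`, wrap it as `UTF8 (MemHandle ref)`, write some characters with `vPutChar` and read them back with `vGetChar`.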

it seems that we should just extend the types of the
(vGetByte->vGetChar) and (vPutByte->vPutChar) converters so that they
accept the old state and an error processing mode and return an error
code and the new state. something like this:

type PutByte m h = h -> Word8 -> m ()
type EncodeConverter m h state = PutByte m h -> ErrMode -> h -> state
                                 -> m (Either Char ErrCode, state)

where `state` holds the current processing state, ErrMode is the error
processing mode and ErrCode is the error code. of course, this would
make the implementation even slower :(
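
To make the idea concrete, here is a hedged sketch of one such state-passing encoder step. It covers only a tiny subset of ISO-2022-JP: the ASCII set (selected with ESC ( B) and the JIS X 0201 Roman set (selected with ESC ( J), which differ at bytes 0x5C and 0x7E. Everything apart from `PutByte` is an assumption for illustration: the `Shift` state, the `Strict`/`Replace` modes, the `Unencodable` error and the function names. It also departs from the type above by taking the `Char` as an explicit argument and by putting the error on the `Left`. Other 7-bit characters are passed through unchanged, and the final return to ASCII that a real encoder needs is left out.

```haskell
import Control.Monad (when)
import Data.Char (ord)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import Data.Word (Word8)

type PutByte m h = h -> Word8 -> m ()

data ErrMode = Strict | Replace                      -- hypothetical modes
data ErrCode = Unencodable Char deriving (Eq, Show)  -- hypothetical error
data Shift   = InAscii | InRoman deriving (Eq, Show) -- current character set

-- Escape sequence that selects a character set.
escapeFor :: Shift -> [Word8]
escapeFor InAscii = [0x1B, 0x28, 0x42]  -- ESC ( B
escapeFor InRoman = [0x1B, 0x28, 0x4A]  -- ESC ( J

-- Encode one character, returning the result together with the new state.
encodeStep :: Monad m
           => PutByte m h -> ErrMode -> h -> Shift -> Char
           -> m (Either ErrCode (), Shift)
encodeStep put mode h st c
  | c == '\\' || c == '~' = emit InAscii (fromIntegral (ord c))
  | c == '\x00A5'         = emit InRoman 0x5C  -- yen sign
  | c == '\x203E'         = emit InRoman 0x7E  -- overline
  | ord c < 0x80          = emit st (fromIntegral (ord c))
  | otherwise = case mode of
      Strict  -> return (Left (Unencodable c), st)
      Replace -> emit st 0x3F                  -- substitute '?'
  where
    -- Switch sets only when needed, then write the byte.
    emit target byte = do
      when (target /= st) (mapM_ (put h) (escapeFor target))
      put h byte
      return (Right (), target)

-- Thread the state through a whole string, stopping at the first error.
encodeString :: Monad m
             => PutByte m h -> ErrMode -> h -> String
             -> m (Either ErrCode (), Shift)
encodeString put mode h = go InAscii
  where
    go st []       = return (Right (), st)
    go st (c : cs) = do
      (res, st') <- encodeStep put mode h st c
      case res of
        Left err -> return (Left err, st')
        Right () -> go st' cs

-- Hypothetical in-memory sink, used only to try the sketch out.
putMem :: PutByte IO (IORef [Word8])
putMem ref b = modifyIORef ref (++ [b])
```

The extra cost is visible here: every character now carries a state argument, a mode argument and a tuple result, where the stateless converter passed only a byte.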


>> 2) Einar once asked me about changing the encoding on the
>> fly, that is needed for some HTML processing. it is also possible that
>> some program will need to intersperse text I/O with
>> buffer/array/byte/bits I/O. it's a sort of things that are absolutely
>> impossible with iconv 

> The example goes like this:
> 1) HTTP client reads response from server using ascii
> 2) When reading headers is complete, either:
>    * decode body (binary data) and after decompressing convert to text
>    * decode body (text in some encoding) straight from the Handle.

> Is there a reason this is impossible with iconv if the character conversion
> is on top of the buffering?

let them answer :)  i just want to mention to Simon that some apps
want to use binary and text i/o on the same stream. if you think that
HTTP is badly designed, you know where to complain ;)
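
As a rough illustration of mixing the two kinds of i/o on one stream, the sketch below reads ASCII header lines and then takes raw body bytes from the same source. It says nothing about how iconv or the real library would do this; the `Stream` type and all function names are assumptions made up for the example, and the stream is just an in-memory list of unread bytes.

```haskell
import Data.Char (chr)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)

-- Hypothetical in-memory stream holding the unread bytes.
type Stream = IORef [Word8]

-- Take the next byte, or Nothing at end of stream.
getByte :: Stream -> IO (Maybe Word8)
getByte s = do
  bytes <- readIORef s
  case bytes of
    []         -> return Nothing
    (b : rest) -> writeIORef s rest >> return (Just b)

-- Text i/o: read one line, treating each byte as an ASCII character
-- and dropping the trailing CR of a CRLF line ending.
getAsciiLine :: Stream -> IO String
getAsciiLine s = go []
  where
    go acc = do
      mb <- getByte s
      case mb of
        Nothing -> return (reverse acc)
        Just 10 -> return (reverse (dropWhile (== '\r') acc))
        Just b  -> go (chr (fromIntegral b) : acc)

-- Read header lines up to the blank line that ends the header block.
getHeaders :: Stream -> IO [String]
getHeaders s = do
  line <- getAsciiLine s
  if null line then return [] else fmap (line :) (getHeaders s)

-- Binary i/o on the same stream: take the next n raw bytes.
getBytes :: Stream -> Int -> IO [Word8]
getBytes s n = do
  bytes <- readIORef s
  let (body, rest) = splitAt n bytes
  writeIORef s rest
  return body
```

After `getHeaders` returns, the stream is positioned at the first body byte, so the caller can either call `getBytes` or hand the stream to a decoder for whatever encoding the headers announced.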


-- 
Best regards,
 Bulat                            mailto:Bulat.Ziganshin at gmail.com


