implementation of UTF-8 conversion for text I/O: iconv vs hand-made
Bulat Ziganshin
bulat.ziganshin at gmail.com
Thu Apr 20 11:10:29 EDT 2006
Hello Einar,
Thursday, April 20, 2006, 6:24:14 PM, you wrote:
> Does Data.CharEncoding work with encodings that have state associated
> with them? One example is ISO-2022-JP.
No. So the things that are in principle impossible with the current
design of Data.CharEncoding are error processing/masking and the handling
of stateful encodings.
> Maybe with using a suitable monad transformer?
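How do you imagine that? We have the following classes:
{-# LANGUAGE MultiParamTypeClasses #-}
import Data.Word (Word8)

-- byte-level I/O on a stream h in monad m
class ByteStream m h where
  vGetByte :: h -> m Word8
  vPutByte :: h -> Word8 -> m ()
-- character-level I/O on a stream h in monad m
class TextStream m h where
  vGetChar :: h -> m Char
  vPutChar :: h -> Char -> m ()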
and the character-encoding transformer should implement the latter via the former:
instance ByteStream m h => TextStream m (CharEncoding h) where ...
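A minimal sketch of what such a transformer instance could look like for UTF-8, assuming the two classes above. The `CharEncoding` newtype body, the `encodeUtf8` helper and the in-memory `MemStream` are hypothetical names for illustration, not the actual Data.CharEncoding code. Note that it has to assume well-formed input: `vGetChar :: h -> m Char` has no way to report a decoding error, which is exactly the limitation discussed in this thread.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
import Control.Monad (replicateM)
import Data.Bits ((.&.), (.|.), shiftL, shiftR)
import Data.Char (chr, ord)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)

class ByteStream m h where
  vGetByte :: h -> m Word8
  vPutByte :: h -> Word8 -> m ()

class TextStream m h where
  vGetChar :: h -> m Char
  vPutChar :: h -> Char -> m ()

-- Hypothetical wrapper: a byte stream seen through the UTF-8 encoding.
newtype CharEncoding h = CharEncoding h

instance (Monad m, ByteStream m h) => TextStream m (CharEncoding h) where
  -- Decode one character; input is assumed to be well-formed UTF-8,
  -- since this signature has no way to report an error.
  vGetChar (CharEncoding h) = do
    b0 <- vGetByte h
    let (n, lead)
          | b0 < 0x80 = (0, b0)            -- 0xxxxxxx: ASCII
          | b0 < 0xE0 = (1, b0 .&. 0x1F)   -- 110xxxxx: 1 continuation byte
          | b0 < 0xF0 = (2, b0 .&. 0x0F)   -- 1110xxxx: 2 continuation bytes
          | otherwise = (3, b0 .&. 0x07)   -- 11110xxx: 3 continuation bytes
    rest <- replicateM n (vGetByte h)
    -- each continuation byte (10xxxxxx) contributes 6 more bits
    let code = foldl (\acc b -> (acc `shiftL` 6) .|. (fromIntegral b .&. 0x3F))
                     (fromIntegral lead) rest
    return (chr code)
  -- Encode one character as 1..4 bytes and write them one by one.
  vPutChar (CharEncoding h) c = mapM_ (vPutByte h) (encodeUtf8 (ord c))

-- Hypothetical helper: UTF-8 bytes of a code point (0..0x10FFFF).
encodeUtf8 :: Int -> [Word8]
encodeUtf8 x
  | x < 0x80    = [fromIntegral x]
  | x < 0x800   = [0xC0 .|. top 6, cont 0]
  | x < 0x10000 = [0xE0 .|. top 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. top 18, cont 12, cont 6, cont 0]
  where
    top s  = fromIntegral (x `shiftR` s)
    cont s = 0x80 .|. (fromIntegral (x `shiftR` s) .&. 0x3F)

-- Hypothetical in-memory byte stream, used only to exercise the instance.
newtype MemStream = MemStream (IORef [Word8])

instance ByteStream IO MemStream where
  vGetByte (MemStream r) = do
    bs <- readIORef r
    case bs of
      (b : rest) -> writeIORef r rest >> return b
      []         -> ioError (userError "end of stream")
  vPutByte (MemStream r) b = modifyIORef r (++ [b])
```

Loaded into GHCi, writing characters through `CharEncoding (MemStream ref)` appends their UTF-8 bytes to `ref`, and reading them back with `vGetChar` reassembles the characters. All the character-level logic lives in the one generic instance, so any `ByteStream` gets text I/O for free.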
It seems that we should just improve the types of the (vGetByte->vGetChar) and
(vPutByte->vPutChar) converters so that they accept the old state
and the error-processing mode and return an error code and the new state.
Something like this:
type PutByte m h = h -> Word8 -> m ()
type EncodeConverter m h state =
  PutByte m h -> ErrMode -> h -> state -> m (Either Char ErrCode, state)
where `state` holds the current processing state, ErrMode is the
error-processing mode and ErrCode is the error code. Of course, this would
make the implementation even slower :(
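To make the proposal concrete, here is a sketch of the decoding direction in the same shape, assuming a hypothetical `GetByte` type that mirrors `PutByte` and hypothetical `ErrMode`/`ErrCode` definitions (errors go on the Left here, by the usual Haskell convention). The state is a toy subset of ISO-2022-JP: only the ESC ( B (ASCII) and ESC ( J (JIS X 0201-Roman) designations are handled; the two-byte JIS X 0208 sets would need mapping tables and are reported as unsupported.

```haskell
import Data.Char (chr)
import Data.IORef (newIORef, readIORef, writeIORef)
import Data.Word (Word8)

-- Hypothetical decoding-side counterpart of PutByte.
type GetByte m h = h -> m Word8

data ErrMode = Strict | Replace deriving (Eq, Show)
data ErrCode = BadByte | BadEscape | Unsupported deriving (Eq, Show)

-- The converter state: which single-byte set is currently designated.
data Charset = Ascii | JisRoman deriving (Eq, Show)

type DecodeConverter m h state =
  GetByte m h -> ErrMode -> h -> state -> m (Either ErrCode Char, state)

-- Toy subset of ISO-2022-JP: only ESC ( B and ESC ( J are understood.
iso2022Decode :: Monad m => DecodeConverter m h Charset
iso2022Decode getByte mode h st = do
  b <- getByte h
  case b of
    0x1B -> escape                        -- ESC: a designation sequence follows
    _ | b >= 0x80 -> failWith BadByte     -- ISO-2022-JP is a 7-bit encoding
      | otherwise -> return (Right (mapByte st b), st)
  where
    failWith e
      | mode == Replace = return (Right '\xFFFD', st)  -- mask the error
      | otherwise       = return (Left e, st)          -- report it
    escape = do
      b1 <- getByte h
      b2 <- getByte h
      case (b1, b2) of
        (0x28, 0x42) -> iso2022Decode getByte mode h Ascii     -- ESC ( B
        (0x28, 0x4A) -> iso2022Decode getByte mode h JisRoman  -- ESC ( J
        (0x24, _)    -> return (Left Unsupported, st)  -- ESC $ ...: JIS X 0208
        _            -> failWith BadEscape

-- JIS X 0201-Roman differs from ASCII only at 0x5C (YEN SIGN) and 0x7E (OVERLINE).
mapByte :: Charset -> Word8 -> Char
mapByte JisRoman 0x5C = '\x00A5'
mapByte JisRoman 0x7E = '\x203E'
mapByte _        b    = chr (fromIntegral b)

-- Hypothetical driver: decode a whole byte list in IO, threading the state.
-- (An escape sequence at the very end of the input is not handled here.)
decodeAll :: ErrMode -> [Word8] -> IO [Either ErrCode Char]
decodeAll mode bytes = do
  ref <- newIORef bytes
  let getByte r = do
        bs <- readIORef r
        case bs of
          (b : rest) -> writeIORef r rest >> return b
          []         -> ioError (userError "unexpected end of input")
      loop st = do
        remaining <- readIORef ref
        if null remaining
          then return []
          else do
            (res, st') <- iso2022Decode getByte mode ref st
            (res :) <$> loop st'
  loop Ascii
```

The same byte 0x5C decodes to a backslash or to a yen sign depending on the state carried over from an earlier escape sequence, which a stateless per-character converter cannot express. The `ErrMode` argument decides whether a bad byte is masked with U+FFFD or returned as an error code.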
>> 2) Einar once asked me about changing the encoding on the
>> fly, which is needed for some HTML processing. It is also possible that
>> some program will need to intersperse text I/O with
>> buffer/array/byte/bits I/O. These are the sort of things that are absolutely
>> impossible with iconv.
> The example goes like this:
> 1) HTTP client reads response from server using ascii
> 2) When reading headers is complete, either:
> * decode body (binary data) and after decompressing convert to text
> * decode body (text in some encoding) straight from the Handle.
> Is there a reason this is impossible with iconv if the character conversion
> is on top of the buffering?
Let them answer :) I just want to point out to Simon that some apps
need to use binary and text I/O on the same stream. If you think that
HTTP has a bad design, you know where to complain ;)
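The HTTP case above can be sketched like this, assuming a plain `[Word8]` list stands in for the buffered Handle (a real implementation would read from the handle's buffer instead). The names `readAsciiLine`, `readHeaders` and `lookupHeader` are hypothetical. Headers are decoded as ASCII line by line, and whatever follows the blank line comes back as raw bytes, ready either for decompression or for a text decoder chosen from the headers.

```haskell
import Data.Char (chr, toLower)
import Data.Maybe (listToMaybe)
import Data.Word (Word8)

-- Read one LF-terminated line, treating each byte as an ASCII character
-- and dropping the CR of a CRLF line ending.
readAsciiLine :: [Word8] -> (String, [Word8])
readAsciiLine bs =
  let (line, rest) = break (== 0x0A) bs
  in (map (chr . fromIntegral) (filter (/= 0x0D) line), drop 1 rest)

-- Read header lines up to the empty line; the remaining bytes are returned
-- untouched so the caller can treat them as binary or decode them as text.
readHeaders :: [Word8] -> ([String], [Word8])
readHeaders bs = case readAsciiLine bs of
  ("", body)   -> ([], body)
  (line, rest) -> let (hs, body) = readHeaders rest in (line : hs, body)

-- Case-insensitive lookup of a "Name: value" header line.
lookupHeader :: String -> [String] -> Maybe String
lookupHeader name hs = listToMaybe
  [ dropWhile (== ' ') (drop 1 v)
  | h <- hs
  , let (k, v) = break (== ':') h
  , map toLower k == map toLower name ]
```

Because the text decoding is just a function applied on top of the bytes, switching from "ASCII headers" to "binary body" or to "UTF-8 body" at an arbitrary byte position costs nothing. A converter that owns the buffer underneath the Handle would first have to give back any bytes it had already read ahead and converted.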
--
Best regards,
Bulat mailto:Bulat.Ziganshin at gmail.com
More information about the Libraries mailing list