portable encoding/decoding without going via a handle
Herbert Valerio Riedel
hvr at gnu.org
Sun Nov 25 11:51:32 CET 2012
Ganesh Sittampalam <ganesh at earth.li> writes:
> I need to convert directly between different string encodings, rather
> than just using a particular encoding when reading from/writing to a Handle.
>
> I'm aware of the following options, but they have a few problems:
>
> - text-icu: not easily usable on Windows as it requires libicu
> - text: just handles utf8/16/32
> - iconv: POSIX only
>
> It seems like GHC's TextEncoding has the necessary low-level functionality
> (http://hackage.haskell.org/packages/archive/base/latest/doc/html/GHC-IO-Encoding-Types.html#t:BufferCodec),
> but I can't find any high-level interface for directly transcoding
> between String/ByteString/Text.
>
> Am I missing something, or would this be a useful addition as a separate
> library?
btw, looking at the GHC.IO.Encoding.* modules, it seems to me that
'mkTextEncoding'[1] only supports utf8/16/32 in a system-independent
fashion:
,----
| The set of known encodings is system-dependent, but includes at least:
|
| - UTF-8
| - UTF-16, UTF-16BE, UTF-16LE
| - UTF-32, UTF-32BE, UTF-32LE
|
| On systems using GNU iconv (e.g. Linux), there is additional notation
| for specifying how illegal characters are handled:
|
| - a suffix of //IGNORE, e.g. UTF-8//IGNORE, will cause all illegal
| sequences on input to be ignored, and on output will drop all code
| points that have no representation in the target encoding.
|
| - a suffix of //TRANSLIT will choose a replacement character for
| illegal sequences or code points.
|
| On Windows, you can access supported code pages with the prefix CP; for
| example, "CP1250".
`----
...so does using GHC.IO.Encoding.* actually provide you with more encodings
than the other options you mentioned ('text' et al.)? Which text
encodings beyond the UTF family do you need, btw?
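For what it's worth, base already exposes a handle-free path on top of TextEncoding: GHC.Foreign's withCStringLen and peekCStringLen take a TextEncoding explicitly, so whatever mkTextEncoding accepts on a given platform can be used to turn a String into bytes and back without a Handle. A minimal sketch, assuming a GHC whose base ships GHC.Foreign; the helper names encodeWith, decodeWith, transcode and transcodeByName are hypothetical, and bytes are passed as [Word8] only to avoid a bytestring dependency (Data.ByteString's packCStringLen/useAsCStringLen would slot in the same way):

```haskell
-- Handle-free encoding/decoding via GHC.Foreign (sketch).
-- encodeWith, decodeWith, transcode and transcodeByName are hypothetical names.
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (TextEncoding, mkTextEncoding)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (castPtr)
import Data.Word (Word8)

-- | Encode a String into raw bytes using the given TextEncoding.
-- withCStringLen marshals into a temporary buffer that is freed when the
-- callback returns, so the bytes are copied out with peekArray first.
encodeWith :: TextEncoding -> String -> IO [Word8]
encodeWith enc s =
  GHC.withCStringLen enc s $ \(ptr, len) -> peekArray len (castPtr ptr)

-- | Decode raw bytes into a String using the given TextEncoding.
-- The list is copied into a temporary C array for peekCStringLen.
decodeWith :: TextEncoding -> [Word8] -> IO String
decodeWith enc bytes =
  withArrayLen bytes $ \len ptr -> GHC.peekCStringLen enc (castPtr ptr, len)

-- | Re-encode bytes from one encoding into another, going via String.
transcode :: TextEncoding -> TextEncoding -> [Word8] -> IO [Word8]
transcode from to bytes = decodeWith from bytes >>= encodeWith to

-- | Like transcode, but looks the encodings up by name. Which names
-- mkTextEncoding accepts beyond the UTF family is system-dependent.
transcodeByName :: String -> String -> [Word8] -> IO [Word8]
transcodeByName fromName toName bytes = do
  from <- mkTextEncoding fromName
  to   <- mkTextEncoding toName
  transcode from to bytes
```

In GHCi, `mkTextEncoding "UTF-8" >>= \e -> encodeWith e "\233"` yields `[195,169]`. Note that this only removes the Handle from the picture; the available encodings are still exactly the system-dependent set mkTextEncoding knows about (including the //IGNORE and //TRANSLIT suffixes where GNU iconv is used).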
[1]: http://hackage.haskell.org/packages/archive/base/4.6.0.0/doc/html/GHC-IO-Encoding.html
cheers,
hvr