portable encoding/decoding without going via a handle
Herbert Valerio Riedel
hvr at gnu.org
Sun Nov 25 11:51:32 CET 2012
Ganesh Sittampalam <ganesh at earth.li> writes:
> I need to convert directly between different string encodings, rather
> than just using a particular encoding when reading from/writing to a Handle.
> I'm aware of the following options, but they have a few problems:
> - text-icu: not easily usable on Windows as it requires libicu
> - text: just handles utf8/16/32
> - iconv: POSIX only
> It seems like GHC's TextEncoding has the necessary low-level functionality,
> but I can't find any high-level interface for directly transcoding
> between String/Bytestring/Text.
> Am I missing something, or would this be a useful addition as a separate
> package?
btw, looking at the GHC.IO.Encoding.* modules, it seems to me that
'mkTextEncoding' only supports utf8/16/32 in a system-independent way:
| The set of known encodings is system-dependent, but includes at least:
| - UTF-8
| - UTF-16, UTF-16BE, UTF-16LE
| - UTF-32, UTF-32BE, UTF-32LE
| On systems using GNU iconv (e.g. Linux), there is additional notation
| for specifying how illegal characters are handled:
| - a suffix of //IGNORE, e.g. UTF-8//IGNORE, will cause all illegal
| sequences on input to be ignored, and on output will drop all code
| points that have no representation in the target encoding.
| - a suffix of //TRANSLIT will choose a replacement character for
| illegal sequences or code points.
| On Windows, you can access supported code pages with the prefix CP; for
| example, "CP1250".
...so does using GHC.IO.Encoding.* actually provide you with more encodings
than using the other options ('text' et al.) you mentioned? Which text
encodings beyond the UTF family do you need, btw?
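
To see what the quoted documentation means on a particular machine, one could
probe 'mkTextEncoding' directly. The following is only a minimal sketch using
nothing but base: 'isEncodingAvailable' and the list of candidate names are
made up for illustration, and it assumes that 'mkTextEncoding' reports an
unknown encoding by throwing an IOException.

```haskell
import Control.Exception (IOException, try)
import GHC.IO.Encoding (TextEncoding, mkTextEncoding)

-- | Hypothetical helper: True if 'mkTextEncoding' accepts the given
-- encoding name on this system, False if it throws an IOException.
isEncodingAvailable :: String -> IO Bool
isEncodingAvailable name = do
  r <- try (mkTextEncoding name) :: IO (Either IOException TextEncoding)
  return (either (const False) (const True) r)

main :: IO ()
main = mapM_ report candidates
  where
    -- Illustrative names only: the UTF ones are documented as always
    -- present, the others depend on iconv or on Windows code pages.
    candidates = ["UTF-8", "UTF-16LE", "UTF-32BE", "ISO-8859-1", "CP1250"]
    report name = do
      ok <- isEncodingAvailable name
      putStrLn (name ++ ": " ++ (if ok then "available" else "not available"))
```

Only the UTF entries are expected to be reported as available everywhere; the
result for the other names is exactly the system-dependent part.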