portable encoding/decoding without going via a handle

Sun Nov 25 11:51:32 CET 2012

Ganesh Sittampalam <ganesh at earth.li> writes:

> I need to convert directly between different string encodings, rather
> than just using a particular encoding when reading from/writing to a Handle.
>
> I'm aware of the following options, but they have a few problems:
>
> - text-icu: not easily usable on Windows as it requires libicu
> - text: just handles utf8/16/32
> - iconv: POSIX only
>
> It seems like GHC's TextEncoding has the necessary low-level
> functionality
> (http://hackage.haskell.org/packages/archive/base/latest/doc/html/GHC-IO-Encoding-Types.html#t:BufferCodec),
> but I can't find any high-level interface for directly transcoding
> between String/ByteString/Text.
>
> Am I missing something, or would this be a useful addition as a separate
> library?

Btw, looking at the GHC.IO.Encoding.* modules, it seems to me that
'mkTextEncoding'[1] only supports UTF-8/16/32 in a system-independent
fashion:

,----
| The set of known encodings is system-dependent, but includes at least:
| 
|  - UTF-8
|  - UTF-16, UTF-16BE, UTF-16LE
|  - UTF-32, UTF-32BE, UTF-32LE 
| 
| On systems using GNU iconv (e.g. Linux), there is additional notation
| for specifying how illegal characters are handled:
| 
|  - a suffix of //IGNORE, e.g. UTF-8//IGNORE, will cause all illegal
|    sequences on input to be ignored, and on output will drop all code
|    points that have no representation in the target encoding.
| 
|  - a suffix of //TRANSLIT will choose a replacement character for
|    illegal sequences or code points. 
| 
| On Windows, you can access supported code pages with the prefix CP; for
| example, "CP1250".
`----
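
A quick way to see what a given system accepts beyond that guaranteed
list is to probe 'mkTextEncoding' with a few names. This is only a
rough sketch: the candidate names are arbitrary examples, and
'isSupported' is a helper made up for illustration. I believe some base
versions open iconv-backed encodings lazily, in which case a name can be
accepted here and still fail on first use, so treat a positive result as
a hint rather than a guarantee.

```haskell
import Control.Exception (IOException, try)
import System.IO (TextEncoding, mkTextEncoding)

-- Encoding names to probe. The non-UTF entries are arbitrary examples,
-- not a list taken from the documentation.
candidates :: [String]
candidates =
  ["UTF-8", "UTF-16LE", "UTF-32BE", "ISO-8859-1", "CP1250", "EUC-JP"]

-- Hypothetical helper: True if mkTextEncoding accepts the name without
-- raising an IOException on this system.
isSupported :: String -> IO Bool
isSupported name = do
  r <- try (mkTextEncoding name) :: IO (Either IOException TextEncoding)
  return (either (const False) (const True) r)

main :: IO ()
main = mapM_ report candidates
  where
    report name = do
      ok <- isSupported name
      putStrLn (name ++ ": " ++ (if ok then "accepted" else "rejected"))
```

The output differs between Linux (iconv) and Windows (code pages), which
is exactly the portability problem.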

So does using GHC.IO.Encoding.* actually provide you with more encodings
than the other options ('text' et al.) you mentioned? Which text
encodings beyond the UTF family do you need, btw?
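
For comparison, within the UTF family 'text' already converts directly
between encodings without a Handle. A minimal sketch, assuming the
'text' and 'bytestring' packages are installed; 'utf16leToUtf8' is a
name made up for this example, and the lenient error handling is just
one possible choice:

```haskell
import qualified Data.ByteString as B
import Data.Text.Encoding (decodeUtf16LEWith, encodeUtf8)
import Data.Text.Encoding.Error (lenientDecode)

-- Re-encode UTF-16LE bytes as UTF-8, going through Text.
-- With lenientDecode, invalid input is replaced by U+FFFD
-- instead of raising an exception.
utf16leToUtf8 :: B.ByteString -> B.ByteString
utf16leToUtf8 = encodeUtf8 . decodeUtf16LEWith lenientDecode

main :: IO ()
main = do
  -- "hé" in UTF-16LE: 'h' is 68 00, 'é' (U+00E9) is E9 00
  let input = B.pack [0x68, 0x00, 0xE9, 0x00]
  -- 'h' stays one byte in UTF-8; 'é' becomes the two bytes C3 A9
  print (B.unpack (utf16leToUtf8 input))  -- prints [104,195,169]
```

Anything outside UTF-8/16/32 is where this approach stops working.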

 [1]: http://hackage.haskell.org/packages/archive/base/4.6.0.0/doc/html/GHC-IO-Encoding.html

cheers,
   hvr