[Haskell-cafe] What is the state of Unicode in Haskell
implementations?
Duncan Coutts
duncan.coutts at worc.ox.ac.uk
Mon Jul 31 08:07:56 EDT 2006
On Mon, 2006-07-31 at 13:56 +0200, Olof Bjarnason wrote:
> Hi there!
> I'm trying to use Haskell as a code-generating language, specifically
> generating C# code files. The wish list is
> 1) reading UTF-8 coded text files into unicode-enabled Strings, let's
> call them UString
The ordinary Haskell String type is already "unicode-enabled": a String
is a list of Char, and each Char is a Unicode code point.
> 2) writing UStrings to UTF-8 coded text files
> 3) using unicode strings in-code, that is in my .hs files
>
> I can live without 3), and with a little good will also 2), but 1) is
> harder since I cannot really hope my input files (meta-data-files) are
> coded in anything else than UTF-8.
You can do 1) and 2) now with a little extra code for decoding and
encoding UTF-8. You will be able to do 3) in GHC 6.6.
For 1) and 2), grab some UTF-8 code from somewhere:

    encode, decode :: String -> String

and define

    readFileUTF8 fname = fmap decode (readFile fname)
    writeFileUTF8 fname content = writeFile fname (encode content)
So all internal processing happens on String, which is Unicode, and you
encode and decode only at the edges, when you read or write UTF-8
encoded files.
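
For illustration, here is a minimal sketch of what such encode and decode
functions might look like. It is not taken from any particular library.
It assumes the encoded String holds one byte per Char (values 0..255),
and the helper encodeChar and the use of U+FFFD for malformed input are
choices made for this example. It also does not reject overlong forms or
surrogates, which a strict decoder should. The file helpers open the
files in binary mode, an assumption made so that each Char read or
written is exactly one byte even where plain readFile and writeFile
would apply a locale encoding.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import System.IO (IOMode (..), hGetContents, hPutStr, openBinaryFile, withBinaryFile)

-- | Encode Unicode code points as UTF-8. Each Char of the result is one byte.
encode :: String -> String
encode = concatMap encodeChar

-- | Hypothetical helper: the 1 to 4 UTF-8 bytes for a single code point.
encodeChar :: Char -> String
encodeChar c
  | n < 0x80    = [c]
  | n < 0x800   = map chr [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = map chr [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = map chr [0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- | Decode a UTF-8 byte string (one byte per Char) into code points.
-- Malformed sequences become U+FFFD (an assumption of this sketch).
decode :: String -> String
decode [] = []
decode (c : cs)
  | b < 0x80              = c : decode cs
  | b >= 0xC0 && b < 0xE0 = multi 1 (b .&. 0x1F)
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F)
  | b >= 0xF0 && b < 0xF8 = multi 3 (b .&. 0x07)
  | otherwise             = replacement : decode cs
  where
    b = ord c
    -- Consume k continuation bytes and combine them with the lead bits.
    multi k hi =
      let (conts, rest) = splitAt k cs
          v = foldl step hi conts
      in if length conts == k && all isCont conts && v <= 0x10FFFF
           then chr v : decode rest
           else replacement : decode cs
    step acc x = (acc `shiftL` 6) .|. (ord x .&. 0x3F)
    isCont x = ord x .&. 0xC0 == 0x80
    replacement = '\xFFFD'

-- | Read a UTF-8 file; binary mode yields one Char per byte for decode.
readFileUTF8 :: FilePath -> IO String
readFileUTF8 fname = fmap decode (openBinaryFile fname ReadMode >>= hGetContents)

-- | Write a String as UTF-8; binary mode writes each encoded Char as one byte.
writeFileUTF8 :: FilePath -> String -> IO ()
writeFileUTF8 fname content =
  withBinaryFile fname WriteMode (\h -> hPutStr h (encode content))
```

With these definitions, decode (encode s) gives back s for any String
that contains no surrogate code points, so the rest of the program only
ever sees ordinary Unicode Strings.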
Duncan