Text in Haskell: A PROPOSAL
Axel Simon
A.Simon@ukc.ac.uk
Thu, 8 Aug 2002 11:44:19 +0100
On Thu, Aug 08, 2002 at 03:16:09AM -0700, Ashley Yakeley wrote:
> At 2002-08-08 02:54, Ketil Z. Malde wrote:
> >and let the "standard" functions (e.g. readFile) convert to
> >[Char] according to current locale settings?
>
> The notion of "current locale settings" (including newline conventions)
> bothers me. I'd like my Haskell program to do the same thing regardless
> of which machine executes it -- particularly these days when files get
> shared around a lot.
But you can't make a problem go away by pretending it isn't there. If you
have a file in a different encoding than your current locale, at least you
can change the locale and then run your Haskell program on that file. I
think the default encoding should depend on the current locale, plus some
clever "guess" functionality which switches to UTF-8 or another Unicode
encoding when it sees the magic characters (a byte-order mark) at the
beginning of a file. I think it is important that the representation
_within_ Haskell is well-defined (i.e. Unicode code points with \n as
newline).
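
A rough sketch of the "guess" step, assuming the magic characters are a
byte-order mark and that the first few octets of the file are already in
hand. The Guess type and guessEncoding are made-up names for illustration,
not part of any library. The sketch also ignores UTF-32, whose
little-endian mark starts with the same two octets as UTF-16LE.

```haskell
import qualified Data.ByteString as B

-- Hypothetical result type: which decoder to use for the file.
data Guess = Utf8 | Utf16BE | Utf16LE | UseLocale
  deriving (Eq, Show)

-- Look at the leading octets for a byte-order mark.
-- Without a mark, fall back to whatever the current locale says.
guessEncoding :: B.ByteString -> Guess
guessEncoding bs
  | B.pack [0xEF, 0xBB, 0xBF] `B.isPrefixOf` bs = Utf8
  | B.pack [0xFE, 0xFF]       `B.isPrefixOf` bs = Utf16BE
  | B.pack [0xFF, 0xFE]       `B.isPrefixOf` bs = Utf16LE
  | otherwise                                   = UseLocale
```

A file with no mark at all ends up as UseLocale, so changing the locale
still works as the override described above.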
> Do we really need "text mode" anymore?
What do you mean?
> > With, perhaps, UTF-8 as a reasonable default?
>
> Perhaps it should _always_ be UTF-8? Or is that too slow in some cases?
> It certainly raises "seek" issues as one Char codepoint may be
> represented by several octets.
Luckily "seek" is not a problem with readFile and most other common
functions. It's only
  hSeek :: Handle -> SeekMode -> Integer -> IO ()
which should then get the remark "works reliably only with hGetOctet,
not with hGetChar".
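
To illustrate why octet offsets and Char positions drift apart under
UTF-8, here is a small sketch. utf8Width and octetOffset are names I made
up for this mail; they only count octets and do no I/O.

```haskell
import Data.Char (ord)

-- Number of octets UTF-8 needs for one code point.
utf8Width :: Char -> Int
utf8Width c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- Octet offset at which the Char with index i starts,
-- if the string were written out as UTF-8.
octetOffset :: Int -> String -> Int
octetOffset i = sum . map utf8Width . take i
```

For example, octetOffset 3 "na\239ve" is 4, not 3, because U+00EF takes
two octets. An offset handed to hSeek therefore only means something in
terms of octets, unless every Char before it happens to be ASCII.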
Axel.