Text in Haskell: A PROPOSAL

Axel Simon A.Simon@ukc.ac.uk
Thu, 8 Aug 2002 11:44:19 +0100


On Thu, Aug 08, 2002 at 03:16:09AM -0700, Ashley Yakeley wrote:
> At 2002-08-08 02:54, Ketil Z. Malde wrote:
> >and let the "standard" functions (e.g. readFile) convert to
> >[Char] according to current locale settings?
> 
> The notion of "current locale settings" (including newline conventions) 
> bothers me. I'd like my Haskell program to do the same thing regardless 
> of which machine executes it -- particularly these days when files get 
> shared around a lot.
But you can't make a problem go away by pretending it isn't there. If you 
have a file in a different encoding than your current locale, at least you 
can change the locale and then run your Haskell program on that file. I 
think the default encoding should depend on the current locale, plus some 
clever "guess" functionality which switches to UTF-8 or UTF-16 when it sees 
a byte-order mark at the beginning of a file. I think it is important that 
the representation _within_ Haskell is well-defined (i.e. Unicode code 
points with \n as newline).
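
To make the "guess" idea concrete, here is a rough sketch of what such a 
check could look like. The names Encoding and guessEncoding are made up for 
illustration and are not part of any existing library. It only looks for the 
UTF-8 and UTF-16 byte-order marks and otherwise falls back to whatever the 
locale says; it ignores UTF-32, whose little-endian mark starts with the 
same two octets as the UTF-16 one.

```haskell
import Data.Word (Word8)

-- Hypothetical type for illustration: the encodings this sketch can tell apart.
data Encoding = UTF8 | UTF16BE | UTF16LE | LocaleDefault
  deriving (Eq, Show)

-- Hypothetical helper: inspect the leading octets of a file for a
-- byte-order mark. Returns the guessed encoding together with the
-- remaining octets, with the mark itself stripped off.
guessEncoding :: [Word8] -> (Encoding, [Word8])
guessEncoding (0xEF : 0xBB : 0xBF : rest) = (UTF8, rest)
guessEncoding (0xFE : 0xFF : rest)        = (UTF16BE, rest)
guessEncoding (0xFF : 0xFE : rest)        = (UTF16LE, rest)
-- No mark found: leave the octets untouched and defer to the locale.
guessEncoding octets                      = (LocaleDefault, octets)
```

A file without a mark is passed through unchanged, so the locale-dependent 
default still decides how it gets decoded.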

> Do we really need "text mode" anymore?
What do you mean?
 
> >  With, perhaps, UTF-8 as a reasonable default?
> 
> Perhaps it should _always_ be UTF-8? Or is that too slow in some cases? 
> It certainly raises "seek" issues as one Char codepoint may be 
> represented by several octets.
Luckily "seek" is not a problem with readFile and most other common 
functions. It only affects
hSeek :: Handle -> SeekMode -> Integer -> IO ()
which should then carry the remark "works reliably only with hGetOctet, 
not with hGetChar".
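
To illustrate why an octet offset and a Char index drift apart under UTF-8, 
here is a small sketch. The helpers encodeChar and octetOffset are 
hypothetical names for this example only, and the encoder does not reject 
surrogate code points as a real one would.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: encode one code point as UTF-8 octets.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. lead 6, cont 0]
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Payload bits of the leading octet.
    lead :: Int -> Word8
    lead s = fromIntegral (n `shiftR` s)
    -- A continuation octet carrying six payload bits.
    cont :: Int -> Word8
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Hypothetical helper: the octet offset at which the Char with the
-- given index starts, once the string is encoded as UTF-8.
octetOffset :: String -> Int -> Int
octetOffset s i = sum (map (length . encodeChar) (take i s))
```

For the string "h\233llo" the Char at index 2 starts at octet offset 3, 
because '\233' takes two octets. Finding that offset means scanning 
everything before it, which is why an hSeek position can only sensibly be 
counted in octets.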

Axel.