[Haskell-cafe] Core packages and locale support

Jason Dagit dagit at codersbase.com
Fri Jun 25 13:09:21 EDT 2010


On Thu, Jun 24, 2010 at 11:42 PM, Roman Cheplyaka <roma at ro-che.info> wrote:

> * Jason Dagit <dagit at codersbase.com> [2010-06-24 20:52:03-0700]
> > On Sat, Jun 19, 2010 at 1:06 AM, Roman Cheplyaka <roma at ro-che.info> wrote:
> >
> > > While ghc 6.12 finally has proper locale support, core packages (such as
> > > unix) still use withCString and therefore work incorrectly when argument
> > > (e.g. file path) is not ASCII.
> > >
> >
> > Pardon me if I'm misunderstanding withCString, but my understanding of unix
> > paths is that they are to be treated as strings of bytes.  That is, unlike
> > windows, they do not have an encoding predefined.  Furthermore, you could
> > have two filepaths in the same directory with different encodings due to
> > this.
> >
> > In this case, what would be the correct way of handling the paths?
> > Converting to a Haskell String would require knowing the encoding, right?
> > My reasoning is that Haskell Char type is meant to correspond to code
> > points so putting them into a string means you have to know their code point
> > which is different from their (multi-)byte value right?
> >
> > Perhaps I have some details wrong?  If so, please clarify.
>
> Jason,
>
> you got everything right here. So, as you said, there is a mismatch
> between representation in Haskell (list of code points) and
> representation in the operating system (list of bytes), so we need to
> know the encoding. Encoding is supplied by the user via locale
> (https://secure.wikimedia.org/wikipedia/en/wiki/Locale), particularly
> LC_CTYPE variable.
>
> The problem with encodings is not new -- it was already solved e.g. for
> input/output.
>

This is the part of the problem I don't understand well.  I thought that with
I/O the program assumes the locale of the environment, but that with file paths
you don't know which locale (more specifically, which encoding) they were
created with.  So if you treat them as having the encoding of the current
environment, you run the risk of misinterpreting their bytes.
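
To make the concern concrete, here is a minimal sketch of what I mean.  It is
not what the unix package or withCString does; encodeCharUtf8 and decodeLatin1
are helper names I made up for illustration, and it assumes nothing beyond
base.  The same name is written out as bytes under two encodings, and reading
the UTF-8 bytes back as Latin-1 does not give the original String:

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Encode one code point as UTF-8 (hypothetical helper, no surrogate checks).
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- Leading bits of the code point, placed in the first byte.
    hi s   = fromIntegral (n `shiftR` s)
    -- Six payload bits in a continuation byte (10xxxxxx).
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Interpret every byte as exactly one Latin-1 code point.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- "cafe" with an acute accent on the e (U+00E9), written as an escape.
name :: String
name = "caf\233"

-- The bytes a UTF-8 environment would store for this name.
utf8Bytes :: [Word8]
utf8Bytes = concatMap encodeCharUtf8 name

-- The bytes a Latin-1 environment would store for the same name.
latin1Bytes :: [Word8]
latin1Bytes = map (fromIntegral . ord) name

main :: IO ()
main = do
  print utf8Bytes                          -- [99,97,102,195,169]
  print latin1Bytes                        -- [99,97,102,233]
  print (decodeLatin1 utf8Bytes == name)   -- False
  print (decodeLatin1 latin1Bytes == name) -- True
```

Both byte sequences are legal unix file names and could sit in the same
directory, so whichever single encoding the current locale supplies, decoding
one of them yields the wrong String.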

Jason