[Haskell-cafe] Core packages and locale support

Roman Cheplyaka roma at ro-che.info
Fri Jun 25 02:42:21 EDT 2010


* Jason Dagit <dagit at codersbase.com> [2010-06-24 20:52:03-0700]
> On Sat, Jun 19, 2010 at 1:06 AM, Roman Cheplyaka <roma at ro-che.info> wrote:
> 
> > While ghc 6.12 finally has proper locale support, core packages (such as
> > unix) still use withCString and therefore work incorrectly when argument
> > (e.g. file path) is not ASCII.
> >
> 
> Pardon me if I'm misunderstanding withCString, but my understanding of unix
> paths is that they are to be treated as strings of bytes.  That is, unlike
> windows, they do not have an encoding predefined.  Furthermore, you could
> have two filepaths in the same directory with different encodings due to
> this.
> 
> In this case, what would be the correct way of handling the paths?
>  Converting to a Haskell String would require knowing the encoding, right?
>  My reasoning is that Haskell Char type is meant to correspond to code
> points so putting them into a string means you have to know their code point
> which is different from their (multi-)byte value right?
> 
> Perhaps I have some details wrong?  If so, please clarify.

Jason,

you got everything right. As you said, there is a mismatch between the
representation in Haskell (a list of code points) and the representation
in the operating system (a sequence of bytes), so we need to know the
encoding. The encoding is supplied by the user via the locale
(https://secure.wikimedia.org/wikipedia/en/wiki/Locale), in particular
the LC_CTYPE variable.
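
To make the mismatch concrete, here is a minimal sketch using only base.
The names encodeChar and truncateChar are made up for this illustration;
truncateChar models marshalling that keeps one byte per Char (an
assumption about the behaviour being complained about, not a quote of the
library source), and encodeChar is a hand-written UTF-8 encoder that does
not reject surrogates.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: keep only the low 8 bits of the code point,
-- i.e. what a byte-per-Char marshaller would hand to the OS.
truncateChar :: Char -> Word8
truncateChar = fromIntegral . ord

-- Hypothetical helper: encode one code point as UTF-8 bytes.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6,  cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6,  cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- leading bits of the code point, placed in the first byte
    hi s   = fromIntegral (n `shiftR` s)
    -- continuation byte: 10xxxxxx carrying six bits of the code point
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)
```

For ASCII both functions agree, which is why the problem goes unnoticed
with ASCII paths. For U+0436 (Cyrillic zhe) the truncation yields the
single byte 0x36, which is the ASCII digit '6', while a UTF-8 locale
expects the two bytes 0xD0 0xB6.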

The problem with encodings is not new -- it has already been solved
elsewhere, e.g. for input/output.
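
For comparison, this is roughly what the I/O side looks like: a Handle
carries a TextEncoding, and System.IO exports localeEncoding, utf8 and
hSetEncoding. The helper writeFileWith below is a hypothetical name used
only for this sketch.

```haskell
import System.IO

-- Hypothetical helper: write a String to a file using an explicit
-- encoding instead of whatever the handle would pick by default.
writeFileWith :: TextEncoding -> FilePath -> String -> IO ()
writeFileWith enc path s =
  withFile path WriteMode $ \h -> do
    hSetEncoding h enc   -- code points -> bytes happens in the Handle
    hPutStr h s
```

Passing localeEncoding as the first argument follows the user's locale,
while passing utf8 fixes the encoding regardless of the environment. File
paths would need the same kind of conversion at the point where a String
is turned into a C string.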

As I said, I'm willing to prepare the patches, but I really need a
mentor for this.

-- 
Roman I. Cheplyaka :: http://ro-che.info/
"Don't let school get in the way of your education." - Mark Twain

