non-ASCII filepaths in a C function

David Turner dct25-561bs at mythic-beasts.com
Sat Jul 25 21:04:54 UTC 2015


Hi,

The native representation for filepaths on Linux is char[] (i.e. raw
bytes, with no encoding attached). withCString converts from String to
char[] using the current locale's encoding, which doesn't always do what
you want. As long as everything is in the same locale, ideally UTF-8, then
you'll be fine, but it's legitimate to have a file whose name is not legal
UTF-8 even in a UTF-8 locale, and such names may not survive the
conversion to String and back unchanged.
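
To make the "not legal UTF-8" case concrete, here is a minimal sketch of a well-formedness check in C. The function name is_valid_utf8 is made up for illustration and is not part of soxlib or any library mentioned in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the n bytes at s are well-formed
   UTF-8 (no overlong forms, no surrogates, nothing above U+10FFFF). */
bool is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        size_t len;
        unsigned long cp, min;

        if (b < 0x80) { i++; continue; }               /* ASCII */
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; min = 0x10000; }
        else return false;       /* stray continuation or invalid lead byte */

        if (n - i < len) return false;                 /* truncated sequence */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min) return false;                    /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; /* surrogate */
        if (cp > 0x10FFFF) return false;               /* out of range */
        i += len;
    }
    return true;
}
```

A name such as the bytes 63 61 66 e9 ("café" in Latin-1) is a perfectly legal Linux filename, but this check rejects it, while the UTF-8 spelling 63 61 66 c3 a9 passes.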

(Minor, nitpicky bugbear: the native representation for filepaths on
Windows is wchar_t[], which is interpreted as UTF-16 *where possible*, but
there are also some legal filenames (e.g. "C:\\Temp\\\xd800") which are
invalid as UTF-16.)
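
For anyone who wants to see why that example name is invalid, the following is a small sketch of a UTF-16 well-formedness check over 16-bit code units. is_valid_utf16 is a hypothetical name, and uint16_t stands in for the Windows wchar_t so that the code also compiles elsewhere.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if the n code units at u are
   well-formed UTF-16, i.e. every surrogate is part of a high/low pair. */
bool is_valid_utf16(const uint16_t *u, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (u[i] >= 0xD800 && u[i] <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 >= n || u[i + 1] < 0xDC00 || u[i + 1] > 0xDFFF)
                return false;
            i++;                       /* skip the low half of the pair */
        } else if (u[i] >= 0xDC00 && u[i] <= 0xDFFF) {
            return false;              /* low surrogate with no high half */
        }
    }
    return true;
}
```

The trailing 0xD800 in the name above is a high surrogate with nothing after it, so the check fails even though the filesystem accepts the name.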

I'm not familiar with soxlib specifically, but for opening a file on
Windows named as a char[] I'm going to guess that the library ultimately
ends up calling a so-called ANSI version of a function like CreateFileA,
which accepts a char[] and converts it to wchar_t[] within the OS according
to the current code page. withCString seems to look at the current code
page when converting a String to a char[] too, but clearly something's not
matching for you.

So a few things to check:

- does soxlib use the ANSI version, CreateFileA or similar?
- what code page does it think it's in?
- can you convert the troublesome filename to bytes in this code page by
  hand, and compare with what withCString is doing?
- can you convert these bytes to wchar_t[] using MultiByteToWideChar in the
  current code page? Does this look like what you expect?
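
One way to do the byte comparison is to temporarily bind a tiny C function in place of sox_open_read and look at what actually arrives. The sketch below is only that: dump_path_bytes is a made-up helper, not something soxlib provides.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical debugging helper: renders the bytes of the NUL-terminated
   string `path` into `out` as space-separated lowercase hex pairs.
   Returns the number of bytes rendered, or (size_t)-1 if `out` is too
   small to hold the result. */
size_t dump_path_bytes(const char *path, char *out, size_t outsz)
{
    size_t used = 0, count = 0;

    if (outsz == 0) return (size_t)-1;
    out[0] = '\0';
    for (const unsigned char *p = (const unsigned char *)path; *p; p++) {
        int w = snprintf(out + used, outsz - used,
                         count ? " %02x" : "%02x", (unsigned)*p);
        if (w < 0 || (size_t)w >= outsz - used)
            return (size_t)-1;         /* output would be truncated */
        used += (size_t)w;
        count++;
    }
    return count;
}
```

If the bytes printed for the troublesome name differ from what you get by encoding it in the current code page by hand, the mismatch is on the Haskell side; if they agree, the problem is more likely in how the library or the OS interprets them.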

Unfortunately there's no complete general solution to this problem that
fits through an API that only uses char[] for filenames - the mapping from
filenames written as char[] to Windows filenames is never surjective. The
best solution would be for soxlib to offer an API that accepted wchar_t[]
filenames on Windows, although I appreciate this might not be reasonable!
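
As a toy illustration of the surjectivity point, the sketch below uses ISO-8859-1 as a stand-in for "the current code page" (an assumption made purely for simplicity; real Windows code pages are more complicated, and both function names are made up). Each byte decodes to exactly one code unit in the range 0x00 to 0xFF, so any wide filename containing a larger code unit cannot be produced from a char[] at all.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the OS conversion: ISO-8859-1 maps byte b to code
   unit b. `out` must have room for n code units. */
void latin1_to_utf16(const unsigned char *in, size_t n, uint16_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
}

/* A wide name is in the image of latin1_to_utf16 only if every code
   unit fits in one byte. */
bool reachable_via_latin1(const uint16_t *name, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (name[i] > 0xFF)
            return false;
    return true;
}
```

Under this stand-in, "café" is reachable but a name containing U+4E2D is not, whatever bytes the caller passes in.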

Hopefully this helps a bit.

On 25 July 2015 at 08:40, Malcolm Wallace <malcolm.wallace at me.com> wrote:

> I believe the native representation for FilePaths on Windows should be
> UTF16 strings.
>
> Regards,
>     Malcolm
>
> > On 24 Jul 2015, at 22:52, Henning Thielemann <lemming at henning-thielemann.de> wrote:
> >
> >
> > In my 'soxlib' package I have written a binding to
> >
> > sox_format_t * sox_open_read(
> >    char               const * path,
> >    sox_signalinfo_t   const * signal,
> >    sox_encodinginfo_t const * encoding,
> >    char               const * filetype);
> >
> >
> > I construct the C filepath "path" from a Haskell FilePath using
> > Foreign.C.String.withCString. This works for ASCII and non-ASCII characters
> > in Linux. However, non-ASCII characters let sox_open_read fail on Windows.
> > What is the correct way to convert FilePath to "char *"?
> > _______________________________________________
> > Libraries mailing list
> > Libraries at haskell.org
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries