behaviour change in getDirectoryContents in GHC 7.2?

Simon Marlow marlowsd at gmail.com
Tue Nov 8 12:04:14 CET 2011


On 07/11/2011 17:32, John Millikin wrote:
> On Mon, Nov 7, 2011 at 09:02, Simon Marlow <marlowsd at gmail.com> wrote:
>> I think you might be misunderstanding how the new API works.  Basically,
>> imagine a reversible transformation:
>>
>>   encode :: String -> [Word8]
>>   decode :: [Word8] -> String
>>
>> this transformation is applied in the appropriate direction by the IO
>> library to translate filesystem paths into FilePath and vice versa.  No
>> information is lost; furthermore you can apply the transformation yourself
>> in order to recover the original [Word8] from a String, or to inject your
>> own [Word8] file path.
>>
>> Ok?
>
> I understand how the API is intended / designed to work; however, the
> implementation does not actually do this. My argument is that this
> transformation should be in a high-level library like "directory", and
> the low-level libraries like "base" or "unix" ought to provide
> functions which do not transform their inputs. That way, when an error
> is found in the encoding logic, it can be fixed by just pushing a new
> version of the affected library to Hackage, instead of requiring a new
> version of the compiler.
>
> I am also not convinced that it is possible to correctly implement
> either of these functions if their behavior is dependent on the user's
> locale.
>
>> All this does is mean that the common case where you want to interpret file
>> system paths as text works with no fuss, without breaking anything in the
>> case when the file system paths are not actually text.
>
> As mentioned earlier in the thread, this behavior is breaking things.
> Due to an implementation error, programs compiled with GHC 7.2 on
> POSIX systems cannot open files unless their paths also happen to be
> valid text according to their locale. It is very difficult to work
> around this error, because the paths-are-text logic was placed at a
> very low level in the library stack.

So your objection is that there is a bug?  What if we fixed the bug?

>> It would probably be better to have an abstract FilePath type and to keep
>> the original bytes, decoding on demand.  But that is a big change to the API
>> and would break much more code.  One day we'll do this properly; for now we
>> have this, which I think is a pretty reasonable compromise.
>
> Please understand, I am not arguing against the existence of this
> encoding layer in general. It's a fine idea for a simplistic
> high-level filesystem interaction library. But it should be
> *optional*, not part of the compiler or "base".

Ok, so I was about to reply and say that the low-level API is available 
via the unix and Win32 packages, and then I thought I should check 
first, and I discovered that even using System.Posix you get the magic 
encoding behaviour.

I really think we should provide the native APIs.  The problem is that 
the System.Posix.Directory API is all in terms of FilePath (=String), 
and if we gave that a different meaning from the System.Directory 
FilePaths then confusion would ensue.  So perhaps we need to add another 
API to System.Posix with filesystem operations in terms of ByteString, 
and similarly for Win32.
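
To make the round-trip idea quoted at the top concrete, here is a minimal
sketch of a lossless byte/String mapping. It assumes a toy escape scheme
(ASCII bytes map to themselves, every other byte b maps to the code point
0xDC00 + b); this is an illustration only and not the codec that GHC
actually uses. The names encode and decode come from the thread, but here
encode returns a Maybe because a String may contain characters that decode
never produces.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Toy escape scheme (an assumption for illustration, not GHC's codec):
-- ASCII bytes map to themselves; any other byte b maps to the code
-- point 0xDC00 + b, which lies in the range 0xDC80..0xDCFF.
decode :: [Word8] -> String
decode = map toChar
  where
    toChar b
      | b < 0x80  = chr (fromIntegral b)
      | otherwise = chr (0xDC00 + fromIntegral b)

-- Inverse of decode. Yields Nothing if the String contains a character
-- outside the two ranges that decode can produce.
encode :: String -> Maybe [Word8]
encode = mapM toByte
  where
    toByte c
      | n < 0x80                   = Just (fromIntegral n)
      | n >= 0xDC80 && n <= 0xDCFF = Just (fromIntegral (n - 0xDC00))
      | otherwise                  = Nothing
      where
        n = ord c

main :: IO ()
main = do
  -- "foo", a lone 0xFF, '/', then an invalid UTF-8 pair: not valid text.
  let raw = [0x66, 0x6F, 0x6F, 0xFF, 0x2F, 0xC3, 0x28] :: [Word8]
  -- The original bytes are recovered exactly.
  print (encode (decode raw) == Just raw)
```

Under this scheme every byte sequence survives the trip through String
unchanged, which is the property the IO library needs in order to open a
file whose name is not valid text in the current locale.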

Cheers,
	Simon


