behaviour change in getDirectoryContents in GHC 7.2?

John Millikin jmillikin at gmail.com
Mon Nov 7 18:32:01 CET 2011


On Mon, Nov 7, 2011 at 09:02, Simon Marlow <marlowsd at gmail.com> wrote:
> I think you might be misunderstanding how the new API works.  Basically,
> imagine a reversible transformation:
>
>  encode :: String -> [Word8]
>  decode :: [Word8] -> String
>
> this transformation is applied in the appropriate direction by the IO
> library to translate filesystem paths into FilePath and vice versa.  No
> information is lost; furthermore you can apply the transformation yourself
> in order to recover the original [Word8] from a String, or to inject your
> own [Word8] file path.
>
> Ok?

I understand how the API is intended / designed to work; however, the
implementation does not actually do this. My argument is that this
transformation should be in a high-level library like "directory", and
the low-level libraries like "base" or "unix" ought to provide
functions which do not transform their inputs. That way, when an error
is found in the encoding logic, it can be fixed by just pushing a new
version of the affected library to Hackage, instead of requiring a new
version of the compiler.

I am also not convinced that it is possible to correctly implement
either of these functions if their behavior is dependent on the user's
locale.
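
To make the round-trip property under discussion concrete, here is a
minimal sketch of such a transformation written as ordinary library
code. It assumes a toy ASCII-only "locale" and escapes each undecodable
byte to a lone surrogate in U+DC80..U+DCFF. Both choices are assumptions
for illustration only, not a description of what GHC 7.2 actually does.
Unlike the signature quoted above, this encode returns a Maybe, because
some Strings have no byte form in the toy locale.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical toy "locale": ASCII only. Bytes >= 0x80 are not text in
-- this locale, so each one is escaped to a lone surrogate U+DC80..U+DCFF.
decode :: [Word8] -> String
decode = map toChar
  where
    toChar b
      | b < 0x80  = chr (fromIntegral b)
      | otherwise = chr (0xDC00 + fromIntegral b)

-- Inverse of decode. Returns Nothing for a character that is neither
-- ASCII nor one of the escape surrogates, since it has no byte form here.
encode :: String -> Maybe [Word8]
encode = mapM toByte
  where
    toByte c
      | n < 0x80                   = Just (fromIntegral n)
      | n >= 0xDC80 && n <= 0xDCFF = Just (fromIntegral (n - 0xDC00))
      | otherwise                  = Nothing
      where
        n = ord c

-- The property that matters: every byte string survives the round trip.
roundTrips :: [Word8] -> Bool
roundTrips bytes = encode (decode bytes) == Just bytes
```

In this sketch roundTrips holds for every byte string, including ones
that are not valid text. A real locale encoding is much harder to get
right, which is why I would rather have this logic in a library that can
be fixed with a Hackage upload.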

> All this does is mean that the common case where you want to interpret file
> system paths as text works with no fuss, without breaking anything in the
> case when the file system paths are not actually text.

As mentioned earlier in the thread, this behavior is breaking things.
Due to an implementation error, programs compiled with GHC 7.2 on
POSIX systems cannot open files unless their paths also happen to be
valid text according to their locale. It is very difficult to work
around this error, because the paths-are-text logic was placed at a
very low level in the library stack.

> It would probably be better to have an abstract FilePath type and to keep
> the original bytes, decoding on demand.  But that is a big change to the API
> and would break much more code.  One day we'll do this properly; for now we
> have this, which I think is a pretty reasonable compromise.

Please understand, I am not arguing against the existence of this
encoding layer in general. It's a fine idea for a simplistic
high-level filesystem interaction library. But it should be
*optional*, not part of the compiler or "base".

As implemented in GHC 7.2, this encoding is a complex and untested
behavior with no escape hatch.


