behaviour change in getDirectoryContents in GHC 7.2?

Daniel Peebles pumpkingod at gmail.com
Mon Nov 7 07:41:32 CET 2011


Can't we just have the usual .Internal module convention, where people who
want internals can get at them if they need to, and most people get a
simpler interface? It's amazingly frustrating when you have a library that
does 99% of what you need it to do, except for one tiny internal detail
that the author didn't foresee anyone needing, so didn't export.

2011/11/6 John Lask <jvlask at hotmail.com>

> For what it is worth, I would like to see both System.IO and System.Directory
> export "internal functions" where the filepath has a raw byte
> representation.
>
> I have utilities that regularly scan 100,000s of files and hash the paths.
> The details are irrelevant to this discussion; the point is that locale
> encoding/decoding plays no role in this situation and only adds overhead
> that slows down the file-system scans.
>
> A denotation of a filepath as an uninterpreted sequence of bytes is the
> lowest common denominator for all systems that I know of, and it would be
> worthwhile to export it from the system libraries so that other
> abstractions can be built on top of it.
>
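
As a sketch of what such a raw-byte interface can look like: newer releases of the unix package than the one shipped with GHC 7.2 include ByteString-based modules such as System.Posix.Directory.ByteString, where paths are plain ByteStrings and no locale decoding happens. The helper name rawDirectoryContents is hypothetical; the sketch assumes a POSIX system plus the unix and bytestring packages.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import System.Posix.Directory.ByteString (closeDirStream, openDirStream, readDirStream)

-- Hypothetical helper: list a directory's entries as raw bytes.
-- No locale encoding or decoding is involved at any point.
rawDirectoryContents :: B.ByteString -> IO [B.ByteString]
rawDirectoryContents dir = bracket (openDirStream dir) closeDirStream go
  where
    -- readDirStream returns an empty ByteString once the stream is exhausted.
    go ds = do
      entry <- readDirStream ds
      if B.null entry
        then return []
        else (entry :) <$> go ds

main :: IO ()
main = do
  entries <- rawDirectoryContents (BC.pack ".")
  -- Print each name as its list of byte values, skipping "." and "..".
  mapM_ (print . B.unpack) [e | e <- entries, e /= BC.pack ".", e /= BC.pack ".."]
```

Because the entries stay as ByteStrings, they can be hashed or compared directly, which fits the bulk-scan use case above without any locale round trip.
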
> I agree that for the general user the current behavior is sufficient;
> however, exporting the raw interface would benefit some users, for
> instance those who have responded to this thread.
>
>
> On 7/11/2011 2:42 AM, Max Bolingbroke wrote:
> > On 6 November 2011 04:14, John Millikin <jmillikin at gmail.com> wrote:
> >> For what it's worth, on my Ubuntu system, Nautilus ignores the locale
> >> and just treats all paths as either UTF8 or invalid.
> >> To me, this seems like the most reasonable option; the concept of
> >> "locale encoding" is entirely vestigial, and should only be used in
> >> certain specialized cases.
> >
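
The "UTF-8 or invalid" policy attributed to Nautilus above can be approximated with base alone: decode the raw bytes with a strict UTF-8 TextEncoding and treat a decoding failure as "invalid". This is a sketch assuming GHC.Foreign.peekCStringLen and the bytestring package; decodeUtf8Path is a hypothetical name.

```haskell
import Control.Exception (IOException, try)
import qualified Data.ByteString as B
import qualified GHC.Foreign as GHC
import System.IO (mkTextEncoding)

-- Hypothetical helper: Just the decoded name if the bytes are valid UTF-8,
-- Nothing otherwise. The locale is never consulted.
decodeUtf8Path :: B.ByteString -> IO (Maybe String)
decodeUtf8Path bs = do
  -- Plain "UTF-8" (no //ROUNDTRIP or //IGNORE suffix) throws on invalid input.
  utf8 <- mkTextEncoding "UTF-8"
  r <- try (B.useAsCStringLen bs (GHC.peekCStringLen utf8)) :: IO (Either IOException String)
  return (either (const Nothing) Just r)

main :: IO ()
main = do
  decodeUtf8Path (B.pack [0x68, 0x69]) >>= print  -- the bytes of "hi"
  decodeUtf8Path (B.pack [0x68, 0xFF]) >>= print  -- 0xFF is never valid UTF-8
```

Swapping "UTF-8" for "UTF-8//ROUNDTRIP" would instead keep invalid bytes as escape characters rather than rejecting the whole name.
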
> > Unfortunately non-UTF8 locale encodings are seen in practice quite
> > often. I'm not sure about Linux, but certainly lots of Windows systems
> > are configured with a locale encoding like GBK or Big5.
> >
> >> Paths as text is what *Windows* programmers expect. Paths as bytes is
> >> what's expected by programmers on non-Windows OSes, including Linux
> >> and OS X.
> >
> > IIRC paths on OS X are guaranteed to be valid UTF-8. The only platform
> > that uses bytes for paths (that we care about) is Linux.
> >
> >> I'm not saying one is inherently better than the other, but
> >> considering that various UNIX and UNIX-like operating systems have
> >> been using byte-based paths for nearly forty years now, trying to
> >> abolish them by redefining the type is not a useful action.
> >
> > We have to:
> >   1. Provide an API that makes sense on all our supported OSes
> >   2. Have getArgs :: IO [String]
> >   3. Have it such that if you go to your console and write
> > (./MyHaskellProgram 你好) then getArgs tells you ["你好"]
> >
> > Given these constraints I don't see any alternative to PEP-383 behaviour.
> >
> >> If you're going to make all the System.IO stuff use text, at least
> >> give us an escape hatch. The "unix" package is ideally suited, as it's
> >> already inherently OS-specific. Something like this would be perfect:
> >
> > You can already do this with the implemented design. We have:
> >
> > openFile :: FilePath -> IO Handle
> >
> > The FilePath will be encoded in the fileSystemEncoding. On Unix this
> > will have PEP-383 roundtripping behaviour. So if you want
> > openFile' :: [Byte] -> IO Handle, you can write something like this:
> >
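> > import Data.Char (chr)
> > -- ASCII bytes pass through; bytes >= 128 become U+EF80..U+EFFF escapes
> > escape = map (\b -> let n = fromIntegral b in if n < 128 then chr n else chr (0xEF00 + n))
> > openFile' = openFile . escape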
> >
> > The bytes that reach the API call will be exactly the ones you supply.
> > (You can also implement "escape" by just decoding the [Byte] with the
> > fileSystemEncoding.)
> >
> > Likewise, if you have a String and want to get the [Byte] we decoded
> > it from, you just need to encode the String again with the
> > fileSystemEncoding.
> >
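
Both directions described above can be sketched with GHC.Foreign, which converts between Strings and C buffers using an explicit TextEncoding. The sketch assumes the bytestring package, a reasonably recent base where the encoding is read with getFileSystemEncoding, and a POSIX system where that encoding does roundtrip escaping; bytesToPath and pathToBytes are hypothetical names.

```haskell
import qualified Data.ByteString as B
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (TextEncoding, getFileSystemEncoding)

-- Hypothetical helper: decode raw path bytes into a String.
bytesToPath :: TextEncoding -> B.ByteString -> IO String
bytesToPath enc bs = B.useAsCStringLen bs (GHC.peekCStringLen enc)

-- Hypothetical helper: re-encode a String into the bytes it came from.
pathToBytes :: TextEncoding -> String -> IO B.ByteString
pathToBytes enc s = GHC.withCStringLen enc s B.packCStringLen

main :: IO ()
main = do
  enc <- getFileSystemEncoding
  -- 'f', a byte that is not valid UTF-8 on its own, then 'o'.
  let raw = B.pack [0x66, 0xFF, 0x6F]
  path <- bytesToPath enc raw
  back <- pathToBytes enc path
  print path
  print (back == raw)
```

The exact Char used for the 0xFF byte depends on the GHC version's escape scheme, but the re-encoded bytes should match the input, which is the roundtrip property the argument relies on.
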
> > If this is not enough for you please let me know, but it seems to me
> > that it covers all your use cases, without any need to reimplement the
> > FFI bindings.
> >
> > Max
> >
> > _______________________________________________
> > Glasgow-haskell-users mailing list
> > Glasgow-haskell-users at haskell.org
> > http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


More information about the Glasgow-haskell-users mailing list