Raw filenames vs locales
igloo at earth.li
Sat Jul 30 10:50:13 EDT 2005
I'd like to propose some changes to the IO library to fix some problems
(IMO) with hugs' recent closer adherence to the Haskell 98 report. I
believe this is orthogonal to the proposed new IO library stuff that's
been discussed before.
The rest of this mail is in 3 parts. First I describe the problem, then
a proposed solution, and finally some comments on backwards compatibility.
With its closer adherence to the Haskell 98 report, it is no longer
possible with hugs to manipulate files using the standard IO functions
if the filenames are not representable in your locale. To demonstrate
the problem consider this:
touch `printf "1\xA4"`
echo '
import System.Directory (getDirectoryContents)
import Data.Char (ord)
main = do xs <- getDirectoryContents "."
          print (map (map ord) xs)
' > foo.hs
for locale in en_GB en_GB.ISO-8859-15 en_GB.UTF-8
do
  echo "Doing $locale"
  LC_ALL=$locale ../runhugs foo.hs
done
Here we create a file whose filename is 1\xA4, i.e. the byte for '1'
followed by the byte 0xA4. 0xA4 is "currency sign" in ISO-8859-1, "euro
sign" in ISO-8859-15, and on its own not valid UTF-8 at all. We then
print the results of getDirectoryContents, converting the Chars to Ints
so we can see exactly which code points come back.
The result is this:
The third file is the interesting one. We have:
ISO-8859-1: 164 = U+A4 = "currency sign"
ISO-8859-15: 8364 = U+20AC = "euro sign"
UTF-8: 65533 = U+FFFD = "replacement character"
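The "replacement character" is "used to replace an incoming character
whose value is unknown or unrepresentable in Unicode". Since every
undecodable byte turns into the same U+FFFD, a program running in a
UTF-8 locale cannot recover the original name, and so cannot open,
rename or remove the file with the standard IO functions.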
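To make the three results concrete, here is a minimal sketch of how the
same two bytes, 0x31 0xA4, decode under each encoding. The decoders are
hypothetical helpers written for illustration, not Hugs' own conversion
code, and the UTF-8 one is deliberately tiny: it only understands ASCII
and two-byte sequences.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)

-- ISO-8859-1: each byte maps to the code point with the same value.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- ISO-8859-15 is ISO-8859-1 with eight positions replaced;
-- 0xA4 becomes the euro sign.
decodeLatin9 :: [Word8] -> String
decodeLatin9 = map conv
  where
    conv 0xA4 = '\x20AC'  -- euro sign
    conv 0xA6 = '\x0160'
    conv 0xA8 = '\x0161'
    conv 0xB4 = '\x017D'
    conv 0xB8 = '\x017E'
    conv 0xBC = '\x0152'
    conv 0xBD = '\x0153'
    conv 0xBE = '\x0178'
    conv b    = chr (fromIntegral b)

-- A deliberately small lenient UTF-8 decoder: ASCII and two-byte
-- sequences are decoded; any other byte is replaced by U+FFFD.
decodeUtf8Lenient :: [Word8] -> String
decodeUtf8Lenient [] = []
decodeUtf8Lenient (b : bs)
  | b < 0x80 = chr (fromIntegral b) : decodeUtf8Lenient bs
  | b >= 0xC2, b <= 0xDF, (c : rest) <- bs, isCont c =
      chr (((fromIntegral b .&. 0x1F) `shiftL` 6) .|. (fromIntegral c .&. 0x3F))
        : decodeUtf8Lenient rest
  | otherwise = '\xFFFD' : decodeUtf8Lenient bs
  where
    -- continuation bytes have the bit pattern 10xxxxxx
    isCont c = c .&. 0xC0 == 0x80
```

Any byte that cannot be decoded maps to the same U+FFFD, so
decodeUtf8Lenient [0xA4] and decodeUtf8Lenient [0xA5] are equal: the
UTF-8 result no longer carries the information needed to name the file.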
My suggestion is essentially that we change all functions using the
FilePath type to instead use FilePath a => a.
[ By jumping through hoops I think this could be done H98-compatibly,
but for simplicity I'll ignore that for now. I'm not sure if it's a
problem for any impl anyway? ]
I imagine the class would look something like this:
class FilePath a where
    to_filename :: a -> IO FileName
    from_filename :: FileName -> IO a
    from_free_filename :: FileName -> IO a
    -- convert, then release the FileName's storage
    from_free_filename f = do x <- from_filename f
                              free f
                              return x

with_filename :: FilePath a => a -> (FileName -> IO b) -> IO b
with_filename x f = do x' <- to_filename x
                       res <- f x'
                       free x'
                       return res
We would then have
System.IO.Impl.getDirContents :: FileName -> IO [FileName]
System.IO.getDirContents :: FilePath a => a -> IO [a] -- Could be more general
System.IO.getDirContents x = do ys <- with_filename x Impl.getDirContents
                                mapM from_free_filename ys
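A compilable sketch of the class and wrapper above, under some stated
assumptions: FileName is a malloc'd, NUL-terminated Ptr Word8 (the Unix
case); the String instance uses ISO-8859-1 as a stand-in for real locale
conversion; and implGetDirContents is a hypothetical stub for
System.IO.Impl.getDirContents that pretends the directory holds only the
file 1\xA4. It needs FlexibleInstances for the [Word8] and [Char]
instances, and hides the Prelude's FilePath synonym.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Prelude hiding (FilePath)
import Control.Exception (bracket)
import Data.Char (chr, ord)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (free)
import Foreign.Marshal.Array (newArray0, peekArray0)
import Foreign.Ptr (Ptr)

-- Assumption: as on Unix, a FileName is a malloc'd, NUL-terminated byte string.
type FileName = Ptr Word8

class FilePath a where
  to_filename        :: a -> IO FileName
  from_filename      :: FileName -> IO a
  from_free_filename :: FileName -> IO a
  -- Default: convert, then release the C buffer.
  from_free_filename f = do x <- from_filename f
                            free f
                            return x

-- Raw bytes are copied to and from the C buffer unchanged; no locale involved.
instance FilePath [Word8] where
  to_filename   = newArray0 0
  from_filename = peekArray0 0

-- Hypothetical stand-in for locale conversion: ISO-8859-1. It is lossy for
-- characters above U+00FF, which are truncated to their low byte.
instance FilePath [Char] where
  to_filename s   = to_filename (map (fromIntegral . ord) s :: [Word8])
  from_filename f = fmap (map (chr . fromIntegral)) (from_filename f :: IO [Word8])

-- bracket frees the temporary FileName even if the action throws.
with_filename :: FilePath a => a -> (FileName -> IO b) -> IO b
with_filename x = bracket (to_filename x) free

-- Hypothetical stub for System.IO.Impl.getDirContents: ignores its argument
-- and returns freshly malloc'd names for a directory containing only "1\xA4".
implGetDirContents :: FileName -> IO [FileName]
implGetDirContents _ = mapM (newArray0 0) [[0x31, 0xA4]]

getDirContents :: FilePath a => a -> IO [a]
getDirContents x = do ys <- with_filename x implGetDirContents
                      mapM from_free_filename ys
```

Calling getDirContents "." gives Strings decoded by the stand-in
encoding, while getDirContents ([0x2E] :: [Word8]) gives the bytes
exactly as the stub returned them. Using bracket in with_filename is a
choice of this sketch, not part of the proposal.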
On Unix systems FileName would be a Ptr Word8.
My knowledge of Windows isn't great, but I think there it would be an
array of 16-bit values?
We would have instances of FilePath for String and [Word8] to solve the
immediate problem. String would be the current behaviour, but [Word8]
would be converted to a FileName unchanged. On Windows it would probably
be necessary to throw an exception if a [Word8] is passed which is not
It would also be nice to have a FileName instance, to avoid unnecessary
conversions. A Ptr Word8 instance would also be handy for code like
darcs' FastPackedString module, letting it pass filenames efficiently
(without a round trip through a lazy list).
I haven't done any research into it, but I hope that most of the time
the extra polymorphism will not cause ambiguity, as the impl will be
able to infer that String is being used: from a string literal, from
the value being putStrLn'd, from a type signature saying it is a
String, etc.
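The inference point can be illustrated with a pure toy. PathLike is a
hypothetical, IO-free stand-in for the proposed class (ISO-8859-1 again
stands in for the locale), and listSame and listGeneral mimic the
proposed getDirContents type and the "more general" variant hinted at
in its comment. When the result type is tied to the argument, a string
literal settles it; when it is not, nothing does, and a type signature
is needed.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical pure stand-in for the proposed FilePath class.
class PathLike a where
  toBytes   :: a -> [Word8]
  fromBytes :: [Word8] -> a

instance PathLike [Word8] where
  toBytes   = id
  fromBytes = id

-- ISO-8859-1 as a stand-in for the locale's encoding.
instance PathLike [Char] where
  toBytes   = map (fromIntegral . ord)
  fromBytes = map (chr . fromIntegral)

-- Same shape as getDirContents :: FilePath a => a -> IO [a]:
-- the result type is tied to the argument type.
listSame :: PathLike a => a -> [a]
listSame _ = map fromBytes [[0x31, 0xA4]]

-- The "more general" shape: argument and result types are independent.
listGeneral :: (PathLike a, PathLike b) => a -> [b]
listGeneral _ = map fromBytes [[0x31, 0xA4]]

-- Accepted: the literal "." fixes a = String, and listSame ties the result to it.
countSame :: Int
countSame = length (listSame ".")

-- Rejected as ambiguous if uncommented: nothing determines b.
-- countGeneral = length (listGeneral ".")

-- Accepted: a type signature picks b.
countGeneral :: Int
countGeneral = length (listGeneral "." :: [[Word8]])
```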
The Haskell 98 modules like IO could re-export the functions with their
types restricted to what they are now. This would give us complete
backwards compatibility with Haskell 98.
It is certainly possible for there to be ambiguities in programs that
use the hierarchical libraries, however. Possible solutions are:
* Tell people to add type sigs to fix it.
* Define the new stuff in System.IO.Impl in a package iobase.
The oldio package would then contain System.IO which re-exports all
the functions with the old types, and the io package would do the same
with the new types.
Unfortunately I don't think this would work if some of the libraries
you use were compiled against whichever of oldio and io you don't want
to use. I think this might be an argument that the package system is
not flexible enough.
That's all I've got.