[Haskell-cafe] Re: Filename encoding error (was: Perform a research a la Unix 'find')

Simon Michael simon at joyful.com
Mon Aug 23 19:36:32 EDT 2010


I've been banging my head on the same issues. To summarise: in GHC 6.12 a String is a sequence of unicode code points;
a unix file path is a slightly restricted byte string (no NUL bytes, and '/' only as a separator). The former is used to
represent the latter, one byte per Char, which leads to great confusion, and the best way to fix it is unclear. Here's a
workaround I wrote this morning:

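-- Assumed imports: UTF8 is taken to be Codec.Binary.UTF8.String from the
-- utf8-string package, and os to be System.Info.os.
import qualified Codec.Binary.UTF8.String as UTF8
import System.Info (os)

-- | A platform string is a string value from or for the operating system,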
-- such as a file path or command-line argument (or environment variable's
-- name or value ?). On some platforms (such as unix) these are not real
-- unicode strings but have some encoding such as UTF-8. This alias does
-- no type enforcement but aids code clarity.
type PlatformString = String

-- | Convert a possibly encoded platform string to a real unicode string.
-- We decode the UTF-8 encoding recommended for unix systems
-- (cf http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html)
-- and leave anything else unchanged.
fromPlatformString :: PlatformString -> String
fromPlatformString s = if UTF8.isUTF8Encoded s then UTF8.decodeString s else s

-- | Convert a unicode string to a possibly encoded platform string.
-- On unix we encode with the recommended UTF-8
-- (cf http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html)
-- and elsewhere we leave it unchanged.
toPlatformString :: String -> PlatformString
toPlatformString = case os of
                      "unix" -> UTF8.encodeString
                      "linux" -> UTF8.encodeString
                      "darwin" -> UTF8.encodeString
                      _ -> id
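
To make the confusion concrete, here is a minimal sketch using only base. The names utf8Encode and encodeChar are hypothetical, written for this illustration, and are not the utf8-string implementation. It shows how one unicode character becomes several byte-valued Chars once encoded, and why encoding an already encoded string goes wrong:

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- | Encode one unicode character as UTF-8, returning each byte as a Char
-- in the range 0..255 (the "platform string" representation).
-- Hypothetical helper for illustration only.
encodeChar :: Char -> String
encodeChar c
  | n < 0x80    = [chr n]
  | n < 0x800   = [chr (0xC0 .|. (n `shiftR` 6)), cont n]
  | n < 0x10000 = [chr (0xE0 .|. (n `shiftR` 12)), cont (n `shiftR` 6), cont n]
  | otherwise   = [ chr (0xF0 .|. (n `shiftR` 18))
                  , cont (n `shiftR` 12), cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation byte: top bits 10, then the low six bits of x.
    cont x = chr (0x80 .|. (x .&. 0x3F))

-- | Encode a whole unicode string, one byte per Char.
utf8Encode :: String -> String
utf8Encode = concatMap encodeChar

-- One character in the unicode string ...
unicodeName :: String
unicodeName = "\233"                 -- U+00E9, e with acute accent

-- ... becomes two byte-valued Chars in the platform string ...
platformName :: String
platformName = utf8Encode unicodeName

-- ... and four if it is mistakenly encoded a second time.
doubleEncoded :: String
doubleEncoded = utf8Encode platformName
```

Both values have type String, so the compiler cannot tell them apart. That is the reason for the PlatformString alias above, and for checking isUTF8Encoded before decoding.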



