[Haskell-cafe] getting crazy with character encoding
Brandon S. Allbery KF8NH
allbery at ece.cmu.edu
Wed Sep 12 10:53:29 EDT 2007
On Sep 12, 2007, at 10:18 , Andrea Rossato wrote:
> Suppose that, on a Linux system, in a UTF-8 locale, you create a file
> with non-ASCII characters. For instance:
> touch abèèè
>
> Now, I would expect that the output of a shell command such as
> "ls ab*"
> would be a string/list of 5 chars. Instead I find it to be a list of 8
> chars...;-)
That is expected. The low-level filesystem storage doesn't know
about character sets; it just stores bytes, so non-ASCII filenames
must be encoded in e.g. UTF-8. Each è takes two bytes in UTF-8, so
the name is 2 + 3*2 = 8 bytes, and Haskell hands you one Char per
byte. 8 characters is therefore correct, and you must do the UTF-8
decoding on input yourself because Haskell does not do so
automatically. This will also be true with readdir(), aka
getDirectoryContents.
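
To make the decoding step concrete, here is a minimal sketch using only
base. It assumes the input is a String in which every Char holds one raw
byte (0..255), as described above. The name decodeUtf8Bytes is made up
for illustration and is not a library function. The decoder is lenient:
malformed or truncated sequences become U+FFFD, and overlong forms are
not rejected.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)

-- | Decode a String whose Chars each carry one UTF-8 byte into a
-- String of Unicode code points. Hypothetical helper, not a library API.
decodeUtf8Bytes :: String -> String
decodeUtf8Bytes [] = []
decodeUtf8Bytes (c:cs)
  | b < 0x80              = c : decodeUtf8Bytes cs   -- plain ASCII
  | b >= 0xC2 && b < 0xE0 = multi 1 (b .&. 0x1F)     -- 2-byte sequence
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F)     -- 3-byte sequence
  | b >= 0xF0 && b < 0xF5 = multi 3 (b .&. 0x07)     -- 4-byte sequence
  | otherwise             = '\xFFFD' : decodeUtf8Bytes cs
  where
    b = ord c

    -- Consume n continuation bytes and combine them with the lead bits.
    -- On failure, emit U+FFFD and resume right after the lead byte.
    multi :: Int -> Int -> String
    multi n lead =
      let (conts, rest) = splitAt n cs
          cp            = foldl step lead conts
      in if length conts == n && all isCont conts && cp <= 0x10FFFF
           then chr cp : decodeUtf8Bytes rest
           else '\xFFFD' : decodeUtf8Bytes cs

    -- Continuation bytes look like 10xxxxxx.
    isCont x = ord x .&. 0xC0 == 0x80

    -- Shift in the low six bits of each continuation byte.
    step acc x = (acc `shiftL` 6) .|. (ord x .&. 0x3F)
```

Applied to the 8-element string from the example (a, b, then the byte
pair 0xC3 0xA8 three times), this yields the 5-character string you
expected, with each è as the single Char U+00E8.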
--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery at kf8nh.com
system administrator [openafs,heimdal,too many hats] allbery at ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH